
[lex.ppnumber] should also include user-defined-integer-literal and user-defined-floating-point-literal #5188

Open
xmh0511 opened this issue Jan 6, 2022 · 6 comments


@xmh0511
Contributor

xmh0511 commented Jan 6, 2022

[lex.ppnumber] p1

Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]) and all floating-point-literal tokens ([lex.fcon]).

Doesn't it also include all user-defined-integer-literal tokens and all user-defined-floating-point-literal tokens ([lex.ext])? The same issue also appears in [lex.ppnumber] p2.
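For instance, a user-defined literal is lexed as a single pp-number (a sketch; the _km suffix and values are just illustrative):

```cpp
// "1.5_km" is absorbed left to right by the pp-number grammar:
// 1 -> 1. -> 1.5 -> 1.5_ -> 1.5_k -> 1.5_km, each step still a single pp-number.
constexpr long double operator""_km(long double v) { return v * 1000.0L; }
constexpr auto dist = 1.5_km; // phase 7: a user-defined-floating-point-literal
```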

@jensmaurer
Member

jensmaurer commented Jan 6, 2022

Yes, pp-number also includes those user-defined literals and many more things that are not valid phase 7 tokens. It says "includes", which doesn't imply "and nothing else".

Regarding p2, the statement is correct; user-defined-literal tokens cannot be successfully converted to integer-literal or floating-point-literal tokens.
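For example (a minimal sketch; the _u suffix is illustrative):

```cpp
constexpr unsigned long long operator""_u(unsigned long long v) { return v; }

auto a = 10_u;   // pp-number "10_u" -> user-defined-integer-literal, never integer-literal
auto b = 10;     // pp-number "10"   -> integer-literal
// auto c = 1..2; // "1..2" is also a single pp-number, but converts to no phase-7 token
```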

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

I would expect [lex.ppnumber] to state everything it actually covers, something like:

Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]), all floating-point-literal tokens ([lex.fcon]), all user-defined-integer-literal tokens, and all user-defined-floating-point-literal tokens ([lex.ext]).

A preprocessing number does not have a type or a value; it acquires both after a successful conversion to one of the following:

  • an integer-literal token
  • a floating-point-literal token
  • a user-defined-integer-literal token
  • a user-defined-floating-point-literal token
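All four conversions can be exercised side by side (a sketch; the _n suffix is a placeholder):

```cpp
constexpr unsigned long long operator""_n(unsigned long long v) { return v; }
constexpr long double operator""_n(long double v) { return v; }

auto i  = 42;    // integer-literal
auto f  = 4.2;   // floating-point-literal
auto ui = 42_n;  // user-defined-integer-literal
auto uf = 4.2_n; // user-defined-floating-point-literal
```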

@jensmaurer
Member

If we want to improve something here, I think we should instead wholly rework [lex.ppnumber] to say clearly which tokens it can convert to for phase 7, and avoid "include" and similarly vague words.

Oh, and operator-or-punctuator should lose and, bitand, etc., because those spellings create a parsing ambiguity with identifier.
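For instance, and is spelled exactly like an identifier, so the two grammars overlap:

```cpp
// "and" matches the identifier grammar character for character, so a lexer
// cannot tell operator-or-punctuator from identifier by spelling alone:
bool both(bool x, bool y) { return x and y; } // same token stream as: x && y
```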

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

Agree. I think this affects not only [lex.ppnumber] but also the other preprocessing-token clauses, since [lex.phases] p7 says:

Each preprocessing token is converted into a token ([lex.token]). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

However, we never explicitly restrict which kinds of preprocessing tokens can convert to which kinds of tokens. Maybe we could define that for every preprocessing token. For example:

A pp-number can potentially be converted to one of the following:

  • an integer-literal token
  • a floating-point-literal token
  • a user-defined-integer-literal token
  • a user-defined-floating-point-literal token

An identifier preprocessing token can potentially be converted to one of the following:

  • an identifier token
  • a keyword token

In short, for each kind of preprocessing token, we need to specify the set of tokens it can convert to.
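For instance (names are illustrative):

```cpp
// In phase 3, "value" and "return" are both identifier preprocessing tokens;
// in phase 7, "value" converts to an identifier token, "return" to a keyword.
int get(int value) { return value; }
```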

@RedBeard0531
Contributor

RedBeard0531 commented Jan 24, 2022

A while back I started working on a paper to clean up pp-number by having pp-token just directly include the relevant numeric literal tokens. Unfortunately, that breaks a non-trivial amount of real-world code that does terrible things like this:

```cpp
#define HEX(tok) 0x ## tok
#define ID(tok) id_ ## tok

HEX(0b12) // 0x0b12 is a valid hex literal
ID(09)    // id_09 is a valid identifier
```

That meant I needed to completely redo the paper, and I lost the motivation to do so. I'm not trying to discourage anyone from fixing pp-number (I still think it should be fixed), just warning that it is more subtle than it first seems.
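For what it's worth, today's behavior can be checked directly (a sketch; STR/STR2 are the usual stringizing helpers, added here for illustration):

```cpp
#define HEX(tok) 0x ## tok
#define STR2(x) #x
#define STR(x) STR2(x)

// Today "0b12" is one pp-number, so 0x ## 0b12 pastes into the single
// pp-number "0x0b12". If pp-token instead contained integer-literal,
// "0b12" would lex as "0b1" followed by "2" under maximal munch.
static_assert(sizeof(STR(HEX(0b12))) == sizeof("0x0b12"), "one pp-number today");
```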

@xmh0511
Contributor Author

xmh0511 commented Jan 24, 2022

IMHO, pp-number has a wider extent than the actual numeric-literal tokens: the latter are a subset of the former, since a pp-number need not be convertible to a valid numeric-literal token at all, as with 0xe+foo in the standard's own example. Given that, I don't think pp-number can be cleaned up simply by having pp-token directly include the relevant numeric-literal tokens.
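Concretely (a sketch; foo is just a variable name):

```cpp
int foo = 1;
// int a = 0xe+foo; // ill-formed: "0xe+foo" is a single pp-number ([lex.pptoken])
int b = 0xe + foo;  // OK: whitespace yields the three tokens 0xe, +, foo
```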
