
[lex.ppnumber] should also include user-defined-integer-literal and user-defined-floating-point-literal #5188

Open
xmh0511 opened this issue Jan 6, 2022 · 6 comments


@xmh0511
Contributor

xmh0511 commented Jan 6, 2022

[lex.ppnumber] p1

Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]) and all floating-point-literal tokens ([lex.fcon]).

Doesn't it also include all user-defined-integer-literal tokens and all user-defined-floating-point-literal tokens ([lex.ext])? The same issue also appears in [lex.ppnumber] p2.
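For instance, a user-defined literal is lexed as a single pp-number (a sketch; the _km suffix and values are just illustrative):

```cpp
// "1.5_km" is absorbed left to right by the pp-number grammar:
// 1 -> 1. -> 1.5 -> 1.5_ -> 1.5_k -> 1.5_km, each step still a single pp-number.
constexpr long double operator""_km(long double v) { return v * 1000.0L; }
constexpr auto dist = 1.5_km; // phase 7: a user-defined-floating-point-literal
```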

@jensmaurer
Member

jensmaurer commented Jan 6, 2022

Yes, pp-number also includes those user-defined literals and many more things that are not valid phase 7 tokens. It says "includes", which doesn't imply "and nothing else".

Regarding p2, the statement is correct; user-defined-literal tokens cannot be successfully converted to integer-literal or floating-point-literal tokens.
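For example (a minimal sketch; the _u suffix is illustrative):

```cpp
constexpr unsigned long long operator""_u(unsigned long long v) { return v; }

auto a = 10_u;   // pp-number "10_u" -> user-defined-integer-literal, never integer-literal
auto b = 10;     // pp-number "10"   -> integer-literal
// auto c = 1..2; // "1..2" is also a single pp-number, but converts to no phase-7 token
```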

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

I would expect [lex.ppnumber] to state everything it actually covers, something like:

Preprocessing number tokens lexically include all integer-literal tokens ([lex.icon]), all floating-point-literal tokens ([lex.fcon]), all user-defined-integer-literal tokens, and all user-defined-floating-point-literal tokens ([lex.ext]).

A preprocessing number does not have a type or a value; it acquires both after a successful conversion to one of the following:

  • an integer-literal token
  • a floating-point-literal token
  • a user-defined-integer-literal token
  • a user-defined-floating-point-literal token
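All four conversions can be exercised side by side (a sketch; the _n suffix is a placeholder):

```cpp
constexpr unsigned long long operator""_n(unsigned long long v) { return v; }
constexpr long double operator""_n(long double v) { return v; }

auto i  = 42;    // integer-literal
auto f  = 4.2;   // floating-point-literal
auto ui = 42_n;  // user-defined-integer-literal
auto uf = 4.2_n; // user-defined-floating-point-literal
```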

@jensmaurer
Member

If we want to improve something here, I think we should instead wholly rework [lex.ppnumber] to say clearly which tokens it can convert to for phase 7, and avoid "include" and similarly vague words.

Oh, and operator-or-punctuator should lose and, bitand, etc., because those spellings create a parsing ambiguity with identifier.
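For instance, and is spelled exactly like an identifier, so the two grammars overlap:

```cpp
// "and" matches the identifier grammar character for character, so a lexer
// cannot tell operator-or-punctuator from identifier by spelling alone:
bool both(bool x, bool y) { return x and y; } // same token stream as: x && y
```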

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

Agree. I think this affects not only [lex.ppnumber] but also the other preprocessing-token clauses, since [lex.phases] p7 says:

Each preprocessing token is converted into a token ([lex.token]). The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

However, we never explicitly restrict which kinds of preprocessing tokens can convert to which kinds of tokens. Maybe we could define that for every preprocessing token. For example:

A pp-number can potentially be converted to one of the following:

  • an integer-literal token
  • a floating-point-literal token
  • a user-defined-integer-literal token
  • a user-defined-floating-point-literal token

An identifier preprocessing token can potentially be converted to one of the following:

  • an identifier token
  • a keyword token

In short, for each kind of preprocessing token, we need to specify the set of tokens it can convert to.
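For instance (names are illustrative):

```cpp
// In phase 3, "value" and "return" are both identifier preprocessing tokens;
// in phase 7, "value" converts to an identifier token, "return" to a keyword.
int get(int value) { return value; }
```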

@RedBeard0531
Contributor

RedBeard0531 commented Jan 24, 2022

A while back I started working on a paper to clean up pp-number by having pp-token just directly include the relevant numeric literal tokens. Unfortunately, that breaks a non-trivial amount of real-world code that does terrible things like this:

```cpp
#define HEX(tok) 0x ## tok
#define ID(tok) id_ ## tok

HEX(0b12) // 0x0b12 is a valid hex literal
ID(09)    // id_09 is a valid identifier
```

That meant I needed to completely redo the paper, and I lost the motivation to do so. I'm not trying to discourage anyone from fixing pp-number (I still think it should be fixed), just warning that it is more subtle than it first seems.
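For what it's worth, today's behavior can be checked directly (a sketch; STR/STR2 are the usual stringizing helpers, added here for illustration):

```cpp
#define HEX(tok) 0x ## tok
#define STR2(x) #x
#define STR(x) STR2(x)

// Today "0b12" is one pp-number, so 0x ## 0b12 pastes into the single
// pp-number "0x0b12". If pp-token instead contained integer-literal,
// "0b12" would lex as "0b1" followed by "2" under maximal munch.
static_assert(sizeof(STR(HEX(0b12))) == sizeof("0x0b12"), "one pp-number today");
```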

@xmh0511
Contributor Author

xmh0511 commented Jan 24, 2022

IMHO, pp-number has a wider extent than the actual numeric-literal tokens: the latter are a subset of the former, since a pp-number need not be convertible to a valid numeric-literal token at all, as with 0xe+foo in the standard's own example. Given that, I don't think pp-number can be cleaned up simply by having pp-token directly include the relevant numeric-literal tokens.
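Concretely (a sketch; foo is just a variable name):

```cpp
int foo = 1;
// int a = 0xe+foo; // ill-formed: "0xe+foo" is a single pp-number ([lex.pptoken])
int b = 0xe + foo;  // OK: whitespace yields the three tokens 0xe, +, foo
```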
