[lex.name] p3 contradicts to [usrlit.suffix]p1 #5187

Open
xmh0511 opened this issue Jan 6, 2022 · 12 comments

@xmh0511
Contributor

xmh0511 commented Jan 6, 2022

[usrlit.suffix]p1 says

Literal suffix identifiers that do not start with an underscore are reserved for future standardization.

Some literal suffix identifiers are reserved for future standardization; see [usrlit.suffix]. A declaration whose literal-operator-id uses such a literal suffix identifier is ill-formed, no diagnostic required.

That means an identifier in a literal-operator-id is guaranteed to be valid provided it begins with an underscore. However, [lex.name] p3 says

In addition, some identifiers are reserved for use by C++ implementations and shall not be used otherwise; no diagnostic is required.

  • Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
  • Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

An identifier of either form described by the two bullets above can validly appear in a literal-operator-id. However, with the wording "shall not be used otherwise", [lex.name] p3 explicitly forbids any other use of these identifiers. The two provisions seem to conflict with each other.

Proposal:

Change [lex.name] p3 to:

In addition, some identifiers are reserved for use by C++ implementations and shall not be used otherwise, except as ud-suffixes; no diagnostic is required.

@jensmaurer
Member

No, that's not the right direction.

That means an identifier in a literal-operator-id is guaranteed to be valid provided it begins with an underscore.

No, as you correctly observed, there are additional constraints on the spelling of an identifier.

Instead, the combined effect of these rules means that you need to use the "operator user-defined-string-literal" grammar choice for literal-operator-id, because that way you avoid writing a forbidden identifier.

(Effectively, you need to write operator ""_suffix without a space before the underscore.)
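
The difference between the two grammar choices can be sketched as follows. This is a minimal illustration, not text from the standard: the suffix _Km and the conversion factor are hypothetical, chosen only because _Km begins with an underscore followed by an uppercase letter and so matches [lex.name] p3.1.

```cpp
// Hypothetical suffix _Km, used only for illustration.

// Grammar choice 1: operator user-defined-string-literal.
// ""_Km is lexed as a single token, so no separate identifier
// token _Km appears in the declaration.
constexpr long double operator""_Km(long double v) { return v * 1000.0L; }

// Grammar choice 2: operator string-literal identifier (not compiled here).
// The space makes _Km a separate identifier token, which is reserved,
// so this declaration would be ill-formed, no diagnostic required:
//   constexpr long double operator"" _Km(long double v);

// The literal itself is written the same way regardless of how the
// operator was declared.
static_assert(2.0_Km == 2000.0L, "2.0_Km calls operator\"\"_Km(2.0L)");
```

Because the second form is "no diagnostic required", a compiler may accept it silently, so the absence of an error does not show that a declaration is valid.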

@jensmaurer
Member

The example in over.literal p8 might also be helpful, and directly addresses this case.

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

@jensmaurer The concern in this issue is that operator user-defined-string-literal has the form:

string-literal ud-suffix

Where the ultimate form of the ud-suffix is

ud-suffix:

identifier

The identifier has a cross-reference to [lex.name]. [lex.name] p3.1 reads as if it applies uniformly to every identifier that any grammar production refers to. Although the intent of [lex.name] p3 may be that it applies only to identifiers converted from preprocessing identifier tokens, the current wording does not say so explicitly. This is the unclear point. [over.literal] p8 does have a pair of contrasting examples:

double operator""_Bq(long double);                  // OK, does not use the reserved identifier _Bq ([lex.name])
double operator"" _Bq(long double);                 // ill-formed, no diagnostic required:
                                                    // uses the reserved identifier _Bq ([lex.name])

The comments also refer to [lex.name], but the difference is not apparent when we simply compare the identifier in an operator string-literal identifier with the identifier in an operator user-defined-string-literal (where the ud-suffix is an identifier).

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

Also, we can see that [lex.name] does cause this misunderstanding in practice: https://stackoverflow.com/questions/59180353/is-every-normal-use-of-user-defined-literals-undefined-behavior.

Incidentally, [dcl.fct.def#general-8]

struct S {
  S() : s(__func__) { }             // OK
  const char* s;
};

Here __func__ is used by the program rather than by the implementation, so why is it OK? Likewise, we can use the identifiers beginning with __ that are defined in [cpp], yet [lex.name] grants no such permission. In general, I think [lex.name] should be clarified to cover these cases.

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

To eliminate these obscure points, [lex.name] p3 may be changed to

In addition, the identifiers converted from preprocessing identifier tokens in translation phase 7 that have the following forms are reserved for use by C++ implementations and, unless otherwise specified, shall not be used; no diagnostic is required.

  • Each identifier that contains a double underscore __ or begins with an underscore followed by an uppercase letter is reserved to the implementation for any use.
  • Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

double operator""_Bq(long double);

Here, although the ud-suffix has the form of an identifier, it is not converted to an identifier token. By contrast, in the declaration double operator "" _Bq(long double); the declarator-id consists of a string-literal and an identifier token.

@jensmaurer jensmaurer reopened this Jan 7, 2022
@jensmaurer
Member

I would say that the introduction of __func__ as an implementation-provided identifier [dcl.fct.def.general] is more specific than the general prohibition in [lex.name] and overrides that statement. We could add a note, though.

The fundamental problem is that we don't differentiate between lexing (char-by-char) things and higher-level tokens sufficiently clearly. The ud-suffix is a lexing thing; it never forms a preprocessing-token on its own, but is part of a pp-number or similar. It's unfortunate that ud-suffix resolves to identifier; there should be a pp-identifier lexing production instead.

Note that we want the underscore prohibition to apply to macro names as well, so we can't defer the checking to phase 7.

@xmh0511
Contributor Author

xmh0511 commented Jan 7, 2022

@jensmaurer

Note that we want the underscore prohibition to apply to macro names as well, so we can't defer the checking to phase 7.

If we intend [lex.name] p3 to apply in all the phases of [lex.phases], I would say that most uses of the identifiers introduced in the [cpp] clause would violate [lex.name]. Consider this example:

# if __has_include(<iostream>)
# include <iostream>
#endif 

The use of the identifier __has_include would violate [lex.name]; however, I think this is just normal code used in practice. IIUC, [lex.name] does not intend to prohibit the "use" of such identifiers; rather, it prohibits us from "introducing" them (i.e. declaring them as higher-level tokens, or defining them as macro names)?
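
Under the reading suggested in this comment (which is an interpretation, not settled wording), the distinction between using and introducing such identifiers might look like the sketch below. It assumes a compiler that supports __has_include; HAVE_CSTDDEF and have_cstddef are hypothetical names invented for the example.

```cpp
// Using an identifier that [cpp] provides: ordinary code in practice.
#if __has_include(<cstddef>)
#  include <cstddef>
#  define HAVE_CSTDDEF 1  // hypothetical macro name, not a reserved form
#else
#  define HAVE_CSTDDEF 0
#endif

// Introducing an identifier of a reserved form is what the prohibition
// would target under this reading. Not compiled here, because the name
// contains a double underscore:
//   #define __my_config 1

// Record the result of the check in an ordinary constant.
constexpr bool have_cstddef = (HAVE_CSTDDEF == 1);
static_assert(have_cstddef, "<cstddef> should be found by __has_include");
```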

@languagelawyer
Contributor

languagelawyer commented Jan 8, 2022

(Effectively, you need to write operator ""_suffix without a space before the underscore.)
The example in over.literal p8 might also be helpful, and directly addresses this case.

@jensmaurer and it says

void operator "" _km(long double);                  // OK [space; note by me]
float operator ""_e(const char*);                   // OK [no space; note by me]

So one does not have to write operator ""_suffix without a space (at global namespace scope) to avoid violating [lex.name]/3; either is fine. Or rather, I should say: one did not have to, before P1787 changed the definition of name.

[lex.name]/3 says:

Each identifier that begins with an underscore is reserved to the implementation for use as a name in the global namespace.

"Use as a name" is the key phrase here. The pre-P1787 definition of name says:

A name is a use of an identifier ([lex.name]), operator-function-id ([over.oper]), literal-operator-id ([over.literal]), conversion-function-id ([class.conv.fct]), or template-id ([temp.names]) that denotes an entity

Only the whole literal-operator-id operator "" _suffix (in an expression) denotes an entity; its identifier _suffix does not, so it is not "used as a name", and thus such usage is not reserved to the implementation. Nor does declaring such a literal-operator-id in the global namespace interfere with the implementation's ability to define, and use as a name, a function or variable named _suffix.
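
The point that the literal-operator-id and the bare identifier denote different things can be sketched as below. The namespace demo, the suffix _km, and the variable are all hypothetical; the code is placed in a non-global namespace so that the program's own variable _km does not fall under [lex.name] p3.2.

```cpp
namespace demo {

// The entity declared here is named by the whole literal-operator-id
// operator""_km, not by the identifier _km alone.
constexpr long double operator""_km(long double v) { return v * 1000.0L; }

// A separate entity whose name is the identifier _km. It coexists with
// the literal operator because the two names are distinct.
constexpr int _km = 7;

static_assert(1.5_km == 1500.0L, "calls operator\"\"_km(1.5L)");
static_assert(_km == 7, "refers to the variable, not the operator");

}  // namespace demo
```

By analogy, an implementation that defines its own global _suffix would not collide with a user's literal operator using that suffix.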

@jensmaurer
Member

Good point, but this doesn't change the overall situation.

There are two bullets in [lex.name] p3:

  • p3.2 is the _suffix case, which is actually fine, because _suffix doesn't denote an entity here.
  • However, there is also [lex.name] p3.1, which covers _Suffix and __suffix. Both are reserved "for any use", which includes macros and the literal-operator-id situation.

Again, the idea is that you're allowed to use _Suffix if you omit the space when writing operator""_Suffix.

@languagelawyer
Contributor

Yep, I was speaking only about the second bullet. For it, with the help of definitions and [over.literal]/8 examples, it is sort of clear that it

  1. works at the level of names, not just preprocessor tokens (or something even lower)
  2. is only about identifiers which are direct productions of an unqualified-id

(I think the second bullet subsumes the first one)

That the first bullet refers to identifiers as preprocessor tokens can only be inferred from the [over.literal]/8 examples:

double operator""_Bq(long double);                  // OK, does not use the reserved identifier _Bq ([lex.name])
double operator"" _Bq(long double);                 // ill-formed, no diagnostic required:
                                                    // uses the reserved identifier _Bq ([lex.name])

which is a suboptimal situation.

Another issue is that P1787 removed "that denotes an entity" from the definition of name, so it now looks as if any identifier is a name. I have discussed this with @opensdh and IIUC this is on the P1787 issue list. So, I guess, it is up to CWG to decide (and specify?) whether the second bullet only means identifiers that are direct descendants of unqualified-id.

@opensdh
Contributor

opensdh commented Jan 8, 2022

It is my intent to change [lex.name]/3.2 to avoid "use as a name" entirely rather than to change the definition of name. The concerns about /3.1 and the different phases of translation are their own matter.

@frederick-vs-ja
Contributor

Should have been fixed by #6121.
