[lex.charset] p5 just states:
A code unit is an integer value of character type ([basic.fundamental]). Characters in a character-literal other than a multicharacter or non-encodable character literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix ([lex.ccon], [lex.string]);
What is the concrete character type, and what determines it? The wording only says that the characters are encoded as a sequence of one or more code units, in other words, a sequence of integer values of the character type. Stating clearly which character type is meant matters. Consider this example:
auto c = 'ʉ';
The Unicode code point of the character ʉ is U+0289 (649 in decimal). [lex.ccon] p1 just states:
A non-encodable character literal is a character-literal whose c-char-sequence consists of a single c-char that is not a numeric-escape-sequence and that specifies a character that either lacks representation in the literal's associated character encoding or that cannot be encoded as a single code unit.
So, whether it is a non-encodable character literal depends on whether the character:
- lacks representation in the literal's associated character encoding, or
- cannot be encoded as a single code unit.
Assume that the first bullet is false in a given situation. Then whether ʉ is a non-encodable character literal depends on the range of values a code unit can represent, that is, on the representable values of the character type. However, we do not explicitly specify the character type of the code units for each kind of character-literal or string-literal, although it is implied by the Type column in the corresponding table.
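To make the dependency on the character type concrete, here is a minimal sketch. It assumes a C++11 or later compiler, and it spells ʉ as the universal-character-name \u0289 so that the result does not depend on the source file encoding. The name utf8_code_units is only illustrative. The sketch checks the type each encoding-prefix yields and whether U+0289 fits in one code unit of that type.

```cpp
#include <cstddef>
#include <type_traits>

// The encoding-prefix selects the literal's type, and with it the type of
// the code units.
static_assert(std::is_same<decltype('a'), char>::value, "no prefix: char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "L: wchar_t");
static_assert(std::is_same<decltype(u'\u0289'), char16_t>::value, "u: char16_t");
static_assert(std::is_same<decltype(U'\u0289'), char32_t>::value, "U: char32_t");

// A single UTF-16 or UTF-32 code unit can represent U+0289.
static_assert(u'\u0289' == 0x289, "one UTF-16 code unit");
static_assert(U'\u0289' == 0x289, "one UTF-32 code unit");

// UTF-8 needs two code units for U+0289; the array also holds a terminating
// null, hence the -1.
constexpr std::size_t utf8_code_units = sizeof(u8"\u0289") - 1;
static_assert(utf8_code_units == 2, "two UTF-8 code units");

// Consequently u8'\u0289' is non-encodable and ill-formed, while the same
// character is fine in a u8 string-literal.
```

The same character is therefore encodable or not depending only on the code unit type implied by the prefix, which is the point the current wording leaves implicit.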
Should we improve [lex.charset] p5 to make that meaning clearer? For example:
A code unit is an integer value of character type ([basic.fundamental]). Characters in a character-literal other than a multicharacter or non-encodable character literal or in a string-literal are encoded as a sequence of one or more code units, as determined by the encoding-prefix ([lex.ccon], [lex.string]); where the character type of a code unit is specified by the type or element type.