[lex.ccon] What is the single code unit for an ordinary character literal or wide character literal? #4517

xmh0511 · 2021-02-24T09:23:27Z

The special rule in question is specified in [lex.ccon]#1:

A non-encodable character literal is a character-literal whose c-char-sequence consists of a single c-char that is not a numeric-escape-sequence and that specifies a character that either lacks representation in the literal's associated character encoding or that cannot be encoded as a single code unit.

The Unicode standard specifies how large a code unit is for UTF-8, UTF-16, and UTF-32 respectively, which matches the meaning given in the Wikipedia article Character_encoding. However, the C++ standard does not state how large a code unit is for the encoding of the execution (wide-)character set. So, in this case, how do we determine whether the code point value of a character in an ordinary or wide character literal can be encoded as a single code unit for the corresponding kind of character literal?
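To make the question concrete, here is a minimal sketch of what can and cannot be checked today. The helper `code_unit_bits` is a hypothetical name introduced only for illustration. It reports the width of the object type behind each literal kind, which is not necessarily the same thing as the code unit width of the execution encoding; that gap is exactly what the question is about.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: width in bits of the object type behind a literal kind.
template <class CharT>
constexpr std::size_t code_unit_bits() {
    return sizeof(CharT) * CHAR_BIT;
}

// Unicode fixes UTF-16 and UTF-32 code units at 16 and 32 bits,
// and char16_t / char32_t are at least that wide.
static_assert(code_unit_bits<char16_t>() >= 16, "char16_t holds a UTF-16 code unit");
static_assert(code_unit_bits<char32_t>() >= 32, "char32_t holds a UTF-32 code unit");

// For ordinary and wide literals we only learn the object width;
// the width of wchar_t is implementation-defined.
constexpr std::size_t ordinary_bits = code_unit_bits<char>();
constexpr std::size_t wide_bits     = code_unit_bits<wchar_t>();
```

Nothing here says how the execution (wide-)character set maps characters onto those bits, so the object width alone does not settle whether a given character fits in one code unit.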

Would it be a good idea to change the wording "cannot be encoded as a single code unit" to "cannot be represented by an object with the type of the corresponding kind of character-literal"?
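A rough sketch of what the suggested wording would amount to, under a strong assumption: the encoded value of a character is taken to equal its code point value, which need not hold for an arbitrary execution encoding. The name `representable_in` is hypothetical and not part of any library.

```cpp
#include <limits>

// Hypothetical helper: true if `value` lies within the range of CharT.
// Assumption: the encoded value equals the code point value.
template <class CharT>
constexpr bool representable_in(char32_t value) {
    return value <=
           static_cast<unsigned long long>(std::numeric_limits<CharT>::max());
}

// U+0041 fits in every character type.
static_assert(representable_in<char>(U'A'), "U+0041 fits in char");
// U+10FFFF, the largest code point, fits in char32_t.
static_assert(representable_in<char32_t>(U'\U0010FFFF'), "U+10FFFF fits in char32_t");
```

With this reading, the test depends only on the range of the literal's type, not on an unspecified code unit size.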


jensmaurer · 2021-02-24T15:53:33Z

I think [basic.fundamental] p7 and p8 try to establish the relationship between the type and code unit, but this could certainly be clearer.

xmh0511 · 2021-02-25T03:14:57Z

I think [basic.fundamental] p7 and p8 try to establish the relationship between the type and code unit, but this could certainly be clearer.

p7 states:

The values of type char can represent distinct codes for all members of the implementation's basic character set.

However, it is unclear whether the wording "implementation's basic character set" refers to the basic source character set or the basic execution character set. Presumably it refers to the latter. But, as stated in [lex.charset#3], the execution character set is a superset of the basic execution character set.

Let S be the execution character set and A the basic execution character set, where A ⊆ S.

As lex.ccon#tab:lex.ccon.literal indicates, we don't know whether an element of the complement of A in S (S ∖ A), that is, a member of the execution character set that is outside the basic execution character set, can be encoded in a char object. After all, the standard does not specify how the execution character set is encoded, except that the null character has the value 0.
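The little that is guaranteed can be checked at compile time. The sketch below is only an illustration: `sample` is a hand-picked subset of the basic character set and `all_distinct` is a hypothetical helper. It says nothing about characters outside the basic set, which is the open part of the question.

```cpp
#include <cstddef>

// A hand-picked sample of the basic character set (not the full set).
constexpr char sample[] = "abcABC012+-*/";

// Hypothetical helper: true if no two of the first n elements compare equal.
constexpr bool all_distinct(const char* s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (s[i] == s[j]) return false;
    return true;
}

// Members of the basic character set have distinct codes in char.
static_assert(all_distinct(sample, sizeof(sample) - 1), "distinct codes");
// The null character is the one member whose value is specified: 0.
static_assert('\0' == 0, "null character has value 0");
```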

jensmaurer · 2021-03-26T23:38:48Z

This is being addressed by P2314 Character sets and encodings cplusplus/papers#998.

xmh0511 changed the title ~~what is the single code unit for a ordinary character literal~~ what is the single code unit for an ordinary character literal or wide character literal Feb 24, 2021

jensmaurer changed the title ~~what is the single code unit for an ordinary character literal or wide character literal~~ [lex.ccon] What is the single code unit for an ordinary character literal or wide character literal? Mar 9, 2021
