This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 114a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-04-18


933. 32-bit UCNs with 16-bit wchar_t

Section: 5.13.3  [lex.ccon]     Status: CD2     Submitter: Alisdair Meredith     Date: 7 July, 2009

[Voted into WP at October, 2009 meeting.]

According to 5.13.3 [lex.ccon] paragraph 2,

A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character literal has type wchar_t. The value of a wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set.

A c-char that is a universal character name might, when translated to the execution character set, result in a sequence of several code units that cannot be represented in a single wchar_t (for example, a 32-bit UCN when wchar_t is 16 bits wide). There is wording that prevents this for char16_t literals, but no corresponding wording for wchar_t literals. This seems undesirable.

Proposed resolution (July, 2009):

  1. Change 5.13.3 [lex.ccon] paragraph 2 as follows:

     ...The value of a wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined. [Note: The type wchar_t is able to represent all members of the execution wide-character set, see 6.8.2 [basic.fundamental]. —end note]. The value of a wide-character literal containing multiple c-chars is implementation-defined.

  2. Change 5.13.3 [lex.ccon] paragraph 5 as follows:

     A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named...