This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 113d. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2024-03-20


578. Phase 1 replacement of characters with universal-character-names

Section: 5.2  [lex.phases]     Status: CD6     Submitter: Martin Vejnár     Date: 7 May 2006

[Accepted at the October, 2021 meeting as part of paper P2314R4.]

According to 5.2 [lex.phases] paragraph 1, in translation phase 1,

Any source file character not in the basic source character set (5.3 [lex.charset]) is replaced by the universal-character-name that designates that character.

If a character that is not in the basic source character set is preceded by a backslash character, for example

    "\á"

the result is equivalent to

    "\\u00e1"

that is, a backslash character followed by the spelling of the universal-character-name, so the original backslash combines with the first character of the universal-character-name to form the escape sequence \\. This differs from C99, which accepts characters from the extended source character set without replacing them with universal-character-names.

See also issue 1335.

Additional note (February, 2022):

P2314R4, "Character sets and encodings" (approved in October, 2021), changed translation phase 1 so that extended characters are no longer replaced by universal-character-names.