CWG Issue 2640

This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 117a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-04-13

2640. Allow more characters in an n-char sequence

Section: 5.3.1 [lex.charset] Status: C++23 Submitter: US Date: 2022-11-03

P2720R0 comment US 1-028

[Accepted at the November, 2022 meeting.]

The n-char grammar term is defined to match only Latin uppercase letters, digits, hyphen-minus, and space. As a result, \N{ABC} matches named-universal-character while \N{abc} does not. Programs like the following are therefore unexpectedly well-formed: the \N{abc} sequence is lexed as the preprocessing token sequence \, N, {, abc, }. The expansion of macro a then causes that token sequence to be passed as an argument to macro z, where it is discarded.

  #define z(x) 0
  #define a z(
  int x = a\N{abc});

Changes to make the above program ill-formed would provide two benefits:

Implementations could diagnose the \N{abc} sequence as an ill-formed named-universal-character regardless of where it appears in a program.
The \N{...} syntax space would be reserved for expansion (e.g., for extensions or future support of UAX44-LM2 loose matching schemes).
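
To make the pre-resolution behavior concrete, here is a minimal sketch of the old n-char rule as a predicate. The function names is_old_n_char and is_old_n_char_sequence are hypothetical and chosen only for illustration. The range checks assume an ASCII-compatible execution character set. This is not how any particular implementation lexes \N{...}.

```cpp
#include <string_view>

// Old (pre-CWG 2640) n-char: Latin uppercase letter, digit,
// hyphen-minus, or space. Assumes an ASCII-compatible encoding.
constexpr bool is_old_n_char(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') ||
           c == '-' || c == ' ';
}

// True if `name` (the text between "\N{" and "}") is a non-empty
// sequence of old-style n-chars.
constexpr bool is_old_n_char_sequence(std::string_view name) {
    if (name.empty()) return false;
    for (char c : name) {
        if (!is_old_n_char(c)) return false;
    }
    return true;
}

// "ABC" has the shape of an n-char-sequence under the old grammar ...
static_assert(is_old_n_char_sequence("ABC"));
// ... while "abc" does not, so \N{abc} fell apart into separate tokens.
static_assert(!is_old_n_char_sequence("abc"));
```

Under this rule the lexer never sees \N{abc} as a single candidate named-universal-character, which is why it could not be diagnosed as one.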

Proposed resolution (approved by CWG 2022-11-07):

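Change the grammar in 5.3.1 [lex.charset] paragraph 3 as follows, removing the four existing alternatives and replacing them with the single new alternative listed last: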

n-char:
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     0 1 2 3 4 5 6 7 8 9
     U+002d hyphen-minus
     U+0020 space
     any member of the translation character set except the U+007D RIGHT CURLY BRACKET or new-line character
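
As a rough illustration of the resolved grammar, the sketch below recognizes the shape \N{ n-char-sequence } using the new n-char rule. The names is_new_n_char and match_named_ucn_shape are hypothetical. The code works on plain char values rather than full translation-character-set elements, which is a simplification. Matching the shape does not make the program well-formed: the name must still designate an actual Unicode character, and that lookup is not modeled here.

```cpp
#include <cstddef>
#include <string_view>

// New n-char: anything except U+007D RIGHT CURLY BRACKET or new-line.
constexpr bool is_new_n_char(char c) {
    return c != '}' && c != '\n';
}

// If `s` begins with "\N{", one or more n-chars, and "}", return the
// length of that spelling; otherwise return 0.
constexpr std::size_t match_named_ucn_shape(std::string_view s) {
    if (s.substr(0, 3) != "\\N{") return 0;
    std::size_t i = 3;
    while (i < s.size() && is_new_n_char(s[i])) ++i;
    // Reject an empty name or a missing closing brace.
    if (i == 3 || i == s.size() || s[i] != '}') return 0;
    return i + 1;
}

// With the new rule, \N{abc} is consumed as one candidate
// named-universal-character (7 characters), so an implementation can
// diagnose the unknown name "abc" wherever the sequence appears.
static_assert(match_named_ucn_shape("\\N{abc});") == 7);
```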