[lex.charset] p2 \00NNNNNN should use placeholder #2752
Comments
Other things to fix in this vicinity:
Something like this would seem much better:
... except that the term "short identifier" does not actually appear anywhere in the latest version of the Unicode specification (https://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf). I'm not sure if that's an ISO 10646 invention, but the Unicode Consortium claims that "The Unicode Standard, Version 12.0 is aligned with Amendments 1 and 2 to ISO/IEC 10646:2017", so I suspect not.
OK, it does appear in ISO 10646. However, there are many different short identifiers defined for each character, so talking about what the short identifier for a character "is" is meaningless. We can talk about the character for which U+blah is a short identifier, though. Also, U+NNNN is not a code point; a code point is really just a number (expressed in ISO 10646 as a hexadecimal number with no prefix).
Another problem: we say
... but what does that mean? What is "a code point in ISO/IEC 10646"? Does this mean a UCN naming an unassigned code point is ill-formed? Or does it just mean the values \U00NNNNNN for which NNNNNN is not actually a code point at all? The term "code point" is effectively defined by ISO/IEC 10646 as an integer between 0 and 10FFFF (hexadecimal, inclusive). I think our phrasing here is very unclear and confusing. What we're trying to say is something very simple:
Any mention of short identifiers appears to be unnecessary circumlocution.
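To make the "just a number" reading concrete, here is a minimal sketch of the range check implied by the definition quoted above (an integer between 0 and 10FFFF hexadecimal, inclusive). The name `is_code_point` is hypothetical and chosen only for illustration; it is not wording or an entity from the draft.

```cpp
#include <cstdint>

// Hypothetical helper (not from the draft): a value is a code point
// exactly when it lies in the range 0..0x10FFFF, inclusive.
constexpr bool is_code_point(std::uint32_t value) {
    return value <= 0x10FFFF;
}

// \U00000041 and \U0010FFFF denote code points; \U00110000 does not.
static_assert(is_code_point(0x000041), "0x41 is a code point");
static_assert(is_code_point(0x10FFFF), "0x10FFFF is the last code point");
static_assert(!is_code_point(0x110000), "0x110000 is out of range");
```

Under this reading, whether a code point is assigned to a character plays no part in the check.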
... and it gets worse. There are three kinds of code point that do not correspond to a character: surrogates, noncharacters, and reserved code points. We want to allow the second and third kind in universal-character-names, which means that UCNs do not name characters at all, they just name code points.
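The distinction can be sketched in code. The following is an illustration only, with hypothetical names (`CodePointKind`, `classify`, `ucn_may_name`). It assumes the Unicode ranges for surrogates (D800 to DFFF) and noncharacters (FDD0 to FDEF, plus every code point whose low 16 bits are FFFE or FFFF). Reserved code points cannot be detected by arithmetic because they depend on the assignments in a particular Unicode version, so they fall under `Other` here together with assigned characters.

```cpp
#include <cstdint>

// Hypothetical classification, for illustration only.
enum class CodePointKind { NotACodePoint, Surrogate, Noncharacter, Other };

constexpr CodePointKind classify(std::uint32_t v) {
    return v > 0x10FFFF
               ? CodePointKind::NotACodePoint
           : (v >= 0xD800 && v <= 0xDFFF)
               ? CodePointKind::Surrogate
           // FDD0..FDEF, or low 16 bits equal to FFFE or FFFF.
           : ((v >= 0xFDD0 && v <= 0xFDEF) || (v & 0xFFFE) == 0xFFFE)
               ? CodePointKind::Noncharacter
               // Assigned and reserved code points both land here.
               : CodePointKind::Other;
}

// One reading of the comment above: a UCN may name any code point
// except a surrogate. This is an assumption, not draft wording.
constexpr bool ucn_may_name(std::uint32_t v) {
    return classify(v) == CodePointKind::Noncharacter ||
           classify(v) == CodePointKind::Other;
}

static_assert(!ucn_may_name(0xD800), "surrogate is rejected");
static_assert(ucn_may_name(0xFFFE), "noncharacter is allowed");
static_assert(!ucn_may_name(0x110000), "not a code point at all");
```

Note that nothing in `ucn_may_name` asks whether a character is assigned to the code point, which is the sense in which a UCN names a code point rather than a character.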
The NNNNNN here and in the vicinity are placeholders, not literal characters, and thus should use \placeholder.