[cpp.stringize] Whitespace is not a preprocessing token #4766

Open
xmh0511 opened this issue Jul 21, 2021 · 6 comments
Labels
cwg Issue must be reviewed by CWG.

Comments

@xmh0511
Contributor

xmh0511 commented Jul 21, 2021

Consider [cpp.stringize]

Let the stringizing argument be the preprocessing token sequence for the corresponding argument with placemarker tokens removed. Each occurrence of whitespace between the stringizing argument's preprocessing tokens becomes a single space character in the character string literal.

Since whitespace is not a preprocessing token, the preprocessing token sequence should consist only of the preprocessing tokens that appear in the corresponding argument (i.e., a stringizing argument contains no whitespace). This point is also made in the following Stack Overflow answer:
https://stackoverflow.com/a/37462188/11796722
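
To make the mismatch concrete, here is a minimal sketch of what the quoted wording is trying to describe. The macro name `STR` is a hypothetical helper, not taken from the issue, and the expected strings reflect how mainstream compilers are understood to behave. In the first two checks the argument is the same three preprocessing tokens (`a`, `+`, `b`), yet the results differ, which a pure token sequence cannot account for.

```cpp
#include <string_view>

// Hypothetical helper macro for this sketch; not part of the issue text.
#define STR(x) #x

// No whitespace between the tokens of the argument: none in the literal.
static_assert(std::string_view(STR(a+b)) == "a+b");

// Whitespace between the tokens appears as exactly one space each.
static_assert(std::string_view(STR(a + b)) == "a + b");

// Whitespace before the first and after the last token is dropped.
static_assert(std::string_view(STR(   a+b   )) == "a+b");
```

The checks are `static_assert`s, so compiling the file (C++17 or later) is enough to confirm them.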

@xmh0511 xmh0511 changed the title Whitespace is not a preprocessing token [cpp.stringize] Whitespace is not a preprocessing token Jul 21, 2021
@jensmaurer jensmaurer added the decision-required A decision of the editorial group (or the Project Editor) is required. label Jul 21, 2021
@jensmaurer
Member

I agree that talking about a preprocessing token sequence already disregards whitespace (it's a sequence of tokens, after all), so it makes little sense to talk about the whitespace between preprocessing tokens in the sequence.

What do you envision should be done here editorially?

@xmh0511
Contributor Author

xmh0511 commented Jul 22, 2021

Consider this special case

#define fun(x)  #x
fun(&123) 

fun(&123) expands to "&123" even though the argument comprises two preprocessing tokens (& and 123), so whether whitespace separates them is significant here. The definition of the stringizing argument depends on that of the arguments of a function-like macro, whose argument list is defined as follows:

The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro.

That definition still does not include any whitespace, so the concept of a "sequence of characters" is indispensable here. I would therefore like to redefine what the arguments of a function-like macro are:

The sequence of preprocessing tokens bounded by the outside-most matching parentheses, with a single space inserted between two adjacent preprocessing tokens if at least one whitespace character separates them in the sequence of characters from which the token sequence is taken, forms the sequence of characters of the list of arguments for the function-like macro. The sequences of characters of the individual arguments within the list are separated by comma preprocessing tokens; removing the whitespace before the first preprocessing token and after the last preprocessing token in the sequence of characters of an argument forms the argument. If the sequence of characters of an argument consists of no preprocessing tokens, the argument is replaced by a placemarker preprocessing token instead.

Whitespace serves only to separate preprocessing tokens, so the amount of whitespace between two tokens in the original sequence of characters does not appear to matter to any rule in [cpp]; only the presence or absence of whitespace between them affects the result of stringizing. In other words, multiple whitespace characters have the same effect as a single one.
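
The claim that any amount of whitespace behaves like a single space can be checked with the `fun` macro from above. This is only a sketch of observed behaviour on mainstream compilers; it relies on comments and new-lines inside a macro invocation counting as whitespace.

```cpp
#include <string_view>

#define fun(x) #x

// Adjacent tokens: no space in the literal.
static_assert(std::string_view(fun(&123)) == "&123");

// One space, many spaces, a comment, or a new-line between the two
// tokens all yield the same single space.
static_assert(std::string_view(fun(& 123)) == "& 123");
static_assert(std::string_view(fun(&        123)) == "& 123");
static_assert(std::string_view(fun(& /* comment */ 123)) == "& 123");
static_assert(std::string_view(fun(&
                                   123)) == "& 123");
```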

The relevant wording in [cpp.replace#cpp.stringize-2] could then be modified as follows:

Let the stringizing argument be the corresponding argument with placemarker tokens removed. Each occurrence of whitespace between the stringizing argument's preprocessing tokens becomes a single space character in the character string literal. Whitespace before the first preprocessing token and after the last preprocessing token comprising the stringizing argument is deleted.

For instance

#define test(x,y) #x#y

For macro call

test(   &123               ,         c )
its sequence of characters of the list of the arguments is &123 , c where the arguments are &123 and c.

For macro call

test(   &      123,     c  )
its sequence of characters of the list of the arguments is & 123, c where the arguments are & 123 and c.
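
The two invocations above can be turned into a compile-time check. The sketch below assumes that the two string literals produced by `#x#y` are then joined by ordinary adjacent-literal concatenation, so each result is compared as a single string.

```cpp
#include <string_view>

#define test(x,y) #x#y

// Arguments "&123" and "c": surrounding whitespace is dropped.
static_assert(std::string_view(test(   &123               ,         c )) == "&123c");

// Arguments "& 123" and "c": the run of spaces inside the first
// argument collapses to a single space.
static_assert(std::string_view(test(   &      123,     c  )) == "& 123c");
```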


Correspondingly, [cpp.concat#2] could be modified as follows:

If, in the replacement list of a function-like macro, a parameter is immediately preceded or followed by a ## preprocessing token, the parameter is replaced by the corresponding argument 's preprocessing token sequence; however, if an argument consists of no preprocessing tokens, the parameter is replaced by a placemarker preprocessing token instead.

#define concat(x,y) x##y
concat( 1            2   ,      3        )

Since the arguments are 1 2 and 3, the result is 1 23: the whitespace in the first argument is retained in the result. If we said that the parameter is replaced by the corresponding argument's preprocessing token sequence, the result would be 123, which is incompatible with major implementations.

The second piece of deleted wording concerns a case such as

#define concat2(x,y) x ## void y
concat2(   1   2,    4)

the parameter y is not replaced by the corresponding argument. https://godbolt.org/z/5hr76K5M6

@jensmaurer
Member

jensmaurer commented Jul 22, 2021

I agree that multiple whitespace is conflated to a single space in all cases.

However, your examples (in particular the second one) show that the definition and formation of "preprocessing token sequence" itself is broken, and a fix should be applied there instead of to # and ## in isolation. Obviously, a "preprocessing token sequence" is intended to consist of preprocessing tokens, optionally separated by a single space. (Of course, the presence or absence of space only matters for # and ##.)

Given that any work in this area appears to change the specified meaning of ##, this looks non-editorial to me.
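
One way to picture a "preprocessing token sequence" whose tokens are optionally separated by a single space is to attach one bit of spacing information to each token. The following is only an illustrative model under that assumption: `PPToken`, `space_before`, `stringize`, and `demo` are hypothetical names, not wording from the standard or code from any compiler, and the escaping of `"` and `\` required by [cpp.stringize] is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a preprocessing token plus one bit recording
// whether whitespace preceded it in the source.
struct PPToken {
    std::string spelling;
    bool space_before;
};

// Stringize a token sequence under this model. The spacing bit of the
// first token is ignored, which models the deletion of leading
// whitespace; trailing whitespace is never recorded at all.
std::string stringize(const std::vector<PPToken>& arg) {
    std::string out = "\"";
    for (std::size_t i = 0; i < arg.size(); ++i) {
        if (i != 0 && arg[i].space_before) out += ' ';
        out += arg[i].spelling;
    }
    out += '"';
    return out;
}

// Run-time self-check mirroring fun(&123) and fun(& 123) from the thread.
void demo() {
    assert(stringize({{"&", true}, {"123", false}}) == "\"&123\"");
    assert(stringize({{"&", true}, {"123", true}}) == "\"& 123\"");
}
```

Calling `demo()` from a test driver runs the two checks. A single boolean per token is enough here because multiple whitespace is conflated to one space in all cases.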

@jensmaurer
Member

Regarding [cpp.concat] p2, I think the specification we have is sufficient: p2 says that the macro argument preprocessing token sequence is injected, and p3 says that only the single preprocessing tokens immediately preceding and following ## are spliced into a single preprocessing token. The rest of the tokens arriving by macro argument substitution survive unharmed; that's why you see "1 23" in the output (first example for cpp.concat above).
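
The p2/p3 reading can be checked without relying on whitespace at all, by choosing arguments whose value or validity depends on which tokens were spliced. This is a minimal sketch; the arguments `1 + 2`, `int va`, and `lue` are made up for illustration and differ from the `1 2` used earlier in the thread.

```cpp
#define concat(x,y) x##y

// Only the last token of x (2) is pasted with y (3), giving 1 + 23.
// If every token were spliced, this would not equal 24.
static_assert(concat(1 + 2, 3) == 24);

// Same idea with identifiers: "va" and "lue" are pasted into "value",
// while "int" arrives unharmed as a separate token.
constexpr concat(int va, lue) = 7;
static_assert(value == 7);
```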

@xmh0511
Contributor Author

xmh0511 commented Jul 22, 2021

Yes, you're right that [cpp.replace#cpp.subst-1] has covered the second case for [cpp.concat] above.
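
For illustration, the `concat2` macro from the earlier comment can be exercised so that the substitution of `y` is observable. The arguments `int a` and `= 5` are assumptions chosen so that the expansion is a valid declaration; they are not the ones used in the thread.

```cpp
#define concat2(x,y) x ## void y

// x is "int a" and y is "= 5". Only "a" and "void" are pasted (giving
// the identifier "avoid"); y is not adjacent to ## and is substituted
// by the ordinary argument-substitution rule.
constexpr concat2(int a, = 5);  // constexpr int avoid = 5;
static_assert(avoid == 5);
```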

The meaning of "sequence of preprocessing tokens" is clear in itself (namely, a sequence that consists only of valid preprocessing tokens). The problem is that the intended definition of an argument exceeds what a sequence of preprocessing tokens can express, since an argument can contain the spaces that separate its preprocessing tokens. Hence, I think it is reasonable to redefine the argument of a function-like macro so that the wording matches the intent.

Yes, I know that this change goes beyond an editorial fix. Perhaps a simpler approach is:

The argument list for the function-like macro results from inserting a single space between two adjacent preprocessing tokens in the sequence of preprocessing tokens bounded by the outside-most matching parentheses if they were separated in the invocation of the macro.

The individual arguments are separated by comma preprocessing tokens in the argument list, but comma preprocessing tokens between matching inner parentheses do not separate arguments. Whitespace before the first preprocessing token and after the last preprocessing token comprising the argument is deleted when the argument is formed (identified).
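
Both rules in this paragraph, commas inside inner parentheses and the trimming of leading and trailing whitespace, can be observed by stringizing the first argument. The macro name `first_arg` is a hypothetical helper for this sketch, and the expected strings reflect the behaviour of mainstream compilers.

```cpp
#include <string_view>

// Hypothetical helper: stringizes its first argument, ignores the second.
#define first_arg(x, y) #x

// The comma inside the inner parentheses does not end the argument, and
// whitespace around the argument as a whole is removed.
static_assert(std::string_view(first_arg(  (a, b)  ,  c )) == "(a, b)");

// Whitespace between tokens inside the argument survives as single spaces.
static_assert(std::string_view(first_arg( f ( a ,   b ) , c)) == "f ( a , b )");
```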

Considering [lex.phases#1.3], "The source file is decomposed into preprocessing tokens ([lex.pptoken]) and sequences of whitespace characters (including comments).", the wording "sequence of characters" is not suitable in [cpp].

@jensmaurer
Member

jensmaurer commented Jul 22, 2021

Asked CWG for guidance: http://lists.isocpp.org/core/2021/07/11276.php

@jensmaurer jensmaurer added cwg Issue must be reviewed by CWG. and removed decision-required A decision of the editorial group (or the Project Editor) is required. labels Nov 7, 2023