C++ assert: the precedence of the expression in an assert macro - c++

In C++:
assert( std::is_same<int , int>::value ); // does not compile
assert( (std::is_same<int , int>::value) ); // compiles
Can anyone explain why?

assert is a preprocessor macro. Preprocessor macros are dumb; they don't understand templates. The preprocessor sees 10 tokens within the parentheses:
assert( std :: is_same < int , int > :: value );
It splits at the comma. It doesn't know that this is the wrong place to split at, because it doesn't understand that std::is_same<int and int>::value aren't valid C++ expressions.
The preprocessor is smart enough to not break up the contents of inner pairs of parentheses across multiple arguments. That's why adding the extra parentheses fixes the problem.
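To make the splitting visible, here is a minimal sketch. COUNT_ARGS and COUNT_ARGS_IMPL are hypothetical helper macros (they are not part of the standard library or of the question) that expand to the number of arguments the preprocessor sees, for one to three arguments:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helpers (not a standard facility): COUNT_ARGS expands to the
// number of macro arguments, from 1 to 3, that the preprocessor sees.
#define COUNT_ARGS_IMPL(a, b, c, n, ...) n
#define COUNT_ARGS(...) COUNT_ARGS_IMPL(__VA_ARGS__, 3, 2, 1, 0)

// The template's comma splits the unparenthesized expression into two arguments.
constexpr int bare_args = COUNT_ARGS(std::is_same<int , int>::value);
// Inner parentheses shield the comma, so this is a single argument.
constexpr int wrapped_args = COUNT_ARGS((std::is_same<int , int>::value));

static_assert(bare_args == 2, "comma inside <> separates macro arguments");
static_assert(wrapped_args == 1, "comma inside () does not");

inline void check_same_type() {
    // Extra parentheses hand assert exactly one argument.
    assert((std::is_same<int, int>::value));
    // static_assert is a keyword, not a macro, so the comma is harmless here.
    static_assert(std::is_same<int, int>::value, "int is int");
}
```

For a compile-time property such as std::is_same, static_assert is arguably the better fit anyway, since it sidesteps the macro entirely.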

The comma is being treated as an argument separator for the macro, but the parentheses in your second case protect the argument. We can see this by going to the draft C++ standard, section 16.3 Macro replacement, which says (emphasis mine):
The sequence of preprocessing tokens bounded by the outside-most
matching parentheses forms the list of arguments for the function-like
macro. The individual arguments within the list are separated by comma
preprocessing tokens, but comma preprocessing tokens between matching
inner parentheses do not separate arguments. If there are sequences of
preprocessing tokens within the list of arguments that would otherwise
act as preprocessing directives, the behavior is undefined.
We can see that macro expansion happens before semantic analysis by going to section 2.2 Phases of translation, where phase 4 includes:
Preprocessing directives are executed, macro invocations are expanded,
and [...] All preprocessing directives are then deleted.
and phase 7 includes:
[...]Each preprocessing token is converted into a token. (2.7). The
resulting tokens are syntactically and semantically analyzed and
translated as a translation unit[...]
As a side note, Boost includes a special macro to deal with this situation, BOOST_PP_COMMA:
The BOOST_PP_COMMA macro expands to a comma.
and says:
The preprocessor interprets commas as argument separators in macro invocations. Because of this, commas require special handling.
and an example:
BOOST_PP_IF(1, BOOST_PP_COMMA, BOOST_PP_EMPTY)() // expands to ,

Related

Does recursion in the C preprocessor abuse an inconsistency in the standard?

Consider this code:
#define MAP_OUT
#define A(x) B MAP_OUT (x)
#define B(x) A MAP_OUT (x)
A(x)
Then A(x) expands to B MAP_OUT (x), then B (x). Now take a look at the standard:
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
Does B (x) belong to "resulting preprocessing token sequence for more macro names to replace"? All compilers I have tried don't expand B (x) during a single scan, but what about the standard itself?
Does B (x) belong to "resulting preprocessing token sequence for more macro names to replace"?
No, absolutely not. Read again:
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned.
The preprocessing token sequence that results from parameter replacement in A(x) is precisely B MAP_OUT (x), nothing more, nothing less. This sequence is then scanned for more macros to replace, once. There is only one eligible macro to replace in there, MAP_OUT. Then the replacement of MAP_OUT is scanned, nothing is found, and the processing is resumed.
There is no indication whatsoever that B in B MAP_OUT (x) should be scanned twice.
You're cherry-picking. The standard requires that rescanning and replacement stops.
There are other paragraphs, with identical wording in every C++ standard since C++98 (not just the one you've quoted) that actually control the behaviour you observe.
After all parameters in the replacement list have been substituted, the resulting preprocessing token sequence is rescanned with all subsequent preprocessing tokens of the source file for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Further, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts in which that macro name preprocessing token would otherwise have been replaced.
As I said, the wording is identical in every C++ standard. Only the section and para numbers change.
In C++98, the above quote is Section 16.3.4 "Rescanning and further replacement", paras 1 and 2;
In C++17 the above quote is Section 19.3.4 "Rescanning and further replacement", paras 1 and 2;
In the latest C++20 draft (at least, the latest I've accessed) the above quote is Section 15.6.4 "Rescanning and further replacement", paras 1 and 3 (there is an added para 2 with an illustrative example, not normative text).
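The behaviour discussed in this thread can be observed without reading preprocessor output. The sketch below makes two assumptions that are not in the question: ordinary functions named A and B are declared before the macros, so the leftover token tells us which scan we stopped at, and EVAL is a hypothetical helper that forces one additional scan. The retry_budget macro is likewise hypothetical and only illustrates the "name of the macro being replaced" rule quoted above.

```cpp
// Ordinary functions, declared before the macros that reuse their names.
constexpr int A(int) { return 1; }
constexpr int B(int) { return 2; }

#define MAP_OUT
#define A(x) B MAP_OUT (x)
#define B(x) A MAP_OUT (x)
#define EVAL(...) __VA_ARGS__   // hypothetical helper: one extra scan

// A(0) -> B MAP_OUT (0) -> B (0). During the rescan, B is followed by
// MAP_OUT rather than '(', so it is not invoked; function B is called.
constexpr int one_scan = A(0);

// EVAL's argument is expanded to B (0) first; the rescan of EVAL's result
// then invokes macro B, giving A MAP_OUT (0) -> A (0); function A is called.
constexpr int two_scans = EVAL(A(0));

// Self-reference: the inner retry_budget is the macro currently being
// replaced, so it is left alone and refers to the variable.
constexpr int retry_budget = 10;
#define retry_budget (retry_budget + 1)
constexpr int bumped = retry_budget;   // (retry_budget + 1)

static_assert(one_scan == 2, "a single scan stops at B (0)");
static_assert(two_scans == 1, "one more scan reaches A (0)");
static_assert(bumped == 11, "the macro's own name is not replaced again");
```

Note that nothing here is recursion in the usual sense: each extra level of expansion has to be paid for with an explicit extra scan.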

Concatenation and the standard

According to this page "A ## operator between any two successive identifiers in the replacement-list runs parameter replacement on the two identifiers". That is, the preprocessor operator ## acts on identifiers. Microsoft's page says ", each occurrence of the token-pasting operator in token-string is removed, and the tokens preceding and following it are concatenated". That is, the preprocessor operator ## acts on tokens.
I have looked for a definition of an identifier and/or token and the most I have found is this link: "An identifier is an arbitrary long sequence of digits, underscores, lowercase and uppercase Latin letters, and Unicode characters. A valid identifier must begin with a non-digit character".
According to that definition, the following macro should not work (on two accounts):
#define PROB1(x) x##0000
#define PROB2(x,y) x##y
int PROB1(z) = PROB2( 1, 2 * 3 );
Does the standard have some rigorous definitions regarding ## and the objects it acts on? Or, is it mostly 'try and see if it works' (a.k.a. implementation defined)?
The standard is extremely precise, both about what can be concatenated, and about what a valid token is.
The en.cppreference.com page is imprecise; what are concatenated are preprocessing tokens, not identifiers. The Microsoft page is much closer to the standard, although it omits some details and fails to distinguish "preprocessing token" from "token", which are slightly different concepts.
What the standard actually says (§16.3.3/3):
For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an
argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token.…
For reference, "preprocessing token" is defined in §2.4 to be one of the following:
header-name
identifier
pp-number
character-literal
user-defined-character-literal
string-literal
user-defined-string-literal
preprocessing-op-or-punc
each non-white-space character that cannot be one of the above
Most of the time, the tokens to be combined are identifiers (and numbers), but it is quite possible to generate a multicharacter token by concatenating individual characters. (Given the last item in the list of possible preprocessor tokens, any single non-whitespace character is a preprocessor token, even if it is not a letter, digit or standard punctuation symbol.)
The result of a concatenation must be a preprocessing token:
If the result is not a valid preprocessing token, the behavior is undefined. The resulting token is available for further macro replacement.
Note that the replacement of a function-like macro's argument names with the actual arguments may result in the argument name being replaced by 0 tokens or more than one token. If that argument is used on either side of a concatenation operator:
In the case that the actual argument had zero tokens, nothing is concatenated. (The Microsoft page implies that the concatenation operator will concatenate whatever tokens end up preceding and following it.)
In the case that the actual argument has more than one token, the one which is concatenated is the one which precedes or follows the concatenation operator.
As an example of the last case, remember that -42 is two preprocessing tokens (and two tokens, after preprocessing): - and 42. Consequently, although you can concatenate the pp-number 42E with the pp-number 3, resulting in the pp-number (and valid token) 42E3, you cannot create the token 42E-3 from 42E and -3, because only the - would be concatenated, resulting in two pp-number tokens: 42E- and 3. (The first of these is a valid preprocessing token but it cannot be converted into a valid token, so a tokenization error will be reported.)
In a sequence of concatenations:
#define concat3(a,b,c) a ## b ## c
the order of concatenations is not defined. So it is unspecified whether concat3(42E,-,3) is valid; if the first two tokens are concatenated first, all is well, but if the second two are concatenated first, the result is not a valid preprocessing token. On the other hand, concat3(.,.,.) must be an error, because .. is not a valid token, and so neither a##b nor b##c can be processed. So it is impossible to produce the token ... with concatenation.
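As a rough check of the rules above, here is what the macros from the question actually produce. The constexpr qualifiers and the variable name big are additions for the sake of the sketch; the macros themselves are the ones from the question.

```cpp
#define PROB1(x) x##0000
#define PROB2(x, y) x##y

// PROB1(z): identifier z pasted with pp-number 0000 gives the identifier z0000.
// PROB2(1, 2 * 3): only the 2 adjacent to ## is pasted, giving 12 * 3.
constexpr int PROB1(z) = PROB2(1, 2 * 3);   // int z0000 = 12 * 3;

// Two pp-numbers pasted into one pp-number that is also a valid token.
constexpr double big = PROB2(42E, 3);       // 42E3

static_assert(z0000 == 36, "1 ## 2 gives 12, then 12 * 3");
static_assert(big == 42000.0, "42E ## 3 gives 42E3");
```

So the macro from the question does work, just not by concatenating "identifiers": in each case the result of the paste happens to be a valid preprocessing token.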

What do you call or term for macro with ##

For example I have,
#define (name) ##name
What is the term for ##name?
Thanks!
Concat operator, I believe. It is also called the token-pasting or token-concatenation operator. The ## preprocessing operator performs token pasting. When a macro is expanded, the two tokens on either side of each ## operator are combined into a single token, which then replaces the ## and the two original tokens in the macro expansion. Usually both will be identifiers, or one will be an identifier and the other a preprocessing number. When pasted, they make a longer identifier.
See here - ## Operator (Macro Concatenation)
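A typical use looks something like the following sketch. DEFINE_GETTER and the names width and height are made up for illustration; the point is only that get_ and the argument are pasted into one identifier.

```cpp
// Hypothetical macro: pastes get_ and the argument into a single identifier
// and defines a function with that name.
#define DEFINE_GETTER(name, value) \
    constexpr int get_##name() { return value; }

DEFINE_GETTER(width, 640)    // defines get_width()
DEFINE_GETTER(height, 480)   // defines get_height()

static_assert(get_width() == 640, "get_ ## width names get_width");
static_assert(get_height() == 480, "get_ ## height names get_height");
```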

Unclear #define syntax in cpp using `\` sign

#define is_module_error(_module_,_error_) \
((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
#define is_general_error(_error_) is_module_error(general,_error_)
#define is_network_error(_error_) is_module_error(network,_error_)
Can someone please explain to me what the first define means?
How is it evaluated?
I don't understand what the \ sign means here.
The backslash is the line continuation symbol used in preprocessor directives. It tells the preprocessor to merge the following line with the current one. In other words it escapes the hard newline at the end of the line.
In the specific example, it tells the preprocessor that
#define is_module_error(_module_,_error_) \
((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
should be interpreted as:
#define is_module_error(_module_,_error_) ((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
The relevant quote from the C99 draft standard (N1256) is the following:
6.10 Preprocessing directives
[...]
Description
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the
following constraints: The first token in the sequence is a # preprocessing token that (at
the start of translation phase 4) is either the first character in the source file (optionally
after white space containing no new-line characters) or that follows white space
containing at least one new-line character. The last token in the sequence is the first new-line character that follows the first token in the sequence. A new-line character ends
the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.
Emphasis on the relevant sentence is mine.
If you are also unsure of what the ## symbol means, it is the token-pasting operator. From the already cited C99 document (emphasis mine):
6.10.3.3 The ## operator
[...]
Semantics
If, in the replacement list of a function-like macro, a parameter is immediately preceded
or followed by a ## preprocessing token, the parameter is replaced by the corresponding
argument’s preprocessing token sequence; however, if an argument consists of no preprocessing tokens, the parameter is replaced by a placemarker preprocessing token instead.
In the case at hand this means that, for example, wherever the preprocessor finds the following macro "call":
is_module_error(dangerous_module,blow_up_error)
it will replace it with this code fragment:
((dangerous_module_errors<blow_up_error)&&(blow_up_error<dangerous_module_errors_end))
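To see the macros in action, here is a small sketch. The error_code enum and its enumerators are assumptions made up for this example; the original code presumably has something similar, with each module's codes sitting between a <module>_errors and a <module>_errors_end marker.

```cpp
// Hypothetical error codes: each module's codes lie strictly between its
// <module>_errors and <module>_errors_end markers.
enum error_code {
    general_errors = 0, general_timeout, general_errors_end,
    network_errors = 100, network_unreachable, network_errors_end
};

// The macros from the question; the backslash joins the two physical lines.
#define is_module_error(_module_,_error_) \
    ((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
#define is_general_error(_error_) is_module_error(general,_error_)
#define is_network_error(_error_) is_module_error(network,_error_)

// Expands to ((general_errors<general_timeout)&&(general_timeout<general_errors_end)).
constexpr bool timeout_is_general = is_general_error(general_timeout);
// Expands to ((network_errors<general_timeout)&&(general_timeout<network_errors_end)).
constexpr bool timeout_is_network = is_network_error(general_timeout);

static_assert(timeout_is_general, "general_timeout is in the general range");
static_assert(!timeout_is_network, "general_timeout is not in the network range");
```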

Spaces between #define arguments

I was confused with the concept that we shouldn't have spaces in #define arguments.
I think the only restriction is that we shouldn't have spaces between the macro name and the immediate ( bracket. Am I correct? Am I not supposed to put spaces even inside the () brackets?
Is the below notation correct
#define AVERAGE_NUMS( min_val, max_val ) ((min_val + max_val) / 2)
Guys, my above #define C++ statement is just an example. I am actually concerned about spaces while using #define. Anyway Thanks for your answers.
Yes, it's correct. But you should enclose each of the arguments in brackets
#define AVERAGE_NUMS( min_val, max_val ) (((min_val) + (max_val)) / 2)
to avoid operator precedence issues.
In C, when using macros for small computations like that, you should always put each parameter in the expression in brackets, like:
#define AVERAGE_NUMS(min_val, max_val) (((min_val) + (max_val)) / 2)
The spaces within the macro's argument list are optional, so they don't "hurt". As you already said, a space before the opening bracket changes the meaning of the macro: It will then take no parameter and replaces its occurrence with both what you wanted to be the parameter list as well as the "expression".
If you don't put the arguments in the expression in brackets, you can encounter strange results because the operator precedence might change your expression. This is because macros are just text replacement rules and thus don't respect anything of the programming language (C).
A small example where it fails (This is a strange example, I admit, but there are other functions you'd write as macros where "normal" usages fail):
AVERAGE_NUMS(1 << x, y)
Using your macro definition, this will expand to
((1 << x + y) / 2) // Operator precedence: 1 << (x + y)
But using the macro definition from above, it will expand to
(((1 << x) + (y)) / 2) // Operator precedence: (1 << x) + y
In C++, I strongly advise you not to use macros unless you really have to.
A way better method for calculating the average of two numbers without requiring to specify the type is to use a template method:
template<typename T>
T average_nums(T min_val, T max_val) {
    return (min_val + max_val) / T(2);
}
If you are concerned about performance, note that modern optimizing compilers typically handle this piece of code the same way as if it were a macro definition, namely by inlining it. This means that there is no function call involved; the expression average_nums(a, b) gets replaced by (a + b) / 2.
The difference between this and macros is that a macro is only a text replacement which happens during preprocessing, so the actual compilation step will see something like (a + b) / 2 instead of average_nums(a, b).
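The three variants discussed in this answer can be compared side by side. This is only a sketch: AVERAGE_NUMS_UNSAFE is a name invented here for the unparenthesized definition from the question, shift and offset are arbitrary sample values, and constexpr was added to the template so the results can be checked at compile time. Some compilers warn about the missing parentheses in the first expansion, which is rather the point.

```cpp
// The question's definition, renamed here so both versions can coexist.
#define AVERAGE_NUMS_UNSAFE(min_val, max_val) ((min_val + max_val) / 2)
// The fully parenthesized definition.
#define AVERAGE_NUMS(min_val, max_val) (((min_val) + (max_val)) / 2)

// The template alternative, marked constexpr for compile-time checks.
template <typename T>
constexpr T average_nums(T min_val, T max_val) {
    return (min_val + max_val) / T(2);
}

constexpr int shift = 2;
constexpr int offset = 4;

// Expands to ((1 << shift + offset) / 2), i.e. (1 << 6) / 2.
constexpr int unsafe_result = AVERAGE_NUMS_UNSAFE(1 << shift, offset);
// Expands to (((1 << shift) + (offset)) / 2), i.e. (4 + 4) / 2.
constexpr int safe_result = AVERAGE_NUMS(1 << shift, offset);
// Arguments are evaluated before the call, so precedence cannot leak in.
constexpr int template_result = average_nums(1 << shift, offset);

static_assert(unsafe_result == 32, "precedence changed the meaning");
static_assert(safe_result == 4, "parenthesized parameters behave as intended");
static_assert(template_result == 4, "the function template agrees");
```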
You are correct, no spaces between macro name and the immediately following open bracket. Spaces anywhere else == fine.
Be careful about the names used for your macro arguments, if you've got a variable called the same name as an argument you'll have strange things happening... For example in your example your argument is max_val, but you've typed Max_val in the macro, so if there was a variable Max_val where you happened to use this macro the code would compile fine, but not compile in other places...
Personally I always have a prefix _ on macro names to help avoid such situations... But purists have told me that underscore is reserved and should not be used, but stuff them :)
I believe the following excerpts from the 1999 C standard cover the question of how spaces are treated within macros. They state that spaces are nothing more than token separators, with one exception: whether a space sits between the macro name and the opening parenthesis immediately following it determines whether we get an object-like or a function-like macro.
5.1.1.2 Translation phases
...
1.3 The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). ... Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
1.4 Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. ... All preprocessing directives are then deleted.
1.7 White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
...
6.4 Lexical elements
Syntax
token:
keyword
identifier
constant
string-literal
punctuator
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
Semantics
... Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in 6.10, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. ...
6.10 Preprocessing directives
Syntax
...
# define identifier replacement-list new-line
# define identifier lparen identifier-list_opt ) replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... ) replacement-list new-line
lparen:
a ( character not immediately preceded by white-space
6.10.3 Macro replacement
Constraints
3 There shall be white-space between the identifier and the replacement list in the definition of an object-like macro.
Semantics
10 A preprocessing directive of the form
# define identifier lparen identifier-list_opt ) replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... ) replacement-list new-line
defines a function-like macro with arguments, similar syntactically to a function call.
The question is already nine years old, but I think most answers miss the rationale behind the "no spaces before the opening paren" rule (or try to explain why macros are dangerous and templates are better...). The point is that you can use #define to define macros without parameters (i.e. constants) as well as real macros with parameters. Now, if the expansion of a constant starts with an opening parenthesis, the preprocessor must differentiate between the parenthesis belonging to the expansion and the parenthesis starting the macro parameter list. And this is done by (you guessed it) whitespace before the parenthesis.
Example: if you have #define FOUR (2+2), the ( is interpreted as belonging to the expansion, and wherever you write FOUR it's expanded to (2+2). Note that the parens are necessary, because without them expressions like FOUR*3 would be expanded to 2+2*3, which is 8, not what you expected. On the other hand, if you define #define DOUBLE(a) ((a)+(a)), the parenthesis starts the parameter list and DOUBLE(2) expands to ((2)+(2)), whereas if you had wrongly written #define DOUBLE (a) ((a)+(a)), then DOUBLE(2) would be expanded to (a) ((a)+(a))(2), resulting in a compile error.
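The arithmetic in this example is easy to verify. In the sketch below, FOUR_UNSAFE is a name invented here for the variant without parentheses; the broken #define DOUBLE (a) form is left out because it would not compile.

```cpp
// Object-like macro: the space before '(' makes the parentheses part of the expansion.
#define FOUR (2+2)
// Hypothetical variant without the protective parentheses.
#define FOUR_UNSAFE 2+2
// Function-like macro: '(' directly follows the name and starts the parameter list.
#define DOUBLE(a) ((a)+(a))

constexpr int twelve = FOUR * 3;         // (2+2) * 3
constexpr int eight = FOUR_UNSAFE * 3;   // 2+2 * 3
constexpr int four = DOUBLE(2);          // ((2)+(2))

static_assert(twelve == 12, "parenthesized expansion keeps its value");
static_assert(eight == 8, "unparenthesized expansion is reassociated");
static_assert(four == 4, "DOUBLE(2) is a function-like invocation");
```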