Unclear #define syntax in cpp using `\` sign - c++

#define is_module_error(_module_,_error_) \
((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
#define is_general_error(_error_) is_module_error(general,_error_)
#define is_network_error(_error_) is_module_error(network,_error_)
Can someone please explain what the first define means?
How is it evaluated?
I don't understand what the \ sign means here.

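The backslash is the line-continuation symbol. A backslash immediately followed by a newline is removed early in translation (phase 2, before preprocessing directives are processed), so the following line is merged with the current one. In other words, it escapes the hard newline at the end of the line, which would otherwise end the directive.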
In the specific example, it tells the preprocessor that
#define is_module_error(_module_,_error_) \
((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
should be interpreted as:
#define is_module_error(_module_,_error_) ((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
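A minimal sketch of the same splicing rule on a smaller macro. SUM_OF_SQUARES and spliced are hypothetical names made up for this illustration. Because the backslash-newline pair is removed before tokenization, the same trick also works inside a string literal:

```cpp
#include <cassert>
#include <cstring>

// One logical line written across three physical lines.
// Each backslash must be the very last character on its line.
#define SUM_OF_SQUARES(a, b) \
    (((a) * (a)) +           \
     ((b) * (b)))

// The splice is not specific to directives: here it joins two physical
// lines inside a string literal. The second line starts at column 0 so
// that no extra spaces end up in the string.
const char* spliced = "line \
continuation";

static_assert(SUM_OF_SQUARES(2, 3) == 13, "2*2 + 3*3");
```

After preprocessing, the macro definition is seen as a single line, exactly as if it had been written without the backslashes.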
The relevant quote from the C99 draft standard (N1256) is the following:
6.10 Preprocessing directives
[...]
Description
A preprocessing directive consists of a sequence of preprocessing tokens that satisfies the
following constraints: The first token in the sequence is a # preprocessing token that (at
the start of translation phase 4) is either the first character in the source file (optionally
after white space containing no new-line characters) or that follows white space
containing at least one new-line character. The last token in the sequence is the first new-line character that follows the first token in the sequence. A new-line character ends
the preprocessing directive even if it occurs within what would otherwise be an invocation of a function-like macro.
Emphasis on the relevant sentence is mine.
If you are also unsure of what the ## symbol means, it is the token-pasting operator. From the already cited C99 document (emphasis mine):
6.10.3.3 The ## operator
[...]
Semantics
If, in the replacement list of a function-like macro, a parameter is immediately preceded
or followed by a ## preprocessing token, the parameter is replaced by the corresponding
argument’s preprocessing token sequence; however, if an argument consists of no preprocessing tokens, the parameter is replaced by a placemarker preprocessing token instead.
In the case at hand this means that, for example, wherever the preprocessor finds the following macro "call":
is_module_error(dangerous_module,blow_up_error)
it will replace it with this code fragment:
((dangerous_module_errors<blow_up_error)&&(blow_up_error<dangerous_module_errors_end))
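To see the macros from the question at work, here is a small sketch. The enum and its enumerator names are assumptions invented for the example (the question does not show them); the only requirement is that each module has a pair of sentinel values named <module>_errors and <module>_errors_end:

```cpp
#include <cassert>

// Hypothetical error codes. Each module's range is bracketed by two
// sentinels whose names the macro builds with the ## operator.
enum error_code {
    general_errors,      // sentinel: start of the "general" range
    out_of_memory,
    invalid_argument,
    general_errors_end,  // sentinel: end of the "general" range
    network_errors,      // sentinel: start of the "network" range
    connection_lost,
    network_errors_end   // sentinel: end of the "network" range
};

#define is_module_error(_module_,_error_) \
    ((_module_##_errors<_error_)&&(_error_<_module_##_errors_end))
#define is_general_error(_error_) is_module_error(general,_error_)
#define is_network_error(_error_) is_module_error(network,_error_)

// is_general_error(out_of_memory) becomes
// ((general_errors<out_of_memory)&&(out_of_memory<general_errors_end))
static_assert(is_general_error(out_of_memory), "inside the general range");
static_assert(!is_general_error(connection_lost), "outside the general range");
static_assert(is_network_error(connection_lost), "inside the network range");
```

Note that both comparisons are strict, so the sentinels themselves are not classified as errors of their module.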

Related

Is there any relationship between the wording in [cpp.pre]/7 and its example?

[cpp.pre]/7:
The preprocessing tokens within a preprocessing directive are not
subject to macro expansion unless otherwise stated.
[Example 2: In:
#define EMPTY
EMPTY # include <file.h>
the sequence of preprocessing tokens on the second line is not a
preprocessing directive, because it does not begin with a # at the
start of translation phase 4, even though it will do so after the
macro EMPTY has been replaced. — end example]

Does recursion in the C preprocessor abuse an inconsistency in the standard?

Consider this code:
#define MAP_OUT
#define A(x) B MAP_OUT (x)
#define B(x) A MAP_OUT (x)
A(x)
Then A(x) expands to B MAP_OUT (x), then B (x). Now take a look at the standard:
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned, along with all subsequent preprocessing tokens of the source file, for more macro names to replace.
Does B (x) belong to "resulting preprocessing token sequence for more macro names to replace"? All compilers I have tried don't expand B (x) during a single scan, but what about the standard itself?
Does B (x) belong to "resulting preprocessing token sequence for more macro names to replace"?
No, absolutely not. Read again:
After all parameters in the replacement list have been substituted and # and ## processing has taken place, all placemarker preprocessing tokens are removed. The resulting preprocessing token sequence is then rescanned.
The preprocessing token sequence that results from parameter replacement in A(x) is precisely B MAP_OUT (x), nothing more, nothing less. This sequence is then scanned for more macros to replace, once. There is only one eligible macro to replace in there, MAP_OUT. Then the replacement of MAP_OUT is scanned, nothing is found, and the processing is resumed.
There is no indication whatsoever that B in B MAP_OUT (x) should be scanned twice.
You're cherry-picking. The standard requires that rescanning and replacement stops.
There are other paragraphs, with identical wording in every C++ standard since C++98 (not just the one you've quoted) that actually control the behaviour you observe.
After all parameters in the replacement list have been substituted, the resulting preprocessing token sequence is rescanned with all subsequent preprocessing tokens of the source file for more macro names to replace.
If the name of the macro being replaced is found during this scan of the replacement list (not including the rest of the source file’s preprocessing tokens), it is not replaced. Further, if any nested replacements encounter the name of the macro being replaced, it is not replaced. These nonreplaced macro name preprocessing tokens are no longer available for further replacement even if they are later (re)examined in contexts
in which that macro name preprocessing token would otherwise have been replaced.
As I said, the wording is identical in every C++ standard. Only the section and para numbers change.
In C++98, the above quote is Section 16.3.4 "Rescanning and further replacement", paras 1 and 2;
In C++17 the above quote is Section 19.3.4 "Rescanning and further replacement", paras 1 and 2;
In the latest C++20 draft (at least, the latest I've accessed) the above quote is Section 15.6.4 "Rescanning and further replacement", paras 1 and 3 (there is an added para 2 with an illustrative example, not normative text).
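The single-scan behaviour from the question can be observed by stringifying the result. This is only a sketch: STR, STR_, EVAL and without_spaces are hypothetical helpers added for the demonstration, and the results in the comments are what GCC and Clang are expected to produce. Spaces are stripped before comparing because the exact spacing of stringified tokens is not the point here:

```cpp
#include <cassert>
#include <string>

#define MAP_OUT
#define A(x) B MAP_OUT (x)
#define B(x) A MAP_OUT (x)

// Hypothetical helpers: STR fully expands its argument, then stringifies it.
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)
// Passing tokens through EVAL gives them one additional scan.
#define EVAL(...) __VA_ARGS__

// Drops spaces so the comparison does not depend on token spacing.
inline std::string without_spaces(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c != ' ') out += c;
    }
    return out;
}

// One scan: A(x) -> B MAP_OUT (x) -> B (x), and B is left alone.
const std::string one_scan = without_spaces(STR(A(x)));
// One extra scan: B (x) is now seen as an invocation and yields A (x).
const std::string two_scans = without_spaces(STR(EVAL(A(x))));
```

With those compilers, one_scan holds B(x) and two_scans holds A(x): each additional scan advances the chain by exactly one step, which is why B (x) is never expanded within the scan that produced it.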

C++ macro expansion in include directive

In the current draft of the C++ standard (N4830, August 2019) there are the following paragraphs:
[cpp.include] p.2:
A preprocessing directive of the form # include < h-char-sequence > new-line [...]
[cpp.include] p.3:
A preprocessing directive of the form # include " q-char-sequence " new-line [...]
[cpp.include] p.4:
A preprocessing directive of the form # include pp-tokens new-line
(that does not match one of the two previous forms) is permitted. The preprocessing tokens after include in the directive are processed just as in normal text (i.e., each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens). If the directive resulting after all replacements does not match one of the two previous forms, the behavior is undefined. The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined.
The sentence "If the directive resulting after all replacements does not match one of the two previous forms, the behavior is undefined" states that after the macro expansion is performed, the directive must match one of the two forms presented in the previous two paragraphs ([cpp.include]/2 and [cpp.include]/3). I will refer to this operation as check.
The following sentence "The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined" implies that there is an implementation-defined process that transforms a sequence of preprocessing tokens delimited by < > or by " " into a single preprocessing token (a header-name). I will refer to this operation as process.
My first question is whether process is only applied in the situation described in [cpp.include] p.4. I believe so, because tokenization is performed before processing of preprocessing directives, therefore in the first two forms of the include directive there is exactly one preprocessing token (a header-name) after "#include", not "a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters".
My second question is how is check performed? Is it performed before process? If so, is a sequence of preprocessing tokens separated by sequences of white-space characters (a white-space sequence can have zero or more white-space characters) compared to a sequence of characters by going lower in the "parsing hierarchy" (so that each preprocessing token in the first sequence is reverted to the characters forming it)?
For example the sequence of preprocessing tokens: <io[space][space]stream[space]> can be considered to match < h-char-sequence > if <io[space][space]stream[space]> is converted to the sequence of characters [<][i][o][space][space][s][t][r][e][a][m][space][>] (where [.] denotes an individual character)?
Considering that check is done first and it succeeds, then process should be applied to the sequence of preprocessing tokens in order to transform it into a single preprocessing token (a header-name).
Are the details of these two operations and their ordering correct?
My last question is whether the last sentence in [cpp.include] p.4 is partly wrong. This is because it says: "[...] a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters". I do not think that the third form of the include directive can result (even after macro expansion) in a sequence of preprocessing tokens between a pair of " characters, because the " character alone is not a preprocessing token ([lex.pptoken] p.2) so I don't see any sequence of preprocessing tokens that can expand into "a sequence of preprocessing tokens between a pair of " characters".
Thank you.
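For reference, a minimal sketch of the third form ([cpp.include] p.4) in use. STRING_HEADER and length_of are hypothetical names chosen for the example. After macro replacement the directive reads # include followed by the tokens <, cstring and >, and how those tokens are combined into one header-name is implementation-defined, so this shows what common compilers accept, not something the wording guarantees in detail:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: its replacement list is the three preprocessing
// tokens <, cstring and >, not a single header-name token.
#define STRING_HEADER <cstring>

// Third form of the directive: the tokens after "include" are
// macro-expanded, and the result must match one of the first two forms.
#include STRING_HEADER

// Uses a function from the header pulled in through the macro.
inline std::size_t length_of(const char* s) {
    return std::strlen(s);
}
```

If the include had not been resolved to <cstring>, the call to std::strlen would fail to compile.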

Why paired comment can't be placed inside a string in c++?

Normally anything inside /* and */ is considered as a comment.
But the statement,
std::cout << "not-a-comment /* comment */";
prints not-a-comment /* comment */ instead of not-a-comment.
Why does this happen? Are there any other places in c++ where I can't use comments?
This is a consequence of the maximum munch principle. It's a lexing rule that the C++ language follows. When processing a source file, translation is divided into (logical) phases. During phase 3, we get preprocessing tokens:
[lex.phases]
1.3 The source file is decomposed into preprocessing tokens and
sequences of white-space characters (including comments). A source
file shall not end in a partial preprocessing token or in a partial
comment. Each comment is replaced by one space character. New-line
characters are retained.
Turning comments into white space is done in the same phase. Now, a string literal is a pp-token:
[lex.pptoken]
preprocessing-token:
header-name
identifier
pp-number
character-literal
user-defined-character-literal
string-literal
user-defined-string-literal
preprocessing-op-or-punc
each non-white-space character that cannot be one of the above
As are other literals. And the maximum munch principle tells us that:
3 If the input stream has been parsed into preprocessing tokens up to a
given character:
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that
would cause further lexical analysis to fail, except that a
header-name is only formed within a #include directive.
So because preprocessing found the opening ", it must keep looking for the longest sequence of characters that will make a valid pp-token (in this case, the token is a string literal). This sequence ends at the closing ". That's why it can't stop and handle the comment, because it is obligated to consume up to the closing quotation mark.
Following these rules you can pin-point the places where comments will not be handled by the pre-processor as comments.
Why does this happen?
Because the comment becomes part of the string literal (everything between the "" double quotes).
Are there any other places in c++ where I can't use comments?
Yes, the same applies for character literals (using '' single quotes).
You can think of it as single and double quotes taking precedence over the comment delimiters /* */.
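A short sketch of the rule in practice. The variable names are made up for the illustration. Comment markers inside a string literal are ordinary characters, while the same markers outside a literal start a real comment:

```cpp
#include <cassert>
#include <cstring>

// Inside a string literal, /* and */ are just characters of the string.
const char* text = "not-a-comment /* comment */";

// The same holds for //: the literal ends at the closing quote, and only
// the // after the semicolon starts a comment.
const char* url = "http://example.com"; // this part is a real comment

// Outside a literal, the comment is replaced by one space in phase 3.
int sum = 1 /* replaced by a single space */ + 2;
```

The length of text includes the characters /* comment */, which shows that nothing was stripped from the literal.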

Spaces between #define arguments

I was confused with the concept that we shouldn't have spaces in #define arguments.
I think the only restriction is that we shouldn't have spaces between the macro name and the immediate ( bracket. Am I correct? Am I not supposed to put spaces even inside the () brackets?
Is the below notation correct
#define AVERAGE_NUMS( min_val, max_val ) ((min_val + max_val) / 2)
Guys, my above #define C++ statement is just an example. I am actually concerned about spaces while using #define. Anyway Thanks for your answers.
Yes, it's correct. But you should enclose each of the arguments in brackets
#define AVERAGE_NUMS( min_val, max_val ) (((min_val) + (max_val)) / 2)
to avoid operator precedence issues.
In C, when using macros for small computations like that, you should always put each parameter in the expression in brackets, like:
#define AVERAGE_NUMS(min_val, max_val) (((min_val) + (max_val)) / 2)
The spaces within the macro's argument list are optional, so they don't "hurt". As you already said, a space before the opening bracket changes the meaning of the macro: It will then take no parameter and replaces its occurrence with both what you wanted to be the parameter list as well as the "expression".
If you don't put the arguments in the expression in brackets, you can encounter strange results because the operator precedence might change your expression. This is because macros are just text replacement rules and thus don't respect anything of the programming language (C).
A small example where it fails (This is a strange example, I admit, but there are other functions you'd write as macros where "normal" usages fail):
AVERAGE_NUMS(1 << x, y)
Using your macro definition, this will expand to
((1 << x + y) / 2) // Operator precedence: 1 << (x + y)
But using the macro definition from above, it will expand to
(((1 << x) + (y)) / 2) // Operator precedence: (1 << x) + y
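Here is the same comparison as something you can compile, as a rough sketch. AVERAGE_NUMS_BARE, average_bare and average_safe are hypothetical names introduced only to put the two definitions side by side. Some compilers warn about the missing parentheses in the first expansion, which is exactly the problem being shown:

```cpp
#include <cassert>

// Parameters used bare, as in the question.
#define AVERAGE_NUMS_BARE(min_val, max_val) ((min_val + max_val) / 2)
// Each parameter wrapped in its own parentheses.
#define AVERAGE_NUMS(min_val, max_val) (((min_val) + (max_val)) / 2)

// Expands to ((1 << x + y) / 2), i.e. (1 << (x + y)) / 2.
inline int average_bare(int x, int y) {
    return AVERAGE_NUMS_BARE(1 << x, y);
}

// Expands to (((1 << x) + (y)) / 2), which is what was intended.
inline int average_safe(int x, int y) {
    return AVERAGE_NUMS(1 << x, y);
}
```

With x = 2 and y = 4, the bare version computes (1 << 6) / 2 = 32, while the parenthesized version computes (4 + 4) / 2 = 4.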
In C++, I strongly advise you not to use macros unless you really have to.
A much better way of calculating the average of two numbers without having to specify the type is to use a function template:
template<typename T>
T average_nums(T min_val, T max_val) {
return (min_val + max_val) / T(2);
}
If you are concerned about performance, note that modern compilers, with optimization enabled, typically handle this piece of code the same way as if it were a macro definition, namely they inline the code. This means that there is no function call involved, and the expression average_nums(a, b) gets replaced by (a + b) / 2.
The difference between this and macros is that a macro is only a text replacement which happens during preprocessing, so the actual compilation step will see something like (a + b) / 2 instead of average_nums(a, b).
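A quick sketch of the template in use. The constexpr specifier is an addition of mine so that the results can be checked at compile time; it is not required for the technique. Because the arguments are ordinary function arguments, an expression such as 1 << 2 is evaluated as a whole before it is added, so no extra parentheses are needed:

```cpp
#include <cassert>

template <typename T>
constexpr T average_nums(T min_val, T max_val) {
    return (min_val + max_val) / T(2);
}

// The argument 1 << 2 is evaluated first, so this is (4 + 4) / 2.
static_assert(average_nums(1 << 2, 4) == 4, "no precedence surprise");
// T is deduced as double here, so the division is not truncated.
static_assert(average_nums(1.0, 2.0) == 1.5, "floating-point average");
```

One difference from the macro: both arguments must have the same type, so a call like average_nums(1, 2.0) does not compile unless T is given explicitly, as in average_nums<double>(1, 2.0).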
You are correct, no spaces between macro name and the immediately following open bracket. Spaces anywhere else == fine.
Be careful about the names used for your macro arguments, if you've got a variable called the same name as an argument you'll have strange things happening... For example in your example your argument is max_val, but you've typed Max_val in the macro, so if there was a variable Max_val where you happened to use this macro the code would compile fine, but not compile in other places...
Personally I always have a prefix _ on macro names to help avoid such situations... But purists have told me that underscore is reserved and should not be used, but stuff them :)
I believe the following excerpts from the C standard from 1999 cover the question of how spaces are treated within macros and they do indeed state that spaces are nothing more than a token separator, with the exception of the case where we differentiate between object-like macros and function-like macros based on whether there's a space between the macro name and the immediately following it opening parenthesis.
5.1.1.2 Translation phases
...
1.3 The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments). ... Each comment is replaced by one space character. New-line characters are retained. Whether each nonempty sequence of white-space characters other than new-line is retained or replaced by one space character is implementation-defined.
1.4 Preprocessing directives are executed, macro invocations are expanded, and
_Pragma unary operator expressions are executed. ... All preprocessing directives are then deleted.
1.7 White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.
...
6.4 Lexical elements
Syntax
token:
keyword
identifier
constant
string-literal
punctuator
preprocessing-token:
header-name
identifier
pp-number
character-constant
string-literal
punctuator
each non-white-space character that cannot be one of the above
Semantics
... Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both. As described in 6.10, in certain circumstances during translation phase 4, white space (or the absence thereof) serves as more than preprocessing token separation. ...
6.10 Preprocessing directives
Syntax
...
# define identifier replacement-list new-line
# define identifier lparen identifier-list_opt ) replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... ) replacement-list new-line
lparen:
a ( character not immediately preceded by white-space
6.10.3 Macro replacement
Constraints
3 There shall be white-space between the identifier and the replacement list in the definition of an object-like macro.
Semantics
10 A preprocessing directive of the form
# define identifier lparen identifier-list_opt ) replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... ) replacement-list new-line
defines a function-like macro with arguments, similar syntactically to a function call.
The question is already nine years old, but I think most answers miss the rationale behind the "no spaces before the opening paren" rule (or try to explain why macros are dangerous and templates are better...). The point is that you can use #define to define macros without parameters (i.e. constants) as well as real macros with parameters. Now, if the expansion of a constant starts with an opening parenthesis, the preprocessor must differentiate between the parenthesis belonging to the expansion and the parenthesis starting the macro parameter list. And this is done by (you guessed it) whitespace before the parenthesis.
Example: if you have #define FOUR (2+2) the ( is interpreted as belonging to the expansion, and wherever you write FOUR it is expanded to (2+2). Note that the parens are necessary, because without them expressions like FOUR*3 would be expanded to 2+2*3, which is 8, not what you expected. On the other hand, if you define #define DOUBLE(a) ((a)+(a)) the parenthesis starts the parameter list and DOUBLE(2) expands to ((2)+(2)), whereas if you had wrongly written #define DOUBLE (a) ((a)+(a)) then DOUBLE(2) would be expanded to (a) ((a)+(a))(2), resulting in a compile error.
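These cases can be checked with a small sketch. FOUR_BARE, DOUBLE_SPACED, STR, STR_ and without_spaces are hypothetical names added for the demonstration. The wrongly spaced definition cannot be used in real code, so its expansion is turned into a string instead of being compiled:

```cpp
#include <cassert>
#include <string>

#define FOUR (2+2)             // object-like: the ( belongs to the expansion
#define FOUR_BARE 2+2          // same value, but without protective parens
#define DOUBLE(a) ((a)+(a))    // function-like: no space before the (
#define DOUBLE_SPACED (a) ((a)+(a))  // object-like by accident

// Hypothetical helpers: expand the argument fully, then stringify it.
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)

// Drops spaces so the comparison does not depend on token spacing.
inline std::string without_spaces(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c != ' ') out += c;
    }
    return out;
}

static_assert(FOUR * 3 == 12, "(2+2)*3");
static_assert(FOUR_BARE * 3 == 8, "2+2*3");
static_assert(DOUBLE(2) == 4, "((2)+(2))");

// DOUBLE_SPACED takes no arguments, so the (2) is simply left behind it.
const std::string spaced = without_spaces(STR(DOUBLE_SPACED(2)));
```

The string spaced holds (a)((a)+(a))(2), the token sequence the compiler would have had to make sense of.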