Is there logical short-circuiting in the C preprocessor?

The gcc docs for cpp explain about the #if directive:
[...] and logical operations (&& and ||). The latter two obey the usual short-circuiting rules of standard C.
What does that mean? There is no evaluation of expressions during preprocessing, so how can it be short-circuited?

Very simple: undefined macros have numeric value zero, and division by zero is illegal.
#if FIXEDSIZE && CHUNKSIZE/FIXEDSIZE > 42
#define USE_CELLPOOL
#endif
#if does evaluate the rest of its line as an integer constant expression. Your linked documentation begins:
The ‘#if’ directive allows you to test the value of an arithmetic expression, rather than the mere existence of one macro.
That isn't a gcc extension, the Standard's syntax for #if is
# if constant-expression new-line group(opt)
The C99 preprocessor treats all constants as [u]intmax_t.

What they are referring to is && and || operators for #if
#if defined (AAA) || defined (BBB)
If defined (AAA) evaluates to true (that is, AAA is defined), then defined (BBB) is never evaluated.
UPDATE
So evaluation of the expression is short-circuited. For example, if you build with -Wundef to warn about the use of undefined macros, then
#if defined FOO && FOO > 1000
#endif
#if FOO > 1000
#endif
will result in
thomas:~ jeffery$ gcc foo.c -Wundef
foo.c:4:5: warning: 'FOO' is not defined, evaluates to 0 [-Wundef]
#if FOO > 1000
^
1 warning generated.
So the first version does not generate the undefined macro warning, because FOO > 1000 is not evaluated.
OLD MUSINGS
This becomes important if the second part is a macro which has side effects. The macro would not be evaluated, so the side effects would not take place.
To avoid macro abuse I'll give a somewhat sane example
#define FOO
#define IF_WARN(x) _Pragma (#x) 1
#if defined(FOO) || IF_WARN(GCC warning "FOO not defined")
#endif
Now that I constructed this example, I now run into a problem. IF_WARN is always evaluated.
huh, more research needed.
Well foo… now that I read it again.
Macros. All macros in the expression are expanded before actual computation of the expression's value begins.

There is no evaluation of expressions during preprocessing, so how can it be short-circuited?
Yes, there is evaluation of expressions during preprocessing.
C11: 6.10.1 Conditional inclusion (p4):
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become ...
In footnote 166:
Because the controlling constant expression is evaluated during translation phase 4, all identifiers....
These statements clearly show that expressions are evaluated during preprocessing. The necessary condition is that the controlling expression must evaluate to an integer value.
So the && and || operators obey the usual short-circuiting rules of standard C, as stated in the GNU documentation.
Now compile and run this program with and without the // in front of #define macro2 and compare the output:
#include<stdio.h>
#define macro1 1
//#define macro2 1
int main( void )
{
#if defined (macro1) && defined (macro2)
printf( "Hello!\n" );
#endif
printf("World\n");
return 0;
}

Evaluating macro conditions is a part (a major part) of preprocessing, so evaluation does occur and short-circuiting is meaningful there. You can see examples in the other answers.
A conditional is a directive that instructs the preprocessor to select
whether or not to include a chunk of code in the final token stream
passed to the compiler. Preprocessor conditionals can test arithmetic
expressions, or whether a name is defined as a macro, or both
simultaneously using the special defined operator.
Moreover, it can reduce compile time. Skipping the evaluations that follow can speed up compilation (depending on the compiler's implementation).

Related

Do 'true' and 'false' have their usual meaning in preprocessor conditionals?

Given a C++11 compiler, which #error is the correct one it should end up with?
// no #includes!
#define SOMEMACRO true
#if SOMEMACRO
#error "it was true"
#else
#error "it was false"
#endif
Godbolt demo
Obviously I'm using #error just as a test. I know true and false are defined in the language proper, but this is preprocessor context. In C99 it seems not to be recognised by the preprocessor.
I'm asking because it seems that all compilers I tried see it as 'true', while a static code analysis tool insists that true isn't defined, implicitly false and ends up in "it was false".
In every ISO C++ standard, true and false are keywords (Boolean literals), just as nullptr is from C++11 on. So #if SOMEMACRO is equivalent to #if true, and the preprocessor takes the truthy branch.
In C before C23, however, neither true nor false is a keyword. As of C99 they are macros defined as 1 and 0 respectively by <stdbool.h>. This means that if you don't include stdbool.h, the compiler proper should complain about the undeclared identifiers true, false etc., while in a preprocessor conditional they simply evaluate to 0. After including the header, #if SOMEMACRO becomes #if 1, which is truthy in C.
For preprocessing, this quote from CppReference is meaningful:
Any identifier, which is not literal, non defined using #define directive, evaluates to 0.
So in your (probably C-oriented) static analysis tool, it sees true as a non-#define-defined identifier, and therefore evaluates true to zero. You're not going to observe this behavior if you use a C++ analysis tool.
In that case, you probably shouldn't have missed the #include <stdbool.h> in the first place, though.
According to [cpp.cond]/4 in the C++11 standard:
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. […] After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of [expr.const] using arithmetic that has at least the ranges specified in [support.limits]. […] Each subexpression with type bool is subjected to integral promotion before processing continues.
Emphasis mine; from the bolded passages it follows that bool-typed expressions are meant to be supported in preprocessor conditions just like in the language proper, including bool literals true and false. The [expr.const] section defining constant expressions is referred to from other sections that use it in non-preprocessing context, from which it follows that the evaluation rules are the same in the preprocessor and the language proper.
I’d assume similar language appears in all further revisions of the C++ standard, and probably in earlier ones too. In C, on the other hand, true and false are not keywords, but macros defined in stdbool.h, so the preprocessor treats them just like any other token.
The usual practice is to use 1 and 0 for logical values in preprocessor expressions for maximum portability, and preferably to avoid directly referring to them entirely.
As other answers already pointed out correctly, true and false should work there with C++ compilers.
OP here: it was indeed a configuration problem of the SCA tool. In Helix, the option -preproccppkeywords, described as "When enabled, the C++ alternative tokens are treated as keywords.", was responsible for this. With the option switched on, the tool behaves as expected: true and false are recognized during preprocessing.

Dereferencing strings in preprocessor expressions

My reading of the draft standard documents suggests that it should be legal to dereference a string literal, either with a unary * or with a constant subscript, in a preprocessor expression. For instance, I should be able to say (using the predefined __DATE__ macro which expands to a quoted string):
#if *__DATE__ == 'A'
or
#if __DATE__[0] == 'A'
If I do this in GCC, with -std=gnu++0x, the former complains
error: operator '*' has no left operand
and the latter complains
error: token ""Feb 16 2016"" is not valid in preprocessor expressions
The standards don't seem to define constant-expression any differently between the compiler and the preprocessor. The compiler happily compiles stuff like:
int foo[*__DATE__];
or
int foo[__DATE__[0]];
at global scope, proving that these are legitimate constant expressions.
I call foul. It seems to me that the standard requires the preprocessor to handle these types of expressions in #if or #elif clauses. Does anyone have any counterargument, before I go and report this as a GCC bug?
Your technique works in ordinary code, such as an if (*__DATE__ == 'A') statement, but not in an #if directive. The preprocessor won't do that sort of expression evaluation.

Extra tokens at end of #ifdef directive

Why does the following code compile?
#ifdef C++11
// ...
#endif
int main() {}
gcc 4.8.0 gives me the following warning:
extra tokens at end of #ifdef directive
According to the standard, the macro name can contain only letters, digits and underscore character.
Maybe because of this?
ISO/IEC 14882:2011
16.1 Conditional inclusion [cpp.cond]
6 Each directive’s condition is checked in order. If it evaluates to
false (zero), the group that it controls is skipped: directives are
processed only through the name that determines the directive in order
to keep track of the level of nested conditionals; the rest of the
directives’ preprocessing tokens are ignored, as are the other
preprocessing tokens in the group. Only the first group whose control
condition evaluates to true (nonzero) is processed. If none of the
conditions evaluates to true, and there is a #else directive, the
group controlled by the #else is processed; lacking a #else directive,
all the groups until the #endif are skipped.
I can't understand this quote correctly.
As far as C++ is concerned, #ifdef C++11 is a syntax error. There is no rule saying a compiler has to reject a program with a syntax error.
1.4 Implementation compliance [intro.compliance]
The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that "no diagnostic is required" or which are described as resulting in "undefined behavior."
[...]
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as "conditionally-supported" when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
A warning is a diagnostic message. The compilers are perfectly within their rights to continue to successfully compile the program, as long as they ensure they show you that one diagnostic message. Since compilers have historically accepted such directives, and accepting such directives does not conflict with the requirements of the standard, they continue to do so.
At least as far as GCC is concerned, you can ask to make all standard-required diagnostics a hard error with the -pedantic-errors option.
$ printf "#ifdef C++11\n#endif\n" | gcc -std=c++11 -pedantic-errors -E -x c++ -
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
<stdin>:1:9: error: extra tokens at end of #ifdef directive
A #ifdef is defined as follows (taken from §16.1)
# ifdef identifier new-line
With regexp-like notation, an identifier is: [a-zA-Z_][a-zA-Z_0-9]* (*)
The point is: the macro you test is NOT C++11. It is in fact C (see this live example). The ++11 part is ignored by the preprocessor. The only thing allowed after the identifier (which is C) is a new-line, but as said in hvd's answer, from §1.4, a syntax error only forces a diagnostic message, here the warning. The only reason I see for a warning instead of an error is compatibility with old code, where such names could have been used.
Also: the quote explains how #ifdef / #elif / #else / #endif work together, not the way conditions are specified.
I do not have a copy of the standard. I used draft n3485 for this answer.
(*) It is possible to have implementation-defined characters in an identifier, but that does not impact your question. Note that variables, class names, macros, ... all follow the same identifier rules.

What is the value of an undefined constant used in #if?

My preprocessor appears to assume that undefined constants are 0 for the purpose of evaluating #if conditions.
Can this be relied upon, or do undefined constants give undefined behaviour?
Yes, it can be relied upon. The C99 standard specifies at §6.10.1 ¶3:
After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers are replaced with the pp-number
0
Edit
Sorry, I thought it was a C question; still, no big deal, the equivalent section in the C++ standard (§16.1 ¶4) states:
After all replacements due to macro expansion and the defined unary operator
have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0
The only difference is the different handling of true and false, which in C do not need special handling, while in C++ they have a special meaning even in the preprocessing phase.
An identifier that is not defined as a macro is converted to 0 before the expression is evaluated.
The exception is the identifier true, which is converted to 1. This is specific to the C++ preprocessor; in C, this doesn't happen and you would need to include <stdbool.h> to use true this way, in which case it will be defined as a macro and no special handling is required.
The OP was asking specifically about the C preprocessor and the first answer was correctly referring to the C preprocessor specification. But some of the other comments seem to blur the distinction between the C preprocessor and the C compiler. Just to be clear, those are two different things with separate rules and they are applied in two separate passes.
#if 0 == NAME_UNDEFINED
int foo = NAME_UNDEFINED;
#endif
With this example, the preprocessor emits the foo definition, because it evaluates NAME_UNDEFINED to 0 as part of the conditional expression. A compiler error is then generated, because the initializer is not part of a preprocessor conditional, so the C compiler sees NAME_UNDEFINED as an undeclared identifier.