What is the value of an undefined constant used in #if? - c++

My preprocessor appears to assume that undefined constants are 0 for the purpose of evaluating #if conditions.
Can this be relied upon, or do undefined constants give undefined behaviour?

Yes, it can be relied upon. The C99 standard specifies at §6.10.1 ¶3:
After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers are replaced with the pp-number
0
Edit
Sorry, I thought it was a C question; still, no big deal, the equivalent section in the C++ standard (§16.1 ¶4) states:
After all replacements due to macro expansion and the defined unary operator
have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0
The only difference is the different handling of true and false, which in C do not need special handling, while in C++ they have a special meaning even in the preprocessing phase.

An identifier that is not defined as a macro is converted to 0 before the expression is evaluated.
The exception is the identifier true, which is converted to 1. This is specific to the C++ preprocessor; in C, this doesn't happen and you would need to include <stdbool.h> to use true this way, in which case it will be defined as a macro and no special handling is required.

The OP was asking specifically about the C preprocessor and the first answer was correctly referring to the C preprocessor specification. But some of the other comments seem to blur the distinction between the C preprocessor and the C compiler. Just to be clear, those are two different things with separate rules and they are applied in two separate passes.
#if 0 == NAME_UNDEFINED
int foo = NAME_UNDEFINED;
#endif
In this example the preprocessor keeps the foo definition, because it evaluates NAME_UNDEFINED to 0 in the #if condition. The compiler, however, then reports an error: the initializer is not part of a preprocessor conditional, so NAME_UNDEFINED reaches the compiler as an undefined symbol.

Related

Do 'true' and 'false' have their usual meaning in preprocessor conditionals?

Given a C++11 compiler, which #error is the correct one to end up with?
// no #includes!
#define SOMEMACRO true
#if SOMEMACRO
#error "it was true"
#else
#error "it was false"
#endif
Godbolt demo
Obviously I'm using #error just as a test. I know true and false are defined in the language proper, but this is preprocessor context. In C99 it seems not to be recognised by the preprocessor.
I'm asking because all compilers I tried see it as 'true', while a static code analysis tool insists that true isn't defined, treats it as implicitly false, and ends up in "it was false".
In all ISO C++ standards, both true and false are keyword constants, just like nullptr in C++11. So #if SOMEMACRO is equivalent to #if true, and the preprocessor takes the truthy branch.
In C, however, neither true nor false is a keyword. They're macros defined to 1 and 0 respectively, as of C99 and with #include <stdbool.h>. This means, however, that if you don't include stdbool.h, true and false are just ordinary identifiers: in regular code the compiler will complain about an undeclared identifier, while in a #if condition they are quietly replaced by 0. After including the header, #if SOMEMACRO is now #if 1, which is truthy in C.
For preprocessing, this quote from CppReference is meaningful:
Any identifier which is not a literal and is not defined using the #define directive evaluates to 0.
So your (probably C-oriented) static analysis tool sees true as an identifier not defined via #define, and therefore evaluates it to zero. You're not going to observe this behavior if you use a C++ analysis tool.
In C, though, you shouldn't have omitted the #include <stdbool.h> in the first place.
According to [cpp.cond]/4 in the C++11 standard:
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. […] After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of [expr.const] using arithmetic that has at least the ranges specified in [support.limits]. […] Each subexpression with type bool is subjected to integral promotion before processing continues.
Emphasis mine; from the emphasized passages ("except for true and false" and the final sentence about bool) it follows that bool-typed expressions are meant to be supported in preprocessor conditions just like in the language proper, including the bool literals true and false. The [expr.const] section defining constant expressions is referred to from other sections that use it in non-preprocessing contexts, from which it follows that the evaluation rules are the same in the preprocessor and the language proper.
I’d assume similar language appears in all further revisions of the C++ standard, and probably in earlier ones too. In C, on the other hand, true and false are not keywords, but macros defined in stdbool.h, so the preprocessor treats them just like any other token.
The usual practice is to use 1 and 0 for logical values in preprocessor expressions for maximum portability, and preferably to avoid referring to true and false there at all.
As other answers already pointed out correctly, true and false should work there with C++ compilers.
OP here: it was indeed a configuration problem of the SCA tool. In Helix, the option -preproccppkeywords, documented as "When enabled, the C++ alternative tokens are treated as keywords.", was responsible for this. When it is switched on, the tool behaves as expected: true and false are recognized during preprocessing.


Differences in C and C++ with sequence points and UB

I used this post Undefined Behavior and Sequence Points to document undefined behavior(UB) in a C program and it was pointed to me that C and C++ have their own divergent rules for this [sequence points]. So what are the differences between C and C++ when it comes to sequence points and related UB? Can’t I use a post about C++ sequences to analyze what is happening in C code?
* Of Course I am not talking about features of C++ not applicable to C.
There are two parts to this question. We can tackle a comparison of the sequence point rules without much trouble, but that does not get us too far: C and C++ are different languages with different standards (the latest C++ standard is almost twice as large as the latest C standard), and even though C++ uses C as a normative reference, it would be incorrect to quote the C++ standard for C and vice versa, regardless of how similar certain sections may be. The C++ standard does explicitly reference the C standard, but only for small sections.
The second part is a comparison of undefined behavior between C and C++. There can be some big differences; enumerating all of them may not be possible, but we can give some indicative examples.
Sequence Points
Since we are talking about sequence points, this covers pre-C++11 and pre-C11. As far as I can tell, the sequence point rules do not differ greatly between the C99 and pre-C++11 draft standards. As we will see, the sequence point rules play no part in the examples of differing undefined behavior I give below.
The sequence points rules are covered in the closest draft C++ standard to C++03 section 1.9 Program execution which says:
There is a sequence point at the completion of evaluation of each full-expression12).
When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all
function arguments (if any) which takes place before execution of any expressions or statements in the function body.
There is also a sequence point after the copying of a returned value and before the execution of any expressions outside
the function13). Several contexts in C++ cause evaluation of a function call, even though no corresponding function call
syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and
constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts
in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit
(as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the
function might be.
In the evaluation of each of the expressions
a && b
a || b
a ? b : c
a , b
using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after
the evaluation of the first expression14).
I will use the sequence point list from Annex C of the draft C99 standard; although it is not normative, I can find no disagreement with the normative sections it references. It says:
The following are the sequence points described in 5.1.2.3:
The call to a function, after the arguments have been evaluated (6.5.2.2).
The end of the first operand of the following operators: logical AND && (6.5.13);
logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
The end of a full declarator: declarators (6.7.5);
The end of a full expression: an initializer (6.7.8); the expression in an expression
statement (6.8.3); the controlling expression of a selection statement (if or switch)
(6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
expressions of a for statement (6.8.5.3); the expression in a return statement
(6.8.6.4).
The following entries do not seem to have equivalents in the draft C++ standard, but they come from the C standard library, which C++ incorporates by reference:
Immediately before a library function returns (7.1.4).
After the actions associated with each formatted input/output function conversion
specifier (7.19.6, 7.24.2).
Immediately before and immediately after each call to a comparison function, and
also between any call to a comparison function and any movement of the objects
passed as arguments to that call (7.20.5).
So there is not much of a difference between C and C++ here.
Undefined Behavior
When it comes to the typical examples of sequence points and undefined behavior, for example those covered in Section 5, Expressions, dealing with modifying a variable more than once between sequence points, I cannot come up with an example that is undefined in one language but not the other. In C99 it says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.72) Furthermore, the prior value shall be read only to
determine the value to be stored.73)
and it provides these examples:
i = ++i + 1;
a[i++] = i;
and in C++ it says:
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined
and provides these examples:
i = v[i++]; // the behavior is undefined
i = ++i + 1; // the behavior is undefined
In C++11 and C11 we do have one major difference, which is covered in Assignment operator sequencing in C11 expressions:
i = ++i + 1;
This is due to the result of pre-increment being an lvalue in C++11 but not in C11 even though the sequencing rules are the same.
We do have major differences in areas that have nothing to do with sequence points:
In C, which uses of an indeterminate value are undefined has always been well specified, while in C++ it was not well specified until the recent draft C++1y standard. This is covered in my answer to Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++1y?
Type punning through a union has always been well defined in C, but not in C++; or at least it is hotly debated whether it is undefined behavior there. I have several references to this in my answer to Why does optimisation kill this function?
In C++, simply falling off the end of a value-returning function is undefined behavior, while in C it is only undefined behavior if the caller uses the value.
There are probably plenty more examples but these are ones I have written about before.

Is there logical short-circuiting in the C preprocessor?

The gcc docs for cpp explain about the #if directive:
[...] and logical operations (&& and ||). The latter two obey the usual short-circuiting rules of standard C.
What does that mean? There is no evaluation of expressions during preprocessing, so how can it be short-circuited?
Very simple: undefined macros have numeric value zero, and division by zero is illegal.
#if FIXEDSIZE && CHUNKSIZE/FIXEDSIZE > 42
#define USE_CELLPOOL
#endif
#if does evaluate the rest of its line as an integer constant expression. Your linked documentation begins:
The ‘#if’ directive allows you to test the value of an arithmetic expression, rather than the mere existence of one macro.
That isn't a gcc extension; the Standard's grammar for #if is
# if constant-expression new-line group_opt
The C99 preprocessor treats all constants as [u]intmax_t.
What they are referring to is the && and || operators in #if:
#if defined (AAA) || defined (BBB)
If AAA is defined, then defined (BBB) is never evaluated.
UPDATE
So evaluation of the expression is short-circuited. For example, build the following with -Wundef, which warns about the use of undefined macros:
#if defined FOO && FOO > 1000
#endif
#if FOO > 1000
#endif
will result in
thomas:~ jeffery$ gcc foo.c -Wundef
foo.c:4:5: warning: 'FOO' is not defined, evaluates to 0 [-Wundef]
#if FOO > 1000
^
1 warning generated.
So the first version does not generate the undefined macro warning, because FOO > 1000 is not evaluated.
OLD MUSINGS
This becomes important if the second operand is a macro which has side effects. The macro would not be evaluated, so the side effects would not take place.
To avoid macro abuse, I'll give a somewhat sane example:
#define FOO
#define IF_WARN(x) _Pragma (#x) 1
#if defined(FOO) || IF_WARN(GCC warning "FOO not defined")
#endif
Now that I've constructed this example, I run into a problem: IF_WARN is always evaluated.
huh, more research needed.
Well foo… now that I read it again.
Macros. All macros in the expression are expanded before actual computation of the expression's value begins.
There is no evaluation of expressions during preprocessing, so how can it be short-circuited?
Yes, expressions are evaluated during preprocessing.
C11: 6.10.1 Conditional inclusion (p4):
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become ...
And in footnote 166:
Because the controlling constant expression is evaluated during translation phase 4, all identifiers....
These statements clearly establish that expressions are evaluated during preprocessing. The necessary condition is that the controlling expression must evaluate to an integer value.
The && and || operators then obey the usual short-circuiting rules of standard C, as stated in the GNU documentation.
Now run this program with and without the // (i.e., with macro2 undefined or defined) to see the short-circuit behavior:
#include <stdio.h>

#define macro1 1
//#define macro2 1

int main(void)
{
#if defined (macro1) && defined (macro2)
    printf("Hello!\n");
#endif
    printf("World\n");
    return 0;
}
Evaluating macro conditions is a part (a major part) of pre-processing, so it occurs and short-circuiting is meaningful there. You can see examples of the other answers.
A conditional is a directive that instructs the preprocessor to select
whether or not to include a chunk of code in the final token stream
passed to the compiler. Preprocessor conditionals can test arithmetic
expressions, or whether a name is defined as a macro, or both
simultaneously using the special defined operator.
Moreover, it can reduce compile time: skipping the evaluation of the remaining operands can speed up compilation (depending on the compiler implementation).

Is there a valid C++11 program with the expression 'C++11'?

The name of the programming language C++ derives from the parent language C and the ++ operator (it should arguably be ++C) and, hence, the expression C++ may naturally occur in C++ programs. I was wondering whether you can write a valid C++ program using the 2011 standard (without extensions) and containing the expression C++11 not within quotes and after pre-processing (note: edited the requirement, see also answer).
Obviously, if you could write a C++ program prior to the 2011 standard with the expressions C++98 or C++03, then the answer is a trivial yes. But I don't think that was possible (though I don't really know). So, can it be done with the new armory of C++11?
NO, if we require the characters C++11 to be outside any literal after preprocessing, because at translation phase 7 the three tokens will be an identifier, ++, and an integer-literal.
The first two tokens form a postfix-expression; the latter is a primary-expression.
There is no reduction in the grammar that can contain these two nonterminals in sequence, so any program containing C++11 will fail syntax analysis.
However, if you do not consider character literals to be strings, then the answer is YES, since you can put it in a wide-character literal:
int main()
{
wchar_t x = L'C++11';
}
which does not use the preprocessor or a string literal, and the construct is required to be supported by the standard:
The value of a wide-character literal containing multiple c-chars is implementation-defined.
So, can it be done with the new armory of C++11?
No.
Define “valid C++ program”.
The C++ standard defines a “well-formed C++ program” as “a C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule”. This leaves open the possibility of C++ programs that are not well-formed. (C explicitly has the notion of programs that are conforming but not strictly conforming, e.g., that use extensions of a particular compiler.)
If you consider it valid to use extensions, then you can implement a C++ compiler that permits C++11 in some context.