C++ directive spelling error [duplicate] - c++

My preprocessor appears to assume that undefined constants are 0 for the purpose of evaluating #if conditions.
Can this be relied upon, or do undefined constants give undefined behaviour?

Yes, it can be relied upon. The C99 standard specifies at §6.10.1 ¶3:
After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers are replaced with the pp-number 0
Edit
Sorry, I thought it was a C question; still, no big deal, the equivalent section in the C++ standard (§16.1 ¶4) states:
After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0
The only difference is the different handling of true and false, which in C do not need special handling, while in C++ they have a special meaning even in the preprocessing phase.

An identifier that is not defined as a macro is converted to 0 before the expression is evaluated.
The exception is the identifier true, which is converted to 1. This is specific to the C++ preprocessor; in C, this doesn't happen and you would need to include <stdbool.h> to use true this way, in which case it will be defined as a macro and no special handling is required.
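As a minimal sketch of the rule quoted above (C++; the macro name NEVER_DEFINED_ANYWHERE and the constant names are hypothetical, and the sketch assumes no header or command-line flag defines that macro), each #if below selects its branch because the identifier is replaced by 0:

```cpp
// Assumption: NEVER_DEFINED_ANYWHERE is not defined by any header or -D flag.

// The identifier is replaced by 0, so this comparison holds.
#if NEVER_DEFINED_ANYWHERE == 0
constexpr bool undefined_compares_equal_to_zero = true;
#else
constexpr bool undefined_compares_equal_to_zero = false;
#endif

// Used on its own, the same identifier makes the condition false.
#if NEVER_DEFINED_ANYWHERE
constexpr bool undefined_is_truthy = true;
#else
constexpr bool undefined_is_truthy = false;
#endif

// It also takes part in arithmetic as 0: 0 + 5 == 5.
#if NEVER_DEFINED_ANYWHERE + 5 == 5
constexpr bool undefined_adds_as_zero = true;
#else
constexpr bool undefined_adds_as_zero = false;
#endif
```

If the distinction between "undefined" and "defined as 0" matters, testing with defined(NAME) or #ifdef keeps the two cases apart.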

The OP was asking specifically about the C preprocessor and the first answer was correctly referring to the C preprocessor specification. But some of the other comments seem to blur the distinction between the C preprocessor and the C compiler. Just to be clear, those are two different things with separate rules and they are applied in two separate passes.
#if 0 == NAME_UNDEFINED
int foo = NAME_UNDEFINED;
#endif
In this example the preprocessor keeps the foo definition, because it evaluates NAME_UNDEFINED as 0 inside the #if conditional expression. The compiler then reports an error: the initializer is not part of a preprocessor conditional, so NAME_UNDEFINED reaches the compiler unchanged, as an undeclared identifier.
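One possible way to make both passes agree, sketched here as an assumption rather than something the answer proposes, is to give the name an explicit default before it is used. NAME_MAYBE_UNDEFINED is a hypothetical macro name:

```cpp
// Hypothetical macro name; the default is supplied only when nothing else defined it.
#ifndef NAME_MAYBE_UNDEFINED
#define NAME_MAYBE_UNDEFINED 0
#endif

// The preprocessor sees 0 == 0 here, and the compiler sees the literal 0 below,
// because the macro is expanded in ordinary code as well as in the #if.
#if 0 == NAME_MAYBE_UNDEFINED
int foo = NAME_MAYBE_UNDEFINED;
#endif
```

With the default in place, the name is a real macro, so it is expanded both in the #if and in the initializer.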

Related

Do 'true' and 'false' have their usual meaning in preprocessor conditionals?

Given a C++11 compiler, which #error is the correct one it should end up with?
// no #includes!
#define SOMEMACRO true
#if SOMEMACRO
#error "it was true"
#else
#error "it was false"
#endif
Obviously I'm using #error just as a test. I know true and false are defined in the language proper, but this is preprocessor context. In C99 it seems not to be recognised by the preprocessor.
I'm asking because it seems that all compilers I tried see it as 'true', while a static code analysis tool insists that true isn't defined, implicitly false and ends up in "it was false".
In all ISO C++ standards, both true and false are keyword constants, just like nullptr in C++11. So #if SOMEMACRO is equivalent to #if true, and the preprocessor takes the truthy branch.
In C before C23, however, neither true nor false is a keyword. They're macros defined to 1 and 0 respectively, as of C99 and with #include <stdbool.h>. This means that if you don't include stdbool.h, the compiler should complain about unrecognized identifiers for true, false etc. After including the header, #if SOMEMACRO is now #if 1, which is truthy in C.
For preprocessing, this quote from CppReference is meaningful:
Any identifier, which is not literal, non defined using #define directive, evaluates to 0.
So in your (probably C-oriented) static analysis tool, it sees true as a non-#define-defined identifier, and therefore evaluates true to zero. You're not going to observe this behavior if you use a C++ analysis tool.
In that case, you probably shouldn't have missed the #include <stdbool.h> in the first place, though.
According to [cpp.cond]/4 in the C++11 standard:
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. […] After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of [expr.const] using arithmetic that has at least the ranges specified in [support.limits]. […] Each subexpression with type bool is subjected to integral promotion before processing continues.
Emphasis mine; from the bolded passages it follows that bool-typed expressions are meant to be supported in preprocessor conditions just like in the language proper, including bool literals true and false. The [expr.const] section defining constant expressions is referred to from other sections that use it in non-preprocessing context, from which it follows that the evaluation rules are the same in the preprocessor and the language proper.
I’d assume similar language appears in all further revisions of the C++ standard, and probably in earlier ones too. In C, on the other hand, true and false are not keywords, but macros defined in stdbool.h, so the preprocessor treats them just like any other token.
The usual practice is to use 1 and 0 for logical values in preprocessor expressions for maximum portability, and preferably to avoid directly referring to them entirely.
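A small C++ sketch of both points, assuming a conforming C++11 preprocessor: SOMEMACRO comes from the question, while FEATURE_ENABLED and the constant names are hypothetical.

```cpp
// From the question: a macro that expands to the keyword true.
#define SOMEMACRO true

// In C++ the keyword true survives macro replacement and counts as 1.
#if SOMEMACRO
constexpr bool true_macro_selected_first_branch = true;
#else
constexpr bool true_macro_selected_first_branch = false;
#endif

// The keyword false likewise counts as 0.
#if false
constexpr bool false_selected_first_branch = true;
#else
constexpr bool false_selected_first_branch = false;
#endif

// The portable spelling: plain 1 and 0 mean the same thing to C and C++ tools.
#define FEATURE_ENABLED 1   // hypothetical configuration macro
#if FEATURE_ENABLED
constexpr bool feature_selected_first_branch = true;
#else
constexpr bool feature_selected_first_branch = false;
#endif
```

The 1/0 form avoids depending on whether a given tool treats true and false as keywords during preprocessing.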
As other answers already pointed out correctly, true and false should work there with C++ compilers.
OP here: it was indeed a configuration problem of the SCA tool. In Helix, the option -preproccppkeywords, described as "When enabled, the C++ alternative tokens are treated as keywords.", was responsible for this. With the option switched on, the tool behaves as expected: true and false are recognized during preprocessing.

Definition of an "expression" in the C and C++ standards

I'm asking this question because I'm updating my C and C++ course materials and I've had past students ask about it...
From ISO/IEC 9899:2017 section 6.5 Expressions ¶1 (and similar in the C++ standard):
"An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. …"
Because the standards writers obviously choose their words carefully, the use of the phrase "sequence of operators and operands" seems potentially misleading to me. It seems to indicate that to be considered an expression there must be more than one operator and also more than one operand. Thus, literals like 123 or variables like XYZ would not be considered expressions because there is no operator, and they certainly can't be considered operands if there is no operator.
However, if 123 and XYZ actually are expressions, wouldn't replacing the phrase "sequence of operators and operands" with "sequence of one or more characters" or something similar be more accurate?
Please tell me what I am misinterpreting about what the standard is stating.
and similar in the C++ standard
I don't know about the C standard, but the C++ standard puts this statement in a non-normative notation. It has no normative value to C++, so it should be read as colloquial.
You forgot about primary expressions, which have a separate definition (6.5.1).
You just confused different entities; the definition you provided describes exactly what it should describe.
6.5.1 Primary expressions
Syntax:
primary-expression:
identifier
constant
string-literal
(expression)
Yes, the definition of "expression" in the C standard is incomplete -- but not in a way that causes any actual problems (other than to picky people like me).
The word "expression" in the text you quoted is in italics, which means that that is the official definition of the term. It's clear from other parts of the standard that 123, for example, is an expression: it's a decimal-constant, which is an integer-constant, which is a constant, which is a primary-expression, which is a postfix-expression, which (skipping multiple steps) is an expression.
It is not "a sequence of operators and operands". There is no operator, which implies that 123 is not an operand (this can be demonstrated by referring to the definitions of operator and operand elsewhere in the standard).
In practice, I've never heard of anyone, either a compiler implementer or a C programmer, having any real difficulty because of this incomplete definition. Compiler implementers refer to the language grammar. C programmers probably get a pretty good idea of what an "expression" is before reading the standard.
I'd like to see the definition of expression updated in a new edition of the standard. A definition that refers to the grammar rather than attempting an English description would IMHO be an improvement.
But if it isn't updated, we'll all keep using expressions without any problems.
As for C++, Nicol Bolas's answer correctly points out that the C++ standard doesn't have a formal definition of "expression" like the C standard does. It does have similar wording at the top of Clause 8: "An expression is a sequence of operators and operands that specifies a computation." -- but the word "expression" is not in italics and that sentence is part of a "Note", and is therefore non-normative. In C++, the standard defines expressions syntactically.
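A short C++11 sketch of what the syntactic definition implies in practice: constructs whose grammar requires an expression accept a bare literal or a bare identifier, with no operator involved. The variable XYZ echoes the question; the constant names are hypothetical.

```cpp
#include <type_traits>

int XYZ = 0;

// decltype takes an expression; a lone literal or identifier is accepted.
constexpr bool literal_is_an_expression =
    std::is_same<decltype(123), int>::value;
constexpr bool identifier_is_an_expression =
    std::is_same<decltype(XYZ), int>::value;

// sizeof applied to an expression (no parentheses) also accepts a lone literal.
constexpr bool sizeof_accepts_literal = (sizeof 123 == sizeof(int));
```

None of the three operands contains an operator of its own, yet each one compiles where the grammar calls for an expression.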

Is there logical short-circuiting in the C preprocessor?

The gcc docs for cpp explain about the #if directive:
[...] and logical operations (&& and ||). The latter two obey the usual short-circuiting rules of standard C.
What does that mean? There is no evaluation of expressions during preprocessing, so how can it be short-circuited?
Very simple: undefined macros have numeric value zero, and division by zero is illegal.
#if FIXEDSIZE && CHUNKSIZE/FIXEDSIZE > 42
#define USE_CELLPOOL
#endif
#if does evaluate the rest of its line as an integer constant expression. Your linked documentation begins:
The ‘#if’ directive allows you to test the value of an arithmetic expression, rather than the mere existence of one macro.
That isn't a gcc extension, the Standard's syntax for #if is
# if constant-expression new-line group_opt
The C99 preprocessor treats all constants as [u]intmax_t.
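The division-by-zero case above can be sketched as follows. In the first #if, FIXEDSIZE and CHUNKSIZE are deliberately left undefined (this assumes no header or flag defines them), so the left operand is 0 and the division on the right is never evaluated. FIXEDSIZE_SET, CHUNKSIZE_SET and the constant names are hypothetical.

```cpp
// Assumption: FIXEDSIZE and CHUNKSIZE are not defined anywhere.
// 0 && (0 / 0 > 42): the right-hand side is skipped, so no division by zero occurs.
#if FIXEDSIZE && CHUNKSIZE / FIXEDSIZE > 42
constexpr bool use_cellpool_when_undefined = true;
#else
constexpr bool use_cellpool_when_undefined = false;
#endif

// Hypothetical defined values: 2 && (100 / 2 > 42), i.e. 50 > 42.
#define FIXEDSIZE_SET 2
#define CHUNKSIZE_SET 100
#if FIXEDSIZE_SET && CHUNKSIZE_SET / FIXEDSIZE_SET > 42
constexpr bool use_cellpool_when_defined = true;
#else
constexpr bool use_cellpool_when_defined = false;
#endif
```

Writing the guard as the left operand of && is what keeps the division out of the evaluated part of the expression.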
What they are referring to is && and || operators for #if
#if defined (AAA) || defined (BBB)
If defined (AAA) is true then defined (BBB) is never evaluated.
UPDATE
So the evaluation really is short-circuited. For example, when built with -Wundef, which warns about the use of undefined macros, the following code
#if defined FOO && FOO > 1000
#endif
#if FOO > 1000
#endif
will result in
thomas:~ jeffery$ gcc foo.c -Wundef
foo.c:4:5: warning: 'FOO' is not defined, evaluates to 0 [-Wundef]
#if FOO > 1000
^
1 warning generated.
So the first version does not generate the undefined macro warning, because FOO > 1000 is not evaluated.
OLD MUSINGS
This becomes important if the second operand is a macro that has side effects. The macro would not be evaluated, so the side effects would not take place.
To avoid macro abuse I'll give a somewhat sane example
#define FOO
#define IF_WARN(x) _Pragma (#x) 1
#if defined(FOO) || IF_WARN(GCC warning "FOO not defined")
#endif
Now that I constructed this example, I now run into a problem. IF_WARN is always evaluated.
huh, more research needed.
Well foo… now that I read it again.
Macros. All macros in the expression are expanded before actual computation of the expression's value begins.
There is no evaluation of expressions during preprocessing, so how can it be short-circuited?
Yes there is evaluation of expression during preprocessing.
C11: 6.10.1 Conditional inclusion (p4):
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become ...
In a footnote 166:
Because the controlling constant expression is evaluated during translation phase 4, all identifiers....
These statements clearly testify that there is evaluation of expression in preprocessing. The necessary condition is that the controlling expression must evaluate to an integer value.
Now the operator && and || will obey the usual short-circuiting rules of standard C as stated in GNU doc.
Now compile and run this program with and without the // in front of the second #define to see the short-circuit behavior:
#include <stdio.h>
#define macro1 1
//#define macro2 1
int main(void)
{
#if defined(macro1) && defined(macro2)
    printf("Hello!\n");
#endif
    printf("World\n");
    return 0;
}
Evaluating macro conditions is a part (a major part) of pre-processing, so it occurs and short-circuiting is meaningful there. You can see examples of the other answers.
A conditional is a directive that instructs the preprocessor to select whether or not to include a chunk of code in the final token stream passed to the compiler. Preprocessor conditionals can test arithmetic expressions, or whether a name is defined as a macro, or both simultaneously using the special defined operator.
Moreover, short-circuiting can reduce compile time: skipping the later evaluations can speed up compilation (depending on the compiler's implementation).

Is there a valid C++11 program with the expression 'C++11'?

The name of the programming language C++ derives from the parent language C and the ++ operator (it should arguably be ++C) and, hence, the expression C++ may naturally occur in C++ programs. I was wondering whether you can write a valid C++ program using the 2011 standard (without extensions) and containing the expression C++11 not within quotes and after pre-processing (note: edited the requirement, see also answer).
Obviously, if you could write a C++ program prior to the 2011 standard with the expressions C++98 or C++03, then the answer is a trivial yes. But I don't think that was possible (though I don't really know). So, can it be done with the new armory of C++11?
NO if we require the characters C++11 to be outside any literal, after preprocessing -- because at translation phase 7 the three tokens will be identifier, ++ and integer-literal.
The first two tokens form a postfix-expression; the last is a primary-expression.
There is no reduction in the grammar that can contain these two nonterminals in sequence, so any program containing C++11 will fail syntax analysis.
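For contrast, a sketch of what the tokenization looks like once an operator separates the two operands. This is not the expression C++11 itself, only a nearby spelling that does parse; the function name is hypothetical.

```cpp
// C+++11 is lexed greedily as the tokens: C  ++  +  11,
// i.e. (C++) + 11, which the grammar accepts as an additive-expression.
int increment_then_add() {
    int C = 0;
    int sum = C+++11;   // C++ yields 0 and leaves C == 1; 0 + 11 == 11
    return sum + C;     // 11 + 1
}
```

Removing the extra + leaves a postfix-expression directly followed by a literal, which is the sequence the answer says no grammar rule can accept.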
However, if you do not consider character literals to be strings, then the answer is YES as you can contain it in a wide character literal:
int main()
{
wchar_t x = L'C++11';
}
which does not use the preprocessor or a string literal, and the construct is required to be supported by the standard:
The value of a wide-character literal containing multiple c-chars is implementation-defined.
So, can it be done with the new armory of C++11?
No.
Define “valid C++ program”.
The C++ standard defines a “well-formed C++ program” as “a C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule”. This leaves open the possibility of C++ programs that are not well-formed. (C explicitly has the notion of programs that are conforming but not strictly conforming, e.g., that use extensions of a particular compiler.)
If you consider it valid to use extensions, then you can implement a C++ compiler that permits C++11 in some context.
