Extra tokens at end of #ifdef directive

Why does the following code compile?
#ifdef C++11
// ...
#endif
int main() {}
gcc 4.8.0 gives me the following warning:
extra tokens at end of #ifdef directive
According to the standard, a macro name can contain only letters, digits and the underscore character.
Is that the reason?
ISO/IEC 14882:2011
16.1 Conditional inclusion [cpp.cond]
6 Each directive’s condition is checked in order. If it evaluates to
false (zero), the group that it controls is skipped: directives are
processed only through the name that determines the directive in order
to keep track of the level of nested conditionals; the rest of the
directives’ preprocessing tokens are ignored, as are the other
preprocessing tokens in the group. Only the first group whose control
condition evaluates to true (nonzero) is processed. If none of the
conditions evaluates to true, and there is a #else directive, the
group controlled by the #else is processed; lacking a #else directive,
all the groups until the #endif are skipped.
I can't understand this quote correctly.

As far as C++ is concerned, #ifdef C++11 is a syntax error. There is no rule saying a compiler has to reject a program with a syntax error.
1.4 Implementation compliance [intro.compliance]
The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that "no diagnostic is required" or which are described as resulting in "undefined behavior."
[...]
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as "conditionally-supported" when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
A warning is a diagnostic message. The compilers are perfectly within their rights to continue to successfully compile the program, as long as they ensure they show you that one diagnostic message. Since compilers have historically accepted such directives, and accepting such directives does not conflict with the requirements of the standard, they continue to do so.
At least as far as GCC is concerned, you can ask to make all standard-required diagnostics a hard error with the -pedantic-errors option.
$ printf "#ifdef C++11\n#endif\n" | gcc -std=c++11 -pedantic-errors -E -x c++ -
# 1 "<stdin>"
# 1 "<command-line>"
# 1 "<stdin>"
<stdin>:1:9: error: extra tokens at end of #ifdef directive

A #ifdef is defined as follows (taken from §16.1):
# ifdef identifier new-line
With regexp-like notation, an identifier is: [a-zA-Z_][a-zA-Z_0-9]* (*)
The point is: the macro you test is NOT C++11. It is in fact C (see this live example). The only thing allowed after the identifier (which is C) is a new-line, so the ++11 part is a syntax error. But as said in hvd's answer, from §1.4, a syntax error only forces a diagnostic message, here the warning; after issuing it, the preprocessor ignores the ++11 part. The only reason I see for a warning instead of an error is compatibility with old code, where such names could have been used.
Also: the quote explains how #ifdef / #elif / #else / #endif work together, not the way conditions are specified.
I do not have a copy of the standard. I used draft n3485 for this answer.
(*) It is possible to have implementation-defined characters in an identifier, but that does not affect your question. Note that variables, class names, macros, ... all follow the same identifier rules.
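To test for C++11 itself, a common approach is to compare the predefined __cplusplus macro against 201103L. The following is a minimal sketch, not taken from the answers above; HAVE_CXX11 and macro_c_defined are names made up for this illustration. It also shows what #ifdef C++11 amounts to once the extra tokens are dropped, namely #ifdef C.

```cpp
// HAVE_CXX11 is a hypothetical name used only in this sketch.
#if defined(__cplusplus) && __cplusplus >= 201103L
#define HAVE_CXX11 1
#else
#define HAVE_CXX11 0
#endif

// What "#ifdef C++11" effectively tests after the warning: the macro named C.
#ifdef C
const bool macro_c_defined = true;
#else
const bool macro_c_defined = false;
#endif
```

Some compilers, such as MSVC without the /Zc:__cplusplus option, report an older __cplusplus value, so the comparison may need adjusting there.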

Related

Preprocessor pragma precedence with #if

I'm using the IAR Embedded Workbench compiler and have an issue with precedence of #pragma with #if.
I'm using #if 0 during development to comment out code.
The #pragma in the code below is to suppress MISRA issues about using hex escape sequences in a C-String.
The code fragment:
#if 0
// Display the glyphs in the font.
// This is debug code.
#pragma diag_suppress=Pm118,Pm003
static const char * text[] =
{
// Limit rows to 20 characters.
" ()+-0123456789Vanot",
"y^?",
"\xE3\x81\x84" "\xE3\x81\x8A" "\xE3\x81\x8C",
};
#pragma diag_default=Pm118,Pm003
static const size_t quantity_text_lines =
sizeof (text) / sizeof(text[0]);
uint16_t y = 150U;
for (unsigned int i = 0u; i < quantity_text_lines; ++i)
{
cmd_text_convert(35U, y,
FONT_HEADING_HANDLE,
0u,
text[i]);
y += 60u;
}
#else
In the above code fragment, I'm getting errors:
Error[Pm118]: hexadecimal escape sequences shall not be used (MISRA C 2004 rule 4.1)
My understanding is that I should not be getting any warnings because the code is inside a #if 0 block.
FYI, when I change #if 0 to #if 1, there are no errors and no warnings generated.
Is the IAR compiler behaving correctly?
What's the ruling regarding the language standard(s) about this?
(The environment is actually C language, but can be compiled in C++ too.)

The standard says that the #pragmas should be ignored, which is consistent with what you see. (The error message was not suppressed.)
6.10.1p6: … Each directive’s condition is checked in order. If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals; the rest of the directives’ preprocessing tokens are ignored, as are the other preprocessing tokens in the group.
(So the only part of #pragma inside #if 0 that is examined is the token pragma, and it is only examined to see if it affects conditional nesting.)
The question is whether the compiler can raise a tokenisation error for a preprocessing token. I'd say that it shouldn't, but there is nothing in the standard which prevents a compiler from generating diagnostics whenever it feels like it. That's a Quality of Implementation (QOI) issue, which is subject to the whim of the consumer, not the standards committee.
In any event, MISRA is not part of the standard.
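The skipping rule quoted above can be observed with standard directives alone. This is a minimal sketch; SKIPPED_MACRO, skipped_macro_defined and the pragma name are made up for the illustration, and it says nothing about how IAR's MISRA checker is implemented.

```cpp
#if 0
// Everything in this group is skipped: none of these directives takes effect.
#define SKIPPED_MACRO 1
#error "never issued, because the group is skipped"
#pragma hypothetical_pragma_name
#endif

// SKIPPED_MACRO was never defined, so the #else branch is selected.
#ifdef SKIPPED_MACRO
constexpr bool skipped_macro_defined = true;
#else
constexpr bool skipped_macro_defined = false;
#endif
```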

Is the IAR compiler behaving correctly?
Pragmas or no pragmas, IAR is behaving surprisingly by performing MISRA conformance analysis on the contents of an #if 0 block at all, if indeed that's what it is doing. That seems inconsistent with C preprocessor semantics. Program text -- or more properly, preprocessing tokens -- following a conditional directive whose condition evaluates to zero is processed only enough to recognize the matching #else or #endif, and otherwise is ignored.
Possibly IAR's MISRA scanner is designed to ignore preprocessor conditionals, or to test both alternatives of each one. If it is indeed doing that then it is behaving surprisingly by not recognizing pragmas within those conditional blocks that it would recognize outside them.
I suspect that @rici has it right in supposing that the issue is flagged by IAR's source tokenizer, and then not suppressed on account of the relevant #pragmas being ignored (as directed by the language specification). But this is within IAR's control. I would find it hard to accept an argument that IAR is free to implement MISRA compliance scanning, but not free to extend preprocessor behavior to properly support the associated pragma-based controls. Or for that matter, that it can't have preprocessor conditionals control MISRA conformance analysis directly where appropriate.
However,
What's the ruling regarding the language standard(s) about this?
The C language specification does not define any specific significance for the pragmas in question, and it does not define any procedure or semantics for MISRA compliance checking. It specifies that diagnostics must be issued for constraint violations, but does not limit the diagnostics that implementations may issue. This is why above I describe IAR's behavior as surprising rather than wrong. I might go so far as to call it buggy, but not on account of failure to conform to the language specification.

Do 'true' and 'false' have their usual meaning in preprocessor conditionals?

Given a C++11 compiler, which #error is the correct one it should end up with?
// no #includes!
#define SOMEMACRO true
#if SOMEMACRO
#error "it was true"
#else
#error "it was false"
#endif
Godbolt demo
Obviously I'm using #error just as a test. I know true and false are defined in the language proper, but this is preprocessor context. In C99 it seems not to be recognised by the preprocessor.
I'm asking because all the compilers I tried treat it as true, while a static code analysis tool insists that true isn't defined, treats it as implicitly false, and ends up in "it was false".

In all ISO C++ standards, both true and false are keyword constants, just like nullptr in C++11. So #if SOMEMACRO becomes #if true and the preprocessor will go to the truthy branch.
In C before C23, however, neither true nor false is a keyword. They're macros defined to 1 and 0 respectively, as of C99 and with #include <stdbool.h>. This does mean, however, that if you don't include stdbool.h, the compiler should complain about unrecognized identifiers for true, false etc. After including the header, #if SOMEMACRO is now #if 1, which is truthy in C.
For preprocessing, this quote from CppReference is meaningful:
Any identifier, which is not literal, non defined using #define directive, evaluates to 0.
So in your (probably C-oriented) static analysis tool, it sees true as a non-#define-defined identifier, and therefore evaluates true to zero. You're not going to observe this behavior if you use a C++ analysis tool.
In that case, you probably shouldn't have missed the #include <stdbool.h> in the first place, though.

According to [cpp.cond]/4 in the C++11 standard:
Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. […] After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens comprise the controlling constant expression which is evaluated according to the rules of [expr.const] using arithmetic that has at least the ranges specified in [support.limits]. […] Each subexpression with type bool is subjected to integral promotion before processing continues.
Emphasis mine; from the bolded passages it follows that bool-typed expressions are meant to be supported in preprocessor conditions just like in the language proper, including bool literals true and false. The [expr.const] section defining constant expressions is referred to from other sections that use it in non-preprocessing context, from which it follows that the evaluation rules are the same in the preprocessor and the language proper.
I’d assume similar language appears in all further revisions of the C++ standard, and probably in earlier ones too. In C, on the other hand, true and false are not keywords, but macros defined in stdbool.h, so the preprocessor treats them just like any other token.
The usual practice is to use 1 and 0 for logical values in preprocessor expressions for maximum portability, and preferably to avoid directly referring to them entirely.
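A minimal sketch of the two styles discussed above; FEATURE_ENABLED, USES_TRUE and the two constants are hypothetical names. The first condition evaluates the same way in C and C++, while the second relies on the C++ rule that keeps true as a boolean literal during preprocessing.

```cpp
// This condition is portable: it uses a plain integer value.
#define FEATURE_ENABLED 1
#if FEATURE_ENABLED
constexpr int feature_branch = 1;
#else
constexpr int feature_branch = 0;
#endif

// This condition depends on C++: true is not replaced with 0.
#define USES_TRUE true
#if USES_TRUE
constexpr int true_branch = 1;
#else
constexpr int true_branch = 0;
#endif
```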

As other answers already pointed out correctly, true and false should work there with C++ compilers.
OP here: it was indeed a configuration problem of the SCA tool. In Helix, the option -preproccppkeywords, which says "When enabled, the C++ alternative tokens are treated as keywords." was responsible for this. When it is switched on, the tool behaves as expected: true and false are recognized during preprocessing.

C++ directive spelling error [duplicate]

My preprocessor appears to assume that undefined constants are 0 for the purpose of evaluating #if conditions.
Can this be relied upon, or do undefined constants give undefined behaviour?

Yes, it can be relied upon. The C99 standard specifies at §6.10.1 ¶3:
After all replacements due to macro expansion and the defined unary
operator have been performed, all remaining identifiers are replaced with the pp-number
0
Edit
Sorry, I thought it was a C question; still, no big deal, the equivalent section in the C++ standard (§16.1 ¶4) states:
After all replacements due to macro expansion and the defined unary operator
have been performed, all remaining identifiers and keywords, except for true and false, are replaced with the pp-number 0
The only difference is the different handling of true and false, which in C do not need special handling, while in C++ they have a special meaning even in the preprocessing phase.

An identifier that is not defined as a macro is converted to 0 before the expression is evaluated.
The exception is the identifier true, which is converted to 1. This is specific to the C++ preprocessor; in C, this doesn't happen and you would need to include <stdbool.h> to use true this way, in which case it will be defined as a macro and no special handling is required.

The OP was asking specifically about the C preprocessor and the first answer was correctly referring to the C preprocessor specification. But some of the other comments seem to blur the distinction between the C preprocessor and the C compiler. Just to be clear, those are two different things with separate rules and they are applied in two separate passes.
#if 0 == NAME_UNDEFINED
int foo = NAME_UNDEFINED;
#endif
This example will successfully output the foo definition because the C preprocessor evaluates NAME_UNDEFINED to 0 as part of a conditional expression. A compiler error is then generated, because the initializer is not part of a preprocessor conditional expression, so the C compiler sees NAME_UNDEFINED as an undefined symbol.
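A minimal sketch of the replacement rule, assuming NAME_UNDEFINED is not defined anywhere (for example on the command line); the constant names are made up for the illustration. GCC's -Wundef option can be used to get a warning whenever an undefined identifier is evaluated in #if.

```cpp
// NAME_UNDEFINED is not a macro, so the preprocessor replaces it with 0 here.
#if NAME_UNDEFINED == 0
constexpr bool treated_as_zero = true;
#else
constexpr bool treated_as_zero = false;
#endif

// defined() distinguishes "not defined" from "defined as 0".
#if defined(NAME_UNDEFINED)
constexpr bool name_is_defined = true;
#else
constexpr bool name_is_defined = false;
#endif
```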

C++ Preprocessor Standard Behaviour

I'm studying the C++ standard on the exact behaviour of the preprocessor (I need to implement some sort of C++ preprocessor). From what I understand, the example I made up (to aid my understanding) below should be valid:
#define dds(x) f(x,
#define f(a,b) a+b
dds(eoe)
su)
I expect the first function-like macro invocation dds(eoe) to be replaced by f(eoe, (note the comma within the replacement string), which is then treated as f(eoe,su) when the input is rescanned.
But a test with VC++2010 gave me this (I told the VC++ to output the preprocessed file):
eoe+et_leoe+et_l
su)
This is counter-intuitive and is obviously incorrect. Is it a bug in VC++2010 or my misunderstanding of the C++ standard? In particular, is it incorrect to put a comma at the end of the replacement string like I did? My understanding of the C++ standard grammar is that any preprocessing-tokens are allowed there.
EDIT:
I don't have GCC or other versions of VC++. Could someone help me verify this with those compilers?

My answer is valid for the C preprocessor, but according to Is a C++ preprocessor identical to a C preprocessor?, the differences are not relevant for this case.
From C, A Reference Manual, 5th edition:
When a functionlike macro call is encountered, the entire macro call is
replaced, after parameter processing, by a copy of the body. Parameter
processing proceeds as follows. Actual argument token strings are
associated with the corresponding formal parameter names. A copy of
the body is then made in which every occurrence of a formal parameter
name is replaced by a copy of the actual parameter token sequence
associated with it. This copy of the body then replaces the macro
call.
[...] Once a macro call has been expanded, the scan for macro calls
resumes at the beginning of the expansion so that names of macros may
be recognized within the expansion for the purpose of further macro
replacement.
Note the words within the expansion. That's what makes your example invalid. Now, combine it with this: UPDATE: read comments below.
[...] The macro is invoked by writing its name, a left parenthesis,
then one actual argument token sequence for each formal parameter,
then a right parenthesis. The actual argument token sequences are
separated by commas.
Basically, it all boils down to whether the preprocessor will rescan for further macro invocations only within the previous expansion, or if it will keep reading tokens that show up even after the expansion.
This may be hard to think about, but I believe that what should happen with your example is that the macro name f is recognized during rescanning, and since subsequent token processing reveals a macro invocation for f(), your example is correct and should output what you expect. GCC and clang give the correct output, and according to this reasoning, this would also be valid (and yield equivalent outputs):
#define dds f
#define f(a,b) a+b
dds(eoe,su)
And indeed, the preprocessing output is the same in both examples. As for the output you get with VC++, I'd say you found a bug.
This is consistent with C99 section 6.10.3.4, as well as C++ standard section 16.3.4, Rescanning and further replacement:
After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for more macro names to replace.
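The question's macros can be turned into a small compile-time check. This is a sketch that assumes a preprocessor which rescans as described in the quoted section (GCC and clang, according to the answer above); the variables eoe, su and result are added for illustration.

```cpp
#define dds(x) f(x,
#define f(a, b) a + b

constexpr int eoe = 2;
constexpr int su = 3;

// dds(eoe) expands to "f(eoe," and the rescan picks up "su)" from the
// following line, so the initializer becomes eoe + su.
constexpr int result = dds(eoe)
su);

// Remove the short macro names so they cannot affect later code.
#undef dds
#undef f
```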

To the best of my understanding there is nothing in the [cpp.subst/rescan] portions of the standard that makes what you do illegal, and clang and gcc are right in expanding it as eoe+su, and the MSC (Visual C++) behaviour has to be reported as a bug.
I failed to make it work but I managed to find an ugly MSC workaround for you, using variadics - you may find it helpful, or you may not, but in any event it is:
#define f(a,b) (a+b
#define dds(...) f(__VA_ARGS__)
It is expanded as:
(eoe+
su)
Of course, this won't work with gcc and clang.

Well, the problem I see is that the preprocessor does the following:
dds(x) becomes f(x,
However, f is defined as well (even though it's defined as f(a,b)), so f(x, expands to x + garbage.
So dds(x) finally transforms into x + garbage (because you defined f(something, ).
Your dds(eoe) actually expands into a+b where a is eoe and b is et_l.
And it does that twice for whatever reason :).
The scenario you made is compiler-specific; it depends on how the preprocessor chooses to handle macro expansion.

Is it possible to disable GCC warning about missing underscore in user defined literal?

#include <cstddef>
#include <iostream>
void operator"" test( const char* str, std::size_t sz )
{
std::cout<<str<<" world";
}
int main()
{
"hello"test;
return 0;
}
In GCC 4.7, this generates "warning: literal operator suffixes not preceded by '_' are reserved for future standardization [enabled by default]"
I understand why this warning is generated, but GCC says "enabled by default".
Is it possible to disable this warning without just disabling all warnings via the -w flag?

After reading several comments to this question, I reviewed the C++ 11 Standard (non-final draft N3337).
When I said "I understand why this warning is generated" I was mistaken.
I assumed that an underscore was not technically required by the standard, but just a recommendation (hence the warning rather than an error).
But as Nicol Bolas has brought up, the standard uses the following language when speaking about user defined literals:
"Literal suffix identifiers that do not start with an underscore are reserved for future standardization." usrlit.suffix
"Some literal suffix identifiers are reserved for future standardization; see [usrlit.suffix]. A declaration whose literal-operator-id uses such a literal suffix identifier is ill-formed, no diagnostic required." over.literal
This is similar to the language used for reserved identifiers and the "alternative representations" such as "and", "or", "not". I think this makes it pretty clear that this shouldn't actually be a warning in the first place, but an error.
This may not be the direct answer to the question of "is it possible to disable", but it is answer enough for me.

For what it is worth, -Wno-literal-suffix silences this warning since gcc-7 (see here live on godbolt), i.e. this option also turns off warnings for user-defined literal operators without a leading underscore:
-Wliteral-suffix (C++ and Objective-C++ only)
...
Additionally, warn when a user-defined literal operator is declared with a literal suffix identifier that doesn’t
begin with an underscore. Literal suffix identifiers that don’t begin
with an underscore are reserved for future standardization.
However, one should stick to the advice in @cmeub's answer and avoid using literal suffix identifiers without an underscore, as they lead to ill-formed programs.
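For comparison, a minimal sketch of a literal operator whose suffix starts with an underscore, which is the form left available to user code; the suffix _len and the constant hello_length are made-up names for this illustration.

```cpp
#include <cstddef>

// User-defined string literal that returns the length of the literal.
// The leading underscore keeps the suffix out of the reserved set.
constexpr std::size_t operator""_len(const char*, std::size_t n) {
    return n;
}

constexpr std::size_t hello_length = "hello"_len;
```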
