Short question:
Is it permitted to paste special characters such as + and - with the token-pasting operator ##? For example,
#define OP(var) operator##var
will OP(+) be expanded to operator+?
Exact problem:
#include "z3++.h"
#include <unordered_map>
namespace z3 {
z3::expr operator+(z3::expr const &, z3::expr const &);
}
typedef z3::expr (*MyOperatorTy)(z3::expr const &, z3::expr const &);
#define STR(var) #var
#define z3Op(var) static_cast<MyOperatorTy>(&z3::operator##var)
#define StrOpPair(var) \
{ STR(var), z3Op(var) }
void test() {
std::unordered_map<std::string, MyOperatorTy> strOpMap1{
{"+", static_cast<MyOperatorTy>(&z3::operator+)}}; // fine
std::unordered_map<std::string, MyOperatorTy> strOpMap2{StrOpPair(+)}; // error
}
For strOpMap2, using clang++ -c -std=c++11, it reports:
error: pasting formed 'operator+', an invalid preprocessing token
while using g++ -c -std=c++11, it gives:
error: pasting "operator" and "+" does not give a valid preprocessing token
From reading the GCC manual I understood that this kind of pasting should be possible, so why do both compilers emit errors?
You can paste punctuation to form other punctuation, e.g.
#define PASTE(a,b) a##b
int main()
{
int i = 0;
i PASTE(+,+);
// i == 1 now
}
The ## operator is for producing a valid preprocessing token from other preprocessing tokens. The result of pasting must be a valid preprocessing token. So this is not valid:
PASTE(i,++)
because i++ is not a preprocessing token; it's two adjacent tokens i and ++.
The list of possible tokens is (N3797):
header-name
identifier
pp-number
character-literal
user-defined-character-literal
string-literal
user-defined-string-literal
preprocessing-op-or-punc
each non-white-space character that cannot be one of the above
Note: at the preprocessing stage, keywords do not exist; after preprocessing, any identifier that spells a keyword is (semantically) converted into that keyword. So you can build keywords by pasting shorter words.
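As a small sketch of pastes that do form a single valid token (the names pasted_keyword_value, value_1, pasted_identifier_value and pasted_punctuator_value are made up for illustration):

```cpp
#define PASTE(a, b) a##b

// "in" ## "t" forms the identifier `int`, which is treated as a keyword
// once preprocessing is finished.
constexpr PASTE(in, t) pasted_keyword_value = 7;

// "value_" ## "1" forms the single identifier token `value_1`.
constexpr int value_1 = 41;
constexpr int pasted_identifier_value = PASTE(value_, 1) + 1;

// "<" ## "<" forms the single punctuator `<<`.
constexpr int pasted_punctuator_value = 1 PASTE(<, <) 4;
```

Each paste here yields exactly one preprocessing token (an identifier or a punctuator), which is why the compiler accepts it.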
In your code, operator+ is two tokens: operator and +. So you do not build it with ##; you just do one then the other.
#define OP(punc) operator punc
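Applied to the table from the question, this could look roughly like the sketch below. To keep it free of the z3 dependency it uses a hypothetical stand-in type demo::Num and a plain array instead of std::unordered_map; DEMO_OP, STR_OP_PAIR, NamedOp, kOps and find_op are illustrative names, not part of any library.

```cpp
#include <cstring>

// Hypothetical stand-in for z3::expr so the sketch has no external dependency.
namespace demo {
struct Num { int value; };
constexpr Num operator+(Num a, Num b) { return Num{a.value + b.value}; }
constexpr Num operator-(Num a, Num b) { return Num{a.value - b.value}; }
}

typedef demo::Num (*BinOp)(demo::Num, demo::Num);

#define STR(punc) #punc
// "operator" and the punctuator stay two separate tokens: no ## involved.
#define DEMO_OP(punc) static_cast<BinOp>(&demo::operator punc)
#define STR_OP_PAIR(punc) { STR(punc), DEMO_OP(punc) }

struct NamedOp { const char *name; BinOp fn; };

constexpr NamedOp kOps[] = { STR_OP_PAIR(+), STR_OP_PAIR(-) };

// Look up an operator by its spelling; returns nullptr if it is not in the table.
inline BinOp find_op(const char *name) {
    for (const NamedOp &op : kOps)
        if (std::strcmp(op.name, name) == 0) return op.fn;
    return nullptr;
}
```

With this, find_op("+") gives back the pointer to demo::operator+, and the same STR_OP_PAIR initializer shape should also work inside the braces of an unordered_map initializer.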
I built C parser from Lex/Flex & YACC/Bison grammars (1, 2) as:
$ flex c.l && yacc -d c.y && gcc lex.yy.c y.tab.c -o c
and then tested on this C code:
char* s = "xxx;
which is expected to produce missing terminating " character (or syntax error) diagnostics.
However, it doesn't:
$ ./c t1.c
char* s = xxx;
Why? How to fix it?
Note: The STRING_LITERAL is defined in lex specification as:
L?\"(\\.|[^\\"])*\" { count(); return(STRING_LITERAL); }
Here we see the [^\\"] part, which represents the "except the double-quote ", backslash \, or new-line character" (C11, 6.4.5 String literals, 1) and the \\. part, which (incorrectly?) represents the escape-sequence (C11, 6.4.4.4 Character constants, 1). -- end note
UPD: Fix: The STRING_LITERAL is defined in lex specification as:
L?\"(\\.|[^\\"\n])*\" { count(); return(STRING_LITERAL); }
The lexer you link has a rule:
. { /* Add code to complain about unmatched characters */ }
so when it sees an unmatched ", it will silently ignore it. If you add code here to complain about the character, you'll see that.
If you want a syntax error, you could have this action just return *yytext;
Note that your STRING_LITERAL pattern will match strings that contain embedded newlines, so if you have a mismatched " in a larger program with another string later, it will be recognized as one long string with embedded newlines. This will likely lead to poor error reporting, since the error would be reported after the bogus string rather than where it starts, making it hard for a user to debug.
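To make the effect of excluding the newline concrete, here is a hand-written sketch of what the fixed pattern \"(\\.|[^\\"\n])*\" accepts (ignoring the optional L prefix). It is not generated by flex; scan_string_literal is a hypothetical helper written only to illustrate the matching rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index one past the closing quote of the literal that starts at
// `start`, or std::string::npos if the literal is not terminated on its line.
inline std::size_t scan_string_literal(const std::string &src, std::size_t start) {
    assert(start < src.size() && src[start] == '"');  // must begin at an opening quote
    std::size_t i = start + 1;
    while (i < src.size()) {
        char c = src[i];
        if (c == '"') return i + 1;   // closing quote: the pattern matches
        if (c == '\n') break;         // [^\\"\n]: a newline ends the attempt
        if (c == '\\') {              // \\. : backslash plus one non-newline character
            if (i + 1 >= src.size() || src[i + 1] == '\n') break;
            i += 2;
            continue;
        }
        ++i;
    }
    return std::string::npos;
}
```

Because the scan gives up at the first newline, an unterminated literal can be reported at the line where it starts instead of being merged with a later string.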
The following code has been compiled with gcc-5.4.0 with no issues:
% gcc -W -Wall a.c
...
#include <stdio.h>
#include <stdarg.h>
static int debug_flag;
static void debug(const char *fmt, ...)
{
va_list ap;
va_start(ap, fmt);
vfprintf(stderr, fmt, ap);
va_end(ap);
}
#define DEBUG(...) \
do { \
if (debug_flag) { \
debug("DEBUG:"__VA_ARGS__); \
} \
} while(0)
int main(void)
{
int dummy = 10;
debug_flag = 1;
DEBUG("debug msg dummy=%d\n", dummy);
return 0;
}
However compiling this with g++ has interesting effects:
% g++ -W -Wall -std=c++11 a.c
a.c: In function ‘int main()’:
a.c:18:10: error: unable to find string literal operator ‘operator""__VA_ARGS__’ with ‘const char [8]’, ‘long unsigned int’ arguments
debug("DEBUG: "__VA_ARGS__); \
% g++ -W -Wall -std=c++0x
<same error>
% g++ -W -Wall -std=c++03
<no errors>
Changing debug("DEBUG:"__VA_ARGS__); to debug("DEBUG:" __VA_ARGS__); i.e. adding a space before __VA_ARGS__, makes it compile with all three -std= options.
What is the reason for such behaviour? Thanks.
Since C++11 there is support for user-defined literals, which are literals, including string literals, immediately (without whitespace) followed by an identifier. A user-defined literal is considered a single preprocessor token. See https://en.cppreference.com/w/cpp/language/user_literal for details on their purpose.
Therefore "DEBUG:"__VA_ARGS__ is a single preprocessor token and it has no special meaning in a macro definition. The correct behavior is to simply place it unchanged into the macro expansion, where it then fails to compile because no user-defined literal operator for a __VA_ARGS__ suffix was declared.
So GCC is correct to reject it as C++11 code.
This is one of the backwards-incompatible changes between C++03 and C++11 listed in the appendix of the C++11 standard draft N3337: https://timsong-cpp.github.io/cppwp/n3337/diff.cpp03.lex
Before C++11 the string literal (up to the closing ") would be its own preprocessor token and the following identifier a second preprocessor token, even without whitespace between them.
So GCC is also correct to accept it in C++03 mode. (-std=c++0x is the same as -std=c++11, C++0x was the placeholder name for C++11 when it was still in drafting)
It is also an incompatibility with C (in all revisions up to now) since C doesn't support user-defined literals either and considers the two parts of "DEBUG:"__VA_ARGS__ as two preprocessor tokens as well.
Therefore it is correct for GCC to accept it as C code as well (which is how the gcc command interprets .c files in contrast to g++ which treats them as C++).
To fix this add a whitespace between "DEBUG:" and __VA_ARGS__ as you suggested. That should make it compatible with all C and C++ revisions.
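A minimal sketch of the spaced form, assuming a hypothetical macro DEBUG_ARGS and helper print_debug that are not part of the original program:

```cpp
#include <cstdio>

// With the space, the literal and __VA_ARGS__ are two preprocessing tokens
// in C, C++03 and C++11 alike. Without it, C++11 would lex the literal and
// the identifier together as one user-defined-string-literal token.
#define DEBUG_ARGS(...) "DEBUG: " __VA_ARGS__

// Adjacent string literals are concatenated after macro expansion.
constexpr const char *kDebugFormat = DEBUG_ARGS("dummy=%d\n");

// Hypothetical helper: prints "DEBUG: dummy=<value>" to stderr.
inline int print_debug(int dummy) {
    return std::fprintf(stderr, DEBUG_ARGS("dummy=%d\n", dummy));
}
```

Here kDebugFormat ends up pointing at the single concatenated literal, which is the same concatenation the original DEBUG macro relies on.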
foo.cpp:
#define ID A
#if ID == A
#warning "hello, world"
#endif
Compilation with g++ -c foo.cpp works fine: (g++ v8.2.0)
foo.cpp:3:2: warning: #warning "hello, world" [-Wcpp]
#warning "hello, world"
^~~~~~~
Now, if I replace #define ID A with #define ID *, then I get:
foo.cpp:1:12: error: operator '*' has no left operand
#define ID *
^
foo.cpp:2:5: note: in expansion of macro ‘ID’
#if ID == A
^~
What is so special about *? Why does it fail in the #if expression?
There are two things of note in your post. The first is that it doesn't work the way you think. This will produce the warning too:
#define ID B
#if ID == A
#warning "hello, world"
#endif
The reason is that in the context of #if the preprocessing tokens ID and A are taken as macros and are expanded. Since A is not defined, it is "expanded" to 0. So is ID via the expansion ID -> B -> 0. So the condition is true here as well.
This also answers why * causes an error. It cannot be expanded further (on account of not being a valid identifier), and therefore you get the comparison * == 0, which is nonsense.
Since your title implies you seek to compare against a character constant, the way to do that would be to define ID to expand into the token sequence of a character constant.
#define ID 'A'
#if ID == 'A'
It should now work as expected. As will #define ID '*'
#if does not do what you think it does.
In your first example, it tries to evaluate 0 == 0, which is a valid expression with a value of true.
In your second example, it tries to evaluate * == 0, which is not a valid expression.
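Both effects can be observed in a short sketch. The constants kIdIsA and kUndefinedNamesCompareEqual, and the deliberately undefined names UNDEFINED_NAME_1 and UNDEFINED_NAME_2, are made up for illustration:

```cpp
#define ID 'A'

// Character constants are valid operands in an #if expression.
#if ID == 'A'
constexpr bool kIdIsA = true;
#else
constexpr bool kIdIsA = false;
#endif

// UNDEFINED_NAME_1 and UNDEFINED_NAME_2 are deliberately never defined.
// Each is replaced by 0 inside #if, so the condition reads 0 == 0.
#if UNDEFINED_NAME_1 == UNDEFINED_NAME_2
constexpr bool kUndefinedNamesCompareEqual = true;
#else
constexpr bool kUndefinedNamesCompareEqual = false;
#endif
```

Compiling with -Wundef should make GCC warn about the second comparison, which is a cheap way to catch this kind of accident.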
I try to embed a code block through the use of macro like this:
#define RUN_CODE_SNIPPET(c) do {\
c\
} while(0);
where 'c' is a code block enclosed inside '{ }'
Here is how to use it
#include <stdio.h>
#define RUN_CODE_SNIPPET(c) do {\
c\
} while(0);
int main(int argc, char *argv[]) {
RUN_CODE_SNIPPET({
//const char *message = "World";
const char message[] = {'w', 'o', 'r', 'l', 'd', '\0'};
printf("%s\r\n", message);
});
return 0;
}
You can run it here.
But I get compiler error when I use the initializer list format
test.c: In function ‘main’:
test.c:13:4: error: macro "RUN_CODE_SNIPPET" passed 6 arguments, but takes just 1
});
^
test.c:9:3: error: ‘RUN_CODE_SNIPPET’ undeclared (first use in this function)
RUN_CODE_SNIPPET({
^~~~~~~~~~~~~~~~
test.c:9:3: note: each undeclared identifier is reported only once for each
function it appears in
Seems the compiler is taking each element in the initializer list as the argument to the macro itself. The string initializer works fine.
What is wrong here?
The commas in what you pass inside the parentheses are interpreted as macro argument separators and the macro is expecting just one argument.
There are two ways around the problem:
parenthesize the commas-containing argument, i.e., pass (a,b,c) instead of a,b,c (not applicable in your case because your argument is not an expression)
use variadic macro arguments (... -> __VA_ARGS__)
In other words:
#define RUN_CODE_SNIPPET(...) do { __VA_ARGS__; }while(0)
will work (including the semicolon at the end of the macro is not advisable -- for a function-like macro, you should generally be able to do if(X) MACRO(something); else {} and the semicolon would mess that up).
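A sketch of the variadic version in use, including the if/else case mentioned above. The function message_length is a hypothetical wrapper added only so the result can be checked:

```cpp
#include <cassert>
#include <cstring>

// All comma-separated pieces of the block are collected in __VA_ARGS__,
// with the commas put back between them.
#define RUN_CODE_SNIPPET(...) do { __VA_ARGS__; } while (0)

inline std::size_t message_length(bool use_initializer_list) {
    std::size_t length = 0;
    if (use_initializer_list)
        RUN_CODE_SNIPPET({
            const char message[] = {'w', 'o', 'r', 'l', 'd', '\0'};
            length = std::strlen(message);
        });
    else
        RUN_CODE_SNIPPET(length = std::strlen("hello, world"));
    assert(length > 0);  // both branches assign a non-zero length
    return length;
}
```

Since the macro body carries no trailing semicolon, the unbraced if/else parses as intended.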
The following doesn't compile:
#define SUPPRESS(w) _Pragma("GCC diagnostic ignored " ## w)
SUPPRESS("-Wuseless-cast")
int main() {
int a = (int)4;
return a;
}
Here's the error:
error: pasting ""GCC diagnostic ignored "" and ""-Wuseless-cast"" does not give a valid preprocessing token
How can I get it to work?
The thing is that _Pragma wants to have an escaped string-literal like so
_Pragma("GCC diagnostic ignored \"-Wuseless-cast\"")
So the trick is to add another layer of stringification between the call of SUPPRESS and the call of _Pragma, like below
#define xSUPPRESS(w) _Pragma(#w)
#define SUPPRESS(w) xSUPPRESS(GCC diagnostic ignored w)
SUPPRESS("-Wuseless-cast")
See it here in action.
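To see which string the extra layer produces, the same two-step expansion can be written so that it keeps the string instead of handing it to _Pragma. PRAGMA_TEXT, PRAGMA_TEXT_IMPL and kPragmaText are hypothetical names used only for this sketch:

```cpp
// Mirrors xSUPPRESS/SUPPRESS, but yields the stringified text itself.
#define PRAGMA_TEXT_IMPL(w) #w
#define PRAGMA_TEXT(w) PRAGMA_TEXT_IMPL(GCC diagnostic ignored w)

// The # operator escapes the quotes of the embedded string literal, so this
// is the literal "GCC diagnostic ignored \"-Wuseless-cast\"".
constexpr const char *kPragmaText = PRAGMA_TEXT("-Wuseless-cast");
```

The result is one string literal with the inner quotes escaped, which is exactly the form _Pragma expects.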