Difference in preprocessor output

I was experimenting with single-line macros that contain preprocessor directives.
#define REPLACE { \
#if EXT == 42 \
#warning "Got 42" \
#endif \
}
int main(void){
REPLACE;
return 0;
}
The preprocessor accepts this without complaint, yielding:
$g++ -E includetest.cpp
# 1 "includetest.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "includetest.cpp"
int main(void){
{ #if EXT == 42 #warning "Got 42" #endif };
return 0;
}
This is, of course, invalid code: only macro substitution takes place, and the #if lookalike in the expansion is not processed as a directive even though it looks like one.
Now, if I slightly alter the macro so that it takes a parameter:
#define REPLACE(a) { a + 2 ; \
#if EXT == 42 \
#warning "Got 42" \
#endif \
}
int main(void){
REPLACE(0);
return 0;
}
This yields the following preprocessor error:
$g++ -E includetest.cpp
# 1 "includetest.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "includetest.cpp"
includetest.cpp:1:18: error: '#' is not followed by a macro parameter
#define REPLACE(a) { a + 2 ; \
^
int main(void){
REPLACE(0);
return 0;
}
Why does this error come up? Of course this won't compile, but I want to know why adding a parameter results in a parsing error from the preprocessor.
People would say "You can't nest one directive inside another", but the directives are nested in the first case too, so why doesn't the preprocessor error out then? Or is that responsibility delegated to the compiler?
EDIT: I am not trying to achieve any functionality per se; this is just an exercise (in futility?) to understand the preprocessor.

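Only inside a function-like macro does # have a special meaning: it is the stringizing operator ([cpp.stringize]p1). The standard requires every # in the replacement list of a function-like macro to be followed by a parameter, which is not the case in your second example (if, warning and endif are not parameters).
Your first case is valid (the replacement list of an object-like macro may contain # tokens, even ones that look like directives) precisely because there # has no special meaning. The expansion is never re-examined as a directive, so the resulting invalid code is only rejected later, by the compiler proper.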
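A minimal C++ sketch of the two rules in the answer, for illustration only: an object-like macro may hold # tokens that merely look like directives, while in a function-like macro # must be followed by a parameter and turns it into a string literal. The names LOOKS_LIKE_DIRECTIVES, STRINGIZE, XSTRINGIZE, ext_as_written and ext_expanded are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Object-like macro: '#' is an ordinary token here, so this definition is
// accepted even though its body only looks like directives. It is never
// expanded below, because the expansion would not be valid C++.
#define LOOKS_LIKE_DIRECTIVES { #if EXT == 42 #endif }

// Function-like macro: '#' is the stringizing operator and must be
// followed by a parameter name.
#define STRINGIZE(param) #param
// Extra level of indirection so the argument is macro-expanded first.
#define XSTRINGIZE(param) STRINGIZE(param)

// Rejected like REPLACE(a) in the question, since 'if' is not a parameter:
//   #define BAD(a) { a #if }

#define EXT 42

inline const char* ext_as_written() { return STRINGIZE(EXT); }  // "EXT"
inline const char* ext_expanded() { return XSTRINGIZE(EXT); }   // "42"
```

Uncommenting BAD should reproduce the "'#' is not followed by a macro parameter" error from the question.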

Related

Fortran and cpp option: How to protect a comma?

An option is defined (value 1 or 2) to choose between two instructions, and I would like to use it with instructions that contain a comma.
#define option 1
#if option == 1
#define my_instr(instr1, instr2) instr1
#elif option == 2
#define my_instr(instr1, instr2) instr2
#endif
It works, but when an instruction contains a comma, I have a problem.
For example:
program main
my_instr(print *,"opt 1", print * ,"opt 2")
end program main
does not compile (gfortran -cpp): too many arguments. That is expected.
So, to escape the comma, I added parentheses: my_instr((print *,"opt 1"), (print * ,"opt 2"))
But then it no longer compiles because of the parentheses.
How can I solve that?

Using the answer at https://stackoverflow.com/a/46311121/7462275, I found a "solution":
#define option 2
#define unparen(...) __VA_ARGS__
#if option == 1
#define my_instr(instr1, instr2) unparen instr1
#elif option == 2
#define my_instr(instr1, instr2) unparen instr2
#endif
program main
my_instr((print *,"opt 1"), (print * ,"opt 2"))
end program main
But:
gfortran -cpp does not compile it (it has a problem with __VA_ARGS__), so cpp -P is run before gfortran.
__VA_ARGS__: it is not standard to use __VA_ARGS__ without something before it (cf. the comment by Matthew D. Scholefield on the answer used:
Just something to note, this still doesn't conform to the C standard because of the use of a variadic macro.
and the comments by KamilCuk on this question).
Even instructions without a comma need to be enclosed in parentheses.
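The same unparen trick can be sketched in C++, where variadic macros are part of C++11. This is a minimal illustration, not the answer's Fortran code: UNPAREN, OPTION, MY_INSTR and selected_values are hypothetical names, and OPTION is assumed to default to 2, as in the answer.

```cpp
#include <cassert>
#include <vector>

// UNPAREN strips one level of parentheses: "UNPAREN (a, b)" becomes "a, b".
#define UNPAREN(...) __VA_ARGS__

// Assumption: OPTION defaults to 2 unless given with -DOPTION=...
#ifndef OPTION
#define OPTION 2
#endif

#if OPTION == 1
#define MY_INSTR(instr1, instr2) UNPAREN instr1
#elif OPTION == 2
#define MY_INSTR(instr1, instr2) UNPAREN instr2
#else
#error "OPTION must be 1 or 2"
#endif

// Each alternative is wrapped in parentheses so that its commas do not
// split it into several macro arguments. After substitution, the body reads
// "UNPAREN (3, 4, 5)", and rescanning invokes UNPAREN on it.
inline std::vector<int> selected_values() {
    return { MY_INSTR((1, 2), (3, 4, 5)) };
}
```

With the default OPTION, selected_values() should return the three elements 3, 4 and 5.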

This will select the correct string.
#ifdef my_instr
#undef my_instr
#endif
#define my_instr(x) print *, x
#if option == 1
#define str "Opt 1"
#elif option == 2
#define str "Opt 2"
#endif
program foo
my_instr(str)
end program foo
% gfortran -E -Doption=2 a.F90 | cat -s
# 1 "a.F90"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "a.F90"
program foo
print *, "Opt 2"
end program foo
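A rough C++ analogue of this answer, for illustration: the comma-containing part (here a printf call standing in for print *,) lives in the macro body, and #if only selects the string, so no macro argument ever contains a top-level comma. OPTION, OPT_STR, MY_INSTR and selected are hypothetical names, and OPTION is assumed to default to 1.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption: OPTION defaults to 1 unless given on the command line
// (e.g. -DOPTION=2), mirroring -Doption=2 in the gfortran example.
#ifndef OPTION
#define OPTION 1
#endif

#if OPTION == 1
#define OPT_STR "Opt 1"
#elif OPTION == 2
#define OPT_STR "Opt 2"
#else
#error "OPTION must be 1 or 2"
#endif

// The comma belongs to the macro body, so the only argument passed in
// (the selected string) never contains a top-level comma.
#define MY_INSTR(x) std::printf("%s\n", x)

inline std::string selected() { return OPT_STR; }
```

MY_INSTR(OPT_STR) then prints the selected string and returns the number of characters written, as printf does.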

Accessing pointer to data member using macro results in "error: expected unqualified-id before ‘*’ token"

Minimal code reproducing the bigger problem:
struct S { int i; };
typedef int (S::*Type);
Type foo (int) { return &S::i; }
#define FOO(X) *foo(X)
int main ()
{
S s;
s.*foo(0) = 0; // ok
s.FOO(0) = 0; // error <--- ??
}
If the call to foo() is replaced with the FOO() macro (to avoid writing '*'), the result is the error in the title. When I check the preprocessed output with g++ -E, the "ok" and "error" lines look the same.
Why is this error with macro?

With clang 3.8, I got the following output for your program:
# 1 "test.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 325 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.cpp" 2
struct S { int i; };
typedef int (S::*Type);
Type foo (int) { return &S::i; }
int main ()
{
S s;
s.*foo(0) = 0;
s. *foo(0) = 0;
}
One can see the space in the line:
s. *foo(0) = 0;
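This space is the reason for the "expected unqualified-id before..." error. More precisely, the space is how clang's -E output keeps . and * as two separate tokens: the macro body supplies * as a token of its own, and the preprocessor never merges it with the preceding . into the single .* operator, so the compiler sees s. followed by *foo(0).
I do not know why g++ does not show the space. Possibly it's a bug in how g++ represents its preprocessed output.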
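One possible fix, sketched under the assumption that the object can be passed to the macro: spell .* as a single token inside the macro body. MEMBER_AT and demo are hypothetical names; S, Type and foo are taken from the question.

```cpp
#include <cassert>

struct S { int i; };
typedef int (S::*Type);
Type foo(int) { return &S::i; }

// The preprocessor works on tokens: "s.FOO(0)" yields the two tokens '.'
// and '*', never the single '.*' operator. Here '.*' is written as one
// token in the macro body, so the expansion is well-formed.
#define MEMBER_AT(obj, X) ((obj).*foo(X))

inline int demo() {
    S s{0};
    MEMBER_AT(s, 0) = 42;  // expands to ((s).*foo(0)) = 42
    return s.i;
}
```

Keeping the original one-argument form as #define FOO(X) .*foo(X), used as s FOO(0), should also work for the same reason, although it reads less naturally.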

Matching list of #define to a pattern

I have a header with a lot of #defines, like this:
#define XY_ABC_FOO 1
#define XY_ABC_BAR 3
#define XY_ABC_PIPPO 5
#define XY_ABC_PLUTO 7
#define XY_ABC_ETC 19
...
and so on and on.
I'd like to put all of those in a vector. I can do it by hand (in a few minutes).
std::vector<int> defs = { 1, 3, 5, 7, 19 , ... }
But then, the next time definitions are added to the header, someone will have to remember to add them to my code too.
Is there any very clever preprocessor/metaprogramming trick to catch them all at compile time?
I don't particularly care about fast compilation, it's test code so it will be compiled seldom and mostly overnight.

You could do it with awk:
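awk '/^#define XY_ABC_[A-Za-z0-9_]+ [0-9]+$/ {
if (line) {
line = line ", " $3
} else {
line = "std::vector<int> defs = { " $3
}
}
END { print line " };" }' < header.hpp > defs.hpp
Here $3 is the value field ($2 would be the macro name), and POSIX character classes replace \w and \d, which not every awk supports. Then, in your main program, #include <vector> followed by #include "defs.hpp" to get the declaration.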
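If awk is not available, the same extraction can be sketched in C++ with std::regex, for example inside a small generator program run before the build. extract_defs is a hypothetical helper; like the awk filter, it assumes each definition sits on one line in exactly the form "#define XY_ABC_NAME value".

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper mirroring the awk filter: collects the numeric value
// of every line of the form "#define XY_ABC_<NAME> <digits>".
std::vector<int> extract_defs(const std::string& header_text) {
    static const std::regex pattern(R"(^#define XY_ABC_\w+ (\d+)$)");
    std::vector<int> values;
    std::istringstream in(header_text);
    std::string line;
    std::smatch m;
    while (std::getline(in, line)) {
        if (std::regex_match(line, m, pattern)) {
            values.push_back(std::stoi(m[1].str()));
        }
    }
    return values;
}
```

Lines that do not match, such as other #defines, are skipped.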

You seem to want to achieve this using only #defines, but I'm almost completely sure that it's impossible.
But if you allow those values to be constexprs, then you can do the following. (And it's probably the best you can get without external tools.)
#include <vector>
#define DEFS \
C(XY_ABC_FOO, 1) \
C(XY_ABC_BAR, 3) \
C(XY_ABC_PIPPO, 5) \
C(XY_ABC_PLUTO, 7) \
C(XY_ABC_ETC, 19)
#define C(name, value) constexpr int name = value;
DEFS
#undef C
/* ... */
#define C(name, value) value,
std::vector<int> defs = {DEFS};
#undef C
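A self-contained sketch of the same X-macro idea, with one addition for illustration: a third expansion of the list that uses the # operator to collect the names as strings. def_names is a hypothetical name; the DEFS entries are the ones from the question.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Single source of truth: each entry is C(name, value).
#define DEFS \
    C(XY_ABC_FOO, 1) \
    C(XY_ABC_BAR, 3) \
    C(XY_ABC_PIPPO, 5) \
    C(XY_ABC_PLUTO, 7) \
    C(XY_ABC_ETC, 19)

// Expansion 1: named constants.
#define C(name, value) constexpr int name = value;
DEFS
#undef C

// Expansion 2: the values, in declaration order.
#define C(name, value) value,
const std::vector<int> defs = {DEFS};
#undef C

// Expansion 3: the names, stringized by the '#' operator.
#define C(name, value) #name,
const std::vector<const char*> def_names = {DEFS};
#undef C
```

Adding an entry to DEFS updates the constants, defs and def_names together.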

How to count how many #ifdef clauses have at least an #elif but no #else in a set of C files?

I have a bunch of C files and I need to count how many #ifdef clauses have an #elif clause but do not have an #else clause in those files, including possible nested #ifdef clauses. For instance, in the first code snippet there are no matches, while in the second code snippet, there are two matches:
1: No matches (the #ifdef contains an #else clause)
#ifdef A
...
#elif B
...
#else
...
#endif
2: Two matches (there are two #ifdef clauses with #elif clauses but without a corresponding #else)
#ifdef X1
...
#elif X2
...
#endif
...
#ifdef Y1
...
#elif Y2
...
#elif Y3
...
#endif
I'm looking for a way to do this using a command-line tool such as grep, awk or sed, but with no luck so far. I'm also open to easier alternatives, if any.
I have tried this regular expression using grep: '^(?=.*#elif)((?!#elif|#else).)(?=.*\#endif).)*$' (an #elif that is not followed by another #elif or #else and has a corresponding #endif), but it does not work, since the clauses are on different lines.

You need to write a recursive-descent parser that descends every time it finds a "#ifdef" and returns every time it finds "#endif". See How to compare and substitute strings in different lines in unix for an example of one written in awk.
You didn't provide useful sample input or expected output so I had to make up my own to test it (and so it might not be exactly what you need), but you will want something like:
$ cat tst.awk
function descend(cond, numElifs,numElses,gotEndif) {
while ( !gotEndif && (getline > 0) ) {
if ( /#ifdef/ ) { descend($2) }
else if ( /#elif/ ) { numElifs++ }
else if ( /#else/ ) { numElses++ }
else if ( /#endif/ ) { gotEndif++ }
}
print cond, numElses+0, numElifs+0, ((numElifs>0)&&(numElses==0) ? "UhOh" : "")
return
}
/#ifdef/ { descend($2) }
$ cat file
#ifdef A
#elif B
#else
#ifdef C
#elif D
#endif
#ifdef E
#elif F
#else
#endif
#ifdef G
#elif H
#ifdef I
#else
#endif
#elif J
#endif
#endif
$ awk -f tst.awk file
C 0 1 UhOh
E 1 1
I 1 0
G 0 2 UhOh
A 1 1
Note that this IS an appropriate use of getline but see http://awk.info/?tip/getline before using it elsewhere.
All the usual caveats about really needing a parser for the language (to handle e.g. #ifdef inside comments or strings) instead of a script like this apply.

Solution
Apart from assuming that #if, #ifdef, etc. don't appear in strings or comments, and that the code is written in a sane manner, i.e. nothing crazy like:
#i\
fdef
I made at least one more assumption: if and ifdef must immediately follow the #, although arbitrary tab or space characters may precede the # itself.
The regex below has been tested to work for PCRE and Perl flavors.
# Look-ahead to allow overlapping matches
(?=
(
# Just define patterns. Doesn't match anything.
(?(DEFINE)
(?<re>
# Match lines not ifdef, if, elif, else, endif macro
(?![ \t]* [#](?:if(?:def)?|elif|else|endif)) .*\R
|
# Recurse into another if or ifdef
(?1)
)
)
# Only match ifdef at top level, and allow if and ifdef nested
^[ \t]* [#](?(R)if(?:def)?|ifdef) .*\R
(?&re)*
# Match elif clause at least once at top level
(?(R)
|
(?:
[ \t]* [#]elif .*\R
(?&re)*
)
)
# Match 0 or more elif clauses
(?:
[ \t]* [#]elif .*\R
(?&re)*
)*
# Optional else clause nested
# else clause not allowed at top level
(?(R)
(?:
[ \t]* [#]else .*\R
(?&re)*
)?
)
# Match endif
[ \t]*[#]endif.*\R?+
)
)
Required flags: m (multiline, for the ^) and x (free-spacing syntax and comments).
The construct (?(R)...) is a conditional construct, which tests if we are currently inside any routine call. It is used to check the current nesting level of if/ifdef.
Technically, (?&re) which calls into the pattern defined in (?(DEFINE)...) counts as routine call, but except for (?1) which enters into another nested if/ifdef, the first alternation only operates on lines without if/ifdef, so it doesn't affect the final result.
Appendix
General purpose version
This is the regex for general case, without the restrictions on else and elif clause as required in the question. It is simpler, since we don't have to take care of the restrictions.
If you have a hard time digesting the regex above, this could be a good starting point.
(?=
(
(?(DEFINE)
(?<re>
(?![ \t]* [#](?:if(?:def)?|elif|else|endif)) .*\R
|
(?1)
)
)
^[ \t]* [#]if(?:def)? .*\R
(?&re)*
(?:
[ \t]* [#]elif .*\R
(?&re)*
)*
(?:
[ \t]* [#]else .*\R
(?&re)*
)?
[ \t]*[#]endif.*\R?+
)
)
Test case
#ifdef X1
#elif X2
#endif
#ifdef Y1
#define DEF
#if defined(X) && U == 0
#elif
#endif
#elif Y2
#ifdef Y1
#elif Y2
#else
#endif
#elif Y3
#endif
#ifdef X
#ifdef Y
#else
#endif
#ifdef K
#elif
#ifdef N1
#elif
#endif
#ifdef N2
#elif
#endif
#endif
#elif defined Z
#ifdef T
#elif
#endif
#endif
#ifdef Y
#ifdef E1
#endif
#ifdef E2
#elif
#endif
#endif
#ifdef Y
#elif
#endif

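If you just want to count them, this should work.
As far as I have tested, it also handles nesting, assuming every #endif closes an #ifdef (plain #if or #ifndef blocks would throw off the nesting counter x).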
awk '/#ifdef/{x++}
/#elif/&&a[x]!="q"{a[x]="s"}
/#else/{a[x]="q"}
/#endif/{total+=a[x]=="s";delete a[x];x--}
END{print total}' file
For Ed Morton's input file, this results in
2
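The counter approach of this last answer can also be sketched in C++ with an explicit stack of open blocks, one frame per #if/#ifdef/#ifndef, so that plain #if blocks keep the nesting right as well. directive_name, Frame and count_elif_without_else are hypothetical names; the sketch shares the caveats above (no directives inside comments or strings, no line continuations).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the directive name of a line such as "  #  elif X" ("elif"),
// or "" if the line is not a preprocessor directive.
std::string directive_name(const std::string& line) {
    std::size_t i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#') return "";
    i = line.find_first_not_of(" \t", i + 1);
    if (i == std::string::npos) return "";
    std::size_t j = i;
    while (j < line.size() && std::isalpha(static_cast<unsigned char>(line[j]))) ++j;
    return line.substr(i, j - i);
}

// One entry per currently open #if/#ifdef/#ifndef block.
struct Frame {
    bool is_ifdef;
    bool has_elif;
    bool has_else;
};

// Counts #ifdef blocks (nested ones included) that contain at least one
// #elif and no #else at their own nesting level.
std::size_t count_elif_without_else(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::vector<Frame> stack;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        const std::string d = directive_name(line);
        if (d == "if" || d == "ifdef" || d == "ifndef") {
            stack.push_back(Frame{d == "ifdef", false, false});
        } else if (stack.empty()) {
            continue;  // ignore unbalanced #elif/#else/#endif
        } else if (d == "elif") {
            stack.back().has_elif = true;
        } else if (d == "else") {
            stack.back().has_else = true;
        } else if (d == "endif") {
            const Frame top = stack.back();
            stack.pop_back();
            if (top.is_ifdef && top.has_elif && !top.has_else) ++count;
        }
    }
    return count;
}
```

On the two snippets from the question, it should return 0 and 2 respectively, and 2 on Ed Morton's sample file (blocks C and G).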

What is <built-in> in C++ preprocessor output?

Summary: C++ preprocessor output includes some lines that say <built-in>. I'm curious to know what these are for.
Details:
When I compile the following code in a file named test.cpp with clang++ -E (output from g++ is similar):
#include <iostream>
int main()
{
std::cout << "Hello World!" << std::endl;
return 0;
}
the first few lines of output are as follows:
# 1 "test.cpp"
# 1 "test.cpp" 1
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 156 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "test.cpp" 2
My question is what do the <built-in> statements mean.

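These lines are linemarkers of the form # linenum "filename" flags. <built-in> is a pseudo-file holding the macros the compiler defines by default, e.g. __cplusplus, and <command line> is a pseudo-file holding the macros defined on the command line, e.g. -DMACRO=1. The trailing flags mark entering a new file (1), returning to a file (2), and a system header (3).
You can list the predefined macros with: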
cpp -dM foo.h
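A small C++ illustration of both ideas, for illustration only: __cplusplus is one of the predefined (<built-in>) macros and needs no #define anywhere in the source, and a #line directive behaves much like the linemarkers shown above, changing what __LINE__ and __FILE__ report afterwards. The function names and "renamed.cpp" are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// __cplusplus is predefined by the compiler (one of the "<built-in>"
// macros); no #define for it appears in any source file.
inline long cplusplus_version() { return __cplusplus; }

// A #line directive works much like the linemarkers in -E output: it sets
// the line number and file name reported for the following lines.
#line 100 "renamed.cpp"
inline int line_after_directive() { return __LINE__; }
inline const char* file_after_directive() { return __FILE__; }
```

With GCC, g++ -dM -E -x c++ /dev/null prints the full list of predefined macros for C++ without needing an input file.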
