Extracting preprocessor symbols from source - c++

I'm looking for a way to extract all preprocessor symbols used in my code.
As an example, if my code looks like this:
#ifdef FOO
#endif
#if ( BAR == 1 && \
defined (Z) )
#endif
I'd like to get the list [FOO,BAR,Z] as the output.
I found some posts suggesting gcc -E -dM, but that lists every macro the preprocessor ends up defining (built-ins included), not just the ones the code refers to.
What I want, in contrast, is a list of all symbols actually used in the code.
Any suggestions?

That's quite simple. You just have to parse the source code exactly the way a conformant preprocessor would, with support for the right C or C++ language version. OK, I'm joking: if you only support the latest version, your code is likely to produce correct results on older ones too, but even that should be checked carefully.
More seriously now. Since you can ask the preprocessor to give you the list of all defined symbols, you can simply tokenize the source and pick out every token from that list that does not immediately follow a #define or #undef at the start of a directive. This part should be reasonably feasible with lex+yacc.
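For a quick-and-dirty illustration of that idea (my own sketch, not the lex+yacc approach), the little C++ program below just scans conditional directives and prints every identifier it finds there. It ignores comments, string literals and backslash line continuations (so it would miss Z in the question's example), and is only meant as a starting point:

// Collect identifiers appearing on #if/#ifdef/#ifndef/#elif lines.
#include <fstream>
#include <iostream>
#include <regex>
#include <set>
#include <string>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    std::set<std::string> used;
    std::regex directive(R"(^\s*#\s*(ifdef|ifndef|elif|if)\b(.*))");
    std::regex ident(R"([A-Za-z_]\w*)");
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (!std::regex_search(line, m, directive))
            continue;
        const std::string rest = m[2];
        for (std::sregex_iterator it(rest.begin(), rest.end(), ident), end; it != end; ++it)
            if (it->str() != "defined")        // 'defined' is an operator, not a macro
                used.insert(it->str());
    }
    for (const std::string& s : used)
        std::cout << s << '\n';                // one symbol per line
}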
The only alternative I can imagine would be to reuse the code of a real compiler (Clang should be easier to work with than gcc, but I'm not sure), discard all code generation, and record every macro usage.
TL;DR: however you approach it, it will be hard work; if you can do without it, stay away from it...

You can get halfway there by using a preprocessor library such as Boost.Wave. It can act as a lexer, so you wouldn't have to write that part yourself. You would still have to supply a grammar for the bits you care about (define, ifdef, ifndef, if, elif), though.

Related

How to disable inline assembly in GCC?

I'm developing an online judge system for programming contests. Since C/C++ inline assembly is not allowed in certain programming contests, I would like to add the same restriction to my system.
I would like to let GCC produce an error when compiling a C/C++ program containing inline assembly, so that any program containing inline assembly will be rejected. Is there a way to achieve that?
Note: disabling inline assembly is just for obeying the rules, not for security concerns.
Is there a way to disable inline assembler in GCC?
Yes, there are a couple of methods. None of them is useful for security; they are only guard-rails that could be worked around intentionally, but they will stop people from accidentally using asm in places they didn't realize they shouldn't.
Turn off the asm keyword in the compiler (C only)
To do it at the compilation stage, use the option -fno-asm. However, keep in mind that this only affects asm for C, not C++, and it affects neither __asm__ nor __asm in either language.
Documentation:
-fno-asm
Do not recognize "asm", "inline" or "typeof" as a keyword, so that code can use these words as identifiers. You can use the keywords "__asm__", "__inline__" and "__typeof__" instead. -ansi implies -fno-asm.
In C++, this switch only affects the "typeof" keyword, since "asm" and "inline" are standard keywords. You may want to use the -fno-gnu-keywords flag instead, which has the same effect. In C99 mode (-std=c99 or -std=gnu99), this switch only affects the "asm" and "typeof" keywords, since "inline" is a standard keyword in ISO C99.
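As a small illustration of what that means in practice (my example; the file name is made up):

/* demo.c
 * gcc -fno-asm -c demo.c   -> compiles: 'asm' is just an ordinary identifier here
 * gcc -c demo.c            -> error: 'asm' is a GNU keyword in the default gnu mode
 * A GNU-style  asm("nop");  statement likewise stops being recognized as inline
 * assembly once -fno-asm is given. */
int asm = 42;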
Define the keyword as a macro
You can use the options -Dasm=error -D__asm__=error -D__asm=error
Note that this construction is generic: it just creates macros, and it works pretty much like a #define. The documentation says:
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
...
So what it does is simply replace occurrences of asm, __asm, or __asm__ with error, and this happens during the preprocessing phase. You don't have to use error; just pick anything that will not compile.
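For example (my illustration, not part of the original answer), compiled with gcc -Dasm=error -D__asm__=error -D__asm=error, the following no longer builds, because the inline-assembly statement turns into a call to a function that exists nowhere:

int main(void)
{
    /* After preprocessing, the next line reads:  error("nop");
       'error' is never declared or defined, so compilation/linking fails. */
    asm("nop");
    return 0;
}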
Use a macro that fires during compilation
Another way to solve it at the compilation stage with a macro, as suggested in the comments by zwol, is to use -D'asm(...)=_Static_assert(0,"inline assembly not allowed")'. This also avoids trouble if an identifier called error happens to exist.
Note: This method requires -std=c11 or higher.
Using grep before using gcc
Yet another way that may be the solution to your problem is to just do a grep in the root of the source tree before compiling:
grep -nr "asm"
This will also catch __asm__, but it may give false positives, for instance if you have a string literal, identifier or comment containing the substring "asm". But in your case you could solve this problem by also forbidding any occurrence of that string anywhere in the source code. Just change the rules.
Possible unexpected problems
Note that disabling assembly can cause other problems. For instance, I could not use stdio.h with this option, since it is common for system headers to contain inline assembly code.
A way to cheat the above methods
Aside from the trivial #undef __asm__, it is possible to execute strings as machine code. See this answer for an example: https://stackoverflow.com/a/18477070/6699433
The core of the code from the link above, with the headers and main needed for it to compile:
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    /* our machine code: a tiny x86-64 function that returns 42 */
    char code[] = {0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x48,
                   0x89,0x75,0xf0,0xb8,0x2a,0x00,0x00,0x00,0xc9,0xc3,0x00};
    /* copy code to an executable buffer */
    void *buf = mmap(0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC,
                     MAP_PRIVATE|MAP_ANON, -1, 0);
    memcpy(buf, code, sizeof(code));
    /* run code */
    int i = ((int (*)(void))buf)();
    return i;
}
The code above is only intended to give a quick idea of how to trick the rules the OP has stated. It is not intended to be a good example of how to actually do it in reality. Furthermore, the code is not mine; it is just a short quote from the link I supplied. If you have ideas about how to improve it, please comment on 4pie0's original post instead.

C++ macro inside macro [duplicate]

I know that I am trying to shoot myself in the foot ;) However, it will allow me to make the rest (a large amount) of the code smaller and more readable.
Is there any tricky way to create a preprocessor macro inside another preprocessor macro?
Here is an example of what I am looking for. My real scenario is more complex:
// That's what I want to do and surely C++ doesn't like it.
#define MACROCREATOR(B) #define MACRO##B B+B
void foo()
{
MACROCREATOR(5) // This should create new macro (#define MACRO5 5+5)
int a = MACRO5; // this will use new macro
}
The C++ Standard says (16.3.4.3):
The resulting completely macro-replaced preprocessing token sequence [... of the macro expansion...] is not processed as a preprocessing directive even if it resembles one...
So no, there is no 'official' way of achieving what you want with macros.
No. Even if a macro expands into something that looks like a preprocessing directive, the expansion is not evaluated as a preprocessing directive.
As a supplement to the answers above, if you really wanted to pre-process a source file twice—which is almost definitely not what you actually want to do—you could always invoke your compiler like this:
g++ -E input.cpp | g++ -c -x c++ - -o output.o
That is, run the file through the preprocessor, then run the preprocessed output via pipe through a full compilation routine, including a second preprocessing step. In order for this to have a reasonably good chance of working, I'd imagine you'd have to be rather careful in how you defined and used your macros, and all in all it would most likely not be worth the trouble and increased build time.
If you really want macros, use standard macro-based solutions. If you really want compile-time metaprogramming, use templates.
On a slightly related note, this reminds me that the POV-Ray ray tracer's scene-description language made heavy use of a fairly complex preprocessing language, with flow-control directives such as #while that allowed conditional repetition, compile-time calculations, and other such goodies. Would that it were so in C++, but it simply isn't, so we just do it another way.
No. The pre-processor is single-pass. It doesn't re-evaluate the macro expansions.
As noted, one can #include a particular file more than once with different macro definitions active. This can make it practical to achieve some effects that could not be practically achieved via any other means.
As a simple example, on many embedded systems pointer indirection is very expensive compared to direct variable access. Code which uses a lot of pointer indirection may well be twice as large and slow as code which simply uses variables. Consequently, if a particular routine is used with two sets of variables, in a scenario where one would usually pass in a pointer to a structure and then use the arrow operator, it may be far more efficient to simply put the routine in its own file (I normally use the extension .i) which is #included once without the macro _PASS2 defined, and a second time with it. That file can then use #ifdef _PASS2 / #else to define macros for all the variables that should differ between the two passes, as sketched below. Even though the code gets generated twice, on some micros that will take less space than using the arrow operator with passed-in pointers.
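A minimal sketch of that pattern (all file and variable names below are made up for illustration):

/* step.i -- shared routine, #included twice */
#ifdef _PASS2
#define STEP_FUNC step_b
#define COUNT     count_b
#define LIMIT     limit_b
#else
#define STEP_FUNC step_a
#define COUNT     count_a
#define LIMIT     limit_a
#endif

static void STEP_FUNC(void)
{
    if (COUNT < LIMIT)      /* direct variable access, no pointer indirection */
        COUNT++;
}

#undef STEP_FUNC
#undef COUNT
#undef LIMIT

/* main.c */
static int count_a, limit_a = 10;
static int count_b, limit_b = 20;

#include "step.i"           /* first pass: generates step_a() over count_a/limit_a  */
#define _PASS2
#include "step.i"           /* second pass: generates step_b() over count_b/limit_b */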
Take a look at m4. It is similar to cpp, but recursive and much more powerful. I've used m4 to create a structured language for assemblers, e.g.
cmp r0, #0
if(eq)
mov r1, #0
else
add r1, #1
end
The "if", "else", and "end" are calls to m4 macros I wrote that generate jumps and labels, the rest is native assembly. In order to nest these if/else/end constructs, you need to do defines within a macro.

Is it wrong to add preprocessor directives in a function-like macro?

I know that my question is similar to this one or this one, but it is not really the same, and moreover the second one has no accepted answer, so I decided to ask: is it correct to add preprocessor directives where a function-like macro is called?
In my case I have a function-like macro:
#define FUNC_MACRO(a, b) // do something with the variables
and somewhere in the code I call it, with a small difference depending on whether some other macro is defined:
// ...
FUNC_MACRO(aVal
#ifdef ANOTHER_MACRO
+ offset
#endif // ANOTHER_MACRO
, bVal);
// ...
I tested on my machine (Linux, with gcc 4.8) and it worked fine (with and without the preprocessor directives, and with and without ANOTHER_MACRO defined), but is it safe to do so?
I read the 16.3/9 paragraph quoted in the answer to the first similar question, but does it hold for my case too?
The C language leaves this as undefined behavior in 6.10.3 Macro replacement, ¶11:
If there are sequences of preprocessing tokens within the list of arguments that would otherwise act as preprocessing directives, the behavior is undefined.
So indeed it's wrong to do it.
GCC and perhaps other popular compilers don't catch it, which is probably why many users of the language are not aware of it. I encountered this when some of my code failed to compile on PCC (and promptly fixed the bug in my code).
Update: PJTraill asked in the comments for a case where it would be "misleading or meaningless" to have preprocessor directives inside a macro expansion. Here's an obvious one:
foo(a, b,
#ifdef BAR
c);
#else
d);
#endif
I'm not sure whether it would have been plausible for the language to specify that balanced preprocessor conditionals inside the macro expansion are okay, but I think you'd run into problems there too with ambiguities in the order in which they should be processed.
Could you do the following instead?
#ifdef ANOTHER_MACRO
FUNC_MACRO(aVal + offset, bVal);
#else
FUNC_MACRO(aVal, bVal);
#endif
EDIT: Addressing a concern raised in a comment: I do not know whether the OP's method is specifically wrong (I think the other answers cover that). However, succinctness and clarity are two aspects held to be pretty important when coding in C.
As such, I would much prefer to find better ways to achieve what the OP seems to be attempting, by slightly rethinking the situation, such as the rearrangement I offered above (or the variation sketched below). I guess the OP may have used a trivialised example, but I usually find with most C situations that if something is becoming overly complex, or is attempting something the language does not seem to want to allow, then there are better ways to achieve what is needed.
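For example (a hypothetical rearrangement, with a made-up helper macro name), hoisting the conditional part out of the call keeps the directives away from the macro arguments entirely:

#ifdef ANOTHER_MACRO
#define FIRST_ARG (aVal + offset)
#else
#define FIRST_ARG (aVal)
#endif

FUNC_MACRO(FIRST_ARG, bVal);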

How are macros handled by preprocessor?

I am reading Efficient C++ (an older edition) and have some doubts.
Here, for example, it says:
When you do something like this
#define ASPECT_RATIO 1.653
the symbolic name ASPECT_RATIO may never be seen by the compiler; it may be removed by the preprocessor before the source code ever gets compiled. As a result, ASPECT_RATIO may never get entered into the symbol table. It can be confusing if you get an error during compilation involving the constant, because the error message may refer to 1.653 and not ASPECT_RATIO.
I don't understand this paragraph. How can anything be removed by the preprocessor, just like that? What could the reasons be, and how realistic is this in real-world code?
Thanks
I don't understand this paragraph in quotes. How can anything be removed by the preprocessor, just like that? What could the reasons be, and how realistic is this in real-world code?
Basically, what it describes is exactly how the C and C++ preprocessor works. The purpose is to replace macros/constants (those made using the #define directive) with their actual values, instead of repeating the same values over and over again. In C++ it is considered bad style to use C-style macros, but they're supported for C compatibility.
The preprocessor, as the name suggests, runs prior to the actual compilation and basically rewrites the source code as directed by the preprocessor directives (those starting with #). This includes replacing macros with their values, pulling in the header files named by #include directives, and so on.
This is used to avoid code repetition and magic numbers, to share interfaces (header files), and for many other useful things.
It's simply a global search-and-replace of "ASPECT_RATIO" with "1.653" in the file before passing it to the compiler.
That's why macros are so dangerous. If you have #define max 123 and a variable declaration int max = 100, the compiler will be given int 123 = 100 and you will get a confusing error message.
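As a tiny illustration:

#define max 123

int max = 100;   /* the compiler is handed  int 123 = 100;  and complains
                    about '123' without ever mentioning 'max' */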
The pre-processor will replace all instances of the token ASPECT_RATIO that appear in the code with the actual token 1.653 ... thus the compiler will never see the token ASPECT_RATIO. By the time it compiles the code, it only sees the literal token 1.653 that was substituted in by the pre-processor.
Basically the "problem" you will encounter with this approach is that ASPECT_RATIO will not be seen as a symbol by the compiler, thus in a debugger, etc., you can't query the value ASPECT_RATIO as-if it were a variable. It's not a value that will have a memory address like a static const int may have (I say "may", because an optimizing compiler may decide to act like the pre-processor, and optimize-out the need for an explicit memory address to store the constant value, instead simply substituting the literal value where-ever it appears in the code). In a larger function macro it also won't have an instruction address like actual C/C++ function will have, thus you can't set break-points inside a function macro. But in a more general sense I'm not sure I would call this a "problem" unless you were intending to use the macro as a debug-symbol, and/or set debugging break-points inside your macro. Otherwise the macro is doing its job.
