How are macros handled by the preprocessor? - c++

I am reading Effective C++ (an older edition) and have some doubts.
Here, for example, it says:
When you do something like this
#define ASPECT_RATIO 1.653
the symbolic name ASPECT_RATIO may never be seen by compilers; it may be removed by the preprocessor before the source code ever gets to a compiler. As a result, the name ASPECT_RATIO may never get entered into the symbol table. This can be confusing if you get an error during compilation involving the use of the constant, because the error message may refer to 1.653, not ASPECT_RATIO.
I don't understand this paragraph. How can anything be removed by the preprocessor, just like that? What could the reasons be, and how does this play out in the real world?
Thanks

I don't understand this paragraph. How can anything be removed by the preprocessor, just like that? What could the reasons be, and how does this play out in the real world?
Basically, what it describes is exactly how the C and C++ preprocessor works. The reason is to replace macros/constants (those made using the #define directive) with their actual values, instead of repeating the same values over and over again. In C++ it is considered bad style to use C-style macros, but they are supported for C compatibility.
The preprocessor, as the name suggests, runs prior to the actual compilation and basically transforms the source code as directed by the preprocessor directives (those starting with #). This includes replacing macros with their values, including header files named by the #include directive, and so on.
This is used in order to avoid code repetitions, magic numbers, to share interfaces (header files) and many other useful things.
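You can watch this happen, since the preprocessed output can be dumped directly. A minimal sketch (file name hypothetical):

// ratio.cpp
#define ASPECT_RATIO 1.653
double d = ASPECT_RATIO;

Running g++ -E ratio.cpp prints the translation unit after preprocessing: the #define line is gone and the declaration reads double d = 1.653; which is all the compiler proper ever sees.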

It's simply a global search and replace of "ASPECT_RATIO" with "1.653" in the file before passing it to the compiler.
That's why macros are so dangerous. If you have #define max 123 and a variable int max = 100, the compiler will see int 123 = 100 and you will get a confusing error message.
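A sketch of that failure mode:

#define max 123
int max = 100;   // after substitution the compiler is handed: int 123 = 100;

The resulting diagnostic complains about the token 123, which appears nowhere in the source as you wrote it.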

The pre-processor will replace all instances of the token ASPECT_RATIO that appear in the code with the actual token 1.653 ... thus the compiler will never see the token ASPECT_RATIO. By the time it compiles the code, it only sees the literal token 1.653 that was substituted in by the pre-processor.
Basically the "problem" you will encounter with this approach is that ASPECT_RATIO will not be seen as a symbol by the compiler, so in a debugger, etc., you can't query the value of ASPECT_RATIO as if it were a variable. It's not a value that will have a memory address like a static const int may have (I say "may", because an optimizing compiler may decide to act like the preprocessor and optimize out the need for an explicit memory address for the constant value, simply substituting the literal value wherever it appears in the code). A larger function-like macro likewise won't have an instruction address the way an actual C/C++ function does, so you can't set breakpoints inside it. But in a more general sense I'm not sure I would call this a "problem" unless you were intending to use the macro as a debug symbol and/or set debugging breakpoints inside your macro. Otherwise the macro is doing its job.
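The fix the book goes on to recommend is replacing the macro with a language-level constant, so that the name is visible to the compiler and lands in the symbol table:

const double AspectRatio = 1.653;   // a real symbol: scoped, type-checked, visible to debuggers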

Related

How to disable inline assembly in GCC?

I'm developing an online judge system for programming contests. Since C/C++ inline assembly is not allowed in certain programming contests, I would like to add the same restriction to my system.
I would like to let GCC produce an error when compiling a C/C++ program containing inline assembly, so that any program containing inline assembly will be rejected. Is there a way to achieve that?
Note: disabling inline assembly is just for obeying the rules, not for security concerns.
Is there a way to disable inline assembler in GCC?
Yes, there are a couple of methods. None of them is useful for security; they are only guard-rails that could be worked around intentionally, but they will stop people from accidentally using asm in places where they didn't realize they shouldn't.
Turn off the asm keyword in the compiler (C only)
To do it in the compilation phase, use the -fno-asm option. However, keep in mind that this only affects asm for C, not for C++, and it affects neither __asm__ nor __asm in either language.
Documentation:
-fno-asm
Do not recognize "asm", "inline" or "typeof" as a keyword, so that code can use these words as identifiers. You can use the keywords "__asm__", "__inline__" and "__typeof__" instead. -ansi implies -fno-asm.
In C++ , this switch only affects the "typeof" keyword, since "asm" and "inline" are standard keywords. You may want to use the -fno-gnu-keywords flag instead, which has the same effect. In C99 mode (-std=c99 or -std=gnu99), this switch only affects the "asm" and "typeof" keywords, since "inline" is a standard keyword in ISO C99.
Define the keyword as a macro
You can use the parameters -Dasm=error -D__asm__=error -D__asm=error
Note that this construction is generic: all it does is create macros, working pretty much like a #define in the source. The documentation says:
-D name=definition
The contents of definition are tokenized and processed as if they appeared during translation phase three in a #define directive. In particular, the definition will be truncated by embedded newline characters.
...
So what it does is simply change occurrences of asm, __asm, or __asm__ to error. This happens in the preprocessing phase. You don't have to use error; just pick anything that will not compile.
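For instance, with those flags a line like

asm("nop");

reaches the compiler as

error("nop");

which will not build: in C++ the call to the undeclared error fails immediately, and in C it fails at the latest at link time (assuming, of course, that the program declares no function named error).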
Use a macro that fires during compilation
Another way to solve it in the compilation phase is with a function-like macro, as suggested in comments by zwol: you can use -D'asm(...)=_Static_assert(0,"inline assembly not allowed")'. This also avoids the problem of an identifier called error already existing.
Note: This method requires -std=c11 or higher.
Using grep before using gcc
Yet another way that may be the solution to your problem is to just do a grep in the root of the source tree before compiling:
grep -nr "asm"
This will also catch __asm__, but it may give false positives, for instance if you have a string literal, identifier, or comment containing the substring "asm". But in your case you could solve this problem by also forbidding any occurrence of that string anywhere in the source code. Just change the rules.
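A slightly tighter pattern (still not bulletproof) matches asm only as a whole word; with GNU grep, \b is a word boundary:

grep -nrE '\b(__)?asm(__)?\b'

This still flags comments and string literals, but it no longer fires on identifiers that merely contain the substring, such as chasm.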
Possible unexpected problems
Note that disabling assembly can cause other problems. For instance, I could not use stdio.h with this option; it is common for system headers to contain inline assembly code.
A way to cheat the above methods
Aside from the trivial #undef __asm__, it is possible to execute strings as machine code. See this answer for an example: https://stackoverflow.com/a/18477070/6699433
A piece of the code from the link above:
#include <string.h>
#include <sys/mman.h>

int main (void)
{
    /* our machine code: an x86-64 function that returns 42 */
    char code[] = {0x55,0x48,0x89,0xe5,0x89,0x7d,0xfc,0x48,
                   0x89,0x75,0xf0,0xb8,0x2a,0x00,0x00,0x00,0xc9,0xc3,0x00};
    /* copy code to an executable buffer */
    void *buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
                      MAP_PRIVATE|MAP_ANON,-1,0);
    memcpy (buf, code, sizeof(code));
    /* run code */
    int i = ((int (*) (void))buf)();
    return i;
}
The code above is only intended to give a quick idea of how to get around the rules the OP has stated. It is not intended to be a good example of how to actually do this in practice. Furthermore, the code is not mine; it is just a short quote from the link I supplied. If you have ideas about how to improve it, please comment on 4pie0's original post instead.

C++ macro inside macro [duplicate]

I know that I am trying to shoot myself in the foot ;) However, it will allow me to make the rest (a big amount) of the code smaller and more readable.
Is there any tricky way to create a preprocessor macro inside another preprocessor macro?
Here is an example of what I am looking for. My real scenario is more complex:
// That's what I want to do and surely C++ doesn't like it.
#define MACROCREATOR(B) #define MACRO##B B+B
void foo()
{
    MACROCREATOR(5) // This should create a new macro (#define MACRO5 5+5)
    int a = MACRO5; // this will use the new macro
}
The C++ Standard says (16.3.4.3):
The resulting completely macro-replaced preprocessing token sequence [... of the macro expansion ...] is not processed as a preprocessing directive even if it resembles one...
So no, there is no 'official' way of achieving what you want with macros.
No. Even if a macro expands into something that looks like a preprocessing directive, the expansion is not evaluated as a preprocessing directive.
As a supplement to the answers above, if you really wanted to pre-process a source file twice—which is almost definitely not what you actually want to do—you could always invoke your compiler like this:
g++ -E input.cpp | g++ -c -x c++ - -o output.o
That is, run the file through the preprocessor, then run the preprocessed output via pipe through a full compilation routine, including a second preprocessing step. In order for this to have a reasonably good chance of working, I'd imagine you'd have to be rather careful in how you defined and used your macros, and all in all it would most likely not be worth the trouble and increased build time.
If you really want macros, use standard macro-based solutions. If you really want compile-time metaprogramming, use templates.
On a slightly related note, this reminds me of the fact that raytracing language POV-Ray made heavy use of a fairly complex preprocessing language, with flow-control directives such as #while that allowed conditional repetition, compile-time calculations, and other such goodies. Would that it were so in C++, but it simply isn't, so we just do it another way.
No. The pre-processor is single-pass. It doesn't re-evaluate the macro expansions.
As noted, one can #include a particular file more than once with different macro definitions active. This can make it practical to achieve some effects that could not be practically achieved via any other means.
As a simple example, on many embedded systems pointer indirection is very expensive compared to direct variable access, and code which uses a lot of pointer indirection may well be twice as large and slow as code which simply uses variables. Consequently, if a particular routine is used with two sets of variables, in a scenario where one would usually pass in a pointer to a structure and then use the arrow operator, it may be far more efficient to put the routine in its own file (I normally use the extension .i) which is #included once without the macro _PASS2 defined, and a second time with it defined. That file can then use #ifdef _PASS2 / #else to define macros for all the variables that should differ between the two passes (see the sketch below). Even though the code gets generated twice, on some micros that will take less space than using the arrow operator with passed-in pointers.
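A minimal sketch of that two-pass pattern (all names hypothetical):

/* motor.i: textually included twice by the main file */
#ifdef _PASS2
  #define SPEED  speed2
  #define UPDATE update_motor2
#else
  #define SPEED  speed1
  #define UPDATE update_motor1
#endif

void UPDATE(void) { if (SPEED > 100) SPEED = 100; }

#undef SPEED
#undef UPDATE

/* main.c */
int speed1, speed2;
#include "motor.i"
#define _PASS2
#include "motor.i"

This generates update_motor1 working directly on speed1 and update_motor2 working directly on speed2, with no pointer indirection in either copy.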
Take a look at m4. It is similar to cpp, but recursive and much more powerful. I've used m4 to create a structured language for assemblers, e.g.
cmp r0, #0
if(eq)
  mov r1, #0
else
  add r1, #1
end
The "if", "else", and "end" are calls to m4 macros I wrote that generate jumps and labels, the rest is native assembly. In order to nest these if/else/end constructs, you need to do defines within a macro.

Will it be odd to #define inside a C++ function?

My little C++ function needs to calculate a simple timeout value.
int CalcTimeout(const mystruct st)
{
    return (st.x + 100) * st.y + 200;
}
The numbers 100 and 200 would be confusing to someone reading the code later, so I would like to use #define for them. But these defines are only needed for this one function; can I define them inside the function? The advantages of doing it this way are:
They are very local values and nobody else needs to know about them.
Being closer to where they are used, the intent is clear; they have no other use, and they behave like local variables (except that they are not).
The disadvantage is that it is a rather crude way to define something like a local variable/constant, when it is obviously not local.
Other than that, would it be odd to #define inside a C++ function? Most of the time we use #defines at the top of the file. Is using const variables better in any way for replacing a fixed, hard-coded local value like this?
The objective really is to make the code more readable/understandable.
Don't use a macro to define a constant; use a constant.
const int thingy = 100; // Obviously, you'll choose a better name
const int doodad = 200;
return (st.x + thingy) * st.y + doodad;
Like macros that expand to constant expressions, these can be treated as compile-time constants. Unlike macros, these are properly scoped within the function.
If you do have a reason for defining a macro that's only used locally, you can use #undef to get rid of it once you're done. But in general, you should avoid macros when (like here) there's a language-level construct that does what you want.
In C++ specifically it would be rather weird to see macros used for that purpose. In C++, const completely replaces macros for defining manifest constants, and const works much better. (In C you'd have to stick with #define in many (or most) cases, but your question is tagged C++.)
Having said that, pseudo-local macros sometimes come in handy in C++ (especially in pre-C++11 versions of the language). If for some reason you have to #define such a macro "inside" a function, it is a very good idea to add an explicit #undef for that macro at the end of the same scope. (I put inside in quotes since the preprocessor does not really care about scopes and can't tell "inside" from "outside".) That way you can simulate the scoped visibility that other local identifiers have, instead of having a "locally" defined macro spill out into the rest of the code all the way to the end of the translation unit.
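A minimal sketch of that pattern, applied to the function above (macro names hypothetical):

int CalcTimeout(const mystruct st)
{
#define TIMEOUT_OFFSET 100
#define TIMEOUT_PAD    200
    int t = (st.x + TIMEOUT_OFFSET) * st.y + TIMEOUT_PAD;
#undef TIMEOUT_OFFSET
#undef TIMEOUT_PAD
    return t;
}

The #undef lines keep the names from leaking into the rest of the translation unit, which is as close to local scope as the preprocessor gets.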

What are the major advantages of const versus #define for global constants?

In embedded programming, for example, #define GLOBAL_CONSTANT 42 is preferred to const int GLOBAL_CONSTANT = 42; for the following reasons:
it does not need space in RAM (which is usually very limited on microcontrollers, while µC applications usually need a large number of global constants)
a const needs not only a storage place in flash, but the compiler also generates extra code at the start of the program to copy it.
Against all these advantages of using #define, what are the major advantages of using const?
In a non-µC environment memory is usually not such a big issue, and const is useful because it can be used locally, but what about global constants? Or is the answer just "we should never ever ever use global constants"?
Edit:
The examples might have caused some misunderstanding, so I have to state that they are in C. If the C compiler generated the exact same code for the two, I think that would be an error, not an optimization.
I just extended the question to C++ without thinking much about it, in the hope of getting new insights, but it was clear to me that in an object-oriented environment there is very little room for global constants, regardless of whether they are macros or consts.
Are you sure your compiler is too dumb to optimize your constant by inserting its value where it is needed instead of putting it into memory? Compilers usually are good in optimizations.
And the main advantage of constants over macros is that constants have scope. Macros are substituted everywhere, with no respect for scope or context, and that leads to really hard-to-understand compiler error messages.
Also, debuggers are not aware of macros.
The answer to your question varies for C and C++.
In C, const int GLOBAL_CONSTANT is not a constant expression, so the primary way to define a true constant in C is by using #define.
In C++, one of the major advantages of using const over #define is that #defines don't respect scope, so there is no way to create a class-scoped constant with them, while const variables can be scoped inside classes (see the sketch below).
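A minimal sketch of the difference (names hypothetical):

class Widget {
public:
    static const int MaxSize = 100;   // scoped: referred to as Widget::MaxSize
};

#define MAX_SIZE 100                  // leaks into every file that includes this header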
Apart from that, there are other subtle advantages, such as:
Avoiding weird magic numbers during compilation errors:
If you are using #define, the macros are replaced by the preprocessor before compilation proper, so if you receive an error during compilation it will be confusing: the error message won't refer to the macro name but to the value, the value will seem to appear out of nowhere, and you can waste a lot of time tracking it down in the code.
Ease of debugging:
For the same reasons mentioned above, #define provides no real help while debugging.
Another reason that hasn't been mentioned yet is that const variables allow the compiler to perform explicit type-checking, but macros do not. Using const can help prevent subtle data-dependent errors that are often difficult to debug.
I think the main advantage is that you can change the constant without having to recompile everything that uses it.
Since a change to a macro effectively modifies the contents of every file that uses it, recompilation of all of those files is necessary.
In C the const qualifier does not define a constant but instead a read-only object:
#define A 42 // A is a constant
const int a = 42; // a is not constant
A const object cannot be used where a real constant is required, for example:
static int bla1 = A; // OK, A is a constant
static int bla2 = a; // compile error, a is not a constant
Note that this is different in C++, where a const int initialized with a constant expression really is a constant.
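A short contrast, valid C++ but partly invalid C:

const int a = 42;     // in C++, a is a constant expression
static int bla = a;   // OK in C++, a compile error in C
int arr[a];           // OK in C++: a can be used as an array size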
The only problems you list with const sum up as "I've got the most incompetent compiler I can possibly imagine". The problems with #define, however, are universal: for example, no scoping.
There's no reason to use #define instead of a const int in C++. Any decent C++ compiler will substitute the constant value from a const int in the same way it does for a #define where it is possible to do so. Both take approximately the same amount of flash when used the same way.
Using a const does allow you to take the address of the value (which a macro does not). At that point the behavior obviously diverges from that of a macro: the const now needs space in the program, in both flash and RAM, so that it can have an address. But this is really what you want.
The overhead here is typically going to be an extra 8 bytes, which is tiny compared to the size of most programs. Before you get to this level of optimization, make sure you have exhausted all other options like compiler flags. Using the compiler to carefully optimize for size and not using things like templates in C++ will save you a lot more than 8 bytes.
