Avoid expansion of macros while using boost preprocessor sequences - c++

I'm trying to get the OS and compiler name as a string in C++. Although there are many questions about this I did not find a definitive answer. So I tried to use Boost.Predef 1.55 which defines macros of the type BOOST_OS_<OS> and BOOST_OS_<OS>_NAME.
Hence one could simply do if(BOOST_OS_<OS>) return BOOST_OS_<OS>_NAME; for every OS boost supports. Same for compilers with COMP instead of OS. To avoid the repetition I wanted to use Boost.Preprocessor and put them all in a loop.
What I came up with is this:
#define MAKE_STMT_I2(PREFIX) if(PREFIX) return PREFIX ## _NAME;
#define MAKE_STMT_I(type, curName) MAKE_STMT_I2(BOOST_ ## type ## _ ## curName)
#define MAKE_STMT(s, type, curName) MAKE_STMT_I(type, curName)
#define OS_LIST (AIX)(AMIGAOS)(ANDROID)(BEOS)(BSD)(CYGWIN)(HPUX)(IRIX)(LINUX)(MACOS)(OS400)(QNX)(SOLARIS)(UNIX)(SVR4)(VMS)(WINDOWS)(BSDI)(DRAGONFLY)(BSD_FREE)(BSD_NET)(BSD_OPEN)
BOOST_PP_SEQ_FOR_EACH(MAKE_STMT, OS, OS_LIST)
However I run into problems where the values are expanded too soon. E.g. on VMS there is already a predefined macro named VMS, which then gets replaced in OS_LIST. Even doing something like #define OS_LIST (##AIX##)(##AMIGAOS##)(... does not help, as it seems to get expanded inside Boost later.
How can I avoid the expansion in the sequence completely?

Since you rely on the token VMS being undefined, a quick solution is a simple #undef VMS. Obviously, to avoid breaking code which relies on that macro, you should put your Boost PP code in its own .cpp file.

How can I avoid the expansion in the sequence completely?
You can't. Passing high level data structures as an argument to a macro necessarily involves evaluating the data structure.
You could avoid this problem and still use the boost macros in basically three ways:
1. Undefine problem macros before the call
This is essentially what MSalters recommended.
The idea being that if VMS isn't defined, its evaluation won't expand it.
Here, you risk VMS being left undefined, which could have dire consequences, so you have to mitigate that (MSalters touched on this).
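A minimal sketch of option 1, assuming the loop lives in its own .cpp file so the #undef cannot affect other code:
#ifdef VMS
#  undef VMS   // BOOST_OS_VMS / BOOST_OS_VMS_NAME are unaffected by this
#endif
#define OS_LIST (AIX)(AMIGAOS)(VMS) /* ...rest of the list as in the question... */
BOOST_PP_SEQ_FOR_EACH(MAKE_STMT, OS, OS_LIST)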
2. Build high level macros from different data
Option 2 might, for example, use:
#define OS_LIST (S_AIX)(S_BEOS)(S_VMS)
...and require you to change your MAKE_STMT macro complex; for example, this:
#define MAKE_STMT_I2(PREFIX) if(PREFIX) return PREFIX ## _NAME;
#define MAKE_STMT_I(curName) MAKE_STMT_I2(BOOST_O ## curName)
#define MAKE_STMT(s, type, curName) MAKE_STMT_I(curName)
#define OS_LIST (S_AIX)(S_AMIGAOS)(S_ANDROID)(S_BEOS)(S_BSD)(S_CYGWIN)(S_HPUX)(S_IRIX)(S_LINUX)(S_MACOS)(S_OS400)(S_QNX)(S_SOLARIS)(S_UNIX)(S_SVR4)(S_VMS)(S_WINDOWS)(S_BSDI)(S_DRAGONFLY)(S_BSD_FREE)(S_BSD_NET)(S_BSD_OPEN)
(Note: Here I'm ignoring the type; it's not necessary to pass OS in as data to the iteration sequence anyway).
The idea here is to find a different shared portion of BOOST_OS_FOO and BOOST_OS_FOO_NAME to put in your data, so that the tokens in your data aren't themselves macros.
Here, you risk S_FOO being defined at some higher level messing you up. You could mitigate this by finding a different piece to use in your data.
3. Build wrapper identifiers
This is easiest to define by example:
#define OS_LIST (AIX)(BEOS)(8VMS)
#define BOOST_OS_8VMS BOOST_OS_VMS
#define BOOST_OS_8VMS_NAME BOOST_OS_VMS_NAME
The idea here is that you're building differently formed BOOST_OS_xxx / BOOST_OS_xxx_NAME macros, then remapping those back to the desired ones. Using a numeric prefix has the advantage of being immune to expansion: such entities are valid preprocessing tokens (pp-numbers), but they cannot be object-like macro names, so nothing can expand them inside the sequence.
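For illustration, such wrapper entries drop straight into the question's original MAKE_STMT chain; only the names that clash with predefined macros need the numeric prefix (a sketch, not the full list):
#define BOOST_OS_8VMS      BOOST_OS_VMS
#define BOOST_OS_8VMS_NAME BOOST_OS_VMS_NAME
#define OS_LIST (AIX)(AMIGAOS)(ANDROID)(8VMS)(WINDOWS) /* ... */
BOOST_PP_SEQ_FOR_EACH(MAKE_STMT, OS, OS_LIST)  // BOOST_OS_8VMS(_NAME) remaps to BOOST_OS_VMS(_NAME)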

Related

Use of '#' in unexpected way

There's a macro defined as:
#define SET_ARRAY(field, type) \
foo.field = bar[#field].data<type>();
foo is a structure with members that are of type int or float *. bar is of type cnpy::npz_t (data loaded from .npz file). I understand that the macro is setting the structure member pointer so that it is pointing to the corresponding data in bar from the .npy file contained in the .npz file, but I'm wondering about the usage bar[#field].
When I ran the code through the preprocessor, I get:
foo.struct_member_name = bar["struct_member_name"].data<float>();
but I've never seen that type of usage either. It looks like the struct member variable name is somehow getting converted to an array index or memory offset that resolves to the data within the cnpy::npz_t structure. Can anyone explain how that is happening?
# is actually a preprocessor marker. It introduces preprocessor commands (not functions), formally called "preprocessor directives", which are executed before the code is compiled. Apart from commands, you'll also find something akin to constants: predefined names whose values are either static or set dynamically by the compiler. I'm using the term "constants" loosely here; they aren't constants in the language sense, they just look that way to us.
A number of preprocessor commands that you will find are:
#define, #include, #undef, #if (yes, different from the normal "if" in code), #elif, #endif, #error - all those must be prefixed by a "#".
Some values might be the __FILE__, __LINE__, __cplusplus and more. These are not prefixed by #, but can be used in preprocessor macros. The values are dynamically set by the compiler, depending on context.
For more information on macros, you can check the MS Learn page for MSVS or the GNU page for GCC. For other preprocessor values, you can also see this SourceForge page.
And of course, you can define your own macro or pseudo-constants using the #define directive.
#define test_integer 7
Using test_integer anywhere in your code (or macros) will cause it to be replaced by 7 during preprocessing, before compilation proper. Note that macros are case-sensitive, just like everything else in C and C++.
Now, let's talk about special cases of "#":
stringizing a parameter (also called "stringifying")
What that means is you can pass a parameter and it is turned into a string, which is what happened in your case. An example:
#define NAME_TO_STRING(x) #x
std::cout << NAME_TO_STRING(Hello) << std::endl;
This turns Hello, which is not a string but an identifier, into the string "Hello".
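One hedged aside (the names here are illustrative, not from the question): if the argument is itself a macro and you want its value rather than its name, you need an extra level of indirection:
#define STR(x) #x
#define XSTR(x) STR(x)      // extra level so macro arguments are expanded first
#define VERSION 3
// STR(VERSION)  expands to "VERSION"
// XSTR(VERSION) expands to "3"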
concatenating two parameters
#define CONCAT(x1, x2) x1##x2                     // pastes the two tokens into one: HelloWorld
#define STRINGIZE(x) #x
#define CONCAT_STRING(x1, x2) STRINGIZE(x1##x2)   // paste the tokens, then stringize the result
#define CONCATENATE(x1, x2) CONCAT_STRING(x1, x2) // indirection lets macro arguments expand before pasting
(yes, it doesn't work directly, you need a level of indirection for preprocessor concatenation to work; indirection means passing it again to a different macro).
std::cout << CONCATENATE(Hello,World) << std::endl;
This turns Hello and World, which are identifiers, into the concatenated string "HelloWorld".
Now, regarding usage of # and ##, that's a more advanced topic. There are many use cases, from macro magic (which might seem cool when you see it implemented; for examples, check the Unreal Engine, where it's used extensively, but be warned that such programming methods are not encouraged), to helpers, to some constant definitions (think #define TERRA_GRAV 9.807), and even to assisting some compile-time checks, for example together with constexpr from the newer standards.
If you're curious about the advantage of #define over a const float or const double: a macro is not really part of the code itself, and it isn't even syntax-checked unless it is actually used.
In regards to helper macros, the most common are macros defining exports when building a library (search __declspec for MSVS and __attribute__ for GCC), the old-style include guards (nowadays often replaced by #pragma once) that stop a *.h, *.hxx or *.hpp from being included multiple times in a project, and debug handling (search for _DEBUG and assertions). This paragraph touches slightly more advanced topics, so I won't cover them here.
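As a hedged illustration of that export-macro idiom (MYLIB_API and MYLIB_BUILDING are made-up names, not from any particular library):
#if defined(_WIN32)
#  ifdef MYLIB_BUILDING                      // defined while compiling the library itself
#    define MYLIB_API __declspec(dllexport)
#  else                                      // everyone else just imports the symbols
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif
MYLIB_API int mylib_version();               // hypothetical exported function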
I tried to keep the explanation as simple as possible, so the terminology is not that formal. But if you really are curious, I am sure you can find more details online or you can post a comment on this answer :)

Why use these many macros when it is really not needed

When we look at STL header files, we see many macros used where we could instead write a single line, or sometimes a single word, directly. I don't understand why people use so many macros. e.g.
_STD_BEGIN
using ::type_info;
_STD_END
#if defined(__cplusplus)
#define _STD_BEGIN namespace std {
#define _STD_END }
#define _STD ::std::
Library providers have to cope with a wide range of implementations and use cases. I can see two reasons for the use of macros in this case (and there are probably others I'm not thinking about now):
the need to support compilers which don't support namespaces. I'm not sure whether that is still a concern for a recent implementation, but most implementations have a long history, and removing such macros, even once compilers without namespace support are no longer supported (the unprotected using ::type_info; hints that this is the case), would be a low priority.
the desire to allow customers to use their own implementation of the standard library in addition to the one provided by the compiler vendor, without replacing it. Configuring the library would then allow substituting another name for std.
That
#if defined(__cplusplus)
in your sample is the key. Further down in your source I would expect to see alternative definitions for the macros. Depending on compilation environment, some constructs may require different syntax or not be supported at all; so we write code once, using macros for such constructs, and arrange for the macros to be defined appropriately depending on what is supported.
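A hedged sketch of what such alternative definitions might look like (illustrative only, not the actual vendor source):
#if defined(__cplusplus)
  #define _STD_BEGIN namespace std {
  #define _STD_END   }
  #define _STD       ::std::
#else   /* e.g. a C or pre-namespace compilation environment */
  #define _STD_BEGIN
  #define _STD_END
  #define _STD
#endif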
Macros vs variables: macros can run faster in this case because they become literal constants after preprocessing (operations on constants are generally faster than operations on variables).
Macros vs functions: using macros avoids the overhead of a function call, which requires pushing parameters onto the stack, pushing the return address, and then popping them off again.
Macros: faster execution but more memory (code) space.
Functions: slower execution but less memory space.

Limit scope of #define labels

What is the correct strategy to limit the scope of #define labels and avoid unwarranted token collision?
In the following configuration:
Main.c
# include "Utility_1.h"
# include "Utility_2.h"
# include "Utility_3.h"
VOID Main() { ... }
Utility_1.h
# define ZERO "Zero"
# define ONE "One"
BOOL Utility_1(); // Uses- ZERO:"Zero" & ONE:"One"
Utility_2.h
# define ZERO '0'
# define ONE '1'
BOOL Utility_2(); // Uses- ZERO:'0' & ONE:'1'
Utility_3.h
const UINT ZERO = 0;
const UINT ONE = 1;
BOOL Utility_3(); // Uses- ZERO:0 & ONE:1
Note: Utility_1, Utility_2 and Utility_3 have been written independently.
Error: macro redefinition and token collision.
Also, most worrying: the compiler does not indicate what replaced what in case of token replacement.
Edit note: This is meant to be a generic question, so please do not propose enum or const; i.e., what should I do when I must use #define? Please also comment on my proposed solution below.
The correct strategy would be to not use
#define ZERO '0'
#define ONE '1'
at all. If you need constant values, use, in this case, a const char instead, wrapped in a namespace.
There are two types of #define macros:
Ones which are needed only in a single file. Let's call them private #defines,
e.g. PI 3.14. In this case:
As per standard practice, the correct strategy is to place such #define labels only in the implementation (.c) files and not in the header (.h) files.
Others that are needed by multiple files. Let's call these shared #defines,
e.g. EXIT_CODE 0x0BAD. In this case:
Place only such common #define labels in the header (.h) file.
Additionally, try to name labels uniquely with fake namespaces or similar conventions, like prefixing the label with MACRO_ (e.g. #define MACRO_PI 3.14), so that the probability of collision is reduced.
#defines don't have scope that corresponds to C++ code; you cannot limit it. They are naive textual replacement macros. Imagine asking "how do I limit the scope when I replace text with grep?"
You should avoid them whenever you possibly can, and favor instead using real C++ typing.
Proper use of macros will relieve this problem almost by itself via naming convention. If the macro is named like an object, it should be an object (and not a macro). Problem solved. If the macro is named like a function (for example a verb), it should be a function.
That applies to literal values, variables, expressions, statements... these should all not be macros. And these are the places that can bite you.
In other cases, when you're using a macro as some kind of syntax helper, its name will almost certainly not fit the naming convention of anything else, so the problem is almost gone. Most importantly, macros that NEED to be macros will cause compile errors when the naming clashes.
Some options:
Use different capitalization conventions for macros vs. ordinary identifiers.
const UINT Zero = 0;
Fake a namespace by prepending a module name to the macros:
#define UTIL_ZERO '0'
#define UTIL_ONE '1'
Where available (C++), ditch macros altogether and use a real namespace:
namespace util {
const char ZERO = '0';
const char ONE = '1';
};
What is the correct strategy to limit the scope of #define and avoid unwarranted token collisions?
Avoid macros unless they are truly necessary. In C++, constant variables and inline functions can usually be used instead. They have the advantage that they are typed, and can be scoped within a namespace, class, or code block. In C, macros are needed more often, but think hard about alternatives before introducing one.
Use a naming convention that makes it clear which symbols are macros, and which are language-level identifiers. It's common to reserve ALL_CAPITALS names for the exclusive use of macros; if you do that, then macros can only collide with other macros. This also draws the eye towards the parts of the code that are more likely to harbour bugs.
Include a "pseudo-namespace" prefix on each macro name, so that macros from different libraries/modules/whatever, and macros with different purposes, are less likely to collide. So, if you're designing a dodgy library that wants to define a character constant for the digit zero, call it something like DODGY_DIGIT_ZERO. Just ZERO could mean many things, and might well clash with a zero-valued constant defined by a different dodgy library.
What is the correct strategy to limit the scope of #define and avoid unwarranted token collisions?
Some simple rules:
Keep use of preprocessor tokens down to a minimum.
Some organizations go so far down this road as to limit preprocessor symbols to #include guards only. I don't go that far, but it is a good idea to keep preprocessor symbols to a minimum.
Use enums rather than named integer constants.
Use const static variables rather than named floating point constants.
Use inline functions rather than macro functions.
Use typedefs rather than #defined type names (these replacements are sketched in an example after these rules).
Adopt a naming convention that precludes collisions.
For example,
The names of preprocessor symbols must consist of capital letters and underscores only.
No other kinds of symbols can have a name that consists of capital letters and underscores only.
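A minimal sketch of those replacements (the names here are illustrative):
enum { MaxClients = 64 };                     // rather than #define MAX_CLIENTS 64
static const double GravityMS2 = 9.80665;     // rather than #define GRAVITY 9.80665
inline int square(int x) { return x * x; }    // rather than #define SQUARE(x) ((x)*(x))
typedef unsigned int uint32;                  // rather than #define UINT32 unsigned int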
const UINT ZERO = 0; // Programmer not aware of what's inside Utility.h
First off, if the programmer isn't aware of what's inside Utility.h, why did the programmer use that #include statement? Obviously that UINT came from somewhere ...
Secondly, the programmer is asking for trouble by naming a variable ZERO. Leave those all-caps names for preprocessor symbols. If you follow the rules, you don't have to know what's inside Utility.h. Simply assume that Utility.h follows the rules. Make that variable's name zero.
I think you really just have to know what it is you're including. That's like trying to include windows.h and then declare a variable named WM_KEYDOWN. If you have collisions, you should either rename your variable, or (somewhat of a hack), #undef it.
C is a structured programming language, and it has its limitations; that is the very reason object-oriented systems came about in the first place. In C there seems to be no other way than to know what is in your header files and to adopt a notation where header-level names start with an underscore, so that there is less chance of them being overwritten:
in header file
_ZERO 0
in regular file
ZERO 0
I think the correct strategy would be to place #define labels only in the implementation (.c) files.
Further, all #defines could be put separately in yet another file, say Utility_2_Def.h
(quite like Microsoft's WinError.h: error code definitions for the Win32 API functions).
Overheads:
an extra file
an extra #include statement
Gains:
Abstraction: ZERO is 0, '0' or "Zero" depending on where you use it
One standard place to change all static parameters of the whole module
Utility_2.h
BOOL Utility_2();
Utility_2_Def.h
# define ZERO '0'
# define ONE '1'
Utility_2.c
# include "Utility_2.h"
# include "Utility_2_Def.h"
BOOL Utility_2()
{
...
}

Preprocessor Directive Syntax and Etiquette

I have two unrelated questions:
Is it possible to use #define to define something other than a number? (Such as an extended ASCII character).
Is it considered good practice to use preprocessor directives within the main() function? The only reason I would ever think to do this is to execute different code depending on which OS is being run.
Object-like macros (#define macros with no arguments) are simply replacements. So anything that might otherwise be in your code can be the replacement, for example a literal string: #define PROGRAM_NAME "MyProgram", or multi-line code blocks. Here's a useless example of the latter:
#define INFINITE_PRINTF while (1) \
{ \
printf("looping..."); \
}
As for the second question, it is common practice to use preprocessor directives throughout C code to do just what you've mentioned: conditionally including/excluding code, in main and elsewhere. Occasionally I'll use #define for constants near where they'll be used, for clarity.
You can #define not only strings; people #define whole blocks of code, although the creator of C++ frowns on use of the preprocessor.
I think main() is too high up for OS-specific code. I would try to write functions/classes that wrap any OS-specific code. The lower you can place OS-specific code, the better.
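A hedged sketch of what that wrapping might look like (home_directory is an invented example, not from the question):
#include <cstdlib>
#include <string>

std::string home_directory()        // OS-specific details stay inside this small function
{
#if defined(_WIN32)
    const char* home = std::getenv("USERPROFILE");
#else
    const char* home = std::getenv("HOME");
#endif
    return home ? home : "";
}
// main() can then call home_directory() without any #ifdef clutter.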

What is the purpose of the #define directive in C++?

What is the role of the #define directive?
#define is used to create macros in C and in C++. You can read more about it in the C preprocessor documentation. The quick answer is that it does a few things:
Simple Macros - basically just text replacement. Compile time constants are a good example:
#define SOME_CONSTANT 12
simply replaces the text SOME_CONSTANT with 12 wherever it appears in your code. This sort of macro is often used to provide conditional compilation of code blocks. For example, there might be a header included by each source file in a project with a list of options for the project:
#define OPTION_1
#define OPTION_2
#undef OPTION_3
And then code blocks in the project would be wrapped with matching #ifdef/#endif blocks to enable and disable those options in the finished project. Using the -D gcc flag would provide similar behaviour. There are strong opinions as to whether or not this method is really a good way to provide configuration for an application, however.
Macros with arguments - allows you to make 'function-like' macros that can take arguments and manipulate them. For example:
#define SQUARE(x) ((x) * (x))
would return the square of the argument as its result; be careful about potential order-of-operations or side-effect problems! The following example:
int x = SQUARE(3); // becomes int x = ((3) * (3));
works fine, but something like:
int y = SQUARE(f()); // becomes int y = ((f()) * (f()));
will call f() twice, or even worse:
int z = SQUARE(x++); // becomes int z = ((x++) * (x++));
results in undefined behaviour!
Macros with arguments can also be variadic (standard since C99/C++11), which can come in handy.
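A minimal sketch of a variadic macro (the LOG name is just for illustration):
#include <cstdio>

#define LOG(...) std::fprintf(stderr, __VA_ARGS__)

// LOG("value = %d\n", 42);  expands to  std::fprintf(stderr, "value = %d\n", 42);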
As mentioned below in the comments, overuse of macros, or the development of overly complicated or confusing macros is considered bad style by many - as always, put the readability, maintainability, and debuggability of your code above 'clever' technical tricks.
#define (and its opposite, #undef) can be used to set preprocessor symbols which can then be tested using #ifndef or #ifdef. This allows for custom behaviors to be defined within the source file. It's commonly used to compile for different environments or to include debug code.
An example:
#define DEBUG
#ifdef DEBUG
//perform debug code
#endif
The most common use (by far) of #define is for include guards:
// header.hh
#ifndef HEADER_HH_
#define HEADER_HH_
namespace pony {
// ...
}
#endif
Another common use of #define is in creating a configuration file, commonly a config.h file, where we #define macros based on various states and conditions. Then, in our code we test these macros with #ifdef, #elif defined(), etc. to support different compiles for different situations. This is not as solid as the include-guard idiom, and you need to be careful here: if the branching is wrong you can get very obscure compiler errors or, worse, incorrect runtime behavior.
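A hedged sketch of the config.h pattern (HAVE_SYS_MMAN_H and USE_MMAP are invented names for illustration):
// config.h, typically generated by the build system:
// #define HAVE_SYS_MMAN_H 1

#if defined(HAVE_SYS_MMAN_H)
#  define USE_MMAP 1
#elif defined(_WIN32)
#  define USE_MMAP 0
#else
#  define USE_MMAP 0
#endif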
In general, other than for include guards you need to think through (twice, preferably) about the problem, and see if you can use the compiler rather than the preprocessor to solve it. The compiler is just smarter than the preprocessor. Not only that, but the compiler can't possibly confuse the preprocessor, whereas the preprocessor most definitely can confuse and mislead the compiler.
The #define directive has two common uses.
The first is to control how the compiler will act. To do this, we also need #undef, #ifdef and #ifndef (and #endif too...).
You can make "compiler logic" this way. A common use is to activate (or not) a debug portion of the code, like this:
#ifdef DEBUG
//debug code here
#endif
And you would then be able to compile the debug code simply by writing #define DEBUG.
Another use of this logic is to avoid double includes...
Example: file A #includes files B and C, but file B also includes C. This will likely result in a compilation error, because the contents of C exist twice.
The solution is to write:
#ifndef C_FILE_INCLUDED
#define C_FILE_INCLUDED
//the contents of header "c" go here.
#endif
The other use of #define is to make macros.
The simplest ones consist of simple substitutions, like:
#define PI 3.14159265
float perimeter(float radius) {
return radius*2*PI;
}
or
#define SHOW_ERROR_MESSAGE printf("A serious error happened");
if ( 1 != 1 ) { SHOW_ERROR_MESSAGE }
You can also make macros that accept arguments; on some implementations printf itself is provided as a macro, created with a #define in a header file.
But this should not be done, for two reasons:
first, the speed of macros is the same as that of inline functions, and second, we have C++ templates, which allow more control over functions with variable types. So the only reason to use macros with arguments is to make strange constructs that will be hard to understand later, like metaprogrammed stuff...
In C++, #define has very narrow, specialized roles:
Header guards, described in other answers
Interacting with the standard libraries. For instance, #defining WIN32_LEAN_AND_MEAN before including windows.h trims the header, and #defining NOMINMAX turns off the often-problematic min and max macros.
Advanced macros involving stringization (i.e. macros that print debugging messages) or token-pasting.
You should avoid using #define for the following purposes. The reasons are many; see for instance this FAQ entry.
Compile-time constants. Use const instead.
Simple macro functions. Use inline functions and templates instead. (Both replacements are sketched below.)
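A brief sketch of those two replacements, assuming nothing beyond standard C++ (the names are illustrative):
const double Pi = 3.14159265358979;     // instead of #define PI 3.14159265358979

inline double circle_area(double r)     // instead of #define AREA(r) (3.14159 * (r) * (r))
{
    return Pi * r * r;
}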
In C or C++, #define allows you to create preprocessor macros.
In the normal C or C++ build process the first thing that happens is that the preprocessor runs; it looks through the source files for preprocessor directives like #define or #include and then performs simple operations with them.
In the case of a #define directive, the preprocessor does simple text-based substitution.
For example if you had the code
#define PI 3.14159f
float circum = diameter*PI;
the preprocessor would turn it into:
float circum = diameter*3.14159f;
by simply replacing the instances of PI with the corresponding text. This is only the simplest form of a #define statement; for more advanced uses, check out this article from MSDN.
inCorrectUseOfHashDefine()
{
The role of #define is to baffle people who inherit your code with out of the blue statements like:
foreverandever
because of:
#define foreverandever for(;;)
}
Please favour constants over #define.
It's also used for setting compiler directives...
Most things about #defines have already been said, but it's worth making clear that C++ has better replacements for most of their uses:
#define to define numerical constants can easily be replaced by a const "variable" that, like a #define, doesn't really exist in the compiled executable. AFAIK it can be used in almost all the situations where you could use a #defined numerical constant, including array bounds. The main advantage for me is that such constants are clearly typed, so there's no need to add casts in the macros "just to be sure", and they are scoped, so they can be kept in namespaces/classes/functions without polluting the whole application.
const int max_array_size=50;
int an_array[max_array_size];
#define to create macros: macros can often be replaced by templates; for example, the dreaded MAX macro
#define MAX(a,b) ((a)<(b)?(b):(a))
, which has several downsides (e.g. repeated argument evaluation, inevitable inline expansion), can be replaced by the max function
template<typename T> T & max(T & a, T & b)
{
return a<b?b:a;
}
which can be type-safe (in this version the two arguments are forced to be of the same type), can be expanded inline or not (it's the compiler's decision), evaluates the arguments just once (when it's called), and is scoped. A more detailed explanation can be found here.
Still, macros must be used for include guards, and to create certain kinds of strange language extensions that expand to multiple lines of code, have unbalanced parentheses, etc.