How to speed up parsing C++ code using libclang?

I have a project that must be parsed before the actual compilation. This is needed for reflection purposes. In short, I want to scan edited .h files for certain attributes in the code and gather information about them to generate specific include files.
For example, reflectable classes/fields/methods will be marked with META(parameters...). This macro will be defined like this: #define META(...) __attribute__((annotate("reflectable"))).
The code should look like this (similar to Qt's QObject or Unreal Engine 4):
// --- macros outside of header to parse ---
// Marks declaration as reflectable (with some metadata), empty for base compiler, useful for clang
#ifdef __clang__
#define META(...) __attribute__((annotate("reflectable")))
#else
#define META(...)
#endif
// Injects some generated code after parsing step, empty for clang, useful for base compiler
#ifdef __clang__
#define GENERATED_REFLECTION_INFO
#else
#define GENERATED_REFLECTION_INFO GENERATE_CODE(__FILE__, __LINE__)
#endif
// --- reflectable class ---
META(Serializable, Exposed, etc)
class MyReflectableClass : public BaseReflectableClass
{
    GENERATED_REFLECTION_INFO
public:
    // Fields examples
    META()
    int32 MyReflectableField;
    META(SkipSerialize)
    float MySecondaryReflectableField;
    // Methods example
    META(RemoteMethod)
    void MyReflectableMethod(int32 Param1, uint8 Param2);
};
libclang was used for this goal. But when the project includes headers like <string> or <memory>, parsing the long chain of standard-library dependencies takes a long time.
The same happens when the project itself has a lot of headers.
How can I avoid parsing the full standard library? Is there perhaps some kind of cache I could use with libclang?
Or is there a way to skip analyzing these includes and convince the parser that unknown types are acceptable?
In short, how can I optimize the application to reduce parsing time?

C++, as a language, is not very amenable to speculative parsing because parsing is type-dependent. For example, the < token means different things depending on whether the name before it refers to a template or not.
However, libclang supports precompiled headers, which let you cache the standard library headers.
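A rough sketch of what that can look like with the libclang C API (the file names, flags, and C++ standard version are illustrative assumptions, not prescriptions): build the PCH once from a prefix header that pulls in the expensive standard-library includes, then hand it to every subsequent parse. If you only need declarations and attributes, CXTranslationUnit_SkipFunctionBodies saves additional time.
#include <clang-c/Index.h>

int main()
{
    CXIndex index = clang_createIndex(/*excludeDeclarationsFromPCH=*/1, 0);

    // 1) Parse a prefix header that includes <string>, <memory>, etc.
    //    once, and serialize the result to a PCH file.
    const char* pchArgs[] = { "-x", "c++-header", "-std=c++14" };
    CXTranslationUnit pch = clang_parseTranslationUnit(
        index, "prefix.h", pchArgs, 3, nullptr, 0,
        CXTranslationUnit_ForSerialization);
    clang_saveTranslationUnit(pch, "prefix.h.pch",
                              clang_defaultSaveOptions(pch));
    clang_disposeTranslationUnit(pch);

    // 2) Parse each project header against the cached PCH; the
    //    standard-library chain no longer has to be re-parsed.
    const char* args[] = { "-x", "c++", "-std=c++14",
                           "-include-pch", "prefix.h.pch" };
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, "MyReflectableClass.h", args, 5, nullptr, 0,
        CXTranslationUnit_SkipFunctionBodies);
    // ... walk the AST here, looking for annotate("reflectable") ...
    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
}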

Related

Why do you need inclusion guard for C++ header files?

I get roughly what it does. What I don't understand is why it's not the default? What are the use cases where some header file would need to be included multiple times?
The reason it's not the default is primarily historical these days -- when the C language was formalized, #include was specified to act exactly as if the user had copy-and-pasted the specified file's contents at the location of the #include line; and C++ wanted (and wants) to remain as compatible as possible with C, so C++ inherited that behavior from C.
As for a use-case where including the same header file more than once might be useful; one instance where I found it useful was for simulating a templated-container-class in C (because C doesn't support templates directly). I had a container-implementation-header-file that looked something like this (but more elaborate; I'm showing a simplified version here for readability):
// MyContainerImplementation.h
// be sure to #define MYTYPE and MYARRAYSIZE
// before #include-ing this file!
// (## only pastes tokens inside a macro body, so a two-level
//  CONCAT helper is needed to splice MYTYPE into identifiers)
#define CONCAT2(a, b) a##b
#define CONCAT(a, b) CONCAT2(a, b)
struct CONCAT(ArrayOf, MYTYPE)
{
    MYTYPE arrayOfItems[MYARRAYSIZE];
};
static inline void CONCAT(CONCAT(Set, MYTYPE), Item)(struct CONCAT(ArrayOf, MYTYPE) * container, int which, MYTYPE item)
{
    container->arrayOfItems[which] = item;
}
[... and so on for various other MYTYPE-specific methods ...]
... then my .c files could do something like:
#define MYTYPE int
#define MYARRAYSIZE 10
#include "MyContainerImplementation.h"
#undef MYARRAYSIZE
#undef MYTYPE
#define MYTYPE short
#define MYARRAYSIZE 15
#include "MyContainerImplementation.h"
#undef MYARRAYSIZE
#undef MYTYPE
struct ArrayOfint myInts;
struct ArrayOfshort myShorts;
SetintItem(&myInts, 5, 12);
SetshortItem(&myShorts, 3, 2);
[...]
... and end up with the container "class" and its associated methods implemented for each data-type, without having to manually write a new implementation of the container "class" each time.
Yes, it was extremely ugly -- but not as ugly as having to manually write out thousands of lines of redundant container code would have been. (The real container-implementation-header-file implemented a hash table and was several hundred lines long.)
Without include guards or #pragma once, the compiler would have to maintain a list of already-included files. This is not easy, because of the different possible paths to these files (and #pragma once doesn't completely solve this), and it would have been expecting a bit much of the original C compilers, which had to work with very limited memory.
What's true today is not necessarily true when C came about and the C pre-processor, upon which the C++ one is based, was created.
#pragma once is just a step towards having proper C++ modules so this annoying historical legacy is finally eliminated.
Yes, it's valid to include a file multiple times, and yes, each time it's included it can behave in entirely different ways. This is why making pre-compiled headers is a huge headache for compiler developers.
Guard blocks or #pragma once are included in order to prevent a file from being included multiple times.
#pragma once, while supported by most compilers, is not an official part of the C++ standard and may not work on every compiler. You can use a guard block instead, which will work on any compiler. An example of a guard block in the file MyClass.hpp would be:
#ifndef MYCLASS_HPP
#define MYCLASS_HPP
//Code here
#endif

Feature flags / toggles when artifact is a library and flags affect C or C++ headers

There exists quite a bit of discussion on feature flags/toggles and why you would use them, but most of the discussion on implementing them centers around (web or client) apps. If your product/artifact is a C or C++ library and your public headers are affected by the flags, how would you implement them?
The "naive" way of doing it doesn't really work:
/// Does something
/**
* Does something really cool
#ifdef FEATURE_FOO
* #param fooParam describe param for foo
#endif
*/
void doSomethingCool(
#ifdef FEATURE_FOO
int fooParam = 42
#endif
);
You wouldn't want to ship something like this.
Your library that you ship was built for a certain feature-flag combination; clients shouldn't need to #define the same feature flags to make things work
The ifdefs in your public header are ugly
And most importantly, if you disable your flag, you don't want clients to see anything about the disabled features - maybe it is something upcoming and you don't want to show your stuff until it is ready
Running the preprocessor on the file to get the header for distribution doesn't really work because that would not only act on feature flags but also do everything else the preprocessor does.
What would be a technical solution to this that doesn't have these flaws?
This kind of goo ends up in a codebase due to versioning. It's a broad topic with very few happy answers. But you certainly want to avoid making it more difficult than it needs to be. Focus on the kind of compatibility you want to provide.
The syntax proposed in the snippet is only required when you need binary compatibility. It keeps the library compatible with a doSomethingCool() call in the client code (passing no argument) without having to recompile that client code. In other words, the client programmer does nothing at all beyond copying the updated .dll or .so file, does not need any updated headers, and it is entirely your burden to get the feature flags right. Binary compatibility is pretty difficult to pull off reliably; beyond the flag wrangling, it is easy to make a mistake.
But what you are actually talking about is source compatibility: you do provide the user with an updated header, and he rebuilds his code to use the library update. In that case you don't need the feature flag; the C++ compiler by itself ensures that an argument is passed - it will be 42. No flag required at all, on either your end or the user's.
Another way to do it is by providing an overload; in other words, both a doSomethingCool() and a doSomethingCool(int) function. The client programmer keeps using the original overload until he's ready to move ahead. You also favor an overload when the function body has to change too much. If these functions are not virtual then this even provides link compatibility, which could be useful in some select cases. No feature flags required.
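For illustration, a minimal sketch of the overload route, reusing the hypothetical doSomethingCool from the question:
// public header: both overloads visible, no feature flags anywhere
void doSomethingCool();             // original, kept for existing clients
void doSomethingCool(int fooParam); // new feature-aware entry point

// library .cpp: the old entry point simply forwards
void doSomethingCool() { doSomethingCool(42); }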
I'd say it's a relatively broad question, but I'll throw in my two cents.
First, you really want to separate the public headers from the implementation (source files and internal headers, if any). The public header that gets installed (e.g., at /usr/include) should contain the function declarations and, preferably, a constant boolean to inform the client whether the library has a certain feature compiled in or not, like so:
#define FEATURE_FOO 1
void doSomethingCool();
Such a header is generally generated. Autotools is the de facto standard tool for this purpose on GNU/Linux. Otherwise you can write your own scripts to do this.
For completeness, in the .c file you would have the conditional version:
void doSomethingCool(
#ifdef FEATURE_FOO
int fooParam = 42
#endif
);
It's also up to your distribution tools to keep the installed headers and library binaries in sync.
Use forward declarations
Hide the implementation by using a pointer (the Pimpl idiom)
This code is quoted from the previous link:
// Foo.hpp
class Foo {
public:
    //...
private:
    struct Impl;
    Impl* _impl;
};

// Foo.cpp
struct Foo::Impl {
    // stuff
};
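For completeness, a minimal sketch of the lifecycle that usually accompanies this idiom (assuming the Foo/Impl names above; in C++11 you would typically prefer std::unique_ptr<Impl> over the raw pointer):
// Foo.hpp would additionally declare: Foo(); ~Foo();
// Foo.cpp - Impl is a complete type here, so new/delete are safe
Foo::Foo() : _impl(new Impl) {}
Foo::~Foo() { delete _impl; }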
Binary compatibility is not a forte of C++, so it probably isn't worth considering.
For C, you might construct something like an interface class, so that your first touch with the library is something like:
struct kv {
    char *tag;
    int val;
};
int Bind(struct kv *compat, void **funcs, void **stamp);
and your access to the library is now:
#define MyStrcpy(src, dest) (funcs->mystrcpy((stamp)(src),(dest)))
The contract is that Bind provides/constructs an appropriate (funcs, stamp) pair for the attribute set you provided, or fails if it cannot. Note that Bind is the only bit that has to know about multiple layouts of *funcs and *stamp, so it can transparently provide a robust interface for this reduced version of the problem.
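A hedged client-side sketch of that contract (the attribute names, array sizes, and end-of-list convention are all invented assumptions; the real conventions are up to the library):
// client's first touch with the library
struct kv want[] = {
    { (char *)"feature_foo", 1 },  // attributes this client needs
    { (char *)"abi_version", 2 },
    { nullptr, 0 }                 // assumed end-of-list marker
};
void *funcs[16];   // filled in by Bind with function pointers on success
void *stamp[16];   // opaque per-binding state, paired with funcs
if (Bind(want, funcs, stamp) != 0) {
    // Bind could not construct a (funcs, stamp) pair for this set
}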
If you wanted to get really fancy, you might be able to achieve the same by re-writing the PLT that the dlopen/dlsym prepare for you, but:
You are grossly expanding your attack surface.
You are adding a lot of complexity for very little gain.
You are adding platform / architecture specific code where none is warranted.
A few downsides remain. You have to invoke Bind before any part of your program/library attempts to use it. Attempts to solve that lead straight to hell (finding C++ static initialization order problems), which must make N. Wirth smile. If you get too clever with your Bind(), you will wish you hadn't. You might want to be careful about re-entrancy, since a given client might Bind multiple times for different attribute sets (users are such a pain).
That's how I would manage this in pure C.
First of all, the features: I would pack them into a single unsigned integer, 32 or 64 bits long, to keep them as compact as possible.
Second step: a private header, used only when compiling the library, where I would define a macro that creates both the API function wrapper and the internal function:
#define CoolFeature1 0x00000001 // a value of 0 disables the feature
#define CoolFeature2 0x00000010
#define CoolFeature3 0x00000100
.... // Other features
#define Cool (CoolFeature1 | CoolFeature2 | CoolFeature3 | ... | CoolFeature_n)
// Note: ## (not #) is needed to paste the Internal_ prefix, and the
// parameter is named CoolFlags so it doesn't collide with the Cool macro.
// Forwarding __VA_ARGS__ into the call below is illustrative only; a real
// version must separate the parameter types from their names.
#define ImplementApi(ret, fname, ...) ret fname(__VA_ARGS__) \
{ return Internal_##fname(Cool, __VA_ARGS__); } \
ret Internal_##fname(unsigned long CoolFlags, __VA_ARGS__)
#include "user_header.h" //Include the standard user header where there is no reference to Cool features
Now we have a wrapper with a standard prototype that will be available in the user definition header, and an internal version which keeps an additional flag group to specify optional features.
When coding using the macro you can write:
ImplementApi(int, MyCoolFunction, int param1, float param2, ...)
{
    // Your code goes here
    if (Cool & CoolFeature2)
    {
        // Do something cool
    }
    else
    {
        // Flat life ...
    }
    ...
    return 0;
}
In the case above you'll get two definitions:
int Internal_MyCoolFunction(unsigned long Cool, int param1, float param2, ...);
int MyCoolFunction(int param1, float param2, ...)
You can eventually add to the macro, for the API function, the attributes for export if you're distributing a dynamic library.
You can even use the same definition header if the ImplementApi macro is defined on the compiler command line; in that case the following simple definition in the header will do:
#define ImplementApi(ret, fname, ...) ret fname(__VA_ARGS__);
This last variant generates only the exported API prototypes.
This suggestion, of course, is not exhaustive. There are a lot more adjustments you can make to render the definitions more elegant and automatic, e.g. including a sub-header with a function list to create only the API function prototypes for users, and both, internal and API, for developers.
Why are you using defines for feature flags? Feature flags are supposed to let you turn features on and off at runtime, not at compile time.
In the code you would then branch the implementation as early as possible, using interfaces and concrete classes that are chosen based on the feature flag.
If users of the header files aren't supposed to be able to access the feature flags, then create header files that you don't distribute and that are only included in the implementation .c/.cpp files. You can then flip the flags in the private headers when you compile the library that they link to.
If you are keeping features internal until you are ready to release, you can move the feature flag into the public header, or just remove the feature flag entirely and switch to using the new implementation.
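A hedged sketch of that runtime approach (the Widget interface and the flag name are invented for illustration):
#include <memory>

struct Widget {                       // public interface
    virtual ~Widget() = default;
    virtual void Render() = 0;
};
struct OldWidget : Widget { void Render() override { /* regular impl */ } };
struct NewWidget : Widget { void Render() override { /* feature-1 path */ } };

// chosen once, early, based on the runtime feature flag
std::unique_ptr<Widget> MakeWidget(bool feature1Enabled)
{
    if (feature1Enabled) return std::make_unique<NewWidget>();
    return std::make_unique<OldWidget>();
}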
Sloppy example if you want this compile time:
public_class.h
class Thing
{
public:
    void DoSomething();
};
private_class_feature1.h
#define USE_FEATURE_1
class NewFeatureImpl
{
public:
    static void CoolNewWay1();
};
public_class.cpp
#include "public_class.h"
#include "private_class_feature1.h"
void Thing::DoSomething()
{
#ifdef USE_FEATURE_1
    NewFeatureImpl::CoolNewWay1();
#else
    // Regular impl
#endif
}

How to avoid long compilation time for #define in common header

I was wondering if there is an elegant way to solve this problem. Suppose there's a common header, e.g.
// common.h
#ifndef COMMON_H
#define COMMON_H
#define ENABLE_SOMETHING
//#define ENABLE_SOMETHING_ELSE
#define ENABLE_WHATEVER
// many others
#endif
Now this file is included by, let's say, 100 other header files, and the various #defines are used to enable or disable some parts of code which are confined to just 1-2 files.
Every time a single #define is changed, the whole project seems to be rebuilt (I'm working in Xcode 5.1), which makes sense, as it must literally be replaced all around the code and the compiler can't know a priori where it's used.
I'm trying to find a better way to manage this, to avoid long compilation times, as these defines are indeed changed many times. Splitting each define into its corresponding file/files could be a solution, but I'd like a practical way to have everything packed together.
So I was wondering if there is a pattern which is usually used to solve this problem. I was thinking about having
// common.h
class Enables
{
public:
    static const bool feature;
};

// common.cpp
const bool Enables::feature = false;
Will this be semantically equivalent when compiling an optimized binary (i.e., will code inside blocks guarded by the false constant disappear entirely)?
You have two distinct problems here:
Splitting each define into its corresponding file/files could be a solution, but I'd like a practical way to have everything packed together.
This is your first problem. If I understand correctly: if you have more than one functional area, you don't want to have to include a header for each of them (you want a single header for everything).
Apply these steps:
do split the code by functionality into different headers; each header should contain (at most) what is enabled by a single #define FEATURESET (and be completely agnostic to the existence of the FEATURESET macro)
ensure each header is only compiled once (add #pragma once at the beginning of each feature header file)
add a convenience header file that performs #if or #ifdef based on your defined features, and includes the feature files as required:
// parsers.h
// this shouldn't be here: #pragma once
#ifdef PARSEQUUX_SAFE
#include <QuuxSafe.h>
#elif defined PARSEQUUX_FAST
#include <QuuxFast.h>
#else
#include <QuuxSafe.h>
#endif
// eventually configure static/global class factory here
// see explanation below for mentions of class factory
Client code:
#include <parsers.h> // use default Quux parser
#define PARSEQUUX_SAFE
#include <parsers.h> // use safe (but slower) Quux parser
So I was wondering if there is a pattern which is usually used to solve this problem
This is your second problem.
The canonical way to enable functionality by feature in C++, is to define feature API, in terms of base classes, class factories and programming to a generic interface.
// common.h
#pragma once
#include <memory>  // for std::unique_ptr
#include <Quux.h>  // base Quux class
struct QuuxFactory
{
    enum QuuxType { Simple, Feathered };
    static std::unique_ptr<Quux> CreateQuux(int arg);
    static QuuxType type;
};
// common.cpp:
#include <common.h>
#include <SimpleQuux.h>    // SimpleQuux: public Quux
#include <FeatheredQuux.h> // FeatheredQuux: public Quux
QuuxFactory::QuuxType QuuxFactory::type = QuuxFactory::Simple; // static member definition
std::unique_ptr<Quux> QuuxFactory::CreateQuux(int arg)
{
    switch (type) {
    case Simple:
        return std::unique_ptr<Quux>{new SimpleQuux{arg}};
    case Feathered:
        return std::unique_ptr<Quux>{new FeatheredQuux{arg}};
    }
    // TODO: handle errors
    return nullptr;
}
Client code:
// configure behavior:
QuuxFactory::type = QuuxFactory::Feathered;
// ...
auto quux = QuuxFactory::CreateQuux(10); // creates a FeatheredQuux in this case
This has the following advantages:
it is straightforward and uses no macros
it is reusable
it provides an adequate level of abstraction
it uses no macros (as in "at all")
the actual implementations of the hypothetical Quux functionality are only included in one file (as an implementation detail, compiled only once). You can include common.h wherever you want and it will not include SimpleQuux.h and FeatheredQuux.h at all.
As a generic guideline, you should write your code, such that it requires no macros to run. If you do, you will find that any macros you want to add over it, are trivial to add. If instead you rely on macros from the start to define your API, the code will be unusable (or close to unusable) without them.
There is a way to split defines but still use one central configuration header.
main_config.h (it must not have an include guard or #pragma once, because that would cause strange results if main_config.h is included more than once in one compilation unit):
#ifdef USES_SOMETHING
#include "something_config.h"
#endif
#ifdef USES_WHATEVER
#include "whatever_config.h"
#endif
something_config.h (must not have include guards for the same reason as main_config.h):
#define ENABLE_SOMETHING
All source and header files would #include only main_config.h, but before the include they must declare which part of it they will be referring to:
some_source.cpp:
#define USES_SOMETHING
#include "main_config.h"
some_other_file.h:
#define USES_WHATEVER
#include "main_config.h"

In C/C++, is there a directive similar to #ifndef for typedefs?

If I want to define a value only if it is not already defined, I do something like this:
#ifndef THING
#define THING OTHER_THING
#endif
What if THING is a typedef'd identifier, and not defined? I would like to do something like this:
#ifntypedef thing_type
typedef uint32_t thing_type;
#endif
The issue arose because I wanted to check to see if an external library has already defined the boolean type, but I'd be open to hearing a more general solution.
There is no such thing in the language, nor is it needed. Within a single project you should never have the same typedef alias referring to different types, as that is a violation of the one-definition rule; and if you are going to create the same alias for the same type, then just do it. The language allows you to repeat the same typedef as many times as you wish, and will usually catch that particular violation within the same translation unit:
typedef int myint;
typedef int myint; // OK: myint is still an alias to int
//typedef double myint; // Error: myint already defined as alias to int
If what you are intending to do is implementing a piece of functionality for different types by using a typedef to determine which to use, then you should be looking at templates rather than typedefs.
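To make the "templates rather than typedefs" suggestion concrete, an invented sketch: instead of one thing_type alias that every translation unit must agree on, the implementation is written once and instantiated per type:
#include <cstdint>

template <typename ThingType>
ThingType next_thing(ThingType t)  // one implementation for any integer-like type
{
    return static_cast<ThingType>(t + 1);
}

// usage:
// std::uint32_t a = next_thing<std::uint32_t>(7u);
// std::uint16_t b = next_thing<std::uint16_t>(7);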
C++ does not provide any mechanism for code to test for the presence of a typedef; the best you can have is something like this:
#ifndef THING_TYPE_DEFINED
#define THING_TYPE_DEFINED
typedef uint32_t thing_type;
#endif
EDIT:
As @David correctly points out in his comment, this answers the how? but importantly misses the why? It can be done in the way shown above, if you want to do it at all, but you probably don't need to do it anyway; @David's answer and comment explain the details, and I think that answers the question correctly.
No, there is no such facility in C++ at the preprocessing stage. The most you can do is:
#ifndef thing_type
#define thing_type uint32_t
#endif
Though this is not a good coding practice and I don't suggest it.
Preprocessor directives (like #define) are crude text replacement tools, which know nothing about the programming language, so they can't act on any language-level definitions.
There are two approaches to making sure a type is only defined once:
Structure the code so that each definition has its place, and there's no need for multiple definitions
#define a preprocessor macro alongside the type, and use #ifndef to check for the macro definition before defining the type.
The first option will generally lead to more maintainable code. The second could cause subtle bugs, if you accidentally end up with different definitions of the type within one program.
As others have already said, there is no such thing, but if you try to create an alias to a different type, you'll get a compilation error:
typedef int myInt;
typedef int myInt; // ok, same alias
typedef float myInt; // error
However, there is a tool called ctags for finding where a typedef is defined.
The problem is actually a real PITA, because some APIs or SDKs redefine commonly used things. I had an issue where the header files for a piece of map-processing software (GIS) redefined TRUE and FALSE (generally used by the Windows SDK) as integer literals instead of the true and false keywords (obviously, that can break SOMETHING). And yes, the famous joke #define true false is relevant.
A #define can never "feel" a typedef or constant declared in C/C++ code, because the preprocessor doesn't analyze code; it only scans for # directives, and it modifies the code before handing it to the syntax analyzer. So, in general, it's not possible.
https://msdn.microsoft.com/en-us/library/5xkf423c.aspx?f=255&MSPPError=-2147217396
That one isn't portable so far, though there have been requests to implement it in GCC. I think it also counts as an "extension" in MSVC. It's a compiler statement, not a preprocessor statement, so it will not "feel" defined macros; it detects only typedefs outside of a function body. "Full type" there means that it reacts to a full definition, ignoring declarations like class SomeClass;. Use it at your own risk.
Edit: apparently it is also supported on macOS now, and by the Intel compiler with the -fms-dialect flag (AIX/Linux?).
This might not directly answer the question, but serve as a possible solution to your problem.
Why not try something like this?
#define DEFAULT_TYPE int // just for argument's sake
#ifndef MY_COOL_TYPE
#define MY_COOL_TYPE DEFAULT_TYPE
#endif
typedef MY_COOL_TYPE My_Cool_Datatype_t;
Then if you want to customize the type, you can either define MY_COOL_TYPE somewhere above this (like in a "configure" header that is included at the top of this header) or pass it as a command line argument when compiling (as far as I know you can do this with GCC and LLVM, maybe others, too).
No, there is nothing like what you want. I have had the same problem with libraries that include their own typedefs for things like bool. It gets to be a problem when they just don't care about what you use for bool, or whether any other libs might be doing the same thing!!
So here's what I do. I edit the header file for the libs that do such things, find the typedef bool, and add some code like this:
#ifdef USE_LIBNAME_BOOL
typedef unsigned char bool; // This is the lib's bool implementation
#else
#include <stdbool.h>
#endif
Notice that I include <stdbool.h> if I don't want to use the lib's own bool typedef. This means that you need C99 support or later.
As mentioned before this is not included in the C++ standard, but you might be able to use autotools to get the same functionality.
You could use the ac_cxx_bool macro to make sure bool is defined (or different routines for different datatypes).
The solution I ended up using was including stdbool.h. I know this doesn't solve the question of how to check if a typedef is already defined, but it does let me ensure that the boolean type is defined.
This is a good question. C and Unix have a history together, and there are a lot of Unix C typedefs not available on a non-POSIX platform such as Windows (shhh Cygwin people). You'll need to decide how to answer this question whenever you're trying to write C that's portable between these systems (shhhhh Cygwin people).
If cross-platform portability is what you need this for, then knowing the platform-specific preprocessor macro for the compilation target is sometimes helpful. E.g., Windows has the _WIN32 preprocessor macro defined - it's 1 whenever the compilation target is 32-bit ARM, 64-bit ARM, x86, or x64. But its presence also tells us that we're on a Windows machine, which means that e.g. ssize_t won't be available (ssize_t, not size_t). So you might want to do something like:
#ifdef _WIN32
typedef long ssize_t;
#endif
By the way, people in this thread have commented about a similar pattern that is formally called a guard. You see it in header files (i.e. interfaces or ".h" files) a lot to prevent multiple inclusion. You'll hear about header guards.
/// @file poop.h
#ifndef POOP_H
#define POOP_H
void* poop(Poop* arg);
#endif
Now I can include the header file in the implementation file poop.c and some other file like main.c, and I know they will always compile successfully and without multiple inclusion, whether they are compiled together or individually, thanks to the header guards.
Salty seadogs write their header guards programmatically or with C++11 function-like macros. If you like books I recommend Jens Gustedt's "Modern C".
It is not transparent, but you can try to compile the code once without the typedef (just using the alias) and see if it compiles or not.
There is no such thing.
On some compilers it is possible to deactivate the duplicate-typedef compiler error ("typedef name has already been declared (with same type)").
On the other hand, for some standardized typedefs there is often a companion preprocessor macro, like __bool_true_false_are_defined for bool, that can be tested instead.

What is the purpose of the #define directive in C++?

What is the role of the #define directive?
#define is used to create macros in C and in C++. You can read more about it in the C preprocessor documentation. The quick answer is that it does a few things:
Simple Macros - basically just text replacement. Compile time constants are a good example:
#define SOME_CONSTANT 12
simply replaces the text SOME_CONSTANT with 12 wherever it appears in your code. This sort of macro is often used to provide conditional compilation of code blocks. For example, there might be a header included by each source file in a project with a list of options for the project:
#define OPTION_1
#define OPTION_2
#undef OPTION_3
And then code blocks in the project would be wrapped with matching #ifdef/#endif blocks to enable and disable those options in the finished project. Using gcc's -D flag provides similar behaviour. There are strong opinions as to whether or not this method is really a good way to provide configuration for an application, however.
Macros with arguments - allows you to make 'function-like' macros that can take arguments and manipulate them. For example:
#define SQUARE(x) ((x) * (x))
would expand to the square of the argument; be careful about potential order-of-operations or side-effect problems! The following example:
int x = SQUARE(3); // becomes int x = ((3) * (3));
works fine, but something like:
int y = SQUARE(f()); // becomes int y = ((f()) * (f()));
will call f() twice, or even worse:
int z = SQUARE(x++); // becomes int z = ((x++) * (x++));
results in undefined behaviour!
Macros with arguments can also be variadic (standardized in C99 and C++11), which can come in handy.
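For example, a tiny variadic-macro sketch (the LOG name is invented for illustration):
#include <cstdio>

// __VA_ARGS__ stands in for whatever arguments the caller supplied
#define LOG(...) std::fprintf(stderr, __VA_ARGS__)

// usage: LOG("x=%d at %s:%d\n", x, __FILE__, __LINE__);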
As mentioned below in the comments, overuse of macros, or the development of overly complicated or confusing macros is considered bad style by many - as always, put the readability, maintainability, and debuggability of your code above 'clever' technical tricks.
#define (and its opposite, #undef) can be used to set preprocessor symbols which can then be tested with #ifdef or #ifndef. This allows custom behaviors to be defined within the source file. It's commonly used to compile for different environments or to enable debug code.
An example:
#define DEBUG
#ifdef DEBUG
//perform debug code
#endif
The most common use (by far) of #define is for include guards:
// header.hh
#ifndef HEADER_HH_
#define HEADER_HH_
namespace pony {
// ...
}
#endif
Another common use of #define is in creating a configuration file, commonly a config.h file, where we #define macros based on various states and conditions. Then, in our code we test these macros with #ifdef, #elif defined() etc. to support different compiles for different situations. This is not as solid as the include-guard idiom, and you need to be careful here, because if the branching is wrong you can get very obscure compiler errors or, worse, incorrect runtime behavior.
In general, other than for include guards, you need to think the problem through (twice, preferably) and see if you can use the compiler rather than the preprocessor to solve it. The compiler is just smarter than the preprocessor. Not only that: the compiler can't possibly confuse the preprocessor, whereas the preprocessor most definitely can confuse and mislead the compiler.
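A minimal sketch of that config.h pattern (the HAVE_ names follow a common convention but are assumptions here):
// config.h - typically generated by the build system
#define HAVE_SYS_TIME_H 1
/* #undef HAVE_WINDOWS_H */

// some_source.cpp
#include "config.h"
#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#elif defined(HAVE_WINDOWS_H)
#include <windows.h>
#else
#error "no usable time header found"
#endif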
The #define directive has two common uses.
The first one is to control how the compiler acts. To do this, we also need #undef, #ifdef and #ifndef (and #endif too...).
You can make "compiler logic" this way. A common use is to activate or deactivate a debug portion of the code, like this:
#ifdef DEBUG
//debug code here
#endif
And you would then be able to compile the debug code, for example, by writing #define DEBUG.
Another use of this logic is to avoid double includes.
Example: file A #includes files B and C, but file B also includes C. This will likely result in a compilation error, because "C" exists twice.
The solution is to write:
#ifndef C_FILE_INCLUDED
#define C_FILE_INCLUDED
//the contents of header "c" go here.
#endif
The other use of #define is to make macros.
The simplest ones consist of simple substitutions, like:
#define PI 3.14159265
float perimeter(float radius) {
return radius*2*PI;
}
or
#define SHOW_ERROR_MESSAGE printf("A serious error happened");
if ( 1 != 1 ) { SHOW_ERROR_MESSAGE }
Then you can also make macros that accept arguments; some standard-library "functions" (getc, for example) are allowed to be implemented as macros created with a #define in a header file.
But this should not be done, for two reasons:
first, the speed of macros is the same as that of inline functions, and second, we have C++ templates, which allow more control over functions with variable types. So the only reason to use macros with arguments is to make strange constructs that will be hard to understand later, like metaprogrammed stuff...
In C++, #define has very narrow, specialized roles:
Header guards, described in other answers
Interacting with the standard libraries. For instance, #defining NOMINMAX before including windows.h turns off the often-problematic min and max macros (and WIN32_LEAN_AND_MEAN trims the rarely-used parts of the header).
Advanced macros involving stringization (i.e., macros that print debugging messages) or token-pasting.
You should avoid using #define for the following purposes. The reasons are many; see for instance this FAQ entry.
Compile-time constants. Use const instead.
Simple macro functions. Use inline functions and templates instead (a short sketch follows below).
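A short sketch of both replacements (the names are invented):
#include <cstddef>

// instead of: #define BUFFER_SIZE 512
const std::size_t BufferSize = 512;        // typed and scoped

// instead of: #define SQUARE(x) ((x) * (x))
inline int square(int x) { return x * x; } // evaluates x exactly once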
In C or C++, #define allows you to create preprocessor macros.
In the normal C or C++ build process the first thing that happens is that the preprocessor runs; the preprocessor looks through the source files for preprocessor directives like #define or #include and then performs simple operations with them.
In the case of a #define directive, the preprocessor does simple text-based substitution.
For example if you had the code
#define PI 3.14159f
float circum = diameter*PI;
the preprocessor would turn it into:
float circum = diameter* 3.14159f;
by simply replacing the instances of PI with the corresponding text. This is only the simplest form of a #define statement; for more advanced uses, check out this article from MSDN.
inCorrectUseOfHashDefine()
{
The role of #define is to baffle people who inherit your code with out of the blue statements like:
foreverandever
because of:
#define foreverandever for(;;)
}
Please favour constants over #define.
It's also used for setting compiler directives...
Most things about #define have already been covered, but it hasn't been made clear that C++ has better replacements for most of its uses:
Using #define to define numerical constants can easily be replaced by a const "variable" which, like a #define, doesn't really exist in the compiled executable. AFAIK it can be used in almost all the situations where you could use a #defined numerical constant, including array bounds. The main advantage for me is that such constants are clearly typed, so there's no need to add casts in macros "just to be sure", and they are scoped, so they can be kept in namespaces/classes/functions without polluting the whole application.
const int max_array_size=50;
int an_array[max_array_size];
Using #define to create macros: macros can often be replaced by templates; for example, the dreaded MAX macro
#define MAX(a,b) ((a)<(b)?(b):(a))
, which has several downsides (e.g., repeated argument evaluation, unavoidable inline expansion), can be replaced by the max function
template<typename T> T & max(T & a, T & b)
{
return a<b?b:a;
}
which can be type-safe (in this version the two arguments are forced to be of the same type), can be expanded inline as well as not (it's compiler decision), evaluates the arguments just once (when it's called), and is scoped. A more detailed explanation can be found here.
Still, macros must be used for include guards, and for creating certain kinds of strange language extensions that expand to more lines of code, that have unbalanced parentheses, etc.