Is there any way to use multiple precompiled headers simultaneously with Clang?

Is there any way to use multiple precompiled headers simultaneously with Clang? - c++

I am playing around with a clang++ command line in order to learn how precompiled headers work. I've got something like this:
clang++ [several lines of options] sourcefile.cpp -o sourcefile.o -include durr.h -include hurr.h
where the two headers included via the command line have been precompiled into corresponding .h.pch files.
If I "-include" just one of the two headers, compilation succeeds and is faster than it is when I include neither, in the normal fashion for precompiled headers. But when I include both (as above), I get this error:
clang: warning: precompiled header 'hurr.h.pch' was ignored because '-include hurr.h' is not first '-include'
Is there any way (not necessarily using -include) to use multiple .h.pch precompiled header files to speed compilation of one .cpp file? I understand that such a feature would be seriously complicated by the tendency of the preprocessor to cause headers to affect one another (even if only via include guards). I don't really expect what I want to be supported, now that I've thought about it a little. But I'm trying to confirm here. The above error message is suggestive but not comprehensive, and the Clang user manual didn't seem to tell me the answer....

It turns out there is: Clang supports "chained" PCH files, which is a feature that allows one PCH to represent an extension of another one. If the latter is included during some later compilation, then both it and the PCH that it depends on will be used to speed compilation. I think.
Something like this might produce an ordinary PCH:
clang++ -x c++-header header1.h -o header1.h.pch
And, if I'm understanding correctly (questionable), then something like this would produce a chained PCH that would extend header1.h.pch:
clang++ -x c++-header header2.h -o header2.h.pch -include-pch header1.h.pch
And then that chain of two PCH files can be used to speed compilation like so:
clang++ source.cpp -o source.o -include-pch header2.h.pch
(The parent PCH does not need to be mentioned in the command; header2.h.pch already knows where to find it I think.)
I haven't found any way to explicitly demand this sort of "chaining" via the command line. Simply including a PCH when compiling another PCH seems to produce a chained PCH... probably. My evidence that it does is mainly that in cases where header2.h actually includes header1.h, this technique seems to produce a small header2.h.pch and a large header1.h.pch, even though one would expect that if chaining weren't happening then header2.h.pch would generally be larger than header1.h.pch, since it contains at least as much information. This seems to match what I understand the purpose of PCH chaining to be: It saves resources by storing a dependent PCH's duplicate information as a cheap reference to the contents of another PCH.
My casual exploration suggests that a dependent PCH may itself have another one depending on it, extending the chain to three or more steps, though I'm not certain of this. When I tried to extend a chain involving real code by something like fifteen or twenty links, the PCH file sizes eventually blew up in what appeared to be an erroneous way, approximately doubling with each step. This eventually produced 300 MB PCH files for headers that, if compiled into PCHs without any attempt at chaining, would produce files smaller than 6 MB in size. This happened in Clang 3.6.something, and in Clang 3.7.0. I imagine it's a bug but who knows. I gave up on my exploration before reaching the point where I cared to try to pin it down and report it. And maybe the PCH chains aren't intended to ever grow very long anyway. This feature doesn't seem to be something people normally use....
Regardless, there seemed to be no way to do what I really wanted: mixing any two arbitrary PCHs, as long as they didn't directly conflict with one another, regardless of how they had been created. Chaining only allows two PCHs to be used together if one depends on the other. I had been interested in speeding compilation by making a PCH for each header in a project and then mixing groups of PCHs together during compilation as appropriate. But accomplishing this with chained PCHs seems to require making a tree of PCH files in which more than one PCH may correspond to a single header. I actually attempted to generate such a thing automatically, and seemed to succeed... but the above-mentioned "error" (if error it was) bogged me down, and to the extent that I did succeed, the time savings were not impressive enough to warrant continuing.
There is also some kind of "module" system in Clang that may be relevant here. But I have the impression that trying to exploit this to achieve the effect I want, if it could even be successful at all, would probably require me to change my source code to something special, and maybe to something non-standard. I didn't look into it much though, so maybe not. But anyway I guess standard C++ will probably get modules eventually and then all of this mess will (I hope) become a thing of the past.
GCC did not seem to support anything related to what I wanted, incidentally.

Related

How can I tell if two source files produce functionally identical code?

I'm using uncrustify to format a directory full of C and C++ code. I need to ensure that uncrustify won't change the resulting code; I can't do a diff on the object file or binaries because the object files have a timestamp and so won't ever be identical. I can't check the source of the files one by one because I'd be here for years.
The project uses make for the build process so I was wondering if there is some way to output something there that could be checked.
I've searched SO and Google to no avail, so my apologies if this is a duplicate.
EDIT: I'm using gcc/g++ and compiling for 32 bit.

One possibility would be to compile them with CLang, and get the output as LLVM IR. If memory serves, this should be command line arguments of -S -emit-llvm.
To do the same with gcc/g++, you can use one of its flags to generate a file containing its intermediate representation at some stage of compilation. Early stages will still show differences from changes in white space and such, but a quick test indicates that by the SSA stage, such non-operational changes have disappeared from the IR.
g++ -c -fdump-tree-ssa foo.cpp
In addition to the normal object file, this will produce a file named foo.cpp.018t.ssa that represents the semantic actions in your source file.
As noted above, I haven't tested this extensive though--it's possible that at this stage, some non-operational changes will still produce different output files (though I kind of doubt it). If necessary, you can use -fdump-tree-all to get output from all stages of compilation1. As a simple rule of thumb, I'd expect later stages to be more immune to changes in formatting and such, so if the ssa stage doesn't work, my next choice would probably be the optimized stage, which is one of the last stages (note: the files produced are numbered in order of the stage that produced each file, so when you dump all stages, it's obvious which are produced by early stages and which by later stages).
1. Note that this produces quite a few files, many of them quite large. The first time you do this, you probably want to do it on a single source file in a directory by itself to keep from drowning in files, so to speak. Also, don't be surprised when compilation this way takes quite a bit longer than normal.

Fast way to identify necessary includes for C++ [duplicate]

This question already has answers here:
Is there anyway to figure out what STL header file has not been included directly?
(2 answers)
Closed 9 years ago.
On Linux, what is a fast way to identify what are the necessary #include statements that I need for a C++ project?
I mean, let's say someone gives you a snippet from the web, but fails to provide the necessary #include statements. Is there potentially a way where you can run a Linux command or compiler command option and identify which functions or classes are missing, and, as a bonus, identify on the hard drive where I might have these things in a header file.

Basically you need some analyzer to parse your sources and headers and build a complete dependency graph which it spits out in the end for you to read and process further.
I'd follow john's advice on g++ and Clang for this purpose but I highly doubt they got what it takes.
What you actually can do, at least with g++, is print out a graph for already existing includes. Use the -H option to print a tree or -M to get a list.
I also refer you to this related topic: Tool to track #include dependencies
Not exactly what you want, but the tools mentioned there might be helpful.

I think Clang's "include-what-you-use" tool is what you want.

If, by necessary you mean minimal (i.e. if A includes B and B includes C then A doesn't need to include C) I don't know of a fast way.
One good approach, however, is for each cpp file to include its own header file first (after any precompiled headers.) That insures that each header file includes (directly or indirectly) all the header files it needs to define the symbols used in the header.
Also a project of reasonable size should be designed in layers such that Layer A knows about/depends on layer B which depends on layer C, etc, but lower layers never include higher layers (i.e. C never includes anything from layer A)
In that case the includes in each cpp or hpp should be in Layer order (A, B, C). If you do this it is fairly easy to check to see if any of the layer C headers can be eliminated (comment them out temporarily) because one of the includes that comes before them has already included them. This happens quite a lot and can significantly reduce the number of #includes in each file.
Having said all of that, this is a much less critical issue than it used to be because compilers are smarter. A combination of #pragma once and precompiled headers can keep build times down without requiring that you spend a lot of time optimizing includes.

The best way I know of to find undefined identifiers in a program is just to try to compile it. Depending on exactly what compiler you’re using, you might be able simply to pipe the output of GCC or Clang into grep, looking for phrases like “undeclared identifier.”
As for determining where the symbols are defined, I would recommend as a starting point looking at Ctags to parse your system headers (best managed using a Makefile) and using the resulting tags table to look up anything grep catches from GCC.

The fastest way... that's not how you should think of it.
https://stackoverflow.com/a/18544093/2112028
I wrote a lovely (I'm quite proud :P) answer there talking about how linking works (with templates) and proving it works and such, understand that.
The goal of #include directives is to create a "translation unit" where every symbol is declared (even if not defined) there's an example in my answer where I simply copy and paste the prototype into a code file, rather than use include.
You ought not worry about the "fastest" way if you use something called "Header guards" (these are mentioned briefly right at the bottom, but this isn't sufficient detail) they go like this:
#ifndef __WHATEVER_H
#define __WHATEVER_H
/*Your code here*/
#endif
So now you can include "whatever.h" AS MANY times as you like. the first time IN THE TRANSLATION UNIT, will define __WHATEVER_H, so the next file that includes it (however many includes deep from the file being compiled) will be empty. as everything between the #ifndef and #endif will be gone.
Hope this helps.
Also if you have unnecessary inputs, use -Wextra and -Wall, GCC will tell you about unused functions, typedefs and so forth. you can use the pragma error push and pop things to control this. For example wxWidget's header files may contain a lot of unused things, so you push the warnings onto the stack, remove the unused warning flags, include the file, pop the warnings stack (turning them back on), less you get thousands of lines of warnings.

Protection against accidental object incompatibility?

TL;DR
Protection against binary incompatibility resulting from compiler argument typos in shared, possibly templated headers' preprocessor directives, which control conditional compilation, in different compilation units?
Ex.
g++ ... -DYOUR_NORMAl_FLAG ... -o libA.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMA1_FLAG ... -o libB.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMAI_FLAG ... main.cpp libA.so //The possibilities!
The Basic Story
Recently, I ran into a strange bug: the symptom was a single SIGSEGV, which always seemed to occur at the same location after recompling. This led me to believe there was some kind of memory corruption going on, and the actual underlying pointer is not a pointer at all, but some data section.
I save you from the long and strenuous journey taking almost two otherwise perfectly good work days to track down the problem. Sufficient to say, Valgrind, GDB, nm, readelf, electric fence, GCC's stack smashing protection, and then some more measures/methods/approaches failed.
In utter devastation, my attention turned to the finest details in the build process, which was analogous to:
Build one small library.
Build one large library, which uses the small one.
Build the test suite of the large library.
Only in case when the large library was used as a static, or a dynamic library dependency (ie. the dynamic linker loaded it automatically, no dlopen)
was there a problem. The test case where all the code of the library was simply included in the tests, everything worked: this was the most important clue.
The "Solution"
In the end, it turned out to be the simplest thing: a single (!) typo.
Turns out, the compilation flags differed by a single char in the test suite, and the large library: a define, which was controlling the behavior of the small library, was misspelled.
Critical information morsel: the small library had some templates. These were used directly in every case, without explicit instantiation in advance. The contents of one of the templated classes changed when the flag was toggled: some data fields were simply not present in case the flag was defined!
The linker noticed nothing of this. (Since the class was templated, the resultant symbols were weak.)
The code used dynamic casts, and the class affected by this problem was inheriting from the mangled class -> things went south.
My question is as follows: how would you protect against this kind of problem? Are there any tools or solutions which address this specific issue?
Future Proofing
I've thought of two things, and believe no protection can be built on the object file level:
1: Save options implemented as preprocessor symbols in some well defined place, preferably extracted by a separate build step. Provide check script which uses this to check all compiler defines, and defines in user code. Integrate this check into the build process. Possibly use Levenshtein distance or similar to check for misspellings. Expensive, and the script / solution can get complicated. Possible problem with similar flags (but why have them?), additional files must accompany compiled library code. (Well, maybe with DWARF 2, this is untrue, but let's assume we don't want that.)
2: Centralize build options: cheap, customization option left open (think makefile.local), but makes monolithic monstrosities, strong project couplings.
I'd like to go ahead and quench a few likely flame inducing embers possibly flaring up in some readers: "do not use preprocessor symbols" is not an option here.
Conditional compilation does have it's place in high performance code, and doing everything with templates and enable_if-s would needlessly overcomplicate things. While the above solution is usually not desirable it can arise form the development process.
Please assume you have no control over the situation, assume you have legacy code, assume everything you can to force yourself to avoid side-stepping.
If those won't do, generalize into ABI incompatibility detection, though this might escalate the scope of the question too much for SO.
I'm aware of:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
DT_SONAME is not applicable.
Other version schemes therein are not applicable either - they were designed to protect a package which is in itself not faulty.
Mixing C++ ABIs to build against legacy libraries
Static analysis tool to detect ABI breaks in C++

If it matters, don't have a default case.
#ifdef YOUR_NORMAL_FLAG
// some code
#elsif YOUR_SPECIAL_FLAG
// some other code
#else
// in case of a typo, this is a compilation error
# error "No flag specified"
#endif
This may lead to a large list of compiler options if conditional compilation is overused, but there are ways around this like defining config-files
flag=normal
flag2=special
which get parsed by build scripts and generate the options and can possibly check for typos or could be parsed directly from the Makefile.

What techniques can be used to speed up C++ compilation times?

What techniques can be used to speed up C++ compilation times?
This question came up in some comments to Stack Overflow question C++ programming style, and I'm interested to hear what ideas there are.
I've seen a related question, Why does C++ compilation take so long?, but that doesn't provide many solutions.

Language techniques
Pimpl Idiom
Take a look at the Pimpl idiom here, and here, also known as an opaque pointer or handle classes. Not only does it speed up compilation, it also increases exception safety when combined with a non-throwing swap function. The Pimpl idiom lets you reduce the dependencies between headers and reduces the amount of recompilation that needs to be done.
Forward Declarations
Wherever possible, use forward declarations. If the compiler only needs to know that SomeIdentifier is a struct or a pointer or whatever, don't include the entire definition, forcing the compiler to do more work than it needs to. This can have a cascading effect, making this way slower than they need to be.
The I/O streams are particularly known for slowing down builds. If you need them in a header file, try #including <iosfwd> instead of <iostream> and #include the <iostream> header in the implementation file only. The <iosfwd> header holds forward declarations only. Unfortunately the other standard headers don't have a respective declarations header.
Prefer pass-by-reference to pass-by-value in function signatures. This will eliminate the need to #include the respective type definitions in the header file and you will only need to forward-declare the type. Of course, prefer const references to non-const references to avoid obscure bugs, but this is an issue for another question.
Guard Conditions
Use guard conditions to keep header files from being included more than once in a single translation unit.
#pragma once
#ifndef filename_h
#define filename_h
// Header declarations / definitions
#endif
By using both the pragma and the ifndef, you get the portability of the plain macro solution, as well as the compilation speed optimization that some compilers can do in the presence of the pragma once directive.
Reduce interdependency
The more modular and less interdependent your code design is in general, the less often you will have to recompile everything. You can also end up reducing the amount of work the compiler has to do on any individual block at the same time, by virtue of the fact that it has less to keep track of.
Compiler options
Precompiled Headers
These are used to compile a common section of included headers once for many translation units. The compiler compiles it once, and saves its internal state. That state can then be loaded quickly to get a head start in compiling another file with that same set of headers.
Be careful that you only include rarely changed stuff in the precompiled headers, or you could end up doing full rebuilds more often than necessary. This is a good place for STL headers and other library include files.
ccache is another utility that takes advantage of caching techniques to speed things up.
Use Parallelism
Many compilers / IDEs support using multiple cores/CPUs to do compilation simultaneously. In GNU Make (usually used with GCC), use the -j [N] option. In Visual Studio, there's an option under preferences to allow it to build multiple projects in parallel. You can also use the /MP option for file-level paralellism, instead of just project-level paralellism.
Other parallel utilities:
Incredibuild
Unity Build
distcc
Use a Lower Optimization Level
The more the compiler tries to optimize, the harder it has to work.
Shared Libraries
Moving your less frequently modified code into libraries can reduce compile time. By using shared libraries (.so or .dll), you can reduce linking time as well.
Get a Faster Computer
More RAM, faster hard drives (including SSDs), and more CPUs/cores will all make a difference in compilation speed.

I work on the STAPL project which is a heavily-templated C++ library. Once in a while, we have to revisit all the techniques to reduce compilation time. In here, I have summarized the techniques we use. Some of these techniques are already listed above:
Finding the most time-consuming sections
Although there is no proven correlation between the symbol lengths and compilation time, we have observed that smaller average symbol sizes can improve compilation time on all compilers. So your first goals it to find the largest symbols in your code.
Method 1 - Sort symbols based on size
You can use the nm command to list the symbols based on their sizes:
nm --print-size --size-sort --radix=d YOUR_BINARY
In this command the --radix=d lets you see the sizes in decimal numbers (default is hex). Now by looking at the largest symbol, identify if you can break the corresponding class and try to redesign it by factoring the non-templated parts in a base class, or by splitting the class into multiple classes.
Method 2 - Sort symbols based on length
You can run the regular nm command and pipe it to your favorite script (AWK, Python, etc.) to sort the symbols based on their length. Based on our experience, this method identifies the largest trouble making candidates better than method 1.
Method 3 - Use Templight
"Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantiation process".
You can install Templight by checking out LLVM and Clang (instructions) and applying the Templight patch on it. The default setting for LLVM and Clang is on debug and assertions, and these can impact your compilation time significantly. It does seem like Templight needs both, so you have to use the default settings. The process of installing LLVM and Clang should take about an hour or so.
After applying the patch you can use templight++ located in the build folder you specified upon installation to compile your code.
Make sure that templight++ is in your PATH. Now to compile add the following switches to your CXXFLAGS in your Makefile or to your command line options:
CXXFLAGS+=-Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system
Or
templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system
After compilation is done, you will have a .trace.memory.pbf and .trace.pbf generated in the same folder. To visualize these traces, you can use the Templight Tools that can convert these to other formats. Follow these instructions to install templight-convert. We usually use the callgrind output. You can also use the GraphViz output if your project is small:
$ templight-convert --format callgrind YOUR_BINARY --output YOUR_BINARY.trace
$ templight-convert --format graphviz YOUR_BINARY --output YOUR_BINARY.dot
The callgrind file generated can be opened using kcachegrind in which you can trace the most time/memory consuming instantiation.
Reducing the number of template instantiations
Although there are no exact solution for reducing the number of template instantiations, there are a few guidelines that can help:
Refactor classes with more than one template arguments
For example, if you have a class,
template <typename T, typename U>
struct foo { };
and both of T and U can have 10 different options, you have increased the possible template instantiations of this class to 100. One way to resolve this is to abstract the common part of the code to a different class. The other method is to use inheritance inversion (reversing the class hierarchy), but make sure that your design goals are not compromised before using this technique.
Refactor non-templated code to individual translation units
Using this technique, you can compile the common section once and link it with your other TUs (translation units) later on.
Use extern template instantiations (since C++11)
If you know all the possible instantiations of a class you can use this technique to compile all cases in a different translation unit.
For example, in:
enum class PossibleChoices = {Option1, Option2, Option3}
template <PossibleChoices pc>
struct foo { };
We know that this class can have three possible instantiations:
template class foo<PossibleChoices::Option1>;
template class foo<PossibleChoices::Option2>;
template class foo<PossibleChoices::Option3>;
Put the above in a translation unit and use the extern keyword in your header file, below the class definition:
extern template class foo<PossibleChoices::Option1>;
extern template class foo<PossibleChoices::Option2>;
extern template class foo<PossibleChoices::Option3>;
This technique can save you time if you are compiling different tests with a common set of instantiations.
NOTE : MPICH2 ignores the explicit instantiation at this point and always compiles the instantiated classes in all compilation units.
Use unity builds
The whole idea behind unity builds is to include all the .cc files that you use in one file and compile that file only once. Using this method, you can avoid reinstantiating common sections of different files and if your project includes a lot of common files, you probably would save on disk accesses as well.
As an example, let's assume you have three files foo1.cc, foo2.cc, foo3.cc and they all include tuple from STL. You can create a foo-all.cc that looks like:
#include "foo1.cc"
#include "foo2.cc"
#include "foo3.cc"
You compile this file only once and potentially reduce the common instantiations among the three files. It is hard to generally predict if the improvement can be significant or not. But one evident fact is that you would lose parallelism in your builds (you can no longer compile the three files at the same time).
Further, if any of these files happen to take a lot of memory, you might actually run out of memory before the compilation is over. On some compilers, such as GCC, this might ICE (Internal Compiler Error) your compiler for lack of memory. So don't use this technique unless you know all the pros and cons.
Precompiled headers
Precompiled headers (PCHs) can save you a lot of time in compilation by compiling your header files to an intermediate representation recognizable by a compiler. To generate precompiled header files, you only need to compile your header file with your regular compilation command. For example, on GCC:
$ g++ YOUR_HEADER.hpp
This will generate a YOUR_HEADER.hpp.gch file (.gch is the extension for PCH files in GCC) in the same folder. This means that if you include YOUR_HEADER.hpp in some other file, the compiler will use your YOUR_HEADER.hpp.gch instead of YOUR_HEADER.hpp in the same folder before.
There are two issues with this technique:
You have to make sure that the header files being precompiled is stable and is not going to change (you can always change your makefile)
You can only include one PCH per compilation unit (on most of compilers). This means that if you have more than one header file to be precompiled, you have to include them in one file (e.g., all-my-headers.hpp). But that means that you have to include the new file in all places. Fortunately, GCC has a solution for this problem. Use -include and give it the new header file. You can comma separate different files using this technique.
For example:
g++ foo.cc -include all-my-headers.hpp
Use unnamed or anonymous namespaces
Unnamed namespaces (a.k.a. anonymous namespaces) can reduce the generated binary sizes significantly. Unnamed namespaces use internal linkage, meaning that the symbols generated in those namespaces will not be visible to other TU (translation or compilation units). Compilers usually generate unique names for unnamed namespaces. This means that if you have a file foo.hpp:
namespace {
template <typename T>
struct foo { };
} // Anonymous namespace
using A = foo<int>;
And you happen to include this file in two TUs (two .cc files and compile them separately). The two foo template instances will not be the same. This violates the One Definition Rule (ODR). For the same reason, using unnamed namespaces is discouraged in the header files. Feel free to use them in your .cc files to avoid symbols showing up in your binary files. In some cases, changing all the internal details for a .cc file showed a 10% reduction in the generated binary sizes.
Changing visibility options
In newer compilers you can select your symbols to be either visible or invisible in the Dynamic Shared Objects (DSOs). Ideally, changing the visibility can improve compiler performance, link time optimizations (LTOs), and generated binary sizes. If you look at the STL header files in GCC you can see that it is widely used. To enable visibility choices, you need to change your code per function, per class, per variable and more importantly per compiler.
With the help of visibility you can hide the symbols that you consider them private from the generated shared objects. On GCC you can control the visibility of symbols by passing default or hidden to the -visibility option of your compiler. This is in some sense similar to the unnamed namespace but in a more elaborate and intrusive way.
If you would like to specify the visibilities per case, you have to add the following attributes to your functions, variables, and classes:
__attribute__((visibility("default"))) void foo1() { }
__attribute__((visibility("hidden"))) void foo2() { }
__attribute__((visibility("hidden"))) class foo3 { };
void foo4() { }
The default visibility in GCC is default (public), meaning that if you compile the above as a shared library (-shared) method, foo2 and class foo3 will not be visible in other TUs (foo1 and foo4 will be visible). If you compile with -visibility=hidden then only foo1 will be visible. Even foo4 would be hidden.
You can read more about visibility on GCC wiki.

I'd recommend these articles from "Games from Within, Indie Game Design And Programming":
Physical Structure and C++ – Part 1: A First Look
Physical Structure and C++ – Part 2: Build Times
Even More Experiments with Includes
How Incredible Is Incredibuild?
The Care and Feeding of Pre-Compiled Headers
The Quest for the Perfect Build System
The Quest for the Perfect Build System (Part 2)
Granted, they are pretty old - you'll have to re-test everything with the latest versions (or versions available to you), to get realistic results. Either way, it is a good source for ideas.

One technique which worked quite well for me in the past: don't compile multiple C++ source files independently, but rather generate one C++ file which includes all the other files, like this:
// myproject_all.cpp
// Automatically generated file - don't edit this by hand!
#include "main.cpp"
#include "mainwindow.cpp"
#include "filterdialog.cpp"
#include "database.cpp"
Of course this means you have to recompile all of the included source code in case any of the sources changes, so the dependency tree gets worse. However, compiling multiple source files as one translation unit is faster (at least in my experiments with MSVC and GCC) and generates smaller binaries. I also suspect that the compiler is given more potential for optimizations (since it can see more code at once).
This technique breaks in various cases; for instance, the compiler will bail out in case two or more source files declare a global function with the same name. I couldn't find this technique described in any of the other answers though, that's why I'm mentioning it here.
For what it's worth, the KDE Project used this exact same technique since 1999 to build optimized binaries (possibly for a release). The switch to the build configure script was called --enable-final. Out of archaeological interest I dug up the posting which announced this feature: http://lists.kde.org/?l=kde-devel&m=92722836009368&w=2

I will just link to my other answer: How do YOU reduce compile time, and linking time for Visual C++ projects (native C++)?. Another point I want to add, but which causes often problems is to use precompiled headers. But please, only use them for parts which hardly ever change (like GUI toolkit headers). Otherwise, they will cost you more time than they save you in the end.
Another option is, when you work with GNU make, to turn on -j<N> option:
-j [N], --jobs[=N] Allow N jobs at once; infinite jobs with no arg.
I usually have it at 3 since I've got a dual core here. It will then run compilers in parallel for different translation units, provided there are no dependencies between them. Linking cannot be done in parallel, since there is only one linker process linking together all object files.
But the linker itself can be threaded, and this is what the GNU gold ELF linker does. It's optimized threaded C++ code which is said to link ELF object files a magnitude faster than the old ld (and was actually included into binutils).

There's an entire book on this topic, which is titled Large-Scale C++ Software Design (written by John Lakos).
The book pre-dates templates, so to the contents of that book add "using templates, too, can make the compiler slower".

Once you have applied all the code tricks above (forward declarations, reducing header inclusion to the minimum in public headers, pushing most details inside the implementation file with Pimpl...) and nothing else can be gained language-wise, consider your build system. If you use Linux, consider using distcc (distributed compiler) and ccache (cache compiler).
The first one, distcc, executes the preprocessor step locally and then sends the output to the first available compiler in the network. It requires the same compiler and library versions in all the configured nodes in the network.
The latter, ccache, is a compiler cache. It again executes the preprocessor and then check with an internal database (held in a local directory) if that preprocessor file has already been compiled with the same compiler parameters. If it does, it just pops up the binary and output from the first run of the compiler.
Both can be used at the same time, so that if ccache does not have a local copy it can send it trough the net to another node with distcc, or else it can just inject the solution without further processing.

Here are some:
Use all processor cores by starting a multiple-compile job (make -j2 is a good example).
Turn off or lower optimizations (for example, GCC is much faster with -O1 than -O2 or -O3).
Use precompiled headers.

When I came out of college, the first real production-worthy C++ code I saw had these arcane #ifndef ... #endif directives in between them where the headers were defined. I asked the guy who was writing the code about these overarching things in a very naive fashion and was introduced to world of large-scale programming.
Coming back to the point, using directives to prevent duplicate header definitions was the first thing I learned when it came to reducing compiling times.

More RAM.
Someone talked about RAM drives in another answer. I did this with a 80286 and Turbo C++ (shows age) and the results were phenomenal. As was the loss of data when the machine crashed.

You could use Unity Builds.

Use
#pragma once
at the top of header files, so if they're included more than once in a translation unit, the text of the header will only get included and parsed once.

Use forward declarations where you can. If a class declaration only uses a pointer or reference to a type, you can just forward declare it and include the header for the type in the implementation file.
For example:
// T.h
class Class2; // Forward declaration
class T {
public:
void doSomething(Class2 &c2);
private:
Class2 *m_Class2Ptr;
};
// T.cpp
#include "Class2.h"
void Class2::doSomething(Class2 &c2) {
// Whatever you want here
}
Fewer includes means far less work for the preprocessor if you do it enough.

Just for completeness: a build might be slow because the build system is being stupid as well as because the compiler is taking a long time to do its work.
Read Recursive Make Considered Harmful (PDF) for a discussion of this topic in Unix environments.

Not about the compilation time, but about the build time:
Use ccache if you have to rebuild the same files when you are working
on your buildfiles
Use ninja-build instead of make. I am currently compiling a project
with ~100 source files and everything is cached by ccache. make needs
5 minutes, ninja less than 1.
You can generate your ninja files from cmake with -GNinja.

Upgrade your computer
Get a quad core (or a dual-quad system)
Get LOTS of RAM.
Use a RAM drive to drastically reduce file I/O delays. (There are companies that make IDE and SATA RAM drives that act like hard drives).
Then you have all your other typical suggestions
Use precompiled headers if available.
Reduce the amount of coupling between parts of your project. Changing one header file usually shouldn't require recompiling your entire project.

I had an idea about using a RAM drive. It turned out that for my projects it doesn't make that much of a difference after all. But then they are pretty small still. Try it! I'd be interested in hearing how much it helped.

Dynamic linking (.so) can be much much faster than static linking (.a). Especially when you have a slow network drive. This is since you have all of the code in the .a file which needs to be processed and written out. In addition, a much larger executable file needs to be written out to the disk.

Where are you spending your time? Are you CPU bound? Memory bound? Disk bound? Can you use more cores? More RAM? Do you need RAID? Do you simply want to improve the efficiency of your current system?
Under gcc/g++, have you looked at ccache? It can be helpful if you are doing make clean; make a lot.

Starting with Visual Studio 2017 you have the capability to have some compiler metrics about what takes time.
Add those parameters to C/C++ -> Command line (Additional Options) in the project properties window:
/Bt+ /d2cgsummary /d1reportTime
You can have more informations in this post.

Faster hard disks.
Compilers write many (and possibly huge) files to disk. Work with SSD instead of typical hard disk and compilation times are much lower.

On Linux (and maybe some other *NIXes), you can really speed the compilation by NOT STARING at the output and changing to another TTY.
Here is the experiment: printf slows down my program

Networks shares will drastically slow down your build, as the seek latency is high. For something like Boost, it made a huge difference for me, even though our network share drive is pretty fast. Time to compile a toy Boost program went from about 1 minute to 1 second when I switched from a network share to a local SSD.

If you have a multicore processor, both Visual Studio (2005 and later) as well as GCC support multi-processor compiles. It is something to enable if you have the hardware, for sure.

First of all, we have to understand what so different about C++ that sets it apart from other languages.
Some people say it's that C++ has many too features. But hey, there are languages that have a lot more features and they are nowhere near that slow.
Some people say it's the size of a file that matters. Nope, source lines of code don't correlate with compile times.
But wait, how can it be? More lines of code should mean longer compile times, what's the sorcery?
The trick is that a lot of lines of code is hidden in preprocessor directives. Yes. Just one #include can ruin your module's compilation performance.
You see, C++ doesn't have a module system. All *.cpp files are compiled from scratch. So having 1000 *.cpp files means compiling your project a thousand times. You have more than that? Too bad.
That's why C++ developers hesitate to split classes into multiple files. All those headers are tedious to maintain.
So what can we do other than using precompiled headers, merging all the cpp files into one, and keeping the number of headers minimal?
C++20 brings us preliminary support of modules! Eventually, you'll be able to forget about #include and the horrible compile performance that header files bring with them. Touched one file? Recompile only that file! Need to compile a fresh checkout? Compile in seconds rather than minutes and hours.
The C++ community should move to C++20 as soon as possible. C++ compiler developers should put more focus on this, C++ developers should start testing preliminary support in various compilers and use those compilers that support modules. This is the most important moment in C++ history!

Although not a "technique", I couldn't figure out how Win32 projects with many source files compiled faster than my "Hello World" empty project. Thus, I hope this helps someone like it did me.
In Visual Studio, one option to increase compile times is Incremental Linking (/INCREMENTAL). It's incompatible with Link-time Code Generation (/LTCG) so remember to disable incremental linking when doing release builds.

Using dynamic linking instead of static one make you compiler faster that can feel.
If you use t Cmake, active the property:
set(BUILD_SHARED_LIBS ON)
Build Release, using static linking can get more optimize.

From Microsoft: https://devblogs.microsoft.com/cppblog/recommendations-to-speed-c-builds-in-visual-studio/
Specific recommendations include:
DO USE PCH for projects
DO include commonly used system, runtime and third party headers in
PCH
DO include rarely changing project specific headers in PCH
DO NOT include headers that change frequently
DO audit PCH regularly to keep it up to date with product churn
DO USE /MP
DO Remove /Gm in favor of /MP
DO resolve conflict with #import and use /MP
DO USE linker switch /incremental
DO USE linker switch /debug:fastlink
DO consider using a third party build accelerator

Detecting superfluous #includes in C/C++?

I often find that the headers section of a file get larger and larger all the time but it never gets smaller. Throughout the life of a source file classes may have moved and been refactored and it's very possible that there are quite a few #includes that don't need to be there and anymore. Leaving them there only prolong the compile time and adds unnecessary compilation dependencies. Trying to figure out which are still needed can be quite tedious.
Is there some kind of tool that can detect superfluous #include directives and suggest which ones I can safely remove?
Does lint do this maybe?

Google's cppclean (links to: download, documentation) can find several categories of C++ problems, and it can now find superfluous #includes.
There's also a Clang-based tool, include-what-you-use, that can do this. include-what-you-use can even suggest forward declarations (so you don't have to #include so much) and optionally clean up your #includes for you.
Current versions of Eclipse CDT also have this functionality built in: going under the Source menu and clicking Organize Includes will alphabetize your #include's, add any headers that Eclipse thinks you're using without directly including them, and comments out any headers that it doesn't think you need. This feature isn't 100% reliable, however.

Also check out include-what-you-use, which solves a similar problem.

It's not automatic, but doxygen will produce dependency diagrams for #included files. You will have to go through them visually, but they can be very useful for getting a picture of what is using what.

The problem with detecting superfluous includes is that it can't be just a type dependency checker. A superfluous include is a file which provides nothing of value to the compilation and does not alter another item which other files depend. There are many ways a header file can alter a compile, say by defining a constant, redefining and/or deleting a used macro, adding a namespace which alters the lookup of a name some way down the line. In order to detect items like the namespace you need much more than a preprocessor, you in fact almost need a full compiler.
Lint is more of a style checker and certainly won't have this full capability.
I think you'll find the only way to detect a superfluous include is to remove, compile and run suites.

I thought that PCLint would do this, but it has been a few years since I've looked at it. You might check it out.
I looked at this blog and the author talked a bit about configuring PCLint to find unused includes. Might be worth a look.

The CScout refactoring browser can detect superfluous include directives in C (unfortunately not C++) code. You can find a description of how it works in this journal article.

Sorry to (re-)post here, people often don't expand comments.
Check my comment to crashmstr, FlexeLint / PC-Lint will do this for you. Informational message 766. Section 11.8.1 of my manual (version 8.0) discusses this.
Also, and this is important, keep iterating until the message goes away. In other words, after removing unused headers, re-run lint, more header files might have become "unneeded" once you remove some unneeded headers. (That might sound silly, read it slowly & parse it, it makes sense.)

I've never found a full-fledged tool that accomplishes what you're asking. The closest thing I've used is IncludeManager, which graphs your header inclusion tree so you can visually spot things like headers included in only one file and circular header inclusions.

You can write a quick script that erases a single #include directive, compiles the projects, and logs the name in the #include and the file it was removed from in the case that no compilation errors occurred.
Let it run during the night, and the next day you will have a 100% correct list of include files you can remove.
Sometimes brute-force just works :-)
edit: and sometimes it doesn't :-). Here's a bit of information from the comments:
Sometimes you can remove two header files separately, but not both together. A solution is to remove the header files during the run and not bring them back. This will find a list of files you can safely remove, although there might a solution with more files to remove which this algorithm won't find. (it's a greedy search over the space of include files to remove. It will only find a local maximum)
There may be subtle changes in behavior if you have some macros redefined differently depending on some #ifdefs. I think these are very rare cases, and the Unit Tests which are part of the build should catch these changes.

I've tried using Flexelint (the unix version of PC-Lint) and had somewhat mixed results. This is likely because I'm working on a very large and knotty code base. I recommend carefully examining each file that is reported as unused.
The main worry is false positives. Multiple includes of the same header are reported as an unneeded header. This is bad since Flexelint does not tell you what line the header is included on or where it was included before.
One of the ways automated tools can get this wrong:
In A.hpp:
class A {
// ...
};
In B.hpp:
#include "A.hpp
class B {
public:
A foo;
};
In C.cpp:
#include "C.hpp"
#include "B.hpp" // <-- Unneeded, but lint reports it as needed
#include "A.hpp" // <-- Needed, but lint reports it as unneeded
If you blindly follow the messages from Flexelint you'll muck up your #include dependencies. There are more pathological cases, but basically you're going to need to inspect the headers yourself for best results.
I highly recommend this article on Physical Structure and C++ from the blog Games from within. They recommend a comprehensive approach to cleaning up the #include mess:
Guidelines
Here’s a distilled set of guidelines from Lakos’ book that minimize the number of physical dependencies between files. I’ve been using them for years and I’ve always been really happy with the results.
Every cpp file includes its own header file first. [snip]
A header file must include all the header files necessary to parse it. [snip]
A header file should have the bare minimum number of header files necessary to parse it. [snip]

If you are using Eclipse CDT you can try http://includator.com which is free for beta testers (at the time of this writing) and automatically removes superfluous #includes or adds missing ones. For those users who have FlexeLint or PC-Lint and are using Elicpse CDT, http://linticator.com might be an option (also free for beta test). While it uses Lint's analysis, it provides quick-fixes for automatically remove the superfluous #include statements.

This article explains a technique of #include removing by using the parsing of Doxygen. That's just a perl script, so it's quite easy to use.

CLion, the C/C++ IDE from JetBrains, detects redundant includes out-of-the-box. These are grayed-out in the editor, but there are also functions to optimise includes in the current file or whole project.
I've found that you pay for this functionality though; CLion takes a while to scan and analyse your project when first loaded.

Here is a simple brute force way of identifying superfluous header includes. It's not perfect but eliminates the "obvious" unnecessary includes. Getting rid of these goes a long way in cleaning up the code.
The scripts can be accessed directly on GitHub.

Maybe a little late, but I once found a WebKit perl script that did just what you wanted. It'll need some adapting I believe (I'm not well versed in perl), but it should do the trick:
http://trac.webkit.org/browser/branches/old/safari-3-2-branch/WebKitTools/Scripts/find-extra-includes
(this is an old branch because trunk doesn't have the file anymore)

There is a free tool Include File Dependencies Watcher which can be integrated in the visual studio. It shows superfluous #includes in red.

There's two types of superfluous #include files:
A header file actually not needed by
the module(.c, .cpp) at all
A header file is need by the module
but being included more than once, directly, or indirectly.
There's 2 ways in my experience that works well to detecting it:
gcc -H or cl.exe /showincludes (resolve problem 2)
In real world,
you can export CFLAGS=-H before make,
if all the Makefile's not override
CFLAGS options. Or as I used, you
can create a cc/g++ wrapper to add -H
options forcibly to each invoke of
$(CC) and $(CXX). and prepend the
wrapper's directory to $PATH
variable, then your make will all
uses you wrapper command instead. Of
course your wrapper should invoke the
real gcc compiler. This tricks
need to change if your Makefile uses
gcc directly. instead of $(CC) or
$(CXX) or by implied rules.
You can also compile a single file by tweaking with the command line. But if you want to clean headers for the whole project. You can capture all the output by:
make clean
make 2>&1 | tee result.txt
PC-Lint/FlexeLint(resolve problem
both 1 and 2)
make sure add the +e766 options, this warning is about:
unused header files.
pclint/flint -vf ...
This will cause pclint output included header files, nested header files will be indented appropriately.

clangd is doing that for you now. Possibly clang-tidy will soon be able to do that as well.

To end this discussion: the c++ preprocessor is turing complete. It is a semantic property, whether an include is superfluous. Hence, it follows from Rice's theorem that it is undecidable whether an include is superfluous or not. There CAN'T be a program, that (always correctly) detects whether an include is superfluous.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js