How does a precompiled header reduce compile time - c++

I've been using precompiled headers for a while and have been told (and have seen) how they can reduce compile time. But I would really like to know what is going on under the hood, so I can use them to make my compilation faster.
Because from what I know, adding an unused include to a .cpp file can slow down your compile time, and a precompiled header can contain a lot of headers that a given .cpp file never uses.
So how does a precompiled header make my compilation faster?

From http://gamesfromwithin.com/the-care-and-feeding-of-pre-compiled-headers (thank you, @Pablo):
A C++ compiler operates on one compilation unit (cpp file) at a
time. For each file, it applies the preprocessor (which takes care
of doing all the includes and “baking” them into the cpp file itself),
and then it compiles the module itself. Move on to the next cpp file,
rinse and repeat. Clearly, if several files include the same set of
expensive header files (large and/or including many other header files
in turn), the compiler will be doing a lot of duplicated effort.
The simplest way to think of pre-compiled headers is as a cache for
header files. The compiler can analyze a set of headers once, compile
them, and then have the results ready for any module that needs them.

Basically, a header file is compiled once for each translation unit (.cpp file) that includes it. Using a pre-compiled header saves the time spent compiling the same include file over and over again. This is really beneficial when the header file to be pre-compiled is very large (or indirectly includes many other header files).
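As a rough sketch of what this looks like in practice, assuming GCC and a hypothetical header named pch.h (neither is prescribed by the answers above): the expensive, rarely changing includes are gathered in one header, that header is compiled once, and every .cpp file includes it first. The commands in the comments follow GCC's documented way of building a .gch file; other compilers use different switches.

```cpp
// pch.h (hypothetical name): headers that are expensive to parse and rarely change.
//
// With GCC, one possible workflow is:
//   g++ -x c++-header pch.h -o pch.h.gch   // parse the headers once, save the result
//   g++ -c a.cpp                           // a.cpp starts with #include "pch.h";
//   g++ -c b.cpp                           // GCC picks up pch.h.gch automatically
#ifndef PCH_H
#define PCH_H

#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A small helper so the sketch has something to call; illustrative only.
inline std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t n = 0;
    for (const std::string& w : words) {
        n += w.size();
    }
    return n;
}

#endif // PCH_H
```

The precompiled file is only used when the .cpp files are compiled with options compatible with those used to build it, so in a real build both steps would share the same flags.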

Many years ago I had access to a C compiler that printed out the number of lines it processed (Watcom C version 6 or so). Compiling files with fewer than 100 lines of C code would display counts of 5,000 or even 10,000 lines, all of which were #included. In other words, #included code completely dominates compilation time, so anything you can do to reduce that is going to be beneficial. You can see for yourself with compilers that allow you to disable preprocessing: compare the times for complete system builds with and without it.

I think the word "precompiled" already hints at how it makes compilation faster. You can read about the basic concept here:
http://en.wikipedia.org/wiki/Precompiled_header

Related

What's the REAL difference between .h and .cpp files?

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?
Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because now you have two symbols in the same executable that have the same name, and the linker doesn't know what it's supposed to do with that. When you include your header in code1 you get a symbol "x", and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
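One common way out, sketched here under the assumption that the variable really is meant to be shared, is to put only a declaration in the header and the single definition in exactly one .cpp file. The file boundaries are shown as comments so the sketch compiles as one file; read_x is a hypothetical helper added for illustration.

```cpp
// ---- header.h: a declaration only, safe to include from many .cpp files ----
extern int x;        // "an int named x exists somewhere"; no storage is created here

// ---- code1.cpp: the one and only definition ----
int x = 10;          // storage for x is created in exactly one object file

// ---- code2.cpp: would include header.h and simply use x ----
int read_x() {       // hypothetical helper, for illustration
    return x;
}
```

Since C++17, writing `inline int x = 10;` in the header is another option, because inline variables may be defined in several translation units.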
What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From a certain perspective, their only difference is the filename.
However, many programming related tools treat the files differently depending on their name. For example, some tools will detect programming language: .c is compiled as C language, .cpp is compiled as C++ and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The extension could be .h or .header or anything else; it doesn't affect how the preprocessor includes the file. It is, however, good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file into more than one translation unit? If you answered yes to both, then your program has been ill-formed. If you didn't, then that would explain why you didn't encounter any problems.
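To make the point about inline concrete, here is a minimal sketch of a header that defines a function and can still be included from several translation units. The names square.h and square are hypothetical.

```cpp
// square.h (hypothetical): a function *defined* in a header.
#ifndef SQUARE_H
#define SQUARE_H

// `inline` permits the same definition to appear in every translation unit
// that includes this header; the linker merges the copies into one.
// Without `inline`, two .cpp files including this header would each emit
// their own `square`, which is the ill-formed case described above.
inline int square(int v) {
    return v * v;
}

#endif // SQUARE_H
```

Member functions defined inside a class body are implicitly inline, and function templates may likewise be defined in headers without the keyword.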
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations of the same function for different target systems in order to support multiple platforms. With header files, you must do tricks with macros, while with source files, you simply choose which files to compile.
Another use case where implementing functions in header is not an option is distributing a library without source, as some middleware providers do. You must give the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources have to be at least reverse engineered.
Keep in mind that the C++ compiler is a fairly simple beast as far as file-handling goes. All it's allowed to do is read in a single source-code file (and, via the pre-processor, logically insert into that incoming text-stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because the number of lines of code that the compiler needs to load into memory is small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.
The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of C. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of C, which is why we got pre-compiled header files. Pre-compiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.
Whenever you see an #include <header.h> directive, pretend that the contents of header.h is being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.
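A minimal sketch of "declare before use", with the three files shown as comments so the example compiles as one unit. The names math_utils.h, add and add_three are hypothetical.

```cpp
// ---- math_utils.h (hypothetical): the shared declaration ----
int add(int a, int b);          // tells the compiler the signature; no body needed yet

// ---- main.cpp: sees only the declaration ----
int add_three(int a, int b, int c) {
    // Compiles because the declaration above is visible, even though the
    // body of add() lives in a different .cpp file.
    return add(add(a, b), c);
}

// ---- math_utils.cpp: the definition, found later by the linker ----
int add(int a, int b) {
    return a + b;
}
```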

Single header file with all the necessary #include statements

I am currently working on a program with a lot of source files. Sometimes it is difficult to keep track of what libraries I have already #included. Theoretically, I could make a single header file called Headers.h that just contains all the #include statements I need, then make all other header files #include "Headers.h".
Why is this a good/bad idea?
Pros:
Slightly less maintenance, as you don't have to keep track of which of your files are including headers from which libraries or other components.
Cons:
Definitions in included files might conflict with each other. Especially in C where you don't have namespaces (you tagged with C and C++)
Macros in particular can cause hard to debug problems, where a macro definition unexpectedly conflicts with some name in your file or one of the other included files
Depending on which compiler you use, compilation times might blow out. If using a compiler that pre-compiles headers it might actually reduce compilation time, but if not the opposite will happen
You will often unnecessarily trigger rebuilds of files. If you have your build system set up correctly, then each source file will get rebuilt if any of the included files gets modified. If you always include all headers in your project, then a change to any of your headers will force recompilation of all your source files. Not likely to be an issue for system headers but it will be if you include your own headers in the master file as well.
On the whole I would not recommend that approach. The last con listed above is particularly important.
Best practice would be to include only headers that are needed for the code in each file.
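The macro problem listed among the cons can be sketched as follows. The max macro here stands in for one that a catch-all header might drag in (some platform headers are known to define such macros); the function name larger is hypothetical.

```cpp
#include <algorithm>

// Imagine this arrives indirectly through a catch-all Headers.h.
#define max(a, b) (((a) > (b)) ? (a) : (b))

int larger(int a, int b) {
    // Writing std::max(a, b) here would be rewritten by the macro into
    // std::(((a) > (b)) ? (a) : (b)), which does not compile.
    // Parenthesising the name stops the function-like macro from expanding.
    return (std::max)(a, b);
}

// Remove the macro again so it cannot leak into code that follows.
#undef max
```

A file that includes only the headers it needs never sees the macro in the first place, so it needs no such workaround.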
To complement Harmic's answer: indeed, the main issue is the build system (most build tools work on file timestamps, not on file contents; omake is a notable exception).
Note that if you only care about tracking many dependencies, GNU make can be used with automatic dependency generation, together with the -M* options passed to GCC (i.e. to g++ and actually to the preprocessor).
However, many libraries offer their users a single header (e.g. <gtk/gtk.h>).
Also, a single header file is friendlier to precompiled header technology. In particular, GCC wants a single header for precompilation.
See also ccache.
Tracking all the required includes would be more difficult, as they are abstracted away from the source files that need them, and the approach does not really support modularisation, plus all the cons from @harmic.

Why would using a precompiled header cause a build to be slower?

Our solution contains over 100 projects, over 8000 cpp files and over 10'000 header files.
I'm trying to improve our build times.
One of the projects in the solution contains just 5 cpp files, and takes about 10 seconds to compile. The header files were initially included in the cpp files, but in preparation for switching on precompiled headers, I moved the includes into a single pch.h file.
Each cpp file now includes the pch.h file.
This in itself has not made any discernible change to the compile time - it's still about 10 seconds.
Now when I tell the project to actually use the pch file as a precompiled header, it takes 17 seconds to compile the project.
Why would precompiling the included headers make the project take longer to build than when the file is just #included by each individual cpp file?
More info.
We use a technique called "lumping" - (individual cpp files are not compiled individually - they are each #included into a single project-wide cpp file, and that is the only cpp file which is compiled).
For what it's worth, thanks to spaghetti code, according to "Show Includes" the dozen or so included files in the pch file cause around 3000(!) files to be included. Obviously, this needs fixing!
The precompiled header file is about 130 MB when compiled.
If we switch off lumping, the single project build (not the whole solution) takes 45 seconds. If we then switch on precompiled headers, the build time improves.
I'm probably missing the obvious, but why when lumping is switched on, does switching on precompiled headers slow the build down?
What PCH does is run a "preprocess"/"precompile" stage once on common headers included by multiple source files. This helps because repeated "preprocessing"/"precompiling" is avoided: the compiler loads its previously saved state for each source file.
If you have just one big source file, this "preprocessing"/"precompiling" also needs to happen, but in total only once. So saving and loading the PCH file introduces overhead without taking any away (because there is no repetition whatsoever).
I use the term "preprocess"/"precompile" here because PCH is implemented wildly differently depending on your compiler, and might lean more towards one or the other.
Now, unless you make use of a heavy template library like Boost throughout much of your code, it often is enough to clean up include dependencies to speed up compile time, often by quite a significant factor. But that requires maintenance.
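The reasoning in this answer can be written down as a rough cost model. This is only a sketch: the struct, the function names and the figures in the note below are made up for illustration and are not measurements from the project in the question.

```cpp
// All times are in milliseconds and purely illustrative.
struct BuildCosts {
    long parse_headers;  // parsing the shared headers from scratch, once
    long write_pch;      // extra work to serialise the parsed state to disk
    long load_pch;       // reading the saved state back for one translation unit
    long own_code;       // compiling one translation unit's own code
};

// Without PCH, every translation unit parses the shared headers again.
long time_without_pch(const BuildCosts& c, long units) {
    return units * (c.parse_headers + c.own_code);
}

// With PCH, the headers are parsed and saved once, then loaded per unit.
long time_with_pch(const BuildCosts& c, long units) {
    return (c.parse_headers + c.write_pch) + units * (c.load_pch + c.own_code);
}
```

With hypothetical costs of 4000 ms to parse, 2000 ms to write, 500 ms to load and 1000 ms of own code, five separate translation units take 25000 ms without PCH and 13500 ms with it. A single lumped unit whose own code takes 5000 ms goes from 9000 ms to 11500 ms, because the one-off cost is never amortised.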

Including all header files in application

I was recently looking through the source code of a C++ application and saw that each class did not #include its needed components, but instead #include'd a "Precompiled.h" header. In this Precompiled header was an inclusion of almost every header in the application (not all of them, it was clear that the length and order of the list was deliberate). Essentially, this would mean that every class had an inclusion of every other class in the application.
Is this wise? Why or why not?
Usually, when you write an application, you should include only the header files that are really needed in each cpp file. If you have a really big application, you should use forward declarations in the headers and include the necessary files in the cpp files. That way, a code change affects only a minimal number of cpp files, so the compiler only has to recompile what has really changed.
The situation can flip completely when it comes to libraries or code that does not change very often. The filename "Precompiled.h" is already a hint. The compiler can precompile the headers into a special file, often called a PCH file. With that, the compiler does not have to resolve every include on every compile. With heavily nested includes this has a high impact on compile speed, because instead of many files to load and parse there is only one pre-parsed file. To achieve that, you have to designate one or more headers as a kind of central file for building a precompiled header. How you do that differs between compilers.
For example, Visual Studio conventionally uses the header file "stdafx.h" as the center of header precompilation. Because of that, only header files that are not altered very often should be included there. The file also has to be included first in every cpp file, because the compiler can no longer detect whether a file included before it might influence the precompiled content. To avoid that, includes before the precompiled header are not allowed.
Back to your question: including every file in one header in order to use it as a precompiled header makes no sense at all, as it counteracts the purpose of a precompiled header.
It is a very bad idea.
For a .cpp file only include the minimum number of #include files.
That way, when one of them changes, make (or its moral equivalent) will not require the whole lot to be recompiled.
Saves lots of time during development.
PS Use forward declarations in preference to #include
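A minimal sketch of the forward-declaration advice, with hypothetical classes Car and Engine and the file boundaries shown as comments so the example compiles as one unit:

```cpp
// ---- car.h (hypothetical): needs only the *name* Engine, not its layout ----
class Engine;                       // forward declaration instead of #include "engine.h"

class Car {
public:
    explicit Car(Engine* engine) : engine_(engine) {}
    int horsepower() const;         // defined where Engine is complete
private:
    Engine* engine_;                // a pointer to an incomplete type is fine
};

// ---- engine.h (hypothetical): the full definition ----
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) {}
    int horsepower() const { return hp_; }
private:
    int hp_;
};

// ---- car.cpp: includes both headers, so Engine is complete here ----
int Car::horsepower() const {
    return engine_->horsepower();
}
```

With this layout, editing engine.h forces a rebuild of car.cpp but not of the files that include only car.h.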

Including the same headers in every class

I am including several STL headers such as list and vector in all my code files for my project. I know for my own headers that I should use include guards, but what about for this scenario when they aren't defined by me?
Is it bad to include the same headers in every one of my files? Is there a performance penalty for each time it is included?
There is no performance cost. The standard headers have their own include guards, and all include guards are optimized by the preprocessor so the file isn't actually reloaded each time.
Correctness and maintainability are always the first concern… how much compile time do you have to save to make up for work spent fixing things when you rearrange the files and get "undefined identifier" errors, or worse!
EDIT: There is no performance cost to multiply including the same standard headers from all your header files. There is some performance cost to including additional standard headers from a source file. The question is a bit ambiguous… but either way, the really expensive part of C++ compilation is usually template instantiation, not parsing the text.
As a general rule,
You should include a header file only when your source file needs it.
Include guards prevent the same header file from being included more than once in the same translation unit, which guards you against redefinition errors; of course, the standard library headers have their own.
However, note that if you include header files in source files that do not need them, it may increase your compilation time and pollute the namespace with unneeded names.
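For headers you write yourself, an include guard looks roughly like the sketch below. The header name point.h, the struct Point and the helper manhattan are hypothetical; the header body is pasted twice to mimic what the preprocessor sees when the file is reached through two different include paths.

```cpp
// ---- point.h (hypothetical), first inclusion ----
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif // POINT_H

// ---- point.h again, e.g. pulled in indirectly through another header ----
#ifndef POINT_H      // POINT_H is already defined, so everything up to #endif is skipped
#define POINT_H
struct Point {       // without the guard this would be a redefinition error
    int x;
    int y;
};
#endif // POINT_H

// Hypothetical helper so the sketch has something to call.
inline int manhattan(const Point& p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```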