How to avoid recompiling header files - c++

My single C++ file includes many header files from various libraries such as MKL and template libraries. Compiling this single file takes a long time on every run because of the many include statements. I tried moving the include statements into a separate file, making an object file out of it, and linking it during the compilation of the main file, but the compiler doesn't recognize the definitions of the objects and functions that are declared in the header files.
So, how can I save the compilation time of these header files?

Precompiled headers:
What you are doing does sound like it would benefit heavily from precompiled headers (PCHs). The Intel compiler does support PCHs, as you can see here:
https://software.intel.com/en-us/node/522754.
Installing tools without root permissions
You can still experiment with ccache to see if it helps your situation -- it can make a huge difference when units do not need to be recompiled, which happens surprisingly often. The way to do this is to install ccache locally. Generally you download the sources to a directory where you have write access (I keep an install folder in my home directory), then follow the build directions for that project. Once the executables are built, you'll have to add their location to your path -- by doing
export PATH=$PATH:the-path-to-your-compiled-executables.
in BASH. At that point ccache will be available to your user.
A better explanation is available here: https://unix.stackexchange.com/questions/42567/how-to-install-program-locally-without-sudo-privileges
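The steps above can be sketched as follows; the prefix directory is just an example, and the autotools-style configure line is an assumption (check the build instructions of the release you download -- newer ccache releases build with cmake instead):

```shell
# No-root install sketch; $HOME/.local is an example prefix.
PREFIX="$HOME/.local"
mkdir -p "$PREFIX/bin"
# Inside the unpacked ccache source tree one would typically run:
#   ./configure --prefix="$PREFIX" && make && make install
# (newer ccache releases use cmake -- see their build docs)
export PATH="$PREFIX/bin:$PATH"   # binaries under the prefix now win lookup
```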
CMake, cotire, etc.
One final point -- I haven't read the documentation for the Intel compiler's PCH support, but PCH support generally involves some manual code manipulation to instruct the compiler which header to precompile and so on. I recommend looking into managing your build with CMake, and using the cotire plugin to experiment with PCHs.
While I have gripes about CMake from a software engineering perspective, it does make managing your build, and especially experimenting with various ways to speed it up, much easier. After experimenting with PCHs and ccache, you could also try ninja instead of make and see if that gets you any improvement.
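As a side note, CMake 3.16 and later has PCH support built in via target_precompile_headers, so cotire is mainly needed on older versions. A minimal sketch, with invented target and header names:

```cmake
cmake_minimum_required(VERSION 3.16)
project(demo CXX)
add_executable(app main.cpp)
# Precompile the heavyweight headers once for this target.
target_precompile_headers(app PRIVATE <vector> <string> "big_includes.h")
```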
Good luck.

Related

Clang precompiled headers - working with different /usr/include timestamps - perhaps by editing metadata?

I've been trying to address a compile time problem. The infrastructure in question compiles multiple objects each of which uses a multitude of stdlib/boost. I've essentially hit a limit where simplifying the dependency tree is no longer worth the effort.
So I tried precompiled headers - and it worked a treat! The problem I have now is fitting it into a large compute farm and CI. Specifically, not all machines were set up at the same time, so the timestamp of /usr/include/ often differs.
The flow we would like to have is:
build certain shared libraries first
precompile header
Launch multiple jobs on different machines using shared libraries (fine) and precompiled header
The header is precompiled in the following way :
clang++ precompiled.hpp -o /<path>/precompiled.hpp.pch
When I use the precompiled header, depending on the timestamp of /usr/include/ on the given machine, I get the following metadata error:
fatal error: file '/usr/include/math.h' has been modified since the
precompiled header '//precompiled.hpp.pch' was built
It may sometimes be a different header too - eg assert.h is a common one.
So far I've tried the following:
changing -isysroot and using glibc -- this exposed a variety of different problems (a can of worms I'd rather not open yet)
a hack: copying /usr/include/ elsewhere and specifying that earlier in the search path. Unfortunately, this doesn't work due to the use of include_next in some headers but not others, i.e. I can't consistently force the headers to be picked up from elsewhere and none from /usr/include
Any ideas on how to tackle this problem?
I am now considering an even worse hack - trying to edit the metadata of the precompiled header. Unfortunately, I couldn't find any API to easily query/edit the PCH.
Any ideas?
I have now managed to come to a solution (probably beneficial long-term anyway in terms of stability, even ignoring precompiled headers).
Specify --no-standard-includes -nostdinc++ -nostdlibinc. This ensures the include search path is stripped of the paths baked in when gcc/clang was built. You may also want to clear CPATH= and CPLUS_INCLUDE_PATH=.
Reconstruct the path using a central location via CPLUS_INCLUDE_PATH. This means the headers that would come from /usr/include always come from one central location, and the metadata check will pass. It should also improve build stability.
Link against the correct version of the standard library.
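In a job script the recipe might look roughly like this; /central/include is an invented stand-in for the shared location, and the clang++ line is left as a comment because the exact inputs depend on your tree:

```shell
export CPATH=                               # drop stray include env paths
export CPLUS_INCLUDE_PATH=/central/include  # one canonical header location
# Every farm job then compiles with the built-in search path stripped:
#   clang++ --no-standard-includes -nostdinc++ -nostdlibinc \
#       -include-pch precompiled.hpp.pch -c main.cpp
```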

How properly specify the #include paths in c++ to make your program portable

I've been struggling back and forth with this for a while now, looking things up and asking questions, and I'm still at a crossroads. What I've done so far, and where I'm currently at based on what I've been told, is this: I've added two directories to my repo: src for my .cpp files and include for my .hpp files. In my include directory I have all the .hpp files directly in the folder, whereas in my src directory I have several sub-directories grouping my .cpp files according to the purpose they serve, e.g. \src\ValuationFunctions\MonteCarloFunctions\FunctionHelpers.
I've changed all the #include "header.h" directives to #include "..\include\header.h". This works for my main file, which is directly in the src folder, but I found that it doesn't work for the .cpp files in sub-directories like the example above; it seems I would have to navigate back to the root folder doing something like #include "../../..\include\header.h", which obviously can't be the way to go.
How do I make this work, am I even on the right track here? I have uploaded my repo to github (https://github.com/OscarUngsgard/Cpp-Monte-Carlo-Value-at-Risk-Engine) and the goal is for someone to be able to go there, see how the program is structured, clone the repo and just run it (I imagine this is what the goal always is? Or does some responsibility usually fall on the cloner of the repo to make it work?).
I'm using Windows and Visual Studios, help greatly appreciated.
Please read the C++11 standard n3337 and see this C++ reference website. An included header might not even be any file on your computer (in principle it could be some database).
If you use some recent GCC as your C++ compiler, it does have precompiled headers and link-time optimization facilities. Read also the documentation of its preprocessor. I recommend to enable all warnings and debug info, so use g++ -Wall -Wextra -g.
If you use Microsoft Visual Studio as your compiler, it has documentation and provides a cl command, with various optimization facilities. Be sure to enable warnings.
You could consider using some C++ static analyzer, such as Clang's or Frama-C++. This draft report could be relevant and should interest you (at least for references).
The source code editor (either VisualStudioCode or GNU emacs or vim or many others) and the debugger (e.g. GDB) and the version control system (e.g. git) that you are using also have documentation. Please take time to read them, and read How to debug small programs.
Remember that C++ code can be generated, by tools such as ANTLR or SWIG.
A suggestion is to approach your issue in the dual way: ensure that proper include paths are passed to compilation commands (from your build automation tool such as GNU make or ninja or meson). This is what GNU autoconf does.
You could consider using autoconf in your software project.
I've changed the name of all the #include "header.h" to #include "..\include\header.h".
I believe it was a mistake, and you certainly want to use forward slashes, e.g. #include "../include/header.h", if you care about porting your code later to other operating systems (e.g. Linux, Android, MacOSX, or some other Unixes). On most operating systems the separator for directories is /, and most C++ compilers accept it.
Studying the source code of either Qt or POCO could be inspirational, and one or both of these open source libraries could be useful to you. They are cross-platform. The source code of GCC and Clang could also be interesting to look into. Both are open source C++ compilers, written in C++ mostly (with some metaprogramming approaches, that is some generated C++ code).
See also this and that.
In program development it is often necessary to use toolkits developed by others. Generally speaking, in Visual Studio you rarely consume source files directly; mostly you use header files that declare the classes, plus link libraries. To use these classes you include the name of the header file in your own file, such as #include "cv.h". But this is not enough, because that file is generally not in the current directory. The solution is as follows:
Open "Project > Properties > Configuration Properties > C/C++ > General > Additional Include Directories" and add all the relevant paths.
All kinds of IDEs offer similar options for include directories, so it is quite normal for whoever clones the project to have to adjust the directories configured in it.

Strategy to omit unused boost src files while shipping source code

I'm using
#include <boost/numeric/ublas/matrix.hpp>
in fact that's the only Boost file I've included. Now I want to ship the source code, and I was hoping not to have to include all the hundreds of MBs of boost_1_67_0.
How to deal with this issue?
This is simply something you would add to the list of build-dependencies of your C++ source code.
This kind of dependency could be made technically "bound" to your source code distribution via your version control system. In Git, for example, you could link to certain Boost libraries via a sub-module that links to their official git mirrors (github.com/boostorg as of this writing). When cloning your repository, it would then be an option to take in the Boost libraries at the same time.
Though, taking the size of the Boost headers into consideration, having them installed as a system-wide library, might be less complicated. Tools like CMake can help you write the logic for header-inclusion so you can support different header locations.
Of course, if what you seek is to create a fully isolated copy of your source code, the approach to bake all code into one massive header-file might be an option as well (but it should not be necessary).
You can preprocess the one header file you need, which will expand all its #includes:
c++ -E /usr/include/boost/numeric/ublas/matrix.hpp -o boost_numeric_ublas_matrix.hpp
Be aware though: this will expand even your system header files, so it assumes your users will build on the same platform. If they might compile on different platforms, you should simply omit the Boost code from your project and let the users install it themselves in whatever manner they choose.

Check for precompiled headers with autotools?

How can I check if gcc precompiled headers are supported with autoconf? Is there a macro like AC_CHECK_GCH? The project I'm working on has a lot of templates and includes, I have tried writing a .h with the most commonly used includes and compiling it manually. It could be nice to integrate it with the rest of autotools.
It's not clear what you are hoping to accomplish. Are you hoping to distribute a precompiled header in your tarball? Doing so would be almost completely useless. (I actually think it would be completely useless, but I say "almost" because I might be missing something.)
The system headers on the target box (the machine on which your project is being built) are almost certainly different than the ones on your development box. If they are the same, then there's no need to be using autoconf.
If the user of your package happens to be using gcc and wants to use precompiled headers, then they will put the .gch files in the appropriate location and gcc will use them. You don't need to do anything in your package.

What are the pros & cons of pre-compiled headers specifically in a GNU/Linux environment/tool-chain?

Pre-compiled headers seem like they can save a lot of time in large projects, but also seem to be a pain-in-the-ass that have some gotchas.
What are the pros & cons of using pre-compiled headers, and specifically as it pertains to using them in a Gnu/gcc/Linux environment?
The only potential benefit to precompiled headers is that if your builds are too slow, precompiled headers might speed them up. Potential cons:
More Makefile dependencies to get right; if they are wrong, you build the wrong thing fast. Not good.
In principle, not every header can be precompiled. (Think about putting some #defines before a #include.) So which cases does gcc actually get right? How much do you want to trust this bleeding-edge feature?
If your builds are fast enough, there is no reason to use precompiled headers. If your builds are too slow, I'd consider
Buying faster hardware, which is cheap compared to salaries
Using a tool like AT&T nmake or ccache (Dirk is right on), both of which use trustworthy techniques to avoid recompilation.
I can't speak to GNU/gcc/Linux, but I've dealt with pre-compiled headers in VS2005:
Pros:
Saves compile time when you have large headers that lots of modules include.
Works well on headers (say, from a third party) that change very infrequently.
Cons:
If you use them for headers that change a lot, they can increase compile time.
Can be fiddly to set up and maintain.
There are cases where changes to headers are apparently ignored if you don't force the pre-compiled header to recompile.
The ccache caching frontend to gcc, g++, gfortran, ... works great for me. As its website says:
ccache is a compiler cache. It acts as a caching pre-processor to C/C++ compilers, using the -E compiler switch and a hash to detect when a compilation can be satisfied from cache. This often results in a 5 to 10 times speedup in common compilations.
On Debian / Ubuntu, just do 'apt-get install ccache' and create soft-links in, say, /usr/local/bin with names gcc, g++, gfortran, c++, ... that point to /usr/bin/ccache.
[EDIT] To make this more explicit in response to some early comments: This provides essentially pre-compiled headers and sources by caching a larger chunk of the compilation step. So it uses an idea that is similar to pre-compiled headers, and carries it further. The speedups can be dramatic -- a factor of 5 to 10 as the website says.
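The soft-link setup can also live in a per-user bin directory if you lack root; the directory name is just an example, and /usr/bin/ccache is the Debian/Ubuntu install path:

```shell
mkdir -p "$HOME/bin"
for tool in gcc g++ gfortran c++; do
    ln -sf /usr/bin/ccache "$HOME/bin/$tool"   # masquerade links
done
export PATH="$HOME/bin:$PATH"   # the links now shadow the real compilers
```

ccache inspects the name it was invoked under, so each link transparently forwards to the matching real compiler while caching results.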
For plain C, I would avoid precompiled headers. As you say, they can potentially cause problems, and preprocessing time is really small compared to the regular compilation.
For C++, precompiled headers can potentially save a lot of time, as C++ headers often contain large template code whose compilation is expensive. I have no practical experience with them, so I recommend you measure how much compilation time you save in your project. To do so, compile the entire project with precompiled headers once, then delete a single object file and measure how long it takes to recompile it.
The GNU gcc documentation discusses possible pitfalls with pre-compiled headers.
I am using PCH in a Qt project, which uses cmake as its build system, and it saves a lot of time. I grabbed some PCH cmake scripts, which needed some tweaking since they were quite old, but it was generally easier to set up than I expected. I have to add, I am not much of a cmake expert.
I am including now a big part of Qt (QtCore, QtGui, QtOpenGL) and a few stable headers at once.
Pros:
For Qt classes, no forward declarations are needed, and of course no includes.
Fast.
Easy to setup.
Cons:
You can't include the PCH include in headers. This isn't much of a problem, except if you use Qt and let the build system translate the moc files separately, which happens to be exactly my configuration. In that case you need to #include the Qt headers in your own headers, because the mocs are generated from headers. The solution was to put additional include guards around the #include in the header.