Can cotire be made to work properly with Boost? - c++

PROBLEM SUMMARY
We took a crack at using cotire, the Compile-Time Reducer, as our precompiled header system because of the extremely long compile times caused by our use of the Boost C++ template library. We are getting poor to dangerous results: the precompiled headers seem to be constantly rebuilt, and they occasionally mask build problems. Specifically, back-to-back builds produce differing hashes of the actual cotire precompiled headers themselves, and, when attempting incremental builds, the precompiled headers are rebuilt every time, even when no header material has changed.
BACKGROUND
The project is a Linux CMake build using g++ 6.3.1, which produces, as its artifacts, an installable shared library and several executables. Possibly of note, we have no interest in using unity builds, because of constant requirements for rapid-iteration development. However, the project is very large, hence the interest in cotire.
It is worth noting that, to avoid unexpected interactions while troubleshooting this issue, we have disabled ccache, the compiler cache. (Our intent is to eventually enable ccache if and only if we get consistent, expected results from cotire, or after we give up and remove cotire.)
The project uses C++03 (though building with C++11 did not change our results for the purposes of this question) and relies extensively on the Boost C++ library. Specifically, we make constant use of Boost's signals/slots, boost::bind, boost::function, and quite a few of the iteration capabilities. Boost implements all of these via extensive template metaprogramming and self-inclusion of headers in varying configurations.
It is possible, though not entirely straightforward, to "blacklist" header files from cotire, excluding them from the generated precompiled headers. We attempted to blacklist various subdirectories within Boost itself (Boost includes some 900 different header files, so we mostly restricted ourselves to its immediate subdirectories). This seemed to produce improved results-- blacklisting certain parts of Boost resulted in back-to-back clean builds that produced matching precompiled header hashes.
Unfortunately, after a more significant delay of perhaps ten minutes, a third attempt at a clean build resulted in a different hash for the precompiled header.
At this point, our working theory was that various special preprocessor symbols such as __TIME__ and __DATE__ were somehow getting mangled into cotire's input. Searching for these in Boost's headers does indeed produce a number of hits in wave, spirit, etc. Presumably this causes cotire to believe that the headers have changed in some way, because it attempts to rebuild the entire precompiled header with every build (though not with every compilation unit; we tentatively believe we have it integrated into the build process correctly).
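For reference, a rough sketch of the two checks described above as shell commands; the Boost install location and the cotire-generated precompiled header paths are assumptions and will differ per project:

# look for time-dependent macros in the Boost headers (hits show up in wave, spirit, etc.)
grep -rl -e __DATE__ -e __TIME__ /usr/include/boost | head
# compare the cotire precompiled header produced by two back-to-back clean builds
md5sum first-build/cotire/mytarget_CXX_prefix.hxx.gch second-build/cotire/mytarget_CXX_prefix.hxx.gch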
QUESTION
Has anyone successfully used cotire with gcc/g++ and Boost? Were any unusual steps required, as opposed to using cotire in a project without Boost?
We are interested in cotire specifically; results from, for example, Visual Studio's precompiled header system might be interesting but are unlikely to be helpful. However, if you are troubleshooting similar issues and wish to provide your observations here, please feel free to do so.

Related

Clang precompiled headers - working with different /usr/include timestamps - perhaps by editing metadata?

I've been trying to address a compile-time problem. The infrastructure in question compiles multiple objects, each of which uses a multitude of stdlib/boost headers. I've essentially hit a limit where simplifying the dependency tree is no longer worth the effort.
So, I tried precompiled headers - and it worked a treat! The problem I have now is fitting it into a large compute farm and CI. Specifically, not all machines were set up at the same time, so the timestamp of /usr/include/ is often different.
The flow we would like to have is:
build certain shared libraries first
precompile header
Launch multiple jobs on different machines using shared libraries (fine) and precompiled header
The header is precompiled in the following way:
clang++ precompiled.hpp -o /<path>/precompiled.hpp.pch
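For context, a hedged sketch of how such a PCH would then be consumed on each build machine via clang's -include-pch flag (the source file name here is a placeholder):

clang++ -include-pch /<path>/precompiled.hpp.pch -c some_source.cpp -o some_source.o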
When I use the precompiled header, depending on the timestamp of /usr/include/ on the given machine, I get the following metadata error:
fatal error: file '/usr/include/math.h' has been modified since the
precompiled header '//precompiled.hpp.pch' was built
It may sometimes be a different header too - e.g. assert.h is a common one.
So far I've tried the following:
changing isysroot & using glibc - this exposed a variety of different problems (a can of worms I'd rather not open yet)
a hack: copying /usr/include/ elsewhere and specifying that copy earlier in the search path. Unfortunately, this doesn't work because some headers use include_next and others don't, i.e. I can't consistently force the headers to be picked up from elsewhere and none from /usr/include
Any ideas on how to tackle this problem?
I am now considering an even worse hack - trying to edit the metadata of the precompiled header. Unfortunately, I couldn't find any API to easily query/edit the PCH.
Any ideas?
Have now managed to come to a solution (probably beneficial long term anyway in terms of stability, even ignoring precompiled headers).
Specify --no-standard-includes -nostdinc++ -nostdlibinc. This ensures the include path is stripped of the directories baked in by the way gcc/clang was built. You may also want to clear CPATH= and CPLUS_INCLUDE_PATH=.
Reconstruct the path using a central location via CPLUS_INCLUDE_PATH. This means the headers that would come from /usr/include always come from a central location, and the metadata check will pass. It should also help the stability of builds (see the sketch below).
Link against the correct version of the standard library.
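A minimal sketch of that flow, assuming clang and a hypothetical central include tree at /central/include; the exact flag set may need adjusting for your toolchain:

export CPATH=
export CPLUS_INCLUDE_PATH=/central/include
# build the PCH without the machine-local /usr/include in the path
clang++ -nostdinc++ -nostdlibinc -x c++-header precompiled.hpp -o precompiled.hpp.pch
# consume it with the same include configuration on every farm machine
clang++ -nostdinc++ -nostdlibinc -include-pch precompiled.hpp.pch -c some_source.cpp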

Strategy to omit unused boost src files while shipping source code

I'm using
#include <boost/numeric/ublas/matrix.hpp>
In fact, that's the only Boost file I've included. Now I want to ship the source code, and I was hoping not to have to include all the hundreds of MBs of boost_1_67_0.
How to deal with this issue?
This is simply something you would add to the list of build-dependencies of your C++ source code.
This kind of dependency could be made technically "bound" to your source code distribution via your version control system. In Git, for example, you could link to certain Boost libraries via a sub-module that links to their official git mirrors (github.com/boostorg as of this writing). When cloning your repository, it would then be an option to take in the Boost libraries at the same time.
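For illustration, one possible shape of this (the submodule path is arbitrary, and note that an individual Boost library such as uBLAS has its own Boost dependencies you would also need):

git submodule add https://github.com/boostorg/ublas external/boost-ublas
git submodule update --init --recursive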
Though, taking the size of the Boost headers into consideration, having them installed as a system-wide library might be less complicated. Tools like CMake can help you write the logic for header inclusion so you can support different header locations.
Of course, if what you seek is to create a fully isolated copy of your source code, the approach to bake all code into one massive header-file might be an option as well (but it should not be necessary).
You can preprocess the one header file you need, which will expand all its #includes:
c++ -E /usr/include/boost/numeric/ublas/matrix.hpp -o boost_numeric_ublas_matrix.hpp
Be aware though: this will expand even your system header files, so it assumes your users will build on the same platform. If they might compile on different platforms, you should simply omit the Boost code from your project and let the users install it themselves in whatever manner they choose.

uint32_t does not name a type

I have shared code given to me that compiles on one Linux system but not on a newer system. The error is uint32_t does not name a type. I realize that this is often fixed by including <cstdint> or <stdint.h>. The source code has neither of these includes, and I am seeking an option that doesn't require modifying it, due to internal business practices that I can't control. Since it compiles as-is on one machine, they don't want changes to the source code.
I am not sure if it matters but the older system uses gcc 4.1 while the newer one uses gcc 4.4. I could install different versions of gcc if needed, or add/install library/include files on the newer machine, I have full control of what is on that machine.
What are my options for trying to compile this code on my machine without modifying the source? I can provide other details if needed.
I am not sure if it matters but the older system uses gcc 4.1 while the newer one uses gcc 4.4
GCC's headers stopped pulling in <stdint.h> indirectly some time ago. You now have to include something to get it...
I realize that this is often fixed by including the <cstdint> or stdint.h. The source code has neither of these includes and I am trying to seek an option that doesn't require modifying due to internal business practices that I can't control...
I hope I am not splitting hairs... If you can't modify the source files, then are you allowed to modify the build system or configuration files; or the environment? If so, you can use a force include to insert the file. See Include header files using command line option?
You can modify the Makefile to force-include stdint.h. If the build system honors CFLAGS or CXXFLAGS, then you can force-include it via the flags. Your last choice is probably to do something like export CC="gcc -include stdint.h".
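For example, a hedged sketch assuming the Makefile picks up CFLAGS/CXXFLAGS from the command line or environment:

make CFLAGS="-include stdint.h" CXXFLAGS="-include stdint.h"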
The reason I am splitting hairs is OpenSSL and FIPS. The OpenSSL source files for the FIPS Object Module are sequestered and cannot be modified. We have to fallback to modifying supporting scripts and the environment to get some things working as expected.
If you really don't want to amend the file, you could wrap it. Suppose it's called src.c; create a new file src1.c:
#include <stdint.h>
#include "src.c"
And then compile src1.c.
PS: The problem may arise because compilers include other headers in their header files. This can mean some symbols 'officially' defined in other headers are quietly defined when you include a header that isn't specified as including it.
It's an error to write a program relying on a symbol for which the appropriate header hasn't been included - but it's easy to do and difficult to spot.
A changing compiler or version sometimes reveals these quiet issues.
Unfortunately, you can't force your code to work on a newer compiler without modifying something.
If you are allowed to modify the build script and add source files to the project, you might be able to add another source file to the project which, in turn, includes your affected file and headers it really needs. Remove the affected source files from the build, add the new ones, and rebuild.
If your shared source is using macro magic (e.g. an #include SOME_MACRO, where SOME_MACRO can be defined on the command line), you might be able to get away with modifying build options (to define that macro for every compilation of each file). Apart from relying on modifying the build process, it also relies on a possible-but-less-than-usual usage of macros in your project.
It may be possible to modify the standard headers in your compiler/library installation - assuming you have sufficient access (administrative) to do so. The problem with this is that the problem will almost certainly re-emerge whenever an update/patch to the compiler/library is installed. Over time, this approach will lock the code into relying on an older and older compiler/library that has been superseded - no ability to benefit from compiler bug fixes, evolution of standards, etc. This also severely limits your ability to share the code, and ability of others to use it - anyone who receives the code needs to modify their compiler/library installation.
The basic fact, however, is that your shared code relies on a particular implementation (compiler/library) that exhibits non-standard behaviour. It has failed with an update of that implementation, which removed those non-standard occurrences, and it is likely to fail with other implementations (porting to different compilers in the future, etc). The real technical solution is to modify the source and #include the needed headers correctly. The real business solution is to make a business case justifying such modifications, citing the inefficiency - which will grow over time - in terms of cost and effort needed to maintain the shared code whenever it needs to be ported or whenever a compiler is updated.
Look at the second-to-last line of code above your error; you'll find everything above that terminates with a , and only the last entry uses a ;

Every C++ header in a project as a precompiled header

The usual approach is to have one precompiled header in a project that contains the most common includes.
The problem is that it is either too small or too big. When it is too small, it doesn't cover all the used headers, so those have to be processed over and over in every module. When it is too large, it slows down the compilation too much, for two reasons:
The project needs to be recompiled too often when you change something in a header contained in the precompiled header.
The precompiled header is too large, so including it in every file actually slows down compilation.
What if I made all of the header files in a project precompiled? This would add some additional compiler work to precompile them, but then it would work very nicely: no header would have to be processed twice (even preparing a precompiled header would use precompiled headers recursively), no extra material would have to be put into modules, and only modules that actually need to be recompiled would be recompiled. In other words, for O(N) extra work I would (theoretically) optimise away the O(N^2) complexity of C++ includes. Preprocessing would drop to O(N); the processing of the precompiled data would still be O(N^2), but at least minimised.
Has anyone tried this? Can it boost compile times in real-life scenarios?
With GCC, the reliable way to use precompiled headers is to have one single (big) header (which #include-s many standard headers ...), and perhaps include some small header after the precompiled one.
See this answer for a more detailed explanation (for GCC specifically).
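For concreteness, a minimal sketch of that single-big-header workflow with GCC; the file names here are hypothetical, and the compile flags used to build the PCH must match those used when consuming it:

# precompile the single big header once
g++ -O2 -x c++-header big_pch.hpp -o big_pch.hpp.gch
# every translation unit includes it first; GCC finds big_pch.hpp.gch automatically
g++ -O2 -Winvalid-pch -include big_pch.hpp -c module.cpp -o module.o

GCC uses big_pch.hpp.gch whenever big_pch.hpp would be included, as long as the .gch sits next to the header and was built with compatible flags.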
My own experience with GCC and Clang is that you can only use a single precompiled header per compilation. See also the GCC documentation, which I quote:
A precompiled header file can be used only when these conditions apply:
Only one precompiled header can be used in a particular compilation.
...
In practice, it's possible to compile every header into a precompiled header. (Recommended if you want to verify that every header includes everything it needs; not recommended if you want to speed up compilation.)
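As a rough sketch of what precompiling every header could look like (the include directory is hypothetical, the loop assumes file names without spaces, and your project's usual -I and other flags would need to be added):

for h in $(find include -name '*.hpp'); do g++ -I include -x c++-header "$h" -o "$h.gch"; done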
Based on your code, you can decide to use a different precompiled header based on the code that needs to be compiled. However, in general, it's a balancing act between compile time of the headers, compile-time of the CPP files and maintenance.
Adding a simple precompiled header that already contains several standard headers like string, vector, map, utility ... can already speed up your compilation by a noticeable percentage. (A long time ago, I noticed 15-20% on a small project.)
The main gain you get from precompiled headers is that the compiler:
only has to read one file instead of many, which improves disk access
reads a binary format that's optimized for reading instead of plain text
doesn't need to redo all of the error checking, as this was already done when the precompiled header was created
Even if you add a few headers that you don't use everywhere, it can still be much faster.
Lately, I also found the Clang build analyzer. It isn't ideal for big projects (see the issue on GitHub), though it can give you some insight into where the time is being spent and what can be improved (or what you can improve in the codebase).
In all fairness, I don't use precompiled headers at this point in time. However, I do want to see it enabled on the project I'm working on.
Some other interesting reads:
https://medium.com/@unicorn_dev/speeding-up-the-build-of-c-and-c-projects-453ce85dd0e1
https://llunak.blogspot.com/2019/05/why-precompiled-headers-do-not-improve.html
https://www.bitsnbites.eu/faster-c-builds/

What are the pros & cons of pre-compiled headers specifically in a GNU/Linux environment/tool-chain?

Pre-compiled headers seem like they can save a lot of time in large projects, but also seem to be a pain-in-the-ass that have some gotchas.
What are the pros & cons of using pre-compiled headers, and specifically as it pertains to using them in a Gnu/gcc/Linux environment?
The only potential benefit to precompiled headers is that if your builds are too slow, precompiled headers might speed them up. Potential cons:
More Makefile dependencies to get right; if they are wrong, you build the wrong thing fast. Not good.
In principle, not every header can be precompiled. (Think about putting some #define's before a #include.) So which cases does gcc actually get right? How much do you want to trust this bleeding-edge feature?
If your builds are fast enough, there is no reason to use precompiled headers. If your builds are too slow, I'd consider
Buying faster hardware, which is cheap compared to salaries
Using a tool like AT&T nmake or like ccache (Dirk is right on), both of which use trustworthy techniques to avoid recompilations.
I can't speak to GNU/gcc/Linux, but I've dealt with pre-compiled headers in VS2005:
Pros:
Saves compile time when you have large headers that lots of modules include.
Works well on headers (say from a third party) that change very infrequently.
Cons:
If you use them for headers that change a lot, it can increase compile time.
Can be fiddly to set up and maintain.
There are cases where changes to headers are apparently ignored if you don't force the pre-compiled header to compile.
The ccache caching frontend to gcc, g++, gfortran, ... works great for me. As its website says:
ccache is a compiler cache. It acts as a caching pre-processor to C/C++ compilers, using the -E compiler switch and a hash to detect when a compilation can be satisfied from cache. This often results in a 5 to 10 times speedup in common compilations.
On Debian / Ubuntu, just do 'apt-get install ccache' and create soft-links in, say, /usr/local/bin with names gcc, g++, gfortran, c++, ... that point to /usr/bin/ccache.
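Roughly, the masquerade setup described above; this assumes /usr/local/bin precedes /usr/bin in your PATH:

sudo apt-get install ccache
cd /usr/local/bin
sudo ln -s /usr/bin/ccache gcc
sudo ln -s /usr/bin/ccache g++
sudo ln -s /usr/bin/ccache c++
sudo ln -s /usr/bin/ccache gfortran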
[EDIT] To make this more explicit in response to some early comments: This provides essentially pre-compiled headers and sources by caching a larger chunk of the compilation step. So it uses an idea that is similar to pre-compiled headers, and carries it further. The speedups can be dramatic -- a factor of 5 to 10 as the website says.
For plain C, I would avoid precompiled headers. As you say, they can potentially cause problems, and preprocessing time is really small compared to the regular compilation.
For C++, precompiled headers can potentially save a lot of time, as C++ headers often contain large amounts of template code whose compilation is expensive. I have no practical experience with them, so I recommend you measure how much saving in compilation you get in your project. To do so, compile the entire project with precompiled headers once, then delete a single object file and measure how long it takes to recompile that file.
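For instance, a rough way to time that single-file rebuild; the object path and the use of make are assumptions about your build:

rm build/src/some_module.o
time make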
The GNU gcc documentation discusses possible pitfalls with pre-compiled headers.
I am using PCH in a Qt project, which uses cmake as the build system, and it saves a lot of time. I grabbed some PCH cmake scripts, which needed some tweaking since they were quite old, but it was generally easier to set up than I expected. I have to add, I am not much of a cmake expert.
I am including now a big part of Qt (QtCore, QtGui, QtOpenGL) and a few stable headers at once.
Pros:
For Qt classes, no forward declarations are needed, and of course no includes.
Fast.
Easy to setup.
Cons:
You can't include the PCH include in headers. This isn't much of a problem, except when you use Qt and let the build system translate the moc files separately, which happens to be exactly my configuration. In this case, you need to #include the Qt headers in your headers, because the mocs are generated from headers. The solution was to put additional include guards around the #include in the header.