I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.
What are the worst build times you ever had to handle?
What strategies have you used to improve build times on large projects?
Update:
How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?
1. Forward declaration
2. pImpl idiom
3. Precompiled headers
4. Parallel compilation (e.g. MPCL add-in for Visual Studio).
5. Distributed compilation (e.g. Incredibuild for Visual Studio).
6. Incremental build
7. Split the build into several "projects" so you don't compile all the code if it isn't needed.
[Later Edit]
8. Buy faster machines.
My strategy is pretty simple - I don't do large projects. The whole thrust of modern computing is away from the giant and monolithic and towards the small and componentised. So when I work on projects, I break things up into libraries and other components that can be built and tested independently, and which have minimal dependencies on each other. A "full build" in this kind of environment never actually takes place, so there is no problem.
One trick that sometimes helps is to include everything into one .cpp file. Since headers are processed once per translation unit, this can save you a lot of time. (The downside is that it makes it impossible to compile the translation units in parallel.)
You should be able to specify that multiple .cpp files are compiled in parallel (-j with make on Linux, /MP on MSVC; MSVC also has a separate option to compile multiple projects in parallel, and there's no reason why you shouldn't use both).
In the same vein, distributed builds (Incredibuild, for example), may help take the load off a single system.
SSD disks are supposed to be a big win, although I haven't tested this myself (but a C++ build touches a huge number of files, which can quickly become a bottleneck).
Precompiled headers can help too, when used with care. (They can also hurt you, if they have to be recompiled too often).
And finally, trying to minimize dependencies in the code itself is important. Use the pImpl idiom, use forward declarations, keep the code as modular as possible. In some cases, use of templates may help you decouple classes and minimize dependencies. (In other cases, templates can slow down compilation significantly, of course)
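For concreteness, here is a rough sketch of what the pImpl-plus-forward-declaration pattern tends to look like (Widget and Database are made-up names, not from any real project): the public header exposes only an opaque pointer, so the heavy includes and the private members move into the .cpp file, and edits to them no longer recompile every client.

// widget.h -- illustrative sketch only
#include <memory>

class Database;                     // forward declaration instead of #include "database.h"

class Widget {
public:
    Widget();
    ~Widget();                      // defined in widget.cpp, where Impl is complete
    void attach(Database& db);      // a reference parameter only needs the forward declaration
    void draw();
private:
    struct Impl;                    // defined only in widget.cpp
    std::unique_ptr<Impl> pimpl;    // compile-time firewall
};

// widget.cpp
#include "widget.h"
#include "database.h"               // the heavy header is pulled in here only

struct Widget::Impl {
    Database* db = nullptr;
    int cachedValue = 0;
};

Widget::Widget() : pimpl(new Impl) {}
Widget::~Widget() = default;
void Widget::attach(Database& db) { pimpl->db = &db; }
void Widget::draw() { /* use pimpl->db, pimpl->cachedValue, ... */ }

The cost is an extra heap allocation and a pointer indirection per object, which is usually negligible next to the build-time win.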
But yes, you're right, this is very much a language thing. I don't know of another language which suffers from the problem to this extent. Most languages have a module system that allows them to eliminate header files, which are a huge factor. C has header files, but is such a simple language that compile times are still manageable. C++ gets the worst of both worlds: a big complex language, and a terribly primitive build mechanism that requires a huge amount of code to be parsed again and again.
Multi-core compilation. Very fast with 8 cores compiling on an i7.
Incremental linking
External constants
Removed inline methods on C++ classes.
The last two gave us a reduced linking time from around 12 minutes to 1-2 minutes. Note that this is only needed if things have huge visibility, i.e. are seen "everywhere", and if there are many different constants and classes.
Cheers
IncrediBuild
Unity Builds
Incredibuild
Pointer to implementation
forward declarations
compiling "finished" sections of the proejct into dll's
ccache & distcc (for C/C++ projects) -
ccache caches compiled output, using the pre-processed file as the 'key' for finding the output. This is great because pre-processing is pretty quick, and quite often changes that force a recompile don't actually change the source for many files. It also really speeds up a full recompile. Another nice feature is that you can have a shared cache among team members, which means that only the first guy to grab the latest code actually compiles anything.
distcc does distributed compilation across a network of machines. This is only good if you HAVE a network of machines to use for compilation. It goes well with ccache, and only moves the pre-processed source around, so the only thing you have to worry about on the compiler engine systems is that they have the right compiler (no need for headers or your entire source tree to be visible).
The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.
Can the build be done in parallel? Either with multiple cores, or with multiple servers?
Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. Third-party tools/libraries that are used but not altered are good candidates for this treatment.
Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.
Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?
Finally, you can refactor the code base to be more modular and to have fewer dependencies. Large-Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.
EDIT: Building on a local filesystem as opposed to an NFS-mounted filesystem can also dramatically speed up build times.
Fiddle with the compiler optimisation flags.
Use option -j4 for gmake for parallel compilation (multi-core or single core).
If you are using clearmake, use winking.
We can take out the debug flags in extreme cases.
Use some powerful servers.
This book Large-Scale C++ Software Design has very good advice I've used in past projects.
Minimize your public API
Minimize inline functions in your API. (Unfortunately this also increases linker requirements).
Maximize forward declarations.
Reduce coupling between code. For instance, pass two integers for coordinates into a function instead of your custom Point class that has its own header file (see the sketch after this list).
Use Incredibuild. But it has some issues sometimes.
Do NOT put code that gets exported from two different modules in the SAME header file.
Use the pImpl idiom. Mentioned before, but bears repeating.
Use Pre-compiled headers.
Avoid C++/CLI (i.e. managed C++). Linker times are impacted too.
Avoid using a global header file that includes 'everything else' in your API.
Don't put a dependency on a lib file if your code doesn't really need it.
Know the difference between including files with quotes and angle brackets.
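To illustrate the coupling point above (the Renderer and Point names are hypothetical): a header that takes primitives, or only a reference to a forward-declared type, never needs to include point.h, so changes to Point stop rippling into every file that includes renderer.h.

// renderer.h -- sketch only
class Point;                        // forward declaration: enough for references and pointers

class Renderer {
public:
    void moveTo(int x, int y);      // primitives: no extra header needed
    void moveTo(const Point& p);    // reference parameter: the forward declaration suffices
};

// Only renderer.cpp, which actually touches Point's members, includes point.h.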
Powerful compilation machines and parallel compilers. We also make sure the full build is needed as little as possible. We don't alter the code to make it compile faster.
Efficiency and correctness is more important than compilation speed.
In Visual Studio, you can set the number of projects to compile at a time. Its default value is 2; increasing that would reduce some time.
This will help if you don't want to mess with the code.
This is the list of things we did for development under Linux:
As Warrior noted, use parallel builds (make -jN)
We use distributed builds (currently icecream, which is very easy to set up); with this we can have tens of processors at a given time. This also has the advantage of giving the builds to the most powerful and least loaded machines.
We use ccache so that when you do a make clean, you don't have to really recompile your sources that didn't change, it's copied from a cache.
Note also that debug builds are usually faster to compile since the compiler doesn't have to make optimisations.
We tried creating proxy classes once.
These are really a simplified version of a class that only includes the public interface, reducing the number of internal dependencies that need to be exposed in the header file. However, they came with a heavy price of spreading each class over several files that all needed to be updated when changes to the class interface were made.
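For reference, a proxy of this kind usually ends up looking something like the following sketch (the codec names are invented): an abstract interface that repeats only the public methods, behind a factory function, so client code depends on a header that almost never changes.

// imagecodec.h -- illustrative proxy/interface sketch
class IImageCodec {
public:
    virtual ~IImageCodec() {}
    virtual bool decode(const char* path) = 0;
    virtual int  width() const = 0;
    virtual int  height() const = 0;
};

// Implemented in imagecodec.cpp, which is the only file that includes the
// heavy codec headers and the full implementation class.
IImageCodec* createImageCodec();

The maintenance cost mentioned above still applies: every interface change has to be made in at least two places.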
In general large C++ projects that I've worked on that had slow build times were pretty messy, with lots of interdependencies scattered through the code (the same include files used in most cpps, fat interfaces instead of slim ones). In those cases, the slow build time was just a symptom of the larger problem, and a minor symptom at that. Refactoring to make clearer interfaces and break code out into libraries improved the architecture, as well as the build time. When you make a library, it forces you to think about what is an interface and what isn't, which will actually (in my experience) end up improving the code base. If there's no technical reason to have to divide the code, some programmers through the course of maintenance will just throw anything into any header file.
Cătălin Pitiș covered a lot of good things. Other ones we do:
Have a tool that generates reduced Visual Studio .sln files for people working in a specific sub-area of a very large overall project
Cache DLLs and pdbs from when they are built on CI for distribution on developer machines
For CI, make sure that the link machine in particular has lots of memory and high-end drives
Store some expensive-to-regenerate files in source control, even though they could be created as part of the build
Replace Visual Studio's checking of what needs to be relinked by our own script tailored to our circumstances
It's a pet peeve of mine, so even though you already accepted an excellent answer, I'll chime in:
In C++, it's less the language as such than the language-mandated build model, which was great back in the seventies, and the header-heavy libraries.
The only thing that is wrong about Cătălin Pitiș' reply: "buy faster machines" should go first. It is the easiest way with the least impact.
My worst was about 80 minutes on an aging build machine running VC6 on W2K Professional. The same project (with tons of new code) now takes under 6 minutes on a machine with 4 hyperthreaded cores, 8 GB RAM, Win 7 x64 and decent disks. (A similar machine with about 10-20% less processor power, 4 GB RAM and Vista x86 takes twice as long.)
Strangely, incremental builds are most of the time slower than full rebuilds now.
Full build is about 2 hours. I try to avoid making modifications to the base classes, and since my work is mainly on the implementation of these base classes I only need to build small components (a couple of minutes).
Create some unit test projects to test individual libraries, so that if you need to edit low level classes that would cause a huge rebuild, you can use TDD to know your new code works before you rebuild the entire app. The John Lakos book as mentioned by Themis has some very practical advice for restructuring your libraries to make this possible.
I'm currently trying to optimise build speed for a big project with following in mind:
build speed is priority 1
resulting binary size does not matter
Info:
Environment: Visual Studio 2012 (required, because of the software I'm developing for) + Windows machine
Build time: 12 min (clean build), 1 min for small changes; every now and then small changes result in 5-6 min because of slow linking (this is what I want to address)
Custom files in project: approx. 2500 (excluding the SDK I need to use, a big SDK for a CAD system)
Lines of code in custom files: approx. 500000
I'm using an up-to-date CAD capable computer (32GB RAM, >3GHz QuadCore, SSD)
Ideas:
use precompiled headers => done, but it does not have the effect I want; it helps speed up compile time most of the time, but every now and then it does not
split up project into libraries => not sure if this helps
Questions
I could not find anything about using libraries and build speed, but I assume if I precompile libraries, the linker will be faster.
Is this assumption true?
If I make a static library with the core functions, will this have an effect on build time? Or will the linker need as long as it does currently?
If I make a dynamic library, will this have an effect on build time? Or will the linker again check the dll completely and will need the same time?
I assume if I precompile libraries, the linker will be faster. Is this assumption true?
No, not likely. If at all (because the linker has to open fewer files), then the difference will be marginal.
If I make a static library with the core functions, will this have an effect on build time? Or will the linker need as long as it does currently?
It may make a huge difference in compile time. On a truly clean rebuild you still have to compile everything as before, but on a normal "mostly clean" rebuild, rebuilding the support libraries is superfluous since nothing ever changes inside them. All you really need to rebuild is the user code, and as a result you compile a lot fewer files.
Note that every sane build system normally builds a dependency graph and tries to compile as few files as possible anyway (and, to the extent possible, with some level of parallelism), unless you explicitly tell it to do a clean build (which is rarely necessary to be done). Doctor, it hurts when I do this -- well, don't do it.
The difference for the linker will, again, be marginal. The linker still needs to look up the exact same amount of symbols, and still needs to copy the same amount of code into the executable.
You may want to play with link order. Funny as it sounds, sometimes the order in which libraries and object files are linked makes a 5x difference on how long it takes the linker to do its job.
That being said, 12 minutes for a clean build indeed isn't a lot. Your non-clean builds will likely be in the two-digit second range, of which linking probably takes 90%. That's normally not a showstopper. Come back when a build takes 4 hours :-)
If I make a dynamic library, will this have an effect on build time? Or will the linker again check the dll completely and will need the same time?
The linker will still have to do some work for every function you call, which might be slightly faster, but will still be more or less the same.
Note that you add runtime (startup) overhead by moving code into a DLL. It is more work for the loader to load a program with parts of the code in a DLL as it needs to load another image, parse its header, resolve symbols, set up some pointers, run per-thread init functions, etc. That's usually not an issue (the difference is not really that much noticeable), just letting you know it's not free.
12 minutes is a short full build time and 500KLOC is not that big. Many free software projects (GCC, Qt, ...) have longer ones (hours) and millions of C++ lines.
You might want to use a serious and parallel build automation tool, such as ninja. Perhaps you could do some distributed build (like what distcc permits) if you can compile on remote machines.
You could configure your IDE to run an external command (such as ninja) for builds. This doesn't change autocompletion abilities. You could also adopt another source code editor (e.g. GNU emacs).
C++ is not (yet) modular (it does not have genuine modules, like e.g. OCaml or Go), and that makes its compilation slow (e.g. because standard container headers are big; <vector> brings in about 10 KLOC of included code, probably used and included in most of your C++ code). So you should avoid having many small files (e.g. merging two files of 250 lines each into one of 500 lines could decrease build time), and it looks like you have too many small C++ files. I would recommend source files of more than a thousand lines each. Having only one class implementation (or one function) per source file slows down the total build time.
You surely want to use more indirection in your code. Use the pImpl idiom, virtual method tables, closures and std::function more systematically. Remember the rule of five.
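As a small illustration of the std::function point (the logger names are made up): accepting a std::function instead of a template parameter keeps the callback type out of the header and lets the definition live in a single .cpp file, so the header stays small and stable.

// logger.h -- sketch of std::function used for decoupling
#include <functional>
#include <string>

// Declared here, defined once in logger.cpp. A template parameter would force
// the definition into the header and re-instantiate it in every translation
// unit that uses it; std::function avoids that.
void forEachLogLine(const std::function<void(const std::string&)>& callback);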
Compiling my project takes ages and I thought I would like to improve the compile time of it. The first thing I am trying to do is to break down the compile time into the individual files.
So that the compiler tells me for example:
boost/variant.hpp: took 100ms in total
myproject/foo.hpp: took 25ms in total
myproject/bar.cpp: took 125ms in total
I could then specifically try to improve the compile time of the files taking up the most time, by introducing forward declarations and/or reordering things so I can omit include files.
Is there something for this task? I am using GCC and ICC (Intel C++).
I use Scons as my build system.
The important metric is not how long it takes to process (whatever that means) a header file, but how often the header file changes and forces the build system to reinvoke the compiler on all dependent units.
The time the compiler spends parsing useless code is really small compared to all the other steps of the compilation process. Even if you include entire unneeded files, they're likely hot in disk cache. And precompiled headers make this even better.
The goal is to avoid recompiling units due to unrelated changes in header files. That's where techniques such as pimpl and other compile firewalls come in.
And link-time-code-generation aka whole-program-optimization makes matters worse, by undoing compile-time firewalls and reprocessing the entire program anyway.
Anyway, information on how unstable a header file is should be attainable from build logs, commit logs, even last modified date in the filesystem.
You have an unusual, quirky definition of the time spent processing header files that doesn't match what anyone else uses. So you can probably make this happen, but you'll have to do it yourself. Probably the best way is to run gcc under strace -tt. You can then see when it opens, reads, and closes each file, allowing you to tell how long it processes them.
Have you tried instrumenting the build as a whole yet? Like any performance problem, it's likely that what you think is the problem is not actually the problem. Electric Make is a GNU-make-compatible implementation of make that can produce an XML-annotated build log, which can in turn be used for analysis of build performance issues with ElectricInsight. For example, the "Job Time by Type" report in ElectricInsight can tell you broadly what activities consume the most time in your build, and specifically which jobs in the build are longest. That will help you to direct your efforts to the places where they will be most fruitful.
Disclaimer: I am the chief architect of Electric Make and ElectricInsight.
We have a project which uses gcc and makefiles. The project also consists of one big subproject (an SDK) and a lot of relatively small subprojects which use that SDK and some shared framework.
We use precompiled headers, but that only helps make recompilation faster.
Are there any known techniques and tools to help with build-time optimizations? Or maybe you know some articles/resources about this or related topics?
You can tackle the problem from two sides: refactor the code to reduce the complexity the compiler is seeing, or speed up the compiler execution.
Without touching the code, you can throw more compilation power at it. Use ccache to avoid recompiling files you have already compiled, and distcc to distribute the build among more machines. Use make -jN, where N is the number of cores + 1 if you compile locally, or a bigger number for distributed builds. That flag will run more than one compiler in parallel.
Refactoring the code. Prefer forward declaration to includes (simple). Decouple as much as you can to avoid dependencies (use the PIMPL idiom).
Template instantiation is expensive; templates are recompiled in every compilation unit that uses them. If you can, refactor your templates so that they are forward declared and then instantiated in only one compilation unit.
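A sketch of that technique, assuming C++11's extern template is available (Matrix is a made-up name): every includer is told the instantiation already exists somewhere, and exactly one .cpp file provides it.

// matrix.h -- sketch only
template <typename T>
class Matrix {
public:
    T sum() const;
private:
    T*  data_ = nullptr;
    int size_ = 0;
};

template <typename T>
T Matrix<T>::sum() const {
    T total{};
    for (int i = 0; i < size_; ++i) total += data_[i];
    return total;
}

extern template class Matrix<double>;   // do not instantiate in every includer

// matrix.cpp
#include "matrix.h"
template class Matrix<double>;          // the single explicit instantiation

Every file that includes matrix.h now skips instantiating Matrix<double>'s members; only matrix.cpp pays that cost.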
The best I can think of with make is the -j option. This tells make to run as many jobs as possible in parallel:
make -j
If you want to limit the number of concurrent jobs to n you can use:
make -j n
Make sure the dependencies are correct so make doesn't run jobs it doesn't have to.
Another thing to take into account is the optimization that gcc does with the -O switch. You can specify various levels of optimization. The higher the optimization, the longer the compile and link times. A project I work with takes 2 minutes to link with -O3, and half a minute with -O1. You should make sure you're not optimizing more than you need to. You could build without optimization for development builds and with optimization for deployment builds.
Compiling with debug info (gcc -g) will probably increase the size of your executable and may impact your build time. If you don't need it, try removing it to see if it affects you.
The type of linking (static vs. dynamic) should make a difference. As far as I understand static linking takes longer (though I may be wrong here). You should see if this affects your build.
From the description of the project I guess that you have one Makefile per directory and are using recursive make a lot. In that case techniques from "Recursive Make Considered Harmful" should help very much.
If you have multiple computers available, gcc distributes well via distcc.
You can also use ccache in addition.
All this works with very little changes of the makefiles.
Also, you'll probably want to keep your source code files as small and self-contained as possible/feasible, i.e. prefer many smaller object files over one huge single object file.
This will also help avoid unnecessary recompilations; in addition, you can have one static library with object files for each source code directory or module, basically allowing the compiler to reuse as much previously compiled code as possible.
Something else, which wasn't yet mentioned in any of the previous responses, is making symbol linkage as 'private' as possible, i.e. prefer static linkage (functions, variables) for your code if it doesn't have to be visible externally.
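Concretely, that usually means an unnamed namespace (or the static keyword) in the .cpp file, something like this sketch (the helper names are invented):

// parser.cpp -- sketch only
namespace {                           // unnamed namespace: internal linkage
    const int kMaxTokens = 1024;      // not visible outside this object file

    int countDigits(const char* s) {
        int n = 0;
        for (; *s; ++s)
            if (*s >= '0' && *s <= '9') ++n;
        return n;
    }
}
// Only the functions declared in parser.h are exported, so the linker has
// fewer symbols to resolve.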
In addition, you may also want to look into using the GNU gold linker, which is much more efficient at linking C++ code for ELF targets.
Basically, I'd advise you to carefully profile your build process and check where the most time is spent; that'll give you some hints as to how to optimize your build process or your project's source code structure.
You could consider switching to a different build system (which obviously won't work for everyone), such as SCons. SCons is much smarter than make. It automatically scans header dependencies, so you always have the smallest set of rebuild dependencies. By adding the line Decider('MD5-timestamp') to your SConstruct file, SCons will first look at the time stamp of a file, and if it's newer than the previously built time stamp, it will use the MD5 of the file to make sure you actually changed something. This works not just on source files but object files as well. This means that if you change a comment, for instance, you don't have to re-link.
The automatic scanning of header files has also ensured that I never have to type scons --clean. It always does the right thing.
If you have a LAN with developer machines, perhaps you should try implementing a distributed compiler solution, such as distcc.
This might not help if all of the time during the build is spent analyzing dependencies, or doing some single serial task. For the raw crunch of compiling many source files into object files, parallel building obviously helps, as suggested (on a single machine) by Nathan. Parallelizing across multiple machines can take it even further.
http://ccache.samba.org/ speeds up big time.
I work on a middle sized project, and that's the only thing we do to speed up the compile time.
You can use distcc distributed compiler to reduce the build time if you have access to several machines.
Here's an article from IBM developerWorks related to distcc and how you can use it:
http://www.ibm.com/developerworks/linux/library/l-distcc.html
Another method to reduce build time is to use precompiled headers. Here's a starting point for gcc.
Also don't forget to use -j when building with make if your machine has more than one CPU/core (2x the number of cores/CPUs is just fine).
Using small files may not always be a good recommendation. A disk has a minimum allocation size of 32 or 64K, with a file taking at least one allocation unit. So 1024 files of 3K each (small code inside) will actually take 32 or 64 MB on disk, instead of the expected 3 MB, and that 32/64 MB needs to be read by the drive. If files are dispersed around the disk you increase read time even more with seek time. This is helped by the disk cache, obviously, up to a limit. Precompiled headers can also be of good help in alleviating this.
So, with due respect to coding guidelines, there is no point in going outside them just to place each struct, typedef or utility class into a separate file.
I'm desperately looking for cheap ways to lower the build times on my home PC. I just read an article about disabling the Last Access Time attribute of a file on Windows XP, so that simple reads don't write anything back to disk.
It's really simple too. At a DOS-prompt write:
fsutil behavior set disablelastaccess 1
Has anyone ever tried it in the context of building C++ projects? Any drawbacks?
[Edit] More on the topic here.
From SetFileTime's documentation:
"NTFS delays updates to the last access time for a file by up to one hour after the last access."
There's no real point turning this off - the original article is wrong, the data is not written out on every access.
EDIT:
As to why the author of that article claimed a 10x speed-up, I think he attributed his speed-up to the wrong thing: he also disabled 8.3 filename generation. To generate an 8.3 filename for a file, NTFS has to basically generate each possibility in turn and then see if it's already in use (no reference; I'm sure Raymond has talked about it but I can't find a link). If your files all share the same first six characters, you will be bitten by this problem, and the corollary is that you should put characters which differentiate files in the first six characters so they don't clash. Turning off short name generation will prevent this.
I haven't tried this on a Windows box (I will be tonight, thanks) but the similar thing on Linux (noatime option when mounting the drive) sped things up considerably.
I can't think of any uses where the last access time would be useful other than for auditing purposes and, even then, does Windows store the user that accessed it? I know Linux doesn't.
I'd suggest you try it and see if it makes a difference.
However I'm pessimistic about this actually making any difference, since in the larger/clean builds you'll be writing out large amounts of data anyway, so adjusting the file access times wouldn't take that much time (plus it'd probably be cached anyway).
I'd love to be proven wrong though.
Results:
Ran a few builds on the code base at work in both debug and release configurations with the last access time enabled, and disabled.
Our source code is about 39 MB (48 MB size on disk), and we build about half of that for the configuration that I built for these tests. The debug build generated 1.76 GB of temporary and output files, while the release generated about 600 MB of such data. We build on the command line using a combination of Ant and the Visual Studio command-line build tools.
My machine is a Core 2 Duo 3GHz, with 4GB of ram, a 7200rpm hdd, running Windows XP 32 bit.
Building with the last access time disabled:
Debug times = 6:17, 5:41
Release times = 6:07, 6:06
Building with the last access time enabled:
Debug times = 6:00, 5:47
Release times = 6:19, 5:48
Overall I did not notice any difference between the two modes, as in both cases the files are most likely in the system cache already so it should just be reading from memory.
I believe that you'll get the biggest bang for your buck by just implementing proper precompiled headers (not the automatically generated ones that Visual Studio creates in a project). We implemented this a few years ago at work (when the code base was far smaller) and it cut down our build time to a third of what it was.
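For the record, a hand-rolled precompiled header in Visual Studio is typically just one header holding the big, rarely changing includes, compiled once with /Yc and consumed everywhere else with /Yu. A minimal sketch (the file names follow the usual VS convention and the includes are only examples):

// stdafx.h -- only big, rarely changing headers belong here
#include <windows.h>
#include <vector>
#include <string>
#include <map>
#include <boost/shared_ptr.hpp>

// stdafx.cpp -- compiled once with /Yc"stdafx.h"; every other .cpp in the
// project is compiled with /Yu"stdafx.h" and begins with #include "stdafx.h".
#include "stdafx.h"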
It's a good alternative, but it will affect some tools, like the Remote Storage Service and other utilities that depend on file access statistics to optimize your file system (e.g. Norton Defrag).
It will improve the performance a little. Other than that it won't do much more (you won't be able to see when the file was last accessed, of course). I have it turned off by default when I install Windows XP using nLite to cut out the bloat I don't need.
I don't want to draw attention away from the "last access time" question, but there might be other ways to speed up your builds. Not knowing the context and your project setup, it's hard to say what might be slow, but there might be some things that might help:
Create "uber" builds. That is, create a single compilation uber.cpp file that contains a bunch of lines like
#include "file1.cpp"
#include "file2.cpp"
You might have trouble with conflicting static variable names, but those are generally easy to sort out. Initial setup is kind of a pain, but build times can drop dramatically. For us, the biggest drawback is that in developer studio, you can't right-click a file and say 'compile' if that file is part of an uber build. It's not a big deal though. We have separate build configurations for 'uber' builds which compile the uber files but exclude the individual cpp files from the build process. If you need more info, leave a comment and I can get you that. Also, the optimizer tends to do a slightly better job with uber builds.
Also, do you have a large number of include files, or a lot of dependencies between include files? If so, that will drastically slow down build times.
Are you using precompiled headers? If not, you might look into that as a solution as that will help as well.
Slow build times are usually tracked down to lots of file I/O. That is by far the biggest time sink in a build -- just opening, reading and parsing all of the files. If you cut down file I/O, you will improve build times.
Anyway, sorry to slightly derail the topic, but the suggestion at hand to change how the last access time of a file is set seemed to be somewhat of a 'sledgehammer' solution.
For busy servers, disabling last access time is generally a good idea. The only potential downside is if there are scripts that use last access time to, for instance, tell that a file is no longer being written.
That said, if you're looking to improve build times on a C++ project, I highly recommend reading Recursive Make Considered Harmful. The article is about a decade old, but the points it makes about how recursive definitions in our build scripts cause long build times is still well worth understanding.
Disabling access time is useful when using SSDs (solid-state drives: cards, USB drives, etc.) as it reduces the number of writes to the drive. All solid-state storage devices have a lifetime which is measured by the number of writes that can be made to each individual address. Some media specify a minimum of hundreds of thousands, and some even 1 million. Operating systems and other executables can access many files in a single operation, as do user document accesses. This would apply to Eee PCs, embedded systems and others.
To Mike Dimmick:
Try connecting a USB drive with many files and copying them to your internal drive. That is another case where it helps, in addition to program compilation (which is described in the original post).