I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.
What are the worst build times you have ever had to handle?
What strategies have you used to improve build times on large projects?
Update:
How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?
1. Forward declarations
2. The pImpl idiom (see the sketch just after this list)
3. Precompiled headers
4. Parallel compilation (e.g. the MPCL add-in for Visual Studio)
5. Distributed compilation (e.g. IncrediBuild for Visual Studio)
6. Incremental builds
7. Splitting the build into several "projects" so that not all the code has to be compiled when it isn't needed

[Later Edit]

8. Buy faster machines.
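For anyone who hasn't seen it, here is a minimal sketch of the pImpl idiom combined with a forward declaration; the Widget/WidgetImpl names are made up purely for illustration:

    // widget.h -- the public header: clients see only a pointer, never the implementation
    #include <memory>

    class WidgetImpl;                         // forward declaration, no extra header needed

    class Widget {
    public:
        Widget();
        ~Widget();                            // defined in widget.cpp, where WidgetImpl is complete
        void draw();
    private:
        std::unique_ptr<WidgetImpl> impl_;    // implementation changes no longer touch this header
    };

    // widget.cpp -- the only file that recompiles when the implementation changes
    #include "widget.h"
    // #include "heavy_internal_headers.h"    // heavy internal includes stay out of widget.h

    class WidgetImpl {
    public:
        void draw() { /* the real work happens here */ }
    };

    Widget::Widget() : impl_(new WidgetImpl) {}
    Widget::~Widget() = default;
    void Widget::draw() { impl_->draw(); }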
My strategy is pretty simple - I don't do large projects. The whole thrust of modern computing is away from the giant and monolithic and towards the small and componentised. So when I work on projects, I break things up into libraries and other components that can be built and tested independently, and which have minimal dependencies on each other. A "full build" in this kind of environment never actually takes place, so there is no problem.
One trick that sometimes helps is to include everything into one .cpp file. Since includes are processed once per file, this can save you a lot of time. (The downside to this is that it makes it impossible for the compiler to parallelize compilation)
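For illustration, such a single-translation-unit ("unity") build might look like this; the file names are invented:

    // everything.cpp -- the only file handed to the compiler; each common header
    // is now parsed once for this file instead of once per .cpp
    #include "lexer.cpp"
    #include "parser.cpp"
    #include "codegen.cpp"
    #include "main.cpp"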
You should be able to specify that multiple .cpp files are compiled in parallel (-j with make on Linux, /MP with MSVC). MSVC also has a separate option to compile multiple projects in parallel; the two options are independent, and there's no reason not to use both.
In the same vein, distributed builds (Incredibuild, for example), may help take the load off a single system.
SSD disks are supposed to be a big win, although I haven't tested this myself (but a C++ build touches a huge number of files, which can quickly become a bottleneck).
Precompiled headers can help too, when used with care. (They can also hurt you, if they have to be recompiled too often).
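With MSVC-style precompiled headers, the usual arrangement is one header that gathers the heavy, rarely-changing includes and is included first in every source file. A sketch (pch.h is just a common naming convention, and my_module.h is a hypothetical project header):

    // pch.h -- only stable, heavy headers belong here; anything edited frequently
    // would force the precompiled header itself to be rebuilt
    #include <string>
    #include <vector>
    #include <map>
    #include <algorithm>

    // some_source_file.cpp
    #include "pch.h"          // must come first when compiling with /Yu"pch.h"
    #include "my_module.h"    // hypothetical project header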
And finally, trying to minimize dependencies in the code itself is important. Use the pImpl idiom, use forward declarations, keep the code as modular as possible. In some cases, use of templates may help you decouple classes and minimize dependencies. (In other cases, templates can slow down compilation significantly, of course)
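To make the forward-declaration point concrete (the class and file names are invented): a header that uses a type only by reference or pointer does not need the full definition.

    // renderer.h -- Scene is only passed by reference here, so a forward
    // declaration replaces #include "scene.h" and breaks the header dependency
    class Scene;

    class Renderer {
    public:
        void render(const Scene& scene);
    };

    // renderer.cpp -- the full definition is needed only where members are accessed
    #include "renderer.h"
    #include "scene.h"        // hypothetical header defining Scene

    void Renderer::render(const Scene& scene) {
        (void)scene;          // ... access the scene's members here
    }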
But yes, you're right, this is very much a language thing. I don't know of another language which suffers from the problem to this extent. Most languages have a module system that allows them to eliminate header files, which are a huge factor. C has header files, but is such a simple language that compile times are still manageable. C++ gets the worst of both worlds: a big, complex language and a terribly primitive build mechanism that requires a huge amount of code to be parsed again and again.
Multi-core compilation. Very fast with 8 cores compiling on the i7.
Incremental linking
External constants
Removed inline methods on C++ classes.
The last two took our link time down from around 12 minutes to 1-2 minutes. Note that this is only needed if things have huge visibility, i.e. are seen "everywhere", and if there are many different constants and classes.
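As I read the "external constants" item above, the idea is roughly this (a hedged sketch, not the actual code): declare the constant in the header and give it exactly one definition in a .cpp, instead of defining it in the header where every translation unit gets its own copy.

    // constants.h -- before: a namespace-scope const defined here gives every
    // including .cpp its own copy of the object
    //
    //     const std::string kAppName = "MyApp";
    //
    // constants.h -- after: only a declaration is visible everywhere
    extern const char* const kAppName;

    // constants.cpp -- the single definition the linker sees
    #include "constants.h"
    const char* const kAppName = "MyApp";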
Cheers
IncrediBuild
Unity Builds
Incredibuild
Pointer to implementation
forward declarations
compiling "finished" sections of the proejct into dll's
ccache & distcc (for C/C++ projects) -
ccache caches compiled output, using the pre-processed file as the 'key' for finding the output. This is great because pre-processing is pretty quick, and quite often a change that forces a recompile doesn't actually change the pre-processed source for many files. It also really speeds up a full re-compile. Another nice feature is that you can have a shared cache among team members, which means that only the first person to grab the latest code actually compiles anything.
distcc does distributed compilation across a network of machines. This is only useful if you HAVE a network of machines to use for compilation. It goes well with ccache, and only moves the pre-processed source around, so the only thing you have to worry about on the machines doing the compiling is that they have the right compiler (no need for headers or your entire source tree to be visible).
The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.
Can the build be done in parallel? Either with multiple cores, or with multiple servers?
Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. Third-party tools/libraries that are used but not altered are a good candidate for this treatment.
Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.
Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?
Finally, you can re-factor the code base to be more modular and to have fewer dependencies. Large-Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.
EDIT: Building on a local filesystem as opposed to a NFS mounted filesystem can also dramatically speed up build times.
Fiddle with the compiler optimisation flags.
Use the -j4 option with gmake for parallel compilation (multi-core or single core).
If you are using clearmake, use winkin (reuse of derived objects already built by others).
We can take out the debug flags, in extreme cases.
Use some powerful servers.
This book Large-Scale C++ Software Design has very good advice I've used in past projects.
Minimize your public API
Minimize inline functions in your API. (Unfortunately this also increases linker requirements).
Maximize forward declarations.
Reduce coupling between code. For instance, pass two integers to a function for coordinates instead of your custom Point class that has its own header file (see the sketch just after this list).
Use Incredibuild. But it has some issues sometimes.
Do NOT put code that gets exported from two different modules in the SAME header file.
Use the pImpl idiom. Mentioned before, but it bears repeating.
Use Pre-compiled headers.
Avoid C++/CLI (i.e. managed C++). Linker times are impacted too.
Avoid using a global header file that includes 'everything else' in your API.
Don't put a dependency on a lib file if your code doesn't really need it.
Know the difference between including files with quotes and angle brackets.
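A tiny illustration of the coupling point from the list above (the Point class and the function are invented for the example):

    // cursor.h -- before: every file that includes this header also drags in point.h
    //
    //     #include "point.h"                      // hypothetical header defining Point
    //     void move_cursor(const Point& position);
    //
    // cursor.h -- after: callers depend only on built-in types; point.h can stay
    // a private detail of whichever .cpp still wants to use it
    void move_cursor(int x, int y);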
Powerful compilation machines and parallel compilers. We also make sure the full build is needed as little as possible. We don't alter the code to make it compile faster.
Efficiency and correctness are more important than compilation speed.
In Visual Studio, you can set the number of projects to build in parallel. The default value is 2; increasing it can reduce the build time somewhat.
This will help if you don't want to mess with the code.
This is the list of things we did for development under Linux:
As Warrior noted, use parallel builds (make -jN)
We use distributed builds (currently icecream, which is very easy to set up); with this we can have tens of processors at a given time. This also has the advantage of giving the builds to the most powerful and least loaded machines.
We use ccache so that when you do a make clean, you don't actually have to recompile the sources that didn't change; the output is copied from a cache.
Note also that debug builds are usually faster to compile since the compiler doesn't have to make optimisations.
We tried creating proxy classes once.
These are really a simplified version of a class that only includes the public interface, reducing the number of internal dependencies that need to be exposed in the header file. However, they came with a heavy price of spreading each class over several files that all needed to be updated when changes to the class interface were made.
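A rough sketch of what such a proxy looked like (the names are invented; this is only the spirit of the approach, not our actual code):

    // database.h -- the real class: in the real project this header is heavy
    class Database {
    public:
        bool connect(const char* host) { /* ... */ return true; }
        void disconnect()              { /* ... */ }
        // ... many private members and heavy includes omitted
    };

    // database_proxy.h -- the hand-written proxy: public interface only
    class Database;   // a forward declaration is all the proxy header needs

    class DatabaseProxy {
    public:
        explicit DatabaseProxy(Database& real);
        bool connect(const char* host);   // must be kept in sync with Database by hand
        void disconnect();
    private:
        Database& real_;
    };

    // database_proxy.cpp -- the only file that includes the real database.h
    DatabaseProxy::DatabaseProxy(Database& real) : real_(real) {}
    bool DatabaseProxy::connect(const char* host) { return real_.connect(host); }
    void DatabaseProxy::disconnect()              { real_.disconnect(); }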
In general large C++ projects that I've worked on that had slow build times were pretty messy, with lots of interdependencies scattered through the code (the same include files used in most cpps, fat interfaces instead of slim ones). In those cases, the slow build time was just a symptom of the larger problem, and a minor symptom at that. Refactoring to make clearer interfaces and break code out into libraries improved the architecture, as well as the build time. When you make a library, it forces you to think about what is an interface and what isn't, which will actually (in my experience) end up improving the code base. If there's no technical reason to have to divide the code, some programmers through the course of maintenance will just throw anything into any header file.
Cătălin Pitiș covered a lot of good things. Other ones we do:
Have a tool that generates reduced Visual Studio .sln files for people working in a specific sub-area of a very large overall project
Cache DLLs and pdbs from when they are built on CI for distribution on developer machines
For CI, make sure that the link machine in particular has lots of memory and high-end drives
Store some expensive-to-regenerate files in source control, even though they could be created as part of the build
Replace Visual Studio's checking of what needs to be relinked by our own script tailored to our circumstances
It's a pet peeve of mine, so even though you already accepted an excellent answer, I'll chime in:
In C++, it's less the language as such than the language-mandated build model, which was great back in the seventies, and the header-heavy libraries.
The only thing that is wrong about Cătălin Pitiș' reply: "buy faster machines" should go first. It is the easiest way with the least impact.
My worst was about 80 minutes on an aging build machine running VC6 on W2K Professional. The same project (with tons of new code) now takes under 6 minutes on a machine with 4 hyperthreaded cores, 8 GB RAM, Win 7 x64 and decent disks. (A similar machine with about 10-20% less processor power, 4 GB RAM and Vista x86 takes twice as long.)
Strangely, incremental builds are most of the time slower than full rebuilds now.
A full build is about 2 hours. I try to avoid making modifications to the base classes, and since my work is mainly on the implementation of these base classes, I only need to build small components (a couple of minutes).
Create some unit test projects to test individual libraries, so that if you need to edit low level classes that would cause a huge rebuild, you can use TDD to know your new code works before you rebuild the entire app. The John Lakos book as mentioned by Themis has some very practical advice for restructuring your libraries to make this possible.
Our product is a library which we deliver as a DLL or static library. I've noticed that using Whole Program Optimization in Visual Studio improves the performance by around 30%. This is good, but referring to
http://blogs.msdn.com/b/vcblog/archive/2009/02/24/quick-tips-on-using-whole-program-optimization.aspx
I see that it is not suggested to use whole program optimization for libraries that are delivered to customers.
The same article mentions around a 3-4% improvement in performance. Now that we are seeing 10 times the expected performance gain, I am wondering whether we are doing something wrong.
Not sure how to formulate this, but I'll give it a try: apparently our code base has a "problem" that WPO can solve very well. Whatever this "problem" (or problems?) is, it is less important in other software, hence WPO has a relatively small impact there. Now my question is: what might this problem be? We would like to optimize our code manually, since turning on WPO is not an option.
Probably, you have some functions called many times, which can't be inlined without WPO due to being defined in source files. You can use a profiler to identify these, then move them into headers and mark them inline.
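A minimal sketch of that suggestion (the function and file names are invented):

    // math_utils.cpp -- before: without WPO/LTO the compiler cannot inline this
    // function into callers that live in other translation units
    //
    //     double squared_distance(double x, double y) { return x * x + y * y; }
    //
    // math_utils.h -- after: the definition is visible to every caller and can be
    // inlined by the regular per-file optimizer, without whole program optimization
    inline double squared_distance(double x, double y) {
        return x * x + y * y;
    }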
What techniques do you use to compile and start VSC++ projects fast?
For us, the loading of all the DLLs in particular takes a long time. Is there a way to speed this up? The project loads a ton of DLLs, and some of them are especially slow.
Now that we use unity build for our projects, it already compiles blazingly fast! =)
Thanks!
DLLs have a default load address embedded into them. This is typically defaulted by the development tool to the same address for all DLLs. This means that when the DLLs are loaded into memory there are a lot of collisions, and each DLL has to be readdressed (rebased) and loaded at a free memory location. When working on a project that had a significant number of DLL dependencies, we were able to make significant load-time savings by setting the default address for our DLLs.
A fuller explanation into what's going on and how it helps can be found at drdobbs.
It's been some years since I've done this, so it may be out of date now.
It's worth keeping in mind that if you go down this route, it might not play very well with .NET.
Use delay-loaded libraries. It's a simple compile settings change (typically no code changes needed), yet it can offer very big improvements.
Of course, you still have the load times of those DLLs when you actually use them, but if you have many DLLs there's also a large chance that you won't use all of them all of the time.
Is it better to have lots of DLL dependencies, or better to statically link as much as possible?
Thanks
No, it is not bad practice to ship with lots of DLLs; it is bad practice, though, to put them in %System32%. Actually, it is usually good to use DLLs instead of statically linking; for one thing, you can easily swap out just the DLL that you need to update, rather than having to replace the entire binary, and for another, if your program eventually needs multiple executables that work together, you only pay for one copy of the DLL code (whereas, with static linking, you would end up duplicating the code that was common).
Static linking gives your app a large memory footprint, so having DLLs is better from that point of view, i.e. you only load what you need. Nowadays installations are normally done by an installer, so it doesn't matter if you have lots of DLLs.
I don't think it's a bad practice. Look at Office or Adobe or any large-scale application. They end up with lots of DLLs -- because they otherwise would have to pack everything into a 100M+ exe.
Break things into DLLs when you don't absolutely need them.
Generally speaking it is not a bad practice. It is better to split the code of a program into separated dynamic libraries, especially if the functions provided are used from more than one executable.
That doesn't mean that every program should have its code split into multiple dynamic libraries; for simple utilities, that is probably not needed.
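For what it's worth, the usual Windows pattern for splitting code out into a DLL looks roughly like this (the macro and names follow common convention rather than anything project-specific):

    // mathlib_export.h -- switch between exporting (building the DLL) and importing
    #ifdef MATHLIB_BUILDING_DLL
      #define MATHLIB_API __declspec(dllexport)   // defined only when building mathlib.dll
    #else
      #define MATHLIB_API __declspec(dllimport)   // consumers of mathlib.dll get this branch
    #endif

    // mathlib.h -- the shared interface used by every executable that needs it
    #include "mathlib_export.h"

    MATHLIB_API double mean(const double* values, int count);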
As mentioned by others, lots of DLLs is not a bad practice. Put some thought into what to put in each one. I like to keep the DLLs as 'tiny-island-ish' as I can. If these will be distributed, I like to have a specific naming convention that reflects the product and/or company name and/or initials of some sort.
Just wanted to add another observation from other programs that use many dynamically loaded DLLs. For example, the GIMP and its plug-ins. The way you load your DLLs will affect your client's perceived application speed, if that's a factor among the other very good ones (updates, reuse, etc.). I'm sure there's some overhead for the OS to load a DLL and you might run into process limits (like open file handles). Having very many very small DLLs might not be as desired as "smaller than that" number of "larger than that"-sized DLLs.
Let's say that I am building a binary, and I include a bunch of files that are never actually used, and then do the subsequent linking against the libraries described by those include files (again, these libraries are never used).
What are the negative consequences of this, beyond increased compile time?
A few I can think of are namespace pollution and binary size
In addition to compile time: increased complexity, needless distraction while debugging, and a maintenance overhead.
Apart from that, nothing.
In addition to what Sasha lists, there is the maintenance cost. Will you be able to easily detect what is used and what is not in the future, when and if you choose to remove the unused stuff?
If the libraries are never used, there should be no increase in size for the executable.
Depending on the exact linker, you might also notice that the global objects of your unused libraries still get constructed. This implies a memory overhead and increases startup costs.
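For instance, something like this in a linked-but-unused library can still run at program startup (a contrived sketch):

    #include <cstdio>

    // somewhere inside the "unused" library
    struct RegistryInitializer {
        RegistryInitializer() {
            std::puts("allocating caches, opening handles, ...");   // runs before main()
        }
    };

    // a global object: depending on how the library is linked (e.g. as a DLL/shared
    // library, or an object file that gets pulled in anyway), this constructor can
    // execute even though the program never calls into the library
    static RegistryInitializer g_registry_init;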
If the libraries you're including but not using aren't present on the target system, the code won't compile even though they aren't necessary.
Here is my answer for a similar question concerning C and static libraries. Perhaps it is useful to you in the context of C++ as well.
You mention an increase in the compilation time. From that I understand the libraries are statically linked, and not dynamically. In this case, it depends how the linker handles unused functions. If it ignores them, you will have mostly maintenance problems. If they are included, the executable size will increase. Now, this is more significant than the space it takes on the hard drive. Large executables could run slower due to caching issues. If active code and non-active code are adjacent in the exe, they will be cached together, making the cache effectively smaller and less efficient.
VC2005 and above have an optimization called PGO, which orders the code within the executable in a way that ensures effective caching of code that is often used. I don’t know if g++ has a similar optimization, but it’s worth looking into that.
A little compilation here of the issues, wiki-edit it as necessary:
The main problem appears to be: Namespace Pollution
This can cause problems in future debugging, version control, and increased future maintenance cost.
There will also be, at the minimum, minor binary bloat, as the function/class/namespace references will be maintained (in the symbol table?). Dynamic libraries should not greatly increase binary size (but they become a dependency for the binary to run?). Judging from the GNU C compiler, statically linked libraries should not be included in the final binary if they are never referenced in the source. (Assumption based on the C compiler; may need to clarify/correct.)
Also, depending on the nature of your libraries, global and static objects/variables may be instantiated, causing increased startup time and memory overhead.
Oh, and increased compile/linking time.
I find it frustrating when I have to edit a file in the source tree because some symbol I'm working on appears in it (e.g. a function name whose prototype I've just changed - or, sadly but more typically, whose prototype I've just added to a header), so I need to check that the use is correct, or the compiler now tells me the use in that file is incorrect. So, I edit the file. Then I see a problem - what is this file doing? And it turns out that although the code is 'used' in the product, it really isn't actively used at all.
I found an occurrence of this problem on Monday. A file with 10,000+ lines of code invoked a function 'extern void add_remainder(void);' with an argument of 0. So, I went to fix it. Then I looked at the rest of the code... it turned out it was a development stub from about 15 years ago that had never been removed. Cleanly excising the code turned out to involve minor edits to more than half a dozen files - and I've not yet worked out whether it is safe to remove the enumeration constant from the middle of an enumeration, just in case. Temporarily, that is marked 'Unused/Obsolete - can it be removed safely?'.
That chunk of code has had zero code coverage for the last 15 years - production, test, ... True, it's only a tiny part of a vast system - percentage-wise, it's less than a 1% blip on the chart. Still, it is extra wasted code.
Puzzling. Annoying. Depressingly common (I've logged, and fixed, at least half a dozen similar bugs this year so far).
And a waste of my time - and other developers' time. The file had been edited periodically over the years by other people doing what I was doing - a thorough job.
I have never experienced any problems with linking a .lib file of which only a very small part is used. Only the code that is really used will be linked into the executable, and the linking time did not increase noticeably (with Visual Studio).
If you link to binaries and they get loaded at runtime, they may perform non-trivial initialization which can do anything from allocate a small amount of memory to consume scarce resources to alter the state of your module in ways you don't expect, and beyond.
You're better off getting rid of stuff you don't need, simply to eliminate a bunch of unknowns.
It could perhaps even fail to compile if the build tree isn't well maintained. Also, if you're compiling on an embedded system without swap space, the compiler can run out of memory while trying to compile a massive object file.
It happened to us at work recently.