Improving Rust binary build times - build

I’m just starting a Rust project and already it takes ~7.6s to build what I’d consider a simple binary.
I’m using async/await a lot, and commenting out some awaits, like the await on my hyper server, can improve build time by about 4s. But I’m not sure whether that’s because compiling async/await is slow, or because removing that await means the Rust compiler no longer needs to build my HTTP response handling code.
// Removing the equivalent of this line (from https://hyper.rs) in
// my codebase improves build times by ~4s.
if let Err(e) = server.await {
    eprintln!("server error: {}", e);
}
I’m not using Cargo which means I’m manually generating rustc commands. Here’s part of a command I’m using for one of these binaries:
rustc admin/dev/server/main.rs --edition=2018 --crate-name=dev_server --crate-type=bin --target=x86_64-apple-darwin --codegen=opt-level=0 --codegen=debuginfo=2 --cap-lints=allow --emit=link --color=always ...
I’ve also abstracted my code into smaller crates. I was hoping that would mean Rust could skip recompilation of smaller crates and third-party dependencies (like hyper), but my guess is that binary codegen is what’s taking all the time?
What techniques can I use for profiling my Rust build, and what techniques can I use for improving it? Is there any way I can reuse compilation for dependencies which haven’t changed when rebuilding the binary? In development, could I use dynamic linking for my dependencies instead of static linking?

Use cargo build: it builds incrementally, so dependencies are compiled once rather than on every build. The first build is slow, and all subsequent builds are fast. You need to run cargo init before that.
Use cargo check to quickly check and fix compilation errors. cargo check runs faster than cargo build because it skips code generation and linking.
Derive macros (like #[derive(Serialize, Deserialize, Debug)]) usually consume a fair amount of build time, so keep an eye on them.
As far as I know, the only way to identify the bottleneck in your build time is manual: remove some dependencies and see how the build time changes.
There is no way to specify dynamic linking just for development.

Rust 1.60 allows you to run cargo build --timings which helps you identify slow components in the build.
As the documentation states:
Look for slow dependencies.
    Check if they have features that you may wish to consider disabling.
    Consider trying to remove the dependency completely.
Look for a crate being built multiple times with different versions.
    Try to remove the older versions from the dependency graph.
Split large crates into smaller pieces.
If there are a large number of crates bottlenecked on a single crate, focus your attention on improving that one crate to improve parallelism.

Related

Faster build times in C++ [duplicate]

I once worked on a C++ project that took about an hour and a half for a full rebuild. Small edit, build, test cycles took about 5 to 10 minutes. It was an unproductive nightmare.
What are the worst build times you have ever had to handle?
What strategies have you used to improve build times on large projects?
Update:
How much do you think the language used is to blame for the problem? I think C++ is prone to massive dependencies on large projects, which often means even simple changes to the source code can result in a massive rebuild. Which language do you think copes with large project dependency issues best?
1. Forward declaration
2. pimpl idiom
3. Precompiled headers
4. Parallel compilation (e.g. MPCL add-in for Visual Studio).
5. Distributed compilation (e.g. Incredibuild for Visual Studio).
6. Incremental build
7. Split the build into several "projects" so that you don't compile all the code when it isn't needed.
[Later Edit]
8. Buy faster machines.
My strategy is pretty simple - I don't do large projects. The whole thrust of modern computing is away from the giant and monolithic and towards the small and componentised. So when I work on projects, I break things up into libraries and other components that can be built and tested independently, and which have minimal dependencies on each other. A "full build" in this kind of environment never actually takes place, so there is no problem.
One trick that sometimes helps is to include everything into one .cpp file. Since includes are processed once per file, this can save you a lot of time. (The downside to this is that it makes it impossible for the compiler to parallelize compilation)
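As a rough illustration of the single-file trick, the sketch below generates the text of such a "unity" source file from a list of .cpp files. It is written in Rust purely to show the idea; the helper name unity_source and the file names are hypothetical, and in a real project the build system would normally generate this file for you.

```rust
/// Build the contents of a unity translation unit: one `#include`
/// directive per source file, in the order given.
/// `unity_source` is a hypothetical helper, not part of any tool.
fn unity_source(sources: &[&str]) -> String {
    let mut out = String::from("// Generated unity file: do not edit by hand.\n");
    for path in sources {
        out.push_str(&format!("#include \"{}\"\n", path));
    }
    out
}

fn main() {
    // Hypothetical file names.
    let text = unity_source(&["parser.cpp", "lexer.cpp", "main.cpp"]);
    print!("{}", text);
}
```

The compiler is then invoked on the generated file alone, so shared headers are parsed once instead of once per .cpp file.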
You should be able to specify that multiple .cpp files should be compiled in parallel (-j with make on linux, /MP on MSVC - MSVC also has an option to compile multiple projects in parallel. These are separate options, and there's no reason why you shouldn't use both)
In the same vein, distributed builds (Incredibuild, for example), may help take the load off a single system.
SSD disks are supposed to be a big win, although I haven't tested this myself (but a C++ build touches a huge number of files, which can quickly become a bottleneck).
Precompiled headers can help too, when used with care. (They can also hurt you, if they have to be recompiled too often).
And finally, trying to minimize dependencies in the code itself is important. Use the pImpl idiom, use forward declarations, keep the code as modular as possible. In some cases, use of templates may help you decouple classes and minimize dependencies. (In other cases, templates can slow down compilation significantly, of course)
But yes, you're right, this is very much a language thing. I don't know of another language which suffers from the problem to this extent. Most languages have a module system that allows them to eliminate header files, which are a huge factor. C has header files, but is such a simple language that compile times are still manageable. C++ gets the worst of both worlds: a big, complex language and a terribly primitive build mechanism that requires a huge amount of code to be parsed again and again.
Multi core compilation. Very fast with 8 cores compiling on the I7.
Incremental linking
External constants
Removed inline methods on C++ classes.
The last two gave us a reduced linking time from around 12 minutes to 1-2 minutes. Note that this is only needed if things have a huge visibility, i.e. seen "everywhere" and if there are many different constants and classes.
Cheers
IncrediBuild
Unity Builds
Incredibuild
Pointer to implementation
forward declarations
compiling "finished" sections of the project into DLLs
ccache & distcc (for C/C++ projects) -
ccache caches compiled output, using the pre-processed file as the 'key' for finding the output. This is great because pre-processing is pretty quick, and quite often changes that force recompile don't actually change the source for many files. Also, it really speeds up a full re-compile. Also nice is the instance where you can have a shared cache among team members. This means that only the first guy to grab the latest code actually compiles anything.
distcc does distributed compilation across a network of machines. This is only good if you HAVE a network of machines to use for compilation. It goes well with ccache, and only moves the pre-processed source around, so the only thing you have to worry about on the compiler engine systems is that they have the right compiler (no need for headers or your entire source tree to be visible).
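To make the caching idea concrete, here is a toy sketch of a cache keyed on the preprocessed source, in the spirit of what is described above. It is not how ccache is implemented: the type CompileCache and the fake "compiler" are invented for illustration, hash collisions are ignored, and the real tool also takes things like the compiler and its options into account.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy compile cache keyed on a hash of the preprocessed source text.
struct CompileCache {
    objects: HashMap<u64, String>,
    /// Number of times the (pretend) compiler actually ran.
    compiles: usize,
}

impl CompileCache {
    fn new() -> Self {
        CompileCache { objects: HashMap::new(), compiles: 0 }
    }

    /// Return the object code for `preprocessed`, compiling only on a cache miss.
    fn compile(&mut self, preprocessed: &str) -> String {
        let mut hasher = DefaultHasher::new();
        preprocessed.hash(&mut hasher);
        let key = hasher.finish();

        if let Some(obj) = self.objects.get(&key) {
            return obj.clone(); // cache hit: no compiler run
        }
        self.compiles += 1;
        // Stand-in for the expensive compiler invocation.
        let obj = format!("object code for {} bytes of source", preprocessed.len());
        self.objects.insert(key, obj.clone());
        obj
    }
}

fn main() {
    let mut cache = CompileCache::new();
    cache.compile("int main() { return 0; }");
    cache.compile("int main() { return 0; }"); // identical text: served from cache
    cache.compile("int main() { return 1; }"); // different text: compiled
    println!("compiler ran {} times", cache.compiles);
}
```

Because the key is the preprocessed text, touching a file without changing what the preprocessor emits still results in a cache hit.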
The best suggestion is to build makefiles that actually understand dependencies and do not automatically rebuild the world for a small change. But, if a full rebuild takes 90 minutes, and a small rebuild takes 5-10 minutes, odds are good that your build system already does that.
Can the build be done in parallel? Either with multiple cores, or with multiple servers?
Check in pre-compiled bits for pieces that really are static and do not need to be rebuilt every time. Third-party tools/libraries that are used but not altered are good candidates for this treatment.
Limit the build to a single 'stream' if applicable. The 'full product' might include things like a debug version, or both 32 and 64 bit versions, or may include help files or man pages that are derived/built every time. Removing components that are not necessary for development can dramatically reduce the build time.
Does the build also package the product? Is that really required for development and testing? Does the build incorporate some basic sanity tests that can be skipped?
Finally, you can re-factor the code base to be more modular and to have fewer dependencies. Large Scale C++ Software Design is an excellent reference for learning to decouple large software products into something that is easier to maintain and faster to build.
EDIT: Building on a local filesystem as opposed to a NFS mounted filesystem can also dramatically speed up build times.
Fiddle with the compiler optimisation flags.
Use the -j4 option with gmake for parallel compilation (multicore or single core).
If you are using clearmake, use winkin.
Take out the debug flags, in extreme cases.
Use some powerful servers.
This book Large-Scale C++ Software Design has very good advice I've used in past projects.
Minimize your public API
Minimize inline functions in your API. (Unfortunately this also increases linker requirements).
Maximize forward declarations.
Reduce coupling between code. For instance, pass two integers to a function for coordinates, instead of your custom Point class that has its own header file.
Use Incredibuild. But it has some issues sometimes.
Do NOT put code that gets exported from two different modules in the SAME header file.
Use the PImpl idiom. Mentioned before, but bears repeating.
Use Pre-compiled headers.
Avoid C++/CLI (i.e. managed c++). Linker times are impacted too.
Avoid using a global header file that includes 'everything else' in your API.
Don't put a dependency on a lib file if your code doesn't really need it.
Know the difference between including files with quotes and angle brackets.
Powerful compilation machines and parallel compilers. We also make sure the full build is needed as little as possible. We don't alter the code to make it compile faster.
Efficiency and correctness are more important than compilation speed.
In Visual Studio, you can set the number of projects to compile at a time. Its default value is 2; increasing it can reduce build time.
This will help if you don't want to mess with the code.
This is the list of things we did for a development under Linux :
As Warrior noted, use parallel builds (make -jN)
We use distributed builds (currently icecream, which is very easy to set up); with this we can have tens of processors at a given time. This also has the advantage of giving the builds to the most powerful and least loaded machines.
We use ccache so that when you do a make clean, you don't have to really recompile your sources that didn't change, it's copied from a cache.
Note also that debug builds are usually faster to compile since the compiler doesn't have to make optimisations.
We tried creating proxy classes once.
These are really a simplified version of a class that only includes the public interface, reducing the number of internal dependencies that need to be exposed in the header file. However, they came with a heavy price of spreading each class over several files that all needed to be updated when changes to the class interface were made.
In general large C++ projects that I've worked on that had slow build times were pretty messy, with lots of interdependencies scattered through the code (the same include files used in most cpps, fat interfaces instead of slim ones). In those cases, the slow build time was just a symptom of the larger problem, and a minor symptom at that. Refactoring to make clearer interfaces and break code out into libraries improved the architecture, as well as the build time. When you make a library, it forces you to think about what is an interface and what isn't, which will actually (in my experience) end up improving the code base. If there's no technical reason to have to divide the code, some programmers through the course of maintenance will just throw anything into any header file.
Cătălin Pitiș covered a lot of good things. Other ones we do:
Have a tool that generates reduced Visual Studio .sln files for people working in a specific sub-area of a very large overall project
Cache DLLs and pdbs from when they are built on CI for distribution on developer machines
For CI, make sure that the link machine in particular has lots of memory and high-end drives
Store some expensive-to-regenerate files in source control, even though they could be created as part of the build
Replace Visual Studio's checking of what needs to be relinked by our own script tailored to our circumstances
It's a pet peeve of mine, so even though you already accepted an excellent answer, I'll chime in:
In C++, it's less the language as such, but the language-mandated build model that was great back in the seventies, and the header-heavy libraries.
The only thing that is wrong about Cătălin Pitiș' reply: "buy faster machines" should go first. It is the easiest way, with the least impact.
My worst was about 80 minutes on an aging build machine running VC6 on W2K Professional. The same project (with tons of new code) now takes under 6 minutes on a machine with 4 hyperthreaded cores, 8G RAM Win 7 x64 and decent disks. (A similar machine, about 10..20% less processor power, with 4G RAM and Vista x86 takes twice as long)
Strangely, incremental builds are most of the time slower than full rebuilds now.
Full build is about 2 hours. I try to avoid making modification to the base classes and since my work is mainly on the implementation of these base classes I only need to build small components (couple of minutes).
Create some unit test projects to test individual libraries, so that if you need to edit low level classes that would cause a huge rebuild, you can use TDD to know your new code works before you rebuild the entire app. The John Lakos book as mentioned by Themis has some very practical advice for restructuring your libraries to make this possible.

Speeding up C++ builds with Unity Builds and reduced header dependencies

I just converted an Objective-C(++) project to plain C++.
While moving more and more code over, I noticed the build time increase quite a lot.
My project is currently split up into several frameworks/dylibs and a main project which uses these frameworks.
I did some research and found that there are basically three things recommended to reduce the build time:
reducing header dependencies
using unity builds
using a tool like ccache to not redo unneeded work all the time
I implemented ccache and it works great and I was able to decrease the build time quite a bit.
I'm a bit unsure though about reducing the header dependencies and the unity builds. I read that a big downside of the unity builds is that you need to recompile everything if you make changes in one source file which makes sense. That however would not be a problem for the frameworks as they will need to be recompiled anyways if they change.
I read that it's bad practice to use "umbrella headers" such as "MyFramework.h" which will include all the public headers of a given framework although you may only need a few of them.
Cocoa uses umbrella headers everywhere and it's of course much easier than to pick the exact headers needed for each source file.
However, when using unity builds I will only have one header per framework, correct?
Does it still make sense to pick the individual headers or will using "umbrella headers" be ok with unity builds?
Tapping a bit in the dark here and don't want to spend time implementing a technique which doesn't help in the end.
Thanks for your help!
This feels like a question that invites opinionated answers. Here are mine:
Always reduce header dependencies. Reduced dependencies make overall architecture cleaner. Independent little individual modules with clear responsibilities loosely coupled are always better to work with than spaghetti.
Use precompiled headers for compiling rarely changing headers. The third party, library and framework umbrella headers change rarely and so need to be rarely parsed and recompiled too.
Work most of the time with separate units (a few cpp files) and unit tests for those. Otherwise you build the whole program, navigate in it to the situation of interest, step through it with the debugger, and so on. Maybe you like that, but I'm too lazy; it is boring, repetitive and wastes my time. Just linking a whole C++ program (worth anything) usually takes ten minutes or more, and I don't need that many coffee breaks.
Do not use unity builds. Instead, use continuous integration that, when you push, automatically builds the whole program, runs all its unit tests, and prepares binaries on another computer (or farm). You will be notified when it is done (or has failed), and then you can take the binaries and debug the whole program too if you want to.

C++ increase build speed in large project by using libraries

I'm currently trying to optimise build speed for a big project with following in mind:
build speed is priority 1
resulting binaries size does not matter
Infos:
Environment: Visual Studio 2012 (required, because of the software I'm developing for) + Windows machine
BuildTime: 12mins (clean build), 1min for small changes, and every now and then small changes result in 5-6min because of slow linking (this is what I want to address)
Custom files in project: approx. 2500 (SDK I need to use excluded, a big SDK for a CAD system)
Lines of code in custom files: approx. 500000
I'm using an up-to-date CAD capable computer (32GB RAM, >3GHz QuadCore, SSD)
Ideas:
use precompiled headers => done, but does not have the effect I want; helps speed up compile time most of the time, but every now and then does not
split up project into libraries => not sure if this helps
Questions
I could not find anything about using libraries and build speed, but I assume if I precompile libraries, the linker will be faster.
Is this assumption true?
If I make a static library with the core functions, will this have an effect on build time? Or will the linker need as long as it does currently?
If I make a dynamic library, will this have an effect on build time? Or will the linker again check the dll completely and will need the same time?
I assume if I precompile libraries, the linker will be faster. Is this assumption true?
No, not likely. If at all (because the linker has to open fewer files), then the difference will be marginal.
If I make a static library with the core functions, will this have an effect on build time? Or will the linker need as long as it does currently?
It may make a huge difference on compile time, since although on a truly clean rebuild you still have to compile everything as before, on a normal "mostly clean" rebuild rebuilding the support libraries is superfluous since nothing ever changes inside them, so all you really need to rebuild is the user code, and as a result you compile a lot fewer files.
Note that every sane build system normally builds a dependency graph and tries to compile as few files as possible anyway (and, to the extent possible, with some level of parallelism), unless you explicitly tell it to do a clean build (which is rarely necessary to be done). Doctor, it hurts when I do this -- well, don't do it.
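The dependency-graph idea can be sketched in a few lines. This is only an illustration, written in Rust, with a made-up project layout (util.o, main.o, app) and a hypothetical function targets_to_rebuild; real build systems use file timestamps or content hashes and far more efficient graph structures.

```rust
use std::collections::{BTreeSet, HashMap};

/// Given `deps` (target -> the files it depends on directly) and one changed
/// file, return every target that must be rebuilt, directly or transitively.
/// Each step scans the whole map, which is fine for a sketch.
fn targets_to_rebuild(deps: &HashMap<&str, Vec<&str>>, changed: &str) -> BTreeSet<String> {
    let mut dirty = BTreeSet::new();
    let mut queue = vec![changed.to_string()];
    while let Some(current) = queue.pop() {
        for (target, inputs) in deps {
            let depends_on_current = inputs.iter().any(|input| *input == current.as_str());
            // `insert` returns false for targets already marked, so each
            // target is queued at most once.
            if depends_on_current && dirty.insert(target.to_string()) {
                queue.push(target.to_string());
            }
        }
    }
    dirty
}

fn main() {
    // Hypothetical project: two object files and one executable.
    let mut deps = HashMap::new();
    deps.insert("util.o", vec!["util.cpp", "util.h"]);
    deps.insert("main.o", vec!["main.cpp", "util.h"]);
    deps.insert("app", vec!["util.o", "main.o"]);

    // Touching main.cpp dirties main.o and app, but not util.o.
    println!("{:?}", targets_to_rebuild(&deps, "main.cpp"));
}
```

Note how a change to util.h would dirty all three targets, which is why widely included headers are so expensive to edit.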
The difference for the linker will, again, be marginal. The linker still needs to look up the exact same amount of symbols, and still needs to copy the same amount of code into the executable.
You may want to play with link order. Funny as it sounds, sometimes the order in which libraries and object files are linked makes a 5x difference on how long it takes the linker to do its job.
That being said, 12 minutes for a clean build indeed isn't a lot. Your non-clean builds will likely be in the two-digit second range, of which linking probably takes 90%. That's normally not a showstopper. Come back when a build takes 4 hours :-)
If I make a dynamic library, will this have an effect on build time? Or will the linker again check the dll completely and will need the same time?
The linker will still have to do some work for every function you call, which might be slightly faster, but will still be more or less the same.
Note that you add runtime (startup) overhead by moving code into a DLL. It is more work for the loader to load a program with parts of the code in a DLL as it needs to load another image, parse its header, resolve symbols, set up some pointers, run per-thread init functions, etc. That's usually not an issue (the difference is not really that much noticeable), just letting you know it's not free.
12 minutes is a short full build time and 500KLOC is not that big. Many free software projects (GCC, Qt, ...) have longer ones (hours) and millions of C++ lines.
You might want to use a serious and parallel build automation tool, such as ninja. Perhaps you could do some distributed build (like what distcc permits) if you can compile on remote machines.
You could configure your IDE to run an external command (such as ninja) for builds. This doesn't change autocompletion abilities. You could adopt another source code editor (e.g. GNU emacs).
C++ is not (yet) modular (it does not have genuine modules, like e.g. Ocaml or Go), and that makes its compilation slow (e.g. because standard container headers are big, e.g. <vector> brings about 10KLOC of included code, probably used and included in most of your C++ code). So you should avoid having many small files (e.g. merging two files of 250 lines each into one of 500 lines could decrease build time) and it looks like you have too many small C++ files. I would recommend source files of more than a thousand lines each. Having only one class implementation (or one function) per source file slows down the total build time.
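The arithmetic behind the "merge small files" advice can be spelled out with a back-of-the-envelope sketch. It takes the rough 10KLOC figure for <vector> from above and assumes, crudely, that compile cost scales with the number of lines parsed; the function lines_parsed is made up for this illustration.

```rust
/// Lines the compiler must parse across all translation units, assuming each
/// one pulls in `header_lines` of includes on top of its own code.
fn lines_parsed(own_lines_per_file: &[u64], header_lines: u64) -> u64 {
    own_lines_per_file.iter().map(|own| own + header_lines).sum()
}

fn main() {
    let header = 10_000; // the rough figure quoted above for <vector>
    let split = lines_parsed(&[250, 250], header); // 2 * (250 + 10_000) = 20_500
    let merged = lines_parsed(&[500], header); // 500 + 10_000 = 10_500
    println!("two files: {} lines, one file: {} lines", split, merged);
}
```

Under these assumptions the headers dominate, so halving the number of files roughly halves the parsing work even though the amount of your own code is unchanged.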
You surely want to use more indirection in your code. Use PIMPL idioms, virtual method tables, closures and std::function more systematically. Remember the rule of five.

Cache header files for make/g++

I am working on a large C++ code base and I want to speed up its compilation times. One thing I know is that all of my library includes are on a network drive which slows down things a lot. If I can get make or something else to automatically cache them, either in /tmp, or to a ramdisk, I expect that to improve the compile times for me quite a bit. Is there a way to do that? Of course I can copy them manually and set up a sync job but then every developer will have to do that on every box they ever compile so I am looking for an automated solution.
Thanks!
Of course. There are lots and lots of ways. You can just have a make rule that copies things then have people run make copylocal or whatever. This is fast but people have to remember to do it, and if the libraries change a lot this could be annoying. You can make a rule that copies things then put it as a prerequisite to some other target so it's done on every build, first. This will take longer: will the copy step plus using local copies take longer total time than just using the remote copies? Who knows?
You could also use a tool like ccache to locally cache the results of the compilation, rather than copying the library headers. This will give you a lot more savings for a typical small incremental build and it's easily integrated into the build system, but it will slightly slow down "rebuild the world" steps to cache the objects.
Etc. Your question is too open-ended to be easily answerable.
Avoid using network file systems for code. Use a version control system like git.
You might also consider using ccache.

When full build and when partial build?

Hi I am trying to find out when full build is required and when partial build is sufficient.
There are many articles, but I am not able to find specific answers.
Below are my thoughts
Full build is required when :
1. Change in the build of dependent modules.
--- Change in build options or use of optimization techniques.
2. Changes in the object layout:
--- Any change in the header file, such as adding or deleting methods in a class.
--- Changing object size by adding or removing variables or virtual functions.
--- Data alignment changes using pragma pack.
3. Any change in global variables.
Partial build is sufficient when:
1. Any change in the logic, as long as it does not alter the specified interface.
2. Change in a stack variable.
In an ideal world a full build should never be necessary, because the build tools would automatically detect whether any of their dependencies has changed.
But this is true only in an ideal world. In practice, build tools are written by humans, and humans
make mistakes, so the tools may not take every possible change into account,
are lazy, so the tools may not take some changes into account at all.
For you this means you need some experience with your build tools. A well-written makefile can take everything into account, so you rarely have to do a full build. But in the 21st century a makefile is not really state of the art any more, and makefiles become complex very quickly. Today's development environments do a fairly good job of finding dependencies, but in larger projects you may have dependencies that are hard to express in your development environment, and you will end up writing a script.
So there is no real answer to your question. In practice it is good to do a full rebuild for every release, and that rebuild should take just one button press. Do a partial build for daily work, since nobody wants to wait 2 hours to see whether their code compiles. But even in daily work a full rebuild is sometimes necessary, because the linker/compiler/(your choice of tool here) failed to recognize even the simplest change.
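To show what "detecting a changed dependency" usually boils down to, here is a minimal sketch of the timestamp comparison that make-style tools perform. It is written in Rust for illustration only; needs_rebuild is a hypothetical function, and a real tool would read the times from the file system (for example via std::fs::metadata) rather than construct them by hand.

```rust
use std::time::{Duration, SystemTime};

/// A target is stale if it does not exist yet (`None`) or if any
/// dependency was modified after the target was last built.
fn needs_rebuild(target: Option<SystemTime>, deps: &[SystemTime]) -> bool {
    match target {
        None => true,
        Some(built) => deps.iter().any(|modified| *modified > built),
    }
}

fn main() {
    // Made-up timestamps: the target was built at t=100s, a header edited at t=150s.
    let t0 = SystemTime::UNIX_EPOCH;
    let built = t0 + Duration::from_secs(100);
    let header_edited = t0 + Duration::from_secs(150);
    println!("{}", needs_rebuild(Some(built), &[header_edited]));
}
```

A check like this only sees what is listed as a dependency. A changed build option, for instance, leaves every timestamp untouched, which is one reason a full rebuild is still sometimes needed.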