To optimize compile time, I would like to enable a unity build (via CMake) for my embedded C++ project. It works great, but as far as I can see there are some side effects.
One major thing I observed is that the linked binary is bigger than the one from the "normal" build. Looking at the ELF file, I noticed that the unity-built binary contains fewer symbols than the other one. As I see it, some inlining happens at compile time (initially I thought inlining happens at link time?), and the binary therefore grows when inlined functions are used multiple times.
Because of the inlining that happens with the unity build, the runtime is also slightly shorter.
The concern I have right now is that with growing source code I get different unity buckets, and therefore the inlining is not really deterministic.
If my assumption is right, is there a way to counteract this problem?
First, to this aspect:
(initially I thought inlining happens at link time?)
This is compiler- and context-dependent, but in general both the compiler and the linker can be involved here.
For the general question:
How do you define unity buckets for yourself? I guess you effectively mean multiple translation units. If that's the case, nothing is really different from "common" building schemes (where each .cpp leads to a single translation unit), so your concerns have nothing to do with unity builds in particular. As a best practice: if you use unity builds, make sure your project/solution always builds fine both the "common" way and as a unity build. That means, for instance: avoid global using-directives (in .cpp and .h alike); make sure you do not run into the static initialization order fiasco (very relevant to your question here); avoid excessive use of anonymous namespaces (sad, I know...); use a strict include guard scheme; and so on.
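To illustrate the anonymous-namespace point: here is a minimal sketch (file names invented) that compiles cleanly as two separate translation units but breaks once a unity build concatenates them:

// a.cpp — fine as a standalone translation unit
namespace { int counter = 0; }
void incA() { ++counter; }

// b.cpp — also fine alone, but if a unity build concatenates a.cpp and
// b.cpp into one TU, the two anonymous namespaces merge and `counter`
// becomes a redefinition: a compile error that neither file has alone
namespace { int counter = 0; }
void incB() { ++counter; }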
Related
I have seen a couple of questions on how to detect unnecessary #include files in a C++ project. This question has often intrigued me, but I have never found a satisfactory answer.
If some header files are included but not used in a C++ project, is that an overhead? I understand that the contents of all included header files are copied into the including source files before compilation, which results in a lot of unnecessary compilation.
How far does this kind of overhead spread to the compiled object files and binaries?
Aren't compilers able to do some optimizations to make sure that this kind of overhead is not transferred to the resulting object files and binaries?
Considering that I know next to nothing about compiler optimization, I still want to ask this, in case there is an answer.
As a programmer who uses a wide variety of C++ libraries for his work, what kind of programming practices should I follow to keep avoiding such overheads? Is making myself intimately familiar with each library's inner workings the only way out?
It does not affect the performance of the binary, or even the contents of the binary file, for almost all headers. Declarations generate no code at all, inline/static/anonymous-namespace definitions are optimized away if they aren't used, and no header should contain externally visible definitions (that breaks if the header is included by more than one translation unit).
As @T.C. points out, the exception is internally visible static objects with nontrivial constructors. iostream does this, for example. The program must behave as if the constructor is called, and the compiler usually doesn't have enough information to optimize the constructor away.
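A minimal sketch (names invented) of the kind of internally visible static meant here; the constructor has a visible side effect, so the compiler must keep it even though the object is never referenced:

// registrar.hpp — hypothetical header
#include <cstdio>

struct Registrar {
    Registrar() { std::puts("registered"); }   // side effect: must run at startup
};

static Registrar registrar;   // internal linkage: every TU including this header
                              // gets its own copy, and each copy's constructor
                              // runs even if `registrar` is never referenced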
It does, however, affect how long compilation takes and how many files will be recompiled when a header is changed. For large projects, this is enough incentive to care about unnecessary includes.
Besides the obviously longer compile times, there might be other issues. The most important one, IMHO, is dependencies on external libraries: you don't want your program to depend on more libraries than necessary.
You then also need to install those libraries on every system you want the program to build on. This can become a nightmare, especially when the next programmer needs to install some database client library although the program never uses a database.
Also, library headers in particular often tend to define macros. Sometimes those macros have very generic names, which will break your code or be incompatible with other library headers you might actually need.
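A classic illustration (the min macro from windows.h is the well-known offender; the exact definition below is a simplification):

// Suppose a library header does, in effect, this (windows.h without
// NOMINMAX is the classic case):
#define min(a, b) (((a) < (b)) ? (a) : (b))

#include <algorithm>

// std::min is now broken for every includer, even ones that never use the
// library: the function-like macro expands inside the call expression.
int x = std::min(1, 2);   // compile error once the macro is in scope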
Of course any #include is an overhead: the compiler needs to parse that file.
So avoid them. Use forward declarations wherever possible.
It will speed up compilation. See Scott Meyers' book on the subject.
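A minimal sketch of that advice (types invented): the header gets by with a forward declaration alone:

// report.hpp — no #include "widget.hpp" needed at all
class Widget;                      // forward declaration is enough here

class Report {
public:
    void render(const Widget& w);  // reference parameters need only the name
private:
    Widget* target_ = nullptr;     // pointer member: Widget's size not required
};
// report.cpp then includes widget.hpp, since it uses the full definition.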
The simple answer is YES, it is an overhead as far as compilation is concerned, but at runtime it is barely going to make any difference. The reason being: let's say you add #include <iostream> (just as an example) and assume that you do not use any of its functions; then g++ 4.5.2 has some additional 18,560 lines of code to process during compilation. But as far as runtime overhead is concerned, I hardly think it creates a performance issue.
You can also refer to Are unused includes harmful in C/C++?, where I really liked this point made by David Young:
Any singletons declared as external in a header and defined in a source file will be included in your program. This obviously increases memory usage and possibly contributes to a performance overhead by causing one to access their page file more often (not much of a problem now, as singletons are usually small-to-medium in size and because most people I know have 6+ GB of RAM).
TL;DR
How do you protect against binary incompatibility resulting from typos in compiler arguments (preprocessor defines that control conditional compilation in shared, possibly templated headers) across different compilation units?
Ex.
g++ ... -DYOUR_NORMAl_FLAG ... -o libA.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMA1_FLAG ... -o libB.so
/**Another compilation unit, or even project. **/
g++ ... -DYOUR_NORMAI_FLAG ... main.cpp libA.so //The possibilities!
The Basic Story
Recently, I ran into a strange bug: the symptom was a single SIGSEGV, which always seemed to occur at the same location after recompiling. This led me to believe there was some kind of memory corruption going on, and that the underlying pointer was not a pointer at all, but some data section.
I'll spare you the long and strenuous journey of tracking down the problem, which took almost two otherwise perfectly good work days. Suffice it to say that Valgrind, GDB, nm, readelf, Electric Fence, GCC's stack smashing protection, and then some more measures/methods/approaches all failed.
In utter devastation, my attention turned to the finest details in the build process, which was analogous to:
Build one small library.
Build one large library, which uses the small one.
Build the test suite of the large library.
Only when the large library was used as a static or dynamic library dependency (i.e. the dynamic linker loaded it automatically, no dlopen) was there a problem. In the test case where all the code of the library was simply compiled into the tests, everything worked: this was the most important clue.
The "Solution"
In the end, it turned out to be the simplest thing: a single (!) typo.
It turns out the compilation flags differed by a single character between the test suite and the large library: a define which controlled the behavior of the small library was misspelled.
A critical morsel of information: the small library had some templates. These were used directly in every case, without explicit instantiation in advance. The contents of one of the templated classes changed depending on the flag: some data fields were simply not present when the flag was defined!
The linker noticed nothing of this. (Since the class was templated, the resultant symbols were weak.)
The code used dynamic casts, and the class affected by this problem inherited from the mangled class, so things went south.
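Here is a hypothetical reconstruction of the scenario (class name, member names, and the flag are all invented for illustration):

// small_lib.hpp — shared, templated header of the small library
template <typename T>
class Channel {
    T payload_;
#ifdef YOUR_NORMAL_FLAG      // misspell the -D flag in ONE compilation unit...
    unsigned sequence_ = 0;  // ...and this field silently vanishes there
#endif
public:
    T get() const { return payload_; }
};

// Every TU instantiates e.g. Channel<int> implicitly, producing weak symbols.
// The linker deduplicates weak symbols without checking that the two layouts
// match, so objects cross the library boundary with mismatched sizes, and any
// dynamic_cast through a class deriving from Channel goes south.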
My question is as follows: how would you protect against this kind of problem? Are there any tools or solutions which address this specific issue?
Future Proofing
I've thought of two things, and believe no protection can be built on the object file level:
1: Save the options implemented as preprocessor symbols in some well-defined place, preferably extracted by a separate build step. Provide a check script which uses this to verify all compiler defines and defines in user code, and integrate this check into the build process. Possibly use Levenshtein distance or similar to check for misspellings. This is expensive, and the script/solution can get complicated. There is a possible problem with similar flags (but why have them?), and additional files must accompany the compiled library code. (Well, maybe with DWARF 2 this is untrue, but let's assume we don't want that.)
2: Centralize build options: cheap, and the customization option is left open (think makefile.local), but it makes for monolithic monstrosities and strong project coupling.
I'd like to go ahead and quench a few likely flame inducing embers possibly flaring up in some readers: "do not use preprocessor symbols" is not an option here.
Conditional compilation does have its place in high-performance code, and doing everything with templates and enable_ifs would needlessly overcomplicate things. While the above situation is usually not desirable, it can arise from the development process.
Please assume you have no control over the situation, assume you have legacy code, assume everything you can to force yourself to avoid side-stepping.
If those won't do, generalize into ABI incompatibility detection, though this might escalate the scope of the question too much for SO.
I'm aware of:
http://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
DT_SONAME is not applicable.
Other version schemes therein are not applicable either - they were designed to protect a package which is in itself not faulty.
Mixing C++ ABIs to build against legacy libraries
Static analysis tool to detect ABI breaks in C++
If it matters, don't have a default case.
#ifdef YOUR_NORMAL_FLAG
// some code
#elif defined(YOUR_SPECIAL_FLAG)
// some other code
#else
// in case of a typo, this is a compilation error
# error "No flag specified"
#endif
This may lead to a large list of compiler options if conditional compilation is overused, but there are ways around this, like defining config files
flag=normal
flag2=special
which get parsed by build scripts that generate the options and can possibly check for typos, or which could be parsed directly from the Makefile.
Does a file structure that is mostly header files (90% of your code being header-only) slow down anything besides compilation?
Some people argue that it could cause most code to be inlined when optimizing for speed, and that the processor would then gather wrong statistics about instruction calls, or something like that. Has it been shown anywhere that this (or something similar) happens and slows an application down?
This is possibly a duplicate of Benefits of inline functions in C++?
The practical performance implication depends on many factors. I would not concern myself with it until you actually have a performance problem, in which case I'm sure bigger gains can be obtained by optimizing other things.
Don't keep all your code in headers - if you continue with this trend you will hate yourself later because you will be waiting for your compiler most of the time. LTO is a better approach if you are looking for similar optimizations, and has less of an impact on compile time.
Linking is a concern.
If your libraries are header-dominant, then larger intermediate object files may need to be written and then read back. The linker will then have more symbols to analyze and deduplicate, and some symbols will remain as legal duplicates. This increases your I/O, bloats your binary size, and throws a lot more work at the linker.
One benefit of header dominance is that there tend to be fewer sources to compile and consequently fewer images/objects to link. So header-only also has the potential to be faster in this regard (if used correctly).
If your library is going to be visible to many translations, then size and impact on linking should also be an important consideration.
Not a performance concern, but a potential bug concern:
From the usage guidelines: in C++, member functions defined inside the class definition body are implicitly inline (which does not guarantee that they are actually inlined). If such a function has a static local variable and either the function ends up with internal linkage or its definitions differ across translation units (an ODR violation), each translation unit can get its own instance of the function with its own static, which leads to bugs.
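A minimal sketch (hypothetical header) where each translation unit genuinely gets its own copy of the static:

// counter.hpp — hypothetical header
static inline int next_id() {   // `static` gives the function internal linkage,
    static int counter = 0;     // so every TU that includes this header gets its
    return ++counter;           // own copy of the function AND of `counter`:
}                               // "unique" ids repeat across translation units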
I am wondering whether there is any difference between inlining functions on a linker level or compiler level in terms of execution speed?
e.g. if I have all my functions in .cpp files and rely on the linker to do the inlining, will this inlining potentially be less efficient than, say, defining some functions in the headers for selected inlining at the compiler level, or unity builds without any linking where all inlining is done by the compiler?
If the linker is just as efficient, why would one then still bother inlining functions explicitly at the compiler level? Is that just for convenience, say, when there is just a one-line constructor and one can't be bothered with a .cpp file?
I suppose this might depend on the compiler, in which case I would be most interested in Visual C++ (Windows) and gcc (Linux).
The general rule is: all else being equal, the closer to execution (compiling -> linking -> (maybe JIT) -> execution) the optimization is done, the more data the optimizer has and the better the optimization it can perform. So unless the optimizer is dumb, you should expect better results when inlining is done by the linker: the linker will know more about the invocation context and do better optimization.
Generally, by the time the linker is run, your source has already been compiled into machine code. The linker's job is to take all the code fragments and link them together (possibly fixing up addresses along the way). In such a case, there is no room for performing inlining.
But all is not lost. GCC does provide a mechanism for link-time optimization (using the -flto option) when compiling and linking. This causes GCC to produce a bytecode that can then be compiled and linked by the linker into a single executable. Since the bytecode contains more information than optimized machine code, the linker can now perform radical optimization on the whole codebase, something that the compiler cannot do.
See here for more details on GCC. Not too sure about VC++, though.
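A minimal two-file sketch (file and function names invented) of what -flto buys you:

// mathops.cpp
int add(int a, int b) { return a + b; }   // definition lives in this TU only

// main.cpp
int add(int a, int b);                    // only a declaration is visible here
int main() { return add(40, 2); }

// g++ -O2       mathops.cpp main.cpp   -> a real call instruction remains
// g++ -O2 -flto mathops.cpp main.cpp   -> the body is available at link time
//                                         and the call can be inlined away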
Inlining is normally performed within a single translation unit (.cpp file). When you call functions in another file, they’re never inlined.
Link-Time Optimization (LTO) changes this, allowing inlining to work across translation units. It should always be equal to or better than regular linking (sometimes very significantly so) in terms of how efficient the generated code is.
The reason both options are still available is that LTO can take a large amount of RAM and CPU: I've had VC++ take several minutes to link a large C++ project before. Sometimes it's not worth enabling until you ship. You could also run out of address space with a large enough project, as the linker has to load all that bytecode into RAM.
For writing efficient code, nothing changes – all the same rules apply with LTO. It is potentially more efficient to explicitly define an inline function in a header file versus depending on LTO to inline it. The inline keyword only provides a hint so there’s no guarantee, but it might nudge it into being inlined where normally (with or without LTO) it wouldn’t be.
If the function is inlined, there is no difference.
I believe the main reason for having inline functions defined in headers is history. Another is portability. Until recently, most compilers did not do link-time code generation, so having the functions in the headers was a necessity. That, of course, affects code bases started more than a couple of years ago.
Also, if you still target some compilers that don't support link-time code generation, you don't have a choice.
As an aside, I have in one case been forced to add a pragma to ask one specific compiler not to inline an init() function defined in one .cpp file, but potentially called from many places.
Our next product has grown too large to link on a machine running 32-bit Windows. The sum total of all the lib files exceeds 2Gb and can only be linked on a 64-bit Windows machine. Eventually we will exceed that boundary, since our software tends to grow rather than contract and we are using a 32-bit linker (MS Visual Studio 2005): we expect to hit trouble when our lib size total exceeds 3Gb.
How can I reduce the size of the .lib files, or the .obj files without trimming code? For example, we use a lot of templates: is there any way of reducing their footprint? Is there any way of finding out what's causing the bloat from examining the .lib/.obj files? Can this be automated rather than inspected by eye? 2.5Gb is a lot of text to peer through and compare.
External constraints prevent us from shipping as anything other than a single .exe, so a DLL solution is not available.
I once worked on a project with several MLoC. While ours would still link on a 32-bit machine, link times were abysmal and became a major problem, because developers were reduced to getting only a dozen edit-compile-test cycles done per workday. (Compile times were handled pretty well by doing distributed compilation.)
We switched to dynamic linking. That increased startup time, but this could be managed by delay-loading of DLLs.
First, of course, make sure you compile with the 'Optimize for Size' option.
If you do that, I wouldn't expect inlining, at least, to contribute significantly to the code size. The compiler makes a tradeoff for every inlining candidate regarding how much (if at all) it'd increase code size, compared to the performance boost it'd give. And if you're optimizing for size, the compiler won't risk bloating the code much. (Note that inlining very small functions can actually decrease code size)
Second, have you considered unity builds? That'd pretty much eliminate the linker entirely, and with only one translation unit, there'd be much less duplicate work and hopefully, a smaller memory footprint.
Finally, I know Visual Studio (or possibly the Windows SDK) has a 64-bit compiler (that is, a compiler that is itself a 64-bit application, not just a compiler producing 64-bit code). Consider using that. (I don't know if there is also a 64-bit linker)
I don't know if the linker is built with the LARGEADDRESSAWARE flag set. If so, running it on a 64-bit machine will let the process consume a full 4 GB of memory instead of the 2 GB it normally gets. (If necessary, you can add the flag yourself by modifying the PE header.)
Perhaps limiting the linkage of various symbols could help as well. If you know that a symbol won't be needed outside the current translation unit, put it in an anonymous namespace. That might allow the compiler to trim down unused symbols before passing everything on to the linker.
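A small sketch of that suggestion (names invented): with internal linkage, the helper never shows up as an external symbol:

// render.cpp
namespace {                         // internal linkage: no external symbol is
    int clampChannel(int v) {       // emitted, so the linker never sees it and
        if (v < 0)   return 0;      // the compiler may discard it entirely if
        if (v > 255) return 255;    // it ends up unused
        return v;
    }
}

int brighten(int channel, int delta) {
    return clampChannel(channel + delta);
}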
Try using the Symbol Sort program to show you where the main bits of bloat are in your code. Also just looking at the size of the raw .obj files will give you a reasonable idea of where to target.
OMFG!!!!! That's huuuuuge!
Apart from the fact that I think it's too big to be rational... can't you use dynamic linking to avoid linking the whole mess at compile time, and only link at runtime what's necessary (I mean, loading DLLs on demand)?
Does it need to be one big app?
One option is to split various modules into DLLs and load/unload them as needed.
Alternatively, you might be able to split it into several apps and share data using mapped memory, pipes, a DBMS, or even simple data files.
First of all, find out how to measure the size used by various features. Don't just go ahead and replace template usage or other things because you suspect they make a significant difference.
Run
dumpbin /HEADERS <somebinary>
to find out which sections in your binary are causing the huge size. Do you have a huge Debug Directory section? Strip symbols, then. Is the Import Address Table large? Check the table and locate symbols which you don't need (a problem with templates is that the symbols of template instantiations tend to be very, very long). Similar analysis can be done for the Exception Directory, COM Descriptor Directory, etc.
I do not think there is any single tool that can give you statistics that you want/need. Using either .map files or the dumpbin utility with /SYMBOLS parameter plus some post-processing of the created log might help you get what you want.
If the statistics confirm your suspicion of template bloat, or even without the confirmation, it might be a good idea to do several things with the source:
Try using explicit instantiations and move template definitions into .cpp files (see the sketch after this list). Of course this works only if you have a limited and well-known set of types/values that you use as arguments to the templates.
Add more abstraction and/or indirection. Factor code that does not depend on your template parameters into its own base classes or free functions. If you have several template type parameters, see if you cannot split the single class template into several base classes without overlapping template parameters. (See http://www2.research.att.com/~bs/SCARY.pdf.)
Try using the pimpl idiom; avoid instantiating templates in headers if you can, instantiate them only in .cpp files.
Templates are nice, but sometimes ordinary classes work as well; e.g. avoid passing integer constants as non-type template parameters if you can pass them as constructor parameters instead.
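To illustrate the first suggestion from the list above, here is a minimal sketch (class and file names invented) of explicit instantiation with the definitions moved out of the header:

// matrix.hpp — declarations only; no template bodies for includers to expand
template <typename T>
class Matrix {
public:
    T trace() const;
};
extern template class Matrix<float>;    // tell every includer NOT to
extern template class Matrix<double>;   // instantiate these itself

// matrix.cpp — the single TU that carries the instantiations
// (#include "matrix.hpp" first)
template <typename T>
T Matrix<T>::trace() const { return T{}; }   // placeholder body

template class Matrix<float>;    // explicit instantiations: these symbols
template class Matrix<double>;   // now exist in exactly one object file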
@hatcat and @jalf: there is indeed a full set of 64-bit tools. For example, you can set an environment variable:
set PreferredToolArchitecture=x64
and then run Visual Studio (from the developer console).