I am wondering whether there is any difference between inlining functions at the linker level or at the compiler level in terms of execution speed.
E.g. if I have all my functions in .cpp files and rely on the linker to do inlining, will this inlining potentially be less efficient than, say, defining selected functions in the headers for compiler-level inlining, or a unity build with no separate linking where all inlining is done by the compiler?
If the linker is just as efficient, why would one still bother inlining functions explicitly at the compiler level? Is that just for convenience, e.g. when there is a one-line constructor and one can't be bothered with a .cpp file?
I suppose this might depend on the compiler, in which case I would be most interested in Visual C++ (Windows) and gcc (Linux).
Thanks
The general rule is that, all else being equal, the closer to execution (compiling -> linking -> (maybe JIT) -> execution) an optimization is performed, the more data the optimizer has and the better the optimization it can perform. So unless the optimizer is dumb, you should expect better results when inlining is done by the linker: the linker will know more about the invocation context and can do better optimization.
Generally, by the time the linker is run, your source has already been compiled into machine code. The linker's job is to take all the code fragments and link them together (possibly fixing addresses along the way). In such a case, there is no room for performing inlining.
But all is not lost. Gcc provides a mechanism for link-time optimization (using the -flto option when compiling and linking). This causes gcc to produce a byte code that can then be compiled and linked by the linker into a single executable. Since the byte code contains more information than optimized machine code, the linker can perform radical optimization on the whole codebase, something the compiler alone cannot do.
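For concreteness, a minimal sketch of how this is driven (file and function names are made up for illustration; note that -flto is passed both at compile and at link time):

    // lib.cpp -- the callee lives in its own translation unit
    int square(int x) { return x * x; }

    // main.cpp -- without LTO, square() cannot be inlined here
    int square(int x);
    int main() { return square(21); }

    // Build so the link step can inline across the two files:
    //   g++ -O2 -flto -c lib.cpp main.cpp
    //   g++ -O2 -flto lib.o main.o -o app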
See here for more details on gcc. Not too sure about VC++ though.
Inlining is normally performed within a single translation unit (.cpp file). When you call a function defined in another file, it is not inlined.
Link-Time Optimization (LTO) changes this, allowing inlining to work across translation units. In terms of how efficient the generated code is, it should always be equal to or better than regular linking, sometimes very significantly so.
The reason both options are still available is that LTO can take a large amount of RAM and CPU: I've had VC++ take several minutes just to link a large C++ project. Sometimes it's not worth enabling until you ship. You can also run out of address space with a large enough project, since the linker has to load all that bytecode into RAM.
For writing efficient code, nothing changes: all the same rules apply with LTO. It is potentially more efficient to explicitly define an inline function in a header file than to depend on LTO to inline it. The inline keyword only provides a hint, so there is no guarantee, but it might nudge the compiler into inlining a function that it otherwise (with or without LTO) would not.
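For example, a trivial function defined directly in a header (names hypothetical) is visible to the compiler in every translation unit that includes it, with or without LTO:

    // mathutil.h
    #pragma once

    // Definition in the header: every caller sees the body, so the
    // compiler can inline it without any help from the linker.
    inline float dot(float ax, float ay, float bx, float by) {
        return ax * bx + ay * by;
    }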
If the function is inlined, there would be no difference.
I believe the main reason for having inline functions defined in the headers is history. Another is portability. Until recently most compilers did not do link-time code generation, so having the functions in the headers was a necessity. That of course affects code bases started more than a couple of years ago.
Also, if you still target some compilers that don't support link-time code generation, you don't have a choice.
As an aside, in one case I was forced to add a pragma to ask one specific compiler not to inline an init() function that was defined in one .cpp file but potentially called from many places.
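With gcc/clang that kind of request can also be expressed as a function attribute rather than a pragma; a hedged sketch (function name made up, and the poster's compiler may well spell this differently, e.g. __declspec(noinline) on MSVC):

    // init.cpp -- ask the compiler never to inline this function
    __attribute__((noinline)) void init() {
        // one-time setup, potentially called from many places
    }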
Related
To optimize compile time I would like to enable unity builds (via CMake) for my embedded-system (C++) project. It works great, but as I see it there are some side effects.
One major thing I observed is that the linked binary has a different (bigger) size compared to the "normal" build. Looking at the ELF file I noticed that there are fewer symbols in the unity-built binary than in the other one. As I see it, some inlining is happening at compile time (initially I thought inlining happens at link time?), and the binary size therefore grows when inlined functions are used multiple times.
Because of the inlining that happens with the unity build, the runtime is also slightly shorter.
The concern I have right now is that with growing source code I will get different unity buckets, and therefore inlining is not really deterministic.
If my assumption is right, is there a way to counteract this problem?
First, to this aspect:
(initially I thought inlining happens at link time?)
This is compiler- and context-dependent, but in general both the compiler and the linker are involved here.
For the general question:
How do you define unity buckets? I guess you effectively mean multiple translation units. If that's the case, nothing is really different from "common" building schemes (where each .cpp leads to a single translation unit), so your concerns would have nothing to do with unity builds in particular. As a best practice: if you use unity builds, make sure your project/solution always stays fine for "common" builds and for unity builds alike. That means, for instance: always avoid global using-namespace directives (in .cpp and .h), make sure you do not run into the static initialization order fiasco (very relevant to your question here), avoid heavy use of anonymous namespaces (sad, I know; see the sketch below), use a strict include guard scheme, and so on...
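To illustrate the anonymous-namespace point with a made-up example: both files below are fine as separate translation units, but once a unity build pastes them into one translation unit the two anonymous namespaces merge and the build breaks:

    // a.cpp
    namespace {
        int scale = 2;   // private to a.cpp in a common build
    }
    int twice(int x) { return x * scale; }

    // b.cpp
    namespace {
        int scale = 10;  // redefinition error in a unity build, because
    }                    // a.cpp and b.cpp now share one translation unit
    int tenfold(int x) { return x * scale; }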
I am aware that the keyword inline has useful properties e.g. for keeping template specializations inside a header file.
On the other hand I have often read that inline is almost useless as a hint for the compiler to actually inline functions.
Further, the keyword cannot be used inside a .cpp file, since the compiler wants to inspect functions marked with the inline keyword wherever they are called.
Hence I am a little confused about the "automatic" inlining capabilities of modern compilers (namely gcc 4.4.3). When I define a function inside a .cpp file, can the compiler inline it anyway if it deems that inlining makes sense for that function, or do I rob it of some optimization opportunities? (That would be fine for the majority of functions, but it is important to know for small ones called very often.)
Within the compilation unit the compiler will have no problem inlining functions (even if they are not marked as inline). Across compilation units it is harder, but modern compilers can do it.
Use of the inline keyword has little effect on 'modern' compilers and whether they actually inline functions; they have better heuristics than the human mind. (Unless you specify flags to force it one way or the other, which is usually a bad idea, as humans are bad at making this decision.)
Microsoft Visual C++ has been able to do so at least since Visual Studio 2005. They call it "Whole Program Optimization" or "Link-Time Code Generation". In this implementation, the compiler does not actually produce machine code, but writes an intermediate representation of the C++ code into the object files. The linker then merges all of the code into one huge code unit and performs the actual compilation.
GCC has been able to do this since at least version 4.5, with major improvements coming in GCC 4.7. To my knowledge the feature is still considered somewhat experimental (at least insofar as many Linux distributions do not use it). GCC's implementation works very similarly, by first writing its GIMPLE intermediate representation into the object files, then compiling all of the object files into a single object file which is then passed to the linker (this allows GCC to continue to work with existing linkers).
Many big C++ projects also do what is now being called "unity builds". Instead of passing hundreds of individual C++ source files into the compiler, one source file is created that includes all the other source files in the project. The original intent behind this is to decrease compilation times (since headers etc. do not have to be parsed over and over), but as a side-effect, it will have the same outcome as the LTO/LTCG techniques mentioned above: giving the compiler perfect visibility into all functions in all compilation units.
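Such a unity source file is literally just a list of includes, along these lines (file names hypothetical):

    // unity.cpp -- the only file handed to the compiler; every function
    // in the included .cpp files is now visible to the inliner at once
    #include "renderer.cpp"
    #include "physics.cpp"
    #include "audio.cpp"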
I jump between being impressed by my C++ compiler's (MSVC 2010) ingenuity and its stupidity. Some code that did pixel format conversion via templates, which would have resolved into 5-10 assembly instructions when properly inlined, got bloated into kilobytes(!) of nested function calls. At other times, it inlines so aggressively that whole classes disappear even though they contained non-trivial functionality.
This depends on your compilation flags. With -combine and -fwhole-program, gcc will do function inlining across cpp boundaries. I'm not sure how much the linker will do if you compile into multiple object files.
The standard dictates nothing about how a function can be inlined. Compilers can inline a function if they have access to its implementation. If all you have is a header plus compiled binaries, it is impossible. If everything is in the same module, the compiler can inline the function even if it is defined in the .cpp file.
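This is easy to verify yourself; a small sketch (file name hypothetical, exact output depends on compiler and flags):

    // same.cpp -- no inline keyword anywhere, both functions in one file
    static int add(int a, int b) { return a + b; }

    int main() {
        return add(2, 3);  // compile with g++ -O2 -S and inspect the
    }                      // assembly: typically no call remains, main
                           // just returns the constant 5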
I've been working on a project at work where there's loads and loads of code in the header files. If I were using Visual Studio this wouldn't be an issue, as it has pre-compiled headers etc, but this is Linux GCC code.
Anyway, it's starting to become a bit of an issue with compilation times. Of course the templates are going to have to remain in the headers etc, but most of this code could be extracted into implementation files and linked against as a static library. All of the projects use these headers and compile them each time, so it makes sense to create a static lib.
Are implementations in the header files inlined, or is that only a hint, like the inline keyword? This code is VERY time critical and I'm concerned about moving the implementations out of the headers. Can I achieve the same thing if I use the inline keyword as opposed to having implementations in header files?
** UPDATE **
I know that inline is only a hint to the compiler. I'm not in control of everything in the project, and I just want to move everything out of the headers into a library without affecting performance. Is this actually going to be a try-it-and-see thing? I just want to keep performance exactly the same but improve compile time.
The inline keyword is only a hint to the compiler that it may wish to inline that function. Its real purpose is to allow you to legally "violate" the one definition rule.
In order to inline a function, its body has to be visible at the point of call, which typically means that if you move the function to an implementation file it may not be inlined anymore.
But keep in mind that most likely large functions in the header will not be inlined anyway. Also consider that in many cases inlined functions may actually be slower than called functions due to a variety of architecture-specific issues.
inline is a hint for optimization, but it is also used to work around ODR.
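To make the ODR point concrete, a minimal sketch (hypothetical names): the same definition legally appears in two translation units only because of the inline keyword:

    // util.h
    #pragma once
    inline int clamp01(int v) {             // remove "inline" and the build
        return v < 0 ? 0 : (v > 1 ? 1 : v); // below fails to link with a
    }                                       // multiple-definition error

    // a.cpp
    #include "util.h"
    int fa(int v) { return clamp01(v); }

    // b.cpp
    #include "util.h"
    int fb(int v) { return clamp01(v); }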
Consider using whole program optimization / link-time optimization instead. It allows you to have implementation in multiple files and basically everything has the same opportunity to be optimized (and inlined) as if it were in the same translation unit.
Your compile times become much quicker, but link times usually suffer, sometimes quite a bit. You don't need to enable it for debug builds though, so it can deliver a pretty immediate improvement to dev time.
If I were using Visual Studio this wouldn't be an issue, as this has pre-compiled headers etc
GCC has them too.
http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
The inline keyword does NOT mean that the function has to be implemented "in the line" where you define it. As you know, it is a hint for the compiler, to let it try compiling as if the few lines in the function body were at the place where you call it, thus avoiding the overhead of saving the address to jump back to, a vtable lookup, etc.
Thinking it is called faster because it is in the header is wishful thinking (on the part of whoever originally wrote the code).
Try it and move the implementation to a .cpp file in a minimal example: a main, one object, one inline function, one call to it. Implement it once in the header and once in a .cpp file, then look at the assembly. No difference.
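That experiment could look roughly like this (names made up; build each variant with g++ -O2 -S, adding -flto for variant A, and compare the assembly of main):

    // widget.h
    #pragma once
    struct Widget {
        int value() const;                   // variant A: body in widget.cpp
        // int value() const { return 42; }  // variant B: body in the header
    };

    // widget.cpp (variant A only)
    #include "widget.h"
    int Widget::value() const { return 42; }

    // main.cpp
    #include "widget.h"
    int main() {
        Widget w;
        return w.value();  // with optimization (and LTO for variant A) both
    }                      // variants typically compile to identical code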
Does a file structure that is mostly header files (90% of your code being header-only) slow down anything besides compilation?
Some people argue that it could cause most of the code to be inlined under speed optimizations, and so the processor would compute wrong statistics about instruction calls, or something like that. Has it been shown anywhere that this, or something similar, happens and slows down application speed?
This is possibly a duplicate of Benefits of inline functions in C++?
The practical performance implication depends on many factors. I would not concern myself with it until you actually have a performance problem, in which case I'm sure bigger gains can be obtained by optimizing other things.
Don't keep all your code in headers - if you continue with this trend you will hate yourself later because you will be waiting for your compiler most of the time. LTO is a better approach if you are looking for similar optimizations, and has less of an impact on compile time.
Linking is a concern.
If your libraries are header-dominant, then larger intermediate object files may need to be written and then read back. The linker will then have more symbols to analyze and deduplicate, and some symbols will remain as legal duplicates. This increases your I/O, bloats your binary size, and throws a lot more work at the linker.
One benefit of header dominance is that there tend to be fewer source files to compile and consequently fewer images/objects to link. So header-only also has the potential to be faster in this regard (if used correctly).
If your library is going to be visible to many translations, then size and impact on linking should also be an important consideration.
Not a performance concern, but a potential bug concern:
From usage guidelines: in C++, class member functions defined inside the class definition body are implicitly inline. If such a member function has static local variables, this can lead to each inlined instance of the function having its own static variable, which leads to bugs.
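The scenario that guideline warns about looks roughly like this (class name made up; note that a conforming toolchain must keep a single instance, and the duplication typically shows up across DLL/shared-library boundaries):

    // counter.h
    #pragma once
    struct Counter {
        int next() {           // defined in the class body => implicitly inline
            static int n = 0;  // the standard requires one n program-wide, but
            return ++n;        // a copy inlined into each DLL can end up with
        }                      // its own n, which is the bug in question
    };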
I am using C++ as an intermediate language. For each function object I create a unique class with a call method. What I am avoiding is checking whether a similar function has already been used and its corresponding class defined, so I may end up with the exact same class under a different name. So I am wondering whether the compiler (g++) will detect this and merge the classes.
Just to clarify on both previous answers (which are good answers):
The compiler will absolutely not merge your classes, at all. Some linkers might have some optimizations along those lines, but it's by no means a standard feature and neither the standard Microsoft nor GNU/Linux linkers do that. Usually the linker will only do that if you emit weak entries with the same name in the object files directly, which is what happens with template instantiations for instance. There is no standard way to obtain this behavior in C/C++ directly, although at least GCC offers extensions to control this linking yourself.
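The weak-entry behavior is easy to observe with a template (example made up; on Linux, nm marks such symbols with W):

    // t.h
    #pragma once
    template <typename T>
    T triple(T v) { return v * 3; }  // every .cpp that uses triple<int>
                                     // emits its own weak definition; the
                                     // linker keeps exactly one of them

    // a.cpp and b.cpp each include t.h and call triple(...) somewhere; then
    //   nm a.o | c++filt
    // shows a weak entry:   W int triple<int>(int)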
You should do it yourself though, because it actually is an optimization. Jason is right that it would "just" cut down on code size, but on modern PC architectures that is itself a huge optimization. The code caches on the CPU aren't getting much bigger, and memory speeds are nowhere close to CPU speeds, so cache misses caused by an overly huge code image can have very serious performance impacts. There are benchmarks showing that compiling the Linux kernel or large apps like Firefox or OpenOffice with -Os (optimize for size) makes some workloads faster by a wide margin than compiling them with -O3.
No, at least g++ won't, because a class defines a namespace, so a function in class A is actually not the same as a function in class B even if the function itself has the same name. For example, A::foo() is not the same as B::foo().
Also in the object file created after compilation, the function names are mangled, so A::foo() won't have the same literal name as B::foo() even though there is no namespace abstraction at the compiled object file level. So the linker is not going to be able to weed out functions from two different C++ classes based on their names.
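For example, with the name mangling gcc uses:

    struct A { void foo(); };
    struct B { void foo(); };
    void A::foo() {}  // mangles to _ZN1A3fooEv
    void B::foo() {}  // mangles to _ZN1B3fooEv: a different symbol, so the
                      // linker sees two unrelated functions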
I doubt it will. That would be difficult to detect in the general case, and there is no runtime efficiency in optimizing it. The only savings would be code space. An optimizing linker might perform such a transformation, but those are rare in the wild.