Modern Compilers Inlining Across Cpp Files and PImpl Idiom Overhead - c++

I was taught that, as a general practice, it's best not to try to beat the compiler, at least until it's proven to be stupid. So, since the inline keyword is only a hint anyway, using it has never been a priority for me: I was taught that the compiler is pretty good at understanding the trade-offs your system will have to make when it comes to inlining vs., for example, looped calls through a function pointer.
But then I recently read that functions cannot be inlined unless they are defined in the header file, because otherwise the compiler won't see them when it works on the other header and .cpp files. I did some more research and found people saying that this is the old way things worked, and that modern compilers perform "Whole Program Optimization" or "Link-Time Code Generation": the familiar .obj files are built as usual but can then be linked together and optimized further, allowing the compiler to see functions that were once "hidden" in a .cpp file and inline them where appropriate. This sounds great, but I'm wondering which source to trust and would appreciate an additional opinion on the matter.
Further research led me to a deeper understanding of the Pimpl idiom and why it is used: most obviously to save on compile-time costs and to increase readability and stability. But as I continued reading, I noticed that the cppreference.com/cpp/pimpl documentation states that separating header information from implementation details can lead to runtime overhead, as calls to those functions require at least one level of indirection, in addition to an increased spatial cost due to the extra pointers used. So if I were to create a real-time application where runtime performance is a high priority, is it best to avoid the Pimpl idiom? If so, should I instead use the suggested alternative and tag all functions inline and place them in the header file (if I'm understanding that alternative correctly)? Or is there another option I'm not seeing, such as using only .cpp files without causing LNK2005 errors (maybe using one large .cpp file)?
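For concreteness, here is a minimal sketch of the Pimpl shape I mean (the names are just placeholders); the extra pointer member and the out-of-line call are exactly where cppreference says the overhead comes from:

    // widget.h : the public header exposes no implementation details.
    #include <memory>

    class Widget {
    public:
        Widget();
        ~Widget();                     // defined out of line, where Impl is complete
        void draw();                   // every call pays one pointer indirection
    private:
        struct Impl;                   // forward declaration only
        std::unique_ptr<Impl> pimpl;   // the extra pointer: extra space + a hop
    };

    // widget.cpp : changing Impl never forces clients of widget.h to recompile.
    struct Widget::Impl {
        int state = 0;
    };

    Widget::Widget() : pimpl(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;       // Impl is complete here, so this compiles
    void Widget::draw() { ++pimpl->state; }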
Thanks in advance for the help; I know it was a bit of a text crawl to get to this point.

Related

C++ header and implementation, (why) is it not automatically handled by the IDE/compiler?

In C++, your classes are often divided into two parts: the header file and the actual implementation. In my (inexperienced) opinion, this is awful. It requires me to do all sorts of unnecessary book-keeping, clutters up my project directory, and goes against everything I've learned about software development (double implementation). Languages where you only deal with the implementation, such as Java or Python, are much nicer to work with.
I've always learned that the reason to use them was to significantly decrease compilation time. However, wouldn't a modern IDE (CLion in my case) or even the compiler be smart enough to either:
Keep some sort of "shadow"-header file, which would automatically be updated whenever a definition is changed in the implementation?
Automatically split it into the header and implementation during compile time, allowing you to only have to deal with one file? (Something that Lazy C++ seems to do)
Or are there any plugins available that offer this kind of behaviour? C++ modules also seem to offer a solution to this problem, but their current status/support is unclear to me and to make matters worse there seem to be two competing standards (Clang's and Microsoft's).
Unfortunately it is not that simple. C++ inherited the header/source file separation from C because of the preprocessor, which both languages share. Automatic generation of a header file is not possible in general: first, the separation is not trivial; second, a header file often contains manually written preprocessor code that generates the code to be compiled; third, almost all templated code goes into a header file because of the compilation process and the rules of visibility. Changing all of that would require breaking compatibility with existing code, of which there is a significant amount, and nobody wants to do that. It would be easier to create yet another language (like D), but many people would not want to migrate, for various reasons. We know that the committee is working on modules, and if they manage to make them work without breaking compatibility, that will be helpful for many of us. But again, this is not a trivial task at all; the approach you describe would only work in certain environments (where you limit yourself), but cannot be applied to everybody.
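To illustrate the third point with a small sketch of my own (the function name is made up): a template's body must be visible wherever it is instantiated, which in practice forces it into the header:

    // clamp_to.h : hypothetical example. The definition must live here,
    // because every translation unit that instantiates clamp_to<T> needs
    // to see the full body at compile time.
    template <typename T>
    T clamp_to(T value, T lo, T hi) {
        if (value < lo) return lo;
        if (value > hi) return hi;
        return value;
    }

    // If only a declaration lived in the header and the body sat in one
    // .cpp file, any other .cpp calling clamp_to<double>(...) would fail
    // to link with an undefined reference, unless the instantiation were
    // made explicit.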

Are unnecessary include files an overhead?

I have seen a couple of questions on how to detect unnecessary #include files in a C++ project. This question has often intrigued me, but I have never found a satisfactory answer.
If some header files are included but not used in a C++ project, is that an overhead? I understand that before compilation the contents of all the header files are copied into the including source files, and that this results in a lot of unnecessary compilation.
How far does this kind of overhead spread to the compiled object files and binaries?
Aren't compilers able to do some optimizations to make sure that this kind of overhead is not transferred to the resulting object files and binaries?
Considering that I probably know nothing about compiler optimization, I still want to ask this, in case there is an answer.
As a programmer who uses a wide variety of C++ libraries for his work, what kind of programming practices should I follow to keep avoiding such overheads? Is making myself intimately familiar with each library's workings the only way out?
It does not affect the performance of the binary or even the contents of the binary file, for almost all headers. Declarations generate no code at all, inline/static/anonymous-namespace definitions are optimized away if they aren't used, and no header should include externally visible definitions (that breaks if the header is included by more than one translation unit).
As @T.C. points out, the exception is internally visible static objects with nontrivial constructors. <iostream> does this, for example. The program must behave as if the constructor is called, and the compiler usually doesn't have enough information to optimize the constructor away.
It does, however, affect how long compilation takes and how many files will be recompiled when a header is changed. For large projects, this is enough incentive to care about unnecessary includes.
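To make that exception concrete, here is a sketch of mine (Logger is a made-up stand-in for what <iostream> does with its ios_base::Init object):

    // logger.h : hypothetical illustration of the exception above.
    struct Logger {
        Logger();   // nontrivial; imagine it opens a log file (defined in logger.cpp)
    };

    // Internal linkage: every translation unit that includes this header
    // gets its own Logger, and its constructor must run before main()
    // even if the object is never used. Because the constructor body is
    // not visible here, the compiler cannot prove it has no side effects,
    // so it cannot be optimized away.
    static Logger unused_logger;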
Besides the obviously longer compile times, there might be other issues. The most important one, IMHO, is dependencies on external libraries. You don't want your program to depend on more libraries than necessary.
You also then need to install those libraries on every system you want the program to build on. This can become a nightmare, especially when the next programmer needs to install some database client library although the program never uses a database.
Also, library headers especially tend to define macros. Sometimes those macros have very generic names that will break your code, or that are incompatible with other library headers you might actually need.
Of course any #include is an overhead. The compiler needs to parse that file.
So avoid them. Use forward declarations wherever possible.
It will speed up compilation. See Scott Meyers' book on the subject.
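For example (a minimal sketch of my own, with made-up names), a header that uses a type only through a pointer or reference can forward declare it instead of including its header:

    // report.h : the forward declaration avoids pulling in engine.h here.
    class Engine;                  // a declaration alone is enough below

    class Report {
    public:
        explicit Report(Engine& engine);
        void run();
    private:
        Engine* engine;            // pointers/references don't need the full type
    };

    // Only report.cpp has to #include "engine.h", so translation units
    // that include report.h never pay for (or recompile on changes to)
    // engine.h.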
The simple answer is YES, it is an overhead as far as compilation is concerned, but at runtime it is hardly going to make any difference. For example, say you add #include <iostream> (just as an example) and assume you don't use any of its functions; g++ 4.5.2 then has some additional 18,560 lines of code to process during compilation. But as far as runtime overhead is concerned, I hardly think it creates a performance issue.
You can also refer to Are unused includes harmful in C/C++?, where I really liked this point made by David Young:
Any singletons declared as external in a header and defined in a source file will be included in your program. This obviously increases memory usage and possibly contributes to a performance overhead by causing one to access their page file more often (not much of a problem now, as singletons are usually small-to-medium in size and because most people I know have 6+ GB of RAM).
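If you want to reproduce that line count on your own toolchain, a quick experiment of mine (not from the original answer) is to let the preprocessor show you what the compiler actually parses:

    // count_me.cpp : compile-time experiment only; the program does nothing.
    #include <iostream>   // pulled in but never used

    int main() { return 0; }

    // Count the preprocessed lines the compiler must parse, for example:
    //   g++ -E count_me.cpp | wc -l
    // Run it with and without the #include; the difference is the extra
    // parsing work the unused header causes.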

Does a file structure that is mostly Header files slow down anything besides compilation?

Does a file structure that is mostly header files (90% of your code being header-only) slow down anything besides compilation?
Some people argue that, when optimizing for speed, it could cause most of the code to be inlined, and the processor would then gather misleading statistics about instruction calls, or something like that. Has it been shown anywhere that this, or something similar, happens and actually slows an application down?
This is possibly a duplicate of Benefits of inline functions in C++?
The practical performance implication depends on many factors. I would not concern myself with it until you actually have a performance problem, in which case I'm sure bigger gains can be obtained by optimizing other things.
Don't keep all your code in headers - if you continue with this trend you will hate yourself later because you will be waiting for your compiler most of the time. LTO is a better approach if you are looking for similar optimizations, and has less of an impact on compile time.
Linking is a concern.
If your libraries are header dominant, then larger intermediate object files may need to be written and then read. The linker then has more symbols to analyze and deduplicate, and some symbols will remain as legal duplicates. This increases your I/O, bloats your binary size, and throws a lot more work at the linker.
One benefit of header dominance is that there tends to be fewer sources to compile and consequently fewer images/objects to link. So header only also has the potential to be faster in this regard (if used correctly).
If your library is going to be visible to many translations, then size and impact on linking should also be an important consideration.
Not performance but potential bug concern:
From usage guidelines: in C++, class member functions defined in the class definition body are implicitly inline. If such a member function has local static variables, duplicated inlined instances could each end up with their own copy of the static, which would lead to bugs.
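A sketch of the pattern this guideline warns about (my own illustration, with hypothetical names). A conforming compiler must merge the copies of a local static in an inline function, but duplicates can still appear, for example when the same header is compiled into multiple shared libraries with hidden symbol visibility:

    // counter.h : hypothetical example of the pattern being warned about.
    class Counter {
    public:
        // Defined inside the class body, so implicitly inline and emitted
        // in every translation unit that includes this header.
        int next() {
            static int n = 0;   // the standard requires one shared n...
            return ++n;
        }
    };

    // ...but if this header is compiled into two shared libraries built
    // with hidden symbol visibility, each library can end up with its own
    // n, and the "global" counter silently forks, which is the kind of
    // bug described above.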

Effective C++ "35. Minimize compilation dependencies between files". Is it still valid today?

In this chapter Scott Meyers mentions a few techniques for avoiding header file dependencies. The main goal is to avoid recompiling a .cpp file when changes are limited to other included header files.
My questions are:
In my past projects I never paid attention to this rule. The compilation time is not short, but it is not intolerable. That could have more to do with the scale (or the lack thereof) of my projects. How practical is this tip today, given the advances in compiler technology (e.g. clang)?
Where can I find more examples of the use of these techniques? (e.g. Gnome or other OSS projects)
P.S. I am using the 2nd edition.
I don't think compiler technology has advanced particularly. clang is not some piece of magic: if you have dependencies and you make changes, then dependent code will have to be recompiled. This can take a very, very long time (read: hours, or even days, for a big project), so people try to minimise such dependencies where possible.
Having said that, it is possible to overdo things - making all classes into PIMPLs, forward declaring everything, etc. Doing this just leads to obfuscated code, and should be avoided whenever possible.
Reducing compilation times is a red herring, and a form of premature optimization. Reorganizing your code to reduce compilation times (when that matters) can be done, but at a somewhat great cost.
As for Gnome: Gnome has a "private pointer" in every GObject. This implements the pimpl idiom. It reduces dependencies between source files and allows for some form of encapsulation. There are fewer compile-time problems for C projects.
Modern C++ designs make heavy use of templates, which inevitably makes your compilation times skyrocket. Using the pimpl idiom and forward declaring classes (instead of including a header, where possible) reduces the logical dependencies between translation units (this is a good thing), but in many situations does not really help with compilation times.
Using boost greatly increases compilation times (beware if you indirectly include boost headers in many source files), and many C++ projects use it.
I should also mention that the thin template idiom is often used to reduce code bloat with templates.

What are the advantages and disadvantages of implementing classes in header files?

I love the concept of DRY (don't repeat yourself [oops]), yet C++'s concept of header files goes against this rule of programming. Is there any drawback to defining a class member entirely in the header? If it's right to do for templates, why not for normal classes? I have some ideas for drawbacks and benefits, but what are yours?
Possible advantages of putting everything in header files:
Less redundancy (which leads to easier changes, easier refactoring, etc.)
May give compiler/linker better opportunities for optimization
Often easier to incorporate into an existing project
Possible disadvantages of putting everything in header files:
Longer compile/link cycles
Loss of separation of interface and implementation
Could lead to hard-to-resolve circular dependencies
Lots of inlining could increase executable size
Prevents binary compatibility of shared libraries/DLLs
Upsets co-workers who prefer the traditional ways of using C++
Well - one problem is that typically implementations change much more often than class definitions - so for a large project you end up having to recompile the world for every small change.
The main reason not to implement a class in the header file is: do the consumers of your class need to know its implementation details? The answer is almost always no. They just want to know what interface they can use to interact with the class. Having the class implementation visible in the header makes it much more difficult to understand what this interface is.
Beyond considerations of compactness and separating interface from implementation, there are also commercial motivations. If you develop a library to sell, you (probably) do not want to give away the implementation details of the library you are selling.
You're not repeating yourself. You only write the code once in one header. It is repeated by the preprocessor, but that's not your problem, and it's not a violation of DRY.
If it's right to do for templates, why not for normal classes
It's not really that it's the right thing to do for templates. It's just the only one that really works in general.
Anyway, if you implement a class in a header, you get the following advantages and disadvantages:
The full implementation is visible anywhere it is used, which makes it easy for the compiler to inline as necessary.
The same code will be parsed and compiled multiple times, leading to higher compile-times.
On the other hand, if everything is in headers, that may lead to fewer translation units, and so the compiler has to run fewer times. Ultimately, you might end up with a single translation unit, which just includes everything once, which can result in very fast compilations.
And... that's it, really.
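To illustrate the single-translation-unit extreme just mentioned, here is a sketch of a so-called unity build (the file names are made up):

    // unity.cpp : one translation unit that includes the implementation
    // files directly, so every header is parsed once and the linker sees
    // a single object file.
    #include "widget.cpp"
    #include "report.cpp"
    #include "main.cpp"

    // Build with, e.g.:  g++ -O2 unity.cpp -o app
    // Trade-offs: touching any file recompiles everything, and
    // internal-linkage names from different .cpp files can now collide.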
Most of my code tends to be in headers, but that's because most of my code is templates.
The main disadvantage (apart from the lengthy builds) is there is no clear separation of the interface and implementation.
Ideally, you should not need to see the implementation behind an intuitive, well-documented interface.
Not mentioned yet: virtual functions are instantiated for each include, so you can bloat your executable (I'm not sure whether this is true for all compilers).
There is an alternative:
Do a lot of the work in classes declared in your source file. One example is the pimpl idiom, but some people are also wary of declaring classes outside the header file. However, this makes sense for private classes.
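A minimal sketch of that alternative (my own example, with made-up names): a helper class that lives entirely in the source file and never appears in any header:

    // parser.cpp : the helper class is invisible outside this file.
    #include <string>

    namespace {                  // anonymous namespace: internal linkage
    class TokenBuffer {          // private helper, free to change at will
    public:
        void push(char c) { data += c; }
        const std::string& text() const { return data; }
    private:
        std::string data;
    };
    } // namespace

    // Only this file knows TokenBuffer exists, so editing it can never
    // trigger recompilation of other translation units.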