How to measure pimpl candidates? - c++

The pimpl (also: compiler firewall) idiom is used to shorten compile times, at the cost of readability and a little runtime performance. When a project takes too long to compile, how do you measure which classes are the best pimpl candidates?
I have experience using pimpl, having shortened a project's compile time from two hours to ten minutes, but I did this just by following my instincts: I reasoned that class header files that pull in (1) a lot of source code and (2) complex/template classes are the best candidates for the pimpl idiom.
Is there a tool that points out which classes are good pimpl candidates objectively?

It is true that pimpl is useful for incremental compilation.
But the main reason to use pimpl is to preserve ABI compatibility. This was the rule in my past company for almost all public classes in the API.
Another advantage is that you can distribute your library as a package whose headers do not expose implementation details.
For these reasons I would say: use pimpl wherever possible.
A very good article on the Qt pimpl implementation details and its benefits: https://wiki.qt.io/D-Pointer
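For readers who haven't seen the idiom, here is a minimal sketch of what a pimpl'd class might look like (Widget and its members are hypothetical names, not from the question):

// widget.h -- public header: no implementation details, no heavy includes
#pragma once
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                      // defined in widget.cpp, where Impl is complete
    void doWork();
private:
    class Impl;                     // forward declaration only
    std::unique_ptr<Impl> d;        // the "d-pointer", in Qt terminology
};

// widget.cpp -- private implementation: heavy includes stay here
#include "widget.h"
#include <vector>                   // implementation-only dependency

class Widget::Impl {
public:
    std::vector<int> data;
    void doWorkImpl() { data.push_back(42); }
};

Widget::Widget() : d(std::make_unique<Impl>()) {}
Widget::~Widget() = default;
void Widget::doWork() { d->doWorkImpl(); }

With this layout, changing Widget::Impl only forces widget.cpp to recompile; code that includes widget.h is untouched, and the public header no longer drags in <vector>.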
The compile time problem should be addressed by:
using precompiled headers
dividing your big project into smaller ones by how frequently the code is touched. Parts that do not change often can be compiled into libraries and published in a local repository that other projects reference by version.
...

I'm not aware of an existing tool to do this, but I would suggest:
First, measure the stand-alone cost of including every header by itself. Make a list of all headers, and for each header, preprocess it. The simplest measure of the cost of that header is the number of lines that result from preprocessing. A possibly more accurate measure would be to count the occurrences of 'template', as processing template definitions seems to dominate compilation time in my experience. You could also count occurrences of 'inline', as I've seen large numbers of inline functions defined in headers be an issue too (but be aware that inline definitions of class methods don't necessarily use the keyword).
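As a rough sketch of this first step (assuming you have already produced the preprocessed output, e.g. with the compiler's -E option; the file name and the weighting factor below are made up for illustration):

// header_cost.cpp -- rough stand-alone cost of one preprocessed header
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: header_cost <preprocessed-file>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::size_t lines = 0, templates = 0;
    for (std::string line; std::getline(in, line); ) {
        ++lines;
        for (std::size_t pos = 0; (pos = line.find("template", pos)) != std::string::npos; pos += 8)
            ++templates;            // crude token count, but good enough for ranking
    }
    std::cout << "cost ~= " << (lines + 50 * templates) << "\n";   // weight of 50 is an arbitrary guess
}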
Next, measure the number of translation units (TUs) that include that header. For each main file of a TU (e.g., .cpp file), preprocess that file and gather the set of distinct headers that appear in the output (in the # lines). Afterward, invert that to get a map from header to number of TUs that use it.
Finally, for each header, multiply its stand-alone cost by the number of TUs that include it. This is a measure of the cumulative effect of this header on total compilation time. Sort that list and go through it in descending order, moving private implementation details into the associated implementation file and trimming the public header accordingly.
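A minimal sketch of that final ranking step, assuming the two maps described above have already been filled (all names here are made up for illustration):

// rank_headers.cpp -- combine stand-alone cost with the number of including TUs
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::map<std::string, std::size_t> standaloneCost;   // header -> preprocessed-line cost
    std::map<std::string, std::size_t> tuCount;          // header -> number of TUs that include it
    // ... fill both maps as described above, e.g. by parsing the '# <line> "<file>"' markers ...

    std::vector<std::pair<std::size_t, std::string>> ranking;
    for (const auto& [header, cost] : standaloneCost)
        ranking.emplace_back(cost * tuCount[header], header);

    std::sort(ranking.rbegin(), ranking.rend());          // descending cumulative cost
    for (const auto& [score, header] : ranking)
        std::cout << score << "  " << header << "\n";
}

The headers at the top of that list are the ones where a pimpl (or plain include trimming) buys the most.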
Now, the main issue with this or any such approach to measuring the benefit of private implementations is you probably won't see much change at first because, in the absence of engineering discipline to do otherwise, usually there will be many headers that include many others, with lots of overlap. Consequently, optimizing one heavily-used header will simply mean that some other heavily-used header that includes almost as much will keep compilation times high. But once you break through the critical mass of commonly used headers that have many dependencies, optimizing most or all of them, compilation times should start to drop dramatically.
One way to focus the effort, so it's not so "pie in the sky", is to begin by selecting the single TU that takes the most time to compile, and work on optimizing only the headers that it depends on. Once you've significantly reduced the time for that TU, look again at the big picture. And if you can't significantly improve that one TU's compilation time through the private implementation technique, then that suggests you need to consider other approaches for that code base.

Related

Deciding where to place a function implementation

Let me first state that I know that inline does not mean that the compiler will always inline a function...
In C++ there really are two places for a non-template non-constexpr function implementation to go:
A header, definition should be inline
A source file
There are benefits/negatives to placing the implementation in one or the other:
inline function definition
compiler can inline the function
slower compile times, both from having to parse the definitions and from pulling in the implementation's dependencies
multiple copies of the function, one in every translation unit that uses it
source file definition
compiler can never inline the function (maybe that's not true with LTO?)
can avoid recompilation if the file hasn't changed
one copy per site
I am in the midst of writing a reusable math library where inlining can offer significant speedups. I only have test code and snippets to work with right now, so profiling isn't an option for helping me decide. Are there any rules - or just rules of thumb - on deciding where to define the function? Are there certain types of functions, like those with exceptions, which are known to always generate large amounts of code that should be relegated to a source file?
If you have no data, keep it simple.
Libraries that suck to develop don't get finished, and those that suck to use don't get used. So split h/cpp by default; that makes build times slower and development faster.
Then get data. Write tests and see if you get significant speedups from inlining. Then go and learn how to profile, realize your speedups were spurious, and write better tests.
How to profile and determine what is spurious and what is microbenchmark noise is somewhere between a chapter of a book and a whole book in length. Read SO questions about performance in C++ and you'll at least learn that the 10 most common ways to microbenchmark are not accurate.
For general rules, smallish bits of code in tight loops benefit from inlining, as do cases where external vectorization is plausible, and where false aliasing could block compiler optimizations.
Often you can hoist the benefits of inlining into your library by offering vector operations.
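For instance (a hypothetical sketch for a math library), exposing an operation over a whole array keeps the hot loop inside the library, so callers get most of the benefit of inlining without the function body living in the header:

// mathlib.h -- declaration only; the loop stays in the .cpp
#include <cstddef>
void scale_add(const double* a, const double* b, double* out,
               std::size_t n, double factor);

// mathlib.cpp -- one call amortizes the call overhead over n elements,
// and the loop body is visible to the optimizer/vectorizer inside the library
#include "mathlib.h"
void scale_add(const double* a, const double* b, double* out,
               std::size_t n, double factor) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * factor + b[i];
}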
Generally speaking, if you are statically linking (as opposed to DLL/DSO methods), then the compiler/linker will basically ignore inline and do what's sensible.
The old rule of thumb (which everyone seems to ignore) is that inline should only be used for small functions. The one problem with inlining is that all too often I see people doing some timed test, e.g.
// naive micro-benchmark (assumes <chrono>; doThing() and BIG_NUM are placeholders)
auto startTime = std::chrono::steady_clock::now();
for(int i = 0; i < BIG_NUM; ++i)
{
    doThing();
}
auto endTime = std::chrono::steady_clock::now();
The immediate conclusion from that test is that inline is good for performance everywhere. But that isn't the case.
inlining also increases the size of your compiled exe. This has a nasty side effect in that it increases the burden placed on the instruction and uop caches, which can cause a performance loss. So in the case of a large scale app, more often than not you'll find that removing inline from commonly used functions can actually be a performance win.
One of the nastiest problems with inline is that if it's applied to the wrong method, it's very hard to get a profiler to point out a hot spot - the code is just a little warmer than it needs to be at multiple points in the codebase.
My rule of thumb - if the code for a method can fit on one line, inline it. If the code doesn't fit on one line, put it in the cpp file until a profiler indicates moving it to the header would be beneficial.
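Applied to a hypothetical class, that rule of thumb might look like this:

// account.h
#include <vector>

class Account {
public:
    double balance() const { return balance_; }    // one-liner: fine to define inline here
    void applyInterest(double annualRate);          // non-trivial: declared only
private:
    double balance_ = 0.0;
    std::vector<double> history_;
};

// account.cpp
#include "account.h"

void Account::applyInterest(double annualRate) {
    // the larger body lives in the .cpp until a profiler says otherwise
    balance_ *= 1.0 + annualRate;
    history_.push_back(balance_);
}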
The rule of thumb I work by is simple: No function definitions in headers, and all function definitions in a source file, unless I have a specific reason to do otherwise.
Generally speaking, C++ code (like code in many languages) is easier to maintain if there is a clear separation of interface from implementation. Maintenance effort is (quite often) a cost driver in non-trivial programs, because it translates into developer time and salary costs.
In C++, interface is represented by declarations of functions (without definitions), type declarations, struct and class definitions, etc. - i.e. the things that are typically placed in a header, if the intent is to use them in more than one source file. Changing the interface (e.g. changing a function's argument types or return type, adding a member to a class, etc.) means that everything which depends on that interface must be recompiled.
In the long run, it often works out that header files need to change less often than source files - as long as interface is kept separate from implementation. Whenever a header changes, all source files which use that header (i.e. that #include it) must be recompiled. If a header doesn't change, but a function definition changes, then only the source file which contains the changed function definition needs to be recompiled.
In large projects (say, with hundreds of source and header files) this sort of thing can make the difference between incremental builds taking a few seconds or a few minutes to recompile a few changed source files versus significantly longer to recompile a large number of source files because a header they all depend on has changed.
Then the focus can be on getting the code working correctly - in the sense of producing the same observable output given a set of inputs, meeting its functional requirements, and passing suitable test cases.
Once the code is working correctly enough, attention can turn to other program characteristics, such as performance. If profiling shows that a function is called many times and represents a performance hot-spot, then you can look at options for improving performance. One option that MIGHT be relevant for improving performance of a program that is otherwise correct is to selectively inline functions. Every time this is done, though, it amounts to deciding to accept a greater maintenance burden in order to get performance, so it is necessary to have evidence of the need (e.g. from profiling).
Like most rules of thumb, there are exceptions. For example, templated functions or classes in C++ generally do need to be defined in headers since, more often than not, the compiler needs to see their definition in order to instantiate the template. However, that is not a justification for inlining everything (and it is not a justification for turning every class or function into a template).
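A tiny illustration of that exception (the name is hypothetical): the template's definition has to be visible to every translation unit that instantiates it, so it lives in the header.

// clamp.h -- a template definition must be visible wherever it is instantiated
#ifndef CLAMP_H
#define CLAMP_H

template <typename T>
T clamp_value(T value, T lo, T hi) {
    return value < lo ? lo : (hi < value ? hi : value);
}

#endif // CLAMP_H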
Without profiling or other evidence, I would rarely bother to inline functions. Inlining is a hint to the compiler, which the compiler may well ignore, so the effort of inlining may not even be worth it. Doing such a thing without evidence may achieve nothing - in which case it is simply premature optimisation.

What are the drawbacks of single source project structures?

I'm new in my current company and working on a project written by my direct team lead. The company usually doesn't work with C++ but there is productive code written by my coworker in C/C++. It's just us who know how to code in C++ (me and my lead, so no 3rd opinion that can be involved).
After I got enough insight of the project I realized the whole structure is... special.
It actually consists of a single compilation unit; the makefile lists main.hpp as its only source.
This header file then includes all the source files the project consists of, so it looks like a really big list of this:
#include "foo.cpp"
#include "bar.cpp"
While trying to understand the logic behind it, I realized that this does indeed work for this project, as it's just an interface where each unit can operate without accessing any other unit. At some point I asked him what his reasons were for doing it this way.
I got a defensive reaction with the argument
Well, it is working, isn't it? You are free to do it your way if you think that's better for you.
And that's what I'm doing now, simply because I'm really having trouble with thinking into this structure. So for now I'm applying the "usual" structure to the implementation I'm writing right now, while doing only mandatory changes to the whole project, to demonstrate how I would have designed it.
I think there are a lot of drawbacks, from the fact that mixing up the compiler's and the linker's jobs through your own project structure can't serve us well, to optimizations that will probably end in redundant or obscure results, not to mention that a clean build of the project takes ~30 minutes, which I think might be caused by the structure as well. But I'm lacking the knowledge to name real, and not just hypothetical, issues with this.
And as his argument "It works my way, doesn't it?" holds true, I would like to be able to explain to him why it's a bad idea anyway, rather than coming across as the new nitpicky guy.
So what problems could actually be caused by such a project structure?
Or am I overreacting and such a structure is totally fine?
not to mention that a (clean) build of the project takes ~30 minutes
The main drawback is that a change to any part of the code will require the entire program to be recompiled from scratch. If the compilation takes a minute, this is probably not significant, but if it takes 30 minutes, it's going to be painful; it destroys the make-a-change -> compile -> test workflow.
not to mention that a clean build of the project takes ~30 minutes
Having separate translation units is actually typically quite a bit slower to compile from scratch, but you only need to recompile each unit when it changes, which is the main advantage. Of course, it is easy to destroy this advantage by mistake, by including a massive, often-changing header in all translation units. Separate translation units take a bit of care to do right.
These days, with multi-core CPUs, the slower build from scratch is mitigated by the parallelism that multiple translation units allow (perhaps the disadvantage may even be overcome, if the size of the individual translation units happens to hit a sweet spot and there are enough cores; you'll need some thorough research to find out).
Another potential drawback is that the entire compilation process must fit in memory. This is only a problem when that exceeds the free memory on your developers' workstations.
In conclusion: the problem is that the one-massive-source-file approach does not scale well to big projects.
Now, a word about advantages, for fairness
to optimizations that will probably end in redundant or obscure results
Actually, the single translation unit is easier to optimize than separate ones. This is because some optimizations, inline expansion in particular, are not possible across translation units because they depend on the definitions that are not in the currently compiled translation unit.
This optimization advantage has been mitigated since link-time optimization became available in stable releases of popular compilers - as long as you're able and willing to use a modern compiler and to enable link-time optimization (which might not be enabled by default).
PS. It's very unconventional to give the single source file the extension .hpp.
First, I would like to mention the advantages of a project with a Single Compilation Unit (SCU):
Drastic compilation time reduction. This is actually one of the primary reasons to switch to SCU. When you have a regular project with n translation units, compilation time grows roughly linearly with each new translation unit added, while with SCU it grows roughly logarithmically, and adding new units to a large project hardly affects compilation time.
Compilation memory reduction, both disk and RAM. The "big" translation unit will obviously occupy considerably more memory than each individual "small" translation unit containing only part of the project; however, their cumulative size will greatly exceed the size of the "big" one.
Some optimization benefits. Obviously you get "everything is inline" automatically.
No more fear of "compilation from scratch". This is very important because it is what CI server performs.
Now to disadvantages:
Developers must maintain strict header organization discipline: header guards, consistent ordering of #include directives, mandatory inclusion of the headers directly required by the current file, proper forward declarations, consistent naming conventions, etc. (see the sketch after this list). The problem is that there are no tools to help developers with this, and even minor violations of header organization may lead to messy failed-build logs.
Potential increase in the total number of files in the project.
No more "it's compiling" excuses for wooden sword fencing.
P.S.
The SCU organization in your case is kind of "soft-boiled". By that I mean that the project still has translation units that are not proper headers. Typically such a scenario happens when an old project is being converted to SCU.
If building your project with SCU takes ~30 minutes, I have a feeling that either it is not the fault of the project organization (it could be an antivirus, the lack of an SSD, recursive template bloat) or it would take several hours without SCU.
Some numbers from my experience converting an existing project to SCU: compilation time dropped from ~14 minutes to ~20 seconds, with a 3x reduction in executable size.
Real-world use cases: CppCon 2014, Nicolas Fleury, "C++ in Huge AAA Games"; Chromium Jumbo / unity builds ("it can save hours for a full build").
I might be exaggerating a bit, but it seems to me that the entire concept of "multiple translation units" (and static libraries as well) should be left in the past.

Putting method implementations of the same class in different object files

I'm working on a big project in C++.
I have many classes which have methods that do completely different things (like one dumps, another modifies the object, another checks it to see if it's valid and so on...).
Is it good style to put the implementations of one method for all the classes in a single source file (or in a group of object files that may be archived), and all the implementations of another method of those classes in another archive?
Could this be good for linking, for example when someone doesn't need the dumping methods, or is it better to keep the method implementations of the same class in the same source file, so as not to create confusion?
There are trade-offs.
When you change the implementation of any function, the entire translation unit must be re-compiled into a new object file.
If you write only a single function per translation unit, you minimize the length of compilation time caused by unnecessary rebuilds.
On the other hand, writing a single function per translation unit maximizes the length of compilation from scratch, because it's slower to compile many small TUs than a few big TUs.
The optimal solution is personal preference, but usually somewhere in between "single function per TU" and "one massive TU for entire program" (rather than exactly one of those). For member functions, one TU per class is a popular heuristic, but not necessarily always the best choice.
Another consideration is optimisation. Calls to non-inline functions can be expanded inline, but only within the same translation unit. Therefore, it is easier for the compiler to optimize a single massive TU.
Of course, you can choose to define the functions inline, in the header file, but that causes a rebuilding problem, because if any of the inline functions change, then everything that includes the header must be rebuilt. This is a worse problem than simply having bigger TUs, but not as bad as having one massive TU.
So, defining related non-inline functions within the same TU allows the compiler to decide on optimization within that TU, while preventing a re-build cascade. This is advantageous if those related functions would benefit from inline expansion and call each other a lot.
This advantage is mitigated by whole program optimisation.
Third consideration is organisation. It is likely that a programmer who looks at one member function of a class will also be interested in other member functions of that class. Having them in the same source file lets them spend less time searching for the correct file.
The organizational advantage of grouping all class functions into a common source file is somewhat mitigated by modern IDEs that allow for quickly jumping from source file to header and from there to the other function.
Fourth consideration is the performance of the editor. Parsing a file of tens of thousands of lines or more can be slow and may use a lot of memory depending on parsing technique. One massive TU doesn't necessarily cause this, because you can use separate files that are only included together.
On the other hand, a massive number of files can be problematic for some file browsers (probably not so much these days) and also for version control systems.
Finally, my opinion: I think that one source file per class is a decent heuristic. But it should not be followed religiously when it's not appropriate.
Some organizations have rules that mandate one definition per unit. In these organizations, a header file can define only one class, and a translation unit can define only one function. Other organizations mandate at most one source file for each header file (some header files have no implementation).
The optimal thing to do is somewhere in between. I generally don't care about compiler or linker performance. I do care a lot about code readability and maintainability. A source file that implements some class that is thousands of lines long is hard to navigate. It's better to break that file into multiple files. Breaking it into hundreds of files, one file per function, makes for a directory structure that is difficult to navigate. Breaking it into chunks of closely related functions keeps the directory structure and the contents of each file navigable.
However, and this is a big however: Why is your class so large that you have to worry about this? A class whose implementation spans thousands of lines or dozens of files is a code smell.
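For illustration, a split into chunks of closely related functions might look like this (hypothetical names; the class declaration stays in a single header):

// report.h -- the single class declaration
#ifndef REPORT_H
#define REPORT_H
#include <string>
#include <vector>

class Report {
public:
    void addEntry(const std::string& entry);   // defined in report.cpp
    bool validate() const;                     // defined in report_validate.cpp
    std::string dump() const;                  // defined in report_dump.cpp
private:
    std::vector<std::string> entries_;
};
#endif // REPORT_H

// report.cpp -- general housekeeping members
#include "report.h"
void Report::addEntry(const std::string& entry) { entries_.push_back(entry); }

// report_validate.cpp -- validation-related members grouped together
#include "report.h"
bool Report::validate() const { return !entries_.empty(); }

// report_dump.cpp -- output-related members grouped together
#include "report.h"
std::string Report::dump() const {
    std::string out;
    for (const auto& e : entries_) { out += e; out += '\n'; }
    return out;
}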

Multiple files and Object Oriented Programming

In my application I use multiple files, each containing one class. Is it a good idea to gather all the C++ files (the implementations of all classes) into one file and all the headers into another file, or is this bad for some reason other than code organization?
Keeping declarations and definitions organized into separate but related files can help to decrease compilation times.
And don't disregard the value of keeping things organized for humans too! Software can consist of many thousand different objects, functions, and other parts. Keep it as simple as possible (but no simpler)!
If you keep the declaration and definitions of a class in corresponding files, they only need to be recompiled when you make changes to that class. Also, a change in one class only requires relinking it against the classes which depend on it. Therefore it decreases compile time.
It also makes it much easier to debug, as compilation errors can be traced back to one file.
There is no advantage to concatenating all the files into one, as far as I know.
In C++ it really doesn't matter so much. In some languages, such as Java, the compiler requires that every class be in a separate file, but as long as you make sure that the different files reference each other there is no reason either way.
I perfectly agree with the other answers, and I would like to add my piece:
Breaking the code into several files also makes it easier to work with in an editor. You can then use several tabs. (Imagine if your browser displayed all your pages one after another in a single window!)
It is even acceptable to break a class implementation into several files if the implementation is big.
On the other hand, there are sometimes reasons to put several classes in one file, for example when those classes are small and/or very highly related - for example a FooObject and its FooAllocator, or a FooObject and a small FooSubObject used only by it.
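A hypothetical sketch of that case - two tightly coupled types that are only ever used together can reasonably share one header:

// foo_object.h -- two small, highly related classes sharing one header
#ifndef FOO_OBJECT_H
#define FOO_OBJECT_H
#include <cstddef>
#include <vector>

class FooObject;                     // forward declaration for the allocator

class FooAllocator {
public:
    FooObject* allocate();           // defined in foo_object.cpp
    void release(FooObject* obj);
private:
    std::vector<FooObject*> pool_;
};

class FooObject {
public:
    explicit FooObject(std::size_t id) : id_(id) {}
    std::size_t id() const { return id_; }
private:
    std::size_t id_;
};

#endif // FOO_OBJECT_H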

compile code as a single automatically merged file to allow better compiler code optimization

Suppose you have a program in C, C++ or any other language that employs the "compile objects, then link them" scheme.
When your program is not small, it is likely to comprise several files, in order to ease code management (and shorten compilation time). Furthermore, after a certain degree of abstraction you likely have a deep call hierarchy. Especially at the lowest levels, where tasks are the most repetitive and most frequent, you want to impose a general framework.
However, if you fragment your code into different object files and use a very abstract architecture for your code, it might hurt performance (which is bad if you or your supervisor emphasize performance).
One way to circumvent this might be extensive inlining - this is the approach of template meta-programming: in each translation unit you include all the code of your general, flexible structures, and count on the compiler to counteract the performance issues. I want to do something similar without templates - say, because templates are too hard to handle, or because you use plain C.
You could write all your code in one single file. That would be horrible. What about writing a script which merges all your code into one source file and compiles it? This requires that your source files are not written too wildly. Then the compiler could probably apply much more optimization (inlining, dead code elimination, compile-time arithmetic, etc.).
Do you have any experience with, or objections against, this "trick"?
Pointless with a modern compiler. MSVC, GCC and clang all support link-time code generation (GCC and clang call it "link-time optimisation"), which allows for exactly this. Plus, combining multiple translation units into one large one makes it impossible to parallelise the compilation process, and (at least in the case of C++) makes RAM usage go through the roof.
in each translation unit you include all the code of your general, flexible structures, and count on the compiler to counteract performance issues.
This is not a feature, and it's not related to performance in any way. It's an annoying limitation of compilers and the include system.
This is a semi-valid technique; IIRC, KDE used to use it to speed up compilation back in the day when most people had one CPU core. There are caveats though: if you decide to do something like this, you need to write your code with it in mind.
Some samples of things to watch out for:
Anonymous namespaces - namespace { int x; } in two source files (see the sketch after this list).
Using-declarations that affect following code - using namespace foo; in a .cpp file can be OK on its own, but the appended sources may not agree.
The C version of anonymous namespaces, static globals - static int i; at file scope in several .cpp files will cause problems.
#defines in .cpp files - they will affect source files that don't expect them.
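A minimal sketch of the first pitfall (hypothetical files): each file is fine on its own, but once the merge script concatenates them into a single translation unit, the two definitions collide.

// a.cpp -- compiles fine as its own translation unit
namespace { int counter = 0; }       // internal linkage, private to a.cpp
int bumpA() { return ++counter; }

// b.cpp -- also fine on its own
namespace { int counter = 100; }     // a *different* counter, private to b.cpp
int bumpB() { return ++counter; }

// merged.cpp -- what the merge script effectively produces:
//   #include "a.cpp"
//   #include "b.cpp"
// Inside one translation unit the two anonymous namespaces are the same
// namespace, so 'counter' is defined twice and the build fails with a
// redefinition error.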
Modern compilers/linkers are fully able to optimize across translation units (link-time code generation) - I don't think you'll see any noticeable difference using this approach.
It would be better to profile your code for bottlenecks, and apply inlining and other speed hacks only where appropriate. Optimization should be performed with a scalpel, not with a shotgun.
Though it is not recommended, using #include statements for C files is essentially the same as appending the entire contents of the included file to the current one.
This way, if you include all of your files in one "master file", that file will essentially be compiled as if all the source code were appended to it.
SQlite does that with its Amalgamation source file, have a look at:
http://www.sqlite.org/amalgamation.html
Do you mind if I share some experience about what makes software slow, especially when the call tree gets bushy? The cost to enter and exit functions is almost totally insignificant except for functions that
do very little computation and (especially) do not call any further functions,
and are actually in use for a significant fraction of the time (i.e. random-time samples of the program counter are actually in the function for 10% or more of the time).
So in-lining helps performance only for a certain kind of function.
However, your supervisor could be right that software with layers of abstraction has performance problems.
It's not because of the cycles spent entering and leaving functions.
It's because of the temptation to write function calls without real awareness of how long they take.
A function is a bit like a credit card. It begs to be used. So it's no mystery that with a credit card you spend more than you would without it.
However, it's worse with functions, because functions call functions call functions, over many layers, and the overspending compounds exponentially.
If you get experience with performance tuning like this, you come to recognize the design approaches that result in performance problems. The one I see over and over is too many layers of abstraction, excess notification, overdesigned data structure, stuff like that.