If you read online, there are plenty of claims that using forward declarations in C++ saves compile time. The usual theory is that since #include means mere text replacement, a forward declaration spares the compiler from parsing (and possibly compiling) the header, which saves time. I found this claim hard to believe, because I usually see code like this:
// B.h
class A;
class B {
public:
    void doSomething(A& a);
};
In this case, yes, we don't need to include A.h in B.h since we forward declared A. The problem is that in B.cpp we eventually need the complete type A in order to use its methods and data members. So I found that in nearly all cases, we need to include A.h in B.cpp.
So how do forward declarations actually save compile time? I have seen people post benchmarks showing that if they use forward declarations instead of #includes, compile time actually goes down, so there must be something I do not understand here...
I know that saving compile time is not the sole purpose of forward declarations; I understand they have other uses. I just want to understand why some people claim they can save compile time.
Compile times
the problem is that in B.cpp we eventually need the complete type A in order to use its methods and data members.
Yes, that is a typical pattern. Forward declare a class (e.g. A) in a header (e.g. B.h), then in the source code corresponding to that header (B.cpp), include the header for the forward-declared class (A.h).
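In code, that pattern looks something like this (a minimal sketch; doStuff is a hypothetical method of A):
// B.cpp
#include "B.h"
#include "A.h" // the complete definition of A is needed here

void B::doSomething(A& a) {
    a.doStuff(); // calling a member of A requires the complete type
}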
So I found that in nearly all cases, we need to include A.h in B.cpp.
Correct, forward declarations do not save time when compiling the corresponding source code. The savings come when compiling other source code that uses B. For example:
other.cpp
#include "B.h"
// Do stuff with `B` objects.
// Make no use of `A` objects.
Assume this file does not need definitions from A.h. This is where the savings come in. When compiling other.cpp, if B.h uses a forward declaration of A, there is no need to process A.h. Nor is there a need to process the headers that A.h itself includes, and so on. Now multiply this effect by the number of files that include B.h, either directly or indirectly.
Note that there is a compounding effect here. The number of "headers that A.h itself includes" and of "files that include B.h" would be the numbers before replacing any #include statements with forward declarations. (Once you start making these replacements, the numbers come down.)
How much of an effect? Not as much as there used to be. Still, as long as we're talking theoretically, even the smallest savings is still a savings.
Rebuild times
Instead of raw compile times (build everything), I think a better focus would be on rebuild times. That is, the time it takes to compile just the files affected by a change you made.
Suppose there are ten files that rely on B.h but not on A.h. If B.h were to include A.h, then those ten files would be affected by changes to A.h. If B.h were instead to forward declare A, then those files would not be affected by changes to A.h, reducing the time to rebuild after those changes.
Now suppose there is another class, call it B2, that also has the option to forward declare A instead of including its header. Maybe there are another ten files that depend on B2 but not on B and not on A. Now there are twenty files that do not need to be re-compiled after changes to A.
But why stop there? Let's add B3 through B10 to the mix. Now there are a hundred files that do not need to be re-compiled after changes to A.
Add another layer. Suppose there is a C.h that has the option to forward declare B instead of including B.h. By using a forward declaration, changes to A.h no longer require re-compiling the ten files that use C.h. And, of course, we'll assume there are ten such files for each of B through B10. Now we're up to 10*10*10 files that do not need to be recompiled when A.h changes.
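For instance, if C holds a B only by pointer or reference, C.h can get away with just a declaration (a sketch; the member names are hypothetical):
// C.h
class B; // forward declaration instead of #include "B.h"

class C {
public:
    void useB(B& b); // a reference parameter does not require B's definition
private:
    B* b_;           // neither does a pointer member
};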
Takeaway
This is a simplified example to serve as a demonstration. The point is that there is a forest of dependency trees created by #include lines. (The root of such a tree would be the header file of interest, and its children are the files that #include it.) Each leaf in one of these trees represents a file that must be compiled when changes occur in the header file of interest. The number of leaves in a tree grows exponentially with the depth, so removing a branch (by replacing an #include with a forward declaration) can have a massive effect on rebuild time. Or maybe a negligible effect. This is theory, not practice.
I should note that like the question, this answer focuses on compile times, not on the other factors to consider. This is not supposed to be a comprehensive guide to the pros and cons of forward declarations, just an explanation for how they could save compilation time.
Related
I have been told by some colleagues (who are smarter than I am) that moving implementations (definitions) out of headers can reduce compile time in some cases, and that I should do it in most cases.
After a lot of refactoring, I believe it is true.
Now I plan to move the implementations of very simple functions too (.h -> .cpp), for example:
int f(int index){
    return database[index*2]; // <--- contains a trivial algorithm like this
}
Question
What factors determine how much benefit there is?
More specifically, does the amount of compile time saved depend on:
the number of characters (excluding comments) I move to the .cpp, or ...
the complexity (not in the O(n) sense) of the function's algorithm, or ...
something else?
Should I move the definitions of such simple functions to .cpp?
(concerned only with performance and compile time, not with maintainability or readability)
Edit: detailed example
Consider this code.
B.h:
class B {
public:
    static void fb() {
        // some complex code (e.g. 1000 lines)
    }
};
C.h:
#include "B.h"
class C {
    static void fc();
};
C.cpp contains the implementation of fc().
D.h:
#include "B.h"
class D {
    static void fd();
};
D.cpp contains the implementation of fd().
Before moving the definition of fb, the compiler has to compile the large body of code in B.h for both C.cpp and D.cpp.
After moving the definition of fb to B.cpp, I think C.cpp and D.cpp will be much faster to compile.
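A minimal sketch of the files after the move:
// B.h (after the move)
class B {
public:
    static void fb(); // declaration only; the 1000 lines are gone from the header
};

// B.cpp
#include "B.h"

void B::fb() {
    // some complex code (e.g. 1000 lines), now compiled exactly once
}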
What factors determine how much benefit there is?
The major factor in the reduction of compile time is how many other translation units include the header with the inlined code. The other factors you mentioned are largely irrelevant.
If you change something in a definition that lives in a header, many more files need to be recompiled than if you change a definition that lives in a single .cpp file.
Should I move the definitions of such simple functions to .cpp?
(concerned only with performance and compile time, not with maintainability or readability)
No; what I said above refers to non-trivial code. If you have such a simple function definition and it is unlikely to change in the future, you can leave it in the header file.
The answer can be simple and less simple.
Simple answer:
Put the implementations of non-trivial functions in the source files; there are many advantages to this, not just compilation time.
Leave the implementations of trivial functions in the header file and mark non-member functions inline; compilation time will not differ significantly, and there are even better optimization possibilities.
Less simple:
Putting non-trivial functions in source files is done specifically so that the header files, which act as interfaces to the code, are more readable, don't have to contain all the includes needed for the implementation, can avoid mutual include-cycle issues, and on top of that give better compilation times.
Putting trivial functions in the header file lets the compiler do better optimization at compile time (as opposed to link time), because at the call site it knows what the function does, so it can make better inlining and code-reordering decisions (see link-time optimization).
Templates should still always be in header files. For some complex template functions, the non-template part may be factored out and put in a source file, but this can be fiddly.
For encapsulation reasons, it may be better to declare helper functions and helper classes entirely in the source file.
When using pimpl constructs, the trivial delegation functions must be in the source file, because only there is the pimpl type fully known.
So in the end, organizing the code this way can give better compilation times, but that shouldn't be the main reason for doing it.
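To make the simple answer concrete, here is a minimal sketch of the split (all names are hypothetical):
// util.h
long computeChecksum(const char* data, int size); // non-trivial: declared here, defined in util.cpp

inline int half(int x) { return x / 2; }          // trivial non-member function: inline in the header

// util.cpp
#include "util.h"

long computeChecksum(const char* data, int size) {
    long sum = 0;
    for (int i = 0; i < size; ++i)
        sum += data[i]; // stand-in for a non-trivial algorithm
    return sum;
}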
If you call a function whose body is in a .h file, every translation unit that #includes that header has to compile the body again, because #include is plain textual inclusion.
It is less work for the compiler if your function's body is in a .cpp file, where it is compiled only once.
I suggest you check this out. It helped me too.
What techniques can be used to speed up C++ compilation times?
I have two versions of the same class in two different file pairs (A.cpp/A.h and B.cpp/B.h). In both, the class has the same name but a different internal implementation.
My idea is to switch from one version to the other just by changing the name of the .h file in the #include, so I shouldn't have to change anything else in the code (both versions' methods have the same signatures and the same properties).
The A.h and B.h are never included at the same time.
The problem is that no matter which header I include, the A version is always executed. I know that when I include B.h it at least gets compiled (if I put an error in the code, it shows up at compile time).
Can this be done, or does it break some rule of C++? I think this should not break the One Definition Rule, because I'm not using A.h and B.h at the same time.
The solution is not to link the old implementation file into the final executable. That way only the new implementation will be available.
What I'll often do is mangle the version into a namespace, and use that.
Something along the lines of:
namespace Xyz_A { // In A.h
// Define version A
}
namespace Xyz = Xyz_A;
In B.h, use _B instead.
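That is, B.h would mirror A.h:
namespace Xyz_B { // In B.h
    // Define version B
}
namespace Xyz = Xyz_B;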
This way, you would write Xyz::... in your program, but the external symbols will have Xyz_A or Xyz_B mangled into them. In my opinion, though, this is really more a protection against errors. I'll arrange things in my makefiles so that whatever switches between A.h and B.h also causes the executable to link against the appropriate library, and not against the other.
If the header files are identical, it would be easier just to have one header and two different implementation files. That would reduce your problem to linking with the right object file. It also reduces the chance of subtle bugs should your headers ever diverge.
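A rough sketch of that arrangement (the names are hypothetical):
// xyz.h - the single shared header
class Xyz {
public:
    int compute(); // identical interface for both versions
};

// xyz_a.cpp defines one version of Xyz::compute(),
// xyz_b.cpp defines the other; the build links exactly one
// of the two object files into the final executable.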
A better solution would, of course, be something that does not depend on the build system but uses language facilities to select code at compile time, such as a template.
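For example, a minimal sketch using std::conditional (C++11) to select the implementation at compile time; XyzA, XyzB, and kUseB are hypothetical names, and this assumes the two versions can be given distinct class names:
#include <type_traits>

class XyzA { /* version A */ };
class XyzB { /* version B */ };

// Flip this constant (or drive it from a compiler -D flag) to switch versions.
constexpr bool kUseB = false;

using Xyz = std::conditional<kUseB, XyzB, XyzA>::type;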
You will need to load the correct library to match the header file.
I would suggest looking into the proxy design pattern so you can include both class A and class B. Then you can use the proxy to choose which class's functionality to use at runtime.
http://en.wikipedia.org/wiki/Proxy_pattern
I'm trying to minimize header inclusion in a project, maximizing the usage of forward declarations, and I need to get clear on how exactly the process of C++ compilation works.
It starts off with main.cpp, where we allocate an object A, therefore we include A.h. Class A uses classes B and C, so I include B.h and C.h. Now if I wanted to allocate a B in main.cpp, the compilation would fail.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h, and in A.h I'm including B.h. I read some previous discussions on this topic, where there was something about recursion and recompilation of the source file. So how does that exactly work?
Thanks for any advice. :)
As a simple rule of thumb, you need to define symbols any time their alignment, interface, or size is required. If a header only refers to a type as a pointer, you only need to declare it.
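A minimal illustration:
class B; // declaration only

class A {
    B* ptr;      // OK: a pointer's size does not depend on B's definition
    // B member; // error: the compiler would need B's size and alignment here
    B* get();    // OK: declaring (not defining) this function needs no definition of B
};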
All compilation units that include a header have to go through the paces of understanding it independently. That is why code in a header increases compile times super-linearly.
You can see exactly what the preprocessor hands to the compiler, if you are interested. GCC uses the syntax below.
g++ -E main.cpp
MSVC has similar functionality through its /E and /P flags.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h and in A.h I'm including B.h
This is a matter of circumstance, I suppose. The major annoyance with omitting headers is that, usually, someone else changes something in a disparate part of the code base and you then have to guess why you are missing symbols when you update from source control. Essentially you create dependencies between headers that are not clear at all.
If my whims were law, you could throw an include of any header into an empty .cpp file and it would just compile. I don't see why you wouldn't want that, though I'm not prepared to defend it as the right thing to do in all situations.
Most often when creating multiple classes inside a program that use each other, I like to include only the minimum number of header files I need to reduce clutter.
For example, say class C inherits from class B, which contains class A. Now of course since class B contains class A as a member, it needs to include a.h in b.h. However, let's say C also needs to include a.h. Being lazy as I am, I just include b.h (which C needs to include anyways), and since b.h already includes a.h, I don't need to include anything more, and it compiles fine. Same for my .cpp files: I just include the header, and anything that's included in the header will be automatically included in my .cpp file, so I don't include it there.
Is this a bad habit of mine? Does it make my code less readable?
I stick with this simple rule: include everything you need to completely declare a given class, but not more and make no assumptions about includes being pulled in from other sources, i.e. ensure your files are self-sufficient.
Include what's necessary for the header file to be parsed without relying on external include ordering (in other words : make your headers self-sufficient).
In your case, if c.h declares a class C which inherits from class B, obviously you must include b.h. However, if class A never appears in c.h, I believe there is no reason to include a.h there. The fact that b.h mentions A means that b.h must do whatever is necessary to be parsed on its own, either by forward declaring A or by including a.h.
So from my point of view, you're doing what should be done.
Also note that if for some reason c.h starts mentioning A, I would add the appropriate include or forward declaration there, so that I don't depend on b.h doing it for me.
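A sketch of a self-sufficient c.h under those rules (setA is a hypothetical member):
// c.h
#include "b.h" // required: C inherits from B, so B's full definition is needed

class A;       // added only because c.h itself now mentions A

class C : public B {
public:
    void setA(A* a); // hypothetical member that mentions A by pointer
};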
It's best to include every header with definitions that you are using directly.
Relying on one of the other headers to include stuff makes your code more fragile as it becomes dependent on the implementation of classes that are external to it.
EDIT:
A short example:
Class B uses class A, e.g. a hash table implementation B that uses a hashing mechanism A
You create a class C that needs a hash table (i.e. B) and a hash algorithm (i.e. A) for some other purpose. You include B.h and leave out A.h since B.h includes it anyway.
Mary, one of your co-workers, discovers a paper about this new fabulous hashing algorithm that reduces the probability of collisions, while it needs 10% less space and is twice as fast. She (rightly) rewrites class B to use class D, which implements that algorithm. Since class A is no longer needed in B, she also removes all references to it from B.h.
Your code breaks.
EDIT 2:
There are some programmers (and I've occasionally been guilty of this too, when in a hurry) who deal with this issue by having an "include-all" header file in their project. This should be avoided, since it causes namespace pollution of unparalleled proportions. And yes, windows.h in MSVC is one of those cases in my opinion.
I've just started learning Qt, using their tutorial. I'm currently on tutorial 7, where we've made a new LCDRange class. The implementation of LCDRange (the .cpp file) uses the Qt QSlider class, so in the .cpp file is
#include <QSlider>
but in the header is a forward declaration:
class QSlider;
According to Qt,
This is another classic trick, but one that's much less often used. Because we don't need QSlider in the interface of the class, only in the implementation, we use a forward declaration of the class in the header file and include the header file for QSlider in the .cpp file.
This makes the compilation of big projects much faster, because the compiler usually spends most of its time parsing header files, not the actual source code. This trick alone can often speed up compilations by a factor of two or more.
Is this worth doing? It seems to make sense, but it's one more thing to keep track of - I feel it would be much simpler just to include everything in the header file.
Absolutely. The C/C++ build model is ...ahem... an anachronism (to say the least). For large projects it becomes a serious PITA.
As Neil correctly notes, this should not be the default approach for your class design; don't go out of your way unless you really need to.
Breaking circular include references is the one case where you have to use forward declarations.
// a.h
#include "b.h"
struct A { B * b; };

// b.h
#include "a.h" // circular include reference
struct B { A * a; };

// Solution: break the circular reference by forward declaring B or A
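For example, one way to break the cycle:
// a.h
struct B;             // forward declaration replaces #include "b.h"
struct A { B * b; };

// b.h
#include "a.h"        // no longer circular
struct B { A * a; };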
Reducing rebuild time - Imagine the following code
// foo.h
#include <QSlider>
class Foo
{
QSlider * someSlider;
};
now every .cpp file that directly or indirectly pulls in foo.h also pulls in QSlider.h and all of its dependencies. That may be hundreds of .cpp files! (Precompiled headers help a bit - and sometimes a lot - but they turn disk/CPU pressure into memory/disk pressure, and thus soon hit the "next" limit.)
If the header requires only a reference declaration, this dependency can often be limited to a few files, e.g. foo.cpp.
Reducing incremental build time - The effect is even more pronounced when dealing with your own headers (rather than stable library headers). Imagine you have
// bar.h
#include "foo.h"
class Bar
{
Foo * kungFoo;
// ...
};
Now if most of your .cpp files need to pull in bar.h, they also indirectly pull in foo.h. Thus, every change to foo.h triggers a rebuild of all these .cpp files (which might not even need to know about Foo!). If bar.h uses a forward declaration for Foo instead, the dependency on foo.h is limited to bar.cpp:
// bar.h
class Foo;
class Bar
{
Foo * kungFoo;
// ...
};
// bar.cpp
#include "bar.h"
#include "foo.h"
// ...
It is so common that it is a pattern - the PIMPL pattern. Its use is two-fold: first, it provides true interface/implementation isolation; second, it reduces build dependencies. In practice, I'd weight their usefulness 50:50.
There is a catch: you need a pointer or reference in the header; you can't have a direct instantiation of the dependent type. This limits the cases where forward declarations can be applied. If you do it explicitly, it is common to use a utility class (such as boost::scoped_ptr) for that.
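A minimal PIMPL sketch along those lines (Widget and WidgetImpl are hypothetical names):
// widget.h
#include <boost/scoped_ptr.hpp>

class WidgetImpl; // forward declaration; the definition lives in widget.cpp

class Widget {
public:
    Widget();
    ~Widget();    // must be defined in widget.cpp, where WidgetImpl is complete
    void draw();  // trivial delegation, also defined in widget.cpp
private:
    boost::scoped_ptr<WidgetImpl> impl;
};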
Is build time worth it? Definitely, I'd say. In the worst case, build time grows polynomially with the number of files in the project. Other techniques - like faster machines and parallel builds - can provide only percentage gains.
The faster the build, the more often developers test what they did, the more often unit tests run, the faster build breaks can be found and fixed, and the less often developers end up procrastinating.
In practice, managing your build time is essential on a large project (say, hundreds of source files), but it still makes a "comfort difference" on small projects. Also, adding improvements after the fact is often an exercise in patience, as a single fix might shave only seconds (or less) off a 40-minute build.
I use it all the time. My rule is: if it doesn't need the header, then I put a forward declaration ("use headers if you must, use forward declarations if you can"). The only thing that sucks is that I need to know how the class was declared (struct/class; if it is a template, I also need its parameters, ...). But in the vast majority of cases, it just comes down to "class Slider;" or something along those lines. If something requires more hassle to be declared, one can always provide a dedicated forward-declaration header, as the Standard does with <iosfwd>.
Not including the header file will not only reduce compile time but also will avoid polluting the namespace. Files including the header will thank you for including as little as possible so they can keep using a clean environment.
This is the rough plan:
/* --- --- --- Y.hpp */
class X;
class Y {
    X *x;
};

/* --- --- --- Y.cpp */
#include "x.hpp"
#include "y.hpp"
...
There are smart pointers that are specifically designed to work with pointers to incomplete types. One very well known one is boost::shared_ptr.
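Applied to the sketch above (a minimal illustration):
// y.hpp
#include <boost/shared_ptr.hpp>

class X; // incomplete type is fine in this header

class Y {
    boost::shared_ptr<X> x; // the deleter is captured where the shared_ptr is
                            // created (in y.cpp, where X is complete), so this
                            // header never needs X's full definition
};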
Yes, it sure does help. Another thing to add to your repertoire is precompiled headers if you are worried about compilation time.
Look up FAQ 39.12 and 39.13
The standard library does this for some of the iostream classes in the standard header <iosfwd>. However, it is not a generally applicable technique - notice there are no such headers for the other standard library types, and it should not (IMHO) be your default approach to designing class hierarchies.
Although this seems to be a favourite "optimisation" for programmers, I suspect that, like most optimisations, few of them have actually timed the builds of their projects both with and without such declarations. My limited experiments in this area indicate that the use of pre-compiled headers in modern compilers makes it unnecessary.
There is a HUGE difference in compile times for larger projects, even ones with carefully managed dependencies. You'd better get into the habit of forward declaring and keep as much as possible out of header files, because at a lot of software shops that use C++ it's required. The reason you don't see it all that much in the standard headers is that those make heavy use of templates, at which point forward declaring becomes hard. For MSVC you can use /P to take a look at how the preprocessed file looks before actual compilation. If you haven't done any forward declaration in your project, it would probably be an interesting experience to see how much extra processing needs to be done.
In general, no.
I used to forward declare as much as I could, but no longer.
As far as Qt is concerned, you may notice that there is a <QtGui> include file that will pull in all the GUI Widgets. Also, there is a <QtCore>, <QtWebKit>, <QtNetwork> etc. There's a header file for each module. It seems the Qt team believes this is the preferred method also. They say so in their module documentation.
True, the compilation time may be increased. But in my experience it's just not that much. And if it were, using precompiled headers would be the next step.
When you write ...
include "foo.h"
... you thereby instruct a conventional build system: "Any time there is any change whatsoever in the library file foo.h, discard this compilation unit and rebuild it, even if all that happened to foo.h was the addition of a comment, or the addition of a comment to some file which foo.h includes; even if all that happened was some ultra-fastidious colleague re-balanced the curly braces; even if nothing happened other than a pressured colleague checking in foo.h unchanged and inadvertently changing its timestamp."
Why would you want to issue such a command? Library headers, because in general they have more human readers than application headers, have a special vulnerability to changes that have no impact on the binary, such as improved documentation of functions and arguments or the bump of a version number or copyright date.
The C++ rules allow a namespace to be re-opened at any point in a compilation unit (unlike a struct or class), in order to support forward declarations.
Forward declarations are very useful for breaking circular dependencies, and may sometimes be OK to use with your own code, but using them with library code may break the program on another platform or with other versions of the library (this can happen even with your own code if you're not careful enough). IMHO not worth it.