I'm trying to minimize header inclusion in a project, maximizing the use of forward declarations, and I need to get clear on how exactly the process of C++ compilation works.
It starts off with main.cpp, where we allocate an object of class A, therefore we include A.h. Class A uses classes B and C, so I include B.h and C.h. Now if I wanted to allocate B in main.cpp, the compilation would fail.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h, and in A.h I'm including B.h. I read some previous discussions on this topic, where there was something about recursion and recompilation of the source file. So how does that exactly work?
Thanks for any advice. :)
As a simple rule of thumb, you need the full definition of a type any time its size, alignment, or interface is required. If a header only refers to a type through a pointer (or reference), a forward declaration is enough.
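For instance, a sketch using the names from the question; how A actually uses B and C is assumed here purely for illustration:

// A.h -- illustrative sketch
#include "C.h"   // full definition needed: C is stored by value

class B;         // forward declaration is enough: A only holds a pointer to B

class A {
public:
    void useB();  // implemented in A.cpp, which would include B.h
private:
    B* b_;        // pointer member: the size of B is not needed here
    C  c_;        // by-value member: the size of C is needed here
};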
All compilation units which reference a header have to go through the paces of understanding it independently. That is why code in a header increases compile times super-linearly.
You can see exactly what the preprocessor prepares for the compiler if you are interested. GCC has the below syntax.
g++ -E main.cpp
MSVC has similar functionality, though I cannot quote it.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h and in A.h I'm including B.h
This is a matter of circumstance, I suppose. The major annoyance with omitting headers is that, usually, someone else changes something in a distant part of the code base and you are left guessing why you are missing symbols when you update from source control. Essentially you create dependencies between headers that are not clear at all.
If my whims were law, you could throw an #include of any header into an empty .cpp file and it would just compile. I don't see why you wouldn't want that, though I'm not prepared to defend it as the right thing to do in all situations.
Related
If you read online, there are plenty of claims that using forward declarations in C++ saves compile time. The usual theory is that since #include means mere text replacement, if I use a forward declaration the compiler doesn't need to parse (and possibly compile) the header, so it saves time. I found this claim hard to believe, because I usually see code like this:
// B.h
class A;
class B {
public:
    void doSomething(A& a);
};
In this case, yeah, we don't need to include A.h in B.h as we forward declared it, but the problem is that in B.cpp eventually, we need a full type A to use its methods and data members. So I found in nearly all cases, we need to include A.h in B.cpp.
So how does forward declaration actually save compile time? I see people with benchmarks to prove that if they use forward declaration instead of #includes, the compile time actually goes down, so there must be something I do not understand here...
I know saving compile time is not the sole purpose of forward declaration, I understand it has other purposes. I just want to understand why some people claim it can save compile time.
Compile times
the problem is that in B.cpp eventually, we need a full type A to use its methods and data members.
Yes, that is a typical pattern. Forward declare a class (e.g. A) in a header (e.g. B.h), then in the source code corresponding to that header (B.cpp), include the header for the forward-declared class (A.h).
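A sketch of that pattern (the member A::doStuff() is hypothetical, shown only to illustrate why the complete type is needed):

// B.cpp
#include "B.h"   // B.h only forward declares A
#include "A.h"   // full definition of A, needed to call its members

void B::doSomething(A& a) {
    a.doStuff();  // requires the complete type A
}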
So I found in nearly all cases, we need to include A.h in B.cpp.
Correct, forward declarations do not save time when compiling the corresponding source code. The savings come when compiling other source code that uses B. For example:
other.cpp
#include "B.h"
// Do stuff with `B` objects.
// Make no use of `A` objects.
Assume this file does not need definitions from A.h. This is where the savings come in. When compiling other.cpp, if B.h uses a forward declaration of A, there is no need to process A.h. Nor is there a need to process the headers that A.h itself includes, and so on. Now multiply this effect by the number of files that include B.h, either directly or indirectly.
Note that there is a compounding effect here. The number of "headers that A.h itself includes" and of "files that include B.h" would be the numbers before replacing any #include statements with forward declarations. (Once you start making these replacements, the numbers come down.)
How much of an effect? Not as much as there used to be. Still, as long as we're talking theoretically, even the smallest savings is still a savings.
Rebuild times
Instead of raw compile times (build everything), I think a better focus would be on rebuild times. That is, the time it takes to compile just the files affected by a change you made.
Suppose there are ten files that rely on B.h but not on A.h. If B.h were to include A.h, then those ten files would be affected by changes to A.h. If B.h were instead to forward declare A, then those files would not be affected by changes to A.h, reducing the time to rebuild after those changes.
Now suppose there is another class, call it B2, that also has the option to forward declare A instead of including the header. Maybe there are another ten files that depend on B2 but not on B and not on A. Now there are twenty files that do not need to be re-compiled after changes to A.
But why stop there? Let's add B3 through B10 to the mix. Now there are a hundred files that do not need to be re-compiled after changes to A.
Add another layer. Suppose there is a C.h that has the option to forward declare B instead of including B.h. By using a forward declaration, changes to A.h no longer require re-compiling the ten files that use C.h. And, of course, we'll assume there are ten such files for each of B through B10. Now we're up to 10*10*10 files that do not need to be recompiled when A.h changes.
Takeaway
This is a simplified example to serve as a demonstration. The point is that there is a forest of dependency trees created by #include lines. (The root of such a tree would be the header file of interest, and its children are the files that #include it.) Each leaf in one of these trees represents a file that must be compiled when changes occur in the header file of interest. The number of leaves in a tree grows exponentially with the depth, so removing a branch (by replacing an #include with a forward declaration) can have a massive effect on rebuild time. Or maybe a negligible effect. This is theory, not practice.
I should note that like the question, this answer focuses on compile times, not on the other factors to consider. This is not supposed to be a comprehensive guide to the pros and cons of forward declarations, just an explanation for how they could save compilation time.
I started writing a simple interpreter in C++ with a class structure that I will describe below, but I quit and rewrote the thing in Java because headers were giving me a hard time. Here's the basic structure that is apparently not allowed in C++:
main.cpp contains the main function and includes the header for a class we can call Printer (printer.h, whose single void method is implemented in printer.cpp). Now imagine two other classes which are identical. Both want to call Printer::write_something();, so I included printer.h in each.
So here's my first question: why can I #include <iostream> a million times, even one after the other, but I can only include my own header once? (Well, I think I could probably do the same thing with mine, as long as it's all in the same file. But I may be wrong.) I understand the difference between a declaration and an implementation/definition, but that code gives me a class redefinition error, and I don't see why.
And here's the thing that blows my mind (and probably shows you why I don't understand any of this): I can't just include printer.h at the top of main.cpp and use the class from my other two classes. I know I can include printer.h in one of the two classes (headers) with no trouble, but I don't see why this is any different from just including it before I include the class in main.cpp (doing so gives me a class-not-found error).
When I got fed up, I thought about moving to C since the OOP I was using was quite forced anyway, but I would run into the same problem unless I wrote everything in one file. It's frustrating to know C++ but be unable to use it correctly because of compilation issues.
I would really appreciate it if you could clear this up for me. Thanks!
Why can I #include <iostream> a million times, even one after the other, but I can only include my header once?
It is probably because your header doesn't have an include guard.
// printer.h file
#ifndef PRINTER_H_
#define PRINTER_H_
// printer.h code goes here
#endif
Note that it is best practice to choose longer names for the include guard defines, to minimise the chance that two different headers might have the same one.
Most header files should be wrapped in an include guard:
#ifndef MY_UNIQUE_INCLUDE_NAME_H
#define MY_UNIQUE_INCLUDE_NAME_H
// All content here.
#endif
This way, the compiler will only see the header's contents once per translation unit.
C/C++ compilation is divided into compilation/translation units to generate object files (.o, .obj).
See the definition of a translation unit for details.
An #include directive in a C/C++ file results in the direct equivalent of a simple recursive copy-paste into the including file. You can try it as an experiment.
So if the same translation unit includes the same header twice, the compiler sees that some entities are being defined multiple times, just as would happen if you wrote them twice in the same file. The error output would be exactly the same.
There is no built-in protection in the language that prevents you from doing multiple inclusions, so you have to resort to writing an include guard or the #pragma once boilerplate for each C/C++ header.
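For example, with a hypothetical unguarded header, double inclusion in one translation unit expands to two definitions:

// shape.h (hypothetical, with no include guard)
struct Shape { int sides; };

// main.cpp
#include "shape.h"
#include "shape.h"   // the second copy-paste defines Shape again:
                     // error: redefinition of 'struct Shape'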
I have two versions of the same class in two different sets of files (A.cpp, A.h and B.cpp, B.h). In all files the class has the same name but a different internal implementation.
My idea is to switch from one version to the other just by changing the name of the .h file in the #include, so I shouldn't have to change anything else in the code (both versions' methods have the same signatures and the same properties).
The A.h and B.h are never included at the same time.
The problem is that no matter which include file I use, the A version is always executed. I know that when I include B.h it is at least compiled (if I put a deliberate error in it, it shows up at compilation time).
Can this be done? Or is this breaking some rule of C++? I think that this should not break the One Definition Rule because I'm not using A.h and B.h at the same time.
The solution is not to link the old file into the final executable. That way only the new implementation will be available.
What I'll often do is mangle the version into a namespace, and use that.
Something along the lines of:
namespace Xyz_A { // In A.h
// Define version A
}
namespace Xyz = Xyz_A;
In B.h, use _B instead.
This way, you would write Xyz::... in your program, but the external symbols will have Xyz_A or Xyz_B mangled into them. But in my opinion, this is really more a protection against errors. I'll arrange things in my makefiles so that whatever switches between A.h and B.h also causes the executable to link against the appropriate library, and not against the other.
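Usage would then look something along these lines (Widget and doWork are invented names for the sketch):

// client.cpp -- written only against the alias
#include "A.h"        // or "B.h"; the build system decides which one

void run() {
    Xyz::Widget w;    // resolves to Xyz_A::Widget or Xyz_B::Widget
    w.doWork();
}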
If the header files are identical, it would be easier just to have one header and two different implementation files. That would reduce your problem to just linking with the right object file. This also reduces the chance of subtle bugs should your headers ever diverge.
A better solution would, of course, be something that does not depend on the build system but uses language facilities to change code at compile time, like a template.
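A rough sketch of that idea, picking the implementation with a template parameter at compile time (all names invented):

// Two interchangeable implementations with the same interface.
struct ImplA { int value() const { return 1; } };
struct ImplB { int value() const { return 2; } };

// The rest of the code is written against this wrapper.
template <typename Impl>
class Wrapper {
public:
    int value() const { return impl_.value(); }
private:
    Impl impl_;
};

// Switch versions by changing one alias instead of swapping headers.
using ActiveWrapper = Wrapper<ImplA>;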
You will need to load the correct library to match the header file.
I would suggest looking into the proxy design pattern so you can include both class A and class B. Then you can use the proxy to choose which class's functions to use at runtime.
http://en.wikipedia.org/wiki/Proxy_pattern
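A bare-bones sketch of that approach, with the interface and class names invented for illustration:

#include <memory>

// Common interface plus the two concrete versions.
struct Codec { virtual void run() = 0; virtual ~Codec() = default; };
struct CodecA : Codec { void run() override { /* version A */ } };
struct CodecB : Codec { void run() override { /* version B */ } };

// The proxy: both versions are compiled in, one is picked at runtime.
class CodecProxy : public Codec {
public:
    explicit CodecProxy(bool useA)
        : impl_(useA ? std::unique_ptr<Codec>(new CodecA)
                     : std::unique_ptr<Codec>(new CodecB)) {}
    void run() override { impl_->run(); }
private:
    std::unique_ptr<Codec> impl_;
};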
Most often when creating multiple classes inside a program that use each other, I like to include only the minimum number of header files I need to reduce clutter.
For example, say class C inherits from class B, which contains class A. Now of course since class B contains class A as a member, it needs to include a.h in b.h. However, let's say C also needs to include a.h. Being lazy as I am, I just include b.h (which C needs to include anyways), and since b.h already includes a.h, I don't need to include anything more, and it compiles fine. Same for my .cpp files: I just include the header, and anything that's included in the header will be automatically included in my .cpp file, so I don't include it there.
Is this a bad habit of mine? Does it make my code less readable?
I stick with this simple rule: include everything you need to completely declare a given class, but not more, and make no assumptions about includes being pulled in from other sources, i.e. ensure your files are self-sufficient.
Include what's necessary for the header file to be parsed without relying on external include ordering (in other words: make your headers self-sufficient).
In your case, if c.h declares a class C which inherits from class B, obviously you must include b.h. However, if class A never appears in c.h, I believe there is no reason to include it. The fact that b.h mentions A means that b.h must do what's necessary to be parsed, either by forward declaring A or by including a.h.
So from my point of view, you're doing what should be done.
Also note that if for some reason c.h starts mentioning A, I would add the appropriate include or forward declaration so I wouldn't depend on the fact that b.h does it for me.
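Putting that together, a self-sufficient c.h might look like this (the inspect member is hypothetical, added only to show when A needs to be mentioned):

// c.h -- declares or includes everything it itself mentions
#ifndef C_H
#define C_H

#include "b.h"   // full definition required: B is used as a base class
class A;         // enough if C only takes A by reference or pointer

class C : public B {
public:
    void inspect(const A& a);
};

#endif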
It's best to include every header with definitions that you are using directly.
Relying on one of the other headers to include stuff makes your code more fragile as it becomes dependent on the implementation of classes that are external to it.
EDIT:
A short example:
Class B uses class A, e.g. a hash table implementation B that uses a hashing mechanism A
You create a class C that needs a hash table (i.e. B) and a hash algorithm (i.e. A) for some other purpose. You include B.h and leave out A.h since B.h includes it anyway.
Mary, one of your co-workers, discovers a paper about this new fabulous hashing algorithm that reduces the probability of collisions, while it needs 10% less space and is twice as fast. She (rightly) rewrites class B to use class D, which implements that algorithm. Since class A is no longer needed in B, she also removes all references to it from B.h.
Your code breaks.
EDIT 2:
There are some programmers (and I've occasionally been guilty of this too, when in a hurry) who deal with this issue by having an "include-all" header file in their project. This should be avoided, since it causes namespace pollution of unparalleled proportions. And yes, windows.h in MSVC is one of those cases in my opinion.
I'm reading some C++ code and notice that there are #includes both in the header files and in the .cpp files. I guess if I move all the #includes in a file, let's say foo.cpp, to its header file foo.hh and let foo.cpp only include foo.hh, the code should work anyway, leaving aside issues like drawbacks, efficiency, etc.
I know my "all of a sudden" idea must be in some way a bad idea, but what are the exact drawbacks of it? I'm new to C++, so I don't want to read lots of C++ books before I can answer this question by myself. So I'm just dropping the question here for your help. Thanks in advance.
As a rule, put your includes in the .cpp files when you can, and only in the .h files when that is not possible.
You can use forward declarations to remove the need to include headers from other headers in many cases: this can help reduce compilation time, which can become a big issue as your project grows. This is a good habit to get into early on, because trying to sort it out at a later date (when it's already a problem) can be a complete nightmare.
The exception to this rule is templated classes (or functions): in order to use them you need to see the full definition, which usually means putting them in a header file.
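For example, a small class template typically lives entirely in its header (a sketch, not taken from any particular code base):

// stack.h
#ifndef STACK_H
#define STACK_H

#include <vector>

// The full definition has to be visible wherever Stack<T> is instantiated,
// so declaration and implementation both live in the header.
template <typename T>
class Stack {
public:
    void push(const T& value) { data_.push_back(value); }
    T pop() { T top = data_.back(); data_.pop_back(); return top; }
    bool empty() const { return data_.empty(); }
private:
    std::vector<T> data_;
};

#endif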
The include files in a header should only be those necessary to support that header. For example, if your header declares a std::vector, you should include <vector>, but there's no reason to include <string>. You should be able to have an empty program that only includes that single header file and will compile.
Within the source code, you need includes for everything you call, of course. If none of your headers required <iostream> but you needed it for the actual source, it should be included separately.
Include file pollution is, in my opinion, one of the worst forms of code rot.
You would make every other file that includes your header file transitively include all the #includes in your header too.
In C++ (as in C) #include is handled by the preprocessor by simply inserting all the text in the #included file in place of the #include statement. So with lots of #includes you can literally bloat the size of your compilable file to hundreds of kilobytes, and the compiler needs to parse all of this for every single file. Note that the same file included in different places must be reparsed again in every single place where it is #included! This can slow down the compilation to a crawl.
If you need to declare (but not define) things in your header, use forward declaration instead of #includes.
While a header file should include only what it needs, "what it needs" is more fluid than you might think, and is dependent on the purpose to which you put the header. What I mean by this is that some headers are actually interface documents for libraries or other code. In those cases, the headers must include (and probably #include) everything another developer will need in order to correctly use your library.
Including header files from within header files is fine, as is including them in .cpp files; however, to minimize build times it is generally preferable to avoid including a header file from within another header unless absolutely necessary, especially if many .cpp files include the same header.
.hh (or .h) files are supposed to be for declarations.
.cpp (or .cc) files are supposed to be for definitions and implementations.
Realize first that an #include statement is literal. #include "foo.h" literally copies the contents of foo.h and pastes it where the include directive is in the other file.
The idea is that some other files bar.cpp and baz.cpp might want to make use of some code that exists in foo.cpp. The way to do that, normally, would be for bar.cpp and baz.cpp to #include "foo.h" to get the declarations of the functions or classes that they wanted to use, and then at link time, the linker would hook up these uses in bar.cpp and baz.cpp to the implementations in foo.cpp (that's the whole point of the linker).
If you put everything in foo.h and tried to do this, you would have a problem. Say that foo.h declares a function called doFoo(). If the definition of (the code for) this function is in foo.cpp, that's fine. But if the code for doFoo() is moved into foo.h, and then you include foo.h in foo.cpp, bar.cpp and baz.cpp, there are now three definitions of a function named doFoo(), and your linker will complain because you are not allowed to have more than one definition of the same function in the program.
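Concretely, sketching the files described above (the bodies are placeholders):

// foo.h -- declaration only
void doFoo();

// foo.cpp -- the single definition
#include "foo.h"
void doFoo() { /* ... */ }

// bar.cpp -- one of the users
#include "foo.h"
void bar() { doFoo(); }   // the linker resolves this against foo.cpp

// If doFoo()'s body were moved into foo.h, every .cpp including it would
// emit its own copy, and the linker would report a multiple definition error.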
If you #include the .cpp files, you will probably end up with loads of "multiple definition" errors from the linker. You can in theory #include everything into a single translation unit, but that also means that everything must be re-built every time you make a change to a single file. For real-world projects, that is unacceptable, which is why we have linkers and tools like make.
There's nothing wrong with using #include in a header file. It is a very common practice; you don't want to burden a user of a library with also remembering what other obscure headers are needed.
A standard example is #include <vector>. It gets you the vector class, and a raft of internal CRT header files that are needed to compile the vector class properly, stuff you really don't need or want to know about.
You can avoid redefinition errors from multiple inclusion if you use "include guards".
// myheader.h
#ifndef MYHEADER_H_
#define MYHEADER_H_

struct blah {};
extern int whatsit;

#endif // MYHEADER_H_
Now if you #include "myheader.h" in other header files, it'll only get included once per translation unit (because MYHEADER_H_ is already defined the second time around). MSVC, GCC, and Clang also support #pragma once, which has the equivalent effect.
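For reference, the same header written with #pragma once instead of an include guard:

// myheader.h
#pragma once

struct blah {};
extern int whatsit;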