Should I include files included in another header? - c++

Most often when creating multiple classes inside a program that use each other, I like to include only the minimum number of header files I need to reduce clutter.
For example, say class C inherits from class B, which contains class A. Now of course, since class B contains class A as a member, it needs to include a.h in b.h. However, let's say C also needs a.h. Being lazy as I am, I just include b.h (which C needs to include anyway), and since b.h already includes a.h, I don't need to include anything more, and it compiles fine. Same for my .cpp files: I just include the header, and anything included in the header is automatically available in my .cpp file, so I don't include it there.
Is this a bad habit of mine? Does it make my code less readable?

I stick with this simple rule: include everything you need to completely declare a given class, but no more, and make no assumptions about includes being pulled in from other sources; in other words, ensure your files are self-sufficient.

Include what's necessary for the header file to be parsed without relying on external include ordering (in other words: make your headers self-sufficient).
In your case, if c.h declares a class C which inherits from class B, obviously you must include b.h. However, if class A never appears in c.h, I believe there is no reason to include it. The fact that b.h mentions A means that b.h must do whatever is necessary for it to be parsed, either by forward-declaring A or by including a.h.
So from my point of view, you're doing what should be done.
Also note that if for some reason c.h starts mentioning A, I would add the appropriate include or forward declaration so that I wouldn't depend on b.h doing it for me.

It's best to include every header with definitions that you are using directly.
Relying on one of the other headers to include stuff makes your code more fragile as it becomes dependent on the implementation of classes that are external to it.
EDIT:
A short example:
Class B uses class A, e.g. a hash table implementation B that uses a hashing mechanism A
You create a class C that needs a hash table (i.e. B) and a hash algorithm (i.e. A) for some other purpose. You include B.h and leave out A.h since B.h includes it anyway.
Mary, one of your co-workers, discovers a paper about this new fabulous hashing algorithm that reduces the probability of collisions, while it needs 10% less space and is twice as fast. She (rightly) rewrites class B to use class D, which implements that algorithm. Since class A is no longer needed in B, she also removes all references to it from B.h.
Your code breaks.
EDIT 2:
There are some programmers (and I've occasionally been guilty of this too, when in a hurry) who deal with this issue by having an "include-all" header file in their project. This should be avoided, since it causes namespace pollution of unparalleled proportions. And yes, windows.h in MSVC is one of those cases in my opinion.

Related

How does forward declaration save compile time?

If you read online, there are plenty of claims that using forward declarations in C++ saves compile time. The usual theory is that since #include is mere textual replacement, a forward declaration means the compiler doesn't need to parse (and possibly compile) the header, which saves time. I found this claim hard to believe, because consider the code I usually see:
// B.h
class A;
class B {
public:
void doSomething(A& a);
};
In this case, yeah, we don't need to include A.h in B.h as we forward declared it, but the problem is that in B.cpp eventually, we need a full type A to use its methods and data members. So I found in nearly all cases, we need to include A.h in B.cpp.
So how does forward declaration actually save compile time? I see people with benchmarks to prove that if they use forward declaration instead of #includes, the compile time actually goes down, so there must be something I do not understand here...
I know saving compile time is not the sole purpose of forward declaration, I understand it has other purposes. I just want to understand why some people claim it can save compile time.
Compile times
the problem is that in B.cpp eventually, we need a full type A to use its methods and data members.
Yes, that is a typical pattern. Forward declare a class (e.g. A) in a header (e.g. B.h), then in the source code corresponding to that header (B.cpp), include the header for the forward-declared class (A.h).
So I found in nearly all cases, we need to include A.h in B.cpp.
Correct, forward declarations do not save time when compiling the corresponding source code. The savings come when compiling other source code that uses B. For example:
other.cpp
#include "B.h"
// Do stuff with `B` objects.
// Make no use of `A` objects.
Assume this file does not need definitions from A.h. This is where the savings come in. When compiling other.cpp, if B.h uses a forward declaration of A, there is no need to process A.h. Nor is there a need to process the headers that A.h itself includes, and so on. Now multiply this effect by the number of files that include B.h, either directly or indirectly.
Note that there is a compounding effect here. The number of "headers that A.h itself includes" and of "files that include B.h" would be the numbers before replacing any #include statements with forward declarations. (Once you start making these replacements, the numbers come down.)
How much of an effect? Not as much as there used to be. Still, as long as we're talking theoretically, even the smallest savings is still a savings.
Rebuild times
Instead of raw compile times (build everything), I think a better focus would be on rebuild times. That is, the time it takes to compile just the files affected by a change you made.
Suppose there are ten files that rely on B.h but not on A.h. If B.h were to include A.h, then those ten files would be affected by changes to A.h. If B.h were instead to forward declare A, then those files would not be affected by changes to A.h, reducing the time to rebuild after those changes.
Now suppose there is another class, call it B2, that also has the option to forward declare A instead of including the header. Maybe there are another ten files that depend on B2 but not on B and not on A. Now there are twenty files that do not need to be re-compiled after changes to A.
But why stop there? Let's add B3 through B10 to the mix. Now there are a hundred files that do not need to be re-compiled after changes to A.
Add another layer. Suppose there is a C.h that has the option to forward declare B instead of including B.h. By using a forward declaration, changes to A.h no longer require re-compiling the ten files that use C.h. And, of course, we'll assume there are ten such files for each of B through B10. Now we're up to 10*10*10 files that do not need to be recompiled when A.h changes.
Takeaway
This is a simplified example to serve as a demonstration. The point is that there is a forest of dependency trees created by #include lines. (The root of such a tree would be the header file of interest, and its children are the files that #include it.) Each leaf in one of these trees represents a file that must be compiled when changes occur in the header file of interest. The number of leaves in a tree grows exponentially with the depth, so removing a branch (by replacing an #include with a forward declaration) can have a massive effect on rebuild time. Or maybe a negligible effect. This is theory, not practice.
I should note that like the question, this answer focuses on compile times, not on the other factors to consider. This is not supposed to be a comprehensive guide to the pros and cons of forward declarations, just an explanation for how they could save compilation time.

Forward declaration for member pointer with public access

Somewhat similar situation to what was asked here.
I've got a class A that has a member pointer to class B.
//A.h
class B;
class A {
B *b;
public:
B *GetB();
};
B is defined in its own file.
Now, whenever I include A.h and want to access an A's b member, I also have to include B.h. In the case where both A and B have rather large headers (think old nasty legacy code), is it better to continue including both headers whenever I include one, or to just have A.h include B.h and be done with it?
The headers are pretty large but most of our code requires both anyway, I'm just curious if there is some kind of design pattern that decides what is the best decision to make in this case.
This is opinion, of course. For me, it boils down to whether it makes sense to use A without B. If A has a ton of operations and only one of them involves B, then no, I wouldn't include B.h. Why should someone who only calls A.Foo() and A.Bar() need to pay the overhead of including an extra header?
On the other hand, if A is a B factory (for example) and you can't imagine anyone using it and not using B too, then maybe it makes sense to include B.h in A.h.
And if A had a member variable of type B (not B*) with the consequence that anyone who included A.h would have to include B.h too in order to compile, then I would definitely include it in A.h.
Wrap your header files with preprocessor include guards to make sure they are included only once. (Identifiers containing double underscores are reserved for the implementation, so prefer names like B_HEADER_H.)
In B.h, define:
#ifndef B_HEADER_H
#define B_HEADER_H
.... B header contents go here....
#endif
then in A.h, define:
#ifndef A_HEADER_H
#define A_HEADER_H
#include <B.h>
.... A header contents go here....
#endif
and then include only A.h where it's needed.
I personally prefer to include the header files of what I know and want to use; I don't want to be bothered by the dependency tree of the components I use.
Think of when you include <iostream> from the C++ standard library: do you really want to know and explicitly include all of <iostream>'s dependencies (if there are any)?

C++ : Best place to include header files

Let's say we have four files: a.h, a.cpp, b1.h, and b2.h. And, we need to include b1.h and b2.h in either a.h or a.cpp. Where should I include b1 and b2? Let's say only a.cpp needs b1 and b2.
If it's not needed for the header file then include it in the cpp.
This reduces compilation time and helps to understand the dependencies between the modules.
A general rule is that you should avoid including a header inside headers that do not use definitions from it.
If b1.h and/or b2.h have definitions like struct or typedef, etc. that are actually used in a.h (in a function prototype as parameters or return type, for example), then you should include them in the top of the header.
Otherwise, if b1.h/b2.h only provide definitions that are used internally to a.cpp (private members, etc), then include it at the top of a.cpp.
You should try to include in a file only what is actually needed for the compiler to understand that file. (Unlike Windows, with the monstrosity that is <Windows.h>.)
Only include what is needed in the header.
Do the class definitions and function declarations of a.h require b1.h or b2.h?
Then include what is needed.
Otherwise only include in the .cpp.
Just remember, each time you include a file your compilation takes that much longer.
Here are a couple of hints about when things are needed:
Types that appear only as return values or parameters do not need their headers included; a declaration is enough.
For example randomType blahFunc(randomType a); does not need randomType's header included in the header file (it still needs the forward declaration class randomType;). Beware that this does not work for standard library types such as std::string: forward-declaring names in namespace std is not allowed, so include <string> instead.
Pointer types do not need to be included either, they just need to be forward declared. For example randomType * f(); does not need to include randomType's header in its header. All you have to do is forward declare with class randomType;
References can also just be forward declared.
I think the include directives should always be in your .h** files. The only include you should put in the a.cpp file should be
#include "a.hpp"
To understand why, suppose Bob uses your code inside his program, and your code is not disclosed (i.e., you provide it only as a library). Then all he can see are your headers. If you include everything you need in your headers, he will still be able to check what the dependencies of your code are and make sure he has everything your code needs. If you put the include directives in the .c** files, then unless your code is open source (i.e. he has access to the .c** files) he won't be able to see which packages he must make sure are installed.

C++ multi-file compilation process

I'm trying to minimize header inclusion in a project, maximizing the usage of forward declarations, and I need to get clear on how exactly the process of C++ compilation works.
It starts off with main.cpp, where we allocate an object of class A, therefore we include A.h. Class A uses classes B and C, so I include B.h and C.h. Now if I wanted to allocate B in main.cpp, the compilation would fail.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h, and in A.h I'm including B.h. I read some previous discussions on this topic, where there was something about recursion and recompilation of the source file. So how does that exactly work?
Thanks for any advice. :)
As a simple rule of thumb, you need the full definition of a type any time its alignment, interface, or size is required. If a header only refers to a type through a pointer, you only need to declare it.
All compilation units which reference a header have to go through the paces of understanding it independently. That is why code in a header increases compile times super-linearly.
You can see exactly what the preprocessor prepares for the compiler if you are interested. GCC has the below syntax.
g++ -E main.cpp
MSVC has similar functionality, though I cannot quote it.
I can easily include B.h in main.cpp, but I'm wondering if it's really necessary, because I'm already including A.h and in A.h I'm including B.h
This is a matter of circumstance, I suppose. The major annoyance with omitting headers is that usually someone else changes something in a disparate part of the code base, and you have to guess at why you are missing symbols when you update from source control. Essentially you create dependencies between headers that are not clear at all.
If my whims were law, you could throw an include to any header in an empty cpp file and it would just compile. I don't see why you wouldn't want that, though I'm not prepared to defend it as the right thing to do in all situations.

#import in header file or implementation file

Some have the habit of adding header file imports/includes to the header file. Some on the other hand, write a forward declaration in the header file and write the actual #include or #import lines in the implementation file.
Is there a standard practice for this? Which is better and why?
Given X.h and X.c, if you put every #include in X.h, then clients of "X" that #include <X.h> will also pull in all those headers, even though some may be needed only by X.c.
X.h should include only what's needed to parse X.h. It should assume no other headers will have been included by the translation unit to ensure that reordering inclusions won't break a client. X.c should include any extras needed for implementation.
This minimises recompilation dependencies. You don't want a change only to the implementation to affect the header and hence trigger a client recompilation. You should simply include directly from X.c.
Using forward declarations instead of includes is necessary when classes have circular dependencies. For example:
A.h
#include "B.h"
class A {
B* pointer;
};
B.h
#include "A.h"
class B {
A* pointer;
};
Will break at compilation.
A.h
class B;
class A {
B* pointer;
};
B.h
class A;
class B {
A* pointer;
};
Will work as each class only needs to know the other class exists at the declaration.
I write my imports in header files, so that every implementation file has only one inclusion directive. This also has the advantage of hiding dependencies from the user of the module's code.
However, that same hiding has a disadvantage: your module's user may be importing all kinds of other headers included in your header that he may not need at all. From that point of view, it's better to have the inclusion directives in the implementation file, even if it means manually resolving dependencies, because it leads to lighter code.
I don't think there's a single answer. Considering the reasons I gave, I prefer the first approach, I think it leads to cleaner code (albeit heavier and possibly with unnecessary imports).
I don't remember who I'm quoting (and thus the phrase is not exact), but I always remember reading: "programs are written for human beings to read, and occasionally for computers to execute". I don't particularly care if there are a few kilobytes of code the user of my module won't need, as long as he can cleanly and easily import it and use it with a single directive.
Again, I think it's a matter of taste, unless there's something I failed to consider. Comments are more than welcome in that case!
Cheers.