C++ Headers - Best practice when including - c++

C++ headers
If I have A.cpp and A.h as well as b.h, c.h, d.h
Should I do:
in A.h:
#include "b.h"
#include "c.h"
#include "d.h"
in A.cpp:
#include "A.h"
or
in A.cpp:
#include "A.h"
#include "b.h"
#include "c.h"
#include "d.h"
Are there performance issues? Obvious benefits? Is something bad about this?

You should only include what is necessary to compile; adding unnecessary includes will hurt your compilation times, especially in large projects.
Each header file should be able to compile cleanly on its own -- that is, if you have a source file that includes only that header, it should compile without errors. The header file should include no more than is necessary for that.
Try to use forward declarations as much as possible. If you're using a class, but the header file only deals with pointers/references to objects of that class, then there's no need to include the definition of the class -- just use a forward declaration:
class SomeClass;
// Can now use pointers/references to SomeClass
// without needing the full definition

The key practice here is having around each foo.h file a guard such as:
#ifndef _FOO_H
#define _FOO_H
...rest of the .h file...
#endif
This prevents multiple-inclusions, with loops and all such attendant horrors. Once you do ensure every include file is thus guarded, the specifics are less important.
I like one guiding principle Adam expresses: make sure that, if a source file just includes a.h, it won't inevitably get errors due to a.h assuming other files have been included before it -- e.g. if a.h requires b.h to be included before, it can and should just include b.h itself (the guards will make that a noop if b.h was already included previously)
Vice versa, a source file should include the headers from which it is requiring something (macros, declarations, etc), not assume that other headers just come in magically because it has included some.
If you're using classes by value, alas, you need all the gory details of the class in some .h you include. But for some uses via references or pointers, just a bare class sic; will suffice. E.g., all other things being equal, if class a can get away with a pointer to an instance of class b (i.e. a member class b *my_bp; rather than a member class b *my_b;) the coupling between the include files can be made weaker (reducing much recompilation) -- e.g. b.h could have little more than class b; while all the gory details are in b_impl.h which is included only by the headers that really need it...

What Adam Rosenfield told you is spot on. An example of when you can use a forward declaration is:
#ifndef A_H_
#define A_H_
#include "D.h"
class B; //forward declaration
class C; //forward declaration
class A
{
B *m_pb; //you can forward declare this one bacause it's a pointer and the compilier doesn't need to know the size of object B at this point in the code. include B.h in the cpp file.
C &m_rc; //you can also forware declare this one for the same reason, except it's a reference.
D m_d; //you cannot forward declare this one because the complier need to calc the size of object D.
};
#endif

Answer: Let A.h include b.h, c.h and d.h only if it's needed to make the build succeed. The rule of thumb is: only include as much code in a header file as necessary. Anything which is not immediately needed in A.h should be included by A.cpp.
Reasoning: The less code is included into header files, the less likely you will need to recompile the code which uses the header file after doing some change somewhere. If there are lots of #include references between different header files, changing any of them will require rebuilding all other files which include the changed header file - recursivel. So if you decide to touch some toplevel header file, this might rebuild huge parts of your code.
By using forward declarations wherever possible in your header files, you reduce the coupling of the source files and thus make the build faster. Such forward declarations can be used in many more situations than you might think. As a rule of thumb, you only need the header file t.h (which defines a type T) if you
Declare a member variable of type T (note, this does not include declaring a pointer-to-T).
Write some inline functions which access members of an object of type T.
You do not need to include the declaration of T if your header file just
Declares constructors/functions which take references or pointers to a T object.
Declares functions which return a T object by pointer, reference or value.
Declares member variables which are references or pointers to a T object.
Consider this class declaration; which of the include files for A, B and C do you really need to include?:
class MyClass
{
public:
MyClass( const A &a );
void set( const B &b );
void set( const B *b );
B getB();
C getC();
private:
B *m_b;
C m_c;
};
You just need the include file for the type C, because of the member variable m_c. You can often remove this requirement as well by not declaring your member variables directly but using an opaque pointer to hide all the member variables in a private structure, so that they don't show up in the header file anymore.

Prefer not including headers in other headers - it slows down compilation and leas to circular reference

There would be no performance issues, it's all done at compile time, and your headers should be set up so they can't be included more than once.
I don't know if there's a standard way to do it, but I prefer to include all the headers I need in source files, and include them in the headers if something in the header itself needs it (for example, a typedef from a different header)

Related

Header Include Creep in C++

When I started learning C++ I learned that header files should typically be #included in other header files. Just now I had someone tell me that I should include a specific header file in my .cpp file to avoid header include creep.
Could someone tell me what exactly that is, why it is a problem and maybe point me to some documentation on when I would want to include a file in another header and when I should include one in a .cpp file?
The "creep" refers to the inclusion of one header including many others. This has some unwanted consequences:
a tendency towards every source file indirectly including every header, so that a change to any header requires everything to be recompiled;
more likelihood of circular dependencies causing heartache.
You can often avoid including one header from another by just declaring the classes you need, rather than including the header that gives the complete definition. Such an incomplete type can be used in various ways:
// Don't need this
//#include "thingy.h"
// just this
class thingy;
class whatsit {
// You can declare pointers and references
thingy * p;
thingy & r;
// You can declare static member variables (and external globals)
// They will need the header in the source file that defines them
static thingy s;
// You can declare (but not define) functions with incomplete
// parameter or return types
thingy f(thingy);
};
Some things do require the complete definition:
// Do need this
#include "thingy.h"
// Needed for inheritance
class whatsit : thingy {
// Needed for non-static member variables
thingy t;
// Needed to do many things like copying, accessing members, etc
thingy f() {return t;}
};
Could someone tell me what exactly [include creep] is
It's not a programming term, but interpreting it in an Engish-language context would imply that it's the introduction of #include statements that are not necessary.
why it is a problem
Because compiling code takes time. So compiling code that's not necessary takes unnecessary time.
and maybe point me to some documentation on when I would want to include a file in another header and when I should include one in a .cpp file?
If your header requires the definition of a type, you will need to include the header that defines that type.
#include "type.h"
struct TypeHolder
{
Type t;
// ^^^^ this value type requires the definition to know its size.
}
If your header only requires the declaration of a type, the definition is unnecessary, and so is the include.
class Type;
struct TypeHolder
{
Type * t;
// ^^^^^^ this pointer type has the already-known size of a pointer,
// so the definition is not required.
}
As an anecdote that supports the value of this practice, I was once put on a project whose codebase required ONE HOUR to fully compile. And changing a header often incurred most or all of that hour for the next compilation.
Adding forward-declarations to the headers where applicable instead of includes immediately reduced full compilation time to 16 minutes, and changing a header often didn't require a full rebuild.
Well they were probably referring to a couple of things ("include creep" is not a term I've ever heard before). If you, as an extreme rule, only include headers from headers (aside from one matching header per source file):
Compilation time increases as many headers must be compiled for all source files.
Unnecessary dependencies increase, e.g. with a dependency based build system, modifying one header may cause unnecessary recompilation of many other files, increasing compile time.
Increased possibility of namespace pollution.
Circular dependencies become an issue (two headers that need to include each other).
Other more advanced dependency-related topics (for example, this defeats the purpose of opaque pointers, which causes many design issues in certain types of applications).
As a general rule of thumb, include the bare minimum number of things from a header required to compile only the code in that header (that is, enough to allow the header to compile if it is included by itself, but no more than that). This will combat all of the above issues.
For example, if you have a source file for a class that uses std::list in the code, but the class itself has no members or function parameters that use std::list, there's no reason to #include <list> from the class's header, as nothing in the class's header uses std::list. Do it from the source file, where you actually need it, instead of in the header, where everything that uses the class also has to compile <list> unnecessarily.
Another common technique is forward declaring pointer types. This is the only real way to combat circular dependency issues, and has some of the compilation time benefits listed above as well. For example, if two classes refer to each other but only via pointers:
// A.h
class B; // forward declaration instead of #include "B.h"
class A {
B *b;
}
// B.h
class A; // forward declaration instead of #include "A.h"
class B {
A *a;
}
Then include the headers for each in the source files, where the definitions are actually needed.
Your header files should include all headers they need to compile when included alone (or first) in a source file. You may in many cases allow the header to compile by using forward declarations, but you should never rely on someone including a specific header before they include yours.
theres compie time to think of, but there's also dependancies. if A.h includes B.h and viceversa, code won't compile. You can get around this by forward referancing your class, then including the header in the cpp
A.h
class B;
class A
{
public:
void CreateNewB():
private:
B* m_b;
};
A.cpp
#include "B.h"
A::CreateNewB()
{
m_b = new B;
}

Forward declaration for member pointer with public access

Somewhat similar situation to what was asked here.
I've got a class A that has a member pointer to class B.
//A.h
class B;
class A {
B *b;
public:
B *GetB();
};
B is defined in its own file.
Now, whenever I include A.h and want to access an A's b member, I also have to include B.h. In the case where both A and B have rather large headers (think old nasty legacy code) is it better to continue including both headers whenever I include one or to just have A.h include B.h and be done with it?
The headers are pretty large but most of our code requires both anyway, I'm just curious if there is some kind of design pattern that decides what is the best decision to make in this case.
This is opinion, of course. For me, it boils down to whether it makes sense to use A without B. If A has a ton of operations and only one of them involves B, then no, I wouldn't include B.h. Why should someone who only calls A.Foo() and A.Bar() need to pay the overhead of including an extra header?
On the other hand, if A is a B factory (for example) and you can't imagine anyone using it and not using B too, then maybe it makes sense to include B.h in A.h.
And if A had a member variable of type B (not B*) with the consequence that anyone who included A.h would have to include B.h too in order to compile, then I would definitely include it in A.h.
wrap your header files with preprocessor to make sure they will be included only 1 time..
on B.h define
#ifndef __B_HEADER__
#define __B_HEADER__
.... B header files goes here....
#endif
then on A.h define
#ifndef __A_HEADER__
#define __A_HEADER__
#include <B.h>
.... A header files goes here....
#endif
and then include only A.h when it needs.
i personally prefer to include header files of what i know and want to use - i don't want to be bothered by the dependendcy tree of the components i use.
think when you include <iostream> in C++ STD library - do you really want to know and include explicitly all the <iostrem> dependencies (if there are)?

What to #include?

I understand if a.h includes b.h and I don't declare anything from b.h in my header including only a.h is good practice. If I were to declare anything from b.h in my header then I should include both a.h and b.h to make my header self-sufficient.
In my header I do declare both class A from a.h and class B from b.h. However, class A depends on class B, so I always use them together. I never use class B independently of class A. In this case does it still make sense to include both a.h and b.h?
a.h
#include "b.h"
#include <queue>
class A
{
private:
std::queue<B> mFoo;
}
In my actual code I think it makes my intention clearer when I include just my event system for example and not a number of includes that seem superfluous to me.
You should always include direct dependencies: if a depends directly on b and c, even if b already includes c, a should include both.
Be explicit about your dependencies. You should not rely on the fact that some other module already depends on the module you'd have to include. You never know how the dependency tree will change in the future.
In this regard, having implicit dependencies makes your code fragile.
To make sure you don't redefine something by including it twice, you can put header guard inside your header files:
#ifndef GRANDFATHER_H
#define GRANDFATHER_H
struct foo {
int member;
};
#endif /* GRANDFATHER_H */
You could also use
#pragma once
But not every compiler supports it.
That way even if the preprocessor will step on same include more than once, It won't re-include the code.
My rule: Given some random header foo.h, the following should compile clean, and preferably should link and run as well:
#include "foo.h"
int main () {}
Suppose foo.h doesn't contain any syntax errors (i.e., a source file that includes it in the context you intended compiles clean) but the above nonetheless doesn't compile. This almost always means there are some missing #include directives in foo.h.
What this doesn't tell you is whether all of the headers included in foo.h truly are needed. I have a handy little script that checks the above. If it passes, it creates a copy of foo.h and progressively comments out #include directives and sees if those modified versions of foo.h pass the above test. Any that do pass are indicative of suspect superfluous #include directives.
The problem is that the #include directives deemed to be superfluous might not be superfluous. Suppose foo.h defines class Foo, which has data members of type Bar and Baz (not pointers, members). The header foo.h correctly includes both bar.h and baz.h because that is where those two classes are defined. Suppose bar.h happens to include baz.h. As the author of foo.h, I don't care. I should still include both of those headers in foo.h because foo.h has direct dependencies on both of those other headers. My script will complain about suspect superfluous headers in this case. Those complaints are hints, not mandates.
On the other hand, fixing foo.h so that my simple test program above compiles clean is a mandate.

Forward declaration / when best to include headers?

I'm pretty clear on when I can/can't use forward declaration but I'm still not sure about one thing.
Let's say I know that I have to include a header sooner or later to de-reference an object of class A.
I'm not clear on whether it's more efficient to do something like..
class A;
class B
{
A* a;
void DoSomethingWithA();
};
and then in the cpp have something like..
#include "A.hpp"
void B::DoSomethingWithA()
{
a->FunctionOfA();
}
Or might I as well just include A's header in B's header file in the first place?
If the former is more efficient then I'd appreciate it if someone clearly explained why as I suspect it has something to do with the compilation process which I could always do with learning more about.
Use forward declarations (as in your example) whenever possible. This reduces compile times, but more importantly minimizes header and library dependencies for code that doesn't need to know and doesn't care for implementation details. In general, no code other than the actual implementation should care about implementation details.
Here is Google's rationale on this: Header File Dependencies
When you use forward declaration, you explicitly say with it "class B doesn't need to know anything about internal implementation of class A, it only needs to know that class named A exists". If you can avoid including that header, then avoid it. - it's good practice to use forward declaration instead because you eliminate redundant dependencies by using it.
Also note, that when you change the header file, it causes all files that include it to be recompiled.
These questions will also help you:
What are the drawbacks of forward declaration?
What is the purpose of forward declaration?
Don't try to make your compilation efficient. There be dragons. Just include A.hpp in B.hpp.
The standard practice for C and C++ header files is to wrap all of the header file in an #ifndef to make sure it is compiled only once:
#ifndef _A_HPP_
#define _A_HPP_
// all your definitions
#endif
That way, if you #include "A.hpp" in B.hpp, you can have a program that includes both, and it won't break because it won't try to define anything twice.

#import in header file or implementation file

Some have the habit of adding header file imports/includes to the header file. Some on the other hand, write a forward declaration in the header file and write the actual #include or #import lines in the implementation file.
Is there a standard practice for this? Which is better and why?
Given X.h and X.c, if you #include everything from X.h then clients of "X" that #include <X.h> will also include all those headers, even though some may only be needed in X.c.
X.h should include only what's needed to parse X.h. It should assume no other headers will have been included by the translation unit to ensure that reordering inclusions won't break a client. X.c should include any extras needed for implementation.
This minimises recompilation dependencies. You don't want a change only to the implementation to affect the header and hence trigger a client recompilation. You should simply include directly from X.c.
Including forward references instead of headers is necessary when classes have shallow dependencies. For example:
A.h
#include "B.h"
class A {
B* pointer;
};
B.h
#include "A.h"
class B {
A* pointer;
};
Will break at compilation.
A.h
class B;
class A {
B* pointer;
};
B.h
class A;
class B {
A* pointer;
};
Will work as each class only needs to know the other class exists at the declaration.
I write my imports in header files, so that every implementation file has only one inclusion directive. This also has the advantage of hiding dependencies from the user of the module's code.
However, that same hiding has a disadvantage: your module's user may be importing all kinds of other headers included in your header that he may not need at all. From that point of view, it's better to have the inclusion directives in the implementation file, even if it means manually resolving dependencies, because it leads to lighter code.
I don't think there's a single answer. Considering the reasons I gave, I prefer the first approach, I think it leads to cleaner code (albeit heavier and possibly with unnecessary imports).
I don't remember who I'm quoting (and thus the phrase is not exact), but I always remember reading: "programs are written for human beings to read, and ocasionally for computers to execute". I don't particularly care if there are a few kilobytes of code the user of my module won't need, as long as he can cleanly, easily import it and use it with a single directive.
Again, I think it's a matter of taste, unless there's something I failed to consider. Comments are more than welcome in that case!
Cheers.