I have a static library that I am building in C++. I have separated it into many header and source files. I am wondering if it's better to include all of the headers that a client of the library might need in one header file that they in turn can include in their source code or just have them include only the headers they need? Will that cause the code to be unecessary bloated? I wasn't sure if the classes or functions that don't get used will still be compiled into their products.
Thanks for any help.
Keep in mind that each source file that you compile involves an independent invocation of the compiler. With each invocation, the compiler has to read in every included header file, parse through it, and build up a symbol table.
When you use one of these "include the world" header files in lots of your source files, it can significantly impact your build time.
There are ways to mitigate this; for example, Microsoft has a precompiled header feature that essentially saves out the symbol table for subsequent compiles to use.
There is another consideration though. If I'm going to use your WhizzoString class, I shouldn't have to have headers installed for SOAP, OpenGL, and what have you. In fact, I'd rather that WhizzoString.h only include headers for the types and symbols that are part of the public interface (i.e., the stuff that I'm going to need as a user of your class).
As much as possible, you should try to shift includes from WhizzoString.h to WhizzoString.cpp:
OK:
// Only include the stuff needed for this class
#include "foo.h" // Foo class
#include "bar.h" // Bar class
public class WhizzoString
{
private Foo m_Foo;
private Bar * m_pBar;
.
.
.
}
BETTER:
// Only include the stuff needed by the users of this class
#include "foo.h" // Foo class
class Bar; // Forward declaration
public class WhizzoString
{
private Foo m_Foo;
private Bar * m_pBar;
.
.
.
}
If users of your class never have to create or use a Bar type, and the class doesn't contain any instances of Bar, then it may be sufficient to provide only a forward declaration of Bar in the header file (WhizzoString.cpp will have #include "bar.h"). This means that anyone including WhizzoString.h could avoid including Bar.h and everything that it includes.
In general, when linking the final executable, only the symbols and functions that are actually used by the program will be incorporated. You pay only for what you use. At least that's how the GCC toolchain appears to work for me. I can't speak for all toolchains.
If the client will always have to include the same set of header files, then it's okay to provide a "convience" header file that includes others. This is common practice in open-source libraries. If you decide to provide a convenience header, make it so that the client can also choose to include specifically what is needed.
To reduce compile times in large projects, it's common practice to include the least amount of headers as possible to make a unit compile.
what about giving both choices:
#include <library.hpp> // include everything
#include <library/module.hpp> // only single module
this way you do not have one huge include file, and for your separate files, they are stacked neatly in one directory
It depends on the library, and how you've structured it. Remember that header files for a library, and which pieces are in which header file, are essentially part of the API of the library. So, if you lead your clients to carefully pick and choose among your headers, then you will need to support that layout for a long time. It is fairly common for libraries to export their whole interface via one file, or just a few files, if some part of the API is truly optional and large.
A consideration should be compilation time: If the client has to include two dozen files to use your library, and those includes have internal includes, it can significantly increase compilation time in a big project, if used often. If you go this route, be sure all your includes have proper include guards around not only the file contents, but the including line as well. Though note: Modern GCC does a very good job of this particular issue and only requires the guards around the header's contents.
As to bloating the final compiled program, it depends on your tool chain, and how you compiled the library, not how the client of the library included header files. (With the caveat that if you declare static data objects in the headers, some systems will end up linking in the objects that define that data, even if the client doesn't use it.)
In summary, unless it is a very big library, or a very old and cranky tool chain, I'd tend to go with the single include. To me, freezing your current implementation's division into headers into the library's API is bigger worry than the others.
The problem with single file headers is explained in detail by Dr. Dobbs, an expert compiler writer. NEVER USE A SINGLE FILE HEADER!!! Each time a header is included in a .cc/.cpp file it has to be recompiled because you can feed the file macros to alter the compiled header. For this reason, a single header file will dramatically increase compile time without providing any benifit. With C++ you should optimize for human time first, and compile time is human time. You should never, because it dramatically increases compile time, include more than you need to compile in any header, each translation unit(TU) should have it's own implementation (.cc/.cpp) file, and each TU named with unique filenames;.
In my decade of C++ SDK development experience, I religiously ALWAYS have three files in EVERY module. I have a config.h that gets included into almost every header file that contains prereqs for the entire module such as platform-config and stdint.h stuff. I also have a global.h file that includes all of the header files in the module; this one is mostly for debugging (hint enumerate your seams in the global.h file for better tested and easier to debug code). The key missing piece here is that ou should really have a public.h file that includes ONLY your public API.
In libraries that are poorly programmed, such as boost and their hideous lower_snake_case class names, they use this half-baked worst practice of using a detail (sometimes named 'impl') folder design pattern to "conceal" their private interface. There is a long background behind why this is a worst practice, but the short story is that it creates an INSANE amount of redundant typing that turns one-liners into multi-liners, and it's not UML compliant and it messes up the UML dependency diagram resulting in overly complicated code and inconsistent design patterns such as children actually being parents and vice versa. You don't want or need a detail folder, you need to use a public.h header with a bunch of sibling modules WITHOUT ADDITIONAL NAMESPACES where your detail is a sibling and not a child that is in reatliy a parent. Namespaces are supposed to be for one thing and one thing only: to interface your code with other people's code, but if it's your code you control it and you should use unique class and funciton names because it's bad practice to use a namesapce when you don't need to because it may cause hash table collision that slow downt he compilation process. UML is the best pratice, so if you can organize your headers so they are UML compliant then your code is by definition more robust and portable. A public.h file is all you need to expose only the public API; thanks.
Related
I am trying to write my first (very) small, for now only self-use, library. In the process I came across questions regarding how I should separate my headers/source code/object files in a logical way.
To be more specific, I'm writing a small templated container class, so for one I have to include the implementation of this class inside its header.
I have a directory structure like this:
include/ - "public" .hh header files included by extern projects
src/ - .cc files for implementation (+ "private" .hh header files?)
lib/ - .o compiled library files linked by extern projects
I am already not sure if this makes sense.. in my case I also wrote some helper-classes used by my templated container class, one of which is something like an iterator. So I have the following files:
container.hh
container.cc
container_helper.hh
container_helper.cc
container_iterator.cc
container_iterator.hh
While I want to have access to their functions in external projects (e.g. incrementing the iterator), it makes no sense to me that a project would specifically
#include "container_iterator.hh"
Now, since I want projects to be able to use the container class, I put "container.hh" and "container.cc" (since it must be included in "container.hh" because of the template) into the "include/" directory, which is then included by other projects.
Now my confusion arises.. the container class needs access to the helper classes, but I don't want other projects to only include the helper classes, so it seems wrong to place also the helper classes into "include/" directory. Instead, I would place them in "src/".
But if I do this, then to include these in "include/container.cc" I have to use relative filepath
#include "../src/container_iterator.hh"
But now if I "distribute" my library to an external project, i.e. I only make the "include/" directory visible to the compiler, it will not compile (?), since "../src/container_iterator.hh" does not exist.
Or do I compile the container class and put it as library into "lib/", which is then linked by other projects? But even then do I not still need to include the header "container.hh", to be able to find the function declarations, which leads to the same problem?
Basically I'm lost here.. how does the standard do this? E.g. I can
#include <vector>
, but I don't know of any header to only include std::vector::iterator, which would make no sense to do so.
At some point in my expanation I must be talking nonsense but I cannot find where. I think I understand what a header and a library is/should be, but when it comes to how to design and/or "distribute" them for an actual project, I am stuck. I keep coming across problems like this even when I started learning C++ (or any language for that matter), no course / no book ever seems to explain how to implement all these concepts, only how to use them when they already exist..
Edit
To clarify my confusion (?) more.. (this got a bit too long for a comment) I did read before to put implementation of templated classes into the header, which is why I realized I need to at least put the "container.cc" into the include/ dir. While I don't particularly like this, at least it should be clear to an external user to not include ".cc" files.
Should I take this also as meaning that it never makes sense to compile templated classes into a library, since all of it will be always included?
(So templated code is always open-source? ..that sounds wrong?)
And in this case I still wonder how STL does it, does vector declare & define its iterator in its own header? Or is there a separate header for vector::iterator I could include, it just would make no sense to do so?
Hopefully I explained my intent clearly, please comment if not.
Thanks for any help!
In my experience, the most common method to handle your problems is to have headers with the template declarations and documentation (your .hh files), which also include .inc or .tcc (your preference) files with the template definitions. I also suggest keeping all files that may be included by external projects in the same folder, but if you want to keep things clean, put your .inc/.tcc files in a folder within include called detail (or bits if you like GNU style).
Putting stuff in a detail folder,
and using a weird extension should deter users enough.
To answer your other questions:
Due to the nature of C++ templates,
either the entire source of the parts of a template you use
must be present in the translation unit (ie. #include'd),
or you can use explicit instantiation
for a finite number of arguments
(this is not generally useful for a container, though).
So, for most purposes, you have to distribute a template's source, though,
of course, (as mentioned in the comments) "open source" is about licence,
not source visibility.
As for the standard libraries, lets use <vector> as an example.
The GNU C++ Library has a vector file that (among other things) includes
bits/stl_vector.h which has the declarations & documentation,
and includes a bits/vector.tcc that has the definitions.
LLVM's libc++ just has one giant file,
but puts the declarations at the top (without documentation!)
and all the definitions at the bottom.
As a final note, there are lots of open source C++ libraries that you can take a look at for inspiration in the future!
I have seen many explanations on when to use forward declarations over including header files, but few of them go into why it is important to do so. Some of the reasons I have seen include the following:
compilation speed
reducing complexity of header file management
removing cyclic dependencies
Coming from a .net background I find header management frustrating. I have this feeling I need to master forward declarations, but I have been scrapping by on includes so far.
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
I can buy the argument for reduced complexity, but what would a practical example of this be?
"to master forward declarations" is not a requirement, it's a useful guideline where possible.
When a header is included, and it pulls in more headers, and yet more, the compiler has to do a lot of work processing a single translation module.
You can see how much, for example, with gcc -E:
A single #include <iostream> gives my g++ 4.5.2 additional 18,560 lines of code to process.
A #include <boost/asio.hpp> adds another 74,906 lines.
A #include <boost/spirit/include/qi.hpp> adds 154,024 lines, that's over 5 MB of code.
This adds up, especially if carelessly included in some file that's included in every file of your project.
Sometimes going over old code and pruning unnecessary includes improves the compilation dramatically just because of that. Replacing includes with forward declarations in the translation modules where only references or pointers to some class are used, improves this even further.
Why cannot the compiler work for me and figure out my dependencies using one mechanism (includes)?
It cannot because, unlike some other languages, C++ has an ambiguous grammar:
int f(X);
Is it a function declaration or a variable definition? To answer this question the compiler must know what does X mean, so X must be declared before that line.
Because when you're doing something like this :
bar.h :
class Bar {
int foo(Foo &);
}
Then the compiler does not need to know how the Foo struct / class is defined ; so importing the header that defines Foo is useless. Moreover, importing the header that defines Foo might also need importing the header that defines some other class that Foo uses ; and this might mean importing the header that defines some other class, etc.... turtles all the way.
In the end, the file that the compiler is working against is almost like the result of copy pasting all the headers ; so it will get big for no good reason, and when someone makes a typo in a header file that you don't need (or import , or something like that), then compiling your class starts to take waaay too much time (or fail for no obvious reason).
So it's a good thing to give as little info as needed to the compiler.
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
1) reduced disk i/o (fewer files to open, fewer times)
2) reduced memory/cpu usage
most translations need only a name. if you use/allocate the object, you'll need its declaration.
this is probably where it will click for you: each file you compile compiles what is visible in its translation.
a poorly maintained system will end up including a ton of stuff it does not need - then this gets compiled for every file it sees. by using forwards where possible, you can bypass that, and significantly reduce the number of times a public interface (and all of its included dependencies) must be compiled.
that is to say: the content of the header won't be compiled once. it will be compiled over and over. everything in this translation must be parsed, checked that it's a valid program, checked for warnings, optimized, etc. many, many times.
including lazily only adds significant disk/cpu/memory increase, which turns into intolerable build times for you, while introducing significant dependencies (in non-trivial projects).
I can buy the argument for reduced complexity, but what would a practical example of this be?
unnecessary includes introduce dependencies as side effects. when you edit an include (necessary or not), then every file which includes it must be recompiled (not trivial when hundreds of thousands of files must be unnecessarily opened and compiled).
Lakos wrote a good book which covers this in detail:
http://www.amazon.com/Large-Scale-Software-Design-John-Lakos/dp/0201633620/ref=sr_1_1?ie=UTF8&s=books&qid=1304529571&sr=8-1
Header file inclusion rules specified in this article will help reduce the effort in managing header files.
I used forward declarations simply to reduce the amount of navigation between source files done. e.g. if module X calls some glue or interface function F in module Y, then using a forward declaration means the writing the function and the call can be done by only visiting 2 places, X.c and Y.c not so much of an issue when a good IDE helps you navigate, but I tend to prefer coding bottom-up creating working code then figuring out how to wrap it rather than through top down interface specification.. as the interfaces themselves evolve it's handy to not have to write them out in full.
In C (or c++ minus classes) it's possible to truly keep structure details Private by only defining them in the source files that use them, and only exposing forward declarations to the outside world - a level of black boxing that requires performance-destroying virtuals in the c++/classes way of doing things. It's also possible to avoid needing to prototype things (visiting the header) by listing 'bottom-up' within the source files (good old static keyword).
The pain of managing headers can sometimes expose how modular your program is or isn't - if its' truly modular, the number of headers you have to visit and the amount of code & datastructures declared within them should be minimized.
Working on a big project with 'everything included everywhere' through precompiled headers won't encourage this real modularity.
module dependancies can correlate with data-flow relating to performance issues, i.e. both i-cache & d-cache issues. If a program involves many modules that call each other & modify data at many random places, it's likely to have poor cache-coherency - the process of optimizing such a program will often involve breaking up passes and adding intermediate data.. often playing havoc with many'class diagrams'/'frameworks' (or at least requiring the creation of many intermediates datastructures). Heavy template use often means complex pointer-chasing cache-destroying data structures. In its optimized state, dependancies & pointer chasing will be reduced.
I believe forward declarations speed up compilation because the header file is ONLY included where it is actually used. This reduces the need to open and close the file once. You are correct that at some point the object referenced will need to be compiled, but if I am only using a pointer to that object in my other .h file, why actually include it? If I tell the compiler I am using a pointer to a class, that's all it needs (as long as I am not calling any methods on that class.)
This is not the end of it. Those .h files include other .h files... So, for a large project, opening, reading, and closing, all the .h files which are included repetitively can become a significant overhead. Even with #IF checks, you still have to open and close them a lot.
We practice this at my source of employment. My boss explained this in a similar way, but I'm sure his explanation was more clear.
How do forward declarations speed up compilations since at some point the object referenced will need to be compiled?
Because include is a preprocessor thing, which means it is done via brute force when parsing the file. Your object will be compiled once (compiler) then linked (linker) as appropriate later.
In C/C++, when you compile, you've got to remember there is a whole chain of tools involved (preprocessor, compiler, linker plus build management tools like make or Visual Studio, etc...)
Good and evil. The battle continues, but now on the battle field of header files. Header files are a necessity and a feature of the language, but they can create a lot of unnecessary overhead if used in a non optimal way, e.g. not using forward declarations etc.
How do forward declarations speed up
compilations since at some point the
object referenced will need to be
compiled?
I can buy the argument for reduced
complexity, but what would a practical
example of this be?
Forward declarations are bad ass. My experience is that a lot of c++ programmers are not aware of the fact that you don't have to include any header file, unless you actually want to use some type, e.g. you need to have the type defined so the compiler understands what you want to do. It's important to try and refrain from including header files in other header files.
Just passing around a pointer from one function to another, only requires a forward declaration:
// someFile.h
class CSomeClass;
void SomeFunctionUsingSomeClass(CSomeClass* foo);
Including someFile.h does not require you to include the header file of CSomeClass, since you are merely passing a pointer to it, not using the class. This means that the compiler only needs to parse one line (class CSomeClass;) instead of an entire header file (that might be chained to other header files etc etc).
This reduces both compile time and link time, and we are talking big optimizations here if you have many headers and many classes.
Should every .C or .cpp file should have a header (.h) file for it?
Suppose there are following C files :
Main.C
Func1.C
Func2.C
Func3.C
where main() is in Main.C file. Should there be four header files
Main.h
Func1.h
Func2.h
Func3.h
Or there should be only one header file for all .C files?
What is a better approach?
For a start, it would be unusual to have a main.h since there's usually nothing that needs to be exposed to the other compilation units at compile time. The main() function itself needs to be exposed for the linker or start-up code but they don't use header files.
You can have either one header file per C file or, more likely in my opinion, a header file for a related group of C files.
One example of that is if you have a BTree implementation and you've put add, delete, search and so on in their own C files to minimise recompilation when the code changes.
It doesn't really make sense in that case to have separate header files for each C file, as the header is the API. In other words, it's the view of the library as seen by the user. People who use your code generally care very little about how you've structured your source code, they just want to be able to write as little code as possible to use it.
Forcing them to include multiple distinct header files just so they can create, insert into, delete from, and search, a tree, is likely to have them questioning your sanity :-)
You would be better off with one btree.h file and a single btree.lib file containing all of the BTree object files that were built from the individual C files.
Another example can be found in the standard C headers.
We don't know for certain whether there are multiple C files for all the stdio.h functions (that's how I'd do it but it's not the only way) but, even if there were, they're treated as a unit in terms of the API.
You don't have to include stdio_printf.h, stdio_fgets.h and so on - there's a single stdio.h for the standard I/O part of the C runtime library.
Header files are not mandatory.
#include simply copy/paste whatever file included (including .c source files)
Commonly used in real life projects are global header files like config.h and constants.h that contains commonly used information such as compile-time flags and project wide constants.
A good design of a library API would be to expose an official interface with one set of header files and use an internal set of header files for implementation with all the details. This adds a nice extra layer of abstraction to a C library without adding unnecessary bloat.
Use common sense. C/C++ is not really for the ones without it.
I used to follow the "it depends" trend until I realized that consistency, uniformity and simplicity are more important than saving the effort to create a file, and that "standards are good even when they are bad".
What I mean is the following: a .cpp/.h pair of files is pretty much what all "modules" end up anyway. Making the existing of both a requirement saves a lot of confusion and bad engineering.
For instance, when I see some interface of something in a header file, I know exactly where to search for / place its implementation. Conversely, if I need to expose the interface of something that was previously hidden in .cpp file (e.g. static function becoming global), I know exactly where to put it.
I've seen too many bad consequences of not following this simple rule. Unnecessary inline functions, breaking any kind of rules about encapsulation, (non)separation of the interface and implementation, misplaced code, to name a few -- all due to the fact that the appropriate sibling header or cpp file was never added.
So: always define both .h and .c files. Make it a standard, follow it, and safely rely on it. Life is much simpler this way, and simplicity is the most important thing in software.
Generally it's best to have a header file for each .c file, containing the declarations for functions etc in the .c file that you want to expose. That way, another .c file can include the .h file for the functions it needs, and won't need to be recompiled if a header file it didn't include got changed.
Generally there will be one .h file for each .c/.cpp file.
Bjarne Stroustrup Explains it beautifully in his book "The C++ Programming Language"....
The single header style of physical partitioning is most useful when the program is small and its parts are not intended for separate use. When namespaces are used, the logical structure of the program can still be explained in a single header file.
For larger Programs, the single header file approach is unworkable in a conventional file-based development environment. A change to the common header forces recompilation of the whole program, and updates of that single header by several programmers are error prone. Unless strong emphasis is placed on programming styles relying heavily on namespaces and classes, the logical structure deteriorates as program grows.
An alternative physical organization lets each logical module have its own header defining the facilities it provides. Each .c file then has a corresponding h. file specifying what it provides(its interface). Each .c module includes its own .h file and usually also other .h files that specifies what it needs from other modules in order to implement the services advertised in its interface. This physical organization corresponds to the logical organization of a module. The multiple header approach makes it easy to determine the dependencies. The single header approach forces us to look at every declarations used by any module and decide if its relevant. The simple fact is that maintenance of a code is invariably done with incomplete information and from a local perspective.
The better localization leads to less information to compile a module and thus faster compilation..
It depends. Usually your reason for having separate .c files will dictate whether you need separate .h files.
Generally cpp/c files are for implementation and h/hpp (hpp are not used often) files are for header files (prototypes and declarations only). Cpp files don't always have to have a header file associated with it but it usually does as the header file acts like a bridge between cpp files so each cpp file can use code from another cpp file.
One thing that should be strongly enforced is the no use of code within a header file! There's been too many times where header files break compiles in any size project because of redefinitions. And that's simply when you include the header file in 2 different cpp files. Header files should always be designed to be included multiple times as well. Cpp files should never be included.
It's all about what code needs to be aware of what other code. You want to reduce the amount other files are aware of to the bare minimum for them to do their jobs.
They need to know that a function exists, what types they need to pass into it, and what types it will return, but not what it's doing internally. Note that it's also important from the programmers point of view to know what those types actually mean. (e.g which int is the row, and which is the column) but the code itself doesn't care. This is why naming the function and parameters sensibly is worthwhile.
As others have said, if there's nothing in a cpp file worth exposing to other parts of the code, as is normally the case with main.c, then there's no need for a header file.
It's occasionally worth putting everything you want to expose in a single header file (e.g, Func1and2and3.h), so that anything that knows about Func1 knows about Func2 as well, but I'm personally not keen on this, as it means that you tend to load a hell of a lot of junk along with the stuff you actually want.
Summary:
Imagine that you trust that someone can write code and that their algorithms, design, etc. are all good. You want to use code they've written. All you need to know is what to give them to get something to happen, what you should give it to, and what you'll get back. That's what needs to go in the header files.
I like putting interfaces into header files and implementation in cpp files. I don't like writing C++ where I need to add member variables and prototypes to the header and then the method again in the C++. I prefer something like:
module.h
struct IModuleInterface : public IUnknown
{
virtual void SomeMethod () = 0;
}
module.cpp
class ModuleImpl : public IModuleInterface,
public CObject // a common object to do the reference
// counting stuff for IUnknown (so we
// can stick this object in a smart
// pointer).
{
ModuleImpl () : m_MemberVariable (0)
{
}
int m_MemberVariable;
void SomeInternalMethod ()
{
// some internal code that doesn't need to be in the interface
}
void SomeMethod ()
{
// implementation for the method in the interface
}
// whatever else we need
};
I find this is a really clean way of separating implementation and interface.
There is no better approach, only common and less common cases.
The more common case is when you have a class/function interface to declare/define. It's better to have only one .cpp/.c with the definitions, and one header for the declarations.
Giving them the same name makes easy to understand that they are directly related.
But that's not a "rule", that's the common way and the most efficient in almost all cases.
Now in some cases( like template classes or some tiny struct definition ) you'll not need any .c/.cpp file, just the header. We often have some virtual class interface definition in only a header file for example, with only virtual pure functions or trivial functions.
And in other rare cases (like an hypothetical main.c/.cpp file) if wouldn't be always required to allow code from external compilation unit to call the function of a given compilation unit. The main function is an example (no header/declaration needed), but there are others, mostly when it's code that "connect all the other parts together" and is not called by other parts of the application. That's very rare but in this case a header make no sense.
If your file exposes an interface - that is, if it has functions which will be called from other files - then it should have a header file. Otherwise, it shouldn't.
As already noted, generally, there will be one header (.h) file for each source (.c or .cpp) file.
However, you should look at the cohesiveness of the files. If the various source files provide separate, individually reusable sets of functions - an ideal organization - then you should certainly have one header per file. If, however, the three source files provide a composite set of functions (that is too big to fit into one file), then you would use a more complex organization. There would be one header for the external services used by the main program - and that would be used by other programs needing the same services. There would also be a second header used by the cooperating source files that provides 'internal' definitions shared by those files.
(Also noted by Pax): The main program does not normally need its own header - no other source code should be using the services it provides; it uses the services provided by other files.
If you want your compiled code to be used from another compilation unit you will need the header files. There are some situations for which you do now need/want to have a headers.
The first such case are main.c/cpp files. This class is not meant to be included and as such there is no need for a header file.
In some cases you can have a header file that defines behavior of a set of different implementations that are loaded through a dll that is loaded at runtime. There will be different set of .c/.cpp files that implement variations of the same header. This can be common in plugin systems.
In general, I don't think there is any explicit relationship between .h and .c files. In many cases (probably most), a unit of code is a library of functionality with a public interface (.h) and an opaque implementation (.c). Sometimes a number of symbols are needed, like enums or macros, and you get a .h with no corresponding .c and in a few circumstances, you will have a lump of code with no public interface and no corresponding .h
in particular, there are a number of times when, for the sake of readability, the headers or implementations (seldom both) are so big and hairy that they end up being broken into many smaller files, for the sake of the programmer's sanity.
My personal style with C++ has always to put class declarations in an include file, and definitions in a .cpp file, very much like stipulated in Loki's answer to C++ Header Files, Code Separation. Admittedly, part of the reason I like this style probably has to do with all the years I spent coding Modula-2 and Ada, both of which have a similar scheme with specification files and body files.
I have a coworker, much more knowledgeable in C++ than I, who is insisting that all C++ declarations should, where possible, include the definitions right there in the header file. He's not saying this is a valid alternate style, or even a slightly better style, but rather this is the new universally-accepted style that everyone is now using for C++.
I'm not as limber as I used to be, so I'm not really anxious to scrabble up onto this bandwagon of his until I see a few more people up there with him. So how common is this idiom really?
Just to give some structure to the answers: Is it now The Way™, very common, somewhat common, uncommon, or bug-out crazy?
Your coworker is wrong, the common way is and always has been to put code in .cpp files (or whatever extension you like) and declarations in headers.
There is occasionally some merit to putting code in the header, this can allow more clever inlining by the compiler. But at the same time, it can destroy your compile times since all code has to be processed every time it is included by the compiler.
Finally, it is often annoying to have circular object relationships (sometimes desired) when all the code is the headers.
Bottom line, you were right, he is wrong.
EDIT: I have been thinking about your question. There is one case where what he says is true. templates. Many newer "modern" libraries such as boost make heavy use of templates and often are "header only." However, this should only be done when dealing with templates as it is the only way to do it when dealing with them.
EDIT: Some people would like a little more clarification, here's some thoughts on the downsides to writing "header only" code:
If you search around, you will see quite a lot of people trying to find a way to reduce compile times when dealing with boost. For example: How to reduce compilation times with Boost Asio, which is seeing a 14s compile of a single 1K file with boost included. 14s may not seem to be "exploding", but it is certainly a lot longer than typical and can add up quite quickly when dealing with a large project. Header only libraries do affect compile times in a quite measurable way. We just tolerate it because boost is so useful.
Additionally, there are many things which cannot be done in headers only (even boost has libraries you need to link to for certain parts such as threads, filesystem, etc). A Primary example is that you cannot have simple global objects in header only libs (unless you resort to the abomination that is a singleton) as you will run into multiple definition errors. NOTE: C++17's inline variables will make this particular example doable in the future.
As a final point, when using boost as an example of header only code, a huge detail often gets missed.
Boost is library, not user level code. so it doesn't change that often. In user code, if you put everything in headers, every little change will cause you to have to recompile the entire project. That's a monumental waste of time (and is not the case for libraries that don't change from compile to compile). When you split things between header/source and better yet, use forward declarations to reduce includes, you can save hours of recompiling when added up across a day.
The day C++ coders agree on The Way, lambs will lie down with lions, Palestinians will embrace Israelis, and cats and dogs will be allowed to marry.
The separation between .h and .cpp files is mostly arbitrary at this point, a vestige of compiler optimizations long past. To my eye, declarations belong in the header and definitions belong in the implementation file. But, that's just habit, not religion.
Code in headers is generally a bad idea since it forces recompilation of all files that includes the header when you change the actual code rather than the declarations. It will also slow down compilation since you'll need to parse the code in every file that includes the header.
A reason to have code in header files is that it's generally needed for the keyword inline to work properly and when using templates that's being instanced in other cpp files.
What might be informing you coworker is a notion that most C++ code should be templated to allow for maximum usability. And if it's templated, then everything will need to be in a header file, so that client code can see it and instantiate it. If it's good enough for Boost and the STL, it's good enough for us.
I don't agree with this point of view, but it may be where it's coming from.
I think your co-worker is smart and you are also correct.
The useful things I found that putting everything into the headers is that:
No need for writing & sync headers and sources.
The structure is plain and no circular dependencies force the coder to make a "better" structure.
Portable, easy to embedded to a new project.
I do agree with the compiling time problem, but I think we should notice that:
The change of source file are very likely to change the header files which leads to the whole project be recompiled again.
Compiling speed is much faster than before. And if you have a project to be built with a long time and high frequency, it may indicates that your project design has flaws. Seperate the tasks into different projects and module can avoid this problem.
Lastly I just wanna support your co-worker, just in my personal view.
Often I'll put trivial member functions into the header file, to allow them to be inlined. But to put the entire body of code there, just to be consistent with templates? That's plain nuts.
Remember: A foolish consistency is the hobgoblin of little minds.
As Tuomas said, your header should be minimal. To be complete I will expand a bit.
I personally use 4 types of files in my C++ projects:
Public:
Forwarding header: in case of templates etc, this file get the forwarding declarations that will appear in the header.
Header: this file includes the forwarding header, if any, and declare everything that I wish to be public (and defines the classes...)
Private:
Private header: this file is a header reserved for implementation, it includes the header and declares the helper functions / structures (for Pimpl for example or predicates). Skip if unnecessary.
Source file: it includes the private header (or header if no private header) and defines everything (non-template...)
Furthermore, I couple this with another rule: Do not define what you can forward declare. Though of course I am reasonable there (using Pimpl everywhere is quite a hassle).
It means that I prefer a forward declaration over an #include directive in my headers whenever I can get away with them.
Finally, I also use a visibility rule: I limit the scopes of my symbols as much as possible so that they do not pollute the outer scopes.
Putting it altogether:
// example_fwd.hpp
// Here necessary to forward declare the template class,
// you don't want people to declare them in case you wish to add
// another template symbol (with a default) later on
class MyClass;
template <class T> class MyClassT;
// example.hpp
#include "project/example_fwd.hpp"
// Those can't really be skipped
#include <string>
#include <vector>
#include "project/pimpl.hpp"
// Those can be forward declared easily
#include "project/foo_fwd.hpp"
namespace project { class Bar; }
namespace project
{
class MyClass
{
public:
struct Color // Limiting scope of enum
{
enum type { Red, Orange, Green };
};
typedef Color::type Color_t;
public:
MyClass(); // because of pimpl, I need to define the constructor
private:
struct Impl;
pimpl<Impl> mImpl; // I won't describe pimpl here :p
};
template <class T> class MyClassT: public MyClass {};
} // namespace project
// example_impl.hpp (not visible to clients)
#include "project/example.hpp"
#include "project/bar.hpp"
template <class T> void check(MyClass<T> const& c) { }
// example.cpp
#include "example_impl.hpp"
// MyClass definition
The lifesaver here is that most of the times the forward header is useless: only necessary in case of typedef or template and so is the implementation header ;)
To add more fun you can add .ipp files which contain the template implementation (that is being included in .hpp), while .hpp contains the interface.
As apart from templatized code (depending on the project this can be majority or minority of files) there is normal code and here it is better to separate the declarations and definitions. Provide also forward-declarations where needed - this may have effect on the compilation time.
Generally, when writing a new class, I will put all the code in the class, so I don't have to look in another file for it.. After everything is working, I break the body of the methods out into the cpp file, leaving the prototypes in the hpp file.
I personally do this in my header files:
// class-declaration
// inline-method-declarations
I don't like mixing the code for the methods in with the class as I find it a pain to look things up quickly.
I would not put ALL of the methods in the header file. The compiler will (normally) not be able to inline virtual methods and will (likely) only inline small methods without loops (totally depends on the compiler).
Doing the methods in the class is valid... but from a readablilty point of view I don't like it. Putting the methods in the header does mean that, when possible, they will get inlined.
I think that it's absolutely absurd to put ALL of your function definitions into the header file. Why? Because the header file is used as the PUBLIC interface to your class. It's the outside of the "black box".
When you need to look at a class to reference how to use it, you should look at the header file. The header file should give a list of what it can do (commented to describe the details of how to use each function), and it should include a list of the member variables. It SHOULD NOT include HOW each individual function is implemented, because that's a boat load of unnecessary information and only clutters the header file.
If this new way is really The Way, we might have been running into different direction in our projects.
Because we try to avoid all unnecessary things in headers. That includes avoiding header cascade. Code in headers will propably need some other header to be included, which will need another header and so on. If we are forced to use templates, we try avoid littering headers with template stuff too much.
Also we use "opaque pointer"-pattern when applicable.
With these practices we can do faster builds than most of our peers. And yes... changing code or class members will not cause huge rebuilds.
I put all the implementation out of the class definition. I want to have the doxygen comments out of the class definition.
IMHO, He has merit ONLY if he's doing templates and/or metaprogramming. There's plenty of reasons already mentioned that you limit header files to just declarations. They're just that... headers. If you want to include code, you compile it as a library and link it up.
Doesn't that really depends on the complexity of the system, and the in-house conventions?
At the moment I am working on a neural network simulator that is incredibly complex, and the accepted style that I am expected to use is:
Class definitions in classname.h
Class code in classnameCode.h
executable code in classname.cpp
This splits up the user-built simulations from the developer-built base classes, and works best in the situation.
However, I'd be surprised to see people do this in, say, a graphics application, or any other application that's purpose is not to provide users with a code base.
Template code should be in headers only. Apart from that all definitions except inlines should be in .cpp. The best argument for this would be the std library implementations which follow the same rule. You would not disagree the std lib developers would be right regarding this.
I think your co-worker is right as long as he does not enter in the process to write executable code in the header.
The right balance, I think, is to follow the path indicated by GNAT Ada where the .ads file gives a perfectly adequate interface definition of the package for its users and for its childs.
By the way Ted, have you had a look on this forum to the recent question on the Ada binding to the CLIPS library you wrote several years ago and which is no more available (relevant Web pages are now closed). Even if made to an old Clips version, this binding could be a good start example for somebody willing to use the CLIPS inference engine within an Ada 2012 program.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
Improve this question
When working on a big C/C++ project, do you have some specific rules regarding the #include within source or header files?
For instance, we can imagine to follow one of these two excessive rules:
#include are forbidden in .h files; it is up to each .c file to include all the headers it needs
Each .h file should include all its dependancies, i.e. it should be able to compile alone without any error.
I suppose there is trade-off in between for any project, but what is yours? Do you have more specific rules? Or any link that argues for any of the solutions?
If you include H-files exclusively into C-files, then including a H-file into a C-file might cause compilation to fail. It might fail because you may have to include 20 other H-files upfront, and even worse, you have to include them in the right order. With a real lot of H-files, this system ends up to be an administrative nightmare in the long run. All you wanted to do was including one H-file and you ended up spending two hours to find out which other H-files in which order you will need to include as well.
If a H-file can only be successfully included into a C-file in case another H-file is included first, then the first H-file should include the second one and so on. That way you can simply include every H-file into every C-file you like without having to fear that this may break compilation. That way you only specify your direct dependencies, yet if these dependencies themselves also have dependencies, its up to them to specify those.
On the other hand, don't include H-files into H-files if that isn't necessary. hashtable.h should only include other header files that are required to use your hashtable implementation. If the implementation itself needs hashing.h, then include it in hashtable.c, not in hashtable.h, as only the implementation needs it, not the code that only would like to use the final hashtable.
I think both suggested rules are bad. In my part I always apply:
Include only the header files required to compile a file using only what is defined in this header. This means:
All objects present as reference or pointers only should be forward-declared
Include all headers defining functions or objects used in the header itself.
I would use the rule 2:
All Headers should be self-sufficient, be it by:
not using anything defined elsewhere
forward declaring symbols defined elsewhere
including the headers defining the symbols that can't be forward-declared.
Thus, if you have an empty C/C++ source file, including an header should compile correctly.
Then, in the C/C++ source file, include only what is necessary: If HeaderA forward-declared a symbol defined in HeaderB, and that you use this symbol you'll have to include both... The good news being that if you don't use the forward-declared symbol, then you'll be able to include only HeaderA, and avoid including HeaderB.
Note that playing with templates makes this verification "empty source including your header should compile" somewhat more complicated (and amusing...)
The first rule will fail as soon as there are circular dependencies. So it cannot be applied strictly.
(This can still be made to work but this shifts a whole lot of work from the programmer to the consumer of these libraries which is obviously wrong.)
I'm all in favour of rule 2 (although it might be good to include “forward declaration headers” instead of the real deal, as in <iosfwd> because this reduces compile time). Generally, I believe it's a kind of self-documentation if a header file “declares” what dependencies it has – and what better way to do this than to include the required files?
EDIT:
In the comments, I've been challenged that circular dependencies between headers are a sign of bad design and should be avoided.
That's not correct. In fact, circular dependencies between classes may be unavoidable and aren't a sign of bad design at all. Examples are abundant, let me just mention the Observer pattern which has a circular reference between the observer and the subject.
To resolve the circularity between classes, you have to employ forward declaration because the order of declaration matters in C++. Now, it is completely acceptable to handle this forward declaration in a circular manner to reduce the number of overall files and to centralize code. Admittedly, the following case doesn't merit from this scenario because there's only a single forward declaration. However, I've worked on a library where this has been much more.
// observer.hpp
class Observer; // Forward declaration.
#ifndef MYLIB_OBSERVER_HPP
#define MYLIB_OBSERVER_HPP
#include "subject.hpp"
struct Observer {
virtual ~Observer() = 0;
virtual void Update(Subject* subject) = 0;
};
#endif
// subject.hpp
#include <list>
struct Subject; // Forward declaration.
#ifndef MYLIB_SUBJECT_HPP
#define MYLIB_SUBJECT_HPP
#include "observer.hpp"
struct Subject {
virtual ~Subject() = 0;
void Attach(Observer* observer);
void Detach(Observer* observer);
void Notify();
private:
std::list<Observer*> m_Observers;
};
#endif
A minimal version of 2. .h files include only the header files it specifically requires to compile, using forward declaration and pimpl as much as is practical.
Always have some sort of header guard.
Do not pollute the user's global namespace by putting any using namespace statements in a header.
I'd recommend going with the second option. You often end up in the situation where you want to add somwhing to a header file that suddenly requires another header file. And with the first option, you would have to go through and update lots of C files, sometimes not even under your control. With the second option, you simply update the header file, and the users who don't even need the new functionality you just added needn't even know you did it.
The first alternative (no #includes in headers) is a major no-no for me. I want to freely #include whatever I might need without worrying about manually #includeing its dependencies as well. So, in general, I follow the second rule.
Regarding cyclic dependencies, my personal solution is to structure my projects in terms of modules rather than in terms of classes. Inside a module, all types and functions may have arbitrary dependencies on one another. Across module boundaries, there may not be circular dependencies between modules. For each module, there is a single *.hpp file and a single *.cpp file. This ensures that any forward declarations (necessary for circular dependencies, which can only happen inside a module) in a header are ultimately always resolved inside the same header. There is no need for forward-declaration-only headers whatsoever.
Pt. 1 fails when you would like to have precompiled headers through a certain header; eg. this is what StdAfx.h are for in VisualStudio: you put all common headers there...
This comes down to interface design:
Always pass by reference or pointer. If you aren't going to check the pointer, pass by
reference.
Forward declare as much as possible.
Never use new in a class - create factories to do that for you and pass them to the class.
Never use pre-compiled headers.
In Windows my stdafx only ever includes afx___.h headers - no string, vector or boost libraries.
Rule nr. 1 would require you to list your header files in a very specific order (include files of base classes must go before include files of derived classes, etc), which would easily lead to compilation errors if you get the order wrong.
The trick is, as several others have mentioned, use forward declarations as much as possible, i.e. if references or pointers are used. To minimize build dependencies in this way the pimpl idiom can be useful.
I agree with Mecki, to put it shorter,
for every foo.h in your project include only those headers that are required to make
// foo.c
#include "any header"
// end of foo.c
compile.
(When using precompiled headers, they are allowed, of course - e.g. the #include "stdafx.h" in MSVC)
Personally I do it this way:
1 Perfer forward declare to include other .h files in a .h file. If something could be used as pointer/reference in that .h files or class, forward declare is possible without compile error. This could make headers less include dependencies(save compile time? not sure:( ).
2 Make .h files simple or specific. e.g. it is bad to define all constances in a file called CONST.h, it is better to divid them into multiple ones like CONST_NETWORK.h, CONST_DB.h. So to use one constance of DB, it needn't to include other information about network.
3 Don't put implementation in headers. Headers are used to quick review public things for other people; when implementing them, don't pollute declaration with detail for others.