C++ Single Header File Structure

I want to speed up the build time of my c++ project, and I am wondering if my current structure may cause unnecessary recompilations.
I have *.cc and corresponding *.h files, but all my *.cc files include a single header file which is main.h.
In main.h, I include everything necessary, declare extern global variables, and declare the functions I use. Basically, I'm not using any namespaces.
Is this a bad design that could cause unnecessary recompiles and slow builds?

It depends. If main.h is seldom modified, you could use precompiled headers, which will greatly improve compilation time.
On the other hand, if main.h is regularly modified, it's probably not a good design.
An additional problem introduced by putting everything in one include file is that it doesn't really promote structure in your application. In well-designed applications you often have a layered structure. By putting everything in one include file, you obfuscate the structure in your application. This may work for a small application, but if your application grows, you will end up one day with a complete spaghetti, where everything depends on everything else.
Try to split the include file into multiple parts. Typically you will have one .cpp and one .h file per class. Try to use forward declarations as much as possible in your include files, and only include (in .h and .cpp) what's really needed.
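For example, a header can often get by with a forward declaration, leaving the #include to the .cpp file (class and file names here are illustrative):
widget.h:
class Engine;                      // forward declaration instead of #include "engine.h"
class Widget
{
public:
    void attach(Engine* engine);   // pointers and references need only the declaration
private:
    Engine* engine_;
};
widget.cpp:
#include "widget.h"
#include "engine.h"                // the full definition is only needed here

void Widget::attach(Engine* e) { engine_ = e; }
This way, files that include widget.h are not recompiled when engine.h changes.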

That design will definitely lead to slow build times. When you start a build, make files and IDEs check which source (cc) files have been modified since the last compile. They also check whether any files that a source file depends on have been modified. A source file depends on all the header files it includes, all the header files those header files include, and so on. If any modification is detected, that source file is recompiled.
Since your setup means that each source file includes every single header file, any time you modify even a single header file you need to recompile every source file.
You'll definitely want to separate things a bit more and get rid of your main.h file. As a rule, people try to minimize the number of header files included in a header file and prefer to keep the includes in source files.

Related

What's the REAL difference between .h and .cpp files?

This question was posted several times on StackOverflow, but most of the answers stated something similar to ".h files are supposed to contain declarations whereas .cpp files are supposed to contain their definitions/implementation". I've noticed that simply defining functions in .h files works just fine. What's the purpose of declaring functions in .h files but defining and implementing them in .cpp files? Does it really reduce compile time? What else?
Practically: the conventions around .h files are in place so that you can safely include that file in multiple other files in your project. Header files are designed to be shared, while code files are not.
Let's take your example of defining functions or variables. Suppose your header file contains the following line:
header.h:
int x = 10;
code.cpp:
#include "header.h"
Now, if you only have one code file and one header file this probably works just fine:
g++ code.cpp -o outputFile
However, if you have two code files this breaks:
header.h:
int x = 10;
code1.cpp:
#include "header.h"
code2.cpp:
#include "header.h"
And then:
g++ code1.cpp -c (produces code1.o)
g++ code2.cpp -c (produces code2.o)
g++ code1.o code2.o -o outputFile
This breaks, specifically at the linker step, because you now have two definitions of the same symbol in the same executable, and the linker doesn't know what it's supposed to do with that. When you include your header in code1 you get a symbol "x", and when you include your header in code2 you get another symbol "x". The linker doesn't know your intention here, so it throws an error:
code2.o:(.data+0x0): multiple definition of `x'
code1.o:(.data+0x0): first defined here
collect2: error: ld returned 1 exit status
Which again is just the linker saying that it can't resolve the fact that you now have two symbols with the same name in the same executable.
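The conventional fix is to keep only a declaration in the header and put the single definition in exactly one code file:
header.h:
extern int x;      // a declaration: promises that x is defined somewhere
code1.cpp:
#include "header.h"
int x = 10;        // the one and only definition
code2.cpp:
#include "header.h"
// reads and writes x freely; the linker resolves it to code1.o's definition
Now both object files reference the same symbol, and the link succeeds.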
What's the REAL difference between .h and .cpp files?
They are both fundamentally just text files. From a certain perspective, their only difference is the filename.
However, many programming-related tools treat the files differently depending on their name. For example, some tools will detect the programming language: .c is compiled as C, .cpp is compiled as C++, and .h is not compiled at all.
For header files, the name does not matter at all to the compiler. The name could be .h or .header or anything else; it doesn't affect how the preprocessor includes it. It is however good practice to conform to a common convention in order to avoid confusion.
I've noticed that simply defining functions in .h files works just fine.
Are the functions declared non-inline? Have you ever included the header file into more than one translation unit? If you answered yes to both, then your program has been ill-formed. If you didn't, then that would explain why you didn't encounter any problems.
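If you do want a definition in a header, marking the function inline makes identical definitions in every including translation unit legal, for example:
utils.h:
inline int twice(int n) { return n * 2; }   // inline relaxes the one-definition rule for identical definitions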
Does it really reduce compile time?
Yes. Dividing function definitions into smaller translation units can indeed reduce the time to compile said translation units compared to compiling larger translation units.
This is because doing less work takes less time. What is important to realise is that the other translation units do not need to be recompiled when only one is modified. If you only have one translation unit, then you have to compile it, i.e. the program in its entirety.
Multiple translation units are also better because they can be compiled in parallel, which allows taking advantage of modern multi core hardware.
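For example (file names illustrative), after editing only a.cpp the build needs one recompile plus a relink:
g++ a.cpp -c (recompiles only the modified translation unit)
g++ a.o b.o c.o -o outputFile (relinks the untouched object files with the new one)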
What else?
Does there need to be anything else? Having to wait a few minutes to compile your program instead of a day improves development speed drastically.
There are some other advantages too regarding organisation of files. In particular, it is quite convenient to be able to define different implementations of the same function for different target systems in order to support multiple platforms. With header files you must do tricks with macros, while with source files you simply choose which files to compile.
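As a sketch (hypothetical names), one shared declaration can be backed by per-platform definitions, selected by the build system rather than by macro tricks:
path.h:
char separator();                   // one declaration for all platforms
path_windows.cpp:
char separator() { return '\\'; }   // compiled only into the Windows build
path_posix.cpp:
char separator() { return '/'; }    // compiled only into the POSIX build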
Another use case where implementing functions in a header is not an option is distributing a library without source, as some middleware providers do. You must give out the headers or else your functions cannot be called, but if all your source is in the headers, then you've given up your trade secrets. Compiled sources would at least have to be reverse engineered.
Keep in mind that the C++ compiler is a fairly simple beast as far as file handling goes. All it's allowed to do is read in a single source-code file (and, via the preprocessor, logically insert into that incoming text stream the contents of any files that the file #includes, recursively), parse the contents, and spit out the resulting .o file.
For small programs, keeping the entire codebase in a single .cpp file (or even a single .h file) works fine, because the number of lines of code that the compiler needs to load into memory is small (relative to the computer's RAM).
But imagine you are working on a monster program, with tens of millions of lines of code -- yes, such things do exist. Loading that much code into RAM at once would likely stress the capabilities of all but the most powerful computers, leading to exceedingly long compile times or even outright failure.
And even worse than that, touching any of the code in a .h file requires the recompilation of any other files that #include that .h file, either directly or indirectly -- so if all your code is in .h files, then your compiler is likely to spend a lot of time unnecessarily recompiling a lot of code that didn't actually change.
To avoid those problems, C++ lets you place your code into multiple .cpp files. Since .cpp files are (at least traditionally) never #include'd by anything, the only time your Makefile or IDE will need to recompile any given .cpp file is after you've actually modified that exact file, or a .h file it #include's.
So when you've modified a function in the 375th .cpp file out of 700 .cpp files in your program, and now you want to test your modification, the compiler only has to recompile that one .cpp file and then re-link the .o files into an executable. If OTOH you've modified a .h file, compilation might be much longer, because now the build system will have to recompile every other file that includes that .h file, directly or indirectly, just in case you changed the meaning of something those files depend on.
.cpp files also make link-time issues much easier to deal with. For example, if you want to have a global variable, defining that global variable in a .cpp file (and maybe declaring an extern for it in a .h file) is straightforward; if OTOH you want to do that in a .h file, you'll have to be very careful or you'll end up with duplicate-symbol errors from your linker, and/or subtle violations of the One Definition Rule that will come back to bite you later on.
The REAL difference is that your programming environment lists .h and .cpp files separately. And/or populates file-browser-dialogs appropriately. And/or tries to compile .cpp files into object form (but doesn't do that to .h files). And whatever, depending on which IDE / environment you use.
The second difference is that people assume that your .h files are header files, and that your .cpp files are code source files.
If you don't care about people or development environments, you can put any damn thing you want in a .h or .cpp file, and call them any thing you want. You can put your declarations in a .cpp file and call it an "include file", and your definitions in a .pas file and call it a "source file".
I have to do this kind of thing when working in a constrained environment.
Header files weren't part of the original definition of C. The world got on perfectly well without them. Opening and closing lots of header files did slow down the compilation of C, which is why we got precompiled header files. Precompiled header files do speed up the compilation and linking of source code, but not any faster than just writing assembler, or machine code, or any other thing that didn't take advantage of the co-operation of other people or a design environment.
It is useful to put declarations in a header file, and definitions in a code source file. That's why you should do that. There isn't a requirement.
Whenever you see an #include <header.h> directive, pretend that the contents of header.h are being copied and pasted right where the #include directive appears.
.cpp files get compiled to become .obj files. They have no knowledge of the existence of any other .cpp file, and are compiled individually. That's why we need to declare things before we use them - otherwise the compiler won't know whether the function we're trying to invoke exists within a different .cpp file.
We use header files to share declarations amongst multiple .cpp files to avoid having to write the same code over and over for every single .cpp file.
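For instance (illustrative names), one declaration in a shared header lets two .cpp files cooperate without ever seeing each other:
shared.h:
void compute();             // declaration visible to every includer
a.cpp:
#include "shared.h"
void compute() { /* the single definition */ }
b.cpp:
#include "shared.h"
int main() { compute(); }   // compiles against the declaration; the linker finds a.cpp's definition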

Why would using a precompiled header cause a build to be slower?

Our solution contains over 100 projects, over 8000 cpp files and over 10'000 header files.
I'm trying to improve our build times.
One of the projects in the solution contains just 5 cpp files, and takes about 10 seconds to compile. The header files were initially included in the cpp files, but in preparation for switching on precompiled headers, I moved the includes into a single pch.h file.
Each cpp file now includes the pch.h file.
This in itself has not made any discernible change to the compile time - it's still about 10 seconds.
Now when I tell the project to actually use the pch file as a precompiled header, it takes 17 seconds to compile the project.
Why would precompiling the included headers make the project take longer to build than when the file is just #included by each individual cpp file?
More info.
We use a technique called "lumping": individual cpp files are not compiled individually; they are each #included into a single project-wide cpp file, and that is the only cpp file which is compiled.
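In other words, the project compiles a single translation unit along these lines (names illustrative):
lump.cpp:
#include "file1.cpp"   // each source file is textually pasted in,
#include "file2.cpp"   // so the compiler runs once over one big
#include "file3.cpp"   // translation unit and parses each header at most once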
For what it's worth, thanks to spaghetti code, according to "Show Includes" the dozen or so included files in the pch file cause around 3000(!) files to be included. Obviously, this needs fixing!
The precompiled header file is about 130Mb when compiled.
If we switch off lumping, the single project build (not the whole solution) takes 45 seconds. If we then switch on precompiled headers, the build time improves.
I'm probably missing the obvious, but why when lumping is switched on, does switching on precompiled headers slow the build down?
What PCH does is run a "preprocess"/"precompile" stage on common headers included by multiple source files. This helps because repeated "preprocessing"/"precompiling" is avoided, and the compiler loads its previous state for each source file.
If you have just one big source file, this "preprocessing"/"precompiling" also needs to happen, but in total only once. So saving and loading the PCH file introduces overhead without taking any away (because there is no repetition whatsoever).
I use the terms "preprocess"/"precompile" here because PCH is implemented wildly differently depending on your compiler, and might lean more towards one or the other.
Now, unless you make use of a heavy template library like Boost throughout much of your code, it often is enough to clean up include dependencies to speed up compile time, often by quite a significant factor. But that requires maintenance.

Including all header files in application

I was recently looking through the source code of a C++ application and saw that each class did not #include its needed components, but instead #include'd a "Precompiled.h" header. In this Precompiled header was an inclusion of almost every header in the application (not all of them; it was clear that the length and order of the list were deliberate). Essentially, this would mean that every class had an inclusion of every other class in the application.
Is this wise? Why or why not?
Usually if you write an application, you should only include header files which are really needed in the cpp files. If you have a really big application, you should use forward declarations in the headers and include the necessary files in the cpp files. That way, changes in code affect a minimum number of cpp files, so the compiler only has to compile what really has changed.
The situation can totally flip when it comes to libraries or code which does not change very often. The file name "Precompiled.h" is already a hint. The compiler can precompile the headers to a special object file, often called a PCH file. With that, the compiler does not have to resolve every include on every compile. With heavily nested includes, this has a high impact on compile speed, because instead of many files to load and parse, there is only one pre-parsed file. To achieve that you have to declare one or more headers as a kind of central file for building a precompiled header. How you do that differs between compilers.
For example, Visual Studio uses the header file "stdafx.h" as the centre of its precompilation of header files. Because of that, only header files which are not altered very often should be included there. The file also has to be included first in every cpp file. That is because the compiler can no longer detect whether an include file included before it might have influenced the precompiled file. To avoid that, includes before the precompiled include are not allowed.
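A typical (abbreviated) Visual Studio layout looks like this:
stdafx.h:
#include <vector>      // stable standard and system headers only -
#include <string>      // these rarely change, so precompiling them pays off
#include <windows.h>
foo.cpp:
#include "stdafx.h"    // must be the very first include in every cpp file
#include "foo.h"       // frequently changing project headers come after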
Back to your question. Including every file in one header file to use it as a precompiled header makes no sense at all, as it counteracts the purpose of a precompiled header file.
It is a very bad idea.
For a .cpp file only include the minimum number of #include files.
That way, when one of them changes, make (or its moral equivalent) will not require the whole lot to be recompiled.
Saves lots of time during development.
PS Use forward declarations in preference to #include

Does it matter whether I put an #include directive in my cpp file or in an included header file?

My C++ program uses a separate header file (let's call it myHeader.h) and therefore includes it (#include "myHeader.h"). In my program I need to use another header file (let's call it another.h). Does it make a difference whether I put the #include "another.h" directive in the cpp file or in myHeader.h?
If it's not used in the .h file, then there will be no difference in compilation success/failure.
However, it is recommended to put include for header files you only need in the implementation in the .cpp files for the following reasons:
for encapsulation reasons - no one needs to know what you include solely for the implementation.
Including a file A.h in a header file B.h will also make any file that includes B.h include A.h. That can cause major dependency issues between seemingly unrelated files.
for the above reason, it can also increase build time substantially (every file you include is copied into your compilation unit).
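Concretely, with the question's file names (a minimal sketch):
myHeader.h:
void process();        // the interface; nothing here needs another.h
myHeader.cpp:
#include "myHeader.h"
#include "another.h"   // needed only by the implementation, so it stays here

void process() { /* uses facilities from another.h */ }
Files that include myHeader.h never see another.h and are not rebuilt when it changes.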
If you need to include a header only in your cpp file then you should include it in your cpp file.
If you include it in your header it will add unneeded dependencies for everyone else who includes your header. This can explode if the unneeded headers you include also include other unneeded headers of their own.
The answer to your question is "No". However, you should try to avoid making unnecessary include statements in your .h files because it will induce longer build times. It is also better for encapsulation reasons as well.
Assuming all your include guards etc. are in place, then no.
It's best to think of how the user will use the code and try and avoid surprises for them.
In general you should avoid complex trees of include files included from other include files - although precompiled headers on modern compilers help.
BUT you should also make sure that you have all the forward declarations in place so that the order of includes in a cpp file doesn't matter.
No difference really. Header files and cpp files can both include other files. The included files are effectively copied into the text stream.
There is a difference - every time your h file is included, any files included in that h file are included as well - I haven't kept up-to-date with modern C++ compilers, but this used to really increase compile time.
It also increases the physical dependency of the source - John Lakos' Large Scale C++ Software Design addresses this, and is well worth a read on structuring C++ programs. It was published in 1996, so it's not based around current practice, but the advice on structure is worth knowing.

Is including C++ source files an approved method?

I have a large C++ file (SS.cpp) which I decided to split in smaller files so that I can navigate it without the need of aspirins. So I created
SS_main.cpp
SS_screen.cpp
SS_disk.cpp
SS_web.cpp
SS_functions.cpp
and cut-pasted all the functions from the initial SS.cpp file to them.
And finally I included them in the original file :
#include "SS_main.cpp"
#include "SS_screen.cpp"
#include "SS_disk.cpp"
#include "SS_web.cpp"
#include "SS_functions.cpp"
This situation has remained for some months now, and these are the problems I've had:
The Entire Solution search (Shift-Ctrl-F in VS) does not search in the included files, because they are not listed as source files.
I had to manually indicate them for Subversion inclusion.
Do you believe that including source files in other sources is an accepted workaround when files get really big? I should say that splitting the implemented class into smaller classes is not an option here.
There are times when it's okay to include an implementation file, but this doesn't sound like one of them. Usually this is only useful when dealing with certain auto-generated files, such as the output of the MIDL compiler. As a workaround for large files, no.
You should just add all of those source files to your project instead of #including them. There's nothing wrong with splitting a large class into multiple implementation files, but just add them to your project; #including them like that doesn't make much sense.
--
Also, as an FYI, you can add files to your projects, and then instruct the compiler to ignore them. This way they're still searchable. To do this, add the file to the project, then right-click it, and go to Properties, and under "General" set "Exclude from Build" to Yes.
Don't include cpp files in other files. You don't have to define every class function in one file; you can spread them across multiple files. Just add them individually to the project and have it compile all of them separately.
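For example (a sketch reusing the question's naming), the class keeps one header while its member functions are spread across the source files you created:
SS.h:
class SS
{
public:
    void drawScreen();   // defined in SS_screen.cpp
    void saveToDisk();   // defined in SS_disk.cpp
};
SS_screen.cpp:
#include "SS.h"
void SS::drawScreen() { /* ... */ }
SS_disk.cpp:
#include "SS.h"
void SS::saveToDisk() { /* ... */ }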
You don't include implementation (.cpp) files. Create header files for these implementation files containing the function/class declarations and include these as required.
There are actually times you will want to include CPP files. There are several questions here about Unity Builds which discuss this very topic.
You need to learn about separate compilation, linking, and what header files are for.
You need to create a header file for each of those modules (except possibly main.cpp). The header file will contain the declarative parts of each .cpp source file, and the .cpp files themselves will contain the definitions. Each unit can then be separately compiled and linked. For example:
main.cpp
#include "function.h"
int main()
{
    func1();
}
function.h
#if !defined FUNCTION_H
#define FUNCTION_H
extern void func1() ;
#endif
function.cpp
void func1()
{
    // do stuff
}
Then function.cpp and main.cpp are separately compiled (by adding them to the sources for the project), and then linked. The header file is necessary so that the compiler is made aware of the interface to func1() without seeing the complete definition. The header should be added to the project headers, then you will find that the source browser and auto-completion etc. work correctly.
What bothers me with this question is the context of it.
A large cpp file has been created, large enough to warrant thinking about splitting it into smaller more manageable files. The proposed split is:
SS_main.cpp
SS_screen.cpp
SS_disk.cpp
SS_web.cpp
SS_functions.cpp
This seems to indicate that there are separate units of functionality from a specification and design perspective. We can only guess at the coupling between these units of code.
However, it would be a start to define these code units such that each new cpp file has its own header file thus defining the interfaces of these units and the (low) coupling between them to achieve (high) cohesion for each unit.
We are refactoring here.
It is not acceptable to use included cpp files in this context, as it does not provide any advantages. The only time I've come across included cpp files is when one is included to provide code for debugging, an example being to compile non-inline versions of functions. It helps in stepping through code in the debugger.