Why do includes need further dependencies? - c++

My current understanding is as follows; please correct me if I am wrong. When I use a C++ library (e.g. an open source project) in my project, I have to include its .h files so that the compiler knows the interface of the library. The compiled code of the library is then linked in by the linker.
But now, during compilation, the included header file needs another dependency. If I include the header file of this dependency, won't this turn into a recursive loop until every dependency is included? Why is it needed? Shouldn't this be the concern of the linker? The compiled library contains the dependency.
I stumbled over this in a project using Xcode 9.4.

A compiler translates source code into machine code. A linker then combines that machine code with other machine code to form the final program. This is a simplification that leaves out finer details; search for these terms if anything is unclear.
When you write #include <cstdint>, for example, the preprocessor (conceptually a separate step that runs before compilation proper) replaces that line with the whole contents of the cstdint header. The substitution happens before the translation to machine code even begins.
Usually, these #include <...> files are written carefully so that you do not need to chase other #includes. However, that is not guaranteed.

The risk you identify exists. It's not automatic, though. If a.h includes b.h which includes c.h, there is no problem with nested includes.
You could have a problem if a.h includes both b.h and c.h, and b.h also includes c.h indirectly. The risk here isn't so much recursion, but double-definition of the contents of c.h.
The usual solution is that every header starts with
#ifndef A_H_INCLUDED
#define A_H_INCLUDED
// actual contents of "a.h"
and ends with
#endif // A_H_INCLUDED
Now, the second inclusion of c.h is harmless. When this happens, C_H_INCLUDED will be already defined by the first inclusion, so the second inclusion is wholly skipped. Some compilers are smart enough to recognize this pattern and won't even read c.h the second time, saving a few milliseconds of disk I/O.
The linker can't solve this, because the double-definition problem happens before the linker is involved. It happens at the level of individual Translation Units. A Translation Unit is basically a single .cpp file after all its .h files have been included. Each TU is handled individually by the compiler, and it's this compiler which trips over the double definitions. The linker cares a bit less about duplications. Duplicate function definitions are a problem for the linker, class definitions are not.
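As a minimal sketch of the guard at work, the snippet below pastes the contents of a hypothetical c.h twice into one translation unit, which is roughly what two inclusions of c.h would produce. The struct C, the helper read_value, and the macro name C_H_INCLUDED are illustrative assumptions, not taken from any real project.

```cpp
// First copy of the (hypothetical) header "c.h", as pasted by the preprocessor.
#ifndef C_H_INCLUDED
#define C_H_INCLUDED
struct C {
    int value;
};
#endif // C_H_INCLUDED

// Second copy, e.g. reached again through "b.h".
// C_H_INCLUDED is already defined, so everything up to #endif is skipped.
#ifndef C_H_INCLUDED
#define C_H_INCLUDED
struct C {
    int value;
};
#endif // C_H_INCLUDED

// Only one definition of C reached the compiler, so this compiles.
inline int read_value(const C& c) { return c.value; }
```

Deleting the guard lines from either copy should make the compiler report a redefinition of struct C, which is the error described above.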

Related

Can main.cpp use the class which is included by the file main.cpp include? [duplicate]

If a library is included in a class header and then this header is included in another class do I have to include the library again?
For example:
#ifndef A_H
#define A_H
#include <someLibrary.h>
class A{
...
};
#endif
Then another class includes the A.h header
#include "A.h" // include class A
class B{
...
};
Do I have to include the "someLibrary.h" in class B?
No, #includes are transitive.
However, if your second file itself uses symbols from someLibrary, it's good style to re-include the header. That way you're not "hoping and praying" that you never remove the intermediate include. Your codebase will be more robust if every source file #includes everything that it directly needs. Header guards prevent this from being a wasteful policy.
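A small sketch of the "include what you directly use" policy, using standard headers in place of someLibrary. The function total_length is a hypothetical example; the point is only that every name used in the file has its own #include, even if another header might happen to pull it in.

```cpp
#include <cstddef>  // std::size_t is named directly below
#include <string>   // std::string is named directly below
#include <vector>   // std::vector is named directly below

// Sums the lengths of all strings in the vector.
std::size_t total_length(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& word : words) {
        total += word.size();
    }
    return total;
}
```

Even if <vector> happened to include <string> on some implementation, listing both keeps the file compiling when that implementation detail changes.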
The preprocessor #include directive does exactly what the name implies, it actually includes the file at the place of the directive.
Simple example: Let's say we have two files, first a header file named a.h
// Our class
class A
{
// Stuff for the class
};
// End of the class
Then a source file a.cpp:
// Include header file
#include "a.h"
// Header file included
int main()
{
// Code
}
The preprocessor generates a single file looking like
// Include header file
// Our class
class A
{
// Stuff for the class
};
// End of the class
// Header file included
int main()
{
// Code
}
This inclusion is recursive, so if a header file is including another header file, that other header file will also be included.
The source file generated by the preprocessor is called a translation unit and is what the compiler actually sees.
The above is a simplification on how a modern preprocessor works, and while it can be run separately to create a preprocessed source file, it's more common that the preprocessor is part of the compiler to streamline the parsing process.
You should also note that the terminology you use is not correct. A library can have (and usually does have) one or more header files, which are used when compiling your source code. A library then often (but not always) contains a special library file that is linked with the object files created by the compiler.
A library that has no linker library is called a header only library.
You don't include classes or libraries, you just include headers, and that is a textual operation (a bit like a copy & paste done by the preprocessor).
Read more about the C/C++ preprocessor, and the GNU cpp.
You can ask your compiler to show you the preprocessed form of your source file foo.cc, e.g. with g++ -Wall -C -E foo.cc (it will write the preprocessed form to stdout).
For a small project (e.g. less than 200KLOC), having just one single common header file included by all your translation units is sensible (and I believe that having many small header files is a bad habit, so I usually put more than one class definition per header file). BTW, that (single header file) approach is friendly to precompiled headers. Some people prefer to have several of their own #include-d subheaders in that single header.
Notice that most C++ standard headers (like <map> or <vector>) bring in a lot of text, so you don't want to have tiny compilation units (on my Linux system, #include <vector> drags in more than ten thousand lines, so having only a few dozen lines of your own source code after it is inefficient for the compiler).
Also notice that C++ class definitions usually have lots of inlined member functions (and you practically want to give the body of that inlined function in the same header file). So C++ header code tends to be quite big...
(some people prefer to break a single header file in many subheaders, which are always included together)

Does putting a whole class definition in a ".h" make the executable larger?

We define a C++ class in a .h and define its methods in a .cpp, but it makes the code look less organized.
I want to put all the method definitions in the class definition, which is in a .h file, but I'm worried that the compiler will generate duplicated code for the same methods/functions when one class header file is included by different files.
Does the linker find out and merge the duplicated code pieces to reduce the file size?
If not, is it better to use .hpp instead? I heard that a .hpp is for this.
And it does make a minor difference when I just change a .h file to a .hpp (I don't know why), compiled with G++.
Yes. It may create a larger executable, because member functions defined inside the class body are inline by default, whether or not you write the keyword inline in the definition. Inline functions often lead to a larger executable because the compiler may expand the function body at each place it is called.
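To make the "inline by default" rule concrete, here is a hedged sketch of a header. The file name widget.h, the class Widget, and its members are made up for illustration.

```cpp
// widget.h (hypothetical header)
#ifndef WIDGET_H
#define WIDGET_H

class Widget {
public:
    explicit Widget(int value) : value_(value) {}

    // Defined inside the class body: implicitly inline.
    int doubled() const { return value_ * 2; }

    // Only declared here; defined below, outside the class body.
    int tripled() const;

private:
    int value_;
};

// Defined in the header but outside the class: needs an explicit `inline`
// so that several translation units may include this header.
inline int Widget::tripled() const { return value_ * 3; }

#endif // WIDGET_H
```

Because both member functions are inline, including this header from several .cpp files does not break the one-definition rule. Any growth in executable size would come from the compiler choosing to expand calls in place.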
.h vs .hpp is about 90% equivalent to the choice between
#include <cmath> vs #include <math.h>
Some people prefer to use .hpp when they are doing exclusive C++ programming. You will see .hpp in libraries like Boost.
However, the other 10% is really important. For example, taking from Boost library doc, they explain the reason of using .hpp over .h:
Most Boost libraries are header-only: they consist entirely of header
files containing templates and inline functions, and require no
separately-compiled library binaries or special treatment when
linking.
If you fall in that case, you should use .hpp, but this can cost longer compilation time. Otherwise, you might want to keep .h style. That's just my personal taste. It isn't C-oriented at all, in my honest opinion.
Further reading:
Splitting templated C++ classes into .hpp/.cpp files--is it possible?
Condensing Declaration and Implementation into an HPP file
C++ templates declare in .h, define in .hpp
You have nothing to worry about. It makes absolutely no difference how it's broken up, it's what your files describe that makes it bigger, not how that description is spread out.
.h or .hpp makes no difference as well.
To answer your question about a larger executable: yes, it will make your executable larger. When you #include a header file in a source or header file, the preprocessor replaces the #include with the contents of the header file. This is why it is necessary to protect your header files with the following header protection:
#ifndef HDR_H
#define HDR_H
...
#endif
However, you will get linker errors if you include a header file that contains non-inline function definitions in multiple files that are part of the same executable. It would be wise to split declarations and definitions into .hpp and .cpp files, respectively. This will greatly reduce the amount of linker headaches.
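A sketch of the two safe options for free functions, shown in one listing for brevity. The names mathutil.h, mathutil.cpp, square, and cube are hypothetical.

```cpp
// mathutil.h (hypothetical header)
#ifndef MATHUTIL_H
#define MATHUTIL_H

// Declaration only: the definition lives in exactly one .cpp file.
int square(int x);

// Definition in the header: `inline` permits it in every including file.
inline int cube(int x) { return x * x * x; }

#endif // MATHUTIL_H

// mathutil.cpp (hypothetical source file, shown in the same listing)
int square(int x) { return x * x; }
```

If the definition of square were moved into the header without inline, every .cpp file including the header would emit its own copy, and the linker would be expected to complain about multiple definitions.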
Also, .h = .hpp. Doesn't matter which one you choose. Personal preference...
There's all you need here: Header files, pros and cons of putting all you code in them. Hope it helps!
Using header files results in quicker compile time and smaller executable. It also looks considerably cleaner because you can get a quick overview of your class by looking at its .h declaration.

Organize includes

Is there some preferred way to organize ones include directives?
Is it better to include the files you need in the .cpp file instead of the .h file? Are the translation units affected somehow?
How about if I need it in both the .h file and .cpp file, should I just include it in the .h file? Will it matter?
Is it a good practice to keep the already defined files in a precompiled header (stdafx.h), for instance std and third party libraries? How about my own files, should I include them in a stdafx.h file along the way as I create them?
// myClass.h
#include <string>
// ^-------- should I include it here? --------
class myClass{
myClass();
~myClass();
int calculation();
};
// myClass.cpp
#include "myClass.h"
#include <string>
// ^-------- or maybe here? --------
[..]
int myClass::calculation(){
std::string someString = "Hello World";
return someString.length();
}
// stdafx.h
#include <string>
// ^--------- or perhaps here, and then include stdafx.h everywhere? -------
You should have them at the top of the file, all in one place. This is what everyone expects. Also, it is useful to have them grouped, e.g. first all standard headers, then 3rd-party headers (grouped by library), then your own headers. Keep this order consistent throughout the project. It makes it easier to understand dependencies. As @James Kanze points out, it is also useful to put the header that declares the content first. This way you make sure that it works if included first (meaning it does not depend on any includes that it does not include itself).
Keep the scope as small as possible, so that a change in the header affects the least number of translation units. This means: whenever possible, include it in the .cpp file only. As @Pedro d'Aquino commented, you can reduce the number of includes in a header by using forward declarations whenever possible (basically whenever you only use references or pointers to a given type).
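The forward-declaration advice can be sketched as follows. Car, Engine, and the file names in the comments are hypothetical; the three pieces are shown in one listing but would normally live in separate files.

```cpp
// car.h (hypothetical): only a pointer to Engine is used,
// so a forward declaration is enough and engine.h is not included.
class Engine;

class Car {
public:
    explicit Car(Engine* engine) : engine_(engine) {}
    int horsepower() const;  // defined where Engine is a complete type
private:
    Engine* engine_;
};

// engine.h (hypothetical)
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) {}
    int horsepower() const { return hp_; }
private:
    int hp_;
};

// car.cpp (hypothetical): would #include "car.h" and "engine.h",
// so Engine is complete here and its members can be used.
int Car::horsepower() const { return engine_->horsepower(); }
```

With this layout, a change to engine.h forces recompilation of car.cpp but not of files that only include car.h.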
Both - explicit is better than implicit.
After some reading, I believe you should only include headers in the PCH if you are confident that they do not change anymore. This goes for all standard headers as well as (probably) third party libraries. For your own libraries, you be the judge.
This article on Header file include patterns should be helpful for you.
Is there some preferred way to organize ones include directives?
Yes, you can find them in the above article.
Is it better to include the files you need in the .cpp file instead of the .h file? Are the translation units affected somehow?
Yes, it is better to have them in the .cpp. Even if one type is needed in the definition of another, you can often use a forward declaration instead.
How about if I need it in both the .h file and .cpp file, should I just include it in the .h file? Will it matter?
Only in .h file, but it is suggested to forward declare in header files, and include in .cpp files.
Is it a good practice to keep the already defined files in a precompiled header (stdafx.h), for instance std and third party libraries? How about my own files, should I include them in a stdafx.h file along the way as I create them?
I personally have not used precompiled headers, but there has been a discussion on them on Stackoverflow earlier:
Precompiled Headers? Do we really need them
Is there some preferred way to organize ones include directives?
No common conventions. Some suggest alphabet-sorting them, I personally dislike it and prefer keeping them logically grouped.
Is it better to include the files you need in the .cpp file instead of the .h file?
In general, yes. It reduces the count of times that the compiler needs to open and read the header file just to see the include guards there. That may reduce overall compilation time.
Sometimes it's also recommended to forward-declare as many classes as possible in the headers and actually include them only in the .cpp files, for the same reason. The "Qt people" do so, for example.
Are the translation units affected somehow?
In semantic sense, no.
How about if I need it in both the .h file and .cpp file, should I just include it in the .h file? Will it matter?
Just include it in the header.
Is it a good practice to keep the already defined files in a precompiled header (stdafx.h), for instance std and third party libraries? How about my own files, should I include them in a stdafx.h file along the way as I create them?
Precompiled headers can significantly reduce compilation times. For example: one of my projects that includes boost::spirit::qi compiles in 20 secs with PCH on, and 80 secs — without. In general, if you use some heavily template-stuffed library like boost, you'd want to utilise the advantage of PCH.
As for the question in your code sample: since you don't use std::string in the header, it's better to include it in the .cpp file. It's alright to #include <string> in stdafx.h too — but that will just add a little bit of complexity to your project and you'll hardly notice any compilation speed-up.
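Applied to the code sample from the question, that advice might look like the sketch below. The public: label and the static_cast are additions made here so the sketch compiles cleanly; the constructor and destructor from the question are omitted for brevity.

```cpp
// myClass.h (sketch): std::string does not appear in the interface,
// so the header does not need #include <string>.
class myClass {
public:
    int calculation();
};

// myClass.cpp (sketch): would start with #include "myClass.h".
#include <string>  // used only by the implementation below

int myClass::calculation() {
    std::string someString = "Hello World";
    // length() returns an unsigned size type; convert explicitly to int.
    return static_cast<int>(someString.length());
}
```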
(4) I wouldn't recommend including any additional files in stdafx.h or similar "include_first.h" files. Including directly in the .cpp or particular .h files allows you to express the dependencies of your code explicitly and exclude redundant dependencies. It is especially helpful when you decide to decompose monolithic code into a few libs or DLLs. Personally, I use files like "include_first.h" (stdafx.h) for configuration purposes only (this file contains only macro definitions for the current application configuration).
It is possible to provide precompiled headers for your own files by marking another file to stop precompilation instead of stdafx.h (for instance, you can use a special empty file named something like "stop_pch.h").
Note, precompiled headers may not work properly for some kinds of sophisticated usage of the preprocessor (particularly, for some techniques used in BOOST_PP_*).
From the performance point of view:
Changing any of the headers included from stdafx.h will trigger a new precompilation, so it depends on how "frozen" the code is. External libraries are typical candidates for stdafx.h inclusion, but you can certainly include your own libraries as well - it's a tradeoff based on how often you expect to change them.
Also, with the Microsoft compiler (and most other current compilers) you can put this at the top of each header file:
#pragma once
This allows the compiler to fully skip that file after the first occurrence, saving I/O operations. The traditional ifndef/define/endif pattern can require opening and parsing the file every time it's included, unless the compiler recognizes the guard pattern, and that takes some time. It can certainly accumulate and get noticeable!
(Make sure to leave the traditional guards in there, for portability.)
It might be important to note that the order of classes in a translation unit needs to be correct; otherwise some C++ features are unavailable and the result is a compile-time error.
Edit: Adding examples:
class A { };
class B { A a; }; // order of classes need to be correct

Why use #ifndef CLASS_H and #define CLASS_H in .h file but not in .cpp?

I have always seen people write
class.h
#ifndef CLASS_H
#define CLASS_H
//blah blah blah
#endif
The question is, why don't they also do that for the .cpp file that contain definitions for class functions?
Let's say I have main.cpp, and main.cpp includes class.h. The class.h file does not include anything, so how does main.cpp know what is in the class.cpp?
First, to address your first inquiry:
When you see this in .h file:
#ifndef FILE_H
#define FILE_H
/* ... Declarations etc here ... */
#endif
This is a preprocessor technique for preventing a header file from being included multiple times, which can be problematic for various reasons. During compilation of your project, each .cpp file (usually) is compiled. In simple terms, the compiler takes your .cpp file, opens any files #included by it, concatenates them all into one massive text file, performs syntax analysis, converts the result to some intermediate code, optimizes it, and finally generates the assembly output for the target architecture. Because of this, if a file is #included multiple times under one .cpp file, the compiler will append its contents twice, so if there are definitions within that file, you will get a compiler error telling you that you redefined something.
When the file is processed by the preprocessor, the first time its contents are reached the first two lines check whether FILE_H has been defined. If not, the preprocessor defines FILE_H and continues processing the code between it and the #endif directive. The next time that file's contents are seen by the preprocessor, the #ifndef FILE_H check fails, so it immediately skips down to the #endif and continues after it. This prevents redefinition errors.
And to address your second concern:
In C++ programming, as a general practice, we separate development into two file types. One has the extension .h and we call this a "header file." Header files usually provide declarations of functions, classes, structs, global variables, typedefs, preprocessing macros and definitions, etc. Basically, they just provide you with information about your code. Then we have the .cpp extension, which we call a "code file." This provides definitions for those functions, class members, any struct members that need definitions, global variables, etc. So the .h file declares code, and the .cpp file implements that declaration. For this reason, we generally compile each .cpp file into an object file and then link those objects (because you almost never see one .cpp file include another .cpp file).
How these externals are resolved is a job for the linker. When your compiler processes main.cpp, it gets declarations for the code in class.cpp by including class.h. It only needs to know what these functions or variables look like (which is what a declaration gives you). So it compiles your main.cpp file into some object file (call it main.obj). Similarly, class.cpp is compiled into a class.obj file. To produce the final executable, a linker is invoked to link those two object files together. For any unresolved external variables or functions, the compiler will place a stub where the access happens. The linker will then take this stub and look for the code or variable in another listed object file, and if it's found, it combines the code from the two object files into an output file and replaces the stub with the final location of the function or variable. This way, your code in main.cpp can call functions and use variables in class.cpp IF AND ONLY IF THEY ARE DECLARED IN class.h.
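A compact sketch of that division of labour, with all three files shown in one listing. The functions add and compute are hypothetical; in a real build the last part would be compiled separately and joined by the linker.

```cpp
// class.h (sketch): a declaration only, no function body.
int add(int a, int b);

// main.cpp (sketch): would #include "class.h".
// The call compiles because the declaration tells the compiler
// the name, the parameter types, and the return type.
int compute() { return add(2, 3); }

// class.cpp (sketch): the definition. In a real project this ends up in
// class.obj, and the linker connects the call above to this code.
int add(int a, int b) { return a + b; }
```

If the definition were missing from every object file, compilation of main.cpp would still succeed and the failure would show up only at link time as an unresolved symbol.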
I hope this was helpful.
The CLASS_H is an include guard; it's used to avoid the same header file being included multiple times (via different routes) within the same CPP file (or, more accurately, the same translation unit), which would lead to multiple-definition errors.
Include guards aren't needed on CPP files because, by definition, the contents of the CPP file are only read once.
You seem to have interpreted the include guards as having the same function as import statements in other languages (such as Java); that's not the case, however. The #include itself is roughly equivalent to the import in other languages.
It doesn't - at least during the compilation phase.
The translation of a c++ program from source code to machine code is performed in three phases:
Preprocessing - The preprocessor scans all source code for lines beginning with # and executes the directives. In your case, the contents of your file class.h are inserted in place of the line #include "class.h". Since you might be including your header file in several places, the #ifndef clauses avoid duplicate-declaration errors, since the guard macro is undefined only the first time the header file is included.
Compilation - The compiler then translates all preprocessed source code files to binary object files.
Linking - The Linker links (hence the name) together the object files. A reference to your class or one of its methods (which should be declared in class.h and defined in class.cpp) is resolved to the respective offset in one of the object files. I write 'one of your object files' since your class does not need to be defined in a file named class.cpp, it might be in a library which is linked to your project.
In summary, the declarations can be shared through a header file, while the mapping of declarations to definitions is done by the linker.
That's the distinction between declaration and definition. Header files typically include just the declaration, and the source file contains the definition.
In order to use something you only need to know its declaration, not its definition. Only the linker needs to know the definition.
So this is why you will include a header file inside one or more source files but you won't include a source file inside another.
Also you mean #include and not import.
That's done for header files so that the contents only appear once in each preprocessed source file, even if it's included more than once (usually because it's included from other header files). The first time it's included, the symbol CLASS_H (known as an include guard) hasn't been defined yet, so all the contents of the file are included. Doing this defines the symbol, so if it's included again, the contents of the file (inside the #ifndef/#endif block) are skipped.
There's no need to do this for the source file itself since (normally) that's not included by any other files.
For your last question, class.h should contain the definition of the class, and declarations of all its members, associated functions, and whatever else, so that any file that includes it has enough information to use the class. The implementations of the functions can go in a separate source file; you only need the declarations to call them.
main.cpp doesn't have to know what is in class.cpp. It just has to know the declarations of the functions/classes that it goes to use, and these declarations are in class.h.
The linker links between the places where the functions/classes declared in class.h are used and their implementations in class.cpp
.cpp files are not included (using #include) into other files. Therefore they don't need include guarding. Main.cpp will know the names and signatures of the class that you have implemented in class.cpp only because you have specified all that in class.h - this is the purpose of a header file. (It is up to you to make sure that class.h accurately describes the code you implement in class.cpp.) The executable code in class.cpp will be made available to the executable code in main.cpp thanks to the efforts of the linker.
It is generally expected that modules of code such as .cpp files are compiled once and linked to in multiple projects, to avoid unnecessary repetitive compilation of logic. For example, g++ -c class.cpp would produce class.o, which you could then link from multiple projects using g++ main.cpp class.o.
We could use #include as our linker, as you seem to be implying, but that would just be silly when we know how to link properly using our compiler, with fewer keystrokes and less wasteful repetition of compilation.
The header files are still required to be included into each of the multiple projects, however, because this provides the interface for each module. Without these headers the compiler wouldn't know about any of the symbols introduced by the .o files.
It is important to realise that the header files are what introduce the definitions of symbols for those modules; once that is realised then it makes sense that multiple inclusions could cause redefinitions of symbols (which causes errors), so we use include guards to prevent such redefinitions.
It's because header files define what the class contains (members, data structures) and .cpp files implement it.
And of course, the main reason for this is that you could include one .h File multiple times in other .h files, but this would result in multiple definitions of a class, which is invalid.