So I'm writing a program which has grown large enough that it has several separate source files, and as a result, several separate header files. I keep running into multiple-definition issues.
The problem is that I compile all of the individual files before I link them. A.cpp and B.cpp both include Z.h, because both use function declarations and such that exist inside Z.h. This is all fine during the compile stage, because everything is in order, but when I go to link A.o and B.o together, the linker throws multiple definition errors: the function definitions from Z.h were compiled into each .o file, so they now exist in both. This can normally be avoided by using include guards, but in this case they won't work, since each .cpp file is compiled separately and the preprocessor "forgets" which macros were defined between one compilation and the next.
So my question is, how is this solved in the real world? I've had a good dig around and have come up dry, but I'm certain that this must have been solved before.
Thanks!
So, A.cpp and B.cpp both include Z.h, because both A.cpp and B.cpp use
function declarations and the such which exist inside of Z.h
This cannot be technically correct, or at least it's an incomplete description: Z.h most likely contains not only function declarations but also function definitions.
Function declaration:
void f();
Function definition:
void f() { std::cout << "doing something\n"; }
So my question is, how is this solved in the real world?
You solve this problem by keeping the declarations in Z.h and moving the definitions into yet another to-be-created Z.cpp file.
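A minimal sketch of that split (doSomething is an invented name standing in for whatever Z.h actually provides):

// Z.h -- declarations only, safe to include from A.cpp and B.cpp
#ifndef Z_H
#define Z_H

void doSomething();   // declaration: no body here

#endif

// Z.cpp -- the one and only definition
#include "Z.h"
#include <iostream>

void doSomething() { std::cout << "doing something\n"; }

Z.cpp is compiled once into Z.o, and the linker resolves the calls in A.o and B.o against that single definition.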
Hard to say exactly what problem you're running into without code, but you are probably defining functions or variables in your headers. That's not what headers are for unless the functions are inline or templates. Including a header is like copy/pasting all the code in it into your cpp file. If you have the same variable in every cpp file, and it's not static or in an anonymous namespace, you'll have multiple definitions when you try to link and the linker will puke.
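To make that concrete with an invented example: only inline functions (and templates) may be defined in a header; anything else should be a bare declaration there.

// util.h -- hypothetical example
#ifndef UTIL_H
#define UTIL_H

inline int twice(int x) { return 2 * x; }  // fine: inline definitions may be repeated across cpp files

int thrice(int x);                         // fine: declaration only, defined in exactly one cpp

// int bad(int x) { return 3 * x; }        // not fine: every cpp including this would define bad()

#endif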
How are header files connected to their cpp files? I have a cpp file that includes header files. I understand what #include does, but what about the cpp file that goes with each header?
Let's say:
calculate.cpp:
#include "table.h"
What happens with table.cpp? To fully understand calculate.cpp, do I also need to look at table.cpp?
You have file A.cpp which includes B.h. When you compile A.cpp, the preprocessor will include everything from B.h into the translation unit of A.cpp, and the compiler creates an object file from it.
The compiler doesn't care at this point about the implementation of whatever is in B.cpp. That is dealt with separately, when the compiler compiles the translation unit B.cpp. The compiler simply trusts that at link time there will be a definition for anything called from B. If there isn't, you will end up with a linker error (most likely undefined symbols).
Here you have a very good answer on what's happening: How does the compilation/linking process work?
But just to describe it in fewer words:
Preprocessor: reads through your .cpp and included .h files (e.g. A.cpp and B.h) and creates an output which the compiler can then compile. This happens independently for B.cpp and its includes/defines as well.
Compiler: Takes the output from the preprocessor and creates object files. Object files contain mostly machine code and some linker information
Linker: Links the object files together so that when you run the program the right functions are called.
So I guess the connection you are looking for happens in the Linking stage. That's where all the pieces come together.
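You can even watch g++ perform the stages one by one; a sketch, assuming files A.cpp and B.cpp:

g++ -E A.cpp -o A.ii     # preprocessing only: headers pasted in, macros expanded
g++ -c A.cpp -o A.o      # compile one translation unit into an object file
g++ -c B.cpp -o B.o      # B.cpp is handled independently
g++ A.o B.o -o program   # link: the pieces come together here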
I am writing a library for neural nets. There are some necessary functions I needed, so I separated them into a separate header file. I also provided include guards. I included the header file in only one file, but the linker still claims that there are multiple definitions of all the functions in the program.
The library structure is as follows:
namespace maya:
class neuron [neuron.hpp, neuron.cpp]
class ffnet [ffnet.hpp, ffnet.cpp]
struct connection [connection.hpp]
functions [functions.hpp]
the functions header file is written something like this:
#ifndef FUNCTIONS_HPP
#define FUNCTIONS_HPP
// some functions here
double random_double() { /* some code */ }
#endif
this functions.hpp file is included only once, in neuron.hpp, and since ffnet is dependent on neuron, I included neuron.hpp in ffnet only once. This ffnet.hpp is included in main.cpp only once. main.cpp is the file that I use for testing my library.
the linker throws an error something like this:
/usr/bin/ld: /tmp/ccN7ywby.o: in function `maya::random_double()':
neuron.cpp:(.text+0x0): multiple definition of `maya::random_double()'; /tmp/ccvDr1aG.o:main.cpp:(.text+0x0): first defined here
/usr/bin/ld: /tmp/cc66mBIr.o: in function `maya::random_double()':
ffnet.cpp:(.text+0x0): multiple definition of `maya::random_double()'; /tmp/ccvDr1aG.o:main.cpp:(.text+0x0): first defined here
Also, I compiled my program using:
g++ main.cpp neuron.cpp ffnet.cpp -o net
I don't think this will be needed, but just in case:
$ uname -a
Linux brightprogrammer 4.19.0-kali3-amd64 #1 SMP Debian 4.19.20-1kali1 (2019-02-14) x86_64 GNU/Linux
You must put the code of random_double() in a .cpp file rather than in the .hpp or .h file. Or, add inline before double random_double() { /* some code */ } if you want to keep the code in your .hpp file.
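For the inline route, the header from the question would become (body elided as in the original):

#ifndef FUNCTIONS_HPP
#define FUNCTIONS_HPP

inline double random_double() { /* some code */ }

#endif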
The problem
You have your function definitions with their full code in a header that you include in several compilation units. This causes the function to be defined in each compilation unit (cpp), which breaks the One Definition Rule (ODR).
The include guards make sure that the same definition doesn't occur several times in the same compilation unit (e.g. if you include functions.hpp in neuron.hpp and also include it directly). But here this header is included, directly or indirectly, in main.cpp, ffnet.cpp and neuron.cpp, which makes one first definition and two invalid redefinitions.
The solution
You must change functions.hpp to keep only the function declaration:
#ifndef FUNCTIONS_HPP
#define FUNCTIONS_HPP
double random_double(); // no body !!
#endif
and move the function bodies to a separate functions.cpp, which must be added to your compiler command.
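As a sketch, the new functions.cpp would simply hold the bodies:

// functions.cpp
#include "functions.hpp"

double random_double() { /* some code, moved here unchanged */ }

and the compile command from the question gains one file:

g++ main.cpp neuron.cpp ffnet.cpp functions.cpp -o net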
The advantages of this approach are:
You can then compile the utility functions separately. When you change a function body, you no longer have to recompile all the cpp files that include the header.
Encapsulation is improved, by sharing in the hpp only what other modules need to know, and hiding the implementation details.
Reuse could be facilitated across projects by making a library of functions.
The included text would be shorter (should your code someday evolve into a large project with thousands of hpp files, this could save you noticeable compile time).
Additional remarks
Not sure that it applies, but be aware also that it's not a good idea to include a header into a namespace.
I also recommend reading this article about headers. It's old but the advice is still very relevant :-)
Note that there are exceptions to the ODR for classes and inline functions, in which case the multiple definitions must be exactly the same sequence of tokens.
I'm an experienced programmer, but only in high level languages; I'm doing my first really large project in C++ right now.
I've got two classes, ClassA and ClassB; a ClassA is (among other things) an index of ClassBs, so ClassA needs to know what a ClassB is to build arrays out of it, and a ClassB needs to know what a ClassA is so it can update the index when something changes. Both of these classes are in their own .h & .cpp files.
I figured including each from the other would just cause infinite recursion, so I decided to instead have #include "ClassA.cpp" and #include "ClassB.cpp" at the beginning of main.cpp; but doing this just caused the compiler to warn about multiple definitions of every class and method in those files.
After some experimentation I found out that including ClassA.h and ClassB.h produces the desired behavior - but this doesn't make any sense, I'm only including the prototypes of those classes. Surely the code that actually makes them up never gets mixed in? And yet it does.
What's going on here that I don't understand? Why does including ClassA.h also make the actual code for ClassA show up with it? And why does including ClassA.cpp cause every include of ClassA.h to trigger "multiple definition" errors even though they're in a header shield or whatever it's called?
The missing step is that the definitions in ClassA.cpp and ClassB.cpp will not be seen by the linker unless those files are also compiled at some point. If you did something like this:
g++ main.cpp ClassA.cpp ClassB.cpp
then all references to definitions in ClassA.cpp and ClassB.cpp from main.cpp would be resolved. However, if you only did
g++ main.cpp
then the linker would have no idea where to find the definitions in ClassA.cpp and ClassB.cpp and you would probably get an error.
If you're using an IDE, this detail is hidden from you: the IDE ensures that as long as you add a .cpp file to your "project", it will be compiled into the final binary when you build the project.
This is the way C++ is designed:
Your classes don't need to know anything more than the prototypes of other classes, so you don't have to include more than the headers.
Why is this so? Well, compilation of an entire application is the combination of two steps: compilation of the code itself and then linking (actually, there is a third step preceding these: pre-processing, but we could consider this one as part of code compilation).
Example function call: it is sufficient (exception: inline functions!) for the compiler to know that a function with a specific prototype exists. It can then generate all the code necessary to do the function call, except for the actual address of the function, for which it leaves some kind of placeholder.
The linker then combines all the code generated during the compilation step into a single unit. Since it now knows where every function is located, it can fill the actual addresses into the placeholders, wherever they appear.
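A tiny example of this division of labour (file and function names invented):

// greet.h
void greet();                        // prototype: all the compiler needs to compile a call

// main.cpp
#include "greet.h"
int main() { greet(); }              // main.o gets a placeholder for greet's address

// greet.cpp
#include "greet.h"
#include <cstdio>
void greet() { std::puts("hello"); } // the definition the linker points the placeholder at

g++ main.cpp greet.cpp succeeds; g++ main.cpp alone fails at link time with an undefined reference to greet().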
C++ code is compiled to one object file (*.o or *.obj) per .cpp file, and it is the link step that combines the object files into an executable.
Never include *.cpp files, because it usually causes redefinition issues.
For each *.h file, add an include guard to avoid multiple inclusion:
#ifndef XXX_H
#define XXX_H
//your code goes here
#endif
I learned that if I compile main.cpp, the compiler simply replaces all includes with the actual content of the file, i.e. it replaces #include "LongClassName.h" with the text in that file. This is done recursively in LongClassName.h. In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
But it seems to be much more complicated in real projects. I had a look at the Makefile Eclipse created for my Qt project and it seems that there is an entry for every file named file.o, whose dependencies are file.cpp and file.h. So that means that Eclipse compiles each .cpp separately(?)
Does that mean that class.cpp will know nothing about global stuff in main.cpp or about a class higher up in the include hierarchy?
I stumbled upon this problem while trying to create an alias for a long class name. It is my main class and I wanted to call static functions with a shorter name: Ln::globalFunction() instead of LongClassName::globalFunction()
I have a class LongClassName whose header I include in main.cpp. This is the main class. All other classes are included in it.
LongClassName.h
#define PI 3.14159265
#include <QDebug>
class LongClassName
{
...
public:
...
private:
...
};
typedef LongClassName Ln;
LongClassName.cpp
#include "Class1.h"
#include "Class2.h"
#include "Class3.h"
/*implementations of LongClassName's functions*/
So I assumed that, when the compiler merges everything into one single "virtual" file, every class would be inserted after this source code, and because of that every class should know that Ln is an alias for LongClassName.
This didn't work
So what is the best way to propagate this alias to all classes?
I want to avoid including LongClassname.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
(At the moment I have a separate class Ln, but I'm trying to merge it with LongClassName because it seems more logical.)
The compiler knows how to compile a .cpp file (if it's a C++ compiler) into a .o file called an "object file", which is your code translated (and probably manipulated, optimized, etc.) into machine code. Actually the compiler creates assembly code, which is translated into machine code by the assembler.
So each cpp file is compiled to a different object file, and knows nothing about variables declared in other cpp files, unless you include declarations you want the object file to know about, either in the cpp file or in an h file it includes.
Although the compilation is done separately for each cpp, the linker links all object files into a single executable (or a library), so a variable declared in the global namespace is indeed global, and every declaration not explicitly placed in a named namespace is placed in the global namespace.
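For example (names invented), the usual pattern for sharing a global variable across translation units is an extern declaration in a header:

// globals.h
extern int counter;          // declaration: every cpp that includes this knows counter exists

// globals.cpp
int counter = 0;             // the single definition

// other.cpp
#include "globals.h"
void bump() { ++counter; }   // the linker wires this up to the counter in globals.o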
You will probably benefit from reading about all stages of "compiling", for example here: http://www.network-theory.co.uk/docs/gccintro/gccintro_83.html
In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
This is wrong. In .cpps you should include just the .hs (or .hpps if you like), almost never the .cpps; the .h files in general contain just the declarations of the classes and of the methods, and not their actual bodies¹ (i.e. their definitions), so when you compile each .cpp the compiler still knows nothing about the definitions of the functions defined in other .cpps. It just knows their declarations, and with them it can perform syntactical checks, generate code for function calls, and so on, but it will still generate an "incomplete" object file (.o) that contains several placeholders ("here goes the address of this function defined somewhere else", "here goes the address of this extern variable", and so on).
After all the object files have been generated, it's the linker that has to take care of these placeholders, plumbing all the object files together and linking their references to the actual code (which now can be found, since we have all the object files).
For some more info about the classical compile+link model, see here.
Does that mean that class.cpp will know nothing about global stuff in main.cpp or a class in higher include hirarchy?
Yes, it's exactly like that.
But why doesn't the Makefile created by Eclipse simply compile main.cpp? Why isn't that enough? main.cpp contains all the dependencies. Why compile every .cpp separately?
main.cpp doesn't contain all the code, but just the declarations. You don't include all the code in the same .cpp (e.g. by including the other .cpps) mainly to decrease compilation time.
I want to avoid including LongClassname.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
If you use header guards, you shouldn't have problems.
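As one possible sketch (the file name Ln.h is invented): the typedef itself only needs a forward declaration, so it can live in a tiny guarded header included everywhere; note that any file actually calling Ln::globalFunction() still needs the full LongClassName.h.

// Ln.h -- hypothetical alias header
#ifndef LN_H
#define LN_H

class LongClassName;       // forward declaration is enough for an alias
typedef LongClassName Ln;

#endif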
¹ Ok, they also contain inline and template functions, but they are the exception, not the rule.
One of my "non-programmer" friends recently decided to make a C++ program to solve a complicated mechanical problem.
He wrote each function in a separate .cpp file, then included them all in the main source file, something like this:
main.cpp:
#include "function1.cpp"
#include "function2.cpp"
...
int main()
{
...
}
He then compiled the code, with a single gcc line:
g++ main.cpp    # took about 2 seconds
Now, I know that this should work, but I'm not sure whether including .cpp files directly into the main program is a good idea. I have seen the following scheme several times, where all the function prototypes go into a header file with the extern keyword, like this:
funcs.h:
extern void function1(..);
extern void function2(..);
...
main.cpp:
...
#include "funcs.h"
...
and compiling with:
g++ -c function1.cpp
g++ -c function2.cpp
...
g++ -c main.cpp
g++ -o final main.o function1.o function2.o ...
I think that this scheme is better (with a makefile, of course). What reasons can I give my friend to convince him of that?
The main reason people compile object by object is to save time. High-level localised code changes often only require compilation of one object and a relink, which can be faster. (Compiling too many objects that draw in heaps of headers, or redundantly instantiate the same templates, may actually be slower when a change in common code triggers a fuller recompilation).
If the project is so small that it can be compiled in 2 seconds, then there's not much actual benefit to the traditional approach, though doing what's expected can save developer time - like yours and ours on here :-). Balancing that, maintaining a makefile takes time too, though you may well end up doing that anyway in order to conveniently capture include directories, libraries, compiler switches etc.
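For scale, a makefile for the friend's project needs only a handful of lines (file names taken from the question; funcs.h is listed as a dependency so changes to it trigger recompiles):

final: main.o function1.o function2.o
	g++ -o final main.o function1.o function2.o

main.o: main.cpp funcs.h
	g++ -c main.cpp

function1.o: function1.cpp funcs.h
	g++ -c function1.cpp

function2.o: function2.cpp funcs.h
	g++ -c function2.cpp

make then rebuilds only the objects whose sources changed, which is exactly the time saving described above.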
Actual implications to written/generated code:
cpp files normally first include their own headers, which provides a sanity check that the header content can be used independently by other client code; if you put everything together instead, the namespace is already "contaminated" with includes from earlier headers/implementation files
the compiler may optimise better when everything is in one translation unit (+1 for leppie's comment, which makes the same point...)
static non-member variables and anonymous namespaces are private to the translation unit, so including multiple cpps means sharing these around, for better or worse (+1 for Alexander :-))
say a cpp file defines a function or variable which is not mentioned in its header and might even be in an anonymous namespace or be static: code later in the translation unit could call it freely without needing to write its own forward declaration (this is bad: if the function was intended to be called outside its own cpp, then it should have been declared in the header and be an externally exposed symbol in its translation unit's object file)
BTW - in C++ your headers can declare functions without explicitly using the extern keyword, and it's normal to do so.
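That is, at namespace scope these two declarations mean exactly the same thing (parameter list invented for the example):

extern void function1(int x);   // explicit extern
void function1(int x);          // identical: functions have external linkage by default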
The reason for the second style is that each .cpp file can be treated separately, with its own classes, global variables, etc., without risk of conflict.
It is also easier in IDEs that automatically link all the .cpp files (like MSVC).