I am writing a library for neural nets. There are some necessary functions I needed so I separated them in a separate header file. I also provided the definition guards. I also included the header file in only one file but then also linker claims that there are multiple definitions of all the functions in the program.
The library structure is as follows:
namespace maya:
class neuron [neuron.hpp, neuron.cpp]
class ffnet [ffnet.hpp, ffnet.cpp]
struct connection [connection.hpp]
functions [functions.hpp]
the functions header file is written something like this:
#ifndef FUNCTIONS_HPP
#define FUNCTIONS_HPP
// some functions here
double random_double(){//some code}
#endif
this functions.hpp file is included only one in neuron.hpp and since ffnet is dependent on neuron, I included neuron.hpp in ffnet only once. This ffnet.hpp is included in main.cpp only once. main.cpp is file that I use for testing my library.
this linker throws error something like this:
/usr/bin/ld: /tmp/ccN7ywby.o: in function `maya::random_double()':
neuron.cpp:(.text+0x0): multiple definition of maya::random_double()'; /tmp/ccvDr1aG.o:main.cpp:(.text+0x0): first defined here
/usr/bin/ld: /tmp/cc66mBIr.o: in function `maya::random_double()':``
ffnet.cpp:(.text+0x0): multiple definition of `maya::random_double()'; /tmp/ccvDr1aG.o:main.cpp:(.text+0x0): first defined here
Also I compiled my program using :
g++ main.cpp neuron.cpp ffnet.cpp -o net
I don't think this will be needed but just in case :
$ uname -a
Linux brightprogrammer 4.19.0-kali3-amd64 #1 SMP Debian 4.19.20-1kali1 (2019-02-14) x86_64 GNU/Linux
You must write code of random_double() in .cpp file other than .hpp or .h file. Or, add inline before double random_double() { //some code } if you keep your code in your .hpp file.
The problem
You have your function definition with their full code in a header that you include in several compilation units. This causes the function to be defined in each compilation unit (cpp) and this breaks the One Definition Rule (ODR).
The include guards make sure that the same definition doesn't occur several time in the same compilation unit (e.g. if you include function.hpp in neuron.hpp and also include it directly). But here this header is included directly or indirectly in main.cpp, ffnet.cpp and neuron.cpp, which makes a first definition and 2 invalid redefinitions.
The solution
You must change function.hpp to keep only the function declaration:
#ifndef FUNCTIONS_HPP
#define FUNCTIONS_HPP
double random_double(); // no body !!
#endif
and move the function bodies to a separate function.cpp, which must be added to your compiler command.
The advantage of this approach are:
You can then compile the utility functions separately. Every time you'd change the function body, you'd no longer have to recompile all the cpp.
Encapsulation is improved, by sharing in the hpp only what other modules need to know, and hiding the implementation details.
Reuse could be facilitated across projects by making a library of functions.
The includes would be shorter (in case in some distant future your code would evolve to a large project with thousands of hpp, this could make you gain some time)
Additional remarks
Not sure that it applies, but be aware also that it's not a good idea to include a header into a namespace.
I also recommend reading this article about headers. It's old but the advice is still very relevant :-)
Note that there are exceptions to the ODR for classes and inline functions, in which case the multiple definitions must be exactly the same sequence of tokens.
Related
My current understanding is like this. Please correct me if I am wrong. When I include a C++ library (e.g. open source project) to my project I have to include the .h files so that the compiler knows about the interface of the included library. The compiled code of the included library is then linked by the linker.
But now during compilation, the included header file needs another dependency. If I would include the header file of this dependency won't this turn into some recursive loop until every dependency is included? Why is it needed? Shouldn't be this the concern of the linker? The compiled library contains the dependency.
I stumbled over this project using Xcode 9.4.
A compiler translates code into machine language. The said code is then strung together with other machine code using a linker. Google more on what I wrote, if confused; it is a simplification missing finer details.
When you type #include <cstdint> for example, a preprocessor, which is another separate program, does a pattern substitution, if you will, on #include <cstdint> and replaces that line with the whole contents of the cstdint.hh file. The substitute happens before the translation process to machine code even begins.
Usually, these #include <...> files are written carefully so that you do not need to chase other #include. However, that is not a guarantee.
The risk you identify exists. It's not automatic, though. If a.h includes b.h which includes c.h, there is no problem with nested includes.
You could have a problem if a.h includes both b.h and c.h, and b.h also includes c.h indirectly. The risk here isn't so much recursion, but double-definition of the contents of c.h.
The usual solution is that every header starts with
#ifndef A_H_INCLUDED
#define A_H_INCLUDED
// actual contents of "a.h"
and ends with
#endif // A_H_INCLUDED
Now, the second inclusion of c.h is harmless. When this happens, C_H_INCLUDED will be already defined by the first inclusion, so the second inclusion is wholly skipped. Some compilers are smart enough to recognize this pattern and won't even read c.h the second time, saving a few milliseconds of disk I/O.
The linker can't solve this, because the double-definition problem happens before the linker is involved. It happens at the level of individual Translation Units. A Translation Unit is basically a single .cpp file after all its .h files have been included. Each TU is handled individually by the compiler, and it's this compiler which trips over the double definitions. The linker cares a bit less about duplications. Duplicate function definitions are a problem for the linker, class definitions are not.
in my current project I´m working with the arpackpp interface. The entire library is written in .h files, so that there is no need to compile the library. The problem I'm facing now - when I include some of the arpackpp header files in some of my files, which are not the main.cpp, I get the following errors:
/.../Files/Includes/../../../arpack++/include/arerror.h:163: multiple definition of ArpackError::Set(ArpackError::ErrorCode, std::string const&)'
/.../Files/Includes/../../../arpack++/include/arerror.h:163: first defined here
/tmp/ccruWhMn.o: In functionstd::iterator_traits::iterator_category std::__iterator_category(char* const&)':
/.../Files/Includes/../../../arpack++/include/arerror.h:163: multiple definition of ArpackError::code'
/.../Files/Includes/../../../arpack++/include/arerror.h:163: first defined here
/tmp/ccruWhMn.o: In functionstd::vector >::max_size() const':
for several arpackpp functions when linking all the .o files. As I have read in several threads the problem is that I actually include the instantiation of the functions, which should be normally avoided.
Because I don't want to change the whole library I included all classes and functions using arpackpp classes in main.cpp, which is getting quite messy. Is there a workaround to this problem? And why doesn't include guards (#ifndef...#endif) prevent this problem?
First of all, include guards do not help at this point as they only prevent multiple inclusions of a header in a "subtree" of your project files' dependency graph. In other words: If you include a header in two totally separated files of the same project, the c++ preprocessor will replace the #include <header.h> twice and independently by the code specified in the header. This is perfectly fine as long as the header only contains declarations.
In your case (and in the case of many other header-only libraries), definitions are provided in the headers as well. So unfortunately (as far as I know), there is no elegant way other than including definition-containing files once in your project. https://github.com/m-reuter/arpackpp/blob/master/include/README explicitly states which files contain definitions.
Some libraries, however, provide preprocessor macros to trigger the inclusion of definitions for the provided header files (e.g. https://github.com/nothings/stb). Maybe arpackpp provides similar mechanisms.
In general the easiest way to the work with header only libraries is to extend your code only using headers. Provided you use the correct header guards, this would remove the issue of multiple definitions of your code. If you have a large base of existing code then I would suggest that you rename all your *.cpp files to *.hpp (c++ header files) and then add suitable header guards. Furthermore a convenient way of handling this code of base is to create an additional header file config.hpp and include all your other headers in that file. Then in your main.c it is a simple matter of including the config.hpp file.
e.g.
// Config.hpp ------------------------------------------------=
#include "example.hpp"
#include "example1.hpp"
#include "example2.hpp"
// etc.
// main.cpp --------------------------------------------------=
#include "Config.hpp"
int main() {
// Your code here.
return 0;
}
Furthermore if you wanted to continue with your project structure it would be a simple matter of separating all your code into functions that needed to access arpackcpp directly. Then include them all into one *.cpp file, and compile into a *.o and link.
I learned that if I compile main.cpp the compiler simply replaces all includes with the actual content of the file i.e. #include "LongClassName.h" with the text in that file. This is done recursively in LongClassName.h. In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
But it seems to be much more complicated in real projects. I had a look at the Makefile Eclipse created for my Qt project and it seems that there is an entry for every file named file.o and its dependencies are file.cpp and file.h. So that means that eclipse compiles each .cpp separately(?)
Does that mean that class.cpp will know nothing about global stuff in main.cpp or a class in higher include hirarchy?
I stumbled upon this problem while trying to create an alias for a long class name. It is my main class and I wanted to call static functions with a shorter name: Ln::globalFunction() instead of LongClassName::globalFunction()
I have a class LongClassName whose header I include in main.cpp. This is the main class. All other classes are included in it.
LongClassName.h
#define PI 3.14159265
#include <QDebug>
Class LongClassName
{
...
public:
...
private:
...
};
typedef LongClassName Ln;
LongClassName.cpp
#include "Class1.h"
#include "Class2.h"
#include "Class3.h"
/*implementations of LongClassName's functions*/
So I assumed that when the code is included in one single "virtual" file by the compiler every class will be inserted after this source code and because of that every class should know that Ln is an alias for LongClassName
This didn't work
So what is the best way to propagate this alias to all classes?
I want to avoid including LongClassname.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
(At the moment I have a seperate class Ln but try to merge it with LongClassName because it seems more logical.)
The compiler knows how to compile a .cpp file (if it's a cpp compiler) into a .o file called 'object file', which is your code translated (and probably manipulated, optimized, etc.) to a machine code. Actually the compiler creates an assembly code, which is translated to machine code by the assembler.
So each cpp file is compiled to a different object file, and knows nothing about variables declared in other cpp files, unless you include declarations you want the object file to know about, either in the cpp file or in an h file it includes.
Although the compilation is done separately for each cpp, the linker links all object files to a single executable (or a library), so a variable declared in the global namespace is indeed global, and every declaration not explicitly placed in a named
namespace is placed in the global namespace.
You will probably benefit from reading about all stages of "compiling", for example here: http://www.network-theory.co.uk/docs/gccintro/gccintro_83.html
In the end the compiler sees a huge "virtual" file with the complete code of all .cpp and .h files.
This is wrong. In .cpps you should include just the .hs (or .hpps if you like), almost never the .cpps; the .h in general just contain the declarations of the classes and of the methods, and not their actual body1 (i.e. their definition), so when you compile each .cpp the compiler still knows nothing about the definition of the functions defined in other .cpps, it just knows their declaration, and with it it can perform syntactical checks, generate code for function calls, ... but still it will generate an "incomplete" object file (.o), that will contain several "placeholders" ("here goes the address of this function defined somewhere else" "here goes the address of this extern variable" and so on)
After all the object files have been generated, it's the linker that have to take care of these placeholders, by plumbing all the object files together and linking their references to the actual code (which now can be found, since we have all the object files).
For some more info about the classical compile+link model, see here.
Does that mean that class.cpp will know nothing about global stuff in main.cpp or a class in higher include hirarchy?
Yes, it's exactly like that.
But why doesn't the Makefile created by eclipse simply compile main.cpp. Why isn't this enough? main.cpp contains all the dependencies. Why compile every .cpp separately?
main.cpp doesn't contain all the code, but just the declarations. You don't include all the code in the same .cpp (e.g. by including the other .cpps) mainly to decrease compilation time.
I want to avoid including LongClassname.h in all classes because of reverse dependencies. LongClassName includes all other classes in its implementation. And almost all the other classes use some static functions of LongClassName.
If you use header guards, you shouldn't have problems.
1. Ok, they also contain inline and template functions, but they are the exception, not the rule.
I have always seen people write
class.h
#ifndef CLASS_H
#define CLASS_H
//blah blah blah
#endif
The question is, why don't they also do that for the .cpp file that contain definitions for class functions?
Let's say I have main.cpp, and main.cpp includes class.h. The class.h file does not include anything, so how does main.cpp know what is in the class.cpp?
First, to address your first inquiry:
When you see this in .h file:
#ifndef FILE_H
#define FILE_H
/* ... Declarations etc here ... */
#endif
This is a preprocessor technique of preventing a header file from being included multiple times, which can be problematic for various reasons. During compilation of your project, each .cpp file (usually) is compiled. In simple terms, this means the compiler will take your .cpp file, open any files #included by it, concatenate them all into one massive text file, and then perform syntax analysis and finally it will convert it to some intermediate code, optimize/perform other tasks, and finally generate the assembly output for the target architecture. Because of this, if a file is #included multiple times under one .cpp file, the compiler will append its file contents twice, so if there are definitions within that file, you will get a compiler error telling you that you redefined a variable. When the file is processed by the preprocessor step in the compilation process, the first time its contents are reached the first two lines will check if FILE_H has been defined for the preprocessor. If not, it will define FILE_H and continue processing the code between it and the #endif directive. The next time that file's contents are seen by the preprocessor, the check against FILE_H will be false, so it will immediately scan down to the #endif and continue after it. This prevents redefinition errors.
And to address your second concern:
In C++ programming as a general practice we separate development into two file types. One is with an extension of .h and we call this a "header file." They usually provide a declaration of functions, classes, structs, global variables, typedefs, preprocessing macros and definitions, etc. Basically, they just provide you with information about your code. Then we have the .cpp extension which we call a "code file." This will provide definitions for those functions, class members, any struct members that need definitions, global variables, etc. So the .h file declares code, and the .cpp file implements that declaration. For this reason, we generally during compilation compile each .cpp file into an object and then link those objects (because you almost never see one .cpp file include another .cpp file).
How these externals are resolved is a job for the linker. When your compiler processes main.cpp, it gets declarations for the code in class.cpp by including class.h. It only needs to know what these functions or variables look like (which is what a declaration gives you). So it compiles your main.cpp file into some object file (call it main.obj). Similarly, class.cpp is compiled into a class.obj file. To produce the final executable, a linker is invoked to link those two object files together. For any unresolved external variables or functions, the compiler will place a stub where the access happens. The linker will then take this stub and look for the code or variable in another listed object file, and if it's found, it combines the code from the two object files into an output file and replaces the stub with the final location of the function or variable. This way, your code in main.cpp can call functions and use variables in class.cpp IF AND ONLY IF THEY ARE DECLARED IN class.h.
I hope this was helpful.
The CLASS_H is an include guard; it's used to avoid the same header file being included multiple times (via different routes) within the same CPP file (or, more accurately, the same translation unit), which would lead to multiple-definition errors.
Include guards aren't needed on CPP files because, by definition, the contents of the CPP file are only read once.
You seem to have interpreted the include guards as having the same function as import statements in other languages (such as Java); that's not the case, however. The #include itself is roughly equivalent to the import in other languages.
It doesn't - at least during the compilation phase.
The translation of a c++ program from source code to machine code is performed in three phases:
Preprocessing - The Preprocessor parses all source code for lines beginning with # and executes the directives. In your case, the contents of your file class.h is inserted in place of the line #include "class.h. Since you might be includein your header file in several places, the #ifndef clauses avoid duplicate declaration-errors, since the preprocessor directive is undefined only the first time the header file is included.
Compilation - The Compiler does now translate all preprocessed source code files to binary object files.
Linking - The Linker links (hence the name) together the object files. A reference to your class or one of its methods (which should be declared in class.h and defined in class.cpp) is resolved to the respective offset in one of the object files. I write 'one of your object files' since your class does not need to be defined in a file named class.cpp, it might be in a library which is linked to your project.
In summary, the declarations can be shared through a header file, while the mapping of declarations to definitions is done by the linker.
That's the distinction between declaration and definition. Header files typically include just the declaration, and the source file contains the definition.
In order to use something you only need to know it's declaration not it's definition. Only the linker needs to know the definition.
So this is why you will include a header file inside one or more source files but you won't include a source file inside another.
Also you mean #include and not import.
That's done for header files so that the contents only appear once in each preprocessed source file, even if it's included more than once (usually because it's included from other header files). The first time it's included, the symbol CLASS_H (known as an include guard) hasn't been defined yet, so all the contents of the file are included. Doing this defines the symbol, so if it's included again, the contents of the file (inside the #ifndef/#endif block) are skipped.
There's no need to do this for the source file itself since (normally) that's not included by any other files.
For your last question, class.h should contain the definition of the class, and declarations of all its members, associated functions, and whatever else, so that any file that includes it has enough information to use the class. The implementations of the functions can go in a separate source file; you only need the declarations to call them.
main.cpp doesn't have to know what is in class.cpp. It just has to know the declarations of the functions/classes that it goes to use, and these declarations are in class.h.
The linker links between the places where the functions/classes declared in class.h are used and their implementations in class.cpp
.cpp files are not included (using #include) into other files. Therefore they don't need include guarding. Main.cpp will know the names and signatures of the class that you have implemented in class.cpp only because you have specified all that in class.h - this is the purpose of a header file. (It is up to you to make sure that class.h accurately describes the code you implement in class.cpp.) The executable code in class.cpp will be made available to the executable code in main.cpp thanks to the efforts of the linker.
It is generally expected that modules of code such as .cpp files are compiled once and linked to in multiple projects, to avoid unnecessary repetitive compilation of logic. For example, g++ -o class.cpp would produce class.o which you could then link from multiple projects to using g++ main.cpp class.o.
We could use #include as our linker, as you seem to be implying, but that would just be silly when we know how to link properly using our compiler with less keystrokes and less wasteful repetition of compilation, rather than our code with more keystrokes and more wasteful repetition of compilation...
The header files are still required to be included into each of the multiple projects, however, because this provides the interface for each module. Without these headers the compiler wouldn't know about any of the symbols introduced by the .o files.
It is important to realise that the header files are what introduce the definitions of symbols for those modules; once that is realised then it makes sense that multiple inclusions could cause redefinitions of symbols (which causes errors), so we use include guards to prevent such redefinitions.
its because of Headerfiles define what the class contains (Members, data-structures) and cpp files implement it.
And of course, the main reason for this is that you could include one .h File multiple times in other .h files, but this would result in multiple definitions of a class, which is invalid.
In Python whenever I had a bunch of functions that I wanted to use across multiple programs I'd make another .py file and then just import that wherever I needed it. How would I do that in C/C++? Do I dump both prototype and implementation into an .h file? or do I need to place the function prototypes in the .h file and the implementations in a separate .cpp file with the same name as the .h file and #include the .h wherever I need it?
You need to do a couple of things:
Add the prototype to a header file.
Write a new source file with the function definitions.
In a source file that just wants to use the shared function, you need to add #include "header.h" (replacing header.h with the name of the file from step 1) someplace before you try to call the shared function (normally you put all includes at the top of the source file).
Make sure your build compiles the new source file and includes that in the link.
A couple of other comments. It's normal to have foo.h as the header for the foo.c but that is only a style guideline.
When using headers, you want to add include guards to protect against the multiple include issue.
In C/C++ we usually put declarations in .h files and implementation in .c/cpp files.
(Note: there're many other ways, for example the include, templates, inline, extern, ... so you may find some code only in header files or only in c/cpp files - for example some of the STL and templates.)
Then you need to "link" the file with your program, which works like the "import" in Python interpreter but actually works in static linking object files together into a single executable file.
However the "link" command and syntax depends on your compiler and OS linker. So you need to check your compiler for more information, for example "ld" on UNIX and "link.exe" on DOS/Windows. Moreover, usually the C compiler will invoke the linker automatically.
For example, say you have 2 files: a.c and b.c (with a.h and b.h), on gcc:
gcc -o a.out a.c b.c
On MSVC:
cl a.c b.c
There are two ways to approach this that differ only slightly. As others have said, the first steps are:
-Create a header file which contains your function prototypes. You'll want to mark this with
# ifndef myheader_h
# define myheader_h
// prototypes go here...
# endif
to prevent problems with multiple inclusions.
-Create a .c file which contains the actual definitions.
Here's where the solutions branch.
If you want to include the source directly in your project, make the .c file part of your compilation stage as well as your link stage.
However, if you really plan on using this across multiple projects, you'll probably want to compile this source file independently, and reference the object file from your other projects. This is loosely what a "library" is, though libraries may consist of multiple object modules - each of which has been compiled but not yet linked.
update
Someone pointed out that this really only keeps the header from being included in a single cpp file. News flash: that's all you need to do.
Compilers treat each cpp file individually. The header files included by each cpp source file tell the compiler, "hey! This thing is defined in another source file! Assume references that match this prototype are A-OK and keep moving on."
The LINKER, on other other hand, is responsible for fixing up these references, and IT will throw a fit if the same symbol is defined in multiple object files. For that to happen, a function would have to be defined in two separate source files - a real definition with a body, not just an extern prototype - OR the object file that contains its body/definition would have to be included in the link command more than once.
Re:"inline"
Use of "inline" is meant as an optmization feature. Functions declared as inline have their bodies expanded inline at each place where they are called. Using this to get around multiple definition errors is very, very bad. This is similar to macro expansion.
See Francis's answer. The sentence that you wrote, "or do I need to place the function prototypes in the .h file and the implementations in a separate .cpp file with the same name as the .h file and #include the .h wherever I need it?", is pretty-much correct. You don't have to do things exactly this way, but it works.
It's up to you how you do this, The compiler doesn't care. But if you put your functions in a .h file, you should declare them __inline otherwise if you include the header file in more than one .cpp file, you will have multiply defined symbols.
On the other hand, if you make them __inline, you will tend to get a copy created in each place that you use the function. This will bloat the size of your program. So unless the functions are quite small, it's probably best to put the functions in a .cpp and create a parallel .h with function prototypes and public structures. This is the way most programmers work.
On the other hand, in the STL (Standard Template Library), virtually all of the code is in header files. (without the .h extension)