Understanding the origin of a linker duplicate symbol error

I have a C++ program that compiled previously, but after mucking with the Jamfiles, the program no longer compiled and ld emitted a duplicate symbol error. This persisted after successively reverting to the original Jamfiles, running bjam clean, removing the objects by hand, and switching from clang with the gcc front end to gcc 4.2.1 on Mac OS X 10.6.7.
A simplified description of the program is that there is main.cpp and four files, a.h, a.cpp, b.h, and b.cpp, which are compiled into a static library which is linked to main.o. Both main.cpp and b.cpp depend on the file containing the offending symbol, off.h, through two different intermediate files, but neither a.h nor a.cpp depends in any way on off.h.
Before you ask, I made sure that all files were wrapped in include guards (#ifndef, #define, #endif), and while I did find a file that was missing them, it did not reference off.h. More importantly, b.h does not include anything that references off.h; only the implementation, b.cpp, makes any reference to off.h. This alone had me puzzled.
To add to my confusion, I was able to remove the reference to off.h from b.cpp and, as expected, it recompiled successfully. However, when I added the reference back in, it also compiled successfully, and continued to do so after cleaning out the object files. I am still at a loss for why it was failing to compile, especially considering that the symbols should not have conflicted, I had prevented symbol duplication, and I had gotten rid of any prior/incomplete builds.
Since I was able to successfully compile my program, I doubt I'll be able to reproduce it to test out any suggestions. However, I am curious as to how this can happen, and if I run across this behavior in the future, what, if anything beyond what I've done, might I do to fix it?

This is often the result of defining an object in a header file, rather than merely declaring it. Consider:
h.h :
#ifndef H_H_
#define H_H_
int i;
#endif
a.cpp :
#include "h.h"
b.cpp :
#include "h.h"
int main() {}
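This will produce a duplicate symbol i, because both a.cpp and b.cpp end up defining it. Include guards do not help here: they only prevent a header from being included twice within one translation unit, and a.cpp and b.cpp are compiled separately. The solution is to declare the object in the header file: extern int i; and to define it in exactly one of the source-code files: int i;.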
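The declaration/definition split can be sketched in a single file, with comments marking which part would live in h.h and which in exactly one .cpp file. The helper read_i is hypothetical and only there to show a use of the variable.

```cpp
// --- what h.h would contain: a declaration only, no storage is allocated ---
extern int i;

// --- what exactly one source file (say a.cpp) would contain: the definition ---
int i = 0;

// --- any other source file that includes h.h can now use i ---
// read_i is a hypothetical helper, not part of the original example.
int read_i() {
    return i;
}
```

With this layout every translation unit sees the same declaration, but only one object file carries the symbol i, so the linker has nothing to complain about.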

Related

One Definition Rule - compilation

I am trying to understand ODR.
I created one file pr1.cpp like this:
struct S{
int a;
};
a second file pr2.cpp like this
struct S {
char a;
};
and a main file like this :
#include <iostream>
int main() {
return 0;
}
I am compiling using terminal with the command :
g++ -Wall -Wextra pr1.cpp pr2.cpp main.cpp -o mypr
The compiler does not report any error, but there are two definitions of the type "S". I do not understand what is really happening. I expected an error after the linking phase because of the ODR violation.
I get an error only if I edit the main.cpp file to add:
#include "pr1.cpp"
#include "pr2.cpp"
Can anyone explain to me what is happening?

Unless you include both definitions in the same file, there is no problem. This is because the compiler operates on a single translation unit which is usually a .cpp file. Everything you #include in this file is also part of the translation unit because the preprocessor basically copies and pastes the contents of all included files.
What happens is that the compiler will create an object file (.obj usually) for each translation unit and then the linker will create a single executable (or .dll etc) by linking all the object files and the libraries the project depends on. In your case the compiler encountered each struct in a different translation unit so it doesn't see a problem. When you include both files, the two definitions now find themselves in the same translation unit and the compiler reports an error, because a type may not be defined twice within one translation unit (you don't even have to use an S for the program to be ill-formed).
As a side-note, do not include .cpp files in other .cpp files. I'm sure you can find a lot on how to organize your code in header and source files and it doesn't directly answer the question so I won't expand on it.
EDIT: I neglected to say why you didn't get a linker error. Some comments have pointed out that this is undefined behavior (formally, the program is ill-formed with no diagnostic required), which means that the toolchain is not obliged to complain. A class definition by itself emits no symbol into an object file, so with one .obj file for each struct and a main.obj, none of which references the others, the linker has nothing to resolve and nothing to compare.
Do not count on the linker to catch this even when the type is actually used. Linkers match the names of functions and variables, not type definitions, so two translation units that disagree about S will usually link without complaint and then misbehave at run time, because each side lays out the struct differently. This can be especially dangerous for structs that get passed around from one .cpp to the other, as the definition needs to be consistent. It might also be a problem when identically named structs/classes are passed through library boundaries. Always avoid duplicating names for these reasons.
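One way to follow the advice about not duplicating names is to give each struct its own namespace. The following is a minimal sketch, assuming the two structs really are different types that merely share a short name. The namespace names pr1 and pr2 are hypothetical (borrowed from the file names in the question), as is the helper sum_members.

```cpp
#include <type_traits>

// Each definition lives in its own namespace, so the two types have
// different fully qualified names and no longer collide.
namespace pr1 {
struct S {
    int a;
};
}  // namespace pr1

namespace pr2 {
struct S {
    char a;
};
}  // namespace pr2

// The compiler now treats them as unrelated types.
static_assert(!std::is_same<pr1::S, pr2::S>::value,
              "pr1::S and pr2::S are distinct types");

// Hypothetical helper showing both types used side by side
// in one translation unit without a redefinition error.
int sum_members(const pr1::S& x, const pr2::S& y) {
    return x.a + y.a;
}
```

For a type that is only ever used inside a single .cpp file, an unnamed namespace gives the same isolation without having to invent a name.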

Why does including the .h also make the .cpp source come along with it?

I'm an experienced programmer, but only in high level languages; I'm doing my first really large project in C++ right now.
I've got two classes, ClassA and ClassB; a ClassA is (among other things) an index of ClassBs, so ClassA needs to know what a ClassB is to build arrays out of it, and a ClassB needs to know what a ClassA is so it can update the index when something changes. Both of these classes are in their own .h & .cpp files.
I figured including each from the other would just cause infinite recursion, so I decided to instead have #include "ClassA.cpp" and #include "ClassB.cpp" at the beginning of main.cpp; but doing this just caused the compiler to warn about multiple definitions of every class and method in those files.
After some experimentation I found out that including ClassA.h and ClassB.h produces the desired behavior - but this doesn't make any sense, I'm only including the prototypes of those classes. Surely the code that actually makes them up never gets mixed in? And yet it does.
What's going on here that I don't understand? Why does including ClassA.h also make the actual code for ClassA show up with it? And why does including ClassA.cpp cause every include of ClassA.h to trigger "multiple definition" errors even though they're in a header shield or whatever it's called?

The missing step is that the definitions in ClassA.cpp and ClassB.cpp will not be seen by the linker unless those files are also compiled at some point. If you did something like this:
g++ main.cpp ClassA.cpp ClassB.cpp
then all references to definitions in ClassA.cpp and ClassB.cpp from main.cpp would be resolved. However, if you only did
g++ main.cpp
then the linker would have no idea where to find the definitions in ClassA.cpp and ClassB.cpp and you would probably get an error.
If you're using an IDE, this detail is hidden from you: the IDE ensures that as long as you add a .cpp file to your "project", it will be compiled into the final binary when you build the project.

This is how C++ is designed:
Your classes don't need to know anything more than the prototypes of other classes, so you don't have to include more than the headers.
Why is this so? Well, building an entire application is the combination of two steps: compilation of the code itself and then linking (actually, there is a third step preceding these, pre-processing, but we can consider it part of compilation).
Example function call: It is sufficient (exception: inline functions!) to know that a function with a specific prototype exists. The compiler can then generate all the code necessary to do the function call, except for the actual address of the function, for which it leaves some kind of placeholder.
The linker then combines all code generated during the compilation step into a single unit. Since it now knows where every function is located, it can fill their actual addresses into the placeholders, wherever they may appear.
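For the mutual dependency in the question, a forward declaration is usually enough to break the include cycle. The sketch below is written under assumptions: the member names (add, entry_changed, set_value and so on) are hypothetical, and everything is shown in one file with comments marking where each part would normally live.

```cpp
#include <cstddef>
#include <vector>

// --- ClassA.h would contain ---
class ClassB;  // forward declaration: enough to declare pointers to ClassB

class ClassA {
public:
    void add(ClassB* b);      // register an entry in the index
    void entry_changed();     // called by a ClassB when it changes
    std::size_t size() const;
    int change_count() const;
private:
    std::vector<ClassB*> index_;
    int changes_ = 0;
};

// --- ClassB.h would contain ---
class ClassA;  // forward declaration again; ClassA.h is not needed here

class ClassB {
public:
    explicit ClassB(ClassA* owner);
    void set_value(int v);
private:
    ClassA* owner_;
    int value_ = 0;
};

// --- ClassA.cpp would contain (after including both headers) ---
void ClassA::add(ClassB* b) { index_.push_back(b); }
void ClassA::entry_changed() { ++changes_; }
std::size_t ClassA::size() const { return index_.size(); }
int ClassA::change_count() const { return changes_; }

// --- ClassB.cpp would contain (after including both headers) ---
ClassB::ClassB(ClassA* owner) : owner_(owner) { owner_->add(this); }
void ClassB::set_value(int v) {
    value_ = v;
    owner_->entry_changed();  // calling a member needs the full ClassA definition
}
```

Only the .cpp files need the full definitions of both classes, so neither header has to include the other.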

C++ code is compiled to one *.obj file per .cpp file, and the link step then combines the .obj files into an executable.
Never include *.cpp files, because doing so usually causes redefinition errors.
For each *.h file, add an include guard macro to avoid including it more than once:
#ifndef XXX_H
#define XXX_H
//your code goes here
#endif

C++ Multiple Include Annoyances

So I'm writing a program which has gotten large enough now that it has several separate source files, and as a result, several separate header files. I keep constantly running into multiple include issues.
The problem is that I compile all of the individual files before I link them. So, A.cpp and B.cpp both include Z.h, because both A.cpp and B.cpp use function declarations and the such which exist inside of Z.h . This is all fine during the compile stage, because everything is in order, but when I go to link A.o and B.o together, the compiler (linker) throws multiple definition errors, because it's included the function definitions from Z.h while it was compiling each of the .o files, and so they exist in both .o files. This can normally be avoided by using include guards, but in this case, they won't work, since each .cpp file is compiled separately, the compiler "forgets" the state of defined preprocessor variables.
So my question is, how is this solved in the real world? I've had a good dig around and have come up dry, but I'm certain that this must have been solved before.
Thanks!

"So, A.cpp and B.cpp both include Z.h, because both A.cpp and B.cpp use function declarations and the such which exist inside of Z.h"
This cannot be technically correct, or at least it's an incomplete description. Z.h most likely does not only contain function declarations but also function definitions.
Function declaration:
void f();
Function definition:
void f() { std::cout << "doing something\n"; }
"So my question is, how is this solved in the real world?"
You solve this problem by keeping the declarations in Z.h and moving the definitions into yet another to-be-created Z.cpp file.

Hard to say exactly what problem you're running into without code, but you are probably defining functions or variables in your headers. That's not what headers are for unless the functions are inline or templates. Including a header is like copy/pasting all the code in it into your cpp file. If you have the same variable in every cpp file, and it's not static or in an anonymous namespace, you'll have multiple definitions when you try to link and the linker will puke.
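The two header-safe options mentioned in these answers can be sketched as follows. The function names twice and thrice are hypothetical, and both "files" are shown together with comments marking the boundary.

```cpp
// --- what Z.h could contain ---
#ifndef Z_H
#define Z_H

// 'inline' permits the same definition to appear in every translation unit
// that includes this header; the linker keeps a single copy.
inline int twice(int x) {
    return 2 * x;
}

// A plain declaration; the definition belongs in exactly one .cpp file.
int thrice(int x);

#endif  // Z_H

// --- what Z.cpp would contain ---
int thrice(int x) {
    return 3 * x;
}
```

Note that the include guard is still worth having, but it is the inline keyword and the move to Z.cpp that prevent the multiple definition errors at link time.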

Conditional compilation confusion and failure

I want to compile different files with a common *.c file. Like I want to compile A.c common.c xor B.c common.c but I can't figure out how to achieve that.
Can you please tell me how do I make common.c use different headers without using my text editor to change the headers list every time I want to compile
So let's say I have 3 files: A.c, B.c and common.c.
A.h and B.h define an enum enum {Mon, Tues, Wed...} in different ways. And that enum is used in common.c. So I can't just do in common.c:
#include "A.h"
#include "B.h"
What I thought of doing is to use preprocessor directives:
In common.h
#define A 1
#define B 2
In A.c
#define COMPILE_HEADER A
And in common.c
#if COMPILE_HEADER == A
#include "A.h"
#endif
This doesn't work, of course, because the compiler didn't visit the A.c file to find #define COMPILE_HEADER A
So can you please tell me how do I make common.c use different headers without using my text editor to change the headers list every time I want to compile?

It's pretty complicated to explain, but I'll give it a try.
Explanation
The compiler, for example gcc, takes the input files provided, includes (literally just copy-pastes) the header files into their respective places (where the #include directive is located) in the *.c file, then compiles the file to an object (*.o) file. No executable is created yet. Then comes the linker, which gcc invokes for you. It takes the *.o files and links them into one executable.
The problem is that the files are compiled independently and then linked together. A declaration like int func(int param); is like saying to the compiler "Hey man, don't worry about any usage of func in the code, the linker will take care of it". The compiler then just records this usage as an external symbol in the corresponding *.o file, and when the linker does its job, it first finds the location of the symbol's definition (the function implementation) and then points every call to func at it.
Try putting a function definition in a header file, then include it in 2 or more files of the same project (compiled and linked together). The compiler will say it's OK, since each file on its own is correct and the generated code is valid. Then the linker will try to link everything into one executable, and it would have to decide which of the identically named functions to link to. Since most developer tools are not really good at making such choices, it will just yell at you saying "hey man, you gave me two definitions of the same function, what to do now?". This results in an error like this:
obj\Release\b.o:b.cpp:(.text+0x0): multiple definition of 'func(int)'
obj\Release\a.o:a.cpp:(.text+0xc): first defined here
What about having two main in one project?
obj\Release\b.o:b.cpp:(.text.startup+0x0): multiple definition of 'main'
obj\Release\a.o:a.cpp:(.text.startup+0x0): first defined here
Both files compile, but they cannot be linked together.
The header files are meant to contain class definitions and function declarations so that you write them only once and then share them between all files that want to use them. You could also type the class definitions into each file separately (as long as they stay identical) and use them just as you would from a .h file; the same applies to function declarations.
Here is your problem. You have to compile only one of the files, not include only one. You can do this with a preprocessor trick, but I wouldn't recommend it as a solution, since the problem can be solved much more easily (I'll tell you how in a moment).
TL;DR (The actual answer without explanation)
You can wrap both files in #ifdef / #ifndef, then define (or not define) some value in a shared header. For example:
A.cpp
#include "t.h"
#ifdef USE_A
int func(int a)
{
return a + 5;
}
#endif
B.cpp
#include "t.h"
#ifndef USE_A
int func(int a)
{
return a * 10;
}
#endif
t.h
#ifndef T_H
#define T_H
//Or comment to use B
#define USE_A
int func(int a);
#endif // T_H
main.cpp
#include <iostream>
#include "t.h"
using namespace std;
int main()
{
cout << func(3);
}
Notice that the #ifdef/#ifndef come after the #include, so the preprocessor knows which one to compile. Also, it's not possible (at least I can't think of any way) to put the #define in any .c file, because they are compiled separately (as described above). You could use the -D switch, but that involves messing with build configurations in your environment, and if you want to do that it's better to try the second solution presented below.
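The -D switch defines a macro from the command line, so no source file has to be edited to switch variants. Here is a minimal sketch, assuming the same USE_A macro and func as in the example above. Merging both variants into one file with #else is a simplification of mine, not the layout used in the answer, and func.cpp is a hypothetical file name.

```cpp
// Build with:  g++ -DUSE_A main.cpp func.cpp   -> variant A
// Build with:  g++ main.cpp func.cpp           -> variant B
// (func.cpp is a hypothetical file name for this sketch.)

int func(int a);  // declaration, as in t.h

#ifdef USE_A
// Variant A: selected when USE_A is defined on the command line.
int func(int a) {
    return a + 5;
}
#else
// Variant B: the fallback when USE_A is not defined.
int func(int a) {
    return a * 10;
}
#endif
```

Because exactly one branch survives preprocessing, the object file contains a single definition of func whichever way it is built.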
Better answer
You should choose one file to compile with each version. It depends heavily on your needs, but basically instead of using g++ main.cpp a.cpp b.cpp you should compile either a.cpp or b.cpp. If you are using an integrated environment like Visual Studio or Code::Blocks, the configuration managers allow you to decide which file to include. I mean, those Debug/Release dropdowns can contain your own entries, which will customize your project. Then, switching between A.cpp and B.cpp is just a question of choosing the appropriate option in the always-visible bar in your environment.
If you want a more detailed tutorial on how to manage configurations in Code::Blocks or Visual Studio, create an appropriate question on Stack Overflow and it will be answered in no time :)

Can't compile C++ in Ubuntu using GCC -- Include/Library Problems (collect2: ld returned 1 exit status)

I guess I'm not linking something right?
I want to call ABC.cpp which needs XYZ.h and XYZ.cpp. All are in my current directory and I've tried #include <XYZ.h> as well as #include "XYZ.h".
Running $ g++ -I. -l. ABC.cpp at the Ubuntu 10 Terminal gives me:
/tmp/ccCneYzI.o: In function `ABC(double, double, unsigned long)':
ABC.cpp:(.text+0x93): undefined reference to `GetOneGaussianByBoxMuller()'
collect2: ld returned 1 exit status
Here's a summary of ABC.cpp:
#include "XYZ.h"
#include <iostream>
#include <cmath>
using namespace std;
double ABC(double X, double Y, unsigned long Z)
{
...stuff...
}
int main()
{
...cin, ABC(cin), return, cout...
}
Here's XYZ.h:
#ifndef XYZ_H
#define XYZ_H
double GetOneGaussianByBoxMuller();
#endif
Here's XYZ.cpp:
#include "XYZ.h"
#include <cstdlib>
#include <cmath>
// basic math functions are in std namespace but not in Visual C++ 6
//(comment's in code but I'm using GNU, not Visual C++)
#if !defined(_MSC_VER)
using namespace std;
#endif
double GetOneGaussianByBoxMuller()
{
...stuff...
}
I'm using GNU Compiler version g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3.
This is my first post; I hope I included everything that someone would need to know to help me. I have actually read the "Related Questions" and the Gough article listed in one of the responses, as well as searched around for the error message. However, I still can't figure out how it applies to my problem.
Thanks in advance!

When you run g++ -I. -l. ABC.cpp you are asking the compiler to create an executable out of ABC.cpp. But the code in this file relies on a function defined in XYZ.cpp, so the executable cannot be created due to that missing function.
You have two options (depending on what it is that you want to do). Either you give the compiler all of the source files at once so that it has all the definitions, e.g.
g++ -I. -l. ABC.cpp XYZ.cpp
or, you use the -c option to compile ABC.cpp to object code (.obj on Windows, .o on Linux) which can be linked later, e.g.
g++ -I. -l. -c ABC.cpp
This will produce ABC.o, which can be linked later with XYZ.o to produce an executable.
Edit: What is the difference between #including and linking?
Understanding this fully requires understanding exactly what happens when you compile a C++ program, which unfortunately even many people who consider themselves to be C++ programmers do not. At a high level, the compilation of a C++ program goes through three stages: preprocessing, compilation, and linking.
Preprocessing
Every line that starts with # is a preprocessor directive which is evaluated at the preprocessing stage. The #include directive is literally a copy-and-paste. If you write #include "XYZ.h", the preprocessor replaces that line with the entire contents of XYZ.h (including recursive evaluations of #include within XYZ.h).
The purpose of including is to make declarations visible. In order to use the function GetOneGaussianByBoxMuller, the compiler needs to know that GetOneGaussianByBoxMuller is a function, what (if any) arguments it takes, and what value it returns. For that, the compiler needs to see a declaration for it. Declarations go in header files, and header files are included to make declarations visible to the compiler before the point of use.
Compiling
This is the part where the compiler runs and turns your source code into machine code. Note that machine code is not the same thing as executable code. An executable requires additional information about how to load the machine code and the data into memory, and how to bring in external dynamic libraries if necessary. That's not done here. This is just the part where your code goes from C++ to raw machine instructions.
Unlike Java, Python, and some other languages, C++ has traditionally had no concept of a "module" (modules only arrived with C++20 and are not what is being used here). Instead, C++ works in terms of translation units. In nearly all cases, a translation unit corresponds to a single (non-header) source code file, e.g. ABC.cpp or XYZ.cpp. Each translation unit is compiled independently (whether you run separate -c commands for them, or you give them to the compiler all at once).
When a source file is compiled, the preprocessor runs first, and does the #include copy-pasting as well as macros and other things that the preprocessor does. The result is one long stream of C++ code consisting of the contents of the source file and everything included by it (and everything included by what it included, etc...) This long stream of code is the translation unit.
When the translation unit is compiled, every function and every variable used must be declared. The compiler will not allow you to call a function for which there is no declaration or to use a global variable for which there is no declaration, because then it wouldn't know the types, parameters, return values, etc, involved and could not generate sensible code. That's why you need headers -- keep in mind that at this point the compiler is not even remotely aware of the existence of any other source files; it is only considering this stream of code produced by the processing of the #include directives.
In the machine code produced by the compiler, there are no such things as variable names or function names. Everything must become a memory address. Every global variable must be translated to a memory address where it is stored, and every function must have a memory address that the flow of execution jumps to when it is called. For things that are defined (i.e. for functions, implemented) in the translation unit, the compiler can assign an address. For things that are only declared (usually as a result of included headers) and not defined, the compiler does not at this point know what the memory address should be. These functions and global variables for which the compiler has only a declaration but not a definition/implementation, are called external symbols, and they are presumed to exist in a different translation unit. For now, their memory addresses are represented with placeholders.
For example, when compiling the translation unit corresponding to ABC.cpp, it has a definition (implementation) of ABC, so it can assign an address to the function ABC and wherever in that translation unit ABC is called, it can create a jump instruction to that address. On the other hand, although its declaration is visible, GetOneGaussianByBoxMuller is not implemented in that translation unit, so its address must be represented with a placeholder.
The result of compiling a translation unit is an object file (with the .o suffix on Linux).
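The difference between a symbol that is defined in a translation unit and one that is only declared can be sketched like this. The names shifted and noise are hypothetical stand-ins (they are not the functions from the question), and the stand-in definition at the bottom would normally sit in a different source file.

```cpp
// --- first translation unit ---

// Declaration only: the compiler learns the signature but not the address,
// so every call to noise() is emitted with a placeholder.
double noise();

// Defined here, so the compiler can assign this function an address.
double shifted(double x) {
    return x + noise();  // external symbol, to be resolved by the linker
}

// --- second translation unit (shown in the same file for this sketch) ---

// The definition the linker would find when given the second object file.
double noise() {
    return 0.5;  // fixed stand-in value, purely for illustration
}
```

If the two halves were separate files and only the first were handed to the linker, the result would be an undefined reference to noise, just like the error in the question.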
Linking
One of the main jobs of the linker is to resolve external symbols. That is, the linker looks through a set of object files, sees what their external symbols are, and then tries to find out what memory address should be assigned to them, replacing the placeholder.
In your case the function GetOneGaussianByBoxMuller is defined in the translation unit corresponding to XYZ.cpp, so inside XYZ.o it has been assigned a specific memory address. In the translation unit corresponding to ABC.cpp, it was only declared, so inside ABC.o, it is only a placeholder (external symbol). The linker, if given both ABC.o and XYZ.o will see that ABC.o needs an address filled in for GetOneGaussianByBoxMuller, find that address in XYZ.o, and replace the placeholder in ABC.o with it. Addresses for external symbols can also be found in libraries.
If the linker fails to find an address for GetOneGaussianByBoxMuller (as it does in your example where it is only working on ABC.o, as a result of not having passed XYZ.cpp to the compiler), it will report an unresolved external symbol error, also described as an undefined reference.
Finally, once the linker has resolved all external symbols, it combines all of the now-placeholder-free object code, adds in all the loading information that the operating system needs, and produces an executable. Tada!
Note that through all of this, the names of the files don't matter one bit. It's a convention that XYZ.h should contain declarations for things that are defined in XYZ.cpp, and it's good for maintainable code to organize things that way, but the compiler and linker don't care one bit whether that's true or not. The linker will look through all the object files it's given and only the object files it's given to try to resolve a symbol. It neither knows nor cares which header the declaration of the symbol was in, and it will not try to automatically pull in other object files or compile other source files in order to resolve a missing symbol.
... wow, that was long.

Try
g++ ABC.cpp XYZ.cpp
If you want to compile them separately, you need to build object files:
g++ -c ABC.cpp
g++ -c XYZ.cpp
g++ ABC.o XYZ.o

Wish I had read these when I was having these problems:
http://c.learncodethehardway.org/book/learn-c-the-hard-waych3.html
http://www.thegeekstuff.com/2010/08/make-utility/
