Wrong symbols linked. Why? [duplicate] - c++

This question already has answers here:
C++ Multiple Definition of Struct
Why is there no multiple definition error when you define a class in a header file?
The C++ compiler seems to use the correctly declared struct of the same name in each file, but the linker then mismatches them without any warning or error! This also leads to UB, because at the very least the wrong ctor/dtor is used for the memory region.
Here is a minimal sandbox example. Each struct Test should be treated as an internal, non-public structure used only in its own .cpp file.
file1.cpp
#include <iostream>
using namespace std;
void someFunc();
struct Test
{
    Test() { std::cout << "1 "; }
    ~Test() { std::cout << "~1" << std::endl; }
};
int main()
{
    {
        Test test;
    }
    someFunc();
    return 0;
}
file2.cpp
#include <iostream>
struct Test {
    Test() { std::cout << "2 "; }
    ~Test() { std::cout << "~2" << std::endl; }
};
void someFunc() {
    Test test;
}
(A downloadable and buildable CMake project, just in case: https://file.io/dzafv409B2t0)
The output is:
1 ~1
1 ~1
So I expected either:
a successful build with the output: 1 ~1 2 ~2, or
a failed build with a multiple definition error.
Yes, I can resolve the problem if I:
rename the struct, or
put the struct into an anonymous namespace to force internal linkage,
...but this doesn't answer the main question:
Why does the linker behave like this? Why does it silently link to the first available matching symbol (among several) instead of reporting a multiple definition error?
Update: As I understand it, this mechanism is what allows a header with a class definition (with inline code) to be included into several different source files without multiple definition problems.
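A minimal sketch of the anonymous-namespace fix, applied to file2.cpp. To keep it checkable without capturing std::cout, the struct writes to a std::ostringstream named g_log, and a helper someFuncOutput() is added; both names are hypothetical additions, not part of the original project. file1.cpp can stay as it is (or get the same treatment).

```cpp
// file2.cpp, rewritten (sketch). Everything inside the unnamed namespace
// has internal linkage, so this Test can no longer be merged with the
// unrelated ::Test defined in file1.cpp.
#include <sstream>
#include <string>

namespace {

std::ostringstream g_log;  // hypothetical stand-in for std::cout

struct Test {
    Test() { g_log << "2 "; }
    ~Test() { g_log << "~2\n"; }
};

}  // namespace

void someFunc() {
    Test test;  // always this file's Test, whatever other files define
}

// Hypothetical helper: clears the log, runs someFunc() and returns what it logged.
std::string someFuncOutput() {
    g_log.str("");
    someFunc();
    return g_log.str();
}
```

The only change to the original file2.cpp is the namespace { ... } wrapper; the logging helper exists purely so the behavior can be observed.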

Related

How to make a variable available to multiple .cpp files using a class?

This question is derived from a previous one.
I have a working program that must be split into multiple parts. It uses a variable (currently a GTK+ one :P) many times, in parts of the program that will end up in separate .cpp files.
So I made a simple example to understand how to make variables available to the different parts of the program. A modified version of the previous code is:
#include <iostream>
using namespace std;
int entero = 10;
void function()
{
    cout<<entero<<endl;
    //action1...;
}
void separated_function()
{
    cout<<entero<<endl;
    //action2...;
}
int main( int argc, char *argv[] )
{
    function();
    separated_function();
    cout<<entero<<endl;
    //something else with the mentioned variables...;
    return 0;
}
I need to split the code correctly, so that function(), separated_function() and main() are in separate .cpp files, and make entero available to all of them... BUT:
In the previous question, @NeilKirk commented: "Do not use global variables. Put the required state into a struct or class, and pass it to functions as necessary as a parameter." (I have also found many web pages pointing out that global variables are not recommended.)
And as far as I can understand, the answer provided by @PaulH. describes how to make variables available by making them global.
That answer was very useful; it worked fine not only with char arrays but also with ints, strings and GTK+ variables (or pointers to variables :P).
But since this method is not recommended, I would be grateful if someone could show the correct way to split the code, passing the variables as function parameters or using some other method that is preferable to the (working) global-variable approach.
I read up on parameters and classes, but I'm a newbie, and I messed the code up without getting a good result.
You need to pass the parameter by reference if you want the same behavior as a global variable:
#include <iostream>
using namespace std;
// renamed the parameter to avoid confusion ('entero' is valid though)
void function(int &ent)
{
    cout<<ent<<endl;
    ++ent; // modify its value
    //action1...;
}
void separated_function(int &ent)
{
    cout<<ent<<endl;
    ++ent; // modify its value again
    //action2...;
}
int main( int argc, char *argv[] )
{
    int entero = 10; // initializing the variable
    // give the parameter by reference => the functions will be able to modify its value
    function(entero);
    separated_function(entero);
    cout<<entero<<endl;
    //something else with the mentioned variables...;
    return 0;
}
output:
10
11
12
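The same approach scales to several pieces of shared state, which is what the comment quoted in the question suggests: bundle the state in a struct and pass it by reference. A minimal sketch; the struct name AppState and its members are hypothetical. In the split program, the struct definition would go in a shared header, each function in its own .cpp file, and main() would create one AppState and pass it along.

```cpp
#include <iostream>
#include <string>

// Hypothetical bundle of the state that used to be global.
struct AppState {
    int entero = 10;
    std::string label = "entero";
};

// Each function receives the state explicitly instead of reading a global.
void function(AppState& state) {
    std::cout << state.label << ": " << state.entero << '\n';
    ++state.entero;  // the change is visible to the caller
    // action1...
}

void separated_function(AppState& state) {
    std::cout << state.label << ": " << state.entero << '\n';
    ++state.entero;
    // action2...
}
```

Calling function(state) and then separated_function(state) on a default-constructed AppState prints "entero: 10" and "entero: 11" and leaves state.entero at 12. Adding another shared value later means adding a member, not changing every function signature.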
Defining a class or struct in a header file is the way to go; then include the header file in all source files that need the classes or structures. You can also place function prototypes or preprocessor macros in header files if they are needed by multiple source files, as well as variable declarations (e.g. extern int some_int_var;) and namespace declarations.
You will not get multiple definition errors from defining the classes, because a class is a concept for the compiler to handle: the class definition itself never produces a symbol for the linker, which is where multiple definition errors occur. Member functions defined inside the class body are implicitly inline, so they may be compiled into every source file that includes the header, and the linker keeps only one copy.
Let's take a simple example, with one header file and two source files.
First the header file, e.g. myheader.h:
#ifndef MYHEADER_H
#define MYHEADER_H
// The above is called include guards (https://en.wikipedia.org/wiki/Include_guard)
// and are used to protect the header file from being included
// by the same source file twice
// Define a namespace
namespace foo
{
    // Define a class
    class my_class
    {
    public:
        my_class(int val)
            : value_(val)
        {}
        int get_value() const
        {
            return value_;
        }
        void set_value(const int val)
        {
            value_ = val;
        }
    private:
        int value_;
    };
    // Declare a function prototype
    void bar(my_class& v);
}
#endif // MYHEADER_H
The above header file defines a namespace foo and, inside it, a class my_class and a declaration of the function bar.
(Strictly speaking, the namespace is not necessary for a simple program like this, but it becomes more useful in larger projects.)
Then the first source file, e.g. main.cpp:
#include <iostream>
#include "myheader.h" // Include our own header file
int main()
{
    using namespace foo;
    my_class my_object(123); // Create an instance of the class
    bar(my_object); // Call the function
    std::cout << "In main(), value is " << my_object.get_value() << '\n';
    // All done
}
And finally the second source file, e.g. bar.cpp:
#include <iostream>
#include "myheader.h"
void foo::bar(foo::my_class& val)
{
    std::cout << "In foo::bar(), value is " << val.get_value() << '\n';
    val.set_value(456);
}
Put all three files in the same project, and build. You should now get an executable program that outputs
In foo::bar(), value is 123
In main(), value is 456
I prefer to provide a functional interface to global data.
.h file:
extern int get_entero();
extern void set_entero(int v);
.cpp file:
static int entero = 10;
int get_entero()
{
    return entero;
}
void set_entero(int v)
{
    entero = v;
}
Then, everywhere else, use those functions.
#include <iostream>
#include "the_h_file"
using namespace std;
void function()
{
    cout << get_entero() << endl;
    //action1...;
}
void separated_function()
{
    cout << get_entero() << endl;
    //action2...;
}
int main( int argc, char *argv[] )
{
    function();
    separated_function();
    cout << get_entero() << endl;
    //something else with the mentioned variables...;
    return 0;
}
If you do not plan to modify the variable, it is generally OK to make it global. However, it is best to declare it with the const keyword to tell the compiler that it must not be modified, like so:
const int ENTERO = 10;
If you are using multiple cpp files, also consider using a header file for your structures and function declarations.
If you are planning on modifying the variable, just pass it around in function parameters.
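If such a constant is needed in several .cpp files, it can live in a header. At namespace scope, a const or constexpr variable has internal linkage in C++, so each file that includes the header gets its own private copy and the linker reports no multiple definition. A minimal sketch; the header name constants.h and the helper entero_plus() are hypothetical. (Since C++17, inline constexpr int ENTERO = 10; would instead give one variable shared by all files.)

```cpp
// constants.h (hypothetical header name)
#ifndef CONSTANTS_H
#define CONSTANTS_H

// const/constexpr at namespace scope implies internal linkage in C++,
// so including this header from several .cpp files causes no duplicate symbols.
constexpr int ENTERO = 10;

// Hypothetical helper: code that only reads the value can use it directly.
// Code that needs to modify state should take it as a parameter instead.
inline int entero_plus(int delta) { return ENTERO + delta; }

#endif  // CONSTANTS_H
```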

linker error in simple program: multiple definition of function

My function test is defined in two different .cpp files, and each function is meant to be private to its own file, as shown below:
test1.cpp
#include <iostream>
using namespace std;
void test()
{
    cout << "test" << endl;
}
test2.cpp
#include <iostream>
using namespace std;
void test()
{
    cout << "test" << endl;
}
main.cpp
#include <iostream>
using namespace std;
int main()
{
    return 0;
}
During linking I get the error "multiple definition of test()", but how is that possible, considering that each file has its own private scope? I could understand it if I had included the function prototype in a header for each .cpp file, but there is no such thing in this example.
You need the inline keyword for that:
inline void test()
{
    cout << "test" << endl;
}
This allows you to have multiple definitions in separate source files without violating the one-definition rule. However, note that the function still has external linkage and they will all resolve to the same address. Also:
"An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly the same definition in every case."
If you want separate functions with different addresses (internal linkage), use the static keyword instead.
Both test functions are in the same global namespace of the program. To avoid the error, you can:
1) wrap one or both functions in a namespace:
namespace A
{
    void test()
    {
        ...
    }
}
2) use the static keyword
3) just rename one of them
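A minimal sketch of option 1, with both files pasted into one listing. The namespace names and returned strings are illustrative (the original functions print instead). Because the functions are now A::test and B::test, they get different mangled names, so the linker sees two unrelated symbols.

```cpp
#include <string>

// test1.cpp (sketch)
namespace A {
std::string test() { return "test from A"; }
}  // namespace A

// test2.cpp (sketch)
namespace B {
std::string test() { return "test from B"; }
}  // namespace B
```

Callers then have to say which one they mean, e.g. A::test(), or bring one into scope with a using-declaration.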
Add static to each test function:
#include <iostream>
using namespace std;
static
void test()
{
    cout << "test" << endl;
}
To elaborate on the answers above:
In C++, function declarations can be repeated as many times as you want. A (non-inline) function definition, however (i.e. the function body), can occur only once.
When building your binary, the compiler compiles each file to an object file, so in your example you end up with test1.obj, test2.obj and main.obj. After all files have compiled successfully, the linker links them together to create your executable. This is where the multiple definitions of the same function are found, and why linking fails.
Depending on what you want, you can do the following to resolve this:
If you want multiple different functions with the same name, then you have to disambiguate them. C++ wouldn't be C++ if you only had one way to do this:
The old C way: use the static keyword
Use an anonymous namespace
Use a namespace
If you want only one function:
Separate the definition from the declaration, i.e. put the declaration in a header file and move the definition to a source file.
Define the function as inline in a header
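The two "only one function" options can be sketched as follows, with the header and the source file shown in one listing. The file names test.h and test.cpp and the function test_inline() are hypothetical, and the functions return strings instead of printing so the result can be checked.

```cpp
#include <string>

// --- test.h (hypothetical) ---
// A declaration may be repeated in every translation unit that includes it.
std::string test();

// An inline definition may also appear in every translation unit that
// includes the header, as long as all copies are identical.
inline std::string test_inline() { return "inline test"; }

// --- test.cpp (hypothetical) ---
// The one and only out-of-line definition of test().
std::string test() { return "test"; }
```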

How to prove that, when compiling templates in C++, the compiler creates multiple copies and the linker removes them

How can I prove that, when compiling templates in C++, the compiler generates an instantiation in each compilation unit that uses it, and the linker then throws away all but one of them (the common model)?
So there are two things we need to prove:
1. multiple copies are created;
2. the copies are removed at link time.
We can prove the second one using code like this:
// head.h
#ifndef HEAD_H
#define HEAD_H
#include <typeinfo>
#include <iostream>
template<typename T>
class Test
{
public:
    Test(T i = 0) : val(i) {}
    void getId() const
    {
        std::cout << typeid(*this).name() << std::endl;
    }
    void getVal() const
    {
        std::cout << "Val: " << val << std::endl;
    }
private:
    T val;
};
#endif
// a.cpp
#include "head.h"
Test<int> a(1);
// b.cpp
#include "head.h"
extern Test<int> a;
int main()
{
    Test<int> b;
    a.getId();
    b.getId();
    a.getVal();
    b.getVal();
    return 0;
}
Compiler: g++ 4.4.1
Result:
4TestIiE
4TestIiE
Val: 1
Val: 0
So the second one has been proved.
But I cannot prove the first one.
I googled it and found some suggestions, such as:
1. Use a dump: yes, we can dump the object file and get the result,
but can we write some code that outputs something to prove it?
Number 1 is easy. Just create a bunch of different source files and include the template header in each one, and use the template class to produce some output. Then compile each source file separately. Now link them each one by one with a main program that calls it. If you don't get any linker errors but the program generates the output, that proves each compiled object file contained the template code.
P.S. The extra copies might not get eliminated; they may still exist as dead code in the executable.
Some compilers definitely don't do that. The IBM C++ compiler generates required templates at link time and compiles them then, in a repeat-until-closure process.
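A related tool, not mentioned in the answers above, is C++11's explicit instantiation declaration, extern template. It tells a translation unit not to instantiate the template itself and to rely on an explicit instantiation in another file. That makes the per-file copies observable: if a file that uses Test<int> carries the extern line and is linked without the file holding the explicit instantiation definition, the link would be expected to fail with undefined references to Test<int> members, whereas without the extern line that file's object contains its own copy. A sketch under those assumptions, with the pieces pasted into one listing and the question's class reduced to a getter; the member functions are defined outside the class body because inline functions are exempt from this suppression.

```cpp
// head.h (reduced version of the question's class template)
template <typename T>
class Test {
public:
    Test(T i = 0);
    T getVal() const;

private:
    T val;
};

// Out-of-class, non-inline member definitions, so that the extern
// template declaration below really suppresses their instantiation.
template <typename T>
Test<T>::Test(T i) : val(i) {}

template <typename T>
T Test<T>::getVal() const { return val; }

// b.cpp: explicit instantiation *declaration*. Test<int> members are not
// instantiated in this translation unit; another file must provide them.
extern template class Test<int>;

// a.cpp: explicit instantiation *definition*, the one place the code is emitted.
template class Test<int>;
```

When both lines end up in the same translation unit, as here, the declaration must come before the definition.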

When should linkers generate multiply defined X warnings?

Never turn your back on C++. It'll getcha.
I'm in the habit of writing unit tests for everything I do. As part of this I frequently define classes with names like A and B in the .cxx of the test to exercise code, safe in the knowledge that i) because this code never becomes part of a library or is used outside of the test, name collisions are likely very rare, and ii) the worst that could happen is that the linker will complain about a multiply defined A::A() or whatever, and I'll fix that error. How wrong I was.
Here are two compilation units:
#include <iostream>
using namespace std;
// Fwd decl.
void runSecondUnit();
class A {
public:
    A() : version( 1 ) {
        cerr << this << " A::A() --- 1\n";
    }
    virtual ~A() {
        cerr << this << " A::~A() --- 1\n";
    }
    int version;
};
void runFirstUnit() {
    A a;
    // Reports 1, correctly.
    cerr << " a.version = " << a.version << endl;
    // If you uncomment these, you will call
    // secondCompileUnit: A::getName() instead of A::~A !
    //A* a2 = new A;
    //delete a2;
}
int main( int argc, char** argv ) {
    cerr << "firstUnit BEGIN\n";
    runFirstUnit();
    cerr << "firstUnit END\n";
    cerr << "secondUnit BEGIN\n";
    runSecondUnit();
    cerr << "secondUnit END\n";
}
and
#include <iostream>
using namespace std;
void runSecondUnit();
// Uncomment to fix all the errors:
//#define USE_NAMESPACE
#if defined( USE_NAMESPACE )
namespace mySpace
{
#endif
class A {
public:
    A() : version( 2 ) {
        cerr << this << " A::A() --- 2\n";
    }
    virtual const char* getName() const {
        cerr << this << " A::getName() --- 2\n"; return "A";
    }
    virtual ~A() {
        cerr << this << " A::~A() --- 2\n";
    }
    int version;
};
#if defined(USE_NAMESPACE )
} // mySpace
using namespace mySpace;
#endif
void runSecondUnit() {
    A a;
    // Reports 1. Not 2 as above!
    cerr << " a.version = " << a.version << endl;
    cerr << " a.getName()=='" << a.getName() << "'\n";
}
Ok, ok. Obviously I shouldn't have declared two classes called A. My bad. But I bet you can't guess what happens next...
I compiled each unit, and linked the two object files (successfully) and ran. Hmm...
Here's the output (g++ 4.3.3):
firstUnit BEGIN
0x7fff0a318300 A::A() --- 1
a.version = 1
0x7fff0a318300 A::~A() --- 1
firstUnit END
secondUnit BEGIN
0x7fff0a318300 A::A() --- 1
a.version = 1
0x7fff0a318300 A::getName() --- 2
a.getName()=='A'
0x7fff0a318300 A::~A() --- 1
secondUnit END
So there are two separate A classes. In the second use, the constructor and destructor of the first one were used, even though only the second one was visible in its compilation unit. Even more bizarre, if I uncomment the lines in runFirstUnit, instead of calling either A::~A, A::getName is called. Clearly in the first use, the object gets the vtable for the second definition (getName is the second virtual function in the second class, the destructor the second in the first). And it even correctly gets the constructor from the first.
So my question is: why didn't the linker complain about the multiply defined symbols?
It appears to choose the first match. Reordering the objects in the link step confirms this.
The behavior is identical in Visual Studio, so I'm guessing that this is some standard-defined behavior. My question is, why? Clearly it would be easy for the linker to barf given the duplicate names.
If I add
void f() {}
to both files it complains. Why not for my class constructors and destructors?
EDIT The problem isn't, "what should I have done to avoid this", or "how is the behavior explained". It is, "why don't linkers catch it?" Projects may have thousands of compile units. Sensible naming practices don't really solve this issue -- they only make the problem obscure and only then if you can train everyone to follow them.
The above example leads to ambiguous behavior that is easily and definitively solvable by compiler tools. So why do they not solve it? Is this simply a bug? (I suspect not.)
** EDIT ** See litb's answer below. I'm repeating it back to make sure my understanding is right:
Linkers only report multiple definitions for strong symbols.
Because we have shared headers, inline function definitions (i.e. where the declaration and definition are made in the same place, or template functions) are compiled into the object file of every TU that sees them. Because there's no easy way to restrict the generation of this code to a single object file, the linker has the job of choosing one of many definitions. So that the linker does not generate errors, the symbols for these compiled definitions are tagged as weak in the object file.
The compiler and linker rely on both classes being exactly the same. In your case, they are different, and so strange things happen. The one definition rule says that the result is undefined behavior, so the behavior is not at all required to be consistent among compilers. I suspect that in runFirstUnit, in the delete line, it puts a call to the first virtual table entry (because in its translation unit, the destructor may occupy the first entry).
In the second translation unit, this entry happens to point to A::getName, but in the first translation unit (where you execute the delete), the entry points to A::~A. Since these two are differently named (A::~A vs A::getName) you don't get a name clash (you will have code emitted for both the destructor and getName). But since their class name is the same, their v-tables will clash on purpose, because since both classes have the same name, the linker will think they are the same class and assume same contents.
Notice that all member functions were defined in-class, which means they are all inline functions. These functions can be defined multiple times in a program. In the case of in-class definitions, the rationale is that you may include the same class definition into different translation units from their header files. Your test function, however, isn't an inline function, and thus including it into different translation units will trigger a linker error.
If you enable namespaces, there will be no clash whatsoever, because ::A and ::mySpace::A are different classes, and of course will get different v-tables.
A simple way to restrict each class to the current translation unit is to enclose it in an anonymous namespace:
// a.cpp
namespace {
    class A {
        // ...
    };
}
// b.cpp
namespace {
    class A {
        // ...
    };
}
is perfectly legal. Because the two classes are in separate translation units, and are inside anonymous namespaces, they won't conflict.
The functions are defined as inline. inline functions can be defined multiple times in the program. See point 3 in the summary here:
http://en.wikipedia.org/wiki/One_Definition_Rule
The important point is:
For a given entity, each definition must be the same.
Try not defining the functions as inline. The linker should start to give duplicate symbol errors then.
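To see the difference, the class from the first unit can be rewritten with its member functions defined outside the class body. A minimal sketch (the output statements are dropped for brevity): these definitions are ordinary, non-inline functions, so if the second unit defined A::A() and A::~A() the same way, the link would be expected to fail with multiple definition errors instead of silently picking one.

```cpp
// First unit (sketch): only declarations inside the class body...
class A {
public:
    A();
    virtual ~A();
    int version;
};

// ...and ordinary, non-inline definitions outside it. Each of these is a
// strong symbol, so a second non-inline A::A() in another object file
// makes the link fail instead of being silently merged.
A::A() : version(1) {}
A::~A() {}
```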

several definitions of the same class

Playing around with MSVC++ 2005, I noticed that if the same class is defined several times, the program still happily links, even at the highest warning level. I find it surprising; how come this is not an error?
module_a.cpp:
#include <iostream>
struct Foo {
    const char * Bar() { return "MODULE_A"; }
};
void TestA() { std::cout << "TestA: " << Foo().Bar() << std::endl; }
module_b.cpp:
#include <iostream>
struct Foo {
    const char * Bar() { return "MODULE_B"; }
};
void TestB() { std::cout << "TestB: " << Foo().Bar() << std::endl; }
main.cpp:
void TestA();
void TestB();
int main() {
    TestA();
    TestB();
}
And the output is:
TestA: MODULE_A
TestB: MODULE_A
It is an error - the code breaks the C++ One Definition Rule. If you do that, the standard says you get undefined behaviour.
The code links, because if you had:
struct Foo {
    const char * Bar() { return "MODULE_B"; }
};
in both modules there would NOT be an ODR violation; after all, this is basically what #including a header does. The violation comes from the fact that your definitions are different (the other one contains the string "MODULE_A"), but there is no way for the linker (which just looks at class/function names) to detect this.
The compiler might consider that the object is useless besides its use in the TestA()/TestB() function and hence inline the whole thing. That way, the linker would never see that either class even existed! Just an idea, though.
Or somehow, the linking between TestA and class Foo would be done during compilation. There would be a conflict if the linker were looking for class Foo (multiple definition), but the linker simply does not look for it!
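When the two Foo types really are meant to be different, the fix is to give each module's Foo its own scope so that they become distinct types. A minimal sketch with both modules pasted into one listing; the namespace names module_a and module_b are illustrative (an unnamed namespace in each file works as well), and TestA()/TestB() return strings instead of printing so the result can be checked.

```cpp
#include <string>

// module_a.cpp (sketch)
namespace module_a {
struct Foo {
    const char* Bar() { return "MODULE_A"; }
};
}  // namespace module_a

std::string TestA() { return std::string("TestA: ") + module_a::Foo().Bar(); }

// module_b.cpp (sketch)
namespace module_b {
struct Foo {
    const char* Bar() { return "MODULE_B"; }
};
}  // namespace module_b

std::string TestB() { return std::string("TestB: ") + module_b::Foo().Bar(); }
```

module_a::Foo::Bar and module_b::Foo::Bar mangle to different names, so there is nothing left for the linker to merge.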
Do you have linking errors if compiling in debug mode with no optimizations enabled ?