When can the linker error on multiply defined symbols? - c++

When I have the following code in A.cpp and B.cpp there is no warning or error generated, but Initializer::Initializer() in B.cpp gets called twice while the one in A.cpp doesn't get called.
static int x = 0;
struct Initializer
{
Initializer()
{
x = 10;
}
};
static Initializer init;
Since this is breaking the one definition rule and causing undefined behavior I think this is perfectly normal. However, when I move the constructor definition outside of the class declaration in either or both files, like this:
static int x = 0;
struct Initializer
{
Initializer();
};
Initializer::Initializer()
{
x = 10;
}
static Initializer init;
The linker suddenly becomes smart enough to error and say one or more multiply defined symbols found. What changed here and why did it matter? I would have thought the linker should always be able to detect ODR breakage - what are the cases when it can't?
Someone correct me if I'm wrong, but my theory is that when you have templated code (where the definitions are always in headers) you end up with duplicated definitions in many compilation units. They happen to all be identical, so it doesn't matter if the linker just chooses one and orphans the others, but it can't error that there are multiple definitions or templates wouldn't work.

Your second example has an obvious and easy to diagnose violation of the one definition rule. It has two definitions of a non-inline function with external linkage. This is easy for the linker to diagnose as the violation is obvious from the names of the functions contained in the object files that it is linking.
Your first example breaks the one definition rule in a much subtler way. Because functions defined in a class body are implicitly declared inline you have to examine the function bodies to determine that the one definition rule has been violated.
For this reason alone I am not surprised that your implementation fails to spot the violation (I didn't the first time around). Obviously the violation is impossible for the compiler to spot when looking at one source file in isolation but it may be that the information to detect the violation and link time is not actually present the object files that are passed to the linker. It is certainly beyond the scope of what I'd expect a linker to find.

The multiply defined symbol error occurs when multiple translation units each have the same name signature for an item with external linkage, except if they are inlined functions or methods.
Inlined functions are typically not subject to ODR at compile time. If they were, inlined implementations of methods would break everywhere. However, the ODR is applied retroactively during link time, in that one inline function is chosen. So, inline functions with the same signature are expected to behave identically. Your inline constructors violate that expectation.
If instead you had declared a template in a header file, like this:
#ifndef I_HH
#define I_HH
tempalte <typename T>
struct Initializer {
Initializer () { x = 10; }
};
#endif
and used this to be included in both A.cpp and B.cpp, and in each you created a static instance:
static int x;
#include "i.hh"
static Initializer<int> init;
I believe the compiler should complain about an improperly formed template (g++ did anyway), which is as good as detecting a violation of ODR (that the constructors would behave differently in different contexts).

if you have the implement of function within the class defination, the function is inlined which don't required to linked. So there has no error.
There must has one definition, but doesn't matter it is included by different cpp file. for example, you defined one class in a header file but include the header file in different cpp files.

Related

Why do class member functions defined outside the class (but in header file) have to be inlined?

I have read existing answers on the two meanings of inline, but I am still confused.
Let's assume we have the following header file:
// myclass.h
#ifndef INCLUDED_MYCLASS
#define INCLUDED_MYCLASS
class MyClass
{
public:
void foo(); // declaration
};
inline void MyClass::foo()
{
// definition
}
#endif
Why does void foo() which is defined outside the class in the file, have to be explicitly defined with inline?
It's because you defined MyClass::foo in a header file. Or a bit more abstract, that definition will be present in multiple translation units (every .cpp file that includes the header).
Having more than one definition of a variable/function in a program is a violation of the one definition rule, which requires that there must be only one definition in a single program of every variable/function.
Note that header guards do not protect against this, as they only protect if you include the same header multiple times in the same file.
Marking the function definition as inline though means that the definition will always be the same across multiple translation units.1.
In practice, this means that the linker will just use the first definition of MyClass::foo and use that everywhere, while ignoring the rest,
1: If this is not the case your program is ill-formed with no diagnostics required whatsoever.
If you put MyClass::foo() in a header file and fail to declare it inline then the compiler will generate a function body for every compilation unit that #includes the header and these will clash at link time. The usual error thrown by the linker is something along the lines of Multiple definition of symbol MyClass::foo() or somesuch. Declaring the function inline avoids this, and the compiler and linker have to be in cahoots about it.
As you mention in your comment, the inline keyword also acts a hint to the compiler that you'd like the function to be actually inlined, because (presumably) you call it often and care more about speed than code size. The compiler is not required to honour this request though, so it might generate one or more function bodies (in different compilation units) which is why the linker has to know that they are actually all the same and that it only needs to keep one of them (any one will do). If it didn't know they were all the same then it wouldn't know what to do, which is why the 'classical' behaviour has always been to throw an error.
Given that the compilers these days often inline small functions anyway, most compilers also have some kind of noinline keyword but that is not part of the standard.
More about inline at cppreference.

Why does defining inline global function in 2 different cpp files cause a magic result?

Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
std::cout << "f1\n";
}
void f1()
{
foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
std::cout << "f2\n";
}
void f2()
{
foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
f1();
f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: Compiler somehow picks only the definition from file1.cpp and uses it also in f2(). What is the exact explanation of this behavior?.
Note, that changing inline to static is a solution for this problem. Putting the inline definition inside an unnamed namespace also solves the problem and the program prints:
f1
f2
This is undefined behavior, because the two definitions of the same inline function with external linkage break C++ requirement for objects that can be defined in several places, known as One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition, violating the aforementioned rule makes the program ill-formed. The compiler could also diagnose that and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasize mine.
As others have noted, the compilers are in compliance with the C++ standard because the One definition rule states that you shall have only one definition of a function, except if the function is inline then the definitions must be the same.
In practice, what happens is that the function is flagged as inline, and at linking stage if it runs into multiple definitions of an inline flagged token, the linker silently discards all but one. If it runs into multiple definitions of a token not flagged inline, it instead generates an error.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check definitions of function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a reall subtle set of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
Another really nasty example of this:
// A.cpp
struct Helper {
std::vector<int> foo;
Helper() {
foo.reserve(100);
}
};
// B.cpp
struct Helper {
double x, y;
Helper():x(0),y(0) {}
};
methods defined in the body of a class are implicitly inline. The ODR rule applies. Here we have two different Helper::Helper(), both inline, and they differ.
The sizes of the two classes differ. In one case, we initialize two sizeof(double) with 0 (as the zero float is zero bytes in most situations).
In another, we first initialize three sizeof(void*) with zero, then call .reserve(100) on those bytes interpreting them as a vector.
At link time, one of these two implementations is discarded and used by the other. What more, which one is discarded is likely to be pretty determistic in a full build. In a partial build, it could change order.
So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
POINT OF CLARIFICATION:
Although the answer rooted in C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately, then, as one commentator noted, each resulting object file would contain its own 'foo()'. HOWEVER: If these two object files are then linked together, then because both 'foo()'-s are non-static, the name 'foo()' appears in the exported symbol table of both object files; then the linker has to coalesce the two table entries, hence all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound [i.e the linker would treat the second record as 'extern' regardless of binding]).

Why defining classes in header files works but not functions

I have this little piece of code :
File modA.h
#ifndef MODA
#define MODA
class CAdd {
public:
CAdd(int a, int b) : result_(a + b) { }
int getResult() const { return result_; }
private:
int result_;
};
/*
int add(int a, int b) {
return a + b;
}
*/
#end
File calc.cpp
#include "modA.h"
void doSomeCalc() {
//int r = add(1, 2);
int r = CAdd(1, 2).getResult();
}
File main.cpp
#include "modA.h"
int main() {
//int r = add(1, 2);
int r = CAdd(1, 2).getResult();
return 0;
}
If I understand well, we can't define a function in a header file and use it in different unit translations (unless the function is declared static). The macro MODA wouldn't be defined in each unit translation and thus the body guard wouldn't prevent the header from being copied in place of every #include "modA.h". This would cause the function to be defined at different places and the linker would complain about it. Is it correct ?
But then why is it possible to do so with a class and also with methods of a class. Why doesn't the linker complain about it ?
Isn't it a redefinition of a class ?
Thank you
When member functions are defined in the body of the class definition, they are inline by default. If you qualify the non-member functions inline in the .h file, it will work fine.
Without the inline qualifier, non-member functions defined in .h files are compiled in every .cpp file that the .h file is included in. That violates the following rule from the standard:
3.2 One definition rule
3 Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; ...
You will get the same error if you define member functions outside the body of the class definition in a .h file and did not add the inline qualifier explicitly.
Multiple translation units might need the definition of the class at compile time since it is not possible to know, for example, the types of members of the class (or even whether they exist) unless the definition of the class is available. (Therefore, you must be allowed to define a class in multiple translation units.) On the other hand, a translation unit only needs the declaration of a function because as long as it knows how to call the function, the compiler can leave the job of inserting the actual address of the function to the linker.
But this comes at a price: if a class is defined multiple times in a program, all the definitions must be identical, and if they're not, then you may get strange linker errors, or if the program links, it will probably segfault.
For functions you don't have this problem. If the function is defined multiple times, the linker will let you know. This is good, because it avoids accidentally defining multiple functions with the same name and signature in a given program. If you want to override this, you can declare the function inline. Then the same rule as that for classes applies: the function has to be defined in each translation unit in which it's used in a certain way (odr-used, to be precise) and all the definitions must be identical.
If a function is defined within a class definition, there's a special rule that it's implicitly inline. If this were not the case, then it would make it impossible to have multiple definitions of a class as long as there's at least one function defined in the class definition unless you went to the trouble of marking all such functions inline.
” If I understand well, we can't define a function in a header file and use it in different unit translations (unless the function is declared static)
That's incorrect. You can, and in the presented code CAdd::getResult is one such function.
In order to support general use of a header in multiple translation units, which gives multiple competing definitions of the function, it needs to be inline. A function defined in the class definition, like getResult, is automatically inline. A function defined outside the class definition needs to be explicitly declared inline.
In practical terms the inline specifier tells the linker to just arbitrarily select one of the definitions, if there are several.
There is unfortunately no simple syntax to do the same for data. That is, data can't just be declared as inline. However, there is an exemption for static data members of class templates, and also an inline function with extern linkage can contain static local variables, and so compilers are required to support effectively the same mechanism also for data.
An inline function has extern linkage by default. Since inline also serves as an optimization hint it's possible to have an inline static function. For the case of the default extern linkage, be aware that the standard then requires the function to be defined, identically, in every translation unit where it's used.
The part of the standard dealing with this is called the One Definition Rule, usually abbreviated as the ODR.
In C++11 the ODR is §3.2 “One definition rule”. Specifically, C++11 §3.2/3 specifies the requirement about definitions of an inline function in every relevant translation unit. This requirement is however repeated in C+11 §7.1.2/4 about “Function specifiers”.

What determines which class definition is included for identically-named classes in two source files?

If I have two source files in a project that each define a class of the same name, what determines which version of the class is used?
For example:
// file1.cpp:
#include <iostream>
#include "file2.h"
struct A
{
A() : a(1) {}
int a;
};
int main()
{
// foo() <-- uncomment this line to draw in file2.cpp's use of class A
A a; // <-- Which version of class A is chosen by the linker?
std::cout << a.a << std::endl; // <-- Is "1" or "2" output?
}
...
//file2.h:
void foo();
...
// file2.cpp:
#include <iostream>
#include "file2.h"
struct A
{
A() : a(2) {}
int a;
};
void foo()
{
A a; // <-- Which version of class A is chosen by the linker?
std::cout << a.a << std::endl; // <-- Is "1" or "2" output?
}
I have been able to get different versions of A to be selected by the linker, with identical code - merely by changing the order in which I type the code (building along the way).
Granted, it's poor programming practice to include different definitions of classes in the same namespace with the same name. However, are there defined rules that determine which class will be selected by the linker - and if so, what are they?
As a useful addendum to this question, I would like to know (in general) how the compiler / linker handles classes - does the compiler, when it builds each source file, incorporate the class name and compiled class definition within the object file, whereas the linker (in the scenario of a name clash) throws away one set of compiled class function/member definitions?
The issue of a name clash is not arcane - I now realize that it happens EVERY TIME a header-only template file is #included by two or more source files (and subsequently the same template classes are instantiated, and the same member functions called, in these multiple source files), as is a common scenario with the STL. Each source file must have a separately-compiled version of the same instantiated template class functions, so the linker MUST be selecting among different such compiled versions of these functions at linkage time), I would think.
-- ADDENDUM with related question about Java --
I note that various answers have indicated the One Definition Rule (http://en.wikipedia.org/wiki/One_definition_rule) for C++. As an interesting aside, am I correct that Java has NO SUCH rule - so that multiple, different definitions ARE allowed in Java by the Java specifications?
If a C++ program provides two definitions of the same class (i.e., within the same namespace and named identical), the program violates the rules of the standard and you'll get undefined behavior. What exactly does happen somewhat depends on the compiler and linker: sometimes you get a linker error but this isn't required.
The obvious fix is not to have conflicting class names. The easiest approach to obtain unique class names is to define locally used types within an unnamed namespace:
// file1.cpp
namespace {
class A { /*...*/ };
}
// file2.cpp
namespace {
class A { /*...*/ };
}
These two classes won't conflict.
Such a program violates the One Definition Rule and exhibits undefined behavior.
If there are multiple definitions of a class or inline function in a program (in different translation units, or source files), then all of the definitions must be identical. Neither the compiler nor the linker is required to diagnose all violations of this rule (and not all violations can be easily diagnosed).
This is only successfully linking because the definitions for the 2 constructors are implied to be inline. Try moving them under the class and not using the inline keyword. The kind of linkage you are abusing tells the linker that there will be multiple definitions, where normally it would error that you're breaking the One Definition Rule, which you are actually breaking. Normally this condition that allows you to seemingly break ODR exists for things like templates, which will always have multiple identical definitions in different translation units. But that's the condition: definitions in different translation units must be identical.
Ultimately it's up to your compiler which gets used, in your example.
The compiler will give you a warning for multiple definitions if you allow it (which you should).
The gnu linker resolves symbols in the order that you present files on the command line, so it uses the first definition that it sees. Not sure if all linkers work the same way.
The reason the One Definition Rule exists is so that it doesn't matter which definition is used, they're all identical. It's completely up to the compiler and linker in question as to which version is used, or whether they're consistent. The only externally viewable side effect is when there's a static variable inside a function, a single instance of the variable must be used between all instatiations of the function.
By violating the One Definition Rule you're exposing the mechanics of the compiler/linker in a way that isn't relevant to a correctly written program.

strange redefined symbols

I included this header into one of my own: http://codepad.org/lgJ6KM6b
When I compiled I started getting errors like this:
CMakeFiles/bin.dir/SoundProjection.cc.o: In function `Gnuplot::reset_plot()':
/usr/lib/gcc/x86_64-pc-linux-gnu/4.3.4/include/g++-v4/new:105: multiple definition of `Gnuplot::reset_plot()'
CMakeFiles/bin.dir/main.cc.o:project/gnuplot-cpp/gnuplot_i.hpp:962: first defined here
CMakeFiles/bin.dir/SoundProjection.cc.o: In function `Gnuplot::set_smooth(std::basic_string, std::allocator > const&)':
project/gnuplot-cpp/gnuplot_i.hpp:1041: multiple definition of `Gnuplot::set_smooth(std::basic_string, std::allocator > const&)'
CMakeFiles/bin.dir/main.cc.o:project/gnuplot-cpp/gnuplot_i.hpp:1041: first defined here
CMakeFiles/bin.dir/SoundProjection.cc.o:/usr/include/eigen2/Eigen/src/Core/arch/SSE/PacketMath.h:41: multiple definition of `Gnuplot::m_sGNUPlotFileName'
I know it's hard to see in this mess, but look at where the redefinitions are taking place. They take place in files like /usr/lib/gcc/x86_64-pc-linux-gnu/4.3.4/include/g++-v4/new:105. How is the new operator getting information about a gnuplot header? I can't even edit that file. How could that ever even be possible? I'm not even sure how to start debugging this. I hope I've provided enough information. I wasn't able to reproduce this in a small project. I mostly just looking for tips on how to find out why this is happening, and how to track it down.
Thanks.
You're obviously violating the "one definition rule". You have lots of definitions in your header file. Some of them are classes or class templates (which is fine), some of them are inline functions or function templates (which is also fine) and some of them are "normal" functions and static members of non-templates (which is not fine).
class foo; // declaration of foo
class foo { // definition of foo
static int x; // declaration of foo::x
};
int foo::x; // definition of foo::x
void bar(); // declaration
void bar() {} // definition
The one definition rule says that your program shall contain at most one definition of an entity. The exceptions are classes, inline functions, function templates, static members of class templates (I probably forgot something). For those entities multiple definitions may exist as long as no two definitions of the same entity are in the same translation unit. So, including this header file into more than one translation unit leads to multiple definitions.
Looks like you include conflicting headers. Try to check your include paths. They usually are defined in -I directive (at least in gcc) or in an environment variable.
Reading compiler errors usually helps. You should learn to understand what the compiler is telling you. The fact that it is complaining about a symbol being redefine is saying that you are breaking the One Definition Rule. Then it even tells you what the symbols are:
class GnuPlot {
//...
GnuPlot& reset_plot(); // <-- declaration
//...
};
//...
Gnuplot& Gnuplot::reset_plot() { // <-- Definition
nplots = 0;
return *this;
}
You can declare a symbol as many times as you wish within a program, but you can only define it once (unless it is inlined). In this case the reset_plot is compiled and defined in all translation units that include the header, violating the One Definition Rule.
The simplest way out of it is declaring it inline, so that it can appear in more than one compilation unit and let the linker remove the redundant copies (if any) later on.
A little more problematic are the static members that must be declared within the class and defined (only once) in a translation unit. For that you can either create a .cpp file to define those variables (and any function/method that you don't need inlined in the header).