As I understand it, a class can be defined in multiple translation units as long as the definitions are identical. With that in mind, consider the following examples:
//1.cpp
class Foo {
public:
    int i;
};

void FooBar();

void BarFoo() {
    Foo f;
}

int main() {
    FooBar();
    BarFoo();
}
//2.cpp
#include <string>

class Foo {
public:
    std::string s;
};

void FooBar() {
    Foo f;
}
This compiles and I don't get a crash.
If I do the following changes:
//1.cpp
Foo FooBar();

//2.cpp
Foo FooBar() {
    Foo f;
    return f;
}
I get a crash. Why does one result in a crash and the other doesn't? Also, am I not violating the ODR in the first example? If I am, why does it compile OK?
The program is ill-formed for the reason you stated. The compiler is not required to issue a diagnostic, but I don't see much point in discussing the reasons for a crash in an ill-formed program.
Still, let's do it:
The first example probably doesn't crash because FooBar's behavior doesn't affect the rest of main: the function is called, it does something, and that's it.
In the second example, you attempt to return a Foo. FooBar returns the version of Foo defined in 2.cpp, while main lives in 1.cpp and so expects the version of Foo defined there, which is a completely different type: different members, different size. You most likely get corruption when the destructor runs (just a guess; the small probe sketched at the end of this answer makes the size mismatch concrete).
EDIT: this does break the one definition rule:
3.2 One definition rule [basic.def.odr]
6) There can be more than one definition of a class type [...] in a program provided that each definition
appears in a different translation unit, and provided the definitions satisfy the following requirements. [...]
each definition of D shall consist of the same sequence of tokens;
[...]
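To make the size mismatch the answer above relies on concrete, here is a small probe (my own sketch; the helper names are hypothetical and not part of the question):

// 1.cpp (excerpt) - here Foo holds an int
#include <cstddef>
#include <cstdio>

std::size_t SizeOfFooIn2();                        // implemented in 2.cpp
std::size_t SizeOfFooIn1() { return sizeof(Foo); } // Foo as 1.cpp sees it

void CompareFooSizes() {
    // On a typical implementation these differ (e.g. 4 vs. 32 bytes), so a Foo
    // returned by value from 2.cpp cannot fit the caller's idea of what a Foo is.
    std::printf("1.cpp: %zu, 2.cpp: %zu\n", SizeOfFooIn1(), SizeOfFooIn2());
}

// 2.cpp (excerpt) - here Foo holds a std::string
// std::size_t SizeOfFooIn2() { return sizeof(Foo); }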
Here is how the compiler and linker work:
The compiler translates each cpp file using the headers it is given and generates an .obj file. In your case each .obj file will contain references to the data structure Foo, and nothing more detailed than that.
The linker then links the .obj files together. It compares only the symbol names as strings. In your .obj files you have the same name Foo; the names match, so for the linker it is the same thing.
After that you run your program. Most likely it will crash; more precisely, it exhibits undefined behavior. It can enter an infinite loop, print strange messages, etc.
It is your responsibility to provide identical headers or definitions to the translation of every cpp file. The usual tools will not check this for you; that is simply how they work.
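A minimal sketch of what that means in practice (my own sketch, not part of the question): keep the single definition of Foo in one header and include it from every cpp file.

// foo.h (hypothetical shared header)
#ifndef FOO_H
#define FOO_H
class Foo {
public:
    int i;
};
Foo FooBar();   // every translation unit now agrees on the class and on the signature
#endif

// Both 1.cpp and 2.cpp then begin with:
// #include "foo.h"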
Related
There's a particular ODR violation that results from innocent enough code.
// TU1.cpp
struct Foo
{
    int Bar() { return 1; }
};

// TU2.cpp
struct Foo
{
    int Bar() { return 2; }
};
These two classes have no relation to each other. They are purely implementation details in unrelated cpp files. They just happen to have the same name and a function with the same signature. Member functions have external linkage by default and the functions are inline by default because they are defined inside the class. So the linker sees 2 definitions for Foo::Bar, assumes they are the same, and silently throws one of them away.
I'd like to be able to catch this specific bug, but so far I can't find any options for any of the big three compilers/linkers that will actually catch this. Does such an option exist? Is there another way to catch these kinds of bugs that isn't "wrap every class in every cpp in an anonymous namespace"?
Here's a working example of this specific issue.
https://godbolt.org/z/as9frG5o3
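For reference, the workaround mentioned at the end of the question looks like this (a sketch of mine, not taken from the linked example): an unnamed namespace gives each TU-local Foo internal linkage, so the two Foo::Bar definitions can no longer collide in the linker.

// TU1.cpp
namespace {                     // unnamed namespace: internal linkage
struct Foo
{
    int Bar() { return 1; }
};
}
int CallBarInTU1() { return Foo{}.Bar(); }   // hypothetical caller, always returns 1

// TU2.cpp
namespace {
struct Foo
{
    int Bar() { return 2; }
};
}
int CallBarInTU2() { return Foo{}.Bar(); }   // always returns 2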
So I stumbled upon this when playing around with compilation units.
I have 2 headers that define a class with the same name. The first compilation unit includes the first header and declares an extern pointer to the class, the second compilation unit includes the second header and defines the pointer.
Now I have a T* pointing to a U.
mcve:
h1.h
#pragma once
struct a_struct {
    int i;
    a_struct(int _i) : i{ _i } {}
};
h2.h
#pragma once
struct a_struct {
    float f;
    a_struct(float _f) : f{ _f } {}
};
foo.h
#pragma once
struct foo {
    int bar();
};
cu1.cpp
#include "foo.h"
#include "h1.h"
extern a_struct* s;
int foo::bar() {
    return s->i;
}
cu2.cpp
#include "h2.h"
a_struct* s = new a_struct(1.0f);
main.cpp
#include "foo.h"
#include <cstdlib>   // for system()
#include <iostream>
int main() {
    foo f;
    std::cout << f.bar() << std::endl; // <- 1065353216
    system("PAUSE");
    return 0;
}
Why doesn't the linker see that h1.h::a_struct is not h2.h::a_struct? Is this mentioned in the standard as undefined behaviour?
(Also, I know that giving two classes the same name is stupid...)
Is this mentioned in the standard as undefined behaviour ?
Yes, this is a violation of the "header version" of the One Definition Rule. In this version, which applies to class definitions, inline functions and variables, and other such things commonly defined in header files, multiple definitions of a single entity are allowed in separate translation units, but those definitions must all have the same tokens (after preprocessing) and must all mean essentially the same thing. Multiple definitions which aren't the same in this way are undefined behavior. See [basic.def.odr]/12 in the C++20 draft, and the fifth paragraph under One Definition Rule at cppreference.com.
Why doesn't the linker see that h1.h::a_struct is not h2.h::a_struct?
In most C++ implementations, the compiler converts a translation unit into an object file containing function code and symbol definitions, and the function code may make use of additional "undefined symbols" to be defined by other objects. By the point of an object file, little is saved about C++ source or type information, except possibly in debugger data. A linker will probably see just that function foo::bar() in cu1.o uses the undefined symbol s, cu2.o defines the symbol s, and the global-dynamic-initialization function of cu2.o also uses the symbol s. The linker will just adjust things so that executing foo::bar() will correctly access the same object s, without much caring what any function actually does with the bytes belonging to that symbol.
(Linkers can sometimes warn when object files disagree about the number of bytes associated with a symbol, but two pointers-to-class-type objects will probably have the same size.)
The compiler compiles each source file separately. It trusts a given class declaration to be the same in all source files.
When you do as above, you trick the compiler into compiling two files with two different definitions of the same class. Each file generates a self-consistent piece of code.
Then the linker comes in and links your various bits of code together. There is an object/library format shared by the toolchains on a given platform, which is what allows a linker to work with code from different compilers. At this point, all the linker knows is that some code passes a foo object and some other code receives a foo object. It is not its business to go peek inside, check, and complain.
Keep in mind that, at link-time, the source code might not even be available. You may have a library from some vendor without source code. And there might be various #defines that could have affected this object. The linker doesn't need to know what the compilation settings were, or even what the source was. The code could even have been written in another language.
To gain this flexibility and interoperability, there's some rules you have to follow. One of them is "don't define the same class twice in different ways".
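Applied to the h1.h/h2.h example above, that rule could look like this (a sketch of mine; the namespace names are invented): give each TU-local type its own namespace, and declare anything genuinely shared, such as s, with a single agreed-upon type in one common header. The original mismatch then becomes a compile error instead of a silent misread.

// h1.h
#pragma once
namespace cu1_detail {
struct a_struct {
    int i;
    a_struct(int _i) : i{ _i } {}
};
}

// h2.h
#pragma once
namespace cu2_detail {
struct a_struct {
    float f;
    a_struct(float _f) : f{ _f } {}
};
}

// shared.h - the one declaration of s that both cu1.cpp and cu2.cpp include
#pragma once
namespace cu2_detail { struct a_struct; }
extern cu2_detail::a_struct* s;
// cu1.cpp's old `return s->i;` now fails to compile instead of reading garbage.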
My 'Headers.h' file includes basic C++ headers:
#include <iostream>
#include <cstring>
// and many header files.
I wrote a function definition for a file-existence check, ifFileExist(), and saved it in 'common_utility.h'.
common_utility.h
bool ifFileExist()
{
// ... My code
}
Wrote code for Class A
classA.h
class A
{
// Contains class A Declarations.
};
classA.cpp
// Contains
#include "Headers.h"
#include "common_utility.h"
#include "classA.h"
// class A Method definition
Wrote Code for Class B
I am using class A in Class B.
classB.h
class B
{
    // Contains class B declarations.
};
classB.cpp
// Contains
#include "Headers.h"
#include "common_utility.h"
#include "classA.h"
#include "classB.h"
// class B Method definition
// calling the function ifFileExist() in class B also.
wrote code for main program
main.cpp
// Contains
#include "Headers.h"
#include "common_utility.h"
#include "classA.h"
#include "classB.h"
// I am using class A and Class B in main program
// calling the function ifFileExist() in Main program also.
When I compile the whole program as
g++ -std=c++0x classA.cpp classB.cpp main.cpp -o main
I am getting the following error.
In function `ifFileExist()':
classB.cpp:(.text+0x0): multiple definition of `ifFileExist()'
/tmp/ccHkDT11.o:classA.cpp:(.text+0x2b6e): first defined here
So I declared the ifFileExist() function in Headers.h as extern:
extern bool ifFileExist();
But still I am getting the same error.
I am including 'Headers.h' in every .cpp file. That file includes the basic C++ library headers, yet I don't get any multiple-definition errors for those headers.
Only for my own function do I get the 'multiple definition' error.
I want to include 'common_utility.h' only where I need it. If I don't need the common_utility functions in my main program, I simply shouldn't have to include it there.
I want my program to build in each of the following cases:
g++ -std=c++0x classA.cpp main.cpp -o main
g++ -std=c++0x classB.cpp main.cpp -o main
g++ -std=c++0x classA.cpp classB.cpp main.cpp -o main
I shouldn't get a multiple-definition error in any of these cases. What should I do now?
Since I could not find any complete (in my view) duplicate for this question, I am going to write a (hopefully) authoritative and complete answer.
What is the One Definition Rule and why should I care
The One Definition Rule, usually dubbed the ODR, states (simplified) that any entity (an informal term) used in the program should be defined once, and only once. An entity defined more than once often causes a compiler or linker error, but sometimes the violation goes undetected by the tools and leads to very hard-to-trace bugs.
I am not going to formally define entity here, but one can think of it as a function, variable or class. Before going further, one should understand very clearly the difference between a definition and a declaration in C++, since while double definitions are prohibited, double declarations are usually unavoidable.
Definition vs. declaration
Every entity used in the code should be declared in the given translation unit (a translation unit is usually a cpp source file together with all the header files included in it, directly or indirectly through other header files). The way an entity is declared differs based on the kind of entity; see below for how to declare the different kinds. Entities are often declared in header files. Since most complex applications have more than one translation unit (more than one cpp file), and different cpp files often include the same headers, an application is likely to have multiple declarations of many of the entities it uses. As I said above, this is not a problem.
Every entity used in the application must be defined once and only once. The term 'application' is used a bit loosely here - for example, libraries (both static and dynamic) can have entities (at this point usually called symbols) left undefined within them, and an executable linked against a dynamic library can have undefined symbols as well. By 'application' I mean the final running program, after all the libraries have been statically or dynamically linked into it and all symbols have been resolved.
It is also worth noting that every definition serves as a declaration as well: whenever you define something, you are also declaring it.
As with declarations, the way to define an entity differs by the kind of entity. Here is how one can declare and define the three basic kinds of entities - variables, functions and classes.
Variables
Variables are declared using the following construct:
extern int x;
This declares a variable x. It does not define it! The following piece of code will compile OK, but an attempt to link it without any other input files (for example, with g++ main.cpp) will produce a link-time error due to an undefined symbol:
extern int x;
int main() {
return x;
}
The following piece of code defines variable x:
int x;
If this single line were put into a file x.cpp, and that file were compiled and linked together with the main.cpp above using g++ x.cpp main.cpp -o test, it would compile and link without problems. You could even run the resulting executable, and if you checked the exit code afterwards, you would see it is 0 (since the global variable x is zero-initialized).
Functions
Functions are declared by providing their prototypes. A typical function declaration looks like the following:
double foo(int x, double y);
This construct declares a function foo, returning double and accepting two arguments - one of type int, another of type double. This declaration can appear multiple times.
The following code defines the above-mentioned foo:
double foo(int x, double y) {
    return x * y;
}
This definition can only appear once in the whole application.
Function definitions have an additional quirk compared to variable definitions. If the above definition of foo were put into a header file foo.h, which in turn was included by two cpp files 1.cpp and 2.cpp compiled and linked together with g++ 1.cpp 2.cpp -o test, you would get a linker error saying that foo is defined twice. This can be prevented by using the following form of the definition:
inline double foo(int x, double y) {
    return x * y;
}
Note the inline there. It tells the compiler that foo may be defined in multiple .cpp files, and that this must not produce a linker error. The compiler has several options for how to make this happen, but it can be relied upon to do its job. Note that it would still be an error to have this definition twice in the same translation unit! For example, the following code will produce a compiler error:
inline void foo() { }
inline void foo() { }
It is worth noting that any member function defined within its class is implicitly inline, for example:
class A {
public:
int foo() { return 42; }
};
Here A::foo() is defined inline.
Classes
Classes are declared with the following construct:
class X;
The above declares a class X (at this point X is formally called an incomplete type), so that it can be used where information about its contents, such as its size or its members, is not needed. For example:
X* p;               // OK - no information about class X is required to define a pointer to it
p->y = 42;          // Error - the compiler has no idea whether X has a member named `y`
void foo(X x);      // OK - the compiler does not need to generate any code for this
void foo(X x) { }   // Error - the compiler needs to know the size of X to generate code that reads foo's argument
void bar(X* x) { }  // OK - the compiler does not need to know the specifics of X for this
The definition of a class is well known to everybody, and follows this construct:
class X {
public:
int y;
};
This defines the class X, and now it can be used in any context. An important note: the class definition has to be unique per translation unit, but does not have to be unique per application. That is, X may be defined only once per translation unit, but identical definitions of it can appear in multiple files linked together.
How to properly follow ODR rules
Whenever the same entity is defined more than once in the resulting application, a so-called ODR violation happens. Most of the time the linker will see the violation and complain. However, there are cases where an ODR violation does not break linking and instead causes bugs. This can happen, for example, when the same .cpp file defining a global variable X is put into both the application and a dynamic library that is loaded on demand (with dlopen). (Yours truly spent a couple of days tracing a bug that happened because of that.)
More conventional causes of ODR violations are:
Same entity defined twice in the same file in the same scope
int x;
int x; // ODR violation

void foo() {
    int x;
} // No ODR violation - the local x inside foo is a different entity from the global x
Prevention: don't do this.
Same entity defined twice, when it was supposed to be declared
(in x.h)
int x;

(in 1.cpp)
#include <x.h>

void set_x(int y) {
    x = y;
}

(in 2.cpp)
#include <x.h>

int get_x() {
    return x;
}
While the wisdom of the above code is questionable at best, it serves to illustrate the ODR. In the code above, the variable x is supposed to be shared between the two files 1.cpp and 2.cpp, but it is coded incorrectly. Instead, the code should be the following:
(in x.h)
extern int x; //declare x
(in x.cpp)
int x; // define x
// 1.cpp and 2.cpp remain the same
Prevention
Know what you are doing. Declare entities when you want them declared, do not define them.
If in the example above we used a function instead of the variable, like the following:
(in x.h)
int x_func() { return 42; }
we would have a problem which could be solved in two ways (as mentioned above): we could make the function inline, or we could move the definition to a cpp file:
(in x.h)
int x_func();
(in x.cpp)
int x_func() { return 42; }
Same header file included twice, causing the same class to be defined twice
This is a funny one. Imagine you have the following code:
(in a.h)
class A { };
(in main.cpp)
#include <a.h>
#include <a.h> // compilation error!
The above code seldom appears as written, but it is quite easy to include the same file twice through an intermediate header:
(in foo.h)
#include <a.h>
(in main.cpp)
#include <a.h>
#include <foo.h>
Prevention: The traditional solution is to use so-called include guards, that is, special preprocessor definitions which prevent double inclusion. In this regard, a.h should be rewritten as follows:
(in a.h)
#ifndef INCLUDED_A_H
#define INCLUDED_A_H
class A { };
#endif
The code above prevents a.h from being included into the same translation unit more than once, since INCLUDED_A_H becomes defined after the first inclusion and causes the #ifndef to fail on all subsequent ones.
Some compilers expose other ways to control inclusion, but to date include guards remain the way to do it uniformly across different compilers.
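The most common of those compiler-specific alternatives is #pragma once, which the questions earlier on this page already use; it is non-standard, but supported by all mainstream compilers:

// a.h
#pragma once   // plays the same role as the include guard above
class A { };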
Before the source is actually compiled, a compilation unit is generated from each .cpp file. This basically means that all preprocessor directives are evaluated: every #include is replaced with the content of the included file, all #define'd values are substituted with the corresponding expressions, all #if 0 ... #endif blocks are removed, etc. After this step, in your case you get two pieces of C++ code without any preprocessor directives that both contain a definition of the same function bool ifFileExist(); that is why you get the multiple-definition error.
The quick solution is to mark it inline: inline bool ifFileExist(). This tells the toolchain that the definition may appear in multiple translation units (and allows the compiler to substitute calls with the function body).
Another approach is to leave only the declaration of your function in common_utility.h and move the definition to common_utility.cpp.
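Concretely, either shape avoids the multiple-definition error (a sketch; the body and the hard-coded file name are placeholders, since the question elides the real implementation):

// Option 1: the definition stays in the header, marked inline.
// common_utility.h
#ifndef COMMON_UTILITY_H
#define COMMON_UTILITY_H
#include <fstream>

inline bool ifFileExist()
{
    return std::ifstream("some_file.txt").good();   // placeholder body
}
#endif

// Option 2: declaration in the header, one definition in a .cpp file.
// common_utility.h:   bool ifFileExist();
// common_utility.cpp: bool ifFileExist() { return std::ifstream("some_file.txt").good(); }
// (and add common_utility.cpp to each g++ command line)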
I am reading Item 4 of Scott Meyers's Effective C++, where he shows an example in which a static non-local object is used across different translation units. He is highlighting the problem that code in one translation unit cannot know whether an object defined in another translation unit has been initialised before it is used. It's page 30 in the third edition, in case anyone has a copy.
The example is such:
One file represents a library:
class FileSystem {
public:
    std::size_t numDisks() const;
    ....
};
extern FileSystem tfs;
and in a client file:
class Directory {
public:
    Directory(some_params);
    ....
};

Directory::Directory(some_params)
{
    ...
    std::size_t disks = tfs.numDisks();
    ...
}
My two questions are thus:
1) If the client code needs to use tfs, then there will be some sort of include statement. Therefore surely this code is all in one translation unit? I do not see how you could refer to code which is in a different translation unit? Surely a program is always one translation unit?
2) If the client code included FileSystem.h would the line extern FileSystem tfs; be sufficient for the client code to call tfs (I appreciate there could be a run-time issue with initialisation, I am just talking about compile-time scope)?
EDIT to Q1
The book says these two pieces of code are in separate translation units. How could the client code use the variable tfs, given that they're in separate translation units?
Here's a simplified example of how initialization across multiple TUs can be problematic.
gadget.h:
struct Foo;
extern Foo gadget;
gadget.cpp:
#include <foo.h>
#include <gadget.h>
Foo gadget(true, Blue, 'x'); // initialized here
client.cpp:
#include <foo.h>
#include <gadget.h>
int do_something()
{
    int x = gadget.frumple(); // problem!
    return bar(x * 2);
}
The problem is that it is not guaranteed that the gadget object will have been initialized by the time do_something() refers to it. It is only guaranteed that the initializers within one TU are completed before a function defined in that TU is called.
(The solution is to replace extern Foo gadget; with Foo & gadget();, implement that in gadget.cpp as { static Foo impl; return impl; } and use gadget().frumple().)
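Spelled out, the replacement described in the parentheses above looks roughly like this (same hypothetical Foo, Blue and frumple as in the sketch above):

// gadget.h
struct Foo;
Foo& gadget();                    // access function instead of an extern object

// gadget.cpp
#include <foo.h>
#include <gadget.h>
Foo& gadget()
{
    // Constructed the first time gadget() is called, so every caller,
    // including dynamic initializers in other TUs, sees a fully built object.
    static Foo impl(true, Blue, 'x');
    return impl;
}

// client.cpp
#include <foo.h>
#include <gadget.h>
int do_something()
{
    int x = gadget().frumple();   // no longer a problem
    return bar(x * 2);
}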
Here's the example from the Standard C++03 (I've added the a.h and b.h headers):
[basic.start.init]/3
// a.h
struct A { A(); void Use() {} };
// b.h
struct B { void Use() {} };
// – File 1 –
#include "a.h"
#include "b.h"
B b;
A::A() {
    b.Use();
}
// – File 2 –
#include "a.h"
A a;
// – File 3 –
#include "a.h"
#include "b.h"
extern A a;
extern B b;
int main() {
    a.Use();
    b.Use();
}
It is implementation-defined whether either a or b is initialized before main is entered or whether the initializations are delayed until a is first used in main. In particular, if a is initialized before main is entered, it is not guaranteed that b will be initialized before it is used by the initialization of a, that is, before A::A is called. If, however, a is initialized at some point after the first statement of main, b will be initialized prior to its use in A::A.
1) If the client code needs to use tfs, then there will be some sort of include statement. Therefore surely this code is all in one translation unit? I do not see how you could refer to code which is in a different translation unit? Surely a program is always one translation unit?
A translation unit is (roughly) a single .cpp file after preprocessing. After you compile a single translation unit you get an object module (which typically has the extension .o or .obj); after all TUs have been compiled, they are linked together by the linker to form the final executable. This is often hidden by IDEs (and even by compilers accepting multiple input files on the command line), but it's crucial to understand that building a C++ program happens in (at least) three phases: preprocessing, compilation and linking.
The #include directive brings in the declaration of the class and the extern declaration, telling the current translation unit that the class FileSystem looks that way and that, in some translation unit, there is a variable tfs of type FileSystem.
2) If the client code included FileSystem.h would the line extern FileSystem tfs; be sufficient for the client code to call tfs
Yes, the extern declaration tells the compiler that in some TU there's a variable defined like that; the compiler puts a placeholder for it in the object module and the linker, when tying together the various object modules, will fix it with the address of the actual tfs variable (defined in some other translation unit).
Keep in mind that when you write extern you are only declaring a variable (i.e. you are telling the compiler "trust me, there's this thing somewhere"), when you omit it you are both declaring it and defining it ("there's this thing and you have to create it here").
The distinction maybe is clearer with functions: when you write a prototype you are declaring a function ("somewhere there's a function x that takes such parameters and returns this type"), when you actually write the function (with the function body) you are defining it ("this is what this function actually does"), and, if you haven't declared it before, it counts also as a declaration.
For how multiple TUs are actually used/managed, you can have a look at this answer of mine.
The following code generates no compiler or linker errors or warnings:
// A.h
#include <iostream>

struct A
{
    template<typename T>
    static void foo (T t)
    {
        std::cout << "A::foo(T)\n";
    }
};

void other ();
// main.cpp
#include "A.h"

int main ()
{
    A::foo(4.7);
    other();
}
// other.cpp
#include "A.h"

template<>
void A::foo (double d)
{
    std::cout << "A::foo(double)\n";
}

void other ()
{
    A::foo(4.7);
}
The output surprisingly is:
A::foo(T)
A::foo(double)
Why is the compiler not able to pick the correct A::foo(double) in the case of main.cpp?
Agreed that there is no issue, as expected, if there is a declaration in A.h like the one below:
template<> void A::foo (double);
But that's not my concern, because at link time the specialized version is available anyway.
Also, is having two different versions of the same function undefined behavior?
All explicit specialization declarations must be visible at the time of the template instantiation. Since your explicit specialization declaration for A::foo<double> is visible in one translation unit but not the other, the program is ill-formed.
(In practice, the compiler will instantiate the primary template in main.cpp and use the explicit specialization in other.cpp. That is still an ODR violation.)
main.cpp cannot see the code inside other.cpp; an explicit specialization is only used in translation units where its declaration is visible.
Why compiler is not able to pick up the correct A::foo(double) in case of main.cpp ?
The problem is that in a separate-compilation model, without a declaration available in the header, the compiler cannot possibly know whether a specialization exists in some translation unit that will later be linked in, or whether it needs to instantiate the template itself. The language's decision is that the absence of a declaration means there is no manual specialization of the template, and thus the compiler must generate an instantiation right there.
is having 2 different version of the same function an Undefined Behavior ?
Yes, it is. Whether or not one of the specializations was generated automatically, it is undefined behavior because it violates the One Definition Rule (there are multiple definitions of the same symbol).
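In practice, the fix the question already hints at is the right one (a sketch): declare the explicit specialization in A.h, so that every translation unit that uses A::foo<double> knows not to instantiate the primary template.

// A.h
#include <iostream>

struct A
{
    template<typename T>
    static void foo (T t)
    {
        std::cout << "A::foo(T)\n";
    }
};

// Visible to every includer: do not instantiate the primary template for double.
template<>
void A::foo(double);

void other ();

// other.cpp keeps the single definition:
// template<> void A::foo(double d) { std::cout << "A::foo(double)\n"; }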