Why does the one definition rule exist in C/C++ - c++

In C and C++, you can't have a function with two definitions. For example, say we have the following two files:
1.c:
int main(){ return 0}
2.c:
int main(){ return 0}
Issuing the command gcc 1.c 2.c will give you a duplicate symbol linker error.
Why doesn't the same happen with structs and classes? Why are we allowed to have multiple
definitions of the same struct as long as they have the same tokens?

To answer this question, one has to delve into compilation process and what is needed in each part (question why these steps are perfomed is more historical, going back to beginning of C before it's standardization)
C and C++ programs are compiled in multiple steps:
Preprocessing
Compilation
Linkage
Preprocessing is everything that starts with #, it's not really important here.
Compilation is performed on each and every translation unit (typically a single .c or .cpp file plus the headers it includes). Compiler takes one translation unit at a time, reads it and produces an internal list of classes and their members, and then assembly code of each function in given unit (basing on the structures list). If a function call is not inlined (e.g. it is defined in different TU), compiler produces a "link" - "please insert function X here" for the linker to read.
Then linker takes all of the compiled translation units and merges them into one binary, substituting all the links specified by compiler.
Now, what is needed at each phase?
For compilation phase, you need the
definition of every class used in this file - compiler needs to know the size and offset of each class member to produce assembly
declaration of every function used in this file - to produce those "links".
Since function definitions are not needed for producing assembly (as long as they are compiled somewhere), they are not needed in compilation phase, only in linking phase.
To sum up:
One Definition Rule is there to protect programmers from theselves. If they'd accidentally define a function twice, linker will notice that and executable is not produced.
However, class definitions are required in every translation unit, and therefore such a rule cannot be set up for them. Since it cannot be forced by language, programmers have to be responsible beings and not define the same class in different ways.
ODR has also other limitations, e.g. you have to define template functions (or template class methods) in header files. You can also take the responsibility and say to the compiler "Every definition of this function will be the same, trust me dude" and make the function inline.

There is no use case for a function with 2 definitions. Either the two definitions would have to be the same, making it useless, or the compiler wouldn't be able to tell which one you meant.
This is not the case with classes or structures. There is also a large advantage to allowing multiple definitions of them, i.e. if we want to use a class or struct in multiple files. (This leads indirectly to multiple definitions because of includes.)

Structures, classes, unions and enumerations define types that can be used in several compilation units to define objects of these types. So each compilation unit need to know how the types are defined, for example to allocate correctly memory for an object or to be sure that specified member of a class does indeed exist.
For functions (if they are not inline functions) it is enough to have their declaration without their definition to generate for example a function call.
But the function definition shall be single. Otherwise the compiler will not know what function to call or the object code will be too big due to duplication and will be error prone..

It's quite simple: It's a question of scope. Non-static functions are seen (callable) by every compilation unit linked together, while structures are only seen in the compilation unit where they are defined.
For example, it's valid to link the following together because it's clear which definition of struct Foo and which definition of f is being used:
1.c:
struct Foo { int x; };
static void f(void) { struct Foo foo; ... }
2.c:
struct Foo { double d; };
static void f(void) { struct Foo foo; ... }
int main(void) { ... }
But it isn't valid to link the following together because the linker wouldn't know which f to call.
1.c:
void f(void) { ... }
2.c:
void f(void) { ... }
int main(void) { f(); }

Actually every programming element is associated with a scope of its applicability. And within this scope you cannot have the same name associated with multiple definitions of an element. In compiled world:
You cannot have more than one class definition with the same name within a single file. But you can have it in different compilation units.
You cannot have the same function or global variable name within a single link unit (library or executable), but you can potentially have functions named the same within different libraries.
you cannot have shared libraries with the same name situated in the same directory, but you can have them in different directories.
C/C++ compilation is very much after the compilation performance. Checking 2 objects like function or classes for identity is a time-consuming task. So, it is not done. Only names are considered for comparison. It is better to consider that 2 types are different and error out then checking them for identity. The only exception from this rule are text macros.
Macros are a pre-processor concept and historically it is allowed to have multiple identical macro definitions. If a definition changes, a warning gets generated. Comparing macro context is easy, just a simple string comparison, but some macro definitions could be huge.
Types are the compiler concept and they are resolved by the compiler. Types do not exist in object libraries and are represented by the sizes of corresponding variables. So, there is no reason for checking type name collisions at this scope.
Functions and variables on the other hand are named pointers to executable codes or data. They are the building blocks of applications. Applications are assembled from the codes and libraries coming from all around the world in some cases. In order to use someone else's function you'd better now its name and you do not want the same name to be used by some one else. Within a shared library names of functions and variables are usually stored in a hash table. There is no place for duplicates there.
And as I already mention checking functions for identical contents is seldom done, however there are some cases, but not in c or c++.

The reason of impeding two different definitions for the same thing to be used in programming is to avoid the ambiguity of deciding which definition to use at run time.
If you have two different implementations to the same thing to coexist in a program, then there's the possibility of aliasing them (with a different name each) into a common reference to decide at runtime which one of the two to use.
Anyway, in order to distinguish both, you have to be able to indicate the compiler which one you want to use. In C++ you can overload a function, giving it the same name and different lists of parameters, so you can distinguish which one of both you want to use. But in C, the compilers only preserve the name of the function to be able to solve at link time which definition matches the name you use in a different compilation unit. In case the linker ends with two different definitions with the same name, it is uncapable of deciding for you which one to use, so it emits an error and gives up the building process.
What should be the intention of using this ambiguity in a productive way? this is the question you have actually to ask to yourself.

Related

How are templated C++ functions in headers sent to the linker?

Suppose we have this header, with a function:
//template<int does_nothing=0> // <-- !! links correctly if uncommented
void incAndShow() {
static int myStaticVar = 0;
std::cout << ++myStaticVar << " " << std::endl;
}
If this is included in several .cpp files, you'll get a linker error due to the redefinition of incAndShow().
Now, if you uncomment the //template<int does_nothing=0> line above, this code will compile correctly with no linker error. And, the static variable will be shared correctly in all instances of incAndShow<0>().
This is because the C++ standard explicitly states that this should be allowed in its "One Definition Rule" (Page 43). The abbreviated text is:
There can be more than one definition of a... non-static function template (17.5.6)... provided that each definition appears in a different translation unit...
Now, note that the above is only true for templated functions, and not for non-templated functions. This is because, at least in the traditional paradigm, non-templated functions are compiled individually within each translation unit, and then the linker, envisioned as a totally separate process from the compiler, sees multiple definitions of the same symbol in different .obj files and throws the error.
But somehow, this doesn't happen for templated functions.
So my question is: how, in a low-level sense, does this work?
Does the compiler just compile each translation unit independently, meaning it generates multiple versions of the same symbol within each translation unit, but then have some special way to tell the linker to not worry about it?
Does the compiler, on-the-fly, keep a note of the first time some templated function definition is seen in some translation unit, and automatically ignore upon seeing the same definition again when compiling the next translation unit?
Does the compiler do some kind of "pre-compilation process," before compiling anything, where it goes through each file and removes duplicate templated code?
Something else?
This has been brought up on StackOverflow a few times, but this is the first time I've seen anyone ask how the compilation process really happens on a low-level:
Static variable inside template function (where my example came from)
Why C++'s <vector> templated class doesn't break one definition rule?

Why does defining inline global function in 2 different cpp files cause a magic result?

Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
std::cout << "f1\n";
}
void f1()
{
foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
std::cout << "f2\n";
}
void f2()
{
foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
f1();
f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: Compiler somehow picks only the definition from file1.cpp and uses it also in f2(). What is the exact explanation of this behavior?.
Note, that changing inline to static is a solution for this problem. Putting the inline definition inside an unnamed namespace also solves the problem and the program prints:
f1
f2
This is undefined behavior, because the two definitions of the same inline function with external linkage break C++ requirement for objects that can be defined in several places, known as One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition, violating the aforementioned rule makes the program ill-formed. The compiler could also diagnose that and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasize mine.
As others have noted, the compilers are in compliance with the C++ standard because the One definition rule states that you shall have only one definition of a function, except if the function is inline then the definitions must be the same.
In practice, what happens is that the function is flagged as inline, and at linking stage if it runs into multiple definitions of an inline flagged token, the linker silently discards all but one. If it runs into multiple definitions of a token not flagged inline, it instead generates an error.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check definitions of function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a reall subtle set of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
Another really nasty example of this:
// A.cpp
struct Helper {
std::vector<int> foo;
Helper() {
foo.reserve(100);
}
};
// B.cpp
struct Helper {
double x, y;
Helper():x(0),y(0) {}
};
methods defined in the body of a class are implicitly inline. The ODR rule applies. Here we have two different Helper::Helper(), both inline, and they differ.
The sizes of the two classes differ. In one case, we initialize two sizeof(double) with 0 (as the zero float is zero bytes in most situations).
In another, we first initialize three sizeof(void*) with zero, then call .reserve(100) on those bytes interpreting them as a vector.
At link time, one of these two implementations is discarded and used by the other. What more, which one is discarded is likely to be pretty determistic in a full build. In a partial build, it could change order.
So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
POINT OF CLARIFICATION:
Although the answer rooted in C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately, then, as one commentator noted, each resulting object file would contain its own 'foo()'. HOWEVER: If these two object files are then linked together, then because both 'foo()'-s are non-static, the name 'foo()' appears in the exported symbol table of both object files; then the linker has to coalesce the two table entries, hence all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound [i.e the linker would treat the second record as 'extern' regardless of binding]).

C++ - how can I inline a function that resides in a .lib?

As far as I know it's impossible to have a function declared as "inline" in a lib file and have that function "magically inlined" into a caller function into another project (since linking is not the same as compiling and the latter happens before).
How could I inline a function when having multiple functions (into multiple libraries) that have the same declaration but different definition?
e.g.
obj1.lib
void function1() { printf("Hi"); }
obj2.lib
void function1() {printf("whatsup?"); }
main.cpp
void function1();
int main()
{
function1(); // I'd like to be able to inline this, I can steer the linking against obj1 or obj2, but I can't inline this one
}
To inline a function from an object (or a library) file you'll need this object file be compiled with link-time optimization (LTO). See Inlining functions from object files for more details.
Even if you inline functions, they always have to have the same definition: having different defintions for the same entity in a C++ program is a violation of the one definition rule (ODR) specified in 3.2 [basic.def.odr]. ODR violations are often not detected by compilers and linkers and tend to result in rather weird problems.
You'll need to make sure the functions are different, e.g., using one of these techniques:
Give them a different name.
Put the functions into a namespace.
Give the functions a different signature.
Make the static to have them visible only in the given translation units.
The simplest you can do is to give the functions different names.
If you want built-time selection of a function with a given name, that has 2 or more different implementations, and you want to support machine code inlining of that function, then declare it as inline, which requires also supplying the implementation in each variant's header, and use include and lib paths to select the relevant header (for compilation) and lib (for linking) – they'd better match. As with any inline function this doesn't guarantee machine code inlining. With respect to machine code inlining inline is just a hint (it's guaranteed effect is about permitting a definition in each translation unit, and requiring such a definition in each translation unit where it's used).
How to use include and lib paths depends on your toolchain.
Edited in response to Sam Cristall's comment.
If you mean "inline at compile time" then:
In order to use a library you need to include the header file(s) associated with that library. If the desired function is declared inline in that header and the function definition (the body of the function) is available, then the compiler will (at it's discretion) inline the function. Otherwise it won't.
If you mean "inline at link time" (an unfortunate overloading of the word "inline") than see the other answers

What's the difference between inline member function and normal member function?

Is there any difference between inline member function (function body inline) and other normal member function (function body in a separate .cpp file)?
for example,
class A
{
void member(){}
};
and
// Header file (.hpp)
class B
{
void member();
};
// Implementation file (.cpp)
void B::member(){}
There is absolutely no difference.
The only difference between the two is that the member inside the class is implicitly tagged as inline. But this has no real meaning.
See: inline and good practices
The documentation says that the inline tag is a hint to the compiler (by the developer) that a method should be inlined. All modern compilers ignore this hint and use there own internal heuristic to determine when a method should be inlined (As humans are notoriously bad and making this decision).
The other use of inline is that it tells the linker that it may expect to see multiple definitions of a method. When the function definition is in the header file each compilation unit that gets the header file will have a definition of the function (assuming it is not inlined). Normally this would cause the linker to generate errors. With the inline tag the compiler understands why there are multiple definitions and will remove all but one from the application.
Note on inlining the processes: A method does not need to be in the header file to inlined. Modern compilers have a processes a full application optimization where all functions can be considered for inlining even if they have been compiled in different compilation units. Since the inline flag is generally ignored it make no difference if you put the method in the header or the source file.
Ignore the word inline here and compiler hints because it is not relevant.
The big practical difference in A and B is when they are used in different libraries.
With the case of A you can #include the header and are not required to link against anything. So you can use this class from different applications / libraries without any special linkage.
With the case of B, you need B.cpp and this should be compiled only into one library / application. Any other library or application that needs to use this class will need to link against the one that contains the actual body of the code.
With some setups / implementations you will need to specifically mark the class as "exported" or "imported" between libraries (for example with Windows you can use dllimport / dllexport and with GNU you can use attribute(visibility="default") )
The first one is implicitly inline, i.e. suggesting the compiler to expand it at the call site.
Other than the inline thing, there's a difference in that you could put more definitions in between the definition of class B, and the definition of the function.
For example, B.cpp might include header files that B.hpp doesn't, which can make a significant difference to the build process for large projects.
But even without a separate translation unit, you can occasionally have a circular dependency that's resolved by separating the definitions. For example the function might take a parameter of a type that's forward-declared before B is defined, then defined by the time the function is defined. If that type uses the definition of B in its own definition, it can't just be be defined before B.

Why do functions need to be declared before they are used?

When reading through some answers to this question, I started wondering why the compiler actually does need to know about a function when it first encounters it. Wouldn't it be simple to just add an extra pass when parsing a compilation unit that collects all symbols declared within, so that the order in which they are declared and used does not matter anymore?
One could argue, that declaring functions before they are used certainly is good style, but I am wondering, is there are any other reason why this is mandatory in C++?
Edit - An example to illustrate: Suppose you have to functions that are defined inline in a header file. These two function call each other (maybe a recursive tree traversal, where odd and even layers of the tree are handled differently). The only way to resolve this would be to make a forward declaration of one of the functions before the other.
A more common example (though with classes, not functions) is the case of classes with private constructors and factories. The factory needs to know the class in order to create instances of it, and the class needs to know the factory for the friend declaration.
If this is requirement is from the olden days, why was it not removed at some point? It would not break existing code, would it?
How do you propose to resolve undeclared identifiers that are defined in a different translation unit?
C++ has no module concept, but has separate translation as an inheritance from C. A C++ compiler will compile each translation unit by itself, not knowing anything about other translation units at all. (Except that export broke this, which is probably why it, sadly, never took off.)
Header files, which is where you usually put declarations of identifiers which are defined in other translation units, actually are just a very clumsy way of slipping the same declarations into different translation units. They will not make the compiler aware of there being other translation units with identifiers being defined in them.
Edit re your additional examples:
With all the textual inclusion instead of a proper module concept, compilation already takes agonizingly long for C++, so requiring another compilation pass (where compilation already is split into several passes, not all of which can be optimized and merged, IIRC) would worsen an already bad problem. And changing this would probably alter overload resolution in some scenarios and thus break existing code.
Note that C++ does require an additional pass for parsing class definitions, since member functions defined inline in the class definition are parsed as if they were defined right behind the class definition. However, this was decided when C with Classes was thought up, so there was no existing code base to break.
Historically C89 let you do this. The first time the compiler saw a use of a function and it didn't have a predefined prototype, it "created" a prototype that matched the use of the function.
When C++ decided to add strict typechecking to the compiler, it was decided that prototypes were now required. Also, C++ inherited the single-pass compilation from C, so it couldn't add a second pass to resolved all symbols.
Because C and C++ are old languages. Early compilers didn't have a lot of memory, so these languages were designed so a compiler can just read the file from top to bottom, without having to consider the file as a whole.
I think of two reasons:
It makes the parsing easy. No extra pass needed.
It also defines scope; symbols/names are available only after its declaration. Means, if I declare a global variable int g_count;, the later code after this line can use it, but not the code before the line! Same argument for global functions.
As an example, consider this code:
void g(double)
{
cout << "void g(double)" << endl;
}
void f()
{
g(int());//this calls g(double) - because that is what is visible here
}
void g(int)
{
cout << "void g(int)" << endl;
}
int main()
{
f();
g(int());//calls g(int) - because that is what is the best match!
}
Output:
void g(double)
void g(int)
See the output at ideone : http://www.ideone.com/EsK4A
The main reason will be to make the compilation process as efficient as possible. If you add an extra pass you're adding both time and storage. Remember that C++ was developed back before the time of Quad Core Processors :)
The C programming language was designed so that the compiler could be implemented as a one-pass compiler. In such a compiler, each compilation phase is only executed once. In such a compiler you cannot referrer to an entity that is defined later in the source file.
Moreover, in C, the compiler only interpret a single compilation unit (generally a .c file and all the included .h files) at a time. So you needed a mechanism to referrer to a function defined in another compilation unit.
The decision to allow one-pass compiler and to be able to split a project in small compilation unit was taken because at the time the memory and the processing power available was really tight. And allowing forward-declaration could easily solve the issue with a single feature.
The C++ language was derived from C and inherited the feature from it (as it wanted to be as compatible with C as possible to ease the transition).
I guess because C is quite old and at the time C was designed efficient compilation was a problem because CPUs were much slower.
Since C++ is a static language, the compiler needs to check if values' type is compatible with the type expected in the function's parameters. Of course, if you don't know the function signature, you can't do this kind of checks, thus defying the purpose of a static compiler. But, since you have a silver badge in C++, I think you already know this.
The C++ language specs were made right because the designer didn't want to force a multi-pass compiler, when hardware was not as fast as the one available today. In the end, I think that, if C++ was designed today, this imposition would go away but then, we would have another language :-).
One of the biggest reasons why this was made mandatory even in C99 (compared to C89, where you could have implicitly-declared functions) is that implicit declarations are very error-prone. Consider the following code:
First file:
#include <stdio.h>
void doSomething(double x, double y)
{
printf("%g %g\n",x,y);
}
Second file:
int main()
{
doSomething(12345,67890);
return 0;
}
This program is a syntactically valid* C89 program. You can compile it with GCC using this command (assuming the source files are named test.c and test0.c):
gcc -std=c89 -pedantic-errors test.c test0.c -o test
Why does it print something strange (at least on linux-x86 and linux-amd64)? Can you spot the problem in the code at a glance? Now try replacing c89 with c99 in the command line — and you'll be immediately notified about your mistake by the compiler.
Same with C++. But in C++ there're actually other important reasons why function declarations are needed, they are discussed in other answers.
* But has undefined behavior
Still, you can have a use of a function before it is declared sometimes (to be strict in the wording: "before" is about the order in which the program source is read) -- inside a class!:
class A {
public:
static void foo(void) {
bar();
}
private:
static void bar(void) {
return;
}
};
int main() {
A::foo();
return 0;
}
(Changing the class to a namespace doesn't work, per my tests.)
That's probably because the compiler actually puts the member-function definitions from inside the class right after the class declaration, as someone has pointed it out here in the answers.
The same approach could be applied to the whole source file: first, drop everything but declaration, then handle everything postponed. (Either a two-pass compiler, or large enough memory to hold the postponed source code.)
Haha! So, they thought a whole source file would be too large to hold in the memory, but a single class with function definitions wouldn't: they can allow for a whole class to sit in the memory and wait until the declaration is filtered out (or do a 2nd pass for the source code of classes)!
I remember with Unix and Linux, you have Global and Local. Within your own environment local works for functions, but does not work for Global(system). You must declare the function Global.