I have read materials below:
https://www.wikiwand.com/en/One_Definition_Rule
http://en.cppreference.com/w/cpp/language/definition
What is the difference between a definition and a declaration?
But still, can't figure out why it is One Definition Rule rather than One Declaration Rule?
I maintain that declaration is a subset of definition, so One Definition Rule is enough.
One declaration rule would be too strict, preventing programs that use the same header more than once from compiling. It would also make it impossible to define data structures with back references.
A simple way to see the first point (using headers) is to consider a program composed of two translation units, A.cpp and B.cpp, which both include the <string> header.
Translation units A.cpp and B.cpp are translated independently. By including <string>, both translation units acquire a declaration of std::string.
As for the second point (data structures with back references) consider an example of defining a tree in which each node has a back reference to its parent tree:
// Does not compile in C++: node is used before it is declared
struct tree {
node *root;
};
struct node {
node *left;
node *right;
tree *owner;
};
This example does not compile, because node in node *root is undeclared at that point. Switching the order of the node and tree definitions wouldn't help, because then tree in tree *owner would be undeclared. The solution in C and C++ is to introduce a second declaration for one of the two structs: either a forward declaration such as struct node; or an elaborated type specifier such as struct node *root, which itself declares node.
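As a minimal sketch of that second declaration, a forward declaration of node lets the two structs refer to each other. The helper owns_root is hypothetical and exists only to exercise the types:

```cpp
struct node;   // forward declaration: declares node without defining it

struct tree {
    node *root;    // fine: a pointer to an incomplete type is allowed
};

struct node {      // the one definition of node
    node *left;
    node *right;
    tree *owner;   // fine: tree is already defined at this point
};

// Hypothetical helper: true when the tree's root points back at the tree.
inline bool owns_root(const tree &t) {
    return t.root != nullptr && t.root->owner == &t;
}
```

Here node is declared twice (once by the forward declaration, once by its definition) but defined only once, which is exactly what a "One Declaration Rule" would forbid.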
Because the same declaration, in a .h file, may be included in multiple compilation units, and because multiple definitions are definitely a programming error, whereas multiple declarations aren't.
Definition is a subset of declaration, not the other way around. Every definition is a declaration, and there are declarations that are not definitions.
int i = 3; // definition and declaration
extern int i; // ok: (re)declaration
int i = 4; // error: redefinition
extern int j; // declaration
extern int j; // ok: (re)declaration
int j = 5; // ok: (re)declaration and definition
int j = 6; // error: redefinition
Because when you have declared a function in a header file
// header toto.h
int f(void);
and you want to define it in the compilation unit where it belongs, you'd do
#include "toto.h"
int f(void) {
return 0;
}
The definition is also a declaration, so this compilation unit sees two declarations, one in the header and one in the .c or .cpp file.
In short, allowing multiple declarations lets the compiler check for consistency between different source files.
The reason is really that the C++ translation model can easily deal with conflicting multiple declarations; it just requires the compiler part of the toolset to detect errors like this in the source code:
int X();
void X(); // error
A compiler can easily do that.
And when there are no such errors in any translation units, then there's no problem; every X() call in every translation unit is identical; what remains to do is for the linker to link every call to the one correct destination. The declarations have done their job and no longer play a role.
Now with multiple definitions, it's not that easy. Definitions are something which concerns multiple translation units and which goes beyond the scope of the compilation phase.
We've already seen that in the example above. The X() calls are in place, but now we need the guarantee that they all end up at the same destination, the same definition of X().
That there can be only one such definition should be clear, but how to enforce it? Put in simple terms, when it's time to link the object code together, the source code has already been dealt with.
The answer is that C++ basically chooses to put the burden on the programmer. Forcing compiler/linker implementors to check all multiple definitions for equality and detect differences would be beyond the capabilities of C++ toolsets in most real-life situations or completely break the way those tools work, so the pragmatic solution is to just forbid it and/or force the programmer to make sure that they are all identical or else get undefined behaviour.
I am learning multiple file compilation in C++ and found practice like this:
#ifndef MY_LIB_H
#define MY_LIB_H
void func(int a, int b);
#endif
Some people say that this practice is adopted to avoid repeating declarations.
But I try to declare a function twice and the code just runs well without any compilation error (like below).
int func();
int func();
int func()
{
return 1;
}
So is it really necessary to avoid repeating declarations? Or is there another reason for using #ifndef?
Some people say that this practice is adopted to avoid repeating declarations.
If some people say that then what they say is misleading. Header guards are used to avoid repeating definitions in order to conform to the One Definition Rule.
Repeating declarations is okay. Repeating definitions is not.
int func(); // declaration
int func(); // declaration; repetition is okay
class X; // declaration
class X; // declaration; repetition is okay
class Y {}; // definition
class Y {}; // definition; repetition is not okay
If a header consists only of declarations it can be included multiple times. But that's inefficient: the compiler has to compile each declaration, determine that it's just a duplicate, and ignore it. And, of course, even if it consists only of declarations at the moment, some future maintainer (including you) will, at some point, change it.
So is it really necessary to avoid repeating declarations?
You can have multiple declarations of a given entity (name). That is, you can repeat declarations in a given scope.
is there another reason for using #ifndef?
The main reason for using header guards is to ensure that the second time a header file is #included, its contents are discarded, thereby avoiding the duplicate definition of a class, inline entity, template, and so on, that it may contain.
In other words, so that the program conforms to the One Definition Rule (ODR).
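To see the guard at work without creating separate files, the sketch below pastes the text of a hypothetical header twice into one translation unit, which is what two #include directives would do. The names MY_POINT_H, my_point and sum are made up for illustration:

```cpp
// First inclusion of the hypothetical header my_point.h
#ifndef MY_POINT_H
#define MY_POINT_H
struct my_point {   // a class definition: may appear only once per translation unit
    int x;
    int y;
};
#endif

// Second inclusion: MY_POINT_H is already defined, so the preprocessor
// discards everything up to #endif and my_point is not redefined.
#ifndef MY_POINT_H
#define MY_POINT_H
struct my_point {
    int x;
    int y;
};
#endif

// Uses the single surviving definition.
inline int sum(const my_point &p) { return p.x + p.y; }
```

Removing the #ifndef/#define/#endif lines from both copies should make the compiler report a redefinition of my_point.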
what's the difference between the following 3 cases:
1) in point.h:
class point
{
int x,y;
public:
int getX();
};
int point::getX() {
return this->x;
}
2) in point.h:
class point
{
int x,y;
public:
int getX()
{
return this->x;
}
};
3) in point.h:
class point
{
int x,y;
public:
int getX();
};
in point.cpp:
int point::getX() {
return this->x;
}
Note: I read that it's somehow connected to inline, but I'm not sure which of them makes the compiler treat int getX() as inline int getX().
Avoid this first one:
struct point
{
int x,y;
int getX();
};
int point::getX() {
return this->x;
}
If multiple source files include point.h, you will get multiple definitions of point::getX, leading to a violation of the One Definition Rule (and modern linkers will give an error message).
For the second one:
struct point
{
int x,y;
int getX()
{
return this->x;
}
};
This implicitly inlines the function. This means that the function definition may be copy-pasted everywhere it is used, instead of resolving a function call. There are a few trade offs here. On one hand, by providing definitions in headers, you can more easily distribute your library. Additionally, in some cases you may see performance improvements due to the locality of the code. On the other hand, you may actually hurt performance due to instruction cache misses (more instructions around == it won't all fit in cache). And the size of your binaries may grow as the inlined function gets copied around.
Another tradeoff is that, should you ever need to change your implementation, all clients must rebuild.
Finally, depending on the sensitivity of the function, you may be revealing trade secrets through your headers (that is, there is absolutely no hiding of your secret sauce) (note: one can always decompile your binary and reverse engineer an implementation, so putting the def in the .cpp file won't stop a determined programmer, but it keeps honest people honest).
The third one, which separates a definition into a .cpp file:
// point.h
struct point
{
int x,y;
int getX();
};
// point.cpp
int point::getX() {
return this->x;
}
This will cause a function to get exported to your library (at least for gcc. In Windows, you need to be explicit by using __declspec directives to import/export). Again, there are tradeoffs here.
Changing the implementation does not require clients to recompile; you can distribute a new library for them to link to instead (the new library is ABI-compatible if you only change the impl details in the .cpp file). However, it is more difficult to distribute your library, as your binaries now need to be built for each platform.
You may see a performance decrease due to the requirement to resolve function pointers into a library for running code. You may also see a performance increase over inlining due to the fact that your code may be friendlier to the instruction cache.
In the end, there is a lot to consider. My recommendation is to go with #3 by default unless you are writing templates. When you want to look at improving performance, you can start to measure what inlining does for you (binary size as well as runtime perf). Of course you may have other information up front that makes approach #2 or #3 better suited for the task (e.g., you have a Point class, and you know that accessing X will happen everywhere and it's a really small function, so you decide to inline it).
what's the difference between the following 3 cases
The function definition is outside of the class definition. Note that in this example you've defined a non-inline function in a header. Including this header into more than one translation unit violates the One Definition Rule. This is most likely a bug.
The function definition is inside of the class definition. In this case, the function is implicitly inline. As such, it is fine to include it into multiple translation units.
The function definition is outside of the class definition again. The function is not declared inline. This time the function is defined in a separate translation unit, thereby conforming to the ODR even if the header is included into multiple translation units.
what's the problem if both b.cpp & a.cpp includes my header file
The problem is that both b.cpp and a.cpp will then define a non-inline function. The One Definition Rule says that there must be at most one definition of any non-inline function in the program. Two is more than one. Therefore doing this violates the ODR, and such a program is ill-formed.
I'm too much confused why it's an error to write the same function in two different cpp files?
It is an "error" because the rules of the language (explained above) say that it is an "error".
what if both want to use that function?
Then declare the function in both translation units. Only define the function in one translation unit unless you declare the function inline, in which case define the function in all translation units (where the function is used) instead. Look at the examples 2. and 3. of your question to see how that can be done.
so the code in method 1 is not automatically inlined?
No. Functions are not automatically declared inline. A function is inline only if A. the inline keyword is used, or B. it is a member function (static or non-static) that is defined within the class definition (or in a case involving constexpr that I shall omit here). Neither of those cases applies to example 1, therefore it is not an inline function.
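As a small sketch of the two cases A and B, here is a variant of the point header that stays safe to include from several translation units. The getY member is added purely for illustration:

```cpp
// point.h (sketch)
struct point {
    int x, y;
    int getX() const;                 // declared here, defined below
    int getY() const { return y; }    // case B: defined in the class, implicitly inline
};

// Case A: the explicit inline keyword makes this out-of-class definition
// legal in a header that several .cpp files include.
inline int point::getX() const {
    return x;
}
```

Dropping the inline keyword from point::getX turns this back into case 1 from the question, with one non-inline definition per including .cpp file.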
C++17 introduced inline variable, and an inline static data member can be defined in the class definition with an initializer. It does not need an out-of-class definition. For example,
struct X {
inline static int n = 1;
};
Given this, I see no reason not to always use inline static data members, for the neat syntax. Any pitfall of doing this? Note that I don't mind slower compilation.
Not a pitfall, but here's one reason not to use an inline: if the initial value of the variable is not just a trivial constant, but something more complicated:
struct X {
inline static int n = and_then_more(figure_out_where_n_comes_from());
};
Now the declarations of figure_out_where_n_comes_from() and and_then_more() must be pulled into the header file.
Also, whatever figure_out_where_n_comes_from() returns must also be declared. It could be some horribly overcomplicated class, which then gets passed to and_then_more(), as a parameter, to finally compute the initial value for n.
And everything that #includes the header file where X is declared must now include all the header files for all of these dependencies.
But without an inline, all you have is:
struct X {
static int n;
};
And you need to deal with all these dependencies only in the one translation unit that defines X::n. Nothing else that #includes only X's header file cares about it.
In other words: information hiding. If it's necessary to reimplement where the initial value of n comes from, you get to recompile only one translation unit, instead of your entire source code.
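The two styles can be put side by side in a single file as a rough sketch, assuming a C++17 compiler. The struct Y and the function compute_initial_value are hypothetical stand-ins for the non-inline variant and for the complicated initializer:

```cpp
// Hypothetical stand-in for an initializer with heavy dependencies.
int compute_initial_value() { return 42; }

struct X {
    inline static int n = 1;   // C++17: declared and defined inside the class
};

struct Y {
    static int n;              // declaration only; this is all a header needs
};

// In a real project this line would live in exactly one .cpp file, and only
// that file would need to see the declaration of compute_initial_value().
int Y::n = compute_initial_value();
```

With Y, changing how the initial value is computed touches only the .cpp file that holds the definition of Y::n.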
I am reading Item 4 of Scott Meyers' Effective C++, where he is trying to show an example in which a static non-local object is used across different translation units. He is highlighting the problem whereby the object used in one translation unit does not know if it has been initialised in the other one prior to usage. It's page 30 in the third edition in case anyone has a copy.
The example is such:
One file represents a library:
class FileSystem{
public:
std::size_t numDisks() const;
....
};
extern FileSystem tfs;
and in a client file:
class Directory {
public:
Directory(some_params);
....
};
Directory::Directory(some_params)
{
...
std::size_t disks = tfs.numDisks();
...
}
My two questions are thus:
1) If the client code needs to use tfs, then there will be some sort of include statement. Therefore surely this code is all in one translation unit? I do not see how you could refer to code which is in a different translation unit? Surely a program is always one translation unit?
2) If the client code included FileSystem.h would the line extern FileSystem tfs; be sufficient for the client code to call tfs (I appreciate there could be a run-time issue with initialisation, I am just talking about compile-time scope)?
EDIT to Q1
The book says these two pieces of code are in separate translation units. How could the client code use the variable tfs, knowing they're in separate translation units??
Here's a simplified example of how initialization across multiple TUs can be problematic.
gadget.h:
struct Foo;
extern Foo gadget;
gadget.cpp:
#include <foo.h>
#include <gadget.h>
Foo gadget(true, Blue, 'x'); // initialized here
client.cpp:
#include <foo.h>
#include <gadget.h>
int do_something()
{
int x = gadget.frumple(); // problem!
return bar(x * 2);
}
The problem is that it is not guaranteed that the gadget object will have been initialized by the time that do_something() refers to it. It is only guaranteed that initializers within one TU are completed before a function in that TU is called.
(The solution is to replace extern Foo gadget; with Foo & gadget();, implement that in gadget.cpp as { static Foo impl; return impl; } and use gadget().frumple().)
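A minimal sketch of that accessor-function fix follows. Foo is simplified here to a hypothetical class with a single string member, since the real Foo and its constructor arguments are not shown:

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified Foo; the real class is not shown in the thread.
struct Foo {
    explicit Foo(std::string c) : colour(std::move(c)) {}
    int frumple() const { return static_cast<int>(colour.size()); }
    std::string colour;
};

// Replaces "extern Foo gadget;": the object is constructed the first time
// gadget() is called, so every caller gets an initialized object.
Foo &gadget() {
    static Foo impl("Blue");   // function-local static, initialized once
    return impl;
}

int do_something() {
    int x = gadget().frumple();   // safe regardless of TU initialization order
    return x * 2;
}
```

Every call to gadget() returns a reference to the same object, so callers in any translation unit share one Foo.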
Here's the example from the Standard C++03 (I've added the a.h and b.h headers):
[basic.start.init]/3
// a.h
struct A { A(); void Use(){} };
// b.h
struct B { void Use(){} };
// – File 1 –
#include "a.h"
#include "b.h"
B b;
A::A(){
b.Use();
}
// – File 2 –
#include "a.h"
A a;
// – File 3 –
#include "a.h"
#include "b.h"
extern A a;
extern B b;
int main() {
a.Use();
b.Use();
}
It is implementation-defined whether either a or b is initialized before main is entered or whether the initializations are delayed until a is first used in main. In particular, if a is initialized before main is entered, it is not guaranteed that b will be initialized before it is used by the initialization of a, that is, before A::A is called. If, however, a is initialized at some point after the first statement of main, b will be initialized prior to its use in A::A.
1) If the client code needs to use tfs, then there will be some sort of include statement. Therefore surely this code is all in one translation unit? I do not see how you could refer to code which is in a different translation unit? Surely a program is always one translation unit?
A translation unit is (roughly) a single .cpp file after preprocessing. After you compile a single translation unit you get an object file (which typically has the extension .o or .obj); after all TUs have been compiled, they are linked together by the linker to form the final executable. This is often hidden by IDEs (and even by compilers that accept multiple input files on the command line), but it's crucial to understand that building a C++ program is done in (at least) three passes: preprocessing, compilation and linking.
The #include statement will include the declaration of the class and the extern declaration, telling the current translation unit that the class FileSystem is made that way and that, in some translation unit, there's a variable tfs of type FileSystem.
2) If the client code included FileSystem.h would the line extern FileSystem tfs; be sufficient for the client code to call tfs
Yes, the extern declaration tells the compiler that in some TU there's a variable defined like that; the compiler puts a placeholder for it in the object module and the linker, when tying together the various object modules, will fix it with the address of the actual tfs variable (defined in some other translation unit).
Keep in mind that when you write extern you are only declaring a variable (i.e. you are telling the compiler "trust me, there's this thing somewhere"), when you omit it you are both declaring it and defining it ("there's this thing and you have to create it here").
The distinction maybe is clearer with functions: when you write a prototype you are declaring a function ("somewhere there's a function x that takes such parameters and returns this type"), when you actually write the function (with the function body) you are defining it ("this is what this function actually does"), and, if you haven't declared it before, it counts also as a declaration.
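The declaration/definition split for a variable can be sketched in one file. In a real program the first line would sit in a header and the last in exactly one .cpp file; the names counter and read_counter are made up for illustration:

```cpp
extern int counter;      // declaration only: "there's this thing somewhere"

// Code that uses counter needs nothing more than the declaration above.
int read_counter() {
    return counter;
}

int counter = 7;         // definition: "create it here", allowed once per program
```

If the last line were missing from the whole program, each file would still compile, and the linker would most likely report counter as an undefined symbol.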
// x.h
int i = 3;
// x1.cpp
#include"x.h"
//...
// x2.cpp
#include"x.h"
//...
Above code will give linker error. However If I declare,
//x.h
static int i = 3;
It doesn't give a linker error in gcc, even though we have the same #include! Are we creating a different static int i; for every .cpp file? Will it cause any silent linking bug (due to the same name)?
When C code is compiled, it's one "translation unit" at a time. Early on, #includes are expanded into the text of the referenced files. So what you've got in the static case is equivalent to x1.cpp saying static int i = 3; and x2.cpp doing the same. And static in this context means roughly "don't share this with other translation units."
So yes, when you use static there you are making two different i variables which have nothing to do with each other. This will not cause a linking error.
int x; is a definition of the entity x. The One Definition Rule of C++ says that any variable that is used shall be defined exactly once in the program. Hence the error.
static says that x has internal linkage. That is, the x's that appear in one.cpp and two.cpp are two different unrelated entities.
The C++ standard says that the use of static in this case is deprecated (as per Steve's comment, it is undeprecated in C++0x). Anonymous namespaces provide a superior alternative.
namespace
{
int x;
}
Also note that, unlike in C, namespace-scope const variables in C++ have internal linkage by default (unless they are declared extern). That is
const int x = 7; // won't give you an error if included in different source files.
HTH
Are we creating different static int i; for every .cpp file ?
Yes
Will it cause any silent linking bug (due to same name)?
No. Due to static, each i has internal linkage, so the linker never treats them as the same symbol.
If this isn't the behavior you want, you need to use extern in the header file, and allocate the variable in one translation unit (.cpp file)
static creates a global variable that is only visible inside the unit.
If you want to use a variable in more than one compilation unit, use extern in the header and define it in the implementation without extern.
You get the linker error in your first code example because i is defined and exported in both compilation units. In the second case i is static, so there is no exported symbol because static variables are only visible in the current compilation unit and aren't exported to the linker. In this case you have two independent variables that are both called i.
As written, the code looks like the same i is being accessed by multiple .cpp files, whereas in reality, each .cpp file will have its own copy. This can lead to misunderstandings and bugs.
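If you want there to be just one copy of i, the preferred idiom is to wrap it in an accessor function in x.h (marked inline so that the header can be included in several .cpp files without violating the One Definition Rule):
inline int& GetI() {
static int i = 3; // this initialization only happens once.
return i;
}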
If you do want separate copies of i for each .cpp file, a much clearer expression of this is to simply define i separately in each .cpp file:
namespace {
int i;
}
Putting it in an anonymous namespace as above keeps it from being accessible from other .cpp files, for safety.