Consider the following example:
// usedclass1.hpp
#include <iostream>
class UsedClass
{
public:
UsedClass() { }
void doit() { std::cout << "UsedClass 1 (" << this << ") doit hit" << std::endl; }
};
// usedclass2.hpp
#include <iostream>
class UsedClass
{
public:
UsedClass() { }
void doit() { std::cout << "UsedClass 2 (" << this << ") doit hit" << std::endl; }
};
// object.hpp
class Object
{
public:
Object();
};
// object.cpp
#include "object.hpp"
#include "usedclass2.hpp"
Object::Object()
{
UsedClass b;
b.doit();
}
// main.cpp
#include "usedclass1.hpp"
#include "object.hpp"
int main()
{
Object obj;
UsedClass a;
a.doit();
}
The code compiles without any compiler or linker errors, but the output looks strange to me:
gcc (Red Hat 4.6.1-9) on Fedora x86_64 with no optimization [EG1]:
UsedClass 1 (0x7fff0be4a6ff) doit hit
UsedClass 1 (0x7fff0be4a72e) doit hit
same as [EG1] but with -O2 option enabled [EG2]:
UsedClass 2 (0x7fffcef79fcf) doit hit
UsedClass 1 (0x7fffcef79fff) doit hit
msvc2005 (14.00.50727.762) on Windows XP 32bit with no optimization [EG3]:
UsedClass 1 (0012FF5B) doit hit
UsedClass 1 (0012FF67) doit hit
same as [EG3] but with /O2 (or /Ox) enabled [EG4]:
UsedClass 1 (0012FF73) doit hit
UsedClass 1 (0012FF7F) doit hit
I would expect either a linker error (assuming the ODR is violated) or the output as in [EG2] (the code is inlined, nothing is exported from the translation unit, and the ODR holds). Thus my questions:
Why are outputs [EG1], [EG3], [EG4] possible?
Why do I get different results from different compilers or even from the same compiler? That makes me think that the standard somehow doesn't specify the behaviour in this case.
Thank you for any suggestions, comments and standard interpretations.
Update
I would like to understand the compiler's behaviour. More precisely, why are no errors generated if the ODR is violated? A hypothesis is that since all member functions in both versions of UsedClass are implicitly inline (and therefore C++03 3.2 is not violated), the linker doesn't report errors, but in that case outputs [EG1], [EG3], [EG4] seem strange.
This is the rule that prohibits what you're doing (the C++11 wording), from section 3.2 of the Standard:
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
each definition of D shall consist of the same sequence of tokens; and
in each definition of D, corresponding names, looked up according to 3.4, shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution (13.3) and after matching of partial template specialization (14.8.3), except that a name can refer to a const object with internal or no linkage if the object has the same literal type in all definitions of D, and the object is initialized with a constant expression (5.19), and the value (but not the address) of the object is used, and the object has the same value in all definitions of D; and
in each definition of D, corresponding entities shall have the same language linkage; and
in each definition of D, the overloaded operators referred to, the implicit calls to conversion functions, constructors, operator new functions and operator delete functions, shall refer to the same function, or to a function defined within the definition of D; and
in each definition of D, a default argument used by an (implicit or explicit) function call is treated as if its token sequence were present in the definition of D; that is, the default argument is subject to the three requirements described above (and, if the default argument has sub-expressions with default arguments, this requirement applies recursively).
if D is a class with an implicitly-declared constructor (12.1), it is as if the constructor was implicitly defined in every translation unit where it is odr-used, and the implicit definition in every translation unit shall call the same constructor for a base class or a class member of D.
In your program, you're violating the ODR for class UsedClass because the tokens are different in different compilation units. You could fix that by moving the definition of UsedClass::doit() outside the class body, but the same rule applies to the body of inline functions.
Your program violates the One Definition Rule and invokes undefined behavior.
The standard does not mandate a diagnostic message if you break the ODR, but the behavior is undefined.
C++03 3.2 One definition rule
No translation unit shall contain more than one definition of any variable, function, class type, enumeration type or template.
...
Every program shall contain exactly one definition of every non-inline function or object that is used in that program; no diagnostic required. The definition can appear explicitly in the program, it can be found in the standard or a user-defined library, or (when appropriate) it is implicitly defined (see 12.1, 12.4 and 12.8). An inline function shall be defined in every translation unit in which it is used.
Further, the standard defines specific requirements for multiple definitions of a symbol; these are given in paragraph 5 of 3.2.
There can be more than one definition of a class type (clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (clause 14), non-static function template (14.5.5), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.4) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
— each definition of D shall consist of the same sequence of tokens; and
...
Why are outputs [EG1], [EG3], [EG4] possible?
The simple answer is that the behaviour is undefined, so anything is possible.
Most compilers handle an inline function by generating a copy in each translation unit in which it's defined; the linker then arbitrarily chooses one to include in the final program. This is why, with optimisations disabled, it calls the same function in both cases. With optimisations enabled, the function might be inlined by the compiler, in which case each inlined call will use the version defined in the current translation unit.
That makes me think that the standard somehow doesn't specify the behaviour in this case.
That's correct. Breaking the one definition rule gives undefined behaviour, and no diagnostic is required.
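If the two classes really are meant to be different types, one way to sidestep the violation is to give each its own namespace (or an unnamed namespace if it is only used in one .cpp file), so the two doit() bodies belong to distinct entities and the linker never has to pick between them. The following is only a sketch: the namespace names v1 and v2 and the helper functions are made up for the illustration, both headers are folded into one file so it compiles standalone, and doit() returns its text instead of printing it so the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Stand-in for usedclass1.hpp (the namespace name is hypothetical).
namespace v1 {
class UsedClass {
public:
    std::string doit() const { return "UsedClass 1 doit hit"; }
};
}  // namespace v1

// Stand-in for usedclass2.hpp.
namespace v2 {
class UsedClass {
public:
    std::string doit() const { return "UsedClass 2 doit hit"; }
};
}  // namespace v2

// Stand-in for the body of Object::Object() in object.cpp (wants version 2).
inline std::string objectCtorBody() {
    v2::UsedClass b;
    return b.doit();
}

// Stand-in for the rest of main() in main.cpp (wants version 1).
inline std::string mainBody() {
    v1::UsedClass a;
    return a.doit();
}

// Each caller now gets the version it asked for, at any optimization level.
inline void usageExample() {
    assert(objectCtorBody() == "UsedClass 2 doit hit");
    assert(mainBody() == "UsedClass 1 doit hit");
}
```

Because v1::UsedClass::doit and v2::UsedClass::doit have different mangled names, the result no longer depends on which copy the linker happens to keep.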
(Note! This question particularly covers the state of C++14, before the introduction of inline variables in C++17)
TLDR; Question
What constitutes odr-use of a constexpr variable used in the definition of an inline function, such that multiple definitions of the function violates [basic.def.odr]/6?
(... likely [basic.def.odr]/3; but could this silently introduce UB in a program as soon as, say, the address of such a constexpr variable is taken in the context of the inline function's definition?)
TLDR example: does a program where doMath() is defined as follows:
// some_math.h
#pragma once
#include <algorithm>  // std::max
// Forced by some guideline abhorring literals.
constexpr int kTwo{2};
inline int doMath(int arg) { return std::max(arg, kTwo); }
// std::max(const int&, const int&)
have undefined behaviour as soon as doMath() is defined in two different translation units (say by inclusion of some_math.h and subsequent use of doMath())?
Background
Consider the following example:
// constants.h
#pragma once
constexpr int kFoo{42};
// foo.h
#pragma once
#include "constants.h"
inline int foo(int arg) { return arg * kFoo; } // #1: kFoo not odr-used
// a.cpp
#include "foo.h"
int a() { return foo(1); } // foo odr-used
// b.cpp
#include "foo.h"
int b() { return foo(2); } // foo odr-used
compiled for C++14, that is, before C++17 introduced inline variables (and made constexpr static data members implicitly inline).
The inline function foo (which has external linkage) is odr-used in both translation units (TU) associated with a.cpp and b.cpp, say TU_a and TU_b, and shall thus be defined in both of these TU's ([basic.def.odr]/4).
[basic.def.odr]/6 covers the requirements for when such multiple definitions (different TU's) may appear, and particularly /6.1 and /6.2 are relevant in this context [emphasis mine]:
There can be more than one definition of a [...] inline function with external linkage [...] in a program
provided that each definition appears in a different translation unit,
and provided the definitions satisfy the following requirements. Given
such an entity named D defined in more than one translation unit, then
/6.1 each definition of D shall consist of the same sequence of tokens; and
/6.2 in each definition of D, corresponding names, looked up according to [basic.lookup], shall refer to an entity defined within
the definition of D, or shall refer to the same entity, after overload
resolution ([over.match]) and after matching of partial template
specialization ([temp.over]), except that a name can refer to a
non-volatile const object with internal or no linkage if the object
has the same literal type in all definitions of D, and the object is
initialized with a constant expression ([expr.const]), and the object
is not odr-used, and the object has the same value in all definitions
of D; and
...
If the definitions of D do not satisfy these requirements, then the behavior is undefined.
/6.1 is fulfilled.
/6.2 is fulfilled if kFoo in foo:
[OK] is const with internal linkage
[OK] is initialized with a constant expression
[OK] is of same literal type over all definitions of foo
[OK] has the same value in all definitions of foo
[??] is not odr-used.
I interpret the fifth condition as meaning specifically "not odr-used in the definition of foo"; this could arguably have been clearer in the wording. However, if kFoo is odr-used (at least in the definition of foo), I interpret that as opening the door to ODR violations and subsequent undefined behavior, due to violation of [basic.def.odr]/6.
Afaict [basic.def.odr]/3 governs whether kFoo is odr-used or not,
A variable x whose name appears as a potentially-evaluated expression ex is odr-used by ex unless applying the lvalue-to-rvalue conversion ([conv.lval]) to x yields a constant expression ([expr.const]) that does not invoke any non-trivial functions and, if x is an object, ex is an element of the set of potential results of an expression e, where either the lvalue-to-rvalue conversion ([conv.lval]) is applied to e, or e is a discarded-value expression (Clause [expr]). [...]
but I'm having a hard time understanding whether kFoo is considered odr-used if, for example, its address is taken within the definition of foo, and whether taking its address outside of the definition of foo affects whether [basic.def.odr]/6.2 is fulfilled.
Further details
Particularly, consider if foo is defined as:
// #2
inline int foo(int arg) {
std::cout << "&kFoo in foo() = " << &kFoo << "\n";
return arg * kFoo;
}
and a() and b() are defined as:
int a() {
std::cout << "TU_a, &kFoo = " << &kFoo << "\n";
return foo(1);
}
int b() {
std::cout << "TU_b, &kFoo = " << &kFoo << "\n";
return foo(2);
}
then running a program which calls a() and b() in sequence produces:
TU_a, &kFoo = 0x401db8
&kFoo in foo() = 0x401db8 // <-- foo() in TU_a:
// &kFoo from TU_a
TU_b, &kFoo = 0x401dbc
&kFoo in foo() = 0x401db8 // <-- foo() in TU_b:
// !!! &kFoo from TU_a
That is, a() and b() each see the address of their own TU-local kFoo, but foo() reports the same kFoo address (the one from TU_a) no matter which translation unit calls it.
Does this program (with foo and a/b defined as per this section) have undefined behaviour?
A real life example would be where these constexpr variables represent mathematical constants, and where they are used, from within the definition of an inline function, as arguments to utility math functions such as std::max(), which takes its arguments by reference.
In the OP's example with std::max, an ODR violation does indeed occur, and the program is ill-formed NDR. To avoid this issue, you might consider one of the following fixes:
give the doMath function internal linkage, or
move the declaration of kTwo inside doMath
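The two fixes just listed might look like this. It is a sketch only: the names doMathInternal, doMathLocal and kLocalTwo are invented so that both variants can sit in one file.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kTwo{2};

// Fix 1: internal linkage. Each translation unit gets its own function,
// so there is no single entity with several diverging definitions.
static int doMathInternal(int arg) { return std::max(arg, kTwo); }

// Fix 2: the constant lives inside the function, so the name refers to
// an entity defined within the definition itself.
inline int doMathLocal(int arg) {
    constexpr int kLocalTwo{2};
    return std::max(arg, kLocalTwo);
}

// Both variants compute max(arg, 2).
inline void usageExample() {
    assert(doMathInternal(1) == 2);
    assert(doMathLocal(5) == 5);
}
```

The trade-off with the first variant is that every translation unit may carry its own copy of the function; the second keeps a single inline function but hides the constant from other code.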
A variable that is used by an expression is considered to be odr-used unless there is a certain kind of simple proof that the reference to the variable can be replaced by the compile-time constant value of the variable without changing the result of the expression. If such a simple proof exists, then the standard requires the compiler perform such a replacement; consequently the variable is not odr-used (in particular, it does not require a definition, and the issue described by the OP would be avoided because none of the translation units in which doMath is defined would actually reference a definition of kTwo). If the expression is too complicated, however, then all bets are off. The compiler might still replace the variable with its value, in which case the program may work as you expect; or the program may exhibit bugs or crash. That's the reality with IFNDR programs.
The case where the variable is immediately passed by reference to a function, with the reference binding directly, is one common case where the variable is used in a way that is too complicated and the compiler is not required to determine whether or not it may be replaced by its compile-time constant value. This is because doing so would necessarily require inspecting the definition of the function (such as std::max<int> in this example).
You can "help" the compiler by writing int(kTwo) and using that as the argument to std::max as opposed to kTwo itself; this prevents an odr-use since the lvalue-to-rvalue conversion is now immediately applied prior to calling the function. I don't think this is a great solution (I recommend one of the two solutions that I previously mentioned) but it has its uses (GoogleTest uses this in order to avoid introducing odr-uses in statements like EXPECT_EQ(2, kTwo)).
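For contrast, here are the two spellings side by side. The function names doMathOdrUse and doMathByValue are hypothetical, and the first one is only problematic when its header ends up in more than one translation unit under C++14; in this single file both are fine.

```cpp
#include <algorithm>
#include <cassert>

constexpr int kTwo{2};

// kTwo binds directly to std::max's const int& parameter: an odr-use.
inline int doMathOdrUse(int arg) { return std::max(arg, kTwo); }

// int(kTwo) is a prvalue copy. std::max binds its reference to that
// temporary, and kTwo itself is only read for its value: no odr-use.
inline int doMathByValue(int arg) { return std::max(arg, int(kTwo)); }

// The results are identical; only the odr-use of kTwo differs.
inline void usageExample() {
    assert(doMathOdrUse(1) == doMathByValue(1));
    assert(doMathOdrUse(7) == doMathByValue(7));
}
```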
If you want to know more about how to understand the precise definition of odr-use, involving "potential results of an expression e...", that would be best addressed with a separate question.
Does a program where doMath() is defined as follows: [...] have undefined behaviour as soon as doMath() is defined in two different translation units (say by inclusion of some_math.h and subsequent use of doMath())?
Yes; this particular issue was highlighted in LWG2888 and LWG2889 which were both resolved for C++17 by P0607R0 (Inline Variables for the Standard Library) [emphasis mine]:
2888. Variables of library tag types need to be inline variables
[...]
The variables of library tag types need to be inline variables.
Otherwise, using them in inline functions in multiple translation
units is an ODR violation.
Proposed change: Make piecewise_construct, allocator_arg, nullopt,
(the in_place_tags after they are made regular tags), defer_lock,
try_to_lock and adopt_lock inline.
[...]
[2017-03-12, post-Kona] Resolved by p0607r0.
2889. Mark constexpr global variables as inline
The C++ standard library provides many constexpr global variables.
These all create the risk of ODR violations for innocent user code.
This is especially bad for the new ExecutionPolicy algorithms, since
their constants are always passed by reference, so any use of those
algorithms from an inline function results in an ODR violation.
This can be avoided by marking the globals as inline.
Proposed change: Add inline specifier to: bind placeholders _1, _2,
..., nullopt, piecewise_construct, allocator_arg, ignore, seq, par,
par_unseq in
[...]
[2017-03-12, post-Kona] Resolved by p0607r0.
Thus, in C++14, prior to inline variables, this risk is present both for your own global variables as well as library ones.
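If C++17 is available, the same remedy applies to your own constants. This is a minimal sketch assuming a C++17 compiler; addressOfTwo is a hypothetical helper added only to show that the address is now meaningful.

```cpp
#include <algorithm>
#include <cassert>

// C++17: one kTwo for the whole program, so every definition of doMath
// refers to the same entity even though std::max odr-uses it.
inline constexpr int kTwo{2};

inline int doMath(int arg) { return std::max(arg, kTwo); }

// The address now belongs to the single shared object.
inline const int* addressOfTwo() { return &kTwo; }

inline void usageExample() {
    assert(doMath(1) == 2);
    assert(addressOfTwo() == &kTwo);
}
```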
Let's assume I have a library somelib.a that is distributed as a binary by the package manager, and that this library makes use of the header-only library anotherlib.hpp.
If I now link my program against somelib.a, and also use anotherlib.hpp but with a different version, then this can result in UB, if somelib.a uses parts of the anotherlib.hpp in its include headers.
But what will happen if somelib.a references/uses anotherlib.hpp only in its .cpp files (so I don't know that it uses it)? Will the linking step between my application and somelib.a ensure that somelib.a and my application each use their own version of anotherlib.hpp?
The reason I ask is if I link the individual compilation units of my program to the final program, then the linker removes duplicate symbols (depending on if it is internal linkage or not). So a header only library is normally written in a way that removing duplicate symbols can be done.
A minimal example
somelib.a is built on a system with nlohmann/json.hpp version 3.2
somelib/somelib.h
#include <string>
namespace somelib {
struct config {
// some members
};
config read_configuration(const std::string &path);
}
somelib.cpp
#include <somelib/somelib.h>
#include <nlohmann/json.hpp>
#include <fstream>
namespace somelib {
config read_configuration(const std::string &path)
{
nlohmann::json j;
std::ifstream i(path);
i >> j;
config c;
// populate c based on j
return c;
}
}
The application is built on another system with nlohmann/json.hpp version 3.5 (assume 3.2 and 3.5 are not compatible), and is then linked against the somelib.a that was built on the system with version 3.2.
application.cpp
#include <somelib/somelib.h>
#include <nlohmann/json.hpp>
#include <fstream>
int main() {
auto c = somelib::read_configuration("config.json");
nlohmann::json j;
std::ifstream i("another.json");
i >> j;
return 0;
}
It hardly makes any difference that you are using a static library.
The C++ standard states that if a program contains multiple definitions of an inline function (or class template, or variable, etc.) and the definitions are not all the same, then you have UB.
Practically, it means that unless the changes between the 2 versions of the header library are very limited you will have UB.
For instance, if the only changes are whitespace changes, comments, or adding new symbols, then you will not have undefined behavior. However, if a single line of code in an existing function was changed, then it is UB.
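One mitigation a header-only library can apply on its side is to wrap each release in a versioned inline namespace. The sketch below uses entirely hypothetical names (anotherlib, v3_2, v3_5, formatVersion) and makes no claim about how any particular library is organized. The version becomes part of every mangled name, so code compiled against 3.2 and code compiled against 3.5 no longer collide; the cost is that the two versions are distinct types and cannot be passed across the boundary. Both versions appear in one file purely so the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>

namespace anotherlib {

// What the 3.2 header would have contained. It is not inline here, because
// two inline namespaces declaring the same function would make
// anotherlib::formatVersion() ambiguous within a single file.
namespace v3_2 {
inline std::string formatVersion() { return "3.2"; }
}  // namespace v3_2

// The 3.5 header: 'inline' lets users keep writing anotherlib::formatVersion().
inline namespace v3_5 {
inline std::string formatVersion() { return "3.5"; }
}  // namespace v3_5

}  // namespace anotherlib

// The application sees 3.5 by default; the library's 3.2 symbols stay separate.
inline void usageExample() {
    assert(anotherlib::formatVersion() == "3.5");
    assert(anotherlib::v3_2::formatVersion() == "3.2");
}
```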
From the C++17 final working draft (n4659.pdf):
6.2 One-definition rule
[...]
There can be more than one definition of a class type (Clause 12),
enumeration type (10.2), inline function with external linkage
(10.1.6), inline variable with external linkage (10.1.6), class
template (Clause 17), non-static function template (17.5.6), static
data member of a class template (17.5.1.3), member function of a class
template (17.5.1.1), or template specialization for which some
template parameters are not specified in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the
following requirements.
Given such an entity named D defined in more than one translation
unit, then
each definition of D shall consist of the same
sequence of tokens; and
in each definition of D, corresponding
names, looked up according to 6.4, shall refer to an entity defined
within the definition of D, or shall refer to the same entity, after
overload resolution (16.3) and after matching of partial template
specialization (17.8.3), except that a name can refer to
a non-volatile const object with internal or no linkage if the object
has the same literal type in all definitions of D,
is initialized with a constant expression (8.20),
is not odr-used in any definition of D, and
has the same value in all definitions of D,
or
a reference with internal or no linkage initialized with a constant expression
such that the reference refers to the same entity in all definitions
of D; and
in each definition of D, corresponding entities
shall have the same language linkage; and
in each definition
of D, the overloaded operators referred to, the implicit calls to
conversion functions, constructors, operator new functions and
operator delete functions, shall refer to the same function, or to a
function defined within the definition of D; and
in each definition of
D, a default argument used by an (implicit or explicit) function call
is treated as if its token sequence were present in the definition of
D; that is, the default argument is subject to the requirements
described in this paragraph (and, if the default argument has
subexpressions with default arguments, this requirement applies
recursively).
if D is a class with an implicitly-declared
constructor (15.1), it is as if the constructor was implicitly defined
in every translation unit where it is odr-used, and the implicit
definition in every translation unit shall call the same constructor
for a subobject of D.
If D is a template and is defined in more than one translation unit,
then the preceding requirements shall apply both to names from the
template’s enclosing scope used in the template definition (17.6.3),
and also to dependent names at the point of instantiation (17.6.2). If
the definitions of D satisfy all these requirements, then the behavior
is as if there were a single definition of D. If the definitions of D
do not satisfy these requirements, then the behavior is undefined.
I just read that constexpr and inline functions obey the one-definition rule, but their definitions must be identical. So I tried it:
inline void foo() {
return;
}
inline void foo() {
return;
}
int main() {
foo();
};
error: redefinition of 'void foo()',
and
constexpr int foo() {
return 1;
}
constexpr int foo() {
return 1;
}
int main() {
constexpr int x = foo();
};
error: redefinition of 'constexpr int foo()'
So what exactly does it mean that constexpr and inline functions can obey the ODR?
I just read that constexpr and inline functions obey the one-definition rule, but their definitions must be identical.
This is in reference to inline functions in different translations units. In your example they are both in the same translation unit.
This is covered in the draft C++ standard 3.2 One definition rule [basic.def.odr] which says:
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with
external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member
of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for
which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition
appears in a different translation unit, and provided the definitions satisfy the following requirements. Given
such an entity named D defined in more than one translation unit, then
and includes the following bullet:
each definition of D shall consist of the same sequence of tokens; and
You are defining functions repeatedly in one translation unit. This is always forbidden:
No translation unit shall contain more than one definition of any variable, function, class type, enumeration
type, or template. (C++11 3.2/1)
For inline functions, you are allowed to define same function in exactly the same way in more than one translation unit (read: .cpp file). In fact, you must define it in every translation unit (which is usually done by defining it in a header file):
An inline function shall be defined in every translation unit in which it is odr-used. (C++11 3.2/3)
For "normal" (non-inline, non-constexpr, non-template, etc.) functions with external linkage (non-static), multiple definitions will usually (no diagnostic required) lead to a linker error.
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used
in that program; no diagnostic required. (C++11 3.2/3)
To sum up:
Never define anything multiple times in one translation unit (which is a .cpp file and all directly or indirectly included headers).
You may put a certain number of things into header files, where they will be included once in several different translation units, for example:
inline functions
class types and templates
static data members of a class template.
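As a sketch of that last point, here is what such a header might contain. All names (shapes.hpp, SHAPES_HPP, square, Point, Counter) are invented for the illustration; each entity in it may be defined identically in several translation units.

```cpp
// shapes.hpp (hypothetical header)
#ifndef SHAPES_HPP
#define SHAPES_HPP

// Inline function: must be defined in every translation unit that uses it.
inline int square(int x) { return x * x; }

// Class type: member functions defined in the class body are implicitly inline.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int normSquared() const { return square(x_) + square(y_); }
private:
    int x_;
    int y_;
};

// Class template whose static data member is defined in the header.
template <typename T>
struct Counter {
    static int count;
    static int next() { return ++count; }
};
template <typename T>
int Counter<T>::count = 0;

#endif  // SHAPES_HPP

// --- what any .cpp file including the header could then do ---
#include <cassert>

inline void usageExample() {
    assert(Point(3, 4).normSquared() == 25);
    assert(Counter<int>::next() >= 1);
}
```

The include guard covers the first rule (never two definitions in one translation unit); the inline, class, and template rules cover the second.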
If you have:
file1.cpp:
inline void foo() { std::cout << "Came to foo in file1.cpp" << std::endl; }
and
file2.cpp:
inline void foo() { std::cout << "Came to foo in file2.cpp" << std::endl; }
and you link those files together into an executable, you are violating the one-definition rule since the two versions of the inline function are not the same.
When you have a static global variable in a C++ header file, each translation unit that includes the header file ends up with its own copy of the variable.
However, if I declare a class in that same header file, and create a member function of that class, implemented inline within the class declaration, that uses the static global variable, for example:
#include <iostream>
static int n = 10;
class Foo {
public:
void print() { std::cout << n << std::endl; }
};
then I see slightly odd behavior under gcc 4.4:
If I compile without optimization, all uses of the member function use the copy of the variable from one of the translation units (the first one mentioned on the g++ command line).
If I compile with -O2, each use of the member function uses the copy of the variable from the translation unit in which the call is made.
Obviously this is really bad design, so this question is just out of curiosity. But my question, nonetheless, is what does the C++ standard say about this case? Is g++ behaving correctly by giving different behavior with and without optimization enabled?
The standard says (3.2/5):
There can be more than one definition
of a class type (clause 9),
... provided the definitions satisfy
the following requirements ... in each
definition of D, corresponding names,
looked up according to 3.4, shall
refer to an entity defined within the
definition of D, or shall refer to the
same entity
This is where your code loses. The uses of n in the different definitions of Foo do not refer to the same object. Game over, undefined behavior, so yes gcc is entitled to do different things at different optimization levels.
3.2/5 continues:
except that a name can refer to a
const object with internal or no
linkage if the object has the same
integral or enumeration type in all
definitions of D, and the object is
initialized with a constant expression
(5.19), and the value (but not the
address) of the object is used, and
the object has the same value in all
definitions of D
So in your example code you could make n into a static const int and all would be lovely. It's not a coincidence that this clause describes conditions under which it makes no difference whether the different TUs "refer to" the same object or different objects - all they use is a compile-time constant value, and they all use the same one.
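A minimal sketch of that fix follows. The print function takes a stream parameter and the printed() helper is hypothetical; both were added only so the output can be checked.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// const, internal linkage, same type and constant initializer in every TU.
static const int n = 10;

class Foo {
public:
    // operator<<(int) takes its argument by value, so only the value of n
    // is used here, never its address.
    void print(std::ostream& out) const { out << n << '\n'; }
};

// Hypothetical helper: captures what print() writes.
inline std::string printed() {
    std::ostringstream out;
    Foo().print(out);
    return out.str();
}

inline void usageExample() {
    assert(printed() == "10\n");
}
```

If print() instead passed n to something taking const int& or took &n, the exception quoted above would no longer apply.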