The ODR allows us to define several times the same inline function (with some restrictions).
However, what about the simpler case of static functions?
// First TU
static int foo() { return 0; }
int bar1() { return foo(); }
// Second TU
static int foo() { return 1; }
int bar2() { return foo(); }
If we do a quick read of [basic.def.odr]p4, we could naively conclude that this would be UB:
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program outside of a discarded statement (9.4.1); no diagnostic required.
Where in the C++ standard is specified that each foo is a different function and therefore not breaking the ODR, even if they have the same name?
Is it simply a matter of reading [basic.link]p2.2 (i.e. due to internal linkage the names do not refer to the same entity and therefore [basic.def.odr]p4 does not apply here)? Or are there more nuances/rules involved to make this determination (like something in [basic.scope])?
Note that, with unnamed namespaces, the outcome is clear, because the name would be already different/unique.
Correct — even though they have the same name locally, those are two different functions/entities, so there's no violation.
[basic.link]/4.3: When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
[basic.link]/5: A name having namespace scope has internal linkage if it is the name of a variable, variable template, function, or function template that is explicitly declared static; or [..]
I can't immediately find any further wording (normative or otherwise) that applies, but I don't think we need any.
Related
(Note! This question particularly covers the state of C++14, before the introduction of inline variables in C++17)
TLDR; Question
What constitutes odr-use of a constexpr variable used in the definition of an inline function, such that multiple definitions of the function violates [basic.def.odr]/6?
(... likely [basic.def.odr]/3; but could this silently introduce UB in a program as soon as, say, the address of such a constexpr variable is taken in the context of the inline function's definition?)
TLDR example: does a program where doMath() defined as follows:
// some_math.h
#pragma once
// Forced by some guideline abhorring literals.
constexpr int kTwo{2};
inline int doMath(int arg) { return std::max(arg, kTwo); }
// std::max(const int&, const int&)
have undefined behaviour as soon as doMath() is defined in two different translation units (say by inclusion of some_math.h and subsequent use of doMath())?
Background
Consider the following example:
// constants.h
#pragma once
constexpr int kFoo{42};
// foo.h
#pragma once
#include "constants.h"
inline int foo(int arg) { return arg * kFoo; } // #1: kFoo not odr-used
// a.cpp
#include "foo.h"
int a() { return foo(1); } // foo odr-used
// b.cpp
#include "foo.h"
int b() { return foo(2); } // foo odr-used
compiled for C++14, particularly before inline variables and thus before constexpr variables were implicitly inline.
The inline function foo (which has external linkage) is odr-used in both translation units (TU) associated with a.cpp and b.cpp, say TU_a and TU_b, and shall thus be defined in both of these TU's ([basic.def.odr]/4).
[basic.def.odr]/6 covers the requirements for when such multiple definitions (different TU's) may appear, and particularly /6.1 and /6.2 is relevant in this context [emphasis mine]:
There can be more than one definition of a [...] inline function with external linkage [...] in a program
provided that each definition appears in a different translation unit,
and provided the definitions satisfy the following requirements. Given
such an entity named D defined in more than one translation unit, then
/6.1 each definition of D shall consist of the same sequence of tokens; and
/6.2 in each definition of D, corresponding names, looked up according to [basic.lookup], shall refer to an entity defined within
the definition of D, or shall refer to the same entity, after overload
resolution ([over.match]) and after matching of partial template
specialization ([temp.over]), except that a name can refer to a
non-volatile const object with internal or no linkage if the object
has the same literal type in all definitions of D, and the object is
initialized with a constant expression ([expr.const]), and the object
is not odr-used, and the object has the same value in all definitions
of D; and
...
If the definitions of D do not satisfy these requirements, then the behavior is undefined.
/6.1 is fulfilled.
/6.2 if fulfilled if kFoo in foo:
[OK] is const with internal linkage
[OK] is initialized with a constant expressions
[OK] is of same literal type over all definitions of foo
[OK] has the same value in all definitions of foo
[??] is not odr-used.
I interpret 5 as particularly "not odr-used in the definition of foo"; this could arguably have been clearer in the wording. However if kFoo is odr-used (at least in the definition of foo) I interpret it as opening up for odr-violations and subsequent undefined behavior, due to violation of [basic.def.odr]/6.
Afaict [basic.def.odr]/3 governs whether kFoo is odr-used or not,
A variable x whose name appears as a potentially-evaluated expression ex is odr-used by ex unless applying the lvalue-to-rvalue conversion ([conv.lval]) to x yields a constant expression ([expr.const]) that does not invoke any non-trivial functions and, if x is an object, ex is an element of the set of potential results of an expression e, where either the lvalue-to-rvalue conversion ([conv.lval]) is applied to e, or e is a discarded-value expression (Clause [expr]). [...]
but I'm having a hard time to understand whether kFoo is considered as odr-used e.g. if its address is taken within the definition of foo, or e.g. whether if its address is taken outside of the definition of foo or not affects whether [basic.def.odr]/6.2 is fulfilled or not.
Further details
Particularly, consider if foo is defined as:
// #2
inline int foo(int arg) {
std::cout << "&kFoo in foo() = " << &kFoo << "\n";
return arg * kFoo;
}
and a() and b() are defined as:
int a() {
std::cout << "TU_a, &kFoo = " << &kFoo << "\n";
return foo(1);
}
int b() {
std::cout << "TU_b, &kFoo = " << &kFoo << "\n";
return foo(2);
}
then running a program which calls a() and b() in sequence produces:
TU_a, &kFoo = 0x401db8
&kFoo in foo() = 0x401db8 // <-- foo() in TU_a:
// &kFoo from TU_a
TU_b, &kFoo = 0x401dbc
&kFoo in foo() = 0x401db8 // <-- foo() in TU_b:
// !!! &kFoo from TU_a
namely the address of the TU-local kFoo when accessed from the different a() and b() functions, but pointing to the same kFoo address when accessed from foo().
DEMO.
Does this program (with foo and a/b defined as per this section) have undefined behaviour?
A real life example would be where these constexpr variables represent mathematical constants, and where they are used, from within the definition of an inline function, as arguments to utility math functions such as std::max(), which takes its arguments by reference.
In the OP's example with std::max, an ODR violation does indeed occur, and the program is ill-formed NDR. To avoid this issue, you might consider one of the following fixes:
give the doMath function internal linkage, or
move the declaration of kTwo inside doMath
A variable that is used by an expression is considered to be odr-used unless there is a certain kind of simple proof that the reference to the variable can be replaced by the compile-time constant value of the variable without changing the result of the expression. If such a simple proof exists, then the standard requires the compiler perform such a replacement; consequently the variable is not odr-used (in particular, it does not require a definition, and the issue described by the OP would be avoided because none of the translation units in which doMath is defined would actually reference a definition of kTwo). If the expression is too complicated, however, then all bets are off. The compiler might still replace the variable with its value, in which case the program may work as you expect; or the program may exhibit bugs or crash. That's the reality with IFNDR programs.
The case where the variable is immediately passed by reference to a function, with the reference binding directly, is one common case where the variable is used in a way that is too complicated and the compiler is not required to determine whether or not it may be replaced by its compile-time constant value. This is because doing so would necessarily require inspecting the definition of the function (such as std::max<int> in this example).
You can "help" the compiler by writing int(kTwo) and using that as the argument to std::max as opposed to kTwo itself; this prevents an odr-use since the lvalue-to-rvalue conversion is now immediately applied prior to calling the function. I don't think this is a great solution (I recommend one of the two solutions that I previously mentioned) but it has its uses (GoogleTest uses this in order to avoid introducing odr-uses in statements like EXPECT_EQ(2, kTwo)).
If you want to know more about how to understand the precise definition of odr-use, involving "potential results of an expression e...", that would be best addressed with a separate question.
Does a program where doMath() defined as follows: [...] have undefined behaviour as soon as doMath() is defined in two different translation units (say by inclusion of some_math.h and subsequent use of doMath())?
Yes; this particular issue was highlighted in LWG2888 and LWG2889 which were both resolved for C++17 by P0607R0 (Inline Variables for the Standard Library) [emphasis mine]:
2888. Variables of library tag types need to be inline variables
[...]
The variables of library tag types need to be inline variables.
Otherwise, using them in inline functions in multiple translation
units is an ODR violation.
Proposed change: Make piecewise_construct, allocator_arg, nullopt,
(the in_place_tags after they are made regular tags), defer_lock,
try_to_lock and adopt_lock inline.
[...]
[2017-03-12, post-Kona] Resolved by p0607r0.
2889. Mark constexpr global variables as inline
The C++ standard library provides many constexpr global variables.
These all create the risk of ODR violations for innocent user code.
This is especially bad for the new ExecutionPolicy algorithms, since
their constants are always passed by reference, so any use of those
algorithms from an inline function results in an ODR violation.
This can be avoided by marking the globals as inline.
Proposed change: Add inline specifier to: bind placeholders _1, _2,
..., nullopt, piecewise_construct, allocator_arg, ignore, seq, par,
par_unseq in
[...]
[2017-03-12, post-Kona] Resolved by p0607r0.
Thus, in C++14, prior to inline variables, this risk is present both for your own global variables as well as library ones.
P1787 has an excellent description for what are the same entity.
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and either
they appear in the same translation unit, or
they both declare names with module linkage and are attached to the same module, or
they both declare names with external linkage.
So, consider this example:
// a.hpp
inline int& function(){
static int value = 0; // #1
return value;
}
----------------------
//b.cpp
#include "a.hpp"
void g(){
auto&& rf = function();
}
----------------------
//c.cpp
#include "a.hpp"
int main(){
auto&& rf0 = function();
}
Except for the note says that:
[ Note: An inline function or variable with external or module linkage can be defined in multiple translation units([basic.def.odr]), but is one entity with one address. A type or variable defined in the body of such a function is therefore a single entity.--end note]
However, let's consider the value declared at #1. In b's TU and c's TU, these two declarations for value are corresponding, and they have the same target scope which is introduced by the compound-statement of function. However, a local variable does not have any linkage, so neither bullet in that list will be satisfied. So, why two declarations for value(in the body of the function) in different two translate units declared the same entity? How to interpret that through the rule in P1787?
The behavior of the program (assuming that the usual ODR constraints are satisfied) is as if there were one definition of function. Whichever definition that is contains the only (operative) declaration of value, which of course declares only one entity.
Note that this singularity of definition is so strong that it is able to make “two different” lambda expressions produce the same closure type without any notion of linkage; it is certainly capable of suppressing a declaration for the purposes of object identity without the assistance of [basic.link].
Consider the following header and assume it is used in several TUs:
static int x = 0;
struct A {
A() {
++x;
printf("%d\n", x);
}
};
As this question explains, this is an ODR violation and, therefore, UB.
Now, there is no ODR violation if our inline function refers to a non-volatile const object and we do not odr-use it within that function (plus the other provisions), so this still works fine in a header:
constexpr int x = 1;
struct A {
A() {
printf("%d\n", x);
}
};
But if we do happen to odr-use it, we are back at square one with UB:
constexpr int x = 1;
struct A {
A() {
printf("%p\n", &x);
}
};
Thus, given we have now inline variables, should not the guideline be to mark all namespace-scoped variables as inline in headers to avoid all problems?
constexpr inline int x = 1;
struct A {
A() {
printf("%p\n", &x);
}
};
This also seems easier to teach, because we can simply say "inline-everything in headers" (i.e. both function and variable definitions), as well as "never static in headers".
Is this reasoning correct? If yes, are there any disadvantages whatsoever of always marking const and constexpr variables in headers as inline?
As you have pointed out, examples one and third does indeed violate ODR as per [basic.def.odr]/12.2.1
[..] in each definition of D, corresponding names, looked up according to [basic.lookup], shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution and after matching of partial template specialization, except that a name can refer to
a non-volatile const object with internal or no linkage if the object
is not odr-used in any definition of D, [..]
Is this reasoning correct?
Yes, inline variables with external linkage are guaranteed to refer to the same entity even when they are odr-used as long all the definitions are the same:
[dcl.inline]/6
An inline function or variable shall be defined in every translation unit in which it is odr-used and shall have exactly the same definition in every case ([basic.def.odr]). [..] An inline function or variable with external linkage shall have the same address in all translation units.
The last example is OK because it meets and don't violate the bold part of the above.
are there any disadvantages whatsoever of always marking const and constexpr variables in headers as inline?
I can't think of any, because if we keep the promise of having the exact same definition of an inline variable with external linkage through TU's, the compiler is free to pick any of them to refer to the variable, this will be the same, technically, as having just one TU and have a global variable declared in the header with appropriate header guards
So, as far as I know, this is legal in C:
foo.c
struct foo {
int a;
};
bar.c
struct foo {
char a;
};
But the same thing with functions is illegal:
foo.c
int foo() {
return 1;
}
bar.c
int foo() {
return 0;
}
and will result in linking error (multiple definition of function foo).
Why is that? What's the difference between struct names and function names that makes C unable to handle one but not the other?
Also does this behavior extend to C++?
Why is that?
struct foo {
int a;
};
defines a template for creating objects. It does not create any objects or functions. Unless struct foo is used somewhere in your code, as far as the compiler/linker is concerned, those lines of code may as well not exist.
Please note that there is a difference in how C and C++ deal with incompatible struct definitions.
The differing definitions of struct foo in your posted code, is ok in a C program as long as you don't mix their usage.
However, it is not legal in C++. In C++, they have external linkage and must be defined identically. See 3.2 One definition rule/5 for further details.
The distinguishing concept in this case is called linkage.
In C struct, union or enum tags have no linkage. They are effectively local to their scope.
6.2.2 Linkages of identifiers
6 The following identifiers have no linkage: an identifier declared to be anything other than
an object or a function; an identifier declared to be a function parameter; a block scope
identifier for an object declared without the storage-class specifier extern.
They cannot be re-declared in the same scope (except for so called forward declarations). But they can be freely re-declared in different scopes, including different translation units. In different scopes they may declare completely independent types. This is what you have in your example: in two different translation units (i.e. in two different file scopes) you declared two different and unrelated struct foo types. This is perfectly legal.
Meanwhile, functions have linkage in C. In your example these two definitions define the same function foo with external linkage. And you are not allowed to provide more than one definition of any external linkage function in your entire program
6.9 External definitions
5 [...] If an identifier declared with external
linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the identifier; otherwise, there
shall be no more than one.
In C++ the concept of linkage is extended: it assigns specific linkage to a much wider variety of entities, including types. In C++ class types have linkage. Classes declared in namespace scope have external linkage. And One Definition Rule of C++ explicitly states that if a class with external linkage has several definitions (across different translation units) it shall be defined equivalently in all of these translation units (http://eel.is/c++draft/basic.def.odr#12). So, in C++ your struct definitions would be illegal.
Your function definitions remain illegal in C++ as well because of C++ ODR rule (but essentially for the same reasons as in C).
Your function definitions both declare an entity called foo with external linkage, and the C standard says there must not be more than one definition of an entity with external linkage. The struct types you defined are not entities with external linkage, so you can have more than one definition of struct foo.
If you declared objects with external linkage using the same name then that would be an error:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
struct foo obj;
Now you have two objects called obj that both have external linkage, which is not allowed.
It would still be wrong even if one of the objects is only declared, not defined:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
extern struct foo obj;
This is undefined, because the two declarations of obj refer to the same object, but they don't have compatible types (because struct foo is defined differently in each file).
C++ has similar, but more complex rules, to account for inline functions and inline variables, templates, and other C++ features. In C++ the relevant requirements are known as the One-Definition Rule (or ODR). One notable difference is that C++ doesn't even allow the two different struct definitions, even if they are never used to declare objects with external linkage or otherwise "shared" between translation units.
The two declarations for struct foo are incompatible with each other because the types of the members are not the same. Using them both within each translation unit is fine as long as you don't do anything to confuse the two.
If for example you did this:
foo.c:
struct foo {
char a;
};
void bar_func(struct foo *f);
void foo_func()
{
struct foo f;
bar_func(&f);
}
bar.c:
struct foo {
int a;
};
void bar_func(struct foo *f)
{
f.a = 1000;
}
You would be invoking undefined behavior because the struct foo that bar_func expects is not compatible with the struct foo that foo_func is supplying.
The compatibility of structs is detailed in section 6.2.7 of the C standard:
1 Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are
described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers,
and in 6.7.6 for declarators. Moreover, two structure, union, or
enumerated types declared in separate translation units are compatible
if their tags and members satisfy the following requirements: If one
is declared with a tag, the other shall be declared with the same tag.
If both are completed anywhere within their respective translation
units, then the following additional requirements apply: there shall
be a one-to-one correspondence between their members such that each
pair of corresponding members are declared with compatible types; if
one member of the pair is declared with an alignment specifier, the
other is declared with an equivalent alignment specifier; and if one
member of the pair is declared with a name, the other is declared with
the same name. For two structures, corresponding members shall be
declared in the same order. For two structures or unions,
corresponding bit-fields shall have the same widths. For two
enumerations, corresponding members shall have the same values.
2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
To summarize, the two instances of struct foo must have members with the same name and type and in the same order to be compatible.
Such rules are needed so that a struct can be defined once in a header file and that header is subsequently included in multiple source files. This results in the struct being defined in multiple source files, but with each instance being compatible.
The difference isn't so much in the names as in existence; a struct definition isn't stored anywhere and its name only exists during compilation.
(It is the programmer's responsibility to ensure that there is no conflict in the uses of identically named structs. Otherwise, our dear old friend Undefined Behaviour comes calling.)
On the other hand, a function needs to be stored somewhere, and if it has external linkage, the linker needs its name.
If you make your functions static, so they're "invisible" outside their respective compilation unit, the linking error will disappear.
To hide the function definition from the linker use the keyword static.
foo.c
static int foo() {
return 1;
}
bar.c
static int foo() {
return 0;
}
I accidentally made an error using the extern keyword and then discovered that the compiler allowed my line of code. Why is the following program allowed? Does the compiler strip off the extern keyword? It does not even give a warning.
#include <iostream>
extern void test() { std::cout << "Hello world!" << std::endl; };
int main()
{
test();
}
This is not unusual, in fact, C++ internally does that exact same thing, note what 1.4/6 says:
The templates, classes, functions, and objects in the library have external linkage
The templates in the library very obviously have definitions, too. And why wouldn't they!
See what 3.1/2 and 3.4/2 (emphasis added) have to say:
A declaration is a definition unless it declares a function without specifying the function’s body (8.4), [or] it contains the extern specifier (7.1.1) or a linkage-specification25 (7.5) and neither an initializer nor a functionbody
[...]
When a name has external linkage , the entity it denotes can be referred to by names from scopes of other translation units or from other scopes of the same translation unit.
Your declaration has a function body, so it is a definition, and that's explicitly, perfectly allowable. The function has external linkage, which means you could refer to it by a name from a scope in another translation unit, but you're not required to do that.
You're still perfectly allowed to call it by its name in the current translation unit, and that is what you are doing.
Note that there's a clause about names at namespace scope, so your usage of the extern keyword is somewhat redundant anyway.