So, as far as I know, this is legal in C:
foo.c
struct foo {
int a;
};
bar.c
struct foo {
char a;
};
But the same thing with functions is illegal:
foo.c
int foo() {
return 1;
}
bar.c
int foo() {
return 0;
}
and will result in linking error (multiple definition of function foo).
Why is that? What's the difference between struct names and function names that makes C unable to handle one but not the other?
Also does this behavior extend to C++?
Why is that?
struct foo {
int a;
};
defines a template for creating objects. It does not create any objects or functions. Unless struct foo is used somewhere in your code, as far as the compiler/linker is concerned, those lines of code may as well not exist.
Please note that there is a difference in how C and C++ deal with incompatible struct definitions.
The differing definitions of struct foo in your posted code, is ok in a C program as long as you don't mix their usage.
However, it is not legal in C++. In C++, they have external linkage and must be defined identically. See 3.2 One definition rule/5 for further details.
The distinguishing concept in this case is called linkage.
In C struct, union or enum tags have no linkage. They are effectively local to their scope.
6.2.2 Linkages of identifiers
6 The following identifiers have no linkage: an identifier declared to be anything other than
an object or a function; an identifier declared to be a function parameter; a block scope
identifier for an object declared without the storage-class specifier extern.
They cannot be re-declared in the same scope (except for so called forward declarations). But they can be freely re-declared in different scopes, including different translation units. In different scopes they may declare completely independent types. This is what you have in your example: in two different translation units (i.e. in two different file scopes) you declared two different and unrelated struct foo types. This is perfectly legal.
Meanwhile, functions have linkage in C. In your example these two definitions define the same function foo with external linkage. And you are not allowed to provide more than one definition of any external linkage function in your entire program
6.9 External definitions
5 [...] If an identifier declared with external
linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the identifier; otherwise, there
shall be no more than one.
In C++ the concept of linkage is extended: it assigns specific linkage to a much wider variety of entities, including types. In C++ class types have linkage. Classes declared in namespace scope have external linkage. And One Definition Rule of C++ explicitly states that if a class with external linkage has several definitions (across different translation units) it shall be defined equivalently in all of these translation units (http://eel.is/c++draft/basic.def.odr#12). So, in C++ your struct definitions would be illegal.
Your function definitions remain illegal in C++ as well because of C++ ODR rule (but essentially for the same reasons as in C).
Your function definitions both declare an entity called foo with external linkage, and the C standard says there must not be more than one definition of an entity with external linkage. The struct types you defined are not entities with external linkage, so you can have more than one definition of struct foo.
If you declared objects with external linkage using the same name then that would be an error:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
struct foo obj;
Now you have two objects called obj that both have external linkage, which is not allowed.
It would still be wrong even if one of the objects is only declared, not defined:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
extern struct foo obj;
This is undefined, because the two declarations of obj refer to the same object, but they don't have compatible types (because struct foo is defined differently in each file).
C++ has similar, but more complex rules, to account for inline functions and inline variables, templates, and other C++ features. In C++ the relevant requirements are known as the One-Definition Rule (or ODR). One notable difference is that C++ doesn't even allow the two different struct definitions, even if they are never used to declare objects with external linkage or otherwise "shared" between translation units.
The two declarations for struct foo are incompatible with each other because the types of the members are not the same. Using them both within each translation unit is fine as long as you don't do anything to confuse the two.
If for example you did this:
foo.c:
struct foo {
char a;
};
void bar_func(struct foo *f);
void foo_func()
{
struct foo f;
bar_func(&f);
}
bar.c:
struct foo {
int a;
};
void bar_func(struct foo *f)
{
f.a = 1000;
}
You would be invoking undefined behavior because the struct foo that bar_func expects is not compatible with the struct foo that foo_func is supplying.
The compatibility of structs is detailed in section 6.2.7 of the C standard:
1 Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are
described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers,
and in 6.7.6 for declarators. Moreover, two structure, union, or
enumerated types declared in separate translation units are compatible
if their tags and members satisfy the following requirements: If one
is declared with a tag, the other shall be declared with the same tag.
If both are completed anywhere within their respective translation
units, then the following additional requirements apply: there shall
be a one-to-one correspondence between their members such that each
pair of corresponding members are declared with compatible types; if
one member of the pair is declared with an alignment specifier, the
other is declared with an equivalent alignment specifier; and if one
member of the pair is declared with a name, the other is declared with
the same name. For two structures, corresponding members shall be
declared in the same order. For two structures or unions,
corresponding bit-fields shall have the same widths. For two
enumerations, corresponding members shall have the same values.
2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
To summarize, the two instances of struct foo must have members with the same name and type and in the same order to be compatible.
Such rules are needed so that a struct can be defined once in a header file and that header is subsequently included in multiple source files. This results in the struct being defined in multiple source files, but with each instance being compatible.
The difference isn't so much in the names as in existence; a struct definition isn't stored anywhere and its name only exists during compilation.
(It is the programmer's responsibility to ensure that there is no conflict in the uses of identically named structs. Otherwise, our dear old friend Undefined Behaviour comes calling.)
On the other hand, a function needs to be stored somewhere, and if it has external linkage, the linker needs its name.
If you make your functions static, so they're "invisible" outside their respective compilation unit, the linking error will disappear.
To hide the function definition from the linker use the keyword static.
foo.c
static int foo() {
return 1;
}
bar.c
static int foo() {
return 0;
}
Related
I am trying to understand why I get a warning -Wsubobject-linkage when trying to compile this code:
base.hh
#pragma once
#include <iostream>
template<char const *s>
class Base
{
public:
void print()
{
std::cout << s << std::endl;
}
};
child.hh
#pragma once
#include "base.hh"
constexpr char const hello[] = "Hello world!";
class Child : public Base<hello>
{
};
main.cc
#include "child.hh"
int main(void)
{
Child c;
c.print();
return 0;
}
When running g++ main.cc I get this warning:
In file included from main.cc:1:
child.hh:7:7: warning: ‘Child’ has a base ‘Base<(& hello)>’ whose type uses the anonymous namespace [-Wsubobject-linkage]
7 | class Child : public Base<hello>
| ^~~~~
This warning does not occur if the templated value is an int or if child.hh is copied in main.cc
I do not understand what justifies this warning here, and what I'm supposed to understand from it.
The warning is appropriate, just worded a bit unclear.
hello is declared constexpr, as well as redundantly const. A const variable generally has internal linkage (with some exceptions like variable templates, inline variables, etc.).
Therefore hello in the template argument to Base<hello> is a pointer to an internal linkage object. In each translation unit it will refer to a different hello local to that translation unit.
Consequently Base<hello> will be different types in different translation units and so if you include child.hh in more than one translation unit you will violate ODR on Child's definition and therefore your program will have undefined behavior.
The warning is telling you that. The use of "anonymous namespace" is not good. It should say something like "whose type refers to an entity with internal linkage". But otherwise it is exactly stating the problem.
This happens only if you use a pointer or reference to the object as template argument. If you take just the value of a variable, then it doesn't matter whether that variable has internal or external linkage since the type Base<...> will be determined by the value of the variable, not the identity of the variable.
I don't know why you need the template argument to be a pointer here, but if you really need it and you actually want to include the .hh file in multiple .cc files (directly or indirectly), then you need to give hello external linkage. Then you will have the problem that an external linkage variable may only be defined in one translation unit, which you can circumvent by making the variable inline:
inline constexpr char hello[] = "Hello world!";
inline implies external linkage (even if the variable is const) and constexpr implies const.
If you only need the value of the string, not the identity of the variable, then since C++20 you can pass a std::array<char, N> as template argument:
#include<array>
#include<concepts>
/*...*/
template<std::array s>
requires std::same_as<typename decltype(s)::value_type, char>
class Base
{
public:
void print()
{
std::cout << s.data() << std::endl;
}
};
/*...*/
constexpr auto hello = std::to_array("Hello world!");
With this linkage becomes irrelevant. Writing Base<std::to_array("Hello world!")> is also possible.
Before C++20 there isn't really any nice way to pass a string by-value as template argument.
Even in C++20 you might want to consider writing your own std::array-like class specifically for holding strings. Such a class needs to satisfy the requirements for a structural type. (std::string does not satisfy them.) On the other hand you might want to widen the allowed template parameter types by using the std::range concept and std::range_iter_t instead of std::array to allow any kind of range that could represent a string. If you don't want any restrictions just auto also works.
If child.hh is included in multiple TUs, it would violate ODR and lead to undefined behavior NDR as explained here.
A new bug for better wording of the warning has been submitted here:
Bug 106141 - Better wording for warning: ‘Child’ has a base ‘Base<(& hello)>’ whose type uses the anonymous namespace [-Wsubobject-linkage]
There is also an old bug submitted as:
Bug 86491 - bogus and unsuppressible warning: 'YYY' has a base 'ZZZ' whose type uses the anonymous namespace.
Consider a program with the following two translation units:
// TU 1
#include <typeinfo>
struct S {
enum { } x;
};
const std::type_info& ti1 = typeid(decltype(S::x));
// TU 2
#include <iostream>
#include <typeinfo>
struct S {
enum { } x;
};
extern std::type_info& ti1;
const std::type_info& ti2 = typeid(decltype(S::x));
int main() {
std::cout << (ti1 == ti2) << '\n';
}
I compiled it with GCC and Clang and in both cases the result was 1, and I'm not sure why. (GCC also warns that "ISO C++ forbids empty unnamed enum", which I don't think is true.)
[dcl.enum]/11 states that if an unnamed enumeration does not have a typedef name for linkage purposes but has at least one enumerator, then it has its first enumerator as its name for linkage purposes. These enums have no enumerators, so they have no name for linkage purposes. The same paragraph also has the following note which seems to be a natural consequence of not giving the enums names for linkage purposes:
[Note 3: Each unnamed enumeration with no enumerators is a distinct type. — end note]
Perhaps both compilers have a bug. Or, more likely, I just misunderstood the note. The note is non-normative anyway, so let's look at some normative wording.
[basic.link]/8
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and
[irrelevant]
[irrelevant]
they both declare names with external linkage.
[basic.scope.scope]/4
Two declarations correspond if they (re)introduce the same name, both declare constructors, or both declare destructors, unless [irrelevant]
It seems that, when an unnamed enum is not given a typedef name for linkage purposes, and has no enumerators, it can't be the same type as itself in a different translation unit.
So is it really just a compiler bug? One last thing I was thinking is that if the two enum types really are distinct, then the multiple definitions of S violate the one-definition rule and make the program ill-formed NDR. But I couldn't find anything in the ODR that actually says that.
This program is well-formed and prints 1, as seen. Because S is defined identically in both translation units with external linkage, it is as if there is one definition of S ([basic.def.odr]/14) and thus only one enumeration type is defined. (In practice it is mangled based on the name S or S::x.)
This is just the same phenomenon as static local variables and lambdas being shared among the definitions of an inline function:
// foo.hh
inline int* f() {static int x; return &x;}
inline auto g(int *p) {return [p] {return p;};}
inline std::vector<decltype(g(nullptr))> v;
// bar.cc
#include"foo.hh"
void init() {v.push_back(g(f()));}
// main.cc
#include"foo.hh"
void init();
int main() {
init();
return v.front()()!=f(); // 0
}
P1787 has an excellent description for what are the same entity.
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and either
they appear in the same translation unit, or
they both declare names with module linkage and are attached to the same module, or
they both declare names with external linkage.
So, consider this example:
// a.hpp
inline int& function(){
static int value = 0; // #1
return value;
}
----------------------
//b.cpp
#include "a.hpp"
void g(){
auto&& rf = function();
}
----------------------
//c.cpp
#include "a.hpp"
int main(){
auto&& rf0 = function();
}
Except for the note says that:
[ Note: An inline function or variable with external or module linkage can be defined in multiple translation units([basic.def.odr]), but is one entity with one address. A type or variable defined in the body of such a function is therefore a single entity.--end note]
However, let's consider the value declared at #1. In b's TU and c's TU, these two declarations for value are corresponding, and they have the same target scope which is introduced by the compound-statement of function. However, a local variable does not have any linkage, so neither bullet in that list will be satisfied. So, why two declarations for value(in the body of the function) in different two translate units declared the same entity? How to interpret that through the rule in P1787?
The behavior of the program (assuming that the usual ODR constraints are satisfied) is as if there were one definition of function. Whichever definition that is contains the only (operative) declaration of value, which of course declares only one entity.
Note that this singularity of definition is so strong that it is able to make “two different” lambda expressions produce the same closure type without any notion of linkage; it is certainly capable of suppressing a declaration for the purposes of object identity without the assistance of [basic.link].
The ODR allows us to define several times the same inline function (with some restrictions).
However, what about the simpler case of static functions?
// First TU
static int foo() { return 0; }
int bar1() { return foo(); }
// Second TU
static int foo() { return 1; }
int bar2() { return foo(); }
If we do a quick read of [basic.def.odr]p4, we could naively conclude that this would be UB:
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program outside of a discarded statement (9.4.1); no diagnostic required.
Where in the C++ standard is specified that each foo is a different function and therefore not breaking the ODR, even if they have the same name?
Is it simply a matter of reading [basic.link]p2.2 (i.e. due to internal linkage the names do not refer to the same entity and therefore [basic.def.odr]p4 does not apply here)? Or are there more nuances/rules involved to make this determination (like something in [basic.scope])?
Note that, with unnamed namespaces, the outcome is clear, because the name would be already different/unique.
Correct — even though they have the same name locally, those are two different functions/entities, so there's no violation.
[basic.link]/4.3: When a name has internal linkage, the entity it denotes can be referred to by names from other scopes in the same translation unit.
[basic.link]/5: A name having namespace scope has internal linkage if it is the name of a variable, variable template, function, or function template that is explicitly declared static; or [..]
I can't immediately find any further wording (normative or otherwise) that applies, but I don't think we need any.
When you have a static global variable in a C++ header file, each translation unit that includes the header file ends up with its own copy of the variable.
However, if I declare a class in that same header file, and create a member function of that class, implemented inline within the class declaration, that uses the static global variable, for example:
#include <iostream>
static int n = 10;
class Foo {
public:
void print() { std::cout << n << std::endl; }
};
then I see slightly odd behavior under gcc 4.4:
If I compile without optimization, all uses of the member function use the copy of the variable from one of the translation units (the first one mentioned on the g++ command line).
If I compile with -O2, each use of the member function uses the copy of the variable from the translation unit in which the case is made.
Obviously this is really bad design, so this question is just out of curiosity. But my question, nonetheless, is what does the C++ standard say about this case? Is g++ behaving correctly by giving different behavior with and without optimization enabled?
The standard says (3.2/5):
There can be more than one definition
of a class type (clause 9),
... provided the definitions satisfy
the following requirements ... in each
definition of D, corresponding names,
looked up according to 3.4, shall
refer to an entity defined within the
definition of D, or shall refer to the
same entity
This is where your code loses. The uses of n in the different definitions of Foo do not refer to the same object. Game over, undefined behavior, so yes gcc is entitled to do different things at different optimization levels.
3.2/5 continues:
except that a name can refer to a
const object with internal or no
linkage if the object has the same
integral or enumeration type in all
definitions of D, and the object is
initialized with a constant expression
(5.19), and the value (but not the
address) of the object is used, and
the object has the same value in all
definitions of D
So in your example code you could make n into a static const int and all would be lovely. It's not a coincidence that this clause describes conditions under which it makes no difference whether the different TUs "refer to" the same object or different objects - all they use is a compile-time constant value, and they all use the same one.