Identity of unnamed enums with no enumerators - c++

Consider a program with the following two translation units:
// TU 1
#include <typeinfo>
struct S {
enum { } x;
};
const std::type_info& ti1 = typeid(decltype(S::x));
// TU 2
#include <iostream>
#include <typeinfo>
struct S {
enum { } x;
};
extern const std::type_info& ti1;
const std::type_info& ti2 = typeid(decltype(S::x));
int main() {
std::cout << (ti1 == ti2) << '\n';
}
I compiled it with GCC and Clang and in both cases the result was 1, and I'm not sure why. (GCC also warns that "ISO C++ forbids empty unnamed enum", which I don't think is true.)
[dcl.enum]/11 states that if an unnamed enumeration does not have a typedef name for linkage purposes but has at least one enumerator, then it has its first enumerator as its name for linkage purposes. These enums have no enumerators, so they have no name for linkage purposes. The same paragraph also has the following note which seems to be a natural consequence of not giving the enums names for linkage purposes:
[Note 3: Each unnamed enumeration with no enumerators is a distinct type. — end note]
Perhaps both compilers have a bug. Or, more likely, I just misunderstood the note. The note is non-normative anyway, so let's look at some normative wording.
[basic.link]/8
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and
[irrelevant]
[irrelevant]
they both declare names with external linkage.
[basic.scope.scope]/4
Two declarations correspond if they (re)introduce the same name, both declare constructors, or both declare destructors, unless [irrelevant]
It seems that, when an unnamed enum is not given a typedef name for linkage purposes, and has no enumerators, it can't be the same type as itself in a different translation unit.
So is it really just a compiler bug? One last thing I was thinking is that if the two enum types really are distinct, then the multiple definitions of S violate the one-definition rule and make the program ill-formed NDR. But I couldn't find anything in the ODR that actually says that.
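For comparison, the single-translation-unit case can be checked at compile time. The following is a minimal sketch, assuming C++17 (for std::is_same_v); the names S1 and S2 are hypothetical and not part of the program above. Some compilers may emit a pedantic diagnostic for the empty unnamed enums, as mentioned earlier.

```cpp
#include <type_traits>

// Two separate unnamed enumerations with no enumerators (hypothetical names).
struct S1 { enum { } x; };
struct S2 { enum { } x; };

// Each unnamed enumeration is its own type, even though the two
// enum-specifiers are token-for-token identical.
static_assert(!std::is_same_v<decltype(S1::x), decltype(S2::x)>,
              "distinct declarations give distinct enumeration types");

// Naming the same member twice refers to the same type.
static_assert(std::is_same_v<decltype(S1::x), decltype(S1::x)>,
              "one declaration gives one type");
```

This only shows what happens for two different declarations in one translation unit; it says nothing by itself about the same class S being defined in two translation units.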

This program is well-formed and prints 1, as seen. Because S is defined identically in both translation units with external linkage, it is as if there is one definition of S ([basic.def.odr]/14) and thus only one enumeration type is defined. (In practice it is mangled based on the name S or S::x.)
This is just the same phenomenon as static local variables and lambdas being shared among the definitions of an inline function:
// foo.hh
#include <vector>
inline int* f() {static int x; return &x;}
inline auto g(int *p) {return [p] {return p;};}
inline std::vector<decltype(g(nullptr))> v;
// bar.cc
#include"foo.hh"
void init() {v.push_back(g(f()));}
// main.cc
#include"foo.hh"
void init();
int main() {
init();
return v.front()()!=f(); // 0
}
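To see that the closure type is tied to the function definition rather than to the tokens of the lambda, here is a small single-file sketch, assuming C++17. The functions g1 and g2 are hypothetical additions for contrast; g is the function from foo.hh.

```cpp
#include <type_traits>

// One definition: every use of g yields the same closure type.
inline auto g(int* p) { return [p] { return p; }; }

// Two different definitions with identical lambda tokens (hypothetical):
// each lambda-expression has its own closure type.
inline auto g1(int* p) { return [p] { return p; }; }
inline auto g2(int* p) { return [p] { return p; }; }

static_assert(std::is_same_v<decltype(g(nullptr)), decltype(g(nullptr))>,
              "same definition, same closure type");
static_assert(!std::is_same_v<decltype(g1(nullptr)), decltype(g2(nullptr))>,
              "different definitions, different closure types");
```

Because the program behaves as if there were a single definition of g, the first assertion carries over to uses of g in different translation units.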

Related

How to use P1787 to interpret why the static local variable in inline function refers to the same object

P1787 has an excellent description for what are the same entity.
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and either
they appear in the same translation unit, or
they both declare names with module linkage and are attached to the same module, or
they both declare names with external linkage.
So, consider this example:
// a.hpp
inline int& function(){
static int value = 0; // #1
return value;
}
----------------------
//b.cpp
#include "a.hpp"
void g(){
auto&& rf = function();
}
----------------------
//c.cpp
#include "a.hpp"
int main(){
auto&& rf0 = function();
}
Apart from this rule, there is only the following note:
[ Note: An inline function or variable with external or module linkage can be defined in multiple translation units([basic.def.odr]), but is one entity with one address. A type or variable defined in the body of such a function is therefore a single entity.--end note]
However, consider the variable value declared at #1. In b's TU and c's TU, the two declarations of value correspond, and they have the same target scope, which is introduced by the compound-statement of function. However, a local variable has no linkage, so none of the bullets in that list is satisfied. So why do the two declarations of value (in the body of the function) in two different translation units declare the same entity? How can that be interpreted through the rules in P1787?
The behavior of the program (assuming that the usual ODR constraints are satisfied) is as if there were one definition of function. Whichever definition that is contains the only (operative) declaration of value, which of course declares only one entity.
Note that this singularity of definition is so strong that it is able to make “two different” lambda expressions produce the same closure type without any notion of linkage; it is certainly capable of suppressing a declaration for the purposes of object identity without the assistance of [basic.link].
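The observable consequence can be sketched in a single file. This is only an illustration, assuming the function from a.hpp; check_single_object is a hypothetical helper, and a single translation unit cannot by itself demonstrate the cross-TU guarantee.

```cpp
#include <cassert>

inline int& function() {
    static int value = 0;  // #1: behaves as the only declaration of value
    return value;
}

// Hypothetical helper: two calls must yield references to one object.
inline void check_single_object() {
    int& a = function();
    int& b = function();
    ++a;                // modify through the first reference
    assert(&a == &b);   // same address
    assert(a == b);     // the change is visible through the second reference
}
```

With a.hpp included in both b.cpp and c.cpp, the same holds for rf and rf0: both refer to the one object value.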

Should `const` and `constexpr` variables in headers be `inline` to prevent ODR violations?

Consider the following header and assume it is used in several TUs:
static int x = 0;
struct A {
A() {
++x;
printf("%d\n", x);
}
};
As this question explains, this is an ODR violation and, therefore, UB.
Now, there is no ODR violation if our inline function refers to a non-volatile const object and we do not odr-use it within that function (plus the other provisions), so this still works fine in a header:
constexpr int x = 1;
struct A {
A() {
printf("%d\n", x);
}
};
But if we do happen to odr-use it, we are back at square one with UB:
constexpr int x = 1;
struct A {
A() {
printf("%p\n", &x);
}
};
Thus, given that we now have inline variables, shouldn't the guideline be to mark all namespace-scope variables in headers as inline to avoid these problems?
constexpr inline int x = 1;
struct A {
A() {
printf("%p\n", &x);
}
};
This also seems easier to teach, because we can simply say "inline-everything in headers" (i.e. both function and variable definitions), as well as "never static in headers".
Is this reasoning correct? If yes, are there any disadvantages whatsoever of always marking const and constexpr variables in headers as inline?
As you have pointed out, the first and third examples do indeed violate the ODR as per [basic.def.odr]/12.2.1
[..] in each definition of D, corresponding names, looked up according to [basic.lookup], shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution and after matching of partial template specialization, except that a name can refer to
a non-volatile const object with internal or no linkage if the object
is not odr-used in any definition of D, [..]
Is this reasoning correct?
Yes, inline variables with external linkage are guaranteed to refer to the same entity even when they are odr-used, as long as all the definitions are the same:
[dcl.inline]/6
An inline function or variable shall be defined in every translation unit in which it is odr-used and shall have exactly the same definition in every case ([basic.def.odr]). [..] An inline function or variable with external linkage shall have the same address in all translation units.
The last example is OK because it meets the requirements quoted above.
are there any disadvantages whatsoever of always marking const and constexpr variables in headers as inline?
I can't think of any. If we keep the promise of having exactly the same definition of an inline variable with external linkage in every TU, the compiler is free to pick any of them to refer to the variable. Technically, this is the same as having just one TU with a global variable declared in a header with appropriate header guards.
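As a rough sketch of the recommended header pattern, assuming C++17: x and A come from the question, while A::address and address_of_x are hypothetical helpers added only so that the odr-use can be checked.

```cpp
// Contents of a header included from several TUs (sketch).
inline constexpr int x = 1;  // inline: external linkage, one object program-wide

struct A {
    // Taking the address odr-uses x; with an inline variable every
    // definition of this member function refers to the same entity.
    static constexpr const int* address() { return &x; }
};

// Hypothetical free function that odr-uses x as well.
constexpr const int* address_of_x() { return &x; }

static_assert(A::address() == address_of_x(), "both name the same object");
static_assert(*A::address() == 1, "value is still usable in constant expressions");
```

The assertions only cover one translation unit; the cross-TU guarantee comes from [dcl.inline]/6 as quoted above.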

Why are non member static constexpr variables not implicitly inline?

In C++17 we got inline variables and I have assumed that global constexpr variables are implicitly inline.
But apparently this is true only for static member variables.
What is the logic/technical limitation behind this?
source:
A static member variable (but not a namespace-scope variable) declared constexpr is implicitly an inline variable.
The reason why constexpr static data members were made implicitly inline was to solve a common problem in C++: when defining a class-scoped constant, one was previously forced to emit the definition in exactly one translation unit, in case the variable was ODR-used:
// foo.h
struct foo {
static constexpr int kAnswer = 42;
};
// foo.cpp
// a linker error will occur if this definition is omitted before C++17
#include "foo.h"
constexpr int foo::kAnswer;
// main.cpp
#include "foo.h"
#include <vector>
int main() {
std::vector<int> bar;
bar.push_back(foo::kAnswer); // ODR-use of 42
}
In such cases, we usually care only about the value of the constant, not its address; and it's convenient for the compiler to synthesize a unique location for the constant in case it really is ODR-used, but we don't care where that location is.
Thus, C++17 changed the rules so that the out-of-line definition is no longer required. In order to do so, it makes the declaration of foo::kAnswer an inline definition, so that it can appear in multiple translation units without clashing, just like inline functions.
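A single-file sketch of the C++17 behaviour, assuming a C++17 compiler; first_answer is a hypothetical helper standing in for main above. Note that there is no out-of-line definition of foo::kAnswer anywhere.

```cpp
#include <cassert>
#include <vector>

struct foo {
    static constexpr int kAnswer = 42;  // implicitly inline since C++17
};

// Hypothetical helper mirroring main() from the example above.
inline int first_answer() {
    std::vector<int> bar;
    bar.push_back(foo::kAnswer);  // binds a const int& to kAnswer: an ODR-use
    assert(bar.size() == 1);
    return bar.front();
}
```

Compiled as C++14, the same code may fail to link unless the definition from foo.cpp is added.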
For namespace-scope constexpr variables (which are implicitly static, and therefore have internal linkage, unless declared extern) there is no similar issue. Each translation unit has its own copy. inline, as it's currently specified, would have no effect on such variables. And changing the existing behaviour would break existing programs.
The point here is that constexpr int x = 1; at namespace scope has internal linkage in C++14.
If you make it implicitly inline without changing the internal linkage part, the change would have no effect, because the internal linkage means that it can't be defined in other translation units anyway. And it harms teachability, because we want things like inline constexpr int x = 1; to get external linkage by default (the whole point of inline, after all, is to permit the same variable to be defined in multiple translation units).
If you make it implicitly inline with external linkage, then you break existing code:
// TU1
constexpr int x = 1;
// TU2
constexpr int x = 2;
This perfectly valid C++14 would become an ODR violation.

Struct vs. Function Definitions in Scope

So, as far as I know, this is legal in C:
foo.c
struct foo {
int a;
};
bar.c
struct foo {
char a;
};
But the same thing with functions is illegal:
foo.c
int foo() {
return 1;
}
bar.c
int foo() {
return 0;
}
and will result in linking error (multiple definition of function foo).
Why is that? What's the difference between struct names and function names that makes C unable to handle one but not the other?
Also does this behavior extend to C++?
Why is that?
struct foo {
int a;
};
defines a template for creating objects. It does not create any objects or functions. Unless struct foo is used somewhere in your code, as far as the compiler/linker is concerned, those lines of code may as well not exist.
Please note that there is a difference in how C and C++ deal with incompatible struct definitions.
The differing definitions of struct foo in your posted code are OK in a C program as long as you don't mix their usage.
However, it is not legal in C++. In C++, they have external linkage and must be defined identically. See 3.2 One definition rule/5 for further details.
The distinguishing concept in this case is called linkage.
In C struct, union or enum tags have no linkage. They are effectively local to their scope.
6.2.2 Linkages of identifiers
6 The following identifiers have no linkage: an identifier declared to be anything other than
an object or a function; an identifier declared to be a function parameter; a block scope
identifier for an object declared without the storage-class specifier extern.
They cannot be re-declared in the same scope (except for so called forward declarations). But they can be freely re-declared in different scopes, including different translation units. In different scopes they may declare completely independent types. This is what you have in your example: in two different translation units (i.e. in two different file scopes) you declared two different and unrelated struct foo types. This is perfectly legal.
Meanwhile, functions have linkage in C. In your example these two definitions define the same function foo with external linkage. And you are not allowed to provide more than one definition of any external linkage function in your entire program:
6.9 External definitions
5 [...] If an identifier declared with external
linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the identifier; otherwise, there
shall be no more than one.
In C++ the concept of linkage is extended: it assigns specific linkage to a much wider variety of entities, including types. In C++ class types have linkage. Classes declared in namespace scope have external linkage. And One Definition Rule of C++ explicitly states that if a class with external linkage has several definitions (across different translation units) it shall be defined equivalently in all of these translation units (http://eel.is/c++draft/basic.def.odr#12). So, in C++ your struct definitions would be illegal.
Your function definitions remain illegal in C++ as well because of C++ ODR rule (but essentially for the same reasons as in C).
Your function definitions both declare an entity called foo with external linkage, and the C standard says there must not be more than one definition of an entity with external linkage. The struct types you defined are not entities with external linkage, so you can have more than one definition of struct foo.
If you declared objects with external linkage using the same name then that would be an error:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
struct foo obj;
Now you have two objects called obj that both have external linkage, which is not allowed.
It would still be wrong even if one of the objects is only declared, not defined:
foo.c
struct foo {
int a;
};
struct foo obj;
bar.c
struct foo {
char a;
};
extern struct foo obj;
This is undefined, because the two declarations of obj refer to the same object, but they don't have compatible types (because struct foo is defined differently in each file).
C++ has similar, but more complex rules, to account for inline functions and inline variables, templates, and other C++ features. In C++ the relevant requirements are known as the One-Definition Rule (or ODR). One notable difference is that C++ doesn't even allow the two different struct definitions, even if they are never used to declare objects with external linkage or otherwise "shared" between translation units.
The two declarations for struct foo are incompatible with each other because the types of the members are not the same. Using them both within each translation unit is fine as long as you don't do anything to confuse the two.
If for example you did this:
foo.c:
struct foo {
char a;
};
void bar_func(struct foo *f);
void foo_func()
{
struct foo f;
bar_func(&f);
}
bar.c:
struct foo {
int a;
};
void bar_func(struct foo *f)
{
f->a = 1000;
}
You would be invoking undefined behavior because the struct foo that bar_func expects is not compatible with the struct foo that foo_func is supplying.
The compatibility of structs is detailed in section 6.2.7 of the C standard:
1 Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are
described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers,
and in 6.7.6 for declarators. Moreover, two structure, union, or
enumerated types declared in separate translation units are compatible
if their tags and members satisfy the following requirements: If one
is declared with a tag, the other shall be declared with the same tag.
If both are completed anywhere within their respective translation
units, then the following additional requirements apply: there shall
be a one-to-one correspondence between their members such that each
pair of corresponding members are declared with compatible types; if
one member of the pair is declared with an alignment specifier, the
other is declared with an equivalent alignment specifier; and if one
member of the pair is declared with a name, the other is declared with
the same name. For two structures, corresponding members shall be
declared in the same order. For two structures or unions,
corresponding bit-fields shall have the same widths. For two
enumerations, corresponding members shall have the same values.
2 All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.
To summarize, the two instances of struct foo must have members with the same name and type and in the same order to be compatible.
Such rules are needed so that a struct can be defined once in a header file and that header is subsequently included in multiple source files. This results in the struct being defined in multiple source files, but with each instance being compatible.
The difference isn't so much in the names as in existence; a struct definition isn't stored anywhere and its name only exists during compilation.
(It is the programmer's responsibility to ensure that there is no conflict in the uses of identically named structs. Otherwise, our dear old friend Undefined Behaviour comes calling.)
On the other hand, a function needs to be stored somewhere, and if it has external linkage, the linker needs its name.
If you make your functions static, so they're "invisible" outside their respective compilation unit, the linking error will disappear.
To hide the function definition from the linker use the keyword static.
foo.c
static int foo() {
return 1;
}
bar.c
static int foo() {
return 0;
}

Class template overloading across TUs

Consider the following C++11 application:
A.cpp:
template<typename T>
struct Shape {
T x;
T area() const { return x*x; }
};
int testA() {
return Shape<int>{2}.area();
}
B.cpp:
template<typename T, typename U = T>
struct Shape {
T x;
U y;
U area() const { return x*y; }
};
int testB() {
return Shape<int,short>{3,4}.area();
}
Main.cpp:
int testA();
int testB();
int main() {
return testA() + testB();
}
Although it compiles (as long as A and B are in separate TUs), it doesn't look right, and I'm having trouble figuring out why.
Hence my Question: Does this violate ODR, overloading, or any other rule, and if so, what sections of the Standard are violated and why?
It is an ODR violation. Template names have linkage. And both those template names have external linkage, as [basic.link]/4 says:
An unnamed namespace or a namespace declared directly or indirectly
within an unnamed namespace has internal linkage. All other namespaces
have external linkage. A name having namespace scope that has not been
given internal linkage above has the same linkage as the enclosing
namespace if it is the name of
[...]
a template.
And on account of that, since both templates share a name, it means that [basic.def.odr]/5 applies:
There can be more than one definition of a [...] class template
(Clause [temp]) [...] in a program provided that each definition
appears in a different translation unit, and provided the definitions
satisfy the following requirements. Given such an entity named D
defined in more than one translation unit, then
each definition of D shall consist of the same sequence of tokens; and
[...]
If D is a template and is defined in more than one translation unit,
then the preceding requirements shall apply both to names from the
template's enclosing scope used in the template definition
([temp.nondep]), and also to dependent names at the point of
instantiation ([temp.dep]). If the definitions of D satisfy all these
requirements, then the program shall behave as if there were a single
definition of D. If the definitions of D do not satisfy these
requirements, then the behavior is undefined.
The two definitions of Shape are not the same sequence of tokens, by a wide margin.
You can easily resolve it, as Jarod42 suggested, by putting both definitions of the templates into an unnamed namespace, thus giving them internal linkage.
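A minimal sketch of that fix for B.cpp, under the assumption that A.cpp wraps its own Shape the same way. The constexpr on area and the static_cast are additions for this sketch and are not in the original code.

```cpp
// B.cpp (sketch)
namespace {  // unnamed namespace: Shape is local to this translation unit

template <typename T, typename U = T>
struct Shape {
    T x;
    U y;
    constexpr U area() const { return static_cast<U>(x * y); }
};

}  // namespace

// testB keeps external linkage so Main.cpp can still call it.
int testB() {
    return Shape<int, short>{3, 4}.area();
}
```

Since each Shape now belongs to a different unnamed namespace, the two templates are different entities and [basic.def.odr] no longer requires their definitions to match.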