Non-overloadable non-inline function definitions in different translation units - c++

Let's say I have 2 TUs with 2 non-inline function definitions with external linkage which differ only in their return types.
Which paragraph(s) my program violates?
[basic.def.odr]/4 says:
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program outside of a discarded statement; no diagnostic required.
But
This paragraph says "that is odr-used" which may or may not be the case.
How do I tell if I define the same non-inline function in different TUs, after all? [over.dcl]/1 speaks about the same scope.

I believe you're looking for: [basic.link]/9:
Two names that are the same ([basic.pre]) and that are declared in different scopes shall denote the same variable, function, type, template or namespace if
both names have external or module linkage and are declared in declarations attached to the same module, or else both names have internal linkage and are declared in the same translation unit; and
both names refer to members of the same namespace or to members, not by inheritance, of the same class; and
when both names denote functions or function templates, the signatures ([defns.signature], [defns.signature.templ]) are the same.
If multiple declarations of the same name with external linkage would declare the same entity except that they are attached to different modules, the program is ill-formed; no diagnostic is required. [ Note: using-declarations, typedef declarations, and alias-declarations do not declare entities, but merely introduce synonyms. Similarly, using-directives do not declare entities. — end note ]
And [basic.link]/11:
After all adjustments of types (during which typedefs are replaced by their definitions), the types specified by all declarations referring to a given variable or function shall be identical, except that declarations for an array object can specify array types that differ by the presence or absence of a major array bound ([dcl.array]). A violation of this rule on type identity does not require a diagnostic.
And [defns.signature]:
⟨function⟩ name, parameter-type-list ([dcl.fct]), and enclosing namespace (if any)
The return type isn't part of the signature, so you're violating the rule that same signature means same entity.
Generally speaking, all discussions of scope and name lookup in the standard are pretty broken until Davis "The Hero We Don't Deserve" Herring's work goes through.

Related

why a structured binding does not potentially conflict with a variable

The newest standard has been modified to add the content of P1787. One rule about two declarations that potentially conflict is:
Two declarations potentially conflict if they correspond and cause their shared name to denote different entities ([basic.link]).
I don't know the meaning of wording "different entities" here. Does it mean these entities that has different kinds or mean that they are different entities determined by basic.link#8? The entities have these kinds:
An entity is a value, object, reference, structured binding, function, enumerator, type, class member, bit-field, template, template specialization, namespace, or pack.
Such for a example,
void fun(); //#1
extern int fun; //#2
int main(){}
If the wording "different entities" means the former, then #1 denotes a function and #2 denotes a variable, they're different kind of entities. And such two declaration are in the same scope, Hence they are ill-formed per:
The program is ill-formed if, in any scope, a name is bound to two declarations that potentially conflict and one precedes the other
However, if the differenct entities should be determined by [basic.link#8] instead of determing it according to kind of entities. Then, it also makes sense. Because, per [basic.link#8]
Two declarations of entities declare the same entity if, considering declarations of unnamed types to introduce their names for linkage purposes, if any ([dcl.typedef], [dcl.enum]), they correspond ([basic.scope.scope]), have the same target scope that is not a function or template parameter scope, and either
they appear in the same translation unit, or
they both declare names with module linkage and are attached to the same module, or
they both declare names with external linkage.
Since they're corresponding and have the same target scope, and also bullet 1 or bullet 3 is satisfied. So, #1 and #2 are the same entity. However, they violate the following rule:
For any two declarations of an entity E:
If one declares E to be a variable or function, the other shall declare E as one of the same type.
So, Regardless of how to understand "different entities", the first example is always ill-formed. These compilers indeed give a right dignosis.
However, consider the second example
#include <iostream>
struct D{
int m;
};
auto [x] = D{0}; //#1
void show(){
std::cout<<&x<<"\n";
}
int main(){
extern int x; //#2
std::cout<<&x<<"\n";
}
GCC prints the same address. Instead, Clang reports a link error.
Similarly, if "different entities" is the former opinion. Then #1 is a structured binding while #2 is a variable. According to this rule:
if the declaration inhabits a block scope S and declares a function ([dcl.fct]) or uses the extern specifier, the declaration shall not be attached to a named module ([module.unit]); its target scope is the innermost enclosing namespace scope, but the name is bound in S.
So, such two declarations has the same target scope. Hence, they should potentially conflict. If "different entities" is determined by [basic.link]. According to [basic.link#8], they are the same entity. However, it also violates
If one declares E to be a variable or function, the other shall declare E as one of the same type.
So, anyhow, the second example should be ill-formed. Why GCC gives the result that such two entity has the same address. Rather, Clang only gives a link error?
In addition, Is it necessary to add a precondition for [basic.link#8], that is, two entities should be first have the same kind?

Why does the same class being defined in multiple .cpp files not cause a linker multiple definition error?

I'm getting a strange behavior which I don't understand. So I have two different classes with the same name defined in two different cpp files. I understand that this will not cause any error during the compilation of the translation units as they don't know about each other. But shouldn't the linker throw some error when it links these files together?
You're thinking of the one definition rule. I'm quoting from there (boldface is emphasis of my choosing, not a part of the original document).
Your understanding would be correct--it's illegal to define the same function in multiple compilation units:
One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined.
However, this isn't the case for classes, which can be defined multiple times (up to once in each compilation unit), as long as the definitions are all identical. If they are identical, then you can safely pass instances of that class from one compilation unit to another, since all compilation units have compatible, identical definitions with compatible sizes and memory layouts.
Only one definition of any variable, function, class type, enumeration type, concept (since C++20) or template is allowed in any one translation unit (some of these may have multiple declarations, but only one definition is allowed).
...
There can be more than one definition in a program, as long as each definition appears in a different translation unit, of each of the following: class type, enumeration type, inline function with external linkage inline variable with external linkage (since C++17), class template, non-static function template, static data member of a class template, member function of a class template, partial template specialization, concept, (since C++20) as long as all of the following is true:
each definition consists of the same sequence of tokens (typically, appears in the same header file)
name lookup from within each definition finds the same entities (after overload-resolution), except that constants with internal or no linkage may refer to different objects as long as they are not ODR-used and have the same values in every definition.
overloaded operators, including conversion, allocation, and deallocation functions refer to the same function from each definition (unless referring to one defined within the definition)
the language linkage is the same (e.g. the include file isn't inside an extern "C" block)
the three rules above apply to every default argument used in each definition
if the definition is for a class with an implicitly-declared constructor, every translation unit where it is odr-used must call the same constructor for the base and members
if the definition is for a template, then all these requirements apply to both names at the point of definition and dependent names at the point of instantiation
If all these requirements are satisfied, the program behaves as if there is only one definition in the entire program. Otherwise, the behavior is undefined.
The bullet points are a fancy and highly precise way of specifying that the definitions must be the same, in letter and in effective result.
The one-definition rule specifically permits this, as long as those definitions are completely, unadulteratedly, identical.
And I do mean absolutely identical. Even if you swap the token struct for the token class, in a case where it would otherwise not matter, your program has undefined behaviour.
And it's for good reason: typically we define classes in headers, and we typically include such headers into multiple translation units; it would be very awkward if this were not allowed.
The same applies to inline function definitions for the same reason.
As for why you don't get an error: well, like I said, undefined behaviour. It would technically be possible for the toolchain to diagnose this, but since multiple class definitions with the same name are a totally commonplace thing to do (per above), it's arguably a waste of time to come up with what would be quite complicated logic for the linker of all things to try to diagnose "accidents". Ultimately, as with many things in this language, it's left up to you to try to get it right.

Why C++'s <vector> templated class doesn't break one definition rule?

Maybe its lame question, But I don't get it!
If I include <string> or <vector> in multiple translation units (different .cpp) why it doesn't break the ODR?
As far as I know each .cpp is compiled differently so vector's methods code will be generated for each object file separately, right?
So linker should detect it and complain.
Even If it won't (I suspect it's special case for templates) will it be using one code or different set of cloned code in each unit, when I link all together???
The same way any template definitions don't break the ODR — the ODR specifically says that template definitions may be duplicated across translation units, as long as they are literally duplicates (and, since they are duplicates, no conflict or ambiguity is possible).
[C++14: 3.2/6]: There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements [..]
Multiple inclusions of <vector> within the same translation unit are expressly permitted and effectively elided, more than likely by "#ifndef" header guards.
The standard has a special exception for templates that allows for duplication of functions that otherwise would violate ODR (such as functions with external linkage and non-inline member functions). from C++11 3.2/5:
If D is a template and is defined in more than one translation unit,
then the preceding requirements shall apply both to names from the
template’s enclosing scope used in the template definition (14.6.3),
and also to dependent names at the point of instantiation (14.6.2). If
the definitions of D satisfy all these requirements, then the program
shall behave as if there were a single definition of D. If the
definitions of D do not satisfy these requirements, then the behavior
is undefined.
The ODR doesn't state that a struct will only be declared one time across all compilation units--it states that if you declare a struct in multiple compilation units, it has to be the same struct. Violating the ODR would be if you had two separate vector types with the same name but different contents. At that point the linker would get confused and you'd get mixed up code and/or errors.

How does the standard library manage to have non-inline functions in headers and how can I do that?

I understand that in the standard library for C++, they have lots of functions that are not inline. However, by reading multiple other SO questions/answers, I have discovered that having non-inline functions in headers that are included across multiple source files causes multiple definition errors. So how would I do this? I know usually a source file should be used to define non-inline functions, but how is it done? I'd just like to know. And not just in the standard library I know this is used, lots of other libraries also do that, such as, say, boost, among pretty much every other major library.
The relevant rules are found in §3.2 [basic.def.odr] of the standard:
(p1) No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.
(p4) Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required. [...] An inline function shall be defined in every translation unit in which it is odr-used.
(p6) There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14), non-static function template (14.5.6), static data member of a class template (14.5.1.3), member function of a class template (14.5.1.1), or template specialization for which some template parameters are not specified (14.7, 14.5.5) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. [This is followed by one whole page of rules that basically says the definitions must be completely identical.]
In short, to not violate ODR, a "function" defined in a header that's included in multiple translation units must be (1) inline; (2) a function template; or (3) a member of a class template.

Parsing declarations before definitions in a class

This question here piqued my interest a little. Is there anywhere in the C++ standard that specifies all declarations within a class must be parsed before any accompanying implementations of member functions? I've seen a few other questions similar to this but no references to the standard in any of the answers.
The Standard doesn't specify how the compiler should parse a translation unit. Instead, it specifies everywhere it is and is not valid to use any identifier to refer to a declaration.
3.3.2p5:
After the point of declaration of a class member, the member name can be looked up in the scope of its
class. [ Note: this is true even if the class is an incomplete class. ]
3.3.7p1:
The following rules describe the scope of names declared in classes.
The potential scope of a name declared in a class consists not only of the declarative region following the name’s point of declaration, but also of all function bodies, brace-or-equal-initializers of non-static data members, and default arguments in that class (including such things in nested classes).
A name N used in a class S shall refer to the same declaration in its context and when re-evaluated in the completed scope of S. No diagnostic is required for a violation of this rule.
If reordering member declarations in a class yields an alternate valid program under (1) and (2), the program is ill-formed, no diagnostic is required.
A name declared within a member function hides a declaration of the same name whose scope extends to or past the end of the member function’s class.
The potential scope of a declaration that extends to or past the end of a class definition also extends to the regions defined by its member definitions, even if the members are defined lexically outside the class (this includes static data member definitions, nested class definitions, member function definitions (including the member function body and any portion of the declarator part of such definitions which follows the declarator-id, including a parameter-declaration-clause and any default arguments (8.3.6).
[class.mem] says:
-2- A class is considered a completely-defined object type (3.9) (or complete type) at the closing } of the class-specifier. Within the class member-specification, the class is regarded as complete within function bodies, default arguments, and brace-or-equal-initializers for non-static data members (including such things in nested classes). Otherwise it is regarded as incomplete within its own class member-specification.
For the class to be complete within function bodies then in general all declaration need to be parsed: without completely parsing all declaration you can't know if something that wasn't parsed would change the meaning. Although, possibly related to that is [basic.scope.class]/1 which says:
3) If reordering member declarations in a class yields an alternate valid program under (1) and (2), the program is ill-formed, no diagnostic is required.
That means certain declarations could be used without parsing the entire class, because if another later declaration altered the meaning then the program would be ill-formed.
Of course the "as if" rule allows the compiler to choose any implementation as long as the user can't tell the difference, so maybe a compiler could choose to parse function bodies and then parse definitions as needed, but it would be hard to tell what's needed to process the member function definition (consider a function call which might call one of several overloaded functions, possibly involving enable_if-type tricks.)
This is the draft which explains C++ Programming Language Standard.
Programming Language C++ PDF
I think page 220 has some explanations on member functions.