Why is class redefinition in several cpp files permitted? [duplicate] - c++

Suppose I have two cpp files:
//--a.cpp--//
class A
{
public:
void bar()
{
printf("class A");
}
};
//--b.cpp--//
class A
{
public:
void bar()
{
printf("class A");
}
};
When I compile and link these files together, I get no errors. But if I write the following:
//--a.cpp--//
int a;
//--b.cpp--//
int a;
After compiling and linking these sources, I get an error about the redefinition of a. But in the case of the classes I have a redefinition too, yet no error is raised. I'm confused.

Classes are types. For the most part, they are compile-time artifacts; global variables, on the other hand, are runtime artifacts.
In your first example, each translation unit has its own definition of class A. Since the translation units are separate from each other, and because they do not produce global runtime artifacts with identical names, this is OK. The standard requires that there be at most one definition of a class per translation unit, and exactly one if the class must be complete there - see sections 3.2.1 and 3.2.4:
No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.
Exactly one definition of a class is required in a translation unit if the class is used in a way that requires the class type to be complete.
However, the standard permits multiple class definitions in separate translation units - see section 3.2.6:
There can be more than one definition of a class type, enumeration type, inline function with external linkage, class template, non-static function template, static data member of a class template, member function of a class template, or template specialization for which some template parameters are not specified in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. [...]
What follows is a long list of requirements, which boils down to this: the two class definitions must be the same; otherwise, the behavior of the program is undefined.
In your second example you are defining a global runtime artifact (variable int a) in two translation units. When the linker tries to produce the final output (an executable or a library) it finds both of these, and issues a redefinition error. Note that the rule 3.2.6 above does not include variables with external linkage.
If you declare your variables static, your program will compile, because static variables are local to a translation unit in which they are defined.
Although both programs would compile, the reasons why they compile are different: in case of multiple class definitions the compiler assumes that the two classes are the same; in the second case, the compiler considers the two variables independent of each other.
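As a minimal sketch of the static case (the helper names bump_a, read_a and check_a are illustrative, not from the question): a namespace-scope variable declared static has internal linkage, so a second file could carry its own static int a without any clash at link time.

```cpp
#include <cassert>

// a.cpp (sketch): 'static' gives this variable internal linkage, so it is
// private to this translation unit. A second file could define its own
// 'static int a;' and the linker would never see a clash.
static int a = 0;

// Hypothetical helpers; they only ever touch this file's copy of 'a'.
int bump_a() { return ++a; }
int read_a() { return a; }

// Small self-check of the helpers above.
void check_a() {
    a = 0;
    assert(bump_a() == 1);
    assert(read_a() == 1);
}
```

Each file that does this gets an independent variable, which is rarely what you want if the intent was to share one value.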

There are actually two different flavors of the One Definition Rule.
One flavor, which applies to global and namespace variables, static class members, and functions without the inline keyword, says that there can only be one definition in the entire program. These are the things that typically go in *.cpp files.
The other flavor, which applies to type definitions, functions ever declared with the inline keyword, and anything with a template parameter, says that the definition can appear once per translation unit but must be defined with the same source and have the same meaning in each. It's legal to copy-paste into two *.cpp files as you did, but typically you would put these things in a header file and #include that header from all the *.cpp files that need them.
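The two flavors can be sketched side by side. Everything here is shown in one file for brevity, and all names (Point, twice, square, counter, next_id, demo) are made up for illustration; the comments say where each piece would normally live.

```cpp
#include <cassert>

// Flavor 2: may be repeated (token for token) in every translation unit,
// so these normally live in a header.
struct Point {                                      // type definition
    int x;
    int y;
};
inline int twice(int v) { return 2 * v; }           // inline function
template <class T> T square(T v) { return v * v; }  // template

// Flavor 1: exactly one definition in the whole program, so these
// normally live in a single *.cpp file.
int counter = 0;                                    // namespace-scope variable
int next_id() { return ++counter; }                 // non-inline function

// Usage of both kinds of entity.
void demo() {
    Point p{1, 2};
    assert(p.x + p.y == 3);
    assert(twice(3) == 6);
    assert(square(4) == 16);
}
```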

Classes can't be used (except in very limited ways) unless the definition is available within the translation unit that uses it. This means that you need multiple definitions in order to use it in multiple units, and so the language allows that - as long as all the definitions are identical. The same rules apply to various other entities (such as templates and inline functions) for which a definition is needed at the point of use.
Usually, you would share the definition by putting it in a header, and including that wherever it's needed.
Variables can be used with only a declaration, not the definition, so there's no need to allow multiple definitions. In your case, you could fix the error by making one of them a pure declaration:
extern int a;
so that there is only one definition. Again, it's common for such declarations to go in headers, to make sure they're the same in every file that uses them.
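A minimal sketch of that arrangement, collapsed into one file (read_a and demo are hypothetical helpers): the extern lines are what a shared header would contain, and the single definition is what exactly one *.cpp file would contain.

```cpp
#include <cassert>

// What a shared header would contain: pure declarations. Repeating a
// declaration is fine, within one file or across many.
extern int a;
extern int a;

// What exactly one *.cpp file would contain: the single definition.
int a = 42;

// Hypothetical reader, usable from any file that saw the declaration.
int read_a() { return a; }

void demo() {
    assert(read_a() == 42);
    a = 7;
    assert(read_a() == 7);
}
```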
For the full, gory details of the One Definition Rule, see C++11 3.2, [basic.def.odr].

Related

Can you have two classes with the same name and the same member function in different translation units?

Suppose I have two translation units:
//A.cpp
class X
{
};
//B.cpp
class X
{
int i;
};
Is the above program well-formed?
If not, no further questions. If the answer is yes, the program is well-formed (ignore the absence of main), then the second question. What if there is a function with the same name in those?
//A.cpp
class X
{
void f(){}
};
//B.cpp
class X
{
int i;
void f(){}
};
Would this be a problem for the linker as it would see &X::f in both object files? Are anonymous namespaces a must in such a situation?
Is the above program well-formed?
No. It violates the One-Definition Rule:
[basic.def.odr]
There can be more than one definition of a
class type ([class]),
...
in a program provided that each definition appears in a different translation unit and the definitions satisfy the following requirements.
Given such an entity D defined in more than one translation unit, for all definitions of D, or, if D is an unnamed enumeration, for all definitions of D that are reachable at any given program point, the following requirements shall be satisfied.
...
Each such definition shall consist of the same sequence of tokens, where the definition of a closure type is ...
...
Are anonymous namespaces a must in such a situation?
If you need different class definitions, they must be separate types. A uniquely named namespace is one option, and an anonymous namespace is a guaranteed way to get a unique (to the translation unit) namespace.
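A rough sketch of the unnamed-namespace option for B.cpp, assuming the class only needs to be used inside that file (use_x and demo are illustrative names, and the members are made public here so the example can call them):

```cpp
#include <cassert>

// B.cpp (sketch): the unnamed namespace makes this X a type that is
// distinct from any X defined in another translation unit.
namespace {
class X {
public:
    int i = 5;
    int f() const { return i + 1; }
};
}  // namespace

// Hypothetical entry point with external linkage; callers in other files
// never need to name X themselves.
int use_x() {
    X x;
    return x.f();
}

void demo() { assert(use_x() == 6); }
```

A.cpp could wrap its own, different X the same way, and neither definition would be visible to the other file.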
Short version
Well, no... C++ is based on the assumption that every name in a namespace is unique. If you break that assumption, you have zero guarantee it will work.
For example, if you have methods with the same name in two translation units (*.o files), the linker won't know which one to use for a given call, so it will just return an error.
Long version
... but actually yes!
There are actually quite a few situations in which you could get away with classes/methods with the same name.
Do not actually use any of these tricks in your programs! Compilers are free to do pretty much anything if they think it will optimize the resulting program, so any of the assumptions below may break.
Classes are the easiest. Let's take some class with only non-static members and no functions. Such a thing doesn't even leave any trace in the compiled program. Classes/structs are only tools for a programmer to organize data so there is no need to deal with memory pools and offsets.
So basically if you have two classes with the same name in different compilation units, it should work. After the compiler is done with them, they will consist of just a few instructions saying how far to move a pointer in memory to access a specific field.
There is hardly anything here that would confuse the linker.
Functions and variables (including static class variables) are more tricky because the compiler often creates symbols for them in the *.o file. If you're lucky the linker may ignore them if such a function/variable is not used, but I wouldn't even count on that.
There are ways, though, to avoid creating externally visible symbols for them. Static global elements or ones in anonymous namespaces are not visible outside their translation units, so the linker shouldn't complain about them. Also, inline functions are either expanded in place or emitted as symbols that the linker is allowed to merge silently, so they don't produce a conflict either, which is especially relevant here because functions defined inside classes are implicitly inline.
If there is no conflicting symbol, the linker won't see a conflict and everything should link.
Templates are also using some dirty tricks because they are compiled on demand in each compilation unit that uses them but they end up as a single copy in the final programs. I don't think this is the same case as multiple different things with the same name so let's drop the topic.
In conclusion, if your classes don't have static members and they do not define functions outside of their bodies, it may be possible to have two classes with the same name as long as you don't include them in the same file.
This is extremely fragile, though. Even if it works right now, a new version of the compiler may have some fix/optimization/change that would break such a program.
Not to mention that includes tend to be pretty interwoven in bigger projects, so there is a decent chance that at some point you will need to include both files in the same place.

Why does the same class being defined in multiple .cpp files not cause a linker multiple definition error?

I'm getting a strange behavior which I don't understand. So I have two different classes with the same name defined in two different cpp files. I understand that this will not cause any error during the compilation of the translation units as they don't know about each other. But shouldn't the linker throw some error when it links these files together?
You're thinking of the one definition rule. I'm quoting from there (boldface is emphasis of my choosing, not a part of the original document).
Your understanding would be correct--it's illegal to define the same function in multiple compilation units:
One and only one definition of every non-inline function or variable that is odr-used (see below) is required to appear in the entire program (including any standard and user-defined libraries). The compiler is not required to diagnose this violation, but the behavior of the program that violates it is undefined.
However, this isn't the case for classes, which can be defined multiple times (up to once in each compilation unit), as long as the definitions are all identical. If they are identical, then you can safely pass instances of that class from one compilation unit to another, since all compilation units have compatible, identical definitions with compatible sizes and memory layouts.
Only one definition of any variable, function, class type, enumeration type, concept (since C++20) or template is allowed in any one translation unit (some of these may have multiple declarations, but only one definition is allowed).
...
There can be more than one definition in a program, as long as each definition appears in a different translation unit, of each of the following: class type, enumeration type, inline function with external linkage, inline variable with external linkage (since C++17), class template, non-static function template, static data member of a class template, member function of a class template, partial template specialization, concept (since C++20), as long as all of the following is true:
each definition consists of the same sequence of tokens (typically, appears in the same header file)
name lookup from within each definition finds the same entities (after overload-resolution), except that constants with internal or no linkage may refer to different objects as long as they are not ODR-used and have the same values in every definition.
overloaded operators, including conversion, allocation, and deallocation functions refer to the same function from each definition (unless referring to one defined within the definition)
the language linkage is the same (e.g. the include file isn't inside an extern "C" block)
the three rules above apply to every default argument used in each definition
if the definition is for a class with an implicitly-declared constructor, every translation unit where it is odr-used must call the same constructor for the base and members
if the definition is for a template, then all these requirements apply to both names at the point of definition and dependent names at the point of instantiation
If all these requirements are satisfied, the program behaves as if there is only one definition in the entire program. Otherwise, the behavior is undefined.
The bullet points are a fancy and highly precise way of specifying that the definitions must be the same, in letter and in effective result.
The one-definition rule specifically permits this, as long as those definitions are completely, unadulteratedly, identical.
And I do mean absolutely identical. Even if you swap the token struct for the token class, in a case where it would otherwise not matter, your program has undefined behaviour.
And it's for good reason: typically we define classes in headers, and we typically include such headers into multiple translation units; it would be very awkward if this were not allowed.
The same applies to inline function definitions for the same reason.
As for why you don't get an error: well, like I said, undefined behaviour. It would technically be possible for the toolchain to diagnose this, but since multiple class definitions with the same name are a totally commonplace thing to do (per above), it's arguably a waste of time to come up with what would be quite complicated logic for the linker of all things to try to diagnose "accidents". Ultimately, as with many things in this language, it's left up to you to try to get it right.
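To illustrate the header convention, here is a sketch of a hypothetical widget.h shown inline (Widget, doubled_id and demo are made-up names). The include guard keeps each translation unit to a single definition, and every file that includes the header sees the same token sequence, which is what the rule asks for.

```cpp
#include <cassert>

// widget.h (hypothetical header), shown inline here.
#ifndef WIDGET_H
#define WIDGET_H
struct Widget {
    int id;
    int doubled() const { return 2 * id; }  // implicitly inline
};
#endif  // WIDGET_H

// A second inclusion in the same translation unit expands to nothing,
// so this is not a redefinition.
#ifndef WIDGET_H
#define WIDGET_H
struct Widget { int id; int doubled() const { return 2 * id; } };
#endif  // WIDGET_H

// Any translation unit that included the header can exchange Widget
// objects with any other.
int doubled_id(const Widget& w) { return w.doubled(); }

void demo() {
    Widget w{21};
    assert(doubled_id(w) == 42);
}
```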

Is anonymous namespace structure unique

I think I understood that anonymous namespace can be used to make the symbols local to current translation unit. But what about structure definitions, can I assume that they do refer to the same type ?
MyClass.h:
namespace {
class MyClass {};
}
A.h:
#include "MyClass.h"
class A {
MyClass* impl;
void op();
};
A.cpp translation unit 1:
#include "A.h"
void A::op() {
// Let *this->impl refer to a type X.
}
B.cpp translation unit 2:
#include "A.h"
void global_op(const A& a) {
// Can I assume that *a.impl refers to the same type X ?
}
No, they do not refer to the same type. The header MyClass.h contains a definition of a class type MyClass inside an unnamed namespace. An unnamed namespace basically makes everything inside it (yes, types too) have internal linkage [basic.link]/6. You have two translation units, each of which (indirectly) includes MyClass.h, so each gets its own unnamed namespace with its own MyClass [basic.link]/11.
Think of an unnamed namespace as being a namespace that has a distinct name for each translation unit. So the MyClass in translation unit A is actually $somerandomstringA$::MyClass, while the MyClass in translation unit B is actually $somerandomstringB$::MyClass…
As discussed down in the comments to this answer, be aware of the fact that the program you described above will contain an ODR violation (specifically [basic.def.odr]/12.2) as a result of your class A being defined to contain a member of type MyClass*, which has a different meaning in different translation units.
This program has undefined behavior, since each translation unit defines class ::A but with two different meanings.
An anonymous namespace has internal linkage ([basic.link]/4). The type MyClass has the same linkage as its namespace, so also internal linkage ([basic.link]/4.3). And internal linkage means that the type can only be named from the same translation units, so the two translation units formed from A.cpp and B.cpp define two different types named MyClass. This isn't a problem, yet.
But the global namespace and class A have external linkage. There are two definitions of the single type ::A, but they give the member impl two different types. This is a One Definition Rule violation.
(Although we often say "the ODR", there are really essentially two flavors: [basic.def.odr]/10 applies to things like objects and functions which are namespace members and not marked inline, and says the program can only have one definition, in one TU; so we usually put those in source files. [basic.def.odr]/12 applies to things like types, things marked inline, and declarations with template parameters, and says multiple TUs may each have one definition, but all must have the same token spelling (after preprocessing) and the same meaning; so we often put those in header files so that multiple TUs can use a common definition.)
Specifically here, the program violates [basic.def.odr]/12.2:
There can be more than one definition of a class type, ... in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
...; and
in each definition of D, corresponding names, looked up according to [basic.lookup], shall refer to an entity defined within the definition of D, or shall refer to the same entity, after overload resolution and after matching of partial template specialization ([temp.over]), except that a name can refer to
a non-volatile const object with internal or no linkage if ..., or
a reference with internal or no linkage initialized with a constant expression such that ...;
and ....
... If the definitions of D do not satisfy these requirements, then the behavior is undefined.
Here MyClass is a name within the definitions of class ::A but referring to two different entities, and not falling into any of the specifically permitted categories.
This might work in practice for many systems, since both translation units will see the same member names, types, and offsets within their own MyClass types. But if MyClass ever ends up being used in "name mangling", that will go wrong. And anyway, it's safest to avoid undefined behavior whenever you can.
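One possible way out, sketched under the assumption that MyClass does not actually need to be private to a translation unit: put it in a named namespace so that it has external linkage. The namespace name detail, the member value, and the constructor of A are illustrative additions, not part of the question.

```cpp
#include <cassert>

// MyClass.h (sketch): a named namespace, so MyClass has external linkage
// and is the same type in every translation unit.
namespace detail {
class MyClass {
public:
    int value = 7;
};
}  // namespace detail

// A.h (sketch): every translation unit now sees the same meaning for impl.
class A {
public:
    explicit A(detail::MyClass* p) : impl(p) {}
    int op() const { return impl->value; }

private:
    detail::MyClass* impl;
};

// B.cpp (sketch): refers to the very same detail::MyClass as A::op does.
int global_op(const A& a) { return a.op(); }

void demo() {
    detail::MyClass m;
    A a(&m);
    assert(global_op(a) == 7);
}
```

With this layout the name MyClass inside the definition of A refers to one entity program-wide, so the requirement quoted above is met.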

Declaring static data members of normal class and class template

I read the reason for defining static data members in the source file is because if they were in the header file and multiple source files included the header file- the definitions would get output multiple times. I can see why this would be a problem for the static const data member, but why is this a problem for the static data member?
I'm not too sure I fully understand why there is a problem if the definition is written in the header file...
The multiple definition problem for variables is due to two main deficiencies in the language definition.
As shown below you can easily work around it. There is no technical reason why there is no direct support; the feature has simply not been in sufficiently high demand for people on the committee to make it a priority.
First, why multiple definitions in general are a problem. Since C++ lacks support for separately compiled modules (deficiency #1), programmers have to emulate that feature by using textual preprocessing etc. And then it's easy to inadvertently introduce two or more definitions of the same name, which would most likely be in error.
For functions this was solved by the inline keyword and property. A freestanding function can only be explicitly inline, while a member function can be implicitly inline by being defined in the class definition. Either way, if a function is inline then it can be defined in multiple translation units, and it must be defined in every translation unit where it's used, and those definitions must be equivalent.
Mainly that solution allowed classes to be defined in header files.
No such language feature was needed to support data, i.e. variables defined in header files, so it just isn't there: you can't have inline variables (at least not before C++17, which added them). This is language deficiency #2.
However, you can obtain the effect of inline variables via a special exemption for static data members of class templates. The reason for the exemption is that class templates generally have to be fully defined in header files (unless the template is only used internally in a translation unit), and so for a class template to be able to have static data members, it's necessary with either an exemption from the general rules, or some special support. The committee chose the exemption-from-the-rules route.
template< class Dummy >
struct math_
{
static double const pi;
};
template< class Dummy >
double const math_<Dummy>::pi = 3.14;
typedef math_<void> math;
The above has been referred to as the templated const trick. As far as I know I was the one who once introduced it, in the [comp.lang.c++] Usenet group, so I can't give credit to someone else. I've also posted it a few times here on SO.
Anyway, this means that every C++ compiler and linker internally supports and must support the machinery needed for inline data, and yet the language doesn't have that feature.
However, on the third hand, C++11 has constexpr, where you can write the above as just
struct math
{
static double constexpr pi = 3.14;
};
Well, there is a difference, that you can't take the address of the C++11 math::pi, but that's a very minor limitation.
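For comparison, a sketch that assumes a C++17 compiler, where inline variables exist in the language. The names tau, pi_address and demo are illustrative. Because an inline variable is a definition, taking its address is fine.

```cpp
#include <cassert>

// math.h (sketch, needs C++17): an inline variable may be defined in a
// header included by many translation units; all of them share one object.
struct math {
    static inline const double pi = 3.14;
};

// Namespace-scope form of the same idea.
inline constexpr double tau = 2 * 3.14;

// Taking the address works because the variable has a definition.
const double* pi_address() { return &math::pi; }

void demo() {
    assert(*pi_address() == math::pi);
    assert(tau == 2 * math::pi);
}
```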
I think you're confusing two things: static data members and global variables marked as static.
The latter have internal linkage, which means that if you put their definition in a header file that multiple translation units #include, each translation unit will receive a private copy of those variables.
Global variables marked as const have internal linkage by default, so you won't need to specify static explicitly for those. Hence, the linker won't complain about multiple definitions of global const variable or of global non-const variables marked as static, while it will complain in the other cases (because those variables would have external linkage).
Concerning static data members, this is what Paragraph 9.4.2/5 of the C++11 Standard says:
Static data members of a class in namespace scope have external linkage (3.5). A local class shall not have static data members.
This means that if you put their definition in a header file #included by multiple translation units, you will end up with multiple definitions of the same symbol in the corresponding object files (exactly like non-const global variables), no matter what their const-qualification is. In that case, your program would violate the One Definition Rule.
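The usual split can be sketched like this, with a hypothetical Counter class and both files collapsed into one listing: the class definition, which belongs in the header, only declares the static member, and a single source file supplies the definition.

```cpp
#include <cassert>

// counter.h (sketch): the class definition only declares the static member.
struct Counter {
    static int count;                      // declaration, no storage yet
    static int next() { return ++count; }  // implicitly inline
};

// counter.cpp (sketch): the one definition, with external linkage. Putting
// this line in the header would repeat it in every including file.
int Counter::count = 0;

void demo() {
    Counter::count = 0;
    assert(Counter::next() == 1);
    assert(Counter::count == 1);
}
```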
Also, this Q&A on StackOverflow may give you a clearer understanding of the subject.

Why is there no multiple definition error when you define a class in a header file?

I'm not sure if I asked the question correctly, but let me explain.
First, I read this article that explains the difference between declarations and definitions:
http://www.cprogramming.com/declare_vs_define.html
Second, I know from previous research that it is bad practice to define variables and functions in a header file, because during the linking phase you might have multiple definitions for the same name which will throw an error.
However, how come this doesn't happen for classes? According to another SO answer (What is the difference between a definition and a declaration?), the following would be a class DEFINITION:
class MyClass {
private:
public:
};
If the above definition is in a header file, then presumably you can have multiple .cpp files that #include that header. This means the class is defined multiple times after compilation in multiple .o files, but that doesn't seem to cause many problems...
On the other hand, if it was a function being defined in the header file, it would cause problems apparently...from what I understand... maybe?
So what's so special about class definitions?
The one-definition rule (3.2, [basic.def.odr]) applies differently to classes and functions:
1 - No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.
[...]
4 - Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program [...]
So while (non-inline) functions may be defined at most once in the whole program (and exactly once if they are called or otherwise odr-used), classes may be defined as many times as you have translation units (source files), but no more than once per translation unit.
The reason for this is that since classes are types, their definitions are necessary to be able to share data between translation units. Originally, classes (structs in C) did not have any data requiring linker support; C++ introduces virtual member functions and virtual inheritance, which require linker support for the vtable, but this is usually worked around by attaching the vtable to (the definition of) a member function.
A class definition is just a kind of a blueprint for the objects of that class. It's been the same with struct since the C days. No classes or structures actually exists in the code as such.
Your class definition defines the class, but does not define any objects of that class. It's OK to have the class (or structure) defined in multiple files, because you're just defining a type, not a variable of that type. If you just had the definition, no code would be emitted by the compiler.
The compiler actually emits code only after you declare an object (i.e. variable) of this type:
class MyClass myvar;
or:
class MyOtherClass {
public: ...
private: ...
} myvar; // note the variable name, it instantiates a MyOtherClass
That is what you do NOT want to do in headers because it will cause multiple instances of myvar to be instantiated.