Which object file contains the following static templatized "member variable"? - c++

Say I have the following template class with a static member function that itself instantiates a static variable (which is functionally a static member variable instantiated the first time its containing routine is called):
template <typename T>
struct foo
{
static int& mystatic()
{
static int value;
return value;
}
};
If I use foo<T> in multiple translation units for some T, into which object file does the compiler put foo<T>::mystatic::value? How is this apparent duplication/conflict resolved at link time?

You do understand that the your function mystatic is a function with external linkage? Which means that the very same conflict exists between multiple definitions of mystatic made in different translation units. Also, exactly the same issue can arise without templates: ordinary inline functions with external linkage defined in header files can produce the same apparent multiple definition conflict (and the same issue with a local static variable can be reproduced there as well).
In order to resolve such conflicts, all such symbols are labeled by the compiler in some implementation-dependent way. By doing this the compiler communicates to the linker the fact that these symbols can legally end up being defined multiple times. For example, one known implementation puts such symbols into a separate section of object file (sometimes called "COMDAT" section). Other implementations might label such symbols in some other way. When the linker discovers such symbols in multiple object files, instead of reporting a multiple definition error, it chooses one and only one of each identical symbol and uses it throughout the entire program. The other copies of each such symbol are discarded by the linker.
One typical consequence of this approach is that your local static variable value will have to be included as an external symbol into each object file, despite the fact that it has no linkage from the language point of view. The name of the symbol will usually be composed of the function name mystatic and variable name value and some other mangling.
In other words, the compiler proper puts both the definition of mystatic and the variable value into all independent object files that use the member function. The linker will later make sure that only one mystatic and only one value will exist in the linked program. There's probably no way to determine, which original object file supplied the surviving copy (if such distinction even makes sense).

Related

Why does the one definition rule exist in C/C++

In C and C++, you can't have a function with two definitions. For example, say we have the following two files:
1.c:
int main(){ return 0}
2.c:
int main(){ return 0}
Issuing the command gcc 1.c 2.c will give you a duplicate symbol linker error.
Why doesn't the same happen with structs and classes? Why are we allowed to have multiple
definitions of the same struct as long as they have the same tokens?
To answer this question, one has to delve into compilation process and what is needed in each part (question why these steps are perfomed is more historical, going back to beginning of C before it's standardization)
C and C++ programs are compiled in multiple steps:
Preprocessing
Compilation
Linkage
Preprocessing is everything that starts with #, it's not really important here.
Compilation is performed on each and every translation unit (typically a single .c or .cpp file plus the headers it includes). Compiler takes one translation unit at a time, reads it and produces an internal list of classes and their members, and then assembly code of each function in given unit (basing on the structures list). If a function call is not inlined (e.g. it is defined in different TU), compiler produces a "link" - "please insert function X here" for the linker to read.
Then linker takes all of the compiled translation units and merges them into one binary, substituting all the links specified by compiler.
Now, what is needed at each phase?
For compilation phase, you need the
definition of every class used in this file - compiler needs to know the size and offset of each class member to produce assembly
declaration of every function used in this file - to produce those "links".
Since function definitions are not needed for producing assembly (as long as they are compiled somewhere), they are not needed in compilation phase, only in linking phase.
To sum up:
One Definition Rule is there to protect programmers from theselves. If they'd accidentally define a function twice, linker will notice that and executable is not produced.
However, class definitions are required in every translation unit, and therefore such a rule cannot be set up for them. Since it cannot be forced by language, programmers have to be responsible beings and not define the same class in different ways.
ODR has also other limitations, e.g. you have to define template functions (or template class methods) in header files. You can also take the responsibility and say to the compiler "Every definition of this function will be the same, trust me dude" and make the function inline.
There is no use case for a function with 2 definitions. Either the two definitions would have to be the same, making it useless, or the compiler wouldn't be able to tell which one you meant.
This is not the case with classes or structures. There is also a large advantage to allowing multiple definitions of them, i.e. if we want to use a class or struct in multiple files. (This leads indirectly to multiple definitions because of includes.)
Structures, classes, unions and enumerations define types that can be used in several compilation units to define objects of these types. So each compilation unit need to know how the types are defined, for example to allocate correctly memory for an object or to be sure that specified member of a class does indeed exist.
For functions (if they are not inline functions) it is enough to have their declaration without their definition to generate for example a function call.
But the function definition shall be single. Otherwise the compiler will not know what function to call or the object code will be too big due to duplication and will be error prone..
It's quite simple: It's a question of scope. Non-static functions are seen (callable) by every compilation unit linked together, while structures are only seen in the compilation unit where they are defined.
For example, it's valid to link the following together because it's clear which definition of struct Foo and which definition of f is being used:
1.c:
struct Foo { int x; };
static void f(void) { struct Foo foo; ... }
2.c:
struct Foo { double d; };
static void f(void) { struct Foo foo; ... }
int main(void) { ... }
But it isn't valid to link the following together because the linker wouldn't know which f to call.
1.c:
void f(void) { ... }
2.c:
void f(void) { ... }
int main(void) { f(); }
Actually every programming element is associated with a scope of its applicability. And within this scope you cannot have the same name associated with multiple definitions of an element. In compiled world:
You cannot have more than one class definition with the same name within a single file. But you can have it in different compilation units.
You cannot have the same function or global variable name within a single link unit (library or executable), but you can potentially have functions named the same within different libraries.
you cannot have shared libraries with the same name situated in the same directory, but you can have them in different directories.
C/C++ compilation is very much after the compilation performance. Checking 2 objects like function or classes for identity is a time-consuming task. So, it is not done. Only names are considered for comparison. It is better to consider that 2 types are different and error out then checking them for identity. The only exception from this rule are text macros.
Macros are a pre-processor concept and historically it is allowed to have multiple identical macro definitions. If a definition changes, a warning gets generated. Comparing macro context is easy, just a simple string comparison, but some macro definitions could be huge.
Types are the compiler concept and they are resolved by the compiler. Types do not exist in object libraries and are represented by the sizes of corresponding variables. So, there is no reason for checking type name collisions at this scope.
Functions and variables on the other hand are named pointers to executable codes or data. They are the building blocks of applications. Applications are assembled from the codes and libraries coming from all around the world in some cases. In order to use someone else's function you'd better now its name and you do not want the same name to be used by some one else. Within a shared library names of functions and variables are usually stored in a hash table. There is no place for duplicates there.
And as I already mention checking functions for identical contents is seldom done, however there are some cases, but not in c or c++.
The reason of impeding two different definitions for the same thing to be used in programming is to avoid the ambiguity of deciding which definition to use at run time.
If you have two different implementations to the same thing to coexist in a program, then there's the possibility of aliasing them (with a different name each) into a common reference to decide at runtime which one of the two to use.
Anyway, in order to distinguish both, you have to be able to indicate the compiler which one you want to use. In C++ you can overload a function, giving it the same name and different lists of parameters, so you can distinguish which one of both you want to use. But in C, the compilers only preserve the name of the function to be able to solve at link time which definition matches the name you use in a different compilation unit. In case the linker ends with two different definitions with the same name, it is uncapable of deciding for you which one to use, so it emits an error and gives up the building process.
What should be the intention of using this ambiguity in a productive way? this is the question you have actually to ask to yourself.

How does this header-only library guard against linker problems?

After reading this question I thought I understood everything, but then I saw this file from a popular header-only library.
The library uses the #ifndef line, but the SO question points out that this is NOT adequate protection against multiple definition errors in multiple TUs.
So one of the following must be true:
It is possible to avoid multiple definition linker errors in ways other than described in the SO question. Perhaps the library is using techniques not mentioned in to the other SO question that are worthy of additional explanation.
The library assumes you won't include its header files in more than translation unit -- this seems fragile since a robust library shouldn't make this assumption on its users.
I'd appreciate having some light shed on this seemingly simple curiosity.
A header that causes linking problems when included in multiple translation units is one that will (attempt to) define some object (not just, for an obvious example, a type) in each source file where it's included.
For example, if you had something like: int f = 0; in a header, then each source file into which it was included would attempt to define f, and when you tried to link the object files together you'd get a complaint about multiple definitions of f.
The "technique" used in this header is simple: it doesn't attempt to define any actual objects. Rather, it includes some typedefs, and the definition of one fairly large class--but not any instances of that class or any instance of anything else either. That class includes a number of member functions, but they're all defined inside the function definition, which implicitly defines them as inline functions (so defining separately in each translation unit in which they're used is not only allowed, but required).
In short, the header only defines types, not objects, so there's nothing there to cause linker collisions when it's included in multiple source files that are linked together.
If the header defines items, as opposed to just declaring them, then it's possible to include it in more than one translation unit (i.e. cpp file) and have multiple definitions and hence linker errors.
I've used boost's unit test framework which is header only. I include a specified header in only one of my own cpp files to get my project to compile. But I include other unit test headers in other cpp files which presumably use the items that are defined in the specified header.
Headers only include libraries like Boost C++ Libraries use (mostly) stand-alone templates and as so are compiled at compile-time and don't require any linkage to binary libraries (that would need separate compilation). One designed to never need linkage is the great Catch
Templates are a special case in C++ regarding multiple definitions, as long as they're the same. See the "One Definition Rule" section of the C++ standard:
There can be more than one definition of a class type (Clause 9),
enumeration type (7.2), inline function with external linkage (7.1.2),
class template (Clause 14), non-static function template (14.5.6),
static data member of a class template (14.5.1.3), member function of
a class template (14.5.1.1), or template specialization for which some
template parameters are not specified (14.7, 14.5.5) in a program
provided that each definition appears in a different translation unit,
and provided the definitions satisfy the following requirements. ....
This is then followed by a list of conditions that make sure the template definitions are identical across translation units.
This specific quote is from the 2014 working draft, section 3.2 ("One Definition Rule"), subsection 6.
This header file can indeed be included in difference source files without causing "multiple symbol definition" errors.
This happens because it is fine to have multiple identically named symbols in different object files as long as these symbols are either weak or local.
Let's take a closer look at the header file. It (potentially) defines several objects like this helper:
static int const helper[] = {0,7,8,13};
Each translation unit that includes this header file will have this helper in it. However, there will be no "multiple symbol definition" errors, since helper is static and thus has internal linkage. The symbols created for helpers will be local and linker will just happily put them all in the resulting executable.
The header file also defines a class template connection. But it is also okay. Class templates can be defined multiple times in different translation units.
In fact, even regular class types can be defined multiple times (I've noticed that you've asked about this in the comments). Symbols created for member functions are usually weak symbols. Once again, weak symbols don't cause "multiple symbol definition" errors, because they can be redefined. Linker will just keep redefining weak symbols with names he has already seen until there will be just one symbol per member function left.
There are also other cases, where certain things (like inline functions and enumerations) can be defined several times in different translation units (see ยง3.2). And mechanisms of achieving this can be different (see class templates and inline functions). But the general rule is not to place stuff with global linkage in header files. As long as you follow this rule, you're really unlikely to stumble upon multiple symbol definitions problems.
And yes, include guards have nothing to do with this.

difference between static functions in C++

Can anybody explain difference between static function defined within class and static function declared e.g. in file.hpp and defined in file.cpp (I can only use this static function within this file ?
Can anybody explain difference between static function defined within class
That means the function is class-wide, and doesn't need to operate on a particular object. In other words, for that function there is no this.
and static function declared e.g. in file.hpp and defined in file.cpp (I can only use this static function within this file ?
That means that that function does not have external linkage, which means other compilation units (i.e. object files) cannot link to it, because it's not in the symbol table.
Thanks for your reply but could you explain why other compilation units cannot link to it ?
First, some terms. Technically, the compiler is just the part that generates object code from source code. The linker later takes a set of object files and "links" them to make the final program.
To make this work, the compiler generates a "symbol table" and puts it in the object file along with the compiled code. This symbol table lists both the symbols for the global variables and functions in the file, as well as the external symbols that code needs to be linked to in order to work.
The linker's job is to read all the object files and match symbols needed by each object file to symbols provided by other object files. If everything is successful, and there aren't any unresolved needed symbols, the link succeeds and you get your program.
What static on a function or global does is simply tell the compiler to not put that symbol in the object file's symbol table. Nothing else; that symbol is still perfectly usable within that same source file. The linker simply never sees the symbol, and thus cannot link anything to it.
Class members cannot be "disappeared" in this manner, so static has a different meaning in the context of a class. (This recycling of the keyword was probably done to avoid adding another reserved word to the language. BTW, Objective-C solved this same problem in a different manner, using the + and - tokens.)
(And static can have yet another meaning when applied to variables declared inside functions or methods, as Mike points out below. In that case it's basically a global variable, but private to the function.)
Could you also explain why inline functions are implicitly defined as static ?
Since inline functions do not exist as independent pieces of code (they are instead merged "in line" into the calling function), they cannot have symbol table entries (there's nothing to link to).
Please refer to this link
A static variable inside a function keeps its value between invocations.
In C++, however, static is also used to define class attributes (shared between all objects of the same class) and methods. In C there are no classes, so this feature is irrelevant.
There is no difference between a static function defined in the global scope, no matter if it's in a header file or in a source file. Unless the header file isn't included anywhere, when the functions in it is never really defined anywhere.
Then phrase you need to learn when talking about static (non-member) functions is translation unit. A translation unit is a source file and all header files included in that source file, after the preprocessor have processed the file and is the actual input to the compiler. A static function is local to the translation unit, which is why there is no difference if it's defined in the source file or a header file.
You can also use an anonymous namespace to define functions, and they will be local to just the translation unit the anonymous namespace is in.
Also note that functions defined as inline are implicitly defined as static as well.
A static member function is part of the class, and can access static member variables without scope prefix. They do of course have to prefixed with the scope of the class to be called. The difference between a static member function and a non-static member function is that static member functions are not part of any specific instance of the class, and so have no this pointer. If you want to access a specific class instances member variable, you have to pass the instance to the static member function through an argument.

Difference between an inline function and static inline function

Can anybody tell me what the difference is between an inline function and static inline function?
In which cases should I prefer static inline over inline?
I am asking this question because I have an inline function for which I am facing compilation issues during linking (relocation error:... symbol has been discarded with discarded section ...). I made it a normal function and it worked.
Now some of my seniors told me try with static inline.
Below is my function:
inline void wizSendNotifier (const char* nn_name, bpDU* arg=0, int aspect = -1)
{
wizuiNotifier* notifier = ::wizNtrKit.getNotifier (nn_name);
notifier->notify (arg, aspect);
}
and this not inside a class. This is inside a header file!
I guess the call to a static function should be done only in the particular TU where it is defined.
Since my function is in a header file and if i make it static, will it be the case that where ever I include that header file the static function can used used in that translation unit?
The non-static inline function declaration refers to the same function in every translation unit (source file) that uses it.
The One Definition Rule requires that the body of the function definition is identical in every TU that contains it, with a longish definition of "identical". This is usually satisfied provided that the source files all use the same header, and provided that the function doesn't use any global names with internal linkage (including static functions) or any macros that are defined differently in different TUs.
I don't remember encountering that particular linker error before, but it's at least possible that one of these restrictions is responsible. It's your responsibility to satisfy the requirements: undefined behavior with no diagnostic required if you don't.
The static inline function declaration refers to a different function in each translation unit, that just so happens to have the same name. It can use static global names or macros that are different in different TUs, in which case the function might behave differently in the different TUs, even though its definition in the header file "looks the same".
Because of this difference, if the function contains any static local variables then it behaves differently according to whether it is static or not. If it is static then each TU has its own version of the function and hence its own copy of the static local variables. If it's inline only, then there is only one copy of the static local variables used by all TUs.

How does linkage and name mangling work?

Lets take this code sample
//header
struct A { };
struct B { };
struct C { };
extern C c;
//code
A myfunc(B&b){ A a; return a; }
void myfunc(B&b, C&c){}
C c;
Lets do this line by line starting from the code section.
When the compiler sees the first myfunc method it does not care about A or B because its use is internal. Each c++ file will know what it takes in, what it returns. Although there needs to be a name for each of the two overload so how is that chosen and how does the linker know which means what?
Next is C c; I once had a bug were the linker wouldnt reconize thus allow me access to C in other C++ files. It was because that cpp didnt know c was extern and i had to mark it as extern in the header before i could link successfully. Now i am not sure if the class type has any involvement with the linker and the variable C. I dont know how RTTI will be involved but i do know C needs to be visible by other files.
How does the linker work and name mangling and such.
We first need to understand where compilation ends and linking begins. Compilation involves taking a compilation unit (a C or C++ source file) and turning it into an object file. Simplistically, this involves generating snippets of machine code for each function as well as a symbol table for all functions and static (global) variables. Placeholders are used for any symbols needed by the compilation unit that are external to the said compilation unit.
The linker is then responsible for loading all the object files and resolving all the place-holder symbols with real addresses (or offsets for machine independent code). This is placed into various sections that can be read by the operating system's dynamic loader when loading an executable.
So for the specifics. In order to avoid errors during linking, the compiler requires you to declare all external symbols that will be used by the current compilation unit. For global variables one must use the extern keyword, for functions this is optional.
All functions and global variables defined in a compilation unit have external linkage (i.e., can be referenced by other compilation units) unless one declares that with the static keyword (or the unnamed namespace in C++). In C++, the vtable will also have a symbol needed for linkage.
Now in C++, since functions can be overloaded, the parameters also form part of the function name. Since machine-code is just addresses and registers, extra information needs to be added to the function name in the symbols table. This extra parameter information comes in the form of a mangled name and ensures that the linker links to the correct version of an overloaded function.
If you really are interested in the gory details take a look at the ELF file format (PDF) used extensively on Linux. Windows has a different format but the principles can be expected to be the same.
Name mangling on the Itanuim (and ARM) platforms can be found here.
http://en.wikipedia.org/wiki/Name_mangling