How does this header-only library guard against linker problems? - c++

After reading this question I thought I understood everything, but then I saw this file from a popular header-only library.
The library uses the #ifndef line, but the SO question points out that this is NOT adequate protection against multiple definition errors in multiple TUs.
So one of the following must be true:
It is possible to avoid multiple definition linker errors in ways other than described in the SO question. Perhaps the library is using techniques not mentioned in to the other SO question that are worthy of additional explanation.
The library assumes you won't include its header files in more than translation unit -- this seems fragile since a robust library shouldn't make this assumption on its users.
I'd appreciate having some light shed on this seemingly simple curiosity.

A header that causes linking problems when included in multiple translation units is one that will (attempt to) define some object (not just, for an obvious example, a type) in each source file where it's included.
For example, if you had something like: int f = 0; in a header, then each source file into which it was included would attempt to define f, and when you tried to link the object files together you'd get a complaint about multiple definitions of f.
The "technique" used in this header is simple: it doesn't attempt to define any actual objects. Rather, it includes some typedefs, and the definition of one fairly large class--but not any instances of that class or any instance of anything else either. That class includes a number of member functions, but they're all defined inside the function definition, which implicitly defines them as inline functions (so defining separately in each translation unit in which they're used is not only allowed, but required).
In short, the header only defines types, not objects, so there's nothing there to cause linker collisions when it's included in multiple source files that are linked together.

If the header defines items, as opposed to just declaring them, then it's possible to include it in more than one translation unit (i.e. cpp file) and have multiple definitions and hence linker errors.
I've used boost's unit test framework which is header only. I include a specified header in only one of my own cpp files to get my project to compile. But I include other unit test headers in other cpp files which presumably use the items that are defined in the specified header.

Headers only include libraries like Boost C++ Libraries use (mostly) stand-alone templates and as so are compiled at compile-time and don't require any linkage to binary libraries (that would need separate compilation). One designed to never need linkage is the great Catch

Templates are a special case in C++ regarding multiple definitions, as long as they're the same. See the "One Definition Rule" section of the C++ standard:
There can be more than one definition of a class type (Clause 9),
enumeration type (7.2), inline function with external linkage (7.1.2),
class template (Clause 14), non-static function template (14.5.6),
static data member of a class template (14.5.1.3), member function of
a class template (14.5.1.1), or template specialization for which some
template parameters are not specified (14.7, 14.5.5) in a program
provided that each definition appears in a different translation unit,
and provided the definitions satisfy the following requirements. ....
This is then followed by a list of conditions that make sure the template definitions are identical across translation units.
This specific quote is from the 2014 working draft, section 3.2 ("One Definition Rule"), subsection 6.

This header file can indeed be included in difference source files without causing "multiple symbol definition" errors.
This happens because it is fine to have multiple identically named symbols in different object files as long as these symbols are either weak or local.
Let's take a closer look at the header file. It (potentially) defines several objects like this helper:
static int const helper[] = {0,7,8,13};
Each translation unit that includes this header file will have this helper in it. However, there will be no "multiple symbol definition" errors, since helper is static and thus has internal linkage. The symbols created for helpers will be local and linker will just happily put them all in the resulting executable.
The header file also defines a class template connection. But it is also okay. Class templates can be defined multiple times in different translation units.
In fact, even regular class types can be defined multiple times (I've noticed that you've asked about this in the comments). Symbols created for member functions are usually weak symbols. Once again, weak symbols don't cause "multiple symbol definition" errors, because they can be redefined. Linker will just keep redefining weak symbols with names he has already seen until there will be just one symbol per member function left.
There are also other cases, where certain things (like inline functions and enumerations) can be defined several times in different translation units (see §3.2). And mechanisms of achieving this can be different (see class templates and inline functions). But the general rule is not to place stuff with global linkage in header files. As long as you follow this rule, you're really unlikely to stumble upon multiple symbol definitions problems.
And yes, include guards have nothing to do with this.

Related

Why don't standard libraries in g++ violate the One Definition Rule when included multiple times?

I poorly understand the linkage process, and I'm struggling with multi-file compilation with template classes. I wanted to keep definitions in one file, declarations in another. But after a day of suffering, I found out that I should keep definitions and declarations in the same transition unit (see here). But my code still doesn't compile, and if I do not figure it out, I'll post another question.
But that's not the question. The question is: once I was reading about "keeping definitions and declarations in the same header" and it was considered as bad practice. One of the comments says:
The include guards protect against including the same header file multiple times in the same source file. It doesn't protect against inclusion over multiple source files
Makes sense, until I wanted to try this out with standard libraries in g++. I tried to do it with <vector>, so I have two files:
main.cpp:
#include "class.h"
#include <vector>
template class std::vector<int>;
int main()
{
std::vector<int> arr;
}
class.h:
#include <vector>
std::vector<int> a(4);
I compiled with g++ main.cpp class.h and it compiled successfully. But why? Didn't I included <vector> twice, and therefore I have double definitions?
I checked header files on my machine, and they have definitions and declarations in the same file.
You need to differentiate between two categories of entities: Those that may have only one definition in the whole program and those that may have a single definition in each translation unit, although these definitions must be identical (i.e. same sequence of tokens plus some more requirements). In any case a single translation unit may contain only one definition of an entity.
See https://en.cppreference.com/w/cpp/language/definition for the details.
Most entities belong to the first category, but classes, templates of all kinds and inline functions belong to the second one. Not only can these entities be defined in every translation unit, but they typically need to be defined in every translation unit using them. That's why their definitions typically belong in header files, while definitions of entities of the first category never belong in a header file.
std::vector is a class template, so it and its member functions can be defined in each translation unit once.
It is not clear what you intention with the compiler invocation g++ main.cpp class.h is, but it actually compiles only one translation unit (main.cpp) and does something else for class.h because of its file ending (see comments under your question).
In the compilation unit for main.cpp you are not including two definitions of std::vector or its members, although you have two #include<vector> directives, because the standard library is required to prevent this from happening. It could e.g. use header guards as user code would to achieve the same guarantee.
If class.h would also be compiled as a translation unit, then there still wouldn't be an issue, since std::vector and its members are templated entities and so they can be defined again in this translation unit.
Note that technically the standard library is not bound by the language rules. It may not be written in C++ at all. In that case it just has to work correctly for the user, no matter how it is included, as long as the user program is in accordance with the language rules.
Also note that the explicit instantiation definition
template class std::vector<int>;
is pointless in the way you are using it. It is only required if the members of the class template are, against the typical usage mentioned above, not defined in the header but a single translation unit (but that is never the case for the standard library) or to speed up compilation time, in which case there should be an explicit instantiation declaration in the other translation units to prevent implicit instantiation.

Why can classes be defined in header files?

According to Why can't I multi-declare a class
class A; is a declaration while
class A { ... } is a definition
Typically, in header files, we define the class and we implement its member functions in the .cpp. But wouldn't defining classes in the header file violate the One Definition Rule?
According to https://www.learncpp.com/cpp-tutorial/89-class-code-and-header-files/
Doesn’t defining a class in a header file violate the one-definition rule?
It shouldn’t. If your header file has proper header guards, it
shouldn’t be possible to include the class definition more than once
into the same file.
Types (which include classes), are exempt from the
part of the one-definition rule that says you can only have one
definition per program. Therefore, there isn’t an issue #including
class definitions into multiple code files (if there was, classes
wouldn’t be of much use).
While the first part is obviously true, header guards will prevent multiple definitions in the same file, but I am confused about the second part of the answer that addresses my question.
If a header file has a definition of a class, for example ThisClass, and that header file is included in two other files for example a.cpp and b.cpp. Why wouldn't it violate the One Definition Rule? If a ThisClass object is created in either file, which definition would be called?
If a header file has a definition of a class, for example ThisClass, and that header file is included in two other files for example a.cpp and b.cpp. Why wouldn't it violate the One Definition Rule?
Because, as you quoted, the one definition rule specifically allows for this.
How can it? Read on…
If a ThisClass object is created in either file, which definition would be called?
It doesn't matter, because the definitions are required to be absolutely, lexically identical.
If they're not, your program has undefined behaviour and you can expect all manner of weirdness to ensue.
Based on what I'm reading, it might be important to use Wikipedia's brief explanation of what the One Definition Rule actually is:
The One Definition Rule (ODR) is an important rule of the C++ programming language that prescribes that objects and non-inline functions cannot have more than one definition in the entire program and template and types cannot have more than one definition by translation unit.
This means that when the program is compiled, there can be one, and only one definition of a function or object. This might be confusing, however a class or type shouldn't be thought of as objects, they're really just blueprints for an object.
In your question, I think there's a misunderstanding of what #include does and how the ODR applies to classes. When the program is compiling, the .cpp files look to their #include'd class definitions as a way to link together, that is, the class.h blueprint must know where to look when calling it's member functions. Having multiple #include's of the same class.h is not the same as having multiple definitions of class.h, and thus, it cannot violate the One Definition Rule.
It is okay to have a class defined in a header file because the compiler treats the functions declared in the class in such a case as an inline functions.
So there are two options to happen when you call this class functions:
Or it wiil inline the function calls and in such a case there is no problem at all of multiple definition.
Or it wont inline them and in such a case he will create the symbols related to the functions as weak symbols. Then the linker will choose one of the symbols to be the symbol representing the function.
Of course if there will be different defintions in two or more cpp files there will be undefined behaviour.

Different C++ Class Declarations

I'm trying to use a third party C++ library that isn't using namespaces and is causing symbol conflicts. The conflicting symbols are for classes my code isn't utilizing, so I was considering creating custom header files for the third party library where the class declarations only include the public members my code is using, leaving out any members that use the conflicting classes. Basically creating an interface.
I have three questions:
If the compilation to .obj files works, will this technique still cause symbol conflicts when I get to linking?
If that isn't a problem, will the varying class declarations cause problems when linking? For example, does the linker verify that the declaration of a class used by each .obj file has the same number of members?
If neither of those are a problem and I'm able to link the .obj files, will it cause problems when invoking methods? I don't know exactly how C++ works under the hood, but if it uses indexes to point to class methods, and those indexes were different from one .obj file to another, I'm guessing this approach would blow up at runtime.
In theory, you need identical declarations for this to work.
In practice, you will definitely need to make sure your declarations contain:
All the methods you use
All the virtual methods, used or not.
All the data members
You need all these in the right order of declaration too.
You might get away with faking the data members, but would need to make sure you put in stubs that had the same size.
If you do not do all this, you will not get the same object layout and even if a link works it will fail badly and quickly at run-time.
If you do this, it still seems risky to me and as a worst case may appear to work but have odd run time failures.
"if it uses indexes ": To some extent exactly how virtual functions work is implementation defined, but typically it does use an index into a virtual function table.
What you might be able to do is to:
Take the original headers
Keep the full declarations for the classes you use
Stub out the classes and declarations you do not use but are referenced by the ones you do.
Remove all the types not referenced at all.
For explanatory purposes a simplified explaination follows.
c++ allows you to use functions you declare. what you do is putting multiple definitions to a single declaration across multiple translation units. if you expose the class declaration in a header file your compiler sees this in each translation unit, that includes the header file.
Therefore your own class functions have to be defined exactly as they have been declared (same function names same arguments).
if the function is not called you are allowed not to define it, because the compiler doesn't know whether it might be defined in another translation unit.
Compilation causes label creation for each defined function(symbol) in the object code. On the other hand a unresolved label is created for each symbol that is referenced to (a call site, a variable use).
So if you follow this rules you should get to the point where your code compiles but fails to link. The linker is the tool that maps defined symbols from each translation-unit to symbol references.
If the object files that are linked together have multiple definitions to the same functions the linker is unable to create an exact match and therefore fails to link.
In practice you most likely want to provide a library and enjoy using your own classes without bothering what your user might define. In spite of the programmer taking extra care to put things into a namespace two users might still choose the same name for a namespace. This will lead to link failures, because the compiler exposed the symbols and is supposed to link them.
gcc has added an attribute to explicitly mark symbols, that should not be exposed to the linker. (called attribute hidden (see this SO question))
This makes it possible to have multiple definitions of a class with the same name.
In order for this to work across compilation units, you have to make sure class declarations are not exposed in an interface header as it could cause multiple unmatching declarations.
I recommend using a wrapper to encapsulate the third party library.
Wrapper.h
#ifndef WRAPPER_H_
#define WRAPPER_H_
#include <memory>
class third_party;
class Wrapper
{
public:
void wrappedFunction();
Wrapper();
private:
// A better choice would be a unique_ptr but g++ and clang++ failed to
// compile due to "incomplete type" which is the whole point
std::shared_ptr<third_party> wrapped;
};
#endif
Wrapper.cpp
#include "Wrapper.h"
#include <third_party.h>
void Wrapper::wrappedFunction()
{
wrapped->command();
}
Wrapper::Wrapper():wrapped{std::make_shared<third_party>()}
{
}
The reason why a unique_ptr doesn't work is explained here: std::unique_ptr with an incomplete type won't compile
You can move the entire library into a namespace by using a clever trick to do with imports. All the import directive does is copy the relevant code into the current "translation unit" (a fancy name for the current code). You can take advantage of this as so
I've borrowed heavily from another answer by user JohnB which was later deleted by him.
// my_thirdparty.h
namespace ThirdParty {
#include "thirdparty.h"
//... Include all the headers here that you need to use for thirdparty.
}
// my_thirdparty.cpp / .cc
namespace ThirdParty {
#include "thirdparty.cpp"
//... Put all .cpp files in here that are currently in your project
}
Finally, remove all the .cpp files in the third party library from your project. Only compile my_thirdparty.cpp.
Warning: If you include many library files from the single my_thirdparty.cpp this might introduce compiler issues due to interaction between the individual .cpp files. Things such as include namespace or bad define / include directives can cause this. Either resolve or create multiple my_thirdparty.cpp files, splitting the library between them.

inline keyword for templates [duplicate]

This question already has answers here:
Does it make any sense to use inline keyword with templates?
(4 answers)
Closed 8 years ago.
This code will be placed in a header file:
template<typename TTT>
inline Permutation<TTT> operator * (const Cycle<TTT>& cy, const Permutation<TTT>& p)
{
return Permutation<TTT>(cy)*p;
}
Is inline necessary to avoid a linker error?
If this function is not a template and the header file is used in more than one .cpp file, inline is necessary to avoid a liker error complaining about multiple definitions for a function. It seems linker ignores this for templates.
Is inline necessary to avoid a linker error?
On a function template, no. Templates, like inline functions, are subject to a more relaxed One Definition Rule which allows multiple definitions - as long as the definitions are identical and in separate translation units.
As you say, inline would be necessary if you wanted to define a non-template function in a header; non-inline functions are subject to a more strict One Definition Rule, and can only have one definition in a program.
For the gory details, this is specified by C++11 3.2/5:
There can be more than one definition of a class type, inline function with
external linkage, class template, non-static function template, static data member
of a class template, member function of a class template, or template specialization for
which some template parameters are not specified in a program provided that each definition
appears in a different translation unit, and provided the definitions satisfy the following requirements.
(The "following requirements" basically say that the definitions must be identical).
Consider that a template function (or function template if you prefer) is not a function at all. It is rather a recipe to create a function. The actual function is only created when and where the template is instantiated. So you do not need the inline keyword here, because template functions will not result in multiple-definition linker errors because they are not actually defined (from the linker's perspective) until they are used.
Expanding on Mike Seymour's answer -- the paragraph (3.2/5) he cites in in the Standard refers to a concept called "vague linkage". Basically, it's a way of saying "we need this to exist somewhere in the resulting binary, but we don't have a clear-cut home for it in any specific object file we're emitting." On modern platforms (Windows, ELF systems such as Linux, and OS X), this is implemented using a mechanism known as COMDAT support that allows the compiler to simply generate the instantiations and other vague linkage items (vtables, typeinfos, and inline function bodies) as-needed -- the linker then is free to throw out the duplicates:
When used with GNU ld version 2.8 or later on an ELF system such as GNU/Linux or Solaris > 2, or on Microsoft Windows, duplicate copies of these constructs will be discarded at
link time. This is known as COMDAT support.
This is discussed in more detail in the GCC manual (quote snipped as the Cfront model is irrelevant for modern compilers):
C++ templates are the first language feature to require more intelligence from the
environment than one usually finds on a UNIX system. Somehow the compiler and linker
have to make sure that each template instance occurs exactly once in the executable if
it is needed, and not at all otherwise. There are two basic approaches to this problem,
which are referred to as the Borland model and the Cfront model.
Borland model
Borland C++ solved the template instantiation problem by adding the code equivalent
of common blocks to their linker; the compiler emits template instances in each
translation unit that uses them, and the linker collapses them together. The advantage
of this model is that the linker only has to consider the object files themselves; there
is no external complexity to worry about. This disadvantage is that compilation time is
increased because the template code is being compiled repeatedly. Code written for this
model tends to include definitions of all templates in the header file, since they must
be seen to be instantiated.

What exactly is a C++ header file and declaration?

I know the basics about header files and forward declarations but I'm wondering, if I declare the exact same thing in two separate header files and then compile that, will that work?
Is the C++ interface, in this case, portable, I mean, If I have two libs and they share the same interface (declaration or what not) somewhere could I theoretically just replicate the same declaration in the program and actually compile that, or if not, why?
How would C++ for instance be able to tell the difference between two identical declarations in two different files?
Say I had two distinct libs but they shared some interface, they are compiled separately but by the same tools, would it be possible to in a future step bring these to libs together and actually pass the interface between these two libs as if it was the same, compatible interface, even though it was originally compiled from different (but identical) header files?
Function declarations, variable declarations and class definitions for a given identifier may appear in the source code as often as you like. They just have to be identical each time they appear (which is automatic if you include a given header file in multiple translation units).
It is only the definition of functions, variables and class member functions that must appear precisely once (with a special rule for inlined functions).
(It's a little different for templates: Template definitions may appear repeatedly, but again all occurrences have to be identical. But templates require some non-trivial deduplication work from the linker.)