How to find duplicate definitions from template specializations?

I have a template class with a specialization that is defined in another file. It is therefore possible to generate two versions of the same class: once by substituting the template parameter into the primary template, and once using the specialization. My current understanding is that this can lead to two instantiations of the same type having different sizes in memory, resulting in segmentation faults.
I created a minimal example and the following code is to illustrate the question:
Create a template class:
// - templateexample.h ---------------
#ifndef TEMPLATEEXAMPLE_H
#define TEMPLATEEXAMPLE_H
template<typename T> class Example
{
public:
Example(){}
int doWork() {return 42;}
};
#endif
// -----------------------------------
Template specialization in another file:
// - templatespecialization.h --------
#ifndef TEMPLATESPECIALIZATION_H
#define TEMPLATESPECIALIZATION_H
#include "templateexample.h"
template<> class Example<int>
{
public:
Example() : a(0), b(1), c(2), d(3) {}
int doWork() {return a+b+c+d;}
private:
int a; //<== the specialized object will be larger in memory
int b;
int c;
int d;
};
#endif
// --------------------------------
Have a class which includes only the template class definition, but should include the specialization.
// - a.h --------------------------
#ifndef A_H
#define A_H
#include "templateexample.h"
class A
{
public:
Example<int> returnSmallExample();
};
#endif
// - a.cpp ------------------------
#include "a.h"
Example<int> A::returnSmallExample() {return Example<int>();}
// --------------------------------
The main class now knows two versions of Example<int> the one from A and the one from the templatespecialization.h.
// - main.cpp ---------------------
#include <iostream>
#include "a.h"
#include "templatespecialization.h"
int main()
{
A a;
Example<int> test = a.returnSmallExample();
std::cout<<test.doWork()<<std::endl;
}
// --------------------------------
Please note that this problem only occurs when class A is compiled separately. This example from ideone outputs 6, whereas using separate files can result in a segmentation fault or output 42 (https://ideone.com/3RTzlC). On my machine the example compiles successfully and outputs 2013265920.
In the production version of the above example, everything is built into a shared library which is used by main.
Question 1: Why doesn't the linker detect this problem? This should be easy to spot by comparing the size of objects.
Question 2: is there a way to examine the object files or the shared library to detect multiple implementations of the same type like in the example above?
Edit: please note: the code above is a minimal example to explain the problem. The reason for the situation is that the template class is from one library and I cannot edit the files from this library. Finally the whole thing is used all over the place in the executable and now I need to find out if the problem above occurs.
Edit: code above can be compiled like this:
#!/bin/bash
g++ -g -c a.cpp
g++ -g -c main.cpp
g++ -o test a.o main.o

You have different definitions of the same template and its specializations in different translation units. This violates the One Definition Rule.
A fix would be to put the specialization in the same header file where the primary class template is defined.
Question 1: Why doesn't the linker detect this problem? This should be easy to spot by comparing the size of objects.
Different types may have the same size (e.g. double and int64_t), so, obviously, just comparing sizes of objects does not work.
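To illustrate why a size comparison cannot identify a type, here is a minimal sketch. The struct names Celsius and Fahrenheit are hypothetical and not taken from the question; they only show two unrelated types that happen to have the same size.

```cpp
#include <type_traits>

// Hypothetical types for illustration only.
struct Celsius    { double value; };
struct Fahrenheit { double value; };

// Same size and layout...
static_assert(sizeof(Celsius) == sizeof(Fahrenheit), "sizes match");
// ...but distinct types, so a size check alone cannot tell them apart.
static_assert(!std::is_same<Celsius, Fahrenheit>::value, "types differ");
```

A linker that compared only sizes would accept these two as "the same", and would equally miss two conflicting definitions that happen to be the same size.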
Question 2: is there a way to examine the object files or the shared library to detect multiple implementations of the same type like in the example above?
You should use the gold linker for linking your C++ applications, if you do not use it already. One nice feature of it is the --detect-odr-violations command-line switch, which does exactly what you ask:
gold uses a heuristic to find potential ODR violations: if the same symbol is seen defined in two different input files, and the two symbols have different sizes, then gold looks at the debugging information in the input objects. If the debugging information suggests that the symbols were defined in different source files, gold reports a potential ODR violation. This approach has both false negatives and false positives. However, it is reasonably reliable at detecting problems when linking unoptimized code. It is much easier to find these problems at link time than to debug cases where the wrong symbol.
See Enforcing One Definition Rule for more details.

Question 1: Why doesn't the linker detect this problem? This should be easy to spot by comparing the size of objects.
Because this is not the linker's problem. With templates, the primary declaration and all of its specializations (class or function) should be visible up front, before the first use that would instantiate them.
Question 2: is there a way to examine the object files or the shared library to detect multiple implementations of the same type
like in the example above?
At least I am not aware of any.
To further simplify this situation, look at a similar broken code:
// foo.h
inline void foo () {
#ifdef FOO
return;
#else
throw 0;
#endif
}
// foo1.cpp
#define FOO
#include"foo.h"
// ... uses foo() with "return"
// foo2.cpp
#include"foo.h"
// ... uses foo() with "throw"
You may get different results depending on how the code is compiled.
Update:
Having multiple different body definitions for the same function is undefined behavior. That is why you are getting an odd output like 2013265920, and the same happens on my machine as well. One would expect the output to be either 42 or 6. I gave you the above example because inline functions let you create the same kind of inconsistency.
With my limited knowledge of the linking stage, the responsibility of a typical linker is limited to rejecting more than one non-inline function definition with the same signature, e.g.
// Header.h is included in multiple .cpp files
void foo () {} // rejected due to multiple definitions
inline void bar () {} // ok because `inline` keyword is found
Beyond that, it doesn't check whether the function bodies are the same, because they were already parsed at an earlier stage and the linker doesn't parse function bodies.
Having said that, now pay attention to this statement:
function templates, like inline functions, may be defined in more than one translation unit
Hence the linker may not get a chance to reject them.
The safest way is to #include the read-only header in your specialization header and include that specialization header everywhere.
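One way to turn a forgotten specialization header into a compile-time error is a size check placed wherever Example<int> is used. The following is only a single-file sketch of that idea: the first part stands in for the read-only library header, the second for the specialization header. The static_assert is my own addition, not something the library provides, and it assumes the specialization is larger than the primary template.

```cpp
// --- stands in for templateexample.h (read-only library header) ---
template <typename T>
class Example {
public:
    int doWork() { return 42; }
};

// --- stands in for templatespecialization.h ---
template <>
class Example<int> {
public:
    Example() : a(0), b(1), c(2), d(3) {}
    int doWork() { return a + b + c + d; }
private:
    int a, b, c, d;
};

// --- placed in every file that uses Example<int> ---
// If the specialization header was not included, Example<int> is the
// empty primary template (typically 1 byte) and this assertion fails.
static_assert(sizeof(Example<int>) >= 4 * sizeof(int),
              "Example<int> specialization is not visible here");
```

With such a check in a.h, compiling a.cpp from the question would fail instead of silently producing the small version of the class.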

I don't know of a way to do this by analysis of the compiled binary, but you could build a graph of your program's #include relationships — there are tools that can do this, such as Doxygen — and use it to look for files that (directly or indirectly) include the library header but not the specialization header.
You'd need to examine each file to determine whether it actually uses the template in question, but at least you can narrow down the set of files that you have to examine.
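If a full include graph is more than you need, a rough first pass can be scripted. The sketch below uses hypothetical helper names (includesHeader, isSuspicious) and only recognizes direct, quoted #include lines. It does not follow indirect includes or evaluate the preprocessor, so treat its result as a hint rather than a proof.

```cpp
#include <sstream>
#include <string>

// Returns true if `source` contains a line of the form  #include "header".
// Only direct, quoted includes are recognized; this is not a preprocessor.
bool includesHeader(const std::string& source, const std::string& header) {
    std::istringstream lines(source);
    std::string line;
    const std::string needle = "\"" + header + "\"";
    while (std::getline(lines, line)) {
        const std::string::size_type first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] != '#') continue;
        if (line.find("include", first) != std::string::npos &&
            line.find(needle, first) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// A file is suspicious if it pulls in the library header directly
// without also pulling in the specialization header.
bool isSuspicious(const std::string& source) {
    return includesHeader(source, "templateexample.h") &&
           !includesHeader(source, "templatespecialization.h");
}
```

Applied to the question's files, the contents of a.h would be flagged, because it includes templateexample.h but not templatespecialization.h.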

I think you managed to deceive the compiler. But I should note that, in my humble opinion, you misunderstand the concept of templates; in particular, you are mixing up template specialization with inheritance.
In my view, a template specialization should not add data members to a class; its only aims are to define types for function parameters and class fields. If you want to change algorithms (i.e. rewrite code) or add new data members to a class, you should define a derived class.
Concerning "several-step" separate compilation and a template class in a library, the cplusplus.com tutorial says (http://www.cplusplus.com/doc/oldtutorial/templates/):
Because templates are compiled when required, this forces a restriction for multi-file projects: the implementation (definition) of a template class or function must be in the same file as its declaration. That means that we cannot separate the interface in a separate header file, and that we must include both interface and implementation in any file that uses the templates.
Since no code is generated until a template is instantiated when required, compilers are prepared to allow the inclusion more than once of the same template file with both declarations and definitions in a project without generating linkage errors.

Related

How come you can split non template classes in a .h interface and .cpp implementation?

I understand that if you attempt to split a templated class in a .h interface and a .cpp implementation, you get a linker error. The reason for this as mentioned in a popular post is "If the implementations were not in the header, they wouldn't be accessible, and therefore the compiler wouldn't be able to instantiate the template."
What I dont understand is that if the implementations inside a .cpp file are inaccessible in case of templated classes, what makes them accessible for non templated or just regular classes. How come we are able to split the interface and implementation for normal classes over a .h and .cpp file without getting a linker error?
Test.h
template<typename TypeOne>
TypeOne ProcessVal(TypeOne val);
Test.cpp
template<typename TypeOne>
TypeOne ProcessVal(TypeOne val)
{
// Process it here.
return val;
}
Main.cpp
#include "Test.h"
int main()
{
int a = 0, b;
b = ProcessVal(a);
}
This code gives a linker error. A similar split of a non-templated class does not give a linker error. I could post the code, but you get the idea.

In the case of a plain function, the compiler generates code straight away and adds it to the compilation unit.
Test.cpp
int ProcessVal(int val)
{
// Process it here.
return val;
}
In the case of the above code, all the necessary information is known, and the C++ code of the function ProcessVal can be translated into machine instructions. As a result, the object file (probably called Test.o) will contain the ProcessVal symbol plus the corresponding code, and the linker can refer to it (to resolve calls or to perform inlining).
On the other hand this piece of code:
Test.cpp
template<typename TypeOne>
TypeOne ProcessVal(TypeOne val)
{
// Process it here.
return val;
}
does not contribute any output to the compilation unit. The object file of this compilation unit (Test.o) will not contain any code for the ProcessVal() function, because the compiler does not know what type the TypeOne argument is going to be. You have to instantiate the template to get its binary form, and only that can be added to the resulting binary.

When you have only a template definition, nothing is added to the compilation unit, because the template argument could be many things, so the compiler cannot know at compile time which class to create, as stated in this answer.
In the non-templated case, you know what is inside your class and don't have to wait for a template argument before the actual class can be generated, so the linker can see its members (since they are compiled into the binary, as doc's post puts it).

Basically, this goes back to the old C language. The .h files were intended to be done with the C preprocessor, and were literally textually included into the C source by the preprocessor. So if you had foo.h:
int i = 0;
and foo.c:
#include "foo.h"
int main(){ printf("%d\n", i);}
when the preprocessor was done, the compiler actually saw:
int i = 0;
int main(){ printf("%d\n", i);}
as the source file. There was no chance for a linker error, because the linker was never involved -- you just compiled one C file.
While the semantics of templates are a little more complicated now, the programming model is still the same: your .h file includes program text that is introduced to the program lexically, before actual final parsing and compilation takes place.

If you actually do want to have the implementation of a template function or class in a separate C++ file, you can explicitly instantiate it for a certain type. For instance, in your particular example, if you add this line to Test.cpp, your code will link successfully:
template int ProcessVal<int>(int val);
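To show where that line goes, here is a sketch of the whole layout collapsed into one listing. The comments mark which file each part would live in; the file split itself is taken from the question.

```cpp
// --- Test.h: declaration only ---
template <typename TypeOne>
TypeOne ProcessVal(TypeOne val);

// --- Test.cpp: definition plus explicit instantiation ---
template <typename TypeOne>
TypeOne ProcessVal(TypeOne val) {
    // Process it here.
    return val;
}

// Forces the compiler to emit ProcessVal<int> into Test.o so that
// other translation units can link against it.
template int ProcessVal<int>(int val);
```

In the multi-file layout, a call with any other type, such as ProcessVal(2.5), would still fail to link unless that type is explicitly instantiated too.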

Linker errors - undefined references

Yes, I know this has been asked a billion times before, I've checked at least 100 duplicates of this question, and still haven't found an answer.
I'm getting undefined reference errors to all of my LList functions, although it all seems to be properly defined and linked. Since my code is a bit too long to paste here, I made a pastie: Click
I compile my code with: g++ driver.cpp box.cpp LList.cpp Song.cpp Album.cpp -o driver

A class or function template is not a class or function, and hence cannot be placed in a .cpp file the way classes or functions can. Rather, a template is a blueprint for how to make a class or function, namely a particular instantiation of the template.
You can solve your problem in two ways:
1 either put all the templated code in the respective header files,
2 or instantiate the code explicitly in the .cpp files. For example
// Llist.cpp
#include "Llist.hpp"
#include "Song.hpp"
/* definition of members of Llist<> */
template class Llist<Song>; // creates class Llist<Song>
Solution 1 always works, but has the potential for HUGE header files and exposes all (or most) implementation details to the user. Solution 2 avoids huge headers and hides implementation details, but requires that you know which instantiations you actually need (often impossible, in particular for such general concepts as linked lists).
Finally, solution 3: use the C++ standard library (std::list, std::forward_list) and don't worry.
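As a rough idea of what solution 3 could look like, assuming a Song type with a title and a length (both members are my invention, since the asker's class is not shown):

```cpp
#include <forward_list>
#include <string>

// Hypothetical stand-in for the asker's Song class.
struct Song {
    std::string title;
    int lengthSeconds;
};

// Sums the lengths of all songs in the list.
int totalLength(const std::forward_list<Song>& songs) {
    int total = 0;
    for (const Song& s : songs) total += s.lengthSeconds;
    return total;
}
```

Because std::forward_list is defined entirely in a standard header, no explicit instantiation and no separate .cpp file for the list is needed.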

Template classes and functions must be defined inline, in the header, so that every file that uses them can see the definition. That's the problem. For example:
//box.h
#ifndef BOX_H
#define BOX_H
template <typename DataType>
struct box
{
DataType data;
box<DataType> *next;
box(DataType d, box<DataType>* n)
{
data = d;
next = n;
}
};
#endif
And remove the .cpp file; you should do the same for LList.h/LList.cpp.

Separating C++ Class Code into Multiple Files, what are the rules?

Thinking Time - Why do you want to split your file anyway?
As the title suggests, the end problem I have is multiple definition linker errors. I have actually fixed the problem, but I haven't fixed the problem in the correct way. Before starting I want to discuss the reasons for splitting a class file into multiple files. I have tried to put all the possible scenarios here - if I missed any, please remind me and I can make changes. Hopefully the following are correct:
Reason 1 To save space:
You have a file containing the declaration of a class with all class members. You place #include guards around this file (or #pragma once) to ensure no conflicts arise if you #include the file in two different header files which are then included in a source file. You compile a separate source file with the implementation of any methods declared in this class, as it offloads many lines of code from your source file, which cleans things up a bit and introduces some order to your program.
Example: As you can see, the below example could be improved by splitting the implementation of the class methods into a different file. (A .cpp file)
// my_class.hpp
#pragma once
class my_class
{
public:
void my_function()
{
// LOTS OF CODE
// CONFUSING TO DEBUG
// LOTS OF CODE
// DISORGANIZED AND DISTRACTING
// LOTS OF CODE
// LOOKS HORRIBLE
// LOTS OF CODE
// VERY MESSY
// LOTS OF CODE
}
// MANY OTHER METHODS
// MEANS VERY LARGE FILE WITH LOTS OF LINES OF CODE
};
Reason 2 To prevent multiple definition linker errors:
Perhaps this is the main reason why you would split implementation from declaration. In the above example, you could move the method body to outside the class. This would make it look much cleaner and structured. However, according to this question, the above example has implicit inline specifiers. Moving the implementation from within the class to outside the class, as in the example below, will cause you linker errors, and so you would either inline everything, or move the function definitions to a .cpp file.
Example: The example below will cause "multiple definition linker errors" if you do not move the function definition to a .cpp file or specify the function as inline.
// my_class.hpp
void my_class::my_function()
{
// ERROR! MULTIPLE DEFINITION OF my_class::my_function
// This error only occurs if you #include the file containing this code
// in two or more separate source (compiled, .cpp) files.
}
To fix the problem:
//my_class.cpp
void my_class::my_function()
{
// Now in a .cpp file, so no multiple definition error
}
Or:
// my_class.hpp
inline void my_class::my_function()
{
// Specified function as inline, so okay - note: back in header file!
// The very first example has an implicit `inline` specifier
}
Reason 3 You want to save space, again, but this time you are working with a template class:
If we are working with template classes, then we cannot move the implementation to a source file (.cpp file). That's not currently allowed by (I assume) either the standard or by current compilers. Unlike the first example of Reason 2, above, we are allowed to place the implementation in the header file. According to this question the reason is that template class methods also have implied inline specifiers. Is that correct? (It seems to make sense.) But nobody seemed to know on the question I have just referenced!
So, are the two examples below identical?
// some_header_file.hpp
#pragma once
// template class declaration goes here
template<typename T>
class some_class
{
public:
void class_method();
// Some code
};
// Example 1: NO INLINE SPECIFIER
template<typename T>
void some_class<T>::class_method()
{
// Some code
}
// Example 2: INLINE specifier used (alternative to Example 1)
template<typename T>
inline void some_class<T>::class_method()
{
// Some code
}
If you have a template class header file, which is becoming huge due to all the functions you have, then I believe you are allowed to move the function definitions to another header file (usually a .tpp file?) and then #include file.tpp at the end of your header file containing the class declaration. You must NOT include this file anywhere else, however, hence the .tpp rather than .hpp.
I assume you could also do this with the inline methods of a regular class? Is that allowed also?
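For reference, here is a sketch of the .hpp/.tpp split described above, collapsed into one listing. The member names plain and marked are hypothetical. As far as the One Definition Rule is concerned, member functions of a class template may be defined in several translation units with or without the inline keyword, so both variants below are accepted when the header is included from multiple source files.

```cpp
// --- some_header_file.hpp: class template definition ---
template <typename T>
class some_class {
public:
    explicit some_class(T v) : value(v) {}
    T plain() const;      // defined below without `inline`
    T marked() const;     // defined below with `inline`
private:
    T value;
};

// --- some_header_file.tpp: included at the end of the .hpp ---
template <typename T>
T some_class<T>::plain() const { return value; }

template <typename T>
inline T some_class<T>::marked() const { return value; }
```

For a regular (non-template) class the same split works only if every function in the .tpp file is marked inline.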
Question Time
So I have made some statements above, most of which relate to the structuring of source files. I think everything I said was correct, because I did some basic research and "found out some stuff", but this is a question and so I don't know for sure.
What this boils down to, is how you would organize code within files. I think I have figured out a structure which will always work.
Here is what I have come up with. (This is my class code file organization/structure standard, if you like. Don't know if it will be very useful yet, that's the point of asking.)
1: Declare the class (template or otherwise) in a .hpp file, including all methods, friend functions and data.
2: At the bottom of the .hpp file, #include a .tpp file containing the implementation of any inline methods. Create the .tpp file and ensure all methods are specified to be inline.
3: All other members (non-inline functions, friend functions and static data) should be defined in a .cpp file, which #includes the .hpp file at the top to prevent errors like "class ABC has not been declared". Since everything in this file will have external linkage, the program will link correctly.
Do standards like this exist in industry? Will the standard I came up with work in all cases?

Your three points sound about right. That's the standard way to do things (although I've not seen .tpp extension before, usually it's .inl), although personally I just put inline functions at the bottom of header files rather than in a separate file.
Here is how I arrange my files. I omit the forward declare file for simple classes.
myclass-fwd.h
#pragma once
namespace NS
{
class MyClass;
}
myclass.h
#pragma once
#include "headers-needed-by-header"
#include "myclass-fwd.h"
namespace NS
{
class MyClass
{
..
};
}
myclass.cpp
#include "headers-needed-by-source"
#include "myclass.h"
namespace
{
void LocalFunc();
}
NS::MyClass::...
Replace pragma with header guards according to preference..
The reason for this approach is to reduce header dependencies, which slow down compile times in large projects. If you didn't know, you can forward declare a class in order to use it as a pointer or reference. The full definition is only needed when you construct or copy an object, or use members of the class.
This means another class which uses the class (takes parameters by pointer/reference) only has to include the fwd header in its own header. The full header is then included in the second class's source file. This greatly reduces the amount of unneeded rubbish you get when pulling in a big header, which pulls in another big header, which pulls in another...
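A sketch of that arrangement, with a hypothetical second class named User and the files collapsed into one listing:

```cpp
// --- myclass-fwd.h ---
namespace NS { class MyClass; }

// --- user.h: only needs the forward declaration ---
class User {
public:
    explicit User(NS::MyClass& target) : target_(&target) {}
    int Query() const;   // defined where MyClass is complete
private:
    NS::MyClass* target_;
};

// --- myclass.h ---
namespace NS {
class MyClass {
public:
    int Value() const { return 5; }
};
}

// --- user.cpp: includes myclass.h, so members can be used ---
int User::Query() const { return target_->Value(); }
```

Files that include user.h never see the full definition of NS::MyClass, so a change to myclass.h does not force them to be recompiled.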
The next tip is the unnamed namespace (sometimes called the anonymous namespace). It is normally used only in a source file, and it acts like a hidden namespace visible only to that file. You can place local functions, classes etc. here which are only used by the source file. This prevents name clashes if you create something with the same name in two different files. (Two local functions named F, for example, may otherwise give linker errors.)
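A minimal sketch of the unnamed namespace in use. The Compute function and the body of LocalFunc are invented for illustration.

```cpp
// --- myclass.cpp (sketch) ---
namespace {
// Visible only inside this translation unit; another .cpp file may
// define its own LocalFunc without clashing at link time.
int LocalFunc(int x) { return x * 2; }
}  // unnamed namespace

namespace NS {
int Compute(int x) { return LocalFunc(x) + 1; }
}
```

Only NS::Compute has external linkage here; LocalFunc cannot be referenced from any other source file.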

The main reason to separate interface from implementation is so that you don't have to recompile all of your code when something in the implementation changes; you only have to recompile the source files that changed.
As for "Declare the class (template or otherwise)", a template is not a class. A template is a pattern for creating classes. More important, though, you define a class or a template in a header. The class definition includes declarations of its member functions, and non-inline member functions are defined in one or more source files. Inline member functions and all template functions should be defined in the header, by whatever combination of direct definitions and #include directives you prefer.

Do standards like this exist in industry?
Yes. Then again, coding standards that are rather different from the ones you expressed can also be found in industry. You are talking about coding standards, after all, and coding standards range from good to bad to ugly.
Will the standard I came up with work in all cases?
Absolutely not. For example,
template <typename T> class Foo {
public:
void some_method (T& arg);
...
};
Here, the definition of class template Foo doesn't know a thing about that template parameter T. What if, for some class template, the definitions of the methods vary depending on the template parameters? Your rule #2 just doesn't work here.
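One way a method definition can vary with the template parameter is an explicit specialization of a single member. This is a sketch only; the return type and the strings are my own choices so that the difference is observable.

```cpp
#include <string>

template <typename T>
class Foo {
public:
    // Generic version, used for every T without a specialization.
    std::string some_method(T&) { return "generic"; }
};

// Explicit specialization of one member for T = int. A full
// specialization is an ordinary function, so in a header it must be
// marked `inline` (or be defined in exactly one .cpp file).
template <>
inline std::string Foo<int>::some_method(int&) { return "int"; }
```

Here the specialization is subject to the usual multiple-definition rules, which is why a blanket rule about where template code goes needs exceptions.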
Another example: What if the corresponding source file is huge, a thousand lines long or longer? At times it makes sense to provide the implementation in multiple source files. Some standards go to the extreme of dictating one function per file (personal opinion: Yech!).
At the other extreme of a thousand-plus line long source file is a class that has no source files. The entire implementation is in the header. There's a lot to be said for header-only implementations. If nothing else, it simplifies, sometimes significantly, the linking problem.

What determines which class definition is included for identically-named classes in two source files?

If I have two source files in a project that each define a class of the same name, what determines which version of the class is used?
For example:
// file1.cpp:
#include <iostream>
#include "file2.h"
struct A
{
A() : a(1) {}
int a;
};
int main()
{
// foo() <-- uncomment this line to draw in file2.cpp's use of class A
A a; // <-- Which version of class A is chosen by the linker?
std::cout << a.a << std::endl; // <-- Is "1" or "2" output?
}
...
//file2.h:
void foo();
...
// file2.cpp:
#include <iostream>
#include "file2.h"
struct A
{
A() : a(2) {}
int a;
};
void foo()
{
A a; // <-- Which version of class A is chosen by the linker?
std::cout << a.a << std::endl; // <-- Is "1" or "2" output?
}
I have been able to get different versions of A to be selected by the linker, with identical code - merely by changing the order in which I type the code (building along the way).
Granted, it's poor programming practice to include different definitions of classes in the same namespace with the same name. However, are there defined rules that determine which class will be selected by the linker - and if so, what are they?
As a useful addendum to this question, I would like to know (in general) how the compiler / linker handles classes - does the compiler, when it builds each source file, incorporate the class name and compiled class definition within the object file, whereas the linker (in the scenario of a name clash) throws away one set of compiled class function/member definitions?
The issue of a name clash is not arcane - I now realize that it happens EVERY TIME a header-only template file is #included by two or more source files (and subsequently the same template classes are instantiated, and the same member functions called, in these multiple source files), as is a common scenario with the STL. Each source file must have a separately-compiled version of the same instantiated template class functions, so the linker MUST be selecting among different such compiled versions of these functions at linkage time), I would think.
-- ADDENDUM with related question about Java --
I note that various answers have indicated the One Definition Rule (http://en.wikipedia.org/wiki/One_definition_rule) for C++. As an interesting aside, am I correct that Java has NO SUCH rule - so that multiple, different definitions ARE allowed in Java by the Java specifications?

If a C++ program provides two different definitions of the same class (i.e., identically named and within the same namespace), the program violates the rules of the standard and you'll get undefined behavior. What exactly happens depends on the compiler and linker: sometimes you get a linker error, but this isn't required.
The obvious fix is not to have conflicting class names. The easiest approach to obtain unique class names is to define locally used types within an unnamed namespace:
// file1.cpp
namespace {
class A { /*...*/ };
}
// file2.cpp
namespace {
class A { /*...*/ };
}
These two classes won't conflict.

Such a program violates the One Definition Rule and exhibits undefined behavior.
If there are multiple definitions of a class or inline function in a program (in different translation units, or source files), then all of the definitions must be identical. Neither the compiler nor the linker is required to diagnose all violations of this rule (and not all violations can be easily diagnosed).

This is only successfully linking because the definitions for the 2 constructors are implied to be inline. Try moving them under the class and not using the inline keyword. The kind of linkage you are abusing tells the linker that there will be multiple definitions, where normally it would error that you're breaking the One Definition Rule, which you are actually breaking. Normally this condition that allows you to seemingly break ODR exists for things like templates, which will always have multiple identical definitions in different translation units. But that's the condition: definitions in different translation units must be identical.
Ultimately it's up to your compiler which gets used, in your example.

The toolchain may warn about multiple definitions if you enable the relevant diagnostics (which you should).
The GNU linker resolves symbols in the order in which you present files on the command line, so it uses the first definition that it sees. I am not sure whether all linkers work the same way.

The reason the One Definition Rule exists is so that it doesn't matter which definition is used: they're all identical. It's completely up to the compiler and linker in question which version is used, or whether they're consistent. The only externally visible effect is that when there's a static variable inside a function, a single instance of the variable must be shared between all instantiations of the function.
By violating the One Definition Rule you're exposing the mechanics of the compiler/linker in a way that isn't relevant to a correctly written program.

Undefined reference to ClassName::ClassName

I'm using Code::Blocks to build my project, which contains three files: main.cpp, TimeSeries.cpp, TimeSeries.h. TimeSeries.h provides declarations for the TimeSeries class as follows:
template<class XType, class YType> class TimeSeries {
public:
TimeSeries(void);
~TimeSeries(void);
};
Then TimeSeries.cpp contains:
#include "TimeSeries.h"
template<class XType, class YType>
TimeSeries<XType, YType>::TimeSeries(void) {
}
template<class XType, class YType>
TimeSeries<XType, YType>::~TimeSeries(void) {
}
And finally, main.cpp contains
#include "TimeSeries.h"
typedef TimeSeries<float, float> FTimeSeries;
int main(int argc, char** argv) {
FTimeSeries input_data;
return 0;
}
When building with C::B, I get the following error:
undefined reference to `TimeSeries<float, float>::TimeSeries()'
What can I do?
Thanks,
CFP.

Basically all templated code should be defined in a header, otherwise it will not be built since nothing uses it in the compiled unit.
Each cpp file is compiled as a separate unit, and thus constructor and destructor are not compiled. The compiler has no way of knowing what type of template argument you will use in main.cpp when it compiles TimeSeries.cpp.

The reason for splitting code into header- and source-files is so that the declaration and the implementation are separated. The compiler can translate the source-file (compilation unit) into an object file, and other compilation-units that want to use the classes and functions just include the header-file, and link the object file. This way, the code has to be compiled only once, and can be reused by linking it.
The problem with templates is that, as long as no parameters are provided for them, the compiler cannot compile them. The same template instantiated with different parameters results in different types. std::vector<int> and std::vector<float> are, from the compiler's perspective, not related in any way. Because of this, template classes usually have to reside completely in a header file, because, when the template is used, the compiler needs the complete definition in order to generate the class depending on the parameters.
As @Gabriel Schreiber pointed out in his answer, you can tell the compiler that it should compile the template with a specific set of parameters, making that instantiation available to other compilation units just by linking. However, this does not make the template available for other parameter sets.

You need to add this in your .cpp-file (below the definitions):
template class TimeSeries<float, float>;
When the compiler compiles TimeSeries.cpp, it doesn't know for which types the template is needed, because the template is used in another source file. You need to tell the compiler explicitly.
Read about Explicit Template Instantiation in your copy of the Stroustrup or on the internet.
