Linking/compile time concerning static template libraries - c++

It seems to be a common convention not to use source files for template based classes (STL and boost) and to put the implementation into the header as well. I assume that this will increase the time it takes to compile the source files that include the header drastically compared to the classic separation between declaration and implementation in header and source files. The reason why this is done is probably due to the fact that you would have to tell the compiler in the source file which templates to use, which will probably result in a bloated .a file.
Assuming the linker also requires more time as the library grows, which approach would be faster in terms of the time it takes to compile a source file that includes the library header?
1. Not using a .cpp file and putting the entire class, including the implementation, into the header
//foo.hpp
template <class T>
class Foo
{
public:
    Foo() {}
    T bar()
    {
        T t = T(); // value-initialized placeholder (avoids dereferencing a null pointer)
        //do stuff
        return t;
    }
};
or
2. Explicitly compiling the template for various types inside the source file of the library itself
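//foo.h
template <class T>
class Foo
{
public:
    Foo() {}
    T bar();
};
//foo.cpp
#include "foo.h"
template <class T>
T Foo<T>::bar()
{
    T t = T(); // value-initialized placeholder (avoids dereferencing a null pointer)
    //do stuff
    return t;
}
// explicit instantiations: only these four types can be used by clients
template class Foo<int>;
template class Foo<float>;
template class Foo<double>;
template class Foo<long long>;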

The key issue with templates is that the compiler won't know for which template arguments a template will be used. The only time the compiler knows a template is used with a specific set of arguments is when it sees the template used and the compiler will instantiate the template at this point. As a result the code is often put into the header so the compiler can instantiate the template on the user's behalf when it is used.
Alternatively, the author of a template can tell the compiler that a template is used with a specific list of template arguments and just instantiate them explicitly. In this case, the template definition can go into a source file (or, more likely, a special header not generally included by users). The problem with this approach is that the author of the template code doesn't necessarily know which instantiations are needed.
In C++ 2011 there is also a middle ground: It is possible to tell the compiler that certain instantiations are already created by declaring an explicit instantiation as extern (an explicit instantiation declaration). This way, the compiler knows that it doesn't need to instantiate the template with certain arguments, but if other arguments are used it knows that it needs to create them. For example, the standard C++ library has std::basic_string and it can predict that instantiations for char and wchar_t are likely to be used and can put them into the library, declaring the instantiations as extern. However, having the code readily available makes it viable to use std::basic_string<user_type> with user defined types.
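As a rough sketch of that middle ground, collapsed into one file for brevity: the extern template line is what a header would carry, and the matching template class line is what exactly one source file of the library would carry. The names Box, box.hpp, box.cpp and run_extern_template_example are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <string>

// --- would live in box.hpp (hypothetical header) ---
template <class T>
class Box {
public:
    explicit Box(T v) : value_(v) {}
    T get() const { return value_; }
    T twice() const { return value_ + value_; }
private:
    T value_;
};

// Explicit instantiation declaration (C++11): promises that Box<int> is
// instantiated elsewhere, so including files need not emit it again.
extern template class Box<int>;

// --- would live in box.cpp (hypothetical source file, compiled once) ---
// Explicit instantiation definition: emits every member of Box<int> here.
template class Box<int>;

// --- would live in user code ---
inline void run_extern_template_example() {
    Box<int> a(21);                         // relies on the pre-built instantiation
    Box<std::string> b(std::string("ab"));  // not declared extern: instantiated implicitly
    assert(a.twice() == 42);
    assert(b.twice() == "abab");
}
```

In a real library the two marked sections would sit in separate files; within a single file the declaration must precede the definition, as it does here.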
Going forward we hope to get a module system but right now nobody really knows how such a system should really work. The compiler implementers who are interested in the topic have a group to think about modules and it is likely that a system like this may help with compile times for templates.

Normally, the compiler generates code once for a particular compilation unit and the linker then ensures that other compilation units can access the variables and functions.
When it comes to templates, this is no longer the case. The compiler is not able to generate code for a template before a concrete instance of the template is instantiated. We therefore instantiate the template in every compilation unit and the only way to do this without copy and paste is to place all the template code in the header file.
The linker also then has to be template aware and reconcile multiple instances of the same object.
C++11 does help a bit with this scenario where one can declare:
template class MyTemplate<MyType>;
in a single C++ file and then use:
extern template class MyTemplate<MyType>;
The latter does not instantiate the template in the current compilation unit but will get the linker to link to one already defined.

There are already two good answers, so I will just write briefly.
Templated code cannot be compiled without concrete types, which is why you can't do a classic "compile-and-link".
A template function or class is not complete, in the sense that not all types are resolved, until its specific usage in the code (where a certain type replaces the template parameter).
For this reason, templates are put in the header files and cannot be compiled independent of specific usage.

Related

What does the compiler do when you make a function template? [duplicate]

I'm reading a book about how templates work, and I'm having difficulty understanding this explanation of templates.
It says
When the compiler sees the definition of a template, it does not generate code. It generates code only when we instantiate a specific instance of the template. The fact that code is generated only when we use a template (and not when we define it) affects how we organize our source code and when errors are detected...To generate an instantiation, the compiler needs to have the code that defines a function template or class template member function. As a result, unlike non-template code, headers for templates typically include definitions as well as declarations.
What exactly does it mean by "generate code"? I don't understand what is different when you compile function templates or class templates compared to regular functions or classes.
The compiler generates the code for the specific types given in the template class instantiation.
If you have for instance a template class declaration as
template<typename T>
class Foo
{
public:
T& bar()
{
return subject;
}
private:
T subject;
};
as soon as you have, for example, the following instantiations
Foo<int> fooInt;
Foo<double> fooDouble;
these will effectively generate the same linkable code as if you had defined classes like
class FooInt
{
public:
int& bar()
{
return subject;
}
private:
int subject;
};
and
class FooDouble
{
public:
double& bar()
{
return subject;
}
private:
double subject;
};
and instantiate the variables like
FooInt fooInt;
FooDouble fooDouble;
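To make that equivalence a little more concrete, here is a minimal sketch that asks the compiler to confirm that Foo<int> and Foo<double> are two unrelated classes whose bar() members have different return types. It uses only standard <type_traits> and <utility> facilities; run_foo_example is a hypothetical helper, and the member is value-initialized here so the example has defined behaviour.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename T>
class Foo {
public:
    T& bar() { return subject; }
private:
    T subject{};
};

// Each set of template arguments names a separate class type.
static_assert(!std::is_same<Foo<int>, Foo<double>>::value,
              "Foo<int> and Foo<double> are unrelated types");

// bar() has a different return type in each instantiation.
static_assert(std::is_same<decltype(std::declval<Foo<int>&>().bar()), int&>::value,
              "Foo<int>::bar returns int&");
static_assert(std::is_same<decltype(std::declval<Foo<double>&>().bar()), double&>::value,
              "Foo<double>::bar returns double&");

inline void run_foo_example() {
    Foo<int> fooInt;
    Foo<double> fooDouble;
    fooInt.bar() = 3;       // writes through the int& returned by bar()
    fooDouble.bar() = 1.5;  // writes through the double& returned by bar()
    assert(fooInt.bar() == 3);
    assert(fooDouble.bar() == 1.5);
}
```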
As for why template definitions (not to be confused with declarations, a distinction that applies with or without templates) need to be visible through the included header files, the reason is straightforward:
The compiler can't generate this code without seeing the definition. At the linking stage, though, it can refer to a matching instantiation that was already emitted elsewhere.
What does a non-template member function have that allows for it to be defined outside of the header that a template function doesn't have?
The declaration of a non-template class/member/function gives a predefined entry point for the linker. The definition can be drawn from a single implementation seen in a compiled object file (== .cpp == compilation unit).
In contrast, the declaration of a templated class/member/function might be instantiated from arbitrary compilation units given the same or varying template parameters. The definition for these template parameters needs to be seen at least once. It can be either generic or specialized.
Note that you can specialize template implementations for particular types anyway (included with the header or at a specific compilation unit).
If you provide a specialization for your template class in one of your compilation units, and don't use your template class with types other than the specialized ones, that should also suffice for linking it all together.
I hope this sample helps clarify the difference and the work the compiler does.
A template is a pattern for creating code. When the compiler sees the definition of a template it makes notes about that pattern. When it sees a use of that template it digs out its notes, figures out how to apply the pattern at the point where it's being used, and generates code according to the pattern.
What is the compiler supposed to do when it sees a template? Generate all the machine code for all possible data types - ints, doubles, floats, strings, ...? That could take a lot of time. Or just be a little lazy and generate the machine code for what it requires.
I guess the latter option is the better solution and gets the job done.
The main point here is that the compiler does not turn a template definition into code until it meets a concrete instance of the template. (Then it can proceed, I guess, as if it had an ordinary class, which is a specific case of the template class with fixed template parameters.)
The direct answer to your question is: the compiler generates machine code from the user's C++ code; I think this is what is meant here by "generate code".
The template definition must be in the header file because when the compiler compiles a source file that uses the template, it has only the header file (included in the source with the #include directive), but it needs the whole template definition. So the logical conclusion is that the template definition must be in the header.
When you create a function and compile it, the compiler generates code for it. Many compilers will not generate code for static functions that are not used.
If you create a templated function and nothing uses the template (such as std::sort), the code for the function will not be generated.
Remember, templates are like stencils. The templates tell how to generate a class or function using the given template parameters. If the stencil is not used, nothing is generated.
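The stencil idea also holds for individual member functions of a class template: as a small sketch, the member less_than below would not compile for NoCompare, yet Wrapper<NoCompare> is usable as long as that member is never called. Wrapper, NoCompare and run_lazy_instantiation_example are hypothetical names chosen for this illustration.

```cpp
#include <cassert>

template <typename T>
struct Wrapper {
    T value;
    const T& get() const { return value; }
    // Needs T to support operator<. The body is only compiled for a given T
    // if some code actually calls less_than on a Wrapper<T>.
    bool less_than(const Wrapper& other) const { return value < other.value; }
};

struct NoCompare { int id; };  // deliberately has no operator<

inline void run_lazy_instantiation_example() {
    Wrapper<int> a{1}, b{2};
    assert(a.less_than(b));              // instantiates Wrapper<int>::less_than

    Wrapper<NoCompare> w{NoCompare{7}};  // fine: less_than is never used for NoCompare
    assert(w.get().id == 7);
    // w.less_than(w);                   // uncommenting this line would fail to compile
}
```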
Consider also that the compiler doesn't know how to implement or use the template until it sees all the template parameters resolved.
It won't generate code straight away. It only generates the class or function code when it comes across an instantiation of that template, that is, when you actually use the template with concrete arguments, for example by creating an object of that type.
In essence, templates allow you to abstract away from types. If you need two instantiations of the template class, for example for an int and a double, the compiler will literally create two of these classes for you when you need them. That is what makes templates so powerful.
Your C++ is read by the compiler and turned into assembly code, before being turned into machine code.
Templates are designed to allow generic programming. If your code doesn't use your template at all, the compiler won't generate the assembly code associated. The more data types you associate your template with in your program, the more assembly code it will generate.

C++ Template Instantiation - Why must mine always be explicit, unlike STL?

Any one of my C++ projects will generate a linker error unless I include an explicit template instantiation for every templated class/method/function I author and use.
STL classes seem to have no such problem.
Is there some simple code of conduct (pun intended) I can adhere to which allows deferred instantiation like that of STL?
Thanks for listening.
For templates you need to put all the template code and methods into headers rather than source files. The standard library does this.
You should put most parts of a template within the header file. This applies equally to template classes as well as template functions within normal classes.
The exception to this rule that you should know about is that when you fully specialize a function template, you need to put the specialization's definition in the implementation/.cpp file (or mark it inline), because a full specialization is a concrete function rather than a template. The actual template definitions, on the other hand, need to be run through multiple times, once for each template parameter type used with the template, so they must go in the header file (they're not concrete type-wise).
e.g. put:
template <typename T> void foo(T val) { std::cerr << "DEFAULT"; }
in the header file and its specialization for an int:
template<> void foo<int>(int val) { std::cerr << "INT"; }
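in the cpp file, because the int version is a concrete, fully defined function, whereas the T version is a template definition which will be used many times to generate many concrete functions. The specialization should also be declared in the header, so that every file that calls foo with an int knows it exists.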
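A minimal single-file sketch of that layout follows, with comments marking which part would go where. To make the behaviour checkable, the functions here return a string instead of writing to std::cerr; that change, the file names foo.h and foo.cpp, and run_specialization_example are assumptions of this illustration.

```cpp
#include <cassert>
#include <string>

// --- foo.h (hypothetical): primary template, defined in the header ---
template <typename T>
std::string foo(T) { return "DEFAULT"; }

// Declaration of the full specialization. It belongs in the header too, so
// that no translation unit instantiates the primary template for int.
template <>
std::string foo<int>(int);

// --- foo.cpp (hypothetical): the specialization is an ordinary function,
// so its single definition goes in one source file (or is marked inline) ---
template <>
std::string foo<int>(int) { return "INT"; }

inline void run_specialization_example() {
    assert(foo(42) == "INT");        // T deduced as int: uses the specialization
    assert(foo(3.14) == "DEFAULT");  // T deduced as double: uses the primary template
}
```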
Most of the time a template class is defined completely in the header file. This allows the compiler to generate the body of code on the fly if it hasn't already come across the combination of function and template parameters that you're using.
You can still use a template with separate header and implementation files, but it's much less convenient. As you've discovered, you must anticipate each template parameter and put that combination in the implementation file, so that the compiler can generate the necessary code.

About C++ template and explicit declaration

I just spent about 20 minutes trying to figure out why some template methods of mine passed compilation but not linkage.
Turns out I needed to explicitly declare my template method.
It was something of this kind :
class Test {
public:
    template<class Source> void Save(Source& obj);
};
Then I would use it like this somewhere :
Test t;
ClassDerivedFromInterface obj;
t.Save(obj);
It compiled fine but didn't link. Until I added :
template void Test::Save<ClassDerivedFromInterface>(ClassDerivedFromInterface&);
I would like to understand in which case an explicit declaration is necessary.
Thanks
In a nutshell, you need to have the entire body (the definition) of a template function visible to the translation unit that instantiates the template. So when you say t.Save(obj);, that translation unit should have access to the definition of Save. Usually you achieve this by including the definitions of function templates in the header file itself.
The reason for this is that templates aren't ordinary code that gets compiled and can later be linked at will. Rather, templates are a code generation tool that generates the necessary code on demand - an automatic version of copy/paste followed by search-and-replace, if you will.
Therefore, the actual compilable code for your function Save(ClassDerivedFromInterface&) doesn't come into existence until you write that line. If only the declaration of the function template is visible, then the template only produces the declaration of the concrete function, but not its body, and so at link time you notice that the function is missing.
To recap, templates themselves cannot be compiled, it is only their concrete instances that can, and you have to pay attention to ensure that the concrete instances are always available when you instantiate them. Explicit instantiation as you have it works and allows you to package a few specific instances into a separate TU, but generally that's hard to maintain and not scalable, and there are other drawbacks to explicit instantiation that you avoid when you let the compiler instantiate implicitly. So usually it's best to package your entire definitions into the header file.
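For illustration, the declaration-only header plus explicit instantiation can be sketched in a single file as below. Record stands in for ClassDerivedFromInterface, and the file names and run_save_example are hypothetical. Because everything sits in one translation unit here, the sketch shows the syntax rather than proving the link-time effect.

```cpp
#include <cassert>
#include <string>

// Stand-in for ClassDerivedFromInterface (hypothetical type).
struct Record { std::string name; bool saved; };

// --- test.h (hypothetical): only the declaration is visible to callers ---
class Test {
public:
    template <class Source> void Save(Source& obj);
};

// --- test.cpp (hypothetical): the definition, hidden from callers ---
template <class Source>
void Test::Save(Source& obj) { obj.saved = true; }

// Explicit instantiation definition: forces Test::Save<Record> to be emitted
// in this translation unit so that callers elsewhere can link against it.
template void Test::Save<Record>(Record&);

inline void run_save_example() {
    Test t;
    Record r{"example", false};
    t.Save(r);  // in a real project this call would sit in another .cpp file
    assert(r.saved);
}
```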
You will need to explicitly instantiate the template if the template source is not visible at the time of compilation. This link covers it pretty well, also an awesome site in general:
C++ FAQ 35.13
You need explicit instantiation when your template definitions are not accessible from the code that is using them. Consider the following:
template.h - template declarations
template.cpp - template definitions
main.cpp - template usage
template.cpp is not included in main.cpp and is thus unreachable by the template user, so you need explicit instantiations.
But if your structure is:
template.h - template declarations and definitions
main.cpp - template usage
the template definitions are reachable by the template user, so you don't need explicit instantiations.
The compiler needs to know what types you're going to be using with your template. If you create a template class and then use it with, for example, int, char, and double, then the compiler will create methods for the template for those types. If you compile the template method in a separate compilation unit from where you use it, the compiler will not instantiate your template for the type you need. But if you explicitly instantiate the template, the compiler will create whatever you tell it to.

Why declaration/definition must both be in source file for template class in c++?

Anyone can elaborate the reason?
Source files are compiled independently of one another into object code, then later linked into the main program. Template functions, on the other hand, cannot be compiled without the template parameters. So the file that uses them needs to have that code in order for it to be compiled. Therefore the functions need to be visible in the header file.
Promised example:
template<class T>
void swap(T & a, T & b)
{
T temp = a;
a = b;
b = temp;
}
The only requirements of class T here are that it has a public copy constructor and a public assignment operator (=). That's just about every class that has ever been implemented or conceived. However, each class implements these operations in its own way. The same machine code cannot be generated for swap<int>, swap<double> and swap<string>. Each one of those functions has to be unique. At the same time, the compiler cannot possibly anticipate all the myriad different types that you might pass to this function, so it can't generate the functions ahead of time. So it has to wait until the function is called, and then it can get compiled.
For example, let's say I have that function above defined in "swap.h". Then in "main.cpp", I do this:
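#include <string>
#include "swap.h"

int main()
{
    int a = 5, b = 10;
    double c = 3.5, d = 7.9;
    std::string s1 = "hello";
    std::string s2 = "world";
    // Qualified with :: so the swap from swap.h is chosen rather than std::swap
    ::swap(a, b);
    ::swap(c, d);
    ::swap(s1, s2);
}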
In this example, 3 different functions were created. One to swap ints, one to swap doubles, and one to swap strings. In order to create those functions, the compiler needed to be able to see the template code. If it was in a separate source file, "swap.cpp" for example, the compiler wouldn't be able to see it, because like I said before, each source file is compiled independently of one another.
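One way to see that these are separate functions is to take their addresses, which also forces the compiler to instantiate them. The sketch below renames the template to my_swap (a hypothetical name, chosen to avoid clashing with std::swap) and keeps everything in one file.

```cpp
#include <cassert>
#include <string>

// Named my_swap (hypothetical) to avoid clashing with std::swap.
template <class T>
void my_swap(T& a, T& b) {
    T temp = a;
    a = b;
    b = temp;
}

// Taking the address of a specialization forces it to be instantiated;
// each pointer below refers to a different generated function.
void (*const swap_ints)(int&, int&) = &my_swap<int>;
void (*const swap_strings)(std::string&, std::string&) = &my_swap<std::string>;

inline void run_swap_example() {
    int a = 5, b = 10;
    std::string s1 = "hello", s2 = "world";
    swap_ints(a, b);        // calls the int instantiation
    swap_strings(s1, s2);   // calls the std::string instantiation
    assert(a == 10 && b == 5);
    assert(s1 == "world" && s2 == "hello");
}
```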
Are you asking why template bodies have to be in header files? It's because the compiler needs to know both the body and the template parameter(s) at the same time in order to generate machine code. The template parameters are known where the template is used (instantiated). This gives you one trivial case and two non-trivial ones:
1. (Trivial) The template is only used in one source file, so the body can be in that same source file.
2. Make the body available at every use, which often means in a header file.
3. In the source file which contains the body, explicitly instantiate every needed combination of template parameters.
The short answer to your question is that there is no obligation for declaration and definition of template classes to be in the same source files.
In fact, I consider this a bad thing, but it's completely understandable, given that it's pretty difficult to use them separately (but it can be done!).
EDIT
Suppose you have
myTemplateClass.h which declares a template class MyTemplateClass
myTemplateClass.hpp which defines its class members (includes myTemplateClass.h)
use of MyTemplateClass inside main.cpp
Simply include myTemplateClass.h in main.cpp and create myTemplateClassInt.cpp as follows :
#include "myTemplateClass.hpp"
template class MyTemplateClass<int>;
Doing that, you tell the compiler to instantiate all template methods of MyTemplateClass for template parameter "int". Since it has access to myTemplateClass.hpp, such methods will be generated flawlessly... And the linker won't complain.
Of course, this approach requires that you maintain some place where the instantiated versions of your template classes are defined, and that you extend it whenever a new template argument is needed.