How does the compilation of templates work? - c++

I'm reading a book about how templates work, and I'm having difficulty understanding this explanation of templates.
It says
When the compiler sees the definition of a template, it does not generate code. It generates code only when we instantiate a specific instance of the template. The fact that code is generated only when we use a template (and not when we define it) affects how we organize our source code and when errors are detected...To generate an instantiation, the compiler needs to have the code that defines a function template or class template member function. As a result, unlike non-template code, headers for templates typically include definitions as well as declarations.
What exactly does it mean by "generate code"? I don't understand what is different when you compile function templates or class templates compared to regular functions or classes.

The compiler generates the code for the specific types given in the template class instantiation.
If you have, for instance, a class template definition such as
template<typename T>
class Foo
{
public:
T& bar()
{
return subject;
}
private:
T subject;
};
as soon as you have, for example, the following instantiations
Foo<int> fooInt;
Foo<double> fooDouble;
these will effectively generate the same linkable code as if you had defined classes like
class FooInt
{
public:
int& bar()
{
return subject;
}
private:
int subject;
};
and
class FooDouble
{
public:
double& bar()
{
return subject;
}
private:
double subject;
};
and instantiated the variables like
FooInt fooInt;
FooDouble fooDouble;
Regarding the point that template definitions (not to be confused with declarations, which exist for templates and non-templates alike) need to be visible through the included header files, the reason is pretty clear:
The compiler can't generate this code without seeing the definition. If it sees only a declaration, it can still emit a reference to a matching instantiation, but that instantiation then has to be found somewhere else at the linking stage.
What does a non-template member function have that allows it to be defined outside of the header, which a template function doesn't have?
The declaration of a non-template class/member/function gives the linker a predefined entry point. The definition can be drawn from a single implementation in one compiled object file (i.e. one .cpp file, one compilation unit).
In contrast, a templated class/member/function might be instantiated from arbitrary compilation units, with the same or with varying template arguments. The definition for those template arguments needs to be seen at least once; it can be either the generic one or a specialization.
Note that you can specialize template implementations for particular types anyway (either in the header or in a specific compilation unit).
If you provide a specialization for your class template in one of your compilation units, declare it in the header, and don't use the class template with types other than the specialized ones, that should also suffice for linking it all together.
I hope this sample helps clarify the difference and the work the compiler does.
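To make the "separate classes" point concrete, here is a minimal sketch that lets the compiler itself confirm it. It reuses the Foo template from above with one small assumption: subject is value-initialized ({}) so that it can be read safely before it is assigned.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Foo from the answer above; `subject` is value-initialized here (a small
// change) so that reading it before assignment is well-defined.
template <typename T>
class Foo {
public:
    T& bar() { return subject; }

private:
    T subject{};
};

// Foo<int> and Foo<double> are two distinct, unrelated class types.
static_assert(!std::is_same<Foo<int>, Foo<double>>::value,
              "each instantiation is its own class");

// Each generated bar() has the return type for its own T.
static_assert(std::is_same<decltype(std::declval<Foo<int>&>().bar()), int&>::value,
              "Foo<int>::bar returns int&");
static_assert(std::is_same<decltype(std::declval<Foo<double>&>().bar()), double&>::value,
              "Foo<double>::bar returns double&");
```

Nothing in Foo<int> is shared with Foo<double>; if a program only ever used Foo<int>, no bar() for double would be generated at all.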

A template is a pattern for creating code. When the compiler sees the definition of a template it makes notes about that pattern. When it sees a use of that template it digs out its notes, figures out how to apply the pattern at the point where it's being used, and generates code according to the pattern.

What is the compiler supposed to do when it sees a template? Generate all the machine code for all possible data types (ints, doubles, floats, strings, ...)? That could take a lot of time, and since users can always define new types, the list never ends. Or it can just be a little lazy and generate the machine code for what it actually requires.
I guess the latter option is the better solution and gets the job done.
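One visible consequence of this laziness, sketched below with a hypothetical Box template: member functions of a class template are only instantiated when they are used, so a member that would not compile for int causes no error as long as nobody calls it on a Box<int>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class template used only to illustrate lazy instantiation.
template <typename T>
struct Box {
    T value;

    T get() const { return value; }

    // Valid only for types with a size() member. The body is instantiated
    // only if length() is actually called for a particular T.
    std::size_t length() const { return value.size(); }
};

// Box<int> compiles because Box<int>::length() is never used.
inline int use_int_box() {
    Box<int> b{42};
    return b.get();
}

// For std::string, length() is used, so its body is generated and checked.
inline std::size_t use_string_box() {
    Box<std::string> b{"hello"};
    return b.length();
}
```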

The main point here is that the compiler does not generate code from a template definition until it meets a specific instantiation of the template. (Then it can proceed, I guess, as if it had an ordinary class, which is a specific case of the class template with fixed template arguments.)
The direct answer to your question is: the compiler generates machine code from the user's C++ code, and I think this is what is meant here by the phrase "generate code".
The template definition must be in the header file because when the compiler compiles a source file that uses the template, it has only the header files (pulled into the source with the #include directive), but it needs the whole template definition. So the logical conclusion is that the template definition must be in the header.

When you create a function and compile it, the compiler generates code for it. Many compilers will not generate code for static functions that are not used.
If you create a function template and nothing uses it (std::sort is one such template), no code for the function will be generated.
Remember, templates are like stencils. The templates tell how to generate a class or function using the given template parameters. If the stencil is not used, nothing is generated.
Consider also that the compiler doesn't know how to implement or use the template until it sees all the template parameters resolved.
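A small sketch of that resolution step, using a hypothetical function template twice: the template argument is deduced from the call (or given explicitly), and only then does a concrete function such as twice<int> exist.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical function template: one stencil, many generated functions.
template <typename T>
T twice(T x) { return x + x; }

// T is deduced from the argument before any code is generated...
static_assert(std::is_same<decltype(twice(3)), int>::value, "twice<int>");
static_assert(std::is_same<decltype(twice(1.5)), double>::value, "twice<double>");

// ...or supplied explicitly, which produces yet another function.
static_assert(std::is_same<decltype(twice<long>(3)), long>::value, "twice<long>");
```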

It won't generate code straight away. It generates the class or function code only when it comes across an instantiation of that template, that is, when you actually use it, for example by creating an object of that class template.
In essence, templates allow you to abstract away from types. If you need two instantiations of a class template, for example for an int and a double, the compiler will literally create two of these classes for you when you need them. That is what makes templates so powerful.

Your C++ is read by the compiler and turned into assembly code, before being turned into machine code.
Templates are designed to allow generic programming. If your code doesn't use your template at all, the compiler won't generate the associated assembly code. The more data types you instantiate your template with in your program, the more assembly code it will generate.


Linking/compile time concerning static template libraries

It seems to be a common convention not to use source files for template-based classes (STL and Boost) and to put the implementation into the header as well. I assume that this drastically increases the time it takes to compile the source files that include the header, compared to the classic separation of declaration and implementation into header and source files. The reason this is done is probably that you would otherwise have to tell the compiler, in the source file, which template instantiations to generate, which would probably result in a bloated .a file.
Assuming the linker also requires more time as the library grows, which approach would be faster in terms of the time it takes to compile a source file that includes the library header?
1. Not using a .cpp file and putting the entire class, including the implementation, into the header
//foo.hpp
template <class T>
class Foo
{
public:
Foo(){};
T bar()
{
T t = T(); // placeholder value; the original dereferenced a null pointer
//do stuff
return t;
}
};
or
2. Explicitly compiling the template for various types inside the source file of the library itself
//foo.h
template <class T>
class Foo
{
public:
Foo(){};
T bar();
};
//foo.cpp
#include "foo.h"
template <class T>
T Foo<T>::bar()
{
T t = T(); // placeholder value; the original dereferenced a null pointer
//do stuff
return t;
}
template class Foo<int>;
template class Foo<float>;
template class Foo<double>;
template class Foo<long long>;
The key issue with templates is that the compiler won't know for which template arguments a template will be used. The only time the compiler knows a template is used with a specific set of arguments is when it sees the template used and the compiler will instantiate the template at this point. As a result the code is often put into the header so the compiler can instantiate the template on the user's behalf when it is used.
Alternatively, the author of a template can tell the compiler that a template is used with a specific list of template arguments and just instantiate them explicitly. In this case, the template definition can go into a source file (or, more likely, a special header not generally included by users). The problem with this approach is that the author of the template code doesn't necessarily know which instantiations are needed.
In C++11 there is also a middle ground: it is possible to tell the compiler that certain instantiations already exist elsewhere by declaring an explicit instantiation as extern (extern template). This way, the compiler knows that it doesn't need to instantiate the template with those arguments, but if other arguments are used it knows that it needs to create those instantiations. For example, the standard C++ library has std::basic_string, and it can predict that instantiations for char and wchar_t are likely to be used, so it can put them into the library and declare those instantiations as extern. Having the code readily available in the header, however, still makes it viable to use std::basic_string<user_type> with user-defined types.
Going forward we hope to get a module system but right now nobody really knows how such a system should really work. The compiler implementers who are interested in the topic have a group to think about modules and it is likely that a system like this may help with compile times for templates.
Normally, the compiler generates code once for a particular compilation unit and the linker then ensures that other compilation units can access the variables and functions.
When it comes to templates, this is no longer the case. The compiler is not able to generate code for a template before a concrete instance of the template is instantiated. We therefore instantiate the template in every compilation unit and the only way to do this without copy and paste is to place all the template code in the header file.
The linker also then has to be template aware and reconcile multiple instances of the same object.
C++11 does help a bit with this scenario where one can declare:
template class MyTemplate<MyType>;
in a single C++ file and then use:
extern template class MyTemplate<MyType>;
The latter does not instantiate the template in the current compilation unit but will get the linker to link to one already defined.
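A self-contained sketch of the extern template pattern. MyTemplate here is a hypothetical stand-in and int plays the role of MyType. In a real project the extern line would sit in the header and the explicit instantiation in exactly one .cpp file; both are put into one file here only so the sketch compiles on its own.

```cpp
#include <cassert>

// Hypothetical stand-in for the answer's MyTemplate; int plays MyType.
template <typename T>
class MyTemplate {
public:
    T value{};
    T doubled() const { return value + value; }
};

// Header part: tells every including TU that MyTemplate<int> is
// instantiated elsewhere, so it should not instantiate it itself.
extern template class MyTemplate<int>;

// One .cpp file only: the explicit instantiation definition the linker
// resolves those references against. In a single TU the definition must
// follow the extern declaration, as it does here.
template class MyTemplate<int>;
```

Arguments not covered by an extern declaration, such as MyTemplate<double>, are still instantiated implicitly wherever they are used.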
There are already two good answers, so I will just write briefly.
Templated code cannot be compiled without concrete types, which is why you can't do a classic "compile-and-link".
A template function or class is not complete, in the sense that not all types are resolved, until its specific usage in the code (where a certain type replaces the template parameter).
For this reason, templates are put in the header files and cannot be compiled independent of specific usage.

About C++ template and explicit declaration

I just spent about 20 minutes trying to figure out why some of my template methods passed compilation but not linkage.
It turns out I needed to explicitly instantiate my template method.
It was something of this kind :
class Test {
public:
template<class Source> void Save(Source& obj);
};
Then I would use it like this somewhere :
Test t;
ClassDerivedFromInterface obj;
t.Save(obj);
It compiled fine but didn't link until I added:
template void Test::Save(ClassDerivedFromInterface&);
I would like to understand in which cases an explicit instantiation is necessary.
Thanks
In a nutshell, you need to have the entire body (the definition) of a template function visible to the translation unit that instantiates the template. So when you say t.Save(obj);, that translation unit should have access to the definition of Save. Usually you achieve this by including the definitions of function templates in the header file itself.
The reason for this is that templates aren't ordinary code that gets compiled and can later be linked at will. Rather, templates are a code generation tool that generates the necessary code on demand - an automatic version of copy/paste followed by search-and-replace, if you will.
Therefore, the actual compilable code for your function Save(ClassDerivedFromInterface&) doesn't come into existence until you write that line. If only the declaration of the function template is visible, then the template only produces the declaration of the concrete function, but not its body, and so at link time you notice that the function is missing.
To recap, templates themselves cannot be compiled, it is only their concrete instances that can, and you have to pay attention to ensure that the concrete instances are always available when you instantiate them. Explicit instantiation as you have it works and allows you to package a few specific instances into a separate TU, but generally that's hard to maintain and not scalable, and there are other drawbacks to explicit instantiation that you avoid when you let the compiler instantiate implicitly. So usually it's best to package your entire definitions into the header file.
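A minimal sketch of that usual fix for the question. The Interface and ClassDerivedFromInterface types are hypothetical (the original code does not show them), and Save just records the object's name. Because the member template is defined inside the class, i.e. in the header, t.Save(obj) instantiates it implicitly and no explicit instantiation is needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the question's interface and derived class.
struct Interface {
    virtual ~Interface() = default;
    virtual std::string name() const = 0;
};

struct ClassDerivedFromInterface : Interface {
    std::string name() const override { return "derived"; }
};

class Test {
public:
    // The definition is visible wherever the "header" is included, so any
    // translation unit calling Save can instantiate Save<Source> itself.
    template <class Source>
    void Save(Source& obj) { lastSaved = obj.name(); }

    std::string lastSaved;
};
```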
You will need to explicitly instantiate the template if the template source is not visible at the time of compilation. This link covers it pretty well (also an awesome site in general):
C++ FAQ 35.13
You need explicit instantiation when your template definitions are not accessible from the code that is using them. Consider the following:
template.h - template declarations
template.cpp - template definitions
main.cpp - template usage
template.cpp is not included in main.cpp and is thus unreachable by the template user, so you need explicit instantiations.
But if your structure is:
template.h - template declarations and definitions
main.cpp - template usage
the template definitions are reachable by the template user, so you don't need explicit instantiations.
The compiler needs to know what types you're going to be using with your template. If you create a template class and then use it with, for example, int, char, and double, then the compiler will create methods for the template for those types. If you compile the template method in a separate compilation unit from where you use it, the compiler will not instantiate your template for the type you need. But if you explicitly instantiate the template, the compiler will create whatever you tell it to.

Why do C++ template definitions need to be in the header? [duplicate]

Possible Duplicate:
Why should the implementation and the declaration of a template class be in the same header file?
E.g., when defining a class template, why do the implementations of the member functions need to be in the header? Why can't they be in an implementation file (.cpp/.cxx)?
A template class is not a class, it's a template that can be used to create a class. When you instantiate such a class, e.g. MyTemplate<int>, the compiler creates the class on the spot. In order to create it, it has to see all the templated member functions (so that it can use the templates to create actual member functions such as MyTemplate<int>::foo() ), and therefore these templated member functions must be in the header.
If the members are not in the header, the compiler will simply assume that they exist somewhere else and just create actual function declarations from the templated function declarations, and this gives you linker errors.
The "export" keyword was supposed to fix this, but few compilers ever supported it (I only know of Comeau), and it was removed from the language in C++11.
You can also explicitly instantiate MyTemplate<int> - then the compiler will create actual member functions for MyTemplate<int> when it compiles the cpp files containing the MyTemplate member function definition templates.
They need to be visible to the compiler when they are instantiated. That basically means that if you are publishing the template in a header, the definitions have to be visible to all translation units that include that header, if you depend on implicit instantiation.
They need not be defined in the header if you are going to explicitly instantiate the templates, but this is in most cases not a good idea.
As to the reason, it basically boils down to the fact that templates are not compiled when the compiler parses the definition, but rather when they are instantiated, and then they are compiled for the particular instantiation type.
If your compiler supports export, then they don't. Only EDG-based compilers ever supported export, and it was removed in C++11 (known at the time as C++0x) largely because of that.
Non-exported templates require that the compiler can see the full template definition, in order to instantiate it for the particular types you supply as arguments. For example:
template<typename T>
struct X {
T t;
X(int i): t(i) {}
};
Now, when you write X<float>(5) in some translation unit, the compiler as part of compiling that translation unit must check that the constructor of X is type-correct, generate the code for it, and so on. Hence it must see the definition of X, so that it can permit X<float>(5) but forbid X<char*>(5).
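A compilable sketch of the X example. The X<float> use goes through, while the X<char*> line is left commented out because uncommenting it makes the compiler reject the instantiated constructor. The helper make_float_x is hypothetical.

```cpp
#include <cassert>

template <typename T>
struct X {
    T t;
    X(int i) : t(i) {}
};

// Instantiating X<float> type-checks X(int) with T = float: `float t(i)` is fine.
inline float make_float_x() { return X<float>(5).t; }

// X<char*>(5) would need `char* t(5)`, and an int cannot initialize a pointer,
// so the following line is an error when uncommented:
// inline char* make_ptr_x() { return X<char*>(5).t; }
```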
The only sensible way to ensure that the compiler sees the same template definition in all translation units that use it, is to put the definition in a header file. As far as the standard is concerned, though, you're welcome to copy-and-paste it manually, or to define a template in a cpp file that is used only in that one translation unit.
export in effect tells the compiler that it must output a parsed form of the template definition into a special kind of object file. Then the linker performs template instantiation. With normal toolchains, the compiler is smart enough to perform template instantiation and the linker isn't. Bear in mind that template instantiation has to do pretty much everything that the compiler does beyond basic parsing.
They can be in a CPP file.
The problem arises from the fact that the compiler builds the code for a specific instantiation of a class template (e.g. std::vector<int>) on a per-translation-unit basis. The problem with defining the functions in a CPP file is that you then have to provide, in that CPP file, every form that is used elsewhere, either through explicit instantiation or through explicit specialization.
So for the int vector example above, you could define a function in a CPP file for the int case using specialization, e.g.
template<> void std::vector< int >::push_back( const int& intVal )
(This is only an illustration: the standard does not allow users to specialize member functions of standard library class templates, so in real code you would do this with your own templates.)
Of course doing this can produce the advantage of optimisation for specific cases but it does give you an idea of just how much code bloat can be introduced by STL! At least all the functions aren't defined as inline as a certain compiler used to do ;)
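Since specializing std::vector members is not allowed, here is the same idea sketched on a hypothetical user-defined template, Describer: the generic member is used for every type except int, for which an explicit specialization supplies a different body. In a multi-file project that specialization would have to be declared in the header before any use of Describer<int>::describe; marking it inline also lets its definition live in a header.

```cpp
#include <cassert>
#include <string>

// Hypothetical class template with one member function.
template <typename T>
struct Describer {
    std::string describe(const T&) const { return "generic"; }
};

// Explicit specialization of that member for T = int only. It must appear
// before Describer<int>::describe is used anywhere in the program.
template <>
inline std::string Describer<int>::describe(const int& v) const {
    return "int " + std::to_string(v);
}
```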
That aspect of templates is called the compilation model, not to be confused with the instantiation mechanism, which was the subject of "How does C++ link template instances".
The instantiation mechanism is the answer to the question "When is the instantiation generated?"; the compilation model is the answer to "Where is the source found?"
There are two standard compilation models:
inclusion, the one that you know, where the definition must be available,
separation, which allows you to put the definition somewhere else with the help of the keyword export. It has been removed from the standard and is not available in C++0x (C++11). One of the reasons for its removal was that it wasn't widely implemented (there was only one implementation).
See C++ Templates: The Complete Guide by David Vandevoorde and Nicolai Josuttis, or http://www.bourguet.org/v2/cpplang/export.pdf, for more information; the separation model is the subject of the latter paper.

Why declaration/definition must both be in source file for template class in c++?

Can anyone elaborate on the reason?
Source files are compiled independently of one another into object code, which is later linked into the main program. Template functions, on the other hand, cannot be compiled without the template arguments. So the file that uses them needs to have that code in order to be compiled. Therefore the functions need to be visible in the header file.
Promised example:
template<class T>
void swap(T & a, T & b)
{
T temp = a;
a = b;
b = temp;
}
The only requirements on class T here are that it can be copy-constructed and has an accessible assignment operator (=). That's just about every class that has ever been implemented or conceived. However, each class implements these operations in its own way. The same machine code cannot be generated for swap<int>, swap<double> and swap<string>; each one of those functions has to be unique. At the same time, the compiler cannot possibly anticipate the myriad of different types that you might pass to this function, so it can't generate the functions ahead of time. It has to wait until it sees the function being called, and only then can it compile it.
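For example, let's say I have that function above defined in "swap.h". Then in "main.cpp", I do this:
#include <string>
#include "swap.h"
using std::string;
int main()
{
int a=5, b=10;
double c=3.5, d=7.9;
string s1="hello";
string s2="world";
swap(a,b);
swap(c,d);
::swap(s1,s2); // qualified, so argument-dependent lookup doesn't pick std::swap for std::string
}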
In this example, 3 different functions were created. One to swap ints, one to swap doubles, and one to swap strings. In order to create those functions, the compiler needed to be able to see the template code. If it was in a separate source file, "swap.cpp" for example, the compiler wouldn't be able to see it, because like I said before, each source file is compiled independently of one another.
Are you asking why template bodies have to be in header files? It's because the compiler needs to know both the body and the template parameter(s) at the same time in order to generate machine code. The template parameters are known where the template is used (instantiated). This gives you one trivial case and two non-trivial ones:
1. (Trivial) The template is only used in one source file, so the body can be in that same source file.
2. Make the body available at every use, which often means in a header file.
3. In the source file that contains the body, explicitly instantiate every needed combination of template parameters.
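A sketch of the third option applied to the swap example. The template is renamed to the hypothetical my_swap to avoid clashing with std::swap. In a real project the body and the explicit instantiations would live in swap.cpp, and other files would only see the declaration template <class T> void my_swap(T&, T&);.

```cpp
#include <cassert>
#include <string>

// "swap.cpp": the template body lives in exactly one source file...
template <class T>
void my_swap(T& a, T& b)
{
    T temp = a;
    a = b;
    b = temp;
}

// ...which explicitly instantiates every combination the other files need,
// so the linker can find my_swap<int> and my_swap<std::string>.
template void my_swap<int>(int&, int&);
template void my_swap<std::string>(std::string&, std::string&);
```

The price is the one mentioned above: calling my_swap with any other type from another file would fail at link time until an instantiation for it is added here.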
The short answer to your question is that there is no obligation for the declaration and definition of class templates to be in the same source file.
In fact, I consider putting them together a bad thing, but it's completely understandable, given that it's pretty difficult to use them separately (but it can be done!).
EDIT
Suppose you have
myTemplateClass.h which declares a template class MyTemplateClass
myTemplateClass.hpp which defines its class members (includes myTemplateClass.h)
use of MyTemplateClass inside main.cpp
Simply include myTemplateClass.h in main.cpp and create myTemplateClassInt.cpp as follows :
#include "myTemplateClass.hpp"
template class MyTemplateClass<int>;
Doing that, you tell the compiler to instantiate all member functions of MyTemplateClass for the template argument int. Since it has access to myTemplateClass.hpp, those functions will be generated flawlessly, and the linker won't complain.
Of course, this approach requires a dedicated place where the instantiated versions of your class templates are defined, and every type you use has to be instantiated there.