C++ Templates: Are template instantiations inlined? Are there drawbacks in performance?

When a C++ entity such as a structure, class, or function is declared as a template, the definition provided for it is solely a blueprint that must be instantiated.
Because a template entity must be defined where it is declared (commonly in header files), I have the impression, which I am trying to convince myself is wrong, that once a template has been instantiated it will be inlined by the compiler. Is this so?
The answer to this question raised my suspicion when I read the paragraph:
"Templates can lead to slower compile-times and possibly larger
executable, especially with older compilers."
Slower compile-times are clear as template must be instantiated but why "possibly larger executables"? In what ways should this be interpreted? Should I interpret this as 'many functions are inlined' or 'the executable's size increases if there are many template instantiations, that is the same template is instantiated with a lot of different types, which causes several copies of the same entity to be there'?
In the latter case, does a larger executable size cause the software to run more slowly seeing that more code must be loaded into memory which will in turn cause expensive paging?
Also, as these questions are also somewhat compiler dependent I am interested in the Visual C++ compiler. Generalised answers concerning what most compilers do give a good insight as well.
Thank you in advance.

Due to the fact that a template entity must be defined when it is declared (which is commonly header files)
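Not true. You can declare and define template classes, methods and functions separately, just like you can other classes, methods and functions. The definition only has to be visible wherever the template is implicitly instantiated, which is why it usually ends up in a header.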
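A minimal sketch of declaring and defining templates separately. The names twice, Counter and demo_separate_definition are hypothetical and only serve as illustration:

```cpp
#include <cassert>

// Declarations only: no bodies yet.
template <typename T>
T twice(T value);

template <typename T>
class Counter {
public:
    void add(T amount);   // member declared here...
    T total() const;
private:
    T total_{};
};

// Definitions, written separately from the declarations above.
template <typename T>
T twice(T value) {
    return value + value;
}

template <typename T>
void Counter<T>::add(T amount) {   // ...and defined outside the class
    total_ += amount;
}

template <typename T>
T Counter<T>::total() const {
    return total_;
}

// Hypothetical helper that instantiates both templates for int.
inline int demo_separate_definition() {
    Counter<int> c;
    c.add(twice(2));
    c.add(1);
    return c.total();
}
```

Whether any of these functions ends up inlined is still the compiler's decision; the split only changes where the source text lives.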
I have the conception, which I try to convince myself as wrong, that when after a template has been instantiated it will be inlined by the compiler. I would like to ask if this is so?
Some of it may be, or all of it, or none of it. The compiler will do what it deems is best.
Slower compile-times are clear as template must be instantiated but why "possibly larger executables"? In what ways should this be interpreted?
It could be interpreted in a number of ways, in the same way that a bottle of Aspirin carries the warning "may induce <insert side-effects here>".
Should I interpret this as 'many functions are inlined' or 'the executable's size increases if there are many template instantiations, that is the same template is instantiated with a lot of different types, which causes several copies of the same entity to be there'?
You won't have several copies of the same entity - the compiler suite is obliged to see to that. Even if methods are inline, the address of the method will:
- always exist, and
- be the same address when referenced by multiple compilation units.
What you may find is that you start creating more types than you intended. For example std::vector<int> is a completely different type to std::vector<double>. foo<X>() is a different function to foo<Y>(). The number of types and function definitions in your program can grow quickly.
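The point about distinct types and distinct functions can be checked directly. This is a small sketch using a hypothetical function template size_of; it assumes int and double have different sizes, as they do on common platforms, so the two instantiations also have different bodies:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical function template: every T yields a separate function.
template <typename T>
std::size_t size_of() {
    return sizeof(T);
}

// Different template arguments give unrelated types.
static_assert(!std::is_same<std::vector<int>, std::vector<double>>::value,
              "vector<int> and vector<double> are distinct types");

// Both instantiations have the signature std::size_t(), so their
// addresses can be compared directly.
inline bool instantiations_are_distinct() {
    std::size_t (*for_int)() = &size_of<int>;
    std::size_t (*for_double)() = &size_of<double>;
    return for_int != for_double;
}

// Taking the address of the same instantiation twice gives one address.
inline bool same_instantiation_same_address() {
    std::size_t (*first)() = &size_of<int>;
    std::size_t (*second)() = &size_of<int>;
    return first == second;
}
```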
In the latter case, does a larger executable size cause the software to run more slowly seeing that more code must be loaded into memory which will in turn cause expensive paging?
Excessive paging, probably not. Excessive cache misses, quite possibly. Often, optimising for smaller code is a good strategy for achieving good performance (under certain circumstances such as when there is little data being accessed and it's all in-cache).

Whether they are inlined is always up to the compiler. Non-inlined template instantiations are shared among all identical instantiations (same template, same arguments).
So a lot of translation units all wanting to make a Foo<int> will share one Foo<int> instantiation. Obviously, if Foo<int> is a function and the compiler decides in each case to inline it, then the code is repeated. However, the compiler chooses to inline because doing so seems highly likely to be faster than a function call.
Technically there could be a corner case where a template causes slower execution. I work on software which has super tight inner loops for which we do a lot of performance measurements. We use a number of template functions and classes and it has yet to show up a degradation compared to writing the code by hand.
I can't be certain but I think you'd have to have a situation where the template:
- generated non-inlined code
- did so for multiple instantiations
- could have been one hand-written function
- the hand-written function incurs no run-time penalty for being one function (i.e. no implicit conversions involving runtime checks)
Then it's possible that you have a case where the single-handwritten-function fits within the CPU's instruction cache but the multiple template instantiations do not.

Related

What are the benefits of using template<int size> rather than dynamic allocation?

I'm reading pbrt and it defines a type:
template <int nSpectrumSamples>
class CoefficientSpectrum;
class RGBSpectrum : public CoefficientSpectrum<3> {
    using CoefficientSpectrum<3>::c;
    // ...
};
typedef RGBSpectrum Spectrum;
// typedef SampledSpectrum Spectrum;
And the author said:
"We have not written the system such that the selection of which Spectrum implementation to use could be resolved at run time; to switch to a different representation, the entire system must be recompiled. One advantage to this design is that many of the various Spectrum methods can be implemented as short functions that can be inlined by the compiler, rather than being left as stand-alone functions that have to be invoked through the relatively slow virtual method call mechanism. Inlining frequently used short functions like these can give a substantial improvement in performance."
1. Why can the template approach inline functions while the normal approach cannot?
2. Why does the normal approach have to use virtual methods?
Link to the entire header file:
https://github.com/mmp/pbrt-v3/blob/master/src/core/spectrum.h
To inline a function call, the compiler has to know 1. which function is called and 2. the exact code of that function. The whole purpose of virtual functions is to defer the choice of which function is called to run-time, so compilers can obtain the above pieces of information only with sophisticated optimization techniques that require very specific circumstances [1].
Both templates and virtual functions (i.e. polymorphism) are tools for encoding abstraction. The code that uses a CoefficientSpectrum does not really care about the implementation details of the spectrum, only that you can e.g. convert it to and from RGB - that's why it uses an abstraction (to avoid repeating the code for each kind of spectrum). As explained in the comment you quoted, using polymorphism for abstraction here would mean that the compiler has a hard time optimizing the code because it fundamentally defers the choice of implementation to run-time (which is sometimes useful but not strictly necessary here). By requiring the choice of implementation to be made at compile-time, the compiler can easily optimize (i.e. inline) the code.
[1] For example, some compilers are able to optimize away the std::function abstraction, which generally uses polymorphism for type erasure. Of course, this can only work if all the necessary information is available.
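The two styles can be put side by side in a small sketch. This is not pbrt's code: SpectrumBase, DynamicRGB, StaticSpectrum and the two total_* functions are hypothetical names chosen for illustration:

```cpp
#include <cassert>

// Run-time selection: the concrete type is hidden behind a virtual call.
struct SpectrumBase {
    virtual ~SpectrumBase() = default;
    virtual float sum() const = 0;
};

struct DynamicRGB : SpectrumBase {
    float c[3] = {1.0f, 2.0f, 3.0f};
    float sum() const override { return c[0] + c[1] + c[2]; }
};

// The callee is only known at run time, so inlining sum() here requires
// the compiler to first prove the dynamic type (devirtualization).
inline float total_dynamic(const SpectrumBase& s) {
    return s.sum();
}

// Compile-time selection: the sample count is a template parameter.
template <int nSamples>
struct StaticSpectrum {
    float c[nSamples] = {};
    float sum() const {
        float result = 0.0f;
        for (int i = 0; i < nSamples; ++i) result += c[i];
        return result;
    }
};

// The exact function and its body are known at this call site, so the
// call is an easy inlining candidate and the loop bound is a constant.
template <int nSamples>
float total_static(const StaticSpectrum<nSamples>& s) {
    return s.sum();
}
```

The price of the second form is the one the quoted text mentions: switching the representation means recompiling every caller.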

compilation control and code bloat in templates

I am reading about templates in the book Applied C++, where the following is mentioned:
Templates create code bloat. The compiler will instantiate a template
object for each pixel type. Even if your users only need limited
types, the image processing routines may need additional types for
temporary images and the like.
Not having to be templated object has the advantage of giving us
control of how the object will be compiled, and lets us control code
bloat.
My questions on above text
What does author mean by "Even if your users only need limited types, the image processing routines may need additional types for temporary images and the like." ?
What does author mean by "Not having to be templated object has the advantage of giving us control of how the object will be compiled" ?
Request your help in understanding above statements. It would be good if explained with simple examples.
Thanks for your time and help.
The author is right that templates may create so-called code-bloat, but his explanations are fuzzy...
Let us start with a primer on code-bloat.
There is an annoying interaction in the C++ Standard between templates and function pointers:
- Each instantiation of a template with a given set of parameters is its own type
- Two different functions should have different addresses
Since two different instantiations (one with int and one with long for example) are different types, the functions associated with them are different functions, and thus they need different addresses.
An optimizing compiler is allowed to merge such functions under the as-if rule, that is, as long as the program cannot observe that they were merged. The naive approach is to try to prove that the address of one of them is never taken; this is futile. A cleverer strategy is to merge the function bodies, but still provide a different address:
; assembly-like code
function_instantiation_1:
nop ; offset to have different addresses
function_instantiation_2:
; body of both functions
However a practical problem is to identify such functions that could be merged, given the sheer number of functions there are.
So? If one wishes to limit the amount of code produced, one just has to limit the number of instantiations. I find the author's claim that the image processing routines may need additional types for temporary images and the like dubious. The set of types within a program is generally fairly restricted, and there are not gazillions of image types.
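One way to keep the number of instantiations under control is to make the allowed set explicit. The sketch below assumes a hypothetical Image class template that accepts only two pixel types; the class and its members are invented for illustration and are not taken from the book:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical image template restricted to two pixel types.
template <typename Pixel>
class Image {
    static_assert(std::is_same<Pixel, unsigned char>::value ||
                  std::is_same<Pixel, float>::value,
                  "Image supports only unsigned char and float pixels");
public:
    Image(std::size_t width, std::size_t height)
        : width_(width), pixels_(width * height) {}

    // Row-major access to the pixel at column x, row y.
    Pixel& at(std::size_t x, std::size_t y) { return pixels_[y * width_ + x]; }
    std::size_t size() const { return pixels_.size(); }

private:
    std::size_t width_;
    std::vector<Pixel> pixels_;
};

// Image<int> img(2, 2);  // would not compile: int is not in the allowed set
```

Any attempt to introduce a third pixel type, for a temporary image or otherwise, then shows up as a compile error rather than as silent growth of the binary.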

Will C++ compiler generate code for each template type?

I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
If this is true, can somebody explain me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for pre-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specialization).
If this makes any difference, we are talking about embedded applications.
I have found a similar question, but it did not completely answer my question: Do C++ template classes duplicate code for each pointer type used?
Thanks a lot!
I have two questions about templates in C++. Let's imagine I have written a simple List and now I want to use it in my program to store pointers to different object types (A*, B* ... ALot*). My colleague says that for each type there will be generated a dedicated piece of code, even though all pointers in fact have the same size.
Yes, this is equivalent to having both functions written.
Some linkers will detect the identical functions, and eliminate them. Some libraries are aware that their linker doesn't have this feature, and factor out common code into a single implementation, leaving only a casting wrapper around the common code. I.e., a std::vector<T*> specialization may forward all work to a std::vector<void*> then do casting on the way out.
Now, COMDAT folding is delicate: it is relatively easy to make functions you think are identical, but end up not being the same, so two functions are generated. As a toy example, you could go off and print the typename via typeid(x).name(). Now each version of the function is distinct, and they cannot be eliminated.
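The casting-wrapper idea can be sketched as follows. VoidPtrList and PtrList are hypothetical names; this is an illustration of the technique, not how any particular standard library implements std::vector, and it assumes T is a non-const object type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: compiled once, no matter how many T* lists exist.
class VoidPtrList {
public:
    void push_back(void* p) { items_.push_back(p); }
    void* at(std::size_t i) const { return items_[i]; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Thin typed wrapper: each instantiation contains only forwarding calls
// and casts, which are cheap candidates for inlining.
template <typename T>
class PtrList {
public:
    void push_back(T* p) { impl_.push_back(p); }
    T* at(std::size_t i) const { return static_cast<T*>(impl_.at(i)); }
    std::size_t size() const { return impl_.size(); }
private:
    VoidPtrList impl_;
};
```

PtrList<A> and PtrList<B> remain distinct types with compile-time type checking, but the storage logic exists only once in VoidPtrList.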
In some cases, you might do something like this thinking that it is a run time property that differs, and hence identical code will be created, and the identical functions eliminated -- but a smart C++ compiler might figure out what you did, use the as-if rule and turn it into a compile-time check, and block not-really-identical functions from being treated as identical.
If this is true, can somebody explain me why? For example in Java generics have the same purpose as templates for pointers in C++. Generics are only used for pre-compile type checking and are stripped down before compilation. And of course the same byte code is used for everything.
No, they aren't. Generics are roughly equivalent to the C++ technique of type erasure, such as what std::function<void()> does to store any callable object. In C++, type erasure is often done via templates, but not all uses of templates are type erasure!
The things that C++ does with templates that are not in essence type erasure are generally impossible to do with Java generics.
In C++, you can create a type erased container of pointers using templates, but std::vector doesn't do that -- it creates an actual container of pointers. The advantage to this is that all type checking on the std::vector is done at compile time, so there doesn't have to be any run time checks: a safe type-erased std::vector may require run time type checking and the associated overhead involved.
Second question is, will dedicated code be also generated for char and short (considering that they both have the same size and there are no specialization).
They are distinct types. I can write code that will behave differently with a char or short value. As an example:
std::cout << x << "\n";
With x being a short, this prints an integer whose value is x -- with x being a char, this prints the character corresponding to x.
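The difference is easy to observe when the output goes to a string. The helper to_text below is a hypothetical name used only for this sketch:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: the same template source selects a different
// operator<< overload depending on T, so the generated code differs.
template <typename T>
std::string to_text(T x) {
    std::ostringstream out;
    out << x;
    return out.str();
}
```

Assuming an ASCII execution character set, to_text<short>(65) yields "65" while to_text<char>(65) yields "A", even though the template body is the same text in both cases.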
Now, almost all template code exists in header files, and is implicitly inline. While inline doesn't mean what most folk think it means, it does mean that the compiler can hoist the code into the calling context easily.
If this makes any difference, we are talking about embedded applications.
What really makes a difference is what your particular compiler and linker are, and what settings and flags they have active.
The answer is maybe. In general, each instantiation of a
template is a unique type, with a unique implementation, and
will result in a totally independent instance of the code.
Merging the instances is possible, but would be considered
"optimization" (under the "as if" rule), and this optimization
isn't widespread.
With regards to comparisons with Java, there are several points
to keep in mind:
C++ uses value semantics by default. A std::vector, for
example, will actually insert copies. And whether you're
copying a short or a double does make a difference in the
generated code. In Java, short and double will be boxed,
and the generated code will clone a boxed instance in some way;
cloning doesn't require different code, since it calls a virtual
function of Object, but physically copying does.
C++ is far more powerful than Java. In particular, it allows
comparing things like the address of functions, and it requires
that the functions in different instantiations of templates have
different addresses. Usually, this is not an important point,
and I can easily imagine a compiler with an option which tells
it to ignore this point, and to merge instances which are
identical at the binary level. (I think VC++ has something like
this.)
Another issue is that the implementation of a template in C++
must be present in the header file. In Java, of course,
everything must be present, always, so this issue affects all
classes, not just templates. This is, of course, one of the
reasons why Java is not appropriate for large applications. But
it means that you don't want any complicated functionality in a
template; doing so loses one of the major advantages of C++,
compared to Java (and many other languages). In fact, it's not
rare, when implementing complicated functionality in templates,
to have the template inherit from a non-template class which
does most of the implementation in terms of void*. While
implementing large blocks of code in terms of void* is never
fun, it does have the advantage of offering the best of both
worlds to the client: the implementation is hidden in compiled
files, invisible in any way, shape or manner to the client.

C++ Template performance inclusion model and inline model

If I have a C++ template I have two choices (without the export keyword) to link them:
Inclusion model with inlining - i.e. including the definitions together with the declarations in the .h file. This inlines all the functions and creates a big unit (although it's lazy)
Inclusion model without inlining - i.e. something like including this .h file:
code:
// templateinstantiations.cpp
#include "array.cpp"
template class array <int, 50>; // explicit instantiation
every time I want to use a template, and being careful to explicitly instantiate every single type I need (this can be boring and hard to maintain)
My question is: I know that excessively inlining functions may cause memory thrashing and loss of performance. Besides, it seems that in both of the above cases compilation times are huge. What is the tradeoff between the first and the second approach? Is there a criterion for choosing the first over the second, or do I just need to try them out and "time" them?
This question isn't really about templates but about inlining, I think. For the purposes of run-time performance, the compiler probably makes the right choice in most cases: if it can see that a function is too big to benefit from inlining it is likely to generate a non-inlined version of any inline function, independently of the function being a template or not. Each translation unit will create its own version of the function and the linker will choose one to use (and, hopefully, throw away the other unused copies but whether it really does this depends on the linker and the object file format).
The interaction with templates comes in when looking at the various interactions between the template code and the functions it calls, which may be templates themselves: when the code is forced not to be inlined, the compiler has no chance to avoid the overhead of a function call. Often the abstractions used by templates are very simple functions, e.g., "increment an iterator" and "dereference an iterator" mapping to underlying pointer operations. Creating a function call for these can become rather expensive due to the function call overhead and the lost opportunity for optimizations. However, the compiler can actually see through this and make the right choices in many cases.
That said, I'm a big fan of creating explicit instantiations for certain templates. For example, removing certain parts of the IOStreams library from the headers and explicitly instantiating it in the library has a huge effect on compile time, especially when optimization is turned on: Calling a simple output function for an integer causes lots of templates to be instantiated. Putting this code into its own file and compiling it with the appropriate optimization options probably won't make much of a difference with respect to performance but it does have a major effect on compile times. This may have an indirect impact on performance, though: you can afford more iterations testing the performance of the code using the library.
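A minimal sketch of explicit instantiation, shown in one file for brevity. The class FixedArray and the file names in the comments are hypothetical; in a real project the first part would sit in a header and the final line in exactly one source file:

```cpp
#include <cassert>
#include <cstddef>

// --- would live in fixed_array.h (hypothetical file name) ---
template <typename T, std::size_t N>
class FixedArray {
public:
    void fill(const T& value);
    T sum() const;
private:
    T data_[N];
};

// Members defined outside the class, so they are not implicitly inline.
template <typename T, std::size_t N>
void FixedArray<T, N>::fill(const T& value) {
    for (std::size_t i = 0; i < N; ++i) data_[i] = value;
}

template <typename T, std::size_t N>
T FixedArray<T, N>::sum() const {
    T total = T();
    for (std::size_t i = 0; i < N; ++i) total += data_[i];
    return total;
}

// Explicit instantiation declaration (C++11): translation units that see
// this line do not implicitly instantiate the non-inline members of
// FixedArray<int, 50>; they expect to find them at link time.
extern template class FixedArray<int, 50>;

// --- would live in exactly one .cpp file ---
// Explicit instantiation definition: the members are compiled here, once.
template class FixedArray<int, 50>;
```

Other instantiations, such as FixedArray<double, 8>, are still created implicitly on demand, so only the combinations listed this way get the compile-time saving.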
Even when you explicitly declare a function as inline, there is no guarantee that the compiler will actually inline it. So why do you think that implementing a template entirely in a header will force inlining and cause you problems?
In almost all cases you don't need the second approach. You can use it, but it is not needed to avoid inlining problems.

Does a compiler collapse classes which are identical in their structure?

I hope this isn't a duplicate of a question itself, but the search terms are so ambiguous, I can't think of anything better.
Say we have two classes:
class FloatRect
{
float x,y,width,height;
};
and somewhere else
class FloatBox
{
float top,left,bottom,right;
};
From a practical standpoint, they're the same, so does the compiler treat them both as some sort of typedef?
Or will it produce two separate units of code?
I'm curious because I'd like to go beyond typedefs and make a few variants of a type to improve readability.
I don't want needless duplication, though...
This is completely implementation specific.
For example I can use Clang / LLVM to illustrate both points of view at once:
- Clang is the C++ front-end, it uses two distinct types to resolve function calls etc... and treats them as completely different values
- LLVM is the optimizer backend, it doesn't care (yet) about names, but only structural representation, and will therefore collapse them into a single type... or even entirely remove the type definition if useless.
If the question is whether introducing a similarly laid-out class creates overhead, then the answer is no, so write the classes that you need.
Note: the same happens for functions, i.e. the optimizer can merge blocks of functions that are identical to get tighter code. This is not a reason to copy/paste, though.
They are totally unrelated classes with regards to the compiler.
If they are just POD C-structs, it won't actually generate any real code for them as such. (Yes there is a silent assignment operator and some other functions but I doubt there will be code actually compiled to do it, it will just inline them if they are used).
Since the classes you use as samples are only relevant during compilation, there's nothing to duplicate or collapse. At run time, the member variables are simply accessed as "the value at offset N".
This is, of course, hugely implementation-specific.
Any internal collapse here would be completely internal to the mechanism of the compiler, and would not have an effect on the produced translated code.
I would imagine it's very unlikely that this is the case, as I can think of no benefit and several ways in which this would really complicate matters. I can't present any evidence, though.
No. As they are literally two different types.
The compiler must treat them that way.
There is no magic merging going on.
No they are not treated as typedefs, because they are different types and can for example be used for overloading functions.
On the other hand, the types have no code in them so there will be nothing to duplicate.
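A short sketch of both observations, using structs with public members instead of the question's classes so the example stays small. The overloaded function is_rect is a hypothetical name:

```cpp
#include <cassert>
#include <type_traits>

struct FloatRect { float x, y, width, height; };
struct FloatBox  { float top, left, bottom, right; };

// Same members and size, but still two unrelated types.
static_assert(!std::is_same<FloatRect, FloatBox>::value,
              "distinct types despite identical members");
static_assert(sizeof(FloatRect) == sizeof(FloatBox),
              "both hold four floats");

// Because the types differ, they select different overloads.
inline bool is_rect(const FloatRect&) { return true; }
inline bool is_rect(const FloatBox&)  { return false; }
```

The type definitions themselves produce no machine code; only functions that use them do.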