If I have a C++ template, I have two choices (without the export keyword) for organizing and linking it:
Inclusion model with inlining - i.e. including the definitions together with the declarations in the .h file. This makes all the functions candidates for inlining and creates a big translation unit (although instantiation is lazy).
Inclusion model without inlining - i.e. keeping the definitions out of the header and compiling a file like this:
// templateinstantiations.cpp
#include "array.cpp"
template class array <int, 50>; // explicit instantiation
every time I want to use a template, taking care to explicitly instantiate every single type I need (this can be tedious and hard to maintain).
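For concreteness, here is a minimal sketch of the file layout this second scheme assumes (the shape of the array class and its members are made up for illustration; only the names from the snippet above are taken from the question):

// array.h - declarations only
template <typename T, int N>
class array {
public:
    T& operator[](int i);
private:
    T data_[N];
};

// array.cpp - definitions; never compiled on its own
#include "array.h"

template <typename T, int N>
T& array<T, N>::operator[](int i) { return data_[i]; }

// templateinstantiations.cpp - the only file that compiles the definitions
#include "array.cpp"
template class array<int, 50>; // explicit instantiation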
My question is: I know that excessively inlining functions may cause memory thrashing and loss of performance; besides, it seems that in both of the above cases compilation times are huge. What is the tradeoff between the first and the second approach? Is there a criterion for choosing the first over the second, or do I just need to try them out and "time" them?
This question isn't really about templates but about inlining, I think. For the purposes of run-time performance, the compiler probably makes the right choice in most cases: if it can see that a function is too big to benefit from inlining, it is likely to generate a non-inlined version of any inline function, regardless of whether the function is a template or not. Each translation unit will create its own version of the function and the linker will choose one to use (and, hopefully, throw away the other unused copies, but whether it really does this depends on the linker and the object file format).
The interaction with templates comes in when looking at the various interactions between the template code and the functions it calls, which may be templates themselves: when the code is forced not to be inlined, the compiler has no chance to avoid the overhead of a function call. Often the abstractions used by templates are very simple functions, e.g., "increment an iterator" and "dereference an iterator", mapping to underlying pointer operations, so creating a function call can become rather expensive due to the call overhead and the lost opportunity for optimizations. However, the compiler can actually see through this and make the right choices in many cases.
That said, I'm a big fan of creating explicit instantiations for certain templates. For example, removing certain parts of the IOStreams library from the headers and explicitly instantiating them in the library has a huge effect on compile time, especially when optimization is turned on: calling a simple output function for an integer causes lots of templates to be instantiated. Putting this code into its own file and compiling it with the appropriate optimization options probably won't make much of a difference with respect to performance, but it does have a major effect on compile times. This may have an indirect impact on performance, though: you can afford more iterations when testing the performance of the code using the library.
Even when you explicitly declare a function as inline, there is no guarantee that C++ will make it inline. So why do you think that implementing a template entirely in a header will force an inline implementation and cause you problems?
In almost all cases you don't need the second approach. You can do it that way, but it is not needed to avoid inlining problems.
Related
I'm reading pbrt, and it defines a type:
template <int nSpectrumSamples>
class CoefficientSpectrum;

class RGBSpectrum : public CoefficientSpectrum<3> {
public:
    using CoefficientSpectrum<3>::c;
    // ...
};

typedef RGBSpectrum Spectrum;
// typedef SampledSpectrum Spectrum;
And the author said:
"We have not written the system such that the selection of which Spectrum implementation to use could be resolved at run time; to switch to a different representation, the entire system must be recompiled. One advantage to this design is that many of the various Spectrum methods can be implemented as short functions that can be inlined by the compiler, rather than being left as stand-alone functions that have to be invoked through the relatively slow virtual method call mechanism. Inlining frequently used short functions like these can give a substantial improvement in performance."
1. Why can a template inline a function while the normal way cannot?
2. Why does the normal way have to use virtual methods?
Link to the entire header file:
https://github.com/mmp/pbrt-v3/blob/master/src/core/spectrum.h
To inline a function call, the compiler has to know 1. which function is called and 2. the exact code of that function. The whole purpose of virtual functions is to defer the choice of which function is called to run-time, so compilers can obtain the above pieces of information only with sophisticated optimization techniques that require very specific circumstances [1].
Both templates and virtual functions (i.e. polymorphism) are tools for encoding abstraction. The code that uses a CoefficientSpectrum does not really care about the implementation details of the spectrum, only that you can e.g. convert it to and from RGB - that's why it uses an abstraction (to avoid repeating the code for each kind of spectrum). As explained in the comment you quoted, using runtime polymorphism for abstraction here would mean that the compiler has a hard time optimizing the code because it fundamentally defers the choice of implementation to run-time (which is sometimes useful but not strictly necessary here). By requiring the choice of implementation to be made at compile time, the compiler can easily optimize (i.e. inline) the code.
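To make the contrast concrete, here is a contrived sketch; SpectrumBase, sample(), and the two sum functions are illustrative names, not pbrt's actual interface:

#include <cstddef>

// Run-time polymorphism: the call goes through the vtable and can
// rarely be inlined, because the dynamic type is only known at run time.
struct SpectrumBase {
    virtual ~SpectrumBase() {}
    virtual float sample(std::size_t i) const = 0;
};

float sumVirtual(const SpectrumBase& s) {
    float r = 0;
    for (std::size_t i = 0; i < 3; ++i)
        r += s.sample(i); // indirect call through the vtable
    return r;
}

// Compile-time polymorphism: the exact type is fixed at the call site,
// so the compiler can inline sample() and optimize across the loop.
template <typename Spectrum>
float sumTemplate(const Spectrum& s) {
    float r = 0;
    for (std::size_t i = 0; i < 3; ++i)
        r += s.sample(i); // direct call, trivially inlinable
    return r;
}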
[1] For example, some compilers are able to optimize away the std::function abstraction, which generally uses polymorphism for type erasure. Of course, this can only work if all the necessary information is available.
When some C++ entity, such as a structure, class, or function, is declared as a template, the definitions provided for it are merely blueprints which must be instantiated.
Because a template entity must be defined where it is declared (which is commonly in header files), I have the notion, which I am trying to convince myself is wrong, that after a template has been instantiated it will be inlined by the compiler. Is this so?
The answer to this question raised my suspicion when I read the paragraph:
"Templates can lead to slower compile-times and possibly larger
executable, especially with older compilers."
Slower compile times are clear, as templates must be instantiated, but why "possibly larger executables"? In what way should this be interpreted? Should I interpret this as 'many functions are inlined' or 'the executable's size increases if there are many template instantiations, i.e. the same template is instantiated with a lot of different types, which causes several copies of the same entity to exist'?
In the latter case, does a larger executable size cause the software to run more slowly, seeing that more code must be loaded into memory, which will in turn cause expensive paging?
Also, as these questions are somewhat compiler dependent, I am interested in the Visual C++ compiler. Generalised answers concerning what most compilers do give good insight as well.
Thank you in advance.
Because a template entity must be defined where it is declared (which is commonly in header files)
Not true. You can declare and define template classes, methods and functions separately, just like you can other classes, methods and functions.
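A minimal sketch of such a separation (twice is a made-up function). Code that includes twice.h can call twice with int or double arguments, because exactly those instantiations are compiled into twice.cpp:

// twice.h - declaration only
template <typename T>
T twice(T value);

// twice.cpp - definition plus explicit instantiations
#include "twice.h"

template <typename T>
T twice(T value) { return value + value; }

template int twice<int>(int);          // explicit instantiations
template double twice<double>(double);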
I have the notion, which I am trying to convince myself is wrong, that after a template has been instantiated it will be inlined by the compiler. Is this so?
Some of it may be, or all of it, or none of it. The compiler will do what it deems best.
Slower compile times are clear, as templates must be instantiated, but why "possibly larger executables"? In what way should this be interpreted?
It could be interpreted in a number of ways. In the same way that a bottle of Aspirin contains the warning "may induce <insert side-effects here>".
Should I interpret this as 'many functions are inlined' or 'the executable's size increases if there are many template instantiations, i.e. the same template is instantiated with a lot of different types, which causes several copies of the same entity to exist'?
You won't have several copies of the same entity - the compiler suite is obliged to see to that. Even if methods are inline, the address of the method will:
always exist, and
be the same address when referenced by multiple compilation units.
What you may find is that you start creating more types than you intended. For example std::vector<int> is a completely different type to std::vector<double>. foo<X>() is a different function to foo<Y>(). The number of types and function definitions in your program can grow quickly.
In the latter case, does a larger executable size cause the software to run more slowly, seeing that more code must be loaded into memory, which will in turn cause expensive paging?
Excessive paging, probably not. Excessive cache misses, quite possibly. Often, optimising for smaller code is a good strategy for achieving good performance (under certain circumstances such as when there is little data being accessed and it's all in-cache).
Whether they are inlined is always up to the compiler. A non-inlined template instantiation is shared across all translation units that use the same instantiation.
So a lot of translation units all wanting to use a Foo<int> will share the Foo<int> instantiation. Obviously, if Foo<int> is a function and the compiler decides in each case to inline it, then the code is repeated. However, the choice to inline is made because that optimization seems highly likely to be superior to a function call.
Technically there could be a corner case where a template causes slower execution. I work on software which has super-tight inner loops for which we do a lot of performance measurements. We use a number of template functions and classes, and it has yet to show a degradation compared to writing the code by hand.
I can't be certain but I think you'd have to have a situation where the template:
- generated non-inlined code
- did so for multiple instantiations
- could have been one hand-written function
- the hand-written function incurs no run-time penalty for being one function (i.e. no implicit conversions involving runtime checks)
Then it's possible that you have a case where the single hand-written function fits within the CPU's instruction cache but the multiple template instantiations do not.
In one of my C++ projects, I make use of many templates. Since I cannot put them in *.cpp files, the whole functions live in the headers at the moment.
But that messes up the header files on the one hand and leads to long compile times on the other. How can I handle the implementations of templated functions in a clean way?
There isn't really a requirement that templates have to be in the header. They can very well be in a translation unit. The only requirement is that the compiler is either able to instantiate them implicitly when they are used or that they get explicitly instantiated.
Whether separating the templatized code into headers and non-headers based on that is feasible depends pretty much on what is being done. It works rather well, e.g., for the IOStreams library because it is, in practice, only instantiated for the character types char and wchar_t. Writing the explicit instantiations is fairly straightforward, and even if there are a couple more character types, e.g., char16_t and char32_t, it stays feasible. On the other hand, separating templates like std::vector<T> in a similar way is rather infeasible.
Using general templates like std::vector<T> in interfaces between subsystems quickly starts to become a major problem: while concrete instantiations or selected instantiations are OK, as the subsystem can be implemented without being a template, using arbitrary instantiations would force the entire system to be all templates. Doing so is infeasible in any real-world application, which is often a couple of million lines of code on the small end.
What this amounts to is using compilation firewalls which are fully typed and don't pass arbitrary templates between subsystems. To ease the use of the subsystem interfaces there may be thin template wrappers which, e.g., convert one container type into another or type-erase the template parameter where feasible. It needs to be recognized, however, that this compilation separation generally comes at a run-time performance cost: calling a virtual function is a lot more expensive than calling an inline function. Thus, the abstractions between subsystems may be very different from those within subsystems. For example, iterators are great internal abstractions. Between subsystems, a specific container, e.g., a std::vector<X> for some type X, tends to be more effective.
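A sketch of that separation, with illustrative names: arbitrary templates stay internal to the subsystem, while the interface other subsystems see is fully typed:

#include <string>
#include <vector>

// Inside the subsystem: a generic, header-only template that the
// compiler is free to inline for whatever iterator type is used.
template <typename Iterator>
void trimEach(Iterator begin, Iterator end) {
    for (; begin != end; ++begin) {
        while (!begin->empty() && begin->back() == ' ')
            begin->pop_back();
    }
}

// At the subsystem boundary: a fully typed interface; its definition
// can live in a .cpp file behind the compilation firewall.
void trimRecords(std::vector<std::string>& records);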
Note that the build-time interactions of templates are inherent in their dependency on specific instantiations. That is, even if there is a rather different system for declarations and template definitions, e.g., in the form of a module system rather than using header files, it won't become feasible to make everything a template. Using templates for local flexibility works great but they don't work without fixing the instantiations globally in large projects.
Finally, a plug: here is a write-up on how to organize sources implementing templates.
You can just create a new header file library_detail.hpp and include it in your original header.
I've seen some people name this implementation-header file with a .t or .template extension as well.
One thing that can help is to factor out functionality that doesn't depend on the template arguments into non templated helper functions that can be defined in an implementation file.
As an example, if you were implementing your own vector class you could factor out most of the memory management into a non template class that just works with an untyped array of bytes and have the definitions of its member functions in a .cpp file. The functions that actually need to know the type of the elements are implemented in the templated class in a header and delegate much of the work to the non-templated helpers.
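A rough sketch of the idea (raw_buffer and my_vector are made-up names; a real implementation would also handle element construction, destruction, and exception safety):

#include <cstddef>

// Non-template helper: its member functions are defined once in a
// .cpp file and shared by every instantiation of the typed wrapper.
// It manages raw bytes and knows nothing about the element type.
class raw_buffer {
public:
    raw_buffer(std::size_t element_size, std::size_t capacity);
    ~raw_buffer();
    void* at(std::size_t index); // element_size * index into the storage
    void grow(std::size_t new_capacity);
private:
    unsigned char* data_;
    std::size_t element_size_;
    std::size_t capacity_;
};

// Thin typed layer: stays in the header, but each instantiation only
// adds small forwarding functions that are cheap to compile and inline.
template <typename T>
class my_vector {
public:
    my_vector() : buf_(sizeof(T), 8) {}
    T& operator[](std::size_t i) { return *static_cast<T*>(buf_.at(i)); }
private:
    raw_buffer buf_;
};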
I'm about to create a lexer for a project; a proof of concept of it exists, the idea works and whatnot. I was about to start writing it and I realised:
Why chars?
(I'm moving away from C. I'm still fairly suspicious of the standard libraries, and I felt it easier to deal in char* for offsets and such than to learn about strings.)
Why not wchar_t or something, ints, or indeed any type (given it has some defined operations)?
So should I use a template? So far it seems like yes, I should, but there are 2 counter-arguments I can consider:
Firstly, modular compilation: the moment I write "template" it must go in a header file / be available with its implementation to whatever uses it (it's not a matter of hiding source code; I don't mind having to show the code - it will be free (as in freedom) software). This means extra parsing and things like that.
My C background screams not to do this; I seem to want separate .o files. I understand why I can't, by the way; I'm not asking for a way.
Separate object files speed up compilation, because the makefile (you tell it, or have it use -MM with the compiler to figure it out for itself) won't run the compilation command for things that haven't changed and so forth.
Secondly, with templates, I know of no way to specify what a type must do, other than have the user realise it when something fails (you know how Java has an extends keyword). I suspect that C++11 builds on this, as meta-programming is a large chapter in "The C++ Programming Language", 4th edition.
Are these reasons important these days? I learned with the following burned into my mind:
"You are not creating one huge code file that gets compiled, you create little ones that are linked" and templates seem to go against this.
I'm sure G++ parses very quickly, but it also optimises; if it spends a lot of time optimising one file, it'll re-do that optimisation every time it sees that code in a translation unit, whereas with separate object files it does it (general optimisations) only once, and perhaps a bit more if you use LTO (link-time optimisation).
Or I could create a class that every input to the lexer derives from and use that (runtime polymorphism, I believe it's called), but my C roots say "eww, virtuals" and urge me towards the char*.
I understand this is quite open; I just don't know where to draw the line between using a template and not using one.
Templates don't have to be in the header! If you have only a few instantiations, you can explicitly instantiate the class and function templates in suitable translation units. That is, a template would be split into three parts:
A header declaring the templates.
A header including the first and implementing the templates, but otherwise only included by the third set of files.
Source files including the header from 2. and explicitly instantiating the templates for the corresponding types.
Users of these templates would only include the header and never the implementation header. An example where this can be done is IOStreams: there are basically just two instantiations, one for char and one for wchar_t. Yes, you can instantiate the streams for other types, but I doubt that anybody would do so (I sometimes question whether anybody uses streams with a character type other than char, but probably people do).
That said, the concepts used by templates are, indeed, not explicitly represented in the source, and C++11 doesn't add any facilities to do so either. There were discussions on adding concepts to C++, but so far they are not part of any standard. There is a Concepts Lite proposal which, I think, will be included in C++14.
However, in practice I haven't found that much of a problem: it is quite possible to document the concepts and use things like static_assert() to potentially produce nicer error messages. The problem is more that many concepts are actually more restrictive than the underlying algorithms and that the extra slack is sometimes quite useful.
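For example, a small sketch of the static_assert() technique (average is a made-up function):

#include <type_traits>

// Document the "concept" the template requires and check it up front,
// turning a deep instantiation error into one readable message.
template <typename T>
T average(T a, T b) {
    static_assert(std::is_arithmetic<T>::value,
                  "average<T> requires an arithmetic type");
    return (a + b) / 2;
}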
Here is a brief and somewhat made-up example of how to implement and instantiate the template. The idea is to implement something like std::basic_ostream but merely provide a scaled-down version of a string output operator:
// simple-ostream.hpp
#include "simple-streambuf.hpp"

template <typename CT>
class simple_ostream {
    simple_streambuf<CT>* d_sbuf;
public:
    simple_ostream(simple_streambuf<CT>* sbuf);
    simple_streambuf<CT>* rdbuf() { return this->d_sbuf; } // should be inline
};

template <typename CT>
simple_ostream<CT>& operator<< (simple_ostream<CT>&, CT const*);
Except for the rdbuf() member, the above is merely a class definition with a few member declarations and a function declaration. The rdbuf() function is implemented directly to show that you can mix and match a visible implementation where performance is necessary with an external implementation where decoupling is more important. The class template simple_streambuf used here is intended to be similar to std::basic_streambuf and is, at least, declared in the header "simple-streambuf.hpp".
// simple-ostream.tpp
// the implementation, only included to create explicit instantiations
#include "simple-ostream.hpp"

template <typename CT>
simple_ostream<CT>::simple_ostream(simple_streambuf<CT>* sbuf): d_sbuf(sbuf) {}

template <typename CT>
simple_ostream<CT>& operator<< (simple_ostream<CT>& out, CT const* str) {
    for (; *str; ++str) {
        out.rdbuf()->sputc(*str);
    }
    return out;
}
This implementation header is only included when explicitly instantiating the class and function templates. For example, the instantiations for char would look like this:
// simple-ostream-char.cpp
#include "simple-ostream.tpp"
// instantiate all class members for simple_ostream<char>:
template class simple_ostream<char>;
// instantiate the free-standing operator
template simple_ostream<char>& operator<< <char>(simple_ostream<char>&, char const*);
Any use of the simple_ostream<CT> would just include simple-ostream.hpp. For example:
// use-simple-ostream.cpp
#include "simple-ostream.hpp"

int main()
{
    simple_streambuf<char> sbuf;
    simple_ostream<char> out(&sbuf);
    out << "hello, world\n";
}
Of course, to build an executable you will need both use-simple-ostream.o and simple-ostream-char.o but assuming the template instantiations are part of a library this isn't really adding any complexity. The only real headache is when a user wants to use the class template with unexpected instantiations, say, char16_t, but only char and wchar_t are provided: In this case the user would need to explicitly create the instantiations or, if necessary, include the implementation header.
In case you want to try the example out, below is a somewhat simple-minded and sloppy (because it is header-only) implementation of simple_streambuf<CT>:
#ifndef INCLUDED_SIMPLE_STREAMBUF
#define INCLUDED_SIMPLE_STREAMBUF

#include <iostream>

template <typename CT> struct stream;

template <>
struct stream<char> {
    static std::ostream& get() { return std::cout; }
};

template <>
struct stream<wchar_t> {
    static std::wostream& get() { return std::wcout; }
};

template <typename CT>
struct simple_streambuf
{
    void sputc(CT c) {
        stream<CT>::get().rdbuf()->sputc(c);
    }
};

#endif
Yes, it should be limited to chars. Why? Because you're asking...
I have little experience with templates, but when I used templates the necessity arose naturally; I didn't go looking for ways to use them.
My 2 cents, FWIW.
1: Firstly, modular compilation: the moment I write "template" it must go in a header file…
That's not a real argument. You have the ability to use C, C++, structs, classes, templates, classes with virtual functions, and all the other benefits of a multi-paradigm language. You're not coerced into taking an all-or-nothing approach with your designs, and you can mix and match these functionalities based on your design's needs. So you can use templates where they are an appropriate tool, and other constructs where templates are not ideal. It's hard to know when that will be until after you have had experience using them all. Template/header-only libraries are popular, but one of the reasons the approach is used is that they simplify linking and the build process, and can reduce dependencies if designed well. If they are designed poorly, then yes, they can result in an explosion in compile times. That's not the language's fault -- it's the implementor's design.
Heck, you could even put your implementations behind C opaque types and use templates for everything, keeping the core template code visible to exactly one translation unit.
2: Secondly, with templates, I know of no way to specify what a type must do…
That is generally regarded as a feature. Instantiation can result in further instantiations, which are capable of selecting different implementations and specializations -- this is the domain of template metaprogramming. Often, all you really need to do is instantiate the implementation, which results in evaluation of the type and parameters. This -- simulating "concepts" and interface verification -- can increase your build times, however. Furthermore, it may not be the best design, because deferring instantiation is in many cases preferable.
If you just need to brute-force instantiate all your variants, one approach would be to create a separate translation unit which does just that -- you don't even need to link it into your library; add it to a unit test or some separate target. That way, you can validate that instantiation and functionality are correct without significant impact on the clients including/linking to the library.
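A sketch of such a translation unit, assuming a hypothetical class template lexer<CharT> declared in a lexer.hpp header:

// instantiation-check.cpp - compiled in a test target, not linked
// into the shipped library; its only job is to force every supported
// instantiation to compile.
#include "lexer.hpp"

template class lexer<char>;
template class lexer<wchar_t>;
template class lexer<char16_t>;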
Are these reasons important these days?
No. Build times are of course very important, but I think you just need to learn the right tool to use, and when and why some implementations must be abstracted (or put behind compilation firewalls) when/if you need fast builds and scalability for large projects. So yes, they are important, but a good design can strike a good balance between versatility and build times. Also remember that template metaprogramming is capable of moving a significant amount of program validation from runtime to compile time. So a hit on compile times does not have to be bad, because it can save you from a lot of runtime validations/issues.
I'm sure G++ parses very quickly, but it also optimises; if it spends a lot of time optimising one file, it'll re-do that optimisation every time it sees that code in a translation unit…
Right; that redundancy can kill fast build times.
whereas with separate object files it does it (general optimisations) only once, and perhaps a bit more if you use LTO (link-time optimisation) … Separate object files speed up compilation, because the makefile (you tell it, or have it use -MM with the compiler to figure it out for itself) won't run the compilation command for things that haven't changed and so forth.
Not necessarily so. First, many object files place a lot of demand on the linker. Second, it multiplies the work because you have more translation units, so reducing the number of object files is a good thing. This really depends on the structure of your libraries and dependencies. Some teams take the opposite approach (I do quite regularly) and use an approach which produces few object files. This can make your builds many times faster with complex projects because you eliminate redundant work for the compiler and linker. For best results, you need a good understanding of the process and your dependencies. In large projects, translation/object reductions can result in builds which are many times faster. This is often referred to as a "Unity Build". Large Scale C++ Design by John Lakos is a great read on dependencies and C++ project structures, although it's rather dated at this point, so you should not take every bit of advice at face value.
So the short answer is: Use the best tool for the problem at hand -- a good designer will use many available tools. You're far from exhausting the capabilities of the tools and build systems. A good understanding of these subjects will take years.
In C++ when it is possible to implement the same functionality using either run time (sub classes, virtual functions) or compile time (templates, function overloading) polymorphism, why would you choose one over the other?
I would think that the compiled code would be larger for compile time polymorphism (more method/class definitions created for template types), and that compile time would give you more flexibility, while run time would give you "safer" polymorphism (i.e. harder to be used incorrectly by accident).
Are my assumptions correct? Are there any other advantages/disadvantages to either? Can anyone give a specific example where both would be viable options but one or the other would be a clearly better choice?
Also, does compile time polymorphism produce faster code, since it is not necessary to call functions through vtable, or does this get optimized away by the compiler anyway?
Example:
class Base
{
public:
    virtual ~Base() {}
    virtual void print() = 0;
};

class Derived1 : public Base
{
public:
    virtual void print()
    {
        // do something different
    }
};

class Derived2 : public Base
{
public:
    virtual void print()
    {
        // do something different
    }
};

// Run time
void print(Base& o) // must take a reference (or pointer) for dynamic dispatch
{
    o.print();
}

// Compile time
template <typename T>
void print(T& o)
{
    o.print();
}
Static polymorphism produces faster code, mostly because of the possibility of aggressive inlining. Virtual functions can rarely be inlined, and then mostly in "non-polymorphic" scenarios. See this item in the C++ FAQ. If speed is your goal, you basically have no choice.
On the other hand, not only compile times but also the readability and debuggability of the code are much worse when using static polymorphism. For instance: abstract methods are a clean way of enforcing the implementation of certain interface methods. To achieve the same goal using static polymorphism, you need to resort to concept checking or the curiously recurring template pattern.
The only situation when you really have to use dynamic polymorphism is when the implementation is not available at compile time; for instance, when it's loaded from a dynamic library. In practice though, you may want to exchange performance for cleaner code and faster compilation.
After you filter out the obviously bad and suboptimal cases, I believe you're left with almost nothing. IMO it is pretty rare to face that kind of choice. You could improve the question by stating an example, and for that a real comparison can be provided.
Assuming we have that realistic choice, I'd go for the compile-time solution -- why waste runtime on something not absolutely necessary? Also, if something is decided at compile time, it is easier to reason about, follow in your head, and evaluate.
Virtual functions, just like function pointers, make you unable to create accurate call graphs. You can review the bottom, but not easily from the top. Virtual functions are supposed to follow some rules, but if they don't, you have to look at all of them to find the sinner.
Also, there are some performance losses - probably not a big deal in the majority of cases, but with no balance on the other side, why take the hit?
In C++ when it is possible to implement the same functionality using either run time (sub classes, virtual functions) or compile time (templates, function overloading) polymorphism, why would you choose one over the other?
I would think that the compiled code would be larger for compile time polymorphism (more method/class definitions created for template types)...
Often yes - due to multiple instantiations for different combinations of template parameters, but consider:
with templates, only the functions actually called are instantiated
dead code elimination
constant array dimensions allowing member variables such as T mydata[12]; to be allocated with the object, automatic storage for local variables, etc., whereas a runtime polymorphic implementation might need to use dynamic allocation (i.e. new[]) - this can dramatically impact cache efficiency in some cases (see the sketch after this list)
inlining of function calls, which makes trivial things like small-object get/set operations about an order of magnitude faster on the implementations I've benchmarked
avoiding virtual dispatch, which amounts to following a pointer to a table of function pointers, then making an out-of-line call to one of them (it's normally the out-of-line aspect that hurts performance most)
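To illustrate the array-dimension point above (fixed_signal and dynamic_signal are made-up types):

// Compile-time dimension: the data lives inside the object itself.
template <typename T, int N>
struct fixed_signal {
    T mydata[N]; // N is a template parameter, no heap allocation
};

// Run-time polymorphic flavor: the dimension is only known at run
// time, so the storage typically sits behind an extra allocation.
struct dynamic_signal {
    explicit dynamic_signal(int n) : mydata(new double[n]), n(n) {}
    ~dynamic_signal() { delete[] mydata; }
    double* mydata; // extra indirection on every access
    int n;
private:
    dynamic_signal(const dynamic_signal&);            // copying omitted
    dynamic_signal& operator=(const dynamic_signal&); // for brevity
};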
...and that compile time would give you more flexibility...
Templates certainly do:
given the same template instantiated for different types, the same code can mean different things: for example, T::f(1) might call a void f(int) noexcept function in one instantiation, a virtual void f(double) in another, a T::f functor object's operator()(float) in yet another; looking at it from another perspective, different parameter types can provide what the templated code needs in whatever way suits them best
SFINAE lets your code adjust at compile time to use the most efficient interfaces an object supports, without the object actively having to make a recommendation
due to the instantiate-only-functions-called aspect mentioned above, you can "get away" with instantiating a class template with a type for which only some of the class template's functions would compile: in some ways that's bad because programmers may expect that their seemingly working Template<MyType> will support all the operations that the Template<> supports for other types, only to have it fail when they try a specific operation; in other ways it's good because you can still use Template<> if you're not interested in all the operations
if Concepts [Lite] make it into a future C++ Standard, programmers will have the option of putting stronger up-front constraints on the semantic operations that types used as template parameters must support, which will avoid nasty surprises as a user finds their Template<MyType>::operationX broken, and generally give simpler error messages earlier in the compile
...while run time would give you "safer" polymorphism (i.e. harder to be used incorrectly by accident).
Arguably, as they're more rigid given the template flexibility above. The main "safety" problems with runtime polymorphism are:
some problems end up encouraging "fat" interfaces (in the sense Stroustrup mentions in The C++ Programming Language): APIs with functions that only work for some of the derived types, and algorithmic code needs to keep "asking" the derived types "should I do this for you", "can you do this", "did that work", etc.
you need virtual destructors: some classes don't have them (e.g. std::vector), making it harder to derive from them safely; also, the in-object pointers to virtual dispatch tables aren't valid across processes, making it hard to put runtime-polymorphic objects in shared memory for access by multiple processes
Can anyone give a specific example where both would be viable options but one or the other would be a clearly better choice?
Sure. Say you're writing a quick-sort function: you could support only data types that derive from some Sortable base class with a virtual comparison function and a virtual swap function, or you could write a sort template that uses a Less policy parameter defaulting to std::less<T>, and std::swap<>. Given that the performance of a sort is overwhelmingly dominated by the performance of these comparison and swap operations, a template is massively better suited to this. That's why C++ std::sort clearly outperforms the C library's generic qsort function, which uses function pointers for what's effectively a C implementation of virtual dispatch. See here for more about that.
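A small side-by-side sketch of that qsort/std::sort point (the two wrapper functions are illustrative):

#include <algorithm>
#include <cstdlib>
#include <vector>

// C style: qsort calls the comparison through a function pointer,
// an opaque out-of-line call the optimizer cannot see through.
int cmp_int(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_c(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
}

// C++: the comparator's type is part of std::sort's instantiation,
// so each comparison can be inlined into the sorting loop.
void sort_cpp(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}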
Also, does compile time polymorphism produce faster code, since it is not necessary to call functions through vtable, or does this get optimized away by the compiler anyway?
It's very often faster, but very occasionally the sum impact of template code bloat may overwhelm the myriad ways compile time polymorphism is normally faster, such that on balance it's worse.