Template specialization vs. Compiler optimization

Template specialization vs. Compiler optimization - c++

I have a class with a boolean template argument.
template<bool b>
class Foo {
void doFoo();
};
I want doFoo to do different things based on the value of b.
Naively I could write
Option 1
template<bool b> void Foo<b>::doFoo() {
if (b) cout << "hello";
else cout << "goodbye";
}
This seems inefficient to me because I have to execute an if every time the function is called event though the correct branch should be known at compile time. I could fix this with template specialization:
Option 2
template<> void Foo<true>::doFoo() { cout << "hello"; }
template<> void Foo<false>::doFoo() { cout << "goodbye"; }
This way I don't have any conditionals executed in runtime. This solution is a bit more complicated (especially since in my real code the class has several template arguments and you can't partially specialize functions so I will need to wrap the function in a class).
My question is, is the compiler smart enough to know not to execute the conditional in option 1 since it always executes the same way or do I need to write the specializations? If the compiler is smart enough I would be happy to know if this is compiler dependent or a language feature that I can rely on?

The compiler will probably optimize the branch away since it is known at compile time what b is. This is not guaranteed though and the only way to know for sure is to check the assembly.
If you can use C++17 you can use if constexpr and that guarantees only one branch will exist.

This seems inefficient to me because I have to execute an if every time the function is called
The compiler will probably optimize this away - but it's not guaranteed by the standard. To be certain, you should look at the output of the compiler you care about (with the compile options you plan to use): eg. clang doesn't have a branch in the linked example (the un-optimized version has lots of function call boilerplate but no branch).
In C++17 you can use if constexpr, and the branch not taken will be discarded at compile time.

Related

How do I force the compiler to evaluate a switch at compile time?

I'm working on an embedded project (only C++14 compiler available) and I would like to optimize the speed of execution.
Here is an example of what I am doing.
enum gpio_type{
TYPE_1,
TYPE_2
}
template <gpio_type T>
class test{
test(){}
void set_gpio(bool output)
{
switch (T)
{
case TYPE_1:
do_something();
break;
case TYPE_2:
do_something_else();
break;
}
}
}
Will the compiler automatically remove the dead code at compile time? If it does, it is a standard feature or compiler dependent? If it does not, is it possible to write the code in a way that will force optimizing?

You could specialize set_gpio for the different enum values - eg. :
template <gpio_type T>
class test{
public:
test(){}
void set_gpio(bool output);
};
template<> void test<TYPE_1>::set_gpio(bool output) {
do_something();
}
template<> void test<TYPE_2>::set_gpio(bool output) {
do_something_else();
}
As other answers have indicated, you might not need to if your compiler is anywhere close to decent at optimizing. But the above might be more readable nevertheless.

Constant propagation and dead code elimination are one of the simplest compiler optimizations. And since T is a compile time constant I would be extremely extremely surprised if the code isn't optimized by any compiler.
I have tested 15 compilers and platforms on godbolt from the venerable x86 to arm, avr, risc-v, raspberry and arduino (and more). All of them just compile to the equivalent of a tail call jump. No tests, no conditional jumps. Go check it out for yourself.
At this point I can say with reasonable confidence that there is no performance reason to modify your code.

That might depend on if you turn on optimization or not and how intelligent your compiler is. I guess current compilers would optimize in this case at least if they inline the function.
But if you want to be 100% sure
specialize the template for different enum values or
use your switch and look at the assembler output of your compiler to check if the compiler optimized like you want it or
use C++17 and if constexpr

I would pass the functor (do_something or do_something_else) as a template argument.
Thus, your code of set_gpio becomes clearer and you are sure to have a compile time deduction which function to take.
In this post you can see how this is done:
Function passed as template argument

What is practical difference between `inline` and `template<class = void>`?

We have 2 methods to declare function in header-only library. They are inline and template<class = void>. In boost source code I can see both variants. Example follows:
inline void my_header_only_function(void)
{
// Do something...
return;
}
template<class = void> void my_header_only_function(void)
{
// Do something...
return;
}
I know what is difference according to C++ standard. However, any C++ compiler is much more than just standard, and also standard is unclear often.
In situation where template argument is never used and where it is not related to recursive variadic template, is there (and what is) practical difference between 2 variants for mainstream compilers?

I think this can be used as a weird way to allow library extension (or mocking) from outside library code by providing specialization for void or a non-template version of the function in the same namespace:
#include <iostream>
template<class = void>
int
foo(int data)
{
::std::cout << "template" << std::endl;
return data;
}
// somewhere else
int
foo(int data)
{
::std::cout << "non-template" << std::endl;
return data;
}
int main()
{
foo(1); // non template overload is selected
return 0;
}
online compiler

One difference is that binary code for the function may become part of the generated object file even if the function is never used in that file, but there will never be any code for the template if it's not used.

I'm the author of Beast. Hopefully I will be able to shed some light on why you see one versus the other. It really is very simple, the template seems less likely to be inlined into calling functions, bloating the code needlessly. I know that "inline" is really only supposed to mean "remove duplicate definitions" but sometimes compiler implementors get overzealous. The template thing is a little bit harder on the compile (Travis craps out sometimes at only 2GB RAM). So I decided to try writing some new stuff using the "inline" keyword. I still don't know how I feel about it.
The short answer is that I was doing it one way for a long time and then I briefly did it the other way for no particularly strong reason. Sorry if that is not as exciting as the other theories! (which were very interesting in fact)

Calling a method on an unknown type through a template function

The C++ language allows me to write a template function that will call a method on the object that gets passed to the function.
The concern here is when I do this, my IDE (NetBeans 8.2) will complain that it can't resolve the method. This makes sense because the type that will be used for "T" is not known until I try to compile the code, but the fact that it gives me a warning makes me a little concerned about whether or not doing this is bad programming practice.
Consider the following code:
struct S1 {
void method() const {
cout << "Method called from S1!" << endl;
}
};
struct S2 {
void method() const {
cout << "Method called from S2!" << endl;
}
};
template <typename T>
void call_method(const T& object) {
// IDE reports a warning "can't resolve the template based identifier method."
object.method();
}
Usage example:
S1 obj1;
S2 obj2;
call_method(obj1);
call_method(obj2);
This code will compile and work just fine, but the IDE will always complain. Is this okay to do? Or is there a better design that I should know about to get the same desired results.
The desired result is to write a function that can use S1 or S2 objects without caring which one it is as long as they provide an interface that includes "method()".
Assume I do not have access to the source code of S1 and S2 so I can't make changes to them. (For example I cannot make them inherit from a common base class and use dynamic polymorphism instead of a template function to achieve the same effect).

This is perfectly OK and is commonly used in a lot of cases. Dealing with a generic container from the standard library or different types of iterators for example.
If the type passed in does not have an appropriate method you will get a compilation error.
If you want you can use SFINAE to make sure the type passed in is among the types you expect. Can be good or useful sometimes, but often not needed.
Update: static_assert is another useful way of putting constraints on a template as pointed out by #Evgeny

Will C++ optimize out empty/non-virtual/void method calls?

Example code:
class DummyLock {
public:
void lock() {}
void unlock() {}
};
...
template <class T>
class List {
T _lock;
...
public:
void append(void* smth) {
_lock.lock();
...
_lock.unlock();
}
};
...
List<DummyLock> l;
l.append(...);
So, will it optimize out these method calls if lock type is a templated type? If no, what is the best approach to making a template list that has policies as template arguments (as in Andrei Alexandrescu C++ book)

Assuming inlining is enabled (so "some optimisation turned on"), then yes, any decent compiler should make this sort of thing into zero instructions. Particularly in a template, as templates require [in nearly all of the current compilers, at least] the compiler to "see" the source of the object. In a non-templated sitution, then it's possible to come up with a scenario where you are "out of line" declaring the empty lock code, and the compiler can't know that the function is empty.
(Looks scary with void *smth in your append tho' - I hope you do intent to have that as a second template type in your real implementation)
As always when it comes to "does the compiler do this", if it's really important, you need to check that YOUR compiler does what you expect in this particular case. clang++ -S or g++ -S would for example show if there are calls made or not within your append function.

Yes, any real-world C++ compiler (i.e. gcc, cland, VC++), will output no code for empty inline functions when optimization is turned on.

Why uncalled template class members aren't instantiated?

I was wondering that when I create an instance of a class template with specifying the template type parameter.
1) why the non-called function do not get instatiated ? .
2) dont they get compiled until I try to use it ?
3) what is the logic behind this behavior ?
Example
template <class T>
class cat{
public:
T a;
void show(){
cout << a[0];
}
void hello(){
cout << "hello() get called \n";
}
};
int main(){
cat<int> ob1; // I know that show() did not get instatiated, otherwise I will get an error since a is an int
ob1.hello();
}

Templates aren't code - they're a pattern used to make the actual code. The template isn't complete until you supply the parameters so the code can't be made ahead of time. If you don't call a function with a particular set of template parameters, the code never gets generated.

If they instantiated the entire class, then you might get invalid code.
You don't always want that.
Why? Because in C++, it's difficult (and in some cases, outright impossible, as far as I know) to say, "only compile this code if X, Y, and Z are true".
For example, how would you say, "only my copy constructor if the embedded object can be copied"? As far as I know, you can't.
So they just made them not compile unless you actually call them.

To embellish a little more: this is typically called duck typing, and the bottom line is that it allows you to write "class patterns" of which some member functions may apply when instantiated with one template type, other members may apply when instantiated with a second template type, and by only requiring the ones you actually call to compile, you get to write a lot less code for operations that wind up being common ones.
By not requiring all member functions to be compiled you get all the niceties of static type checking on the functions that actually are compiled.
For example, imagine you had:
template <typename E>
class myContainer {
// Imagine that constructors, setup functions, etc. were here
void sort(); // this function might make sense only if E has an operator< defined
E max(); // compute the max element, again only makes sense with a operator<
E getElement(int i); // return the ith element
E transmogrify(); // perhaps this operation only makes sense on vectors
};
Then you have
// sort() and getElement() makes total sense on this, but not transmogrify()
myContainer<int> mci;
// sort and max might not be needed, but getElement() and transmogrify() might
myContainer<vector<double>> mcvd;

No code is generated for cat<int>::show() because you never call it. If you did call it you would get a compilation error. Template functions which are never called do not exist.
Templates are little more than test substitution mechanisms. This makes them very powerful. You, as the programmer, may want to create a cat<int> knowing that you will never call show() or call anything else that would be invalid. The compiler lets you know if you did, so it works out nicely.
So if your question is "why does it work this way", I would ask you "why not"? It's a design choice. This choice allows me to use a template type safely and still benefit from other parts of the code. What's the harm? You also generate less code, which is a good thing, right?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Template specialization vs. Compiler optimization - c++

The compiler will probably optimize the branch away since it is known at compile time what b is. This is not guaranteed though and the only way to know for sure is to check the assembly. If you can use C++17 you can use if constexpr and that guarantees only one branch will exist.

Related

How do I force the compiler to evaluate a switch at compile time?

What is practical difference between `inline` and `template<class = void>`?

Calling a method on an unknown type through a template function

Will C++ optimize out empty/non-virtual/void method calls?

Why uncalled template class members aren't instantiated?

Categories

Resources