Optimization for specific argument without template

I ran into some optimized code that is fast, but it makes my code ugly.
A minimal example is as follows:
enum class Foo : char {
A = 'A',
B = 'B'
};
struct A_t {
constexpr operator Foo() const { return Foo::A; }
};
void function_v1(Foo s){
if(s == Foo::A){
//Run special version of the code
} else {
//Run other version of the code
}
}
template<class foo_t>
void function_v2(foo_t s){
if(s == Foo::A){
//Run special version of the code
} else {
//Run other version of the code
}
}
int main(){
// Version 1 of the function, simple call, no template
function_v1(Foo::A);
// Version 2 of the function, templated, but call is still simple
function_v2(Foo::A);
// Version 2 of the function, the argument is now not of type Foo, but of type A_t
const A_t a;
function_v2(a);
}
For that last function call, function_v2 will be instantiated with a specific version for A_t. This may be bad for the size of the executable, but in experiments I notice that the compiler is able to recognize that s == Foo::A will always evaluate to true, and the check is optimized away. Using gcc, this check is not optimized away in the other versions, even with -O3.
I'm working on an extremely performance-intensive application, so such optimizations matter. However, I don't like the style of function_v2. To protect against calling the function with the wrong type, I would have to use something like enable_if. It complicates autocompletion because the type is now templated. And the user now needs to remember to call the function using that specifically typed variable instead of the enum value.
Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations? Maybe a slightly different coding style? Or a compiler hint in the code? Or some compiler flag that will make the compiler more likely to make multiple instantiations?

Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations?
If we expand your example a bit to better reveal the compiler's behavior:
enum class Foo : char {
A = 'A',
B = 'B'
};
struct A_t {
constexpr operator Foo() const { return Foo::A; }
};
void foo();
void bar();
void function_v1(Foo s){
if(s == Foo::A){
foo();
} else {
bar();
}
}
template<class foo_t>
void function_v2(foo_t s){
if(s == Foo::A){
foo();
} else {
bar();
}
}
void test1(){
function_v1(Foo::A);
}
void test2(){
function_v2(Foo::A);
}
void test3(){
const A_t a;
function_v2(a);
}
And compile with -O3, we get:
test1(): # #test1()
jmp foo() # TAILCALL
test2(): # #test2()
jmp foo() # TAILCALL
test3(): # #test3()
jmp foo() # TAILCALL
See on godbolt.org: https://gcc.godbolt.org/z/443TqcczW
The resulting assembly for test1(), test2() and test3() is exactly the same! What's going on here?
The if being optimized out in function_v2() has nothing to do with it being a template, but rather the fact that it is defined in a header (which is a necessity for templates), and the full implementation is visible at call sites.
All you have to do to get the same benefits for function_v1() is to define the function in a header and mark it as inline to avoid ODR violations. You will effectively get the exact same optimizations as are happening in function_v2().
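As a rough illustration of that suggestion, here is a minimal sketch of what such a header could look like. The helper names special_path and generic_path, and the int return values, are assumptions added here so the chosen branch can be observed; they are not part of the original code.

```cpp
// Hypothetical header contents (e.g. a file named foo_dispatch.hpp).
enum class Foo : char { A = 'A', B = 'B' };

// Stand-ins for the two code paths (hypothetical, for illustration only).
inline int special_path() { return 1; }
inline int generic_path() { return 2; }

// Defined inline in the header: the full body is visible at every call site,
// so a call with a constant argument can be inlined and the branch folded.
// Whether that actually happens is still up to the optimizer.
inline int function_v1(Foo s) {
    if (s == Foo::A) {
        return special_path();
    }
    return generic_path();
}
```

A call such as function_v1(Foo::A) keeps the simple, non-template signature while giving the optimizer the same visibility it has for a template.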
All this gives you is equivalence though. If you want guarantees, you should forcefully provide the value at compile time, as a template parameter:
template<Foo s>
void function_v3() {
if constexpr (s == Foo::A) {
foo();
}
else {
bar();
}
}
// usage:
function_v3<Foo::A>();
If you still need a runtime-evaluated version of the function, you could do something along these lines:
decltype(auto) function_v3(Foo s) {
switch(s) {
case Foo::A:
return function_v3<Foo::A>();
case Foo::B:
return function_v3<Foo::B>();
}
}
// Forced compile-time switch
function_v3<Foo::A>();
// At the mercy of the optimizer.
function_v3(some_val);
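Put together, the two pieces above could look roughly like the following sketch. It assumes C++17 for if constexpr; special_path, generic_path and the int return values are hypothetical additions that make the selected branch observable.

```cpp
enum class Foo : char { A = 'A', B = 'B' };

// Hypothetical stand-ins for the two code paths.
constexpr int special_path() { return 1; }
constexpr int generic_path() { return 2; }

// Compile-time version: the branch is selected when the template is instantiated.
template <Foo s>
constexpr int function_v3() {
    if constexpr (s == Foo::A) {
        return special_path();
    } else {
        return generic_path();
    }
}

// Runtime version: dispatches to the matching instantiation.
inline int function_v3(Foo s) {
    switch (s) {
        case Foo::A: return function_v3<Foo::A>();
        case Foo::B: return function_v3<Foo::B>();
    }
    // Fallback for values outside the enumerators (an assumption of this sketch).
    return generic_path();
}

// The template form can be checked at compile time.
static_assert(function_v3<Foo::A>() == 1, "A selects the special path");
static_assert(function_v3<Foo::B>() == 2, "B selects the generic path");
```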

How about using template specialization:
template<class T>
void function_v2_other(T s){
//Run other version of the code
}
template<class T>
void function_v2(T s){
function_v2_other(s);
}
template<>
void function_v2(Foo s){
if(s == Foo::A){
//Run special version of the code
} else {
function_v2_other(s);
}
}

Related

How to declare the template argument for an overloaded function

I have a fairly big project that, for the purposes of this question, I can summarize with this structure:
void do_something()
{
//...
}
template<typename F> void use_funct(F funct)
{
// ...
funct();
}
int main()
{
// ...
use_funct(do_something);
}
Everything works fine until someone (me) decides to refactor a little, shrinking some functions, which leads to this minimal reproducible example:
void do_something(const int a, const int b)
{
//...
}
void do_something()
{
//...
do_something(1,2);
}
template<typename F> void use_funct(F funct)
{
// ...
funct();
}
int main()
{
// ...
use_funct(do_something);
}
And now the code doesn't compile with
error: no matching function for call
where use_funct is instantiated.
Since the error message was not so clear to me and the changes were many, I wasted a considerable amount of time before understanding that the compiler couldn't deduce the template parameter because do_something could now refer to any of the overloaded functions.
I removed the ambiguity by changing the function name, but I wonder whether it is possible to avoid this error in the future by not relying on template argument deduction.
How could I specify in this case the template argument for do_something(), possibly without referring to a function pointer?
I haven't the slightest idea how to express this explicitly:
use_funct<-the-one-with-no-arguments->(do_something);

You can wrap the function in a lambda, pass a function pointer after casting it to the type of the overload you want to call, or explicitly specify the template parameter:
use_funct([](){ do_something (); });
use_funct(static_cast<void(*)()>(do_something));
use_funct<void()>(do_something);
Wrapping it in a lambda has the advantage that it is possible to defer overload resolution to use_funct. For example:
void do_something(int) {}
void do_something(double) {}
template<typename F> void use_funct(F funct) {
funct(1); // calls do_something(int)
funct(1.0); // calls do_something(double)
}
int main() {
use_funct([](auto x){ do_something (x); });
}
[...] possibly without referring to a function pointer?
I am not sure what you mean or why you want to avoid that. void() is the type of the function, not a function pointer. If you care about spelling out the type, you can use an alias:
using func_type = void();
use_funct<func_type>(do_something);
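For reference, the options above can be lined up in one compilable sketch. The int return values and the via_* helper names are assumptions added here so that the selected overload can be observed; the original functions return void.

```cpp
// Two overloads sharing one name, as in the question.
int do_something(int a, int b) { return a + b; }
int do_something() { return do_something(1, 2); }

template <typename F>
int use_funct(F funct) { return funct(); }

// 1. Lambda: overload resolution happens inside the lambda body.
int via_lambda() { return use_funct([] { return do_something(); }); }

// 2. Cast: selects the zero-argument overload before deduction.
int via_cast() { return use_funct(static_cast<int (*)()>(do_something)); }

// 3. Explicit template argument: F is the function type int(), and the
//    parameter of that type is adjusted to a pointer to function.
int via_explicit() { return use_funct<int()>(do_something); }

// 4. Same as 3, with the function type spelled through an alias.
int via_alias() {
    using func_type = int();
    return use_funct<func_type>(do_something);
}
```

All four helpers end up calling the zero-argument overload.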

Crash in shared_ptr destructor in templated function

In my 32-bit VS2015 application, I have a templated function that accesses functions of a library (BTK). Depending on the type of this function, a specific overload of a function of this library is called.
This works fine, but recently I started using this same code and library (same binaries and code) in another (also VS2015 32-bit) application, and there it crashes with a segfault/access violation in the destructor of shared_ptr. To be precise, it crashes at the (interlocked) decrement of the use count.
void _Decref()
{ // decrement use count
if (_MT_DECR(_Uses) == 0) // BOOM
{ // destroy managed resource, decrement weak reference count
_Destroy();
_Decwref();
}
}
Now comes the interesting part: when I replace my templated function with a non-templated version, it works fine.
So, if I replace this:
template<class T>
bool SetParameters(const std::string& group, const std::string& param, const std::vector<T>& values, const std::vector<uint8_t>& dims)
{
btk::MetaData::Pointer pParam = GetBtkMetaData(group, param);
if (!pParam)
{
pParam = AddBtkMetaData(group, param);
}
if (!pParam->HasInfo())
{
pParam->SetInfo(btk::MetaDataInfo::New(dims, values));
}
else pParam->GetInfo()->SetValues(dims, values);
return true;
}
with this:
bool C3DFile::SetParameters(const std::string& group, const std::string& param, const std::vector<int16_t>& values, const std::vector<uint8_t>& dims)
{
btk::MetaData::Pointer pParam = GetBtkMetaData(group, param);
if (!pParam)
{
pParam = AddBtkMetaData(group, param);
}
if (!pParam->HasInfo())
{
pParam->SetInfo(btk::MetaDataInfo::New(dims, values));
}
else pParam->GetInfo()->SetValues(dims, values);
return true;
}
It works fine... Apparently, the template instantiation has some effect on the shared pointers. I have three questions:
What kind of effect could templates have on this? I can imagine that the code instantiation could have some effect, but I'm not sure.
Why would the templated version work, with the same binaries etc, in one 32-bit VS2015 app, but not in the other? (Where I need to resort to non-templated functions)
Which compiler/linker options could be relevant? I checked the compiler and linker options, but couldn't find a relevant difference.
Any help would be appreciated.
Ben

What kind of effect could templates have on this? I can imagine that the code instantiation could have some effect, but I'm not sure.
ADL: the templated function uses ADL at the point of instantiation to find the functions its dependent calls resolve to (in your case btk::MetaDataInfo::New(dims, values)), whereas the non-template version only considers the declarations visible at its definition. That is one possible source of the difference.
Example:
#include <iostream>
struct A{};
void fooT(const void*) { std::cout << "void*\n"; }
template <typename T> void barT(const T* p) { fooT(p); }
void fooT(const A*) { std::cout << "A*\n"; }
void foo(const void*) { std::cout << "void*\n"; }
void bar(const A* p) { foo(p); }
void foo(const A*) { std::cout << "A*\n"; }
int main()
{
A a{};
barT(&a); // fooT(const A*) -> A*
bar(&a); // foo(const void*) -> void*
}

Is testing for a return value of an inline (templated) function, which is itself testing, optimized to one test?

Let's say I have a function
bool inline fn(int a) {
if (a == 0) {
a = 1; //or some other computation
return true;
}
return false;
}
int main() {
int a = 0;
if (fn(a)) {
return 1;
}
}
will the main code be roughly inlined to:
int a = 0;
bool test = false;
if (a == 0) {
a = 1; //or some other computation
test = true;
}
if (test) {
return 1;
}
thus resulting in two ifs, OR rather it would look more like this:
int a = 0;
if (a == 0) {
a = 1; //or some other computation
return 1;
}
which I obviously wanted to achieve. I'm using functions here not to make the executable smaller or anything, but purely to make the code more readable.
The actual reason I do this is shown in the next example: imagine the function fn is templated, so that I can choose among several implementations of the function, while the caller exhibits behavior common to all its template instances and delegates the specific functionality to the called functions.
Again, this usage is purely for code reuse and readability. The functions will be called/inlined in a single place in the code (that is, in base_function).
I want to know whether tests on the return values of the functions are optimized efficiently, so that this code reuse technique doesn't interfere with performance or actual execution at all.
template<typename TagSwitch, typename ... Args>
void base_function(Args ... args) {
// some base behavior meant to be common to all functions "derived" from this function
if (do_end(TagSwitch(), args ...)) {
return;
}
//specific_behavior(TagSwitch(), args ...);
}
// default for all specific ("derived") functions is don't end
template<typename TagSwitch, typename ... Args>
bool inline do_end(TagSwitch, Args ... args) {
return false;
}
// here I define my specific ("derived") function
struct MySpecificFunctionTag {};
template<typename ... Args>
bool inline do_end(MySpecificFunctionTag, int a, Args ... args) {
if (a == 0) {
//Do something (set a parameter)
return true;
}
return false;
}
int main() {
base_function<MySpecificFunctionTag>(1);
}
I'd like to know, if the test if (do_end(TagSwitch(), args ...)) { in base_function<MySpecificFunctionTag>(1) instantiation would result in two ifs or one would be optimized out.

Is testing for a return value of an inline (templated) function, which is itself testing, optimized to one test?
It can be optimized to one test, yes. In fact, both of your tests can be optimized away, since the value of the tested expression can be known at compile time. The entire program can be optimized to:
main:
mov eax, 1
ret
That is, it always returns 1 and does nothing else. The latter, templated example returns 0 instead, but is otherwise identical.
The test cannot be optimized away from the function fn, and the check after the function call cannot be optimized away unless the return value of the function can be known at compile time. So, a prerequisite for merging the tests to one is that the optimizer must be able to expand the call inline.
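If you want to confirm that the outcome of such a test is computable at compile time, one option is to make the functions constexpr and check them with static_assert. This is only a sketch of that idea, not the code from the question: DefaultTag, the int return value of base_function and the removed side effects are assumptions made for illustration.

```cpp
struct DefaultTag {};
struct MySpecificFunctionTag {};

// Default for all "derived" functions: don't end.
template <typename Tag, typename... Args>
constexpr bool do_end(Tag, Args...) { return false; }

// More specialized overload, chosen for MySpecificFunctionTag with an int first argument.
template <typename... Args>
constexpr bool do_end(MySpecificFunctionTag, int a, Args...) { return a == 0; }

// Returns 1 if the specific function asked to end early, 0 otherwise.
template <typename Tag, typename... Args>
constexpr int base_function(Args... args) {
    return do_end(Tag(), args...) ? 1 : 0;
}

// Both tests collapse to constants the compiler can evaluate.
static_assert(base_function<MySpecificFunctionTag>(0) == 1, "ends early");
static_assert(base_function<MySpecificFunctionTag>(1) == 0, "does not end");
static_assert(base_function<DefaultTag>(0) == 0, "default never ends");
```

For arguments only known at runtime, merging the tests still depends on the optimizer inlining the call.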

How are template arguments expanded

I am confused about the expansion of this template [example one]. If bool b is checked at runtime in the constructor, where is b stored? Is it put into the private data section [example two]? Or does it become a compile-time value, with a branch removed based on the bool? Or does the compiler simply "paste" what was passed as a template argument into the if(b) [example three]?
Example one:
template<bool b>
class Test
{
public:
Test()
{
if(b)
{
// do something
}
else
{
// do something else
}
}
};
Example two:
class Test
{
public:
Test()
{
if(b)
{
// do something
}
else
{
// do something else
}
}
private:
bool b = true;
};
Example three:
//called with Test<true>
class Test
{
public:
Test()
{
if(true)
{
// do something
}
else
{
// do something else - probably removed due to compiler optimization
}
}
};

Example 3 is the snippet that more closely resembles what the compiler is doing. It's important to understand that example 2 is wrong, as the template parameter is evaluated at compile-time and not injected into the class as a field.
Doing if(b){ } else { } where b is a template bool parameter will require both branches of the if statement to be both parseable and well-formed, even if the compiler will very likely optimize out the branch that doesn't match b.
If you want guaranteed compile-time branch evaluation, and if you need only the taken branch to be well-formed, you can use if constexpr(...) in C++17:
if constexpr(b)
{
// do something
}
else
{
// do something else
}
...or implement your own static_if construct in C++14...
...or use an explicit template specialization in C++11.
I cover all of these techniques in my CppCon 2016 talk, "Implementing static control flow in C++14".
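As a rough sketch of the C++11 option, an explicit specialization for Test<true> gives each value of b its own constructor, so only the matching code is ever instantiated. The branch member is a hypothetical addition that records which constructor ran.

```cpp
// Primary template: used for Test<false>.
template <bool b>
class Test {
public:
    Test() : branch(0) {}  // "do something else"
    int branch;            // hypothetical member, for illustration only
};

// Explicit specialization: used for Test<true>.
template <>
class Test<true> {
public:
    Test() : branch(1) {}  // "do something"
    int branch;
};
```

Here Test<true>().branch is 1 and Test<false>().branch is 0, with no runtime check on b at all.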

C++ specialization, type_of or just typeid

I would like to know what is better to use in my situation and why. First of all, I heard that using RTTI (typeid) is bad. Could anyone explain why? If I know the types exactly, what is wrong with comparing them at runtime? Furthermore, is there any example of how to use boost::type_of? I found none searching through the mighty Google :) The other solution for me is specialization, but I would need to specialize at least 9 variants of the new method. Here is an example of what I need:
I have this class
template<typename A, typename B, typename C>
class CFoo
{
void foo()
{
// Some chunk of code depends on old A type
}
};
So I need to either check with typeid (which I heard is BAD) and write these 3 implementations, as in this example:
void foo()
{
if (typeid(A) == typeid(CSomeClass))
// Do this chunk of code related to A type
else
if (typeid(B) == typeid(CSomeClass))
// Do this chunk of code related to B type
else
if (typeid(C) == typeid(CSomeClass))
// Do this chunk of code related to C type
}
So what is the best solution? I don't want to specialize for all of A, B and C, because every type has 3 specializations, so I would end up with 9 methods; the alternative is just this typeid check.

It's bad because
A, B and C are known at compile-time but you're using a runtime mechanism. If you invoke typeid the compiler will make sure to include metadata into the object files.
If you replace "Do this chunk of code related to A type" with actual code that makes use of CSomeClass's interface you'll see you won't be able to compile the code in case A!=CSomeClass and A having an incompatible interface. The compiler still tries to translate the code even though it is never run. (see example below)
What you normally do is factor out the code into separate function templates or static member functions of classes that can be specialized.
Bad:
template<typename T>
void foo(T x) {
if (typeid(T)==typeid(int*)) {
*x = 23; // instantiation error: an int can't be dereferenced
} else {
cout << "haha\n";
}
}
int main() {
foo(42); // T=int --> instantiation error
}
Better:
template<typename T>
void foo(T x) {
cout << "haha\n";
}
void foo(int* x) {
*x = 23;
}
int main() {
foo(42); // fine, invokes foo<int>(int)
}
Cheers, s

Generally, solutions can be found that don't need RTTI. Needing it "can" indicate that you haven't thought the design of the software through properly. THAT is bad. Sometimes RTTI can be a good thing, though.
Nonetheless, there IS something odd in what you want to do. Could you not create an intermediate template designed something like the following:
template< class T > class TypeWrapper
{
T t;
public:
void DoSomething()
{
}
};
then explicitly specialise it for the types you want, as follows:
template<> class TypeWrapper< CSomeClass >
{
CSomeClass c;
public:
void DoSomething()
{
c.DoThatThing();
}
};
Then, in the class you defined above, you would do something like this:
template<typename A, typename B, typename C>
class CFoo
{
TypeWrapper< A > a;
TypeWrapper< B > b;
TypeWrapper< C > c;
void foo()
{
a.DoSomething();
b.DoSomething();
c.DoSomething();
}
};
This way it only actually does something in the "DoSomething" call if it is going through the specialised template.

The problem lies in the code chunks you write for every specialization.
It doesn't matter if you write (lengthwise)
void foo()
{
if (typeid(A) == typeid(CSomeClass))
// Do this chunk of code related to A type
else
if (typeid(B) == typeid(CSomeClass))
// Do this chunk of code related to B type
else
if (typeid(C) == typeid(CSomeClass))
// Do this chunk of code related to C type
}
or
void foo()
{
A x;
foo_( x );
B y;
foo_( y );
C z;
foo_( z );
}
void foo_( CSomeClass1& ) {}
void foo_( CSomeClass2& ) {}
void foo_( CSomeClass3& ) {}
The upside of the second case is, when you add a class D, you get reminded by the compiler that there is an overload for foo_ missing which you have to write. This can be forgotten in the first variant.

I'm afraid this is not going to work in the first place. Those "chunks of code" have to be compilable even if the type is not CSomeClass.
I don't think type_of is going to help either (if it is the same as auto and decltype in C++0x).
I think you could extract those three chunks into separate functions and overload each for CSomeClass. (Edit: oh there are else if's. Then you might indeed need lots of overloads/specialization. What is this code for?)
Edit2: It appears that your code is hoping to do the equivalent of the following, where int is the special type:
#include <iostream>
template <class T>
bool one() {return false; }
template <>
bool one<int>() { std::cout << "one\n"; return true; }
template <class T>
bool two() {return false; }
template <>
bool two<int>() { std::cout << "two\n"; return true; }
template <class T>
bool three() {return false; }
template <>
bool three<int>() { std::cout << "three\n"; return true; }
template <class A, class B, class C>
struct X
{
void foo()
{
one<A>() || two<B>() || three<C>();
}
};
int main()
{
X<int, double, int>().foo(); //one
X<double, int, int>().foo(); //two
X<double, double, double>().foo(); //...
X<double, double, int>().foo(); //three
}

I think you've got your abstractions wrong somewhere.
I would try redefining A, B & C in terms of interfaces they need to expose (abstract base classes in C++ with pure virtual methods).
Templating allows basically duck-typing, but it sounds like CFoo knows too much about the A B & C classes.
typeid is bad because:
typeid can be expensive, bloats binaries, carries around extra information that shouldn't be required.
Not all compilers support it
It's basically breaking the class hierarchy.
What I would recommend is refactoring: remove the templating, instead define interfaces for A, B & C, and make CFoo take those interfaces. That will force you to refactor the behaviour so the A, B & C are actually cohesive types.
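To make that recommendation a little more concrete, here is a minimal sketch of the interface-based design. IPart, Plain, Special and the describe method are hypothetical names invented for illustration; the real interfaces would depend on what CFoo actually needs from A, B and C.

```cpp
#include <string>

// Hypothetical interface capturing what CFoo needs from its parts.
struct IPart {
    virtual ~IPart() = default;
    virtual std::string describe() const = 0;
};

// Each concrete type supplies its own behaviour instead of being type-checked.
struct Plain : IPart {
    std::string describe() const override { return "plain"; }
};
struct Special : IPart {
    std::string describe() const override { return "special"; }
};

// CFoo is no longer a template; it only talks to the interface.
class CFoo {
public:
    CFoo(const IPart& a, const IPart& b, const IPart& c) : a_(a), b_(b), c_(c) {}
    std::string foo() const {
        return a_.describe() + "," + b_.describe() + "," + c_.describe();
    }
private:
    const IPart& a_;
    const IPart& b_;
    const IPart& c_;
};
```

The trade-off is a virtual call per part in exchange for dropping both typeid and the template parameters.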
