Inline function pointer to avoid if statement - c++

In my jpg decoder I have a loop with an if statement that will always be true or always be false depending on the image. I could make two separate functions to avoid the if statement, but out of curiosity I was wondering what the effect on efficiency would be if I used a function pointer instead of the if statement: it would point to the real inline function if the condition is true, or to an empty inline function if it is false.
class jpg {
private:
    // empty function
    inline void nothing();
    // real function
    inline void function();
    // pointer to inline member function
    void (jpg::*functionptr)() = nullptr;

    void decode();   // the loop below lives in a member function of jpg
};

void jpg::nothing() {}

void jpg::decode()
{
    functionptr = &jpg::nothing;
    if (trueorfalse) {
        functionptr = &jpg::function;
    }
    while (kazillion) {
        (this->*functionptr)();
        dootherstuff();
    }
}
Could this be faster than an if statement? My guess is no, because the inlining will be useless: the compiler can't know at compile time which function will be called through the pointer, and resolving and calling through the function pointer is slower than an if statement.
I have profiled my program, and while I expected a noticeable difference one way or the other, I did not see one. So I'm just asking out of curiosity.

It is very likely that the if statement will be faster than invoking a function through a pointer: the if compiles to a short conditional jump, whereas the call carries the overhead of a function call.
This has been discussed here: Which one is faster? Function call or conditional if statement?
The "inline" keyword is just a hint to the compiler to tell it to try to put the instructions inline when assembling it. If you use a function pointer to an inline, the inline optimization cannot be used anyway:
Read: Do inline functions have addresses?
If you feel that the if statement slows the loop down too much, you can eliminate it altogether by using separate while loops:
if (trueorfalse) {
    while (kazillion) {
        trueFunction();
        dootherstuff();
    }
} else {
    while (kazillion) {
        dootherstuff();
    }
}

Caution 1: I am not really answering the above question, on purpose. If one wants to know what is faster between an if statement and a function call via a pointer in the above example, then mbonneau gives a very good answer.
Caution 2: The following is pseudo-code.
Curiosity aside, I truly think one should not ask whether an if statement or a function call is faster in order to optimize code. The gain would certainly be very small, and the resulting code might be twisted in a way that hurts readability AND maintenance.
In my research I do care about performance; it is a fundamental concern I have to stick with. But I care even more about code maintenance, and if I have to choose between a good structure and a slight optimization, I definitely choose the good structure. If it were me, I would write the above code as follows (avoiding if statements), using composition through a Strategy pattern.
class MyStrategy {
public:
    virtual ~MyStrategy() {}
    virtual void MyFunction( Stuff& ) = 0;
};

class StrategyOne : public MyStrategy {
public:
    void MyFunction( Stuff& );       // do something
};

class StrategyTwo : public MyStrategy {
public:
    void MyFunction( Stuff& ) { }    // do nothing, and if you
                                     // change your mind it could
                                     // do something later.
};

class jpg {
public:
    jpg( MyStrategy& strat ) : strat(strat) { }
    void func( Stuff &stuff ) { strat.MyFunction( stuff ); }
private:
    ...
    MyStrategy& strat;               // held by reference, not by value
};

main() {
    StrategyOne s1;
    StrategyTwo s2;
    jpg a( s1 );
    jpg b( s2 );
    vector<jpg> v { a, b };
    Stuff stuff;
    for( auto& e : v )
    {
        e.func( stuff );
        dootherstuff();
    }
}

Avoiding Checking likely if

Given the following:
class ReadWrite {
public:
    int Read(size_t address);
    void Write(size_t address, int val);
private:
    std::map<size_t, int> db;
};
In the Read function, when accessing an address to which no previous write was made, I want to either throw an exception signaling the error or allow it and return 0. In other words, I would like to use either std::map<size_t, int>::operator[]() or std::map<size_t, int>::at(), depending on some bool value which the user can set. So I add the following:
class ReadWrite {
public:
    int Read(size_t add) { if (allow) return db[add]; return db.at(add); }
    void Write(size_t add, int val) { db[add] = val; }
    void Allow() { allow = true; }
private:
    bool allow = false;
    std::map<size_t, int> db;
};
The problem with that is:
Usually the program will call Allow() once or not at all at the beginning, and then perform many accesses. So, performance-wise, this code is bad because it performs the if (allow) check on every access, even though the result is usually always true or always false.
So how would you solve such a problem?
Edit:
While the described use case (one Allow() call or none at the start) is the most likely one, it is not guaranteed, so I must allow the user to call Allow() dynamically.
Another Edit:
Regarding the solutions which use a function pointer: what about the performance overhead incurred by a function pointer call that the compiler cannot inline? If we use std::function instead, will that solve the issue?
Usually the program will have one call of Allow() or none at the beginning of the program and then afterwards many accesses. So, performance-wise, this code is bad because it performs the if (allow) check every time, where usually it's either always true or always false. So how would you solve such a problem?
I won't. The CPU will.
The branch predictor will figure out that the answer is most likely to stay the same for a long time, so it can optimize the branch very effectively at the hardware level. It will still incur some overhead, but a negligible one.
If you really need to optimize your program, I think you'd be better off using std::unordered_map instead of std::map, or moving to some faster map implementation, like google::dense_hash_map. The branch is insignificant compared to the map lookup.
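A minimal sketch of that swap, keeping the question's ReadWrite interface and only changing the container (google::dense_hash_map would slot in the same way):

#include <cstddef>
#include <unordered_map>

class ReadWrite {
public:
    int Read(std::size_t add) { if (allow) return db[add]; return db.at(add); }
    void Write(std::size_t add, int val) { db[add] = val; }
    void Allow() { allow = true; }
private:
    bool allow = false;
    std::unordered_map<std::size_t, int> db;   // hash table instead of a red-black tree
};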
If you want to decrease the time-cost, you have to increase the memory-cost. Accepting that, you can do this with a function pointer. Below is my answer:
class ReadWrite {
public:
    void Write(size_t add, int val) { db[add] = val; }
    // when allowed, make the function pointer point to read2
    void Allow() { Read = &ReadWrite::read2; }
    // function pointer that points to read1 by default
    int (ReadWrite::*Read)(size_t) = &ReadWrite::read1;
private:
    int read1(size_t add) { return db.at(add); }
    int read2(size_t add) { return db[add]; }
    std::map<size_t, int> db;
};
The function pointer is a data member, so the call has to go through an object, using the .* (or ->*) operator. As an example:
ReadWrite rwObject;
//some code here
//...
(rwObject.*rwObject.Read)(5); // call through the function pointer
Note that non-static data member initialization is only available since C++11, so int (ReadWrite::*Read)(size_t) = &ReadWrite::read1; may not compile with older compilers. In that case, you have to declare a constructor in which the function pointer is initialized.
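A minimal sketch of that constructor-based initialization (note that std::map::at() is itself C++11, so the checked read is spelled out by hand here):

#include <cstddef>
#include <map>
#include <stdexcept>

class ReadWrite {
public:
    ReadWrite() : Read(&ReadWrite::read1) {}    // initialize the pointer here
    void Write(std::size_t add, int val) { db[add] = val; }
    void Allow() { Read = &ReadWrite::read2; }
    int (ReadWrite::*Read)(std::size_t);
private:
    // checked lookup written by hand instead of db.at()
    int read1(std::size_t add) {
        std::map<std::size_t, int>::const_iterator it = db.find(add);
        if (it == db.end()) throw std::out_of_range("no write at this address");
        return it->second;
    }
    int read2(std::size_t add) { return db[add]; }
    std::map<std::size_t, int> db;
};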
You can use a pointer to member function.
class ReadWrite {
public:
    void Write(size_t add, int val) { db[add] = val; }
    int Read(size_t add) { return (this->*Rfunc)(add); }
    void Allow() { Rfunc = &ReadWrite::Read2; }
private:
    std::map<size_t, int> db;
    int Read1(size_t add) { return db.at(add); }
    int Read2(size_t add) { return db[add]; }
    int (ReadWrite::*Rfunc)(size_t) = &ReadWrite::Read1;
};
If you want runtime dynamic behaviour, you'll have to pay for it at runtime (at the point where you want your logic to behave dynamically).
You want different behaviour at the point where you call Read depending on a runtime condition, so you'll have to check that condition.
No matter whether your overhead is a function pointer call or a branch, you'll end up with a jump or call to different places in your program, depending on allow, at the point where Read is called by the client code.
Note: Profile and fix real bottlenecks - not suspected ones. (You'll learn more if you profile by either having your suspicion confirmed or by finding out why your assumption about the performance was wrong.)

Jump as an alternative to RTTI

I am learning how C++ is compiled into assembly, and I found how exceptions work under the hood very interesting. If it's okay to have more than one return path for exceptions, why not for normal functions?
For example, let's say you have a function that can return a pointer to class A or to something derived from A. The way you're supposed to handle it is with RTTI.
But why not, instead, have the called function, after computing the return value, jump back into the caller at the specific location that matches the return type? Like with exceptions: execution either continues normally or, if something throws, it lands in one of your catch handlers.
Here is my code:
#include <cassert>

class A
{
public:
    virtual int GetValue() { return 0; }
};

class B : public A
{
public:
    int VarB;
    int GetValue() override { return VarB; }
};

class C : public A
{
public:
    int VarC;
    int GetValue() override { return VarC; }
};

A* Foo(int i)
{
    if (i == 1) return new B;
    if (i == 2) return new C;
    return new A;
}

int main()
{
    A* a = Foo(2);
    if (B* b = dynamic_cast<B*>(a))
    {
        b->VarB = 1;
    }
    else if (C* c = dynamic_cast<C*>(a)) // Line 36
    {
        c->VarC = 2;
    }
    else
    {
        assert(a->GetValue() == 0);
    }
}
So instead of doing it with RTTI and dynamic_cast checks, why not have the Foo function just jump to the appropriate location in main? In this case Foo returns a pointer to C, so Foo should jump to line 36 directly.
What's wrong with this? Why aren't people doing it? Is there a performance reason? I would think this would be cheaper than RTTI.
Or is this just a language limitation, regardless of whether it's a good idea or not?
First of all, there are a million different ways a language could be defined. C++ is defined as it is. Whether that is nice or not really does not matter. If you want to improve the language, you are free to write a proposal to the C++ committee. They will review it and maybe include it in a future standard. Sometimes this happens.
Second, although exceptions are dispatched under the hood, there is no strong reason to think that this is more efficient than your handwritten code that uses RTTI. Exception dispatch still requires CPU cycles. There is no miracle there. The real difference is that with RTTI you write the dispatch code yourself, while the exception dispatch code is generated for you by the compiler.
You may want to call your function 10000 times and find out which runs faster: RTTI-based code or exception dispatch.
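A rough sketch of such a comparison, condensing the question's classes; the timing harness is only illustrative, and the numbers will vary widely between compilers and exception implementations:

#include <cassert>
#include <chrono>
#include <iostream>

struct A { virtual ~A() = default; virtual int GetValue() { return 0; } };
struct B : A { int VarB = 0; int GetValue() override { return VarB; } };
struct C : A { int VarC = 0; int GetValue() override { return VarC; } };

A* Foo(int i)
{
    if (i == 1) return new B;
    if (i == 2) return new C;
    return new A;
}

void ViaRtti(int i)                         // dispatch with dynamic_cast
{
    A* a = Foo(i);
    if (B* b = dynamic_cast<B*>(a))      b->VarB = 1;
    else if (C* c = dynamic_cast<C*>(a)) c->VarC = 2;
    else                                 assert(a->GetValue() == 0);
    delete a;
}

void ViaExceptions(int i)                   // dispatch with throw/catch
{
    try {
        if (i == 1) throw B{};
        if (i == 2) throw C{};
        throw A{};
    }
    catch (B& b) { b.VarB = 1; }
    catch (C& c) { c.VarC = 2; }
    catch (A&)   { /* plain A */ }
}

int main()
{
    using namespace std::chrono;
    auto t0 = steady_clock::now();
    for (int n = 0; n < 10000; ++n) ViaRtti(2);
    auto t1 = steady_clock::now();
    for (int n = 0; n < 10000; ++n) ViaExceptions(2);
    auto t2 = steady_clock::now();
    std::cout << "dynamic_cast: " << duration_cast<microseconds>(t1 - t0).count() << " us\n"
              << "throw/catch:  " << duration_cast<microseconds>(t2 - t1).count() << " us\n";
}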

Generating code during execution in c++11

I am programming with C++11 and was wondering if there is a way to generate some code during execution.
For example instead of writing:
void b(int i) { i + 1; }
void c(int i) { i - 1; }

if (condition) b(i);
else           c(i);
is there a more straightforward way to say: if true, then replace all + with -?
Thank you, and sorry if this question is stupid.
C++ has no native facilities for runtime code generation. You could of course invoke a C++ compiler from your program, then dynamically load the resulting binary, and call code from it, but I doubt this is the best solution to your problem.
If you are worried about repeatedly checking the condition, you shouldn't be. Modern CPUs will likely deal with this very well, even in a tight loop, due to branch prediction.
Lastly, if you really want to alter the code path you take more dynamically, you could use function pointers and/or polymorphism and/or lambdas.
An example with functions
typedef void (*pFun)(int); // pointer to function taking int, returning void

void b(int i) { i + 1; }
void c(int i) { i - 1; }
...
pFun d = cond ? b : c;     // based on condition, select function b or c
...
d(i);                      // calls either b or c, effectively selecting + or -
An example with polymorphism
class Operator
{
public:
    Operator() {}
    virtual ~Operator() {}
    virtual void doIt(int i) = 0;
};

class Add : public Operator
{
public:
    virtual void doIt(int i) { i + 1; }
};

class Sub : public Operator
{
public:
    virtual void doIt(int i) { i - 1; }
};
...
// one operand is cast to the base pointer so the conditional operator has a common type
Operator *pOp = cond ? static_cast<Operator*>(new Add()) : new Sub();
...
pOp->doIt(i);
...
delete pOp;
Here, I have defined a base class with the doIt pure virtual function. The two child classes override the doIt() function to do different things. pOp will then point at either an Add or a Sub instance depending on cond, so when pOp->doIt() is called, the appropriate implementation of your operator is used. Under the covers, this does essentially what I outlined in the above example with function pointers, so choosing one over the other is largely a matter of style and/or other design constraints. They should both perform just as well.
An example with lambdas
This is basically the same as the first example using function pointers, but done in a more C++11 way using lambdas (and it is more concise).
auto d = cond ? [](int i) { i + 1; }
              : [](int i) { i - 1; };
...
d(i);
Alternatively, you may prefer to have the condition inside the body of the lambda, for example
auto d = [&](int i) { cond ? i + 1 : i - 1; };
...
d(i);
C++ does not have runtime code generation since it's a compiled language.
In this case, you could put the sign into a variable (to be used with multiple variables.)
E.g.
int sign = (true ? 1 : -1);
result2 += sign;
result1 += sign;
Not necessarily a solution for your problem, but you could use
a template, instantiated on one of the operators in <functional>:
template <typename Op>
int
func( int i )
{
    return Op()( i, 1 );
}
In your calling function, you would then do something like:
int (*f)( int i ) = condition ? &func<std::plus<int>> : &func<std::minus<int>>;
// ...
i = f( i );
It's possible to use lambdas, which may be preferable, but you can't use
the conditional operator in this case. (Every lambda has a unique type,
and the second and third operands of the conditional operator must
have the same type.) So it becomes a bit more verbose:
int (*f)( int i );
if ( condition ) {
    f = []( int i ) { return i + 1; };
} else {
    f = []( int i ) { return i - 1; };
}
This will only work if there is no capture in the lambdas; when there is
no capture, the lambda not only generates an instance of a class with
a unique type, but also a function. Although not being able to use the
conditional operator makes this more verbose than necessary, it is still
probably simpler than having to define a function outside of the class,
unless that function can be implemented as a template, as in my first
example. (I'm assuming that your actual case may be significantly more
complicated than the example you've posted.)
EDIT:
Re lambdas, I tried:
auto f = c ? []( int i ) { return i + 1; } : []( int i ) { return i - 1; };
just out of curiosity. MSC++ gave me the expected error
message:
no conversion from 'someFunc::<lambda_21edbc86aa2c32f897f801ab50700d74>' to 'someFunc::<lambda_0dff34d4a518b95e95f7980e6ff211c5>'
but g++ compiled it without complaining, typeid(f) gave "PFiiI",
which I think is a pointer to a function. In this case, I'm pretty sure
that MSC++ is right: the standard says that each of the lambdas has
a unique type, and that each has a conversion operator to (in this
case) an int (*)( int ) (so both can be converted to the same
type—this is why the version with the if works). But the
specification of the conditional operator requires that either the
second operand can be converted to the type of the third, or vice versa,
but the results must be the type of one of the operands; it cannot be
a third type to which both are converted.
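If you do want the single-expression form, one workaround is to force both lambdas to decay to the same function pointer type before the conditional operator is applied, for example with static_cast or a leading unary +. A minimal sketch, reusing the condition variable from above:

// both operands are converted to int (*)(int) up front, so the conditional
// operator sees one common type
int (*f)(int) = condition
    ? static_cast<int (*)(int)>([](int i) { return i + 1; })
    : static_cast<int (*)(int)>([](int i) { return i - 1; });

// the same idea, using unary + to trigger the lambda-to-function-pointer conversion
auto g = condition ? +[](int i) { return i + 1; }
                   : +[](int i) { return i - 1; };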

C++ inline function & context specific optimization

I have read in Scott Meyers' Effective C++ book that:
When you inline a function you may enable the compiler to perform context specific optimizations on the body of function. Such optimization would be impossible for normal function calls.
Now the question is: what is context-specific optimization, and why is it necessary?
I don't think "context specific optimization" is a defined term, but I think it basically means the compiler can analyse the call site and the code around it and use this information to optimise the function.
Here's an example. It's contrived, of course, but it should demonstrate the idea:
Function:
int foo(int i)
{
if (i < 0) throw std::invalid_argument("");
return -i;
}
Call site:
int bar()
{
int i = 5;
return foo(i);
}
If foo is compiled separately, it must contain a comparison and exception-throwing code. If it's inlined in bar, the compiler sees this code:
int bar()
{
int i = 5;
if (i < 0) throw std::invalid_argument("");
return -i;
}
Any sane optimiser will evaluate this as
int bar()
{
return -5;
}
If the compiler chooses to inline a function, it replaces the call to that function with the function's body. It then has more code to optimize inside the caller's body, which often leads to better code.
Imagine that:
bool callee(bool a){
    if(a) return false;
    else return true;
}
void caller(){
    if(callee(true)){
        //Do something
    }
    //Do something
}
Once inlined, the code will look like this (approximately):
void caller(){
    bool a = true;
    bool ret;
    if(a) ret = false;
    else ret = true;
    if(ret){
        //Do something
    }
    //Do something
}
Which may be optimized further to:
void caller(){
    if(false){
        //Do something
    }
    //Do something
}
And then to:
void caller(){
    //Do something
}
The function is now much smaller, and you avoid the cost of the function call and especially (regarding the question) the cost of branching.
Say the function is
void fun( bool b) { if(b) do_sth1(); else do_sth2(); }
and it is called in a context where the parameter is known to be false
bool param = false;
...
fun( param);
then the compiler may reduce the function body to
...
do_sth2();
I don't think that "context specific optimization" means anything specific, and you probably can't find an exact definition.
A nice example is the classical getter for some class attribute. Without inlining, the program has to:
jump to the getter body
move the value into a register (eax on x86 under Windows with default Visual Studio settings)
jump back to the caller
move the value from eax into a local variable
With inlining, almost all of that work can be skipped and the value moved directly into the local variable.
Optimizations strictly depend on the compiler, but a lot can happen (variable allocation may be skipped, code may get reordered, and so on). But you always save the call/jump, which is an expensive instruction.
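A small illustrative sketch of the getter case described above (Widget and twice are made-up names):

class Widget {
public:
    int value() const { return value_; }   // trivial getter, typical inline candidate
private:
    int value_ = 42;
};

int twice(const Widget& w)
{
    // Out of line: call value(), pass the member back through the return
    // register, copy it into v. Inlined: the compiler can load w.value_
    // directly, or fold the whole function down to a single load and shift.
    int v = w.value();
    return v * 2;
}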
More reading on optimisation here.

Best Practice for Scoped Reference Idiom?

I just got burned by a bug that is partially due to my lack of understanding, and partially due to what I think is suboptimal design in our codebase. I'm curious as to how my 5-minute solution can be improved.
We're using ref-counted objects, where we have AddRef() and Release() on objects of these classes. One particular object is derived from the ref-count object, but a common function to get an instance of these objects (GetExisting) hides an AddRef() within itself without advertising that it is doing so. This necessitates doing a Release at the end of the functional block to free the hidden ref, but a developer who didn't inspect the implementation of GetExisting() wouldn't know that, and someone who forgets to add a Release at the end of the function (say, during a mad dash of bug-fixing crunch time) leaks objects. This, of course, was my burn.
void SomeFunction(ProgramStateInfo *P)
{
    ThreadClass *thread = ThreadClass::GetExisting( P );
    // some code goes here
    bool result = UseThreadSomehow(thread);
    // some code goes here
    thread->Release(); // Need to do this because GetExisting() calls AddRef()
}
So I wrote up a little class to avoid the need for the Release() at the end of these functions.
class ThreadContainer
{
private:
    ThreadClass *m_T;
public:
    ThreadContainer(ThreadClass *T) { m_T = T; }
    ~ThreadContainer() { if (m_T) m_T->Release(); }
    ThreadClass * Thread() const { return m_T; }
};
So that now I can just do this:
void SomeFunction(ProgramStateInfo *P)
{
    ThreadContainer ThreadC(ThreadClass::GetExisting( P ));
    // some code goes here
    bool result = UseThreadSomehow(ThreadC.Thread());
    // some code goes here
    // Automagic Release() in ThreadC Destructor!!!
}
What I don't like is that to access the thread pointer, I have to call a member function of ThreadContainer, Thread(). Is there some clever way that I can clean that up so that it's syntactically prettier, or would anything like that obscure the meaning of the container and introduce new problems for developers unfamiliar with the code?
Thanks.
Use boost::shared_ptr.
It is possible to define your own destructor (deleter) function, as in this example: http://www.boost.org/doc/libs/1_38_0/libs/smart_ptr/sp_techniques.html#com
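A minimal sketch of that approach, assuming the ThreadClass / GetExisting / UseThreadSomehow API from the question; the second constructor argument is a custom deleter that calls Release() instead of delete (std::shared_ptr supports the same thing):

#include <boost/shared_ptr.hpp>

void ReleaseThread(ThreadClass* t) { if (t) t->Release(); }

void SomeFunction(ProgramStateInfo* P)
{
    boost::shared_ptr<ThreadClass> thread(ThreadClass::GetExisting(P), ReleaseThread);
    bool result = UseThreadSomehow(thread.get());
    // Release() runs automatically when 'thread' goes out of scope
}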
Yes, you can implement operator ->() for the class, which will recursively call operator ->() on whatever you return:
class ThreadContainer
{
private:
    ThreadClass *m_T;
public:
    ThreadContainer(ThreadClass *T) { m_T = T; }
    ~ThreadContainer() { if (m_T) m_T->Release(); }
    ThreadClass * operator -> () const { return m_T; }
};
It's effectively using smart pointer semantics for your wrapper class:
ThreadClass *t = new ThreadClass();
...
ThreadContainer tc(t);
...
tc->SomeThreadFunction(); // invokes m_T->SomeThreadFunction() behind the scenes...
You could also write a conversion function to enable your UseThreadSomehow(ThreadContainer tc) type calls in a similar way.
If Boost is an option, I think you can set up a shared_ptr to act as a smart reference as well.
Take a look at ScopeGuard. It allows syntax like this (shamelessly stolen from that link):
{
    FILE* topSecret = std::fopen("cia.txt", "r");
    ON_BLOCK_EXIT(std::fclose, topSecret);
    ... use topSecret ...
} // topSecret automagically closed
Or you could try Boost.ScopeExit:
void World::addPerson(Person const& aPerson) {
    bool commit = false;
    m_persons.push_back(aPerson);        // (1) direct action
    BOOST_SCOPE_EXIT( (&commit)(&m_persons) )
    {
        if(!commit)
            m_persons.pop_back();        // (2) rollback action
    } BOOST_SCOPE_EXIT_END
    // ...                               // (3) other operations
    commit = true;                       // (4) turn all rollback actions into no-op
}
I would recommend following bb's advice and using boost::shared_ptr<>. If Boost is not an option, you can take a look at std::auto_ptr<>, which is simple and probably addresses most of your needs. Take into consideration that std::auto_ptr has special move semantics that you probably don't want to mimic.
The approach is to provide both the * and -> operators, together with a getter (for the raw pointer) and a release operation in case you want to give up control of the inner object.
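A sketch of what that combined interface could look like on the question's ThreadContainer (illustrative only):

class ThreadContainer
{
private:
    ThreadClass *m_T;
public:
    explicit ThreadContainer(ThreadClass *T) : m_T(T) {}
    ~ThreadContainer() { if (m_T) m_T->Release(); }

    ThreadClass& operator*()  const { return *m_T; }
    ThreadClass* operator->() const { return m_T; }
    ThreadClass* get()        const { return m_T; }
    ThreadClass* release()    { ThreadClass *t = m_T; m_T = 0; return t; }
};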
You can add an automatic type-cast operator to return your raw pointer. This approach is used by Microsoft's CString class to give easy access to the underlying character buffer, and I've always found it handy. There might be some unpleasant surprises to be discovered with this method, as in any time you have an implicit conversion, but I haven't run across any.
class ThreadContainer
{
private:
    ThreadClass *m_T;
public:
    ThreadContainer(ThreadClass *T) { m_T = T; }
    ~ThreadContainer() { if (m_T) m_T->Release(); }
    operator ThreadClass *() const { return m_T; }
};

void SomeFunction(ProgramStateInfo *P)
{
    ThreadContainer ThreadC(ThreadClass::GetExisting( P ));
    // some code goes here
    bool result = UseThreadSomehow(ThreadC);
    // some code goes here
    // Automagic Release() in ThreadC Destructor!!!
}