If-statement performance when changing condition style - C++

I have this code:
void test()
{
    if (condition)
    {
        doSomething();
    }
}
test();
doSomethingMore();
Is there any performance effect if I change it to:
void test()
{
    if (!condition)
    {
        return;
    }
    else
    {
        doSomething();
    }
}
test();
doSomethingMore();
Why??

You're really asking the wrong question. The performance is completely irrelevant here. Any optimizing compiler will render it absolutely insignificant, probably producing exactly the same binary for both styles.
No, in fact the real question is which code snippet is more readable. Your fellow human beings are the ones who will have to read and understand your code later. And, if you're particularly unlucky, one of those human beings that will have to read and understand your code later might even be you. Thus, it's far more important to write code that is logically structured and easily readable than it is to worry about micro-optimizations like this. Let the compiler handle those.
So, which is more readable? Definitely the first one. If the condition is true, then you're going to call the doSomething() function. Far more understandable.
In general, if statements should evaluate positively-named conditions. Most of the time, you should try to avoid the ! sign, as it's easily missed when scanning the code and can potentially make your code read as a double-negative. You also should generally avoid having multiple exit points from functions (i.e., the return statement). You should be able to read the function from top-to-bottom and see exactly what it does, without having to jump around. The second code snippet violates both of these rules, and does little else to help justify preferring it over the first.
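For illustration, a minimal sketch of what both guidelines look like in practice (isConnected and the other names here are made up):
bool isConnected();   // a positively-named predicate
void doSomething();

void test()
{
    if (isConnected())   // reads directly: "if connected, do something"
    {
        doSomething();
    }
}
There is no negation to miss when scanning, and the function has exactly one exit point.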

For any modern compiler it's extremely unlikely that small changes like this would make any difference. The optimizer already performs many small optimizations (and several large ones).
In your particular example there will definitely be no difference, even with a totally unoptimized compiler. All you have done is add an extra explicit return instruction, which will sometimes be executed instead of the implied return at the end of the function.
Your code may end up a few bytes bigger, but the execution time would be identical.

Related

Will modern c++ compiler optimize immutable temporary variable?

For example I have a code like this:
void func(const QString& str)
{
    QString s = str.replace(QRegExp("[abc]+"), " ");
    // ...
}
Will the compiler optimize the QRegExp("[abc]+") temporary, constructing it just once instead of constructing it each time func is invoked? Or, in other words, do I need to rewrite the code for performance like this:
void func(const QString& str)
{
    static const QRegExp sc_re("[abc]+");
    QString s = str.replace(sc_re, " ");
    // ...
}
i.e., making the QRegExp a static const variable.
Will the compiler optimize the QRegExp("[abc]+") temporary, constructing it just once instead of constructing it each time func is invoked?
You are assuming that each invocation of func will construct an identical QRegExp object, but how do you know that? How do you know, for example, that these objects do not contain a serial number, an integer member set to the number of QRegExp objects previously constructed? If such a serial number were being used, it would be wrong for the compiler to construct your temporary variable just once.
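To make that concern concrete, here is a hypothetical sketch (the Pattern class is invented purely for illustration; QRegExp presumably does nothing of the sort) of a constructor whose observable state would make the hoisting invalid:
struct Pattern
{
    static int s_count;   // total Pattern objects constructed so far
    int serial;           // this object's serial number
    explicit Pattern(const char*) : serial(s_count++) {}
};
int Pattern::s_count = 0;

void func()
{
    Pattern p("[abc]+");  // p.serial differs on every call, so the compiler
                          // must construct a fresh object each time
}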
OK, we can reasonably guess that nothing like that is going on. The point, though, is that we are guessing, and the compiler is not allowed to guess. So a prerequisite for the compiler considering such an optimization would be that the definition of the constructor is available (which is an implementation detail of that class, something you should not make your code dependent on).
If the constructor's definition is available, and if that definition provably produces the same results given the same input (and probably some other technical restrictions that slip my mind at the moment), then a compiler would be allowed to make this optimization.
I do not know if any compilers choose to provide this sort of optimization when it would be both allowed and beneficial (another assumption you've made). Performance testing of the two candidates with and without optimizations enabled should reveal if your particular compiler is likely taking advantage of this.
Or, in other words, do I need to rewrite the code for performance like this:
You almost never need to re-implement for performance. (One exception would be if your code is so inefficient it would take centuries to finish. I'm pretty sure we're not in that ballpark.) A better question is "should". I'll go with that.
In this specific case I would guess "no, that looks like premature optimization". However, that is just a guess, so I'll proceed to general guidelines that you can apply.
You should re-implement for performance only if:
1) the performance gain is noticeable to an end user, or
2) the new code is easier for a programmer to read and understand.
In other cases, rely on the compiler to make appropriate optimizations.
In your case, I see the variable name sc_re and think "what is that?" So point 2 is out. That leaves the question of a noticeable performance gain. This usually is not something one can determine by simply asking around. Typically, it involves performance testing, probably of at least two types. One test would time the two candidates in an artificial heavy loop to see how large the performance gain is (if there is one at all). The other test would profile your actual program to see if this code is called often enough for the gain to be noticed by an end user. A good third test would be to give the actual program to an end user and see if they notice the difference.
Of these tests, profiling might be the most productive use of your time. (Programmers are notoriously bad at identifying true performance roadblocks without the aid of a profiler.) If you spend 2 milliseconds in this function every 5 minutes, why spend time trying to improve that? On the other hand, if you spend 1 second in this function each time it is called, the profiler might tell you whether or not this constructor is the main culprit.
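For the artificial-loop test, a minimal timing sketch along these lines can give a first impression (func_temp and func_static are placeholders standing in for the two candidate versions above; this is not a rigorous benchmark):
#include <chrono>
#include <iostream>

void func_temp()   { /* version constructing the temporary on each call */ }
void func_static() { /* version using the static const object */ }

template <typename F>
long long millisecondsFor(F f, int iterations)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

int main()
{
    std::cout << millisecondsFor(func_temp, 1000000) << " ms (temporary)\n";
    std::cout << millisecondsFor(func_static, 1000000) << " ms (static)\n";
}
Remember to run it with optimizations enabled, since that is the configuration whose behavior you actually care about.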

Understanding cost of multiple . and -> operator use?

Out of habit, when accessing values via . or ->, I assign them to variables any time the value is going to be used more than once. My understanding is that in scripting languages like ActionScript this is pretty important. However, in C/C++, I'm wondering if this is a meaningless chore; am I wasting effort that the compiler is going to handle for me, or am I exercising a good practice, and why?
struct Foo
{
public:
    Foo(int val) { m_intVal = val; }
    int GetInt() { return m_intVal; }
    int m_intVal; // not private for the sake of the last example
};

void Bar()
{
    Foo* foo = GetFooFromSomewhere();
    SomeFuncUsingIntValA(foo->GetInt()); // accessing via dereference then function
    SomeFuncUsingIntValB(foo->GetInt()); // accessing via dereference then function
    SomeFuncUsingIntValC(foo->GetInt()); // accessing via dereference then function
    // Is this better?
    int val = foo->GetInt();
    SomeFuncUsingIntValA(val);
    SomeFuncUsingIntValB(val);
    SomeFuncUsingIntValC(val);
    ///////////////////////////////////////////////
    // And likewise with the . operator
    Foo fooDot(5);
    SomeFuncUsingIntValA(fooDot.GetInt()); // accessing via function
    SomeFuncUsingIntValB(fooDot.GetInt()); // accessing via function
    SomeFuncUsingIntValC(fooDot.GetInt()); // accessing via function
    // Is this better?
    int valDot = fooDot.GetInt();
    SomeFuncUsingIntValA(valDot);
    SomeFuncUsingIntValB(valDot);
    SomeFuncUsingIntValC(valDot);
    ///////////////////////////////////////////////
    // And lastly, the . operator on a member, not a function
    SomeFuncUsingIntValA(fooDot.m_intVal); // accessing via member
    SomeFuncUsingIntValB(fooDot.m_intVal); // accessing via member
    SomeFuncUsingIntValC(fooDot.m_intVal); // accessing via member
    // Is this better?
    int valAsMember = fooDot.m_intVal;
    SomeFuncUsingIntValA(valAsMember);
    SomeFuncUsingIntValB(valAsMember);
    SomeFuncUsingIntValC(valAsMember);
}
OK, so I'll try to give an answer here.
Short version: you definitely don't need to do this.
Long version: you might need to do this.
Here it goes: in interpreted languages like JavaScript, these kinds of things might have a noticeable impact. In compiled languages like C++, not so much; often not at all.
Most of the time you don't need to worry about these things, because an immense amount of resources has been poured into compiler optimization algorithms (and their actual implementations), and the compiler will usually decide correctly what to do: allocate an extra register and save the result in order to reuse it, or recompute every time and keep that register free, etc.
There are instances where the compiler can't do this: when it can't prove that multiple calls produce the same result. Then it has no choice but to make all the calls.
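For instance (an invented getter, purely for illustration), the compiler must emit every call here, since each call observably changes state:
struct Counter
{
    int n = 0;
    int next() { return ++n; }   // not const: each call returns a new value
};

void example(Counter& c)
{
    int a = c.next();   // cannot be folded into one call;
    int b = c.next();   // a and b are provably different
    (void)a; (void)b;   // silence unused-variable warnings
}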
Now let's assume that the compiler makes the wrong choice and you, as a precaution, make the effort of micro-optimizing by hand. Say you squeeze out a 10% performance increase (already an overly optimistic figure for this kind of optimization) on that portion of code. But what do you know: your code spends only 1% of its time in that portion. The rest of the time is most likely spent in some hot loops and waiting for data fetches. So you spend a non-negligible amount of effort hand-optimizing the code only to get a 0.1% improvement in total execution time, which won't even be observable, because external factors vary the execution time by far more than that.
So don't spend time on micro-optimizations in C++.
However, there are cases where you might need to do this, and even crazier things. But that comes only after properly profiling your code, and it is another discussion.
So worry about readability; don't worry about micro-optimizations.
The question is not really about the -> and . operators specifically, but rather about repetitive expressions in general. Yes, it is true that most modern compilers are smart enough to optimize code that evaluates the same expression repeatedly (assuming it has no observable side effects).
However, using an explicit intermediate variable typically makes the program much more readable, since it explicitly exposes the fact that the same value is supposed to be used in all contexts, and that this was your intent.
If you repeat the same expression to generate that value again and again, this fact becomes much less obvious. Firstly, it is difficult to tell at first sight whether the expressions are really identical (especially when they are long). Secondly, it is not obvious whether sequential evaluations of seemingly the same expression produce identical results.
Finally, slicing long expressions into smaller ones by using intermediate variables can significantly simplify debugging in a step-by-step debugger, since it gives the user a much greater degree of control through the "step in" and "step over" commands.
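A trivial made-up sketch of that last point (the helpers are hypothetical, declared only to make the sketch complete): with named steps, every intermediate value is visible after a single "step over".
#include <string>

std::string load(const std::string& path);        // hypothetical helpers,
std::string parse(const std::string& raw);        // declared here purely
std::string normalize(const std::string& parsed); // for illustration
void process(const std::string& s);

void run(const std::string& path)
{
    // Instead of: process(normalize(parse(load(path))));
    std::string raw        = load(path);
    std::string parsed     = parse(raw);
    std::string normalized = normalize(parsed);
    process(normalized);
}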
It's certainly better in terms of readability and maintainability to have such a temporary variable.
In terms of performance, you shouldn't worry about such micro-optimization at this stage (premature optimization). Moreover, modern C++ compilers can optimize it anyway, so you really shouldn't worry about it.

Function calls vs. local variables

I often see functions where other functions are called multiple times instead of storing the result of the function once.
For example (1):
void ExampleFunction()
{
    if (TestFunction() > x || TestFunction() < y || TestFunction() == z)
    {
        a = TestFunction();
        return;
    }
    b = TestFunction();
}
Instead, I would write it this way (2):
void ExampleFunction()
{
    int test = TestFunction();
    if (test > x || test < y || test == z)
    {
        a = test;
        return;
    }
    b = test;
}
I think version (2) is much better to read and easier to debug.
But I'm wondering why people write it like (1). Is there anything I don't see? A performance issue?
When I look at it, I see in the worst case 4 function calls in (1) instead of 1 function call in (2), so performance should be worse in (1), shouldn't it?
I'd use (2) if I wanted to emphasize that the same value is used throughout the code, or if I wanted to emphasize that the type of that value is int. Emphasizing things that are true but not obvious can assist readers to understand the code quickly.
I'd use (1) if I didn't want to emphasize either of those things, especially if they weren't true, or if the number of times that TestFunction() is called is important due to side-effects.
Obviously if you emphasize something that's currently true, but then in future TestFunction() changes and it becomes false, then you have a bug. So I'd also want either to have control of TestFunction() myself, or to have some confidence in the author's plans for future compatibility. Often that confidence is easy: if TestFunction() returns the number of CPUs then you're happy to take a snapshot of the value, and you're also reasonably happy to store it in an int regardless of what type it actually returns. You have to have minimal confidence in future compatibility to use a function at all, e.g. be confident that it won't in future return the number of keyboards. But different people sometimes have different ideas what's a "breaking change", especially when the interface isn't documented precisely. So the repeated calls to TestFunction() might sometimes be a kind of defensive programming.
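The CPU-count case even has a direct standard-library analogue; a snapshot like this is the usual idiom (note that std::thread::hardware_concurrency may return 0 when the count is not computable, hence the fallback):
#include <thread>

// Take the value once; we accept that it won't reflect later changes.
unsigned cpus = std::thread::hardware_concurrency();
const unsigned cpuCount = (cpus != 0) ? cpus : 1;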
When a temporary is used to store the result of a very simple expression like this one, it can be argued that the temporary introduces unnecessary noise that should be eliminated.
In his book "Refactoring: Improving the Design of Existing Code", Martin Fowler lists this elimination of temporaries as a possibly beneficial refactoring (Inline temp).
Whether or not this is a good idea depends on many aspects:
Does the temporary provide more information than the original expression, for example through a meaningful name?
Is performance important? As you noted, the version with the temporary might be more efficient (although most compilers should be able to optimize the other version so that the function is called only once, assuming it is free of side effects).
Is the temporary modified later in the function? (If not, it should probably be const)
etc.
In the end, the choice to introduce or remove such temporary is a decision that should be made on a case by case basis. If it makes the code more readable, leave it. If it is just noise, remove it. In your particular example, I would say that the temporary does not add much, but this is hard to tell without knowing the real names used in your actual code, and you may feel otherwise.
The second option is clearly superior.
You want to emphasize, and ensure, that you have the same value all three times in the if-statement.
Performance should not be a bottleneck in this example. In conclusion, minimizing the chance of errors and emphasizing that the values are the same are much more important than a potential small performance gain.
The two are not equivalent. Take for example:
int TestFunction()
{
    static int x;
    return x++;
}
In a sane world though, this wouldn't be the case, and I agree that the second version is better. :)
If the function, for some reason, can't be inlined, the second will even be more efficient.
I think version (2) is much better to read and easier to debug.
Agreed.
so performance should be worse in (1), shouldn't it?
Not necessarily. If TestFunction is small enough, then the compiler may decide to optimize the multiple calls away. In other cases, whether performance matters depends on how often ExampleFunction is called. If not often, then optimize for maintainability.
Also, TestFunction may have side-effects, but in that case, the code or comments should make that clear in some way.

Pointer dereferencing overhead vs branching / conditional statements

In heavy loops, such as those found in game applications, there can be many factors that decide which part of the loop body is executed (for example, a character object will be updated differently depending on its current state), and so instead of doing:
void my_loop_function(int dt) {
    if (conditionX && conditionY)
        doFoo();
    else
        doBar();
    // ...
}
I am used to using a function pointer that points to a certain logic function corresponding to the character's current state, as in:
void (*updater)(int);

void something_happens() {
    updater = &doFoo;
}

void something_else_happens() {
    updater = &doBar;
}

void my_loop_function(int dt) {
    (*updater)(dt);
    // ...
}
And in the case where I don't want to do anything, I define a dummy function and point to it when I need to:
void do_nothing(int dt) { }
Now what I'm really wondering is: am I obsessing about this needlessly? The example given above is of course simple; sometimes I need to check many variables to figure out which pieces of code to execute, and so I figured that using these "state" function pointers would be faster and, to me, more natural, but a few people I'm dealing with heavily disagree.
So, is the gain from using a (virtual)function pointer worth it instead of filling my loops with conditional statements to flow the logic?
Edit: to clarify how the pointer is being set, it's done through event handling on a per-object basis. When an event occurs and, say, that character has custom logic attached to it, it sets the updater pointer in that event handler until another event occurs which will change the flow once again.
Thank you
The function pointer approach lets you make the transitions asynchronous. Rather than just passing dt to the updater, pass the object as well. Now the updater can itself be responsible for the state transitions. This localizes the state transition logic instead of globalizing it in one big ugly if ... else if ... else if ... function.
As far as the cost of this indirection: do you care? You might care if your updaters are so extremely small that the cost of a dereference plus a function call overwhelms the cost of executing the updater code. If the updaters have any complexity, that complexity is going to overwhelm the cost of this added flexibility.
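A minimal sketch of that suggestion, with invented names: the updater receives the object, so each state function can install the next state itself.
struct Character;
typedef void (*Updater)(Character&, int);

struct Character
{
    Updater updater;
    // ... other per-object state ...
};

void doBar(Character& c, int dt) { /* running-state logic ... */ }

void doFoo(Character& c, int dt)
{
    // ... idle-state logic ...
    bool somethingHappened = false;   // placeholder for real event logic
    if (somethingHappened)
        c.updater = &doBar;           // the transition is decided locally
}

void my_loop_function(Character& c, int dt)
{
    c.updater(c, dt);                 // one indirect call; no if/else chain
}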
I think I'll agree with the non-believers here. The money question in this case is: how is the pointer value going to be set?
If you can somehow index into a map and produce a pointer, then this approach might justify itself by reducing code complexity. However, what you have here is rather more like a state machine spread across several functions.
Consider that something_else_happens will, in practice, have to examine the previous value of the pointer before setting it to another value. The same goes for something_different_happens, etc. In effect, you've scattered the logic for your state machine all over the place and made it difficult to follow.
Now what I'm really wondering is: am I obsessing about this needlessly?
If you haven't actually run your code, and found that it actually runs too slowly, then yes, I think you probably are worrying about performance too soon.
Herb Sutter and Andrei Alexandrescu, in C++ Coding Standards: 101 Rules, Guidelines, and Best Practices, devote chapter 8 to this, titled "Don't optimize prematurely", and they summarise it well:
Spur not a willing horse (Latin proverb): Premature optimization is as addictive as it is unproductive. The first rule of optimization is: Don’t do it. The second rule of optimization (for experts only) is: Don’t do it yet. Measure twice, optimize once.
It's also worth reading chapter 9: "Don’t pessimize prematurely"
Testing a condition involves:
fetching a value
a compare (a subtraction)
a conditional jump
Performing an indirection involves:
fetching an address
a jump
It may even be more performant!
In fact, you do the "compare" beforehand, in another place, to decide what to call. The result will be identical.
You have done nothing more than build a dispatch system identical to the one the compiler generates when calling virtual functions.
It has been shown that avoiding virtual functions by implementing dispatch through switches doesn't improve performance on modern compilers.
The advice "don't use indirection / don't use virtual / don't use function pointers / don't dynamic_cast, etc." is in most cases just a myth based on historical limitations of early compilers and hardware architectures.
The performance difference will depend on the hardware and the compiler optimizer. Indirect calls can be very expensive on some machines, and very cheap on others. And really good compilers may be able to optimize even indirect calls, based on profiler output. Until you've actually benchmarked both variants, on your actual target hardware and with the compiler and compiler options you use in your final release code, it's impossible to say.
If the indirect calls do end up being too expensive, you can still hoist the tests out of the loop, either by setting an enum and using a switch in the loop, or by implementing the loop for each combination of settings and selecting once at the beginning. (If the functions you point to implement the complete loop, this will almost certainly be faster than testing the condition each time through the loop, even if indirection is expensive.)
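A sketch of that hoisting idea, using the names from the question (the loop structure itself is invented): the condition is evaluated once, and each branch gets its own straight-line loop.
void doFoo(int dt);
void doBar(int dt);

void run_updates(bool conditionX, bool conditionY, int steps, int dt)
{
    if (conditionX && conditionY)      // tested once, outside the loop
    {
        for (int i = 0; i < steps; ++i)
            doFoo(dt);                 // loop specialized for this case
    }
    else
    {
        for (int i = 0; i < steps; ++i)
            doBar(dt);
    }
}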

Overhead of calling tiny functions from a tight inner loop? [C++]

Say you see a loop like this one:
for (int i = 0;
     i < thing.getParent().getObjectModel().getElements(SOME_TYPE).count();
     ++i)
{
    thing.getData().insert(
        thing.getData().count(),
        thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName()
    );
}
If this were Java, I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it... however, I don't know if the compiler is smart enough to make that futile.
This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following:
Is having a messy condition in the for loop going to get evaluated each time, or only once?
If those get methods are simply returning references to member variables on the objects, will they be inlined away?
Would you expect custom [] operators to get optimized at all?
In other words is it worth the time (in performance only, not readability) to convert it to something like:
ElementContainer& source =
    thing.getParent().getObjectModel().getElements(SOME_TYPE);
int num = source.count();
Store& destination = thing.getData();
for (int i = 0; i < num; ++i)
{
    destination.insert(destination.count(), source[i].getName());
}
Remember, this is a tight loop, called millions of times a second. What I wonder is if all this will shave a couple of cycles per loop or something more substantial?
Yes I know the quote about "premature optimisation". And I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.
The general way to answer such questions is to look at the produced assembly. With gcc, this involves replacing the -c flag with -S.
My own rule is not to fight the compiler. If something is to be inlined, then I make sure that the compiler has all the information needed to perform such an inline, and (possibly) I try to urge it to do so with an explicit inline keyword.
Also, inlining saves a few opcodes but makes the code grow, which, as far as the L1 cache is concerned, can be very bad for performance.
All the questions you are asking are compiler-specific, so the only sensible answer is "it depends". If it is important to you, you should (as always) look at the code the compiler is emitting and do some timing experiments. Make sure your code is compiled with all optimisations turned on - this can make a big difference for things like operator[](), which is often implemented as an inline function, but which won't be inlined (in GCC at least) unless you turn on optimisation.
If the loop is that critical, I can only suggest that you look at the generated code. If the compiler is allowed to aggressively optimise the calls away, then perhaps it will not be an issue. Sorry to say this, but modern compilers can optimise incredibly well, and I really would suggest profiling to find the best solution in your particular case.
If the methods are small and can and will be inlined, then the compiler may do the same optimizations that you have done. So, look at the generated code and compare.
Edit: It is also important to mark const methods as const, e.g. in your example count() and getName() should be const to let the compiler know that these methods do not alter the contents of the given object.
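A cut-down sketch of that suggestion (the classes are hypothetical, modelled on the accessors in the question):
#include <string>

class Element
{
public:
    const std::string& getName() const { return m_name; }  // const: cannot modify *this
private:
    std::string m_name;
};

class ElementContainer
{
public:
    int count() const { return m_size; }   // const accessors tell both the
                                            // compiler and readers that nothing changes
private:
    int m_size = 0;
};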
As a rule, you should not have all that garbage in your "for condition" unless the result is going to be changing during your loop execution.
Use another variable set outside the loop. This will eliminate the WTF reaction when reading the code, it will not negatively impact performance, and it will sidestep the question of how well the functions get optimized. If those calls are not optimized, this will also result in a performance increase.
I think in this case you are asking the compiler to do more than it legitimately can, given the scope of compile-time information it has access to. So, in particular cases the messy condition may be optimized away, but really, the compiler has no particularly good way to know what kind of side effects that long chain of function calls might have. I would assume that breaking out the test would be faster, unless I had benchmarking (or disassembly) that showed otherwise.
This is one of the cases where a JIT compiler has a big advantage over a C++ compiler. It can, in principle, optimize for the most common case seen at runtime and emit optimized native code for that (plus checks to make sure that execution actually falls into that case). This sort of thing is used all the time for polymorphic method calls that turn out not to actually be used polymorphically; whether it could catch something as complex as your example, though, I'm not certain.
For what it's worth, if speed really mattered, I'd split it up in Java too.