I'm wondering which is more efficient.
Let's say I have this code:
while (i < 10000){
    if (cond1)
        doSomething1();
    else if (cond2)
        doSomething2();
    else
        doSomething3();
    ++i;   // assumed increment, so the loop makes progress
}
Now cond1 and cond2 are "constant" conditions, meaning that if one of them holds for some value of i it holds for every iteration, and at most one of them can be true.
Most of the time, the last doSomething3() is executed.
Now what happens if I write something like this:
if (cond1){
    while (i < 10000){
        doSomething1();
        ++i;
    }
}
else if (cond2){
    while (i < 10000){
        doSomething2();
        ++i;
    }
}
else{
    while (i < 10000){
        doSomething3();
        ++i;
    }
}
Is it more efficient because I'm checking cond1 and cond2 just once?
What you need to be asking is which is more cache-efficient.
In many cases, the compiler can figure this out for you and will rearrange your code to get the best results. Its ability to do so will depend largely on what doSomethingN does and what condN are. If condN can change over time, then it may well need to be physically checked each time, in which case your code cannot be substantially re-arranged and checking it on every loop iteration may indeed be far slower. But, if condN is constant, the repeated checks can be optimised away, essentially resulting in your first code being converted into the second.
Ultimately, the only way to be sure is to measure it (and study the resulting assembly, if you can understand it).
At first glance, the second one looks more efficient: one check is obviously cheaper than 10000 checks, performance-wise. Slightly more code is the price it comes at.
But then again, the performance overhead of the first one may very well be negligible. You should really benchmark the two, because you don't have a performance problem until you prove you do.
At any rate, any production-grade compiler will likely be able to optimize things for you in such a fairly static, easy-to-analyze context.
It could be a different scenario, with many more options that cannot be predicted at compile time. For example, you could have a hash map from conditions to function pointers, so you look up the function pointer for the current condition and use it to invoke the functionality. Naturally, this will be far less efficient, because it uses dynamic dispatch, which also means the calls cannot be inlined. But that is the price you pay for flexibility: with this approach you can register different functions and conditions at runtime, change the functionality for a given condition, and so on.
For example, consider a scenario where you need to perform 10000 actions, but each action depends on the result of the previous one:
while (i < 10000) cond = map[cond]();
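Here's a minimal, self-contained sketch of that dispatch-table idea; the condition codes and step functions are invented for the example:
#include <cstdio>
#include <map>

// Invented condition codes and steps; each step returns the condition
// that selects the next step, as in the fragment above.
enum Cond { A, B, C };

Cond stepA() { std::puts("ran A"); return B; }
Cond stepB() { std::puts("ran B"); return C; }
Cond stepC() { std::puts("ran C"); return A; }

int main() {
    // The dispatch table maps a condition to a function pointer; entries
    // could be registered or replaced at runtime, which is the point.
    std::map<Cond, Cond (*)()> map = {
        { A, &stepA }, { B, &stepB }, { C, &stepC }
    };

    Cond cond = A;
    for (int i = 0; i < 10; ++i)
        cond = map[cond]();   // indirect call: flexible, but not inlinable
}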
I think you've answered your own question. Yes, it's more efficient if you write more efficient code.
EDIT: @NeilKirk has a point. Since your compiler knows what it's doing, manually checking outside the loop is at best equally efficient, not necessarily more efficient, if the compiler can detect that the condition won't change during your loop.
Related
How much time does saving a value cost me, processor-wise? Say I have a calculated value x that I will use 2 times, 5 times, or 20 times. At what point does it become more optimal to save the calculated value instead of recalculating it each time I use it?
example:
int a = 0, b = -5;
for (int i = 0; i < k; ++i)
    a += abs(b);
or
int a = 0, b = -5;
int x = abs(b);
for (int i = 0; i < k; ++i)
    a += x;
At what k value does the second scenario produce better results? Also, how much is this RAM dependent?
Since the value of abs(b) doesn't change inside the for loop, a compiler will most likely optimize both snippets to the same result i.e. evaluating the value of abs(b) just once.
It is almost impossible to provide an answer other than measure in a real scenario. When you cache the data in the code, it may be stored in a register (in the code you provide it will most probably be), or it might be flushed to L1 cache, or L2 cache... depending on what the loop is doing (how much data is it using?). If the value is cached in a register the cost is 0, the farther it is pushed the higher the cost it will take to retrieve the value.
In general, write code that is easy to read and maintain, then measure the performance of the application, and if that is not good, profile. Find the hotspots, find why they are hotspots and then work from there on. I doubt that caching vs. calculating abs(x) for something as above would ever be a hotspot in a real application. So don't sweat it.
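If you do decide to measure, a minimal timing sketch along these lines is a starting point. Note it is not a rigorous benchmark: the compiler may well optimize both loops to the same code, and results vary by flags and machine. The volatile on b is only there to keep abs(b) from being hoisted out of the first loop.
#include <chrono>
#include <cstdio>
#include <cstdlib>

int main() {
    const int k = 100000000;
    volatile int b = -5;   // volatile: b must be re-read, so abs(b) can't be hoisted
    long long a = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < k; ++i) a += std::abs(b);   // recompute every iteration
    auto t1 = std::chrono::steady_clock::now();

    const int x = std::abs(b);                      // cache the value once
    for (int i = 0; i < k; ++i) a += x;
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("recompute: %lld ms, cached: %lld ms (a=%lld)\n",
        (long long)std::chrono::duration_cast<ms>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<ms>(t2 - t1).count(),
        a);
}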
I would suggest (this is without testing, mind you) that the example with
int x=abs(b)
outside the loop will be faster simply because you're avoiding allocating a stack frame each iteration in order to call abs().
That being said, if the compiler is smart enough, it may figure out what you're doing and produce the same (or similar) instructions for both.
As a rule of thumb it doesn't cost you much, if anything, to store that value outside the loop, since the compiler is most likely going to store the result of abs(b) in a register anyway. In fact, when the compiler optimizes this code (assuming you have optimizations turned on), one of the first things it will do is pull that abs(b) out of the loop.
You can further help the compiler generate good code by qualifying your declaration of "x" with the "register" hint. This asks the compiler to store x in a register if possible (note that the register keyword is deprecated in modern C++, and current compilers make this decision on their own anyway).
If you want to see what the compiler actually does with your code, one thing to do is to tell it to compile but not assemble (in gcc, the option is -S) and look at the resulting assembly code. In many cases, the compiler will generate better code than you can optimize by hand. However, there's also no reason to NOT do these easy optimizations yourself.
Addendum:
Compiling the above code with optimizations turned on in GCC will result in code equivalent to:
a = abs(b) * k;
Try it and see.
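If you want to try it, a minimal function to feed to g++ -O2 -S (or Compiler Explorer) might look like this; with optimizations on, the loop should be folded away:
#include <cstdlib>

// Inspect the assembly: the loop body should collapse into something
// equivalent to a = abs(b) * k, i.e. a single multiply.
int sum_abs(int k) {
    int a = 0, b = -5;
    for (int i = 0; i < k; ++i)
        a += std::abs(b);
    return a;
}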
In many cases caching produces better performance from k=2 onward. The example you gave, however, is not one of them: most compilers try to perform this kind of hoisting even when only low optimization levels are enabled. The value is stored, at worst, on the local stack, and so is likely to stay fairly cache-warm, negating your memory concerns.
But potentially it will be held in a register.
The original has to perform an additional branch, repeat the calculation, and return the value. abs is one example of a function the compiler may be able to recognize as constexpr and hoist.
When developing your own classes, this is one of the reasons you should try to mark members and functions as constexpr whenever possible.
So, there's this rule to try to pull if statements out of high repetition loops:
for( int i = 0 ; i < 10000 ; i++ )
{
    if( someModeSettingOn )  doThis( data[i] ) ;
    else                     doThat( data[i] ) ;
}
They say, it's better to break it up, to put the if statement outside:
if( someModeSettingOn )
    for( int i = 0 ; i < 10000 ; i++ )
        doThis( data[i] ) ;
else
    for( int i = 0 ; i < 10000 ; i++ )
        doThat( data[i] ) ;
(In case you're saying "Ho! Don't optimize that yourself! The compiler will do it!") Sure, the optimizer might do this for you. But in Typical C++ Bullshit (which I don't agree with on every point, e.g. his attitude towards virtual functions), Mike Acton says "Why make the compiler guess at something you know?" Pretty much the best point of those stickies, for me.
So why not use a function pointer instead?
typedef void (*FunctionPointer)( int ) ;  // assuming data[i] is an int; adjust to your element type
FunctionPointer fp ;
if( someModeSettingOn )  fp = doThis ;
else                     fp = doThat ;
for( int i = 0 ; i < 10000 ; i++ )
{
    fp( data[i] ) ;
}
Is there some kind of hidden overhead to function pointers? Is it as efficient as calling a straight function?
In this example it's impossible to say which case will be faster. You need to profile this code on the target platform/compiler to find out.
And in general, in 99% of cases such code does not need to be optimized. It's an example of evil premature optimization.
Write human-readable code and optimize it only if needed, after profiling.
Don't guess, measure.
But, if I absolutely had to guess, I'd say the third variant (function pointer) is going to be slower than the second variant (if outside the loops), which I suspect plays better with the CPU's branch prediction.
The first variant may or may not be equivalent to the second one, depending on how smart the compiler is, as you have already noted.
Why make the compiler guess at something you know?
Because you may complicate the code for future maintainers without providing any tangible benefit to the users of your code. This change smells strongly of premature optimization and only after profiling would I consider anything other than the obvious (if inside loop) implementation.
If profiling shows it to be a problem, then as a guess I believe pulling the if out of the loop would be faster than the function pointer, because the pointer may add a level of indirection that the compiler can't optimize away. It also decreases the likelihood that the compiler can inline any calls.
However I would also consider an alternate design using an abstract interface instead of an if within the loop. Then each data object already knows what to do automatically.
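A hedged sketch of that design (all names invented; doThis/doThat become implementations of the interface, and the mode is decided once, when the Action is chosen):
#include <vector>

struct Action {
    virtual ~Action() = default;
    virtual void run(int& item) const = 0;
};

// Invented stand-ins for doThis/doThat, operating on int elements.
struct DoThis : Action { void run(int& item) const override { item += 1; } };
struct DoThat : Action { void run(int& item) const override { item -= 1; } };

void process(std::vector<int>& data, const Action& action) {
    for (int& item : data)
        action.run(item);   // virtual dispatch replaces the per-iteration if
}

int main() {
    std::vector<int> data(10000, 0);
    bool someModeSettingOn = true;
    DoThis doThis;
    DoThat doThat;
    const Action& action = someModeSettingOn ? static_cast<const Action&>(doThis)
                                             : static_cast<const Action&>(doThat);
    process(data, action);
}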
My bet would be on the second version being the fastest, with the if/else outside the loop, provided I get a refund when we tie, testing this across the widest range of compilers. :-D I make this bet after quite a number of years with VTune in hand.
That said, I would actually be happy if I lost the bet. I think it's very feasible that many compilers nowadays could optimize the first version to rival the second, detecting that you're repeatedly checking a variable which doesn't change inside the loop and therefore effectively hoisting the branching to occur outside the loop.
However, I haven't yet encountered a case where I've seen an optimizer do the equivalent for an indirect function call, i.e. inline it... though if there were a case where an optimizer could do this, yours would definitely be the easiest, since it assigns the addresses of the functions in the same function in which it calls them through the function pointers. I'd be really pleasantly surprised if optimizers can do that now, especially because I like your third version best from a maintainability standpoint (it's the easiest one to change if we want to add new conditions that lead to different functions to call, e.g.).
Still, if it fails to inline, then the function pointer solution will have a tendency to be the most costly, not only because of the long jump and potentially the additional stack spills and so forth, but also because the optimizer will lack information -- there's an optimizer barrier when it doesn't know what function is going to be called through a pointer. At that point it can no longer coalesce all this information in IR and do the best job of instruction selection, register allocation, etc. This compiler design aspect of indirect function calls isn't discussed quite as often, but is potentially the most expensive part of calling a function indirectly.
Not sure if it qualifies as "hidden", but of course using a function pointer requires one more level of indirection.
The compiler has to generate code to dereference the pointer, and then jump to the resulting address, as opposed to code that just directly jumps to a constant address, for a normal function call.
You have three cases: the if inside the loop, the function-pointer dereference inside the loop, and the if outside the loop.
Of the three, WITH NO COMPILER OPTIMIZATION, the third is going to be the best. The first does a conditional and the second does a pointer dereference on top of the code you want to run, while the third just runs what you want it to.
If you want to optimize yourself do NOT do the function pointer version! If you don't trust the compiler to optimize, then the extra indirection might end up costing you, and it's a lot easier to break accidentally in the future (in my opinion).
You have to measure which is faster, but I very much doubt the function pointer answer will be faster. Checking a flag probably has near-zero latency on modern processors with deep, multiple pipelines, whereas a function pointer will likely force the compiler to do an actual function call, pushing registers etc.
"Why make the compiler guess at something you know?"
Both you and the compiler know some things at compile time - but the processor knows even more things at run time - like if there are empty pipelines in that inner loop. The days of doing this kind of optimization are gone outside of embedded systems and graphics shaders.
The others all raise very valid points, most notably that you have to measure. I want to add three things:
One important aspect is that using function pointers often prevents inlining, which can kill the performance of your code. But it definitely depends. Try to play around with the godbolt compiler explorer and have a look at the assembly generated:
https://godbolt.org/g/85ZzpK
Note that when doThis and doThat are not defined, e.g. as could happen across DSO boundaries, there won't be much of a difference.
The second point is related to the branch prediction. Have a look at https://danluu.com/branch-prediction/. It should make it clear that the code you have here is actually an ideal case for the branch predictor and thus you probably don't have to bother. Again, a good profiler like perf or VTune will tell you whether you are suffering from branch mispredictions or not.
Finally, there was at least one scenario I've seen where hoisting the conditionals out of a loop made a huge difference, despite the above reasoning. This was in a tight mathematical loop which was not getting auto-vectorized due to the conditionals. GCC and Clang can both output reports about which loops get vectorized, or why vectorization failed (e.g. GCC's -fopt-info-vec-missed or Clang's -Rpass-missed=loop-vectorize). In my case, a conditional was indeed the issue for the auto-vectorizer. This was with GCC 4.8 though, so things may have changed since then. With Godbolt, it's pretty easy to check whether this is an issue for you. Again, always measure on your target machine and check whether you are affected or not.
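To make the vectorization point concrete, here is an invented example of the pattern: with the flag checked inside the loop, some auto-vectorizers give up, whereas the hoisted form leaves two plain loops they handle easily.
// Invented example. The hoisted form gives the vectorizer two simple,
// branch-free loops; keeping the if inside the loop may block it.
void scale(float* out, const float* in, int n, bool negate) {
    if (negate)
        for (int i = 0; i < n; ++i) out[i] = -2.0f * in[i];
    else
        for (int i = 0; i < n; ++i) out[i] = 2.0f * in[i];
}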
Similar question, but less specific:
Performance issue for vector::size() in a loop
Suppose we're in a member function like:
void Object::DoStuff() {
for( int k = 0; k < (int)this->m_Array.size(); k++ )
{
this->SomeNotConstFunction();
this->ConstFunction();
double x = SomeExternalFunction(i);
}
}
1) I'm willing to believe that if only the "SomeExternalFunction" is called that the compiler will optimize and not redundantly call size() on m_Array ... is this the case?
2) Wouldn't you almost certainly get a boost in speed from doing
int N = m_Array.size();
for( int k = 0; k < N; k++ ) { ... }
if you're calling some member function that is not const ?
Edit: Not sure where these down-votes and snide comments about micro-optimization are coming from; perhaps I can clarify:
Firstly, it's not about optimizing per se, but about understanding what the compiler will and will not fix. Usually I just use the size() function, but I ask now because here the array might have millions of data points.
Secondly, the situation is that "SomeNotConstFunction" might have a very rare chance of changing the size of the array, or its ability to do so might depend on some other variable being toggled. So, I'm asking at what point will the compiler fail, and what exactly is the time cost incurred by size() when the array really might change, despite human-known reasons that it won't?
Third, the operations in the loop are pretty trivial; there are just millions of them, but they are embarrassingly parallel. I would hope that placing the value outside the loop would let the compiler vectorize some of the work.
Do not get into the habit of doing things like that.
The cases where the optimization you make in (2) is
- safe to do,
- makes a noticeable difference, and
- something your compiler cannot figure out on its own
are few and far between.
If it were just the latter two points, I would just advise that you're worrying about something unimportant. However, that first point is the real killer: you do not want to get in the habit of giving yourself extra chances to make mistakes. It's far, far easier to accelerate slow, correct code than it is to debug fast, buggy code.
Now, that said, I'll try answering your question. The definitions of the functions SomeNotConstFunction and ConstFunction are (presumably) in the same translation unit. So if these functions really do not modify the vector, the compiler can figure that out, and it will only "call" size once.
However, the compiler does not have access to the definition of SomeExternalFunction, and so must assume that every call to that function has the potential to modify your vector. The presence of that function in your loop guarantees that size is "called" every time.
I put "called" in quotes, however, because it is such a trivial function that it almost certainly gets inlined. Also, the function is ridiculously cheap -- two memory lookups (both nearly guaranteed to be cache hits), and either a subtraction and a right shift, or maybe even a specialized single instruction that does both.
Even if SomeExternalFunction does absolutely nothing, it's quite possible that "calling" size every time would still only be a small-to-negligible fraction of the running time of your loop.
Edit: In response to the edit....
what exactly is the time cost incurred by size() when the array really might change
The difference in the times you see when you time the two different versions of code. If you're doing very low level optimizations like that, you can't get answers through "pure reason" -- you must empirically test the results.
And if you really are doing such low level optimizations (and you can guarantee that the vector won't resize), you should probably be more worried about the fact the compiler doesn't know the base pointer of the array is constant, rather than it not knowing the size is constant.
If SomeExternalFunction really is external to the compilation unit, then you have pretty much no chance of the compiler vectorizing the loop, no matter what you do. (I suppose it might be possible at link time, though....) And it's also unlikely to be "trivial" because it requires function call overhead -- at least if "trivial" means the same thing to you as to me. (again, I don't know how good link time optimizations are....)
If you really can guarantee that some operations will not resize the vector, you might consider refining your class's API (or at least its protected or private parts) to include functions that self-evidently won't resize the vector.
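A hedged sketch of what such an API refinement might look like; the class layout and helper name are invented:
#include <cstddef>
#include <vector>

class Object {
    std::vector<double> m_Array;
public:
    // Invented helper: hands each element to f without exposing anything
    // that could resize m_Array, so size is read once and the compiler
    // sees a plain counted loop over raw storage.
    template <class F>
    void forEachElement(F f) {
        double* p = m_Array.data();
        const std::size_t n = m_Array.size();
        for (std::size_t i = 0; i < n; ++i)
            f(p[i]);
    }
};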
The size method will typically be inlined by the compiler, so there will be a minimal performance hit, though there will usually be some.
On the other hand, this is typically only true for vectors. If you are using a std::list, for instance, the size method could be quite expensive (pre-C++11 implementations were allowed to make it O(n)).
If you are concerned with performance, you should get in the habit of using iterators and/or algorithms like std::for_each, rather than a size-based for loop.
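For instance, a minimal sketch of the iterator/algorithm style (assuming C++11 for the lambda):
#include <algorithm>
#include <list>

void increment_all(std::list<int>& values) {
    // No call to size(): the algorithm simply walks begin()..end(),
    // which is cheap for any standard container.
    std::for_each(values.begin(), values.end(), [](int& v) { ++v; });
}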
The micro optimization remarks are probably because the two most common implementations of vector::size() are
return _Size;
and
return _End - _Begin;
Hoisting them out of the loop will probably not noticeably improve the performance.
And if it is obvious to everyone that it can be done, the compiler is also likely to notice. With modern compilers, and if SomeExternalFunction is statically linked, the compiler is usually able to see if the call might affect the vector's size.
Trust your compiler!
In MSVC 2015, it does a return (this->_Mylast() - this->_Myfirst()). I can't tell you offhand just how the optimizer deals with this; but unless your array is const, the optimizer must allow for the possibility that you may modify its number of elements, making it hard to optimize out. In Qt, it equates to an inline function that does a return d->size; for a QVector, that is.
I've taken to doing it in one particular project I'm working on, but it is for performance-oriented code. Unless you are interested in deeply optimizing something, I wouldn't bother. It probably is pretty fast any of these ways. In Qt, it is at most one pointer dereferencing, and is more typing. It looks like it could make a difference in MSVC.
I think nobody has offered a definitive answer so far; but if you really want to test it, have the compiler emit assembly source code, and inspect it both ways. I wouldn't be surprised to find that there's no difference when highly optimized. Let's not forget, though, that unoptimized performance during debug is also a factor that might be taken into consideration, when a lot of e.g. number crunching is involved.
I think the OP's original question really needs to state how the array is declared.
What is the most efficient way to code "print all the elements of a vector to standard out" in C++,
for(std::vector<int>::iterator it = intVect.begin(); it != intVect.end(); ++it)
std::cout << *it;
or
std::copy(intVect.begin(), intVect.end(), std::ostream_iterator<int>(std::cout));
and why?
You can use
http://louisdx.github.com/cxx-prettyprint/
and rely on the work of other people who made sure it is as optimal as possible.
If you are asking which of the methods you've posted will be faster, the only valid answer is: there is no way to know for sure, because they are equivalent; you must profile them both and see for yourself.
This is because the two methods are effectively the same. They do the same thing, but they use different mechanisms to do it. By the time your compiler's optimizer has finished with the code, it may have found different opportunities to increase execution speed, or it may have found opportunities in each that result in identical machine code being executed.
For example, consider:
for(std::vector<int>::iterator it = intVect.begin(); it != intVect.end(); ++it)
At first blush, it might seem like this could have a built-in inefficiency by the fact that intVect.end() is evaluated at each loop. This would make this method slower than,
std::copy(intVect.begin(), intVect.end(), std::ostream_iterator<int>(std::cout));
...where it is only evaluated once.
However, depending on the surrounding code and your compiler's settings, it might be rewritten so that it is only evaluated once, at the beginning of the for. (Credit: @SteveJessop) Or it might be that it isn't hoisted, but evaluating it is no different from examining a pre-computed value: it's possible that either way, the emitted code just loads a pointer value from (stack pointer) + (small offset known at compile time). The only way to know for sure is to compile them both and examine the resulting assembly code.
Beyond all of this however is a more fundamental issue. You are asking which method of doing something is faster, when the core thing you're trying to do is potentially very slow to begin with, relative to the means by which you do it. If you are writing to stdout using streams, it is going to have negligible effect on the overall execution time whether you use a for loop or std::copy even if one is marginally faster than the other. If your concern is overall execution time, you're possibly barking up the wrong tree.
These two lines will end up doing essentially the same thing (almost certainly) once the compiler gets through with them. Either way you will end up with the same code looping over the range [begin, end) with iterators, writing to the same stream.
This is a micro-optimization that will not help you significantly, though I'm sure you can compile it with a big data set and see for yourself easily on your platform.
What is the preferred method of writing loops according to efficiency:
Way a)
/* here I'm hoping that the compiler will optimize this code
   and won't call size() on every iteration of this loop */
for (unsigned i = firstString.size(); i < anotherString.size(); ++i)
{
//do something
}
or maybe should I do it this way:
Way b)
unsigned first = firstString.size();
unsigned second = anotherString.size();
and now I can write:
for (unsigned i = first; i < second; ++i)
{
//do something
}
The second way seems to me like the worse option, for two reasons: scope pollution and verbosity. But it has the advantage of being sure that size() will be invoked only once per object.
Looking forward to your answers.
I usually write this code as:
/* i and size are local to the loop */
for (size_t i = firstString.size(), size = anotherString.size(); i < size; ++i) {
// do something
}
This way I do not pollute the parent scope and avoid calling anotherString.size() for each loop iteration.
It is especially useful with iterators:
for(typename some_generic_type<T>::forward_iterator it = container.begin(), end = container.end();
    it != end; ++it) {
// do something with *it
}
Since C++11 the code can be shortened even more by writing a range-based for loop:
for(const auto& item : container) {
// do something with item
}
or
for(auto item : container) {
// do something with item
}
In general, let the compiler do it. Focus on the algorithmic complexity of what you're doing rather than micro-optimizations.
However, note that your two examples are not semantically identical: if the body of the loop changes the size of the second string, the two loops will not iterate the same number of times. For that reason, the compiler might not be able to perform the specific optimization you're talking about, as shown in the example below.
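An invented example of that semantic difference:
#include <string>

// If the body grows anotherString, re-reading size() each iteration means
// i never catches up, so this loop never terminates (once entered);
// a hoisted size would have stopped at the original bound.
void demo(std::string& firstString, std::string& anotherString) {
    for (unsigned i = firstString.size(); i < anotherString.size(); ++i)
        anotherString += '!';
}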
I would just use the first version, simply because it looks cleaner and is easier to type. Then you can profile it to see if anything needs to be optimized further.
But I highly doubt that the first version will cause a noticeable performance drop. If the container implements size() like this:
inline size_t size() const
{
return _internal_data_member_representing_size;
}
then the compiler should be able to inline the function, eliding the function call. My compiler's implementation of the standard containers all do this.
How will a good compiler optimize your code? Not at all, because it can't be sure size() has no side effects. If size() had side effects your code relied on, they'd be gone after such a compiler optimization.
This kind of optimization really isn't safe from a compiler's perspective; you need to do it on your own. Doing it on your own doesn't mean you need to introduce two additional local variables. Depending on your implementation, size() might be an O(1) operation. If size is also declared inline, you'll also spare the function call, making a call to size() as good as a local member access.
Don't pre-optimize your code. If you have a performance problem, use a profiler to find it, otherwise you are wasting development time. Just write the simplest / cleanest code that you can.
This is one of those things that you should test yourself. Run the loops 10,000 or even 100,000 iterations and see what difference, if any, exists.
That should tell you everything you want to know.
My recommendation is to let inconsequential optimizations creep into your style. What I mean by this is that if you learn a more optimal way of doing something, and you can't see any disadvantages to it (as far as maintainability, readability, etc.), then you might as well adopt it.
But don't become obsessed. Optimizations that sacrifice maintainability should be saved for very small sections of code that you have measured and KNOW will have a major impact on your application. When you do decide to optimize, remember that picking the right algorithm for the job is often far more important than tight code.
I'm hoping that compiler will optimize this...
You shouldn't. Anything involving
A call to an unknown function or
A call to a method that might be overridden
is hard for a C++ compiler to optimize. You might get lucky, but you can't count on it.
Nevertheless, because you find the first version simpler and easier to read and understand, you should write the code exactly the way it is shown in your simple example, with the calls to size() in the loop. You should consider the second version, where you have extra variables that pull the common call out of the loop, only if your application is too slow and if you have measurements showing that this loop is a bottleneck.
Here's how I look at it. Performance and style are both important, and you have to choose between the two.
You can try it out and see if there is a performance hit. If there is an unacceptable performance hit, then choose the second option, otherwise feel free to choose style.
You shouldn't optimize your code, unless you have a proof (obtained via profiler) that this part of code is bottleneck. Needless code optimization will only waste your time, it won't improve anything.
You can waste hours trying to improve one loop, only to get 0.001% performance increase.
If you're worried about performance - use profilers.
There's nothing really wrong with way (b) if you just want to write something that will probably be no worse than way (a), and possibly faster. It also makes it clearer that you know that the string's size will remain constant.
The compiler may or may not spot that size will remain constant; just in case, you might as well perform this optimization yourself. I'd certainly do this if I was suspicious that the code I was writing was going to be run a lot, even if I wasn't sure that it would be a big deal. It's very straightforward to do, it takes no more than 10 extra seconds thinking about it, it's very unlikely to slow things down, and, if nothing else, will almost certainly make the unoptimized build run a bit more quickly.
(Also the first variable in style (b) is unnecessary; the code for the init expression is run only once.)
How much percent of time is spent in for as opposed to // do something? (Don't guess - sample it.) If it is < 10% you probably have bigger issues elsewhere.
Everybody says "Compilers are so smart these days."
Well they're no smarter than the poor coders who write them.
You need to be smart too. Maybe the compiler can optimize it but why tempt it not to?
For the "std::size_t size()const" member function which not only is O(1) but is also declared "const" and so can be automatically pulled out of the loop by the compiler, it probably doesn't matter. That said, I wouldn't count on the compiler to remove it from the loop, and I think it is a good habit to get into to factor out the calls within the loop for cases where the function isn't constant or O(1). In addition, I think assigning the values to a variable leads to the code being more readable. I would not suggest, though, that you make any premature optimizations if it will result in the code being harder to read. Again, though, I think the following code is more readable, since there is less to read within the loop:
std::size_t firststrlen = firststr.size();
std::size_t secondstrlen = secondstr.size();
for ( std::size_t i = firststrlen; i < secondstrlen; i++ ){
// ...
}
Also, I should point out that you should use std::size_t instead of unsigned, as the type of std::size_t can vary from one platform to another, and using unsigned can lead to truncation and errors on platforms where std::size_t is unsigned long instead of unsigned int.