As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Let's suppose we have this struct
struct structure
{
type element;
int size;
};
and we're in main and want to iterate over something.
Is it faster
for ( int i = 0; i < structure.size; ++i )
or
int size = structure.size;
for ( int i = 0; i < size; ++i )
?
Which costs more: the repeated access to structure.size in the first method, or the additional memory and the time spent creating the local variable in the first line of the second method?
I can't see any other difference between the two of them, so if you do, please share!
EDIT: I edited the question so that it is now concise, simple and easily answerable.
Please reconsider the vote you would give to it. Thank you.
There might be a good reason to choose one over the other. If the contents of your loop in the first example change the value of structure.size, i will be continuously checked against the current value. However, in your second choice, size will not change as structure.size does. Which one you want depends on the problem. I would perhaps change size to be called initialSize instead, however.
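To make that difference concrete, here is a minimal sketch. The Structure type, the grow_until parameter and both function names are made up for illustration; the loop body grows size on purpose so the two forms diverge.

```cpp
// Hypothetical stand-in for the struct in the question.
struct Structure {
    int size;
};

// Re-reads s.size on every iteration, so growth inside the loop is observed.
int count_iterations_live(Structure s, int grow_until) {
    int iterations = 0;
    for (int i = 0; i < s.size; ++i) {
        if (s.size < grow_until) ++s.size;  // the loop body changes the bound
        ++iterations;
    }
    return iterations;
}

// Snapshots the size once; later changes to s.size do not affect the bound.
int count_iterations_snapshot(Structure s, int grow_until) {
    int iterations = 0;
    const int initialSize = s.size;
    for (int i = 0; i < initialSize; ++i) {
        if (s.size < grow_until) ++s.size;
        ++iterations;
    }
    return iterations;
}
```

Starting from a size of 3 and growing up to 5, the first version runs 5 times and the second runs 3 times, so the two are only interchangeable when the body leaves size alone.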
If that is not the case, you should stop thinking about such minor "optimizations" and instead think about what is most readable. I'd prefer the first choice because it doesn't introduce an unnecessary variable name. When you have two bits of code that do the same thing, trust the compiler to work out the optimal way of doing it. It's your job to tell the compiler what you want your program to do. It's the compiler's job to do it in the best way it can.
If and only if you determine through measurement that this is a necessary optimization (I can't imagine that it ever will be) should you then choose the one that measures fastest.
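If you do want to measure, a rough sketch along these lines could be a starting point. Buffer, sum_reading_member, sum_with_local and time_ms are all hypothetical names, and a single timed run like this is noisy, so treat it as an outline rather than a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical container mirroring the question's `size` member.
struct Buffer {
    std::vector<int> data;
    int size;
};

// Variant 1: the loop condition reads b.size directly.
long long sum_reading_member(const Buffer& b) {
    assert(b.size >= 0 && static_cast<std::size_t>(b.size) <= b.data.size());
    long long sum = 0;
    for (int i = 0; i < b.size; ++i) sum += b.data[static_cast<std::size_t>(i)];
    return sum;
}

// Variant 2: the size is copied into a local before the loop.
long long sum_with_local(const Buffer& b) {
    assert(b.size >= 0 && static_cast<std::size_t>(b.size) <= b.data.size());
    const int size = b.size;
    long long sum = 0;
    for (int i = 0; i < size; ++i) sum += b.data[static_cast<std::size_t>(i)];
    return sum;
}

// Runs `work` once, stores its result, and returns the elapsed milliseconds.
template <typename F>
double time_ms(F&& work, long long& result) {
    const auto start = std::chrono::steady_clock::now();
    result = work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms with a lambda that wraps each variant gives two timings to compare; repeat the runs many times and build with optimizations enabled before drawing any conclusion.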
Very unlikely that there will be any actual difference in the compiled code from this, unless it's REALLY an ancient compiler with really rubbish optimisation. Anything like gcc, clang, MSVC or Intel's C++ compilers would produce exactly the same code for these scenarios.
Of course, if you start calling a function inside the condition of the loop, and the data processed by the function is modified by the loop, e.g.
std::string str;
std::cin >> str;
for(std::size_t i = 0; i < str.size(); i++)
{
if (str[i] > 'a')
str+= 'B';
}
then we have a different story...
You should let the compiler do micro-optimizations like this. Write readable code, make it work, and then, if it runs slowly, profile it and optimize only where it is really necessary.
However, if inside the loop you call a function that can modify this structure, and the compiler does not have access to its implementation, the second variant may help, because it tells the compiler that it does not need to reload structure.size from memory. I would recommend using const:
const int size = structure.size;
for ( int i = 0; i < size; ++i ) {
somefunc( &structure );
}
I do not know how much you know about compilation, but among the various phases of a compiler, there is a phase called code-optimization, which attempts to improve the intermediate code (by performing various optimization techniques like dead-code elimination, loop transformations, etc.), so that faster-running machine code can be produced.
So, actually your compiler takes care of your headache and I doubt that you would notice any performance issues.
In your first method, if structure is a reference or a member variable, it will not be properly stored into the CPU cache, as there is no way to tell if it is changed outside this block.
In your second method, as size is a local variable to the current code block, it will be properly stored in cache.
Thus, the second method should be faster, despite creating a new variable.
See Load-Hit-Store for a more complete explanation.
I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related questions refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.
I strongly suspect that it does impact performance (although I do not know by how much), so I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statement that triggers the break (like an if), it necessarily adds instructions to my loop.
Further, it probably also interferes when my compiler tries to unroll the for-loop, as it no longer knows at compile time how many iterations will run, effectively turning it into a while-loop.
Therefore, I suspect it does have a performance impact, which could be considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I lose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
// do stuff
}
// USING FOR
for(int i=100; i; --i)
{
if(checkCondition()) {
// do stuff
} else {
break;
}
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)
The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPUs have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will choose if the branch is going to be taken or not and execute the code as if the if condition is not even present. Of course this relies on multiple assumptions, like not having side effects on the condition calculation (so that part of the body loop depends on it) and that that condition will always evaluate to false up to a certain point in which it will become true and stop the loop.
Some compilers also allow you to specify the likelihood of a condition as a hint to the branch predictor.
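As a sketch of such a hint: GCC and Clang provide the __builtin_expect builtin, and C++20 adds the [[likely]] and [[unlikely]] attributes. The UNLIKELY macro and the sum_until_negative function below are illustrative names, not anything from the question, and the hint has no effect on other compilers.

```cpp
#include <vector>

// Wrap the GCC/Clang builtin; fall back to the plain condition elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

// Sums values until the first negative one, which is assumed to be rare.
long long sum_until_negative(const std::vector<int>& values) {
    long long sum = 0;
    for (int v : values) {
        if (UNLIKELY(v < 0)) break;  // tell the compiler this exit is rarely taken
        sum += v;
    }
    return sum;
}
```

The hint changes only how the compiler lays out the code; the result is the same with or without it, so measure before keeping it.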
If you want to see the semantic difference between the two code versions, just compile them with -S and examine the generated asm code; there's no other magic way to do it.
The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
stop &= checkcondition(); // returns 0 or all-bits-set;
sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
if (!checkcondition())
break;
sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.
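For anyone who wants to try it, here is a self-contained sketch of the two variants. To keep it deterministic, the results of checkcondition are passed in as an array (the ok parameter is my assumption), and both function names are made up.

```cpp
#include <array>
#include <cstddef>

// Early-exit version: stops at the first failed check.
unsigned sum_with_break(const std::array<unsigned, 32>& x,
                        const std::array<bool, 32>& ok) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!ok[i]) break;
        sum += x[i];
    }
    return sum;
}

// Masked version: always runs 32 iterations, but once a check fails
// `stop` becomes 0 and every later element contributes nothing.
unsigned sum_branchless(const std::array<unsigned, 32>& x,
                        const std::array<bool, 32>& ok) {
    unsigned sum = 0;
    unsigned stop = ~0u;  // all bits set
    for (std::size_t i = 0; i < x.size(); ++i) {
        stop &= ok[i] ? ~0u : 0u;
        sum += (stop & x[i]);
    }
    return sum;
}
```

Both return the same sum for the same inputs; which one is faster is exactly the part that has to be measured.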
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
If I have a number a, would it be slower to add 1 to it b times rather than simply adding a + b?
a += b;
or
for (int i = 0; i < b; i++) {
a += 1;
}
I realize that the second example seems kind of silly, but I have a situation where coding would actually be easier that way, and I am wondering if that would impact performance.
EDIT: Thank you for all your answers. It looks like some posters would like to know what situation I have. I am trying to write a function to shift an inputted character a certain number of characters over (i.e. a cipher) if it is a letter. So, I want to say that one char += the number of shifts, but I also need to account for the jumps between the lowercase characters and uppercase characters on the ASCII table, and also wrapping from z back to A. So, while it is doable in another way, I thought it would be easiest to keep adding one until I get to the end of a block of letter characters, then jump to the next one and keep going.
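For what it's worth, the shift described above can also be done without a loop. This is only a sketch under my own assumptions: the 52 letters are ordered A..Z then a..z, z wraps back to A, the character set is ASCII, and shift_letter is a made-up name.

```cpp
// Shifts a letter through the sequence A..Z, a..z (wrapping z -> A).
// Non-letters are returned unchanged. Assumes an ASCII-compatible charset.
char shift_letter(char c, int shift) {
    const int kLetters = 52;
    int index;
    if (c >= 'A' && c <= 'Z') {
        index = c - 'A';         // 0..25
    } else if (c >= 'a' && c <= 'z') {
        index = 26 + (c - 'a');  // 26..51
    } else {
        return c;
    }
    // Reduce the shift first so the sum cannot overflow, then force the
    // result into 0..51 even for negative shifts.
    index = (index + shift % kLetters + kLetters) % kLetters;
    return index < 26 ? static_cast<char>('A' + index)
                      : static_cast<char>('a' + (index - 26));
}
```

With this mapping, shifting 'Z' by 1 gives 'a' and shifting 'z' by 1 gives 'A', in one step regardless of how large the shift is.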
If your loop is really that simple, I don't see any reason why a compiler couldn't optimize it. I have no idea if any actually would, though. If your compiler doesn't, the single addition will be much faster than the loop.
The language C++ does not describe how long either of those operations take. Compilers are free to turn your first statement into the second, and that is a legal way to compile it.
In practice, many compilers would treat those two subexpressions as the same expression, assuming everything is of type int. The second, however, would be fragile in that seemingly innocuous changes would cause massive performance degradation. Small changes in type that 'should not matter', extra statements nearby, etc.
It would be extremely rare for the first to be slower than the second, but if the type of a was such that += b was a much slower operation than calling += 1 a bunch of times, it could be. For example:
#include <vector>

struct A {
std::vector<int> v;
void operator+=( int x ) {
// optimize for common case:
if (x==1 && v.size()==v.capacity()) v.reserve( v.size()*2 );
// grow the buffer:
for (int i = 0; i < x; ++i) {
v.reserve( v.size()+1 );
v.resize( v.size()+1 );
}
}
};
then A a; int b = 100000; a+=b; would take much longer than the loop construct.
But I had to work at it.
The overhead (CPU instructions) of incrementing a variable in a loop is likely to be insignificant compared to the total number of instructions in that loop (unless incrementing is the only thing the loop does). Loop variables are likely to stay in the lowest levels of the CPU cache (if not in CPU registers) and are very fast to increment, as they don't need to be read from RAM via the FSB. Anyway, if in doubt, run a quick profile and you'll know whether it makes sense to sacrifice code readability for speed.
Yes, absolutely slower. The second example is beyond silly. I highly doubt you have a situation where it would make sense to do it that way.
Let's say 'b' is 500,000... most computers can add that in a single operation, so why do 500,000 operations (not including the loop overhead)?
If the processor has an increment instruction, the compiler will usually translate the "add one" operation into an increment instruction.
Some processors may have optimized increment instructions to help speed up things like loops. Other processors can combine an increment operation with a load or store instruction.
There is a possibility that a small loop containing only an increment instruction could be replaced by a multiply and add. The compiler is allowed to do so, if and only if the functionality is the same.
This kind of optimization generally yields negligible gains. However, for large data sets and performance-critical applications, it may be necessary, and the time gained can be significant.
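A small sketch of what "the functionality is the same" means for the loop in the question (add_by_loop and add_directly are illustrative names): the two forms agree only when b is not negative, so a compiler replacing the loop has to preserve that case as well.

```cpp
// Adds 1 to a, b times. For b <= 0 the loop body never runs.
int add_by_loop(int a, int b) {
    for (int i = 0; i < b; i++) {
        a += 1;
    }
    return a;
}

// Single addition; also handles negative b, unlike the loop.
int add_directly(int a, int b) {
    return a + b;
}
```

For b = 7 both return a + 7, but for b = -3 the loop returns a unchanged while the direct form returns a - 3.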
Edit 1:
For adding values other than 1, the compiler would emit processor instructions to use the best addition operations.
The add operation is optimized in hardware as a different animal than incrementing. Arithmetic Logic Units (ALU) have been around for a long time. The basic addition operation is very optimized and a lot faster than incrementing in a loop.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have frequently noticed the following pattern:
for (int i = 0; i < strlen(str); ++i) {
// do some operations on string
}
The complexity of above loop would be O(N²) because the complexity of strlen is N and that comparison is made during every iteration.
However, if we calculate strlen before the loop and use that constant, the complexity of the loop is reduced to O(N).
I am sure there are many other such optimizations.
Does the compiler carry out such optimizations or do programmers have to take precautions to prevent it?
While I don't have any solid evidence whatsoever, my guess would be this:
The compiler makes a data flow analysis of the variable str. If it's potentially modified inside the loop or marked as volatile, there is no guarantee that strlen(str) will remain constant between iterations and therefore cannot be cached. Otherwise, it should be safe to cache and would be optimized.
Yes, good optimizers are able to do this kind of transform if they can establish that the string remains unmodified in the loop body. They can pull out of loops expressions that remain constant.
Anyway, in a case where you would, say, turn all characters to uppercase, it would be hard for a compiler to infer that the string length won't change.
I personally favor a "defensive" approach: not relying on advanced compiler skills, I do the obvious optimizations myself, in case the code is ported to a different environment with a poorer compiler, or just in case of doubt.
Also think of the cases where optimization is off.
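As a sketch of that defensive style applied to the uppercase case (to_upper_in_place is a made-up name): the length is computed once, which is safe here on the assumption that converting a character to uppercase never produces a terminating zero.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Uppercases a NUL-terminated string in place, calling strlen only once.
void to_upper_in_place(char* str) {
    const std::size_t len = std::strlen(str);  // hoisted out of the loop by hand
    for (std::size_t i = 0; i < len; ++i) {
        // Cast to unsigned char first: std::toupper requires a value
        // representable as unsigned char (or EOF).
        str[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(str[i])));
    }
}
```

This stays O(N) whether or not the optimizer is enabled or able to prove that the length is unchanged.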
Try
for (int i = 0; str[i]; ++i) {
// do some operations on string
}
since strlen is essentially doing this anyway.
The first step towards understanding what compilers do, can do, and can not do is to write your intentions into the code and see what happens:
const int len = strlen(str);
for (int i=0; i<len; ++i)
{
// do some operations which DO NOT CHANGE the length of str
}
Of course, it is your responsibility not to change the length of str inside the loop...you can lowercase, uppercase, swap or replace characters...something you may 'assert()' if you really care (in a debug version). In this case, you communicated your intentions to the compiler and if you are lucky and you are using a good compiler you are likely to get what you are after.
I really doubt that this optimisation would make any difference in your code: if you were doing heavy string manipulation you would (1) know whether you are working with long strings or short ones, and (2) be using a library which (a) keeps explicit track of the length of strings and (b) makes (especially repeated) string concatenation cheaper than it is in C.
Closed 11 years ago.
If I want to repeat a task 100 times, is it better to write:
for (i = 0; i < 100; ++i) { ... }
or
for (i = 100; i > 0; --i) { ... }
I'd go for #1. It's more intuitive to humans reading your code. In case when no performance benefit can be achieved always choose the more readable solution.
a: for (i = 0; i < 100; ++i) { ... }
b: for (i = 100; i > 0; --i) { ... }
It doesn't really matter, but I use A. B should be fine, but if you go to i >= 0 instead of i > 0 you get issues if i is unsigned.
If you don't need to use i inside the loop, or the order of execution doesn't matter (no data dependencies between iterations), then the second one is probably just a tiny bit faster.
But the first is easier for beginning programmers to read. Depending on who else is likely to look at the code, this might or might not be a concern.
I tend to prefer this:
{
int i = 100;
do {
--i;
...
} while (i);
}
because it's efficient, terse, and gives the same range for i (0 .. 99, descending), unlike the second for loop which gives i in the range 1..100 (descending)
You don't state any use case, so "better" is completely meaningless here.
However, for the sake of sanity, I'll point out this:
It's very unusual that you're just counting to 100 for the sake of it. If you are, then either loop is equivalent to the other.
However, let's consider if you're iterating through an array of 100 elements, and for some reason doing so with indexes. Your index counter should be unsigned, and you should be counting between 0 and 99. Then your second loop should actually be:
for (size_t i = 99; i >= 0; --i) { ... }
The problem here is that on the "last" iteration, when i is 0, decrementing it gives you std::numeric_limits<size_t>::max(), which is some really large number. (This is not an underflow: unsigned values are defined to wrap-around at the limits of their range.) The loop condition is still satisfied, and you get an infinite loop.
In general, looping backwards to 0 with an unsigned quantity is not going to work as you expect.
Sure, you can work around this by changing the counter to be signed, but you are then restricting the counter's range and, semantically, array indexes should be unsigned. Indeed, you're going to see "comparing signed and unsigned values" warnings if you do this.
So, if only in this way, looping backwards can be more error-prone than looping forwards.
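If you do need to walk an array backwards with an unsigned index, one common way around the wrap-around is to test and decrement in the loop condition. A minimal sketch, with reversed_copy as an illustrative name:

```cpp
#include <cstddef>
#include <vector>

// Returns the elements of v in reverse order, using an unsigned index.
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    // `i-- > 0` compares the old value of i with 0 and then decrements,
    // so the body sees i = size-1 ... 0 and the loop ends when the old i is 0.
    for (std::size_t i = v.size(); i-- > 0; ) {
        out.push_back(v[i]);
    }
    return out;
}
```

The index still wraps after the final test, but it is never used afterwards, and an empty vector is handled without a special case.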
But, again, without a specific use case, there's no specific advice that anyone can render here. There is no general concept of "better".
There's certainly not going to be any inherent, noticeable, useful performance difference in the two.
It will depend on the contents of the loop body as to which is the most appropriate.
Your 2 loops are different!
The first one will loop with i ranging from 0 to 99 inclusive;
the second will loop with i ranging from 100 down to 1 inclusive.
It depends what is going on in the loop. If the code in the loop is loop invariant (doesn't depend on the loop variable) then it's quite possible that the compiler you're using will use the decrement version regardless of how you write it. In general, for readability and code maintenance, I'd recommend using an increasing loop counter. For performance I'd recommend the decrement version as most CPUs perform better when doing comparisons with 0.
That said, these days with processors and predictive branches/lots of registers it may not matter much how you write it (for measuring performance). I'd recommend going for readability unless you really really really need to squeeze performance (and even then this is probably not one of the first optimizations I'd target anyway).
Both are the same, if the second is written as:
for (int i = 99; i >= 0; --i) { ... }
Depends what you want to do in the loop. Generally people count up unless there is a good reason not to. To count down will confuse other programmers looking at your code.
For example if you are going to be testing array elements and removing them if they satisfy some condition it is best to count down.
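A sketch of that removal case, assuming a std::vector and removing even numbers as an arbitrary condition (remove_even is a made-up name). Counting down means an erase only shifts elements that were already visited.

```cpp
#include <cstddef>
#include <vector>

// Removes all even values, iterating from the last index down to 0.
void remove_even(std::vector<int>& v) {
    for (std::size_t i = v.size(); i-- > 0; ) {
        if (v[i] % 2 == 0) {
            // Erasing at i moves only elements after i, which were already checked.
            v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));
        }
    }
}
```

Counting up would need an extra adjustment to avoid skipping the element that slides into the erased slot. For larger containers, the usual alternative is std::remove_if followed by erase.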
I have a code segment which is as simple as :
for( int i = 0; i < n; ++i)
{
if( data[i] > c && data[i] < r )
{
--data[i];
}
}
It's a part of a large function and project. This is actually a rewrite of a different loop, which proved to be time consuming (long loops), but I was surprised by two things :
When data[i] was temporarily stored like this:
for( int i = 0; i < n; ++i)
{
const int tmp = data[i];
if( tmp > c && tmp < r )
{
--data[i];
}
}
It became much slower. I don't claim this should be faster, but I cannot understand why it should be so much slower; the compiler should be able to figure out whether tmp should be used or not.
But more importantly when I moved the code segment into a separate function it became around four times slower. I wanted to understand what was going on, so I looked in the opt-report and in both cases the loop is vectorized and seem to do the same optimization.
So my question is what can make such a difference on a function which is not called a million times, but is time consuming in itself ? What to look for in the opt-report ?
I could avoid it by just keeping it inlined, but the why is bugging me.
UPDATE :
I should underline that my main concern is to understand, why it became slower, when moved to a separate function. The code example given with tmp variable, was just a strange example I encountered during the process.
You're probably register starved, and the compiler is having to load and store. I'm pretty sure that the native x86 assembly instructions can take memory addresses to operate on, i.e., the compiler can keep those registers free. But by making it local, you may be changing the behaviour with respect to aliasing, and the compiler may not be able to prove that the faster version has the same semantics, especially if there is some form of multiple threads in here, allowing it to change the code.
The function was slower when in a new segment likely because function calls not only can break the pipeline, but also create poor instruction cache performance (there's extra code for parameter push/pop/etc).
Lesson: Let the compiler do the optimizing, it's smarter than you. I don't mean that as an insult, it's smarter than me too. But really, especially the Intel compiler, those guys know what they're doing when targeting their own platform.
Edit: More importantly, you need to recognize that compilers are targeted at optimizing unoptimized code. They're not targeted at recognizing half-optimized code. Specifically, the compiler will have a set of triggers for each optimization, and if you happen to write your code in such a way that they're not hit, you can avoid optimizations being performed even if the code is semantically identical.
And you also need to consider implementation cost. Not every function ideal for inlining can be inlined- just because inlining that logic is too complex for the compiler to handle. I know that VC++ will rarely inline with loops, even if the inlining yields benefit. You may be seeing this in the Intel compiler- that the compiler writers simply decided that it wasn't worth the time to implement.
I encountered this when dealing with loops in VC++- the compiler would produce different assembly for two loops in slightly different formats, even though they both achieved the same result. Of course, their Standard library used the ideal format. You may observe a speedup by using std::for_each and a function object.
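A sketch of what that could look like for the loop in the question, assuming data is a std::vector<int> (the DecrementInRange functor and the decrement_in_range wrapper are hypothetical names). Whether it is any faster is something you would have to measure with your compiler.

```cpp
#include <algorithm>
#include <vector>

// Function object holding the bounds c and r from the question's loop.
struct DecrementInRange {
    int c;
    int r;
    void operator()(int& value) const {
        if (value > c && value < r) {
            --value;
        }
    }
};

// Applies the functor to every element, replacing the hand-written loop.
void decrement_in_range(std::vector<int>& data, int c, int r) {
    std::for_each(data.begin(), data.end(), DecrementInRange{c, r});
}
```

Behaviour is the same as the original loop: only values strictly between c and r are decremented.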
You're right, the compiler should be able to identify that as unused code and remove it/not compile it. That doesn't mean it actually does identify it and remove it.
Your best bet is to look at the generated assembly and check to see exactly what is going on. Remember, just because a clever compiler could figure out how to do an optimization, it doesn't mean yours actually does.
If you do check, and see that the code is not removed, you might want to report that to the Intel compiler team. It sounds like they might have a bug.