So, I am new to online competitive programming and i came across a code where i am using the if else statement inside a for loop. I want to increase the speed of the loop and after doing some research i came across break and continue statements.
So my question is that does using continue really increases the speed of the loop or not.
CODE :
int even_sum = 0;
for(int i=0;i<200;i++){
if(i%4 == 0){
even_sum +=i;
continue;
}else{
//do other stuff when sum of multiple of 4 is not calculated
}
}
In the specific code in the question, the code has the identical meaning with and without the continue: In either case, after execution leaves even_sum +=i;, it flows to the closing } of the for statement. Any compiler of even modest quality should treat the two options identically.
The intended purpose of continue is not to speed up code by requesting a jump the compiler is going to make anyway but to skip code that is undesired in the current loop iteration—it acts as if the remaining code had been enclosed in an else clause but may be more visually appealing and less disruptive to human perception of the code.
It is conceivable a very rudimentary compiler, or even a decent compiler but with optimization disabled, might generate a jump instruction for the continue and also a jump instruction for the “then” clause of the if statement to jump over the else clause. The latter would never be executed and would have no direct effect on program execution time, but it would increase the size of the program and thus could have indirect effects. This possibility is of negligible concern in typical modern environments, where you are unlikely to encounter such a rudimentary compiler.
No, there's no speed advantage when using continue here. Both of your codes are identical and even without optimizations they produce the same machine code.
However, sometimes continue can make your code a lot more efficient, if you have structured your loop in a specific way, e.g.
This:
int even_sum = 0;
for (int i = 0; i < 200; i++) {
if (i % 4 == 0) {
even_sum += i;
continue;
}
if (huge_computation_but_always_false_when_multiple_of_4(i)) {
// do stuff
}
}
is a lot more efficient, than:
int even_sum = 0;
for (int i = 0; i < 200; i++) {
if (i % 4 == 0) {
even_sum += i;
}
if (huge_computation_but_always_false_when_multiple_of_4(i)) {
// do stuff
}
}
because the former doesn't have to execute the huge_computation_but_always_false_when_multiple_of_4() function every time.
So even though both of these codes would always produce the same result (given that huge_computation_but_always_false_when_multiple_of_4() has no side effects), the first one, which uses continue, would be a lot faster.
Related
I wrote a neural network program in C++ to test something, and I found that my program gets slower as computation proceeds. Since this kind of phenomenon is somewhat I've never seen before, I checked possible causes. Memory used by program did not change when it got slower. RAM and CPU status were fine when I ran the program.
Fortunately, the previous version of the program did not have such problem. So I finally found that a single statement that makes the program slow. The program does not get slower when I use this statement:
dw[k][i][j] = hidden[k-1][i].y * hidden[k][j].phi;
However, the program gets slower and slower as soon as I replace above statement with:
dw[k][i][j] = hidden[k-1][i].y * hidden[k][j].phi - lambda*w[k][i][j];
To solve this problem, I did my best to find and remove the cause but I failed... The below is the simple code structure. For the case that this is not the problem that is related to local statement, I uploaded my code to google drive. The URL is located at the end of this question.
MLP.h
class MLP
{
private:
...
double lambda;
double ***w;
double ***dw;
neuron **hidden;
...
MLP.cpp
...
for(k = n_depth - 1; k > 0; k--)
{
if(k == n_depth - 1)
...
else
{
...
for(j = 1; n_neuron > j; j++)
{
for(i = 0; n_neuron > i; i++)
{
//dw[k][i][j] = hidden[k-1][i].y * hidden[k][j].phi;
dw[k][i][j] = hidden[k-1][i].y * hidden[k][j].phi - lambda*w[k][i][j];
}
}
}
}
...
Full source code: https://drive.google.com/open?id=1A8Uw0hNDADp3-3VWAgO4eTtj4sVk_LZh
I'm not sure exactly why it gets slower and slower, but I do see where you can gain some performance.
Two and higher dimensional arrays are still stored in one dimensional
memory. This means (for C/C++ arrays) array[i][j] and array[i][j+1]
are adjacent to each other, whereas array[i][j] and array[i+1][j] may
be arbitrarily far apart.
Accessing data in a more-or-less sequential fashion, as stored in
physical memory, can dramatically speed up your code (sometimes by an
order of magnitude, or more)!
When modern CPUs load data from main memory into processor cache,
they fetch more than a single value. Instead they fetch a block of
memory containing the requested data and adjacent data (a cache line
). This means after array[i][j] is in the CPU cache, array[i][j+1] has
a good chance of already being in cache, whereas array[i+1][j] is
likely to still be in main memory.
Source: https://people.cs.clemson.edu/~dhouse/courses/405/papers/optimize.pdf
With your current code, w[k][i][j] will be read, and on the next iteration, w[k][i+1][j] will be read. You should invert i and j so that w is read in sequential order:
for(j = 1; n_neuron > j; ++j)
{
for(i = 0; n_neuron > i; ++i)
{
dw[k][j][i] = hidden[k-1][j].y * hidden[k][i].phi - lambda*w[k][j][i];
}
}
Also note that ++x should be slightly faster than x++, since x++ has to create a temporary containing the old value of x as the expression result. The compiler might optimize it when the value is unused though, but do not count on it.
I have an if statement that is currently never executed, however if I print something to the screen it takes over ten times longer for the program to run than if a variable is declared. Doing a bit of research online this seems to be some kind of branch prediction issue. Is there anything I can do to improve the program speed?
Basically both myTest and myTest_new return the same thing except one is a macro and one is a function. I am just monitoring the time it takes for bitTest to execute. and it executes in 3 seconds with just declaration in if statement but takes over a minute when Serial.print is in if statement even though neither are executed.
void bitTest()
{
int count = 0;
Serial1.println("New Test");
int lastint = 0;
Serial1.println("int");
for (int index = -2147483647; index <= 2147483647; index+=1000) {
if (index <= 0 && lastint > 0) {
break;
}
lastint = index;
for (int num = 0; num <= 31; num++) {
++1000;
int vcr1 = myTest(index, num);
int vcr2 = myTest_new(index, num);
if (vcr1 != vcr2) {
Serial1.println("Test"); // leave this println() and it takes 300 seconds for the test to run
//int x = 0;
}
} // if (index)
} // for (index)
Serial1.print("count = ");
Serial1.println(count);
return;
}
It is much less likely to be caused by a branch prediction (that branch prediction shouldn't be influenced by what you do inside your code) but by the fact that
{
int x = 0;
}
simply does nothing, because the scope of x ends at }, so that the compiler simply ditches the whole if clause, including the check. Note that this is only possible because the expression that if checks has no side effects, and neither does the block that would get executed.
By the way, the code you showed would usually directly be "compiled away", because the compiler, at compile time, can determine whether the if clause could ever be executed, unless you explicitly tell the compiler to omit such safe optimizations. Hence, I kind of doubt your "10 times as slow" measurement. Either the code you're showing isn't the actual example on which you demonstrate this, or you should turn on compiler optimization prior to doing performance comparisons.
The reason why your program takes forever is that it's buggy:
for (int index = -2147483647; index <= 2147483647; index+=1000) {
simply: at a very large index close to the maximum integer value, a wrap-around will occur. There's no "correct" way for your program to terminate. Hence you invented your strange lastint > 0 checking.
Now, fix up the loop (I mean, you're really just using every 1000th element, so why not simply loop index from 0 to 2*2147483?)
++1000;
should be illegal in C, because you can't increase a constant numeral. This is very much WTF.
All in all, your program is a mess. Re-write it, and debug a clean, well-defined version of it.
There is a lot of debate about the goto command, this question is not about the rightness or wrongness of its use, but more simply a question of whether it ever actually creates different assembly.
I'm specifically looking at Visual Studio 2013, but an example in any compiler would be wonderful.
Bjarne Stroustrup states:
The scope of a label is the function it is in (§6.3.4). This implies that you can use goto to jump into and out of blocks. The only restriction is that you cannot jump past an initializer or an exception handler (§13.5).
One to the few sensible uses of goto in ordinary code is to break out from a nested loop or switch-statement.
My question then: Is there any instance in which goto still produces different assembly than what can already be accomplished by use of other control structures?
For example, this produces identical assembly:
auto r = rand();
auto a = 0;
for(auto i = rand(); i > 0; --i){
switch(r){
case 1:
++sum;
goto END;
case default:
sum += rand();
break;
}
}
sum++;
END:
To this non-goto code:
auto r = rand();
auto b = false;
auto a = 0;
for(auto i = rand(); i > 0; --i){
switch(r){
case 1:
++sum;
b = true;
break;
case default:
sum += rand();
break;
}
if(b)break;
}
if(!b)sum++;
Here's my experience: I once had a bit of code that was extremely time critical. And it had a loop that iterated on average zero times (while (condition) ...) and the condition was almost always false. The compiler insisted on loop optimisations, moving things outside the loop - even when the loop wasn't executed at all, therefore slowing it down.
I tried to rewrite the loop using goto, hoping to confuse the optimiser enough to give up on optimising the code, and failed. gcc and clang optimise depending on the actual control flow, not depending on what C or C++ code you use.
In the follow two code snippets, is there actually any different according to the speed of compiling or running?
for (int i = 0; i < 50; i++)
{
if (i % 3 == 0)
continue;
printf("Yay");
}
and
for (int i = 0; i < 50; i++)
{
if (i % 3 != 0)
printf("Yay");
}
Personally, in the situations where there is a lot more than a print statement, I've been using the first method as to reduce the amount of indentation for the containing code. Been wondering for a while so found it about time I ask whether it's actually having an effect other than visually.
Reply to Alf (i couldn't get code working in comments...)
More accurate to my usage is something along the lines of a "handleObjectMovement" function which would include
for each object
if object position is static
continue
deal with velocity and jazz
compared with
for each object
if object position is not static
deal with velocity and jazz
Hence me not using return. Essentially "if it's not relevant to this iteration, move on"
The behaviour is the same, so the runtime speed should be the same unless the compiler does something stupid (or unless you disable optimisation).
It's impossible to say whether there's a difference in compilation speed, since it depends on the details of how the compiler parses, analyses and translates the two variations.
If speed is important, measure it.
If you know which branch of the condition has higher probability you may use GCC likely/unlikely macro
How about getting rid of the check altogether?
for (int t = 0; t < 33; t++)
{
int i = t + (t >> 1) + 1;
printf("%d\n", i);
}
Would there be any noticeable speed difference between these two snippets of code? Naively, I think the second snippet would be faster because branch instructions are encountered a lot less, but on the other hand the branch predictor should solve this problem. Or will it have a noticeable overhead despite the predictable pattern? Assume that no conditional move instruction is used.
Snippet 1:
for (int i = 0; i < 100; i++) {
if (a == 3)
output[i] = 1;
else
output[i] = 0;
}
Snippet 2:
if (a == 3) {
for (int i = 0; i < 100; i++)
output[i] = 1;
} else {
for (int i = 0; i < 100; i++)
output[i] = 0;
}
I'm not intending to optimise these cases myself, but I would like to know more about the overhead of branches even with a predictable pattern.
Since a remains unchanged once you enter into the loop, there shouldn't be much difference between the two code-snippet.
Personally, I would prefer the former, unless branch predictor fails to predict the branch which is really unlikely, given that a remains unchanged in the loop.
Moreover, the compiler may perform this optimization:
Loop unswitching
thereby making both code-snippets emit exactly same machine instructions.
You asked a performance question without specifying hardware (although from the question we can infer that it's one of the architectures that have branch prediction), toolchain, or compile options.
Overall, this is just another space vs speed tradeoff, where space often itself affects speed (CPU instruction and microcode caches).
The only reasonable answer is "Performance will vary depending on processor hardware and compiler optimizations."