I was writing a code, and accidentally put int before a variable twice, and noticed something different in the end product.
int main ()
{
int number = 123456789;
int step = 0;
while (number>0)
{
cout << number%10 << endl;
number = number/10;
step = step+1;
cout << "step ; " << step << endl;
}
return 0;
}
The output was:
9
step ; 1
8
step ; 2
7
step ; 3
6 ...
However, when I called it this way:
int main ()
{
int number = 123456789;
int step = 0;
while (number>0)
{
cout << number%10 << endl;
number = number/10;
int step = step+1;
cout << "step ; " << step << endl;
}
return 0;
}
The output suddenly changed to:
9
step ; 11006125
8
step ; 11006126
7
step ; 11006127
6
Can someone please explain the difference, so that I will be mindful next time?
C++ is block scoped. When you redeclare step inside the while loop (by using int step instead of just step), you "shadow" the step from outside that scope; while the outer step (with value 0) still exists, it cannot be read directly from the inner scope (a pointer/reference to the outer step with a different name could read/write the data, you just can't read the outer step directly through that name).
The outer step is invisible, so int step = step + 1; translates as "please assign to the local step the value of whatever is already stored in the local step (which, not existing until that moment, has no defined value yet) after adding 1". Since the step it reads from is garbage, this is undefined behavior, and in practice, it gets garbage (whatever junk happened to be left on the stack in that position from before you entered the while).
The second and subsequent creations of the inner step are still undefined behavior, but in this case (and in most cases where loop unrolling isn't involved) the compiler chose to reuse the same storage for them as the prior use of inner step (that expired when that iteration of the loop ended). Thus, as a coincidence of the new inner step reading from the same memory location as the expired inner step, it increments normally from then on, based on the original garbage value as a base.
But again, to be clear, all of this is undefined behavior. Be thankful you didn't get nasal demons, and only use the first form in the future. I'll note, if you enable all warnings, any compiler worth its salt should detect this error (the entire error exists in the scope of a single line); sadly, it appears at least GCC is not worth its salt, because the only thing it detects, even with -Wall -Wextra -Werror -pedantic is that you didn't use the outer step variable. If you used it, or deleted the declaration of the outer step, it compiles with no warnings, so I guess you're stuck just remembering not to initialize a variable in terms of its own value.
In the second case, you declare a second variable, which is also called step, but is visible only inside the loop.
Related
I was trying to write some code that allow me to observe reordering of memory operations.
In the fallowing example I expected that on some executions of set_values() order of assigning values could change. Especialy notification = 1 may occur before the rest of operations, but in dosn't happend even after thousens of iterations.
I've compiled code with -O3 optimization.
Here is youtube material that i'm refering to : https://youtu.be/qlkMbxUbKfw?t=200
int a{0};
int b{0};
int c{0};
int notification{0};
void set_values()
{
a = 1;
b = 2;
c = 3;
notification = 1;
}
void calculate()
{
while(notification != 1);
a += b + c;
}
void reset()
{
a = 0;
b = 0;
c = 0;
notification = 0;
}
int main()
{
a=6; //just to allow first iteration
for(int i = 0 ; a == 6 ; i++)
{
reset();
std::thread t1(calculate);
std::thread t2(set_values);
t1.join();
t2.join();
std::cout << "Iteration: " << i << ", " "a = " << a << std::endl;
}
return 0;
}
Now the program is stuck in infinited loop. I expect that in some iterations order of instructions in set_values() function can change (due to optimalization on cash memory). For example notification = 1 will be executed before c = 3 what will trigger execution of calculate() function and gives a==3 what satisfies the condition of terminating the loop and prove reordering
Or maybe someone can provide other trivial example of code that help observe reordering of memory operations?
The compiler can indeed reorder your assignments in the function set_values. However, it is not required to do so. In this case it has no reason to reorder anything, since you are assigning constants to all four variables.
Now the program is stuck in infinited loop.
This is probably because while(notification != 1); will be optimized to an infinite loop.
With a bit of work, we can find a way to make the compiler reorder the assignment notify = 1 before the other statements, see https://godbolt.org/z/GY-pAw.
Notice that the program reads x from the standard input, this is done to force the compiler to read from a memory location.
I've also made the variable notification volatile, so that while(notification != 1); doesn't get optimised away.
You can try this example on your machine, I've been able to consistently fail the assertion using g++9.2 and -O3 running on an Intel Sandy Bridge cpu.
Be aware that the cpu itself can reorder instructions if they are independent of each other, see https://en.wikipedia.org/wiki/Out-of-order_execution. This is, however, a bit tricky to test and reproduce consistently.
Your compiler optimizes in unexpected ways but is allowed to do so because you are violating a fundamental rule of the C++ memory model.
You cannot access a memory location from multiple threads if at least one of them is a writer.
To synchronize, either use a std:mutex or use std:atomic<int> instead of int for your variables
I have an if statement that is currently never executed, however if I print something to the screen it takes over ten times longer for the program to run than if a variable is declared. Doing a bit of research online this seems to be some kind of branch prediction issue. Is there anything I can do to improve the program speed?
Basically both myTest and myTest_new return the same thing except one is a macro and one is a function. I am just monitoring the time it takes for bitTest to execute. and it executes in 3 seconds with just declaration in if statement but takes over a minute when Serial.print is in if statement even though neither are executed.
void bitTest()
{
int count = 0;
Serial1.println("New Test");
int lastint = 0;
Serial1.println("int");
for (int index = -2147483647; index <= 2147483647; index+=1000) {
if (index <= 0 && lastint > 0) {
break;
}
lastint = index;
for (int num = 0; num <= 31; num++) {
++1000;
int vcr1 = myTest(index, num);
int vcr2 = myTest_new(index, num);
if (vcr1 != vcr2) {
Serial1.println("Test"); // leave this println() and it takes 300 seconds for the test to run
//int x = 0;
}
} // if (index)
} // for (index)
Serial1.print("count = ");
Serial1.println(count);
return;
}
It is much less likely to be caused by a branch prediction (that branch prediction shouldn't be influenced by what you do inside your code) but by the fact that
{
int x = 0;
}
simply does nothing, because the scope of x ends at }, so that the compiler simply ditches the whole if clause, including the check. Note that this is only possible because the expression that if checks has no side effects, and neither does the block that would get executed.
By the way, the code you showed would usually directly be "compiled away", because the compiler, at compile time, can determine whether the if clause could ever be executed, unless you explicitly tell the compiler to omit such safe optimizations. Hence, I kind of doubt your "10 times as slow" measurement. Either the code you're showing isn't the actual example on which you demonstrate this, or you should turn on compiler optimization prior to doing performance comparisons.
The reason why your program takes forever is that it's buggy:
for (int index = -2147483647; index <= 2147483647; index+=1000) {
simply: at a very large index close to the maximum integer value, a wrap-around will occur. There's no "correct" way for your program to terminate. Hence you invented your strange lastint > 0 checking.
Now, fix up the loop (I mean, you're really just using every 1000th element, so why not simply loop index from 0 to 2*2147483?)
++1000;
should be illegal in C, because you can't increase a constant numeral. This is very much WTF.
All in all, your program is a mess. Re-write it, and debug a clean, well-defined version of it.
Editor's clarification: When this was originally posted, there were two issues:
Test performance drops by a factor of three if seemingly inconsequential statement added
Time taken to complete the test appears to vary randomly
The second issue has been solved: the randomness only occurs when running under the debugger.
The remainder of this question should be understood as being about the first bullet point above, and in the context of running in VC++ 2010 Express's Release Mode with optimizations "Maximize Speed" and "favor fast code".
There are still some Comments in the comment section talking about the second point but they can now be disregarded.
I have a simulation where if I add a simple if statement into the while loop that runs the actual simulation, the performance drops about a factor of three (and I run a lot of calculations in the while loop, n-body gravity for the solar system besides other things) even though the if statement is almost never executed:
if (time - cb_last_orbital_update > 5000000)
{
cb_last_orbital_update = time;
}
with time and cb_last_orbital_update being both of type double and defined in the beginning of the main function, where this if statement is too. Usually there are computations I want to run there too, but it makes no difference if I delete them. The if statement as it is above has the same effect on the performance.
The variable time is the simulation time, it increases in 0.001 steps in the beginning so it takes a really long time until the if statement is executed for the first time (I also included printing a message to see if it is being executed, but it is not, or at least only when it's supposed to). Regardless, the performance drops by a factor of 3 even in the first minutes of the simulation when it hasn't been executed once yet. If I comment out the line
cb_last_orbital_update = time;
then it runs faster again, so it's not the check for
time - cb_last_orbital_update > 5000000
either, it's definitely the simple act of writing current simulation time into this variable.
Also, if I write the current time into another variable instead of cb_last_orbital_update, the performance does not drop. So this might be an issue with assigning a new value to a variable that is used to check if the "if" should be executed? These are all shots in the dark though.
Disclaimer: I am pretty new to programming, and sorry for all that text.
I am using Visual C++ 2010 Express, deactivating the stdafx.h precompiled header function didn't make a difference either.
EDIT: Basic structure of the program. Note that nowhere besides at the end of the while loop (time += time_interval;) is time changed. Also, cb_last_orbital_update has only 3 occurrences: Declaration / initialization, plus the two times in the if statement that is causing the problem.
int main(void)
{
...
double time = 0;
double time_interval = 0.001;
double cb_last_orbital_update = 0;
F_Rocket_Preset(time, time_interval, ...);
while(conditions)
{
Rocket[active].Stage[Rocket[active].r_stage].F_Update_Stage_Performance(time, time_interval, ...);
Rocket[active].F_Calculate_Aerodynamic_Variables(time);
Rocket[active].F_Calculate_Gravitational_Forces(cb_mu, cb_pos_d, time);
Rocket[active].F_Update_Rotation(time, time_interval, ...);
Rocket[active].F_Update_Position_Velocity(time_interval, time, ...);
Rocket[active].F_Calculate_Orbital_Elements(cb_mu);
F_Update_Celestial_Bodies(time, time_interval, ...);
if (time - cb_last_orbital_update > 5000000.0)
{
cb_last_orbital_update = time;
}
Rocket[active].F_Check_Apoapsis(time, time_interval);
Rocket[active].F_Status_Check(time, ...);
Rocket[active].F_Update_Mass (time_interval, time);
Rocket[active].F_Staging_Check (time, time_interval);
time += time_interval;
if (time > 3.1536E8)
{
std::cout << "\n\nBreak main loop! Sim Time: " << time << std::endl;
break;
}
}
...
}
EDIT 2:
Here is the difference in the assembly code. On the left is the fast code with the line
cb_last_orbital_update = time;
outcommented, on the right the slow code with the line.
EDIT 4:
So, i found a workaround that seems to work just fine so far:
int cb_orbit_update_counter = 1; // before while loop
if(time - cb_orbit_update_counter * 5E6 > 0)
{
cb_orbit_update_counter++;
}
EDIT 5:
While that workaround does work, it only works in combination with using __declspec(noinline). I just removed those from the function declarations again to see if that changes anything, and it does.
EDIT 6: Sorry this is getting confusing. I tracked down the culprit for the lower performance when removing __declspec(noinline) to this function, that is being executed inside the if:
__declspec(noinline) std::string F_Get_Body_Name(int r_body)
{
switch (r_body)
{
case 0:
{
return ("the Sun");
}
case 1:
{
return ("Mercury");
}
case 2:
{
return ("Venus");
}
case 3:
{
return ("Earth");
}
case 4:
{
return ("Mars");
}
case 5:
{
return ("Jupiter");
}
case 6:
{
return ("Saturn");
}
case 7:
{
return ("Uranus");
}
case 8:
{
return ("Neptune");
}
case 9:
{
return ("Pluto");
}
case 10:
{
return ("Ceres");
}
case 11:
{
return ("the Moon");
}
default:
{
return ("unnamed body");
}
}
}
The if also now does more than just increase the counter:
if(time - cb_orbit_update_counter * 1E7 > 0)
{
F_Update_Orbital_Elements_Of_Celestial_Bodies(args);
std::cout << F_Get_Body_Name(3) << " SMA: " << cb_sma[3] << "\tPos Earth: " << cb_pos_d[3][0] << " / " << cb_pos_d[3][1] << " / " << cb_pos_d[3][2] <<
"\tAlt: " << sqrt(pow(cb_pos_d[3][0] - cb_pos_d[0][0],2) + pow(cb_pos_d[3][1] - cb_pos_d[0][1],2) + pow(cb_pos_d[3][2] - cb_pos_d[0][2],2)) << std::endl;
std::cout << "Time: " << time << "\tcb_o_h[3]: " << cb_o_h[3] << std::endl;
cb_orbit_update_counter++;
}
I remove __declspec(noinline) from the function F_Get_Body_Name alone, the code gets slower. Similarly, if i remove the execution of this function or add __declspec(noinline) again, the code runs faster. All other functions still have __declspec(noinline).
EDIT 7:
So i changed the switch function to
const std::string cb_names[] = {"the Sun","Mercury","Venus","Earth","Mars","Jupiter","Saturn","Uranus","Neptune","Pluto","Ceres","the Moon","unnamed body"}; // global definition
const int cb_number = 12; // global definition
std::string F_Get_Body_Name(int r_body)
{
if (r_body >= 0 && r_body < cb_number)
{
return (cb_names[r_body]);
}
else
{
return (cb_names[cb_number]);
}
}
and also made another part of the code slimmer. The program now runs fast without any __declspec(noinline). As ElderBug suggested, an issue with the CPU instruction cache then / the code getting too big?
I'd put my money on Intel's branch predictor. http://en.wikipedia.org/wiki/Branch_predictor
The processor assumes (time - cb_last_orbital_update > 5000000) to be false most of the time and loads up the execution pipeline accordingly.
Once the condition (time - cb_last_orbital_update > 5000000) comes true. The misprediction delay is hitting you. You may loose 10 to 20 cycles.
if (time - cb_last_orbital_update > 5000000)
{
cb_last_orbital_update = time;
}
Something is happening that you don't expect.
One candidate is some uninitialised variables hanging around somewhere, which have different values depending on the exact code that you are running. For example, you might have uninitialised memory that is sometime a denormalised floating point number, and sometime it's not.
I think it should be clear that your code doesn't do what you expect it to do. So try debugging your code, compile with all warnings enabled, make sure you use the same compiler options (optimised vs. non-optimised can easily be a factor 10). Check that you get the same results.
Especially when you say "it runs faster again (this doesn't always work though, but i can't see a pattern). Also worked with changing 5000000 to 5E6 once. It only runs fast once though, recompiling causes the performance to drop again without changing anything. One time it ran slower only after recompiling twice." it looks quite likely that you are using different compiler options.
I will try another guess. This is hypothetical, and would be mostly due to the compiler.
My guess is that you use a lot of floating point calculations, and the introduction and use of double values in your main makes the compiler run out of XMM registers (the floating point SSE registers). This force the compiler to use memory instead of registers, and induce a lot of swapping between memory and registers, thus greatly reducing the performance. This would be happening mainly because of the computations functions inlining, because function calls are preserving registers.
The solution would be to add __declspec(noinline) to ALL your computation functions declarations.
I suggest using the Microsoft Profile Guided Optimizer -- if the compiler is making the wrong assumption for this particular branch it will help, and it will in all likelihood improve speed for the rest of the code as well.
Workaround, try 2:
The code is now looking like this:
int cb_orbit_update_counter = 1; // before while loop
if(time - cb_orbit_update_counter * 5E6 > 0)
{
cb_orbit_update_counter++;
}
So far it runs fast, plus the code is being executed when it should as far as i can tell. Again only a workaround, but if this proves to work all around then i'm satisfied.
After some more testing, seems good.
My guess is that this is because the variable cb_last_orbital_update is otherwise read-only, so when you assign to it inside the if, it destroys some optimizations that the compiler has for read-only variables (e.g. perhaps it's now stored in memory instead of a register).
Something to try (although this might still not work) is to make a third variable that is initialized via cb_last_orbital_update and time depending on whether the condition is true, and using that one instead. Presumably, the compiler would now treat that variable as a constant, but I'm not sure.
I'm writing a class in windows using visual studio, one of it's public function has a big for loop looks like below,
void brain_network_opencl::block_filter_fcd_all(int m)
{
const int m_block_len = m * block_len;
time_t start, end;
for (int j = 0; j < shift_2d_gpu[1]; j++) // local work size/number of rows per block
{
for (int i = 0; i < masksize; i++) // number of extracted voxels
{
if (j + m_block_len != i)
{
//if (floor(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
if ((int)(dst_ptr_gpu[i + j * masksize] * power_up) > threadhold_fcd)
{
org_row = mask_ind[j + m_block_len];
org_col = mask_ind[i];
nodes.insert(org_row);
conns.insert(make_pair(org_row, org_col));
}
}
}
}
end = clock();
cout << end - start << "ms" << " for block" << j << endl;
}
where nodes is std::set<set> ,conns is std::multimap<int, int> and mask_ind is std::vector<int>, they are declared as private variables as well as masksize and shift_2d_gpu;
Major time costs by floor and .insert;
The problem is, the same code (with all the variables) in a main function costs only 1/5~1 the time than it calls from here. And if I replace (int) by floor in both function and main(), it costs much more in this function;
What causes this problem and do I have to write it all inside a main()?
By the way does it has something to do with the overloads?
floor shows +3 overloads and .insert shows +5 overloads
updates
I copy the codes of this function to another new console project's main function.
It's still much slower than my first function (codes also in main)!!!
Now I'm confused...
It's there any settings that make floor and .insert faster?
updates 2014/03/31
It's because of the settings in Project Properties->Configuration Properties->C/C++->General->Debug Information Format, this value is set to P*rogram Database for Edit And Continue (/ZI)* as default and it is incompatible with a lot of optimizations according to msdn. If this value is set to Program Database (/Zi), the time cost of floor wouldn't be 10 times of (int).
(I looked into Disassembly and found out that the length of codes (call floor -> jmp floor ->different codes) are different when the setting is altered, that's the reason causes floor and .insert spent much more time than it should)
As Gassa has pointed out, to optimize the tight loop use a custom floor function.
set<int> isn't cache friendly, but to replace it with a cache-friendly structure you might need to alter the algorithm. Still, unordered_set<int>, with a decent space reserved to it, should be a bit better, having less cache misses per insert than a binary tree.
P.S. Non-virtual overloads in C++ are resolved at compile time and have no effect on performance
when i run the following c++ code
#include<iostream>
int main()
{
for(int i=1;i<=10;i++)
{
cout<<i<<endl;
}
cout<<i;
getch();
return 0;
}
I get result from no. 1 to 11.
i don't understand why the value of i = 11 after the block of for loop is finished,Please give me the reason.I have declared i inside for loop and scope of i has been finished after loop so why i get the outpout i=11 after execute second cout statement .I have not declared i in the variable declaration inside main.My question is that is i is visible outside of for loop?
Thanks in advance.
For multiple reasons, this program does not compile. You are either using an extremely old compiler, an extremely permissive compiler, or not showing us the program you're actually having a problem with.
From your comments it seems that you actually can compile it. I can only guess that you are using a very old compiler. Perhaps an old MS-DOS compiler (Zortech C++? Turbo C++?) since the getch function is not generally a standard library function and doesn't do the right thing in the curses library anyway. It's probably an old BIOS-based function from the MS-DOS days.
The standard was changed awhile ago (over 10 years now) so that variable declarations in the parenthesized section of a for loop are local to that loop. It was once not actually the case that this was true.
I no longer have access to any compiler that's so old it doesn't handle things this way. I'm surprised anybody does. Your program will not compile on my compiler.
Here is a version of your program that does compile, even though it requires the -lcurses option to link:
#include <iostream>
#include <curses.h>
using ::std::cout;
using ::std::endl;
int main()
{
for(int i=1;i<=10;i++)
{
cout<<i<<endl;
}
getch();
return 0;
}
Notice how the offending cout << i; statement is gone? That because it will not compile on a modern compiler.
Now, lets edit your program some more so it will compile with the cout << i; statement you're vexed about:
#include <iostream>
#include <curses.h>
int main()
{
using ::std::cout;
int i;
for (i = 1; i <= 10; i++)
{
cout << i << '\n';
}
cout << "last: " << i << '\n';
getch();
return 0;
}
This, of course, does print out last: 11 at the very end. This happens for a very obvious reason. What value does i have to have in order for the i <= 10 test to fail? Why, any value greater than 10! And since i is having one added to it every loop iteration, the first value i has that has the property of being greater than 10 is 11.
The loop test happens at the top of the loop and is used to decide if the remainder of the loop should be executed or not. And the increment happens at the very bottom of the loop (despite appearing in the body of the for statement). So i will be 10, will be printed, and then 1 will be added to it. Then the loop test (i <= 10) will be done, it will be discovered that 11 <= 10 is false, and control will drop out of the loop down to the print statement after the loop, and last: 11 will be printed.
And yes, the exact same thing will happen in C.
Because the loop breaks when the condition i<=10 becomes untrue, and this can happen when i becomes 11. Simple!
I think you wanted to write i < 10 .
Also, as #Omnifarious noted in the comment, the code shouldn't even compile, as i doesn't exist outside the loop. Maybe, you've declared i outside the loop, in your original code?
Besides the fact that it shouldn't compile (because i doesn't exist outside the loop block).
The loop runs from 1 to 10, so it stops when i reaches 11 and the condition fails. So i is 11 in the end of the loop.
This is because you have an old compiler.
cout<<i<<endl; should not compile, as cout and endl need to be qualified by the std namespace.
Fixing that, std::cout<<i; shouldn't compile because your variable is loop-scoped, so shouldn't even be visible outside the loop.
Fixing that, here's your code:
#include<iostream>
#include<conio.h>
int main()
{
int i;
for(i = 1; i <= 10; i++)
{
std::cout << i << std::endl;
}
std::cout << i;
getch();
return 0;
}
It should become more obvious why 11 is printed now.
When i == 10, the loop executes, increments i, and checks its value. It is then equal to 11, so it exits the loop.
Then you have another print statement, that will print the post-loop value, which is 11.
Here's the output I get from that corrected program:
1
2
3
4
5
6
7
8
9
10
11
This is the same as you are getting.
If you only want to print 1-10, then why have the extra std::cout << i;?
Recommendation
Get an up to date C++ compiler that will give you syntax errors on things that are no longer valid in standard-compliant C++
Get rid of the extra std::cout << i;
Keep your i variable loop-scoped, as in your original code
The result from my recommendations will be that you only see 1 through 10 printed, and you will have slightly fewer unexpected surprises in the future (as some "bad" and surprising code won't even compile).