I have encounter a horrible situation.
I usually use visual code to edit my code, also compile and execute in it(F5).
But I found vscode is too smart or ignore some warning message for me. And
output the right answer, which also work fine in Ideone. But in window cmd or dev C++ my code can't output anything, just return a big number.
And I find some situation will occur the thing I mention above.
The code like this
for (i = 0; i < g.size(); i++)
{
int source;
int dest;
int minWeight = 999;
for (j = 0; i < g[j].size(); j++)
{
// no edge, come to next condition
if (!g[i][j])
continue;
if (g[i][j] < minWeight)
{
source = i;
dest = j;
minWeight = g[i][j];
}
}
if
updateGroup(index, index[source], index[dest]);
else
updateGroup(index, index[source], index[dest]);
}
You may found that the second for loops have wrong condition statement, it should change
j = 0; i < g[j].size(); j++ to j = 0; j < g[i].size(); j++
So I wonder to know
Are there any way let my vscode more strict?
Why it still can output right answer in vscode and ideone?
How to avoid or be easier to found where my code wrong when this kind of no message error?
Really hope someone can help me, and appreciate all of your suggestion!!
There is no way for the compiler or computer to read your mind and guess what you meant to write instead of what you really did mean.
Even when this mistake results in a bug, it cannot know that you did not intend to write this, or that you meant to write some other specific thing instead.
Even when this bug results in your program having undefined behaviour, it is not possible to detect many cases of undefined behaviour, and it is not worthwhile for a compiler author to attempt to write code to do this, because it's too hard and not useful enough. Even if they did, the compiler could still not guess what you meant instead.
Remember, loops like this don't have to check or increment the same variable that you declared in the preamble; that's just a common pattern (now superseded by the safer ranged-for statement). There's nothing inherently wrong with having a loop that increments i but checks j.
Ultimately, the solution to this problem is to write tests for your code, which is why many organisations have dedicated Quality Assurance teams to search for bugs, and why you should already be testing your code before committing it to your project.
Remember to concentrate and pay close attention and read your code, and eventually such typos will become less common in your work. Of course once in a while you will write a bug, and your tests will catch it. Sometimes your tests won't catch it, which is when your customers will eventually notice it and raise a complaint. Then, you release a new version that fixes the bug.
This is all totally normal software development practice. That's what makes it fun! 😊
1) Crank up your compiler warnings as high as they will go.
2) Use multiple different compilers (they all warn about different things).
3) Know the details of the language well (a multi year effort) and be really, really careful about the code you write.
4) Write (and regularly run) lots of tests.
5) Use tools like sanitizers, fuzzers, linters, static code analyzers etc. to help catch bugs.
6) Build and run your code on multiple platforms to keep it portable and find bugs exposed by different environments/implementations.
Related
Let's consider the situation:
bool b = checking_some_condition();
for (int i = 0; i < 1000000; ++i)
{
if (b)
do_something(i);
else
do_something_else(i);
}
Is it obvious that compiler optimizes the code above into something like this? :
if (b)
{
for (int i = 0; i < 1000000; ++i)
do_something(i);
}
else
{
for (int i = 0; i < 1000000; ++i)
do_something_else(i);
}
Of course, I am only giving example the present the situation. I know that checking bool value 1000000 times is hardly noticeable for the performace, but if I'd have more complex comparisons with multiple ways of how the code inside loop would go, change in performance could be significant. Especially if this code would be inside the function that is called multiple times.
As was mentioned in the comments above you can't really make a safe assumption what the compiler will optimize or won't. It's their "freedom" to do these things or not.
If you want to get a feeling for what's going on the best way is to look at the generated assembly which will give you and objective way of arguing what the compiler might have done. https://godbolt.org/z/W-5Hve shows the easy example you posted above.
However, please try to make the example in godbolt as realistic as possible and then check the assembly. Even if two snippets will yield the same assembly in godbolt to make sure that this will also happen in your codebase you need to check the assembly of your compiled implementation in you codebase as well.
Summarizing this, what I normally do is:
- try a realistic example in godbolt and play with different compilers/flags and change the code until I think I know whats going on.
- compile my project and look at the assembly there to try and find the specific function again to make sure that the result in my code base is the same.
As a little extra: objdump -M intel -dC executable will show you the assembly of an executable.
I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break related question refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.
I am quite suspicions that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statements that
trigger the break (like an if), it necessarily ads additional
instructions to my loop.
Further, it probably also messes around when my compiler tries to
unfold the for-loop, as it no longer knows the number of iterations
that will run at compile time, effectively rendering it into a
while-loop.
Therefore, I suspect it does have a performance impact, which could be
considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I loose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while( i-- && checkCondition())
{
// do stuff
}
// USING FOR
for(int i=100; i; --i)
{
if(checkCondition()) {
// do stuff
} else {
break;
}
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -s (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense) as I am not sure if I got this completely right :)
The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPU have powerful branch prediction circuits which empirically work really well.
What will happen is that your CPU will choose if the branch is going to be taken or not and execute the code as if the if condition is not even present. Of course this relies on multiple assumptions, like not having side effects on the condition calculation (so that part of the body loop depends on it) and that that condition will always evaluate to false up to a certain point in which it will become true and stop the loop.
Some compilers also allow you to specify the likeliness of an evaluation as a hint the branch predictor.
If you want to see the semantic difference between the two code versions just compile them with -S and examinate the generated asm code, there's no other magic way to do it.
The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i<32; i++)
{
stop &= checkcondition(); // returns 0 or all-bits-set;
sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i<32; i++)
{
if (!checkcondition())
break;
sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.
I would like to know how to avoid wasting my time and risking typos by re-hashing source code when I'm integrating legacy code, library code or sample code into my own codebase.
If I give a simple example, based on an image processing scenario, you might see what I mean.
It's actually not unusual to find I'm integrating a code snippet like this:
for (unsigned int y = 0; y < uHeight; y++)
{
for (unsigned int x = 0; x < uWidth; x++)
{
// do something with this pixel ....
uPixel = pPixels[y * uStride + x];
}
}
Over time, I've become accustomed to doing things like moving unnecessary calculations out of the inner loop and maybe changing the postfix increments to prefix ...
for (unsigned int y = 0; y < uHeight; ++y)
{
unsigned int uRowOffset = y * uStride;
for (unsigned int x = 0; x < uWidth; ++x)
{
// do something with this pixel ....
uPixel = pPixels[uRowOffset + x];
}
}
Or, I might use pointer arithmetic, either by row ...
for (unsigned int y = 0; y < uHeight; ++y)
{
unsigned char *pRow = pPixels + (y * uStride);
for (unsigned int x = 0; x < uWidth; ++x)
{
// do something with this pixel ....
uPixel = pRow[x];
}
}
... or by row and column ... so I end up with something like this
unsigned char *pRow = pPixels;
for (unsigned int y = 0; y < uHeight; ++y)
{
unsigned char *pPixel = pRow;
for (unsigned int x = 0; x < uWidth; ++x)
{
// do something with this pixel ....
uPixel = *pPixel++;
}
// next row
pRow += uStride;
}
Now, when I write from scratch, I'll habitually apply my own "optimisations" but I'm aware that the compiler will also be doing things like:
Moving code from inside loops to outside loops
Changing postfix increments to prefix
Lots of other stuff that I have no idea about
Bearing in mind that every time I mess with a piece of working, tested code in this way, I not only cost myself some time but I also run the risk that I'll introduce bugs with finger trouble or whatever (the above examples are simplified). I'm aware of "premature optimisation" and also other ways of improving performance by designing better algorithms, etc. but for the situations above I'm creating building-blocks that will be used in larger pipelined type of apps, where I can't predict what the non-functional requirements might be so I just want the code as fast and tight as is reasonable within time limits (I mean the time I spend tweaking the code).
So, my question is: Where can I find out what compiler optimisations are commonly supported by "modern" compilers. I'm using a mixture of Visual Studio 2008 and 2012, but would be interested to know if there are differences with alternatives e.g. Intel's C/C++ Compiler. Can anyone shed some insight and/or point me at a useful web link, book or other reference?
EDIT
Just to clarify my question
The optimisations I showed above were simple examples, not a complete list. I know that it's pointless (from a performance point of view) to make those specific changes because the compiler will do it anyway.
I'm specifically looking for information about what optimisations are provided by the compilers I'm using.
I would expect most of the optimizations that you include as examples to be a waste of time. A good optimizing compiler should be able to do all of this for you.
I can offer three suggestions by way of practical advice:
Profile your code in the context of a real application processing real data. If you can't, come up with some synthetic tests that you think would closely mimic the final system.
Only optimize code that you have demonstrated through profiling to be a bottleneck.
If you are convinced that a piece of code needs optimization, don't just assume that factoring invariant expression out of a loop would improve performance. Always benchmark, optionally looking at the generated assembly to gain further insight.
The above advice applies to any optimizations. However, the last point is particularly relevant to low-level optimizations. They are a bit of a black art since there are a lot of relevant architectural details involved: memory hierarchy and bandwidth, instruction pipelining, branch prediction, the use of SIMD instructions etc.
I think it's better to rely on the compiler writer having a good knowledge of the target architecture than to try and outsmart them.
From time to time you will find through profiling that you need to optimize things by hand. However, these instances will be fairly rare, which will allow you to spend a good deal of energy on things that will actually make a difference.
In the meantime, focus on writing correct and maintainable code.
I think it would probably be more useful for you to reconsider the premise of your question, rather than to get a direct answer.
Why do you want to perform these optimizations? Judging by your question, I assume it is to make a concrete program faster. If that is the case, you need to start with the question: How do I make this program faster?
That question has a very different answer. First, you need to consider Amdahl's law. That usually means that it only makes sense to optimize one or two important parts of the program. Everything else pretty much doesn't matter. You should use a profiler to locate these parts of the program. At this point you might argue that you already know that you should use a profiler. However, almost all the programmers I know don't profile their code, even if they know that they should. Knowing about vegetables isn't the same as eating them. ;-)
Once you have located the hot-spots, the solution will probably involve:
Improving the algorithm, so the code does less work.
Improving the memory access patterns, to improve cache performance.
Again, you should use the profiler to see if your changes have improved the run-time.
For more details, you can Google code optimization and similar terms.
If you want to get really serious, you should also take a look at Agner Fog's optimization manuals and Computer Architecture: A Quantitative Approach. Make sure to get the newest edition.
You might also might want to read The Sad Tragedy of Micro-Optimization Theater.
What can I assume about C/C++ compiler optimisations?
As possible as you can imagine, except the cases that you get issues either functionality or performance with the optimized code, then turn off optimization and debug.
Mordern compilers have various strategies to optimize your code, especially when you are doing concurrent programming, and using libraries like OMP, Boost or TBB.
If you DO care about what your code exactly made into machine code, it would be no more better to decompile it and observe the assemblies.
The most thing for you to do the manual optimization, might be reduce the unpredictable branches, which is some harder be done by the compiler.
If you want to look for the informations about optimization, there's already a question on SO
What are the c++ compiler optimization techniques in Visual studio
In the optimization options, there are explanations about what each optimize for:
/O Options (Optimize Code)
And there's something about optimization strategies and techniques
C++ Optimization Strategies and Techniques, by Pete Isensee
I've a peculiar issue here, which is happening both with VS2005 and 2010. I have a for loop in which an inline function is called, in essence something like this (C++, for illustrative purposes only):
inline double f(int a)
{
if (a > 100)
{
// This is an error condition that shouldn't happen..
}
// Do something with a and return a double
}
And then the loop in another function:
for (int i = 0; i < 11; ++i)
{
double b = f(i * 10);
}
Now what happens is that in debug build everything works fine. In release build with all the optimizations turned on this is, according to disassembly, compiled so that i is used directly without the * 10 and the comparison a > 100 turns into a > 9, while I guess it should be a > 10. Do you have any leads as to what might make the compiler think that a > 9 is the correct way? Interestingly, even a minor change (a debug printout for example) in the surrounding code makes the compiler use i * 10 and compare that with the literal value of 100.
I know this is somewhat vague, but I'd be grateful for any old idea.
EDIT:
Here's a hopefully reproducable case. I don't consider it too big to be pasted here, so here goes:
__forceinline int get(int i)
{
if (i > 600)
__asm int 3;
return i * 2;
}
int main()
{
for (int i = 0; i < 38; ++i)
{
int j = (i < 4) ? 0 : get(i * 16);
}
return 0;
}
I tested this with VS2010 on my machine, and it seems to behave as badly as the original code I'm having problems with. I compiled and ran this with the IDE's default empty C++ project template, in release configuration. As you see, the break should never be hit (37 * 16 = 592). Note that removing the i < 4 makes this work, just like in the original code.
For anyone interested, it turned out to be a bug in the VS compiler. Confirmed by Microsoft and fixed in a service pack following the report.
First, it'd help if you could post enough code to allow us to reproduce the issue. Otherwise you're just asking for psychic debugging.
Second, it does occasionally happen that a compiler fails to generate valid code at the highest optimization levels, but more likely, you just have a bug somewhere in your code. If there is undefined behavior somewhere in your code, that means the assumptions made by the optimizer may not hold, and then the compiler can end up generating bad code.
But without seeing your actual code, I can't really get any more specific.
The only famous bug I know with optimization (and only with highest optimization level) are occasional modifications of the orders of priority of the operations (due to change of operations performed by the optimizer, looking for the fastest way to compute). You could look in this direction (and put some parenthesis even though they are not strictly speaking necessary, which is why more parenthesis is never bad), but frankly, those kind of bugs are quite rare.
As stated, it difficult to have any precise idea without more code.
Firstly, inline assembly prevents certain optimizations, you should use the __debugbreak() intrinsic for int3 breakpointing. The compiler sees the inline function having no effect other than a breakpoint, so it divides the 600 by 16(note: this is affected by integer truncation), thus it optimizes to debugbreak to trigger with 38 > i >= 37. So it seems to work on this end
Or is it all about semantics?
Short answer: no, they are exactly the same.
Guess it could in theory depend on the compiler; a really broken one might do something slightly different but I'd be surprised.
Just for fun here are two variants that compile down to exactly the same assembly code for me using x86 gcc version 4.3.3 as shipped with Ubuntu. You can check the assembly produced on the final binary with objdump on linux.
int main()
{
#if 1
int i = 10;
do { printf("%d\n", i); } while(--i);
#else
int i = 10;
for (; i; --i) printf("%d\n", i);
#endif
}
EDIT: Here is an "oranges with oranges" while loop example that also compiles down to the same thing:
while(i) { printf("%d\n", i); --i; }
If your for and while loops do the same things, the machine code generated by the compiler should be (nearly) the same.
For instance in some testing I did a few years ago,
for (int i = 0; i < 10; i++)
{
...
}
and
int i = 0;
do
{
...
i++;
}
while (i < 10);
would generate exactly the same code, or (and Neil pointed out in the comments) with one extra jmp, which won't make a big enough difference in performance to worry about.
There is no semantic difference, there need not be any compiled difference. But it depends on the compiler. So I tried with with g++ 4.3.2, CC 5.5, and xlc6.
g++, CC were identical, xlc WAS NOT
The difference in xlc was in the initial loop entry.
extern int doit( int );
void loop1( ) {
for ( int ii = 0; ii < 10; ii++ ) {
doit( ii );
}
}
void loop2() {
int ii = 0;
while ( ii < 10 ) {
doit( ii );
ii++;
}
}
XLC OUTPUT
.loop2: # 0x00000000 (H.10.NO_SYMBOL)
mfspr r0,LR
stu SP,-80(SP)
st r0,88(SP)
cal r3,0(r0)
st r3,64(SP)
l r3,64(SP) ### DIFFERENCE ###
cmpi 0,r3,10
bc BO_IF_NOT,CR0_LT,__L40
...
enter code here
.loop1: # 0x0000006c (H.10.NO_SYMBOL+0x6c)
mfspr r0,LR
stu SP,-80(SP)
st r0,88(SP)
cal r3,0(r0)
cmpi 0,r3,10 ### DIFFERENCE ###
st r3,64(SP)
bc BO_IF_NOT,CR0_LT,__La8
...
The scope of the variable in the test of the while loop is wider than the scope of variables declared in the header of the for loop.
Therefore, if there are performance implications as a side-effect of keeping a variable alive longer, then there will be performance implications in choosing between a while and a for loop ( and not wrapping the while up in {} to reduce the scope of its variables ).
An example might be a concurrent collection which counts the number of iterators referring to it, and if more than one iterator exists, it applies locking to prevent concurrent modification, but as an optimisation elides the locking if only one iterator refers to it. If you then had two for loops in a function using differently named iterators on the same container, the fast path would be taken, but with two while loops the slow path would be taken. Similarly there may be performance implications if the objects are large (more cache traffic), or use system resources. But I can't think of a real example that I've ever seen where it would make a difference.
Compilers that optimize using loop unrolling will probably only do so in the for-loop case.
Both are equivalent. It's a matter of semantics.
The only difference may lie in the do... while construct, where you postpone the evaluation of the condition until after the body, and thus may save 1 evaluation.
i = 1; do { ... i--; } while( i > 0 );
as opposed to
for( i = 1; i > 0; --i )
{ ....
}
I write compilers. We compile all "structured" control flow (if, while, for, switch, do...while) into conditional and unconditional branches. Then we analyze the control-flow graph. Since a C compiler has to deal with general goto anyway, it is easiest to reduce everything to branch and conditional-branch instructions, then be sure to handle that case well. (A C compiler has to do a good job not just on handwritten code but also on automatically generated code, which may have many, many goto statements.)
No. If they're doing equivalent things, they'll compile to the same code - as you say, it's about semantics. Choose the one that best represents what you're trying to express.
Ideally it should be the same, but eventually it depends on your compiler/interpreter. To be sure, you must measure or examine the generated assembly code.
Proof that there may be a difference: These lines produce different assembly code using cc65.
for (; i < 1000; ++i);
while (i < 1000) ++i;
On Atmel ATMega while() is faster than for(). Why is this is explained in AVR035: Efficient C Coding for AVR.
P.S. Original platform was not mentioned in question.
continue behaves differently in for and while: in for, it alters the counter, in while, it usually doesn't
To add another answer: In my experience, optimizing software is like a big, bushy beard being shaved off a man.
First you lop it off in big chunks with scissors (prune whole limbs off the call tree).
Then you make it short with an electric clipper (tweak algorithms).
Finally you shave it with a razor to get rid of the last little bit (low-level optimization).
The last is where the difference between for() and while() might, but probably won't, make a difference.
P.S. The programmers I know (who are all very good, and I suspect are a representative sample) basically go at it from the other direction.
They are the same as far as performance goes. I tend to use while when waiting for a state change (such as waiting for a buffer to be filled) and for when processing a number of discrete objects (such as going through each item in a collection).
There is a difference in some cases.
If you are at the point where that difference matters, you either need to pick a better algorithm or begin coding in assembly language. Trust me, coding in assembly is preferable to fixing your compiler version.
Is while() faster/slower than for()? Let's review a few things about optimization:
Compiler-writers work very hard to shave cycles by having fewer calls to jump, compare, increment, and the other kinds of instructions that they generate.
Call instructions, on the other hand, consume many magnitudes more cycles, but the compiler is nearly powerless to do anything to remove those.
As programmers, we write lots of function calls, some because we mean to, some because we're lazy, and some because the compiler slips them in without being obvious.
Most of the time, it doesn't matter, because the hardware is so fast, and our jobs are so small, that the computer is like a beagle dog who wolfes her food and begs for more.
Sometimes, however, the job is big enough that performance is an issue.
What do we do then? Where's the bigger payoff?
Getting the compiler to shave a few cycles off loops & such?
Finding function calls that don't -really- need to be done so much?
The compiler can't do the latter. Only we the programmers can.
We need to learn or be taught how to do this. It doesn't come naturally.
We are congenitally inclined to make wrong guesses and then bet on them.
Getting better algorithms is a start, but only a start. Our teachers need to teach this, if indeed they know how.
Profilers are a start. I do this.
The apocryphal quote of Willie Sutton when asked Why do you rob banks?:
Because that's where the money is.
If you want to save cycles, find out where they are.
Probably only coding style.
for if you know the number of iterations.
while if you do not know the number of iterations.