C++ code gets strangely skipped with no optimizations. Any ideas why? - c++

I looked for an answer to this for two days with no success. I've never come across this problem before, so I'll try my best. Please bear with me.
I returned to a C++ project I created over a year ago, which at the time ran without problems. I came across this interesting and incredibly annoying problem the other day as I was trying to get the same program to run. The code was something like:
file.h
...
short id;
...
file.cc
id = 0;
while (id < some_large_number)
{
    id = foo();
    if (id == 2)
    {
        //do something
    }
    else if (id == 2900)
    {
        //do something
    }
    else if (id == 30000)
    {
        //do something
    }
    else if (id == 40000)
    {
        //do something
    }
    else if (id == 45000)
    {
        //do something
    }
    else
    {
        //do something else
    }
}
The constant numbers were macros in hex notation that I expanded for this example. It turns out that this was truly a bug, but the debugger did not make it easy to discover. Here's what happened:
As I was trying to step through the code using GDB (with no optimizations), I noticed that GDB would jump straight to the else statement after reaching if (id == 30000), every time. Because the numbers were C macros in hex notation, I did not notice at first that 40000 was beyond the limit of a signed short. This was very misleading, and I spent hours trying to figure it out: I recompiled external libraries, reinstalled g++, among other things.
Obviously, making id an unsigned short fixed the problem. But the skipping itself seems like a compiler issue, and I still don't understand it: why were those sections of code completely skipped during execution, even with no optimizations? Why wouldn't execution go through each if statement so that I could identify the real problem? Any ideas?
Thanks so much. I hope this is okay for a first question.

If you enable all the warnings from gcc, it will tell you at compile time that this is going to happen.

A short is 16 bits on typical platforms, so its range is -32768 to 32767. It can never be 40000 or 45000, so the compiler eliminated those branches as dead code (they can never be reached).
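To make this concrete, here is a minimal sketch (hypothetical file name, assuming a typical platform with a 16-bit short; the exact warning text varies between GCC versions):

// short_range.cc -- compile with: g++ -Wall -Wextra -c short_range.cc
short id = 0;

bool hits_upper_branch()
{
    // GCC typically warns here with something like
    // "comparison is always false due to limited range of data type"
    return id == 40000;   // 40000 does not fit in a signed 16-bit short
}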

GCC is an excellent optimizing compiler; however, even when error and warning information is enabled via -Werror, -Wall, etc., GCC still doesn't produce the same level of information a diagnostic compiler does. While developing the code I would recommend using Clang, a diagnostic compiler, to help in finding bugs and errors. Clang is intended to be compatible with GCC, and with the exception of some more esoteric features I have had no problem switching my CC between the two in my Makefile.
Being an optimizing compiler, GCC, I believe, enables dead-code elimination by default. This would cause all branches that the compiler detects are impossible, such as those outside the bounds of your id variable, to be eliminated. You might be able to disable that type of dead-code elimination.

Recall that during compilation, the C++ compiler looks through the code and determines which parts of the code execute in what order. If the compiler determines that a part of the code is never going to run, it is free to optimize it away.
For example:
int i = 0;
if( i == 1) {
    printf("This will never be printed\n");
}
Here, there is no reason to generate code for the body of the if-statement, as it will never execute.
This sort of instance would be picked up if you compile with:
g++ -Wall mycode.c
where -Wall enables a broad set of common warnings (despite the name, not literally all of them), and mycode.c is your source file.
As for execution, stepping through GDB shows the current flow of the program. If a branch (in an if-statement) is false, why would it ever go through that section of the code? You can only take one branch in an if-elseif-else statement.
I hope that helps you out.

My conclusion is the same as yours: it seems like it was optimized out even without "optimizations" turned on. Maybe these constant "always true"/"always false" predicates are used to skip the code directly in the code generation step, i.e. well before any -O switch optimizations are performed. Just a guess.

Related

can the return value from finish in gdb be different from the actual one in execution

I am a gdb novice, and I was trying to debug some GSSAPI code, and was using fin to see the return value from the frame. As seen in the snip pasted below, the call from gssint_mechglue_initialize_library() seems to be 0 but the actual check seems to fail. Can someone please point out if I am missing something obvious here?
Thanks in advance!
One possible explanation for the observed behavior is that you are debugging optimized code, and that line 1001 isn't really executed.
You can confirm this with a few nexts, or by executing fin again and observing whether GSS_S_COMPLETE or something else is returned from gssint_select_mech_type.
When optimization is on, code motion performed by the optimizer often prevents correct assignment of actual code sequences to line numbers (as instructions "belonging" to different lines are mixed and re-ordered). This often causes the code to "jump around" when e.g. doing nexti command.
For ease of debugging, recompile with -O0, or make sure to remove any -O2 and the like from your compile lines.
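For example (source and binary names hypothetical):

g++ -g -O0 myprog.cc -o myprog    # debug info, no optimization, so GDB's line info matches execution
gdb ./myprog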

c++ attempt to optimize code by replacing tests

I am looking at code that someone else wrote, and it has a lot of debug sections, of type
if(0) { code }
or if(1) { code }
or if(false) { code }
There is even
#if(0)
#endif
(which did not turn gray though - I thought that it should)
I was wondering: if I replace these with #if 0 (or #ifdef _DEBUG), is it possible to optimize the code, or will it make no difference?
I think that it may help, since I have seen code that is within these sections being grayed out - and I thought that this code is removed from the Release executable... Therefore making it faster. Is that true ?
The code that I am thinking of is inside functions that could be called lots of times...
Edit: The code I am referring to is being run millions of times. I am aware that the contents of the if(0) will be ignored...
I am also aware of the benefit of being able to easily debug an issue, by switching a test from 0 to 1...
My question was whether the fact that I am evaluating the if(0) test millions upon millions of times adds overhead... I am trying to figure out all the things that could make this code take fewer hours.
If the expressions placed inside those ifs are constant and determinable at compile time, then you may be almost sure that the compiler has already removed them from the code for you.
Of course, if you compile in debug mode, and/or have the optimization level set to zero, then the compiler may skip that and leave those tests in - but with plain zero/one/true/false values it is highly unlikely.
For compile-time-constant branches, you may be sure that the compiler removed the dead ones.
It is able to remove even complex-looking cases like:
const int x = 5;
if( 3 * x * x < 10 ) // ~ 75 < 10
{
doBlah(); // skipped
}
However, without that 'const' marker on x, the expression's value may not be determinable at compile time, and it may 'leak' into the actual final product.
Also, the value of the expression in the following code is not necessarily a compile-time constant:
const int x = aFunction();
if( 3 * x * x < 10 )   // not necessarily a compile-time constant
{
    doBlah();   // not necessarily skipped
}
x is a constant, but it is initialized with a value from a function, so it will most probably not be determinable at compile time. At runtime the function could return any value*), so the compiler must assume that x is unknown.
Therefore, if you have the possibility, use the preprocessor. In trivial cases that won't change much, because the compiler already knew the branch was dead. But cases are not always trivial, and you will notice the change very often. When the optimizer fails to deduce the values, it leaves the code in, even if it is dead. The preprocessor, on the other hand, is guaranteed to remove disabled sections before they get compiled and optimized. Also, using the preprocessor will at least speed up compilation: the compiler/optimizer will not have to trace constants, evaluate expressions, check branches, etc.
*) it is possible to write a method/function whose return value actually is determinable at the compilation and optimization phases: if the function is simple and if it gets inlined, its result value might be folded along with some branches. But even if you can somewhat rely on removing the if-0 clauses, you cannot rely on the inlining as much.
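A minimal sketch of that inlining case (doBlah() here is just a stand-in, and whether the branch is actually removed depends on the optimizer's inlining decisions):

#include <cstdio>

inline int five() { return 5; }         // trivial function, a likely inlining candidate

void doBlah() { std::puts("blah"); }    // stand-in for the doBlah() used above

void maybeBlah()
{
    if( 3 * five() * five() < 10 )      // ~ 75 < 10 once five() is inlined and folded
    {
        doBlah();                       // an optimizer can then drop this call as dead code
    }
}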
If you have code inside an if (0) block, the code generated by the compiler will be the same as if that block wasn't there on any reasonable compiler. The code will still be checked for compile-time errors. (Assuming you don't have any jump labels inside it or something weird like that.)
If you have code inside an if (1) block, the code generated by the compiler will be the same as if the code was just inside braces. It's a common way to give a block of code its own scope so that local variables are destructed where desired.
If you ifdef out code, then the compiler ignores it completely. The code can be completely nonsense, contain syntax errors, or whatever and the compiler will not care.
Typically, #if 0 is used to remove code whilst still keeping it around - for example, to easily compare two options, I sometimes do:
#if 1
some sort of code
#else
some other code
#endif
That way, I can quickly switch between the two alternatives.
In this case, the preprocessor will just leave one of the two options in the code.
The if(0) and if(1) constructs are similar - the compiler will pretty much remove the if, and in the case of 0 also remove the body of the if-statement.
I think it's rather sloppy to leave this sort of stuff in "completed" code, but it's very useful for debugging/development.
Say for example you are trying a new method for doing something that is much faster:
if (1)
{
    fast_function();
}
else
{
    slower_function();
}
Now, in one of your test cases, the result shows an error. So you want to quickly go back to slower_function and see if the result is the same or not. If it's the same, then you have to look at what else has changed since the tests last passed. If it's OK with slower_function, you go back and look at why fast_function() is not working as it should in this case.
It's true (depending on your build settings and preprocessor).
Putting debug code in #ifdef _DEBUG (or similar) is a standard way to keep these completely out of your release builds. Usually the debug build #defines it, and the release build does not.
Usually, though, a compiler should also remove code such as if (0), if given the proper optimization flags, but this puts extra work on the compiler, and on the programmer (now you have to go change them all!). I'd definitely leave this to the preprocessor.
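A minimal sketch of that pattern (MSVC's debug configuration conventionally defines _DEBUG; other toolchains may need you to define it yourself):

#include <cstdio>

void process(int value)
{
#ifdef _DEBUG
    std::printf("debug: value = %d\n", value);   // stripped by the preprocessor in release builds
#endif
    // ... normal processing ...
}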
You're correct. If you compile with #define DEBUG 0, then you will actually be removing all the #if DEBUG blocks at compile time. Hence, there will be a lot less code, and it will run faster.
Just make sure you build the release with DEBUG defined as 0.
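For instance (a sketch of the pattern this answer describes):

#include <cstdio>

#define DEBUG 0    // set to 1 for a debug build

void work()
{
#if DEBUG
    std::printf("entering work()\n");   // removed by the preprocessor when DEBUG is 0
#endif
    // ... real work ...
}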
A good optimizing compiler(GCC, MSVC) will remove if(0) and if(1) from code completely... the translation to machine code will NOT test for these conditions...

Visual Studio C++ compiler optimizations breaking code?

I've a peculiar issue here, which is happening both with VS2005 and 2010. I have a for loop in which an inline function is called, in essence something like this (C++, for illustrative purposes only):
inline double f(int a)
{
    if (a > 100)
    {
        // This is an error condition that shouldn't happen..
    }
    // Do something with a and return a double
}
And then the loop in another function:
for (int i = 0; i < 11; ++i)
{
    double b = f(i * 10);
}
Now what happens is that in the debug build everything works fine. In the release build with all the optimizations turned on, this is, according to the disassembly, compiled so that i is used directly without the * 10, and the comparison a > 100 turns into i > 9, while I guess it should be i > 10. Do you have any leads as to what might make the compiler think that i > 9 is the correct way? Interestingly, even a minor change (a debug printout, for example) in the surrounding code makes the compiler use i * 10 and compare that with the literal value of 100.
I know this is somewhat vague, but I'd be grateful for any old idea.
EDIT:
Here's a hopefully reproducible case. I don't consider it too big to be pasted here, so here goes:
__forceinline int get(int i)
{
    if (i > 600)
        __asm int 3;
    return i * 2;
}

int main()
{
    for (int i = 0; i < 38; ++i)
    {
        int j = (i < 4) ? 0 : get(i * 16);
    }
    return 0;
}
I tested this with VS2010 on my machine, and it seems to behave as badly as the original code I'm having problems with. I compiled and ran this with the IDE's default empty C++ project template, in release configuration. As you see, the break should never be hit (37 * 16 = 592). Note that removing the i < 4 makes this work, just like in the original code.
For anyone interested, it turned out to be a bug in the VS compiler. Confirmed by Microsoft and fixed in a service pack following the report.
First, it'd help if you could post enough code to allow us to reproduce the issue. Otherwise you're just asking for psychic debugging.
Second, it does occasionally happen that a compiler fails to generate valid code at the highest optimization levels, but more likely, you just have a bug somewhere in your code. If there is undefined behavior somewhere in your code, that means the assumptions made by the optimizer may not hold, and then the compiler can end up generating bad code.
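As a hedged illustration of that point (not the OP's code): signed integer overflow is undefined behaviour, so the optimizer is allowed to assume it never happens, and a function like this may behave differently at -O0 and -O2:

bool always_greater(int x)
{
    // Undefined when x == INT_MAX; an optimizer may fold the whole test to "true",
    // while an unoptimized build typically computes the wrapped-around value instead.
    return x + 1 > x;
}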
But without seeing your actual code, I can't really get any more specific.
The only famous bugs I know of with optimization (and only at the highest optimization level) are occasional changes to the order in which operations are evaluated (because the optimizer rearranges the computation while looking for the fastest way to compute it). You could look in this direction (and add parentheses even where they are not strictly necessary - more parentheses never hurt), but frankly, that kind of bug is quite rare.
As stated, it is difficult to have any precise idea without more code.
Firstly, inline assembly prevents certain optimizations; you should use the __debugbreak() intrinsic for int 3 breakpoints. The compiler sees the inlined function as having no effect other than the breakpoint, so it divides the 600 by 16 (note: this is affected by integer truncation), and the breakpoint then triggers for 38 > i >= 37. So it seems to work on this end.
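A sketch of the suggested change to the repro case (MSVC-specific; __debugbreak() is declared in <intrin.h>):

#include <intrin.h>

__forceinline int get(int i)
{
    if (i > 600)
        __debugbreak();   // intrinsic breakpoint instead of inline __asm int 3
    return i * 2;
}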

if - else vs if and returns revisited (not asking about multiple returns ok or not)

With regards this example from Code Complete:
Comparison Compare(int value1, int value2)
{
    if ( value1 < value2 )
        return Comparison_LessThan;
    else if ( value1 > value2 )
        return Comparison_GreaterThan;
    else
        return Comparison_Equal;
}
You could also write this as:
Comparison Compare(int value1, int value2)
{
    if ( value1 < value2 )
        return Comparison_LessThan;
    if ( value1 > value2 )
        return Comparison_GreaterThan;
    return Comparison_Equal;
}
Which is more optimal though? (readability, etc aside)
Readability aside, the compiler should be smart enough to generate identical code for both cases.
"Readability, etc aside" I'd expect the compiler to produce identical code from each of them.
You can test that though, if you like: your C++ compiler probably has an option to generate a listing file, so you can see the assembly/opcodes generated from each version ... or, you can see the assembly/opcodes by using your debugger to inspect the code (after you start the executable).
This will generate identical code in just about any compiler (GCC, Visual Studio, etc.). Compilers work on slightly different logic than we do: an if becomes an if-not, meaning that in both cases the code just falls through to that last return statement.
Edit:
More generally, the else statement is just there for the human; it doesn't generate anything extra on most compilers. This is true in your case and for just about anything else using the if...else construct.
The compiler generates identical code. One of the most basic things the compiler does is to build a control graph. Basically, "standing at node X, which nodes can I get to", and then it inserts jump statements for these reachable nodes.
And in your case, the control graph is exactly the same in both cases.
(Of course this is a gross simplification, and the compiler does a lot more before actually generating any actual code)
Readability is the correct answer. Any compiler will produce equivalent code to within a cycle or two, and an optimizer will have no problems parsing and sorting the control flow, either.
That's why readability is more important. Your cost of this code isn't just in writing it and compiling it today. It may have to be maintained in the future by you or someone else. You want your code to be readable so that the next maintainer will not have to waste a lot of time trying to understand it.
<underwear fabric="asbestos"> Not all coding style decisions should be made solely on "efficiency" or cycle count. </underwear> Should you write inefficient code? Of course not. But let the optimizer handle the tiny questions when it can. You're more valuable than that.
It will really depend on your compiler inferring what you are trying to do and placing the "jumps" accordingly (or not). Either way, it is trivial.
In case there is a return statement, there is no difference.
Using else in these cases may just stop you from checking the second condition when you enter the first if. But the performance difference should be really small, unless you have a condition that takes a long time to check.
The two code samples should compile identically on modern compilers, whether optimizations are turned on or off. The only chance you may encounter something different is if you use an old compiler that doesn't recognize that it's going to write inefficient code (most likely, unused code).
If you're worried about optimizations, you might consider taking a look at the algorithm being used.
Just run gcc -S to look at the generated assembler code; it should be identical. Anyway, you could answer this yourself by executing each version 1000000 times and measuring the execution time.
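For example (file names hypothetical), you could compile each version to assembly and diff the results:

g++ -O2 -S compare_if_else.cc -o if_else.s
g++ -O2 -S compare_early_return.cc -o early_return.s
diff if_else.s early_return.s     # expect no meaningful differences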

GCC: program doesn't work with compilation option -O3

I'm writing a C++ program that doesn't work (I get a segmentation fault) when I compile it with optimizations (options -O1, -O2, -O3, etc.), but it works just fine when I compile it without optimizations.
Is there any chance that the error is in my code? or should I assume that this is a bug in GCC?
My GCC version is 3.4.6.
Is there any known workaround for this kind of problem?
There is a big difference in speed between the optimized and unoptimized version of my program, so I really need to use optimizations.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        return distance(point,p1) < distance(point,p2) ;
    }
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        float d1=distance(point,p1) ;
        float d2=distance(point,p2) ;
        std::cout << "" ; //without this line, I get a segmentation fault anyways
        return d1 < d2 ;
    }
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.
Now that you posted the code fragment and a working workaround was found (@Windows programmer's answer), I can say that perhaps what you are looking for is -ffloat-store.
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
Source: http://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html
I would assume your code is wrong first.
Though it is hard to tell.
Does your code compile with 0 warnings?
g++ -Wall -Wextra -pedantic -ansi
Here's some code that seems to work, until you hit -O3...
#include <stdio.h>
int main()
{
    int i = 0, j = 1, k = 2;
    printf("%d %d %d\n", *(&j-1), *(&j), *(&j+1));
    return 0;
}
Without optimisations, I get "2 1 0"; with optimisations I get "40 1 2293680". Why? Because i and k got optimised out!
But I was taking the address of j and going out of the memory region allocated to j. That's not allowed by the standard. It's most likely that your problem is caused by a similar deviation from the standard.
I find valgrind is often helpful at times like these.
EDIT: Some commenters are under the impression that the standard allows arbitrary pointer arithmetic. It does not. Remember that some architectures have funny addressing schemes, alignment may be important, and you may get problems if you overflow certain registers!
The words of the [draft] standard, on adding/subtracting an integer to/from a pointer (emphasis added):
"If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
Seeing as &j doesn't even point to an array object, &j-1 and &j+1 can hardly point to part of the same array object. So simply evaluating &j+1 (let alone dereferencing it) is undefined behaviour.
On x86 we can be pretty confident that adding one to a pointer is fairly safe and just takes us to the next memory location. In the code above, the problem occurs when we make assumptions about what that memory contains, which of course the standard doesn't go near.
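For contrast, here is a minimal sketch where the same kind of arithmetic is well-defined, because the pointers stay inside a single array object:

#include <stdio.h>

int main()
{
    int a[3] = {0, 1, 2};
    // &a[1] - 1 and &a[1] + 1 both point into the same array, so this is defined.
    printf("%d %d %d\n", *(&a[1] - 1), *(&a[1]), *(&a[1] + 1));   // prints "0 1 2"
    return 0;
}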
As an experiment, try to see if this will force the compiler to round everything consistently.
volatile float d1=distance(point,p1) ;
volatile float d2=distance(point,p2) ;
return d1 < d2 ;
The error is in your code. It's likely you're doing something that invokes undefined behavior according to the C standard which just happens to work with no optimizations, but when GCC makes certain assumptions for performing its optimizations, the code breaks when those assumptions aren't true. Make sure to compile with the -Wall option, and the -Wextra might also be a good idea, and see if you get any warnings. You could also try -ansi or -pedantic, but those are likely to result in false positives.
You may be running into an aliasing problem (or it could be a million other things). Look up the -fstrict-aliasing option.
This kind of question is impossible to answer properly without more information.
It is very seldom the compiler's fault, but compilers do have bugs, and they often manifest themselves at different optimization levels (if there is a bug in an optimization pass, for example).
In general when reporting programming problems: provide a minimal code sample to demonstrate the issue, such that people can just save the code to a file, compile and run it. Make it as easy as possible to reproduce your problem.
Also, try different versions of GCC (compiling your own GCC is very easy, especially on Linux). If possible, try another compiler. Intel has a compiler which is more or less GCC compatible (and free for non-commercial use, I think). This will help pinpoint the problem.
It's almost (almost) never the compiler.
First, make sure you're compiling warning-free, with -Wall.
If that didn't give you a "eureka" moment, attach a debugger to the least optimized version of your executable that crashes and see what it's doing and where it goes.
5 will get you 10 that you've fixed the problem by this point.
I ran into the same problem a few days ago; in my case it was aliasing. And GCC handles it differently, but not wrongly, compared to other compilers. GCC has become what some might call a rules-lawyer of the C++ standard, and its implementation is correct, but you also have to be really correct in your C++, or it'll over-optimize some things, which is a pain. But you get speed, so you can't complain.
I expect to get some downvotes here after reading some of the comments, but in the console game programming world, it's rather common knowledge that the higher optimization levels can sometimes generate incorrect code in weird edge cases. It might very well be that edge cases can be fixed with subtle changes to the code, though.
Alright...
This is one of the weirdest problems I've ever had.
I don't think I have enough proof to state it's a GCC bug, but honestly... It really looks like one.
This is my original functor. The one that works fine with no levels of optimizations and throws a segmentation fault with any level of optimization:
struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        return distance(point,p1) < distance(point,p2) ;
    }
} ;
And this one works flawlessly with any level of optimization:
struct distanceToPointSort{
    indexedDocument* point ;
    distanceToPointSort(indexedDocument* p): point(p) {}
    bool operator() (indexedDocument* p1,indexedDocument* p2){
        float d1=distance(point,p1) ;
        float d2=distance(point,p2) ;
        std::cout << "" ; //without this line, I get a segmentation fault anyways
        return d1 < d2 ;
    }
} ;
Unfortunately, this problem is hard to reproduce because it happens with some specific values. I get the segmentation fault upon sorting just one out of more than a thousand vectors, so it really depends on the specific combination of values each vector has.
Wow, I didn't expect answers so quickly, and so many...
The error occurs upon sorting a std::vector of pointers using std::sort()
I provide the strict-weak-ordering functor.
But I know the functor I provide is correct because I've used it a lot and it works fine.
Plus, the error cannot be some invalid pointer in the vector, because the error occurs only when I sort the vector. If I iterate through the vector without applying std::sort first, the program works fine.
I just used GDB to try to find out what's going on. The error occurs when std::sort invokes my functor. Apparently std::sort is passing an invalid pointer to my functor. (Of course this happens with the optimized version only, at any level of optimization: -O, -O2, -O3.)
As others have pointed out, probably strict aliasing.
Turn it off (-fno-strict-aliasing) at -O3 and try again. My guess is that you are doing some pointer tricks in your functor (fast float-as-int compare? object type in the lower 2 bits?) that fail across inlined template functions.
Warnings do not help to catch this case ("if the compiler could detect all strict aliasing problems it could just as well avoid them"); just changing an unrelated line of code may make the problem appear or go away, as it changes register allocation.
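A minimal sketch of the kind of pointer trick being described (not the OP's code): reinterpreting a float's bits through an int* breaks the strict aliasing rules, so optimized builds may miscompile it, while copying the bytes with memcpy stays well-defined.

#include <cstring>

int float_bits_broken(float f)
{
    return *reinterpret_cast<int*>(&f);    // undefined behaviour under strict aliasing
}

int float_bits_safe(float f)
{
    int bits;
    std::memcpy(&bits, &f, sizeof bits);   // well-defined way to inspect the representation
    return bits;
}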
As the updated question will show ;) , the problem exists with a std::vector<T*>. One common error with vectors is reserve()ing what should have been resize()d. As a result, you'd be writing outside array bounds. An optimizer may discard those writes.
Post the code in distance! It probably does some pointer magic; see my previous post. Doing an intermediate assignment just hides the bug in your code by changing register allocation. Even more telling is that adding the output statement changes things!
The true answer is hidden somewhere inside all the comments in this thread. First of all: it is not a bug in the compiler.
The problem has to do with floating point precision. distanceToPointSort is a comparator that must never return true for both (a,b) and (b,a), but that is exactly what can happen when the compiler decides to use higher precision for some data paths. The problem is especially likely on, but by no means limited to, x86 without -mfpmath=sse. If the comparator behaves that way, the sort function can become confused, and the segmentation fault is not surprising.
I consider -ffloat-store the best solution here (already suggested by CesarB).
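For reference, a sketch of the two compiler-level mitigations mentioned in this thread (source file name hypothetical):

g++ -O3 -ffloat-store sorter.cc           # spill intermediates to memory, per the docs quoted above
g++ -O3 -msse2 -mfpmath=sse sorter.cc     # on x86, do float math in SSE registers instead of the 80-bit x87 stack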