How to debug a variable that is optimized away in Release build - c++

I am using VS2010. My debug version works fine, but my Release version kept crashing. So in Release mode I right-clicked the project, chose Debug, and then chose Start new instance. At this point I saw that an array I had declared as
int ma[4]= { 1,2,8,4};
never gets initialized. Any suggestions on what might be going on?

When you build in Release, the compiler performs many optimizations on your code. Many of these optimizations involve replacing variables with hard-coded values, when it is possible and correct to do so. For example, if you have something like:
int n = 42;
cout << "The answer is: " << n;
By the time the optimizer gets done with it, it will often look more like:
cout << "The answer is: " << 42;
...and the variable n is eliminated from your program completely. If you are stepping through a release version of this program and trying to examine the value of n, you may see very odd values or the debugger may report that n doesn't exist at all.
There are many other optimizations that can be applied which make debugging an optimized program quite difficult. Placing a breakpoint after the initialization of an array could yield very misleading information if the array was eliminated, or if its initialization was moved somewhere else.
Another common optimization is to eliminate unused variables, such as with:
int a = ma[0];
If there is no code in your program which actually uses a, the compiler will see that a is unneeded, and optimize it away so that it no longer exists.
In order to see the values that ma has been initialized with, the simplest somewhat reliable approach is so-called printf-style debugging:
cout << "ma values: ";
copy (ma, ma+4, ostream_iterator <int> (cout, ", "));
And see what is actually there.

If you debug a Release build, the debugger will report bogus values, or will not be able to display any values at all, for most of your variables. The safest way to check the value of a variable in a Release build is to use logging.
So most probably your array is initialized in Release just as in Debug, but you are not able to see that through the debugger. It seems you have some other problem that is causing the crash in Release. Look for another uninitialized variable, stack corruption, or an out-of-bounds index.
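For example, a minimal logging sketch (the file name and helper function are placeholders, not part of the question's code):
#include <fstream>
// Append the array's contents to a log file, so the values can be
// inspected without trusting the Release-build debugger.
void log_array(const char* name, const int* a, int n)
{
    std::ofstream log("debug.log", std::ios::app);
    log << name << ':';
    for (int i = 0; i < n; ++i)
        log << ' ' << a[i];
    log << '\n';
}
// usage: log_array("ma", ma, 4);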

Related

How to debug segmentation fault?

It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i) {
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over, and there was no segfault.
With a little more experimentation, I eventually created a string stream before the loop and wrote an empty string into it for each iteration of the loop, and there was no segfault.
I tried some other assorted operations to figure out what was going on. I tried setting a variable j = i; and things like that, but I haven't found anything that works.
Running valgrind, the only information I got on the segfault was that it was a "General Protection Fault" and something about the default response to signal 11. It also mentions a "Conditional jump or move depends on uninitialised value(s)" warning, but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be difficult to find just by looking at your code snippet, as it is most likely the side effect of something else bad that has already happened.
However, since the problem is reproducible, you can run your program (a.out) under Valgrind and have it attach a debugger for you:
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind will drop your program into the debugger (GDB) when the first memory error is detected, so that you can debug it live. This should be the best way to understand and resolve your problem.
Once you have figured out your first error, fix it, rerun, and see what other errors you get. Repeat until Valgrind reports no more errors.
However, you should avoid using raw pointers in modern C++ programs and start using std::vector or std::unique_ptr, as suggested by others as well.
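For instance, a minimal sketch of that suggestion (entry_count and its value are hypothetical here):
#include <cstddef>
#include <numeric>
#include <vector>
int main()
{
    const std::size_t entry_count = 1000;       // hypothetical size
    std::vector<int> sorted_array(entry_count); // allocation and lifetime handled for you
    std::iota(sorted_array.begin(), sorted_array.end(), 0); // sorted_array[i] = i
}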
Valgrind and GDB are very useful.
The one I used most recently was GDB; I like it because it showed me the exact line number the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons on Google! Just search debugging Segmentation Faults with GDB!
Good luck :)
That is hard. I used Valgrind to debug segfaults, and it usually pointed at the violations.
Likely your problem is memory that you are writing to after it has been freed, i.e. sorted_array goes out of scope or the underlying buffer gets freed.
Adding more code hides this problem because the data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to memory addresses that were multiples of four bytes. Before the loop I computed the offset and shifted the array up by that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4; // bytes to the next 4-byte boundary
memory += offset;
After doing this everything behaved as expected again.
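A self-contained sketch of the same idea, assuming memory is a char* into a malloc'd block with a few spare bytes for the padding:
#include <cstddef>
#include <cstdint>
#include <cstdlib>
int main()
{
    char* memory = static_cast<char*>(std::malloc(256 * sizeof(int) + 4));
    // Round the pointer up to the next 4-byte boundary.
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(memory);
    std::size_t offset = (4 - addr % 4) % 4;
    int* sorted_array = reinterpret_cast<int*>(memory + offset);
    for (int i = 0; i < 256; ++i)
        sorted_array[i] = i; // every write is now 4-byte aligned
    std::free(memory);
}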

Can two doubles be equal and not equal at the same time?

I have a very strange bug in my program. I was not able to isolate the error in reproducible code, but at a certain place in my code there is:
double distance, criticalDistance;
...
if (distance > criticalDistance)
{
    std::cout << "first branch" << std::endl;
}
if (distance == criticalDistance)
{
    std::cout << "second branch" << std::endl;
}
In debug build everything is fine. Only one branch gets executed.
But in release build all hell breaks loose and sometimes both branches get executed.
This is very strange, since if I add the else conditional:
if (distance > criticalDistance)
{
    std::cout << "first branch" << std::endl;
}
else if (distance == criticalDistance)
{
    std::cout << "second branch" << std::endl;
}
This does not happen.
Please, what can be the cause of this? I am using gcc 4.8.1 on Ubuntu 13.10 on a 32 bit computer.
EDIT1:
I am using the compiler flags
-std=gnu++11
-gdwarf-3
EDIT2:
I do not think this is caused by a memory leak. I analyzed both Release and Debug builds with the Valgrind memory analyzer, with tracking of uninitialized memory and detection of self-modifying code, and found no errors.
EDIT3:
Changing the declaration to
volatile double distance, criticalDistance;
makes the problem go away. Does this confirm woolstar's answer? Is this a compiler bug?
EDIT4:
Using the gcc option -ffloat-store also fixes the problem. If I understand correctly, this is caused by gcc.
if (distance > criticalDistance)
// true
if (distance == criticalDistance)
// also true
I have seen this behavior before in my own code. It is due to the mismatch between the standard 64-bit value stored in memory and the 80-bit internal values that Intel processors use for floating point calculation.
Basically, when truncated to 64 bits, your values are equal, but at 80 bits one is slightly larger than the other. In DEBUG mode the values are always stored to memory and then reloaded, so they are always truncated. In optimized mode, the compiler reuses the value in the floating point register and it doesn't get truncated.
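A minimal sketch of the effect, assuming x87 code generation (32-bit x86 without SSE): pushing a value through a volatile forces the 64-bit store.
#include <iostream>
// Pushes a value through memory, discarding any x87 excess precision.
inline double force_store(double x)
{
    volatile double tmp = x; // the store cannot be optimized away
    return tmp;
}
int main()
{
    double d = 1.0;
    for (int i = 0; i < 10; ++i) d /= 3.0; // some computed value
    double inRegister = d * d;             // may carry 80-bit precision
    double inMemory = force_store(d * d);  // definitely rounded to 64 bits
    // On x87, without -ffloat-store, these can compare unequal even though
    // both came from the same expression.
    std::cout << (inRegister == inMemory ? "equal" : "not equal") << '\n';
}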
Please, what can be the cause of this?
Undefined behavior, aka. bugs in your code.
There is no IEEE floating point value which exhibits this behavior. So what's happening is that you are doing something wrong, which violates an assumption made by your compiler.
When optimizing your code, the compiler assumes that your code can be described by the C++ standard. If you do anything that is left undefined by the C++ standard, then these assumptions are violated, resulting in "weird" execution. It could be something "simple" like an uninitialized variable or a buffer overrun resulting in parts of the stack or heap being overwritten with garbage data, or it could be something more subtle, where you rely on a specific ordering between two operations, which is not guaranteed by the standard.
That is probably why you were not able to reproduce the problem in a small test case (the smaller test code does not contain the erroneous code), and why you only see the error in optimized builds.
Of course, it is also possible that you've stumbled across a compiler bug, but a bug in your code is quite a bit more likely. :)
And best of all, it means that we don't really have a chance to debug the problem from the code snippet you've shown. We can say "the code shouldn't behave like that", but that's about all.
You are not initializing your doubles; are you sure they always get a value?
I have found that uninitialized variables in debug are always 0, but in release they can be pretty much anything.

Debugging problems with Visual Studio 2010 C++ - Odd values showing up

I am encountering some really odd behavior with my code.
It throws an access violation error when I run it, and through the debugger I managed to find the line that is causing it:
for (int x = 0; x < width; x++) {
    int current = edgeKernelXY(left + x, top + y, true, 0);
    ....
}
I placed a breakpoint in the method edgeKernelXY and the code never even got into the method.
The next thing I checked was the values I am passing into it. left, top and y seem normal enough. However, according to the debugger, x = 19840810 and current = 19840810. I don't understand how this could happen, especially since I declared x to be 0 at the beginning of the loop. width is correct at 40.
x and current are not declared anywhere else in the scope of the for loop. What could be going wrong here?
EDIT:
I changed the code as follows:
for (int x = 0; x < width; x++) {
    int current = edgeKernelXY(left, top + y, true, 0);
    if (current > THRESHOLD &&
        edgeKernelXY(left + x, top + y, true, 1) > THRESHOLD &&
        edgeKernelXY(left + x, top + y, true, 2) > THRESHOLD) {
    } else {
        current = 0.0f;
    }
}
Specifically, I changed left+x in the first call to edgeKernelXY to just left. This seems to run, and the second call to edgeKernelXY shows x set correctly to 0. However, the behavior is not what I want. left + x still gets me crazy values for x, which causes an access violation.
for (int x = 0; x < width; x++) {
    int current;
    current = 0;
    current = edgeKernelXY(left + x, top + y, true, 0);
Also shows problems with current.
Since you're debugging a release build, you can't depend on the debugger showing the right values in watch windows, etc. Debugging release mode can be done, but is considerably trickier. Generally it requires looking at the disassembly to understand what's going on and what the state of things is (i.e., x is almost certainly being kept in a register).
If you can't do full debug-build debugging, you can still get a much better debugging experience by building with optimizations turned off (which can be done in Release builds, and can be done without linking to the debug libraries).
Assuming that you're building in the IDE, just go into the project settings and change the C/C++ - Optimization - Optimization value to Disabled (/Od).
If setting it project-wide causes a problem (which it shouldn't), you can do the same on a per-source-file basis by right-clicking the source files you're interested in debugging and setting the option in the properties for that file. Just remember to clear those settings when you're done (and don't check them into source control), because the IDE doesn't make it obvious that there are source-file-specific settings, so it can be confusing down the road.
A few things to check:
Are there any double deletes in your program which might be corrupting the heap? (I know that doesn't explain the issue with x on the stack, but it is something to check nevertheless.)
Are there any statements between the start of the for loop and the call to edgeKernelXY? If so, check those for potential stack overflows.
The value of x being wrong smells like stack corruption to me, which can happen for multiple reasons:
Is there an array created at the beginning of the function which overran the stack?
Could there be a function call that returned a pointer to an array (on its stack) which you are using on the assumption that the data is still available?
None of the above will give you the exact answer you are looking for, but they are options to check. Most importantly: have you run your program through a memory profiler (like Valgrind) and checked the output? More often than not, that will show the cause of this problem.
If you do find the cause of the problem, please post the approach you followed. It will make the community's experience richer. Thanks.

Uninitialized int value always the same (C++)

Given this code:
void main()
{
    int x;
    cout << x;
    system("pause");
}
When I debug this piece of code, it always prints -858993460. I read that it's because VS sets this as the default value for uninitialized vars. But I also read that in release mode this should get a random value. But every time I run this code in release mode I get 1772893972, which is not changing, so it's not random. What is this? Why do I get this value?
Your confusion is in the assumption that "in release mode, this should get a random value." That is not true.
An uninitialized variable gets an "undefined" value. It could be random, but it doesn't have to be.
If you want x to have a random value, then use rand().
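A minimal sketch of that suggestion, seeding once so the value actually varies between runs:
#include <cstdlib>
#include <ctime>
#include <iostream>
int main()
{
    std::srand(static_cast<unsigned>(std::time(0))); // seed from the clock
    int x = std::rand();                             // x now varies per run
    std::cout << x;
    std::system("pause");
}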
main is not the real entry point of the executable; in general the real entry point is taken over by the runtime library (and in VC++ it definitely is), which performs some CRT initialization tasks and then calls your main. That value is probably a leftover from one of the function calls performed by the initialization code; the difference between the Debug and Release builds is probably due to different initialization/stack management between the two configurations. By the way, it is just chance that such values are always the same; they probably come from some parameter/variable that happens to assume the same value every time.
If it's not that, it is probably residue from some other initialization task internal to your process. It is not data from other processes, or data that just "happened" to be at that spot in physical memory, since Windows (on which your application is running) never hands out memory pages that belonged to other processes without blanking them first.
Still, keep in mind that, as far as the standard is concerned, uninitialized variables have "indeterminate initial value" (§3.3.1 ¶9), so you should not rely on the values you may get by reading uninitialized variables. If you need random numbers, use the appropriate library functions.
I was forgetting... void main is not valid C++, it should be int main (§3.6.1 ¶2, "It shall have a return type of type int").
Interestingly, your DEBUG value in hex is 0xFFFFFFFFCCCCCCCC, while your RELEASE value in hex is just random. It could be that the debug compile adds a stack scribbler to make sure your uninitialized values are not sane (like 0), so that they are quickly noticed.

How to correctly benchmark a [templated] C++ program

<background>
I'm at a point where I really need to optimize C++ code. I'm writing a library for molecular simulations and I need to add a new feature. I already tried to add this feature in the past, but I then used virtual functions called in nested loops. I had bad feelings about that and the first implementation proved that this was a bad idea. However this was OK for testing the concept.
</background>
Now I need this feature to be as fast as possible (well, without assembly code or GPU calculation; this still has to be C++, and more readable rather than less).
Now I know a little bit more about templates and class policies (from Alexandrescu's excellent book) and I think that a compile-time code generation may be the solution.
However I need to test the design before doing the huge work of implementing it into the library. The question is about the best way to test the efficiency of this new feature.
Obviously I need to turn optimizations on, because without them g++ (and probably other compilers as well) would keep unnecessary operations in the object code. I also need to make heavy use of the new feature in the benchmark, because a delta of 1e-3 seconds can make the difference between a good and a bad design (this feature will be called millions of times in the real program).
The problem is that g++ is sometimes "too smart" while optimizing and can remove a whole loop if it considers that the result of a calculation is never used. I've already seen that once when looking at the output assembly code.
If I add some printing to stdout, the compiler will then be forced to do the calculation in the loop but I will probably mostly benchmark the iostream implementation.
So how can I do a correct benchmark of a little feature extracted from a library?
Related question: is it a correct approach to do this kind of in vitro test on a small unit, or do I need the whole context?
Thanks for the advice!
There seem to be several strategies, from compiler-specific options allowing fine tuning to more general solutions that should work with every compiler like volatile or extern.
I think I will try all of these.
Thanks a lot for all your answers!
If you want to force any compiler to not discard a result, have it write the result to a volatile object. That operation cannot be optimized out, by definition.
template<typename T> void sink(T const& t) {
    volatile T sinkhole = t;
}
No iostream overhead, just a copy that has to remain in the generated code.
Now, if you're collecting results from a lot of operations, it's best not to discard them one by one; those copies can still add some overhead. Instead, collect all results in a single non-volatile object (so all individual results are needed) and then assign that object to a volatile. E.g. if your individual operations all produce strings, you can force evaluation by adding all char values together modulo 1<<32. This adds hardly any overhead; the strings will likely be in cache. The result of the addition is then assigned to a volatile, so each char in each string must in fact be calculated; no shortcuts allowed.
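A usage sketch of that approach (make_string is a stand-in for whatever operation is being benchmarked):
#include <string>
template<typename T> void sink(T const& t) {
    volatile T sinkhole = t;
}
std::string make_string(int i) { return std::string(i % 16 + 1, 'x'); } // stand-in
void benchmark()
{
    unsigned sum = 0; // one non-volatile accumulator for all results
    for (int i = 0; i < 1000000; ++i) {
        std::string s = make_string(i); // operation under test
        for (std::string::size_type j = 0; j < s.size(); ++j)
            sum += static_cast<unsigned char>(s[j]); // wraps mod 2^32
    }
    sink(sum); // a single volatile store: every char above must be computed
}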
Unless you have a really aggressive compiler (it can happen), I'd suggest calculating a checksum (simply add all the results together) and outputting the checksum.
Other than that, you might want to look at the generated assembly code before running any benchmarks so you can visually verify that any loops are actually being run.
Compilers are only allowed to eliminate code branches that cannot happen. As long as the compiler cannot rule out that a branch will be executed, it will not eliminate it. As long as there is some data dependency somewhere, the code will be there and will be run. Compilers are not too smart about estimating which aspects of a program will never run, and don't try to be, because in general that is undecidable. They have some simple checks, such as for if (0), but that's about it.
My humble opinion is that you were possibly hit by some other problem earlier on, such as the way C/C++ evaluates boolean expressions.
But anyway, since this is about a test of speed, you can check for yourself that things get called: run it once plain, then another time with a test of return values, or with a static variable being incremented. At the end of the test, print out the number generated; the results of the two runs should be equal.
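A sketch of that sanity check (operation is a hypothetical stand-in):
static long callCount = 0; // a side effect the optimizer must preserve
int operation(int x)
{
    ++callCount;
    return x * x; // stand-in for the real work
}
// After the timed loop, print callCount; if it matches the expected number
// of iterations, the loop body really ran.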
To answer your question about in vitro testing: yes, do that; if your app is so time-critical, do that. On the other hand, your description hints at a different problem: if your deltas are in a timeframe of 1e-3 seconds, then that sounds like a problem of computational complexity, since the method in question must be called very, very often (for few runs, 1e-3 seconds is negligible).
The problem domain you are modeling sounds VERY complex and the datasets are probably huge. Such things are always an interesting effort. Make sure that you absolutely have the right data structures and algorithms first, though, and micro-optimize all you want after that. So, I'd say look at the whole context first. ;-)
Out of curiosity, what is the problem you are calculating?
You have a lot of control on the optimizations for your compilation. -O1, -O2, and so on are just aliases for a bunch of switches.
From the man pages
-O2 turns on all optimization flags specified by -O. It also turns
on the following optimization flags: -fthread-jumps -falign-functions
-falign-jumps -falign-loops -falign-labels -fcaller-saves
-fcrossjumping -fcse-follow-jumps -fcse-skip-blocks
-fdelete-null-pointer-checks -fexpensive-optimizations -fgcse
-fgcse-lm -foptimize-sibling-calls -fpeephole2 -fregmove
-freorder-blocks -freorder-functions -frerun-cse-after-loop
-fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2
-fstrict-aliasing -fstrict-overflow -ftree-pre -ftree-vrp
You can tweak and use this command to help you narrow down which options to investigate.
...
Alternatively you can discover which binary optimizations are
enabled by -O3 by using:
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled
Once you find the culprit optimization you shouldn't need the couts.
If this is possible for you, you might try splitting your code into:
the library you want to test, compiled with all optimizations turned on
a test program, dynamically linking the library, with optimizations turned off
Otherwise, you might specify a different optimization level (it looks like you're using gcc...) for the test function with the optimize attribute (see http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes).
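For example, a sketch using GCC's optimize function attribute (available in newer GCC releases; treat the exact spelling as an assumption for your version):
// Compile just this function at -O0 while the rest of the file stays optimized.
__attribute__((optimize("O0")))
void test_driver()
{
    // ... call the optimized library code from here ...
}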
You could create a dummy function in a separate cpp file that does nothing, but takes as its argument whatever the type of your calculation result is. Then you can call that function with the result of your calculation, forcing gcc to generate the intermediate code, and the only penalty is the cost of the function call (which shouldn't skew your results unless you call it a lot!).
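A sketch of that idea (the file names and the consume function are placeholders):
// sink.cpp -- a separate translation unit, so gcc cannot see the empty body
void consume(double) {}

// benchmark.cpp
void consume(double); // declaration only: the result must really be computed
void run()
{
    double result = 0.0;
    for (int i = 0; i < 1000000; ++i)
        result += i * 0.5; // stand-in for the real calculation
    consume(result);       // keeps the loop above from being eliminated
}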
#include <iostream>

// Mark coords as extern.
// The compiler is now NOT allowed to optimise away coords,
// thus it cannot remove the loop where you initialise it.
// This is because the code could be used by another compilation unit.
extern double coords[500][3];
double coords[500][3];

int main()
{
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 500; ++i)
    {
        coords[i][0] = 3.23;
        coords[i][1] = 1.345;
        coords[i][2] = 123.998;
    }
    std::cout << "hello world !" << std::endl;
    return 0;
}
edit: the easiest thing you can do is simply use the data in some spurious way after the function has run and outside your benchmarks. Like,
StartBenchmarking(); // i.e., read a performance counter
for (int i = 0; i < 500; ++i)
{
    coords[i][0] = 3.23;
    coords[i][1] = 1.345;
    coords[i][2] = 123.998;
}
StopBenchmarking(); // what comes after this won't go into the timer

// this is just to force the compiler to use coords
double foo = 0.0; // the accumulator must start at zero
for (int j = 0; j < 500; ++j)
{
    foo += coords[j][0] + coords[j][1] + coords[j][2];
}
std::cout << foo;
What sometimes works for me in these cases is to hide the in vitro test inside a function and pass the benchmark data sets through volatile pointers. This tells the compiler that it must not collapse subsequent writes to those pointers (because they might be, e.g., memory-mapped I/O). So:
void test1( volatile double *coords )
{
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 1500; i += 3)
    {
        coords[i + 0] = 3.23;
        coords[i + 1] = 1.345;
        coords[i + 2] = 123.998;
    }
}
For some reason I haven't figured out yet, it doesn't always work in MSVC, but it often does; look at the assembly output to be sure. Also remember that volatile will foil some compiler optimizations (it forbids the compiler from keeping the pointed-to data in a register and forces writes to occur in program order), so this is only trustworthy if you're using it for the final write-out of data.
In general in vitro testing like this is very useful so long as you remember that it is not the whole story. I usually test my new math routines in isolation like this so that I can quickly iterate on just the cache and pipeline characteristics of my algorithm on consistent data.
The difference between test-tube profiling like this and running it in "the real world" means you will get wildly varying input data sets (sometimes best case, sometimes worst case, sometimes pathological), the cache will be in some unknown state on entering the function, and you may have other threads banging on the bus; so you should run some benchmarks on this function in vivo as well when you are finished.
I don't know if GCC has a similar feature, but with VC++ you can use:
#pragma optimize
to selectively turn optimizations on/off. If GCC has similar capabilities, you could build with full optimization and just turn it off where necessary to make sure your code gets called.
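A sketch of the VC++ pragma (an empty string means all optimization options; it applies to functions defined after it):
#pragma optimize("", off)
void not_optimized()
{
    // ... compiled without optimizations ...
}
#pragma optimize("", on)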
Just a small example of an unwanted optimization:
#include <vector>
#include <iostream>
using namespace std;
int main()
{
    double coords[500][3];
    // perform a simple initialization of all coordinates:
    for (int i = 0; i < 500; ++i)
    {
        coords[i][0] = 3.23;
        coords[i][1] = 1.345;
        coords[i][2] = 123.998;
    }
    cout << "hello world !" << endl;
    return 0;
}
If you comment out the code from "double coords[500][3]" to the end of the for loop, it will generate exactly the same assembly code (just tried with g++ 4.3.2). I know this example is far too simple, and I wasn't able to show this behavior with a std::vector of a simple "Coordinates" structure.
However, I think this example still shows that some optimizations can introduce errors in a benchmark, and I wanted to avoid surprises of this kind when introducing new code into a library. It's easy to imagine that the new context might prevent some optimizations and lead to a very inefficient library.
The same should also apply to virtual functions (but I don't prove it here). Used in a context where a static call would do the job, I'm pretty confident that decent compilers would eliminate the extra indirection of the virtual call. I could try such a call in a loop and conclude that calling a virtual function is not such a big deal.
Then I would call it hundreds of thousands of times in a context where the compiler cannot guess the exact type of the pointer, and see a 20% increase in running time...
At startup, read from a file. In your code, say if (input == "x") cout << result_of_benchmark;
The compiler will not be able to eliminate the calculation, and if you ensure the input is never "x", you won't benchmark the iostream.
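A sketch of that trick (the guard file name is a placeholder, and run_benchmark stands in for the real calculation):
#include <fstream>
#include <iostream>
#include <string>
double run_benchmark() // stand-in for the calculation being timed
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum += i * 0.5;
    return sum;
}
int main()
{
    std::string input;
    std::ifstream in("guard.txt");
    in >> input;                 // value unknown at compile time
    double result = run_benchmark();
    if (input == "x")            // never true if the file never says "x",
        std::cout << result;     // but the compiler cannot prove that
    return 0;
}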