In the following case I'm calling a Func with pointer passed to it, but in the called function, the parameter shows the pointer value as something totally bogus. Something like below.
bool flag = Func(pfspara);--> pfspara = 0x0091d910
bool Func(PFSPARA pfspara) --> pfspara = 0x00000005
{
return false;
}
Why does pfspara change to some bogus pointer? I can't reproduce the problem in debug, only in production.
Thanks.
If you are trying to debug optimized code in for example Visual Studio, you cannot always rely on the debugger properly showing the values of variables - especially not if the variable is unused so that the compiler probably optimizes it away.
Try running this instead:
bool Func(PFSPARA pfspara)
{
printf("%x\n", pfspara);
return false;
}
In general, this should never happen. Problems that can cause this type of symptoms include incompatibility in the compilation options between the calling and the caller modules, bad casting of member function pointers, or simply compiler bugs.
You need to provide a lot more details about your problem: Show the real code, specify your compiler, specify what are the debug vs. production compilation flags, etc.
In addition to Rasmus' comments, I find it is generally worth checking whether the problem occurs in a debug build as well as the release build. If you see genuine problems occurring in a release build but not in the debug build, it is often down to a bug that is exposed by the optimization processs, such as an uninitialised variable. There is a real danger in doing most of your testing in a debug build, to avoid the problem you are seeing here, and then shipping a release build. IMO, if you don't have a good regression test suite (preferably automated) I would avoid shipping opimized code.
It sounds like a buffer overflow problem to me -- something is overwriting that variable. But as mentioned in other answers, there's no way to tell for sure without some actual code to work with.
It sounds to me like you're scribbling on the stack... somewhere in your code a buffer on the stack is overflowing, or you're taking the address of an object on the stack and writing to it after the function returns. This is causing your stack to be corrupted.
It may only happen in release mode because the stack allocations are different due to optimization and exclusion of 'guard' blocks used to help check for this kind of condition.
Related
I dislike pointers, and generally try to write as much code as I can using refs instead.
I've written a very rudimentary "vertical layout" system for a small Win32 app. Most of the Layout methods look like this:
void Control::DoLayout(int availableWidth, int &consumedYAmt)
{
textYPosition = consumedYAmt;
consumedYAmt += measureText(font, availableWidth);
}
They are looped through like so:
int innerYValue = 0;
foreach(control in controls) {
control->DoLayout(availableWidth, innerYValue);
}
int heightOfControl = innerYValue;
It's not drawing its content here, just calculating exactly how much space this control will require (usually it's adding padding too, etc). This has worked great for me.......in debug mode.
I found that in Release mode, I could suddenly see tangible, loggable issues where, when I'm looping through controls and calling DoLayout(), the consumedYAmt variable actually stays at 0 in the outside loop. The most annoying part is that if I put in breakpoints and walk through the code line by line, this stops happening and parts of it are properly updated by the inside "add" methods.
I'm kind of thinking about whether this would be some compiler optimization where they think I'm simply adding the ref flag to ints as a way to optimize memory; or if there's any possibility this actually works in a way different from how it seems.
I would give a minimum reproducible example, but I wasn't able to do so with a tiny commandline app. I get the sense that if this is an optimization, it only kicks in for larger code blocks and indirections.
EDIT: Again sorry for generally low information, but I'm now getting hints that this might be some kind of linker issue. I skipped one part of the inheritance model in my pseudocode: The calling class actually calls "Layout()", which is a non-virtual function on the root definition of the class. This function performs some implementation-neutral logic, and then calls DoLayout() with the same arguments. However, I'm now noticing that if I try adding a breakpoint to Layout(), Visual Studio claims that "The breakpoint will not be hit. No executable code of the debugger's target code type is associated with this line." I am able to add breakpoints to certain other lines, but I'm beginning to notice weird stepping logic where it refuses to go inside certain functions, like Layout. Already tried completely clearing the build folders and rebuilding. I'm going to have to keep looking, since I have to admit this isn't a lot to go on.
Also, random addition: The "controls" list is a vector containing shared_ptr objects. I hadn't suspected the looping mechanism previously but now I'm looking more closely.
"the consumedYAmt variable actually stays at 0"
The behavior you describe is typical for a specific optimization that's more due to the CPU than the compiler. I suspect you're logging consumedYAmt from another thread. The updates to consumedYAmt simply don't make it to that other thread.
This is legal for the CPU, because the C++ compiler didn't put in memory fences. And the CPU compiler didn't put in fences because the variable isn't atomic.
In a small program without threads, this simply doesn't show up, nor does it show in debug mode.
Written by OP
Okay, eventually figured this one out. As simple as the issue was, pinning it down became difficult because of Release mode's debugger seemingly acting in inconsistent ways. When I changed tactic to adding Logging statements in lots of places, I found that my Control class had an "mShowing" variable that was uninitialized in its constructor. In debug mode, it apparently retained uninitialized memory which I guess made it "true" - but in release mode, my best analysis is that memory protections made it default to "false", which as it turns out skipped the main body of the DoLayout method most of the time.
Since through the process, responders were low on information to work with (certainly could've been easier if I posted a longer example), I instead simply upvoted each comment that mentioned uninitialized variables.
I'm currently dealing with one of the strangest bugs I have ever seen. I have this "else if" statement and inside the else-if I have a line of code that is causing a bug to happen elsewhere (my program is kind of complicated so I don't think it would help to post a short snippet of code here because it would be too difficult to explain -- so I apologize in advance if this post seems rather vague).
The issue is that the line of code that is causing the bug is not being called at all. I put a break point at the line and also put a print statement before it but the program never enters that particular "if-else" statement. The bug goes away when I comment out the line and shows up again when I uncomment it. This leads me to believe that the line must be getting called somehow but my break point and prints suggest otherwise.
Has anyone ever heard of something like this happening? Why would a line of code that is not even being called affect the rest of my program? Are there other ways to detect if the line is being called somehow besides using breakpoints and print statements?
I'm using XCode as my IDE and this is a single threaded program (so it's not some weird asynchronous bug)
PROBLEM HAS BEEN SOLVED. SEE TOP ANSWER
It actually may happen in some cases indeed and I already saw it before. If the bug is some slight buffer overflow then the presence/abasence of that line may make the compiler differently optimize the memory layout (ie. not allocate some variables or arrange them in a different way or place segments differently for example) that will by chance not trigger the problem anymore.
The same applies if the bug is a strange race condition: the lack of that line may change slightly the timings (due to differently optimized code) and make the bug come out.
Very long shot: that code may even somehow trigger a compiler bug. But this may be less the case, but it may.
So: yes it's definitely possible and I already saw it. And if something like this is happening to you and you're 100% sure your code is correct then be very careful since something quite nasty may be hiding in the code.
Not that there is enough information here to really answer this question, but perhaps I can try to provide some pointers.
Optimization. Try turning it off if it is turned on at all. When the compiler is optimizing your code, it may make different decisions when a statement is present vs. when it isn't. Whenever I am dealing with bugs that go away when a a seemingly unrelated code construct is changed, it is usually that the optimizer is doing something differently. I suppose a very simple example would be if the statement accesses something that the compiler would otherwise think is never accessed and can be folded away. The "unrelated" line may take an address of something which affects aliasing information, etc. All this isn't to say that the optimizer is wrong, your code likely still does have a bug, but it just explains the weird behaviour.
Debugging. It is very unlikely that the bug is in this line that is not reached. What I would focus on is setting watch points for the variables that are getting incorrect values (assuming you can narrow it down to what is receiving the wrong value).
These very weird issues often indicate an uninitialized variable or something along those lines. If the underlying compiler supports it, you can try providing an option that will cause the program to trap when an uninitialized memory location is being accessed.
Finally, this could be an indication that something is overwriting areas of the stack (and perhaps less likely other memory areas - heap, data). If you have anything that is writing to arrays allocated on the stack, check if you're walking past the end of the array.
Make sure you're using { and } around the bodies in your if() and else clauses. Without them it's easy to wind up with something like this:
if (a)
do_a_work();
if (b)
do_a_and_b_work();
else
do_not_a_work();
Which actually equates to this (and is not what's implied by the indenting):
if (a) {
do_a_work();
}
if (b) {
do_a_and_b_work();
} else {
do_not_a_work();
}
because the brace-less if (a) only takes the first statement below it for its body.
Commenting out your mystery line may be changing which code belongs to which if-else.
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Common reasons for bugs in release version not present in debug mode
Sometimes I encouter such strange situations that the program run incorrectly while running normally and it will pop-up the termination dialog,but correctly while debugging.This do make me frustrated when I want to use debugger to find the bug inside my code.
Have you ever met this kind of situation and why?
Update:
To prove there are logic reasons that will led such a frustrating situation:
I think one big possibility is heap access volidation. I once wrote a function that allocate a small buffer, but later I step out the boudary. It will run correctly within gdb, cdb, etc (I do not know why, but it do run correctly); but terminate abnormally while running normally.
I am using C++.
I do not think my problem duplicate the above one.
That one is comparision between release mode and debug mode,but mine is between debugging and not debugging,which have a word heisenbug, as many other noted.
thanks.
You have a heisenbug.
Debugger might be initializing values
Some environments initialize variables and/or memory to known values like zero in debug builds but not release builds.
Release might be built with optimizations
Modern compilers are good, but it could hypothetically happen that optimized code functions differently than non-optimized code. Edit: These days, compiler bugs are rare. If you find yourself thinking you have one, exhaust all other ideas first.
There can be other reasons for heisenbugs.
Here's a common gotcha that can lead to a Heisenbug (love that name!):
// Sanity check - this should never fail
ASSERT( ReleaseResources() == SUCCESS);
In a debug build, this will work as expected, but the ASSERT macro's argument is ignored in a release build. By ignored, I mean that not only won't the result be reported, but the expression won't be evaluated at all (i.e. ReleaseResources() won't be called).
This is a common mistake, and it's why the Windows SDK defines a VERIFY() macro in addition to the ASSERT() macro. They both generate an assertion dialog at runtime in a debug build if the argument evaluates to false. Their behavior is different for a release build, however. Here's the difference:
ASSERT( foo() == true ); // Confirm that call to foo() was successful
VERIFY( bar() == true ); // Confirm that call to bar() was successful
In a debug build, the above two macros behave identically. In a release build, however, they are essentially equivalent to:
; // Confirm that call to foo() was successful
bar(); // Confirm that call to bar() was successful
By the way, if your environment defines an ASSERT() macro, but not a VERIFY() macro, you can easily define your own:
#ifdef _DEBUG
// DEBUG build: Define VERIFY simply as ASSERT
# define VERIFY(expr) ASSERT(expr)
#else
// RELEASE build: Define VERIFY as the expression, without any checking
# define VERIFY(expr) ((void)(expr))
#endif
Hope that helps.
Apparently stackoverflow won't let me post a response which contains only a single word :)
VALGRIND
When using a debugger, sometimes memory gets initialized (e.g. zero'ed) whereas without a debugging session, memory can be random. This could explain the behavior you are seeing.
You have dialogs, so there may be threads in your application. If there is threads, there is a possibility of race conditions.
Let say your main thread initialize a structure that another thread uses. When you run your program inside the debugger the initializing thread may be scheduled before the other thread while in your real-life situation the thread that use the structure is scheduled before the other thread actually initialize it.
In addition to what JeffH said, you have to consider if the deploying computer (or server) has the same environment/libraries/whatever_related_to_the_program.
Sometimes it's very difficult to debug correctly if you debug with other conditions.
Giovanni
Also, debuggers might add some padding around allocated memory changing the behaviour. This has caught me out a number of times, so you need to be aware of it. Getting the same memory behaviour in debug is important.
For MSVC, this can be disabled with the env-var _NO_DEBUG_HEAP=1. (The debug heap is slow, so this helps if your debug runs are hideously slow too..).
Another method to get the same is to start the process outside the debugger, so you get a normal startup, then wait on first line in main and attach the debugger to process. That should work for "any" system. provided that you don't crash before main. (You could wait on a ctor on a statically pre-mani constructed object then...)
But I've no experience with gcc/gdb in this matter, but things might be similar there... (Comments welcome.)
One real-world example of heisenbug from Raymand Zhang.
/*--------------------------------------------------------------
GdPage.cpp : a real example to illustrate Heisenberg Effect
related with guard page by Raymond Zhang, Oct. 2008
--------------------------------------------------------------*/
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
LPVOID lpvAddr; // address of the test memory
lpvAddr = VirtualAlloc(NULL, 0x4096,
MEM_RESERVE | MEM_COMMIT,
PAGE_READONLY | PAGE_GUARD);
if(lpvAddr == NULL)
{
printf("VirtualAlloc failed with %ld\n", GetLastError());
return -1;
}
return *(long *)lpvAddr;
}
The program would terminate abnormally whether compile with Debug or Release,because
by specifying the PAGE_GUARD flag would cause the:
Pages in the region become guard
pages. Any attempt to read from or
write to a guard page causes the
system to raise a STATUS_GUARD_PAGE
exception and turn off the guard page
status. Guard pages thus act as a
one-shot access alarm.
So you'd get STATUS_GUARD_PAGE while trying to access *lpvAddr.But if you use debugger load the program and watch *lpvAddv or step the last statement return *(long *)lpvAddr assembly by assembly,the debugger would forsee the guard page to determine the value of *lpvAddr.So the debugger would have cleared the guard alarm for us before we access *lpvAddr.
Which programming language are you using. Certain languages, such as C++, behave slightly differently between release and debug builds. In the case of C++, this means that when you declare a var, such as int i;, in debug builds it will be initialised to 0, while in release builds it may take any value (whatever was stored in its memory location before).
One big reason is that debug code may define the _DEBUG macro that one may use in the code to add extra stuff in debug builds.
For multithreaded code, optimization may affect ordering which may influence race conditions.
I do not know if debug code adds code on the stack to mark stack frames. Any extra stuff on the stack may hide the effects of buffer overruns.
Try using the same command options as your release build and just add the -g (or equivalent debug flag). gcc allows the debug option together with the optimization options.
If your logic depends on data from the system clock, you could see serious probe effects. If you break into the debugger, you will obviously effect the values returned from clock functions such as timeGetTime(). The same is true if your program takes longer to execute. As other people have said, debug builds insert NOOPs. Also, simply running under the debugger (without hitting breakpoints) might slow things down.
An example of where this might happen is a real-time physics simulation with a variable time step, based off elapsed system time. This is why there are articles like this:
http://gafferongames.com/game-physics/fix-your-timestep/
I'm running my C++ program in gdb. I'm not real experienced with gdb, but I'm getting messages like:
warning: HEAP[test.exe]:
warning: Heap block at 064EA560 modified at 064EA569 past requested size of 1
How can I track down where this is happening at? Viewing the memory doesn't give me any clues.
Thanks!
So you're busting your heap. Here's a nice GDB tutorial to keep in mind.
My normal practice is to set a break in known good part of the code. Once it gets there step through until you error out. Normally you can determine the problem that way.
Because you're getting a heap error I'd assume it has to do with something you're putting on the heap so pay special attention to variables (I think you can use print in GDB to determine it's memory address and that may be able to sync you with where your erroring out). You should also remember that entering functions and returning from functions play with the heap so they may be where your problem lies (especially if you messed your heap before returning from a function).
You can probably use a feature called a "watch point". This is like a break point but the debugger stops when the memory is modified.
I gave a rough idea on how to use this in an answer to a different question.
If you can use other tools, I highly recommend trying out Valgrind. It is an instrumentation framework, that can run your code in a manner that allows it to, typically, stop at the exact instruction that causes the error. Heap errors are usually easy to find, this way.
One thing you can try, as this is the same sort of thing as the standard libc, with the MALLOC_CHECK_ envronment variable configured (man libc).
If you keep from exiting gdb (if your application quit's, just use "r" to re-run it), you can setup a memory breakpoint at that address, "hbreak 0x64EA569", also use "help hbreak" to configure condition's or other breakpoitn enable/disable options to prevent excessively entering that breakpoint....
You can just configure a log file, set log ... setup a stack trace on every break, "display/bt -4", then hit r, and just hold down the enter key and let it scroll by
"or use c ## to continue x times... etc..", eventually you will see that same assertion, then you will now have (due to the display/bt) a stacktrace which you can corolate to what code was modifying on that address...
I had similar problem when I was trying to realloc array of pointers to my structures, but instead I was reallocating as array of ints (because I got the code from tutorial and forgot to change it). The compiler wasnt correcting me because it cannot be checked whats in size argument.
My variable was:
itemsetList_t ** iteration_isets;
So in realloc instead of having:
iteration_isets = realloc(iteration_isets, sizeof(itemsetList_t *) * max_elem);
I had:
iteration_isets = realloc(iteration_isets, sizeof(int) * max_elem);
And this caused my heap problem.