debug stack overflow in windows? - c++

So I'm trying to debug this strange problem where a process ends without calling some destructors...
In the VS (2005) debugger, I hit 'Break all' and look around in the call stacks of the threads of the misteriously disappearing process, when I see this:
smells like SO http://img6.imageshack.us/img6/7628/95434880.jpg
This definitely looks like a SO in the making, which would explain why the process runs to its happy place without packing its suitcase first.
The problem is, the VS debugger's call stack only shows what you can see in the image.
So my question is: how can I find where the infinite recursion call starts?
I read somewhere that in Linux you can attach a callback to the SIGSEGV handler and get more info on what's going on.
Is there anything similar on Windows?

To control what Windows does in case of an access violation (SIGSEGV-equivalent), call SetErrorMode (pass it parameter 0 to force a popup in case of errors, allowing you to attach to it with a debugger.)
However, based on the stack trace you have already obtained, attaching with a debugger on fault may yield no additional information. Either your stack has been corrupted, or the depth of recursion has exceeded the maximum number of frames displayable by VS. In the latter case, you may want to decrease the default stack size of the process (use the /F switch or equivalent option in the Project properties) in order to make the problem manifest itself sooner, and make sure that VS will display all frames. You may, alternatively, want to stick a breakpoint in std::basic_filebuf<>::flush() and walk through it until the destruction phase (or disable it until just prior to the destruction phase.)

Well, you know what thread the problem is on - it might be a simple matter of tracing through it from inception to see where it goes off into the weeds.
Another option is to use one of the debuggers in the Debugging Tools for Windows package - they may be able to show more than the VS debugger (maybe), even if they are generally more complex and difficult to use (actually maybe because of that).

That does look at first glance like an infinite recursion, you could try putting a breakpoint at the line before the one that terminates the process. Does it get there ok? If it does, you've got two fairly easy ways to go.
Either you just step forward and see which destructors get called and when it gets caught up. Or you could put a printf/OutputDebugString in every relevant objects destructor (ONly ones which are globals should need this). If the message is the first thing the destructor does, then the last message you see is from the destructor which hangs things up.
On the other hand, if it doesn't get to that breakpoint I originally mentioned, then can do something similar, but it will be more annoying since the program is still "doing stuff".

I wouldn't rule out there being such a handler in Windows, but I've never heard of it.
I think the traceback that you're showing may be bogus. If you broke into the process after some kind of corruption had already occurred, then the traceback isn't necessarily valid. However, if you're lucky the bottom of the stack trace still has some clues about what's going on.
Try putting Sleep() calls into selected functions in your source that might be involved in the recursion. That should give you a better chance of breaking into the process before the stack has completely overflowed.

I agree with Dan Breslau. Your stack is bogus. may be simply because you don't have the right symbols, though.
If a program simply disappears without the WER handling kicking in, it's usually an out of memory condition. Have you gone investigated that possibility ?

Related

Analysing a Crash

I'm writing a C++ Software Image processing tool.The tool works fine, but suddenly it stops and it never sends any exception or any crash or nothing that can let me which line or which area that does that crash.
How can I determine that faulty code in that situation?
There's a few things you can do:
First of all though, it sounds more like an infinite loop, deadlock, or like you're using all of your system resources and it's just slowing down and taking a very (possibly infinite) long time. If that's the case, you'll have to find it with debugging.
Things that you can try - not necessarily in this order:
Look for shared variables you're using.  Is there a chance that you
have a deadlock with threads and mutexes?  Think about it and try to
fix it.
Check for use of uninitialized variables/pointers.  Sometimes
(rarely) you can get very strange behavior when you invoke undefined
behavior - I'm not a windows C++ dev (I work on Linux), but it
wouldn't be the first time I saw a lockup from a segmentation fault.
Add error output (std::cerr/stderror) to your processing logic so
you can see how far in it crashes. After that, set a condition to
catch it around that point so you can watch it happen in the
debugger and see the state of your variables and what might be
wrong.
Do a stack trace so you can see what calls have been hit most
recently. This will at least let you know the last function chain
that executed.
Have you used stack tracing before?
Look up MSDN documentation on how to use them. They have different type of stack trace depending on your application.
You could
Use a debugger
Add logging code to your program
Cut sections of code from your program until it starts working
Start with the first.

C++: Where to start when my application crashes at random places?

I'm developing a game and when I do a specific action in the game, it crashes.
So I went debugging and I saw my application crashed at simple C++ statements like if, return, ... Each time when I re-run, it crashes randomly at one of 3 lines and it never succeeds.
line 1:
if (dynamic) { ... } // dynamic is a bool member of my class
line 2:
return m_Fixture; // a line of the Box2D physical engine. m_Fixture is a pointer.
line 3:
return m_Density; // The body of a simple getter for an integer.
I get no errors from the app nor the OS...
Are there hints, tips or tricks to debug more efficient and get known what is going on?
That's why I love Java...
Thanks
Random crashes like this are usually caused by stack corruption, since these are branching instructions and thus are sensitive to the condition of the stack. These are somewhat hard to track down, but you should run valgrind and examine the call stack on each crash to try and identify common functions that might be the root cause of the error.
Are there hints, tips or tricks to debug more efficient and get known what is going on?
Run game in debugger, on the point of crash, check values of all arguments. Either using visual studio watch window or using gdb. Using "call stack" check parent routines, try to think what could go wrong.
In suspicious(potentially related to crash) routines, consider dumping all arguments to stderr (if you're using libsdl or on *nixlike systems), or write a logfile, or send dupilcates of all error messages using (on Windows) OutputDebugString. This will make them visible in "output" window in visual studio or debugger. You can also write "traces" (log("function %s was called", __FUNCTION__))
If you can't debug immediately, produce core dumps on crash. On windows it can be done using MiniDumpWriteDump, on linux it is set somewhere in configuration variables. core dumps can be handled by debugger. I'm not sure if VS express can deal with them on Windows, but you still can debug them using WinDBG.
if crash happens within class, check *this argument. It could be invalid or zero.
If the bug is truly evil (elusive stack corruption in multithreaded app that leads to delayed crash), write custom memory manager, that will override new/delete, provide alternative to malloc(if your app for some reason uses it, which may be possible), AND that locks all unused memory memory using VirtualProtect (windows) or OS-specific alternative. In this case all potentially dangerous operation will crash app instantly, which will allow you to debug the problem (if you have Just-In-Time debugger) and instantly find dangerous routine. I prefer such "custom memory manager" to boundschecker and such - since in my experience it was more useful. As an alternative you could try to use valgrind, which is available on linux only. Note, that if your app very frequently allocates memory, you'll need a large amount of RAM in order to be able to lock every unused memory block (because in order to be locked, block should be PAGE_SIZE bytes big).
In areas where you need sanity check either use ASSERT, or (IMO better solution) write a routine that will crash the application (by throwing an std::exception with a meaningful message) if some condition isn't met.
If you've identified a problematic routine, walk through it using debugger's step into/step over. Watch the arguments.
If you've identified a problematic routine, but can't directly debug it for whatever reason, after every statement within that routine, dump all variables into stderr or logfile (fprintf or iostreams - your choice). Then analyze outputs and think how it could have happened. Make sure to flush logfile after every write, or you might miss the data right before the crash.
In general you should be happy that app crashes somewhere. Crash means a bug you can quickly find using debugger and exterminate. Bugs that don't crash the program are much more difficult (example of truly complex bug: given 100000 values of input, after few hundreds of manipulations with values, among thousands of outputs, app produces 1 absolutely incorrect result, which shouldn't have happened at all)
That's why I love Java...
Excuse me, if you can't deal with language, it is entirely your fault. If you can't handle the tool, either pick another one or improve your skill. It is possible to make game in java, by the way.
These are mostly due to stack corruption, but heap corruption can also affect programs in this way.
stack corruption occurs most of the time because of "off by one errors".
heap corruption occurs because of new/delete not being handled carefully, like double delete.
Basically what happens is that the overflow/corruption overwrites an important instruction, then much much later on, when you try to execute the instruction, it will crash.
I generally like to take a second to step back and think through the code, trying to catch any logic errors.
You might try commenting out different parts of the code and seeing if it affects how the program is compiled.
Besides those two things you could try using a debugger like Visual Studio or Eclipse etc...
Lastly you could try to post your code and the error you are getting on a website with a community that knows programming and could help you work through the error (read: stackoverflow)
Crashes / Seg faults usually happen when you access a memory location that it is not allowed to access, or you attempt to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location).
There are many memory analyzer tools, for example I use Valgrind which is really great in telling what the issue is (not only the line number, but also what's causing the crash).
There are no simple C++ statements. An if is only as simple as the condition you evaluate. A return is only as simple as the expression you return.
You should use a debugger and/or post some of the crashing code. Can't be of much use with "my app crashed" as information.
I had problems like this before. I was trying to refresh the GUI from different threads.
If the if statements involve dereferencing pointers, you're almost certainly corrupting the stack (this explains why an innocent return 0 would crash...)
This can happen, for instance, by going out of bounds in an array (you should be using std::vector!), trying to strcpy a char[]-based string missing the ending '\0' (you should be using std::string!), passing a bad size to memcpy (you should be using copy-constructors!), etc.
Try to figure out a way to reproduce it reliably, then place a watch on the corrupted pointer. Run through the code line-by-line until you find the very line that corrupts the pointer.
Look at the disassembly. Almost any C/C++ debugger will be happy to show you the machine code and the registers where the program crashed. The registers include the Instruction Pointer (EIP or RIP on x86/x64) which is where the program was when it stopped. The other registers usually have memory addresses or data. If the memory address is 0 or a bad pointer, there is your problem.
Then you just have to work backward to find out how it got that way. Hardware breakpoints on memory changes are very helpful here.
On a Linux/BSD/Mac, using GDB's scripting features can help a lot here. You can script things so that after the breakpoint is hit 20 times it enables a hardware watch on the address of array element 17. Etc.
You can also write debugging into your program. Use the assert() function. Everywhere!
Use assert to check the arguments to every function. Use assert to check the state of every object before you exit the function. In a game, assert that the player is on the map, that the player has health between 0 and 100, assert everything that you can think of. For complicated objects write verify() or validate() functions into the object itself that checks everything about it and then call those from an assert().
Another way to write in debugging is to have the program use signal() in Linux or asm int 3 in Windows to break into the debugger from the program. Then you can write temporary code into the program to check if it is on iteration 1117321 of the main loop. That can be useful if the bug always happens at 1117322. The program will execute much faster this way than to use a debugger breakpoint.
some tips :
- run your application under a debugger, with the symbol files (PDB) together.
- How to set Visual Studio as the default post-mortem debugger?
- set default debugger for WinDbg Just-in-time Debugging
- check memory allocations Overriding new and delete, and Overriding malloc and free
One other trick: turn off code optimization and see if the crash points make more sense. Optimization is allowed to float little bits of your code to surprising places; mapping that back to source code lines can be less than perfect.
Check pointers. At a guess, you're dereferencing a null pointer.
I've found 'random' crashes when there are some reference to a deleted object. As the memory is not necessarily overwritten, in many cases you don't notice it and the program works correctly, and than crashes after the memory was updated and is not valid anymore.
JUST FOR DEBUGGING PURPOSES, try commenting out some suspicious 'deletes'. Then, if it doesn't crash anymore, there you are.
use the GNU Debugger
Refactoring.
Scan all the code, make it clearer if not clear at first read, try to understand what you wrote and immediately fix what seems incorrect.
You'll certainly discover the problem(s) this way and fix a lot of other problems too.

What exactly does a debugger do?

I've stumbled onto a very interesting issue where a function (has to deal with the Windows clipboard) in my app only works properly when a breakpoint is hit inside the function. This got me wondering, what exactly does the debugger do (VS2008, C++) when it hits a breakpoint?
Without directly answering your question (since I suspect the debugger's internal workings may not really be the problem), I'll offer two possible reasons this might occur that I've seen before:
First, your program does pause when it hits a breakpoint, and often that delay is enough time for something to happen (perhaps in another thread or another process) that has to happen before your function will work. One easy way to verify this is to add a pause for a few seconds beforehand and run the program normally. If that works, you'll have to look for a more reliable way of finding the problem.
Second, Visual Studio has historically (I'm not certain about 2008) over-allocated memory when running in debug mode. So, for example, if you have an array of int[10] allocated, it should, by rights, get 40 bytes of memory, but Visual Studio might give it 44 or more, presumably in case you have an out-of-bounds error. Of course, if you DO have an out-of-bounds error, this over-allocation might make it appear to be working anyway.
Typically, for software breakpoints, the debugger places an interrupt instruction at the location you set the breakpoint at. This transfers control of the program to the debugger's interrupt handler, and from there you're in a world where the debugger can decide what to do (present you with a command prompt, print the stack and continue, what have you.)
On a related note, "This works in the debugger but not when I run without a breakpoint" suggests to me that you have a race condition. So if your app is multithreaded, consider examining your locking discipline.
It might be a timing / thread synchronization issue. Do you do any multimedia or multithreading stuff in your program?
The reason your app only works properly when a breakpoint is hit might be that you have some watches with side effects still in your watch list from previous debugging sessions. When you hit the break point, the watch is executed and your program behaves differently.
http://en.wikipedia.org/wiki/Debugger
A debugger essentially allows you to step through your source code and examine how the code is working. If you set a breakpoint, and run in debug mode, your code will pause at that break point and allow you to step into the code. This has some distinct advantages. First, you can see what the status of your variables are in memory. Second, it allows you to make sure your code is doing what you expect it to do without having to do a whole ton of print statements. And, third, it let's you make sure the logic is working the way you expect it to work.
Edit: A debugger is one of the more valuable tools in my development toolbox, and I'd recommend that you learn and understand how to use the tool to improve your development process.
I'd recommend reading the Wikipedia article for more information.
The debugger just halts execution of your program when it hits a breakpoint. If your program is working okay when it hits the breakpoint, but doesn't work without the breakpoint, that would indicate to me that you have a race condition or another threading issue in your code. The breakpoint is stopping the execution of your code, perhaps allowing another process to complete normally?
It stops the program counter for your process (the one you are debugging), and shows the current value of your variables, and uses the value of your variables at the moment to calculate expressions.
You must take into account, that if you edit some variable value when you hit a breakpoint, you are altering your process state, so it may behave differently.
Debugging is possible because the compiler inserts debugging information (such as function names, variable names, etc) into your executable. Its possible not to include this information.
Debuggers sometimes change the way the program behaves in order to work properly.
I'm not sure about Visual Studio but in Eclipse for example. Java classes are not loaded the same when ran inside the IDE and when ran outside of it.
You may also be having a race condition and the debugger stops one of the threads so when you continue the program flow it's at the right conditions.
More info on the program might help.
On Windows there is another difference caused by the debugger. When your program is launched by the debugger, Windows will use a different memory manager (heap manager to be exact) for your program. Instead of the default heap manager your program will now get the debug heap manager, which differs in the following points:
it initializes allocated memory to a pattern (0xCDCDCDCD comes to mind but I could be wrong)
it fills freed memory with another pattern
it overallocates heap allocations (like a previous answer mentioned)
All in all it changes the memory use patterns of your program so if you have a memory thrashing bug somewhere its behavior might change.
Two useful tricks:
Use PageHeap to catch memory accesses beyond the end of allocated blocks
Build using the /RTCsu (older Visual C++ compilers: /GX) switch. This will initialize the memory for all your local variables to a nonzero bit pattern and will also throw a runtime error when an unitialized local variable is accessed.

How can I guarantee catching a EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Everything prior to windows xp would not (or would be harder) generally be able to trap stack overflows. With the advent of xp, you can set vectored exception handler that gets a chance at stack overflow prior to any stack-based (structured exception) handlers (this is being the very reason - structured exception handlers are stack-based).
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope, it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.
My suggestion is to patch your executables to put maybe an INT 3 debug at the entry point to exit (), if it's statically linked, or if it's dynamically linked, patch up the import and also patch up any imports of kernel32::TerminateProcess to throw a DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go proof, you should have what you need.
EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyways. It just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.

Debugging a crash after exiting? (After main returned)

This is a fairly involved bug, and I've tried looking around to find other sources of help, but for reasons I don't understand, "Program Crashes in Vista" is not the most helpful query.
The issue I'm having is that the program I'm working on - a graphical, multithreaded data visualization software that uses OpenGL and the Windows API - is crashing after WinMain() returns. I've tried stepping through the shutdown routine as well as looking at a stack trace, and the last bit of code that's not assembly is _crtExitProcess, where it crashes in the actual ExitProcess(0) call. After that, the stack trace shows kernel32.dll and four ntdll.dll, which is where it actually crashes.
This bug only occurs on Vista, and the same exact code when run on XP exits normally. I really can't think of anything that would help me debug this problem, and debugging this issue is something I've never really learned. Any help would be appreciated.
I've done a little digging around, and I've found a couple of posts around that suggest you're not the only one suffering from this:
http://developer.nvidia.com/forums/index.php?showtopic=318
http://objectmix.com/xml-soap/115379-problem-latest-ms-patches-msxml4-vista.html
Particularly, the second one is of interest, where Tom Chm mentions:
We believe we have identified the root
cause of our crash, and adding a
virtual destructor to our interface
class wrapper seems to resolve our
problem. But we would like to know the
exact cause of the crash to verify
that we didn't just sweep the actual
problem under the rug.
The problem may be with a destructor somewhere, or lack thereof. If you have a way of attaching a debugger and stepping through the shutdown process, it might be of help.
You might want to read through the whole thread and see if there's something you can learn. That is, if you haven't already found these posts in your searching, of course.
Sounds a little like a problem with a destructor.
Check for objects that are destroyed on shutdown. This will mainly be global or static objects. Look carefully at their destructors for a case of accessing something that is no longer valid. In a multi-threaded environment, it might be a race condition, where an object is destroyed while another thread is still using it.
Try writing a log as objects are destroyed. e.g.
SomeClass::~SomeClass()
{
WriteLog("Begin ~SomeClass()");
// do whatever
WriteLog("End ~SomeClass()");
}
WriteLog() should open the log file, write and then close the file to make sure the file is flushed. Using a Mutex or CriticalSection to avoid conflicts would be a good idea.
Looking at the log after a crash might give you some clues as to what is going on.