Why does stack overflow throw no error in Visual C++? - c++

In Microsoft Visual C++ 2010 I created a program which deliberately causes a stack overflow. When I run the program using "start debugging" an error is thrown when the stack overflow occurs. When I run it with "start without debugging" no error is thrown and the program just terminates silently as if it had successfully completed. Could someone explain to me what's going on? Also, do any other compilers not throw errors on stack overflow?
(I thought this would be the right place to ask a question about stack overflow.)

C++ won't hold your hand as a managed environment does. Having a stack overflow means undefined behaviour.

A stack overflow is undefined behaviour. The compiler is well within its rights to ignore it or cause any event to happen.

Because when your process stack overflows, it is no longer a valid process. Displaying an error message requires a stack.
Raymond Chen went over this recently.
As for why the debugger is able to throw such an exception, in that case the process is being kept around because it's attached in debugging mode to the debugger's process. Your process isn't displaying the error, the debugger is.
On Windows machines, you can catch the SEH exception which corresponds to a stack overflow. For an example, you can see boost::regex's source code (Google for BOOST_REGEX_HAS_MS_STACK_GUARD).

It might well be that the compiler optimized the intended stack overflow away. Consider the following pseudo-code example:
#include <iostream>

void RecursiveMethod(int n)
{
    if (n % 1024 == 0)
        std::cout << n << '\n';
    // call recursively (there is no exit condition)
    RecursiveMethod(n + 1);
}
The method will call itself recursively and overflow the stack pretty quickly because there is no exit condition.
However, most compilers use tail recursion, a technique which will transfer the recursive function call into a loop construct.
It should be noted that with tail recursion, the above program would run in an endless loop and not exit silently.
Bart de Smet has a nice blog article where he explains how this technique works in .NET:
The Case of The Failed Demo – StackOverflowException on x64

In a debug build, a number of stack checks are put in place to help you detect problems such as stack overflows, stack corruption etc. They are not present in release builds because they would affect the performance of the application. As others have pointed out, a stack overflow is undefined behaviour, so the compiler is not required to implement such stack checks at all.
When you're in a debugging environment, the runtime checks will help you detect problems that will also occur in your release build, therefore, if you fix all the problems detected in your debug build, then they should also be fixed in your release build . . . in theory. In practice, sometimes bugs that you see in your debug build are not present in your release build or vice versa.
Stack overflows shouldn't happen. Generally, stack overflows occur only by unintentional recursive function calling, or by allocating a large enough buffer on the stack. The former is obviously a bug, and the latter should use the heap instead.

In debugging mode, you want the overhead. You want it to detect if you broke your stack, overflowed your buffers, etc. This overhead is built into the debug instrumentation, and the debugger. In high level terms the debug instrumentation is extra code and data, put there to help flag errors, and the debugger is there to detect the flagged errors and notify the user (in addition to helping you debug, of course).
If you are running with the project compiled in release mode, or without a debugger attached, there is no one to hear the screaming of your program when it dies :) If a tree falls in the forest...
Depending on how you program, C++ is programming without training wheels. If you hit a wall, no one is going to be there to tell you that you screwed up. You will just crash and burn, or even worse, crash and keep running along in a very crippled state, without knowing anything is wrong. Because of this, it can be very fast. There are no extra checks or safe guards to keep it from blazing through your program with the full speed and potential of the processor (and, of course, how many extra steps you coded into your program).

Related

Tools for Isolating a Stack smashing bug

To put it mildly, I have a small memory issue and am running out of tools and ideas to isolate the cause.
I have a highly multi-threaded (pthreads) C/C++ program that has developed a stack smashing issue under optimized compiles with GCC after 4.4.4 and prior to 4.7.1.
The symptom is that during the creation of one of the threads, I get a full stack smash: not just %RIP, but all parent frames and most of the registers are 0x00 or other nonsense addresses.
Which thread causes the issue is seemingly random; however, judging by log messages it seems to be isolated to the same hunk of code, and seems to come at a semi-repeatable point in the creation of the new thread.
This has made it very hard to trap and isolate the offending code more narrowly than to a single compilation unit of many thousand lines, since print()s within the offending file have so far proved unreliable in trying to narrow down the active section.
The thread creation that leads off the thread that eventually smashes the stack is:
extern "C"
{
static ThreadReturnVal ThreadAPI WriterThread(void *act)
{
    Recorder *rec = reinterpret_cast<Recorder *>(act);
    xuint64 writebytes;
    LoggerHandle m_logger = XXGetLogger("WriterThread");
    if (SetThreadAffinity(rec->m_cpu_mask))
    { ... }
    SetThreadPrio((xint32)rec->m_thread_priority);
    while (true)
    {
        ... poll a ring buffer ... Hard spin, 100% use on a single core; this is that sort of crazy code.
    }
}
}
I have tried a debug build, but the symptom is only present in optimized builds, -O2 or better.
I have tried Valgrind/memcheck and DRD, but both fail to find any issue before the stack is blown away (and it takes about 12 hours to reach the failure).
A compile with -O2 -Wstack-protector sees nothing wrong,
however a build with -fstack-protector-all does protect me from the bug, but emits no errors.
Electric-Fence also traps, but only after the stack is gone.
Question: What other tools or techniques would be useful in narrowing down the offending section ?
Many thanks,
--Bill
A couple of options for approaching this sort of problem:
You could try setting a hardware breakpoint on a stack address before the corruption occurs and hope the debugger breaks early enough in the corruption to provide a vaguely useful debugging state. The tricky part here is choosing the right stack address; depending on how random the 'choice' of offending thread is, this might not be practical. But from one of your comments it sounds like it is often the newly created thread that gets smashed, so this might be doable. Try to break during thread creation, grab the thread's stack location, offset by some wild guess, set the hardware BP, and continue. Based on whether you break too early, too late, or not at all, adjust your offset, rinse, and repeat. This is basically advanced guess-and-check, and can be heavily hindered or outright impractical if the corruption pattern is too random, but it is surprising how often this can lead to a semi-legible stack and successful debugging efforts.
Another option would be to start collecting crash dumps. Try to look for patterns between the crash dumps that might help bring you closer to the source of the corruption. Perhaps you'll get lucky and one of the crash dumps will crash 'faster'/'closer to the source'.
Unfortunately, both of these techniques are more art than science; they're non-deterministic, rely on a healthy dose of luck, etc. (at least in my experience; that being said, there are people out there who can do amazing things with crash dumps, but it takes a lot of time to get to that level of skill).
One more side note: as others have pointed out, uninitialized memory is a very typical source of debug vs release differences, and could easily be your problem here. However, another possibility to keep in mind is timing differences. The order that threads get scheduled in, and for how long, is often dramatically different in debug vs release, and can easily lead to synchronization bugs being masked in one but not the other. These differences can be just due to execution speed differences, but I think some runtimes intentionally mess with thread scheduling in a debug environment.
You can use a static analysis tool to check for some subtle errors; maybe one of the found errors will be the cause of your bug. You can find some information on these tools here.

Why do certain things never crash with the debugger on?

My application uses GLUTesselator to tessellate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I do "start debugging" in VS. I found this right here, which is basically my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like Windows does with processes spawned by a debugger, it provides to your program a debug heap, that is initialized to the 0xCD pattern. Probably somewhere you use some uninitialized area of memory from the heap as a pointer and you dereference it; with the two debug heaps you get away with it for some reason (maybe because at address 0xbaadf00d and 0xcdcdcdcd there's valid allocated memory), but with the "normal" heap (which is often initialized to 0) you get an access violation, because you dereference a NULL pointer.
The problem is the crash occurs in GLU32.dll and I have no way to find out why it's trying to dereference a null pointer sometimes. It seems to do this when my polygons get fairly large and have lots of points. What can I do?
Thanks
It's a fact of life that sometimes programs behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, and race conditions often happen less often in a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
Visual Studio can set a breakpoint on accesses to a particular memory address, you could try this (it may slow your program significantly more than a variable breakpoint).
but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why, but while debugging in Visual Studio a program can sometimes get away with accessing some memory regions that would crash it without the debugger. I do not know the exact reasons, though, but sometimes 0xcdcdcdcd and 0xbaadf00d don't have anything to do with it. It is just that accessing certain addresses doesn't cause problems. When this happens, you'll need to find alternative methods of guessing the problem.
What can I do?
Possible solutions:
Install exception handler in your program (_set_se_translator, if I remember correctly). On access violation try MinidumpWriteDump. Debug it later using Visual Studio (afaik, crash dump debugging is n/a in express edition), or using windbg.
Use just-in-time debuggers. Non-Express editions of Visual Studio have this feature. There are probably alternatives.
Write a custom memory manager (that'll override new/delete and will provide malloc/free alternatives (if you use them)) that will grab a large chunk of memory and lock all unused memory with VirtualProtect. In this case all invalid accesses will cause crashes even in debug mode. You'll need a lot of memory for such a memory manager, because to be locked, each block should be aligned to pages.
Add excessive logging to all suspicious function calls. Dump a lot of text/debug information into file (or stderr) - parameter values, arrays, everything you suspect could be related to crash, flush after every write to file, otherwise some info will be lost during the crash. This way you'll be able to guess what happened before program crashed.
Try debugging release build. You should be able to do it to some extent if you enable "debug information" for release build in project settings.
Try switching on/off "basic runtime checks" and "buffer security check" in project properties (configuration properties->c/c++->code generation).
Try to find some kind of external tool - something like valgrind or bounds checker. Although, in my experience, #3 is more reliable than that approach. Although that really depends on the problem.
A link to an earlier question and two thoughts.
First off you may want to look at a previous question about valgrind substitutes for windows. Lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At worst you're just kicking the can down the street, there's still corruption but it's not evident from the way you're running. When you ship you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.

C++: Where to start when my application crashes at random places?

I'm developing a game and when I do a specific action in the game, it crashes.
So I went debugging and I saw my application crashed at simple C++ statements like if, return, ... Each time when I re-run, it crashes randomly at one of 3 lines and it never succeeds.
line 1:
if (dynamic) { ... } // dynamic is a bool member of my class
line 2:
return m_Fixture; // a line of the Box2D physical engine. m_Fixture is a pointer.
line 3:
return m_Density; // The body of a simple getter for an integer.
I get no errors from the app nor the OS...
Are there hints, tips or tricks to debug more efficiently and find out what is going on?
That's why I love Java...
Thanks
Random crashes like this are usually caused by stack corruption, since these are branching instructions and thus are sensitive to the condition of the stack. These are somewhat hard to track down, but you should run valgrind and examine the call stack on each crash to try and identify common functions that might be the root cause of the error.
Are there hints, tips or tricks to debug more efficiently and find out what is going on?
Run the game in a debugger; at the point of the crash, check the values of all arguments, either using the Visual Studio watch window or using gdb. Using the "call stack", check the parent routines and try to think what could go wrong.
In suspicious (potentially related to the crash) routines, consider dumping all arguments to stderr (if you're using libsdl or are on *nix-like systems), or write a logfile, or send duplicates of all error messages using (on Windows) OutputDebugString. This will make them visible in the "output" window in Visual Studio or the debugger. You can also write "traces" (log("function %s was called", __FUNCTION__)).
If you can't debug immediately, produce core dumps on crash. On Windows it can be done using MiniDumpWriteDump; on Linux it is set somewhere in configuration variables. Core dumps can be handled by a debugger. I'm not sure if VS Express can deal with them on Windows, but you can still debug them using WinDBG.
If the crash happens within a class, check the *this argument. It could be invalid or zero.
If the bug is truly evil (elusive stack corruption in a multithreaded app that leads to a delayed crash), write a custom memory manager that will override new/delete, provide an alternative to malloc (if your app for some reason uses it, which may be possible), AND that locks all unused memory using VirtualProtect (Windows) or an OS-specific alternative. In this case every potentially dangerous operation will crash the app instantly, which will allow you to debug the problem (if you have a Just-In-Time debugger) and instantly find the dangerous routine. I prefer such a "custom memory manager" to BoundsChecker and such, since in my experience it was more useful. As an alternative you could try to use valgrind, which is available on Linux only. Note that if your app very frequently allocates memory, you'll need a large amount of RAM in order to be able to lock every unused memory block (because in order to be locked, a block should be PAGE_SIZE bytes big).
In areas where you need sanity check either use ASSERT, or (IMO better solution) write a routine that will crash the application (by throwing an std::exception with a meaningful message) if some condition isn't met.
If you've identified a problematic routine, walk through it using debugger's step into/step over. Watch the arguments.
If you've identified a problematic routine, but can't directly debug it for whatever reason, after every statement within that routine, dump all variables into stderr or logfile (fprintf or iostreams - your choice). Then analyze outputs and think how it could have happened. Make sure to flush logfile after every write, or you might miss the data right before the crash.
In general you should be happy that app crashes somewhere. Crash means a bug you can quickly find using debugger and exterminate. Bugs that don't crash the program are much more difficult (example of truly complex bug: given 100000 values of input, after few hundreds of manipulations with values, among thousands of outputs, app produces 1 absolutely incorrect result, which shouldn't have happened at all)
That's why I love Java...
Excuse me, if you can't deal with language, it is entirely your fault. If you can't handle the tool, either pick another one or improve your skill. It is possible to make game in java, by the way.
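Two of the answers above (the "possible solutions" list for the GLU32.dll crash and the "truly evil bug" item just above) describe the same technique: a debugging allocator that parks every block next to a locked page so that a stray write faults at once, in release builds as well as debug builds. The following is an added example of that idea, not code from either answerer. It assumes a POSIX system and uses mmap/mprotect where the answers mention VirtualProtect on Windows; the page layout, the function names and the SIGSEGV-to-return-value harness are this example's own.

```cpp
#include <csetjmp>
#include <csignal>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

// Guard-page allocator (added example). Every block is right-aligned against a page that is
// mapped PROT_NONE, so the first byte written past the end faults immediately instead of
// silently corrupting a neighbouring object and crashing much later somewhere unrelated.
static size_t page_size() { return static_cast<size_t>(sysconf(_SC_PAGESIZE)); }

void* guarded_alloc(size_t n) {
    size_t ps = page_size();
    size_t data_pages = (n + ps - 1) / ps;      // pages that hold the payload
    size_t total = (data_pages + 1) * ps;       // plus one guard page after it
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + data_pages * ps;
    if (mprotect(guard, ps, PROT_NONE) != 0) { munmap(base, total); return nullptr; }
    return guard - n;                           // block ends exactly where the guard begins
}

void guarded_free(void* p, size_t n) {
    size_t ps = page_size();
    size_t data_pages = (n + ps - 1) / ps;
    char* guard = static_cast<char*>(p) + n;
    munmap(guard - data_pages * ps, (data_pages + 1) * ps);
}

// Demo harness: turn the SIGSEGV raised by touching the guard page into a return value,
// so the allocator's behaviour can be checked instead of just watching the process die.
static sigjmp_buf fault_jump;
static void on_segv(int) { siglongjmp(fault_jump, 1); }

// True if writing byte `idx` of an n-byte guarded block faults (idx == n is one past the end).
bool write_faults(size_t n, size_t idx) {
    char* p = static_cast<char*>(guarded_alloc(n));
    if (!p) return false;
    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    sigaction(SIGSEGV, &sa, &old);
    volatile bool faulted = false;
    if (sigsetjmp(fault_jump, 1) == 0) {
        volatile char* vp = p;
        vp[idx] = 'A';                          // in bounds for idx < n, off by one for idx == n
    } else {
        faulted = true;
    }
    sigaction(SIGSEGV, &old, nullptr);
    guarded_free(p, n);
    return faulted;
}

int main() {
    std::printf("in-bounds write faults: %d\n", write_faults(100, 99));
    std::printf("one-past-end write faults: %d\n", write_faults(100, 100));
    return 0;
}
```

Hooking this into operator new/delete or malloc/free is the step the answers describe; the cost they warn about is visible in guarded_alloc: even a one-byte block consumes two pages, which is why a program that allocates heavily needs a lot of RAM under this scheme. Under-runs (writing before the block) are not caught by this layout; a second guard page in front of the block, at the price of another page per allocation, would catch those.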
These are mostly due to stack corruption, but heap corruption can also affect programs in this way.
Stack corruption occurs most of the time because of "off by one" errors.
Heap corruption occurs because of new/delete not being handled carefully, like a double delete.
Basically what happens is that the overflow/corruption overwrites an important instruction, then much much later on, when you try to execute the instruction, it will crash.
I generally like to take a second to step back and think through the code, trying to catch any logic errors.
You might try commenting out different parts of the code and seeing if it affects how the program is compiled.
Besides those two things you could try using a debugger like Visual Studio or Eclipse etc...
Lastly you could try to post your code and the error you are getting on a website with a community that knows programming and could help you work through the error (read: stackoverflow)
Crashes / seg faults usually happen when you access a memory location that you are not allowed to access, or you attempt to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location).
There are many memory analyzer tools, for example I use Valgrind which is really great in telling what the issue is (not only the line number, but also what's causing the crash).
There are no simple C++ statements. An if is only as simple as the condition you evaluate. A return is only as simple as the expression you return.
You should use a debugger and/or post some of the crashing code. Can't be of much use with "my app crashed" as information.
I had problems like this before. I was trying to refresh the GUI from different threads.
If the if statements involve dereferencing pointers, you're almost certainly corrupting the stack (this explains why an innocent return 0 would crash...)
This can happen, for instance, by going out of bounds in an array (you should be using std::vector!), trying to strcpy a char[]-based string missing the ending '\0' (you should be using std::string!), passing a bad size to memcpy (you should be using copy-constructors!), etc.
Try to figure out a way to reproduce it reliably, then place a watch on the corrupted pointer. Run through the code line-by-line until you find the very line that corrupts the pointer.
Look at the disassembly. Almost any C/C++ debugger will be happy to show you the machine code and the registers where the program crashed. The registers include the Instruction Pointer (EIP or RIP on x86/x64) which is where the program was when it stopped. The other registers usually have memory addresses or data. If the memory address is 0 or a bad pointer, there is your problem.
Then you just have to work backward to find out how it got that way. Hardware breakpoints on memory changes are very helpful here.
On a Linux/BSD/Mac, using GDB's scripting features can help a lot here. You can script things so that after the breakpoint is hit 20 times it enables a hardware watch on the address of array element 17. Etc.
You can also write debugging into your program. Use the assert() function. Everywhere!
Use assert to check the arguments to every function. Use assert to check the state of every object before you exit the function. In a game, assert that the player is on the map, that the player has health between 0 and 100, assert everything that you can think of. For complicated objects write verify() or validate() functions into the object itself that checks everything about it and then call those from an assert().
Another way to write in debugging is to have the program use signal() in Linux or asm int 3 in Windows to break into the debugger from the program. Then you can write temporary code into the program to check if it is on iteration 1117321 of the main loop. That can be useful if the bug always happens at 1117322. The program will execute much faster this way than to use a debugger breakpoint.
Some tips:
- run your application under a debugger, with the symbol files (PDB) together.
- How to set Visual Studio as the default post-mortem debugger?
- set default debugger for WinDbg Just-in-time Debugging
- check memory allocations Overriding new and delete, and Overriding malloc and free
One other trick: turn off code optimization and see if the crash points make more sense. Optimization is allowed to float little bits of your code to surprising places; mapping that back to source code lines can be less than perfect.
Check pointers. At a guess, you're dereferencing a null pointer.
I've found 'random' crashes when there is some reference to a deleted object. As the memory is not necessarily overwritten, in many cases you don't notice it and the program works correctly, and then crashes after the memory was updated and is not valid anymore.
JUST FOR DEBUGGING PURPOSES, try commenting out some suspicious 'deletes'. Then, if it doesn't crash anymore, there you are.
Use the GNU Debugger.
Refactoring.
Scan all the code, make it clearer if not clear at first read, try to understand what you wrote and immediately fix what seems incorrect.
You'll certainly discover the problem(s) this way and fix a lot of other problems too.

What are the symptoms of a stack overflow in a C++ program?

I just ran into an issue where a stack overflow in a threaded C++ program on HP-UX caused a SEGV_MAPERR when a local object tried to call a very simple procedure. I was puzzled for a while, but luckily I talked to someone who recognized this as a stack size issue and we were able to fix the problem by increasing the stack size available to the threads.
How can I recognize when the stack overflows? Do the symptoms differ on windows/linux/hpux?
Assuming you're not on a platform that's going to stop your app and say "stack overflow", I suspect you'll see the same behavior that you would see from any kind of buffer overflow. The stack is just another preallocated chunk of memory for your program, and if you go outside those bounds... well, good luck! Who knows what you'll stomp on!
You could write over the temperature readout from the CPU, it could be the email you're typing to Larry, it could be the bit saying that the kernel is locked, causing a fun deadlock condition! Who knows.
As for C++, there's nothing saying how the stack should be laid out in relation to other things in memory or that this thing even needs to be a stack!
How can I recognize when the stack overflows?
If you know the stack size, where the stack starts and the direction it grows in memory, you can simply check the address of the stack pointer and see if it is past the end of the stack. C++ does not allow direct access to the stack pointer. You could easily write a small function in assembly to perform this analysis and link it into your program.
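The check the answer above describes can be done without assembly if an approximation is acceptable: the address of a local variable tracks the stack pointer closely enough to measure how much of a known budget a recursion has consumed. The block below is an added example (not the answerer's code, and not how any compiler or runtime implements its own guard page). It assumes a downward-growing stack and that the caller knows a safe byte budget; the names StackGuard, approx_sp, descend and probe are this example's own. It also answers the first question on this page in a different way: instead of waiting for the OS or the debugger to report the overflow, the code refuses to recurse once headroom is gone and reports that decision to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Stack headroom guard (added example). C++ gives no portable access to the stack pointer,
// so the address of a local variable stands in for it. Assumes the stack grows downward.
struct StackGuard {
    uintptr_t base;   // address recorded when the guarded region was entered
    size_t budget;    // bytes we allow ourselves below base (supplied by the caller)
};

// Approximate the current stack pointer with the address of a fresh local.
static uintptr_t approx_sp() {
    volatile char marker = 0;
    return reinterpret_cast<uintptr_t>(&marker);
}

// Bytes consumed below the recorded base (0 if we are somehow above it).
size_t stack_used(const StackGuard& g) {
    uintptr_t sp = approx_sp();
    return sp < g.base ? static_cast<size_t>(g.base - sp) : 0;
}

// True when `need` more bytes still fit inside the budget.
bool stack_has_room(const StackGuard& g, size_t need) {
    return stack_used(g) + need <= g.budget;
}

// Recursion with no natural exit condition, like the examples on this page, except that it
// checks headroom on every call instead of trusting the OS to stop it.
static int descend(const StackGuard& g, int depth, bool* stopped) {
    volatile char pad[256];          // volatile so each frame really occupies this space
    pad[0] = 0;
    // Estimated cost of one more frame: the pad plus slack for spills and the return address.
    if (!stack_has_room(g, sizeof pad + 512)) { *stopped = true; return depth; }
    int r = descend(g, depth + 1, stopped);
    return r + pad[0];               // reading pad after the call prevents a tail call
}

struct ProbeResult { int depth; bool stopped; };

// Recurse under a given byte budget; report how deep we got and whether the guard fired.
ProbeResult probe(size_t budget) {
    StackGuard g{approx_sp(), budget};
    bool stopped = false;
    int depth = descend(g, 0, &stopped);
    return {depth, stopped};
}

int main() {
    ProbeResult r = probe(64 * 1024);
    std::printf("depth %d, stopped by guard: %s\n", r.depth, r.stopped ? "yes" : "no");
    return 0;
}
```

The budget is the weak point in practice: on the main thread it comes from the linker or ulimit setting, and for a worker thread from the size passed at creation (the HP-UX fix in the question was exactly to raise that size), so the guard only helps if the number it is given matches what the OS actually reserved. The 512-byte slack is a guess that should be tuned to the real frames in the recursion; compilers can make frames larger than the visible locals suggest.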
Exception code 0xC00000FD on Windows.
Usually it's easier to diagnose when you realize your SEH stops working.
Perhaps a bit off topic, but the analogous issue in Ada (running out of stack space in tasks) is a rather common "uncommon" error. Many compilers will stop the task (but not the main task) with a PROGRAM_ERROR exception.
In a way, you almost have to be able to sniff out this one. It tends to start with something like, "I moved this big array inside my task, and suddenly it quit working".
Output text to screen became mixed with lines of code from program under test. Also present were previous bash commands and other text of unidentified origin. Added to all that the program text became corrupted.

How can I guarantee catching a EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Before Windows XP it was generally not possible (or much harder) to trap stack overflows. With the advent of XP, you can set a vectored exception handler that gets a chance at a stack overflow prior to any stack-based (structured exception) handlers (this being the very reason: structured exception handlers are stack-based).
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.
My suggestion is to patch your executables to put maybe an INT 3 debug break at the entry point to exit(), if it's statically linked, or if it's dynamically linked, patch up the import and also patch up any imports of kernel32::TerminateProcess to throw a DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go poof, you should have what you need.
EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyways. It just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.