C++: __try...__except; hides crash in release mode?

I have a DX9 application that runs on an embedded Windows XP box. When we leave it running unattended overnight for soak testing, it crashes after about six to eight hours. On our development machines (Windows 7) we can't seem to reproduce this issue. I'm also fairly certain it's not a memory leak.
If we run the same application in Debug on the embedded machines, it doesn't crash.
If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.
I know that in Debug builds there is some additional padding around local stack variables. That padding may be "hiding" an out-of-bounds access to a local array, or some uninitialized variable may be sneaking through.
So I have two questions:
Does __try/__except behave similarly to Debug, even when run in Release?
What kind of things should I be scanning the code for if we have a crash in Release mode, but not in Debug mode?

If you're using __try { } __except(), you shouldn't.
Those and C++ code don't mix well (for instance, you can't have C++ objects that need destructors on the stack of a function that uses them). You should use C++ try { } catch() { } instead. If you use catch(...) (with the ellipsis) and compile with /EHa, it does basically the same thing as __except().
Both try...catch and __try...__except behave the same in Debug and Release.
If you suspect that your problem is an unexpected exception you should read about all of the following:
SetUnhandledExceptionFilter()
_set_se_translator()
_CrtSetReportMode()
_RTC_SetErrorFunc()
_set_abort_behavior()
_set_error_mode()
_set_new_handler()
_set_new_mode()
_set_purecall_handler()
set_terminate()
set_unexpected()
_set_invalid_parameter_handler()
_controlfp()
Using one of the first two would probably allow you to pinpoint your problem pretty quickly. The rest are there if you want absolute control for all error cases possible in your process.
Specifically, with SetUnhandledExceptionFilter() you can set up a filter function that logs the address of the code that caused the exception. You can then use your debugger to pinpoint that code. With the DbgHelp library and the information passed to the filter function, you can write code that prints a full stack trace of the crash, including symbols and line numbers.
Make sure your build configuration emits debug symbols for release builds as well. They can only help, and they don't slow your application down (though they may make it bigger).
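Below is a minimal sketch of the SetUnhandledExceptionFilter() approach, assuming a Windows build. The names FormatCrashLine, LogCrashFilter and InstallCrashFilter are hypothetical, and so is the log file name crash.log. The sketch only records the exception code and faulting address; it does not do a DbgHelp stack walk. The formatting helper is kept portable so it can be checked on its own.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Portable part: turns an exception code and faulting address into one log line.
std::string FormatCrashLine(unsigned long code, std::uintptr_t address) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "exception 0x%08lX at 0x%016llX", code,
                  static_cast<unsigned long long>(address));
    return std::string(buf);
}

#ifdef _WIN32
#include <windows.h>

// Called by Windows for any exception that no handler caught.
static LONG WINAPI LogCrashFilter(EXCEPTION_POINTERS* info) {
    const EXCEPTION_RECORD* rec = info->ExceptionRecord;
    const std::string line = FormatCrashLine(
        rec->ExceptionCode,
        reinterpret_cast<std::uintptr_t>(rec->ExceptionAddress));
    // The process is already in a bad state, so keep the work minimal.
    if (std::FILE* f = std::fopen("crash.log", "a")) {
        std::fprintf(f, "%s\n", line.c_str());
        std::fclose(f);
    }
    return EXCEPTION_EXECUTE_HANDLER; // terminate without the error dialog
}

// Call once, early in main().
void InstallCrashFilter() { SetUnhandledExceptionFilter(LogCrashFilter); }
#endif
```

Windows skips the top-level filter while a debugger is attached, so test it from a plain command-line launch. You can then match the logged address against the module's load address and the release PDB to find the faulting line.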

If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.
Then do that.
A single __try block around the whole program (and around the entry point of each worker thread) is the recommended approach. It lets you write out a crash dump and make an error report before exiting. There's not much recovery you can do with SEH, because the exceptions don't carry enough information to usefully distinguish different failures. Storing the whole program state and pulling it into a debugger is very useful, though.
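Here is a sketch of the single-__try approach, assuming MSVC and dbghelp.lib. RunMainLoop stands in for the application's existing loop. MakeDumpFileName, WriteDump, DumpAndExecuteHandler and GuardedMain are hypothetical names.

A few points about how it is arranged:

- The dump is written from the __except filter expression, which runs before the stack is unwound.
- GuardedMain holds no C++ objects with destructors, which MSVC requires of a function that uses __try.

```cpp
#include <string>

// Portable part: hypothetical naming scheme for dump files.
std::string MakeDumpFileName(unsigned long processId, unsigned long tick) {
    return "crash_" + std::to_string(processId) + "_" + std::to_string(tick) + ".dmp";
}

#if defined(_MSC_VER)
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Writes a minidump for the faulting thread into the current working directory.
static void WriteDump(EXCEPTION_POINTERS* info) {
    const std::string name = MakeDumpFileName(GetCurrentProcessId(), GetTickCount());
    HANDLE file = CreateFileA(name.c_str(), GENERIC_WRITE, 0, nullptr,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return;
    MINIDUMP_EXCEPTION_INFORMATION mei;
    mei.ThreadId = GetCurrentThreadId();
    mei.ExceptionPointers = info;
    mei.ClientPointers = FALSE;
    MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                      MiniDumpNormal, &mei, nullptr, nullptr);
    CloseHandle(file);
}

// Filter expression: runs while the faulting stack still exists.
static int DumpAndExecuteHandler(EXCEPTION_POINTERS* info) {
    WriteDump(info);
    return EXCEPTION_EXECUTE_HANDLER;
}

void RunMainLoop(); // hypothetical: the application's existing main loop

// No C++ objects with destructors live here, so __try is allowed.
int GuardedMain() {
    __try {
        RunMainLoop();
    } __except (DumpAndExecuteHandler(GetExceptionInformation())) {
        return 1; // state is suspect after an SEH exception: exit, don't resume
    }
    return 0;
}
#endif
```

Worker threads would wrap their entry functions the same way. Writing the dump from inside the crashed process is the simplest option, not the most robust one. It is usually enough to get a stack trace in Visual Studio or WinDbg.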
Note: some video drivers raise SEH exceptions that they also catch themselves. Perhaps some of that logic expects more than one SEH scope to be installed, which your __try block provided.

Related

Why won't my code segfault on Windows 7?

This is an unusual question to ask but here goes:
In my code, I accidentally dereference NULL somewhere. But instead of the application crashing with a segfault, it seems to stop execution of the current function and just return control back to the UI. This makes debugging difficult because I would normally like to be alerted to the crash so I can attach a debugger.
What could be causing this?
Specifically, my code is an ODBC Driver (i.e., a DLL). My test application is ODBC Test (odbct32w.exe), which allows me to explicitly call the ODBC API functions in my DLL. When I call one of the functions that has a known segfault, ODBC Test doesn't crash. It simply returns control to the UI without printing the result of the function call. I can then call any function in my driver again.
I do know that technically the application calls the ODBC driver manager which loads and calls the functions in my driver. But that is beside the point as my segfault (or whatever is happening) causes the driver manager function to not return either (as evidenced by the application not printing a result).
One of my co-workers with a similar machine experiences this same problem while another does not but we have not been able to determine any specific differences.
Windows has non-portable language extensions (known as "SEH") which allow you to catch page faults and segmentation violations as exceptions.
There are parts of the OS libraries (particularly inside the OS code that processes some window messages, if I remember correctly) that have a __try block and let your code continue to run even in the face of such catastrophic errors. Likely you are being called inside one of these __try blocks. Sad but true.
Check out this blog post, for example: The case of the disappearing OnLoad exception – user-mode callback exceptions in x64
Update:
I find it kind of weird the kind of ideas that are being attributed to me in the comments. For the record:
I did not claim that SEH itself is bad. I said that it is "non-portable", which is true. I also claimed that using SEH to ignore STATUS_ACCESS_VIOLATION in user-mode code is "sad". I stand by this. If I had the nerve to do this in new code and you were reviewing it, I would hope you'd yell at me, just as if I had written catch (...) { /* Ignore this! */ }. It's a bad idea. It's especially bad for access violations, because getting an AV typically means your process is in a bad state and you shouldn't continue execution.
I did not argue that the existence of SEH means that you must swallow all errors. Of course SEH is a general mechanism and not to blame for every idiotic use of it. What I said was that some Windows binaries swallow STATUS_ACCESS_VIOLATION when calling into a function pointer, which is a true and observable fact, and that this is less than pretty. Note that they may have historical reasons or extenuating circumstances to justify this. Hence "sad but true."
I did not inject any "Windows vs. Unix" rhetoric here. A bad idea is a bad idea on any platform. Trying to recover from SIGSEGV on a Unix-type OS would be equally sketchy.
Dereferencing a NULL pointer is undefined behavior, which can produce almost anything: a segfault, a letter to the IRS, or a post to Stack Overflow :)
Windows 7 also has its Fault Tolerant Heap (FTH), which sometimes does such things. In my case it was also a NULL dereference. If you develop on Windows 7, you really want to turn it off!
What is Windows 7's Fault Tolerant Heap?
http://msdn.microsoft.com/en-us/library/dd744764%28v=vs.85%29.aspx
Read about the different kinds of exception handlers here -- they don't catch the same kind of exceptions.
Attach your debugger to all the apps that might call your DLL. In the Debug > Exceptions menu, turn on the option to break when an exception is thrown, not just when it is unhandled.
ODBC is mostly (if not entirely) COM, so unhandled exceptions will cause issues. These could show up as the ODBC function exiting strangely or, at worst, hanging and never returning.

Why do certain things never crash with the debugger on?

My application uses the GLU tessellator (GLUtesselator) to tessellate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I start debugging in VS. I found this explanation, which is basically my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like Windows does with processes spawned by a debugger, it provides to your program a debug heap, that is initialized to the 0xCD pattern. Probably somewhere you use some uninitialized area of memory from the heap as a pointer and you dereference it; with the two debug heaps you get away with it for some reason (maybe because at address 0xbaadf00d and 0xcdcdcdcd there's valid allocated memory), but with the "normal" heap (which is often initialized to 0) you get an access violation, because you dereference a NULL pointer.
The problem is that the crash occurs in GLU32.dll, and I have no way to find out why it's sometimes trying to dereference a null pointer. It seems to do this when my polygons get fairly large and have lots of points. What can I do?
Thanks
It's a fact of life that programs sometimes behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, so race conditions often show up less frequently in a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
Visual Studio can set a breakpoint on accesses to a particular memory address, and you could try this. It may slow your program down significantly more than a variable breakpoint.
but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why, but while debugging in Visual Studio a program can sometimes get away with accessing memory regions that would crash it without the debugger. Sometimes 0xcdcdcdcd and 0xbaadf00d have nothing to do with it; accessing certain addresses just happens not to cause problems. When this happens, you'll need to find other ways of narrowing down the problem.
What can I do?
Possible solutions:
Install an exception handler in your program (_set_se_translator, if I remember correctly). On an access violation, try MiniDumpWriteDump. Debug the dump later using Visual Studio (as far as I know, crash dump debugging is not available in the Express edition) or WinDbg.
Use just-in-time debuggers. The non-Express editions of Visual Studio have this feature. There are probably alternatives.
Write a custom memory manager. It overrides new/delete, provides malloc/free alternatives (if you use them), grabs a large chunk of memory and locks all unused memory with VirtualProtect. In this case every invalid access will cause a crash, even in debug mode. You'll need a lot of memory for such a memory manager, because each block has to be page-aligned to be locked.
Add excessive logging to all suspicious function calls. Dump a lot of text/debug information into a file (or stderr): parameter values, arrays, everything you suspect could be related to the crash. Flush after every write to the file, otherwise some information will be lost during the crash. This way you'll be able to guess what happened before the program crashed.
Try debugging the release build. You should be able to do it to some extent if you enable "debug information" for the release build in the project settings.
Try switching "Basic Runtime Checks" and "Buffer Security Check" on/off in the project properties (Configuration Properties -> C/C++ -> Code Generation).
Try to find some kind of external tool, something like valgrind or BoundsChecker. In my experience, though, the custom memory manager approach is more reliable than that. That really depends on the problem.
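The "flush after every write" advice can be sketched portably with C stdio. The CrashLog class name and its interface are hypothetical. fflush() hands the buffered data to the operating system. That is enough for it to survive a process crash, though not a power loss.

```cpp
#include <cstdarg>
#include <cstdio>

// Minimal crash-friendly log: one line per call, flushed immediately so the
// last lines before a crash are not lost in the C runtime's buffer.
class CrashLog {
public:
    explicit CrashLog(const char* path) : file_(std::fopen(path, "a")) {}
    ~CrashLog() { if (file_) std::fclose(file_); }
    CrashLog(const CrashLog&) = delete;
    CrashLog& operator=(const CrashLog&) = delete;

    bool ok() const { return file_ != nullptr; }

    // printf-style formatting; a newline is appended automatically.
    void write(const char* fmt, ...) {
        if (!file_) return;
        va_list args;
        va_start(args, fmt);
        std::vfprintf(file_, fmt, args);
        va_end(args);
        std::fputc('\n', file_);
        std::fflush(file_); // flush after every write
    }

private:
    std::FILE* file_;
};
```

Usage: `log.write("polygon %d has %d points", id, count);` before each suspicious call. If the program dies, the file ends at the last call that completed.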
A link to an earlier question and two thoughts.
First off, you may want to look at a previous question about valgrind substitutes for Windows. It has lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At worst you're just kicking the can down the road. There's still corruption, but it's not evident from the way you're running the program. When you ship, you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.

C++: Where to start when my application crashes at random places?

I'm developing a game and when I do a specific action in the game, it crashes.
So I started debugging and saw that my application crashes at simple C++ statements like if, return, and so on. Each time I rerun it, it crashes at one of three lines, seemingly at random, and it never succeeds.
line 1:
if (dynamic) { ... } // dynamic is a bool member of my class
line 2:
return m_Fixture; // a line of the Box2D physical engine. m_Fixture is a pointer.
line 3:
return m_Density; // The body of a simple getter for an integer.
I get no errors from the app or the OS...
Are there hints, tips or tricks to debug more efficient and get known what is going on?
That's why I love Java...
Thanks
Random crashes like this are usually caused by stack corruption, since these are branching instructions and thus are sensitive to the condition of the stack. These are somewhat hard to track down, but you should run valgrind and examine the call stack on each crash to try and identify common functions that might be the root cause of the error.
Are there hints, tips or tricks to debug more efficient and get known what is going on?
Run the game in a debugger. At the point of the crash, check the values of all arguments, using either the Visual Studio watch window or gdb. Use the call stack to check the parent routines, and try to think about what could have gone wrong.
In suspicious routines (ones potentially related to the crash), consider one of the following:
- dumping all arguments to stderr (if you're using libSDL or are on *nix-like systems);
- writing a log file;
- sending duplicates of all error messages through OutputDebugString (on Windows). This makes them visible in the Output window in Visual Studio or in a debugger.
You can also write "traces" (log("function %s was called", __FUNCTION__)).
If you can't debug immediately, produce core dumps on crash. On Windows this can be done with MiniDumpWriteDump. On Linux, core dumps are enabled through system settings (for example ulimit -c). Core dumps can be loaded into a debugger. I'm not sure if VS Express can deal with them on Windows, but you can still debug them using WinDbg.
If the crash happens inside a member function, check the this pointer. It could be invalid or null.
If the bug is truly evil (for example, elusive stack corruption in a multithreaded app that leads to a delayed crash), write a custom memory manager. It should:
- override new/delete;
- provide an alternative to malloc (if your app for some reason uses it, which may be possible);
- lock all unused memory using VirtualProtect (Windows) or an OS-specific alternative.
Every potentially dangerous operation will then crash the app instantly. If you have a just-in-time debugger, this lets you debug the problem and find the dangerous routine right away. I prefer such a "custom memory manager" to BoundsChecker and the like, since in my experience it was more useful. As an alternative you could try valgrind, which is not available on Windows. Note that if your app allocates memory very frequently, you'll need a large amount of RAM to lock every unused memory block. To be locked, a block must be PAGE_SIZE bytes big.
In areas where you need a sanity check, either use ASSERT or write a routine that crashes the application if some condition isn't met, by throwing a std::exception with a meaningful message. In my opinion the second is the better solution.
If you've identified a problematic routine, walk through it using the debugger's step into/step over. Watch the arguments.
If you've identified a problematic routine but can't debug it directly for whatever reason, dump all variables to stderr or a log file (fprintf or iostreams, your choice) after every statement in that routine. Then analyze the output and think about how it could have happened. Make sure to flush the log file after every write, or you might miss the data right before the crash.
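Here is a minimal sketch of the guard-page idea. It uses POSIX mmap/mprotect as the "OS-specific alternative" to VirtualProtect; on Windows the equivalent calls would be VirtualAlloc and VirtualProtect with PAGE_NOACCESS. GuardedAlloc and GuardedFree are hypothetical names, and wiring them into operator new/delete is left out.

How the two functions work:

- GuardedAlloc places each block so that it ends exactly at an inaccessible page.
- GuardedFree revokes access to the block instead of unmapping it.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

namespace {
std::size_t PageSize() { return static_cast<std::size_t>(sysconf(_SC_PAGESIZE)); }
std::size_t DataPages(std::size_t size) { return (size + PageSize() - 1) / PageSize(); }
} // namespace

// Places the block so that it ends exactly where an inaccessible guard page
// begins: writing even one byte past the end faults immediately.
void* GuardedAlloc(std::size_t size) {
    const std::size_t page = PageSize();
    const std::size_t data = DataPages(size) * page;
    void* base = mmap(nullptr, data + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + data;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return nullptr;
    }
    return guard - size;
}

// "Frees" by revoking all access instead of unmapping, so a later
// use-after-free also faults. This deliberately leaks address space.
void GuardedFree(void* p, std::size_t size) {
    if (!p) return;
    const std::size_t data = DataPages(size) * PageSize();
    char* base = static_cast<char*>(p) + size - data;
    mprotect(base, data, PROT_NONE);
}
```

This design has several costs and limits:

- It costs at least two pages of address space per allocation and never reuses freed memory, which is why the answers above warn that it needs a lot of RAM.
- The caller must pass the original size to GuardedFree; a real manager would record it.
- Underruns before the start of a block are not caught.
- The returned pointer is only aligned to N when size is a multiple of N, so a real allocator would round sizes up.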
In general you should be happy that app crashes somewhere. Crash means a bug you can quickly find using debugger and exterminate. Bugs that don't crash the program are much more difficult (example of truly complex bug: given 100000 values of input, after few hundreds of manipulations with values, among thousands of outputs, app produces 1 absolutely incorrect result, which shouldn't have happened at all)
That's why I love Java...
Excuse me, but if you can't deal with the language, it is entirely your fault. If you can't handle the tool, either pick another one or improve your skills. It is possible to make games in Java, by the way.
These are mostly due to stack corruption, but heap corruption can also affect programs in this way.
Stack corruption occurs most of the time because of "off by one errors".
Heap corruption occurs because new/delete are not handled carefully, for example a double delete.
Basically, the overflow/corruption overwrites some important data (such as a return address or a pointer). Much, much later on, when the program uses that data, it crashes.
I generally like to take a second to step back and think through the code, trying to catch any logic errors.
You might try commenting out different parts of the code and seeing whether it affects how the program behaves.
Besides those two things you could try using a debugger like Visual Studio or Eclipse etc...
Lastly you could try to post your code and the error you are getting on a website with a community that knows programming and could help you work through the error (read: stackoverflow)
Crashes / seg faults usually happen in one of two ways. Either you access a memory location that your program is not allowed to access, or you access a memory location in a way that is not allowed (for example, writing to a read-only location).
There are many memory analyzer tools, for example I use Valgrind which is really great in telling what the issue is (not only the line number, but also what's causing the crash).
There are no simple C++ statements. An if is only as simple as the condition you evaluate. A return is only as simple as the expression you return.
You should use a debugger and/or post some of the crashing code. Can't be of much use with "my app crashed" as information.
I had problems like this before. I was trying to refresh the GUI from different threads.
If the if statements involve dereferencing pointers, you're almost certainly corrupting the stack (this explains why an innocent return 0 would crash...)
This can happen, for instance, by going out of bounds in an array (you should be using std::vector!), trying to strcpy a char[]-based string missing the ending '\0' (you should be using std::string!), passing a bad size to memcpy (you should be using copy-constructors!), etc.
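Here is a small sketch of the standard-library replacements that answer recommends; ReadSample and MakeLabel are hypothetical. Note that std::vector's operator[] is just as unchecked as a raw array in a release build. It is at() that adds the bounds check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// .at() checks the index and throws std::out_of_range instead of silently
// reading (or, for writes, overwriting) whatever lies past the buffer.
int ReadSample(const std::vector<int>& samples, std::size_t index) {
    return samples.at(index);
}

// std::string manages its own length and terminator, so building a label
// cannot overflow a fixed-size char[] the way strcpy/strcat can.
std::string MakeLabel(const std::string& prefix, int id) {
    return prefix + "_" + std::to_string(id);
}
```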
Try to figure out a way to reproduce it reliably, then place a watch on the corrupted pointer. Run through the code line-by-line until you find the very line that corrupts the pointer.
Look at the disassembly. Almost any C/C++ debugger will be happy to show you the machine code and the registers where the program crashed. The registers include the Instruction Pointer (EIP or RIP on x86/x64) which is where the program was when it stopped. The other registers usually have memory addresses or data. If the memory address is 0 or a bad pointer, there is your problem.
Then you just have to work backward to find out how it got that way. Hardware breakpoints on memory changes are very helpful here.
On a Linux/BSD/Mac, using GDB's scripting features can help a lot here. You can script things so that after the breakpoint is hit 20 times it enables a hardware watch on the address of array element 17. Etc.
You can also write debugging into your program. Use the assert() function. Everywhere!
Use assert to check the arguments to every function. Use assert to check the state of every object before you exit the function. In a game, assert that the player is on the map, that the player has health between 0 and 100, assert everything that you can think of. For complicated objects write verify() or validate() functions into the object itself that checks everything about it and then call those from an assert().
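Here is a sketch of the assert-plus-validate() pattern. It uses a hypothetical Player with a map position and health between 0 and 100. The checks disappear when NDEBUG is defined, as it usually is in release builds. For crashes that only happen in release builds, you may therefore also need a check that always runs.

```cpp
#include <cassert>

// Hypothetical game object used only to illustrate the pattern.
struct Player {
    int x = 0;
    int y = 0;
    int health = 100;

    // Checks every invariant the rest of the code relies on.
    bool validate(int mapWidth, int mapHeight) const {
        return x >= 0 && x < mapWidth && y >= 0 && y < mapHeight &&
               health >= 0 && health <= 100;
    }
};

void ApplyDamage(Player& p, int amount, int mapWidth, int mapHeight) {
    assert(amount >= 0);                     // check the arguments on entry
    assert(p.validate(mapWidth, mapHeight)); // check the object on entry
    p.health = amount < p.health ? p.health - amount : 0;
    assert(p.validate(mapWidth, mapHeight)); // check the object before leaving
}
```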
Another way to build debugging into the program is to have it break into the debugger itself. On Linux, raise a signal (for example raise(SIGTRAP)). On Windows, execute an int 3 instruction (the __debugbreak() intrinsic in Visual C++). You can then write temporary code that checks whether the program is on iteration 1117321 of the main loop. That is useful if the bug always happens at 1117322. The program runs much faster this way than with a conditional debugger breakpoint.
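Here is a sketch of breaking into the debugger from code. It assumes raise(SIGTRAP) on POSIX systems and the __debugbreak() intrinsic on Visual C++. The BREAK_INTO_DEBUGGER macro, kBreakIteration, IsBreakIteration and MainLoopTick are hypothetical. The iteration check lives in its own function so it can be tested without actually trapping.

```cpp
#include <csignal>

#if defined(_MSC_VER)
#include <intrin.h>
#define BREAK_INTO_DEBUGGER() __debugbreak()
#else
#define BREAK_INTO_DEBUGGER() std::raise(SIGTRAP)
#endif

// Hypothetical: the iteration just before the one where the bug shows up.
constexpr unsigned long kBreakIteration = 1117321;

constexpr bool IsBreakIteration(unsigned long iteration) {
    return iteration == kBreakIteration;
}

void MainLoopTick(unsigned long iteration) {
    if (IsBreakIteration(iteration)) {
        // Without a debugger attached this ends the process
        // (or starts the just-in-time debugger, if one is registered).
        BREAK_INTO_DEBUGGER();
    }
    // ... the game's normal per-iteration work would go here ...
}
```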
Some tips:
- run your application under a debugger, with the symbol files (PDB) together.
- How to set Visual Studio as the default post-mortem debugger?
- set default debugger for WinDbg Just-in-time Debugging
- check memory allocations Overriding new and delete, and Overriding malloc and free
One other trick: turn off code optimization and see if the crash points make more sense. Optimization is allowed to float little bits of your code to surprising places; mapping that back to source code lines can be less than perfect.
Check pointers. At a guess, you're dereferencing a null pointer.
I've found 'random' crashes when there is some reference to a deleted object. The freed memory is not necessarily overwritten right away, so in many cases you don't notice it and the program works correctly. It crashes later, once the memory has been reused and is no longer valid.
JUST FOR DEBUGGING PURPOSES, try commenting out some suspicious 'deletes'. Then, if it doesn't crash anymore, there you are.
Use the GNU Debugger.
Refactoring.
Scan all the code, make it clearer if not clear at first read, try to understand what you wrote and immediately fix what seems incorrect.
You'll certainly discover the problem(s) this way and fix a lot of other problems too.

How can I guarantee catching a EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Before Windows XP it was generally impossible (or at least harder) to trap stack overflows. Starting with XP, you can install a vectored exception handler. It gets a chance at a stack overflow before any stack-based (structured exception) handlers, and that is exactly why it helps: structured exception handlers are stack-based.
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C runtime exit() function, or possibly called the Windows API TerminateProcess() directly. That may or may not have something to do with your exception handlers. Maybe something in the exception handling logic has a re-entrancy check and arbitrarily decides to exit() if it's re-entered.
My suggestion is to patch your executables so that exit() hits an INT 3 breakpoint. If exit() is statically linked, put the INT 3 at its entry point; if it's dynamically linked, patch up the import. Also patch up any imports of kernel32!TerminateProcess to call DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown too, so you'll have to filter out the false alarms. But if you can get the call stack for the case where it's just about to go poof, you should have what you need.
EDIT ADD: Just writing your own version of exit() and linking it in instead of the CRT version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
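That answer describes checking the stack pointer itself. The sketch below swaps in a simpler, portable variant: it counts recursion depth and throws once a limit is exceeded. A runaway recursion then becomes an ordinary C++ exception before the stack actually overflows, provided the limit is low enough. RecursionGuard, the limit of 10000 and Walk are hypothetical. The limit has to be tuned to the real frame sizes and thread stack size.

```cpp
#include <stdexcept>

// Counts how deeply guarded functions are nested on the current thread and
// throws once the limit is exceeded.
class RecursionGuard {
public:
    explicit RecursionGuard(int limit) {
        if (++depth_ > limit) {
            --depth_; // the destructor will not run if the constructor throws
            throw std::runtime_error("recursion limit exceeded");
        }
    }
    ~RecursionGuard() { --depth_; }
    RecursionGuard(const RecursionGuard&) = delete;
    RecursionGuard& operator=(const RecursionGuard&) = delete;

    static int depth() { return depth_; }

private:
    static thread_local int depth_; // one counter per thread
};

thread_local int RecursionGuard::depth_ = 0;

// Hypothetical recursive routine; in real code a corner case would recurse forever.
int Walk(int n) {
    RecursionGuard guard(10000);
    return n <= 0 ? 0 : 1 + Walk(n - 1);
}
```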
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyway. Optimization just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.

Program only crashes as release build -- how to debug?

I've got a "Schroedinger's Cat" type of problem here. My program (actually the test suite for my program, but a program nonetheless) is crashing, but only when built in release mode, and only when launched from the command line. Through caveman debugging (i.e., nasty printf() messages all over the place), I have determined the test method where the code is crashing. Unfortunately, the actual crash seems to happen in some destructor, since the last trace messages I see are in other destructors, which execute cleanly.
When I attempt to run this program inside of Visual Studio, it doesn't crash. Same goes when launching from WinDbg.exe. The crash only occurs when launching from the command line. This is happening under Windows Vista, btw, and unfortunately I don't have access to an XP machine right now to test on.
It would be really nice if I could get Windows to print out a stack trace, or something other than simply terminating the program as if it had exited cleanly. Does anyone have any advice as to how I could get some more meaningful information here and hopefully fix this bug?
Edit: The problem was indeed caused by an out-of-bounds array, which I describe more in this post. Thanks everybody for your help in finding this problem!
In 100% of the cases I've seen or heard of, where a C or C++ program runs fine in the debugger but fails when run outside, the cause has been writing past the end of a function local array. (The debugger puts more on the stack, so you're less likely to overwrite something important.)
When I have encountered problems like this before, they have generally been due to variable initialization. In debug mode, variables and pointers may happen to start out as zero, but in release mode they hold whatever was left in that memory. Therefore, if you have code like this
int* p;
....
if (p != 0) { /* use *p */ }
and p happens to be 0 in debug mode, the code in the if is not executed. In release mode p contains an indeterminate value, which is unlikely to be 0, so the code is executed, often causing a crash.
I would check your code for uninitialized variables. This can also apply to the contents of arrays.
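Here is a sketch of the habit that prevents this class of bug: give every variable, member and array a value where it is declared. The Record type and FindFirstNegative are hypothetical. Compiler warnings help as well (for example GCC's -Wuninitialized and -Wmaybe-uninitialized, or MSVC's C4700), although they cannot see every path.

```cpp
#include <vector>

// Hypothetical record type: default member initializers mean no constructor
// path can leave a member indeterminate.
struct Record {
    bool active = false;
    int* data = nullptr;
    int count = 0;
    int history[4]{}; // value-initialized: every element is 0
};

// Locals get a value where they are declared, never just "int* p;".
int* FindFirstNegative(std::vector<int>& values) {
    int* p = nullptr;
    for (int& v : values) {
        if (v < 0) {
            p = &v;
            break;
        }
    }
    return p;
}
```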
No answer so far has tried to give a serious overview about the available techniques for debugging release applications:
Release and Debug builds behave differently for many reasons. Here is an excellent overview. Each of these differences might cause a bug in the Release build that doesn't exist in the Debug build.
The presence of a debugger may change the behavior of a program too, for both release and debug builds. See this answer. In short, at least the Visual Studio debugger uses the debug heap automatically when it launches a program. You can turn the debug heap off by setting the environment variable _NO_DEBUG_HEAP=1, either in your computer properties or in the project settings in Visual Studio. That might make the crash reproducible with the debugger attached.
More on debugging heap corruption here.
If the previous solution doesn't work, you need to catch the unhandled exception and attach a post-mortem debugger the instant the crash occurs. You can use, e.g., WinDbg for this. Details about the available post-mortem debuggers and their installation are on MSDN.
You can improve your exception handling code and if this is a production application, you should:
a. Install a custom termination handler using std::set_terminate
If you want to debug this problem locally, you could run an endless loop inside the termination handler and output some text to the console to notify you that std::terminate has been called. Then attach the debugger and check the call stack. Or you print the stack trace as described in this answer.
In a production application you might want to send an error report back home, ideally together with a small memory dump that allows you to analyze the problem as described here.
b. Use Microsoft's structured exception handling mechanism that allows you to catch both hardware and software exceptions. See MSDN. You could guard parts of your code using SEH and use the same approach as in a) to debug the problem. SEH gives more information about the exception that occurred that you could use when sending an error report from a production app.
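Here is a minimal sketch of item a., assuming a plain console program; OnTerminate and InstallTerminateHandler are hypothetical names. If terminate() was triggered by an uncaught exception, std::current_exception() usually still refers to it. The handler can therefore print its what() before aborting. For local debugging you could loop there instead and attach a debugger, as described above.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Terminate handler: report what we can, then abort (it must not return).
[[noreturn]] void OnTerminate() {
    std::fputs("std::terminate called\n", stderr);
    if (std::exception_ptr ex = std::current_exception()) {
        try {
            std::rethrow_exception(ex);
        } catch (const std::exception& e) {
            std::fprintf(stderr, "uncaught exception: %s\n", e.what());
        } catch (...) {
            std::fputs("uncaught non-std exception\n", stderr);
        }
    }
    std::abort();
}

void InstallTerminateHandler() { std::set_terminate(OnTerminate); }
```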
Things to look out for:
Array overruns - the visual studio debugger inserts padding which may stop crashes.
Race conditions - if you have multiple threads involved, a race condition may only show up when the application is executed directly.
Linking - is your release build pulling in the correct libraries.
Things to try:
Minidump - really easy to use (just look it up in MSDN) and will give you a crash dump that covers every thread. You just load the output into Visual Studio and it is as if you were debugging at the time of the crash.
You can set WinDbg as your postmortem debugger. This will launch the debugger and attach it to the process when the crash occurs. To install WinDbg for postmortem debugging, use the /I option (note it is capitalized):
windbg /I
More details here.
As to the cause, it's most probably an uninitialized variable, as the other answers suggest.
After many hours of debugging, I finally found the cause of the problem: it was indeed a buffer overflow, caused by a single-byte difference:
char *end = static_cast<char*>(attr->data) + attr->dataSize;
This is a fencepost error (off-by-one error) and was fixed by:
char *end = static_cast<char*>(attr->data) + attr->dataSize - 1;
The weird thing was, I put several calls to _CrtCheckMemory() around various parts of my code, and they always returned 1. I was able to find the source of the problem by placing "return false;" calls in the test case, and then eventually determining through trial-and-error where the fault was.
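The answer doesn't show how `end` was used, so the following is only an assumption about the shape of the bug. A common way for a one-past-the-end pointer to cause a single-byte overflow is a loop that compares with `<=` or dereferences `*end`; keeping the range half-open and comparing with `<` avoids it. The function name is hypothetical.

```cpp
#include <cstddef>

// Sums the bytes in the half-open range [data, data + size).
// `end` points one past the last byte: it is valid to form and compare,
// but never to dereference, so the loop condition must be `p < end`.
// Writing `p <= end` (or reading `*end`) touches one byte too many.
unsigned sum_bytes(const unsigned char* data, std::size_t size) {
    const unsigned char* end = data + size;
    unsigned total = 0;
    for (const unsigned char* p = data; p < end; ++p)
        total += *p;
    return total;
}
```

Subtracting 1 from `end`, as in the fix above, turns it into a pointer to the last byte, which is correct only if every loop using it also treats it as inclusive.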
Thanks everybody for your comments -- I learned a lot about windbg.exe today! :)
Even though you have built your exe as a release one, you can still generate PDB (Program database) files that will allow you to stack trace, and do a limited amount of variable inspection.
In your build settings there is an option to create the PDB files. Turn this on and relink. Then try running from the IDE first to see if you get the crash. If so, then great - you're all set to look at things. If not, then when running from the command line you can do one of two things:
Run EXE, and before the crash do an Attach To Process (Tools menu on Visual Studio).
After the crash, select the option to launch debugger.
When asked to point to PDB files, browse to find them. If the PDBs were put in the same output folder as your EXE or DLLs, they will probably be picked up automatically.
The PDBs provide a link to the source with enough symbol information to make it possible to see stack traces, variables, etc. You can inspect the values as normal, but be aware that you can get false readings, as the optimisation pass may mean things only appear in registers, or things happen in a different order than you expect.
NB: I'm assuming a Windows/Visual Studio environment here.
Crashes like this are very often caused by uninitialized variables: a debug build or a program launched from the IDE tends to see a predictable value in them (zeros, null, or a fixed fill pattern such as the 0xCC and 0xCD bytes used by the MSVC debug runtime), whereas when running natively you'll get whatever random rubbish the system happens to leave in memory.
Your error is therefore most likely something like using a pointer before it has been properly initialized. You get away with it in the IDE because it doesn't point anywhere dangerous, or because the value is caught by your error checking, but in release mode it does something nasty.
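One way to rule this class of bug out, sketched here with a hypothetical FrameState type, is to give every member a default initializer so the object starts in the same state in every build configuration. Compiler warnings help too (for example MSVC's C4700 and GCC's -Wuninitialized), though they cannot see every path.

```cpp
#include <cstddef>

// Hypothetical type for illustration. Default member initializers make every
// field well-defined in Debug and Release alike, instead of depending on
// whatever happens to be in memory.
struct FrameState {
    const float* vertices    = nullptr;
    std::size_t  vertexCount = 0;
    float        elapsed     = 0.0f;
    bool         dirty       = false;
};

// A null check like this only protects you if the pointer was initialized.
std::size_t safe_vertex_count(const FrameState& s) {
    return s.vertices ? s.vertexCount : 0;
}
```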
In order to have a crash dump that you can analyze:
Generate pdb files for your code.
Rebase your EXE and DLLs so that they always load at the same address.
Enable a post-mortem debugger such as Dr. Watson.
Look up the crash address using a tool such as CrashFinder.
You should also check out the tools in Debugging tools for windows.
You can monitor the application and see all the first-chance exceptions that occurred before your second-chance exception.
Hope it helps...
Sometimes this happens because you have wrapped an important operation inside the assert macro. As you may know, assert evaluates its expression only when NDEBUG is not defined, which usually means only in debug builds.
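A minimal sketch of the pitfall and the usual fix; init_subsystem and the counter are hypothetical stand-ins for whatever important operation was wrapped. The idea is to perform the operation unconditionally and assert only on its result. (Visual Studio's default Release configuration defines NDEBUG.)

```cpp
#include <cassert>

// Hypothetical initialization step with a side effect.
static int g_initCount = 0;
bool init_subsystem() { ++g_initCount; return true; }

void startup_buggy() {
    // BUG: with NDEBUG defined (typical Release), the whole expression is
    // compiled out, so init_subsystem() is never called.
    assert(init_subsystem());
}

void startup_fixed() {
    // Always do the work; only the check disappears in Release.
    bool ok = init_subsystem();
    assert(ok);
    (void)ok;  // avoids an "unused variable" warning when NDEBUG is defined
}
```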
A great way to debug an error like this is to enable optimizations for your debug build.
Once I had a problem where an app behaved similarly to yours. It turned out to be a nasty buffer overrun in sprintf. Naturally, it worked when run with a debugger attached. What I did was install an unhandled exception filter (SetUnhandledExceptionFilter) in which I simply blocked indefinitely (using WaitForSingleObject on an event that is never signaled, with a timeout value of INFINITE).
So you could do something along the lines of:
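#include <windows.h>

LONG WINAPI MyFilter(EXCEPTION_POINTERS * /*info*/)
{
    // Manual-reset event that is never signaled, so the wait blocks forever
    // and keeps the crashed process alive for a debugger to attach.
    HANDLE hEvt = ::CreateEventW(nullptr, TRUE, FALSE, nullptr);
    if (hEvt)
    {
        if (WAIT_FAILED == ::WaitForSingleObject(hEvt, INFINITE))
        {
            // log failure
        }
        ::CloseHandle(hEvt);
    }
    // Only reached if the wait failed: fall back to default handling.
    return EXCEPTION_CONTINUE_SEARCH;
}

// somewhere in your wmain/WinMain:
SetUnhandledExceptionFilter(MyFilter);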
I then attached the debugger after the bug had manifested itself (gui program stopped responding).
Then you can either take a dump and work with it later:
.dump /ma path_to_dump_file
Or debug it right away. The simplest way is to track where processor context has been saved by the runtime exception handling machinery:
s-d esp Range 1003f
The command searches the stack address space for CONTEXT record(s) within the given search length. I usually use something like 'l?10000'. Note: do not use unusually large numbers, as the record you're after is usually near the unhandled exception filter frame.
1003f is the combination of flags used to capture the processor state. On x86 it corresponds to CONTEXT_ALL (CONTEXT_FULL is 10007). Because ContextFlags is the first field of an x86 CONTEXT record, searching for this value finds saved contexts.
Your search would look similar to this:
0:000> s-d esp l1000 1003f
0012c160 0001003f 00000000 00000000 00000000 ?...............
Once you get results back, use the address in the cxr command:
.cxr 0012c160
This will take you to this new CONTEXT, exactly at the time of crash (you will get exactly the stack trace at the time your app crashed).
Additionally, use:
.exr -1
to find out exactly which exception had occurred.
Hope it helps.
With regard to your problems getting diagnostic information, have you tried using adplus.vbs as an alternative to WinDbg.exe? To attach to a running process, use
adplus.vbs -crash -p <process_id>
Or to start the application in the event that the crash happens quickly:
adplus.vbs -crash -sc your_app.exe
Full info on adplus.vbs can be found at: http://support.microsoft.com/kb/286350
Ntdll.dll with debugger attached
One little-known difference between launching a program from the IDE or WinDbg and launching it from the command line or desktop is that when it is launched with a debugger attached (i.e. from the IDE or WinDbg), ntdll.dll uses a different heap implementation that performs some extra validation on memory allocation and freeing.
You may read some relevant information in unexpected user breakpoint in ntdll.dll. One tool which might be able to help you identifying the problem is PageHeap.exe.
Crash analysis
You did not say what the "crash" you are experiencing is. Once the program crashes and offers to send the error information to Microsoft, you should be able to click on the technical information and check at least the exception code, and with some effort you can even perform a post-mortem analysis (see Heisenbug: WinApi program crashes on some computers for instructions).
Vista SP1 actually has a really nice crash dump generator built into the system. Unfortunately, it isn't turned on by default!
See this article:
http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx
The benefit of this approach is that no extra software needs to be installed on the affected system. Grip it and rip it, baby!
In my experience, these are mostly memory corruption issues.
For example:
char a[8];
memset(&a[0], 0, 16); // bug: writes 16 bytes into an 8-byte array
/* use array a to do something */
This may well appear to work normally in debug mode when you run the code.
But in release, it might crash.
For me, hunting down where memory goes out of bounds by hand is too tedious.
Using tools like Visual Leak Detector (Windows) or Valgrind (Linux) is a wiser choice.
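For the overrun in the example above, one sketch of a safer pattern (assuming C++11 or later) is to derive the byte count from the object itself and to use bounds-checked access where the cost is acceptable. The function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Deduce the array size from its type, so the byte count passed to memset
// can never exceed the buffer (unlike a hard-coded 16 for an 8-byte array).
template <std::size_t N>
void clear_buffer(char (&buf)[N]) {
    std::memset(buf, 0, sizeof buf);  // sizeof buf == N
}

// Bounds-checked read: std::array::at throws std::out_of_range on a bad
// index instead of silently reading past the end in a Release build.
char read_checked(const std::array<char, 8>& a, std::size_t i) {
    return a.at(i);
}
```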
I've seen a lot of right answers, but none of them helped me. In my case, SSE instructions were being used on unaligned memory. Take a look at your math library (if you use one), try disabling SIMD support, recompile, and try to reproduce the crash.
Example:
A project includes mathfu, and uses the classes with STL vector: std::vector< mathfu::vec2 >. Such usage will probably cause a crash at the time of the construction of mathfu::vec2 item since the STL default allocator does not guarantee required 16-byte alignment. In this case to prove the idea, one can define #define MATHFU_COMPILE_WITHOUT_SIMD_SUPPORT 1 before each include of the mathfu, recompile in Release configuration and check again.
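To check the alignment idea without a math library, here is a sketch with a hypothetical Vec4 type that requests 16-byte alignment, like a SIMD vector. Since C++17, std::allocator uses the aligned form of operator new for over-aligned types, so a std::vector of such a type is expected to be correctly aligned when compiled as C++17 or later; with older standards that guarantee is absent, which is the situation described above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a SIMD math type that needs 16-byte alignment.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Returns true if every element sits on its required alignment boundary.
bool all_aligned(const std::vector<Vec4>& v) {
    for (const Vec4& e : v)
        if (reinterpret_cast<std::uintptr_t>(&e) % alignof(Vec4) != 0)
            return false;
    return true;
}
```

If a check like this fails in your build, the allocator is a likely suspect, and an aligned allocator or a newer language standard is worth looking at.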
The Debug and RelWithDebInfo configurations worked well for my project, but not the Release one. The reason behind this behavior is probably because debugger processes allocation/deallocation requests and does some memory bookkeeping to check and verify the accesses to the memory.
I experienced the situation in Visual Studio 2015 and 2017 environments.
Something similar happened to me once with GCC. It turned out to be an overly aggressive optimization that was enabled only when building the final release and not during development.
Well, to tell the truth it was my fault, not GCC's, as I hadn't noticed that my code relied on that particular optimization not being done.
It took me a lot of time to trace it and I only came to it because I asked on a newsgroup and somebody made me think about it. So, let me return the favour just in case this is happening to you as well.
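The answer doesn't say which optimization was involved. A classic example (an assumption here, not necessarily the author's case) is signed integer overflow: because it is undefined behavior, an optimizer may fold a check such as `x + 1 < x` to false and remove it. A sketch of an overflow check that never overflows; the function names are hypothetical:

```cpp
#include <climits>

// Signed overflow is undefined behavior, so an optimizer may assume
// `x + 1 > x` always holds and delete a check written like this:
//     if (x + 1 < x) { /* may never be taken after optimization */ }
// Comparing against the limit first expresses the same intent safely.
bool increment_would_overflow(int x) {
    return x == INT_MAX;
}

int saturating_increment(int x) {
    return increment_would_overflow(x) ? x : x + 1;
}
```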
I've found this article useful for your scenario. ISTR the compiler options were a little out of date. Look around your Visual Studio project options to see how to generate PDB files for your release build, etc.
It's suspicious that it would happen outside the debugger and not inside; running in the debugger does not normally change the application behavior. I would check the environment differences between the console and the IDE. Also, obviously, compile release without optimizations and with debug information, and see if that affects the behavior. Finally, check out the post-mortem debugging tools other people have suggested here, usually you can get some clue from them.
Debugging release builds can be a pain due to optimizations changing the order in which lines of your code appear to be executed. It can really get confusing!
One technique to at least narrow down the problem is to use MessageBox() to display quick statements stating what part of the program your code has got to ("Starting Foo()", "Starting Foo2()"); start putting them at the top of functions in the area of your code that you suspect (what were you doing at the time when it crashed?). When you can tell which function, change the message boxes to blocks of code or even individual lines within that function until you narrow it down to a few lines. Then you can start printing out the value of variables to see what state they are in at the point of crashing.
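A variation on this idea, swapped in here rather than taken from the answer, is to write breadcrumbs to a log file and flush after each one, so the last checkpoint survives the crash and no dialog interrupts an unattended soak test. A minimal portable sketch; breadcrumb and BREADCRUMB are hypothetical names:

```cpp
#include <cstdio>

// Appends one "reached <function>:<line>" record and flushes immediately,
// so the last checkpoint is handed to the OS even if the process dies next.
// Returns the number of characters written, or a negative value on error.
int breadcrumb(std::FILE* log, const char* func, int line) {
    if (log == nullptr) return -1;
    int written = std::fprintf(log, "reached %s:%d\n", func, line);
    std::fflush(log);
    return written;
}

// Convenience macro: records the enclosing function name and source line.
#define BREADCRUMB(log) breadcrumb((log), __func__, __LINE__)
```

Open the log once at startup (for example with std::fopen in append mode) and place BREADCRUMB(log) calls through the suspect area; after a crash, the last line in the file shows how far execution got.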
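Try using _CrtCheckMemory() to see what state the allocated memory is in.
If everything is consistent, _CrtCheckMemory returns TRUE; otherwise it returns FALSE. Note that it checks only the debug heap: when _DEBUG is not defined, calls to it are removed during preprocessing, so in a Release build it always reports success.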
You might run your software with Global Flags enabled (Look in Debugging Tools for Windows). It will very often help to nail the problem.
Make your program generate a mini dump when the exception occurs, then open it up in a debugger (for example, in WinDbg). The key functions to look at: MiniDumpWriteDump, SetUnhandledExceptionFilter
Here's a case I had that somebody might find instructive. It only crashed in release in Qt Creator, not in debug. I was using .ini files (I prefer apps that can be copied to other drives over ones that lose their settings if the Registry gets corrupted). This applies to any app that stores its settings under its own directory tree: if the debug and release builds are in different directories, a setting can differ between them, too. I had a preference checked in one that wasn't checked in the other, and it turned out to be the source of my crash. Good thing I found it.
I hate to say it, but I only diagnosed the crash with MS Visual Studio Community Edition: after installing VS, I let my app crash in Qt Creator and chose to open it in Visual Studio's debugger. While my Qt app had no symbol info, it turned out that the Qt libraries had some. That led me to the offending line, since I could see which method was being called. (Still, I think Qt is a convenient, powerful, cross-platform LGPL framework.)
I had this problem too. In my case, the Release configuration had msvcrtd.dll (the debug C runtime) in its linker settings. We removed it and the issue was resolved.
Alternatively, adding /NODEFAULTLIB to the linker command line arguments also resolved the issue.
I'll add another possibility for future readers: check whether you're logging to stderr or stdout from an application with no console window (i.e. you linked with /SUBSYSTEM:WINDOWS). This can crash.
I had a GUI application where I logged to both stderr and a file in both debug and release, so logging was always enabled. I created a console window in debug for easy viewing of the logs, but not in release. However, if the VS debugger is attached to the release build, it'll automatically pipe stderr to the VS output window. So only in release with no debugger did it actually crash when I wrote to stderr.
To make things worse, printf debugging obviously didn't work, and I didn't understand why until I'd tracked down the root cause (by painfully bisecting the codebase, inserting an infinite loop in various spots).
I had this error, and Visual Studio crashed even when I tried to clean my project. So I deleted the .obj files manually from the Release directory, and after that it built just fine.
I agree with Rolf. Because reproducibility is so important, you shouldn't have a non-debug mode. All your builds should be debuggable. Having two targets to debug more than doubles your debugging load. Just ship the "debug mode" version, unless it is unusable. In which case, make it usable.