C++ Program freezes esoterically - c++

I wrote a C++ CLI program with MS VC++ 2010 and GCC 4.2.1 (for Mac OS X 10.6 64 bit, in Eclipse).
The program works well under GCC+OS X and most times under Windows. But sometimes it silently freezes. The command line cursor keeps blinking, but the program refuses to continue working.
The following configurations work well:
GCC with 'Release' and 'Debug' configuration.
VC++ with 'Debug' configuration
The error only occurs in the configuration 'VC++ with 'Release' configuration' under Win 7 32 bit and 64 bit. Unfortunately this is the configuration my customer wants to work with ;-(
I already checked my program high and low and fixed all memory leaks. But this error still occurs. Do you have any ideas how I can find the error?

Use logging to narrow down which part of code the program is executing when it crashes. Keep adding log until you narrow it down enough to see the issue.
Enable debug information in the release build (both compiler and linker); many variables won't show up correctly, but it should at least give you sensible backtrace (unless the freeze is due to stack smashing or stack overflow), which is usually enough if you keep functions short and doing just one thing.
Memory leaks can't cause freezes. Other forms of memory misuse are however quite likely to. In my experience overrunning a buffer often cause freezes when that buffer is freed as the free function follows the corrupted block chains. Also watch for any other kind of Undefined Behaviour. There is a lot of it in C/C++ and it usually behaves as you expect in debug and completely randomly when optimized.
Try building and running the program under DUMA library to check for buffer overruns. Be warned though that:
It requires a lot of memory. I mean easily like thousand times more. So you can only test on simple cases.
Microsoft headers tend to abuse their internal allocation functions and mismatch e.g. regular malloc and internal __debug_free (or the other way 'round). So might get a few cases that you'll have to carefully workaround by including those system headers into the duma one before it redefines the functions.
Try building the program for Linux and run it under Valgrind. That will check more problems in addition to buffer overruns and won't use that much memory (only twice as normal, but it is slower, approximately 20 times).

Debug versions usually initialize all allocated memory (MSVC fills them with 0xCD with the debug configuration). Maybe you have some uninitialized values in your classes, with the GCC configurations and MSVC Debug configuration it gets a "lucky" value, but in MSVC Release it doesn't.
Here are the rest of the magic numbers used by MSVC.
So look for uninitialized variables, attributes and allocated memory blocks.

Thank you all, especially Cody Gray and MikMik, I found it!
As some of you recommended I told VS to generate debug information and disabled optimizations in the release configuration. Then I started the program and paused it. Alternatively I remotely attached to the running process. This helped me finding the region where the error was.
The reasons were infinite loops, caused by reads behind the boundaries of an array and a missing exclusion of an invalid case. Both led to unreachable stopping conditions at runtime. The esoteric part came from the fact, that my program uses some randomized values.
That's life...

Related

Debugging memory corruption on MinGW

I'm having some trouble with memory corruption in a rather large project to control some scientific hardware (ca. 6000 lines), and I'm not sure which is the best way/tool to solve the problem. The project uses Qt 4.8, and it's built with QtCreator and MinGW. The program works more or less, but I'm having some stability issues. Sometimes I get random crashes, but on some occasions, when I change the source code a little bit, the program will crash at exactly the same position (one which has worked previously). The position which it has picked this time looks like this:
char stages2[1024];
sprintf(stages2, "M-511.DD.LOAD\nNOSTAGE");
The second line gives a segmentation fault (SIGSEGV) when I run it in gdb - which tells me that I have some kind of problem with the program's memory, because I'm certain these two lines are correct. Also, the "crash site" changes depending on the exact source code; I've even seen crashes in Windows DLLs which Qt is using.
I've looked into a few options to find the cause of the problem, but I've run into some difficulties:
I've downloaded DUMA, but I just can't get it to compile on MinGW - I've had to change an include command in order to make the library, but now one of the test programs is failing. (Any hints or links to binaries, anyone?)
I've also tried Application Verifier, but when I run it it always stops at the same position, where a 3rd party DLL I'm using is leaking a handle. I'm reasonably sure this is not the cause of my problems, but I can't continue the debugging process because gdb always goes back to the same position (it only gets stuck there when I'm using Application verifier).
Finally, I've run my program with Dr. Memory, but it just crashes before reaching the main window, without giving me any useful outputs (the only thing I'm seeing is where Qt is apparently wasting some memory).
I'd be really grateful for some advice on what's the most promising method to finally get rid of this error.
Compile with optimization and -Wall (and perhaps other warning flags), check all warnings to make sure nothing fishy is going on.
Use tools like valgrind to check for memory management snafus.

calloc works when debugging, hangs when running

I have a c++ program which I'm running on a windows 7 64bit machine, using Eclipse as my IDE. I use mingw32 for a compiler.
The problem: When I debug the program using the gdb debugger it runs just fine and does what it needs to do. But when I run it without debugging, either from the command line or from within eclipse (using the same configuration as with the debug), it crashes.
I tried running the program from the command line and attaching to the process using the debugger, and what I saw is that it reaches the following line of code:
anc_map[ancestry].hap_array = (char**)calloc(anc_map[ancestry].nr_hap , sizeof(char*));
and just hangs (cpu is not working, and nothing happens although the program is still running).
The above line is actually called more than once, and the hanging occurs the second time it is called (it works the first time).
Any idea what can be the cause for this behavior?
Thanks,
Itamar.
Edit:
I realize that using calloc is a little old-fashioned, but since this is a legacy code I just need to modify a little, I'm trying to avoid doing major refactoring.
I've tried compiling the code and running on linux, and the problem does not occur there, so it has something to do with my configuration on the windows machine
First thing that comes to mind is whether anc_map[ancestry].nr_hap could be some bogus, probably huge, number. Probably because any of the variables got corrupt. I am not sure why it would get corrupt only without debugger, but it might be that the debugger affects where things are allocated and the corruption appears somewhere less harmful when debugging.
The other thing that comes to mind is, that if the program needs a lot of memory, the debugger might affect the 2-GB limit flag in Windows, so in one case there is enough memory and the other way you run out. I am, however, not sure how to change it with mingw32 compiler, as I only did it with the Microsoft one (/LARGEADDRESSAWARE option to Microsoft link and editbin). The reason is, that in some old software, they noticed people doing binary search like (whatever *)(((unsigned)begin + (unsigned)end)/2), which, besides being incorrect C, does not work if the pointers are above 2GB, because the calculation overflows. So for old software, written before more than 2GB was common, they limited the memory to 2GB and provided option to get more, which means 3GB on 32-bit Windows (the last 1GB maps kernel space to avoid swapping page tables on kernel entry and exit; linux does the same thing) and 4GB for 32-bit process on 64-bit windows (the kernel can be mapped above 4GB there).
Hm, but most likely it's actually corruption of the memory management metadata, because that's the usual case where memory allocation or deallocation functions just hang instead of returning an error. Again the debugger would cause some addresses to be different and the corruption to happen elsewhere.
In the first and last case the corruption would probably be always there, so you might have some luck trying to run it:
In linux under valgrind.
Under DUMA, but the Microsoft standard runtime library will try to resist replacing the memory allocation functions rather hard; I finally gave up when I found that IO streams use something like __debug_delete, but normal new. Or the other way around; I don't recall exactly.In either case one was the standard allocation function and the other was some their internal undocumented function. It will also use much more memory than usual, because each allocation will be at least 8kB. In Linux it's trivial, because GNU libc has special support for overriding memory allocation, but valgrind is superior there anyway.

Why do certain things never crash whith debugger on?

My application uses GLUTesselator to tesselate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I do start debugging in VS. I found this right here which is basically my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like
Windows does with processes spawned by
a debugger, it provides to your
program a debug heap, that is
initialized to the 0xCD pattern.
Probably somewhere you use some
uninitialized area of memory from the
heap as a pointer and you dereference
it; with the two debug heaps you get
away with it for some reason (maybe
because at address 0xbaadf00d and
0xcdcdcdcd there's valid allocated
memory), but with the "normal" heap
(which is often initialized to 0) you
get an access violation, because you
dereference a NULL pointer.
The problem is the crash occurs in GLU32.dll and I have no way to find out why its trying to dereference a null pointer sometimes. it seems to do this when my polygons get fairly large and have lots of points. What can I do?
Thanks
It's a fact of life that sometimes programs behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, and race conditions often happen less often in a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
Visual Studio can set a breakpoint on accesses to a particular memory address, you could try this (it may slow your program significantly more than a variable breakpoint).
but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why but while debugging in visual studio program sometimes can get away with accessing some memory regions that would crash it without debugger. I do not know exact reasons, though, but sometimes 0xcdcdcdcd and 0xbaadfood doesn't have anything to do with that. It is just accessing certain addresses doesn't cause problems. When this happens, you'll need to find alternative methods of guessing the problem.
What can I do?
Possible solutions:
Install exception handler in your program (_set_se_translator, if I remember correctly). On access violation try MinidumpWriteDump. Debug it later using Visual Studio (afaik, crash dump debugging is n/a in express edition), or using windbg.
Use just-in-time debuggers. Non-express edition of visual studio have this feature. There are probably alternatives.
Write custom memory manager (that'll override new/delete and will provide malloc/free alternatives (if you use them)) that will grab large chunk of memory, lock all unused memory with VirtualProtect. In this case all invalid access will cause crashes even in debug mode. You'll need a lot of memory for such memory manager, because to be locked, each block should be aligned to pages.
Add excessive logging to all suspicious function calls. Dump a lot of text/debug information into file (or stderr) - parameter values, arrays, everything you suspect could be related to crash, flush after every write to file, otherwise some info will be lost during the crash. This way you'll be able to guess what happened before program crashed.
Try debugging release build. You should be able to do it to some extent if you enable "debug information" for release build in project settings.
Try switching on/off "basic runtime checks" and "buffer security check" in project properties (configuration properties->c/c++->code genration).
Try to find some kind of external tool - something like valgrind or bounds checker. Although, to my expereinece, #3 is more reliable than that approach. Although that really depends on the problem.
A link to an earlier question and two thoughts.
First off you may want to look at a previous question about valgrind substitutes for windows. Lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At worst you're just kicking the can down the street, there's still corruption but it's not evident from the way you're running. When you ship you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.

Common reasons for bugs in release version not present in debug mode

What are the typical reasons for bugs and abnormal program behavior that manifest themselves only in release compilation mode but which do not occur when in debug mode?
Many times, in debug mode in C++ all variables are null initialized, whereas the same does not happen in release mode unless explicitly stated.
Check for any debug macros and uninitialized variables
Does your program uses threading, then optimization can also cause some issues in release mode.
Also check for all exceptions, for example not directly related to release mode but sometime we just ignore some critical exceptions, like mem access violation in VC++, but the same can be a issue at least in other OS like Linux, Solaris. Ideally your program should not catch such critical exceptions like accessing a NULL pointer.
A common pitfall is using an expression with side effect inside an ASSERT.
I've been bitten by a number of bugs in the past that have been fine in Debug builds but crash in Release builds. There are many underlying causes (including of course those that have already been summarised in this thread) and I've been caught out by all of the following:
Member variables or member functions in an #ifdef _DEBUG, so that a class is a different size in a debug build. Sometimes #ifndef NDEBUG is used in a release build
Similarly, there's a different #ifdef which happens to be only present in one of the two builds
The debug version uses debug versions of the system libraries, especially the heap and memory allocation functions
Inlined functions in a release build
Order of inclusion of header files. This shouldn't cause problems, but if you have something like a #pragma pack that hasn't been reset then this can lead to nasty problems. Similar problems can also occur using precompiled headers and forced includes
Caches: you may have code such as caches that only gets used in release builds, or cache size limits that are different
Project configurations: the debug and release configurations may have different build settings (this is likely to happen when using an IDE)
Race conditions, timing issues and miscellanous side-effects occurring as a result of debug only code
Some tips that I've accumulated over the years for getting to the bottom of debug/release bugs:
Try to reproduce anomalous behaviour in a debug build if you can, and even better, write a unit test to capture it
Think about what differs between the two: compiler settings, caches, debug-only code. Try to minimise those differences temporarily
Create a release build with optimisations switched off (so you're more likely to get useful data in the debugger), or an optimised debug build. By minimising the changes between debug and release, you're more likely to be able to isolate which difference is causing the bug.
Other differences might be:
In a garbage-collected language, the
collector is usually more aggressive
in release mode;
Layout of memory may
often be different;
Memory may be
initialized differently (eg could be
zeroed in debug mode, or re-used more
aggressively in release);
Locals may
be promoted to register values in release, which can
cause issues with floating point
values.
Yes!, if you have conditional compilation, there may be timing bugs (optimised release code verse, non-optimised debug code), memory re-use vs. debug heap.
It can, especially if you are in the C realm.
One cause could be that the DEBUG version may add code to check for stray pointers and somehow protect your code from crashing (or behave incorrectly). If this is the case you should carefully check warnings and other messages you get from your compiler.
Another cause could be optimization (which is normally on for release versions and off for debug). The code and data layout may have been optimized and while your debugging program just was, for example, accessing unused memory, the release version is now trying to access reserved memory or even pointing to code!
EDIT: I see other mentioned it: of course you might have entire code sections that are conditionally excluded if not compiling in DEBUG mode. If that's the case, I hope that is really debugging code and not something vital for the correctness of the program itself!
The CRT library functions behave differently in debug vs release (/MD vs /MDd).
For example, the debug versions often prefill buffers you pass to the indicated length to verify your claim. Examples include strcpy_s, StringCchCopy, etc. Even if the strings terminate earlier, your szDest better be n bytes long!
Sure, for example, if you use constructions like
#if DEBUG
//some code
#endif
You'd need to give a lot more information, but yes, it's possible. It depends what your debug version does. You may well have logging or extra checks in that that don't get compiled into a release version. These debug only code paths may have unintended side effects which change state or affect variables in strange ways. Debug builds usually run slower, so this may affect threading and hide race conditions. The same for straight forward optimisations from a release compile, it's possible (although unlikely these days) that a release compile may short circuit something as an optimisation.
Without more details, I will assume that "not OK" means that it either does not compile or throws some sort of error at runtime. Check if you have code that relies on the compilation version, either via #if DEBUG statements or via methods marked with the Conditional attribute.
In .NET, even if you don't use conditional compilation like #if DEBUG, the compiler is still alot more liberal with optimisations in release mode than it is in debug mode, which can lead to release only bugs as well.
That is possible, if you have conditional compilation so that the debug code and release code are different, and there is a bug in the code that is only use in the release mode.
Other than that, it's not possible. There are difference in how debug code and release code are compiled, and differences in how code is executed if run under a debugger or not, but if any of those differences cause anything other than a performance difference, the problem was there all along.
In the debug version the error might not be occuring (because the timing or memory allocation is different), but that doesn't mean that the error is not there. There may also be other factors that are not related to the debug mode that changes the timing of the code, causing the error to occur or not, but it all boils down to the fact that if the code was correct, the error would not occur in any of the situations.
So, no, the debug version is not OK just because you can run it without getting an error. If an error occurs when you run it in release mode, it's not because of the release mode, it's because the error was there from the start.
There are compiler optimizations that can break valid code because they are too aggressive.
Try compiling your code with less optimization turned on.
In a non-void function, all execution paths should end with a return statement.
In debug mode, if you forget to end such a path with a return statement then the function usually returns 0 by default.
However, in release mode your function may return garbage values, which may affect how your program runs.
It's possible. If it happens and no conditional compilation is involved, than you can be pretty sure that your program is wrong, and is working in debug mode only because of fortuitous memory initializations or even layout in memory!
I just experienced that when I was calling an assembly function that didn't restored the registers' previous values.
In the "Release" configuration, VS was compiling with /O2 which optimizes the code for speed. Thus some local variables where merely mapping to CPU registers (for optimization) which were shared with the aforementioned function leading to serious memory corruption.
Anyhow see if you aren't indirectly messing with CPU registers anywhere in your code.
Another reasons could be DB calls.
Are you saving and updating same record multiple times in same thread,
sometimes for updating.
Its possible the update failed or didnt work as expected because the previous create command was still processing and for update, the db call failed to find any record.
this wont happen in debug as debugger makes sure to complete all pending tasks before landing.
I remember while ago when we were building dll and pdb in c/c++.
I remember this:
Adding log data would sometime make the bug move or disappear or make a totally other error appears (so it was not really an option).
Many of these errors where related to char allocation in strcpy and strcat and arrays of char[] etc...
We weeded some out by running bounds checker and simply fixing the
memory alloc/dealloc issues.
Many times, we systematically went through the code and fixed a char allocation.
My two cents is that it is related to memory allocation and management and constraints and differences between Debug mode and release mode.
And then kept at going through that cycle.
We sometimes, temporarily swapped release for debug versions of dlls, in order not to hold off production, while working on these bugs.

Program only crashes as release build -- how to debug?

I've got a "Schroedinger's Cat" type of problem here -- my program (actually the test suite for my program, but a program nonetheless) is crashing, but only when built in release mode, and only when launched from the command line. Through caveman debugging (ie, nasty printf() messages all over the place), I have determined the test method where the code is crashing, though unfortunately the actual crash seems to happen in some destructor, since the last trace messages I see are in other destructors which execute cleanly.
When I attempt to run this program inside of Visual Studio, it doesn't crash. Same goes when launching from WinDbg.exe. The crash only occurs when launching from the command line. This is happening under Windows Vista, btw, and unfortunately I don't have access to an XP machine right now to test on.
It would be really nice if I could get Windows to print out a stack trace, or something other than simply terminating the program as if it had exited cleanly. Does anyone have any advice as to how I could get some more meaningful information here and hopefully fix this bug?
Edit: The problem was indeed caused by an out-of-bounds array, which I describe more in this post. Thanks everybody for your help in finding this problem!
In 100% of the cases I've seen or heard of, where a C or C++ program runs fine in the debugger but fails when run outside, the cause has been writing past the end of a function local array. (The debugger puts more on the stack, so you're less likely to overwrite something important.)
When I have encountered problems like this before it has generally been due to variable initialization. In debug mode, variables and pointers get initialized to zero automatically but in release mode they do not. Therefore, if you have code like this
int* p;
....
if (p == 0) { // do stuff }
In debug mode the code in the if is not executed but in release mode p contains an undefined value, which is unlikely to be 0, so the code is executed often causing a crash.
I would check your code for uninitialized variables. This can also apply to the contents of arrays.
No answer so far has tried to give a serious overview about the available techniques for debugging release applications:
Release and Debug builds behave differently for many reasons. Here is an excellent overview. Each of these differences might cause a bug in the Release build that doesn't exist in the Debug build.
The presence of a debugger may change the behavior of a program too, both for release and debug builds. See this answer. In short, at least the Visual Studio Debugger uses the Debug Heap automatically when attached to a program. You can turn the debug heap off by using environment variable _NO_DEBUG_HEAP . You can specify this either in your computer properties, or in the Project Settings in Visual Studio. That might make the crash reproducible with the debugger attached.
More on debugging heap corruption here.
If the previous solution doesn't work, you need to catch the unhandled exception and attach a post-mortem debugger the instance the crash occurs. You can use e.g. WinDbg for this, details about the avaiable post-mortem debuggers and their installation at MSDN
You can improve your exception handling code and if this is a production application, you should:
a. Install a custom termination handler using std::set_terminate
If you want to debug this problem locally, you could run an endless loop inside the termination handler and output some text to the console to notify you that std::terminate has been called. Then attach the debugger and check the call stack. Or you print the stack trace as described in this answer.
In a production application you might want to send an error report back home, ideally together with a small memory dump that allows you to analyze the problem as described here.
b. Use Microsoft's structured exception handling mechanism that allows you to catch both hardware and software exceptions. See MSDN. You could guard parts of your code using SEH and use the same approach as in a) to debug the problem. SEH gives more information about the exception that occurred that you could use when sending an error report from a production app.
Things to look out for:
Array overruns - the visual studio debugger inserts padding which may stop crashes.
Race conditions - do you have multiple threads involved if so a race condition many only show up when an application is executed directly.
Linking - is your release build pulling in the correct libraries.
Things to try:
Minidump - really easy to use (just look it up in msdn) will give you a full crash dump for each thread. You just load the output into visual studio and it is as if you were debugging at the time of the crash.
You can set WinDbg as your postmortem debugger. This will launch the debugger and attach it to the process when the crash occurs. To install WinDbg for postmortem debugging, use the /I option (note it is capitalized):
windbg /I
More details here.
As to the cause, it's most probably an unitialized variable as the other answers suggest.
After many hours of debugging, I finally found the cause of the problem, which was indeed caused by a buffer overflow, caused a single byte difference:
char *end = static_cast<char*>(attr->data) + attr->dataSize;
This is a fencepost error (off-by-one error) and was fixed by:
char *end = static_cast<char*>(attr->data) + attr->dataSize - 1;
The weird thing was, I put several calls to _CrtCheckMemory() around various parts of my code, and they always returned 1. I was able to find the source of the problem by placing "return false;" calls in the test case, and then eventually determining through trial-and-error where the fault was.
Thanks everybody for your comments -- I learned a lot about windbg.exe today! :)
Even though you have built your exe as a release one, you can still generate PDB (Program database) files that will allow you to stack trace, and do a limited amount of variable inspection.
In your build settings there is an option to create the PDB files. Turn this on and relink. Then try running from the IDE first to see if you get the crash. If so, then great - you're all set to look at things. If not, then when running from the command line you can do one of two things:
Run EXE, and before the crash do an Attach To Process (Tools menu on Visual Studio).
After the crash, select the option to launch debugger.
When asked to point to PDB files, browse to find them. If the PDB's were put in the same output folder as your EXE or DLL's they will probably be picked up automatically.
The PDB's provide a link to the source with enough symbol information to make it possible to see stack traces, variables etc. You can inspect the values as normal, but do be aware that you can get false readings as the optimisation pass may mean things only appear in registers, or things happen in a different order than you expect.
NB: I'm assuming a Windows/Visual Studio environment here.
Crashes like this are almost always caused because an IDE will usually set the contents of uninitialized variable to zeros, null or some other such 'sensible' value, whereas when running natively you'll get whatever random rubbish that the system picks up.
Your error is therefore almost certainly that you are using something like you are using a pointer before it has been properly initialized and you're getting away with it in the IDE because it doesn't point anywhere dangerous - or the value is handled by your error checking - but in release mode it does something nasty.
In order to have a crash dump that you can analyze:
Generate pdb files for your code.
You rebase to have your exe and dlls loaded in the same address.
Enable post mortem debugger such as Dr. Watson
Check the crash failures address using a tool such as crash finder.
You should also check out the tools in Debugging tools for windows.
You can monitor the application and see all the first chance exceptions that were prior to your second chance exception.
Hope it helps...
Sometimes this happens because you have wrapped important operation inside "assert" macro. As you may know, "assert" evaluates expressions only on debug mode.
A great way to debug an error like this is to enable optimizations for your debug build.
Once i had a problem when app behaved similarily to yours. It turned out to be a nasty buffer overrun in sprintf. Naturally, it worked when run with a debugger attached. What i did, was to install an unhandled exception filter (SetUnhandledExceptionFilter) in which i simply blocked infinitely (using WaitForSingleObject on a bogus handle with a timeout value of INFINITE).
So you could something along the lines of:
long __stdcall MyFilter(EXCEPTION_POINTERS *)
{
HANDLE hEvt=::CreateEventW(0,1,0,0);
if(hEvt)
{
if(WAIT_FAILED==::WaitForSingleObject(hEvt, INFINITE))
{
//log failure
}
}
}
// somewhere in your wmain/WinMain:
SetUnhandledExceptionFilter(MyFilter);
I then attached the debugger after the bug had manifested itself (gui program stopped responding).
Then you can either take a dump and work with it later:
.dump /ma path_to_dump_file
Or debug it right away. The simplest way is to track where processor context has been saved by the runtime exception handling machinery:
s-d esp Range 1003f
Command will search stack address space for CONTEXT record(s) provided the length of search. I usually use something like 'l?10000'. Note, do not use unsually large numbers as the record you're after usually near to the unhanded exception filter frame.
1003f is the combination of flags (i believe it corresponds to CONTEXT_FULL) used to capture the processor state.
Your search would look similar to this:
0:000> s-d esp l1000 1003f
0012c160 0001003f 00000000 00000000 00000000 ?...............
Once you get results back, use the address in the cxr command:
.cxr 0012c160
This will take you to this new CONTEXT, exactly at the time of crash (you will get exactly the stack trace at the time your app crashed).
Additionally, use:
.exr -1
to find out exactly which exception had occurred.
Hope it helps.
With regard to your problems getting diagnostic information, have you tried using adplus.vbs as an alternative to WinDbg.exe? To attach to a running process, use
adplus.vbs -crash -p <process_id>
Or to start the application in the event that the crash happens quickly:
adplus.vbs -crash -sc your_app.exe
Full info on adplus.vbs can be found at: http://support.microsoft.com/kb/286350
Ntdll.dll with debugger attached
One little know difference between launching a program from the IDE or WinDbg as opposed to launching it from command line / desktop is that when launching with a debugger attached (i.e. IDE or WinDbg) ntdll.dll uses a different heap implementation which performs some little validation on the memory allocation/freeing.
You may read some relevant information in unexpected user breakpoint in ntdll.dll. One tool which might be able to help you identifying the problem is PageHeap.exe.
Crash analysis
You did not write what is the "crash" you are experiencing. Once the program crashes and offers you to send the error information to the Microsoft, you should be able to click on the technical information and to check at least the exception code, and with some effort you can even perform post-mortem analysis (see Heisenbug: WinApi program crashes on some computers) for instructions)
Vista SP1 actually has a really nice crash dump generator built into the system. Unfortunately, it isn't turned on by default!
See this article:
http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx
The benefit of this approach is that no extra software needs to be installed on the affected system. Grip it and rip it, baby!
As my experience, that are most being memory corruption issues.
For example :
char a[8];
memset(&a[0], 0, 16);
: /*use array a doing some thing */
it is very possible to be normal in debug mode when one runs the code.
But in release, that would/might be crash.
For me, to rummage where the memory is out of bound is too toilsome.
Use some tools like Visual Leak Detector (windows) or valgrind (linux) are more wise choise.
I've seen a lot of right answers. However, there is none that helped me. In my case, there was a wrong usage of the SSE instructions with the unaligned memory. Take a look at your math library (if you use one), and try to disable SIMD support, recompile and reproduce the crash.
Example:
A project includes mathfu, and uses the classes with STL vector: std::vector< mathfu::vec2 >. Such usage will probably cause a crash at the time of the construction of mathfu::vec2 item since the STL default allocator does not guarantee required 16-byte alignment. In this case to prove the idea, one can define #define MATHFU_COMPILE_WITHOUT_SIMD_SUPPORT 1 before each include of the mathfu, recompile in Release configuration and check again.
The Debug and RelWithDebInfo configurations worked well for my project, but not the Release one. The reason behind this behavior is probably because debugger processes allocation/deallocation requests and does some memory bookkeeping to check and verify the accesses to the memory.
I experienced the situation in Visual Studio 2015 and 2017 environments.
Something similar happend to me once with GCC. It turned out to be a too aggressive optimization that was enabled only when creating the final release and not during the development process.
Well, to tell the truth it was my fault, not gcc's, as I didn't noticed that my code was relying on the fact that that particular optimization wouldn't have been done.
It took me a lot of time to trace it and I only came to it because I asked on a newsgroup and somebody made me think about it. So, let me return the favour just in case this is happening to you as well.
I've found this this article useful for your scenario. ISTR the compiler options were a little out of date. Look around your Visual Studio project options to see how to generate pdb files for your release build, etc.
It's suspicious that it would happen outside the debugger and not inside; running in the debugger does not normally change the application behavior. I would check the environment differences between the console and the IDE. Also, obviously, compile release without optimizations and with debug information, and see if that affects the behavior. Finally, check out the post-mortem debugging tools other people have suggested here, usually you can get some clue from them.
Debugging release builds can be a pain due to optimizations changing the order in which lines of your code appear to be executed. It can really get confusing!
One technique to at least narrow down the problem is to use MessageBox() to display quick statements stating what part of the program your code has got to ("Starting Foo()", "Starting Foo2()"); start putting them at the top of functions in the area of your code that you suspect (what were you doing at the time when it crashed?). When you can tell which function, change the message boxes to blocks of code or even individual lines within that function until you narrow it down to a few lines. Then you can start printing out the value of variables to see what state they are in at the point of crashing.
Try using _CrtCheckMemory() to see what state the allocated memory is in .
If everything goes well , _CrtCheckMemory returns TRUE , else FALSE .
You might run your software with Global Flags enabled (Look in Debugging Tools for Windows). It will very often help to nail the problem.
Make your program generate a mini dump when the exception occurs, then open it up in a debugger (for example, in WinDbg). The key functions to look at: MiniDumpWriteDump, SetUnhandledExceptionFilter
Here's a case I had that somebody might find instructive. It only crashed in release in Qt Creator - not in debug. I was using .ini files (as I prefer apps that can be copied to other drives, vs. ones that lose their settings if the Registry gets corrupted). This applies to any apps that store their settings under the apps' directory tree. If the debug and release builds are under different directories, you can have a setting that's different between them, too. I had preference checked in one that wasn't checked in the other. It turned out to be the source of my crash. Good thing I found it.
I hate to say it, but I only diagnosed the crash in MS Visual Studio Community Edition; after having VS installed, letting my app crash in Qt Creator, and choosing to open it in Visual Studio's debugger. While my Qt app had no symbol info, it turns out that the Qt libraries had some. It led me to the offending line; since I could see what method was being called. (Still, I think Qt is a convenient, powerful, & cross-platform LGPL framework.)
I had this problem too. In my case, the RELEASE mode was having msvscrtd.dll in the linker definition. We removed it and the issue resolved.
Alternatively, adding /NODEFAULTLIB to the linker command line arguments also resolved the issue.
I'll add another possibility for future readers: Check if you're logging to stderr or stdout from an application with no console window (ie you linked with /SUBSYSTEM:WINDOWS). This can crash.
I had a GUI application where I logged to both stderr and a file in both debug and release, so logging was always enabled. I created a console window in debug for easy viewing of the logs, but not in release. However, if the VS debugger is attached to the release build, it'll automatically pipe stderr to the VS output window. So only in release with no debugger did it actually crash when I wrote to stderr.
To make things worse, printf debugging obviously didn't work, which I didn't understand why until I'd tracked down the root cause (by painfully bisecting the codebase by inserting an infinite loop in various spots).
I had this error and vs crashed even when trying to !clean! my project. So I deleted the obj files manually from the Release directory, and after that it built just fine.
I agree with Rolf. Because reproducibility is so important, you shouldn't have a non-debug mode. All your builds should be debuggable. Having two targets to debug more than doubles your debugging load. Just ship the "debug mode" version, unless it is unusable. In which case, make it usable.