This is an unusual question to ask but here goes:
In my code, I accidentally dereference NULL somewhere. But instead of the application crashing with a segfault, it seems to stop execution of the current function and just return control back to the UI. This makes debugging difficult because I would normally like to be alerted to the crash so I can attach a debugger.
What could be causing this?
Specifically, my code is an ODBC Driver (ie. a DLL). My test application is ODBC Test (odbct32w.exe) which allows me to explicitly call the ODBC API functions in my DLL. When I call one of the functions which has a known segfault, instead of crashing the application, ODBC Test simply returns control to the UI without printing the result of the function call. I can then call any function in my driver again.
I do know that technically the application calls the ODBC driver manager which loads and calls the functions in my driver. But that is beside the point as my segfault (or whatever is happening) causes the driver manager function to not return either (as evidenced by the application not printing a result).
One of my co-workers with a similar machine experiences this same problem while another does not but we have not been able to determine any specific differences.
Windows has non-portable language extensions (known as "SEH") which allow you to catch page faults and segmentation violations as exceptions.
There are parts of the OS libraries (particularly inside the OS code that processes some window messages, if I remember correctly) which have a __try block and will make your code continue to run even in the face of such catastrophic errors. Likely you are being called inside one of these __try blocks. Sad but true.
Check out this blog post, for example: The case of the disappearing OnLoad exception – user-mode callback exceptions in x64
Update:
I find it kind of weird the kind of ideas that are being attributed to me in the comments. For the record:
I did not claim that SEH itself is bad.I said that it is "non-portable", which is true. I also claimed that using SEH to ignore STATUS_ACCESS_VIOLATION in user mode code is "sad". I stand by this. I should hope that I had the nerve to do this in new code and you were reviewing my code that you would yell at me, just as if I wrote catch (...) { /* Ignore this! */ }. It's a bad idea. It's especially bad for access violation because getting an AV typically means your process is in a bad state, and you shouldn't continue execution.
I did not argue that the existence of SEH means that you must swallow all errors.Of course SEH is a general mechanism and not to blame for every idiotic use of it. What I said was that some Windows binaries swallow STATUS_ACCESS_VIOLATION when calling into a function pointer, a true and observable fact, and that this is less than pretty. Note that they may have historical reasons or extenuating circumstances to justify this. Hence "sad but true."
I did not inject any "Windows vs. Unix" rhetoric here. A bad idea is a bad idea on any platform. Trying to recover from SIGSEGV on a Unix-type OS would be equally sketchy.
Dereferencing NULL pointer is an undefined behavior, which can produce almost anything -- a seg.fault, a letter to IRS, or a post to stackoverflow :)
Windows 7 also have its Fault Tollerant Heap (FTH) which sometimes does such things. In my case it was also a NULL-dereference. If you develop on Windows 7 you really want to turn it off!
What is Windows 7's Fault Tolerant Heap?
http://msdn.microsoft.com/en-us/library/dd744764%28v=vs.85%29.aspx
Read about the different kinds of exception handlers here -- they don't catch the same kind of exceptions.
Attach your debugger to all the apps that might call your dll, turn on the feature to break when an excption is thrown not just unhandled in the [debug]|[exceptions] menu.
ODBC is most (if not all) COM as such unhandled exceptions will cause issues, which could appear as exiting the ODBC function strangely or as bad as it hang and never return.
Related
I have a DX9 application that runs on an embedded Windows XP box. When leaving it automated overnight for soak testing it crashes after about six to eight hours. On our dev. machines (Win 7) we can't seem to reproduce this issue. I'm also fairly certain it's not a memory leak.
If we run the same application in Debug on the embedded machines, it doesn't crash.
If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.
I know in Debug, there is some additional byte padding around the local stack which may be "hiding" a local array access out of bounds, or some sort of uninitialized variable is sneaking through.
So I have two questions:
Does __try/__except behave similar to debug, even when run in release?
What kind of things should I be scanning the code for if we have a crash in Release mode, but not in Debug mode?
If you're using __try{ } __except() you shouldn't.
Those and C++ code don't mix well. (for instance, you can't have C++ objects on the stack of a function wrapped with those. You should use C++ try {} catch() {} if you use catch(...) (with ellipsis) it does basically the same as __except()
both try.. catch and __try .. __except behave the same in debug and release.
If you suspect that your problem is an unexpected exception you should read about all of the following:
SetUnhandledExceptionFilter()
_set_se_translator()
_CrtSetReportMode()
_RTC_SetErrorFunc()
_set_abort_behavior()
_set_error_mode()
_set_new_handler()
_set_new_mode()
_set_purecall_handler()
set_terminate()
set_unexpected()
_set_invalid_parameter_handler()
_controlfp()
Using one of the first two would probably allow you to pinpoint your problem pretty quickly. The rest are there if you want absolute control for all error cases possible in your process.
Specifically, with SetUnhandledExceptionFilter() you can set up a function filter which logs the address of the code which caused the exception. You can then use your debugger to pin point that code. Using the DbgHelp library and with the information given to the filter function you can write some code which prints out a full stack trace of the crash, including symbols and line numbers.
Make sure you set up your build configuration to emit debug symbols for release builds as well. They can only help and don't do anything to slow your application (but maybe make it bigger)
If we place a __try/__except around the main loop update on the embedded machines, it doesn't crash.
Then do that.
A single __try block around the whole program (as well as the entry point for each worker thread) is the recommended approach, it lets you write out a crash dump and make an error report before exiting. There's not much recovery you can do with SEH, because the exceptions just don't carry enough information to distinguish different failures usefully. Storing the whole program state and pulling it into a debugger is very useful, though.
Note: Some video drivers cause SEH exceptions that they also catch, perhaps some logic expects there to be more than one SEH scope installed, which your __try block provided.
I have a Win32 C++ app developed in VS2005. There is a try {} catch (...) {} wrapped around a block of code, and yet 3 functions deep, when the algorithm breaks down and tries to reference off the end of a std::vector, instead of catching the exception, the program drops into the VS debugger, tells me I have an unhandled win32 exception, and the following is found on the call stack above my function:
msvcr80.dll!:inavlid_parameter_noinfo()
msvcr80.dll!:invoke_watson(....)
msvcr80.dll!:_crt_debugger_hook(...)
How can I prevent the debugger being called? This occurs at the end of a 30 minute simulation, at which point I lose all my results unless I can catch and log the exception. This and similar try/catch constructs have been working in the past - are there compiler settings which affect this? Help?
You may want to convert non-C++ exceptions into C++ exceptions. Here's an example of how to do it.
Apologies for the delay - unforeseen circumstances - but the real answer appears to be the following.
First, the application is also wrapped in a __try {} __except () {} construct, and uses unhandled exception filtering based on XCrashReport. This picks up the non C++ exceptions for the reasons many have pointed out above.
Second, however, as one can find here and elsewhere, since CRT 8.0, for security reasons Microsoft bypasses unhandled exception handling on buffer overruns (and certain other situations), and passes the issue directly to Dr Watson.
This is really annoying - I want to know where my buffer overruns are occurring, and why, and to fix them. I also want my program to crash, leave an audit trail, and then be automatically restarted, 24/7. Having MS Visual Studio on the PC seems to mean that Dr Watson will pause and offer me the option of using MSVC to debug the issue. However, until I respond to the dialog box, nothing will happen. Comments on workarounds greatly appreciated...
Just because you have a catch(...) doesn't mean you can catch and recover from all exceptions. Not all exceptions are recoverable.
You problem might be that the first exception that pin points the exact problem is getting obscured by your catch statement
You need to attach the debugger to the program and have it break on all exceptions and fix the code
Standard Library containers do not check for any logic errors and hence themselves never emit any exceptions.
Specifically, for std::vector only method that throws an exception is std::vector::at().
Any Undefined Behavior w.r.t to them will most likely lead your program to crash.
You will need to use Windows' SEH and C++ Exception Handling for catching Windows specific exceptions, which are non C++ standard btw.
At runtime, when myApp.exe crashes i receive "Unhandled Win32 exception" but how would i know which exception was occurred? where did something went wrong?
For a Native C++ app see my earlier answer here: Detect/Redirect core dumps (when a software crashes) on Windows for catching the unhandled exception (that also gives code for creating a crash dump that you can use to analyse the crash later. If the crash is happening on a development system then in Visual Studio (I'm assuming you're using that, if not other IDEs will have something similar), in Debug/Exceptions tick the 'Thrown' box for 'Win32 Exceptions'.
Typically, Windows will give you several hexadecimal numbers as well. Chances are that the exception code will be 0xC0000005. This is the code of an Access Violation. When that happens, you also will have three additional bits of information: the violating address, the violated address, and the type of violation (read, write or execute).
Windows won't narrow it down any further than that, and often it couldn't anyway. For instance, if you walk past the end of an array in your program, Windows likely won't realize that you were even iterating over an array. It just sees "read:OK, read:OK, read:out of bounds => page fault => ACCESS VIOLATION". You will have to figure that out from the violating address (your array iteration code) and the violated address (the out-of-bounds address behind your array).
If it's a .Net app you could try to put in a handledr for the UnhandledException event. You can find more information about it and some sample code here.
In general it's a good sign that your exception handling is broken though, so might be worth going through your code and finding places that could throw but where you don't handle exceptions.
Use the debugger. You can run the program and see what exception is been thrown that kills your application. It might be able to pinpoint the location of the throw. I have not used the VS debugger for this, but in gdb you can use catch throw to force a breakpoint when an exception is thrown, there should be something similar.
How can I specify a separate function which will be called automatically at Run-Time error to prevent program crash?
Best thing as mentioned already is to identify the area where the crash ic occuring and then fix the piece of code. This is the idealistic approach.
In case you are unable to find that out another alternative is to do structured exception handling in the areas where you suspect crashes to occur. Once the crash occurs you capture what ever data you want and process it. Meanwhile you can change the settings in windows service manager to restart your application whenever it crashes. Hope this answers your question.
Also in case you are looking for methods to capture the crash and analyse debugdiag and windbg are some of the standard tools that people use to take the crash dumps.
If your in Windows you need to write your own custom runtime check handler. Use:
_RTC_SetErrorFunc
to install your custom function in place of
_CrtDbgReport.
Here is a good article on how to do it:
http://msdn.microsoft.com/en-us/library/40ky6s47(VS.71).aspx
If by run-time error you mean an unhandled exception, don't do it. If an unhanded exception propagates through your code, you want your app to crash. Maybe create a dumpfile on the way down, sure. But the last thing you want to do is catch the exception, do nothing, and continue running as if nothing had happened in the first place.
When an exception is generated, whatever caused the problem could have had other effects. Like blowing the stack or corrupting the heap. If you were to silently ignore these exceptions and continue running anyway, your app might run for a while and seem OK, but something underneath is unstable. Memory could be corrupt, resources might not be available, who knows. You will eventually crash. Bu ignoring unhandled exceptions and running anyway, all you do is delay the inevitable, and make it that much more difficult to diagnose the real problem, because when you do crash, the stack will be in a totally different and probably unrelated place from what caused the initial problem.
Background
I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
The application is Multi-Threaded.
I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
I have written an SE Translator function and called _set_se_translator() with it.
I have written functions for and setup set_terminate() and set_unexpected().
To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
I have WinDBG setup as the crash dump program, and get good information for all other crash issues but not this one. The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
The Question
None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.
Does anyone know how to guarantee getting a a chance at this exception during runtime in release mode?
Definitions
Poof-Crash: The application crashes by going "poof" and disappearing without a trace.
(Considering the name of this site, I'm kind of surprised this question isn't on here already!)
Notes
An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Everything prior to windows xp would not (or would be harder) generally be able to trap stack overflows. With the advent of xp, you can set vectored exception handler that gets a chance at stack overflow prior to any stack-based (structured exception) handlers (this is being the very reason - structured exception handlers are stack-based).
But there's really not much you can do even if you're able to trap such an exception.
In his blog, cbrumme (sorry, do not have his/her real name) discusses a stack page neighboring the guard page (the one, that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code to use just one stack page - you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
Hope, it helps.
I'm not convinced that you're on the right track in diagnosing this as a stack overflow.
But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg
The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.
suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.
My suggestion is to patch your executables to put maybe an INT 3 debug at the entry point to exit (), if it's statically linked, or if it's dynamically linked, patch up the import and also patch up any imports of kernel32::TerminateProcess to throw a DebugBreak() instead.
Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go proof, you should have what you need.
EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.
I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.
It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
Have you considered ADPlus from Debugging Tools for Windows?
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.
Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.
You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyways. It just makes debugging harder.
And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?
set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.