Catching a DLL crash in C/C++

Catching a DLL crash in C/C++ - c++

I'm calling a function from a DLL, like this:
__declspec ( dllimport ) bool dll_function(...);
int main() {
[...]
if (dll_function(...)) {
[...]
}
}
In some cases, the data I pass to the DLL function will lead to a crash of the DLL. Is it possible to catch this so my application doesn't crash as well (without modifying the DLL which is not created by me)?

You can catch AVs with the __try and __except keywords in the MSVC compiler. Not all that useful, you have no idea what kind of damage was done. The state of your program might well be corrupted. The heap might be blown for example, causing subsequent random failure. Hosting the DLL in its own process and using IPC to talk to it is the only decent approach.

In some cases, the data I pass to the
DLL function will lead to a crash of
the DLL. Is it possible to catch this
so my application doesn't crash as
well?
Isn't it possible to prevent the dll from crashing if you only call the function with valid data? That should be the preferable solution in any case - but its hard to tell without knowing which dll you want to use. But in most cases, you should have an idea what "data" exactly results in an crash...

Try looking at:
http://msdn.microsoft.com/en-us/library/ms680634%28v=vs.85%29.aspx
and
Enforce Filter code by Oleg Starodumov (www.debuginfo.com)
http://www.debuginfo.com/articles/debugfilters.html
However, that is a top level filter and not a try/catch. You can perhaps restart your process.
You might need to use __try for exceptions. Again, probably better to fix the problem or just crash than to try to catch it.
I agree with the others that rather than suppressing or hiding the crash you should fix it. I don't know how well you can recover from the crash - is it going to be useful to continue execution after something like that?

I'm not sure if this is the problem, try specifying the correct calling convention. (__stdcall, __cdecl, etc).
If that's not the problem, we need to see what you are passing to the function and possibly the function code if you have it.

Related

My code crashes on delete this

I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.

The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)

It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)

Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).

How to change my error handling method

I can't seem to get my head around why people say C++ exceptions are better. For example, I have an application which loads function objects from shared objects to be used in the application. What goes on is something like this:
bool LoadFunctions()
{
//Get Function factory.
FunctionFactory& oFactory = GetFunctionFactory();
//Create functions from the factory and use.
}
FunctionFactory& GetFunctionFactory()
{
//Get shared object handle.
void* pHandle = dlopen("someso.so");
//Get function ptr for Factory getter.
typedef FunctionFactory* (*tpfFacGet)();
tpfFacGet pF = static_cast<tpfFacGet>(dlsym(pHandle, "GetFactory"));
//Call function and return object.
return *((*pF)());
}
Now, it's easy to see that loads of stuff can go wrong. If I did it like I always do, I'd return pointers instead of references, and I'd check if they were NULL and print an error message and get out if they weren't. That way, I know where things went wrong and I can even try to recover from that (i.e. If I successfully load the factory and fail to load just a single function, I may still continue). What I don't understand is how to use exceptions in such a scenario and how to recover the program rather than printing an error message and qutting. Can someone tell me how I am to do this in C++ish way?

We don't even need return codes. If a problem occurs it should be in the exception.
int main()
{
try
{
LoadFunctions();
// if we're here, everything succeeded!
}
catch(std::exception _e)
{
// output exception message, quit gracefully
}
// IRRESPECTIVE OF SUCCESS/FAILURE WE END UP HERE
return 0;
} // eo main
EDIT:
Okay, so lets say that you have an alternative method of loading functions should LoadFunctions() fail. You might be tempted to call that in the catch handler, but this way you'll quickly end up with a huge amount of nested exception handlers which just complicates things.
So now we get down to the question of design. LoadFunctions should succeed if functions are loaded and throw out an exception if it does not. In this hypothetical example of an alternative method of loading functions, that call should be within the LoadFunctions method. This alternative method does not need to be visible to the caller.
At the top level we either end up with functions, or we do not. Writing good exception handling, in my opinion is about getting rid of grey areas. The function did what it was told to do, or it didn't.

There is, as you say, a lot that can go wrong. You won't catch a bad cast there by the way. If the symbol exists but is not the type you are casting it to, you will just get a nasty shock later.
If you were to avoid exceptions you will need somewhere to report the error. As your LoadFunctions and GetFunctionFactory() do not know how you wish to handle the error (log it? print it to stderr? Put up a message box?) The only thing it can do is generate the error.
A common way do to that in C is to pass in a parameter into which it can put the error if one occurs, and for each function to "check" success before continuing. This can make the flow rather tricky.
The C++ concept of "throwing" the exception means that you do not need to keep passing a pointer (or reference) through each function. Where the error occurs you generate it and "throw" it - a bit like "shouting" it. This causes all code (other than cleanup in destructors) to halt until it finds a catcher that handles the error the way that is required.
Note that exceptions should only generally be used to handle errors, not a normal occurrence like encountering "end of file" when this is the way you know a read has completed.

Using exceptions instead of return-values (or any other method) is not supposed to change the behaviour of the code, only how it is written and organized. That means basically that first you decide what your recovery of a certain error is, be it more graceful or less, then you write the code to perform that.
Most experienced programmers (all practically) agree that exceptions are a much better method than return values. You can't see the big difference in short examples of a few functions, but in real systems of thousands of functions and types you would see it clearly. I will not get into more details of how it is better.
I suggest anyway you should just get yourself used to using exceptions by default. However note that using exceptions has some somewhat delicate issues (e.g. RAII http://en.wikipedia.org/wiki/RAII), that ultimately make your code better, but you should read about them in a book (I won't be able to describe here and feel that I do justice to the subject).
I think the book "Effective c++ / Scott Meyer" deals with that, certainly "Exceptional C++ / Herb Sutter". These books are a good jump start for any c++ developer if you havent read them anyway.

How to insulate a job/thread from crashes

I'm working on a library where I'm farming various tasks out to some third-party libraries that do some relatively sketchy or dangerous platform-specific work. (In specific, I'm writing a mathematical function parser that calls JIT-compilers, like LLVM or libjit, to build machine code.) In practice, these third-party libraries have a tendency to be crashy (part of this is my fault, of course, but I still want some insurance).
I'd like, then, to be able to very gracefully deal with a job dying horribly -- SIGSEGV, SIGILL, etc. -- without bringing down the rest of my code (or the code of the users calling my library functions). To be clear, I don't care if that particular job can continue (I'm not going to try to repair a crash condition), nor do I really care about the state of the objects after such a crash (I'll discard them immediately if there's a crash). I just want to be able to detect that a crash has occurred, stop the crash from taking out the entire process, stop calling whatever's crashing, and resume execution.
(For a little more context, the code at the moment is a for loop, testing each of the available JIT-compilers. Some of these compilers might crash. If they do, I just want to execute continue; and get on with testing another compiler.)
Currently, I've got a signal()-based implementation that fails pretty horribly; of course, it's undefined behavior to longjmp() out of a signal handler, and signal handlers are pretty much expected to end with exit() or terminate(). Just throwing the code in another thread doesn't help by itself, at least the way I've tested it so far. I also can't hack out a way to make this work using C++ exceptions.
So, what's the best way to insulate a particular set of instructions / thread / job from crashes?

Spawn a new process.

What output do you collect when a job succeeds?
I ask because if the output is low bandwidth I would be tempted to run each job in its own process.
Each of these crashy jobs you fire up has a high chance of corrupting memory used elsewhere in your process.
Processes offer the best protection.

Processes offer the best protection, but it's possible you can't do that.
If your threads' entry points are functions you wrote, (for example, ThreadProc in the Windows world), then you can wrap them in try{...}catch(...) blocks. If you want to communicate that an exception has occurred, then you can communicate specific error codes back to the main thread or use some other mechanism. If you want to log not only that an exception has occured but what that exception was, then you'll need to catch specific exception types and extract diagnostic information from them to communicate back to the main thread. A'la:
int my_tempermental_thread()
{
try
{
// ... magic happens ...
return 0;
}
catch( const std::exception& ex )
{
// ... or maybe it doesn't ...
string reason = ex.what();
tell_main_thread_what_went_wong(reason);
return 1;
}
catch( ... )
{
// ... definitely not magical happenings here ...
tell_main_thread_what_went_wrong("uh, something bad and undefined");
return 2;
}
}
Be aware that if you go this way you run the risk of hosing the host process when the exceptions do occur. You say you're not trying to correct the problem, but how do you know the malignant thread didn't eat your stack for example? Catch-and-ignore is a great way to create horribly confounding bugs.

On Windows, you might be able to use VirtualProtect(YourMemory, PAGE_READONLY) when calling the untrusted code. Any attempt to modify this memory would cause a Structured Exception. You can safely catch this and continue execution. However, memory allocated by that library will of course leak, as will other resources. The Linux equivalent is mprotect(YorMemory, PROT_READ), which causes a SEGV.

How applications can be protected from errors in DLL module

I have DLL and application that will call some function in this dll. For example...
DLL function:
char const* func1()
{
return reinterpret_cast<char const*>(0x11223344);
}
Application code:
func1 = reinterpret_cast<Func1Callback>(::GetProcAddress(hDll, "func1"));
blablabla
char const* ptr = func1();
cout << ptr;
That DLL is not under my control (plugin)..
Same code will cause access violation in my application, so... Is there any mechanism that will allow to determine such errors?

Since the DLL can do anything your program could do the only reliable way is to load it into a separate worker lightweight process and once anything bad happens just restart the process. You'll need some protocol to pass data into the worker process and receive results.

The dll is loaded into your process address space. If it accesses some invalid memory location your process will crash. I don't see any way around it other than not using this dll at all.

If a DLL is returning junk memory addresses to your application, I'd say you shouldn't be using it, because it isn't performing its intended/desired function.
Also, don't try to guard against it by using the Win32 IsBad*Ptr class of functions, as the MSDN documentation states (see also this post by Raymond Chen for a good description of why not).

You can enclose function call into try catch block, it'll help not to crush entire application right at function call, but nobody can garantee, that dll loaded will not alter memory somwhere in your code, in this case yo'll get hard to debug crash in other place. So you can probably raise some protection for just access violation, but anyway cannot fully protext you application for some side effects.

How to build a C++ Dll wrapper that catches all exceptions?

Like the title says, we’re looking for a way to catch all exceptions from a piece of C++ code, and wrap this in a dll. This way we can shield of the application that uses this dll, from any errors occurring in this dll.
However, this does not seem possible with C++ under Windows.
Example:
void function()
{
try
{
std::list<int>::iterator fd_it;
fd_it++;
} catch(...) {}
}
The exception that occurs is not caught by the standard C++ try/catch block, nor by any SEH translator function set by _set_se_translator(). Instead, the DLL crashes, and the program that uses the DLL is aborted. We compiled with Visual C++ 2005, with the option /SHa. Does anyone know if it’s possible in C++/Win32 to catch these kind of problems and make a rocksolid DLL wrapper?

The only way to make a rock solid DLL wrapper is to load the buggy DLL in another process, so that if it crashes it doesn't take your primary process down with it.
Catching all C++ exceptions seems reasonable, but catching all structured exceptions is another story. SEH might seem to get you most of the way there, because it allows you to catch access violations, divide-by-zero exceptions, etc.
But what if the buggy DLL happens to touch an uncommitted page from another thread's stack? The memory access will page fault, the exception handler will be invoked, and now that page is no longer a guard page. When that thread needs to grow the stack, it will get an access violation, and the process will crash. (These posts describe this case in more detail.)
Another likely problem: the buggy DLL crashes while holding a synchronization object, but you use SEH to catch the exception. If your process attempts to acquire the same synchronization object, then it deadlocks instead of crashing. The shared synchronization object may be part of the C runtime or the OS: what if buggy DLL 1 loads buggy DLL 2, which crashes in its DllMain() while buggy DLL 1 is holding the loader lock? Will your process deadlock the next time it loads a DLL?
For more information on why this (and functions like IsBadReadPtr(), which have similar problems) is a misuse of SEH:
Larry Osterman's WebLog: Structured Exception Handling Considered Harmful
Larry Osterman's WebLog: Should I check the parameters to my function?
Larry Osterman's WebLog: Resilience is NOT necessarily a good thing
The Old New Thing: IsBadXxxPtr should really be called CrashProgramRandomly

On Windows, C++ has 2 different styles of exceptions: C++ and SEH exceptions.
SEH is a windows only form of exceptions (somewhat akin to signals in UNIX). It's more of a system level exception. They will be thrown for such operations as invalid pointer accesses, alignment issues, etc ...
If you want to catch every exception that can be thrown by a C++ app on windows you will need to catch both. Fortunately there is a way to mix the use of C++ and SEH exceptions. I wrote a detailed blog post about this recently, it should help you out.
http://blogs.msdn.com/jaredpar/archive/2008/01/11/mixing-seh-and-c-exceptions.aspx

Incrementing an iterator on a standard libary container will never throw a C++ exception. It may give you undefined behaviour.

The code below is taken from the Zeus IDE. It will trap any Windows generated exceptions:
Step #1: Define an Exception Filter Function
DWORD ExceptionFilter(EXCEPTION_POINTERS *pointers, DWORD dwException)
{
//-- we handle all exceptions
DWORD dwResult = EXCEPTION_EXECUTE_HANDLER;
switch (dwException)
{
case EXCEPTION_ACCESS_VIOLATION:
case EXCEPTION_DATATYPE_MISALIGNMENT:
case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
case EXCEPTION_FLT_DENORMAL_OPERAND:
case EXCEPTION_FLT_DIVIDE_BY_ZERO:
case EXCEPTION_FLT_INEXACT_RESULT:
case EXCEPTION_FLT_INVALID_OPERATION:
case EXCEPTION_FLT_OVERFLOW:
case EXCEPTION_FLT_STACK_CHECK:
case EXCEPTION_FLT_UNDERFLOW:
case EXCEPTION_INT_DIVIDE_BY_ZERO:
case EXCEPTION_INT_OVERFLOW:
case EXCEPTION_PRIV_INSTRUCTION:
case EXCEPTION_NONCONTINUABLE_EXCEPTION:
case EXCEPTION_BREAKPOINT:
dwResult = EXCEPTION_EXECUTE_HANDLER;
break;
}
return dwResult;
}
Step #2: Wrap the code in a __try and __except as shown below:
__try
{
// call your dll entry point here
}
__except(ExceptionFilter(GetExceptionInformation(),
GetExceptionCode()))
{
//-- display the fatal error message
MessageBox(0, "An unexpected error was caught here!",
"Unexpected Error", MB_OK);
}

Konrad Rudolph: Of course this code contains a "logic error", it's to illustrate a problem that could occur. Like the man says he wants to be able to shield his dll from any possible errors. You don't think this is a legitimate question? Heard of vendor products. Some of us live in the real world and live with real problems. It's just not possible to fix everyone else's problems

This code contains a logic error, if at all. Since a logic error constitues a bug, don't swallow the exception – fix the error!
Of course, this is specific to the particular code. Others have offered more general advice. However, I've found that a lot of people actually do prefer catching exceptions over fixing logic errors and this is simply unacceptable.

Have you looked at the windows API function SetUnhandledExceptionFilter?
I usually call it in the DllMain function and have it generate a minidump when the DLL crashes. However: (a) I don't know if it traps application exceptions as well as DLL exceptions, and (b) I don't know if you can have the handler return in such a way that program execution can continue. The docs say yes, but I've never done it.

What are you going to do after you catch the exception (especially for SEH exceptions)?
In reality you can make no assumptions about the state of the process, realistically the only option you have is to (optionally) dump core and exit.
Any attempt and proceeding is absolutely going to cause you problems in the long run.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js