Windows: Handle segfaults in all threads

Windows: Handle segfaults in all threads - c++

I'm looking for a way to catch segfaults and other errors anywhere in a program (which uses multiple threads, some of which are created by external libraries). I'm using Visual Studio 2013 with the Intel C++ Compiler 2015.
Some external DLL's - in some cases I've even seen this with Windows drivers - can contain bugs that are out of my control, and my software runs 24/7 - I need to be able to log a crash somewhere and restart my software.
So far, I found that you can set a signal handler which handles SIGSEGV and other signals. Based on what I read, under Linux this would do exactly what I need (handle this signal for all threads), but under Windows you need to set the signal handler for each thread separately. Since I'm not the one who creates all the threads (if I was, I could just use __try/__catch), this isn't really an option. On top of that, I'm seeing that when I set a signal handler in a thread and then cause a SIGSEGV it doesn't get handled by the handler, while the exact same code works fine in the main thread - not sure what's going on there (but even a fix for that wouldn't help, since I don't create all the threads, and looping through all existing threads in my process to set handlers sounds like a very bad idea).
So, is there a way to do this? Googling and searching here did not help - I found several people with similar questions but no answers that are usable in my situation.
Note: What I have now, which works perfectly in the main thread but not at all if I copy this same block of code to any thread:
SignalHandlerPointer previousHandlerSEGV = signal(SIGSEGV, SignalHandler);
int *a;
a = NULL;
*a = 0;

To get notified about all unhandled exceptions in a process you can call SetUnhandledExceptionFilter. The functionality is documented as:
Issuing SetUnhandledExceptionFilter replaces the existing top-level exception filter for all existing and all future threads in the calling process.
Inside the exception filter it is recommended to trigger a call to MiniDumpWriteDump (in an external process), to produce a mini dump for off-line analysis. You get to control the amount of information that is written into the mini dump (e.g. threads, modules, call stacks, memory). Most importantly, you can dump the call stack of the thread that raised the uncaught exception, at the time the exception is raised.
As an alternative, I believe some/most/all of this can be done automatically by registering for application recovery and restart.

Related

When does Windows Error Reporting create a dump file? Is it configurable? Has this changed in Windows 7?

I'm relying on Windows Error Reporting to create full user-mode dumps for a large, multi-threaded application. I know that when I started using it (early 2012) these dumps contained all application memory, and full stacks for all threads that were accurate for the time the application crashed (threw the unhandled exception, etc). But at at some unknown point in the last year, crash dumps created by WER have changed. They still contain all memory, but only show one thread, and the stack appears to be from after the process is already shutting down:
ntdll.dll!_LdrpCallInitRoutine#16() + 0x14 bytes
ntdll.dll!_LdrShutdownProcess#0() + 0x141 bytes
ntdll.dll!_RtlExitUserProcess#4() + 0x74 bytes
kernel32.dll!_UnhandledExceptionFilter#4() + 0x18928 bytes
This is an unmanaged (unmanagable?) 32-bit C++ application compiled with VS2010 SP1, running on 64-bit Win7 SP1 (and kept updated). Does anyone know of any Windows updates that have changed WER behavior in the last year? Is there something configurable other than 'HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\AppName.exe'?
Also, killing the application by calling 'RaiseFailFastException' still results in a good dump with valid stacks for all threads.

Aha! A third-party library was calling SetUnhandledExceptionFilter, which prevented Windows Error Reporting from getting the original exception. Their handler did some internal cleanup, then called abort, at which point WER was finally able to create a dump.
For anyone encountering this sort of problem, I recommend checking that the installed handlers (the return values of SetUnhandledExceptionFilter, set_terminate, etc) are what you expect (null if you're relying on WER, or your own handlers or CrashRpt).

It is better to make your application self-dump-writer when it crashes. You just need to call SetUnhandledExceptionFilter function, specify the call back. In the call back, use MiniDumpWriteDump function with MiniDumpWithFullMemory dump type.
Your exception filter will be called whenever unhandled (by your code) exception occurs. In the call back, it will be better to enumerate and suspend all other threads of your process.
You may also need to install a hook for SetUnhandledExceptionFilter itself! Why? Well, CRT would always disable any installed exception filter, and by the hooked function, you may avoid that.

How do I Delay destruction of my process if another thread causes it to crash on windows?

I am writing a DLL which is loaded by a proprietary program which is closed source and I have no control over. I also load a Proprietary DLL which is just as obscure. Since I sometimes have to relay commands I get through my DLLs interface to the DLL I Load with very low latency I launch a separate detached thread upon initializing my DLL and send it unformatted debug information through a lock free queue. The time consuming formating of debug output and writing to a log file is thus done asynchronously. The problem is that the process crashes inadvertently (which I am almost certain is no my fault) and I have no way of knowing what the last debug info was because my detached thread is killed by windows before it can write it to disk.
So here is my question:
Can I delay destruction in any way if the proprietary program crashes so that my detached thread runs longer before destruction?
Would interprocess communication solve my problem by moving my detached thread to another process which windows would not kill? If so what method would you suggest (I have not worked with IPC much)
If I use IPC how do I know when to terminate my "debug formating process"?

I think even with IPC you would have the situation where your thread may have un-written debug information during a process fault. Presumably you don't have debugging going on all the time, so I'd think you wouldn't need a separate thread for it, just a compile-time or run-time option to enable it. You might be able to SetUnhandledExceptionFilter for the process and do a few things before you terminate.

Is there a single catch-all-failures hook in c++?

I want that when and if the program will fail than it will be caught at this handler in order to do some guard notifications.
Is there a bottom handler or list of handlers that I need to register in order to be sure that a program cannot crash without passing through my handler?
Running on ubuntu and solution needed only to ubuntu
I need all kind of failure like exception memory allocation ...

The simple answer is that there is no single point where you can handle all errors in the program. You can add a try/catch (...) at in main to handle exceptions that occur after main is entered and before it completes. You can also add a handler for terminate in C++. Then depending on the OS you will also need to handle other situations differently (invalid memory references can be handled in unix/linux by handling SIG_SEGV, but that will not work in Windows --AFAIK; some other errors might trigger different signals that could or not be handled...) Further than that, there might be errors that still get unnoticed (say an invalid memory access that happens to hit a valid memory address... the program will be incorrect, but the error might go undetected)

C++ does not run in a virtual sandbox, thus there is nothing built-in to the language to catch this. You can certainly build one yourself (for example using exceptions), but it's up to your code to construct this from the foundation up.
The platform you're running on may have something you can use though. For example in Windows there is SetUnhandledExceptionFilter.
Of course all of this still depends on what it means to "crash".

On process startup, call fork. Use the parent to monitor the child. If it encounters a fatal error, the process will go away. You can detect this and do whatever you need to do when that happens. If the child wishes to terminate normally, it can simply kill its parent before terminating.

For a normal program exit you can register a handler with std::atexit().
For a program exit because of uncaught exceptions/... you can register a handler with std::set_terminate. If by "exception memory allocation" you mean a std::bad_alloc exception, than this handler should be triggered.

In Linux You need to respond to SIGABRT Signal. Your callback will be called whenever your app gets SIGABRT signal
signal(SIGABRT, &callback);
There are different Signals for different Scenarios such as SIGSEGV, SIGBUS that you ned to hook. you better hook them in different callbacks and check which error goes into what. because one error might come due to multiple problems.

No. If the process is killed with a SIGKILL, for example, no handler will be run.
P.S. FYI, this has nothing to do with the SPOF.

You can put a try/catch(...) block at the top level to catch all exceptions. But there are other ways for the program to be terminated and the ways of catching these aren't portable. On Unix-based systems you'll have to create signal handlers but even those won't stop kill -9.

Application crash with no explanation

I'd like to apologize in advance, because this is not a very good question.
I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.
When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.
Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.
All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.
The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.
The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).
We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.
What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.

If your app is evaporating an not generating a dump file, then it is likely that an exception is being generated which your app doesnt (or cant) handle. This could happen in two instances:
1) A top-level exception is generated and there is no matching catch block for that exception type.
2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged, This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:
"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"
-or-
"My program might crash, but if it does I want to create a dump file on the way down."
The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.
The latter is a noble & robust approach, but the implementation of it is much more difficult that it might seem, and it fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.
Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler, and register it with Windows, via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a seperate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful no not call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.

I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?

Analyze memory in a signal handler
http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx

This isn't a very good answer, but hopefully it might help you.
I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):
Dereferencing potentially invalid
pointers can disable stack expansion
in other threads. A thread exhausting
its stack, when stack expansion has
been disabled, results in the
immediate termination of the parent
process, with no pop-up error window
or diagnostic information.
As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.
There's some technincal background in Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly

Late response, but maybe it helps someone: every Windows app has a limit on how many handles can have open at any time. We had a service not releasing a handle in some situation, the service would just disappear, after a few days, or at times weeks (depending on the usage of the service).
Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)

Catch unhandled exception of invisible thread

In my C++ application i use an activeX component that runs its own thread (or several I don't know). Sometimes this components throws exceptions. I would like to catch these exceptions and do recovery instead of my entire application crashing. But since I don't have access to its source code or thread I am unsure how it would be done.
The only solution I can think of is to run it in its own process. Using something like CreateProcess and then CreateRemoteThread, unsure how it could be implemented.
Any suggestion on how to go about solving this?

If the ActiveX component is launching its own threads, then there isn't a lot that you can do. You could set a global exception handler and try to swallow exceptions, but this creates a high likelihood that your program state will become corrupted and lead to bizarre "impossible" crashes down the road.
Running the buggy component in a separate process is the most robust solution, as you'll be able to identify and recover from fatal errors without compromising your own program state.

Try setting up an exception filter with SetUnhandledExceptionFilter().

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js