Intercepting signals/Mach exceptions in a C++ plugin on macOS

I'm working on a suite of plugins for a certain host application, targeting both Windows and Mac (OS X+). The plugins are written in C++. I would like to add crash and exception report handling to them in case a plugin goes rogue. The goal is not to bring the whole host app down when the plugin misbehaves, but to get some feedback instead and just skip that one plugin call (making follow-up plugin calls a sort of no-op). Think logging some state and slapping an error code + text on it. From then on the plugin switches into an error state where requesting details returns this state.
This is a big legacy code base which has been improved greatly over time, but there are still some rough edges here and there, so answers like "just don't crash" are not what I'm looking for :) It also doesn't help that I'm much more of a Windows developer than a macOS developer, so I might have overlooked something completely obvious.
I've covered unhandled C++ exceptions in a cross-platform way by wrapping each host->plugin callback in a big C++ try/catch block right at the plugin entry points.
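Here is a minimal sketch of such an entry-point guard that also implements the sticky error state described above. All names (`guarded_entry`, `PluginErrorState`, `kPluginDisabled`, `plugin_process`) are hypothetical and do not come from any host SDK. It only catches C++ exceptions. On macOS, hardware faults such as SIGSEGV never reach a `catch` block, which is exactly the gap the question is about.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical result codes handed back to the host.
enum PluginResult : int { kPluginOk = 0, kPluginFailed = 1, kPluginDisabled = 2 };

// Hypothetical sticky error state: once set, the plugin refuses further work.
struct PluginErrorState {
    bool failed = false;
    std::string message;
};

static PluginErrorState g_plugin_state;

// Runs one host->plugin callback body inside a catch-all.
// After the first failure, every later call is skipped (a no-op).
template <typename Fn>
int guarded_entry(Fn&& body) {
    if (g_plugin_state.failed)
        return kPluginDisabled;
    try {
        std::forward<Fn>(body)();
        return kPluginOk;
    } catch (const std::exception& e) {
        g_plugin_state.failed = true;
        g_plugin_state.message = e.what();
    } catch (...) {
        g_plugin_state.failed = true;
        g_plugin_state.message = "unknown C++ exception";
    }
    return kPluginFailed;
}

// Hypothetical entry point as the host might call it.
extern "C" int plugin_process(int value) {
    return guarded_entry([&] {
        if (value < 0) throw std::invalid_argument("negative input");
        // ... real work ...
    });
}
```

Keeping the state check inside the guard means every entry point gets the "no-op after failure" behaviour without repeating it.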
I'm also handling crashes, div-by-zero, access violations etc. on Windows at that same spot by using __try/__except and registering an SEH handler.
Now I want to do the latter as well for Mac. But here I'm struggling with finding out what my options are, if any.
I looked into signal handlers, but what I glean from that is that they are process-wide, i.e. not plugin-friendly, especially when multiple of these plugins can be used by the host concurrently (who will catch and thus handle the signals first?). And the host app already has its own crash handler, possibly using a signal handler, so installing our own would make it a fight over who's in charge, I think? Plus, my reporting options are extremely limited in such handlers; if possible I'd like to have a bit more freedom here (like using std::strings with the new/delete they imply).
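The sketch below shows what "extremely limited" means in practice. Reporting from a POSIX signal handler usually means formatting into storage allocated up front and emitting it with `write(2)`. POSIX lists `write` as async-signal-safe, but not `malloc`, `std::string` or `printf`. The names `format_signal_report` and `report_signal` are hypothetical. The single static buffer is an assumption for brevity and is not safe if two threads fault at the same time.

```cpp
#include <cstddef>
#include <unistd.h>

// Preallocated at load time: nothing below touches the heap.
static char g_report_buf[128];

// Copies a NUL-terminated string into buf at pos, leaving room for a final NUL.
// Plain loops only: strlen/snprintf are not guaranteed async-signal-safe.
static std::size_t append_str(char* buf, std::size_t pos, std::size_t cap, const char* s) {
    while (*s != '\0' && pos + 1 < cap) buf[pos++] = *s++;
    return pos;
}

// Appends an unsigned integer in decimal.
static std::size_t append_uint(char* buf, std::size_t pos, std::size_t cap, unsigned v) {
    char digits[3 * sizeof(unsigned)];  // enough decimal digits for any unsigned
    std::size_t n = 0;
    do { digits[n++] = static_cast<char>('0' + v % 10); v /= 10; } while (v != 0);
    while (n > 0 && pos + 1 < cap) buf[pos++] = digits[--n];
    return pos;
}

// Builds "plugin: caught signal N\n" in the static buffer; returns its length.
std::size_t format_signal_report(int sig) {
    std::size_t pos = 0;
    pos = append_str(g_report_buf, pos, sizeof g_report_buf, "plugin: caught signal ");
    pos = append_uint(g_report_buf, pos, sizeof g_report_buf, static_cast<unsigned>(sig));
    pos = append_str(g_report_buf, pos, sizeof g_report_buf, "\n");
    g_report_buf[pos] = '\0';
    return pos;
}

// What a handler body could do: format, then a single write(2) to stderr.
void report_signal(int sig) {
    std::size_t len = format_signal_report(sig);
    ssize_t ignored = write(STDERR_FILENO, g_report_buf, len);
    (void)ignored;
}
```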
Then there's also Mach exception handling. But I totally fail to get informative results when googling this in combination with 'plugin'...
Does anyone have any advice on what route to go, or which option is better in my situation?

The only options on macOS are signal handlers and Mach exception handlers.
Both of these mechanisms are process-wide, so would report problems wherever they occurred.
If a new signal handler is installed, the old one will not be run. However, sigaction() does return the previously installed handler, so it's possible to have it run as well as your new one. Then again, another signal handler might get installed later and replace yours.
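A minimal sketch of that chaining with `sigaction()` follows. For brevity it handles a single signal, so there is one saved `g_previous_action`. The handler and variable names are hypothetical. The handler keeps its own work async-signal-safe and then forwards to whatever was installed before it. If nothing was, it restores the default action and re-raises. As noted above, a handler installed after this one can still replace it.

```cpp
#include <cstring>
#include <signal.h>

// The action that was installed before ours (possibly the host's crash reporter).
// Only one signal is tracked here, for brevity.
static struct sigaction g_previous_action;
static volatile sig_atomic_t g_plugin_saw_signal = 0;

static void plugin_signal_handler(int sig, siginfo_t* info, void* context) {
    g_plugin_saw_signal = 1;  // our own work: async-signal-safe only

    if (g_previous_action.sa_flags & SA_SIGINFO) {
        // Previous handler uses the three-argument form.
        if (g_previous_action.sa_sigaction != nullptr)
            g_previous_action.sa_sigaction(sig, info, context);
    } else if (g_previous_action.sa_handler == SIG_DFL) {
        // Nobody else handles it: restore the default action and re-raise.
        // The signal stays blocked until this handler returns (no SA_NODEFER),
        // so the default action (e.g. terminate/core dump) happens right after.
        sigaction(sig, &g_previous_action, nullptr);
        raise(sig);
    } else if (g_previous_action.sa_handler != SIG_IGN) {
        g_previous_action.sa_handler(sig);
    }
}

// Installs plugin_signal_handler for sig and remembers the previous action.
bool install_chained_handler(int sig) {
    struct sigaction action;
    std::memset(&action, 0, sizeof action);
    action.sa_sigaction = plugin_signal_handler;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    return sigaction(sig, &action, &g_previous_action) == 0;
}
```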
There's a very useful post here that goes into detail about implementing a signal handler - https://developer.apple.com/forums/thread/113742
The situation with Mach exception handlers is pretty much the same, calling task_set_exception_ports() will override the previously set handlers, so these have to be restored once your new handler has run if you want to propagate the exception. One big advantage of Mach exception handlers is that they can be run in a separate process, in which you're free to use std::strings etc. at the expense of it being more difficult to examine the crashed process's state.
There is little documentation around Mach exception handling; the best references are the various open-source crash reporting frameworks.
Overall it's difficult to properly implement crash detection, and I'd advise against doing it in a plugin. It's a LOT more complicated than SEH.

Related

How can I intercept segfaults in C++ on Windows?

I am not a Windows developer(!) in any way, but I currently work on a Windows-only project.
The project is very old and a lot of people have worked on it. It looks like the original development team took the big old book of anti-patterns and applied all of them wherever they saw it possible. Fixing bugs is hard. Much, much harder than it should be. There are plenty of crashes and just general slowing down of stuff. When things crash, some resources still have to be cleaned up. In particular, the program may reserve some screen space for a toolbar; not cleaning up that space means part of the screen won't be available to other programmes.
I've tried several approaches, based on various other people's attempts:
using c-style signals (<csignal> -- setting signal handlers on all defined signals)
using std::set_terminate
using DllMain to set signals/terminate
using __try/__except
using system to invoke a copy of the program and communicating resources (a HWND) through the registry. This one was a long shot, but I had to try it.
None of them worked as hoped: except for the system bit, I couldn't get any of the error-handling code to run at all.
We're using Visual Studio 2012, so we have C++(ish)11(ish) available.
As the comments note, drop C++. You can't trust the C++ library anymore once your program has corrupted memory all over the place.
The first step is to figure out the event you're reacting to. "SegFault" is POSIX. Windows has Access Violations (the famous C0000005). This might also be why you were misled by signal. It's a bit of POSIX that ended up in C. Windows simply does not use signal.
The next step is how you react to them. My preference is a Vectored Exception Handler. Structured Exception Handling assumes the stack is somewhat sane, and that too is a guess. A Vectored Exception Handler is in effect a hard jump. We're not going to return; we're just doing cleanup before calling TerminateProcess. Same pattern again: ExitProcess is what you would use if the state of your program could be trusted, but it can't.
In your Vectored Exception Handler, you'll query the OS about the existence of that toolbar. Don't believe your own program: you can't trust it, and besides, if the OS doesn't think there's a toolbar, then there isn't one. Use the handle returned by the OS, and destroy that toolbar. Then commit suicide with TerminateProcess.

Application crash with no explanation

I'd like to apologize in advance, because this is not a very good question.
I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.
When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.
Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.
All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.
The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.
The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).
We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.
What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.
If your app is evaporating and not generating a dump file, then it is likely that an exception is being generated which your app doesn't (or can't) handle. This could happen in two instances:
1) A top-level exception is generated and there is no matching catch block for that exception type.
2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged. This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:
"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"
-or-
"My program might crash, but if it does I want to create a dump file on the way down."
The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.
The latter is a noble and robust approach, but implementing it is much more difficult than it might seem, and it is fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.
Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler and register it with Windows via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a separate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful not to call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.
I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?
Analyze memory in a signal handler
http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx
This isn't a very good answer, but hopefully it might help you.
I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):
Dereferencing potentially invalid pointers can disable stack expansion in other threads. A thread exhausting its stack, when stack expansion has been disabled, results in the immediate termination of the parent process, with no pop-up error window or diagnostic information.
As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.
There's some technical background in Raymond Chen's article "IsBadXxxPtr should really be called CrashProgramRandomly".
Late response, but maybe it helps someone: every Windows app has a limit on how many handles it can have open at any time. We had a service that did not release a handle in some situation, and the service would just disappear after a few days, or at times weeks (depending on the usage of the service).
Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)

Automatically Relaunch Application On Crash?

On Android, I'm running an application using the NDK that runs a series of tests in C++. If ever one of the tests fails, which most likely means a crash, I'd like the application to relaunch itself and start at the next test.
I wish I could use exceptions but the NDK doesn't support them.
Is this possible?
Why does your application have to crash? Why not catch any exception being thrown? Even though the compiler doesn't force you to add a try...catch block, RuntimeExceptions might still be thrown.
You can also use Thread.setDefaultUncaughtExceptionHandler, which applies to all threads that have no handler of their own; Thread.setUncaughtExceptionHandler sets a handler for a single thread.
If, for some reason, the solutions above are not suitable for you, you could create a background service that acts as a watchdog timer.
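The following is a different technique from the answer's, swapped in here for the native side. To get "relaunch and continue with the next test", run each test in a forked child process and let the parent act as the watchdog. This assumes the tests run from a plain native executable (for example one pushed and started via adb) where `fork()`/`waitpid()` are usable. The `TestFn` signature and the `run_isolated`/`run_all` names are hypothetical.

```cpp
#include <cerrno>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of one isolated test run.
struct TestOutcome {
    bool crashed;  // true if the child was killed by a signal
    int  code;     // exit status, or signal number if crashed (-1: fork/wait failed)
};

using TestFn = int (*)();  // hypothetical test signature: returns 0 on success

// Runs test in a forked child so a crash only kills the child.
TestOutcome run_isolated(TestFn test) {
    std::fflush(nullptr);          // avoid duplicating buffered stdio output in the child
    pid_t pid = fork();
    if (pid < 0) return {false, -1};
    if (pid == 0) _exit(test());   // child: run the test, exit without unwinding
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return {false, -1};
    }
    if (WIFSIGNALED(status)) return {true, WTERMSIG(status)};
    return {false, WEXITSTATUS(status)};
}

// Runs every test in order; a crashing test is reported and the loop moves on.
int run_all(const TestFn* tests, int count) {
    int failures = 0;
    for (int i = 0; i < count; ++i) {
        TestOutcome r = run_isolated(tests[i]);
        if (r.crashed || r.code != 0) {
            ++failures;
            std::printf("test %d %s %d\n", i,
                        r.crashed ? "crashed with signal" : "failed with code", r.code);
        }
    }
    return failures;
}
```

The parent never runs test code itself, so a crash in one test costs only that child process. `WIFSIGNALED`/`WTERMSIG` distinguish a crash from an ordinary failing exit code.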
EDIT: Check this link: for a custom version of the NDK that supports C++ exceptions. I found it in this thread.

Catch unhandled exception of invisible thread

In my C++ application I use an ActiveX component that runs its own thread (or several, I don't know). Sometimes this component throws exceptions. I would like to catch these exceptions and do recovery instead of my entire application crashing. But since I don't have access to its source code or thread, I am unsure how it would be done.
The only solution I can think of is to run it in its own process. Using something like CreateProcess and then CreateRemoteThread, unsure how it could be implemented.
Any suggestion on how to go about solving this?
If the ActiveX component is launching its own threads, then there isn't a lot that you can do. You could set a global exception handler and try to swallow exceptions, but this creates a high likelihood that your program state will become corrupted and lead to bizarre "impossible" crashes down the road.
Running the buggy component in a separate process is the most robust solution, as you'll be able to identify and recover from fatal errors without compromising your own program state.
Try setting up an exception filter with SetUnhandledExceptionFilter().

Is it possible for a process to catch an unhandled exception of another process on windows?

Is it possible for a process to catch an unhandled exception of another process on the system?
If possible, under which circumstances is it possible? Is it for instance possible if the second process is not started by the first?
I am mainly looking for an answer regarding native C++.
Native (AKA standard) C++ has no real concept of multiple processes, and has no means of catching exceptions thrown across process boundaries. And no means of throwing exceptions across such boundaries, come to that.
Windows Exceptions: Structured Exception Handling (SEH) is per thread. Another thread in the process might be able to manipulate the stack of the target thread to insert its own handler, but that is going to be hard to get right (especially with the lack of consistent calling convention on x86). Another process could inject a dll & thread into a process to do this. This will be hard to get right, especially without close coupling to the details of the target process (what functions are called and how).
On second thought, debuggers can do this, so the Win32 debugger APIs must have this capability. A process can debug other processes in the same session (with lower or equal integrity level), or, if the user has the "Debug programs" privilege (SeDebugPrivilege), any process.
Yes. Matt Pietrek explains how. Scroll down to the "VectoredExceptionHandling is a clean, easily extensible way to see all exceptions" part. There's example code as well.