I'm working on a C++ application which uses a library written in C by another team. The writers of the library like to call exit() when errors happen, which ends the program immediately without calling the destructors of objects on the stack in the C++ application. The application sets up some system resources which don't automatically get reclaimed by the operating system after the process ends (shared memory regions, interprocess mutexes, etc), so this is a problem.
I have complete source code for both the app and the library, but the library is very well-established and has no unit tests, so changing it would be a big deal. Is there a way to "hook" the calls to exit() so I can implement graceful shutdown for my app?
One possibility I'm considering is making one big class which is the application - meaning all cleanup would happen either in its destructor or in the destructor of one of its members - then allocating one of these big objects on the heap in main(), setting a global pointer to point to it, and using atexit() to register a handler which simply deletes the object via the global pointer. Is that likely to work?
Is there a known good way to approach this problem?
In the very worst case, you can always write your own implementation of exit and link it rather than the system's own implementation. You can handle the errors there, and optionally call _exit(2) yourself.
Since you have the library source, it's even easier - just add a -Dexit=myExit flag when building it, and then provide an implementation of myExit.
install exit handler with atexit and implement the desired behavior
If you want to make the C library more usable from C++, you could perhaps run it in a separate process. Then make sure (with an exit handler or otherwise) that when it exits, your main application process notices and throws an exception to unwind its own stack. Perhaps in some cases it could handle the error in a non-fatal way.
Of course, moving the library use into another process might not be easy or particularly efficient. You'll have some work to do to wrap the interface, and to copy inputs and outputs via the IPC mechanism of your choice.
As a workaround to use the library from your main process, though, I think the one you describe should work. The risk is that you can't identify and isolate everything that needs cleaning up, or that someone in future modifies your application (or another component you use) on the assumption that the stack will get unwound normally.
You could modify the library source to call a runtime- or compile-time-configurable function instead of calling exit(). Then compile the library with exception-handling and implement the function in C++ to throw an exception. The trouble with that is that the library itself probably leaks resources on error, so you'd have to use that exception only to unwind the stack (and maybe do some error reporting). Don't catch it and continue even if the error could be non-fatal as far as your app is concerned.
If the call exit and not assert or abort, there are a few points to get control again:
When calling exit, the destructors for objects with static lifetime (essentially: globals and objects declared with static) are still executed. This means you could set up a (few) global "resource manager" object(s) and release the resources in their destructor(s).
As you already found, you can register hooks with atexit. This is not limited to one. You can register more.
If all else fails, because you have the source of the library, you can play some macro tricks to effectively replace the calls to exit with a function of your own that could, for example, throw an exception.
Related
I work on a product that's usually built as a shared library.
The using application will load it, create some handles, use them, and eventually free all the handles and unload the library.
The library creates some background threads which are usually stopped at the point the handles are freed.
Now, the issue is that some consuming applications aren't super well-behaved, and will fail to free the handles in some cases (cancellation, errors, etc). Eventually, static destructors in our library run, and crash when they try to interact with the (now dead) background thread(s).
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction. This would probably solve the crash on process exit, but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles (as opposed to exiting), as we wouldn't ensure that the background threads are actually stopped before the code they were running was unloaded.
More importantly, to my knowledge, when main() exits, all other threads will be killed, wherever they happened to be at the time, which could leave locks locked, and invariants broken (for example, within the heap manager).
Given that, does it even make sense to try and support these buggy applications?
Yes, your library should allow the process to exit without warning. Perhaps in an ideal world every program using your library would carefully track the handles and free them all when it exits for any reason, but in practice this isn't a realistic requirement. The code path that is triggering the program exit might be a shared component that isn't even aware that your library is in use!
In any case, it is likely that your current architecture has a more general problem, because it is inherently unsafe for static destructors to interact with other threads.
From DllMain entry point in MSDN:
Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.
and
If your DLL is linked with the C run-time library (CRT), the entry point provided by the CRT calls the constructors and destructors for global and static C++ objects. Therefore, these restrictions for DllMain also apply to constructors and destructors and any code that is called from them.
In particular, if your destructors attempt to wait for your threads to exit, that is almost certain to deadlock in the case where the library is explicitly unloaded while the threads are still running. If the destructors don't wait, the process will crash when the code the threads are running disappears. I'm not sure why you aren't seeing that problem already; perhaps you are terminating the threads? (That's not safe either, although for different reasons.)
There are a number of ways to resolve this problem. Probably the simplest is the one you already mentioned:
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction.
You go on to say:
[...] but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles [...]
That's not your problem! The library will only be unloaded if the application explicitly chooses to do so; obviously, and unlike the earlier scenario, the code in question knows your library is present, so it is perfectly reasonable for you to require that it close all your handles before doing so.
Ideally, however, you would provide an uninitialization function that closes all the handles automatically, rather than requiring the application to close each handle individually. Explicit initialization and uninitialization functions also allows you to safely set up and free global resources, which is usually more efficient than doing all of your setup and teardown on a per-handle basis and is certainly safer than using global objects.
(See the link above for a full description of all the restrictions applicable to static constructors and destructors; they are quite extensive. Constructing all your globals in an explicit initialization routine, and destroying them in an explicit uninitialization routine, avoids the whole messy business.)
We have a third-party library that was written without multithreading or exception handling in mind. Our main executable is multithreaded and uses exceptions.
The third-party library uses exit() to abort the program for serious problems (like "driver not initialized" or "file not found"). Calling exit() in a multithreaded application is not allowed, as it does not shut down threads correctly. In addition, I really don't want to ever exit the main application, as it is a server application, and in many cases, there are proactive things that the main program can do to recover from the error.
I would like to essentially replace the system provided exit(int status) function with my own function, ie
class exit_exception : public runtime_error
{
public: exit_exception(int status)
: runtime_error("exit called with status " + to_string(status)) {}
};
extern "C" void exit(int status) {
throw exit_exception(status);
}
and catch the exception in my code. It seems to work, but this is obviously a hack and not the way nature intended exit() to be used. What am I doing wrong without knowing?
edit
Many have suggested I put this in a separate process, but that would defeat many things. The third-party library does very high speed data transfer that needs to be in the main application process because it lives in the same virtual memory space and does not use malloc to allocate memory from the FPGA coprocessor that is controller. This code is close to the "iron" and is squeezing every bit of bandwidth out of the memory and PCIe busses.
edit 2
My program can still return status codes to the OS with the return value from int main(), which does not ultimately call exit(). Otherwise I would be in real trouble.
This is just an idea, but you could use a similar approach as i did when i needed to wrap memcpy to use some different version, take a look at my answer here.
So you could build a replacement for the exit() function that does nothing, or do some cleanup. It's just an idea and i have not tried it, but it could help you to solve your problem.
The biggest and most obvious risk is resource leakage.
If the library thinks its error handling strategy is to dive out of the nearest window there's alway a risk that throwing that exception isn't going to result in well organised release of memory and system resources.
However I notice you mention it doesn't allocate memory with malloc() if that means you have to provide it with all its resources as buffers and the like then maybe by some miraculous accident it is safely unwindable!
If that fails all I can suggest is contact the supplier and persuade them that they should join the rest of us in 21st programming paradigms.
PS: Throwing an exception out of an atexit() handler causes termination of C++ programs. So adding a throwing handler is not an option.
It's also not an option if some other library inserts an 'atexit()' handler after yours.
You can try leaving atexit() handler with longjmp. Though the behavior is undefined :)
The other way is to run lib in a separate process, bridged with any IPC. Harder, tedious, but safer.
Or, you can try to scan binary image and hook any exit() invocations. I know about MS detours and mhook on windows. Though I know none on linux, unfortunately.
I am refactoring an old code, and one of the things I'd like to address is the way that errors are handled. I'm well aware of exceptions and how they work, but I'm not entirely sure they're the best solution for the situations I'm trying to handle.
In this code, if things don't validate, there's really no reason or advantage to unwind the stack. We're done. There's no point in trying to save the ship, because it's a non-interactive code that runs in parallel through the Sun Grid Engine. The user can't intervene. What's more, these validation failures don't really represent exceptional circumstances. They're expected.
So how do I best deal with this? One thing I'm not sure I want is an exit point in every class method that can fail. That seems unmaintainable. Am I wrong? Is it acceptable practice to just call exit() or abort() at the failure point in codes like this? Or should I throw an exception all the way back to some generic catch statement in main? What's the advantage?
Throwing an exception to be caught in main and then exiting means your RAII resource objects get cleaned up. On most systems this isn't needed for a lot of resource types. The OS will clean up memory, file handles, etc. (though I've used a system where failing to free memory meant it remained allocated until system restart, so leaking on program exit wasn't a good idea.)
But there are other resource types that you may want to release cleanly such as network or database connections, or a mechanical device you're driving and need to shut down safely. If an application uses a lot of such things then you may prefer to throw an exception to unwind the stack back to main, and then exit.
So the appropriate method of exiting depends on the application. If an application knows it's safe then calling _Exit(), abort(), exit(), or quickexit() may be perfectly reasonable. (Library code shouldn't call these, since obviously the library has no idea whether its safe for every application that will ever use the library.) If there is some critical clean up that must be performed before an application exits but you know it's limited, then the application can register that clean up code via atexit() or at_quick_exit().
So basically decide what you need cleaned up, document it, implement it, and try to make sure it's tested.
It is acceptable to terminate the program if it cannot handle the error gracefully. There are few things you can do:
Call abort() if you need a core dump.
Call exit() if you want to give a chance to run to those routines registered with atexit() (that is most likely to call destructors for global C++ objects).
Call _exit() to terminate a process immediately.
There is nothing wrong with using those functions as long as you understand what you are doing, know your other choices, and choose that path willingly. After all, that's why those functions exist. So if you don't think it makes any sense to try to handle the error or do anything else when it happens - go ahead. What I would probably do is try to log some informative message (say, to syslog), and call _exit. If logging fails - call abort to get a core along the termination.
I'd suggest to call global function
void stopProgram() {
exit(1);
}
Later you can change it's behavior, so it is maintainable.
As you pointed out, having an exit or abort thrown around throughout your code is not maintainable ... additionally, there may be a mechanism in the future that could allow you to recover from an error, or handle an error in a more graceful manner than simply exiting, and if you've already hard-coded this functionality in, then it would be very hard to undo.
Throwing an exception that is caught in main() is your best-bet at this point that will also give you flexibility in the future should you run the code under a different scenario that will allow you to recover from errors, or handle them differently. Additionally, throwing exceptions could help should you decide to add more debugging support, etc., as it will give you spots to implement logging features and record the program state from isolated and maintainable points in the software before you decide let the program exit.
I'm currently wrapping a lua class in c++ and it's going pretty well so far. But I'm wondering if there some way to break a lua script from running(could be in the middle of the script) for another thread. So if I run my lua script on thread 1, can I break it from thread 2? Would lua_close(...) do that?
Thanks.
If this is an expected occurrence and most of the Lua script's time is spent inside Lua functions (i.e., not lengthy, blocking C calls), you could install a debug hook that checks for your "break flag" every N instructions and aborts the script. See the "debug library" section of Programming in Lua.
Then there must be some way to force the script to halt?
No, there doesn't.
Lua provides exactly one thread safety guarantee: two separate lua_States (separate being defined as different objects returned by lua_open) can be called freely from two different CPU threads. Thread A can call lua_State X, and thread B can call lua_State Y.
That is the only guarantee Lua gives you. Lua is inherently single-threaded; it only provides this guarantee because all of the global information for Lua is contained in a self-contained object: lua_State.
The rules of Lua require that only one thread talk to a lua_State at any one time. Therefore, it is impossible for Lua code to be executing and then be halted or paused by external C++ code. That would require breaking the rule: some other thread would have to call functions on this thread's lua_State.
If you want to abort a script in-progress, you will have to use debug hooks. This will dramatically slow down the performance of your script (if that matters to you). You will also need thread synchronization primitives, so that thread B can set a flag that thread A will read and halt on.
As for how you "abort" the script, that... is impossible in the most general case. And here's why.
The lua_State represents a Lua thread of execution. It's got a stack, a pointer to instructions to interpret, and some code it's currently executing. To "abort" this is not a concept that Lua has. What Lua does have is the concept of erroring. Calling lua_error from within C code being called by Lua will cause it to longjmp out of that function. This will also unwind the Lua thread, and it will return an error to the most recent function that handles errors.
Note: if you use lua_error in C++ code, you need to either have compiled Lua as C++ (thus causing errors to work via exception handling rather than setjmp/longjmp) or you need to make sure that every object with a destructor is off the stack when you call it. Calling longjmp in C++ code is not advisable without great care.
So if you issue lua_pcall to call some script, and in your debug hook, you issue lua_error, then you will get an error back in your lua_pcall statement. Lua's stack will be unwound, objects destroyed, etc. That's good.
The reason why this does not work in the general case (besides the fact that I have no idea how Lua would react to a debug hook calling lua_error) is quite simple: Lua scripts can execute pcall too. The error will go to the nearest pcall statement, whether it's in Lua or in C.
This Lua script will confound this method:
local function dostuff()
while true do end
end
while true do
pcall(dostuff)
end
So issuing lua_error does not guarantee that you'll "abort the script." Think of it like throwing exceptions; anyone could catch them, so you can't ensure that they are only caught in one place.
In general, you should never want to abort a Lua scripts execution in progress. This is not something you should be wanting to do.
I'm faced with the problem that my applications global variable destructors are not called. This seems to occur only if my application successfully connects to an oracle database (using OCI).
I put some breakpoints in the CRT and it seems that DllMain (or __DllMainCRTStartup) is not called with DLL_PROCESS_DETACH, thus no atexit() is called which explains why my destructors are not called.
I have no idea why this happens.
I realize that this is probably not enough information to be able to indicate the cause, but my question is: What would be a good start to look for the cause of this problem?
This is a list of things I already tried:
search the net for solutions
attached the debugger and enable native exceptions to see there was no hidden crash, sometimes I get an exception in the .Net framework, but the app seems to continue.
try to reproduce in a small application, no success
The most common situation I encounter where this occurs is a program crash. In certain circumstances crashes can happen silently from an end user perspective. I would attach a debugger to the program, set it to break on all native exceptions and run the scenario.
It's possible that someone is calling TerminateProcess, which unlike ExitProcess does not notify DLLs of shutdown.
This might be helpful:
What happens to global variables declared in a DLL?
Are your globals declared inside a dll, or in your application's memory space?
Calling theexit API often means that the application exits without calling destructors. I an not sure that VC does in this case.
Also try avoid using global objects. This is because you have little control over when and in what order the constructors and destructors are called. Rather convert the objects into pointers and initialise and destroy the pointers using the appropriate DllMain hooks. Since DllMain is an OS construct and not a language construct, it should prove more reliable in the case of normal exits.