I am refactoring an old code, and one of the things I'd like to address is the way that errors are handled. I'm well aware of exceptions and how they work, but I'm not entirely sure they're the best solution for the situations I'm trying to handle.
In this code, if things don't validate, there's really no reason or advantage to unwind the stack. We're done. There's no point in trying to save the ship, because it's a non-interactive code that runs in parallel through the Sun Grid Engine. The user can't intervene. What's more, these validation failures don't really represent exceptional circumstances. They're expected.
So how do I best deal with this? One thing I'm not sure I want is an exit point in every class method that can fail. That seems unmaintainable. Am I wrong? Is it acceptable practice to just call exit() or abort() at the failure point in codes like this? Or should I throw an exception all the way back to some generic catch statement in main? What's the advantage?
Throwing an exception to be caught in main and then exiting means your RAII resource objects get cleaned up. On most systems this isn't needed for a lot of resource types. The OS will clean up memory, file handles, etc. (though I've used a system where failing to free memory meant it remained allocated until system restart, so leaking on program exit wasn't a good idea.)
But there are other resource types that you may want to release cleanly such as network or database connections, or a mechanical device you're driving and need to shut down safely. If an application uses a lot of such things then you may prefer to throw an exception to unwind the stack back to main, and then exit.
So the appropriate method of exiting depends on the application. If an application knows it's safe then calling _Exit(), abort(), exit(), or quickexit() may be perfectly reasonable. (Library code shouldn't call these, since obviously the library has no idea whether its safe for every application that will ever use the library.) If there is some critical clean up that must be performed before an application exits but you know it's limited, then the application can register that clean up code via atexit() or at_quick_exit().
So basically decide what you need cleaned up, document it, implement it, and try to make sure it's tested.
It is acceptable to terminate the program if it cannot handle the error gracefully. There are few things you can do:
Call abort() if you need a core dump.
Call exit() if you want to give a chance to run to those routines registered with atexit() (that is most likely to call destructors for global C++ objects).
Call _exit() to terminate a process immediately.
There is nothing wrong with using those functions as long as you understand what you are doing, know your other choices, and choose that path willingly. After all, that's why those functions exist. So if you don't think it makes any sense to try to handle the error or do anything else when it happens - go ahead. What I would probably do is try to log some informative message (say, to syslog), and call _exit. If logging fails - call abort to get a core along the termination.
I'd suggest to call global function
void stopProgram() {
exit(1);
}
Later you can change it's behavior, so it is maintainable.
As you pointed out, having an exit or abort thrown around throughout your code is not maintainable ... additionally, there may be a mechanism in the future that could allow you to recover from an error, or handle an error in a more graceful manner than simply exiting, and if you've already hard-coded this functionality in, then it would be very hard to undo.
Throwing an exception that is caught in main() is your best-bet at this point that will also give you flexibility in the future should you run the code under a different scenario that will allow you to recover from errors, or handle them differently. Additionally, throwing exceptions could help should you decide to add more debugging support, etc., as it will give you spots to implement logging features and record the program state from isolated and maintainable points in the software before you decide let the program exit.
Related
I have a program that uses services from others. If the program crashes, what is the best way to close those services? At server side, I would define some checkers that monitor if a client is invalid periodically. But can we do any thing at client? I am not the sure if the normal RAII still works at this case. My code is written in C and C++.
If your application experiences a hard crash, then no, your carefully crafted cleanup code will not run, whether it is part of an RAII paradigm or a method you call at the end of main. None of an application's cleanup code runs after a crash that causes the application to be terminated.
Of course, this is not true for exceptions. Although those might eventually cause the application to be terminated, they still trigger this termination in a controlled way. Generally, the runtime library will catch an unhandled exception and trigger termination. Along the way, your RAII-based cleanup code will be executed, unless it also throws an exception. Then you're back to being unceremoniously ripped out of memory.
But even if your application's cleanup code can't run, the operating system will still attempt to clean up after you. This solves the problem of unreleased memory, handles, and other system objects. In general, if you crash, you need not worry about releasing these things. Your application's state is inconsistent, so trying to execute a bunch of cleanup code will just lead to unpredictable and potentially erroneous behavior, not to mention wasting a bunch of time. Just crash and let the system deal with your mess. As Raymond Chen puts it:
The building is being demolished. Don't bother sweeping the floor and emptying the trash cans and erasing the whiteboards. And don't line up at the exit to the building so everybody can move their in/out magnet to out. All you're doing is making the demolition team wait for you to finish these pointless housecleaning tasks.
Do what you must; skip everything else.
The only problem with this approach is, as you suggest in this question, when you're managing resources that are not controlled by the operating system, such as a remote resource on another system. In that case, there is very little you can do. The best scenario is to make your application as robust as possible so that it doesn't crash, but even that is not a perfect solution. Consider what happens when the power is lost, e.g. because a user's cat pulled the cord from the wall. No cleanup code could possibly run then, so even if your application never crashes, there may be termination events that are outside of your control. Therefore, your external resources must be robust in the event of failure. Time-outs are a standard method, and a much better solution than polling.
Another possible solution, depending on the particular use case, is to run consistency-checking and cleanup code at application initialization. This might be something that you would do for a service that is intended to run continuously and will be restarted promptly after termination. The next time it restarts, it checks its data and/or external resources for consistency, releases and/or re-initializes them as necessary, and then continues on as normal. Obviously this is a bad solution for a typical application, because there is no guarantee that the user will relaunch it in a timely manner.
As the other answers make clear, hoping to clean up after an uncontrolled crash (i.e., a failure which doesn't trigger the C++ exception unwind mechanism) is probably a path to nowhere. Even if you cover some cases, there will be other cases that fail and you are building in a serious vulnerability to those cases.
You mention that the source of the crashes is that you are "us[ing] services from others". I take this to mean that you are running untrusted code in-process, which is the potential source of crashes. In this case, you might consider running the untrusted code "out of process" and communicating back to your main process through a pipe or shared memory or whatever. Then you isolate the crashes this child process, and can do controlled cleanup in your main process. A separate process is really the lightest weight thing you can do that gives you the strong isolation you need to avoid corruption in the calling code.
If forking a process per-call is performance-prohibitive, you can try to keep the child process alive for multiple calls.
One approach would be for your program to have two modes: normal operation and monitoring.
When started in a usual way, it would :
Act as a background monitor.
Launch a subprocess of itself, passing it an internal argument (something that wouldn't clash with normal arguments passed to it, if any).
When the subprocess exists, it would release any resources held at the server.
When started with the internal argument, it would:
Expose the user interface and "act normally", using the resources of the server.
You might look into atexit, which may give you the functionality you need to release resources upon program termination. I don't believe it is infallible, though.
Having said that, however, you should really be focusing on making sure your program doesn't crash; if you're hitting an error that is "unrecoverable", you should still invest in some error-handling code. If the error is caused by a Seg-Fault or some other similar OS-related error, you can either enable SEH exceptions (not sure if this is Windows-specific or not) to enable you to catch them with a normal try-catch block, or write some Signal Handlers to intercept those errors and deal with them.
We have a third-party library that was written without multithreading or exception handling in mind. Our main executable is multithreaded and uses exceptions.
The third-party library uses exit() to abort the program for serious problems (like "driver not initialized" or "file not found"). Calling exit() in a multithreaded application is not allowed, as it does not shut down threads correctly. In addition, I really don't want to ever exit the main application, as it is a server application, and in many cases, there are proactive things that the main program can do to recover from the error.
I would like to essentially replace the system provided exit(int status) function with my own function, ie
class exit_exception : public runtime_error
{
public: exit_exception(int status)
: runtime_error("exit called with status " + to_string(status)) {}
};
extern "C" void exit(int status) {
throw exit_exception(status);
}
and catch the exception in my code. It seems to work, but this is obviously a hack and not the way nature intended exit() to be used. What am I doing wrong without knowing?
edit
Many have suggested I put this in a separate process, but that would defeat many things. The third-party library does very high speed data transfer that needs to be in the main application process because it lives in the same virtual memory space and does not use malloc to allocate memory from the FPGA coprocessor that is controller. This code is close to the "iron" and is squeezing every bit of bandwidth out of the memory and PCIe busses.
edit 2
My program can still return status codes to the OS with the return value from int main(), which does not ultimately call exit(). Otherwise I would be in real trouble.
This is just an idea, but you could use a similar approach as i did when i needed to wrap memcpy to use some different version, take a look at my answer here.
So you could build a replacement for the exit() function that does nothing, or do some cleanup. It's just an idea and i have not tried it, but it could help you to solve your problem.
The biggest and most obvious risk is resource leakage.
If the library thinks its error handling strategy is to dive out of the nearest window there's alway a risk that throwing that exception isn't going to result in well organised release of memory and system resources.
However I notice you mention it doesn't allocate memory with malloc() if that means you have to provide it with all its resources as buffers and the like then maybe by some miraculous accident it is safely unwindable!
If that fails all I can suggest is contact the supplier and persuade them that they should join the rest of us in 21st programming paradigms.
PS: Throwing an exception out of an atexit() handler causes termination of C++ programs. So adding a throwing handler is not an option.
It's also not an option if some other library inserts an 'atexit()' handler after yours.
You can try leaving atexit() handler with longjmp. Though the behavior is undefined :)
The other way is to run lib in a separate process, bridged with any IPC. Harder, tedious, but safer.
Or, you can try to scan binary image and hook any exit() invocations. I know about MS detours and mhook on windows. Though I know none on linux, unfortunately.
I'm working on a C++ application which uses a library written in C by another team. The writers of the library like to call exit() when errors happen, which ends the program immediately without calling the destructors of objects on the stack in the C++ application. The application sets up some system resources which don't automatically get reclaimed by the operating system after the process ends (shared memory regions, interprocess mutexes, etc), so this is a problem.
I have complete source code for both the app and the library, but the library is very well-established and has no unit tests, so changing it would be a big deal. Is there a way to "hook" the calls to exit() so I can implement graceful shutdown for my app?
One possibility I'm considering is making one big class which is the application - meaning all cleanup would happen either in its destructor or in the destructor of one of its members - then allocating one of these big objects on the heap in main(), setting a global pointer to point to it, and using atexit() to register a handler which simply deletes the object via the global pointer. Is that likely to work?
Is there a known good way to approach this problem?
In the very worst case, you can always write your own implementation of exit and link it rather than the system's own implementation. You can handle the errors there, and optionally call _exit(2) yourself.
Since you have the library source, it's even easier - just add a -Dexit=myExit flag when building it, and then provide an implementation of myExit.
install exit handler with atexit and implement the desired behavior
If you want to make the C library more usable from C++, you could perhaps run it in a separate process. Then make sure (with an exit handler or otherwise) that when it exits, your main application process notices and throws an exception to unwind its own stack. Perhaps in some cases it could handle the error in a non-fatal way.
Of course, moving the library use into another process might not be easy or particularly efficient. You'll have some work to do to wrap the interface, and to copy inputs and outputs via the IPC mechanism of your choice.
As a workaround to use the library from your main process, though, I think the one you describe should work. The risk is that you can't identify and isolate everything that needs cleaning up, or that someone in future modifies your application (or another component you use) on the assumption that the stack will get unwound normally.
You could modify the library source to call a runtime- or compile-time-configurable function instead of calling exit(). Then compile the library with exception-handling and implement the function in C++ to throw an exception. The trouble with that is that the library itself probably leaks resources on error, so you'd have to use that exception only to unwind the stack (and maybe do some error reporting). Don't catch it and continue even if the error could be non-fatal as far as your app is concerned.
If the call exit and not assert or abort, there are a few points to get control again:
When calling exit, the destructors for objects with static lifetime (essentially: globals and objects declared with static) are still executed. This means you could set up a (few) global "resource manager" object(s) and release the resources in their destructor(s).
As you already found, you can register hooks with atexit. This is not limited to one. You can register more.
If all else fails, because you have the source of the library, you can play some macro tricks to effectively replace the calls to exit with a function of your own that could, for example, throw an exception.
I'd like to apologize in advance, because this is not a very good question.
I have a server application that runs as a service on a dedicated Windows server. Very randomly, this application crashes and leaves no hint as to what caused the crash.
When it crashes, the event logs have an entry stating that the application failed, but gives no clue as to why. It also gives some information on the faulting module, but it doesn't seem very reliable, as the faulting module is usually different on each crash. For example, the latest said it was ntdll, the one before that said it was libmysql, the one before that said it was netsomething, and so on.
Every single thread in the application is wrapped in a try/catch (...) (anything thrown from an exception handler/not specifically caught), __try/__except (structured exceptions), and try/catch (specific C++ exceptions). The application is compiled with /EHa, so the catch all will also catch structured exceptions.
All of these exception handlers do the same thing. First, a crash dump is created. Second, an entry is logged to a new file on disk. Third, an entry is logged in the application logs. In the case of these crashes, none of this is happening. The bottom most exception handler (the try/catch (...)) does nothing, it just terminates the thread. The main application thread is asleep and has no chance of throwing an exception.
The application log files just stop logging. Shortly after, the process that monitors the server notices that it's no longer responding, sends an alert, and starts it again. If the server monitor notices that the server is still running, but just not responding, it takes a dump of the process and reports this, but this isn't happening.
The only other reason for this behavior that I can come up with, aside from uncaught exceptions, is a call to exit or similar. Searching the code brings up no calls to any functions that could terminate the process. I've also made sure that the program isn't terminating normally (i.e. a stop request from the service manager).
We have tried running it with windbg attached (no chance to use Visual Studio, the overhead is too high), but it didn't report anything when the crash occurred.
What can cause an application to crash like this? We're beginning to run out of options and consider that it might be a hardware failure, but that seems a bit unlikely to me.
If your app is evaporating an not generating a dump file, then it is likely that an exception is being generated which your app doesnt (or cant) handle. This could happen in two instances:
1) A top-level exception is generated and there is no matching catch block for that exception type.
2) You have a matching catch block (such as catch(...)), but you are generating an exception within that handler. When this happens, Windows will rip the bones from your program. Your app will simply cease to exist. No dump will be generated, and virtually nothing will be logged, This is Windows' last-ditch effort to keep a rogue program from taking down the entire system.
A note about catch(...). This is patently Evil. There should (almost) never be a catch(...) in production code. People who write catch(...) generally argue one of two things:
"My program should never crash. If anything happens, I want to recover from the exception and continue running. This is a server application! ZOMG!"
-or-
"My program might crash, but if it does I want to create a dump file on the way down."
The former is a naive and dangerous attitude because if you do try to handle and recover from every single exception, you are going to do something bad to your operating footprint. Maybe you'll munch the heap, keep resources open that should be closed, create deadlocks or race conditions, who knows. Your program will suffer from a fatal crash eventually. But by that time the call stack will bear no resemblance to what caused the actual problem, and no dump file will ever help you.
The latter is a noble & robust approach, but the implementation of it is much more difficult that it might seem, and it fraught with peril. The problem is you have to avoid generating any further exceptions in your exception handler, and your machine is already in a very wobbly state. Operations which are normally perfectly safe are suddenly hand grenades. new, delete, any CRT functions, string formatting, even stack-based allocations as simple as char buf[256] could make your application go >POOF< and be gone. You have to assume the stack and the heap both lie in ruins. No allocation is safe.
Moreover, there are exceptions that can occur that a catch block simply can't catch, such as SEH exceptions. For that reason, I always write an unhandled-exception handler, and register it with Windows, via SetUnhandledExceptionFilter. Within my exception handler, I allocate every single byte I need via static allocation, before the program even starts up. The best (most robust) thing to do within this handler is to trigger a seperate application to start up, which will generate a MiniDump file from outside of your application. However, you can generate the MiniDump from within the handler itself if you are extremely careful no not call any CRT function directly or indirectly. Basically, if it isn't an API function you're calling, it probably isn't safe.
I've seen crashes like these happen as a result of memory corruption. Have you run your app under a memory debugger like Purify to see if that sheds some light on potential problem areas?
Analyze memory in a signal handler
http://msdn.microsoft.com/en-us/library/xdkz3x12%28v=VS.100%29.aspx
This isn't a very good answer, but hopefully it might help you.
I ran into those symptoms once, and after spending some painful hours chasing the cause, I found out a funny thing about Windows (from MSDN):
Dereferencing potentially invalid
pointers can disable stack expansion
in other threads. A thread exhausting
its stack, when stack expansion has
been disabled, results in the
immediate termination of the parent
process, with no pop-up error window
or diagnostic information.
As it turns out, due to some mis-designed data sharing between threads, one of my threads would end up dereferencing more or less random pointers - and of course it hit the area just around the stack top sometimes. Tracking down those pointers was heaps of fun.
There's some technincal background in Raymond Chen's IsBadXxxPtr should really be called CrashProgramRandomly
Late response, but maybe it helps someone: every Windows app has a limit on how many handles can have open at any time. We had a service not releasing a handle in some situation, the service would just disappear, after a few days, or at times weeks (depending on the usage of the service).
Finding the leak was great fun :D (use Task Manager to see thread count, handles count, GDI objects, etc)
If you check this link http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=107
it's written:
"For example, the abort() and exit() library functions are never to be used in an object-oriented environment—even during debugging—because they don't invoke objects' destructors before program termination."
Why does the destructor need to be called when one calls exit? (Since the OS guarantees memory will be reclaim whenever a program exits, right?)
Destructors can, and often do, other operations besides freeing memory and/or resources. They are often used to make certain other guarantees such as user data is written to a file or non-process specific resources are in a known state. The OS won't do these types of operations on exit.
That being said, any program which relies on these types of actions is fundamentally flawed. Use of exit and abort are not the only ways destructors are avoided. There are many other ways a destructor can be circumvented. For example user forcable terminating the process or a power outage.
I definitely disagree with the use of never in the quoted passage. I can think of at least one situation where you absolutely don't want destructors executing: corrupted memory. At the point you detect corrupted memory you can no longer make any guarantees about the code in your process, destructors included. Code which should write data to a file might delete / corrupt it instead.
The only safe thing to do when memory corruption is detected is to exit the process as fast as possible.
First I will like to advice that "Don't believe anything blindly you have read."
Probably as JaredPar said the destructor may doing some logging things and closing of OS resource handle things. In case you call abort or exit these things will never happen.
But I certainly do not agree with the quote at all. Abort is the best and the fastest way of finding the programming errors in the development cycle. As a developer you certainly don't put abort blindly everywhere in the code, but on the condition you know that should have never happened. You put abort when you know that the other programmer or you have messed up somewhere in the code and it's better to stop than to handle the error. I have gone through a situation where abort has really saved my #$$.