We've inherited a large legacy application which is structured roughly like this:
class Application
{
    Foo* m_foo;
    Bar* m_bar;
    Baz* m_baz;
public:
    Foo* getFoo() { return m_foo; }
    Bar* getBar() { return m_bar; }
    Baz* getBaz() { return m_baz; }
    void Init()
    {
        m_foo = new Foo();
        m_bar = new Bar();
        m_baz = new Baz();
        // all of them are singletons, which can call each other
        // whenever they please
        // may have internal threads, open files, acquire
        // network resources, etc.
        SomeManager.Init(this);
        SomeOtherManager.Init(this);
        AnotherManager.Init(this);
        SomeManagerWrapper.Init(this);
        ManagerWrapperHelper.Init(this);
    }
    void Work()
    {
        SomeManagerWrapperHelperWhateverController.Start();
        // it will never finish
    }
    // no destructor, no cleanup
};
All managers, once created, stay there for the whole application lifetime. The application has no close or shutdown methods, and the managers don't have them either. So the complex interdependencies are never dealt with.
The question is: if the objects' lifetime is tightly coupled with the application lifetime, is it accepted practice to not have cleanup at all? Will the operating system (Windows in our case) be able to clean up everything (kill threads, close open file handles, sockets, etc.) once the process ends (by ending it in Task Manager or by calling special functions like ExitProcess, abort(), etc.)? What are possible problems with the above approach?
Or, a more generic question: are destructors absolutely necessary for global objects (declared outside of main)?
Will the operating system (Windows in our case) be able to clean up everything (kill threads, close open file handles, sockets, etc.) once the process ends (by ending it in Task Manager or by calling special functions like ExitProcess, abort(), etc.)? What are possible problems with the above approach?
As long as your objects aren't initialising any resources that the operating system doesn't clean up, it makes no practical difference whether you explicitly clean up or not: the OS will mop up for you when your process is terminated.
However, if your objects are creating resources which are not cleaned up by the OS then you've got a problem and need a destructor or some other explicit clean up code somewhere in your app.
Consider if one of those objects creates sessions on some remote service, like a database for example. Of course, the OS doesn't magically know that this has been done or how to clean them up when your process dies, so those sessions would remain open until something kills them (the DBMS itself probably, by enforcing some timeout threshold or other). Perhaps not a problem if your app is a tiny user of resources and you're running on a big infrastructure - but if your app creates and then orphans enough sessions then that resource contention on that remote service might start to become a problem.
if the objects' lifetime is tightly coupled with the application lifetime, is it accepted practice to not have cleanup at all?
That's a matter of subjective debate. My personal preference is to include the explicit cleanup code and make each object I create personally responsible for cleaning up after itself wherever practical. If application-lifetime objects are ever refactored such that they no longer live for the lifetime of the application, I don't have to go back and figure out whether I need to add previously-omitted cleanup. For cleanup, I generally prefer to lean towards RAII over the more pragmatic YAGNI.
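For illustration, here is a minimal sketch of that preference applied to the Application class from the question (the switch to smart pointers is my assumption, not something the original code does; Foo/Bar/Baz are the question's types):

#include <memory>

class Application
{
    std::unique_ptr<Foo> m_foo;   // each member now cleans up after itself
    std::unique_ptr<Bar> m_bar;
    std::unique_ptr<Baz> m_baz;
public:
    void Init()
    {
        m_foo = std::make_unique<Foo>();
        m_bar = std::make_unique<Bar>();
        m_baz = std::make_unique<Baz>();
    }
    // The implicit ~Application() destroys m_baz, m_bar, m_foo in reverse
    // order; if these objects ever stop living for the whole application
    // lifetime, no previously-omitted cleanup has to be retrofitted.
};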
is it accepted practice to not have cleanup at all
It depends on who you're asking.
Will the operating system (Windows in our case) be able to clean up everything (kill threads, close open file handles, sockets, etc.) once the process ends
Yes, the OS will take back everything. It will reclaim memory, free handles, etc.
What are possible problems with the above approach
One possible problem is that if you use a memory leak detector, it will constantly report that you have leaks.
In general, modern operating systems clean up all of a process's resources on exit. But in my opinion it's still good manners to clean up after yourself. (Then again, I was "raised" on the Amiga, where you had to.)
Sometimes it's forced on you by a spec or simply by the behaviour of 'peripherals'. Perhaps you have a lot of data buffered in your app that should really be flushed to disk, or maybe a DB may accumulate 'half-open' connections if they are not explicitly closed.
Other than that, as @cnicutar says, it depends who you ask. I'm firmly in the 'don't bother' camp, for the following reasons:
1) It's difficult enough to get apps to work anyway without writing extra shutdown code that is not required.
2) The more code you write, the more bugs there are and the more testing you have to do. You may have to test such code in more than one OS version:(
3) The OS developers have spent a long time ensuring that apps can always be shut down if required, (eg. by Task Manager), without any overall impact on the rest of the system. If some functionality is already there in the OS, why not leverage it?
4) Threads pose a particular problem - they could be in any state. They may be running on a different core than the thread that initiates app close or may be blocked on a system call. While it's very easy for the OS to ensure that all threads are terminated before releasing any memory, closing handles etc, it's very difficult to stop such threads in a safe and reliable manner from user code.
5) Performance-sapping memory-managers are not the only way of detecting leaks. If large objects, (eg. network buffers), are pooled, it's easy to tell if any leak during run-time without relying on 3rd-party memory-managers that issue a leak report on app close (see the sketch at the end of this answer). An intensive memory-checker like Valgrind may actually cause system problems by affecting the overall timing.
6) Empirically, every app I've ever written for Windows that has no explicit shutdown code has closed immediately and completely when the user clicks on the 'red cross' border icon. This includes busy, complex IOCP servers running on multicore boxes with thousands of connected clients.
7) Assuming that a reasonable test phase has been done - one that includes load/soak testing - it's not difficult to differentiate an app that is leaking from one that chooses to not free memory that it is using at close time. Colander-apps will show memory/handles/whatever always increasing with run time.
8) Small, occasional leaks that are not obvious are not worth spending a huge amount of time on. Most Windows boxes are restarted every month anyway, (Patch Tuesday).
9) Opaque libraries are often written by developers like me and so will generate spurious 'leak reports' on shutdown anyway.
Designing/writing/debugging/testing shutdown code solely to clean up a memory-report is an expensive luxury I can well do without:)
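As a sketch of the pooling idea in point 5 (all names here are illustrative, not from any particular codebase): keep a running count of outstanding objects and report it during normal operation, so leaks show up at run time rather than in a shutdown report.

#include <atomic>
#include <cstddef>
#include <cstdio>

class BufferPool
{
    std::atomic<int> m_outstanding{0};   // buffers handed out, not yet returned
public:
    char* acquire(std::size_t n) { ++m_outstanding; return new char[n]; }
    void release(char* p) { --m_outstanding; delete[] p; }

    // Call periodically, e.g. from a stats timer: a count that only grows
    // under steady load indicates a leak - no shutdown-time report needed.
    void report() const
    {
        std::printf("outstanding buffers: %d\n", m_outstanding.load());
    }
};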
You should determine that for each object individually. If an object requires special actions to be taken upon cleanup (such as flushing a buffer to disk), this will not happen unless you explicitly take care of it.
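For example (a hypothetical object, not from the question), here is a destructor whose side effect the OS will not perform for you:

#include <cstdio>
#include <string>

class Logger
{
    std::string m_pending;   // lines buffered in memory
    std::FILE* m_file;
public:
    explicit Logger(const char* path) : m_file(std::fopen(path, "a")) {}
    void log(const std::string& line) { m_pending += line + '\n'; }
    ~Logger()
    {
        if (m_file) {
            std::fputs(m_pending.c_str(), m_file);   // flush buffered lines
            std::fclose(m_file);
        }
    }
};

If the process is killed without ~Logger() running, the OS closes the file handle for you, but the buffered tail of the log is simply lost.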
Related
I am debugging a console multithreaded application written in C++/Qt 5.12.1. It is running on Linux Mint 18.3 x64.
This app has a SIGINT handler, a QWebSocketServer, and a QWebSocket table. It calls close() on the QWebSocketServer and abort()/deleteLater() on the items in the QWebSocket table to handle termination.
If a websocket client connects to this console app, termination fails because of some running thread (I suppose it's an internal QWebSocket thread).
Termination succeeds if there were no connections.
How can I fix it so that the app exits gracefully?
To gracefully quit the socket server, we can attempt the following:
The most important part is to allow the main thread event loop to run and wait on QWebSocketServer::closed() so that the slot calls QCoreApplication::quit().
That can be done even with:
connect(webSocketServer, &QWebSocketServer::closed,
QCoreApplication::instance(), &QCoreApplication::quit);
if we don't need a more detailed reaction.
After connecting that signal (before anything else), call pauseAccepting() to prevent more connections.
Call QWebSocketServer::close().
The steps below may not be needed if the above is sufficient. Try the above first, and only if you still have problems deal with existing and pending connections. From my experience the behavior varied across platforms and with some unique websocket implementations in the server environment (which is likely just Qt for you).
As long as we have some array of QWebSocket instances, we can try to call QWebSocket::abort() on all of them to release them immediately. This step seems to be what the question author already describes.
Try to iterate the pending connections with QWebSocketServer::nextPendingConnection() and call abort() on them. Call deleteLater() as well, if that works.
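Putting the steps together, a rough sketch (assuming server and clients correspond to the QWebSocketServer and the QWebSocket table from the question):

#include <QCoreApplication>
#include <QList>
#include <QWebSocket>
#include <QWebSocketServer>

void beginShutdown(QWebSocketServer* server, QList<QWebSocket*>& clients)
{
    // 1. Quit the event loop once the server reports it has closed.
    QObject::connect(server, &QWebSocketServer::closed,
                     QCoreApplication::instance(), &QCoreApplication::quit);

    // 2. Stop accepting new connections, then close the server.
    server->pauseAccepting();
    server->close();

    // 3. Only if the above is not sufficient: tear down existing and
    //    pending connections immediately.
    for (QWebSocket* socket : clients) {
        socket->abort();
        socket->deleteLater();
    }
    while (QWebSocket* pending = server->nextPendingConnection()) {
        pending->abort();
        pending->deleteLater();
    }
}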
There is no need to do anything. What do you mean by "graceful exit"? As soon as there's a request to terminate your application, you should terminate it immediately using exit(0) or a similar mechanism. That's what "graceful exit" should be.
Note: I got reformed. I used to think that graceful exits were a good thing. They are most often a waste of CPU resources and usually indicate problems in the architecture of the application.
A good rationale for why it should be so is written in the KJ framework (a part of capnproto).
Quoting Kenton Varda:
KJ_NORETURN(virtual void exit()) = 0;
Indicates program completion. The program is considered successful unless error() was
called. Typically this exits with _Exit(), meaning that the stack is not unwound, buffers
are not flushed, etc. -- it is the responsibility of the caller to flush any buffers that
matter. However, an alternate context implementation e.g. for unit testing purposes could
choose to throw an exception instead.
At first this approach may sound crazy. Isn't it much better to shut down cleanly? What if
you lose data? However, it turns out that if you look at each common class of program, _Exit()
is almost always preferable. Let's break it down:
Commands: A typical program you might run from the command line is single-threaded and
exits quickly and deterministically. Commands often use buffered I/O and need to flush
those buffers before exit. However, most of the work performed by destructors is not
flushing buffers, but rather freeing up memory, placing objects into freelists, and closing
file descriptors. All of this is irrelevant if the process is about to exit anyway, and
for a command that runs quickly, time wasted freeing heap space may make a real difference
in the overall runtime of a script. Meanwhile, it is usually easy to determine exactly what
resources need to be flushed before exit, and easy to tell if they are not being flushed
(because the command fails to produce the expected output). Therefore, it is reasonably
easy for commands to explicitly ensure all output is flushed before exiting, and it is
probably a good idea for them to do so anyway, because write failures should be detected
and handled. For commands, a good strategy is to allocate any objects that require clean
destruction on the stack, and allow them to go out of scope before the command exits.
Meanwhile, any resources which do not need to be cleaned up should be allocated as members
of the command's main class, whose destructor normally will not be called.
Interactive apps: Programs that interact with the user (whether they be graphical apps
with windows or console-based apps like emacs) generally exit only when the user asks them
to. Such applications may store large data structures in memory which need to be synced
to disk, such as documents or user preferences. However, relying on stack unwind or global
destructors as the mechanism for ensuring such syncing occurs is probably wrong. First of
all, it's 2013, and applications ought to be actively syncing changes to non-volatile
storage the moment those changes are made. Applications can crash at any time and a crash
should never lose data that is more than half a second old. Meanwhile, if a user actually
does try to close an application while unsaved changes exist, the application UI should
prompt the user to decide what to do. Such a UI mechanism is obviously too high level to
be implemented via destructors, so KJ's use of _Exit() shouldn't make a difference here.
Servers: A good server is fault-tolerant, prepared for the possibility that at any time
it could crash, the OS could decide to kill it off, or the machine it is running on could
just die. So, using _Exit() should be no problem. In fact, servers generally never even
call exit anyway; they are killed externally.
Batch jobs: A long-running batch job is something between a command and a server. It
probably knows exactly what needs to be flushed before exiting, and it probably should be
fault-tolerant.
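A minimal sketch of the strategy the quote describes for commands (my illustration of the quoted advice, not KJ code):

#include <cstdlib>
#include <fstream>

int main()
{
    {
        // The one resource that genuinely needs clean destruction - the
        // output that must be flushed - lives on the stack in its own scope.
        std::ofstream out("result.txt");
        out << "computed result\n";
    }   // flushed and closed here

    // Everything else (heap, descriptors, freelists) is left to the OS.
    std::_Exit(0);
}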
There is this multiplatform (Windows, Linux, Cygwin) dynamic library which is loaded at run time by a Cygwin executable. At some point during the normal workflow, the DLL allocates a pool of threads for use. These threads are managed as global variables (reference counted). So when the client process goes to shut down, it starts releasing global objects, and the threads should be released too.
The issue, as I understand it, is that during process shutdown the loader lock is acquired, and further down the road the threads want to acquire the same lock, so we now have a deadlock.
Now my ask for advice is: how can we make a clean shutdown?
The DLL has no init() or uninit() methods to be called. At best, the client can be enhanced with some code before the end of main() (so this is before process shutdown).
If I detach the threads instead of joining them during the global variable cleanup, memory gets corrupted. If I terminate them, we get ugly process dumps.
Btw, under Linux I see no such problems.
DLL is only C++14, client is C99 (Cygwin).
I tried to make the situation clear, but let me know if you have further questions. Thanks in advance for any ideas.
The fix is to add an uninit method to the DLL. It may not have one yet, but it needs one. You found out why: while the OS will call DllMain on DLL unload, it does so under loader lock. You need to do things that aren't possible under loader lock, so you need an extra call before DllMain. Naming that method uninit() is reasonable enough.
C++14 is not an issue here; this is an OS mechanism. Loader Lock has been around since ancient times.
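A sketch of what that extra call might look like (the pool type and its method are hypothetical names standing in for the DLL's real globals):

// Hypothetical pool type standing in for the DLL's real one.
struct ThreadPool { void stopAndJoinAll(); };
extern ThreadPool g_threadPool;

// Exported from the DLL; the client calls it at the end of main(),
// before process shutdown, while the loader lock is not yet held.
extern "C" __declspec(dllexport) void uninit()
{
    g_threadPool.stopAndJoinAll();   // safe to join here
    // By the time DllMain receives DLL_PROCESS_DETACH, there is
    // nothing dangerous left to do.
}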
At a previous job I struggled with this issue for a pretty lengthy period. Ultimately it came down to only 2 possible solutions:
Clean up every single resource the thread has claim to, then TerminateThread. It's violent and ugly, but it works around the DLL_THREAD_DETACH issue, and I actually found it advised on the Internet.
If you have the luxury of getting advance notice prior to DLL_PROCESS_DETACH - clean up everything at that early point, including an orderly shutdown of your threads. Then by all means do absolutely nothing during DLL_PROCESS_DETACH - yes, don't even free any lingering heap objects, as you may be exposing yourself to deadlock or crash risks, and the process is going down and freeing up all its resources anyway.
As an added note, I also learned to avoid at all costs having any global variables tied to the DLL lifetime. These will have their constructors and destructors executed in the DllMain context - needless to say more... If you need global singletons in a DLL, make sure to have manual control over their lifetimes (on both ends, so no auto-destruct smart pointers either).
I have a program that uses services from others. If the program crashes, what is the best way to close those services? On the server side, I would define some checkers that periodically monitor whether a client is invalid. But can we do anything on the client? I am not sure whether normal RAII still works in this case. My code is written in C and C++.
If your application experiences a hard crash, then no, your carefully crafted cleanup code will not run, whether it is part of an RAII paradigm or a method you call at the end of main. None of an application's cleanup code runs after a crash that causes the application to be terminated.
Of course, this is not true for exceptions. Although those might eventually cause the application to be terminated, they still trigger this termination in a controlled way. Generally, the runtime library will catch an unhandled exception and trigger termination. Along the way, your RAII-based cleanup code will be executed, unless it also throws an exception. Then you're back to being unceremoniously ripped out of memory.
But even if your application's cleanup code can't run, the operating system will still attempt to clean up after you. This solves the problem of unreleased memory, handles, and other system objects. In general, if you crash, you need not worry about releasing these things. Your application's state is inconsistent, so trying to execute a bunch of cleanup code will just lead to unpredictable and potentially erroneous behavior, not to mention wasting a bunch of time. Just crash and let the system deal with your mess. As Raymond Chen puts it:
The building is being demolished. Don't bother sweeping the floor and emptying the trash cans and erasing the whiteboards. And don't line up at the exit to the building so everybody can move their in/out magnet to out. All you're doing is making the demolition team wait for you to finish these pointless housecleaning tasks.
Do what you must; skip everything else.
The only problem with this approach is, as you suggest in this question, when you're managing resources that are not controlled by the operating system, such as a remote resource on another system. In that case, there is very little you can do. The best scenario is to make your application as robust as possible so that it doesn't crash, but even that is not a perfect solution. Consider what happens when the power is lost, e.g. because a user's cat pulled the cord from the wall. No cleanup code could possibly run then, so even if your application never crashes, there may be termination events that are outside of your control. Therefore, your external resources must be robust in the event of failure. Time-outs are a standard method, and a much better solution than polling.
Another possible solution, depending on the particular use case, is to run consistency-checking and cleanup code at application initialization. This might be something that you would do for a service that is intended to run continuously and will be restarted promptly after termination. The next time it restarts, it checks its data and/or external resources for consistency, releases and/or re-initializes them as necessary, and then continues on as normal. Obviously this is a bad solution for a typical application, because there is no guarantee that the user will relaunch it in a timely manner.
As the other answers make clear, hoping to clean up after an uncontrolled crash (i.e., a failure which doesn't trigger the C++ exception unwind mechanism) is probably a path to nowhere. Even if you cover some cases, there will be other cases that fail and you are building in a serious vulnerability to those cases.
You mention that the source of the crashes is that you are "us[ing] services from others". I take this to mean that you are running untrusted code in-process, which is the potential source of crashes. In this case, you might consider running the untrusted code out of process and communicating back to your main process through a pipe or shared memory or whatever. Then you isolate the crashes to this child process, and can do controlled cleanup in your main process. A separate process is really the lightest-weight thing you can do that gives you the strong isolation you need to avoid corruption in the calling code.
If forking a process per-call is performance-prohibitive, you can try to keep the child process alive for multiple calls.
One approach would be for your program to have two modes: normal operation and monitoring.
When started in the usual way, it would:
Act as a background monitor.
Launch a subprocess of itself, passing it an internal argument (something that wouldn't clash with normal arguments passed to it, if any).
When the subprocess exits, it would release any resources held at the server.
When started with the internal argument, it would:
Expose the user interface and "act normally", using the resources of the server.
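A rough POSIX sketch of that two-mode scheme (runNormally() and releaseServerResources() are hypothetical stand-ins for the real application code):

#include <cstring>
#include <sys/wait.h>
#include <unistd.h>

int runNormally();              // hypothetical: the real application
void releaseServerResources();  // hypothetical: tell the server to clean up

int main(int argc, char** argv)
{
    if (argc > 1 && std::strcmp(argv[1], "--worker") == 0)
        return runNormally();   // internal argument: act normally

    // Monitor mode: relaunch ourselves and wait, however the child dies.
    pid_t child = fork();
    if (child == 0) {
        execl(argv[0], argv[0], "--worker", static_cast<char*>(nullptr));
        _exit(127);             // exec failed
    }
    int status = 0;
    waitpid(child, &status, 0);
    releaseServerResources();   // runs even if the child crashed
    return 0;
}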
You might look into atexit, which may give you the functionality you need to release resources upon program termination. I don't believe it is infallible, though.
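A minimal atexit() sketch (the handler name is made up): the handler runs on normal termination via exit() or a return from main, but not after a hard crash:

#include <cstdio>
#include <cstdlib>

void releaseServices()   // hypothetical cleanup for the remote services
{
    std::puts("notifying server, releasing sessions...");
}

int main()
{
    std::atexit(releaseServices);
    // ... normal work ...
    return 0;   // releaseServices() runs here
}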
Having said that, however, you should really be focusing on making sure your program doesn't crash; if you're hitting an error that is "unrecoverable", you should still invest in some error-handling code. If the error is caused by a Seg-Fault or some other similar OS-related error, you can either enable SEH exceptions (not sure if this is Windows-specific or not) to enable you to catch them with a normal try-catch block, or write some Signal Handlers to intercept those errors and deal with them.
I'm writing a multi-threaded c++ application for *nix operating systems. What are some best practices for terminating such an application gracefully? My instinct is that I'd want to install a signal handler on SIGINT (SIGTERM?) which stops/joins my threads. Also, is it possible to "guarantee" that all destructors are called (provided no other errors or exceptions are thrown while handling the signal)?
Some considerations come to mind:
designate one thread to be responsible for orchestrating the shutdown; e.g., as Dithermaster suggested, this could be the main thread if you are writing a standalone application. Or, if you are writing a library, provide an interface (e.g. a function call) whereby a client program can terminate the objects created within the library.
you cannot guarantee destructors are called; that is up to you, and requires carefully calling delete for each new. Maybe smart pointers will help you. But, really, this is a design consideration. The major components should have start & stop semantics, which you could choose to invoke from the class constructor & destructor.
the shutdown sequence for a set of interacting objects is something that can require some effort to get correct. E.g., before you delete an object, are you sure some timer mechanism is not going to try calling it a few micro/milli/seconds later? Trial and error is your friend here; develop a framework which can repeatedly & rapidly start and stop your application to tease out shutdown-related race conditions.
signals are one way to trigger an event; others might be periodically polling for a known file, or opening a socket and receiving some data on it. Either way, you want to decouple the shutdown sequence code from the trigger event.
My recommendation is that the main thread shut down all worker threads before exiting itself. Send each worker an event telling it to clean up and exit, and wait for each one to do so. This will allow all C++ destructors to run.
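A minimal sketch of that recommendation (illustrative only; a real app would use whatever event mechanism its workers already poll):

#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> g_stopRequested{false};

void worker()
{
    while (!g_stopRequested.load()) {
        // ... do one unit of work ...
    }
    // Worker-local objects are destroyed when the thread function returns.
}

int main()
{
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back(worker);

    // ... run until it is time to shut down ...

    g_stopRequested = true;            // tell every worker to finish up
    for (auto& t : workers)
        t.join();                      // wait for each one to do so
    return 0;                          // globals are destroyed after main
}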
Regarding signal management, the only thing you can portably and safely do inside a signal handler is to write to a variable of type sig_atomic_t (possibly volatile-qualified) and return. In general, you cannot call most functions and must not write to global memory. In other words, the handler should just set a flag to be tested inside your main routine, at some point you find appropriate, and the action resulting from the signal itself should be performed from there.
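A sketch of that flag-and-return pattern (portable C++; the handler does nothing but set the flag):

#include <csignal>

volatile std::sig_atomic_t g_shutdown = 0;

extern "C" void onSignal(int) { g_shutdown = 1; }   // nothing else!

int main()
{
    std::signal(SIGINT, onSignal);
    std::signal(SIGTERM, onSignal);

    while (!g_shutdown) {
        // ... normal work; test the flag at a point you find appropriate ...
    }
    // Perform the real shutdown sequence here, outside the handler.
    return 0;
}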
(Since there might be blocking I/O involved, consider studying POSIX Thread Cancellation. Your Unix clone (most notably Linux) might have peculiarities with respect to this and to the above.)
Regarding destructors, no magic is involved. They will be executed if control leaves a given scope through any means defined in the language. Leaving a scope through other means (for example, longjmp() or even exit()) does not trigger destructors.
Regarding general shutdown practices, there are divergent opinions on the field.
Some state that a "graceful termination", in the sense of releasing every resource ever allocated, should be performed. In C++, this usually means that all destructors should be properly executed before the process terminates. This is tricky in practice and often a source of much grief, specially in multithreaded programs, for a variety of reasons. Signals further complicate things by the very nature of asynchronous signal dispatching.
Because most of this work is totally useless, some others, like me, contend that the program must just terminate immediately, possibly shortly after undoing persistent changes to the system (like removing temporary files or restoring the screen resolution) and saving configuration. An apparently tidier cleanup is not only a waste of time (because the operating system will clean up most things like allocated memory, dangling threads and open file descriptors), but might be a serious waste of time (deallocators might touch paged-out memory, uselessly forcing the system to page it in just to release it soon after the process terminates, for example), not to mention the possibility of deadlocks originating from joining threads.
Just say no. When you want to leave, call exit() (or even _exit(), but watch out for unflushed I/O) and that's it. More annoying than slow starting programs are slow terminating programs.
I have a few "global" constructs that are allocated with new and are alive for the entirety of the application's life span.
Should I bother calling delete on the pointers just before the application finishes?
Doesn't all of the application's memory get reclaimed after it closes anyway?
Edit for clarity: I am only talking about not calling delete for lifetime objects that "die" right as the program is closing.
Technically, yes, the memory is reclaimed. But unless you use delete, the destructors of those objects are not run and their side effects are not applied. This might lead to a temporary file not being deleted or a database change not being committed, depending on what those destructors were meant to do.
Also, don't forget Murphy. Right now the code for managing those objects is used as you describe (objects have to persist for the life of the program), but later you might want to reuse the code so that it runs multiple times. Unless it can deal with recreating objects properly, it will leak objects.
It is always good practice to clean up everything. Although the memory is reclaimed, these objects might have other resources allocated (shared memory, semaphores, etc.) that should be cleaned up, probably by the objects' destructors.
If you do not want to call delete, use shared pointers to hold these resources, so that they are cleaned up correctly when the application exits.
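For instance, a minimal sketch (the resource type is made up):

#include <memory>

struct SharedSegment
{
    SharedSegment() { /* attach shared memory, semaphores, ... */ }
    ~SharedSegment() { /* detach and release them */ }
};

// Held by a smart pointer instead of a raw 'new' with no 'delete':
std::shared_ptr<SharedSegment> g_segment;

int main()
{
    g_segment = std::make_shared<SharedSegment>();
    // ... application lifetime ...
    return 0;
}
// g_segment is destroyed during static destruction after main() returns,
// so ~SharedSegment() runs and the resources are released.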
How are you testing your application? Not cleaning up might hinder development of a decent test harness. Tests of the application might want a way of spoofing a shutdown and restart.
There is more to cleaning up than simply releasing memory.
No, don't write/debug/maintain code to do something that the OS is already very good at.
Unless there are specific reasons to the contrary (e.g. outstanding transactions to be committed, files to flush, connections to be closed), I don't bother writing code to do something that the OS is going to do anyway. If a dtor does nothing special, why bother calling it?
Many developers put in a lot of effort into deleting/destroying/freeing/terminating stuff at app close time - a load of effort to avoid some spurious 'leak report' on app shutdown from a memory manager that is itself about to be destroyed.
I think you're probably right, but I personally would consider it poor coding and bad practice to rely on the system, and would ensure my code always tidies up properly when shutting down.
There is no one right answer. Most of the time, it probably doesn't matter, but there are destructors which do something important beyond just freeing memory (I have one which deletes temporary files), which argues in favor of cleanup; on the other hand, destructing such objects may lead to order-of-destruction issues if the objects are used by destructors of other objects. My general rule is to not destruct unless the destructor does something more than just free memory, but others may prefer a different set of defaults.
In addition to the destructors not being executed (as sharptooth pointed out), it's also worthwhile to delete global objects to make memory checkers happy. Especially if your code is in a shared library: you don't want to clutter your users' memory-checker (say, Valgrind) output just because you didn't delete properly.
Then there are those cases where you definitely don't want the dtors called at all before the OS terminates the process, eg:
1) When the dtor does not work properly because it tries to terminate a thread, fails, and blocks on the thread handle or other signal (the perennial 'join/waitFor' deadlock) - the cause of 99% of all household 'my app will not close down cleanly' posts.
2) When the dtor does not work properly because it's just bad anyway and buried in a library.
3) Where the memory must outlive the process threads else there will be segfaults/AV on close, (eg. pools of buffer objects that threads may well be writing to at close time).
4) Any other 'special cases' where the destruction of the object/s has to be left to the OS.
There are so many 'special cases' that I regard 'cleaning up' shutdown code as the special case.