I could not find a direct, satisfactory answer to what seems like a simple question:
Given multiple running threads, is there a generic/correct way to wait for them to finish while exiting the process? Or: is doing a timed wait OK in this case?
Yes, we attempt to signal threads to finish but it is observed that during process exit some of them tend to stall. We recently had a discussion and it was decided to get rid of the "arbitrary wait":
m_thread.quit(); // the way we had threads finished
m_thread.wait(kWaitMs); // with some significant expiration (~1000ms)
m_thread.quit(); // the way we have threads finished now
m_thread.wait(); // wait forever until finished
I understand that the kWaitMs constant should be chosen roughly in proportion to one uninterrupted "job cycle" of the thread. Say, if the thread processes a chunk of data in 10 ms, then we should probably wait about 100 ms for it to respond to the quit signal, and if it still has not quit, we just stop waiting. We stop waiting in that case because we are quitting the program and no longer care. But some engineers don't accept this "paradigm" and want an unbounded wait. Mind that in our case a program process left stuck in memory on the client machine will certainly cause problems on the next program start, not to mention that the log will not be finished properly and so cannot be processed as an error.
Can the question about the proper thread finishing on process quit be answered?
Is there some assistance from Qt/APIs to resolve the thread hang-up better, so we can log the reason for it?
P.S. Mind that I am well aware of why it is wrong to terminate a thread forcefully and of how that can be done. I guess this question is not about synchronization but about the limited determinism of threads that run tons of our own code, framework code and OS code. None of the OSes involved is real-time, right: Windows / MacOS / Linux etc.
P.P.S. All the threads in question have an event loop, so they should respond to QThread::quit().
Yes, we attempt to signal threads to finish but it is observed that
during process exit some of them tend to stall.
That is your real problem. You need to figure out why some of your threads are stalling, and fix them so that they do not stall and always quit reliably when they are supposed to. (The exact amount of time they take to quit isn't that important, as long as they do quit in a reasonable amount of time, i.e. before the user gets tired of waiting and force-quits the whole application)
If you don't/can't do that, then there is no way to shut down your app reliably, because you can't safely free up any resources that a thread might still be accessing. It is necessary to 100% guarantee that a thread has exited before the main thread calls the destructors of any objects that the thread uses (e.g. the QThread object associated with the thread)
So to sum up: don't bother playing games with wait-timeouts or forcibly-terminating threads; all that will get you is an application that sometimes crashes on shutdown. Use an indefinite-wait, and make sure your threads always (always!) quit after the main thread has asked them to, as that is the only way you'll achieve a reliable shutdown sequence.
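If the objection to an indefinite wait is that it hides a stalled thread, the wait can stay unbounded and still report what is going on. Below is a minimal sketch in standard C++ rather than Qt. The helper name join_with_reports and the promise/future hand-off are assumptions made for illustration, not anything from the code in the question. The worker is expected to fulfil the promise as its last action.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Hypothetical helper: waits without an upper bound, but writes a report each
// time `report_interval` passes and the worker has still not finished.
// Returns the number of reports written.
int join_with_reports(std::thread& worker, std::future<void>& finished,
                      std::chrono::milliseconds report_interval) {
    assert(worker.joinable() && finished.valid());
    int reports = 0;
    while (finished.wait_for(report_interval) != std::future_status::ready) {
        ++reports;
        std::cerr << "worker still running after " << reports
                  << " wait interval(s)\n";
    }
    worker.join();  // the worker has signalled completion, so this returns soon
    return reports;
}
```

With QThread the same loop shape should be possible without the extra future, since QThread::wait(ms) returns false when the timeout expires: keep calling it in a loop and log on every false. The log then shows which thread stalled and for how long, while the destructors still run only after the thread has really exited.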
Related
I'm building plugins for a host application using C++11/14, for now targeting Windows and MacOS. The plugins start up async worker threads when the host app starts us up, and if they're still running when the host shuts the plugins down they get signaled to stop. Some of these worker threads are started with std::async so I can use an std::future to get the thread result back, while other less involved threads are just std::threads which I ultimately just join to see when they're done. It all works nicely this way.
Unless the host decides not to call our shutdown procedure when it shuts down itself... Yeah, I know, but it really is that bad sometimes -- it often enough just crashes during shutdown. And they even plan to make that into a 'feature' and call it "Fast Exit" to please their users; just pull the plug and we're done extra fast :(
For that case I have registered an std::atexit handler. It last-minute signals any still running threads to exit NOW (atomic bools and/or signals to wake them up), then it waits a second to give the threads some time to respond, and finally it detaches the regular std::thread threads and hopes for the best. This way at least the threads get a heads up to quickly write intermediate state to disk for a next round (if needed), and quit writing to probably already deceased data structures, thus avoiding crashes which would make any crash dump point the finger at my plugins.
However, atexit handlers run at OS DLL unload time, so I'm not even allowed to use thread synchronization (right?). And under the debugger I just saw all of the worker threads were presumably already killed by the OS, since the atexit handler's thread was the only thread left under the debugger. Needless to say, all remaining std::futures went into full blocking mode, hanging up the remaining corpse of the dead host app...
Is there a way to abandon an std::future? In MS Visual C++ I saw futures have an _Abandon method, but that's too platform specific (and undocumented) for my taste. Or is my only recourse to not use std::future, do all thread communication via my own data structures and synchronization, and work with simple std::threads which can just be detached?
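On the last alternative: the blocking destructor is specific to futures returned by std::async. A future obtained from a std::packaged_task or a std::promise does not wait for the task when it is destroyed, so it can simply be dropped. A minimal sketch, where run_detached and ready_within are hypothetical helper names:

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs `fn` on a detached thread and returns a future for
// its result. Unlike a future returned by std::async, this one does not wait
// for the task in its destructor.
template <typename Fn>
auto run_detached(Fn fn) -> std::future<decltype(fn())> {
    std::packaged_task<decltype(fn())()> task(std::move(fn));
    auto result = task.get_future();
    std::thread(std::move(task)).detach();
    return result;
}

// Bounded wait: true if the result became available within `timeout`.
template <typename T>
bool ready_within(std::future<T>& f, std::chrono::milliseconds timeout) {
    assert(f.valid());
    return f.wait_for(timeout) == std::future_status::ready;
}
```

This only bounds the wait on the future. Whether any thread synchronization is usable inside an atexit handler during DLL unload is a separate question that the sketch does not settle, and the detached task must still not touch data that has already been destroyed.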
I use Qt 4.8.6, MS Visual Studio 2008, Windows 7. I've created a GUI program. It contains a main GUI thread and a worker thread (I have not subclassed QThread, by the way), which makes synchronous calls to 3rd party DLL functions. These functions are rather slow. A QTcpServer instance also lives in the worker thread. My worker class contains the QTcpServer and the DLL wrapper methods.
I know that quit() is preferred over terminate(), but I don't wanna wait for a minute (because of slow DLL functions) during program shutdown. When I try to terminate() worker thread, I notice warnings about stopping QTcpServer from another thread. What is a correct way of process shutdown?
QThread::quit() tells the thread's event loop to exit. After calling it, the thread finishes as soon as control returns to the thread's event loop.
You may also force a thread to terminate right now via QThread::terminate(), but this is a very bad practice, because it may terminate the thread at an undefined position in its code, which means you may end up with resources never getting freed up and other nasty stuff. So use this only if you really can't get around it.
So I think the right approach is to first tell the thread to quit normally, and if something goes wrong, takes too long, and you cannot afford to wait for it, then terminate it:
QThread *th = myWorkerObject->thread();
th->quit();
th->wait(5000); // wait up to 5 seconds for the event loop to exit
if (th->isRunning()) // it took longer than usual, so terminate as a last resort
{
    th->terminate();
    th->wait(); // terminate() may not take effect immediately
}
You should always try to avoid killing threads from the outside by force and instead ask them nicely to finish what they're doing. This usually means that the thread checks regularly if it should terminate itself and the outside world tells it to terminate when needed (by setting a flag, signaling an event or whatever is appropriate for the situation at hand).
When a thread is asked to terminate itself, it finishes up what it's doing and exits cleanly. The application waits for the thread to terminate and then exits.
You say that in your case the thread takes a long time to finish. You can take this into consideration and still terminate the thread "the nice way" (for example you can hide the application window and give the impression that the app has exited, even if the process takes a little more time until it finally terminates; or you can show some form of progress indication to the user telling him that the application is shutting down).
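The "check regularly and exit cleanly" pattern described above can be sketched in standard C++ as follows. PollingWorker is a hypothetical class, and the 1 ms sleep is only a stand-in for one unit of real work. With a slow blocking call in the loop, the time to honor a stop request is about the length of one such call.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker: does its job in short cycles and checks a stop flag
// between cycles, so a stop request is honored within about one cycle.
class PollingWorker {
public:
    PollingWorker() : thread_([this] { run(); }) {}

    // Ask nicely, then wait until the thread has finished its current cycle.
    ~PollingWorker() {
        assert(thread_.joinable());
        stop_requested_.store(true);
        thread_.join();
    }

    int cycles_done() const { return cycles_.load(); }

private:
    void run() {
        while (!stop_requested_.load()) {
            // Stand-in for one unit of real work.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            cycles_.fetch_add(1);
        }
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<int> cycles_{0};
    std::thread thread_;  // declared last so the flags exist before it starts
};
```

Putting the request and the join in the destructor means the worker cannot outlive the data it uses, which is the guarantee a forced termination cannot give.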
Unless there is an overriding reason to do so, you should not attempt to terminate threads with user code at process-termination.
If there is no such reason, just call your OS process termination syscall, e.g. ExitProcess(0). The OS can, and will, stop all process threads in any state before releasing all process resources. User code cannot do that, and should not try to terminate threads, or signal them to self-terminate, unless absolutely necessary.
Attempting to 'clean up' with user code sounds 'nice' (apparently), but is an expensive luxury that you will pay for with extra code, extra testing and extra maintenance.
That is, if your customers don't stop buying your app because they get pissed off with it taking so long to shut down.
The OS is very good at stopping threads and cleaning up. It's had endless thousands of hours of testing during development and decades of life in the wild where problems with process termination would have become apparent and got fixed. You will not even get close to that with your flags, events etc. as you struggle to stop threads running on another core without the benefit of an interprocessor driver.
There are surely times when you will have to resort to user code to stop threads. If you need to stop them before process termination, or you need to close some DB connection, flush some file at shutdown, deal with interprocess comms or the like issues, then you will have to resort to some of the approaches already suggested in other answers.
If not, don't try to duplicate OS functionality in the name of 'niceness'. Just ask it to terminate your process. You can get your warm, fuzzy feeling when your app shuts down immediately while other developers are still struggling to implement 'Shutdown' progress bars or trying to explain to customers why they have 15 zombie apps still running.
Is it possible to have a boost::thread sleep indefinitely after its work is completed and then wake it from another boost::thread?
Using while(1)s are perfect for a dedicated server where I want the threads to run all cores at 100%, but I'm writing a websocket++ server to be run on a desktop, thus I only want the boost::threads to run when they actually have work to do, so I can do other work on my desktop without performance suffering.
I've seen other examples where boost::threads are set to sleep() for a constant amount of time, but I'd rather not spend the time trying to find that optimal constant; besides, I need the websocket++ server to respond as quickly as possible when it receives data to process.
If this is possible, how can it be done with multiple threads trying to wake?
This mechanism is implemented by what is called a condition variable, see boost::condition_variable. Essentially, the waiting thread sleeps on the condition variable, releasing its mutex while it waits, until another thread signals the condition; it then reacquires the mutex and continues.
Watch out for spurious wake-ups. Sometimes the waiting thread will wake up without being signaled. This means that you should still use a while-loop that checks a predicate (or condition) to distinguish real wake-ups from spurious ones.
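A minimal sketch of that pattern with several waiting threads. It uses std::condition_variable from C++11 instead of boost::condition_variable; as far as I know the Boost interface has the same shape, but treat that as an assumption. WorkQueue and sum_with_workers are made-up names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical queue: consumers sleep inside pop() until an item arrives or
// the queue is closed. No polling interval is involved.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);
            items_.push_back(item);
        }
        ready_.notify_one();  // wake one sleeping consumer
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();  // wake every consumer so all of them can exit
    }

    // Returns false once the queue is closed and drained.
    bool pop(int& item) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is re-checked after every wake-up, which filters out
        // spurious ones.
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        item = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<int> items_;
    bool closed_ = false;
};

// Usage: several consumers sleep on the same queue; returns the sum they saw.
long sum_with_workers(int worker_count, int item_count) {
    WorkQueue queue;
    std::atomic<long> sum{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < worker_count; ++i) {
        workers.emplace_back([&queue, &sum] {
            int item = 0;
            while (queue.pop(item)) sum.fetch_add(item);
        });
    }
    for (int i = 1; i <= item_count; ++i) queue.push(i);
    queue.close();
    for (auto& w : workers) w.join();
    return sum.load();
}
```

The threads use no CPU while the queue is empty, and a pushed item is picked up as soon as a consumer is scheduled, so there is no sleep constant to tune.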
Yes, pthread_mutex_t + pthread_cond_t is the right thing to use; you can find the corresponding thing in Boost.
I have a rare heisenbug in a multi-threaded application where the main thread, and only this thread, just does nothing. As it's a heisenbug, it's really hard to understand why this is happening.
The main thread is basically just looping. In the loop, it checks several concurrent priority queues which contain tasks ordered by the time at which they are to be executed. It pops a task and checks whether it's time to execute it. If it is, it schedules the task into TBB's task scheduler (using a root task which is the parent of all other tasks). If it's not time yet, the task is pushed back into the priority queue.
That's one cycle. At the end of the cycle, the main thread is put to sleep for a very short time. I expect the actual sleep to be longer in practice, but that's not really a problem; I just don't want it to use too many resources when not necessary.
Literally:
static const auto TIME_SCHEDULED_TASKS_SPAWN_FREQUENCY = microseconds(250);
while( !m_task_scheduler.is_exiting() ) // check if the application should exit
{
    m_clock_scheduler.spawn_realtime_tasks(); // here we spawn tasks if it's time
    this_thread::sleep_for( TIME_SCHEDULED_TASKS_SPAWN_FREQUENCY );
}
m_clock_scheduler.clear_tasks();
m_root_task.wait_for_all();
I have a special task that just logs a "TICK" message each second. It automatically reschedules itself until the end of the program. However, when the heisenbug appears, I can see the "TICK" messages stop, and the application does nothing other than the work that occurs in non-main threads. So it appears that only the main thread is affected.
The problem could come from different places, maybe from the scheduling logic, but the main thread is also the only thread that has a sleep call. That sleep is a boost::this_thread::sleep_for().
My question is: Is it possible that Windows (7 64bit) consider the main thread to be sleeping often and decide that it should sleep for a longer period of time than asked or be definitely ended?
I expect that it is not possible, but I would like to be sure. I haven't found anything precise about this in the online documentation so far.
Update:
I have a friend who can reproduce the bug systematically (on Windows Vista, Core 2 Duo). I sent him a version without the sleep and another with the main loop reimplemented using condition_variable, so that each time a task is pushed into the queue the condition_variable wakes the main thread (while still keeping a minimum spawning interval).
The version without sleep works (but is slower), so the problem seems to be related to the sleep, even if I don't know the real source.
The version using condition_variable works, which would indicate that it's the sleep call that doesn't work correctly?
So, apparently I fixed the bug, but I still don't know why that specific sleep call can sometimes block.
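For readers wondering what a condition_variable version of such a loop might look like, here is a rough sketch. It is not the actual code from the application: the Wakeup class is a hypothetical name, and it uses std::condition_variable rather than the Boost one. The loop would call wait_for() where it previously slept, and whoever pushes a task would call notify().

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical replacement for a fixed sleep: the caller sleeps until notify()
// is called or until `max_sleep` has elapsed, whichever comes first.
class Wakeup {
public:
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ = true;
        }
        cv_.notify_one();
    }

    // Returns true if woken by notify(), false if the timeout expired.
    bool wait_for(std::chrono::microseconds max_sleep) {
        assert(max_sleep.count() >= 0);
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form ignores spurious wake-ups and returns the value
        // of the predicate when the wait ends.
        bool woken = cv_.wait_for(lock, max_sleep, [this] { return pending_; });
        pending_ = false;
        return woken;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool pending_ = false;
};
```

The timeout keeps the loop checking time-scheduled tasks even when nothing is pushed, and a notify() that arrives before the wait starts is not lost because the flag is remembered.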
UPDATE:
It was actually a bug triggered by Boost code. I hunted the bug down and reported it, and it has been fixed. I didn't check previous versions, but it is fixed in Boost 1.55.
Is it possible that Windows (7 64bit) consider the main thread to be sleeping often and decide that it should sleep for a longer period of time than asked or be definitely ended?
NO. This does not happen. MSDN does not indicate that this could happen. Empirically, I have many Windows apps with periodic intervals ranging from ms to hours. The effect you suggest does not happen - it would be disastrous for my apps.
Given the well-known granularity issues with Sleep() calls for very short intervals, a sleeping thread will become ready upon the expiry of the interval. If there is a CPU core available, (ie. the cores are not all in use running higher-priority threads), the newly-ready thread will become running.
The OS will not extend the interval of Sleep() because of any historical/statistical data associated with the thread states - I don't think it keeps any such data.
When my application is ready to close the thread it created using CreateThread, the following algorithm is executed:
_bCloseRequested = TRUE;
dwMsThen = ::GetTickCount();
do
{
    ::GetExitCodeThread( m_hThread, &dwExitCode );
    dwMsNow = ::GetTickCount();
}
while( (dwExitCode == STILL_ACTIVE) && ((dwMsNow - dwMsThen) < 50000UL) );
If the thread fails to close within the 5 allotted seconds, should the thread handle be closed, or allowed to remain open? Thanks.
First, don't wait for a thread to finish like this. You will eat up all available CPU time just waiting, which also has the disadvantage that your thread will take longer to finish!
Use something like this instead:
WaitForSingleObject(m_hThread, 50000);
That said: whether you want to leave the thread running or not depends on what the thread does. Can it keep running even though your main app starts doing something else? Does it have critical stuff (files, connections, databases, ...) open that would be left open if you kill the thread? You have to consider all of this before you decide whether to kill the thread or leave it running.
Just wait on the thread handle. If it takes too long, you should just timeout and terminate your app, and fix whatever bug makes the thread fail to exit.
static const DWORD TIMEOUT_VALUE = 50000; // milliseconds
if (WaitForSingleObject(m_hThread, TIMEOUT_VALUE) != WAIT_OBJECT_0)
{
    // thread did not exit in time, log and exit process
}
Good question.
There are a couple of approaches to this.
The first approach is what I would consider to be the ideal approach. And that is to never terminate threads. The reasons for this are multiple, but here are some biggies:
If your thread owns a synchronization object, it won't be released
RAII objects don't get a chance to clean up
Allocated memory won't be freed
If you are in the middle of certain kernel calls, you could hose your entire application
So going with this approach, you would identify the reasons why the threads are not shutting down, and fix that problem. You may find that the problems run deep. You may find deadlocks, race conditions, etc. Static analysis can help to find these problems.
The ideal approach is the one you should always pursue. And in doing this, it's best not to use a spin lock. Instead, Wait() on the thread handle with a timeout. By spinning, you're wasting resources and stealing time slices from the thread you're waiting for.
But in the real world, in production code, you need a fallback measure in case everything else fails. You should first try multiple methods to trigger your thread to shut itself down. If everything fails as an absolute last resort, kill the thread. But because of the dangers behind killing a zombie thread, once you've done this, you should restart your entire application. When you kill a thread, you can put your process in a non-deterministic state. So start over. Log an error message, shut the app down, and start again.
Neither. You should fix whatever is keeping the thread from exiting cleanly and simply join on it. Everything else is just a hack.