boost::threads - how to do graceful shutdown?

I'm trying to improve the portability of a C++ app by using boost::thread instead of our own wrapper over Win32 threads, and the issue of graceful thread termination (again) rears its ugly head.
On pure Win32, I 'interrupt' threads by using QueueUserAPC to throw a "thread_interrupt" exception, which causes all RAII objects to clean up on the way out, as described here. Any 'alertable' OS function can be interrupted in this way, so things like mutex waits, sleeps, serial and socket I/O are all possible interruption points.
However, boost::mutex etc. aren't "alertable" by QueueUserAPC on Win32: they call things like Sleep(n) rather than SleepEx(n, true).
Boost threads do have an "interrupt" mechanism (which similarly involves throwing an exception), but it seems to have the drawback that ONLY boost::thread calls are interruptible, so a third-party socket library (for example) cannot be interrupted.
What to do? I could modify the Boost source locally to make it interruptible, but that feels like a poor choice and I don't think it helps portability. Redesigning the entire app to remove the requirement for graceful thread shutdown is a similarly unappealing route...

I have an idea for a partial solution for Win32, but I have yet to test it:
My "thread interrupt" method could call both boost::thread::interrupt() AND QueueUserAPC. The function invoked by QueueUserAPC would just call boost::this_thread::interruption_point() to allow the Boost interrupt to take control.
This should mean that the thread gets interrupted (but "differently") both when it's waiting on a Boost synchronization object and when it's waiting on an 'alertable' native Windows one.

Related

Should I use thread in my library?

I am implementing a function in a library which takes a while (up to a minute). It initializes a device. Generally, any long-running function should run in its own thread and report to the main thread when it completes, but I am not sure that applies here since this function is in a library.
My dilemma is this, even if I implement this in a separate thread, another thread in the application has to wait on it. If so why not let the application run this function in that thread anyways?
I could pass queue or mailbox to the library function but I would prefer a simpler mechanism where the library can be used in VB, VC, C# or other windows platforms.
Alternatively I could pass HWND of the window and the library function can post message to it when it completes instead of signaling any event. That seems like most practical approach if I have to implement the function in its own thread. Is this reasonable?
Currently my function prototype is:
void InitDevice(HANDLE hWait)
When initialization is complete, I signal hWait. This works fine, but I am not convinced I should use a thread at all when another secondary thread will have to wait on InitDevice anyway. Should I pass an HWND instead? That way the message will be posted to the primary thread, which fits better with multithreading.

In general, when I write library code, I normally try to stay away from creating threads unless it's really necessary. By creating a thread, you're forcing a particular threading model on the application. Perhaps they wish to use it from a very simplistic command-line tool where a single thread is fine. Or they could use it from a GUI tool where things must be multi-threaded.
So, instead, make it clear to the library user that the function is a long-running blocking call, and give them a callback mechanism to monitor progress and a way to halt the operation immediately, which a multi-threaded application could use.
What you do want is to be able to claim thread safety. Use mutexes to protect data items if there are other functions the user can call that affect the operation of the blocking function.
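As a rough illustration of that advice, here is a minimal sketch of a blocking call with a progress callback and a halt request. The names DeviceInitializer, run and halt are made up for this example and are not part of the question's library; the sleep stands in for real device work.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical library API: a blocking call that reports progress and can be
// halted from another thread. None of these names come from the question.
class DeviceInitializer {
public:
    using ProgressFn = std::function<void(int percent)>;

    explicit DeviceInitializer(int steps = 10) : steps_(steps) {
        assert(steps > 0);
    }

    // Blocks until initialization finishes or halt() is called.
    // Returns true on completion, false if halted early.
    bool run(const ProgressFn& on_progress) {
        for (int step = 1; step <= steps_; ++step) {
            if (halt_requested_.load()) return false;
            // Stand-in for one slice of real device work.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            if (on_progress) on_progress(step * 100 / steps_);
        }
        return true;
    }

    // Safe to call from any thread, including from inside the callback.
    void halt() { halt_requested_.store(true); }

private:
    int steps_;
    std::atomic<bool> halt_requested_{false};
};
```

A single-threaded command-line tool can simply call run() and wait; a GUI application can call it from a worker thread of its own choosing and call halt() from the UI thread.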

Cancelling a thread that has a mutex locked does not unlock the mutex

I'm helping a client out with an issue that they are having. I'm more of a sysadmin/DBA guy, so I'm struggling to help them. They are saying it is a bug in the kernel/environment; I'm trying to either prove or disprove that before I insist that it is in their code or seek vendor support for the OS.
Happens on Red Hat and Oracle Enterprise Linux 5.7 (and 5.8), application is written in C++
The problem they are experiencing is that the main thread starts a separate thread to do a potentially long-running TCP connect() [client connecting to server].
If the 'long-running' aspect takes too long, they cancel the thread and start another one.
This is done because we don't know the state of the server program:
server program up and running --> connection immediately accepted
server program not running, machine and network OK --> connection immediately failed with error 'connection refused'
machine or network crashed or down --> connection takes a long time to fail with error 'no route to host'
The problem is that cancelling the thread that has the mutex locked (with cleanup handlers set up to unlock the mutex) sometimes does NOT unlock the mutex.
That leaves the main thread hung on trying to lock the mutex.
Detailed environment info:
glibc-2.5-65
glibc-2.5-65
libcap-1.10-26
kernel-debug-2.6.18-274.el5
glibc-headers-2.5-65
glibc-common-2.5-65
libcap-1.10-26
kernel-doc-2.6.18-274.el5
kernel-2.6.18-274.el5
kernel-headers-2.6.18-274.el5
glibc-devel-2.5-65
Code was built with:
c++ -g3 tst2.C -lpthread -o tst2
Any advice and guidance is greatly appreciated

It's correct that cancelled threads do not unlock mutexes they hold. You need to arrange for that to happen manually, which can be tricky, as you need to be very careful to use the right cleanup handlers around every possible cancellation point. Assuming you're using pthread_cancel to cancel the thread and setting cleanup handlers with pthread_cleanup_push to unlock the mutexes, there are a couple of alternatives you could try which might be simpler to get right and so may be more reliable.
Using RAII to unlock the mutex will be more reliable. On GNU/Linux, pthread_cancel is implemented with a special exception of type abi::__forced_unwind, so when a thread is cancelled an exception is thrown and the stack is unwound. If a mutex is locked by an RAII type, then its destructor is guaranteed to run when the stack is unwound by a __forced_unwind exception.
Boost Thread provides a portable C++ library that wraps Pthreads and is much easier to use. It provides boost::mutex together with RAII lock types such as boost::mutex::scoped_lock, and other useful abstractions. Boost Thread also provides its own "thread interruption" mechanism, which is similar to Pthread cancellation but not the same: Pthread cancellation points (such as connect) are not Boost Thread interruption points, which can be helpful for some applications. However, in your client's case, since the point of cancellation is to interrupt the connect call, they probably do want to stick with Pthread cancellation. The (non-portable) way GNU/Linux implements cancellation as an exception means it will work well with the Boost lock types.
There is really no excuse for explicitly locking and unlocking mutexes when you're writing in C++. IMHO the most important and most useful feature of C++ is destructors, which are ideal for automatically releasing resources such as mutex locks.
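To make the RAII point concrete, here is a minimal sketch that cancels a thread while it holds a lock and then checks that the mutex is free again. It assumes glibc's behaviour of unwinding the stack on cancellation, so it is specific to GNU/Linux; PthreadLock and the other names are invented for the example, and plain Pthreads is used rather than Boost to keep it self-contained.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <cassert>

// Minimal RAII guard for a raw pthread mutex (hypothetical name).
class PthreadLock {
public:
    explicit PthreadLock(pthread_mutex_t& m) : m_(m) {
        int rc = pthread_mutex_lock(&m_);
        assert(rc == 0);
        (void)rc;
    }
    ~PthreadLock() { pthread_mutex_unlock(&m_); }
    PthreadLock(const PthreadLock&) = delete;
    PthreadLock& operator=(const PthreadLock&) = delete;
private:
    pthread_mutex_t& m_;
};

static pthread_mutex_t g_guarded_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_worker_has_lock{false};

// Takes the lock, then blocks in sleep(), which is a cancellation point.
static void* locked_sleeper(void*) {
    PthreadLock guard(g_guarded_mutex);
    g_worker_has_lock.store(true);
    for (;;) sleep(60);
}

// Returns true if the mutex is free again after the worker is cancelled.
// Relies on glibc unwinding the stack on cancellation; not portable.
bool cancelled_thread_releases_mutex() {
    pthread_t t;
    if (pthread_create(&t, nullptr, locked_sleeper, nullptr) != 0) return false;
    while (!g_worker_has_lock.load()) usleep(1000);  // wait until the worker owns the lock
    pthread_cancel(t);
    void* result = nullptr;
    pthread_join(t, &result);
    if (result != PTHREAD_CANCELED) return false;
    if (pthread_mutex_trylock(&g_guarded_mutex) != 0) return false;  // guard did not run
    pthread_mutex_unlock(&g_guarded_mutex);
    return true;
}
```

The guard's destructor does the unlocking, so no pthread_cleanup_push/pop pair has to be kept in step with each cancellation point.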
Another option would be to use a robust mutex, which is created by calling pthread_mutexattr_setrobust on a pthread_mutexattr_t before initializing the mutex. If a thread dies while holding a robust mutex, the kernel will make a note of it so that the next thread which tries to lock the mutex gets the special error code EOWNERDEAD. If possible, the new thread can make the data protected by the mutex consistent again, call pthread_mutex_consistent, and take ownership of the mutex. This is much harder to use correctly than simply using an RAII type to lock and unlock the mutex.
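The robust-mutex flow can be sketched as follows. This is only an illustration with made-up function names; it uses the POSIX spellings, whereas glibc releases as old as the one in the question provide these calls only with an _np suffix, as far as I know.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

static pthread_mutex_t g_robust_mutex;

// Locks the robust mutex and exits without unlocking, simulating a thread
// that died while holding it.
static void* lock_and_die(void*) {
    pthread_mutex_lock(&g_robust_mutex);
    return nullptr;
}

// Returns the result of the recovering lock attempt (EOWNERDEAD is expected).
int lock_after_owner_died() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&g_robust_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_t t;
    int rc = pthread_create(&t, nullptr, lock_and_die, nullptr);
    assert(rc == 0);
    pthread_join(t, nullptr);

    rc = pthread_mutex_lock(&g_robust_mutex);
    if (rc == EOWNERDEAD) {
        // Repair the protected data here, then mark the mutex usable again.
        pthread_mutex_consistent(&g_robust_mutex);
    }
    pthread_mutex_unlock(&g_robust_mutex);
    pthread_mutex_destroy(&g_robust_mutex);
    return rc;
}
```

The hard part in real code is the "repair the protected data" step, which is why the RAII approach is usually easier to get right.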
A completely different approach would be to decide whether you really need to hold the mutex lock while calling connect. Holding mutexes during slow operations is not a good idea. Can't you call connect and then, if it succeeds, lock the mutex and update whatever shared data is being protected by the mutex?
My preference would be to both use Boost Thread and avoid holding the mutex for long periods.

The problem they are experiencing is that the main thread starts a separate thread to do a potentially long-running TCP connect() [client connecting to server]. If the 'long-running' aspect takes too long, they cancel the thread and start another one.
Trivial fix -- don't cancel the thread. Is it doing any harm? If necessary, have the thread check (when the connect finally does complete) whether the connection is still needed and, if not, close it, release the mutex, and terminate. You can do this with a boolean variable protected by a mutex.
Also, a thread should not hold a mutex while waiting for network I/O. Mutexes should be used only for things that are fast and primarily CPU-limited or perhaps limited by local disk.
Finally, if you feel you need to reach in from the outside and force a thread to do something, step back. You wrote the code for that thread. If you feel that need, it means you didn't code that thread to do what you really wanted it to do. The fix is to modify the thread to do what, and only what, you actually want. Then you won't have to "push it around" from the outside.
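A minimal sketch of this "don't cancel, let the thread decide" approach, using std::mutex for brevity. ConnectAttempt, finish_attempt and abandon are hypothetical names, and the connect and close operations are passed in as callables so the example stays independent of any socket API.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Shared state for one connection attempt (all names are hypothetical).
struct ConnectAttempt {
    std::mutex m;
    bool abandoned = false;  // set by the main thread when it gives up waiting
    int fd = -1;             // published only if the attempt is still wanted
};

// Runs the slow connect with no lock held, then takes the mutex only briefly
// to decide what to do with the result. Returns true if the fd was published.
bool finish_attempt(ConnectAttempt& a,
                    const std::function<int()>& slow_connect,
                    const std::function<void(int)>& close_fd) {
    assert(slow_connect && close_fd);
    int fd = slow_connect();  // may block for a long time; no lock is held
    if (fd < 0) return false;
    {
        std::lock_guard<std::mutex> lock(a.m);
        if (!a.abandoned) {
            a.fd = fd;
            return true;
        }
    }
    close_fd(fd);  // nobody wants it any more; clean up and let the thread end
    return false;
}

// Called by the main thread on timeout instead of pthread_cancel().
void abandon(ConnectAttempt& a) {
    std::lock_guard<std::mutex> lock(a.m);
    a.abandoned = true;
}
```

The worker thread runs finish_attempt and exits on its own; the main thread never blocks on the mutex for longer than a flag update.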

win32 Handles and multithread

Our application makes heavy use of Win32 HANDLEs for synchronization, via CreateEvent and SetEvent/ResetEvent.
A colleague of mine has asked me whether accessing the HANDLEs for events is thread-safe.
I could not answer, since HANDLEs are not thread-safe for any GDI object...
But since events are aimed at multithreaded synchronization, I cannot imagine they aren't thread-safe.
Could you confirm this?

All handles you obtain from functions in Kernel32 are thread-safe, unless the MSDN Library article for the function explicitly mentions that it is not. There's an easy way to tell from your code: such a handle is closed with CloseHandle().
What you do with the handle may not necessarily be thread-safe. Windows won't help when you call SetEvent() twice but WaitForSingleObject() only once, which might be a threading race in your program, depending on how you use the event.

Depends on the type of handle.
A synchronization handle (like one created by CreateEvent) is by definition thread safe.
A file handle, when written to by multiple threads simultaneously, not so much.

Boost.asio & UNIX signal handling

Preface
I have a multi-threaded application running via Boost.Asio. There is only one boost::asio::io_service for the whole application, and everything is done inside it by a group of threads. Sometimes child processes need to be spawned using fork and exec. When a child terminates, I need to call waitpid on it to check the exit code and to reap the zombie. I used the recently added boost::asio::signal_set but ran into a problem on ancient systems with linux-2.4.* kernels (which are unfortunately still used by some customers). Under older Linux kernels, threads are actually a special case of processes, and therefore if a child was spawned by one thread, another thread is unable to wait for it using the waitpid family of system calls. Asio's signal_set posts the signal handler to the io_service, and any thread running this service can run the handler, which is inappropriate for my case. So I decided to handle signals in the good old signal/sigaction way: all threads have the same handler, which calls waitpid. That leads to another problem:
The problem
When a signal is caught by the handler and the child process has been successfully waited for, how can I "post" this to my io_service from the handler? As it seems to me, the obvious io_service::post() method is not an option because it can deadlock on the io_service's internal mutexes if the signal arrives at the wrong time. The only thing that came to my mind is to use a pipe or socketpair: write notifications to one end and async_wait on the other, as is sometimes done to handle signals in poll() event loops.
Are there any better solutions?
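For reference, the pipe idea mentioned in the question (often called the self-pipe trick) can be sketched with plain POSIX calls. The function names are invented for this example; in an Asio program the read end would presumably be wrapped in something like boost::asio::posix::stream_descriptor and read asynchronously, which is not shown here.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <fcntl.h>
#include <unistd.h>

static int g_wake_fds[2] = {-1, -1};  // [0] = read end, [1] = write end

// Async-signal-safe handler: write() is on the POSIX list of safe functions.
extern "C" void notify_via_pipe(int signo) {
    int saved_errno = errno;
    unsigned char byte = static_cast<unsigned char>(signo);
    ssize_t ignored = write(g_wake_fds[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

// Sets up the pipe and the handler. Returns the read end, or -1 on failure.
int install_self_pipe(int signo) {
    assert(signo > 0 && signo < 256);  // the signal number must fit in one byte
    if (pipe(g_wake_fds) != 0) return -1;
    // Non-blocking write end so a full pipe can never stall the handler.
    fcntl(g_wake_fds[1], F_SETFL, fcntl(g_wake_fds[1], F_GETFL) | O_NONBLOCK);
    struct sigaction sa = {};
    sa.sa_handler = notify_via_pipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, nullptr) != 0) return -1;
    return g_wake_fds[0];
}

// Event-loop side: reads one notification and returns the signal number.
int read_notification(int read_fd) {
    unsigned char byte = 0;
    ssize_t n = read(read_fd, &byte, 1);
    return n == 1 ? byte : -1;
}
```

The handler touches nothing but the pipe, so it cannot deadlock on any mutex owned by the event loop.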

I've not dealt with boost::asio but I have solved a similar problem. I believe my solution works for both LinuxThreads and the newer NPTL threads.
I'm assuming that the reason you want to "post" signals to your *io_service* is to interrupt a system call so the thread/program will exit cleanly. Is this correct? If not, maybe you can better describe your end goal.
I tried a lot of different solutions, including some which required detecting which type of threads were being used. The thing that finally helped me solve this was the section titled Interruption of System Calls and Library Functions by Signal Handlers of man signal(7).
The key is to use sigaction() without SA_RESTART to install handlers for all the signals you want to catch, unmask these signals using pthread_sigmask(SIG_UNBLOCK, sig_set, 0) in the signal-handling thread, and mask the same signal set in all other threads. The handler does not have to do anything. Just having a handler changes the behavior, and not setting SA_RESTART allows interruptible system calls (like write()) to be interrupted. Whereas if you use sigwait(), system calls in other threads are not interrupted.
To easily mask signals in all other threads, I start the signal-handling thread, then mask all the signals I want to handle in the main thread before starting any other threads. When other threads are started, they copy the main thread's signal mask.
The point is that if you do this, you may not need to post signals to your *io_service*, because you can just check your system calls for interrupt return codes. I don't know how this works with boost::asio though.
So the end result of all this is that I can catch the signals I want, like SIGINT, SIGTERM, SIGHUP and SIGQUIT, in order to perform a clean shutdown, but my other threads still get their system calls interrupted and can also exit cleanly without any communication between the signal thread and the rest of the system and without doing anything dangerous in the signal handler, and a single implementation works on both LinuxThreads and NPTL.
Maybe that wasn't the answer you were looking for, but I hope it helps.
NOTE: If you want to figure out whether the system is running LinuxThreads, you can do this by spawning a thread and then comparing its PID to the main thread's PID. If they differ, it's LinuxThreads. You can then choose the best solution for the thread type.
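The effect of omitting SA_RESTART can be seen in a small sketch like the one below. It only demonstrates that one piece: a do-nothing handler makes a blocked read() fail with EINTR. Here the signal is aimed at the blocked thread with pthread_kill, which is a simplification of the masking scheme described above, and all names are invented for the example.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <csignal>

static int g_idle_pipe[2];
static std::atomic<int> g_read_errno{0};
static std::atomic<bool> g_reader_done{false};

extern "C" void empty_handler(int) {}  // does nothing; its existence is the point

// Blocks in read() on a pipe that never receives data.
static void* blocked_reader(void*) {
    char c;
    ssize_t n = read(g_idle_pipe[0], &c, 1);
    g_read_errno.store(n < 0 ? errno : 0);
    g_reader_done.store(true);
    return nullptr;
}

// Returns the errno that read() failed with after the signal (EINTR expected).
int interrupt_blocked_read() {
    int rc = pipe(g_idle_pipe);
    assert(rc == 0);

    struct sigaction sa = {};
    sa.sa_handler = empty_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // deliberately no SA_RESTART
    sigaction(SIGUSR1, &sa, nullptr);

    pthread_t t;
    rc = pthread_create(&t, nullptr, blocked_reader, nullptr);
    assert(rc == 0);
    (void)rc;
    // Resend until the reader reports back, in case the first signal arrives
    // before the thread has entered read().
    while (!g_reader_done.load()) {
        pthread_kill(t, SIGUSR1);
        usleep(1000);
    }
    pthread_join(t, nullptr);
    close(g_idle_pipe[0]);
    close(g_idle_pipe[1]);
    return g_read_errno.load();
}
```

With SA_RESTART set in sa_flags, the same read() would typically be restarted by the kernel and the thread would stay blocked.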

If you are already polling your IO, another possible solution that is very simple is to use a boolean to signal the other threads. A boolean is always either zero or not, so there is no possibility of a partial update. You can then just set this boolean flag, which the other threads read, without any mutexes. Tools like valgrind won't like it, but in practice it works.
If you want to be even more correct, you can use gcc's atomics, but this is compiler-specific.
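With a C++11 compiler, std::atomic<bool> gives the same kind of flag without relying on compiler-specific builtins. A minimal sketch, assuming a worker that polls roughly once per millisecond (run_until_stopped is a made-up name):

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Runs a polling worker, asks it to stop, and returns how many iterations it did.
long run_until_stopped(std::chrono::milliseconds run_for) {
    assert(run_for.count() >= 0);
    std::atomic<bool> stop{false};
    long iterations = 0;
    std::thread worker([&] {
        // Poll first, then check the flag, so at least one iteration happens.
        do {
            ++iterations;  // stand-in for one poll of the I/O source
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        } while (!stop.load(std::memory_order_acquire));
    });
    std::this_thread::sleep_for(run_for);
    stop.store(true, std::memory_order_release);
    worker.join();  // join also makes `iterations` safe to read here
    return iterations;
}
```

Race detectors should be happy with this version, since the flag is accessed only through atomic operations.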

Kill a blocked Boost::Thread

I am writing an application which blocks on input from two istreams.
Reading from either istream is a synchronous (blocking) call, so, I decided to create two Boost::threads to do the reading.
Either one of these threads can get to the "end" (based on some input received), and once the "end" is reached, both input streams stop receiving. Unfortunately, I cannot know which will do so.
Thus, I cannot join() on both threads, because only one thread (cannot be predetermined which one) will actually return (unblock).
I must somehow force the other to exit, but it is blocked waiting for input, so it cannot itself decide it is time to return (condition variables or what not).
Is there a way to either:
Send a signal to a boost::thread, or
Force an istream to "fail", or
Kill a Boost::thread?
Note:
One of the istreams is cin
I am trying to restart the process, so I cannot close the input streams in a way that prohibits resetting them.
Edit:
I do know when the "end" is reached, and I do know which thread has successfully finished and which needs to be killed. It's the killing I need to figure out (or a different strategy for reading from an istream).
I need both threads to exit and cleanup properly :(
Thanks!

I don't think there is a way to do it cross platform, but pthread_cancel should be what you are looking for. With a boost thread you can get the native_handle from a thread, and call pthread_cancel on it.
In addition a better way might be to use the boost asio equivalent of a select call on multiple files. That way one thread will be blocked waiting for the input, but it could come from either input stream. I don't know how easy it is to do something like this with iostreams though.

Yes there is!
boost::thread::interrupt() will do the job to your specifications.
It will cause the targeted thread to throw a boost::thread_interrupted exception. Assuming it's uncaught, the stack will unwind properly, destroying all resources and terminating thread execution.
The termination isn't instant. (The wrong thread is running at that moment, anyway.)
It happens only at predefined interruption points. The most convenient for you would probably be a call to boost::this_thread::sleep(), which you could have that thread make periodically.

If a Boost thread is blocked on an I/O operation (e.g. cin >> whatever), boost::thread::interrupt() will not stop the thread. cin I/O is not a valid interruption point. Catch-22.

Well, on Linux I use pthread_kill(thread, SIGUSR1), as it interrupts blocking IO. There is no such call on Windows, as I discovered when porting my code; there is only a deprecated one for socket reads. On Windows you have to explicitly define an event that will interrupt your blocking call. So there is no such thing (AFAIK) as a generic way to interrupt blocking IO.
The Boost.Thread design handles this by managing well-identified interruption points. I don't know Boost.Asio well, and it seems that you don't want to rely on it anyway. If you don't want to refactor to use a non-blocking paradigm, what you can do is use something between non-blocking (polling) and blocking IO. That is, do something like this (pseudocode):
while (!stopped && !interrupted)
{
    // blocks for at most `timeout`, then returns so the flags are re-tested
    io.blockingCall(timeout);
    if (!stopped && !interrupted)
    {
        doSomething();
    }
}
Then you interrupt your two threads and join them ...
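On POSIX systems, one way to turn that pseudocode into real code is poll() with a timeout plus an atomic stop flag. This is only a sketch under the assumption that the input is available as a raw file descriptor; it does not work on a std::istream directly, because the stream's own buffering is invisible to poll(). The function name is made up.

```cpp
#include <poll.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <string>

// Reads from fd until `stopped` becomes true or the peer closes the descriptor.
// Blocks for at most timeout_ms at a time so the flag is rechecked regularly.
std::string read_until_stopped(int fd, const std::atomic<bool>& stopped, int timeout_ms) {
    assert(timeout_ms > 0);
    std::string received;
    while (!stopped.load()) {
        pollfd p = {fd, POLLIN, 0};
        int ready = poll(&p, 1, timeout_ms);
        if (ready < 0) break;      // error (a real caller would check for EINTR)
        if (ready == 0) continue;  // timeout: loop round and re-test the flag
        char buf[256];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) break;         // end of file or error
        received.append(buf, static_cast<size_t>(n));
    }
    return received;
}
```

The timeout bounds how long a stop request can go unnoticed, so a shorter value gives a quicker shutdown at the cost of more wake-ups.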
Perhaps it is simpler in your case? If you have a master thread that knows one thread has ended, perhaps you just have to close the IO of the other thread?
Edit:
By the way I'm interested in the final solution you have ...

I had a similar issue myself and have reached this solution, which some other readers of this question might find useful:
Assuming that you are using a condition variable with a wait() command, it is important for you to know that in Boost, the wait() statement is a natural interrupt point. So just put a try/catch block around the code with the wait statement and allow the function to terminate normally in your catch block.
Now, assuming you have a container with your thread pointers, iterate over your thread pointers and call interrupt() on each thread, followed by join().
Now all of your threads will terminate gracefully and any Boost-related memory cleanup should work cleanly.

Rather than trying to kill your thread, you can always try to join one thread with a timeout, and if that fails, join the other one instead. (Assuming you will always be able to join at least one of your two threads.)
In boost::thread you're looking for the timed_join function.
If you want the correct answer, however, that would be to use non-blocking IO with timed waits. That gives you the flow structure of synchronous IO together with the non-blocking behaviour of asynchronous IO.
You talk about reading from an istream, but an istream is only an interface. For stdin, you can just fclose stdin to interrupt the read. As for the other, it depends on where you're reading from...

It seems that threads are not helping you do what you want in a simple way. If Boost.Asio is not to your liking, consider using select().
The idea is to get two file descriptors and use select() to tell you which of them has input available. The file descriptor for cin is typically STDIN_FILENO; how to get the other one depends on your specifics (if it's a file, just open() it instead of using ifstream).
Call select() in a loop to find out which input to read, and when you want to stop, just break out of the loop.
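A minimal sketch of the select() idea: one helper that waits on two descriptors and reports which one has input. The helper name and the tie-breaking rule are choices made for this example only.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cassert>

// Waits up to timeout_ms for either descriptor to become readable.
// Returns the readable descriptor (fd_a wins a tie), or -1 on timeout or error.
int wait_for_either(int fd_a, int fd_b, int timeout_ms) {
    assert(fd_a >= 0 && fd_b >= 0 && fd_a < FD_SETSIZE && fd_b < FD_SETSIZE);
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd_a, &readable);
    FD_SET(fd_b, &readable);
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    int max_fd = fd_a > fd_b ? fd_a : fd_b;
    int ready = select(max_fd + 1, &readable, nullptr, nullptr, &tv);
    if (ready <= 0) return -1;
    if (FD_ISSET(fd_a, &readable)) return fd_a;
    if (FD_ISSET(fd_b, &readable)) return fd_b;
    return -1;
}
```

A reading loop would call this with STDIN_FILENO and the second descriptor, read() from whichever is returned, and simply leave the loop once the "end" input has been seen.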

Under Windows, use QueueUserAPC to queue a proc which throws an exception. That approach works fine for me.
HOWEVER: I've just found that boost mutexes etc are not "alertable" on win32, so QueueUserAPC cannot interrupt them.

Very late, but in Windows (and its precursors like VMS or RSX, for those who remember such things) I'd use something like ReadFileEx with a completion routine that signals when finished, and CancelIo if the read needs to be cancelled early.
Linux/BSD has an entirely different underlying API which isn't as flexible. Using pthread_kill to send a signal works for me, that will stop the read/open operation.
It's worth implementing different code in this area for each platform, IMHO.
