Do you need to join a cancelled thread? (pthreads) - c++

I'm a little confused about clean-up order when you're using PThreads with regard to cancellation. Normally, if your thread is detached, it automatically cleans up when it terminates. If it's not detached, you need to join it to reclaim the system resources.
The textbook I'm reading states the following which strangely sounds like joining is optional with regard to cancellation:
"If you need to know when the thread has actually terminated, you must
join with it by calling pthread_join after cancelling it."
So, do I need to join a cancelled thread to free its resources - and if not, then why?

TLPI says this:
Upon receiving a cancellation request, a thread whose cancelability is
enabled and deferred terminates when it next reaches a cancellation
point. If the thread was not detached, then some other thread in the
process must join with it, in order to prevent it from becoming a
zombie thread.
Also, since canceling a thread usually doesn't take effect immediately (read more about "cancellation points"), you can't be sure the thread was actually canceled unless you join it.

From man pthread_join:
After a canceled thread has terminated, a join with that thread using
pthread_join(3) obtains PTHREAD_CANCELED as the thread's exit status.
(Joining with a thread is the only way to know that cancellation has
completed.)
So joining is not required for the cancellation itself to take effect, but it is required if you want to know that the cancellation actually completed.
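A minimal sketch of the cancel-then-join sequence described in the quotes above. The names spin_worker and cancel_and_join are made up for illustration, and the worker relies on sleep() being a cancellation point. Build with -pthread.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Hypothetical worker: loops forever, but sleep() is a cancellation
// point, so a deferred cancellation request is acted on there.
static void* spin_worker(void*) {
    for (;;) {
        sleep(1);
    }
    return nullptr;  // never reached
}

// Cancels a joinable thread, then joins it. Returns true if the exit
// status reported by pthread_join is PTHREAD_CANCELED.
bool cancel_and_join() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, spin_worker, nullptr);
    assert(rc == 0);

    rc = pthread_cancel(tid);         // only queues the request
    assert(rc == 0);

    void* status = nullptr;
    rc = pthread_join(tid, &status);  // waits for termination, reclaims the thread
    assert(rc == 0);
    (void)rc;                         // silence unused warning under NDEBUG

    return status == PTHREAD_CANCELED;
}
```

Calling cancel_and_join() from main should return true. Without the pthread_join call, the cancelled thread would remain a zombie until the process exits.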

From the documentation of pthread_cancel():
After a canceled thread has terminated, a join with that thread using pthread_join(3) obtains PTHREAD_CANCELED as the thread's exit status. (Joining with a thread is the only way to know that cancellation has completed.)

A pthread's cancelability state is one of the following:
PTHREAD_CANCEL_ENABLE
PTHREAD_CANCEL_DISABLE
If you try to cancel a thread, you cannot be 100% sure that it will really be cancelled. Joining it tells you whether it was. There are also cancelability types to consider (PTHREAD_CANCEL_DEFERRED and PTHREAD_CANCEL_ASYNCHRONOUS), and corresponding pthread functions for setting the cancelability state and type:
int pthread_setcancelstate (int state, int *oldstate);
int pthread_setcanceltype (int type, int *oldtype);
Sample code is available at http://www.ijon.de/comp/tutorials/threads/cancel.html
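For readers who want something inline, here is a small sketch (not the code from the linked tutorial) of how pthread_setcancelstate can protect a critical section. All names (guarded_worker, sleep_ms, the two flags) are invented for this example. The cancellation request arrives while cancellation is disabled, stays pending, and is acted on at the first pthread_testcancel() after the state is restored.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <time.h>

static std::atomic<bool> g_ready{false};          // worker has disabled cancellation
static std::atomic<bool> g_critical_done{false};  // critical section finished

static void sleep_ms(long ms) {
    timespec ts{ms / 1000, (ms % 1000) * 1000000L};
    nanosleep(&ts, nullptr);
}

static void* guarded_worker(void*) {
    int old_state = 0;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    g_ready = true;
    sleep_ms(50);                 // normally a cancellation point, ignored while disabled
    g_critical_done = true;
    pthread_setcancelstate(old_state, &old_state);  // restore (enabled by default)

    int old_type = 0;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);  // the default type
    for (;;) {
        pthread_testcancel();     // explicit cancellation point
        sleep_ms(1);
    }
    return nullptr;               // never reached
}

// Returns true if the critical section completed AND the thread was cancelled.
bool cancel_during_disabled_section() {
    g_ready = false;
    g_critical_done = false;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, guarded_worker, nullptr) != 0) return false;

    while (!g_ready) sleep_ms(1);  // make sure the request lands in the disabled window
    pthread_cancel(tid);           // stays pending until cancellation is re-enabled

    void* status = nullptr;
    pthread_join(tid, &status);
    return g_critical_done && status == PTHREAD_CANCELED;
}
```

The join is what tells the caller that the cancellation really completed; pthread_cancel alone would only confirm that the request was queued.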

If something goes wrong in a thread, or it is somehow stopped from within, it will always be tidied up by the OS. So it's all nice and safe.
You only need to join the thread if you have to be sure it has actually stopped executing, as when merging two parallel tasks. (E.g. if you have various threads working on various parts of a split structure, you need to join them all, i.e. wait until they have all finished, when you want to combine the structure again.)

Related

How will pthread_cancel() respond when the cancellation request is queued?

This is a basic question, but the answer seems to be eluding me. In any case, here's the background information:
According to the man pages, pthread_cancel()'s return values are as follows:
On success, pthread_cancel() returns 0; on error, it returns a
nonzero error number.
Depending on the cancellation state of the thread to be cancelled, it may terminate immediately or the request may be queued. In my case, the cancellation will be deferred and will run through a few clean up handlers. Within my main thread, I want to validate the return value. That is, a simple approach would be to add a line such as
assert(pthread_cancel(tID) == 0);
From what I can tell, it seems like pthread_cancel() simply returns 0 if the request was successfully queued and not if the thread was cancelled. In other words, will the above line of code be non-blocking? My concern is that if I misinterpreted the man pages, and I have a particularly long deferment period in the child thread, my main thread will be stuck on the assertion because pthread_cancel() is blocking.
pthread_cancel() will never block. It also will not tell you whether the thread was successfully cancelled.
In deferred mode: it just sets a flag (a cancel request), which the target thread has to check actively (potentially implicitly, inside system calls); the thread then exits cooperatively.
In asynchronous mode: the thread may be cancelled at any point in time (usually immediately). This is only safe in pure CPU-bound loops that call no system or library functions and do not allocate memory, directly or indirectly. Note that this is also not safe if the thread synchronizes with other threads using mutexes or related thread primitives, as cancelling the thread can leave the mutexes it holds in an undefined state.
In short: Cancelling asynchronously is generally unsafe, unclean by design and should generally be avoided. There are only very few use cases where this can be used in a clean way (for example 100% CPU bound code communicating with other threads only through acquire/release semantics (lock free)).
From my man page:
The cancellation processing in the target thread runs asynchronously with respect to the calling thread returning from pthread_cancel().
So yes, your assert will not block.
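To make the non-blocking behaviour concrete, here is a rough sketch under invented names (deferred_worker, mark_cleanup, CancelReport). The worker keeps cancellation disabled until the main thread releases it, which can only happen after pthread_cancel() has already returned. The cleanup handler therefore cannot have run at the moment pthread_cancel() returns 0, and only the later join shows that cancellation completed.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <time.h>

static std::atomic<bool> g_release{false};      // main allows the worker to proceed
static std::atomic<bool> g_cleanup_ran{false};  // set by the cleanup handler

static void sleep_ms(long ms) {
    timespec ts{ms / 1000, (ms % 1000) * 1000000L};
    nanosleep(&ts, nullptr);
}

static void mark_cleanup(void*) { g_cleanup_ran = true; }

static void* deferred_worker(void*) {
    int old_state = 0;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    while (!g_release) sleep_ms(1);        // stand-in for a long deferment period

    pthread_cleanup_push(mark_cleanup, nullptr);
    pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old_state);
    for (;;) {
        pthread_testcancel();              // the pending request is acted on here
        sleep_ms(1);
    }
    pthread_cleanup_pop(0);                // never reached; lexically pairs with the push
    return nullptr;
}

struct CancelReport {
    int cancel_rc;               // return value of pthread_cancel
    bool cleanup_before_join;    // had the handler run when pthread_cancel returned?
    bool cleanup_after_join;     // had it run once the join completed?
    bool canceled;               // exit status == PTHREAD_CANCELED
};

CancelReport cancel_without_blocking() {
    g_release = false;
    g_cleanup_ran = false;
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, deferred_worker, nullptr);
    assert(rc == 0);
    (void)rc;

    CancelReport report{};
    report.cancel_rc = pthread_cancel(tid);   // returns at once; request is only queued
    report.cleanup_before_join = g_cleanup_ran;

    g_release = true;                         // now let the worker reach a cancellation point
    void* status = nullptr;
    pthread_join(tid, &status);
    report.cleanup_after_join = g_cleanup_ran;
    report.canceled = (status == PTHREAD_CANCELED);
    return report;
}
```

Note that the return value is stored and compared against 0 rather than wrapped directly in assert(): with NDEBUG defined, an assert() containing the call would remove the call itself.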

Do I need to join every thread in my application ?

I'm new to multi-threading and want to understand the whole idea of "join". Do I need to join every thread in my application, and how does that work with multi-threading?
No, you can detach a thread if you want to leave it alone.
If you start a thread, you must either detach it or join it before the program ends; otherwise the behaviour is undefined.
To know whether a thread needs to be detached, ask yourself this question: "do I want the thread to run after the program's main function is finished?". Here are some examples:
When you do File/New, you create a new thread and detach it: the thread will be closed when the user closes the document. Here you don't need to join the threads.
When you do a Monte Carlo simulation, some distributed computing, or any divide-and-conquer type of algorithm, you launch all the threads and need to wait for all the results so that you can combine them. Here you explicitly need to join the threads before combining the results.
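The divide-and-conquer case might look roughly like this with std::thread. The function name parallel_sum and the chunking scheme are just assumptions for the sketch; the point is that every worker is joined before the partial results are combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `parts` worker threads, each writing to its own slot.
long long parallel_sum(const std::vector<int>& data, unsigned parts) {
    assert(parts > 0);
    std::vector<long long> partial(parts, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + parts - 1) / parts;

    for (unsigned i = 0; i < parts; ++i) {
        const std::size_t begin = std::min(data.size(), i * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, i, begin, end] {
            partial[i] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(begin),
                data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
        });
    }

    // Join every worker: only after this are all partial results complete.
    for (std::thread& t : workers) t.join();

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Reading partial before the joins would be a data race; the join is what makes the workers' writes visible to the combining thread.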
Not joining a thread is like not deleting all the memory you new. It can be harmless, or it could be a bad habit.
A thread you have not synchronized with is in an unknown state of execution. If it is a file writing thread, it could be half way through writing a file and then the app finishes. If it is a network communications thread, it could be half way through a handshake.
The downside to joining every thread is if one of them has gotten into a bad state and has blocked, your app can hang.
In general you should try to send a message to your outstanding threads to tell them to exit and clean up. Then you should wait a modest amount of time for them to finish or otherwise respond that they are good to die, and then shut down the app. Prior to this you should signify that your program is no longer open for business -- shut down GUI windows, respond to requests from other processes that you are shutting down, etc -- so that if this takes longer than anticipated the user is not bothered. Finally, if things go imperfectly -- if threads refuse to respond to your request that they shut down and you give up on them -- then you should log errors as well, so you can fix what may be a symptom of a bigger problem.
The last time a worker thread unexpectedly hung, I initially thought it was a problem with a network outage and a bug in the timeout code. Upon deeper inspection it was because one of the objects in use was deleted prior to the shutdown synchronization: the undefined behaviour that resulted just looked like a hang in my reproduction cases. Had we not carefully joined, that bug would have been harder to track down (now, the right thing to do would have been to use a shared resource that we could not delete: but mistakes happen).
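One possible shape for the "ask, wait a modest time, log if it fails" shutdown described above, sketched with std::thread. std::thread has no timed join, so this example assumes a std::promise that the worker fulfils on exit; the function name and the sleep-based "work" are placeholders.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Starts a worker, asks it to stop, and waits up to `grace` for it to
// acknowledge. Returns true if it finished within the grace period.
bool stop_with_grace_period(std::chrono::milliseconds work_step,
                            std::chrono::milliseconds grace) {
    std::atomic<bool> stop{false};
    std::promise<void> done;
    std::future<void> finished = done.get_future();

    std::thread worker([&stop, &done, work_step] {
        while (!stop.load()) {
            std::this_thread::sleep_for(work_step);  // stand-in for one unit of real work
        }
        done.set_value();                            // "good to die"
    });

    stop.store(true);                                // the shutdown message
    const bool in_time =
        finished.wait_for(grace) == std::future_status::ready;
    if (!in_time) {
        std::cerr << "worker did not stop within the grace period\n";
    }

    // In this sketch the worker always stops eventually, so joining is safe.
    // A real application that gives up on a stuck thread has to decide what
    // to do here instead (for example, log and exit the process).
    worker.join();
    return in_time;
}
```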
The pthread_join() function suspends execution of the calling thread
until the target thread terminates, unless the target thread has
already terminated. On return from a successful pthread_join() call
with a non-NULL value_ptr argument, the value passed to pthread_exit()
by the terminating thread is made available in the location referenced
by value_ptr. When a pthread_join() returns successfully, the target
thread has been terminated. The results of multiple simultaneous calls
to pthread_join() specifying the same target thread are undefined. If
the thread calling pthread_join() is canceled, then the target thread
will not be detached.
So pthread_join does two things:
1. Wait for the thread to finish.
2. Clean up any resources associated with the thread.
This means that if you exit the process without calling pthread_join, then (2) will be done for you by the OS (although it won't do thread cancellation cleanup), and (1) will not be done.
So whether you need to call pthread_join depends on whether you need (1) to happen.
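As an illustration of the value_ptr part of the quoted description, here is a small sketch with hypothetical names (compute_answer, join_and_fetch_result). The thread returns a heap-allocated result, which is equivalent to passing it to pthread_exit(), and the joining thread picks it up.

```cpp
#include <cassert>
#include <memory>
#include <pthread.h>

// Hypothetical worker: returning a pointer from the start routine has the
// same effect as calling pthread_exit() with that pointer.
static void* compute_answer(void*) {
    return new int(42);
}

// Joins the worker, takes ownership of its result, and returns the value.
int join_and_fetch_result() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, compute_answer, nullptr);
    assert(rc == 0);

    void* value = nullptr;
    rc = pthread_join(tid, &value);   // (1) waits, (2) releases the thread's resources
    assert(rc == 0);
    (void)rc;

    std::unique_ptr<int> result(static_cast<int*>(value));  // freed on return
    return *result;
}
```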
Detached thread
If you don't need to wait for the thread, then you may as well pthread_detach it. A detached thread cannot be joined (so you can't wait on its completion), but its resources are freed automatically when it completes.
do I need to join every thread in my application ?
Not necessarily - it depends on your design and OS. Join() is actively hazardous in GUI apps. If you don't need to know, or don't care, from one thread whether another thread has terminated, you don't need to join it.
I try very hard to not join/WaitFor any threads at all. Pool threads, app-lifetime threads and the like often do not require any explicit termination - depends on OS and whether the thread/s own, or are explicitly bound to, any resources that need explicit termination/close/whatever.
Threads can be either joinable or detached. Detached threads should not be joined. On the other hand, if you don't join a joinable thread, your app leaks some memory and some thread structures. A C++11 std::thread calls std::terminate if the thread object goes out of scope without having been joined or detached. See pthread_detach and pthread_create. This is much like processes: when a child exits, it stays a zombie until its creator calls waitpid. The reason for this behaviour is that the creator of a thread or process might want to know its exit code.
Update: if pthread_create is called with the attribute argument equal to NULL (default attributes are used), a joinable thread is created. To create a detached thread, you can use attributes:
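pthread_attr_t attrs;
pthread_attr_init(&attrs);
pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);
pthread_create(thread, &attrs, callback, arg);  /* thread is a pthread_t *; attrs is passed by address */
pthread_attr_destroy(&attrs);                   /* the attribute object is no longer needed */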
You can also detach an already created thread by calling pthread_detach on it. If you try to join a detached thread, pthread_join returns the EINVAL error code. glibc has a non-portable extension, pthread_getattr_np, that retrieves the attributes of a running thread, so you can check whether a thread is detached with pthread_attr_getdetachstate.
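A self-contained sketch of creating a detached thread through attributes. The names run_detached and detached_body are invented, and the detach state is read back from the attribute object itself with pthread_attr_getdetachstate rather than via the glibc-only pthread_getattr_np, to keep the example portable. Since a detached thread cannot be joined, the example polls an atomic flag to learn that it ran.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <time.h>

static std::atomic<bool> g_detached_ran{false};

static void* detached_body(void*) {
    g_detached_ran = true;     // the only observable effect of this worker
    return nullptr;            // resources are released automatically on exit
}

// Creates a detached thread and waits (by polling) until it has run.
// Returns true if the attribute object reported the detached state.
bool run_detached() {
    g_detached_ran = false;

    pthread_attr_t attrs;
    pthread_attr_init(&attrs);
    pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);

    int state = 0;
    pthread_attr_getdetachstate(&attrs, &state);

    pthread_t tid;
    const int rc = pthread_create(&tid, &attrs, detached_body, nullptr);
    pthread_attr_destroy(&attrs);            // safe once the thread has been created
    if (rc != 0) return false;

    // No pthread_join here: a detached thread must not be joined.
    while (!g_detached_ran.load()) {
        timespec ts{0, 1000000L};            // 1 ms
        nanosleep(&ts, nullptr);
    }
    return state == PTHREAD_CREATE_DETACHED;
}
```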

How do I know if a boost thread is done?

I am using boost::thread to process messages in a queue.
When a first message comes I start a message processing thread.
When a second message comes I check if the message processing thread is done.
If it is done, I start a new one.
If it is not done, I do nothing.
How do I know if the thread is done ? I tried with joinable() but it is not working, as when the thread is done, it is still joinable.
I also tried to interrupt the process at once, and add an interruption point at the end of my thread, but it did not work.
Thanks
EDIT :
I would like to have my thread sleep for an undetermined time, and wake up when a signal is triggered.
The means to do it is boost::condition_variable
As far as I know, you should use the join() method to wait for the end of a thread's execution. You can use it with a timeout via timed_join().
You can interrupt threads with interrupt(). In this case, inside the thread an exception will occur if the execution reaches an interruption point ( a boost::this_thread::sleep() or boost::this_thread::interruption_point() ). You catch the exception inside the thread and you can then close it.
Spawning a new thread for each incoming message is very inefficient. You should check out the Thread pool pattern.
EDIT:
Sorry, jules, I misread your question. I recommend you take a look at the producer-consumer pattern. Check out this article on how to roll your own blocking queue using boost condition variables. Intel's Thread Building Blocks also has a blocking queue implementation.
Check out this SO question about existing lock-free queue implementations.
Hope this helps.
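A rough sketch of the producer-consumer idea: one long-lived worker that sleeps on a condition variable and wakes when a message is posted, so nobody has to ask whether "the thread is done". It uses the standard library equivalents (std::thread, std::condition_variable) instead of Boost, and the class name MessageProcessor is made up for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

class MessageProcessor {
public:
    explicit MessageProcessor(std::function<void(const std::string&)> handler)
        : handler_(std::move(handler)), worker_([this] { run(); }) {}

    // Closes the queue, lets the worker drain what is left, then joins it.
    ~MessageProcessor() {
        assert(worker_.joinable());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    void post(std::string message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();   // wake the worker if it is sleeping
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
            if (queue_.empty()) return;          // closed and fully drained
            std::string message = std::move(queue_.front());
            queue_.pop();
            lock.unlock();                       // handle the message outside the lock
            handler_(message);
        }
    }

    std::function<void(const std::string&)> handler_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> queue_;
    bool closed_ = false;
    std::thread worker_;   // declared last so the other members exist before it starts
};
```

Usage is simply constructing a MessageProcessor with a handler and calling post() for each incoming message; messages are handled in order on the worker thread.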
Have you tried comparing get_id() with boost::this_thread::get_id()? If they match, the thread does not exist. But that will only happen if you have exited the thread.

Stopping an MFC thread

I understand the problem with just killing the thread directly (via AfxEndThread or other means), and I've seen the examples using CEvent objects to signal the thread and then having the thread clean itself up. The problem I have is that using CEvent to signal the thread seems to require a loop where you check to see if the thread is signaled at the end of the loop. The problem is, my thread doesn't loop. It just runs, and the processing could take a while (which is why I'd like to be able to stop it).
Also, if I were to just kill the thread, I realize that anything I've allocated will not have a chance to clean itself up. It seems to me like any locals I've been using that happen to have put stuff on the heap will also not be able to clean themselves up. Is this the case?
There is no secret magic knowledge here.
Just check the event object periodically throughout the function code, where you deem it is safe to exit.
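A sketch of what "check periodically" can look like in a function that has no loop. It uses a std::atomic<bool> as a portable stand-in for the CEvent, and the stages are placeholders. Because the function returns normally at each check, destructors of locals still run, so heap memory owned by objects such as std::vector is released.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Placeholder for one expensive step of the real processing.
static void run_stage(std::vector<int>& scratch, int stage) {
    for (int& cell : scratch) cell += stage;
}

// Runs up to three stages, checking the stop flag at points where it is
// safe to exit. Returns the number of stages that were completed.
int long_computation(const std::atomic<bool>& stop_requested) {
    std::vector<int> scratch(1024, 0);   // heap-backed local, freed on every return path
    int completed = 0;

    run_stage(scratch, 1);
    ++completed;
    if (stop_requested.load()) return completed;   // safe exit point

    run_stage(scratch, 2);
    ++completed;
    if (stop_requested.load()) return completed;   // safe exit point

    run_stage(scratch, 3);
    ++completed;
    assert(completed == 3);
    return completed;
}
```

In an MFC thread the flag check would be a zero-timeout wait on the event instead, but the structure of the function stays the same.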
Does your thread ever exit? If so, you could set an event in the thread at exit and have the main process wait for that event via WaitForSingleObject. This is best done with a timeout so the main process doesn't appear to lock up when it's closing. If the wait times out, kill the thread via TerminateThread. You'll have to determine what a reasonable timeout is, though.
Since you don't loop in the thread, this seems to me to be the only way to do this. Of course, you could do something like set a boolean flag in the main process and have the thread periodically check this flag, but then your thread code will be littered with "if(!canRun) return;" type code.
If the thread never exits, then TerminateThread is the only way to stop the thread.
Locals would be placed on the stack and, hence, WOULD be freed on forcing the thread shut (I think). Destructors won't get called though and any critical sections the thread holds will not get released.
If the thread is ONLY doing things with simple data types on the stack, however, it IS a safe thing to be doing.

In wxwidgets, how do I make one thread wait for another to complete before proceeding?

I have a system where my singleton class spawns a thread to do a calculation. If the user requests another calculation while one is still running, I want it to tear down the existing thread and start a new one. But it should wait for the first thread to exit completely before proceeding. I have all the tear-down working, but I seem to have an issue with making sure that only one thread runs. My approach is for the StartCalculation function to call mutex->Lock(), and for the thread to release the lock in its destructor. It's not working. Am I right in assuming that if Lock() can't get the lock, it spins and keeps trying to reacquire the lock? Can this Lock() be called from my main application thread? Any ideas are helpful. Maybe wxMutex locks are the right mechanism for this.
To wait for a thread you need to create it joinable and simply use wxThread::Wait(). However I agree with the remark above: this is not something you'd normally do at all and definitely not from the main GUI thread as you should never block in it because this freezes the UI.
Consider using a message queue to simply tell the existing thread about the new task it needs to perform instead.
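One way to read that suggestion, sketched with the standard library rather than wxWidgets classes (the class name Calculator and the toy calculation are assumptions): a single worker thread lives for the lifetime of the object, and a new request simply supersedes whatever it is working on. No thread is torn down or waited for from the GUI thread when a new request arrives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class Calculator {
public:
    Calculator() : worker_([this] { run(); }) {}

    ~Calculator() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Replaces any calculation in progress with a new one; never blocks for long.
    void request(int input) {
        assert(input >= 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            input_ = input;
            ++requested_;
        }
        wake_.notify_one();
    }

    // Blocks until the most recent request has produced a result.
    long long wait_for_latest() {
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return finished_ == requested_.load(); });
        return result_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return quit_ || finished_ != requested_.load(); });
            if (quit_) return;
            const unsigned ticket = requested_.load();
            const int input = input_;
            lock.unlock();

            // Toy calculation (input * input by repeated addition) that
            // abandons its work as soon as a newer request has arrived.
            long long sum = 0;
            for (int i = 0; i < input && requested_.load() == ticket; ++i) sum += input;

            lock.lock();
            if (requested_.load() == ticket) {   // still the latest request: publish
                result_ = sum;
                finished_ = ticket;
                done_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    std::atomic<unsigned> requested_{0};  // ticket of the newest request
    unsigned finished_ = 0;               // ticket of the last published result
    int input_ = 0;
    long long result_ = 0;
    bool quit_ = false;
    std::thread worker_;                  // declared last so the other members exist first
};
```

In a GUI program the result would normally be delivered back through an event rather than by calling wait_for_latest() on the main thread; that method is here only to make the sketch easy to exercise.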