I'm doing an assignment where we are creating 4 threads that all share some memory. Just want to ask if the design of my monitor looks good, since I'm having a deadlock/ stalling somewhere in my code when I try to cancel all threads.
Previously, I have identified the stalling to threads locking up the mutex when they are cancelled, leaving a thread deadlocking on waiting for the mutex to unlock. I've implemented some changes but still seems to stall when I pipe some data into it using
cat input.txt |./app
However if I read the data directly from the file using getline() then it does not stall and all threads are cancelled.
Currently, the monitor contains the 2 shared lists(these are created using the same pool of nodes), a mutex control access to this list, with 4 condition variables 2 per list.
//shared data
static List *sendList;
static List *receivelist;
//locks
static pthread_cond_t sendListIsFullCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t sendListIsEmptyCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t receiveListIsFullCondVar = PTHREAD_COND_INITIALIZER;
static pthread_cond_t receiveListIsEmptyCondVar = PTHREAD_COND_INITIALIZER;
static pthread_mutex_t listAccessMutex = PTHREAD_MUTEX_INITIALIZER;
The monitor interface consist of an add function and a get function for each list. Each threads are expected to be either adding to the list or getting from the list, but not both.
More specifically, the keyboardThread puts data into the sendlist, sendThread get data from sendlist, receiveThread put data into receivelist, and printThread get data from receivelist.
void Monitor_sendlistAdd(void *item)
{
pthread_mutex_lock(&listAccessMutex);
if (List_count(sendList) == MAX_LIST_SIZE)
{
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
pthread_cond_wait(&sendListIsFullCondVar, &listAccessMutex);
}
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
List_prepend(sendList, item);
pthread_cond_signal(&sendListIsEmptyCondVar);
pthread_mutex_unlock(&listAccessMutex);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
}
void *Monitor_sendlistGet()
{
pthread_mutex_lock(&listAccessMutex);
if (List_count(sendList) == 0)
{
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
pthread_cond_wait(&sendListIsEmptyCondVar, &listAccessMutex);
}
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
void *item = List_trim(sendList);
pthread_cond_signal(&sendListIsFullCondVar);
pthread_mutex_unlock(&listAccessMutex);
pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL);
return item;
}
(I didn't include the interface for receiveList since its is identical.)
I'm changing the cancel states after the if-statement to make sure a thread is not cancelled while locking the mutex, which would stall any process from cancelling if it is waiting for the mutex.
I also giving each thread's a cleaup handler that releases its mutex, again to ensure that no thread cancel without keeping the mutex locked.
void *keyboardThread()
{
pthread_cleanup_push(Monitor_releaseListAccessMutex, NULL);
while (1)
{
... some codes
}
pthread_cleanup_pop(1);
return;
So yeah, I'm just completely stumped as to where else could be blocking in my code. The rest of the code are just making connections to sockets to shoot data between ports and mallocs. I have looked into the manual and it seems like mutex_lock is the only function in my code that can block a thread cancel.
Do not use thread cancellation if you can possibly help it. It has deep problems associated with it.
To break threads out of a pthread_cond_wait(), use pthread_cond_signal() or pthread_cond_broadcast(). Or use pthread_cond_timedwait() in the first place and just let the timeout expire.
"But, wait!" I imagine you saying, "Then my threads will just proceed as if they had been signaled normally!" And there's the rub: your threads need to be able to handle spurious returns from their waits anyway, as those can and do happen. They must check before waiting whether they need to wait at all, and then they must check again, and potentially wait again, after returning from their wait.
What you can do, then, is add a shared flag that informs your thread(s) that they should abort instead of proceeding normally, and have them check that it is not set as one of the conditions for waiting. If it is set before any iteration of the wait loop then the thread should take whatever action is appropriate, such as (releasing all its locked mutexes and) terminating.
You remark:
I also giving each thread's a cleaup handler that releases its mutex
That's probably a bad idea, and it may be directly contributing to your problem. Threads must not attempt to unlock a mutex that they do not hold locked, so for your cleanup handlers to work correctly, you would need to track which mutexes each thread currently has locked, in a form that the cleanup handlers can act upon. It's conceivable that you could do that, but at best it would be messy, and probably fragile, and it might well carry its own synchronization issues. This is among the problems attending use of thread cancellation.
Related
Hey there!
I have already looked up similiar problems here, but didn't arrive at a final solution.
In my application, I have two processes running concurrently that need to be synchronized via shared-memory. Currently I'm using a pthread_mutex_t & pthread_cond_t for that purpose which are put into shared-memory.
This is working fine until process A crashes whilst waiting on the condition. If process A is restarted a deadlock(?) happens in which process A waits on the condition and process B is stuck indefinetly at the call to pthread_cond_broadcast.
I read that this may be due to the mutex being in an inconsistent state, but in fact it never seems to be in my program.
I would appreciate if you can tell me if I misunderstood something or if there are alternative approaches on solving this or if it isn't possible at all to guard this case of crashing.
struct Poller
{
private:
std::atomic<bool> waiting;
pthread_mutex_t mtx;
pthread_cond_t cnd;
auto lockMutex() -> void
{
// Recover Mutex if any process locking it died unexpectedly
LOG_DEBUG(info, "AdaptivePoller", "Locking mutex...");
if(pthread_mutex_lock(&mtx) == EOWNERDEAD) {
LOG_DEBUG(info, "AdaptivePoller", "Mutex in inconsistent state.");
if(pthread_mutex_consistent(&mtx) != 0) {
throw std::runtime_error("Mutex could not be brought back into consistent state.");
}
}
}
public:
Poller()
{
waiting = false;
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);
pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&mtx, &attr);
pthread_mutexattr_destroy(&attr);
pthread_condattr_t attrcond;
pthread_condattr_init(&attrcond);
pthread_condattr_setpshared(&attrcond, PTHREAD_PROCESS_SHARED);
pthread_cond_init(&cnd, &attrcond);
pthread_condattr_destroy(&attrcond);
}
auto wait() -> void
{
lockMutex();
waiting = true;
LOG_DEBUG(info, "AdaptivePoller", "Start waiting...");
while(waiting) {
pthread_cond_wait(&cnd, &mtx);
}
pthread_mutex_unlock(&mtx);
}
auto notify() -> void
{
lockMutex();
waiting = false;
LOG_DEBUG(info, "AdaptivePoller", "Notifying...");
pthread_cond_broadcast(&cnd);
pthread_mutex_unlock(&mtx);
}
};
As far as I am aware, the POSIX API does not define mechanisms sufficient to make your Poller wholly robust against all process-failure scenarios. But you might achieve a situation you consider an improvement by using a SysV semaphore in place of the condition variable and mutex. Here's an outline:
a semaphore can_proceed provides for threads to block themselves until released. This semaphore has initial value 0, and threads block themselves by attempting to decrement it.
a thread releases the currently-blocked processes by incrementing can_proceed by the number of blocked processes. The SysV semaphore API provides an operation for determining how many that is.
The above is not infallible. At least these cases can occur:
A waiting process W dies after a notifying process N reads a waiter count that includes W, but before W completes its semaphore decrement. This will allow one future waiter to pass through without blocking.
A waiting process W2 arrives in wait() between when a notifying process has incremented can_proceed and when the last of the previously-waiting processes completes its semaphore decrement. It is possible in this case that W2 proceeds immediately instead of some thread W1 that was already waiting at the time of the notification. In this case, W1 would be released by the next notification (unless the same thing happened to it again).
If two processes attempt to notify at about the same time then one could observe a waiter count that reflected processes that had already been released, but had not yet completed their semaphore decrement. The likely outcome is that one or more future processes would pass through wait() without blocking.
Inasmuch as two of those are analogous to spurious return from a CV wait, they could be addressed via a similar predicate-checking idiom. That would be tricky given the fact that there is necessarily a gap between a process completing its semaphore decrement and it acquiring a mutex, but perhaps that could be worked around by using an atomic object, so that no mutex is required.
It is also possible that a process dies in notify() before incrementing can_proceed, such that that notification is ineffective. I don't account that a weakness specific to the suggested scheme.
Note that SysV semaphores do not themselves reside in or require shared memory.
I am trying to incorporate threads into my project, but have a problem where using merely 1 worker thread makes it "fall asleep" permanently. Perhaps I have a race condition, but just can't notice it.
My PeriodicThreads object maintains a collection of threads. Once PeriodicThreads::exec_threads() has been invoked, the threads are notified, are awaken and preform their task. Afterwards, they fall back to sleep.
Function of such a worker-thread:
void PeriodicThreads::threadWork(size_t threadId){
//not really used, but need to decalre to use conditional_variable:
std::mutex mutex;
std::unique_lock<std::mutex> lck(mutex);
while (true){
// wait until told to start working on a task:
while (_thread_shouldWork[threadId] == false){
_threads_startSignal.wait(lck);
}
thread_iteration(threadId); //virtual function
_thread_shouldWork[threadId] = false; //vector of flags
_thread_doneSignal.notify_all();
}//end while(true) - run until terminated externally or this whole obj is deleted
}
As you can see, each thread is monitoring its own entry in a vector of flags, and once it sees that it's flag is true - performs the task then resets its flag.
Here is the function that can awaken all the threads:
std::atomic_bool _threadsWorking =false;
//blocks the current thread until all worker threads have completed:
void PeriodicThreads::exec_threads(){
if(_threadsWorking ){
throw std::runtime_error("you requested exec_threads(), but threads haven't yet finished executing the previous task!");
}
_threadsWorking = true;//NOTICE: doing this after the exception check.
//tell all threads to unpause by setting their flags to 'true'
std::fill(_thread_shouldWork.begin(), _thread_shouldWork.end(), true);
_threads_startSignal.notify_all();
//wait for threads to complete:
std::mutex mutex;
std::unique_lock<std::mutex> lck(mutex); //lock & mutex are not really used.
auto isContinueWaiting = [&]()->bool{
bool threadsWorking = false;
for (size_t i=0; i<_thread_shouldWork.size(); ++i){
threadsWorking |= _thread_shouldWork[i];
}
return threadsWorking;
};
while (isContinueWaiting()){
_thread_doneSignal.wait(lck);
}
_threadsWorking = false;//set atomic to false
}
Invoking exec_threads() works fine for several hundred or in rare cases several thousand consecutive iterations. Invocations occur from the main thread's while loop. Its worker thread processes the task, resets its flag and goes back to sleep until the next exec_threads(), and so on.
However, some time after that, the program snaps into a "hibernation", and seems to pause, but doesn't crash.
During such a "hibernation" putting a breakpoint at any while-loop of my condition_variables never actualy causes that breakpoint to trigger.
Being sneaky, I've created my own verify-thread (parallel to main) and monitor my PeriodicThreads object. As it falls into hibernation, my verify-thread keeps outputting to the console me that no threads are currently running (the _threadsWorking atomic of PeriodicThreads is permanently set to false). However, during the other tests the atomic remains as true, once that "hibernation issue" begins.
The strange thing is that if I force the PeriodicThreads::run_thread to sleep for at least 10 microseconds before resetting its flag, things work as normal, and no "hibernation" occurs. Otherwise, if we allow thread to complete it's task very quickly it might cause this whole issue.
I've wrapped each condition_variable inside a while loop to prevent spurious wakes from triggering transition, and situation where notify_all is called before .wait() is called on it. Link
Notice, this occurs even when I have only 1 worker thread
What could be the cause?
Edit
Abandoning these vector flags and just testing on a single atomic_bool with 1 worker thread still shows the same issue.
All shared data should be protected by a mutex. The mutex should have (at least) the same scope as the shared data.
Your _thread_shouldWork container is shared data. You can make a global array of mutexes and each one can protect its own _thread_shouldWork element. (see note below). You should also have at least as many condition variables as you have mutexes. (You can use 1 mutex with several different condition variables, but you should not use several different mutexes with 1 condition variable.)
A condition_variable should protect an actual condition (in this case, the state of an individual element of _thread_shouldWork at any given point) and the mutex is used to protect the variables that encompass that condition.
If you're just using a random local mutex (as you are in your thread code) or just not using a mutex at all (in the main code), then all bets are off. It's undefined behavior. Although I could see it working (by luck) most of the time. What I suspect is happening is that a worker thread is missing the signal from the main thread. It could also be that your main thread is missing the signal from a worker thread. (Thread A reads the state and enters the while loop, then Thread B changes the state and sends the notification, then Thread A goes to sleep... waiting for a notification that was already sent)
Mutexes with local scope are a red flag!
Note: If you're using a vector, you have to watch out because adding or removing items can trigger a resize which will touch elements without grabbing the mutex first (because of course the vector doesn't know about your mutex).
You also have to watch out for false sharing when using arrays
Edit: Here's a video that #Kari found useful for explaining false sharing
https://www.youtube.com/watch?v=dznxqe1Uk3E
I’m reading up on pthread.h; the condition variable related functions (like pthread_cond_wait(3)) require a mutex as an argument. Why? As far as I can tell, I’m going to be creating a mutex just to use as that argument? What is that mutex supposed to do?
It's just the way that condition variables are (or were originally) implemented.
The mutex is used to protect the condition variable itself. That's why you need it locked before you do a wait.
The wait will "atomically" unlock the mutex, allowing others access to the condition variable (for signalling). Then when the condition variable is signalled or broadcast to, one or more of the threads on the waiting list will be woken up and the mutex will be magically locked again for that thread.
You typically see the following operation with condition variables, illustrating how they work. The following example is a worker thread which is given work via a signal to a condition variable.
thread:
initialise.
lock mutex.
while thread not told to stop working:
wait on condvar using mutex.
if work is available to be done:
do the work.
unlock mutex.
clean up.
exit thread.
The work is done within this loop provided that there is some available when the wait returns. When the thread has been flagged to stop doing work (usually by another thread setting the exit condition then kicking the condition variable to wake this thread up), the loop will exit, the mutex will be unlocked and this thread will exit.
The code above is a single-consumer model as the mutex remains locked while the work is being done. For a multi-consumer variation, you can use, as an example:
thread:
initialise.
lock mutex.
while thread not told to stop working:
wait on condvar using mutex.
if work is available to be done:
copy work to thread local storage.
unlock mutex.
do the work.
lock mutex.
unlock mutex.
clean up.
exit thread.
which allows other consumers to receive work while this one is doing work.
The condition variable relieves you of the burden of polling some condition instead allowing another thread to notify you when something needs to happen. Another thread can tell that thread that work is available as follows:
lock mutex.
flag work as available.
signal condition variable.
unlock mutex.
The vast majority of what are often erroneously called spurious wakeups was generally always because multiple threads had been signalled within their pthread_cond_wait call (broadcast), one would return with the mutex, do the work, then re-wait.
Then the second signalled thread could come out when there was no work to be done. So you had to have an extra variable indicating that work should be done (this was inherently mutex-protected with the condvar/mutex pair here - other threads needed to lock the mutex before changing it however).
It was technically possible for a thread to return from a condition wait without being kicked by another process (this is a genuine spurious wakeup) but, in all my many years working on pthreads, both in development/service of the code and as a user of them, I never once received one of these. Maybe that was just because HP had a decent implementation :-)
In any case, the same code that handled the erroneous case also handled genuine spurious wakeups as well since the work-available flag would not be set for those.
A condition variable is quite limited if you could only signal a condition, usually you need to handle some data that's related to to condition that was signalled. Signalling/wakeup have to be done atomically in regards to achieve that without introducing race conditions, or be overly complex
pthreads can also give you , for rather technical reasons, a spurious wakeup . That means you need to check a predicate, so you can be sure the condition actually was signalled - and distinguish that from a spurious wakeup. Checking such a condition in regards to waiting for it need to be guarded - so a condition variable needs a way to atomically wait/wake up while locking/unlocking a mutex guarding that condition.
Consider a simple example where you're notified that some data are produced. Maybe another thread made some data that you want, and set a pointer to that data.
Imagine a producer thread giving some data to another consumer thread through a 'some_data'
pointer.
while(1) {
pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
char *data = some_data;
some_data = NULL;
handle(data);
}
you'd naturally get a lot of race condition, what if the other thread did some_data = new_data right after you got woken up, but before you did data = some_data
You cannot really create your own mutex to guard this case either .e.g
while(1) {
pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
pthread_mutex_lock(&mutex);
char *data = some_data;
some_data = NULL;
pthread_mutex_unlock(&mutex);
handle(data);
}
Will not work, there's still a chance of a race condition in between waking up and grabbing the mutex. Placing the mutex before the pthread_cond_wait doesn't help you, as you will now
hold the mutex while waiting - i.e. the producer will never be able to grab the mutex.
(note, in this case you could create a second condition variable to signal the producer that you're done with some_data - though this will become complex, especially so if you want many producers/consumers.)
Thus you need a way to atomically release/grab the mutex when waiting/waking up from the condition. That's what pthread condition variables does, and here's what you'd do:
while(1) {
pthread_mutex_lock(&mutex);
while(some_data == NULL) { // predicate to acccount for spurious wakeups,would also
// make it robust if there were several consumers
pthread_cond_wait(&cond,&mutex); //atomically lock/unlock mutex
}
char *data = some_data;
some_data = NULL;
pthread_mutex_unlock(&mutex);
handle(data);
}
(the producer would naturally need to take the same precautions, always guarding 'some_data' with the same mutex, and making sure it doesn't overwrite some_data if some_data is currently != NULL)
POSIX condition variables are stateless. So it is your responsibility to maintain the state. Since the state will be accessed by both threads that wait and threads that tell other threads to stop waiting, it must be protected by a mutex. If you think you can use condition variables without a mutex, then you haven't grasped that condition variables are stateless.
Condition variables are built around a condition. Threads that wait on a condition variable are waiting for some condition. Threads that signal condition variables change that condition. For example, a thread might be waiting for some data to arrive. Some other thread might notice that the data has arrived. "The data has arrived" is the condition.
Here's the classic use of a condition variable, simplified:
while(1)
{
pthread_mutex_lock(&work_mutex);
while (work_queue_empty()) // wait for work
pthread_cond_wait(&work_cv, &work_mutex);
work = get_work_from_queue(); // get work
pthread_mutex_unlock(&work_mutex);
do_work(work); // do that work
}
See how the thread is waiting for work. The work is protected by a mutex. The wait releases the mutex so that another thread can give this thread some work. Here's how it would be signalled:
void AssignWork(WorkItem work)
{
pthread_mutex_lock(&work_mutex);
add_work_to_queue(work); // put work item on queue
pthread_cond_signal(&work_cv); // wake worker thread
pthread_mutex_unlock(&work_mutex);
}
Notice that you need the mutex to protect the work queue. Notice that the condition variable itself has no idea whether there's work or not. That is, a condition variable must be associated with a condition, that condition must be maintained by your code, and since it's shared among threads, it must be protected by a mutex.
Not all condition variable functions require a mutex: only the waiting operations do. The signal and broadcast operations do not require a mutex. A condition variable also is not permanently associated with a specific mutex; the external mutex does not protect the condition variable. If a condition variable has internal state, such as a queue of waiting threads, this must be protected by an internal lock inside the condition variable.
The wait operations bring together a condition variable and a mutex, because:
a thread has locked the mutex, evaluated some expression over shared variables and found it to be false, such that it needs to wait.
the thread must atomically move from owning the mutex, to waiting on the condition.
For this reason, the wait operation takes as arguments both the mutex and condition: so that it can manage the atomic transfer of a thread from owning the mutex to waiting, so that the thread does not fall victim to the lost wake up race condition.
A lost wakeup race condition will occur if a thread gives up a mutex, and then waits on a stateless synchronization object, but in a way which is not atomic: there exists a window of time when the thread no longer has the lock, and has not yet begun waiting on the object. During this window, another thread can come in, make the awaited condition true, signal the stateless synchronization and then disappear. The stateless object doesn't remember that it was signaled (it is stateless). So then the original thread goes to sleep on the stateless synchronization object, and does not wake up, even though the condition it needs has already become true: lost wakeup.
The condition variable wait functions avoid the lost wake up by making sure that the calling thread is registered to reliably catch the wakeup before it gives up the mutex. This would be impossible if the condition variable wait function did not take the mutex as an argument.
I do not find the other answers to be as concise and readable as this page. Normally the waiting code looks something like this:
mutex.lock()
while(!check())
condition.wait(mutex) # atomically unlocks mutex and sleeps. Calls
# mutex.lock() once the thread wakes up.
mutex.unlock()
There are three reasons to wrap the wait() in a mutex:
without a mutex another thread could signal() before the wait() and we'd miss this wake up.
normally check() is dependent on modification from another thread, so you need mutual exclusion on it anyway.
to ensure that the highest priority thread proceeds first (the queue for the mutex allows the scheduler to decide who goes next).
The third point is not always a concern - historical context is linked from the article to this conversation.
Spurious wake-ups are often mentioned with regard to this mechanism (i.e. the waiting thread is awoken without signal() being called). However, such events are handled by the looped check().
Condition variables are associated with a mutex because it is the only way it can avoid the race that it is designed to avoid.
// incorrect usage:
// thread 1:
while (notDone) {
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable
pthread_mutex_unlock(&mutex);
if (ready) {
doWork();
} else {
pthread_cond_wait(&cond1); // invalid syntax: this SHOULD have a mutex
}
}
// signalling thread
// thread 2:
prepareToRunThread1();
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cond1);
Now, lets look at a particularly nasty interleaving of these operations
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable;
pthread_mutex_unlock(&mutex);
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cond1);
if (ready) {
pthread_cond_wait(&cond1); // uh o!
At this point, there is no thread which is going to signal the condition variable, so thread1 will wait forever, even though the protectedReadyToRunVariable says it's ready to go!
The only way around this is for condition variables to atomically release the mutex while simultaneously starting to wait on the condition variable. This is why the cond_wait function requires a mutex
// correct usage:
// thread 1:
while (notDone) {
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable
if (ready) {
pthread_mutex_unlock(&mutex);
doWork();
} else {
pthread_cond_wait(&mutex, &cond1);
}
}
// signalling thread
// thread 2:
prepareToRunThread1();
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_cond_signal(&mutex, &cond1);
pthread_mutex_unlock(&mutex);
The mutex is supposed to be locked when you call pthread_cond_wait; when you call it it atomically both unlocks the mutex and then blocks on the condition. Once the condition is signaled it atomically locks it again and returns.
This allows the implementation of predictable scheduling if desired, in that the thread that would be doing the signalling can wait until the mutex is released to do its processing and then signal the condition.
It appears to be a specific design decision rather than a conceptual need.
Per the pthreads docs the reason that the mutex was not separated is that there is a significant performance improvement by combining them and they expect that because of common race conditions if you don't use a mutex, it's almost always going to be done anyway.
https://linux.die.net/man/3/pthread_cond_wait
Features of Mutexes and Condition Variables
It had been suggested that the mutex acquisition and release be
decoupled from condition wait. This was rejected because it is the
combined nature of the operation that, in fact, facilitates realtime
implementations. Those implementations can atomically move a
high-priority thread between the condition variable and the mutex in a
manner that is transparent to the caller. This can prevent extra
context switches and provide more deterministic acquisition of a mutex
when the waiting thread is signaled. Thus, fairness and priority
issues can be dealt with directly by the scheduling discipline.
Furthermore, the current condition wait operation matches existing
practice.
There are a tons of exegeses about that, yet I want to epitomize it with an example following.
1 void thr_child() {
2 done = 1;
3 pthread_cond_signal(&c);
4 }
5 void thr_parent() {
6 if (done == 0)
7 pthread_cond_wait(&c);
8 }
What's wrong with the code snippet? Just ponder somewhat before going ahead.
The issue is genuinely subtle. If the parent invokes
thr_parent() and then vets the value of done, it will see that it is 0 and
thus try to go to sleep. But just before it calls wait to go to sleep, the parent
is interrupted between lines of 6-7, and the child runs. The child changes the state variable
done to 1 and signals, but no thread is waiting and thus no thread is
woken. When the parent runs again, it sleeps forever, which is really egregious.
What if they are carried out while acquired locks individually?
I made an exercice in class if you want a real example of condition variable :
#include "stdio.h"
#include "stdlib.h"
#include "pthread.h"
#include "unistd.h"
int compteur = 0;
pthread_cond_t varCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex_compteur;
void attenteSeuil(arg)
{
pthread_mutex_lock(&mutex_compteur);
while(compteur < 10)
{
printf("Compteur : %d<10 so i am waiting...\n", compteur);
pthread_cond_wait(&varCond, &mutex_compteur);
}
printf("I waited nicely and now the compteur = %d\n", compteur);
pthread_mutex_unlock(&mutex_compteur);
pthread_exit(NULL);
}
void incrementCompteur(arg)
{
while(1)
{
pthread_mutex_lock(&mutex_compteur);
if(compteur == 10)
{
printf("Compteur = 10\n");
pthread_cond_signal(&varCond);
pthread_mutex_unlock(&mutex_compteur);
pthread_exit(NULL);
}
else
{
printf("Compteur ++\n");
compteur++;
}
pthread_mutex_unlock(&mutex_compteur);
}
}
int main(int argc, char const *argv[])
{
int i;
pthread_t threads[2];
pthread_mutex_init(&mutex_compteur, NULL);
pthread_create(&threads[0], NULL, incrementCompteur, NULL);
pthread_create(&threads[1], NULL, attenteSeuil, NULL);
pthread_exit(NULL);
}
I’m reading up on pthread.h; the condition variable related functions (like pthread_cond_wait(3)) require a mutex as an argument. Why? As far as I can tell, I’m going to be creating a mutex just to use as that argument? What is that mutex supposed to do?
It's just the way that condition variables are (or were originally) implemented.
The mutex is used to protect the condition variable itself. That's why you need it locked before you do a wait.
The wait will "atomically" unlock the mutex, allowing others access to the condition variable (for signalling). Then when the condition variable is signalled or broadcast to, one or more of the threads on the waiting list will be woken up and the mutex will be magically locked again for that thread.
You typically see the following operation with condition variables, illustrating how they work. The following example is a worker thread which is given work via a signal to a condition variable.
thread:
initialise.
lock mutex.
while thread not told to stop working:
wait on condvar using mutex.
if work is available to be done:
do the work.
unlock mutex.
clean up.
exit thread.
The work is done within this loop provided that there is some available when the wait returns. When the thread has been flagged to stop doing work (usually by another thread setting the exit condition then kicking the condition variable to wake this thread up), the loop will exit, the mutex will be unlocked and this thread will exit.
The code above is a single-consumer model as the mutex remains locked while the work is being done. For a multi-consumer variation, you can use, as an example:
thread:
initialise.
lock mutex.
while thread not told to stop working:
wait on condvar using mutex.
if work is available to be done:
copy work to thread local storage.
unlock mutex.
do the work.
lock mutex.
unlock mutex.
clean up.
exit thread.
which allows other consumers to receive work while this one is doing work.
The condition variable relieves you of the burden of polling some condition instead allowing another thread to notify you when something needs to happen. Another thread can tell that thread that work is available as follows:
lock mutex.
flag work as available.
signal condition variable.
unlock mutex.
The vast majority of what are often erroneously called spurious wakeups was generally always because multiple threads had been signalled within their pthread_cond_wait call (broadcast), one would return with the mutex, do the work, then re-wait.
Then the second signalled thread could come out when there was no work to be done. So you had to have an extra variable indicating that work should be done (this was inherently mutex-protected with the condvar/mutex pair here - other threads needed to lock the mutex before changing it however).
It was technically possible for a thread to return from a condition wait without being kicked by another process (this is a genuine spurious wakeup) but, in all my many years working on pthreads, both in development/service of the code and as a user of them, I never once received one of these. Maybe that was just because HP had a decent implementation :-)
In any case, the same code that handled the erroneous case also handled genuine spurious wakeups as well since the work-available flag would not be set for those.
A condition variable is quite limited if you could only signal a condition, usually you need to handle some data that's related to to condition that was signalled. Signalling/wakeup have to be done atomically in regards to achieve that without introducing race conditions, or be overly complex
pthreads can also give you , for rather technical reasons, a spurious wakeup . That means you need to check a predicate, so you can be sure the condition actually was signalled - and distinguish that from a spurious wakeup. Checking such a condition in regards to waiting for it need to be guarded - so a condition variable needs a way to atomically wait/wake up while locking/unlocking a mutex guarding that condition.
Consider a simple example where you're notified that some data are produced. Maybe another thread made some data that you want, and set a pointer to that data.
Imagine a producer thread giving some data to another consumer thread through a 'some_data'
pointer.
while(1) {
pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
char *data = some_data;
some_data = NULL;
handle(data);
}
you'd naturally get a lot of race condition, what if the other thread did some_data = new_data right after you got woken up, but before you did data = some_data
You cannot really create your own mutex to guard this case either .e.g
while(1) {
pthread_cond_wait(&cond); //imagine cond_wait did not have a mutex
pthread_mutex_lock(&mutex);
char *data = some_data;
some_data = NULL;
pthread_mutex_unlock(&mutex);
handle(data);
}
Will not work, there's still a chance of a race condition in between waking up and grabbing the mutex. Placing the mutex before the pthread_cond_wait doesn't help you, as you will now
hold the mutex while waiting - i.e. the producer will never be able to grab the mutex.
(note, in this case you could create a second condition variable to signal the producer that you're done with some_data - though this will become complex, especially so if you want many producers/consumers.)
Thus you need a way to atomically release/grab the mutex when waiting/waking up from the condition. That's what pthread condition variables does, and here's what you'd do:
while(1) {
pthread_mutex_lock(&mutex);
while(some_data == NULL) { // predicate to acccount for spurious wakeups,would also
// make it robust if there were several consumers
pthread_cond_wait(&cond,&mutex); //atomically lock/unlock mutex
}
char *data = some_data;
some_data = NULL;
pthread_mutex_unlock(&mutex);
handle(data);
}
(the producer would naturally need to take the same precautions, always guarding 'some_data' with the same mutex, and making sure it doesn't overwrite some_data if some_data is currently != NULL)
POSIX condition variables are stateless. So it is your responsibility to maintain the state. Since the state will be accessed by both threads that wait and threads that tell other threads to stop waiting, it must be protected by a mutex. If you think you can use condition variables without a mutex, then you haven't grasped that condition variables are stateless.
Condition variables are built around a condition. Threads that wait on a condition variable are waiting for some condition. Threads that signal condition variables change that condition. For example, a thread might be waiting for some data to arrive. Some other thread might notice that the data has arrived. "The data has arrived" is the condition.
Here's the classic use of a condition variable, simplified:
while(1)
{
pthread_mutex_lock(&work_mutex);
while (work_queue_empty()) // wait for work
pthread_cond_wait(&work_cv, &work_mutex);
work = get_work_from_queue(); // get work
pthread_mutex_unlock(&work_mutex);
do_work(work); // do that work
}
See how the thread is waiting for work. The work is protected by a mutex. The wait releases the mutex so that another thread can give this thread some work. Here's how it would be signalled:
void AssignWork(WorkItem work)
{
pthread_mutex_lock(&work_mutex);
add_work_to_queue(work); // put work item on queue
pthread_cond_signal(&work_cv); // wake worker thread
pthread_mutex_unlock(&work_mutex);
}
Notice that you need the mutex to protect the work queue. Notice that the condition variable itself has no idea whether there's work or not. That is, a condition variable must be associated with a condition, that condition must be maintained by your code, and since it's shared among threads, it must be protected by a mutex.
Not all condition variable functions require a mutex: only the waiting operations do. The signal and broadcast operations do not require a mutex. A condition variable also is not permanently associated with a specific mutex; the external mutex does not protect the condition variable. If a condition variable has internal state, such as a queue of waiting threads, this must be protected by an internal lock inside the condition variable.
The wait operations bring together a condition variable and a mutex, because:
a thread has locked the mutex, evaluated some expression over shared variables and found it to be false, such that it needs to wait.
the thread must atomically move from owning the mutex, to waiting on the condition.
For this reason, the wait operation takes as arguments both the mutex and condition: so that it can manage the atomic transfer of a thread from owning the mutex to waiting, so that the thread does not fall victim to the lost wake up race condition.
A lost wakeup race condition will occur if a thread gives up a mutex, and then waits on a stateless synchronization object, but in a way which is not atomic: there exists a window of time when the thread no longer has the lock, and has not yet begun waiting on the object. During this window, another thread can come in, make the awaited condition true, signal the stateless synchronization and then disappear. The stateless object doesn't remember that it was signaled (it is stateless). So then the original thread goes to sleep on the stateless synchronization object, and does not wake up, even though the condition it needs has already become true: lost wakeup.
The condition variable wait functions avoid the lost wake up by making sure that the calling thread is registered to reliably catch the wakeup before it gives up the mutex. This would be impossible if the condition variable wait function did not take the mutex as an argument.
I do not find the other answers to be as concise and readable as this page. Normally the waiting code looks something like this:
mutex.lock()
while(!check())
condition.wait(mutex) # atomically unlocks mutex and sleeps. Calls
# mutex.lock() once the thread wakes up.
mutex.unlock()
There are three reasons to wrap the wait() in a mutex:
without a mutex another thread could signal() before the wait() and we'd miss this wake up.
normally check() is dependent on modification from another thread, so you need mutual exclusion on it anyway.
to ensure that the highest priority thread proceeds first (the queue for the mutex allows the scheduler to decide who goes next).
The third point is not always a concern - historical context is linked from the article to this conversation.
Spurious wake-ups are often mentioned with regard to this mechanism (i.e. the waiting thread is awoken without signal() being called). However, such events are handled by the looped check().
Condition variables are associated with a mutex because it is the only way it can avoid the race that it is designed to avoid.
// incorrect usage:
// thread 1:
while (notDone) {
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable
pthread_mutex_unlock(&mutex);
if (ready) {
doWork();
} else {
pthread_cond_wait(&cond1); // invalid syntax: this SHOULD have a mutex
}
}
// signalling thread
// thread 2:
prepareToRunThread1();
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cond1);
Now, lets look at a particularly nasty interleaving of these operations
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable;
pthread_mutex_unlock(&mutex);
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_mutex_unlock(&mutex);
pthread_cond_signal(&cond1);
if (ready) {
pthread_cond_wait(&cond1); // uh o!
At this point, there is no thread which is going to signal the condition variable, so thread1 will wait forever, even though the protectedReadyToRunVariable says it's ready to go!
The only way around this is for condition variables to atomically release the mutex while simultaneously starting to wait on the condition variable. This is why the cond_wait function requires a mutex
// correct usage:
// thread 1:
while (notDone) {
pthread_mutex_lock(&mutex);
bool ready = protectedReadyToRunVariable
if (ready) {
pthread_mutex_unlock(&mutex);
doWork();
} else {
pthread_cond_wait(&mutex, &cond1);
}
}
// signalling thread
// thread 2:
prepareToRunThread1();
pthread_mutex_lock(&mutex);
protectedReadyToRuNVariable = true;
pthread_cond_signal(&mutex, &cond1);
pthread_mutex_unlock(&mutex);
The mutex is supposed to be locked when you call pthread_cond_wait; when you call it it atomically both unlocks the mutex and then blocks on the condition. Once the condition is signaled it atomically locks it again and returns.
This allows the implementation of predictable scheduling if desired, in that the thread that would be doing the signalling can wait until the mutex is released to do its processing and then signal the condition.
It appears to be a specific design decision rather than a conceptual need.
Per the pthreads docs the reason that the mutex was not separated is that there is a significant performance improvement by combining them and they expect that because of common race conditions if you don't use a mutex, it's almost always going to be done anyway.
https://linux.die.net/man/3/pthread_cond_wait
Features of Mutexes and Condition Variables
It had been suggested that the mutex acquisition and release be
decoupled from condition wait. This was rejected because it is the
combined nature of the operation that, in fact, facilitates realtime
implementations. Those implementations can atomically move a
high-priority thread between the condition variable and the mutex in a
manner that is transparent to the caller. This can prevent extra
context switches and provide more deterministic acquisition of a mutex
when the waiting thread is signaled. Thus, fairness and priority
issues can be dealt with directly by the scheduling discipline.
Furthermore, the current condition wait operation matches existing
practice.
There are a tons of exegeses about that, yet I want to epitomize it with an example following.
1 void thr_child() {
2 done = 1;
3 pthread_cond_signal(&c);
4 }
5 void thr_parent() {
6 if (done == 0)
7 pthread_cond_wait(&c);
8 }
What's wrong with the code snippet? Just ponder somewhat before going ahead.
The issue is genuinely subtle. If the parent invokes
thr_parent() and then vets the value of done, it will see that it is 0 and
thus try to go to sleep. But just before it calls wait to go to sleep, the parent
is interrupted between lines of 6-7, and the child runs. The child changes the state variable
done to 1 and signals, but no thread is waiting and thus no thread is
woken. When the parent runs again, it sleeps forever, which is really egregious.
What if they are carried out while acquired locks individually?
I made an exercice in class if you want a real example of condition variable :
#include "stdio.h"
#include "stdlib.h"
#include "pthread.h"
#include "unistd.h"
int compteur = 0;
pthread_cond_t varCond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex_compteur;
void attenteSeuil(arg)
{
pthread_mutex_lock(&mutex_compteur);
while(compteur < 10)
{
printf("Compteur : %d<10 so i am waiting...\n", compteur);
pthread_cond_wait(&varCond, &mutex_compteur);
}
printf("I waited nicely and now the compteur = %d\n", compteur);
pthread_mutex_unlock(&mutex_compteur);
pthread_exit(NULL);
}
void incrementCompteur(arg)
{
while(1)
{
pthread_mutex_lock(&mutex_compteur);
if(compteur == 10)
{
printf("Compteur = 10\n");
pthread_cond_signal(&varCond);
pthread_mutex_unlock(&mutex_compteur);
pthread_exit(NULL);
}
else
{
printf("Compteur ++\n");
compteur++;
}
pthread_mutex_unlock(&mutex_compteur);
}
}
int main(int argc, char const *argv[])
{
int i;
pthread_t threads[2];
pthread_mutex_init(&mutex_compteur, NULL);
pthread_create(&threads[0], NULL, incrementCompteur, NULL);
pthread_create(&threads[1], NULL, attenteSeuil, NULL);
pthread_exit(NULL);
}
I have 10 threads that are supposed to be waiting for signal.
Until now I've simply done 'sleep(3)', and that has been working fine, but is there are a more secure way to make sure, that all threads have been created and are indeed waiting.
I made the following construction where I in critical region, before the wait, increment a counter telling how many threads are waiting. But then I have to have an additional mutex and conditional for signalling back to the main that all threads are created, it seems overly complex.
Am I missing some basic thread design pattern?
Thanks
edit: fixed types
edit: clarifying information below
A barrier won't work in this case, because I'm not interested in letting my threads wait until all threads are ready. This already happens with the 'cond_wait'.
I'm interested in letting the main function know, when all threads are ready and waiting.
//mutex and conditional to signal from main to threads to do work
mutex_t mutex_for_cond;
condt_t cond;
//mutex and conditional to signal back from thread to main that threads are ready
mutex_t mutex_for_back_cond;
condt_t back_cond;
int nThreads=0;//threadsafe by using mutex_for_cond
void *thread(){
mutex_lock(mutex_for_cond);
nThreads++;
if(nThreads==10){
mutex_lock(mutex_for_back_cond)
cond_signal(back_cond);
mutex_unlock(mutex_for_back_cond)
}while(1){
cond_wait(cond,mutext_for_cond);
if(spurious)
continue;
else
break;
}
mutex_unlock(mutex_for_cond);
//do work on non critical region data
}
int main(){
for(int i=0;i<10)
create_threads;
while(1){
mutex_lock(mutex_for_back_cond);
cond_wait(back_cond,mutex_for_back_cond);
mutex_unlock(mutex_for_back_cond);
mutex_lock(mutex_for_cond);
if(nThreads==10){
break;
}else{
//spurious wakeup
mutex_unlock(mutex_for_cond);
}
}
//now all threads are waiting
//mutex_for_cond is still locked so broadcast
cond_broadcast(cond);//was type here
}
Am I missing some basic thread design pattern?
Yes. For every condition, there should be a variable that is protected by the accompanying mutex. Only the change of this variable is indicated by signals on the condition variable.
You check the variable in a loop, waiting on the condition:
mutex_lock(mutex_for_back_cond);
while ( ready_threads < 10 )
cond_wait(back_cond,mutex_for_back_cond);
mutex_unlock( mutex_for_back_cond );
Additionally, what you are trying to build is a thread barrier. It is often pre-implemented in threading libraries, like pthread_barrier_wait.
Sensible threading APIs have a barrier construct which does precisely this.
For example, with boost::thread, you would create a barrier like this:
boost::barrier bar(10); // a barrier for 10 threads
and then each thread would wait on the barrier:
bar.wait();
the barrier waits until the specified number of threads are waiting for it, and then releases them all at once. In other words, once all ten threads have been created and are ready, it'll allow them all to proceed.
That's the simple, and sane, way of doing it. Threading APIs which do not have a barrier construct require you to do it the hard way, not unlike what you're doing now.
You should associate some variable that contains the 'event state' with the condition variable. The main thread sets the event state variable appropriately just before issuing the broadcast. The threads that are interested in the event check the event state variable regardless of whether they've blocked on the condition variable or not.
With this pattern, the main thread doesn't need to know about the precise state of the threads - it just sets the event when it needs to then broadcasts the condition. Any waiting threads will be unblocked, and any threads not waiting yet will never block on the condition variable because they'll note that the event has already occurred before waiting on the condition. Something like the following pseudocode:
//mutex and conditional to signal from main to threads to do work
pthread_mutex_t mutex_for_cond;
pthread_cond_t cond;
int event_occurred = 0;
void *thread()
{
pthread_mutex_lock(&mutex_for_cond);
while (!event_occurred) {
pthread_cond_wait( &cond, &mutex_for_cond);
}
pthread_mutex_unlock(&mutex_for_cond);
//do work on non critical region data
}
int main()
{
pthread_mutex_init(&mutex_for_cond, ...);
pthread_cond_init(&cond, ...);
for(int i=0;i<10)
create_threads(...);
// do whatever needs to done to set up the work for the threads
// now let the threads know they can do their work (whether or not
// they've gotten to the "wait point" yet)
pthread_mutex_lock(&mutex_for_cond);
event_occured = 1;
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex_for_cond);
}