Destruction of condition variable randomly loses notification - c++

Given a condition_variable as a member of a class, my understanding is that:
1. The condition variable is destroyed after the class destructor completes.
2. Destruction of a condition variable does not need to wait for notifications to have been received.
In light of these expectations, my question is: why does the example code below randomly fail to notify a waiting thread?
#include <mutex>
#include <condition_variable>
#include <thread>

#define NOTIFY_IN_DESTRUCTOR

struct notify_on_delete {
    std::condition_variable cv;
    ~notify_on_delete() {
#ifdef NOTIFY_IN_DESTRUCTOR
        cv.notify_all();
#endif
    }
};

int main() {
    for (int trial = 0; trial < 10000; ++trial) {
        notify_on_delete* nod = new notify_on_delete();
        std::mutex flag;
        bool kill = false;
        std::thread run([nod, &flag, &kill]() {
            std::unique_lock<std::mutex> lock(flag);
            kill = true;
            nod->cv.wait(lock);
        });
        while (true) {
            std::unique_lock<std::mutex> lock(flag);
            if (!kill) continue;
#ifdef NOTIFY_IN_DESTRUCTOR
            delete nod;
#else
            nod->cv.notify_all();
#endif
            break;
        }
        run.join();
#ifndef NOTIFY_IN_DESTRUCTOR
        delete nod;
#endif
    }
    return 0;
}
In the code above, if NOTIFY_IN_DESTRUCTOR is not defined then the test will run to completion reliably. However, when NOTIFY_IN_DESTRUCTOR is defined the test will randomly hang (usually after a few thousand trials).
I am compiling using Apple Clang:
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
C++14 specified, compiled with DEBUG flags set.
EDIT:
To clarify: this question is about the semantics of the specified behavior of instances of condition_variable. The second point above appears to be reinforced in the following quote:
Requires: There shall be no thread blocked on *this. [ Note: That is, all threads shall have been notified; they may subsequently block on the lock specified in the wait. This relaxes the usual rules, which would have required all wait calls to happen before destruction. Only the notification to unblock the wait needs to happen before destruction. The user should take care to ensure that no threads wait on *this once the destructor has been started, especially when the waiting threads are calling the wait functions in a loop or using the overloads of wait, wait_for, or wait_until that take a predicate. — end note ]
The core semantic question seems to be what "blocked on" means. My present interpretation of the quote above would be that after the line
cv.notify_all(); // defined NOTIFY_IN_DESTRUCTOR
in ~notify_on_delete(), the spawned thread is no longer "blocked on" nod->cv - which is to say that I presently understand that after this call "the notification to unblock the wait" has occurred, so according to the quote the requirement has been met to proceed with the destruction of the condition_variable instance.
Can someone provide a clarification of "blocked on" or "notification to unblock" to the effect that in the code above, the call to notify_all() does not satisfy the requirements of ~condition_variable()?

When NOTIFY_IN_DESTRUCTOR is defined:
Calling notify_one()/notify_all() doesn't mean that the waiting thread is immediately woken up, nor that the notifying thread waits for it to wake. It just means that if the waiting thread wakes up at some point after the current thread has called notify, it should proceed. So in essence, you might be deleting the condition variable before the waiting thread wakes up (depending on how the threads are scheduled).
The explanation for why it hangs, even if the condition variable is deleted while the other thread is waiting on it, lies in the fact that the wait/notify operations are implemented using queues associated with the condition variables. These queues hold the threads waiting on the condition variables. Freeing the condition variable means getting rid of these thread queues.
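A minimal sketch of a teardown order that avoids this race -- essentially the question's working #ifndef NOTIFY_IN_DESTRUCTOR branch, shown standalone: notify while the condition_variable is guaranteed alive, join the waiter, and only then destroy.

#include <condition_variable>
#include <mutex>
#include <thread>

int main() {
    auto* cv = new std::condition_variable;
    std::mutex m;
    bool waiting = false;
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        waiting = true;
        cv->wait(lock);          // may also wake spuriously; harmless here
    });
    while (true) {
        std::unique_lock<std::mutex> lock(m);
        if (!waiting) continue;  // waiter has not reached wait() yet
        cv->notify_all();        // we hold m, so the waiter is blocked in wait()
        break;
    }
    waiter.join();               // waiter has fully returned from wait()
    delete cv;                   // no thread can touch *cv any more
    return 0;
}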

I am pretty sure your vendor's implementation is broken. Your program looks almost OK from the perspective of obeying the contract with the cv/mutex classes. I couldn't verify it 100%; I am one version behind.
The notion of “blocking” is confusing in the condition_variable (CV) class because there are multiple things to be blocking on. The contract requires the implementation to be more complex than a veneer on pthread_cond* (for example). My reading of it indicates that a single CV would require at least 2 pthread_cond_t’s to implement.
The crux is that the destructor has defined behaviour even while threads are still waiting upon a CV (provided they have been notified); its ruin is a race between CV.wait and ~CV. The naive implementation simply has ~CV broadcast the condvar then eliminate it, and has CV.wait remember the lock in a local variable, so that when it awakens from the runtime notion of blocking it no longer has to reference the object. In that implementation, ~CV becomes a "fire and forget" mechanism.
Sadly, a racing CV.wait could meet the preconditions yet not be finished interacting with the object when ~CV sneaks in and destroys it. To resolve the race, CV.wait and ~CV need to exclude each other, so the CV requires at least a private mutex.
We aren't finished yet. There usually isn't underlying support [e.g. in the kernel] for an operation like "wait on this cv controlled by this lock, and release that other lock once I am blocked". I think even the posix folks found that too funny to require. So burying a mutex in my CV isn't enough; I actually require a mechanism that permits me to process events within it, thus a private condvar is required inside the implementation of CV. Obligatory David Parnas meme.
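To make the race concrete, here is a hedged sketch of the naive design described above, built directly on pthreads (naive_cv and its layout are my illustration, not any vendor's source):

#include <pthread.h>

class naive_cv {
    pthread_cond_t c_;
public:
    naive_cv() { pthread_cond_init(&c_, nullptr); }
    void wait(pthread_mutex_t* m) {
        // Between entering this call and actually blocking in the kernel, the
        // thread is still interacting with *this without being formally
        // "blocked" -- this is the window that ~naive_cv() cannot see.
        pthread_cond_wait(&c_, m);
    }
    void notify_all() { pthread_cond_broadcast(&c_); }
    ~naive_cv() {
        pthread_cond_broadcast(&c_);  // "fire and forget"...
        pthread_cond_destroy(&c_);    // ...then destroy, racing any late waiter
    }
};

Closing that window is exactly what the private mutex (and, for the internal handshake, the private condvar) described above is for.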
Almost OK, because as Marek R points out, you are relying on referencing a class after its destruction has begun; not the cv/mutex class, but your notify_on_delete class. The conflict is a bit academic. I doubt clang would depend upon nod remaining valid after control had transferred to nod->cv.wait(); but the real customers of most compiler vendors are benchmarks, not programmers.
As a general note, multi-threaded programming is difficult, and having now peeked at the C++ threading model, it might be best to give it a decade or two to settle down. Its contracts are astonishing. When I first looked at your program, I thought 'duh, there is no way you can destroy a cv that can be accessed, because RAII'. Silly me.
Pthreads is another awful API for threading. At least it doesn't attempt such over-reach, and is mature enough that robust test suites keep vendors in line.

Related

libc++ implementation of std::condition_variable_any

Condition variables should have a single order with respect to notify() and unlock_sleep() operations, where unlock_sleep() is an imaginary function used within wait() in which the mutex is unlocked and the thread goes to sleep as one atomic sequence of operations. To achieve this with arbitrary lockables, std::condition_variable_any implementations typically use another mutex internally (to ensure atomicity and to sleep on).
If the internal unlock_sleep() and notify() (notify_one() or notify_all()) operations are not atomic with respect to each other, you risk a thread unlocking the mutex, another thread signaling, and then the original thread going to sleep and never waking up.
I was reading the libstdc++ and libc++ implementations of std::condition_variable_any and noticed this code in the libc++ implementation
{lock_guard<mutex> __lx(*__mut_);}
__cv_.notify_one();
the internal mutex is locked and then immediately unlocked before the signal operation. Doesn't this risk the problem I described above?
libstdc++ seems to have gotten this right
The C++11 and later standards explicitly say "The execution of notify_one and notify_all shall be atomic". So in one sense I think that you are correct that the internal mutex should be held across the call down to the platform's underlying condition variable notify call (for example pthread_cond_signal())
However, I don't think that the libc++ implementation will cause notifications to be missed. Unless the notifying thread synchronizes on the lock that the waiting thread passes to wait() (or the two threads synchronize in some other way) while calling notify_one() (or notify_all()), there is no way to ensure which of the two threads is 'first' to the notify or the wait anyway. So if the notification can be missed in libc++'s current implementation, it could also be missed if libc++ were changed to hold the internal lock across its call down to the platform's notify API.
So I think that libc++ can invoke the "as if" rule to say that the implementation of notify_one()/notify_all() is "atomic enough".
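For illustration, here is the usage contract that argument relies on: the notifier updates the shared state under the same mutex the waiter uses, and the waiter re-checks a predicate, so a notification that races with the wait cannot be lost in any observable way (a generic sketch, not libc++ internals):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable_any cv;
bool ready = false;

void waiter() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return ready; });  // predicate re-checked on every wakeup
}

void notifier() {
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;  // the waiter either sees this before sleeping...
    }
    cv.notify_one();   // ...or is already blocked and receives the notification
}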

Safely Destroying a Thread Pool

Consider the following implementation of a trivial thread pool written in C++14.
threadpool.h
threadpool.cpp
Observe that each thread sleeps until it's been notified to awaken -- or receives a spurious wakeup -- and the following predicate evaluates to true:
std::unique_lock<mutex> lock(this->instance_mutex_);
this->cond_handle_task_.wait(lock, [this] {
    return (this->destroy_ || !this->tasks_.empty());
});
Furthermore, observe that a ThreadPool object uses the data member destroy_ to determine whether it's being destroyed -- that is, whether the destructor has been called. Toggling this data member to true notifies each worker thread that it's time to finish its current task and any of the other queued tasks, then synchronize with the thread that's destroying this object; it also prohibits the enqueue member function.
For your convenience, the implementation of the destructor is below:
ThreadPool::~ThreadPool() {
    {
        std::lock_guard<mutex> lock(this->instance_mutex_); // this line.
        this->destroy_ = true;
    }
    this->cond_handle_task_.notify_all();
    for (auto &worker : this->workers_) {
        worker.join();
    }
}
Q: I do not understand why it's necessary to lock the object's mutex while toggling destroy_ to true in the destructor. Furthermore, is it only necessary for setting its value or is it also necessary for accessing its value?
BQ: Can this thread pool implementation be improved or optimized while maintaining its original purpose: a thread pool that can pool N threads and distribute tasks to them to be executed concurrently?
This thread pool implementation is forked from Jakob Progsch's C++11 thread pool repository with a thorough code step through to understand the purpose behind its implementation and some subjective style changes.
I am introducing myself to concurrent programming and there is still much to learn -- I am a novice concurrent programmer as it stands right now. If my questions are not worded correctly then please make the appropriate correction(s) in your provided answer. Moreover, if the answer can be geared towards a client who is being introduced to concurrent programming for the first time then that would be best -- for myself and any other novices as well.
If the owning thread of the ThreadPool object is the only thread that atomically writes to the destroy_ variable, and the worker threads only atomically read from the destroy_ variable, then no, a mutex is not needed to protect the destroy_ variable in the ThreadPool destructor. Typically a mutex is necessary when an atomic set of operations must take place that can't be accomplished through a single atomic instruction on a platform (i.e., operations beyond an atomic swap, etc.). That being said, the author of the thread pool may be trying to force some type of acquire semantics on the destroy_ variable without resorting to atomic operations (i.e. a memory fence operation), and/or the setting of the flag itself is not considered an atomic operation (platform dependent)... Some other options include declaring the variable as volatile to prevent it from being cached, etc. You can see this thread for more info.
Without some sort of synchronization operation in place, the worst case scenario could end up with a worker that won't complete due to the destroy_ variable being cached on a thread. On platforms with weaker memory ordering models, that's always a possibility if you allowed a benign memory race condition to exist ...
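If the intent really is acquire/release visibility for the flag without a mutex, the portable way to spell it is std::atomic rather than volatile (a sketch; note this alone removes only the data race on the flag, not the lost-wakeup hazard of the condition-variable wait, since a notify could still slip between a worker's predicate check and its block):

#include <atomic>

std::atomic<bool> destroy_{false};

void request_destroy() {
    destroy_.store(true, std::memory_order_release);  // publish to workers
}

bool should_destroy() {
    return destroy_.load(std::memory_order_acquire);  // never reads a stale cached value
}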
C++ defines a data race as multiple threads potentially accessing an object simultaneously, with at least one of those accesses being a write. Programs with data races have undefined behavior. If you were to write to destroy_ in your destructor without holding the mutex, your program would have undefined behavior and we cannot predict what would happen.
If you were to read destroy_ elsewhere without holding the mutex, that read could potentially happen while the destructor is writing to it, which is also a data race.
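Beyond the formal data race, holding the mutex in the destructor also rules out a lost wakeup, because the worker's predicate check and its transition to "blocked" are atomic with respect to instance_mutex_. A sketch reusing the question's member names:

#include <condition_variable>
#include <mutex>

std::mutex instance_mutex_;
std::condition_variable cond_handle_task_;
bool destroy_ = false;

void worker_wait() {
    std::unique_lock<std::mutex> lock(instance_mutex_);
    // While lock is held, no other thread can both set destroy_ and notify,
    // so the notification cannot fall between this check and the block.
    cond_handle_task_.wait(lock, [] { return destroy_; });
}

void request_destroy() {
    {
        std::lock_guard<std::mutex> lock(instance_mutex_);
        destroy_ = true;  // cannot interleave with the worker's predicate check
    }
    cond_handle_task_.notify_all();
}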

Is there a race condition in the `latch` sample in N3600?

Proposed for inclusion in C++14 (aka C++1y) are some new thread synchronization primitives: latches and barriers. The proposal is
N3600: C++ Latches and Barriers
N3666: C++ Latches and Barriers, revised
It sounds like a good idea and the samples make it look very programmer-friendly. Unfortunately, I think the sample code invokes undefined behavior. The proposal says of latch::~latch():
Destroys the latch. If the latch is destroyed while other threads are in wait(), or are invoking count_down(), the behaviour is undefined.
Note that it says "in wait()" and not "blocked in wait()", as the description of count_down() uses.
Then the following sample is provided:
An example of the second use case is shown below. We need to load data and then process it using a number of threads. Loading the data is I/O bound, whereas starting threads and creating data structures is CPU bound. By running these in parallel, throughput can be increased.
void DoWork()
{
    latch start_latch(1);
    vector<thread*> workers;
    for (int i = 0; i < NTHREADS; ++i) {
        workers.push_back(new thread([&] {
            // Initialize data structures. This is CPU bound.
            ...
            start_latch.wait();
            // perform work
            ...
        }));
    }
    // Load input data. This is I/O bound.
    ...
    // Threads can now start processing
    start_latch.count_down();
}
Isn't there a race condition between the threads waking and returning from wait(), and destruction of the latch when it leaves scope? Besides that, all the thread objects are leaked. If the scheduler doesn't run all worker threads before count_down returns and the start_latch object leaves scope, then I think undefined behavior will result. Presumably the fix is to iterate the vector and join() and delete all the worker threads after count_down but before returning.
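A sketch of that presumed fix, keeping the proposal's elided sections as they are and adding only the joining loop at the end:

void DoWork()
{
    latch start_latch(1);
    vector<thread*> workers;
    for (int i = 0; i < NTHREADS; ++i) {
        workers.push_back(new thread([&] {
            // Initialize data structures. This is CPU bound.
            ...
            start_latch.wait();
            // perform work
            ...
        }));
    }
    // Load input data. This is I/O bound.
    ...
    // Threads can now start processing
    start_latch.count_down();
    // Fix: no thread may still be in wait() when start_latch is destroyed.
    for (thread* t : workers) {
        t->join();
        delete t;
    }
}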
Is there a problem with the sample code?
Do you agree that a proposal should show a complete correct example, even if the task is extremely simple, in order for reviewers to see what the user experience will be like?
Note: It appears possible that one or more of the worker threads haven't yet begun to wait, and will therefore call wait() on a destroyed latch.
Update: There's now a new version of the proposal, but the representative example is unchanged.
Thanks for pointing this out. Yes, I think that the sample code (which, in its defense, was intended to be concise) is broken. It should probably wait for the threads to finish.
Any implementation that allows threads to be blocked in wait() is almost certainly going to involve some kind of condition variable, and destroying the latch while a thread has not yet exited wait() is potentially undefined.
I don't know if there's time to update the paper, but I can make sure that the next version is fixed.
Alasdair

Need to mutex-protect (atomic) assignment sought by condition variable?

I understand how to use condition variables (crummy name for this construct, IMO, as the cv object neither is a variable nor indicates a condition). So I have a pair of threads, canonically set up with Boost.Thread as:
bool awake = false;
boost::mutex sync;
boost::condition_variable cv;

void thread1()
{
    boost::unique_lock<boost::mutex> lock1(sync);
    while (!awake)
        cv.wait(lock1);
    lock1.unlock(); // this line actually not canonical, but why not?
    // proceed...
}

void thread2()
{
    //...
    boost::unique_lock<boost::mutex> lock2(sync);
    awake = true;
    lock2.unlock();
    cv.notify_all();
}
My question is: does thread2 really need to be protecting the assignment to awake? It seems to me the notify_all() call should be sufficient. If the data being manipulated and checked against were more than a simple "ok to proceed" flag, I see the value in the mutex, but here it seems like overkill.
A secondary question is that asked in the code fragment: Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
EDIT: Maybe my question is really: Is there a cleaner construct than a CV to implement this kind of wait?
does thread2 really need to be protecting the assignment to awake?
Yes. Modifying an object from one thread and accessing it from another without synchronisation gives undefined behaviour. Even if it's just a bool.
For example, on some multiprocessor systems the write might only affect local memory; without an explicit synchronisation operation, other threads might never see the change.
Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
If you unlocked the mutex before clearing the flag, then you might miss another signal.
Is there a cleaner construct than a CV to implement this kind of wait?
In Boost and the standard C++ library, no; a condition variable is flexible enough to handle arbitrary shared state and not particularly over-complicated for this simple case, so there's no particular need for anything simpler.
More generally, you could use a semaphore or a pipe to send a simple signal between threads.
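For this particular one-shot signal, the closest thing to a "cleaner construct" you can build from the same primitives is a small event (binary-semaphore-like) wrapper that hides the flag, the mutex and the CV behind set()/wait() (illustrative; not a Boost or standard API):

#include <boost/thread.hpp>

class event {
    boost::mutex m_;
    boost::condition_variable cv_;
    bool signaled_ = false;
public:
    void set() {
        {
            boost::lock_guard<boost::mutex> lock(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void wait() {
        boost::unique_lock<boost::mutex> lock(m_);
        while (!signaled_)   // loop handles spurious wakeups
            cv_.wait(lock);
    }
};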
Formally, you definitely need the lock in both threads: if any thread modifies an object, and more than one thread accesses it, then all accesses must be synchronized.

In practice, you'll probably get away with it without the lock; it's almost certain that notify_all will issue the necessary fence or membar instructions to ensure that the memory is properly synchronized. But why take the risk?

As to the absence of the unlock, that's the whole point of the scoped locking pattern: the unlock is in the destructor of the object, so that the mutex will be unlocked even if an exception passes through.
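A two-line illustration of that pattern (may_throw() is a hypothetical stand-in for the "process data" step):

#include <boost/thread.hpp>

boost::mutex m;
void may_throw();  // hypothetical: any operation that might throw

void process()
{
    boost::unique_lock<boost::mutex> lock(m);
    may_throw();
    // ~unique_lock releases m here on normal return *and* during stack
    // unwinding if may_throw() throws.
}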

Modelling boost::Lockable with semaphore rather than mutex (previously titled: Unlocking a mutex from a different thread)

I'm using the C++ boost::thread library, which in my case means I'm using pthreads. Officially, a mutex must be unlocked from the same thread that locked it, and I want the effect of being able to lock in one thread and then unlock in another. There are many ways to accomplish this. One possibility would be to write a new mutex class which allows this behavior.
For example:
class inter_thread_mutex {
    bool locked = false;
    boost::mutex mx;
    boost::condition_variable cv;
public:
    void lock() {
        boost::unique_lock<boost::mutex> lck(mx);
        while (locked) cv.wait(lck);
        locked = true;
    }
    void unlock() {
        {
            boost::lock_guard<boost::mutex> lck(mx);
            if (!locked) error();
            locked = false;
        }
        cv.notify_one();
    }
    // bool try_lock(); void error(); etc.
};
I should point out that the above code doesn't guarantee FIFO access, since if one thread calls lock() while another calls unlock(), this first thread may acquire the lock ahead of other threads which are waiting. (Come to think of it, the boost::thread documentation doesn't appear to make any explicit scheduling guarantees for either mutexes or condition variables). But let's just ignore that (and any other bugs) for now.
My question is, if I decide to go this route, would I be able to use such a mutex as a model for the boost Lockable concept. For example, would anything go wrong if I use a boost::unique_lock<inter_thread_mutex> for RAII-style access, and then pass this lock to boost::condition_variable_any::wait(), etc.
On one hand I don't see why not. On the other hand, "I don't see why not" is usually a very bad way of determining whether something will work.
The reason I ask is that if it turns out that I have to write wrapper classes for RAII locks and condition variables and whatever else, then I'd rather just find some other way to achieve the same effect.
EDIT:
The kind of behavior I want is basically as follows. I have an object, and it needs to be locked whenever it is modified. I want to lock the object from one thread, and do some work on it. Then I want to keep the object locked while I tell another worker thread to complete the work. So the first thread can go on and do something else while the worker thread finishes up. When the worker thread gets done, it unlocks the mutex.
And I want the transition to be seamless, so nobody else can get the mutex lock in between when thread 1 starts the work and thread 2 completes it.
Something like inter_thread_mutex seems like it would work, and it would also allow the program to interact with it as if it were an ordinary mutex. So it seems like a clean solution. If there's a better solution, I'd be happy to hear that also.
EDIT AGAIN:
The reason I need locks to begin with is that there are multiple master threads, and the locks are there to prevent them from accessing shared objects concurrently in invalid ways.
So the code already uses loop-level lock-free sequencing of operations at the master thread level. Also, in the original implementation, there were no worker threads, and the mutexes were ordinary kosher mutexes.
The inter_thread_thingy came up as an optimization, primarily to improve response time. In many cases, it was sufficient to guarantee that the "first part" of operation A occurs before the "first part" of operation B. As a dumb example, say I punch object 1 and give it a black eye. Then I tell object 1 to change its internal structure to reflect all the tissue damage. I don't want to wait around for the tissue damage before I move on to punch object 2. However, I do want the tissue damage to occur as part of the same operation; for example, in the interim, I don't want any other thread to reconfigure the object in such a way that would make tissue damage an invalid operation. (Yes, this example is imperfect in many ways, and no, I'm not working on a game.)
So we made the change to a model where ownership of an object can be passed to a worker thread to complete an operation, and it actually works quite nicely; each master thread is able to get a lot more operations done because it doesn't need to wait for them all to complete. And, since the event sequencing at the master thread level is still loop-based, it is easy to write high-level master-thread operations, as they can be based on the assumption that an operation is complete (more precisely, the critical "first part" upon which the sequencing logic depends is complete) when the corresponding function call returns.
Finally, I thought it would be nice to use inter_thread mutex/semaphore thingies using RAII with boost locks to encapsulate the necessary synchronization that is required to make the whole thing work.
man pthread_mutex_unlock (this is on OS X; similar wording on Linux) has the answer:
NAME
     pthread_mutex_unlock -- unlock a mutex

SYNOPSIS
     #include <pthread.h>

     int
     pthread_mutex_unlock(pthread_mutex_t *mutex);

DESCRIPTION
     If the current thread holds the lock on mutex, then the
     pthread_mutex_unlock() function unlocks mutex.

     Calling pthread_mutex_unlock() with a mutex that the calling
     thread does not hold will result in undefined behavior.

     ...
My counter-question would be - what kind of synchronization problem are you trying to solve with this? Most probably there is an easier solution.
Neither pthreads nor boost::thread (built on top of it) guarantee any order in which a contended mutex is acquired by competing threads.
Sorry, but I don't understand: what will be the state of your mutex at line [1] in the following code if another thread can unlock it?
inter_thread_mutex m;
{
    m.lock();
    // [1]
    m.unlock();
}
This makes no sense.
There's a few ways to approach this. Both of the ones I'm going to suggest involve adding an additional piece of information to the object, rather than adding a mechanism to unlock a mutex from a thread other than the one that owns it.
1) you can add some information to indicate the object's state:
enum modification_state {
    consistent,       // ready to be examined or to start being modified
    phase1_complete,  // ready for the second thread to finish the work
};
// first worker thread
lock();
do_init_work(object);
object.mod_state = phase1_complete;
unlock();
signal();
do_other_stuff();

// second worker thread
lock();
while (object.mod_state != phase1_complete)
    wait();
do_final_work(obj);
object.mod_state = consistent;
unlock();
signal();

// some other thread that needs to read the data
lock();
while (object.mod_state != consistent)
    wait();
read_data(obj);
unlock();
Works just fine with condition variables, because obviously you're not writing your own lock.
2) If you have a specific thread in mind, you can give the object an owner.
// first worker
lock();
while (obj.owner != this_thread()) wait();
do_initial_work(obj);
obj.owner = second_thread_id;
unlock();
signal();
...
...
This is pretty much the same solution as my first solution, but more flexible in the adding/removing of phases, and less flexible in the adding/removing of threads.
To be honest, I'm not sure how inter thread mutex would help you here. You'd still need a semaphore or condition variable to signal the passing of the work to the second thread.
Small modification to what you already have: how about storing the id of the thread which you want to take the lock, in your inter_thread_whatever? Then unlock it, and send a message to that thread, saying "I want you execute whatever routine it is that tries to take this lock".
Then the condition in lock becomes while(locked || (desired_locker != thisthread && desired_locker != 0)). Technically you've "released the lock" in the first thread, and "taken it again" in the second thread, but there's no way that any other thread can grab it in between, so it's as if you've transferred it directly from one to the other.
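A sketch of that modification, reusing the question's members and adding the hypothetical desired_locker field (my names, not an established API):

#include <boost/thread.hpp>

class inter_thread_mutex {
    bool locked = false;
    boost::thread::id desired_locker;  // default-constructed id means "nobody"
    boost::mutex mx;
    boost::condition_variable cv;
public:
    void lock() {
        boost::unique_lock<boost::mutex> lck(mx);
        while (locked || (desired_locker != boost::this_thread::get_id() &&
                          desired_locker != boost::thread::id()))
            cv.wait(lck);
        locked = true;
        desired_locker = boost::thread::id();  // reservation consumed
    }
    // Unlock, but reserve the mutex for one specific thread.
    void unlock_to(boost::thread::id target) {
        {
            boost::lock_guard<boost::mutex> lck(mx);
            locked = false;
            desired_locker = target;
        }
        cv.notify_all();
    }
};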
There's a potential problem, that if a thread exits or is killed, while it's the desired locker of your lock, then that thread deadlocks. But you were already talking about the first thread waiting for a message from the second thread to say that it has successfully acquired the lock, so presumably you already have a plan in mind for what happens if that message is never received. To that plan, add "reset the desired_locker field on the inter_thread_whatever".
This is all very hairy, though, I'm not convinced that what I've proposed is correct. Is there a way that the "master" thread (the one that's directing all these helpers) can just make sure that it doesn't order any more operations to be performed on whatever is protected by this lock, until the first op is completed (or fails and some RAII thing notifies you)? You don't need locks as such, if you can deal with it at the level of the message loop.
I don't think it is a good idea to say that your inter_thread_mutex (binary_semaphore) can be seen as a model of Lockable. The main issue is that the main feature of your inter_thread_mutex defeats the Lockable concept. If inter_thread_mutex were a model of Lockable, you would expect at [1] that the inter_thread_mutex m is locked.
// thread T1
inter_thread_mutex m;
{
    unique_lock<inter_thread_mutex> lk(m);
    // [1]
}
But as another thread T2 can call m.unlock() while T1 is at [1], the guarantee is broken.
Binary semaphores can be used as Lockables as long as each thread locks before unlocking. But the main goal of your class is exactly the contrary.
This is one of the reason semaphores in Boost.Interprocess don't use lock/unlock to name the functions, but wait/notify. Curiously these are the same names used by conditions :)
A mutex is a mechanism for describing mutually exclusive blocks of code. It does not make sense for these blocks of code to cross thread boundaries. Trying to use such a concept in such a counterintuitive way can only lead to problems down the line.
It sounds very much like you're looking for a different multi-threading concept, but without more detail it's hard to know what.