Possible race condition in std::condition_variable? - c++

I've looked into the VC++ implementation of std::condition_variable::wait(lock, pred). Basically, it looks like this:
template<class _Predicate>
void wait(unique_lock<mutex>& _Lck, _Predicate _Pred)
{   // wait for signal and test predicate
    while (!_Pred())
        wait(_Lck);
}
Basically, the naked wait calls _Cnd_waitX, which calls _Cnd_wait, which calls do_wait, which calls cond->_get_cv()->wait(cs); (all of these are in the file cond.c).
cond->_get_cv() returns Concurrency::details::stl_condition_variable_interface .
If we go to the file primitives.h, we see that under Windows 7 and above, we have the class stl_condition_variable_win7, which contains the good old Win32 CONDITION_VARIABLE, and wait calls __crtSleepConditionVariableSRW.
Doing a bit of assembly debugging, __crtSleepConditionVariableSRW just extracts the SleepConditionVariableSRW function pointer and calls it.
Here's the thing: as far as I know, the Win32 CONDITION_VARIABLE is not a kernel object, but a user-mode one. Therefore, if some thread notifies this variable and no thread is actually sleeping on it, the notification is lost, and the thread will remain sleeping until the timeout is reached or some other thread notifies it. A small program can actually prove it: if you miss the point of notification, your thread will remain sleeping although some other thread notified it.
My question goes like this:
One thread waits on a condition variable and the predicate returns false. Then, the whole chain of calls explained above takes place. In the meantime, another thread changes the environment so the predicate will return true, and notifies the condition variable. We passed the predicate check in the original thread, but we still haven't got into SleepConditionVariableSRW - the call chain is very long.
So, although we notified the condition variable and the predicate put on the condition variable will definitely return true (because the notifier made it so), we are still blocking on the condition variable, possibly forever.
Is this how it should behave? It seems like a big ugly race condition waiting to happen. If you notify a condition variable and its predicate returns true, the thread should unblock. But if we're in the limbo between checking the predicate and going to sleep, we are blocked forever. std::condition_variable::wait is not an atomic function.
What does the standard say about it, and is it really a race condition?

You've violated the contract so all bets are off. See: http://en.cppreference.com/w/cpp/thread/condition_variable
TLDR: It's impossible for the predicate to change by someone else while you're holding the mutex.
You're supposed to change the underlying variable of the predicate while holding a mutex and you have to acquire that mutex before calling std::condition_variable::wait (both because wait releases the mutex, and because that's the contract).
In the scenario you described the change happened after the while (!_Pred()) saw that the predicate doesn't hold but before wait(_Lck) had a chance to release the mutex. This means that you changed the thing the predicate checks without holding the mutex. You have violated the rules and a race condition or an infinite wait are still not the worst kinds of UB you can get. At least these are local and related to the rules you violated so you can find the error...
If you play by the rules, either:
1. The waiter takes hold of the mutex first.
2. The waiter goes into std::condition_variable::wait. (Recall the notifier still waits on the mutex.)
3. The waiter checks the predicate and sees that it doesn't hold. (Recall the notifier still waits on the mutex.)
4. The waiter calls some implementation-defined magic to release the mutex and wait, and only now may the notifier proceed.
5. The notifier finally manages to take the mutex.
6. The notifier changes whatever needs to change for the predicate to hold true.
7. The notifier calls std::condition_variable::notify_one.
or:
1. The notifier acquires the mutex. (Recall that the waiter is blocked on trying to acquire the mutex.)
2. The notifier changes whatever needs to change for the predicate to hold true. (Recall that the waiter is still blocked.)
3. The notifier releases the mutex. (Somewhere along the way the notifier will call std::condition_variable::notify_one, but once the mutex is released...)
4. The waiter acquires the mutex.
5. The waiter calls std::condition_variable::wait.
6. The waiter checks while (!_Pred()) and voilà! The predicate is true.
7. The waiter doesn't even go into the internal wait, so whether or not the notifier has managed to call std::condition_variable::notify_one yet is irrelevant.
That's the rationale behind the requirement on cppreference.com:
Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
Note that this is a general rule for condition variables rather than a special requirement of std::condition_variable (it applies equally to Windows CONDITION_VARIABLEs, POSIX pthread_cond_ts, etc.).
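As a minimal sketch of that contract (the Signal struct and the function names are hypothetical and not taken from the question), the waiter and the notifier could look like this. The point it illustrates is only that the state read by the predicate is written while holding the same mutex the waiter holds when it checks it:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared state; every name here is illustrative.
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the state the predicate inspects
};

// Waiter: takes the mutex first, then lets wait() check the predicate.
void wait_until_ready(Signal& s) {
    std::unique_lock<std::mutex> lock(s.m);
    // Equivalent to: while (!s.ready) s.cv.wait(lock);
    s.cv.wait(lock, [&s] { return s.ready; });
}

// Notifier: changes the predicate's state only while holding the mutex.
void make_ready(Signal& s) {
    {
        std::lock_guard<std::mutex> lock(s.m);
        s.ready = true;
    }
    s.cv.notify_one();  // notifying after the unlock is allowed
}

// Runs one waiter and one notifier; returns after the waiter was released.
bool run_handshake() {
    Signal s;
    std::thread waiter(wait_until_ready, std::ref(s));
    make_ready(s);
    waiter.join();
    return s.ready;
}
```

Whichever thread gets the mutex first, run_handshake() returns: either the waiter is already inside wait() when the flag changes, or it finds the flag set before it ever sleeps.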
Recall that the wait overload that takes a predicate is just a convenience function so that the caller doesn't have to deal with spurious wakeups. The standard (§30.5.1/15) explicitly says that this overload is equivalent to the while loop in Microsoft's implementation:
Effects: Equivalent to:
while (!pred())
wait(lock);
Does the simple wait work? Do you test the predicate before and after calling wait? Great. You're doing the same. Or are you questioning void std::condition_variable::wait( std::unique_lock<std::mutex>& lock ); too?
Windows Critical Sections and Slim Reader/Writer Locks being user-mode facilities rather than kernel objects is immaterial and irrelevant to the question. There are alternative implementations. If you're interested to know how Windows manages to atomically release a CS/SRWL and enter a wait state (what naive pre-Vista user-mode implementations with Mutexes and Events did wrong) that's a different question.

Related

C++. std::condition_variable and multiple wait-threads

I have a class with a queue of std::function<void()> as a member, and methods Push and Pop.
I want to implement an additional method, PushAndWaitUntilExecuted. It is easy when you have one consumer thread (which calls Pop) and one producer thread (which calls Push) - a simple std::condition_variable will be enough.
But my application has a dynamic number of threads, which can execute the same lines of code, calling PushAndWaitUntilExecuted in parallel and waiting until the consumer thread executes the pushed std::function object.
I have an idea: pass std::pair<uint64_t, std::function<void()>> to the queue instead of just std::function<void()>, where the uint64_t is the producer thread ID (boost::this_thread::get_id()). Then the consumer thread calls std::condition_variable::notify_all(), and all threads check whether the executed std::function has the same ID as the thread or not.
Is this an OK solution, or can something better be implemented?
More than just a condition variable needs to be introduced here, in order to avoid several different race conditions. A mutex and a job completion flag are also required.
At this point, it becomes cleaner to replace your std::function<void()> with a small class that contains this closure, as well as all the additional baggage:
struct job {
    std::function<void()> implementation;
    std::mutex m;
    std::condition_variable flag;
    bool completed=false;
};
Your queue becomes a queue of std::shared_ptr<job>s, instead of a queue of std::functions, with the jobs constructed in dynamic scope (since, of course, mutexes and condition variables are not copyable or movable, and these objects get accessed from both of your threads).
After your worker thread finishes executing the implementation, it:
locks the mutex.
sets completed to true
signals the condition variable.
And your PushAndWaitUntilExecuted, after it executes the push:
locks the mutex
waits on the condition variable, until completed is set
You must thoroughly understand that C++ gives you absolutely no guarantees, whatsoever, that after you push a new closure into your job queue, some thread doesn't immediately grab it, execute it, and finish it, before the original thread (the one that pushed it) gets around to looking at the condition variable. By now, nobody will be signaling the condition variable any more. If all you have to work with is just a condition variable here, you'll be waiting for the condition variable to get signaled until our sun explodes.
Which is why you need more than just a condition variable: you also need a mutex and an explicit flag, used as in the above approach, to correctly handle interthread sequencing.
This is a fairly classical, routine approach. You should find examples of many similar implementations in every good C++ textbook on this subject matter.
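A rough sketch of how the approach described above might be put together follows. The JobQueue class, the PopAndExecute method, and the run_jobs helper are assumptions made for illustration; only the per-job mutex, condition variable, and completed flag come from the answer.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Job {
    std::function<void()> implementation;
    std::mutex m;
    std::condition_variable done_cv;
    bool completed = false;
};

// Hypothetical queue class; the original only names Push, Pop and
// PushAndWaitUntilExecuted.
class JobQueue {
public:
    void PushAndWaitUntilExecuted(std::function<void()> f) {
        auto job = std::make_shared<Job>();
        job->implementation = std::move(f);
        {
            std::lock_guard<std::mutex> lock(queue_m_);
            queue_.push_back(job);
        }
        queue_cv_.notify_one();
        // The flag makes this safe even if the job already finished.
        std::unique_lock<std::mutex> lock(job->m);
        job->done_cv.wait(lock, [&job] { return job->completed; });
    }

    // Consumer side: blocks for one job, runs it, then marks it completed.
    void PopAndExecute() {
        std::shared_ptr<Job> job;
        {
            std::unique_lock<std::mutex> lock(queue_m_);
            queue_cv_.wait(lock, [this] { return !queue_.empty(); });
            job = std::move(queue_.front());
            queue_.pop_front();
        }
        job->implementation();
        {
            std::lock_guard<std::mutex> lock(job->m);
            job->completed = true;
        }
        job->done_cv.notify_one();  // this thread's shared_ptr keeps the job alive
    }

private:
    std::mutex queue_m_;
    std::condition_variable queue_cv_;
    std::deque<std::shared_ptr<Job>> queue_;
};

// Several producers, one consumer; returns how many jobs were executed.
int run_jobs(int producers) {
    JobQueue q;
    std::atomic<int> executed{0};
    std::thread consumer([&] {
        for (int i = 0; i < producers; ++i) q.PopAndExecute();
    });
    std::vector<std::thread> threads;
    for (int i = 0; i < producers; ++i)
        threads.emplace_back([&] { q.PushAndWaitUntilExecuted([&] { ++executed; }); });
    for (auto& t : threads) t.join();
    consumer.join();
    return executed;
}
```

Because every job carries its own flag and condition variable, each producer is woken only for its own job, so no thread IDs or notify_all broadcasts are needed.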

How does std::notify_all_at_thread_exit work?

According to cppref:
std::notify_all_at_thread_exit provides a mechanism to notify other
threads that a given thread has completely finished, including
destroying all thread_local objects.
I know the exact semantics of std::notify_all_at_thread_exit. What makes me puzzled is:
How to register a callback function that will be called after a given thread has finished and destroyed all of its thread-local objects?
std::notify_all_at_thread_exit takes a condition variable in its first parameter, by reference. When the thread exits, it will call notify_all on that condition variable, waking up threads that are waiting for the condition variable to be notified.
There doesn't appear to be a direct way to truly register a callback for this; you'll likely need to have a thread waiting for the condition variable to be notified (using the same lock as the one passed into std::notify_all_at_thread_exit). When the CV is notified, the thread that's waiting should verify that the wakeup isn't spurious, and then execute the code that should be run.
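One way this could be arranged is sketched below. The ExitSignal struct, the finished flag, and run_after_thread_exit are hypothetical names; the only standard facility used is std::notify_all_at_thread_exit itself. The sketch joins the worker at the end purely so that the condition variable, which lives on the stack here, cannot be destroyed before the exiting thread has notified it.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper state shared by the worker and the waiting thread.
struct ExitSignal {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;  // guards against spurious wakeups
};

void worker(ExitSignal& s) {
    std::unique_lock<std::mutex> lock(s.m);
    s.finished = true;
    // Ownership of the lock is handed over: it is released, and the CV is
    // notified, only after this thread's thread_local objects are destroyed.
    std::notify_all_at_thread_exit(s.cv, std::move(lock));
}

// Runs `callback` on the calling thread once the worker has finished.
void run_after_thread_exit(const std::function<void()>& callback) {
    ExitSignal s;
    std::thread t(worker, std::ref(s));
    {
        // Must use the same mutex that was passed to notify_all_at_thread_exit.
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&s] { return s.finished; });
    }
    callback();  // the "callback": ordinary code run by the waiting thread
    t.join();    // keeps `s` alive until the worker is completely gone
}
```

Since the worker holds the mutex until it exits, the waiter cannot observe finished == true before the thread-local destructors have run.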
More info about how this is implemented:
At least in LLVM's libc++ (libcxx), std::notify_all_at_thread_exit calls __thread_struct_imp::notify_all_at_thread_exit, which stores a pair with the parameters in a vector (_Notify). Upon thread death, the destructor of __thread_struct_imp iterates over this vector and notifies all of the condition variables that have been registered in this way.
Meanwhile, GNU libstdc++ uses a similar approach: a notifier object is created, it's registered with __at_thread_exit, it's designed to call its destructor when run at thread exit, and the destructor actually performs the notification process. I'd need to investigate __at_thread_exit more closely, as I don't understand its inner workings fully just yet.

Why does a condition variable need a lock (and therefore also a mutex) [duplicate]

This question already has answers here:
Why do pthreads’ condition variable functions require a mutex?
Condition variables are one of the aspects of C++11 I'm still struggling with a bit. From what I have gathered, a condition variable is very similar to a semaphore.
But then again, a semaphore wouldn't need a lock to function. A condition variable does. And a lock in turn needs a mutex. So in order to use the fairly simple functionality of a semaphore, we now need to manage not only a condition variable, but also a mutex and a lock.
So why does a condition variable need this? And what added functionality is provided by adding this requirement?
Condition variables are generally used to signal a change of state. A mutex is usually needed to make that change, and the following signal, atomic.
A semaphore encapsulates some state (a flag or counter) along with the signalling mechanism. A condition variable is more primitive, only providing the signal.
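To make the distinction concrete, here is a small sketch of a counting semaphore assembled from a mutex, a condition variable, and a counter. The class name and its members are invented for illustration; the counter is the state that the condition variable alone does not have.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical semaphore built from the three pieces discussed above.
class CountingSemaphore {
public:
    explicit CountingSemaphore(int initial) : count_(initial) {}

    // Records the signal in the counter, so it is not lost if nobody waits.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    // Sleeps until the counter is positive, then consumes one unit.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    int available() {
        std::lock_guard<std::mutex> lock(m_);
        return count_;
    }

private:
    std::mutex m_;                // makes "change state, then signal" atomic
    std::condition_variable cv_;  // only the signalling mechanism
    int count_;                   // the state a bare CV lacks
};
```

A release() that happens before any acquire() is remembered in count_, which is exactly what a bare notify_one() on a condition variable would not do.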
In general, once you've signaled that something has changed (via a condition variable), you need some code to run to handle that change, and that code has to safely read the changed data. If you didn't have a lock associated with the CV, then your thread waiting on the CV might wake up, try (and fail) to acquire the lock associated with the data, and therefore have to yield again. With a CV/lock combo, the underlying system can wake your thread up only if the thread can acquire the relevant lock as a unit, and so be more efficient.
It's unlikely a CV on its own is useful, as it gives no data beyond the fact that it was signaled. If you imagine uses of a CV, such as a thread-safe linked list with producers and consumers, you have variables representing {list, cv, lock}. In this case you take the lock, mutate the list, release the lock, then signal the CV. On your consumer thread you'll very likely need to take the lock once signaled to act on the list, so having the lock acquired once you wake up from the CV being signaled is a good thing.
Look at something like events on Windows (::CreateEvent), which are CVs without the implicit lock; a lot of the time they'll have a lock associated with them, just not built into the actual usage.
Although this isn't the original reason condition variables in pthreads were created with a lock (they used the lock to protect the CV itself, which is no longer needed in C++), the reason for and usefulness of locks with CVs has migrated to what's in this answer.

C++/BOOST: Does condition_variable::wait( ) / notify( ) ensure ordering of threads waiting?

I need to know if there is a way to "queue up" threads that wait on a condition variable so that they are awoken in the correct order...without writing a bunch of queueing code, that is.
In most systems, the following reversal of the producer/consumer model (with blocking on full mailbox) may not ensure ordering:
unique_lock lock1(mutex), lock2(mutex)
ConditionVariable cv
Code Block A: (called by multiple threads)
lock(lock1)
timestampOnEntry = now()
cv.wait(lock1) // Don't worry about spurious notifies, out of scope.
somethingRequiringMonotonicOrderOfTimestamps(timestampOnEntry)
unlock(lock1)
Code Block B: (called by a single thread, typically within a loop)
lock(lock2)
somethingVeryVerySlow()
(1) unlock(lock2) // the ordering here is not a mistake
(2) cv.notify_one(lock2) // prevents needless reblocking in code block A
Note that lines (1) and (2) are in the given order. This prevents an unnecessary second block on guard in code block A, should the notified thread wake up before guard is unlocked by the thread in code block B.
The question is: if multiple threads are "blocked" on wait, will *notify_one* wake them up in the order in which they blocked? Probably not (as in Java). If not by default, is there a way to specify that?
This could of course be done with a bunch of queuing code, but I'd prefer to use a pre-canned BOOST methodology, regardless of how complicated the contents of the can are. Of course, should I convert *cv.notify_one(guard)* into *cv.notify_all(guard)*, I would be required to do the queueing code, regardless.
No such guarantees are given by the standard; notify_one may wake any thread that is currently waiting (§30.5.1):
void notify_one() noexcept;
Effects: If any threads are blocked waiting for *this, unblocks one of those threads.
The only way to ensure that a specific thread reacts to the event is to wake all threads and then have some additional synchronization mechanism that sends all but the correct thread back to sleep.
This is a fundamental limitation due to the requirements that the platform has to fulfill: Usually condition variables are implemented in a way that the waiting threads are put into a suspended state and will not get scheduled by the system again until a notify occurs. A scheduler implementation is not required to provide the functionality for selecting a specific thread for waking up (and many actually don't).
So this part of the logic inevitably has to be handled by user code, which in turn means you have to wake up all threads to make it work, because this is the only way to ensure that the correct thread will get woken at all.
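One possible shape for that user-side logic is a ticket scheme, sketched below. FifoGate, run_in_turn, admit_one, and run_gate are hypothetical names, and this is only one of several ways to do it: every waiter takes a ticket on entry, the notifier wakes all waiters, and each waiter goes back to sleep unless its ticket is next.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical gate that releases waiters strictly in ticket order.
class FifoGate {
public:
    // Takes a ticket, sleeps until that ticket has been admitted and every
    // earlier ticket has finished, then runs work(ticket) under the mutex.
    void run_in_turn(const std::function<void(unsigned)>& work) {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned ticket = next_ticket_++;
        cv_.wait(lock, [&] { return ticket == completed_ && completed_ < admitted_; });
        work(ticket);
        ++completed_;
        lock.unlock();
        cv_.notify_all();  // lets the next admitted ticket re-check its turn
    }

    // Admits one more ticket, oldest first.
    void admit_one() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++admitted_;
        }
        cv_.notify_all();  // all wake up; all but the right one sleep again
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned next_ticket_ = 0;  // next ticket to hand out
    unsigned admitted_ = 0;     // how many tickets may run so far
    unsigned completed_ = 0;    // how many tickets have finished
};

// Starts n waiters, admits them one by one, and returns the order in which
// their tickets ran.
std::vector<unsigned> run_gate(unsigned n) {
    FifoGate gate;
    std::vector<unsigned> order;  // only touched while the gate mutex is held
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&] {
            gate.run_in_turn([&](unsigned t) { order.push_back(t); });
        });
    for (unsigned i = 0; i < n; ++i) gate.admit_one();
    for (auto& t : threads) t.join();
    return order;
}
```

The cost is the one the answer describes: every admit_one() wakes every waiter, and all but one of them immediately block again.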
The short answer, as you seem to have suspected, is no. Which thread (or threads) notify_one is going to rouse is not necessarily guaranteed.
That said, I'm not sure what to make of your example code. Specifically, passing a mutex to notify_one doesn't make sense to me (I am unaware of any condition variable implementations on any platform that signal/broadcast that way). I don't know your use case--perhaps you must have a lot of thread local data that prevents arranging your application state in such a way that any thread can pick up the necessary data to do the next task? My first reaction to that would be to refactor the code to care less about which particular OS thread does which work and focus more on the ordering of the work itself.

Cheapest way to wake up multiple waiting threads without blocking

I use boost::thread to manage threads. In my program I have a pool of threads (workers) that are sometimes activated to do some job simultaneously.
Currently I use boost::condition_variable: all threads are waiting inside a boost::condition_variable::wait() call, each on its own condition_variable object.
Can I AVOID using mutexes in the classic scheme when I work with condition variables? I want to wake up threads, but I don't need to pass any data to them, so I don't need a mutex to be locked/unlocked during the awakening process. Why should I spend CPU on this (but yes, I should remember about spurious wakeups)?
The boost::condition_variable::wait() call tries to REACQUIRE the locking object when the CV receives the notification. But I don't need this exact facility.
What is the cheapest way to wake several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the block tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
1. Acquire the mutex.
2. Copy the value of the state integer.
3. Block on the condition variable, releasing the mutex.
4. If the state integer is the same as it was when you copied it, go to step 3.
5. Release the mutex.
To unblock all threads, do this:
1. Acquire the mutex.
2. Increment the state integer.
3. Broadcast the condition variable.
4. Release the mutex.
Notice how step 4 of the blocking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
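The blocking and unblocking steps above could be sketched as follows. WakeAll and run_wake_all are hypothetical names, and the waiting_ counter is an addition that exists only so the demo can tell when every worker has really started to block; it is not part of the pattern itself.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around the mutex + CV + state integer pattern.
class WakeAll {
public:
    // Blocks until broadcast() has been called after this call started.
    void block() {
        std::unique_lock<std::mutex> lock(m_);       // acquire the mutex
        const unsigned long seen = generation_;      // copy the state integer
        ++waiting_;
        // Block, and keep blocking while the state integer is unchanged.
        cv_.wait(lock, [&] { return generation_ != seen; });
        --waiting_;
    }                                                // release the mutex

    void broadcast() {
        std::lock_guard<std::mutex> lock(m_);
        ++generation_;      // increment the state integer
        cv_.notify_all();   // broadcast the condition variable
    }

    // Demo-only helper: number of threads currently inside block().
    int waiting() {
        std::lock_guard<std::mutex> lock(m_);
        return waiting_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned long generation_ = 0;
    int waiting_ = 0;
};

// Starts n workers, waits until all are blocked, wakes them with a single
// broadcast, and returns how many woke up.
int run_wake_all(int n) {
    WakeAll gate;
    std::atomic<int> woken{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&] { gate.block(); ++woken; });
    while (gate.waiting() < n) std::this_thread::yield();
    gate.broadcast();
    for (auto& t : workers) t.join();
    return woken;
}
```

The predicate form of wait() performs the "compare and go back to sleep" loop, so a spurious wakeup simply re-checks the unchanged integer and blocks again.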
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wakeup, are they going to figure out what to do? Well, finding out that there's no work for them to do and finding out what work they do need to do will require some lock since it's shared state, right? So there almost always is already a lock you're holding when you decide to block and need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
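For illustration only, the barrier behaviour described above (N threads call wait, all are released together, and one of them is told it is the serial thread) can be sketched with standard C++11 primitives. SimpleBarrier and run_barrier are hypothetical names; this is not Boost's or POSIX's implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical reusable barrier for a fixed number of threads.
class SimpleBarrier {
public:
    explicit SimpleBarrier(unsigned n) : threshold_(n), remaining_(n) {}

    // Blocks until n threads have called wait(). Returns true in exactly one
    // thread per cycle (the "serial thread"), false in all the others.
    bool wait() {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned long gen = generation_;
        if (--remaining_ == 0) {
            ++generation_;            // start a new cycle
            remaining_ = threshold_;
            cv_.notify_all();
            return true;
        }
        cv_.wait(lock, [&] { return gen != generation_; });
        return false;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const unsigned threshold_;
    unsigned remaining_;
    unsigned long generation_ = 0;
};

// Each of n threads stores a partial result; the serial thread sums them.
// Returns the sum, or 0 if the number of serial threads was not exactly one.
unsigned run_barrier(unsigned n) {
    SimpleBarrier barrier(n);
    std::vector<unsigned> partial(n, 0);
    std::atomic<unsigned> serial_threads{0};
    unsigned total = 0;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < n; ++i)
        threads.emplace_back([&, i] {
            partial[i] = i + 1;  // this thread's share of the work
            if (barrier.wait()) {
                ++serial_threads;
                total = std::accumulate(partial.begin(), partial.end(), 0u);
            }
        });
    for (auto& t : threads) t.join();
    return serial_threads == 1 ? total : 0;
}
```

As the answer notes, the limitation is visible in the constructor: the barrier has to be told the exact number of participating threads up front.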
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;
void WorkerThread()
{
    for (;;)
    {
        cv.wait();
        DoWork();
    }
}
void MainThread()
{
    for (;;)
    {
        ScheduleWork();
        cv.notify_all();
    }
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainThread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
1. WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
2. MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, is awoken by the next cv.notify_all(), and all is well.
In case #2, MainThread() comes back around and notifies... nobody, and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV, but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you
missed something.
Support efficient SMP implementations
There may be rare cases where an "absolutely, paranoiacally correct"
implementation of condition wakeup, given simultaneous wait and
signal/broadcast on different processors, would require additional
synchronization that would slow down ALL condition variable operations
while providing no benefit in 99.99999% of all calls. Is it worth the
overhead? No way!
But, really, that's an excuse because we wanted to force people to
write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*() does NOT require that the caller hold the lock on the mutex. This is a nice improvement over the Java model in that it decouples the notification of threads from the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
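// Waiting thread
unique_lock lock(mutex);   // wait() needs a unique_lock; a lock_guard will not compile
// Do something
cv.wait(lock);
// Do something else

// Notifying thread
unique_lock otherLock(mutex);
// Do something
otherLock.unlock();
cv.notify_one();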
I do not believe you need to call otherLock.lock() first.