libc++ implementation of std::condition_variable_any - c++

Condition variables should have a single order with respect to notify() and unlock_sleep() operations (unlock_sleep() being an imaginary function used within wait(), in which the mutex is unlocked and the thread goes to sleep as one atomic sequence of operations). To achieve this with arbitrary lockables, std::condition_variable_any implementations typically use another mutex internally (both to ensure atomicity and to sleep on).
If the internal unlock_sleep() and notify() (notify_one() or notify_all()) operations are not atomic with respect to each other you risk a thread unlocking the mutex, another thread signaling and then the original thread going to sleep and never waking up.
I was reading the libstdc++ and libc++ implementations of std::condition_variable_any and noticed this code in the libc++ implementation
{lock_guard<mutex> __lx(*__mut_);}
__cv_.notify_one();
Here the internal mutex is locked and then immediately unlocked before the signal operation. Doesn't this risk the problem I described above?
libstdc++ seems to have gotten this right.

The C++11 and later standards explicitly say "The execution of notify_one and notify_all shall be atomic". So in one sense I think that you are correct that the internal mutex should be held across the call down to the platform's underlying condition variable notify call (for example pthread_cond_signal())
However, I don't think that the libc++ implementation will cause notifications to be missed. Unless the notifying thread synchronizes on the lock that the waiting thread passes to wait() (or the two threads synchronize in some other way) around its call to notify_one() or notify_all(), there is no way to determine which of the two threads is 'first' to the notify or wait anyway. So if the notification can be missed in libc++'s current implementation, it could also be missed if libc++ were changed to hold the internal lock across its call down to the platform's notify API.
So I think that libc++ can invoke the "as if" rule to say that the implementation of notify_one()/notify_all() is "atomic enough".
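One way to see why lock-then-unlock may be enough is a stripped-down sketch of the pattern. The class below is hypothetical (the names cv_any_sketch, internal_ and demo_handshake are made up for illustration) and is not the libc++ or libstdc++ source; it ignores exception safety and object lifetime. The waiter holds the internal mutex from before it releases the user lock until the underlying wait atomically releases it, so a notifier that changed the shared state under the user lock cannot get through the internal mutex until that waiter is actually asleep.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch; this is NOT the libc++ or libstdc++ source.
class cv_any_sketch {
    std::condition_variable cv_;
    std::mutex internal_;  // the "other mutex" used internally

public:
    template <class Lock>
    void wait(Lock& user_lock) {
        std::unique_lock<std::mutex> internal(internal_);
        user_lock.unlock();  // the waiter still holds internal_ here
        cv_.wait(internal);  // atomically releases internal_ and sleeps
        internal.unlock();   // release first so the two locks are never nested on wake-up
        user_lock.lock();
    }

    template <class Lock, class Pred>
    void wait(Lock& user_lock, Pred pred) {
        while (!pred()) wait(user_lock);  // also absorbs spurious wakeups
    }

    void notify_one() {
        // Lock, then immediately unlock: this blocks until any waiter that
        // has already released its user lock has reached cv_.wait().
        { std::lock_guard<std::mutex> sync(internal_); }
        cv_.notify_one();
    }
};

// Returns true once the waiter thread has observed the flag.
bool demo_handshake() {
    cv_any_sketch cv;
    std::mutex m;
    bool ready = false;
    bool observed = false;
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return ready; });
        assert(ready);  // the predicate holds when wait() returns
        observed = true;
    });
    {
        std::lock_guard<std::mutex> lock(m);
        ready = true;  // the state change is made under the user lock
    }
    cv.notify_one();
    waiter.join();
    return observed;
}
```

If the notifier did not take the user lock around the state change, neither this sketch nor a variant that holds internal_ across cv_.notify_one() could say which thread came "first", which is the point made above.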

Related

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, []() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do need some memory fence or lock before this
// to ensure that new non-zero values will be visible
// in other threads immediately after waking up?
Doesn't notify_all() store some atomic value, thereby enforcing memory ordering? I have not been able to clarify this.
UPD: according to Superlokkus' answer and an answer here: we have to acquire a lock to ensure memory writes visibility in other threads (memory propagation), otherwise threads in my case may read zero values.
Also I missed this quote here about condition_variable, which specifically answers my question. Even an atomic variable has to be modified under a lock in a case when the modification must become visible immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.
I guess you are mixing up memory ordering of so called atomic values and the mechanisms of classic lock based synchronization.
When you have a datum which is shared between threads, let's say an int for example, one thread cannot simply read it while another thread might be writing to it. Otherwise we would have a data race.
To get around this, for a long time we have used classic lock-based synchronization:
The threads share at least a mutex and the int. To read or to write, a thread has to hold the lock first, meaning it waits on the mutex. Mutexes are built so that it is fine for several threads to attempt this concurrently. If a thread wins the mutex, it can change or read the int and should then unlock it, so that others can read or write too. Using a condition variable as you did just makes the pattern "readers wait for a writer to change a value" more efficient: the readers get woken up by the cv instead of periodically locking, reading, and unlocking, which would be called busy waiting.
So because you hold the lock after waiting on the mutex or, in your case, after correctly waiting on the condition variable (the mutex is still needed), you can change the int. Readers will read the new value once the writer has written it, never the old one. UPDATE: One thing I have to add, which might also be the cause of the confusion: condition variables are subject to so-called spurious wakeups. That is, even though your writer has not notified any thread, a reading thread might still wake up, with the mutex locked. So you have to check whether your writer actually woke you up, which is usually done by having the writer change another datum just to signal this or, if suitable, by using the same datum you already wanted to share. The predicate overload of std::condition_variable::wait exists just to make the check-and-go-back-to-sleep code look a bit prettier. Based on your question, I can't tell whether you want to use your values for this job.
However, the snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex-based patterns have some drawbacks and are slow: only one thread at a time can do something. This is where so-called atomics, like std::atomic<int>, come into play. They can be written and read concurrently by multiple threads without a mutex. Memory ordering is only something to consider there, as an optimization for cases where you use several of them in a meaningful way or you don't need the "after the write, I never see the old value" guarantee. With the default memory ordering, memory_order_seq_cst, you would also be fine as far as reading the atomic itself goes; if threads wait for the change on a condition variable, the modification still has to be made under the mutex, as the quote in the question says.
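Putting the pieces of this answer together, a minimal self-contained sketch of the pattern from the question might look as follows. The function name run_workers and the values written (i + 1) are assumptions made for the example; the question's fillData is replaced by a plain loop. The writer modifies the shared values while holding the mutex and notifies after unlocking; each worker waits with a predicate, so it can neither miss the update nor be fooled by a spurious wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts n workers that each wait for their own slot to become non-zero,
// then returns what each worker saw after waking up.
std::vector<int> run_workers(int n) {
    std::mutex m;
    std::condition_variable cv;
    std::vector<int> values(n, 0);  // shared data, all zero initially
    std::vector<int> seen(n, 0);    // what each worker observed

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return values[i] != 0; });
            assert(values[i] != 0);  // guaranteed by the predicate
            seen[i] = values[i];
        });
    }

    {
        // Stand-in for fillData(values): the write happens under the lock.
        std::lock_guard<std::mutex> lock(m);
        for (int i = 0; i < n; ++i) values[i] = i + 1;
    }
    cv.notify_all();  // notifying itself does not need the lock

    for (auto& t : workers) t.join();
    return seen;
}
```

A worker that starts late and takes the mutex only after the writer released it sees the non-zero value in the predicate and never blocks, so the order in which threads get going does not matter.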

Why boost::condition_variable can use pthread_cond_signal to wake up only one thread

In the source code of boost::condition_variable, the method condition_variable::notify_one() tries to use pthread_cond_signal() to wake up only one thread.
https://code.woboq.org/appleseed/include/boost/thread/pthread/condition_variable.hpp.html
inline void condition_variable::notify_one() BOOST_NOEXCEPT
{
#if defined BOOST_THREAD_PROVIDES_INTERRUPTIONS
boost::pthread::pthread_mutex_scoped_lock internal_lock(&internal_mutex);
#endif
BOOST_VERIFY(!pthread_cond_signal(&cond));
}
However, POSIX says :
The pthread_cond_signal() function shall unblock at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond).
So why boost::condition_variable make sure that the pthread_cond_signal just wake up one thread???
So why boost::condition_variable make sure that the pthread_cond_signal just wake up one thread???
Why? That question is moot. The question is "whether". And it doesn't (as you see).
You can see that it uses the pthread API there to unblock at least one waiting thread.
This is merely as opposed to notify_all, which uses pthread_cond_broadcast.
The distinction is useful because it can improve the efficiency of concurrent operations in situations where waking up all waiters would be wasteful.
Related: when using condition variables, you must always take spurious wake-up into account
You seem to be asking about the name of boost::condition_variable::notify_one() in light of the fact that the pthread-based implementation may in fact wake up more than one thread.
The fact is that although the specifications for pthread_cond_signal() say that it will unblock "at least one" thread blocked on the CV (provided there are any), that is weaselly standard-speak. Although it is allowed for pthread_cond_signal() to unblock more than one thread, it is usual and expected that if any threads are blocked on the CV when it is signaled, then exactly one will be unblocked. Allowing for more than one to be unblocked allows for implementations to be lighter-weight and more efficient, and it places no new burden on programmers because the same measures that handle spurious wakeups should be equally effective against extra wakeups, supposing that you even choose to distinguish between those.
Thus, boost::condition_variable::notify_one() is so named because "notify_one" accurately represents a behavior that programmers are anticipated sometimes to want, and the so-named method is the way for them to request it. That occasionally it delivers something a bit different is a characteristic of a very wide variety of functions. It is not helpful to try to capture all the details in the name. Not only would you end up with unreadably long names, such as notify_at_least_one_if_any_are_waiting, but you would lose the sense of the primary expectation for the function's behavior.
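To illustrate why an occasional extra wake-up places no new burden on the programmer, here is a small sketch using std::condition_variable (the function consume_all and its parameters are invented for the example; the same shape applies to boost::condition_variable). Every consumer re-checks the predicate under the mutex, so it does not matter whether notify_one() wakes exactly one thread, more than one, or a thread wakes spuriously: each item is still consumed exactly once.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Pushes `items` work items, one notify_one() per item, and returns how
// many items the consumer threads popped in total.
int consume_all(int items, int consumers) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool done = false;
    int consumed = 0;

    std::vector<std::thread> pool;
    for (int c = 0; c < consumers; ++c) {
        pool.emplace_back([&] {
            for (;;) {
                std::unique_lock<std::mutex> lock(m);
                // The predicate makes extra or spurious wakeups harmless.
                cv.wait(lock, [&] { return !q.empty() || done; });
                if (q.empty()) return;  // done and fully drained
                q.pop();
                ++consumed;
            }
        });
    }

    for (int i = 0; i < items; ++i) {
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(i);
        }
        cv.notify_one();  // "at least one" waiter, if any is blocked
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();  // everyone must see the shutdown flag

    for (auto& t : pool) t.join();
    assert(q.empty());
    return consumed;
}
```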

unlock the mutex after condition_variable::notify_all() or before?

Looking at several videos and the documentation example, we unlock the mutex before calling the notify_all(). Will it be better to instead call it after?
The common way:
Inside the Notifier thread:
//prepare data for several worker-threads;
//and now, awaken the threads:
std::unique_lock<std::mutex> lock2(sharedMutex);
_threadsCanAwaken = true;
lock2.unlock();
_conditionVar.notify_all(); //awaken all the worker threads;
//wait until all threads completed;
//cleanup:
_threadsCanAwaken = false;
//prepare new batches once again, etc, etc
Inside one of the worker threads:
while(true){
// wait for the next batch:
std::unique_lock<std::mutex> lock1(sharedMutex);
_conditionVar.wait(lock1, [](){ return _threadsCanAwaken; });
lock1.unlock(); //let sibling worker-threads work on their part as well
//perform the final task
//signal the notifier that one more thread has completed;
//loop back and wait until the next task
}
Notice how the lock2 is unlocked before we notify the condition variable - should we instead unlock it after the notify_all() ?
Edit
From my comment below: My concern is that, what if the worker spuriously awakes, sees that the mutex is unlocked, super-quickly completes the task and loops back to the start of while. Now the slow-poke Notifier finally calls notify_all(), causing the worker to loop an additional time (excessive and undesired).
There are no advantages to unlocking the mutex before signaling the condition variable unless your implementation is unusual. There are two disadvantages to unlocking before signaling:
If you unlock before you signal, the signal may wake a thread that chose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
There is a performance penalty for unlocking before signaling that is avoided by unlocking after signaling. If you signal before you unlock, a good implementation will know that your signal cannot possibly render any thread ready-to-run, because the mutex is held by the calling thread and any thread affected by the condition variable necessarily cannot make forward progress without the mutex. This permits a significant optimization (often called "wait morphing") that is not possible if you unlock first.
So signal while holding the lock unless you have some unusual reason to do otherwise.
should we instead unlock it after the notify_all() ?
It is correct to do it either way but you may have different behavior in different situations. It is quite difficult to predict how it will affect performance of your program - I've seen both positive and negative effects for different applications. So it is better you profile your program and make decision on your particular situation based on profiling.
As mentioned here : cppreference.com
The notifying thread does not need to hold the lock on the same mutex
as the one held by the waiting thread(s); in fact doing so is a
pessimization, since the notified thread would immediately block
again, waiting for the notifying thread to release the lock.
That said, documentation for wait
At the moment of blocking the thread, the function automatically calls
lck.unlock(), allowing other locked threads to continue.
Once notified (explicitly, by some other thread), the function
unblocks and calls lck.lock(), leaving lck in the same state as when
the function was called. Then the function returns (notice that this
last mutex locking may block again the thread before returning).
so when notified wait will re-attempt to gain the lock and in that process it will get blocked again till original notifying thread releases the lock.
So I suggest releasing the lock before calling notify, as done in the example on cppreference.com. And most importantly:
Don't be Pessimistic.
David's answer seems to me wrong.
First, assuming the simple case of two threads, one waiting for the other on a condition variable, unlocking first by the notifier will not waken the other waiting thread, as the signal has not arrived. Then the notify call will immediately waken the waiting thread. You do not need any special optimizations.
On the other hand, signalling first has the potential of waking up a thread and making it sleep immediately again, as it cannot hold the lock—unless wait morphing is implemented.
Wait morphing does not exist in Linux at least, according to the answer under this StackOverflow question: Which OS / platforms implement wait morphing optimization?
The cppreference example also unlocks first before signalling: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
It explicitly says:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock, though some implementations recognize the pattern and do not attempt to wake up the thread that is notified under lock.
should we instead unlock it after the notify_all() ?
After reading several related posts, I've formed the opinion that it's purely a performance issue. If OS supports "wait morphing", unlock after; otherwise, unlock before.
I'm adding an answer here to augment that of @DavidSchwartz. In particular, I'd like to clarify his point 1.
If you unlock before you signal, the signal may wake a thread that chose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
The first thing to say is that, because this is about a CV and not a mutex, a better term for the so-called "deadlock" might be "sleep paralysis": a mistake some programs make is that
a thread that's supposed to wake up
goes back to sleep because it does not recheck the condition it has been waiting for before waiting again.
The second thing is that, when waking some other thread(s),
the default choice should be broadcast/notify_all (broadcast is the POSIX term, which is equivalent to its C++ counterpart).
signal/notify_one is an optimized special case for when only one other thread is waiting.
Finally, third, David is adamant that
it's better to unlock after notify,
because it can avoid the "deadlock" which I've been referring to as "sleep paralysis".
With unlock-then-notify, there is a window in which another thread (let's call this the "wrong" thread) may i.) acquire the mutex, ii.) go into wait, and iii.) wake up. If steps i., ii., and iii. happen quickly enough, the wrong thread consumes the signal, leaving the intended (let's call it "correct") thread asleep.
I discussed this extensively with David, and he clarified that the "sleep paralysis" occurs only when all three points are violated (1. the condvar is associated with several separate conditions and/or the waiter does not check its condition before waiting again; 2. only one thread is signaled/notified when more than one other thread is using the condvar; 3. unlocking before notifying creates a window for the race condition).
Finally, my recommendation is that points 1 and 2 are essential for the correctness of the program, and fixing issues associated with 1 and 2 should be prioritized over 3, which should only be an augmentative "last resort".
For reference, the manpages for signal/broadcast and wait contain some information from version 3 of the Single Unix Specification that explains points 1 and 2, and partly 3. Although it is specified for POSIX/Unix/Linux in C, its concepts are applicable to C++.
As of this writing (2023-01-31), the 2018 edition of version 4 of the Single Unix Specification has been released, and the drafting of version 5 is underway.
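Points 1 and 2 can be shown in a few lines. The sketch below is an illustration with invented names (wake_order, a_ready, b_ready): one condition variable serves two logical conditions, every waiter rechecks its own predicate, and the notifier uses notify_all so that the thread whose condition actually changed is certain to be among those woken. The notification is issued while the lock is held, as David recommends; with points 1 and 2 respected, the example would also be correct with the notification moved after the unlock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Two waiters share one condition variable but wait for different flags.
// Returns the order in which they got past their wait.
std::string wake_order() {
    std::mutex m;
    std::condition_variable cv;
    bool a_ready = false;
    bool b_ready = false;
    std::string order;

    std::thread waiter_a([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return a_ready; });  // rechecked on every wakeup
        order += 'A';
    });
    std::thread waiter_b([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return b_ready; });
        order += 'B';
    });

    {
        std::lock_guard<std::mutex> lock(m);
        b_ready = true;
        cv.notify_all();  // waiter_a may wake too, sees a_ready == false, sleeps again
    }
    waiter_b.join();

    {
        std::lock_guard<std::mutex> lock(m);
        assert(!a_ready);
        a_ready = true;
        cv.notify_all();
    }
    waiter_a.join();
    return order;
}
```

Had the first notification been notify_one, it could have been delivered to waiter_a alone, which would go back to sleep and leave waiter_b asleep as well: the "sleep paralysis" described above.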

C++/BOOST: Does condition_variable::wait( ) / notify( ) ensure ordering of threads waiting?

I need to know if there is a way to "queue up" threads that wait on a condition variable so that they are awoken in the correct order...without writing a bunch of queueing code, that is.
In most systems, the following reversal of the producer/consumer model (with blocking on full mailbox) may not ensure ordering:
unique_lock lock1(mutex), lock2(mutex)
ConditionVariable cv
Code Block A: (called by multiple threads)
lock(lock1)
timestampOnEntry = now()
cv.wait(lock1) // Don't worry about spurious notifies, out of scope.
somethingRequiringMonotonicOrderOfTimestamps(timestampOnEntry)
unlock(lock1)
Code Block B: (called by a single thread, typically within a loop)
lock(lock2)
somethingVeryVerySlow()
(1) unlock(lock2) // the ordering here is not a mistake
(2) cv.notify_one(lock2) // prevents needless reblocking in code block A
Note that lines (1) and (2) are deliberately in the given order. This prevents an unnecessary second block on the lock in code block A should the notified thread wake up before the lock is released by the thread in code block B.
The question is: if multiple threads are "blocked" on wait, I need to know if
*notify_one* will wake them up in the order in which they blocked. Probably not (as in Java). If not by default, is there a way to specify that?
This could of course be done with a bunch of queuing code, but I'd prefer to use a pre-canned BOOST methodology, regardless of how complicated the contents of the can are. Of course, should I convert *cv.notify_one(guard)* into *cv.notify_all(guard)*, I would be required to do the queueing code, regardless.
No such guarantees are given by the standard; notify_one may wake any thread that is currently waiting (§30.5.1):
void notify_one() noexcept;
Effects: If any threads are blocked waiting for *this, unblocks one of those threads.
The only way to ensure that a specific thread reacts to the event is to wake all threads and then have some additional synchronization mechanism that sends all but the correct thread back to sleep.
This is a fundamental limitation due to the requirements that the platform has to fulfill: Usually condition variables are implemented in a way that the waiting threads are put into a suspended state and will not get scheduled by the system again until a notify occurs. A scheduler implementation is not required to provide the functionality for selecting a specific thread for waking up (and many actually don't).
So this part of the logic inevitably has to be handled by user code, which in turn means you have to wake up all threads to make it work, because this is the only way to ensure that the correct thread will get woken at all.
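A rough sketch of what such user code could look like, using tickets: every waiter takes a number on entry, the notifier wakes all waiters, and everyone except the holder of the next number goes back to sleep. The class fifo_gate and all of its members are hypothetical names for this example, written against std::condition_variable (boost::condition_variable has the same shape). Note one deliberate difference from a bare condition variable: a release is remembered as a permit even if no thread is waiting yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: lets waiters through strictly in arrival order.
class fifo_gate {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned long next_ticket_ = 0;  // next ticket to hand out
    unsigned long now_serving_ = 0;  // ticket that may proceed next
    unsigned long permits_ = 0;      // releases not yet consumed

public:
    // Takes a ticket, waits for its turn and a permit, then runs `work`
    // while still holding the internal mutex.
    template <class F>
    void wait_in_order(F work) {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned long ticket = next_ticket_++;
        cv_.wait(lock, [&] { return permits_ > 0 && ticket == now_serving_; });
        assert(permits_ > 0);
        --permits_;
        work();
        ++now_serving_;
        cv_.notify_all();  // the next ticket holder may already have a permit
    }

    void release_one() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++permits_;
        }
        cv_.notify_all();  // wake everyone; only the right ticket proceeds
    }

    unsigned long tickets_issued() {
        std::lock_guard<std::mutex> lock(m_);
        return next_ticket_;
    }
};

// Starts n waiters one after another and returns the order in which they ran.
std::vector<int> demo_order(int n) {
    fifo_gate gate;
    std::vector<int> order;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            gate.wait_in_order([&] { order.push_back(i); });
        });
        // Make sure thread i holds its ticket before the next one starts.
        while (gate.tickets_issued() != static_cast<unsigned long>(i + 1))
            std::this_thread::yield();
    }
    for (int i = 0; i < n; ++i) gate.release_one();
    for (auto& t : threads) t.join();
    return order;
}
```

The cost is the one described above: every release wakes all waiters, and all but one of them immediately go back to sleep.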
The short answer, as you seem to have suspected, is no. Which thread (or threads) notify_one is going to rouse is not necessarily guaranteed.
That said, I'm not sure what to make of your example code. Specifically, passing a mutex to notify_one doesn't make sense to me (I am unaware of any condition variable implementations on any platform that signal/broadcast that way). I don't know your use case--perhaps you must have a lot of thread local data that prevents arranging your application state in such a way that any thread can pick up the necessary data to do the next task? My first reaction to that would be to refactor the code to care less about which particular OS thread does which work and focus more on the ordering of the work itself.

Not locking mutex for pthread_cond_timedwait and pthread_cond_signal ( on Linux )

Is there any downside to calling pthread_cond_timedwait without taking a lock on the associated mutex first, and also not taking a mutex lock when calling pthread_cond_signal ?
In my case there is really no condition to check, I want a behavior very similar to Java wait(long) and notify().
According to the documentation, there can be "unpredictable scheduling behavior". I am not sure what that means.
An example program seems to work fine without locking the mutexes first.
The first is not OK:
The pthread_cond_timedwait() and
pthread_cond_wait() functions shall
block on a condition variable. They
shall be called with mutex locked by
the calling thread or undefined
behavior results.
http://opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwait.html
The reason is that the implementation may want to rely on the mutex being locked in order to safely add you to a waiter list. And it may want to release the mutex without first checking it is held.
The second is disturbing:
if predictable scheduling behaviour is
required, then that mutex is locked by
the thread calling
pthread_cond_signal() or
pthread_cond_broadcast().
http://www.opengroup.org/onlinepubs/007908775/xsh/pthread_cond_signal.html
Off the top of my head, I'm not sure what the specific race condition is that messes up scheduler behaviour if you signal without taking the lock. So I don't know how bad the undefined scheduler behaviour can get: for instance maybe with broadcast the waiters just don't get the lock in priority order (or however your particular scheduler normally behaves). Or maybe waiters can get "lost".
Generally, though, with a condition variable you want to set the condition (at least a flag) and signal, rather than just signal, and for this you need to take the mutex. The reason is that otherwise, if you're concurrent with another thread calling wait(), then you get completely different behaviour according to whether wait() or signal() wins: if the signal() sneaks in first, then you'll wait for the full timeout even though the signal you care about has already happened. That's rarely what users of condition variables want, but may be fine for you. Perhaps this is what the docs mean by "unpredictable scheduler behaviour" - suddenly the timeslice becomes critical to the behaviour of your program.
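The "set a flag and signal" approach can be sketched as below. To keep it short and portable the sketch uses the C++ wrappers std::mutex and std::condition_variable rather than raw pthread calls; with pthreads the same shape would use pthread_mutex_lock, a loop around pthread_cond_timedwait, and pthread_cond_signal. The type timed_event and the two helper functions are invented names for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: a signal that is remembered until it is consumed.
struct timed_event {
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;

    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);
            signaled = true;  // the flag is set under the mutex
        }
        cv.notify_one();
    }

    // Returns true if the event was signaled, false if the timeout expired.
    bool wait_for(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(m);
        const bool ok = cv.wait_for(lock, timeout, [&] { return signaled; });
        if (ok) signaled = false;  // consume the signal
        return ok;
    }
};

// The signal "sneaks in first": the later wait still sees it and returns
// immediately instead of sleeping for the full timeout.
bool signal_then_wait() {
    timed_event e;
    e.signal();
    return e.wait_for(std::chrono::milliseconds(1000));
}

// No signal at all: the wait reports a timeout.
bool wait_without_signal() {
    timed_event e;
    return e.wait_for(std::chrono::milliseconds(10));
}
```

With the flag, the outcome no longer depends on whether the wait or the signal happens to run first, which is the timing sensitivity described above.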
Btw, in Java you have to have the lock in order to notify() or notifyAll():
This method should only be called by a
thread that is the owner of this
object's monitor.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#notify()
The Java synchronized {/}/wait/notify/notifyAll behaviour is analogous to pthread_mutex_lock/pthread_mutex_unlock/pthread_cond_wait/pthread_cond_signal/pthread_cond_broadcast, and not by coincidence.
Butenhof's excellent "Programming with POSIX Threads" discusses this right at the end of chapter 3.3.3.
Basically, signalling the condvar without locking the mutex is a potential performance optimisation: if the signalling thread has the mutex locked, then the thread waking on the condvar has to immediately block on the mutex that the signalling thread has locked even if the signalling thread is not modifying any of the data the waiting thread will use.
The reason that "unpredictable scheduler behavior" is mentioned is that if you have a high-priority thread waiting on the condvar (which another thread is going to signal to wake up the high-priority thread), any other lower-priority thread can come and lock the mutex, so that when the condvar is signalled and the high-priority thread is awakened, it has to wait on the lower-priority thread to release the mutex. If the mutex is locked whilst signalling, then the higher-priority thread will be scheduled on the mutex before the lower-priority thread: basically you know that when you "awaken" the high-priority thread it will awaken as soon as the scheduler allows it (of course, you might have to wait on the mutex before signalling the high-priority thread, but that's a different issue).
The point of waiting on a condition variable paired with a mutex is to atomically enter the wait and release the lock, i.e. allow other threads to modify the protected state, and then again atomically receive notification of the state change and acquire the lock. What you describe can be done with many other methods like pipes, sockets, signals, or - probably the most appropriate - semaphores.
I think this should work (note untested code):
// initialize a semaphore
sem_t sem;
sem_init(&sem,
0, // not shared
0 // initial value of 0
);
// thread A
struct timespec tm;
struct timeb tp;
const long sec = msecs / 1000;
const long millisec = msecs % 1000;
ftime(&tp);
tp.time += sec;
tp.millitm += millisec;
if(tp.millitm > 999) {
tp.millitm -= 1000;
tp.time++;
}
tm.tv_sec = tp.time;
tm.tv_nsec = tp.millitm * 1000000;
// wait until timeout or woken up
errno = 0;
while((sem_timedwait(&sem, &tm)) == -1 && errno == EINTR) {
continue;
}
return errno == ETIMEDOUT; // returns true if a timeout occurred
// thread B
sem_post(&sem); // wake up Thread A early
Conditions should be signaled outside of the mutex whenever possible. Mutexes are a necessary evil in concurrent programming. Their use leads to contention which robs the system of the maximum performance that it can gain from the use of multiple processors.
The purpose of a mutex is to guard access to some shared variables in the program so that they behave atomically. When a signaling operation is done inside a mutex, it causes an inclusion of hundreds of irrelevant machine cycles into the mutex which have nothing to do with guarding the shared data. Potentially, it calls from a user space all the way into a kernel.
The notes about "predictable scheduler behavior" in the standard are completely bogus.
When we want the machine to execute statements in a predictable, well-defined order, the tool for that is the sequencing of statements within a single thread of execution: S1 ; S2. Statement S1 is "scheduled" before S2.
We use threads when we realize that some actions are independent and their scheduling order is not important, and there are performance benefits to be realized, like more timely response to real time events or computing on multiple processors.
At times when scheduling orders do become important among multiple threads, this falls under a concept called priority. Priority resolves what happens first when any one of N statements could potentially be scheduled to execute. Another tool for ordering under multithreading is queuing. Events are placed into a queue by one or more threads and a single service thread processes the events in the queue order.
The bottom line is, the placement of pthread_cond_broadcast is not an appropriate tool for controlling execution order. It will not make execution order predictable in the sense that the program suddenly has exactly the same, reproducible behavior on every platform.
"unpredictable scheduling behavior" means just that. You don't know what's going to happen.
Nor does the implementation. It could work as expected. It could crash your app. It could work fine for years, then a race condition makes your app go monkey. It could deadlock.
Basically, if any docs suggest that something undefined/unpredictable can happen unless you do what the docs tell you to do, you had better do it. Otherwise stuff might blow up in your face.
(And it won't blow up until you put the code into production, just to annoy you even more. At least that's my experience.)