Avoiding lost-wakeup when condition update is a blocking function - c++

I'm writing an event loop that goes to sleep when there's no work to do by waiting on a "work to do" condition variable (work_to_do). This condition variable could be notified by different threads based on various events. When an event happens in another thread it notifies on the condition variable, waking up the event loop which then checks the conditions that could have triggered the notify, loops until there's no more work to do and then waits again. One of the conditions is set by a blocking function (WaitForMessage()).
The event loop thread:
std::unique_lock<std::mutex> lock(work_to_do_lock);  // wait() requires a unique_lock, not a lock_guard
for (;;) {
    if (condition1) {
        // Act on condition 1.
    } else if (condition2) {
        // Act on condition 2.
    } else if (HasMessage()) {
        // Act on receiving message.
    } else {
        work_to_do.wait(lock);
    }
}
The thread that handles the notify from the blocking function:
for (;;) {
    // Wait for message to be received (blocking). Once it returns you are
    // guaranteed that HasMessage() will return true.
    WaitForMessage();
    // Wake up the main event loop.
    work_to_do.notify_one();
}
The main thread acquires a lock on the mutex guarding the condition variable (work_to_do_lock) before entering the event loop, and passes it into the wait() call when there's no work to do. To avoid lost wakeups, the common advice is that all notifiers must hold the lock while updating their condition states. However, if you guarded the blocking WaitForMessage() call with work_to_do_lock, you would prevent other signals from waking up the event loop for as long as the call blocks.
The solution I came up with is to acquire and release the lock after WaitForMessage() but before notify_one():
for (;;) {
    // Wait for message to be received (blocking). Once it returns you are
    // guaranteed that HasMessage() will return true.
    WaitForMessage();
    {
        std::lock_guard<std::mutex> lock(work_to_do_lock);
    }
    // Wake up the main event loop.
    work_to_do.notify_one();
}
This should avoid the lost-wakeup issue, as it is no longer possible for both the condition to become true (WaitForMessage() to return) and the notify_one() to occur in-between the condition check (HasMessage()) and the wait().
An alternative approach is to not rely on HasMessage() and just update a shared variable, which we could guard with the lock:
for (;;) {
    // Wait for message to be received (blocking). Once it returns you are
    // guaranteed that a message is available.
    WaitForMessage();
    {
        std::lock_guard<std::mutex> lock(work_to_do_lock);
        has_message = true;
    }
    // Wake up the main event loop.
    work_to_do.notify_one();
}
Corresponding event loop that checks new condition predicate:
std::unique_lock<std::mutex> lock(work_to_do_lock);  // wait() requires a unique_lock
for (;;) {
    if (condition1) {
        // Act on condition 1.
    } else if (condition2) {
        // Act on condition 2.
    } else if (has_message) {
        has_message = false;
        // Act on receiving message.
    } else {
        work_to_do.wait(lock);
    }
}
I've never seen the former approach before, so I was wondering whether there is a flaw in the design, or a reason it's typically avoided? It seems you could use it as a general replacement for holding the condition-variable mutex during the condition-state update, assuming the condition-state write/read itself is protected by some other mutual-exclusion mechanism.

Your approach works, but it's less efficient than one that reuses whatever synchronization already makes it safe to call WaitForMessage() and HasMessage() concurrently (or, put differently, one that takes your work_to_do_lock to update the state behind HasMessage(), rather than, say, an atomic). Of course, if that synchronization is inaccessible to this code, this is about the best you can do, since you need mutual exclusion for the other conditions anyway.
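For reference, here is a minimal, self-contained sketch of the empty-critical-section pattern from the question. It is an illustration, not the author's production code: SimulateBlockingWait() is a hypothetical stand-in for WaitForMessage(), and message_ready is made atomic to stand in for whatever internal synchronization HasMessage() is assumed to have:

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex work_to_do_lock;
std::condition_variable work_to_do;
std::atomic<bool> message_ready{false};  // stands in for the state behind HasMessage()

bool HasMessage() { return message_ready.load(); }

// Hypothetical stand-in for the blocking WaitForMessage() call.
void SimulateBlockingWait() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    message_ready.store(true);
}

int main() {
    std::thread notifier([] {
        SimulateBlockingWait();
        { std::lock_guard<std::mutex> lock(work_to_do_lock); }  // empty critical section
        work_to_do.notify_one();
    });

    std::unique_lock<std::mutex> lock(work_to_do_lock);
    while (!HasMessage())
        work_to_do.wait(lock);  // cannot miss the notify: we hold the lock between check and wait
    std::cout << "got message\n";
    notifier.join();
}

Because the waiter holds work_to_do_lock from the HasMessage() check until wait() atomically releases it, the notifier's empty lock/unlock cannot complete inside that window, so the subsequent notify_one() cannot be lost.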

Related

Is it mandatory to lock mutex before signaling on condition variable?

We have implemented a TaskRunner whose functions are called by different threads to start, stop and post tasks. TaskRunner internally creates a thread, and if the queue is not empty, the thread pops a task from the queue and executes it. Start() checks whether the thread is running and, if not, creates a new one. Stop() joins the thread. The code is below.
bool TaskRunnerImpl::PostTask(Task* task) {
    tasks_queue_.push_back(task);
    return true;
}

void TaskRunnerImpl::Start() {
    std::lock_guard<std::mutex> lock(is_running_mutex_);
    if (is_running_) {
        return;
    }
    is_running_ = true;
    runner_thread_ = std::thread(&TaskRunnerImpl::Run, this);
}

void TaskRunnerImpl::Run() {
    while (is_running_) {
        if (tasks_queue_.empty()) {
            continue;
        }
        Task* task_to_run = tasks_queue_.front();
        task_to_run->Run();
        tasks_queue_.pop_front();
        delete task_to_run;
    }
}

void TaskRunnerImpl::Stop() {
    std::lock_guard<std::mutex> lock(is_running_mutex_);
    is_running_ = false;
    if (runner_thread_.joinable()) {
        runner_thread_.join();
    }
}
We want to use condition variables now; otherwise the thread continuously checks whether the task queue is empty. We implemented it as below.
The thread function (Run()) waits on the condition variable.
PostTask() signals it when someone posts a task.
Stop() signals it when someone calls stop.
The code is below.
bool TaskRunnerImpl::PostTask(Task* task) {
    std::lock_guard<std::mutex> taskGuard(m_task_mutex);
    tasks_queue_.push_back(task);
    m_task_cond_var.notify_one();
    return true;
}

void TaskRunnerImpl::Start() {
    std::lock_guard<std::mutex> lock(is_running_mutex_);
    if (is_running_) {
        return;
    }
    is_running_ = true;
    runner_thread_ = std::thread(&TaskRunnerImpl::Run, this);
}

void TaskRunnerImpl::Run() {
    while (is_running_) {
        Task* task_to_run = nullptr;
        {
            std::unique_lock<std::mutex> mlock(m_task_mutex);
            m_task_cond_var.wait(mlock, [this]() {
                return !(is_running_ && tasks_queue_.empty());
            });
            if (!is_running_) {
                return;
            }
            if (!tasks_queue_.empty()) {
                task_to_run = tasks_queue_.front();
                task_to_run->Run();
                tasks_queue_.pop_front();
            }
        }
        if (task_to_run)
            delete task_to_run;
    }
}

void TaskRunnerImpl::Stop() {
    std::lock_guard<std::mutex> lock(is_running_mutex_);
    is_running_ = false;
    m_task_cond_var.notify_one();
    if (runner_thread_.joinable()) {
        runner_thread_.join();
    }
}
I have a couple of questions, as below. Can someone please help me understand these?
The condition variable m_task_cond_var is linked with the mutex m_task_mutex. But Stop() already locks is_running_mutex_ to guard is_running_. Do I need to lock m_task_mutex before signaling? I am not convinced why I should lock m_task_mutex, as we are not protecting anything related to the task queue.
In the thread function (Run()), we are reading is_running_ without locking is_running_mutex_. Is this correct?
Do I need to lock m_task_mutex before signaling [In Stop]?
When the predicate being tested in the condition_variable::wait call depends on something happening in the signaling thread (which is almost always the case), then you should obtain the mutex before signaling. Consider the following possibility if you are not holding m_task_mutex:
The watcher thread (TaskRunnerImpl::Run) wakes up (via spurious wakeup or a notification from elsewhere) and obtains the mutex.
The watcher thread checks its predicate and sees that it is false.
The signaler thread (TaskRunnerImpl::Stop) changes the predicate to return true (by setting is_running_ = false;).
The signaler thread signals the condition variable.
The watcher thread waits to be signaled (bad) — the signal has already come and gone; the predicate was false when checked, so the watcher begins waiting, possibly indefinitely.
The worst that can happen if you are holding the mutex when you signal is that the blocked thread (TaskRunnerImpl::Run) wakes up and immediately blocks again trying to obtain the mutex. This can have some performance implications.
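If that cost matters, the usual pattern is to change the predicate under the mutex but issue the notification after releasing it. A sketch, reusing the names from the question:

{
    std::lock_guard<std::mutex> lock(m_task_mutex);
    is_running_ = false;       // predicate changed while holding the mutex
}                              // mutex released here...
m_task_cond_var.notify_one();  // ...so the woken thread can acquire it immediately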
In [TaskRunnerImpl::Run] , we are reading is_running_ without locking is_running_mutex. Is this correct?
In general, no, even if it's of type bool. An unsynchronized read concurrent with a write is a data race, and a data race is undefined behavior in C++. In practice it may appear to work, but you should obtain the mutex before you read (and release it immediately afterwards).
In fact, it may be preferable to use std::atomic<bool> instead of the bool + mutex combination (or std::atomic_flag, if you want to get fancy), which will have the same effect but be easier to work with.
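A sketch of that replacement (note that if is_running_ also participates in the condition-variable predicate, it must still be written under m_task_mutex, as the next answer explains):

std::atomic<bool> is_running_{false};  // replaces bool + is_running_mutex_

// writer (Stop):
is_running_.store(false);

// reader (Run):
while (is_running_.load()) { /* ... */ }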
Do I need to lock m_task_mutex before signaling [In Stop]?
Yes, you do. You must change the condition under the same mutex, and send the signal either while the mutex is locked or after it is unlocked following the change. If you do not use the same mutex, or you send the signal before that mutex has been locked, you create exactly the race condition that std::condition_variable was created to solve.
The logic is this:
The watching thread locks the mutex and checks the watched condition. If it has not happened yet, the thread goes to sleep, unlocking the mutex atomically. The signaling thread locks the mutex, changes the condition, and signals. If the signaling thread does that before the watching thread locks the mutex, the watching thread sees that the condition has happened and does not go to sleep. If it locks first, it goes to sleep and is woken when the signaling thread raises the signal.
Note: you can signal the condition variable before or after the mutex is unlocked; both are correct but may affect performance. It is incorrect, however, to signal before the mutex has been locked.
Condition variable m_task_cond_var is linked with mutex m_task_mutex. But Stop() already locks is_running_mutex_ to guard is_running_. Do I need to lock m_task_mutex before signaling? I am not convinced why to lock m_task_mutex, as we are not protecting anything related to the task queue.
You have overcomplicated your code and made things worse. You should use only one mutex in this case and it would work as intended.
In Thread function(Run()), we are reading is_running_ without locking is_running_mutex. Is this correct?
On x86 hardware it may appear to "work", but from the language's point of view it is UB.
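A sketch of the single-mutex arrangement suggested above: is_running_ is guarded by m_task_mutex along with the task queue, so the predicate in Run() is always checked and changed under the same lock (Start() would take m_task_mutex too):

void TaskRunnerImpl::Stop() {
    {
        std::lock_guard<std::mutex> lock(m_task_mutex);
        is_running_ = false;  // same mutex as the wait predicate in Run()
    }
    m_task_cond_var.notify_one();
    if (runner_thread_.joinable()) {
        runner_thread_.join();
    }
}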

Using std::condition_variable to wait in the message sending thread, a deadlock occurred

I am writing a network module. The sending of data is carried out in a separate thread, using a concurrent queue to hand data off from the main thread.
private:
    std::mutex mutex_;
    std::condition_variable blockNotification_;
    moodycamel::ConcurrentQueue<Envelope> sendQueue_;
    std::promise<bool> senderThreadStopped_;

void AsyncTransport::RunSender()
{
    while (!drain_)
    {
        SendAllQueuedEnvelope();
        std::unique_lock<std::mutex> lock(mutex_);
        blockNotification_.wait(lock);
    }
    // Make sure all envelopes have been sent.
    SendAllQueuedEnvelope();
    senderThreadStopped_.set_value(true);
    assert(sendQueue_.size_approx() == 0);
}
void AsyncTransport::SendAllQueuedEnvelope()
{
    auto envelope = Envelope::Wrap(nullptr);
    while (sendQueue_.try_dequeue(envelope))
    {
        envelope = syncTransport_->Send(envelope);
    }
}
Envelope AsyncTransport::Send(const Envelope& envelope) const
{
    if (drain_)
    {
        return envelope.With<SentFaildStamp>("The current transport has drained.");
    }
    if (!sendQueue_.try_enqueue(envelope.CloneContent()))
    {
        return envelope.With<SentFaildStamp>("Send queue is full.");
    }
    blockNotification_.notify_all();
    return envelope;
}
RunSender() runs in a separate thread and continually takes data from the concurrent queue. When all the data in the queue has been sent, the thread waits, to avoid extra CPU overhead, until there is new data in the queue.
The Send() method is called from the main thread.
But I found that I had a deadlock. What did I do wrong?
I expect the sending thread to enter the wait after sending data, and to wake up again once there is new data in the queue.
The Send() method isn't thread-safe. I would use a std::lock_guard in a new scope to lock the mutex and ensure it is unlocked before the notify_all() call, like this:
Envelope AsyncTransport::Send(const Envelope& envelope) const
{
    {
        const std::lock_guard<std::mutex> lock(mutex_);
        if (drain_)
        {
            return envelope.With<SentFaildStamp>("The current transport has drained.");
        }
        if (!sendQueue_.try_enqueue(envelope.CloneContent()))
        {
            return envelope.With<SentFaildStamp>("Send queue is full.");
        }
    }
    blockNotification_.notify_all();
    return envelope;
}
Since the lock_guard locks the mutex, you would have to either make the mutex mutable, so it can be used in a const function, or remove the const specifier from the function.
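For example, a one-line sketch of the mutable option:

class AsyncTransport {
    // mutable allows const member functions such as Send() to lock the mutex.
    mutable std::mutex mutex_;
    // ...
};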
You should always have a condition with a wait to protect against spurious wake-ups. See CP.42. So I would change the wait to include a condition like this:
std::unique_lock<std::mutex> lock(mutex_);
// ConcurrentQueue has no empty(); size_approx() is the closest check.
blockNotification_.wait(lock, [&]() { return drain_ || sendQueue_.size_approx() > 0; });
Now the wait will only wake up once drain_ is true or sendQueue_ has something in it.
You need a variable that is shared between threads and serves as the condition predicate.
You need to take the lock before reading or writing the predicate. The condition variable's wait unlocks before sleeping, so the other thread can lock, update the predicate, and unlock. You can send the notify before or after unlocking. I prefer after, but meh.
A condition variable on its own is useless. It must always go along with a lock protected variable, or set of variables, which must be checked before continuing after waiting.
And of course it then only makes sense to update whatever that variable is before sending a notification.
Regarding the deadlock: it may happen that blockNotification_.notify_all() runs before blockNotification_.wait(lock), and then the latter waits forever. Use a wait (or wait_for) with a predicate that checks whether the queue is non-empty, so the wait can exit when new messages are ready to be sent. Be aware of spurious wakeups.
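One way to write that, assuming moodycamel's size_approx() as the emptiness check (ConcurrentQueue has no empty()):

std::unique_lock<std::mutex> lock(mutex_);
blockNotification_.wait_for(lock, std::chrono::milliseconds(100),
    [this] { return drain_ || sendQueue_.size_approx() > 0; });
// Returns when notified and the predicate holds, or after the timeout,
// so a notify_all() that raced ahead of the wait can no longer block forever.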

Using std::condition_variable with atomic<bool>

There are several questions on SO dealing with std::atomic, and others that deal with std::condition_variable. But my question is whether my use below is correct.
Three threads: one ctrl thread that does preparation work before unpausing the two other threads. The ctrl thread is also able to pause the worker threads (sender/receiver) while they are in their tight send/receive loops.
The idea of using the atomic is to make the tight loops faster when the pause boolean is not set.
class SomeClass
{
public:
    //...
    // Disregard that data is public...
    std::condition_variable cv; // UDP threads will wait on this cv until allowed
                                // to run by ctrl thread.
    std::mutex cv_m;
    std::atomic<bool> pause_test_threads;
};
void do_pause_test_threads(SomeClass* someclass)
{
    if (!someclass->pause_test_threads)
    {
        // Even though we use an atomic, mutex must be held during
        // modification. See documentation of condition variable
        // notify_all/wait. Mutex does not need to be held for the actual
        // notify call.
        std::lock_guard<std::mutex> lk(someclass->cv_m);
        someclass->pause_test_threads = true;
    }
}

void unpause_test_threads(SomeClass* someclass)
{
    if (someclass->pause_test_threads)
    {
        {
            // Even though we use an atomic, mutex must be held during
            // modification. See documentation of condition variable
            // notify_all/wait. Mutex does not need to be held for the actual
            // notify call.
            std::lock_guard<std::mutex> lk(someclass->cv_m);
            someclass->pause_test_threads = false;
        }
        someclass->cv.notify_all(); // Allow send/receive threads to run.
    }
}

void wait_to_start(SomeClass* someclass)
{
    std::unique_lock<std::mutex> lk(someclass->cv_m); // RAII, no need for unlock.
    auto not_paused = [someclass]() { return someclass->pause_test_threads == false; };
    someclass->cv.wait(lk, not_paused);
}
void ctrl_thread(SomeClass* someclass)
{
    // Do startup work
    // ...
    unpause_test_threads(someclass);
    for (;;)
    {
        // ... check for end-program etc.; if so, break;
        if (lost ctrl connection to other endpoint)  // pseudo-code
        {
            do_pause_test_threads(someclass);
        }
        else
        {
            unpause_test_threads(someclass);
        }
        sleep(SLEEP_INTERVAL);
    }
    unpause_test_threads(someclass);
}
void sender_thread(SomeClass* someclass)
{
    wait_to_start(someclass);
    ...
    for (;;)
    {
        // ... check for end-program etc.; if so, break;
        if (someclass->pause_test_threads) wait_to_start(someclass);
        ...
    }
}

void receiver_thread(SomeClass* someclass)
{
    wait_to_start(someclass);
    ...
    for (;;)
    {
        // ... check for end-program etc.; if so, break;
        if (someclass->pause_test_threads) wait_to_start(someclass);
        ...
    }
}
I looked through your code manipulating the condition variable and the atomic, and it seems correct and won't cause problems.
Why you should protect writes to the shared variable even if it is atomic:
There could be problems if the write to the shared variable happens between checking it in the predicate and waiting on the condition. Consider the following:
The waiting thread wakes spuriously, acquires the mutex, checks the predicate and evaluates it to false, so it must wait on the cv again.
The controlling thread sets the shared variable to true.
The controlling thread sends the notification, which is not received by anybody, because there is no thread waiting on the condition variable.
The waiting thread waits on the condition variable. Since the notification was already sent, it waits until the next spurious wakeup, or the next time the controlling thread sends a notification — potentially waiting indefinitely.
Reads from shared atomic variables without locking are generally safe, unless they introduce TOCTOU problems (a sketch below).
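A sketch of the TOCTOU hazard with an atomic counter (illustrative names, not from the question):

std::atomic<int> remaining{1};

// Hazard: the check and the use are two separate atomic operations,
// so another thread can change the value in between.
if (remaining.load() > 0) {
    // another thread may have taken the last item right here
    remaining.fetch_sub(1);  // can drive remaining below zero
}

// Race-free alternative: make check-and-update a single atomic step.
int old = remaining.load();
while (old > 0 && !remaining.compare_exchange_weak(old, old - 1)) {
    // old is refreshed on failure; the loop re-checks the condition
}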
In your case you are reading the shared variable to avoid unnecessary locking and then checking it again after taking the lock (in the condition wait call). It is a valid optimization, called double-checked locking, and I do not see any potential problems here.
You might want to check whether atomic<bool> is lock-free; otherwise you will have even more locks than you would have without it.
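A quick way to check (the static_assert form requires C++17):

#include <atomic>
#include <cassert>

std::atomic<bool> pause_test_threads;
assert(pause_test_threads.is_lock_free());               // run-time check
// static_assert(std::atomic<bool>::is_always_lock_free); // compile-time, C++17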
In general, you want to treat the fact that the variable is atomic independently of how it works with the condition variable.
If all code that interacts with the condition variable follows the usual pattern of locking the mutex before query/modification, and the code interacting with the condition variable does not rely on code that does not interact with it, it will continue to be correct even if the variable it guards happens to be atomic.
From a quick read of your pseudo-code, this appears to be correct. However, pseudo-code is often a poor substitute for real code where multi-threading is concerned.
The "optimization" of only waiting on the condition variable (and locking the mutex) when an atomic read says you might need to may or may not be an optimization. You need to profile throughput.
Atomic data doesn't need additional synchronization; it's the basis of lock-free algorithms and data structures.
void do_pause_test_threads(SomeClass* someclass)
{
    if (!someclass->pause_test_threads)
    {
        /// your pause_test_threads might be changed here by another thread,
        /// so you have to acquire the mutex before checking and changing,
        /// or use atomic methods - compare_exchange_weak/strong,
        /// but not all together
        std::lock_guard<std::mutex> lk(someclass->cv_m);
        someclass->pause_test_threads = true;
    }
}

std::condition_variable spurious blocking

As you know, condition variables should be waited on in a loop to avoid spurious wakeups. Like this:
while (not condition)
    condvar.wait();
If another thread wants to wake up the waiting thread, it must set the condition flag to true. E.g.:
condition = true;
condvar.notify_one();
I wonder, is it possible for the condition variable to be blocked by this scenario:
1) The waiting thread checks the condition flag and finds it equal to FALSE, so it's about to enter the condvar.wait() routine.
2) But just before this (and after the condition flag check) the waiting thread is preempted by the kernel (e.g. because its time slice expired).
3) At this point, another thread wants to notify the waiting thread about the condition. It sets the condition flag to TRUE and calls condvar.notify_one().
4) When the kernel scheduler runs the first thread again, it enters the condvar.wait() routine, but the notification has already been missed.
So the waiting thread can't exit from condvar.wait(), even though the condition flag is set to TRUE, because there are no wakeup notifications anymore.
Is this possible?
That is exactly why a condition variable must be used in conjunction with a mutex, in order to atomically update the state and signal the change. The full code would look more like:
unique_lock<mutex> lock(mutex);
while (not condition)
    condvar.wait(lock);
and for the other thread:
lock_guard<mutex> lock(mutex);
condition = true;
condvar.notify_one();
Your example is missing a small part, which explains why this cannot happen when done correctly:
while (not condition)      // when you check the condition, the mutex is locked
    condvar.wait( mutex ); // wait() atomically unlocks the mutex while sleeping
So if you change the condition to true under the same mutex lock, this situation will not happen.
Mike Seymour's answer is incomplete: there is a race condition that can end up with a lost wakeup.
The right way (now, with C++11) is as follows:
Thread1:
std::unique_lock<std::mutex> lck(myMutex);
condvar.wait(lck, []{ return condition; }); // prevent spurious wakeup
// Process data

Thread2:
{
    std::lock_guard<std::mutex> lck(myMutex);
    condition = true;
} // unlock here! prevent lost wakeup
condvar.notify_one();
Yes (I tested this in December 2012), and there is a solution I conjured up for this a while ago: the "Flare" class.
Note that it uses a spin lock, but the time spent in it is minimal.
Declaration (hpp):

class Flare
{
public:
    /**
        \brief Flare's constructor.
        \param fall_through_first, will skip the first wait() if true.
    */
    Flare(bool fall_through_first = false);

    /**
        \brief Flare's destructor.
        Takes care of removing the object of this class.
    */
    ~Flare();

    /**
        \brief Notifies the same object of availability.
        Any thread waiting on this object will be freed,
        and if the thread was not waiting, it will skip
        wait when it iterates over it.
    */
    void notify();

    /**
        \brief Wait until the next notification.
        If a notification was sent whilst not being
        inside wait, then wait will simply be skipped.
    */
    void wait();

private:
    std::mutex m_mx;                   // Used in the unique_lock,
    std::unique_lock<std::mutex> m_lk; // Used in the cnd_var
    std::condition_variable m_cndvar;
    std::mutex m_in_function, n_mx;    // protection of re-iteration.
    bool m_notifications;
};
Implementation/Definition (cpp):

#include "Flare.hpp"

// PUBLIC:
Flare::Flare(bool fall_through_first)
    :
    m_lk(m_mx),
    m_notifications(!fall_through_first)
{}

Flare::~Flare()
{}

void Flare::notify()
{
    if (m_in_function.try_lock() == true)
    {
        m_notifications = false;
        m_in_function.unlock();
    }
    else // Function is waiting.
    {
        n_mx.lock();
        do
        {
            m_notifications = false;
            m_cndvar.notify_one();
        }
        while (m_in_function.try_lock() == false);
        n_mx.unlock();
        m_in_function.unlock();
    }
}

void Flare::wait()
{
    m_in_function.lock();
    while (m_notifications)
        m_cndvar.wait(m_lk);
    m_in_function.unlock();
    n_mx.lock();
    m_notifications = true;
    n_mx.unlock();
}
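A hypothetical usage sketch (not from the original answer): one thread blocks in wait() until another calls notify(); a notify() issued while nobody is waiting makes the next wait() fall through instead of blocking.

#include <thread>
#include "Flare.hpp"

int main() {
    Flare flare;
    std::thread worker([&flare] {
        flare.wait();  // blocks until notify() below
        // ... react to the event ...
    });
    flare.notify();    // wakes the worker, or pre-arms the next wait()
    worker.join();
}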

Two Condition Variables and avoiding deadlock

I have two condition variables:
CondVar1
CondVar2
Used in two threads like this (pseudo-code):
// thread1 starts in 'waiting' mode, and then Thread2 signals
void Thread1()
{
    CondVar1->Wait();
    CondVar2->Signal();
}

void Thread2()
{
    CondVar1->Signal();
    CondVar2->Wait();
}
Can this cause a deadlock? Meaning: thread1 waits, thread2 signals, and then thread1 signals before thread2 enters Wait(), so thread2 never returns?
Thanks
You don't usually just wait on a condition variable. The common pattern is: hold a lock, check a variable that determines whether you can proceed, and if you cannot, wait on the condition:
// pseudocode
void push( T data ) {
    Guard<Mutex> lock( m_mutex ); // Hold a lock on the queue
    while (m_queue.full())        // [1]
        m_cond1.wait(lock);       // Wait until a consumer leaves a slot for me to write
    // insert data
    m_cond2.signal_one();         // Signal consumers that might be waiting on an empty queue
}
Some things to note: most libraries allow for spurious wakes from condition variables. While it is possible to implement a condition variable that avoids spurious wakes, the cost of the operations would be higher, so it is considered the lesser evil to require users to recheck the state before continuing (the while loop in [1]).
Some libraries, notably C++11, allow you to pass a predicate, and will implement the loop internally: cond.wait(lock, [&queue](){ return !queue.full(); } );
There are two situations that could lead to a deadlock here:
In normal execution, the one you described: it is possible that the condition variable is signaled before the thread reaches the call to Wait, so the signal is lost.
A spurious wake-up could happen, causing the first thread to leave the call to Wait before actually being signaled, hence signaling Thread 2 who is not yet waiting.
You should design your code as follows when using signaling mechanisms:
bool thread1Waits = true;
bool thread2Waits = true;

void Thread1()
{
    while (thread1Waits) CondVar1->Wait();
    thread2Waits = false;
    CondVar2->Signal();
}

void Thread2()
{
    thread1Waits = false;
    CondVar1->Signal();
    while (thread2Waits) CondVar2->Wait();
}
Of course, this assumes there are locks protecting the condition variables and that additionally thread 1 runs before thread 2.
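For concreteness, here is one way that design could look with C++11 primitives, using a single mutex to protect both flags (a sketch, and only one of several valid arrangements):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv1, cv2;
bool thread1Waits = true;
bool thread2Waits = true;

void Thread1() {
    std::unique_lock<std::mutex> lock(m);
    cv1.wait(lock, [] { return !thread1Waits; }); // predicate guards against lost signals
    thread2Waits = false;
    cv2.notify_one();
}

void Thread2() {
    std::unique_lock<std::mutex> lock(m);
    thread1Waits = false;
    cv1.notify_one();
    cv2.wait(lock, [] { return !thread2Waits; });
}

With the flags checked under the mutex, neither thread can miss the other's signal regardless of which runs first.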