I'm having trouble with properly "swapping" locks. Consider this situation:
bool HidDevice::wait(const std::function<bool(const Info&)>& predicate)
/* A method scoped lock. */
std::unique_lock waitLock(this->waitMutex, std::defer_lock);
/* A scoped, general access, lock. */
std::lock_guard lock(this->mutex);
bool exitEarly = false;
/* do some checks... */
if (exitEarly)
return false;
/* Only one thread at a time can execute this method, however
other threads can execute other methods or abort this one. Thus,
general access mutex "this->mutex" should be unlocked (to allow threads
to call other methods) while at the same time, "this->waitMutex" should
be locked to prevent multiple executions of code below. */
waitLock.lock(); // How do I release "this->mutex" here?
/* do some stuff... */
/* The main problem is with this event based OS function. It can
only be called once with the data I provide, therefore I need to
have a 2 locks - one blocks multiple method calls (the usual stuff)
and "waitLock" makes sure that only one instance of "osBlockingFunction"
is running at a time. Since this is a thread-blocking function,
"this->mutex" must be unlocked at this point. */
bool result = osBlockingFunction(...);
/* In methods, such as "close", "this->waitMutex" and others are then used
to make sure that thread blocking methods have returned and I can safely
modify related data. */
/* do some more stuff... */
return result;
How could I solve this "swapping" problem without overly complicating code? I could unlock this->mutex before locking another, however I'm afraid that in that nanosecond, a race condition might occur.
Imagine that 3 threads are calling wait method. The first one will lock this->mutex, then this->waitMutex and then will unlock this->mutex. The second one will lock this->mutex and will have to wait for this->waitMutex to be available. It will not unlock this->mutex. The third one will get stuck on locking this->mutex.
I would like to get the last 2 threads to wait for this->waitMutex to be available.
Edit 2:
Expanded example with osBlockingFunction.

It smells like the design/implementation should be a bit different: use a std::condition_variable cv in HidDevice::wait and only one mutex. Since you write that "other threads can execute other methods or abort this one", those threads can call cv.notify_one to "abort" this wait. cv.wait atomically unlocks the mutex and enters the wait; after a notification it leaves the wait and relocks the mutex before returning. That way HidDevice::wait becomes simpler:
bool HidDevice::wait(const std::function<bool(const Info&)>& predicate)
{
    std::unique_lock<std::mutex> lock(this->m_Mutex); // Only one mutex.
    m_bEarlyExit = false;
    this->cv.wait(lock, /* spurious wake-up check */);
    if (m_bEarlyExit) // A bool data-member for abort.
        return false;
    /* do some stuff... */
}
My assumption (going by the name of the function) is that at /* do some checks... */ the thread waits until some condition becomes true.
"Aborting" the wait is then the responsibility of another HidDevice function, called by another thread:
void HidDevice::do_some_checks() /* do some checks... */
if ( some checks )
if ( other checks )
m_bEarlyExit = true;
Something similar to that.
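The single-mutex idea above can be sketched as a small compilable example. This is only an illustration under assumptions: Device, signal_event, abort and run_wait are hypothetical names, not part of the HidDevice code in the question, and the abort flag is deliberately left set once raised so that an abort issued before wait() starts is not lost.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for HidDevice; all names here are illustrative.
class Device {
public:
    // Blocks until signal_event() or abort() has been called.
    // Returns true if an event arrived, false if the wait was aborted.
    bool wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate filters out spurious wake-ups. The mutex is released
        // while sleeping and is held again when wait() returns.
        cv_.wait(lock, [this] { return event_ || aborted_; });
        if (aborted_) return false;
        event_ = false;  // consume the event
        return true;
    }

    void signal_event() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            event_ = true;  // modify the guarded state under the mutex
        }
        cv_.notify_one();
    }

    void abort() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            aborted_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;  // the only mutex
    std::condition_variable cv_;
    bool event_ = false;
    bool aborted_ = false;
};

// Runs wait() on a worker thread and wakes it from the calling thread.
bool run_wait(bool abort_it) {
    Device dev;
    bool result = false;
    std::thread waiter([&] { result = dev.wait(); });
    if (abort_it) dev.abort(); else dev.signal_event();
    waiter.join();
    return result;
}
```

Because both flags are written under the mutex and checked in the predicate, the outcome does not depend on whether the waker runs before or after the waiter reaches cv_.wait.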

I recommend creating a little "unlocker" facility. This is a mutex wrapper with inverted semantics. On lock it unlocks and vice-versa:
template <class Lock>
class unlocker
{
    Lock& locked_;
public:
    unlocker(Lock& lk) : locked_{lk} {}
    void lock() {locked_.unlock();}
    bool try_lock() {locked_.unlock(); return true;}
    void unlock() {locked_.lock();}
};
Now in place of:
waitLock.lock(); // How do I release "this->mutex" here?
You can instead say:
unlocker temp{lock};
std::lock(waitLock, temp);
where lock is a unique_lock instead of a lock_guard holding mutex.
This will lock waitLock and unlock mutex as if by one uninterruptible instruction.
And now, after coding all of that, I can reason that it can be transformed into:
waitLock.lock();
lock.unlock(); // lock must be a unique_lock to do this
Whether the first version is more or less readable is a matter of opinion. The first version is easier to reason about (once one knows what std::lock does). But the second one is simpler. But with the second, the reader has to think more carefully about the correctness.
Just read the edit in the question. This solution does not fix the problem in the edit: The second thread will block the third (and following threads) from making progress in any code that requires mutex but not waitMutex, until the first thread releases waitMutex.
So in this sense, my answer is technically correct, but does not satisfy the desired performance characteristics. I'll leave it up for informational purposes.


An odd use of a condition variable with a local mutex

Poring through the legacy code of an old and large project, I found an odd method of creating a thread-safe queue, something like this:
template < typename _Msg>
class WaitQue: public QWaitCondition
typedef _Msg DataType;
void wakeOne(const DataType& msg)
QMutexLocker lock_(&mx);
void wait(DataType& msg)
/// wait if empty.
QMutex wx; // WHAT?
QMutexLocker cvlock_(&wx);
if (que.empty())
QMutexLocker _wlock(&mx);
msg = que.front();
unsigned long size() {
QMutexLocker lock_(&mx);
return que.size();
std::queue<DataType> que;
QMutex mx;
wakeOne is used from threads as a kind of "posting" function, and wait is called from other threads and waits indefinitely until a message appears in the queue. In some cases the roles of the threads reverse at different stages, using separate queues.
Is it even legal to use a QMutex this way, by creating a local one? I kind of understand why someone might do that to dodge a deadlock while reading the size of que, but how does it even work? Is there a simpler and more idiomatic way to achieve this behavior?
It's legal to use a local mutex with a condition variable, but it normally makes no sense.
As you've worked out, in this case it is wrong. You should be using the member:
void wait(DataType& msg)
{
    QMutexLocker cvlock_(&mx);
    while (que.empty())
        QWaitCondition::wait(&mx);
    msg = que.front();
    que.pop();
}
Notice also that you must have while instead of if around the call to QWaitCondition::wait. One reason is (possible) spurious wake-ups; the Qt docs aren't clear on this. More importantly, the wake-up and the subsequent reacquisition of the mutex are not one atomic operation, so you must recheck the queue for emptiness. This last case could be where you previously were getting deadlocks/UB.
Consider the scenario of an empty queue and a caller (thread 1) to wait into QWaitCondition::wait. This thread blocks. Then thread 2 comes along and adds an item to the queue and calls wakeOne. Thread 1 gets woken up and tries to reacquire the mutex. However, thread 3 comes along in your implementation of wait, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1 which has been woken up finally acquires the mutex, returns from QWaitCondition::wait and tries to process... an empty queue. Yikes.
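For comparison, here is a minimal sketch of the same queue using only the standard library, so it can be compiled without Qt. WaitQueue, push, pop and demo_sum are names chosen for this illustration, not taken from the legacy code. The predicate overload of wait re-checks emptiness after every wake-up, which covers the three-thread scenario described above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

template <typename T>
class WaitQueue {
public:
    void push(const T& msg) {
        {
            std::lock_guard<std::mutex> lock(mx_);
            que_.push(msg);
        }
        cv_.notify_one();  // wake one waiting consumer, if any
    }

    // Blocks until a message is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(mx_);
        // Same as: while (que_.empty()) cv_.wait(lock);
        cv_.wait(lock, [this] { return !que_.empty(); });
        T msg = que_.front();
        que_.pop();
        return msg;
    }

private:
    std::mutex mx_;  // one member mutex guards the queue and is used for waiting
    std::condition_variable cv_;
    std::queue<T> que_;
};

// One producer and two competing consumers; returns the sum of all values consumed.
int demo_sum(int count) {
    WaitQueue<int> q;
    std::mutex sum_mx;
    int sum = 0;
    auto consume = [&](int n) {
        for (int k = 0; k < n; ++k) {
            int v = q.pop();
            std::lock_guard<std::mutex> lock(sum_mx);
            sum += v;
        }
    };
    std::thread c1(consume, count / 2);
    std::thread c2(consume, count - count / 2);
    for (int v = 1; v <= count; ++v) q.push(v);
    c1.join();
    c2.join();
    return sum;
}
```

A consumer that loses the race for an item simply goes back to sleep inside wait instead of reading from an empty queue.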

Using std::condition_variable with atomic<bool>

There are several questions on SO dealing with atomic, and others that deal with std::condition_variable. My question is whether my use below is correct.
Three threads, one ctrl thread that does preparation work before unpausing the two other threads. The ctrl thread also is able to pause the worker threads (sender/receiver) while they are in their tight send/receive loops.
The idea with using the atomic is to make the tight loops faster in case the boolean for pausing is not set.
class SomeClass
// Disregard that data is public...
std::condition_variable cv; // UDP threads will wait on this cv until allowed
// to run by ctrl thread.
std::mutex cv_m;
std::atomic<bool> pause_test_threads;
void do_pause_test_threads(SomeClass *someclass)
if (!someclass->pause_test_threads)
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;
void unpause_test_threads(SomeClass *someclass)
if (someclass->pause_test_threads)
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = false;
someclass->cv.notify_all(); // Allow send/receive threads to run.
void wait_to_start(SomeClass *someclass)
std::unique_lock<std::mutex> lk(someclass->cv_m); // RAII, no need for unlock.
auto not_paused = [someclass](){return someclass->pause_test_threads == false;};
someclass->cv.wait(lk, not_paused);
void ctrl_thread(SomeClass *someclass)
// Do startup work
// ...
for (;;)
// ... check for end-program etc, if so, break;
if (lost ctrl connection to other endpoint)
void sender_thread(SomeClass *someclass)
for (;;)
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
void receiver_thread(SomeClass *someclass)
for (;;)
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
I looked through your code that manipulates the condition variable and the atomic; it seems correct and should not cause problems.
Why you should protect writes to a shared variable even if it is atomic:
There could be problems if a write to the shared variable happens between checking it in the predicate and waiting on the condition. Consider the following:
The waiting thread wakes spuriously, acquires the mutex, checks the predicate and evaluates it to false, so it must wait on the cv again.
The controlling thread sets the shared variable to true.
The controlling thread sends a notification, which is not received by anybody, because no thread is waiting on the condition variable.
The waiting thread waits on the condition variable. Since the notification was already sent, it will wait until the next spurious wake-up, or until the next time the controlling thread sends a notification. It could potentially wait indefinitely.
Reading shared atomic variables without locking is generally safe, unless it introduces TOCTOU problems.
In your case you read the shared variable to avoid unnecessary locking and then check it again after locking (in the condition variable's wait call). It is a valid optimisation, called double-checked locking, and I do not see any potential problems here.
You might want to check whether atomic<bool> is lock-free. Otherwise you will have even more locks than you would have without it.
In general, you want to treat the fact that variable is atomic independently of how it works with a condition variable.
If all code that interacts with the condition variable follows the usual pattern of locking the mutex before query/modification, and the code interacting with the condition variable does not rely on code that does not interact with the condition variable, it will continue to be correct even if it wraps an atomic mutex.
From a quick read of your pseudo-code, this appears to be correct. However, pseudo-code is often a poor substitute for real code for multi-threaded code.
The "optimization" of only waiting on the condition variable (and locking the mutex) when an atomic read says you might want to may or may not be an optimization. You need to profile throughput.
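The pattern discussed in these answers (an unlocked atomic read as a fast path, with the write and the real check done under the mutex) might be sketched like this. PauseGate, wait_if_paused and run_worker are hypothetical names for illustration; whether the fast path actually pays off still has to be measured.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> lk(m_);
        paused_ = true;
    }

    void resume() {
        {
            // The write happens under the mutex even though the flag is
            // atomic, so it cannot slip in between a waiter's predicate
            // check and its going to sleep.
            std::lock_guard<std::mutex> lk(m_);
            paused_ = false;
        }
        cv_.notify_all();  // the notify itself does not need the mutex
    }

    // Called from a worker's tight loop.
    void wait_if_paused() {
        if (!paused_.load()) return;  // fast path: no lock taken
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !paused_.load(); });  // re-check under the lock
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::atomic<bool> paused_{false};
};

// Starts a worker behind a closed gate, opens the gate, and returns the
// number of loop iterations the worker completed.
long run_worker(int iterations) {
    PauseGate gate;
    gate.pause();
    long steps = 0;
    std::thread worker([&] {
        for (int i = 0; i < iterations; ++i) {
            gate.wait_if_paused();
            ++steps;
        }
    });
    gate.resume();
    worker.join();
    return steps;
}
```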
Atomic data doesn't need additional synchronization; it is the basis of lock-free algorithms and data structures.
void do_pause_test_threads(SomeClass *someclass)
if (!someclass->pause_test_threads)
/// your pause_test_threads might be changed here by other thread
/// so you have to acquire mutex before checking and changing
/// or use atomic methods - compare_exchange_weak/strong,
/// but not all together
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;

Why is there no wait function for condition_variable which does not relock the mutex

Consider the following example.
std::mutex mtx;
std::condition_variable cv;
void f()
std::unique_lock<std::mutex> lock( mtx );
cv.wait( lock ); // 1
std::cout << "f()\n";
void g()
std::this_thread::sleep_for( 1s );
int main()
std::thread t1{ f };
std::thread t2{ g };
g() "knows" that f() is waiting in the scenario I would like to discuss.
According to cppreference.com there is no need for g() to lock the mutex before calling notify_one. Now in the line marked "1" cv will release the mutex and relock it once the notification is sent. The destructor of lock releases it again immediately after that. This seems to be superfluous especially since locking is expensive. (I know in certain scenarios the mutex needs to be locked. But this is not the case here.)
Why does condition_variable have no function "wait_nolock" which does not relock the mutex once the notification arrives? If the answer is that pthreads do not provide such functionality: Why can't pthreads be extended to provide it? Is there an alternative for realizing the desired behavior?
You misunderstand what your code does.
Your code on line // 1 is free to not block at all. condition_variables can (and will!) have spurious wakeups -- they can wake up for no good reason at all.
You are responsible for checking if the wakeup is spurious.
Using a condition_variable properly requires 3 things:
A condition_variable
A mutex
Some data guarded by the mutex
The data guarded by the mutex is modified (under the mutex). Then (with the mutex possibly disengaged), the condition_variable is notified.
On the other end, you lock the mutex, then wait on the condition variable. When you wake up, your mutex is relocked, and you test if the wakeup is spurious by looking at the data guarded by the mutex. If it is a valid wakeup, you process and proceed.
If it wasn't a valid wakeup, you go back to waiting.
In your case, you don't have any data guarded, you cannot distinguish spurious wakeups from real ones, and your design is incomplete.
Not surprisingly with the incomplete design you don't see the reason why the mutex is relocked: it is relocked so you can safely check the data to see if the wakeup was spurious or not.
As for why condition variables are designed that way: probably because this design is more efficient than a "reliable" one (for whatever reason), and rather than exposing higher-level primitives, C++ exposed the lower-level, more efficient ones.
Building a higher level abstraction on top of this isn't hard, but there are design decisions. Here is one built on top of std::experimental::optional:
template<class T>
struct data_passer {
  std::experimental::optional<T> data;
  bool abort_flag = false;
  std::mutex guard;
  std::condition_variable signal;
  void send( T t ) {
    {
      std::unique_lock<std::mutex> _(guard);
      data = std::move(t);
    }
    signal.notify_one();
  }
  void abort() {
    {
      std::unique_lock<std::mutex> _(guard);
      abort_flag = true;
    }
    signal.notify_all();
  }
  std::experimental::optional<T> get() {
    std::unique_lock<std::mutex> _(guard);
    signal.wait( _, [this]()->bool{
      return data || abort_flag;
    });
    if (abort_flag) return {};
    T retval = std::move(*data);
    data = {};
    return retval;
  }
};
Now, each send can cause a get to succeed at the other end. If more than one send occurs, only the latest one is consumed by a get. If and when abort_flag is set, instead get() immediately returns {};
The above supports multiple consumers and producers.
An example of how the above might be used is a source of preview state (say, a UI thread), and one or more preview renderers (which are not fast enough to be run in the UI thread).
The preview state dumps a preview state into the data_passer<preview_state> willy-nilly. The renderers compete and one of them grabs it. Then they render it, and pass it back (through whatever mechanism).
If the preview states come faster than the renderers consume them, only the most recent one is of interest, so the earlier ones are discarded. But existing previews aren't aborted just because a new state shows up.
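The "only the latest value is consumed" behaviour can be checked with a small C++17 port of data_passer. This sketch assumes std::optional in place of std::experimental::optional; the two helper functions at the bottom are hypothetical and exist only to exercise the class from a single thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <utility>

template <class T>
struct data_passer {
    std::optional<T> data;
    bool abort_flag = false;
    std::mutex guard;
    std::condition_variable signal;

    void send(T t) {
        {
            std::unique_lock<std::mutex> lk(guard);
            data = std::move(t);  // overwrites any value not yet consumed
        }
        signal.notify_one();
    }

    void abort() {
        {
            std::unique_lock<std::mutex> lk(guard);
            abort_flag = true;
        }
        signal.notify_all();
    }

    std::optional<T> get() {
        std::unique_lock<std::mutex> lk(guard);
        // Returns at once if data or abort_flag is already set.
        signal.wait(lk, [this] { return data || abort_flag; });
        if (abort_flag) return std::nullopt;
        T retval = std::move(*data);
        data.reset();
        return retval;
    }
};

// Two sends before any get: only the most recent value is observed.
int latest_of_two_sends() {
    data_passer<int> p;
    p.send(1);
    p.send(2);
    return *p.get();
}

// After abort(), get() yields an empty optional even if data is pending.
bool get_after_abort_is_empty() {
    data_passer<int> p;
    p.send(1);
    p.abort();
    return !p.get().has_value();
}
```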
Questions were asked below about race conditions.
If the data being communicated is atomic, can't we do without the mutex on the "send" side?
So something like this:
template<class T>
struct data_passer {
std::atomic<std::experimental::optional<T>> data;
std::atomic<bool> abort_flag = false;
std::mutex guard;
std::condition_variable signal;
void send( T t ) {
data = std::move(t); // 1a
signal.notify_one(); // 1b
void abort() {
abort_flag = true; // 1a
signal.notify_all(); // 1b
std::experimental::optional<T> get() {
std::unique_lock<std::mutex> _(guard); // 2a
signal.wait( _, [this]()->bool{ // 2b
return data.load() || abort_flag.load(); // 2c
if (abort_flag.load()) return {};
T retval = std::move(*data.load());
// data = std::experimental::nullopt; // doesn't make sense
return retval;
The above fails to work.
We start with the listening thread. It does step 2a, then waits (2b). It evaluates the condition at step 2c, but doesn't return from the lambda yet.
The broadcasting thread then does step 1a (setting the data), then signals the condition variable. At this moment, nobody is waiting on the condition variable (the code in the lambda doesn't count!).
The listening thread then finishes the lambda, and returns "spurious wakeup". It then blocks on the condition variable, and never notices that data was sent.
The std::mutex used while waiting on the condition variable must guard the write to the data "passed" by the condition variable (whatever test you do to determine if the wakeup was spurious), and the read (in the lambda), or the possibility of "lost signals" exists. (At least in a simple implementation: more complex implementations can create lock-free paths for "common cases" and only use the mutex in a double-check. This is beyond the scope of this question.)
Using atomic variables does not get around this problem, because the two operations of "determine if the message was spurious" and "rewait in the condition variable" must be atomic with regards to the "spuriousness" of the message.

Do I have to acquire lock before calling condition_variable.notify_one()?

I am a bit confused about the use of std::condition_variable. I understand I have to create a unique_lock on a mutex before calling condition_variable.wait(). What I cannot find is whether I should also acquire a unique lock before calling notify_one() or notify_all().
Examples on cppreference.com are conflicting. For example, the notify_one page gives this example:
#include <iostream>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <chrono>

std::condition_variable cv;
std::mutex cv_m;
int i = 0;
bool done = false;

void waits()
{
    std::unique_lock<std::mutex> lk(cv_m);
    std::cout << "Waiting... \n";
    cv.wait(lk, []{return i == 1;});
    std::cout << "...finished waiting. i == 1\n";
    done = true;
}

void signals()
{
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "Notifying...\n";
    cv.notify_one();

    std::unique_lock<std::mutex> lk(cv_m);
    i = 1;
    while (!done) {
        lk.unlock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        lk.lock();
        std::cerr << "Notifying again...\n";
        cv.notify_one();
    }
}

int main()
{
    std::thread t1(waits), t2(signals);
    t1.join(); t2.join();
}
Here the lock is not acquired for the first notify_one(), but is acquired for the second notify_one(). Looking through other pages with examples I see different things, mostly not acquiring the lock.
Can I choose myself to lock the mutex before calling notify_one(), and why would I choose to lock it?
In the example given, why is there no lock for the first notify_one(), but there is for subsequent calls. Is this example wrong or is there some rationale?
You do not need to be holding a lock when calling condition_variable::notify_one(), but it's not wrong in the sense that it's still well defined behavior and not an error.
However, it might be a "pessimization" since whatever waiting thread is made runnable (if any) will immediately try to acquire the lock that the notifying thread holds. I think it's a good rule of thumb to avoid holding the lock associated with a condition variable while calling notify_one() or notify_all(). See Pthread Mutex: pthread_mutex_unlock() consumes lots of time for an example where releasing a lock before calling the pthread equivalent of notify_one() improved performance measurably.
Keep in mind that the lock() call in the while loop is necessary at some point, because the lock needs to be held during the while (!done) loop condition check. But it doesn't need to be held for the call to notify_one().
2016-02-27: Large update to address some questions in the comments about whether there's a race condition if the lock isn't held for the notify_one() call. I know this update is late because the question was asked almost two years ago, but I'd like to address @Cookie's question about a possible race condition if the producer (signals() in this example) calls notify_one() just before the consumer (waits() in this example) is able to call wait().
The key is what happens to i - that's the object that actually indicates whether or not the consumer has "work" to do. The condition_variable is just a mechanism to let the consumer efficiently wait for a change to i.
The producer needs to hold the lock when updating i, and the consumer must hold the lock while checking i and calling condition_variable::wait() (if it needs to wait at all). In this case, the key is that it must be the same instance of holding the lock (often called a critical section) when the consumer does this check-and-wait. Since the critical section is held when the producer updates i and when the consumer checks-and-waits on i, there is no opportunity for i to change between when the consumer checks i and when it calls condition_variable::wait(). This is the crux for a proper use of condition variables.
The C++ standard says that condition_variable::wait() behaves like the following when called with a predicate (as in this case):
while (!pred())
    wait(lock);
There are two situations that can occur when the consumer checks i:
if i is 0 then the consumer calls cv.wait(), then i will still be 0 when the wait(lock) part of the implementation is called - the proper use of the locks ensures that. In this case the producer has no opportunity to call the condition_variable::notify_one() in its while loop until after the consumer has called cv.wait(lk, []{return i == 1;}) (and the wait() call has done everything it needs to do to properly 'catch' a notify - wait() won't release the lock until it has done that). So in this case, the consumer cannot miss the notification.
if i is already 1 when the consumer calls cv.wait(), the wait(lock) part of the implementation will never be called because the while (!pred()) test will cause the internal loop to terminate. In this situation it doesn't matter when the call to notify_one() occurs - the consumer will not block.
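The two situations can be reproduced with a hand-written expansion of the predicate overload. This is a sketch only: wait_expanded, already_set and set_later are hypothetical helpers, and the sleep counter is added purely to make the second situation observable.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hand-written equivalent of cv.wait(lk, pred).
// Returns how many times the plain wait(lk) was entered.
template <class Pred>
int wait_expanded(std::condition_variable& cv,
                  std::unique_lock<std::mutex>& lk, Pred pred) {
    int sleeps = 0;
    while (!pred()) {
        cv.wait(lk);  // releases lk while blocked, holds it again on return
        ++sleeps;
    }
    return sleeps;
}

// i is already 1: the loop body never runs, so no notification is needed.
int already_set() {
    std::mutex m;
    std::condition_variable cv;
    int i = 1;
    std::unique_lock<std::mutex> lk(m);
    return wait_expanded(cv, lk, [&] { return i == 1; });
}

// i starts at 0: the producer updates it under the mutex and notifies after
// unlocking. The consumer sees i == 1 whichever thread runs first.
int set_later() {
    std::mutex m;
    std::condition_variable cv;
    int i = 0;
    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lk(m);
            i = 1;
        }
        cv.notify_one();  // lock not held for the notify
    });
    int seen = 0;
    {
        std::unique_lock<std::mutex> lk(m);
        wait_expanded(cv, lk, [&] { return i == 1; });
        seen = i;
    }
    producer.join();  // cv and m outlive the producer thread
    return seen;
}
```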
The example here does have the additional complexity of using the done variable to signal back to the producer thread that the consumer has recognized that i == 1, but I don't think this changes the analysis at all because all of the access to done (for both reading and modifying) are done while in the same critical sections that involve i and the condition_variable.
If you look at the question that @eh9 pointed to, Sync is unreliable using std::atomic and std::condition_variable, you will see a race condition. However, the code posted in that question violates one of the fundamental rules of using a condition variable: It does not hold a single critical section when performing a check-and-wait.
In that example, the code looks like:
if (--f->counter == 0)      // (1)
    // we have zeroed this fence's counter, wake up everyone that waits
    f->resume.notify_all(); // (2)
else
{
    unique_lock<mutex> lock(f->resume_mutex);
    f->resume.wait(lock);   // (3)
}
You will notice that the wait() at #3 is performed while holding f->resume_mutex. But the check for whether or not the wait() is necessary at step #1 is not done while holding that lock at all (much less continuously for the check-and-wait), which is a requirement for proper use of condition variables. I believe that the person who has the problem with that code snippet thought that since f->counter was a std::atomic type this would fulfill the requirement. However, the atomicity provided by std::atomic doesn't extend to the subsequent call to f->resume.wait(lock). In this example, there is a race between when f->counter is checked (step #1) and when the wait() is called (step #3).
That race does not exist in this question's example.
As others have pointed out, you do not need to be holding the lock when calling notify_one(), in terms of race conditions and threading-related issues. However, in some cases, holding the lock may be required to prevent the condition_variable from getting destroyed before notify_one() is called. Consider the following example:
std::thread t;

void foo() {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    t = std::thread([&]() {
        {
            std::lock_guard<std::mutex> l(m);  // (1)
            done = true;  // (2)
        }  // (3)
        cv.notify_one();  // (4)
    });  // (5)

    std::unique_lock<std::mutex> lock(m);  // (6)
    cv.wait(lock, [&done]() { return done; });  // (7)
}

int main() {
    foo();  // (8)
    t.join();  // (9)
}
Assume there is a context switch to the newly created thread t after we created it but before we start waiting on the condition variable (somewhere between (5) and (6)). The thread t acquires the lock (1), sets the predicate variable (2) and then releases the lock (3). Assume there is another context switch right at this point before notify_one() (4) is executed. The main thread acquires the lock (6) and executes line (7), at which point the predicate returns true and there is no reason to wait, so it releases the lock and continues. foo returns (8) and the variables in its scope (including cv) are destroyed. Before thread t could join the main thread (9), it has to finish its execution, so it continues from where it left off to execute cv.notify_one() (4), at which point cv is already destroyed!
The possible fix in this case is to keep holding the lock when calling notify_one (i.e. remove the scope ending in line (3)). By doing so, we ensure that thread t calls notify_one before cv.wait can check the newly set predicate variable and continue, since it would need to acquire the lock, which t is currently holding, to do the check. So, we ensure that cv is not accessed by thread t after foo returns.
To summarize, the problem in this specific case is not really about threading, but about the lifetimes of the variables captured by reference. cv is captured by reference via thread t, hence you have to make sure cv stays alive for the duration of the thread's execution. The other examples presented here do not suffer from this issue, because condition_variable and mutex objects are defined in the global scope, hence they are guaranteed to be kept alive until the program exits.
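The fix described above (keep holding the lock while calling notify_one) might look like the following sketch. foo_fixed and run_many are hypothetical names, and the thread is returned to the caller instead of being stored in a global so that the example stays self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Variant of foo() in which the worker notifies while still holding the mutex.
// The waiting thread cannot pass its predicate check until the worker has
// released m, and by then the worker has already finished using cv.
std::thread foo_fixed() {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread t([&] {
        std::lock_guard<std::mutex> l(m);
        done = true;
        cv.notify_one();  // m is still held here
    });                   // m is released when l goes out of scope

    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&done] { return done; });
    return t;  // the caller joins; the worker no longer touches cv
}

// Repeats the scenario to give an unlucky interleaving a chance to show up.
int run_many(int rounds) {
    int completed = 0;
    for (int r = 0; r < rounds; ++r) {
        std::thread t = foo_fixed();
        t.join();
        ++completed;
    }
    return completed;
}
```

Joining the thread before foo returns would be another way to keep the captured variables alive, at the cost of changing where the join happens.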
Using vc10 and Boost 1.56 I implemented a concurrent queue pretty much like this blog post suggests. The author unlocks the mutex to minimize contention, i.e., notify_one() is called with the mutex unlocked:
void push(const T& item)
std::unique_lock<std::mutex> mlock(mutex_);
mlock.unlock(); // unlock before notification to minimize mutex contention
cond_.notify_one(); // notify one waiting thread
Unlocking the mutex is backed by an example in the Boost documentation:
void prepare_data_for_processing()
boost::lock_guard<boost::mutex> lock(mut);
Still this led to the following erratic behaviour:
while notify_one() has not been called yet cond_.wait() can still be interrupted via boost::thread::interrupt()
once notify_one() was called for the first time cond_.wait() deadlocks; the wait cannot be ended by boost::thread::interrupt() or boost::condition_variable::notify_*() anymore.
Removing the line mlock.unlock() made the code work as expected (notifications and interrupts end the wait). Note that notify_one() is called with the mutex still locked, it is unlocked right afterwards when leaving the scope:
void push(const T& item)
std::lock_guard<std::mutex> mlock(mutex_);
cond_.notify_one(); // notify one waiting thread
That means that at least with my particular thread implementation the mutex must not be unlocked before calling boost::condition_variable::notify_one(), although both ways seem correct.
@Michael Burr is correct. condition_variable::notify_one does not require a lock on the variable. Nothing prevents you from using a lock in that situation though, as the example illustrates.
In the given example, the lock is motivated by the concurrent use of the variable i. Because the signals thread modifies the variable, it needs to ensure that no other thread accesses it during that time.
Locks are used for any situation requiring synchronization, I don't think we can state it in a more general way.
In some cases, the mutex associated with the cv may be held (locked) by another thread. You then need to acquire the lock and release it before calling notify_*().
If you do not, the notify_*() may not take effect at all.
Just adding this answer because I think the accepted answer might be misleading. In all cases you will need to lock the mutex, prior to calling notify_one() somewhere for your code to be thread-safe, although you might unlock it again before actually calling notify_*().
To clarify, you MUST take the lock before entering wait(lk) because wait() unlocks lk and it would be Undefined Behavior if the lock wasn't locked. This is not the case with notify_one(), but you need to make sure you won't call notify_*() before entering wait() and having that call unlock the mutex; which obviously only can be done by locking that same mutex before you call notify_*().
For example, consider the following case:
std::atomic_int count;
std::mutex cancel_mutex;
std::condition_variable cancel_cv;
void stop()
if (count.fetch_sub(1) == -999) // Reached -1000 ?
bool start()
if (count.fetch_add(1) >= 0)
return true;
// Failure.
return false;
void cancel()
if (count.fetch_sub(1000) == 0) // Reached -1000?
// Wait till count reached -1000.
std::unique_lock<std::mutex> lk(cancel_mutex);
Warning: this code contains a bug.
The idea is the following: threads call start() and stop() in pairs, but only as long as start() returned true. For example:
if (start())
// Do stuff
One (other) thread at some point will call cancel() and after returning from cancel() will destroy objects that are needed at 'Do stuff'. However, cancel() is supposed not to return while there are threads between start() and stop(), and once cancel() executed its first line, start() will always return false, so no new threads will enter the 'Do stuff' area.
Works right?
The reasoning is as follows:
1) If any thread successfully executes the first line of start() (and therefore will return true) then no thread did execute the first line of cancel() yet (we assume that the total number of threads is much smaller than 1000 by the way).
2) Also, while a thread successfully executed the first line of start(), but not yet the first line of stop() then it is impossible that any thread will successfully execute the first line of cancel() (note that only one thread ever calls cancel()): the value returned by fetch_sub(1000) will be larger than 0.
3) Once a thread executed the first line of cancel(), the first line of start() will always return false and a thread calling start() will not enter the 'Do stuff' area anymore.
4) The number of calls to start() and stop() are always balanced, so after the first line of cancel() is unsuccessfully executed, there will always be a moment where a (the last) call to stop() causes count to reach -1000 and therefore notify_one() to be called. Note that can only ever happen when the first line of cancel resulted in that thread to fall through.
Apart from a starvation problem where so many threads are calling start()/stop() that count never reaches -1000 and cancel() never returns, which one might accept as "unlikely and never lasting long", there is another bug:
It is possible that there is one thread inside the 'Do stuff' area, let's say it is just calling stop(); at that moment a thread executes the first line of cancel(), reading the value 1 with the fetch_sub(1000) and falling through. But before it takes the mutex and/or does the call to wait(lk), the first thread executes the first line of stop(), reads -999 and calls cv.notify_one()!
Then this call to notify_one() is done BEFORE we are wait()-ing on the condition variable! And the program would indefinitely dead-lock.
For this reason it must not be possible to call notify_one() until wait() has been called. Note that the power of a condition variable lies in the fact that wait() atomically unlocks the mutex and goes to sleep, so a thread that takes the mutex afterwards cannot slip a notify_one() in between. You can't fool it, but you do need to keep the mutex locked whenever you change variables that might turn the condition from false to true, and to have locked it (at least briefly) before calling notify_one(), because of race conditions like the one described here.
In this example there is no condition however. Why didn't I use as condition 'count == -1000'? Because that isn't interesting at all here: as soon as -1000 is reached at all, we are sure that no new thread will enter the 'Do stuff' area. Moreover, threads can still call start() and will increment count (to -999 and -998 etc) but we don't care about that. The only thing that matters is that -1000 was reached - so that we know for sure that there are no threads anymore in the 'Do stuff' area. We are sure that this is the case when notify_one() is being called, but how to make sure we don't call notify_one() before cancel() locked its mutex? Just locking cancel_mutex shortly prior to notify_one() isn't going to help of course.
The problem is that, even though we're not waiting on a condition, there still is a condition, and we need to lock the mutex
1) before that condition is reached, and
2) before we call notify_one().
The correct code therefore becomes:
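void stop() {
    if (count.fetch_sub(1) == -999) { // Reached -1000 ?
        cancel_mutex.lock();   // blocks until cancel() is inside wait()
        cancel_mutex.unlock(); // release again before notifying
        cv.notify_one();
    }
}
[...same start()...]
void cancel() {
    std::unique_lock<std::mutex> lk(cancel_mutex);
    if (count.fetch_sub(1000) != 0) // threads still in 'Do stuff'
        cv.wait(lk);
}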
Of course this is just one example, but other cases are very much alike; in almost all cases where you use a condition variable you will need to have that mutex locked (briefly) before calling notify_one(), or else it is possible that you call it before wait() has been called.
Note that I unlocked the mutex prior to calling notify_one() in this case, because otherwise there is the (small) chance that the call to notify_one() wakes up the thread waiting for the condition variable which then will try to take the mutex and block, before we release the mutex again. That's just slightly slower than needed.
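For anyone who wants to experiment with this scheme, here is a minimal sketch under a few assumptions of my own. The names Gate, drained and run_gate_demo are hypothetical, and the body of start() is only inferred from the reasoning above. It also deviates from the answer's code in one respect: "reached -1000" is recorded in a bool guarded by cancel_mutex and cancel() waits with a predicate, which additionally tolerates spurious wake-ups.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper around the start()/stop()/cancel() scheme.
class Gate {
public:
    // Assumed body: succeeds as long as cancel() has not subtracted 1000 yet.
    bool start() {
        if (count.fetch_add(1) >= 0)
            return true;
        stop();  // balance the increment of a refused start()
        return false;
    }

    void stop() {
        if (count.fetch_sub(1) == -999) {  // this call reached -1000
            {
                // Blocks until cancel() is inside wait() (or has returned).
                std::lock_guard<std::mutex> lk(cancel_mutex);
                drained = true;
            }
            cv.notify_one();  // the mutex is already released here
        }
    }

    // Must be called once, by a single thread.
    void cancel() {
        std::unique_lock<std::mutex> lk(cancel_mutex);
        if (count.fetch_sub(1000) == 0)
            return;  // nobody was in the 'Do stuff' area
        cv.wait(lk, [this] { return drained; });
    }

private:
    std::atomic<int> count{0};
    std::mutex cancel_mutex;
    std::condition_variable cv;
    bool drained = false;  // guarded by cancel_mutex
};

// Returns true if cancel() only returned after every worker called stop()
// and start() is refused afterwards.
bool run_gate_demo() {
    Gate gate;
    std::atomic<int> entered{0};
    std::atomic<int> inside{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&] {
            bool ok = gate.start();
            if (ok) ++inside;
            ++entered;
            if (ok) {
                // Stand-in for 'Do stuff'.
                std::this_thread::sleep_for(std::chrono::milliseconds(20));
                --inside;
                gate.stop();
            }
        });
    }
    while (entered.load() < 4)
        std::this_thread::yield();  // let every worker get past start()
    gate.cancel();
    bool nobody_inside = (inside.load() == 0);
    bool refused = !gate.start();
    for (auto& t : workers) t.join();
    return nobody_inside && refused;
}
```

Calling run_gate_demo() from main() should return true: the workers decrement inside before calling stop(), so once cancel() returns no worker can still be in the guarded area.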
This example was kinda special in that the line that changes the condition is executed by the same thread that calls wait().
More usual is the case where one thread simply waits for a condition to become true and another thread takes the lock before changing the variables involved in that condition (possibly causing it to become true). In that case the mutex is held immediately before (and after) the condition becomes true, so it is totally OK to just unlock the mutex before calling notify_*().
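That more usual case might look roughly like the sketch below. Mailbox, put, take and run_usual_case are hypothetical names chosen for illustration. The point is only that the flag is changed while the mutex is held, and that the notification is sent after the mutex has been released.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-slot mailbox; 'ready' is the condition being waited on.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // only read or written while m is locked
    int data = 0;

    void put(int value) {
        {
            std::lock_guard<std::mutex> lk(m);
            data = value;
            ready = true;  // the condition becomes true under the lock
        }                  // unlock first ...
        cv.notify_one();   // ... then notify
    }

    int take() {
        std::unique_lock<std::mutex> lk(m);
        // Returns at once if put() already ran, so no wake-up can be lost.
        cv.wait(lk, [this] { return ready; });
        ready = false;
        return data;
    }
};

int run_usual_case() {
    Mailbox box;
    int received = 0;
    std::thread consumer([&] { received = box.take(); });
    box.put(42);
    consumer.join();
    return received;
}
```

Whether the consumer reaches wait() before or after put() runs does not matter here, because the predicate is checked under the same mutex that protects the flag.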
As I understand notify_one calls pthread_cond_signal.
If so, then what do you think about this?
For predictable scheduling behavior and to prevent lost wake-ups, the mutex should be held when signaling a condition variable.
All of the threads waiting on the condition variable are suspended until another thread uses the signal function:
In this case the mutex has to be locked before calling the function and unlocked after it.
I personally had cases when notifications were missed because notify_one was called without locking the mutex.

Does boost::condition::notify_all guarantee that a listener thread will acquire the lock before returning?

boost::condition cond;
boost::mutex access;
void listener_thread()
{
    boost::mutex::scoped_lock lock(access);
    while (true) {
        while (!condition_check_var) {
            cond.wait(lock);
        }
        /* do some work; the lock is held here */
    }
}
/// ... Main thread ...
boost::mutex::scoped_lock lock(access);
Is this proper design? Is it safe to assume that once the notify_all() returns, the listener_thread will have already acquired the lock? And that when the check_work block will run (since it's locking the same mutex as the listener_thread()), some "work" will have already been done by the listener_thread()?
If not, what is the preferred way to achieve this kind of behavior?
There is no guarantee that any other thread has acted upon a notification, or has even received it yet. In fact, there isn't even a guarantee that any thread is currently waiting to receive it, although in your setup it looks likely that there are threads waiting. If you want to make sure that the receiving threads have done their work, you'll need to set up a reverse communication channel, e.g., using another condition variable and a suitable condition.
I realize that your question is about Boost but here is what the standard has to say about this (30.5.1 [thread.condition.condvar] paragraph 8):
void notify_all() noexcept;
Effects: Unblocks all threads that are blocked waiting for *this.
It doesn't give any guarantee about what happens to the threads and/or any involved mutex.
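A reverse channel of the kind suggested above could be sketched as follows. This uses the std:: equivalents in place of Boost, and Handshake, work_requested, work_done, listener and request_and_wait are hypothetical names, so treat it as an illustration under those assumptions and not as a drop-in replacement for the code in the question.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared state for a request/acknowledge handshake.
struct Handshake {
    std::mutex access;
    std::condition_variable request_cv;  // main thread -> listener
    std::condition_variable done_cv;     // listener -> main thread (reverse channel)
    bool work_requested = false;
    bool work_done = false;
    int result = 0;
};

void listener(Handshake& h) {
    std::unique_lock<std::mutex> lock(h.access);
    h.request_cv.wait(lock, [&] { return h.work_requested; });
    h.result = 7 * 6;    // stand-in for the real work
    h.work_done = true;  // changed while holding the mutex
    lock.unlock();
    h.done_cv.notify_all();
}

int request_and_wait() {
    Handshake h;
    std::thread t(listener, std::ref(h));
    {
        std::lock_guard<std::mutex> lock(h.access);
        h.work_requested = true;
    }
    // Returning from notify_all() says nothing about the listener's progress.
    h.request_cv.notify_all();
    int seen = 0;
    {
        // Waiting on the second condition is what establishes that the work ran.
        std::unique_lock<std::mutex> lock(h.access);
        h.done_cv.wait(lock, [&] { return h.work_done; });
        seen = h.result;
    }
    t.join();
    return seen;
}
```

Only the wait on done_cv lets the main thread conclude that the work has been done; the return of notify_all() on its own does not.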
It's generally OK, though the typical way to write it is like this:
while (true)
    cond.wait(lock, [&]() -> bool { return condition_check_var; });
You can't speak to the simultaneity of calling notify_all() and the return of wait(), since there is no formal causal relationship between the two. All you need to know for synchronisation is that when wait() returns you will have acquired the lock. Since your check_work block also locks the mutex, it is guaranteed to execute only while the other thread is blocking on the condition variable.