After reading some other articles, I learned that I could implement a C++ blocking queue like this:
#include <condition_variable>
#include <mutex>
#include <queue>

template<typename T>
class BlockingQueue {
    std::mutex mtx;
    std::condition_variable not_full;
    std::condition_variable not_empty;
    std::queue<T> queue;
    size_t capacity{5};

public:
    BlockingQueue() = default;
    BlockingQueue(int cap):capacity(cap) {}
    BlockingQueue(const BlockingQueue&)=delete;
    BlockingQueue& operator=(const BlockingQueue&)=delete;

    void push(const T& data) {
        std::unique_lock<std::mutex> lock(mtx);
        while (queue.size() >= capacity) {
            not_full.wait(lock, [&]{return queue.size() < capacity;});
        }
        queue.push(data);
        not_empty.notify_all(); // wake consumers blocked in pop()
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mtx);
        while (queue.empty()) {
            not_empty.wait(lock, [&]{return !queue.empty();});
        }
        T res = queue.front();
        queue.pop();
        not_full.notify_all(); // wake producers blocked in push()
        return res;
    }

    bool empty() {
        std::unique_lock<std::mutex> lock(mtx);
        return queue.empty();
    }

    size_t size() {
        std::unique_lock<std::mutex> lock(mtx);
        return queue.size();
    }

    void set_capacity(const size_t capacity) {
        std::unique_lock<std::mutex> lock(mtx); // capacity is read under the lock elsewhere
        this->capacity = (capacity > 0 ? capacity : 10);
    }
};
This works for me, but I do not know how to shut it down if I use it from a background thread:
int main() {
    BlockingQueue<float> q;
    bool stop{false};
    auto fun = [&] {
        std::cout << "before entering loop\n";
        while (!stop) {
            q.push(1);
        }
        std::cout << "after entering loop\n";
    };
    std::thread t_bg(fun);
    t_bg.detach();
    // Some other tasks here
    stop = true;
    // How could I shut it down before quitting here, or could I simply let the operating system do that when the whole program is over?
}
The problem is that when I want to shut down the background thread, it might be asleep because the queue is full and the push operation is blocked. How can I stop the background thread when I want it to stop?

One easy way would be to add a flag that you set from outside when you want to abort a pop() operation that's already blocked. And then you'd have to decide what an aborted pop() is going to return. One way is for it to throw an exception, another would be to return an std::optional<T>. Here's the first method (I'll only write the changed parts.)
Add this type wherever you think is appropriate:
struct AbortedPopException {};
Add this to your class fields:
mutable std::atomic<bool> abort_flag{false};
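Also add this method:
void abort() {
    { std::lock_guard<std::mutex> lock(mtx); abort_flag = true; }
    not_empty.notify_all(); // wake any pop() that is already blocked
}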
Change the while loop in the pop() method like this. (You don't need the while at all: the overload of wait() that takes a predicate re-checks the predicate in a loop internally, so it only returns once the predicate is true and spurious wakeups are already handled.)
not_empty.wait(lock, [this]{return !queue.empty() || abort_flag;});
if (abort_flag)
    throw AbortedPopException{};
That's it (I believe.)
In your main(), when you want to shut the "consumer" down you can call abort() on your queue. But you'll have to handle the thrown exception there as well. It's your "exit" signal, basically.
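For comparison, the std::optional alternative mentioned above could look roughly like the sketch below. It is not the asker's class: the name AbortableQueue, the bool return from push(), and the decision to abort blocked push() calls as well as blocked pop() calls are all assumptions made for illustration, and it needs C++17 for std::optional.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical variant of the question's queue; all names are illustrative.
template <typename T>
class AbortableQueue {
public:
    explicit AbortableQueue(std::size_t cap) : capacity_(cap) {}

    // Returns false if the queue was aborted before the item could be stored.
    bool push(const T& item) {
        std::unique_lock<std::mutex> lock(mtx_);
        not_full_.wait(lock, [this] { return aborted_ || queue_.size() < capacity_; });
        if (aborted_) return false;
        queue_.push(item);
        not_empty_.notify_one();
        return true;
    }

    // Returns std::nullopt once the queue has been aborted.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mtx_);
        not_empty_.wait(lock, [this] { return aborted_ || !queue_.empty(); });
        if (aborted_) return std::nullopt;
        T item = std::move(queue_.front());
        queue_.pop();
        not_full_.notify_one();
        return item;
    }

    // Sets the flag under the mutex so a waiter cannot miss it, then wakes
    // every thread blocked in push() or pop().
    void abort() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            aborted_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    std::mutex mtx_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
    std::size_t capacity_;
    bool aborted_ = false;
};
```

With this shape the producer loop becomes something like `while (q.push(x)) { ... }`, and the main thread calls abort() and then join()s the producer. Items still queued at abort time are dropped in this sketch; whether that is acceptable depends on the application.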
Some side notes:
Don't detach from threads! Especially here, where AFAICT there is no reason for it (and some actual danger too). Just signal them to exit (in any manner appropriate) and join() them.
Your stop flag should be atomic. You read from it in your background thread and write to it from your main thread, and those can (and in fact do) overlap in time, so... data race!
I don't understand why you have a "full" state and "capacity" in your queue. Think about whether they are necessary.
UPDATE 1: In response to OP's comment about detaching... Here's what happens in your main thread:
You spawn the "producer" thread (i.e. the one that pushes stuff onto the queue)
Then you do all the work you want to do (e.g. consuming the stuff on the queue)
Sometime, perhaps at the end of main(), you signal the thread to stop (e.g. by setting the stop flag to true)
Then, and only then, you join() with the thread.
It is true that your main thread will block while it is waiting for the thread to pick up the "stop" signal, exit its loop, and return from its thread function, but that's a very very short wait. And you have nothing else to do. More importantly, you'll know that your thread exited cleanly and predictably, and from that point on, you know definitely that that thread won't be running (not important for you here, but could be critical for some other threaded task.)
That is the pattern you usually want to follow when spawning worker threads that loop over a short task.
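A minimal sketch of that spawn / work / signal / join sequence, assuming an atomic stop flag. The function name and the 1 ms "task" are made up for illustration; they are not part of the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Runs a short task in a loop on a worker thread until asked to stop, and
// returns how many iterations ran. All names here are illustrative.
inline int run_worker_until_stopped(std::chrono::milliseconds main_work) {
    std::atomic<bool> stop{false};
    std::atomic<int> iterations{0};

    // 1. Spawn the worker.
    std::thread worker([&] {
        while (!stop.load()) {
            ++iterations;  // stands in for the "short task"
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // 2. The main thread does its own work.
    std::this_thread::sleep_for(main_work);

    // 3. Signal the worker to stop.
    stop.store(true);

    // 4. Then, and only then, join. After this line the worker is known
    //    to have exited.
    worker.join();
    return iterations.load();
}
```

The join blocks for at most about one iteration of the loop here. If the worker can be blocked inside a queue operation, the stop signal also has to wake that operation, which is what the abort() method above is for.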
Update 2: About "full" and "capacity" of the queue. That's fine. It's certainly your decision. No problem with that.
Update 3: About "throwing" vs. returning an "empty" object to signal an aborted "blocking pop()". I don't think there is anything wrong with throwing like that, especially since it is very rare (it happens just once, at the end of the producer/consumer operation). However, if all T types that you want to store in your queue have an "invalid" or "empty" state, then you can certainly use that. But throwing is more general, if more "icky" to some people.


How to properly abort a thread with the use of a condition_variable?

I have a class with some methods that should be thread safe, i.e. multiple threads should be able to operate on the object's state. One of the methods spawns a new thread that updates a field every 10 seconds. Because this thread can be long-running, I'd like to be able to abort it properly.
I have implemented a solution that uses std::condition_variable::wait_for() to wait for an abort signal inside the thread, but I am not sure whether my solution is optimal, or even correct.
class A
{
    unsigned int value; // A value that will be updated every 10 s in another thread
    bool is_being_updated; // true while value is being updated in another thread
    std::thread t;
    bool aborted; // true = thread should abort
    mutable std::mutex m1;
    mutable std::mutex m2;
    std::condition_variable cv;
public:
    A();
    ~A();
    void begin_update(); // Creates a thread that periodically updates value
    void abort(); // Aborts the updating thread
    unsigned int get_value() const;
    void set_value(unsigned int);
};
This is how I implemented the methods:
A::A() : value(0), is_being_updated(false), aborted(false) { }

A::~A()
{
    // Not sure if this is thread safe?
    if(t.joinable()) t.join();
}

// Updates this->value every 10 seconds
void A::begin_update()
{
    std::lock_guard<std::mutex> lck(m1);
    if (is_being_updated) return; // Don't allow begin_update() while updating
    is_being_updated = true;
    if (aborted) aborted = false;
    // Create a thread that will update value periodically
    t = std::thread([this] {
        std::unique_lock<std::mutex> update_lock(m2);
        for(int i=0; i < 10; i++)
        {
            cv.wait_for(update_lock, std::chrono::seconds(10), [this]{ return aborted; });
            if (!aborted)
            {
                std::lock_guard<std::mutex> lck(m1);
                this->value++; // Update value
            }
            else
                break; // Break on thread abort
        }
        // Locking here would cause indefinite blocking ...
        // std::lock_guard<std::mutex> lck(m1);
        if(is_being_updated) is_being_updated = false;
    });
}

// Aborts the thread created in begin_update()
void A::abort()
{
    std::lock_guard<std::mutex> lck(m1);
    is_being_updated = false;
    this->value = 0; // Reset value
    {
        std::lock_guard<std::mutex> update_lock(m2);
        aborted = true;
    }
    cv.notify_one(); // Signal abort ...
    if(t.joinable()) t.join(); // Wait for the thread to finish
}

unsigned int A::get_value() const
{
    std::lock_guard<std::mutex> lck(m1);
    return this->value;
}

void A::set_value(unsigned int v)
{
    std::lock_guard<std::mutex> lck(m1);
    if (is_being_updated) return; // Cannot set value while thread is updating it
    this->value = v;
}
This seems to work fine, but I'm uncertain about it being correct. My concerns are the following:
Is my destructor safe? Suppose that the updating thread has not been aborted and is still doing its job while A object goes out of scope. A switch to a different thread now happens while dtor's t.join() still hasn't finished, and the switched-to thread calls begin_update() on the same object. Is something like this possible? Should I introduce e.g. an extra is_being_destructed flag that I would set to true inside a destructor and that all other methods should check for being false before they can proceed? Or can no such undesired scenario happen?
Inside the thread, at the end, I'm setting is_being_updated = false without a lock, despite the variable being shared state. This can mean that other threads won't see its correct value, e.g. even after the thread is done, some other thread may still see the value as is_being_updated == true instead of false. I cannot lock the mutex, however, because abort() may have already locked it, meaning that the call will block indefinitely. I'm not sure about the best way to solve this, other than perhaps making is_being_updated atomic. Would that work?
I've read about spurious wakeups, but am not sure if the code should do anything extra to handle them. As far as I understand, the answer is no, and no problems are to be expected in this regard.
Is my thinking here correct? Did I miss anything else that I should have in mind?
This stuff is always hard to check, so don't be afraid to question me if you think I misunderstand.
Short answer, no, it's not thread safe.
As long as the thread that has scope of A is the one calling abort (and doesn't forget to call abort), you won't experience a race condition, as A::abort() will block until the thread is joined. Under these assumptions, the join in your destructor is pointless.
If abort is called by a thread that doesn't own A, then it's definitely possible for the thread to be joined twice, which is bad. Using .joinable() to decide whether to join a thread is a big red flag.
Please remove one of your if(t.joinable()) t.join(); (I'm leaning towards the one in the destructor) and change the other to just t.join().
As you said, you can make is_being_updated atomic. That's a great solution.
Here's another solution. You can signal without holding the lock. (It's actually better form in general, as it helps reduce lock contention, since the first thing the woken thread needs to do is reacquire its mutex.)
void A::abort()
{
    {
        std::lock(m1, m2); // deadlock-proof
        std::lock_guard<std::mutex> lck(m1, std::adopt_lock);
        std::lock_guard<std::mutex> update_lock(m2, std::adopt_lock);
        is_being_updated = false;
        this->value = 0; // Reset value
        aborted = true;
    } // both locks are released here, before signalling
    cv.notify_one(); // Signal abort ...
    t.join(); // Wait for the thread to finish
}
You're good. The way you wrote the wait, it will only return if aborted == true or 10 seconds have elapsed.
1) I think this problem is inherent in your design, so a bool flag will not fix it. Maybe A shouldn't go out of scope until all the threads have stopped using it, in which case it should live in a managed pointer such as shared_ptr.
2) You should use atomics for your bools and also for value; this would avoid having to take a lock to increment the value or to return it.
3) As I said in the comments, the predicate lambda passed to the cv wait handles the spurious wakeups.
The biggest bit of code smell is using a full thread to update a variable every 10 seconds. That is a heavy-weight OS thread, with megabytes to gigabytes of address space, to do one task every 10 seconds.
What's more, it is updating a value without anyone being able to see the change.
You already have a get_value wrapping accessor. Simply store the start point when you want to start counting. When you call get_value calculate the time since the start point. Divide by 10 seconds. Use that to calculate the returned value.
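A rough sketch of that idea, assuming the value should simply be the number of whole 10-second periods since the start point. The class name TickCounter and the optional `now` parameter (added so the arithmetic can be checked without sleeping) are illustrative, and the original's cap of 10 updates is not modelled.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical replacement for the periodically updated field: nothing runs
// in the background, the value is derived from elapsed time when it is read.
class TickCounter {
public:
    using Clock = std::chrono::steady_clock;

    explicit TickCounter(Clock::time_point start = Clock::now()) : start_(start) {}

    // Number of whole 10-second periods between the start point and `now`.
    unsigned int get_value(Clock::time_point now = Clock::now()) const {
        if (now <= start_) return 0;
        auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_);
        return static_cast<unsigned int>(elapsed.count() / 10);
    }

    // "Aborting" becomes choosing a new start point; no thread to stop.
    void reset(Clock::time_point start = Clock::now()) { start_ = start; }

private:
    Clock::time_point start_;
};
```

As written this is not thread safe: if reset() and get_value() can be called from different threads, start_ would still need a mutex (or an atomic representation of the time point).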
In a real application, you'd have a timer system that lets you trigger events (either in a thread pool, or in a message pump) at regular intervals. You'd use that instead of a dedicated thread to do something like this, and you'd make sure that modifications to that value were observable (i.e. people could subscribe to changes in it). Then your abort would consist of deregistering the timer instead of stopping a thread.
Your system is a horrible mixture of the two, using threads for no good reason.

Writing a thread that stays alive

I would like to write a class that wraps std::thread and behaves like a std::thread, but without actually allocating a thread every time I need to process something asynchronously. The reason is that I need to use multithreading in a context where I'm not allowed to allocate dynamically, and I also don't want the overhead of creating a std::thread.
Instead, I want a thread to run in a loop and wait until it can start processing. The client calls invoke, which wakes up the thread. The thread locks a mutex, does its processing and falls asleep again. A function join behaves like std::thread::join by blocking until the thread frees the lock (i.e. falls asleep again).
I think I got the class to run, but because of a general lack of experience in multithreading, I would like to ask whether anybody can spot race conditions and whether the approach I used is considered "good style". For example, I'm not sure if temporarily locking the mutex is a decent way to "join" the thread.
I found another race condition: when join is called directly after invoke, there is no guarantee that the thread has already locked the mutex, so join may return before the work has even started. To prevent this, I had to add a check for the invoke counter.
#pragma once
#include <condition_variable>
#include <functional>
#include <memory>
#include <thread>
#include <atomic>
#include <mutex>

class PersistentThread
{
public:
    PersistentThread();
    ~PersistentThread();

    // set function to invoke
    // locks if thread is currently processing _func
    void set(const std::function<void()> &f);
    // wakes the thread up to process _func and fall asleep again
    // locks if thread is currently processing _func
    void invoke();
    // mimics std::thread::join
    // locks until the thread is finished with its loop
    void join();

private:
    // intern thread loop
    void loop(bool *initialized);

    bool _shutdownRequested{ false };
    int _invokeCounter{ 0 }; // only accessed while holding _mutex
    std::mutex _mutex;
    std::unique_ptr<std::thread> _thread;
    std::condition_variable _cond;
    std::function<void()> _func{ nullptr };
};
Source File
#include "PersistentThread.h"

PersistentThread::PersistentThread()
{
    auto lock = std::unique_lock<std::mutex>(_mutex);
    bool initialized = false;
    _thread = std::make_unique<std::thread>(&PersistentThread::loop, this, &initialized);
    // wait until _thread notifies, check bool initialized to prevent spurious wakeups
    _cond.wait(lock, [&] {return initialized; });
}

PersistentThread::~PersistentThread()
{
    {
        std::lock_guard<std::mutex> lock(_mutex);
        _func = nullptr;
        _shutdownRequested = true;
        // wake up and let join
        _cond.notify_one();
    }
    // join thread,
    if (_thread->joinable())
        _thread->join();
}

void PersistentThread::set(const std::function<void()>& f)
{
    std::lock_guard<std::mutex> lock(_mutex);
    this->_func = f;
}

void PersistentThread::invoke()
{
    std::lock_guard<std::mutex> lock(_mutex);
    _invokeCounter++;
    _cond.notify_one();
}

void PersistentThread::join()
{
    bool joined = false;
    while (!joined)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        joined = (_invokeCounter == 0);
    }
}

void PersistentThread::loop(bool *initialized)
{
    std::unique_lock<std::mutex> lock(_mutex);
    *initialized = true;
    _cond.notify_one();
    while (true)
    {
        // wait until we get the mutex again
        _cond.wait(lock, [this] {return _shutdownRequested || (this->_invokeCounter > 0); });
        // shut down if requested
        if (_shutdownRequested) return;
        // process
        if (_func) _func();
        _invokeCounter--;
    }
}
You are asking about potential race conditions, and I see at least one race condition in the shown code.
After constructing a PersistentThread, there is no guarantee that the new thread will acquire its initial lock in its loop() before the main execution thread returns from the constructor and enters invoke(). It is possible that the main execution thread enters invoke() immediately after the constructor is complete, ends up notifying nobody, since the internal execution thread hasn't locked the mutex yet. As such, this invoke() will not result in any processing taking place.
You need to synchronize the completion of the constructor with the execution thread's initial lock acquisition.
EDIT: your revision looks right; but I also spotted another race condition.
As documented in the description of wait(), wait() may wake up "spuriously". Just because wait() returned, doesn't mean that some other thread has entered invoke().
You need a counter, in addition to everything else, with invoke() incrementing the counter, and the execution thread executing its assigned duties only when the counter is greater than zero, decrementing it. This will guard against spurious wake-ups.
I would also have the execution thread check the counter before entering wait(), and enter wait() only if it is 0. Otherwise, it decrements the counter, executes its function, and loops back.
This should plug up all the potential race conditions in this area.
P.S. The spurious wake-up also applies to the initial notification, in your correction, that the execution thread has entered the loop. You'll need to do something similar for that situation, too.
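The counter idea can be sketched as follows. This is only an illustration under assumed names (CountingWorker, wait_idle and so on), not the asker's class; it also swaps the busy-looping join for a second condition variable, which is a design choice of this sketch rather than something the answer prescribes.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Illustrative sketch; all names are assumptions.
class CountingWorker {
public:
    explicit CountingWorker(std::function<void()> f)
        : func_(std::move(f)), thread_([this] { loop(); }) {}

    ~CountingWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        wake_.notify_one();
        thread_.join();
    }

    // Records one request. Because the request is stored in a counter, it is
    // not lost even if the worker has not reached its wait yet.
    void invoke() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++pending_;
        }
        wake_.notify_one();
    }

    // Blocks until every invoke() issued so far has been processed.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // The predicate is checked before sleeping and after every
            // wakeup, so spurious wakeups do no work.
            wake_.wait(lock, [this] { return shutdown_ || pending_ > 0; });
            if (shutdown_) return;
            lock.unlock();   // run the task without holding the mutex
            func_();
            lock.lock();
            if (--pending_ == 0) idle_.notify_all();
        }
    }

    std::function<void()> func_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable idle_;
    int pending_ = 0;
    bool shutdown_ = false;
    std::thread thread_;  // declared last so the other members exist first
};
```

Starting the thread from the member initializer needs no handshake here, because an early invoke() only bumps the counter and the worker checks the counter before it ever sleeps. Note that std::function may allocate, so this sketch does not address the no-dynamic-allocation requirement.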
I don't understand exactly what you're asking. The style you used is nice.
It would be safer to have the methods return bool and to check each call: a void function reports nothing, so a bug could leave you stuck without any indication. Since the thread runs under the hood, check everything you can, and verify that each call actually succeeded. You could also read up on "thread pooling".

Approach of using an std::atomic compared to std::condition_variable wrt pausing & resuming an std::thread in C++

This is a separate question but related to the previous question I asked here
I am using an std::thread in my C++ code to constantly poll for some data & add it to a buffer. I use a C++ lambda to start the thread like this:
StartMyThread() {
    thread_running = true;
    the_thread = std::thread { [this] {
        while(thread_running) {
            GetData();
        }
    } };
}
thread_running is an atomic<bool> declared in class header. Here is my GetData function:
GetData() {
    //Some heavy logic
}
Next I also have a StopMyThread function where I set thread_running to false so that it exits out of the while loop in the lambda block.
StopMyThread() {
    thread_running = false;
}
As I understand, I can pause & resume the thread using a std::condition_variable as pointed out here in my earlier question.
But is there a disadvantage if I just use the std::atomic<bool> thread_running to decide whether or not to execute the logic in GetData(), like below?
GetData() {
    if (thread_running == false)
        return;
    //Some heavy logic
}
Will this burn more CPU cycles compared to the approach of using an std::condition_variable as described here?
A condition variable is useful when you want to conditionally halt another thread. So you might have an always-running "worker" thread that waits when it notices it has nothing to do.
The atomic solution requires your UI interaction to synchronize with the worker thread, or very complex logic to do it asynchronously.
As a general rule, your UI response thread should never block on non-ready state from worker threads.
struct worker_thread {
    worker_thread( std::function<void()> t, bool play = true ):
        execute(play),
        task(std::move(t))
    {
        thread = std::async( std::launch::async, [this]{
            work();
        });
    }
    // move is not safe. If you need this movable,
    // use unique_ptr<worker_thread>.
    worker_thread(worker_thread&& )=delete;
    ~worker_thread() {
        if (!exit) finalize();
        wait();
    }
    void finalize() {
        auto l = lock();
        exit = true;
        cv.notify_one();
    }
    void pause() {
        auto l = lock();
        execute = false;
    }
    void play() {
        auto l = lock();
        execute = true;
        cv.notify_one();
    }
    void wait() {
        if (thread.valid()) // std::future has no conversion to bool
            thread.get();
    }
private:
    void work() {
        while(true) {
            bool done = false;
            {
                auto l = lock();
                cv.wait( l, [&]{
                    return exit || execute;
                });
                done = exit; // have lock here
            }
            if (done) break;
            task();
        }
    }
    std::unique_lock<std::mutex> lock() {
        return std::unique_lock<std::mutex>(m);
    }
    std::mutex m;
    std::condition_variable cv;
    bool exit = false;
    bool execute = true;
    std::function<void()> task;
    std::future<void> thread;
};
or somesuch.
This owns a thread. The thread repeatedly runs task so long as it is in play() mode. If you pause(), the worker thread stops the next time task() finishes. If you play() again before that task() call finishes, the worker never notices the pause().
The only wait is on destruction of worker_thread, where it automatically informs the worker thread it should exit and it waits for it to finish.
You can manually .wait() or .finalize() as well. .finalize() is async, but if your app is shutting down you can call it early and give the worker thread more time to clean up while the main thread cleans things up elsewhere.
.finalize() cannot be reversed.
Code not tested.
Unless I'm missing something, you already answered this in your original question: You'll be creating and destroying the worker thread each time it's needed. This may or may not be an issue in your actual application.
There are two different problems being solved, and the right choice may depend on what you're actually doing. One problem is "I want my thread to run until I tell it to stop." The other seems to be a case of "I have a producer/consumer pair and want to be able to notify the consumer when data is ready." The thread_running and join method works well for the first of those. For the second you may want to use a mutex and condition, because you're doing more than just using the state to trigger work. Suppose you have a vector<Work>. You guard that with the mutex, so the condition becomes [&work] (){ return !work.empty(); } or something similar. When the wait returns, you hold the mutex, so you can take things out of work and do them. When you're done, you go back to wait, releasing the mutex so the producer can add things to the queue.
You may want to combine these techniques. Have a "done processing" atomic that all of your threads periodically check to know when to exit so that you can join them. Use the condition to cover the case of data delivery between threads.
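A sketch of how the two techniques might be combined, using int as a stand-in for Work. The class and member names are assumptions. One deliberate deviation: the "done" flag here is a plain bool set under the mutex rather than a free-standing atomic, because it is part of the wait predicate and setting it outside the lock could let a waiting consumer miss the wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative sketch; WorkChannel and its members are assumed names.
class WorkChannel {
public:
    // Producer side: add an item and wake the consumer.
    void post(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            work_.push_back(item);
        }
        cv_.notify_one();
    }

    // Tell the consumer that no more work will arrive.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Consumer loop: sleeps until there is work or finish() was called, and
    // returns the sum of everything it processed.
    long consume() {
        long sum = 0;
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !work_.empty(); });
            if (work_.empty()) return sum;  // only reachable once done_ is set
            std::vector<int> batch;
            batch.swap(work_);              // take the items while holding the lock
            lock.unlock();                  // let the producer keep adding
            for (int item : batch) sum += item;  // "do the work"
            lock.lock();
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<int> work_;
    bool done_ = false;
};
```

The consumer drains whatever is still queued before it exits, so calling finish() right after the last post() does not lose items. A thread that polls instead of waiting could still use an atomic flag for the "run until told to stop" case.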

Thread pool stuck on wait condition

My C++ program gets stuck when using this thread pool class:
class ThreadPool {
    unsigned threadCount;
    std::vector<std::thread> threads;
    std::list<std::function<void(void)> > queue;
    std::atomic_int jobs_left;
    std::atomic_bool bailout;
    std::atomic_bool finished;
    std::condition_variable job_available_var;
    std::condition_variable wait_var;
    std::mutex wait_mutex;
    std::mutex queue_mutex;
    std::mutex mtx;

    void Task() {
        while (!bailout) {
            next_job()();
            --jobs_left;
            wait_var.notify_one();
        }
    }

    std::function<void(void)> next_job() {
        std::function<void(void)> res;
        std::unique_lock<std::mutex> job_lock(queue_mutex);
        // Wait for a job if we don't have any.
        job_available_var.wait(job_lock, [this]()->bool { return queue.size() || bailout; });
        // Get job from the queue
        if (!bailout) {
            res = queue.front();
            queue.pop_front();
        }else {
            // If we're bailing out, 'inject' a job into the queue to keep jobs_left accurate.
            res = [] {};
            ++jobs_left;
        }
        return res;
    }

public:
    ThreadPool(int c)
        : threadCount(c)
        , threads(threadCount)
        , jobs_left(0)
        , bailout(false)
        , finished(false)
    {
        for (unsigned i = 0; i < threadCount; ++i)
            threads[i] = std::move(std::thread([this, i] { this->Task(); }));
    }

    ~ThreadPool() {
        JoinAll();
    }

    void AddJob(std::function<void(void)> job) {
        std::lock_guard<std::mutex> lock(queue_mutex);
        queue.emplace_back(job);
        ++jobs_left;
        job_available_var.notify_one();
    }

    void JoinAll(bool WaitForAll = true) {
        if (!finished) {
            if (WaitForAll) {
                WaitAll();
            }
            // note that we're done, and wake up any thread that's
            // waiting for a new job
            bailout = true;
            job_available_var.notify_all();
            for (auto& x : threads)
                if (x.joinable())
                    x.join();
            finished = true;
        }
    }

    void WaitAll() {
        std::unique_lock<std::mutex> lk(wait_mutex);
        if (jobs_left > 0) {
            wait_var.wait(lk, [this] { return this->jobs_left == 0; });
        }
    }
};
gdb says (when I interrupt the blocked execution) that it is stuck in (std::unique_lock&, ThreadPool::WaitAll()::{lambda()#1})+58>
I'm using g++ v5.3.0 with support for c++14 (-std=c++1y)
How can I avoid this problem?
I've edited (rewrote) the class:
The issue here is a race condition on your job count. You're using one mutex to protect the queue, and another to protect the count, which is semantically equivalent to the queue size. Clearly the second mutex is redundant (and improperly used), as is the job_count variable itself.
Every method that deals with the queue has to gain exclusive access to it (even JoinAll to read its size), so you should use the same queue_mutex in the three bits of code that tamper with it (JoinAll, AddJob and next_job).
Btw, splitting the code at next_job() is pretty awkward IMO. You would avoid calling a dummy function if you handled the worker thread body in a single function.
As other comments have already stated, you would probably be better off getting your eyes off the code and reconsidering the problem globally for a while.
The only thing you need to protect here is the job queue, so you need only one mutex.
Then there is the problem of waking up the various actors, which requires a condition variable since C++ basically does not give you any other useable synchronization object.
Here again you don't need more than one variable. Terminating the thread pool is equivalent to dequeueing the jobs without executing them, which can be done any which way, be it in the worker threads themselves (skipping execution if the termination flag is set) or in the JoinAll function (clearing the queue after gaining exclusive access).
Last but not least, you might want to invalidate AddJob once someone decided to close the pool, or else you could get stuck in the destructor while someone keeps feeding in new jobs.
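To make the "one mutex, one condition variable" suggestion concrete, here is a hedged sketch. It is not the asker's rewritten class; SimplePool, add_job, join_all and the discard parameter are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Illustrative sketch with assumed names. The queue and the closed flag are
// the only shared state, and both are guarded by the same mutex.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { worker(); });
    }

    ~SimplePool() { join_all(); }

    // Returns false once the pool has been closed, so nobody can keep
    // feeding jobs while the pool is shutting down.
    bool add_job(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (closed_) return false;
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
        return true;
    }

    // Closes the pool and joins the workers. Queued jobs still run unless
    // discard is true, in which case they are dropped without executing.
    // Intended to be called from the single thread that owns the pool.
    void join_all(bool discard = false) {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
            if (discard) jobs_.clear();
        }
        cv_.notify_all();
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }

private:
    void worker() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // closed and nothing left to run
            auto job = std::move(jobs_.front());
            jobs_.pop_front();
            lock.unlock();              // run the job without holding the mutex
            job();
            lock.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool closed_ = false;
    std::vector<std::thread> threads_;
};
```

There is no separate job counter to drift out of sync with the queue: "all jobs done" is simply "pool closed, queue empty, workers joined". A WaitAll() that keeps the pool open would need an extra in-flight count, guarded by the same mutex.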
I think you need to keep it simple.
You seem to be using one mutex too many. There's queue_mutex, and you use that when you add and process jobs.
Now what's the need for another, separate mutex when you are waiting on reading the queue?
Why can't you use just a condition variable with the same queue_mutex to read the queue in your WaitAll() method?
I would also recommend using a lock_guard instead of the unique_lock in your WaitAll. There really isn't a need to lock the queue_mutex beyond the WaitAll under exceptional conditions. If you exit the WaitAll exceptionally it should be released regardless.
Ignore my Update above. Since you are using a condition variable you can't use a lock guard in the WaitAll. But if you are using a unique_lock always go with the try_to_lock version especially if you have more than a couple control paths

C++11 lockfree single producer single consumer: how to avoid busy wait

I'm trying to implement a class that uses two threads: one for the producer and one for the consumer. The current implementation does not use locks:
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
using Queue =
class Worker
{
public:
    Worker() : working_(false), done_(false) {}
    ~Worker() {
        done_ = true; // exit even if the work has not been completed
        if (worker_.joinable())
            worker_.join();
    }
    void enqueue(int value) {
        queue_.push(value);
        if (!working_) {
            working_ = true;
            worker_ = std::thread([this]{ work(); });
        }
    }
    void work() {
        int value;
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
        working_ = false;
    }
private:
    std::atomic<bool> working_;
    std::atomic<bool> done_;
    Queue queue_;
    std::thread worker_;
};
The application needs to enqueue work items for a certain amount of time and then sleep waiting for an event. This is a minimal main that simulates the behavior:
int main() {
    Worker w;
    for (int i = 0; i < 1000; ++i)
        w.enqueue(i);
    // (the application sleeps here, waiting for an event)
    for (int i = 0; i < 1000; ++i)
        w.enqueue(i);
}
I'm pretty sure that my implementation is bugged: what if the worker thread completes and before executing working_ = false, another enqueue comes? Is it possible to make my code thread safe without using locks?
The solution requires:
a fast enqueue
the destructor has to quit even if the queue is not empty
no busy wait, because there are long periods of time in which the worker thread is idle
no locks if possible
I did another implementation of the Worker class, based on your suggestions. Here is my second attempt:
class Worker
{
public:
    Worker()
        : working_(ATOMIC_FLAG_INIT), done_(false) { }
    ~Worker() {
        // exit even if the work has not been completed
        done_ = true;
        if (worker_.joinable())
            worker_.join();
    }
    bool enqueue(int value) {
        bool enqueued = queue_.push(value);
        if (!working_.test_and_set()) {
            if (worker_.joinable())
                worker_.join();
            worker_ = std::thread([this]{ work(); });
        }
        return enqueued;
    }
    void work() {
        int value;
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
        working_.clear();
        while (!done_ && queue_.pop(value)) {
            std::cout << value << std::endl;
        }
    }
private:
    std::atomic_flag working_;
    std::atomic<bool> done_;
    Queue queue_;
    std::thread worker_;
};
I introduced the worker_.join() inside the enqueue method. This can hurt performance, but only in very rare cases (when the queue becomes empty and another enqueue arrives before the thread exits). The working_ variable is now an atomic_flag that is set in enqueue and cleared in work. The additional while after working_.clear() is needed because if another value is pushed before the clear but after the first while, that value would otherwise not be processed.
Is this implementation correct?
I did some tests and the implementation seems to work.
OT: Is it better to put this as an edit, or an answer?
what if the worker thread completes and before executing working_ = false, another enqueue comes?
Then the value will be pushed to the queue but will not be processed until another value is enqueued after the flag is set. You (or your users) may decide whether that is acceptable. This can be avoided using locks, but they're against your requirements.
The code may fail if the running thread is about to finish and sets working_ = false; but hasn't stopped running before next value is enqueued. In that case your code will call operator= on the running thread which results in a call to std::terminate according to the linked documentation.
Adding worker_.join() before assigning the worker to a new thread should prevent that.
Another problem is that queue_.push may fail if the queue is full because it has a fixed size. Currently you just ignore the case and the value will not be added to the full queue. If you wait for queue to have space, you don't get fast enqueue (in the edge case). You could take the bool returned by push (which tells if it was successful) and return it from enqueue. That way the caller may decide whether it wants to wait or discard the value.
Or use non-fixed size queue. Boost has this to say about that choice:
Can be used to completely disable dynamic memory allocations during push in order to ensure lockfree behavior.
If the data structure is configured as fixed-sized, the internal nodes are stored inside an array and they are addressed
by array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
to achieve lock-freedom.
Your worker thread needs more than 2 states.
Not running
Doing tasks
Idle shutdown
If you force shut down, it skips idle shutdown. If you run out of tasks, it transitions to idle shutdown. In idle shutdown, it empties the task queue, then goes into shutting down.
Shutdown is set, then you walk off the end of your worker task.
The producer first puts things on the queue. Then it checks the worker state. If Shutdown or Idle shutdown, first join it (and transition it to not running) then launch a new worker. If not running, just launch a new worker.
If the producer wants to launch a new worker, it first makes sure that we are in the not running state (otherwise, logic error). We then transition to the Doing tasks state, and then we launch the worker thread.
If the producer wants to shut down the helper task, it sets the done flag. It then checks the worker state. If it is anything besides not running, it joins it.
This can result in a worker thread that is launched for no good reason.
There are a few cases where the above can block, but there were a few before as well.
Then, we write a formal or semi-formal proof that the above cannot lose messages, because when writing lock free code you aren't done until you have a proof.
This is my solution to the question. I don't much like answering my own question, but I think showing actual code may help others.
#include <boost/lockfree/spsc_queue.hpp>
#include <atomic>
#include <thread>
// I used this semaphore class:
#include "binsem.hpp"
using Queue =
class Worker
// the worker thread starts in the constructor
: working_(ATOMIC_FLAG_INIT), done_(false), semaphore_(0)
, worker_([this]{ work(); })
{ }
~Worker() {
// exit even if the work has not been completed
done_ = true;
bool enqueue(int value) {
bool enqueued = queue_.push(value);
if (!working_.test_and_set())
// signal to the worker thread to wake up
return enqueued;
void work() {
int value;
// the worker thread continue to live
while (!done_) {
// wait the start signal, sleeping
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
while (!done_ && queue_.pop(value)) {
// perform actual work
std::cout << value << std::endl;
std::atomic_flag working_;
std::atomic<bool> done_;
binsem semaphore_;
Queue queue_;
std::thread worker_;
I tried the suggestion of @Cameron: not to shut down the thread, and to add a semaphore. The semaphore is actually used only on the first enqueue and after the last work item. This is not lock-free, but only in these two cases.
I did some performance comparisons between my previous version (see my edited question) and this one. There are no significant differences when there are not many starts and stops. However, enqueue is 10 times faster when it has to signal the worker thread instead of starting a new thread. This is a rare case, so it is not very important, but it is an improvement anyway.
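The contents of "binsem.hpp" are not shown, so the following is only a guess at what a binary semaphore of that kind might look like, built from a mutex and a condition variable. The class name and the signal() / wait() / try_wait() interface are assumptions, not the header's actual API.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a binary semaphore; every name is an assumption.
class BinarySemaphore {
public:
    explicit BinarySemaphore(bool available = false) : available_(available) {}

    // Makes the semaphore available and wakes one waiter. Signalling twice
    // in a row has the same effect as signalling once.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(m_);
            available_ = true;
        }
        cv_.notify_one();
    }

    // Sleeps (no busy wait) until the semaphore is available, then takes it.
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return available_; });
        available_ = false;
    }

    // Non-blocking variant: takes the semaphore only if it is available.
    bool try_wait() {
        std::lock_guard<std::mutex> lock(m_);
        if (!available_) return false;
        available_ = false;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool available_;
};
```

Since it uses a mutex internally, this is the part of the design that is not lock-free, which matches the remark that locking only happens when the worker goes to sleep or is woken up. With a C++20 compiler, std::binary_semaphore from the semaphore header offers the same idea with acquire() and release().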
This implementation satisfies:
lock-free in the common case (when enqueue and work are busy);
no busy wait when there are no enqueues for a long time
the destructor exits as soon as possible
correctness?? :)
Very partial answer: I think all those atomics, semaphores and states are a back-communication channel, from "the thread" to "the Worker". Why not use another queue for that? At the very least, thinking about it will help you around the problem.