Thread pool (presumably) locking issue with condition variable and mutex - c++

I'm working on a thread pool and ran into a weird issue regarding condition variables and mutexes. I suspect there might be a locking problem since it sometimes works, sometimes it doesn't. This is the relevant part of the code (removed non-relevant bits):
class ThreadPool {
std::atomic<bool> running;
std::atomic<size_t> unfinished_tasks;
std::queue<std::function<void(void)>> task_queue;
std::condition_variable cv_work;
std::mutex mtx_queue;
std::vector<std::thread> threads;
ThreadPool(size_t num_threads = std::thread::hardware_concurrency());
template<class T, class Fn>
std::future<T> queueTask(Fn&& fn);
ThreadPool::ThreadPool(size_t num_threads) :
running(true), unfinished_tasks(0) {
auto thread_loop = [&] {
while (running.load()) {
std::unique_lock<std::mutex> lock(mtx_queue);
if (!task_queue.empty()) {
auto work = task_queue.front();
} else {
std::cout << std::this_thread::get_id() << " going to sleep..." << std::endl;
for (size_t i = 0; i < num_threads; i++) {
template<class T, class Fn>
inline std::future<T> ThreadPool::queueTask(Fn&& fn) {
// func = lambda containing packaged task with fn
return future;
As soon as I comment out the line containing the debug output, adding lots of small tasks to the thread pool makes it lock up at some point. With the debug output in place, it finishes all tasks properly. I'm not sure where the issue could be.

You have a race condition. queueTask can notify cv_work before your thread function is waiting. Don't unlock mtx_queue until after you call cv_work.notify_one().
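A minimal sketch of that advice, assuming a simplified pool (MiniPool, post and run_demo are hypothetical names, not the asker's class). The queue and the running flag are only changed while holding the mutex, the notification is sent before the mutex is released, and the worker waits with a predicate, so a notification that arrives before the worker starts waiting cannot be lost:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool used only to illustrate the locking pattern.
class MiniPool {
public:
    explicit MiniPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { loop(); });
    }
    ~MiniPool() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            running_ = false;          // changed under the mutex the workers wait on
            cv_.notify_all();
        }
        for (auto& t : workers_) t.join();
    }
    void post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mtx_);
        queue_.push(std::move(fn));
        cv_.notify_one();              // still holding mtx_, as the answer suggests
    }
private:
    void loop() {
        for (;;) {
            std::function<void()> work;
            {
                std::unique_lock<std::mutex> lock(mtx_);
                // The predicate is checked under the lock before sleeping,
                // so work queued earlier is never slept through.
                cv_.wait(lock, [this] { return !running_ || !queue_.empty(); });
                if (queue_.empty()) return;   // only reached during shutdown
                work = std::move(queue_.front());
                queue_.pop();
            }
            work();                            // run the task outside the lock
        }
    }
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    bool running_ = true;
    std::vector<std::thread> workers_;
};

// Usage: queue many small tasks; the destructor drains the queue and joins.
int run_demo() {
    std::atomic<int> done{0};
    {
        MiniPool pool(4);
        for (int i = 0; i < 1000; ++i) pool.post([&done] { ++done; });
    }
    return done.load();
}
```

The debug output probably only hid the problem by changing the timing; the predicate form of wait removes the dependence on timing altogether.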


How to wait for completion of all tasks in this ThreadPool?

I am trying to write a ThreadPool class
class ThreadPool {
ThreadPool(size_t numberOfThreads):isAlive(true) {
for(int i =0; i < numberOfThreads; i++) {
workerThreads.push_back(std::thread(&ThreadPool::doJob, this));
#ifdef DEBUG
std::cout<<"Construction Complete"<<std::endl;
~ThreadPool() {
#ifdef DEBUG
std::cout<<"Destruction Start"<<std::endl;
isAlive = false;
#ifdef DEBUG
std::cout<<"Destruction Complete"<<std::endl;
void waitForExecution() {
for(std::thread& worker: workerThreads) {
void addWork(std::function<void()> job) {
#ifdef DEBUG
std::cout<<"Adding work"<<std::endl;
std::unique_lock<std::mutex> lock(lockListMutex);
// performs actual work
void doJob() {
// try {
while(isAlive) {
#ifdef DEBUG
std::cout<<"Do Job"<<std::endl;
std::unique_lock<std::mutex> lock(lockListMutex);
if(!jobQueue.empty()) {
#ifdef DEBUG
std::cout<<"Next Job Found"<<std::endl;
std::function<void()> job = jobQueue.front();
// a vector containing worker threads
std::vector<std::thread> workerThreads;
// a queue for jobs
std::list<std::function<void()>> jobQueue;
// a mutex for synchronized insertion and deletion from list
std::mutex lockListMutex;
std::atomic<bool> isAlive;
// condition variable to track whether or not there is a job in queue
std::condition_variable conditionVariable;
I am adding work to this thread pool from my main thread. My problem is that calling waitForExecution() makes the main thread wait forever. I need to be able to terminate the threads when all work is done and continue main thread execution from there. How should I proceed here?
The first step when writing a robust thread pool is to split the queue from the management of threads. A thread-safe queue is hard enough to write on its own, and so is managing threads.
A thread-safe queue looks like:
template<class T>
struct threadsafe_queue {
boost::optional<T> pop() {
std::unique_lock<std::mutex> l(m);
cv.wait(l, [&]{ return aborted || !data.empty(); } ); // the predicate must return its result
if (aborted) return {};
T r = std::move(data.front()); // std::queue::pop() returns void, so read the front first
data.pop();
return std::move(r);
void push( T t )
std::unique_lock<std::mutex> l(m);
if (aborted) return;
data.push( std::move(t) ); // std::queue exposes push(), not push_front()
void abort()
std::unique_lock<std::mutex> l(m);
aborted = true;
data = {};
~threadsafe_queue() { abort(); }
std::mutex m;
std::condition_variable cv;
std::queue< T > data;
bool aborted = false;
where pop returns an empty optional when the queue is aborted.
Now our thread pool is easy:
struct threadpool {
explicit threadpool(std::size_t n) { add_threads(n); }
threadpool() = default;
~threadpool(){ abort(); }
void add_thread() { add_threads(1); }
void add_threads(std::size_t n)
for (std::size_t i = 0; i < n; ++i)
threads.push_back( std::thread( [this]{ do_thread_work(); } ) );
template<class F>
auto add_task( F && f )
using R = std::result_of_t< F&() >;
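// a packaged_task (unlike a bare promise) is callable and fulfils its future when invoked
auto pptr = std::make_shared<std::packaged_task<R()>>( std::forward<F>(f) );
auto future = pptr->get_future();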
tasks.push([pptr]{ (*pptr)(); });
return future;
void abort()
while (!threads.empty()) {
threadsafe_queue< std::function<void()> > tasks;
std::vector< std::thread > threads;
void do_thread_work() {
while (auto f = tasks.pop()) {
Note that if you abort, outstanding futures are filled with a broken promise exception.
Worker threads stop running when the queue they are feeding from is aborted. The main thread on abort() will wait for the worker threads to finish (as is wise).
This does mean that worker thread tasks must also terminate, or the main thread will hang. There is no way to avoid this; often, your worker threads' tasks need to cooperate to get a message saying they should abort early.
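A minimal sketch of such cooperation, assuming a shared std::atomic<bool> that the task polls between units of work (count_until_stopped and run_cancel_demo are hypothetical names, and the sleep is only a stand-in for real work):

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical long-running task: checks the flag between units of work
// and returns early once a stop has been requested.
int count_until_stopped(const std::atomic<bool>& stop_requested, int limit) {
    int steps = 0;
    while (steps < limit && !stop_requested.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1)); // stand-in for work
        ++steps;
    }
    return steps;
}

// Usage: start the task, request a stop shortly afterwards, collect the result.
int run_cancel_demo() {
    std::atomic<bool> stop{false};
    const int limit = 100000;   // far more steps than can finish before the stop
    auto fut = std::async(std::launch::async, count_until_stopped,
                          std::cref(stop), limit);
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    stop = true;                // the task notices this at its next check
    return fut.get();
}
```

How often the task checks the flag bounds how long abort() may have to wait for it.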
Boost has a thread pool that integrates with its threading primitives and permits a less cooperative abort; in it, all mutex type operations implicitly check for an abort flag, and if they see it the operation throws.
How should I proceed here?
Well, you should learn to use your debugger, which should show you exactly where each of the threads you want to join is stopped.
I'm going to tell you what looks wrong, but strongly encourage you to do that first. It's invaluable.
OK, now: your condition variable loop is wrong.
The correct pattern is the one that behaves like the second form, with the predicate argument, here:
while (!pred()) {
Specifically, if your predicate is true, you must not call wait. You may never be woken again, because the predicate never became false in the first place!
// wait until we have something to do
while(jobQueue.empty() && isAlive) {
// unless we're exiting, we must have a job
if (isAlive) {
#ifdef DEBUG
std::cout<<"Next Job Found"<<std::endl;
std::function<void()> job = jobQueue.front();
Imagine your thread is running a job when you call notify_all - it will call wait after the notification has already happened, and it isn't coming again. Since it doesn't check isAlive between finishing the job and calling wait, it's going to wait forever.
Even without the shutdown problem it would be wrong, because it should keep consuming jobs while there is work to do, instead of blocking every time it finishes one. Which reminds me of the last issue - you should probably unlock the mutex while executing the job (and re-lock it afterwards) - otherwise your pool is single-threaded.
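Putting those points together, a sketch of the worker loop might look like this. It reuses the question's member names but is otherwise an assumption: JobRunner, shutdown and run_overlap_demo are hypothetical, and isAlive is a plain bool guarded by the mutex here rather than an atomic.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

struct JobRunner {   // hypothetical wrapper around the question's members
    std::mutex lockListMutex;
    std::condition_variable conditionVariable;
    std::list<std::function<void()>> jobQueue;
    bool isAlive = true;   // guarded by lockListMutex in this sketch

    void doJob() {
        std::unique_lock<std::mutex> lock(lockListMutex);
        for (;;) {
            // Only wait while there is nothing to do and we are not exiting.
            while (jobQueue.empty() && isAlive)
                conditionVariable.wait(lock);
            if (jobQueue.empty()) return;   // shutting down, nothing left to run
            std::function<void()> job = std::move(jobQueue.front());
            jobQueue.pop_front();
            lock.unlock();   // let other workers reach the queue
            job();
            lock.lock();     // re-acquire before touching shared state again
        }
    }
    void addWork(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> g(lockListMutex);
            jobQueue.push_back(std::move(job));
        }
        conditionVariable.notify_one();
    }
    void shutdown() {
        {
            std::lock_guard<std::mutex> g(lockListMutex);
            isAlive = false;
        }
        conditionVariable.notify_all();
    }
};

// Usage: each job waits until the other one is running too, which can only
// happen if the mutex is released while a job executes.
int run_overlap_demo() {
    JobRunner r;
    std::atomic<int> arrived{0};
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < 2; ++i) workers.emplace_back([&r] { r.doJob(); });
    for (int i = 0; i < 2; ++i)
        r.addWork([&arrived, &finished] {
            ++arrived;
            while (arrived.load() < 2) std::this_thread::yield();
            ++finished;
        });
    r.shutdown();   // workers drain the queue, then exit
    for (auto& t : workers) t.join();
    return finished.load();
}
```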

Add a std::packaged_task to an existing thread?

Is there a standard way to add a std::packaged_task to an existing thread? There's a nontrivial amount of overhead that must happen before the task is run, so I want to do that once, then keep the thread running and waiting for tasks to execute. I want to be able to use futures so I can optionally get the result of the task and catch exceptions.
My pre-C++11 implementation requires my tasks to inherit from an abstract base class with a Run() method (a bit of a pain, can't use lambdas), and keeps a std::deque collection of those that I add to in the main thread and dequeue from in the worker thread. I have to protect that collection from simultaneous access and provide a signal to the worker thread that there's something to do so it isn't spinning or sleeping. Enqueueing something returns a "result" object with a synchronization object to wait for the task to complete, and a result value. It all works well but it's time for an upgrade if there's something better.
Here is a toy thread pool:
template<class T>
struct threaded_queue {
using lock = std::unique_lock<std::mutex>;
void push_back( T t ) {
lock l(m);
boost::optional<T> pop_front() {
lock l(m);
cv.wait(l, [this]{ return abort || !data.empty(); } );
if (abort) return {};
auto r = std::move(data.front()); // take the oldest element so pop_front matches its name
return std::move(r);
void terminate() {
lock l(m);
abort = true;
std::mutex m;
std::deque<T> data;
std::condition_variable cv;
bool abort = false;
struct thread_pool {
thread_pool( std::size_t n = 1 ) { start_thread(n); }
thread_pool( thread_pool&& ) = delete;
thread_pool& operator=( thread_pool&& ) = delete;
~thread_pool() = default; // or `{ terminate(); }` if you want to abandon some tasks
template<class F, class R=std::result_of_t<F&()>>
std::future<R> queue_task( F task ) {
std::packaged_task<R()> p(std::move(task));
auto r = p.get_future();
tasks.push_back( std::move(p) );
return r;
template<class F, class R=std::result_of_t<F&()>>
std::future<R> run_task( F task ) {
if (threads_active() >= total_threads()) {
return queue_task( std::move(task) );
void terminate() {
std::size_t threads_active() const {
return active;
std::size_t total_threads() const {
return threads.size();
void clear_threads() {
void start_thread( std::size_t n = 1 ) {
while(n-->0) {
std::async( std::launch::async,
while(auto task = tasks.pop_front()) {
} catch(...) {
std::vector<std::future<void>> threads;
threaded_queue<std::packaged_task<void()>> tasks;
std::atomic<std::size_t> active;
Copied from another answer of mine.
A thread_pool with 1 thread matches your description pretty much.
The above is only a toy. In a real thread pool I'd replace the std::packaged_task<void()> with a move_only_function<void()>, which is all I use it for. (Amusingly, a packaged_task<void()> can hold a packaged_task<R()>, if inefficiently.)
You will have to reason about shutdown and make a plan. The above code locks up if you try to shut it down without first clearing the threads.
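For the single-thread case in the question, a stripped-down sketch could look like the following. Worker, submit and run_worker_demo are hypothetical names, and the shutdown plan chosen here (run whatever is still queued, then join) is only one option. It shows a result and an exception both travelling back through futures:

```cpp
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <utility>

class Worker {   // hypothetical single-thread executor
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        {
            std::lock_guard<std::mutex> l(m_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();   // queued tasks are still run before the thread exits
    }
    template <class F, class R = decltype(std::declval<F&>()())>
    std::future<R> submit(F f) {
        std::packaged_task<R()> task(std::move(f));
        std::future<R> result = task.get_future();
        {
            std::lock_guard<std::mutex> l(m_);
            // A packaged_task<void()> wraps the packaged_task<R()>; invoking it
            // stores the value or exception in the inner task's future.
            tasks_.emplace_back(std::move(task));
        }
        cv_.notify_one();
        return result;
    }
private:
    void run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> l(m_);
                cv_.wait(l, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();   // exceptions are captured into the future, not thrown here
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::packaged_task<void()>> tasks_;
    bool stop_ = false;
    std::thread thread_;   // declared last so the other members exist when it starts
};

// Usage: one task returns a value, the other throws; both surface via get().
int run_worker_demo() {
    Worker w;
    auto ok = w.submit([] { return 6 * 7; });
    auto bad = w.submit([]() -> int { throw std::runtime_error("boom"); });
    int caught = 0;
    try { bad.get(); } catch (const std::runtime_error&) { caught = 1; }
    return ok.get() + caught;
}
```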

Signaling main thread when std::future is ready to be retrieved

I'm trying to understand the std::async, std::future system. What I don't quite understand is how you deal with running multiple async "tasks", and then, based on what returns first, second, etc, running some additional code.
Example: Let's say your main thread is in a simple loop. Now, based on user input, you run several functions via std::async, and save the futures in a std::list.
My issue is, how do I pass information back from the std::async function that can specify which future is complete?
My main thread is basically in a message loop, and what I need to do is have a function run by std::async be able to queue a message that somehow specifies which future is complete. The issue is that the function doesn't have access to the future.
Am I just missing something?
Here is some pseudo-code of what I'm trying to accomplish; extra points if there is also a way to "cancel" the request using a cancellation token.
class RequestA
int input1;
int output1;
//check for completion
// i.e. pop next "message"
if(auto *completed_task = get_next_completed_task())
// other code to handle user input
// note that I don't want to use a raw pointer but
// am not sure how to use future for this
RequestA *a = new RequestA();
run(a, OnRequestTypeAComplete);
void OnRequestTypeAComplete(RequestA &req)
// Do stuff with req, want access to inputs and output
Unfortunately, C++11 std::future provides neither continuations nor cancellation. You can retrieve the result from a std::future only once. Moreover, a future returned from std::async blocks in its destructor. There is a group headed by Sean Parent at Adobe that implemented future, async, and task as they should be, along with continuation functions such as when_all and when_any. That may be what you're looking for. In any case, have a look at this project: the code is of good quality and easy to read.
If platform-dependent solutions are also acceptable, you can check those. For Windows, I know of the PPL library, which also has primitives with cancellation and continuation.
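If you stay with plain std::future, one standard-only workaround is to poll each future with a zero timeout from the message loop. A rough sketch, assuming tasks launched with std::launch::async (is_ready and run_poll_demo are hypothetical helper names):

```cpp
#include <chrono>
#include <future>
#include <list>
#include <thread>

// True once the result is available; does not block.
bool is_ready(const std::future<int>& f) {
    return f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
}

// Usage: launch a few tasks and collect results in whatever order they finish.
int run_poll_demo() {
    std::list<std::future<int>> pending;
    for (int i = 1; i <= 3; ++i)
        pending.push_back(std::async(std::launch::async, [i] { return i * 10; }));

    int sum = 0;
    while (!pending.empty()) {
        for (auto it = pending.begin(); it != pending.end();) {
            if (is_ready(*it)) {
                sum += it->get();        // get() may be called only once per future
                it = pending.erase(it);  // so drop the future after reading it
            } else {
                ++it;
            }
        }
        std::this_thread::yield();       // a real message loop would handle input here
    }
    return sum;
}
```

Polling costs a pass over the list on every loop iteration, which is the price of not having continuations.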
You can create a struct containing a flag and pass a reference to that flag to your thread function.
Something a bit like this:
int stuff(std::atomic_bool& complete, std::size_t id)
std::cout << "starting: " << id << '\n';
// do stuff
// generate value
int value = hol::random_number(30);
// signal end
complete = true;
std::cout << "ended: " << id << " -> " << value << '\n';
return value;
struct task
std::future<int> fut;
std::atomic_bool complete;
task() = default;
task(task&& t): fut(std::move(t.fut)), complete(t.complete.load()) {}
int main()
// list of tasks
std::vector<task> tasks;
// reserve enough spaces so that nothing gets reallocated
// as that would invalidate the references to the atomic_bools
// needed to signal the end of a thread
// create a new task
// start it running
tasks.back().fut = std::async(std::launch::async, stuff, std::ref(tasks.back().complete), tasks.size());
tasks.back().fut = std::async(std::launch::async, stuff, std::ref(tasks.back().complete), tasks.size());
tasks.back().fut = std::async(std::launch::async, stuff, std::ref(tasks.back().complete), tasks.size());
// Keep going as long as any of the tasks is incomplete
while(std::any_of(std::begin(tasks), std::end(tasks),
[](auto& t){ return !t.complete.load(); }))
// do some parallel stuff
// process the results
int sum = 0;
for(auto&& t: tasks)
sum += t.fut.get();
std::cout << "sum: " << sum << '\n';
Here is a solution with a std::unordered_map instead of a std::list, in which you don't need to modify your callables. Instead, you use a helper function that assigns an id to each task and notifies when it finishes:
class Tasks {
* Helper to create the tasks in a safe way.
* lockTaskCreation is needed to guarantee newTask is (temporarily)
* assigned before it is moved to the list of tasks
template <class R, class ...Args>
void createNewTask(const std::function<R(Args...)>& f, Args... args) {
std::unique_lock<std::mutex> lock(mutex);
std::lock_guard<std::mutex> lockTaskCreation(mutexTaskCreation);
newTask = std::async(std::launch::async, executeAndNotify<R, Args...>,
std::move(lock), f, std::forward<Args>(args)...);
* Assign an id to the task, execute it, and notify when finishes
template <class R, class ...Args>
static R executeAndNotify(std::unique_lock<std::mutex> lock,
const std::function<R(Args...)>& f, Args... args)
std::lock_guard<std::mutex> lockTaskCreation(mutexTaskCreation);
tasks[std::this_thread::get_id()] = std::move(newTask);
Notifier notifier;
return f(std::forward<Args>(args)...);
* Class to notify when a task is completed (follows RAII)
class Notifier {
~Notifier() {
std::lock_guard<std::mutex> lock(mutex);
* Wait for a finished task.
* This function needs to be called in an infinite loop
static void waitForFinishedTask() {
std::unique_lock<std::mutex> lock(mutex);
cv.wait(lock, [] { return finishedTasks.size() || finish; });
if (finishedTasks.size()) {
auto threadId = finishedTasks.front();
auto result = tasks.at(threadId).get(); // assumed reconstruction: read the finished task's future by its id
std::cout << "task " << threadId
<< " returned: " << result << std::endl;
static std::unordered_map<std::thread::id, std::future<int>> tasks;
static std::mutex mutex;
static std::mutex mutexTaskCreation;
static std::queue<std::thread::id> finishedTasks;
static std::condition_variable cv;
static std::future<int> newTask;
Then, you can call an async task in this way:
int doSomething(int i) {
return i;
int main() {
Tasks tasks;
tasks.createNewTask(std::function<decltype(doSomething)>(doSomething), 10);
return 0;
See a complete implementation run on Coliru

Thread Pool: Block destruction until all work is done

I have the following thread pool implementation:
template<typename... event_args>
class thread_pool{
using handler_type = std::function<void(event_args...)>;
thread_pool(handler_type&& handler, std::size_t N = 4, bool finish_before_exit = true) : _handler(std::forward<handler_type&&>(handler)),_workers(N),_running(true),_finish_work_before_exit(finish_before_exit)
for(auto&& worker: _workers)
//worker function
worker = std::thread([this]()
while (_running)
//wait for work
std::unique_lock<std::mutex> _lk{_wait_mutex};
_cv.wait(_lk, [this]{
return !_events.empty() || !_running;
//_lk unlocked
//check to see why we woke up
if (!_events.empty()) {//was it new work
std::unique_lock<std::mutex> _readlk(_queue_mutex);
auto data = _events.front();
invoke(std::move(_handler), std::move(data));
}else if(!_running){//was it a signal to exit
//or was it spurious and we should just ignore it
//end worker function
{//block destruction until all work is done
std::condition_variable _work_remains;
std::mutex _wr;
std::unique_lock<std::mutex> lk{_wr};
return _events.empty();
//let all workers know to exit
//attempt to join all workers
for(auto&& _worker: _workers)
handler_type& handler()
return _handler;
void propagate(event_args&&... args)
//lock before push
std::unique_lock<std::mutex> _lk(_queue_mutex);
_lk.unlock();//explicit unlock
_cv.notify_one();//let worker know that data is available
bool _finish_work_before_exit;
handler_type _handler;
std::queue<std::tuple<event_args...>> _events;
std::vector<std::thread> _workers;
std::atomic_bool _running;
std::condition_variable _cv;
std::mutex _wait_mutex;
std::mutex _queue_mutex;
//helpers used to unpack tuple into function call
template<typename Func, typename Tuple, std::size_t... I>
auto invoke_(Func&& func, Tuple&& t, std::index_sequence<I...>)
return func(std::get<I>(std::forward<Tuple&&>(t))...);
template<typename Func, typename Tuple, typename Indicies = std::make_index_sequence<std::tuple_size<Tuple>::value>>
auto invoke(Func&& func, Tuple&& t)
return invoke_(std::forward<Func&&>(func), std::forward<Tuple&&>(t), Indicies());
I recently added this section to the destructor:
{//block destruction until all work is done
std::condition_variable _work_remains;
std::mutex _wr;
std::unique_lock<std::mutex> lk{_wr};
return _events.empty();
The intent was to have the destructor block until the work queue was fully consumed.
But it seems to put the program into deadlock. All of the work does get completed, but the wait does not seem to end when the work is done.
Consider this example main:
std::mutex writemtx;
thread_pool<int> pool{
[&](int i){
std::unique_lock<std::mutex> lk{writemtx};
std::cout<<i<<" : "<<std::this_thread::get_id()<<std::endl;
for (int i=0; i<8192; ++i) {
How can I have the destructor wait for the completion of the work without causing deadlock?
Your code deadlocks because _work_remains is a local condition variable that no part of your code ever notifies. You would need to make it a class member and have it notified by whichever thread picks up the last event from _events.
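A simplified sketch of that idea, assuming a single mutex instead of the question's two and an int payload instead of the variadic template (DrainingPool, drained_, in_flight_ and run_drain_demo are hypothetical names). The destructor waits on a member condition variable that workers notify once the queue is empty and no handler is still running:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class DrainingPool {   // hypothetical, simplified from thread_pool<int>
public:
    DrainingPool(std::function<void(int)> handler, std::size_t n)
        : handler_(std::move(handler)) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }
    ~DrainingPool() {
        std::unique_lock<std::mutex> lk(mutex_);
        // Block destruction until all queued work has been handled.
        drained_.wait(lk, [this] { return events_.empty() && in_flight_ == 0; });
        running_ = false;
        lk.unlock();
        wake_.notify_all();                 // let all workers know to exit
        for (auto& w : workers_) w.join();
    }
    void propagate(int value) {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            events_.push(value);
        }
        wake_.notify_one();
    }
private:
    void work() {
        std::unique_lock<std::mutex> lk(mutex_);
        for (;;) {
            wake_.wait(lk, [this] { return !events_.empty() || !running_; });
            if (events_.empty()) return;    // woken for shutdown
            int value = events_.front();
            events_.pop();
            ++in_flight_;
            lk.unlock();
            handler_(value);                // run the handler outside the lock
            lk.lock();
            --in_flight_;
            // The thread that finishes the last event wakes the destructor.
            if (events_.empty() && in_flight_ == 0) drained_.notify_all();
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable drained_;
    std::queue<int> events_;
    std::size_t in_flight_ = 0;
    bool running_ = true;
    std::function<void(int)> handler_;
    std::vector<std::thread> workers_;
};

// Usage, mirroring the question's example main.
int run_drain_demo() {
    std::atomic<int> handled{0};
    {
        DrainingPool pool([&handled](int) { ++handled; }, 4);
        for (int i = 0; i < 8192; ++i) pool.propagate(i);
    }
    return handled.load();
}
```

Counting in-flight handlers matters because an empty queue alone does not mean the last handler has returned.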

detached thread crashing on exiting

I am using a simple thread pool as below-
template<typename T>
class thread_safe_queue // thread safe worker queue.
std::atomic<bool> finish;
mutable std::mutex mut;
std::queue<T> data_queue;
std::condition_variable data_cond;
thread_safe_queue() : finish{ false }
void setDone()
void push(T new_value)
std::lock_guard<std::mutex> lk(mut);
void wait_and_pop(T& value)
std::unique_lock<std::mutex> lk(mut);
data_cond.wait(lk, [this]
return false == data_queue.empty();
if (finish.load() == true)
value = std::move(data_queue.front());
bool empty() const
std::lock_guard<std::mutex> lk(mut);
return data_queue.empty();
//Thread Pool
class ThreadPool
std::atomic<bool> done;
unsigned thread_count;
std::vector<std::thread> threads;
explicit ThreadPool(unsigned count = 1);
ThreadPool(const ThreadPool & other) = delete;
ThreadPool& operator = (const ThreadPool & other) = delete;
// If the thread is NOT marked detached and this is uncommented, the worker threads wait infinitely.
//for (auto &th : threads)
// if (th.joinable())
// th.join();
// }
void init()
thread_count = std::min(thread_count, std::thread::hardware_concurrency());
for (unsigned i = 0; i < thread_count; ++i)
threads.emplace_back(std::move(std::thread(&ThreadPool::workerThread, this)));
// The problem here: if I don't mark the thread detached, it waits infinitely for the condition.
// If I comment out the detach line and uncomment the commented lines in ~ThreadPool, the main thread waits infinitely.
catch (...)
void workerThread()
while (true)
std::function<void()> task;
if (done == true)
void submit(std::function<void(void)> fn)
The usage is like this:
struct start
ThreadPool::ThreadPool m_NotifPool;
ThreadPool::ThreadPool m_SnapPool;
int main()
start s;
return 0;
I am running this code on Visual Studio 2013. The problem occurs when the main thread exits: the program crashes with an exception.
Please help me understand what I am doing wrong. How do I stop the worker threads properly? I have spent quite some time on this but still can't figure out what the issue is.
Thanks in advance for your help.
I am not familiar with threads in C++, but I have worked with threading in C. In C, when you create child threads from the main thread, you have to keep the main thread from exiting until the children finish. If main exits, the threads become zombies. I don't think C throws an exception in the case of zombies, and maybe you are getting the exception because of these zombies. Try blocking main until the children finish and see if it works.
When main exits, detached threads are allowed to continue running; however, object s is destroyed. So, as your threads attempt to access members of object s, you run into undefined behavior (UB).
See the accepted answer to this question for more details about your issue: What happens to a detached thread when main() exits?
A rule of thumb is not to detach threads from main, but to signal the thread pool that the app is ending and join all threads. Or do as is answered in What happens to a detached thread when main() exits?
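A sketch of the signal-and-join approach, assuming the wait predicate in the question only checks for a non-empty queue as shown (in which case setting the finish flag never wakes a waiting worker, which would explain the endless wait on join). closable_queue, JoiningPool and run_join_demo are hypothetical names:

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

template <typename T>
class closable_queue {   // hypothetical variant of thread_safe_queue
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lk(mut_);
            data_.push(std::move(value));
        }
        cond_.notify_one();
    }
    void set_done() {
        {
            std::lock_guard<std::mutex> lk(mut_);
            finish_ = true;
        }
        cond_.notify_all();   // wake every waiting worker
    }
    // Returns false once the queue has been closed and drained.
    bool wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lk(mut_);
        // The predicate also checks finish_, so set_done() ends the wait.
        cond_.wait(lk, [this] { return finish_ || !data_.empty(); });
        if (data_.empty()) return false;
        value = std::move(data_.front());
        data_.pop();
        return true;
    }
private:
    std::mutex mut_;
    std::condition_variable cond_;
    std::queue<T> data_;
    bool finish_ = false;
};

class JoiningPool {
public:
    explicit JoiningPool(unsigned count) {
        for (unsigned i = 0; i < count; ++i)
            threads_.emplace_back([this] {
                std::function<void()> task;
                while (queue_.wait_and_pop(task)) task();
            });
    }
    ~JoiningPool() {
        queue_.set_done();                       // signal that the app is ending
        for (auto& th : threads_) th.join();     // workers never outlive the pool
    }
    void submit(std::function<void()> fn) { queue_.push(std::move(fn)); }
private:
    closable_queue<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
};

// Usage: two pools, as in the question's struct start; both are joined on scope exit.
int run_join_demo() {
    std::atomic<int> ran{0};
    {
        JoiningPool notif(2), snap(2);
        for (int i = 0; i < 100; ++i) {
            notif.submit([&ran] { ++ran; });
            snap.submit([&ran] { ++ran; });
        }
    }
    return ran.load();
}
```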