Proper usage of std::promise and std::future, segfaults - c++

I've been trying my luck on a small thread pool implementation.
However, after conceptualizing and implementing it, I've hit a brick wall.
I've confirmed that the worker threads are starting up and sleeping correctly, and that they pick up and execute stored tasks correctly.
However, my program segfaults; I'm pretty sure it happens at promise.set_value.
I'm not sure how I could provide a complete, verifiable example (given that I can hardly upload the whole code), but I'll include the segments I believe to be relevant to this problem.
First off, workers are created like this:
worker = [this](){
std::unique_lock<std::mutex> lock(mStatusMutex); //CV for status updates
if(mStatus != Running) //If threadpool status does not imply running
break; //Break out of loop, ending thread in the process
else //If threadpool is in running state
lock.unlock(); //Unlock state
while(true) //Loop until no tasks are left
mTasksMutex.lock(); //Lock task queue
if(mTasks.empty()) //IF no tasks left, break out of loop and return to waiting
else //Else, retrieve a task, unlock the task queue and execute the task
std::function<void()> task = mTasks.front();
task(); //Execute task
And then started and stored into a std::vector<std::thread> like this:
std::thread tWorker(worker);
Now, the part I believe to be tricky is adding tasks to, and executing tasks from, the task queue, which is a std::queue<std::function<void()>>.
The following two functions are relevant here:
template<typename RT>
inline std::future<RT> queueTask(std::function<RT()> _task, bool _execute = false)
std::promise<RT> promise;
std::function<void()> func([&_task, &promise]() -> RT {
RT val = _task();
if(_execute) flush();
return promise.get_future();
inline void flush()
Is there anything fundamentally wrong with this approach?
For anyone who believes this to be a bad question, feel free to tell me how I can improve it.
Full code is hosted on my github repo.

The main problem is that the promise is already dead. When queueTask is done, the promise is destroyed, and the task now just has a dangling reference. The task must share ownership of the promise in order for it to live long enough to fulfill it.
The same is true of the underlying std::function object _task, since you're capturing it by reference.
You're using std::function, which requires copyable objects, hence... shared_ptr:
template<typename RT>
inline std::future<RT> queueTask(std::function<RT()> _task, bool _execute = false)
{
    // The promise is shared between this function and the queued task.
    auto promise = std::make_shared<std::promise<RT>>();
    std::function<void()> func([promise, task=std::move(_task)]{
        RT val = task(); // call the captured copy, not the dead parameter
        promise->set_value(val);
    });
    {
        std::lock_guard<std::mutex> lk(mTasksMutex); // NB: no manual lock()/unlock()!!
        mTasks.push(func);
    }
    if(_execute) flush();
    return promise->get_future();
}
Consider std::packaged_task instead.
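To make the std::packaged_task suggestion a bit more concrete, here is a minimal sketch. The packaged task owns the shared state itself, so there is no separate promise to keep alive. TaskQueue, submit and runOne are hypothetical names that are not in the question's code, and the worker threads are left out: runOne only stands in for what a worker would do with each task.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for the pool's task queue (not the asker's class).
class TaskQueue {
public:
    template <typename RT>
    std::future<RT> submit(std::function<RT()> task) {
        // std::packaged_task is move-only but std::function needs a copyable
        // callable, so the packaged task is held through a shared_ptr.
        auto packaged = std::make_shared<std::packaged_task<RT()>>(std::move(task));
        std::future<RT> result = packaged->get_future();
        {
            std::lock_guard<std::mutex> lk(mutex_);
            tasks_.push([packaged] { (*packaged)(); });
        }
        return result;
    }

    // What a worker thread would do per task; returns false if the queue was empty.
    bool runOne() {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lk(mutex_);
            if (tasks_.empty()) return false;
            task = std::move(tasks_.front());
            tasks_.pop();
        }
        task();  // runs outside the lock; the result lands in the future's shared state
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

A caller would write something like auto f = q.submit<int>([] { return 42; }); and later call f.get(). An exception thrown by the task is stored in the future as well, which the hand-written promise version would have to do manually.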


Approach of using an std::atomic compared to std::condition_variable wrt pausing & resuming an std::thread in C++

This is a separate question but related to the previous question I asked here
I am using an std::thread in my C++ code to constantly poll for some data & add it to a buffer. I use a C++ lambda to start the thread like this:
StartMyThread() {
thread_running = true;
the_thread = std::thread { [this] {
while(thread_running) {
thread_running is an atomic<bool> declared in class header. Here is my GetData function:
GetData() {
//Some heavy logic
Next I also have a StopMyThread function where I set thread_running to false so that it exits out of the while loop in the lambda block.
StopMyThread() {
thread_running = false;
As I understand, I can pause & resume the thread using a std::condition_variable as pointed out here in my earlier question.
But is there a disadvantage if I just use the std::atomic<bool> thread_running to execute or not execute the logic in GetData() like below ?
GetData() {
if (thread_running == false)
//Some heavy logic
Will this burn more CPU cycles compared to the approach of using an std::condition_variable as described here ?
A condition variable is useful when you want to conditionally halt another thread. So you might have an always-running "worker" thread that waits whenever it notices it has nothing to do.
The atomic solution requires your UI interaction synchronize with the worker thread, or very complex logic to do it asynchronously.
As a general rule, your UI response thread should never block on non-ready state from worker threads.
struct worker_thread {
  worker_thread( std::function<void()> t, bool play = true ):
    execute(play),
    task(std::move(t))
  {
    thread = std::async( std::launch::async, [this]{
      work();
    });
  }
  // move is not safe. If you need this movable,
  // use unique_ptr<worker_thread>.
  worker_thread(worker_thread&& )=delete;
  ~worker_thread() {
    if (!exit) finalize();
    wait();
  }
  // Ask the worker to exit after the current task() call; does not block.
  void finalize() {
    auto l = lock();
    exit = true;
    cv.notify_one();
  }
  // Stop calling task() once the current call has finished.
  void pause() {
    auto l = lock();
    execute = false;
  }
  // Resume calling task().
  void play() {
    auto l = lock();
    execute = true;
    cv.notify_one();
  }
  // Block until the worker has exited; call finalize() first.
  void wait() {
    if (thread.valid())
      thread.get();
  }
private:
  void work() {
    while(true) {
      bool done = false;
      {
        auto l = lock();
        cv.wait( l, [&]{
          return exit || execute;
        });
        done = exit; // have lock here
      }
      if (done) break;
      task();
    }
  }
  std::unique_lock<std::mutex> lock() {
    return std::unique_lock<std::mutex>(m);
  }
  std::mutex m;
  std::condition_variable cv;
  bool exit = false;
  bool execute = true;
  std::function<void()> task;
  std::future<void> thread;
};
or somesuch.
This owns a thread. The thread repeatedly runs task as long as it is in play() mode. If you pause(), the worker thread stops the next time task() finishes. If you play() again before that task() call finishes, the worker never notices the pause().
The only wait is on destruction of worker_thread, where it automatically informs the worker thread it should exit and it waits for it to finish.
You can manually .wait() or .finalize() as well. .finalize() is async, but if your app is shutting down you can call it early and give the worker thread more time to clean up while the main thread cleans things up elsewhere.
.finalize() cannot be reversed.
Code not tested.
Unless I'm missing something, you already answered this in your original question: You'll be creating and destroying the worker thread each time it's needed. This may or may not be an issue in your actual application.
There are two different problems being solved, and which one applies depends on what you're actually doing. One problem is "I want my thread to run until I tell it to stop." The other seems to be a case of "I have a producer/consumer pair and want to be able to notify the consumer when data is ready." The thread_running and join method works well for the first of those. For the second, you may want to use a mutex and a condition variable, because you're doing more than just using the state to trigger work. Suppose you have a vector<Work>. You guard that with the mutex, so the condition becomes [&work] (){ return !work.empty(); } or something similar. When the wait returns, you hold the mutex, so you can take things out of work and do them. When you're done, you go back to wait, releasing the mutex so the producer can add things to the queue.
You may want to combine these techniques. Have a "done processing" atomic that all of your threads periodically check to know when to exit so that you can join them. Use the condition to cover the case of data delivery between threads.
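Here is a minimal sketch of that combination, under a few assumptions: Channel, push, stop, consume and run_demo are made-up names, and a Work item is reduced to an int. The "done" flag is an atomic as suggested above, but it is set while holding the mutex; otherwise the consumer could check the predicate, miss the change, and go to sleep for good.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names; a "Work" item is just an int here.
struct Channel {
    std::mutex m;
    std::condition_variable cv;
    std::vector<int> work;          // guarded by m
    std::atomic<bool> done{false};  // "done processing" flag

    void push(int item) {
        {
            std::lock_guard<std::mutex> lk(m);
            work.push_back(item);
        }
        cv.notify_one();
    }

    void stop() {
        {
            // Set under the mutex so a consumer cannot miss the change
            // between evaluating the predicate and blocking.
            std::lock_guard<std::mutex> lk(m);
            done = true;
        }
        cv.notify_all();
    }
};

// Consumer: sleeps while there is nothing to do, drains the vector when woken.
int consume(Channel& ch) {
    int sum = 0;
    for (;;) {
        std::vector<int> batch;
        {
            std::unique_lock<std::mutex> lk(ch.m);
            ch.cv.wait(lk, [&] { return !ch.work.empty() || ch.done; });
            if (ch.work.empty()) return sum;  // done was set and nothing is left
            batch.swap(ch.work);
        }
        for (int item : batch) sum += item;  // the "heavy logic" runs without the lock
    }
}

int run_demo() {
    Channel ch;
    int total = 0;
    std::thread consumer([&] { total = consume(ch); });
    for (int i = 1; i <= 10; ++i) ch.push(i);
    ch.stop();
    consumer.join();
    return total;
}
```

While the consumer is blocked in wait() it uses no CPU, which is the practical difference from polling an atomic<bool> in a loop.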

Is this threadpool usage safe?

I'm posting several jobs to a threadpool and then waiting for it to finish. I'm wondering if I've missed something here, since occasionally my worker threads seem to freeze.
My main thread starts the workers like this:
numJobsPosted = 0;
for(auto entry : list)
threadPool->post(std::bind(&Controller::workerFunc, this, entry));
std::unique_lock<std::mutex> lock(m_workerLock);
while(numJobsPosted > 0)
Now my workerFunc looks something like this:
void Controller::workerFunc(Entry entry)
// do some work with entry
// notify finished
if(numJobsPosted <= 0)
// does the lock need to be around the numJobsPosted-- ?
std::unique_lock<std::mutex> locker(m_workerLock);
Is the above code safe, or do I need to put the lock around the decrement operator?
This may depend on details of your thread pool's inner logic or setup (e.g. if you have a single thread, so jobs are actually run sequentially), but assuming that numJobsPosted is an int or similar built-in type, your code isn't thread-safe.
This line in workerFunc (the numJobsPosted decrement) could very well be the subject of a race condition if it gets executed by several jobs concurrently.
Also, I'm not sure what your threadpool's post function does precisely, but if it dispatches the worker function to a thread right away and some of the worker functions can return immediately, you have another possible race condition between this line in your main thread code:
and this line in workerFunc:
To make it safe, you can for instance make numJobsPosted atomic, e.g. declare it like this (in C++11):
#include <atomic>
std::atomic_int numJobsPosted;
Making your workerFunc something like this:
void Controller::workerFunc(Entry entry)
// do some work with entry
// notify finished
std::unique_lock<std::mutex> locker(m_workerLock);
if(numJobsPosted <= 0)
may solve the first race condition case, but not the second.
(Also, I don't really understand the logic around the manipulation and testing you're doing on numJobsPosted, but I think that's beside the point of your question)
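One way to close both races described above is to keep the counter under the same mutex that the condition variable uses, and to fix the count before any job can run. The sketch below assumes a hypothetical JobCounter helper that is not part of the question's Controller class, and plain std::thread objects stand in for the thread pool's post().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper, not part of the question's Controller class.
class JobCounter {
public:
    explicit JobCounter(std::size_t jobs) : remaining_(jobs) {}

    // Called by each worker when its job is finished.
    void jobDone() {
        std::lock_guard<std::mutex> lk(m_);
        if (--remaining_ == 0) cv_.notify_all();
    }

    // Called by the posting thread; returns once every job has reported in.
    void waitAll() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return remaining_ == 0; });
    }

    std::size_t remaining() {
        std::lock_guard<std::mutex> lk(m_);
        return remaining_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t remaining_;  // guarded by m_
};

// Plain threads stand in for the thread pool's post().
std::size_t run_jobs(std::size_t n) {
    JobCounter counter(n);  // the count is fixed before any job can run
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n; ++i)
        workers.emplace_back([&counter] { counter.jobDone(); });
    counter.waitAll();
    for (auto& t : workers) t.join();
    return counter.remaining();
}
```

Because the decrement and the predicate check both happen under m_, a worker cannot finish between the waiter's check and its wait, so the wakeup cannot be lost.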

Thread pool stuck on wait condition

My C++ program gets stuck when using this thread pool class:
class ThreadPool {
unsigned threadCount;
std::vector<std::thread> threads;
std::list<std::function<void(void)> > queue;
std::atomic_int jobs_left;
std::atomic_bool bailout;
std::atomic_bool finished;
std::condition_variable job_available_var;
std::condition_variable wait_var;
std::mutex wait_mutex;
std::mutex queue_mutex;
std::mutex mtx;
void Task() {
while (!bailout) {
std::function<void(void)> next_job() {
std::function<void(void)> res;
std::unique_lock<std::mutex> job_lock(queue_mutex);
// Wait for a job if we don't have any.
job_available_var.wait(job_lock, [this]()->bool { return queue.size() || bailout; });
// Get job from the queue
if (!bailout) {
res = queue.front();
}else {
// If we're bailing out, 'inject' a job into the queue to keep jobs_left accurate.
res = [] {};
return res;
ThreadPool(int c)
: threadCount(c)
, threads(threadCount)
, jobs_left(0)
, bailout(false)
, finished(false)
for (unsigned i = 0; i < threadCount; ++i)
threads[i] = std::move(std::thread([this, i] { this->Task(); }));
~ThreadPool() {
void AddJob(std::function<void(void)> job) {
std::lock_guard<std::mutex> lock(queue_mutex);
void JoinAll(bool WaitForAll = true) {
if (!finished) {
if (WaitForAll) {
// note that we're done, and wake up any thread that's
// waiting for a new job
bailout = true;
for (auto& x : threads)
if (x.joinable())
finished = true;
void WaitAll() {
std::unique_lock<std::mutex> lk(wait_mutex);
if (jobs_left > 0) {
wait_var.wait(lk, [this] { return this->jobs_left == 0; });
gdb says (when I interrupt the blocked execution) that the program is stuck in (std::unique_lock&, ThreadPool::WaitAll()::{lambda()#1})+58>
I'm using g++ v5.3.0 with support for c++14 (-std=c++1y)
How can I avoid this problem?
I've edited (rewrote) the class:
The issue here is a race condition on your job count. You're using one mutex to protect the queue, and another to protect the count, which is semantically equivalent to the queue size. Clearly the second mutex is redundant (and improperly used), as is the job_count variable itself.
Every method that deals with the queue has to gain exclusive access to it (even JoinAll to read its size), so you should use the same queue_mutex in the three bits of code that tamper with it (JoinAll, AddJob and next_job).
Btw, splitting the code at next_job() is pretty awkward IMO. You would avoid calling a dummy function if you handled the worker thread body in a single function.
As other comments have already stated, you would probably be better off getting your eyes off the code and reconsidering the problem globally for a while.
The only thing you need to protect here is the job queue, so you need only one mutex.
Then there is the problem of waking up the various actors, which requires a condition variable since C++ basically does not give you any other useable synchronization object.
Here again you don't need more than one variable. Terminating the thread pool is equivalent to dequeueing the jobs without executing them, which can be done any which way, be it in the worker threads themselves (skipping execution if the termination flag is set) or in the JoinAll function (clearing the queue after gaining exclusive access).
Last but not least, you might want to invalidate AddJob once someone decided to close the pool, or else you could get stuck in the destructor while someone keeps feeding in new jobs.
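As a rough illustration of the single-mutex design described above, here is a sketch under stated assumptions. SimplePool and its member names are made up and do not match the question's class. On shutdown this version lets the workers drain the queue instead of discarding the remaining jobs, which is just one of the options mentioned. addJob returns false once the pool is closing.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical rewrite; names do not match the question's class.
class SimplePool {
public:
    explicit SimplePool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { workerLoop(); });
    }
    ~SimplePool() { joinAll(); }

    // Returns false once the pool is closing, so callers cannot keep it alive.
    bool addJob(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            if (closing_) return false;
            queue_.push(std::move(job));
        }
        cv_.notify_all();
        return true;
    }

    // Blocks until the queue is empty and no job is still executing.
    void waitAll() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return queue_.empty() && busy_ == 0; });
    }

    void joinAll() {
        {
            std::lock_guard<std::mutex> lk(m_);
            closing_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }

private:
    void workerLoop() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return closing_ || !queue_.empty(); });
            if (queue_.empty()) return;  // closing and nothing left to run
            std::function<void()> job = std::move(queue_.front());
            queue_.pop();
            ++busy_;
            lk.unlock();
            job();  // run without holding the mutex
            lk.lock();
            --busy_;
            cv_.notify_all();  // may wake a waitAll() caller
        }
    }

    std::mutex m_;  // the only mutex: guards every member below except threads_
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::size_t busy_ = 0;
    bool closing_ = false;
    std::vector<std::thread> threads_;
};

int run_pool_demo() {
    std::atomic<int> counter{0};
    SimplePool pool(4);
    for (int i = 0; i < 100; ++i) pool.addJob([&counter] { ++counter; });
    pool.waitAll();
    return counter;
}
```

The queue size plus the busy_ count replaces jobs_left, and both are only touched under m_, so WaitAll can no longer miss the last decrement.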
I think you need to keep it simple.
You seem to be using one mutex too many. There's queue_mutex, which you use when you add and process jobs.
So why do you need another, separate mutex when you are waiting on the queue?
Why can't you just use a condition variable with the same queue_mutex to read the queue in your WaitAll() method?
I would also recommend using a lock_guard instead of the unique_lock in your WaitAll. There really isn't a need to lock the queue_mutex beyond the WaitAll under exceptional conditions. If you exit the WaitAll exceptionally it should be released regardless.
Ignore my Update above. Since you are using a condition variable you can't use a lock guard in the WaitAll. But if you are using a unique_lock always go with the try_to_lock version especially if you have more than a couple control paths

Simple threaded timer, sanity check please

I've made a very simple threaded timer class and given the pitfalls around MT code, I would like a sanity check please. The idea here is to start a thread then continuously loop waiting on a variable. If the wait times out, the interval was exceeded and we call the callback. If the variable was signalled, the thread should quit and we don't call the callback.
One of the things I'm not sure about is what happens in the destructor with my code, given the thread may be joinable there (just). Can I join a thread in a destructor to make sure it's finished?
Here's the class:
class TimerThreaded
TimerThreaded() {}
if (MyThread.joinable())
void Start(std::chrono::milliseconds const & interval, std::function<void(void)> const & callback)
if (MyThread.joinable())
MyThread = std::thread([=]()
for (;;)
auto locked = std::unique_lock<std::mutex>(MyMutex);
auto result = MyTerminate.wait_for(locked, interval);
if (result == std::cv_status::timeout)
void Stop()
std::thread MyThread;
std::mutex MyMutex;
std::condition_variable MyTerminate;
I suppose a better question might be to ask someone to point me towards a very simple threaded timer, if there's one already available somewhere.
Can I join a thread in a destructor to make sure it's finished?
Not only can you, it's quite typical to do so. If the thread object is still joinable when it's destroyed (i.e. it was started and has been neither joined nor detached, even if its function has already returned), std::terminate is called.
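A small sketch of joining in a destructor, using a hypothetical JoiningThread wrapper (the name and the demo function are made up for illustration):

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins on destruction, so the std::thread
// member is never destroyed while still joinable.
class JoiningThread {
public:
    template <typename F>
    explicit JoiningThread(F&& f) : t_(std::forward<F>(f)) {}
    ~JoiningThread() {
        if (t_.joinable()) t_.join();
    }
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread t_;
};

int run_joining_demo() {
    int value = 0;
    {
        JoiningThread worker([&value] { value = 7; });
    }  // the destructor joins here, so the write to value is complete
    return value;
}
```

If C++20 is available, std::jthread joins in its destructor in the same way.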
For some reason result is always timeout. It never seems to get signalled and so never stops. Is it correct? notify_all should unblock the wait_for?
It can only unblock if the thread happens to be waiting on the cv at the time. What you're probably doing is calling Start and then immediately calling Stop, before the thread has started running and begun waiting (or possibly while the callback is running). In that case, the thread would never be notified.
There is another problem with your code. Blocked threads may be spuriously woken up on some implementations even when you don't explicitly call notify_X. That would cause your timer to stop randomly for no apparent reason.
I propose that you add a flag variable that indicates whether Stop has been called. This will fix both of the above problems. This is the typical way to use condition variables. I've even written the code for you:
class TimerThreaded
MyThread = std::thread([=]()
for (;;)
auto locked = std::unique_lock<std::mutex>(MyMutex);
auto result = MyTerminate.wait_for(locked, interval);
if (stop_please)
if (result == std::cv_status::timeout)
void Stop()
std::lock_guard<std::mutex> lock(MyMutex);
stop_please = true;
bool stop_please = false;
With these changes your timer should work, but do realize that "[std::condition_variable::wait_for] may block for longer than timeout_duration due to scheduling or resource contention delays", in the words of
point me towards a very simple threaded timer, if there's one already available somewhere.
I don't know of a standard C++ solution, but modern operating systems typically provide this kind of functionality, or at least pieces that can be used to build it. See timerfd_create on Linux for an example.
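The stop-flag approach described above can also be written with the predicate overload of wait_for, which rechecks the flag after every wakeup and so covers both the missed notification and spurious wakeups. This is only a sketch: SketchTimer and ticks_after_immediate_stop are hypothetical names, and unlike the question's class the timer starts in its constructor.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical timer; starts in the constructor, stops in stop() or the destructor.
class SketchTimer {
public:
    SketchTimer(std::chrono::milliseconds interval, std::function<void()> callback) {
        thread_ = std::thread([this, interval, callback] {
            std::unique_lock<std::mutex> lk(m_);
            // wait_for(lock, duration, pred) returns the value of pred():
            // true means stop was requested, false means the interval elapsed.
            while (!cv_.wait_for(lk, interval, [this] { return stop_; })) {
                lk.unlock();  // do not hold the mutex while user code runs
                callback();
                lk.lock();
            }
        });
    }

    ~SketchTimer() { stop(); }

    // Must not be called from inside the callback (the thread would join itself).
    void stop() {
        {
            std::lock_guard<std::mutex> lk(m_);
            stop_ = true;
        }
        cv_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;  // guarded by m_
    std::thread thread_;
};

int ticks_after_immediate_stop() {
    std::atomic<int> ticks{0};
    {
        SketchTimer timer(std::chrono::hours(1), [&ticks] { ++ticks; });
    }  // the destructor requests a stop right away; this must not wait an hour
    return ticks;
}
```

Because the predicate is evaluated before the thread blocks, a stop() that arrives before the thread starts waiting is still seen.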

Parallel writer and reader of std::vector

I have a class that is used by 2 threads at the same time: one thread adds results (one by one) to the results of a task, the second thread works on those results that are already there.
// all members are copy-able
struct task {
command cmd;
vector<result> results;
class generator {
generator(executor* e); // store the ptr
void run();
class executor {
void run();
void add_result(int command_id, result r);
task& find_task(int command_id);
vector<task> tasks_;
condition_variable_any update_condition_;
// In main, we have instances of generator and executor,
// we launch 2 threads and wait for them.
std::thread gen_th( std::bind( &generator::run, gen_instance_) );
std::thread exe_th( std::bind( &executor::run, exe_instance_) );
Generator Thread
void generator::run() {
while(is_running) {
executor_->add_result( SOME_ID, new_result() );
Executor thread
void executor::add_result( int command_id, result r ) {
std::unique_lock<std::recursive_mutex> l(mutex_);
task& t = this->find_task(command_id);
void executor::run() {
while(is_running) {
task& t = this->find_task(SOME_ID);
for(result r: t.results) {
// no live updates are visible here
Generator thread adds a result every few seconds.
Executor thread is an executor itself. It is run via the run method, which waits for an update and when that happens, it works on the results.
Few things to take notice of:
vector of tasks may be big; the results are never disposed;
the for-each loop in executor fetches the task it's working on, then iterates over results, checks which of them are new and processes them. Once processed, they are marked and won't be processed again. This processing may take some time.
The problem occurs when the executor thread doesn't finish the for loop before another result is added: the new result object is not visible in the for loop. Since the executor thread is busy working, it doesn't notice the update condition being signalled, doesn't refresh the vector, etc. When it finishes (working on an already outdated view of tasks_), it blocks again on update_condition_, which was just triggered.
I need to make the code aware, that it should run the loop again after finishing it or make changes to a task visible in the for-each loop. What is the best solution to this problem?
You just need to check whether your vector is empty or not before blocking on the CV. Something like this:
while (running) {
    std::unique_lock<std::mutex> lock(mutex);
    while (tasks_.empty()) // <-- this is important
        update_condition_.wait(lock); // CV name taken from the question
    // handle tasks_
}
If your architecture allows it (ie. if you don't need to hold the lock while handling the tasks), you may also want to unlock the mutex ASAP, before handling the tasks so that the producer can push more tasks without blocking. Maybe swapping your tasks_ vector with a temporary one, then unlock the mutex, and only then start handling the tasks in the temporary vector:
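while (running) {
    std::unique_lock<std::mutex> lock(mutex);
    while (tasks_.empty())
        update_condition_.wait(lock); // CV name taken from the question
    std::vector<task> localTasks;
    localTasks.swap(tasks_); // take everything queued so far
    lock.unlock(); // <-- release the lock early
    // handle localTasks
}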
Edit: ah now I realize this doesn't really fit your situation, because your messages are not directly in tasks_ but in tasks_.results. You get my general idea though, but using it will require structure changes in your code (eg. flatten your tasks / results and always have a cmd associated with a single result).
In the same situation, I do the following:
std::vector< ... > temp;
temp.swap( results );
for(result r: temp ){
This adds a little overhead, but in general the whole code is more readable, and if the amount of computation is large, the time spent copying becomes negligible in comparison. (Sorry for my English; it's not my native language.)
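A self-contained sketch of the swap idea, with the locking spelled out. ResultBuffer, add and takeAll are hypothetical names, and a result is reduced to an int. Note that std::vector::swap only exchanges the internal buffers, so nothing is copied element by element, and that taking the results out like this means they no longer stay in the task, unlike in the question's design.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical names; a "result" is reduced to an int for this sketch.
class ResultBuffer {
public:
    // Called by the generator thread.
    void add(int r) {
        std::lock_guard<std::mutex> lk(m_);
        results_.push_back(r);
    }

    // Called by the executor thread: hands over everything collected so far
    // and leaves the shared vector empty. The lock is held only for the swap.
    std::vector<int> takeAll() {
        std::vector<int> temp;
        std::lock_guard<std::mutex> lk(m_);
        temp.swap(results_);
        return temp;
    }

private:
    std::mutex m_;
    std::vector<int> results_;  // guarded by m_
};
```

The executor can then loop over the returned vector for as long as it likes while the generator keeps calling add(); anything added in the meantime is picked up by the next takeAll().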