Limitation on Qt and boost thread local storage - c++

I have the following questions about QThreadStorage and Boost's thread_specific_ptr:
1) Is there a limit on the number of objects that can be stored in QThreadStorage? I came across a Qt note mentioning 256 QThreadStorage objects, and I would like to clarify what that limit refers to.
2) Does QThreadStorage work only with QThreads?
3) Is there any limit on Boost TLS?
4) I have a use case where I want to operate on TLS and, once all threads have finished, sync the data to the main thread for further processing. I wrote the code below and would like to check whether it is okay.
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <boost/thread/thread.hpp>
#include <boost/thread/tss.hpp>
boost::mutex mutex1;
int glob = 0;
class data
{
public:
    char* p;
    data()
    {
        p = (char*)malloc(10);
        sprintf(p, "test%d\n", ++glob);
    }
};
char* global_p[11] = {0};
int index = -1;
void cleanup(data* _ignored) {
    std::cout << "TLS cleanup" << std::endl;
    boost::mutex::scoped_lock lock(mutex1);
    global_p[++index] = _ignored->p;
}
boost::thread_specific_ptr<data> value(cleanup);
void thread_proc()
{
    value.reset(new data()); // initialize the thread's storage
    std::cout << "here" << std::endl;
}
int main(int argc, char* argv[])
{
    boost::thread_group threads;
    for (int i=0; i<10; ++i)
        threads.create_thread(&thread_proc); // loop body missing from the post; assumed
    threads.join_all();                      // assumed
    for (int i=0; i<10; ++i)
        std::cout << global_p[i];            // loop body missing from the post; assumed
}

I can partially answer your question.
1) The 256 limit belongs to old Qt; you are probably reading old documentation. Newer Qt versions (i.e. above 4.6) do not have such a limit.
2) QThreadStorage can destroy the contained items at thread exit because it works closely with QThread, so separating the two is not a wise idea in my opinion.
3) I take it you are asking about the number of objects that can be stored with Boost TLS. I am not aware of any limitation on Boost TLS, so you should be fine.
4) Your code looks OK to me, except that in the constructor of data you need to take a mutex lock before ++glob; otherwise two threads can increment it at the same time and you may not get an incrementing value.
I hope this helps.
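As a rough illustration of point 4 and of the "sync to the main thread" use case, here is a minimal sketch. It swaps in standard C++11 thread_local and std::atomic rather than QThreadStorage or boost::thread_specific_ptr, and every name in it (next_id, results, worker, run_and_collect) is made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the asker's code.
std::atomic<int> next_id{0};        // replaces the unsynchronized ++glob
std::mutex results_mutex;           // guards `results`
std::vector<std::string> results;   // filled by workers, read by main after join

void worker() {
    // Per-thread scratch data: each thread gets its own copy.
    thread_local std::string local;
    local = "test" + std::to_string(++next_id);

    // Hand the thread's data over before the thread exits.
    std::lock_guard<std::mutex> lock(results_mutex);
    results.push_back(local);
}

// Runs n workers and returns everything they produced.
std::vector<std::string> run_and_collect(int n) {
    assert(n >= 0);
    {
        std::lock_guard<std::mutex> lock(results_mutex);
        results.clear();
    }
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();   // after this, no worker touches `results`
    return results;
}
```

Because the atomic increment is indivisible, every worker gets a distinct number, and the main thread only reads the collected data after all joins have returned.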


Thread Queue C++

(The original post has been edited.)
How can I make a thread pool for two nested for loops in C++? I need to run the start_thread function 22 times for each number from 0 to 5, and the number of threads available will vary with the machine I am using. How can I create a pool that hands each free thread the next iteration of the nested loop?
for (int t=0; t <22; t++){
    for(int p=0; p<6; p++){
        thread th1(start_thread, p);
        thread th2(start_thread, p);
        // ... (rest of the loop body is not shown in the post)
    }
}
I'm not really certain what you want, but maybe it's something like this:
for (int t=0; t <22; t++){
    std::vector<std::thread> th;
    for(int p=0; p<6; p++){
        th.emplace_back(std::thread(start_thread, p));
    }
    for(int p=0; p<6; p++){
        th[p].join();
    }
}
(or maybe swap the two loops)
Edit: if you want to control the number of threads:
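#include <iostream>
#include <thread>
#include <vector>
void start_thread(int t, int p) {
    std::cout << "th " << t << ' ' << p << '\n';
}
void join_all(std::vector<std::thread> &th) {
    for(auto &e: th) e.join();
    th.clear();
}
int main() {
    std::size_t max_threads=std::thread::hardware_concurrency();
    std::vector<std::thread> th;
    for(int t=0; t <22; ++t) {
        for(int p=0; p<6; ++p) {
            th.emplace_back(std::thread(start_thread, t, p));
            // once the batch is full, wait for it before starting more threads
            if(th.size()>=max_threads) join_all(th);
        }
    }
    join_all(th); // wait for the last, possibly partial, batch
    return 0;
}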
If you don't want a dependency on a third-party library, this is pretty simple.
Just create as many threads as you like and let them pick a "job" from some queue.
For example:
#include <iostream>
#include <mutex>
#include <chrono>
#include <vector>
#include <thread>
#include <queue>
void work(int p)
{
    // do the "work"
    std::cout << p << std::endl;
}
std::mutex m;
std::queue<int> jobs;
void worker()
{
    while (true)
    {
        int job(0);
        {
            // sync access to the jobs queue
            std::lock_guard<std::mutex> l(m);
            if (jobs.empty())
                return; // nothing left to do, so the thread ends
            job = jobs.front();
            jobs.pop();
        }
        work(job); // run the job outside the lock
    }
}
int main()
{
    // queue all jobs
    for (int t = 0; t < 22; t++) {
        for (int p = 0; p < 6; p++) {
            jobs.push(p);
        }
    }
    // create reasonable number of threads
    static const int n = std::thread::hardware_concurrency();
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(std::thread(worker));
    // wait for all of them to finish
    for (int i = 0; i < n; ++i)
        threads[i].join();
}
[ADDED] Obviously, you don't want global variables in your production code; this is simply a demo solution.
Stop trying to code and draw out what you need to do and the pieces you need to have in order to do it.
You need one queue to hold the jobs, one mutex to protect the queue so the threads don't smurf it up with simultaneous accesses, and N threads.
Each thread function is a loop that
1. grabs the mutex,
2. gets a job from the queue,
3. releases the mutex, and
4. processes the job.
In this case I'd keep things simple by exiting the loop and the thread when there are no more jobs in the queue in step 2. In production you'd have the thread block and wait on the queue so it's still available to service jobs added later.
Wrap that up in a class with a function that allows you to add jobs to the queue, a function to start N threads, and a function to join on all of the running threads.
main defines an instance of the class, feeds in the jobs, starts the thread pool and then blocks on join until everyone's done.
Once you've beaten the design into something you have high confidence does what you need it to do, then you start writing code. Write code, especially multi-threaded code, without a plan and you're in for a lot of debugging and re-writing that usually exceeds the time spent on design by a significant margin.
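The design described above could be sketched roughly as follows. The class name SimplePool and its member names are invented for this illustration, and the sketch assumes all jobs are queued before start() is called, so workers simply exit when the queue runs dry.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class name and interface, chosen for illustration only.
class SimplePool {
public:
    // Add a job; in this sketch all jobs are queued before start() is called.
    void add_job(std::function<void()> job) {
        assert(job);
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push(std::move(job));
    }

    // Start n worker threads.
    void start(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads_.emplace_back([this] { run(); });
    }

    // Block until every worker has drained the queue and exited.
    void join() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(mutex_);  // 1. grab the mutex
                if (jobs_.empty()) return;                 //    no jobs left: exit
                job = std::move(jobs_.front());            // 2. get a job
                jobs_.pop();
            }                                              // 3. release the mutex
            job();                                         // 4. process the job
        }
    }

    std::mutex mutex_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> threads_;
};
```

main would then create a SimplePool, call add_job for each of the 22 x 6 combinations, call start(std::thread::hardware_concurrency()), and finally join(). A production pool would block on a condition variable instead of exiting, as noted above.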
Since C++17 you can use one of the execution policies with many of the algorithms in the standard library. This can greatly simplify going over a number of work packages. What goes on behind the curtains is usually that it picks threads from a built-in thread pool and distributes work to them efficiently. It usually uses just enough™ threads on both Linux and Windows, and it'll use all the CPU you've got left (0% idle on all cores once the CPUs have started spinning at max frequency), strangely without making either Linux or Windows "sluggish".
Here I've used the execution policy std::execution::parallel_policy (indicated by the std::execution::par constant). If you can prepare the work that needs to be done and put it in a container, like a std::vector, it'll be really easy.
#include <algorithm>
#include <chrono>
#include <execution> // std::execution::par
#include <iostream>
// #include <thread> // not needed to run with execution policies
#include <vector>
struct work_package {
    work_package() : payload(co) { ++co; }
    int payload;
    static int co;
};
int work_package::co = 10;
int main() {
    std::vector<work_package> wps(22*6); // 132 work packages
    for(const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 10 to 141
    // work on the work packages
    std::for_each(std::execution::par, wps.begin(), wps.end(), [](auto& wp) {
        // Probably in a thread - As long as you do not write to the same work package
        // from different threads, you don't need synchronization here.
        // do some work with the work package
        ++wp.payload;
    });
    for(const auto& wp : wps) std::cout << wp.payload << '\n'; // prints 11 to 142
}
With g++ you may need to install tbb (The Threading Building Blocks) that you also need to link with: -ltbb.
apt install libtbb-dev on Ubuntu.
dnf install tbb-devel.x86_64 on Fedora.
Other distributions may call it something different.
Visual Studio (2017 and later) links with the proper library automatically (also tbb, if I'm not mistaken).

Segmentation Fault when assigning value to a pointer C++

When I run the following parallel code I get a segmentation fault at the assignment v = b (between the two prints). I don't really understand what is causing it.
This is a minimal working example which describes the problem:
#include <iostream>
#include <memory>
#include <numeric>
#include <vector>
#include <thread>
struct Worker{
    std::vector<int>* v;
    void f(){
        std::vector<int> a(20);
        std::iota(a.begin(), a.end(), 1);
        auto b = new std::vector<int>(a);
        std::cout << "Test 1" << std::endl;
        v = b;
        std::cout << "Test 2" << std::endl;
    }
};
int main(int argc, char** argv) {
    int nw = 1;
    std::vector<std::thread> threads(nw);
    std::vector<std::unique_ptr<Worker>> W;
    for(int i = 0; i < nw; i++){
        W.push_back(std::make_unique<Worker>());
        threads[i] = std::thread([&]() { W[i]->f(); } );
        // Pinning threads to cores
        cpu_set_t cpuset;
        CPU_SET(i, &cpuset);
        pthread_setaffinity_np(threads[i].native_handle(), sizeof(cpu_set_t), &cpuset);
    }
    for (int i = 0; i < nw; i++) {
        threads[i].join();
        std::cout << (*(W[i]->v))[0] << std::endl;
    }
}
It seems that when I compile it with -fsanitize=address the code works fine, but I get worse performance. How can I make it work?
std::vector is not thread-safe. None of the containers in the C++ library are thread safe.
threads[i] = std::thread([&]() { W[i]->f(); } );
The new execution thread captures the vector by reference and accesses it.
The original execution thread keeps modifying the vector with push_back, without synchronizing access to the W vector with any of the new execution threads. Any push_back may invalidate the existing contents of the vector in order to reallocate it, and if a different execution thread attempts to get W[i] at the same time, while it's being reallocated, hilarity ensues.
This is undefined behavior.
You must either synchronize access to the vector using a mutex, or make sure that the vector will never be reallocated, using any number of known techniques. A sufficiently-large reserve(), in advance, should do the trick.
Additionally, it's been pointed out that i is also captured by reference, so by the time each new execution thread starts, its value could be anything.
In addition to the vector synchronization problem mentioned by Sam, there is another problem.
This line:
threads[i] = std::thread([&]() { W[i]->f(); } );
captures i by reference. There is a good chance that i goes out of scope (and is destroyed) before the thread starts running. The statement W[i]->f(); is then likely to read an invalid value of i, one that is negative or too large. Note that before i goes out of scope, the last value written to it is nw, so even if the memory that previously contained i is still accessible, it's likely to hold the value nw, which is too large.
You could fix this problem by capturing i by value:
threads[i] = std::thread([&W, i]() { W[i]->f(); } );
// ^^^^^
// captures W by reference, and i by value
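Putting both fixes together might look like the sketch below. It is only an illustration: run_workers is a made-up helper, the thread pinning is left out to keep it portable, and std::make_unique assumes C++14.

```cpp
#include <cassert>
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

struct Worker {
    std::unique_ptr<std::vector<int>> v;   // owning pointer instead of a raw new
    void f() {
        std::vector<int> a(20);
        std::iota(a.begin(), a.end(), 1);
        v = std::make_unique<std::vector<int>>(a);
    }
};

// Hypothetical helper: runs nw workers and returns the first element each one produced.
std::vector<int> run_workers(int nw) {
    assert(nw >= 0);
    // Fix 1: build W completely before any thread starts, so it is never
    // resized while another thread reads it.
    std::vector<std::unique_ptr<Worker>> W;
    for (int i = 0; i < nw; ++i) W.push_back(std::make_unique<Worker>());

    std::vector<std::thread> threads;
    for (int i = 0; i < nw; ++i) {
        // Fix 2: capture i by value; W is captured by reference but no longer changes.
        threads.emplace_back([&W, i] { W[i]->f(); });
    }
    for (auto& t : threads) t.join();

    std::vector<int> firsts;
    for (const auto& w : W) firsts.push_back((*w->v)[0]);
    return firsts;
}
```

Each thread writes only to its own Worker, and the main thread reads the results only after join(), so no further locking is needed here.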
As noted by others, the capture is the problem.
I've added an i parameter to f() so that it prints the index it was called with:
void f(int i){
    std::vector<int> a(20);
    std::iota(a.begin(), a.end(), 1);
    auto b = new std::vector<int>(a);
    std::cout << "Test 1 " << i << std::endl;
    v = b;
    std::cout << "Test 2 " << v->size() << std::endl;
}
and the output is: Test 1 1
So the call to f goes through, but it is made without a valid Worker instance (i is already 1, which is out of range when nw is 1), and the assignment to v then writes to invalid memory.

Confusing C++ Thread Behavior

I was reading some literature on C++11 threads and tried the following code:
#include "iostream"
#include "thread"
using namespace std;
class background_task{
    int data;
    int flag;
public:
    background_task(int val):data(val),flag(data%2){}
    void operator()(void){
        int count = 0;
        while(count < 100)
        {
            if(flag)
                cout <<'\n'<<data++;
            else
                cout <<'\n'<<data--;
            count++;
        }
    }
};
int main(int argc , char** argv){
    std::thread T1 {background_task(2)};
    std::thread T2 {background_task(3)};
    T1.join();
    T2.join();
    return 0;
}
The output doesn't make sense to me: since I am running two threads, each should be printing at roughly the same time rather than waiting for the other to finish before it starts. Instead, each thread finishes and then the next one starts, as if they ran synchronously. Am I missing something here?
It's probably because creating a new thread takes some time, and the first thread finishes before the next one begins.
You also have the choice to detach or join a thread:
t1.detach(); // don't care about t1 finishing
or
t1.join(); // wait for t1 to finish
Your operating system need not start the threads at the same time; it need not start them on different cores; it need not provide equal time to each thread. I really don't believe the standard mandates anything of the sort, but I haven't checked the standard to cite the right parts to verify.
You may be able to (no promises!) get the behavior you desire by changing your code to the following. This code is "encouraging" the OS to give more time to both threads, and hopefully allows for both threads to be fully constructed before one of them finishes.
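#include <chrono>
#include <iostream>
#include <thread>
class background_task {
 public:
  background_task(int val) : data(val), flag(data % 2) {}
  void operator()() {
    int count = 0;
    while (count < 100) {
      if (flag)
        std::cout << '\n' << data++;
      else
        std::cout << '\n' << data--;
      count++;
      // The pause is missing from the post; the 50 ms value is an assumption.
      std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
  }
 private:
  int data;
  int flag;
};
int main() {
  std::thread T1{background_task(2)};
  std::thread T2{background_task(3)};
  T1.join();
  T2.join();
  return 0;
}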
Try the code below; I modified your earlier code to show the result:
#include "iostream"
#include "thread"
using namespace std;
class background_task{
    int data;
    int flag;
public:
    background_task(int val):data(val),flag(data%2){}
    void operator()(void){
        int count = 0;
        while(count < 10000000)
        {
            if(flag)
                cout <<'\n'<<"Yes";
            else
                cout <<'\n'<<" "<<"No";
            count++;
        }
    }
};
int main(int argc , char** argv){
    std::thread T1 {background_task(2)};
    std::thread T2 {background_task(3)};
    T1.join();
    T2.join();
    return 0;
}
By the time the second thread starts, the first thread has already finished processing; hence you saw what you saw.
In addition to Amir Rasti's answer, I think it's worth mentioning the scheduler.
If you use a while(1) instead, you will see that the output isn't exactly interleaved even though the two threads run "in parallel". The scheduler (part of the operating system) gives each thread time to run, but that time can vary. So one thread may print 100 characters before the scheduler lets the other thread print again.
while(count < 10000)
The loop may finish before the next thread starts; you can see the difference if you increase the loop count or insert a sleep inside the loop.
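To observe this without depending on console output, one option is to record the order in which the two threads make progress. The following is a small sketch with an invented helper, record_order; the 1 ms sleep is an arbitrary choice, and the resulting order still depends on the scheduler.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: two threads each record their id `iterations` times.
// Returns the order in which the records arrived.
std::vector<int> record_order(int iterations) {
    assert(iterations >= 0);
    std::vector<int> order;
    std::mutex m;

    auto task = [&](int id) {
        for (int i = 0; i < iterations; ++i) {
            {
                std::lock_guard<std::mutex> lock(m);
                order.push_back(id);
            }
            // Give up the CPU briefly so the other thread gets a chance to run.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    };

    std::thread t1(task, 1);
    std::thread t2(task, 2);
    t1.join();
    t2.join();
    return order;
}
```

With the sleep in place, the returned sequence usually mixes 1s and 2s; without it, a short loop often yields all of one id followed by all of the other.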

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
    std::vector<std::thread> workers;
    int total = 4;
    int arr[5] = {0}; // arr[0..3] are the counters, arr[4] holds their minimum
    void each_thread_does(int i) {
        arr[i] += 2;
    }
}
int main(int argc, char *argv[]) {
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            workers.push_back(std::thread(each_thread_does, j));
        }
        for (std::thread &t: workers) {
            if (t.joinable()) {
                t.join();
            }
        }
        arr[4] = *std::min_element(arr, arr+4);
    }
    return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
    void Start();
    void QueueJob(const std::function<void()>& job);
    void Stop();
    bool busy();
private:
    void ThreadLoop();
    bool should_terminate = false; // Tells threads to stop looking for jobs
    std::mutex queue_mutex; // Prevents data races to the job queue
    std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
    std::vector<std::thread> threads;
    std::queue<std::function<void()>> jobs;
};
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
    const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
    for (uint32_t i = 0; i < num_threads; i++) {
        threads.emplace_back(std::thread(&ThreadPool::ThreadLoop, this));
    }
}
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
    while (true) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            mutex_condition.wait(lock, [this] {
                return !jobs.empty() || should_terminate;
            });
            if (should_terminate) {
                return;
            }
            job = jobs.front();
            jobs.pop();
        }
        job(); // run the job after the lock has been released
    }
}
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        jobs.push(job);
    }
    mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
bool ThreadPool::busy() {
    bool poolbusy;
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        poolbusy = !jobs.empty(); // busy while jobs are still queued
    }
    return poolbusy;
}
The busy() function can be used in a while loop, so that the main thread can wait for the thread pool to finish all queued tasks before calling the thread pool destructor.
Stop the pool.
void ThreadPool::Stop() {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        should_terminate = true;
    }
    mutex_condition.notify_all();
    for (std::thread& active_thread : threads) {
        active_thread.join();
    }
    threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
jobs to do.
I apologize if there are some syntax errors; I typed this code from memory, and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs; it just waits for them to finish via active_thread.join().
You can use the C++ Thread Pool Library (CTPL).
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
int main (int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);
    int arr[5] = {0}; // arr[0..3] are the counters, arr[4] holds their minimum
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int){ arr[j] +=2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get(); // wait for this iteration's tasks
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
hardware (processor affinity, heterogenous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following PhD EcE's suggestion, I implemented the thread pool:
// function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
{
private:
    std::queue<std::function<void()>> m_function_queue;
    std::mutex m_lock;
    std::condition_variable m_data_condition;
    std::atomic<bool> m_accept_functions;
public:
    Function_pool();
    void push(std::function<void()> func);
    void done();
    void infinite_loop_func();
};
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
void Function_pool::push(std::function<void()> func)
{
    std::unique_lock<std::mutex> lock(m_lock);
    m_function_queue.push(func);
    // when we send the notification immediately, the consumer will try to get the lock, so unlock asap
    lock.unlock();
    m_data_condition.notify_one();
}
void Function_pool::done()
{
    std::unique_lock<std::mutex> lock(m_lock);
    m_accept_functions = false;
    // when we send the notification immediately, the consumer will try to get the lock, so unlock asap
    lock.unlock();
    // notify all waiting threads.
    m_data_condition.notify_all();
}
void Function_pool::infinite_loop_func()
{
    std::function<void()> func;
    while (true)
    {
        {
            std::unique_lock<std::mutex> lock(m_lock);
            m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
            if (!m_accept_functions && m_function_queue.empty())
            {
                // the lock will be released automatically.
                // finish the thread loop and let it join in the main thread.
                return;
            }
            func = m_function_queue.front();
            m_function_queue.pop();
            // release the lock
        }
        func();
    }
}
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
    std::cout << "bla" << std::endl;
}
int main()
{
    std::cout << "starting operation" << std::endl;
    int num_threads = std::thread::hardware_concurrency();
    std::cout << "number of threads = " << num_threads << std::endl;
    std::vector<std::thread> thread_pool;
    for (int i = 0; i < num_threads; i++)
    {
        thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
    }
    // here we should send our functions
    for (int i = 0; i < 50; i++)
    {
        func_pool.push(example_function);
    }
    func_pool.done();
    for (unsigned int i = 0; i < thread_pool.size(); i++)
    {
        thread_pool.at(i).join();
    }
}
You can use thread_pool from the Boost library:
void my_task(){...}
int main(){
    int threadNumbers = thread::hardware_concurrency();
    boost::asio::thread_pool pool(threadNumbers);
    // Submit a function to the pool.
    boost::asio::post(pool, my_task);
    // Submit a lambda object to the pool.
    boost::asio::post(pool, []() {
        ...
    });
    // Wait for the submitted work to finish.
    pool.join();
}
You can also use threadpool from the open-source community:
void first_task() {...}
void second_task() {...}
int main(){
    int threadNumbers = thread::hardware_concurrency();
    pool tp(threadNumbers);
    // Add some tasks to the pool.
    // ... (the task submission calls are not shown in the post)
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
    typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
    thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
        for (int i = 0; i < threads; ++i) {
            // each pool thread runs the io_service event loop
            auto worker = [this] { return service.run(); };
            grp.add_thread(new boost::thread(worker));
        }
    }
    template<class F>
    void enqueue(F f) {
        service.post(f);
    }
    ~thread_pool() {
        service_worker.reset(); // let run() return once the queue is drained
        grp.join_all();
        service.stop();
    }
private:
    boost::asio::io_service service;
    asio_worker service_worker;
    boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
    std::cout << "Hello from Task 1\n";
});
pool.enqueue([] {
    std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation, or actually is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
    // Determine if n is prime.
}
int main(){
    thread_pool pool(8); // 8 threads
    list<future<bool>> results;
    for(int n = 2;n < 10000;n++){
        // Submit a job to the pool.
        results.emplace_back(pool.async(is_prime, n));
    }
    int n = 2;
    for(auto i = results.begin();i != results.end();i++, n++){
        // i is an iterator pointing to a future representing the result of is_prime(n)
        cout << n << " ";
        bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
        if(prime)
            cout << "is prime";
        else
            cout << "is not prime";
        cout << endl;
    }
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github:
Looks like a thread pool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it's owned by me and publicly available.
It supports templated return values, core pinning, and ordering of some tasks.
The whole implementation is in two .h files.
So, the original question will be something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
    for (int j = 0; j < 4; ++j) {
        futures.push_back(tp.push([&arr, j]() {
            arr[j] += 2;
        }));
    }
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
    f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that a pending task's future.get() call hangs on the caller's side if the thread pool is terminated while some tasks are still in the task queue. How can I set an exception on the future from inside the thread pool when all it holds is the std::function wrapper?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
    using return_type = std::result_of_t<F(Args...)>;
    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<return_type> res = task->get_future();
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _tasks.push([task]() -> void { (*task)(); });
    }
    return res;
}
class StdThreadPool {
    struct TASK {
        //int _func_return_value;
        std::function<void()> _func;
        int priority;
    };
    std::vector<std::thread> _workers;
    std::priority_queue<TASK> _tasks;
};
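One possible approach, offered as a sketch rather than an answer taken from this thread: when a std::packaged_task is destroyed before it has run, the standard library stores a std::future_error with the code broken_promise in its future. So if the pool discards its queued wrappers on shutdown, instead of leaving them in the queue, the waiting callers are released with an exception. The helper name below is invented, and a plain std::queue stands in for the pool's priority queue.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <queue>

// Hypothetical helper: queues one task that never runs, then discards the queue,
// and reports whether the pending future failed with broken_promise.
bool pending_future_breaks_on_discard() {
    std::queue<std::function<void()>> tasks;

    auto task = std::make_shared<std::packaged_task<int()>>([] { return 42; });
    std::future<int> result = task->get_future();
    assert(result.valid());
    tasks.push([task] { (*task)(); });
    task.reset();                    // the queue now holds the only reference

    // Simulate pool shutdown: drop every task that was never started.
    while (!tasks.empty()) tasks.pop();

    try {
        result.get();                // does not block: the shared state was abandoned
    } catch (const std::future_error& e) {
        return e.code() == std::make_error_code(std::future_errc::broken_promise);
    }
    return false;
}
```

In a real pool this would mean clearing _tasks (under the mutex) in the shutdown path, so callers can catch std::future_error instead of waiting forever.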
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancelation (cooperative) - so that when the ThreadPool above goes out of scope - it cancels any running tasks (similar to c++20's jthread).

How do I reverse set_value() and 'deactivate' a promise?

I have a total n00b question here on synchronization. I have a 'writer' thread which assigns a different value 'p' to a promise at each iteration. I need 'reader' threads which wait for shared_futures of this value and then process them, and my question is how do I use future/promise to ensure that the reader threads wait for a new update of 'p' before performing their processing task at each iteration? Many thanks.
You can "reset" a promise by assigning it to a blank promise.
myPromise = promise< int >();
A more complete example:
promise< int > myPromise;
void writer()
{
    for( int i = 0; i < 10; ++i )
    {
        cout << "Setting promise.\n";
        myPromise.set_value( i );
        myPromise = promise< int >{}; // Reset the promise.
        cout << "Waiting to set again...\n";
        this_thread::sleep_for( chrono::seconds( 1 ));
    }
}
void reader()
{
    int result;
    do
    {
        auto myFuture = myPromise.get_future();
        cout << "Waiting to receive result...\n";
        result = myFuture.get();
        cout << "Received " << result << ".\n";
    } while( result < 9 );
}
int main()
{
    std::thread write( writer );
    std::thread read( reader );
    write.join();
    read.join();
    return 0;
}
A problem with this approach, however, is that synchronization between the two threads can cause the writer to call promise::set_value() more than once between the reader's calls to future::get(), or future::get() to be called while the promise is being reset. These problems can be avoided with care (e.g. with proper sleeping between calls), but this takes us into the realm of hacking and guesswork rather than logically correct concurrency.
So although it's possible to reset a promise by assigning it to a fresh promise, doing so tends to raise broader synchronization issues.
A promise/future pair is designed to carry only a single value (or exception). To do what you're describing, you probably want to adopt a different tool.
If you wish to have multiple threads (your readers) all stop at a common point, you might consider a barrier.
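As one example of a "different tool", a mutex plus a condition variable with a version counter lets readers wait until the writer has published something newer than what they last saw. This is a minimal sketch, not a barrier; the class name Latest, its members, and run_demo are all invented for the illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical class: holds the latest value of `p` plus a version counter.
class Latest {
public:
    // Writer: store a new value and wake every waiting reader.
    void publish(int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            value_ = value;
            ++version_;
        }
        cv_.notify_all();
    }

    // Reader: block until a version newer than `seen` exists, then return
    // {version, value}. Pass 0 on the first call.
    std::pair<unsigned, int> wait_newer(unsigned seen) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return version_ > seen; });
        return {version_, value_};
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned version_ = 0;
    int value_ = 0;
};

// Runs one writer and one reader; returns the last value the reader saw.
int run_demo(int updates) {
    assert(updates > 0);
    Latest latest;
    std::thread writer([&] {
        for (int i = 1; i <= updates; ++i) latest.publish(i);
    });
    unsigned seen = 0;
    int value = 0;
    while (value < updates) {        // a slow reader may skip intermediate values
        auto next = latest.wait_newer(seen);
        seen = next.first;
        value = next.second;
    }
    writer.join();
    return value;
}
```

Note the trade-off: readers never process the same update twice, but they only ever see the newest value. If every reader must process every value of p, a queue per reader or a barrier between iterations is the better fit.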
The following code demonstrates how the producer/consumer pattern can be implemented with future and promise.
There are two promise variables, used by a producer and a consumer thread. Each thread resets one of the two promise variables and waits for the other one.
#include <iostream>
#include <future>
#include <thread>
using namespace std;
// produces integers from 0 to 99
void producer(promise<int>& dataready, promise<void>& consumed)
{
    for (int i = 0; i < 100; ++i) {
        // do some work here ...
        consumed = promise<void>{}; // reset
        dataready.set_value(i); // make data available
        consumed.get_future().wait(); // wait for the data to be consumed
    }
    dataready.set_value(-1); // no more data
}
// consumes integers
void consumer(promise<int>& dataready, promise<void>& consumed)
{
    for (;;) {
        int n = dataready.get_future().get(); // wait for data ready
        if (n >= 0) {
            std::cout << n << ",";
            dataready = promise<int>{}; // reset
            consumed.set_value(); // mark data as consumed
            // do some work here ...
        }
        else
            break; // -1 means no more data
    }
}
int main(int argc, const char*argv[])
{
    promise<int> dataready{};
    promise<void> consumed{};
    thread th1([&] {producer(dataready, consumed); });
    thread th2([&] {consumer(dataready, consumed); });
    th1.join();
    th2.join();
    std::cout << "\n";
    return 0;
}