Infinite loop when joining threads - c++

I'm trying to use the ThreadPool class that is available here
Unfortunately, this class is designed to create its threads at construction time and join them in the destructor. To make it more flexible and to be able to create threads several times in it, I have added the following function to this class:
void join_all() {
    for (std::thread &worker : workers) {
        worker.join(); // I get blocked here
    }
}
However, with this change, when running the following main:
int main() {
    ThreadPool pool(4);
    for (int i = 0; i < 8; ++i) {
        pool.enqueue([i]() {
            std::cout << "HELLO " << i << std::endl;
        });
    }
    pool.join_all(); // here I am blocked
    return 0;
}
My main thread blocks inside the join_all function while trying to join the first thread.
What's the proper way to write a join_all() function that would allow me to keep using the pool without terminating it?

The ThreadPool class already joins the threads in its destructor. But if you want to have your own join_all() function (for any reason), you should set the stop variable as well:
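void join_all() {
    { std::unique_lock<std::mutex> lock(queue_mutex); stop = true; }
    condition.notify_all(); // wake the workers so they observe stop (condition: the pool's condition variable)
    for (std::thread &worker : workers)
        worker.join();
}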
Warning: Now, you should be careful about double joining on your threads. So, what I propose is to check the threads before joining (e.g. in the destructor):
for (std::thread &worker : workers)
    if (worker.joinable())
        worker.join();
With these changes, the code works without any infinite loops.
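To make the stop-then-join idea above concrete, here is a minimal sketch. MiniPool and run_join_all_demo are hypothetical names made up for this example, and the member names (workers, tasks, queue_mutex, condition, stop) are assumed to resemble the pool in the question rather than copied from it.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pool in the question; only the parts
// needed to show join_all() are included.
class MiniPool {
public:
    explicit MiniPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] { run(); });
    }
    ~MiniPool() { join_all(); }  // safe even after an explicit join_all()

    void enqueue(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            tasks.push(std::move(job));
        }
        condition.notify_one();
    }

    // Sets stop, wakes every worker, and joins the ones still joinable.
    void join_all() {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            stop = true;
        }
        condition.notify_all();
        for (std::thread &worker : workers)
            if (worker.joinable())
                worker.join();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(queue_mutex);
                condition.wait(lock, [this] { return stop || !tasks.empty(); });
                if (stop && tasks.empty()) return;  // drain the queue before exiting
                job = std::move(tasks.front());
                tasks.pop();
            }
            job();
        }
    }

    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex queue_mutex;
    std::condition_variable condition;
    bool stop = false;
};

// Runs eight jobs on four workers and returns how many actually ran.
int run_join_all_demo() {
    int ran = 0;
    std::mutex ran_mutex;
    MiniPool pool(4);
    for (int i = 0; i < 8; ++i)
        pool.enqueue([&] {
            std::lock_guard<std::mutex> lock(ran_mutex);
            ++ran;
        });
    pool.join_all();
    assert(ran == 8);  // every queued job ran before the workers exited
    return ran;
}
```

Note that in this sketch the workers are gone after join_all(), so the pool cannot accept further work. Keeping the pool usable would need a "wait until idle" operation instead of a join.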


std::async thread pool continue execution without blocking

I have a thread pool where each thread must stay alive, waiting for new tasks and processing them asynchronously (the processing takes a long time). However, the following code does not behave that way. When I create the thread pool, the threads successfully execute the first task given. The process() function reaches the return 0; while the threads are still computing tasks, but it never returns to main(). Execution stays at the v.wait(l, [&] {return !tasks.empty(); }); line, that is, the threads keep waiting for new tasks to be pushed into the tasks queue, and that never happens. I've read that this is related to the std::future destructor: if I am not wrong, when process() reaches the return, the std::future destructor is called and waits until all the threads end, but they never end!
Here's the code:
static int callings = 0;
class ThreadPool
std::queue<int> tasks;
std::mutex m;
std::vector<std::future<void>> finished;
std::condition_variable v;
void push_task(int arg) {
std::unique_lock<std::mutex> l(m);
v.notify_one(); // wake a thread to work on the task
void read_tasks() {
while (true) {
std::unique_lock<std::mutex> l(m);
if (tasks.empty()) {
//waits till new task
v.wait(l, [&] {return !tasks.empty(); }); //after completing the first task, the program stays here forever
int task = tasks.front(); // read task
tasks.pop(); //delete task
//run the task
std::this_thread::sleep_for(std::chrono::milliseconds(5 * 1000)); //simulate computation
}//while true
void create_thread_pool(int m_threads_count) {
for (int t_i = 0; t_i < m_threads_count; t_i++) {
finished.push_back(std::async(std::launch::async,[this] { read_tasks(); }));
printf("Thread %d is doing work...\n", t_i);
}; //ThreadPool
int process(){
ThreadPool pool;
if(callings == 0)
//give some task to do...
return 0; //point reached but never returning to main
int main(){
// do things...
// do more things...
// this does not execute, how to solve this?
return 0;
How can I return to main() while the threads keep waiting for new tasks without blocking?
Thanks in advance

std::thread: How to wait (join) for any of the given threads to complete?

For example, I have two threads, t1 and t2. I want to wait for t1 or t2 to finish. Is this possible?
If I have a series of threads, say, a std::vector<std::thread>, how can I do it?
There's always wait & notify using std::condition_variable, e.g.:
std::mutex m;
std::condition_variable cond;
std::atomic<std::thread::id> val;
auto task = [&] {
std::this_thread::sleep_for(1s); // Some work
val = std::this_thread::get_id();
std::unique_lock<std::mutex> lock{m};
cond.wait(lock, [&] { return val != std::thread::id{}; });
std::cout << "Thread " << val << " finished first" << std::endl;
Note: val doesn't necessarily represent the thread that finished first as all threads finish at about the same time and an overwrite might occur, but it is only for the purposes of this example.
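As a fuller sketch of the wait-and-notify approach, the version below records an index under the mutex so that only the first finisher is kept, which avoids the overwrite mentioned in the note. The function name first_finished and the staggered sleep times are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts `n` threads with staggered sleeps and returns the index of the
// first one that reports completion. All names here are illustrative.
int first_finished(int n) {
    std::mutex m;
    std::condition_variable cond;
    int winner = -1;  // protected by m; -1 means "nobody finished yet"

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            // Simulated work: lower indices sleep for less time.
            std::this_thread::sleep_for(std::chrono::milliseconds(10 + 50 * i));
            {
                std::lock_guard<std::mutex> lock(m);
                if (winner == -1) winner = i;  // only the first writer is kept
            }
            cond.notify_all();
        });
    }

    int result;
    {
        std::unique_lock<std::mutex> lock(m);
        cond.wait(lock, [&] { return winner != -1; });
        result = winner;
    }
    for (std::thread &t : threads) t.join();  // join the rest before the locals go away
    assert(result >= 0 && result < n);
    return result;
}
```

The waiting thread still has to join (or detach) the remaining threads afterwards. The condition variable only tells it that at least one has finished.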
No, there is no wait for multiple objects equivalent in C++11's threading library.
If you want to wait on the first of a set of operations, consider having them feed a thread-safe producer-consumer queue.
Here is a post I made containing a threaded_queue<T>. Have the work product of your threads be delivered to such a queue. Have the consumer read off of the other end.
Now someone can wait on (the work product) of multiple threads at once. Or one thread. Or a GPU shader. Or work product being delivered over a RESTful web interface. You don't care.
The threads themselves should be managed by something like a thread pool or other higher level abstraction on top of std::thread, as std::thread makes a poor client-facing threading abstraction.
template<class T>
struct threaded_queue {
using lock = std::unique_lock<std::mutex>;
void push_back( T t ) {
lock l(m);
boost::optional<T> pop_front() {
lock l(m);
cv.wait(l, [this]{ return abort || !data.empty(); } );
if (abort) return {};
auto r = std::move(data.back());
return r;
void terminate() {
lock l(m);
abort = true;
std::mutex m;
std::deque<T> data;
std::condition_variable cv;
bool abort = false;
I'd use std::optional instead of boost::optional in C++17. It can also be replaced with a unique_ptr, or a number of other constructs.
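A minimal sketch of the "deliver work product to a queue" idea, written so that it compiles as C++11. result_queue and sum_of_results are hypothetical names for this example and deliberately simpler than the threaded_queue above (no terminate(), no optional).

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue, assumed for illustration (not the class from the post).
template <class T>
class result_queue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            data_.push_back(std::move(value));
        }
        cv_.notify_one();
    }
    // Blocks until a value is available, then returns the oldest one.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !data_.empty(); });
        T value = std::move(data_.front());
        data_.pop_front();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> data_;
};

// Launches `n` workers that each deliver i * i, and returns the sum of
// everything the consumer read from the queue.
int sum_of_results(int n) {
    result_queue<int> results;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&results, i] { results.push(i * i); });

    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += results.pop();  // the first pop() returns as soon as any worker delivers
    for (std::thread &t : workers) t.join();
    return sum;
}
```

The consumer never waits on a particular thread, only on the queue, so it reacts to whichever worker finishes first.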
It's easy to do with a polling wait:
void thread_task(std::atomic<bool> & boolean) {
std::default_random_engine engine{std::random_device{}()};
std::uniform_int_distribution<int64_t> dist{1000, 3000};
int64_t wait_time = dist(engine);
std::string line = "Thread slept for " + std::to_string(wait_time) + "ms.\n";
std::cout << line;
int main() {
std::vector<std::thread> threads;
std::atomic<bool> boolean{false};
for(int i = 0; i < 4; i++) {
std::string line = "We reacted after a single thread finished!\n";
while(!boolean) std::this_thread::yield();
std::cout << line;
for(std::thread & thread : threads) {
return 0;
Example output I got on
Thread slept for 1194ms.
We reacted after a single thread finished!
Thread slept for 1967ms.
Thread slept for 2390ms.
Thread slept for 2984ms.
This probably isn't the best code possible, because polling loops are not necessarily best practice, but it should work as a start.
There is no standard way of waiting on multiple threads.
You need to resort to operating system specific functions like WaitForMultipleObjects on Windows.
A Windows only example:
HANDLE handles[] = { t1.native_handle(), t2.native_handle(), };
auto res = WaitForMultipleObjects(2 , handles, FALSE, INFINITE);
Funnily enough, once std::when_any is standardized, one could write a standard but wasteful solution:
std::vector<std::thread> waitingThreads;
std::vector<std::future<void>> futures;
for (auto& thread: threads){
std::promise<void> promise;
waitingThreads.emplace_back([&thread, promise = std::move(promise)]{
auto oneFinished = std::when_any(futures.begin(), futures.end());
Very wasteful and still not available, but standard.

Two threads sharing variable C++

So I have two threads that share the same variable, 'counter'. I want to synchronize my threads by only continuing execution once both threads have reached that point. Unfortunately I enter a deadlock state, as my thread isn't changing its checking variable. The way I have it is:
volatile int counter = 0;
Thread() {
- some calculations -
while(counter != 2) {
counter = 0;
- rest of the calculations -
The idea is that since I have 2 threads, once they reach that point - at different times - they will increment the counter. If the counter isn't equal to 2, then the thread that reached there first will have to wait until the other has incremented the counter so that they are synced up. Does anyone know where the issue lies here?
To add more information about the problem, I have two threads which each perform half of the operations on an array. Once they are done, I want to make sure that they have both finished their calculations. Once they have, I can signal the printer thread to wake up and perform its operation of printing and clearing the array. If I do this before both threads have completed, there will be issues.
Pseudo code:
Thread() {
1/2 of the calculations on array
wait for both to finish - this is the issue
wake up printer thread
In situations like this, you must use an atomic counter.
std::atomic_uint counter{0};
In the given example, there is also no sign that counter got initialized.
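As a rough sketch of the atomic-counter idea for exactly two threads: each thread increments the counter once and then spins until it reads 2. The names run_rendezvous, arrived and passed are made up for this example, and the counter is deliberately never reset, so this only works as a one-shot sync point.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads meet at a sync point; returns how many got past it.
int run_rendezvous() {
    std::atomic<int> arrived{0};  // threads that have reached the sync point
    std::atomic<int> passed{0};   // threads that continued after it

    auto worker = [&] {
        // ... first half of the calculations ...
        arrived.fetch_add(1);
        while (arrived.load() != 2)      // spin until both threads have arrived
            std::this_thread::yield();
        // ... rest of the calculations ...
        passed.fetch_add(1);
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();
    return passed.load();
}
```

Resetting the counter inside the wait loop, as in the question, would let one thread erase the other's increment, which is one way to end up waiting forever.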
You are probably looking for std::condition_variable: a condition variable allows one thread to signal another. Because it doesn't look like you are using the counter for anything other than synchronisation, here is some code from another answer (disclaimer: it's one of my answers) that shows std::condition_variable handling logic on different threads and synchronising around a value:
unsigned int accountAmount;
std::mutex mx;
std::condition_variable cv;
void depositMoney()
// go to the bank etc...
// wait in line...
std::unique_lock<std::mutex> lock(mx);
std::cout << "Depositing money" << std::endl;
accountAmount += 5000;
// Notify others we're finished
void withdrawMoney()
std::unique_lock<std::mutex> lock(mx);
// Wait until we know the money is there
std::cout << "Withdrawing money" << std::endl;
accountAmount -= 2000;
int main()
accountAmount = 0;
// Run both threads simultaneously:
std::thread deposit(&depositMoney);
std::thread withdraw(&withdrawMoney);
// Wait for both threads to finish
std::cout << "All transactions processed. Final amount: " << accountAmount << std::endl;
return 0;
I would look into using a countdown latch. The idea is to have one or more threads block until the desired operation is completed. In this case you want to wait until both threads are finished modifying the array.
Here is a simple example:
#include <condition_variable>
#include <mutex>
#include <thread>
class countdown_latch
{
public:
    countdown_latch(int count)
        : count_(count)
    {
    }
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ > 0)
            condition_variable_.wait(lock);
    }
    void countdown()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        --count_;
        if (count_ == 0)
            condition_variable_.notify_all();
    }
private:
    int count_;
    std::mutex mutex_;
    std::condition_variable condition_variable_;
};
and usage would look like this
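std::atomic<int> result{0};
countdown_latch latch(2);
void perform_work() {
    ++result;            // stand-in for the real work
    latch.countdown();
}
int main() {
    std::thread t1(perform_work);
    std::thread t2(perform_work);
    latch.wait();        // blocks until both threads have counted down
    std::cout << "result = " << result;
    t1.join();
    t2.join();
}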

Shutdown boost threads correctly

I have x boost threads that work at the same time. One producer thread fills a synchronised queue with calculation tasks. The consumer threads pop tasks and calculate them.
The user may quit the program during this process, so I need to shut down my threads properly. My current approach does not seem to work, since exceptions are thrown. It is intended that on system shutdown all threads are killed and stop their current task no matter what they are doing. Could you please show me how you would kill those threads?
Thread Initialisation:
for (int i = 0; i < numberOfThreads; i++)
std::thread* thread = new std::thread(&MyManager::worker, this);
Thread Destruction:
void MyManager::shutdown()
for (int i = 0; i < numberOfThreads; i++)
void MyManager::worker()
while (true)
int current = waitingList.pop();
Object * p =;
p->calculateMesh(); //this task is internally locked by a mutex
catch (const boost::thread_interrupted&)
// Thread interruption request received, break the loop
std::cout << "- Thread interrupted. Exiting thread." << std::endl;
Synchronised Queue:
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
template <typename T>
class ThreadSafeQueue
T pop()
std::unique_lock<std::mutex> mlock(mutex_);
while (queue_.empty())
auto item = queue_.front();
return item;
void push(const T& item)
std::unique_lock<std::mutex> mlock(mutex_);
int sizeIndicator()
std::unique_lock<std::mutex> mlock(mutex_);
return queue_.size();
bool isEmpty() {
std::unique_lock<std::mutex> mlock(mutex_);
return queue_.empty();
std::queue<T> queue_;
std::mutex mutex_;
std::condition_variable cond_;
The thrown error call stack:
... std::_Mtx_lockX(_Mtx_internal_imp_t * * _Mtx) Line 68 C++
... std::_Mutex_base::lock() Line 42 C++
... std::unique_lock<std::mutex>::unique_lock<std::mutex>(std::mutex & _Mtx) Line 220 C++
... ThreadSafeQueue<int>::pop() Line 13 C++
... MyManager::worker() Line 178 C++
From my experience on working with threads in both Boost and Java, trying to shut down threads externally is always messy. I've never been able to really get that to work cleanly.
The best I've gotten is to have a boolean value available to all the consumer threads that is set to true. When you set it to false, the threads will simply return on their own. In your case, that could easily be put into the while loop you have.
On top of that, you're going to need some synchronization so that you can wait for the threads to return before you delete them, otherwise you can get some hard to define behavior.
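One way the flag-plus-join approach could look with a blocking queue is sketched below. StoppableQueue and run_and_shutdown are hypothetical names, and the queue is a simplified stand-in for the ThreadSafeQueue in the question: the stop flag lives inside the queue so that a consumer blocked in pop() is woken up as well.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical queue: pop() returns false once stop() has been called and
// nothing is left, so a blocked consumer can wake up and leave its loop.
class StoppableQueue {
public:
    void push(int item) {
        { std::lock_guard<std::mutex> lock(mutex_); queue_.push(item); }
        cond_.notify_one();
    }
    bool pop(int &item) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return stopped_ || !queue_.empty(); });
        if (queue_.empty()) return false;  // only reachable after stop()
        item = queue_.front();
        queue_.pop();
        return true;
    }
    void stop() {
        { std::lock_guard<std::mutex> lock(mutex_); stopped_ = true; }
        cond_.notify_all();
    }
private:
    std::queue<int> queue_;
    std::mutex mutex_;
    std::condition_variable cond_;
    bool stopped_ = false;
};

// Feeds `tasks` items to `consumers` threads, shuts down, and returns the
// number of items that were processed.
int run_and_shutdown(int consumers, int tasks) {
    StoppableQueue waitingList;
    std::atomic<int> processed{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < consumers; ++i)
        threads.emplace_back([&] {
            int current;
            while (waitingList.pop(current))
                ++processed;  // stand-in for the real calculation
        });
    for (int t = 0; t < tasks; ++t) waitingList.push(t);
    waitingList.stop();                       // ask the workers to finish
    for (std::thread &t : threads) t.join();  // wait before destroying anything
    return processed.load();
}
```

This variant lets the workers drain whatever is still queued before they exit. To drop pending work instead, pop() could return false as soon as stopped_ is set.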
An example from a past project of mine:
Thread creation
barrier = new boost::barrier(numOfThreads + 1);
threads = new detail::updater_thread*[numOfThreads];
for (unsigned int t = 0; t < numOfThreads; t++) {
//This object is just a wrapper class for the boost thread.
threads[t] = new detail::updater_thread(barrier, this);
Thread destruction
for (unsigned int i = 0; i < numOfThreads; i++) {
threads[i]->requestStop();//Notify all threads to stop.
barrier->wait();//The update request will allow the threads to get the message to shutdown.
for (unsigned int i = 0; i < numOfThreads; i++) {
threads[i]->waitForStop();//Wait for all threads to stop.
delete threads[i];//Now we are safe to clean up.
Some methods that may be of interest from the thread wrapper.
updater_thread::updater_thread(boost::barrier * barrier)
this->barrier = barrier;
running = true;
thread = boost::thread(&updater_thread::run, this);
void updater_thread::run() {
while (running) {
if (!running) break;
//Do stuff
void updater_thread::requestStop() {
running = false;
void updater_thread::waitForStop() {
Try moving 'try' up (as in the sample below). If your thread is waiting for data (inside waitingList.pop()), it may be waiting inside the condition variable's .wait(). This is an 'interruption point' and so may throw when the thread gets interrupted.
void MyManager::worker()
{
    while (true) {
        try { // 'try' now also covers waitingList.pop()
            int current = waitingList.pop();
            Object * p =;
            p->calculateMesh(); //this task is internally locked by a mutex
        }
        catch (const boost::thread_interrupted&) {
            // Thread interruption request received, break the loop
            std::cout << "- Thread interrupted. Exiting thread." << std::endl;
            break;
        }
    }
}
Maybe you are catching the wrong exception class?
Which would mean it does not get caught.
Not too familiar with threads but is it the mix of std::threads and boost::threads that is causing this?
Try catching the lowest parent exception.
I think this is a classic reader/writer problem with threads working on a common buffer. One of the most reliable ways of solving it is to use mutexes and signals. (I am not able to post the code here. Please send me an email and I will send the code to you.)

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
std::vector<std::thread> workers;
int total = 4;
int arr[4] = {0};
void each_thread_does(int i) {
arr[i] += 2;
int main(int argc, char *argv[]) {
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
workers.push_back(std::thread(each_thread_does, j));
for (std::thread &t: workers) {
if (t.joinable()) {
arr[4] = std::min_element(arr, arr+4);
return 0;
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
    void Start();
    void QueueJob(const std::function<void()>& job);
    void Stop();
    bool busy();

private:
    void ThreadLoop();

    bool should_terminate = false;           // Tells threads to stop looking for jobs
    std::mutex queue_mutex;                  // Prevents data races to the job queue
    std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
    std::vector<std::thread> threads;
    std::queue<std::function<void()>> jobs;
};
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
    const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
    for (uint32_t i = 0; i < num_threads; ++i) {
        threads.emplace_back(std::thread(&ThreadPool::ThreadLoop, this));
    }
}
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
    while (true) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            mutex_condition.wait(lock, [this] {
                return !jobs.empty() || should_terminate;
            });
            if (should_terminate) return;
            job = jobs.front();
            jobs.pop();
        }
        job(); // run the job after the lock has been released
    }
}
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
    { std::unique_lock<std::mutex> lock(queue_mutex); jobs.push(job); }
    mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
bool ThreadPool::busy() {
    bool poolbusy;
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        poolbusy = !jobs.empty();
    }
    return poolbusy;
}
The busy() function can be used in a while loop, so that the main thread can wait for the thread pool to complete all the tasks before calling the thread pool destructor.
Stop the pool.
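void ThreadPool::Stop() {
    { std::unique_lock<std::mutex> lock(queue_mutex); should_terminate = true; }
    mutex_condition.notify_all(); // wake every waiting worker
    for (std::thread& active_thread : threads)
        active_thread.join();
    threads.clear();
}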
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
jobs to do.
I apologize if there are some syntax errors; I typed this code and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
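One caveat with polling busy() is that an empty queue does not mean the last jobs have finished running. A possible variation, sketched below under that assumption, counts jobs that are queued or in flight and lets the caller block until the count reaches zero. CountingPool, pending, idle_condition and WaitIdle() are additions invented for this sketch and are not part of the class above.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative pool; `pending` and WaitIdle() are additions for this sketch.
class CountingPool {
public:
    explicit CountingPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            threads.emplace_back([this] { ThreadLoop(); });
    }
    ~CountingPool() {
        { std::lock_guard<std::mutex> lock(queue_mutex); should_terminate = true; }
        mutex_condition.notify_all();
        for (std::thread &t : threads) t.join();
    }
    void QueueJob(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            jobs.push(std::move(job));
            ++pending;
        }
        mutex_condition.notify_one();
    }
    // Blocks until every queued job has finished running.
    void WaitIdle() {
        std::unique_lock<std::mutex> lock(queue_mutex);
        idle_condition.wait(lock, [this] { return pending == 0; });
    }
private:
    void ThreadLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(queue_mutex);
                mutex_condition.wait(lock, [this] { return !jobs.empty() || should_terminate; });
                if (should_terminate) return;
                job = std::move(jobs.front());
                jobs.pop();
            }
            job();
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (--pending == 0) idle_condition.notify_all();
            }
        }
    }
    std::mutex queue_mutex;
    std::condition_variable mutex_condition;  // new job or termination
    std::condition_variable idle_condition;   // pending dropped to zero
    std::queue<std::function<void()>> jobs;
    std::vector<std::thread> threads;
    int pending = 0;
    bool should_terminate = false;
};

// Reuses the same four workers for eight rounds, as in the question.
int min_after_rounds() {
    int arr[4] = {0, 0, 0, 0};
    CountingPool pool(4);
    for (int round = 0; round < 8; ++round) {
        for (int j = 0; j < 4; ++j)
            pool.QueueJob([&arr, j] { arr[j] += 2; });
        pool.WaitIdle();  // resynchronize without joining
    }
    return *std::min_element(arr, arr + 4);
}
```

Here the threads are created once and the main thread resynchronizes with them after each round, which is the behaviour the question asks for.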
You can use the C++ Thread Pool Library (CTPL).
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
int main (int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);
    int arr[5] = {0}; // arr[4] holds the minimum of arr[0..3]
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int){ arr[j] +=2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get(); // wait for this iteration's tasks
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
hardware (processor affinity, heterogeneous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many problems outlined above. You might want to build up on it. You might also want to have a look of existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following PhD EcE's suggestion, I implemented the thread pool:
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
std::queue<std::function<void()>> m_function_queue;
std::mutex m_lock;
std::condition_variable m_data_condition;
std::atomic<bool> m_accept_functions;
void push(std::function<void()> func);
void done();
void infinite_loop_func();
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
void Function_pool::push(std::function<void()> func)
std::unique_lock<std::mutex> lock(m_lock);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
void Function_pool::done()
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
//notify all waiting threads.
void Function_pool::infinite_loop_func()
std::function<void()> func;
while (true)
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
func = m_function_queue.front();
//release the lock
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
std::cout << "bla" << std::endl;
int main()
std::cout << "starting operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
//here we should send our functions
for (int i = 0; i < 50; i++)
for (unsigned int i = 0; i < thread_pool.size(); i++)
You can use thread_pool from the Boost library:
void my_task(){...}
int main(){
int threadNumbers = thread::hardware_concurrency();
boost::asio::thread_pool pool(threadNumbers);
// Submit a function to the pool.
boost::asio::post(pool, my_task);
// Submit a lambda object to the pool.
boost::asio::post(pool, []() {
You can also use a thread pool from the open source community:
void first_task() {...}
void second_task() {...}
int main(){
int threadNumbers = thread::hardware_concurrency();
pool tp(threadNumbers);
// Add some tasks to the pool.
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
for (int i = 0; i < threads; ++i) {
auto worker = [this] { return service.run(); };
grp.add_thread(new boost::thread(worker));
template<class F>
void enqueue(F f) { service.post(f); }
~thread_pool() {
boost::asio::io_service service;
asio_worker service_worker;
boost::thread_group grp;
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
std::cout << "Hello from Task 1\n";
pool.enqueue([] {
std::cout << "Hello from Task 2\n";
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation, or actually is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
// Determine if n is prime.
int main(){
thread_pool pool(8); // 8 threads
list<future<bool>> results;
for(int n = 2;n < 10000;n++){
// Submit a job to the pool.
results.emplace_back(pool.async(is_prime, n));
int n = 2;
for(auto i = results.begin();i != results.end();i++, n++){
// i is an iterator pointing to a future representing the result of is_prime(n)
cout << n << " ";
bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
cout << "is prime";
cout << "is not prime";
cout << endl;
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github:
Looks like a thread pool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it’s owned by me and publicly available here -
It supports templated return values, core pinning, ordering of some tasks.
The whole implementation is in two .h files.
So, the original question will be something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
// wait until all pushed tasks are finished.
for (auto& f : futures)
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that a pending task's future.get() call hangs on the caller side if the thread pool is terminated while some tasks are still in the task queue. How can I set an exception on the future from inside the thread pool when the queue only holds the wrapper std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
auto task = std::make_shared<std::packaged_task<std::result_of_t<F(Args...)>()>>(
std::bind(std::forward<F>(f), std::forward<Args>(args)...));
std::future<std::result_of_t<F(Args...)>> res = task->get_future();
std::unique_lock<std::mutex> lock(_mutex);
_tasks.push([task]() -> void { (*task)(); });
return res;
class StdThreadPool {
std::vector<std::thread> _workers;
std::priority_queue<TASK> _tasks;
struct TASK {
//int _func_return_value;
std::function<void()> _func;
int priority;
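One possible angle on the question above, offered as a sketch rather than a definitive answer: if the pool destroys the queued wrappers on termination instead of leaving them alive, the last reference to each std::packaged_task goes away, and the standard library then stores a std::future_error with code broken_promise in the shared state. The function name discarded_task_breaks_promise and the plain std::queue standing in for the pool's task queue are assumptions made for this example.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <queue>

// Queues one task, then discards the queue without running it, and reports
// whether the caller's future was released with broken_promise.
bool discarded_task_breaks_promise() {
    std::queue<std::function<void()>> tasks;

    auto task = std::make_shared<std::packaged_task<int()>>([] { return 42; });
    std::future<int> res = task->get_future();
    tasks.push([task]() { (*task)(); });
    task.reset();  // the queue entry now holds the only reference

    // Simulate pool shutdown: drop every pending wrapper unexecuted.
    std::queue<std::function<void()>>().swap(tasks);

    try {
        res.get();  // does not hang: the shared state now holds an exception
    } catch (const std::future_error &e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;
}
```

With this approach the pool never needs to know the task's return type. It only has to make sure that unexecuted wrappers are actually destroyed when it shuts down.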
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancelation (cooperative) - so that when the ThreadPool above goes out of scope - it cancels any running tasks (similar to c++20's jthread).