Why this thread safe queue, creates a deadlock?

Why this thread safe queue, creates a deadlock? - c++

I've written my own version of thread safe queue. However, when I run this program, it hangs/deadlocks itself.
Wondering, why is this locks/hangs forever.
void concurrentqueue::addtoQueue(const int number)
{
locker currentlock(lock_for_queue);
numberlist.push(number);
pthread_cond_signal(&queue_availability_condition);
}
int concurrentqueue::getFromQueue()
{
int number = 0;
locker currentlock(lock_for_queue);
if ( empty() )
{
pthread_cond_wait(&queue_availability_condition,&lock_for_queue);
}
number = numberlist.front();
numberlist.pop();
return number;
}
bool concurrentqueue::empty()
{
return numberlist.empty();
}
I've written, the class locker as RAII.
class locker
{
public:
locker(pthread_mutex_t& lockee): target(lockee)
{
pthread_mutex_lock(&target);
}
~locker()
{
pthread_mutex_unlock(&target);
}
private:
pthread_mutex_t target;
};
My writer/reader thread code is very simple. Writer thread, adds to the queue and reader thread, reads from the queue.
void * writeintoqueue(void* myqueue)
{
void *t = 0;
concurrentqueue *localqueue = (concurrentqueue *) myqueue;
for ( int i = 0; i < 10 ; ++i)
{
localqueue->addtoQueue(i*10);
}
pthread_exit(t);
}
void * readfromqueue(void* myqueue)
{
void *t = 0;
concurrentqueue *localqueue = (concurrentqueue *) myqueue;
int number = 0;
for ( int i = 0 ; i < 10 ; ++i)
{
number = localqueue->getFromQueue();
std::cout << "The number from the queue is " << number << std::endl;
}
pthread_exit(t);
}

This is definitely not safe:
if ( empty() )
{
pthread_cond_wait(&queue_availability_condition,&lock_for_queue);
}
If another thread that was not previously waiting calls getFromQueue() after addtoQueue() has signalled the condition variable and exited but before the waiting thread has aquired the lock then this thread could exit and expect the queue to have values in it. You must recheck that the queue is not empty.
Change the if into a while:
while ( empty() )
{
pthread_cond_wait(&queue_availability_condition,&lock_for_queue);
}

Reformulating spong's comment as an answer: your locker class should NOT be copying the pthread_mutex_t by value. You should use a reference or a pointer instead, e.g.:
class locker
{
public:
locker(pthread_mutex_t& lockee): target(lockee)
{
pthread_mutex_lock(&target);
}
~locker()
{
pthread_mutex_unlock(&target);
}
private:
pthread_mutex_t& target; // <-- this is a reference
};
The reason for this is that all pthreads data types should be treated as opaque types -- you don't know what's in them and should not copy them. The library does things like looking at a particular memory address to determine if a lock is held, so if there are two copies of a variable that indicates if the lock is held, odd things could happen, such as multiple threads appearing to succeed in locking the same mutex.
I tested your code, and it also deadlocked for me. I then ran it through Valgrind, and although it did not deadlock in that case (due to different timings, or maybe Valgrind only simulates one thread at a time), Valgrind reported numerous errors. After fixing locker to use a reference instead, it ran without deadlocking and without generating any errors in Valgrind.
See also Debugging with pthreads.

Related

Why it doesn't work? Simple multithreading example

can you help me with understanding why does this code freeze the program?
#include <iostream>
#include <thread>
#include <mutex>
using namespace std;
int i = 0;
mutex mx;
void foo() {
while(1) {
lock_guard<mutex> locker(mx);
i++;
if(i == 5000) {
break;
}
}
}
void boo() {
while(1) {
if(i == 100) {
lock_guard<mutex> locker(mx);
i = 5000;
break;
}
}
}
int main(int argc, char *argv[])
{
thread th1(foo);
thread th2(boo);
th1.join();
th2.join();
return 0;
}
Why do I have such a result?
How to change the code to make it right? Could you give me your thoughts.
Thanks.

Even if boo starts running first, it will probably never see i==100.
If you only have one CPU, then it's very unlikely that the CPU would be switched from foo to boo while i==100.
If you have multiple CPUs, then i==100 will probably never even make it into foo's cache, because i is not volatile, and the mutex is not locked between reads.
Really the compiler doesn't even have to read i after the first time, because there are no memory barriers. It can assume that the value hasn't changed.
Even if you were to fix this, the distinct possibility would remain that i could be incremented past 100 before boo would notice. It looks like you expect the two threads to "take turns", but that's just not how it works.

The behavior of the program is undefined, so reasoning about what it does is futile. The problem is that boo reads the value of i and foo both reads and writes the value of i, but the read of i in if (i == 100) in boo is unsequenced with respect to the writes occurring in foo. That's a data race, and the behavior of the program is undefined. Sure, you can guess at what might happen, but if you want your code to run correctly, you have to ensure that there are no data races. That means using some form of synchronization: either move the lock in boo before the if, or get rid of the mutex and change the type of i to std::atomic<int>.

There are a few concurrency issues with your solution:
You have to lock the mutex consistently. All access to i must be protected by the mutex, so also at the if (i == 100) { line. In the absence of synchronization, the compiler is free to optimize the thread as-if it was running in isolation, and assume i to never change.
There is no guarantee that boo will start before foo. If it starts after, i will already be incremented well above 100.
Mutex locking is not guaranteed to be fair. Two threads competing for the same mutex will not run in an interleaved manner. Which means foo might increment i many times before boo gets a chance to run, so the value of i as seen by boo might easily jump from 0 to 1000, skipping the desired 100.
In isolation, foo will "run away", incrementing i well beyond 5000. There should be some exit or a restart condition.
How to change the code to make it right?
Add some synchronization in order to enforce interleaved processing. For example, using condition_variables to signal between threads:
int i = 0;
mutex mx;
condition_variable updated_cond;
bool updated = false;
condition_variable consumed_cond;
bool consumed = true;
void foo() {
while (1) {
unique_lock<mutex> locker(mx);
consumed_cond.wait(locker, [] { return consumed; });
consumed = false;
if (i == 5000) {
break;
}
std::cout << "foo: i = " << i << "+1\n";
i++;
updated = true;
updated_cond.notify_one();
}
std::cout << "foo exiting\n";
}
void boo() {
for (bool exit = false; !exit; ) {
unique_lock<mutex> locker(mx);
updated_cond.wait(locker, [] { return updated; });
updated = false;
std::cout << "boo: i = " << i << "\n";
if (i == 100) {
i = 5000;
exit = true;
}
consumed = true;
consumed_cond.notify_one();
}
std::cout << "boo exiting\n";
}

why does this thread pool deadlock or run too many times?

I'm trying to write a thread pool in c++ that fulfills the following criteria:
a single writer occasionally writes a new input value, and once it does, many threads concurrently access this same value, and each spit out a random floating point number.
each worker thread uses the same function, so there's no reason to build a thread-safe queue for all the different functions. I store the common function inside the thread_pool class.
these functions are by far the most computationally-intensive aspect of the program. Any locks that prevent these functions from doing their work is the primary thing I'm trying to avoid.
the floating point output from all these functions is simply averaged.
the user has a single function called thread_pool::start_work that changes this shared input, and tells all the workers to work for a fixed number of tasks.
thread_pool::start_work returns std::future
Below is what I have so far. It can be built and run with g++ test_tp.cpp -std=c++17 -lpthread; ./a.out Unfortunately it either deadlocks or does the work too many (or sometimes too few) times. I am thinking that it's because m_num_comps_done is not thread-safe. There are chances that all the threads skip over the last count, and then they all end up yielding. But isn't this variable atomic?
#include <vector>
#include <thread>
#include <mutex>
#include <shared_mutex>
#include <queue>
#include <atomic>
#include <future>
#include <iostream>
#include <numeric>
/**
* #class join_threads
* #brief RAII thread killer
*/
class join_threads
{
std::vector<std::thread>& m_threads;
public:
explicit join_threads(std::vector<std::thread>& threads_)
: m_threads(threads_) {}
~join_threads() {
for(unsigned long i=0; i < m_threads.size(); ++i) {
if(m_threads[i].joinable())
m_threads[i].join();
}
}
};
// how remove the first two template parameters ?
template<typename func_input_t, typename F>
class thread_pool
{
using func_output_t = typename std::result_of<F(func_input_t)>::type;
static_assert( std::is_floating_point<func_output_t>::value,
"function output type must be floating point");
unsigned m_num_comps;
std::atomic_bool m_done;
std::atomic_bool m_has_an_input;
std::atomic<int> m_num_comps_done; // need to be atomic? why?
F m_f; // same function always used
func_input_t m_param; // changed occasionally by a single writer
func_output_t m_working_output; // many reader threads average all their output to get this
std::promise<func_output_t> m_out;
mutable std::shared_mutex m_mut;
mutable std::mutex m_output_mut;
std::vector<std::thread> m_threads;
join_threads m_joiner;
void worker_thread() {
while(!m_done)
{
if(m_has_an_input){
if( m_num_comps_done.load() < m_num_comps - 1 ) {
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
// quick
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
}else if(m_num_comps_done.load() == m_num_comps - 1){
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
m_num_comps_done++;
try{
m_out.set_value(m_working_output);
}catch(std::future_error& e){
std::cout << "future_error caught: " << e.what() << "\n";
}
}else{
std::this_thread::yield();
}
}else{
std::this_thread::yield();
}
}
}
public:
/**
* #brief ctor spawns working threads
*/
thread_pool(F f, unsigned num_comps)
: m_num_comps(num_comps)
, m_done(false)
, m_has_an_input(false)
, m_joiner(m_threads)
, m_f(f)
{
unsigned const thread_count=std::thread::hardware_concurrency(); // should I subtract one?
try {
for(unsigned i=0; i<thread_count; ++i) {
m_threads.push_back( std::thread(&thread_pool::worker_thread, this));
}
} catch(...) {
m_done=true;
throw;
}
}
~thread_pool() {
m_done=true;
}
/**
* #brief changes the shared data member,
* resets the num_comps_left variable,
* resets the accumulator thing to 0, and
* resets the promise object
*/
std::future<func_output_t> start_work(func_input_t new_param) {
std::unique_lock<std::shared_mutex> lk(m_mut);
m_param = new_param;
m_num_comps_done = 0;
m_working_output = 0.0;
m_out = std::promise<func_output_t>();
m_has_an_input = true; // only really matters just after initialization
return m_out.get_future();
}
};
double slowSum(std::vector<double> nums) {
// std::this_thread::sleep_for(std::chrono::milliseconds(200));
return std::accumulate(nums.begin(), nums.end(), 0.0);
}
int main(){
// construct
thread_pool<std::vector<double>, std::function<double(std::vector<double>)>>
le_pool(slowSum, 1000);
// add work
auto ans = le_pool.start_work(std::vector<double>{1.2, 3.2, 4213.1});
std::cout << "final answer is: " << ans.get() << "\n";
std::cout << "it should be 4217.5\n";
return 1;
}

You check the "done" count, then get the lock. This allows multiple threads to be waiting for the lock. In particular, there might not be a thread that enters the second if body.
The other side of that is because you have all threads running all the time, the "last" thread may not get access to its exclusive section early (before enough threads have run) or even late (because additional threads are waiting at the mutex in the first loop).
To fix the first issue, since the second if block has all of the same code that is in the first if block, you can have just one block that checks the count to see if you've reached the end and should set the out value.
The second issue requires you to check m_num_comps_done a second time after acquiring the mutex.

how condition_variable and unique_lock works for thread safe list

I am trying to use condition variable and unique lock to make a thread safe list. However, I met some problems, it seems the list operation is not thread safe.
I have create an atomic_flag to test if it is thread safe.
Basically, when I operate the list, I will first check if the atomic flag was set, and clear the atomic flag when the list operation is done.
In my thought, the atomic flag operates under the mutex protection, so each time when I test_and_set the atomic flag, I should see the initial value should be false, but when I run the test code, I found it is not so.
Can anyone help me and point out what wrong with the code, why the list operation is not thread safe with the condition variable's protection?
Thanks
The test code is as the following:
using namespace std;
//list element
class myitem
{
public:
myitem() { val = -1; };
myitem(int n, int c){ val = n; chr = c; };
int val;
char chr;
};
// mutex and condition variable to protect the list.
std::mutex mymtx;
condition_variable mycv;
// the list to be protected
std::list<myitem> mylist;
// the atomic flag to test.
std::atomic_flag testlk = ATOMIC_FLAG_INIT;
void datagenthread(char c)
{
int n = 10*1000*1000;
while(n >0)
{
myitem item(n, c);
{
unique_lock<mutex> lk(mymtx); // get the lock
if( testlk.test_and_set() != false) { // test the atomic flag
cout<<"error in thread"<<c<<" for test lock"<<endl;
}
mylist.push_back(item);
testlk.clear(); // clear the atomic before unlock.
}
mycv.notify_one();
n--;
}
}
void datareadthread()
{
int count = 0;
int readc = 0;
while ( count <2) {
{
unique_lock<mutex> lk(mymtx); // acquire lock
while ( mylist.size() <= 0) {
mycv.wait(lk); // block until the thread get notified and get lock again.
}
if( testlk.test_and_set()!= false) {// test the atomic flag.
cout<<"error in reader thread"<<endl;
}
myitem readitem;
readitem = mylist.front();
mylist.pop_front();
readc++;
if ( readitem.val == 1)
{
cout<<" get last one last item form a thread,"<<endl;
count++;
}
testlk.clear(); // clear the atomic flag before unlock
}//unique_lock destruct
}//end while
}
int main()
{
std::thread cons( datareadthread);
std::thread gen1( datagenthread, 'a');
std::thread gen2( datagenthread, 'b');
gen1.join();
gen2.join();
cons.join();
return 0;
}

testlk default constructor initializes it to unspecified state, so at first iteration initial value be something other than false. You should initialize it with cleared state like this:
std::atomic_flag testlk = ATOMIC_FLAG_INIT;

I have found the reason, it seems I link some wrong libs which caused the mutex/condition_var not working properly.

Shutdown boost threads correctly

I have x boost threads that work at the same time. One producer thread fills a synchronised queue with calculation tasks. The consumer threads pop out tasks and calculates them.
Image Source: https://www.quantnet.com/threads/c-multithreading-in-boost.10028/
The user may finish the programm during this process, so I need to shutdown my threads properly. My current approach seems to not work, since exceptions are thrown. It's intented that on system shutdown all processes should be killed and stop their current task no matter what they do. Could you please show me, how you would kill thoses threads?
Thread Initialisation:
for (int i = 0; i < numberOfThreads; i++)
{
std::thread* thread = new std::thread(&MyManager::worker, this);
mThreads.push_back(thread);
}
Thread Destruction:
void MyManager::shutdown()
{
for (int i = 0; i < numberOfThreads; i++)
{
mThreads.at(i)->join();
delete mThreads.at(i);
}
mThreads.clear();
}
Worker:
void MyManager::worker()
{
while (true)
{
int current = waitingList.pop();
Object * p = objects.at(current);
p->calculateMesh(); //this task is internally locked by a mutex
try
{
boost::this_thread::interruption_point();
}
catch (const boost::thread_interrupted&)
{
// Thread interruption request received, break the loop
std::cout << "- Thread interrupted. Exiting thread." << std::endl;
break;
}
}
}
Synchronised Queue:
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
template <typename T>
class ThreadSafeQueue
{
public:
T pop()
{
std::unique_lock<std::mutex> mlock(mutex_);
while (queue_.empty())
{
cond_.wait(mlock);
}
auto item = queue_.front();
queue_.pop();
return item;
}
void push(const T& item)
{
std::unique_lock<std::mutex> mlock(mutex_);
queue_.push(item);
mlock.unlock();
cond_.notify_one();
}
int sizeIndicator()
{
std::unique_lock<std::mutex> mlock(mutex_);
return queue_.size();
}
private:
bool isEmpty() {
std::unique_lock<std::mutex> mlock(mutex_);
return queue_.empty();
}
std::queue<T> queue_;
std::mutex mutex_;
std::condition_variable cond_;
};
The thrown error call stack:
... std::_Mtx_lockX(_Mtx_internal_imp_t * * _Mtx) Line 68 C++
... std::_Mutex_base::lock() Line 42 C++
... std::unique_lock<std::mutex>::unique_lock<std::mutex>(std::mutex & _Mtx) Line 220 C++
... ThreadSafeQueue<int>::pop() Line 13 C++
... MyManager::worker() Zeile 178 C++

From my experience on working with threads in both Boost and Java, trying to shut down threads externally is always messy. I've never been able to really get that to work cleanly.
The best I've gotten is to have a boolean value available to all the consumer threads that is set to true. When you set it to false, the threads will simply return on their own. In your case, that could easily be put into the while loop you have.
On top of that, you're going to need some synchronization so that you can wait for the threads to return before you delete them, otherwise you can get some hard to define behavior.
An example from a past project of mine:
Thread creation
barrier = new boost::barrier(numOfThreads + 1);
threads = new detail::updater_thread*[numOfThreads];
for (unsigned int t = 0; t < numOfThreads; t++) {
//This object is just a wrapper class for the boost thread.
threads[t] = new detail::updater_thread(barrier, this);
}
Thread destruction
for (unsigned int i = 0; i < numOfThreads; i++) {
threads[i]->requestStop();//Notify all threads to stop.
}
barrier->wait();//The update request will allow the threads to get the message to shutdown.
for (unsigned int i = 0; i < numOfThreads; i++) {
threads[i]->waitForStop();//Wait for all threads to stop.
delete threads[i];//Now we are safe to clean up.
}
Some methods that may be of interest from the thread wrapper.
//Constructor
updater_thread::updater_thread(boost::barrier * barrier)
{
this->barrier = barrier;
running = true;
thread = boost::thread(&updater_thread::run, this);
}
void updater_thread::run() {
while (running) {
barrier->wait();
if (!running) break;
//Do stuff
barrier->wait();
}
}
void updater_thread::requestStop() {
running = false;
}
void updater_thread::waitForStop() {
thread.join();
}

Try moving 'try' up (like in the sample below). If your thread is waiting for data (inside waitingList.pop()) then may be waiting inside the condition variable .wait(). This is an 'interruption point' and so may throw when the thread gets interrupted.
void MyManager::worker()
{
while (true)
{
try
{
int current = waitingList.pop();
Object * p = objects.at(current);
p->calculateMesh(); //this task is internally locked by a mutex
boost::this_thread::interruption_point();
}
catch (const boost::thread_interrupted&)
{
// Thread interruption request received, break the loop
std::cout << "- Thread interrupted. Exiting thread." << std::endl;
break;
}
}
}

Maybe you are catching the wrong exception class?
Which would mean it does not get caught.
Not too familiar with threads but is it the mix of std::threads and boost::threads that is causing this?
Try catching the lowest parent exception.

I think this is a classic problem of reader/writer thread working on a common buffer. One of the most secured way of working out this problem is to use mutexes and signals.( I am not able to post the code here. Please send me an email, I post the code to you).

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
std::vector<std::thread> workers;
int total = 4;
int arr[4] = {0};
void each_thread_does(int i) {
arr[i] += 2;
}
}
int main(int argc, char *argv[]) {
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
workers.push_back(std::thread(each_thread_does, j));
}
for (std::thread &t: workers) {
if (t.joinable()) {
t.join();
}
}
arr[4] = std::min_element(arr, arr+4);
}
return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.

This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
void Start();
void QueueJob(const std::function<void()>& job);
void Stop();
void busy();
private:
void ThreadLoop();
bool should_terminate = false; // Tells threads to stop looking for jobs
std::mutex queue_mutex; // Prevents data races to the job queue
std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
std::vector<std::thread> threads;
std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
threads.resize(num_threads);
for (uint32_t i = 0; i < num_threads; i++) {
threads.at(i) = std::thread(ThreadLoop);
}
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
while (true) {
std::function<void()> job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
mutex_condition.wait(lock, [this] {
return !jobs.empty() || should_terminate;
});
if (should_terminate) {
return;
}
job = jobs.front();
jobs.pop();
}
job();
}
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
jobs.push(job);
}
mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
void ThreadPool::busy() {
bool poolbusy;
{
std::unique_lock<std::mutex> lock(queue_mutex);
poolbusy = jobs.empty();
}
return poolbusy;
}
The busy() function can be used in a while loop, such that the main thread can wait the threadpool to complete all the tasks before calling the threadpool destructor.
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
should_terminate = true;
}
mutex_condition.notify_all();
for (std::thread& active_thread : threads) {
active_thread.join();
}
threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
job to do.
I apologize if there are some syntax errors, I typed this code and and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().

You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code your wrote can be replaced with the following
#include <ctpl.h> // or <ctpl_stl.h> if ou do not have Boost library
int main (int argc, char *argv[]) {
ctpl::thread_pool p(2 /* two threads in the pool */);
int arr[4] = {0};
std::vector<std::future<void>> results(4);
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
results[j] = p.push([&arr, j](int){ arr[j] +=2; });
}
for (int j = 0; j < 4; ++j) {
results[j].get();
}
arr[4] = std::min_element(arr, arr + 4);
}
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.

A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)

A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstrations may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogenous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many problems outlined above. You might want to build up on it. You might also want to have a look of existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.

Follwoing [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece) suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
{
private:
std::queue<std::function<void()>> m_function_queue;
std::mutex m_lock;
std::condition_variable m_data_condition;
std::atomic<bool> m_accept_functions;
public:
Function_pool();
~Function_pool();
void push(std::function<void()> func);
void done();
void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "stating operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}

You can use thread_pool from boost library:
void my_task(){...}
int main(){
int threadNumbers = thread::hardware_concurrency();
boost::asio::thread_pool pool(threadNumbers);
// Submit a function to the pool.
boost::asio::post(pool, my_task);
// Submit a lambda object to the pool.
boost::asio::post(pool, []() {
...
});
}
You also can use threadpool from open source community:
void first_task() {...}
void second_task() {...}
int main(){
int threadNumbers = thread::hardware_concurrency();
pool tp(threadNumbers);
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
}

Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
for (int i = 0; i < threads; ++i) {
auto worker = [this] { return service.run(); };
grp.add_thread(new boost::thread(worker));
}
}
template<class F>
void enqueue(F f) {
service.post(f);
}
~thread_pool() {
service_worker.reset();
grp.join_all();
service.stop();
}
private:
boost::asio::io_service service;
asio_worker service_worker;
boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
std::cout << "Hello from Task 1\n";
});
pool.enqueue([] {
std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation, or actually is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).

Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
// Determine if n is prime.
}
int main(){
thread_pool pool(8); // 8 threads
list<future<bool>> results;
for(int n = 2;n < 10000;n++){
// Submit a job to the pool.
results.emplace_back(pool.async(is_prime, n));
}
int n = 2;
for(auto i = results.begin();i != results.end();i++, n++){
// i is an iterator pointing to a future representing the result of is_prime(n)
cout << n << " ";
bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
if(prime)
cout << "is prime";
else
cout << "is not prime";
cout << endl;
}
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.

looks like threadpool is very popular problem/exercise :-)
I recently wrote one in modern C++; it’s owned by me and publicly available here - https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, ordering of some tasks.
all implementation in two .h files.
So, the original question will be something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);

I found the pending tasks' future.get() call hangs on caller side if the thread pool gets terminated and leaves some tasks inside task queue. How to set future exception inside thread pool with only the wrapper std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
auto task = std::make_shared<std::packaged_task<std::result_of_t<F(Args...)>()>>(
std::bind(std::forward<F>(f), std::forward<Args>(args)...));
std::future<return_type> res = task->get_future();
{
std::unique_lock<std::mutex> lock(_mutex);
_tasks.push([task]() -> void { (*task)(); });
}
return res;
}
class StdThreadPool {
std::vector<std::thread> _workers;
std::priority_queue<TASK> _tasks;
...
}
struct TASK {
//int _func_return_value;
std::function<void()> _func;
int priority;
...
}

The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancelation (cooperative) - so that when the ThreadPool above goes out of scope - it cancels any running tasks (similar to c++20's jthread).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js