boost::asio, thread pools and thread monitoring

I've implemented a thread pool using boost::asio, with some number of boost::thread objects calling boost::asio::io_service::run(). However, a requirement that I've been given is to have a way to monitor all threads for "health". My intent is to make a simple sentinel object that can be passed through the thread pool -- if it makes it through, then we can assume that the thread is still processing work.
However, given my implementation, I'm not sure how (if) I can monitor all the threads in the pool reliably. I've simply delegated the thread function to boost::asio::io_service::run(), so posting a sentinel object into the io_service instance won't guarantee which thread will actually get that sentinel and do the work.
One option may be to just periodically insert the sentinel, and hope that it gets picked up by each thread at least once in some reasonable amount of time, but that obviously isn't ideal.
Take the following example. Because of the way the handler is coded, each thread will do the same amount of work in this instance, but in reality I will not have control of the handler implementations: some can be long-running while others will finish almost immediately.
#include <iostream>
#include <boost/asio.hpp>
#include <memory> // std::unique_ptr
#include <vector>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
void handler()
{
std::cout << boost::this_thread::get_id() << "\n";
boost::this_thread::sleep(boost::posix_time::milliseconds(100));
}
int main(int argc, char **argv)
{
boost::asio::io_service svc(3);
std::unique_ptr<boost::asio::io_service::work> work(new boost::asio::io_service::work(svc));
boost::thread one(boost::bind(&boost::asio::io_service::run, &svc));
boost::thread two(boost::bind(&boost::asio::io_service::run, &svc));
boost::thread three(boost::bind(&boost::asio::io_service::run, &svc));
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
svc.post(handler);
work.reset();
three.join();
two.join();
one.join();
return 0;
}

You can use a common io_service instance shared by all the threads, plus a private io_service instance for each thread. Every thread will execute a method like this:
void Mythread::threadLoop()
{
while(/* termination condition */)
{
commonIoService.run_one();
privateIoService.run_one();
commonConditionVariable.timed_wait(time);
}
}
This way, if you want to ensure that some task is executed by a particular thread, you only have to post the task to that thread's private io_service.
To post a task to your thread pool you can do:
void MyThreadPool::post(Handler handler)
{
commonIoService.post(handler);
commonConditionVariable.notify_all();
}
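The same idea can be sketched without Boost. The following is only a standard-library analogue of the approach, not Boost.Asio code: the class MonitoredPool and its members post_to and ping_all are hypothetical names, and plain queues guarded by one mutex stand in for the io_service instances. Each worker prefers its private queue, so a sentinel posted there is guaranteed to be handled by that specific thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: one shared queue plus one private queue per thread.
class MonitoredPool {
public:
    using Task = std::function<void()>;

    explicit MonitoredPool(std::size_t n) : private_(n) {
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this, i] { loop(i); });
    }

    ~MonitoredPool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    // Ordinary work: any thread may pick it up.
    void post(Task t) {
        { std::lock_guard<std::mutex> lk(m_); common_.push_back(std::move(t)); }
        cv_.notify_all();
    }

    // Targeted work: only thread i will run it.
    void post_to(std::size_t i, Task t) {
        assert(i < private_.size());
        { std::lock_guard<std::mutex> lk(m_); private_[i].push_back(std::move(t)); }
        cv_.notify_all();
    }

    // Sends one sentinel to every thread and returns how many answered in time.
    std::size_t ping_all(std::chrono::milliseconds timeout) {
        struct Reply { std::mutex m; std::condition_variable cv; std::size_t n = 0; };
        auto r = std::make_shared<Reply>();  // outlives this call if a thread is stuck
        const std::size_t total = private_.size();
        for (std::size_t i = 0; i < total; ++i)
            post_to(i, [r] {
                { std::lock_guard<std::mutex> lk(r->m); ++r->n; }
                r->cv.notify_one();
            });
        std::unique_lock<std::mutex> lk(r->m);
        r->cv.wait_for(lk, timeout, [&] { return r->n == total; });
        return r->n;
    }

private:
    void loop(std::size_t i) {
        for (;;) {
            Task t;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&] {
                    return stop_ || !private_[i].empty() || !common_.empty();
                });
                if (!private_[i].empty()) {        // private queue has priority
                    t = std::move(private_[i].front());
                    private_[i].pop_front();
                } else if (!common_.empty()) {
                    t = std::move(common_.front());
                    common_.pop_front();
                } else {
                    return;                        // stop requested, nothing left
                }
            }
            t();                                   // run the task without the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::deque<Task> common_;
    std::vector<std::deque<Task>> private_;
    std::vector<std::thread> threads_;
};
```

A monitor thread could call ping_all periodically: a result smaller than the pool size suggests that at least one thread is stuck in a long-running handler, although a short timeout can of course also flag a thread that is merely busy.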

The solution that I used relies on the fact that I own the implementation of the thread pool objects. I created a wrapper type that updates statistics and holds a copy of the user-defined handler that is posted to the thread pool. Only this wrapper type is ever posted to the underlying io_service. This method allows me to keep track of the handlers that are posted/executed, without having to be intrusive into the user code.
Here's a stripped down and simplified example:
#include <algorithm>  // std::for_each
#include <functional> // std::function
#include <iostream>
#include <map>
#include <memory>
#include <vector>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
// Supports scheduling anonymous jobs that are
// executable as returning nothing and taking
// no arguments
typedef std::function<void(void)> functor_type;
// some way to store per-thread statistics
typedef std::map<boost::thread::id, int> thread_jobcount_map;
// only this type is actually posted to
// the asio proactor, this delegates to
// the user functor in operator()
struct handler_wrapper
{
handler_wrapper(const functor_type& user_functor, thread_jobcount_map& statistics)
: user_functor_(user_functor)
, statistics_(statistics)
{
}
void operator()()
{
user_functor_();
// just for illustration purposes, assume a long running job
boost::this_thread::sleep(boost::posix_time::milliseconds(100));
// increment executed jobs
++statistics_[boost::this_thread::get_id()];
}
functor_type user_functor_;
thread_jobcount_map& statistics_;
};
// anonymous thread function, just runs the proactor
void thread_func(boost::asio::io_service& proactor)
{
proactor.run();
}
class ThreadPool
{
public:
ThreadPool(size_t thread_count)
{
threads_.reserve(thread_count);
work_.reset(new boost::asio::io_service::work(proactor_));
for(size_t curr = 0; curr < thread_count; ++curr)
{
boost::thread th(thread_func, boost::ref(proactor_));
// inserting into this map before any work can be scheduled
// on it means that we don't have to lock it for lookups,
// since we don't dynamically add threads
thread_jobcount_.insert(std::make_pair(th.get_id(), 0));
threads_.emplace_back(std::move(th));
}
}
// the only way for a user to get work into
// the pool is to use this function, which ensures
// that the handler_wrapper type is used
void schedule(const functor_type& user_functor)
{
handler_wrapper to_execute(user_functor, thread_jobcount_);
proactor_.post(to_execute);
}
void join()
{
// join all threads in pool:
work_.reset();
proactor_.stop();
std::for_each(
threads_.begin(),
threads_.end(),
[] (boost::thread& t)
{
t.join();
});
}
// just an example showing statistics
void log()
{
std::for_each(
thread_jobcount_.begin(),
thread_jobcount_.end(),
[] (const thread_jobcount_map::value_type& it)
{
std::cout << "Thread: " << it.first << " executed " << it.second << " jobs\n";
});
}
private:
std::vector<boost::thread> threads_;
std::unique_ptr<boost::asio::io_service::work> work_;
boost::asio::io_service proactor_;
thread_jobcount_map thread_jobcount_;
};
struct add
{
add(int lhs, int rhs, int* result)
: lhs_(lhs)
, rhs_(rhs)
, result_(result)
{
}
void operator()()
{
*result_ = lhs_ + rhs_;
}
int lhs_,rhs_;
int* result_;
};
int main(int argc, char **argv)
{
// some "state objects" that are
// manipulated by the user functors
int x = 0, y = 0, z = 0;
// pool of three threads
ThreadPool pool(3);
// schedule some handlers to do some work
pool.schedule(add(5, 4, &x));
pool.schedule(add(2, 2, &y));
pool.schedule(add(7, 8, &z));
// give all the handlers time to execute
boost::this_thread::sleep(boost::posix_time::milliseconds(1000));
std::cout
<< "x = " << x << "\n"
<< "y = " << y << "\n"
<< "z = " << z << "\n";
pool.join();
pool.log();
}
Output:
x = 9
y = 4
z = 15
Thread: 0000000000B25430 executed 1 jobs
Thread: 0000000000B274F0 executed 1 jobs
Thread: 0000000000B27990 executed 1 jobs

Related

why does this thread pool deadlock or run too many times?

I'm trying to write a thread pool in c++ that fulfills the following criteria:
a single writer occasionally writes a new input value, and once it does, many threads concurrently access this same value, and each spit out a random floating point number.
each worker thread uses the same function, so there's no reason to build a thread-safe queue for all the different functions. I store the common function inside the thread_pool class.
these functions are by far the most computationally-intensive aspect of the program. Any locks that prevent these functions from doing their work is the primary thing I'm trying to avoid.
the floating point output from all these functions is simply averaged.
the user has a single function called thread_pool::start_work that changes this shared input, and tells all the workers to work for a fixed number of tasks.
thread_pool::start_work returns std::future
Below is what I have so far. It can be built and run with g++ test_tp.cpp -std=c++17 -lpthread; ./a.out. Unfortunately, it either deadlocks or does the work too many (or sometimes too few) times. I am thinking that it's because m_num_comps_done is not used in a thread-safe way: there is a chance that all the threads skip over the last count and then all end up yielding. But isn't this variable atomic?
#include <vector>
#include <thread>
#include <mutex>
#include <shared_mutex>
#include <queue>
#include <atomic>
#include <future>
#include <iostream>
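#include <numeric>
#include <functional>  // std::function
#include <type_traits> // std::result_of, std::is_floating_point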
/**
* @class join_threads
* @brief RAII thread killer
*/
class join_threads
{
std::vector<std::thread>& m_threads;
public:
explicit join_threads(std::vector<std::thread>& threads_)
: m_threads(threads_) {}
~join_threads() {
for(unsigned long i=0; i < m_threads.size(); ++i) {
if(m_threads[i].joinable())
m_threads[i].join();
}
}
};
// how remove the first two template parameters ?
template<typename func_input_t, typename F>
class thread_pool
{
using func_output_t = typename std::result_of<F(func_input_t)>::type;
static_assert( std::is_floating_point<func_output_t>::value,
"function output type must be floating point");
unsigned m_num_comps;
std::atomic_bool m_done;
std::atomic_bool m_has_an_input;
std::atomic<int> m_num_comps_done; // need to be atomic? why?
F m_f; // same function always used
func_input_t m_param; // changed occasionally by a single writer
func_output_t m_working_output; // many reader threads average all their output to get this
std::promise<func_output_t> m_out;
mutable std::shared_mutex m_mut;
mutable std::mutex m_output_mut;
std::vector<std::thread> m_threads;
join_threads m_joiner;
void worker_thread() {
while(!m_done)
{
if(m_has_an_input){
if( m_num_comps_done.load() < m_num_comps - 1 ) {
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
// quick
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
}else if(m_num_comps_done.load() == m_num_comps - 1){
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
m_num_comps_done++;
try{
m_out.set_value(m_working_output);
}catch(std::future_error& e){
std::cout << "future_error caught: " << e.what() << "\n";
}
}else{
std::this_thread::yield();
}
}else{
std::this_thread::yield();
}
}
}
public:
/**
* @brief ctor spawns working threads
*/
thread_pool(F f, unsigned num_comps)
: m_num_comps(num_comps)
, m_done(false)
, m_has_an_input(false)
, m_joiner(m_threads)
, m_f(f)
{
unsigned const thread_count=std::thread::hardware_concurrency(); // should I subtract one?
try {
for(unsigned i=0; i<thread_count; ++i) {
m_threads.push_back( std::thread(&thread_pool::worker_thread, this));
}
} catch(...) {
m_done=true;
throw;
}
}
~thread_pool() {
m_done=true;
}
/**
* @brief changes the shared data member,
* resets the num_comps_left variable,
* resets the accumulator thing to 0, and
* resets the promise object
*/
std::future<func_output_t> start_work(func_input_t new_param) {
std::unique_lock<std::shared_mutex> lk(m_mut);
m_param = new_param;
m_num_comps_done = 0;
m_working_output = 0.0;
m_out = std::promise<func_output_t>();
m_has_an_input = true; // only really matters just after initialization
return m_out.get_future();
}
};
double slowSum(std::vector<double> nums) {
// std::this_thread::sleep_for(std::chrono::milliseconds(200));
return std::accumulate(nums.begin(), nums.end(), 0.0);
}
int main(){
// construct
thread_pool<std::vector<double>, std::function<double(std::vector<double>)>>
le_pool(slowSum, 1000);
// add work
auto ans = le_pool.start_work(std::vector<double>{1.2, 3.2, 4213.1});
std::cout << "final answer is: " << ans.get() << "\n";
std::cout << "it should be 4217.5\n";
return 1;
}

You check the "done" count, then get the lock. This allows multiple threads to be waiting for the lock. In particular, there might not be a thread that enters the second if body.
The other side of that is because you have all threads running all the time, the "last" thread may not get access to its exclusive section early (before enough threads have run) or even late (because additional threads are waiting at the mutex in the first loop).
To fix the first issue, since the second if block has all of the same code that is in the first if block, you can have just one block that checks the count to see if you've reached the end and should set the out value.
The second issue requires you to check m_num_comps_done a second time after acquiring the mutex.
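One way to fold both fixes into a single block is sketched below. It is not the asker's class: parallel_average is a hypothetical free function, and instead of re-checking the counter after taking the mutex it claims each computation with fetch_add, which makes the check and the increment one atomic step. The completion count is kept under the output mutex, so exactly one thread sees it reach num_comps and sets the promise.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs f(param) num_comps times on num_threads threads
// and returns the average of the results.
double parallel_average(const std::function<double(double)>& f, double param,
                        unsigned num_comps, unsigned num_threads) {
    assert(num_comps > 0 && num_threads > 0);
    std::atomic<unsigned> next{0};  // index of the next unclaimed computation
    unsigned done = 0;              // finished computations, guarded by out_mut
    double sum = 0.0;               // running average, guarded by out_mut
    std::mutex out_mut;
    std::promise<double> out;
    std::future<double> result = out.get_future();

    auto worker = [&] {
        // fetch_add hands every caller a distinct index, so exactly
        // num_comps computations are ever started.
        while (next.fetch_add(1) < num_comps) {
            double tmp = f(param);                    // long part, no lock held
            std::lock_guard<std::mutex> lk(out_mut);  // short critical section
            sum += tmp / num_comps;
            if (++done == num_comps)                  // true for exactly one thread
                out.set_value(sum);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
    return result.get();
}
```

Because a thread claims its index before doing the work, no thread can be left waiting for a "last" slot that another thread has already taken, which is the situation described above.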

std::atomic_flag to stop multiple threads

I'm trying to stop multiple worker threads using a std::atomic_flag. Starting from Issue using std::atomic_flag with worker thread the following works:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
std::atomic_flag continueFlag;
std::thread t;
void work()
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
continueFlag.test_and_set(std::memory_order_relaxed);
t = std::thread(&work);
}
void stop()
{
continueFlag.clear(std::memory_order_relaxed);
t.join();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
Trying to rewrite into multiple worker threads:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <memory>
struct thread_data {
std::atomic_flag continueFlag;
std::thread thread;
};
std::vector<thread_data> threads;
void work(int threadNum, std::atomic_flag &continueFlag)
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work" << threadNum << " ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
thread_data td;
td.continueFlag.test_and_set(std::memory_order_relaxed);
td.thread = std::thread(&work, i, td.continueFlag);
threads.push_back(std::move(td));
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
}
}
void stop()
{
//Flag stop
for (auto &data : threads) {
data.continueFlag.clear(std::memory_order_relaxed);
}
//Join
for (auto &data : threads) {
data.thread.join();
}
threads.clear();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
My issue is "Problem Sector" in above. Namely creating the threads. I cannot wrap my head around how to instantiate the threads and passing the variables to the work thread.
The error right now is referencing this line threads.push_back(std::move(td)); with error Error C2280 'thread_data::thread_data(const thread_data &)': attempting to reference a deleted function.
Trying to use unique_ptr like this:
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, td->continueFlag);
threads.push_back(std::move(td));
Gives error std::atomic_flag::atomic_flag(const std::atomic_flag &)': attempting to reference a deleted function at line td->thread = std::thread(&work, i, td->continueFlag);. Am I fundamentally misunderstanding the use of std::atomic_flag? Is it really both immovable and uncopyable?

Your first approach was actually closer to the truth. The problem is that it passed a reference to an object within the local for loop scope to each thread, as a parameter. But, of course, once the loop iteration ended, that object went out of scope and got destroyed, leaving each thread with a reference to a destroyed object, resulting in undefined behavior.
Nobody cared about the fact that you moved the object into the std::vector, after creating the thread. The thread received a reference to a locally-scoped object, and that's all it knew. End of story.
Moving the object into the vector first, and then passing to each thread a reference to the object in the std::vector will not work either. As soon as the vector internally reallocates, as part of its natural growth, you'll be in the same pickle.
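What needs to happen is to have the entire threads array created first, before actually starting any std::threads. If the RAII principle is religiously followed, that means nothing more than creating the vector at its final size. (Note that std::vector::resize() will not compile here, because it requires a movable element type and std::atomic_flag is neither copyable nor movable; constructing the vector with the desired size works.)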
Then, in a second loop, iterate over the fully-cooked threads array, and go and spawn off a std::thread for each element in the array.

I was almost there with my unique_ptr solution. I just needed to wrap the flag in std::ref() when passing it to the thread, like this:
std::vector<std::unique_ptr<thread_data>> threads;
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, std::ref(td->continueFlag));
threads.push_back(std::move(td));
}
}
However, inspired by Sam above I also figured a non-pointer way:
std::vector<thread_data> threads;
void start()
{
const unsigned int numThreads = 2;
//create new vector, resize doesn't work as it tries to assign/copy which atomic_flag
//does not support
threads = std::vector<thread_data>(numThreads);
for (int i = 0; i < numThreads; i++) {
auto& t = threads.at(i);
t.continueFlag.test_and_set(std::memory_order_relaxed);
t.thread = std::thread(&work, i, std::ref(t.continueFlag));
}
}
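To answer the original "is it really both immovable and uncopyable?" directly, the compiler can be asked. The sketch below reuses the thread_data struct from the question; make_slots is a hypothetical helper added only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

struct thread_data {
    std::atomic_flag continueFlag;
    std::thread thread;
};

// std::atomic_flag can be neither copied nor moved ...
static_assert(!std::is_copy_constructible<std::atomic_flag>::value, "not copyable");
static_assert(!std::is_move_constructible<std::atomic_flag>::value, "not movable");
// ... std::thread is move-only ...
static_assert(!std::is_copy_constructible<std::thread>::value, "not copyable");
static_assert(std::is_move_constructible<std::thread>::value, "movable");
// ... so a struct holding both can only be default-constructed in place.
static_assert(!std::is_move_constructible<thread_data>::value, "not movable");
static_assert(std::is_default_constructible<thread_data>::value, "default ok");

// Sizing the vector in its constructor default-constructs the elements in
// place, which is why it compiles where push_back and resize do not.
std::vector<thread_data> make_slots(std::size_t count) {
    assert(count > 0);
    return std::vector<thread_data>(count);
}
```

The vector itself can still be returned or moved, because moving a std::vector transfers ownership of its storage without touching the elements.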

Message passing between threads using a command file

This project asks for 4 threads and a command file with
instructions such as SEND, RECEIVE and QUIT. When the file says "2
send", the thread in the second position of the array should wake
up and receive its message. How can I make a thread read
its message when the command file has a message for it?

The biggest issue I see with your design is that each thread reads its line independently of every other thread. Afterwards it would have to check whether the current line is actually meant for it, i.e. whether it starts with the appropriate number. What happens if not? Too complicated.
I would split this up into one reader thread and a set of worker threads. The reader reads lines from the file and dispatches each one by pushing it into the target worker's queue, all synchronized with a per-worker mutex and condition variable. The following is implemented in C++11 but should be doable in pthread_* style as well.
#include <thread>
#include <iostream>
#include <queue>
#include <mutex>
#include <fstream>
#include <list>
#include <sstream>
#include <string>
#include <vector>
#include <condition_variable>
class worker {
public:
void operator()(int n) {
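while(true) {
std::unique_lock<std::mutex> l(_m);
// Wait with a predicate: this handles spurious wakeups and messages
// that were queued before this thread started waiting.
_c.wait(l, [this] { return !_q.empty(); });
{
std::unique_lock<std::mutex> lio(_mm);
// front() is the element that pop() removes (FIFO order)
std::cerr << "#" << n << " " << _q.front() <<std::endl;
}
_q.pop();
}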
}
private:
std::mutex _m;
std::condition_variable _c;
std::queue<std::string> _q;
// Only needed to synchronize I/O
static std::mutex _mm;
// Reader may write into our queue
friend class reader;
};
std::mutex worker::_mm;
class reader {
public:
reader(worker & w0,worker & w1,worker & w2,worker & w3) {
_v.push_back(&w0);
_v.push_back(&w1);
_v.push_back(&w2);
_v.push_back(&w3);
}
void operator()() {
std::ifstream fi("commands.txt");
std::string s;
while(std::getline(fi,s)) {
std::stringstream ss(s);
int n;
if((ss >> n >> std::ws) && n>=0 && n<_v.size()) {
std::string s0;
if(std::getline(ss,s0)) {
std::unique_lock<std::mutex> l(_v[n]->_m);
_v[n]->_q.push(s0);
_v[n]->_c.notify_one();
}
}
}
std::cerr << "done" << std::endl;
}
private:
std::vector<worker *> _v;
};
int main(int c,char **argv) {
worker w0;
worker w1;
worker w2;
worker w3;
std::thread tw0([&w0]() { w0(0); });
std::thread tw1([&w1]() { w1(1); });
std::thread tw2([&w2]() { w2(2); });
std::thread tw3([&w3]() { w3(3); });
reader r(w0,w1,w2,w3);
std::thread tr([&r]() { r(); });
tr.join();
tw0.join();
tw1.join();
tw2.join();
tw3.join();
}
The example code only reads from "commands.txt" until EOF. I assume you'd like to read continuously like the "tail -f" command. That's however not doable with std::istream.
The code of course is clumsy but I guess it gives you an idea. One should for example add a blocking mechanism if the workers are way too slow processing their stuff and the queues may eat up all the precious RAM.
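The blocking mechanism mentioned above could look roughly like the following bounded queue. This is only a sketch under assumptions: BoundedQueue and try_push are hypothetical names, and the class would replace the plain std::queue plus mutex and condition variable inside each worker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical bounded FIFO: the reader blocks in push() when a worker lags.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks the producer while the queue is full.
    void push(std::string line) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [this] { return q_.size() < capacity_; });
        q_.push(std::move(line));
        lk.unlock();
        not_empty_.notify_one();
    }

    // Non-blocking variant: returns false instead of waiting.
    bool try_push(std::string line) {
        {
            std::lock_guard<std::mutex> lk(m_);
            if (q_.size() >= capacity_) return false;
            q_.push(std::move(line));
        }
        not_empty_.notify_one();
        return true;
    }

    // Blocks the consumer while the queue is empty.
    std::string pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [this] { return !q_.empty(); });
        std::string line = std::move(q_.front());
        q_.pop();
        lk.unlock();
        not_full_.notify_one();
        return line;
    }

private:
    const std::size_t capacity_;
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<std::string> q_;
};
```

With a capacity of, say, a few hundred lines per worker, the reader simply stalls on a slow worker instead of letting that worker's queue grow without limit.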

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
std::vector<std::thread> workers;
int total = 4;
int arr[5] = {0}; // arr[0..3] are the per-thread slots, arr[4] holds the minimum
void each_thread_does(int i) {
arr[i] += 2;
}
}
int main(int argc, char *argv[]) {
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
workers.push_back(std::thread(each_thread_does, j));
}
for (std::thread &t: workers) {
if (t.joinable()) {
t.join();
}
}
arr[4] = *std::min_element(arr, arr+4); // min_element returns an iterator
}
return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.

This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
void Start();
void QueueJob(const std::function<void()>& job);
void Stop();
bool busy();
private:
void ThreadLoop();
bool should_terminate = false; // Tells threads to stop looking for jobs
std::mutex queue_mutex; // Prevents data races to the job queue
std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
std::vector<std::thread> threads;
std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
threads.resize(num_threads);
for (uint32_t i = 0; i < num_threads; i++) {
threads.at(i) = std::thread(&ThreadPool::ThreadLoop, this); // member function needs the object
}
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
while (true) {
std::function<void()> job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
mutex_condition.wait(lock, [this] {
return !jobs.empty() || should_terminate;
});
if (should_terminate) {
return;
}
job = jobs.front();
jobs.pop();
}
job();
}
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
jobs.push(job);
}
mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
bool poolbusy;
{
std::unique_lock<std::mutex> lock(queue_mutex);
poolbusy = !jobs.empty(); // busy while jobs are still queued
}
return poolbusy;
}
The busy() function can be used in a while loop, so that the main thread can wait for the thread pool to complete all the tasks before calling the thread pool destructor.
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
should_terminate = true;
}
mutex_condition.notify_all();
for (std::thread& active_thread : threads) {
active_thread.join();
}
threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
jobs to do.
I apologize if there are some syntax errors; I typed this code and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
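One caveat about busy(): it only looks at the queue, so it cannot tell whether a job that has already been dequeued is still running. A possible variation, sketched here with hypothetical names (CountingPool, queue_job, wait_idle), keeps a count of jobs that were queued but have not finished and lets the caller block on a second condition variable instead of polling. Unlike Stop() above, the destructor in this sketch drains the queue before the threads exit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical variant of the pool above that can wait for completion.
class CountingPool {
public:
    explicit CountingPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this] { loop(); });
    }

    ~CountingPool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        work_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    void queue_job(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push(std::move(job));
            ++pending_;                      // counts queued and running jobs
        }
        work_cv_.notify_one();
    }

    // Blocks until every job queued so far has finished running.
    void wait_idle() {
        std::unique_lock<std::mutex> lk(m_);
        idle_cv_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                work_cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stop requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                           // run outside the lock
            std::lock_guard<std::mutex> lk(m_);
            if (--pending_ == 0) idle_cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable work_cv_;
    std::condition_variable idle_cv_;
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> threads_;
};
```

The main thread would call wait_idle() where the answer suggests looping on busy(), which avoids both the busy-wait and the window in which the queue is empty but jobs are still executing.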

You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following
#include <ctpl.h> // or <ctpl_stl.h> if you do not have Boost library
int main (int argc, char *argv[]) {
ctpl::thread_pool p(2 /* two threads in the pool */);
int arr[5] = {0}; // arr[0..3] are the task slots, arr[4] holds the minimum
std::vector<std::future<void>> results(4);
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
results[j] = p.push([&arr, j](int){ arr[j] +=2; });
}
for (int j = 0; j < 4; ++j) {
results[j].get();
}
arr[4] = *std::min_element(arr, arr + 4); // min_element returns an iterator
}
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.

A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)

A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The thread pool's job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signals
I/O
hardware (processor affinity, heterogeneous setups)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
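The point about what a task's result is (the promise and future notion mentioned above) can be illustrated with a small sketch. FutureQueue and submit are hypothetical names, and a single worker thread is used only to keep the example short: each callable is wrapped in a std::packaged_task, and the caller gets back the matching std::future.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical single-worker queue whose jobs report results through futures.
class FutureQueue {
public:
    FutureQueue() : worker_([this] { loop(); }) {}

    ~FutureQueue() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join();
    }

    // Accepts any callable taking no arguments and returns a future
    // for whatever that callable returns (including void).
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        // packaged_task is move-only while std::function requires a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lk(m_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stop requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // the result or exception is stored in the shared state
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the other members exist first
};
```

Calling get() on the returned future blocks until the job has run, and an exception thrown inside the job is rethrown from get(), which answers part of the "how is it interrupted, what is the result" design question without extra machinery.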

Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
{
private:
std::queue<std::function<void()>> m_function_queue;
std::mutex m_lock;
std::condition_variable m_data_condition;
std::atomic<bool> m_accept_functions;
public:
Function_pool();
~Function_pool();
void push(std::function<void()> func);
void done();
void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "starting operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}

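You can use thread_pool from the Boost.Asio library:
#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <thread>
void my_task(){...}
int main(){
unsigned threadNumbers = std::thread::hardware_concurrency();
boost::asio::thread_pool pool(threadNumbers);
// Submit a function to the pool.
boost::asio::post(pool, my_task);
// Submit a lambda object to the pool.
boost::asio::post(pool, []() {
...
});
// Wait for all submitted tasks to finish.
pool.join();
}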
You can also use the threadpool library from the open source community:
void first_task() {...}
void second_task() {...}
int main(){
int threadNumbers = thread::hardware_concurrency();
pool tp(threadNumbers);
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
}

Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
for (int i = 0; i < threads; ++i) {
auto worker = [this] { return service.run(); };
grp.add_thread(new boost::thread(worker));
}
}
template<class F>
void enqueue(F f) {
service.post(f);
}
~thread_pool() {
service_worker.reset();
grp.join_all();
service.stop();
}
private:
boost::asio::io_service service;
asio_worker service_worker;
boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
std::cout << "Hello from Task 1\n";
});
pool.enqueue([] {
std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
boost::asio::io_service is a very efficient implementation; more precisely, it is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).

Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
// Determine if n is prime.
}
int main(){
thread_pool pool(8); // 8 threads
list<future<bool>> results;
for(int n = 2;n < 10000;n++){
// Submit a job to the pool.
results.emplace_back(pool.async(is_prime, n));
}
int n = 2;
for(auto i = results.begin();i != results.end();i++, n++){
// i is an iterator pointing to a future representing the result of is_prime(n)
cout << n << " ";
bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
if(prime)
cout << "is prime";
else
cout << "is not prime";
cout << endl;
}
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
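For readers curious how an async-style submit function can hand back a std::future, here is a minimal sketch using only the standard library. It is not the linked library's implementation: the class name tiny_pool and the helper sum_of_squares are made up for illustration, and the sketch assumes plain FIFO scheduling with no priorities or green threads.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool (illustration only, not the linked library).
class tiny_pool {
public:
    explicit tiny_pool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Drains the queue, then joins every worker.
    ~tiny_pool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Wraps the call in a packaged_task so the caller receives a future.
    template <class F, class... Args>
    auto async(F&& f, Args&&... args) -> std::future<decltype(f(args...))> {
        using R = decltype(f(args...));
        auto task = std::make_shared<std::packaged_task<R()>>(
            std::bind(std::forward<F>(f), std::forward<Args>(args)...));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest exists
};

// Usage: squares 1..10 on four workers and sums the results.
int sum_of_squares() {
    tiny_pool pool(4);
    std::vector<std::future<int>> results;
    for (int n = 1; n <= 10; ++n)
        results.push_back(pool.async([](int x) { return x * x; }, n));
    int total = 0;
    for (auto& r : results) total += r.get();  // get() blocks until the task ran
    return total;
}
```

The shared_ptr around the packaged_task is there because std::function requires a copyable callable, while packaged_task itself is move-only.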

Looks like a thread pool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it's my own and publicly available here: https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
The whole implementation is in two .h files.
So, the code from the original question would look something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);

I found that a pending task's future.get() call hangs on the caller's side if the thread pool is terminated while some tasks are still in the task queue. How can I set an exception on the future from inside the thread pool when all it holds is the wrapping std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
using return_type = std::result_of_t<F(Args...)>;
auto task = std::make_shared<std::packaged_task<return_type()>>(
std::bind(std::forward<F>(f), std::forward<Args>(args)...));
std::future<return_type> res = task->get_future();
{
std::unique_lock<std::mutex> lock(_mutex);
_tasks.push([task]() -> void { (*task)(); });
}
return res;
}
class StdThreadPool {
std::vector<std::thread> _workers;
std::priority_queue<TASK> _tasks;
...
}
struct TASK {
//int _func_return_value;
std::function<void()> _func;
int priority;
...
}
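One observation that may help here: a std::packaged_task that is destroyed without ever being invoked abandons its shared state, and the matching future then throws std::future_error with the code broken_promise instead of blocking. The sketch below shows that behaviour in isolation. It assumes the pool discards its pending entries on shutdown; the function name and the plain std::queue are stand-ins for illustration, not the code from the question.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <queue>

// Returns true if the future of a discarded task unblocks with broken_promise.
bool discarded_task_breaks_promise() {
    std::queue<std::function<void()>> tasks;  // stand-in for the pool's queue

    auto task = std::make_shared<std::packaged_task<int()>>([] { return 42; });
    std::future<int> result = task->get_future();
    tasks.push([task] { (*task)(); });
    task.reset();  // the queue entry now holds the only reference

    // Simulate pool shutdown: drop pending work instead of running it.
    while (!tasks.empty()) tasks.pop();

    try {
        result.get();  // does not hang: the shared state was abandoned
    } catch (const std::future_error& e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;  // reached only if the task had somehow run
}
```

With a std::priority_queue the same pop loop applies, since it has no clear() member. The caller then needs to be prepared to catch std::future_error from get().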

The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cooperative cancellation, so when the ThreadPool above goes out of scope, it cancels any running tasks (similar to C++20's jthread).
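To illustrate what cooperative cancellation means in general, here is a small standard-library sketch. It does not use or mirror Stroika's API: cancellable_worker and run_briefly are hypothetical names, and the stop signal is assumed to be a simple atomic flag that the task polls.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not Stroika): a worker that polls a stop flag.
class cancellable_worker {
public:
    cancellable_worker() : thread_([this] { run(); }) {}

    // Going out of scope requests the stop, then waits for the thread.
    ~cancellable_worker() {
        stop_.store(true);
        thread_.join();
    }

    long iterations() const { return iterations_.load(); }

private:
    void run() {
        // The task cooperates by checking the flag between units of work.
        while (!stop_.load()) {
            ++iterations_;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<long> iterations_{0};
    std::thread thread_;  // declared last so the flags exist before it starts
};

// Usage: let the worker make some progress, then cancel it by leaving scope.
long run_briefly() {
    long seen = 0;
    {
        cancellable_worker w;
        while ((seen = w.iterations()) == 0)
            std::this_thread::yield();
    }  // destructor sets the flag and joins
    return seen;
}
```

The important property is that the thread is never killed from outside; it only stops at points where it checks the flag itself.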

How do I reverse set_value() and 'deactivate' a promise?

I have a total n00b question here on synchronization. I have a 'writer' thread which assigns a different value 'p' to a promise at each iteration. I need 'reader' threads which wait for shared_futures of this value and then process it. How do I use future/promise to ensure that the reader threads wait for a new update of 'p' before performing their processing task at each iteration? Many thanks.

You can "reset" a promise by assigning a fresh, default-constructed promise to it.
myPromise = promise< int >();
A more complete example:
promise< int > myPromise;
void writer()
{
for( int i = 0; i < 10; ++i )
{
cout << "Setting promise.\n";
myPromise.set_value( i );
myPromise = promise< int >{}; // Reset the promise.
cout << "Waiting to set again...\n";
this_thread::sleep_for( chrono::seconds( 1 ));
}
}
void reader()
{
int result;
do
{
auto myFuture = myPromise.get_future();
cout << "Waiting to receive result...\n";
result = myFuture.get();
cout << "Received " << result << ".\n";
} while( result < 9 );
}
int main()
{
std::thread write( writer );
std::thread read( reader );
write.join();
read.join();
return 0;
}
A problem with this approach, however, is that nothing synchronizes the two threads' access to myPromise: the writer may call promise::set_value() more than once between the reader's calls to future::get(), or the reader may touch the promise while it is being reset. These problems can be made less likely with care (e.g. with sleeps between calls), but this takes us into the realm of hacking and guesswork rather than logically correct concurrency.
So although it's possible to reset a promise by assigning a fresh promise to it, doing so tends to raise broader synchronization issues.

A promise/future pair is designed to carry only a single value (or exception). To do what you're describing, you probably want to adopt a different tool.
If you wish to have multiple threads (your readers) all stop at a common point, you might consider a barrier.
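As one example of a different tool (a mutex and condition variable rather than the barrier mentioned above), the sketch below lets any number of readers block until the writer publishes a newer value. The names versioned_value, publish and wait_for_update are invented for this illustration. Note the assumption baked in: a slow reader may skip intermediate values and only see the latest one.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: a value that readers can wait on for each new update.
class versioned_value {
public:
    void publish(int v) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = v;
            ++version_;
        }
        cv_.notify_all();  // wake every waiting reader
    }

    // Blocks until a version other than last_seen exists, then returns the
    // current value and records its version in last_seen.
    int wait_for_update(unsigned long& last_seen) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return version_ != last_seen; });
        last_seen = version_;
        return value_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int value_ = 0;
    unsigned long version_ = 0;
};

// Usage: two readers follow a writer until the final value (5) arrives.
int last_values_seen_by_readers() {
    versioned_value p;
    std::vector<int> last(2, 0);  // one slot per reader, no sharing
    std::vector<std::thread> readers;
    for (int r = 0; r < 2; ++r)
        readers.emplace_back([&p, &last, r] {
            unsigned long seen = 0;
            while (last[r] != 5) last[r] = p.wait_for_update(seen);
        });
    for (int i = 1; i <= 5; ++i) p.publish(i);
    for (auto& t : readers) t.join();
    return last[0] + last[1];
}
```

If every reader must process every value before the writer moves on, the writer would additionally need to wait for acknowledgements, which is where a barrier fits.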

The following code demonstrates how the producer/consumer pattern can be implemented with future and promise.
There are two promise variables, used by a producer and a consumer thread. Each thread resets one of the two promise variables and waits for the other one.
#include <iostream>
#include <future>
#include <thread>
using namespace std;
// produces integers from 0 to 99
void producer(promise<int>& dataready, promise<void>& consumed)
{
for (int i = 0; i < 100; ++i) {
// do some work here ...
consumed = promise<void>{}; // reset
dataready.set_value(i); // make data available
consumed.get_future().wait(); // wait for the data to be consumed
}
dataready.set_value(-1); // no more data
}
// consumes integers
void consumer(promise<int>& dataready, promise<void>& consumed)
{
for (;;) {
int n = dataready.get_future().get(); // wait for data ready
if (n >= 0) {
std::cout << n << ",";
dataready = promise<int>{}; // reset
consumed.set_value(); // mark data as consumed
// do some work here ...
}
else
break;
}
}
int main(int argc, const char*argv[])
{
promise<int> dataready{};
promise<void> consumed{};
thread th1([&] {producer(dataready, consumed); });
thread th2([&] {consumer(dataready, consumed); });
th1.join();
th2.join();
std::cout << "\n";
return 0;
}
