Stop infinite looping thread from main - c++

I am relatively new to threads, and I'm still learning best techniques and the C++11 thread library. Right now I'm in the middle of implementing a worker thread which infinitely loops, performing some work. Ideally, the main thread would want to stop the loop from time to time to sync with the information that the worker thread is producing, and then start it again. My idea initially was this:
// Code run by worker thread
void thread() {
    while (run_) {
        // Do lots of work
    }
}

// Code run by main thread
void start() {
    if (run_) return;
    run_ = true;
    // Start thread
}

void stop() {
    if (!run_) return;
    run_ = false;
    // Join thread
}

// Somewhere else
volatile bool run_ = false;
I was not completely sure about this so I started researching, and I discovered that volatile is actually not required for synchronization and is in fact generally harmful. Also, I discovered this answer, which describes a process nearly identical to the one I thought about. In the answer's comments however, this solution is described as broken, as volatile does not guarantee that different processor cores readily (if ever) communicate changes to the volatile values.
My question is this then: Should I use an atomic flag, or something else entirely? What exactly is the property that is lacking in volatile and that is then provided by whatever construct is needed to solve my problem effectively?
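For concreteness, here is the same pattern with std::atomic<bool>, which is what I suspect is needed (untested sketch):
#include <atomic>
#include <thread>

std::atomic<bool> run_{false};
std::thread worker_;

void thread_fn() {
    while (run_) { // atomic load: the main thread's write is guaranteed to become visible here
        // Do lots of work
    }
}

void start() {
    if (run_.exchange(true)) return; // already running
    worker_ = std::thread(thread_fn);
}

void stop() {
    if (!run_.exchange(false)) return; // not running
    worker_.join();
}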

Have you looked at mutexes? They're made to lock threads to avoid conflicts on shared data. Is that what you're looking for?

I think you want barrier-style synchronization here, using std::mutex.
Also take a look at Boost.Thread, a relatively high-level threading library.
Take a look at this code sample from the link:
#include <iostream>
#include <map>
#include <string>
#include <chrono>
#include <thread>
#include <mutex>

std::map<std::string, std::string> g_pages;
std::mutex g_pages_mutex;

void save_page(const std::string &url)
{
    // simulate a long page fetch
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::string result = "fake content";

    g_pages_mutex.lock();
    g_pages[url] = result;
    g_pages_mutex.unlock();
}

int main()
{
    std::thread t1(save_page, "http://foo");
    std::thread t2(save_page, "http://bar");
    t1.join();
    t2.join();

    g_pages_mutex.lock(); // not necessary as the threads are joined, but good style
    for (const auto &pair : g_pages) {
        std::cout << pair.first << " => " << pair.second << '\n';
    }
    g_pages_mutex.unlock();
}

I would suggest using std::mutex and std::condition_variable to solve the problem. Here's an example of how it can work with C++11:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

using namespace std;

int main()
{
    mutex m;
    condition_variable cv;

    // Tells whether the worker should stop its work
    bool done = false;

    // Zero means it can be filled by the worker thread.
    // Non-zero means it can be consumed by the main thread.
    int result = 0;

    // run worker thread
    auto t = thread{ [&]{
        auto bound = 1000;
        for (;;) // ever
        {
            auto sum = 0;
            for ( auto i = 0; i != bound; ++i )
                sum += i;
            ++bound;

            auto lock = unique_lock<mutex>( m );
            // wait until we can safely write the result
            cv.wait( lock, [&]{ return result == 0; });
            // write the result
            result = sum;
            // wake up the consuming thread
            cv.notify_one();
            // exit the loop, if flag is set. This must be
            // done with mutex protection. Hence this is not
            // in the for-condition expression.
            if ( done )
                break;
        }
    } };

    // the main thread's loop
    for ( auto i = 0; i != 20; ++i )
    {
        auto r = 0;
        {
            // lock the mutex
            auto lock = unique_lock<mutex>( m );
            // wait until we can safely read the result
            cv.wait( lock, [&]{ return result != 0; } );
            // read the result
            r = result;
            // set result to zero so the worker can
            // continue to produce new results.
            result = 0;
            // wake up the producer
            cv.notify_one();
            // the lock is released here (the end of the scope)
        }
        // do time consuming io at the side.
        cout << r << endl;
    }

    // tell the worker to stop
    {
        auto lock = unique_lock<mutex>( m );
        result = 0;
        done = true;
        // wake the worker in case it is blocked waiting for result == 0
        cv.notify_one();
        // again the lock is released here
    }

    // wait for the worker to finish.
    t.join();

    cout << "Finished." << endl;
}
You could do the same with std::atomic by essentially implementing spin locks. Spin locks can be slower than mutexes, so I repeat the advice from the Boost website:
Do not use spinlocks unless you are certain that you understand the consequences.
I believe that mutexes and condition variables are the way to go in your case.
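To make "essentially implementing spin locks" concrete, here is a sketch of the same one-slot hand-off done with a single std::atomic<int> (assuming a single producer, a single consumer, and that real results are never zero). It works, but both sides burn a core while waiting, which is exactly why the quote above warns against it:
#include <atomic>

std::atomic<int> result{0}; // 0 = slot empty, non-zero = result ready

// Producer: spin until the consumer has emptied the slot, then publish.
void publish(int sum) {
    while (result.load() != 0) { /* busy-wait */ }
    result.store(sum);
}

// Consumer: spin until a result appears, take it, and empty the slot.
int consume() {
    int r;
    while ((r = result.exchange(0)) == 0) { /* busy-wait */ }
    return r;
}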

Related

why does this thread pool deadlock or run too many times?

I'm trying to write a thread pool in C++ that fulfills the following criteria:
a single writer occasionally writes a new input value, and once it does, many threads concurrently access this same value, and each spits out a random floating point number.
each worker thread uses the same function, so there's no reason to build a thread-safe queue for all the different functions. I store the common function inside the thread_pool class.
these functions are by far the most computationally-intensive aspect of the program. Any locks that prevent these functions from doing their work are the primary thing I'm trying to avoid.
the floating point output from all these functions is simply averaged.
the user has a single function called thread_pool::start_work that changes this shared input, and tells all the workers to work for a fixed number of tasks.
thread_pool::start_work returns a std::future
Below is what I have so far. It can be built and run with g++ test_tp.cpp -std=c++17 -lpthread; ./a.out Unfortunately it either deadlocks or does the work too many (or sometimes too few) times. I am thinking that it's because m_num_comps_done is not thread-safe. There is a chance that all the threads skip over the last count, and then they all end up yielding. But isn't this variable atomic?
#include <vector>
#include <thread>
#include <mutex>
#include <shared_mutex>
#include <queue>
#include <atomic>
#include <future>
#include <functional>
#include <iostream>
#include <numeric>
#include <type_traits>

/**
 * @class join_threads
 * @brief RAII thread killer
 */
class join_threads
{
    std::vector<std::thread>& m_threads;
public:
    explicit join_threads(std::vector<std::thread>& threads_)
        : m_threads(threads_) {}

    ~join_threads() {
        for(unsigned long i=0; i < m_threads.size(); ++i) {
            if(m_threads[i].joinable())
                m_threads[i].join();
        }
    }
};

// how to remove the first two template parameters?
template<typename func_input_t, typename F>
class thread_pool
{
    using func_output_t = typename std::result_of<F(func_input_t)>::type;

    static_assert( std::is_floating_point<func_output_t>::value,
                   "function output type must be floating point");

    unsigned m_num_comps;
    std::atomic_bool m_done;
    std::atomic_bool m_has_an_input;
    std::atomic<int> m_num_comps_done; // need to be atomic? why?
    F m_f; // same function always used
    func_input_t m_param; // changed occasionally by a single writer
    func_output_t m_working_output; // many reader threads average all their output to get this
    std::promise<func_output_t> m_out;
    mutable std::shared_mutex m_mut;
    mutable std::mutex m_output_mut;
    std::vector<std::thread> m_threads;
    join_threads m_joiner;

    void worker_thread() {
        while(!m_done)
        {
            if(m_has_an_input){
                if( m_num_comps_done.load() < m_num_comps - 1 ) {
                    std::shared_lock<std::shared_mutex> lk(m_mut);
                    func_output_t tmp = m_f(m_param); // long time
                    m_num_comps_done++;
                    // quick
                    std::lock_guard<std::mutex> lk2(m_output_mut);
                    m_working_output += tmp / m_num_comps;
                }else if(m_num_comps_done.load() == m_num_comps - 1){
                    std::shared_lock<std::shared_mutex> lk(m_mut);
                    func_output_t tmp = m_f(m_param); // long time
                    m_num_comps_done++;
                    std::lock_guard<std::mutex> lk2(m_output_mut);
                    m_working_output += tmp / m_num_comps;
                    m_num_comps_done++;
                    try{
                        m_out.set_value(m_working_output);
                    }catch(std::future_error& e){
                        std::cout << "future_error caught: " << e.what() << "\n";
                    }
                }else{
                    std::this_thread::yield();
                }
            }else{
                std::this_thread::yield();
            }
        }
    }

public:
    /**
     * @brief ctor spawns working threads
     */
    thread_pool(F f, unsigned num_comps)
        : m_num_comps(num_comps)
        , m_done(false)
        , m_has_an_input(false)
        , m_f(f)
        , m_joiner(m_threads)
    {
        unsigned const thread_count=std::thread::hardware_concurrency(); // should I subtract one?
        try {
            for(unsigned i=0; i<thread_count; ++i) {
                m_threads.push_back( std::thread(&thread_pool::worker_thread, this));
            }
        } catch(...) {
            m_done=true;
            throw;
        }
    }

    ~thread_pool() {
        m_done=true;
    }

    /**
     * @brief changes the shared data member,
     * resets the num_comps_left variable,
     * resets the accumulator thing to 0, and
     * resets the promise object
     */
    std::future<func_output_t> start_work(func_input_t new_param) {
        std::unique_lock<std::shared_mutex> lk(m_mut);
        m_param = new_param;
        m_num_comps_done = 0;
        m_working_output = 0.0;
        m_out = std::promise<func_output_t>();
        m_has_an_input = true; // only really matters just after initialization
        return m_out.get_future();
    }
};

double slowSum(std::vector<double> nums) {
    // std::this_thread::sleep_for(std::chrono::milliseconds(200));
    return std::accumulate(nums.begin(), nums.end(), 0.0);
}

int main(){
    // construct
    thread_pool<std::vector<double>, std::function<double(std::vector<double>)>>
        le_pool(slowSum, 1000);

    // add work
    auto ans = le_pool.start_work(std::vector<double>{1.2, 3.2, 4213.1});
    std::cout << "final answer is: " << ans.get() << "\n";
    std::cout << "it should be 4217.5\n";
    return 1;
}
You check the "done" count, then acquire the lock. This allows multiple threads to be waiting for the lock at once. In particular, there might not be a thread that enters the second if body.
The other side of that is that, because you have all threads running all the time, the "last" thread may not reach its exclusive section at the right moment: it can arrive early (before enough threads have finished) or late (because additional threads are already waiting at the mutex in the first branch).
To fix the first issue: since the second if block contains all of the same code that is in the first if block, you can have just one block that checks the count to see whether you've reached the end and should set the out value.
The second issue requires you to check m_num_comps_done a second time after acquiring the mutex.
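A sketch of one way to apply both fixes, merging the branches and making the "who finishes last" decision under m_output_mut. Here m_num_finished is a new plain int member I've introduced, guarded by m_output_mut and reset to 0 in start_work; this sketch also ignores the restart race that start_work would still need to handle:
void worker_thread() {
    while (!m_done) {
        if (!m_has_an_input) { std::this_thread::yield(); continue; }

        // Claim one of the m_num_comps computations up front. fetch_add
        // returns the previous value, so each ticket is handed out exactly
        // once and no thread can "skip over the last count".
        int ticket = m_num_comps_done.fetch_add(1);
        if (ticket >= static_cast<int>(m_num_comps)) {
            std::this_thread::yield(); // all work claimed; wait for new input
            continue;
        }

        func_output_t tmp;
        {
            std::shared_lock<std::shared_mutex> lk(m_mut);
            tmp = m_f(m_param); // long time
        }

        std::lock_guard<std::mutex> lk2(m_output_mut);
        m_working_output += tmp / m_num_comps;
        // Re-check the completion count *after* acquiring the mutex: the
        // thread that records the final result fulfills the promise.
        if (++m_num_finished == static_cast<int>(m_num_comps))
            m_out.set_value(m_working_output);
    }
}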

C++ conditional wait race condition

Suppose that I have a program that has a worker thread that squares numbers from a queue. The problem is that if the work is too light (takes too short a time), the worker finishes the work and notifies the main thread before the main thread has even had time to start waiting for the worker to finish.
My simple program looks as follows:
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::atomic<bool> should_end;

std::condition_variable work_to_do;
std::mutex work_to_do_lock;

std::condition_variable fn_done;
std::mutex fn_done_lock;

std::mutex data_lock;
std::queue<int> work;
std::vector<int> result;

void worker() {
    while(true) {
        if(should_end) return;
        data_lock.lock();
        if(work.size() > 0) {
            int front = work.front();
            work.pop();
            if (work.size() == 0){
                fn_done.notify_one();
            }
            data_lock.unlock();
            result.push_back(front * front);
        } else {
            data_lock.unlock();
            // nothing to do, so we just wait
            std::unique_lock<std::mutex> lck(work_to_do_lock);
            work_to_do.wait(lck);
        }
    }
}

int main() {
    should_end = false;
    std::thread t(worker); // start worker

    data_lock.lock();
    const int N = 10;
    for(int i = 0; i <= N; i++) {
        work.push(i);
    }
    data_lock.unlock();
    work_to_do.notify_one(); // notify the worker that there is work to do

    //if the worker is quick, it signals done here already
    std::unique_lock<std::mutex> lck(fn_done_lock);
    fn_done.wait(lck);

    for(auto elem : result) {
        printf("result = %d \n", elem);
    }

    work_to_do.notify_one(); //notify the worker so we can shut it down
    should_end = true;
    t.join();
    return 0;
}
Your attempt to use the notification itself on a condition variable as a flag that the job is done is fundamentally flawed. First and foremost, std::condition_variable can have spurious wakeups, so it should not be done this way. You should use the queue size as the actual condition for end of work, check and modify it under the same mutex in all threads, and use that same mutex for the condition variable. Then you may use std::condition_variable to wait until the work is done, but only after you check the queue size; if the work is already done at that moment, you don't wait at all. Otherwise you check the queue size in a loop (because of spurious wakeups) and wait while it is still non-empty, or you use the std::condition_variable::wait() overload with a predicate, which has that loop built in.
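A minimal sketch of that shape, reusing the names from the question (not a drop-in fix: work and result are the question's globals, and shutdown would set should_end under m followed by cv.notify_all()):
std::mutex m;                  // one mutex for the queue, the flag, and the CV
std::condition_variable cv;
bool should_end = false;       // now a plain bool, only touched under m

void worker() {
    for (;;) {
        std::unique_lock<std::mutex> lck(m);
        cv.wait(lck, [] { return !work.empty() || should_end; });
        if (should_end) return;
        int front = work.front();
        work.pop();
        result.push_back(front * front);
        if (work.empty())
            cv.notify_one();   // tell main the queue has drained
    }
}

// main thread: push the work under m, notify, then wait with a predicate
{
    std::unique_lock<std::mutex> lck(m);
    cv.wait(lck, [] { return work.empty(); }); // immune to spurious wakeups and early notifications
}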

std::thread: How to wait (join) for any of the given threads to complete?

For example, I have two threads, t1 and t2. I want to wait for t1 or t2 to finish. Is this possible?
If I have a series of threads, say, a std::vector<std::thread>, how can I do it?
There's always wait & notify using std::condition_variable, e.g.:
using namespace std::chrono_literals; // for 1s

std::mutex m;
std::condition_variable cond;
std::atomic<std::thread::id> val;

auto task = [&] {
    std::this_thread::sleep_for(1s); // Some work
    val = std::this_thread::get_id();
    cond.notify_all();
};

std::thread{task}.detach();
std::thread{task}.detach();
std::thread{task}.detach();

std::unique_lock<std::mutex> lock{m};
cond.wait(lock, [&] { return val != std::thread::id{}; });
std::cout << "Thread " << val.load() << " finished first" << std::endl;
Note: val doesn't necessarily represent the thread that finished first as all threads finish at about the same time and an overwrite might occur, but it is only for the purposes of this example.
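If identifying the true first finisher matters, a compare-exchange that can only succeed once avoids the overwrite (sketch):
auto task = [&] {
    std::this_thread::sleep_for(1s); // Some work
    auto expected = std::thread::id{};
    // Only the first finisher swaps its id in; later threads fail the exchange.
    if (val.compare_exchange_strong(expected, std::this_thread::get_id())) {
        std::lock_guard<std::mutex> g(m); // pair with the waiter's mutex so the wakeup isn't lost
        cond.notify_all();
    }
};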
No, there is no wait for multiple objects equivalent in C++11's threading library.
If you want to wait on the first of a set of operations, consider having them feed a thread-safe producer-consumer queue.
Here is a post I made containing a threaded_queue<T>. Have the work product of your threads be delivered to such a queue. Have the consumer read off of the other end.
Now someone can wait on (the work product) of multiple threads at once. Or one thread. Or a GPU shader. Or work product being delivered over a RESTful web interface. You don't care.
The threads themselves should be managed by something like a thread pool or other higher level abstraction on top of std::thread, as std::thread makes a poor client-facing threading abstraction.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <boost/optional.hpp>

template<class T>
struct threaded_queue {
    using lock = std::unique_lock<std::mutex>;
    void push_back( T t ) {
        {
            lock l(m);
            data.push_back(std::move(t));
        }
        cv.notify_one();
    }
    boost::optional<T> pop_front() {
        lock l(m);
        cv.wait(l, [this]{ return abort || !data.empty(); } );
        if (abort) return {};
        auto r = std::move(data.front()); // take from the front so this really is FIFO
        data.pop_front();
        return r;
    }
    void terminate() {
        {
            lock l(m);
            abort = true;
            data.clear();
        }
        cv.notify_all();
    }
    ~threaded_queue()
    {
        terminate();
    }
private:
    std::mutex m;
    std::deque<T> data;
    std::condition_variable cv;
    bool abort = false;
};
I'd use std::optional instead of boost::optional in C++17. It can also be replaced with a unique_ptr, or a number of other constructs.
It's easy to do with a polling wait:
#include <iostream>
#include <thread>
#include <random>
#include <chrono>
#include <atomic>
#include <cstdint>
#include <string>
#include <vector>

void thread_task(std::atomic<bool> & boolean) {
    std::default_random_engine engine{std::random_device{}()};
    std::uniform_int_distribution<int64_t> dist{1000, 3000};
    int64_t wait_time = dist(engine);
    std::this_thread::sleep_for(std::chrono::milliseconds{wait_time});
    std::string line = "Thread slept for " + std::to_string(wait_time) + "ms.\n";
    std::cout << line;
    boolean.store(true);
}

int main() {
    std::vector<std::thread> threads;
    std::atomic<bool> boolean{false};
    for(int i = 0; i < 4; i++) {
        threads.emplace_back([&]{ thread_task(boolean); });
    }

    std::string line = "We reacted after a single thread finished!\n";
    while(!boolean) std::this_thread::yield();
    std::cout << line;

    for(std::thread & thread : threads) {
        thread.join();
    }
    return 0;
}
Example output I got on Ideone.com:
Thread slept for 1194ms.
We reacted after a single thread finished!
Thread slept for 1967ms.
Thread slept for 2390ms.
Thread slept for 2984ms.
This probably isn't the best code possible, because polling loops are not necessarily best practice, but it should work as a start.
There is no standard way of waiting on multiple threads.
You need to resort to operating system specific functions like WaitForMultipleObjects on Windows.
A Windows only example:
HANDLE handles[] = { t1.native_handle(), t2.native_handle(), };
auto res = WaitForMultipleObjects(2 , handles, FALSE, INFINITE);
Funnily enough, when std::when_any is standardized, one will be able to write a standard but wasteful solution:
std::vector<std::thread> waitingThreads;
std::vector<std::future<void>> futures;
for (auto& thread : threads) {
    std::promise<void> promise;
    futures.emplace_back(promise.get_future());
    waitingThreads.emplace_back([&thread, promise = std::move(promise)]() mutable {
        thread.join();
        promise.set_value();
    });
}
auto oneFinished = std::when_any(futures.begin(), futures.end());
Very wasteful and still not available, but standard.

How to say to std::thread to stop?

I have two questions.
1) I want to launch some function with an infinite loop that works like a server, checking for messages in a separate thread. However, I want to close it from the parent thread when I want. I'm confused about how to use std::future or std::condition_variable in this case. Or is it better to create some global variable and change it to true/false from the parent thread?
2) I'd like to have something like this. Why this one example crashes during the run time?
#include <iostream>
#include <chrono>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <future>

std::mutex mu;
bool stopServer = false;

bool serverFunction()
{
    while (true)
    {
        // checking for messages...
        // processing messages
        std::this_thread::sleep_for(std::chrono::seconds(1));
        mu.lock();
        if (stopServer)
            break;
        mu.unlock();
    }
    std::cout << "Exiting func..." << std::endl;
    return true;
}

int main()
{
    std::thread serverThread(serverFunction);
    // some stuff
    system("pause");
    mu.lock();
    stopServer = true;
    mu.unlock();
    serverThread.join();
}
Why this one example crashes during the run time?
When you leave the inner loop of your thread, you leave the mutex locked, so the parent thread may be blocked forever if you use that mutex again.
You should use std::unique_lock or something similar to avoid problems like that.
You leave your mutex locked. Don't lock mutexes manually in 999/1000 cases.
In this case, you can use std::unique_lock<std::mutex> to create a RAII lock-holder that will avoid this problem. Simply create it in a scope, and have the lock area end at the end of the scope.
{
    std::unique_lock<std::mutex> lock(mu);
    stopServer = true;
}
in main and
{
    std::unique_lock<std::mutex> lock(mu);
    if (stopServer)
        break;
}
in serverFunction.
Now in this case your mutex is pointless. Remove it. Replace bool stopServer with std::atomic<bool> stopServer, and remove all references to mutex and mu from your code.
An atomic variable can safely be read/written to from different threads.
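With those changes the server loop shrinks to something like this (sketch):
#include <atomic>

std::atomic<bool> stopServer{false};

bool serverFunction()
{
    while (!stopServer) // plain atomic read; no mutex, no manual unlock to forget
    {
        // checking for messages...
        // processing messages
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    std::cout << "Exiting func..." << std::endl;
    return true;
}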
However, your code is still busy-waiting. The right way to handle a server processing messages is a condition variable guarding the message queue. You then stop it by front-queuing a stop server message (or a flag) in the message queue.
This results in a server thread that doesn't wake up and pointlessly spin nearly as often. Instead, it blocks on the condition variable (with some spurious wakeups, but rare) and only really wakes up when there are new messages or it is told to shut down.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <boost/optional.hpp>

template<class T>
struct cross_thread_queue {
    void push( T t ) {
        {
            auto l = lock();
            data.push_back(std::move(t));
        }
        cv.notify_one();
    }
    boost::optional<T> pop() {
        auto l = lock();
        cv.wait( l, [&]{ return halt || !data.empty(); } );
        if (halt) return {};
        T r = data.front();
        data.pop_front();
        return std::move(r); // returning to optional<T>, so we'll explicitly `move` here.
    }
    void terminate() {
        {
            auto l = lock();
            data.clear();
            halt = true;
        }
        cv.notify_all();
    }
private:
    std::mutex m;
    std::unique_lock<std::mutex> lock() {
        return std::unique_lock<std::mutex>(m);
    }
    bool halt = false;
    std::deque<T> data;
    std::condition_variable cv;
};
We use boost::optional for the return type of pop -- if the queue is halted, pop returns an empty optional. Otherwise, it blocks until there is data.
You can replace this with anything optional-like, even a std::pair<bool, T> where the first element says if there is anything to return, or a std::unique_ptr<T>, or a std::experimental::optional, or a myriad of other choices.
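For instance, a pop that reports emptiness through a std::pair<bool, T> would look like this (sketch; it requires T to be default-constructible, which is the price of avoiding optional):
std::pair<bool, T> pop() {
    auto l = lock();
    cv.wait( l, [&]{ return halt || !data.empty(); } );
    if (halt) return { false, T{} }; // first == false: the queue was terminated
    T r = std::move(data.front());
    data.pop_front();
    return { true, std::move(r) };
}
The rest of the example sticks with the boost::optional version: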
cross_thread_queue<int> queue;

bool serverFunction()
{
    while (auto message = queue.pop()) {
        // processing *message
        std::cout << "Processing " << *message << std::endl;
    }
    std::cout << "Exiting func..." << std::endl;
    return true;
}

int main()
{
    std::thread serverThread(serverFunction);
    // some stuff
    queue.push(42);
    system("pause");
    queue.terminate();
    serverThread.join();
}

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? In other words, I want persistent worker threads that I can resynchronize with, without joining them.
I have code that looks like this:
#include <algorithm>
#include <thread>
#include <vector>

namespace {
    std::vector<std::thread> workers;

    int total = 4;
    int arr[5] = { 0 }; // one extra slot to hold the minimum

    void each_thread_does(int i) {
        arr[i] += 2;
    }
}

int main(int argc, char *argv[]) {
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            workers.push_back(std::thread(each_thread_does, j));
        }
        for (std::thread &t : workers) {
            if (t.joinable()) {
                t.join();
            }
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
    return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
    void Start();
    void QueueJob(const std::function<void()>& job);
    void Stop();
    bool busy();

private:
    void ThreadLoop();

    bool should_terminate = false;           // Tells threads to stop looking for jobs
    std::mutex queue_mutex;                  // Prevents data races to the job queue
    std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
    std::vector<std::thread> threads;
    std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
    const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
    threads.resize(num_threads);
    for (uint32_t i = 0; i < num_threads; i++) {
        threads.at(i) = std::thread(&ThreadPool::ThreadLoop, this); // a member function needs the object pointer
    }
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
    while (true) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            mutex_condition.wait(lock, [this] {
                return !jobs.empty() || should_terminate;
            });
            if (should_terminate) {
                return;
            }
            job = jobs.front();
            jobs.pop();
        }
        job();
    }
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        jobs.push(job);
    }
    mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
    bool poolbusy;
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        poolbusy = !jobs.empty(); // busy while jobs are still queued
    }
    return poolbusy;
}
The busy() function can be used in a while loop so that the main thread can wait for the threadpool to complete all the queued tasks before calling the threadpool destructor.
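For example (a sketch; note that busy() only sees jobs still in the queue, not a job a worker has already popped and is running, so a real implementation would also track in-flight jobs):
ThreadPool pool;
pool.Start();
pool.QueueJob([] { /* ... */ });
while (pool.busy()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10)); // poll until the queue drains
}
pool.Stop();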
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        should_terminate = true;
    }
    mutex_condition.notify_all();
    for (std::thread& active_thread : threads) {
        active_thread.join();
    }
    threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for jobs to do.
I apologize if there are some syntax errors; I typed this code from memory. Sorry that I cannot provide you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs; it just waits for them to finish via active_thread.join().
You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
#include <algorithm>
#include <future>
#include <vector>
int main (int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);

    int arr[5] = { 0 }; // one extra slot to hold the minimum
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int) { arr[j] += 2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get();
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
}
You will get the desired number of threads and will not create and delete them over and over again across the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogenous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many problems outlined above. You might want to build up on it. You might also want to have a look of existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>

class Function_pool
{
private:
    std::queue<std::function<void()>> m_function_queue;
    std::mutex m_lock;
    std::condition_variable m_data_condition;
    std::atomic<bool> m_accept_functions;

public:
    Function_pool();
    ~Function_pool();
    void push(std::function<void()> func);
    void done();
    void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "stating operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}
You can use thread_pool from boost library:
#include <boost/asio.hpp>
#include <thread>

void my_task() {...}

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    boost::asio::thread_pool pool(threadNumbers);

    // Submit a function to the pool.
    boost::asio::post(pool, my_task);

    // Submit a lambda object to the pool.
    boost::asio::post(pool, []() {
        ...
    });

    pool.join(); // wait for all submitted tasks to finish
}
You also can use threadpool from open source community:
void first_task() {...}
void second_task() {...}

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    pool tp(threadNumbers);

    // Add some tasks to the pool.
    tp.schedule(&first_task);
    tp.schedule(&second_task);
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>

struct thread_pool {
    typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;

    thread_pool(int threads) : service(), service_worker(new asio_worker::element_type(service)) {
        for (int i = 0; i < threads; ++i) {
            auto worker = [this] { return service.run(); };
            grp.add_thread(new boost::thread(worker));
        }
    }

    template<class F>
    void enqueue(F f) {
        service.post(f);
    }

    ~thread_pool() {
        service_worker.reset();
        grp.join_all();
        service.stop();
    }

private:
    boost::asio::io_service service;
    asio_worker service_worker;
    boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);

pool.enqueue([] {
    std::cout << "Hello from Task 1\n";
});

pool.enqueue([] {
    std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation, or actually is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n) {
    // Determine if n is prime.
}

int main() {
    thread_pool pool(8); // 8 threads

    list<future<bool>> results;
    for (int n = 2; n < 10000; n++) {
        // Submit a job to the pool.
        results.emplace_back(pool.async(is_prime, n));
    }

    int n = 2;
    for (auto i = results.begin(); i != results.end(); i++, n++) {
        // i is an iterator pointing to a future representing the result of is_prime(n)
        cout << n << " ";
        bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
        if (prime)
            cout << "is prime";
        else
            cout << "is not prime";
        cout << endl;
    }
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
Looks like a threadpool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it's owned by me and publicly available here - https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
All of the implementation is in two .h files.
So, the original question will be something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that pending tasks' future.get() calls hang on the caller side if the thread pool gets terminated and leaves some tasks inside the task queue. How do you set a future exception inside a thread pool when the task is wrapped in a plain std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
    using return_type = std::result_of_t<F(Args...)>;
    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<return_type> res = task->get_future();
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _tasks.push([task]() -> void { (*task)(); });
    }
    return res;
}

class StdThreadPool {
    std::vector<std::thread> _workers;
    std::priority_queue<TASK> _tasks;
    ...
};

struct TASK {
    //int _func_return_value;
    std::function<void()> _func;
    int priority;
    ...
};
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancellation (cooperative), so that when the ThreadPool above goes out of scope, it cancels any running tasks (similar to C++20's std::jthread).