C++ Fork Join Parallelism Blocking

Suppose you wish to run a section in parallel, then merge back into the main thread, then fork back into parallel sections, and so on, similar to the childhood game red light green light.
I've given an example of what I'm trying to do, where I'm using a condition variable to block the threads at the start. I want to start them all in parallel, but then block them at the end so the results can be printed out serially. The *= operation could be a much larger operation spanning many seconds. Reusing the threads is also important; a task queue might be too heavy.
I need to use some kind of blocking construct that isn't just a plain busy loop, because I already know how to solve this problem with busy loops.
In English:
Thread 1 creates 10 threads that are blocked
Thread 1 signals all threads to start (without blocking each other)
Threads 2-11 process their exclusive memory
Thread 1 waits until 2-11 are complete (an atomic can be used to count here)
Threads 2-11 complete; each can notify thread 1 to check its condition if necessary
Thread 1 checks its condition and prints the array
Thread 1 re-signals 2-11 to process again, repeating from the second step
Example code (naive, adapted from an example on cplusplus.com):
// condition_variable example (naive)
#include <iostream>             // std::cout
#include <thread>               // std::thread
#include <mutex>                // std::mutex, std::unique_lock
#include <condition_variable>   // std::condition_variable
#include <atomic>
#include <chrono>               // std::chrono::seconds

std::mutex mtx;
std::condition_variable cv;
bool ready = false;
std::atomic<int> count(0);
bool end = false;
int a[10];

void doublea(int id) {
    while (!end) {
        std::unique_lock<std::mutex> lck(mtx);
        while (!ready) cv.wait(lck);
        a[id] *= 2;
        count.fetch_add(1);
    }
}

void go() {
    std::unique_lock<std::mutex> lck(mtx);
    ready = true;
    cv.notify_all();
    ready = false; // Naive
    // Naive: mtx is still held here, so the woken workers can never
    // re-acquire it, and count is never reset between rounds.
    while (count.load() < 10) std::this_thread::sleep_for(std::chrono::seconds(1));
    for (int i = 0; i < 10; i++) {
        std::cout << a[i] << std::endl;
    }
    ready = true;
    cv.notify_all();
    ready = false;
    while (count.load() < 10) std::this_thread::sleep_for(std::chrono::seconds(1));
    for (int i = 0; i < 10; i++) {
        std::cout << a[i] << std::endl;
    }
    end = true;
    cv.notify_all();
}

int main() {
    std::thread threads[10];
    // spawn 10 threads:
    for (int i = 0; i < 10; ++i) {
        a[i] = 0;
        threads[i] = std::thread(doublea, i);
    }
    std::cout << "10 threads ready to race...\n";
    go(); // go!
    return 0; // Naive: the threads are never joined
}

This is not trivial to implement efficiently. Moreover, it does not make much sense to write yourself unless you are learning the subject. A condition variable is not a good choice here because it does not scale well.
I suggest you look at how mature run-time libraries implement fork-join parallelism and learn from them, or use one of them in your app. See http://www.openmprtl.org/, http://opentbb.org/, and https://www.cilkplus.org/ - all of these are open source.
OpenMP is the closest model to what you are looking for, and it has the most efficient implementations of fork-join barriers. It has its disadvantages, though, because it is designed for HPC and lacks dynamic composability. TBB and Cilk work best for nested parallelism and for use in modules and libraries that can be called from within external parallel regions.
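For illustration, here is a minimal OpenMP sketch of the question's fork-join loop (my translation, not code from the original post). Each parallel region forks the workers and ends with an implicit barrier, which is the join point; OpenMP run-times also typically keep the worker threads alive between regions, which covers the thread-reuse requirement. Compile with -fopenmp or equivalent:
#include <iostream>

int a[10];

int main() {
    for (int round = 0; round < 2; ++round) {
        #pragma omp parallel for
        for (int i = 0; i < 10; ++i)
            a[i] = a[i] * 2 + 1; // stand-in for the question's "*= 2" work
        // implicit barrier above: all workers have finished this round
        for (int i = 0; i < 10; ++i) // serial section between forks
            std::cout << a[i] << std::endl;
    }
    return 0;
}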

You can use a barrier or a condition variable to start all threads. Then thread 1 can wait until all the threads end their work (by calling join on each of them, which blocks) and then print their data in one for loop.
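Joining the threads conflicts with the question's requirement to reuse them, though. For completeness, here is a minimal sketch using C++20's std::barrier (not available when the question was asked): the completion function runs serially on one thread each time all workers arrive, which replaces the manual count/notify logic.
#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    int a[10];
    for (int i = 0; i < 10; ++i) a[i] = i;

    // Runs serially on one thread once all workers have arrived.
    auto on_completion = [&a]() noexcept {
        for (int v : a) std::cout << v << ' ';
        std::cout << '\n';
    };
    std::barrier sync(10, on_completion);

    std::vector<std::jthread> workers;
    for (int id = 0; id < 10; ++id) {
        workers.emplace_back([&, id] {
            for (int round = 0; round < 2; ++round) {
                a[id] *= 2;             // the parallel work
                sync.arrive_and_wait(); // join point; completion prints the array
            }
        });
    }
    // The jthreads join automatically at the end of main.
}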


How to run two loops from two threads one by one, like a flip flop?

I have a question which is similarly answered here, but it is not exactly what I need. I have two threads, each with a loop. Now I want to force the two threads to work like a flip-flop, exactly like ABABAB or BABABA... It is not important which one starts first, but they must take turns.
I have some very simple code, but it does not work well because thread A iterates super fast and takes the lock again. Please help me, as I am learning C++ multithreading.
1. In the above link it is said that maybe it's not the best approach to have two threads. Assume it is a game and I must run one iteration for player A and one iteration for player B.
I agree it does not give me much better efficiency, because at each moment only one of them is working, but I want to learn whether there is a way.
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

mutex mtx;
int pointA, pointB;

void testA()
{
    int i = 0;
    while (i < 10)
    {
        unique_lock<std::mutex> lck(mtx);
        cout << pointB << endl;
        pointA++;
        i++;
    }
}

int main()
{
    int i = 0;
    pointA = 100, pointB = 0;
    thread t(testA);
    while (i < 10)
    {
        unique_lock<std::mutex> lck(mtx);
        cout << pointA << endl;
        pointB++;
        i++;
    }
    t.join();
}
Without using the standard tool for this (shown next), and if running the same code in multiple threads is allowed, you can branch the flow on a shared variable:
// in both threads
unique_lock<std::mutex> lck(mtx);
if (var && myId == "A")
{
    // stuff that A does
    var = false;
}
else if (!var && myId == "B")
{
    // stuff that B does
    var = true;
}
but this would be slow, because there are many moments where a thread's id does not match the variable's state, and checking every such case makes it even slower.
C++ has something to help with this:
std::condition_variable
By using a condition variable, each thread can wait on its own condition and be woken automatically when it is allowed to proceed:
std::condition_variable cv;
...
std::unique_lock<std::mutex> lk(mtx);
cv.wait(lk, []{ return your_logic(); });
Since it just waits, it does not waste CPU cycles like the first example. Unnecessary wake-ups and processing are reduced, and memory bandwidth is not wasted either.
A more implicit way of combining the outputs of two threads is to use two thread-safe queues, one from A to B and one from B to the output:
// assuming the implementation blocks .front() until it is filled
ThreadSafeQueue q1;
ThreadSafeQueue q2;

// in thread A
for (int i = 0; i < 10; i++)
    q1.push("A");

// in thread B
for (int i = 0; i < 10; i++)
{
    q2.push(q1.front() + "B");
    q1.pop();
}

// in main thread
auto result = q2.front(); // "AB"
q2.pop();
With this pattern, thread B does only one unit of work for each result of thread A. But this does not synchronize the threads in time: thread A could fill its queue with 10 "A" values before thread B processes the 5th "AB", and before the main thread gets the 3rd "AB".
To enforce flip-flop-like work in time, you can limit the size of the queues to 1 or 2. Then a full queue blocks thread A until thread B consumes an item, and the second queue blocks thread B until the main thread consumes an item; a sketch of such a single-slot queue follows.
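ThreadSafeQueue is not a standard type, so here is a minimal sketch of a capacity-1 blocking queue under that assumption (push blocks while the slot is full; pop blocks while it is empty, combining front()/pop() for brevity):
#include <condition_variable>
#include <mutex>
#include <optional>
#include <string>

class SingleSlotQueue {
    std::mutex m;
    std::condition_variable cv;
    std::optional<std::string> slot;
public:
    void push(std::string v) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !slot.has_value(); }); // wait for an empty slot
        slot = std::move(v);
        cv.notify_all(); // wake a consumer waiting in pop()
    }
    std::string pop() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return slot.has_value(); }); // wait for a value
        std::string v = std::move(*slot);
        slot.reset();
        cv.notify_all(); // wake a producer waiting in push()
        return v;
    }
};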
Yet another way of synchronizing multiple threads across different tasks is to use a cyclic barrier:
[C++20]
std::barrier sync_point(size /*2?*/, func_on_completion);

// in thread A
..stuff..flip..
sync_point.arrive_and_wait();
..more stuff that needs updated stuff..

// in thread B
..stuff..flop..
sync_point.arrive_and_wait();
..more stuff that needs updated stuff..
the barrier makes sure both threads wait for each other before continuing. If this is in a loop, they will process one step at a time (one step here meaning both A and B have produced output), waiting for each other before going to the next iteration. So it will produce something like ABBAABABBABAAB, while never letting one thread get more than a step ahead of the other. If A is always required before B, then you need more barriers to enforce the order:
// in thread A and B
if (thread is A)
    output "A"
sync_point.arrive_and_wait();
if (thread is B)
    output "B"
sync_point.arrive_and_wait();
this prints ABABABAB...
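For reference, a compilable C++20 version of that pseudocode could look like this (names are illustrative):
#include <barrier>
#include <iostream>
#include <thread>

int main() {
    std::barrier sync_point(2);
    auto worker = [&](bool is_a) {
        for (int i = 0; i < 5; ++i) {
            if (is_a) std::cout << "A";
            sync_point.arrive_and_wait(); // B waits until A has printed
            if (!is_a) std::cout << "B";
            sync_point.arrive_and_wait(); // A waits until B has printed
        }
    };
    std::jthread a(worker, true);
    std::jthread b(worker, false); // prints ABABABABAB
}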
If you are using OpenMP, it has barrier too:
#pragma omp parallel
{
    ...work...
    #pragma omp barrier
    ...more work...
}
if you don't want the second part to run at the same time as the first part of the next iteration, you need two barriers:
for(...)
#pragma omp parallel
{
    ...work...
    #pragma omp barrier
    ...more work...
    #pragma omp barrier
}
if the order of the two threads' work within each iteration is still important, each thread needs its own dedicated segment:
for(...)
#pragma omp parallel
{
    if (thread is A?)
        do this
    #pragma omp barrier
    if (thread is B?)
        do that
    #pragma omp barrier
}
this would always write ABABAB, although with decreased efficiency, because the start/stop overhead of an OpenMP parallel block is high and measurable in a loop. It would be better to have a loop inside each thread instead:
#pragma omp parallel num_threads(2)
{
    // this loop runs same for both threads, not shared/parallelized
    for (int i = 0; i < 10; i++)
    {
        int id = omp_get_thread_num();
        if (id == 0)
            std::cout << "A" << std::endl;
        #pragma omp barrier
        if (id == 1)
            std::cout << "B" << std::endl;
        #pragma omp barrier
    }
}
this outputs ABABABABAB... and has no OpenMP start/stop overhead (though the barrier overhead remains).
Based on this answer and the answer above, I managed to write the code. We need one flag to switch the turn between the two loops. There is also another way, with a ready-go approach, explained well here; it is in C# but the concepts are the same:
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable> // needed for std::condition_variable
using namespace std;

mutex mutex1;
condition_variable cv3;
char turn;

void ThreadA()
{
    for (int i = 0; i < 1000; i++)
    {
        unique_lock<mutex> lock(mutex1);
        cv3.wait(lock, [] { return (turn == 'A'); });
        cout << "Thread A" << endl;
        turn = 'B';
        cv3.notify_all();
    }
}

void ThreadB()
{
    for (int i = 0; i < 1000; i++)
    {
        unique_lock<mutex> lock(mutex1);
        cv3.wait(lock, [] { return (turn == 'B'); });
        cout << "Thread B" << endl;
        turn = 'A';
        cv3.notify_all();
    }
}

void ExecuteThreads()
{
    turn = 'A';
    std::thread t1(ThreadA);
    std::thread t2(ThreadB);
    t1.join();
    t2.join();
    std::cout << "Finished" << std::endl;
}

int main()
{
    ExecuteThreads();
    return 0;
}

Enforce concurrent modification of a variable (C++)

I'm trying to unit test an atomic library (I am aware that an atomic library is not really suitable for unit testing, but I still want to give it a try).
For this, I want to let X parallel threads increment a counter and evaluate the resulting value (it should be X).
The code is below. The problem is that it never breaks. The Counter always nicely ends up being 2000 (see below). What I also notice is that each cout is printed as a whole (instead of being intermingled, which I remember seeing with other multithreaded couts).
My question is: why doesn't this break? Or how can I make it break?
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <condition_variable>

std::mutex m;
std::condition_variable cv;
bool start = false;
int Counter = 0;

void Inc() {
    // Wait until test says start
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] {return start; });
    std::cout << "Incrementing in thread " << std::this_thread::get_id() << std::endl;
    Counter++;
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 2000; ++i) {
        threads.push_back(std::thread(Inc));
    }
    // signal the threads to start
    {
        std::lock_guard<std::mutex> lk(m);
        start = true;
    }
    cv.notify_all();
    for (auto& thread : threads) {
        thread.join();
    }
    // Now check whether value is right
    std::cout << "Counter: " << Counter << std::endl;
}
The result looks like this (but with 2000 lines):
Incrementing in thread 130960
Incrementing in thread 130948
Incrementing in thread 130944
Incrementing in thread 130932
Incrementing in thread 130928
Incrementing in thread 130916
Incrementing in thread 130912
Incrementing in thread 130900
Incrementing in thread 130896
Counter: 2000
Any help would be appreciated.
UPDATE: Reducing the number of threads to 4, but incrementing a million times in a for loop (as suggested by @tkausl), the couts of the thread ids appear to be sequential...
UPDATE2: It turns out that the lock had to be unlocked to prevent exclusive access per thread (lk.unlock()). An additional yield in the for loop increased the race-condition effect.
cv.wait(lk, [] {return start; }); only returns with lk acquired, so the increments are mutually exclusive. You might want to unlock lk right after:
void Inc() {
    // Wait until test says start
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] {return start; });
    lk.unlock();
    Counter++;
}
And you must remove std::cout, because it potentially introduces synchronization.
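Putting the two updates together, here is a sketch of a variant that does break (thread and iteration counts are illustrative; the globals are those from the question):
void Inc() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return start; });
    lk.unlock();                       // stop serializing the threads
    for (int i = 0; i < 100000; ++i) {
        Counter++;                     // unsynchronized read-modify-write
        std::this_thread::yield();     // widen the race window
    }
}
With 4 threads, Counter will usually end up well below the expected 400000.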

Stop infinite looping thread from main

I am relatively new to threads, and I'm still learning best techniques and the C++11 thread library. Right now I'm in the middle of implementing a worker thread which infinitely loops, performing some work. Ideally, the main thread would want to stop the loop from time to time to sync with the information that the worker thread is producing, and then start it again. My idea initially was this:
// Code run by worker thread
void thread() {
    while (run_) {
        // Do lots of work
    }
}

// Code run by main thread
void start() {
    if (run_) return;
    run_ = true;
    // Start thread
}

void stop() {
    if (!run_) return;
    run_ = false;
    // Join thread
}

// Somewhere else
volatile bool run_ = false;
I was not completely sure about this, so I started researching, and I discovered that volatile is actually not required for synchronization and is in fact generally harmful. Also, I discovered this answer, which describes a process nearly identical to the one I thought of. In the answer's comments however, this solution is described as broken, as volatile does not guarantee that different processor cores promptly (if ever) observe changes to volatile values.
My question is this then: Should I use an atomic flag, or something else entirely? What exactly is the property that is lacking in volatile and that is then provided by whatever construct is needed to solve my problem effectively?
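For reference, a minimal sketch (not from the original post) of what the atomic-flag version of the same start/stop idea looks like; std::atomic<bool> provides the inter-thread visibility and ordering guarantees that volatile lacks:
#include <atomic>
#include <thread>

std::atomic<bool> run_{false};
std::thread worker_;

void thread_fn() {
    while (run_.load(std::memory_order_acquire)) {
        // Do lots of work
    }
}

void start() {
    if (run_.exchange(true)) return;   // already running
    worker_ = std::thread(thread_fn);
}

void stop() {
    if (!run_.exchange(false)) return; // not running
    worker_.join();
}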
Have you looked at mutexes? They're made to lock threads out of shared data to avoid conflicts. Is that what you're looking for?
I think you want barrier synchronization using std::mutex?
Also take a look at Boost.Thread, a relatively high-level threading library.
Take a look at this code sample from the link:
#include <iostream>
#include <map>
#include <string>
#include <chrono>
#include <thread>
#include <mutex>

std::map<std::string, std::string> g_pages;
std::mutex g_pages_mutex;

void save_page(const std::string &url)
{
    // simulate a long page fetch
    std::this_thread::sleep_for(std::chrono::seconds(2));
    std::string result = "fake content";
    g_pages_mutex.lock();
    g_pages[url] = result;
    g_pages_mutex.unlock();
}

int main()
{
    std::thread t1(save_page, "http://foo");
    std::thread t2(save_page, "http://bar");
    t1.join();
    t2.join();
    g_pages_mutex.lock(); // not necessary as the threads are joined, but good style
    for (const auto &pair : g_pages) {
        std::cout << pair.first << " => " << pair.second << '\n';
    }
    g_pages_mutex.unlock();
}
I would suggest using std::mutex and std::condition_variable to solve the problem. Here's an example of how it can work with C++11:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

int main()
{
    mutex m;
    condition_variable cv;
    // Tells the worker whether it should stop its work
    bool done = false;
    // Zero means it can be filled by the worker thread.
    // Non-zero means it can be consumed by the main thread.
    int result = 0;

    // run worker thread
    auto t = thread{ [&]{
        auto bound = 1000;
        for (;;) // ever
        {
            auto sum = 0;
            for ( auto i = 0; i != bound; ++i )
                sum += i;
            ++bound;
            auto lock = unique_lock<mutex>( m );
            // wait until we can safely write the result
            cv.wait( lock, [&]{ return result == 0; });
            // write the result
            result = sum;
            // wake up the consuming thread
            cv.notify_one();
            // exit the loop if the flag is set. This must be
            // done with mutex protection, hence it is not
            // in the for-condition expression.
            if ( done )
                break;
        }
    } };

    // the main thread's loop
    for ( auto i = 0; i != 20; ++i )
    {
        auto r = 0;
        {
            // lock the mutex
            auto lock = unique_lock<mutex>( m );
            // wait until we can safely read the result
            cv.wait( lock, [&]{ return result != 0; } );
            // read the result
            r = result;
            // set result to zero so the worker can
            // continue to produce new results.
            result = 0;
            // wake up the producer
            cv.notify_one();
            // the lock is released here (the end of the scope)
        }
        // do time consuming io on the side.
        cout << r << endl;
    }

    // tell the worker to stop
    {
        auto lock = unique_lock<mutex>( m );
        result = 0;
        done = true;
        // wake the worker in case it is already waiting for
        // result == 0; otherwise it could sleep forever
        cv.notify_one();
        // again the lock is released here
    }

    // wait for the worker to finish.
    t.join();
    cout << "Finished." << endl;
}
You could do the same with std::atomic by essentially implementing a spin lock. Spin locks can be slower than mutexes, so I repeat the advice from the Boost website:
Do not use spinlocks unless you are certain that you understand the consequences.
I believe that mutexes and condition variables are the way to go in your case.

Can single condition variable be used for bidirectional synchronization?

Is it possible to use a single condition variable for bidirectional synchronization (i.e., two different conditions are waited for at different times on the same condition variable)? I'm sure that no more than one thread will wait on the condition variable at any time. The example code below illustrates what I'm thinking about:
#include <condition_variable>
#include <thread>
#include <mutex>
#include <iostream>

std::condition_variable condvar;
std::mutex mutex;
int i;

void even()
{
    while (i < 10000) {
        std::unique_lock<std::mutex> lock(mutex);
        if (i % 2 != 0) {
            condvar.notify_one();
            condvar.wait(lock, [&](){ return i % 2 == 0; });
        }
        i++;
        std::cout << i << std::endl;
    }
    condvar.notify_one();
}

void odd()
{
    while (i < 10001) {
        std::unique_lock<std::mutex> lock(mutex);
        if (i % 2 != 1) {
            condvar.notify_one();
            condvar.wait(lock, [&](){ return i % 2 == 1; });
        }
        i++;
        std::cout << i << std::endl;
    }
}

int main()
{
    i = 0;
    std::thread a(even);
    std::thread b(odd);
    a.join();
    b.join();
}
Yes, it's perfectly safe. However, I wouldn't get into the habit of calling notify_one when you actually want to notify all threads waiting for the condition, even if you "know" only one thread will be waiting.
Fundamentally, notifying a condition variable really only declares "the condition you are looking for may have occurred."
The only concern one could have with bidirectional communication with one condition variable is that a thread may be woken up by a notify when there is no data available for it. Proper use of condition variables is done in a while loop, so the worst case is that the thread sees no data is available, and goes back to sleep. This is totally safe, so bidirectional communication with one condition variable is possible.
That being said, there is little advantage to waking up threads unnecessarily, so it is usually preferable to have one mutex protecting the data (i.e. you must hold the mutex to access the data) and two different condition variables indicating different conditions. This minimizes how many times a thread wakes up only to find it has no data to work on (a "spurious" wakeup).
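Applied to the even/odd example above, a two-condition-variable sketch (my adaptation, not part of the original answer) could look like this; each notification now only ever wakes a thread that can actually proceed:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable cv_even, cv_odd; // one condition variable per condition
std::mutex mtx;
int i = 0;

void even()
{
    std::unique_lock<std::mutex> lock(mtx);
    while (i < 10000) {
        cv_even.wait(lock, [] { return i % 2 == 0; });
        std::cout << ++i << std::endl; // i becomes odd
        cv_odd.notify_one();
    }
}

void odd()
{
    std::unique_lock<std::mutex> lock(mtx);
    while (i < 10001) {
        cv_odd.wait(lock, [] { return i % 2 == 1; });
        std::cout << ++i << std::endl; // i becomes even
        cv_even.notify_one();
    }
}

int main()
{
    std::thread a(even);
    std::thread b(odd);
    a.join();
    b.join();
}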

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads that resynchronize without joining.
I have code that looks like this:
#include <algorithm>
#include <thread>
#include <vector>

namespace {
    std::vector<std::thread> workers;
    int total = 4;
    int arr[5] = {0}; // arr[4] receives the result of min_element

    void each_thread_does(int i) {
        arr[i] += 2;
    }
}

int main(int argc, char *argv[]) {
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            workers.push_back(std::thread(each_thread_does, j));
        }
        for (std::thread &t : workers) {
            if (t.joinable()) {
                t.join();
            }
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
    return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    void Start();
    void QueueJob(const std::function<void()>& job);
    void Stop();
    bool busy();

private:
    void ThreadLoop();

    bool should_terminate = false;           // Tells threads to stop looking for jobs
    std::mutex queue_mutex;                  // Prevents data races to the job queue
    std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
    std::vector<std::thread> threads;
    std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
    const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
    threads.reserve(num_threads);
    for (uint32_t i = 0; i < num_threads; i++) {
        // ThreadLoop is a member function, so it needs an object to run on
        threads.emplace_back(&ThreadPool::ThreadLoop, this);
    }
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
    while (true) {
        std::function<void()> job;
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            mutex_condition.wait(lock, [this] {
                return !jobs.empty() || should_terminate;
            });
            if (should_terminate) {
                return;
            }
            job = jobs.front();
            jobs.pop();
        }
        job();
    }
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        jobs.push(job);
    }
    mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
    bool poolbusy;
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        poolbusy = !jobs.empty(); // busy while jobs are still queued
    }
    return poolbusy;
}
The busy() function can be used in a while loop, such that the main thread can wait for the threadpool to complete all the tasks before calling the threadpool destructor, for example:
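A sketch of that waiting loop (assuming a ThreadPool instance named pool); note it only waits for the queue to drain, not for already-running jobs to finish:
while (pool.busy()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10)); // poll gently
}
pool.Stop();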
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        should_terminate = true;
    }
    mutex_condition.notify_all();
    for (std::thread& active_thread : threads) {
        active_thread.join();
    }
    threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for a job to do.
I apologize if there are some syntax errors; I typed this code from memory. Sorry that I cannot provide you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
You can use the C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
#include <algorithm>
#include <future>
#include <vector>

int main(int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);
    int arr[5] = {0};
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int) { arr[j] += 2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get();
        }
        arr[4] = *std::min_element(arr, arr + 4);
    }
}
You will get the desired number of threads, and they will not be created and deleted over and over again across the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
A threadpool is, at its core, a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or for their own termination.
The threadpool's job is to provide an interface to submit jobs, define (and perhaps modify) the policy for running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what the result is (see the notions of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, and how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated, as you can see, and impose restrictions on how the threads will work as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes and condition variables, and a few abstractions on top of that (locks, barriers). But in some cases these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogeneous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages, to find inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>

class Function_pool
{
private:
    std::queue<std::function<void()>> m_function_queue;
    std::mutex m_lock;
    std::condition_variable m_data_condition;
    std::atomic<bool> m_accept_functions;

public:
    Function_pool();
    ~Function_pool();
    void push(std::function<void()> func);
    void done();
    void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be release automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "stating operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}
You can use thread_pool from the Boost library:
void my_task() { ... }

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    boost::asio::thread_pool pool(threadNumbers);

    // Submit a function to the pool.
    boost::asio::post(pool, my_task);

    // Submit a lambda object to the pool.
    boost::asio::post(pool, []() {
        ...
    });

    pool.join(); // wait for all submitted tasks to complete
}
You can also use the threadpool from the open-source community:
void first_task() { ... }
void second_task() { ... }

int main() {
    int threadNumbers = std::thread::hardware_concurrency();
    pool tp(threadNumbers);

    // Add some tasks to the pool.
    tp.schedule(&first_task);
    tp.schedule(&second_task);
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>

struct thread_pool {
    typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;

    thread_pool(int threads) : service(), service_worker(new asio_worker::element_type(service)) {
        for (int i = 0; i < threads; ++i) {
            auto worker = [this] { return service.run(); };
            grp.add_thread(new boost::thread(worker));
        }
    }

    template<class F>
    void enqueue(F f) {
        service.post(f);
    }

    ~thread_pool() {
        service_worker.reset();
        grp.join_all();
        service.stop();
    }

private:
    boost::asio::io_service service;
    asio_worker service_worker;
    boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);

pool.enqueue([] {
    std::cout << "Hello from Task 1\n";
});

pool.enqueue([] {
    std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
Boost::asio::io_service is a very efficient implementation; actually, it is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n) {
    // Determine if n is prime.
}

int main() {
    thread_pool pool(8); // 8 threads

    list<future<bool>> results;
    for (int n = 2; n < 10000; n++) {
        // Submit a job to the pool.
        results.emplace_back(pool.async(is_prime, n));
    }

    int n = 2;
    for (auto i = results.begin(); i != results.end(); i++, n++) {
        // i is an iterator pointing to a future representing the result of is_prime(n)
        cout << n << " ";
        bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
        if (prime)
            cout << "is prime";
        else
            cout << "is not prime";
        cout << endl;
    }
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
Looks like a threadpool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it's owned by me and publicly available here: https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
The whole implementation is in two .h files.
So, the code for the original question would look something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that pending tasks' future.get() calls hang on the caller side if the thread pool gets terminated and leaves some tasks inside the task queue. How can a future exception be set inside the thread pool when tasks are only wrapped in std::function?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
    using return_type = std::result_of_t<F(Args...)>;
    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<return_type> res = task->get_future();
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _tasks.push([task]() -> void { (*task)(); });
    }
    return res;
}

class StdThreadPool {
    std::vector<std::thread> _workers;
    std::priority_queue<TASK> _tasks;
    ...
};

struct TASK {
    //int _func_return_value;
    std::function<void()> _func;
    int priority;
    ...
};
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask([]() { doIt(); });
Stroika's thread library also supports cancellation (cooperative), so that when the ThreadPool above goes out of scope, it cancels any running tasks (similar to C++20's jthread).