Extend the life of threads with synchronization (C++11) - c++

I have a program with a function that takes a pointer as its argument, and a main. The main creates n threads, each of them running the function on a different memory area depending on the passed argument. The threads are then joined, the main performs some data mixing between the areas and creates n new threads which do the same operation as the old ones.
To improve the program I would like to keep the threads alive, removing the long time necessary to create them. Threads should sleep while the main is working and be notified when they have to wake up again. In the same way, the main should wait while the threads are working, as it did with join.
I cannot come up with a robust implementation of this; I always end up in a deadlock.
Here is some simple baseline code; any hints about how to modify it would be much appreciated.
#include <thread>
#include <climits>
...
void myfunc(void * p) {
do_something(p);
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
std::thread mythread[n_threads];
for (unsigned long int j=0; j < ULONG_MAX; j++) {
for (unsigned int i=0; i < n_threads; i++) {
mythread[i] = std::thread(myfunc, myp[i]);
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
mix_data(myp);
}
return 0;
}

Here is a possible approach using only classes from the C++11 Standard Library. Basically, each thread you create has an associated queue of commands (encapsulated in std::packaged_task<> objects) which it continuously checks. If the queue is empty, the thread just waits on a condition variable (std::condition_variable).
While data races are avoided through the use of std::mutex and std::unique_lock<> RAII wrappers, the main thread can wait for a particular job to finish by storing the std::future<> object associated with each submitted std::packaged_task<> and calling wait() on it.
Below is a simple program that follows this design. Comments should be sufficient to explain what it does:
#include <thread>
#include <iostream>
#include <sstream>
#include <future>
#include <queue>
#include <vector>
#include <condition_variable>
#include <mutex>
// Convenience type definition
using job = std::packaged_task<void()>;
// Some data associated to each thread.
struct thread_data
{
int id; // Could use thread::id, but this is filled before the thread is started
std::thread t; // The thread object
std::queue<job> jobs; // The job queue
std::condition_variable cv; // The condition variable to wait for threads
std::mutex m; // Mutex used for avoiding data races
bool stop = false; // When set, this flag tells the thread that it should exit
};
// The thread function executed by each thread
void thread_func(thread_data* pData)
{
std::unique_lock<std::mutex> l(pData->m, std::defer_lock);
while (true)
{
l.lock();
// Wait until the queue is non-empty or stop is signaled
pData->cv.wait(l, [pData] () {
return (pData->stop || !pData->jobs.empty());
});
// Stop was signaled, let's exit the thread
if (pData->stop) { return; }
// Pop one task from the queue...
job j = std::move(pData->jobs.front());
pData->jobs.pop();
l.unlock();
// Execute the task!
j();
}
}
// Function that creates a simple task
job create_task(int id, int jobNumber)
{
job j([id, jobNumber] ()
{
std::stringstream s;
s << "Hello " << id << "." << jobNumber << std::endl;
std::cout << s.str();
});
return j;
}
int main()
{
const int numThreads = 4;
const int numJobsPerThread = 10;
std::vector<std::future<void>> futures;
// Create all the threads (will be waiting for jobs)
thread_data threads[numThreads];
int tdi = 0;
for (auto& td : threads)
{
td.id = tdi++;
td.t = std::thread(thread_func, &td);
}
//=================================================
// Start assigning jobs to each thread...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
//=================================================
// Here the main thread does something...
std::cin.get();
// ...done!
//=================================================
//=================================================
// Posts some new tasks...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
// Send stop signal to all threads and join them...
for (auto& td : threads)
{
std::unique_lock<std::mutex> l(td.m);
td.stop = true;
td.cv.notify_one();
}
// Join all the threads
for (auto& td : threads) { td.t.join(); }
}

The concept you want is the threadpool. This SO question deals with existing implementations.
The idea is to have a container for a number of thread instances. Each instance is associated with a function which polls a task queue and, when a task is available, pulls it and runs it. Once the task is over (if it terminates, but that's another problem), the thread simply loops back to the task queue.
So you need a synchronized queue, a thread class which implements the loop on the queue, an interface for the task objects, and maybe a class to drive the whole thing (the pool class).
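As a rough illustration of the first two pieces listed above (the synchronized queue and the loop each worker runs on it), here is a minimal sketch using only the C++11 standard library. The names TaskQueue, close() and run_pool_demo are made up for this example and are not taken from any existing pool implementation; std::function<void()> stands in for the "interface for the task objects".

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal synchronized task queue (illustration only).
class TaskQueue {
public:
    using Task = std::function<void()>;

    // Add a task and wake one waiting worker.
    void push(Task t) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(t));
        }
        cv_.notify_one();
    }

    // Block until a task is available or the queue has been closed.
    // Returns false only when the queue is closed and fully drained.
    bool pop(Task& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

    // Tell the workers that no more tasks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Task> tasks_;
    bool closed_ = false;
};

// Hypothetical driver: starts n_threads workers, feeds them n_tasks
// trivial tasks, and returns how many tasks were actually executed.
int run_pool_demo(int n_threads, int n_tasks) {
    TaskQueue queue;
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([&queue] {
            TaskQueue::Task task;
            while (queue.pop(task)) task();  // the worker loop
        });
    }
    for (int i = 0; i < n_tasks; ++i) {
        queue.push([&counter] { ++counter; });
    }
    queue.close();                    // workers drain the queue, then exit
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Calling run_pool_demo(4, 100) from main should return 100, since close() only makes pop() return false once the queue is empty. A real pool would keep the queue open between iterations and add a way to wait for a batch to finish (for instance with futures).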
Alternatively, you could make a very specialized thread class for the task it has to perform (with only the memory area as a parameter for instance). This requires a notification mechanism for the threads to indicate that they are done with the current iteration.
The thread main function would be a loop on that specific task: at the end of one iteration, the thread signals that it is done and waits on a condition variable to start the next iteration. In essence, you would be inlining the task code within the thread, dropping the need for a queue altogether.
#include <thread>
#include <mutex>
#include <condition_variable>
using namespace std;
// semaphore class based on C++11 features
class semaphore {
private:
mutex mMutex;
condition_variable mCond;
int mV;
public:
semaphore(int v): mV(v){}
void signal(int count=1){
unique_lock<mutex> lock(mMutex);
mV+=count;
if (mV > 0) mCond.notify_all();
}
void wait(int count = 1){
unique_lock<mutex> lock(mMutex);
mV-= count;
while (mV < 0)
mCond.wait(lock);
}
};
template <typename Task>
class TaskThread {
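Task *mTask;
semaphore *mSemStart, *mSemFinished;
volatile bool mRunning; // std::atomic<bool> would be the safer choice for a flag shared between threads
thread mThread; // declared last so that all other members are initialized before the thread starts
public:
TaskThread(Task *task, semaphore *start, semaphore *finish):
mTask(task), mSemStart(start), mSemFinished(finish),
mRunning(true),
mThread(&TaskThread<Task>::psrun, this){}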
~TaskThread(){ mThread.join(); }
void run(){
do {
(*mTask)();
mSemFinished->signal();
mSemStart->wait();
} while (mRunning);
}
void finish() { // end the thread after the current loop
mRunning = false;
}
private:
static void psrun(TaskThread<Task> *self){ self->run();}
};
class MyTask {
public:
MyTask(){}
void operator()(){
// some code here
}
};
int main(){
MyTask task1;
MyTask task2;
semaphore start(2), finished(0);
TaskThread<MyTask> t1(&task1, &start, &finished);
TaskThread<MyTask> t2(&task2, &start, &finished);
for (int i = 0; i < 10; i++){
finished.wait(2);
start.signal(2);
}
t1.finish();
t2.finish();
}
The proposed (crude) implementation above relies on the Task type, which must provide operator() (i.e. a functor-like class). I said earlier that you could incorporate the task code directly in the thread function body, but since I don't know what it is, I kept it as abstract as I could. There's one condition variable for the start of the threads, and one for their end, both encapsulated in semaphore instances.
Seeing the other answer proposing the use of boost::barrier, I can only support this idea: make sure to replace my semaphore class with that class if possible, the reason being that it is better to rely on well tested and maintained external code rather than a self implemented solution for the same feature set.
All in all, both approaches are valid, but the former gives up a tiny bit of performance in favor of flexibility. If the task to be performed takes a sufficiently long time, the management and queue synchronization cost becomes negligible.
Update: code fixed and tested. Replaced a simple condition variable by a semaphore.

It can easily be achieved using a barrier (just a convenience wrapper over a condition variable and a counter). It basically blocks until all N threads have reached the "barrier", and then resets itself so it can be reused. Boost provides an implementation.
void myfunc(void * p, boost::barrier& start_barrier, boost::barrier& end_barrier) {
while (!stop_condition) // You'll need to tell them to stop somehow
{
start_barrier.wait ();
do_something(p);
end_barrier.wait ();
}
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
boost::barrier start_barrier (n_threads + 1); // child threads + main thread
boost::barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
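// std::ref (from <functional>) is needed: boost::barrier is not copyable and std::thread copies its arguments
mythread[i] = std::thread(myfunc, myp[i], std::ref(start_barrier), std::ref(end_barrier));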
}
start_barrier.wait (); // first unblock the threads
for (unsigned long int j=0; j < ULONG_MAX; j++) {
end_barrier.wait (); // mix_data must not execute before the threads are done
mix_data(myp);
start_barrier.wait (); // threads must not start new iteration before mix_data is done
}
return 0;
}

The following is a simple piece of code that compiles and works, performing some arbitrary computation. It implements aleguna's barrier concept. The task length differs for each thread, so a robust synchronization mechanism really is necessary. I will try to do a pool on the same tasks and benchmark the result, and then maybe with futures as pointed out by Andy Prowl.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>
#include <complex>
#include <random>
const unsigned int n_threads=4; //varying this will not (almost) change the total amount of work
const unsigned int task_length=30000/n_threads;
const float task_length_variation=task_length/n_threads;
unsigned int rep=1000; //repetitions of tasks
class t_chronometer{
private:
std::chrono::steady_clock::time_point _t;
public:
t_chronometer(): _t(std::chrono::steady_clock::now()) {;}
void reset() {_t = std::chrono::steady_clock::now();}
double get_now() {return std::chrono::duration_cast<std::chrono::duration<double>>(std::chrono::steady_clock::now() - _t).count();}
double get_now_ms() {return
std::chrono::duration_cast<std::chrono::duration<double,std::milli>>(std::chrono::steady_clock::now() - _t).count();}
};
class t_barrier {
private:
std::mutex m_mutex;
std::condition_variable m_cond;
unsigned int m_threshold;
unsigned int m_count;
unsigned int m_generation;
public:
t_barrier(unsigned int count):
m_threshold(count),
m_count(count),
m_generation(0) {
}
bool wait() {
std::unique_lock<std::mutex> lock(m_mutex);
unsigned int gen = m_generation;
if (--m_count == 0)
{
m_generation++;
m_count = m_threshold;
m_cond.notify_all();
return true;
}
while (gen == m_generation)
m_cond.wait(lock);
return false;
}
};
using namespace std;
void do_something(complex<double> * c, unsigned int max) {
complex<double> a(1.,0.);
complex<double> b(1.,0.);
for (unsigned int i = 0; i<max; i++) {
a *= polar(1.,2.*M_PI*i/max);
b *= polar(1.,4.*M_PI*i/max);
*(c)+=a+b;
}
}
bool done=false;
void task(complex<double> * c, unsigned int max, t_barrier* start_barrier, t_barrier* end_barrier) {
while (!done) {
start_barrier->wait ();
do_something(c,max);
end_barrier->wait ();
}
cout << "task finished" << endl;
}
int main() {
t_chronometer t;
std::default_random_engine gen;
std::normal_distribution<double> dis(.0,1000.0);
complex<double> cpx[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
cpx[i] = complex<double>(dis(gen), dis(gen));
}
t_barrier start_barrier (n_threads + 1); // child threads + main thread
t_barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
unsigned long int sum=0;
for (unsigned int i=0; i < n_threads; i++) {
unsigned int max = task_length + i * task_length_variation;
cout << i+1 << "th task length: " << max << endl;
mythread[i] = std::thread(task, &cpx[i], max, &start_barrier, &end_barrier);
sum+=max;
}
cout << "total task length " << sum << endl;
complex<double> c(0,0);
for (unsigned long int j=1; j < rep+1; j++) {
start_barrier.wait (); //give to the threads the missing call to start
if (j==rep) done=true;
end_barrier.wait (); //wait for the call from each thread
if (j%100==0) cout << "cycle: " << j << endl;
for (unsigned int i=0; i<n_threads; i++) {
c+=cpx[i];
}
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
cout << "result: " << c << " it took: " << t.get_now() << " s." << endl;
return 0;
}

Related

Let main thread wait async threads complete

I'm new to C++ and don't know how to make the main thread wait until all async threads are done. I referred to this, but it makes void consume() run serially instead of in parallel.
#include <iostream>
#include <vector>
#include <unistd.h> // sleep
#include <future>
using namespace std;
class Myclass {
private:
std::vector<int> resources;
std::vector<int> res;
std::mutex resMutex;
std::vector<std::future<void>> m_futures;
public:
Myclass() {
for (int i = 0; i < 10; i++) resources.push_back(i); // add task
res.reserve(resources.size());
}
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
// m_futures.back().wait();
}
}
void work(int x) {
sleep(1); // Simulation time-consuming
std::lock_guard<std::mutex> lock(resMutex);
res.push_back(x);
printf("%d be added.---done by %d.\n", x, std::this_thread::get_id());
}
std::vector<int> &getRes() { return res;}
};
int main() {
Myclass obj;
obj.consume();
auto res = obj.getRes();
cout << "Done. res.size = " << res.size() << endl;
for (int i : res) cout << i << " ";
cout <<"main thread over\n";
}
The main thread finishes while res is still empty. I want obj.getRes() to be executed only after all results have been added to res.
Done. res.size = 0
main thread over
4 be added.---done by 6.
9 be added.---done by 11...
You had the right idea with the commented out line: m_futures.back().wait();, you just have it in the wrong place.
As you note, launching a std::async and then waiting for its result right after, forces the entire thing to execute in series and makes the async pointless.
Instead you want two functions: One, like your consume() that launches all the async's, and then another that loops over the futures and calls wait (or get, whatever suits your needs) on them - and then call that from main.
This lets them all run in parallel, while still making main wait for the final result.
In addition to @Frodyne's answer:
The calls launched by consume() run in parallel, and the main thread waits until all of them have done their work:
void set_wait(void)
{
for (int i = 0; i < resources.size(); i++) {
m_futures[i].wait();
}
}
And call it here
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
// Calling wait() here makes no sense
}
set_wait(); // Waits for all threads do work
}
I created a new function for convenience.
You can use std::future::wait after you add the tasks to m_futures. Example:
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
//m_futures.back().wait();
}
for(auto& f: m_futures) f.wait();
}

How to iterate through boost thread specific pointers

I have a multi-thread application. Each thread initializes a struct data type in its own local storage. Some elements are being added to the vectors inside the struct type variables. At the end of the program, I would like to iterate through these thread local storages and add all the results together. How can I iterate through the thread specific pointer so that I can add all the results from the multi threads together ?
Thanks in advance.
boost::thread_specific_ptr<testStruct> tss;
size_t x = 10;
void callable(string str, int x) {
if(!tss.get()){
tss.reset(new testStruct);
(*tss).xInt.resize(x, 0);
}
// Assign some values to the vector elements after doing some calculations
}
Example:
#include <iostream>
#include <vector>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#define NR_THREAD 4
#define SAMPLE_SIZE 500
using namespace std;
static bool busy = false;
struct testStruct{
vector<int> intVector;
};
boost::asio::io_service ioService;
boost::thread_specific_ptr<testStruct> tsp;
boost::condition_variable cond;
boost::mutex mut;
void callable(int x) {
if(!tsp.get()){
tsp.reset(new testStruct);
}
(*tsp).intVector.push_back(x);
if (x + 1 == SAMPLE_SIZE){
busy = true;
cond.notify_all();
}
}
int main() {
boost::thread_group threads;
size_t (boost::asio::io_service::*run)() = &boost::asio::io_service::run;
boost::asio::io_service::work work(ioService);
for (short int i = 0; i < NR_THREAD; ++i) {
threads.create_thread(boost::bind(run, &ioService));
}
size_t iterations = 10;
for (int i = 0; i < iterations; i++) {
busy = false;
for (short int j = 0; j < SAMPLE_SIZE; ++j) {
ioService.post(boost::bind(callable, j));
}
// all threads need to finish the job for the next iteration
boost::unique_lock<boost::mutex> lock(mut);
while (!busy) {
cond.wait(lock);
}
cout << "Iteration: " << i << endl;
}
vector<int> sum(SAMPLE_SIZE, 0); // sum up all the values from thread local storages
work.~work();
threads.join_all();
return 0;
}
So, after having given some thought to this issue, I came up with this solution:
void accumulateTLS(size_t idxThread){
if (idxThread == nr_threads) // Suspend all the threads till all of them are called and waiting here
{
busy = true;
}
boost::unique_lock<boost::mutex> lock(mut);
while (!busy)
{
cond.wait(lock);
}
// Accumulate the variables using thread specific pointer
cond.notify_one();
}
With boost io_service, the callable function can be changed after the threads are initialized. So, after I have done all the calculations, I am sending jobs(as many as the number of threads) to the io service again with callable function accumulateTLS(idxThread). The N jobs are sent to N threads and the accumulation process is done inside accumulateTLS method.
P.S. instead of work.~work(), work.reset() should be used.

Making threads redo a print function in order

This is a home assignment.
Have to print a string(given as input) in small chunks(Size given as input) by multiple threads one at a time in order 1,2,3,1,2,3,1,2(number of threads is given as input).
A thread does this printing function on creation and I want it to redo it after all the other threads. I face two problems:
1. Threads don't print in fixed order(mine gave 1,3,2,4 see output)
2. Threads need to re print till the entire string is exhausted.
This is what I tried...
#include<iostream>
#include<mutex>
#include<thread>
#include<string>
#include<vector>
#include<condition_variable>
#include<chrono>
using namespace std;
class circularPrint{
public:
int pos;
string message;
int nCharsPerPrint;
mutex mu;
condition_variable cv;
circularPrint(){
pos=0;
}
void shared_print(int threadID){
unique_lock<mutex> locker(mu);
if(pos+nCharsPerPrint<message.size())
cout<<"Thread"<<threadID<<" : "<<message.substr(pos,nCharsPerPrint)<<endl;
else if(pos<message.size())
cout<<"Thread"<<threadID<<" : "<<message.substr(pos)<<endl;
pos+=nCharsPerPrint;
}
};
void f(circularPrint &obj,int threadID){
obj.shared_print(threadID);
}
int main(){
circularPrint obj;
cout<<"\nMessage : ";
cin>>obj.message;
cout<<"\nChars : ";
cin>>obj.nCharsPerPrint;
int nthreads;
cout<<"\nThreads : ";
cin>>nthreads;
vector<thread> threads;
for(int count=1;count<=nthreads;++count)
{
threads.push_back(thread(f,ref(obj),count));
}
for(int count=0;count<nthreads;++count)
{
if(threads[count].joinable())
threads[count].join();
}
return 0;
}
Why would you want to multithread a method that can only be executed once at a time?
Anyway, something like this below? Be aware that the take and print use different locks and that there is a chance the output does not show in the expected order (hence, the why question above).
#include <iostream>
#include <mutex>
#include <thread>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
class circularPrint
{
public:
int pos;
string message;
int nCharsPerPrint;
mutex takeLock;
mutex printLock;
circularPrint() {
pos = 0;
}
string take(int count) {
lock_guard<mutex> locker(takeLock);
count = std::min(count, (int)message.size() - pos);
string substring = message.substr(pos, count);
pos += count;
return substring;
}
void print(int threadID, string& message) {
lock_guard<mutex> locker(printLock);
cout << "Thread" << threadID << " : " << message << endl;
}
void loop(int threadID) {
string message;
while((message = take(nCharsPerPrint)).size() > 0) {
print(threadID, message);
}
}
};
void f(circularPrint &obj, int threadID)
{
obj.loop(threadID);
}
int main()
{
circularPrint obj;
//cout << "\nMessage : ";
//cin >> obj.message;
//cout << "\nChars : ";
//cin >> obj.nCharsPerPrint;
int nthreads;
//cout << "\nThreads : ";
//cin >> nthreads;
nthreads = 4;
obj.message = "123456789012345";
obj.nCharsPerPrint = 2;
vector<thread> threads;
for (int count = 1; count <= nthreads; ++count)
threads.push_back(thread(f, ref(obj), count));
for (int count = 0; count < nthreads; ++count) {
if (threads[count].joinable())
threads[count].join();
}
return 0;
}
Currently each thread exits after printing one message - but you need more messages than threads, so each thread will need to do more than one message.
How about putting an infinite loop around your current locked section, and breaking out when there are no characters left to print?
(You may then find that the first thread does all the work; you can hack that by putting a zero-length sleep outside the locked section, or by making all the threads wait for some single signal to start, or just live with it.)
EDIT: Hadn't properly realised that you wanted to assign work to specific threads (which is normally a really bad idea). But if each thread knows its ID, and how many there are, it can figure out which characters it is supposed to print. Then all it has to do is wait till all the preceding characters have been printed (which it can tell using pos), do its work, then repeat until it has no work left to do and exit.
The only tricky bit is waiting for the preceding work to finish. You can do that with a busy wait (bad), a busy wait with a sleep in it (also bad), or a condition variable (better).
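The condition-variable variant described above could look roughly like the sketch below. It is only an illustration under a few assumptions: print_in_turns and the explicit turn counter are made up for this example (the answer suggests deriving the turn from pos, which would work equally well), and the lines are collected in a vector instead of being written to cout so that the order is easy to inspect.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: n_threads workers emit the message in chunks,
// strictly in the order 1, 2, ..., n, 1, 2, ... until it is exhausted.
// Returns the lines in the order in which they were produced.
std::vector<std::string> print_in_turns(const std::string& message,
                                        std::size_t chunk, int n_threads) {
    std::vector<std::string> out;
    if (chunk == 0 || n_threads <= 0) return out;  // nothing sensible to do

    std::mutex mu;
    std::condition_variable cv;
    std::size_t pos = 0;  // index of the next unprinted character
    int turn = 0;         // zero-based id of the thread allowed to print next

    auto worker = [&](int id) {
        std::unique_lock<std::mutex> lock(mu);
        while (true) {
            // Sleep until it is this thread's turn or all the work is done.
            cv.wait(lock, [&] { return turn == id || pos >= message.size(); });
            if (pos >= message.size()) break;
            out.push_back("Thread" + std::to_string(id + 1) + " : " +
                          message.substr(pos, chunk));
            pos += chunk;
            turn = (turn + 1) % n_threads;
            cv.notify_all();  // wake the others so the next one can proceed
        }
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < n_threads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();
    return out;
}
```

With the message "abcdefg", a chunk size of 2 and 3 threads, the result is "Thread1 : ab", "Thread2 : cd", "Thread3 : ef", "Thread1 : g". The predicate passed to wait() is what protects against spurious and lost wakeups; notify_all is used because only one specific waiter can make progress and notify_one might wake the wrong one.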
You need inter thread synchronization, each thread doing a loop "print, send a message to next one, wait for a message (from the last thread)".
You can use semaphores, events, messages or something similar.
Something as:
#include <string>
#include <iostream>
#include <condition_variable>
#include <thread>
#include <unistd.h>
using namespace std;
// Parameters passed to a thread.
struct ThreadParameters {
string message; // to print.
volatile bool *exit; // set when the thread should exit.
condition_variable* input; // condition to wait before printing.
condition_variable* output; // condition to set after printing.
};
class CircularPrint {
public:
CircularPrint(int nb_threads) {
nb_threads_ = nb_threads;
condition_variables_ = new condition_variable[nb_threads];
thread_parameters_ = new ThreadParameters[nb_threads];
threads_ = new thread*[nb_threads];
exit_ = false;
for (int i = 0; i < nb_threads; ++i) {
thread_parameters_[i].message = to_string(i + 1);
thread_parameters_[i].exit = &exit_;
// Wait 'your' condition
thread_parameters_[i].input = &condition_variables_[i];
// Then set next one (of first one if you are the last).
thread_parameters_[i].output =
&condition_variables_[(i + 1) % nb_threads];
threads_[i] = new thread(Thread, &thread_parameters_[i]);
}
// Start the dance, free the first thread.
condition_variables_[0].notify_all();
}
~CircularPrint() {
// Ask threads to exit.
exit_ = true;
// Wait for all threads to end.
for (int i = 0; i < nb_threads_; ++i) {
threads_[i]->join();
delete threads_[i];
}
delete[] condition_variables_;
delete[] thread_parameters_;
delete[] threads_;
}
static void Thread(ThreadParameters* params) {
for (;;) {
if (*params->exit) {
return;
}
{
// Take the mutex. We don't really care about it here, but condition variables
// need a mutex.
// The mutex will be useful for the real assignment, though.
unique_lock<mutex> lock(mutex_);
// Wait for the input condition variable (frees the mutex before waiting).
params->input->wait(lock);
}
cout << params->message << endl;
// Free next thread.
params->output->notify_all();
}
}
private:
int nb_threads_;
condition_variable* condition_variables_;
ThreadParameters* thread_parameters_;
thread** threads_;
bool exit_;
static mutex mutex_;
};
mutex CircularPrint::mutex_;
int main() {
CircularPrint printer(10);
sleep(3);
return 0;
}
using vector<shared_ptr<...>> would be more elegant than just arrays, though this works:
g++ -std=c++11 -o test test.cc -pthread -Wl,--no-as-needed
./test

C++ Syncing threads in most elegant way

I am trying to solve the following problem. I know there are multiple solutions, but I'm looking for the most elegant way (least code) to solve it.
I have 4 threads. 3 of them try to write a unique value (0, 1 or 2) to a volatile integer variable in an infinite loop; the fourth thread tries to read the value of this variable and print it to stdout, also in an infinite loop.
I'd like to sync the threads so that the thread that writes 0 runs, then the "print" thread, then the thread that writes 1, then the print thread again, and so on...
So what I finally expect to see in the output of the "print" thread is a 0, then a 1, then a 2, then 0 again, and so on...
What is the most elegant and easy way to sync these threads?
This is the program code:
volatile int value;
int thid[4];
int main() {
HANDLE handle[4];
for (int ii=0;ii<4;ii++) {
thid[ii]=ii;
handle[ii] = (HANDLE) CreateThread( NULL, 0, (LPTHREAD_START_ROUTINE) ThreadProc, &thid[ii], 0, NULL);
}
return 0;
}
void WINAPI ThreadProc( LPVOID param ) {
int h=*((int*)param);
switch (h) {
case 3:
while(true) {
cout << value << endl;
}
break;
default:
while(true) {
// setting a unique value to the volatile variable
value=h;
}
break;
}
}
Your problem can be solved with the producer-consumer pattern.
I was inspired by the Wikipedia article, so here is the link if you want more details:
https://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem
I used a random number generator to produce the value of the volatile variable, but you can change that part.
Here is the code: it can be improved in terms of style (using C++11 for random numbers) but it produces what you expect.
#include <iostream>
#include <sstream>
#include <vector>
#include <stack>
#include <thread>
#include <mutex>
#include <atomic>
#include <condition_variable>
#include <chrono>
#include <stdlib.h> /* srand, rand */
using namespace std;
//random number generation
std::mutex mutRand;//mutex for random number generation (given that the random generator is not thread safe).
int GenerateNumber()
{
std::lock_guard<std::mutex> lk(mutRand);
return rand() % 3;
}
// print function for "thread safe" printing using a stringstream
void print(ostream& s) { cout << s.rdbuf(); cout.flush(); s.clear(); }
// Constants
//
const int num_producers = 3; //the three producers of random numbers
const int num_consumers = 1; //the only consumer
const int producer_delay_to_produce = 10; // in milliseconds
const int consumer_delay_to_consume = 30; // in milliseconds
const int consumer_max_wait_time = 200; // in milliseconds - max time that a consumer can wait for a product to be produced.
const int max_production = 1; // When a producer has produced this quantity it will stop producing
const int max_products = 1; // Maximum number of products that can be stored
//
// Variables
//
atomic<int> num_producers_working(0); // When there's no producer working the consumers will stop, and the program will stop.
stack<int> products; // The products stack, here we will store our products
mutex xmutex; // Our mutex, without this mutex our program will cry
condition_variable is_not_full; // to indicate that our stack is not full between the thread operations
condition_variable is_not_empty; // to indicate that our stack is not empty between the thread operations
//
// Functions
//
// Produce function, producer_id will produce a product
void produce(int producer_id)
{
while (true)
{
unique_lock<mutex> lock(xmutex);
int product;
is_not_full.wait(lock, [] { return products.size() != max_products; });
product = GenerateNumber();
products.push(product);
print(stringstream() << "Producer " << producer_id << " produced " << product << "\n");
is_not_empty.notify_all();
}
}
// Consume function, consumer_id will consume a product
void consume(int consumer_id)
{
while (true)
{
unique_lock<mutex> lock(xmutex);
int product;
if(is_not_empty.wait_for(lock, chrono::milliseconds(consumer_max_wait_time),
[] { return products.size() > 0; }))
{
product = products.top();
products.pop();
print(stringstream() << "Consumer " << consumer_id << " consumed " << product << "\n");
is_not_full.notify_all();
}
}
}
// Producer function, this is the body of a producer thread
void producer(int id)
{
++num_producers_working;
for(int i = 0; i < max_production; ++i)
{
produce(id);
this_thread::sleep_for(chrono::milliseconds(producer_delay_to_produce));
}
print(stringstream() << "Producer " << id << " has exited\n");
--num_producers_working;
}
// Consumer function, this is the body of a consumer thread
void consumer(int id)
{
// Wait until there is any producer working
while(num_producers_working == 0) this_thread::yield();
while(num_producers_working != 0 || products.size() > 0)
{
consume(id);
this_thread::sleep_for(chrono::milliseconds(consumer_delay_to_consume));
}
print(stringstream() << "Consumer " << id << " has exited\n");
}
//
// Main
//
int main()
{
vector<thread> producers_and_consumers;
// Create producers
for(int i = 0; i < num_producers; ++i)
producers_and_consumers.push_back(thread(producer, i));
// Create consumers
for(int i = 0; i < num_consumers; ++i)
producers_and_consumers.push_back(thread(consumer, i));
// Wait for consumers and producers to finish
for(auto& t : producers_and_consumers)
t.join();
return 0;
}
Hope that helps, tell me if you need more info or if you disagree with something :-)
And Good Bastille Day to all French people!
If you want to synchronise the threads, use a sync object to hold each of the threads in a "ping-pong" or "tick-tock" pattern.
In C++11 you can use condition variables; the example here shows something similar to what you are asking for.
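A minimal sketch of such a "ping-pong" with a single std::condition_variable might look like this. It rests on a few assumptions that are not in the question: the function name run_ping_pong is made up, the loops are bounded by a rounds parameter instead of running forever, the printed values are collected in a vector rather than sent to stdout, and a plain int protected by the mutex replaces the volatile variable.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: three writers (ids 0, 1, 2) and one reader take
// strict turns: writer 0, reader, writer 1, reader, writer 2, reader, ...
// Returns the sequence of values observed by the reader.
std::vector<int> run_ping_pong(int rounds) {
    std::mutex mu;
    std::condition_variable cv;
    int value = 0;           // the shared variable, guarded by mu
    int writer_turn = 0;     // id of the writer that goes next
    bool has_value = false;  // true: reader's turn, false: a writer's turn
    std::vector<int> seen;

    auto writer = [&](int id) {
        for (int r = 0; r < rounds; ++r) {
            std::unique_lock<std::mutex> lock(mu);
            // "Ping": wait until the reader is done and it is our turn.
            cv.wait(lock, [&] { return !has_value && writer_turn == id; });
            value = id;
            has_value = true;
            cv.notify_all();
        }
    };

    auto reader = [&] {
        for (int i = 0; i < rounds * 3; ++i) {
            std::unique_lock<std::mutex> lock(mu);
            // "Pong": wait until some writer has stored a fresh value.
            cv.wait(lock, [&] { return has_value; });
            seen.push_back(value);
            has_value = false;
            writer_turn = (writer_turn + 1) % 3;
            cv.notify_all();
        }
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < 3; ++id) threads.emplace_back(writer, id);
    threads.emplace_back(reader);
    for (auto& t : threads) t.join();
    return seen;
}
```

run_ping_pong(2) yields 0, 1, 2, 0, 1, 2. Because every access to value happens while holding the mutex, volatile is not needed here; the two state variables (has_value and writer_turn) fully decide whose turn it is.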

Thread pooling in C++11

Relevant questions:
About C++11:
C++11: std::thread pooled?
Will async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?
About Boost:
C++ boost thread reusing threads
boost::thread and creating a pool of them!
How do I get a pool of threads to send tasks to, without creating and deleting them over and over again? This means persistent threads to resynchronize without joining.
I have code that looks like this:
namespace {
std::vector<std::thread> workers;
int total = 4;
int arr[4] = {0};
void each_thread_does(int i) {
arr[i] += 2;
}
}
int main(int argc, char *argv[]) {
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
workers.push_back(std::thread(each_thread_does, j));
}
for (std::thread &t: workers) {
if (t.joinable()) {
t.join();
}
}
arr[4] = std::min_element(arr, arr+4);
}
return 0;
}
Instead of creating and joining threads each iteration, I'd prefer to send tasks to my worker threads each iteration and only create them once.
This is adapted from my answer to another very similar post.
Let's build a ThreadPool class:
class ThreadPool {
public:
void Start();
void QueueJob(const std::function<void()>& job);
void Stop();
bool busy();
private:
void ThreadLoop();
bool should_terminate = false; // Tells threads to stop looking for jobs
std::mutex queue_mutex; // Prevents data races to the job queue
std::condition_variable mutex_condition; // Allows threads to wait on new jobs or termination
std::vector<std::thread> threads;
std::queue<std::function<void()>> jobs;
};
ThreadPool::Start
For an efficient threadpool implementation, once threads are created according to num_threads, it's better not to
create new ones or destroy old ones (by joining). There will be a performance penalty, and it might even make your
application go slower than the serial version. Thus, we keep a pool of threads that can be used at any time (if they
aren't already running a job).
Each thread should be running its own infinite loop, constantly waiting for new tasks to grab and run.
void ThreadPool::Start() {
const uint32_t num_threads = std::thread::hardware_concurrency(); // Max # of threads the system supports
threads.resize(num_threads);
for (uint32_t i = 0; i < num_threads; i++) {
threads.at(i) = std::thread(&ThreadPool::ThreadLoop, this); // member functions need the object pointer
}
}
ThreadPool::ThreadLoop
The infinite loop function. This is a while (true) loop waiting for the task queue to open up.
void ThreadPool::ThreadLoop() {
while (true) {
std::function<void()> job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
mutex_condition.wait(lock, [this] {
return !jobs.empty() || should_terminate;
});
if (should_terminate) {
return;
}
job = jobs.front();
jobs.pop();
}
job();
}
}
ThreadPool::QueueJob
Add a new job to the pool; use a lock so that there isn't a data race.
void ThreadPool::QueueJob(const std::function<void()>& job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
jobs.push(job);
}
mutex_condition.notify_one();
}
To use it:
thread_pool->QueueJob([] { /* ... */ });
ThreadPool::busy
bool ThreadPool::busy() {
bool poolbusy;
{
std::unique_lock<std::mutex> lock(queue_mutex);
poolbusy = !jobs.empty(); // busy while jobs are still queued
}
return poolbusy;
}
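The busy() function can be used in a while loop so that the main thread waits for the thread pool to drain its queue before stopping or destroying the pool. Note that it only reports whether jobs are still queued; a job that a worker has already taken off the queue may still be running when busy() returns false.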
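If polling busy() in a loop is not what you want, one alternative is to count unfinished jobs and let the main thread sleep on a second condition variable. The following is a minimal sketch under that assumption; CountingPool, queue_job and wait_idle are hypothetical names and not part of the class above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical variant of the pool that counts jobs from submission to completion.
class CountingPool {
public:
    explicit CountingPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back([this] { loop(); });
    }
    ~CountingPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        work_cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void queue_job(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
            ++unfinished_;  // stays counted until the job has returned
        }
        work_cv_.notify_one();
    }
    // Blocks until every submitted job has finished running.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_cv_.wait(lock, [this] { return unfinished_ == 0; });
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                work_cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reached once stop_ is set and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
            {
                std::lock_guard<std::mutex> lock(m_);
                if (--unfinished_ == 0) idle_cv_.notify_all();
            }
        }
    }
    std::mutex m_;
    std::condition_variable work_cv_, idle_cv_;
    std::queue<std::function<void()>> jobs_;
    std::size_t unfinished_ = 0;
    bool stop_ = false;
    std::vector<std::thread> threads_;
};

// Usage in the shape of the question: wait_idle() replaces the per-iteration join().
int demo_wait_idle() {
    int arr[4] = {0, 0, 0, 0};
    CountingPool pool(2);
    for (int round = 0; round < 8; ++round) {
        for (int j = 0; j < 4; ++j)
            pool.queue_job([&arr, j] { arr[j] += 2; });
        pool.wait_idle();
    }
    return arr[0] + arr[1] + arr[2] + arr[3];
}
```

The counter covers jobs that are queued as well as jobs that are currently running, so wait_idle() does not return while a worker is still inside a job. This sketch assumes jobs do not throw.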
ThreadPool::Stop
Stop the pool.
void ThreadPool::Stop() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
should_terminate = true;
}
mutex_condition.notify_all();
for (std::thread& active_thread : threads) {
active_thread.join();
}
threads.clear();
}
Once you integrate these ingredients, you have your own dynamic threading pool. These threads always run, waiting for
jobs to do.
I apologize if there are some syntax errors; I typed this code from memory, and I have a bad memory. Sorry that I cannot provide
you the complete thread pool code; that would violate my job integrity.
Notes:
The anonymous code blocks are used so that when they are exited, the std::unique_lock variables created within them
go out of scope, unlocking the mutex.
ThreadPool::Stop will not terminate any currently running jobs, it just waits for them to finish via active_thread.join().
You can use C++ Thread Pool Library, https://github.com/vit-vit/ctpl.
Then the code you wrote can be replaced with the following:
#include <ctpl.h> // or <ctpl_stl.h> if you do not have the Boost library
#include <algorithm>
#include <future>
#include <vector>
int main (int argc, char *argv[]) {
    ctpl::thread_pool p(2 /* two threads in the pool */);
    int arr[5] = {0}; // arr[0..3] are the per-task slots, arr[4] receives their minimum
    std::vector<std::future<void>> results(4);
    for (int i = 0; i < 8; ++i) { // for 8 iterations,
        for (int j = 0; j < 4; ++j) {
            results[j] = p.push([&arr, j](int){ arr[j] += 2; });
        }
        for (int j = 0; j < 4; ++j) {
            results[j].get();
        }
        arr[4] = *std::min_element(arr, arr + 4); // min_element returns an iterator
    }
}
You will get the desired number of threads and will not create and delete them over and over again on the iterations.
A pool of threads means that all your threads are running, all the time – in other words, the thread function never returns. To give the threads something meaningful to do, you have to design a system of inter-thread communication, both for the purpose of telling the thread that there's something to do, as well as for communicating the actual work data.
Typically this will involve some kind of concurrent data structure, and each thread would presumably sleep on some kind of condition variable, which would be notified when there's work to do. Upon receiving the notification, one or several of the threads wake up, recover a task from the concurrent data structure, process it, and store the result in an analogous fashion.
The thread would then go on to check whether there's even more work to do, and if not go back to sleep.
The upshot is that you have to design all this yourself, since there isn't a natural notion of "work" that's universally applicable. It's quite a bit of work, and there are some subtle issues you have to get right. (You can program in Go if you like a system which takes care of thread management for you behind the scenes.)
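To make the "concurrent data structure plus condition variable" part concrete, here is a minimal sketch of a blocking queue. BlockingQueue and its close() method are hypothetical names chosen for illustration; workers sleep in pop() until there is work, and results travel back through a second queue of the same kind.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue: pop() sleeps until an item arrives or the queue is closed.
template <class T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(value));
        }
        cv_.notify_one();
    }
    // Returns false once the queue has been closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Three workers square the numbers 1..10; the caller sums the results.
long demo_queue() {
    BlockingQueue<int> tasks;
    BlockingQueue<long> results;
    std::vector<std::thread> workers;
    for (int w = 0; w < 3; ++w)
        workers.emplace_back([&] {
            int n;
            while (tasks.pop(n)) results.push(static_cast<long>(n) * n);
        });
    for (int n = 1; n <= 10; ++n) tasks.push(n);
    tasks.close();  // workers leave their loop once the queue is drained
    for (auto& t : workers) t.join();
    results.close();
    long sum = 0, r;
    while (results.pop(r)) sum += r;
    return sum;
}
```

Here the "work" is just an int, which sidesteps the design question raised above; a general pool would put callable objects in the queue instead.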
A threadpool is at core a set of threads all bound to a function working as an event loop. These threads will endlessly wait for a task to be executed, or their own termination.
The threadpool job is to provide an interface to submit jobs, define (and perhaps modify) the policy of running these jobs (scheduling rules, thread instantiation, size of the pool), and monitor the status of the threads and related resources.
So for a versatile pool, one must start by defining what a task is, how it is launched, interrupted, what is the result (see the notion of promise and future for that question), what sort of events the threads will have to respond to, how they will handle them, how these events shall be discriminated from the ones handled by the tasks. This can become quite complicated as you can see, and impose restrictions on how the threads will work, as the solution becomes more and more involved.
The current tooling for handling events is fairly barebones(*): primitives like mutexes, condition variables, and a few abstractions on top of that (locks, barriers). But in some cases, these abstractions may turn out to be unfit (see this related question), and one must revert to using the primitives.
Other problems have to be managed too:
signal
i/o
hardware (processor affinity, heterogeneous setup)
How would these play out in your setting?
This answer to a similar question points to an existing implementation meant for boost and the stl.
I offered a very crude implementation of a threadpool for another question, which doesn't address many of the problems outlined above. You might want to build on it. You might also want to have a look at existing frameworks in other languages for inspiration.
(*) I don't see that as a problem, quite to the contrary. I think it's the very spirit of C++ inherited from C.
Following [PhD EcE](https://stackoverflow.com/users/3818417/phd-ece)'s suggestion, I implemented the thread pool:
function_pool.h
#pragma once
#include <queue>
#include <functional>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <cassert>
class Function_pool
{
private:
std::queue<std::function<void()>> m_function_queue;
std::mutex m_lock;
std::condition_variable m_data_condition;
std::atomic<bool> m_accept_functions;
public:
Function_pool();
~Function_pool();
void push(std::function<void()> func);
void done();
void infinite_loop_func();
};
function_pool.cpp
#include "function_pool.h"
Function_pool::Function_pool() : m_function_queue(), m_lock(), m_data_condition(), m_accept_functions(true)
{
}
Function_pool::~Function_pool()
{
}
void Function_pool::push(std::function<void()> func)
{
std::unique_lock<std::mutex> lock(m_lock);
m_function_queue.push(func);
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
lock.unlock();
m_data_condition.notify_one();
}
void Function_pool::done()
{
std::unique_lock<std::mutex> lock(m_lock);
m_accept_functions = false;
lock.unlock();
// when we send the notification immediately, the consumer will try to get the lock , so unlock asap
m_data_condition.notify_all();
//notify all waiting threads.
}
void Function_pool::infinite_loop_func()
{
std::function<void()> func;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_lock);
m_data_condition.wait(lock, [this]() {return !m_function_queue.empty() || !m_accept_functions; });
if (!m_accept_functions && m_function_queue.empty())
{
//lock will be released automatically.
//finish the thread loop and let it join in the main thread.
return;
}
func = m_function_queue.front();
m_function_queue.pop();
//release the lock
}
func();
}
}
main.cpp
#include "function_pool.h"
#include <string>
#include <iostream>
#include <mutex>
#include <functional>
#include <thread>
#include <vector>
Function_pool func_pool;
class quit_worker_exception : public std::exception {};
void example_function()
{
std::cout << "bla" << std::endl;
}
int main()
{
std::cout << "starting operation" << std::endl;
int num_threads = std::thread::hardware_concurrency();
std::cout << "number of threads = " << num_threads << std::endl;
std::vector<std::thread> thread_pool;
for (int i = 0; i < num_threads; i++)
{
thread_pool.push_back(std::thread(&Function_pool::infinite_loop_func, &func_pool));
}
//here we should send our functions
for (int i = 0; i < 50; i++)
{
func_pool.push(example_function);
}
func_pool.done();
for (unsigned int i = 0; i < thread_pool.size(); i++)
{
thread_pool.at(i).join();
}
}
You can use thread_pool from the Boost library:
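#include <boost/asio/post.hpp>
#include <boost/asio/thread_pool.hpp>
#include <thread>
void my_task(){...}
int main(){
    int threadNumbers = std::thread::hardware_concurrency();
    boost::asio::thread_pool pool(threadNumbers);
    // Submit a function to the pool.
    boost::asio::post(pool, my_task);
    // Submit a lambda object to the pool.
    boost::asio::post(pool, []() {
        ...
    });
    // Wait for all submitted tasks to finish.
    pool.join();
}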
You can also use the threadpool library from the open source community:
void first_task() {...}
void second_task() {...}
int main(){
int threadNumbers = thread::hardware_concurrency();
pool tp(threadNumbers);
// Add some tasks to the pool.
tp.schedule(&first_task);
tp.schedule(&second_task);
}
Something like this might help (taken from a working app).
#include <memory>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
struct thread_pool {
typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;
thread_pool(int threads) :service(), service_worker(new asio_worker::element_type(service)) {
for (int i = 0; i < threads; ++i) {
auto worker = [this] { return service.run(); };
grp.add_thread(new boost::thread(worker));
}
}
template<class F>
void enqueue(F f) {
service.post(f);
}
~thread_pool() {
service_worker.reset();
grp.join_all();
service.stop();
}
private:
boost::asio::io_service service;
asio_worker service_worker;
boost::thread_group grp;
};
You can use it like this:
thread_pool pool(2);
pool.enqueue([] {
std::cout << "Hello from Task 1\n";
});
pool.enqueue([] {
std::cout << "Hello from Task 2\n";
});
Keep in mind that reinventing an efficient asynchronous queuing mechanism is not trivial.
boost::asio::io_service is a very efficient implementation; more precisely, it is a collection of platform-specific wrappers (e.g. it wraps I/O completion ports on Windows).
Edit: This now requires C++17 and concepts. (As of 9/12/16, only g++ 6.0+ is sufficient.)
The template deduction is a lot more accurate because of it, though, so it's worth the effort of getting a newer compiler. I've not yet found a function that requires explicit template arguments.
It also now takes any appropriate callable object (and is still statically typesafe!!!).
It also now includes an optional green threading priority thread pool using the same API. This class is POSIX only, though. It uses the ucontext_t API for userspace task switching.
I created a simple library for this. An example of usage is given below. (I'm answering this because it was one of the things I found before I decided it was necessary to write it myself.)
bool is_prime(int n){
// Determine if n is prime.
}
int main(){
thread_pool pool(8); // 8 threads
list<future<bool>> results;
for(int n = 2;n < 10000;n++){
// Submit a job to the pool.
results.emplace_back(pool.async(is_prime, n));
}
int n = 2;
for(auto i = results.begin();i != results.end();i++, n++){
// i is an iterator pointing to a future representing the result of is_prime(n)
cout << n << " ";
bool prime = i->get(); // Wait for the task is_prime(n) to finish and get the result.
if(prime)
cout << "is prime";
else
cout << "is not prime";
cout << endl;
}
}
You can pass async any function with any (or void) return value and any (or no) arguments and it will return a corresponding std::future. To get the result (or just wait until a task has completed) you call get() on the future.
Here's the github: https://github.com/Tyler-Hardin/thread_pool.
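For readers wondering how an async-style call can hand back a std::future for any return type while the pool internally stores plain void() jobs, the usual trick is std::packaged_task. The sketch below is not taken from the linked library; make_job and is_even are hypothetical names, and it needs C++14 for the deduced return type.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: turns f(args...) into a void() job plus a future for its result.
template <class F, class... Args>
auto make_job(F f, Args... args) {
    using R = decltype(f(args...));
    // packaged_task is move-only while std::function needs a copyable callable,
    // so the task lives behind a shared_ptr.
    auto task = std::make_shared<std::packaged_task<R()>>(
        [f, args...]() mutable { return f(args...); });
    std::future<R> result = task->get_future();
    std::function<void()> job = [task] { (*task)(); };
    return std::make_pair(std::move(job), std::move(result));
}

bool is_even(int n) { return n % 2 == 0; }

bool demo_job() {
    auto job_and_future = make_job(is_even, 42);
    std::thread worker(std::move(job_and_future.first));  // stands in for a pool thread
    worker.join();
    return job_and_future.second.get();  // result of is_even(42)
}
```

A pool would push the first element of the pair onto its job queue and return the second to the caller. If the callable throws, the exception is stored in the shared state and rethrown by get().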
Looks like a thread pool is a very popular problem/exercise :-)
I recently wrote one in modern C++; it’s owned by me and publicly available here - https://github.com/yurir-dev/threadpool
It supports templated return values, core pinning, and ordering of some tasks.
The whole implementation is in two .h files.
So, the original question will be something like this:
#include "tp/threadpool.h"
int arr[5] = { 0 };
concurency::threadPool<void> tp;
tp.start(std::thread::hardware_concurrency());
std::vector<std::future<void>> futures;
for (int i = 0; i < 8; ++i) { // for 8 iterations,
for (int j = 0; j < 4; ++j) {
futures.push_back(tp.push([&arr, j]() {
arr[j] += 2;
}));
}
}
// wait until all pushed tasks are finished.
for (auto& f : futures)
f.get();
// or just tp.end(); // will kill all the threads
arr[4] = *std::min_element(arr, arr + 4);
I found that a pending task's future.get() call hangs on the caller's side if the thread pool is terminated while some tasks are still in the task queue. How can I set an exception on the future from inside the thread pool when the queue only holds the std::function wrapper?
template <class F, class... Args>
std::future<std::result_of_t<F(Args...)>> enqueue(F &&f, Args &&...args) {
    using return_type = std::result_of_t<F(Args...)>; // was used below without being defined
    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    std::future<return_type> res = task->get_future();
    {
        std::unique_lock<std::mutex> lock(_mutex);
        _tasks.push([task]() -> void { (*task)(); });
    }
    return res;
}
class StdThreadPool {
std::vector<std::thread> _workers;
std::priority_queue<TASK> _tasks;
...
}
struct TASK {
//int _func_return_value;
std::function<void()> _func;
int priority;
...
}
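One standard-library behaviour that may help here: when a std::packaged_task is destroyed without ever having been called, its future becomes ready with a std::future_error whose code is broken_promise. So if the pool discards its pending wrappers on shutdown, get() should throw instead of hanging. The sketch below uses a plain std::queue as a stand-in for the pool's queue; the function name is hypothetical.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <queue>

// Returns true if get() on an abandoned task's future throws broken_promise.
bool abandoned_task_throws() {
    std::queue<std::function<void()>> tasks;  // stands in for the pool's queue

    auto task = std::make_shared<std::packaged_task<int()>>([] { return 42; });
    std::future<int> result = task->get_future();
    tasks.push([task] { (*task)(); });
    task.reset();  // the queued wrapper is now the only owner of the task

    // Simulated shutdown: discard pending work instead of running it.
    while (!tasks.empty()) tasks.pop();

    try {
        result.get();
    } catch (const std::future_error& e) {
        return e.code() == std::future_errc::broken_promise;
    }
    return false;
}
```

With a std::priority_queue, which has no clear(), the same effect can be had by popping until it is empty. If get() still hangs after that, something else is presumably keeping the packaged_task alive, for example another copy of the shared_ptr.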
The Stroika library has a threadpool implementation.
Stroika ThreadPool.h
ThreadPool p;
p.AddTask ([] () {doIt ();});
Stroika's thread library also supports cancelation (cooperative) - so that when the ThreadPool above goes out of scope - it cancels any running tasks (similar to c++20's jthread).
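To illustrate what "cooperative" means here: nothing is killed from the outside; the task itself checks a flag between small units of work and returns when asked to. The sketch below uses a plain std::atomic<bool> and is not Stroika's API; C++20's std::stop_token packages the same idea.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: a worker that polls a cancellation flag between units of work.
bool demo_cancel() {
    std::atomic<bool> cancel_requested{false};
    std::atomic<long> steps{0};
    std::thread worker([&] {
        while (!cancel_requested.load()) {
            ++steps;  // one small unit of work
            std::this_thread::yield();
        }
    });
    while (steps.load() == 0) std::this_thread::yield();  // let the worker make some progress
    cancel_requested.store(true);  // ask the task to stop
    worker.join();                 // returns because the task checks the flag
    return steps.load() > 0;
}
```

A task that never checks the flag cannot be cancelled this way, which is the price of the cooperative approach.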