how to use boost atomic to remove race condition? - c++

I am trying to use boost::atomic to do multithreading synchronization on linux.
But, the result is not consistent.
Any help will be appreciated.
thanks
#include <iostream>
#include <boost/bind.hpp>
#include <boost/threadpool.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread.hpp>
#include <boost/atomic.hpp>
boost::atomic<int> g(0) ;
void f()
{
g.fetch_add(1, boost::memory_order_relaxed);
return ;
}
const int threadnum = 10;
int main()
{
boost::threadpool::fifo_pool tp(threadnum);
for (int i = 0 ; i < threadnum ; ++i)
tp.schedule(boost::bind(f));
tp.wait();
std::cout << g << std::endl ;
return 0 ;
}

I'm not familiar with the boost thread library specifically, or boost::threadpool, but it looks to me like the threads have not necessarily completed when you access the value of g, so you will get some value between zero and 10.
Here's your program, modified to use the standard library, with joins inserted so that the fetch adds happen before the output of g.
#include <atomic>
#include <iostream>
#include <thread>
#include <vector>
std::atomic<int> g(0);
void f() {
g.fetch_add(1, std::memory_order_relaxed);
}
int main() {
const int threadnum = 10;
std::vector<std::thread> v;
for (int i = 0 ; i < threadnum ; ++i)
v.push_back(std::thread(f));
for (auto &th : v)
th.join();
std::cout << g << '\n';
}
edit:
If your program still isn't consistent even with the added tp.wait() then that is puzzling. The adds should happen before the threads end, and I would think that the threads ending would synchronize with the tp.wait(), which happens before the read. So all the adds should happen before g is printed, even though you use memory_order_relaxed, so the printed value should be 10.

Here are some examples that might help:
http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html
Basically, you're trying to "protect" a "critical region" with a "lock".
You can set or unset a semaphore.
Or you can "exchange" a boost "atomic" variable. For example (from the above link):
class spinlock {
private:
typedef enum {Locked, Unlocked} LockState;
boost::atomic<LockState> state_;
public:
spinlock() : state_(Unlocked) {}
void lock()
{
while (state_.exchange(Locked, boost::memory_order_acquire) == Locked) {
/* busy-wait */
}
}
void unlock()
{
state_.store(Unlocked, boost::memory_order_release);
}
};

Related

why does this thread pool deadlock or run too many times?

I'm trying to write a thread pool in c++ that fulfills the following criteria:
a single writer occasionally writes a new input value, and once it does, many threads concurrently access this same value, and each spit out a random floating point number.
each worker thread uses the same function, so there's no reason to build a thread-safe queue for all the different functions. I store the common function inside the thread_pool class.
these functions are by far the most computationally-intensive aspect of the program. Any locks that prevent these functions from doing their work is the primary thing I'm trying to avoid.
the floating point output from all these functions is simply averaged.
the user has a single function called thread_pool::start_work that changes this shared input, and tells all the workers to work for a fixed number of tasks.
thread_pool::start_work returns std::future
Below is what I have so far. It can be built and run with g++ test_tp.cpp -std=c++17 -lpthread; ./a.out. Unfortunately, it either deadlocks or does the work too many (or sometimes too few) times. I am thinking that it's because m_num_comps_done is not thread-safe. There are chances that all the threads skip over the last count, and then they all end up yielding. But isn't this variable atomic?
#include <vector>
#include <thread>
#include <mutex>
#include <shared_mutex>
#include <queue>
#include <atomic>
#include <future>
#include <iostream>
#include <numeric>
#include <functional>
#include <type_traits>
/**
* @class join_threads
* @brief RAII thread killer
*/
class join_threads
{
std::vector<std::thread>& m_threads;
public:
explicit join_threads(std::vector<std::thread>& threads_)
: m_threads(threads_) {}
~join_threads() {
for(unsigned long i=0; i < m_threads.size(); ++i) {
if(m_threads[i].joinable())
m_threads[i].join();
}
}
};
// how remove the first two template parameters ?
template<typename func_input_t, typename F>
class thread_pool
{
using func_output_t = typename std::result_of<F(func_input_t)>::type;
static_assert( std::is_floating_point<func_output_t>::value,
"function output type must be floating point");
unsigned m_num_comps;
std::atomic_bool m_done;
std::atomic_bool m_has_an_input;
std::atomic<int> m_num_comps_done; // need to be atomic? why?
F m_f; // same function always used
func_input_t m_param; // changed occasionally by a single writer
func_output_t m_working_output; // many reader threads average all their output to get this
std::promise<func_output_t> m_out;
mutable std::shared_mutex m_mut;
mutable std::mutex m_output_mut;
std::vector<std::thread> m_threads;
join_threads m_joiner;
void worker_thread() {
while(!m_done)
{
if(m_has_an_input){
if( m_num_comps_done.load() < m_num_comps - 1 ) {
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
// quick
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
}else if(m_num_comps_done.load() == m_num_comps - 1){
std::shared_lock<std::shared_mutex> lk(m_mut);
func_output_t tmp = m_f(m_param); // long time
m_num_comps_done++;
std::lock_guard<std::mutex> lk2(m_output_mut);
m_working_output += tmp / m_num_comps;
m_num_comps_done++;
try{
m_out.set_value(m_working_output);
}catch(std::future_error& e){
std::cout << "future_error caught: " << e.what() << "\n";
}
}else{
std::this_thread::yield();
}
}else{
std::this_thread::yield();
}
}
}
public:
/**
* @brief ctor spawns working threads
*/
thread_pool(F f, unsigned num_comps)
: m_num_comps(num_comps)
, m_done(false)
, m_has_an_input(false)
, m_joiner(m_threads)
, m_f(f)
{
unsigned const thread_count=std::thread::hardware_concurrency(); // should I subtract one?
try {
for(unsigned i=0; i<thread_count; ++i) {
m_threads.push_back( std::thread(&thread_pool::worker_thread, this));
}
} catch(...) {
m_done=true;
throw;
}
}
~thread_pool() {
m_done=true;
}
/**
* @brief changes the shared data member,
* resets the num_comps_left variable,
* resets the accumulator thing to 0, and
* resets the promise object
*/
std::future<func_output_t> start_work(func_input_t new_param) {
std::unique_lock<std::shared_mutex> lk(m_mut);
m_param = new_param;
m_num_comps_done = 0;
m_working_output = 0.0;
m_out = std::promise<func_output_t>();
m_has_an_input = true; // only really matters just after initialization
return m_out.get_future();
}
};
double slowSum(std::vector<double> nums) {
// std::this_thread::sleep_for(std::chrono::milliseconds(200));
return std::accumulate(nums.begin(), nums.end(), 0.0);
}
int main(){
// construct
thread_pool<std::vector<double>, std::function<double(std::vector<double>)>>
le_pool(slowSum, 1000);
// add work
auto ans = le_pool.start_work(std::vector<double>{1.2, 3.2, 4213.1});
std::cout << "final answer is: " << ans.get() << "\n";
std::cout << "it should be 4217.5\n";
return 1;
}
You check the "done" count, then get the lock. This allows multiple threads to be waiting for the lock. In particular, there might not be a thread that enters the second if body.
The other side of that is because you have all threads running all the time, the "last" thread may not get access to its exclusive section early (before enough threads have run) or even late (because additional threads are waiting at the mutex in the first loop).
To fix the first issue, since the second if block has all of the same code that is in the first if block, you can have just one block that checks the count to see if you've reached the end and should set the out value.
The second issue requires you to check m_num_comps_done a second time after acquiring the mutex.

std::atomic_flag to stop multiple threads

I'm trying to stop multiple worker threads using a std::atomic_flag. Starting from the question "Issue using std::atomic_flag with worker thread", the following works:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
std::atomic_flag continueFlag;
std::thread t;
void work()
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
continueFlag.test_and_set(std::memory_order_relaxed);
t = std::thread(&work);
}
void stop()
{
continueFlag.clear(std::memory_order_relaxed);
t.join();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
Trying to rewrite into multiple worker threads:
#include <iostream>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <memory>
struct thread_data {
std::atomic_flag continueFlag;
std::thread thread;
};
std::vector<thread_data> threads;
void work(int threadNum, std::atomic_flag &continueFlag)
{
while (continueFlag.test_and_set(std::memory_order_relaxed)) {
std::cout << "work" << threadNum << " ";
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
}
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
thread_data td;
td.continueFlag.test_and_set(std::memory_order_relaxed);
td.thread = std::thread(&work, i, td.continueFlag);
threads.push_back(std::move(td));
////////////////////////////////////////////////////////////////////
//PROBLEM SECTOR
////////////////////////////////////////////////////////////////////
}
}
void stop()
{
//Flag stop
for (auto &data : threads) {
data.continueFlag.clear(std::memory_order_relaxed);
}
//Join
for (auto &data : threads) {
data.thread.join();
}
threads.clear();
}
int main()
{
std::cout << "Start" << std::endl;
start();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
std::cout << "Stop" << std::endl;
stop();
std::cout << "Stopped." << std::endl;
return 0;
}
My issue is "Problem Sector" in above. Namely creating the threads. I cannot wrap my head around how to instantiate the threads and passing the variables to the work thread.
The error right now is referencing this line threads.push_back(std::move(td)); with error Error C2280 'thread_data::thread_data(const thread_data &)': attempting to reference a deleted function.
Trying to use unique_ptr like this:
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, td->continueFlag);
threads.push_back(std::move(td));
Gives error 'std::atomic_flag::atomic_flag(const std::atomic_flag &)': attempting to reference a deleted function at line td->thread = std::thread(&work, i, td->continueFlag);. Am I fundamentally misunderstanding the use of std::atomic_flag? Is it really both immovable and uncopyable?
Your first approach was actually closer to the truth. The problem is that it passed a reference to an object within the local for loop scope to each thread, as a parameter. But, of course, once the loop iteration ended, that object went out of scope and got destroyed, leaving each thread with a reference to a destroyed object, resulting in undefined behavior.
Nobody cared about the fact that you moved the object into the std::vector, after creating the thread. The thread received a reference to a locally-scoped object, and that's all it knew. End of story.
Moving the object into the vector first, and then passing to each thread a reference to the object in the std::vector will not work either. As soon as the vector internally reallocates, as part of its natural growth, you'll be in the same pickle.
What needs to happen is to have the entire threads array created first, before actually starting any std::threads. If the RAII principle is religiously followed, that means nothing more than a simple call to std::vector::resize().
Then, in a second loop, iterate over the fully-cooked threads array, and go and spawn off a std::thread for each element in the array.
I was almost there with my unique_ptr solution. I just needed to pass the call as a std::ref() as such:
std::vector<std::unique_ptr<thread_data>> threads;
void start()
{
const unsigned int numThreads = 2;
for (int i = 0; i < numThreads; i++) {
auto td = std::make_unique<thread_data>();
td->continueFlag.test_and_set(std::memory_order_relaxed);
td->thread = std::thread(&work, i, std::ref(td->continueFlag));
threads.push_back(std::move(td));
}
}
However, inspired by Sam above I also figured a non-pointer way:
std::vector<thread_data> threads;
void start()
{
const unsigned int numThreads = 2;
//create new vector, resize doesn't work as it tries to assign/copy which atomic_flag
//does not support
threads = std::vector<thread_data>(numThreads);
for (int i = 0; i < numThreads; i++) {
auto& t = threads.at(i);
t.continueFlag.test_and_set(std::memory_order_relaxed);
t.thread = std::thread(&work, i, std::ref(t.continueFlag));
}
}

How to iterate through boost thread specific pointers

I have a multi-thread application. Each thread initializes a struct data type in its own local storage. Some elements are being added to the vectors inside the struct type variables. At the end of the program, I would like to iterate through these thread local storages and add all the results together. How can I iterate through the thread specific pointer so that I can add all the results from the multi threads together ?
Thanks in advance.
boost::thread_specific_ptr<testStruct> tss;
size_t x = 10;
void callable(string str, int x) {
if(!tss.get()){
tss.reset(new testStruct);
(*tss).xInt.resize(x, 0);
}
// Assign some values to the vector elements after doing some calculations
}
Example:
#include <iostream>
#include <vector>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#define NR_THREAD 4
#define SAMPLE_SIZE 500
using namespace std;
static bool busy = false;
struct testStruct{
vector<int> intVector;
};
boost::asio::io_service ioService;
boost::thread_specific_ptr<testStruct> tsp;
boost::condition_variable cond;
boost::mutex mut;
void callable(int x) {
if(!tsp.get()){
tsp.reset(new testStruct);
}
(*tsp).intVector.push_back(x);
if (x + 1 == SAMPLE_SIZE){
busy = true;
cond.notify_all();
}
}
int main() {
boost::thread_group threads;
size_t (boost::asio::io_service::*run)() = &boost::asio::io_service::run;
boost::asio::io_service::work work(ioService);
for (short int i = 0; i < NR_THREAD; ++i) {
threads.create_thread(boost::bind(run, &ioService));
}
size_t iterations = 10;
for (int i = 0; i < iterations; i++) {
busy = false;
for (short int j = 0; j < SAMPLE_SIZE; ++j) {
ioService.post(boost::bind(callable, j));
}
// all threads need to finish the job for the next iteration
boost::unique_lock<boost::mutex> lock(mut);
while (!busy) {
cond.wait(lock);
}
cout << "Iteration: " << i << endl;
}
vector<int> sum(SAMPLE_SIZE, 0); // sum up all the values from thread local storages
work.~work();
threads.join_all();
return 0;
}
So, after giving this issue some thought, I have come up with the following solution:
void accumulateTLS(size_t idxThread){
if (idxThread == nr_threads) // Suspend all the threads till all of them are called and waiting here
{
busy = true;
}
boost::unique_lock<boost::mutex> lock(mut);
while (!busy)
{
cond.wait(lock);
}
// Accumulate the variables using thread specific pointer
cond.notify_one();
}
With boost io_service, the callable function can be changed after the threads are initialized. So, after I have done all the calculations, I am sending jobs(as many as the number of threads) to the io service again with callable function accumulateTLS(idxThread). The N jobs are sent to N threads and the accumulation process is done inside accumulateTLS method.
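P.S. Instead of calling the destructor explicitly with work.~work() (the object would then be destroyed a second time when it goes out of scope), hold the work object in a smart pointer, for example boost::shared_ptr<boost::asio::io_service::work>, and call reset() on that pointer.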

Making threads redo a print function in order

This is a home assignment.
I have to print a string (given as input) in small chunks (size given as input), using multiple threads that print one at a time in the order 1,2,3,1,2,3,1,2 (the number of threads is also given as input).
A thread does this printing function on creation and I want it to redo it after all the other threads. I face two problems:
1. Threads don't print in fixed order(mine gave 1,3,2,4 see output)
2. Threads need to re print till the entire string is exhausted.
This is what I tried...
#include<iostream>
#include<mutex>
#include<thread>
#include<string>
#include<vector>
#include<condition_variable>
#include<chrono>
using namespace std;
class circularPrint{
public:
int pos;
string message;
int nCharsPerPrint;
mutex mu;
condition_variable cv;
circularPrint(){
pos=0;
}
void shared_print(int threadID){
unique_lock<mutex> locker(mu);
if(pos+nCharsPerPrint<message.size())
cout<<"Thread"<<threadID<<" : "<<message.substr(pos,nCharsPerPrint)<<endl;
else if(pos<message.size())
cout<<"Thread"<<threadID<<" : "<<message.substr(pos)<<endl;
pos+=nCharsPerPrint;
}
};
void f(circularPrint &obj,int threadID){
obj.shared_print(threadID);
}
int main(){
circularPrint obj;
cout<<"\nMessage : ";
cin>>obj.message;
cout<<"\nChars : ";
cin>>obj.nCharsPerPrint;
int nthreads;
cout<<"\nThreads : ";
cin>>nthreads;
vector<thread> threads;
for(int count=1;count<=nthreads;++count)
{
threads.push_back(thread(f,ref(obj),count));
}
for(int count=0;count<nthreads;++count)
{
if(threads[count].joinable())
threads[count].join();
}
return 0;
}
Why would you want to multithread a method that can only be executed once at a time?
Anyway, something like this below? Be aware that the take and print use different locks and that there is a chance the output does not show in the expected order (hence, the why question above).
#include <iostream>
#include <mutex>
#include <thread>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
class circularPrint
{
public:
int pos;
string message;
int nCharsPerPrint;
mutex takeLock;
mutex printLock;
circularPrint() {
pos = 0;
}
string take(int count) {
lock_guard<mutex> locker(takeLock);
count = std::min(count, (int)message.size() - pos);
string substring = message.substr(pos, count);
pos += count;
return substring;
}
void print(int threadID, string& message) {
lock_guard<mutex> locker(printLock);
cout << "Thread" << threadID << " : " << message << endl;
}
void loop(int threadID) {
string message;
while((message = take(nCharsPerPrint)).size() > 0) {
print(threadID, message);
}
}
};
void f(circularPrint &obj, int threadID)
{
obj.loop(threadID);
}
int main()
{
circularPrint obj;
//cout << "\nMessage : ";
//cin >> obj.message;
//cout << "\nChars : ";
//cin >> obj.nCharsPerPrint;
int nthreads;
//cout << "\nThreads : ";
//cin >> nthreads;
nthreads = 4;
obj.message = "123456789012345";
obj.nCharsPerPrint = 2;
vector<thread> threads;
for (int count = 1; count <= nthreads; ++count)
threads.push_back(thread(f, ref(obj), count));
for (int count = 0; count < nthreads; ++count) {
if (threads[count].joinable())
threads[count].join();
}
return 0;
}
Currently each thread exits after printing one message - but you need more messages than threads, so each thread will need to do more than one message.
How about putting an infinite loop around your current locked section, and breaking out when there are no characters left to print?
(You may then find that the first thread does all the work; you can hack that by putting a zero-length sleep outside the locked section, or by making all the threads wait for some single signal to start, or just live with it.)
EDIT: Hadn't properly realised that you wanted to assign work to specific threads (which is normally a really bad idea). But if each thread knows its ID, and how many there are, it can figure out which characters it is supposed to print. Then all it has to do is wait till all the preceding characters have been printed (which it can tell using pos), do its work, then repeat until it has no work left to do and exit.
The only tricky bit is waiting for the preceding work to finish. You can do that with a busy wait (bad), a busy wait with a sleep in it (also bad), or a condition variable (better).
You need inter thread synchronization, each thread doing a loop "print, send a message to next one, wait for a message (from the last thread)".
You can use semaphores, events, messages or something similar.
Something like:
#include <string>
#include <iostream>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <unistd.h>
using namespace std;
// Parameters passed to a thread.
struct ThreadParameters {
string message; // to print.
volatile bool *exit; // set when the thread should exit.
condition_variable* input; // condition to wait before printing.
condition_variable* output; // condition to set after printing.
};
class CircularPrint {
public:
CircularPrint(int nb_threads) {
nb_threads_ = nb_threads;
condition_variables_ = new condition_variable[nb_threads];
thread_parameters_ = new ThreadParameters[nb_threads];
threads_ = new thread*[nb_threads];
exit_ = false;
for (int i = 0; i < nb_threads; ++i) {
thread_parameters_[i].message = to_string(i + 1);
thread_parameters_[i].exit = &exit_;
// Wait 'your' condition
thread_parameters_[i].input = &condition_variables_[i];
// Then set the next one (or the first one if you are the last).
thread_parameters_[i].output =
&condition_variables_[(i + 1) % nb_threads];
threads_[i] = new thread(Thread, &thread_parameters_[i]);
}
// Start the dance, free the first thread.
condition_variables_[0].notify_all();
}
~CircularPrint() {
// Ask threads to exit.
exit_ = true;
// Wait for all threads to end.
for (int i = 0; i < nb_threads_; ++i) {
threads_[i]->join();
delete threads_[i];
}
delete[] condition_variables_;
delete[] thread_parameters_;
delete[] threads_;
}
static void Thread(ThreadParameters* params) {
for (;;) {
if (*params->exit) {
return;
}
{
// Take the mutex. We don't really care about it here, but condition
// variables need a mutex.
// The mutex will be useful for the real assignment, though.
unique_lock<mutex> lock(mutex_);
// Wait for the input condition variable (frees the mutex before waiting).
params->input->wait(lock);
}
cout << params->message << endl;
// Free next thread.
params->output->notify_all();
}
}
private:
int nb_threads_;
condition_variable* condition_variables_;
ThreadParameters* thread_parameters_;
thread** threads_;
bool exit_;
static mutex mutex_;
};
mutex CircularPrint::mutex_;
int main() {
CircularPrint printer(10);
sleep(3);
return 0;
}
using vector<shared_ptr<...>> would be more elegant than just arrays, though this works:
g++ -std=c++11 -o test test.cc -pthread -Wl,--no-as-needed
./test

Extend the life of threads with synchronization (C++11)

I have a program with a function which takes a pointer as its argument, and a main. The main creates n threads, each of them running the function on a different memory area depending on the passed argument. The threads are then joined, the main performs some data mixing between the areas and creates n new threads which do the same operation as the old ones.
To improve the program I would like to keep the threads alive, removing the long time necessary to create them. The threads should sleep while the main is working and be notified when they have to start again. In the same way, the main should wait while the threads are working, as it did with join.
I cannot come up with a robust implementation of this; I always end up in a deadlock.
Simple baseline code, any hints about how to modify this would be much appreciated
#include <thread>
#include <climits>
...
void myfunc(void * p) {
do_something(p);
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
std::thread mythread[n_threads];
for (unsigned long int j=0; j < ULONG_MAX; j++) {
for (unsigned int i=0; i < n_threads; i++) {
mythread[i] = std::thread(myfunc, myp[i]);
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
mix_data(myp);
}
return 0;
}
Here is a possible approach using only classes from the C++11 Standard Library. Basically, each thread you create has an associated queue of commands (encapsulated in std::packaged_task<> objects), which it continuously checks. If the queue is empty, the thread simply waits on a condition variable (std::condition_variable).
Data races are avoided through the use of std::mutex and the std::unique_lock<> RAII wrapper, and the main thread can wait for a particular job to finish by storing the std::future<> object associated with each submitted std::packaged_task<> and calling wait() on it.
Below is a simple program that follows this design. Comments should be sufficient to explain what it does:
#include <thread>
#include <iostream>
#include <sstream>
#include <future>
#include <queue>
#include <condition_variable>
#include <mutex>
#include <vector>
// Convenience type definition
using job = std::packaged_task<void()>;
// Some data associated to each thread.
struct thread_data
{
int id; // Could use thread::id, but this is filled before the thread is started
std::thread t; // The thread object
std::queue<job> jobs; // The job queue
std::condition_variable cv; // The condition variable to wait for threads
std::mutex m; // Mutex used for avoiding data races
bool stop = false; // When set, this flag tells the thread that it should exit
};
// The thread function executed by each thread
void thread_func(thread_data* pData)
{
std::unique_lock<std::mutex> l(pData->m, std::defer_lock);
while (true)
{
l.lock();
// Wait until the queue won't be empty or stop is signaled
pData->cv.wait(l, [pData] () {
return (pData->stop || !pData->jobs.empty());
});
// Stop was signaled, let's exit the thread
if (pData->stop) { return; }
// Pop one task from the queue...
job j = std::move(pData->jobs.front());
pData->jobs.pop();
l.unlock();
// Execute the task!
j();
}
}
// Function that creates a simple task
job create_task(int id, int jobNumber)
{
job j([id, jobNumber] ()
{
std::stringstream s;
s << "Hello " << id << "." << jobNumber << std::endl;
std::cout << s.str();
});
return j;
}
int main()
{
const int numThreads = 4;
const int numJobsPerThread = 10;
std::vector<std::future<void>> futures;
// Create all the threads (will be waiting for jobs)
thread_data threads[numThreads];
int tdi = 0;
for (auto& td : threads)
{
td.id = tdi++;
td.t = std::thread(thread_func, &td);
}
//=================================================
// Start assigning jobs to each thread...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
//=================================================
// Here the main thread does something...
std::cin.get();
// ...done!
//=================================================
//=================================================
// Posts some new tasks...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
// Send stop signal to all threads and join them...
for (auto& td : threads)
{
std::unique_lock<std::mutex> l(td.m);
td.stop = true;
td.cv.notify_one();
}
// Join all the threads
for (auto& td : threads) { td.t.join(); }
}
The concept you want is the threadpool. This SO question deals with existing implementations.
The idea is to have a container for a number of thread instances. Each instance is associated with a function that polls a task queue and, when a task is available, pulls it and runs it. Once the task is over (if it terminates, but that's another problem), the thread simply loops back to the task queue.
So you need a synchronized queue, a thread class which implements the loop on the queue, an interface for the task objects, and maybe a class to drive the whole thing (the pool class).
Alternatively, you could make a very specialized thread class for the task it has to perform (with only the memory area as a parameter for instance). This requires a notification mechanism for the threads to indicate that they are done with the current iteration.
The thread main function would be a loop on that specific task, and at the end of one iteration, the thread signals that it is done and waits on a condition variable to start the next loop. In essence, you would be inlining the task code within the thread, dropping the need for a queue altogether.
#include <condition_variable>
#include <mutex>
#include <thread>
using namespace std;
// semaphore class based on C++11 features
class semaphore {
private:
mutex mMutex;
condition_variable mCond;
int mV;
public:
semaphore(int v): mV(v){}
void signal(int count=1){
unique_lock<mutex> lock(mMutex);
mV+=count;
// waiters block while mV < 0, so wake them as soon as mV reaches 0
if (mV >= 0) mCond.notify_all();
}
void wait(int count = 1){
unique_lock<mutex> lock(mMutex);
mV-= count;
while (mV < 0)
mCond.wait(lock);
}
};
template <typename Task>
class TaskThread {
Task *mTask;
semaphore *mSemStart, *mSemFinished;
volatile bool mRunning;
thread mThread; // declared last so the other members are initialized before the thread starts
public:
TaskThread(Task *task, semaphore *start, semaphore *finish):
mTask(task),
mSemStart(start), mSemFinished(finish),
mRunning(true),
mThread(&TaskThread<Task>::psrun, this){}
~TaskThread(){ mThread.join(); }
void run(){
do {
(*mTask)();
mSemFinished->signal();
mSemStart->wait();
} while (mRunning);
}
void finish() { // end the thread after the current loop
mRunning = false;
}
private:
static void psrun(TaskThread<Task> *self){ self->run();}
};
class MyTask {
public:
MyTask(){}
void operator()(){
// some code here
}
};
int main(){
MyTask task1;
MyTask task2;
semaphore start(2), finished(0);
TaskThread<MyTask> t1(&task1, &start, &finished);
TaskThread<MyTask> t2(&task2, &start, &finished);
for (int i = 0; i < 10; i++){
finished.wait(2);
start.signal(2);
}
t1.finish();
t2.finish();
}
The proposed (crude) implementation above relies on the Task type which must provide the operator() (ie. a functor like class). I said you could incorporate the task code directly in the thread function body earlier, but since I don't know it, I kept it as abstract as I could. There's one condition variable for the start of threads, and one for their end, both encapsulated in semaphore instances.
Seeing the other answer proposing the use of boost::barrier, I can only support this idea: make sure to replace my semaphore class with that class if possible, the reason being that it is better to rely on well tested and maintained external code rather than a self implemented solution for the same feature set.
All in all, both approaches are valid, but the former gives up a tiny bit of performance in favor of flexibility. If the task to be performed takes a sufficiently long time, the management and queue synchronization cost becomes negligible.
Update: code fixed and tested. Replaced a simple condition variable by a semaphore.
It can easily be achieved using a barrier (just a convenience wrapper over a condition variable and a counter). It blocks until all N threads have reached the "barrier", and then "recycles" for the next round. Boost provides an implementation.
void myfunc(void * p, boost::barrier& start_barrier, boost::barrier& end_barrier) {
while (!stop_condition) // You'll need to tell them to stop somehow
{
start_barrier.wait ();
do_something(p);
end_barrier.wait ();
}
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
boost::barrier start_barrier (n_threads + 1); // child threads + main thread
boost::barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
// boost::barrier is non-copyable, so it must be passed through std::ref
mythread[i] = std::thread(myfunc, myp[i], std::ref(start_barrier), std::ref(end_barrier));
}
start_barrier.wait (); // first unblock the threads
for (unsigned long int j=0; j < ULONG_MAX; j++) {
end_barrier.wait (); // mix_data must not execute before the threads are done
mix_data(myp);
start_barrier.wait (); // threads must not start new iteration before mix_data is done
}
return 0;
}
The following is simple code that compiles and runs, performing some arbitrary work. It implements aleguna's barrier concept. The task length differs per thread, so a strong synchronization mechanism really is necessary. I will try to do a pool on the same tasks and benchmark the result, and then maybe try futures as pointed out by Andy Prowl.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>
#include <cmath>
#include <complex>
#include <random>
const unsigned int n_threads=4; //varying this will not (almost) change the total amount of work
const unsigned int task_length=30000/n_threads;
const float task_length_variation=task_length/n_threads;
unsigned int rep=1000; //repetitions of tasks
class t_chronometer{
private:
std::chrono::steady_clock::time_point _t;
public:
t_chronometer(): _t(std::chrono::steady_clock::now()) {;}
void reset() {_t = std::chrono::steady_clock::now();}
double get_now() {return std::chrono::duration_cast<std::chrono::duration<double>>(std::chrono::steady_clock::now() - _t).count();}
double get_now_ms() {return
std::chrono::duration_cast<std::chrono::duration<double,std::milli>>(std::chrono::steady_clock::now() - _t).count();}
};
class t_barrier {
private:
std::mutex m_mutex;
std::condition_variable m_cond;
unsigned int m_threshold;
unsigned int m_count;
unsigned int m_generation;
public:
t_barrier(unsigned int count):
m_threshold(count),
m_count(count),
m_generation(0) {
}
bool wait() {
std::unique_lock<std::mutex> lock(m_mutex);
unsigned int gen = m_generation;
if (--m_count == 0)
{
m_generation++;
m_count = m_threshold;
m_cond.notify_all();
return true;
}
while (gen == m_generation)
m_cond.wait(lock);
return false;
}
};
using namespace std;
void do_something(complex<double> * c, unsigned int max) {
complex<double> a(1.,0.);
complex<double> b(1.,0.);
for (unsigned int i = 0; i<max; i++) {
a *= polar(1.,2.*M_PI*i/max);
b *= polar(1.,4.*M_PI*i/max);
*(c)+=a+b;
}
}
bool done=false;
void task(complex<double> * c, unsigned int max, t_barrier* start_barrier, t_barrier* end_barrier) {
while (!done) {
start_barrier->wait ();
do_something(c,max);
end_barrier->wait ();
}
cout << "task finished" << endl;
}
int main() {
t_chronometer t;
std::default_random_engine gen;
std::normal_distribution<double> dis(.0,1000.0);
complex<double> cpx[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
cpx[i] = complex<double>(dis(gen), dis(gen));
}
t_barrier start_barrier (n_threads + 1); // child threads + main thread
t_barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
unsigned long int sum=0;
for (unsigned int i=0; i < n_threads; i++) {
unsigned int max = task_length + i * task_length_variation;
cout << i+1 << "th task length: " << max << endl;
mythread[i] = std::thread(task, &cpx[i], max, &start_barrier, &end_barrier);
sum+=max;
}
cout << "total task length " << sum << endl;
complex<double> c(0,0);
for (unsigned long int j=1; j < rep+1; j++) {
start_barrier.wait (); //give to the threads the missing call to start
if (j==rep) done=true;
end_barrier.wait (); //wait for the call from each thread
if (j%100==0) cout << "cycle: " << j << endl;
for (unsigned int i=0; i<n_threads; i++) {
c+=cpx[i];
}
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
cout << "result: " << c << " it took: " << t.get_now() << " s." << endl;
return 0;
}