How can I avoid a race condition between these two interrelated threads - c++

The little program I wrote consistently blocks due to what I think is a race condition. Can somebody kindly help me identify my error?
There is one function, process_two, which does some tasks and then relies on another function to update some underlying data structure before it may continue. Note: this function is run by multiple threads simultaneously. This function is:
void process_two(int n_tasks) {
while (total_iter < 100) {
// do some work
iter_since_update += n_tasks;
total_iter += n_tasks;
std::shared_lock<std::shared_mutex> lk(mutex);
cv.wait(lk, [&] { return data_updated; });
data_updated = false;
lk.unlock();
}
}
The function does some work and increments a count of the completed tasks. Then it acquires a shared lock (if possible), and waits for a condition variable. After having waited, the thread resets the condition and unlocks the mutex. The function updating the underlying data structure is:
void process_one(int threshold) {
while (total_iter < 100) {
if (iter_since_update >= threshold) {
std::lock_guard<std::shared_mutex> lk(mutex);
// updating
data_updated = true;
iter_since_update = 0;
cv.notify_all();
}
}
}
Whenever the other function has made some number of iterations, process_one acquires a lock and updates the data. It also resets the counter of iterations and notifies the other threads. Finally, the entire program is:
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>
int iter_since_update(0);
bool data_updated(true);
std::shared_mutex mutex;
std::condition_variable_any cv;
int total_iter(0);
void process_two(int n_tasks) {
while (total_iter < 100) {
// do some work
iter_since_update += n_tasks;
total_iter += n_tasks;
std::shared_lock<std::shared_mutex> lk(mutex);
cv.wait(lk, [&] { return data_updated; });
data_updated = false;
lk.unlock();
}
}
void process_one(int threshold) {
while (total_iter < 100) {
if (iter_since_update >= threshold) {
std::lock_guard<std::shared_mutex> lk(mutex);
// updating
data_updated = true;
iter_since_update = 0;
cv.notify_all();
}
}
}
int main() {
int total_tasks(10);
int n_threads(1);
int n_tasks = total_tasks / n_threads;
std::thread update_thread([&]() { process_one(total_tasks);});
std::vector<std::thread> threads;
for (int i = 0; i < n_threads; i++)
threads.push_back(std::thread([&]() { process_two(n_tasks); }));
while (total_iter < 100) {
;
}
update_thread.join();
for (int i = 0; i < n_threads; i++) threads[i].join();
}
I set a total number of tasks and let each thread running process_two do a fraction of these tasks. Whenever I set n_threads = 1, the program almost always completes. Whenever I set n_threads = 2, the program always fails to complete.
I am new to multithreading in c++ and would greatly appreciate any suggestion as to what I am doing wrong. I suspect that the condition in process_one (waiting until a number of iterations have been done) is wrong. Whenever I saw condition variables being used in examples in Anthony Williams' wonderful book (see section 4.1.1. in the 2012 edition) one function was operating until an external condition is negated but here I have both processes depending on each other. Can somebody kindly point out what I can improve?
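No answer is recorded for this question here, so the following is only one plausible reading and a sketch, not a definitive fix. The counters are modified by several threads without synchronization, and data_updated is a single flag that the first woken worker resets, so a second worker can sleep forever while iter_since_update never reaches the threshold again. The sketch below assumes that threshold equals n_threads * n_tasks (as in the question's main) and replaces the flag with a hypothetical generation counter; the names Shared, worker, updater and run_demo are illustrative.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// All shared state lives in one struct and is only touched while holding m.
struct Shared {
    std::mutex m;
    std::condition_variable cv;
    int iter_since_update = 0;
    int total_iter = 0;
    unsigned generation = 0;  // hypothetical: bumped once per completed update
};

void worker(Shared& s, int n_tasks, int limit) {
    std::unique_lock<std::mutex> lk(s.m);
    while (s.total_iter < limit) {
        lk.unlock();
        // do some work here without holding the lock
        lk.lock();
        s.iter_since_update += n_tasks;
        s.total_iter += n_tasks;
        const unsigned seen = s.generation;
        s.cv.notify_all();  // the updater may now have enough iterations
        // Wait for the next update instead of consuming a shared bool.
        s.cv.wait(lk, [&] { return s.generation != seen || s.total_iter >= limit; });
    }
}

// Returns the number of updates performed.
int updater(Shared& s, int threshold, int limit) {
    int updates = 0;
    std::unique_lock<std::mutex> lk(s.m);
    for (;;) {
        // Sleep instead of spinning on the counters.
        s.cv.wait(lk, [&] { return s.iter_since_update >= threshold || s.total_iter >= limit; });
        if (s.iter_since_update >= threshold) {
            // update the underlying data here
            s.iter_since_update = 0;
            ++s.generation;
            ++updates;
            s.cv.notify_all();  // release every worker waiting on this generation
        }
        if (s.total_iter >= limit) return updates;
    }
}

int run_demo(int n_threads, int n_tasks, int limit) {
    Shared s;
    int updates = 0;
    // Assumption: the threshold is exactly one round of work from all workers.
    std::thread upd([&] { updates = updater(s, n_threads * n_tasks, limit); });
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i)
        workers.emplace_back([&s, n_tasks, limit] { worker(s, n_tasks, limit); });
    for (auto& t : workers) t.join();
    upd.join();
    assert(s.total_iter >= limit);
    return updates;
}
```

Calling run_demo(2, 5, 100) mirrors the failing configuration from the question (two workers, five tasks each). Because every worker waits for a new generation rather than for a flag another worker may already have cleared, each update releases all workers together.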

Related

QtConcurrent: why releaseThread and reserveThread cause deadlock?

In Qt 4.7 Reference for QThreadPool, we find:
void QThreadPool::releaseThread()
Releases a thread previously reserved by a call to reserveThread().
Note: Calling this function without previously reserving a thread
temporarily increases maxThreadCount(). This is useful when a thread
goes to sleep waiting for more work, allowing other threads to
continue. Be sure to call reserveThread() when done waiting, so that
the thread pool can correctly maintain the activeThreadCount().
See also reserveThread().
void QThreadPool::reserveThread()
Reserves one thread, disregarding activeThreadCount() and
maxThreadCount().
Once you are done with the thread, call releaseThread() to allow it to
be reused.
Note: This function will always increase the number of active threads.
This means that by using this function, it is possible for
activeThreadCount() to return a value greater than maxThreadCount().
See also releaseThread().
I want to use releaseThread() to make it possible to use nested concurrent map, but in the following code, it hangs in waitForFinished():
#include <QApplication>
#include <QMainWindow>
#include <QtConcurrentMap>
#include <QtConcurrentRun>
#include <QFuture>
#include <QThreadPool>
#include <QtTest/QTest>
#include <QFutureSynchronizer>
struct Task2 { // only calculation
typedef void result_type;
void operator()(int count) {
int k = 0;
for (int i = 0; i < count * 10; ++i) {
for (int j = 0; j < count * 10; ++j) {
k++;
}
}
assert(k >= 0);
}
};
struct Task1 { // will launch some other concurrent map
typedef void result_type;
void operator()(int count) {
QVector<int> vec;
for (int i = 0; i < 5; ++i) {
vec.push_back(i+count);
}
Task2 task;
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(), task);
{
// with out releaseThread before wait, it will hang directly
QThreadPool::globalInstance()->releaseThread();
f.waitForFinished(); // BUG: may hang there
QThreadPool::globalInstance()->reserveThread();
}
}
};
int main() {
QThreadPool* gtpool = QThreadPool::globalInstance();
gtpool->setExpiryTimeout(50);
int count = 0;
for (;;) {
QVector<int> vec;
for (int i = 0; i < 40 ; i++) {
vec.push_back(i);
}
// launch a task with nested map
Task1 task; // Task1 will have nested concurrent map
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(),task);
f.waitForFinished(); // BUG: may hang there
count++;
// waiting most of thread in thread pool expire
while (QThreadPool::globalInstance()->activeThreadCount() > 0) {
QTest::qSleep(50);
}
// launch a task only calculation
Task2 task2;
QFuture<void> f2 = QtConcurrent::map(vec.begin(), vec.end(), task2);
f2.waitForFinished(); // BUG: may hang there
qDebug() << count;
}
return 0;
}
This code will not run forever; it hangs after many loops (anywhere from 1 to 10000), with all threads waiting on a condition variable.
My questions are:
Why does it hang?
Can I fix it and keep the nested concurrent map?
dev env:
Linux version 2.6.32-696.18.7.el6.x86_64; Qt4.7.4; GCC 3.4.5
Windows 7; Qt4.7.4; mingw 4.4.0
The program hangs because of a race condition in QThreadPool around expiryTimeout. Here is the analysis in detail:
The problem in QThreadPool - source
When starting a task, QThreadPool did something along the lines of:
QMutexLocker locker(&mutex);
taskQueue.append(task); // Place the task on the task queue
if (waitingThreads > 0) {
// there are already idle threads running. They are waiting on the 'runnableReady'
// QWaitCondition. Wake one of them up.
waitingThreads--;
runnableReady.wakeOne();
} else if (runningThreadCount < maxThreadCount) {
startNewThread(task);
}
And the thread's main loop looks like this:
void QThreadPoolThread::run()
{
QMutexLocker locker(&manager->mutex);
while (true) {
/* ... */
if (manager->taskQueue.isEmpty()) {
// no pending task, wait for one.
bool expired = !manager->runnableReady.wait(locker.mutex(),
manager->expiryTimeout);
if (expired) {
manager->runningThreadCount--;
return;
} else {
continue;
}
}
QRunnable *r = manager->taskQueue.takeFirst();
// run the task
locker.unlock();
r->run();
locker.relock();
}
}
The idea is that the thread waits a given number of milliseconds for
a task; if no task is added in that time, the thread expires and
terminates. The problem is that the code relies on the return value of
runnableReady.wait(). If a task is scheduled at exactly the moment the
thread expires, the thread sees false and expires anyway. The main
thread, however, still counted that thread as waiting, so it only calls
wakeOne() and does not start another thread. The application may then
hang, as the task will never be run.
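The general way to avoid this kind of race is to decide on the state of the queue rather than on whether the wait timed out. The sketch below illustrates that idea with standard C++ only; it is not the actual Qt patch, and MiniPool, submit, take_or_expire and run_one_worker_pass are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical miniature of the pool state; names are illustrative, not Qt's.
struct MiniPool {
    std::mutex mutex;
    std::condition_variable runnableReady;
    std::deque<std::function<void()>> taskQueue;
};

void submit(MiniPool& pool, std::function<void()> task) {
    assert(task);  // an empty std::function cannot be run later
    {
        std::lock_guard<std::mutex> lock(pool.mutex);
        pool.taskQueue.push_back(std::move(task));
    }
    pool.runnableReady.notify_one();
}

// Returns true if a task was taken, false if the worker should expire.
bool take_or_expire(MiniPool& pool, std::function<void()>& out,
                    std::chrono::milliseconds expiry) {
    std::unique_lock<std::mutex> lock(pool.mutex);
    // wait_for with a predicate returns the predicate's final value, so a task
    // that arrives right at the timeout is still seen: the decision is based on
    // the queue, not on whether the wait timed out.
    bool has_task = pool.runnableReady.wait_for(
        lock, expiry, [&] { return !pool.taskQueue.empty(); });
    if (!has_task) return false;
    out = std::move(pool.taskQueue.front());
    pool.taskQueue.pop_front();
    return true;
}

// Runs one worker loop on the calling thread and returns how many tasks ran.
int run_one_worker_pass(bool queue_task_first) {
    MiniPool pool;
    int ran = 0;
    if (queue_task_first) submit(pool, [&ran] { ++ran; });
    std::function<void()> task;
    while (take_or_expire(pool, task, std::chrono::milliseconds(10))) task();
    return ran;
}
```

With a task already queued, the worker runs it before expiring; with an empty queue, it simply expires after the timeout.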
The quick workaround is to use a long expiryTimeout (30000 ms by default) and to remove the while loop that waits for the threads to expire.
Here is the modified main function; the program runs smoothly on Windows 7, with 4 threads used by default:
int main() {
QThreadPool* gtpool = QThreadPool::globalInstance();
//gtpool->setExpiryTimeout(50); <-- don't set the expiry Timeout, use the default one.
qDebug() << gtpool->maxThreadCount();
int count = 0;
for (;;) {
QVector<int> vec;
for (int i = 0; i < 40 ; i++) {
vec.push_back(i);
}
// launch a task with nested map
Task1 task; // Task1 will have nested concurrent map
QFuture<void> f = QtConcurrent::map(vec.begin(), vec.end(),task);
f.waitForFinished(); // BUG: may hang there
count++;
/*
// waiting most of thread in thread pool expire
while (QThreadPool::globalInstance()->activeThreadCount() > 0)
{
QTest::qSleep(50);
}
*/
// launch a task only calculation
Task2 task2;
QFuture<void> f2 = QtConcurrent::map(vec.begin(), vec.end(), task2);
f2.waitForFinished(); // BUG: may hang there
qDebug() << count ;
}
return 0;
}
@tungIt's answer is good enough. I found the Qt bug report and the fix commit, just for reference:
https://bugreports.qt.io/browse/QTBUG-3786
https://github.com/qt/qtbase/commit/a9b6a78e54670a70b96c122b10ad7bd64d166514#diff-6d5794cef91df41c39b5e7cc6b71d041

Thread pool on a queue in C++

I've been trying to solve a problem concurrently, which fits the thread pool pattern very nicely. Here I will try to provide a minimal representative example:
Say we have a pseudo-program like this:
Q : collection<int>
while (!Q.empty()) {
for each q in Q {
// perform some computation
}
// assign a new value to Q
Q = something_completely_new();
}
I'm trying to implement that in a parallel way, with n-1 workers and one main thread. The workers will perform the computation in the inner loop by grabbing elements from Q.
I tried to solve this using two condition variables: work, on which the master thread notifies the workers that Q has been assigned to, and work_done, on which the workers notify the master that the entire computation might be done.
Here's my C++ code:
#include <iostream>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <thread>
using namespace std;
std::queue<int> Q;
std::mutex mut;
std::condition_variable work;
std::condition_variable work_done;
void run_thread() {
for (;;) {
std::unique_lock<std::mutex> lock(mut);
work.wait(lock, [&] { return Q.size() > 0; });
// there is work to be done - pretend we're working on something
int x = Q.front(); Q.pop();
std::cout << "Working on " << x << std::endl;
work_done.notify_one();
}
}
int main() {
// your code goes here
std::vector<std::thread *> workers(3);
for (size_t i = 0; i < 3; i++) {
workers[i] = new std::thread{
[&] { run_thread(); }
};
}
for (int i = 4; i > 0; --i) {
std::unique_lock<std::mutex> lock(mut);
Q = std::queue<int>();
for (int k = 0; k < i; k++) {
Q.push(k);
}
work.notify_all();
work_done.wait(lock, [&] { return Q.size() == 0; });
}
for (size_t i = 0; i < 3; i++) {
delete workers[i];
}
return 0;
}
Unfortunately, after compiling it on OS X with g++ -std=c++11 -Wall -o main main.cpp I get the following output:
Working on 0
Working on 1
Working on 2
Working on 3
Working on 0
Working on 1
Working on 2
Working on 0
Working on 1
Working on 0
libc++abi.dylib: terminating
Abort trap: 6
After a while of googling, it looks like a segmentation fault. It probably has to do with me misusing condition variables. I would appreciate some insight, both architectural (how to approach this type of problem) and specific (what exactly I'm doing wrong here).
I appreciate the help
Your application was killed by std::terminate.
The body of your thread function is an infinite loop, so when these lines are executed
for (size_t i = 0; i < 3; i++) {
delete workers[i];
}
you delete threads which are still running (each thread is in a joinable state). When the destructor of a joinable thread is called, the following happens (from http://www.cplusplus.com/reference/thread/thread/~thread/):
If the thread is joinable when destroyed, terminate() is called.
So if you do not want terminate to be called, you should call the detach() method after creating the threads.
for (size_t i = 0; i < 3; i++) {
workers[i] = new std::thread{
[&] { run_thread(); }
};
workers[i]->detach(); // <---
}
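The joinable state can be observed directly. The following small sketch (function names are illustrative) shows that a std::thread object stops being joinable after either join() or detach(), which is what makes it safe to destroy.

```cpp
#include <cassert>
#include <thread>

// A std::thread may only be destroyed once it is no longer joinable.
bool safe_to_destroy_after_join() {
    std::thread t([] {});
    assert(t.joinable());  // destroying t at this point would call std::terminate
    t.join();              // blocks until the thread function has returned
    return !t.joinable();
}

bool safe_to_destroy_after_detach() {
    std::thread t([] {});
    assert(t.joinable());
    t.detach();            // the thread keeps running on its own
    return !t.joinable();
}
```

Note that a detached thread can no longer be waited for, so a detached worker with an infinite loop is simply cut off when main returns.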
Just because the queue is empty doesn't mean the work is done.
finished = true;
work.notify_all();
for (size_t i = 0; i < 3; i++) {
workers[i]->join(); // wait for threads to finish
delete workers[i];
}
and we need some way to terminate the threads:
for (;!finished;) {
std::unique_lock<std::mutex> lock(mut);
work.wait(lock, [&] { return Q.size() > 0 || finished; });
if (finished)
return;
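Putting the two fragments above together, a complete version might look like the following sketch. It assumes a hypothetical in_flight counter so that the master can tell "queue empty" apart from "work done", and it sets finished while holding the mutex; Pool, run_batches and processed_sum are illustrative names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Pool {
    std::mutex mut;
    std::condition_variable work;       // signaled when Q gains items or finished is set
    std::condition_variable work_done;  // signaled when a batch is fully processed
    std::queue<int> Q;
    int in_flight = 0;       // hypothetical: items popped but not yet processed
    bool finished = false;
    long processed_sum = 0;  // stands in for the real result of the computation
};

void run_thread(Pool& p) {
    std::unique_lock<std::mutex> lock(p.mut);
    for (;;) {
        p.work.wait(lock, [&] { return !p.Q.empty() || p.finished; });
        if (p.Q.empty()) return;  // finished was set and nothing is left to do
        int x = p.Q.front();
        p.Q.pop();
        ++p.in_flight;
        lock.unlock();
        // pretend to work on x outside the lock
        lock.lock();
        p.processed_sum += x;
        --p.in_flight;
        if (p.Q.empty() && p.in_flight == 0) p.work_done.notify_one();
    }
}

long run_batches(int n_workers, int n_batches) {
    Pool p;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back([&p] { run_thread(p); });
    for (int i = n_batches; i > 0; --i) {
        std::unique_lock<std::mutex> lock(p.mut);
        for (int k = 0; k < i; ++k) p.Q.push(k);
        p.work.notify_all();
        // An empty queue alone is not enough: wait until nothing is in flight.
        p.work_done.wait(lock, [&] { return p.Q.empty() && p.in_flight == 0; });
    }
    {
        std::lock_guard<std::mutex> lock(p.mut);
        p.finished = true;  // set under the mutex so no worker misses it
    }
    p.work.notify_all();
    for (auto& t : workers) t.join();  // threads are joined before destruction
    assert(p.Q.empty());
    return p.processed_sum;
}
```

With run_batches(3, 4) the batches are the same as in the question (4, 3, 2 and 1 items), and the program ends normally because every worker returns and is joined.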

c++ thread does not execute

The thread1 function does not seem to get executed
#include <iostream>
#include <fstream>
#include <thread>
#include <condition_variable>
#include <queue>
std::condition_variable cv;
std::mutex mu;
std::queue<int> queue;
bool ready;
static void thread1() {
while(!ready) {std::this_thread::sleep_for(std::chrono::milliseconds(10));}
while(ready && queue.size() <= 4) {
std::unique_lock<std::mutex> lk(mu);
cv.wait(lk, [&]{return !queue.empty();});
queue.push(2);
}
}
int main() {
ready = false;
std::thread t(thread1);
while(queue.size() <= 4) {
{
std::lock_guard<std::mutex> lk(mu);
queue.push(1);
}
ready = true;
cv.notify_one();
}
t.join();
for(int i = 0; i <= queue.size(); i++) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
return 0;
}
On my Mac the output is 1 2 1 2 but in my ubuntu its 1 1 1. I'm compiling with g++ -std=c++11 -pthread -o thread.out thread.cpp && ./thread.out. Am I missing something?
This:
for(int i = 0; i <= queue.size(); i++) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
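Is broken. queue.size() shrinks with every pop() while i grows, so the loop stops before the queue is drained (a queue of 4 elements prints only 3). Worse, if the queue is empty, the condition 0 <= 0 still lets one iteration run, and calling front() on an empty queue is undefined behavior. I would suggest that you write this in the more idiomatic style for a queue: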
while(!queue.empty()) {
int a = queue.front();
std::cout << a << std::endl;
queue.pop();
}
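To see concretely how many elements each loop visits, here is a small sketch that counts iterations instead of printing. The helper names are illustrative, and the index loop is only exercised on non-empty queues to stay clear of front() on an empty container.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

std::queue<int> make_queue(int n) {
    std::queue<int> q;
    for (int i = 0; i < n; ++i) q.push(i);
    return q;
}

// Counts how many elements the index-based loop from the question visits.
int visited_by_index_loop(std::queue<int> q) {
    assert(!q.empty());  // on an empty queue the loop body would call front()
    int visited = 0;
    for (std::size_t i = 0; i <= q.size(); i++) {  // size() shrinks on every pop
        (void)q.front();
        q.pop();
        ++visited;
    }
    return visited;
}

// Counts how many elements the idiomatic drain loop visits.
int visited_by_while_loop(std::queue<int> q) {
    int visited = 0;
    while (!q.empty()) {
        (void)q.front();
        q.pop();
        ++visited;
    }
    return visited;
}
```

For a queue of four elements the index loop visits three of them, while the while loop visits all four.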
When I run this on coliru, which I assume runs some kind of *nix machine, I get 4 1's: http://coliru.stacked-crooked.com/a/8de5b01e87e8549e.
Again, you haven't specified anything that would force each thread to run a certain number of times. You only (try to*) establish an invariant where the queue reaches size 4 either way. It just happens that on the machines we ran it on, thread 2 never manages to acquire the mutex.
This example will be more interesting if you add more work or even (just for pedagogical purposes) delays at various points, simulating that the two threads are actually doing work. If you add sleeps at various points you can ensure that the two threads alternate, though depending on where you add them you may see your invariant of 4 elements in the queue break!
*Note that even your 4 element invariant on the queue is not really an invariant. It is possible (though very unlikely) that both threads pass the while condition at the exact same moment, when there are 3 elements in the queue. One acquires the lock first and pushes, and then the other. So you can end up with 5 elements in the queue! (As you can see, asynchronous programming is tricky.) In particular, you really need to check the queue size while you hold the lock in order for this to work.
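A minimal sketch of checking the size while holding the lock could look like this; fill and fill_from_two_threads are illustrative names, and the two threads make no attempt to alternate.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Pushes value until the shared queue holds limit elements.
void fill(std::queue<int>& q, std::mutex& mu, std::size_t limit, int value) {
    for (;;) {
        std::lock_guard<std::mutex> lk(mu);
        if (q.size() >= limit) return;  // size is checked while holding the lock
        q.push(value);
    }
}

std::size_t fill_from_two_threads(std::size_t limit) {
    std::queue<int> q;
    std::mutex mu;
    std::thread t([&] { fill(q, mu, limit, 2); });
    fill(q, mu, limit, 1);
    t.join();
    assert(q.size() <= limit);  // the check-then-push pair is atomic, so no overshoot
    return q.size();
}
```

Because the size check and the push happen under the same lock, the queue ends with exactly limit elements regardless of how the two threads interleave.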
I was able to solve this by making the second thread wait on a separate predicate on a separate condition variable. I'm not sure if queue.size() is thread safe.
#include <iostream>
#include <fstream>
#include <thread>
#include <condition_variable>
#include <queue>
std::condition_variable cv;
std::condition_variable cv2;
std::mutex mu;
std::queue<int> queue;
bool tick;
bool tock;
static void thread1() {
while(queue.size() < 6) {
std::unique_lock<std::mutex> lk(mu);
cv2.wait(lk, []{return tock;});
queue.push(1);
tock = false;
tick = true;
cv.notify_one();
}
}
int main() {
tick = false;
tock = true;
std::thread t(thread1);
while(queue.size() < 6) {
std::unique_lock<std::mutex> lk(mu);
cv.wait(lk, []{return tick;});
queue.push(2);
tick = false;
tock = true;
cv2.notify_one();
}
t.join();
while(!queue.empty()) {
int r = queue.front();
queue.pop();
std::cout << r << std::endl;
}
return 0;
}

How to iterate through boost thread specific pointers

I have a multi-threaded application. Each thread initializes a struct in its own thread-local storage. Some elements are added to the vectors inside these structs. At the end of the program, I would like to iterate through these thread-local storages and add all the results together. How can I iterate through the thread-specific pointers so that I can add all the results from the threads together?
Thanks in advance.
boost::thread_specific_ptr<testStruct> tss;
size_t x = 10;
void callable(string str, int x) {
if(!tss.get()){
tss.reset(new testStruct);
(*tss).xInt.resize(x, 0);
}
// Assign some values to the vector elements after doing some calculations
}
Example:
#include <iostream>
#include <vector>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#define NR_THREAD 4
#define SAMPLE_SIZE 500
using namespace std;
static bool busy = false;
struct testStruct{
vector<int> intVector;
};
boost::asio::io_service ioService;
boost::thread_specific_ptr<testStruct> tsp;
boost::condition_variable cond;
boost::mutex mut;
void callable(int x) {
if(!tsp.get()){
tsp.reset(new testStruct);
}
(*tsp).intVector.push_back(x);
if (x + 1 == SAMPLE_SIZE){
busy = true;
cond.notify_all();
}
}
int main() {
boost::thread_group threads;
size_t (boost::asio::io_service::*run)() = &boost::asio::io_service::run;
boost::asio::io_service::work work(ioService);
for (short int i = 0; i < NR_THREAD; ++i) {
threads.create_thread(boost::bind(run, &ioService));
}
size_t iterations = 10;
for (int i = 0; i < iterations; i++) {
busy = false;
for (short int j = 0; j < SAMPLE_SIZE; ++j) {
ioService.post(boost::bind(callable, j));
}
// all threads need to finish the job for the next iteration
boost::unique_lock<boost::mutex> lock(mut);
while (!busy) {
cond.wait(lock);
}
cout << "Iteration: " << i << endl;
}
vector<int> sum(SAMPLE_SIZE, 0); // sum up all the values from thread local storages
work.~work();
threads.join_all();
return 0;
}
So, after giving some thought to this issue, I have come up with the following solution:
void accumulateTLS(size_t idxThread){
if (idxThread == nr_threads) // Suspend all the threads till all of them are called and waiting here
{
busy = true;
}
boost::unique_lock<boost::mutex> lock(mut);
while (!busy)
{
cond.wait(lock);
}
// Accumulate the variables using thread specific pointer
cond.notify_one();
}
With boost io_service, the callable function can be changed after the threads are initialized. So, after all the calculations are done, I send jobs (as many as the number of threads) to the io_service again with the callable function accumulateTLS(idxThread). The N jobs are sent to the N threads, and the accumulation is done inside the accumulateTLS method.
P.S. Instead of calling the destructor explicitly with work.~work(), it is better to hold the work object in a smart pointer (or boost::optional) and call reset() on that; io_service::work itself has no reset() member.
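A different way to make the per-thread data iterable is to let each thread register its struct in a shared container when it first creates it. The sketch below swaps boost::thread_specific_ptr for a standard C++ thread_local pointer plus a hypothetical Registry class that owns all the structs; it assumes each thread only ever uses one registry and that the results are summed after the threads have been joined.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct testStruct {
    std::vector<int> intVector;
};

// Hypothetical registry: owns every per-thread struct so main can iterate later.
class Registry {
    std::mutex mut_;
    std::vector<std::unique_ptr<testStruct>> all_;
public:
    testStruct* create() {
        std::lock_guard<std::mutex> lock(mut_);
        all_.emplace_back(new testStruct);
        return all_.back().get();
    }
    // Call only after all worker threads have been joined.
    long sum_all() {
        std::lock_guard<std::mutex> lock(mut_);
        long total = 0;
        for (const auto& p : all_)
            for (int v : p->intVector) total += v;
        return total;
    }
};

void callable(Registry& reg, int x) {
    // One pointer per thread; assumes a thread never switches registries.
    thread_local testStruct* local = nullptr;
    if (!local) local = reg.create();
    assert(local != nullptr);
    local->intVector.push_back(x);  // no lock needed: only this thread writes here
}

long sum_from_threads(int n_threads, int per_thread) {
    Registry reg;
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&reg, per_thread] {
            for (int x = 0; x < per_thread; ++x) callable(reg, x);
        });
    for (auto& th : threads) th.join();
    return reg.sum_all();  // iterate over every thread's data in one place
}
```

The lock is only taken once per thread (on registration) and once for the final sum, so the hot path stays lock-free.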

Extend the life of threads with synchronization (C++11)

I have a program with a function which takes a pointer as an argument, and a main. The main creates n threads, each of them running the function on a different memory area depending on the passed argument. The threads are then joined, the main performs some data mixing between the areas and creates n new threads which do the same operation as the old ones.
To improve the program I would like to keep the threads alive, removing the long time necessary to create them. Threads should sleep while the main is working and be notified when they have to wake up again. In the same way, the main should wait while the threads are working, as it did with join.
I cannot come up with a robust implementation of this; I always end up in a deadlock.
Simple baseline code; any hints about how to modify this would be much appreciated:
#include <thread>
#include <climits>
...
void myfunc(void * p) {
do_something(p);
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
std::thread mythread[n_threads];
for (unsigned long int j=0; j < ULONG_MAX; j++) {
for (unsigned int i=0; i < n_threads; i++) {
mythread[i] = std::thread(myfunc, myp[i]);
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
mix_data(myp);
}
return 0;
}
Here is a possible approach using only classes from the C++11 Standard Library. Basically, each thread you create has an associated command queue (of jobs encapsulated in std::packaged_task<> objects) which it continuously checks. If the queue is empty, the thread just waits on a condition variable (std::condition_variable).
Data races are avoided through the use of std::mutex and the std::unique_lock<> RAII wrapper, and the main thread can wait for a particular job to finish by storing the std::future<> object associated with each submitted std::packaged_task<> and calling wait() on it.
Below is a simple program that follows this design. Comments should be sufficient to explain what it does:
#include <thread>
#include <iostream>
#include <sstream>
#include <future>
#include <queue>
#include <condition_variable>
#include <mutex>
#include <vector>
// Convenience type definition
using job = std::packaged_task<void()>;
// Some data associated to each thread.
struct thread_data
{
int id; // Could use thread::id, but this is filled before the thread is started
std::thread t; // The thread object
std::queue<job> jobs; // The job queue
std::condition_variable cv; // The condition variable to wait for threads
std::mutex m; // Mutex used for avoiding data races
bool stop = false; // When set, this flag tells the thread that it should exit
};
// The thread function executed by each thread
void thread_func(thread_data* pData)
{
std::unique_lock<std::mutex> l(pData->m, std::defer_lock);
while (true)
{
l.lock();
// Wait until the queue is non-empty or stop is signaled
pData->cv.wait(l, [pData] () {
return (pData->stop || !pData->jobs.empty());
});
// Stop was signaled, let's exit the thread
if (pData->stop) { return; }
// Pop one task from the queue...
job j = std::move(pData->jobs.front());
pData->jobs.pop();
l.unlock();
// Execute the task!
j();
}
}
// Function that creates a simple task
job create_task(int id, int jobNumber)
{
job j([id, jobNumber] ()
{
std::stringstream s;
s << "Hello " << id << "." << jobNumber << std::endl;
std::cout << s.str();
});
return j;
}
int main()
{
const int numThreads = 4;
const int numJobsPerThread = 10;
std::vector<std::future<void>> futures;
// Create all the threads (will be waiting for jobs)
thread_data threads[numThreads];
int tdi = 0;
for (auto& td : threads)
{
td.id = tdi++;
td.t = std::thread(thread_func, &td);
}
//=================================================
// Start assigning jobs to each thread...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
//=================================================
// Here the main thread does something...
std::cin.get();
// ...done!
//=================================================
//=================================================
// Posts some new tasks...
for (auto& td : threads)
{
for (int i = 0; i < numJobsPerThread; i++)
{
job j = create_task(td.id, i);
futures.push_back(j.get_future());
std::unique_lock<std::mutex> l(td.m);
td.jobs.push(std::move(j));
}
// Notify the thread that there is work to do...
td.cv.notify_one();
}
// Wait for all the tasks to be completed...
for (auto& f : futures) { f.wait(); }
futures.clear();
// Send stop signal to all threads and join them...
for (auto& td : threads)
{
std::unique_lock<std::mutex> l(td.m);
td.stop = true;
td.cv.notify_one();
}
// Join all the threads
for (auto& td : threads) { td.t.join(); }
}
The concept you want is the thread pool. This SO question deals with existing implementations.
The idea is to have a container for a number of thread instances. Each instance is associated with a function which polls a task queue and, when a task is available, pulls it and runs it. Once the task is over (if it terminates, but that's another problem), the thread simply loops back to the task queue.
So you need a synchronized queue, a thread class which implements the loop on the queue, an interface for the task objects, and maybe a class to drive the whole thing (the pool class).
Alternatively, you could make a very specialized thread class for the task it has to perform (with only the memory area as a parameter for instance). This requires a notification mechanism for the threads to indicate that they are done with the current iteration.
The thread main function would be a loop on that specific task; at the end of one iteration, the thread signals that it is done and waits on a condition variable to start the next loop. In essence, you would be inlining the task code within the thread, dropping the need for a queue altogether.
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
using namespace std;
// semaphore class based on C++11 features
class semaphore {
private:
mutex mMutex;
condition_variable mCond;
int mV;
public:
semaphore(int v): mV(v){}
void signal(int count=1){
unique_lock<mutex> lock(mMutex);
mV+=count;
if (mV >= 0) mCond.notify_all(); // waiters sleep only while mV < 0
}
void wait(int count = 1){
unique_lock<mutex> lock(mMutex);
mV-= count;
while (mV < 0)
mCond.wait(lock);
}
};
template <typename Task>
class TaskThread {
Task *mTask;
semaphore *mSemStart, *mSemFinished;
atomic<bool> mRunning;
thread mThread; // declared last so every other member is initialized before the thread starts
public:
TaskThread(Task *task, semaphore *start, semaphore *finish):
mTask(task), mSemStart(start), mSemFinished(finish),
mRunning(true),
mThread(&TaskThread<Task>::psrun, this){}
~TaskThread(){ mThread.join(); }
void run(){
do {
(*mTask)();
mSemFinished->signal();
mSemStart->wait();
} while (mRunning);
}
void finish() { // end the thread after the current loop
mRunning = false;
}
private:
static void psrun(TaskThread<Task> *self){ self->run();}
};
class MyTask {
public:
MyTask(){}
void operator()(){
// some code here
}
};
int main(){
MyTask task1;
MyTask task2;
semaphore start(2), finished(0);
TaskThread<MyTask> t1(&task1, &start, &finished);
TaskThread<MyTask> t2(&task2, &start, &finished);
for (int i = 0; i < 10; i++){
finished.wait(2);
start.signal(2);
}
t1.finish();
t2.finish();
start.signal(2); // wake threads still blocked in mSemStart->wait() so they can exit and be joined
}
The proposed (crude) implementation above relies on the Task type, which must provide operator() (i.e. a functor-like class). I said earlier that you could incorporate the task code directly in the thread function body, but since I don't know that code, I kept it as abstract as I could. There's one condition variable for the start of the threads and one for their end, both encapsulated in semaphore instances.
Seeing the other answer proposing the use of boost::barrier, I can only support this idea: make sure to replace my semaphore class with that class if possible, the reason being that it is better to rely on well tested and maintained external code rather than a self implemented solution for the same feature set.
All in all, both approaches are valid, but the former gives up a tiny bit of performance in favor of flexibility. If the task to be performed takes a sufficiently long time, the management and queue synchronization cost becomes negligible.
Update: code fixed and tested. Replaced a simple condition variable by a semaphore.
It can easily be achieved using a barrier (just a convenience wrapper over a condition variable and a counter). It basically blocks until all N threads have reached the "barrier". It then "recycles" again. Boost provides an implementation.
void myfunc(void * p, boost::barrier& start_barrier, boost::barrier& end_barrier) {
while (!stop_condition) // You'll need to tell them to stop somehow
{
start_barrier.wait ();
do_something(p);
end_barrier.wait ();
}
}
int main(){
void * myp[n_threads] {a_location, another_location,...};
boost::barrier start_barrier (n_threads + 1); // child threads + main thread
boost::barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
mythread[i] = std::thread(myfunc, myp[i], std::ref(start_barrier), std::ref(end_barrier)); // barriers are not copyable, so pass them by reference
}
start_barrier.wait (); // first unblock the threads
for (unsigned long int j=0; j < ULONG_MAX; j++) {
end_barrier.wait (); // mix_data must not execute before the threads are done
mix_data(myp);
start_barrier.wait (); // threads must not start new iteration before mix_data is done
}
return 0;
}
The following is a simple piece of code that compiles, works, and does some arbitrary computation. It implements aleguna's barrier concept. The task length of each thread is different, so a strong synchronization mechanism really is necessary. I will try to use a pool for the same tasks and benchmark the result, and then maybe try futures as pointed out by Andy Prowl.
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>
#include <complex>
#include <random>
const unsigned int n_threads=4; //varying this will (almost) not change the total amount of work
const unsigned int task_length=30000/n_threads;
const float task_length_variation=task_length/n_threads;
unsigned int rep=1000; //repetitions of tasks
class t_chronometer{
private:
std::chrono::steady_clock::time_point _t;
public:
t_chronometer(): _t(std::chrono::steady_clock::now()) {;}
void reset() {_t = std::chrono::steady_clock::now();}
double get_now() {return std::chrono::duration_cast<std::chrono::duration<double>>(std::chrono::steady_clock::now() - _t).count();}
double get_now_ms() {return
std::chrono::duration_cast<std::chrono::duration<double,std::milli>>(std::chrono::steady_clock::now() - _t).count();}
};
class t_barrier {
private:
std::mutex m_mutex;
std::condition_variable m_cond;
unsigned int m_threshold;
unsigned int m_count;
unsigned int m_generation;
public:
t_barrier(unsigned int count):
m_threshold(count),
m_count(count),
m_generation(0) {
}
bool wait() {
std::unique_lock<std::mutex> lock(m_mutex);
unsigned int gen = m_generation;
if (--m_count == 0)
{
m_generation++;
m_count = m_threshold;
m_cond.notify_all();
return true;
}
while (gen == m_generation)
m_cond.wait(lock);
return false;
}
};
using namespace std;
void do_something(complex<double> * c, unsigned int max) {
complex<double> a(1.,0.);
complex<double> b(1.,0.);
for (unsigned int i = 0; i<max; i++) {
a *= polar(1.,2.*M_PI*i/max);
b *= polar(1.,4.*M_PI*i/max);
*(c)+=a+b;
}
}
bool done=false;
void task(complex<double> * c, unsigned int max, t_barrier* start_barrier, t_barrier* end_barrier) {
while (!done) {
start_barrier->wait ();
do_something(c,max);
end_barrier->wait ();
}
cout << "task finished" << endl;
}
int main() {
t_chronometer t;
std::default_random_engine gen;
std::normal_distribution<double> dis(.0,1000.0);
complex<double> cpx[n_threads];
for (unsigned int i=0; i < n_threads; i++) {
cpx[i] = complex<double>(dis(gen), dis(gen));
}
t_barrier start_barrier (n_threads + 1); // child threads + main thread
t_barrier end_barrier (n_threads + 1); // child threads + main thread
std::thread mythread[n_threads];
unsigned long int sum=0;
for (unsigned int i=0; i < n_threads; i++) {
unsigned int max = task_length + i * task_length_variation;
cout << i+1 << "th task length: " << max << endl;
mythread[i] = std::thread(task, &cpx[i], max, &start_barrier, &end_barrier);
sum+=max;
}
cout << "total task length " << sum << endl;
complex<double> c(0,0);
for (unsigned long int j=1; j < rep+1; j++) {
start_barrier.wait (); //give to the threads the missing call to start
if (j==rep) done=true;
end_barrier.wait (); //wait for the call from each thread
if (j%100==0) cout << "cycle: " << j << endl;
for (unsigned int i=0; i<n_threads; i++) {
c+=cpx[i];
}
}
for (unsigned int i=0; i < n_threads; i++) {
mythread[i].join();
}
cout << "result: " << c << " it took: " << t.get_now() << " s." << endl;
return 0;
}