signal that a batch of threads has finished to a masterthread - c++

I think I miss a fundamental design pattern concerning multiprogramming.
I got at solution to a problem but I would say its overly complex.
At program start, I'm allocating a static pool of workers and a master thread, that live throughout the program run. (pseudocode below)
void *worker(){
while(1){
//perworker mutex lock
//wait for workerSIGNAL
//do calculations
//perworker mutex unlock
}
}
My master thread signals all my workers, when the workers are done, they wait for the next signal from the master thread. (pseudocode below)
void *master(){
while(1){
//masterMutex lock
//wait for masterSignal
//signal all workerthread to start running
/*
SHOULD WAIT FOR ALL WORKER THREADS TO FINISH
(that is when workers are done with the calculations,
and are waiting for a new signal)
*/
//materMutex unlock
}
}
My master thread gets a signal from another part of my code (non thread), which means that only one masterthread exists. (pseudocode below)
double callMaster(){
//SIGNAL masterThread
//return value that is the result of the master thread
}
My problem is, how do I make the masterthread wait for all the workers to be done (waiting for next workerSignal) ?
My solution is extraordinary complex.
I have a barrier in my workerthreads, that waits for all worker threads to finish, then from one of my threads (threadId=0),I signal a workerDone conditional that is being waited for in the bottom of my masterthread.
It works but its not beautiful, any ideas for improvements is much appreciated.
Thanks.

Have you considered using pthread_join http://kernel.org/doc/man-pages/online/pages/man3/pthread_join.3.html? It sounds like your using a signal to communicate between threads. While this might be appropriate in some situations I think in your case you might find the use of pthread_join simplifies your code.
I've outlined some example pseudo-code below:
//this goes in your main thread
for (int i = 0; i < num_threads; ++i)
pthread_join(thread_id[i], ...
This way your main thread will block until all threads, your worker threads, in the thread_id array have terminated.

You want to use a barrier. Barriers are initialized with a count N, and when any thread calls pthread_barrier_wait, it blocks until a total of N threads are at pthread_barrier_wait, and then they all return and the barrier can be used again (with the same count).
See the documentation in POSIX for details:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html

In Java you can use a Cyclic Barrier here with an initial value equal to the number of worker threads.
A reference to this barrier is passed to each worker thread, who, at the end of a single execution of their work, call barrier.await().
The main program will await() at the barrier until all worker threads have reached the point in their execution and called barrier.await().
Only when all worker threads have called barrier.await() will the barrier be raised and main may continue.
Cyclic barriers are similar to Latches, except that the barrier is cyclical, allowing it to be reset indefinately.
So in the case of main being in a loop, a cyclic barrier is a better option.

Related

Signaling mechanism for one thread waiting on several

I'm designing a system where a pool of workers pop jobs out of a queue, and I want the main thread to wait for all that to be done. This is what I've come up with so far (pseudocode):
// Main
launch_signal();
for (auto &worker : pool) {
// create unique_lock
if (!worker.done)
worker.condition_variable.wait(lock, worker.done);
}
// Worker
if (queue.empty()) {
mutex.lock();
this->done = true;
mutex.unlock();
this->condition_variable.notify_one();
// wait for launch signal from Main
} else {
mutex.lock();
auto job = queue.pop();
mutex.unlock();
job.execute();
}
So Main signals that jobs are available, then waits for every worker to signal back. Worker meanwhile keeps popping jobs off the queue until empty, then signals done and goes into waiting for launch signal.
My question: What is a more efficient algorithm for doing this?
The existing code appears to access queue.empty() without holding a mutex lock. Unless the queue object itself is thread-safe, (or at least the queue.empty() method is explicitly documented as being thread-safe), this will be undefined behavior.
So the first improvement would be to fix this likely bug.
Otherwise, this is a fairly stock, battle-tested, implementation of a worker pool. There's not much room for improvement here.
The only suggestion I can make is that if the number of worker threads is N, and after locking the mutex a thread finds that there are J jobs in the queue, the thread could remove J/N jobs (with the result of the division being at least 1) from the queue at once, and then do them in the sequence, on the assumptions that all other threads will do the same, and jobs take about the same amount of time to be done, on average. This will minimize lock contention.

Make threads call init once every cycle

I have a code that looks like the following, with many threads doing this code snippet:
if (!shouldWork)
{
long timeToSleep = 0;
if (FindTimeToSleep(timeToSleep)) {
Sleep(timeToSleep);
InnerInitialize();
}
}
}
Now, The function InnerInitialize should be called only once after the sleep timeout. There are many threads that can sleep, after they wake up, only one thread should call InnerInitialize. We could use a simple semaphore, but the problem is that in the next cycle, after all the threads have passed the call to InnerInitialize, if the threads get to sleep again, we may need to call the function again (only once). So this is similar to std::call_once, but only periodically.
How can we achieve this?
You should use a shared mutex for synchronization.
ignoring how each thread gets to the Sleep(timeToSleep) method this is what should happen:
pthread_mutex_t mutex;
int initialized;
.......
Sleep(timeToSleep);
pthread_mutex_lock(&mutex); //critical section
if (!initialized)
{
intialized = 1;
InnerInitialize();
}
pthread_mutex_unlock (&mutex);
You still have to reset the intialized variable somewhere in the code but I don't fully understand it to help you with that.
This of course assumes that all threads go to sleep for the same amount of time and that period is long enough to guarantee that no thread goes to sleep(again) before all the others have woken up.
Try using a single thread which manages the rest. Your, what seems to be per thread group, initialization and sleep between sessions would be managed from that one thread whilst the worker threads in the group would do their stuff when needed, possibly via a job queue.
This also cleanly separates out the responsibilities of each threads job.
Synchronize each thread around a "generational counter," which is simply an incrementing counter that signal its changes (via mutex and condition variable).
When the counter increments, it is a "new workday," if you will, and the workers know to begin again. A separate, dedicated scheduling thread performs the increment and initialization routines, and it does not need to know how many workers there are.
In pseudocode:
// main / global init
workCycle = new GenerationalCounter() // initialized to _generation 0
// worker thread
myCurrentCycle = 0
while true:
myCurrentCycle = workCycle.awaitNewGeneration(myCurrentCycle)
// lock mutex
// cond_wait until _generation != myCurrentCycle
// fetch _generation for return
// unlock mutex
DoWork()
// scheduler thread
while true:
SleepUntilNextWorkCycle()
InnerIntializer()
workCycle.increment() // lock mutex
// increment _generation
// broadcast
// unlock mutex
With a little bookkeeping, InnerInitialize() could be moved out of the scheduling thread and into one of the workers by extending GenerationalCounter to run a callback in the very first thread released after a generation increment.

Increase performance of thread pool (C++, pthreads)

My application has a main thread that assigns tasks to a number of worker threads. The communication pattern is the following:
The thread function (work is a function pointer here):
while(true) {
pthread_mutex_lock(mutex);
while(!work)
pthread_cond_wait(cond, mutex); // wait for work...
pthread_mutex_unlock(mutex);
work();
pthread_barrier_wait(barrier); /*all threads must finish their work*/
if(thread_id == 0) {
work = NULL;
pthread_cond_signal(cond); /*tell the main thread that the work is done*/
}
pthread_barrier_wait(barrier); /* make sure that none of the other worker
threads is already waiting on condition again...*/
}
In the main thread (the function that assigns a task to the worker threads):
pthread_mutex_lock(mutex);
work = func;
pthread_cond_broadcast(cond); // tell the worker threads to start...
while(work)
pthread_cond_wait(cond, mutex); // ...and wait for them to finish
pthread_mutex_unlock(mutex);
I did not use a queue here, because there can only be one task at a time and the main thread has to wait for the task to finish. The pattern works fine, but with poor performance. The problem is that tasks will be assigned very often while performing a single task is quite fast. Therefore the threads will suspend and wait on the condition very often. I would like to reduce the number of calls of pthread_mutex_(un)lock, phread_cond_wait and pthread_barrier, but I do not see how this could be done.
There is only one task at a time.
You don't need scheduling. You don't need threads. You can get rid of the locking.

Managing multiple concurrent threads

I'm writing a multi-threaded program in C++ using C++11 threading library.
I have the following requirements:
Main thread listens to some type of events, and fires off a new thread for each new event
When program termination is requested, new thread creation is blocked and we wait for the old threads to finish
I have the option to store the threads in some container, for example, a list. Before exit, all threads in the container are join()-ed. However, since STL containers are not thread-safe, additional synchronization is needed when adding a new thread and removing a finished thread from the container. The interaction between the main thread and the child threads becomes a little more complicated in this case. Should the child thread remove itself from the container? If not, how does it let the main thread know when it's time to remove? etc.
Another way I see is to have an atomic int which is incremented by the main thread when a child thread is created, and decremented by the child thread right before its termination (threads will be detach()-ed after creation, so I won't have to manage any std::thread objects). Before exiting, we just wait for the atomic integer to become 0 in a simple loop. This solution looks better to me because there are less moving parts and no locking (at least as long as the target platform has a lock-free implementation of std::atomic<int>).
So, my question is, which of the above methods would you prefer?
Using the thread counter would be my choice, combined with a condition wait. The last thread exiting should signal the condition variable to wake up the waiter.
// the waiter
{
std::lock_guard<std::mutex> g(threads_lock);
while (threads > 0) {
threads_cond.wait(threads_lock);
}
}
//...
// the threads that are exiting
if (--threads == 0) {
std::lock_guard<std::mutex> g(threads_lock);
threads_cond.notify_one();
}
This is assuming that threads is std::atomic, of course.

making sure threads are created and waiting before broadcasting

I have 10 threads that are supposed to be waiting for signal.
Until now I've simply done 'sleep(3)', and that has been working fine, but is there are a more secure way to make sure, that all threads have been created and are indeed waiting.
I made the following construction where I in critical region, before the wait, increment a counter telling how many threads are waiting. But then I have to have an additional mutex and conditional for signalling back to the main that all threads are created, it seems overly complex.
Am I missing some basic thread design pattern?
Thanks
edit: fixed types
edit: clarifying information below
A barrier won't work in this case, because I'm not interested in letting my threads wait until all threads are ready. This already happens with the 'cond_wait'.
I'm interested in letting the main function know, when all threads are ready and waiting.
//mutex and conditional to signal from main to threads to do work
mutex_t mutex_for_cond;
condt_t cond;
//mutex and conditional to signal back from thread to main that threads are ready
mutex_t mutex_for_back_cond;
condt_t back_cond;
int nThreads=0;//threadsafe by using mutex_for_cond
void *thread(){
mutex_lock(mutex_for_cond);
nThreads++;
if(nThreads==10){
mutex_lock(mutex_for_back_cond)
cond_signal(back_cond);
mutex_unlock(mutex_for_back_cond)
}while(1){
cond_wait(cond,mutext_for_cond);
if(spurious)
continue;
else
break;
}
mutex_unlock(mutex_for_cond);
//do work on non critical region data
}
int main(){
for(int i=0;i<10)
create_threads;
while(1){
mutex_lock(mutex_for_back_cond);
cond_wait(back_cond,mutex_for_back_cond);
mutex_unlock(mutex_for_back_cond);
mutex_lock(mutex_for_cond);
if(nThreads==10){
break;
}else{
//spurious wakeup
mutex_unlock(mutex_for_cond);
}
}
//now all threads are waiting
//mutex_for_cond is still locked so broadcast
cond_broadcast(cond);//was type here
}
Am I missing some basic thread design pattern?
Yes. For every condition, there should be a variable that is protected by the accompanying mutex. Only the change of this variable is indicated by signals on the condition variable.
You check the variable in a loop, waiting on the condition:
mutex_lock(mutex_for_back_cond);
while ( ready_threads < 10 )
cond_wait(back_cond,mutex_for_back_cond);
mutex_unlock( mutex_for_back_cond );
Additionally, what you are trying to build is a thread barrier. It is often pre-implemented in threading libraries, like pthread_barrier_wait.
Sensible threading APIs have a barrier construct which does precisely this.
For example, with boost::thread, you would create a barrier like this:
boost::barrier bar(10); // a barrier for 10 threads
and then each thread would wait on the barrier:
bar.wait();
the barrier waits until the specified number of threads are waiting for it, and then releases them all at once. In other words, once all ten threads have been created and are ready, it'll allow them all to proceed.
That's the simple, and sane, way of doing it. Threading APIs which do not have a barrier construct require you to do it the hard way, not unlike what you're doing now.
You should associate some variable that contains the 'event state' with the condition variable. The main thread sets the event state variable appropriately just before issuing the broadcast. The threads that are interested in the event check the event state variable regardless of whether they've blocked on the condition variable or not.
With this pattern, the main thread doesn't need to know about the precise state of the threads - it just sets the event when it needs to then broadcasts the condition. Any waiting threads will be unblocked, and any threads not waiting yet will never block on the condition variable because they'll note that the event has already occurred before waiting on the condition. Something like the following pseudocode:
//mutex and conditional to signal from main to threads to do work
pthread_mutex_t mutex_for_cond;
pthread_cond_t cond;
int event_occurred = 0;
void *thread()
{
pthread_mutex_lock(&mutex_for_cond);
while (!event_occurred) {
pthread_cond_wait( &cond, &mutex_for_cond);
}
pthread_mutex_unlock(&mutex_for_cond);
//do work on non critical region data
}
int main()
{
pthread_mutex_init(&mutex_for_cond, ...);
pthread_cond_init(&cond, ...);
for(int i=0;i<10)
create_threads(...);
// do whatever needs to done to set up the work for the threads
// now let the threads know they can do their work (whether or not
// they've gotten to the "wait point" yet)
pthread_mutex_lock(&mutex_for_cond);
event_occured = 1;
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex_for_cond);
}