Best way of synchronising master and slave threads without using spinlocks (C++)

I've got a slave thread that executes an infinite loop, which only runs the code within the loop once per instruction from the master thread, e.g.
void slaveFunc(){
    for(;;){
        // Wait here for go signal from master thread
        doStuff(); // doStuff() may return instantly or take hours.
    }
}
The master thread also runs a loop
void masterFunc(){
    for(;;){
        // Instruct slaveThread to execute one iteration of its loop
        // Wait for slaveThread to complete that iteration
    }
}
Presently I'm using a std::mutex: when the master thread releases the mutex, the slave thread executes an iteration of the loop, and when the slave thread releases it, the master thread can instruct the slave to execute another iteration. I've had to add a constantly polled std::atomic to ensure that when one thread releases the mutex, it is the other thread that locks it next.
Is there a more graceful way of doing this, without resorting to a spinlock?
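For concreteness, here is a minimal sketch of what such a handshake could look like with a std::condition_variable, so neither side polls. The go/done flags and the shared mutex are illustrative additions, not from the original code; doStuff() is the function from the loop above:

#include <condition_variable>
#include <mutex>

void doStuff(); // defined elsewhere

std::mutex m;
std::condition_variable cv;
bool go = false;   // set by the master to start an iteration
bool done = false; // set by the slave when the iteration finishes

void slaveFunc(){
    for(;;){
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []{ return go; }); // sleep until the go signal
        go = false;
        lock.unlock();
        doStuff();                       // may return instantly or take hours
        lock.lock();
        done = true;
        cv.notify_one();                 // wake the master
    }
}

void masterFunc(){
    for(;;){
        {
            std::lock_guard<std::mutex> lock(m);
            go = true;
        }
        cv.notify_one();                    // start one iteration
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, []{ return done; }); // sleep until it completes
        done = false;
    }
}

Because both waits use a predicate, spurious wakeups are handled, and neither thread consumes CPU while waiting.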

Related

Waiting loop alternative in c++ thread pool

I have implemented a simple thread pool in my program where the main thread sets the data, notifies the threads to execute, and then spins in a loop waiting for them to finish.
while(true){
    // set the data
    ....
    // notify threads
    ...
    while(n_done < num_threads){} // spin here until all threads finish
}
Each thread takes approximately 10-15 ms to complete; until then, the main thread just keeps spinning in the inner while loop, consuming a lot of CPU.
Is there any alternative that stops or sleeps the main thread until the worker threads complete, without busy-waiting in a loop?
If you want to wait for all threads to finish and not reuse them, you can call join() on every thread at the end.
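If the threads are reused between batches, a condition variable lets the main thread sleep instead of spinning. A minimal sketch, reusing the question's n_done and num_threads names; the mutex, condition variable, and helper functions are illustrative additions:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
int n_done = 0;
const int num_threads = 4; // illustrative value

// Each worker calls this when it finishes its task.
void report_done(){
    {
        std::lock_guard<std::mutex> lock(m);
        ++n_done;
    }
    cv.notify_one(); // wake the main thread if it is waiting
}

// Main thread: blocks without burning CPU until every worker has reported.
void wait_for_all(){
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, []{ return n_done == num_threads; });
    n_done = 0; // reset for the next batch
}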

Why doesn't my mutex lock before my other thread locks it?

I am running 1 thread created with pthreads and I'm using a mutex between the thread and my main thread. From what I understand, once a thread is ready to lock a mutex, it will spinlock until it's able to lock it. But I'm running into an issue where it never acquires the lock. The pseudocode I have:
Main thread:
// I create thread 1 on this line, then the main thread enters the while loop
while(p.size() > r.size()){
    pthread_mutex_lock(&Mutex);
    // calculations and decrease p.size()
    pthread_mutex_unlock(&Mutex);
}
Thread 1:
// wait 500 ms before hitting the mutex
pthread_mutex_lock(&Mutex);
// calculations
pthread_mutex_unlock(&Mutex);
The issue I'm running into is that Thread 1 never locks the mutex until the main thread's while loop exits, even though Thread 1 reaches its lock call before the main thread finishes the loop.
EDIT: If I add a 10 ms delay at the end of my while loop (after the mutex unlocks), that fixes the problem. But how can I fix it without adding the delay?
Your main thread is unlocking the mutex and then immediately locking it again. Try introducing a delay in your main loop (for testing purposes) to see if this is the issue.
Check out the answer to this question:
pthreads: thread starvation caused by quick re-locking
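The linked answer's diagnosis is that pthread mutexes make no fairness guarantee: the main thread unlocks and immediately re-locks, and the scheduler keeps handing the mutex back to it. One standard remedy (a sketch, not code from the original post) is a FIFO "ticket" lock built from a mutex and a condition variable, which serves waiters in arrival order:

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
unsigned long next_ticket = 0; // next ticket to hand out
unsigned long now_serving = 0; // ticket currently allowed to proceed

// Take a ticket and wait until it is called: first come, first served.
void ticket_lock(){
    pthread_mutex_lock(&m);
    unsigned long my_ticket = next_ticket++;
    while(my_ticket != now_serving)
        pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
}

// Call the next ticket; a thread that re-locks immediately goes to the back.
void ticket_unlock(){
    pthread_mutex_lock(&m);
    ++now_serving;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&m);
}

Replacing the pthread_mutex_lock/unlock pairs in both threads with ticket_lock()/ticket_unlock() would let Thread 1 get its turn without the artificial 10 ms delay. The broadcast wakes every waiter and all but one go back to sleep, which is fine for a handful of threads.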

Signaling mechanism for one thread waiting on several

I'm designing a system where a pool of workers pop jobs out of a queue, and I want the main thread to wait for all that to be done. This is what I've come up with so far (pseudocode):
// Main
launch_signal();
for (auto &worker : pool) {
    // create unique_lock
    if (!worker.done)
        worker.condition_variable.wait(lock, worker.done);
}

// Worker
if (queue.empty()) {
    mutex.lock();
    this->done = true;
    mutex.unlock();
    this->condition_variable.notify_one();
    // wait for launch signal from Main
} else {
    mutex.lock();
    auto job = queue.pop();
    mutex.unlock();
    job.execute();
}
So Main signals that jobs are available, then waits for every worker to signal back. Each worker meanwhile keeps popping jobs off the queue until it is empty, then signals done and goes back to waiting for the launch signal.
My question: What is a more efficient algorithm for doing this?
The existing code appears to access queue.empty() without holding a mutex lock. Unless the queue object itself is thread-safe (or at least the queue.empty() method is explicitly documented as being thread-safe), this is undefined behavior.
So the first improvement would be to fix this likely bug.
Otherwise, this is a fairly stock, battle-tested, implementation of a worker pool. There's not much room for improvement here.
The only suggestion I can make is that if the number of worker threads is N, and after locking the mutex a thread finds that there are J jobs in the queue, the thread could remove J/N jobs (with the result of the division being at least 1) from the queue at once, and then execute them in sequence, on the assumption that all other threads will do the same and that jobs take about the same amount of time on average. This minimizes lock contention.
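A sketch of that batching idea, assuming a std::deque protected by the pool's mutex; the names are illustrative, and Job is a stand-in for the question's job type. Note that empty() is checked under the lock, per the first point above:

#include <algorithm>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Job { void execute() { /* run the job */ } }; // stand-in job type

std::mutex queue_mutex;
std::deque<Job> queue;
const std::size_t N = 4; // number of worker threads

// Grab a batch of roughly J/N jobs (at least 1) under one lock,
// then execute them outside the lock to minimise contention.
void worker_drain(){
    for(;;){
        std::vector<Job> batch;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            if (queue.empty())
                break;
            std::size_t take = std::max<std::size_t>(1, queue.size() / N);
            for (std::size_t i = 0; i < take; ++i) {
                batch.push_back(std::move(queue.front()));
                queue.pop_front();
            }
        }
        for (auto &job : batch)
            job.execute();
    }
}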

Increase performance of thread pool (C++, pthreads)

My application has a main thread that assigns tasks to a number of worker threads. The communication pattern is the following:
The thread function (work is a function pointer here):
while(true) {
    pthread_mutex_lock(mutex);
    while(!work)
        pthread_cond_wait(cond, mutex); // wait for work...
    pthread_mutex_unlock(mutex);
    work();
    pthread_barrier_wait(barrier); /* all threads must finish their work */
    if(thread_id == 0) {
        work = NULL;
        pthread_cond_signal(cond); /* tell the main thread that the work is done */
    }
    pthread_barrier_wait(barrier); /* make sure that none of the other worker
                                      threads is already waiting on the condition again */
}
In the main thread (the function that assigns a task to the worker threads):
pthread_mutex_lock(mutex);
work = func;
pthread_cond_broadcast(cond); // tell the worker threads to start...
while(work)
    pthread_cond_wait(cond, mutex); // ...and wait for them to finish
pthread_mutex_unlock(mutex);
I did not use a queue here because there can only be one task at a time, and the main thread has to wait for the task to finish. The pattern works, but performance is poor: tasks are assigned very often while a single task completes quickly, so the threads suspend and wait on the condition very frequently. I would like to reduce the number of calls to pthread_mutex_(un)lock, pthread_cond_wait and pthread_barrier_wait, but I do not see how this could be done.
There is only one task at a time.
You don't need scheduling. You don't need threads. You can get rid of the locking.
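If the threads are kept (e.g. because the task itself is data-parallel across the workers), one way to cut the synchronisation traffic, sketched here as an alternative rather than the original poster's code, is to make the main thread a participant in the barriers. The start and finish handshakes then each cost a single pthread_barrier_wait per thread, and the mutex and condition variable disappear entirely:

#include <pthread.h>

pthread_barrier_t start_barrier; // initialised for num_workers + 1 participants
pthread_barrier_t done_barrier;  // initialised for num_workers + 1 participants
void (*work)(void);              // published by the main thread before the start barrier

// Worker thread: no mutex or condition variable needed.
void *worker_thread(void *arg){
    (void)arg;
    for(;;){
        pthread_barrier_wait(&start_barrier); // wait for work to be published
        work();
        pthread_barrier_wait(&done_barrier);  // report completion
    }
}

// Main thread: assign one task and block until every worker has finished it.
void run_task(void (*func)(void)){
    work = func;                          // safe: all workers are parked at start_barrier
    pthread_barrier_wait(&start_barrier); // release the workers
    pthread_barrier_wait(&done_barrier);  // wait for them all to finish
}

The barriers are created once with pthread_barrier_init(&b, NULL, num_workers + 1). POSIX specifies that pthread_barrier_wait synchronises memory, so the unlocked write to work is visible to the workers when they pass the start barrier.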

Signal that a batch of threads has finished to a master thread

I think I'm missing a fundamental design pattern concerning multiprogramming.
I have a solution to a problem, but I would say it's overly complex.
At program start, I'm allocating a static pool of workers and a master thread, which live throughout the program run (pseudocode below):
void *worker(){
    while(1){
        // per-worker mutex lock
        // wait for workerSIGNAL
        // do calculations
        // per-worker mutex unlock
    }
}
My master thread signals all my workers; when the workers are done, they wait for the next signal from the master thread (pseudocode below):
void *master(){
    while(1){
        // masterMutex lock
        // wait for masterSignal
        // signal all worker threads to start running
        /*
           SHOULD WAIT FOR ALL WORKER THREADS TO FINISH
           (that is, when workers are done with the calculations
           and are waiting for a new signal)
        */
        // masterMutex unlock
    }
}
My master thread gets a signal from another (non-thread) part of my code, which means that only one master thread exists (pseudocode below):
double callMaster(){
    // SIGNAL masterThread
    // return value that is the result of the master thread
}
My problem is: how do I make the master thread wait for all the workers to be done (i.e. to be waiting for the next workerSIGNAL)?
My solution is extraordinarily complex.
I have a barrier in my worker threads that waits for all worker threads to finish; then, from one of my threads (threadId = 0), I signal a workerDone conditional that is being waited for at the bottom of my master thread.
It works, but it's not beautiful. Any ideas for improvements are much appreciated.
Thanks.
Have you considered using pthread_join (http://kernel.org/doc/man-pages/online/pages/man3/pthread_join.3.html)? It sounds like you're using a signal to communicate between threads. While this might be appropriate in some situations, I think in your case you might find that pthread_join simplifies your code.
I've outlined some example pseudo-code below:
// this goes in your main thread
for (int i = 0; i < num_threads; ++i)
    pthread_join(thread_id[i], ...);
This way your main thread will block until all of the worker threads in the thread_id array have terminated.
You want to use a barrier. Barriers are initialized with a count N, and when any thread calls pthread_barrier_wait, it blocks until a total of N threads are at pthread_barrier_wait, and then they all return and the barrier can be used again (with the same count).
See the documentation in POSIX for details:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html
In Java you can use a CyclicBarrier here, initialised for the number of worker threads plus one (the extra party is the main thread, which also awaits on it).
A reference to this barrier is passed to each worker thread, which, at the end of a single execution of its work, calls barrier.await().
The main program will await() at the barrier until all worker threads have reached that point in their execution and called barrier.await().
Only when all parties have called barrier.await() will the barrier be raised and main may continue.
Cyclic barriers are similar to latches, except that the barrier is cyclical, allowing it to be reset indefinitely.
So in the case of main being in a loop, a cyclic barrier is a better option.
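The same pattern is available natively in C++ since C++20 as std::barrier, which, like CyclicBarrier, resets automatically for the next cycle. A minimal sketch, assuming num_workers workers plus the master all participate (so the count is num_workers + 1; the names are illustrative):

#include <barrier>

const int num_workers = 4;                // illustrative
std::barrier sync_point(num_workers + 1); // workers + the master thread

// Worker: after finishing its calculations, meet everyone at the barrier.
void worker_iteration(){
    // ... do calculations ...
    sync_point.arrive_and_wait(); // the barrier resets itself for the next cycle
}

// Master: block here until all workers have arrived, then continue.
void wait_for_workers(){
    sync_point.arrive_and_wait();
}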