Increase performance of thread pool (C++, pthreads)

My application has a main thread that assigns tasks to a number of worker threads. The communication pattern is the following:
The thread function (work is a function pointer here):
while (true) {
    pthread_mutex_lock(mutex);
    while (!work)
        pthread_cond_wait(cond, mutex); // wait for work...
    pthread_mutex_unlock(mutex);

    work();

    pthread_barrier_wait(barrier); /* all threads must finish their work */
    if (thread_id == 0) {
        work = NULL;
        pthread_cond_signal(cond); /* tell the main thread that the work is done */
    }
    pthread_barrier_wait(barrier); /* make sure that none of the other worker
                                      threads is already waiting on the condition again... */
}
In the main thread (the function that assigns a task to the worker threads):
pthread_mutex_lock(mutex);
work = func;
pthread_cond_broadcast(cond); // tell the worker threads to start...
while (work)
    pthread_cond_wait(cond, mutex); // ...and wait for them to finish
pthread_mutex_unlock(mutex);
I did not use a queue here, because there can only be one task at a time and the main thread has to wait for the task to finish. The pattern works fine, but performance is poor. The problem is that tasks are assigned very often, while performing a single task is quite fast, so the threads suspend and wait on the condition very often. I would like to reduce the number of calls to pthread_mutex_lock/pthread_mutex_unlock, pthread_cond_wait and pthread_barrier_wait, but I do not see how this could be done.

There is only one task at a time, and the main thread always waits for it to finish. So you don't need scheduling, you don't need the worker threads, and you can get rid of the locking.
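A minimal sketch of what that collapses to (assuming, as this answer does, that the task itself does not need to run on several threads at once; the function-pointer type is illustrative):

typedef void (*task_fn)(void);

// Hypothetical: with one task at a time and a caller that must wait
// for it, a direct call is equivalent to the whole pool machinery.
void assign_task(task_fn func) {
    func(); // no mutex, no condition variable, no barrier
}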

Waiting loop alternative in c++ thread pool

I have implemented a simple thread pool in my program where the main thread sets the data and notifies the threads to execute, then waits in a loop for them to finish.
while (true) {
    // set the data
    ....
    // notify threads
    ...
    while (n_done < num_threads) {} // wait in the while loop for threads to finish
}
Each thread takes approximately 10-15 ms to complete; until then, the main thread just keeps spinning in the loop, consuming a lot of CPU.
Is there an alternative way to block or sleep the main thread until the threads complete, without a busy loop?
If you want to wait for all threads to finish and not reuse them, you can call join() on every thread at the end.
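If the workers are not reused, a minimal sketch of the join() approach could look like this (do_work is a placeholder for the real per-thread job):

#include <cstdio>
#include <thread>
#include <vector>

// Placeholder for the real per-thread work.
void do_work(int id) { std::printf("worker %d done\n", id); }

int main() {
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back(do_work, i);

    // join() blocks the main thread until each worker has finished:
    // no busy loop, no CPU burned while waiting.
    for (auto &t : workers)
        t.join();
}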

std::condition_variable without a lock

I'm trying to synchronise a set of threads. These threads sleep most of the time, waking up to do their scheduled job. I'm using std::thread for them.
Unfortunately, when I terminate the application, the threads prevent it from exiting. In C# I can make a thread a background thread, so that it is terminated on app exit. It seems to me that the equivalent feature is not available in C++.
So I decided to use a kind of event indicator and make the threads wake up when the app exits. The standard C++11 std::condition_variable requires a unique lock, so I cannot use it, as I need all the threads to wake up at the same time (they do not share any resources).
Eventually, I decided to use WinAPI's CreateEvent + SetEvent + WaitForSingleObject to solve the issue.
Is there a way to achieve the same behaviour using just C++11?
Again, what I want:

- a set of threads work independently and are usually asleep for a particular period (which could differ from thread to thread);
- all threads check a variable, available to all of them, that tells them whether it is time to stop working (I call this variable IsAlive). Actually, all threads spin in a loop like this:

while (IsAlive) {
    // Do work
    std::this_thread::sleep_for(...);
}

- threads must be able to work simultaneously, without blocking each other;
- when the app is closed, an event is raised that makes the threads wake up immediately, no matter the timeout;
- once woken up, each thread checks IsAlive and exits.
Yes, you can do this using standard C++ mechanisms: a condition variable, a mutex, and a flag of some kind.
// Your class or global variables
std::mutex deathLock;
std::condition_variable deathCv;
bool deathTriggered = false;

// The kill thread runs this code to kill all other threads:
{
    std::lock_guard<std::mutex> lock(deathLock);
    deathTriggered = true;
}
deathCv.notify_all();

// Your worker threads run this code:
while (true)
{
    // ... do work

    // Now wait for 1000 milliseconds or until death is triggered:
    std::unique_lock<std::mutex> lock(deathLock);
    deathCv.wait_for(lock, std::chrono::milliseconds(1000), [](){ return deathTriggered; });

    // Check for death
    if (deathTriggered)
    {
        break;
    }
}
Note that this runs correctly even if death is triggered before a thread enters the wait. You could also use the return value from wait_for, but this way is easier to read, in my opinion. Also, although it's not obvious, it is fine for multiple threads to sleep on the same condition variable: wait_for internally unlocks the unique_lock while sleeping, and reacquires it both to check the predicate and when it returns.
Finally, all the threads do wake up 'at the same time': although they are serialised while checking the bool flag, that takes only a few instructions, and each thread releases the lock as it breaks out of the loop. The difference would be unnoticeable.
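A self-contained sketch of the whole pattern (the thread count, timings, and the "work" are placeholders):

#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex deathLock;
std::condition_variable deathCv;
bool deathTriggered = false;

void worker(int id) {
    while (true) {
        std::printf("worker %d doing work\n", id); // placeholder for real work

        // Sleep for up to 1000 ms, but wake immediately if death is triggered.
        std::unique_lock<std::mutex> lock(deathLock);
        deathCv.wait_for(lock, std::chrono::milliseconds(1000),
                         [] { return deathTriggered; });
        if (deathTriggered)
            break;
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 3; ++i)
        threads.emplace_back(worker, i);

    std::this_thread::sleep_for(std::chrono::seconds(3)); // let them run a bit

    {
        std::lock_guard<std::mutex> lock(deathLock);
        deathTriggered = true;
    }
    deathCv.notify_all();

    for (auto &t : threads)
        t.join(); // every worker wakes promptly and exits
}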
In C++11 you should be able to detach() a thread, so that it is treated like a daemon thread, which means the thread is automatically stopped when the app terminates.
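For example (a sketch; the worker body is a placeholder):

#include <chrono>
#include <thread>

void worker() {
    // placeholder: periodic background work
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

int main() {
    // A detached thread does not have to be join()ed before main returns;
    // it is torn down together with the process when the app exits.
    std::thread(worker).detach();
}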

Signaling mechanism for one thread waiting on several

I'm designing a system where a pool of workers pop jobs out of a queue, and I want the main thread to wait for all that to be done. This is what I've come up with so far (pseudocode):
// Main
launch_signal();
for (auto &worker : pool) {
    // create unique_lock
    if (!worker.done)
        worker.condition_variable.wait(lock, worker.done);
}
// Worker
if (queue.empty()) {
    mutex.lock();
    this->done = true;
    mutex.unlock();
    this->condition_variable.notify_one();
    // wait for launch signal from Main
} else {
    mutex.lock();
    auto job = queue.pop();
    mutex.unlock();
    job.execute();
}
So Main signals that jobs are available, then waits for every worker to signal back. Each worker meanwhile keeps popping jobs off the queue until it is empty, then signals done and goes back to waiting for the launch signal.
My question: What is a more efficient algorithm for doing this?
The existing code appears to access queue.empty() without holding a mutex lock. Unless the queue object itself is thread-safe (or at least the queue.empty() method is explicitly documented as being thread-safe), this is undefined behavior.
So the first improvement would be to fix this likely bug.
Otherwise, this is a fairly stock, battle-tested, implementation of a worker pool. There's not much room for improvement here.
The only suggestion I can make is that if the number of worker threads is N, and after locking the mutex a thread finds that there are J jobs in the queue, the thread could remove J/N jobs (with the result of the division being at least 1) from the queue at once, and then do them in sequence, on the assumption that all other threads will do the same and that jobs take about the same amount of time on average. This will minimize lock contention.
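A sketch of that batching idea (the names and the std::deque-based queue are illustrative, not the poster's actual types); note it also takes the empty() check under the mutex, fixing the bug mentioned above:

#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Job { void execute() { /* ... */ } };

std::mutex queueMutex;
std::deque<Job> jobQueue;
const std::size_t numWorkers = 4; // N in the text above

void workerLoop() {
    while (true) {
        std::vector<Job> batch;
        {
            std::lock_guard<std::mutex> lock(queueMutex);
            if (jobQueue.empty())
                break; // signal "done" here in the real code

            // Remove J/N jobs at once, but at least one.
            std::size_t take = jobQueue.size() / numWorkers;
            if (take == 0)
                take = 1;
            for (std::size_t i = 0; i < take; ++i) {
                batch.push_back(jobQueue.front());
                jobQueue.pop_front();
            }
        }
        // Execute the batch outside the lock to minimise contention.
        for (auto &job : batch)
            job.execute();
    }
}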

Make threads call init once every cycle

I have code that looks like the following, with many threads executing this code snippet:
if (!shouldWork)
{
    long timeToSleep = 0;
    if (FindTimeToSleep(timeToSleep)) {
        Sleep(timeToSleep);
        InnerInitialize();
    }
}
Now, the function InnerInitialize should be called only once after the sleep timeout. There are many threads that can sleep; after they wake up, only one thread should call InnerInitialize. We could use a simple semaphore, but the problem is that in the next cycle, after all the threads have passed the call to InnerInitialize, if the threads go to sleep again, we may need to call the function again (only once). So this is similar to std::call_once, but periodic.
How can we achieve this?
You should use a mutex shared by all the threads for synchronization.
Ignoring how each thread gets to the Sleep(timeToSleep) call, this is what should happen:
pthread_mutex_t mutex;
int initialized;
.......
Sleep(timeToSleep);
pthread_mutex_lock(&mutex); // critical section
if (!initialized)
{
    initialized = 1;
    InnerInitialize();
}
pthread_mutex_unlock(&mutex);
You still have to reset the initialized variable somewhere in the code, but I don't understand your code well enough to help with that.
This of course assumes that all threads sleep for the same amount of time and that this period is long enough to guarantee that no thread goes to sleep again before all the others have woken up.
Try using a single thread that manages the rest. The initialization (which seems to be per thread group) and the sleep between sessions would be managed from that one thread, while the worker threads in the group do their work when needed, possibly via a job queue.
This also cleanly separates out the responsibilities of each thread.
Synchronize each thread around a "generational counter," which is simply an incrementing counter that signals its changes (via a mutex and condition variable).
When the counter increments, it is a "new workday," if you will, and the workers know to begin again. A separate, dedicated scheduling thread performs the increment and initialization routines, and it does not need to know how many workers there are.
In pseudocode:
// main / global init
workCycle = new GenerationalCounter()  // initialized to _generation 0

// worker thread
myCurrentCycle = 0
while true:
    myCurrentCycle = workCycle.awaitNewGeneration(myCurrentCycle)
    // lock mutex
    // cond_wait until _generation != myCurrentCycle
    // fetch _generation for return
    // unlock mutex
    DoWork()

// scheduler thread
while true:
    SleepUntilNextWorkCycle()
    InnerInitialize()
    workCycle.increment()  // lock mutex
                           // increment _generation
                           // broadcast
                           // unlock mutex
With a little bookkeeping, InnerInitialize() could be moved out of the scheduling thread and into one of the workers by extending GenerationalCounter to run a callback in the very first thread released after a generation increment.
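A minimal C++ sketch of such a GenerationalCounter (the class and method names follow the pseudocode above; treat it as an illustration rather than a drop-in implementation):

#include <condition_variable>
#include <mutex>

class GenerationalCounter {
public:
    // Block until the generation differs from the one the caller last
    // saw, then return the new generation.
    unsigned long awaitNewGeneration(unsigned long lastSeen) {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [&] { return _generation != lastSeen; });
        return _generation;
    }

    // Start a new "workday": bump the generation and wake every worker.
    void increment() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            ++_generation;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    unsigned long _generation = 0;
};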

signal that a batch of threads has finished to a masterthread

I think I'm missing a fundamental design pattern concerning multiprogramming.
I have a solution to a problem, but I would say it's overly complex.
At program start, I allocate a static pool of workers and a master thread that live throughout the program run. (Pseudocode below.)
void *worker() {
    while (1) {
        // perworker mutex lock
        // wait for workerSIGNAL
        // do calculations
        // perworker mutex unlock
    }
}
My master thread signals all my workers; when the workers are done, they wait for the next signal from the master thread. (Pseudocode below.)
void *master() {
    while (1) {
        // masterMutex lock
        // wait for masterSignal
        // signal all worker threads to start running
        /*
           SHOULD WAIT FOR ALL WORKER THREADS TO FINISH
           (that is, when workers are done with the calculations
           and are waiting for a new signal)
        */
        // masterMutex unlock
    }
}
My master thread gets a signal from another part of my code (not a thread), which means that only one master thread exists. (Pseudocode below.)
double callMaster() {
    // SIGNAL masterThread
    // return value that is the result of the master thread
}
My problem is: how do I make the master thread wait for all the workers to be done (that is, to be waiting for the next workerSIGNAL)?
My solution is extraordinarily complex: I have a barrier in my worker threads that waits for all worker threads to finish, and then from one of my threads (threadId == 0) I signal a workerDone condition variable that is being waited on at the bottom of my master thread.
It works, but it's not beautiful; any ideas for improvements are much appreciated.
Thanks.
Have you considered using pthread_join (http://kernel.org/doc/man-pages/online/pages/man3/pthread_join.3.html)? It sounds like you're using a signal to communicate between threads. While this might be appropriate in some situations, I think in your case the use of pthread_join might simplify your code.
I've outlined some example pseudo-code below:
// this goes in your main thread
for (int i = 0; i < num_threads; ++i)
    pthread_join(thread_id[i], NULL);
This way your main thread will block until all the worker threads in the thread_id array have terminated.
You want to use a barrier. Barriers are initialized with a count N, and when any thread calls pthread_barrier_wait, it blocks until a total of N threads are at pthread_barrier_wait, and then they all return and the barrier can be used again (with the same count).
See the documentation in POSIX for details:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html
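A sketch of how a barrier fits this master/worker setup (the calculation is a placeholder; both barriers are created with a count of NUM_WORKERS + 1 so that the master takes part in each rendezvous):

#include <pthread.h>
#include <cstdio>

const int NUM_WORKERS = 4;

pthread_barrier_t start_barrier; // master + workers rendezvous to start
pthread_barrier_t done_barrier;  // master + workers rendezvous when done

void *worker(void *arg) {
    long id = (long)arg;
    while (true) {
        pthread_barrier_wait(&start_barrier);        // wait for the master's "go"
        std::printf("worker %ld calculating\n", id); // placeholder calculations
        pthread_barrier_wait(&done_barrier);         // report back to the master
    }
    return NULL;
}

int main() {
    pthread_barrier_init(&start_barrier, NULL, NUM_WORKERS + 1);
    pthread_barrier_init(&done_barrier, NULL, NUM_WORKERS + 1);

    pthread_t tid[NUM_WORKERS];
    for (long i = 0; i < NUM_WORKERS; ++i)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    pthread_barrier_wait(&start_barrier); // master: release the workers
    pthread_barrier_wait(&done_barrier);  // master: block until all finish
    return 0; // demo only: the workers are left running at exit
}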
In Java you can use a CyclicBarrier here, with a party count equal to the number of worker threads plus one (for the main thread).
A reference to this barrier is passed to each worker thread, which calls barrier.await() at the end of a single execution of its work.
The main program will await() at the barrier until all worker threads have reached that point in their execution and called barrier.await(). Only when all parties have called barrier.await() will the barrier be raised and main may continue.
Cyclic barriers are similar to latches, except that the barrier is cyclic, allowing it to be reset indefinitely. So in the case of main being in a loop, a cyclic barrier is the better option.