Signaling mechanism for one thread waiting on several - c++

I'm designing a system where a pool of workers pop jobs out of a queue, and I want the main thread to wait for all that to be done. This is what I've come up with so far (pseudocode):
// Main
launch_signal();
for (auto &worker : pool) {
    // create unique_lock
    if (!worker.done)
        worker.condition_variable.wait(lock, [&] { return worker.done; });
}
// Worker
if (queue.empty()) {
    mutex.lock();
    this->done = true;
    mutex.unlock();
    this->condition_variable.notify_one();
    // wait for launch signal from Main
} else {
    mutex.lock();
    auto job = queue.pop();
    mutex.unlock();
    job.execute();
}
So Main signals that jobs are available, then waits for every worker to signal back. Each worker, meanwhile, keeps popping jobs off the queue until it is empty, then signals done and waits for the next launch signal.
My question: What is a more efficient algorithm for doing this?

The existing code appears to call queue.empty() without holding the mutex. Unless the queue object itself is thread-safe (or at least its empty() method is explicitly documented as thread-safe), this is a data race and therefore undefined behavior.
So the first improvement would be to fix this likely bug.
Otherwise, this is a fairly stock, battle-tested implementation of a worker pool. There's not much room for improvement here.
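For reference, here is a minimal C++11 sketch of such a stock pool with the empty() check moved under the mutex. One assumption differs from the question's design: instead of a done flag and condition variable per worker, this sketch counts unfinished jobs (the hypothetical `pending_`), so main waits on a single condition variable. `JobPool`, `submit` and `wait_all` are illustrative names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Sketch: one mutex guards the queue, the stop flag and the count of
// unfinished jobs. Main waits on all_done_ until that count reaches zero.
class JobPool {
public:
    explicit JobPool(std::size_t n_workers) {
        for (std::size_t i = 0; i < n_workers; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~JobPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        job_ready_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
            ++pending_;
        }
        job_ready_.notify_one();
    }

    // Blocks until every job submitted so far has finished executing.
    void wait_all() {
        std::unique_lock<std::mutex> lock(m_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                // jobs_.empty() is only ever evaluated with m_ held.
                job_ready_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to run
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // execute outside the lock
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) all_done_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable job_ready_;
    std::condition_variable all_done_;
    std::queue<std::function<void()>> jobs_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

Main then calls submit() for each job and wait_all() once per batch; the workers stay alive between batches, so there is no separate launch signal to manage.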
The only suggestion I can make: if there are N worker threads and, after locking the mutex, a thread finds J jobs in the queue, it could remove J/N jobs at once (at least 1) and then execute them in sequence. This assumes that all other threads do the same and that jobs take roughly the same amount of time on average. Taking several jobs per lock acquisition minimizes lock contention.
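A minimal sketch of that batching idea, assuming the jobs sit in a std::deque guarded by a single std::mutex; `pop_batch` is a hypothetical helper, not part of the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <mutex>
#include <vector>

// Hypothetical helper: under a single lock acquisition, take roughly a
// 1/n_workers share of the queued jobs (at least one, if any are queued).
template <typename Job>
std::vector<Job> pop_batch(std::deque<Job>& queue, std::mutex& m,
                           std::size_t n_workers) {
    std::lock_guard<std::mutex> lock(m);
    std::size_t take = queue.size() / std::max<std::size_t>(1, n_workers);
    take = std::max<std::size_t>(1, take);             // J/N, but at least 1...
    take = std::min<std::size_t>(take, queue.size());  // ...and never more than J
    auto last = queue.begin() + static_cast<std::ptrdiff_t>(take);
    std::vector<Job> batch(std::make_move_iterator(queue.begin()),
                           std::make_move_iterator(last));
    queue.erase(queue.begin(), last);
    return batch;  // the caller executes these in sequence, outside the lock
}
```

A worker would then loop over the returned vector and run each job without touching the mutex again. If job durations vary a lot, one thread can end up holding a long tail of work while the others sit idle, which is why the answer hedges on similar job times.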

Related

Multithreading implementation in threads

I am in the process of implementing message passing from one thread to another:
Thread 1: Callback functions are registered with libraries. When a callback fires, the function is invoked and its data needs to be sent to another thread for processing, because the processing takes time.
Thread 2: Checks whether any messages are available (preferably in a queue) and processes them.
Is using a condition_variable with a mutex the correct approach, considering that thread 2's processing takes time, during which thread 1 may add multiple other messages?
The question is a bit vague about how a condition variable and mutex would be used, but yes, there would definitely be a role for such objects. The high-level view would be something like this:
The mutex would protect access to the message queue. Any read or modification of the queue, by any thread, would be done only while holding the mutex locked.
The message-processing thread would block on the CV in the event that it became ready to process a new message but the queue was empty.
The message-generating thread would signal the CV each time it enqueued a new message.
This is exactly a producer / consumer problem, and you can find a lot of information about such problems using that terminology.
But note also that there are multiple message queue implementations already available to serve exactly your purpose ("message queue" is in fact a standard term for these), so you should consider whether you really want to reinvent this wheel.
In general, mutexes are intended to control access between threads, but they are not great for notifying between threads.
If you design Thread 2 to wait on a condition variable, it can simply process messages as they are received from Thread 1.
Here is a rough implementation:
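// myMutex, myQueue and conditionVar are shared by both threads;
// Message is a placeholder for whatever type is queued.
void pushFunction(const Message& data)
{
    // std::unique_lock rather than std::lock_guard: lock_guard has no unlock()
    std::unique_lock<std::mutex> lock(myMutex);
    const bool empty = myQueue.empty();
    myQueue.push(data);
    lock.unlock();
    if (empty)
    {
        conditionVar.notify_one();  // consumer may be waiting on an empty queue
    }
}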
In Thread 2
void waitForMessage()
{
    // condition_variable::wait() requires a std::unique_lock
    std::unique_lock<std::mutex> lock(myMutex);
    while (myQueue.empty())
    {
        conditionVar.wait(lock);
    }
    rxMessage = myQueue.front();
    myQueue.pop();
}
Note that a condition variable can wake up spuriously, so the wait must stay inside the 'while empty' loop.
See https://en.cppreference.com/w/cpp/thread/condition_variable
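std::condition_variable::wait also has an overload that takes a predicate and runs that same 'while empty' loop internally. A small sketch wrapping the mutex, queue and condition variable into one hypothetical MessageQueue class, so Thread 1 only calls push() and Thread 2 only calls pop():

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical wrapper around the mutex / queue / condition variable trio.
template <typename T>
class MessageQueue {
public:
    // Called by the producer (Thread 1).
    void push(T msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(msg));
        }
        cv_.notify_one();  // wake the consumer if it is waiting
    }

    // Called by the consumer (Thread 2); blocks until a message is available.
    // The predicate is re-checked after every wakeup, spurious or not.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

Because the lock is released as soon as pop() returns, Thread 1 can keep enqueuing while Thread 2 spends time processing the message it just took.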

condition_variable without mutex in a lock-free implementation

I have a lock-free single-producer, multiple-consumer queue implemented using std::atomics, in a way similar to Herb Sutter's CppCon 2014 talk.
Sometimes the producer is too slow to feed all consumers, so consumers can starve. I want to prevent starved consumers from banging on the queue, so I added a 10 ms sleep. This value is arbitrary and not optimal. I would like to use a signal that the consumer can send to the producer once there is a free slot in the queue again. In a lock-based implementation, I would naturally use std::condition_variable for this task. However, in my lock-free implementation I am not sure whether it is the right design choice to introduce a mutex only to be able to use std::condition_variable.
Is a mutex the right way to go in this case?
Edit: I have a single producer, which never sleeps, and multiple consumers, which go to sleep if they starve. Thus the whole system is always making progress, so I think it is lock-free.
My current solution is to do this in the consumers' GetData function:
std::unique_lock<std::mutex> lk(_idleMutex);
_readSetAvailableCV.wait(lk);
And this in the producer Thread once new data is ready:
_readSetAvailableCV.notify_all();
If most of your threads are just waiting for the producer to enqueue a resource, I'm not sure a lock-free implementation is even worth the effort. Most of the time your threads will sleep; they won't fight each other for the queue lock.
That is why I think (from the amount of data you have supplied) that changing everything to work with a mutex + condition_variable is just fine. When the producer enqueues a resource, it notifies just one thread (with notify_one()) and releases the queue lock. The consumer that locks the queue dequeues a resource and goes back to sleep if the queue is empty again. There shouldn't be any real "friction" between the threads (if your producer is slow), so I'd go with that.
I just watched this CppCon video about the concurrency TS:
Artur Laksberg #cppcon2015
Somewhere in the middle of this talk, Artur explains exactly how my problem could be solved with barriers and latches. He also shows an existing workaround that uses a condition_variable the way I did, and points out some weak points of using a condition_variable for this purpose, such as spurious wakeups and notifications that are lost because they arrive before you enter wait.
However, in my application these limitations are not a problem, so for now I will use the solution I mentioned in the edit of my post, until latches/barriers are available.
Thanks everybody for commenting.
With minimal change to your design, you can simply use a semaphore. The semaphore starts at zero and is upped every time the producer pushes to the queue. Consumers first down the semaphore before popping from the queue.
C++11 does not provide a semaphore implementation, although one can be emulated with a mutex, a condition variable, and a counter.
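A minimal sketch of that emulation (Semaphore, up, down and try_down are illustrative names). C++20 later added std::counting_semaphore with acquire()/release(), which covers the same need where a C++20 library is available.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Counting semaphore built from a mutex, a condition variable and a counter.
// Starts at 0 ("empty"): the producer calls up() after each push, and each
// consumer calls down() before popping, sleeping while the count is zero.
class Semaphore {
public:
    explicit Semaphore(std::size_t initial = 0) : count_(initial) {}

    void up() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();  // one new item, so wake at most one consumer
    }

    void down() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }

    // Non-blocking variant: returns false instead of sleeping.
    bool try_down() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};
```

Because the count remembers every up(), a notification that arrives before a consumer calls down() is not lost, which is the weak point of the bare condition_variable wait in the question.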
If you really want lock-free behavior when the producer is faster than the consumers, you could use double checked locking.
/* producer */
bool was_empty = q.empty_lock_free();
q.push_lock_free(x);
if (was_empty) {
    scoped_lock l(q.lock());
    if (!q.empty()) {
        q.cond().signal();
    }
}
/* consumers */
for (;;) {
    if (q.empty_lock_free()) {
        scoped_lock l(q.lock());
        while (q.empty()) {
            q.cond().wait();
        }
        x = q.pop();
        if (!q.empty()) {
            q.cond().signal();
        }
    } else {
        try {
            x = q.pop_lock_free();
        } catch (const empty_exception&) {
            continue;
        }
    }
    break;
}
One possibility with pthreads is that a starved thread sleeps with pause() and wakes up with SIGCONT. Each thread has its own awake flag. If any thread is asleep when the producer posts new input, wake one up with pthread_kill().

Increase performance of thread pool (C++, pthreads)

My application has a main thread that assigns tasks to a number of worker threads. The communication pattern is the following:
The thread function (work is a function pointer here):
while(true) {
    pthread_mutex_lock(mutex);
    while(!work)
        pthread_cond_wait(cond, mutex); // wait for work...
    pthread_mutex_unlock(mutex);
    work();
    pthread_barrier_wait(barrier); /*all threads must finish their work*/
    if(thread_id == 0) {
        work = NULL;
        pthread_cond_signal(cond); /*tell the main thread that the work is done*/
    }
    pthread_barrier_wait(barrier); /* make sure that none of the other worker
                                      threads is already waiting on condition again...*/
}
In the main thread (the function that assigns a task to the worker threads):
pthread_mutex_lock(mutex);
work = func;
pthread_cond_broadcast(cond); // tell the worker threads to start...
while(work)
    pthread_cond_wait(cond, mutex); // ...and wait for them to finish
pthread_mutex_unlock(mutex);
I did not use a queue here, because there can only be one task at a time and the main thread has to wait for the task to finish. The pattern works fine, but performance is poor. The problem is that tasks are assigned very often, while each task completes quickly, so the threads suspend and wait on the condition very often. I would like to reduce the number of calls to pthread_mutex_(un)lock, pthread_cond_wait and pthread_barrier_wait, but I do not see how this could be done.
There is only one task at a time.
You don't need scheduling. You don't need threads. You can get rid of the locking.

How to go about multithreading with "priority"?

I have multiple threads processing multiple files in the background, while the program is idle.
To improve disk throughput, I use critical sections to ensure that no two threads ever use the same disk simultaneously.
The (pseudo-)code looks something like this:
void RunThread(HANDLE fileHandle)
{
    // Acquire CRITICAL_SECTION for disk
    CritSecLock diskLock(GetDiskLock(fileHandle));
    for (...)
    {
        // Do some processing on file
    }
}
Once the user requests a file to be processed, I need to stop all threads -- except the one which is processing the requested file. Once the file is processed, then I'd like to resume all the threads again.
Given the fact that SuspendThread is a bad idea, how do I go about stopping all threads except the one that is processing the relevant input?
What kind of threading objects/features would I need -- mutexes, semaphores, events, or something else? And how would I use them? (I'm hoping for compatibility with Windows XP.)
I recommend you go about it in a completely different fashion. If you really want only one thread for every disk (I'm not convinced this is a good idea) then you should create one thread per disk, and distribute files as you queue them for processing.
To implement priority requests for specific files I would then have a thread check a "priority slot" at several points during its normal processing (and of course in its main queue wait loop).
The difficulty here isn't priority as such, it's the fact that you want a thread to back out of a lock that it's holding, to let another thread take it. "Priority" relates to which of a set of runnable threads should be scheduled to run -- you want to make a thread runnable that isn't (because it's waiting on a lock held by another thread).
So, you want to implement (as you put it):
if (ThisThreadNeedsToSuspend()) { ReleaseDiskLock(); WaitForResume(); ReacquireDiskLock(); }
Since you're (wisely) using a scoped lock I would want to invert the logic:
while (file_is_not_finished) {
    WaitUntilThisThreadCanContinue();
    CritSecLock diskLock(blah);
    process_part_of_the_file();
}
ReleasePriority();
...
void WaitUntilThisThreadCanContinue() {
    MutexLock lock(thread_priority_mutex);
    while (thread_with_priority != NOTHREAD and thread_with_priority != thisthread) {
        condition_variable_wait(thread_priority_condvar);
    }
}
void GiveAThreadThePriority(threadid) {
    MutexLock lock(thread_priority_mutex);
    thread_with_priority = threadid;
    condition_variable_broadcast(thread_priority_condvar);
}
void ReleasePriority() {
    MutexLock lock(thread_priority_mutex);
    if (thread_with_priority == thisthread) {
        thread_with_priority = NOTHREAD;
        condition_variable_broadcast(thread_priority_condvar);
    }
}
Read up on condition variables -- all recent OSes have them, with similar basic operations. They're also in Boost and in C++11.
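As one illustration, the pseudocode above maps onto C++11's std::mutex and std::condition_variable roughly as follows. This is a sketch under assumptions: PriorityGate is a hypothetical name, and a default-constructed std::thread::id stands in for NOTHREAD.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// C++11 sketch of the priority gate. A default-constructed std::thread::id
// ("no thread") plays the role of NOTHREAD.
class PriorityGate {
public:
    // Each worker calls this before taking the disk lock for its next chunk.
    void WaitUntilThisThreadCanContinue() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] {
            return owner_ == std::thread::id() ||
                   owner_ == std::this_thread::get_id();
        });
    }

    void GiveAThreadThePriority(std::thread::id id) {
        {
            std::lock_guard<std::mutex> lock(m_);
            owner_ = id;
        }
        cv_.notify_all();  // broadcast: every waiter re-checks who may run
    }

    // Called by the prioritized thread once its file is finished.
    void ReleasePriority() {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (owner_ != std::this_thread::get_id()) return;
            owner_ = std::thread::id();
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::thread::id owner_;  // empty id: no thread currently has priority
};
```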
If it's not possible for you to write a function process_part_of_the_file then you can't structure it this way. Instead you need a scoped lock that can release and regain the disklock. The easiest way to do that is to make it a mutex, then you can wait on a condvar using that same mutex. You can still use the mutex/condvar pair and the thread_with_priority object in much the same way.
You choose the size of "part of the file" according to how responsive you need the system to be to a change in priority. If you need it to be extremely responsive then the scheme doesn't really work -- this is co-operative multitasking.
I'm not entirely happy with this answer, the thread with priority can be starved for a long time if there are a lot of other threads that are already waiting on the same disk lock. I'd put in more thought to avoid that. Possibly there should not be a per-disk lock, rather the whole thing should be handled under the condition variable and its associated mutex. I hope this gets you started, though.
You can ask the threads to stop gracefully: check a variable in a loop inside each thread and continue or stop work depending on its value.
Some thoughts about it:
Setting and checking this value should be done inside a critical section.
Because the critical section slows the thread down, the check should be done often enough to stop the thread quickly when needed, but rarely enough that the thread isn't stalled acquiring and releasing the critical section.
After each worker thread processes a file, it checks a "condition variable" associated with that thread. This could be implemented simply as a bool + critical section, or with the InterlockedExchange* functions. To be honest, I usually just use an unprotected bool between threads to signal "need to exit", sometimes with an event handle if the worker thread could be sleeping.
After setting the flag for each thread, the main thread waits for each thread to exit via WaitForSingleObject.
#include <windows.h>

// "Need to exit" flag guarded by a critical section.
struct ThreadData
{
    CRITICAL_SECTION _cs;
    bool _NeedToExit;

    ThreadData() : _NeedToExit(false)
    {
        InitializeCriticalSection(&_cs);
    }
    ~ThreadData()
    {
        DeleteCriticalSection(&_cs);
    }
    void SetNeedToExit()
    {
        EnterCriticalSection(&_cs);
        _NeedToExit = true;
        LeaveCriticalSection(&_cs);
    }
    bool GetNeedToExit()
    {
        EnterCriticalSection(&_cs);
        bool returnvalue = _NeedToExit;  // read the flag, don't assign it
        LeaveCriticalSection(&_cs);
        return returnvalue;
    }
};

void ProcessNextFile();  // does one unit of work; defined elsewhere

DWORD __stdcall WorkerThread(void* pThreadData)
{
    ThreadData* pData = static_cast<ThreadData*>(pThreadData);
    while (!pData->GetNeedToExit())
    {
        ProcessNextFile();
    }
    return 0;
}

void StopWorkerThread(HANDLE hThread, ThreadData* pData)
{
    pData->SetNeedToExit();
    WaitForSingleObject(hThread, INFINITE);  // block until the worker returns
    CloseHandle(hThread);
}
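If a C++11 compiler is available, std::atomic<bool> provides the same "need to exit" flag without a critical section, and unlike a plain unprotected bool, reading and writing it from different threads is well-defined. A minimal sketch using std::thread in place of CreateThread/WaitForSingleObject; the Worker class and its callback parameter (standing in for ProcessNextFile) are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Worker whose loop checks an atomic stop flag between units of work.
class Worker {
public:
    template <typename Fn>
    explicit Worker(Fn process_next_file)
        : thread_([this, process_next_file] {
              while (!need_to_exit_.load())
                  process_next_file();  // one file per iteration
          }) {}

    // Request a graceful stop and wait for the current file to finish.
    void Stop() {
        need_to_exit_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    ~Worker() { Stop(); }

private:
    std::atomic<bool> need_to_exit_{false};  // declared first, so it is
    std::thread thread_;                     // initialized before the thread starts
};
```

As with the critical-section version, a worker only notices the flag between files, so the stop latency is bounded by the time it takes to process one file.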
You can also use a pool of threads and regulate their work with an I/O completion port.
Normally, the threads in the pool sleep while waiting for activity on the I/O completion port.
When you have a request, the I/O completion port releases a thread, which starts doing the job.
OK, how about this:
Two threads per disk, for high and low priority requests, each with its own input queue.
A high-priority disk task, when initially submitted, will issue its disk requests in parallel with any low-priority task that is running. It can reset a ManualResetEvent that the low-priority thread waits on (WaitForSingleObject) whenever it can, so the low-priority thread gets blocked while the high-priority thread is performing disk operations. The high-priority thread should set the event after finishing a task.
This should limit the disk thrashing to the interval (if any) between the submission of the high-priority task and whenever the low-priority thread can wait on the MRE. Raising the CPU priority of the thread servicing the high-priority queue may help the high-priority work during this interval.
Edit: by 'queue', I mean a thread-safe, blocking, producer-consumer queue, (just to be clear:).
More edit: if the issuing thread needs notification of job completion, the tasks issued to the queues could contain an 'OnCompletion' event to call with the task object as a parameter. The event handler could, for example, signal an AutoResetEvent that the originating thread is waiting on, so providing synchronous notification.
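On Windows, the ManualResetEvent above corresponds to an event created with CreateEvent and bManualReset set to TRUE, driven by SetEvent/ResetEvent. Where that isn't available, the same gate can be sketched portably with a mutex and a condition variable; ManualResetGate and its methods are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>

// Portable stand-in for a manual-reset event: while set, every Wait() returns
// immediately; after Reset(), waiters block until the next Set().
// High-priority thread: Reset() before its disk work, Set() after the task.
// Low-priority thread: Wait() between chunks of its own disk work.
class ManualResetGate {
public:
    explicit ManualResetGate(bool initially_set = true) : set_(initially_set) {}

    void Set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            set_ = true;
        }
        cv_.notify_all();  // release every waiter, like a manual-reset event
    }

    void Reset() {
        std::lock_guard<std::mutex> lock(m_);
        set_ = false;
    }

    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return set_; });
    }

    bool IsSet() {
        std::lock_guard<std::mutex> lock(m_);
        return set_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_;
};
```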

signal that a batch of threads has finished to a masterthread

I think I'm missing a fundamental design pattern concerning multiprogramming.
I've got a solution to a problem, but I would say it's overly complex.
At program start, I'm allocating a static pool of workers and a master thread, that live throughout the program run. (pseudocode below)
void *worker(){
    while(1){
        //perworker mutex lock
        //wait for workerSIGNAL
        //do calculations
        //perworker mutex unlock
    }
}
My master thread signals all my workers, when the workers are done, they wait for the next signal from the master thread. (pseudocode below)
void *master(){
    while(1){
        //masterMutex lock
        //wait for masterSignal
        //signal all workerthread to start running
        /*
        SHOULD WAIT FOR ALL WORKER THREADS TO FINISH
        (that is when workers are done with the calculations,
        and are waiting for a new signal)
        */
        //masterMutex unlock
    }
}
My master thread gets a signal from another part of my code (not a thread), which means that only one master thread exists. (pseudocode below)
double callMaster(){
    //SIGNAL masterThread
    //return value that is the result of the master thread
}
My problem is: how do I make the master thread wait for all the workers to be done (waiting for the next workerSignal)?
My solution is extraordinarily complex.
I have a barrier in my worker threads that waits for all worker threads to finish; then, from one of them (threadId=0), I signal a workerDone condition variable that is waited on at the bottom of my master thread.
It works, but it's not beautiful. Any ideas for improvements are much appreciated.
Thanks.
Have you considered using pthread_join (http://kernel.org/doc/man-pages/online/pages/man3/pthread_join.3.html)? It sounds like you're using a signal to communicate between threads. While that might be appropriate in some situations, I think in your case you might find that pthread_join simplifies your code.
I've outlined some example pseudo-code below:
//this goes in your main thread
for (int i = 0; i < num_threads; ++i)
pthread_join(thread_id[i], ...
This way your main thread will block until all threads, your worker threads, in the thread_id array have terminated.
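A runnable version of that idea, assuming the workers are started fresh for every round and exit when their work is done (rather than living for the whole program, as in the question) and that each one writes its result through its argument pointer. `worker` and `run_round` are hypothetical names, and the doubling is a placeholder for the real calculations. The cost of this simplicity is one thread creation per worker per round.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical worker: one round of work on its own slot, then exit.
static void* worker(void* arg) {
    int* value = static_cast<int*>(arg);
    *value *= 2;  // placeholder for the real calculations
    return nullptr;
}

// Starts one worker per element and joins every thread that was started.
// When this returns, all workers have terminated and their writes are
// visible to the caller. Returns false if a thread could not be created.
bool run_round(std::vector<int>& data) {
    std::vector<pthread_t> thread_id(data.size());
    std::size_t created = 0;
    for (; created < data.size(); ++created)
        if (pthread_create(&thread_id[created], nullptr, worker, &data[created]) != 0)
            break;  // stop spawning, but still join the ones already running
    for (std::size_t i = 0; i < created; ++i)
        pthread_join(thread_id[i], nullptr);  // blocks until thread i exits
    return created == data.size();
}
```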
You want to use a barrier. Barriers are initialized with a count N, and when any thread calls pthread_barrier_wait, it blocks until a total of N threads are at pthread_barrier_wait, and then they all return and the barrier can be used again (with the same count).
See the documentation in POSIX for details:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_barrier_wait.html
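A minimal sketch of that barrier applied to the question's static pool, assuming pthread_barrier_* is available on the target platform. One barrier with a count of N workers plus the master is passed twice per round: once to release the workers and once to wait until all of them are done. Because POSIX barriers reset automatically after each trip, no workerDone condition variable is needed. Pool, WorkerArg, pool_worker, start_pool, master_round and stop_pool are hypothetical names, and the multiplication stands in for the real calculations.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Shared state. The master writes input/quit before the "start" barrier and
// reads results after the "done" barrier; the barrier orders those accesses.
struct Pool {
    pthread_barrier_t barrier;  // count = number of workers + 1 (the master)
    std::vector<pthread_t> threads;
    std::vector<long> results;
    long input = 0;
    bool quit = false;
};

struct WorkerArg { Pool* pool; std::size_t index; };

static void* pool_worker(void* p) {
    WorkerArg* arg = static_cast<WorkerArg*>(p);
    Pool* pool = arg->pool;
    for (;;) {
        pthread_barrier_wait(&pool->barrier);  // wait for "start"
        if (pool->quit) return nullptr;
        pool->results[arg->index] =            // placeholder calculation
            pool->input * static_cast<long>(arg->index + 1);
        pthread_barrier_wait(&pool->barrier);  // report "done"
    }
}

// Creates n long-lived workers; `args` must outlive them.
void start_pool(Pool& pool, std::vector<WorkerArg>& args, std::size_t n) {
    pthread_barrier_init(&pool.barrier, nullptr, static_cast<unsigned>(n + 1));
    pool.results.assign(n, 0);
    pool.threads.resize(n);
    args.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        args[i] = WorkerArg{&pool, i};
        pthread_create(&pool.threads[i], nullptr, pool_worker, &args[i]);
    }
}

// One round: release the workers, then block until all of them are done.
long master_round(Pool& pool, long input) {
    pool.input = input;
    pthread_barrier_wait(&pool.barrier);  // start
    pthread_barrier_wait(&pool.barrier);  // wait for every worker
    long sum = 0;
    for (long r : pool.results) sum += r;
    return sum;
}

void stop_pool(Pool& pool) {
    pool.quit = true;
    pthread_barrier_wait(&pool.barrier);  // workers wake, see quit, and exit
    for (pthread_t t : pool.threads) pthread_join(t, nullptr);
    pthread_barrier_destroy(&pool.barrier);
}
```

The sketch omits error handling: if a pthread_create call fails, the barrier count no longer matches the number of threads and the first round would block forever.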
In Java you can use a CyclicBarrier here, with a party count equal to the number of worker threads plus one for the main thread.
A reference to this barrier is passed to each worker thread, which, at the end of a single execution of its work, calls barrier.await().
The main program will await() at the barrier until all worker threads have reached that point in their execution and called barrier.await().
Only when all worker threads have called barrier.await() will the barrier be raised and main may continue.
Cyclic barriers are similar to latches (CountDownLatch), except that a cyclic barrier resets after each trip, so it can be reused indefinitely.
So in the case of main being in a loop, a cyclic barrier is a better option.