Choice of semaphores, mutexes and condition variables - c++

I have a producer thread and a consumer thread, with the producer being real-time and determinism-sensitive.
Hence I decided to hoist the processing out of the producer thread into a consumer thread, using a lock-free fifo queue. The goal is to have the consumer being both responsive but also avoiding busy-waiting, while never delaying the producer for a indeterminate amount of time; thus any allocations/locks (and kernel entries, I suppose?) etc. are completely out of the question.
I've implemented this pattern, which seems to work well, however I'm unsure of why a mutex is needed at all:
std::mutex m;
std::condition_variable cv;
void consumer()
{
std::unique_lock<std::mutex> lock(m);
while (1)
{
cv.wait(lock);
// process consumation...
}
}
void producer()
{
while (1)
{
// produce and post..
cv.notify_one();
}
}
The other canonical examples seems to lock the mutex in the producer as well, why? My data communication is already thread-safe, so this shouldn't be needed. Also, is this susceptible to missing signals?
And while researching this, I stumble upon semaphores which seem to be used explicitly for this situation. What are the benefits versus this system? I prefer my solution currently, just because it is a part of the standard library.

Semaphores and Condition Variables are somehow similar concepts. At least classical Counting Semaphores aren't available natively from the current c++ standard library. But these can be easily replaced with a std::condition_variable controlling an in-/decremented integer value.
The std::mutex for the condition variable is necessary, to protect from race conditions when changing the underlying value.

Related

Understanding condition_variable::wait for blocking a thread

While implementing a thread pool pattern in C++ based on this, I came across a few questions.
Let's assume minimal code sample:
std::mutex thread_mutex;
std::condition_variable thread_condition;
void thread_func() {
std::unique_lock<std::mutex> lock(thread_mutex);
thread_condition.wait(lock);
lock.unlock();
}
std::thread t1 = std::thread(thread_func);
Regarding cppreference.com about conditon_variable::wait(), wait() causes the current thread to block. What is locking the mutex then for when I only need one thread at all using wait() to get notified when something is to do?
unique_lock will block the thread when the mutex already has been locked by another thread. But this wouldn't be neccesary as long as wait() blocks anyway or what do I miss here?
Adding a few lines at the bottom...
std::thread t2 = std::thread(thread_func);
thread_condition.notify_all()
When unique_lock is blocking the thread, how will notify_all() reach both threads when one of them is locked by unique_lock and the other is blocked by wait()? I understand that blocking wait() will be freed by notify_all() which afterwards leads to unlocking the mutex and that this gives chance to the other thread for locking first the mutex and blocking thread by wait() afterwards. But how is this thread notified than?
Expanding this question by adding a loop in thread_func()...
std::mutex thread_mutex;
std::condition_variable thread_condition;
void thread_func() {
while(true) {
std::unique_lock<std::mutex> lock(thread_mutex);
thread_condition.wait(lock);
lock.unlock();
}
}
std::thread t1 = std::thread(thread_func);
std::thread t2 = std::thread(thread_func);
thread_condition.notify_all()
While reading documentation, I would now expect both threads running endlessly. But they do not return from wait() lock. Why do I have to use a predicate for expected behaviour like this:
bool wakeup = false;
//[...]
thread_condition.wait(lock, [] { return wakeup; });
//[...]
wakeup = !wakeup;
thread_condition.notify_all();
Thanks in advance.
This is really close to being a duplicate, but it's actually that question that answers this one; we also have an answer that more or less answers this question, but the question is distinct. I think that an independent answer is needed, even though it's little more than a (long) definition.
What is a condition variable?
The operational definition is that it's a means for a thread to block until a message arrives from another thread. A mutex alone can't possibly do this: if all other threads are busy with unrelated work, a mutex can't block a thread at all. A semaphore can block a lone thread, but it's tightly bound to the notion of a count, which isn't always appropriate to the nature of the message to receive.
This "channel" can be implemented in several ways. Very low-tech is to use a pipe, but that involves expensive system calls. Windows provides the Event object which is fundamentally a boolean on whose truth a thread may wait. (C++20 provides a similar feature with atomic_flag::wait.)
Condition variables take a different approach: their structural definition is that they are stateless, but have a special connection to a corresponding mutex type. The latter is necessitated by the former: without state, it is impossible to store a message, so arrangements must be made to prevent sending a message during some interval between a thread recognizing the need to wait (by examining some other state: perhaps that the queue from which it wants to pop is empty) and it actually being blocked. Of course, after the thread is blocked it cannot take any action to allow the message to be sent, so the condition variable must do so.
This is implemented by having the thread take a mutex before checking the condition and having wait release that mutex only after the thread can receive the message. (In some implementations, the mutex is also used to protect the workings of the condition variable, but C++ does not do so.) When the message is received, the mutex is re-acquired (which may block the thread again for a time), as is necessary to consult the external state again. wait thus acts like an everted std::unique_lock: the mutex is unlocked during wait and locked again afterwards, with possibly arbitary changes having been made by other threads in the meantime.
Answers
Given this understanding, the individual answers here are trivial:
Locking the mutex allows the waiting thread to safely decide to wait, given that there must be some other thread affecting the state in question.
If the std::unique_lock blocks, some other thread is currently updating the state, which might actually obviate the need for wait.
Any number of threads can be in wait, since each unlocks the mutex when it calls it.
Waiting on a condition variable, er, unconditionally is always wrong: the state you're after might already apply, with no further messages coming.

Multithreading implementation in threads

I am in process of implementing messages passing from one thread to another
Thread 1: Callback functions are registered with libraries, on callback, functions are invoked and needs to be send to another thread for processing as it takes time.
Thread 2: Thread to check if any messages are available(preferrednas in queue) and process the same.
Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?
Is condition_variable usage with mutex a correct approach to start considering thread 2 processing takes time in which multiple other messages can be added by thread 1?
The question is a bit vague about how a condition variable and mutex would be used, but yes, there would definitely be a role for such objects. The high-level view would be something like this:
The mutex would protect access to the message queue. Any read or modification of the queue, by any thread, would be done only while holding the mutex locked.
The message-processing thread would block on the CV in the event that it became ready to process a new message but the queue was empty.
The message-generating thread would signal the CV each time it enqueued a new message.
This is exactly a producer / consumer problem, and you can find a lot of information about such problems using that terminology.
But note also that there are multiple message queue implementations already available to serve exactly your purpose ("message queue" is in fact a standard term for these), so you should consider whether you really want to reinvent this wheel.
In general, mutexes are intended to control access between threads; but not great for notifying between threads.
If you design Thread2 to wait on the condition; you can simply process messages as they are received from Thread1.
Here would be a rough implementation
void pushFunction
{
// Obtain the mutex (preferrably scoped lock in boost or c++17)
std::lock_guard lock(myMutex);
const bool empty = myQueue.empty();
myQueue.push(data);
lock.unlock();
if(empty)
{
conditionVar.notify_one();
}
}
In Thread 2
void waitForMessage()
{
std::lock_guard lock(myMutex);
while (myQueue.empty())
{
conditionVar.wait(lock);
}
rxMessage = myQueue.front();
myQueue.pop();
}
It's important to note that the condition can spuriously wake up so it's important to keep it in the 'while empty' loop.
See https://en.cppreference.com/w/cpp/thread/condition_variable

condition_variable without mutex in a lock-free implementation

I have a lock-free single producer multiple consumer queue implemented using std::atomics in a way similar to Herb Sutters CPPCon2014 talk.
Sometimes, the producer is too slow to feed all consumers, therefore consumers can starve. I want to prevent starved consumers to bang on the queue, therefore I added a sleep for 10ms. This value is arbitrary and not optimal. I would like to use a signal that the consumer can send to the producer once there is a free slot in the queue again. In a lock based implementation, I would naturally use std::condition_variable for this task. However now in my lock-free implementation I am not sure, if it is the right design choice to introduce a mutex, only to be able to use std::condition_variable.
I just want to ask you, if a mutex is the right way to go in this case?
Edit: I have a single producer, which is never sleeping. And there are multiple consumer, who go to sleep if they starve. Thus the whole system is always making progress, therefore I think it is lock-free.
My current solution is to do this in the consumers GetData Function:
std::unique_lock<std::mutex> lk(_idleMutex);
_readSetAvailableCV.wait(lk);
And this in the producer Thread once new data is ready:
_readSetAvailableCV.notify_all();
If most of your threads are just waiting for the producer to enqueue a resource, I'm not that sure a lock-free implementation is even worth the effort. most of the time, your threads will sleep, they won't fight each other for the queue lock.
That is why I think (from the amount of data you have supplied), changing everything to work with a mutex + conditional_variable is just fine. When the producer enqueues a resource it notifies just one thread (with notify_one()) and releases the queue lock. The consumer that locks the queue dequeues a resource and returns to sleep if the queue is empty again. There shouldn't be any real "friction" between the threads (if your producer is slow) so I'd go with that.
I just watched this CPPCON video about the concurrency TS:
Artur Laksberg #cppcon2015
Somewhere in the middle of this talk Artur explains how exactly my problem could be solved with barriers and latches. He also shows an existing workaround using a condition_variable in the way i did. He underlines some weakpoints about the condition_variable used for this purpose, like spurious wake ups and missing notify signals before you enter wait.
However in my application, these limitations are no problem, so that I think for now, I will use the solution that I mentioned in the edit of my post - until latches/barrierers are available.
Thanks everybody for commenting.
With minimal design change to what you have, you can simply use a semaphore. The semaphore begins empty and is upped every time the produces pushes to the queue. Consumers first try to down the semaphore before popping from the queue.
C++11 does not provide a semaphore implementation, although one can be emulated with a mutex, a condition variable, and a counter.†
If you really want lock-free behavior when the producer is faster than the consumers, you could use double checked locking.
/* producer */
bool was_empty = q.empty_lock_free();
q.push_lock_free(x);
if (was_empty) {
scoped_lock l(q.lock());
if (!q.empty()) {
q.cond().signal();
}
}
/* consumers */
for (;;) {
if (q.empty_lock_free()) {
scoped_lock l(q.lock());
while (q.empty()) {
q.cond().wait();
}
x = q.pop();
if (!q.empty()) {
q.cond().signal();
}
} else {
try {
x = q.pop_lock_free();
} catch (empty_exception) {
continue;
}
break;
}
}
One possibility with pthreads is that a starved thread sleeps with pause() and wakes up with SIGCONT. Each thread has its own awake flag. If any thread is asleep when the producer posts new input, wake one up with pthread_kill().

Implement a high performance mutex similar to Qt's one

I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I have used Qt's QMutex for the task, and it works well : the mutex has a negligible overhead.
However, I have to port it to c++11/STL only. When using std::mutex, the performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked. And falls back to a system mutex when concurrent locking occurs.
I would like to implement this in STL. Is there a simple way based on std::atomic and std::mutex ? I have digged in Qt's code but it seems overly complicated for my use (I do not need locks timeouts, pimpl, small footprint etc...).
Edit : I have tried a spinlock, but this does not work well because :
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, so all worker threads get blocked at this time. The spinlocks make the scheduling busy, causing the flush to be 10-100x slower than with a proper mutex. This is not acceptable
Edit : I have tried this, but it's not working (locks all threads)
class Mutex
{
public:
Mutex() : lockCounter(0) { }
void lock()
{
if(lockCounter.fetch_add(1, std::memory_order_acquire)>0)
{
std::unique_lock<std::mutex> lock(internalMutex);
cv.wait(lock);
}
}
void unlock();
{
if(lockCounter.fetch_sub(1, std::memory_order_release)>1)
{
cv.notify_one();
}
}
private:
std::atomic<int> lockCounter;
std::mutex internalMutex;
std::condition_variable cv;
};
Thanks!
Edit : Final solution
MikeMB's fast mutex was working pretty well.
As a final solution, I did:
Use a simple spinlock with a try_lock
When a thread fails to try_lock, instead of waiting, they fill a queue (which is not shared with other threads) and continue
When a thread gets a lock, it updates the buffer with the current result, but also with the results stored in the queue (it processes its queue)
The buffer flushing was made much more efficiently : the blocking part only swaps two pointers.
General Advice
As was mentioned in some comments, I'd first have a look, whether you can restructure your program design to make the mutex implementation less critical for your performance .
Also, as multithreading support in standard c++ is pretty new and somewhat infantile, you sometimes just have to fall back on platform specific mechanisms, like e.g. a futex on linux systems or critical sections on windows or non-standard libraries like Qt.
That being said, I could think of two implementation approaches that might potentially speed up your program:
Spinlock
If access collisions happen very rarely, and the mutex is only hold for short periods of time (two things one should strive to achieve anyway of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and it's simple to implement (taken from cppreference):
class SpinLock {
std::atomic_flag locked ;
public:
void lock() {
while (locked.test_and_set(std::memory_order_acquire)) {
std::this_thread::yield(); //<- this is not in the source but might improve performance.
}
}
void unlock() {
locked.clear(std::memory_order_release);
}
};
The drawback of course is that waiting threads don't stay asleep and steal processing time.
Checked Locking
This is essentially the idea you demonstrated: You first make a fast check, whether locking is actually needed based on an atomic swap operation and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
//Status of the fast mutex
std::atomic<bool> locked;
//helper mutex and vc on which threads can wait in case of collision
std::mutex mux;
std::condition_variable cv;
//the maximum number of threads that might be waiting on the cv (conservative estimation)
std::atomic<int> cntr;
FastMux():locked(false), cntr(0){}
void lock() {
if (locked.exchange(true)) {
cntr++;
{
std::unique_lock<std::mutex> ul(mux);
cv.wait(ul, [&]{return !locked.exchange(true); });
}
cntr--;
}
}
void unlock() {
locked = false;
if (cntr > 0){
std::lock_guard<std::mutex> ul(mux);
cv.notify_one();
}
}
};
Note that the std::mutex is not locked in between lock() and unlock() but it is only used for handling the condition variable. This results in more calls to lock / unlock if there is high congestion on the mutex.
The problem with your implementation is, that cv.notify_one(); can potentially be called between if(lockCounter.fetch_add(1, std::memory_order_acquire)>0) and cv.wait(lock); so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation though so you just have to see what works best for you.
Not really an answer per definition, but depending on the specific task, a lock-free queue might help getting rid of the mutex at all. This would help the design, if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
Update wrt to comments:
Queue size / overflow:
Queue overflowing can be avoided by i) making the queue large enough or ii) by making the producer thread wait with pushing data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer for checking in regular intervals (polling) for the queue being non-empty, once it is non-empty the thread can continuously fetch data and the go back into wait-mode.

pthreads: thread starvation caused by quick re-locking

I have a two threads, one which works in a tight loop, and the other which occasionally needs to perform a synchronization with the first:
// thread 1
while(1)
{
lock(work);
// perform work
unlock(work);
}
// thread 2
while(1)
{
// unrelated work that takes a while
lock(work);
// synchronizing step
unlock(work);
}
My intention is that thread 2 can, by taking the lock, effectively pause thread 1 and perform the necessary synchronization. Thread 1 can also offer to pause, by unlocking, and if thread 2 is not waiting on lock, re-lock and return to work.
The problem I have encountered is that mutexes are not fair, so thread 1 quickly re-locks the mutex and starves thread 2. I have attempted to use pthread_yield, and so far it seems to run okay, but I am not sure it will work for all systems / number of cores. Is there a way to guarantee that thread 1 will always yield to thread 2, even on multi-core systems?
What is the most effective way of handling this synchronization process?
You can build a FIFO "ticket lock" on top of pthreads mutexes, along these lines:
#include <pthread.h>
typedef struct ticket_lock {
pthread_cond_t cond;
pthread_mutex_t mutex;
unsigned long queue_head, queue_tail;
} ticket_lock_t;
#define TICKET_LOCK_INITIALIZER { PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER }
void ticket_lock(ticket_lock_t *ticket)
{
unsigned long queue_me;
pthread_mutex_lock(&ticket->mutex);
queue_me = ticket->queue_tail++;
while (queue_me != ticket->queue_head)
{
pthread_cond_wait(&ticket->cond, &ticket->mutex);
}
pthread_mutex_unlock(&ticket->mutex);
}
void ticket_unlock(ticket_lock_t *ticket)
{
pthread_mutex_lock(&ticket->mutex);
ticket->queue_head++;
pthread_cond_broadcast(&ticket->cond);
pthread_mutex_unlock(&ticket->mutex);
}
Under this kind of scheme, no low-level pthreads mutex is held while a thread is within the ticketlock protected critical section, allowing other threads to join the queue.
In your case it is better to use condition variable to notify second thread when it is required to awake and perform all required operations.
pthread offers a notion of thread priority in its API. When two threads are competing over a mutex, the scheduling policy determines which one will get it. The function pthread_attr_setschedpolicy lets you set that, and pthread_attr_getschedpolicy permits retrieving the information.
Now the bad news:
When only two threads are locking / unlocking a mutex, I can’t see any sort of competition, the first who runs the atomic instruction takes it, the other blocks. I am not sure whether this attribute applies here.
The function can take different parameters (SCHED_FIFO, SCHED_RR, SCHED_OTHER and SCHED_SPORADIC), but in this question, it has been answered that only SCHED_OTHER was supported on linux)
So I would give it a shot if I were you, but not expect too much. pthread_yield seems more promising to me. More information available here.
Ticket lock above looks like the best. However, to insure your pthread_yield works, you could have a bool waiting, which is set and reset by thread2. thread1 yields as long as bool waiting is set.
Here's a simple solution which will work for your case (two threads). If you're using std::mutex then this class is a drop-in replacement. Change your mutex to this type and you are guaranteed that if one thread holds the lock and the other is waiting on it, once the first thread unlocks, the second thread will grab the lock before the first thread can lock it again.
If more than two threads happen to use the mutex simultaneously it will still function but there are no guarantees on fairness.
If you're using plain pthread_mutex_t you can easily change your locking code according to this example (unlock remains unchanged).
#include <mutex>
// Behaves the same as std::mutex but guarantees fairness as long as
// up to two threads are using (holding/waiting on) it.
// When one thread unlocks the mutex while another is waiting on it,
// the other is guaranteed to run before the first thread can lock it again.
class FairDualMutex : public std::mutex {
public:
void lock() {
_fairness_mutex.lock();
std::mutex::lock();
_fairness_mutex.unlock();
}
private:
std::mutex _fairness_mutex;
};