Is this implementation of inter-process Producer Consumer correct and safe against process crash? - c++

I am developing a message queue between two processes on Windows.
I would like to support multiple producers and one consumer.
The queue must not be corrupted by the crash of one of the processes; that is, the other processes must not be affected by the crash, and when the crashed process is restarted it can continue communication (with the new, updated state).
Assume that the event objects in these snippets are wrappers for named Windows auto-reset events and the mutex objects are wrappers for named Windows mutexes (I used the C++ non-interprocess mutex type as a placeholder).
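For concreteness, a minimal sketch of what such wrappers might look like (this is my assumption of their shape, not the actual code; error handling is illustrative):

#include <windows.h>
#include <stdexcept>

// Sketch of a named auto-reset event wrapper (assumed, not the real code).
class NamedAutoResetEvent
{
public:
    explicit NamedAutoResetEvent(const wchar_t* name)
        // bManualReset = FALSE makes it auto-reset; initial state is non-signalled
        : _handle(CreateEventW(nullptr, FALSE, FALSE, name))
    {
        if (!_handle) throw std::runtime_error("CreateEventW failed");
    }
    ~NamedAutoResetEvent() { CloseHandle(_handle); }
    void Set() { SetEvent(_handle); }
    void Wait(DWORD timeoutMs) { WaitForSingleObject(_handle, timeoutMs); }
private:
    HANDLE _handle;
};

// Sketch of a named mutex wrapper. WAIT_ABANDONED is the crash-recovery
// hook: it still grants ownership, but tells the new owner that the
// previous owner died while holding the mutex.
class NamedMutex
{
public:
    explicit NamedMutex(const wchar_t* name)
        : _handle(CreateMutexW(nullptr, FALSE, name))
    {
        if (!_handle) throw std::runtime_error("CreateMutexW failed");
    }
    ~NamedMutex() { CloseHandle(_handle); }
    void lock() // lock/unlock naming lets unique_lock<NamedMutex> work
    {
        DWORD r = WaitForSingleObject(_handle, INFINITE);
        if (r != WAIT_OBJECT_0 && r != WAIT_ABANDONED)
            throw std::runtime_error("WaitForSingleObject failed");
    }
    void unlock() { ReleaseMutex(_handle); }
private:
    HANDLE _handle;
};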
This is the producer side:
void producer()
{
    for (;;)
    {
        // Multiple producers modify _writeOffset so must be given exclusive access
        unique_lock<mutex> excludeProducers(_producerMutex);

        // A snapshot of the readOffset is sufficient because we use _notFullEvent.
        long readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);

        // while is required because _notFullEvent.Wait might return because it was abandoned
        while (IsFull(readOffset, _writeOffset))
        {
            _notFullEvent.Wait(INFINITE);
            readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);
        }

        // use a mutex to protect the resource from the consumer
        {
            unique_lock<mutex> lockResource(_resourceMutex);
            produce(_writeOffset);
        }

        // update the state
        InterlockedExchange(&_writeOffset, IncrementOffset(_writeOffset));
        _notEmptyEvent.Set();
    }
}
Similarly, this is the consumer side:
void consumer()
{
    for (;;)
    {
        long writeOffset = InterlockedCompareExchange(&_writeOffset, 0, 0);
        while (IsEmpty(_readOffset, writeOffset))
        {
            _notEmptyEvent.Wait(INFINITE);
            writeOffset = InterlockedCompareExchange(&_writeOffset, 0, 0);
        }

        {
            unique_lock<mutex> lockResource(_resourceMutex);
            consume(_readOffset);
        }

        InterlockedExchange(&_readOffset, IncrementOffset(_readOffset));
        _notFullEvent.Set();
    }
}
Are there any race conditions in this implementation?
Is it indeed protected against crashes as required?
P.S. The queue meets the requirements if the state of the queue is protected. If the crash occurred within produce(i) or consume(i), the contents of those slots might be corrupted, and other means will be used to detect and maybe even correct corruption of those slots. Those means are out of the scope of this question.

There is indeed a race condition in this implementation.
Thank you @VTT for pointing it out.
@VTT wrote that if the producer dies right before _notEmptyEvent.Set(); then the consumer may get stuck forever.
Well, maybe not forever, because when the producer is restarted it will add an item and wake up the consumer again. But the state has indeed been corrupted. If, for instance, this happens QUEUE_SIZE times, the producer will see that the queue is full (IsFull() will return true) and it will wait. This is a deadlock.
I am considering the following solution to this, adding the commented code on the producer side. A similar addition should be made on the consumer side:
void producer()
{
    for (;;)
    {
        // Multiple producers modify _writeOffset so must be given exclusive access
        unique_lock<mutex> excludeProducers(_producerMutex);

        // A snapshot of the readOffset is sufficient because we use _notFullEvent.
        long readOffset = InterlockedCompareExchange(&_readOffset, 0, 0);

        // ====================== Added begin
        if (!IsEmpty(readOffset, _writeOffset))
        {
            _notEmptyEvent.Set();
        }
        // ======================= end Added

        // while is required because _notFullEvent.Wait might return because it was abandoned
        while (IsFull(readOffset, _writeOffset))
This will cause the producer to wake up the consumer whenever it gets the chance to run, if indeed the queue is now not empty.
This is looking more like a solution based on condition variables, which would have been my preferred pattern, were it not for the unfortunate fact that on Windows, condition variables are not named and therefore cannot be shared between processes.
If this solution is voted correct, I will edit the original post with the complete code.

So there are a few problems with the code posted in the question:
As already noted, there's a marginal race condition; if the queue were to become full, and all the active producers crashed before setting _notFullEvent, your code would deadlock. Your answer correctly resolves that problem by setting the event at the start of the loop rather than the end.
You're over-locking; there's typically little point in having multiple producers if only one of them is going to be producing at a time. This means you can't write directly into shared memory; you'll need a local cache. (It isn't impossible to have multiple producers writing directly into different slots in the shared memory, but it would make robustness much more difficult to achieve.)
Similarly, you typically need to be able to produce and consume simultaneously, and your code doesn't allow this.
Here's how I'd do it, using a single mutex (shared by both consumer and producer threads) and two auto-reset event objects.
void consumer(void)
{
    claim_mutex();
    for (;;)
    {
        if (!IsFull(*read_offset, *write_offset))
        {
            // Queue is not full, make sure at least one producer is awake
            SetEvent(notFullEvent);
        }

        while (IsEmpty(*read_offset, *write_offset))
        {
            // Queue is empty, wait for producer to add a message
            release_mutex();
            WaitForSingleObject(notEmptyEvent, INFINITE);
            claim_mutex();
        }

        release_mutex();
        consume(*read_offset);
        claim_mutex();

        *read_offset = IncrementOffset(*read_offset);
    }
}
void producer(void)
{
    claim_mutex();
    for (;;)
    {
        if (!IsEmpty(*read_offset, *write_offset))
        {
            // Queue is not empty, make sure consumer is awake
            SetEvent(notEmptyEvent);
        }

        if (!IsFull(*read_offset, *write_offset))
        {
            // Queue is not full, make sure at least one other producer is awake
            SetEvent(notFullEvent);
        }

        release_mutex();
        produce_in_local_cache();
        claim_mutex();

        while (IsFull(*read_offset, *write_offset))
        {
            // Queue is full, wait for consumer to remove a message
            release_mutex();
            WaitForSingleObject(notFullEvent, INFINITE);
            claim_mutex();
        }

        copy_from_local_cache_to_shared_memory(*write_offset);
        *write_offset = IncrementOffset(*write_offset);
    }
}
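For reference, the IsFull / IsEmpty / IncrementOffset helpers that both the question and this answer assume would be the usual ring-buffer arithmetic, something like this (my assumption; the question doesn't show them):

const long QUEUE_SIZE = 16; // illustrative capacity

long IncrementOffset(long offset)
{
    return (offset + 1) % QUEUE_SIZE;
}

bool IsEmpty(long readOffset, long writeOffset)
{
    // Read and write positions coincide only when nothing is queued.
    return readOffset == writeOffset;
}

bool IsFull(long readOffset, long writeOffset)
{
    // One slot is kept empty so that "full" is distinguishable from "empty".
    return IncrementOffset(writeOffset) == readOffset;
}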

Related

Is there any way to wake up multiple threads at the same time in C/C++?

Well, actually, I'm not asking that the threads must "line up" to work; I just want to notify multiple threads, so I'm not looking for a barrier.
It's kind of like condition_variable::notify_all(), but I don't want the threads to wake up one by one, which may cause starvation (also a potential problem with multiple semaphore post operations). It's kind of like:
std::atomic_flag flag = ATOMIC_FLAG_INIT;

void example() {
    if (!flag.test_and_set()) {
        // this is the thread to do the job, and notify others
        do_something();
        notify_others(); // this is what I'm looking for
        flag.clear();
    } else {
        // this is the waiting thread
        wait_till_notification();
        do_some_other_thing();
    }
}

void runner() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; ++i) {
        threads.emplace_back([]() {
            while (true) {
                example();
            }
        });
    }
    // ...
}
So how can I do this in C/C++, or maybe with the POSIX API?
Sorry, I didn't make this question clear enough; I'll add some more explanation.
It's not the thundering herd problem I'm talking about, and yes, it's the lock re-acquisition that bothers me. I tried shared_mutex, and there's still a problem.
Let me split the threads into two groups: one leader thread, which does the writing job, and the other worker threads, which do the reading job.
Actually they're all equal in the program; the leader thread is simply the first thread to get access to the job (you can think of it as the thread that finds the shared buffer underflowed). Once the job is done, the other workers just need to be notified that they have access.
If a mutex is used here, any thread would block the others.
To give an example: the main thread's job do_something() here is a read, and it blocks the main thread; thus the whole system is blocked.
Unfortunately, shared_mutex won't solve this problem:
std::shared_mutex lk; // assumed shared by all threads

void example() {
    if (!flag.test_and_set()) {
        // leader thread:
        lk.lock();
        do_something();
        lk.unlock();
        flag.clear();
    } else {
        // worker thread
        lk.lock_shared();
        do_some_other_thing();
        lk.unlock_shared();
    }
}

// outer loop
void looper() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 10; ++i) {
        threads.emplace_back([]() {
            while (true) {
                example();
            }
        });
    }
}
In this code, if the leader's job is done and there's not much to do between this unlock and the next lock (remember they're in a loop), the leader may grab the lock again, leaving the workers with nothing to do, which is why I called it starvation earlier.
To explain the blocking in do_something(): I don't want this part of the job to take all my CPU time, even when the leader's job is not ready (no data has arrived to read).
std::call_once may still not be the answer to this, because, as you can see, the workers must wait until the leader's job has finished.
To summarize, this is actually a one-producer, multi-consumer problem.
But I want the consumers to do their job as soon as the product is ready for them, and any thread can be the producer or a consumer: whichever thread is first to find the product has run out becomes the producer, and the others are automatically consumers.
Unfortunately, I'm not sure whether this idea would work.
It's kind of like condition_variable::notify_all(), but I don't want the threads to wake up one by one, which may cause starvation
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.
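A minimal sketch of that idea (C++17; the names here are illustrative). Each waiter holds only a shared lock, so after notify_all() they can all re-acquire their locks concurrently instead of serializing; the notifier takes a brief exclusive lock only to publish the flag:

#include <condition_variable>
#include <mutex>
#include <shared_mutex>

std::shared_mutex sharedMtx;
std::condition_variable_any cv; // works with any lockable, unlike condition_variable
bool ready = false;

void waiter()
{
    std::shared_lock<std::shared_mutex> lk(sharedMtx);
    cv.wait(lk, [] { return ready; }); // re-acquires only a *shared* lock on wake-up
    // do_some_other_thing();
}

void notifier()
{
    {
        std::unique_lock<std::shared_mutex> lk(sharedMtx);
        ready = true;
    }
    cv.notify_all(); // all waiters can then run concurrently under shared locks
}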
It sounds like you are looking for std::call_once:
#include <mutex>

void example()
{
    static std::once_flag flag;
    bool i_did_once = false;
    std::call_once(flag, [&i_did_once]() {
        i_did_once = true;
        do_something();
    });
    if (!i_did_once)
        do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking of the thundering herd problem? That may arise if do_some_other_thing takes a mutex, but in that case you'll have to describe your problem in more detail.

An odd use of a condition variable with a local mutex

Poring through the legacy code of an old and large project, I found that an odd method was used to create a thread-safe queue, something like this:
template <typename _Msg>
class WaitQue : public QWaitCondition
{
public:
    typedef _Msg DataType;

    void wakeOne(const DataType& msg)
    {
        QMutexLocker lock_(&mx);
        que.push(msg);
        QWaitCondition::wakeOne();
    }

    void wait(DataType& msg)
    {
        /// wait if empty.
        {
            QMutex wx; // WHAT?
            QMutexLocker cvlock_(&wx);
            if (que.empty())
                QWaitCondition::wait(&wx);
        }
        {
            QMutexLocker _wlock(&mx);
            msg = que.front();
            que.pop();
        }
    }

    unsigned long size() {
        QMutexLocker lock_(&mx);
        return que.size();
    }

private:
    std::queue<DataType> que;
    QMutex mx;
};
wakeOne is used from threads as a kind of "posting" function, and wait is called from other threads and waits indefinitely until a message appears in the queue. In some cases the roles of the threads reverse at different stages, using separate queues.
Is it even legal to use a QMutex this way, by creating a local one? I kind of understand why someone might do that to dodge a deadlock while reading the size of que, but how does it even work? Is there a simpler and more idiomatic way to achieve this behavior?
It's legal to have a local mutex for the wait, but it normally makes no sense.
As you've worked out, in this case it's wrong. You should be using the member mutex:
void wait(DataType& msg)
{
    QMutexLocker cvlock_(&mx);
    while (que.empty())
        QWaitCondition::wait(&mx);
    msg = que.front();
    que.pop();
}
Notice also that you must have while instead of if around the call to QWaitCondition::wait. This is for complex reasons involving (possible) spurious wake-ups - the Qt docs aren't clear here. But more importantly, the fact that the wake and the subsequent re-acquisition of the mutex are not one atomic operation means you must recheck the queue for emptiness. It could be this last case where you were previously getting deadlocks/UB.
Consider the scenario of an empty queue and a caller (thread 1) blocking in QWaitCondition::wait. Then thread 2 comes along, adds an item to the queue, and calls wakeOne. Thread 1 gets woken up and tries to reacquire the mutex. However, in your implementation of wait, thread 3 comes along, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1, which has been woken up, finally acquires the mutex, returns from QWaitCondition::wait, and tries to process... an empty queue. Yikes.

Deadlock with boost::condition_variable

I am a bit stuck with this problem, so this is my cry for help.
I have a manager that pushes some events to a queue, which is processed in another thread.
I don't want this thread to be 'busy waiting' for events in the queue, because it may be empty all the time (as well as it may always be full).
Also I need m_bShutdownFlag to stop the thread when needed.
So I wanted to try a condition_variable for this case: if something was pushed to a queue, then the thread starts its work.
Simplified code:
class SomeManager {
public:
    SomeManager()
        : m_bShutdownFlag(false) {}

    void Initialize() {
        boost::recursive_mutex::scoped_lock lock(m_mtxThread);
        boost::thread thread(&SomeManager::ThreadProc, this);
        m_thread.swap(thread);
    }

    void Shutdown() {
        boost::recursive_mutex::scoped_lock lock(m_mtxThread);
        if (m_thread.get_id() != boost::thread::id()) {
            boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents);
            m_bShutdownFlag = true;
            m_condEvents.notify_one();
            m_queue.clear();
        }
    }

    void QueueEvent(const SomeEvent& event) {
        boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents);
        m_queue.push_back(event);
        m_condEvents.notify_one();
    }

private:
    static void ThreadProc(SomeManager* pMgr) {
        while (true) {
            boost::unique_lock<boost::mutex> lockEvents(pMgr->m_mtxEvents);
            while (!(pMgr->m_bShutdownFlag || pMgr->m_queue.empty()))
                pMgr->m_condEvents.wait(lockEvents);
            if (pMgr->m_bShutdownFlag)
                break;
            else
                /* Thread-safe processing of all the events in m_queue */
        }
    }

    boost::thread m_thread;
    boost::recursive_mutex m_mtxThread;
    bool m_bShutdownFlag;
    boost::mutex m_mtxEvents;
    boost::condition_variable m_condEvents;
    SomeThreadSafeQueue m_queue;
};
But when I test it with two (or more) almost simultaneous calls to QueueEvent, it gets locked at the line boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents); forever.
It seems like the first call never releases lockEvents, so all the rest just keep waiting for it to be freed.
Please help me find out what I am doing wrong and how to fix it.
There are a few things to point out about your code:
You may wish to join your thread after calling shutdown, to ensure that your main thread doesn't finish before your other thread.
m_queue.clear(); on shutdown is done outside of your m_mtxEvents mutex lock, meaning it's not as thread-safe as you think it is.
your 'thread-safe processing' of the queue should just take an item off and then release the lock while you go off to process the event (see the sketch after these answers). You've not shown that explicitly, but failure to do so will result in the lock preventing items from being added.
The good news about a thread blocking like this, is that you can trivially break and inspect what the other threads are doing, and locate the one that is holding the lock. It might be that as per my comment #3 you're just taking a long time to process an event. On the other hand it may be that you've got a dead lock. In any case, what you need is to use your debugger to establish exactly what you've done wrong, since your sample doesn't have enough in it to demonstrate your problem.
Inside ThreadProc's while (true) loop, lockEvents is not unlocked in any case. Try putting the lock and the wait inside a scope.
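To illustrate what both answers are suggesting, here is a sketch of ThreadProc that scopes the lock, waits on a predicate, and processes each event outside the lock. Note that the predicate also fixes the inverted wait condition from the question (it should wait while the queue is empty, not while it is non-empty). ProcessEvent is a hypothetical handler, and I'm assuming SomeThreadSafeQueue exposes front()/pop_front():

static void ThreadProc(SomeManager* pMgr) {
    for (;;) {
        boost::unique_lock<boost::mutex> lockEvents(pMgr->m_mtxEvents);
        // Wait while there is nothing to do and we are not shutting down.
        pMgr->m_condEvents.wait(lockEvents, [pMgr] {
            return pMgr->m_bShutdownFlag || !pMgr->m_queue.empty();
        });
        if (pMgr->m_bShutdownFlag)
            return;
        SomeEvent event = pMgr->m_queue.front();
        pMgr->m_queue.pop_front();
        lockEvents.unlock(); // release before the potentially slow work
        ProcessEvent(event); // producers can queue freely while we process
    }
}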

Queued thread notification

So that you can picture my problem, I'll describe the usage of my design:
In the class SerialInterface there is a thread that checks every 10 ms whether a message has been received. The class is implemented as an Observer pattern to notify other classes about the newly received message/byte.
The Notify method of the Observer pattern blocks until every observer has done its operation. Because I want to avoid any lag, I would like to notify the observers asynchronously.
My first thought was events (condition variables in C++11).
The implementation would look like this:
class SerialInterface : public Observer {
private:
    .....
    void NotifyThread() {
        while (mRunThreadNotify) {
            std::unique_lock<std::mutex> lock(mMutex);
            mCv.wait(lock);
            NotifyObservers();
        }
    }

    std::mutex mMutex;
    std::condition_variable mCv;
    std::atomic_bool mRunThreadNotify;
    std::thread mThreadNotify;
    .....
};
Now I can notify asynchronously via mCv.notify_all();
The problem now is the following:
What if NotifyThread() is currently notifying the observers, but a new notify event arrives at the same time? It would complete the current notification, and the new state would be skipped.
So my second approach was to create a counter for notifications and let it act like a queue:
class SerialInterface : public Observer {
public:
    ....
private:
    .....
    void NotifyThread() {
        while (mRunThreadNotify) {
            if (mNotifications > 0) {
                NotifyObservers();
                mNotifications--;
            } else {
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        }
    }

    std::atomic<size_t> mNotifications;
    std::atomic_bool mRunThreadNotify;
    std::thread mThreadNotify;
    .....
};
Here I have to increment the variable mNotifications to notify the observers. But this solution doesn't look perfect to me, as it uses std::this_thread::sleep_for with a fixed waiting time.
Are there any suggestions or other approaches for this problem?
It seems to me that you want to separate the real-time behavior (the 10 ms serial poll) from the rest of the program, so that the real-time thread will never be held off waiting for any other routines. Given that, my suggestion would be to split the pattern into two parts:
The real-time part, which does nothing but receive incoming serial data and append it to the end of a FIFO queue (in a thread-safe manner, of course).
The non-real-time part (running in a different thread), in which data is popped from the head of the FIFO queue and handed around to all of the software components that want to react to it. This part can be as fast or as slow as it likes, since it will not hold up the real-time thread.
The FIFO queue part is a standard producer-consumer problem; there are various ways to implement it, but the way I usually do it is with a deque, a lock, and a condition variable (pseudocode):
// Called by the real-time/serial thread when it receives serial data
void AppendBytesToQueue(const TheBytesObject& bytes)
{
    bool wasQueueEmptyBefore;
    m_lock.lock();
    wasQueueEmptyBefore = (m_fifo.size() == 0);
    m_fifo.push_back(bytes);
    m_lock.unlock();
    if (wasQueueEmptyBefore) m_condition_variable.signal();
}

// Called by the non-real-time/handling thread after it was
// woken up by the condition variable's signal (outQueue should
// be a reference to an empty deque that gets filled by this
// method)
void GetNewBytesFromQueue(std::deque<TheBytesObject>& outQueue)
{
    m_lock.lock();
    std::swap(m_fifo, outQueue); // fast O(1) operation so m_lock won't be held for long
    m_lock.unlock();
}
... and then after calling GetNewBytesFromQueue(), the handling/non-real-time thread can iterate over the contents of its temporary deque and deal with each item in order, without any risk of affecting the serial thread's performance.
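The waiting side that pairs with that signal might look like this, in the same pseudocode style (the names are the ones assumed above):

// Called by the non-real-time/handling thread: block until the serial
// thread signals that data has arrived, then drain the FIFO in one swap.
void WaitForAndHandleBytes()
{
    std::deque<TheBytesObject> localQueue;
    m_lock.lock();
    while (m_fifo.empty())
        m_condition_variable.wait(m_lock); // releases m_lock while blocked
    std::swap(m_fifo, localQueue);
    m_lock.unlock();
    // ... iterate over localQueue here, without holding the lock ...
}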
When a notification is received, you can check whether your requirements have been met at that time.
The requirement can be specified as a predicate in the second argument to wait().
mCvNotifications.wait(lock, [](){return true_if_requirements_met;});
If the requirement has not been met, the thread will stay in the wait state despite the notification.
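The predicate overload is just shorthand for the usual re-check loop; it is equivalent to:

// What wait(lock, pred) does internally, in effect:
while (!true_if_requirements_met)
    mCvNotifications.wait(lock);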

C++ multithreading, simple consumer / producer threads, LIFO, notification, counter

I am new to multithreaded programming, and I want to implement the following functionality.
There are 2 threads, producer and consumer.
The consumer only processes the latest value, i.e., last in, first out (LIFO).
The producer sometimes generates new values at a faster rate than the consumer can process them. For example, the producer may generate 2 new values in 1 millisecond, but it takes the consumer approximately 5 milliseconds to process one.
If the consumer receives a new value in the middle of processing an old value, there is no need to interrupt it. In other words, the consumer will finish its current execution first, then start an execution on the latest value.
Here is my design process, please correct me if I am wrong.
There is no need for a queue, since only the latest value is processed by the consumer.
Are notifications sent from the producer queued automatically???
I will use a counter instead.
ConsumerThread() checks the counter at the end, to make sure the producer hasn't generated a new value.
But what happens if the producer generates a new value just before the consumer goes to sleep(), but after it has checked the counter???
Here is some pseudo code.
boost::mutex mutex;
double x;

void ProducerThread()
{
    {
        boost::mutex::scoped_lock lock(mutex);
        x = rand();
        counter++;
    }
    notify(); // wake up consumer thread
}

void ConsumerThread()
{
    counter = 0; // reset counter, only process the latest value
    ... do something which takes 5 milliseconds ...
    if (counter > 0)
    {
        ... execute this function again, not too sure how to implement this ...
    }
    else
    {
        ... what happens if the producer generates a new value here??? ...
        sleep();
    }
}
Thanks.
If I understood your question correctly, for your particular application, the consumer only needs to process the latest available value provided by the producer. In other words, it's acceptable for values to get dropped because the consumer cannot keep up with the producer.
If that's the case, then I agree that you can get away without a queue and use a counter. However, the shared counter and value variables will need to be accessed atomically.
You can use boost::condition_variable to signal notifications to the consumer that a new value is ready. Here is a complete example; I'll let the comments do the explaining.
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <boost/thread/locks.hpp>
#include <boost/date_time/posix_time/posix_time_types.hpp>
#include <cstdlib>
#include <iostream>

boost::mutex mutex;
boost::condition_variable condvar;

typedef boost::unique_lock<boost::mutex> LockType;

// Variables that are shared between producer and consumer.
double value = 0;
int count = 0;

void producer()
{
    while (true)
    {
        {
            // value and counter must both be updated atomically
            // using a mutex lock
            LockType lock(mutex);
            value = std::rand();
            ++count;

            // Notify the consumer that a new value is ready.
            condvar.notify_one();
        }

        // Simulate exaggerated 2ms delay
        boost::this_thread::sleep(boost::posix_time::milliseconds(200));
    }
}

void consumer()
{
    // Local copies of 'count' and 'value' variables. We want to do the
    // work using local copies so that they don't get clobbered by
    // the producer when it updates.
    int currentCount = 0;
    double currentValue = 0;

    while (true)
    {
        {
            // Acquire the mutex before accessing 'count' and 'value' variables.
            LockType lock(mutex); // mutex is locked while in this scope

            while (count == currentCount)
            {
                // Wait for producer to signal that there is a new value.
                // While we are waiting, Boost releases the mutex so that
                // other threads may acquire it.
                condvar.wait(lock);
            }

            // `lock` is automatically re-acquired when we come out of
            // condvar.wait(lock). So it's safe to access the 'count' and
            // 'value' variables at this point.
            currentCount = count;  // Remember which update we have consumed,
                                   // so that we block again until the next one.
            currentValue = value;  // Grab a copy of the latest value
                                   // while we hold the lock.
        }

        // Now that we are out of the mutex lock scope, we work with our
        // local copy of `value`. The producer can keep on clobbering the
        // 'value' variable all it wants, but it won't affect us here
        // because we are now using `currentValue`.
        std::cout << "value = " << currentValue << "\n";

        // Simulate exaggerated 5ms delay
        boost::this_thread::sleep(boost::posix_time::milliseconds(500));
    }
}

int main()
{
    boost::thread c(&consumer);
    boost::thread p(&producer);
    c.join();
    p.join();
}
ADDENDUM
I was thinking about this question recently, and realized that this solution, while it may work, is not optimal. Your producer is using all that CPU just to throw away half of the computed values.
I suggest that you reconsider your design and go with a bounded blocking queue between the producer and consumer. Such a queue should have the following characteristics:
Thread-safe
The queue has a fixed size (bounded)
If the consumer wants to pop the next item, but the queue is empty, the operation will be blocked until notified by the producer that an item is available.
The producer can check if there's room to push another item and block until the space becomes available.
With this type of queue, you can effectively throttle down the producer so that it doesn't outpace the consumer. It also ensures that the producer doesn't waste CPU resources computing values that will be thrown away.
Libraries such as TBB and PPL provide implementations of concurrent queues. If you want to attempt to roll your own using std::queue (or boost::circular_buffer) and boost::condition_variable, check out this blogger's example.
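If you do decide to roll your own, a minimal sketch of such a bounded queue might look like this (with std::mutex and std::condition_variable rather than the Boost equivalents, but the shape is the same; no shutdown support, purely illustrative):

#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Minimal bounded blocking queue: push blocks when full, pop blocks when empty.
template <typename T>
class BoundedQueue
{
public:
    explicit BoundedQueue(std::size_t capacity) : _capacity(capacity) {}

    void push(const T& item)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        // Block the producer while the queue is full.
        _notFull.wait(lock, [this] { return _queue.size() < _capacity; });
        _queue.push(item);
        _notEmpty.notify_one();
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(_mutex);
        // Block the consumer while the queue is empty.
        _notEmpty.wait(lock, [this] { return !_queue.empty(); });
        T item = _queue.front();
        _queue.pop();
        _notFull.notify_one();
        return item;
    }

private:
    std::mutex _mutex;
    std::condition_variable _notFull;
    std::condition_variable _notEmpty;
    std::queue<T> _queue;
    std::size_t _capacity;
};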
The short answer is that you're almost certainly wrong.
With a producer/consumer, you pretty much need a queue between the two threads. There are basically two alternatives: either your code will simply lose tasks (which usually equals not working at all), or else your producer thread will need to wait for the consumer thread to be idle before it can produce an item -- which effectively translates to single threading.
For the moment, I'm going to assume that the value you get back from rand is supposed to represent the task to be executed (i.e., is the value produced by the producer and consumed by the consumer). In that case, I'd write the code something like this:
void producer() {
    for (int i = 0; i < 100; i++)
        queue.insert(random()); // queue.insert blocks if queue is full
    queue.insert(-1.0); // Tell consumer to exit
}

void consumer() {
    double value;
    while ((value = queue.get()) != -1) // queue.get blocks if queue is empty
        process(value);
}
This relegates nearly all of the interlocking to the queue. The rest of the code for both threads pretty much ignores threading issues entirely.
Implementing a pipeline is actually quite tricky if you are doing it from the ground up. For example, you'd have to use a condition variable to avoid the kind of race condition you described in your question, avoid busy waiting when implementing the mechanism for "waking up" the consumer, etc. Even using a "queue" of just one element won't save you from some of these complexities.
It's usually much better to use specialized libraries that were developed and extensively tested specifically for this purpose. If you can live with Visual C++ specific solution, take a look at Parallel Patterns Library, and the concept of Pipelines.