Race condition in a concurrent queue - C++

I am currently trying to write a concurrent queue, but I have some segfaults that I can't explain to myself. My queue implementation is essentially given by the first listing on this site:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
The site says that there is a race condition if objects are removed from the queue in parallel, but I just don't see why there is one. Could anyone explain it to me?
Edit: This is the code:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }
    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }
    Data& front()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    Data const& front() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    void pop()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.pop();
    }
};

What if the queue is empty by the time you attempt to read an item from it?
Think of this user code:
while(!q.empty()) //here you check that q is not empty
{
    //since q is not empty, you enter the loop
    //BUT before executing the next statement in the loop body,
    //the OS transfers control to the other thread,
    //which removes items from q, making it empty!!
    //then this thread executes the following statement!
    auto item = q.front(); //what would it do, given that q is empty?
}

If you call empty and find the queue is not empty, another thread may pop the last item, making the queue empty, before you act on the result.
Similarly for front: you may read the front item, and another thread may pop it by the time you use the item.
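One way to remove both races is to make the check and the removal a single atomic operation. A minimal sketch against the queue above (the name try_pop and the out-parameter signature are mine, not from the linked article):
bool try_pop(Data& popped)
{
    boost::mutex::scoped_lock lock(the_mutex);
    if (the_queue.empty())
        return false;           // nothing to pop; the caller handles emptiness
    popped = the_queue.front(); // copy out while still holding the lock
    the_queue.pop();
    return true;
}
The emptiness check, the read, and the removal all happen under one lock acquisition, so no other thread can interleave between them.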

The answers from @parkydr and @Nawaz are correct, but here's some more food for thought;
What are you trying to achieve?
The reason to have a thread-safe queue is sometimes (I dare not say often) mistaken. In many cases you want to lock "outside" the queue, in the context where the queue is just an implementation detail.
One valid reason, however, for thread-safe queues is producer-consumer situations, where 1-N nodes push data and 1-M nodes pop from it regardless of what they get. All elements in the queue are treated equally; the consumers just pop without knowing what they get, and start working on the data. In situations like that, your interface should not expose a T& front(). In fact, you should never return a reference if you're not sure there's an item there (and in parallel situations, you can never be certain without external locks).
I would recommend using unique_ptrs (or shared_ptr, of course) and only exposing race-free functions (I'm leaving out const functions for brevity). Using std::unique_ptr requires C++11, but you can use boost::shared_ptr for the same functionality if C++11 isn't an option:
// Returns the first item, or an empty unique_ptr
std::unique_ptr< T > pop( );
// Returns the first item if it exists. Otherwise, waits at most <timeout> for
// a value to be pushed. Returns an empty unique_ptr if the timeout was reached.
std::unique_ptr< T > pop( {implementation-specific-type} timeout );
void push( std::unique_ptr< T >&& ptr );
Features such as exist() and front() are naturally victims of race conditions, since they cannot atomically perform the task you (think you) want. exist() will sometimes return a value which is incorrect at the time you receive the result, and front() would have to throw if the queue is empty.
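For illustration, a minimal sketch of the non-blocking pop overload described above, assuming hypothetical members queue_ (a std::queue of std::unique_ptr<T>) and mutex_ (a std::mutex), which the answer does not spell out:
std::unique_ptr<T> pop()
{
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty())
        return nullptr; // empty unique_ptr: queue had no item at this instant
    std::unique_ptr<T> item = std::move(queue_.front());
    queue_.pop();
    return item;
}
The caller tests the returned pointer, so "is there an item?" and "give me the item" are answered by one atomic operation.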

I think the answers explaining why the empty() function is useless/dangerous are clear. If you want a blocking queue, remove it.
Instead, add a condition variable (boost::condition, IIRC). The functions to push/pop then look like this:
void push(T data)
{
    scoped_lock lock(mutex);
    queue.push(data);
    condition_var.notify_one();
}
T pop()
{
    scoped_lock lock(mutex);
    while(queue.empty())
        condition_var.wait(lock);
    T result = queue.front(); // std::queue::pop() returns void, so copy first
    queue.pop();
    return result;
}
Note that this is pseudo-ish code, but I'm confident you can figure it out. That said, the suggestion to use unique_ptr (or auto_ptr for C++98) to avoid copying the actual data is a good idea, but that is a completely separate issue.
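For illustration, a minimal usage sketch, assuming the class above is completed and named blocking_queue<T> (a name invented for this example):
#include <thread>

int main()
{
    blocking_queue<int> q; // hypothetical name for the class sketched above
    std::thread producer([&] {
        for (int i = 0; i < 100; ++i)
            q.push(i);         // each push wakes one waiting consumer
    });
    std::thread consumer([&] {
        for (int i = 0; i < 100; ++i) {
            int v = q.pop();   // blocks until an item is available
            (void)v;           // ...process v here...
        }
    });
    producer.join();
    consumer.join();
}
No empty() call is needed anywhere; the consumer simply blocks until there is something to take.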

Related

std::queue: pop a moved std::string in multithreading

I am currently implementing a string processor. I used a single thread at first, but it was kind of slow, so I would like to use multiple threads to speed it up. Now it has some problems I could not solve on my own.
I use a thread-safe queue to implement the producer and consumers. The push and pop methods of the thread-safe queue are below; if the whole file is needed, take a look here:
template <typename Tp>
void ThreadSafeQueue<Tp>::enqueue(Tp &&data) {
    std::lock_guard<std::mutex> lk(mtx);
    q.emplace(std::forward<Tp>(data));
    cv.notify_one();
}
template <typename Tp>
bool ThreadSafeQueue<Tp>::dequeue(Tp &data) {
    std::unique_lock<std::mutex> lk(mtx);
    while (!broken && q.empty()) {
        cv.wait(lk);
    }
    if (!broken && !q.empty()) {
        data = std::move(q.front());
        q.pop();
    }
    return !broken;
}
When I use this struct to store strings (i.e. Tp = std::string), a problem occurs. I am using it this way:
producer:
__prepare_data__(raw_data)
std::vector<std::thread> vec_threads;
for(int i=0;i<thread_num;++i)
{
    vec_threads.emplace_back(consumer,std::ref(raw_data),std::ref(processed_data));
}
for(int i=0;i<thread_num;++i)
{
    if(vec_threads[i].joinable())
    {
        vec_threads[i].join();
    }
    __collect_data__(processed_data)
}
and the consumer:
std::string buf;
while(dequeue(buf))
{
    __process__(buf)
}
In the above code, all values passed to the consumer threads are passed by reference (i.e. using the std::ref wrapper), so the __collect_data__ procedure is valid.
I will not meet any problem in these cases:
The number of string pieces is small. (This does not mean the string length is short.)
Only one consumer is working.
I will meet the problem in these cases:
The number of strings is large, millions or so.
Two or more consumers are working.
The error the system reports varies between two cases:
A corrupted double-linked list, followed by a dump of memory addresses. GDB told me the line causing the problem is the pop in the dequeue method.
A plain segmentation fault. GDB told me the problem occurred while the consumer threads were joining.
The first case happens most frequently, so I would like to ask, as the title indicates: does popping an already-moved std::string cause any undefined behavior? Or if you have any other insights, please let me know!
While there are issues with your code, there are none that explain your crash. I suggest you investigate your data processing code, not your queue.
For reference, your logic around queue shutdown is slightly wrong. For example, shutdown waits on the condition variable until the queue is empty but the dequeue operation does not notify on that variable. So you might deadlock.
It is easier to just ignore the "broken" flag in the dequeue operation until the queue is empty. That way the worker threads will drain the queue before quitting. Also, don't let the shutdown block until empty. If you want to wait until all threads are done with the queue, just join the threads.
Something like this:
template <typename Tp>
bool ThreadSafeQueue<Tp>::dequeue(Tp &data) {
    std::unique_lock<std::mutex> lk(mtx);
    while (!broken && q.empty()) {
        cv.wait(lk);
    }
    if (q.empty())
        return false; // broken
    data = std::move(q.front());
    q.pop();
    return true;
}
template <typename Tp>
void ThreadSafeQueue<Tp>::shutdown() {
    std::unique_lock<std::mutex> lk(mtx);
    broken = true;
    cv.notify_all();
}
There are other minor issues; for example, it is in practice more efficient (and equally safe) to unlock mutexes before notifying the condition variables, so that the woken threads do not race with the waking thread on acquiring/releasing the mutex. But that is not a correctness issue.
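For example, shutdown could drop the lock before notifying. A sketch (whether this helps measurably depends on the platform):
template <typename Tp>
void ThreadSafeQueue<Tp>::shutdown() {
    {
        std::lock_guard<std::mutex> lk(mtx);
        broken = true;
    }                // release the mutex first...
    cv.notify_all(); // ...so woken threads don't immediately block on it
}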
I also suggest you delete the move constructor on the queue. You rightfully noted that it shouldn't be called. Better make sure that it really isn't.
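That is, somewhere in the class definition (assuming no move operations are needed elsewhere):
// Turn any accidental move of a live queue into a compile error.
ThreadSafeQueue(ThreadSafeQueue&&) = delete;
ThreadSafeQueue& operator=(ThreadSafeQueue&&) = delete;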

Avoid race condition using std::mutex

I am working on a multi-threading project in C++ and I have a doubt about std::mutex.
Let's assume that I have a stack.
#include <exception>
#include <memory>
#include <mutex>
#include <stack>

struct empty_stack: std::exception
{
    const char* what() const throw();
};

template<typename T>
class threadsafe_stack
{
private:
    std::stack<T> data;
    mutable std::mutex m;
public:
    threadsafe_stack(){}
    threadsafe_stack(const threadsafe_stack& other)
    {
        std::lock_guard<std::mutex> lock(other.m);
        data=other.data;
    }
    threadsafe_stack& operator=(const threadsafe_stack&) = delete;
    void push(T new_value)
    {
        std::lock_guard<std::mutex> lock(m);
        data.push(new_value);
    }
    std::shared_ptr<T> pop()
    {
        std::lock_guard<std::mutex> lock(m);
        if(data.empty()) throw empty_stack();
        std::shared_ptr<T> const res(std::make_shared<T>(data.top()));
        data.pop();
        return res;
    }
    void pop(T& value)
    {
        std::lock_guard<std::mutex> lock(m);
        if(data.empty()) throw empty_stack();
        value=data.top();
        data.pop();
    }
    bool empty() const
    {
        std::lock_guard<std::mutex> lock(m);
        return data.empty();
    }
};
Someone said that using this stack avoids race conditions. However, I think the problem here is that the mutex (i.e. the mutual exclusion) only covers each individual member function, not sequences of them. For example, threads can interleave calls to push and pop; those call sequences can still race with each other.
For example:
threadsafe_stack<T> st; //global variable, for simplicity
void fun1()
{
    std::lock_guard<std::mutex> lock(m); // m: some external mutex
    T t{};
    st.push(t);
    t = *st.pop();
    //
}
void fun2()
{
    std::lock_guard<std::mutex> lock(m);
    T t, t2;
    t = *st.pop();
    // Do big things
    st.push(t2);
    //
}
If threads run fun1 and fun2 on the same stack (a global variable, for simplicity), can that be a race condition?
The only solution I can think of is some kind of atomic transaction: instead of calling push(), pop(), and empty() directly, I call them via a single dispatch function with a selector for those operations, guarded by only one mutex.
For example:
#define PUSH 0
#define POP 1
#define EMPTY 2
void changeStack(int kindOfFunction, T* input, bool* isEmpty)
{
    std::lock_guard<std::mutex> lock(m);
    switch(kindOfFunction){
    case PUSH:
        push(*input);
        break;
    case POP:
        *input = *pop();
        break;
    case EMPTY:
        *isEmpty = empty();
        break;
    }
}
Is my solution good? Or am I just overthinking this, and is the first solution my friend told me about good enough? Is there any other solution, ideally one that avoids the "atomic transaction" approach I suggest?
A given mutex is a single lock and can be held by a single thread at any one time.
If a thread (T1) is holding the lock on a given object in push(), another thread (T2) cannot acquire it in pop() and will be blocked until T1 releases it. At that point of release, T2 (or another thread also blocked by the same mutex) will be unblocked and allowed to proceed.
You do not need to do all the locking and unlocking in one member.
The point where you may still be introducing a race condition is constructs like this if they appear in consumer code:
if(!stack.empty()){
    auto item=stack.pop(); //Guaranteed?
}
If another thread T2 enters pop() after thread T1 enters empty() (above) and gets blocked waiting on the mutex then the pop() in T1 may fail because T2 'got there first'. Any number of actions might take place between the end of empty() and the start of pop() in that snippet unless other synchronization is handling it.
In this case you should imagine T1 & T2 literally racing to pop() though of course they may be racing to different members and still invalidate each other...
If you want to build code like that you usually have to then add further atomic member functions like try_pop() which returns (say) an empty std::shared_ptr<> if the stack is empty.
I hope this sentence isn't confusing:
Locking the object mutex inside member functions avoids race
conditions within each member function call, but not between
calls to those member functions.
The best way to solve that is by adding 'composite' functions that are doing the job of more than one 'logical' operation. That tends to go against good class design in which you design a logical set of minimal operations and the consuming code combines them.
The alternative is to allow the consuming code access to the mutex. For example, expose void lock() const; and void unlock() const; members. That is usually not preferred because (a) it becomes very easy for consumer code to create deadlocks and (b) you either use a recursive lock (with its overhead) or double up member functions again:
std::shared_ptr<T> pop(); //Self-locking version...
std::shared_ptr<T> pop_prelocked(); //Caller must hold the object mutex or the program is invalidated.
Whether you expose them as public or protected or not that would make try_pop() look something like this:
std::shared_ptr<T> try_pop(){
    std::lock_guard<std::mutex> guard(m);
    if(empty_prelocked()){
        return std::shared_ptr<T>();
    }
    return pop_prelocked();
}
Adding a mutex and acquiring it at the start of each member is only the start of the story...
Footnote: Hopefully that explains mutual exclusion (mut-ex). There's a whole other topic around memory barriers lurking below the surface here, but if you use mutexes in this way you can treat that as an implementation detail for now...
You misunderstand something: you don't need that changeStack function.
If you forget about lock_guard for a moment, here's what push and pop look like (with lock_guard the code does the same, but lock_guard is more convenient because it makes the unlock automatic):
void push() {
    m.lock();
    // do the push
    m.unlock();
}
void pop() {
    m.lock();
    // do the pop
    m.unlock();
}
When push is called, the mutex is locked. Now imagine that, on another thread, pop is called while push holds the lock. pop tries to lock the mutex, but it cannot, because push has already locked it. So it has to wait for push to unlock the mutex. When push unlocks the mutex, then pop can lock it.
So, in short, it is std::mutex which does the mutual exclusion, not the lock_guard.
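In other words, lock_guard is just a RAII wrapper over those same two calls:
{
    std::lock_guard<std::mutex> guard(m); // constructor calls m.lock()
    // do the push or the pop
}                                         // destructor calls m.unlock(),
                                          // even if an exception was thrown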

Fine-grained locking queue in C++

Here's a fine-grained locking queue introduced by Anthony Williams in chapter 6.2.3 of C++ Concurrency in Action.
/*
pop only needs to lock head_mutex and, briefly, tail_mutex; push only needs
tail_mutex. Maximum container concurrency.
*/
template<typename T> class threadsafe_queue
{
private:
    struct node
    {
        std::shared_ptr<T> data;
        std::unique_ptr<node> next;
    };
    std::mutex head_mutex; //when changing the head, lock it.
    std::unique_ptr<node> head;
    std::mutex tail_mutex; //when changing the tail, lock it.
    node* tail;
    std::condition_variable data_cond;
    node* get_tail()
    {
        std::lock_guard<std::mutex> tail_lock(tail_mutex);
        return tail;
    }
public:
    /*
    create a dummy node
    */
    threadsafe_queue():
        head(new node),tail(head.get())
    {}
    std::shared_ptr<T> wait_and_pop()
    {
        std::unique_lock<std::mutex> head_lock(head_mutex);
        data_cond.wait(head_lock,[&]{return head.get()!=get_tail();}); //#1
        std::unique_ptr<node> old_head=std::move(head);
        head=std::move(old_head->next);
        return old_head->data;
    }
    void push(T new_value)
    {
        std::shared_ptr<T> new_data(
            std::make_shared<T>(std::move(new_value)));
        std::unique_ptr<node> p(new node);
        {
            std::lock_guard<std::mutex> tail_lock(tail_mutex);
            tail->data=new_data;
            node* const new_tail=p.get();
            tail->next=std::move(p);
            tail=new_tail;
        }
        data_cond.notify_one();
    }
};
Here's the situation: There are two threads (thread1 and thread2). thread1 is doing a wait_and_pop and thread2 is doing a push. The queue is empty.
thread1 is at #1: it has already evaluated the predicate head.get()!=get_tail() (finding the queue empty) but has not yet blocked inside data_cond.wait(). At this moment its time slice runs out, and thread2 begins.
thread2 finishes the push function and calls data_cond.notify_one(). thread1 resumes.
Now thread1 blocks inside data_cond.wait(), but the notification has already been issued, so it waits forever.
Could this situation actually happen? If so, how can this container be fixed?
Yes, the situation described in the OP is possible and will result in notifications being lost. Injecting a nice big time delay in the predicate function makes it easy to trigger. Here's a demonstration at Coliru. Notice how the program takes 10 seconds to complete (the length of the timeout to wait_for) instead of 100 milliseconds (the time at which the producer inserts an item in the queue). The notification is lost.
There is an assumption implicit in the design of condition variables that the state of the condition (return value of the predicate) cannot change while the associated mutex is locked. This is not true for this queue implementation since push can change the "emptiness" of the queue without holding head_mutex.
§30.5p3 specifies that wait has three atomic parts:
the release of the mutex, and entry into the waiting state;
the unblocking of the wait; and
the reacquisition of the lock.
Note that none of these mention checking of the predicate, if any was passed to wait. The behavior of wait with a predicate is described in §30.5.1p15:
Effects:
while (!pred())
    wait(lock);
Note that there is no guarantee here either that the predicate check and the wait are performed atomically. There is a pre-condition that lock is locked and its associated mutex is held by the calling thread.
As far as fixing the container to avoid loss of notifications, I would change it to a single mutex implementation and be done with it. It's a bit of a stretch to call it fine-grained locking when the push and pop both end up taking the same mutex (tail_mutex) anyway.
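A sketch of what the single-mutex fix might look like (queue_mutex is a single mutex replacing head_mutex and tail_mutex; everything else is as in the question):
std::shared_ptr<T> wait_and_pop()
{
    std::unique_lock<std::mutex> lock(queue_mutex);
    data_cond.wait(lock, [&]{ return head.get() != tail; });
    std::unique_ptr<node> old_head = std::move(head);
    head = std::move(old_head->next);
    return old_head->data;
}
push would take the same queue_mutex around the linking step and notify afterwards, so the predicate can no longer change while a waiter holds the lock, and the lost wakeup disappears.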
data_cond.wait() checks the condition every time it is woken up. So even though the condition may already have been checked, it will be checked again after data_cond.notify_one(). At that point, there is data to be popped (because thread2 has just completed a push), and so it returns. Read more here.
The only thing you should be worrying about is calling wait_and_pop on an empty queue and then never pushing any more data onto it. This code does not have a mechanism for timing out a wait and returning an error (or throwing an exception).
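If you want such a mechanism, std::condition_variable::wait_for with a predicate gives you a timed variant. A sketch (not part of the book's code; it reuses head_mutex and get_tail() from the class above and assumes <chrono> is included):
template <typename Rep, typename Period>
std::shared_ptr<T> wait_and_pop_for(std::chrono::duration<Rep, Period> timeout)
{
    std::unique_lock<std::mutex> head_lock(head_mutex);
    if (!data_cond.wait_for(head_lock, timeout,
                            [&]{ return head.get() != get_tail(); }))
        return nullptr; // timed out: empty shared_ptr instead of blocking forever
    std::unique_ptr<node> old_head = std::move(head);
    head = std::move(old_head->next);
    return old_head->data;
}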

Why does the author claim that this code leads to race?

Why does the author think that the code below leads to a race?
Author says:
This design is subject to race conditions between calls to empty, front and pop if there is more than one thread removing items from the queue, but in a single-consumer system (as being discussed here), this is not a problem.
Here is the code:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }
    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }
    Data& front()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    Data const& front() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    void pop()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.pop();
    }
};
If you call empty, you check whether it is safe to pop an element. What could happen in a threaded system is that, after you have checked that the queue is not empty, another thread may already have popped the last element, and it is no longer true that the queue is not empty.
thread A:                      thread B:
if(!queue.empty());
                               if(!queue.empty());
                               queue.pop();
-> for thread A it is no longer
   sure that the queue isn't empty
If you have more than one thread "consuming" data from the queue, it can lead to a race condition in a particularly bad way. Take the following pseudo code:
class consumer
{
    void do_work()
    {
        if(!work_.empty())
        {
            type& t = work_.front();
            work_.pop();
            // do some work with t
            t...
        }
    }
    concurrent_queue<type> work_;
};
This looks simple enough, but what if you have multiple consumer objects and there is only one item in the concurrent_queue? If a consumer is interrupted after calling empty(), but before calling front() and pop(), then potentially multiple consumers will try to work on the same object.
A more appropriate implementation would perform the empty checking and popping in a single operation exposed in the interface, like this:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }
    bool pop(Data& popped)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if(!the_queue.empty())
        {
            popped = the_queue.front();
            the_queue.pop();
            return true;
        }
        return false;
    }
};
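A consumer loop built on this interface has no gap between the emptiness check and the removal; something like the following, where q is assumed to be an instance of the queue above:
Data item;
while (q.pop(item))
{
    // work with item: it was checked and removed under a single lock
}
// the loop exits once the queue is (momentarily) empty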
Because you could do this...
if (!your_concurrent_queue.empty())
    your_concurrent_queue.pop();
...and still have a failure on pop if another thread called pop "in between" these two lines.
(Whether this will actually happen in practice depends on the timing of execution of the concurrent threads; in essence the threads "race", and who wins this race determines whether the bug will manifest itself or not, which is essentially random on modern preemptive OSes. This randomness can make race conditions very hard to diagnose and repair.)
Whenever clients do "meta-operations" like these (where there is a sequence of several calls accomplishing the desired effect), it's impossible to protect against race conditions by in-method locking alone.
And since the clients have to perform their own locking anyway, you can even consider abandoning the in-method locking, for performance reasons. Just be sure this is clearly documented so the clients know that you are not making any promises regarding thread-safety.
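If you take that route, the client-side pattern might look like this; get_mutex() and the lock-free interior are illustrative names only, not part of the code above:
// The queue's own locking has been removed; the client holds the lock.
boost::mutex::scoped_lock lock(q.get_mutex()); // hypothetical accessor
if (!q.empty())
{
    Data d = q.front();
    q.pop(); // safe: one lock was held across all three calls
}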
I think what's confused you is that in the code you posted, there is nothing that causes a race condition. The race condition would be caused by the threads actually CALLING this code. Imagine that thread 1 checks to see that the queue is empty. Then that thread goes to sleep for a year. One year later, when it wakes up, is it still valid for that thread to assume the queue is still empty? Well, no; in the meantime, another thread could easily have come along and called push.

C++ Templated Producer-Consumer BlockingQueue, unbounded buffer: How do I end elegantly?

I wrote a BlockingQueue in order to communicate two threads. You could say it follows the Producer-Consumer pattern, with an unbounded buffer. Therefore, I implemented it using a Critical Section and a Semaphore, like this:
#pragma once
#include "Semaphore.h"
#include "Guard.h"
#include <queue>

namespace DRA{
namespace CommonCpp{

template<class Element>
class BlockingQueue{
    CCriticalSection m_csQueue;
    CSemaphore m_semElementCount;
    std::queue<Element> m_Queue;
    //Forbid copy and assignment
    BlockingQueue( const BlockingQueue& );
    BlockingQueue& operator=( const BlockingQueue& );
public:
    BlockingQueue( unsigned int maxSize );
    ~BlockingQueue();
    Element Pop();
    void Push( Element newElement );
};

}
}

//Template definitions
template<class Element>
DRA::CommonCpp::BlockingQueue<Element>::BlockingQueue( unsigned int maxSize ):
    m_csQueue( "BlockingQueue::m_csQueue" ),
    m_semElementCount( 0, maxSize ){
}

template<class Element>
DRA::CommonCpp::BlockingQueue<Element>::~BlockingQueue(){
    //TODO What can I do here?
}

template<class Element>
void DRA::CommonCpp::BlockingQueue<Element>::Push( Element newElement ){
    {//RAII block
        CGuard g( m_csQueue );
        m_Queue.push( newElement );
    }
    m_semElementCount.Signal();
}

template<class Element>
Element DRA::CommonCpp::BlockingQueue<Element>::Pop(){
    m_semElementCount.Wait();
    Element popped;
    {//RAII block
        CGuard g( m_csQueue );
        popped = m_Queue.front();
        m_Queue.pop();
    }
    return popped;
}
CGuard is a RAII wrapper for a CCriticalSection, it enters it on construction and leaves it on destruction. CSemaphore is a wrapper for a Windows semaphore.
So far, so good; the threads are communicating perfectly. However, when the producer thread stops producing and ends, and the consumer thread has consumed everything, the consumer thread stays hung forever on a Pop() call.
How can I tell the consumer to end elegantly? I thought of sending a special empty Element, but it seems too sloppy.
You'd be better off using events instead of a semaphore. While adding, take the lock on the critical section and check the element count (store whether the queue was empty in a bIsEmpty local variable). Add to the queue; then, if the queue WAS empty, call SetEvent!
In the pop method, lock first, then check if the queue is empty, and if so call WaitForSingleObject; as soon as WFSO returns, you know that there is at least one item in the queue.
Check this article
Does your Semaphore implementation have a timed wait function available? On Windows, that would be WaitForSingleObject() specifying a timeout. If so, Pop() could be implemented like this:
// Pseudo code
bool Pop(Element& r, Timeout timeout) // Timeout: placeholder type
{
    if(sem.wait(timeout))
    {
        CGuard g( m_csQueue ); // protect the queue while touching it
        r = m_Queue.front();
        m_Queue.pop();
        return true;
    }
    return false; // timed out
}
This way, Pop() is still blocking, though it can be easily interrupted. Even with very short timeouts this won't consume significant amounts of CPU (more than absolutely necessary, yes, and it can potentially introduce additional context switching, so note those caveats).
You need a way of telling the consumer to stop. This could be a special element in the queue, say a simple wrapper structure around the Element, or a flag, i.e. a member variable of the queue class (in which case you want to make sure the flag is dealt with atomically; look up the Windows "Interlocked" functions). Then you need to check that condition in the consumer every time it wakes up. Finally, in the destructor, set that stop condition and signal the semaphore.
One issue remains: what to return from the consumer's pop(). I'd go for a boolean return value and an argument of type Element& to copy the result into on success.
Edit:
Something like this:
bool Queue::Pop( Element& result ) {
    sema.Wait();
    if ( stop_condition ) return false;
    critical_section.Enter();
    result = m_queue.front();
    m_queue.pop();
    critical_section.Leave();
    return true;
}
Change pop to return a boost::optional (or separate top and pop the way the standard library does) and then signal m_semElementCount one last time on destruction.
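A sketch of that variant, reusing the wrappers from the question and assuming the destructor performs one extra Signal() (the declaration of Pop() in the class would change accordingly):
#include <boost/optional.hpp>

template<class Element>
boost::optional<Element> DRA::CommonCpp::BlockingQueue<Element>::Pop(){
    m_semElementCount.Wait();
    CGuard g( m_csQueue );
    if( m_Queue.empty() )   // woken by the destructor's final Signal()
        return boost::none; // the consumer interprets this as "shut down"
    Element popped = m_Queue.front();
    m_Queue.pop();
    return popped;
}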