Fine-grained locking queue in C++

Here's the fine-grained locking queue presented by Anthony Williams in section 6.2.3 of C++ Concurrency in Action.
/*
pop only needs to lock head_mutex and, briefly, tail_mutex; push only needs
tail_mutex. Maximum container concurrency.
*/
template<typename T>
class threadsafe_queue
{
private:
    struct node
    {
        std::shared_ptr<T> data;
        std::unique_ptr<node> next;
    };
    std::mutex head_mutex;   // lock when changing the head
    std::unique_ptr<node> head;
    std::mutex tail_mutex;   // lock when changing the tail
    node* tail;
    std::condition_variable data_cond;

    node* get_tail()
    {
        std::lock_guard<std::mutex> tail_lock(tail_mutex);
        return tail;
    }
public:
    /*
    create a dummy node
    */
    threadsafe_queue():
        head(new node), tail(head.get())
    {}

    std::shared_ptr<T> wait_and_pop()
    {
        std::unique_lock<std::mutex> head_lock(head_mutex);
        data_cond.wait(head_lock, [&]{ return head.get() != get_tail(); }); //#1
        std::unique_ptr<node> old_head = std::move(head);
        head = std::move(old_head->next);
        return old_head->data;
    }

    void push(T new_value)
    {
        std::shared_ptr<T> new_data(
            std::make_shared<T>(std::move(new_value)));
        std::unique_ptr<node> p(new node);
        {
            std::lock_guard<std::mutex> tail_lock(tail_mutex);
            tail->data = new_data;
            node* const new_tail = p.get();
            tail->next = std::move(p);
            tail = new_tail;
        }
        data_cond.notify_one();
    }
};
Here's the situation: There are two threads (thread1 and thread2). thread1 is doing a wait_and_pop and thread2 is doing a push. The queue is empty.
thread1 is at #1: it has already evaluated the predicate head.get()!=get_tail() (false, since the queue is empty), but before it enters the internal wait its time slice runs out. thread2 begins.
thread2 finishes the push and calls data_cond.notify_one(). thread1 resumes.
Now thread1 enters data_cond.wait(), but the notification has already come and gone, so it waits forever.
Can this situation actually happen? If so, how can the container be fixed?

Yes, the situation described in the OP is possible and will result in notifications being lost. Injecting a nice big time delay in the predicate function makes it easy to trigger. Here's a demonstration at Coliru. Notice how the program takes 10 seconds to complete (length of the timeout to wait_for) instead of 100 milliseconds (time when the producer inserts an item in the queue). The notification is lost.
There is an assumption implicit in the design of condition variables that the state of the condition (return value of the predicate) cannot change while the associated mutex is locked. This is not true for this queue implementation since push can change the "emptiness" of the queue without holding head_mutex.
§30.5p3 specifies that wait has three atomic parts:
the release of the mutex, and entry into the waiting state;
the unblocking of the wait; and
the reacquisition of the lock.
Note that none of these mention checking of the predicate, if any was passed to wait. The behavior of wait with a predicate is described in §30.5.1p15:
Effects:
while (!pred())
wait(lock);
Note that there is no guarantee here either that the predicate check and the wait are performed atomically. There is a pre-condition that lock is locked and its associated mutex is held by the calling thread.
As far as fixing the container to avoid loss of notifications, I would change it to a single mutex implementation and be done with it. It's a bit of a stretch to call it fine-grained locking when the push and pop both end up taking the same mutex (tail_mutex) anyway.
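For reference, a minimal single-mutex sketch (my own illustration of that suggestion, not a listing from the book) could look like this; because push and wait_and_pop lock the same mutex, the predicate cannot change between its evaluation and the wait:
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>

template<typename T>
class simple_threadsafe_queue
{
    std::queue<std::shared_ptr<T>> data;
    std::mutex m;
    std::condition_variable data_cond;
public:
    void push(T new_value)
    {
        auto p = std::make_shared<T>(std::move(new_value));
        {
            std::lock_guard<std::mutex> lk(m);
            data.push(std::move(p));
        }
        data_cond.notify_one();   // notify after releasing the lock
    }
    std::shared_ptr<T> wait_and_pop()
    {
        std::unique_lock<std::mutex> lk(m);
        // Emptiness can only change while m is held, so no notification can be
        // lost between evaluating the predicate and blocking in wait().
        data_cond.wait(lk, [&] { return !data.empty(); });
        auto res = std::move(data.front());
        data.pop();
        return res;
    }
};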

data_cond.wait() checks the condition every time it is woken up. So even though it may have already been checked, it will be checked again after data_cond.notify_one(). At that point, there is data to be popped (because Thread 2 had just completed a push), and so it returns. Read more here.
The only thing you should be worrying about is when you call wait_and_pop on an empty queue and then never push any more data onto it. This code does not have a mechanism for timing out a wait and returning an error (or throwing an exception).
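If a timeout is wanted, wait_for with a predicate is one option. A rough sketch against a conventional single-mutex queue (the member names data, m and data_cond are assumptions taken from the sketch above, not from the fine-grained queue):
std::shared_ptr<T> wait_and_pop_for(std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lk(m);
    // wait_for returns false if the predicate is still false when time runs out
    if (!data_cond.wait_for(lk, timeout, [&] { return !data.empty(); }))
        return nullptr;           // timed out: report "no data" to the caller
    auto res = std::move(data.front());
    data.pop();
    return res;
}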

Related

std::queue: pop a moved std::string in multithreading

I am currently implementing a string processor. I used to use a single thread, but it is rather slow, so I would like to use multiple threads to speed it up. Now it has some problems I could not solve on my own.
I use a thread-safe queue to implement the producer and consumers. The push and pop methods of the thread-safe queue are below; if the whole file is needed, take a look here:
template <typename Tp>
void ThreadSafeQueue<Tp>::enqueue(Tp &&data) {
    std::lock_guard<std::mutex> lk(mtx);
    q.emplace(std::forward<Tp>(data));
    cv.notify_one();
}

template <typename Tp>
bool ThreadSafeQueue<Tp>::dequeue(Tp &data) {
    std::unique_lock<std::mutex> lk(mtx);
    while (!broken && q.empty()) {
        cv.wait(lk);
    }
    if (!broken && !q.empty()) {
        data = std::move(q.front());
        q.pop();
    }
    return !broken;
}
When I use this struct to store strings (i.e. Tp = std::string), problems occur. I am using it this way:
producer:
__prepare_data__(raw_data)
std::vector<std::thread> vec_threads;
for (int i = 0; i < thread_num; ++i)
{
    vec_threads.emplace_back(consumer, std::ref(raw_data), std::ref(processed_data));
}
for (int i = 0; i < thread_num; ++i)
{
    if (vec_threads[i].joinable())
    {
        vec_threads[i].join();
    }
    __collect_data__(processed_data)
}
and consumer:
std::string buf;
while (dequeue(buf))
{
    __process__(buf)
}
In the above code, all values passed to the consumer threads are passed by reference (i.e. using the std::ref wrapper), so the __collect_data__ procedure is valid.
I will not meet any problem in these cases:
The number of string pieces is small. (This does not mean the string length is short.)
Only one consumer is working.
I will meet the problem in these cases:
The number of strings is large, in the millions or so.
Two or more consumers are working.
The error the system throws varies between these two cases:
Corrupted double-linked list, followed by a dump of memory addresses. GDB told me the line causing the problem is the pop in the dequeue method.
A plain segmentation fault. GDB told me the problem occurred while the consumer threads were joining.
The first case happens most frequently, so I would like to ask, as the title indicates: does popping an already-moved-from std::string cause any undefined behavior? If you have any other insights, please let me know!
While there are issues with your code, there are none that explain your crash. I suggest you investigate your data processing code, not your queue.
For reference, your logic around queue shutdown is slightly wrong. For example, shutdown waits on the condition variable until the queue is empty but the dequeue operation does not notify on that variable. So you might deadlock.
It is easier to just ignore the "broken" flag in the dequeue operation until the queue is empty. That way the worker threads will drain the queue before quitting. Also, don't let the shutdown block until empty. If you want to wait until all threads are done with the queue, just join the threads.
Something like this:
template <typename Tp>
bool ThreadSafeQueue<Tp>::dequeue(Tp &data) {
    std::unique_lock<std::mutex> lk(mtx);
    while (!broken && q.empty()) {
        cv.wait(lk);
    }
    if (q.empty())
        return false; // broken
    data = std::move(q.front());
    q.pop();
    return true;
}

template <typename Tp>
void ThreadSafeQueue<Tp>::shutdown() {
    std::unique_lock<std::mutex> lk(mtx);
    broken = true;
    cv.notify_all();
}
There are other minor issues. For example, it is in practice more efficient (and equally safe) to unlock the mutex before notifying the condition variable, so that the woken threads do not race with the waking thread for the mutex. But that is not a correctness issue.
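A sketch of what that looks like for the enqueue side (same members as the snippet above):
template <typename Tp>
void ThreadSafeQueue<Tp>::enqueue(Tp &&data) {
    {
        std::lock_guard<std::mutex> lk(mtx);
        q.emplace(std::forward<Tp>(data));
    }                 // mutex released here...
    cv.notify_one();  // ...so the woken consumer doesn't immediately block on it
}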
I also suggest you delete the move constructor on the queue. You rightfully noted that it shouldn't be called. Better make sure that it really isn't.
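A sketch of the class skeleton with the moves deleted (member names taken from your snippet; the method bodies stay as before):
template <typename Tp>
class ThreadSafeQueue {
public:
    ThreadSafeQueue() = default;
    ThreadSafeQueue(ThreadSafeQueue &&) = delete;             // moving would race with concurrent users
    ThreadSafeQueue &operator=(ThreadSafeQueue &&) = delete;  // likewise for move-assignment

    void enqueue(Tp &&data);
    bool dequeue(Tp &data);
    void shutdown();

private:
    std::queue<Tp> q;
    std::mutex mtx;
    std::condition_variable cv;
    bool broken = false;
};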

An odd use of a condition variable with a local mutex

Poring through the legacy code of an old and large project, I found an odd method of creating a thread-safe queue, something like this:
template <typename _Msg>
class WaitQue : public QWaitCondition
{
public:
    typedef _Msg DataType;

    void wakeOne(const DataType& msg)
    {
        QMutexLocker lock_(&mx);
        que.push(msg);
        QWaitCondition::wakeOne();
    }

    void wait(DataType& msg)
    {
        /// wait if empty.
        {
            QMutex wx; // WHAT?
            QMutexLocker cvlock_(&wx);
            if (que.empty())
                QWaitCondition::wait(&wx);
        }
        {
            QMutexLocker _wlock(&mx);
            msg = que.front();
            que.pop();
        }
    }

    unsigned long size() {
        QMutexLocker lock_(&mx);
        return que.size();
    }

private:
    std::queue<DataType> que;
    QMutex mx;
};
wakeOne is used from some threads as a kind of "posting" function, and wait is called from other threads, which wait indefinitely until a message appears in the queue. In some cases the roles of the threads are reversed at different stages, using separate queues.
Is it even legal to use a QMutex this way, by creating a local one? I sort of understand why someone would do that to dodge a deadlock while reading the size of que, but how does it even work? Is there a simpler and more idiomatic way to achieve this behavior?
It's legal to wait on a condition variable with a freshly created local mutex, but it normally makes no sense.
As you've worked out, in this case it is wrong. You should be using the member mutex:
void wait(DataType& msg)
{
    QMutexLocker cvlock_(&mx);
    while (que.empty())
        QWaitCondition::wait(&mx);
    msg = que.front();
    que.pop();
}
Notice also that you must have a while instead of an if around the call to QWaitCondition::wait. This is partly because of (possible) spurious wakeups - the Qt docs aren't clear here. But more importantly, the fact that the wakeup and the subsequent reacquisition of the mutex are not one atomic operation means you must recheck the queue for emptiness. It could be this last case that previously gave you deadlocks/UB.
Consider the scenario of an empty queue and a caller (thread 1) entering wait and blocking in QWaitCondition::wait. Then thread 2 comes along, adds an item to the queue and calls wakeOne. Thread 1 gets woken up and tries to reacquire the mutex. However, thread 3 comes along in your implementation of wait, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1, which has been woken up, finally acquires the mutex, returns from QWaitCondition::wait and tries to process... an empty queue. Yikes.

mutex lock synchronization between different threads

Since I have only recently started coding multi-threaded programs, this might be a stupid question. I found out about the awesome mutex and condition variable usage. As far as I can understand, their use is:
To protect sections of code/shared resources from getting corrupted by access from multiple threads. Lock that portion so you can control which thread accesses it.
If a thread is waiting for a resource/condition from another thread, it can use cond.wait() instead of polling every millisecond.
Now consider the following class example:
class Queue {
private:
    std::queue<std::string> m_queue;
    boost::mutex m_mutex;
    boost::condition_variable m_cond;
    bool m_exit;

public:
    Queue()
        : m_queue()
        , m_mutex()
        , m_cond()
        , m_exit(false)
    {}

    void Enqueue(const std::string& Req)
    {
        boost::mutex::scoped_lock lock(m_mutex);
        m_queue.push(Req);
        m_cond.notify_all();
    }

    std::string Dequeue()
    {
        boost::mutex::scoped_lock lock(m_mutex);
        while (m_queue.empty() && !m_exit)
        {
            m_cond.wait(lock);
        }
        if (m_queue.empty() && m_exit) return "";
        std::string val = m_queue.front();
        m_queue.pop();
        return val;
    }

    void Exit()
    {
        boost::mutex::scoped_lock lock(m_mutex);
        m_exit = true;
        m_cond.notify_all();
    }
};
In the above example, Exit() can be called and it will notify the threads waiting on Dequeue that it's time to exit without waiting for more data in the queue.
My question is: since Dequeue has acquired the lock (m_mutex), how can Exit acquire the same lock? Isn't it the case that only after Dequeue releases the lock can Exit acquire it?
I have seen this pattern in destructor implementations too: using the same class member mutex, the destructor notifies all the threads (class methods) that it's time to terminate their respective loops/functions, etc.
As Jarod mentions in the comments, the call
m_cond.wait(lock)
is guaranteed to atomically unlock the mutex, releasing it for other threads, and start listening for notifications on the condition variable (see e.g. here).
This atomicity also ensures that a notify issued by a thread holding the same mutex cannot fall in the gap between unlocking and starting to listen (so no notify calls will be missed). This assumes, of course, that the notifying thread locks the mutex too; otherwise all bets are off.
Another important bit to understand is that condition variables may suffer from "spurious wakeups", so it is important to have a second boolean condition (e.g. here, you could check the emptiness of your queue) so that you don't end up awoken with an empty queue. Something like this:
m_cond.wait(lock, [this]() { return !m_queue.empty() || m_exit; });
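Applied to the Dequeue method above, that could look like this (a sketch; Boost's condition_variable supports the same predicate overload as std's):
std::string Dequeue()
{
    boost::mutex::scoped_lock lock(m_mutex);
    // Re-checked on every wakeup, so spurious wakeups and the exit flag are both handled.
    m_cond.wait(lock, [this] { return !m_queue.empty() || m_exit; });
    if (m_queue.empty())          // woken by Exit() with nothing left to hand out
        return "";
    std::string val = m_queue.front();
    m_queue.pop();
    return val;
}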

How to test my blocking queue actually blocks

I have a blocking queue (it would be really hard for me to change its implementation), and I want to test that it actually blocks. In particular, the pop methods must block if the queue is empty and unblock as soon as a push is performed. See the following pseudo C++11 code for the test:
BlockingQueue queue; // empty queue
thread pushThread([]
{
    sleep(large_delay);
    queue.push();
});
queue.pop();
Obviously it is not perfect, because it may happen that the whole pushThread is executed and terminates before pop is called, even if the delay is large; and the larger the delay, the longer I have to wait for the test to be over.
How can I properly ensure that pop is executed before push is called and that it blocks until push returns?
I do not believe this is possible without adding some extra state and interfaces to your BlockingQueue.
Proof goes something like this. You want to wait until the reading thread is blocked on pop. But there is no way to distinguish between that and the thread being about to execute the pop. This remains true no matter what you put just before or after the call to pop itself.
If you really want to fix this with 100% reliability, you need to add some state inside the queue, guarded by the queue's mutex, that means "someone is waiting". The pop call then has to update that state just before it atomically releases the mutex and goes to sleep on the internal condition variable. The push thread can obtain the mutex and wait until "someone is waiting". To avoid a busy loop here, you will want to use the condition variable again.
All of this machinery is nearly as complicated as the queue itself, so maybe you will want to test it, too... This sort of multi-threaded code is where concepts like "code coverage" -- and arguably even unit testing itself -- break down a bit. There are just too many possible interleavings of operations.
In practice, I would probably go with your original approach of sleeping.
template<class T>
struct async_queue {
    T pop() {
        auto l = lock();
        ++wait_count;
        cv.wait( l, [&]{ return !data.empty(); } );
        --wait_count;
        auto r = std::move(data.front());
        data.pop_front();
        return r;
    }
    void push(T in) {
        {
            auto l = lock();
            data.push_back( std::move(in) );
        }
        cv.notify_one();
    }
    void push_many(std::initializer_list<T> in) {
        {
            auto l = lock();
            for (auto&& x : in)
                data.push_back( x );
        }
        cv.notify_all();
    }
    std::size_t readers_waiting() {
        return wait_count;
    }
    std::size_t data_waiting() const {
        auto l = lock();
        return data.size();
    }
private:
    std::deque<T> data;   // deque: we need push_back/pop_front
    std::condition_variable cv;
    mutable std::mutex m;
    std::atomic<std::size_t> wait_count{0};
    auto lock() const { return std::unique_lock<std::mutex>(m); }
};
or somesuch.
In the push thread, busy-wait on readers_waiting until it reaches 1.
At that point the reader has taken the lock and is inside cv.wait, which releases the lock only as it starts to block; your push can acquire the mutex only after that. Then do the push.
In theory an infinitely slow reader thread could have gotten into cv.wait and still be evaluating the predicate for the first time by the time you call push, but an infinitely slow reader thread is no different from a blocked one...
This does, however, deal with slow thread startup and the like.
Using readers_waiting and data_waiting for anything other than debugging is usually code smell.
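Put together, the test described above might look roughly like this (a sketch; it assumes the async_queue from this answer is in scope):
#include <cassert>
#include <thread>

int main() {
    async_queue<int> q;

    std::thread consumer([&] {
        int v = q.pop();           // should block here until the push below
        assert(v == 42);
    });

    // Busy-wait until the consumer has registered itself as waiting.
    while (q.readers_waiting() < 1)
        std::this_thread::yield();

    q.push(42);                    // unblocks the consumer
    consumer.join();
}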
You can use a std::condition_variable to accomplish this. The help page of cppreference.com actually shows a very nice consumer-producer example which should be exactly what you are looking for: http://en.cppreference.com/w/cpp/thread/condition_variable
EDIT: Actually the German version of cppreference.com has an even better example :-) http://de.cppreference.com/w/cpp/thread/condition_variable

Why does the author claim that this code leads to race?

Why does the author think that the piece of source code below leads to a race?
Author says:
This design is subject to race conditions between calls to empty, front and pop if there is more than one thread removing items from the queue, but in a single-consumer system (as being discussed here), this is not a problem.
Here is the code:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;

public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }

    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }

    Data& front()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }

    Data const& front() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }

    void pop()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.pop();
    }
};
If you call empty, you check whether it is safe to pop an element. What could happen in a threaded system is that after you checked that the queue is not empty, another thread may already have popped the last element, so it is no longer guaranteed that the queue is non-empty.
thread A:                     thread B:
if (!queue.empty())
                              if (!queue.empty())
                              queue.pop();
queue.pop();   // <- it is no longer certain
               //    that the queue isn't empty
If you have more than one thread "consuming" data from the queue, it can lead to a race condition in a particularly bad way. Take the following pseudo code:
class consumer
{
    void do_work()
    {
        if (!work_.empty())
        {
            type& t = work_.front();
            work_.pop();
            // do some work with t
            t...
        }
    }

    concurrent_queue<type> work_;
};
This looks simple enough, but what if you have multiple consumer objects and there is only one item in the concurrent_queue? If a consumer is interrupted after calling empty() but before calling pop(), then potentially multiple consumers will try to work on the same object.
A more appropriate implementation would perform the empty checking and popping in a single operation exposed in the interface, like this:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;

public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }

    bool pop(Data& popped)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        if (!the_queue.empty())
        {
            popped = the_queue.front();
            the_queue.pop();
            return true;
        }
        return false;
    }
};
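With this interface the check and the removal happen under one lock, so there is no window for another consumer to slip in between them:
concurrent_queue<std::string> work;
std::string item;
if (work.pop(item)) {
    // got an item; it has already been removed from the queue
} else {
    // the queue was empty at the time of the call
}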
Because you could do this...
if (!your_concurrent_queue.empty())
    your_concurrent_queue.pop();
...and still have a failure on pop if another thread called pop "in between" these two lines.
(Whether this will actually happen in practice depends on the timing of execution of the concurrent threads - in essence the threads "race", and who wins this race determines whether the bug will manifest itself or not, which is essentially random on modern preemptive OSes. This randomness can make race conditions very hard to diagnose and repair.)
Whenever clients do "meta-operations" like these (where there is a sequence of several calls accomplishing the desired effect), it's impossible to protect against race conditions by in-method locking alone.
And since the clients have to perform their own locking anyway, you can even consider abandoning the in-method locking, for performance reasons. Just be sure this is clearly documented so the clients know that you are not making any promises regarding thread-safety.
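For example, the "check then pop" meta-operation becomes safe if the client holds one lock around both calls (a sketch of the idea, not code from the answer; here the queue is an unsynchronized std::queue guarded by an external mutex):
std::queue<Data> the_queue;       // no internal locking
boost::mutex queue_mutex;         // owned by the clients

// consumer side: the empty-check and the pop are covered by the same lock
{
    boost::mutex::scoped_lock lock(queue_mutex);
    if (!the_queue.empty())
    {
        Data d = the_queue.front();
        the_queue.pop();
        // work with d (preferably after releasing the lock, via the copy)
    }
}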
I think what's confused you is that in the code you posted, there is nothing that causes a race condition. The race condition would be caused by the threads actually CALLING this code. Imagine that thread 1 checks and sees that the queue is empty, and then goes to sleep for a year. One year later when it wakes up, is it still valid for that thread to assume the queue is still empty? Well, no; in the meantime, another thread could easily have come along and called push.