Is this implementation of a Blocking Queue safe? - c++

I'm trying to implement a queue which blocks on the Pop operation if it's empty and unblocks as soon as a new element is pushed. I'm afraid I might have a race condition; I tried to look at some other implementations, but most of the ones I found were done in .NET, and the few C++ ones depended too much on other library classes.
template <class Element>
class BlockingQueue{
    DRA::CommonCpp::CCriticalSection m_csQueue;
    DRA::CommonCpp::CEvent m_eElementPushed;
    std::queue<Element> m_Queue;
public:
    void Push( Element newElement ){
        CGuard g( m_csQueue );
        m_Queue.push( newElement );
        m_eElementPushed.set();
    }
    Element Pop(){
        bool wait;
        {//RAII block
            CGuard g( m_csQueue );
            wait = m_Queue.empty();
        }
        if( wait )
            m_eElementPushed.wait();
        Element first;
        {//RAII block
            CGuard g( m_csQueue );
            first = m_Queue.front();
            m_Queue.pop();
        }
        return first;
    }
};
Some explanations are due:
CCriticalSection is a wrapper for a Windows Critical Section, methods Enter and Leave are private, and CGuard is its only friend
CGuard is a RAII wrapper for CCriticalSection, enters critical section on constructor, leaves it on destructor
CEvent is a wrapper for a Windows Event, wait uses the WaitForSingleObject function
I don't mind that Elements are passed around by value, they are small objects
I can't use Boost, just Windows stuff (as I have already been doing with CEvent and CGuard)
I'm afraid there might be some weird race condition scenario when using Pop(). What do you guys think?
UPDATE: Since I'm working on Visual Studio 2010 (.NET 4.0), I ended up using the unbounded_buffer class provided by the Concurrency Runtime. Of course, I wrapped it in a class using the Pointer to Implementation idiom (Cheshire Cat) in case we decide to change the implementation or need to port this class to another environment.
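For reference, a minimal sketch of how unbounded_buffer from <agents.h> is used (Message, MakeMessage and Process are placeholder names, and this is not the actual Pimpl wrapper mentioned above):
#include <agents.h>

Concurrency::unbounded_buffer<Message> g_buffer;

void Producer() {
    Message m = MakeMessage();            // hypothetical helper
    Concurrency::send( g_buffer, m );     // enqueue; never blocks, the buffer is unbounded
}

void Consumer() {
    // receive() blocks until an element is available - exactly the
    // blocking Pop() semantics the question asks for.
    Message m = Concurrency::receive( g_buffer );
    Process( m );                         // hypothetical helper
}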

It's not thread safe:
bool wait;
{//RAII block
    CGuard g( m_csQueue );
    wait = m_Queue.empty();
}
/// BOOM! Another thread ninja-Pop()s the last item.
if( wait )
    m_eElementPushed.wait();
Notice the location of the BOOM comment. Other locations are also possible (after the if). In either case, the subsequent front and pop calls will fail.

Condition variables should be helpful if you are targeting Windows Vista or later. They typically make implementing blocking queues simpler.
See here for the design of a similar queue using Boost - even if you cannot use Boost or condition variables, the general guidance and follow-up discussion there should be useful.
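For example, a minimal sketch using the raw Win32 primitives (assuming Vista or later, with no error handling): the empty check and the wait happen under the same lock, so the race described above cannot occur.
#include <windows.h>
#include <queue>

template <class Element>
class BlockingQueue {
    CRITICAL_SECTION m_cs;
    CONDITION_VARIABLE m_cv;
    std::queue<Element> m_Queue;
public:
    BlockingQueue() {
        InitializeCriticalSection( &m_cs );
        InitializeConditionVariable( &m_cv );
    }
    ~BlockingQueue() { DeleteCriticalSection( &m_cs ); }
    void Push( Element e ) {
        EnterCriticalSection( &m_cs );
        m_Queue.push( e );
        LeaveCriticalSection( &m_cs );
        WakeConditionVariable( &m_cv );
    }
    Element Pop() {
        EnterCriticalSection( &m_cs );
        while( m_Queue.empty() )
            // atomically releases the CS and waits; re-acquires it before returning
            SleepConditionVariableCS( &m_cv, &m_cs, INFINITE );
        Element first = m_Queue.front();
        m_Queue.pop();
        LeaveCriticalSection( &m_cs );
        return first;
    }
};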

Related

Thread-safe recursive iteration

I am implementing an Observer pattern in my C++ application. So far I have a base class defined as follows:
template<typename CallbackType,
         typename LockType = utils::Lock<boost::mutex>,
         typename StorageType = std::vector<std::shared_ptr<CallbackType>>>
class CallbackManager
{
    LockType mCallbackLock { "commons::CallbackManager", __FILE__, __LINE__ };
    StorageType mCallbacks;
public:
    virtual uint64_t registerCallback( const std::shared_ptr<CallbackType>& aCallbackPtr );
    virtual bool unregisterCallback( uint64_t aCallbackId );
    void iterateSubscribers(const std::function<void(typename StorageType::value_type&)>&);
};
One of the key requirements is to be able to perform registerCallback/unregisterCallback calls within the iterateSubscribers context (recursively).
The iterateSubscribers function calls aFnc with every callback as an argument; aFnc itself might perform a registerCallback/unregisterCallback call. The current implementation is written as follows:
void iterateSubscribers(const std::function<void(typename StorageType::value_type&)>& aFnc)
{
    decltype(mCallbacks) callbacks;
    {
        // BEGIN SCOPED LOCK
        auto lock = mCallbackLock.getScopedLock();
        boost::ignore_unused( lock ); // To suppress the warning in case NoLock is used
        callbacks = mCallbacks;
        // END SCOPED LOCK
    }
    for( auto it = callbacks.begin(); it != callbacks.end(); ++it )
    {
        aFnc( *it );
    }
}
I am performing a copy of the callbacks container, which is not great from a performance point of view, but it resolves two issues:
mCallbacks modifications do not break the loop
new lock acquisitions will not cause a deadlock
The trade-off is acceptable for me: I will not notify newly added subscribers and will not skip notifications for recently removed subscribers.
What I can't resolve in an acceptable way is:
A non-copyable container as a template parameter for StorageType will cause a compilation error in the iterateSubscribers function.
One possible solution is to define such a container (or its value_type) as a pointer, but for reasons I won't go into here, I cannot afford this.
I also looked at an alternative solution - don't perform a copy, replace the regular mutex with a recursive mutex, and perform all calls with the lock held - but it has one problem: once an element is added or deleted, the loop breaks (the iterators are invalidated):
for( auto it = callbacks.begin(); it != callbacks.end(); ++it )
Are there any other techniques/ideas to achieve the desired behavior?
EDIT:
I went down the refactoring road, changing the legacy callback containers' value_type from a copy of the callback to a shared_ptr. That allowed me to copy the container without relying on the underlying value_type being TriviallyCopyable.
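For future readers, a minimal sketch of that refactor (all names here are illustrative, not the actual codebase): storing shared_ptrs makes the snapshot copy cheap and independent of the callback type's copyability.
#include <functional>
#include <memory>
#include <mutex>
#include <vector>

using Callback = std::function<void(int)>;            // hypothetical callback type
std::vector<std::shared_ptr<Callback>> mCallbacks;    // legacy container, refactored
std::mutex mMutex;                                    // assumed lock guarding mCallbacks

void iterateSubscribers( const std::function<void(std::shared_ptr<Callback>&)>& aFnc )
{
    std::vector<std::shared_ptr<Callback>> snapshot;
    {
        std::lock_guard<std::mutex> lock( mMutex );
        snapshot = mCallbacks;                        // copies only pointers
    }
    for( auto& cb : snapshot )                        // safe even if mCallbacks is
        aFnc( cb );                                   // modified re-entrantly by aFnc
}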

Mutex guards: is there any automated protection mechanism for objects?

This scenario occurs frequently: we have some threads and a shared object, and we need to make sure that at any time only one thread can modify that object.
Well, the obvious solution is the "lock the door, do the job, get out of there" idiom. In this situation, I always use POSIX mutexes. For example:
pthread_mutex_lock(&this->messageRW); // lock the door
P_Message x = this->messageQueue.front(); // do the job
this->messageQueue.pop();
pthread_mutex_unlock(&this->messageRW); // get out of there

// somewhere else, in another thread
while (true) {
    P_Message message;
    solver->listener->recvMessage(message);
    pthread_mutex_lock(&(solver->messageRW)); // lock the door
    solver->messageQueue.push(message); // do the job
    pthread_mutex_unlock(&(solver->messageRW)); // get out of there
    sem_post(&solver->messageCount);
}
I use messageQueue in many places in the code, so I ended up with a lot of lock/unlock pairs, which is inelegant. I think there should be a way to declare messageQueue as an object that is meant to be shared between threads, letting the threading API take care of lock/unlock. I can think of a wrapper class, or something similar. A POSIX-based solution is preferred, though other APIs (Boost threads, or other libraries) are also acceptable.
What would you implement in a similar situation?
Update for future readers
I found this. It will be a part of C++14, I guess.
You can use boost::mutex::scoped_lock in this case. As soon as you go out of scope, it unlocks elegantly:
boost::mutex mMutex; // member mutex object defined somewhere
{ // scope start
    boost::mutex::scoped_lock scopedLock(mMutex); // lock the door
    P_Message x = this->messageQueue.front(); // do the job
    this->messageQueue.pop();
} // scope end, mutex unlocked

// somewhere else, in another thread
while (true) {
    P_Message message;
    solver->listener->recvMessage(message);
    boost::mutex::scoped_lock scopedLock(mMutex); // scoped lock the door
    solver->messageQueue.push(message); // do the job
    sem_post(&solver->messageCount);
} // each iteration ends the scope and unlocks the mutex
For this I would either subclass (is-a) or include (has-a) the message queue class in another class which forces the use of mutexes.
That's functionally what other languages do, such as Java with the synchronized keyword - it modifies the underlying object so it is automatically protected.
It's the message queue itself which should handle the locking (be atomic), not the calling code. And you need more than just a mutex; you need a condition variable as well, to avoid race conditions. The standard idiom would be something like:
class ScopedLock // You should already have this one anyway
{
    pthread_mutex_t& myOwned;
    ScopedLock( ScopedLock const& );            // not copyable
    ScopedLock& operator=( ScopedLock const& ); // not assignable
public:
    ScopedLock( pthread_mutex_t& owned )
        : myOwned( owned )
    {
        pthread_mutex_lock( &myOwned );
    }
    ~ScopedLock()
    {
        pthread_mutex_unlock( &myOwned );
    }
};

class MessageQueue
{
    std::deque<Message> myQueue;
    pthread_mutex_t myMutex;
    pthread_cond_t myCond;
public:
    MessageQueue()
    {
        pthread_mutex_init( &myMutex, NULL );
        pthread_cond_init( &myCond, NULL );
    }
    void push( Message const& message )
    {
        ScopedLock lock( myMutex ); // a named object, so it lives until scope exit
        myQueue.push_back( message );
        pthread_cond_broadcast( &myCond );
    }
    Message pop()
    {
        ScopedLock lock( myMutex );
        while ( myQueue.empty() ) {
            pthread_cond_wait( &myCond, &myMutex );
        }
        Message results = myQueue.front();
        myQueue.pop_front();
        return results;
    }
};
This needs more error handling, but the basic structure is there.
Of course, if you can use C++11, you'd be better off using the standard thread primitives. (Otherwise, I'd normally suggest Boost threads. But if you're already using POSIX threads, you might want to wait to convert until you can use standard threads, rather than converting twice.) But you'll still need both a mutex and a condition variable.
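For reference, a minimal sketch of the same queue using the C++11 standard primitives the answer refers to (same Message placeholder as above; a sketch, not battle-tested code):
#include <condition_variable>
#include <deque>
#include <mutex>

class MessageQueue
{
    std::deque<Message> myQueue;
    std::mutex myMutex;
    std::condition_variable myCond;
public:
    void push( Message const& message )
    {
        {
            std::lock_guard<std::mutex> lock( myMutex );
            myQueue.push_back( message );
        }
        myCond.notify_one(); // no need to hold the lock while notifying
    }
    Message pop()
    {
        std::unique_lock<std::mutex> lock( myMutex );
        // wait() releases the lock while sleeping and handles spurious wakeups
        myCond.wait( lock, [this]{ return !myQueue.empty(); } );
        Message results = myQueue.front();
        myQueue.pop_front();
        return results;
    }
};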

Race condition in a concurrent queue

I am currently trying to write a concurrent queue, but I am getting some segfaults that I can't explain. My queue implementation is essentially the first listing on this site:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
The site says that there is a race condition if objects are removed from the queue in parallel, but I just don't see why there is one. Could anyone explain it to me?
Edit: This is the code:
template<typename Data>
class concurrent_queue
{
private:
    std::queue<Data> the_queue;
    mutable boost::mutex the_mutex;
public:
    void push(const Data& data)
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.push(data);
    }
    bool empty() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.empty();
    }
    Data& front()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    Data const& front() const
    {
        boost::mutex::scoped_lock lock(the_mutex);
        return the_queue.front();
    }
    void pop()
    {
        boost::mutex::scoped_lock lock(the_mutex);
        the_queue.pop();
    }
};
What if the queue is empty by the time you attempt to read an item from it?
Think of this user code:
while(!q.empty()) // here you check q is not empty
{
    // since q is not empty, you enter the loop,
    // BUT before executing the next statement in the loop body,
    // the OS transfers control to the other thread,
    // which removes items from q, making it empty!!
    // then this thread executes the following statement!
    auto item = q.front(); // what would it do (given q is empty)?
}
If you use empty and find the queue is not empty, another thread may have popped the item making it empty before you use the result.
Similarly for front, you may read the front item, and it could be popped by another thread by the time you use the item.
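The usual fix is to merge the check and the access into a single locked member function; a sketch in the style of the question's class (essentially the try_pop from the linked article):
bool try_pop(Data& result)
{
    boost::mutex::scoped_lock lock(the_mutex);
    if(the_queue.empty())
        return false;           // the check and the answer stay consistent,
    result = the_queue.front(); // because the removal happens under the
    the_queue.pop();            // same lock as the emptiness check
    return true;
}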
The answers from #parkydr and #Nawaz are correct, but here's some more food for thought:
What are you trying to achieve?
The reason to have a thread-safe queue is sometimes (I dare not say often) mistaken. In many cases you want to lock "outside" the queue, in the context where the queue is just an implementation detail.
One reason, however, to have a thread-safe queue is a producer-consumer situation, where 1..N nodes push data and 1..M nodes pop from it regardless of what they get. All elements in the queue are treated equally: the consumers just pop without knowing what they get and start working on the data. In situations like that, your interface should not expose a T& front(). You should never return a reference if you're not sure there's an item there (and in parallel situations, you can never be certain without external locks).
I would recommend using unique_ptrs (or shared_ptrs, of course) and only exposing race-free functions (I'm leaving out const functions for brevity). Using std::unique_ptr requires C++11, but you can use boost::shared_ptr for the same functionality if C++11 isn't possible for you:
// Returns the first item, or an empty unique_ptr
std::unique_ptr< T > pop( );
// Returns the first item if it exists. Otherwise, waits at most <timeout> for
// a value to be pushed. Returns an empty unique_ptr if the timeout was reached.
std::unique_ptr< T > pop( {implementation-specific-type} timeout );
void push( std::unique_ptr< T >&& ptr );
Features such as exist() and front() are natural victims of race conditions, since they cannot atomically perform the task you (think you) want. exist() will sometimes return a value that is incorrect by the time you receive the result, and front() would have to throw if the queue is empty.
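A minimal sketch of how that race-free pop() can be implemented, reusing the question's class name (assuming C++11 and a plain std::mutex; the timed overload is left out):
#include <memory>
#include <mutex>
#include <queue>

template<typename T>
class concurrent_queue
{
    std::queue<std::unique_ptr<T>> the_queue;
    mutable std::mutex the_mutex;
public:
    void push( std::unique_ptr<T>&& ptr )
    {
        std::lock_guard<std::mutex> lock(the_mutex);
        the_queue.push(std::move(ptr));
    }
    // Returns the first item, or an empty unique_ptr: the emptiness check
    // and the removal happen under one lock, so there is no race window.
    std::unique_ptr<T> pop()
    {
        std::lock_guard<std::mutex> lock(the_mutex);
        if(the_queue.empty())
            return nullptr;
        std::unique_ptr<T> result = std::move(the_queue.front());
        the_queue.pop();
        return result;
    }
};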
I think the answers explaining why the empty() function is useless/dangerous are clear. If you want a blocking queue, remove it.
Instead, add a condition variable (boost::condition, IIRC). The functions to push/pop then look like this:
void push(T data)
{
    scoped_lock lock(mutex);
    queue.push(data);
    condition_var.notify_one();
}

T pop()
{
    scoped_lock lock(mutex);
    while(queue.empty())
        condition_var.wait(lock);
    T result = queue.front(); // std::queue::pop() returns void,
    queue.pop();              // so take front() first, then pop()
    return result;
}
Note that this is pseudo-ish code, but I'm confident you can figure it out. That said, the suggestion to use unique_ptr (or auto_ptr for C++98) to avoid copying the actual data is a good idea, but that's a completely separate issue.

VC++ 2010: Weird Critical Section error

My program randomly crashes in a small scenario I can reproduce, but the crash happens in mlock.c (a VC++ runtime file) from ntdll.dll, and I can't see the stack trace. I do know that it happens in one of my thread functions, though.
This is the mlock.c code where the program crashes:
void __cdecl _unlock (
        int locknum
        )
{
    /*
     * leave the critical section.
     */
    LeaveCriticalSection( _locktable[locknum].lock );
}
The error is "invalid handle specified". If I look at locknum, it's a number larger than _locktable's size, so this makes some sense.
This seems to be related to critical section usage. I do use CRITICAL_SECTIONs in my thread, via a CCriticalSection wrapper class and its associated RAII guard, CGuard. Definitions for both here to avoid even more clutter.
This is the thread function that's crashing:
unsigned int __stdcall CPlayBack::timerThread( void * pParams ) {
#ifdef _DEBUG
    DRA::CommonCpp::SetThreadName( -1, "CPlayBack::timerThread" );
#endif
    CPlayBack * pThis = static_cast<CPlayBack*>( pParams );
    bool bContinue = true;
    while( bContinue ) {
        float m_fActualFrameRate = pThis->m_fFrameRate * pThis->m_fFrameRateMultiplier;
        if( m_fActualFrameRate != 0 && pThis->m_bIsPlaying ) {
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent, static_cast<DWORD>( 1000.0f / m_fActualFrameRate ) ) == WAIT_TIMEOUT );
            CImage img;
            if( pThis->m_bIsPlaying && pThis->nextFrame( img ) )
                pThis->sendImage( img );
        }
        else
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent, 10 ) == WAIT_TIMEOUT );
    }
    ::GetErrorLoggerInstance()->Log( LOG_TYPE_NOTE, "CPlayBack", "timerThread", "Exiting thread" );
    return 0;
}
Where does CCriticalSection come in? Every CImage object contains a CCriticalSection object which it uses through a CGuard RAII lock. Moreover, every CImage contains a CSharedMemory object which implements reference counting. To that end, it contains two CCriticalSections as well, one for the data and one for the reference counter. These interactions are best seen in the destructors:
CImage::~CImage() {
    CGuard guard(m_csData);
    if( m_pSharedMemory != NULL ) {
        m_pSharedMemory->decrementUse();
        if( !m_pSharedMemory->isBeingUsed() ){
            delete m_pSharedMemory;
            m_pSharedMemory = NULL;
        }
    }
    m_cProperties.ClearMin();
    m_cProperties.ClearMax();
    m_cProperties.ClearMode();
}

CSharedMemory::~CSharedMemory() {
    CGuard guardUse( m_cs );
    if( m_pData && m_bCanDelete ){
        delete []m_pData;
    }
    m_use = 0;
    m_pData = NULL;
}
Has anyone bumped into this kind of error? Any suggestions?
Edit: I got to see some of the call stack: the call comes from ~CSharedMemory, so there must be some race condition there.
Edit: More CSharedMemory code here
The "invalid handle specified" return code paints a pretty clear picture that your critical section object has been deallocated; assuming of course that it was allocated properly to begin with.
Your RAII class seems like a likely culprit. If you take a step back and think about it, your RAII class violates the Separation of Concerns principle, because it has two jobs:
It provides allocate/destroy semantics for the CRITICAL_SECTION
It provides acquire/release semantics for the CRITICAL_SECTION
Most implementations of a CS wrapper I have seen violate the SoC principle in the same way, but it can be problematic, especially when you have to start passing around instances of the class in order to get to the acquire/release functionality. Consider a simple, contrived example in pseudocode:
void WorkerThreadProc(CCriticalSection cs)
{
    cs.Enter();
    // MAGIC HAPPENS
    cs.Leave();
}

int main()
{
    CCriticalSection my_cs;
    std::vector<NeatStuff> stuff_used_by_multiple_threads;
    // Create 3 threads, passing the entry point "WorkerThreadProc"
    for( int i = 0; i < 3; ++i )
        CreateThread(... &WorkerThreadProc, my_cs);
    // Join the 3 threads...
    wait();
}
The problem here is CCriticalSection is passed by value, so the destructor is called 4 times. Each time the destructor is called, the CRITICAL_SECTION is deallocated. The first time works fine, but now it's gone.
You could kludge around this problem by passing references or pointers to the critical section class, but then you muddy the semantic waters with ownership issues. What if the thread that "owns" the crit sec dies before the other threads? You could use a shared_ptr, but now nobody really "owns" the critical section, and you have given up a little control in one area in order to gain a little in another.
The true "fix" for this problem is to seperate concerns. Have one class for allocation & deallocation:
class CCriticalSection : public CRITICAL_SECTION
{
public:
    CCriticalSection(){ InitializeCriticalSection(this); }
    ~CCriticalSection() { DeleteCriticalSection(this); }
};
...and another to handle locking & unlocking...
class CSLock
{
public:
    CSLock(CRITICAL_SECTION& cs) : cs_(cs) { EnterCriticalSection(&cs_); }
    ~CSLock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION& cs_;
};
Now you can pass around raw pointers or references to a single CCriticalSection object, possibly const, and have the worker threads instantiate their own CSLocks on it. The CSLock is owned by the thread that created it, which is as it should be, but ownership of the CCriticalSection is clearly retained by some controlling thread; also a good thing.
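A hypothetical usage sketch of the two classes together (the worker signature is simplified; CCriticalSection binds to CRITICAL_SECTION& because it derives from it):
CCriticalSection g_cs; // owned by the controlling thread, outlives the workers

void WorkerThreadProc()
{
    CSLock lock( g_cs ); // EnterCriticalSection in the constructor
    // ... touch the shared data ...
}                        // LeaveCriticalSection in the destructor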
Make sure the critical section object is not in a #pragma pack(1) region (or any non-default packing).
Ensure that no other thread (or the same thread) is corrupting the CS object. Run a static analysis tool to check for buffer overrun problems.
If you have a runtime analysis tool, run it to find the issue.
I decided to adhere to the KISS principle and simplify things. I figured I'd replace the CSharedMemory class with a std::tr1::shared_ptr<BYTE> and a CCriticalSection which protects it from concurrent access. Both are members of CImage now, and concerns are better separated, IMHO.
That solved the weird critical section error, but now it seems I have a memory leak caused by std::tr1::shared_ptr; you might see me post about it soon... It never ends!

C++ Templated Producer-Consumer BlockingQueue, unbounded buffer: How do I end elegantly?

I wrote a BlockingQueue so that two threads can communicate. You could say it follows the Producer-Consumer pattern, with an unbounded buffer. Therefore, I implemented it using a critical section and a semaphore, like this:
#pragma once
#include "Semaphore.h"
#include "Guard.h"
#include <queue>

namespace DRA{
namespace CommonCpp{
template<class Element>
class BlockingQueue{
    CCriticalSection m_csQueue;
    CSemaphore m_semElementCount;
    std::queue<Element> m_Queue;
    //Forbid copy and assignment
    BlockingQueue( const BlockingQueue& );
    BlockingQueue& operator=( const BlockingQueue& );
public:
    BlockingQueue( unsigned int maxSize );
    ~BlockingQueue();
    Element Pop();
    void Push( Element newElement );
};
}
}
//Template definitions
template<class Element>
DRA::CommonCpp::BlockingQueue<Element>::BlockingQueue( unsigned int maxSize ):
    m_csQueue( "BlockingQueue::m_csQueue" ),
    m_semElementCount( 0, maxSize ){
}

template<class Element>
DRA::CommonCpp::BlockingQueue<Element>::~BlockingQueue(){
    //TODO What can I do here?
}

template<class Element>
void DRA::CommonCpp::BlockingQueue<Element>::Push( Element newElement ){
    {//RAII block
        CGuard g( m_csQueue );
        m_Queue.push( newElement );
    }
    m_semElementCount.Signal();
}

template<class Element>
Element DRA::CommonCpp::BlockingQueue<Element>::Pop(){
    m_semElementCount.Wait();
    Element popped;
    {//RAII block
        CGuard g( m_csQueue );
        popped = m_Queue.front();
        m_Queue.pop();
    }
    return popped;
}
CGuard is a RAII wrapper for a CCriticalSection: it enters it on construction and leaves it on destruction. CSemaphore is a wrapper for a Windows semaphore.
So far, so good, the threads are communicating perfectly. However, when the producer thread stops producing and ends, and the consumer thread has consumed everything, the consumer thread stays forever hung on a Pop() call.
How can I tell the consumer to end elegantly? I thought of sending a special empty Element, but it seems too sloppy.
You'd be better off using an event instead of the semaphore. When pushing, take the lock on the CS and check whether the queue is empty (store the result in a local bIsEmpty variable). Push onto the queue; then, if the queue WAS empty, SetEvent!
In the Pop method, lock first, then check whether the queue is empty; if it is, WaitForSingleObject - as soon as WFSO returns you know there is at least one item in the queue.
Check this article
Does your semaphore implementation have a timed-wait function available? On Windows, that would be WaitForSingleObject() specifying a timeout. If so, Pop() could be implemented like this:
// Pseudo code
bool Pop(Element& r, timeout)
{
    if(sem.wait(timeout))
    {
        CGuard g( m_csQueue ); // the queue itself still needs protecting
        r = m_Queue.front();
        m_Queue.pop();
        return true;
    }
    return false;
}
This way, Pop() is still blocking, though it can be easily interrupted. Even with very short timeouts this won't consume significant amounts of CPU (more than absolutely necessary, yes - and it can potentially introduce additional context switching - so note those caveats).
You need a way of telling the consumer to stop. This could be a special element in the queue (say, a simple wrapper structure around the Element) or a flag - a member variable of the queue class (in which case you want to make sure the flag is dealt with atomically - look up the Windows "Interlocked" functions). Then you need to check that condition in the consumer every time it wakes up. Finally, in the destructor, set that stop condition and signal the semaphore.
One issue remains - what to return from the consumer's Pop(). I'd go for a boolean return value and an argument of type Element& to copy the result into on success.
Edit:
Something like this:
bool Queue::Pop( Element& result ) {
    sema.Wait();
    if ( stop_condition )
        return false;
    critical_section.Enter();
    result = m_queue.front();
    m_queue.pop();
    critical_section.Leave();
    return true;
}
Change Pop to return a boost::optional (or do it like the standard library does with top/pop, separating the tasks) and then signal m_semElementCount one last time on destruction.
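A minimal sketch of that suggestion, assuming the question's CGuard/CSemaphore wrappers and a single consumer (with N consumers the destructor would need to Signal() N times, and the consumer must not outlive the queue):
#include <boost/optional.hpp>

template <class Element>
boost::optional<Element> DRA::CommonCpp::BlockingQueue<Element>::Pop(){
    m_semElementCount.Wait();
    CGuard g( m_csQueue );
    if( m_Queue.empty() )         // only possible after the shutdown Signal()
        return boost::none;       // tells this consumer to end
    Element popped = m_Queue.front();
    m_Queue.pop();
    return popped;
}

template <class Element>
DRA::CommonCpp::BlockingQueue<Element>::~BlockingQueue(){
    m_semElementCount.Signal();   // one extra permit: a blocked consumer wakes,
}                                 // finds the queue empty, and returns boost::none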