Multithreaded c++11-ish queue fails on windows - c++

I'm not that into multi-threading, so I appreciate any advice.
In my server which is written in producer-consumer multi-threaded style
queue is wrapped altogether with its mutex and cv:
template <typename Event>
struct EventsHandle {
public: // methods:
Event*
getEvent()
{
std::unique_lock<std::mutex> lock {mutex};
while (events.empty()) {
condition_variable.wait(lock);
}
return events.front();
};
void
setEvent(Event* event)
{
std::lock_guard<std::mutex> lock {mutex};
events.push(event);
condition_variable.notify_one();
};
void
pop()
{ events.pop(); };
private: // fields:
std::queue<Event*> events;
std::mutex mutex;
std::condition_variable condition_variable;
};
and here how it is used in the consumer thread:
void
Server::listenEvents()
{
while (true) {
processEvent(events_handle.getEvent());
events_handle.pop();
}
};
and in a producer:
parse input, whatever else
...
setEvent(new Event {ERASE_CLIENT, getSocket(), nullptr});
...
void
Client::setEvent(Event* event)
{
if (event) {
events_handle->setEvent(event);
}
};
The code works on linux and I don't know why, but fails on windows MSVC13.
At some point exception is thrown with this dialog:
"Unhandled exception at 0x59432564 (msvcp120d.dll) in Server.exe: 0xC0000005: Access violation reading location 0xCDCDCDE1".
Debugging shows that exception is thrown on this line: std::lock_guard<std::mutex> lock(mutex) in the setEvent() function.
Little googling led me to these articles:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
http://www.codeproject.com/Articles/598695/Cplusplus-threads-locks-and-condition-variables
I tried to follow them, but with little help. So after this long text sheet my question:
what's wrong with the code, mutex?
UPDATE
So... Finally after lovely game named 'comment that line out' it turned out that the problem was in memory management. Event's dctor caused failure.
As conclusion, it's better to use std::unique_ptr<> as mentions Jarod42 or use value semantic as advises juanchopanza when passing objects back and forth.
And use libraries whenever possible, don't reinvent the wheel =)

You have a data race due to a missing mutex in EventsHandle::pop.
Your producer thread may push items to the queue by calling setEvent(), and get preempted while executing the line events.push(event). Now a consumer thread can execute events.pop() concurrently. You end up with two unsynchronized writes on the queue, which is undefined behavior.
Also note that if you have more than one consumer, you need to ensure that the element you pop is the same you retrieved earlier from getEvent. In case one consumer gets preempted by another between the two calls. This is very hard to achieve with two separate member functions which are synchronized by a mutex that is a member of the class. The usual approach here is to offer a single getEventAndPop() function instead, that keeps the lock throughout the operation and get rid of the separate functions you currently have. This might seem like an absurd restriction at first sight, but multithreaded code has to play by different rules.

You might also want to change the setEvent method to not notify while the mutex is still locked. It depends on the scheduler but the thread waiting to be notified might wake up immediately just to wait for the mutex.
void
setEvent(Event* event)
{
{
std::lock_guard<std::mutex> lock{mutex};
events.push(event);
}
condition_variable.notify_one();
};

In my opinion, pop should be integrated in the getEvent().
This will prevent more threads from getting the same event (if pop is not in getEvent then more threads could get the same one).
Event*
getEvent()
{
std::unique_lock<std::mutex> lock {mutex};
while (events.empty()) {
condition_variable.wait(lock);
}
Event* front = events.front();
events.pop();
return front;
};

Related

And odd use of conditional variable with local mutex

Poring through legacy code of old and large project, I had found that there was used some odd method of creating thread-safe queue, something like this:
template < typename _Msg>
class WaitQue: public QWaitCondition
{
public:
typedef _Msg DataType;
void wakeOne(const DataType& msg)
{
QMutexLocker lock_(&mx);
que.push(msg);
QWaitCondition::wakeOne();
}
void wait(DataType& msg)
{
/// wait if empty.
{
QMutex wx; // WHAT?
QMutexLocker cvlock_(&wx);
if (que.empty())
QWaitCondition::wait(&wx);
}
{
QMutexLocker _wlock(&mx);
msg = que.front();
que.pop();
}
}
unsigned long size() {
QMutexLocker lock_(&mx);
return que.size();
}
private:
std::queue<DataType> que;
QMutex mx;
};
wakeOne is used from threads as kind of "posting" function" and wait is called from other threads and waits indefinitely until a message appears in queue. In some cases roles between threads reverse at different stages and using separate queues.
Is this even legal way to use a QMutex by creating local one? I kind of understand why someone could do that to dodge deadlock while reading size of que but how it even works? Is there a simpler and more idiomatic way to achieve this behavior?
Its legal to have a local condition variable. But it normally makes no sense.
As you've worked out in this case is wrong. You should be using the member:
void wait(DataType& msg)
{
QMutexLocker cvlock_(&mx);
while (que.empty())
QWaitCondition::wait(&mx);
msg = que.front();
que.pop();
}
Notice also that you must have while instead of if around the call to QWaitCondition::wait. This is for complex reasons about (possible) spurious wake up - the Qt docs aren't clear here. But more importantly the fact that the wake and the subsequent reacquire of the mutex is not an atomic operation means you must recheck the variable queue for emptiness. It could be this last case where you previously were getting deadlocks/UB.
Consider the scenario of an empty queue and a caller (thread 1) to wait into QWaitCondition::wait. This thread blocks. Then thread 2 comes along and adds an item to the queue and calls wakeOne. Thread 1 gets woken up and tries to reacquire the mutex. However, thread 3 comes along in your implementation of wait, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1 which has been woken up finally acquires the mutex, returns from QWaitCondition::wait and tries to process... an empty queue. Yikes.

Deadlock with boost::condition_variable

I am a bit stuck with the problem, so it is my cry for help.
I have a manager that pushes some events to a queue, which is proceeded in another thread.
I don't want this thread to be 'busy waiting' for events in the queue, because it may be empty all the time (as well as it may always be full).
Also I need m_bShutdownFlag to stop the thread when needed.
So I wanted to try a condition_variable for this case: if something was pushed to a queue, then the thread starts its work.
Simplified code:
class SomeManager {
public:
SomeManager::SomeManager()
: m_bShutdownFlag(false) {}
void SomeManager::Initialize() {
boost::recursive_mutex::scoped_lock lock(m_mtxThread);
boost::thread thread(&SomeManager::ThreadProc, this);
m_thread.swap(thread);
}
void SomeManager::Shutdown() {
boost::recursive_mutex::scoped_lock lock(m_mtxThread);
if (m_thread.get_id() != boost::thread::id()) {
boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents);
m_bShutdownFlag = true;
m_condEvents.notify_one();
m_queue.clear();
}
}
void SomeManager::QueueEvent(const SomeEvent& event) {
boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents);
m_queue.push_back(event);
m_condEvents.notify_one();
}
private:
void SomeManager::ThreadProc(SomeManager* pMgr) {
while (true) {
boost::unique_lock<boost::mutex> lockEvents(pMgr->m_mtxEvents);
while (!(pMgr->m_bShutdownFlag || pMgr->m_queue.empty()))
pMgr->m_condEvents.wait(lockEvents);
if (pMgr->m_bShutdownFlag)
break;
else
/* Thread-safe processing of all the events in m_queue */
}
}
boost::thread m_thread;
boost::recursive_mutex m_mtxThread;
bool m_bShutdownFlag;
boost::mutex m_mtxEvents;
boost::condition_variable m_condEvents;
SomeThreadSafeQueue m_queue;
}
But when I test it with two (or more) almost simultaneous calls to QueueEvent, it gets locked at the line boost::lock_guard<boost::mutex> lockEvents(m_mtxEvents); forever.
Seems like the first call doesn't ever release lockEvents, so all the rest just keep waiting for its freeing.
Please, help me to find out what am I doing wrong and how to fix this.
There's a few things to point out on your code:
You may wish to join your thread after calling shutdown, to ensure that your main thread doesn't finish before your other thread.
m_queue.clear(); on shutdown is done outside of your m_mtxEvents mutex lock, meaning it's not as thread safe as you think it is.
your 'thread safe processing' of the queue should be just taking an item off and then releasing the lock while you go off to process the event. You've not shown that explicitly, but failure to do so will result in the lock preventing items from being added.
The good news about a thread blocking like this, is that you can trivially break and inspect what the other threads are doing, and locate the one that is holding the lock. It might be that as per my comment #3 you're just taking a long time to process an event. On the other hand it may be that you've got a dead lock. In any case, what you need is to use your debugger to establish exactly what you've done wrong, since your sample doesn't have enough in it to demonstrate your problem.
inside ThreadProc, while(ture) loop, the lockEvents is not unlocked in any case. try put lock and wait inside a scope.

Actor calculation model using boost::thread

I'm trying to implement Actor calculation model over threads on C++ using boost::thread.
But program throws weird exception during execution. Exception isn't stable and some times program works in correct way.
There my code:
actor.hpp
class Actor {
public:
typedef boost::function<int()> Job;
private:
std::queue<Job> d_jobQueue;
boost::mutex d_jobQueueMutex;
boost::condition_variable d_hasJob;
boost::atomic<bool> d_keepWorkerRunning;
boost::thread d_worker;
void workerThread();
public:
Actor();
virtual ~Actor();
void execJobAsync(const Job& job);
int execJobSync(const Job& job);
};
actor.cpp
namespace {
int executeJobSync(std::string *error,
boost::promise<int> *promise,
const Actor::Job *job)
{
int rc = (*job)();
promise->set_value(rc);
return 0;
}
}
void Actor::workerThread()
{
while (d_keepWorkerRunning) try {
Job job;
{
boost::unique_lock<boost::mutex> g(d_jobQueueMutex);
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
job = d_jobQueue.front();
d_jobQueue.pop();
}
job();
}
catch (...) {
// Log error
}
}
void Actor::execJobAsync(const Job& job)
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(job);
d_hasJob.notify_one();
}
int Actor::execJobSync(const Job& job)
{
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(boost::bind(executeJobSync, &error, &promise, &job));
d_hasJob.notify_one();
}
int rc = future.get();
if (rc) {
ErrorUtil::setLastError(rc, error.c_str());
}
return rc;
}
Actor::Actor()
: d_keepWorkerRunning(true)
, d_worker(&Actor::workerThread, this)
{
}
Actor::~Actor()
{
d_keepWorkerRunning = false;
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_hasJob.notify_one();
}
d_worker.join();
}
Actually exception that is thrown is boost::thread_interrupted in int rc = future.get(); line. But form boost docs I can't reason of this exception. Docs says
Throws: - boost::thread_interrupted if the result associated with *this is not ready at the point of the call, and the current thread is interrupted.
But my worker thread can't be in interrupted state.
When I used gdb and set "catch throw" I see that back trace looks like
throw thread_interrupted
boost::detail::interruption_checker::check_for_interruption
boost::detail::interruption_checker::interruption_checker
boost::condition_variable::wait
boost::detail::future_object_base::wait_internal
boost::detail::future_object_base::wait
boost::detail::future_object::get
boost::unique_future::get
I looked into boost sources but can't get why interruption_checker decided that worker thread is interrupted.
So someone C++ guru, please help me. What I need to do to get correct code?
I'm using:
boost 1_53
Linux version 2.6.18-194.32.1.el5 Red Hat 4.1.2-48
gcc 4.7
EDIT
Fixed it! Thanks to Evgeny Panasyuk and Lazin. The problem was in TLS
management. boost::thread and boost::thread_specific_ptr are using
same TLS storage for their purposes. In my case there was problem when
they both tried to change this storage on creation (Unfortunately I
didn't get why in details it happens). So TLS became corrupted.
I replaced boost::thread_specific_ptr from my code with __thread
specified variable.
Offtop: During debugging I found memory corruption in external library
and fixed it =)
.
EDIT 2
I got the exact problem... It is a bug in GCC =)
The _GLIBCXX_DEBUG compilation flag breaks ABI.
You can see discussion on boost bugtracker:
https://svn.boost.org/trac/boost/ticket/7666
I have found several bugs:
Actor::workerThread function does double unlock on d_jobQueueMutex. First unlock is manual d_jobQueueMutex.unlock();, second is in destructor of boost::unique_lock<boost::mutex>.
You should prevent one of unlocking, for example release association between unique_lock and mutex:
g.release(); // <------------ PATCH
d_jobQueueMutex.unlock();
Or add additional code block + default-constructed Job.
It is possible that workerThread will never leave following loop:
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
Imagine following case: d_jobQueue is empty, Actor::~Actor() is called, it sets flag and notifies worker thread:
d_keepWorkerRunning = false;
d_hasJob.notify_one();
workerThread wakes up in while loop, sees that queue is empty and sleeps again.
It is common practice to send special final job to stop worker thread:
~Actor()
{
execJobSync([this]()->int
{
d_keepWorkerRunning = false;
return 0;
});
d_worker.join();
}
In this case, d_keepWorkerRunning is not required to be atomic.
LIVE DEMO on Coliru
EDIT:
I have added event queue code into your example.
You have concurrent queue in both EventQueueImpl and Actor, but for different types. It is possible to extract common part into separate entity concurrent_queue<T> which works for any type. It would be much easier to debug and test queue in one place than catching bugs scattered over different classes.
So, you can try to use this concurrent_queue<T>(on Coliru)
This is just a guess. I think that some code can actually call boost::tread::interrupt(). You can set breakpoint to this function and see what code is responsible for this. You can test for interruption in execJobSync:
int Actor::execJobSync(const Job& job)
{
if (boost::this_thread::interruption_requested())
std::cout << "Interruption requested!" << std::endl;
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
The most suspicious code in this case is a code that has reference to thread object.
It is good practice to make your boost::thread code interruption aware anyway. It is also possible to disable interruption for some scope.
If this is not the case - you need to check code that works with thread local storage, because thread interruption flag stored in the TLS. Maybe some your code rewrites it. You can check interruption before and after such code fragment.
Another possibility is that your memory is corrupt. If no code is calling boost::thread::interrupt() and you doesn't work with TLS. This is the most hard case, try to use some dynamic analyzer - valgrind or clang memory sanitizer.
Offtopic:
You probably need to use some concurrent queue. std::queue will be very slow because of high memory contention and you will end up with poor cache performance. Good concurrent queue allow your code to enqueue and dequeue elements in parallel.
Also, actor is not something that supposed to execute arbitrary code. Actor queue must receive simple messages, not functions! Youre writing a job queue :) You need to take a look at some actor system like Akka or libcpa.

Condition variables and lockfree container

Conditional variables use a mutex and the .wait() function unlocks the
mutex so another thread can access the shared data. When the condition
variable is notified it tries to lock the mutex again to use the shared
data.
This pattern is used in the following concurrent_queue example from Anthony Williams:
template<typename Data>
class concurrent_queue
{
private:
boost::condition_variable the_condition_variable;
public:
void wait_for_data()
{
boost::mutex::scoped_lock lock(the_mutex);
while(the_queue.empty())
{
the_condition_variable.wait(lock);
}
}
void push(Data const& data)
{
boost::mutex::scoped_lock lock(the_mutex);
bool const was_empty=the_queue.empty();
the_queue.push(data);
if(was_empty)
{
the_condition_variable.notify_one();
}
}
};
Since the code uses std::queue it's clear that the mutex has to be
locked when accessing the queue.
But let's say instead of std::queue one uses Microsofts
Concurrency::concurrent_queue from PPL. Member functions like empty,
push and try_pop are thread safe. Do I still need to lock a mutex in
this case or can the condition variable be used like this, without
creating any possible race conditions.
My code (that seems to work, but what does that mean in multithreading?) looks like this. I have one producer that pushes items into Microsofts concurrent_queue and one background thread that waits for new items in this queue.
The consumer/background thread:
while(runFlag) //atomic
{
while(the_queue.empty() && runFlag) //wait only when thread should still run
{
boost::mutex mtx; //local mutex thats locked afterwards. awkward.
boost::mutex::scoped_lock lock(mtx);
condition.wait(lock);
}
Data d;
while(!the_queue.empty() && the_queue.try_pop(d))
{
//process data
}
}
The producer/main thread:
const bool was_empty = the_queue.empty();
Data d;
the_queue.push(d);
if(was_empty) cond_var.notify_one();
The shutdown procedure:
bool expected_run_state = true;
if(runLoop.compare_exchange_strong(expected_run_state, false))
{
//atomically set our loop flag to false and
//notify all clients of the queue to wake up and exit
cond_var.notify_all();
}
As said above this code seems to work but that doesn't necessarily mean it's correct. Especially the local mutex that is only used because the condition variable interface forces me to use a mutex, seems like a very bad idea. I wanted to use condition variables since the time between data items added to the queue hard to predict and I would have to create to sleep and wake up periodically like this:
if(the_queue.empty()) Sleep(short_amount_of_time);
Are there any other, maybe OS (in my case: Windows) specific tools, that make a background thread sleep until some condition is met without regularly waking up and checking the condition?
The code is not correct in different scenarios, for example. If the queue has a single element when const bool was_empty = the_queue.empty(); is evaluated, but a thread consumes the element and a different thread tries to consume and waits on the condition, the writer will not notify that thread after inserting the element in the queue.
The key issue is that the fact that all of the operations in an interface are thread safe does not necessarily mean that your use of the interface is safe. If you depend on multiple operations being performed atomically, you need to provide synchronization mechanisms externally.
Are there any other, maybe OS (in my case: Windows) specific tools,
that make a background thread sleep until some condition is met
without regularly waking up and checking the condition?
This is exactly what Events are for
But if you are targeting only Windows platform (Vista+) you should check out
Slim Reader/Writer (SRW) Locks

C++ synchronization and exception handling in cross threads

I am using boost library for threading and synchronization in my application.
First of all I must say exceptions within threads on synchronization is compilitey new thing for me.
In any case below is the pseudo code what I want to achieve. I want synchronized threads to throw same exception that MAY have been thrown from the thread doing notify. How can I achieve this?
Could not find any topics from Stack Overflow regarding exception throwing with cross thread interaction using boost threading model
Many thanks in advance!
// mutex and scondition variable for the problem
mutable boost::mutex conditionMutex;
mutable boost::condition_variable condition;
inline void doTheThing() const {
if (noone doing the thing) {
try {
doIt()
// I succeeded
failed = false;
condition.notify_all();
}
catch (...) {
// I failed to do it
failed = true;
condition.notify_all();
throw
}
else {
boost::mutex::scoped_lock lock(conditionMutex);
condition.wait(lock);
if (failed) {
// throw the same exception that was thrown from
// thread doing notify_all
}
}
}
So you want the first thread that hits doTheThing() to call doIt(), and all subsequent threads that hit doTheThing() to wait for the first thread to finish calling doIt() before they proceed.
I think this should do the trick:
boost::mutex conditionMutex; // mutable qualifier not needed
bool failed = false;
bool done = false;
inline void doTheThing() const {
boost::unique_lock uql(conditionMutex);
if (!done) {
done = true;
try {
doIt();
failed = false;
}
catch (...) {
failed = true;
throw
}
}
else if (failed)
{
uql.unlock();
// now this thread knows that another thread called doIt() and an exception
// was thrown in that thread.
}
}
Important notes:
Every thread that calls doTheThing() must take a lock. There is no way around this. You are synchronizing threads, and for a thread to know anything about what's happening in another thread, it must take a lock. (Or it can use atomic memory operations, but that's a more advanced technique.) The variables failed and done are protected by the conditionMutex.
C++ will call destructor of uql when the function exits normally or by throwing exception.
EDIT Oh, and as for throwing the exception to all the other threads, forget about that, it's almost impossible, and it isn't the way things are done in C++. Instead, each thread can check to see if the first thread successfully called doIt() in the place I've indicated above.
EDIT There is no language support for propagating an exception to another thread. You can generalize the problem of propagating exceptions to another thread to passing messages to another thread. There are lots of library solutions to the problem of passing messages between threads ( boost::asio::io_service::post() ), and you could pass a message that contains the exception, with instructions to throw that exception on receipt of message. It's a bad idea, though. Only throw exceptions when you have an error that prevents you from unwinding the call stack by ordinary function return. That's what an exception is--an alternative way to return from a function when returning the usual way doesn't make sense.