I'm trying to implement Actor calculation model over threads on C++ using boost::thread.
But program throws weird exception during execution. Exception isn't stable and some times program works in correct way.
There my code:
actor.hpp
class Actor {
public:
typedef boost::function<int()> Job;
private:
std::queue<Job> d_jobQueue;
boost::mutex d_jobQueueMutex;
boost::condition_variable d_hasJob;
boost::atomic<bool> d_keepWorkerRunning;
boost::thread d_worker;
void workerThread();
public:
Actor();
virtual ~Actor();
void execJobAsync(const Job& job);
int execJobSync(const Job& job);
};
actor.cpp
namespace {
int executeJobSync(std::string *error,
boost::promise<int> *promise,
const Actor::Job *job)
{
int rc = (*job)();
promise->set_value(rc);
return 0;
}
}
void Actor::workerThread()
{
while (d_keepWorkerRunning) try {
Job job;
{
boost::unique_lock<boost::mutex> g(d_jobQueueMutex);
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
job = d_jobQueue.front();
d_jobQueue.pop();
}
job();
}
catch (...) {
// Log error
}
}
void Actor::execJobAsync(const Job& job)
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(job);
d_hasJob.notify_one();
}
int Actor::execJobSync(const Job& job)
{
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_jobQueue.push(boost::bind(executeJobSync, &error, &promise, &job));
d_hasJob.notify_one();
}
int rc = future.get();
if (rc) {
ErrorUtil::setLastError(rc, error.c_str());
}
return rc;
}
Actor::Actor()
: d_keepWorkerRunning(true)
, d_worker(&Actor::workerThread, this)
{
}
Actor::~Actor()
{
d_keepWorkerRunning = false;
{
boost::mutex::scoped_lock g(d_jobQueueMutex);
d_hasJob.notify_one();
}
d_worker.join();
}
Actually exception that is thrown is boost::thread_interrupted in int rc = future.get(); line. But form boost docs I can't reason of this exception. Docs says
Throws: - boost::thread_interrupted if the result associated with *this is not ready at the point of the call, and the current thread is interrupted.
But my worker thread can't be in interrupted state.
When I used gdb and set "catch throw" I see that back trace looks like
throw thread_interrupted
boost::detail::interruption_checker::check_for_interruption
boost::detail::interruption_checker::interruption_checker
boost::condition_variable::wait
boost::detail::future_object_base::wait_internal
boost::detail::future_object_base::wait
boost::detail::future_object::get
boost::unique_future::get
I looked into boost sources but can't get why interruption_checker decided that worker thread is interrupted.
So someone C++ guru, please help me. What I need to do to get correct code?
I'm using:
boost 1_53
Linux version 2.6.18-194.32.1.el5 Red Hat 4.1.2-48
gcc 4.7
EDIT
Fixed it! Thanks to Evgeny Panasyuk and Lazin. The problem was in TLS
management. boost::thread and boost::thread_specific_ptr are using
same TLS storage for their purposes. In my case there was problem when
they both tried to change this storage on creation (Unfortunately I
didn't get why in details it happens). So TLS became corrupted.
I replaced boost::thread_specific_ptr from my code with __thread
specified variable.
Offtop: During debugging I found memory corruption in external library
and fixed it =)
.
EDIT 2
I got the exact problem... It is a bug in GCC =)
The _GLIBCXX_DEBUG compilation flag breaks ABI.
You can see discussion on boost bugtracker:
https://svn.boost.org/trac/boost/ticket/7666
I have found several bugs:
Actor::workerThread function does double unlock on d_jobQueueMutex. First unlock is manual d_jobQueueMutex.unlock();, second is in destructor of boost::unique_lock<boost::mutex>.
You should prevent one of unlocking, for example release association between unique_lock and mutex:
g.release(); // <------------ PATCH
d_jobQueueMutex.unlock();
Or add additional code block + default-constructed Job.
It is possible that workerThread will never leave following loop:
while (d_jobQueue.empty()) {
d_hasJob.wait(g);
}
Imagine following case: d_jobQueue is empty, Actor::~Actor() is called, it sets flag and notifies worker thread:
d_keepWorkerRunning = false;
d_hasJob.notify_one();
workerThread wakes up in while loop, sees that queue is empty and sleeps again.
It is common practice to send special final job to stop worker thread:
~Actor()
{
execJobSync([this]()->int
{
d_keepWorkerRunning = false;
return 0;
});
d_worker.join();
}
In this case, d_keepWorkerRunning is not required to be atomic.
LIVE DEMO on Coliru
EDIT:
I have added event queue code into your example.
You have concurrent queue in both EventQueueImpl and Actor, but for different types. It is possible to extract common part into separate entity concurrent_queue<T> which works for any type. It would be much easier to debug and test queue in one place than catching bugs scattered over different classes.
So, you can try to use this concurrent_queue<T>(on Coliru)
This is just a guess. I think that some code can actually call boost::tread::interrupt(). You can set breakpoint to this function and see what code is responsible for this. You can test for interruption in execJobSync:
int Actor::execJobSync(const Job& job)
{
if (boost::this_thread::interruption_requested())
std::cout << "Interruption requested!" << std::endl;
std::string error;
boost::promise<int> promise;
boost::unique_future<int> future = promise.get_future();
The most suspicious code in this case is a code that has reference to thread object.
It is good practice to make your boost::thread code interruption aware anyway. It is also possible to disable interruption for some scope.
If this is not the case - you need to check code that works with thread local storage, because thread interruption flag stored in the TLS. Maybe some your code rewrites it. You can check interruption before and after such code fragment.
Another possibility is that your memory is corrupt. If no code is calling boost::thread::interrupt() and you doesn't work with TLS. This is the most hard case, try to use some dynamic analyzer - valgrind or clang memory sanitizer.
Offtopic:
You probably need to use some concurrent queue. std::queue will be very slow because of high memory contention and you will end up with poor cache performance. Good concurrent queue allow your code to enqueue and dequeue elements in parallel.
Also, actor is not something that supposed to execute arbitrary code. Actor queue must receive simple messages, not functions! Youre writing a job queue :) You need to take a look at some actor system like Akka or libcpa.
Related
I am currently debugging a server(win32/64) that utilizes Boost:asio 1.78.
The code is a blend of legacy, older legacy and some newer code. None of this code is mine. I can't answer for why something is done in a certain way. I'm just trying to understand why this is happening and hopefully fix it wo. rewriting it from scratch. This code has been running for years on 50+ servers with no errors. Just these 2 servers that missbehaves.
I have one client (dot.net) that is connected to two servers. Client is sending the same data to the 2 servers. The servers run the same code, as follows in code sect.
All is working well but now and then communications halts. No errors or exceptions on either end. It just halts. Never on both servers at the same time. This happens very seldom. Like every 3 months or less. I have no way of reproducing it in a debugger bc I don't know where to look for this behavior.
On the client side the socket appears to be working/open but does not accept new data. No errors is detected in the socket.
Here's a shortened code describing the functions. I want to stress that I can't detect any errors or exceptions during these failures. Code just stops at "m_socket->read_some()".
Only solution to "unblock" right now is to close the socket manually and restart the acceptor. When I manually close the socket the read_some method returns with error code so I know it is inside there it stops.
Questions:
What may go wrong here and give this behavior?
What parameters should I log to enable me to determine what is happening, and from where.
main code:
std::shared_ptr<boost::asio::io_service> io_service_is = std::make_shared<boost::asio::io_service>();
auto is_work = std::make_shared<boost::asio::io_service::work>(*io_service_is.get());
auto acceptor = std::make_shared<TcpAcceptorWrapper>(*io_service_is.get(), port);
acceptor->start();
auto threadhandle = std::thread([&io_service_is]() {io_service_is->run();});
TcpAcceptorWrapper:
void start(){
m_asio_tcp_acceptor.open(boost::asio::ip::tcp::v4());
m_asio_tcp_acceptor.bind(boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), m_port));
m_asio_tcp_acceptor.listen();
start_internal();
}
void start_internal(){
m_asio_tcp_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { /* Handler code */ });
}
Handler code:
m_current_session = std::make_shared<TcpSession>(&m_socket);
std::condition_variable condition;
std::mutex mutex;
bool stopped(false);
m_current_session->run(condition, mutex, stopped);
{
std::unique_lock<std::mutex> lock(mutex);
condition.wait(lock, [&stopped] { return stopped; });
}
TcpSession runner:
void run(std::condition_variable& complete, std::mutex& mutex, bool& stopped){
auto self(shared_from_this());
std::thread([this, self, &complete, &mutex, &stopped]() {
{ // mutex scope
// Lock and hold mutex from tcp_acceptor scope
std::lock_guard<std::mutex> lock(mutex);
while (true) {
std::array<char, M_BUFFER_SIZE> buffer;
try {
boost::system::error_code error;
/* Next call just hangs/blocks but only rarely. like once every 3 months or more seldom */
std::size_t read = m_socket->read_some(boost::asio::buffer(buffer, M_BUFFER_SIZE), error);
if (error || read == -1) {
// This never happens
break;
}
// inside this all is working
process(buffer);
} catch (std::exception& ex) {
// This never happens
break;
} catch (...) {
// Neither does this
break;
}
}
stopped = true;
} // mutex released
complete.notify_one();
}).detach();
}
This:
m_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { // Handler code });
Handler code:
std::condition_variable condition;
std::mutex mutex;
bool stopped(false);
m_current_session->run(condition, mutex, stopped);
{
std::unique_lock<std::mutex> lock(mutex);
condition.wait(lock, [&stopped] { return stopped; });
}
Is strange. It suggests you are using an "async" accept, but the handler block unconditionally until the session completes. That's the opposite of asynchrony. You could much easier write the same code without the asynchrony, and also without the thread and synchronization around it.
My intuition says something is blocking the mutex. Have you established that the session stack is actually inside the read_some frame when e.g. doing a debugger break during a "lock-up"?
When I manually close the socket the read_some method returns with error code so I know it is inside there I have an issue.
You can't legally do that. Your socket is in use on a thread - in a blocking read -, and you're bound to close it from a separate thread. That's a race-condition (see docs). If you want cancellable operations, use async_read*.
There are more code smells (read_some is a lowlevel primitive that is rarely what you want at the application level, detached threads with manual synchronization on termination could be packaged tasks, shared boolean flags could be atomics, notify_one outside the mutex could lead to thread starvation on some platforms etc.).
If you can share more code I'll be happy to sketch simplified solutions that remove the problems.
well, actually, I'm not asking the threads must "line up" to work, but I just want to notify multiple threads. so I'm not looking for barrier.
it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation(also the potential problem in multiple semaphore post operation). it's kind of like:
std::atomic_flag flag{ATOMIC_FLAG_INIT};
void example() {
if (!flag.test_and_set()) {
// this is the thread to do the job, and notify others
do_something();
notify_others(); // this is what I'm looking for
flag.clear();
} else {
// this is the waiting thread
wait_till_notification();
do_some_other_thing();
}
}
void runner() {
std::vector<std::threads>;
for (int i=0; i<10; ++i) {
threads.emplace_back([]() {
while(1) {
example();
}
});
}
// ...
}
so how can I do this in c/c++ or maybe posix API?
sorry, I didn't make this question clear enough, I'd add some more explaination.
it's not thunder heard problem I'm talking about, and yes, it's the re-acquire-lock that bothers me, and I tried shared_mutex, there's still some problem.
let me split the threads to 2 parts, 1 as leader thread, which do the writing job, the others as worker threads, which do the reading job.
but actually they're all equal in programme, the leader thread is the thread that 1st got access to the job( you can take it as the shared buffer is underflowed for this thread). once the job is done, the other workers just need to be notified that them have the access.
if the mutex is used here, any thread would block the others.
to give an example: the main thread's job do_something() here is a read, and it block the main thread, thus the whole system is blocked.
unfortunatly, shared_mutex won't solve this problem:
void example() {
if (!flag.test_and_set()) {
// leader thread:
lk.lock();
do_something();
lk.unlock();
flag.clear();
} else {
// worker thread
lk.shared_lock();
do_some_other_thing();
lk.shared_unlock();
}
}
// outer loop
void looper() {
std::vector<std::threads>;
for (int i=0; i<10; ++i) {
threads.emplace_back([]() {
while(1) {
example();
}
});
}
}
in this code, if the leader job was done, and not much to do between this unlock and next lock (remember they're in a loop), it may get the lock again, leave the worker jobs not working, which is why I call it starve earlier.
and to explain the blocking in do_something(), I don't want this part of job takes all my CPU time, even if the leader's job is not ready (no data arrive for read)
and std::call_once may still not be the answer to this. because, as you can see, the workers must wait till the leader's job finished.
to summarize, this is actually a one-producer-multi-consumer problem.
but I want the consumers can do the job when the product is ready for them. and any can be the producer or consumer. if any but the 1st find the product has run out, the thread should be the producer, thus others are automatically consumer.
but unfortunately, I'm not sure if this idea would work or not
it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.
It sounds like you are looking for call_once
#include <mutex>
void example()
{
static std::once_flag flag;
bool i_did_once = false;
std::call_once(flag, [&i_did_once]() mutable {
i_did_once = true;
do_something();
});
if(! i_did_once)
do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking about the thundering herd problem? This may arise if do_some_other_thing has a mutex but in that case you have to describe your problem in more detail.
When I create a new thread I want to wait until the new thread reaches a specific point. Until now, I have solved this with a promise that is passed to the thread while the creating function waits for future.get(). If the initialization of the new thread fails, I set an exception to the promise so that future.get() will also throw an exception.
This looks something like this:
boost::promise<bool> promiseThreadStart;
void threadEntry();
int main(int argc, char* argv[])
{
// Prepare promise/future
promiseThreadStart = boost::promise<bool>();
boost::unique_future<bool> futureThreadStarted = promiseThreadStart.get_future();
// Start thread
boost::thread threadInstance = boost::thread(threadEntry);
// Wait for the thread to successfully initialize or to fail
try {
bool threadStarted = futureThreadStarted.get();
// Started successfully
}
catch (std::exception &e) {
// Starting the thread failed
}
return 0;
}
void threadEntry() {
// Do some preparations that could possibly fail
if (initializationFailed) {
promiseThreadStart.set_exception(std::runtime_error("Could not start thread."));
return;
}
// Thread initialized successfully
promiseThreadStart.set_value(true);
// Do the actual work of the thread
}
What upsets me here is the fact that the thread could fail during its initialization phase with an error that I do not handle. Then, I would not set the proper exception to the promise and the main-function would wait infinitely for future.get() to return. With this in mind, my solution seems to me quite error prone and badly designed.
I have learned about RAII and how it supplies you with exception-safety because you can clean up in the destructor. I would like to apply a similar pattern to the situation mentioned above. Therefore, I am wondering if there is something like a thread-destructor or exit-handler where I could possibly set a default exception to the promise. But anyways, using this promise/future design seems to me like a dirty workaround. So, what's the best and most elegant way to achieve exception-safe waiting?
Finally, I found an answer incidentally: I guess it can be done with a std::condition_variable while the waiting thread(s) can be notified with std::notify_all_at_thread_exit if the newly created thread terminates prematurely.
I'm not that into multi-threading, so I appreciate any advice.
In my server which is written in producer-consumer multi-threaded style
queue is wrapped altogether with its mutex and cv:
template <typename Event>
struct EventsHandle {
public: // methods:
Event*
getEvent()
{
std::unique_lock<std::mutex> lock {mutex};
while (events.empty()) {
condition_variable.wait(lock);
}
return events.front();
};
void
setEvent(Event* event)
{
std::lock_guard<std::mutex> lock {mutex};
events.push(event);
condition_variable.notify_one();
};
void
pop()
{ events.pop(); };
private: // fields:
std::queue<Event*> events;
std::mutex mutex;
std::condition_variable condition_variable;
};
and here how it is used in the consumer thread:
void
Server::listenEvents()
{
while (true) {
processEvent(events_handle.getEvent());
events_handle.pop();
}
};
and in a producer:
parse input, whatever else
...
setEvent(new Event {ERASE_CLIENT, getSocket(), nullptr});
...
void
Client::setEvent(Event* event)
{
if (event) {
events_handle->setEvent(event);
}
};
The code works on linux and I don't know why, but fails on windows MSVC13.
At some point exception is thrown with this dialog:
"Unhandled exception at 0x59432564 (msvcp120d.dll) in Server.exe: 0xC0000005: Access violation reading location 0xCDCDCDE1".
Debugging shows that exception is thrown on this line: std::lock_guard<std::mutex> lock(mutex) in the setEvent() function.
Little googling led me to these articles:
http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
http://www.codeproject.com/Articles/598695/Cplusplus-threads-locks-and-condition-variables
I tried to follow them, but with little help. So after this long text sheet my question:
what's wrong with the code, mutex?
UPDATE
So... Finally after lovely game named 'comment that line out' it turned out that the problem was in memory management. Event's dctor caused failure.
As conclusion, it's better to use std::unique_ptr<> as mentions Jarod42 or use value semantic as advises juanchopanza when passing objects back and forth.
And use libraries whenever possible, don't reinvent the wheel =)
You have a data race due to a missing mutex in EventsHandle::pop.
Your producer thread may push items to the queue by calling setEvent(), and get preempted while executing the line events.push(event). Now a consumer thread can execute events.pop() concurrently. You end up with two unsynchronized writes on the queue, which is undefined behavior.
Also note that if you have more than one consumer, you need to ensure that the element you pop is the same you retrieved earlier from getEvent. In case one consumer gets preempted by another between the two calls. This is very hard to achieve with two separate member functions which are synchronized by a mutex that is a member of the class. The usual approach here is to offer a single getEventAndPop() function instead, that keeps the lock throughout the operation and get rid of the separate functions you currently have. This might seem like an absurd restriction at first sight, but multithreaded code has to play by different rules.
You might also want to change the setEvent method to not notify while the mutex is still locked. It depends on the scheduler but the thread waiting to be notified might wake up immediately just to wait for the mutex.
void
setEvent(Event* event)
{
{
std::lock_guard<std::mutex> lock{mutex};
events.push(event);
}
condition_variable.notify_one();
};
In my opinion, pop should be integrated in the getEvent().
This will prevent more threads from getting the same event (if pop is not in getEvent then more threads could get the same one).
Event*
getEvent()
{
std::unique_lock<std::mutex> lock {mutex};
while (events.empty()) {
condition_variable.wait(lock);
}
Event* front = events.front();
events.pop();
return front;
};
I am using boost library for threading and synchronization in my application.
First of all I must say exceptions within threads on synchronization is compilitey new thing for me.
In any case below is the pseudo code what I want to achieve. I want synchronized threads to throw same exception that MAY have been thrown from the thread doing notify. How can I achieve this?
Could not find any topics from Stack Overflow regarding exception throwing with cross thread interaction using boost threading model
Many thanks in advance!
// mutex and scondition variable for the problem
mutable boost::mutex conditionMutex;
mutable boost::condition_variable condition;
inline void doTheThing() const {
if (noone doing the thing) {
try {
doIt()
// I succeeded
failed = false;
condition.notify_all();
}
catch (...) {
// I failed to do it
failed = true;
condition.notify_all();
throw
}
else {
boost::mutex::scoped_lock lock(conditionMutex);
condition.wait(lock);
if (failed) {
// throw the same exception that was thrown from
// thread doing notify_all
}
}
}
So you want the first thread that hits doTheThing() to call doIt(), and all subsequent threads that hit doTheThing() to wait for the first thread to finish calling doIt() before they proceed.
I think this should do the trick:
boost::mutex conditionMutex; // mutable qualifier not needed
bool failed = false;
bool done = false;
inline void doTheThing() const {
boost::unique_lock uql(conditionMutex);
if (!done) {
done = true;
try {
doIt();
failed = false;
}
catch (...) {
failed = true;
throw
}
}
else if (failed)
{
uql.unlock();
// now this thread knows that another thread called doIt() and an exception
// was thrown in that thread.
}
}
Important notes:
Every thread that calls doTheThing() must take a lock. There is no way around this. You are synchronizing threads, and for a thread to know anything about what's happening in another thread, it must take a lock. (Or it can use atomic memory operations, but that's a more advanced technique.) The variables failed and done are protected by the conditionMutex.
C++ will call destructor of uql when the function exits normally or by throwing exception.
EDIT Oh, and as for throwing the exception to all the other threads, forget about that, it's almost impossible, and it isn't the way things are done in C++. Instead, each thread can check to see if the first thread successfully called doIt() in the place I've indicated above.
EDIT There is no language support for propagating an exception to another thread. You can generalize the problem of propagating exceptions to another thread to passing messages to another thread. There are lots of library solutions to the problem of passing messages between threads ( boost::asio::io_service::post() ), and you could pass a message that contains the exception, with instructions to throw that exception on receipt of message. It's a bad idea, though. Only throw exceptions when you have an error that prevents you from unwinding the call stack by ordinary function return. That's what an exception is--an alternative way to return from a function when returning the usual way doesn't make sense.