Thread safety of boost::asio io_service and std::containers - c++

I'm building a network service with boost::asio and I'm unsure about the thread safety.
io_service.run() is called only once from a thread dedicated for the io_service work
send_message() on the other hand can be called either by the code inside the second io_service handlers mentioned later, or by the mainThread upon user interaction. And that is why I'm getting nervous.
std::deque<message> out_queue;
// send_message will be called by two different threads
void send_message(MsgPtr msg){
while (out_queue->size() >= 20){
Sleep(50);
}
io_service_.post([this, msg]() { deliver(msg); });
}
// from my understanding, deliver will only be called by the thread which called io_service.run()
void deliver(const MsgPtr){
bool write_in_progress = !out_queue.empty();
out_queue.push_back(msg);
if (!write_in_progress)
{
write();
}
}
void write()
{
auto self(shared_from_this());
asio::async_write(socket_,
asio::buffer(out_queue.front().header(),
message::header_length), [this, self](asio::error_code ec, std::size_t/)
{
if (!ec)
{
asio::async_write(socket_,
asio::buffer(out_queue.front().data(),
out_queue.front().paddedPayload_size()),
[this, self](asio::error_code ec, std::size_t /*length*/)
{
if (!ec)
{
out_queue.pop_front();
if (!out_queue.empty())
{
write();
}
}
});
}
});
}
Is this scenario safe?
A similar second scenario: When the network thread receives a message, it posts them into another asio::io_service which is also run by its own dedicated thread. This io_service uses an std::unordered_map to store callback functions etc.
std::unordered_map<int, eventSink> eventSinkMap_;
//...
// called by the main thread (GUI), writes a callback function object to the map
int IOReactor::registerEventSink(std::function<void(int, std::shared_ptr<message>)> fn, QObject* window, std::string endpointId){
util::ScopedLock lock(&sync_);
eventSink es;
es.id = generateRandomId();
// ....
std::pair<int, eventSink> eventSinkPair(es.id, es);
eventSinkMap_.insert(eventSinkPair);
return es.id;
}
// called by the second thread, the network service thread when a message was received
void IOReactor::onMessageReceived(std::shared_ptr<message> msg, ConPtr con)
{
reactor_io_service_.post([=](){ handleReceive(msg, con); });
}
// should be called only by the one thread running the reactor_io_service.run()
// read and write access to the map
void IOReactor::handleReceive(std::shared_ptr<message> msg, ConPtr con){
util::ScopedLock lock(&sync_);
auto es = eventSinkMap_.find(msg.requestId);
if (es != eventSinkMap_.end())
{
auto fn = es->second.handler;
auto ctx = es->second.context;
QMetaObject::invokeMethod(ctx, "runInMainThread", Qt::QueuedConnection, Q_ARG(std::function<void(int, std::shared_ptr<msg::IMessage>)>, fn), Q_ARG(int, CallBackResult::SUCCESS), Q_ARG(std::shared_ptr<msg::IMessage>, msg));
eventSinkMap_.erase(es);
}
first of all: Do I even need to use a lock here?
Ofc both methods access the map, but they are not accessing the same elements (the receiveHandler cannot try to access or read an element that has not yet been registered/inserted into the map). Is that threadsafe?

First of all, a lot of context is missing (where is onMessageReceived invoked, and what is ConPtr? and you have too many questions. I'll give you some specific pointers that will help you though.
You should be nervous here:
void send_message(MsgPtr msg){
while (out_queue->size() >= 20){
Sleep(50);
}
io_service_.post([this, msg]() { deliver(msg); });
}
The check out_queue->size() >= 20 requires synchronization unless out_queue is thread safe.
The call to io_service_.post is safe, because io_service is thread safe. Since you have one dedicated IO thread, this means that deliver() will run on that thread. Right now, you need synchronization there too.
I strongly suggest using a proper thread-safe queue there.
Q. first of all: Do I even need to use a lock here?
Yes you need to lock to do the map lookup (otherwise you get a data race with the main thread inserting sinks).
You do not need to lock during the invocation (in fact, that seems like a very unwise idea that could lead to performance issue or lockups). The reference remains valid due to Iterator invalidation rules.
The deletion of course requires a lock again. I'd revise the code to do deletion and removal at once, and invoke the sink only after releasing the lock. NOTE You will have to think about exceptions here (in your code when there is an exception during invocation, the sink doesn't get removed (ever?). This might be important to you.
Live Demo
void handleReceive(std::shared_ptr<message> msg, ConPtr con){
util::ScopedLock lock(&sync_);
auto es = eventSinkMap_.find(msg->requestId);
if (es != eventSinkMap_.end())
{
auto fn = es->second.handler;
auto ctx = es->second.context;
eventSinkMap_.erase(es); // invalidates es
lock.unlock();
// invoke in whatever way you require
fn(static_cast<int>(CallBackResult::SUCCESS), std::static_pointer_cast<msg::IMessage>(msg));
}
}

Related

boost::asio::ip::tcp::socket.read_some() stops working. No exception or errors detected

I am currently debugging a server(win32/64) that utilizes Boost:asio 1.78.
The code is a blend of legacy, older legacy and some newer code. None of this code is mine. I can't answer for why something is done in a certain way. I'm just trying to understand why this is happening and hopefully fix it wo. rewriting it from scratch. This code has been running for years on 50+ servers with no errors. Just these 2 servers that missbehaves.
I have one client (dot.net) that is connected to two servers. Client is sending the same data to the 2 servers. The servers run the same code, as follows in code sect.
All is working well but now and then communications halts. No errors or exceptions on either end. It just halts. Never on both servers at the same time. This happens very seldom. Like every 3 months or less. I have no way of reproducing it in a debugger bc I don't know where to look for this behavior.
On the client side the socket appears to be working/open but does not accept new data. No errors is detected in the socket.
Here's a shortened code describing the functions. I want to stress that I can't detect any errors or exceptions during these failures. Code just stops at "m_socket->read_some()".
Only solution to "unblock" right now is to close the socket manually and restart the acceptor. When I manually close the socket the read_some method returns with error code so I know it is inside there it stops.
Questions:
What may go wrong here and give this behavior?
What parameters should I log to enable me to determine what is happening, and from where.
main code:
std::shared_ptr<boost::asio::io_service> io_service_is = std::make_shared<boost::asio::io_service>();
auto is_work = std::make_shared<boost::asio::io_service::work>(*io_service_is.get());
auto acceptor = std::make_shared<TcpAcceptorWrapper>(*io_service_is.get(), port);
acceptor->start();
auto threadhandle = std::thread([&io_service_is]() {io_service_is->run();});
TcpAcceptorWrapper:
void start(){
m_asio_tcp_acceptor.open(boost::asio::ip::tcp::v4());
m_asio_tcp_acceptor.bind(boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), m_port));
m_asio_tcp_acceptor.listen();
start_internal();
}
void start_internal(){
m_asio_tcp_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { /* Handler code */ });
}
Handler code:
m_current_session = std::make_shared<TcpSession>(&m_socket);
std::condition_variable condition;
std::mutex mutex;
bool stopped(false);
m_current_session->run(condition, mutex, stopped);
{
std::unique_lock<std::mutex> lock(mutex);
condition.wait(lock, [&stopped] { return stopped; });
}
TcpSession runner:
void run(std::condition_variable& complete, std::mutex& mutex, bool& stopped){
auto self(shared_from_this());
std::thread([this, self, &complete, &mutex, &stopped]() {
{ // mutex scope
// Lock and hold mutex from tcp_acceptor scope
std::lock_guard<std::mutex> lock(mutex);
while (true) {
std::array<char, M_BUFFER_SIZE> buffer;
try {
boost::system::error_code error;
/* Next call just hangs/blocks but only rarely. like once every 3 months or more seldom */
std::size_t read = m_socket->read_some(boost::asio::buffer(buffer, M_BUFFER_SIZE), error);
if (error || read == -1) {
// This never happens
break;
}
// inside this all is working
process(buffer);
} catch (std::exception& ex) {
// This never happens
break;
} catch (...) {
// Neither does this
break;
}
}
stopped = true;
} // mutex released
complete.notify_one();
}).detach();
}
This:
m_acceptor.async_accept(m_socket, [this](boost::system::error_code error) { // Handler code });
Handler code:
std::condition_variable condition;
std::mutex mutex;
bool stopped(false);
m_current_session->run(condition, mutex, stopped);
{
std::unique_lock<std::mutex> lock(mutex);
condition.wait(lock, [&stopped] { return stopped; });
}
Is strange. It suggests you are using an "async" accept, but the handler block unconditionally until the session completes. That's the opposite of asynchrony. You could much easier write the same code without the asynchrony, and also without the thread and synchronization around it.
My intuition says something is blocking the mutex. Have you established that the session stack is actually inside the read_some frame when e.g. doing a debugger break during a "lock-up"?
When I manually close the socket the read_some method returns with error code so I know it is inside there I have an issue.
You can't legally do that. Your socket is in use on a thread - in a blocking read -, and you're bound to close it from a separate thread. That's a race-condition (see docs). If you want cancellable operations, use async_read*.
There are more code smells (read_some is a lowlevel primitive that is rarely what you want at the application level, detached threads with manual synchronization on termination could be packaged tasks, shared boolean flags could be atomics, notify_one outside the mutex could lead to thread starvation on some platforms etc.).
If you can share more code I'll be happy to sketch simplified solutions that remove the problems.

How to ensure that the messages will be enqueued in chronological order on multithreaded Asio io_service?

Following Michael Caisse's cppcon talk I created a connection handler MyUserConnection which has a sendMessage method. sendMessage method adds a message to the queue similarly to the send() in the cppcon talk. My sendMessage method is called from multiple threads outside of the connection handler in high intervals. The messages must be enqueued chronologically.
When I run my code with only one Asio io_service::run call (aka one io_service thread) it async_write's and empties my queue as expected (FIFO), however, the problem occurs when there are, for example, 4 io_service::run calls, then the queue is not filled or the send calls are not called chronologically.
class MyUserConnection : public std::enable_shared_from_this<MyUserConnection> {
public:
MyUserConnection(asio::io_service& io_service, SslSocket socket) :
service_(io_service),
socket_(std::move(socket)),
strand_(io_service) {
}
void sendMessage(std::string msg) {
auto self(shared_from_this());
service_.post(strand_.wrap([self, msg]() {
self->queueMessage(msg);
}));
}
private:
void queueMessage(const std::string& msg) {
bool writeInProgress = !sendPacketQueue_.empty();
sendPacketQueue_.push_back(msg);
if (!writeInProgress) {
startPacketSend();
}
}
void startPacketSend() {
auto self(shared_from_this());
asio::async_write(socket_,
asio::buffer(sendPacketQueue_.front().data(), sendPacketQueue_.front().length()),
strand_.wrap([self](const std::error_code& ec, std::size_t /*n*/) {
self->packetSendDone(ec);
}));
}
void packetSendDone(const std::error_code& ec) {
if (!ec) {
sendPacketQueue_.pop_front();
if (!sendPacketQueue_.empty()) { startPacketSend(); }
} else {
// end(); // My end call
}
}
asio::io_service& service_;
SslSocket socket_;
asio::io_service::strand strand_;
std::deque<std::string> sendPacketQueue_;
};
I'm quite sure that I misinterpreted the strand and io_service::post when running the connection handler on multithreaded io_service. I'm also quite sure that the messages are not enqueued chronologically instead of messages not being async_write chronologically. How to ensure that the messages will be enqueued in chronological order in sendMessage call on multithreaded io_service?
If you use a strand, the order is guaranteed to be the order in which you post the operations to the strand.
Of course, if there is some kind of "correct ordering" between threads that post then you have to synchronize the posting between them, that's your application domain.
Here's a modernized, simplified take on your MyUserConnection class with a self-contained server test program:
Live On Coliru
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <deque>
#include <iostream>
#include <mutex>
namespace asio = boost::asio;
namespace ssl = asio::ssl;
using asio::ip::tcp;
using boost::system::error_code;
using SslSocket = ssl::stream<tcp::socket>;
class MyUserConnection : public std::enable_shared_from_this<MyUserConnection> {
public:
MyUserConnection(SslSocket&& socket) : socket_(std::move(socket)) {}
void start() {
std::cerr << "Handshake initiated" << std::endl;
socket_.async_handshake(ssl::stream_base::handshake_type::server,
[self = shared_from_this()](error_code ec) {
std::cerr << "Handshake complete" << std::endl;
});
}
void sendMessage(std::string msg) {
post(socket_.get_executor(),
[self = shared_from_this(), msg = std::move(msg)]() {
self->queueMessage(msg);
});
}
private:
void queueMessage(std::string msg) {
outbox_.push_back(std::move(msg));
if (outbox_.size() == 1)
sendLoop();
}
void sendLoop() {
std::cerr << "Sendloop " << outbox_.size() << std::endl;
if (outbox_.empty())
return;
asio::async_write( //
socket_, asio::buffer(outbox_.front()),
[this, self = shared_from_this()](error_code ec, std::size_t) {
if (!ec) {
outbox_.pop_front();
sendLoop();
} else {
end();
}
});
}
void end() {}
SslSocket socket_;
std::deque<std::string> outbox_;
};
int main() {
asio::thread_pool ioc;
ssl::context ctx(ssl::context::sslv23_server);
ctx.set_password_callback([](auto...) { return "test"; });
ctx.use_certificate_file("server.pem", ssl::context::file_format::pem);
ctx.use_private_key_file("server.pem", ssl::context::file_format::pem);
ctx.use_tmp_dh_file("dh2048.pem");
tcp::acceptor a(ioc, {{}, 8989u});
for (;;) {
auto s = a.accept(make_strand(ioc.get_executor()));
std::cerr << "accepted " << s.remote_endpoint() << std::endl;
auto sess = make_shared<MyUserConnection>(SslSocket(std::move(s), ctx));
sess->start();
for(int i = 0; i<30; ++i) {
post(ioc, [sess, i] {
std::string msg = "message #" + std::to_string(i) + "\n";
{
static std::mutex mx;
// Lock so console output is guaranteed in the same order
// as the sendMessage call
std::lock_guard lk(mx);
std::cout << "Sending " << msg << std::flush;
sess->sendMessage(std::move(msg));
}
});
}
break; // for online demo
}
ioc.join();
}
If you run it a few times, you will see that
the order in which the threads post is not deterministic (that's up to the kernel scheduling)
the order in which messages are sent (and received) is exactly the order in which they are posted.
See live demo runs on my machine:
On a multi core or even on a single core preemptive OS, you cannot truly feed messages into a queue in strictly chronological order. Even if you use a mutex to synchronize write access to the queue, the strict order is no longer guaranteed once multiple writers wait on the mutex and the mutex becomes free. At best, the order, in which the waiting write threads acquire the mutex, is implementation dependent (OS code dependent), but it is best to assume it is just random.
With that being said, the strict chronological order is a matter of definition in the first place. To explain that, imagine your PC has some digital output bits (1 for each writer thread) and you connected a logic analyzer to those bits.... and imagine, you pick some spot in the code, where you toggle such a respective bit in your enqueue function. Even if that bit toggle takes place just one assembly instruction prior to acquiring the mutex, it is possible, that the order had been changed, while the writer code approached that point. You could also set it to other arbirtrary points prior (e.g. when you enter the enqueue function). But then, the same reasoning applies. Hence, the strict chronological order is in itself a matter of definition.
There is an analogy to a case, where a CPUs interrupt controller has multiple inputs and you tried to build a system which processes those interrupts in strictly chronological order. Even if all interrupt inputs were signaled exactly at the same moment (a switch, pulling them all to signaled state simultaneously), some order would occur (e.g. caused by hardware logic or just by noise at the input pins or by the systems interrupt dispatcher function (some CPUs (e.g. MIPS 4102) have a single interrupt vector and assembly code checks the possible interrupt sources and dispatches to dedicated interrupt handlers).
This analogy helps see the pattern: It comes down to asynchronous inputs on a synchronous system. Which is a notoriously hard problem in itself.
So, the best you could possibly do, is to make a suitable definition of your applications "strict ordering" and live with it.
Then, to avoid violations of your definition, you could use a priority queue instead of a normal FIFO data type and use as priority some atomic counter:
At your chosen point in the code, atomically read and increment the counter.
This is your messages sequence number.
Assemble your message and enqueue it into the priority queue, using your sequence number as priority.
Another possible approach is to define a notion of "simultaneous", which is detectable on the other side of the queue (and thus, the reader cannot assume strict ordering for a set of "simultaneous" messages). This could be implemented by reading some high frequency tick count and all those messages, which have the same "time stamp" are to be considered simultaneous on reader side.

What causes a random crash in boost::coroutine?

I have a multithread application which uses boost::asio and boost::coroutine via its integration in boost::asio. Every thread has its own io_service object. The only shared state between threads are connection pools which are locked with mutex when connection is get or returned from/to the connection pool. When there is not enough connections in the pool I push infinite asio::steady_tiemer in internal structure of the pool and asynchronously waiting on it and I yielding from the couroutine function. When other thread returns connection to the pool it checks whether there is waiting timers, it gets waiting timer from the internal structure, it gets its io_service object and posts a lambda which wakes up the timer to resume the suspended coroutine. I have random crashes in the application. I try to investigate the problem with valgrind. It founds some issues but I cannot understand them because they happen in boost::coroutine and boost::asio internals. Here are fragments from my code and from valgrind output. Can someone see and explain the problem?
Here is the calling code:
template <class ContextsType>
void executeRequests(ContextsType& avlRequestContexts)
{
AvlRequestDataList allRequests;
for(auto& requestContext : avlRequestContexts)
{
if(!requestContext.pullProvider || !requestContext.toAskGDS())
continue;
auto& requests = requestContext.pullProvider->getRequestsData();
copy(requests.begin(), requests.end(), back_inserter(allRequests));
}
if(allRequests.size() == 0)
return;
boost::asio::io_service ioService;
curl::AsioMultiplexer multiplexer(ioService);
for(auto& request : allRequests)
{
using namespace boost::asio;
spawn(ioService, [&multiplexer, &request](yield_context yield)
{
request->prepare(multiplexer, yield);
});
}
while(true)
{
try
{
VLOG_DEBUG(avlGeneralLogger, "executeRequests: Starting ASIO event loop.");
ioService.run();
VLOG_DEBUG(avlGeneralLogger, "executeRequests: ASIO event loop finished.");
break;
}
catch(const std::exception& e)
{
VLOG_ERROR(avlGeneralLogger, "executeRequests: Error while executing GDS request: " << e.what());
}
catch(...)
{
VLOG_ERROR(avlGeneralLogger, "executeRequests: Unknown error while executing GDS request.");
}
}
}
Here is the prepare function implementation which is called in spawned lambda:
void AvlRequestData::prepareImpl(curl::AsioMultiplexer& multiplexer,
boost::asio::yield_context yield)
{
auto& ioService = multiplexer.getIoService();
_connection = _pool.getConnection(ioService, yield);
_connection->prepareRequest(xmlRequest, xmlResponse, requestTimeoutMS);
multiplexer.addEasyHandle(_connection->getHandle(),
[this](const curl::EasyHandleResult& result)
{
if(0 == result.responseCode)
returnQuota();
VLOG_DEBUG(lastSeatLogger, "Response " << id << ": " << xmlResponse);
_pool.addConnection(std::move(_connection));
});
}
void AvlRequestData::prepare(curl::AsioMultiplexer& multiplexer,
boost::asio::yield_context yield)
{
try
{
prepareImpl(multiplexer, yield);
}
catch(const std::exception& e)
{
VLOG_ERROR(lastSeatLogger, "Error wile preparing request: " << e.what());
returnQuota();
}
catch(...)
{
VLOG_ERROR(lastSeatLogger, "Unknown error while preparing request.");
returnQuota();
}
}
The returnQuota function is pure virtual method of the AvlRequestData class and its implementation for the TravelportRequestData class which is used in all my tests is the following:
void returnQuota() const override
{
auto& avlQuotaManager = AvlQuotaManager::getInstance();
avlQuotaManager.consumeQuotaTravelport(-1);
}
Here are push and pop methods of the connection pool.
auto AvlConnectionPool::getConnection(
TimerPtr timer,
asio::yield_context yield) -> ConnectionPtr
{
lock_guard<mutex> lock(_mutex);
while(_connections.empty())
{
_timers.emplace_back(timer);
timer->expires_from_now(
asio::steady_timer::clock_type::duration::max());
_mutex.unlock();
coroutineAsyncWait(*timer, yield);
_mutex.lock();
}
ConnectionPtr connection = std::move(_connections.front());
_connections.pop_front();
VLOG_TRACE(defaultLogger, str(format("Getted connection from pool: %s. Connections count %d.")
% _connectionPoolName % _connections.size()));
++_connectionsGiven;
return connection;
}
void AvlConnectionPool::addConnection(ConnectionPtr connection,
Side side /* = Back */)
{
lock_guard<mutex> lock(_mutex);
if(Front == side)
_connections.emplace_front(std::move(connection));
else
_connections.emplace_back(std::move(connection));
VLOG_TRACE(defaultLogger, str(format("Added connection to pool: %s. Connections count %d.")
% _connectionPoolName % _connections.size()));
if(_timers.empty())
return;
auto timer = _timers.back();
_timers.pop_back();
auto& ioService = timer->get_io_service();
ioService.post([timer](){ timer->cancel(); });
VLOG_TRACE(defaultLogger, str(format("Connection pool %s: Waiting thread resumed.")
% _connectionPoolName));
}
This is implementation of coroutineAsyncWait.
inline void coroutineAsyncWait(boost::asio::steady_timer& timer,
boost::asio::yield_context yield)
{
boost::system::error_code ec;
timer.async_wait(yield[ec]);
if(ec && ec != boost::asio::error::operation_aborted)
throw std::runtime_error(ec.message());
}
And finally the first part of the valgrind output:
==8189== Thread 41:
==8189== Invalid read of size 8
==8189== at 0x995F84: void boost::coroutines::detail::trampoline_push_void, void, boost::asio::detail::coro_entry_point, void (anonymous namespace)::executeRequests > >(std::vector<(anonymous namespace)::AvlRequestContext, std::allocator<(anonymous namespace)::AvlRequestContext> >&)::{lambda(boost::asio::basic_yield_context >)#1}>&, boost::coroutines::basic_standard_stack_allocator > >(long) (trampoline_push.hpp:65)
==8189== Address 0x2e3b5528 is not stack'd, malloc'd or (recently) free'd
When I use valgrind with debugger attached it stops in the following function in trampoline_push.hpp in boost::coroutine library.
53│ template< typename Coro >
54│ void trampoline_push_void( intptr_t vp)
55│ {
56│ typedef typename Coro::param_type param_type;
57│
58│ BOOST_ASSERT( vp);
59│
60│ param_type * param(
61│ reinterpret_cast< param_type * >( vp) );
62│ BOOST_ASSERT( 0 != param);
63│
64│ Coro * coro(
65├> reinterpret_cast< Coro * >( param->coro) );
66│ BOOST_ASSERT( 0 != coro);
67│
68│ coro->run();
69│ }
Ultimately I found that when objects need to be deleted, boost::asio doesn't handle it gracefully without proper use of shared_ptr and weak_ptr. When crashes do occur, they are very difficult to debug, because its hard to look into what the io_service queue is doing at the time of failure.
After doing a full asynchronous client architecture recently and running into random crashing issues, I have a few tips to offer. Unfortunately, I cannot know whether these will solve your issues, but hopefully it provides a good start in the right direction.
Boost Asio Coroutine Usage Tips
Use boost::asio::asio_handler_invoke instead of io_service.post():
auto& ioService = timer->get_io_service();
ioService.post(timer{ timer->cancel(); });
Using post/dispatch within a coroutine is usually a bad idea. Always use the asio_handler_invoke when you are called from a coroutine. In this case, however, you can probably safely call timer->cancel() without posting it to the message loop anyways.
Your timers do not appear to use shared_ptr objects. Regardless of what is going on in the rest of your application, there is no way to know for sure when these objects should be destroyed. I would highly recommend using shared_ptr objects for all of your timer objects. Also, any pointer to class methods should use shared_from_this() as well. Using a plain this can be quite dangerous if this is destructed (on the stack) or goes out of scope somewhere else in a shared_ptr. Whatever you do, do not use shared_from_this() in the constructor of an object!
If you're getting a crash when a handler within the io_service is being executed, but part of the handler is no longer valid, this is a seriously difficult thing to debug. The handler object that is pumped into the io_service includes any pointers to timers, or pointers to objects that might be necessary to execute the handler.
I highly recommend going overboard with shared_ptr objects wrapped around any asio classes. If the problem goes away, then its likely order of destruction issues.
Is the failure address location on the heap somewhere or is it pointing to the stack? This will help you diagnose whether its an object going out of scope in a method at the wrong time, or if it is something else. For instance, this proved to me that all of my timers must become shared_ptr objects even within a single threaded application.

boost asio asynchronously waiting on a condition variable

Is it possible to perform an asynchronous wait (read : non-blocking) on a conditional variable in boost::asio ? if it isn't directly supported any hints on implementing it would be appreciated.
I could implement a timer and fire a wakeup even every few ms, but this is approach is vastly inferior, I find it hard to believe that condition variable synchronization is not implemented / documented.
If I understand the intent correctly, you want to launch an event handler, when some condition variable is signaled, in context of asio thread pool? I think it would be sufficient to wait on the condition variable in the beginning of the handler, and io_service::post() itself back in the pool in the end, something of this sort:
#include <iostream>
#include <boost/asio.hpp>
#include <boost/thread.hpp>
boost::asio::io_service io;
boost::mutex mx;
boost::condition_variable cv;
void handler()
{
boost::unique_lock<boost::mutex> lk(mx);
cv.wait(lk);
std::cout << "handler awakened\n";
io.post(handler);
}
void buzzer()
{
for(;;)
{
boost::this_thread::sleep(boost::posix_time::seconds(1));
boost::lock_guard<boost::mutex> lk(mx);
cv.notify_all();
}
}
int main()
{
io.post(handler);
boost::thread bt(buzzer);
io.run();
}
I can suggest solution based on boost::asio::deadline_timer which works fine for me. This is kind of async event in boost::asio environment.
One very important thing is that the 'handler' must be serialised through the same 'strand_' as 'cancel', because using 'boost::asio::deadline_timer' from multiple threads is not thread safe.
class async_event
{
public:
async_event(
boost::asio::io_service& io_service,
boost::asio::strand<boost::asio::io_context::executor_type>& strand)
: strand_(strand)
, deadline_timer_(io_service, boost::posix_time::ptime(boost::posix_time::pos_infin))
{}
// 'handler' must be serialised through the same 'strand_' as 'cancel' or 'cancel_one'
// because using 'boost::asio::deadline_timer' from multiple threads is not thread safe
template<class WaitHandler>
void async_wait(WaitHandler&& handler) {
deadline_timer_.async_wait(handler);
}
void async_notify_one() {
boost::asio::post(strand_, boost::bind(&async_event::async_notify_one_serialized, this));
}
void async_notify_all() {
boost::asio::post(strand_, boost::bind(&async_event::async_notify_all_serialized, this));
}
private:
void async_notify_one_serialized() {
deadline_timer_.cancel_one();
}
void async_notify_all_serialized() {
deadline_timer_.cancel();
}
boost::asio::strand<boost::asio::io_context::executor_type>& strand_;
boost::asio::deadline_timer deadline_timer_;
};
Unfortunately, Boost ASIO doesn't have an async_wait_for_condvar() method.
In most cases, you also won't need it. Programming the ASIO way usually means, that you use strands, not mutexes or condition variables, to protect shared resources. Except for rare cases, which usually focus around correct construction or destruction order at startup and exit, you won't need mutexes or condition variables at all.
When modifying a shared resource, the classic, partially synchronous threaded way is as follows:
Lock the mutex protecting the resource
Update whatever needs to be updated
Signal a condition variable, if further processing by a waiting thread is required
Unlock the mutex
The fully asynchronous ASIO way is though:
Generate a message, that contains everything, that is needed to update the resource
Post a call to an update handler with that message to the resource's strand
If further processing is needed, let that update handler create further message(s) and post them to the apropriate resources' strands.
If jobs can be executed on fully private data, then post them directly to the io-context instead.
Here is an example of a class some_shared_resource, that receives a string state and triggers some further processing depending on the state received. Please note, that all processing in the private method some_shared_resource::receive_state() is fully thread-safe, as the strand serializes all calls.
Of course, the example is not complete; some_other_resource needs a similiar send_code_red() method as some_shared_ressource::send_state().
#include <boost/asio>
#include <memory>
using asio_context = boost::asio::io_context;
using asio_executor_type = asio_context::executor_type;
using asio_strand = boost::asio::strand<asio_executor_type>;
class some_other_resource;
class some_shared_resource : public std::enable_shared_from_this<some_shared_resource> {
asio_strand strand;
std::shared_ptr<some_other_resource> other;
std::string state;
void receive_state(std::string&& new_state) {
std::string oldstate = std::exchange(state, new_state);
if(state == "red" && oldstate != "red") {
// state transition to "red":
other.send_code_red(true);
} else if(state != "red" && oldstate == "red") {
// state transition from "red":
other.send_code_red(false);
}
}
public:
some_shared_resource(asio_context& ctx, const std::shared_ptr<some_other_resource>& other)
: strand(ctx.get_executor()), other(other) {}
void send_state(std::string&& new_state) {
boost::asio::post(strand, [me = weak_from_this(), new_state = std::move(new_state)]() mutable {
if(auto self = me.lock(); self) {
self->receive_state(std::move(new_state));
}
});
}
};
As you see, posting always into ASIO's strands can be a bit tedious at first. But you can move most of that "equip a class with a strand" code into a template.
The good thing about message passing: As you are not using mutexes, you cannot deadlock yourself anymore, even in extreme situations. Also, using message passing, it is often easier to create a high level of parallelity than with classical multithreading. On the downside, moving and copying around all these message objects is time consuming, which can slow down your application.
A last note: Using the weak pointer in the message formed by send_state() facilitates the reliable destruction of some_shared_resource objects: Otherwise, if A calls B and B calls C and C calls A (possibly only after a timeout or similiar), using shared pointers instead of weak pointers in the messages would create cyclic references, which then prevents object destruction. If you are sure, that you never will have cycles, and that processing messages from to-be-deleted objects doesn't pose a problem, you can use shared_from_this() instead of weak_from_this(), of course. If you are sure, that objects won't get deleted before ASIO has been stopped (and all working threads been joined back to the main thread), then you can also directly capture the this pointer instead.
FWIW, I implemented an asynchronous mutex using the rather good continuable library:
class async_mutex
{
cti::continuable<> tail_{cti::make_ready_continuable()};
std::mutex mutex_;
public:
async_mutex() = default;
async_mutex(const async_mutex&) = delete;
const async_mutex& operator=(const async_mutex&) = delete;
[[nodiscard]] cti::continuable<std::shared_ptr<int>> lock()
{
std::shared_ptr<int> result;
cti::continuable<> tail = cti::make_continuable<void>(
[&result](auto&& promise) {
result = std::shared_ptr<int>((int*)1,
[promise = std::move(promise)](auto) mutable {
promise.set_value();
}
);
}
);
{
std::lock_guard _{mutex_};
std::swap(tail, tail_);
}
co_await std::move(tail);
co_return result;
}
};
usage eg:
async_mutex mutex;
...
{
const auto _ = co_await mutex.lock();
// only one lock per mutex-instance
}

threading-related active object design questions (c++ boost)

I would like some feedback regarding the IService class listed below. From what I know, this type of class is related to the "active-object" pattern. Please excuse/correct if I use any related terminology incorrectly. Basically the idea is that the classes using this active object class need to provide a start and a stop method which control some event loop. This event loop could be implemented with a while loop or with boost asio etc.
This class is responsible for starting a new thread in a non-blocking manner so that events can be handled in/by the new thread. It must also handle all clean-up related code. I first tried an OO approach in which subclasses were responsible for overriding methods to control the event loop but the cleanup was messy: in the destructor calling the stop method resulted in a pure virtual function call in cases where the calling class had not manually called the stop method. The templated solution seems to be a lot cleaner:
template <typename T>
class IService : private boost::noncopyable
{
typedef boost::shared_ptr<boost::thread> thread_ptr;
public:
IService()
{
}
~IService()
{
/// try stop the service in case it's running
stop();
}
void start()
{
boost::mutex::scoped_lock lock(m_threadMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
// already running
return;
}
m_pServiceThread = thread_ptr(new boost::thread(boost::bind(&IService::main, this)));
// need to wait for thread to start: else if destructor is called before thread has started
// Wait for condition to be signaled and then
// try timed wait since the application could deadlock if the thread never starts?
//if (m_startCondition.timed_wait(m_threadMutex, boost::posix_time::milliseconds(getServiceTimeoutMs())))
//{
//}
m_startCondition.wait(m_threadMutex);
// notify main to continue: it's blocked on the same condition var
m_startCondition.notify_one();
}
void stop()
{
// trigger the stopping of the event loop
m_serviceObject.stop();
if (m_pServiceThread)
{
if (m_pServiceThread->joinable())
{
m_pServiceThread->join();
}
// the service is stopped so we can reset the thread
m_pServiceThread.reset();
}
}
private:
/// entry point of thread
void main()
{
boost::mutex::scoped_lock lock(m_threadMutex);
// notify main thread that it can continue
m_startCondition.notify_one();
// Try Dummy wait to allow 1st thread to resume???
m_startCondition.wait(m_threadMutex);
// call template implementation of event loop
m_serviceObject.start();
}
/// Service thread
thread_ptr m_pServiceThread;
/// Thread mutex
mutable boost::mutex m_threadMutex;
/// Condition for signaling start of thread
boost::condition m_startCondition;
/// T must satisfy the implicit service interface and provide a start and a stop method
T m_serviceObject;
};
The class could be used as follows:
class TestObject3
{
public:
TestObject3()
:m_work(m_ioService),
m_timer(m_ioService, boost::posix_time::milliseconds(200))
{
m_timer.async_wait(boost::bind(&TestObject3::doWork, this, boost::asio::placeholders::error));
}
void start()
{
// simple event loop
m_ioService.run();
}
void stop()
{
// signal end of event loop
m_ioService.stop();
}
void doWork(const boost::system::error_code& e)
{
// Do some work here
if (e != boost::asio::error::operation_aborted)
{
m_timer.expires_from_now( boost::posix_time::milliseconds(200) );
m_timer.async_wait(boost::bind(&TestObject3::doWork, this, boost::asio::placeholders::error));
}
}
private:
boost::asio::io_service m_ioService;
boost::asio::io_service::work m_work;
boost::asio::deadline_timer m_timer;
};
Now to my specific questions:
1) Is the use of the boost condition variable correct? It seems like a bit of a hack to me: I wanted to wait for the thread to be launched so I waited on the condition variable. Then once the new thread has launched in the main method, I again wait on the same condition variable to allow the initial thread to continue. Then once the start method of the initial thread is exited, the new thread can continue. Is this ok?
2) Are there any cases in which the thread would not get launched successfully by the OS? I remember reading somewhere that this can occur. If this is possible, I should rather do a timed wait on the condition variable (as is commented out in the start method)?
3) I am aware that of the templated class could not implement the stop method "correctly" i.e. if the event loop fails to stop, the code will block on the joins (either in the stop or in the destructor) but I see no way around this. I guess it is up to the user of the class to make sure that the start and stop method are implemented correctly?
4) I would appreciate any other design mistakes, improvements, etc?
Thanks!
Finally settled on the following:
1) After much testing use of condition variable seems fine
2) This issue hasn't cropped up (yet)
3) The templated class implementation must meet the requirements, unit tests are used to
test for correctness
4) Improvements
Added join with lock
Catching exceptions in spawned thread and rethrowing in main thread to avoid crashes and to not loose exception info
Using boost::system::error_code to communicate error codes back to caller
implementation object is set-able
Code:
template <typename T>
class IService : private boost::noncopyable
{
typedef boost::shared_ptr<boost::thread> thread_ptr;
typedef T ServiceImpl;
public:
typedef boost::shared_ptr<IService<T> > ptr;
IService()
:m_pServiceObject(&m_serviceObject)
{
}
~IService()
{
/// try stop the service in case it's running
if (m_pServiceThread && m_pServiceThread->joinable())
{
stop();
}
}
static ptr create()
{
return boost::make_shared<IService<T> >();
}
/// Accessor to service implementation. The handle can be used to configure the implementation object
ServiceImpl& get() { return m_serviceObject; }
/// Mutator to service implementation. The handle can be used to configure the implementation object
void set(ServiceImpl rServiceImpl)
{
// the implementation object cannot be modified once the thread has been created
assert(m_pServiceThread == 0);
m_serviceObject = rServiceImpl;
m_pServiceObject = &m_serviceObject;
}
void set(ServiceImpl* pServiceImpl)
{
// the implementation object cannot be modified once the thread has been created
assert(m_pServiceThread == 0);
// make sure service object is valid
if (pServiceImpl)
m_pServiceObject = pServiceImpl;
}
/// if the service implementation reports an error from the start or stop method call, it can be accessed via this method
/// NB: only the last error can be accessed
boost::system::error_code getServiceErrorCode() const { return m_ecService; }
/// The join method allows the caller to block until thread completion
void join()
{
// protect this method from being called twice (e.g. by user and by stop)
boost::mutex::scoped_lock lock(m_joinMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
m_pServiceThread->join();
m_pServiceThread.reset();
}
}
/// This method launches the non-blocking service
boost::system::error_code start()
{
boost::mutex::scoped_lock lock(m_threadMutex);
if (m_pServiceThread && m_pServiceThread->joinable())
{
// already running
return boost::system::error_code(SHARED_INVALID_STATE, shared_category);
}
m_pServiceThread = thread_ptr(new boost::thread(boost::bind(&IService2::main, this)));
// Wait for condition to be signaled
m_startCondition.wait(m_threadMutex);
// notify main to continue: it's blocked on the same condition var
m_startCondition.notify_one();
// No error
return boost::system::error_code();
}
/// This method stops the non-blocking service
boost::system::error_code stop()
{
// trigger the stopping of the event loop
//boost::system::error_code ec = m_serviceObject.stop();
assert(m_pServiceObject);
boost::system::error_code ec = m_pServiceObject->stop();
if (ec)
{
m_ecService = ec;
return ec;
}
// The service implementation can return an error code here for more information
// However it is the responsibility of the implementation to stop the service event loop (if running)
// Failure to do so, will result in a block
// If this occurs in practice, we may consider a timed join?
join();
// If exception was thrown in new thread, rethrow it.
// Should the template implementation class want to avoid this, it should catch the exception
// in its start method and then return and error code instead
if( m_exception )
boost::rethrow_exception(m_exception);
return ec;
}
private:
/// runs in it's own thread
void main()
{
try
{
boost::mutex::scoped_lock lock(m_threadMutex);
// notify main thread that it can continue
m_startCondition.notify_one();
// Try Dummy wait to allow 1st thread to resume
m_startCondition.wait(m_threadMutex);
// call implementation of event loop
// This will block
// In scenarios where the service fails to start, the implementation can return an error code
m_ecService = m_pServiceObject->start();
m_exception = boost::exception_ptr();
}
catch (...)
{
m_exception = boost::current_exception();
}
}
/// Service thread
thread_ptr m_pServiceThread;
/// Thread mutex
mutable boost::mutex m_threadMutex;
/// Join mutex
mutable boost::mutex m_joinMutex;
/// Condition for signaling start of thread
boost::condition m_startCondition;
/// T must satisfy the implicit service interface and provide a start and a stop method
T m_serviceObject;
T* m_pServiceObject;
// Error code for service implementation errors
boost::system::error_code m_ecService;
// Exception ptr to transport exception across different threads
boost::exception_ptr m_exception;
};
Further feedback/criticism would of course be welcome.