asio::async_write incredibly difficult to synchronize on a high volume stream - c++

I am currently using the Asio C++ library and wrote a client wrapper around it. My original approach was very basic and only needed to stream in a single direction. Requirements have changed and I've switched over to using all asynchronous calls. Most of the migration has been easy except for the asio::async_write(...). I have used a few different approaches and inevitably run into a deadlock with each one.
The application streams data at a high volume continuously. I have stayed away from strands because they do not block and can lead to memory issues, especially when the server is under heavy load: jobs back up and the application's heap grows indefinitely.
So I created a blocking queue, only to find out the hard way that holding locks across callbacks and/or blocking inside handlers leads to unpredictable behavior.
The wrapper is a very large class, so I will try to explain my landscape in its current state and hopefully get some good suggestions:
I have an asio::steady_timer that runs on a fixed schedule to push a heartbeat message directly into the blocking queue.
A thread dedicated to reading events and pushing them to the blocking queue
A thread dedicated to consumption of the blocking queue
For example, in my queue I have a queue::block() and queue::unblock() that are just wrappers for the condition variable / mutex.
std::thread consumer([this]() {
    std::string message_buffer;
    while (queue.pop(message_buffer)) {
        queue.stage_block();
        asio::async_write(*socket, asio::buffer(message_buffer),
            std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
        queue.block();
    }
});

void networking::handle_write(const std::error_code& error, size_t bytes_transferred) {
    queue.unblock();
}
When the socket backs up and the server can no longer accept data because of the current load, the queue fills up and leads to a deadlock where handle_write(...) is never called.
The other approach eliminates the consumer thread entirely and relies on handle_write(...) to pop the queue. Like so:
void networking::write(const std::string& data) {
    if (!queue.closed()) {
        std::stringstream stream_buffer;
        stream_buffer << data << std::endl;
        spdlog::get("console")->debug("pushing to queue {}", queue.size());
        queue.push(stream_buffer.str());
        if (queue.size() == 1) {
            spdlog::get("console")->debug("handle_write: {}", stream_buffer.str());
            asio::async_write(*socket, asio::buffer(stream_buffer.str()),
                std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
        }
    }
}

void networking::handle_write(const std::error_code& error, size_t bytes_transferred) {
    std::string message;
    queue.pop(message);
    if (!queue.closed() && !queue.empty()) {
        std::string front = queue.front();
        asio::async_write(*socket, asio::buffer(queue.front()),
            std::bind(&networking::handle_write, this, std::placeholders::_1, std::placeholders::_2));
    }
}
This also resulted in a deadlock and obviously results in other race problems. When I disabled my heartbeat callback, I had absolutely no issues. However, the heartbeat is a requirement.
What am I doing wrong? What is a better approach?
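For reference, the second approach is often structured so that every pending message lives in a container owned by the writer, which keeps the buffer handed to the write call alive until its completion handler runs, and so that at most one write is ever in flight. The sketch below is library-free and only illustrates that shape: `Outbox`, `start_write` and `on_write_complete` are hypothetical names, and the `start_write` callback stands in for asio::async_write. It assumes all calls happen on a single thread (or a single strand) and is not thread-safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical write queue: at most one write is in flight, and the message
// being written stays at the front of the deque until it completes.
class Outbox {
public:
    // start_write stands in for asio::async_write. It receives a reference to
    // the front message, which stays valid until on_write_complete() pops it.
    explicit Outbox(std::function<void(const std::string&)> start_write)
        : start_write_(std::move(start_write)) {}

    void push(std::string message) {
        const bool idle = pending_.empty();
        pending_.push_back(std::move(message));
        if (idle) start_write_(pending_.front());  // nothing in flight: start now
    }

    // Call this from the write completion handler.
    void on_write_complete() {
        pending_.pop_front();  // drop the message that just finished
        if (!pending_.empty()) start_write_(pending_.front());
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::function<void(const std::string&)> start_write_;
    std::deque<std::string> pending_;
};
```

A std::deque is used because push_back does not invalidate references to elements already in the container, so the message currently being written is not disturbed when more messages are queued behind it.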

It appears all my pain derived entirely from the heartbeat. Disabling the heartbeat in each variation of my asynchronous write code seems to cure my problems, which led me to believe that the cause was my use of asio::steady_timer and its async_wait(...).
On a thread that is running the io_service, Asio executes one handler at a time: the next handler cannot start until the current one returns. Building the heartbeat on async_wait(...) was my design flaw, because the timer handler ran on the same thread that dispatches the other completion handlers. When the heartbeat blocked in queue::push(...), that thread was stuck, which deadlocked Asio. This would explain why the asio::async_write(...) completion handler never executed in my first example.
The solution was to put the heartbeat on its own thread and let it work independently from Asio. I am still using my blocking queue to synchronize calls to asio::async_write(...) but have modified my consumer thread to use std::future and std::promise. This synchronizes the callback with my consumer thread cleanly.
std::thread networking::heartbeat_worker() {
    return std::thread([&]() {
        while (socket_opened) {
            spdlog::get("console")->trace("heartbeat pending");
            write(heartbeat_message);
            spdlog::get("console")->trace("heartbeat sent");
            std::unique_lock<std::mutex> lock(mutex);
            socket_closed_event.wait_for(lock, std::chrono::milliseconds(heartbeat_interval), [&]() {
                return !socket_opened;
            });
        }
        spdlog::get("console")->trace("heartbeat thread exited gracefully");
    });
}
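The consumer side mentioned above (one write at a time, with the consumer thread waiting on a std::future that the completion handler fulfils) is not shown in the answer. A minimal, library-free sketch of that handshake might look like the following. `fake_async_write` and `send_all` are hypothetical names; `fake_async_write` stands in for asio::async_write and simply invokes its handler from a helper thread.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Stand-in for asio::async_write (hypothetical): "sends" the message on a
// helper thread and then invokes the completion handler from that thread.
std::thread fake_async_write(const std::string& message, std::string& wire,
                             std::function<void()> handler) {
    return std::thread([&message, &wire, handler] {
        wire += message;
        handler();
    });
}

// Sends each message in order, waiting for its completion before the next.
std::string send_all(const std::vector<std::string>& messages) {
    std::string wire;
    for (const std::string& message : messages) {
        std::promise<void> done;  // one promise per write
        std::future<void> completed = done.get_future();
        std::thread io = fake_async_write(message, wire,
                                          [&done] { done.set_value(); });
        completed.wait();  // block until the completion handler has run
        io.join();         // the promise must outlive the handler
    }
    return wire;
}
```

In a real client, replacing `wait()` with `wait_for(...)` would let the consumer notice a stalled socket instead of blocking forever.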

Related

How to integrate Cap'n'Proto threads with non Cap'n'Proto threads?

How do I properly integrate Cap'n'Proto client usage with surrounding multi-threaded code? The Cap'n'Proto docs say that each Cap'n'Proto interface is single-threaded with a dedicated event loop. Additionally, they recommend using Cap'n'Proto to communicate between threads. However, the docs don't seem to describe how non-Cap'n'Proto threads (e.g. the UI loop) could integrate with that. Even if I could integrate Cap'n'Proto event loops with the UI loop in some places, other models like thread pools (Android Binder, global libdispatch queues) seem more challenging.
I think the solution is to cache the thread executor for the client thread in a synchronized place that the non-capnp thread will access it.
I believe though that the calling thread always needs to be on its own event loop as well to marry them but I just want to make sure that's actually the case. My initial attempt to do that in a simple unit test is failing. I created a KjLooperEventPort class (following the structure for the node libuv adapter) to marry KJ & ALooper on Android.
Then my test code is:
TEST(KjLooper, CrossThreadPromise) {
    std::thread::id kjThreadId;
    ConditionVariable<const kj::Executor*> executorCv{nullptr};
    ConditionVariable<std::pair<bool, kj::Promise<void>>> looperThreadFinished{false, nullptr};
    std::thread looperThread([&] {
        auto looper = android::newLooper();
        android::KjLooperEventPort kjEventPort{looper};
        kj::WaitScope waitScope(kjEventPort.getKjLoop());
        auto finished = kj::newPromiseAndFulfiller<void>();
        looperThreadFinished.constructValueAndNotifyAll(true, kj::mv(finished.promise));
        executorCv.waitNotValue(nullptr);
        auto executor = executorCv.readCopy();
        kj::Promise<void> asyncPromise = executor->executeAsync([&] {
            ASSERT_EQ(std::this_thread::get_id(), kjThreadId);
        });
        asyncPromise = asyncPromise.then([tid = std::this_thread::get_id(), kjThreadId, &finished] {
            std::cerr << "Running promise completion on original thread\n";
            ASSERT_NE(tid, kjThreadId);
            ASSERT_EQ(std::this_thread::get_id(), tid);
            std::cerr << "Fulfilling\n";
            finished.fulfiller->fulfill();
            std::cerr << "Fulfilled\n";
        });
        asyncPromise.wait(waitScope);
    });
    std::thread kjThread([&] {
        kj::Promise<void> finished = kj::NEVER_DONE;
        looperThreadFinished.wait([&](auto& promise) {
            finished = kj::mv(promise.second);
            return promise.first;
        });
        auto ioContext = kj::setupAsyncIo();
        kjThreadId = std::this_thread::get_id();
        executorCv.setValueAndNotifyAll(&kj::getCurrentThreadExecutor());
        finished.wait(ioContext.waitScope);
    });
    looperThread.join();
    kjThread.join();
}
This crashes fulfilling the promise back to the kj thread.
terminating with uncaught exception of type kj::ExceptionImpl: kj/async.c++:1269: failed: expected threadLocalEventLoop == &loop || threadLocalEventLoop == nullptr; Event armed from different thread than it was created in. You must use Executor to queue events cross-thread.
Most Cap'n Proto RPC and KJ Promise-related objects can only be accessed in the thread that created them. Resolving a promise cross-thread, for example, will fail, as you saw.
Some ways you could solve this include:
You can use kj::Executor to schedule code to run on a different thread's event loop. The calling thread does NOT need to be a KJ event loop thread if you use executeSync() -- however, this function blocks until the other thread has had a chance to wake up and execute the function. I'm not sure how well this will perform in practice; if it's a problem, there is probably room to extend the Executor interface to handle this use case more efficiently.
You can communicate between threads by passing messages over pipes or socketpairs (but sending big messages this way would involve a lot of unnecessary copying to/from the socket buffer).
You could signal another thread's event loop to wake up using a pipe, signal, or (on Linux) eventfd, then have it look for messages in a mutex-protected queue. (But kj::Executor mostly obsoletes this technique.)
It's possible, though not easy, to adapt KJ's event loop to run on top of other event loops, so that both can run in the same thread. For example, node-capnp adapts KJ to run on top of libuv.
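The mutex-protected queue option from the list above can be sketched without KJ at all. In this illustration a condition variable stands in for the pipe or eventfd wake-up, and `Mailbox`, `post`, `close` and `run` are hypothetical names rather than part of any library. Any thread may call `post`, but tasks only ever execute on the thread that calls `run`, which is the property the event-loop thread needs.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical cross-thread mailbox. A condition variable stands in for the
// pipe/eventfd wake-up; tasks always run on the thread that calls run().
class Mailbox {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        wake_.notify_one();
    }

    // Runs queued tasks until close() has been called and the queue is empty.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // closed and fully drained
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();  // never run a task while holding the lock
            task();
            lock.lock();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool closed_ = false;
};
```

Releasing the lock before running each task keeps a slow task from blocking producers that only want to enqueue more work.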

Boost Asio, async_read/connect timeout

On the Boost website there is a good example of timeouts for async operations. However, in that example the socket is closed to cancel operations. There is also socket::cancel(), but both the documentation and a compiler warning describe it as problematic in terms of portability.
Among the stack of Boost.Asio timeout questions on SO, there are several kinds of answers. The first is probably to introduce a custom event loop, i.e., loop on io_service::run_one() and cancel the event loop on deadline. I am using io_service::run() in a worker thread, so that's not the kind of solution I would like to employ, if possible, as I do not want to change my code base.
A second option is directly changing the options of native socket. However, I would like to stick to Boost.Asio if possible and avoid any sort of platform-specific code as much as possible.
The example in the documentation is for an old version of Boost.Asio, but it's working properly, other than being forced to close the socket to cancel the operations. Using the documentation example, I have the following
void check_deadline(const boost::system::error_code &ec)
{
    if(!running) {
        return;
    }
    if(timer.expires_at() <= boost::asio::deadline_timer::traits_type::now()) {
        // cancel all operations
        boost::system::error_code errorcode;
        boost::asio::ip::tcp::endpoint endpoint = socket.remote_endpoint();
        socket.close(errorcode);
        if(errorcode) {
            SLOGERROR(mutex, errorcode.message(), "check_deadline()");
        }
        else {
            SLOG(mutex, "timed out", "check_deadline()");
            // connect again
            Connect(endpoint);
            if(errorcode) {
                SLOGERROR(mutex, errorcode.message(), "check_deadline()");
            }
        }
        // set timer to infinity, so that it won't expire
        // until a proper deadline is set
        timer.expires_at(boost::posix_time::pos_infin);
    }
    // keep waiting
    timer.async_wait(std::bind(&TCPClient::check_deadline, this, std::placeholders::_1));
}
This is the only callback function registered to async_wait. The very first solution I could come up with is reconnecting after closing the socket. Now my question is, is there a better way? By a better way, I mean canceling the operations based on a timer without actually disrupting the connection (i.e., without closing the socket).

Check for data with timing?

Is there a way to check for data for a certain time in asio?
I have a client with an asio socket that has a method
bool ASIOClient::hasData()
{
    return m_socket->available();
}
I'd like to have some kind of delay here, so that it checks for data for at most 1 second and returns early if data arrives. I don't want to poll, for the obvious reason that it might take a full second. The reason I need this is that I send data to a client and wait for the response. If it doesn't respond within a certain time, I close the socket. That's what hasData is meant for.
I know that this is possible natively with select and an fd_set.
The asio Client is created in an Accept method of the server socket class and later used to handle requests and send back data to the one who connected here.
int ASIOServer::accept(const bool& blocking)
{
    auto l_sock = std::make_shared<asio::ip::tcp::socket>(m_io_service);
    m_acceptor.accept(*l_sock);
    auto l_client = std::make_shared<ASIOClient>(l_sock);
    return 0;
}
You just need to attempt to read.
The usual approach is to define deadlines for all asynchronous operations that could take "long" (or even indefinitely long).
This is quite natural in asynchronous executions:
Just add a deadline timer:
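boost::asio::deadline_timer tim(svc);
tim.expires_from_now(boost::posix_time::seconds(2));
// Capture by reference so the handler can reach socket_
// (use [this] instead if socket_ is a class member).
tim.async_wait([&](boost::system::error_code ec) {
    if (!ec) // timer was not canceled, so it expired
    {
        socket_.cancel(); // cancel pending async operation
    }
});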
If you want to use it with synchronous calls, you can with judicious use of poll() instead of run(). See this answer: boost::asio + std::future - Access violation after closing socket, which implements a helper await_operation that runs a single operation synchronously but under a timeout.
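The shape the question asks for (wait up to a deadline, but return early as soon as data arrives) can also be illustrated without Asio. The sketch below is only an analogy built on the standard library: `DataSignal`, `notify` and `wait_for_data` are hypothetical names, and in a real client the completion handler of an asynchronous read would be the one calling `notify()`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: one side signals that data has arrived, the other
// waits for that signal with an upper bound on the waiting time.
class DataSignal {
public:
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            has_data_ = true;
        }
        ready_.notify_all();
    }

    // Returns true as soon as data has been signalled, or false if the
    // timeout elapses first.
    bool wait_for_data(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return ready_.wait_for(lock, timeout, [this] { return has_data_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    bool has_data_ = false;
};
```

Because the predicate is checked under the mutex, a notification that arrives before the wait begins is not lost, and spurious wake-ups do not end the wait early.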

Keeping two cross-communicating asio io_service objects busy

I am using boost::asio with multiple io_services to keep different forms of blocking I/O separate. E.g. I have one io_service for blocking file I/O, and another for long-running CPU-bound tasks (and this could be extended to a third for blocking network I/O, etc.). Generally speaking, I want to ensure that one form of blocking I/O cannot starve the others.
The problem I am having is that since tasks running in one io_service can post events to other io_service (e.g. a CPU-bound task may need to start a file I/O operation, or a completed file I/O operation may invoke a CPU-bound callback), I don't know how to keep both io_services running until they are both out of events.
Normally with a single I/O service, you do something like:
shared_ptr<asio::io_service> io_service (new asio::io_service);
shared_ptr<asio::io_service::work> work (
    new asio::io_service::work(*io_service));
// Create worker thread(s) that call io_service->run()
io_service->post(/* some event */);
work.reset();
// Join worker thread(s)
However if I simply do this for both io_services, the one into which I did not post an initial event finishes immediately. And even if I post initial events to both, if the initial event on io_service B finishes before the task on io_service A posts a new event to B, io_service B will finish prematurely.
How can I keep io_service B running while io_service A is still processing events (because one of the queued events in service A might post a new event to B), and vice-versa, while still ensuring that both io_services exit their run() methods if they are ever both out of events at the same time?
Figured out a way to do this, so documenting it for the record in case anyone else finds this question in a search:
Create each of the N cross-communicating io_services, create a work object for each of them, and then start their worker threads.
Create a "master" io_service object which will not run any worker threads.
Do not allow posting events directly to the services. Instead, create accessor functions to the io_services which will:
    Create a work object on the master io_service.
    Wrap the callback in a function that runs the real callback, then deletes the work.
    Post this wrapped callback instead.
In the main flow of execution, once all of the N io_services have started and you have posted work to at least one of them, call run() on the master io_service.
When the master io_service's run() method returns, delete all of the initial work on the N cross-communicating io_services, and join all worker threads.
Having the master io_service's thread own work on each of the other io_services ensures that they will not terminate until the master io_service runs out of work. Having each of the other io_services own work on the master io_service for every posted callback ensures that the master io_service will not run out of work until every one of the other io_services no longer has any posted callbacks left to process.
An example (could be encapsulated in a class):
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/function.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>

boost::shared_ptr<boost::asio::io_service> master_io_service;

void RunWorker(boost::shared_ptr<boost::asio::io_service> io_service) {
    io_service->run();
}

void RunCallbackAndDeleteWork(boost::function<void()> callback,
                              boost::asio::io_service::work* work) {
    callback();
    delete work;  // releases the master io_service once the callback is done
}

// All new posted callbacks must come through here, rather than being posted
// directly to the io_service object.
void PostToService(boost::shared_ptr<boost::asio::io_service> io_service,
                   boost::function<void()> callback) {
    io_service->post(boost::bind(
        &RunCallbackAndDeleteWork, callback,
        new boost::asio::io_service::work(*master_io_service)));
}

int main() {
    std::vector<boost::shared_ptr<boost::asio::io_service> > io_services;
    std::vector<boost::shared_ptr<boost::asio::io_service::work> > initial_work;
    boost::thread_group worker_threads;  // provides create_thread() and join_all()
    master_io_service.reset(new boost::asio::io_service);
    const int kNumServices = X;
    const int kNumWorkersPerService = Y;
    for (int i = 0; i < kNumServices; ++i) {
        boost::shared_ptr<boost::asio::io_service> io_service(new boost::asio::io_service);
        io_services.push_back(io_service);
        // shared_ptr's constructor is explicit, so wrap the raw pointer by hand.
        initial_work.push_back(boost::shared_ptr<boost::asio::io_service::work>(
            new boost::asio::io_service::work(*io_service)));
        for (int j = 0; j < kNumWorkersPerService; ++j) {
            worker_threads.create_thread(boost::bind(&RunWorker, io_service));
        }
    }
    // Use PostToService to start initial task(s) on at least one of the services
    master_io_service->run();
    // At this point, there is no real work left in the services, only the work
    // objects in the initial_work vector.
    initial_work.clear();
    worker_threads.join_all();
    return 0;
}
The HTTP server example 2 does something similar that you may find useful. It uses the concept of an io_service pool that retains vectors of shared_ptr<boost::asio::io_service> and a shared_ptr<boost::asio::io_service::work> for each io_service. It uses a thread pool to run each service.
The example uses a round-robin scheduling for doling out work to the I/O services, I don't think that will apply in your case since you have specific tasks for io_service A and io_service B.

Boost Asio callback doesn't get called

I'm using Boost.Asio for network operations. They have to remain pretty low level (and can, since there are no complex data structures or anything), because I can't afford the luxury of serialization overhead, and the libraries I found that did offer good enough performance seemed badly suited to my case.
The problem is with an async write I'm doing from the client (in Qt, but that should probably be irrelevant here). The callback specified in the async_write never gets called, and I'm at a complete loss as to why. The code is:
void SpikingMatrixClient::addMatrix() {
    std::cout << "entered add matrix" << std::endl;
    int action = protocol::Actions::AddMatrix;
    int matrixSize = this->ui->editNetworkSize->text().toInt();
    std::ostream out(&buf);
    out.write(reinterpret_cast<const char*>(&action), sizeof(action));
    out.write(reinterpret_cast<const char*>(&matrixSize), sizeof(matrixSize));
    boost::asio::async_write(*connection.socket(), buf.data(),
        boost::bind(&SpikingMatrixClient::onAddMatrix, this, boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred));
}
which calls the first write. The callback is
void SpikingMatrixClient::onAddMatrix(const boost::system::error_code& error, size_t bytes_transferred) {
    std::cout << "entered onAddMatrix" << std::endl;
    if (!error) {
        buf.consume(bytes_transferred);
        requestMatrixList();
    } else {
        QString message = QString::fromStdString(error.message());
        this->ui->statusBar->showMessage(message, 15000);
    }
}
The callback never gets called, even though the server receives all the data. Can anyone think of any reason why it might be doing that?
P.S. There was a wrapper for that connection, and yes there will probably be one again. Ditched it a day or two ago because I couldn't find the problem with this callback.
As suggested, posting a solution I found to be the most suitable (at least for now).
The client application is [being] written in Qt, and I need the IO to be async. For the most part, the client receives calculation data from the server application and has to render various graphical representations of it.
Now, there's some key aspects to consider:
1. The GUI has to be responsive; it should not be blocked by the IO.
2. The client can be connected / disconnected.
3. The traffic is pretty intense: data gets sent / refreshed to the client every few secs and it has to remain responsive (as per item 1).
As per the Boost.Asio documentation,
Multiple threads may call io_service::run() to set up a pool of threads from which completion handlers may be invoked.
Note that all threads that have joined an io_service's pool are considered equivalent, and the io_service may distribute work across them in an arbitrary fashion.
Note that io_service.run() blocks until the io_service runs out of work.
With this in mind, the clear solution is to run io_service.run() from another thread. The relevant code snippets are
void SpikingMatrixClient::connect() {
    Ui::ConnectDialog ui;
    QDialog *dialog = new QDialog;
    ui.setupUi(dialog);
    if (dialog->exec()) {
        QString host = ui.lineEditHost->text();
        QString port = ui.lineEditPort->text();
        // Create the io_service, and the work object that keeps run() alive,
        // before the connection that is built on top of it.
        io = boost::shared_ptr<boost::asio::io_service>(new boost::asio::io_service);
        work = boost::shared_ptr<boost::asio::io_service::work>(new boost::asio::io_service::work(*io));
        connection = TcpConnection::create(io);
        boost::system::error_code error = connection->connect(host, port);
        if (!error) {
            // Run the io_service on its own thread so the GUI thread never blocks.
            io_threads.create_thread(boost::bind(&SpikingMatrixClient::runIo, this, io));
        }
        QString message = QString::fromStdString(error.message());
        this->ui->statusBar->showMessage(message, 15000);
    }
}
for connecting & starting IO, where:
work is a private boost::shared_ptr to the boost::asio::io_service::work object it was passed,
io is a private boost::shared_ptr to a boost::asio::io_service,
connection is a boost::shared_ptr to my connection wrapper class, and the connect() call uses a resolver etc. to connect the socket, there's plenty examples of that around
and io_threads is a private boost::thread_group.
Surely it could be shortened with some typedefs if needed.
TcpConnection is my own connection wrapper implementation, which sort of lacks functionality for now, and I suppose I could move the whole thread thing into it when it gets reinstated. This snippet should be enough to get the idea anyway...
The disconnecting part goes like this:
void SpikingMatrixClient::disconnect() {
    work.reset();
    io_threads.join_all();
    boost::system::error_code error = connection->disconnect();
    if (!error) {
        connection.reset();
    }
    QString message = QString::fromStdString(error.message());
    this->ui->statusBar->showMessage(message, 15000);
}
the work object is destroyed, so that the io_service can run out of work eventually,
the threads are joined, meaning that all work gets finished before disconnecting, thus data shouldn't get corrupted,
the disconnect() calls shutdown() and close() on the socket behind the scenes, and if there's no error, destroys the connection pointer.
Note, that there's no error handling in case of an error while disconnecting in this snippet, but it could very well be done, either by checking the error code (which seems more C-like), or throwing from the disconnect() if the error code within it represents an error after trying to disconnect.
I encountered a similar problem (callbacks not fired), but the circumstances are different from this question (the io_service had jobs but still would not fire the handlers). I will post this anyway and maybe it will help someone.
In my program, I set up an async_connect() then followed by io_service.run(), which blocks as expected.
async_connect() goes to on_connect_handler() as expected, which in turn fires async_write().
on_write_complete_handler() does not fire, even though the other end of the connection has received all the data and has even sent back a response.
I discovered that it was caused by my placing program logic in on_connect_handler(). Specifically, after the connection was established and after I called async_write(), I entered an infinite loop to perform arbitrary logic, never allowing on_connect_handler() to exit. I assume this prevents the io_service from executing other handlers, even when their conditions are met, because it is stuck in this one. (I had many misconceptions, and thought that io_service would automagically spawn threads for each async_x() call.)
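The effect can be reproduced with a toy single-threaded loop. In the sketch below, `MiniLoop` and `long_job` are hypothetical names, and the loop only mimics the relevant property of one thread inside io_service::run(): handlers run one at a time, so nothing else runs until the current handler returns. Instead of looping inside a handler, the long job does one slice of work and re-posts itself, which lets other handlers interleave.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical single-threaded loop: handlers run strictly one at a time,
// so a handler that never returns would starve everything queued behind it.
class MiniLoop {
public:
    void post(std::function<void()> handler) {
        handlers_.push(std::move(handler));
    }

    // Runs handlers in FIFO order until none are left.
    void run() {
        while (!handlers_.empty()) {
            std::function<void()> handler = std::move(handlers_.front());
            handlers_.pop();
            handler();
        }
    }

private:
    std::queue<std::function<void()>> handlers_;
};

// Does one slice of a long job, then re-posts itself rather than looping
// inside the handler, so that other queued handlers get a turn.
void long_job(MiniLoop& loop, std::string& log, int remaining) {
    if (remaining == 0) return;
    log += 'j';
    loop.post([&loop, &log, remaining] { long_job(loop, log, remaining - 1); });
}
```

If `long_job` looped internally instead of re-posting, every handler queued after it (for example a write completion handler) would only run once the whole job had finished, or never if the loop were infinite.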
Hope that helps.