Boost Asio experimental channel poor performance - C++

I wrote the following code to analyze experimental channel performance in a single-threaded application. On an i7-6700HQ @ 3.2 GHz it takes around 1 second to complete, which shows a throughput of around 3M items per second.
The problem might be that, because Asio is in single-threaded mode, the producer has to signal the consumer, which leads to immediate resumption of the consumer coroutine on every call to async_send(). But I don't know how to test whether this is the case, nor how to avoid it in real applications. Reducing the channel buffer size, even to 0, has no effect on the throughput, which might be for the same reason.
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <boost/asio/experimental/channel.hpp>

namespace asio = boost::asio;
using namespace asio::experimental::awaitable_operators;

using channel_t = asio::experimental::channel< void(boost::system::error_code, uint64_t) >;

asio::awaitable< void >
producer(channel_t &ch)
{
    for (uint64_t i = 0; i < 3'000'000; i++)
        co_await ch.async_send(boost::system::error_code {}, i, asio::use_awaitable);
    ch.close();
}

asio::awaitable< void >
consumer(channel_t &ch)
{
    for (;;)
        co_await ch.async_receive(asio::use_awaitable);
}

asio::awaitable< void >
experiment()
{
    channel_t ch { co_await asio::this_coro::executor, 1000 };
    co_await (consumer(ch) && producer(ch));
}

int
main()
{
    asio::io_context ctx {};
    asio::co_spawn(ctx, experiment(), asio::detached);
    ctx.run();
}

You can save a little by providing hints about the threading:
- provide the concurrency hint unsafe (BOOST_ASIO_CONCURRENCY_HINT_UNSAFE)
- optionally disable all threading (this will in practice probably not matter; it's just possible as long as you don't need any services that employ internal threads)
- avoid type erasure on the executor; this means replacing any_io_executor with the concrete executor type that you employ
I wrote a side-by-side benchmark with a reduced message count (30k) so that Nonius can sample 100 runs and do statistical analysis on the results:
//#define TWEAKS
#ifdef TWEAKS
#define BOOST_ASIO_DISABLE_THREADS 1
#endif
#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>
#include <boost/asio/experimental/channel.hpp>
#include <iostream>

namespace asio = boost::asio;
using namespace asio::experimental::awaitable_operators;
using boost::system::error_code;
using context = asio::io_context;

#ifdef TWEAKS
using executor_t = context::executor_type;
using channel_t  = asio::experimental::channel<executor_t, void(error_code, uint64_t)>;
#else
using executor_t = asio::any_io_executor;
using channel_t  = asio::experimental::channel<void(error_code, uint64_t)>;
#endif

asio::awaitable<void> producer(channel_t& ch) {
    for (uint64_t i = 0; i < 30'000; i++)
        co_await ch.async_send(error_code {}, i, asio::use_awaitable);
    ch.close();
}

asio::awaitable<void> consumer(channel_t& ch) {
    for (;;)
        co_await ch.async_receive(asio::use_awaitable);
}

asio::awaitable<void> experiment() {
    asio::any_io_executor ex = co_await asio::this_coro::executor;
    channel_t ch { *ex.target<executor_t>(), 1000 };
    co_await (consumer(ch) && producer(ch));
}

void foo() {
    try {
#ifdef TWEAKS
        asio::io_context ctx{BOOST_ASIO_CONCURRENCY_HINT_UNSAFE};
#else
        asio::io_context ctx{1};
#endif
        asio::co_spawn(ctx, experiment(), asio::detached);
        ctx.run();
    } catch (std::exception& e) {
        std::cerr << "Exception: " << e.what() << "\n";
    }
}

#include <nonius/benchmark.h++>
#define NONIUS_RUNNER
#include <nonius/main.h++>

NONIUS_BENCHMARK( //
    "foo",        //
    [](nonius::chronometer cm) { cm.measure([] { foo(); }); })
The results per 30k batch (including construction and teardown) are:
- Without TWEAKS: mean 12.091 ± 0.233 ms
- With TWEAKS defined: mean 8.784 ± 0.097 ms
So ~25% speed increase, and also much reduced variance.
Thoughts
These are just the Asio technical tweaks. I might be missing some still.
I suspect you should be able to get much better throughput with smart buffering. I'm assuming you need the Asio integration for other reasons, making this the right choice.
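For illustration, here is a minimal sketch of what such buffering could look like: batching values into a single channel message so the per-item signalling cost is amortized. The batch_channel_t type, the batched_producer name, and the batch size of 1024 are my own assumptions, not from the code above.
#include <boost/asio.hpp>
#include <boost/asio/experimental/channel.hpp>
#include <vector>

namespace asio = boost::asio;

// hypothetical channel type carrying whole batches instead of single values
using batch_channel_t = asio::experimental::channel<
    void(boost::system::error_code, std::vector<uint64_t>)>;

asio::awaitable<void> batched_producer(batch_channel_t& ch)
{
    std::vector<uint64_t> batch;
    for (uint64_t i = 0; i < 3'000'000; i++)
    {
        batch.push_back(i);
        if (batch.size() == 1024) // arbitrary batch size
        {
            co_await ch.async_send(boost::system::error_code {}, std::move(batch),
                                   asio::use_awaitable);
            batch.clear(); // moved-from: make it empty and reusable
        }
    }
    if (!batch.empty())
        co_await ch.async_send(boost::system::error_code {}, std::move(batch),
                               asio::use_awaitable);
    ch.close();
}
The consumer side would then receive a whole std::vector<uint64_t> per await, so the per-message scheduling cost is paid once per 1024 items instead of once per item.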

It turned out the consumer and producer sides are scheduled in the event loop on each send/receive operation; that's why the channel size has no effect on the throughput.
I've changed the code to the following, and now it can send 90M per second, which is what I expected from the implementation.
asio::awaitable< void >
producer(channel_t &ch)
{
    for (uint64_t i = 0; i < 90'000'000; i++)
    {
        if (!ch.try_send(boost::system::error_code {}, i))
            co_await ch.async_send(boost::system::error_code {}, i, asio::use_awaitable);
    }
    ch.close();
}

asio::awaitable< void >
consumer(channel_t &ch)
{
    for (;;)
    {
        if (!ch.try_receive([](auto, auto) {}))
            co_await ch.async_receive(asio::use_awaitable);
    }
}
I think the reason this is not the default behavior of channels is that awaitables in Asio have no way to return true from the await_ready() call, so they always have to suspend and initiate an asynchronous operation.

Related

How to make a timeout at receiving in boost::asio udp::socket?

I am writing a single-threaded application which exchanges with another one via UDP. When the other side disconnects, my socket::receive_from blocks, and I don't know how to solve this without turning the entire program into multi-threaded or async interactions.
I thought the following might be a solution:
std::chrono::milliseconds timeout{4};
boost::system::error_code err;
data_t buffer(kPackageMaxSize);
std::size_t size = 0;

const auto status = std::async(std::launch::async, [&] {
    size = socket_.receive_from(boost::asio::buffer(buffer), dst_, 0, err);
}).wait_for(timeout);

switch (status)
{
    case std::future_status::timeout: /*...*/ break;
}
But I ran into a new problem: Qt Creator (GDB 11.1) started crashing while I am debugging (I don't have the ability to investigate further yet). And even when it runs without the debugger, the solution doesn't always work.
PS. As for "it doesn't work when debugging", debugging (specifically breakpoints) obviously changes timing. Also, keep in mind network operations have varying latency and UDP isn't a guaranteed protocol: messages may not be delivered.
Asio stands for "Asynchronous IO". As you might suspect, this means that asynchronous IO is a built-in feature, it's the entire purpose of the library. See overview/core/async.html: Concurrency Without Threads
It's not necessary to complicate with std::async. In your case I'd suggest using async_receive_from with use_future, as it is closest to the model you opted for:
Live On Coliru
#include <boost/asio.hpp>
#include <iostream>
#include <iomanip>
#include <string_view>
#include <vector>

namespace net = boost::asio;
using net::ip::udp;
using namespace std::chrono_literals;

constexpr auto kPackageMaxSize = 65520;
using data_t = std::vector<char>;

int main() {
    net::thread_pool ioc;
    udp::socket socket_(ioc, udp::v4());
    socket_.bind({{}, 8989});

    udp::endpoint ep;
    data_t buffer(kPackageMaxSize);

    auto fut =
        socket_.async_receive_from(net::buffer(buffer), ep, net::use_future);

    switch (fut.wait_for(4ms)) {
        case std::future_status::ready: {
            buffer.resize(fut.get()); // never blocks here
            std::cout << "Received " << buffer.size() << " bytes: "
                      << std::quoted(
                             std::string_view(buffer.data(), buffer.size()))
                      << "\n";
            break;
        }
        case std::future_status::timeout:
        case std::future_status::deferred: {
            std::cout << "Timeout\n";
            socket_.cancel(); // stop the IO operation
            // fut.get() would throw system_error(net::error::operation_aborted)
            break;
        }
    }

    ioc.join();
}
The Coliru output:
Received 12 bytes: "Hello World
"
Locally, both the timeout and the successful path were demonstrated.

What could be a better for condition_variables

I am trying to write a multi-threaded logging function. It looks like:
namespace { // Anonymous namespace instead of static functions.

std::mutex log_mutex;
std::queue<std::string> logs; // pending log records, guarded by log_mutex

void Background() {
    while (IsAlive) {
        std::queue<std::string> log_records;
        {
            // Exchange data to minimize lock time.
            std::unique_lock lock(log_mutex);
            logs.swap(log_records);
        }
        if (log_records.empty()) {
            Sleep(200);
            continue;
        }
        while (!log_records.empty()) {
            ShowLog(log_records.front());
            log_records.pop();
        }
    }
}

void Log(std::string log) {
    std::unique_lock lock(log_mutex);
    logs.push(std::move(log));
}

} // namespace
I use Sleep to prevent high CPU usage due to continuously looping even when logs are empty. But this has a very visible drawback: it prints the logs in batches. I tried to get over this problem by using condition variables, but there the problem is that if there are many logs in a short time, the cv is put to sleep and woken up many times, leading to even more CPU usage. What can I do to solve this issue?
You can assume there may be many calls to Log per second.
I would probably think of using a counting semaphore for this:
The semaphore would keep a count of the number of messages in the logs (initially zero).
Log clients would write a message and increment by one the number of messages by releasing the semaphore.
A log server would do an acquire on the semaphore, blocking until there was any message in the logs, and then decrementing by one the number of messages.
Notice:
Log clients get the logs queue lock, push a message, and only then do the release on the semaphore.
The log server can do the acquire before getting the logs queue lock; this would be possible even if there were more readers. For instance: 1 message in the log queue, server 1 does an acquire, server 2 does an acquire and blocks because semaphore count is 0, server 1 goes on and gets the logs queue lock...
#include <algorithm>  // for_each
#include <chrono>     // chrono_literals
#include <future>     // async, future
#include <iostream>   // cout
#include <mutex>      // mutex, unique_lock
#include <queue>
#include <semaphore>  // counting_semaphore
#include <string>
#include <thread>     // sleep_for
#include <vector>

std::mutex mtx{};
std::queue<std::string> logs{};
std::counting_semaphore c_semaphore{ 0 };

int main()
{
    auto log = [](std::string message) {
        std::unique_lock lock{ mtx };
        logs.push(std::move(message));
        c_semaphore.release();
    };

    auto log_client = [&log]() {
        using namespace std::chrono_literals;
        static size_t s_id{ 1 };
        size_t id{ s_id++ };
        for (;;)
        {
            log(std::to_string(id));
            std::this_thread::sleep_for(id * 100ms);
        }
    };

    auto log_server = []() {
        for (;;)
        {
            c_semaphore.acquire();
            std::unique_lock lock{ mtx };
            std::cout << logs.front() << " ";
            logs.pop();
        }
    };

    std::vector<std::future<void>> log_clients(10);
    std::for_each(std::begin(log_clients), std::end(log_clients),
                  [&log_client](auto& lc_fut) {
                      lc_fut = std::async(std::launch::async, log_client);
                  });
    auto ls_fut{ std::async(std::launch::async, log_server) };

    std::for_each(std::begin(log_clients), std::end(log_clients),
                  [](auto& lc_fut) { lc_fut.wait(); });
    ls_fut.wait();
}

Wait until A job (as starkly opposed to ALL jobs) posted to boost::asio::thread_pool completes?

I post multiple jobs to a boost::asio::thread_pool, and I want to process their results as soon as each individual result is available. I would 1000× prefer an "event loop" idiom for this over callback idiom, because the event loop idiom automatically does synchronization for me: only one thread consumes the results and writes them to an aggregate data structure without needing to synchronize access to it. The documentation is unclear on how to do this; the examples given use lots of difficult concepts and seem about 10× more complicated than a typical instance of the event loop idiom should be.
Is the event loop idiom supported by boost::asio at all?
Asio has the proactor model. Basically, a service is "run" by executing new handlers when they are ready. If you view a handler as an event, then you will see that you already have what you are after.
The pool runs an event loop per thread. So, you have a task, and have it post the continuation event to the pool. That gives the same guarantees you describe.
Demo
The following code runs taskB that depends on taskA as soon as taskA completes. It runs four of these tasks in parallel:
Live On Coliru
// #define BOOST_ASIO_ENABLE_HANDLER_TRACKING
#include <boost/asio.hpp>
#include <functional>
#include <iomanip>
#include <iostream>
#include <mutex>
#include <random>
#include <thread>

using namespace std::literals;
auto now = std::chrono::high_resolution_clock::now;
static auto const start = now();

static void randelay()
{
    thread_local auto gen = std::bind(std::uniform_int_distribution<>(100, 800),
                                      std::mt19937{std::random_device{}()});
    std::this_thread::sleep_for(gen() * 1ms);
}

int main()
{
    auto taskB = [](int resultFromTaskA) {
        randelay();
        static std::mutex mx;
        std::lock_guard lk(mx);
        std::cout << "at " << std::setw(4) << (now() - start) / 1ms
                  << "ms taskB resultFromTaskA: " << resultFromTaskA
                  << std::endl;
    };

    boost::asio::thread_pool ctx;
    auto executor = ctx.get_executor();

    auto taskA = [=](int payload) {
        randelay();
        post(executor, std::bind(taskB, payload * payload));
    };

    for (auto i = 1; i < 5; ++i)
        post(ctx, std::bind(taskA, i));

    ctx.join();
}
Prints e.g.
at 234ms taskB resultFromTaskA: 16
at 837ms taskB resultFromTaskA: 4
at 1214ms taskB resultFromTaskA: 1
at 1290ms taskB resultFromTaskA: 9

Boost.Asio contrived example inexplicably blocking

I've used Boost.Asio extensively, but I've come across a problem with a unit test that I don't understand. I've reduced the problem to a very contrived example:
#include <array>
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <boost/asio.hpp>

#define BOOST_TEST_MODULE My_Module
#define BOOST_TEST_DYN_LINK
#include <boost/test/unit_test.hpp>
#include <boost/test/auto_unit_test.hpp>

using namespace std::string_literals;
using namespace std::chrono_literals;
namespace BA = boost::asio;
namespace BAI = BA::ip;

BOOST_AUTO_TEST_CASE(test)
{
    std::mutex m;
    std::condition_variable cv;
    BA::io_service servicer;
    auto io_work = std::make_unique<BA::io_service::work>(servicer);
    auto thread = std::thread{[&]() {
        servicer.run();
    }};

    auto received_response = false;
    auto server_buf = std::array<std::uint8_t, 4096>{};
    auto server_sock = BAI::tcp::socket{servicer};
    auto acceptor = BAI::tcp::acceptor{servicer,
                                       BAI::tcp::endpoint{BAI::tcp::v4(), 20123}};
    acceptor.async_accept(server_sock, [&](auto&& ec) {
        if (ec) {
            BOOST_TEST_MESSAGE(ec.message());
        }
        BOOST_REQUIRE(!ec);
        BOOST_TEST_MESSAGE("Accepted connection from " << server_sock.remote_endpoint()
                                                       << ", reading...");
        BA::async_read(server_sock,
                       BA::buffer(server_buf),
                       [&](auto&& ec, auto&& bytes_read) {
                           std::unique_lock<decltype(m)> ul(m);
                           received_response = true;
                           if (ec) {
                               BOOST_TEST_MESSAGE(ec.message());
                           }
                           BOOST_REQUIRE(!ec);
                           const auto str = std::string{server_buf.begin(),
                                                        server_buf.begin() + bytes_read};
                           BOOST_TEST_MESSAGE("Read: " << str);
                           ul.unlock();
                           cv.notify_one();
                       });
    });

    const auto send_str = "hello"s;
    auto client_sock = BAI::tcp::socket{servicer, BAI::tcp::v4()};
    client_sock.async_connect(
        BAI::tcp::endpoint{BAI::tcp::v4(), 20123},
        [&](auto&& ec) {
            if (ec) {
                BOOST_TEST_MESSAGE(ec.message());
            }
            BOOST_REQUIRE(!ec);
            BOOST_TEST_MESSAGE("Connected...");
            BA::async_write(client_sock,
                            BA::buffer(send_str),
                            [&](auto&& ec, auto&& bytes_written) {
                                if (ec) {
                                    BOOST_TEST_MESSAGE(ec.message());
                                }
                                BOOST_REQUIRE(!ec);
                                BOOST_TEST_MESSAGE("Written " << bytes_written << " bytes");
                            });
        });

    std::unique_lock<decltype(m)> ul(m);
    cv.wait_for(ul, 2s, [&]() { return received_response; });
    BOOST_CHECK(received_response);

    io_work.reset();
    servicer.stop();
    if (thread.joinable()) {
        thread.join();
    }
}
which I compile with:
g++ -std=c++17 source.cc -l boost_unit_test_framework -pthread -l boost_system -ggdb
The output is:
Accepted connection from 127.0.0.1:51688, reading...
Connected...
Written 5 bytes
And then it times out.
Running through the debugger shows that the async_read handler is never called. Pausing execution during the phase where it doesn't seem to be doing anything shows that the main thread is waiting on the condition_variable (cv) and the io_service thread is on an epoll_wait.
I seem to be deadlocking but can't see how.
This is how the function is defined to work: it waits for exactly the number of bytes that the buffer has space for (http://www.boost.org/doc/libs/1_62_0/doc/html/boost_asio/reference/async_read/overload1.html).
Try this overload instead: http://www.boost.org/doc/libs/1_62_0/doc/html/boost_asio/reference/async_read/overload2.html
You can give it a callback that decides whether the read is complete. That could include waiting for and checking a length provided over another channel once the writer has written its message (if you've determined a deadlock-free way to do that), or sent just before the message proper.
Adding this completion condition (passed to BA::async_read as the third argument, between the buffer and the read handler) makes it work:
[&](auto&& ec, auto&& bytes_read) {
    // complete once 5 bytes have arrived
    return bytes_read < 5 ? 5 - bytes_read : 0;
},
The answer provided by @codeshot is correct, but it is one of several solutions; which is most appropriate depends entirely upon the protocol you're using across the TCP connection.
For example, in a traditional Key-Length-Value style protocol, you would do two reads (a sketch follows below):
1. Use boost::asio::async_read (or equivalent) to read into a fixed-length buffer to obtain the fixed-length header.
2. Use the length specified by the header to create a buffer of the required size, and repeat step 1 using it.
There's a good example of this in the chat server example code.
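As a hedged sketch of steps 1 and 2 (not the chat server's actual code), reusing server_sock and the BA alias from the question, and assuming a 4-byte native-endian length header (a real protocol would convert from network byte order) plus <memory> and <vector> includes:
auto header = std::make_shared<std::uint32_t>(0);
BA::async_read(server_sock, BA::buffer(header.get(), sizeof(*header)),
               [&, header](auto&& ec, auto&& /*bytes_read*/) {
                   if (ec) return;
                   // step 2: size the body buffer from the header value
                   auto body = std::make_shared<std::vector<char>>(*header);
                   BA::async_read(server_sock, BA::buffer(*body),
                                  [body](auto&& ec, auto&& /*bytes_read*/) {
                                      // *body now holds the complete message
                                      // (or ec reports what went wrong)
                                  });
               });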
If you were using HTTP or RTSP (the latter is what I was trying to do), then you don't know how much data is coming; all you care about is receiving a packet's worth of data. (I know this is an oversimplification due to the Content-Length header in responses, chunked transfer encoding, etc., but bear with me.) For this you need async_read_some (or equivalent); see the HTTP server example.
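A minimal sketch of that style, again reusing the question's names (this is not the HTTP server example's code):
server_sock.async_read_some(BA::buffer(server_buf),
                            [&](auto&& ec, auto&& bytes_read) {
                                if (!ec) {
                                    // handle whatever arrived (up to bytes_read bytes),
                                    // then call async_read_some again for the next chunk
                                }
                            });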

setting the execution rate of while loop in a C++ code for real time synchronization

I am doing a real-time simulation in a .cpp source file. I have to take a sample every 0.2 seconds (200 ms). There is a while loop that takes a sample every time step. I want to synchronize the execution of this while loop so it gets a sample every 200 ms. How should I modify the while loop?
while (1){
    // get a sample every 200 ms
}
Simple and accurate solution with std::this_thread::sleep_until:
#include "date.h"
#include <chrono>
#include <iostream>
#include <thread>
int
main()
{
using namespace std::chrono;
using namespace date;
auto next = steady_clock::now();
auto prev = next - 200ms;
while (true)
{
// do stuff
auto now = steady_clock::now();
std::cout << round<milliseconds>(now - prev) << '\n';
prev = now;
// delay until time to iterate again
next += 200ms;
std::this_thread::sleep_until(next);
}
}
"date.h" isn't needed for the delay part. It is there to provide the round<duration> function (which is now in C++17), and to make it easier to print out durations. This is all under "do stuff", and doesn't matter for the loop delay.
Just get a chrono::time_point, add your delay to it, and sleep until that time_point. Your loop will on average stay true to your delay, as long as your "stuff" takes less time than your delay. No other thread needed. No timer needed. Just <chrono> and sleep_until.
This example just output for me:
200ms
205ms
200ms
195ms
205ms
198ms
202ms
199ms
196ms
203ms
...
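Distilled to just the delay logic (my reduction of the example above, dropping "date.h" and the printing):
#include <chrono>
#include <thread>

int main()
{
    using namespace std::chrono;
    auto next = steady_clock::now();
    while (true)
    {
        // do stuff
        next += 200ms;                       // absolute target time, so no drift accumulates
        std::this_thread::sleep_until(next);
    }
}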
What you are asking is tricky unless you are using a real-time operating system. However, Boost has a library that supports what you want. (There is, however, no guarantee that you are going to be called exactly every 200 ms.)
The Boost ASIO library is probably what you are looking for; here is code from their tutorial:
//
// timer.cpp
// ~~~~~~~~~
//
// Copyright (c) 2003-2012 Christopher M. Kohlhoff (chris at kohlhoff dot com)
//
// Distributed under the Boost Software License, Version 1.0. (See accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
//
#include <iostream>
#include <boost/asio.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

int main()
{
    boost::asio::io_service io;
    boost::asio::deadline_timer t(io, boost::posix_time::seconds(5));
    t.wait();
    std::cout << "Hello, world!\n";
    return 0;
}
link is here: link to boost asio.
You could take this code and re-arrange it like this:
#include <iostream>
#include <boost/asio.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

int main()
{
    boost::asio::io_service io;
    while (1)
    {
        boost::asio::deadline_timer t(io, boost::posix_time::seconds(5));
        // process your IO here - not sure how long your IO takes,
        // so you may need to adjust your timer
        t.wait();
    }
    return 0;
}
There is also a tutorial for handling the IO asynchronously on the next page(s).
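For reference, here is a minimal sketch of that asynchronous style, adapted to the question's 200 ms interval (a sketch based on the tutorial's approach, not its exact code). Rescheduling from the previous expiry time keeps the average rate accurate even when the sampling itself takes some time:
#include <boost/asio.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

void sample(boost::asio::deadline_timer& t)
{
    // get a sample here, then reschedule relative to the previous expiry
    t.expires_at(t.expires_at() + boost::posix_time::milliseconds(200));
    t.async_wait([&t](const boost::system::error_code&) { sample(t); });
}

int main()
{
    boost::asio::io_service io;
    boost::asio::deadline_timer t(io, boost::posix_time::milliseconds(200));
    t.async_wait([&t](const boost::system::error_code&) { sample(t); });
    io.run();
}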
The offered answers show you that there are tools available in Boost to help you accomplish this. My late offering illustrates how to use setitimer(), which is a POSIX facility for interval timers.
You basically need a change like this:
while (1){
    // wait until 200 ms boundary
    // get a sample
}
With an interval timer, the firing signal will interrupt any blocking system call. So, you could just block on something forever; select will do fine for that:
while (1){
    int select_result = select(0, 0, 0, 0, 0);
    assert(select_result < 0 && errno == EINTR);
    // get a sample
}
To establish an interval timer for every 200 ms, use setitimer(), passing in an appropriate interval. In the code below, we set an interval for 200 ms, where the first one fires 150 ms from now.
struct itimerval it = { { 0, 200000 }, { 0, 150000 } };
if (setitimer(ITIMER_REAL, &it, 0) != 0) {
    perror("setitimer");
    exit(EXIT_FAILURE);
}
Now, you just need to install a signal handler for SIGALRM that does nothing, and the code is complete.
You can follow the link to see the completed example.
If it is possible for multiple signals to be fired during the program execution, then instead of relying on the interrupted system call, it is better to block on something that the SIGALRM handler can wake up in a deterministic way. One possibility is to have the while loop block on a read of the read end of a pipe. The signal handler can then write to the write end of that pipe.
void sigalarm_handler (int)
{
    if (write(alarm_pipe[1], "", 1) != 1) {
        char msg[] = "write: failed from sigalarm_handler\n";
        write(2, msg, sizeof(msg)-1);
        abort();
    }
}
Follow the link to see the completed example.
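Since the linked example isn't reproduced here, a hedged sketch of how the pieces could fit together (the completed example behind the link may differ; error handling is minimal):
#include <sys/time.h>
#include <signal.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int alarm_pipe[2]; // [0] = read end, [1] = write end

void sigalarm_handler(int)
{
    // write() is async-signal-safe; stdio is not
    if (write(alarm_pipe[1], "", 1) != 1)
        abort();
}

int main()
{
    if (pipe(alarm_pipe) != 0) {
        perror("pipe");
        return EXIT_FAILURE;
    }

    struct sigaction sa = {};
    sa.sa_handler = sigalarm_handler;
    sigaction(SIGALRM, &sa, nullptr);

    // 200 ms interval, first expiry 150 ms from now
    struct itimerval it = { { 0, 200000 }, { 0, 150000 } };
    if (setitimer(ITIMER_REAL, &it, 0) != 0) {
        perror("setitimer");
        return EXIT_FAILURE;
    }

    while (1) {
        char tick;
        // read() may return -1/EINTR when the signal lands mid-call;
        // the loop simply retries and picks the byte up next time
        if (read(alarm_pipe[0], &tick, 1) == 1) {
            // get a sample
        }
    }
}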
#include <thread>
#include <chrono>
#include <iostream>

int main() {
    std::thread timer_thread;
    while (true) {
        timer_thread = std::thread([]() {
            std::this_thread::sleep_for(std::chrono::seconds(1));
        });
        // do stuff
        std::cout << "Hello World!" << std::endl;
        // waits until the thread has "slept"
        timer_thread.join();
        // will loop every second unless the stuff takes longer than that
    }
    return 0;
}
Getting absolute precision will be nearly impossible (maybe in embedded systems). However, if you require only an approximate frequency, you can get pretty decent results with a chrono library such as std::chrono (C++11) or boost::chrono, like so:
// needs <algorithm>, <chrono>, <thread>
using std::chrono::duration_cast;
using std::chrono::milliseconds;
using std::chrono::system_clock;

while (1){
    auto start = system_clock::now();

    //run sample

    // sleep for whatever is left of the 200 ms budget (never negative);
    // note sleep_for wants a duration, not a raw millisecond count
    auto elapsed = duration_cast<milliseconds>(system_clock::now() - start);
    auto sleep_for = std::max(milliseconds(0), milliseconds(200) - elapsed);
    std::this_thread::sleep_for(sleep_for);
}