Reading from one socket for several consumers asynchronously in one thread - c++

I am implementing a connection multiplexer: a class that wraps a single connection and lets clients create so-called Streams over it. There can be dozens of such streams over one physical connection.
Messages sent over that connection are defined by a protocol. They are either service messages (congestion control, etc.), which the clients never see, or data messages, which carry data for a stream; the target stream is identified in the message header.
I have run into a problem implementing the read method of a Stream. It must be blocking but asynchronous: it returns a value (the data read, or the error that occurred), but the request itself must sit in some kind of async queue.
To implement asynchronous network I/O we use Boost's async_read, async_write, etc. with a completion token taken from another library. So a call to MyConnection::underlying_connection::read(size_t) is already asynchronous in the sense I described above.
One solution I have implemented is a function MyConnection::processFrame(), which reads from the connection, processes the message and, if it is a data message, puts the data into the corresponding stream's buffer. The function is meant to be called in a while loop by the stream's read. But in that case there can be more than one simultaneous call to async_read, which is UB. It would also mean that even service messages have to wait until some stream wants to read data, which is not acceptable either.
Another solution I came up with is using futures, but as far as I can tell their wait/get methods block the whole thread (even with the deferred policy or a paired promise), which must be avoided too.
Below is a simplified example with only the methods needed to understand the question. This is the current implementation, which contains bugs.
struct LowLevelConnection {
    /// completion token of a 3rd-party library - ufibers
    yield_t yield;
    /// boost::asio socket
    TcpSocket socket_;

    /// completely async (in one thread) method
    std::vector<uint8_t> read(size_t bytes) {
        std::vector<uint8_t> res;
        boost::asio::async_read(socket_, res, yield);
        return res;
    }
};

struct MyConnection {
    /// header is always of that length
    constexpr uint32_t kHeaderSize = 12;
    /// underlying connection
    LowLevelConnection connection_;

    /// is running all the time the connection is up
    void readLoop() {
        while (connection_.isActive()) {
            auto msg =;
            if (msg.type == SERVICE) { handleService(msg); return; }
            // this is data message; read another part of it
            auto data =;
            // put the data into the stream's buffer
        }
    }
};

struct Stream {
    Buffer buffer;

    // also async blocking method
    std::vector<uint8_t> read(uint32_t bytes) {
        // in perfect scenario, this should look like this
        async_wait([]() { return buffer.size() >= bytes; });
        // return the subbuffer of 'bytes' size and remove them
        return subbufer...
    }
};
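One possible shape for the buffering half of this design, as a minimal sketch rather than an answer from the thread: a single read loop owns the socket and is the only caller of async_read, and it routes each data payload into a per-stream buffer keyed by the stream id from the header. The names `StreamBuffer`, `Demux`, `onDataFrame` and `tryRead` are hypothetical, the header parsing is left out because the layout is not given in the question, and the part that suspends a fiber until enough bytes arrive is omitted because it depends on the fiber library.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>
#include <vector>

// Per-stream byte buffer, filled only by the single read loop.
class StreamBuffer {
public:
    void append(const std::vector<std::uint8_t>& data) {
        bytes_.insert(bytes_.end(), data.begin(), data.end());
    }

    // Returns exactly n bytes and removes them, or nullopt if fewer are buffered.
    std::optional<std::vector<std::uint8_t>> tryRead(std::size_t n) {
        if (bytes_.size() < n) return std::nullopt;
        const auto end = bytes_.begin() + static_cast<std::ptrdiff_t>(n);
        std::vector<std::uint8_t> out(bytes_.begin(), end);
        bytes_.erase(bytes_.begin(), end);
        return out;
    }

private:
    std::deque<std::uint8_t> bytes_;
};

// Routes payloads to streams; the read loop calls onDataFrame() for every
// data message and handles service messages itself.
class Demux {
public:
    void onDataFrame(std::uint32_t streamId, const std::vector<std::uint8_t>& payload) {
        streams_[streamId].append(payload);
    }

    StreamBuffer& stream(std::uint32_t streamId) { return streams_[streamId]; }

private:
    std::unordered_map<std::uint32_t, StreamBuffer> streams_;
};
```

With this split, Stream::read never touches the socket: it only waits (by whatever yielding primitive the fiber library offers) until tryRead succeeds, so there is never more than one outstanding async_read and service messages are processed even when no stream is reading.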
Boost asio:async_read() using boost::asio::use_future

When calling asio::async_read() using a future, is there a way to get the number of bytes transferred when a boost::asio::error::eof exception occurs? It would seem that there are many cases when one would want to get the data transferred even if the peer disconnects.
For example:
namespace ba = boost::asio;

int32_t Session::read (unsigned char* pBuffer, uint32_t bufferSizeToRead)
{
    // Create a mutable buffer
    ba::mutable_buffer buffer (pBuffer, bufferSizeToRead);
    int32_t result = 0;
    try
    {
        // We do an async call using a future. A thread from the io_context pool does the
        // actual read while the thread calling this method blocks on
        // std::future::get()
        std::future<std::size_t> future =
            ba::async_read(m_socket, buffer, ba::bind_executor(m_sessionStrand, ba::use_future));
        // We block the calling thread here until we get the results of the async_read()...
        result = future.get();
    }
    catch (boost::system::system_error &ex) // boost::system::system_error
    {
        auto exitCode = ex.code().value();
        if ( exitCode == ba::error::eof )
            log ("Connection closed by the peer");
    }
    return result; // This is zero if eof occurs
}
The code sample above represents our issue. It was designed to support a 3rd-party library that expects a blocking call. The new code under development uses ASIO with a minimal number of network threads. The expectation is that the 3rd-party library calls Session::read on its own dedicated thread and we adapt that call to an asynchronous one. The network call must be async since we are supporting many such calls from different libraries with minimal threads.
What was unexpected and discovered late is that ASIO treats a connection closed as an error. Without the future, using a handler we could get the bytes transferred up to the point where the disconnect occurred. However, using a future, the exception is thrown and the bytes transferred becomes unknown.
void handler (const boost::system::error_code& ec,
std::size_t bytesTransferred );
Is there a way to do the above with a future and also get the bytes transferred?
Or is there an alternative approach where we can provide the library a blocking call while still using asio::async_read or similar?
Our expectation is that we could get the bytes transferred even if the client closed the connection. We're puzzled that when using a future this does not seem possible.
It's an implementation limitation of futures.
Modern async_result<> specializations (that use the initiate member approach) can be used together with as_tuple, e.g.:
ba::awaitable<std::tuple<boost::system::error_code, size_t>> a =
ba::async_read(m_socket, buffer, ba::as_tuple(ba::use_awaitable));
Or, more typical:
auto [ec, n] = co_await async_read(m_socket, buffer, ba::as_tuple(ba::use_awaitable));
However, the corresponding:
auto future = ba::async_read(m_socket, buffer, ba::as_tuple(ba::use_future));
isn't currently supported. It arguably could be, but you'd have to create your own completion token, or ask the Asio developers to add support to use_future.
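Short of a full custom completion token, one workaround is to pass an ordinary handler that fulfils a std::promise carrying both values, and block on the matching future. The following is only a sketch of that idea using the standard library: `ReadResult` and `await_result` are hypothetical names, and std::error_code stands in for boost::system::error_code so the block compiles without Boost. In the Session::read case, the `initiate` callable would be the place to hand the handler to ba::async_read (bound to the strand if needed).

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <system_error>
#include <utility>

// Both pieces of the completion signature, delivered together.
struct ReadResult {
    std::error_code ec;
    std::size_t bytes = 0;
};

// Calls initiate(handler), where handler has the signature
// void(std::error_code, std::size_t), then blocks until the handler runs.
// Errors are returned as a value, so the byte count survives eof.
template <class Initiate>
ReadResult await_result(Initiate&& initiate) {
    auto promise = std::make_shared<std::promise<ReadResult>>();
    std::future<ReadResult> future = promise->get_future();

    std::forward<Initiate>(initiate)(
        [promise](std::error_code ec, std::size_t n) {
            promise->set_value(ReadResult{ec, n});
        });

    return future.get();
}
```

As with use_future, the blocking call must come from a thread that is not needed to run the handler, otherwise future.get() would wait forever.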
Side-note: if you construct the m_socket from the m_sessionStrand executor, you do not need to bind_executor to the strand:
namespace net = boost::asio;
using tcp = net::ip::tcp;
using Executor = net::io_context::executor_type;

struct Session {
    int32_t read(unsigned char* pBuffer, uint32_t bufferSizeToRead);

    net::io_context m_ioc;
    net::strand<Executor> m_sessionStrand{m_ioc.get_executor()};
    tcp::socket m_socket{m_sessionStrand};
};

C++ GRPC ClientAsyncReaderWriter: how to check if data is available for read?

I have a bidirectional streaming async gRPC client that uses ClientAsyncReaderWriter to communicate with the server. The RPC definition looks like:
rpc Process (stream Request) returns (stream Response)
For simplicity, Request and Response are byte arrays (byte[]). I send several chunks of data to the server; when the server has accumulated enough data, it processes that data, sends back a response, and continues accumulating data for the next responses. After several responses, the server sends a final response and closes the connection.
For the async client I am using CompletionQueue. The code looks like:
CompletionQueue cq;
std::unique_ptr<Stub> stub;
grpc::ClientContext context;
std::unique_ptr<grpc::ClientAsyncReaderWriter<Request,Response>> responder = stub->AsyncProcess(&context, &cq, handler);

// thread for completion queue
std::thread t(
    [&] {
        void *handler = nullptr;
        bool ok = false;
        while (cq_.Next(&handler, &ok)) {
            if (can_read) {
                // how do you know that it is read data available
                // Do read
            } else {
                // do write
                Request request = prepare_request();
                responder_->Write(request, handler);
            }
        }
    });
// wait
What is the proper way to read asynchronously? Can I try to read if no data is available? Is it a blocking call?
Sequencing Read() calls
Can I try to read if no data is available?
Yep, and that's going to be the case more often than not. Read() will do nothing until data is available, and only then put its passed tag into the completion queue (see below for details).
Is it a blocking call?
Nope. Read() and Write() return immediately. However, you can only have one of each in flight at any given moment. If you try to send a second one before the previous has completed, it (the second one) will fail.
What is the proper way to read asynchronously?
Each time a Read() is done, start a new one. For that, you need to be able to tell when a Read() is done. This is where tags come in!
When you call Read(&msg, tag) or Write(request, tag), you are telling grpc to put tag in the completion queue associated with that responder once that operation has completed. grpc doesn't care what the tag is, it just hands it off.
So the general strategy you will want to go for is:
- As soon as you are ready to start receiving messages:
    - Call responder->Read() once with some tag that you will recognize as a "read done".
- Whenever cq_.Next() gives you back that tag, and ok == true:
    - Consume the message.
    - Queue up a new responder->Read() with that same tag.
Obviously, you'll also want to do something similar for your calls to Write().
But since you still want to be able to look up the handler instance from a given tag, you'll need a way to pack a reference to the handler, as well as information about which operation is being finished, into a single tag.
Completion queues
Look up the handler instance from a given tag? Why?
The true raison d'être of completion queues is unfortunately not evident from the examples. They allow multiple asynchronous rpcs to share the same thread. Unless your application only ever makes a single rpc call, the handling thread should not be associated with a specific responder. Instead, that thread should be a general-purpose worker that dispatches events to the correct handler based on the content of the tag.
The official examples tend to do that by using a pointer to the handler object as the tag. That works when there's a specific sequence of events to expect, since you can easily predict what a handler is reacting to. You often can't do that with async bidirectional streams, since any given completion event could be a Read() or a Write() finishing.
Here's a general outline of what I personally consider to be a clean way to go about all that:
// Base class for async bidir RPCs handlers.
// This is so that the handling thread is not associated with a specific rpc method.
class RpcHandler {
    // This will be used as the "tag" argument to the various grpc calls.
    struct TagData {
        enum class Type {
            start_done,
            read_done,
            write_done,
            // add more as needed...
        };

        RpcHandler* handler;
        Type evt;
    };

    struct TagSet {
        TagSet(RpcHandler* self)
            : start_done{self, TagData::Type::start_done},
              read_done{self, TagData::Type::read_done},
              write_done{self, TagData::Type::write_done} {}

        TagData start_done;
        TagData read_done;
        TagData write_done;
    };

public:
    RpcHandler() : tags(this) {}
    virtual ~RpcHandler() = default;

    // The actual tag objects we'll be passing
    TagSet tags;

    virtual void on_ready() = 0;
    virtual void on_recv() = 0;
    virtual void on_write_done() = 0;

    static void handling_thread_main(grpc::CompletionQueue* cq) {
        void* raw_tag = nullptr;
        bool ok = false;
        while (cq->Next(&raw_tag, &ok)) {
            TagData* tag = reinterpret_cast<TagData*>(raw_tag);
            if (!ok) {
                // Handle error
            }
            else {
                // Dispatch to the handler that owns this tag
                switch (tag->evt) {
                case TagData::Type::start_done:
                    tag->handler->on_ready();
                    break;
                case TagData::Type::read_done:
                    tag->handler->on_recv();
                    break;
                case TagData::Type::write_done:
                    tag->handler->on_write_done();
                    break;
                }
            }
        }
    }
};
void do_something_with_response(Response const&);

class MyHandler final : public RpcHandler {
public:
    using responder_ptr =
        std::unique_ptr<grpc::ClientAsyncReaderWriter<Request, Response>>;

    MyHandler(responder_ptr responder) : responder_(std::move(responder)) {
        // This lock is needed because StartCall() can
        // cause the handler thread to access the object.
        std::lock_guard lock(mutex_);
        responder_->StartCall(&tags.start_done);
    }

    ~MyHandler() {
        // TODO: finish/abort the streaming rpc as appropriate.
    }

    void send(const Request& msg) {
        std::lock_guard lock(mutex_);
        if (!sending_) {
            sending_ = true;
            responder_->Write(msg, &tags.write_done);
        } else {
            // TODO: add some form of synchronous wait, or outright failure
            // if the queue starts to get too big.
            queued_msgs_.push(msg);
        }
    }

private:
    // When the rpc is ready, queue the first read
    void on_ready() override {
        std::lock_guard l(mutex_); // To synchronize with the constructor
        responder_->Read(&incoming_, &tags.read_done);
    }

    // When a message arrives, use it, and start reading the next one
    void on_recv() override {
        // incoming_ never leaves the handling thread, so no need to lock

        // ------ If handling is cheap and stays in the handling thread.
        do_something_with_response(incoming_);
        responder_->Read(&incoming_, &tags.read_done);

        // ------ If handling responses is expensive or involves another thread.
        // Response msg = std::move(incoming_);
        // responder_->Read(&incoming_, &tags.read_done);
        // do_something_with_response(msg);
    }

    // When a message has been sent, send the next one if there is any
    void on_write_done() override {
        std::lock_guard lock(mutex_);
        if (!queued_msgs_.empty()) {
            responder_->Write(queued_msgs_.front(), &tags.write_done);
            queued_msgs_.pop();
        } else {
            sending_ = false;
        }
    }

    responder_ptr responder_;

    // Only ever touched by the handler thread post-construction.
    Response incoming_;

    bool sending_ = false;
    std::queue<Request> queued_msgs_;

    std::mutex mutex_; // grpc might be thread-safe, MyHandler isn't...
};
int main() {
    // Start the thread as soon as you have a completion queue.
    auto cq = std::make_unique<grpc::CompletionQueue>();
    std::thread t(RpcHandler::handling_thread_main, cq.get());

    // Multiple concurrent RPCs sharing the same handling thread:
    MyHandler handler1(serviceA->MethodA(&context, cq.get()));
    MyHandler handler2(serviceA->MethodA(&context, cq.get()));
    MyHandlerB handler3(serviceA->MethodB(&context, cq.get()));
    MyHandlerC handler4(serviceB->MethodC(&context, cq.get()));
}
If you have a keen eye, you will notice that the code above stores a bunch (one per event type) of redundant this pointers in the handler. It's generally not a big deal, and it is possible to do without them via multiple inheritance and downcasting, but that's starting to be somewhat beyond the scope of this question.

C++ weird async behaviour

Note that I'm using boost async, due to the lack of threading classes support in MinGW.
So, I wanted to send a packet every 5 seconds and decided to use boost::async (std::async) for this purpose.
This is the function I use to send the packet (this is actually copying to the buffer and sending in the main application loop - nvm - it's working fine outside async method!)
m_sendBuf = new char[1024]; // allocate buffer

bool CNetwork::Send(const void* sourceBuffer, size_t size) {
    size_t bufDif = m_sendBufSize - m_sendInBufPos;
    if (size > bufDif) {
        return false;
    }
    memcpy(m_sendBuf + m_sendInBufPos, sourceBuffer, size);
    m_sendInBufPos += size;
    return true;
}
Packet sending code:
struct TestPacket {
    unsigned char type;
    int code;
};

void SendPacket() {
    TestPacket myPacket{};
    myPacket.type = 10;
    myPacket.code = 1234;
    Send(&myPacket, sizeof(myPacket));
}
Async code:
void StartPacketSending() {
    SendPacket();
    // ... wait 5 seconds ...
    StartPacketSending(); // Recursive endless call
}

boost::async(boost::launch::async, &StartPacketSending);
Alright. So the thing is, when I call SendPacket() from the async method, the packet received on the server side is malformed and the data differs from what I specified. This doesn't happen when it is called outside the async call.
What is going on here? I'm out of ideas.
I think I have my head wrapped around what you are doing here. You are loading all unsent data into a buffer in one thread and then flushing it in a different thread. Even though the packets aren't overlapping (assuming they are consumed quickly enough), you still need to synchronize all the shared data.
m_sendBuf, m_sendInBufPos, and m_sendBufSize are all being read from the main thread, likely while memcpy or your buffer size logic is running in the other thread. I suspect you will have to use a proper queue to get your program to work as intended in the long run, but try protecting those variables with a mutex.
Also, as other commenters have pointed out, unbounded recursion is not supported in C++: every call adds a stack frame, so the stack will eventually overflow. That probably does not contribute to your malformed packets, though.
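The mutex suggestion could look roughly like the sketch below. It is an illustration under assumptions, not the asker's CNetwork class: `SendBuffer`, `send` and `drain` are hypothetical names, and the buffer is a std::vector instead of the raw new char[1024]. The point is that the write position and the bytes are only touched while the same lock is held, by both the thread that queues packets and the thread that flushes them.

```cpp
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

class SendBuffer {
public:
    explicit SendBuffer(std::size_t capacity) : buf_(capacity) {}

    // Producer side: copy one packet into the buffer; false if it does not fit.
    bool send(const void* src, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (size > buf_.size() - used_) return false;
        std::memcpy(buf_.data() + used_, src, size);
        used_ += size;
        return true;
    }

    // Consumer side (e.g. the main loop): take all pending bytes and reset.
    std::vector<char> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<char> out(buf_.begin(),
                              buf_.begin() + static_cast<std::ptrdiff_t>(used_));
        used_ = 0;
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<char> buf_;
    std::size_t used_ = 0;
};
```

The main loop would call drain() and write the returned bytes to the socket outside the lock, so a slow network write does not stall the thread that is queuing packets.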

boost read_some function lost data

I'm implementing a tcp server with boost asio library.
In the server, I use asio::async_read_some to get data, and use asio::write to write data. The server code is something like that.
std::array<char, kBufferSize> buffer_;

std::string ProcessMessage(const std::string& s) {
    if (s == "msg1") return "resp1";
    if (s == "msg2") return "resp2";
    return "";
}

void HandleRead(const boost::system::error_code& ec, size_t size) {
    std::string message(buffer_.data(), size);
    std::string resp = ProcessMessage(message);
    if (!resp.empty()) {
        asio::write(socket, boost::asio::buffer(resp), WriteCallback);
    }
}
Then I wrote a client to test the server; the code is something like:
void MessageCallback(const boost::system::error_code& ec, size_t size) {
    std::cout << std::string(buffer_.data(), size) << std::endl;
}

//Init socket
asio::write(socket, boost::asio::buffer("msg1"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
asio::write(socket, boost::asio::buffer("msg2"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
If I run the client, the code waits at the second read_some, and the output is: resp1.
If I remove the first read_some, the output is resp1resp2, which means the server did the right thing.
It seems the first read_some eats the second response but doesn't pass it to the MessageCallback function.
I've read the question "What is a message boundary?". I think that if this were a message boundary problem, the second read_some should print something, since the first read_some would only have consumed part of the stream from the TCP socket.
How can I solve this problem?
I've tried changing the size of the client buffer to 4; the output will be:
It seems the read_some function does a little more than read from the socket. I'll read the Boost code to find out whether that is true.
The async_read_some() member function is very likely not doing what you intend. Pay special attention to the Remarks section of the documentation:
The read operation may not read all of the requested number of bytes.
Consider using the async_read function if you need to ensure that the
requested amount of data is read before the asynchronous operation
completes.
Note that the async_read() free function does offer the guarantee that you are looking for:
This operation is implemented in terms of zero or more calls to the
stream's async_read_some function, and is known as a composed
operation. The program must ensure that the stream performs no other
read operations (such as async_read, the stream's async_read_some
function, or any other composed operations that perform reads) until
this operation completes.
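The relationship between the two calls can be illustrated without a socket. In this sketch, `FakeStream` is a hypothetical stand-in whose read_some hands out at most a few bytes per call, and `read_exact` is a hypothetical synchronous loop that keeps calling it until the requested count has arrived. It only mirrors the idea of a composed read; it is not Asio's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Stand-in for a socket: returns at most `chunk` bytes per call, the way a
// real read_some() may return fewer bytes than were requested.
struct FakeStream {
    std::string pending;
    std::size_t chunk;

    std::size_t read_some(char* out, std::size_t max) {
        const std::size_t n = std::min({max, chunk, pending.size()});
        std::copy_n(pending.begin(), n, out);
        pending.erase(0, n);
        return n;  // 0 means no more data
    }
};

// Keeps calling read_some() until `want` bytes have arrived or the stream ends.
// Returns the number of bytes actually read.
template <class Stream>
std::size_t read_exact(Stream& stream, char* out, std::size_t want) {
    std::size_t got = 0;
    while (got < want) {
        const std::size_t n = stream.read_some(out + got, want - got);
        if (n == 0) break;
        got += n;
    }
    return got;
}
```

For the client in the question this means choosing how many bytes make up one response (or using a delimiter or length prefix) and reading exactly that much, instead of treating whatever a single read_some returns as one message.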

winsock, message oriented networking, and type-casting the buffer from recv

Okay, I actually don't have code as of yet because I'm just picking out a framework for the time being, but I'm still a little baffled about how I wish to go about this.
Server side, I wish to have a class where each instance has a socket and various information identifying each connection. Each object will have its own thread for receiving data. I understand how I'll be implementing most of that, but my confusion starts just as I get to the actual transfer of data between server and client. I'll want to have a bunch of different message structs for specific cases (for example CONNECT_MSG, DISCONNECT_MSG, POSTTEXT_MSG, etc.), and then all I have to do is have a char * point at that struct and pass it via the send() function.
But as I think on it, it gets a little complicated at that point. Any of those different message types could be sent, and on the receiving end you will have no idea what you should cast the incoming buffer as. What I was hoping to do is, in the thread of each connection object, have it block until it receives a packet with a message, then dump it into a single queue object managed by the server (mutexes will prevent greediness), and then the server will process each message in FIFO order independent of the connection objects.
I haven't written anything yet, but let me write a little something to illustrate my setup.
#define CONNECT 1000

struct MESSAGE_GENERIC
{
    int id;
};

void Connection::Thread()
{
    char buffer[MAX_BUFFER_SIZE]; // some constant(probably 2048)
    recv(m_socket, buffer, MAX_BUFFER_SIZE, 0);
    MESSAGE_GENERIC * msg = reinterpret_cast<MESSAGE_GENERIC *> (buffer);
}

void Server::QueueMessage(MESSAGE_GENERIC * msg)
{
}

void Server::Thread()
{
}

void Server::ProcessMessages()
{
    for(int i = 0; i < messageQueue.size(); i++)
    {
        // the part i REALLY don't like
        CONNECT_MESSAGE * msg = static_cast<CONNECT_MESSAGE *>(messageQueue.front() );
        // do the rest of the processing on connect
        // other cases for the other message types
    }
}
Now if you've been following up until now, you realize just how STUPID and fragile this is. It casts to the base class, passes that pointer to a queue, and then just assumes that the pointer is still valid from the other thread, and that the rest of the buffer after the pointer (the derived-class part) will still be valid for casting afterward. But I have yet to find a correct way of doing this. I am wide open for ANY suggestions, either making this work, or an entirely different messaging design.
Before you write even a line of code, design the protocol that will be used on the wire. Decide what a message will consist of at the byte level. Decide who sends first, whether messages are acknowledged, how receivers identify message boundaries, and so on. Decide how the connection will be kept active (if it will be), which side will close first, and so on. Then write the code around the specification.
Do not tightly associate how you store things in memory with how you send things on the wire. These are two very different things with two very different sets of requirements.
Of course, feel free to adjust the protocol specification as you write the code.
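As one illustration of deciding the byte-level format first, here is a minimal sketch of a type-and-length-prefixed frame. The layout is an assumption made up for this example (a 2-byte message type, a 4-byte payload length, both big-endian, then the payload), and `Frame`, `encode` and `decode` are hypothetical names. The receiver accumulates bytes from recv() into a buffer and only extracts a message once a whole frame is present, so no struct is ever cast directly onto the receive buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct Frame {
    std::uint16_t type;
    std::string payload;
};

// Serializes a frame: type (2 bytes), payload length (4 bytes), payload.
std::vector<std::uint8_t> encode(const Frame& f) {
    std::vector<std::uint8_t> out;
    const auto len = static_cast<std::uint32_t>(f.payload.size());
    out.push_back(static_cast<std::uint8_t>(f.type >> 8));
    out.push_back(static_cast<std::uint8_t>(f.type & 0xFF));
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFF));
    out.insert(out.end(), f.payload.begin(), f.payload.end());
    return out;
}

// Tries to decode one frame from the front of buf. On success the frame's
// bytes are removed from buf; if the frame is incomplete, buf is left alone.
std::optional<Frame> decode(std::vector<std::uint8_t>& buf) {
    constexpr std::size_t kHeader = 6;
    if (buf.size() < kHeader) return std::nullopt;

    const auto type = static_cast<std::uint16_t>((buf[0] << 8) | buf[1]);
    std::uint32_t len = 0;
    for (std::size_t i = 2; i < kHeader; ++i) len = (len << 8) | buf[i];

    if (buf.size() - kHeader < len) return std::nullopt;  // wait for more bytes

    Frame f{type, std::string(buf.begin() + kHeader, buf.begin() + kHeader + len)};
    buf.erase(buf.begin(), buf.begin() + kHeader + len);
    return f;
}
```

Decoded Frame objects own their data, so they can be pushed onto the server's queue by value and switched on by type there, without depending on the lifetime of the connection thread's receive buffer.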