asio::async_read_until: robust and graceful way of handling multiple lines - c++

I'm using asio::async_read_until with '\n' delimiter to support a TCP client that fetches character data from a server.
This server continuously sends '\n' terminated lines; precisely, it can write at once either single lines or a concatenated string of multiple lines.
From the doc, I understand that asio::async_read_until could read:
One '\n' terminated line, like "some_data\n". This is the simplest case, handled with a call to std::getline on the stream associated with the asio::streambuf.
One '\n' terminated line plus the beginning of a next line, like "some_data1\nbla". This can be handled with a std::getline; the rest of the second line will be handled at the next completion handler call.
Many lines; in this case, the newly read data could contain 2 or more '\n'. How can I know how many std::getline calls I should do, knowing that I don't want to risk calling std::getline on an incomplete line (which I will eventually get in a future packet)? Should I peek at the stream buffer to check the existence of multiple '\n'? Is it even possible without doing many copies?

From the documentation here:
http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/async_read_until/overload1.html
If the stream buffer already contains a newline, the handler will be invoked without an async_read_some operation being executed on the stream.
For this reason, when your handler executes you must execute no more than one getline(). Once getline has returned and you have finished processing, simply call async_read_until again from the handler.
example:
void handler(const boost::system::error_code& e, std::size_t size)
{
    if (e)
    {
        // handle error here
    }
    else
    {
        std::istream is(&b);
        std::string line;
        std::getline(is, line);
        do_something(line);
        // Start the next read only after this line has been processed
        boost::asio::async_read_until(s, b, '\n', handler);
    }
}
// Call the async read operation
boost::asio::async_read_until(s, b, '\n', handler);

This answer relates to the accepted answer:
I'd highly recommend calling std::getline() in a loop and testing the return value.
while (std::getline(is, line)) {
...
do_something(line);
}
std::getline returns a reference to the istream, which can be implicitly converted to bool, indicating whether the getline operation was really successful.
Why you should do that:
std::getline may fail, e.g. if the input stream has reached its end and no newline is present.
You may have more than one line inside asio's streambuf. If you blindly restart reading after processing just the first line, you may end up exceeding the memory limits of the streambuf (or have an ever-growing streambuf).
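One caveat when looping: std::getline also hands back an unterminated trailing line (in that case it sets eofbit but not failbit, so the stream still converts to true). Below is a minimal sketch of one way to keep such a partial line for the next read. It uses only the standard library: a std::string named pending stands in for the contents of the asio::streambuf, and extract_complete_lines is a hypothetical helper name, not part of Asio.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: pulls every complete '\n'-terminated line out of
// `pending` and leaves an unterminated tail in place for the next read.
std::vector<std::string> extract_complete_lines(std::string& pending) {
    std::vector<std::string> lines;
    std::istringstream is(pending);
    std::string line;
    std::string tail;
    while (std::getline(is, line)) {
        if (is.eof()) {
            // getline reached the end of input before finding '\n':
            // this is a partial line, so keep it instead of processing it.
            tail = line;
            break;
        }
        lines.push_back(line);
    }
    pending = tail;
    return lines;
}
```

With pending holding "some_data1\nsome_data2\nbla", the helper returns the two complete lines and leaves "bla" in pending until the rest of that line arrives.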
Update 2017-08-23:
bytes_transferred actually gives you the number of bytes up to and including the delimiter, i.e. the position in the underlying buffer just past the separator. One can take advantage of that by accessing the streambuf's underlying data directly and creating a string from it.
void client::on_read(const std::error_code &ec, size_t bytes_transferred) {
    if (ec) {
        return handle_error(ec);
    }
    std::string line(
        asio::buffer_cast<const char*>(m_rxbuf.data()),
        bytes_transferred
    );
    // todo: strip off trailing delimiter
    m_rxbuf.consume(bytes_transferred); // don't forget to drain
    handle_command(line); // leave restarting async_read_until to this handler
}
Instead of copying data from the streambuf into the string, you can alternatively create a string_view from it, or replace the underlying streambuf with a std::string and chop off bytes_transferred bytes instead of consuming from the buffer.
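The std::string variant can be sketched with the standard library alone. This is an illustration under assumptions, not Asio code: rxbuf stands in for the receive buffer, bytes_transferred for the count passed to the completion handler, and take_line is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper. `bytes_transferred` is assumed to count the bytes
// up to and including the delimiter, as reported to the read handler.
std::string take_line(std::string& rxbuf, std::size_t bytes_transferred) {
    assert(bytes_transferred >= 1 && bytes_transferred <= rxbuf.size());
    // View the line without its trailing delimiter; nothing is copied yet.
    std::string_view view(rxbuf.data(), bytes_transferred - 1);
    std::string line(view);            // copy only the part that is kept
    rxbuf.erase(0, bytes_transferred); // the equivalent of consume()
    return line;
}
```

If the line is processed before the buffer is modified, the string_view can be handed to the processing code directly and the copy avoided altogether.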
Cheers,
Argonaut6x

Updated: with a somewhat better approach.
IMHO, you are better off using async_read_some directly rather than the read-until operation. This requires fewer operations overall, gives you better control over the buffer handling, and could reduce the number of copies you have to make of the data. You could use the asio::streambuf implementation, but you could also do this using a vector<char>, for example:
vector<char> buffer(2048); // whatever size you want; note: you'll need to grow this somehow if a message is longer
size_t content = 0; // number of bytes currently held in the buffer
// now the read operation
void read() {
    // Read into the free space that follows the data already in the buffer
    socket.async_read_some(boost::asio::buffer(buffer.data() + content, buffer.size() - content),
        [&](const boost::system::error_code& ec, size_t sz) {
            if (ec) return; // some error
            // Total content in the vector
            content += sz;
            auto is = begin(buffer);
            auto ie = next(is, content); // end of the data region
            // Handle all the complete lines.
            for (auto it = find(is, ie, '\n'); it != ie; it = find(is, ie, '\n')) {
                // [is, it) contains the message (excluding '\n')
                handle(is, it);
                // Skip the '\n' and update the start of the next message
                is = next(it);
            }
            // Update the remaining content
            content -= distance(begin(buffer), is);
            // Move the remaining data to the beginning of the buffer
            copy(is, ie, begin(buffer));
            // Set up the next read
            read();
        });
}

Related

boost::asio::async_read_until doesn't truncate the input message

I am trying to truncate an input message when the program reads a specific character. For this I have the following code:
This is the ActiveSocketClientConnection.h
class ActiveSocketClientConnection : public boost::enable_shared_from_this<ActiveSocketClientConnection>{
private:
boost::shared_ptr<tcp::socket> socket_;
boost::asio::streambuf data_;
...
public:
...
};
This is the ActiveSocketClientConnection.cpp
void ActiveSocketClientConnection::handleConnect(const boost::system::error_code& error){
std::string sETX;
sETX.push_back(0x3A); //0x3A = :
boost::asio::async_read_until(
*socket_.get(),
data_,
sETX.c_str(),
boost::bind(&ActiveSocketClientConnection::handleReadBody,
this,
boost::asio::placeholders::error
)
);
}
void ActiveSocketClientConnection::handleReadBody( boost::system::error_code error){
size_t t = data_.size();
unsigned char* output = (unsigned char*)malloc(t);
memcpy(output, boost::asio::buffer_cast<const void*>(data_.data()), t);
data_.consume(t);
...
}
If I send the message AA:A (for example) over a socket connection, async_read_until saves the whole message in data_; it doesn't truncate the message where the character : is present.
Could someone tell me what I am doing wrong?
Thank you.
First, you have undefined behaviour. You call async_read_until with a std::string_view as the delimiter, but this view is created from a std::string that is local to your handler function. async_read_until returns immediately, the local string is destroyed, and you are left with a dangling pointer inside the string view (std::string_view doesn't make a deep copy of the string; it is just a pair: a pointer to the data and its size).
As a solution, just call the overload taking a char:
boost::asio::async_read_until(
*socket_.get(),
data_,
0x3A, // <- added
boost::bind(&ActiveSocketClientConnection::handleReadBody,
this,
boost::asio::placeholders::error
)
);
Official boost reference states:
After a successful async_read_until operation, the dynamic buffer
sequence may contain additional data beyond the delimiter. An
application will typically leave that data in the dynamic buffer
sequence for a subsequent async_read_until operation to examine.
So you have to parse the data, looking for the first occurrence of the delimiter, and extract the proper sub-buffer of the data that was read.
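A minimal sketch of that parsing step, using only the standard library. A std::string named data stands in for the bytes held in data_, and take_until is a hypothetical helper name; with Asio itself, the bytes_transferred argument of the completion handler reports the same position.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: removes and returns everything up to and including
// the first `delim`. Returns std::nullopt (and leaves `data` untouched)
// when no delimiter has arrived yet.
std::optional<std::string> take_until(std::string& data, char delim) {
    const std::size_t pos = data.find(delim);
    if (pos == std::string::npos) return std::nullopt;
    std::string message = data.substr(0, pos + 1);
    data.erase(0, pos + 1); // keep the bytes after the delimiter for later
    return message;
}
```

For the input AA:A from the question, the helper yields "AA:" and leaves "A" in the buffer for the next read.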

How can we sequentially receive multiple data from boost::asio::tcp::ip::read_some calls?

Let us suppose that a client holds two different big objects (in terms of byte size), serializes them, and sends the serialized objects to a server over a TCP/IP network connection using boost::asio.
For client side implementation, I'm using boost::asio::write to send binary data (const char*) to the server.
For the server side implementation, I'm using read_some rather than boost::asio::ip::tcp::iostream, to leave room for future efficiency improvements. I built the following recv function on the server side. Its second parameter, std::stringstream &is, holds the large received data (>65536 bytes) when the function returns.
When the client side calls two sequential boost::asio::write in order to send two different binary objects separately, the server side sequentially calls two corresponding recv as well.
However, the first recv function absorbs all of two incoming big data while the second call receives nothing ;-(.
I am not sure why this happens and how to solve it.
Since each of the two objects has its own (de)serialization function, I'd like to send each one separately. In fact, there are more than 20 objects (not just 2) that have to be sent over the network.
void recv (
boost::asio::ip::tcp::socket &socket,
std::stringstream &is) {
boost::array<char, 65536> buf;
for (;;) {
boost::system::error_code error;
size_t len = socket.read_some(boost::asio::buffer(buf), error);
std::cout << " read "<< len << " bytes" << std::endl; // called multiple times for debugging!
if (error == boost::asio::error::eof)
break;
else if (error)
throw boost::system::system_error(error); // Some other error.
std::stringstream buf_ss;
buf_ss.write(buf.data(), len);
is << buf_ss.str();
}
}
Client main file:
int main () {
... // some 2 different big objects are constructed.
std::stringstream ss1, ss2;
... // serializing bigObj1 -> ss1 and bigObj2-> ss2, where each object is serialized into a string. This is due to the dependency of our using some external library
const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());
boost::system::error_code ignored_error;
boost::asio::write(socket, boost::asio::buffer(big_obj_bin1, ss1.str().size()), ignored_error);
boost::asio::write(socket, boost::asio::buffer(big_obj_bin2, ss2.str().size()), ignored_error);
... // do something
return 0;
}
Server main file:
int main () {
... // socket is generated. (communication established)
std::stringstream ss1, ss2;
recv(socket,ss1); // this guy absorbs all of incoming data
recv(socket,ss2); // this guy receives 0 bytes ;-(
... // deserialization to two big objects
return 0;
}
recv(socket,ss1); // this guy absorbs all of incoming data
Of course it absorbs everything. You explicitly coded recv to do an infinite loop until eof. That's the end of the stream, which means "whenever the socket is closed on the remote end".
So the essential thing missing from the protocol is framing. The most common ways to address it are:
sending data length before data, this way the server knows how much to read
sending a "special sequence" to delimit frames. In text, a common special delimiter would be '\0'. However, for binary data it is (very) hard to arrive at a delimiter that cannot naturally occur in the payload.
Of course, if you know extra characteristics of your payload you can use that. E.g. if your payload is compressed, you know you won't regularly find a block of 512 identical bytes (they would have been compressed). Alternatively you resort to encoding the binary data in ways that removes the ambiguity. yEnc, Base122 et al. come to mind (see Binary Data in JSON String. Something better than Base64 for inspiration).
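The length-prefix option can be sketched as follows, with only the standard library. The wire format (a 4-byte big-endian length followed by the payload) and the names encode_frame and decode_frame are assumptions chosen for illustration; a std::string stands in for the bytes received from the socket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Assumed wire format: 4-byte big-endian payload length, then the payload.
std::string encode_frame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame.reserve(4 + payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        frame.push_back(static_cast<char>((n >> shift) & 0xFF));
    frame += payload;
    return frame;
}

// Removes one complete frame from the front of `stream` and returns its
// payload. Returns std::nullopt (leaving `stream` untouched) while the
// header or the payload has not fully arrived.
std::optional<std::string> decode_frame(std::string& stream) {
    if (stream.size() < 4) return std::nullopt;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(stream[i]);
    if (stream.size() - 4 < n) return std::nullopt;
    std::string payload = stream.substr(4, n);
    stream.erase(0, 4 + static_cast<std::size_t>(n));
    return payload;
}
```

Because the receiver knows each payload's size up front, two objects written back to back can be separated again even when they arrive in a single read, and the payload may contain any byte value.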
Notes:
Regardless of that, it's clumsy to handwrite the reading loop. It is also unnecessary to do that and then copy the blocks into a stringstream anyway. If you're doing all that copying anyway, just use boost::asio::[async_]read with boost::asio::streambuf directly.
This is clear UB:
const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());
str() returns a temporary copy of the buffer - which not only is wasteful, but means that the const char* are dangling the moment they have been initialized.
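A minimal sketch of one way to avoid the dangling pointer: keep the string returned by str() in a named variable for as long as its bytes are in use. The fake_write function is a hypothetical stand-in for boost::asio::write so that the example needs only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for boost::asio::write: appends `size` bytes
// starting at `data` to `wire`.
void fake_write(std::string& wire, const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    wire.append(data, size);
}

void send_object(std::string& wire, const std::stringstream& ss) {
    // Keep the copy returned by str() alive in a named variable, so the
    // pointer and the size both refer to the same, still-living string.
    const std::string payload = ss.str();
    fake_write(wire, payload.data(), payload.size());
}
```

This also calls str() once per object instead of twice, which removes one of the wasteful copies.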

How to stop boost::asio async reads from getting mixed up?

I am using boost::asio::ip::tcp::socket to receive data. I need an interface which allows me to specify a buffer and call a completion handler once this buffer is filled asynchronously.
When reading from sockets, we can use the async_read_some method.
However, the async_read_some method may read less than the requested number of bytes, so it must call itself with the rest of the buffer if this happens. Here is my current approach:
template<typename CompletionHandler>
void read(boost::asio::ip::tcp::socket* sock, char* target, size_t size, CompletionHandler completionHandler){
struct ReadHandler {
boost::asio::ip::tcp::socket* sock;
char* target;
size_t size;
CompletionHandler completionHandler;
ReadHandler(ip::tcp::socket* sock, char* target, size_t size, CompletionHandler completionHandler)
: sock(sock),target(target),size(size),completionHandler(completionHandler){}
// either request the remaining bytes or call the completion handler
void operator()(const boost::system::error_code& error, std::size_t bytes_transferred){
if(error){
return;
}
if(bytes_transferred < size){
// Read incomplete
char* newTarg =target+bytes_transferred;
size_t newSize = size-bytes_transferred;
sock->async_read_some(boost::asio::buffer(newTarg, newSize), ReadHandler(sock,newTarg,newSize,completionHandler));
return;
} else {
// Read complete, call handler
completionHandler();
}
}
};
// start first read
sock->async_read_some(boost::asio::buffer(target, size), ReadHandler(sock,target,size,completionHandler));
}
So basically, we call async_read_some until the whole buffer is filled, then we call the completion handler. So far so good. However, I think that things get mixed up once I call this method more than once before the first call finishes a receive:
void thisMayFail(boost::asio::ip::tcp::socket* sock){
char* buffer1 = new char[128];
char* buffer2 = new char[128];
read(sock, buffer1, 128,[](){std::cout << "Buffer 1 filled";});
read(sock, buffer2, 128,[](){std::cout << "Buffer 2 filled";});
}
of course, the first 128 received bytes should go into the first buffer and the second 128 should go into the second. But in my understanding, it may be the case that this does not happen here:
Suppose the first async_read_some returns only 70 bytes; it would then issue a second async_read_some for the remaining 58 bytes. However, this read will be queued behind the second 128-byte read(!), so the first buffer will receive the first 70 bytes, the next 128 will go into the second buffer, and the final 58 will go into the first. I.e., in this case the second buffer would even be filled before the first is filled completely. This must not happen.
How to solve this? I know there is the async_read method, but its documentation says it is simply implemented by calling async_read_some multiple times, so it is basically the same as my read implementation and will not fix the problem.
You simply can't have two asynchronous read operations active at the same time: that's undefined behaviour.
You can
use the free functions async_read_until or async_read, which already have the higher-level semantics and loop calling the socket's async_read_some until a condition is matched or the buffer is full.
use asynchronous operation chaining to sequence the next async read after the first. In short, you initiate the second boost::asio::async_read* call in the completion handler of the first.
Note:
chaining also gives you the opportunity to act on transport errors first.
together, the free function interface will both raise the abstraction level of the code and solve the problem (the problem was initiating two simultaneous read operations).
use a strand in case you run multiple IO service threads; see Why do I need strand per connection when using boost::asio?
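The chaining idea can be illustrated without Asio. The sketch below is a simulation under stated assumptions: FakeStream is a hypothetical stand-in for a socket that hands out short reads and calls its handler synchronously (a real socket completes later, from the io_service), and read_exact plays the role that the free function async_read plays in Asio.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical stand-in for a socket: hands out queued chunks, possibly
// fewer bytes than requested, and calls the handler right away.
struct FakeStream {
    std::deque<std::string> chunks;

    void async_read_some(char* target, std::size_t size,
                         const std::function<void(std::size_t)>& handler) {
        assert(!chunks.empty());
        std::string& front = chunks.front();
        const std::size_t n = std::min(size, front.size());
        std::copy_n(front.data(), n, target);
        front.erase(0, n);
        if (front.empty()) chunks.pop_front();
        handler(n); // `front` is not used past this point
    }
};

// Fills target[0, size) completely, then calls on_done. Each follow-up
// read is started from the handler of the previous one.
void read_exact(FakeStream& s, char* target, std::size_t size,
                std::function<void()> on_done) {
    if (size == 0) { on_done(); return; }
    s.async_read_some(target, size,
        [&s, target, size, on_done](std::size_t n) {
            read_exact(s, target + n, size - n, on_done);
        });
}
```

Chaining then means starting the read into the second buffer from the first read's on_done callback, for example read_exact(s, buffer1, 128, [&] { read_exact(s, buffer2, 128, done); }), so that only one read is outstanding at any time and a 70/128/58 chunk split still fills buffer1 before buffer2.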

boost read_some function lost data

I'm implementing a tcp server with boost asio library.
In the server, I use asio::async_read_some to get data and asio::write to write data. The server code is something like this:
std::array<char, kBufferSize> buffer_;
std::string ProcessMessage(const std::string& s) {
if (s == "msg1") return "resp1";
if (s == "msg2") return "resp2";
return "";
}
void HandleRead(const boost::system::error_code& ec, size_t size) {
std::string message(buffer_.data(), size);
std::string resp = ProcessMessage(message);
if (!resp.empty()) {
asio::write(socket, boost::asio::buffer(message), WriteCallback);
}
socket.async_read_some(boost::asio::buffer(buffer_));
}
Then I wrote a client to test the server; the code is something like this:
void MessageCallback(const boost::system::error_code& ec, size_t size) {
std::cout << string(buffer_.data(), size) << std::endl;
}
//Init socket
asio::write(socket, boost::asio::buffer("msg1"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
asio::write(socket, boost::asio::buffer("msg1"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
If I run the client, the code waits at the second read_some, and the output is: resp1.
If I remove the first read_some, the output is resp1resp2, which means the server did the right thing.
It seems the first read_some eats the second response but doesn't pass it to the MessageCallback function.
I've read the question What is a message boundary?. I think that if this were a "message boundary" problem, the second read_some should print something, as the first read_some would only get part of the stream from the TCP socket.
How can I solve this problem?
UPDATE:
I've tried changing the size of the client buffer to 4; the output is then:
resp
resp
It seems the read_some function does a little more than read from the socket. I'll read the Boost code to find out whether that is true.
The async_read_some() member function is very likely not doing what you intend. Pay special attention to the Remarks section of the documentation:
The read operation may not read all of the requested number of bytes.
Consider using the async_read function if you need to ensure that the
requested amount of data is read before the asynchronous operation
completes.
Note that the async_read() free function does offer the guarantee that you are looking for:
This operation is implemented in terms of zero or more calls to the
stream's async_read_some function, and is known as a composed
operation. The program must ensure that the stream performs no other
read operations (such as async_read, the stream's async_read_some
function, or any other composed operations that perform reads) until
this operation completes.

boost::asio::async_read_until with regex and timeout - strange behaviour

Based on code from a Boost example (http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/example/timeouts/blocking_tcp_client.cpp)
I made a function read_expect() as follows:
std::string TcpClient::read_expect(const boost::regex & expected,
boost::posix_time::time_duration timeout)
{
// Set a deadline for the asynchronous operation. Since this function uses
// a composed operation (async_read_until), the deadline applies to the
// entire operation, rather than individual reads from the socket.
deadline_.expires_from_now(timeout);
// Set up the variable that receives the result of the asynchronous
// operation. The error code is set to would_block to signal that the
// operation is incomplete. Asio guarantees that its asynchronous
// operations will never fail with would_block, so any other value in
// ec indicates completion.
boost::system::error_code ec = boost::asio::error::would_block;
// Start the asynchronous operation itself. The boost::lambda function
// object is used as a callback and will update the ec variable when the
// operation completes. The blocking_udp_client.cpp example shows how you
// can use boost::bind rather than boost::lambda.
boost::asio::async_read_until(socket_, input_buffer_, expected, var(ec) = _1);
// Block until the asynchronous operation has completed.
do
{
io_service_.run_one();
}
while (ec == boost::asio::error::would_block);
if (ec)
{
throw boost::system::system_error(ec);
}
// take the whole response
std::string result;
std::string line;
std::istream is(&input_buffer_);
while (is)
{
std::getline(is, line);
result += line;
}
return result;
}
This works fine when I use it with a regular expression
boost::regex(".*#ss#([^#]+)#.*")
, but I get strange behaviour when I change the regex to
boost::regex(".*#ss#(<Stats>[^#]+</Stats>)#.*")
, resulting in no timeout, causing the thread to hang on the async_read_until() call.
Maybe this is some silly error I cannot see in the regex, and I could use the first version, but I would really like to know why this is happening.
Thanks for any insight,
Marleen