boost::asio::async_read_until with regex and timeout - strange behaviour


Based on code from a Boost example (http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/example/timeouts/blocking_tcp_client.cpp)
I made a function read_expect() as follows:
std::string TcpClient::read_expect(const boost::regex & expected,
boost::posix_time::time_duration timeout)
{
// Set a deadline for the asynchronous operation. Since this function uses
// a composed operation (async_read_until), the deadline applies to the
// entire operation, rather than individual reads from the socket.
deadline_.expires_from_now(timeout);
// Set up the variable that receives the result of the asynchronous
// operation. The error code is set to would_block to signal that the
// operation is incomplete. Asio guarantees that its asynchronous
// operations will never fail with would_block, so any other value in
// ec indicates completion.
boost::system::error_code ec = boost::asio::error::would_block;
// Start the asynchronous operation itself. The boost::lambda function
// object is used as a callback and will update the ec variable when the
// operation completes. The blocking_udp_client.cpp example shows how you
// can use boost::bind rather than boost::lambda.
boost::asio::async_read_until(socket_, input_buffer_, expected, var(ec) = _1);
// Block until the asynchronous operation has completed.
do
{
io_service_.run_one();
}
while (ec == boost::asio::error::would_block);
if (ec)
{
throw boost::system::system_error(ec);
}
// take the whole response
std::string result;
std::string line;
std::istream is(&input_buffer_);
while (is)
{
std::getline(is, line);
result += line;
}
return result;
}
This works fine when I use it with the regular expression boost::regex(".*#ss#([^#]+)#.*"), but I get strange behaviour when I change the regex to boost::regex(".*#ss#(<Stats>[^#]+</Stats>)#.*"): no timeout occurs, and the thread hangs on the async_read_until() call.
Maybe this is some silly error I cannot see in the regex, and I could use the first version, but I would really like to know why this is happening.
Thanks for any insight,
Marleen

Related

Boost asio:async_read() using boost::asio::use_future

When calling asio::async_read() using a future, is there a way to get the number of bytes transferred when a boost::asio::error::eof error occurs? It would seem that there are many cases when one would want to get the data transferred even if the peer disconnects.
For example:
namespace ba = boost::asio;
int32_t Session::read (unsigned char* pBuffer, uint32_t bufferSizeToRead)
{
// Create a mutable buffer
ba::mutable_buffer buffer (pBuffer, bufferSizeToRead);
int32_t result = 0;
// We do an async call using a future. A thread from the io_context pool does the
// actual read while the thread calling this method blocks on
// std::future::get().
std::future<std::size_t> future =
ba::async_read(m_socket, buffer, ba::bind_executor(m_sessionStrand, ba::use_future));
try
{
// We block the calling thread here until we get the result of async_read()...
result = future.get();
}
catch (boost::system::system_error &ex) // boost::system::system_error
{
auto exitCode = ex.code().value();
if ( exitCode == ba::error::eof )
{
log ("Connection closed by the peer");
}
}
return result; // This is zero if eof occurs
}
The code sample above represents our issue. It was designed to support a 3rd-party library. The library expects a blocking call. The new code under development is using ASIO with a minimal number of network threads. The expectation is that this 3rd party library calls session::read using its dedicated thread and we adapt the call to an asynchronous call. The network call must be async since we are supporting many such calls from different libraries with minimal threads.
What was unexpected and discovered late is that ASIO treats a connection closed as an error. Without the future, using a handler we could get the bytes transferred up to the point where the disconnect occurred. However, using a future, the exception is thrown and the bytes transferred becomes unknown.
void handler (const boost::system::error_code& ec,
std::size_t bytesTransferred );
Is there a way to do the above with a future and also get the bytes transferred?
Or is there an alternative approach where we can provide the library with a blocking call while still using asio::async_read or similar?
Our expectation is that we could get the bytes transferred even if the client closed the connection. We're puzzled that when using a future this does not seem possible.

It's an implementation limitation of futures.
Modern async_result<> specializations (that use the initiate member approach) can be used together with as_tuple, e.g.:
ba::awaitable<std::tuple<boost::system::error_code, size_t>> a =
ba::async_read(m_socket, buffer, ba::as_tuple(ba::use_awaitable));
Or, more typical:
auto [ec, n] = co_await async_read(m_socket, buffer, ba::as_tuple(ba::use_awaitable));
However, the corresponding:
auto future = ba::async_read(m_socket, buffer, ba::as_tuple(ba::use_future));
isn't currently supported. It arguably could, but you'd have to create your own completion token, or ask Asio devs to add support to use_future: https://github.com/chriskohlhoff/asio/issues
Side-note: if you construct m_socket from the m_sessionStrand executor, you do not need bind_executor to bind to the strand:
using Executor = net::io_context::executor_type;
struct Session {
int32_t read(unsigned char* pBuffer, uint32_t bufferSizeToRead);
net::io_context m_ioc;
net::strand<Executor> m_sessionStrand{m_ioc.get_executor()};
tcp::socket m_socket{m_sessionStrand};
};
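One way to keep a blocking interface while still receiving both the error code and the byte count is to pass an ordinary completion handler that fulfils a std::promise, rather than relying on use_future. The sketch below shows only that bridging pattern with the standard library. fake_async_read is a hypothetical stand-in for ba::async_read that completes on another thread with a partial transfer plus an error, and ReadResult and blocking_read are illustrative names, not Asio API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <system_error>
#include <thread>
#include <utility>

// Carries both values that a completion handler receives.
struct ReadResult {
    std::error_code ec;
    std::size_t bytes_transferred;
};

using Handler = std::function<void(std::error_code, std::size_t)>;

// Hypothetical stand-in for an asynchronous read: it completes on another
// thread with a partial transfer and an error, as a peer disconnect would.
inline std::thread fake_async_read(std::size_t partial_bytes, Handler handler) {
    assert(handler && "a completion handler is required");
    return std::thread([partial_bytes, handler = std::move(handler)] {
        handler(std::make_error_code(std::errc::connection_reset), partial_bytes);
    });
}

// Blocks the caller until the handler has run, then returns both values.
inline ReadResult blocking_read(std::size_t partial_bytes) {
    std::promise<ReadResult> promise;
    std::future<ReadResult> future = promise.get_future();

    std::thread worker = fake_async_read(
        partial_bytes,
        [&promise](std::error_code ec, std::size_t n) {
            // set_value (not set_exception) keeps the byte count available.
            promise.set_value(ReadResult{ec, n});
        });

    ReadResult result = future.get();  // blocks, like future.get() in the question
    worker.join();                     // promise must outlive the worker thread
    return result;
}
```

With real Asio code the same lambda would be passed as the handler of ba::async_read, and the caller would inspect result.ec and result.bytes_transferred instead of catching an exception.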

Sending a large text via Boost ASIO

I am trying to send a very large string to one of my clients. I am mostly following code in HTTP server example: https://www.boost.org/doc/libs/1_78_0/doc/html/boost_asio/examples/cpp11_examples.html
Write callbacks return with error code 14, which probably means EFAULT, "bad address", according to this link:
https://mariadb.com/kb/en/operating-system-error-codes/
Note that I could not use message() member function of error_code to read error message, that was causing segmentation fault. (I am using Boost 1.53, and the error might be due to this: https://github.com/boostorg/system/issues/50)
When I try to send small strings, let's say of size 10 for example, write callback does not return with an error.
Here is how I am using async_write:
void Connection::do_write(const std::string& write_buffer)
{
auto self(shared_from_this());
boost::asio::async_write(socket_, boost::asio::buffer(write_buffer, write_buffer.size()),
[this, self, write_buffer](boost::system::error_code ec, std::size_t transfer_size)
{
if (!ec)
{
} else {
// code enters here **when** I am sending a large text.
// transfer_size always prints 65535
}
});
}
Here is how I am using async_read_some:
void Connection::do_read()
{
auto self(shared_from_this());
socket_.async_read_some(boost::asio::buffer(buffer_),
[this, self](boost::system::error_code ec, std::size_t bytes_transferred)
{
if (!ec)
{
do_write(VERY_LARGE_STRING);
do_read();
} else if (ec != boost::asio::error::operation_aborted) {
connection_manager_.stop(shared_from_this());
}
});
}
What could be causing write callback to return with error with large string?

The segfault indicates likely Undefined Behaviour to me.
Of course there's too little code to tell, but one strong smell is that you're using a reference to a non-member as the buffer:
boost::asio::buffer(write_buffer, write_buffer.size())
Besides the fact that this could simply be spelled boost::asio::buffer(write_buffer), there's not much hope that write_buffer stays around for the duration of the asynchronous operation that depends on it.
As the documentation states:
Although the buffers object may be copied as necessary, ownership of the underlying memory blocks is retained by the caller, which must guarantee that they remain valid until the handler is called.
I would check that you're doing that correctly.
Another potential cause for UB is when you cause overlapping writes on the same socket/stream object:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
If you have checked both of these concerns and something is still wrong, please post a new question including a fully self-contained example (SSCCE or MCVE).
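Both concerns are commonly addressed together with an outbound queue owned by the connection: each message is copied into the queue so its storage outlives the write, and only the front element is ever in flight. The following is a sketch of that pattern using only the standard library. WriteQueue and the start_write callback are hypothetical names; in real code start_write would call boost::asio::async_write on the front element and call on_write_complete from its handler.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Serialises writes: owns every pending message and keeps at most one
// write operation outstanding at a time.
class WriteQueue {
public:
    // start_write is a hypothetical hook that begins one asynchronous write
    // of the given string; the string stays alive until on_write_complete().
    explicit WriteQueue(std::function<void(const std::string&)> start_write)
        : start_write_(std::move(start_write)) {}

    // Takes ownership of the message; starts writing only if idle.
    void send(std::string message) {
        const bool idle = pending_.empty();
        pending_.push_back(std::move(message));
        if (idle) start_write_(pending_.front());
    }

    // To be called from the write completion handler.
    void on_write_complete() {
        assert(!pending_.empty() && "completion without a write in flight");
        pending_.pop_front();
        if (!pending_.empty()) start_write_(pending_.front());
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::function<void(const std::string&)> start_write_;
    std::deque<std::string> pending_;  // push_back keeps front() valid
};
```

A std::deque is used because appending at the back does not invalidate references to existing elements, so the string handed to start_write stays valid while later messages are queued.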

asio::async_read_until: robust and graceful way of handling multiple lines

I'm using asio::async_read_until with '\n' delimiter to support a TCP client that fetches character data from a server.
This server continuously sends '\n' terminated lines; precisely, it can write at once either single lines or a concatenated string of multiple lines.
From the doc, I understand that asio::async_read_until could read:
One '\n' terminated line, like "some_data\n". This is the simplest case, handled with a call to std::getline on the stream associated with the asio::streambuf
One '\n' terminated line plus the beginning of a next line, like "some_data1\nbla". This can be handled with a std::getline; the rest of the second line will be handled at the next completion handler call.
Many lines; in this case, the newly read data could contain 2 or more '\n'. How can I know how many std::getline calls I should do, knowing that I don't want to risk calling std::getline on an incomplete line (which I will eventually get in a future packet)? Should I peek at the stream buffer to check the existence of multiple '\n'? Is it even possible without doing many copies?

from the documentation here:
http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/async_read_until/overload1.html
If the stream buffer already contains a newline, the handler will be invoked without an async_read_some operation being executed on the stream.
For this reason, when your handler executes you must execute no more than one getline(). Once getline has returned and you have finished processing, simply call async_read_until again from the handler.
example:
void handler(const boost::system::error_code& e, std::size_t size)
{
if (e)
{
// handle error here
}
else
{
std::istream is(&b);
std::string line;
std::getline(is, line);
do_something(line);
boost::asio::async_read_until(s, b, '\n', handler);
}
}
// Call the async read operation
boost::asio::async_read_until(s, b, '\n', handler);

This answer relates to the accepted answer:
I'd highly recommend calling std::getline() in a loop and testing the return value.
while (std::getline(is, line)) {
...
do_something(line);
}
std::getline returns a reference to the istream, which can be implicitly converted to bool, indicating whether the getline operation actually succeeded.
Why one should do that:
std::getline may fail, e.g. if the input stream has reached its end and no newline is present
you may have more than one line inside asio's streambuf. If you blindly restart reading after processing just the first line, you may end up exceeding memory limits on the streambuf (or have an ever-growing streambuf).
Update 2017-08-23:
bytes_transferred actually gives you the position in the underlying buffer where the separator has been found. One can take advantage of that by simply casting the streambuf's data and creating a string from it.
void client::on_read(const std::error_code &ec, size_t bytes_transferred) {
if (ec) {
return handle_error(ec);
}
std::string line(
asio::buffer_cast<const char*>(m_rxbuf.data()),
bytes_transferred
);
// todo: strip off trailing delimiter
m_rxbuf.consume(bytes_transferred); // don't forget to drain
handle_command(line); // leave restarting async_read_until to this handler
}
Instead of copying data from the streambuf into the string, you can alternatively create a string_view from it, or replace the underlying streambuf with a std::string and chop off the first bytes_transferred bytes instead of consuming from the buffer.
Cheers,
Argonaut6x
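The last suggestion, using a std::string as the buffer and chopping off the first bytes_transferred bytes, can be sketched without Asio. take_line below is a hypothetical helper. It assumes that bytes_transferred counts up to and including the delimiter, which is how async_read_until reports it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes the first bytes_transferred bytes (one delimited line) from buffer
// and returns them without the trailing '\n'. Later bytes stay buffered.
inline std::string take_line(std::string& buffer, std::size_t bytes_transferred) {
    assert(bytes_transferred <= buffer.size());
    std::string line = buffer.substr(0, bytes_transferred);
    buffer.erase(0, bytes_transferred);
    if (!line.empty() && line.back() == '\n') line.pop_back();  // strip delimiter
    return line;
}
```

Because only the reported prefix is removed, an incomplete line at the end of the buffer is left untouched until a later read completes it.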

Updated: with a somewhat better approach.
IMHO, you are better off using async_read_some directly rather than the read until operation. This requires fewer operations overall, gives you better control over the buffer handling, and could reduce the number of copies you have to make of the data. You could use the asio::streambuf implementation, but you could also do this using a vector<char>, for example:
vector<char> buffer(2048); // whatever size you want, note: you'll need to somehow grow this if message length is greater...
size_t content = 0; // current content
// now the read operation;
void read() {
// This will cause asio to append from the last location
socket.async_read_some(boost::asio::buffer(buffer.data() + content, buffer.size() - content), [&](const boost::system::error_code& ec, std::size_t sz) {
if (ec) return; // some error
// Total content in the vector
content += sz;
auto is = begin(buffer);
auto ie = next(is, content); // end of the data region
// handle all the complete lines.
for (auto it = find(is, ie, '\n'); it != ie; it = find(is, ie, '\n')) {
// is -> it contains the message (excluding '\n')
handle(is, it);
// Skip the '\n'
it = next(it);
// Update the start of the next message
is = it;
}
// Update the remaining content
content -= distance(begin(buffer), is);
// Move the remaining data to the beginning of the buffer
copy(is, ie, begin(buffer));
// Setup the next read
read();
});
}
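The buffer bookkeeping in that handler can be tried in isolation from the socket. The sketch below is a standalone variant under that assumption; extract_lines is a hypothetical name. It returns every complete line, keeps an incomplete tail for the next read, and skips the final copy when nothing was consumed, since std::copy must not be given a destination inside its own source range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Returns every complete '\n'-terminated line found in the first `content`
// bytes of `buffer` (without the '\n'), moves any incomplete tail to the
// front of the buffer, and updates `content` to the size of that tail.
inline std::vector<std::string> extract_lines(std::vector<char>& buffer,
                                              std::size_t& content) {
    assert(content <= buffer.size());
    std::vector<std::string> lines;
    auto first = buffer.begin();
    const auto last = first + static_cast<std::ptrdiff_t>(content);

    for (auto nl = std::find(first, last, '\n'); nl != last;
         nl = std::find(first, last, '\n')) {
        lines.emplace_back(first, nl);  // bytes before the '\n'
        first = std::next(nl);          // skip the '\n'
    }

    // Keep the partial line for the next read.
    content = static_cast<std::size_t>(std::distance(first, last));
    if (first != buffer.begin()) {
        std::copy(first, last, buffer.begin());
    }
    return lines;
}
```

In the read handler this would be called after adding the newly received byte count to content, and the next async_read_some would then append at buffer.data() + content.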

boost read_some function lost data

I'm implementing a tcp server with boost asio library.
In the server, I use asio::async_read_some to get data, and use asio::write to write data. The server code is something like this:
std::array<char, kBufferSize> buffer_;
std::string ProcessMessage(const std::string& s) {
if (s == "msg1") return "resp1";
if (s == "msg2") return "resp2";
return "";
}
void HandleRead(const boost::system::error_code& ec, size_t size) {
std::string message(buffer_.data(), size);
std::string resp = ProcessMessage(message);
if (!resp.empty()) {
asio::write(socket, boost::asio::buffer(message), WriteCallback);
}
socket.async_read_some(boost::asio::buffer(buffer_));
}
Then I wrote a client to test the server; the code is something like this:
void MessageCallback(const boost::system::error_code& ec, size_t size) {
std::cout << string(buffer_.data(), size) << std::endl;
}
//Init socket
asio::write(socket, boost::asio::buffer("msg1"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
asio::write(socket, boost::asio::buffer("msg2"));
socket.read_some(boost::asio::buffer(buffer_), MessageCallback);
// Or async_read
//socket.async_read_some(boost::asio::buffer(buffer_), MessageCallback);
If I run the client, it blocks at the second read_some, and the output is: resp1.
If I remove the first read_some, the output is resp1resp2, which means the server did the right thing.
It seems the first read_some consumes the second response but does not pass it to the MessageCallback function.
I've read the question "What is a message boundary?". If this were a message-boundary problem, I would expect the second read_some to print something, since the first read_some only gets part of the stream from the TCP socket.
How can I solve this problem?
UPDATE:
I've tried changing the size of the client buffer to 4; the output is then:
resp
resp
It seems the read_some function does a little more than read from the socket. I'll read the Boost code to find out whether that is true.

The async_read_some() member function is very likely not doing what you intend. Pay special attention to the Remarks section of the documentation:
The read operation may not read all of the requested number of bytes.
Consider using the async_read function if you need to ensure that the
requested amount of data is read before the asynchronous operation
completes.
Note that the async_read() free function does offer the guarantee that you are looking for:
This operation is implemented in terms of zero or more calls to the
stream's async_read_some function, and is known as a composed
operation. The program must ensure that the stream performs no other
read operations (such as async_read, the stream's async_read_some
function, or any other composed operations that perform reads) until
this operation completes.
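The difference between the two calls can be illustrated with a loop: a single read_some-style call may return fewer bytes than requested, and a composed read repeats it until the requested amount has arrived. The sketch below is not Asio code. ShortReadSource is a hypothetical stand-in for a socket that delivers at most three bytes per call, and read_exact only mimics what a composed read does conceptually.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stream: each read_some call hands out at most
// max_chunk bytes, like a socket returning whatever has arrived so far.
struct ShortReadSource {
    std::string data;
    std::size_t pos = 0;
    std::size_t max_chunk = 3;

    std::size_t read_some(char* out, std::size_t n) {
        const std::size_t count = std::min({n, max_chunk, data.size() - pos});
        std::copy_n(data.data() + pos, count, out);
        pos += count;
        return count;  // 0 means no more data (end of stream)
    }
};

// Calls read_some repeatedly until exactly n bytes were read or the stream
// ends; returns the number of bytes actually read.
inline std::size_t read_exact(ShortReadSource& src, std::string& out, std::size_t n) {
    assert(n <= out.size());
    std::size_t total = 0;
    while (total < n) {
        const std::size_t got = src.read_some(&out[total], n - total);
        if (got == 0) break;  // end of stream before n bytes arrived
        total += got;
    }
    return total;
}
```

Reading a fixed number of bytes per message only works if both sides agree on the message size; otherwise a delimiter or a length prefix is needed to find the boundary.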

Chaining asynchronous Lambdas with Boost.Asio?

I find myself writing code that basically looks like this:
using boost::system::error_code;
socket.async_connect(endpoint, [&](error_code Error)
{
if (Error)
{
print_error(Error);
return;
}
// Read header
boost::asio::async_read(socket, somebuffer, [&](error_code Error, std::size_t N)
{
if (Error)
{
print_error(Error);
return;
}
// Read actual data
boost::asio::async_read(socket, somebuffer, [&](error_code Error, std::size_t N)
{
// Same here...
});
});
});
So basically I'm nesting callbacks in callbacks in callbacks, while the logic is simple and "linear".
Is there a more elegant way of writing this, so that the code is both local and in-order?

One elegant solution is to use coroutines. Boost.Asio supports both stackless coroutines, which introduce a small set of pseudo-keywords, and stackful coroutines, which use Boost.Coroutine.
Stackless Coroutines
Stackless coroutines introduce a set of pseudo-keywords, implemented as preprocessor macros, that build a switch statement using a technique similar to Duff's Device. The documentation covers each of the keywords in detail.
The original problem (connect->read header->read body) might look something like the following when implemented with stackless coroutines:
struct session
: boost::asio::coroutine
{
boost::asio::ip::tcp::socket socket_;
std::vector<char> buffer_;
// ...
void operator()(boost::system::error_code ec = boost::system::error_code(),
std::size_t length = 0)
{
// In this example we keep the error handling code in one place by
// hoisting it outside the coroutine. An alternative approach would be to
// check the value of ec after each yield for an asynchronous operation.
if (ec)
{
print_error(ec);
return;
}
// On reentering a coroutine, control jumps to the location of the last
// yield or fork. The argument to the "reenter" pseudo-keyword can be a
// pointer or reference to an object of type coroutine.
reenter (this)
{
// Asynchronously connect. When control resumes at the following line,
// the error and length parameters reflect the result of
// the asynchronous operation.
yield socket_.async_connect(endpoint_, *this);
// Loop until an error or shutdown occurs.
while (!shutdown_)
{
// Read header data. When control resumes at the following line,
// the error and length parameters reflect the result of
// the asynchronous operation.
buffer_.resize(fixed_header_size);
yield boost::asio::async_read(socket_, boost::asio::buffer(buffer_), *this);
// Received data. Extract the size of the body from the header.
std::size_t body_size = parse_header(buffer_, length);
// If there is no body size, then leave coroutine, as an invalid
// header was received.
if (!body_size) return;
// Read body data. When control resumes at the following line,
// the error and length parameters reflect the result of
// the asynchronous operation.
buffer_.resize(body_size);
yield boost::asio::async_read(socket_, boost::asio::buffer(buffer_), *this);
// Invoke the user callback to handle the body.
body_handler_(buffer_, length);
}
// Initiate graceful connection closure.
socket_.shutdown(tcp::socket::shutdown_both, ec);
} // end reenter
}
};
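The switch-based mechanism behind the pseudo-keywords can be shown without Asio or macros. The following is a simplified illustration, not how boost::asio::coroutine is actually written. Session is a hypothetical resumable object that stores its resume point in an integer, and each call to resume() jumps back into the function through a case label, including one placed in the middle of a loop, which is the part that resembles Duff's Device.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Session {
    int state = 0;                 // resume point; 0 means "not started"
    int round = 0;                 // loop counter; a member so it survives calls
    std::vector<std::string> log;  // records the order of execution

    // Runs to the next yield point. Returns true while more work remains.
    bool resume() {
        switch (state) {
        case 0:
            log.push_back("connect");
            state = 1; return true;        // yield
        case 1:
            for (round = 0; round < 2; ++round) {
                log.push_back("read header");
                state = 2; return true;    // yield inside the loop
        case 2:                            // re-entry lands mid-loop
                log.push_back("read body");
                state = 3; return true;    // yield inside the loop
        case 3:;                           // re-entry continues the loop
            }
            log.push_back("shutdown");
            state = 4;
            return false;
        default:
            assert(state == 4 && "unknown resume point");
            return false;                  // already finished
        }
    }
};
```

Local variables would not survive between calls, which is why the loop counter is a data member; the same restriction applies to Asio's stackless coroutines.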
Stackful Coroutines
Stackful coroutines are created using the spawn() function. The original problem may look something like the following when implemented with stackful coroutines:
boost::asio::spawn(io_service, [&](boost::asio::yield_context yield)
{
boost::system::error_code ec;
boost::asio::ip::tcp::socket socket(io_service);
// Asynchronously connect and suspend the coroutine. The coroutine will
// be resumed automatically when the operation completes.
socket.async_connect(endpoint, yield[ec]);
if (ec)
{
print_error(ec);
return;
}
// Loop until an error or shutdown occurs.
std::vector<char> buffer;
while (!shutdown)
{
// Read header data.
buffer.resize(fixed_header_size);
std::size_t bytes_transferred = boost::asio::async_read(
socket, boost::asio::buffer(buffer), yield[ec]);
if (ec)
{
print_error(ec);
return;
}
// Extract the size of the body from the header.
std::size_t body_size = parse_header(buffer, bytes_transferred);
// If there is no body size, then leave coroutine, as an invalid header
// was received.
if (!body_size) return;
// Read body data.
buffer.resize(body_size);
bytes_transferred =
boost::asio::async_read(socket, boost::asio::buffer(buffer), yield[ec]);
if (ec)
{
print_error(ec);
return;
}
// Invoke the user callback to handle the body.
body_handler_(buffer, bytes_transferred);
}
// Initiate graceful connection closure.
socket.shutdown(tcp::socket::shutdown_both, ec);
});
