boost::asio::async_read_until doesn't truncate the input message - c++

I am trying to truncate an input message when the program reads a specific character. For this I have the following code:
This is the ActiveSocketClientConnection.h
class ActiveSocketClientConnection : public boost::enable_shared_from_this<ActiveSocketClientConnection>{
private:
boost::shared_ptr<tcp::socket> socket_;
boost::asio::streambuf data_;
...
public:
...
}
This is the ActiveSocketClientConnection.cpp
void ActiveSocketClientConnection::handleConnect(const boost::system::error_code& error){
std::string sETX;
sETX.push_back(0x3A); //0x3A = :
boost::asio::async_read_until(
*socket_.get(),
data_,
sETX.c_str(),
boost::bind(&ActiveSocketClientConnection::handleReadBody,
this,
boost::asio::placeholders::error
)
);
}
void ActiveSocketClientConnection::handleReadBody( boost::system::error_code error){
size_t t = data_.size();
unsigned char* output = (unsigned char*)malloc(t);
memcpy(output, boost::asio::buffer_cast<const void*>(data_.data()), t);
data_.consume(t);
...
}
If I send the message AA:A (for example) over a socket connection, async_read_until saves the whole message in data_; it doesn't truncate the message where the : character appears.
Can someone tell me what I am doing wrong?
Thank you.

First, you have undefined behaviour. You call async_read_until with a std::string_view as the delimiter, but this view is created from a std::string that is local to your handler function. async_read_until returns immediately, the local string is destroyed, and the string view is left holding a dangling pointer (std::string_view doesn't make a deep copy of the string; it is just a pair: a pointer to the data and its size).
As a solution, just call the overload taking a char:
boost::asio::async_read_until(
*socket_.get(),
data_,
0x3A, // <- added
boost::bind(&ActiveSocketClientConnection::handleReadBody,
this,
boost::asio::placeholders::error
)
);
Official boost reference states:
After a successful async_read_until operation, the dynamic buffer
sequence may contain additional data beyond the delimiter. An
application will typically leave that data in the dynamic buffer
sequence for a subsequent async_read_until operation to examine.
So you have to parse the data, looking for the first occurrence of the delimiter, and extract the matching sub-buffer from what was read.
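The parsing step can be sketched without any networking. This is only a minimal illustration, assuming the received bytes have already been copied into a std::string; the helper name extract_until is hypothetical and is not part of Boost.Asio.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Asio function): moves everything up to and
// including the first `delim` from `pending` into `message`.
// Returns false and leaves `pending` untouched when no delimiter is present yet.
bool extract_until(std::string& pending, char delim, std::string& message)
{
    const std::size_t pos = pending.find(delim);
    if (pos == std::string::npos)
        return false;                     // incomplete message: wait for more data
    message.assign(pending, 0, pos + 1);  // bytes up to and including the delimiter
    pending.erase(0, pos + 1);            // keep the remainder for the next message
    return true;
}
```

With the message AA:A from the question, message would receive AA: and the trailing A would stay in pending for the next read. If the handler also takes the bytes_transferred argument, that value already reports the number of bytes up to and including the delimiter, so the search can be skipped.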

Related

boost::asio problem passing dynamically sized data to async handler

I am processing custom TCP data packets with Boost. Since all operations are asynchronous, a handler must be called to process the data. The main problem is that I don't know how to pass the data to the handler when the size is not known at compile time.
For example, say you receive the header bytes, parse them which tells you the length of the body:
int length = header.body_size();
I somehow need to allocate an array with the size of the body and then call the handler (which is a class member, not a static function) and pass the data to it. How do I do that properly?
I tried different things but always ended up getting a segfault, or I had to provide a fixed size for the body buffer, which is not what I want. An attempt I made can be found below.
After receiving the header information:
char data[header.body_size()];
boost::asio::async_read(_socket, boost::asio::buffer(data, header.body_size()),
boost::bind(&TCPClient::handle_read_body, this, boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred, data));
The handler:
void TCPClient::handle_read_body(const boost::system::error_code &error, std::size_t bytes_transferred,
const char *buffer) {
Logger::log_info("Reading body. Body size: " + std::to_string(bytes_transferred));
}
This example throws a segfault.
How can I allocate a buffer for the body after knowing the size?
And how can I then call the handler and pass it the error_code, the bytes_transferred and the body data?
An example snippet would be really appreciated since the boost-chat examples that do this are not very clear to me.
char data[header.body_size()]; is not standard C++, and the array becomes invalid once it goes out of scope, while async_read requires the buffer to remain alive until the completion callback is invoked. So you should probably add a field to TCPClient holding a list of data buffers (probably of std::vector kind) pending to be received.
All you need to do is create the buffer on the heap instead of the stack. In place of the VLA (char [sizeAtRuntime]) you can use std::string or std::vector together with std::shared_ptr. With a string or vector the buffer can have any size, and with shared_ptr you can prolong the lifetime of the buffer.
Version with bind:
void foo()
{
std::shared_ptr<std::vector<char>> buf = std::make_shared<std::vector<char>>(); // buf is local
buf->resize( header.body_size() );
// ditto with std::string
boost::asio::async_read(_socket, boost::asio::buffer(*buf),
boost::bind(&TCPClient::handle_read_body,
this, boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred,
buf)); // buf is passed by value
}
void handle_read_body(const boost::system::error_code&,
size_t,
std::shared_ptr<std::vector<char>>)
{
}
In the above example buf is created on the stack and points to a vector on the heap. Because bind takes its arguments by value, buf is copied and the reference count is increased, which means your buffer still exists after async_read returns and foo ends.
You can achieve the same behaviour with a lambda; in that case buf should be captured by value:
void foo()
{
std::shared_ptr<std::vector<char>> buf = std::make_shared<std::vector<char>>(); // buf is local
buf->resize( header.body_size() );
// ditto with std::string
boost::asio::async_read(_socket, boost::asio::buffer(*buf),
// capturing buf by value increases the reference count of the shared_ptr
[buf](const boost::system::error_code& , size_t)
{
});
}
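The lifetime argument can be checked in isolation. The following is a minimal sketch that replaces the socket with a plain queue of deferred callbacks; pending_handlers, start_read_body, run_pending_handlers and last_body_size are hypothetical names used only for illustration, and the queue is not how Boost.Asio is implemented.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <vector>

// Stand-in for an event loop: completion handlers run later, not at the call site.
std::queue<std::function<void()>> pending_handlers;

// Written by the handler so that its effect is visible from outside.
std::size_t last_body_size = 0;

void start_read_body(std::size_t body_size)
{
    // The shared_ptr is a local variable, but the vector it owns lives on the heap.
    auto buf = std::make_shared<std::vector<char>>(body_size);

    // Capturing buf by value copies the shared_ptr, so the handler co-owns the vector.
    pending_handlers.push([buf] { last_body_size = buf->size(); });
}   // the local buf is destroyed here; the copy inside the handler keeps the vector alive

void run_pending_handlers()
{
    while (!pending_handlers.empty()) {
        pending_handlers.front()();  // the buffer is still valid when the handler runs
        pending_handlers.pop();      // destroying the handler releases the last owner
    }
}
```

Calling start_read_body(128) followed by run_pending_handlers() leaves last_body_size at 128, even though the function that created the buffer returned long before the handler ran.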

How can we sequentially receive multiple data from boost::asio::tcp::ip::read_some calls?

Let us suppose that a client holds two different big objects (in terms of byte size) and serializes those followed by sending the serialized objects
to a server over TCP/IP network connection using boost::asio.
For client side implementation, I'm using boost::asio::write to send binary data (const char*) to the server.
For the server-side implementation, I'm using read_some rather than boost::asio::ip::tcp::iostream, to leave room for future efficiency improvements. I built the following recv function on the server side. The second parameter, std::stringstream &is, holds the large received data (>65536 bytes) at the end of the function.
When the client side calls boost::asio::write twice in sequence to send two different binary objects separately, the server side calls the corresponding recv twice in sequence as well.
However, the first recv call absorbs both incoming objects while the second call receives nothing ;-(.
I am not sure why this happens and how to solve it.
Since each of the two objects has its own (de)serialization function, I'd like to send each one separately. In fact, there are more than 20 objects (not just 2) that have to be sent over the network.
void recv (
boost::asio::ip::tcp::socket &socket,
std::stringstream &is) {
boost::array<char, 65536> buf;
for (;;) {
boost::system::error_code error;
size_t len = socket.read_some(boost::asio::buffer(buf), error);
std::cout << " read "<< len << " bytes" << std::endl; // called multiple times for debugging!
if (error == boost::asio::error::eof)
break;
else if (error)
throw boost::system::system_error(error); // Some other error.
std::stringstream buf_ss;
buf_ss.write(buf.data(), len);
is << buf_ss.str();
}
}
Client main file:
int main () {
... // some 2 different big objects are constructed.
std::stringstream ss1, ss2;
... // serializing bigObj1 -> ss1 and bigObj2-> ss2, where each object is serialized into a string. This is due to the dependency of our using some external library
const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());
boost::system::error_code ignored_error;
boost::asio::write(socket, boost::asio::buffer(big_obj_bin1, ss1.str().size()), ignored_error);
boost::asio::write(socket, boost::asio::buffer(big_obj_bin2, ss2.str().size()), ignored_error);
... // do something
return 0;
}
Server main file:
int main () {
... // socket is generated. (communication established)
std::stringstream ss1, ss2;
recv(socket,ss1); // this guy absorbs all of incoming data
recv(socket,ss2); // this guy receives 0 bytes ;-(
... // deserialization to two big objects
return 0;
}
recv(socket,ss1); // this guy absorbs all of incoming data
Of course it absorbs everything. You explicitly coded recv to do an infinite loop until eof. That's the end of the stream, which means "whenever the socket is closed on the remote end".
So the essential thing missing from the protocol is framing. The most common ways to address it are:
sending data length before data, this way the server knows how much to read
sending a "special sequence" to delimit frames. In text, a common special delimiter would be '\0'. However, for binary data it is (very) hard to arrive at a delimiter that cannot naturally occur in the payload.
Of course, if you know extra characteristics of your payload you can use them. E.g. if your payload is compressed, you know you won't regularly find a block of 512 identical bytes (they would have been compressed). Alternatively, you can resort to encoding the binary data in a way that removes the ambiguity. yEnc, Base122 et al. come to mind (see Binary Data in JSON String. Something better than Base64 for inspiration).
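The first option, sending the length before the data, can be sketched as follows. This is only an illustration under stated assumptions: the 4-byte big-endian header is an arbitrary choice, payloads are assumed to be smaller than 4 GiB, and encode_frame and decode_frame are hypothetical helpers, not Boost.Asio functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Prepends a 4-byte big-endian length to the payload (header layout is an assumption).
std::string encode_frame(const std::string& payload)
{
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>((n >> 24) & 0xFF));
    frame.push_back(static_cast<char>((n >> 16) & 0xFF));
    frame.push_back(static_cast<char>((n >> 8) & 0xFF));
    frame.push_back(static_cast<char>(n & 0xFF));
    frame += payload;
    return frame;
}

// Pops one complete frame from the front of `stream` into `payload`.
// Returns false (and changes nothing) when the header or body has not fully arrived.
bool decode_frame(std::string& stream, std::string& payload)
{
    if (stream.size() < 4)
        return false;                               // header incomplete
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(stream[i]);
    if (stream.size() - 4 < n)
        return false;                               // body incomplete
    payload.assign(stream, 4, n);                   // copy exactly n body bytes
    stream.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

On the wire, the receiver would first read exactly 4 bytes, decode the length, and then read exactly that many bytes, so each recv call ends at an object boundary instead of at end of stream.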
Notes:
Regardless of that, it's clumsy to hand-write the reading loop. It is also unnecessary to do that and then copy the blocks into a stringstream anyway. If you're doing all that copying anyway, just use boost::asio::[async_]read with boost::asio::streambuf directly.
This is clear UB:
const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());
str() returns a temporary copy of the buffer, which is not only wasteful but also means that the const char* pointers are dangling the moment they have been initialized.

asio::async_read_until: robust and graceful way of handling multiple lines

I'm using asio::async_read_until with '\n' delimiter to support a TCP client that fetches character data from a server.
This server continuously sends '\n' terminated lines; precisely, it can write at once either single lines or a concatenated string of multiple lines.
From the doc, I understand that asio::async_read_until could read:
One '\n' terminated line, like "some_data\n". This is the simplest case, handled with a call to std::getline on the stream associated with the asio::streambuf
One '\n' terminated line plus the beginning of a next line, like "some_data1\nbla". This can be handled with a std::getline; the rest of the second line will be handled at the next completion handler call.
Many lines; in this case, the newly read data could contain 2 or more '\n'. How can I know how many std::getline calls I should do, knowing that I don't want to risk calling std::getline on an incomplete line (which I will eventually get in a future packet)? Should I peek at the stream buffer to check the existence of multiple '\n'? Is it even possible without doing many copies?
From the documentation here:
http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/async_read_until/overload1.html
If the stream buffer already contains a newline, the handler will be invoked without an async_read_some operation being executed on the stream.
For this reason, when your handler executes you must execute no more than one getline(). Once getline has returned and you have finished processing, simply call async_read_until again from the handler.
example:
void handler(const boost::system::error_code& e, std::size_t size)
{
if (e)
{
// handle error here
}
else
{
std::istream is(&b);
std::string line;
std::getline(is, line);
do_something(line);
boost::asio::async_read_until(s, b, '\n', handler);
}
}
// Call the async read operation
boost::asio::async_read_until(s, b, '\n', handler);
This answer relates to the accepted answer:
I'd highly recommend calling std::getline() in a loop and testing the return value.
while (std::getline(is, line)) {
...
do_something(line);
}
std::getline returns a reference to the istream, which can be implicitly converted to bool, indicating whether the getline operation was really successful.
Why one should do that:
std::getline may fail, e.g. if the input stream has reached its limits and no newline is present
you may have more than one line inside asio's streambuf. If you blindly restart reading after processing just the first line, you may end up exceeding memory limits on the streambuf (or have an ever-growing streambuf).
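One caveat with the loop: std::getline also succeeds on a final fragment that has no trailing newline, so an incomplete line can be consumed along with the complete ones. A way to process every complete line while leaving the partial tail buffered is to search for the delimiter first. The following is a minimal sketch, assuming the buffered bytes are held in a std::string; take_complete_lines is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes every complete '\n'-terminated line from the
// front of `pending` and returns them without the delimiter. Any trailing
// partial line stays in `pending` until the rest of it arrives.
std::vector<std::string> take_complete_lines(std::string& pending)
{
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (std::size_t nl = pending.find('\n', start);
         nl != std::string::npos;
         nl = pending.find('\n', start)) {
        lines.push_back(pending.substr(start, nl - start));  // line without '\n'
        start = nl + 1;                                       // skip the delimiter
    }
    pending.erase(0, start);  // whatever remains is an incomplete line
    return lines;
}
```

For the input "some_data1\nsome_data2\nbla" this yields two lines and leaves "bla" in the buffer, which then completes once the rest of that line has been appended.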
Update 2017-08-23:
bytes_transferred actually gives you the position in the underlying buffer where the separator was found. One can take advantage of that by simply casting the streambuf's data and creating a string from it.
void client::on_read(const std::error_code &ec, size_t bytes_transferred) {
if (ec) {
return handle_error(ec);
}
std::string line(
asio::buffer_cast<const char*>(m_rxbuf.data()),
bytes_transferred
);
// todo: strip off trailing delimiter
m_rxbuf.consume(bytes_transferred); // don't forget to drain
handle_command(line); // leave restarting async_read_until to this handler
}
Instead of copying data from the streambuf into the string, you can alternatively create a string_view from it, or replace the underlying streambuf with a std::string and chop off bytes_transferred bytes instead of consuming from the buffer.
Cheers,
Argonaut6x
Updated: with a somewhat better approach.
IMHO, you are better off using async_read_some directly rather than the read-until operation. This requires fewer operations overall, gives you better control over the buffer handling, and could reduce the number of copies you have to make of the data. You could use the asio::streambuf implementation, but you could also do this using a vector<char>, for example:
vector<char> buffer(2048); // whatever size you want, note: you'll need to somehow grow this if message length is greater...
size_t content = 0; // current content
// now the read operation;
void read() {
// This will cause asio to append from the last location
socket.async_read_some(boost::asio::buffer(buffer.data() + content, buffer.size() - content), [&](const boost::system::error_code& ec, size_t sz) {
if (ec) return; // some error
// Total content in the vector
content += sz;
auto is = begin(buffer);
auto ie = next(is, content); // end of the data region
// handle all the complete lines.
for (auto it = find(is, ie, '\n'); it != ie; it = find(is, ie, '\n')) {
// is -> it contains the message (excluding '\n')
handle(is, it);
// Skip the '\n'
it = next(it);
// Update the start of the next message
is = it;
}
// Update the remaining content
content -= distance(begin(buffer), is);
// Move the remaining data to the beginning of the buffer
copy(is, ie, begin(buffer));
// Setup the next read
read();
});
}

how I can read pointer data from boost::streambuf but without copy data

I have a stream buffer, and I need to send n bytes of data that I know exist (I get the count from the writer).
I can use the syntax below:
streambuf b;
char mybuf[1024];
std::istream is(&b);
is.read(mybuf,500);
This way I copy the data. I only want to get a pointer and avoid copying the data. I want to send it like below:
asio::tcp::socket socket_;
socket_->send(
Is there a way to derive from streambuf and use the protected gptr and gbump, and is this a good approach?
What do I need to do?
boost::asio::streambuf is no more than a buffer queue. Use streambuf::data(). It returns a list of buffers which represents the committed buffers. After a successful send/write, use streambuf::consume(size_t) to remove/reuse buffers.
streambuf b;
size_t size;
size= read( _socket, b.prepare( 1024 ) );
b.commit( size ); // after this function you may call read again
size= write( socket_, b.data() ); // you can check size() if there is anything
b.consume(size); // remove size bytes from data() buffers

boost::asio::async_read textual stop condition?

I'm writing a server with Boost, something pretty simple: accept an XML message, process, reply. But I'm running into trouble telling it when to stop reading.
This is what I have right now: (_index is the buffer into which the data is read)
std::size_t tcp_connection::completion_condition(const boost::system::error_code& error,
std::size_t bytes_transferred)
{
int ret = -1;
std::istream is(&_index);
std::string s;
is >> s;
if (s.find("</end_tag>") != std::string::npos) ret = 0;
return ret;
}
void tcp_connection::start()
{
// Get index from server
boost::asio::async_read(_socket, _index, &(tcp_connection::completion_condition),
boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred));
}
This doesn't compile, since I have to define completion_condition as static to pass it to async_read; and I can't define _index as static since (obviously) I need it to be specific to the class.
Is there some other way to give parameters to completion_condition? How do I get it to recognize the ending tag and call the reading handler?
You can pass pointers to member functions. The syntax for doing it with C++ is tricky, but boost::bind hides it and makes it fairly easy to do.
An example would be making completion_condition non-static and passing it to async_read like this: boost::bind(&tcp_connection::completion_condition, this, _1, _2)
&tcp_connection::completion_condition is a pointer to the function. this is the object of type tcp_connection to call the function on. _1 and _2 are placeholders; they will be replaced with the two parameters the function is called with.
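The binding mechanism itself can be tried without a socket. The following is a minimal sketch that uses std::bind and std::error_code as stand-ins for boost::bind and boost::system::error_code; the class connection, its received_ member and make_condition are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <system_error>

class connection {
public:
    // Non-static, so it can inspect per-object state such as received_.
    // Returns 0 when reading should stop, otherwise the maximum number of
    // bytes to request in the next read.
    std::size_t completion_condition(const std::error_code& error,
                                     std::size_t /*bytes_transferred*/)
    {
        if (error)
            return 0;  // stop on error
        return received_.find("</end_tag>") != std::string::npos ? 0 : 1024;
    }

    // Binds the member function to this object; _1 and _2 are filled in
    // with the two arguments supplied by the caller.
    std::function<std::size_t(const std::error_code&, std::size_t)> make_condition()
    {
        using namespace std::placeholders;
        return std::bind(&connection::completion_condition, this, _1, _2);
    }

    std::string received_;  // stand-in for the connection's receive buffer
};
```

Because the bound callable stores the raw this pointer, the object has to outlive the asynchronous operation; binding shared_from_this() instead, as the question already does for the read handler, is one way to guarantee that.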