Receiving large binary data over Boost::Beast websocket - c++

I am trying to receive a large amount of data over a boost::beast::websocket, fed by another boost::beast::websocket. Normally this data is sent to a connected browser, but I'd like to set up a purely C++ unit test validating certain components of the traffic. I enabled auto-fragmentation on the sender with a maximum size of 1 MB, but after a few messages the receiver spits out:
Read 258028 bytes of binary
Read 1547176 bytes of binary
Read 168188 bytes of binary
"Failed read: The WebSocket message exceeded the locally configured limit"
Now, I have no expectation that my possibly poorly architected unit test should exhibit the same characteristics as a fully developed and well-supported browser, and indeed it does not: the browser has no issue reading 25 MB messages over the websocket, while my boost::beast::websocket hits a limit.
So before I go down a rabbit hole, I'd like to see if anyone has any thoughts on this. My read section looks like this:
void on_read(boost::system::error_code ec, std::size_t bytes_transferred)
{
    boost::ignore_unused(bytes_transferred);
    if (ec)
    {
        m_log.error("Failed read: " + ec.message());
        // Stop the websocket
        stop();
        return;
    }
    std::string data(boost::beast::buffers_to_string(m_buffer.data()));
    // Yes I know this looks dangerous. The sender always sends as binary but occasionally sends JSON
    if (data.at(0) == '{')
        m_log.debug("Got message: " + data);
    else
        m_log.debug("Read " + utility::to_string(boost::beast::buffer_bytes(m_buffer.data())) + " of binary data");
    // Do the things with the incoming data
    for (auto&& callback : m_read_callbacks)
        callback(data);
    // Toss the data
    m_buffer.consume(bytes_transferred);
    // Wait for some more data
    m_websocket.async_read(
        m_buffer,
        std::bind(
            &WebsocketClient::on_read,
            shared_from_this(),
            std::placeholders::_1,
            std::placeholders::_2));
}
I saw in a separate example that instead of doing an async read, you can do a for/while loop reading some data until the message is done (https://www.boost.org/doc/libs/1_67_0/libs/beast/doc/html/beast/using_websocket/send_and_receive_messages.html). Would this be the right approach for an always open websocket that could send some pretty massive messages? Would I have to send some indicator to the client that the message is indeed done? And would I run into the exceeded buffer limit issue using this approach?

If your use pattern is fixed:
    std::string data(boost::beast::buffers_to_string(m_buffer.data()));
And then, in particular:
    callback(data);
then there is no benefit to reading block-wise, since you will be allocating the same memory anyway. Instead, you can raise the "locally configured limit":
    ws.read_message_max(20ull << 20); // sets the limit to 20 MiB
The default value is 16 MiB (as of Boost 1.75).
Side Note
You can probably also use ws.got_binary() to detect whether the last message received was binary or not.

Related

Truncated data (if more than 512 bytes) when using boost::asio::async_read_until from serial port

I'm using the boost::asio::async_read_until function to read from a serial port in Windows 10. The delimiter is a regex pattern. It works as expected as long as the data received is not larger than 512 bytes.
If the data received is larger than 512 bytes, it is simply truncated and the readComplete function is not called again. However, if I send more data, 1 byte is enough, the missing data is received together with the new data.
I have used the same implementation on a tcp/socket and that works flawlessly. Is there any limit in the native serial interface in Windows causing this behaviour?
EDIT 1: I have noted that if the baud rate is lowered from 115200 to 28800 no data is missing.
// from .h-file: boost::asio::streambuf streamBuf_;
void RS232Instrument::readAsyncChars()
{
    boost::asio::async_read_until(
        serial_,
        streamBuf_,
        boost::regex(regexStr_.substr(6, regexStr_.length() - 7)),
        boost::bind(
            &RS232Instrument::readComplete,
            this,
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
}
void RS232Instrument::readComplete(const boost::system::error_code& error, size_t bytes_transferred)
{
    if(error)
    {
        // Error handling
    }
    else
    {
        std::string rawStr(
            boost::asio::buffers_begin(streamBuf_.data()),
            boost::asio::buffers_begin(streamBuf_.data()) + bytes_transferred);
        // Log the data in rawStr....
        // Remove data from beginning until all data sent to log
        streamBuf_.consume(bytes_transferred);
        if(abort_ == false)
        {
            readAsyncChars();
        }
    }
}
Since I have found out what caused this problem, I'll answer the question myself.
I had left out some code above for the sake of clarity, code which I did not realise was actually the problem.
Example of code left out:
LOG_DEBUG("Rs232Data received");
I use the boost::log functionality and I have added more "sinks" to the log framework. The sink used in this case logs to a vector in RAM and prints to the console when triggered by user input.
It turns out that the log framework consumes about 1 ms before the "consume" function in the sink is called. That is enough to cause loss of data from the serial port when using async_read_until.
Lesson learned: do not perform any time-consuming tasks in the completion handler of async_read_until.

How can I convert serialized data in boost::beast to a string so that I could process it in a FIFO manner?

I have an application of a client where I need to receive http "long running requests" from a server. I send a command, and after getting the header of the response, I have to just receive json data separated by \r\n until the connection is terminated.
I managed to adapt boost beast client example to send the message and receive the header and parse it and receive responses from the server. However, I failed at finding a way to serialize the data so that I could process the json messages.
The closest demonstration of the problem can be found in this relay example. In that example (p is a parser, sr is a serializer, input is a socket input stream and output is a socket output stream), after reading the http header, we have a loop that reads continuously from the server:
do
{
    if(! p.is_done())
    {
        // Set up the body for writing into our small buffer
        p.get().body().data = buf;
        p.get().body().size = sizeof(buf);
        // Read as much as we can
        read(input, buffer, p, ec);
        // This error is returned when buffer_body uses up the buffer
        if(ec == error::need_buffer)
            ec = {};
        if(ec)
            return;
        // Set up the body for reading.
        // This is how much was parsed:
        p.get().body().size = sizeof(buf) - p.get().body().size;
        p.get().body().data = buf;
        p.get().body().more = ! p.is_done();
    }
    else
    {
        p.get().body().data = nullptr;
        p.get().body().size = 0;
    }
    // Write everything in the buffer (which might be empty)
    write(output, sr, ec);
    // This error is returned when buffer_body uses up the buffer
    if(ec == error::need_buffer)
        ec = {};
    if(ec)
        return;
}
while(! p.is_done() && ! sr.is_done());
A few things I don't understand here:
We're done reading the header. Why do we need Boost.Beast, and not plain Boost.Asio, to read a raw TCP message? When I tried to do that (with both async_read/async_read_some), I got infinite reads of zero size.
The documentation of parser says (at the end of the page) that a new instance is needed for every message, but I don't see that in the example.
Since tcp message reading is not working, is there a way to convert the parser/serializer data to some kind of string? Even write it to a text file in a FIFO manner, so that I could process it with some json library? I don't want to use another socket like the example.
The function boost::beast::buffers() failed to compile for the parser and the serializer; for the parser there's no consume function, and the serializer's consume seems to apply to particular HTTP parts of the message, firing an assert if I call it for body().
Besides that, I also failed at getting consistent chunks of data from the parser and the buffer with old-school std::copy. I don't seem to understand how to combine the data together to get the stream of data. Consuming the buffer with .consume() at any point while receiving data leads to a need_buffer error.
I would really appreciate someone explaining the logic of how all this should work together.
Where is buf? You could read directly into the std::string instead. Call string.resize(N), and set the pointer and size in the buffer_body::value_type to string.data() and string.size().

Boost:asio time taken for socket read/write

I have the following code to measure the total time taken for a socket write from client to server. (assuming that the call back method invocation is done on successful write of the data to the destination socket (TCP-ACK received)) Does this ensure that - this time is the actual "network time" for the data transfer ?
void on_successful_read_from_client(const boost::system::error_code& error,
                                    const size_t& bytes_transferred)
{
    if (!error)
    {
        m_telnet_server_write_time = posix_time::microsec_clock::universal_time();
        async_write(telnet_server,
            boost::asio::buffer(data_from_device_, bytes_transferred),
            boost::bind(&bridge::on_successful_send_to_server,
                shared_from_this(),
                boost::asio::placeholders::error));
    }
    else
        close();
}
void on_successful_send_to_server(const boost::system::error_code& error)
{
    if (!error)
    {
        posix_time::ptime now = posix_time::microsec_clock::universal_time();
        if ((now - m_telnet_server_write_time).total_milliseconds() > 0)
        {
            std::ostringstream log;
            log << "Time Taken for server write: " << (now - m_telnet_server_write_time).total_milliseconds() << " ms";
            write_log(log.str());
        }
    }
}
No I don't think it does. I think it measures the time needed to get the data into the buffers in the network stack of your OS, but not the time taken for the data to be transferred to and read by the program at the other end of the connection. The only way you can do that is to have accurate clocks at both ends and have the send time sent to the recipient as part of the message so that it can do the elapsed time calculation.
It measures the time to put the data into the send buffers in your network stack.
If you want to estimate the "network time", use icmp::socket, just like the "ping" command does. There is an ICMP example among the Boost.Asio examples in the documentation (search for "ICMP" on the examples page).
Does this ensure that - this time is the actual "network time" for the data transfer?
No. It measures the time to transfer the data into the socket send buffer. If the socket send buffer was full enough that not all the data fitted into it immediately, it also measures the time needed to drain the socket send buffer sufficiently to accommodate all the new data. It does not measure the time to send this data to the server in any way.
assuming that the call back method invocation is done on successful write of the data to the destination socket (TCP-ACK received)
It isn't. It is called when all the data has been transferred into the socket send buffer.

Read failed: End of file on succesful https request to AWS s3 using boost::asio [duplicate]

I have a server that receives a compressed string (compressed with zlib) from a client. I was using async_receive from the boost::asio library to receive this string, but it turns out there is no guarantee that all bytes will be received, so I now have to change it to async_read. The problem I face is that the number of bytes received is variable, so I am not sure how to use async_read without knowing the number of bytes to expect. With async_receive I just have a boost::array<char, 1024>, but this buffer is not necessarily filled completely.
I wondered if anyone can suggest a solution where I can use async_read even though I do not know the number of bytes to be received in advance?
void tcp_connection::start(boost::shared_ptr<ResolverQueueHandler> queue_handler)
{
    if (!_queue_handler.get())
        _queue_handler = queue_handler;
    std::fill(buff.begin(), buff.end(), 0);
    //socket_.async_receive(boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
    boost::asio::async_read(socket_, boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
}
buff is a boost::array<char, 1024>
How were you expecting to do this using any other method?
There are a few general methods for sending data of variable size in an async manner:
By message - a header defines the length of the expected message, followed by a body containing data of the specified length.
By stream - some marker (and this is very broad) tells you when you've got a complete packet.
By connection - each complete packet of data is sent in a single connection, which is closed once the data is complete.
So: can your data be parsed, or can a length be sent, etc.?
Use async_read_until and create your own match condition, or change your protocol to send a header including the number of bytes to expect in the compressed string.
A single IP packet is limited to an MTU size of ~1500 bytes, and yet still you can download gigabyte-large files from your favourite website, and watch megabyte-sized videos on YouTube.
You need to send a header indicating the actual size of the raw data, and then receive the data in smaller chunks until you have received all the bytes.
For example, when you download a large file over HTTP, there is a field on the header indicating the size of the file: Content-Length:.

Boost ASIO async_read_some

I am having difficulties in implementing a simple TCP server. The following code is taken from boost::asio examples, "Http Server 1" to be precise.
void connection::start() {
    socket_.async_read_some(
        boost::asio::buffer(buffer_),
        boost::bind(
            &connection::handle_read, shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred
        )
    );
}
void connection::handle_read(const boost::system::error_code& e, std::size_t bytes_transferred) {
    if (!e && bytes_transferred) {
        std::cout << " " << bytes_transferred << "b" << std::endl;
        data_.append(buffer_.data(), buffer_.data() + bytes_transferred);
        //(1) what here?
        socket_.async_read_some(
            boost::asio::buffer(buffer_),
            boost::bind(
                &connection::handle_read, shared_from_this(),
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred
            )
        );
    }
    else // if (e != boost::asio::error::operation_aborted)
    {
        std::cout << data_ << std::endl;
        connection_manager_.stop(shared_from_this());
    }
}
In the original code buffer_ is big enough to hold the entire request. That's not what I need, so I've changed the size to 32 bytes.
The server compiles and listens at port 80 of localhost, so I try to connect to it via my web browser.
Now, if statement (1) is commented out, only the first 32 bytes of the request are read and the connection hangs. The web browser keeps waiting for the response, and the server does... I don't know what.
If (1) is uncommented, the entire request is read (and appended to data_), but it never stops; I have to cancel the request in my browser, and only then does the else { } part run and I see my request on stdout.
Question 1: How should I handle a large request?
Question 2: How should I cache the request (currently I append the buffer to a string)?
Question 3: How can I tell that the request is over? In HTTP there is always a response, so my web browser keeps waiting for it and doesn't close the connection, but how can my server know that the request is over (and perhaps close it or reply with some "200 OK")?
Suppose the browser sends you 1360 bytes of data, and you ask Asio to read some of it into a buffer that you told it holds only 32 bytes.
The first time, your handler is called with the first 32 bytes of data. If you comment out (1), the rest of the data, which the browser has actually already sent and which is sitting in the OS buffer waiting for you to pick it up, is never read, and you end up blocked in io_service::run waiting for a miracle!
If you uncomment (1), your loop starts: you read the first block, then the next, and another, until the data the browser sent is exhausted. But when you then ask Asio to read some more, it waits for data that will never come (the browser has already sent its request and is waiting for your answer). When you cancel the request in the browser, it closes its socket, and your handler is called with an error saying no more data can be read, since the connection is closed.
What you should do to make it work is learn the HTTP format, so that you know what the browser sent you and can provide a proper answer; then your communication with the client can proceed. In this case the end of the request is marked by \r\n\r\n: when you see it, you shouldn't read any more data; you should process what you have read so far and then send a response to the browser.