boost read_until does not stop at delimiter - c++

I'm using the Boost read_until function to receive and parse HTTP messages over a socket. I call read_until on the socket with the delimiter \r\n, which I think should give me one line of the HTTP header. (Each HTTP header line ends in \r\n, per the standard.) However, what I actually get from read_until is the entire header, several lines long. (The header ends in \r\n\r\n, in other words a blank line, also per the HTTP standard.) Here's a code snippet. sock is the socket file descriptor.
boost::system::error_code err;
io::streambuf request_buff;
io::read_until(sock, request_buff, "\r\n", err); // read request line
if (err)
    throw Exception(string("Failed to read HTTP header request line from socket: ") + err.message());
cerr << "Read " << request_buff.size() << " bytes." << endl;
istream request(&request_buff);
try {
    request >> m_strMethod >> m_strPath >> m_strHttpVersion;
} catch (std::exception& e) {
    throw Exception(string("Failed to parse HTTP header: ") + e.what(), e);
}
if (!request)
    throw Exception("Failed to read HTTP header");
if (!alg::istarts_with(m_strHttpVersion, "HTTP/"))
    throw Exception(string("Malformed HTTP header: expected HTTP version but got: ") + m_strHttpVersion);
string strTemp;
while (std::getline(request, strTemp))
{
    cerr << "Extra line size = " << strTemp.size() << endl;
    cerr << "Extra line: '" << strTemp << '\'' << endl;
}
What I expect to see is output indicating it read the number of bytes in the first line of the HTTP message and no "Extra" output. What I get instead is the number of bytes in the entire HTTP header, and a blank extra line (which maybe is because the >> operations didn't consume the newline at the end of the first line) followed by every other line in the header, and another blank line (which indicates the end of the header, as noted above). Why is read_until reading more from the socket than the first line of the header and putting it into request_buff?
Note, I used netcat to receive the request and it's coming through okay. So the HTTP message itself appears to be correctly formatted.

The documentation may seem to imply this:
"This function is used to read data into the specified streambuf
until the streambuf's get area contains the specified delimiter."
But look closer:
until the streambuf's get area contains ...
So it doesn't promise to stop there. It only promises to return as soon as it has read a block that contains your delimiter; anything read past the delimiter stays in the streambuf.
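To make that concrete, here is a minimal sketch of pulling exactly one line out of a buffer that already holds more than one. It uses only the standard library: a std::istringstream stands in for the std::istream you would construct on the streambuf, and take_line is a hypothetical helper name, not part of Boost.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one CRLF-terminated line from `in` and
// leaves all later bytes unread in the underlying buffer.
// Returns the line without its trailing "\r\n".
std::string take_line(std::istream& in) {
    std::string line;
    std::getline(in, line);                     // stops after the first '\n'
    if (!line.empty() && line.back() == '\r')
        line.pop_back();                        // drop the '\r' of the CRLF pair
    return line;
}
```

Calling take_line repeatedly on the same stream yields the request line, then each header line, then an empty string for the blank line that ends the header. With the real streambuf, the bytes you have not extracted yet simply wait there for the next read_until call.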

c++ string.find() returns length of the string

Hi, I am a little confused. My program uses TCP to transfer messages over the network, which is in my opinion irrelevant to my question.
std::stringstream tmp(buf);
if (tmp.str().find("\r\n") == std::string::npos) {
    std::cout << " doesnt have ending char" << std::endl;
} else {
    std::cout << " position of ending char " << tmp.str().find("\r\n") << std::endl;
}
When a message is read from the client, it is pushed into a stringstream. Then I try to find the terminating characters. Unfortunately, string.find("\r\n") always returns the length of the string, even though "\r\n" is not contained in buf.
I am using telnet to test it. Is it possible that this behavior is caused by telnet?
This is output of my terminal:
Connected to localhost.
Escape character is '^]'.
200 LOGIN
asdadsgdgsd
and this is output from the program:
3001 PORT NUM
sent message: asdadsgdgsd END
position of ending char 11
The string entered in the console then sent and received is
"asdadsgdgsd\r\n"
012345678901_2_
So the value 11 is correct: "\r\n" starts at index 11, which happens to equal the length of the text typed before it.
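A quick way to convince yourself is a tiny check like the one below. It is only a sketch; crlf_position is a hypothetical wrapper around std::string::find, written so that the "found" and "not found" cases are easy to compare.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: index of the first "\r\n" in `msg`,
// or std::string::npos when the sequence is absent.
std::size_t crlf_position(const std::string& msg) {
    return msg.find("\r\n");
}
```

For the telnet input "asdadsgdgsd\r\n" this gives 11, while the whole string is 13 characters long. Without the trailing "\r\n" it gives std::string::npos, so the original branch logic is fine; telnet really does send the line terminator.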

No longer able to retrieve data from QIODevice after calling readAll(). Buffer flushed?

I've just noticed something when using QNetworkReply for which I could not find the slightest hint in the Qt documentation for QIODevice::readAll() (from which QNetworkReply inherits this method).
Here is what the documentation states:
Reads all remaining data from the device, and returns it as a byte
array.
This function has no way of reporting errors; returning an empty
QByteArray can mean either that no data was currently available for
reading, or that an error occurred.
Let's say I have the following connection:
connect(this->reply, &QIODevice::readyRead, this, &MyApp::readyReadRequest);
The readyReadRequest() slot looks like this:
void MyApp::readyReadRequest()
{
    LOG(INFO) << "Received data from \"" << this->url.toString() << "\"";
    LOG(INFO) << "Data contents:\n" << QString(this->reply->readAll());
    this->bufferReply = this->reply->readAll();
}
The surprise came after I used this->bufferReply (which is a QByteArray class member of MyApp). I passed it to a QXmlStreamReader and did:
while (!reader.atEnd())
{
    LOG(DEBUG) << "Reading next XML element";
    reader.readNext();
    LOG(DEBUG) << reader.tokenString();
}
if (reader.hasError())
{
    LOG(ERROR) << "Encountered error while parsing XML data:" << reader.errorString();
}
Imagine my surprise when I got the following output:
2017-10-17 16:12:18,591 DEBUG [default] [void MyApp::processReply()][...] Reading next XML element
2017-10-17 16:12:18,591 DEBUG [default] [void MyApp::processReply()] [...] Invalid
2017-10-17 16:12:18,591 ERROR [default] Encountered error while parsing XML data: Premature end of document
Through debugging I found that my bufferReply is empty at this point. I looked in the docs again but couldn't find anything that hints at the data being removed from the device (in my case the network reply) after it has all been read.
Removing the line where I print the byte array or simply moving it after this->bufferReply = this->reply->readAll(); and then printing the contents of the class member fixed the issue:
void MyApp::readyReadRequest()
{
    LOG(INFO) << "Received data from \"" << this->url.toString() << "\"";
    this->bufferReply = this->reply->readAll();
    LOG(INFO) << "Data contents:\n" << QString(this->bufferReply);
}
However, I would like to know whether I'm doing something wrong or whether the documentation is indeed incomplete.
Since readAll() doesn't report errors, or that no data is available at the given point in time, an empty byte array is the only hint that something didn't work as intended.
Yes. When you call QIODevice::readAll() 2 times, it is normal that the 2nd time you get nothing. Everything has been read, there is nothing more to be read.
This behavior is standard in IO read functions: each call to a read() function returns the next piece of data. Since readAll() reads to the end, further calls return nothing.
However, this does not necessarily mean that the data has been flushed. For instance, when you read a file, it just moves a "cursor" around and you can go back to the start of the file with QIODevice::seek(0). For QNetworkReply, I'd guess that the data is just discarded.
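The same "cursor" behavior can be seen with plain standard streams. The sketch below is an analogue and does not use Qt at all: read_all is a hypothetical helper that drains a std::istream, much like readAll() drains a QIODevice.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: returns everything that is still unread in `in`.
// After the call the read position sits at the end of the data.
std::string read_all(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

On a std::istringstream, the first read_all call returns the full contents and a second call returns an empty string. After seekg(0) the data can be read again, which corresponds to the seekable-file case above. A sequential source has no such rewind, so the practical rule is the one from the question: read once, keep the result in a variable, and log that variable.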

How to use asio buffer after async_read_until for consecutive reads

I am reading from a serial device where each message must be specifically requested. E.g. you send a request and get a response with the serialised payload.
Each message contains these parts in order:
PREAMBLE (2 bytes, "$M")
HEADER (3 bytes, containing payload length N)
PAYLOAD+CRC (N+1 bytes)
My approach with asio is to detect the start (PREAMBLE) of a message by using asio::async_read_until and afterwards using asio::async_read for reading the exact amount of bytes for HEADER and PAYLOAD+CRC. Since there is no static pattern at the end of the message, I cannot use async_read_until to read the full message.
After receiving PREAMBLE, the handler for async_read_until gets called and the buffer contains the PREAMBLE bytes and might contain additional bytes from HEADER and PAYLOAD+CRC.
The asio documentation for async_read_until says:
After a successful async_read_until operation, the streambuf may
contain additional data beyond the delimiter. An application will
typically leave that data in the streambuf for a subsequent
async_read_until operation to examine.
I interpret this as that you should only consume the requested bytes and leave all remaining bytes in the buffer for further reads. However, all consecutive reads block since the data is already in the buffer and there is nothing left on the device.
The reading is implemented as a small state machine processState, where different handlers are registered depending on which part of the message is to be read. All reading is done with the same buffer (asio::streambuf). processState is called in an infinite loop.
void processState() {
    // register handler for incoming messages
    std::cout << "state: " << parser_state << std::endl;
    switch (parser_state) {
    case READ_PREAMBLE:
        asio::async_read_until(port, buffer, "$M",
            std::bind(&Client::onPreamble, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case READ_HEADER:
        asio::async_read(port, buffer, asio::transfer_exactly(3),
            std::bind(&Client::onHeader, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case READ_PAYLOAD_CRC:
        asio::async_read(port, buffer, asio::transfer_exactly(request_received->length+1),
            std::bind(&Client::onDataCRC, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case PROCESS_PAYLOAD:
        onProcessMessage();
        break;
    case END:
        parser_state = READ_PREAMBLE;
        break;
    }
    // wait for incoming data
    io.run();
    io.reset();
}
The PREAMBLE handler onPreamble is called when receiving the PREAMBLE:
void onPreamble(const asio::error_code& error, const std::size_t bytes_transferred) {
    std::cout << "onPreamble START" << std::endl;
    if(error) { return; }
    std::cout << "buffer: " << buffer.in_avail() << "/" << buffer.size() << std::endl;
    // ignore and remove header bytes
    buffer.consume(bytes_transferred);
    std::cout << "buffer: " << buffer.in_avail() << "/" << buffer.size() << std::endl;
    buffer.commit(buffer.size());
    std::cout << "onPreamble END" << std::endl;
    parser_state = READ_HEADER;
}
After this handler, no other handlers get called since the data is in the buffer and no data is left on the device.
What is the correct way to use asio::streambuf such that the handlers of consecutive async_read get called and I can process bytes in order of the state machine? I don't want to process the remaining bytes in onPreamble since it is not guaranteed that these will contain the full message.
You don't need the call to buffer.commit() in the onPreamble() handler. Calling buffer.consume() will remove the header bytes as you expect and leave the remaining bytes (if any were received) in the asio::streambuf for the next read. The streambuf's prepare() and commit() calls are used when you are filling it with data to send to the remote party.
I just finished a blog post and codecast about using the asio::streambuf to perform a simple HTTP GET with a few web servers. It might give you a better idea of how to use async_read_until() and async_read().
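One more thing that may matter here, and this is my own assumption rather than something stated above: as far as I understand it, a completion condition such as transfer_exactly counts bytes newly transferred by that read, not bytes already sitting in the streambuf. If that is the case, the follow-up read has to be sized by what is still missing. The sketch below is plain C++ with no asio calls; bytes_still_needed is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: how many more bytes must come from the device so that
// `wanted` bytes are available, given `buffered` bytes already in the buffer.
std::size_t bytes_still_needed(std::size_t buffered, std::size_t wanted) {
    return buffered >= wanted ? 0 : wanted - buffered;
}
```

With the 3-byte HEADER, two leftover bytes in the buffer mean only one more byte needs to be read. When the result is 0, the handler logic could be run directly on the buffered data instead of starting another read that would wait for bytes the device never sends.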

getline() throws basic_ios::clear exception after reading the last line

I am writing a unit test for file read using qtestlib, C++ (clang LLVM version 8.0). I have the following code for reading a file line by line.
std::ifstream infile;
try {
    infile.open(path.c_str());
    std::ios_base::iostate exceptionMask = infile.exceptions() | std::ios::failbit;
    infile.exceptions(exceptionMask);
} catch (std::ios_base::failure& e) {
    // print the exception
    qDebug() << "Exception caught: " << QString::fromStdString(e.what());
}
try {
    std::string line;
    while (std::getline(infile, line)) {
        // print the line
        qDebug() << QString::fromStdString(line);
    }
} catch (std::ios_base::failure& e) {
    qDebug() << "Exception caught: " << QString::fromStdString(e.what());
}
The issue:
The above code reads all the lines in the file and prints them. But after printing the last line, it throws an exception and prints the following:
Exception caught: "basic_ios::clear"
I followed many threads, but could not find the solution to this. Why am I getting this error?
After you have read and printed all the lines, the while (std::getline(infile, line)) will still try to read another line. If it fails totally - zero characters read - it sets failbit to signal its failure.
The odd part of the error message is that, despite its name, basic_ios::clear can be used to set the failure bit and will also throw an exception if you have enabled the same bit with exceptions.
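Here is a small sketch that shows the effect without a file. It assumes nothing beyond the standard library; count_lines is a hypothetical helper that installs an exception mask, runs the usual getline loop, and reports whether the loop ended with an exception.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts the lines in `in` after installing `mask` as the
// stream's exception mask. `threw` is set to true when the loop ends with
// std::ios_base::failure instead of a normal loop exit.
std::size_t count_lines(std::istream& in, std::ios_base::iostate mask, bool& threw) {
    std::size_t lines = 0;
    threw = false;
    try {
        in.exceptions(mask);
        std::string line;
        while (std::getline(in, line))
            ++lines;                    // the final call reads nothing and sets failbit
    } catch (const std::ios_base::failure&) {
        threw = true;                   // only reached when failbit is in the mask
    }
    return lines;
}
```

With std::ios_base::failbit in the mask, a two-line input is counted correctly and then the exception fires on the third getline call. With only std::ios_base::badbit in the mask, the same loop ends quietly, so enabling badbit alone is one way to keep exceptions for real I/O errors while letting end-of-file terminate the loop.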
Take a look at the documentation of std::getline, in particular the section about setting flags:
failbit
The input obtained could not be interpreted as a valid textual representation of an object of this type. In this case,
distr preserves the parameters and internal data it had before the call. Notice that some eofbit cases will also set failbit.
The last sentence is a bit fuzzy, but can explain observed behavior.
I did some experiments and found the pattern. First I've corrected your code this way:
try {
    std::string line;
    while (std::getline(infile, line)) {
        // print the line
        qDebug() << QString::fromStdString(line);
        if (infile.eof()) {
            return;
        }
    }
} catch (std::ios_base::failure& e) {
    qDebug() << "Exception caught: " << QString::fromStdString(e.what());
}
Now, if the input file ends with an empty line, I get an exception; if the last line doesn't end with "\n", the return breaks out of the loop.
So failbit is set if you try to read from a stream that has already reached its end.
Without the "if" check you always attempt that extra read and always get an exception.
For the case where the last line is empty I have some clues, but no idea how to explain it nicely. First I have to check the behavior on other platforms/compilers.

Whitespace in Protocol Buffer Serialization across HTTP with Qt

I am trying to send a Google Protocol Buffer serialized string across an HTTP connection and receive it back ( unmodified ) where I will deserialize it. My problem seems to be with the 'serializeToString' method which takes my string and seems to add newline characters ( and maybe other whitespace ) to the serialized string. In the example below, I am taking the string "camera_stuff" and after serializing it I get a QString with newlines at the front. I have tried other strings with the same result only with different whitespace and newlines added. This causes problems for my deserializing operation as the whitespace is not captured in the HTTP request and so the response containing the serialized string from the server cannot be successfully decoded. I can partially decode it if I guess the whitespace in the serialized string. How can I solve this? Please see the following code - thanks.
I have a protocol buffer .proto file that looks like:
message myInfo {
    required string data = 1;
    required int32 number = 2;
}
After running the protoc compiler, I construct it in Qt like this:
// Now Construct our Protocol Buffer Data to Send
myInfo testData;
testData.set_data("camera_stuff");
testData.set_number(123456789);
I serialize my data to a string like this:
// Serialize the protocol buffer to a string
std::string serializedProtocolData; // Create a standard string to serialize the protocol buffer contents to
testData.SerializeToString(&serializedProtocolData); // Serialize the contents to the string (called on the testData object, not the myInfo type)
QString serializedProtocolDataAsQString = QString::fromStdString(serializedProtocolData);
And then I print it out like this:
// Print what we are sending
qDebug() << "Sending Serialized String: " << serializedProtocolDataAsQString;
qDebug() << "Sending Serialized String (ASCII): " << serializedProtocolDataAsQString.toAscii();
qDebug() << "Sending Serialized String (UTF8): " << serializedProtocolDataAsQString.toUtf8();
qDebug() << "Sending Serialized Protocol Buffer";
qDebug() << "Data Number: " << QString::fromStdString(testData.data());
qDebug() << "Number: " << (int)testData.number();
When I send my data as part of an HTTP multipart message I see those print statements like this ( notice the newlines in the printouts! ):
Composing multipart message...
Sending Serialized String: "
camera_stuffï:"
Sending Serialized String (ASCII): "
camera_stuffï:"
Sending Serialized String (UTF8): "
camera_stuffÂÂï:"
Sending Serialized Protocol Buffer
Data: "camera_stuff"
Number: 123456789
Length of Protocol Buffer serialized message: 22
Loading complete...
The client deserializes the message like this:
// Now deserialize the protocol buffer
string = "\n\n" + string; // Notice that if I don't add the newlines I get nothing!
myInfo protocolBuffer;
protocolBuffer.ParseFromString(string.toStdString().c_str());
std::cout << "DATA: " << protocolBuffer.model() << std::endl;
std::cout << "NUMBER: " << protocolBuffer.serial() << std::endl;
qDebug() << "Received Serialized String: " << string;
qDebug() << "Received Deserialized Protocol Buffer";
qDebug() << "Data: " << QString::fromStdString(protocolBuffer.data());
qDebug() << "Number: " << (int)protocolBuffer.number();
The server gives it back without doing anything to the serialized string and the client prints the following:
RESPONSE: "camera_stuffï:"
DATA: camera_stu
NUMBER: 0
Received Serialized String: "
camera_stuffï:"
Received Deserialized Protocol Buffer
Number: "camera_stu"
Number: 0
So you see the issue is that I cannot guess the whitespace so I cannot seem to reliably deserialize my string. Any thoughts?
A serialized protobuf cannot be treated as a C string because it probably has embedded NULs in it. It's a binary protocol which uses every possible octet value and can only be sent over an 8-bit clean connection. It's also not a valid UTF-8 sequence, and cannot be serialized and deserialized as Unicode. So QString is also not a valid way of storing a serialized protobuf, and I suspect that might be causing you problems as well.
You can use std::string and QByteArray. I strongly suggest you avoid anything else. In particular, this is wrong:
protocolBuffer.ParseFromString(string.toStdString().c_str());
because it will truncate the protobuf at the first NUL. (Your test message doesn't have any, but this will bite you sooner or later.)
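To see the truncation in isolation, here is a minimal sketch that needs neither protobuf nor Qt. The two helper names are hypothetical; they only contrast copying through a C string with copying by explicit length.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copies `bytes` the way a const char* parameter sees it.
// The copy stops at the first NUL byte.
std::string via_c_string(const std::string& bytes) {
    return std::string(bytes.c_str());
}

// Hypothetical helper: copies `bytes` with an explicit length,
// so embedded NUL bytes survive.
std::string via_length(const std::string& bytes) {
    return std::string(bytes.data(), bytes.size());
}
```

For the five bytes 'a', 'b', NUL, 'c', 'd', via_c_string keeps only "ab" while via_length keeps all five. The same reasoning applies to a serialized message: pass the std::string (or the QByteArray contents with their size) as a whole, never a bare char pointer.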
As for sending the message over HTTP, you need to be able to ensure that all bytes in the message are sent as-is, which also means that you need to send the length explicitly. You didn't include the code which actually transmits and receives the message, so I can't comment on it (and I don't know the Qt HTTP library well enough in any event), but the fact that 0x0A are being deleted from the front of the message suggests that you are missing something. Make sure you set the content-type in the message part correctly (not text, for example).