boost::asio read - return after all data was read from socket, without waiting for EOF - c++

I'm quite new to boost::asio and have run into a problem I don't know how to fix. Could you please help me?
In general, I'm trying to implement a proxy based on boost::asio. I'm using the async_read_some function to read the response from the server, something like this:
_ssocket.async_read_some(boost::asio::buffer(_sbuffer),
    boost::bind(&connection::handle_server_read_body_some,
        shared_from_this(),
        boost::asio::placeholders::error,
        boost::asio::placeholders::bytes_transferred
    ));
Everything is fine: it reads a chunk of data and calls the handler. The problem occurs when I call async_read_some and there is no more data to read from the socket. The handler is then not called for about 15 seconds, until EOF is raised (that is, until the server socket disconnects). I've tried different read functions, and all of them return only when 1 or more bytes were read or there was an error.
The thing is that sometimes I don't know how many bytes I need to read, so I just need to read everything that is present. I tried to use
boost::asio::socket_base::bytes_readable
or
_ssocket.available(err)
to figure out how many bytes are available on the socket, but those functions return the number of bytes that can be read without blocking, so I can't base my implementation on that. In my tests I even see that bytes_readable sometimes returns 0, and the next call of async_read_some on the same socket reads a chunk of data.
My question is: is there any way to get an immediate return (for a synchronous call) or handler call (for an async call) when there is no more data to read from the socket? Currently it just hangs for 15 seconds until EOF.
I would appreciate any advice or tips you can give me.

There's nothing wrong with your usage of Boost.Asio. The problem is that you need to know how to deal with HTTP messages. Basically, you need to detect the message type and parse the message to determine its length. You cannot rely on the server disconnecting, because HTTP supports keep-alive (the same connection is reused for multiple messages). Please read the following quote from RFC 2616:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
4.4 Message Length
The transfer-length of a message is the length of the message-body as
it appears in the message; that is, after any transfer-codings have
been applied. When a message-body is included with a message, the
transfer-length of that body is determined by one of the following (in
order of precedence):
1. Any response message which "MUST NOT" include a message-body (such as the 1xx, 204, and 304 responses and any response to a HEAD request)
is always terminated by the first empty line after the header fields,
regardless of the entity-header fields present in the message.
2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6), unless
the message is terminated by closing the connection.
3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent if
these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
4. If the message uses the media type "multipart/byteranges", and the transfer-length is not otherwise specified, then this self-delimiting
media type defines the transfer-length. This media type MUST NOT be
used unless the sender knows that the recipient can parse it; the
presence in a request of a Range header with multiple byte-range
specifiers from a 1.1 client implies that the client can parse
multipart/byteranges responses.
A range header might be forwarded by a 1.0 proxy that does not
understand multipart/byteranges; in this case the server MUST
delimit the message using methods defined in items 1,3 or 5 of
this section.
5. By the server closing the connection. (Closing the connection cannot be used to indicate the end of a request body, since that would leave
no possibility for the server to send back a response.)
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
containing a message-body MUST include a valid Content-Length header
field unless the server is known to be HTTP/1.1 compliant. If a
request contains a message-body and a Content-Length is not given, the
server SHOULD respond with 400 (bad request) if it cannot determine
the length of the message, or with 411 (length required) if it wishes
to insist on receiving a valid Content-Length.
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this mechanism
to be used for messages when the message length cannot be determined
in advance.
Messages MUST NOT include both a Content-Length header field and a
non-identity transfer-coding. If the message does include a non-identity
transfer-coding, the Content-Length MUST be ignored.
When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.
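The precedence rules quoted above can be sketched as a small classifier that a proxy could run once the response header block has arrived. This is only an illustration under stated assumptions: the names `BodyFraming`, `FramingResult` and `classify_response` are hypothetical, the header block is assumed to start with the status line, rule 4 (multipart/byteranges) is deliberately left out, and malformed headers are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// How the end of a response body is signalled (RFC 2616, section 4.4).
enum class BodyFraming { None, Chunked, ContentLength, UntilClose };

struct FramingResult {
    BodyFraming framing;
    std::size_t content_length;  // meaningful only for ContentLength
};

// Lower-case an ASCII string so header names compare case-insensitively.
inline std::string to_lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Strip leading and trailing spaces, tabs and carriage returns.
inline std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// header_block: status line plus header lines, without the final blank line.
// status: numeric status code; is_head: true if the request was HEAD.
inline FramingResult classify_response(const std::string& header_block,
                                       int status, bool is_head) {
    // Rule 1: responses that must not carry a body.
    if (is_head || (status >= 100 && status < 200) || status == 204 || status == 304)
        return {BodyFraming::None, 0};

    bool chunked = false;
    bool has_length = false;
    std::size_t length = 0;

    std::istringstream lines(header_block);
    std::string line;
    std::getline(lines, line);  // skip the status line
    while (std::getline(lines, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string name = to_lower(trim(line.substr(0, colon)));
        const std::string value = trim(line.substr(colon + 1));
        if (name == "transfer-encoding" && to_lower(value) != "identity") {
            chunked = true;
        } else if (name == "content-length" && !value.empty()) {
            has_length = true;
            length = 0;
            for (char c : value) {
                if (!std::isdigit(static_cast<unsigned char>(c))) { has_length = false; break; }
                length = length * 10 + static_cast<std::size_t>(c - '0');
            }
        }
    }
    if (chunked) return {BodyFraming::Chunked, 0};                    // rule 2 beats rule 3
    if (has_length) return {BodyFraming::ContentLength, length};      // rule 3
    return {BodyFraming::UntilClose, 0};                              // rule 5
}
```

Only in the `UntilClose` case does the proxy have to wait for EOF; in the other cases it knows from the headers when the message ends and does not need to issue another read that would block until the server disconnects.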

Related

How can a web server know when an HTTP request is fully received?

I'm currently writing a very simple web server to learn more about low level socket programming. More specifically, I'm using C++ as my main language and I am trying to encapsulate the low level C system calls inside C++ classes with a more high level API.
I have written a Socket class that manages a socket file descriptor and handles opening and closing using RAII. This class also exposes the standard socket operations for a connection oriented socket (TCP) such as bind, listen, accept, connect etc.
After reading the man pages for the send and recv system calls I realized that I needed to call these functions inside some form of loop in order to guarantee that all bytes are successfully sent/received.
My API for sending and receiving looks similar to this
void SendBytes(const std::vector<std::uint8_t>& bytes) const;
void SendStr(const std::string& str) const;
std::vector<std::uint8_t> ReceiveBytes() const;
std::string ReceiveStr() const;
For the send functionality I decided to use a blocking send call inside a loop such as this (it is an internal helper function that works for both std::string and std::vector).
template<typename T>
void Send(const int fd, const T& bytes)
{
    using ValueType = typename T::value_type;
    using SizeType = typename T::size_type;
    const ValueType *const data{bytes.data()};
    SizeType bytesToSend{bytes.size()};
    SizeType bytesSent{0};
    while (bytesToSend > 0)
    {
        const ValueType *const buf{data + bytesSent};
        const ssize_t retVal{send(fd, buf, bytesToSend, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to send."};
        }
        const SizeType sent{static_cast<SizeType>(retVal)};
        bytesSent += sent;
        bytesToSend -= sent;
    }
}
This seems to work fine and guarantees that all bytes are sent once the member function returns without throwing an exception.
However, I started running into problems when I began implementing the receive functionality. For my first attempt I used a blocking recv call inside a loop and exited the loop if recv returned 0 indicating that the underlying TCP connection was closed.
template<typename T>
T Receive(const int fd)
{
    using SizeType = typename T::size_type;
    using ValueType = typename T::value_type;
    T result;
    const SizeType bufSize{1024};
    ValueType buf[bufSize];
    while (true)
    {
        const ssize_t retVal{recv(fd, buf, bufSize, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to receive."};
        }
        if (retVal == 0)
        {
            break; /* Connection is closed. */
        }
        const SizeType offset{static_cast<SizeType>(retVal)};
        result.insert(std::end(result), buf, buf + offset);
    }
    return result;
}
This works fine as long as the connection is closed by the sender after all bytes have been sent. However, this is not the case when using e.g. Chrome to request a webpage. The connection is kept open and my receive member function is stuck blocked on the recv system call after receiving all bytes in the request. I managed to get around this problem by setting a timeout on the recv call using setsockopt. Basically, I return all bytes received so far once the timeout expires. This feels like a very inelegant solution and I do not think that this is the way web servers handle this issue in reality.
So, on to my question.
How does a web server know when an HTTP request has been fully received?
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.
HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data receiving part from the HTTP parsing part. This is because in HTTP, certain characters have special meaning. In particular, the CRLF (0x0D 0x0A) token is used to separate headers, but also to end the request using two CRLF tokens one after the other.
So to stop receiving, you need to keep receiving data until one of the following happens:
Timeout – followed by sending a timeout response
Two CRLF in the request – followed by parsing the request, then responding as needed (parsed correctly? request makes sense? send data?)
Too much data – certain HTTP exploits aim to exhaust server resources like memory or processes (see e.g. slow loris)
And perhaps other edge cases. Also note that this only applies to requests without a body. For POST requests, you first wait for two CRLF tokens, then read Content-Length bytes in addition. And this is even more complicated when the client is using multipart encoding.
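The stop conditions above (blank line seen, too much data) can be sketched as a check that the receive loop runs on its accumulated buffer after every recv call. The names `HeaderStatus`, `HeaderScan` and `scan_for_header_end` are hypothetical, and the size limit is an arbitrary value chosen by the caller; the timeout case would still be handled separately, for example with the socket timeout mentioned in the question.

```cpp
#include <cstddef>
#include <string>

enum class HeaderStatus { Incomplete, Complete, TooLarge };

struct HeaderScan {
    HeaderStatus status;
    std::size_t header_size;  // bytes up to and including "\r\n\r\n" when Complete
};

// Inspect the bytes received so far and report whether the blank line
// that terminates the request header has arrived.
inline HeaderScan scan_for_header_end(const std::string& received,
                                      std::size_t max_header_bytes) {
    const std::size_t pos = received.find("\r\n\r\n");
    if (pos != std::string::npos && pos + 4 <= max_header_bytes)
        return {HeaderStatus::Complete, pos + 4};
    // Either the terminator lies beyond the limit, or the limit was
    // reached without seeing a terminator at all.
    if (pos != std::string::npos || received.size() >= max_header_bytes)
        return {HeaderStatus::TooLarge, 0};
    return {HeaderStatus::Incomplete, 0};  // keep calling recv
}
```

A receive loop would append each recv result to the buffer, call this function, and stop on `Complete` (parse the header, then read any body) or `TooLarge` (reject the request). Anything in the buffer past `header_size` already belongs to the body or to the next request.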
A request header is terminated by an empty line (two CRLFs with nothing between them).
So, when the server has received a request header, and then receives an empty line, and if the request was a GET (which has no payload), it knows the request is complete and can move on to dealing with forming a response. In other cases, it can move on to reading Content-Length worth of payload and act accordingly.
This is a reliable, well-defined property of the syntax.
No Content-Length is required or useful for a GET: the content is always zero-length. A hypothetical Header-Length is more like what you're asking about, but you'd have to parse the header first in order to find it, so it does not exist and we use this property of the syntax instead. As a result of this, though, you may consider adding an artificial timeout and maximum buffer size, on top of your normal parsing, to protect yourself from the occasional maliciously slow or long request.
The solution is within your link
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.
There it says:
It must use CRLF line endings, and it must end in \r\n\r\n
The answer is formally defined in the HTTP protocol specifications 1:
in W3C's spec for HTTP 0.9.
in RFC 1945 for HTTP 1.0, specifically in Section 4: HTTP Message, Section 5: Request, and Section 7: Entity.
in RFC 2616 for HTTP 1.1, specifically in Section 4: HTTP Message, in particular 4.3: Message Body and 4.4: Message Length.
in RFC 7230 (and 7231...7235) for HTTP 1.1, specifically in Section 3: Message Format, in particular 3.3: Message Body.
So, to summarize, the server first reads the message's initial start-line to determine the request type. If the HTTP version is 0.9, the request is done, as the only supported request is GET without any headers. Otherwise, the server then reads the message-headers until a terminating CRLF is reached. Then, only if the request type has a defined message body, the server reads the body according to the transfer format outlined by the request headers (requests and responses are not restricted to using a Content-Length header in HTTP 1.1).
In the case of a GET request, there is no message body defined, so the message ends after the start-line in HTTP 0.9, and after the terminating CRLF of the message-headers in HTTP 1.0 and 1.1.
1: I'm not going to get into HTTP 2.0, which is a whole different ballgame.

Read failed: End of file on successful https request to AWS s3 using boost::asio [duplicate]

I have a server that receives a compressed string (compressed with zlib) from a client, and I was using async_receive from the boost::asio library to receive this string. It turns out, however, that there is no guarantee that all bytes will be received, so I now have to change it to async_read. The problem I face is that the number of bytes received is variable, so I am not sure how to use async_read without knowing the number of bytes to be received. With async_receive I just have a boost::array<char, 1024>; however, this is a buffer that is not necessarily filled completely.
I wondered if anyone can suggest a solution where I can use async_read even though I do not know the number of bytes to be received in advance?
void tcp_connection::start(boost::shared_ptr<ResolverQueueHandler> queue_handler)
{
    if (!_queue_handler.get())
        _queue_handler = queue_handler;
    std::fill(buff.begin(), buff.end(), 0);
    //socket_.async_receive(boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
    boost::asio::async_read(socket_, boost::asio::buffer(buff), boost::bind(&tcp_connection::handle_read, shared_from_this(), boost::asio::placeholders::error));
}
buff is a boost::array<char, 1024>
How were you expecting to do this using any other method?
There are a few general methods for sending data of variable size in an async manner:
By message - meaning that you have a header that defines the length of the expected message followed by a body which contains data of the specified length.
By stream - meaning that you have some marker (and this is very broad) that tells you when you've gotten a complete packet.
By connection - each complete packet of data is sent in a single connection which is closed once the data is complete.
So can your data be parsed, or a length sent etc...
Use async_read_until and create your own match condition, or change your protocol to send a header including the number of bytes to expect in the compressed string.
A single IP packet is limited to an MTU size of ~1500 bytes, and yet still you can download gigabyte-large files from your favourite website, and watch megabyte-sized videos on YouTube.
You need to send a header indicating the actual size of the raw data, and then receive the data piece by piece in smaller chunks until you have received all the bytes.
For example, when you download a large file over HTTP, there is a field on the header indicating the size of the file: Content-Length:.
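The "header with the size, then the data" idea can be sketched independently of any socket library. The example below assumes a hypothetical wire format in which each message is preceded by a 4-byte big-endian length; the names `frame_message` and `try_extract_message` are made up for illustration and are not part of Boost.Asio.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed wire format: 4-byte big-endian payload length, then the payload.
constexpr std::size_t kHeaderSize = 4;

// Sender side: prepend the length header to the payload.
inline std::string frame_message(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Receiver side: `buffer` accumulates whatever the socket delivered so far.
// If it holds at least one complete message, move that message into
// `message`, remove it from `buffer` and return true; otherwise return
// false, meaning more bytes must be read first.
inline bool try_extract_message(std::string& buffer, std::string& message) {
    if (buffer.size() < kHeaderSize) return false;
    std::uint32_t n = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    if (buffer.size() - kHeaderSize < n) return false;
    message = buffer.substr(kHeaderSize, n);
    buffer.erase(0, kHeaderSize + n);
    return true;
}
```

With Boost.Asio the same format could be consumed with two async_read calls, one for a 4-byte buffer and one for a buffer sized to the decoded length, since async_read completes once the supplied buffer is full. The buffer-based version above also works with partial reads such as those from async_receive or async_read_some.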

How to determine length of buffer at client side

I have a server sending a multi-dimensional character array
char buff1[][3] = { {0xff,0xfd,0x18} , {0xff,0xfd,0x1e} , {0xff,0xfd,21} }
In this case the buff1 carries 3 messages (each having 3 characters). There could be multiple instances of buffers on server side with messages of variable length (Note : each message will always have 3 characters). viz
char buff2[][3] = { {0xff,0xfd,0x20},{0xff,0xfd,0x27}}
How should I store the size of these buffers on the client side when compiling the code?
The server should send information about the length (and any other structure) of the message with the message as part of the message.
An easy way to do that is to send the number of bytes in the message first, then the bytes in the message. Often you also want to send the version of the protocol (so you can detect mismatches) and maybe even a message id header (so you can send more than one kind of message).
If blazing fast performance isn't the goal (and you are talking over a network interface, which tends to be slower than computers: parsing may be cheap enough that you don't care), using a higher level protocol or format is sometimes a good idea (json, xml, whatever). This also helps with debugging problems, because instead of debugging your custom protocol, you get to debug the higher level format.
Alternatively, you can send some sign that the sequence has terminated. If there is a value that is never a valid sequence element (such as 0,0,0), you could send that to say "no more data". Or you could send each element with a header saying if it is the last element, or the header could say that this element doesn't exist and the last element was the previous one.
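The terminator approach can be sketched for the 3-byte messages from the question. This assumes, as the answer does, that {0,0,0} never occurs as a real message; the names `Triple`, `encode_triples` and `decode_triples` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Triple = std::array<std::uint8_t, 3>;

// Assumed sentinel: {0,0,0} is never a valid message, so it can mark the end.
constexpr Triple kEnd = {{0, 0, 0}};

// Server side: flatten the messages and append the terminator.
inline std::vector<std::uint8_t> encode_triples(const std::vector<Triple>& messages) {
    std::vector<std::uint8_t> wire;
    for (const Triple& m : messages) wire.insert(wire.end(), m.begin(), m.end());
    wire.insert(wire.end(), kEnd.begin(), kEnd.end());
    return wire;
}

// Client side: collect messages until the terminator is seen.
// Returns false if the terminator has not arrived yet (more bytes needed).
inline bool decode_triples(const std::vector<std::uint8_t>& wire,
                           std::vector<Triple>& messages) {
    messages.clear();
    for (std::size_t i = 0; i + 3 <= wire.size(); i += 3) {
        const Triple t = {{wire[i], wire[i + 1], wire[i + 2]}};
        if (t == kEnd) return true;
        messages.push_back(t);
    }
    return false;
}
```

The client then never needs the buffer size at compile time: it keeps the messages in a std::vector whose length is known only after the terminator arrives. The length-first variant described earlier works the same way, with a count sent up front instead of a sentinel at the end.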

C/C++ - implementing http protocol for PUT request

I'm writing an HTTP server and I just have a question about how to implement a PUT request.
I am reading a client socket one byte at a time, until I reach a CRLF "\r\n" new line, where I send the line to a parser to be tokenized. When I get two line breaks in a row, I send a response (as it is the http standard to symbolize that the request is finished).
This was fine for implementing GET/HEAD/DELETE. But now I see PUT has the double line break for the content.
PUT /index.html HTTP/1.0
Headers: stuff <--- not the real CRLF 1
<--- not the real CRLF 2
html content goes here <--- CRLF 1
<--- CRLF 2 ... done, send response
That is easy enough to account for. If the first line I parse is PUT, I will just say okay, don't send a response until we get the second CRLF 1+2.
But what if the content has line breaks too, then how can I know when the client is -really- done sending me stuff?
The client should send a Content-Length header field. For a more in-depth discussion, see RFC 2616 section 4.4, Message Length.
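A minimal sketch of how a server might use Content-Length to decide that a PUT request is complete, regardless of any line breaks inside the body. The function names are hypothetical, a missing Content-Length is treated as a zero-length body, and chunked transfer-coding is not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the Content-Length value found in `header`, or 0 if there is none.
// `header` is the request line plus header lines, without the blank line.
inline std::size_t content_length_of(const std::string& header) {
    std::string lower(header);
    for (char& c : lower)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    const std::string key = "\r\ncontent-length:";
    const std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return 0;
    std::size_t i = pos + key.size();
    while (i < lower.size() && (lower[i] == ' ' || lower[i] == '\t')) ++i;
    std::size_t value = 0;
    while (i < lower.size() && std::isdigit(static_cast<unsigned char>(lower[i]))) {
        value = value * 10 + static_cast<std::size_t>(lower[i] - '0');
        ++i;
    }
    return value;
}

// True once the header and the whole body are present in `received`.
// Blank lines inside the body are irrelevant: only the byte count matters.
inline bool request_complete(const std::string& received) {
    const std::size_t header_end = received.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;  // header not finished
    const std::size_t body_start = header_end + 4;
    const std::size_t length = content_length_of(received.substr(0, header_end));
    return received.size() - body_start >= length;
}
```

Reading one byte at a time is then only needed, if at all, up to the blank line that ends the header; after that the server can read exactly the announced number of body bytes in larger chunks.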

Progress indication with HTTP file download using WinHTTP

I want to implement a progress bar in my C++ Windows application when downloading a file using WinHTTP. Any idea how to do this? It looks as though WinHttpSetStatusCallback is what I want to use, but I don't see what notification to look for... or how to get the "percent downloaded"...
Help!
Thanks!
Per the docs:
WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE
Data is available to be retrieved with
WinHttpReadData. The
lpvStatusInformation parameter points
to a DWORD that contains the number of
bytes of data available. The
dwStatusInformationLength parameter
itself is 4 (the size of a DWORD).
and
WINHTTP_CALLBACK_STATUS_READ_COMPLETE
Data was successfully read from the
server. The lpvStatusInformation
parameter contains a pointer to the
buffer specified in the call to
WinHttpReadData. The
dwStatusInformationLength parameter
contains the number of bytes read.
There may be other relevant notifications, but these two seem to be the key ones. Getting "percent" is not necessarily trivial because you may not know how much data you're getting (not all downloads have content-length set...); you can get the headers with:
WINHTTP_CALLBACK_STATUS_HEADERS_AVAILABLE
The response header has been received
and is available with
WinHttpQueryHeaders. The
lpvStatusInformation parameter is
NULL.
and if Content-Length IS available then the percentage can be computed by keeping track of the total number of bytes at each "data available" notification, otherwise your guess is as good as mine;-).
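The percentage arithmetic itself is small enough to sketch without any WinHTTP calls. The helper below is hypothetical; it assumes the caller has accumulated the byte counts reported by the read notifications and has already extracted Content-Length from the response headers, passing 0 when the header is absent.

```cpp
#include <cstdint>

// Percentage in the range 0..100, or -1 when the total size is unknown
// (content_length == 0 is used here to mean "no Content-Length header").
inline int percent_downloaded(std::uint64_t bytes_received,
                              std::uint64_t content_length) {
    if (content_length == 0) return -1;
    if (bytes_received >= content_length) return 100;
    return static_cast<int>((bytes_received * 100) / content_length);
}
```

A status callback could add the byte count from each read notification to a running total and pass that total to this function. When it returns -1, an indeterminate (marquee-style) progress bar is probably the honest choice.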