how to know if a http request is partial and how to fully parse it before generating a response c++


I am working on a C++ project where I listen on sockets and generate HTTP responses based on the requests I get from my clients on my fds. In short, I use my browser to send a request, I receive the raw request, parse it and generate the corresponding HTTP response.
However, in the case of large POST requests, I usually get partial requests: in the first part I typically find only the first line (method/URI/version) and some headers, but no body. I guess I am supposed to get the rest of the body somehow, but I am unable to figure out two things.
First of all, how do I know whether the request I am getting is partial or complete from just the first part? I am not getting any information relating to range. Here's the first part I get when my client sends me a POST request:
POST / HTTP/1.1
Host: localhost:8081
Connection: keep-alive
Content-Length: 8535833
Cache-Control: max-age=0
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Origin: http://127.0.0.1:8081
Upgrade-Insecure-Requests: 1
DNT: 1
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryOs6fsdbaegBIumqh
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Referer: http://127.0.0.1:8081/
Accept-Encoding: gzip, deflate, br
Accept-Language: fr,en-US;q=0.9,en;q=0.8
How can I figure out just from this whether I am getting a partial request or a faulty one? (I need to generate a 400 error in the case of a request that says it has X Content-Length but whose body size is different.)
My second question: suppose I already know whether or not the request is partial, how do I proceed with storing the entire request in a buffer before sending it to my parser and generating a response? Here's my reception function (I already know the client's fd, so I just recv on it):
void Client::receive_request(void)
{
    char buffer[2024];
    int ret;
    ret = recv(_fd, buffer, 2024, 0);
    buffer[ret] = 0;
    _received_request += buffer;
    _bytes_request += ret;
    std::cout << "Raw Request:\n" << _received_request << std::endl;
    if (buffer[ret-1] == '\n')
    {
        _ready_request = true;
        _request.parse(_received_request, _server->get_config());
    }
}
And here's the code that checks whether a client is attempting to send a request, then parses it and generates a response:
int Connections::check_clients() {
    int fd;
    for (std::vector<Client*>::iterator client = clients.begin();
         client != clients.end() && ready_fd != 0 ; client++)
    {
        fd = (*client)->get_fd();
        if (FD_ISSET(fd, &ready_rset))
        {
            ready_fd--;
            (*client)->receive_request();
            if ((*client)->request_is_ready())
            {
                (*client)->wait_response();
                close(fd);
                FD_CLR(fd, &active_set);
                fd_list.remove(fd);
                max_fd = *std::max_element(fd_list.begin(), fd_list.end());
                free(*client);
                client = clients.erase(client);
            }
        }
    }
    return 0;
}
As you can see, I am coding everything in C++ (98) and would rather not get answers that just dismiss my questions and refer me to different technologies or libraries, unless it will help me understand what I am doing wrong and how to handle partial requests.
For info, I am only handling HTTP 1.1 (GET/POST/DELETE only), and I usually only get this issue when I am getting a large chunked file or a file upload that has a very large body. Thank you.
PS: if needed, I can link the GitHub repo of the current project if you want to look further into the code.

how can i figure out just from this whether or not am getting a partial request or just a faulty request (I need to generate a 400 error in the case of a request that says it has X content-length but the body size is different)
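The body size is, by definition, the value of the Content-Length header field (for requests that do not use Transfer-Encoding: chunked). Any bytes that you receive afterwards belong to the next HTTP request (see HTTP pipelining). So a request whose header has arrived but whose body is still shorter than Content-Length is simply partial, not faulty, and you keep calling recv until the remaining bytes arrive. If you do not receive Content-Length bytes within a reasonable time period, then you can make the server issue a 408 Request Timeout error.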
second question is, suppose i already know whether or not its partial, how do i proceed with storing the entire request in a buffer before sending it to my parser and generating a response ? here's my reception function (i already know the client's fd, so i just recv on it
Your posted code has at least the following problems:
You should check the return value of recv to determine whether the function succeeded or failed, and if it failed, you should handle the error appropriately. In your current code, if recv fails with the return value -1, then you will write to the array buffer out of bounds, causing undefined behavior.
It does not seem appropriate to use the line if (buffer[ret-1] == '\n'). The HTTP request header will be over when you encounter a "\r\n\r\n", and the HTTP request body will be over when you have read Content-Length bytes of the body. The ends of the header and body will not necessarily occur at the end of the data read by recv, but can also occur in the middle. If you want to support HTTP pipelining, then the additional data should be handled by the handler for the next HTTP request. If you don't want to support HTTP pipelining, then you can simply discard the additional data and use Connection: close in the HTTP response header.
You seem to be using a null terminating character to mark the end of the data read by recv. However, this will not work if a byte with the value 0 is part of the HTTP request. It is probably safe to assume that such a byte should not be part of the HTTP request header, but it is probably not safe to assume that such a byte won't be part of the HTTP request body (for example when using POST with binary data).
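The points above can be sketched as a completeness check that runs on everything received so far for one connection. This is a minimal sketch under stated assumptions, not the asker's actual parser: the names ParseState and try_extract_request are hypothetical, only Content-Length framing is handled (no Transfer-Encoding: chunked), and overflow of very large lengths is not checked.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch (not from the original code).
enum ParseState { NEED_MORE, COMPLETE, BAD_REQUEST };

// Lowercase ASCII letters so header names compare case-insensitively.
static std::string to_lower(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = static_cast<char>(s[i] - 'A' + 'a');
    return s;
}

// When `buffer` holds a full request (header plus Content-Length bytes of
// body), move it into `request` and leave any extra bytes (for example a
// pipelined request) in `buffer`.
ParseState try_extract_request(std::string& buffer, std::string& request) {
    const std::string::size_type header_end = buffer.find("\r\n\r\n");
    if (header_end == std::string::npos)
        return NEED_MORE;                       // header not finished yet

    const std::string::size_type body_start = header_end + 4;
    const std::string lowered = to_lower(buffer.substr(0, header_end));
    unsigned long body_len = 0;                 // no Content-Length: empty body

    // The leading CRLF ensures we only match the start of a header line.
    const std::string key = "\r\ncontent-length:";
    const std::string::size_type pos = lowered.find(key);
    if (pos != std::string::npos) {
        std::string::size_type i = pos + key.size();
        while (i < lowered.size() && (lowered[i] == ' ' || lowered[i] == '\t'))
            ++i;
        if (i == lowered.size() || lowered[i] < '0' || lowered[i] > '9')
            return BAD_REQUEST;                 // value is not a number
        for (; i < lowered.size() && lowered[i] >= '0' && lowered[i] <= '9'; ++i)
            body_len = body_len * 10 + static_cast<unsigned long>(lowered[i] - '0');
    }

    if (buffer.size() - body_start < body_len)
        return NEED_MORE;                       // body still arriving

    request.assign(buffer, 0, body_start + body_len);
    buffer.erase(0, body_start + body_len);
    return COMPLETE;
}
```

In receive_request, the received bytes would be appended with _received_request.append(buffer, ret) after checking that ret > 0, which keeps bytes with the value 0 intact and needs no terminator. The check is then called after every recv, and the request is handed to the parser only on COMPLETE.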

Related

How to fix issue with post requests from arduino, only being received 2-3 times before failing

EDIT5:
I eventually fixed this issue by more or less throwing away half my code. Rather than sending data to a ruby server using HTTP, I'm now using MQTT to a broker to a NodeJS server. The NodeJS part here isn't important but to anyone else having this issue I STRONGLY recommend sending all IoT data using MQTT, and that's what solved my issue.
I'm currently trying to send data collected from sensors on an Arduino WiFi rev2, to my rails server hosted on Heroku. I do this by sending my data in a JSON format. My problem is that while my methods seem to work initially, with the first few POST requests being received and processed fine, after 2-3 requests the arduino hangs, and I receive status code: -2. I'm using the ArduinoHttpClient library.
I've tried using a local server, which has the same problem, as well as sending the POST request via both curl and Postman. Both curl and Postman seem to work as expected, so I imagine the issue is with the Arduino code, although I can't be sure.
client.beginRequest();
client.post("/input");
client.sendHeader("Content-Type", "application/json");
client.sendHeader("Content-Length", postData.length());
client.beginBody();
client.println(postData);
client.endRequest();
LED(0,128,0);
Serial.println("Gone");
int statusCode = client.responseStatusCode();
String response = client.responseBody();
Serial.print("Status code: ");
Serial.println(statusCode);
Serial.print("Response: ");
Serial.println(response);
When this code fails, the arduino will hang for about 20-40 seconds and I will receive 'status code -3' from the Serial. However I have also received status code -2 and -4 in the past.
When it does succeed I receive the following: "Status code: 204" which is what I would expect.
EDIT:
I've since tried posting to requestcatcher.com, and the problem persisted. I'm therefore fairly confident this is an arduino problem, I also received the following output:
POST /input HTTP/1.1
Host: arduino.requestcatcher.com
Connection: close
Connection: close
Content-Length: 88
Content-Type: application/json
User-Agent: Arduino/2.2.0
{"inputs":[{"input_id":"1","value":1.778002}{"input_id":"2","value":18.037}],"id":"13"}
EDIT 2:
I accidentally discovered that the POST requests go through fine if the "Content-Length:" Header is omitted. Obviously no JSON actually gets sent so this does not fix my issue but it is likely that this header or the JSON itself is the issue.
EDIT 3:
Regardless of server I receive either status code -4, or -3, even on request catcher.
EDIT 4:
After various adjustments, code now looks as below. This seems to have helped a little and it fails less often but still does fail. I'm beginning to wonder if this is a problem with ArduinoHttpClient.
String postData = "";
serializeJson(doc, postData);
serializeJson(doc, Serial);
Serial.println(postData);
client.post("/input", "application/json", postData.c_str());
LED(0,128,0);
Serial.println("Gone");
int statusCode = client.responseStatusCode();
Serial.print("Status code: ");
Serial.println(statusCode);
client.stop();
doc.clear();
lastCycle = millis();

Try replacing
client.beginRequest();
client.post("/input");
client.sendHeader("Content-Type", "application/json");
client.sendHeader("Content-Length", postData.length());
client.beginBody();
client.println(postData);
client.endRequest();
with just
String contentType = "application/json";
client.post("/input", contentType, postData);
or
client.post("/input", "application/json", postData.c_str());
You don't need to explicitly specify the request headers - or call beginRequest(), etc. - when using the post() method(s) in that library.

How can a web server know when an HTTP request is fully received?

I'm currently writing a very simple web server to learn more about low level socket programming. More specifically, I'm using C++ as my main language and I am trying to encapsulate the low level C system calls inside C++ classes with a more high level API.
I have written a Socket class that manages a socket file descriptor and handles opening and closing using RAII. This class also exposes the standard socket operations for a connection oriented socket (TCP) such as bind, listen, accept, connect etc.
After reading the man pages for the send and recv system calls I realized that I needed to call these functions inside some form of loop in order to guarantee that all bytes are successfully sent/received.
My API for sending and receiving looks similar to this
void SendBytes(const std::vector<std::uint8_t>& bytes) const;
void SendStr(const std::string& str) const;
std::vector<std::uint8_t> ReceiveBytes() const;
std::string ReceiveStr() const;
For the send functionality I decided to use a blocking send call inside a loop such as this (it is an internal helper function that works for both std::string and std::vector).
template<typename T>
void Send(const int fd, const T& bytes)
{
    using ValueType = typename T::value_type;
    using SizeType = typename T::size_type;
    const ValueType *const data{bytes.data()};
    SizeType bytesToSend{bytes.size()};
    SizeType bytesSent{0};
    while (bytesToSend > 0)
    {
        const ValueType *const buf{data + bytesSent};
        const ssize_t retVal{send(fd, buf, bytesToSend, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to send."};
        }
        const SizeType sent{static_cast<SizeType>(retVal)};
        bytesSent += sent;
        bytesToSend -= sent;
    }
}
This seems to work fine and guarantees that all bytes are sent once the member function returns without throwing an exception.
However, I started running into problems when I began implementing the receive functionality. For my first attempt I used a blocking recv call inside a loop and exited the loop if recv returned 0 indicating that the underlying TCP connection was closed.
template<typename T>
T Receive(const int fd)
{
    using SizeType = typename T::size_type;
    using ValueType = typename T::value_type;
    T result;
    const SizeType bufSize{1024};
    ValueType buf[bufSize];
    while (true)
    {
        const ssize_t retVal{recv(fd, buf, bufSize, 0)};
        if (retVal < 0)
        {
            throw ch::NetworkError{"Failed to receive."};
        }
        if (retVal == 0)
        {
            break; /* Connection is closed. */
        }
        const SizeType offset{static_cast<SizeType>(retVal)};
        result.insert(std::end(result), buf, buf + offset);
    }
    return result;
}
This works fine as long as the connection is closed by the sender after all bytes have been sent. However, this is not the case when using e.g. Chrome to request a webpage. The connection is kept open and my receive member function is stuck blocked on the recv system call after receiving all bytes in the request. I managed to get around this problem by setting a timeout on the recv call using setsockopt. Basically, I return all bytes received so far once the timeout expires. This feels like a very inelegant solution and I do not think that this is the way web servers handle this issue in reality.
So, on to my question.
How does a web server know when an HTTP request has been fully received?
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.

HTTP/1.1 is a text-based protocol, with binary POST data added in a somewhat hacky way. When writing a "receive loop" for HTTP, you cannot completely separate the data receiving part from the HTTP parsing part. This is because in HTTP, certain characters have special meaning. In particular, the CRLF (0x0D 0x0A) token is used to separate headers, but also to end the request using two CRLF tokens one after the other.
So to stop receiving, you need to keep receiving data until one of the following happens:
Timeout – follow by sending a timeout response
Two CRLF in the request – follow by parsing the request, then respond as needed (parsed correctly? request makes sense? send data?)
Too much data – certain HTTP exploits aim to exhaust server resources like memory or processes (see e.g. slow loris)
And perhaps other edge cases. Also note that this only applies to requests without a body. For POST requests, you first wait for two CRLF tokens, then read Content-Length bytes in addition. And this is even more complicated when the client is using multipart encoding.
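The three stop conditions listed above can be sketched as one decision function that the receive loop calls after every read. This is only an illustration for requests without a body: the names ReadDecision and decide are hypothetical, and the size and idle limits are parameters the caller has to choose.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical outcome of inspecting the bytes received so far.
enum ReadDecision { KEEP_READING, PARSE_REQUEST, REJECT_TIMEOUT, REJECT_TOO_LARGE };

// `received`         : all bytes read so far on this connection
// `idle_seconds`     : time since the last byte arrived (measured by the caller)
// `max_header_bytes` : upper bound on the header size, guards memory use
// `max_idle_seconds` : upper bound on inactivity, guards against slow clients
ReadDecision decide(const std::string& received, double idle_seconds,
                    std::size_t max_header_bytes, double max_idle_seconds) {
    const std::string::size_type end = received.find("\r\n\r\n");
    if (end != std::string::npos) {
        // Two CRLFs in a row: the header is complete.
        if (end + 4 > max_header_bytes)
            return REJECT_TOO_LARGE;
        return PARSE_REQUEST;
    }
    if (received.size() > max_header_bytes)
        return REJECT_TOO_LARGE;   // too much data without a header end
    if (idle_seconds > max_idle_seconds)
        return REJECT_TIMEOUT;     // client went quiet mid-request
    return KEEP_READING;
}
```

A server would typically answer REJECT_TIMEOUT with 408 Request Timeout and REJECT_TOO_LARGE with 431 Request Header Fields Too Large (or simply 400), then close the connection.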

A request header is terminated by an empty line (two CRLFs with nothing between them).
So, when the server has received a request header, and then receives an empty line, and if the request was a GET (which has no payload), it knows the request is complete and can move on to dealing with forming a response. In other cases, it can move on to reading Content-Length worth of payload and act accordingly.
This is a reliable, well-defined property of the syntax.
No Content-Length is required or useful for a GET: the content is always zero-length. A hypothetical Header-Length is more like what you're asking about, but you'd have to parse the header first in order to find it, so it does not exist and we use this property of the syntax instead. As a result of this, though, you may consider adding an artificial timeout and maximum buffer size, on top of your normal parsing, to protect yourself from the occasional maliciously slow or long request.

The solution is within your link
A GET request in HTTP 1.1 does not seem to include a Content-Length header. See e.g. this link.
There it says:
It must use CRLF line endings, and it must end in \r\n\r\n

The answer is formally defined in the HTTP protocol specifications 1:
in W3C's spec for HTTP 0.9.
in RFC 1945 for HTTP 1.0, specifically in Section 4: HTTP Message, Section 5: Request, and Section 7: Entity.
in RFC 2616 for HTTP 1.1, specifically in Section 4: HTTP Message, particular in 4.3: Message Body and 4.4: Message Length.
in RFC 7230 (and 7231...7235) for HTTP 1.1, specifically in Section 3: Message Format, in particular 3.3: Message Body.
So, to summarize, the server first reads the message's initial start-line to determine the request type. If the HTTP version is 0.9, the request is done, as the only supported request is GET without any headers. Otherwise, the server then reads the message's message-headers until a terminating CRLF is reached. Then, only if the request type has a defined message body then the server reads the body according to the transfer format outlined by the request headers (requests and responses are not restricted to using a Content-Length header in HTTP 1.1).
In the case of a GET request, there is no message body defined, so the message ends after the start-line in HTTP 0.9, and after the terminating CRLF of the message-headers in HTTP 1.0 and 1.1.
1: I'm not going to get into HTTP 2.0, which is a whole different ballgame.

boost::asio read - return after all data where read from socket, without waiting for EOF

I'm quite new to boost::asio, I faced one problem I don't really know how to fix, could you please help me.
In general I'm trying to implement proxy based on boost::asio. I'm using async_read_some function to read response from server, something like that:
_ssocket.async_read_some(boost::asio::buffer(_sbuffer),
boost::bind(&connection::handle_server_read_body_some,
shared_from_this(),
boost::asio::placeholders::error,
boost::asio::placeholders::bytes_transferred
));
Everything is fine: it reads a bunch of data and calls the handler. The problem occurs when I call the async_read_some function and there is no more data to read from the socket. The handler is then not called for about 15 seconds, until EOF is raised (i.e. the server socket disconnects). I've tried different read functions, and all of them return only when 1 or more bytes were read or there was some error.
The thing is that sometimes I don't know how many bytes I need to read, so I just need to read everything that is present. I tried to use
boost::asio::socket_base::bytes_readable
or
_ssocket.available(err)
to figure out how many bytes are available on the socket, but those functions return the number of bytes which can be read without blocking, so I can't base my implementation on that. From tests I even see that sometimes bytes_readable returns 0 and the next call of async_read_some on the same socket reads a bunch of data.
My question is: is there any way to get an immediate return (in the case of a synchronous call) or handler call (in the case of async) when there is no more data to read from the socket? Currently it just hangs for 15 seconds until EOF.
I will appreciate any advice or tips you can give me.

There's nothing wrong with your usage of Boost.Asio. The problem is that you need to know how to deal with HTTP messages. Basically, you need to detect the message type and parse the message to determine its length. You cannot rely on the server disconnecting, because HTTP supports keep-alive (the same connection is used for multiple messages). Please read the following quote from RFC 2616:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html
4.4 Message Length
The transfer-length of a message is the length of the message-body as
it appears in the message; that is, after any transfer-codings have
been applied. When a message-body is included with a message, the
transfer-length of that body is determined by one of the following (in
order of precedence):
1. Any response message which "MUST NOT" include a message-body (such as the 1xx, 204, and 304 responses and any response to a HEAD request)
is always terminated by the first empty line after the header fields,
regardless of the entity-header fields present in the message.
2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is
defined by use of the "chunked" transfer-coding (section 3.6), unless
the message is terminated by closing the connection.
3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent if
these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
4. If the message uses the media type "multipart/byteranges", and the transfer-length is not otherwise specified, then this self-delimiting
media type defines the transfer-length. This media type MUST NOT be
used unless the sender knows that the recipient can parse it; the
presence in a request of a Range header with multiple byte-range
specifiers from a 1.1 client implies that the client can parse
multipart/byteranges responses.
A range header might be forwarded by a 1.0 proxy that does not
understand multipart/byteranges; in this case the server MUST
delimit the message using methods defined in items 1,3 or 5 of
this section.
5. By the server closing the connection. (Closing the connection cannot be used to indicate the end of a request body, since that would leave
no possibility for the server to send back a response.)
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
containing a message-body MUST include a valid Content-Length header
field unless the server is known to be HTTP/1.1 compliant. If a
request contains a message-body and a Content-Length is not given, the
server SHOULD respond with 400 (bad request) if it cannot determine
the length of the message, or with 411 (length required) if it wishes
to insist on receiving a valid Content-Length.
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this mechanism
to be used for messages when the message length cannot be determined
in advance.
Messages MUST NOT include both a Content-Length header field and a
non-identity transfer-coding. If the message does include a
non-identity transfer-coding, the Content-Length MUST be ignored.
When a Content-Length is given in a message where a message-body is
allowed, its field value MUST exactly match the number of OCTETs in
the message-body. HTTP/1.1 user agents MUST notify the user when an
invalid length is received and detected.
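Item 2 of the quoted section is the case where no Content-Length is available and the length has to be recovered from the "chunked" transfer-coding itself. A minimal sketch of such a decoder follows. The names ChunkState and decode_chunked are hypothetical, trailer fields are skipped rather than parsed, and for simplicity the function re-decodes from the start of the body on every call.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for this sketch.
enum ChunkState { CHUNK_NEED_MORE, CHUNK_DONE, CHUNK_ERROR };

// Decodes a chunked message body held in `in` (the bytes that follow the
// blank line after the headers). On CHUNK_DONE, `out` holds the decoded
// body; otherwise `out` is left untouched.
ChunkState decode_chunked(const std::string& in, std::string& out) {
    std::string decoded;
    std::string::size_type pos = 0;
    for (;;) {
        // Each chunk starts with a line holding its size in hexadecimal.
        const std::string::size_type line_end = in.find("\r\n", pos);
        if (line_end == std::string::npos)
            return CHUNK_NEED_MORE;
        unsigned long size = 0;
        std::string::size_type i = pos;
        for (; i < line_end; ++i) {
            const char c = in[i];
            int digit;
            if (c >= '0' && c <= '9')      digit = c - '0';
            else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
            else break;
            size = size * 16 + static_cast<unsigned long>(digit);
        }
        if (i == pos)
            return CHUNK_ERROR;            // no hex digits at all
        if (i < line_end && in[i] != ';')
            return CHUNK_ERROR;            // only a chunk extension may follow
        pos = line_end + 2;

        if (size == 0) {
            // Last chunk: skip optional trailer lines up to the empty line.
            for (;;) {
                const std::string::size_type e = in.find("\r\n", pos);
                if (e == std::string::npos)
                    return CHUNK_NEED_MORE;
                if (e == pos) {
                    out.swap(decoded);
                    return CHUNK_DONE;
                }
                pos = e + 2;
            }
        }

        // Chunk data is followed by its own CRLF.
        if (in.size() - pos < size + 2)
            return CHUNK_NEED_MORE;
        if (in.compare(pos + size, 2, "\r\n") != 0)
            return CHUNK_ERROR;
        decoded.append(in, pos, size);
        pos += size + 2;
    }
}
```

A production server would more likely keep the decoder state between reads instead of starting over each time, and would also cap the total decoded size.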

Sending HTML tag to browser via socket connection with C++ Socket API

I am trying to make a simple HTTP server with C++. I've followed Beej's guide to network programming.
When I run the server on some port (8080, 2127, etc.), it successfully sends a response to the browser (Firefox) when accessed via the address bar with localhost:PORT_NUMBER, except on port 80.
This is the code I wrote:
printf("Server: Got connection from %s\n", this->client_ip);
if(!fork()) // This is the child process, fork() -> Copy and run process
{
    close(this->server_socket); // Child doesn't need listener socket
    // Try to send message to client
    char message[] = "\r\nHTTP/1.1 \r\nContent-Type: text/html; charset=ISO-8859-4 \r\n<h1>Hello, client! Welcome to the Virtual Machine Web..</h1>";
    int length = strlen(message); // Plus 1 for null terminator
    int send_res = send(this->connection, message, length, 0); // Flag = 0
    if(send_res == -1)
    {
        perror("send");
    }
    close(this->connection);
    exit(0);
}
close(this->connection); // Parent doesn't need this;
The problem is that, even though I have added the header at the very start of the response string, the browser does not render the HTML properly and shows only plain text instead. It shows something like this:
Content-Type: text/html; charset=ISO-8859-4
<h1>Hello, client! Welcome to the Virtual Machine Web..</h1>
It is not a big "Hello, client!.." string like a normal h1-tagged string. What is the problem? Am I missing something in the header?
Another question: why won't the server run on port 80? The error log on the server says:
server: bind: Permission denied
server: bind: Permission denied
Server failed to bind
libc++abi.dylib: terminate called throwing an exception
Please help. Thank you. Edit: I don't have any process on port 80.

You need to terminate the HTTP response header with \r\n\r\n, rather than just \r\n. It should also start with something more like HTTP/1.1 200 OK\r\n, without the leading \r\n.
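For your port problem, if you have nothing else running on the port in question, you may find that the socket created by the last run of your program is still sticking around. To work around this, you can use setsockopt to set the SO_REUSEADDR flag on the socket. (This is not recommended for general use, I believe because you may receive data not intended for your program, but for development it's extremely handy.) Note, however, that a lingering socket normally produces "Address already in use"; "bind: Permission denied" usually means the process lacks the privilege to bind a port below 1024, which on most Unix-like systems requires running as root.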

Your response starts with \r\n when it shouldn't. It also does not specify a status code, and you need a blank line after all the headers.
char message[] = "HTTP/1.1 200 Okay\r\nContent-Type: text/html; charset=ISO-8859-4 \r\n\r\n<h1>Hello, client! Welcome to the Virtual Machine Web..</h1>";
As for your port 80 issue, some other application may be bound to it.

You should also add a Content-Length header whose value is the length in bytes of your HTML body, like this:
char msg[] = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-length: 20\r\n\r\n<h1>Hello World</h1>";
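Counting the body length by hand is easy to get wrong, so the response can instead be assembled with the length computed from the body. The following is a small sketch along those lines; the function name make_html_response is hypothetical, and it only covers a fixed 200 status with an HTML body.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a complete HTTP/1.1 response for an HTML body. The status line
// comes first, each header line ends in CRLF, and an empty line separates
// the headers from the body.
std::string make_html_response(const std::string& body) {
    std::ostringstream out;
    out << "HTTP/1.1 200 OK\r\n"
        << "Content-Type: text/html\r\n"
        << "Content-Length: " << body.size() << "\r\n"
        << "\r\n"
        << body;
    return out.str();
}
```

The result would then be passed to send() using its size() as the length, rather than strlen, so that a body containing zero bytes is not cut short.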

Broken HTML - browsers don't downloads whole HTTP response from my webserver, CURL does

Symptom
I think I messed something up, because both Mozilla Firefox and Google Chrome produce the same error: they don't receive the whole response the webserver sends them. CURL never misses; the last line of the quick-scrolling response is always "</html>".
Reason
The reason is that I send the response in several parts:
sendHeaders(); // is calls sendResponse with a fix header
sendResponse(html_opening_part);
for ( ...scan some data... ) {
sendResponse(the_data);
} // for
sendResponse(html_closing_part)
The browsers stop receiving data between sendResponse() calls. Also, the webserver does not close() the socket until the very end.
(Why I'm doing it this way: the program I am writing is designed for a non-Linux system; it will run on an embedded computer. It does not have much memory, and most of it is occupied by the lwIP stack. So, to avoid assembling the - relatively - huge webpage in memory, I send it in parts. On that system browsers accept it; the broken HTML only occurs under Linux.)
Environment
The platform is GNU/Linux (Ubuntu 32-bit with 3.0 kernel). My small webserver sends the data back to the client in the standard way:
int sendResponse(char* data,int length) {
    int x = send(fd,data,length,MSG_NOSIGNAL);
    if (x == -1) {
        perror("this message never printed, so there's no error \n");
        if (errno == EPIPE) return 0;
        if (errno == ECONNRESET) return 0;
        ... panic() ... (never happened) ...
    } // if send()
} // sendResponse()
And here's the fixed header I am using:
sendResponse(
"HTTP/1.0 200 OK\n"
"Server: MyTinyWebServer\n"
"Content-Type: text/html; charset=UTF-8\n"
"Cache-Control: no-store, no-cache\n"
"Pragma: no-cache\n"
"Connection: close\n"
"\n"
);
Question
Is this normal? Do I have to send the whole response with a single send()? (Which I'm working on now, until a quick solution arrives.)

If you read RFC 2616, you'll see that you should be using CR+LF for the ends of lines.
Aside from that, open the browser developer tools to see the exact requests they are making. Use a tool like Netcat to duplicate the requests, then eliminate each header in turn until it starts working.

Gotcha!
As @Jim advised, I tried sending the same headers with CURL as Mozilla does: fail, broken pipe, etc. I deleted half of the headers: okay. I added them back one by one: fail. I deleted another half of the headers: okay... So the error occurs only if the header is too long. Bingo.
As I said, there is a very small amount of memory in the embedded device. So I don't read the whole request header, only 256 bytes of it. I need only the GET params and the "Host" header (I don't even really need that, it is just used to perform redirects with the same "Host" instead of the IP address).
So, if I don't recv() the whole request header, I cannot send() back the whole response.
Thanks for your advices, dudes!
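One way to keep the 256-byte memory budget and still consume the whole request header is to scan each received chunk for the terminating blank line and discard whatever is not needed. The sketch below assumes that leaving unread request bytes on the socket is what causes the connection to be reset when the server closes it, which is likely on Linux but was not verified here. The name HeaderEndScanner is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Tracks how much of the "\r\n\r\n" terminator has been seen so far, so the
// match still works when the terminator is split across recv() calls.
struct HeaderEndScanner {
    int matched;  // number of terminator bytes matched so far (0..4)

    HeaderEndScanner() : matched(0) {}

    // Feed one received chunk. Returns true once the blank line that ends
    // the request header has been seen, in this chunk or an earlier one.
    bool feed(const char* data, std::size_t len) {
        static const char term[] = "\r\n\r\n";
        for (std::size_t i = 0; i < len && matched < 4; ++i) {
            if (data[i] == term[matched])
                ++matched;
            else
                matched = (data[i] == '\r') ? 1 : 0;  // restart the match
        }
        return matched == 4;
    }
};
```

The server would keep the first 256 bytes for its own parsing, then continue to recv() into a small scratch buffer and pass each chunk to feed() until it returns true, and only then start sending the response.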
