HTTP keep-alive with C++ recv winsocket2 - c++

I'm coding my own HTTP fetcher socket, using C++ in MS Visual C++ with winsock2.h.
I was able to program the socket to connect to the required website's server and send an HTTP GET request.
The problem is that after I send an HTTP GET request with a keep-alive connection and call the recv function, it works fine, except that after it retrieves the website it stays blocked, waiting for the server's timeout or for the connection to close.
This takes a few seconds or less, depending on the keep-alive timeout the server has.
Therefore, I can't benefit from the HTTP keep-alive setting.
How can I tell the recv function to stop after retrieving the website and give control back to me, so I can send another HTTP request and avoid another handshake?
When I use non-blocking sockets it works faster, but I don't know when to stop. I use str.rfind("</html>",-1,7) to decide when to stop retrieving data.
However, that is not very efficient.
Does anybody know a way to do it, or what the last character sent by the HTTP server is when the connection is kept alive, so I can use it as a stopping condition?
Best,
Moe

Check for a Content-Length: xxxxx header, and only read xxxxx bytes after the header, which is terminated by a blank line (CR-LF-CR-LF in stream).
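A minimal sketch of this check, assuming the bytes returned by recv are appended to a std::string. The name message_size is a hypothetical helper, not part of Winsock. It reports how many bytes the complete response occupies, so the caller can stop calling recv once the buffer has grown to that size:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: if `buf` holds a complete header block with a
// Content-Length header, stores the size of the whole message (headers plus
// body) in `total` and returns true. Returns false if the header block is
// still incomplete or has no usable Content-Length.
bool message_size(const std::string& buf, std::size_t& total) {
    const std::size_t header_end = buf.find("\r\n\r\n");
    if (header_end == std::string::npos) return false;  // keep reading

    // Header names are case-insensitive, so search a lower-cased copy.
    std::string headers = buf.substr(0, header_end);
    for (char& c : headers)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    const std::string key = "\r\ncontent-length:";
    const std::size_t pos = headers.find(key);
    if (pos == std::string::npos) return false;

    // Skip optional whitespace, then read the decimal length.
    std::size_t i = pos + key.size();
    while (i < headers.size() && (headers[i] == ' ' || headers[i] == '\t')) ++i;
    if (i == headers.size() || !std::isdigit(static_cast<unsigned char>(headers[i])))
        return false;

    std::size_t length = 0;
    for (; i < headers.size() && std::isdigit(static_cast<unsigned char>(headers[i])); ++i)
        length = length * 10 + static_cast<std::size_t>(headers[i] - '0');

    total = header_end + 4 + length;  // 4 = size of the CR-LF-CR-LF terminator
    return true;
}
```

The receive loop would keep appending to the buffer until message_size succeeds and the buffer holds at least that many bytes; the connection can then be reused for the next request.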
update
If the data is chunked:
Chunked Transfer-Encoding (reference)
...
A chunked message body contains a series of chunks, followed by a line with "0" (zero), followed by optional footers (just like headers), and a blank line. Each chunk consists of two parts:
a line with the size of the chunk data, in hex, possibly followed by a semicolon and extra parameters you can ignore (none are currently standard), and ending with CRLF.
the data itself, followed by CRLF.
Also, the W3C's description of chunked transfer encoding is in section 3.6.1 of RFC 2616: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
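The chunk layout described above can be sketched as a small decoder. This is an illustration under assumptions: decode_chunked is a hypothetical name, the input is everything received after the header block, and size parsing is lenient because it relies on strtoul. It returns true only once the zero-size chunk and the final blank line have arrived, which is the signal to stop reading:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Decodes a chunked body held in `in`. Returns true once the terminating
// zero-size chunk and the final blank line are present; `body` then holds
// the decoded data. Returns false if more bytes are needed or the input is
// malformed (on false, `body` may hold a partial result).
bool decode_chunked(const std::string& in, std::string& body) {
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        const std::size_t line_end = in.find("\r\n", pos);
        if (line_end == std::string::npos) return false;  // size line incomplete

        // strtoul stops at the first non-hex character, which also skips
        // any ";extension" parameters after the size.
        const std::string size_line = in.substr(pos, line_end - pos);
        char* stop = nullptr;
        const unsigned long size = std::strtoul(size_line.c_str(), &stop, 16);
        if (stop == size_line.c_str()) return false;      // no hex digits at all
        pos = line_end + 2;

        if (size == 0) {
            // Skip optional footers until the final blank line.
            for (;;) {
                const std::size_t end = in.find("\r\n", pos);
                if (end == std::string::npos) return false;
                const bool blank = (end == pos);
                pos = end + 2;
                if (blank) return true;
            }
        }

        // The chunk data must be followed by its own CRLF.
        const std::size_t available = in.size() - pos;
        if (available < 2 || available - 2 < size) return false;  // chunk incomplete
        body.append(in, pos, size);
        if (in.compare(pos + size, 2, "\r\n") != 0) return false;
        pos += size + 2;
    }
}
```

Re-parsing the whole buffer on every call is simple but wasteful for large bodies; a production parser would keep its position between reads.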

Set the non-blocking I/O flag on the socket, so that recv will return immediately with only as much data as has already been received. Combine this with select, WSAEventSelect, WSAAsyncSelect, or completion ports to be informed when data arrives (instead of busy-waiting).

Related

Handle a blocking recv() without knowing the message length in advance and without using async I/O

I am creating a very simple server that accepts an HTTP request from a browser (Safari) and sends back a dummy HTTP response such as a "Hello World" message.
My program blocks in recv() because it doesn't know whether the client (browser) has finished sending the HTTP request, and recv() is a blocking function. (A very typical question.)
The most popular answer I found is to send the length of the message before sending the message.
This solution is good, but it doesn't work for me because I have no control over what the client sends. And as far as I know, the browser does not send any message length before sending the real message.
The second most popular answer is to use async I/O such as select() or poll(). But, personally, I don't think it is a good strategy, because once I have received the whole request from the client I would, of course, like to go on to the next step and handle the request. Why would I still waste time and resources waiting for something that will never come, even if the wait is no longer blocking? (Creating threads raises a similar question.)
The solution I came up with is to check whether the size of the data received equals the buffer size. For example, let's say I set recvBufferSize to 32 and the total size of the request message is 70. Then I will receive three packets of size 32, 32, and 6 respectively.
I can tell that the client has finished sending the request because the last packet's size is not equal to recvBufferSize (32).
However, as you can see, a problem occurs when the request message's size is 64, 96, 128, and so on.
Other approaches might include setting a timeout, but I am not sure whether they are good or not.
I also want to build everything myself, so I am not interested in libraries such as zeromq or Boost.Asio.
Can anyone give some advice on my approach or suggest better ways to solve the problem? Thanks a lot!
If you're implementing the HTTP protocol you need to study the HTTP RFCs. There are several different ways you can know the request length, starting with the Content-Length header, and the combined lengths of the chunks if the client is using chunked transfer encoding. A request with neither, such as a typical browser GET, has no body and ends at the blank line after the headers.
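To illustrate why looking at the content beats the "short read" heuristic from the question, here is a small sketch. Both function names are hypothetical, and reads_until_complete only simulates a recv() loop by slicing a string into pieces of at most buf_size bytes; it assumes a request without a body, which ends at the blank line after the headers:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// True once `request` contains the blank line that ends the header block.
// For a request without a body (such as a GET) this is the end of the
// whole request.
bool headers_complete(const std::string& request) {
    return request.find("\r\n\r\n") != std::string::npos;
}

// Stand-in for a recv() loop: delivers `wire` in pieces of at most
// `buf_size` bytes and returns how many reads were performed before the
// request was recognized as complete (or the data ran out).
std::size_t reads_until_complete(const std::string& wire, std::size_t buf_size) {
    assert(buf_size > 0);
    std::string request;
    std::size_t reads = 0;
    for (std::size_t off = 0;
         off < wire.size() && !headers_complete(request);
         off += buf_size) {
        request.append(wire, off, std::min(buf_size, wire.size() - off));
        ++reads;
    }
    return reads;
}
```

With a 64-byte request and a 32-byte buffer, the loop stops after two full-size reads, so there is no third recv() call that would block waiting for data that never comes.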

When is Keep-alive required for TCP Sockets?

As far as I know, keep-alive on a TCP socket helps to determine whether a connection between two sockets is actually still alive, rather than the sockets merely being open. So I have a couple of questions regarding the usage of keep-alive in Winsock2:
What happens when keep-alive option detects a dead socket?
How can I check if connection is alive or dead without actually using the send and recv?
If I have to use send and recv functions then what's the point of using keep-alive in the first place?
What happens when keep-alive option detects a dead socket?
The connection is reset, and any reads or writes get a 'connection reset' error. Note that keepalive is off by default, and when enabled only operates at two-hour intervals by default.
How can I check if connection is alive or dead without actually using the send and recv?
You can't. TCP/IP is deliberately designed not to have a 'dial tone'. It works much better that way. This is a major reason why it has displaced all the prior protocols such as SNA that did.
If I have to use send and recv functions then what's the point of using keep-alive in the first place?
recv() won't tell you about a broken connection. It may just block forever. You can use read timeouts, but then you have to decide how much time is too much. Or, you can implement an application-level PING.
Keep-alive detects whether the peer at the other end of the connection has died, or whether a physical link such as a network has gone down, before you send a message. Otherwise the disconnection is only detected when you actually try to send data, which could take a long time if your connection is idle for some reason.
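Since keep-alive is off by default, it has to be enabled per socket. A minimal sketch using the SO_KEEPALIVE socket option, which exists in both Winsock and POSIX sockets; the socket_handle typedef and the two function names are assumptions made for this example. On Windows the program must call WSAStartup before creating sockets and link against Ws2_32.lib:

```cpp
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_handle;
typedef int option_len;
#else
#include <sys/socket.h>
typedef int socket_handle;
typedef socklen_t option_len;
#endif

// Turns on TCP keep-alive probes for `s`. Returns true on success.
bool enable_keepalive(socket_handle s) {
    const int on = 1;
    return setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
                      reinterpret_cast<const char*>(&on), sizeof(on)) == 0;
}

// Reads the option back; returns true if keep-alive is currently enabled.
bool keepalive_enabled(socket_handle s) {
    int value = 0;
    option_len len = sizeof(value);
    if (getsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
                   reinterpret_cast<char*>(&value), &len) != 0)
        return false;
    return value != 0;
}
```

Enabling the option only changes how the TCP stack behaves; a dead peer still shows up as an error on a later send or recv, as described above.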

Are TCP packets reordered usually?

I am reimplementing an old network layer library, this time using Boost.Asio. Our software communicates over TCP/IP with a third-party application. Several messages behave very well on both sides, but there is one case I don't understand:
The third party sends two messages (A and B) one right after the other, but I receive only part of message A in TCP packet 1, and the end of message A plus the whole of message B in TCP packet 2. (I sniffed this with Wireshark.)
I had not thought of this case. I am wondering whether it is common with TCP and whether my layer should adapt to it, or whether I should ask the third party to check on their side so that I receive the two messages in separate packets.
Packets can be fragmented and arrive out-of-sequence. The TCP stack which receives them should buffer and reorder them, before presenting the data as an incoming stream to the application layer.
My problem is with message B, which I don't see because it comes after the end of message A in the same packet.
You can't rely on "messages" having a one-to-one mapping to "packets": to the application, TCP (not UDP) looks like a "streaming" protocol.
An application which sends via TCP needs another way to separate messages. Sometimes that's done by marking the end of each message. For example SMTP marks the end-of-message as follows:
The transmission of the body of the mail message is initiated with a DATA command after which it is transmitted verbatim line by line and is terminated with an end-of-data sequence. This sequence consists of a new-line (<CR><LF>), a single full stop (period), followed by another new-line. Since a message body can contain a line with just a period as part of the text, the client sends two periods every time a line starts with a period; correspondingly, the server replaces every sequence of two periods at the beginning of a line with a single one. Such escaping method is called dot-stuffing.
Alternatively, the protocol might specify a prefix at the start of each message, which will indicate the message-length in bytes.
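A length prefix can be handled with a small buffer class. The sketch below assumes a made-up wire format (a 4-byte big-endian length followed by the payload); LengthPrefixedFramer and frame are hypothetical names, and a real protocol would define its own header:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Accumulates raw bytes from the stream and hands back complete messages.
// Assumed wire format: 4-byte big-endian length, then that many payload bytes.
class LengthPrefixedFramer {
public:
    // Appends bytes exactly as they came off the socket.
    void feed(const std::string& bytes) { buffer_ += bytes; }

    // Moves the next complete message into `out`; returns false if the
    // buffer does not yet hold a whole message.
    bool next(std::string& out) {
        if (buffer_.size() < 4) return false;               // prefix incomplete
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i)
            len = (len << 8) | static_cast<unsigned char>(buffer_[i]);
        if (buffer_.size() - 4 < len) return false;         // payload incomplete
        out = buffer_.substr(4, len);
        buffer_.erase(0, 4 + static_cast<std::size_t>(len));
        return true;
    }

private:
    std::string buffer_;
};

// Builds one framed message (used by a sender, and handy for testing).
std::string frame(const std::string& payload) {
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((len >> shift) & 0xFF));
    return out + payload;
}
```

Feeding it the first part of message A yields nothing; feeding the rest of A together with B then yields both messages, one per call to next, regardless of how the bytes were split across packets.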
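If you are coding the TCP stack itself, then you'll have access to the TCP segment header: the "Data offset" field tells you where the data starts in each segment (it is the header length in 32-bit words), but it still says nothing about where application messages begin or end.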
Yes, this is common. TCP/IP is a streaming protocol and your "logical" packet may be split across many "physical" packets, so the client is responsible for assembling the higher-level packets. Additionally, TCP/IP guarantees the proper ordering, so you don't have to worry about assembling out of order packets.
Your problem has nothing to do with TCP at all. Your problem is that you expected Asio to do the message parsing for you. It does not; you have to implement it.
If your messages are all the same size, do an async read for that size.
If they are of different lengths, do an async read for your header size, analyze the header, and do an async read for the rest of the message according to the header.
If your messages are of variable length and the size is unknown, but there is a defined end character or sequence, then you have to save the bytes remaining after that end sequence and append the next read to that remainder.
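The last case, keeping the remainder after an end sequence, can be sketched without any networking code. DelimiterFramer is a hypothetical class for this illustration; each completed read would be passed to feed:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a byte stream into messages that end with `delimiter`, keeping any
// trailing partial message for the next call.
class DelimiterFramer {
public:
    explicit DelimiterFramer(const std::string& delimiter) : delimiter_(delimiter) {
        assert(!delimiter_.empty());
    }

    // Appends newly read bytes and returns every message completed so far,
    // with the delimiter stripped.
    std::vector<std::string> feed(const std::string& bytes) {
        remainder_ += bytes;
        std::vector<std::string> messages;
        std::size_t start = 0;
        for (;;) {
            const std::size_t end = remainder_.find(delimiter_, start);
            if (end == std::string::npos) break;
            messages.push_back(remainder_.substr(start, end - start));
            start = end + delimiter_.size();
        }
        remainder_.erase(0, start);  // keep only the unfinished tail
        return messages;
    }

private:
    std::string delimiter_;
    std::string remainder_;
};
```

Because the search runs over the remainder plus the new bytes, a delimiter that is itself split across two reads is still found.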

How to buffer and process chunked data before sending headers in IIS7 Native Module

So I've been working on porting an IIS6 ISAPI module to IIS7. One problem that I have is that I need to be able to parse and process responses, and then change/delete/add some HTTP headers based on the content. I got this working fine for most content, but it appears to break down when chunked encoding is being used on the response body.
It looks like CHttpModule::OnSendResponse is being called once for each chunk. I've been able to determine when a chunked response is being sent, and to buffer the data until all of the chunks have been passed in, and set the entity count to 0 to prevent it from sending that data out, but after the first OnSendResponse is called the headers are sent to the client already, so I'm not able to modify them later after I've already processed the chunked data.
I realize that doing this is going to eliminate the benefits of the chunked encoding, but in this case it is necessary.
The only example code I can find for IIS Native Modules is very simplistic and doesn't demonstrate any filtering of response data. Any tips or links on this would be great.
Edit: Okay, I found IHttpResponse::SuppressHeaders, which will prevent the headers from being sent after the first OnSendResponse. However, now it will not send the headers at all. So for a chunked response I set it to suppress headers, and later, after I have processed the response, I check whether the headers were suppressed. If they were, I read all of the headers from the raw response structure (HTTP_RESPONSE) and insert them at the beginning of the response entity chunks myself. This seems to work okay so far.
Still open to other ideas if anybody has any better option.

C++ "HTTP" Server - Chunked Data Transfer

UPDATE: Thank you for the help so far. I've just tested the program by connecting to it directly from the browser, instead of through an XMLHttpRequest. Going straight from the browser works flawlessly.
However, this connection must be handled via an XMLHTTPRequest. According to FireBug, it's receiving the full response (31 bytes in this case). It closes the connection, sets the readyState to 4. But the responseText is completely empty.
I'm creating a C++ app that accepts connections and responds as if it were an HTTP Server. My goal is to create a real-time chat server by opening connections to this C++ app, and responding with a "page" that continues to load as new messages are sent. I am currently sending the following back:
HTTP/1.1 200 OK\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/plain\r\n
\r\n
Up to this point, everything works. Using FireBug, I can see that it is properly receiving and interpreting headers. However, I cannot figure out how to forward response text. I know that in plain text, it would be read as follows:
5
Hello
8
Good bye
But every iteration I've tried (with \r\n, without \r\n, counting \r\n as 2 additional bytes) so far does not get properly read by the browser as response text. Can somebody help with crafting a proper string to send as response text?
You should end the transfer with a zero-length chunk:
5
Hello
8
Good bye
0
Otherwise the browser does not know you are finished.
You're trying to implement "HTTP push" (HTTP streaming), and the issue is that not all browsers support it correctly. For browsers such as Firefox and Opera, you could try the MIME type multipart/x-mixed-replace: as long as you keep the connection alive and keep sending data, Firefox should read it. This will not work in IE, though.
"Each chunk starts with the number of octets of the data it embeds expressed in hexadecimal followed by optional parameters (chunk extension) and a terminating CRLF (carriage return and line feed) sequence, followed by the chunk data"
Are you using hex for your lengths? The \r\n after the chunk length should not be counted in the length.
Also, try closing out the page with a 0 length. That will let you know if the browser is just buffering before parsing.
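Putting these answers together, a chunk can be built as follows. This is a sketch with hypothetical helper names (make_chunk, last_chunk); the strings it returns are what would be passed to send after the response headers:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Wraps `data` as one chunk: size in hex, CRLF, the data, CRLF.
// The size counts only the data bytes, not either CRLF.
std::string make_chunk(const std::string& data) {
    assert(!data.empty());  // a zero-size chunk would end the response
    char size_line[32];
    std::snprintf(size_line, sizeof(size_line), "%zx\r\n", data.size());
    return std::string(size_line) + data + "\r\n";
}

// The terminating zero-size chunk followed by the final blank line.
std::string last_chunk() {
    return "0\r\n\r\n";
}
```

For the example in the question, the bytes on the wire would be "5\r\nHello\r\n", then "8\r\nGood bye\r\n", and finally "0\r\n\r\n" when the response is finished. A 26-byte chunk would start with "1a", since the size is hexadecimal.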