How to buffer and process chunked data before sending headers in IIS7 Native Module - c++

So I've been working on porting an IIS6 ISAPI module to IIS7. One problem that I have is that I need to be able to parse and process responses, and then change/delete/add some HTTP headers based on the content. I got this working fine for most content, but it appears to break down when chunked encoding is being used on the response body.
It looks like CHttpModule::OnSendResponse is called once for each chunk. I can detect when a chunked response is being sent, buffer the data until all of the chunks have been passed in, and set the entity chunk count to 0 to prevent the data from going out. But the headers are sent to the client as soon as the first OnSendResponse completes, so I'm not able to modify them later, after I've processed the chunked data.
I realize that doing this is going to eliminate the benefits of the chunked encoding, but in this case it is necessary.
The only example code I can find for IIS native modules is very simplistic and doesn't demonstrate any filtering of response data. Any tips or links on this would be great.
Edit: Okay, I found IHttpResponse::SuppressHeaders, which prevents the headers from being sent after the first OnSendResponse. However, it then doesn't send the headers at all. So when it's a chunked response I set it to suppress headers, and later, after I process the response, I check whether the headers were suppressed; if they were, I read all of the headers from the raw response structure (HTTP_RESPONSE) and insert them at the beginning of the response entity chunks myself. This seems to work okay so far.
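Roughly, the shape of it looks like this (heavily simplified: ProcessBody and SerializeHeaders are placeholders for the real work, error handling is omitted, and I'm assuming the HTTP_SEND_RESPONSE_FLAG_MORE_DATA flag from ISendResponseProvider::GetFlags identifies the final send):

#include <httpserv.h>
#include <cstring>
#include <string>

REQUEST_NOTIFICATION_STATUS CBufferingModule::OnSendResponse(
    IHttpContext* pCtx, ISendResponseProvider* pProvider)
{
    IHttpResponse* pResp = pCtx->GetResponse();
    HTTP_RESPONSE* pRaw = pResp->GetRawHttpResponse();

    // Only intercept chunked responses.
    PCSTR te = pResp->GetHeader(HttpHeaderTransferEncoding);
    if (te == NULL || strstr(te, "chunked") == NULL)
        return RQ_NOTIFICATION_CONTINUE;

    // Keep IIS from flushing the headers on the first send.
    pResp->SuppressHeaders();

    // Copy this call's entity chunks into a per-request buffer (m_body is a
    // std::string member; module instances are per-request), then zero the
    // count so nothing goes out yet.
    for (USHORT i = 0; i < pRaw->EntityChunkCount; ++i) {
        const HTTP_DATA_CHUNK& c = pRaw->pEntityChunks[i];
        if (c.DataChunkType == HttpDataChunkFromMemory)
            m_body.append((const char*)c.FromMemory.pBuffer,
                          c.FromMemory.BufferLength);
    }
    pRaw->EntityChunkCount = 0;

    // Not the last send? Wait for more chunks.
    if (pProvider->GetFlags() & HTTP_SEND_RESPONSE_FLAG_MORE_DATA)
        return RQ_NOTIFICATION_CONTINUE;

    ProcessBody(m_body); // placeholder: inspect body, adjust headers in pRaw

    // Rebuild the status line and headers by hand from HTTP_RESPONSE and
    // prepend them to the processed body as an ordinary entity chunk.
    std::string out = SerializeHeaders(pRaw); // ends with the blank CRLF line
    out += m_body;

    void* p = pCtx->AllocateRequestMemory((DWORD)out.size());
    memcpy(p, out.data(), out.size());

    HTTP_DATA_CHUNK chunk;
    chunk.DataChunkType = HttpDataChunkFromMemory;
    chunk.FromMemory.pBuffer = p;
    chunk.FromMemory.BufferLength = (ULONG)out.size();

    DWORD cbSent = 0;
    pResp->WriteEntityChunks(&chunk, 1, FALSE, FALSE, &cbSent);
    return RQ_NOTIFICATION_CONTINUE;
}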
Still open to other ideas if anybody has any better option.

Related

How do I avoid body_limit error in boost's beast and correctly handle large messages

I have cases where a chunked response is too big for Beast, and I want to stop before I hit Beast's body_limit and continue handling the message from that point using plain boost::asio. Mind you, this (obviously) means I've already received the header and a large part of the body.
I'm using it for a reverse proxy, so basically what I want to do is somehow send the incomplete response to the http client, while continuing relaying the remaining response data using boost::asio.
I'm guessing I'll need to somehow serialize the incomplete response, maybe using operator<< to std::stringstream, send that to the client using boost::asio, and continue the communication from there.
Will this work? Is this the correct way of doing that, or is there a better way, maybe even using beast api? Is there another way to handle chunked messages that are about to exceed body_limit in beast's api?
Thanks in advance,
David.
UPDATE
I finally abandoned the idea of falling back to boost::asio, and am now trying to receive the HTTP message (chunked or regular) in pieces with a fixed-size buffer, so that I never reach the body limit. I've just finished skimming Receive/parse the message body one chunk at a time · Issue #154 · boostorg/beast, and it seems to be exactly what I need; I'm trying to implement a reverse proxy as well. I tried to use Incremental Read 💡 - 1.70.0, but I get a "reference to non-static member function must be called" error when trying to compile this line:
ctx->response.get().body().data = response_buffer;
Maybe the incremental read example page is not updated with the latest syntax? Do you have an example relevant for the reverse proxy I'm trying to write?
Thanks in advance,
David
The examples in the docs are compiled, so they can't possibly be out of date. Perhaps you are mixing different versions of the example and Beast? Are you using http::buffer_body? What does the declaration of your message look like?
By default, Beast's parser limits the size of the body to 1MB for requests and 8MB for responses. This is to prevent trivial resource exhaustion attacks. You can always increase the limit, or eliminate it entirely (by setting it to the largest uint64_t), by calling parser::body_limit:
https://www.boost.org/doc/libs/1_71_0/libs/beast/doc/html/beast/ref/boost__beast__http__parser/body_limit.html
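Roughly, the incremental-read pattern from the docs looks like this (synchronous version, based on the 1.70 documentation; forwarding the bytes to your downstream client is left as a comment, so treat it as a sketch rather than tested proxy code):

#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <cstdint>
#include <limits>

namespace http = boost::beast::http;

template<class SyncReadStream>
void relay_body(SyncReadStream& stream, boost::beast::flat_buffer& buffer)
{
    http::response_parser<http::buffer_body> parser;

    // Raise (or effectively remove) the 8 MB default response body limit.
    parser.body_limit((std::numeric_limits<std::uint64_t>::max)());

    http::read_header(stream, buffer, parser);

    char chunk[8192];
    while(!parser.is_done())
    {
        // Point the buffer_body at our fixed buffer before each read.
        parser.get().body().data = chunk;
        parser.get().body().size = sizeof(chunk);

        boost::beast::error_code ec;
        http::read(stream, buffer, parser, ec);
        if(ec == http::error::need_buffer)
            ec = {};                    // expected: our buffer filled up
        if(ec)
            throw boost::beast::system_error{ec};

        std::size_t n = sizeof(chunk) - parser.get().body().size;
        // ...write the n bytes now sitting in `chunk` to the client here...
        (void)n;
    }
}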

Python: reading from rfile in SimpleHTTPRequestHandler

While overloading SimpleHTTPRequestHandler, my function blocks on self.rfile.read(). How do I find out if there is any data in rfile before reading? Alternatively, is there a non-blocking read call that returns in absence of data?
For the record, the solution is to read only as many bytes as are specified in the Content-Length header, i.e. something like:
# inside a handler method such as do_POST
contentLength = int(self.headers['Content-Length'])
payload = self.rfile.read(contentLength).decode()
I just solved a case like this.
I'm pretty sure the "gotcha" here is an infinite stream of bytes being written to your socket by the client you're connected to. This is often called "pre-connect", and it happens because HTTP/TCP/HTTPServer doesn't make a significant distinction between "chunks" and single bytes being fed slowly to the connection. If the response header contains Transfer-Encoding: chunked, you are a candidate for this happening. Google Chrome works this way and is a good test: if Firefox and IE work but Chrome doesn't when you receive a response from the same website, then this is probably what's happening.
Possible solutions:
In CPython 3.7 the HTTP server has options that help with this; look at ThreadingHTTPServer in the http.server docs.
Put the request handling in a separate thread, and time out that thread when the read operation takes too long.
Both of these are potentially painful solutions, but at least they should get you started.

using libevent to read a continuous http stream and sending data at random times

Firstly I think I need to say that I'm still learning C++ so apologies if this is blindingly obvious/simple.
I'm trying to use the libevent library in my C++ program to consume an HTTP stream (by "trying" I mean I've looked through the code in the sample folder and tested some of it). I'm wondering if anyone can provide an example of how I'd go about connecting to a URL, e.g. live.domain.com, sending the appropriate headers, reading the data returned, and sending data back over the same connection...
I'm not sure libevent does any blocking connections but just to be explicit, I'm after non-blocking samples.
Why am I trying to do this?
I'm using an API which requires you to open a connection and keep it alive unless there's an error. It periodically sends status texts to the connected client until it receives a string with an ID over the same connection, at which point it starts sending back data about the given ID... I'm not entirely sure that sending data back over the same connection after the initial request is strictly compliant, but that's what the server expects, so it'll work... if I knew how.
Thanks in advance
Yuck. Given that this isn't really HTTP, I don't think you're going to be happy using an HTTP library; even if you get it to work today after a lot of frustration, it could easily be broken tomorrow. This usage is too rare to be a supported feature.
But... it sounds like it's also simple enough that you could just open a raw TCP connection with libevent, manually send something that looks kind of like an HTTP request, and handle it with raw sockets from there. You don't want the extra stuff an HTTP library gets you anyway (additional transfer/content encodings, proxy support, SSL, compatibility with other protocol versions, ...).
As far as examples go, look at the libevent book. In particular, the "Trivial HTTP v0 client" seems very close to what you want. Good luck!
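To give you a starting point, here's roughly what that looks like with bufferevents (untested sketch: the host, port, request text, and the point where you send the ID back are all placeholders for your API's details):

#include <event2/event.h>
#include <event2/dns.h>
#include <event2/bufferevent.h>
#include <event2/buffer.h>
#include <event2/util.h>
#include <cstdio>

static void read_cb(struct bufferevent* bev, void*)
{
    char buf[4096];
    int n;
    struct evbuffer* input = bufferevent_get_input(bev);
    while ((n = evbuffer_remove(input, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);  // handle each status text here

    // Whenever your application decides to, answer on the same connection:
    // bufferevent_write(bev, "some-id\r\n", 9);
}

static void event_cb(struct bufferevent* bev, short events, void*)
{
    if (events & BEV_EVENT_CONNECTED) {
        // Connected: send a hand-rolled request, then just keep reading.
        evbuffer_add_printf(bufferevent_get_output(bev),
            "GET /stream HTTP/1.1\r\n"
            "Host: live.domain.com\r\n"
            "Connection: keep-alive\r\n"
            "\r\n");
    } else if (events & (BEV_EVENT_ERROR | BEV_EVENT_EOF)) {
        bufferevent_free(bev);  // error or server closed the connection
    }
}

int main()
{
    struct event_base* base = event_base_new();
    struct evdns_base* dns = evdns_base_new(base, 1);

    struct bufferevent* bev =
        bufferevent_socket_new(base, -1, BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(bev, read_cb, NULL, event_cb, NULL);
    bufferevent_enable(bev, EV_READ | EV_WRITE);

    // Non-blocking connect; event_cb fires with BEV_EVENT_CONNECTED.
    bufferevent_socket_connect_hostname(bev, dns, AF_UNSPEC,
                                        "live.domain.com", 80);
    event_base_dispatch(base);
    return 0;
}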

HTTP keep-alive with C++ recv winsocket2

I'm coding my own HTTP fetcher socket, using C++ in MSVC++ with winsock2.h.
I was able to program the socket to connect to the required website's server and send an HTTP GET request.
Now the problem is that after I send an HTTP GET request with a keep-alive connection and call the recv function, it works fine, except that after retrieving the website, recv lingers, waiting for a timeout hint from the server or for the connection to close!
This takes a few seconds or less, depending on the server's keep-alive timeout, so I can't benefit from the keep-alive HTTP settings.
How can I tell the recv function to stop after retrieving the website and give control back to me, so I can send another HTTP request while avoiding another handshake?
When I use non-blocking sockets it works faster, but I don't know when to stop; I use str.rfind("",-1,7) to decide when to stop retrieving data, which is not very efficient.
Does anybody know a way to do this, or what the last character sent by an HTTP server is when the connection is kept alive, so I can use it as a stopping condition?
Best,
Moe
Check for a Content-Length: xxxxx header, and only read xxxxx bytes after the headers, which are terminated by a blank line (CR-LF-CR-LF in the stream).
update
If the data is chunked:
Chunked Transfer-Encoding (reference)
...
A chunked message body contains a series of chunks, followed by a line with "0" (zero), followed by optional footers (just like headers), and a blank line. Each chunk consists of two parts: a line with the size of the chunk data, in hex, possibly followed by a semicolon and extra parameters you can ignore (none are currently standard), and ending with CRLF; and the data itself, followed by CRLF.
Also, the w3.org description of the chunked Transfer-Encoding is in RFC 2616, section 3.6.1: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html.
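Putting both parts together, a simple blocking-socket reader might look like this (sketch only: reading one byte at a time is slow but keeps the framing logic obvious, and error handling is minimal):

#include <winsock2.h>
#include <string>
#include <cstdlib>

// Read exactly len bytes (the Content-Length case uses this directly).
static bool recv_exact(SOCKET s, char* buf, int len)
{
    while (len > 0) {
        int n = recv(s, buf, len, 0);
        if (n <= 0) return false;
        buf += n; len -= n;
    }
    return true;
}

// Read one line, up to and including CRLF.
static std::string recv_line(SOCKET s)
{
    std::string line;
    char c;
    while (recv(s, &c, 1, 0) == 1) {
        line += c;
        if (line.size() >= 2 && line[line.size() - 2] == '\r'
                             && line[line.size() - 1] == '\n')
            break;
    }
    return line;
}

// Read a chunked body; call after consuming the headers and the blank line.
static std::string recv_chunked_body(SOCKET s)
{
    std::string body;
    for (;;) {
        // Chunk-size line: hex digits, optional ";extensions", CRLF.
        unsigned long size = strtoul(recv_line(s).c_str(), NULL, 16);
        if (size == 0)
            break;  // the "0" line marks the last chunk
        std::string data(size, '\0');
        if (!recv_exact(s, &data[0], (int)size))
            break;
        body += data;
        recv_line(s);  // consume the CRLF that follows the chunk data
    }
    // Optional footers/trailers, terminated by a blank line.
    for (std::string t = recv_line(s); !t.empty() && t != "\r\n";
         t = recv_line(s)) {}
    return body;
}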
Set the non-blocking I/O flag on the socket, so that recv will return immediately with only as much data as has already been received. Combine this with select, WSAEventSelect, WSAAsyncSelect, or completion ports to be informed when data arrives (instead of busy-waiting).
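For the select variant, something along these lines (a sketch; combine it with the framing logic above so you know when the response is complete, instead of guessing at a terminating byte):

#include <winsock2.h>

// Wait up to timeout_sec for data, then return however much recv gives us.
// Returns >0 bytes read, 0 on timeout, SOCKET_ERROR on error.
static int recv_some(SOCKET s, char* buf, int len, long timeout_sec)
{
    u_long nonblocking = 1;
    ioctlsocket(s, FIONBIO, &nonblocking);

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(s, &readable);
    timeval tv = { timeout_sec, 0 };

    // The first select() argument is ignored by Winsock.
    int r = select(0, &readable, NULL, NULL, &tv);
    if (r <= 0)
        return r;  // 0 = timed out, SOCKET_ERROR = failure
    return recv(s, buf, len, 0);
}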

Stop QNetworkRequest buffering entire request

How can I stop QNetworkRequest from buffering the entire contents of a QIODevice during a put/post to an HTTPS connection? It works fine when posting to HTTP but HTTPS causes the entire file to be read into memory before the post starts.
This isn't supported using the Qt classes. The reason is that Qt needs to know the total data length for the SSL headers. Chunked encoding is not supported from a send perspective. You can however roll your own - you'll need to create your own SSL header, then create your own chunks of SSL-encoded data.
I suggest you wrap this all up in your own class, so it's nicely re-usable (why not post it online?).
BTW, most of this information was taken from a recent thread on the Qt-interest mailing list - a thread on the 30th September 2009 discussed this exact problem.
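If you do roll your own, the HTTP chunk framing itself is the easy part: each chunk is just the payload length in hex, CRLF, the payload, CRLF, and a lone "0" chunk ends the body. A sketch:

#include <QByteArray>

// Frame one payload as an HTTP chunk: hex length, CRLF, data, CRLF.
QByteArray encodeChunk(const QByteArray& payload)
{
    QByteArray chunk = QByteArray::number(payload.size(), 16);
    chunk += "\r\n";
    chunk += payload;
    chunk += "\r\n";
    return chunk;
}

// After the last data chunk, terminate the body with a zero-length chunk.
QByteArray lastChunk()
{
    return QByteArray("0\r\n\r\n");
}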
You'll probably have more success with Qt 4.6; it has some bug fixes regarding this.