Python: reading from rfile in SimpleHTTPRequestHandler - python-2.7

While overloading SimpleHTTPRequestHandler, my function blocks on self.rfile.read(). How do I find out if there is any data in rfile before reading? Alternatively, is there a non-blocking read call that returns in absence of data?

For the record, the solution is to read only as many bytes as the Content-Length header specifies, i.e. something like:
contentLength = int(request.headers['Content-Length'])
payload = str(request.rfile.read(contentLength))
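
A slightly fuller sketch of the same idea, assuming Python 3's http.server (in Python 2.7 the handler lives in the SimpleHTTPServer module instead). The names read_body and EchoHandler are made up for illustration; the point is only that the read is bounded by Content-Length, so it cannot block waiting for bytes the client never sends.

```python
import io
from http.server import SimpleHTTPRequestHandler


def read_body(headers, rfile):
    """Read exactly Content-Length bytes; return b'' if the header is absent."""
    # Hypothetical helper: never call rfile.read() without a size, because
    # that waits until the client closes the connection.
    length = int(headers.get("Content-Length", 0))
    return rfile.read(length) if length > 0 else b""


class EchoHandler(SimpleHTTPRequestHandler):
    """Illustrative handler that echoes the POST body back to the client."""

    def do_POST(self):
        payload = read_body(self.headers, self.rfile)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


# The helper can be tried without a server: only 5 bytes are consumed.
demo = read_body({"Content-Length": "5"}, io.BytesIO(b"hello, and more"))
```

Requests that use chunked transfer encoding carry no Content-Length, so this sketch treats them as having an empty body.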

I just solved a case like this.
I'm pretty sure the "gotcha" here is an infinite stream of bits being written to your socket by the client you're connected to. This is often called "pre-connect", and it happens because http/tcp/HttpServer doesn't make a significant distinction between "chunks" and single bytes being fed slowly to the connection. If the response header contains Transfer-Encoding: chunked, you are a candidate for this happening. Google Chrome works this way and is a good test. If Firefox and IE work but Chrome doesn't when you receive a response from the same website, then this is probably what's happening.
Possible solutions:
In CPython 3.7, http.server has options that support pre-connect. Look at ThreadingHTTPServer in the docs.
Put the request handling in a separate thread, and time out that thread when the read operation takes too long.
Both of these are potentially painful solutions, but they should at least get you started.
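
A minimal sketch of how the two suggestions might be combined, assuming Python 3.7 or later. The class name PatientHandler, the 5 second value, and the make_server helper are arbitrary choices for this example. It relies on the handler's timeout attribute, which the socketserver base class applies to the connection socket, and on http.server.ThreadingHTTPServer, which handles each connection in its own thread.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class PatientHandler(SimpleHTTPRequestHandler):
    # StreamRequestHandler.setup() passes this value to socket.settimeout(),
    # so a read that gets no data raises a timeout instead of hanging forever.
    timeout = 5  # seconds (arbitrary value for this sketch)


def make_server(host="127.0.0.1", port=8000):
    """Build a threaded server; a stalled client only ties up its own thread."""
    return ThreadingHTTPServer((host, port), PatientHandler)


# Typical use (not executed here because it blocks):
#     make_server().serve_forever()
```

With this setup a browser that opens a socket and sends nothing occupies one worker thread until the timeout fires, rather than stalling the whole server.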

Related

Boost ASIO - process data while still receiving

I am new to Boost ASIO and have the following use case:
A client sends 1 MB data to a server. The server is able to process each byte of the data independent from the remaining data. My current solution is using the read_some and write_some methods for the server and client, respectively. This works well, but I would like to speed up my implementation by letting the server directly process the data while it still receives them. I already had a look at the documented examples but could not find one that fits my requirements.
I also wonder how I can keep track of how many bytes have been received so far. The client always sends the same amount of data.
Thank you in advance! Best regards.

How do I avoid body_limit error in boost's beast and correctly handle large messages

I have cases when there is a chunked response which is too big for beast, and I want to stop before I get to beast's body_limit, and continue handling the message from that point using plain boost::asio. Mind that this (obviously) means I already received the header and a large part of the body.
I'm using it for a reverse proxy, so basically what I want to do is somehow send the incomplete response to the http client, while continuing relaying the remaining response data using boost::asio.
I'm guessing I'll need to somehow serialize the incomplete response, maybe using operator<< to std::stringstream, send that to the client using boost::asio, and continue the communication from there.
Will this work? Is this the correct way of doing that, or is there a better way, maybe even using beast api? Is there another way to handle chunked messages that are about to exceed body_limit in beast's api?
Thanks in advance,
David.
UPDATE
I finally abandoned the idea of falling back to boost asio, and am now trying to receive the http message (chunked or regular) in chunks with a fixed size buffer, so that I don't reach the body limit. I've just finished skimming "Receive/parse the message body one chunk at a time · Issue #154 · boostorg/beast", and it seems to be exactly what I need. I'm trying to implement a reverse proxy as well. I tried to use the "Incremental Read" example from the 1.70.0 docs, but I get a "Reference to non-static member function must be called" error when trying to compile this line:
ctx->response.get().body().data = response_buffer;
Maybe the incremental read example page is not updated with the latest syntax? Do you have an example relevant for the reverse proxy I'm trying to write?
Thanks in advance,
David
Maybe the incremental read example page is not updated with the latest syntax? Do you have an example relevant for the reverse proxy I'm trying to write?
The examples in the docs are compiled, so they can't possibly be out of date. Perhaps you are mixing different versions of the example and Beast? Are you using http::buffer_body? What does the declaration of your message look like?
By default, Beast's parser limits the size of the body to 1MB for requests and 8MB for responses. This is to prevent trivial resource exhaustion attacks. You can always increase the limit, or eliminate it entirely (by setting it to the largest uint64_t) by calling parser::body_limit:
https://www.boost.org/doc/libs/1_71_0/libs/beast/doc/html/beast/ref/boost__beast__http__parser/body_limit.html

Handle a blocking recv() call without knowing the message length beforehand and without using async I/O

I am creating a very simple server that accepts HTTP requests from a browser (Safari) and sends back a dummy HTTP response such as a "Hello World" message.
My program blocks in recv() because it doesn't know whether the client (browser) has finished sending the HTTP request, and recv() is a blocking function. (A very typical question.)
The most popular answer I found is to send the length of the message before sending the message.
This solution is good, but it doesn't work for me because I have no control over what the client sends. And as far as I know, the browser does not send any message length before sending the real message.
The second most popular answer is to use async I/O such as select() or poll(). But personally, I don't think that is a good strategy: once I have received the whole request from the client, I would of course like to move on to handling it. Why would I still waste time and resources waiting for something that will never come, even though the wait is no longer blocking? (Creating threads poses a similar question.)
The solution I came up with is to check whether the size of the received message equals the buffer size. For example, let's say I set recvBufferSize to 32 and the total size of the request message is 70. Then I will receive three packets of size 32, 32, and 6 respectively.
I can tell that the client has finished sending the request because the last packet's size is not equal to recvBufferSize (32).
However, as you can see, a problem occurs when the request message's size is 64/96/128...
Other approaches, such as setting a timeout, may work, but I am not sure whether they are good or not.
I want to build everything myself, so I am not interested in libraries such as zeromq or Boost.Asio.
Can anyone give some advice on my approach or suggest better ways to solve the problem? Thanks a lot!
If you're implementing the HTTP protocol you need to study the HTTP RFCs. There are several different ways you can know the request length, starting with the Content-Length header, and the combined lengths of the chunks if the client is using chunked transfer encoding.
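
To make the framing rule concrete, here is a rough sketch in Python rather than C sockets. The function read_http_request and its recv parameter are invented for this example; recv stands in for a blocking socket recv(bufsize) call. It reads until the blank line that ends the headers, then reads exactly Content-Length more bytes, so a 64-byte body with a 32-byte buffer is no longer ambiguous. Chunked transfer encoding is deliberately not handled.

```python
import io


def read_http_request(recv, bufsize=32):
    """Read one request through recv(bufsize); return (head, body) as bytes."""
    data = b""
    # 1. The header block always ends with an empty line (CRLF CRLF).
    while b"\r\n\r\n" not in data:
        chunk = recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before headers were complete")
        data += chunk
    head, _, rest = data.partition(b"\r\n\r\n")

    # 2. Parse the header fields; field names are case-insensitive.
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()

    # 3. Content-Length says exactly how many body bytes to wait for.
    #    Without it this sketch assumes there is no body (typical for GET).
    length = int(headers.get(b"content-length", b"0"))
    while len(rest) < length:
        chunk = recv(bufsize)
        if not chunk:
            raise ConnectionError("peer closed before body was complete")
        rest += chunk
    return head, rest[:length]  # bytes past `length` belong to a later request


# Simulated client: a 64-byte body, i.e. an exact multiple of the buffer size.
raw = b"POST /echo HTTP/1.1\r\nHost: example\r\nContent-Length: 64\r\n\r\n" + b"x" * 64
head, body = read_http_request(io.BytesIO(raw).read)
```

Because the loop stops as soon as the declared length has arrived, it never issues a recv() call for data that is not coming.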

Resume Ability for a simple Download Manager (C++ - WinInet)

I'm writing a very simple download manager which can just Download - Pause - Resume.
How is it possible to resume the download from the exact point in the file where it stopped before? Actually, the only thing I'm looking for is how to set the file pointer on the server side, so that I can then download from the exact point I want with InternetReadFile (any other functions are welcome if you know a better way).
InternetSetFilePointer never works for me, though :) and I don't want to use BITS.
I think this can be done by sending a header, but I don't know which header or how to send it.
You are looking for the Range header. Use HttpAddRequestHeaders() to add a custom Range request header telling the server what range of bytes you want. See RFC 2616 Section 14.35 for syntax.
If the server supports ranges (use HttpQueryInfo(HTTP_QUERY_ACCEPT_RANGES) to verify), it will send a 206 status code instead of a 200 status code (use HttpQueryInfo(HTTP_QUERY_STATUS_CODE) to verify).
If 206 is received, simply seek your existing file to the resume position and then read the response data as-is to your file.
If 200 is received, the file is starting over from the beginning, so either:
truncate the existing file and start writing it fresh
seek the file, read and discard the response data until it reaches the desired position, then read the remaining data into your file.
Treat any other status code as a download failure.
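
The decision logic above is independent of WinInet, so here is a small language-neutral sketch of it in Python. The helper names range_header and write_resumed are hypothetical, and the status code and body chunks are passed in directly instead of coming from a real HTTP call. For a 200 response it takes the first option, truncating and starting over.

```python
import io


def range_header(offset):
    """Request everything from byte `offset` to the end of the resource."""
    return {"Range": "bytes=%d-" % offset}


def write_resumed(status, chunks, out, offset):
    """Write response chunks into `out` according to the status code."""
    if status == 206:
        # Server honoured the range: continue where the old download stopped.
        out.seek(offset)
    elif status == 200:
        # Server ignored the range and is sending the whole file again.
        out.seek(0)
        out.truncate()
    else:
        raise IOError("download failed with status %d" % status)
    for chunk in chunks:
        out.write(chunk)
    return out.tell()  # total size of the file so far


# Simulated resume: 5 bytes already on disk, server answers 206 with the rest.
partial = io.BytesIO(b"hello")
size = write_resumed(206, [b" wor", b"ld"], partial, offset=5)
```

In a real download manager `out` would be the partially downloaded file opened in "r+b" mode, and `offset` its current size.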

using libevent to read a continuous http stream and sending data at random times

Firstly I think I need to say that I'm still learning C++ so apologies if this is blindingly obvious/simple.
I'm trying to use the libevent library (by "trying" I mean I've looked through the code in the sample folder and tested some of it) in my C++ program to consume an http stream. I'm wondering if anyone can provide me with an example of how I'd go about connecting to a URL e.g. live.domain.com, sending the appropriate headers, reading the data returned and sending data back over the same connection...
I'm not sure libevent does any blocking connections but just to be explicit, I'm after non-blocking samples.
Why am I trying to do this?
I'm using an API which requires you to open a connection and it keeps it alive unless there's an error. It'll periodically send status texts to the connected client until it receives a string with an ID over the same connection. At which point it starts sending data back about the ID given... I'm not entirely sure sending data back over the same connection after the initial request is strictly compliant but that's what the server expects so it'll work...if I knew how
Thanks in advance
Yuck. Given that this isn't really HTTP, I don't think you're going to be happy using a HTTP library - even if you get it to work today after a lot of frustration, it could easily be broken tomorrow. This is too rare to be a supported feature.
But...it sounds like it's also simple enough that you could just open a raw TCP connection with libevent, manually send something that looks kind of like an HTTP request, and handle it with raw sockets from there. You don't want the extra stuff a HTTP library gets you anyway (additional transfer/content encodings, proxy support, SSL, compatibility with other protocol versions, ...)
As far as examples go, look at the libevent book. In particular, the "Trivial HTTP v0 client" seems very close to what you want. Good luck!
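
For a feel of the shape of that approach, here is a non-blocking sketch using Python's asyncio instead of libevent (a swapped-in library, purely for illustration). The names build_request, consume, and on_line are invented, and the request headers are a guess at what such a server might want. It opens a plain TCP connection, writes a hand-built request, then reads lines and lets a callback decide when to send something back over the same connection.

```python
import asyncio


def build_request(host, path="/"):
    """Hand-written request text; just enough to look like HTTP/1.1."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: keep-alive",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


async def consume(host, port, path, on_line):
    """Stream lines from the server; on_line may return bytes to send back."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(build_request(host, path))
    await writer.drain()
    try:
        while True:
            line = await reader.readline()
            if not line:  # server closed the connection
                break
            reply = on_line(line)
            if reply:  # e.g. the ID string the server is waiting for
                writer.write(reply)
                await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()


# Typical use (not executed here because it needs a live server):
#     asyncio.run(consume("live.domain.com", 80, "/", lambda line: None))
```

The same structure maps onto libevent's bufferevents: a read callback in place of the readline loop, and a write to the bufferevent in place of writer.write.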