Django return large file - django

I am trying to find the most efficient way to return large files from Django to an HTTP client. The flow is:
receive an HTTP GET request
read a large file from disk
return the content of that file
I don't want to read the whole file and then return it with HttpResponse because, if I am correct, the entire content is first held in RAM. How can I do this efficiently?
Laurent

Look into mod_xsendfile on Apache (or the equivalents for nginx, etc.) if you want to use Django for authentication. Otherwise there's no need to hit Django at all; just serve the files straight from Apache.
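A minimal sketch of the header handoff, assuming mod_xsendfile is installed and enabled on the Apache side. The helper name add_xsendfile_headers and the file path are hypothetical. The function works on any object that supports header assignment by key, which Django's HttpResponse does; a plain dict stands in for it here so the sketch runs without Django.

```python
def add_xsendfile_headers(response, file_path, download_name):
    """Mark a response so the front-end server sends file_path itself.

    `response` is anything supporting `response[name] = value`
    (Django's HttpResponse does); its body should be left empty.
    """
    # mod_xsendfile intercepts this header and streams the file from disk.
    response["X-Sendfile"] = file_path
    response["Content-Type"] = "application/octet-stream"
    response["Content-Disposition"] = 'attachment; filename="%s"' % download_name
    return response


# Demonstration with a dict standing in for the response object.
# The path below is a made-up example.
headers = add_xsendfile_headers({}, "/srv/files/big.iso", "big.iso")
print(headers["X-Sendfile"])
```

In a Django view you would run the authentication check first and then pass an empty HttpResponse() through a helper like this. nginx offers the same idea through the X-Accel-Redirect header, which takes an internal URI rather than a filesystem path.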

There is a ticket that aims to deal with this problem here: http://code.djangoproject.com/ticket/2131
It adds an HttpResponseSendFile class that uses sendfile() to send the file, which transparently sends the file as it's read.
However, the standard HttpResponse is implemented as an iterator, so if you pass it a file-like object it will follow that object's iteration semantics. Presumably you could write a file-like wrapper that hands the file out in small enough chunks.
I believe iterating over a standard Python file object reads line by line, which most likely won't solve your problem for binary files: they may contain few or no newline bytes, so a single "line" can be as large as the whole file.
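The chunking wrapper described above could look roughly like this. It is a sketch, not part of Django: the name iter_file_chunks and the 64 KiB default are assumptions, and the generator reads fixed-size binary blocks so that newlines play no role.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file at `path` in binary pieces of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk


# Demonstration on a throwaway file that contains no newlines at all.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 150000)
    demo_path = tmp.name

chunk_sizes = [len(chunk) for chunk in iter_file_chunks(demo_path)]
os.remove(demo_path)
print(chunk_sizes)
```

Only one chunk is held in memory at a time. Note that from Django 1.5 onward a plain HttpResponse consumes an iterator immediately, so a generator like this would be passed to StreamingHttpResponse instead.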
Of course, you could always put the static files in another location and serve them with a normal web server, unless you require intricate control (such as access control that needs knowledge of the Django database).

My preference is to integrate Django with your HTTP server so that requests for static files are routed to a path that never reaches Django. The strategy looks something like this:
Configure the HTTP server so that some requests go to Django and some go to a static document root.
Link to static documents directly from any web pages that obviously need them (e.g. CSS, JavaScript).
For any non-obvious return of a static document, use HttpResponseRedirect("/web-path/to/doc").
If you need to include the static document inside a dynamic document (maybe a page viewer wrapping a large text or binary file), return a wrapper page that populates a div with an AJAX call to your static document.

Related

HttpQueryInfo to get File Size

Why does this function work on a direct url to a download however fail on a php page echoing out a file for download? (GetLastError is 0)
Not all HTTP responses carry a Content-Length header. Dynamic pages generated by PHP scripts might not know how large the content actually is.
In these cases you just need to read a little at a time until the server returns no more data.
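The read loop itself is language independent. Below is a sketch of it in Python, with io.BytesIO standing in for a response that has no Content-Length; the function name and the chunk size are assumptions. With WinINet the same pattern would mean calling InternetReadFile repeatedly until it reports zero bytes read.

```python
import io


def read_until_exhausted(stream, chunk_size=4096):
    """Read a body of unknown length; return (data, total_bytes)."""
    buffer = bytearray()
    while True:
        piece = stream.read(chunk_size)
        if not piece:  # nothing returned: the sender has finished
            break
        buffer.extend(piece)
    return bytes(buffer), len(buffer)


# BytesIO stands in for a response that carries no Content-Length header.
body, size = read_until_exhausted(io.BytesIO(b"x" * 10000))
print(size)
```

The total size is only known once the loop ends, which is why it cannot be queried up front for dynamically generated pages.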

Always serve unversioned files raw?

I'm serving unversioned files via fossil's uv function. Now, this works fine for files without file extension and for archives. But I need to serve a .txt file. The problem now is that it gets delivered as a HTML page including the fossil web layout around it.
Is there a way to tell fossil to not do that, and instead deliver it as a raw .txt file?
You can specify a mimetype parameter on the URL. For example, mimetype=application/octet-stream will cause it to be offered as download.
For example, instead of https://www.fossil-scm.org/index.html/uv/download.html, you’d put https://www.fossil-scm.org/index.html/uv/download.html?mimetype=application/octet-stream.
Fossil reacts to the following mimetypes by putting headers around them:
text/x-fossil-wiki
text/x-markdown
text/html
text/plain
Unfortunately, all other mimetypes appear to lead to the browser downloading the unversioned file instead of displaying it.
If that's a problem, you could try a mimetype of text with no suffix.
Otherwise, you can post on Fossil's support forum, either as a question or as a feature request. :-)

Django Identify file pattern

I am implementing a way to restrict file upload on Django 1.8 running Python 3.4.
Basically, I want to check the MIME type of each uploaded file using mimetypes. However, when I rename a file from bad_image.exe to bad_image.exe.jpg, the reported MIME type becomes image/jpeg, so a malicious file could still get through.
Is there a way to actually implement this?
You could potentially perform the check in reverse by setting a blacklist of MIME types to prohibit. Then, for each of these MIME types, use e.g.
mimetypes.guess_all_extensions('application/x-msdownload')
to yield a list of possibly malicious extensions, which you can then search for in the uploaded filenames.
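A rough sketch of that reverse check follows. The blacklist contents and the function names are hypothetical, and because Python's default table may not map application/x-msdownload to any extension, the sketch registers .exe for it explicitly (an assumption, labeled in the code). It inspects every dot-suffix of the name, so bad_image.exe.jpg is still caught.

```python
import mimetypes
from pathlib import PurePath

# Assumption: the default table may have no extension for this type,
# so the sketch registers '.exe' for it explicitly.
mimetypes.add_type("application/x-msdownload", ".exe")

BLOCKED_TYPES = ["application/x-msdownload"]  # hypothetical blacklist


def blocked_extensions(blocked_types):
    """Collect every extension the mimetypes table knows for the blocked types."""
    found = set()
    for mime_type in blocked_types:
        found.update(ext.lower() for ext in mimetypes.guess_all_extensions(mime_type))
    return found


def has_blocked_extension(filename, blocked_types=BLOCKED_TYPES):
    """True if any dot-suffix of the name, not only the last one, is blocked."""
    suffixes = {suffix.lower() for suffix in PurePath(filename).suffixes}
    return bool(suffixes & blocked_extensions(blocked_types))


print(has_blocked_extension("bad_image.exe.jpg"))
print(has_blocked_extension("holiday.jpg"))
```

This only inspects the name, never the file content, so it remains subject to the limitation described in the warning.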
Warning.
Relying on filenames and MIME types to defend against malicious uploads is not safe practice. At the very least, sandboxing user uploads in a separate domain prevents any malicious code that slips through your defenses from attacking your site.

How to provide image data for embedded web control in C++

In my C++ app I'm embedding (via COM) a web browser (Internet Explorer) control (CLSID_WebBrowser).
I can display my own html in that control by using IHTMLDocument2::write() method but if the html has <img src="foo.png"> element, it's not displayed.
I assume there is a way for me to provide the data for foo.png somehow to the web control, but I can't find the right place to hook this functionality?
I need to be in full control of providing the content of foo.png, so work-arounds like using res:// protocol or saving to disk and using file:// protocol are not good enough. I just want to plug my code somehow so that when embedded CLSID_WebBrowser control sees <img src="foo.png"> in html data given with IHTMLDocument2::write() it will ask me to provide this data.
To answer my own question, the solution that finally worked for me is:
Register a custom IInternetProtocol/IInternetProtocolInfo via a custom IClassFactory given to IInternetSession::RegisterNameSpace(). For reasons that seem like a bug to me, it has to be a protocol already known to IE (I've chosen "its"), even though it would be much better if it were my own, unique namespace.
Feed the html data via a custom IMoniker through IPersistentMoniker::Load() and make sure that IMoniker::GetDisplayName() (which is the base url against which relative links in the provided html are resolved) starts with that protocol scheme (in my case "its://"). That way the relative link "foo.png" in the html data becomes its://foo.png to IE, which makes urlmon call IInternetProtocol::Start() and IInternetProtocol::Read() to ask for the data for that url.
This is all rather complicated; you can look at the actual (BSD-licensed) code here:
http://code.google.com/p/sumatrapdf/source/browse/trunk/src/utils/HtmlWindow.cpp
You can embed a small webserver such as mongoose and reference those images from there.
In mongoose, you can attach a callback to a specific path, thus returning images from C++ code.
We use this for our debugging tools, where each image is accessible from a web interface.
The easiest solution would be a Data URI. You'd inline the image data directly in the html you pass to IHTMLDocument2::write().
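To show what the inlined markup looks like, here is a small sketch that builds a data URI. It is written in Python for brevity (the same base64 step would be done in C++ in the actual app); the function name is an assumption and the bytes are placeholders, not a real PNG.

```python
import base64


def to_data_uri(data, mime_type="image/png"):
    """Encode raw bytes as a data: URI usable in an <img src=...> attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


# Placeholder bytes; a real caller would pass the contents of foo.png.
img_tag = '<img src="%s">' % to_data_uri(b"hi")
print(img_tag)
```

Older Internet Explorer engines restrict or lack data URI support, so it is worth checking which engine version the embedded control uses before relying on this.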

How to get input from web?

I am trying to find out how to get input from html inputs using C++. In Windows you can send WM_GETTEXT to a window and it returns the text you wanted. Is there any way to do the same thing with a web interface?
I am not interested in sniffing packets for now.
For example, some site has an html input which expects a name. I type a name into the input, and then I want to capture it with my program.
If I understood correctly what you want to do, you have to set up a web server that calls your C++ application via CGI. You'll have an HTML page (static or generated by your program) containing a form that refers to the URL of your application. When the user clicks Submit, the browser issues a request to the web server, which in turn calls your application and passes it the POST/GET parameters of the form.
Your application can then process the data, extracting the parameters from the environment variables (if the data is passed using the GET method) or from standard input (if the POST method is used). To generate the output page (along with the HTTP response header) you simply write it to standard output.
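The CGI contract described above is the same in any language. A minimal sketch in Python follows (a C++ program would read the same variables with getenv and the body from stdin); the function name is an assumption, and a dict plus StringIO stand in for the real environment and standard input.

```python
import io
from urllib.parse import parse_qs


def read_form_fields(environ, stdin):
    """Return submitted form fields as a dict of lists, CGI style."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the body arrives on standard input, CONTENT_LENGTH long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the parameters arrive in the QUERY_STRING variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Stand-ins for os.environ and sys.stdin so the sketch runs anywhere.
get_fields = read_form_fields(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ada"}, io.StringIO("")
)
post_fields = read_form_fields(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}, io.StringIO("name=Bob")
)
print(get_fields, post_fields)
```

Under a real web server the script would pass os.environ and sys.stdin, then print a Content-Type header, a blank line, and the page body to standard output.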
One thing I can think of (if you're using Linux) is using wget via system() from within your C++ app.
Use wget to fetch the html page and write it to a file, parse the file for the URL of the form and the data it needs, then send the response as POST / GET via wget, and so on.
That is, if I understood what you meant by "do it from existing page" correctly.