I am implementing a way to restrict file uploads in Django 1.8 running on Python 3.4.
Basically, I want to use the mimetypes module to check the MIME type of each file a user uploads. However, if I rename bad_image.exe to bad_image.exe.jpg, the guessed MIME type becomes image/jpeg, so a malicious file could still get through.
Is there a way to actually implement this check?
You could potentially perform the check in reverse by setting a blacklist of MIME types to prohibit. Then, for each of these MIME types, use e.g.
mimetypes.guess_all_extensions('application/x-msdownload')
to yield a list of possibly malicious extensions, which you can then search for in the uploaded filenames.
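Here is a minimal Python sketch of that blacklist check. The contents of the blacklist (BLOCKED_MIME_TYPES) and both helper names are hypothetical. The sketch checks every dot-separated suffix, not only the last one, so bad_image.exe.jpg is caught through .exe. Which extensions mimetypes knows for a given type depends on the system's MIME database, so the result can differ between machines.

```python
import mimetypes

# Hypothetical blacklist; adjust to your own threat model.
BLOCKED_MIME_TYPES = ["application/x-msdownload"]


def blocked_extensions(mime_types, db=mimetypes):
    """Collect every extension the MIME database associates with the given types."""
    extensions = set()
    for mime in mime_types:
        # strict=False also includes common but non-standard mappings.
        for ext in db.guess_all_extensions(mime, strict=False):
            extensions.add(ext.lower())
    return extensions


def has_blocked_extension(filename, blocked):
    """Check every suffix, so "bad_image.exe.jpg" is rejected because of ".exe"."""
    suffixes = filename.lower().split(".")[1:]
    return any("." + suffix in blocked for suffix in suffixes)


if __name__ == "__main__":
    blocked = blocked_extensions(BLOCKED_MIME_TYPES)
    print(sorted(blocked))
    print(has_blocked_extension("bad_image.exe.jpg", blocked))
```

If you pass a fresh mimetypes.MimeTypes() instance as db, the lookup uses only Python's built-in defaults, which makes the behaviour reproducible in tests.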
Warning.
Relying on filenames and MIME types to defend against malicious uploads is not a safe practice. At the very least, serving user uploads from a separate domain (sandboxing them) stops any malicious code that slips past your checks from attacking your site.
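A complementary check is to read the file's first bytes (its "magic number") and require that they agree with the type implied by the name. This Python sketch assumes only PNG, JPEG and GIF images should be accepted. A Windows executable renamed to bad_image.exe.jpg still begins with the MZ signature, so it fails. The signature table is deliberately short and is_acceptable_image is a hypothetical helper. As the warning says, a crafted file can carry a valid image header and still be harmful, so this narrows the attack surface rather than closing it.

```python
import mimetypes

# Leading-byte signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"MZ", "application/x-msdownload"),  # DOS/Windows executables
]

# Assumption: only these image types are wanted.
ALLOWED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/gif"}


def sniff_mime(head):
    """Return the MIME type implied by the first bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if head.startswith(magic):
            return mime
    return None


def is_acceptable_image(filename, head):
    """Accept only if the content signature is an allowed image type
    and matches the type guessed from the file name."""
    claimed, _ = mimetypes.guess_type(filename)
    sniffed = sniff_mime(head)
    return sniffed in ALLOWED_IMAGE_TYPES and sniffed == claimed
```

In a Django view or form, you could read the first bytes of the UploadedFile with head = f.read(16) and then call f.seek(0), so the rest of the upload handling still sees the whole file.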
The documentation on secure file upload (https://docs.djangoproject.com/en/2.0/ref/models/fields/#file-upload-security) doesn't quite answer my concerns.
Consider the following use case: a (potentially malicious) user can use a form to upload a file. I want to implement the following controls, returning an error message and rejecting the file if it doesn't comply:
the file must be less than 50MiB
the file name must match ^[a-zA-Z0-9_ ]{1,250}\.csv$
the file must be ANSI encoded
only a whitelist of ANSI characters inside the file will be allowed (for example [a-zA-Z1-9;])
securing metadata to prevent code injection through malicious metadata
What would be the best way to implement this? Also, did I forget any important controls for this use case?
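Here is a minimal, framework-agnostic Python sketch of the size, name, encoding and character controls. It makes these assumptions: "ANSI" means Windows code page 1252 (the usual meaning on Western Windows systems), the dot in the file-name pattern is escaped, and the content whitelist is hypothetical. That whitelist adds 0 and line breaks to the example characters, because a multi-line CSV needs newlines. validate_csv_upload and the constant names are illustrative, not Django API.

```python
import re

MAX_BYTES = 50 * 1024 * 1024  # 50 MiB
NAME_RE = re.compile(r"[a-zA-Z0-9_ ]{1,250}\.csv")
# Hypothetical whitelist: example characters plus 0 and CR/LF line breaks.
CONTENT_RE = re.compile(r"[a-zA-Z0-9;\r\n]*")


def validate_csv_upload(name, data):
    """Return a list of error messages; an empty list means the upload passed."""
    errors = []
    if len(data) >= MAX_BYTES:
        errors.append("File must be smaller than 50 MiB.")
    if not NAME_RE.fullmatch(name):
        errors.append("Invalid file name.")
    try:
        # Assumption: "ANSI" = cp1252; undefined bytes such as 0x81 raise here.
        text = data.decode("cp1252")
    except UnicodeDecodeError:
        errors.append("File is not valid cp1252 text.")
        return errors
    if not CONTENT_RE.fullmatch(text):
        errors.append("File contains characters outside the whitelist.")
    return errors
```

In a Django form, this could run inside a clean_<fieldname>() method. Check uploaded_file.size against the limit first, so an oversized file is never read into memory. Then call validate_csv_upload(uploaded_file.name, uploaded_file.read()) and raise forms.ValidationError if the returned list is non-empty.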
As an example, I'm currently uploading items directly to an S3 bucket using a form. While I was testing, I didn't specify any expected filenames or extensions.
I uploaded a .png which produced this direct link:
https://s3-us-west-2.amazonaws.com/easyhighlighting2/2015-07-271438019663927upload94788
When I place this inside an img tag, it displays on a web page properly.
My question is, without an extension, how would my browser know what type of file it's loading? Inside the bucket, the file's metadata isn't even filled out.
Is there any way to get that file extension, programmatically?
I'm ready to try any client-side methods available. My server-side language is ColdFusion, which is somewhat limiting, but I'm open to suggestions for that as well.
Okay, so after some more extensive digging, I found a method of retrieving the file's type that was only added in CF10, which would explain the lack of documentation.
The answer lies in the FileGetMimeType function.
<cfset someVar = "https://s3-us-west-2.amazonaws.com/easyhighlighting2/2015-07-271438019663927upload94788">
<cfset FileType = FileGetMimeType(someVar)>
<cfoutput>#FileType#</cfoutput>
This code outputs image/png, which is correct, and it has worked for every file type I have tested so far.
I'm surprised this kind of question hasn't popped up before, but this appears to be the best answer, at least for users of CFML.
Edit:
ColdFusion accomplishes this either by reading the contents of the file or by trusting its extension. The function takes an optional strict argument: if true, it reads the file's contents; if false, it uses the file's extension.
True is the default.
Link:
https://wikidocs.adobe.com/wiki/display/coldfusionen/FileGetMimeType
Check the Content-Type HTTP response header returned by Amazon S3. That header, rather than a file extension, tells the browser what type of file it is receiving.
For example, curl -I https://s3.amazonaws.com/path/to/file fetches only the headers.
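Here is a minimal Python sketch (standard library only) that does the same thing programmatically. It sends a HEAD request, like curl -I, reads Content-Type, and maps the value back to a likely extension with mimetypes.guess_extension. It assumes the object is publicly readable and that S3 stored a meaningful Content-Type for it. If S3 only has a generic type such as binary/octet-stream, the header will not help and the content itself has to be inspected. fetch_content_type and extension_for are hypothetical helper names.

```python
import mimetypes
import urllib.request


def fetch_content_type(url, timeout=10):
    """Send a HEAD request (no body is downloaded) and return Content-Type, or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def extension_for(content_type):
    """Map a header value such as 'image/png; charset=binary' to '.png'."""
    if not content_type:
        return None
    # Drop parameters after ';' and normalise case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(media_type)
```

For example, extension_for(fetch_content_type(url)) would give ".png" for the upload in the question, provided S3 returns image/png for it.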
Is it possible to have a CharField in a form that asks the user to input an absolute file path, and then bind that file to the request.FILES object? I think this is quite routine, but I cannot use forms.FileField to do this, since I cannot find an argument that accepts a file path. I searched, but found no related posts.
No, there is no way to do this, because there is no way to pass a path to a browser's file upload field, for very good security reasons imposed by the browsers themselves. A path typed into a CharField would also only be meaningful on the server's filesystem, not on the user's machine.
I'm using libcurl as an HTTP client to retrieve various pages (any URL, for that matter).
Usually the data comes as a UTF-8 string and then I just call "MultiByteToWideChar" and it works well.
However, some web pages still use a code-page encoding, and I see gibberish if I convert those pages as if they were UTF-8.
Is there an easy way to retrieve the code page from the data, or will I have to scan it manually (for "encoding=") and then translate it accordingly?
If so, how do I get the code page ID from its name (Code Page Identifiers)?
There are several locations where a document can state its encoding:
the Content-Type HTTP header
the (optional) XML declaration
the Content-Type meta tag inside the document header
for HTML5 documents the charset meta tag.
There are probably even more I've forgotten.
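Here is a rough Python sketch that checks those declarations in order. It looks at the HTTP header first, then at an XML declaration or meta tag within the first 1024 bytes of the body (the window the HTML5 prescan uses). It ignores byte order marks and the many edge cases of the real HTML encoding-sniffing algorithm, and declared_charset is a hypothetical helper. With libcurl, the header value is available through curl_easy_getinfo with CURLINFO_CONTENT_TYPE, and the same patterns could be applied in C++ with std::regex.

```python
import re

_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.I)
_XML_DECL = re.compile(rb"<\?xml[^>]*encoding\s*=\s*[\"']([\w.:-]+)[\"']", re.I)
# Matches both <meta charset="..."> and the http-equiv Content-Type form.
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.I)


def declared_charset(content_type_header, body, scan_bytes=1024):
    """Return the declared charset name (lower-cased), or None if none is found."""
    if content_type_header:
        match = _HEADER_CHARSET.search(content_type_header)
        if match:
            return match.group(1).lower()
    head = body[:scan_bytes]
    for pattern in (_XML_DECL, _META_CHARSET):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

The returned name (for example windows-1252) still has to be mapped to a Windows code page identifier before calling MultiByteToWideChar.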
In the end, detecting the actual encoding is rather hard. You really shouldn't do this yourself; use high-level libraries for retrieving and parsing HTML content instead. I'm sure they are available even for C++, even if they have to be borrowed from a browser environment. :)
I used DetectInputCodepage from the IMultiLanguage2 interface and it worked great!
I am trying to find the most efficient way to return large files from Django to an HTTP client. The view has to:
receive http get request
read large file from disk
return the content of that file
I don't want to read the file and then return it in an HttpResponse because, if I am correct, the whole file content would first be stored in RAM. How can I do this efficiently?
Look into mod_xsendfile on Apache (or the equivalents for nginx, etc.) if you want to use Django for authentication. Otherwise, there's no need to hit Django at all; just serve the files straight from Apache.
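Here is a minimal Python sketch of the headers such a view would return. Django does the authentication and the front-end server does the actual file transfer. It assumes Apache with mod_xsendfile enabled (XSendFile On, with the directory allowed via XSendFilePath), or nginx with an internal location mounted at the hypothetical /protected/ prefix. offload_headers is an illustrative helper, not part of Django.

```python
import mimetypes
import os


def offload_headers(path, server="apache", internal_prefix="/protected/"):
    """Build response headers asking the front-end server to send `path` itself."""
    content_type, _ = mimetypes.guess_type(path)
    headers = {
        "Content-Type": content_type or "application/octet-stream",
        # Assumes the basename contains no double quotes.
        "Content-Disposition": 'attachment; filename="%s"' % os.path.basename(path),
    }
    if server == "apache":
        # mod_xsendfile reads an absolute filesystem path from X-Sendfile.
        headers["X-Sendfile"] = path
    elif server == "nginx":
        # nginx's X-Accel-Redirect takes a URI handled by an `internal` location.
        headers["X-Accel-Redirect"] = internal_prefix + os.path.basename(path)
    else:
        raise ValueError("unsupported server: %r" % server)
    return headers
```

After the authentication check, the view could create response = HttpResponse() and copy each header onto it with response[name] = value. The front-end server then replaces the empty body with the file, so Django never holds the file in memory.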
There is a ticket that aims to deal with this problem here: http://code.djangoproject.com/ticket/2131
It adds an HttpResponseSendFile class that uses sendfile() to send the file, so the file is sent as it is read rather than being loaded into memory first.
However, the standard HttpResponse is implemented as an iterator, so if you pass it a file-like object, it will follow that object's iteration semantics. Presumably, then, you could create a file-like wrapper that splits the file into small enough chunks before sending them out.
I believe the semantics of iterating over a standard file object in Python are that it reads line by line, which most likely won't solve your problem if you're dealing with binary files.
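Here is a minimal Python sketch of such a wrapper: a generator that yields fixed-size binary chunks instead of lines. The helper name iter_file_chunks and the 64 KiB chunk size are assumptions for illustration. On Django 1.5 and later, the generator can be passed to StreamingHttpResponse, and Django 1.8 added FileResponse for this case. The standard library's wsgiref.util.FileWrapper also does a similar job.

```python
def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file at `path` as fixed-size byte chunks (the last may be shorter).

    Only one chunk is held in memory at a time. If the consumer closes the
    generator early, the `with` block still closes the file.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

A view could then return StreamingHttpResponse(iter_file_chunks(path), content_type="application/octet-stream") instead of reading the whole file first.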
Of course, you could always put the static files in another location and serve them with a normal web server, unless you require intricate control (like access control requiring knowledge of the Django database).
My preference for all of this is to integrate Django with your HTTP server so that, when you want to serve static files, you simply point to a path that will never reach Django. The strategy looks something like this:
Configure the HTTP server so that some requests go to Django and others go to a static document root.
Link to static documents from any web pages that obviously need them (e.g. CSS, JavaScript, etc.).
For any non-obvious return of a static document, use an HttpResponseRedirect("/web-path/to/doc").
If you need to include the static document inside a dynamic document (maybe a page viewer wrapping a large text or binary file), return a wrapper page that populates a div with an AJAX call to your static document.