Should I rename user-uploaded files? - django

Django==1.11.6
File upload attacks exist, but modern Django seems well guarded against them.
The Django security guide is here:
https://docs.djangoproject.com/en/1.11/topics/security/#user-uploaded-content
Its coverage of user-uploaded content is much shorter than that of other security guides.
On the Internet we can find this kind of advice:
The application should not use the file name supplied by the user.
Instead, the uploaded file should be renamed according to a
predetermined convention.
Well, I think that renaming is a good idea.
Should I rename user-uploaded files, or is it not dangerous with modern Django?

There are a couple of reasons why you should (and in some cases need to) rename uploaded files, so it does not even matter whether Django has good measures against some attacks:
You have to deal with duplicate file names
File names can be very long
File names can contain characters that are not supported by the backend's file system
Special characters in file names can cause problems when you want to access the files using a URL
File names can contain upper/lowercase characters, which might lead to duplicates on case-insensitive filesystems
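A minimal sketch of one way to handle all of these points in Django: `FileField` accepts a callable for `upload_to`, which Django calls with `(instance, filename)` and which returns the storage path. The function name, the `uploads/` prefix and the extension rules below are illustrative assumptions, not a Django convention. A random UUID avoids duplicates, keeps names short and ASCII-only, and sidesteps case collisions.

```python
import os
import uuid

# Hypothetical settings chosen for illustration.
UPLOAD_DIR = "uploads"
MAX_EXT_LEN = 10


def renamed_upload_path(instance, filename):
    """Return a storage path that ignores the user-supplied base name.

    Only a lower-cased, short, ASCII alphanumeric extension is kept;
    the rest of the name is replaced by a random UUID.
    Usage (hypothetical model field):
        attachment = models.FileField(upload_to=renamed_upload_path)
    """
    _, ext = os.path.splitext(filename)
    ext = ext.lower()
    suffix = ext[1:]
    # Drop extensions that are too long or contain unexpected characters.
    if len(ext) > MAX_EXT_LEN or not (suffix.isascii() and suffix.isalnum()):
        ext = ""
    return f"{UPLOAD_DIR}/{uuid.uuid4().hex}{ext}"
```

The original file name can still be stored in a separate model field if users need to see it when downloading.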


OS X: auto-delete a file after x time

Can we add metadata to a file so that it is unlinked/removed automatically after x time? That is, the system automatically removes the file if it finds that particular metadata attached to it.
Note: the file can be at any location, and the user may move it anywhere on their system, but based on that metadata the file should still get deleted (i.e. the system should call unlink/remove on it).
Is there a Cocoa/Objective-C/C++ API to set such metadata/attributes on a file?
The main point is that I am creating an application through which I provide some trial files to the user, and those files are also usable by other applications that recognise them. After the trial expires, I want to delete those files, but the user can always move my files to a different location and use them forever. How can I protect those files from permanent use?
No, there is no built-in mechanism to auto-delete a file based on some metadata.
You could add the feature yourself, with an accompanying agent that would trawl for files with the metadata and delete them when the time came.
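A minimal sketch of such an agent's sweep, assuming the expiry is stored as a Unix timestamp in some per-file metadata. How that metadata is read (on macOS, for example, an extended attribute) is abstracted into a hypothetical `get_expiry` callable, so the function and parameter names here are illustrative only.

```python
import os
import time
from typing import Callable, List, Optional


def sweep_expired(root: str,
                  get_expiry: Callable[[str], Optional[float]],
                  now: Optional[float] = None) -> List[str]:
    """Walk `root` and delete files whose expiry timestamp has passed.

    `get_expiry` is a hypothetical hook returning the file's expiry as a
    Unix timestamp, or None if the file carries no expiry marker.
    Returns the paths that were deleted.
    """
    if now is None:
        now = time.time()
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            expiry = get_expiry(path)
            if expiry is not None and expiry <= now:
                os.remove(path)
                deleted.append(path)
    return deleted
```

Such a sweep could run periodically (on macOS, for example, as a launchd job), but it only finds files under the directories it scans, which is exactly why it cannot stop a user who copies the files elsewhere.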
If you are doing this for good housekeeping, you can follow @Petesh's answer.
If you are doing this because you really want those files gone then no. The user could move the file to a USB stick and remove it, or edit the metadata, etc.
Your earlier question "Completely restricting all types of access to a folder" seems to address the same issue, and the suggestions are the same as given there: use encryption or implement your own file system.
E.g. have a special "trial file" format which is the same as the ordinary format (the one readable by other apps) but encrypted and including an expiry date. Your app then decrypts the file, checks the date, and either does its thing or reports to the user that the file is out of date.
The system isn't unbreakable, but it's a reasonable barrier: easy for you to do, too hard for the average user to break.
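A minimal sketch of the "trial file" container idea, assuming a made-up layout: an 8-byte expiry timestamp, an HMAC-SHA256 tag, then the payload. To stay within the standard library, this sketch only *authenticates* the payload (so the expiry can't be edited without detection); it does not encrypt it. A real trial format would also need to encrypt the payload, for example with an authenticated cipher from a crypto library, so other apps can't read it directly. The key would ship inside your app, so this is a barrier, not a guarantee.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

# Hypothetical layout: 8-byte big-endian expiry (Unix seconds),
# 32-byte HMAC-SHA256 tag over header + payload, then the payload.
HEADER = struct.Struct(">Q")
TAG_LEN = 32


def pack_trial(payload: bytes, expires_at: int, key: bytes) -> bytes:
    header = HEADER.pack(expires_at)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + tag + payload


def open_trial(blob: bytes, key: bytes, now: Optional[float] = None) -> bytes:
    """Return the payload, or raise ValueError if modified or expired."""
    if len(blob) < HEADER.size + TAG_LEN:
        raise ValueError("not a trial file")
    header = blob[:HEADER.size]
    tag = blob[HEADER.size:HEADER.size + TAG_LEN]
    payload = blob[HEADER.size + TAG_LEN:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("trial file has been modified")
    (expires_at,) = HEADER.unpack(header)
    if (time.time() if now is None else now) > expires_at:
        raise ValueError("trial has expired")
    return payload
```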

web.config vs. text file for storing a comma-separated value

We have a collection of VB.NET / IIS web services on some of our servers, and they have web.config files in the websites' root directories that they're already reading configurations from. A new configuration needs to be added that will immediately be quite a bit longer than the others, and it will only grow. It's essentially a comma-separated value, and I want to keep it specifically in a configuration file of some sort.
At first I started doing this with a text file, but there was a problem with that. The text file's contents could change while web service threads and processes are running, so they would essentially need to re-read the file every time they accessed its values. I thought about using some sort of caching, but unless the web services were completely restarted each time the file is updated, caching would prevent updates to the file from taking effect immediately. But reading from a text file each time is slow...
Then came the idea of putting that value in web.config, along with the other configurations the services are already using. When web.config is altered, the changes take effect immediately, and the values can still be cached in the code. However, web.config is, well, web.config: it's not just a plain text file that the code reads from. IIS treats web.config in a special manner.
I'm tempted to think that any drawbacks of putting a comma-separated value in web.config would be outweighed by its advantages over a text file (or a database, which probably can't be used for this anyway), but I'd better ask.
What are the implications of storing a possibly lengthy, comma-separated value in web.config instead of in its own little text file? Is either file a particularly good or bad idea? To me, web.config seems easier to work with because the file doesn't need to be re-read over and over, but there's certainly more to it than the average user is aware of. Thanks!
I recommend using the Application Cache for this:
http://msdn.microsoft.com/en-us/library/vstudio/6hbbsfk6(v=vs.100).aspx
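The point of caching here is that the list is parsed once and reloaded only when the file actually changes, which addresses both the "re-read every time" and the "stale cache" concerns in the question. ASP.NET's cache can express this with a file dependency (`CacheDependency`); the same idea is sketched below in Python purely for illustration, with a hypothetical `CsvFileCache` class. It still checks the file's modification time and size on each access, which is much cheaper than re-reading and re-parsing it.

```python
import os
import threading
from typing import List, Optional, Tuple


class CsvFileCache:
    """Cache the comma-separated values of one file and re-read the file
    only when its modification time or size changes (a simple stand-in
    for a cache entry with a file dependency)."""

    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()
        self._stamp: Optional[Tuple[int, int]] = None
        self._values: List[str] = []

    def values(self) -> List[str]:
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        with self._lock:
            if stamp != self._stamp:
                with open(self.path, encoding="utf-8") as f:
                    text = f.read()
                # Split on commas and drop surrounding whitespace and empties.
                self._values = [v.strip() for v in text.split(",") if v.strip()]
                self._stamp = stamp
            return list(self._values)
```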

What are the best practices for user uploads with S3?

I was wondering what you recommend for running a user upload system with S3. I plan on using MongoDB for storing metadata such as the uploader, size, etc. How should I go about storing the actual files in S3?
Here are some of my ideas; which do you think is best? All of these examples would involve saving the metadata to MongoDB.
1. Should I just store all the files in one bucket?
2. Maybe organize them by date (e.g. 6/8/2014/mypicture.png)?
3. Should I save them all in one bucket, but with an added string (such as d1JdaZ9-mypicture.png) to avoid duplicates?
4. Or should I generate a long string for a folder name and store the file in that folder (to retain the original file name)? e.g. sh8sb36zkj391k4dhqk4n5e4ndsqule6/mypicture.png
This depends primarily on how you intend to use the pictures and which objects/classes/modules/etc. in your code will actually deal with retrieving them.
If you find yourself wanting to do things like "all user uploads on a particular day", a simple naming convention, with folders for the year, month and day plus a top-level folder for the user's unique ID, will solve the problem.
If you want to ensure uniqueness and avoid collisions in your bucket, you could generate a unique string too.
However, since you've got MongoDB, which (I'm assuming) will actually handle these queries for user uploads by date, etc., the structure of your bucket becomes more an aesthetic choice than a functional one.
If all you're storing in MongoDB is the key/URL, it doesn't really matter what the actual structure of your bucket is. Nevertheless, it still makes sense to split things up in some coherent way, e.g. group all of a user's uploads and give each a unique name (either generate a unique name or add a unique prefix to the original file name).
That being said, might there come a point when you want to change how your images are stored? You might move to a CDN, or a third party might come up with an even cheaper/better product that you want to try. In a case like that, storing the full keys/URLs in your MongoDB is not a good idea, since you'll have to update every entry.
To make this relatively future-proof, I suggest you give your uploads a definite structure. I usually opt for:
bucket_name/user_id/yyyy/mm/dd/unique_name.jpg
Your database then only needs to store the file name and the upload time stamp.
You can introduce a middle layer in your logic (a new class perhaps or just a helper function/method) which then generates the URL for a file based on this info. That way, if you change your storage method later, you only need to make a small change in this middle layer (after migrating your files of course) and not worry about MongoDB.
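A minimal sketch of such a middle layer, following the `user_id/yyyy/mm/dd/unique_name` layout above. Note that in S3 the bucket name is not part of the object key, so here it lives in the base URL. `BASE_URL` is a placeholder, and the helper names and 12-character random prefix are assumptions for illustration; moving to a CDN would mean changing only `BASE_URL` (after migrating the files).

```python
import uuid
from datetime import datetime
from urllib.parse import quote

# Placeholder base URL (hypothetical bucket); swap this if storage moves.
BASE_URL = "https://example-bucket.s3.amazonaws.com"


def make_unique_name(original_name: str) -> str:
    """Prefix a random hex string so two uploads never share a key."""
    return f"{uuid.uuid4().hex[:12]}-{original_name}"


def object_key(user_id: str, uploaded_at: datetime, unique_name: str) -> str:
    """Build user_id/yyyy/mm/dd/unique_name from the stored metadata."""
    return f"{user_id}/{uploaded_at:%Y/%m/%d}/{unique_name}"


def file_url(user_id: str, uploaded_at: datetime, unique_name: str,
             base_url: str = BASE_URL) -> str:
    # quote() percent-encodes unsafe characters but keeps the '/' separators.
    return f"{base_url}/{quote(object_key(user_id, uploaded_at, unique_name))}"
```

MongoDB then stores only `user_id`, the upload timestamp and the unique name, and every URL is derived from them at read time.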

Appropriate file upload validation

Background
In a targeted issue-tracking application (in Django), users are able to add file attachments to internal messages. Files are mainly various image formats, office documents and spreadsheets (Microsoft Office or OpenOffice), PDFs and PSDs.
A custom file field type (extending FileField) currently validates that files don't exceed a given size and that the file's content_type is in the application's MIME type whitelist. But as the user base is very varied (multinational and multi-platform), we frequently have to adjust our whitelist, because users running old or brand-new application versions send different MIME types (even though the files are valid and are opened correctly by other users within the business).
Note: files are not 'executed' by Apache; they are just stored (with Unix permissions 600) and can be downloaded by users.
Question
What are the pros and cons of the different types of validation?
A few options:
MIME type whitelist or blacklist
File extension whitelist or blacklist
Django file upload input validation and security even suggests "you have to actually read the file to be sure it's a JPEG, not an .EXE" (is that even viable when numerous types of files are to be accepted?)
Is there a 'right' way to validate file uploads?
Edit
Let me clarify. I understand that opening each file in the program it is meant for, to check that it works and isn't broken, would be the only way to fully confirm that the file is what it says it is and that it isn't corrupted.
But the files in question are like email attachments. We can't possibly verify that every PSD is a valid and working Photoshop image; the same goes for JPG or any other type. Even if a file is what it says it is, we couldn't guarantee that it's fully functional.
So what I was hoping to get at is: is file magic absolutely crucial? What protection does it really add? And does a MIME type whitelist actually add any protection that a file extension whitelist doesn't? If a file has an extension of CSV, JPG, GIF, DOC or PSD, is it really viable to check that it is what it says it is, even though the application itself doesn't depend on the files?
Is it dangerous to use a simple file extension whitelist that excludes the obvious offenders (EXE, BAT, etc.) and thereby, I think, disallows files that are dangerous to users?
The best way to validate that a file is what it says it is, is to use magic.
That is, magic numbers: files can be identified by the first few bytes of their content. This is generally more accurate than extensions or MIME types, since you're judging what a file is by what it contains rather than by what the browser or user claimed it to be.
There's an article on FileMagic on the Python wiki
You might also look into using the python-magic package
Note that you don't need to get the entire file before using magic to determine what it is. You can read the first chunk of the file and send those bytes to be identified by file magic.
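python-magic wraps libmagic and recognises far more formats; to show what it does without a dependency, here is a stdlib-only sketch with a small, hand-picked table of well-known signatures (the table and function names are assumptions for illustration). It reads only the first few bytes and then rewinds, so it should also work on a file-like object such as Django's `UploadedFile`. Two limits worth knowing: plain-text formats such as CSV have no magic number, and the ZIP signature alone cannot tell a .docx from an .xlsx or any other ZIP archive.

```python
from typing import BinaryIO, Optional

# A small, hand-picked table of well-known signatures (not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"8BPS", "psd"),
    # ZIP container: .docx/.xlsx/.odt/.ods all start like this.
    (b"PK\x03\x04", "zip"),
    # OLE2 compound file: legacy .doc/.xls.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
]


def sniff(head: bytes) -> Optional[str]:
    """Return a label for the first matching signature, or None."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def sniff_upload(fileobj: BinaryIO, chunk_size: int = 16) -> Optional[str]:
    """Read only the first few bytes, then rewind so the file can be saved."""
    pos = fileobj.tell()
    head = fileobj.read(chunk_size)
    fileobj.seek(pos)
    return sniff(head)
```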
Clarification
Just to point out: using magic to identify a file really just means reading the first small chunk of it. It's definitely more overhead than just checking the extension, but not too much work. All that file magic does is check that the file "looks" like the type you want. It's like checking the file extension, except that you're looking at the first few bytes of the content instead of the last few characters of the file name, which makes it harder to spoof than simply renaming the file. I'd recommend against a MIME type whitelist. A file extension whitelist should work fine for your needs; just make sure you include all possible extensions, otherwise a perfectly valid file might be rejected just because it ends with .jpeg instead of .jpg.
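A minimal sketch of such an extension whitelist, with a hypothetical set covering the formats mentioned in the question (including both .jpg and .jpeg). The check is case-insensitive and looks only at the last extension, so a name like `invoice.pdf.exe` is rejected. Django 1.11 and later also ship `django.core.validators.FileExtensionValidator`, which can be attached to a `FileField` for the same purpose.

```python
import os

# Hypothetical whitelist; list every spelling users actually send.
ALLOWED_EXTENSIONS = {
    ".jpg", ".jpeg", ".png", ".gif", ".pdf", ".psd", ".csv",
    ".doc", ".docx", ".xls", ".xlsx", ".odt", ".ods",
}


def has_allowed_extension(filename: str) -> bool:
    """Case-insensitive check of the last extension only."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in ALLOWED_EXTENSIONS
```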

How to batch rename Media files on Wordpress to remove Special Chars?

I have recently realised that none of my media files that have special characters (such as ç, á, etc.) in their names are being uploaded to my CDN.
The thing is, I need to find a way to rename all of those files and also update the database... and I am a very bad programmer.
Any ideas?
One very attractive solution to the problem would be to urlencode() the file names. That way, the characters would retain their original meaning (retrievable using urldecode()) but should work with any CDN.
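To illustrate the round trip, here is a sketch in Python, whose `urllib.parse.quote_plus`/`unquote_plus` behave roughly like PHP's `urlencode()`/`urldecode()` (both encode spaces as `+`, though the exact set of characters left unescaped differs slightly). The `rename_plan` helper is hypothetical: it only builds an old-name to new-name mapping that a batch job could then use to rename the files and update the matching database rows. Keep in mind that the encoded names contain `%`, which must itself be escaped as `%25` when the file is requested by URL.

```python
from typing import Dict, Iterable
from urllib.parse import quote_plus, unquote_plus


def encode_filename(name: str) -> str:
    """Percent-encode a file name as UTF-8; spaces become '+'."""
    return quote_plus(name)


def decode_filename(encoded: str) -> str:
    """Recover the original file name."""
    return unquote_plus(encoded)


def rename_plan(names: Iterable[str]) -> Dict[str, str]:
    """Map each non-ASCII file name to its encoded form; ASCII names are skipped."""
    return {n: encode_filename(n) for n in names if not n.isascii()}
```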