How to limit file types on file uploads for ModelForms with FileFields? - django

My goal is to limit a FileField on a Django ModelForm to PDFs and Word Documents. The answers I have googled all deal with creating a separate file handler, but I am not sure how to do so in the context of a ModelForm. Is there a setting in settings.py I may use to limit upload file types?

Create a validation method like:
from django.core.exceptions import ValidationError
def validate_file_extension(value):
    if not value.name.endswith('.pdf'):
        raise ValidationError(u'Error message')
and include it on the FileField validators like this:
actual_file = models.FileField(upload_to='uploaded_files', validators=[validate_file_extension])
Also, instead of hard-coding which extensions your model allows, you could define a list in your settings.py and check against it.
Edit
To filter for multiple file types:
import os
def validate_file_extension(value):
    ext = os.path.splitext(value.name)[1].lower()
    valid_extensions = ['.pdf', '.doc', '.docx']
    if ext not in valid_extensions:
        raise ValidationError(u'File not supported!')
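The suggestion to keep the allowed extensions in one place could look roughly like the sketch below. It is plain Python so it can be run on its own; ALLOWED_UPLOAD_EXTENSIONS is a hypothetical setting name (in a Django project it would live in settings.py), and has_allowed_extension is a made-up helper that a validator could call before raising ValidationError.

```python
import os

# Hypothetical setting; in a Django project this list would live in settings.py.
ALLOWED_UPLOAD_EXTENSIONS = ['.pdf', '.doc', '.docx']


def has_allowed_extension(filename, allowed=ALLOWED_UPLOAD_EXTENSIONS):
    """Return True if filename ends in one of the allowed extensions."""
    # splitext keeps the leading dot; lower() makes 'REPORT.PDF' acceptable too.
    ext = os.path.splitext(filename)[1].lower()
    return ext in allowed


print(has_allowed_extension('archive.pdf.exe'))  # False
```

Only the last extension counts, so a name like archive.pdf.exe is rejected.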

Validating by file name extension is not reliable. For example, I can rename picture.jpg to picture.pdf and the validation won't raise an error.
A better approach is to check the content_type of the file.
Validation Method
def validate_file_extension(value):
    if value.file.content_type != 'application/pdf':
        raise ValidationError(u'Error message')
Usage
actual_file = models.FileField(upload_to='uploaded_files', validators=[validate_file_extension])

An easier way of doing it is as below in your Form
file = forms.FileField(widget=forms.FileInput(attrs={'accept':'application/pdf'}))

Django since 1.11 has a FileExtensionValidator for this purpose:
class SomeDocument(Model):
    document = models.FileField(validators=[
        FileExtensionValidator(allowed_extensions=['pdf', 'doc'])])
As @savp mentioned, you will also want to customize the widget so that users can't select inappropriate files in the first place:
class SomeDocumentForm(ModelForm):
    class Meta:
        model = SomeDocument
        widgets = {'document': FileInput(attrs={'accept': 'application/pdf,application/msword'})}
        fields = '__all__'
You may need to fiddle with accept to figure out exactly what MIME types are needed for your purposes.
As others have mentioned, none of this will prevent someone from renaming badstuff.exe to innocent.pdf and uploading it through your form—you will still need to handle the uploaded file safely. Something like the python-magic library can help you determine the actual file type once you have the contents.
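To illustrate what "determine the actual file type once you have the contents" means, here is a minimal sketch that checks the leading bytes of an upload using only the standard library. It is not a replacement for python-magic: the ZIP signature matches any ZIP archive (not just .docx) and the OLE2 signature also matches legacy .xls and .ppt files, so treat it as a coarse first filter. The names SIGNATURES and sniff_type are made up for this example.

```python
import io

# Leading bytes ("magic numbers") of the formats from the question.
SIGNATURES = {
    'pdf': b'%PDF-',
    'doc': b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1',  # OLE2 container used by legacy .doc
    'docx': b'PK\x03\x04',                        # ZIP container used by .docx
}


def sniff_type(uploaded_file):
    """Return 'pdf', 'doc', 'docx' or None based on the first bytes of the file."""
    header = uploaded_file.read(8)
    uploaded_file.seek(0)  # rewind so later code sees the whole file
    for name, signature in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


# A JPEG renamed to innocent.pdf still starts with JPEG bytes.
renamed_jpeg = io.BytesIO(b'\xff\xd8\xff\xe0' + b'\x00' * 16)
print(sniff_type(renamed_jpeg))  # None
```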

For a more generic use, I wrote a small class ExtensionValidator that extends Django's built-in RegexValidator. It accepts single or multiple extensions, as well as an optional custom error message.
class ExtensionValidator(RegexValidator):
    def __init__(self, extensions, message=None):
        # A lone string is iterable too, so check for it explicitly.
        if isinstance(extensions, str):
            extensions = [extensions]
        regex = r'\.(%s)$' % '|'.join(extensions)
        if message is None:
            message = 'File type not supported. Accepted types are: %s.' % ', '.join(extensions)
        super(ExtensionValidator, self).__init__(regex, message)
    def __call__(self, value):
        super(ExtensionValidator, self).__call__(value.name)
Now you can define a validator inline with the field, e.g.:
my_file = models.FileField('My file', validators=[ExtensionValidator(['pdf', 'doc', 'docx'])])
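To see what the generated pattern accepts, the regex construction can be tried outside Django. This is only a sketch: build_extension_pattern is a made-up helper, and re.escape and re.IGNORECASE are additions that the class above does not use (its pattern is case-sensitive).

```python
import re


def build_extension_pattern(extensions):
    """Compile a pattern that matches names ending in one of the extensions."""
    # re.escape guards against extensions that contain regex metacharacters.
    alternatives = '|'.join(re.escape(ext) for ext in extensions)
    return re.compile(r'\.(%s)$' % alternatives, re.IGNORECASE)


pattern = build_extension_pattern(['pdf', 'doc', 'docx'])
print(bool(pattern.search('report.pdf')))       # True
print(bool(pattern.search('archive.pdf.exe')))  # False
```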

I use something along these lines (note, "pip install filemagic" is required for this...):
import magic
def validate_mime_type(value):
    supported_types = ['application/pdf']
    with magic.Magic(flags=magic.MAGIC_MIME_TYPE) as m:
        mime_type = m.id_buffer(value.file.read(1024))
        value.file.seek(0)
    if mime_type not in supported_types:
        raise ValidationError(u'Unsupported file type.')
You could probably also incorporate the previous examples into this - for example also check the extension/uploaded type (which might be faster as a primary check than magic.) This still isn't foolproof - but it's better, since it relies more on data in the file, rather than browser provided headers.
Note: This is a validator function that you'd want to add to the list of validators for the FileField model.

I find that the best way to check the type of a file is by checking its content type. I will also add that one of the best places to do type checking is in form validation. I would have a form and a validation as follows:
from django import forms
from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _
class UploadFileForm(forms.Form):
    file = forms.FileField()
    def clean_file(self):
        data = self.cleaned_data['file']
        # check if the content type is what we expect
        content_type = data.content_type
        if content_type == 'application/pdf':
            return data
        else:
            raise ValidationError(_('Invalid content type'))
The following documentation links can be helpful:
https://docs.djangoproject.com/en/3.1/ref/files/uploads/ and https://docs.djangoproject.com/en/3.1/ref/forms/validation/

I handle this by using a clean_[your_field] method on a ModelForm. You could set a list of acceptable file extensions in settings.py to check against in your clean method, but there's nothing built into settings.py to limit upload types.
Django-Filebrowser, for example, takes the approach of creating a list of acceptable file extensions in settings.py.
Hope that helps you out.

Related

Django Validate Image or File With A Form Inside a Form

Hello, I'm having trouble handling multiple images uploaded through the same field.
As far as I know, the Django tutorial suggests this:
for f in request.FILES.getlist('files'):
    # do something (validate here maybe)
which I don't quite understand. Do I have to validate manually? If so, why?
Anyway, they give another approach:
files = forms.FileField(widget=ClearableFileInput(attrs={'multiple': True}))
This one does not work the way I want. self.cleaned_data['files'] only gives one file (there is a similar problem here), and django/multiupload had a bug in my experience that was, sadly, too slow to be fixed :(.
What I want is to validate each file and report errors for each one via ImageField, because I would rather have it validate the file than code that myself.
So I made some prototype code.
forms.py
class ImageForm(forms.Form):
    # validate each image here
    image = forms.ImageField()
class BaseForm(forms.Form):
    # first form
    ping = forms.CharField()
    # images = SomeThingMultipleFileField that would raise an error for each invalid image.
    # Since there is no such option, I decided to do the following instead;
    # this field is only here so that images is required.
    images = forms.ImageField()
    def handle(self, request, *args, **kwargs):
        # custom function
        image_list = []
        errors = []
        # validate each image in images via another form;
        # any errors (e.g. "this field is required") are appended to errors
        for image in request.FILES.getlist('images'):
            data = ImageForm(image)
            if data.is_valid():
                image_list.append(data.cleaned_data['image'])
            else:
                errors.append(data.errors)
        if errors:
            pass  # raise errors
        return image_list  # return the data
views.py
def base(request):
    # this is an api
    # expected input should be from the code or format
    # {'ping': 'test', 'images': 1.jpg, 'images': 2.jpg}
    # This is not the actual view code.
    data = forms.BaseForm(request.POST, request.FILES)
    if data.is_valid():
        value = data.handle(request)
        return JSONResponse({'data': value})
    return JSONResponse({'errors': data.errors})
It's not elegant, to be honest, but I'm stuck and can't think of any other option.
The problem in my code is that
data = ImageForm(image)
does not read the file, so image_list is always empty.
Can anyone help me here? I'm stuck.
Is there a better approach?
I also wonder how to raise a general error, e.g. if one image is not valid it produces something like {'files': 'One of the images is not valid.'}
I tested again, so my bad: it seems the nested form needs its arguments in the (data, files) format, just like an ordinary form.
forms.py
... # previous code
# data = ImageForm(image) , old code
data = ImageForm({}, {'image': image})
This way, the form receives an empty dict where it normally gets the QueryDict (data) and a dict containing the image where it normally gets the MultiValueDict (files).
The third question (a general error) can be answered too.
Instead of
# previous code
else:
    errors.append(error)
it should now be
raise ValidationError(_('Your error'))
Any better approach?
Not much that I can think of, sadly. If anyone stumbles upon this, feel free to comment. The help is much appreciated.

How to use validators on FileField content

In my model, I want to use a validator to analyze the content of a file, the thing I can not figure out is how to access the content of the file to parse through it as the file has not yet been saved (which is good) when the validators are running.
I'm not understanding how to get the data from the value passed to the validator into a file (I assume I should use tempfile) so I can then open it and evaluate the data.
Here's a simplified example, in my real code, I want to open the file and evaluate it with csv.
In models.py:
class ValidateFile(object):
    ....
    def __call__(self, value):
        # value is the FieldFile object, but it's not saved yet
        # I believe I need to do something like:
        temp_file = tempfile.TemporaryFile()
        temp_file.write(value.read())
        # Check the data in temp_file
        ....
class MyItems(models.Model):
    data = models.FileField(upload_to=get_upload_path,
                            validators=[FileExtensionValidator(allowed_extensions=['cv']),
                                        ValidateFile()])
Thanks for the help!
Take a look at how this is done in the ImageField implementation:
So your ValidateFile class may be something like this:
from io import BytesIO
class ValidateFile(object):
    def __call__(self, value):
        if value is None:
            # do something when None
            return None
        if hasattr(value, 'temporary_file_path'):
            file = value.temporary_file_path()
        else:
            if hasattr(value, 'read'):
                file = BytesIO(value.read())
            else:
                file = BytesIO(value['content'])
        # Now validate your file
No need for tempfile:
The value passed to a FileField validator is an instance of FieldFile, as already mentioned by the OP.
Under the hood, the FieldFile instance might already use a tempfile.NamedTemporaryFile (source), or it might wrap an in-memory file, but you need not worry about that:
To "evaluate the data" you can simply treat the FieldFile instance as any Python file object.
For example, you could iterate over it:
def my_filefield_validator(value):
    # note that value is a FieldFile instance
    for line in value:
        ...  # do something with line
The documentation says:
In addition to the API inherited from File such as read() and write(), FieldFile includes several methods that can be used to interact with the underlying file: ...
and the FieldFile class provides
... a wrapper around the result of the Storage.open() method, which may be a File object, or it may be a custom storage’s implementation of the File API.
An example of such an underlying file implementation is the InMemoryUploadedFile docs/source.
Also from the docs:
The File class is a thin wrapper around a Python file object with some Django-specific additions
Also note: class-based validators vs function-based validators
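Since the question was about evaluating the upload with csv, here is a rough sketch of how a binary file-like object can be handed to csv.reader without a temporary file. An io.BytesIO stands in for the uploaded file, the UTF-8 encoding and the fixed column count are assumptions, and check_csv_columns is a made-up name; in a real validator you would raise ValidationError instead of ValueError.

```python
import csv
import io


def check_csv_columns(binary_file, expected_columns):
    """Raise ValueError if any row does not have expected_columns fields."""
    binary_file.seek(0)
    # csv needs text, so wrap the binary stream instead of copying it.
    text = io.TextIOWrapper(binary_file, encoding='utf-8', newline='')
    try:
        for line_number, row in enumerate(csv.reader(text), start=1):
            if len(row) != expected_columns:
                raise ValueError('Line %d has %d columns' % (line_number, len(row)))
    finally:
        text.detach()        # keep the underlying file open for later use
        binary_file.seek(0)  # rewind so saving still sees the full content


upload = io.BytesIO(b'name,price\nwidget,3\n')
check_csv_columns(upload, expected_columns=2)
```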

How does one use magic to verify file type in a Django form clean method?

I have written an email form class in Django with a FileField. I want to check the uploaded file for its type via checking its mimetype. Subsequently, I want to limit file types to pdfs, word, and open office documents.
To this end, I have installed python-magic and would like to check file types as follows per the specs for python-magic:
mime = magic.Magic(mime=True)
file_mime_type = mime.from_file('address/of/file.txt')
However, recently uploaded files lack addresses on my server. I also do not know of any method of the mime object akin to "from_file_content" that checks for the mime type given the content of the file.
What is an effective way to use magic to verify file types of uploaded files in Django forms?
Stan described a good variant using a buffer. Unfortunately, the weakness of that method is that it reads the file into memory. Another option is to use a temporary file:
import tempfile
import magic
with tempfile.NamedTemporaryFile() as tmp:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.flush()  # make sure the data is on disk before libmagic reads it by name
    print(magic.from_file(tmp.name, mime=True))
Also, you might want to check the file size:
if form.cleaned_data['file'].size < ...:
    print(magic.from_buffer(form.cleaned_data['file'].read()))
else:
    # store to disk (the code above)
Additionally:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
So you might want to handle it like so:
import os
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.close()  # flush and release the handle so the file can be reopened by name
    print(magic.from_file(tmp.name, mime=True))
finally:
    tmp.close()  # no-op if already closed
    os.unlink(tmp.name)
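The platform caveat can be sidestepped by closing the temporary file before anything else opens it by name. Below is a small standard-library sketch of that round trip; spool_to_disk is a made-up helper and the list of byte strings stands in for the chunks() of an uploaded file.

```python
import os
import tempfile


def spool_to_disk(chunks):
    """Write an iterable of byte chunks to a closed temporary file; return its path."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in chunks:
            tmp.write(chunk)
    finally:
        tmp.close()  # closing flushes the data and lets other code reopen the path
    return tmp.name


path = spool_to_disk([b'%PDF-1.7\n', b'more data'])
try:
    # Any path-based tool (magic.from_file, for instance) could be used here.
    with open(path, 'rb') as handle:
        first_bytes = handle.read(5)
finally:
    os.unlink(path)  # delete=False means cleanup is our job
```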
Also, you might want to seek(0) after read():
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
Where uploaded data is stored
Why not try something like this in your view:
m = magic.Magic()
m.from_buffer(request.FILES['my_file_field'].read())
Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.
mime = magic.Magic(mime=True)
attachment = form.cleaned_data['attachment']
if hasattr(attachment, 'temporary_file_path'):
    # the file is temporarily stored on disk, so we can get its full path
    mime_type = mime.from_file(attachment.temporary_file_path())
else:
    # the file is in memory
    mime_type = mime.from_buffer(attachment.read())
Also, you might want to seek(0) after read():
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
Example from Django code. Performed for image fields during validation.
You can use the django-safe-filefield package to validate that an uploaded file's extension matches its MIME type.
from safe_filefield.forms import SafeFileField
class MyForm(forms.Form):
    attachment = SafeFileField(
        allowed_extensions=('xls', 'xlsx', 'csv')
    )
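The general idea of checking that an extension agrees with the detected MIME type can be sketched with the standard library's mimetypes module. This is not how django-safe-filefield is implemented (I have not checked its internals); extension_matches_mime is a made-up helper, and detected_mime is assumed to come from something like python-magic.

```python
import mimetypes


def extension_matches_mime(filename, detected_mime):
    """Return True if the MIME type implied by the extension equals detected_mime."""
    expected_mime, _encoding = mimetypes.guess_type(filename)
    return expected_mime is not None and expected_mime == detected_mime


# A file named report.pdf whose content was detected as a JPEG is rejected.
print(extension_matches_mime('report.pdf', 'image/jpeg'))  # False
```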
In case you're handling a file upload and concerned only about images,
Django will set content_type for you (or rather for itself?):
from django.forms import ModelForm
from django.core.files import File
from django.db import models
class MyPhoto(models.Model):
    photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)
class MyForm(ModelForm):
    class Meta:
        model = MyPhoto
        fields = ['photo']
photo = MyPhoto.objects.first()
photo = File(open('1.jpeg', 'rb'))
form = MyForm(files={'photo': photo})
if form.is_valid():
    print(form.instance.photo.file.content_type)
It doesn't rely on content type provided by the user. But
django.db.models.fields.files.FieldFile.file is an undocumented
property.
Actually, initially content_type is set from the request, but when
the form gets validated, the value is updated.
Regarding non-images, doing request.FILES['name'].read() seems okay to me.
First, that's what Django does. Second, files larger than 2.5 MB by default
are stored on a disk. So let me point you at the other answer
here.
For the curious, here's the stack trace that leads to updating
content_type:
django.forms.forms.BaseForm.is_valid: self.errors
django.forms.forms.BaseForm.errors: self.full_clean()
django.forms.forms.BaseForm.full_clean: self._clean_fields()
django.forms.forms.BaseForm._clean_fields: field.clean()
django.forms.fields.FileField.clean: super().clean()
django.forms.fields.Field.clean: self.to_python()
django.forms.fields.ImageField.to_python

Only accept a certain file type in FileField, server-side

How can I restrict FileField to only accept a certain type of file (video, audio, pdf, etc.) in an elegant way, server-side?
One very easy way is to use a custom validator.
In your app's validators.py:
def validate_file_extension(value):
    import os
    from django.core.exceptions import ValidationError
    ext = os.path.splitext(value.name)[1]  # [0] returns path+filename
    valid_extensions = ['.pdf', '.doc', '.docx', '.jpg', '.png', '.xlsx', '.xls']
    if ext.lower() not in valid_extensions:
        raise ValidationError('Unsupported file extension.')
Then in your models.py:
from .validators import validate_file_extension
... and use the validator for your form field:
class Document(models.Model):
    file = models.FileField(upload_to="documents/%Y/%m/%d", validators=[validate_file_extension])
See also: How to limit file types on file uploads for ModelForms with FileFields?.
Warning
To secure your code execution environment against malicious media files:
Use Exif libraries to properly validate the media files.
Separate your media files from your application code execution environment.
If possible, use solutions like S3, GCS, Minio or anything similar.
When loading media files on the client side, use client-native methods (for example, if you load the media files insecurely in a browser, it may cause execution of "crafted" JavaScript code).
Django 1.11 added a FileExtensionValidator for model fields; the docs are here: https://docs.djangoproject.com/en/dev/ref/validators/#fileextensionvalidator.
An example of how to validate a file extension:
from django.core.validators import FileExtensionValidator
from django.db import models
class MyModel(models.Model):
    pdf_file = models.FileField(
        upload_to="foo/", validators=[FileExtensionValidator(allowed_extensions=["pdf"])]
    )
Note that this method is not safe. Citation from Django docs:
Don’t rely on validation of the file extension to determine a file’s
type. Files can be renamed to have any extension no matter what data
they contain.
There is also new validate_image_file_extension (https://docs.djangoproject.com/en/dev/ref/validators/#validate-image-file-extension) for validating image extensions (using Pillow).
A few people have suggested using python-magic to validate that the file actually is of the type you are expecting to receive. This can be incorporated into the validator suggested in the accepted answer:
import os
import magic
from django.core.exceptions import ValidationError
def validate_is_pdf(file):
    valid_mime_types = ['application/pdf']
    file_mime_type = magic.from_buffer(file.read(1024), mime=True)
    file.seek(0)  # rewind so the file can still be saved in full
    if file_mime_type not in valid_mime_types:
        raise ValidationError('Unsupported file type.')
    valid_file_extensions = ['.pdf']
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError('Unacceptable file extension.')
This example only validates a PDF, but any number of MIME types and file extensions can be added to the lists.
Assuming you saved the above in validators.py you can incorporate this into your model like so:
from myapp.validators import validate_is_pdf
class PdfFile(models.Model):
    file = models.FileField(upload_to='pdfs/', validators=(validate_is_pdf,))
You can use the below to restrict filetypes in your Form
file = forms.FileField(widget=forms.FileInput(attrs={'accept':'application/pdf'}))
There's a Django snippet that does this:
import os
from django import forms
class ExtFileField(forms.FileField):
    """
    Same as forms.FileField, but you can specify a file extension whitelist.
    >>> from django.core.files.uploadedfile import SimpleUploadedFile
    >>>
    >>> t = ExtFileField(ext_whitelist=(".pdf", ".txt"))
    >>>
    >>> t.clean(SimpleUploadedFile('filename.pdf', b'Some File Content'))
    >>> t.clean(SimpleUploadedFile('filename.txt', b'Some File Content'))
    >>>
    >>> t.clean(SimpleUploadedFile('filename.exe', b'Some File Content'))
    Traceback (most recent call last):
    ...
    ValidationError: [u'Not allowed filetype!']
    """
    def __init__(self, *args, **kwargs):
        ext_whitelist = kwargs.pop("ext_whitelist")
        self.ext_whitelist = [i.lower() for i in ext_whitelist]
        super(ExtFileField, self).__init__(*args, **kwargs)
    def clean(self, *args, **kwargs):
        data = super(ExtFileField, self).clean(*args, **kwargs)
        filename = data.name
        ext = os.path.splitext(filename)[1]
        ext = ext.lower()
        if ext not in self.ext_whitelist:
            raise forms.ValidationError("Not allowed filetype!")
        return data  # clean() must return the cleaned value
#-------------------------------------------------------------------------
if __name__ == "__main__":
    import doctest
    doctest.testmod()
First, create a file named formatChecker.py inside the app that contains the model whose FileField should accept only certain file types.
This is your formatChecker.py:
from django import forms
from django.db.models import FileField
from django.template.defaultfilters import filesizeformat
from django.utils.translation import gettext_lazy as _
class ContentTypeRestrictedFileField(FileField):
    """
    Same as FileField, but you can specify:
    * content_types - list containing allowed content_types. Example: ['application/pdf', 'image/jpeg']
    * max_upload_size - a number indicating the maximum file size allowed for upload.
        2.5MB - 2621440
        5MB - 5242880
        10MB - 10485760
        20MB - 20971520
        50MB - 52428800
        100MB - 104857600
        250MB - 262144000
        500MB - 524288000
    """
    def __init__(self, *args, **kwargs):
        self.content_types = kwargs.pop("content_types")
        self.max_upload_size = kwargs.pop("max_upload_size")
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)
    def clean(self, *args, **kwargs):
        data = super(ContentTypeRestrictedFileField, self).clean(*args, **kwargs)
        file = data.file
        try:
            content_type = file.content_type
            if content_type in self.content_types:
                if file.size > self.max_upload_size:
                    raise forms.ValidationError(_('Please keep filesize under %s. Current filesize %s') % (filesizeformat(self.max_upload_size), filesizeformat(file.size)))
            else:
                raise forms.ValidationError(_('Filetype not supported.'))
        except AttributeError:
            # files already stored (not a fresh upload) have no content_type
            pass
        return data
Second, in your models.py, add this:
from .formatChecker import ContentTypeRestrictedFileField
Then, instead of using 'FileField', use this 'ContentTypeRestrictedFileField'.
Example:
class Stuff(models.Model):
    title = models.CharField(max_length=245)
    handout = ContentTypeRestrictedFileField(upload_to='uploads/', content_types=['video/x-msvideo', 'application/pdf', 'video/mp4', 'audio/mpeg'], max_upload_size=5242880, blank=True, null=True)
Those are the things you have to do when you want to accept only certain file types in a FileField.
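The byte counts for max_upload_size are easy to mistype, so it may be simpler to compute them. A tiny sketch, assuming 1 MB means 1024 * 1024 bytes (which is what the 2.5MB - 2621440 entry implies); megabytes and MAX_UPLOAD_SIZE are made-up names.

```python
def megabytes(count):
    """Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes."""
    return int(count * 1024 * 1024)


# Value that could be passed as max_upload_size in the model field above.
MAX_UPLOAD_SIZE = megabytes(5)
print(MAX_UPLOAD_SIZE)  # 5242880
```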
After checking the accepted answer, I decided to share a tip based on the Django documentation. There is already a validator for checking file extensions, so you don't need to write your own custom function to check whether a file extension is allowed.
https://docs.djangoproject.com/en/3.0/ref/validators/#fileextensionvalidator
Warning
Don’t rely on validation of the file extension to determine a file’s
type. Files can be renamed to have any extension no matter what data
they contain.
I think the best way to go is to combine the ExtFileField that Dominic Rodger specified in his answer with python-magic, which Daniel Quinn mentioned. If someone is smart enough to change the extension, at least you will catch them with the headers.
You can define a list of accepted mime types in settings and then define a validator which uses python-magic to detect the mime-type and raises ValidationError if the mime-type is not accepted. Set that validator on the file form field.
The only problem is that sometimes the mime type is application/octet-stream, which could correspond to different file formats. Has anyone overcome this issue?
Additionally, I will extend this class with some extra behaviour.
class ContentTypeRestrictedFileField(forms.FileField):
    ...
    widget = None
    ...
    def __init__(self, *args, **kwargs):
        ...
        self.widget = forms.ClearableFileInput(attrs={'accept': kwargs.pop('accept', None)})
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)
When we create an instance with the parameter accept=".pdf,.txt", the file-selection dialog will by default show only files with the given extensions.
Just a minor tweak to @Thismatters' answer, since I can't comment. According to the README of python-magic:
recommend using at least the first 2048 bytes, as less can produce incorrect identification
So reading the first 2048 bytes of the file instead of 1024 to determine the MIME type should give a more accurate result, hence:
def validate_extension(file):
    valid_mime_types = ["application/pdf", "image/jpeg", "image/png", "image/jpg"]
    file_mime_type = magic.from_buffer(file.read(2048), mime=True)  # Changed from 1024 to 2048
    file.seek(0)  # rewind so the file can still be saved in full
    if file_mime_type not in valid_mime_types:
        raise ValidationError("Unsupported file type.")
    valid_file_extensions = [".pdf", ".jpeg", ".png", ".jpg"]
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError("Unacceptable file extension.")

Processing file uploads before object is saved

I've got a model like this:
class Talk(BaseModel):
    title = models.CharField(max_length=200)
    mp3 = models.FileField(upload_to=u'talks/', max_length=200)
    seconds = models.IntegerField(blank=True, null=True)
I want to validate before saving that the uploaded file is an MP3, like this:
def is_mp3(path_to_file):
    from mutagen.mp3 import MP3
    audio = MP3(path_to_file)
    return not audio.info.sketchy
Once I'm sure I've got an MP3, I want to save the length of the talk in the seconds attribute, like this:
audio = MP3(path_to_file)
self.seconds = audio.info.length
The problem is, before saving, the uploaded file doesn't have a path (see this ticket, closed as wontfix), so I can't process the MP3.
I'd like to raise a nice validation error so that ModelForms can display a helpful error ("You idiot, you didn't upload an MP3" or something).
Any idea how I can go about accessing the file before it's saved?
p.s. If anyone knows a better way of validating files are MP3s I'm all ears - I also want to be able to mess around with ID3 data (set the artist, album, title and probably album art, so I need it to be processable by mutagen).
You can access the file data in request.FILES while in your view.
I think the best way is to bind the uploaded files to a form, override the form's clean method, get the UploadedFile object from cleaned_data, and validate it any way you like. Then override the save method, populate your model instance with information about the file, and save it.
A cleaner way to get the file before it is saved is like this:
from django.core.exceptions import ValidationError
# this goes in your Model class
def clean(self):
    try:
        f = self.mp3.file  # the file in memory
    except ValueError:
        raise ValidationError("A File is needed")
    f.__class__  # this prints <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
    processfile(f)
And if we need a path, the answer is in this other question.
You could follow the technique used by ImageField where it validates the file header and then seeks back to the start of the file.
class ImageField(FileField):
    # ...
    def to_python(self, data):
        f = super(ImageField, self).to_python(data)
        # ...
        # We need to get a file object for Pillow. We might have a path or we might
        # have to read the data into memory.
        if hasattr(data, 'temporary_file_path'):
            file = data.temporary_file_path()
        else:
            if hasattr(data, 'read'):
                file = BytesIO(data.read())
            else:
                file = BytesIO(data['content'])
        try:
            # ...
        except Exception:
            # Pillow doesn't recognize it as an image.
            six.reraise(ValidationError, ValidationError(
                self.error_messages['invalid_image'],
                code='invalid_image',
            ), sys.exc_info()[2])
        if hasattr(f, 'seek') and callable(f.seek):
            f.seek(0)
        return f
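Applied to the MP3 case, the same "peek at the header, then seek back" technique might look like the sketch below. It is only a rough heuristic (an ID3v2 tag or an MPEG frame sync at the very start) and no substitute for parsing the file with mutagen; looks_like_mp3 is a made-up name and io.BytesIO stands in for the uploaded file.

```python
import io


def looks_like_mp3(uploaded_file):
    """Rough heuristic: an ID3v2 tag or an MPEG frame sync at the start."""
    header = uploaded_file.read(3)
    # Rewind, as ImageField does, so later consumers read from the beginning.
    if hasattr(uploaded_file, 'seek') and callable(uploaded_file.seek):
        uploaded_file.seek(0)
    if header.startswith(b'ID3'):
        return True
    # Frame sync: 11 set bits, i.e. 0xFF followed by a byte whose top 3 bits are set.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


sample = io.BytesIO(b'ID3\x04\x00' + b'\x00' * 16)
print(looks_like_mp3(sample))  # True
```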