How to unit test file upload in Django

In my Django app, I have a view that handles file uploads. The core snippet is:
...
if request.method == 'POST':
    if 'file' in request.FILES:
        file = request.FILES['file']
        with open(os.path.join(settings.destfolder, file.name), 'wb+') as dest:
            for chunk in file.chunks():
                dest.write(chunk)
I would like to unit test the view. I am planning to test the happy path as well as the failure paths, i.e. the case where request.FILES has no key 'file' and the case where request.FILES['file'] is None.
How do I set up the POST data for the happy path?

I used to do the same with `with open('some_file.txt') as fp:`, but then I needed images, videos and other real files in the repo. I was also effectively testing a part of a Django core component that is already well tested, so currently this is what I do:
from django.core.files.uploadedfile import SimpleUploadedFile
from django.urls import reverse
def test_upload_video(self):
    video = SimpleUploadedFile("file.mp4", b"file_content", content_type="video/mp4")
    self.client.post(reverse('app:some_view'), {'video': video})
    # some important assertions ...
In Python 3 the content must be a bytes object, not a str: pass b"file_content", not "file_content".
It's been working fine: SimpleUploadedFile creates an InMemoryUploadedFile that behaves like a regular upload, and you can choose the name, content and content type.

From the Django docs on Client.post:
Submitting files is a special case. To POST a file, you need only provide the file field name as a key, and a file handle to the file you wish to upload as a value. For example:
c = Client()
with open('wishlist.doc', 'rb') as fp:
    c.post('/customers/wishes/', {'name': 'fred', 'attachment': fp})
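What makes files "a special case" is the encoding: when a value is a file handle, the test client sends the POST body as multipart/form-data (Django does this in `django.test.client.encode_multipart`). The stdlib-only sketch below is a simplified illustration of that wire format, not Django's code; `encode_multipart_sketch` and its `(filename, bytes, content_type)` tuple convention are assumptions of mine. A list value becomes repeated parts under the same field name, which is how several files can share one key.

```python
import uuid


def encode_multipart_sketch(fields, boundary=None):
    """Hypothetical helper: encode a dict as multipart/form-data.

    Values may be a str, a (filename, bytes, content_type) tuple for a file,
    or a list of either; a list yields one part per item under the same name.
    Returns (content_type_header, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            lines.append(f"--{boundary}".encode())
            if isinstance(item, tuple):
                filename, payload, content_type = item
                lines.append(
                    f'Content-Disposition: form-data; name="{name}"; '
                    f'filename="{filename}"'.encode()
                )
                lines.append(f"Content-Type: {content_type}".encode())
                lines.append(b"")  # blank line separates headers from content
                lines.append(payload)
            else:
                lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
                lines.append(b"")
                lines.append(str(item).encode())
    lines.append(f"--{boundary}--".encode())  # closing delimiter
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", b"\r\n".join(lines)
```

Printing the body for a small dict is a quick way to see what your view actually receives.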

I recommend taking a look at Django's RequestFactory. It's the best way to mock the data provided in the request.
That said, I found several flaws in your code.
"Unit" testing means testing just one "unit" of functionality. If you test that view as written, you'd be testing both the view and the file system, so it's not really a unit test. To make this point clearer: if you run that test and the view works fine, but you don't have permission to save that file, your test will fail because of that.
Another important thing is test speed. If you're doing something like TDD, the execution speed of your tests really matters, and hitting any I/O is not a good idea.
So I recommend refactoring your view to use a function like:
def upload_file_to_location(request, location=None):  # can fall back to the configured default
and mocking that. You can use Python's mock library (unittest.mock in Python 3).
PS: You could also use the Django test client, but that means adding even more things to the test, because the client goes through sessions, middleware, etc. That is nothing like unit testing.
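A minimal sketch of that refactoring, assuming the function takes the request plus an optional target directory; `DEFAULT_UPLOAD_DIR` is a hypothetical stand-in for `settings.destfolder`. The tests replace `open` with `unittest.mock.mock_open`, so nothing touches the file system, and the request is a plain `Mock` rather than a real `HttpRequest`.

```python
import os
import tempfile
import unittest
from unittest import mock

# Hypothetical stand-in for settings.destfolder.
DEFAULT_UPLOAD_DIR = tempfile.gettempdir()


def upload_file_to_location(request, location=None):
    """Write request.FILES['file'] into `location`; return the path, or None."""
    uploaded = request.FILES.get("file")
    if uploaded is None:
        return None
    location = location or DEFAULT_UPLOAD_DIR
    path = os.path.join(location, os.path.basename(uploaded.name))
    with open(path, "wb") as dest:
        for chunk in uploaded.chunks():
            dest.write(chunk)
    return path


class UploadFileToLocationTests(unittest.TestCase):
    def make_request(self, files):
        request = mock.Mock()
        request.FILES = files  # a plain dict is enough for .get()
        return request

    def test_writes_every_chunk_without_touching_disk(self):
        uploaded = mock.Mock()
        uploaded.name = "report.txt"
        uploaded.chunks.return_value = [b"ab", b"cd"]
        fake_open = mock.mock_open()
        with mock.patch("builtins.open", fake_open):
            path = upload_file_to_location(self.make_request({"file": uploaded}), "/uploads")
        self.assertEqual(path, os.path.join("/uploads", "report.txt"))
        fake_open.assert_called_once_with(path, "wb")
        # mock_open returns the same handle from __enter__, so writes land here.
        fake_open.return_value.write.assert_has_calls([mock.call(b"ab"), mock.call(b"cd")])

    def test_missing_file_key_returns_none(self):
        self.assertIsNone(upload_file_to_location(self.make_request({})))
```

The view itself then shrinks to calling this function and turning its result into a response.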

I do something like this for my own event-related application; it should give you more than enough code to adapt to your own use case:
import csv
import os
class UploadPaperTest(TestCase):
    def generate_file(self):
        # Python 3's csv module expects a text-mode file opened with newline=''
        with open('test.csv', 'w', newline='') as myfile:
            wr = csv.writer(myfile)
            wr.writerow(('Paper ID', 'Paper Title', 'Authors'))
            wr.writerow(('1', 'Title1', 'Author1'))
            wr.writerow(('2', 'Title2', 'Author2'))
            wr.writerow(('3', 'Title3', 'Author3'))
        return myfile
    def setUp(self):
        self.user = create_fuser()
        self.profile = ProfileFactory(user=self.user)
        self.event = EventFactory()
        self.client = Client()
        self.module = ModuleFactory()
        self.event_module = EventModule.objects.get_or_create(event=self.event,
                                                              module=self.module)[0]
        add_to_admin(self.event, self.user)
    def test_paper_upload(self):
        response = self.client.login(username=self.user.email, password='foz')
        self.assertTrue(response)
        myfile = self.generate_file()
        file_path = myfile.name
        url = reverse('registration_upload_papers', args=[self.event.slug])
        # post wrong data type; `i` is a file of an unsupported type
        # (it is not defined in the original snippet)
        post_data = {'uploaded_file': i}
        response = self.client.post(url, post_data)
        self.assertContains(response, 'File type is not supported.')
        with open(file_path, 'rb') as f:
            post_data['uploaded_file'] = f
            response = self.client.post(url, post_data)
        import_file = SubmissionImportFile.objects.all()[0]
        self.assertEqual(SubmissionImportFile.objects.all().count(), 1)
        #self.assertEqual(import_file.uploaded_file.name, 'files/registration/{0}'.format(file_path))
        os.remove(myfile.name)
        file_path = import_file.uploaded_file.path
        os.remove(file_path)
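If you'd rather not write test.csv to disk at all, one option is to build the CSV in memory and wrap the bytes in SimpleUploadedFile. A minimal stdlib sketch; `make_csv_bytes` is a hypothetical helper and the rows are the ones from the test above.

```python
import csv
import io


def make_csv_bytes(rows, encoding="utf-8"):
    """Hypothetical helper: serialize rows to CSV in memory, return bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default "excel" dialect: commas, \r\n line endings
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)


PAPER_ROWS = [
    ("Paper ID", "Paper Title", "Authors"),
    ("1", "Title1", "Author1"),
    ("2", "Title2", "Author2"),
    ("3", "Title3", "Author3"),
]
```

In the Django test you could then post `SimpleUploadedFile("test.csv", make_csv_bytes(PAPER_ROWS), content_type="text/csv")` as `uploaded_file`, which also removes the need to delete test.csv afterwards.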

I did something like this:
from io import BytesIO
from django.core.files.base import ContentFile
from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import TestCase
from django.urls import reverse
from PIL import Image
def create_image(storage, filename, size=(100, 100), image_mode='RGB', image_format='PNG'):
    """
    Generate a test image, returning the filename that it was saved as.
    If ``storage`` is ``None``, the BytesIO containing the image data
    will be passed instead.
    """
    data = BytesIO()
    Image.new(image_mode, size).save(data, image_format)
    data.seek(0)
    if not storage:
        return data
    image_file = ContentFile(data.read())
    return storage.save(filename, image_file)
class UploadImageTests(TestCase):
    def test_valid_form(self):
        '''
        valid post data should redirect
        The expected behavior is to show the image
        '''
        url = reverse('image')
        avatar = create_image(None, 'avatar.png')
        avatar_file = SimpleUploadedFile('front.png', avatar.getvalue())
        data = {'image': avatar_file}
        response = self.client.post(url, data, follow=True)
        image_src = response.context.get('image_src')
        self.assertEqual(response.status_code, 200)
        self.assertTrue(image_src)
        self.assertTemplateUsed(response, 'content_upload/result_image.html')
The create_image function generates the image, so you don't need to keep a static image file around.
Note: adapt the code to your own project.
This code is for Python 3.6.
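If you want to drop the Pillow dependency from your tests, a tiny valid PNG can also be produced with the standard library alone. This is a swapped-in technique, not the answer's code: `make_png` is a hypothetical helper that writes an 8-bit RGB image filled with a single colour.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(tag, data):
    """One PNG chunk: length, type, data, then CRC32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def make_png(width=100, height=100, rgb=(255, 0, 0)):
    """Hypothetical helper: bytes of a solid-colour 8-bit RGB PNG."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB),
    # default compression, filter and interlace methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0 ("None"), then RGB triples.
    scanline = b"\x00" + bytes(rgb) * width
    idat = zlib.compress(scanline * height)
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")
```

You would then use `SimpleUploadedFile("front.png", make_png(), content_type="image/png")` in place of `create_image(None, ...)`; whether your form's validation accepts it depends on how that form checks images.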

from django.contrib.auth import get_user_model
from rest_framework.test import APIRequestFactory, force_authenticate
User = get_user_model()
factory = APIRequestFactory()
user = User.objects.get(username='#####')
view = <your_view_name>.as_view()
with open('<file_name>.pdf', 'rb') as fp:
    request = factory.post('<url_path>', {'file_name': fp})
force_authenticate(request, user)
response = view(request)

As mentioned in Django's official documentation:
Submitting files is a special case. To POST a file, you need only provide the file field name as a key, and a file handle to the file you wish to upload as a value. For example:
c = Client()
with open('wishlist.doc', 'rb') as fp:
    c.post('/customers/wishes/', {'name': 'fred', 'attachment': fp})
More Information: How to check if the file is passed as an argument to some function?
While testing, sometimes we want to make sure that the file is passed as an argument to some function.
e.g.
...
class AnyView(CreateView):
    ...
    def post(self, request, *args, **kwargs):
        attachment = request.FILES['attachment']
        # pass the file as an argument
        my_function(attachment)
        ...
In tests, use Python's mock, something like this:
# Mock 'my_function' and then check the following:
response = do_a_post_request()
self.assertEqual(mock_my_function.call_count, 1)
self.assertEqual(
    mock_my_function.call_args,
    call(response.wsgi_request.FILES['attachment']),
)
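A self-contained version of that check using `unittest.mock.patch.object`. Assumptions: `AttachmentView` and its `process_attachment` method are hypothetical stand-ins for AnyView and my_function (in a real project you would usually patch the function by its dotted path, e.g. `mock.patch('yourapp.views.my_function')`), and the request is a simple namespace holding FILES.

```python
from types import SimpleNamespace
from unittest import mock


class AttachmentView:
    """Hypothetical stand-in for AnyView: hands the upload to a collaborator."""

    def post(self, request):
        attachment = request.FILES["attachment"]
        self.process_attachment(attachment)
        return "ok"

    def process_attachment(self, attachment):
        raise RuntimeError("real processing should not run in this test")


def check_attachment_forwarded():
    """Patch the collaborator, 'post' a fake file, return (mock, fake file)."""
    upload = object()  # any object can play the uploaded file here
    request = SimpleNamespace(FILES={"attachment": upload})
    with mock.patch.object(AttachmentView, "process_attachment") as mocked:
        AttachmentView().post(request)
    return mocked, upload
```

Because the mock replaces the method on the class, it is called without `self`, so `call_args` compares equal to `mock.call(upload)`, mirroring the assertions above.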

If you want to send other data along with the file upload, use the method below:
with open('path/to/file.txt', 'rb') as file:
    data = {
        'file_name_to_receive_on_backend': file,
        'param1': 1,
        'param2': 2,
        # ...
    }
    response = self.client.post("/url/to/view", data, format='multipart')
Only file_name_to_receive_on_backend will be received as a file; the other params are received as normal POST parameters. Note that format='multipart' is an option of Django REST framework's APIClient; Django's own test Client already sends multipart data by default.

In Django 1.7 there's an issue with the TestCase which can be resolved by using open(filepath, 'rb'), but when using the test client we have no control over it. I think it's probably best to ensure file.read() always returns bytes.
source: https://code.djangoproject.com/ticket/23912, by KevinEtienne
Without the 'rb' option, a TypeError is raised:
TypeError: sequence item 4: expected bytes, bytearray, or an object with the buffer interface, str found
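The error comes from building the request body: the multipart parts are joined as bytes, and a str read from a text-mode file can't take part in that join. A small stdlib illustration; `join_parts` only mimics the idea and is not Django's code.

```python
def join_parts(parts):
    """Mimics a multipart encoder joining body parts: everything must be bytes."""
    return b"\r\n".join(parts)


def read_both_ways(path):
    """Read the same file in text mode ('r') and in binary mode ('rb')."""
    with open(path, "r") as text_fp:
        as_text = text_fp.read()  # str
    with open(path, "rb") as binary_fp:
        as_bytes = binary_fp.read()  # bytes
    return as_text, as_bytes
```

Joining the bytes version works; joining the str version raises the same kind of TypeError as in the ticket.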

from django.core.files.uploadedfile import SimpleUploadedFile
from django.test import Client
client = Client()
with open(template_path, 'rb') as f:
    file = SimpleUploadedFile('Name of the django file', f.read())
response = client.post(url, data={'file': file})
Hope this helps.

A very handy solution with mock:
import tempfile
from pathlib import Path
from unittest import mock
from django.core.files import File
from django.test import TestCase, override_settings
# use your own client / request factory
from my_framework.test import APIClient
image_mock = mock.MagicMock(spec=File)
image_mock.name = 'image.png'  # or something else
class MyTest(TestCase):
    # I assume we want to put this file in storage, so to avoid putting
    # garbage in our MEDIA_ROOT we use temporary storage for test purposes
    @override_settings(MEDIA_ROOT=Path(tempfile.gettempdir()))
    def test_send_file(self):
        client = APIClient()
        client.post(
            '/endpoint/',
            {'file': image_mock},
            format="multipart"
        )

I am using Python==3.8.2, Django==3.0.4, djangorestframework==3.11.0.
I tried self.client.post but got a Resolver404 exception.
The following worked for me:
import requests
upload_url = 'https://www.some.com/oaisjdoasjd'  # your url to upload
with open('/home/xyz/video1.webm', 'rb') as video_file:
    # if it was a text file we would perhaps do
    # file = video_file.read()
    response_upload = requests.put(
        upload_url,
        data=video_file,
        headers={'content-type': 'video/webm'}
    )
Note that this sends a real HTTP request to a running server instead of going through Django's test client.

I am using Django REST framework and I had to test the upload of multiple files.
I finally got it working by using format="multipart" in my APIClient.post request.
from rest_framework.test import APIClient
...
self.client = APIClient()
with open('./photo.jpg', 'rb') as fp:
    resp = self.client.post('/upload/',
                            {'images': [fp]},
                            format="multipart")

I am using GraphQL; uploading in the test:
with open('test.jpg', 'rb') as fp:
    response = self.client.execute(query, variables, data={'image': [fp]})
Code in the mutation class:
@classmethod
def mutate(cls, root, info, **kwargs):
    if image := info.context.FILES.get("image", None):
        kwargs["image"] = image
    TestingMainModel.objects.get_or_create(
        id=kwargs["id"],
        defaults=kwargs
    )

Related

How to upload and process large excel files using Celery in Django?

I am trying to upload and process an Excel file using Django and DRF with Celery.
There is an issue when I try to pass the file to my Celery task to be processed in the background; I get the following error:
kombu.exceptions.EncodeError: Object of type InMemoryUploadedFile is not JSON serializable
Here is my view post request handler:
class FileUploadView(generics.CreateAPIView):
    """
    POST: upload file to save data in the database
    """
    parser_classes = [MultiPartParser]
    serializer_class = FileSerializerXLSX
    def post(self, request, format=None):
        """
        Allows to upload file and lets it be handled by pandas
        """
        serialized = FileSerializerXLSX(data=request.data)
        if serialized.is_valid():
            file_obj = request.data['file']
            # file_bytes = file_obj.read()
            print(file_obj)
            import_excel_task.delay(file_obj)
            print("its working")
            return Response(status=204)
        return Response(serialized._errors, status=status.HTTP_400_BAD_REQUEST)
And my Celery task:
def import_excel_helper(file_obj):
    df = extract_excel_to_dataframe(file_obj)
    transform_df_to_clientmodel(df)
    transform_df_to_productmodel(df)
    transform_df_to_salesmodel(df)
@shared_task(name="import_excel_task")
def import_excel_task(file_obj):
    """Save excel file in the background"""
    logger.info("Importing excel file")
    import_excel_helper(file_obj)
Any idea what is the way to handle importing Excel files into celery task so that it can be processed by other functions in the background?
As the error says, the arguments sent to a Celery task must be JSON serializable, because JSON is the default serializer. As documented in Kombu:
The primary disadvantage to JSON is that it limits you to the following data types: strings, Unicode, floats, boolean, dictionaries, and lists. Decimals and dates are notably missing.
Let's say this is my Excel file, file.xlsx:
Some | Value
Here | :)
Solution 1
Convert the raw bytes of the Excel file into a Base64 string before calling the task, so that it can be JSON serialized (strings are valid data types in a JSON document; raw bytes are not). Everything else in the Celery configuration keeps its default values.
tasks.py
import base64
import io
import pandas
from celery import Celery
app = Celery('tasks')
@app.task
def add(excel_file_base64):
    excel_file = base64.b64decode(excel_file_base64)
    # wrap the bytes in a file-like object; recent pandas versions
    # deprecate passing raw bytes to read_excel
    df = pandas.read_excel(io.BytesIO(excel_file))
    print("Contents of excel file:", df)
views.py
import base64
from tasks import add
with open("file.xlsx", 'rb') as file:  # Change this to be your <request.data['file']>
    excel_raw_bytes = file.read()
excel_base64 = base64.b64encode(excel_raw_bytes).decode()
add.apply_async((excel_base64,))
Output
[2021-08-19 20:40:28,904: INFO/MainProcess] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] received
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4] Contents of excel file:
[2021-08-19 20:40:29,094: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4] Some Value
0 Here :)
[2021-08-19 20:40:29,099: WARNING/ForkPoolWorker-4]
[2021-08-19 20:40:29,099: INFO/ForkPoolWorker-4] Task tasks.add[d5373444-485d-4c50-8695-be2e68ef1c67] succeeded in 0.19386404199940444s: None
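The core of Solution 1 in isolation: the json module (the basis of Celery's default serializer) rejects bytes, while the Base64 text survives a JSON round trip unchanged. A stdlib sketch; `encode_for_task`, `decode_in_task` and `json_accepts` are hypothetical helper names.

```python
import base64
import json


def encode_for_task(raw: bytes) -> str:
    """Hypothetical helper: turn raw file bytes into ASCII text JSON can carry."""
    return base64.b64encode(raw).decode("ascii")


def decode_in_task(payload: str) -> bytes:
    """Hypothetical helper: inverse of encode_for_task, run inside the worker."""
    return base64.b64decode(payload)


def json_accepts(value) -> bool:
    """True if json.dumps can serialize the value."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True
```

Keep in mind that Base64 makes the payload about a third larger, which matters for big spreadsheets passing through the broker.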
Solution 2
This is the harder way: implement a custom serializer that handles Excel files.
tasks.py
import ast
import base64
import io
import pandas
from celery import Celery
from kombu.serialization import register
def my_custom_excel_encoder(obj):
    """Uncomment this block if you intend to pass it as a Base64 string:
    file_base64 = base64.b64encode(obj[0][0]).decode()
    obj = list(obj)
    obj[0] = [file_base64]
    """
    return str(obj)
def my_custom_excel_decoder(obj):
    obj = ast.literal_eval(obj)
    """Uncomment this block if you passed it as a Base64 string (as commented above in the encoder):
    obj[0][0] = base64.b64decode(obj[0][0])
    """
    return obj
register(
    'my_custom_excel',
    my_custom_excel_encoder,
    my_custom_excel_decoder,
    content_type='application/x-my-custom-excel',
    content_encoding='utf-8',
)
app = Celery('tasks')
app.conf.update(
    accept_content=['json', 'my_custom_excel'],
)
@app.task
def add(excel_file):
    df = pandas.read_excel(io.BytesIO(excel_file))
    print("Contents of excel file:", df)
views.py
from tasks import add
with open("file.xlsx", 'rb') as excel_file:  # Change this to be your <request.data['file']>
    excel_raw_bytes = excel_file.read()
add.apply_async((excel_raw_bytes,), serializer='my_custom_excel')
Output
Same as Solution 1
Solution 3
You might be interested in the Kombu documentation on sending raw data without serialization.

Flask Form Image to base64 string

I created a Flask form where I can upload an image. Then I need to convert that image to a Base64 string, but the result is always empty (b'').
Output of my prints:
<FileStorage: '20190925_184412.jpg' ('image/jpeg')>
b''
And the code
from flask import Flask, render_template
from flask_wtf import FlaskForm
from wtforms import FileField
from flask_uploads import configure_uploads, IMAGES, UploadSet
import base64
app = Flask(__name__)
app.config['SECRET_KEY'] = 'thisisasecret'
app.config['UPLOADED_IMAGES_DEST'] = 'uploads/images'
images = UploadSet('images', IMAGES)
configure_uploads(app, images)
class MyForm(FlaskForm):
    image = FileField('image')
@app.route('/', methods=['GET', 'POST'])
def index():
    form = MyForm()
    if form.validate_on_submit():
        filename = images.save(form.image.data)
        image_string = base64.b64encode(form.image.data.read())
        print(form.image.data)
        print(image_string)
        return f'Filename: { filename }'
    return render_template('index.html', form=form)
I think this is due to the way Werkzeug's FileStorage object works. As I mentioned in another answer, it has a stream attribute; this is of type tempfile.SpooledTemporaryFile, so it must be rewound after reading if you wish to read it again.
In your case this stream attribute is form.image.data.stream. I suspect it is read once when you call images.save.
So the solution should be to rewind that stream before calculating the Base64 string:
if form.validate_on_submit():
    filename = images.save(form.image.data)  # first read happens here
    form.image.data.stream.seek(0)
    image_string = base64.b64encode(form.image.data.read())
    print(form.image.data)
    print(image_string)
    return f'Filename: { filename }'
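The behaviour can be reproduced without Flask. In this stdlib sketch a plain `tempfile.SpooledTemporaryFile` plays the role of the FileStorage stream, and `fake_save` is a hypothetical stand-in for `images.save()` consuming that stream.

```python
import base64
import tempfile


def fake_save(stream):
    """Hypothetical stand-in for images.save(): reads the stream to the end."""
    return stream.read()


def encode_after_save(stream, rewind=True):
    """Save, optionally rewind, then Base64-encode whatever is left to read."""
    fake_save(stream)
    if rewind:
        stream.seek(0)  # without this, the next read() starts at EOF
    return base64.b64encode(stream.read())


if __name__ == "__main__":
    with tempfile.SpooledTemporaryFile() as stream:
        stream.write(b"jpeg bytes")
        stream.seek(0)
        print(encode_after_save(stream, rewind=False))  # empty, like the question
```

Without the rewind you get exactly the b'' from the question; with it you get the encoded image.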

passing commandline arguments to a selenium python webdriver test case

The following code is written using the Selenium Python WebDriver and runs on Sauce Labs. I am providing the browser name, version and platform in a list; how do I provide the browser details through command-line arguments instead? I am using py.test to execute the test cases.
import os
import sys
import httplib
import base64
import json
import new
import unittest
import sauceclient
from selenium import webdriver
from sauceclient import SauceClient
# it's best to remove the hardcoded defaults and always get these values
# from environment variables
USERNAME = os.environ.get('SAUCE_USERNAME', "ranjanprabhub")
ACCESS_KEY = os.environ.get('SAUCE_ACCESS_KEY', "ecec4dd0-d8da-49b9-b719-17e2c43d0165")
sauce = SauceClient(USERNAME, ACCESS_KEY)
browsers = [{"platform": "Mac OS X 10.9",
             "browserName": "chrome",
             "version": ""},
            ]
def on_platforms(platforms):
    def decorator(base_class):
        module = sys.modules[base_class.__module__].__dict__
        for i, platform in enumerate(platforms):
            d = dict(base_class.__dict__)
            d['desired_capabilities'] = platform
            name = "%s_%s" % (base_class.__name__, i + 1)
            module[name] = new.classobj(name, (base_class,), d)
    return decorator
@on_platforms(browsers)
class SauceSampleTest(unittest.TestCase):
    def setUp(self):
        self.desired_capabilities['name'] = self.id()
        sauce_url = "http://%s:%s@ondemand.saucelabs.com:80/wd/hub"
        self.driver = webdriver.Remote(
            desired_capabilities=self.desired_capabilities,
            command_executor=sauce_url % (USERNAME, ACCESS_KEY)
        )
        self.driver.implicitly_wait(30)
    def test_sauce(self):
        self.driver.get('http://saucelabs.com/test/guinea-pig')
        assert "I am a page title - Sauce Labs" in self.driver.title
        comments = self.driver.find_element_by_id('comments')
        comments.send_keys('Hello! I am some example comments.'
                           ' I should be in the page after submitting the form')
        self.driver.find_element_by_id('submit').click()
        commented = self.driver.find_element_by_id('your_comments')
        assert ('Your comments: Hello! I am some example comments.'
                ' I should be in the page after submitting the form'
                in commented.text)
        body = self.driver.find_element_by_xpath('//body')
        assert 'I am some other page content' not in body.text
        self.driver.find_elements_by_link_text('i am a link')[0].click()
        body = self.driver.find_element_by_xpath('//body')
        assert 'I am some other page content' in body.text
    def tearDown(self):
        print("Link to your job: https://saucelabs.com/jobs/%s" % self.driver.session_id)
        try:
            if sys.exc_info() == (None, None, None):
                sauce.jobs.update_job(self.driver.session_id, passed=True)
            else:
                sauce.jobs.update_job(self.driver.session_id, passed=False)
        finally:
            self.driver.quit()
So this is a bit complicated, because you can pass an array of browsers into the @on_platforms decorator. My solution only works for a single browser, which looks like what you're doing right now.
For the current single-browser situation, you're looking for argparse. Here's my suggested fix:
import argparse
def setup_parser():
    parser = argparse.ArgumentParser(description='Automation Testing!')
    parser.add_argument('-p', '--platform', help='Platform for desired_caps', default='Mac OS X 10.9')
    # dest='browserName' so the key matches what desired_capabilities expects
    parser.add_argument('-b', '--browser-name', dest='browserName', help='Browser Name for desired_caps', default='chrome')
    parser.add_argument('-v', '--version', default='')
    args = vars(parser.parse_args())
    return args
desired_caps = setup_parser()
browsers = [desired_caps]
print(browsers)
But if you're looking to test multiple browsers (which I suggest you do!), you should not try and use command line arguments for the desired_caps of each individual browser. You should instead load a json config file for the browsers and the desired_caps for each one that you want Sauce to run.
Maybe have a different config file for each set of browsers, and then use command line arguments to pass in the config files you want to load.
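A minimal sketch of that idea: each config file holds a JSON list of desired_capabilities dicts, and a repeatable `--browsers-config` option selects which files to load. The option name and file layout are assumptions. `parse_known_args` leaves unrecognised arguments alone, which helps when a script shares sys.argv with another tool; under py.test itself, pytest rejects options it doesn't know, so registering the option through pytest's `pytest_addoption` hook in conftest.py is the more conventional route there.

```python
import argparse
import json


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Choose Sauce Labs browsers")
    # Assumed option name; may be given several times.
    parser.add_argument(
        "--browsers-config",
        action="append",
        help="JSON file with a list of desired_capabilities dicts (repeatable)",
    )
    args, _unknown = parser.parse_known_args(argv)
    return args


def load_browsers(paths):
    """Concatenate the capability lists from every config file."""
    browsers = []
    for path in paths or []:
        with open(path) as fp:
            browsers.extend(json.load(fp))
    return browsers
```

Usage would be `browsers = load_browsers(parse_args().browsers_config)` before applying `@on_platforms(browsers)`.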

how to unittest the template variables passed to jinja2 template from webapp2 request handler

I'm trying to test my webapp2 handlers. To do this, I thought it would be a good idea to send a request to the handler e.g.:
request = webapp2.Request.blank('/')
# Get a response for that request.
response = request.get_response(main.app)
The problem is, response is mostly just a bunch of HTML etc.
I want to look at what was passed to my jinja2 template from the handler before it was turned into HTML.
I want my test to get at the state within the handler class code. I want to be able to see what certain variables looked like in the response handler, and then I want to see what the templates dict looks like before it is passed to render_to_response().
I want to test these variables have the correct values.
Here is my test code so far, but I'm stuck because response = request.get_response() just gives me a bunch of html and not the raw variables.
import unittest
import main
import webapp2
class DemoTestCase(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def testNothing(self):
        self.assertEqual(42, 21 + 21)
    def testHomeHandler(self):
        # Build a request object passing the URI path to be tested.
        # You can also pass headers, query arguments etc.
        request = webapp2.Request.blank('/')
        # Get a response for that request.
        response = request.get_response(main.app)
        # Let's check if the response is correct.
        self.assertEqual(response.status_int, 200)
        self.assertEqual(response.body, 'Hello, world!')
if __name__ == '__main__':
    unittest.main()
and here is my handler:
class HomeHandler(BaseHandler):
    def get(self, file_name_filter=None, category_filter=None):
        file_names = os.listdir('blog_posts')
        blogs = []
        get_line = lambda file_: file_.readline().strip().replace("<!--","").replace("-->","")
        for fn in file_names:
            with open('blog_posts/%s' % fn) as file_:
                heading = get_line(file_)
                link_name = get_line(file_)
                category = get_line(file_)
            date_ = datetime.strptime(fn.split("_")[0], "%Y%m%d")
            blog_dict = {'date': date_, 'heading': heading,
                         'link_name': link_name,
                         'category': category,
                         'filename': fn.replace(".html", ""),
                         'raw_file_name': fn}
            blogs.append(blog_dict)
        categories = Counter(d['category'] for d in blogs)
        templates = {'categories': categories,
                     'blogs': blogs,
                     'file_name_filter': file_name_filter,
                     'category_filter': category_filter}
        assert(len(file_names) == len(set(d['link_name'] for d in blogs)))
        self.render_template('home.html', **templates)
and here is my basehandler:
class BaseHandler(webapp2.RequestHandler):
    @webapp2.cached_property
    def jinja2(self):
        return jinja2.get_jinja2(app=self.app)
    def render_template(self, filename, **kwargs):
        #kwargs.update({})
        #TODO() datastore caching here for caching of (handlername, handler parameters, changeable parameters, app_upload_date)
        #TODO() write rendered page to its own html file, and just serve that whole file. (includes all posts). JQuery can show/hide posts.
        self.response.write(self.jinja2.render_template(filename, **kwargs))
Perhaps I have got the wrong idea of how to do unit testing, or perhaps I should have written my code in a way that makes it easier to test? or is there some way of getting the state of my code?
Also if someone were to re-write the code and change the variable names, then the tests would break.
You can mock the BaseHandler.render_template method and test its parameters.
See this question for a list of popular Python mocking frameworks.
Thanks to proppy's suggestion I ended up using a mock.
http://www.voidspace.org.uk/python/mock/
(mock is included in the standard library as unittest.mock since Python 3.3)
So here is my main.py code, which is similar to what I have in webapp2.
Note that instead of BaseHandler.render_template I have BaseHandler.say_yo:
__author__ = 'Robert'
print("hello from main")
class BaseHandler():
    def say_yo(self, some_number=99):
        print("yo")
        return "sup"
class TheHandler(BaseHandler):
    def get(self, my_number=42):
        print("in TheHandler's get()")
        print(self.say_yo(my_number))
        return "TheHandler's return string"
and atest.py:
__author__ = 'Robert'
import unittest
import main
from unittest.mock import patch
class DemoTestCase(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def testNothing(self):
        self.assertEqual(42, 21 + 21)
    def testSomeRequests(self):
        print("hi")
        bh = main.BaseHandler()
        print(bh.say_yo())
        print("1111111")
        with patch('main.BaseHandler.say_yo') as patched_bh:
            print(dir(patched_bh))
            patched_bh.return_value = 'double_sup'
            bh2 = main.BaseHandler()
            print(bh2.say_yo())
            print("222222")
        bh3 = main.BaseHandler()
        print(bh3.say_yo())
        print("3333")
        th = main.TheHandler()
        print(th.get())
        print("44444")
        with patch('main.BaseHandler.say_yo') as patched_bh:
            patched_bh.return_value = 'last_sup'
            th = main.TheHandler()
            print(th.get())
            print(th.get(123))
            print("---")
            print(patched_bh.called)
            print(patched_bh.call_args_list)
            print("555555")
if __name__ == '__main__':
    unittest.main()
This code gives lots of output; here is a sample:
44444
in TheHandler's get()
last_sup
TheHandler's return string
in TheHandler's get()
last_sup
TheHandler's return string
---
True
[call(42), call(123)]
555555
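For completeness, here is the first suggestion applied to render_template itself, as a stdlib sketch. Assumptions: the handlers below are simplified, framework-free stand-ins (HomeHandler receives its blog list through the constructor instead of reading blog_posts/), so only the mocking pattern carries over to webapp2.

```python
from collections import Counter
from unittest import mock


class BaseHandler:
    def render_template(self, filename, **kwargs):
        raise RuntimeError("rendering is replaced by a mock in tests")


class HomeHandler(BaseHandler):
    """Simplified stand-in: blogs are injected instead of read from disk."""

    def __init__(self, blogs):
        self.blogs = blogs

    def get(self, file_name_filter=None, category_filter=None):
        templates = {
            "categories": Counter(d["category"] for d in self.blogs),
            "blogs": self.blogs,
            "file_name_filter": file_name_filter,
            "category_filter": category_filter,
        }
        self.render_template("home.html", **templates)


def captured_template_context(blogs, **get_kwargs):
    """Run get() with render_template mocked; return (filename, context dict)."""
    with mock.patch.object(BaseHandler, "render_template") as render:
        HomeHandler(blogs).get(**get_kwargs)
    render.assert_called_once()
    args, kwargs = render.call_args
    return args[0], kwargs
```

The test then asserts on the context dict directly, without parsing any HTML.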

Only accept a certain file type in FileField, server-side

How can I restrict FileField to only accept a certain type of file (video, audio, pdf, etc.) in an elegant way, server-side?
One very easy way is to use a custom validator.
In your app's validators.py:
import os
from django.core.exceptions import ValidationError
def validate_file_extension(value):
    ext = os.path.splitext(value.name)[1]  # [0] returns path+filename
    valid_extensions = ['.pdf', '.doc', '.docx', '.jpg', '.png', '.xlsx', '.xls']
    if ext.lower() not in valid_extensions:
        raise ValidationError('Unsupported file extension.')
Then in your models.py:
from .validators import validate_file_extension
... and use the validator on your model field:
class Document(models.Model):
    file = models.FileField(upload_to="documents/%Y/%m/%d", validators=[validate_file_extension])
See also: How to limit file types on file uploads for ModelForms with FileFields?.
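The extension check above relies on `os.path.splitext`, which has a few edge cases worth covering in tests. A stdlib sketch; `has_allowed_extension` is a hypothetical helper that mirrors the validator's logic but returns a bool instead of raising ValidationError.

```python
import os

VALID_EXTENSIONS = {".pdf", ".doc", ".docx", ".jpg", ".png", ".xlsx", ".xls"}


def has_allowed_extension(filename, allowed=VALID_EXTENSIONS):
    """Hypothetical helper: same check as validate_file_extension, no Django."""
    ext = os.path.splitext(filename)[1]  # only the last suffix, dot included
    return ext.lower() in allowed
```

Note that "invoice.exe.pdf" passes because only the last suffix counts, and a name like ".pdf" has no extension at all as far as splitext is concerned.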
Warning
To protect your code execution environment from malicious media files:
- Use Exif libraries to properly validate the media files.
- Separate your media files from your application code execution environment.
- If possible, use solutions like S3, GCS, Minio or anything similar.
- When loading media files on the client side, use client-native methods (for example, if you load media files insecurely in a browser, it may cause execution of "crafted" JavaScript code).
Django 1.11 added a FileExtensionValidator for model fields; the docs are here: https://docs.djangoproject.com/en/dev/ref/validators/#fileextensionvalidator.
An example of how to validate a file extension:
from django.core.validators import FileExtensionValidator
from django.db import models
class MyModel(models.Model):
    pdf_file = models.FileField(
        upload_to="foo/", validators=[FileExtensionValidator(allowed_extensions=["pdf"])]
    )
Note that this method is not safe. Citation from Django docs:
Don’t rely on validation of the file extension to determine a file’s
type. Files can be renamed to have any extension no matter what data
they contain.
There is also a new validate_image_file_extension (https://docs.djangoproject.com/en/dev/ref/validators/#validate-image-file-extension) for validating image extensions (using Pillow).
A few people have suggested using python-magic to validate that the file actually is of the type you are expecting to receive. This can be incorporated into the validator suggested in the accepted answer:
import os
import magic
from django.core.exceptions import ValidationError
def validate_is_pdf(file):
    valid_mime_types = ['application/pdf']
    file_mime_type = magic.from_buffer(file.read(1024), mime=True)
    file.seek(0)  # rewind so later reads see the whole file
    if file_mime_type not in valid_mime_types:
        raise ValidationError('Unsupported file type.')
    valid_file_extensions = ['.pdf']
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError('Unacceptable file extension.')
This example only validates a PDF, but any number of MIME types and file extensions can be added to the lists.
Assuming you saved the above in validators.py, you can incorporate it into your model like so:
from myapp.validators import validate_is_pdf
class PdfFile(models.Model):
    file = models.FileField(upload_to='pdfs/', validators=(validate_is_pdf,))
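If installing libmagic isn't an option, the same idea can be approximated for a handful of formats by checking well-known leading bytes ("magic numbers"). This is a deliberately small, swapped-in stand-in for python-magic, not its API: `sniff_mime` and the signature table are my own, and real libmagic recognises far more formats and edge cases.

```python
# Hand-picked subset of file signatures (assumption: enough for this example).
SIGNATURES = (
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
)


def sniff_mime(fileobj, size=2048):
    """Hypothetical helper: guess a MIME type from leading bytes, then rewind."""
    head = fileobj.read(size)
    fileobj.seek(0)  # leave the upload readable for whoever saves it
    for signature, mime in SIGNATURES:
        if head.startswith(signature):
            return mime
    return "application/octet-stream"
```

Inside a Django validator you would call it on the uploaded file and raise ValidationError when the result isn't in your allowed list, exactly as validate_is_pdf does with magic.from_buffer.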
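You can use the following to restrict file types in your form (note that the accept attribute only tells the browser which files to offer; it is not a server-side check):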
file = forms.FileField(widget=forms.FileInput(attrs={'accept':'application/pdf'}))
There's a Django snippet that does this:
import os
from django import forms
class ExtFileField(forms.FileField):
    """
    Same as forms.FileField, but you can specify a file extension whitelist.
    >>> from django.core.files.uploadedfile import SimpleUploadedFile
    >>>
    >>> t = ExtFileField(ext_whitelist=(".pdf", ".txt"))
    >>>
    >>> _ = t.clean(SimpleUploadedFile('filename.pdf', b'Some File Content'))
    >>> _ = t.clean(SimpleUploadedFile('filename.txt', b'Some File Content'))
    >>>
    >>> t.clean(SimpleUploadedFile('filename.exe', b'Some File Content'))
    Traceback (most recent call last):
    ...
    django.core.exceptions.ValidationError: ['Not allowed filetype!']
    """
    def __init__(self, *args, **kwargs):
        ext_whitelist = kwargs.pop("ext_whitelist")
        self.ext_whitelist = [i.lower() for i in ext_whitelist]
        super().__init__(*args, **kwargs)
    def clean(self, *args, **kwargs):
        data = super().clean(*args, **kwargs)
        filename = data.name
        ext = os.path.splitext(filename)[1]
        ext = ext.lower()
        if ext not in self.ext_whitelist:
            raise forms.ValidationError("Not allowed filetype!")
        return data
#-------------------------------------------------------------------------
if __name__ == "__main__":
    import doctest
    doctest.testmod()
First, create a file named formatChecker.py inside the app that has the model whose FileField should only accept certain file types.
This is your formatChecker.py:
from django.core.exceptions import ValidationError
from django.db.models import FileField
from django.template.defaultfilters import filesizeformat
from django.utils.translation import gettext_lazy as _
class ContentTypeRestrictedFileField(FileField):
    """
    Same as FileField, but you can specify:
    * content_types - list containing allowed content_types. Example: ['application/pdf', 'image/jpeg']
    * max_upload_size - a number indicating the maximum file size allowed for upload.
        2.5MB - 2621440
        5MB - 5242880
        10MB - 10485760
        20MB - 20971520
        50MB - 52428800
        100MB - 104857600
        250MB - 262144000
        500MB - 524288000
    """
    def __init__(self, *args, **kwargs):
        self.content_types = kwargs.pop("content_types")
        self.max_upload_size = kwargs.pop("max_upload_size")
        super().__init__(*args, **kwargs)
    def deconstruct(self):
        # keep the custom options in migrations, otherwise the field
        # cannot be rebuilt from the migration state
        name, path, args, kwargs = super().deconstruct()
        kwargs["content_types"] = self.content_types
        kwargs["max_upload_size"] = self.max_upload_size
        return name, path, args, kwargs
    def clean(self, *args, **kwargs):
        data = super().clean(*args, **kwargs)
        file = data.file
        try:
            content_type = file.content_type
            if content_type in self.content_types:
                if file.size > self.max_upload_size:
                    raise ValidationError(_('Please keep filesize under %s. Current filesize %s') % (filesizeformat(self.max_upload_size), filesizeformat(file.size)))
            else:
                raise ValidationError(_('Filetype not supported.'))
        except AttributeError:
            pass
        return data
Second, in your models.py, add this:
from .formatChecker import ContentTypeRestrictedFileField
Then use ContentTypeRestrictedFileField instead of FileField.
Example:
class Stuff(models.Model):
    title = models.CharField(max_length=245)
    handout = ContentTypeRestrictedFileField(upload_to='uploads/', content_types=['video/x-msvideo', 'application/pdf', 'video/mp4', 'audio/mpeg', ], max_upload_size=5242880, blank=True, null=True)
That's all you have to do when you want to only accept certain file types in a FileField.
After checking the accepted answer, I decided to share a tip based on the Django documentation. There is already a validator for checking file extensions, so you don't need to write your own custom function to decide whether a file extension is allowed.
https://docs.djangoproject.com/en/3.0/ref/validators/#fileextensionvalidator
Warning
Don’t rely on validation of the file extension to determine a file’s
type. Files can be renamed to have any extension no matter what data
they contain.
I think the best way to go is a combination of the ExtFileField that Dominic Rodger specified in his answer and python-magic, which Daniel Quinn mentioned. If someone is smart enough to change the extension, at least you will catch them with the headers.
You can define a list of accepted mime types in settings and then define a validator which uses python-magic to detect the mime-type and raises ValidationError if the mime-type is not accepted. Set that validator on the file form field.
The only problem is that sometimes the MIME type is application/octet-stream, which could correspond to different file formats. Has anyone overcome this issue?
Additionally, I would extend this class with some extra behaviour:
class ContentTypeRestrictedFileField(forms.FileField):
    ...
    widget = None
    ...
    def __init__(self, *args, **kwargs):
        ...
        self.widget = forms.ClearableFileInput(attrs={'accept': kwargs.pop('accept', None)})
        super(ContentTypeRestrictedFileField, self).__init__(*args, **kwargs)
When we create an instance with the param accept=".pdf,.txt", the file-selection popup will by default show only files with those extensions.
Just a minor tweak to @Thismatters' answer, since I can't comment. According to the README of python-magic:
recommend using at least the first 2048 bytes, as less can produce incorrect identification
So reading 2048 bytes instead of 1024 to determine the MIME type from the file contents should give a more accurate result:
import os
import magic
from django.core.exceptions import ValidationError
def validate_extension(file):
    valid_mime_types = ["application/pdf", "image/jpeg", "image/png", "image/jpg"]
    file_mime_type = magic.from_buffer(file.read(2048), mime=True)  # changed from 1024 to 2048
    file.seek(0)  # rewind so later reads see the whole file
    if file_mime_type not in valid_mime_types:
        raise ValidationError("Unsupported file type.")
    valid_file_extensions = [".pdf", ".jpeg", ".png", ".jpg"]
    ext = os.path.splitext(file.name)[1]
    if ext.lower() not in valid_file_extensions:
        raise ValidationError("Unacceptable file extension.")