Django (audio) File Validation - django

I'm experimenting with a site that will allow users to upload audio files. I've read every doc that I can get my hands on but can't find much about validating files.
Total newb here (never done any file validation of any kind before) and trying to figure this out. Can someone hold my hand and tell me what I need to know?
As always, thank you in advance.

You want to validate the file before it gets written to disk. When a file is uploaded, the form is validated first, and the uploaded file is then passed to a handler/method that does the actual writing to disk on your server. Between those two steps you can run some custom validation to make sure it's a valid audio file.
You could:
- check that the file is smaller than a certain size (good practice)
- check that the submitted file has a certain content type (i.e. an audio file)
  - on its own this is pretty useless, as someone could easily spoof it
- check that the file name ends in a certain extension (or extensions)
  - this is also pretty useless on its own
- try to read the file and see if it's actually audio
(I haven't tested this code)
models.py
from django.db import models

class UserSong(models.Model):
    title = models.CharField(max_length=100)
    audio_file = models.FileField()
forms.py
import os

from django import forms
from django.core.exceptions import ValidationError

from .models import UserSong

class UserSongForm(forms.ModelForm):
    class Meta:
        model = UserSong
        fields = ['title', 'audio_file']

    # Add some custom validation to our file field
    def clean_audio_file(self):
        file = self.cleaned_data.get('audio_file', False)
        if file:
            # UploadedFile exposes its size in bytes as .size
            if file.size > 4 * 1024 * 1024:
                raise ValidationError("Audio file too large ( > 4mb )")
            # content_type is sent by the client, so treat it as a hint only
            if file.content_type not in ["audio/mpeg", "audio/..."]:
                raise ValidationError("Content-Type is not mpeg")
            if os.path.splitext(file.name)[1] not in [".mp3", ".wav"]:  # ... plus any others you accept
                raise ValidationError("Doesn't have proper extension")
            # Here we now need to read the file and see if it's actually
            # a valid audio file. I don't know what the best library is
            # to do this, so some_lib.is_audio is a placeholder.
            if not some_lib.is_audio(file.read()):
                raise ValidationError("Not a valid audio file")
            return file
        else:
            raise ValidationError("Couldn't read uploaded file")
views.py
from django.http import HttpResponseRedirect
from django.shortcuts import render

from .forms import UserSongForm
from .utils import handle_uploaded_file

def upload_file(request):
    if request.method == 'POST':
        form = UserSongForm(request.POST, request.FILES)
        if form.is_valid():
            # If we are here, the above file validation has completed
            # so we can now write the file to disk
            handle_uploaded_file(request.FILES['audio_file'])
            return HttpResponseRedirect('/success/url/')
    else:
        form = UserSongForm()
    return render(request, 'upload.html', {'form': form})
utils.py
import os

# from django's docs
def handle_uploaded_file(f):
    ext = os.path.splitext(f.name)[1]
    # 'some/file/name' is a placeholder path
    with open('some/file/name%s' % ext, 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)
https://docs.djangoproject.com/en/dev/topics/http/file-uploads/#file-uploads
https://docs.djangoproject.com/en/dev/ref/forms/fields/#filefield
https://docs.djangoproject.com/en/dev/ref/files/file/#django.core.files.File
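The `some_lib.is_audio` step in the form above is left open. One rough way to fill it without extra dependencies is to look at the first bytes of the file for well-known signatures. This is only a sketch: `looks_like_audio` is a hypothetical helper, it only knows about MP3 and WAV, and a matching signature does not prove the rest of the file is playable.

```python
def looks_like_audio(header: bytes) -> bool:
    """Heuristic check of a file's first bytes for MP3 or WAV signatures."""
    # MP3 files commonly start with an ID3v2 tag
    if header[:3] == b"ID3":
        return True
    # Otherwise an MPEG audio frame starts with 11 set bits (frame sync)
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return True
    # WAV files are RIFF containers whose form type is "WAVE"
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return True
    return False
```

Inside `clean_audio_file` this could be called as `looks_like_audio(file.read(12))`, followed by `file.seek(0)` so later readers start from the beginning. A dedicated audio library that actually parses the file would be a stronger check.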

Related

How to zip multiple uploaded file in Django before saving it to database?

I am trying to compress a folder before saving it to database/file storage system using Django. For this task I am using ZipFile library. Here is the code of view.py:
class BasicUploadView(View):
    def get(self, request):
        file_list = file_information.objects.all()
        return render(self.request, 'fileupload_app/basic_upload/index.html',{'files':file_list})

    def post(self, request):
        zipfile = ZipFile('test.zip','w')
        if request.method == "POST":
            for upload_file in request.FILES.getlist('file'):  ## index.html name
                zipfile.write(io.BytesIO(upload_file))
                fs = FileSystemStorage()
                content = fs.save(upload_file.name,upload_file)
                data = {'name':fs.get_available_name(content), 'url':fs.url(content)}
            zipfile.close()
            return JsonResponse(data)
But I am getting the following error:
TypeError: a bytes-like object is required, not 'InMemoryUploadedFile'
Is there any solution for this problem? Since I may have to upload folder with large files, do I have to write a custom TemporaryFileUploadHandler for this purpose? I have recently started working with Django and it is quite new to me. Please help me with some advice.
InMemoryUploadedFile is an object that wraps more than just the file's bytes, so it can't be passed where bytes are expected. You should open the file and read its content (InMemoryUploadedFile.file is the underlying file):
InMemoryUploadedFile.open()
Open the file with open() and then read() its content. You should also check that the files were uploaded correctly, and you could use the with statement for both the zip archive and the file:
https://www.pythonforbeginners.com/files/with-statement-in-python
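As a rough sketch of that advice: `ZipFile.write` expects a path on disk, whereas `ZipFile.writestr` accepts a name plus bytes, which suits uploads that only exist in memory. The helper name `zip_uploads` and the `BytesIO` stand-ins are assumptions made so the example runs without Django; in a view the items would come from `request.FILES.getlist('file')`.

```python
import io
import zipfile

def zip_uploads(uploads):
    """Pack file-like uploads (each with .name and .read()) into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for upload in uploads:
            # writestr takes an archive name and the raw bytes to store
            archive.writestr(upload.name, upload.read())
    buffer.seek(0)
    return buffer

# Stand-ins for uploaded files so the sketch runs on its own
first = io.BytesIO(b"hello")
first.name = "a.txt"
second = io.BytesIO(b"world")
second.name = "b.txt"
archive_bytes = zip_uploads([first, second])
```

This reads each upload fully into memory, which is fine for small files. For large uploads it may be better to stream chunks into the archive through `archive.open(name, "w")` instead of calling `read()` once.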

django: how to correctly specify output-download file-type (in this case mp3)?

I have a simple django platform where I can upload text files. Ultimately I want to return a downloadable mp3 audio file made from the text in the uploaded file. My problem currently is that I cannot seem to correctly specify the type of file that the website outputs for download.
I then tried to make the downloadable output of the website an mp3 file:
views.py (code adapted from https://github.com/sibtc/simple-file-upload)
def simple_upload(request):
    if request.method == 'POST' and request.FILES['myfile']:
        myfile = request.FILES['myfile']
        print(str(request.FILES['myfile']))
        x=str(myfile.read())
        tts = gTTS(text=x, lang='en')
        response=HttpResponse(tts.save("result.mp3"),content_type='mp3')
        response['Content-Disposition'] = 'attachment;filename=result.mp3'
        return response
    return render(request, 'core/simple_upload.html')
Upon pressing the upload button, the text-to-speech conversion is successful but the content_type of the response is not definable as 'mp3'. The file that results from the download is result.mp3.txt and it contains 'None'.
Can you try to prepare your response using the sample code below?
I've managed to return CSV files correctly this way so it might help you too.
Here it is:
response = HttpResponse(content_type='text/plain')  # Plain text file type
response['Content-Disposition'] = 'attachment; filename="attachment.txt"'  # Plain text file extension
response.write("Hello, this is the file contents.")
return response
There are two problems I can see here. The first is that tts.save() returns None, and that None is passed directly to the HttpResponse. Secondly, the content_type is set to mp3, which is not a valid MIME type; it ought to be set to audio/mp3 (audio/mpeg is the registered type for MP3 audio).
After calling tts.save(), open the mp3 and pass the file handle to the HttpResponse, and also set the content_type correctly. For example:
def simple_upload(request):
    if request.method == 'POST' and request.FILES['myfile']:
        ...
        tts.save("result.mp3")
        response = HttpResponse(open("result.mp3", "rb"), content_type='audio/mp3')
        response['Content-Disposition'] = 'attachment;filename=result.mp3'
        return response
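Rather than hard-coding the content type, it can be derived from the file name with the standard library. This is a small sketch, and `audio_content_type` is a hypothetical helper name. The result depends on the MIME tables available on the machine, so a generic fallback is included.

```python
import mimetypes

def audio_content_type(filename, default="application/octet-stream"):
    """Guess a Content-Type from the file name, falling back to a generic type."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default

content_type = audio_content_type("result.mp3")
```

The returned string could then be passed as `content_type=` when building the `HttpResponse`.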

How to access file after upload in Django?

I'm working on a website where a user can upload a file in docx format. After he uploads a file and chooses which languages he wants it translated into, I want to redirect him to another page where he can see prices for the translations. The prices depend on the particular language and the number of characters in the docx file.
I can't figure out how to handle the uploaded file. I have a function which takes a path to a file and returns the number of characters. After the file is uploaded and the user clicks submit, I want to call this function so I can render a new page with estimated prices.
I've read that I can call temporary_file_path on request.FILES['file'] but it raises
'InMemoryUploadedFile' object has no attribute 'temporary_file_path'
I want to find out how many characters uploaded file contains and send it in a request to another view - /order-estimation.
VIEW:
def create_order(request):
    LanguageLevelFormSet = formset_factory(LanguageLevelForm, extra=5, max_num=5)
    language_level_formset = LanguageLevelFormSet(request.POST or None)
    job_creation_form = JobCreationForm(request.POST or None, request.FILES or None)
    context = {'job_creation_form': job_creation_form,
               'formset': language_level_formset}
    if request.method == 'POST':
        if job_creation_form.is_valid() and language_level_formset.is_valid():
            cleaned_data_job_creation_form = job_creation_form.cleaned_data
            cleaned_data_language_level_formset = language_level_formset.cleaned_data
            for language_level_form in [d for d in cleaned_data_language_level_formset if d]:
                language = language_level_form['language']
                level = language_level_form['level']
                Job.objects.create(
                    customer=request.user,
                    text_to_translate=cleaned_data_job_creation_form['text_to_translate'],
                    file=cleaned_data_job_creation_form['file'],
                    short_description=cleaned_data_job_creation_form['short_description'],
                    notes=cleaned_data_job_creation_form['notes'],
                    language_from=cleaned_data_job_creation_form['language_from'],
                    language_to=language,
                    level=level,
                )
            path = request.FILES['file'].temporary_file_path
            utilities.docx_get_characters_number(path)  # THIS NOT WORKS
            return HttpResponseRedirect('/order-estimation')
        else:
            return render(request, 'auth/jobs/create-job.html', context=context)
    return render(request, 'auth/jobs/create-job.html', context=context)
InMemoryUploadedFile does not provide temporary_file_path. The content lives 'in memory', as the class name implies.
By default Django keeps uploads of up to 2.5MB in memory as InMemoryUploadedFile; larger files go through TemporaryFileUploadHandler, and the uploaded file it produces provides the temporary_file_path() method in question (note that it is a method, so it has to be called). See the Django documentation on file uploads.
So an easy way would be to change your FILE_UPLOAD_HANDLERS setting to always use TemporaryFileUploadHandler:
FILE_UPLOAD_HANDLERS = [
    'django.core.files.uploadhandler.TemporaryFileUploadHandler',
]
Just keep in mind that this is not the most efficient way when you have a site with a lot of concurrent small upload requests.
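If changing the upload handlers globally is not desirable, another option is to copy the in-memory upload to a temporary file only when a path is needed. The following is a sketch under that assumption: `temporary_copy` is a hypothetical helper, and a `BytesIO` object stands in for `request.FILES['file']` so the example runs without Django.

```python
import io
import os
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_copy(uploaded, suffix=""):
    """Copy a file-like object to a named temp file and yield its path."""
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            uploaded.seek(0)  # start from the beginning of the upload
            shutil.copyfileobj(uploaded, tmp)
        yield tmp.name
    finally:
        # Remove the temporary copy once the caller is done with the path
        os.remove(tmp.name)

# Stand-in for an uploaded file
payload = b"docx bytes here"
fake_upload = io.BytesIO(payload)
with temporary_copy(fake_upload, suffix=".docx") as path:
    size_on_disk = os.path.getsize(path)
    saved_path = path
```

In the view, the path yielded by `temporary_copy(request.FILES['file'], suffix='.docx')` could be handed to `utilities.docx_get_characters_number` inside the `with` block.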

Django form validation, clean(), and file upload

Can someone illuminate me as to exactly when an uploaded file is actually written to the location returned by "upload_to" in the FileField, in particular with regards to the order of field, model, and form validation and cleaning?
Right now I have a "clean" method on my model which assumes the uploaded file is in place, so it can do some validation on it. It looks like the file isn't yet saved, and may just be held in a temporary location or in memory. If that is the case, how do I "open" it or find a path to it if I need to execute some external process/program to validate the file?
Thanks,
Ian
Form cleaning has nothing to do with actually saving the file, or with saving any other data for that matter. The file isn't saved until you run the save() method of the model instance (note that if you use ModelName.objects.create(), save() is called for you automatically).
The bound form will contain an open File object, so you should be able to do any validation on that object directly. For example:
form = MyForm(request.POST, request.FILES)
if form.is_valid():
    file_object = form.cleaned_data['myFile']
    # run any validation on the file_object, or define a clean_myFile() method
    # that will be run automatically when you call form.is_valid()
    model_inst = MyModel(
        my_file=file_object,
        # assign other attributes here....
    )
    model_inst.save()  # file is saved to disk here
What do you need to do on it? If your validation will work without a temporary file, you can access the data by calling read() on what your file field returns.
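def clean_field(self):
    _file = self.cleaned_data.get('filefield')
    contents = _file.read()
    _file.seek(0)  # rewind so the file can be read again when it is saved
    return _file   # a clean_<field> method must return the cleaned value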
If you do need it on the disk, you know where to go from here :) write it to a temporary location and do some magic on it!
Or write it as a custom form field. This is the basic idea of how I verify an MP3 file using the 'mutagen' library.
Notes:
- First check the file size; only if the size is acceptable, write the file to a temporary location.
- The file is written to the temporary location specified in settings (FILE_UPLOAD_TEMP_DIR), checked to be an MP3, and then deleted.
The code:
import os
import tempfile

from django import forms
from django.conf import settings
from mutagen.mp3 import MP3, HeaderNotFoundError, InvalidMPEGHeader

class MP3FileField(forms.FileField):
    def clean(self, *args, **kwargs):
        # Run the normal FileField validation first and keep its result
        cleaned = super(MP3FileField, self).clean(*args, **kwargs)
        tmp_file = args[0]
        if tmp_file.size > 6600000:
            raise forms.ValidationError("File is too large.")
        # FILE_UPLOAD_TEMP_DIR defaults to None, so fall back to the system temp dir
        tmp_dir = getattr(settings, 'FILE_UPLOAD_TEMP_DIR', None) or tempfile.gettempdir()
        file_path = os.path.join(tmp_dir, os.path.basename(tmp_file.name))
        with open(file_path, 'wb+') as destination:
            for chunk in tmp_file.chunks():
                destination.write(chunk)
        try:
            audio = MP3(file_path)
            if audio.info.length > 300:
                raise forms.ValidationError("MP3 is too long.")
        except (HeaderNotFoundError, InvalidMPEGHeader):
            raise forms.ValidationError("File is not valid MP3 CBR/VBR format.")
        finally:
            # Always delete the temporary copy
            os.remove(file_path)
        return cleaned

Processing file uploads before object is saved

I've got a model like this:
class Talk(BaseModel):
    title = models.CharField(max_length=200)
    mp3 = models.FileField(upload_to = u'talks/', max_length=200)
    seconds = models.IntegerField(blank = True, null = True)
I want to validate before saving that the uploaded file is an MP3, like this:
def is_mp3(path_to_file):
    from mutagen.mp3 import MP3
    audio = MP3(path_to_file)
    return not audio.info.sketchy
Once I'm sure I've got an MP3, I want to save the length of the talk in the seconds attribute, like this:
audio = MP3(path_to_file)
self.seconds = audio.info.length
The problem is, before saving, the uploaded file doesn't have a path (see this ticket, closed as wontfix), so I can't process the MP3.
I'd like to raise a nice validation error so that ModelForms can display a helpful error ("You idiot, you didn't upload an MP3" or something).
Any idea how I can go about accessing the file before it's saved?
p.s. If anyone knows a better way of validating files are MP3s I'm all ears - I also want to be able to mess around with ID3 data (set the artist, album, title and probably album art, so I need it to be processable by mutagen).
You can access the file data in request.FILES while in your view.
I think the best way is to bind the uploaded files to a form, override the form's clean method, get the UploadedFile object from cleaned_data, and validate it any way you like. Then override the save method, populate your model instance with information about the file, and save it.
A cleaner way to get the file before it is saved is like this:
from django.core.exceptions import ValidationError

# this goes in your Model class
def clean(self):
    try:
        f = self.mp3.file  # the file in memory
    except ValueError:
        raise ValidationError("A File is needed")
    f.__class__  # <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
    processfile(f)
And if we need a path, the answer is in this other question.
You could follow the technique used by ImageField where it validates the file header and then seeks back to the start of the file.
class ImageField(FileField):
    # ...
    def to_python(self, data):
        f = super(ImageField, self).to_python(data)
        # ...
        # We need to get a file object for Pillow. We might have a path or we might
        # have to read the data into memory.
        if hasattr(data, 'temporary_file_path'):
            file = data.temporary_file_path()
        else:
            if hasattr(data, 'read'):
                file = BytesIO(data.read())
            else:
                file = BytesIO(data['content'])
        try:
            # ...
        except Exception:
            # Pillow doesn't recognize it as an image.
            six.reraise(ValidationError, ValidationError(
                self.error_messages['invalid_image'],
                code='invalid_image',
            ), sys.exc_info()[2])
        if hasattr(f, 'seek') and callable(f.seek):
            f.seek(0)
        return f
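The same read-then-rewind idea can be sketched on its own, independent of ImageField. Here `peek` is a hypothetical helper, and unlike the excerpt above it restores whatever position the stream had before, instead of always seeking to 0. A `BytesIO` object stands in for the uploaded file.

```python
import io

def peek(fileobj, size):
    """Read up to `size` bytes from the start, then restore the original position."""
    position = fileobj.tell()
    try:
        fileobj.seek(0)
        return fileobj.read(size)
    finally:
        fileobj.seek(position)

stream = io.BytesIO(b"ID3\x04rest of the file")
stream.read(5)  # pretend something already consumed part of the stream
header = peek(stream, 3)
```

Validation code can inspect `header` while the stream is left where it was, so a later save still sees the file from the expected position.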