Django send excel file to Celery Task. Error InMemoryUploadedFile - django

I have a background process that reads an Excel file and saves the data from it. I need to read the file inside the background process, but I get an InMemoryUploadedFile error.
My code:
def create(self, validated_data):
    company = ''
    file_type = ''
    email = ''
    file = validated_data['file']
    import_data.delay(file=file,
                      company=company,
                      file_type=file_type,
                      email=email)
My task looks like this:
@app.task
def import_data(
        file,
        company,
        file_type,
        email):
    # some code
But I get an InMemoryUploadedFile error.
How can I send a file to Celery without errors?

When you delay a task, Celery tries to serialize its arguments, which in your case include a file.
File objects, and in-memory uploads in particular, can't be serialized.
To fix the problem, save the file first, pass its path to the delayed function, and read the file there to do your processing.
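A minimal sketch of that approach, using only the standard library so it runs without Django or Celery. `persist_upload` and `FakeUpload` are hypothetical names introduced here; `FakeUpload` only imitates the `chunks()` method of Django's uploaded-file objects, and `import_data` is written as a plain function standing in for the task body. It also assumes the worker can see the same filesystem as the web process.

```python
import json
import os
import tempfile


def persist_upload(uploaded, directory=None):
    """Write an upload to disk chunk by chunk and return the file path."""
    fd, path = tempfile.mkstemp(suffix=".xlsx", dir=directory)
    with os.fdopen(fd, "wb") as destination:
        for chunk in uploaded.chunks():
            destination.write(chunk)
    return path


def import_data(path, company, file_type, email):
    """Stand-in for the Celery task body: it receives a path, not a file."""
    try:
        with open(path, "rb") as source:
            return len(source.read())  # placeholder for the real Excel parsing
    finally:
        os.remove(path)  # the task owns the temporary file, so it cleans up


class FakeUpload:
    """Hypothetical stand-in for Django's UploadedFile (only chunks() is used)."""

    def __init__(self, payload, chunk_size=4):
        self._payload = payload
        self._chunk_size = chunk_size

    def chunks(self):
        for start in range(0, len(self._payload), self._chunk_size):
            yield self._payload[start:start + self._chunk_size]


path = persist_upload(FakeUpload(b"excel bytes"))
task_kwargs = {"path": path, "company": "", "file_type": "", "email": ""}
json.dumps(task_kwargs)  # plain strings serialize without error
processed_bytes = import_data(**task_kwargs)
```

In the real view, the last line would be `import_data.delay(**task_kwargs)`, since every argument is now a plain string.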

Celery does not know how to serialize complex objects such as file objects. However, this is easy to work around. What I do is encode the file to its Base64 string representation and decode it again inside the task. This lets me send the file contents directly through Celery.
The following example shows how (I intentionally wrote each conversion as a separate step, though this could be arranged more compactly):
import base64
import os
import tempfile
# (Django, HTTP server)
file = request.FILES['files'].file
file_bytes = file.read()
file_bytes_base64 = base64.b64encode(file_bytes)
file_bytes_base64_str = file_bytes_base64.decode('utf-8')  # this is a str
# (...send string through Celery...)
# (Celery worker task)
file_bytes_base64 = file_bytes_base64_str.encode('utf-8')
file_bytes = base64.b64decode(file_bytes_base64)
# Write the file to a temporary location; the directory is deleted when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_file = os.path.join(tmp_dir, 'something.zip')
    with open(tmp_file, 'wb') as f:
        f.write(file_bytes)
    # Process the file
This can be inefficient for large files but it becomes pretty handy for small/medium sized temporary files.
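To make the size trade-off concrete, here is a small round-trip sketch using only the standard library. The helper names `encode_for_task` and `decode_in_task` are hypothetical, and the JSON message merely imitates what a JSON-serializing broker payload might carry.

```python
import base64
import json


def encode_for_task(raw):
    """Turn raw bytes into an ASCII string that JSON can carry."""
    return base64.b64encode(raw).decode("ascii")


def decode_in_task(encoded):
    """Recover the original bytes inside the worker."""
    return base64.b64decode(encoded.encode("ascii"))


original = bytes(range(256))  # arbitrary binary content, including non-UTF-8 bytes
message = json.dumps({"file_b64": encode_for_task(original)})  # imitated task payload
restored = decode_in_task(json.loads(message)["file_b64"])
size_ratio = len(encode_for_task(original)) / len(original)  # roughly 4/3
```

Base64 emits four characters for every three input bytes, so the payload grows by about a third, which is why this suits small and medium files better than large ones.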

Related

Trigger action after file upload into server(File exist on the server path)

I have run into the problem below. Could you please give me some suggestions? I would appreciate it.
I want to get the duration (length) of a video file uploaded by a user, then write this duration to the database after the file is uploaded. I already have my own algorithm that calculates the duration of a video file. Where I am stuck is that I need to run this algorithm immediately after the file is saved on the server.
I found "post_save", but it looks like it executes before the file upload completes (before the file exists at the server path), so if I use "post_save" I get an error saying the file does not exist.
VUE + Django + DjangoRestFramework
Here are my steps:
1. Create a Voice model with two fields: videoFile (FileField) and the duration of the video (FloatField).
(Because I need to read the file from the server path, I set a default value, for example 0, in the duration column before the file exists; after the file is uploaded, I calculate the duration and update it.)
2. After the user uploads the file to the server, I run my algorithm on the file path and calculate the duration of the file (input: file path and file name; output: duration).
3. After getting the duration of the file, I need to update the database immediately.
So my question is: how do I execute my algorithm immediately after the file has been uploaded to the server, and then update the database?
So your task is the following:
Check django.db.models.fields.files.ImageField and related classes
Look at contribute_to_class: it hooks into the post_init signal. On upload this triggers when the model is being created and all field values have been processed, so the file is an uploaded file that is either in memory or in a temporary directory.
Implement the descriptor
Implement the FieldFile subclass
Implement the django.core.files.File subclass
Hook everything up
You don't need to know the exact time it's uploaded, so don't focus on that event. Django has all the instrumentation in place, you just need to hook it up in the same way ImageField works.
Here's an example of a File subclass to get you started (I use timedelta, cause I have DurationField in my model). [UPDATED] Made it deal with in-memory files.
# Example usage:
>>> f = open(os.path.expanduser("~/Movies/IMG_0305.m4v"), "rb")
>>> video = VideoFile(file=f)
>>> video.duration
datetime.timedelta(seconds=6, microseconds=320000)
# Class:
import json
import os
import subprocess
import typing as t
from datetime import timedelta
from django.conf import settings
from django.core.files import File
class VideoFile(File):
    @property
    def duration(self) -> timedelta:
        data = self._get_container_metadata()
        secs = float(data["format"].get("duration", 0))
        return timedelta(seconds=secs)
    @property
    def container_metadata(self) -> t.Dict[str, t.Any]:
        return self._get_container_metadata()
    def _get_container_metadata(self) -> t.Dict[str, t.Any]:
        # Run ffprobe once and cache the parsed JSON on the instance
        if not hasattr(self, "_container_metadata"):
            try:
                filename = self._extract_filename()
            except FileNotFoundError:
                proc = self._run_ffprobe_stdin()
            else:
                proc = self._run_ffprobe_on_disk(filename)
            setattr(self, "_container_metadata", json.loads(proc.stdout))
        return getattr(self, "_container_metadata")
    def _extract_filename(self) -> str:
        candidates = [
            self.name,
            os.path.join(settings.MEDIA_ROOT, self.name),
            getattr(self.file, "name", "/nonexistent"),
        ]
        for filename in candidates:
            if filename and os.path.exists(filename):
                return filename
        raise FileNotFoundError("File is in-memory")
    def _run_ffprobe_on_disk(self, filename: str) -> subprocess.CompletedProcess:
        cmd = self.ffprobe_cmd
        cmd.append(filename)
        try:
            return subprocess.run(
                cmd, capture_output=True, encoding="utf-8", check=True,
            )
        except subprocess.CalledProcessError:
            raise TypeError("Not a valid video file or unknown container format.")
    @property
    def ffprobe_cmd(self) -> t.List[str]:
        # A new list is built on every access, so callers may append to it
        return [
            "ffprobe",
            "-show_format",
            "-print_format",
            "json",
            "-loglevel",
            "quiet",
        ]
    def _run_ffprobe_stdin(self) -> subprocess.CompletedProcess:
        # Feed the in-memory file to ffprobe via stdin, then restore its state
        closed = self.closed
        self.open()
        file_pos = self.tell()
        self.seek(0)
        cmd = self.ffprobe_cmd
        cmd.append("-")
        try:
            return subprocess.run(
                cmd, stdin=self.file, capture_output=True, encoding="utf-8", check=True,
            )
        except subprocess.CalledProcessError:
            raise TypeError("Not a valid video file or unknown container format.")
        finally:
            self.seek(file_pos)
            if closed:
                self.close()
I found TemporaryUploadedFile in the official Django documentation; it can be used to access the uploaded file before it is saved to the database.
Here are the steps I used to solve my problem:
Access the temporary file in the view.py file as below:
file_obj = request.data['voiceFile'].temporary_file_path()
Use my algorithm to get the duration (length) of the temporary file (voice file):
result = GetLengthAlgo(file_obj)
Save to the database:
serializer.save(duration=float(result.stdout))

Why is the csv file in S3 empty after loading from Lambda

import os
import csv
import boto3
client = boto3.client('s3')
fields = ['dt','dh','key','value']
row = [dt,dh,key,value]
print(row)
# name of csv file
filename = "/tmp/sns_file.csv"
# writing to csv file
with open(filename, 'a',newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)
    # writing the fields
    csvwriter.writerow(fields)
    # writing the data row
    csvwriter.writerow(row)
    final_file_name="final_report_"+dt+".csv"
    client.upload_file('/tmp/sns_file.csv',BUCKET_NAME,final_file_name)
if os.path.exists('/tmp/sns_file.csv'):
    os.remove('/tmp/sns_file.csv')
else:
    print("The file does not exist")
Python's with block uses a context manager, which, in simple terms, "cleans up" once all operations inside the block are done.
For files, "clean up" means closing the file. Data you write is buffered and is not guaranteed to reach the disk until the file is flushed or closed. So you need to move the upload call outside of, and after, the with block.
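The effect can be observed locally without S3. The sketch below is a stand-in for the Lambda code under stated assumptions: the row values are placeholders, a temporary directory replaces `/tmp`, and the upload itself is only marked by a comment.

```python
import csv
import os
import tempfile

fields = ["dt", "dh", "key", "value"]
row = ["2024-01-01", "00", "k", "v"]  # placeholder values

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "sns_file.csv")

with open(filename, "a", newline="") as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    csvwriter.writerow(row)
    size_inside = os.path.getsize(filename)  # small writes are usually still buffered here

# The file is closed now, so everything has been flushed to disk.
size_after = os.path.getsize(filename)
# This is the point where client.upload_file(filename, BUCKET_NAME, key) belongs.

with open(filename, newline="") as check:
    uploaded_rows = list(csv.reader(check))

os.remove(filename)
os.rmdir(workdir)
```

Comparing `size_inside` with `size_after` shows why an upload issued inside the block can ship an empty file.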

How does one use magic to verify file type in a Django form clean method?

I have written an email form class in Django with a FileField. I want to check the uploaded file for its type via checking its mimetype. Subsequently, I want to limit file types to pdfs, word, and open office documents.
To this end, I have installed python-magic and would like to check file types as follows per the specs for python-magic:
mime = magic.Magic(mime=True)
file_mime_type = mime.from_file('address/of/file.txt')
However, recently uploaded files lack addresses on my server. I also do not know of any method of the mime object akin to "from_file_content" that checks for the mime type given the content of the file.
What is an effective way to use magic to verify file types of uploaded files in Django forms?
Stan described a good variant using a buffer. Unfortunately, the weakness of that method is that it reads the whole file into memory. Another option is to use a temporarily stored file:
import tempfile
import magic
with tempfile.NamedTemporaryFile() as tmp:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.flush()  # make sure buffered data is on disk before magic reads the file by name
    print(magic.from_file(tmp.name, mime=True))
Also, you might want to check the file size:
if form.cleaned_data['file'].size < ...:
    print(magic.from_buffer(form.cleaned_data['file'].read()))
else:
    # store to disk (the code above)
Additionally:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
So you might want to handle it like so:
import os
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.close()  # flush and close so the file can be reopened by name on Windows
    print(magic.from_file(tmp.name, mime=True))
finally:
    tmp.close()  # no-op if already closed
    os.unlink(tmp.name)
Also, you might want to seek(0) after read():
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
Where uploaded data is stored
Why not try something like this in your view:
m = magic.Magic()
m.from_buffer(request.FILES['my_file_field'].read())
Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.
mime = magic.Magic(mime=True)
attachment = form.cleaned_data['attachment']
if hasattr(attachment, 'temporary_file_path'):
    # the file is stored temporarily on disk, so we can get its full path
    mime_type = mime.from_file(attachment.temporary_file_path())
else:
    # the file is in memory
    mime_type = mime.from_buffer(attachment.read())
Also, you might want to seek(0) after read():
if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
Example from Django code. Performed for image fields during validation.
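The read-then-rewind pattern can be tried without python-magic. The sketch below swaps in a tiny hand-written signature table in place of libmagic, purely for illustration; `SIGNATURES` and `sniff_and_rewind` are hypothetical names, and the labels are ad hoc rather than real MIME types. A real check should still use python-magic as shown in the answers.

```python
import io

# Leading bytes ("magic numbers") of the formats mentioned in the question.
# This table is a hand-written assumption for illustration, not libmagic's database.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole-container",  # legacy Office files such as .doc
    b"PK\x03\x04": "zip-container",  # .docx and .odt are ZIP archives
}


def sniff_and_rewind(stream, header_size=8):
    """Guess a type from the first bytes, then restore the stream position."""
    start = stream.tell()
    header = stream.read(header_size)
    stream.seek(start)  # leave the upload readable for whoever saves it later
    for signature, label in SIGNATURES.items():
        if header.startswith(signature):
            return label
    return None


upload = io.BytesIO(b"%PDF-1.7\nrest of the document")
detected = sniff_and_rewind(upload)
position_after = upload.tell()
unknown = sniff_and_rewind(io.BytesIO(b"plain text"))
```

Because the position is restored, later code (for example the model's save) still sees the whole file.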
You can use the django-safe-filefield package to validate that an uploaded file's extension matches its MIME type.
from django import forms
from safe_filefield.forms import SafeFileField
class MyForm(forms.Form):
    attachment = SafeFileField(
        allowed_extensions=('xls', 'xlsx', 'csv')
    )
In case you're handling a file upload and are concerned only with images, Django will set content_type for you (or rather for itself?):
from django.forms import ModelForm
from django.core.files import File
from django.db import models
class MyPhoto(models.Model):
    photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)
class MyForm(ModelForm):
    class Meta:
        model = MyPhoto
        fields = ['photo']
photo = MyPhoto.objects.first()
photo = File(open('1.jpeg', 'rb'))
form = MyForm(files={'photo': photo})
if form.is_valid():
    print(form.instance.photo.file.content_type)
It doesn't rely on the content type provided by the user. But django.db.models.fields.files.FieldFile.file is an undocumented property.
Actually, content_type is initially set from the request, but when the form gets validated, the value is updated.
Regarding non-images, doing request.FILES['name'].read() seems okay to me. First, that's what Django does. Second, files larger than 2.5 MB are stored on disk by default. So let me point you at the other answer here.
For the curious, here's the stack trace that leads to updating
content_type:
django.forms.forms.BaseForm.is_valid: self.errors
django.forms.forms.BaseForm.errors: self.full_clean()
django.forms.forms.BaseForm.full_clean: self._clean_fields()
django.forms.forms.BaseForm._clean_fields: field.clean()
django.forms.fields.FileField.clean: super().clean()
django.forms.fields.Field.clean: self.to_python()
django.forms.fields.ImageField.to_python

Django form validation, clean(), and file upload

Can someone illuminate me as to exactly when an uploaded file is actually written to the location returned by "upload_to" in the FileField, in particular with regards to the order of field, model, and form validation and cleaning?
Right now I have a "clean" method on my model which assumes the uploaded file is in place, so it can do some validation on it. It looks like the file isn't yet saved, and may just be held in a temporary location or in memory. If that is the case, how do I "open" it or find a path to it if I need to execute some external process/program to validate the file?
Thanks,
Ian
Form cleaning has nothing to do with actually saving the file, or with saving any other data for that matter. The file isn't saved until you run the save() method of the model instance (note that if you use ModelName.objects.create(), save() is called for you automatically).
The bound form will contain an open File object, so you should be able to do any validation on that object directly. For example:
form = MyForm(request.POST, request.FILES)
if form.is_valid():
    file_object = form.cleaned_data['myFile']
    # run any validation on the file_object, or define a clean_myFile() method
    # that will be run automatically when you call form.is_valid()
    model_inst = MyModel(my_file=file_object,
                         # assign other attributes here....
                         )
    model_inst.save()  # file is saved to disk here
What do you need to do on it? If your validation will work without a temporary file, you can access the data by calling read() on what your file field returns.
def clean_field(self):
    _file = self.cleaned_data.get('filefield')
    contents = _file.read()
If you do need it on the disk, you know where to go from here :) write it to a temporary location and do some magic on it!
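One way to sketch the "write it to a temporary location" step is a context manager that hands an external validator a path and removes any copy afterwards. `on_disk` and `FakeInMemoryUpload` are hypothetical names; the fake class only imitates the `chunks()` method of Django's uploaded-file objects so the example runs with the standard library alone.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def on_disk(upload):
    """Yield a filesystem path for an upload and clean up any copy we made."""
    if hasattr(upload, "temporary_file_path"):
        # Django already spooled the upload to disk; reuse that file.
        yield upload.temporary_file_path()
        return
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as copy:
            for chunk in upload.chunks():
                copy.write(chunk)
        yield path
    finally:
        os.remove(path)


class FakeInMemoryUpload:
    """Hypothetical stand-in for InMemoryUploadedFile (only chunks() is used)."""

    def __init__(self, payload):
        self._payload = payload

    def chunks(self):
        yield self._payload


with on_disk(FakeInMemoryUpload(b"ID3 fake audio")) as path:
    existed_during = os.path.exists(path)
    with open(path, "rb") as handle:  # an external program would be run on `path` here
        seen = handle.read()
existed_after = os.path.exists(path)
```

Inside a form's clean method, the body of the `with` block is where the external process would be invoked on `path`.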
Or write it as a custom form field. This is the basic idea of how I verify an MP3 file using the 'mutagen' library.
Notes:
First check the file size; if the size is acceptable, write the file to a temporary location.
The field writes the file to the temporary location specified in settings, checks that it is an MP3, and then deletes it.
The code:
from django import forms
import os
from mutagen.mp3 import MP3, HeaderNotFoundError, InvalidMPEGHeader
from django.conf import settings
class MP3FileField(forms.FileField):
    def clean(self, *args, **kwargs):
        cleaned = super(MP3FileField, self).clean(*args, **kwargs)
        tmp_file = args[0]
        if tmp_file.size > 6600000:
            raise forms.ValidationError("File is too large.")
        # assumes FILE_UPLOAD_TEMP_DIR is set in settings
        file_path = os.path.join(settings.FILE_UPLOAD_TEMP_DIR, tmp_file.name)
        with open(file_path, 'wb+') as destination:
            for chunk in tmp_file.chunks():
                destination.write(chunk)
        try:
            audio = MP3(file_path)
            if audio.info.length > 300:
                raise forms.ValidationError("MP3 is too long.")
        except (HeaderNotFoundError, InvalidMPEGHeader):
            raise forms.ValidationError("File is not valid MP3 CBR/VBR format.")
        finally:
            os.remove(file_path)  # always delete the temporary copy
        return cleaned

Processing file uploads before object is saved

I've got a model like this:
class Talk(BaseModel):
    title = models.CharField(max_length=200)
    mp3 = models.FileField(upload_to = u'talks/', max_length=200)
    seconds = models.IntegerField(blank = True, null = True)
I want to validate before saving that the uploaded file is an MP3, like this:
def is_mp3(path_to_file):
    from mutagen.mp3 import MP3
    audio = MP3(path_to_file)
    return not audio.info.sketchy
Once I'm sure I've got an MP3, I want to save the length of the talk in the seconds attribute, like this:
audio = MP3(path_to_file)
self.seconds = audio.info.length
The problem is, before saving, the uploaded file doesn't have a path (see this ticket, closed as wontfix), so I can't process the MP3.
I'd like to raise a nice validation error so that ModelForms can display a helpful error ("You idiot, you didn't upload an MP3" or something).
Any idea how I can go about accessing the file before it's saved?
p.s. If anyone knows a better way of validating files are MP3s I'm all ears - I also want to be able to mess around with ID3 data (set the artist, album, title and probably album art, so I need it to be processable by mutagen).
You can access the file data in request.FILES while in your view.
I think the best way is to bind the uploaded files to a form, override the form's clean method, get the UploadedFile object from cleaned_data, validate it any way you like, then override the save method, populate your model instance with information about the file, and save it.
A cleaner way to get the file before it is saved is like this:
from django.core.exceptions import ValidationError
# this goes in your Model class
def clean(self):
    try:
        f = self.mp3.file  # the file in memory
    except ValueError:
        raise ValidationError("A file is needed")
    f.__class__  # this is <class 'django.core.files.uploadedfile.InMemoryUploadedFile'>
    processfile(f)
And if we need a path, the answer is in this other question.
You could follow the technique used by ImageField where it validates the file header and then seeks back to the start of the file.
class ImageField(FileField):
    # ...
    def to_python(self, data):
        f = super(ImageField, self).to_python(data)
        # ...
        # We need to get a file object for Pillow. We might have a path or we might
        # have to read the data into memory.
        if hasattr(data, 'temporary_file_path'):
            file = data.temporary_file_path()
        else:
            if hasattr(data, 'read'):
                file = BytesIO(data.read())
            else:
                file = BytesIO(data['content'])
        try:
            # ...
        except Exception:
            # Pillow doesn't recognize it as an image.
            six.reraise(ValidationError, ValidationError(
                self.error_messages['invalid_image'],
                code='invalid_image',
            ), sys.exc_info()[2])
        if hasattr(f, 'seek') and callable(f.seek):
            f.seek(0)
        return f