'_io.BytesIO' object has no attribute 'name' for small files - Django

I'm uploading files with Python/Django and getting two different object types. When the file is small, I get an InMemoryUploadedFile object, while when the file is quite large, I get a TemporaryFileWrapper. I'm checking the file MIME type with the magic library.
When the file is large, I get the correct MIME type with this code:
import magic

file_name = self.cleaned_data.get('file')
file_mime = magic.from_file(file_name.file.name, mime=True)
supported_format = ['video/x-flv', 'video/mp4', 'video/3gpp', 'video/x-ms-wmv']
if file_mime in supported_format:
    ...
But when the file is small, I get this error:
'_io.BytesIO' object has no attribute 'name'

As per Django's documentation, Django has two upload file handlers: MemoryFileUploadHandler and TemporaryFileUploadHandler.
MemoryFileUploadHandler streams to memory, and TemporaryFileUploadHandler streams to disk.
I have set TemporaryFileUploadHandler as the default in my settings.py:
FILE_UPLOAD_HANDLERS = ["django.core.files.uploadhandler.TemporaryFileUploadHandler"]
We can also write our own custom FileUploadHandler in Django.
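A size-independent alternative is to sniff the bytes rather than a file name. Here is a minimal sketch, assuming the python-magic package and a Django form (the clean_file method name is only illustrative): magic.from_buffer works for both InMemoryUploadedFile and TemporaryUploadedFile because it never needs a path on disk.

import magic
from django import forms

def clean_file(self):
    uploaded = self.cleaned_data.get('file')
    uploaded.seek(0)
    # from_buffer inspects the bytes themselves, so no .name attribute is required
    file_mime = magic.from_buffer(uploaded.read(2048), mime=True)
    uploaded.seek(0)  # rewind so later code can read the whole file again
    supported_format = ['video/x-flv', 'video/mp4', 'video/3gpp', 'video/x-ms-wmv']
    if file_mime not in supported_format:
        raise forms.ValidationError('Unsupported MIME type: %s' % file_mime)
    return uploaded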

Related

When using Django's Default Storage should/can you close() an opened file?

When using Django's DefaultStorage it's possible to open and read a file something like this:
from django.core.files.storage import default_storage
file = default_storage.open("dir/file.txt", mode="rb")
data = file.read()
When using Python's built-in open() function, it's best to close() the file afterwards, or to use a with open("dir/file.txt") as file: construction.
But reading the docs for Django's Storage classes, and browsing the source, I don't see a close() equivalent.
So my questions are:
Should a file opened with Django's Default Storage be closed?
If so, how?
If not, why isn't it necessary?
You don't see a close method because you are looking at the Storage class. The open method of the Storage class returns an instance of django.core.files.base.File, which wraps the Python file object and also has a close method that closes the file (methods like read, etc. are inherited from FileProxyMixin).
Generally, when you open a file you should close it; the same goes for Django, as is also emphasised in the documentation:
Closing files is especially important when accessing file fields in a
loop over a large number of objects. If files are not manually closed
after accessing them, the risk of running out of file descriptors may
arise. This may lead to the following error:
OSError: [Errno 24] Too many open files
But there are a few instances where you shouldn't close files, mostly when you are passing the file to some function / method / object that will read it. For example, if you create a FileResponse object you shouldn't close the file, as Django will close it by itself:
The file will be closed automatically, so don’t open it with a context
manager.
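For instance, a sketch of such a download view (the view name is made up):

from django.core.files.storage import default_storage
from django.http import FileResponse

def download(request):
    # FileResponse takes ownership of the open file and closes it once the
    # response has been fully sent, so no explicit close() is needed here
    return FileResponse(default_storage.open("dir/file.txt", mode="rb"))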
To complete your example code, you would close the file like this:
from django.core.files.storage import default_storage
file = default_storage.open("dir/file.txt", mode="rb")
data = file.read()
file.close()
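Since the File object returned by the storage also supports the context-manager protocol, an equivalent sketch is:

from django.core.files.storage import default_storage

with default_storage.open("dir/file.txt", mode="rb") as file:
    data = file.read()
# the file is closed automatically when the with block exits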

How do I read a zip (which is in fact in bytes form) without creating a temporary copy?

I am uploading a zip (which contains PDF files to be read) as multipart/form-data.
I am handling the upload as below:
import io

file = request.FILES["zipfile"].read()  # gives a bytes object
bytes_io = io.BytesIO(file)  # gives an IO stream object
What I intend to do is read the PDF files inside the zip, but I am stuck on how to proceed from here. I am confused about what to do with either the bytes object from the request or the IO object after conversion.
Found the answer just after asking the question.
Simply use the zipfile package as below:
import io
from zipfile import ZipFile

file = request.FILES["zipfile"].read()
bytes_io = io.BytesIO(file)
archive = ZipFile(bytes_io, 'r')  # avoid naming the variable 'zipfile', which shadows the module
Then refer to the zipfile docs for further operations on the archive.
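For example, to list and read the PDFs inside the archive without writing anything to disk (a sketch; parsing the PDF bytes is left to whichever PDF library you use):

import io
from zipfile import ZipFile

archive = ZipFile(io.BytesIO(request.FILES["zipfile"].read()), 'r')
for name in archive.namelist():
    if name.lower().endswith('.pdf'):
        pdf_bytes = archive.read(name)      # raw bytes of one PDF, still in memory
        pdf_stream = io.BytesIO(pdf_bytes)  # file-like object for a PDF reader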
Hope it helps!

Why is setting a Django FileField from an existing file on the same partition slow?

In my Django application I have to deal with huge files. Instead of uploading them via the web app, the users may place them into a folder (called .dump) on a Samba share and can then choose the file in the Django app to create a new model instance from it. The view looks roughly like this:
import os
from os.path import isfile

from django.conf import settings
from django.core.files import File
from django.http import JsonResponse
from django.views import View

class AddDumpedMeasurement(View):
    def get(self, request, *args, **kwargs):
        filename = request.GET.get('filename', None)
        dump_dir = os.path.join(settings.MEDIA_ROOT, settings.MEASUREMENT_DATA_DUMP_PATH)
        in_file = os.path.join(dump_dir, filename)
        if isfile(in_file):
            try:
                # 'sample' is resolved elsewhere in the original view
                with open(in_file, 'rb') as f:
                    object = NCFile.objects.create(sample=sample, created_by=request.user, file=File(f))
                return JsonResponse(data={'redirect': object.get_absolute_url()})
            except:
                return JsonResponse(data={'error': 'Couldn\'t read file'}, status=400)
        else:
            return JsonResponse(data={'error': 'File not found'}, status=400)
As MEDIA_ROOT and .dump are on the same Samba share (which is mounted by the web server), why is moving the file to its new location so slow? I would have expected it to be almost instantaneous. Is it because I open() it and stream the bytes to the file object? If so, is there a better way to move the file to its correct destination and create the model instance?
Using a temporary file as a placeholder and then replacing it with the original file allows one to use os.rename, which is fast:
import os
from os.path import isfile
from tempfile import NamedTemporaryFile

from django.conf import settings
from django.core.files import File

# Create the instance with an empty temporary file as a placeholder
tmp_file = NamedTemporaryFile()
object = NCFile.objects.create(..., file=File(tmp_file))
tmp_file.close()

# Swap the placeholder for the dumped file with a cheap rename
# ('in_file' and 'filename' come from the view in the question)
if isfile(object.file.path):
    os.remove(object.file.path)
new_relative_path = os.path.join(os.path.dirname(object.file.name), filename)
new_relative_path = object.file.storage.get_available_name(new_relative_path)
os.rename(in_file, os.path.join(settings.MEDIA_ROOT, new_relative_path))
object.file.name = new_relative_path
object.save()
Is it because I open() it and stream the bytes to the file object?
I would argue that it is. A simple move operation on a file system object means just updating a record in the file system's internal database; that would indeed be almost instantaneous.
Opening a local file and reading it line by line is more like a copy operation, which can be slow depending on the file size. Additionally, you are doing this at a very high level, while an OS copy operation happens at a much lower level.
But that's not the real cause of the problem. You have said the files are on a Samba share, which I presume means that you have mounted a remote folder locally. Thus when you read the file in question you are actually fetching it over the network, which is slower than a disk read. Then when you write the destination file, you are writing data over the network, again an operation that is slower than a disk write.
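To illustrate the difference (a sketch; src and dst are placeholder paths on the same mounted share):

import os
import shutil

# Same file system: a rename only updates directory entries, no file data moves
os.rename(src, dst)

# Streaming copy: every byte is read and written again, and on a mounted Samba
# share those bytes travel over the network twice
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
    shutil.copyfileobj(fsrc, fdst)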

Upload size on multiple file upload with Django

I'm using Django to upload some files with one multiple-file input. Using this upload handler, which writes the cumulative size of the uploaded chunks to a session variable, plus some jQuery, I finally got a progress bar working. But I still have a problem with the uploaded file size: the progress bar reports the files as uploading to an astonishing 144% of their original size. The size of the uploaded files in the server directory is actually as it should be.
As you can see in the handler script, the size is accumulated via:
data['uploaded'] += self.chunk_size
My guess is that self.chunk_size is a fixed value and not the actual size of the received chunk. So when a smaller chunk is received - for instance when I upload files that are smaller than the chunk size limit - more is accumulated than was actually uploaded.
Now my question: is there a way to get the actual chunk size?
I think you can use the length of the raw_data instead:
data['uploaded'] += len(raw_data)
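In context, a minimal sketch of such a handler (the class name and session key are assumptions; receive_data_chunk and file_complete are the hooks Django's FileUploadHandler expects):

from django.core.files.uploadhandler import FileUploadHandler

class ProgressUploadHandler(FileUploadHandler):
    def receive_data_chunk(self, raw_data, start):
        data = self.request.session.get('upload_progress', {'uploaded': 0})
        # len(raw_data) is the true size of this chunk; the last chunk of a
        # file is usually smaller than self.chunk_size
        data['uploaded'] += len(raw_data)
        self.request.session['upload_progress'] = data
        return raw_data  # pass the chunk on to the next handler

    def file_complete(self, file_size):
        return None  # let the remaining handlers build the uploaded file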

Django gives "I/O operation on closed file" error when reading from a saved ImageField

I have a model with two image fields, a source image and a thumbnail.
When I update the source image, save it, and then try to read it to crop/scale it into a thumbnail, I get an "I/O operation on closed file" error from PIL.
If I update the source image, don't save it, and then try to read it to crop/scale, I get an "attempting to read from closed file" error from PIL.
In both cases the source image is actually saved and available in later request/response loops.
If I don't crop/scale in a single request/response loop but instead upload on one page and then crop/scale on another page, this all works fine.
This seems to be a cached buffer being reused somehow, either by PIL or by the Django file storage. Any ideas on how to make an ImageField readable after saving?
More information... The ImageField is clearly closing the underlying file after saving. Is there any way to force a refresh of the ImageField? I see a few people using seek(0), but that will not work in this case.
There is a bug in ImageField which I've tracked down and submitted to the Django project.
If you have a simple model with an ImageField, the following code will fail with an "I/O operation on closed file" error:
from PIL import Image

instance = MyClass.objects.get(...)
w = instance.image.width
h = instance.image.height
original = Image.open(instance.image)
The workaround is to reopen the file:
instance = MyClass.objects.get(...)
w = instance.image.width
h = instance.image.height
instance.image.open()
original = Image.open(instance.image)
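Putting the workaround to use for the original crop/scale task, a sketch (the thumbnail field name, target size and JPEG output format are assumptions):

from io import BytesIO

from django.core.files.base import ContentFile
from PIL import Image

instance.image.open()                 # reopen the file closed by the save
original = Image.open(instance.image)
original.thumbnail((200, 200))        # scale in place, preserving aspect ratio

buffer = BytesIO()
original.convert('RGB').save(buffer, format='JPEG')
instance.thumbnail.save('thumb.jpg', ContentFile(buffer.getvalue()), save=True)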