Django FileResponse - How to speed up file download - django

I have a setup that lets users download files that are stored in the DB as BYTEA data. Everything works OK, except the download speed is very slow...it seems to download in 33KB chunks, one chunk per second.
Is there a setting I can specify to speed this up?
views.py
from django.http import FileResponse

def getFileResponse(filedata, filename, filesize, contenttype):
    response = FileResponse(filedata, content_type=contenttype)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    response['Content-Length'] = filesize
    return response

# Called from the view:
return getFileResponse(
    filedata=myfile.filedata,  # Binary data from DB
    filename=myfile.filename + myfile.fileextension,
    filesize=myfile.filesize,
    contenttype=myfile.filetype
)
Previously, I had the binary data returned as an HttpResponse and it downloaded like a normal file, with normal speeds. This worked fine locally, but when I pushed to Heroku, it wouldn't download the file -- instead displaying <Memory at XXX> in the download file.
And another side issue...when I include a text file with non-ASCII data (i.e. á), I get an error as well:
UnicodeEncodeError: 'ascii' codec can't encode characters...: ordinal not in range(128)
How can I handle files with Unicode data?
Update
Does anyone know why the download gets so slow when changing from HttpResponse to FileResponse? Or, alternatively, why returning the file with HttpResponse doesn't work on Heroku?
Update - Google Drive
I re-worked my application and hooked it up with a Google Drive back-end for serving files. It employs BytesIO() suggested by Eric below:
def download_file(self, fileid, mimetype=None):
    # Get binary file data
    request = self.get_file(fileid=fileid, mediaflag=True)
    stream = io.BytesIO()
    downloader = MediaIoBaseDownload(stream, request)
    done = False
    # Retry if we received HTTPError
    for retry in range(0, 5):
        try:
            while done is False:
                status, done = downloader.next_chunk()
                print("Download %d%%." % int(status.progress() * 100))
            return stream.getvalue()
        except (HTTPError) as error:
            return ('API error: {}. Try # {} failed.'.format(error.response, retry))

I think the difference you observe between HttpResponse vs. FileResponse is caused by the spec: https://www.python.org/dev/peps/pep-3333/#buffering-and-streaming
In your previous code, the HttpResponse was created with one huge byte string containing the whole file, so the first iteration pass returned the complete response body. With a FileResponse, the file is iterated in chunks (4 KB, 8 KB or another size, depending on your WSGI app server), which (I think) are streamed upstream immediately (to the reverse proxy, then the client). That may add overhead (more communication across process boundaries?).
It would help to know the app server used (uwsgi, gunicorn, waitress, other) and its relevant config. Also more details about the heroku error in case that can be solved!
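For illustration, here is a minimal, framework-free sketch of the two delivery styles described above. It assumes the database driver hands back BYTEA data as a memoryview (which would also fit the <Memory at XXX> text seen in the Heroku download); as_stream and iter_chunks are hypothetical helper names, and the 4096-byte block size is only an example value.

```python
import io


def as_stream(filedata):
    """Wrap DB binary data (bytes or memoryview) in a file-like object."""
    return io.BytesIO(bytes(filedata))


def iter_chunks(stream, block_size=4096):
    """Yield successive blocks, the way a streaming response walks a file."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield block


# Stand-in for a BYTEA column value (assumption: the driver returns memoryview).
data = memoryview(b"x" * 10000)

# Printing a memoryview directly gives its repr, not its contents.
repr_text = str(data)

# Streaming style: several small blocks instead of one large body.
chunks = list(iter_chunks(as_stream(data)))
```

Passing bytes(filedata) to HttpResponse, or io.BytesIO(bytes(filedata)) to FileResponse, should avoid handing the raw memoryview to the response object.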

Why store the whole file in the database?
The better approach is to store the file on disk and keep only its path in the database.
Then, depending on your web server, you can let the web server serve the file.
Web servers serve files better than Django does.
If the files need no access check, store them under media.
If the files need access control, you can use a response header supported by your web server.
With Nginx, use X-Accel-Redirect; other web servers have an equivalent. Tutorial: https://wellfire.co/learn/nginx-django-x-accel-redirects/
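As a rough sketch of the X-Accel-Redirect approach: the application only checks access and returns headers, and Nginx then serves the bytes. The helper name, the /protected prefix (which would have to match an internal location in the Nginx config) and the percent-encoding of the path are assumptions for illustration, not something taken from the tutorial.

```python
from urllib.parse import quote


def accel_redirect_headers(relative_path, download_name, internal_prefix="/protected"):
    """Build response headers that ask Nginx to serve a file itself.

    internal_prefix is hypothetical; it must match a location marked
    `internal` in the Nginx configuration.
    """
    internal_uri = "{}/{}".format(internal_prefix, relative_path.lstrip("/"))
    return {
        # Nginx intercepts this header and serves the file from disk.
        "X-Accel-Redirect": quote(internal_uri),
        "Content-Disposition": 'attachment; filename="{}"'.format(download_name),
    }


headers = accel_redirect_headers("reports/a b.pdf", "a b.pdf")
```

In a Django view, these headers would be set on an otherwise empty HttpResponse once the access check has passed.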

Related

Why is the file uploaded to AWS S3 0B in size?

I am developing a web application with Flask as the backend and Nuxt JS as the frontend. I receive an image file from the frontend and can save it to my Flask directory structure locally. The file is OK and the image is shown if I open it. Now I want to upload this image to AWS S3 instead of saving it to my disk. I use the boto3 SDK; here is my code:
Here is my save_picture method, that opens the image file and resizes it. I had the save method, but commented it out to avoid saving the file to disk as I want it only on S3.
def save_picture(object_id, form_picture, path):
    if form_picture is None:
        return None
    random_hex = token_hex(8)
    filename = form_picture.filename
    if '.' not in filename:
        return None
    extension = filename.rsplit('.', 1)[1].lower()
    if not allowed_file(extension, form_picture):
        return None
    picture_fn = f'{object_id}_{random_hex}.{extension}'
    picture_path = current_app.config['UPLOAD_FOLDER'] / path / picture_fn
    # resizing image and saving the small version
    output_size = (1280, 720)
    i = Image.open(form_picture)
    i.thumbnail(output_size)
    # i.save(picture_path)
    return picture_fn
image_name = save_picture(object_id=new_object.id, form_picture=file, path=f'{object_type}_images')
s3 = boto3.client(
    's3',
    aws_access_key_id=current_app.config['AWS_ACCESS_KEY'],
    aws_secret_access_key=current_app.config['AWS_SECRET_ACCESS_KEY']
)
print(file)  # this prints <FileStorage: 'Capture.JPG' ('image/jpeg')>, so the file is ok
try:
    s3.upload_fileobj(
        file,
        current_app.config['AWS_BUCKET_NAME'],
        image_name,
        ExtraArgs={
            'ContentType': file.content_type
        }
    )
except Exception as e:
    print(e)
    return make_response({'msg': 'Something went wrong.'}, 500)
I can see the uploaded file in my S3, but it shows 0 B in size and if I download it, it says that it cannot be viewed.
I have tried different access policies in S3, as well as many tutorials online, nothing seems to help. Changing the version of S3 to v3 when creating the client breaks the whole system and the file is not being uploaded at all with an access error.
What could be the reason for this upload failure? Is it the AWS config or something else?
Thank you!
Thanks to @jarmod, I tried skipping the image processing and it worked. I am now resizing the image, saving it to disk, opening the saved image (not the initial file), and sending that to S3. I then delete the image on disk as I don't need it.
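One plausible explanation for the 0 B object, consistent with the fix above but not confirmed in the thread: the image library reads the upload stream to its end, so a later upload from the same stream finds nothing left to read. The sketch below reproduces that effect with an in-memory stream standing in for the uploaded file; read_all is a hypothetical stand-in for any consumer of the stream.

```python
import io


def read_all(stream):
    """Stand-in for any consumer (image library, uploader) that reads a stream."""
    return stream.read()


# In-memory stand-in for the uploaded file object.
upload = io.BytesIO(b"\xff\xd8fake-jpeg-bytes")

first = read_all(upload)   # first consumer reads everything
second = read_all(upload)  # position is now at the end, so nothing is left

upload.seek(0)             # rewind to the start
third = read_all(upload)   # the full content is available again
```

Calling seek(0) on the file before the upload would send the original bytes. To upload the resized image without touching the disk, the image could instead be saved into an io.BytesIO buffer, and that buffer rewound before the upload.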

Django, Store jpg file received as string in http POST

I am receiving an http request from a desktop application with a screenshot. I cannot speak with the developer or see source code, so all I have is the http request I am getting.
The file isn't in request.FILES, it is in request.POST.
@csrf_exempt
def create_contract_event_handler(request, contract_id, event_type):
    keyboard_events_count = request.POST.get('keyboard_events_count')
    mouse_events_count = request.POST.get('mouse_events_count')
    screenshot_file = request.POST.get('screenshot_file')
    barr2 = bytes(screenshot_file.encode(encoding='utf8'))
    with open('.test/output.jpeg', 'wb') as f:
        f.write(barr2)
        f.close()
The file is corrupted.
The binary starts like this, I don't know if that helps:
����JFIFHH��C
%# , #&')*)-0-(0%()(��C
(((((((((((((((((((((((((((((((((((((((((((((((((((�� `"��
Also, if I try to open the image with PIL, I get the following error:
from PIL import Image
im = Image.open('./test/output.jpg')
#OSError: cannot identify image file './test/output.jpg'
In the end, I got access to the code on the other side: the 'filename' was missing from the header, and for that reason I was getting the file in the POST dictionary instead of the FILES dictionary.
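A minimal sketch of why writing the POST string back out gives a corrupted file, assuming the form field was decoded as UTF-8 text with undecodable bytes replaced by U+FFFD (the "�" characters in the dump above suggest this, but it is not confirmed). The sample bytes are only the typical start of a JPEG file.

```python
# First bytes of a JPEG file: not valid UTF-8.
raw = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Assumption: the framework decoded the form field as text, replacing
# undecodable bytes with U+FFFD.
as_text = raw.decode("utf-8", errors="replace")

# Encoding the text again does not restore the original bytes.
round_trip = as_text.encode("utf-8")
```

Once the bytes have been replaced this way, the original data cannot be recovered, which is why the upload has to arrive as a file part in request.FILES.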

Django Tweepy can't access Amazon S3 file

I'm using Tweepy, a tweeting python library, django-storages and boto. I have a custom manage.py command that works correctly locally, it gets an image from the filesystem and tweets that image. If I change the storage to Amazon S3, however, I can't access the file. It gives me this error:
raise TweepError('Unable to access file: %s' % e.strerror)
I tried making the images in the bucket "public". Didn't work. This is the code (it works without S3):
filename = model_object.image.file.url
media_ids = api.media_upload(filename=filename) # ERROR
params = {'status': tweet_text, 'media_ids': [media_ids.media_id_string]}
api.update_status(**params)
This line:
model_object.image.file.url
Gives me the complete url of the image I want to tweet, something like this:
https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg?Signature=xxxExpires=1467645897&AWSAccessKeyId=yyy
I also tried constructing the url manually, since it is a public image stored in my bucket, like this:
filename = "https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg"
But it doesn't work.
Why do I get the "Unable to access file" error?
The source code from tweepy looks like this:
def media_upload(self, filename, *args, **kwargs):
    """ :reference: https://dev.twitter.com/rest/reference/post/media/upload
        :allowed_param:
    """
    f = kwargs.pop('file', None)
    headers, post_data = API._pack_image(filename, 3072, form_field='media', f=f)  # ERROR
    kwargs.update({'headers': headers, 'post_data': post_data})

def _pack_image(filename, max_size, form_field="image", f=None):
    """Pack image from file into multipart-formdata post body"""
    # image must be less than 700kb in size
    if f is None:
        try:
            if os.path.getsize(filename) > (max_size * 1024):
                raise TweepError('File is too big, must be less than %skb.' % max_size)
        except os.error as e:
            raise TweepError('Unable to access file: %s' % e.strerror)
Looks like Tweepy can't get the image from the Amazon S3 bucket, but how can I make it work? Any advice will help.
The issue occurs when tweepy attempts to get file size in _pack_image:
if os.path.getsize(filename) > (max_size * 1024):
The function os.path.getsize assumes it is given a file path on disk; however, in your case it is given a URL. Naturally, the file is not found on disk and os.error is raised. For example:
# The following raises OSError on my machine
os.path.getsize('https://criptolibertad.s3.amazonaws.com/OrillaLibertaria/195.jpg')
What you could do is to fetch the file content, temporarily save it locally and then tweet it:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as f:
    name = model_object.image.file.name
    f.write(model_object.image.read())
    # media_upload() pops the 'file' keyword (see the source above)
    media_ids = api.media_upload(filename=name, file=f)
    params = dict(status='test media', media_ids=[media_ids.media_id_string])
    api.update_status(**params)
For your convenience, I published a fully working example here: https://github.com/izzysoftware/so38134984
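The difference between a URL and a local path can be checked with the standard library alone. This is only a sketch: the helper names and the example.invalid URL are made up, and no network access or Tweepy call is involved.

```python
import os
import tempfile


def exists_locally(path):
    """Return True only if path refers to something on the local disk."""
    try:
        os.path.getsize(path)
        return True
    except OSError:
        return False


def size_via_temp_file(content):
    """Write bytes to a named temporary file and return its on-disk size."""
    with tempfile.NamedTemporaryFile(suffix=".jpg") as tmp:
        tmp.write(content)
        tmp.flush()  # make sure the bytes reach the file before stat-ing it
        return os.path.getsize(tmp.name)


# A URL is not a filesystem path, so the size lookup fails.
url_is_local = exists_locally("https://example.invalid/195.jpg")

# The same bytes written to a temporary file have a real path and size.
temp_size = size_via_temp_file(b"fake image bytes")
```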

Django to serve generated excel file

I looked at the various questions similar to mine, but I could not find a fix for my problem.
In my code, I want to serve a freshly generated excel file residing in my app directory in a folder named files
excelFile = ExcelCreator.ExcelCreator("test")
excelFile.create()
response = HttpResponse(content_type='application/vnd.ms-excel')
response['Content-Disposition'] = 'attachment; filename="test.xls"'
return response
So when I click the button that runs this part of the code, it sends the user an empty file. Looking at my code, I can understand that behavior, because I never point to that file in my response...
I saw some people use the file wrapper (which I don't quite understand the use). So I did like that:
response = HttpResponse(FileWrapper(excelFile.file),content_type='application/vnd.ms-excel')
But then, I receive the error message from server : A server error occurred. Please contact the administrator.
Thanks for helping me in my Django quest, I'm getting better with all of your precious advices!
First, you need to understand how this works, you are getting an empty file because that is what you are doing, actually:
response = HttpResponse(content_type='application/vnd.ms-excel')
response['Content-Disposition'] = 'attachment; filename="test.xls"'
HttpResponse receives the content of the response as its first argument; take a look at its constructor:
def __init__(self, content='', mimetype=None, status=None, content_type=None):
So you need to create the response with the content you want, in this case the content of your .xls file.
You can use any method to do that, just be sure the content is there.
Here is a sample:
import io
output = io.BytesIO()
# read your content and put it in output var
out_content = output.getvalue()
output.close()
# content_type replaces the older mimetype argument
response = HttpResponse(out_content, content_type='application/vnd.ms-excel')
response['Content-Disposition'] = 'attachment; filename="test.xls"'
I would recommend you use:
python manage.py runserver
to run your application from the command line. From here you will see the console output of your application and any exceptions that are thrown as it runs. This may provide a quick resolution to your problem.

django return file over HttpResponse - file is not served correctly

I want to return some files in a HttpResponse and I'm using the following function. The file that is returned always has a filesize of 1kb and I do not know why. I can open the file, but it seems that it is not served correctly. Thus I wanted to know how one can return files with django/python over a HttpResponse.
@login_required
def serve_upload_files(request, file_url):
    import os.path
    import mimetypes
    mimetypes.init()
    try:
        file_path = settings.UPLOAD_LOCATION + '/' + file_url
        fsock = open(file_path,"r")
        #file = fsock.read()
        #fsock = open(file_path,"r").read()
        file_name = os.path.basename(file_path)
        file_size = os.path.getsize(file_path)
        print "file size is: " + str(file_size)
        mime_type_guess = mimetypes.guess_type(file_name)
        if mime_type_guess is not None:
            response = HttpResponse(fsock, mimetype=mime_type_guess[0])
        response['Content-Disposition'] = 'attachment; filename=' + file_name
    except IOError:
        response = HttpResponseNotFound()
    return response
Edit:
The bug is actually not a bug ;-)
This solution is working in production on an apache server, thus the source is ok.
While writing this question I tested it locally with the Django development server and was wondering why it did not work. A friend of mine told me that this issue could arise if the MIME types are not set in the server, but he was not sure that this is the problem. One thing is for sure: it has something to do with the server.
Could it be that the file contains some non-ascii characters that render ok in production but not in development?
Try reading the file as binary:
fsock = open(file_path,"rb")
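A small sketch of why binary mode matters for non-text files, written for Python 3 (the question's code is Python 2, where text mode mainly affects newline handling on Windows). The temporary file and its contents are made up for the example.

```python
import os
import tempfile

raw = b"\xff\xd8\xff\xe0 binary \r\n payload"

# Create a throwaway file holding non-text bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(raw)

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as fh:
    binary_content = fh.read()

# Text mode tries to decode the bytes and fails on invalid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as fh:
        fh.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

os.remove(path)
```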
Try passing the fsock iterator as a parameter to HttpResponse(), rather than to its write() method which I think expects a string.
response = HttpResponse(fsock, mimetype=...)
See http://docs.djangoproject.com/en/dev/ref/request-response/#passing-iterators
Also, I'm not sure you want to call close on your file before returning response. Having played around with this in the shell (I've not tried this in an actual Django view), it seems that the response doesn't access the file until the response itself is read. Trying to read a HttpResponse created using a file that is now closed results in a ValueError: I/O operation on closed file.
So, you might want to leave fsock open, and let the garbage collector deal with it after the response is read.
Try disabling "django.middleware.gzip.GZipMiddleware" from your MIDDLEWARE_CLASSES in settings.py
I had the same problem, and after I looked around the middleware folder, this middleware seemed guilty to me and removing it did the trick for me.