Storing images and thumbnails on s3 in django - django

I'm trying to get my images thumbnailed and stored on s3 using django-storages, boto, and sorl-thumbnail. I have it working, but it's very slow, even with small images. I don't mind it being slow when I save the form and upload the images to s3, but I'd like it to display the image quickly after that.
The answer to this SO question explains that the thumbnail won't be created until first access, but that you can use get_thumbnail() to create it beforehand.
Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation
I'm doing that, and now it seems that all entries into the thumbnail_kvstore table are created when uploading the image, rather than when it is displayed.
The problem is that the page displaying the image is still really slow. Looking at the logging panel in the debug toolbar, it looks like there is still lots of communication with s3. It seems like after the image and thumbnails are uploaded and cached, page should render quickly without communicating with s3.
What am I doing wrong? Thanks!
Update: weak hack seems to have gotten it working, but I'd love to know how to do this properly:
https://github.com/asciitaxi/sorl-thumbnail/commit/545cce3f5e719a91dd9cc21d78bb973b2211bbbf
Update: more information for #sorl
I'm working with 2 views:
ADD VIEW: In this view I submit the form to create the model with the image in it. The image is uploaded to s3. In a post_save signal, I call get_thumbnail() to generate the thumbnail before it's needed:
im = get_thumbnail(instance.image, '360x360')
DISPLAY VIEW: In this view I display the thumbnail generated in the add view:
{% thumbnail object.image "360x360" as im %}
<img src="{{ im.url }}" width="{{ im.width }}" height="{{ im.height }}">
{% endthumbnail %}
Without the patch:
ADD VIEW: creates 3 entries in the kvstore table, accesses the cache 10 times (6 sets, 4 gets), logging tab of debug toolbar says "establishing HTTP connection" 12 times
DISPLAY VIEW: still just 3 entries in the kvstore table, just 1 get from cache, but debug toolbar says "establishing HTTP connection" 3 times still
With only the change on line 122:
ADD VIEW: same as above, except the logging only says "establishing HTTP connection" 2 times
DISPLAY VIEW: same as above, except the logging only says "establishing HTTP connection" 1 time
Also adding the change on line 118:
ADD VIEW: same as above, but now we are down to 2 "establishing HTTP connection" messages
DISPLAY VIEW: same as above, with no logging messages at all
UPDATE: It looks like storage._setup() is called twice, and storage.url() is called once. Based on the timing, I'd say each one makes connections to s3:
1304711315.4
_setup
1304711317.84
1304711317.84
_setup
1304711320.3
1304711320.39
_url
1304711323.66
This seems to be reflected by the boto logging, which says "establishing HTTP connection" 3 times.

As the author of sorl thumbnail I am really interested in solving this if it is not working as I intended. If the key value sotre is populated it will currently store: name, storage and size. I have made the assumption that the url is based on the name and thus should not cause any storage calls. Looking at django storages, https://github.com/e-loue/django-storages/blob/master/storages/backends/s3boto.py#L214 it seems like a safe assumption to make. In your patch you have patched the read method for some reason. When creating a thumbnail a ImageFile instance is fetched from cache (if not create it) then you can of course call read which will read the file, but the intended use is .url which calls url on the storage with the cached name which inturn should be a non storage access op. Could you try to isolate your problem to exacly where in your code this storage access happends?
Also make sure you have THUMBNAIL_DEBUG on and that you have the key value store properly set up.

I'm not sure if you problem is the same as mine, but I found that accessing the width or height property of a normal Django ImageField would read the file from the storage backend, load it into PIL, and return the dimensions from there. This is especially costly with a remote backend like we're using, and we have very media-heavy pages.
https://code.djangoproject.com/ticket/8307 was opened to address this but the Django devs closed as wontfix because they want the width and height properties to always return the true values. So I just monkeypatch _get_image_dimensions() to use those fields, which does prevent a large number of the boto messages and improves my page-load times.
Below is my code modified from the patch attached to that ticket. I stuck this in a place which gets executed early, such as a models.py.
from django.core.files.images import ImageFile, get_image_dimensions
def _get_image_dimensions(self):
from numbers import Number
if not hasattr(self, '_dimensions_cache'):
close = self.closed
if self.field.width_field and self.field.height_field:
width = getattr(self.instance, self.field.width_field)
height = getattr(self.instance, self.field.height_field)
#check if the fields have proper values
if isinstance(width, Number) and isinstance(height, Number):
self._dimensions_cache = (width, height)
else:
self.open()
self._dimensions_cache = get_image_dimensions(self, close=close)
else:
self.open()
self._dimensions_cache = get_image_dimensions(self, close=close)
return self._dimensions_cache
ImageFile._get_image_dimensions = _get_image_dimensions

After looking at the #shadfc django ticket, I reimplemented the monkeypatch as follows:
from django.core.files.images import ImageFile
def _get_image_dimensions(self):
if not hasattr(self, '_dimensions_cache'):
if getattr(self.storage, 'IGNORE_IMAGE_DIMENSIONS', False):
self._dimensions_cache = (0, 0)
else:
close = self.closed
self.open()
self._dimensions_cache = get_image_dimensions(self, close=close)
return self._dimensions_cache
ImageFile._get_image_dimensions = _get_image_dimensions
To use it, just add a IGNORE_IMAGE_DIMENSIONS = True to your storage class and it will not be touched to get image dimensions. Likely:
from storages.backends.s3boto import S3BotoStorage
S3BotoStorage.IGNORE_IMAGE_DIMENSIONS = True
I still need to investigate where the numbers are used, to know if simple returning (0, 0) can lead to any problem, but no bug raised for now.

Related

Programmatically saving image to Django ImageField returning 404 in production

I have a Django app where users can upload images and can have a processed version of the images if they want. and the processing function returns the path, so my approach was
model2.processed_image = processingfunction( model1.uploaded_image.path)
and as the processing function returns path here's how it looks in my admin view
not like the normally uploaded images
In my machine it worked correctly and I always get a 404 error for the processed ones while the normally uploaded is shown correctly when I try to change the url of the processed from
myurl.com/media/home/ubuntu/Eyelizer/media/path/to/the/image
to
myurl.com/media/path/to/the/image
so how can I fix this ? is there a better approach to saving the images manually to the database ?
I have the same function but returns a Pil.image.image object and I've tried many methods to save it in a model but I didn't know how so I've made the function return a file path.
I think the problem is from nginx where I define the media path.
should/can I override the url attribute of the processedimage?
making something like
model.processed_image.url = media/somefolder/filename
Instead of using the PIL Image directly, create a django.core.files.File.
Example:
from io import BytesIO
from django.core.files import File
img_io = BytesIO() # create a BytesIO object to temporarily save the file in memory
img = processingfunction( model1.uploaded_image.path)
img.save(img_io, 'PNG') # save the PIL image to the BytesIO object
img_file = File(thumb_io, name='some-name.png') # create the File object
# you can use the `name` from `model1.uploaded_image` and use
# that above
# finally, pass the image file to your model field
model2.processed_image = img_file
To avoid repetition of this code, it would be a good idea to keep this code in processingfunction and return the File object directly from there.
My approach is a bit different from #Xyres's, I thought xyres's would make a duplicate of the existing image and create a new one and when I tried overriding the URL attribute it returned an error of
can't set the attribute
but when I saw this question and this ticket I tried making this and it worked
model2.processed_image = processingfunction(model1.uploaded_image.path)
full_path = model2.processed_image.path
model2.processed_image.name = full_path.split('media')[1]
so that explicitly making the URL media/path/to/image and cut out all of the unneeded parts like home/ubuntu and stuff

Manage multiple uploads with Flask session

I have a following situation. I created a simple backend in Flask that handles file uploads. With files received, Flask does something (uploads them), and returns the data to the caller. There are two scenarios with the app, to upload one image and multiple images. When uploading one image, I can simply get the response and voila, I'm all set.
However, I am stuck on handling multiple file uploads. I can use the same handler for the actual file upload, but the issue is that all of those files need to be stored into a list or something, then processed, and after doing that, a single link (album) containing all those images, needs to be delivered.
Here is my upload handling code:
#app.route('/uploadv3', methods=['POST'])
def upload():
if request.method == 'POST':
data_file = request.files["file"]
file_name = data_file.filename
path_to_save_to = os.path.join(app.config['UPLOAD_FOLDER'], file_name)
data_file.save(path_to_save_to)
file_url = upload_image_to_image_host(path_to_save_to)
return file_url
I was experimenting with session in flask, but I dont know can I create a list of items under one key, like session['links'], and then get all those, and clear it after doing the work. Or is there some other simpler solution?
I assume that I could probably do this via key for each image, like session["link1"], and so on, but that would impose a limit on the images (depending on how much of those I create), would make the code very ugly, make the iteration over each in order to generate a list that is passed to an album building method problematic, and session clearing would be tedious.
Some code that I wrote for getting the actual link at the end and clearing the session follows (this assume that session['link'] has a list of urls, which I can't really achieve with my knowledge of session management in Flask:
def create_album(images):
session.pop('link', None)
new_album = im.create_album(images)
return new_album.link
#app.route('/get_album_link')
def get_album_link():
return create_album(session['link'])
Thanks in advance for your time!
You can assign anything to a session including individual value or list/dictionary etc. If you know the links, you can store them in the session as follows:
session['links'] = ['link1','link2'...and so on]
This way, you have a list of all the links. You can now access a link by:
if 'links' in session:
for link in session['links']:
print link
Once you are done with them, you can clear the session as:
if 'links' in session:
del session['links']
To clarify what I have done to make this work. At the end, it appeared that the uploading images and adding them to the album anonymously had to be done "reversely", so not adding images to an album object, but uploading an image object to an album id.
I made a method that gets the album link and puts it in the session:
#app.route('/get_album_link')
def get_album_link():
im = pyimgur.Imgur(CLIENT_ID)
new_album = im.create_album()
session.clear()
session['album'] = new_album.deletehash
session['album_link'] = new_album.link
return new_album.link
Later on, when handling uploads, I just add the image to the album and voila, all set :)
uploaded_image = im.upload_image(path_of_saved_image, album=session['album'])
file_url = uploaded_image.link
return file_url
One caveat is that the image should be added to the "deleteahash" value passed as the album value, not the album ID (which is covered by the imgur api documentation).

Django - show loading message during long processing

How can I show a please wait loading message from a django view?
I have a Django view that takes significant time to perform calculations on a large dataset.
While the process loads, I would like to present the user with a feedback message e.g.: spinning loading animated gif or similar.
After trying the two different approaches suggested by Brandon and Murat, Brandon's suggestion proved the most successful.
Create a wrapper template that includes the javascript from http://djangosnippets.org/snippets/679/. The javascript has been modified: (i) to work without a form (ii) to hide the progress bar / display results when a 'done' flag is returned (iii) with the JSON update url pointing to the view described below
Move the slow loading function to a thread. This thread will be passed a cache key and will be responsible for updating the cache with progress status and then its results. The thread renders the original template as a string and saves it to the cache.
Create a view based on upload_progress from http://djangosnippets.org/snippets/678/ modified to (i) instead render the original wrapper template if progress_id='' (ii) generate the cache_key, check if a cache already exists and if not start a new thread (iii) monitor the progress of the thread and when done, pass the results to the wrapper template
The wrapper template displays the results via document.getElementById('main').innerHTML=data.result
(* looking at whether step 4 might be better implemented via a redirect as the rendered template contains javascript that is not currently run by document.getElementById('main').innerHTML=data.result)
Another thing you could do is add a javascript function that displays a loading image before it actually calls the Django View.
function showLoaderOnClick(url) {
showLoader();
window.location=url;
}
function showLoader(){
$('body').append('<div style="" id="loadingDiv"><div class="loader">Loading...</div></div>');
}
And then in your template you can do:
This will take some time...
Here's a quick default loadingDiv : https://stackoverflow.com/a/41730965/13476073
Note that this requires jQuery.
a more straightforward approach is to generate a wait page with your gif etc. and then use the javascript
window.location.href = 'insert results view here';
to switch to the results view which starts your lengthy calculation. The page wont change until the calculation is finished. When it finishes, then the results page will be rendered.
Here's an oldie, but might get you going in the right direction: http://djangosnippets.org/snippets/679/
A workaround that I chose was to use beforunload and unload events to show the loading image. This can be used with or without window.load. In my case, it's the view that is taking a great amount of time and not the page loading, hence I am not using window.load (because it's already a lot of time by the time window.load comes into picture, and at that point of time, I do not need the loading icon to be shown anymore).
The downside is that there is a false message that goes out to the user that the page is loading even when when the request has not even reached the server or it's taking much time. Also, it doesn't work for requests coming from outside my website. But I'm living with this for now.
Update: Sorry for not adding code snippet earlier, thanks #blockhead. The following is a quick and dirty mix of normal JS and JQuery that I have in the master template.
Update 2: I later moved to making my view(s) lightweight which send the crucial part of the page quickly, and then using ajax to get the remaining content while showing the loading icon. It needed quite some work, but the end result is worth it.
window.onload=function(){
$("#load-icon").hide(); // I needed the loading icon to hide once the page loads
}
var onBeforeUnLoadEvent = false;
window.onunload = window.onbeforeunload= function(){
if(!onBeforeUnLoadEvent){ // for avoiding dual calls in browsers that support both events
onBeforeUnLoadEvent = true;
$("#load-icon").show();
setTimeout(function(){
$("#load-icon").hide();},5000); // hiding the loading icon in any case after
// 5 seconds (remove if you do not want it)
}
};
P.S. I cannot comment yet hence posted this as an answer.
Iterating HttpResponse
https://stackoverflow.com/a/1371061/198062
Edit:
I found an example to sending big files with django: http://djangosnippets.org/snippets/365/ Then I look at FileWrapper class(django.core.servers.basehttp):
class FileWrapper(object):
"""Wrapper to convert file-like objects to iterables"""
def __init__(self, filelike, blksize=8192):
self.filelike = filelike
self.blksize = blksize
if hasattr(filelike,'close'):
self.close = filelike.close
def __getitem__(self,key):
data = self.filelike.read(self.blksize)
if data:
return data
raise IndexError
def __iter__(self):
return self
def next(self):
data = self.filelike.read(self.blksize)
if data:
return data
raise StopIteration
I think we can make a iterable class like this
class FlushContent(object):
def __init__(self):
# some initialization code
def __getitem__(self,key):
# send a part of html
def __iter__(self):
return self
def next(self):
# do some work
# return some html code
if finished:
raise StopIteration
then in views.py
def long_work(request):
flushcontent = FlushContent()
return HttpResponse(flushcontent)
Edit:
Example code, still not working:
class FlushContent(object):
def __init__(self):
self.stop_index=2
self.index=0
def __getitem__(self,key):
pass
def __iter__(self):
return self
def next(self):
if self.index==0:
html="loading"
elif self.index==1:
import time
time.sleep(5)
html="finished loading"
self.index+=1
if self.index>self.stop_index:
raise StopIteration
return html
Here is another explanation on how to get a loading message for long loading Django views
Views that do a lot of processing (e.g. complex queries with many objects, accessing 3rd party APIs) can take quite some time before the page is loaded and shown to the user in the browser. What happens is that all that processing is done on the server and Django is not able to serve the page before it is completed.
The only way to show a show a loading message (e.g. a spinner gif) during the processing is to break up the current view into two views:
First view renders the page with no processing and with the loading message
The page includes a AJAX call to the 2nd view that does the actual processing. The result of the processing is displayed on the page once its done with AJAX / JavaScript

Same random record across website

I am building a website that shows a different background every time a user enters it. This background is then used across the website with the same user and session.
So basically, a user entres the homepage, gets a background and that image won't change until the user closes the website or opens up a new page. I think you understand what I mean.
I know how to get a random record from the database using Django, but I'm not sure how to keep that record persistent across the website, because if I pull it on every view, I'll get a different image on different pages.
So my "index" view could be calling
bgimage = BackgroundImage.objects.random()
But then I have a problem. How can I get this random record unchanged across all the other views. Is that possible? Should I be looking into sessions, cookies?
Thank you!
you could use sessions - something like
if 'bgimage' not in request.session:
bgimage = BackgroundImage.objects.random()
request.session['bgimage'] = bgimage.pathtoimage
Context processor:
def bgimage(request):
if 'bg_image' not in request.session:
image = BackgroundImage.objects.random()
request.session['bg_image'] = image.file
return {'background_image' : request.session['image']}

Django ImageField validation (is it sufficient)?

I have a lot of user uploaded content and I want to validate that uploaded image files are not, in fact, malicious scripts. In the Django documentation, it states that ImageField:
"Inherits all attributes and methods from FileField, but also validates that the uploaded object is a valid image."
Is that totally accurate? I've read that compressing or otherwise manipulating an image file is a good validation test. I'm assuming that PIL does something like this....
Will ImageField go a long way toward covering my image upload security?
Django validates the image uploaded via form using PIL.
See https://code.djangoproject.com/browser/django/trunk/django/forms/fields.py#L519
try:
# load() is the only method that can spot a truncated JPEG,
# but it cannot be called sanely after verify()
trial_image = Image.open(file)
trial_image.load()
# Since we're about to use the file again we have to reset the
# file object if possible.
if hasattr(file, 'reset'):
file.reset()
# verify() is the only method that can spot a corrupt PNG,
# but it must be called immediately after the constructor
trial_image = Image.open(file)
trial_image.verify()
...
except Exception: # Python Imaging Library doesn't recognize it as an image
raise ValidationError(self.error_messages['invalid_image'])
PIL documentation states the following about verify():
Attempts to determine if the file is broken, without actually decoding
the image data. If this method finds any problems, it raises suitable
exceptions. This method only works on a newly opened image; if the
image has already been loaded, the result is undefined. Also, if you
need to load the image after using this method, you must reopen the
image file.
You should also note that ImageField is only validated when uploaded using form. If you save the model your self (e.g. using some kind of download script), the validation is not performed.
Another test is with the file command. It checks for the presence of "magic numbers" in the file to determine its type. On my system, the file package includes libmagic as well as a ctypes-based wrapper /usr/lib64/python2.7/site-packages/magic.py. It looks like you use it like:
import magic
ms = magic.open(magic.MAGIC_NONE)
ms.load()
type = ms.file("/path/to/some/file")
print type
f = file("/path/to/some/file", "r")
buffer = f.read(4096)
f.close()
type = ms.buffer(buffer)
print type
ms.close()
(Code from here.)
As to your original question: "Read the Source, Luke."
django/core/files/images.py:
"""
Utility functions for handling images.
Requires PIL, as you might imagine.
"""
from django.core.files import File
class ImageFile(File):
"""
A mixin for use alongside django.core.files.base.File, which provides
additional features for dealing with images.
"""
def _get_width(self):
return self._get_image_dimensions()[0]
width = property(_get_width)
def _get_height(self):
return self._get_image_dimensions()[1]
height = property(_get_height)
def _get_image_dimensions(self):
if not hasattr(self, '_dimensions_cache'):
close = self.closed
self.open()
self._dimensions_cache = get_image_dimensions(self, close=close)
return self._dimensions_cache
def get_image_dimensions(file_or_path, close=False):
"""
Returns the (width, height) of an image, given an open file or a path. Set
'close' to True to close the file at the end if it is initially in an open
state.
"""
# Try to import PIL in either of the two ways it can end up installed.
try:
from PIL import ImageFile as PILImageFile
except ImportError:
import ImageFile as PILImageFile
p = PILImageFile.Parser()
if hasattr(file_or_path, 'read'):
file = file_or_path
file_pos = file.tell()
file.seek(0)
else:
file = open(file_or_path, 'rb')
close = True
try:
while 1:
data = file.read(1024)
if not data:
break
p.feed(data)
if p.image:
return p.image.size
return None
finally:
if close:
file.close()
else:
file.seek(file_pos)
So it looks like it just reads the file 1024 bytes at a time until PIL says it's an image, then stops. This obviously does not integrity-check the entire file, so it really depends on what you mean by "covering my image upload security": illicit data could be appended to an image and passed through your site. Someone could DOS your site by uploading a lot of junk or a really big file. You could be vulnerable to an injection attack if you don't check any uploaded captions or make assumptions about the image's uploaded filename. And so on.