Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using the S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (redundancy, load balancing) to access the same set of uploaded media.
Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.
Performance on media-rich pages is not very good, and when a page containing thumbnails needing to be generated for the first time is accessed, the requests even time out.
I understand that this is due to sorl thumbnail checking/downloading the original image from S3 (which could be quite large and high resolution), and rendering/checking/uploading the thumbnail.
What would you suggest is the best solution to this setup?
I have seen suggestions of storing a local copy of files in addition to the S3 copy (not too great when a couple of servers are being used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.
Are there any other suggestions or better ways of approaching this?

sorl-thumbnail is now designed with slow remote storages in mind. The first creation of a thumbnail still queries the storage, for example when it is first accessed from a template, but after that the references are cached in a key-value store. You still need that first query and creation, so one solution is to call the low-level API sorl.thumbnail.get_thumbnail with the same options when the image is uploaded. When the image is uploaded, add this thumbnail creation job to a queue such as Celery.
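A minimal sketch of that idea, not sorl's own code: a helper that creates a fixed list of thumbnail sizes right after upload. The names warm_thumbnails and THUMBNAIL_SPECS and the make_thumbnail parameter are assumptions for illustration. In a real project make_thumbnail would be sorl.thumbnail.get_thumbnail, and the helper would run inside a queued job (for example a Celery task) rather than in the request.

```python
# Hypothetical sizes; they must mirror the geometry and options used in the
# templates, or the pre-generated thumbnails will not be reused.
THUMBNAIL_SPECS = [
    ("100x100", {"crop": "center"}),
    ("800x800", {}),
]


def warm_thumbnails(source, make_thumbnail, specs=THUMBNAIL_SPECS):
    """Create every configured thumbnail for `source` and return the results.

    `make_thumbnail` is injected so the sketch runs without Django; in a
    project it would be sorl.thumbnail.get_thumbnail, which is called as
    get_thumbnail(file_, geometry_string, **options).
    """
    return [
        make_thumbnail(source, geometry, **options)
        for geometry, options in specs
    ]
```

Because the cached reference depends on the options, a size requested in a template with different options would still be generated on first access.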

You can use Sorlery. It combines sorl and celery to create thumbnails via workers. It's very careful not to do any filesystem access outside of the worker thread.
The thumbnail returned immediately (before the worker has had a chance) can be controlled by setting your THUMBNAIL_DUMMY_SOURCE to an appropriate placeholder.
The job is created the first time the thumbnail is requested; subsequent requests are served the dummy image until the worker thread completes.

Almost the same as @Aidan's solution: I have made some tweaks to sorl-thumbnail, and I also pre-generate thumbnails with Celery. My code is here: sorl_thumbnail-async
But I came to know that easy_thumbnails does exactly what I was trying to do, so I am using it in my current project. You might find it useful; a short post on the topic is here

The easiest solution I've found so far is actually this third party service: http://cloudinary.com/

Related

Heroku doesn't update github file system when an image is uploaded from website

I ran into a problem where Heroku doesn't update my GitHub repository (or, say, the static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved to the filesystem while the server is running on Heroku disappear.
I found this on their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused why not all the pictures disappear and only those added later do.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for the Blog Post 1 I have 2 picture resolutions, which means storing the files in different folders corresponding to those resolutions.
---1920x1920
-----picture.jpg
---800x800
-----picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?

Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in the Heroku Dev Center, with specific instructions for uploading from Python.
Note that these Python instructions use the Direct Upload approach: the Flask app generates a pre-signed URL, which is passed back to the client JavaScript code so that the user's browser can upload to S3 directly. The resulting S3 URL of the image is then put into a hidden element in the form, which your app receives on form submit.
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to produce these thumbnails. In that case it may be easier to use the Pass-Through approach, where your app implements its own upload mechanism, does the processing and then uploads the thumbnails to S3 (the upload-to-S3 part is well documented, such as in this SO thread).
The Pass-Through method carries the warning that it may block a single-threaded worker. If your site gets a volume of requests that makes this an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (this GitHub post has some useful commands/info on concurrent worker types).
The best way to implement this whole thing (although the requirement for a redisgo dyno and a worker dyno may push you into the paid tier) may be with Background Tasks using rq. You use the Direct Upload approach above to upload the original image, then have a background job download it, do the resizing, and put the resulting thumbnails back onto S3.
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one Bucket for the entire app, and just include forward slashes in the object's key to mimic a subdirectory structure.
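As an illustration of that key layout, here is a small sketch. The function name thumbnail_key and the "posts" prefix are made up for this example; the only point is that the resolution becomes part of the object key rather than a separate bucket.

```python
import posixpath


def thumbnail_key(post_slug, size, filename):
    """Build an S3 object key that looks like a path inside one bucket.

    S3 has no real directories; the slashes are simply part of the key.
    """
    width, height = size
    return posixpath.join("posts", post_slug, f"{width}x{height}", filename)


print(thumbnail_key("blog-post-1", (1920, 1920), "picture.jpg"))
# posts/blog-post-1/1920x1920/picture.jpg
```

Both resolutions of the same picture then live in the same bucket under keys that differ only in the size segment.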

Apex Oracle : How to get image URL?

I'm running Apex 19.2 on Oracle 18c and I would like to get some images URL to show them in the application. The images are stored in the database as blob (not static images).
For the moment, what I did is create an ORDS RESTful service that connects to the database and loads the images. The images are then accessible via a URL that I insert in my pages:
<img src="URL to my RESTful service module with the image identifier">
This works well, but I find it quite complex and, most importantly, it's very slow and doesn't cache the image. Whenever I load the page I have to wait for the image to load (even though it's very small: 50 KB).
Does anyone have a solution for this, please? Is there any out-of-the-box APEX solution like the one for static images?
Thanks,
Cheers

There is no direct method to expose BLOBs to the end user, as it would be kind of complicated to secure these files. I can suggest the following two methods:
1) Use the code just like you did, but consider putting it in an application process. This way, you can use all your session variables directly. You will then be able to generate a link that does exactly what you want, or call the process from a button or branch. There is a nice tutorial here:
https://oracle-base.com/articles/misc/apex-tips-file-download-from-a-button-or-link
2) Use APEX_UTIL.GET_BLOB_FILE_SRC. This function only works from within an APEX session and requires you to set up an application page with an item that holds a primary key to your table. I doubt that this is what you want.
Note that APEX_MAIL.GET_IMAGES_URL does not work for your use case: it only works for files in your shared components application files or workspace files.
I actually like your approach, because it may be more lightweight than 1). The fact that the image gets loaded again every time probably does not depend on the method you are using. I guess it is more likely due to the headers you are sending out. Take a look at the cache-control headers on this page:
https://developer.mozilla.org/de/docs/Web/HTTP/Headers/Cache-Control
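To make the header suggestion concrete, here is a language-neutral sketch written in Python; it is not ORDS or APEX code. The function names, the one-day max_age default and the choice of "private" (on the assumption that the images sit behind a session) are illustrative only. The same two headers would have to be set by whatever handler serves the BLOB.

```python
import hashlib


def image_response_headers(image_bytes, max_age=86400):
    """Return headers that allow a browser to cache an image response."""
    # A content hash works as an ETag: it changes only when the image does.
    etag = '"' + hashlib.sha256(image_bytes).hexdigest()[:16] + '"'
    return {
        "Cache-Control": f"private, max-age={max_age}",
        "ETag": etag,
    }


def is_not_modified(request_headers, response_headers):
    """True when the client's cached copy is still current (answer with 304)."""
    return request_headers.get("If-None-Match") == response_headers["ETag"]
```

With max-age set, the browser does not request the image again until the period expires; after that, the ETag lets the server answer with an empty 304 instead of resending the bytes.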

Maybe check out APEX_MAIL.GET_IMAGES_URL
It is supposed to do essentially what you need so perhaps you can use it.

Heroku & Django: Storing a single jpg asset

With Heroku's ephemeral filesystem, it's obvious we can't host user-uploaded images or dynamically changing ones in the app's filesystem because these won't be available to other dynos.
However, I'm looking for a solution to store a SINGLE image, used as a sort of logo, in a Django URLField(). This image's URL will be a fallback if an image at a remote location (external service) is not available (hence the URLField and not ImageField).
picture_url = models.URLField(null=True)
For me, setting-up and using services like Amazon S3 seems like an overkill. Using services like imgur/Photobucket seems unreliable.
Any workarounds?

Sadly, no.
There's no functional difference between storing a single image and storing multiple images. The issue is that Heroku deploys a snapshot of your code, as represented in Git. If an asset doesn't live in your repository, it will be thrown away whenever Heroku re-deploys your code (including when you push new code).
An image hosting service (storing the URL as you're already planning to do) is your best bet.

Django-skel slow due to httplib requests to S3

G'day,
I am playing around with django-skel on a recent project and have used most of its defaults: Heroku for hosting and S3 for file storage. I'm mostly serving a static-y site except using sorl for thumbnail generation, however the response times are pathetic.
You can visit the site: http://bit.ly/XlzkXp
My template looks like: https://gist.github.com/cd15e320be6f4454a7fb
I'm serving the template using a shortcut from the URL conf, no database usage at all: https://gist.github.com/f9d1a9a191959dcff1b5
However, it's consistently taking 15+ seconds for the response. New relic shows this is because of requests going to S3 while processing the view. This does not make any sense to me.
New Relic data: http://i.imgur.com/vs9ZTLP.png?1
Why is something using httplib to request things from S3? I can see how collectstatic might be doing it, but not the processing of the view itself.
What am I not understanding about Django-skel and this setup?

I have the same issue. My guess is that:
django-compress and django-storage are both in use,
which results in the former saving the cache it needs to render templates to the S3 bucket
and then reading it (over the network, hence httplib) while rendering each template.
My second guess was that following the django-compress instructions for remote storage, which implement an "S3 Storage backend which caches files locally, too", would resolve this issue.
It makes sense to me that saving the cache to both locations (local and S3) and reading from the local filesystem first should speed things up, but somehow it does not work that way: the response time is still around 8+ seconds.
By disabling django-compress with COMPRESS_ENABLED = False, I managed to get an average response time of 1-1.3 seconds.
Any ideas?
(I will update this answer in case of any progress)

Django action after file upload

We have an extensive existing codebase and we've added load-balanced servers with a single master server to the equation now. There are various apps that contain models with uploaded files and images which all work fine... However, this raises the obvious problem of the rsync delay. Rsync is in the crontab and set to run every minute but this still means there's a potential 59 second wait between content being created and it actually existing on the webservers.
What I would like, is to be able to register some kind of 'post file changed' handler that triggers rsync whenever a new file is uploaded. I can't find anything of the sort though! Django has file upload handlers, but these appear to only deal with the actual upload stream, not the file as it is saved to the filesystem thereafter.
The best approach I can see is to create simple extensions to FileField, FieldFile, ImageField and ImageFieldFile as part of my project and hook into the save and delete methods in the FileField. Essentially, to create custom File and Image fields with this behaviour added. This isn't massively complicated to do but it doesn't seem like the most elegant solution to me. I'll need to teach South about my new fields, update every model that is affected and then create hordes of south migrations (which I'm pretty sure will clash with some code we have pending).
I'm also looking into creating a custom Storage class for the project, but I'm nervous about this having far-reaching effects on other pieces of code.
I can't believe no-one has come across this issue before, is there a canonical approach?
Thanks very much!

If you want to tackle this problem from the server-side (eg. similar solution to rsync) and you're running Linux, you might want to check out lsyncd:
http://code.google.com/p/lsyncd/
lsyncd uses inotify in the Linux kernel to watch directories and invoke an rsync as soon as files are modified. Fairly simple to drop in.
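For comparison, the custom Storage idea from the question can be sketched as a mixin. This is only a sketch under assumptions: SyncOnSaveMixin, trigger_sync and sync_command are hypothetical names, and the rsync arguments in the comment are placeholders. The only Django-specific facts relied on are that storage backends write files through _save(name, content) and remove them through delete(name).

```python
import subprocess


class SyncOnSaveMixin:
    """Mix into a Django Storage subclass to run a hook after each write.

    Hypothetical helper: only _save() and delete() come from Django's
    storage API; the rest is invented for this illustration.
    """

    # Placeholder, e.g. ["rsync", "-az", "media/", "web1:/srv/media/"]
    sync_command = None

    def trigger_sync(self, name):
        if self.sync_command:
            # Start rsync without waiting, so the upload request is not blocked.
            subprocess.Popen(self.sync_command)

    def _save(self, name, content):
        saved_name = super()._save(name, content)
        self.trigger_sync(saved_name)
        return saved_name

    def delete(self, name):
        super().delete(name)
        self.trigger_sync(name)
```

Combined with FileSystemStorage and configured as the project's default storage, this would cover every FileField and ImageField that does not set its own storage, without custom field classes or new migrations. The mixin has to come first in the base class list so that its _save runs before the real one.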
