Django-skel slow due to httplib requests to S3 - django

G'day,
I am playing around with django-skel on a recent project and have used most of its defaults: Heroku for hosting and S3 for file storage. I'm mostly serving a static-ish site, apart from using sorl for thumbnail generation, but the response times are pathetic.
You can visit the site: http://bit.ly/XlzkXp
My template looks like: https://gist.github.com/cd15e320be6f4454a7fb
I'm serving the template using a shortcut from the URL conf, no database usage at all: https://gist.github.com/f9d1a9a191959dcff1b5
However, it's consistently taking 15+ seconds for the response. New Relic shows this is because of requests going to S3 while processing the view. This does not make any sense to me.
New Relic data: http://i.imgur.com/vs9ZTLP.png?1
Why is something using httplib to request things from S3? I can see how collectstatic might be doing it, but not the processing of the view itself.
What am I not understanding about Django-skel and this setup?

I have the same issue. My guess is that:
django-compress and django-storage are both in use,
which results in the former saving the cache it needs to render templates to the S3 bucket
and then reading it back (over the network, hence httplib) while rendering each template.
My second guess was that following the django-compress instructions for remote storage and implementing an "S3 storage backend which caches files locally, too" would resolve this issue.
Although it makes sense to me that saving the cache to both locations (local and S3) and reading from the local filesystem first should speed things up, it somehow does not work that way: the response time is still around 8+ seconds.
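For reference, the backend those instructions describe is roughly the following (a sketch based on the django-compress remote-storage docs; the module paths depend on your django-storages version, and the settings lines assume the class lives in a module of your own):

    from django.core.files.storage import get_storage_class
    from storages.backends.s3boto import S3BotoStorage

    class CachedS3BotoStorage(S3BotoStorage):
        """S3 storage backend that also keeps a local copy, so the compressor
        can read its output from disk instead of over httplib."""
        def __init__(self, *args, **kwargs):
            super(CachedS3BotoStorage, self).__init__(*args, **kwargs)
            self.local_storage = get_storage_class(
                "compressor.storage.CompressorFileStorage")()

        def save(self, name, content):
            name = super(CachedS3BotoStorage, self).save(name, content)
            self.local_storage._save(name, content)
            return name

    # settings.py (module path is illustrative)
    # COMPRESS_STORAGE = "myapp.storage.CachedS3BotoStorage"
    # STATICFILES_STORAGE = COMPRESS_STORAGE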
By disabling django-compress with COMPRESS_ENABLED = False I managed to get a 1-1.3 second average response time.
Any ideas?
(I will update this answer in case of any progress)

Related

Continuous Delivery issues with S3 and AWS CloudFront

I'm building out a series of content websites, and I've built a working CodePipeline that allows me to push edits to HTML files on github that instantly reflect in the S3 bucket, and consequently on the live website.
I created a CloudFront distribution to get HTTPS for my website. The certificate and distribution work fine, and it serves the index.html from my S3 bucket, but changes pushed through my GitHub pipeline show up in the S3 bucket and not in the CloudFront distribution.
From what I've read, the edge locations used by CloudFront don't update their caches very often, and when they do, they might not pick up the edited index.html file because it has the same name as the old version.
I don't want to manually rename my index.html file in S3 every time one of my writers needs to post a top 10 Tractor Brands article or implement an experimental, low-effort clickbait idea, so that's pretty much off the table.
My overall objective is to build something where teams can quickly add an article with a few images to the website that goes live in minutes, and I've been able to do it so far but not with HTTPS.
If any of you know a good way of instantly updating CloudFront distributions without changing file names, that would be great. Otherwise I'll probably have to start over, because I need my sites secured and the ability to update them instantly.
You people are awesome. Thanks a million for any help.
You need to invalidate files from the edge caches. It's a simple and quick process.
You can automate the process yourself in your pipeline, or you could potentially use a third-party tool such as aws-cloudfront-auto-invalidator.
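A minimal sketch of automating that from the pipeline with boto3 (assumes the pipeline role has the cloudfront:CreateInvalidation permission; the distribution ID and paths are placeholders). The AWS CLI equivalent is aws cloudfront create-invalidation.

    import time
    import boto3

    def invalidate_distribution(distribution_id, paths=("/*",)):
        client = boto3.client("cloudfront")
        return client.create_invalidation(
            DistributionId=distribution_id,
            InvalidationBatch={
                "Paths": {"Quantity": len(paths), "Items": list(paths)},
                # CallerReference must be unique per invalidation request.
                "CallerReference": str(time.time()),
            },
        )

    # invalidate_distribution("E1234567890ABC", ["/index.html", "/articles/*"])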

Heroku doesn't update github file system when an image is uploaded from website

I ran into the problem that Heroku doesn't update my GitHub repository (or rather, the static filesystem) when a blog post (including pictures) is created from the website.
Other images survive, whilst the ones saved to my filesystem while the server is running on Heroku disappear.
I found this on their documentation.
The Heroku filesystem is ephemeral - that means that any changes to the filesystem whilst the dyno is running only last until that dyno is shut down or restarted.
I'm still confused why not all the pictures disappear and only those added later do.
Is AWS S3 a solution for this? If it is, how can I represent my filesystem using buckets?
Say, for Blog Post 1 I have 2 picture resolutions, which means storing the files in different folders corresponding to those resolutions.
---1920x1920
-----picture.jpg
---800x800
-----picture.jpg
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Is AWS S3 a solution for this?
S3 is the recommended solution for this, and the configuration is documented in the Heroku Dev Center, with specific instructions for uploading from Python.
Note these Python instructions use the Direct Upload approach: have the Flask app generate a pre-signed URL, which is then passed back to the client-side JavaScript code, so that the user's browser can upload to S3 directly. The resulting S3 URL of the image is then put into a hidden element in the form, which your app receives on form submit.
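The server-side half of that Direct Upload flow looks roughly like this with boto3 (a sketch only; the bucket name and key are placeholders, and your bucket's CORS policy has to allow browser uploads):

    import boto3

    def make_presigned_post(bucket, key, expires=3600):
        s3 = boto3.client("s3")
        # Returns a URL plus form fields the browser posts along with the file.
        return s3.generate_presigned_post(Bucket=bucket, Key=key, ExpiresIn=expires)

    # post = make_presigned_post("my-blog-media", "uploads/picture.jpg")
    # Pass post["url"] and post["fields"] to the client-side JavaScript.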
The fact that you have separate image sizes suggests your app does some processing (maybe with PIL) to generate these thumbnails. In that case it may be easier to use the Pass-Through approach, where your app implements its own upload mechanism, does the processing, and then uploads the thumbnails to S3 (the upload-to-S3 part is well documented, for example in this SO thread).
The Pass-Through method carries the warning that it may block a single-threaded worker. If your site gets enough traffic for that to be an issue, you may need to increase the number of gunicorn workers, or change to a worker type that supports concurrency (this GitHub post has some useful commands/info on concurrent worker types).
The best way to implement the whole thing (although the need for a Redis add-on and a worker dyno may push you into the paid tier) may be with Background Tasks using rq: use the Direct Upload approach above to upload the original image, then have a background job download it, do the resizing, and put the resulting thumbnails back onto S3, as sketched below.
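A sketch of what that background job could look like (assumes boto3, Pillow and an rq worker wired to Redis; the bucket, key and sizes are placeholders):

    import io
    import boto3
    from PIL import Image

    SIZES = [(1920, 1920), (800, 800)]

    def resize_uploaded_image(bucket, key):
        """Download the original from S3 and write resized copies back under
        size-prefixed keys. Intended to run inside an rq worker."""
        s3 = boto3.client("s3")
        original = io.BytesIO()
        s3.download_fileobj(bucket, key, original)
        for width, height in SIZES:
            original.seek(0)
            img = Image.open(original).convert("RGB")
            img.thumbnail((width, height))
            resized = io.BytesIO()
            img.save(resized, format="JPEG")
            resized.seek(0)
            s3.upload_fileobj(resized, bucket, "%dx%d/%s" % (width, height, key))

    # Enqueued from the upload view, e.g.:
    #   from redis import Redis
    #   from rq import Queue
    #   Queue(connection=Redis()).enqueue(resize_uploaded_image, "my-bucket", "picture.jpg")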
Does that mean I have to create 2 buckets named 1920x1920 and 800x800 or is there a better way of handling them?
Have one Bucket for the entire app, and just include forward slashes in the object's key to mimic a subdirectory structure.
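For illustration only (the bucket name and key prefixes are hypothetical), the two resolutions from the question would live in one bucket like this:

    import io
    import boto3

    s3 = boto3.client("s3")
    data = io.BytesIO(b"...")  # stands in for the real image bytes

    # One bucket; the "folders" are just key prefixes.
    s3.upload_fileobj(data, "my-app-media", "blog-post-1/1920x1920/picture.jpg")
    data.seek(0)
    s3.upload_fileobj(data, "my-app-media", "blog-post-1/800x800/picture.jpg")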

How can I get Django to serve media files in production?

How can I get Django to serve media files in production, with DEBUG = False, on a Heroku server?
I know that it's better not to do this and that it costs some performance, but my application is used only by me and my cat, so I don't think it will be a problem in my case.
The reason this won't work is that the Heroku filesystem is ephemeral, meaning any files uploaded after your app's code is pushed will be lost whenever your app restarts. That leaves your app with image links in the DB that point to non-existent files.
You can read more about it here:
https://help.heroku.com/K1PPS2WM/why-are-my-file-uploads-missing-deleted
Your best bet is to upload your files to a bucket like Amazon S3. It costs almost nothing for light use and is very reliable.
https://blog.theodo.com/2019/07/aws-s3-upload-django/
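For reference, a minimal django-storages configuration for media uploads looks roughly like this (a sketch; the bucket name and region are placeholders, and it assumes django-storages and boto3 are installed):

    # settings.py
    import os

    INSTALLED_APPS = [
        # ...
        "storages",
    ]

    DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
    AWS_STORAGE_BUCKET_NAME = "my-app-media"   # placeholder
    AWS_S3_REGION_NAME = "eu-west-1"           # placeholder
    AWS_ACCESS_KEY_ID = os.environ["AWS_ACCESS_KEY_ID"]
    AWS_SECRET_ACCESS_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]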

Why is serving static files in production with Django discouraged?

I have developed a web application that (obviously) uses some static files. To deploy it, I've chosen to serve the files through the WSGI interpreter, using gunicorn behind a firewall and a reverse proxy.
My application uses whitenoise to serve static files. Everything works fine and I don't have any performance issues... but I really can't understand WHY the practice of serving those static files directly from the WSGI interpreter is discouraged (LINK), which says:
This is not suitable for production use! For some common deployment strategies...
I mean, my service is a collection of microservices: DB, frontend, services, etc. If I need to scale them, I can do so without any problem, and with this philosophy I'm also not worried about the footprint of my microservices. To me this seems logical, but maybe for the rest of the world it is a completely out-of-mind strategy.
You've misinterpreted that documentation. It's fine to use Whitenoise to serve static files; that is entirely what it's for. What is not a good idea is to use that internal Django function to do so, since it is inefficient.
Three reasons why I personally serve static files from a CDN:
1- You are using up bandwidth on your app server and losing time serving these static files yourself, instead of throwing that load onto a CDN to handle (though WhiteNoise should largely eliminate that).
2- Some hosting services like AWS will charge you for extra traffic in/out, while you can use cheaper services like CloudFront and an S3 bucket.
3- I like to keep my app servers for app purposes only, and use each service for its own job; this helps me with debugging and reduces my failure points.
On the other hand though, serving static from app server with something like WhiteNoise is much much easier than configuring your CDN.
Hope this helps!
It's quite ok when you use Whitenoise because:
Whitenoise is exactly made for this purpose and therefore efficient
It'll set the HTTP response headers correctly so clients cache the files.
But think of it this way: instead of serving 1 or 2 requests per web page, you'll often get 10x more requests (a web page will usually request a bunch of images, one or more CSS files, a couple of JS files...). That means you have to scale your application server to handle 10x more traffic on average than if you left the job to a CDN.
By the way, I've written a tutorial on this topic which may help.
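For reference, a minimal WhiteNoise setup along the lines described above (standard settings from the WhiteNoise docs; BASE_DIR is the usual variable from Django's default settings):

    # settings.py
    MIDDLEWARE = [
        "django.middleware.security.SecurityMiddleware",
        "whitenoise.middleware.WhiteNoiseMiddleware",  # directly after SecurityMiddleware
        # ... the rest of your middleware ...
    ]

    STATIC_ROOT = BASE_DIR / "staticfiles"
    STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"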

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I am using an S3 storage backend across a Django site I am developing, both to reduce load on the EC2 server(s) and to allow multiple webservers (redundancy, load balancing) to access the same set of uploaded media.
Sorl.thumbnail (v11) template tags are being used in our templates to allow flexible image resizing/cropping.
Performance on media-rich pages is not very good, and when a page whose thumbnails need to be generated for the first time is accessed, the requests even time out.
I understand that this is due to sorl thumbnail checking/downloading the original image from S3 (which could be quite large and high resolution), and rendering/checking/uploading the thumbnail.
What would you suggest is the best solution to this setup?
I have seen suggestions of storing a local copy of the files in addition to the S3 copy (not so great when a couple of servers are being used for load balancing). I've also seen it suggested to store 0-byte files to fool sorl.thumbnail.
Are there any other suggestions or better ways of approaching this?
sorl-thumbnail is now built with slow remote storages in mind. The first creation of a thumbnail still queries the storage (for example, the first time it is accessed from a template), but after that the references are cached in a key-value store. You still need that first query and creation; one solution is to call the low-level API sorl.thumbnail.get_thumbnail with the same options at the moment the image is uploaded, and to push that thumbnail-creation job onto a queue such as celery.
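A sketch of that pre-generation step (assumes Celery is configured, and that the geometry and crop options match exactly what your templates request; the task and field names are illustrative):

    from celery import shared_task
    from sorl.thumbnail import get_thumbnail

    @shared_task
    def pregenerate_thumbnail(image_name):
        # Calling the low-level API with the same geometry/options as the
        # template tag warms sorl's key-value store, so the first page view
        # no longer has to fetch the original from S3.
        get_thumbnail(image_name, "800x800", crop="center", quality=90)

    # e.g. from the model's save() or a post_save signal:
    #   pregenerate_thumbnail.delay(instance.photo.name)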
You can use Sorlery. It combines sorl and celery to create thumbnails via workers. It's very careful not to do any filesystem access outside of the worker thread.
The thumbnail returned immediately (before the worker has had a chance to run) can be controlled by setting THUMBNAIL_DUMMY_SOURCE to an appropriate placeholder.
The job is created the first time the thumbnail is requested, subsequent requests are served the dummy image until the worker thread completes.
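That placeholder is just sorl's dummy-source setting, e.g. (the URL is illustrative; %(width)s and %(height)s are filled in by sorl):

    THUMBNAIL_DUMMY_SOURCE = "https://dummyimage.com/%(width)sx%(height)s"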
Almost the same as #Aidan's solution, I have made some tweaks to sorl-thumbnail; I also pre-generate thumbnails with celery. My code is here: sorl_thumbnail-async
But I came to learn that easy_thumbnails does exactly what I was trying to do, so I am using it in my current project. You might find it useful; a short post on the topic is here.
The easiest solution I've found so far is actually this third party service: http://cloudinary.com/