Heroku + S3 + Django: Static Files not Cached - django

Currently I have a project deployed on Heroku with static files loaded from S3. I'm using boto/django-storages to manage my S3 content, but if I call the same view or load the same page repeatedly, all the images and static content are fetched again every time instead of being cached.
I've placed
AWS_HEADERS = {
'Cache-Control': 'max-age=2592000',
}
in my settings.py, but the problem seems to be that the same exact images (refreshed and loaded again) have different signatures in their URLs. I've tried multiple headers, but the browser doesn't seem to want to cache them and instead reloads them all every time.

Try setting AWS_QUERYSTRING_AUTH = False. Then the generated URL will always be the same (public) URL. The default ACL in S3BotoStorage is public-read, which shouldn't be changed then.
Two things not to forget:
perhaps you want to use public, max-age=XXX, so public proxies can also cache your content?
When you want the browser to cache for that long, you should keep in mind that the filenames have to change when you change the content. One solution would be to use S3BotoStorage combined with Django's CachedStaticFilesStorage (see here, but I use it without the separate cache backend)
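Putting the answer's pieces together, the relevant settings might look like the sketch below. The bucket name and max-age are placeholders, and AWS_HEADERS applies to the older boto-based S3BotoStorage backend the question is using:

```python
# settings.py -- a minimal sketch; bucket name and durations are placeholders
AWS_STORAGE_BUCKET_NAME = "my-bucket"  # hypothetical bucket name
AWS_QUERYSTRING_AUTH = False           # plain public URLs, identical on every request
AWS_DEFAULT_ACL = "public-read"        # the S3BotoStorage default; keep it for public files
AWS_HEADERS = {
    # "public" lets shared proxies cache too; 2592000 seconds = 30 days
    "Cache-Control": "public, max-age=2592000",
}
```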

Related

django & AWS S3 - network/performance cost difference on enabling pre-signed url

Say, I have a django project with static files stored on Amazon S3 with django-storage as a custom storage.
Say, I have a static file called style.css.
According to this question:
By default, AWS_QUERYSTRING_AUTH is set to True, and the generated link will look like this: https://bucket.s3.amazonaws.com/style.css?AWSAccessKeyId=xxxxxx&Signature=xxxx&Expires=1595823548
But if I set AWS_QUERYSTRING_AUTH to False, the generated link will look like this: https://bucket.s3.amazonaws.com/style.css (i.e. without the access key ID, signature, and expiry)
For public files, they said I should set this to False.
If I get this correctly, this is what people call a "signed url".
Functionality-wise, both of these two options will work exactly the same, apart from the longer link in the first option.
My question is: performance-wise, does the pre-signed URL cost extra CPU or network bandwidth?
If I have hundreds of public files that could be served without pre-signed URLs, does my server incur overhead (in terms of CPU and networking) if I insist on serving all of them with pre-signed URLs?
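For intuition on the CPU side: a pre-signed URL is produced entirely locally by signing a description of the request with the secret key; no extra network round trip to AWS is involved. The toy sketch below shows the kind of per-URL work (this is NOT the real AWS Signature Version 4 algorithm, just a stand-in illustrating that the cost is roughly one HMAC-SHA256 per URL):

```python
import hashlib
import hmac


def sign_url(path: str, secret_key: str, expires: int) -> str:
    """Toy illustration: signing a URL is local CPU work, not a network call.

    The string-to-sign and URL format here are made up; the real AWS SigV4
    process is more elaborate but is still computed entirely client-side.
    """
    string_to_sign = "{0}\n{1}".format(path, expires)
    signature = hmac.new(
        secret_key.encode(), string_to_sign.encode(), hashlib.sha256
    ).hexdigest()
    return "https://bucket.s3.amazonaws.com{0}?Signature={1}&Expires={2}".format(
        path, signature, expires
    )


url = sign_url("/style.css", "dummy-secret", 1595823548)
```

The network cost is in the response itself: the longer query string adds a few hundred bytes per URL in your HTML, and (as discussed below) signed URLs defeat proxy and browser caching, which is usually the far bigger bandwidth concern.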

Purge client browser cache after deploying app to Heroku

My Flask application is hosted by Heroku, served by Nginx, and uses Cloudflare as a CDN. There are times when I change static assets (images, CSS, JS, etc.) on the backend via a deployment to Heroku. These changes will not appear in the client's browser unless the user manually purges their cache. The cache does expire on the client's browser every month as recommended, but I want the backend to tell client browsers to purge their cache for my website every time I deploy to Heroku and they load/reload my website afterwards. Is there a way to automate this process?
If you're using the same filenames, the browser will use a cached copy, so why not version your static files using a filter? You don't have to change the filename at all. Do read about the caveats in the link provided, though.
import os
from some_app import app

@app.template_filter('autoversion')
def autoversion_filter(filename):
    # determining fullpath might be project specific
    fullpath = os.path.join('some_app/', filename[1:])
    try:
        timestamp = str(os.path.getmtime(fullpath))
    except OSError:
        return filename
    newfilename = "{0}?v={1}".format(filename, timestamp)
    return newfilename
Via https://ana-balica.github.io/2014/02/01/autoversioning-static-assets-in-flask/
“Don’t include a query string in the URL for static resources.” It says that most proxies will not cache static files with query parameters. Consequently that will increase the bandwidth, since all the resources will be downloaded on each request.
“To enable proxy caching for these resources, remove query strings from references to static resources, and instead encode the parameters into the file names themselves.” But this implies a slightly different implementation :)
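Following the quoted advice, a hypothetical variant of the filter above that encodes the version into the filename instead of the query string might look like this (the helper name is made up, and it assumes a simple name.ext structure):

```python
import os


def autoversion_filename(filename, timestamp):
    """Encode a version into the filename itself, e.g.
    /static/style.css -> /static/style.1595823548.css,
    so proxies that ignore query strings can still cache each version."""
    base, ext = os.path.splitext(filename)
    return "{0}.{1}{2}".format(base, timestamp, ext)
```

On the server side you would pair this with a rewrite rule (e.g. in NGINX) that maps style.1595823548.css back to the real style.css on disk, since no file with the versioned name actually exists.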

Static content on CloudFront is cached incorrectly over time

I have set up a CloudFront on top of multiple S3 buckets (in different regions) to provide a fast stable version of my webapp. This webapp is implemented with React which means it's all one single HTML file and one single Javascript file.
Using the routing mechanism of React, all the paths in the URL are handled within the code. This means that if I click on a link like www.example.com/users, there won't be a request sent to the server. Instead, the client code will render the appropriate page without consulting the server (I'm just talking about the HTML and not considering the data). This means that if some user types in the given URL, the server should return the index.html (the only HTML file I have), which will then take care of the URL on the client side. In other words, all the requests sent to the server should return either the HTML file or the Javascript file I mentioned earlier, even requests that point to non-existent files.
In order to implement this requirement, I asked this question and I got an answer like this:
I need to set up an error page for my distribution on CloudFront and redirect all the 403 (Forbidden) requests to the /index.html file. This is because when the request points to a nonexistent file on S3, S3 will return 403 to CloudFront due to the lack of listing permission. Or I can grant the listing permission and instead handle the 404 error (I didn't test this latter option).
Anyways, I set this up and it works perfectly - for a few hours. But then, for some unknown reason, the request for the Javascript file also starts returning the HTML file. And of course, all I'm getting back is actually coming from CloudFront's cache, which means that no matter how many times I send the request, it will keep returning the same value. That is, until I invalidate the cache on CloudFront, which solves the problem for a few more hours. And we go around and around.
I'm not sure why this happens, but my guess is that at some point the S3 bucket is inaccessible to CloudFront, which results in CloudFront caching the index.html. What can I do about this?
I think I found the problem:
MAKE SURE THE STATIC CONTENT IN ALL THE S3 BUCKETS IS IDENTICAL!!!
In my case, the Javascript filename is automatically generated by Webpack, which makes it effectively random. And since the different regions were "compiled" separately, their filenames differed.
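One way to guarantee identical content across regions is to build once and copy the same artifacts to every regional bucket, rather than building per region. A deployment sketch along those lines (bucket names are placeholders):

```shell
# Build once, then push the SAME artifacts to every regional bucket
# so the hashed filenames match everywhere.
npm run build
aws s3 sync build/ s3://myapp-us-east-1/ --delete
aws s3 sync build/ s3://myapp-eu-west-1/ --delete
```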

make some django/wagtail files private

Is there a way to make some files inaccessible via direct URL? For example, an image appears on a page, but the image location doesn't work on its own.
How Things Normally Work
By default, all static and media files are served up from the static root and media root folders. A good practice is to have NGINX or Apache route to these, or to use something like Django Whitenoise to serve up static files.
In production, you definitely don't want the runserver serving up these files, because 1) that doesn't scale well, and 2) it means you're running in DEBUG mode in production, which is an absolute no-no.
Protecting Those Files
You can keep this configuration for most files that you don't need to protect. For the rest, you can serve the files yourself from within Django, from a different filepath. Use the upload_to parameter of the FileField to specify where you want those files to live. For example:
protectedfile = models.FileField(upload_to="protected/")
publicfile = models.FileField(upload_to="public/")
Then, in NGINX, point your location block at /var/www/myproject/MEDIAROOT/public instead. That will let you continue serving the public files directly.
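A sketch of such an NGINX block, with placeholder paths:

```nginx
# Only the public/ subfolder is exposed directly; protected/ has no
# location block, so those files can only be reached through the Django view.
location /media/ {
    alias /var/www/myproject/media/public/;
}
```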
Meanwhile, for the protected files, those can be served up by the view with:
from django.http import FileResponse, Http404

def view_to_serve_up_docs(request, document_id):
    my_doc = MyDocumentModel.objects.get(id=document_id)
    # Do some check against this user
    if request.user.is_authenticated:
        response = FileResponse(my_doc.protectedfile)
        response["Content-Disposition"] = "attachment; filename=" + my_doc.protectedfile.name
        return response
    raise Http404()
And link to this view within your templates
<a href='/downloadFileView/12345'>Download File #12345 Here!</a>
Reference
More about the FileResponse object: https://docs.djangoproject.com/en/1.11/ref/request-response/#fileresponse-objects

Amazon Cloudfront not caching certain small number of static objects

Has anyone come across this issue where Amazon Cloudfront seems to refuse to cache a certain small number of static objects?
I've tried invalidating the cache (root path) several times, to no avail.
I had a look at the file permissions of the objects in question, and they seemed all ok.
I've also gone into the Amazon Console and there are no errors logged.
You can see more details of this here :
http://www.webpagetest.org/performance_optimization.php?test=171106_A4_be80c122489ae6fabf5e2caadcac8123&run=1#use_of_cdn
My website is using Processwire 3 running Apache and a PW caching product called Procache.
One of your issues is that you are not taking advantage of cache-control headers on your objects. This is why you are seeing the message "No max-age or expires". Look at this link to learn more about Cache-Control and Expires. Note: you should be using these headers even if you do not use CloudFront, as the browser also caches certain objects.
Using Headers to Control Cache Duration for Individual Objects
You do not indicate which web server you are using. I have included a link for setting up Apache mod_expires to add cache-control headers to your objects.
Apache Module mod_expires
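A minimal mod_expires sketch along those lines (the content types and durations here are illustrative, not a recommendation):

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/css "access plus 30 days"
    ExpiresByType application/javascript "access plus 30 days"
    ExpiresByType image/png "access plus 30 days"
    ExpiresByType image/jpeg "access plus 30 days"
</IfModule>
```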
For static assets such as CSS, JS, images, etc., I would set up S3 and serve those objects from S3 via CloudFront. You can control the headers for S3 objects.
The above steps will improve caching of your objects in CloudFront and in the users' browser cache.