Purge client browser cache after deploying app to Heroku - flask

My Flask application is hosted on Heroku, served by Nginx, and uses Cloudflare as a CDN. There are times when I change static assets (images, CSS, JS, etc.) on the backend, and those changes go out through a deployment to Heroku. The changes will not show up in clients' browsers unless they manually purge their cache. The cache does expire in the client's browser every month, as recommended, but I want the backend to tell client browsers to purge their cache for my website every time I deploy to Heroku, the next time they load or reload the site. Is there a way to automate this process?

If you're using the same filenames, the browser will use a cached copy, so why not version your static files using a template filter? You don't have to change the filenames at all, although do read about the caveat quoted below.
import os
from some_app import app

@app.template_filter('autoversion')
def autoversion_filter(filename):
    # determining the full path might be project specific
    fullpath = os.path.join('some_app/', filename[1:])
    try:
        timestamp = str(os.path.getmtime(fullpath))
    except OSError:
        return filename
    newfilename = "{0}?v={1}".format(filename, timestamp)
    return newfilename
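In a Jinja template, the filter is then applied to the static path, along these lines:

<link rel="stylesheet" href="{{ '/static/css/style.css' | autoversion }}">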
Via https://ana-balica.github.io/2014/02/01/autoversioning-static-assets-in-flask/
“Don’t include a query string in the URL for static resources.” It says that most proxies will not cache static files with query parameters. Consequently, that will increase the bandwidth, since all the resources will be downloaded on each request.
“To enable proxy caching for these resources, remove query strings from references to static resources, and instead encode the parameters into the file names themselves.” But this implies a slightly different implementation :)
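For example, a filter along these lines (illustrative, not from the article) encodes the version into the file name itself; the web server then needs a rewrite rule to map the versioned name back to the file on disk, e.g. in Nginx something like rewrite ^(.+)\.\d+\.(css|js)$ $1.$2 last;

import os

from some_app import app

@app.template_filter('versioned')
def versioned_filter(filename):
    # e.g. /static/css/style.css -> /static/css/style.1510158372.css
    fullpath = os.path.join('some_app/', filename[1:])
    try:
        timestamp = str(int(os.path.getmtime(fullpath)))
    except OSError:
        return filename
    root, ext = os.path.splitext(filename)
    return "{0}.{1}{2}".format(root, timestamp, ext)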

Related

make some django/wagtail files private

Is there a way to make some files inaccessible via a direct URL? For example, an image appears on a page, but the image location doesn't work on its own.
How Things Normally Work
By default, all static and media files are served up from the static root and media root folders. A good practice is to have NGINX or Apache route to these, or to use something like WhiteNoise to serve up static files.
In production, you definitely don't want the runserver serving up these files, because 1) it doesn't scale well, and 2) it means you're running in DEBUG mode in production, which is an absolute no-no.
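If you go the WhiteNoise route, the setup is a middleware entry in settings.py (a minimal sketch per the WhiteNoise docs; the rest of the middleware list is whatever your project already has):

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    # WhiteNoise should sit directly after SecurityMiddleware
    'whitenoise.middleware.WhiteNoiseMiddleware',
    # ... the rest of your middleware ...
]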
Protecting Those Files
You can keep this configuration for most files, the ones you don't need to protect. For the protected ones, you can serve the files yourself within Django from a different file path. Use the upload_to parameter of the FileField to specify where you want those files to live. For example:
from django.db import models

class MyDocumentModel(models.Model):
    protectedfile = models.FileField(upload_to="protected/")
    publicfile = models.FileField(upload_to="public/")
Then, in NGINX, point your location block at /var/www/myproject/MEDIAROOT/public instead. That will let you keep serving the public files as before.
Meanwhile, for the protected files, those can be served up by the view with:
from django.http import FileResponse, Http404

def view_to_serve_up_docs(request, document_id):
    my_doc = MyDocumentModel.objects.get(id=document_id)
    # Do some check against this user
    if request.user.is_authenticated:
        response = FileResponse(my_doc.protectedfile)
        response["Content-Disposition"] = "attachment; filename=" + my_doc.protectedfile.name
    else:
        raise Http404()
    return response
And link to this view within your templates
<a href='/downloadFileView/12345'>Download File #12345 Here!</a>
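A matching URL pattern might look like this (Django 1.11-era syntax, matching the docs linked below; the route name is illustrative):

from django.conf.urls import url

from . import views

urlpatterns = [
    url(r'^downloadFileView/(?P<document_id>\d+)$', views.view_to_serve_up_docs,
        name='download_file'),
]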
Reference
More about the FileResponse object: https://docs.djangoproject.com/en/1.11/ref/request-response/#fileresponse-objects

Amazon Cloudfront not caching certain small number of static objects

Has anyone come across this issue where Amazon Cloudfront seems to refuse to cache a certain small number of static objects?
I've tried invalidating the cache (root path) several times to no avail.
I had a look at the file permissions of the objects in question, and they seemed all ok.
I've also gone into the Amazon Console and there are no errors logged.
You can see more details here:
http://www.webpagetest.org/performance_optimization.php?test=171106_A4_be80c122489ae6fabf5e2caadcac8123&run=1#use_of_cdn
My website uses ProcessWire 3 running on Apache, with a PW caching product called ProCache.
One of your issues is that you are not taking advantage of cache-control headers on your objects. This is why you are seeing the message "No max-age or expires". Look at the link below to learn more about Cache-Control and Expires. Note: you should be using these headers even if you do not use CloudFront, as the browser also caches certain objects.
Using Headers to Control Cache Duration for Individual Objects
You do not indicate which web server you are using. I have included a link for setting up Apache mod_expires to add cache-control headers to your objects.
Apache Module mod_expires
For static assets such as CSS, JS, images, etc., I would set up S3 and serve those objects from S3 via CloudFront. You can control the headers on S3 objects.
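For example, with boto3 (the current AWS SDK for Python; the bucket and file names below are hypothetical), you can set Cache-Control on an object at upload time:

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "static/css/site.css",    # local file
    "my-assets-bucket",       # S3 bucket
    "css/site.css",           # object key
    ExtraArgs={
        "CacheControl": "public, max-age=31536000",  # one year
        "ContentType": "text/css",
    },
)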
The above steps will improve caching of your objects in CloudFront and in the users' browser cache.

How to Serve Django media user uploaded files using Cherokee with restriction to logged users

How do I configure Django and Cherokee to serve media (user-uploaded) files from Cherokee, but only to logged-in users, as with @login_required, in production?
Create a Django view which serves the file
Use @login_required on this view to restrict access
Read the file from disk using standard Python io operations
Use StreamingHttpResponse so there is no latency or memory overhead writing the response
Set the response mimetype correctly (a sketch of this approach follows)
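A minimal sketch of that approach (the view name and URL wiring are illustrative; the Cherokee-specific answer follows):

import mimetypes
import os
from wsgiref.util import FileWrapper

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import Http404, StreamingHttpResponse

@login_required
def serve_protected_media(request, path):
    # NB: a real view should also reject '..' and absolute paths
    fullpath = os.path.join(settings.MEDIA_ROOT, path)
    if not os.path.isfile(fullpath):
        raise Http404()
    content_type, _ = mimetypes.guess_type(fullpath)
    # FileWrapper yields the file in blocks rather than reading it
    # into memory all at once
    return StreamingHttpResponse(
        FileWrapper(open(fullpath, 'rb')),
        content_type=content_type or 'application/octet-stream',
    )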
I will answer my own question
As you are using Cherokee:
1. Remove direct access to the media folder via its media URL (localhost/media/... for example), e.g. by removing the virtual host serving it.
2. Activate (check) Allow X-Sendfile under the Handler tab in Common CGI Options on the virtual-server page that handles Django requests.
Let's say you have user pictures under media/pictures to protect, visible to all logged-in users only (this can be adapted as you want; it's just an example). Every user picture is stored as media/pictures/pk.jpg (1.jpg, 2.jpg, ...).
3. Create a view:
import os

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse, HttpResponseForbidden

@login_required(redirect_field_name=None)
def media_pictures(request, pk):
    path = os.path.join(settings.MEDIA_ROOT, 'pictures', pk + '.jpg')
    if os.path.isfile(path):
        response = HttpResponse()
        # response['Content-Type'] = ""  # add it if it's not working without
        response['X-Accel-Redirect'] = path
        # response['X-Sendfile'] = path  # same as the previous line;
        # X-Accel-Redirect is for Nginx and X-Sendfile is for Apache.
        # Cherokee is compatible with both, so use either one.
        return response
    return HttpResponseForbidden()
Cherokee now takes care of serving the file; this is why we checked Allow X-Sendfile, as it will not work without it.
The path variable here is the full path to the file. It can be anywhere, as long as it is readable by the Cherokee user or group.
4. URL conf
As we disabled direct access to the media folder, we need to provide a URL through which Django serves the files using the previous view. For example, to make the image of the user with id 17 accessible at localhost/media/pictures/17.jpg:
url(r"^media/pictures/(?P<pk>\d+)\.jpg$", views.media_pictures,name="media_pictures"),
This will also work for Apache, Nginx, etc.; just configure your server to use X-Sendfile (or X-Accel-Redirect for Nginx), which can easily be found in the docs.
With this, every logged-in user can view all users' pictures, so feel free to add additional verification before serving the file (per-user checks, etc.).
Hope it will help someone.

Heroku + S3 + Django: Static Files not Cached

I currently have a project deployed on Heroku with static files loaded from S3. I'm using boto/django-storages to manage my S3 content, but if I call the same view or load the same page repeatedly, all the images/static content load twice and are not cached.
I've placed
AWS_HEADERS = {
    'Cache-Control': 'max-age=2592000',
}
in my settings.py, but the reason seems to be that the same exact images (refreshed and loaded twice) have different signatures in their URLs. I've tried multiple headers, but the browser doesn't seem to want to cache them and instead loads them all every time.
Try setting AWS_QUERYSTRING_AUTH = False. Then the generated URL will always be the same (public) URL. The default ACL in S3BotoStorage is public-read, which shouldn't be changed in that case.
Two things not to forget:
perhaps you want to add public, max-age=XXX, so public proxies can also cache your content?
When you want the browser to cache for that long, keep in mind that the filenames have to change whenever you change the content. One solution is to combine S3BotoStorage with Django's CachedStaticFilesStorage (see here; I use it without the separate cache backend).
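A minimal sketch of such a combined backend, using the boto-era django-storages API from the question (module and class names are illustrative):

# storage.py
# CachedFilesMixin is the Django 1.4-era mixin behind
# CachedStaticFilesStorage; it appends a content hash to file names,
# so a long max-age is safe: changed content gets a new URL.
from django.contrib.staticfiles.storage import CachedFilesMixin
from storages.backends.s3boto import S3BotoStorage

class HashedS3Storage(CachedFilesMixin, S3BotoStorage):
    pass

Then point Django at it in settings.py with STATICFILES_STORAGE = 'myproject.storage.HashedS3Storage' (alongside AWS_QUERYSTRING_AUTH = False).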

Microsoft Azure appending extra query string to urls with query strings

In deploying a version of the Django website I'm working on to Microsoft's Azure service, I added a page which takes a query string like
http://<my_site_name>.azurewebsites.net/security/user/?username=<some_username>&password=<some_password>
However, I was getting 404 responses to this URL. So I turned on Django's DEBUG flag, and the returned page said:
Page not found (404)
Request Method: GET
Request URL: http://<my_site_name>.azurewebsites.net/security/user/?username=<some_username>&password=<some_password>?username=<some_username>&password=<some_password>
Using the `URLconf` defined in `<my_project_name>.urls`, Django tried these URL patterns, in this order:
^$
^security/ ^user/$
^account/
^admin/
^api/
The current URL, `security/user/?username=<some_username>&password=<some_password>`, didn't match any of these.
So it seems to be appending the query string onto the end of the URL, which already has the same query string. I have the site running on my local machine and on an IIS server on my internal network, which I use for staging before pushing to Azure. Neither of these deployments does this, so it seems to be something specific to Azure.
Is there something I need to set in the Azure website management interface to prevent it from modifying URLs with query strings? Is there something I'm doing wrong with regards to using query strings with Azure?
In speaking to the providers of wfastcgi.py, I was told it may be an issue with wfastcgi.py itself that is causing this problem. While they look into it, they gave me a workaround that fixes the issue.
Download the latest copy of wfastcgi.py from http://pytools.codeplex.com/releases
In that file find this part of the code:
if 'HTTP_X_ORIGINAL_URL' in record.params:
    # We've been re-written for shared FastCGI hosting, send the original URL as the PATH_INFO.
    record.params['PATH_INFO'] = record.params['HTTP_X_ORIGINAL_URL']
And add right below it (still part of the if block):
    # PATH_INFO is not supposed to include the query parameters, so remove them
    record.params['PATH_INFO'] = record.params['PATH_INFO'].split('?')[0]
Then, upload/deploy this modified file to the Azure site (either use FTP to put it somewhere, or add it to your site deployment). I'm deploying it so that if I need to modify it further, it's versioned and backed up.
In the Azure management page for the site, go to the site's configure page, change the handler mapping to point to the modified wfastcgi.py file, and save the configuration.
For example, my handler used to be the default D:\python27\scripts\wfastcgi.py. Since I deployed my modified file, the handler path is now D:\home\site\wwwroot\wfastcgi.py.
I also restarted the site, but you may not have to.
This modified script should now strip the query string from PATH_INFO, and URLs with query strings should work. I'll be using this until I hear from the wfastcgi.py devs that the default wfastcgi.py file in the Python27 install has been fixed/replaced.