I am running a Django application on Heroku and currently using AWS S3 to serve my static files. We store static files both in per-app static folders and in a static/ folder at the root of the project. The root static/ folder is about 40 MB.
Whenever we deploy our app to Heroku, the static files are included in the Heroku slug, so that
heroku run python manage.py collectstatic --no-input can be run from the dyno itself, which then copies any changed or new static files to our S3 bucket so they can be served.
The issue is that after we go through this process, we have a static/ folder on the dyno that takes up about 40 MB of space and is seemingly useless, since our files are being served from our S3 bucket!
Is there a better way to deploy our application and collect our static files to our S3 bucket without copying the static files to Heroku?
One way I was thinking was to add all static files to Heroku's .slugignore file, and then configure a way to upload static files to our S3 bucket without using Heroku at all. I'm not sure if this is the correct way to go about it, however, and would appreciate advice on this.
The reason we have been looking into this is that our Heroku slug has grown far too large (~450 MB), and we need to start reducing it.
After some more digging, I found examples of people doing exactly what I was describing above, which is uploading static files directly to S3 without using any intermediary storage. This article shows how to configure Django and S3 so that running python manage.py collectstatic on your local machine will copy the static files directly to S3.
This configuration, in combination with disabling collectstatic on Heroku (https://devcenter.heroku.com/articles/django-assets#disabling-collectstatic) and adding our static files to .slugignore, would be exactly what I was looking for, which was to upload static files directly to S3 without uploading them first to Heroku.
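As a concrete sketch, the direct-to-S3 setup described above comes down to pointing Django's static files storage at S3 via django-storages. The bucket name below is a placeholder, not from the original setup:

```python
# settings.py -- sketch of a direct-to-S3 static setup, assuming django-storages;
# the bucket name is a placeholder
AWS_STORAGE_BUCKET_NAME = "my-app-static"
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"

# send collectstatic output straight to the bucket
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/"
```

With this in place, running python manage.py collectstatic locally (or from CI) pushes the files straight to the bucket, while heroku config:set DISABLE_COLLECTSTATIC=1 plus a static/ entry in .slugignore keeps them out of the slug.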
More reading from Django's docs
Related
I am using a combination of django-storages and ManifestStaticFilesStorage to serve static files and media from S3.
from django.conf import settings
from django.contrib.staticfiles.storage import ManifestFilesMixin
from storages.backends.s3boto import S3BotoStorage

class StaticStorage(ManifestFilesMixin, S3BotoStorage):
    location = settings.STATICFILES_LOCATION
When I run collectstatic I can see the newest version of my JS file on S3 with the correct timestamp.
I can also see that file being referenced in the staticfiles.json manifest.
However, looking at the site in the browser, I am still seeing the old JS being pulled down, not the one referenced in the manifest.
What could be going wrong?
The staticfiles.json manifest is loaded once when the server starts up (from S3). If you run collectstatic while the server is running, it has no way of knowing that changes were made on S3. You need to restart the server after running collectstatic if changes have been made.
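In practice the deploy step then looks something like this (assuming the Heroku CLI; app flags omitted):

```shell
# push new hashed files and an updated manifest to S3
python manage.py collectstatic --noinput
# restart so running processes re-read staticfiles.json
heroku restart
```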
You can read this post for more information. In short:
By default staticfiles.json will reside in STATIC_ROOT, which is the directory where all static files are collected. We host all our static assets on an S3 bucket, which means staticfiles.json by default would end up being synced to S3.
So if your staticfiles.json is cached, your site will keep referencing the old static files.
There are two ways to fix this:
Version staticfiles.json, as you have already done with your static files
Keep staticfiles.json local instead of on S3
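The second option can be sketched by overriding the manifest read/save methods so that the hashed files still go to S3 while staticfiles.json stays on local disk. The class name and manifest path below are assumptions, not from the original answer:

```python
# storage.py -- sketch; class name and manifest path are assumptions
import json
import os

from django.conf import settings
from django.contrib.staticfiles.storage import ManifestFilesMixin
from storages.backends.s3boto import S3BotoStorage


class LocalManifestS3Storage(ManifestFilesMixin, S3BotoStorage):
    """Upload hashed static files to S3, but read/write staticfiles.json locally."""

    manifest_path = os.path.join(settings.BASE_DIR, "staticfiles.json")

    def read_manifest(self):
        try:
            with open(self.manifest_path) as manifest:
                return manifest.read()
        except FileNotFoundError:
            return None

    def save_manifest(self):
        # mirror ManifestFilesMixin.save_manifest, but write to local disk
        payload = {"paths": self.hashed_files, "version": self.manifest_version}
        with open(self.manifest_path, "w") as manifest:
            json.dump(payload, manifest)
```

Because each server process reads the manifest from its own disk at startup, a fresh deploy always picks up the new hashes without an extra cache layer in between.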
So I finally managed to set up the local + prod test project I'm working on.
# wsgi.py
from django.core.wsgi import get_wsgi_application
from dj_static import Cling, MediaCling
from whitenoise.django import DjangoWhiteNoise

application = Cling(MediaCling(get_wsgi_application()))
application = DjangoWhiteNoise(application)
I set up static files using whitenoise (without any problems) and media (file uploads) using dj_static and Postgres for local + prod. Everything works fine at first... static files, file uploads.
But after the Heroku dynos restart I lose all the file uploads. My question is: since I'm serving the media files from the Django app instead of something like S3, does the dyno restart wipe all that out too?
PS: I'm aware I can do this with AWS, etc.; I just want to know if that's the reason I'm losing all the uploads.
Since I'm serving the media files from the Django app instead of something like S3, does the dyno restart wipe all that out too?
Yes, that's right. According to the Heroku docs:
Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code.
See also this answer and this answer.
Conclusion: For media files (the uploaded ones), you must use some external service (like S3 or something). whitenoise is just for static files. See here why whitenoise is not suitable for serving user-uploaded (media) files.
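For example, with django-storages the upload destination can be switched to S3 by pointing DEFAULT_FILE_STORAGE at the S3 backend. The bucket name below is a placeholder:

```python
# settings.py -- sketch, assuming django-storages with boto3; values are placeholders
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

AWS_STORAGE_BUCKET_NAME = "my-app-media"
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"
MEDIA_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/media/"
```

Once this is in place, FileField/ImageField uploads land in the bucket instead of the dyno's ephemeral filesystem, so they survive restarts.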
How can I collect media files from different django project applications into one folder to be easier to deploy in production with Nginx/Apache?
There isn't a collectstatic equivalent for media files.
Media files are meant to be uploaded by the user as the site runs, so it doesn't make sense to collect them before deployment.
Why is collectstatic necessary in Django? Why can't I just copy files to the static folder and point my server at that folder? Why does that not work?
Convenience, mostly. You can copy the static files over manually, and there's no problem with doing that. But when static files are spread across multiple folders and you're deploying to a production server, it's much more of a hassle to visit each folder and copy its files individually than to let collectstatic run as an automated task and do the work for you.
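As a sketch, the usual setup looks like this (paths are illustrative); a single collectstatic run then gathers every app's static/ directory plus anything in STATICFILES_DIRS into STATIC_ROOT:

```python
# settings.py -- illustrative paths
import os

BASE_DIR = os.getcwd()  # stand-in for the usual BASE_DIR derived from __file__

STATIC_URL = "/static/"
STATICFILES_DIRS = [os.path.join(BASE_DIR, "static")]  # project-wide assets
STATIC_ROOT = os.path.join(BASE_DIR, "staticfiles")    # collectstatic copies here
```

Deployment then reduces to running python manage.py collectstatic --noinput and pointing Nginx/Apache at STATIC_ROOT.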
I have been using s3boto's S3BotoStorage as my static files backend and syncing files to my AWS S3 buckets (staging and production) using ./manage.py collectstatic. It works fine, but it is painfully slow. In addition to my own static files (just a few) and django admin, I have a few third-party packages with many static files (grappelli, django-redactor), and collectstatic can take upwards of 15 minutes each time I run it, depending on my internet connection. When I'm syncing with my staging bucket and things aren't quite right, and I have to tweak something and re-sync, it's a big time killer. Are there any good, fast, scriptable alternatives for syncing static files to S3?
I wrote a pluggable Django app, based on a djangosnippet, that caches the ETag of the remote file and compares the cached checksum instead of performing a lookup every time. It took me from about 1m30s to around 10s per call to manage.py collectstatic for a few hundred static files. Check it out here: https://github.com/antonagestam/collectfast
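The underlying idea can be sketched in plain Python: compare each local file's MD5 against a cached ETag and only upload on mismatch. The function names here are illustrative, not Collectfast's actual API:

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 hex digest, which S3 uses as the ETag for non-multipart uploads."""
    return hashlib.md5(data).hexdigest()


def files_needing_upload(local_files: dict, cached_etags: dict) -> list:
    """Return the names whose content no longer matches the cached remote ETag."""
    return [
        name
        for name, data in local_files.items()
        if cached_etags.get(name) != md5_hex(data)
    ]


# Example: only the changed file is re-uploaded.
local = {"app.js": b"console.log('v2');", "site.css": b"body{}"}
etags = {"app.js": md5_hex(b"console.log('v1');"), "site.css": md5_hex(b"body{}")}
print(files_needing_upload(local, etags))  # ['app.js']
```

Skipping the per-file remote lookup is what turns hundreds of round trips into a single local comparison pass.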
Set AWS_PRELOAD_METADATA to True in your settings so it pre-loads all files on s3 before syncing and only syncs the ones that are not already there (or have changed).
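That is, in your settings (this applies to the legacy s3boto backend of django-storages):

```python
# settings.py -- for the legacy s3boto backend of django-storages
AWS_PRELOAD_METADATA = True  # fetch the bucket listing once, then skip unchanged files
```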