Django collectstatic from Heroku pushes to S3 every time

I'm using django-storages for static files with S3 (and S3BotoStorage). When I run collectstatic from my local machine, the behaviour is as expected: only modified files are pushed to S3. This process needs python-dateutil 1.5 to check modification times.
However, doing the same on Heroku results in every file being pushed regardless, even though the setup is the same. I then looked into the modification times of the files on Heroku itself, and it seems that os.stat(static_filename).st_mtime is the same as the time of the last push.
Is this expected behaviour? Does Heroku copy files around even when there is no change in git?

Try setting DISABLE_COLLECTSTATIC=1 as a config var for your app - that should stop collectstatic from running on every push.
See this article for details - https://devcenter.heroku.com/articles/django-assets :
> Sometimes, you may not want Heroku to run collectstatic on your behalf.
> You can disable collectstatic by enabling user-env-compile as well:
$ heroku labs:enable user-env-compile
$ heroku config:set DISABLE_COLLECTSTATIC=1
I've found that simply setting the config var will do - no need to also enable user-env-compile - it may be that this has passed from labs into production?
NB the deployment is managed by the Heroku python buildpack, which you can see here - https://github.com/heroku/heroku-buildpack-python/
EDIT 1
I've just done a bunch of tests on this and can confirm that DISABLE_COLLECTSTATIC does indeed disable collectstatic, irrespective of the user-env-compile setting - I think that's now in the main trunk (but that's speculation). The value doesn't seem to matter - if DISABLE_COLLECTSTATIC exists as a config var at all, it is honoured.

I strongly recommend using the Collectfast package for any Django static deployment to S3, whether from your local machine or from your Heroku server. It ignores modified dates and uses MD5 hashes, which the S3 API provides very quickly, plus (optional) caching to make your static deployments zoom. It took my static deployments from ~10-15 minutes to under 2 minutes, and it only deploys the files that have actually changed.
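Wiring it up is mostly a settings change. Here's a rough sketch based on the Collectfast README from around that time (AWS_PRELOAD_METADATA belongs to the old boto backend in django-storages, so check the current README for your versions):
```
# settings.py - minimal Collectfast setup (sketch; option names vary by version)
INSTALLED_APPS = (
    'collectfast',  # listed before django.contrib.staticfiles so its
                    # collectstatic command takes precedence
    'django.contrib.staticfiles',
    # ... your other apps ...
)

STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
AWS_PRELOAD_METADATA = True  # lets Collectfast fetch the remote md5s in bulk
```
After that, python manage.py collectstatic works as usual - Collectfast overrides the command.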

I've just had that exact same issue and contacted Heroku's support to find out what is going on. My question to them was
I've run into a funky issue doing some deployments. It appears that on each push the date modified on all files is updated to the time a new deploy/git push happens. Is this intended behaviour?
Since Django's collectstatic command only checks the modified date on files when deciding whether a file should be copied across to the final storage backend for static assets, this means that on each new push all files are first removed from the remote storage (in this case S3) and then re-uploaded. This is both a very slow and a wasteful process in terms of bandwidth consumed and requests made.
The answer I received today from "Caio", one of Heroku's support staff, was
Hi, that's how it currently works, yes. I'm routing your feedback to our runtime team to see if we can package files with their original dates.
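For context, the freshness check collectstatic applies boils down to something like this (a simplified sketch, not the exact Django source; older Django uses modified_time() rather than get_modified_time()):
```
# Simplified: a file is only skipped when the copy already in the target
# storage is at least as new as the local source file. If every deploy
# resets the source mtimes, nothing is ever skipped.
def needs_copy(path, source_storage, target_storage):
    if not target_storage.exists(path):
        return True
    source_mtime = source_storage.get_modified_time(path)
    target_mtime = target_storage.get_modified_time(path)
    return source_mtime > target_mtime
```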

As confirmed by Alen, Heroku changes the modified date of the files when it deploys. However, Amazon S3 also has an attribute called ETag, which is an MD5 hash of the file content. It's possible to use this to check whether the files have changed instead of the modified date, as implemented in this Django snippet.
I took that code, packaged it, fixed some errors I found, and put it on GitHub as django-s3-collectstatic. It includes a new management command, fasts3collectstatic, that only uploads new files. Check the GitHub page for installation instructions.
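The core of the ETag approach looks roughly like this (a sketch against the old boto API; for plain, non-multipart uploads the ETag is simply the MD5 of the content):
```
import hashlib

def is_unchanged(local_path, s3_key):
    """s3_key is a boto Key already fetched from the bucket."""
    with open(local_path, 'rb') as f:
        local_md5 = hashlib.md5(f.read()).hexdigest()
    # boto returns the ETag wrapped in double quotes, e.g. '"d41d8cd9..."'
    return s3_key.etag.strip('"') == local_md5
```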

Why not run collectstatic from your local machine?
python manage.py collectstatic --noinput --settings=settings.[prod]

I agree this is annoying - there are a couple of things you can do. I override the collectstatic command and wire it up in my production settings. Below is the command I use:
```
# mysite/disablecollectstatic/management/commands/collectstatic.py
# Naming the file collectstatic.py is what lets it shadow the built-in command.
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    args = '< none >'
    help = "disables collectstatic cmd in contrib"

    def handle(self, *args, **kwargs):
        print 'collectstatic disabled'
```
I keep this in mysite/disablecollectstatic/management/commands
Then in production settings:
INSTALLED_APPS += ('mysite.disablecollectstatic',)
Alternatively you could use the fact that Heroku does a dry run before actually invoking the command. If that dry run fails, Heroku won't run collectstatic, which means you could contrive an error (by removing the static root from your settings, for example), but this approach makes me nervous:
https://devcenter.heroku.com/articles/django-assets#detection

Deploying Django to production - correct way to do it?

I am developing Django Wagtail application on my local machine connected to a local postgres server.
I have a test server and a production server.
However, when I develop locally and try to upload it, there is always some issue with makemigrations and migrate, e.g. KeyError etc.
What are the best practices for ensuring I do not run into these issues? What files do I need to port across?
So I'll tell you what I do and what most of the companies I've worked at as a Django developer did, and I can tell you from experience that it worked pretty well.
First, containerize your application. This will make your life much easier, remove external influences on your code, and give you an easy way to reproduce your environment.
Your Dockerfile should be based on some Python image and should do 3 basic things:
Install your requirements/dependencies
Run the python manage.py migrate --noinput command
Run an HTTP server such as gunicorn with gunicorn -c /gunicorn.py wsgi:application
You will do the makemigrations on your local machine and make sure everything is working before committing to the repo.
In your gunicorn.py you will put the settings to run the app, such as the number of CPUs, the binding port, and the folder your app lives in - something like:
import os
import multiprocessing
# Chdir to specified directory before apps loading.
# https://docs.gunicorn.org/en/stable/settings.html#chdir
chdir = '/app/'
# Bind the application on localhost both on ipv6 and ipv4 interfaces.
# https://docs.gunicorn.org/en/stable/settings.html#bind
bind = '0.0.0.0:8000'
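The multiprocessing import is there to derive the worker count from the CPU count; the usual gunicorn rule of thumb would be something like this (this line is my assumption, it isn't in the snippet above):
```
# Scale worker processes with the available CPU cores (common gunicorn heuristic).
# https://docs.gunicorn.org/en/stable/settings.html#workers
workers = multiprocessing.cpu_count() * 2 + 1
```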
Second, containerize your other stuff - for example the Postgres database, Redis (for cache), and a connection pooler for the database, depending on the size of your application.
It's highly recommended that you have a step in the pipeline to run tests; they need to run before everything else, maybe just after lint.
OK, what now? Now you need a way to deploy that stuff. The best option for this scenario is to push your image to the GitHub container registry, and you can add a tag to it, for example:
IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
# Change all uppercase to lowercase
IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
docker tag $IMAGE_NAME $IMAGE_ID:staging
docker push $IMAGE_ID:staging
This can be added in a GitHub Action in the build step, for example.
After your new code is in a new image on GitHub, you just need to update the currently running one. This can be done by creating a script on the server and running that script from a GitHub Action; it's something like:
docker pull ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
echo 'Restarting Application...'
docker stop {YOUR_CONTAINER} && docker compose up -d   # or an equivalent docker run -d of the freshly pulled image
sudo systemctl restart nginx
echo 'Cleaning old images'
sudo docker system prune -af
You can see that I create the image with a staging tag. You can create a rule in GitHub Actions to trigger that action when you create a new release, for example, and create another action that is triggered on every new commit and builds/deploys a dev tag.
For the migration problem, the first thing is: when your application goes live, squash every migration into the first one (you can drop the database and all the migrations, then recreate the database and run the makemigrations command again to get there), so you have a clean migration history on the server. Never create unnecessary relations between tables, prefer cached properties over adding new columns, use UUIDs for unique ids, and try not to make breaking changes to the database - it's hard, but if you plan the database up front it's not so difficult to do.
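For instance, the last two points might look like this (a purely illustrative model; the names are made up for the example):
```
import uuid

from django.db import models
from django.utils.functional import cached_property


class Order(models.Model):
    # UUID primary key instead of a sequential integer id.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    created_at = models.DateTimeField(auto_now_add=True)

    @cached_property
    def total(self):
        # Derived value, cached per instance, instead of an extra column
        # that would need another migration to add and keep in sync.
        return sum(item.price for item in self.items.all())


class OrderItem(models.Model):
    order = models.ForeignKey(Order, related_name='items', on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=8, decimal_places=2)
```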
Hit me up if you have any questions. A lot of what I said can be done on other platforms such as GitLab, Travis, or Circle CI, but I used GitHub Actions in the example because I think it's simpler to picture.
EDIT:
I forgot to tell you to have a cron job on your server doing backups of your database; the migrate command only applies changes after verification, but if something else breaks the database, those backups can save your life.

Django collectstatic keeps waiting when run through a GitHub Action

We are facing a very weird issue. We ship a Django application in a Docker container through GitHub Actions on each push. Everything works fine except collectstatic.
We have the following lines at the end of our CD GitHub Action:
docker exec container_name python manage.py migrate --noinput
docker exec container_name python manage.py collectstatic --noinput
migrate works perfectly fine, but collectstatic just keeps waiting if run through the GitHub Action. If I run the command directly on the server, it works just fine and completes within a few minutes.
Can someone please help me figuring out what could be the issue?
Thanks in advance.
Now, I am far from the most experienced, but I did this recently and have some suggestions of where to look. I'm definitely not the greatest authority, though.
I wasn't using Docker, so I can't say anything about that. From the issues I had, here are some suggestions I can recommend trying.
Take note that all of this was for a self-hosted runner. Things would be very different otherwise.
Check to make sure STATIC_ROOT and MEDIA_ROOT variables are set correctly in the settings file.
If the STATIC and MEDIA root variables are environment variables, make sure you are serving the correct environment variables file, like the .env file I used.
I used django-environ to serve my environment variables. The docs say to have the .env file in the same directory as the settings file. But if you are putting the project on a production server with GitHub Actions, you won't be able to put the .env file anywhere in the project, because it will get overwritten every time new code is pushed.
So to fix that you need to point to the correct .env file somewhere else on the server. Do that by specifying ENV_PATH.
https://django-environ.readthedocs.io/en/latest/
Under the section Multiple env files
Another resource that was helpful:
https://github.com/joke2k/django-environ/issues/143
I set up my settings file like how they did there.
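Roughly what that looks like in the settings file (a sketch; ENV_PATH is just the environment variable name used in that issue, not something django-environ defines on its own):
```
import os

import environ

env = environ.Env()

# Read the .env file from wherever ENV_PATH points, falling back to a .env
# next to this settings file for local development.
env_file = os.environ.get('ENV_PATH', os.path.join(os.path.dirname(__file__), '.env'))
environ.Env.read_env(env_file)

STATIC_ROOT = env('STATIC_ROOT')
MEDIA_ROOT = env('MEDIA_ROOT')
```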
I put my .env file in a proj directory I made inside the virtualenv folder for the project.
I don't know if it's a good place to put it, but that's how I did it. I didn't find much good info online for this stuff; I had to figure out a lot on my own.
Make sure the user running the GitHub Action has permission to read the .env file.
Also, like the .env file, if you have the static files being collected into the base directory of your project, you might have an issue with GitHub Actions overwriting those files every time new code is pushed. If you have a media directory that users upload files to, then that really will be an issue, because those files won't get overwritten - they'll just disappear.
Now, even if this were the issue, it shouldn't cause GitHub Actions to just get stuck on the collectstatic command. It would just cause files to get overwritten every time the workflow runs, and the media files to disappear.
If you do change where the static and media files are located, as described above, make sure all the path variables are correct in the settings file and the .env file.
You will also need to update the nginx config file with the new static and media root directories if you use nginx. Not sure how Apache handles this.
You can do that with this command:
sudo nano /etc/nginx/sites-available/myproject
Don't forget to restart the nginx server after doing that.
If you are writing static and media files to a location other than the base project directory on the server, also check the permissions on those directories. Make sure the user running the GitHub Action has permission to write to them. I suspect that might cause it to hang, but it may well just cause an error instead.
Check all the syntax in the GitHub Actions yml file. Make sure everything is correct and it's not hanging because of an incomplete command or something like that.
But yeah, those are some things I had to look at. Honestly, none of this might be relevant for you; all of these issues should cause an error somewhere, for the most part.
I couldn't really offer many external resources for you to look deeper into this because I'm just speaking from personal experience.
Hope I could help.
Here's my GitHub repo for the project I did: https://github.com/pkudlanov/personal-portfolio-django
I hosted it on DigitalOcean on a Linux server using nginx and gunicorn.

Heroku/Django: do I need to turn on maintenance mode when collecting static files?

I've noticed that when I run python manage.py collectstatic --noinput, users start seeing scrambled pages. After following up on this, I found out that during the process I get 404s on CSS files, hence the scrambled view.
I've read the Heroku docs regarding maintenance mode and the deployment process, and nothing indicates that I need to turn maintenance mode on when I do a deploy with collectstatic.
The process of collecting static files is quite slow (~20 minutes). I'm using django-pipeline to minify and combine the static files (with hashes), and then upload them to Amazon S3.
Is this normal behaviour? Or am I doing something wrong?
Is there any way to deploy, with collectstatic, without taking the site down?
If you're having issues with the speed, perhaps you could just upload the static files by running collectstatic locally (or on another server) instead of on Heroku. Of course you'll have to set up a process for that, but at least you won't have to tie up an expensive server for 20 minutes just to upload files.
You could also use https://github.com/antonagestam/collectfast, which speeds up the collect process by using MD5 hashes. This really improves the speed, since only what has actually changed gets uploaded.

Using collectstatic with multiple environments

I have a Django app on Heroku, with staging and production environments. Static files are hosted on S3. I'm streamlining my deployment process and plan to set up fabfiles once I have things working manually.
How can I configure collectstatic to push to multiple places? If I run it locally, it uses my dev settings (with a local STATIC_ROOT). If I run it on one of my Heroku apps (heroku run ./manage.py collectstatic), then it can't grab the files (since .slugignore ensures they're never pushed to Heroku). The same applies if I include collectstatic in my Procfile.
I'm also using django-pipeline, though it's not yet doing much since I'm stuck on the collectstatic bit.
UPDATE
In response to Marat's question, I tried passing a settings file as an option to collectstatic: ./manage.py collectstatic --settings=project.settings.prod, but got an error: Unknown command: 'collectstatic'. I checked on the server, though, and INSTALLED_APPS does include django.contrib.staticfiles, and I can also run collectstatic remotely, so I'm not sure what would cause that.
You can set the environment variable DJANGO_SETTINGS_MODULE so you don't need specify --settings everywhere:
heroku config:set DJANGO_SETTINGS_MODULE=project.settings.prod
First, if you are going to serve static files via CloudFront, you can use a custom origin and always use a local STATIC_ROOT. That actually has some advantages over an S3 origin, e.g. gzip support.
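In settings terms that mostly amounts to pointing STATIC_URL at the distribution while collectstatic keeps writing to the local STATIC_ROOT (a sketch; the CloudFront domain below is a placeholder):
```
import os

# settings.py excerpt - collect locally; CloudFront's custom origin pulls
# /static/ from the app itself. The CloudFront domain is a placeholder.
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
STATIC_URL = 'https://d1234example.cloudfront.net/static/'
```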
Another good thing you can do is keep environment-dependent settings in a separate file and then import it in settings.py, e.g.:
local_settings.py (not in the project repository, though you can keep a local_settings.py.example):
#environment dependent settings
DATABASES = { .. }
CACHES = { .. }
STATIC_ROOT = 'your_path/static'
settings.py:
from local_settings import *
(it needs to be a * import so that DATABASES and the other values actually end up in the settings module's namespace)
I've just replied to a similar question on Upload Media from Heroku to Amazon S3. If you customise your settings to take environment variables into account, you can use a filesystem storage backend locally and an S3 storage backend when pushing to Heroku. This will collect and upload your static files when your slug is compiled.
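A minimal version of that pattern might look like this (a sketch; the USE_S3 variable name is made up for the example):
```
import os

# Pick the static files backend from the environment, so the same settings
# module works locally (filesystem) and on Heroku (S3).
if os.environ.get('USE_S3') == 'true':
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
    AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
    STATIC_URL = 'https://%s.s3.amazonaws.com/' % AWS_STORAGE_BUCKET_NAME
else:
    STATIC_ROOT = 'staticfiles'
    STATIC_URL = '/static/'
```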

Google App Engine Development and Production Environment Setup

Here is my current setup:
GitHub repository, a branch for dev.
myappdev.appspot.com (not real url)
myapp.appspot.com (not real url)
App written on GAE Python 2.7, using django-nonrel
Development is performed on a local dev server. When I'm ready to release to dev, I increment the version, commit, and run "manage.py upload" to the myappdev.appspot.com
Once testing is satisfactory, I merge the changes from dev to main repo. I then run "manage.py upload" to upload the main repo code to the myapp.appspot.com domain.
Is this setup good? Here are a few issues I've run into.
1) I'm new to git, so sometimes I forget to add files, and the commit doesn't notify me. So I deploy code to dev that works, but does not match what is in the dev branch. (This is bad practice).
2) The datastore file in the git repo causes issues. Merging binary files? Is it ok to migrate this file between local machines, or will it get messed up?
3) Should I be using "manage.py upload" for each release to the dev or prod environment, or is there a better way to do this? Heroku looks like it can pull right from GitHub. The way I'm doing it now seems like there is too much room for human error.
Any overall suggestions on how to improve my setup?
Thanks!
I'm on a pretty similar setup, though I'm still running on py2.5, django-nonrel.
1) I usually use 'git status' or 'git gui' to see if I forgot to check in files.
2) I personally don't check in my datastore. Are you familiar with .gitignore? It's a text file in which you list files for git to ignore when you run 'git status' and other commands. I put .gaedata in there, as well as .pyc and backup files.
To manage the database I use "python manage.py dumpdata > file" which dumps the database to a json encoded file. Then I can reload it using "python manage.py loaddata".
3) I don't know of any way to deploy from git. You could probably write a little Python script to check whether git is up to date before you deploy. Personally, though, I deploy to test to make sure it's working before I check it in.