We are running our stack on Heroku, using Django 2.2 with a Postgres 11 DB. Our build pipeline (GitHub Actions) pushes to Heroku (git push https://git.heroku.com...) and immediately afterwards runs the migrations (heroku run python manage.py migrate --app heroku-app-name). All of that was working with a Postgres 9.6 database and is still working in our staging environment (Postgres 11). Now, with production on Postgres 11, the django migrate command just gets stuck and doesn't produce any output, even though there are no actual migrations to apply.
The only differences between our production setup and our staging setup are a follower/slave in production attached to the master DB and "production workload".
In order to fix the deployment I have to run:
heroku pg:killall -a heroku-app-name
heroku restart -a heroku-app-name
At this point the migration task in the build pipeline fails, and afterwards migrations can be applied manually without problems:
heroku run python manage.py migrate --app heroku-app-name
So for some reason the migrate command is "waiting" for something, some database lock or whatever, yet I cannot put my finger on it. Especially odd for me is the fact that it's also stuck when there are no migrations to apply. Why would it be stuck there?
We found the solution. There are actually three things coming together.
We trigger a DB backup before running any migrations. We only do so in production and not on staging, which is why our staging environment had no issues while production did.
A DB migration (even though it looks like there is nothing to apply) actually runs some commands (besides regular SELECTs, UPDATEs, INSERTs). E.g. in our case a CREATE EXTENSION ... IF NOT EXISTS was always executed at the beginning.
While it was possible with Postgres 9.6 to have a backup job running in parallel (I don't know what Heroku uses under the hood, yet I assume a normal pg_dump), the backup job on Postgres 11 (and others?) now takes a more exclusive lock on some operations. I assume that a CREATE EXTENSION ... IF NOT EXISTS (even though the extension already exists) cannot be executed while a backup job is running in parallel.
(I am sure some Postgres internals are missing to explain this more correctly.)
As a result of these three things, the DB blocks the migrate operation, waiting for the backup job to finish. I have moved the daily backup job to a different time and reconfigured our pipeline to wait for the "pre-deploy" backup to finish first.
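A minimal sketch of the reordered pipeline steps, assuming the pre-deploy backup is triggered via the Heroku CLI (heroku pg:backups:capture blocks until the backup has finished, so the migrate step only starts afterwards):
heroku pg:backups:capture --app heroku-app-name
heroku run python manage.py migrate --app heroku-app-name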
I have a django REST project and a PostgreSQL database deployed to DigitalOcean. When I develop locally, I have a separate dockerized REST server and a separate PostgreSQL database to test backend features without touching production data.
My question arises when I'm adding/modifying model fields that require me to make migrations using the python manage.py makemigrations and python manage.py migrate commands. Here is my current situation so far:
What I was supposed to do
IN LOCAL ENV, to create the migration files,
python manage.py makemigrations
python manage.py migrate
Now commit these newly created files, something like below.
git add app/migrations/...
git commit -m 'add migration files' app/migrations/...
IN PRODUCTION ENV, run only the below command.
python manage.py migrate
What I did so far
IN LOCAL ENV, created the migration files,
python manage.py makemigrations
python manage.py migrate
I committed & pushed the changes to production WITHOUT the created migration file
IN PRODUCTION ENV, ran BOTH commands.
python manage.py makemigrations
python manage.py migrate
The production server successfully added the isActive field to the database and is working fine, but I still have a 0011_user_isActive.py migration file in my local changes that hasn't been staged/committed/pushed to the GitHub repo.
And because I ran the makemigrations command in the production env, it probably created the same migration file that I haven't pushed from the local env.
My questions are:
What happens if I push the local migration file to production? Wouldn't it create a conflict when I run the migration command on the DigitalOcean console in the future?
How should I fix this situation?
I am just scared I'm going to corrupt/conflict my production database as I'm very inexperienced in databases and have too much to risk at the moment. Would appreciate any tips on best practices when dealing with such situations!
As the docs say:
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
So the best practice is to push your migration files and keep your local and production migration files in sync.
And if you get a conflict when pushing and pulling migration files, the makemigrations --merge command is there to solve it.
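For example (hypothetical migration names), if your local branch and production each ended up with their own 0011_... migration for the same app, running the merge command locally creates a new migration that depends on both and linearizes them:
python manage.py makemigrations --merge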
The docs also say:
Because migrations are stored in version control, you’ll occasionally come across situations where you and another developer have both committed a migration to the same app at the same time, resulting in two migrations with the same number.
Don’t worry - the numbers are just there for developers’ reference, Django just cares that each migration has a different name. Migrations specify which other migrations they depend on - including earlier migrations in the same app - in the file, so it’s possible to detect when there’s two new migrations for the same app that aren’t ordered.
When this happens, Django will prompt you and give you some options. If it thinks it’s safe enough, it will offer to automatically linearize the two migrations for you. If not, you’ll have to go in and modify the migrations yourself - don’t worry, this isn’t difficult, and is explained more in Migration files below.
Also be aware that if you need to update existing data in production, you can use RunPython in a migration file. Read about it here.
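A minimal sketch of such a data migration (the app label, model, and previous migration name are made up for illustration, not taken from your project):

from django.db import migrations

def backfill_is_active(apps, schema_editor):
    # Fetch the historical model via apps.get_model() instead of importing
    # the model directly, so the code matches the schema at this point in history.
    User = apps.get_model('myapp', 'User')
    User.objects.filter(isActive__isnull=True).update(isActive=True)

class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0011_user_isActive'),  # assumed previous migration
    ]

    operations = [
        # The second argument is the reverse operation; noop keeps it reversible.
        migrations.RunPython(backfill_is_active, migrations.RunPython.noop),
    ]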
This is a Heroku-specific issue with a Django 1.11.24 project running on Python 3.6.5 and a Heroku Postgres database.
During testing of two different branches during development, conflicting migration files were deployed at different times to the Heroku server. We recognized this and have now merged the migrations, but the order in which the Heroku Postgres schema was migrated no longer matches the current migration files.
As a result, specific tables already exist, so on deploy applying the updated, merged migration files fails with:
psycopg2.errors.DuplicateTable: relation "table_foo" already exists
In heroku run python manage.py showmigrations -a appname all of the migrations are shown as having run.
We've followed Heroku's docs and done the following:
Rolled back the app itself to before when the conflicting migrations took place and were run (https://blog.heroku.com/releases-and-rollbacks)
Rolled back the postgres db itself to a datetime before when the conflicting migrations took place and were run (https://devcenter.heroku.com/articles/heroku-postgres-rollback)
However, despite both app and db rollbacks, when we check the db tables themselves in the rollback in a psql shell with \dt, the table causing the DuplicateTable error still exists, so the db rollback doesn't actually seem to affect the django_migrations table.
It's Heroku, so we can't fake the migrations.
We could attempt to drop the specific db tables that already exist (or drop the entire db, since it's a test server), but that seems like bad practice. Is there any other way to address this in Heroku? thanks
I eventually fixed this by manually modifying migration files to align with the schema dependency order that was established. Very unsatisfying fix, wish Heroku offered a better solution for this (or a longer postgres database rollback window)
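For illustration only (hypothetical file and migration names, not the actual ones from this project), "aligning" mostly meant editing each migration's dependencies so that Django's recorded order matches the order the schema was actually created in, along the lines of:

# appname/migrations/0009_merge_branches.py (hypothetical)
from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        # Point at the migration that actually created table_foo on Heroku
        # first, so the merged history follows the real schema order.
        ('appname', '0007_create_table_foo'),
        ('appname', '0008_other_branch_change'),
    ]

    operations = []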
I have a rather complex K8s environment in which one of the Deployments is a Django application. Currently we are having a very hard time whenever I need to update a model that has already been migrated to a PostgreSQL database.
Let's say, for instance, that I create an application named Sample that has a simple table in models.py. My development process (skaffold) builds the Docker image and applies it locally on minikube. After this is done I connect to the pod via kubectl exec and execute python manage.py makemigrations and python manage.py migrate; so far so good.
After some time, let's say I need to create a new table in the models.py file of the Sample application. Skaffold builds the Docker image, kills the old pod, and creates the new pod. So I connect as usual via kubectl exec and try to execute the makemigrations and migrate commands, and lo and behold, there's no migration to apply. And of course no change is made in PostgreSQL.
Upon searching further, I believe the reason for this is that, since the Docker image is built without the Sample/migrations folder and there's already a table (the original one) in PostgreSQL, when I run makemigrations it creates only the 0001_initial.py file, which has all the tables; but since the table already exists, when executing migrate Django believes the migration is already applied and therefore won't apply it.
If what I found out is true, how can I keep these files on a PVC, so that they are kept between each pod recreation?
Thank you.
My Django deployment has x number of pods (currently 3) running a Django backend REST API server. We're still in the development/staging phase. I wanted to ask for advice regarding DB migration. Right now the pods simply start by launching the webserver, assuming the database is migrated and ready. This assumption can of course be wrong.
Can I simply put python manage.py migrate before running the server? What happens if 2 or 3 pods are started at the same time and all run migrations at the same time? Would there be any potential damage or problem from that? Is there a best practice pattern to follow here to ensure that all pods start the server with a healthy migrated database?
I was thinking about this:
During the initial deployment, define a Kubernetes Job object that'll run once, after the database pod is ready. It will use the same Django container I have, and will simply run python manage.py migrate. The deploy script will kubectl wait for that job pod to finish, and then apply the yaml that creates the full Django deployment. This will ensure all django pods "wake up" with a database that's fully migrated.
In subsequent updates, I will run the same job again before re-applying the Django deployment pod upgrades.
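A rough sketch of what that deploy step could look like; the manifest and Job names (migrate-job.yaml, job/django-migrate, django-deployment.yaml) are placeholders for whatever you actually use:
kubectl apply -f migrate-job.yaml        # Job runs: python manage.py migrate
kubectl wait --for=condition=complete job/django-migrate --timeout=300s
kubectl apply -f django-deployment.yaml  # now roll out the Django pods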
Now there is a question of chicken and egg and maintaining 100% uptime during migration, but that is a question for another post: how do you apply data migrations that BREAK the existing container version X when the code that works with the new migrations only arrives in container version X+1? Do you take the entire service offline for the duration of the update? Is there a pattern to keep the service up and running?
Well, you are right that multiple migrate commands will run against your database when multiple pods start up.
But this will not cause any problems. When actual changes are about to be made to your database, if the changes are already applied, they will be ignored. So, say 3 pods start at the same time and run the migrate command: only one of those commands will end up applying changes to the database. Migrations normally need to lock the database for certain actions (this depends heavily on your DBMS). The lock will be taken by one of the migrate commands (one of the pods), and the other commands have to wait until the first one is done. After the first one has finished, the others' changes will be ignored automatically. So each migration will happen only once.
You can, however, change your deployment strategy and ask Kubernetes to first spin up only one pod; once the first pod's health check succeeds, the others will spin up too. In this case, you can be sure that the lock for the migration will be taken only once, and the other pods will just see that the migrations are already applied and skip them automatically.
You may use Kubernetes init containers, which are specialized containers that run before app containers in a Pod. The init containers stop after successfully executing the commands you want, so they won't occupy unnecessary resources.
Here is the official link:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
The best thing you can do is to use a kind: Job and run it before the rollout.
Please check: Django migrations by Kubernetes Job and persistent Volume Claim
As I understand it, in K8s you can't control the order of the pods.
Let's say your DB host is a Docker container (dev environment) and not an RDS/static host; the migrations command cannot be executed before the DB is up and ready.
I tried to use the ready function in app.py to execute run_migrations.py, which runs a shell script that applies the migrations, but running the migrations triggers app.py again and it becomes an infinite loop.
In my organization we use CircleCI, which builds the image first and then pushes it to AWS ECR with tags before deploying it to EKS. At the build stage I can't assume the DB pod is ready, so running the migrations there would cause an error.
When the DB for Django is also a K8s pod, the best way to execute the migrations is a Job.
I am working on a Django app, and I would like my Database migrations to be run when deploying on Heroku.
So far we have simply put the following command in the Procfile:
python manage.py migrate
When deploying, the migrations are indeed run, but they seem to be run once for each dyno (and we use several dynos). As a consequence, data migrations (as opposed to pure schema migrations) are run several times, and data is duplicated.
Running heroku run python manage.py migrate after the deployment is not satisfactory since we want the database to be in sync with the code at all times.
What is the correct way to do this in Heroku?
Thanks.
This is my Procfile and it is working exactly as you describe:
release: python manage.py migrate
web: run-program waitress-serve --port=$PORT settings.wsgi:application
See Heroku docs on defining a release process:
https://devcenter.heroku.com/articles/release-phase#defining-a-release-command
The release command is run immediately after a release is created, but before the release is deployed to the app’s dyno formation. That means it will be run after an event that creates a new release:
An app build
A pipeline promotion
A config var change
A rollback
A release via the platform API
The app dynos will not boot on a new release until the release command finishes successfully.
If the release command exits with a non-zero exit status, or if it’s shut down by the dyno manager, the release will be discarded and will not be deployed to the app’s formation.
Be aware, however, that this feature is still in beta.
Update:
When you have migrations that remove models and content types, Django requires a confirmation in the console
The following content types are stale and need to be deleted:
...
Any objects related to these content types by a foreign key will also be deleted. Are you sure you want to delete these content types? If you're unsure, answer 'no'. Type 'yes' to continue, or 'no' to cancel:
The migrate command in your Procfile cannot respond to this prompt, and the release command fails. In this scenario, remove the migrate line, push live, run the migrate command manually, then add it back for future deploys.
Migrations do not run automatically on Heroku, but for now you can safely run them once your dyno is deployed with heroku run python manage.py migrate.
If it's production, you can put your app in maintenance mode first with heroku maintenance:on --app=<app name here>
Set up your Procfile like in the docs:
release: python manage.py migrate
web: gunicorn myproject.wsgi --log-file -
documented at https://devcenter.heroku.com/articles/release-phase#specifying-release-phase-tasks
You can create a file bin/post_compile which will run bash commands after the build.
Note that it is still considered experimental.
Read here for more buildpack info.
See here for an example
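For instance, a minimal bin/post_compile could just run the migration; this is only a sketch, adjust it to your project's settings and needs:
#!/usr/bin/env bash
# Runs at the end of the Heroku build, after dependencies are installed.
python manage.py migrate --noinput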
Alternatively, Heroku is working on a new Releases feature, which aims to simplify and solve this process. (Currently in Beta).
Good luck!