Flyway and initialization of repeatable migrations

Quoting the Flyway docs at https://flywaydb.org/documentation/migration/repeatable:
Repeatable migrations do not have a version. Instead they are
(re-)applied every time their checksum changes.
This is very useful for managing database objects whose definition can
then simply be maintained in a single file in version control.
Within a single migration run, repeatable migrations are always
applied last, after all pending versioned migrations have been
executed. Repeatable migrations are applied in the order of their
description.
This sounds exciting, however I can't seem to find any clarification on how this actually works or how to initialize repeatable migrations. I understand that for versioned migrations I can create a base migration (https://flywaydb.org/documentation/existing) and then run the baseline command to get things ready for my future versions. However, for repeatable migrations I do not understand how Flyway is able to detect checksum changes.
Instead they are
(re-)applied every time their checksum changes.
Is Flyway making the assumption that I am recreating my database from scratch to get the checksum comparison working? That would explain how it is able to compare checksums (since it would have access to the file definition of objects already in the database).

Let's assume the checksum of your repeatable migration SQL script is e.g. 123.
The first time you run Flyway, it will check the schema_version table, find that this repeatable migration has not been applied yet, and execute it.
The second time you launch Flyway, it will see that your SQL script still has checksum 123, which matches what was recorded in schema_version last time, so your repeatable migration script won't be executed.
Now let's assume that for the third run you modify your repeatable migration SQL script and the checksum changes to, say, 987. When you launch Flyway, it will find that 987 does not match what is still stored in schema_version (123), so this time it will execute the new version of the repeatable migration SQL and then update the checksum value in schema_version from 123 to 987.
This means you can keep changing repeatable migration scripts with each new version as you need. You cannot update the baseline (non-repeatable) script this way, because Flyway would throw an error about a non-matching checksum.
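To make that concrete, here is a small Python sketch of the decision Flyway is making (purely illustrative, not Flyway's actual code; Flyway records a CRC32-style checksum of each script in its schema history table):

# Illustrative sketch only -- not Flyway's implementation. It mirrors the logic
# described above: a repeatable migration runs when no checksum is recorded yet,
# or when the recorded checksum no longer matches the script on disk.
import zlib
from typing import Optional

def checksum(script_text: str) -> int:
    # Flyway uses a CRC32-based checksum; normalization details are omitted here.
    return zlib.crc32(script_text.encode("utf-8"))

def should_apply_repeatable(script_text: str, recorded: Optional[int]) -> bool:
    if recorded is None:                       # never applied before -> apply
        return True
    return checksum(script_text) != recorded   # checksum changed -> re-apply

# First run applies, an unchanged second run skips, an edited script re-applies.
v1 = "CREATE OR REPLACE VIEW v AS SELECT 1;"
v2 = "CREATE OR REPLACE VIEW v AS SELECT 2;"
assert should_apply_repeatable(v1, None) is True
assert should_apply_repeatable(v1, checksum(v1)) is False
assert should_apply_repeatable(v2, checksum(v1)) is True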

UPDATE: I know it's been some time; the solution I was looking for is flyway repair. It will sync a pre-existing schema with Flyway's internal checksum table. https://flywaydb.org/documentation/commandline/repair
I've tested Flyway and now understand initialization: Flyway will ignore existing scripts/migrations as long as the file name does not use the expected prefix (for repeatable migrations the default is R, as in R__description.sql), so as long as migrations are not renamed to follow that convention, Flyway will ignore them.

Related

Moving Django models between "apps" - easy and fast

In my company's Django project, our models are currently spread across about 20 "app" folders with little rhyme or reason. We'd like to consolidate them into a single new app (along with the rest of our code) so that in the future, we can refactor all parts of our system at-will without worrying about violating an "app" boundary.
Due to how Django works, simply moving the model breaks everything. I've spent hours reading everything I can about the various proposed solutions to this problem, with the best article being:
https://realpython.com/move-django-model/
So far, I'm not happy with any of the solutions I've come across. They're all just too onerous.
If I do the following:
1. Move all models to the new app. (Let's call it "newapp".)
2. Update all references in the code to the previous locations. (Including migration files.)
3. Take down the site for a minute.
4. Run a script to rename all database tables from someprevapp_model to newapp_model.
5. Run a script to update app_label for all rows in the django_content_type table.
6. Deploy latest codebase version, with updated model locations and references.
7. Turn site back on.
Have I succeeded, or am I missing something?
UPDATE
Just remembered - I'd also have to:
5.5) Run a script to update the app column for all rows in the django_migrations table (easy part), and renumber all migrations to form a single sequence (hard part), as well as modifying all migration files to match the new sequence.
UPDATE - V2 - Full Reset
The need to re-sequence migrations, combined with already having wanted to squash our huge migration stack, led me to this new and improved recipe:
1. Move models to the new app. (Let's call it "newapp".)
2. Delete previous migrations.
3. Update code to reflect new model locations.
4. Run makemigrations to generate a single migration in the new app with all models.
5. Turn site off.
6. Run script to rename my tables from prevapp_model to newapp_model (steps 6-8 are sketched below).
7. Run script to update django_content_type.app_label.
8. Run script to replace contents of django_migrations with a single row indicating migration newapp-0001 has already run.
9. Deploy new version.
10. Turn site back on.
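For reference, here is a rough sketch of what steps 6-8 could look like, assuming PostgreSQL, a single hypothetical old app called someprevapp, and a placeholder model name; it is only an outline to adapt and test against a copy of the database first:

# Hedged sketch of steps 6-8, run from inside the Django project
# (e.g. via "python manage.py shell"). App and model names are placeholders.
from django.db import connection

OLD_APP = "someprevapp"   # hypothetical old app label
NEW_APP = "newapp"
MODELS = ["widget"]       # lowercase model names to move (placeholder)

with connection.cursor() as cursor:
    # Step 6: rename the tables.
    for model in MODELS:
        cursor.execute(f"ALTER TABLE {OLD_APP}_{model} RENAME TO {NEW_APP}_{model}")

    # Step 7: point the content types at the new app label.
    cursor.execute(
        "UPDATE django_content_type SET app_label = %s WHERE app_label = %s",
        [NEW_APP, OLD_APP],
    )

    # Step 8: rewrite the migration history for the moved app only
    # (the recipe above describes replacing the whole table; this narrower
    # variant leaves the contrib apps' rows intact).
    cursor.execute("DELETE FROM django_migrations WHERE app = %s", [OLD_APP])
    cursor.execute(
        "INSERT INTO django_migrations (app, name, applied) VALUES (%s, %s, NOW())",
        [NEW_APP, "0001_initial"],
    )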

Am I required to use django data migrations?

I migrated a database that was manipulated via SQL to use Django's migration module: https://docs.djangoproject.com/en/3.0/topics/migrations/.
Initially I thought of using migrations only for changes to the model, such as changed columns or new tables, while performing deletes, inserts and updates through SQL, as those are constant and writing a migration for each case would be a little impractical.
Could using the migration module without using data migrations cause me some kind of problem or inconsistency in the future?
You can think about a data migration if you made a change that also requires a manual fix of the data.
Maybe you decided to normalize something in the database, e.g. to split a name column into a first name and a last name. If you have only one instance of the application with one database and you are the only developer, then you will probably not write a data migration either. But if you want to change it on a live production site with 24x7 traffic, or you cooperate with other developers, then you will probably prepare a data migration for their databases, or you will thoroughly test the migration on a copy of live data so that the update works correctly on the production site, without issues and with minimal downtime. If you don't write a data migration and you had no problem immediately, then it is OK and will be no worse than a possibly ill-conceived data migration.
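As a concrete illustration of the name-splitting example above, a data migration along these lines could be used (a minimal sketch with hypothetical app, model and field names, created with makemigrations --empty and filled in by hand):

# people/migrations/0003_split_name.py -- hypothetical example; all names are placeholders.
from django.db import migrations

def split_name(apps, schema_editor):
    # Use the historical model, not a direct import of the current model class.
    Person = apps.get_model("people", "Person")
    for person in Person.objects.all():
        first, _, last = person.name.partition(" ")
        person.first_name = first
        person.last_name = last
        person.save(update_fields=["first_name", "last_name"])

def recombine_name(apps, schema_editor):
    # Reverse operation so the migration can be rolled back.
    Person = apps.get_model("people", "Person")
    for person in Person.objects.all():
        person.name = f"{person.first_name} {person.last_name}".strip()
        person.save(update_fields=["name"])

class Migration(migrations.Migration):
    dependencies = [
        # The schema migration that added the new columns.
        ("people", "0002_person_first_name_last_name"),
    ]
    operations = [
        migrations.RunPython(split_name, recombine_name),
    ]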

Mix data and schema migrations in one migrations file (Django)?

I've heard the opinion that mixing data migrations and structure migrations is bad practice in Django, even if you specify atomic=False in your Migration class. But I could not find any information on this topic. Even my more experienced colleagues could not answer this question.
So, is it bad to mix data and structure migrations? If so, why? What exactly may happen if I do it?
There is an actual reason for not mixing data and schema migrations in one migration, mentioned in the entry for the RunPython operation in the Django docs:
On databases that do support DDL transactions (SQLite and PostgreSQL), RunPython operations do not have any transactions automatically added besides the transactions created for each migration. Thus, on PostgreSQL, for example, you should avoid combining schema changes and RunPython operations in the same migration or you may hit errors like OperationalError: cannot ALTER TABLE "mytable" because it has pending trigger events.
It should also be noted that for databases that do not support DDL transactions, it may be easier to fix the database after an unsuccessful migration attempt when data and schema migration operations are not mixed together, as data migration operations can be rolled back automatically in Django.
In the past the best practice was to keep them separate. The second sentence in this section in the docs says:
Migrations that alter data are usually called “data migrations”;
they’re best written as separate migrations, sitting alongside your
schema migrations.
But it doesn't list any reasons why. Since Django ~2.0 I've been allowing small data migrations to occur with schema migrations. However, there have been times when the migration simply couldn't run with the schema migration. There are two main cases that I've run into:
1. The data migration takes a long time and shouldn't be a migration in the first place. The resolution was to simply run a script that did what the data migration would have done, but in batches.
2. Attempting to add/update data and then create an index. This forced me to split the migration into two separate files. I don't remember the exact error, but it simply wouldn't migrate. This shouldn't cause problems for you unless there are non-atomic migrations running, which would leave your DB in an unexpected state.
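A compact sketch of the separation being recommended here, with hypothetical app, model and field names; the only point is that the AddField and the RunPython live in two different migration files, with the data migration depending on the schema migration:

# shop/migrations/0005_product_slug.py -- schema change only (hypothetical names).
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [("shop", "0004_previous")]
    operations = [
        migrations.AddField("product", "slug", models.SlugField(default="", blank=True)),
    ]


# shop/migrations/0006_populate_slug.py -- data change only, in its own file.
from django.db import migrations
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Product = apps.get_model("shop", "Product")
    for product in Product.objects.all():
        product.slug = slugify(product.name)
        product.save(update_fields=["slug"])

class Migration(migrations.Migration):
    dependencies = [("shop", "0005_product_slug")]
    operations = [migrations.RunPython(populate_slugs, migrations.RunPython.noop)]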

Flask-Migrate `db upgrade` fails with "relation does not exist"

I'm working in a development environment on a flask-app with a Postgres 10 database that has ~80 tables. There are lots of relationships and ForeignKeyConstraints networking it all together.
It was working fine with Flask-Migrate. I'd bootstrapped and migrated up to this point with ~80 tables. But, I wanted to test out some new scripts to seed the database tables, and thought it would be quickest to just drop the database and bring it back up again using Flask-Migrate.
In this process, the migration folder was deleted, so I just started over fresh with db init. Then I ran db migrate. I manually fixed a few imports in the migration script. Finally, I ran db upgrade.
However, now with all these 80 create_table commands in my migration script, when I run db upgrade, I receive an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) relation "items" does not exist
I receive this error for every child table that has a ForeignKeyConstraint whenever the child table is not placed below its parent table in the migration file.
But the autogenerated script from db migrate has the tables sorted alphabetically by table name.
Referring to the documentation, I don't see the importance of sort order mentioned.
Bottom line: it seems I'm either forced to write a script to sort all these tables into an order where each parent table is above its child tables, or else just cut and paste like a jigsaw puzzle until all the tables are in the required order.
What am I missing? Is there an easier way to do this with Flask-Migrate or Alembic?
After researching this, it seems that neither flask-migrate nor Alembic has a built-in way to resolve this sort-order issue. I fixed it by cutting and pasting the tables into an order that ensured each parent table was above its child tables in the migration file.
I've just encountered this myself, and could not find a better and/or official answer.
My approach was to separate the table creation from the creation of foreign key constraints:
1. Edit Alembic's auto-generated migration script: in each table-create operation, remove all lines that create foreign key constraints (see the sketch below).
2. Run Alembic's upgrade command (tables are created, minus the FK constraints, of course).
3. Run Alembic's migrate command (an additional migration script is created that adds all the FK constraints).
4. Run Alembic's upgrade command (FK constraints are added to the tables).
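Roughly, the edited scripts end up looking like this (table, column and constraint names here are made up; the point is that the first revision creates plain tables and a later revision adds the foreign keys with op.create_foreign_key):

# Revision 1, edited by hand: create the tables without FK constraints (hypothetical names).
import sqlalchemy as sa
from alembic import op

def upgrade():
    op.create_table(
        "items",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("name", sa.String(length=100), nullable=False),
    )
    op.create_table(
        "orders",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("item_id", sa.Integer(), nullable=False),  # FK added in the next revision
    )

# Revision 2, in its own file (where this function would simply be called upgrade()):
# add the FK constraints once all tables exist.
def upgrade_add_fks():
    op.create_foreign_key("fk_orders_item_id", "orders", "items", ["item_id"], ["id"])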
I've faced some problems when using flask-sqlalchemy and flask-migrate; I solved them using the Python interactive shell.
>>> from yourapp import db, create_app
>>> db.create_all(app=create_app())
Happy coding...

Schema migration commit changes

I have the following situation:
more than one schema migration
one data migration
It would be simple enough, but I encountered a problem with the data migration. It runs a query for a specific ContentType, which I need for django-taggit. The problem is that the model I want to query did not exist until the migration that precedes the data migration, so the query errors out with an empty result.
However, when I run all migrations up to the data migration and then run the data migration itself, everything works well. I've noticed that the migration process doesn't save changes until all of the migrations are finished, which doesn't work for this.
One of the solutions I arrived at was to manually commit/save changes to the database, but I haven't been able to find a way to do it. Of course, if there are any other ideas or better solutions, I'd be happy to hear them.
This is the code where the data migration errors out:
# ChallengeContest ContentType
challenge_contest_ct = ContentType.objects.get(model='challengecontest')
As you can see, the model challengecontest is the one that was created in a migration preceding the data migration.
I have found data migrations to be more trouble than they're worth. In my last two jobs we abandoned them, replacing them with one-off management commands.
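For completeness, a minimal sketch of that kind of one-off management command, with a hypothetical app name and file path; it does by hand what the data migration would have done, and because it runs after migrate has finished, the ContentType row for the new model already exists:

# contests/management/commands/backfill_challenge_tags.py -- hypothetical example.
from django.contrib.contenttypes.models import ContentType
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "One-off backfill that used to live in a data migration."

    def handle(self, *args, **options):
        # Safe here: all migrations (and the post_migrate content type creation) have run.
        challenge_contest_ct = ContentType.objects.get(model="challengecontest")
        self.stdout.write(f"Found content type: {challenge_contest_ct}")
        # ... perform the tagging / data fixes that the data migration would have done ...

It would then be run once with python manage.py backfill_challenge_tags (again, a hypothetical command name) after deploying and migrating.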