I've been looking at Flyway as a database migration tool.
The one thing I have been unable to find a definitive answer to is the following:
Can I force Flyway to run all as-of-yet unapplied migrations in a single transaction, instead of having each migration be its own transaction?
In a dev environment it's not an issue, but in a production environment, where you might apply several migrations in one update, a single failing migration would leave the database in a 'half-migrated' state, with some migrations committed and some not - quite a bad thing.
A workaround would be to simply cram all the required SQL into a single file, but there are issues with that:
The production and dev migrations would end up being performed differently, since in a dev environment you cannot know in advance what will end up in the migration. I guess you could always do a clean and then a fresh migrate, but this seems to be against the spirit of Flyway's design with regard to incremental migrations.
The checksums will be different as soon as a new change is added.
Does Flyway still not support such a feature? Does Liquibase, or any other migration tool?
There is no such feature out of the box. It's a great question, though, and I'd bet it was considered, since Flyway already provides transactional boundaries per migration - hopefully Axel Fontaine will chime in on the technical / design considerations that kept this from becoming a feature.
The Flyway FAQ has entries on both downgrading and failures. The policy boils down to:
Maintain backwards compatibility between the DB and all versions of the code currently deployed in production. ... Have a well tested backup and restore strategy.
In my case, we have been using Flyway for almost 3 years and have abided by the quoted policy. At any given deployment we could have 100 or more migrations running against many databases, and I'm happy to say nothing untoward has ever happened in production. It all comes down to minimizing the opportunity for failure in your release process.
I used Liquibase on a much smaller project before that and don't recall any such feature there either, apart from its support for rollback procedures.
By design, Flyway migrations run in transactions, synchronously, fairly early during application startup. This is usually desirable: it ensures the database is in a consistent (migrated) state before business logic starts executing, or else the migration fails and the app crashes.
In some cases I'd really like to be able to start the application without waiting for some migrations to complete (long-running migrations, creating indexes or materialized views, etc.). This might also be needed when deploying from a CI server and using deploy timeouts / health checks (which cannot be raised indefinitely) to ensure the deployment worked as expected.
Is there any configuration / convention / best practice to enable async migrations? (e.g. naming the migration A2_00__UpdateSthLong.sql instead of V2 (standard, versioned) or R2 (repeatable migration).)
Seems like it's not possible (yet):
There is an issue requesting this on GitHub:
https://github.com/flyway/flyway/issues/950
Flyway for Micronaut seems to already support this: https://github.com/flyway/flyway/issues/950
A temporary solution until support is integrated could be to use a Java migration and spawn an async task from within the migrate method yourself, as sketched below. The migration wouldn't be transactional then, of course. :(
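A minimal sketch of that workaround, assuming Flyway 6+'s BaseJavaMigration and a migration instance registered programmatically (e.g. via Flyway.configure().javaMigrations(...)) so it can be handed a DataSource of its own. The class name, table and DDL below are hypothetical illustrations, not an official async API:

    import java.sql.Connection;
    import java.sql.Statement;
    import java.util.concurrent.CompletableFuture;
    import javax.sql.DataSource;
    import org.flywaydb.core.api.migration.BaseJavaMigration;
    import org.flywaydb.core.api.migration.Context;

    // Hypothetical migration: builds a large index without blocking startup.
    public class V2_00__UpdateSthLong extends BaseJavaMigration {

        // Separate pool: Flyway's own connection is closed once migrate() returns.
        private final DataSource dataSource;

        public V2_00__UpdateSthLong(DataSource dataSource) {
            this.dataSource = dataSource;
        }

        @Override
        public void migrate(Context context) {
            // Return immediately; the real work runs on a background thread,
            // outside Flyway's transaction, so it must be idempotent and
            // report its own failures (Flyway has already recorded success).
            CompletableFuture.runAsync(() -> {
                try (Connection conn = dataSource.getConnection();
                     Statement stmt = conn.createStatement()) {
                    stmt.execute("CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_created"
                            + " ON orders (created_at)"); // hypothetical long-running DDL
                } catch (Exception e) {
                    e.printStackTrace(); // log / alert properly in real code
                }
            });
        }
    }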
Another solution might be to do the migration before actually starting up the application (e.g. by using the Maven task).
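If the project uses the flyway-maven-plugin, that could look like the following (a sketch; it assumes the plugin is already configured with the database URL and credentials):

    # run all pending migrations from the build, before the app is started
    mvn flyway:migrate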
I am facing a challenge here. I inherited the models from previous developers, and the tables were not properly built. I added some constraints and new tables in order to normalize those tables. Before pushing the application to Heroku I tested it on my local machine, and it actually broke my database.
Now the Heroku site is already in production, so there is user information in the database. How should I approach this? Do I need to destroy the existing database, create a new one, and run the migrations?
Be very, very careful. Applying migrations on production servers can cause irreversible damage, so you should be prepared for every possible situation.
My best recommendation would be to create an entire duplicate copy of your live DB (with Heroku this is as simple as a PG dump/backup). You can then create a new staging site using the same code, load the backup into a new database instance, and test against that - live environments are not always the same as local ones. Run your migrations on the staging site and look for any unexpected effects (the best way to do this would be by utilizing Django test cases). If there are any issues, be sure to understand how the rollback process works with Django migrations. A sketch of the copy step follows.
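A hedged sketch of that copy using the Heroku CLI's pg:backups commands (the app names are placeholders):

    # capture a backup of the production database (app names are placeholders)
    heroku pg:backups:capture --app my-prod-app

    # restore the latest backup into the staging app's database
    heroku pg:backups:restore "$(heroku pg:backups:url --app my-prod-app)" DATABASE_URL --app my-staging-app

    # run the new migrations against staging and watch what happens
    heroku run python manage.py migrate --app my-staging-app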
A good tutorial that is fairly recent can be found here: https://realpython.com/django-migrations-a-primer/
We have been working on a Django project for months. As any dev team knows, migration conflicts happen all the time. I searched a lot to see what others do about this kind of problem and found:
What really annoys me about Django migrations
django migrations - workflow with multiple dev branches
Django Migrations and How to Manage Conflicts
How to avoid migration conflicts with other developers?
And many other articles about how to avoid and resolve migration conflicts.
What I want to know is: what if we just ignore the migration files and don't commit them at all?
Any answer is appreciated.
You should not ignore database migrations. The Django documentation makes this pretty clear (emphasis is mine):
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
The fact that you have migration conflicts is an indication that your developers are all creating their migrations at different times, resulting in different sets of files. If you commit the migrations as you should, this will never be a problem.
However, if you plan on squashing migrations (e.g. you expect to have a lot of churn in your database schema during a development cycle), you might wait to commit the migrations until all of your database design work for that cycle is complete. But they should always get committed.
After that, everyone will have the same set of files and no more conflicts.
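And if two branches do each land a migration with the same number before being merged, the committed files can still be reconciled; Django ships a merge facility for exactly this case (the migration names here are hypothetical):

    # both 0005_add_foo.py and 0005_add_bar.py exist after merging branches;
    # let Django generate a merge migration that depends on both
    python manage.py makemigrations --merge
    python manage.py migrate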
I'm currently using Dropwizard Migrations, which uses the open-source Liquibase to perform DB migrations. There is a paid enterprise version of Liquibase called Datical, which brings a lot of extra features.
However, I'm wondering if the Dropwizard Migrations extension will work nicely with Datical, since I cannot test it without first paying for it.
Has anyone had this experience? Can I use Liquibase (in my case through Dropwizard Migrations) to run the changesets in development and then use Datical for production?
I have not used Dropwizard, so I am not positive, but if we assume that Dropwizard uses the standard Liquibase APIs, then this should work. Note that I am one of the developers at Datical, so I do have some expertise here.
If you want to see if it really does work, I would recommend that your organization contact the Datical sales folks, who can set up a Proof-Of-Concept to make sure that it works for your organization.
On Heroku, as soon as you push new code, the web-serving instances restart... even if the underlying database schema additions/changes (via syncdb or south migrate) haven't yet been applied.
In many cases this might just cause harmless errors until the syncdb/migrate is run soon afterward. But I'm concerned that in some cases the new code might half-work, making unexpected changes to the pre-migration database.
What's the right way to be safe against this risk?
One technique might be to add the syncdb/migrate to the Procfile so it runs before the web restart. But in the case of multiple instances - or maybe even a case where the single old-code instance is left running until the moment the single new-code instance is known to be up - there's still a variant of the issue where code is talking to a DB with a mismatched schema.
Is there a 'hold all web instances' feature (or common best practice) for letting the migrate complete without web traffic?
Or am I being overly concerned about a risk that is negligible in practice?
The safest way to handle migrations of this nature, Heroku or no, is to strictly adopt a compatibility approach with your schema and code:
Every additive or transformative schema change must be backwards-compatible;
Every destructive schema change must be performed after the code that depends on it has been removed;
Every code change must either be:
durable against the possibility that associated schema changes have not yet been made (for instance, removing a model or a field on a model) or
made only after the associated schema change has been performed (adding a model or a field on a model)
If you need to make a significant transformation of a model, this approach might require the following steps:
Create a new database table to hold your new model structure, and deploy that migration
Create a new model with the new structure, and code to copy changes from the old model to the new model when the old model changes, and deploy that code
Execute a migration or code action to copy all old model data to the new model (a sketch follows this list)
Update your codebase to use the new model rather than the old model, deleting the old model, and deploy that code
Execute a migration to delete the old model structure from the database
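For the copy step, a single backfill statement is often enough. A sketch in SQL with hypothetical table and column names, assuming Postgres (the usual Heroku database):

    -- backfill the new table from the old one (all names are hypothetical)
    INSERT INTO app_profile_v2 (id, user_id, display_name)
    SELECT id, user_id, name
    FROM app_profile
    ON CONFLICT (id) DO NOTHING; -- idempotent if dual-written rows already exist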
With some thought and planning, it can be used for more drastic changes as well:
Deploy code that completely removes dependence on a section of the database, presumably replacing those sections of the site with maintenance pages
Deploy a migration that makes drastic changes that would not for whatever reason work with the above dual-model workflow
Deploy code that brings the affected sections back with the new model structure supported
This can be hard to organize and requires strict discipline and firm understanding of your code's interaction with your database, but in practice, it does allow for most changes to be made with no more downtime than the server restart itself imposes.
Looks like fast database changeovers are the way to go, but they require a dedicated database.
http://devcenter.heroku.com/articles/fast-database-changeovers
Alternatively, here's a tutorial for copying the data from one database (e.g., production) to another database (e.g., staging), doing the schema/data migration (e.g., using django/south), then switching the app to use the newly-updated database instance.
http://devcenter.heroku.com/articles/migrating-data-between-plans
Seems reasonable, but potentially slow if there's a large amount of data.
The recommended method is this:
Add database changes for your new features to your existing code
Make the existing code compatible with the new schema
Deploy
Add the new features to your codebase
Deploy
This means that your database changes are already in place when the code starts to require them.
However....
There are a couple of issues with this. The first is that I know of no development shop organised enough to be able to handle this, as features just get built ad hoc; the second is that you're not really saving anything.
Generally speaking, unless you're making big changes to a massive database, your changes won't take long to apply - they're usually over in a couple of seconds - and a developer can work around that quite happily, issuing restarts etc. when needed. The risk is that a user might get an error page. If the changes are larger, you have some alternatives; one is using maintenance mode to turn the site off for a few seconds.
To be honest, there is no clear-cut way to handle this nicely, as by definition your code needs to be in place before your database changes can start. The best way I've found to approach the problem is to look at each change individually and work out the smoothest path for it on a case-by-case basis.
Rehearsing deployments on a staging environment will mitigate the risk of a deploy going bad, and give you an idea of the impact.
Heroku recently released "buildpacks" which are the scripts they use to set up an environment for your application, from managing dependencies to restarting the instances. Essentially it's a more comprehensive Procfile which you can customize.
You can fork the Python buildpack and modify the script to run in the sequence you want: append the syncdb command you run to the end of bin/steps/django (see the sketch below). Commit and put this repo on GitHub.
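As a sketch, the appended lines might look like this (assuming the buildpack's usual indent helper is in scope at that point in the script):

    # appended to the end of bin/steps/django in the buildpack fork (sketch)
    python manage.py syncdb --noinput 2>&1 | indent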
Unfortunately as of now it's not possible to modify the buildpack of an existing Heroku app, so you'll have to delete it and recreate one that points to your buildpack repo:
heroku create --stack cedar --buildpack git@github.com:...
This is the best solution because it
Doesn't cost anything at all
Doesn't require you to adapt your code to Heroku
Only syncs the db once per deployment
Hope this helps.