DB migration: deleting a table - database-migration

What is the best practice for removing a database table when database upgrades are handled in 'migrations' fashion?
We use Flyway for database migrations. Every time there is a database change, a migration script (which takes care of the delta change) gets added.
After a round of refactoring to remove obsolete code, a couple of tables are no longer needed.
Options I can think of are:
1) Leave those tables alone. I don't like clutter, so I'd prefer not to go with this option.
2) Add a migration script to delete these tables. Creating and later deleting a few tables will add to app installation time, which is also not preferable to us.
3) Edit one of the initial migration scripts so the table doesn't get created for new installations. Problem: Flyway will complain that one of the migrations was tampered with.
Are there other options?

Don't worry so much about #2. The overhead is small when the tables are empty, and it's not very often that one needs to rebuild the full DB anyway.
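To make the create-then-drop option concrete, here is a minimal sketch in Python against an in-memory SQLite database. It is a toy stand-in for a versioned migration runner, not Flyway itself; the table names, the `schema_history` tracking table and the `migrate` helper are all hypothetical.

```python
import sqlite3

# Hypothetical migration history: version 1 creates a table, version 3 drops it again.
MIGRATIONS = [
    ("1", "CREATE TABLE legacy_report (id INTEGER PRIMARY KEY, body TEXT)"),
    ("2", "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    ("3", "DROP TABLE legacy_report"),
]


def migrate(conn):
    """Apply every migration that is not yet recorded, in order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_history (version TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_history")}
    newly_applied = []
    for version, sql in MIGRATIONS:
        if version in applied:
            continue
        conn.execute(sql)
        conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied


def table_names(conn):
    """Return the names of all tables currently in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {row[0] for row in rows}
```

On a fresh database the obsolete table is created empty and dropped again in the same run, so a new installation ends up without it; existing installations only run the drop.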

Related

Best way to update Doctrine migrations tables

Overview
I have an application that uses Doctrine migrations. The application is a software code base used to manage multiple businesses, who all use the same code in their own unique environments, i.e. each code base is identical apart from its configuration.
Mistakes we made
One of the mistakes the dev team made was to include core migrations in the customer folders, i.e. with each new application we have to drop in the migration files or run migrations:diff to get going, which I feel is not efficient and can lead to a mess.
What I would prefer is to have core migrations as part of the core code since it rarely changes and custom migrations on the client app side.
What I want to do
I want to move all core structure migrations to our code files
I want to have a single custom migration to drop in customised data in the client migration folder.
The problem
The problem I face is pretty much how to reorganize the migrations without breaking databases on existing client applications.
I have thought of two solutions:
Solution 1:
Add blank migrations as a placeholder for the new migrations I want.
Commit these to the repo and deploy to our environments.
They will be run, nothing will be changed, and the migrations table will store them as having been executed.
Next, update the blank migrations to the actual code I want, and empty all other migration files. Commit this to the environments.
Finally - remove the unwanted migration files, remove the unwanted database migration records.
Solution 2
Change the migration location in the db to a new location
Remove all migration files and add blank migrations for the new ones I want
Commit this to the repo, allow to run and record the migrations as being run in the new table.
Add migration code.
Now all new applications will have the updated migration files and the old apps will have the new migration files...
Question:
Am I re-inventing the wheel? Is there a standard on how to do this as I am certain I am not the first to bump into this problem?
So for anyone who finds themselves in a similar position where they need to tidy up a mess of doctrine migrations, this should serve as a simple pattern to follow.
In our development environment we use continuous integration/git/kubernetes etc. and the following process works well with our environment.
The steps:
Update the migrations table name; you can do this in the config quite easily:
'table_storage' => [
    'table_name' => 'migration_version',
    'version_column_name' => 'version_timestamp',
],
Next, delete your old migrations (delete the files) and run migrations:diff to generate a new one which will be a combination of all your changes.
Now comment out the code in the new file so that it's essentially an empty migration file.
On local, delete the old migrations table and run your build process which will add the new migration to the new table.
Commit to develop/staging/live etc. and repeat the process.
Now the db in all your environments has the new migration recorded in its migrations table. You can uncomment the code; it will not be executed when you commit the file, since the migration already exists in your migrations table.
Hope this helps someone!
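Why the uncommented migration never runs on existing environments can be shown with a small model. This is a sketch in Python with SQLite, not Doctrine code: `run_pending`, `empty_up` and `real_up` are hypothetical stand-ins, and only the table and column names are taken from the config above.

```python
import sqlite3


def ensure_version_table(conn):
    """Create the tracking table named in the config above, if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migration_version (version_timestamp TEXT PRIMARY KEY)"
    )


def run_pending(conn, migrations):
    """Run each (version, callable) pair whose version is not recorded yet."""
    ensure_version_table(conn)
    done = {r[0] for r in conn.execute("SELECT version_timestamp FROM migration_version")}
    executed = []
    for version, up in migrations:
        if version in done:
            continue  # already recorded, so its body is never run again
        up(conn)
        conn.execute("INSERT INTO migration_version VALUES (?)", (version,))
        executed.append(version)
    conn.commit()
    return executed


def empty_up(conn):
    """Stand-in for the commented-out migration: does nothing."""


def real_up(conn):
    """Stand-in for the uncommented migration: creates the schema."""
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
```

An existing environment records the version while the body is empty and skips it afterwards; a brand-new environment has no record, so it runs the full body once.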

What's best approach to maintain database table field between git branch?

I'm using Django and Postgresql to develop a web service.
Suppose we have 3 or 4 branches for different features or for bugfixes to old versions.
I ran into a problem: while on branch A, I changed a Django model and ran migrate, which changed the database on my local test machine.
When I switch to another branch that does not have that migration file, the database is inconsistent with the code and Django fails to run; I have to delete the database and recreate it.
In general, what's the best/common way to deal with this kind of situation in a development environment?
I understand your situation well and have been in the same shoes several times.
Here is what I prefer (and do):
I am in branch bug-fix/surname_degrade
I changed the user data model [which generated user_migration_005] and then migrated the DB.
Then my boss came and pointed out that the user is not able to login due to login degrade.
So I have to switch branch and fix that first.
I can roll back the migration [user_migration_005] that I applied a few moments ago, with something like python manage.py migrate <app_label> user_migration_004
Then I switch branches and start working on hot-fix/login_degrade
When I switch back to my previous task, I can just run migrate again and proceed.
With this procedure I don't need to delete all my tables, restore an old database or anything like that.
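If you do this often, the roll-back-then-switch routine can be wrapped in a small helper. This is only a sketch: the function names are made up, and it assumes `python`, `manage.py` and `git` are available from the current directory.

```python
import subprocess


def migrate_command(app_label, migration_name, manage_py="manage.py"):
    """Build the argv for `python manage.py migrate <app_label> <migration_name>`."""
    return ["python", manage_py, "migrate", app_label, migration_name]


def roll_back_then_switch(app_label, migration_name, branch):
    """Roll the app back to a migration both branches share, then check out the branch."""
    subprocess.run(migrate_command(app_label, migration_name), check=True)
    subprocess.run(["git", "checkout", branch], check=True)
```

For the example above that would be something like roll_back_then_switch("users", "user_migration_004", "hot-fix/login_degrade"), where "users" is a placeholder app label.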
I am a newbie, will be extremely happy to hear your thoughts.
The major issue here is that your database changes every time you migrate, so either you maintain database consistency across the different branches, or you can do the following while developing/testing (after declaring all the models):
1) Delete all database tables (if you have a backup or only dummy data)
2) Delete all existing migration files in your branch
3) Create new migrations
4) Migrate to new migrations
The above steps can also be repeated whenever the models are modified again.
Run a different test database in each branch.
When you fork the design, fork the database
Make a clone of the database and migrate that.
Make sure that when you push to git you include your migrations; that way, when someone else pulls the branch and runs migrate, Django knows what changes were made to the database.
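One way to get a separate database per branch is to derive the database name from the current git branch in settings.py. The following is a minimal sketch; the "myproject" prefix and both helper functions are assumptions, and connection credentials are omitted.

```python
import re
import subprocess


def current_branch(default="main"):
    """Return the checked-out git branch, or `default` if git is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return out.stdout.strip() or default


def database_name(branch, prefix="myproject"):
    """Turn a branch name into a lowercase identifier usable as a database name."""
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return f"{prefix}_{slug}"[:63]  # PostgreSQL truncates identifiers to 63 bytes


# In settings.py (hypothetical project; add USER, PASSWORD, HOST as needed):
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": database_name(current_branch()),
    }
}
```

Each branch database still has to be created once, for example by cloning an existing one with PostgreSQL's createdb -T <source_db> <new_db>, and then migrated on that branch.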

--fake-initial vs --fake in Django migration?

What is the difference between --fake-initial and --fake in Django migrations? What are the dangers of using fake migrations? Does anybody know? Thank you very much to all.
I am using Django 1.10
Well the documentation is very clear about this
--fake-initial
Allows Django to skip an app’s initial migration if all database
tables with the names of all models created by all CreateModel
operations in that migration already exist. This option is intended
for use when first running migrations against a database that
preexisted the use of migrations. This option does not, however, check
for matching database schema beyond matching table names
You were asking about the risks, well here it is
only safe to use if you are confident that your existing schema
matches what is recorded in your initial migration.
--fake
Tells Django to mark the migrations as having been applied or
unapplied, but without actually running the SQL to change your
database schema.
This is intended for advanced users to manipulate the current
migration state directly if they’re manually applying changes;
Once again risks are clearly highlighted
be warned that using --fake runs the risk of putting the migration
state table into a state where manual recovery will be needed to make
migrations run correctly.
This answer is valid not just for django versions 1.8+ but for other versions as well.
Edit, Nov 2018: I sometimes see answers here and elsewhere that suggest that you should drop your database. That's almost never the right thing to do. If you drop your database you lose all your data.
@e4c5 already gave an answer to this question, but I would like to add one more thing concerning when to use --fake and --fake-initial.
Suppose you have a database from production and you want to use it for development and apply migrations without destroying the data. In that case --fake-initial comes in handy.
The --fake-initial will force Django to look at your migration files and basically skip the creation of tables that are already in your database. Do note, though, that any migrations that don’t create tables (but rather modify existing tables) will be run.
Conversely, if you have an existing project with migration files and you want to reset the history of existing migrations, then --fake is usually used.
Short answer
--fake does not apply the migration
--fake-initial might, or might not apply the migration
Longer answer:
--fake: Django keeps a table called django_migrations to know which migrations it has applied in the past, to prevent you from accidentally applying them again. All --fake does is insert the migration filename into that table, without actually running the migration. This is useful if you manually changed the database schema first, and the models later, and want to bypass django's actions. However, during that step you are on your own, so take care that you don't end up in an inconsistent state.
--fake-initial: depends on the state of the database
all of the tables already exist in the database: in that case, it works like --fake. Only the names of the tables are checked, not their actual schema, so, again, take care
none of the tables already exist in the database: in that case, it works like a normal migration
some of the tables already exist: you get an error. That's not supposed to happen; either you take care of the database, or Django does.
Note that --fake-initial is only taken into account if the migration file has initial = True in its Migration class; otherwise the flag is ignored. Also, this is the only documented usage of initial = True in migrations.
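The three cases can be written down as a tiny decision function. This is a model of the behaviour as described in this answer, not Django's actual implementation; the function and its return values are made up for illustration.

```python
def fake_initial_outcome(existing_tables, tables_created_by_migration, is_initial=True):
    """Model the three cases described above for `migrate --fake-initial`.

    Returns "fake", "apply", or "error". Only table names are compared,
    mirroring the caveat that the actual schema is not checked.
    """
    if not is_initial:
        return "apply"  # flag is ignored; the migration is attempted normally
    wanted = set(tables_created_by_migration)
    present = wanted & set(existing_tables)
    if present == wanted:
        return "fake"   # every table exists: record as applied, run no SQL
    if not present:
        return "apply"  # no table exists: behaves like a normal migration
    return "error"      # partial overlap: creating an existing table fails
```

For example, fake_initial_outcome({"a"}, ["a", "b"]) falls into the third case, because only one of the two tables is already there.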

On Heroku, is there danger in a Django syncdb / South migrate after the instance has already restarted with changed model code?

On Heroku, as soon as you push new code, the web-serving instances restart... even if the underlying database schema additions/changes (via syncdb or south migrate) haven't yet been applied.
In many cases, this might just cause harmless errors until the syncdb/migrate is run soon afterward. But I'm concerned that in some cases, new code might half-work, making unexpected changes in the pre-migration database.
What's the right way to be safe against this risk?
One technique might be to add the syncdb/migrate to the Procfile so it's run before web restart. But, in the case of multiple instances, or maybe even a case where the one old-code-instance is left running until the moment the one new-code-instance is known-up, there's still a variant of the issue where code is talking to a DB with a mismatched schema.
Is there a 'hold all web instances' feature (or common best practice) for letting the migrate complete without web traffic?
Or am I being overly concerned about a risk that is negligible in practice?
The safest way to handle migrations of this nature, Heroku or no, is to strictly adopt a compatibility approach with your schema and code:
Every additive or transformative schema change must be backwards-compatible;
Every destructive schema change must be performed after the code that depends on it has been removed;
Every code change must either be:
durable against the possibility that associated schema changes have not yet been made (for instance, removing a model or a field on a model) or
made only after the associated schema change has been performed (adding a model or a field on a model)
If you need to make a significant transformation of a model, this approach might require the following steps:
Create a new database table to hold your new model structure, and deploy that migration
Create a new model with the new structure, and code to copy changes from the old model to the new model when the old model changes, and deploy that code
Execute a migration or code action to copy all old model data to the new model
Update your codebase to use the new model rather than the old model, deleting the old model, and deploy that code
Execute a migration to delete the old model structure from the database
With some thought and planning, it can be used for more drastic changes as well:
Deploy code that completely removes dependence on a section of the database, presumably replacing those sections of the site with maintenance pages
Deploy a migration that makes drastic changes that would not for whatever reason work with the above dual-model workflow
Deploy code that brings the affected sections back with the new model structure supported
This can be hard to organize and requires strict discipline and firm understanding of your code's interaction with your database, but in practice, it does allow for most changes to be made with no more downtime than the server restart itself imposes.
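The dual-model workflow (new table, dual writes, backfill, switch over, drop) can be sketched end to end with SQLite. The tables, the name-splitting rule and `save_person` are all hypothetical; in a real project each numbered step would be its own deploy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_old (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany(
    "INSERT INTO person_old (full_name) VALUES (?)",
    [("Ada Lovelace",), ("Alan Turing",)],
)

# Step 1: additive migration; old code keeps working because nothing it uses changed.
conn.execute(
    "CREATE TABLE person_new (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)


def split_name(full_name):
    """Split 'First Last' at the first space."""
    first, _, last = full_name.partition(" ")
    return first, last


# Step 2: transitional code writes every change to both tables.
def save_person(conn, person_id, full_name):
    conn.execute(
        "INSERT OR REPLACE INTO person_old (id, full_name) VALUES (?, ?)",
        (person_id, full_name),
    )
    conn.execute(
        "INSERT OR REPLACE INTO person_new (id, first_name, last_name) VALUES (?, ?, ?)",
        (person_id, *split_name(full_name)),
    )


save_person(conn, 3, "Grace Hopper")

# Step 3: backfill rows that were written before dual writes began.
for person_id, full_name in conn.execute("SELECT id, full_name FROM person_old").fetchall():
    conn.execute(
        "INSERT OR IGNORE INTO person_new (id, first_name, last_name) VALUES (?, ?, ?)",
        (person_id, *split_name(full_name)),
    )

# Step 4 would deploy code that reads and writes only person_new.
# Step 5: destructive migration, run only after no deployed code uses person_old.
conn.execute("DROP TABLE person_old")
conn.commit()
```

At every point between steps, whichever version of the code is running finds the tables it expects, which is what makes the restart order irrelevant.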
Looks like fast-database changeovers are the way to go, but they require a dedicated database.
http://devcenter.heroku.com/articles/fast-database-changeovers
Alternatively, here's a tutorial for copying the data from one database (e.g., production) to another database (e.g., staging), doing the schema/data migration (e.g., using django/south), then switching the app to use the newly-updated database instance.
http://devcenter.heroku.com/articles/migrating-data-between-plans
Seems reasonable, but potentially slow if there's a large amount of data.
The recommended method is this:
Add database changes for your new features to your existing code
Make the existing code compatible with the new schema
Deploy
Add the new features to your codebase
Deploy
This means that your database changes are already in place when the code starts to require them.
However....
There are a couple of issues with this. First, I know of no development shop that is organised enough to handle this, as features just get built ad hoc; second, you're not really saving anything.
Generally speaking, unless you're making big changes to a massive database, your changes won't take long to apply and are usually over in a couple of seconds, which a developer can work around quite happily by issuing restarts etc. when needed. The risk is that a user might get an error page. If the changes are larger, you have some alternatives. One is using maintenance mode to turn the site off for a few seconds.
To be honest, there is no clear cut way for how to handle this nicely as by definition your code needs to be in place for your database changes to start. The best way I've found to approach the problem is to look at each change individually and work out the smoothest path for each on a case by case basis.
Rehearsing deployments on a staging environment will mitigate the risk of a deploy going bad, and give you an idea of the impact.
Heroku recently released "buildpacks" which are the scripts they use to set up an environment for your application, from managing dependencies to restarting the instances. Essentially it's a more comprehensive Procfile which you can customize.
You can fork the Python buildpack and modify the script to run in the sequence you want. Append the command you use to run syncdb to the end of bin/steps/django. Commit and put this repo on GitHub.
Unfortunately as of now it's not possible to modify the buildpack of an existing Heroku app, so you'll have to delete it and recreate one that points to your buildpack repo:
heroku create --stack cedar --buildpack git@github.com:...
This is the best solution because it
Doesn't cost anything at all
Doesn't require you to adapt your code to Heroku
Only syncs the db once per deployment
Hope this helps.

Is there a way to update the database with the changes in my models? [duplicate]

Possible Duplicate:
update django database to reflect changes in existing models
I've used Django in the past and one of the frustrations I've had with it as an ORM tool is the inability to update an existing database with changes in the model. (Hibernate does this very well and makes things really easy for updating and heavily modifying a model and applying this to an existing database.) Is there a way to do this without wiping the database every time? It gets really old having to regenerate admin users and sites after every change in the model which I'd like to play with.
You will want to look into South. It provides a migrations system to migrate both schema changes as well as data from one version to the next.
It's quite powerful, and the vast majority of changes can be handled simply by running
manage.py schemamigration yourapp --auto
manage.py migrate yourapp
(where yourapp is a placeholder for your app's name). The auto functionality does have its limits, and especially if the change is eventually going to be run on a production system, you should check the code that --auto generated to be sure it's doing what you expect.
South has a great guide to getting started and is well documented. You can find it at http://south.aeracode.org
No.
As the documentation of syncdb command states:
Syncdb will not alter existing tables
syncdb will only create tables
for models which have not yet been installed. It will never issue
ALTER TABLE statements to match changes made to a model class after
installation. Changes to model classes and database schemas often
involve some form of ambiguity and, in those cases, Django would have
to guess at the correct changes to make. There is a risk that critical
data would be lost in the process.
If you have made changes to a model and wish to alter the database
tables to match, use the sql command to display the new SQL structure
and compare that to your existing table schema to work out the
changes.
South seems to be how most people solve this problem, but a really quick and easy way to do this is to change the db directly through your database's interactive shell. Just launch your db shell (usually just manage.py dbshell) and manually alter, add or drop the fields and tables you need changed using your db's syntax.
You may want to run manage.py sqlall appname to see the sql statements Django would run if it was creating the updated table, and then use those to alter the database tables and fields as required.
The Making Changes to a Database Schema section of the Django book has a few examples of how to do this: http://www.djangobook.com/en/1.0/chapter05/
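The compare-and-alter step can be partly automated for the simplest case, a column that exists in the model but not yet in the table. Here is a minimal sketch using SQLite; the helper name and the hand-written column mapping are assumptions, and renames, type changes and drops still need a human decision.

```python
import sqlite3


def missing_column_statements(conn, table, wanted_columns):
    """Return ALTER TABLE statements for wanted columns the table lacks.

    `wanted_columns` maps column name -> SQL type, as read off the model by hand.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    return [
        f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}'
        for name, sql_type in wanted_columns.items()
        if name not in existing
    ]
```

Print the returned statements and review them before running anything against a database you care about.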
I manually go into the database - whatever that may be for you: MySQL, PostgreSQL, etc. - to change database info, and then I adjust the models.py accordingly for reference. I know there is Django South, but I didn't want to bother with using another 3rd party application.