As the title says... I'm not sure if Django migrations should live in source control.
For:
If they get accidentally deleted from my local machine, it's going to cause me issues next time I want to run a migration... right? So it would be useful for me to have them.
Against:
Devs setting up the project for the first time shouldn't need to run them, they can just work straight from the models file.
They seem like machine-specific cruft.
Could they potentially reveal things I don't want about the database?
Yes, absolutely!!
From the docs:
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
One big point is that migrations should always be tested before you deploy them in production. You should never create migrations on production, only apply them.
You also want to synchronise the state of the models in source control with the state of the database. If someone pulls your branch, has to find a bug, and goes back in the source control's history, he'd need the migration files to change the state of the database to match that point in time. If he has to create his own migration files, they won't include the intermediate state, and he runs into a problem where his models are out-of-sync with the database.
Related
We have been working on a django project for months. You know for a dev team, migrations conflicts happen many times. I searched a lot to look what others do with this kind of problem and got results:
What really annoys me about Django migrations
django migrations - workflow with multiple dev branches
Django Migrations and How to Manage Conflicts
How to avoid migration conflicts with other developers?
And many other articles about how to avoid and resolve migration conflicts.
I want to know what if we just ignore migration files and just don't commit them?
Any answer is appreciated.
You should not ignore database migrations. The Django documentation makes this pretty clear (emphasis is mine):
The migration files for each app live in a “migrations” directory inside of that app, and are designed to be committed to, and distributed as part of, its codebase. You should be making them once on your development machine and then running the same migrations on your colleagues’ machines, your staging machines, and eventually your production machines.
The fact that you have migration conflicts is an indication that your multiple developers are all creating their migrations at different times, resulting in a different set of files. If you commit the migrations as you should, this will never be a problem.
However, if you plan on squashing migrations (e.g. you expect to have a lot of churn in your database schema during a development cycle), you might wait to commit the migrations until all of your database design work for that cycle is complete. But they should always get committed.
After that, everyone will have the same set of files and no more conflicts.
I have setup a git repository and cloned open source code which I am planning on modifying from github to start development. I committed the codebase in our repository.
I now have added few users and posts and other "stuff" into database.
I want to commit this change as well so that my teammate can check out and we have same settings and database throughout.
Is this possible by using south migrations? i.e will database bot contents and schema be in sync as well?
I have the project where I am writing the code as well the actual app. Should I commit both of them.
What should the github repository look like after doing the "right" thing
Data and database structure
This is possible using south migrations, data migrations and fixtures.
The easiest way for development is to just use a SQLite database, which is a binary file that you can commit. The test_project of django-autocomplete-light demonstrates such a possibility: http://django-autocomplete-light.readthedocs.org/en/latest/demo.html
you must use south anyway !
Apps in the project repo
I think you should keep apps as small and loosely coupled as possible.
If sound, make another repository and python package for the app:
In some cases it makes sense at the beginning, ie. a "blog" app that you know you will reuse,
In some cases it makes sense later, ie. you tought your app was really project specific but then you want to reuse it in another project,
In some cases it never makes sense (ie. the app is only useful to that particular project).
Best practice
As for best practices, there is http://12factor.net, http://lincolnloop.com/django-best-practices/ and pinax projects which are really interresting.
If you're going to reuse and extend external apps, then maybe this article on best practice reusing apps can help.
If there is data that every developer needs, these can be provided by initial_fixtures. I have just begun to use south, but I think it is mainly for migrations, (changing your models, without having to delete your data, and resync) not for sharing data. Schema should always be in sync using south, because a developer can pull the south migration, and apply it.
On Heroku, as soon as you push new code, the web-serving instances restart... even if the underlying database schema additions/changes (via syncdb or south migrate) haven't yet been applied.
In many cases, this might just cause harmless errors undtil the syncdb/migrate is run soon afterward. But I'm concerned that in some cases, new code might half-work making unexpected changes in the pre-migration database.
What's the right way to be safe against this risk?
One technique might be to add the syncdb/migrate to the Procfile so it's run before web restart. But, in the case of multiple instances, or maybe even a case where the one old-code-instance is left running until the moment the one new-code-instance is known-up, there's still a variant of the issue where code is talking to a DB with a mismatched schema.
Is there a 'hold all web instances' feature (or common best practice) for letting the migrate complete without web traffic?
Or am I being overly concerned about a risk that is negligible in practice?
The safest way to handle migrations of this nature, Heroku or no, is to strictly adopt a compatibility approach with your schema and code:
Every additive or transformative schema change must be backwards-compatible;
Every destructive schema change must be performed after the code that depends on it has been removed;
Every code change must either be:
durable against the possibility that associated schema changes have not yet been made (for instance, removing a model or a field on a model) or
made only after the associated schema change has been performed (adding a model or a field on a model)
If you need to make a significant transformation of a model, this approach might require the following steps:
Create a new database table to hold your new model structure, and deploy that migration
Create a new model with the new structure, and code to copy changes from the old model to the new model when the old model changes, and deploy that code
Execute a migration or code action to copy all old model data to the new model
Update your codebase to use the new model rather than the old model, deleting the old model, and deploy that code
Execute a migration to delete the old model structure from the database
With some thought and planning, it can be used for more drastic changes as well:
Deploy code that completely removes dependence on a section of the database, presumably replacing those sections of the site with maintenance pages
Deploy a migration that makes drastic changes that would not for whatever reason work with the above dual-model workflow
Deploy code that brings the affected sections back with the new model structure supported
This can be hard to organize and requires strict discipline and firm understanding of your code's interaction with your database, but in practice, it does allow for most changes to be made with no more downtime than the server restart itself imposes.
Looks like fast-database changeovers are the way to go, but it requires a dedicated database.
http://devcenter.heroku.com/articles/fast-database-changeovers
Alternatively, here's a tutorial for copying the data from one database (e.g., production) to another database (e.g., staging), doing the schema/data migration (e.g., using django/south), then switching the app to use the newly-updated database instance.
http://devcenter.heroku.com/articles/migrating-data-between-plans
Seems reasonable, but potentially slow if there's a large amount of data.
The recommended method is this:
Add database changes for your new features to your existing code
Make the existing code compatible with the new schema
Deploy
Add the new features to your codebase
Deploy
This means that your database changes are already in place when the code starts to require them.
However....
There's a couple of issues with this. First that I know of no development shop that is organised enough to be able to handle this, as features just get built ad-hoc, and secondly that you're not really saving anything.
Generally speaking, unless your making big changes to a massive database your changes won't take long to apply and are usually over in a couple of seconds which a developer can work around quite happily issuing restarts etc when needed. The risk being that a user might get an error page. If the changes are larger, you have some alternatives. One is using maintenance mode to turn the site off for a few seconds.
To be honest, there is no clear cut way for how to handle this nicely as by definition your code needs to be in place for your database changes to start. The best way I've found to approach the problem is to look at each change individually and work out the smoothest path for each on a case by case basis.
Rehearsing deployments on a staging environment will mitigate the risk of a deploy going bad, and give you an idea of the impact.
Heroku recently released "buildpacks" which are the scripts they use to set up an environment for your application, from managing dependencies to restarting the instances. Essentially it's a more comprehensive Procfile which you can customize.
You can fork the Python buildpack and modify the script to run in the sequence you want. Append the command you run to syncdb to the end of bin/steps/django. Commit and put this repo on Github.
Unfortunately as of now it's not possible to modify the buildpack of an existing Heroku app, so you'll have to delete it and recreate one that points to your buildpack repo:
heroku create --stack cedar --buildpack git#github.com:...
This is the best solution because it
Doesn't cost anything at all
Doesn't require you to adapt your code to Heroku
Only syncs the db once per deployment
Hope this helps.
I am building a Django powered web app that has a large database component. I am wondering, how would I go about continuing to develop the web app while users are using the live, production version? There are two parts to the problem, as I see it, as follows:
Making changes to templates, scripts, and other files
Making database schema changes
Now, the first problem is easy to manage with a SVN system. Heck, I could just have a "dev" directory which have all my in-development files and, once ready, just copy them into the "production" directory.
However, the second problem is more confusing to me. How do I test/develop new database changes without affecting the main/live database? I have been using South to do schema migrations during the initial creation stages of the web app, but surely I wouldn't want to make changes to the database while it is being used. Especially if I make changes that I don't want to keep.
Any thoughts/ideas?
You need another server on which to do your development. Typically, this is a personal machine, like your laptop. Often, you also have a copy of your production environment on a server, known as the staging server.
Your workflow would be like this:
Work on your code on your development machine, make all the changes you want, it's just you using it.
When the code is ready for production, you push it to the staging server to see that it really works properly in a server environment.
When you're sure it's ready for production, push it to the production server.
I just finished a Django app that I want to get some outside user feedback on. I'd like to launch one version and then fork a private version so I can incorporate feedback and add more features. I'm planning to do lots of small iterations of this process. I'm new to web development; how do websites typically do this? Is it simply a matter of copying my Django project folder to another directory, launching the server there, and continuing my dev work in the original directory? Or would I want to use a version control system instead? My intuition is that it's the latter, but if so, it seems like a huge topic with many uses (e.g. collaboration, which doesn't apply here) and I don't really know where to start.
1) Seperate URLs www.yoursite.com vs test.yoursite.com. you can also do www.yoursite.com and www.yoursite.com/development, etc.. You could also create a /beta or /staging..
2) Keep seperate databases, one for production, and one for development. Write a script that will copy your live database into a dev database. Keep one database for each type of site you create. (You may want to create a beta or staging database for your tester).. Do your own work in the dev database. If you change the database structure, save the changes as a .sql file that can be loaded and run on the live site database when you turn those changes live.
3) Merge features into your different sites with version control. I am currently playing with a subversion setup for web apps that has my stable (trunk), one for staging, and one for development. Development tags + branches get merged into staging, and then staging tags/branches get merged into stable. Version control will let you manage your source code in any way you want. You will have to find a methodology that works for you and use it.
4) Consider build automation. It will publish your site for you automatically. Take a look at http://ant.apache.org/. It can drive a lot of automatically checking out your code and uploading it to each specific site as you might need.
5) Toy of the month: There is a utility called cUrl that you may find valuable. It does a lot from the command line. This might be okay for you to do in case you don't want to use all or any of Ant.
Good luck!
You would typically use version control, and have two domains: your-site.com and test.your-site.com. Then your-site.com would always update to trunk which is the current latest, shipping version. You would do your development in a branch of trunk and test.your-site.com would update to that. Then you periodically merge changes from your development branch to trunk.
Jas Panesar has the best answer if you are asking this from a development standpoint, certainly. That is, if you're just asking how to easily keep your new developments separate from the site that is already running. However, if your question was actually asking how to run both versions simultaniously, then here's my two cents.
Your setup has a lot to do with this, but I always recommend running process-based web servers in the first place. That is, not to use threaded servers (less relevant to this question) and not embedding in the web server (that is, not using mod_python, which is the relevant part here). So, you have one or more processes getting HTTP requests from your web server (Apache, Nginx, Lighttpd, etc.). Now, when you want to try something out live, without affecting your normal running site, you can bring up a process serving requests that never gets the regular requests proxied to it like the others do. That is, normal users don't see it.
You can setup a subdomain that points to this one, and you can install middleware that redirects "special" user to the beta version. This allows you to unroll new features to some users, but not others.
Now, the biggest issues come with database changes. Schema migration is a big deal and something most of us never pay attention to. I think that running side-by-side is great, because it forces you to do schema migrations correctly. That is, you can't just shut everything down and run lengthy schema changes before bringing it back up. You'd never see any remotely important site doing that.
The key is those small steps. You need to always have two versions of your code able to access the same database, so changes you make for the new code need to not break the old code. This breaks down into a few steps you can always make:
You can add a column with a default value, or that is optional. The new code can use it, and the old code can ignore it.
You can update the live version with code that knows to use a new column, at which point you can make it required.
You can make the new version ignore a column, and when it becomes the main version, you can delete that column.
You can make these small steps to migrate between any schemas. You can iteratively add a new column that replaces an old one, roll out the new code, and remove the old column, all without interrupting service.
That said, its your first web app? You can probably break it. You probably have few users :-) But, it is fantastic you're even asking this question. Many "professionals" fair to ever ask it, and even then fewer answer it.
What I do is have an export a copy of my SVN repository and put the files on the live production server, and then keep a virtual machine with a development working copy, and submit the changes to the repo when Im done.