Specify backward compatibility depth of migrations in apps or projects (Django)

Often when developing Django applications, models change and hence migrations are added. IMHO it is good practice to make new code intended for a newer database schema also compatible with the older schema, possibly by not showing new functionality.
For example, if you add a feature that stores a bounding box for a profile photo, that feature is simply not shown if the relevant table doesn't yet have the bounding-box fields. Such graceful degradation can keep a website running even though not all migrations have been executed yet.
This removes some stress when updating live deployments: distributed web nodes can be updated one at a time, and the migration is run at the end.
Nevertheless, graceful degradation can only go so far; at some point the burden of endless backward compatibility is too much of a drain on development resources. System administrators, however, cannot be expected to know how large this migration gap is allowed to be, and verbal communication can easily get noisy. And after a week a developer might have forgotten how backward compatible the migrations of a given release are.
This makes me wonder: can an app or a project specify which migrations it is compatible with?
If not, then what is a good update order for n web nodes and one database?
Such a specification would allow an update to warn whether the database is compatible, and conversely, when migrating the database, whether there are outdated web nodes.
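To make it concrete, this is the kind of check I imagine new code doing, sketched with Django's migration recorder (this is not an existing compatibility mechanism; the app label and migration name are invented):

```python
from django.db import connection
from django.db.migrations.recorder import MigrationRecorder


def migration_applied(app_label, migration_name):
    """Return True if the given migration is recorded as applied."""
    recorder = MigrationRecorder(connection)
    return (app_label, migration_name) in recorder.applied_migrations()


# Hypothetical feature toggle: only expose the bounding-box feature
# once the migration that adds its columns has actually been run.
def bounding_box_enabled():
    return migration_applied("profiles", "0007_add_bounding_box")
```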

There are a few things that make a migration not compatible with old versions of the code:
Adding a NOT NULL column
Dropping a column (while it is still used in the old version)
Renaming a column
Altering a column (e.g. changing the type)
Renaming a table
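For instance, a migration along these lines (app, model, and field names are invented here) adds a NOT NULL column; once it has run, old application code that inserts rows without the new field will start failing:

```python
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [("profiles", "0006_previous")]

    operations = [
        # Existing rows get "" as a one-off default, but the column is
        # created NOT NULL with no database-level default, so INSERTs
        # from the old code version will be rejected after this runs.
        migrations.AddField(
            model_name="photo",
            name="bounding_box",
            field=models.CharField(max_length=64, default=""),
            preserve_default=False,
        ),
    ]
```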
There is a project called django-migration-linter that can help detect those backward-incompatible migrations in an automated way.
Hope that helps.

Related

Maintaining South migrations on Django forks

I'm working on a pretty complex Django project (50+ models) with some complicated logic (lots of different workflows, views, signals, APIs, background tasks etc.). Let's call this project-base. Currently using Django 1.6 + South migrations and quite a few other 3rd party apps.
Now, one of the requirements is to create a fork of this project that will add some fields/models here and there and some extra logic on top of that. Let's call this project-fork. Most of the extra work will be on top of the existing models, but there will also be a few new ones.
As project-base continues to be developed, we want these features to also get into project-fork (much like a rebase/merge in git-land). The extra changes in project-fork will not be merged back into project-base.
What could be the best possible way to accomplish this? Here are some of my ideas:
Use South merges in project-fork to keep it up-to-date with the latest changes from project-base, as explained here. Use signals and any other means necessary to keep the new logic from project-fork as loosely coupled as possible to avoid any potential conflicts.
Do not modify ANY of the original project-base models and instead create new models in different apps that reference the old models (i.e. using OneToOneField). Extra logic could end up in the old and/or new apps.
your idea here please :)
I would go with option 1 as it seems less complicated as a whole, but might expose a greater risk. Here's how I would see it happening:
Migrations on project-base:
0001_project_base_one
0002_project_base_two
0003_project_base_three
Migrations on project-fork:
0001_project_base_one
0002_project_fork_one
After merge, the migrations would look like this:
0001_project_base_one
0002_project_base_two
0002_project_fork_one
0003_project_base_three
0004_project_fork_merge_noop (added to merge in changes from both projects)
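To make that concrete, I imagine 0004_project_fork_merge_noop as a plain South schemamigration that changes nothing itself (a rough sketch; app and migration names are invented, and the depends_on entries would only be needed if the merged histories live in separate apps):

```python
# 0004_project_fork_merge_noop.py -- a no-op South schemamigration
from south.v2 import SchemaMigration


class Migration(SchemaMigration):
    # Only needed if the merged histories live in different apps;
    # within a single app South orders migrations by number.
    depends_on = (
        ("some_base_app", "0003_project_base_three"),
        ("some_fork_app", "0002_project_fork_one"),
    )

    def forwards(self, orm):
        # Intentionally empty: both lines of schema changes are
        # already covered by the migrations listed above.
        pass

    def backwards(self, orm):
        pass
```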
Are there any pitfalls using this approach? Is there a better way?
Thank you for your time.
Official South workflow:
The official South recommendation is to try with the --merge flag: http://south.readthedocs.org/en/latest/tutorial/part5.html#team-workflow
This obviously won't work in all cases, though in my experience it works in most.
Pitfalls:
Multiple changes to the same model can still break
Duplicate changes can break things
The "better" way is usually to avoid simultaneous changes to the same models, the easiest way to do this is by reducing the error-window as much as possible.
My personal workflows in these cases:
With small forks where the model changes are obvious from the beginning:
Discuss which model changes will need to be made for the fork
Apply those changes to both/all branches as fast as possible to avoid conflicts
Work on the fork ....
Merge the fork, which produces no new migrations
With large forks where the changes are not always obvious and/or will change again:
Do the normal fork and development stuff while trying to stay up to date with the latest master/develop branch as much as possible
Before merging back, throw away all schemamigrations in the fork
Merge all changes from the master/develop
Recreate all needed schema changes
Merge to develop/master

Is there a quick way to automatically update a Django project to the next major version?

So Django 1.5 will be out within an Internet-Week (soon) and I'm feeling the pressure to start playing around with it for its vastly improved User model. But as with 1.4 and 1.3 before it, upgrading is not simply a virtualenv change; I've needed to alter (quite heavily in places) the structure of my projects.
I recently jumped from 1.2 to 1.4 and spent hours (over many projects) updating their file structures (and settings) to mirror modern Django preferences. Django 1.5 is going to force me to go through and fix all the CACHE settings and fix practically every instance of {% url ... %} in my templates, so I'm wondering if there's an easier way.
Is there anything out there that can automagically "upgrade" a Django project?
... Or at the very least "scan" a project to show up the larger honking issues so I can kick those out before starting QC.
No. And for good reason.
Some of the changes between versions are minor (such as the whole quotes around variables thing), while some are major (like the new User model).
While automatic detection might be possible (though non-trivial, considering you'd have to execute every possible code path in the app), changing them would be fragile. And possibly very dangerous.
Instead, when moving between versions, I move up one version at a time (to the last minor release of that major version) and test. Django will issue warnings for features that will be removed in later versions, so it's easy to catch them while still having running code that works.
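One small trick that helps with that (just a sketch using standard Python warning filters, nothing Django-specific): turn deprecation warnings into errors while running your test suite, so nothing slips through between version hops.

```python
# e.g. at the top of your test settings or test runner setup
import warnings

# Make anything Django has deprecated fail loudly during tests,
# instead of scrolling past as log noise.
warnings.simplefilter("error", DeprecationWarning)
warnings.simplefilter("error", PendingDeprecationWarning)
```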

Database versions deployment. Entity Framework Migrations vs SSDT DacPacs

I have a data-centered application with SQL Server. The environments in which it'll be deployed are not under our control and there's no DBA there (they are all small businesses), so we need the process of distributing each application/database update to be as automatic as possible.
Besides the normal changes between versions of an application (sometimes hard to predict), we already know that we'll need to distribute some new seed data with each version. Sometimes this seed data will be related to other data in our system. For instance, maybe we'll need to insert 2 new rows of some master data during the v2-v3 update process, and another 5 rows during the v5-v6 update process.
EF
We have looked at Entity Framework DB Migrations (available for existing databases without Code-First since the 4.3.1 release), which represent the traditional sequential scripts in a more automatic and controlled way (like Fluent Migrations).
SSDT
On the other hand, with a different philosophy, we have checked SSDT and its dacpacs, snapshots and pre- and post-deployment scripts.
The questions are:
Which of these technologies / philosophies is more appropriate for the case described?
Any other technology / philosophy that could be used?
Any other advice?
Thanks in advance.
That's an interesting question. Here at Red Gate we're hoping to tackle this issue later this year, as we have many customers asking about how we might provide a simple deployment package. We do have SQL Packager, which essentially wraps a SQL script into an exe.
I would say that dacpacs are designed to cover the use case you describe. However, as far as I understand, they work by generating a deployment script dynamically when applied to the target. The drawback is that you won't have the warm fuzzy feeling that you might get when deploying a pre-tested SQL script.
I've not tried updating data with dacpacs before, so I'd be interested to know how well this works. As far as I recall, it truncates the target tables and repopulates them.
I have no experience with EF migrations so I'd be curious to read any answers on this topic.
We'll probably adopt a hybrid solution. We'd prefer not to give up on the idea of deployment packages, but on the other hand, given our applications' nature (small businesses as end users, no DBA, no obligation to upgrade, so multiple "live" database versions coexisting), we can't give up full control of the migration process either, including schema and data. In our case, pre- and post-deployment scripts may not be enough (or at least not comfortable enough) for a full migration the way EF Migrations are. Changes like adding/removing seed data, changing a "one to many" to a "many to many" relationship, or even radical database schema changes (and, consequently, data migrations to the new schema from any previously released schema) may be part of our daily work once our first version is released.
So we'll probably use EF Migrations, with their "Up" and "Down" methods for each version release. In principle, each "Up" will invoke a dacpac containing the latest database snapshot (and each "Down", the previous one), each with its own deployment parameters for that specific migration. EF Migrations will handle the versioning line, and maybe also some complex parts of the data migration.
We feel more secure with this hybrid approach. We missed automation and schema-change detection in Entity Framework Migrations as much as we missed control of the versioning line with the dacpac approach.

Optimisation tips when migrating data into Sitecore CMS

I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was on Mark Cassidy's blogpost http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why when a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion by at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that only has the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect that there are a lot of other things I could look to remove/disable in order to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL Server
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because that database isn't (by default) connected to the HistoryManager or other gadgets
And if you're really adventurous, there's one thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?

Is there a database implementation that has notifications and revisions?

I am looking for a database library that can be used within an editor to replace a custom document format. In my case the document would contain a functional program.
I want application data to be persistent even while editing, so that when the program crashes, no data is lost. I know that all databases offer that.
On top of that, I want to access and edit the document from multiple threads, processes, possibly even multiple computers.
Format: a simple key/value database would totally suffice. SQL usually needs to be wrapped, and if I can avoid pulling in a heavy ORM dependency, that would be splendid.
Revisions: I want to be able to roll back changes up to the first change to the document that has ever been made, not only in one session, but also between sessions/program runs.
I need notifications: each process must be able to be notified of changes to the document so it can update its view accordingly.
I see these requirements as rather basic, a foundation to solve the usual tough problems of an editing application: undo/redo, multiple views on the same data. Thus, the database system should be lightweight and undemanding.
Thank you for your insights in advance :)
Berkeley DB is an undemanding, light-weight key-value database that supports locking and transactions. There are bindings for it in a lot of programming languages, including C++ and Python. You'll have to implement revisions and notifications yourself, but that's actually not all that difficult.
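As a minimal sketch of the key/value side (using the bsddb3 Python bindings; the file and key names are just examples, and the revision/notification layers would sit on top of this):

```python
from bsddb3 import db  # Python bindings for Berkeley DB

store = db.DB()
# filename, in-file database name, access method, flags
store.open("document.db", None, db.DB_HASH, db.DB_CREATE)

store.put(b"title", b"my functional program")
print(store.get(b"title"))  # b'my functional program'

store.close()
```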
It might be more power than you're asking for, but you should definitely look at CouchDB.
It is a document database with "document" being defined as a JSON record.
It stores all the changes to the documents as revisions, so you instantly get revisions.
It has a powerful JavaScript-based view engine to aggregate all the data you need from the database.
All the commits to the database are written to the end of the repository file and the writes are atomic, meaning that unsuccessful writes do not corrupt the database.
Another nice bonus you'll get is easy and flexible replication of your database.
See the full feature list on their homepage
On the minus side (depending on your point of view) is the fact that it is written in Erlang and (as far as I know) runs as an external process...
I don't know much about notifications, though - it seems that if you are working with replicated databases, the changes are instantly replicated/synchronized between them. Other than that, I suppose you should be able to roll your own notification scheme...
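That said, CouchDB's _changes feed looks like a natural fit for the notification requirement: every process can listen to it over HTTP and react to writes made elsewhere. A rough sketch with the Python requests library (URL, database and document names are invented, and recent CouchDB versions will additionally require credentials):

```python
import json
import requests

COUCH = "http://localhost:5984"
DB = "document"

# Create the database and store a document; the returned _rev is the
# handle into CouchDB's built-in revision tracking.
requests.put(f"{COUCH}/{DB}")
print(requests.put(f"{COUCH}/{DB}/my-doc", json={"title": "my program"}).json())

# Follow the changes feed: each process can consume this stream and
# refresh its view whenever another process writes to the database.
with requests.get(
    f"{COUCH}/{DB}/_changes",
    params={"feed": "continuous", "include_docs": "true"},
    stream=True,
) as changes:
    for line in changes.iter_lines():
        if line:
            print(json.loads(line))
```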
Check out ZODB. It doesn't have notifications built in, so you would need a messaging system there (since you may use separate computers). But it has transactions, you can roll back forever (unless you pack the database, which removes earlier revisions), you can access it directly as an integrated part of the application, or it can run as client/server (with multiple clients of course), you can have automatic persistency, there is no ORM, etc.
It's pretty much Python-only though (it's based on Pickles).
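A minimal sketch of what that looks like in practice (file name and keys invented); every commit becomes a revision that can later be undone:

```python
import transaction
from ZODB import DB
from ZODB.FileStorage import FileStorage

# Open (or create) the storage file backing the document.
db = DB(FileStorage("document.fs"))
connection = db.open()
root = connection.root()

# Each transaction.commit() records a new revision.
root["title"] = "my functional program"
transaction.commit()

# Roll back the most recent transaction; undo information is kept
# until the database is packed.
record = db.undoLog(0, 1)[0]
db.undo(record["id"])
transaction.commit()

db.close()
```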
http://en.wikipedia.org/wiki/Zope_Object_Database
http://pypi.python.org/pypi/ZODB3
http://wiki.zope.org/ZODB/guide/index.html
http://wiki.zope.org/ZODB/Documentation