I've been using South, but I really hate having to manually migrate data all over again even if I make one small update to a class. When I'm not using Django, I can simply alter the table schema, make the matching adjustment in the class, and I'm good.
I know that most people would probably tell me to think the schema through well in advance, but realistically speaking there are times when you need to make changes immediately, and I don't think South is ideal for that.
Is there some sort of advanced method people use, perhaps even modifying the core of Django itself? Or is there something about South that I'm just not grokking?
I really hate having to manually migrate data all over again even if I make one small itty bitty update to a class.
Can you specify what kind of updates? If you mean adding new fields or editing existing ones, then yes, obviously those require a migration. If you mean modifying methods that operate on fields, then there is no need to migrate.
I know that most people would probably tell me to properly think out the schema way in advance
It would certainly help to think it over a couple of times. Experience helps too. But obviously you cannot foresee everything.
but realistically speaking there are times where you need to immediately make changes, and I don't think using south is ideal for this.
Honestly, I am not convinced by this argument. If changes can be deployed "immediately" using SQL, then I'd argue that they can be deployed just as quickly using South, especially if you have automated your deployment with Fabric or the like.
Also, I find it hard to believe that the time taken to execute a generated migration script can be significantly greater than the time taken to first write the appropriate SQL by hand and then execute it. At least this has not been the case in my experience.
The one exception could be a situation where the ORM doesn't readily have an equivalent for the SQL you need. Even in that case, you can execute the raw SQL from within your South migration script.
Or is there something about south that I'm not just grokking?
I suspect that you are not grokking the idea of having orderly, version-controlled, reversible migrations. SQL-only migrations are not always designed to be reversible (I know there are exceptions), and they are not orderly unless the developers take particular care to keep them so. I've even seen people fire potentially troublesome updates at production without pausing to start a transaction first, and then discard the SQL without making any record of it.
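To make the "orderly, version-controlled, reversible" point concrete, here is a minimal sketch of the idea in plain Python and sqlite3. This is not South's API; all names here (the MIGRATIONS list, the applied table) are invented for illustration. South does this bookkeeping for you, plus autogeneration of the steps themselves.

```python
import sqlite3

# Each migration is a (name, forwards_sql, backwards_sql) triple, kept in
# version control in a fixed order -- the property ad-hoc SQL workflows lose.
MIGRATIONS = [
    ("0001_create_author",
     "CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE author"),
    ("0002_add_email",
     "ALTER TABLE author ADD COLUMN email TEXT",
     None),  # not reversible on sqlite; a real tool would flag this too
]

def migrate(conn, target=len(MIGRATIONS)):
    # Record which steps have run, so re-running is a harmless no-op.
    conn.execute("CREATE TABLE IF NOT EXISTS applied (name TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT name FROM applied")}
    for name, forwards, _ in MIGRATIONS[:target]:
        if name not in done:
            conn.execute(forwards)
            conn.execute("INSERT INTO applied VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")
migrate(conn)
cols = [row[1] for row in conn.execute("PRAGMA table_info(author)")]
```

Running migrate() twice does nothing new, and adding a 0003 step later applies only that step on every database, old or new: exactly the property a pile of hand-run SQL scripts lacks.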
I'm not questioning your skills or attention to detail here; I'm just pointing out what I think is your disconnect with South.
Hope this helps.
I'm working on a pretty complex Django project (50+ models) with some complicated logic (lots of different workflows, views, signals, APIs, background tasks etc.). Let's call this project-base. Currently using Django 1.6 + South migrations and quite a few other 3rd party apps.
Now, one of the requirements is to create a fork of this project that will add some fields/models here and there and some extra logic on top of that. Let's call this project-fork. Most of the extra work will be on top of the existing models, but there will also be a few new ones.
As project-base continues to be developed, we want these features to also get into project-fork (much like a rebase/merge in git-land). The extra changes in project-fork will not be merged back into project-base.
What could be the best possible way to accomplish this? Here are some of my ideas:
Use South merges in project-fork to keep it up to date with the latest changes from project-base, as explained here. Use signals and any other means necessary to keep the new logic in project-fork as loosely coupled as possible, to avoid any potential conflicts.
Do not modify ANY of the original project-base models and instead create new models in different apps that reference the old models (e.g. using a OneToOneField). Extra logic could end up in the old and/or new apps.
your idea here please :)
I would go with option 1 as it seems less complicated as a whole, but might expose a greater risk. Here's how I would see it happening:
Migrations on project-base:
0001_project_base_one
0002_project_base_two
0003_project_base_three
Migrations on project-fork:
0001_project_base_one
0002_project_fork_one
After merge, the migrations would look like this:
0001_project_base_one
0002_project_base_two
0002_project_fork_one
0003_project_base_three
0004_project_fork_merge_noop (added to merge in changes from both projects)
Are there any pitfalls using this approach? Is there a better way?
Thank you for your time.
Official South workflow:
The official South recommendation is to try with the --merge flag: http://south.readthedocs.org/en/latest/tutorial/part5.html#team-workflow
This obviously won't work in all cases, but in my experience it works in most.
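Assuming a hypothetical app name of myapp, the flow from that page looks roughly like this (a sketch only; check the linked docs for your South version):

```shell
# After merging the branches in git, South sees migrations from both sides:
./manage.py migrate myapp --merge        # apply the out-of-order migrations anyway
# Optionally record an empty "merge point" migration to keep later history linear:
./manage.py schemamigration myapp --empty merge_branches
```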
Pitfalls:
Multiple changes to the same model can still break
Duplicate changes can break things
The "better" way is usually to avoid simultaneous changes to the same models; the easiest way to do this is to reduce the error window as much as possible.
My personal workflows in these cases:
With small forks where the model changes are obvious from the beginning:
Discuss which model changes will need to be made for the fork
Apply those changes to both/all branches as fast as possible to avoid conflicts
Work on the fork
Merge the fork which gives no new migrations
With large forks where the changes are not always obvious and/or will change again:
Do the normal fork and development stuff while trying to stay up to date with the latest master/develop branch as much as possible
Before merging back, throw away all schemamigrations in the fork
Merge all changes from the master/develop
Recreate all needed schema changes
Merge to develop/master
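For the large-fork routine above, the commands might look like this (a sketch; the app name, branch name, and migration file pattern are hypothetical):

```shell
# Steps 1-2: drop the fork's own schemamigrations, then merge upstream's as-is
git rm myapp/migrations/00*_fork_*.py
git merge develop
# Step 3: recreate all of the fork's schema changes as a single new migration
./manage.py schemamigration myapp --auto
# Step 4: apply, then merge back to develop/master
./manage.py migrate myapp
```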
We've been using Liquibase for our database migrations, which runs off the Migrations.xml file. This file references all of our migrations, so it looks something like:
<include file="ABC.xml" relativeToChangelogFile="true" />
<include file="DEF.xml" relativeToChangelogFile="true" />
<include file="HIJ.xml" relativeToChangelogFile="true" />
...
However, over time this has grown immensely long. Looking to refactor this: has anyone solved this issue? What options are available to condense all of these going forward?
Thanks
There is no single answer; what works best depends on your environment, your process, and why you want to trim them down.
Generally, I recommend just letting your changelog grow, because the existing steps are the ones you have tested, deployed, and know are correct. Changing them introduces risk, which is best avoided. Keeping your changelog unchanged also allows you to continue to bootstrap new dev/QA databases the same way older instances were set up.
While there is some performance penalty to having a large changelog, Liquibase tries to keep it to a minimum: it primarily just parses the changelog file and then reads from the DATABASECHANGELOG table, which should scale well. Since liquibase update is usually run infrequently, even if it takes a couple of seconds it's often not a big deal.
All that being said, there are times when it makes sense to clear out your changelog. The best way to do that depends on the problem you are trying to solve.
The easiest approach is usually to just remove the include references to your oldest changelog files. If you know that all your databases already have the changeSets in ABC.xml and DEF.xml applied, you can remove the references to them from your master changelog and everything will be fine: Liquibase doesn't care if there are "unknown" changeSets in the DATABASECHANGELOG table. If you want to continue bootstrapping dev and QA environments with ABC.xml and DEF.xml, you can create a second "legacy" master changelog that includes them, and either run both changelogs when needed or make sure the legacy changelog includes both the old and new changelogs.
If you do not want to completely remove changelog references, you can manually modify the existing changeSets. Often just a few changes can make a big difference in update performance. For example, if an index was created and then later dropped, you can skip the cost of the create by removing both the create and the drop changeSets. Again, Liquibase doesn't mind on databases that have already run those changeSets, and new databases will simply never see them. If some databases may still have the index and you want it removed, you can put an indexExists precondition on the drop changeSet after you delete the create changeSet.
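For that last case, the precondition could look like this (the index and table names are made up; onFail="MARK_RAN" records the changeSet as run when the index is already gone, so it is skipped on those databases):

```xml
<changeSet id="drop-obsolete-index" author="dev">
    <preConditions onFail="MARK_RAN">
        <indexExists tableName="customer" indexName="idx_customer_email"/>
    </preConditions>
    <dropIndex tableName="customer" indexName="idx_customer_email"/>
</changeSet>
```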
Precondition checks can be expensive as well, especially depending on the database vendor. Sometimes removing now-unneeded precondition checks can improve performance as well.
So far this has not been an issue for us. Yes, you do end up with a lot of includes, but so far I simply did not care. I would strongly suggest not deleting any old changeSets or making changes to them. Try to keep everything reproducible as-is.
Maybe it helps to group changeSets by the version of your application. By that I mean that your migrations.xml file includes one file per version of your application, and each of those in turn includes your "real" changeSets. That way you can also easily figure out which changeSet belongs to which version of your application.
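For example, the master changelog could contain one include per release, each of which pulls in that release's "real" changeSets (the file names here are hypothetical):

```xml
<!-- Migrations.xml: one include per application release -->
<include file="changelog-v1.0.xml" relativeToChangelogFile="true" />
<include file="changelog-v1.1.xml" relativeToChangelogFile="true" />
<include file="changelog-v2.0.xml" relativeToChangelogFile="true" />
```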
I have a data-centered application with SQL Server. The environments in which it'll be deployed are not under our control and there's no DBA there (they are all small businesses), so we need the process of distributing each application/database update to be as automatic as possible.
Besides the normal changes between versions of an application (which can be unpredictable), we already know that we'll need to distribute some new seed data with each version. Sometimes this seed data will be related to other data in our system. For instance: maybe we'll need to insert 2 new rows of master data during the v2-v3 update process, and some other 5 rows during the v5-v6 update process.
EF
We have checked Entity Framework Db Migrations (available for existing databases with no Code-First since 4.3.1 release), which represents the traditional sequential scripts in a more automatic and controlled way (like Fluent Migrations).
SSDT
On the other hand, with a different philosophy, we have checked SSDT and its dacpacs, snapshots and pre- and post-deployment scripts.
The questions are:
Which of these technologies / philosophies is more appropriate for the case described?
Any other technology / philosophy that could be used?
Any other advice?
Thanks in advance.
That's an interesting question. Here at Red Gate we're hoping to tackle this issue later this year, as we have many customers asking about how we might provide a simple deployment package. We do have SQL Packager, which essentially wraps a SQL script into an exe.
I would say that dacpacs are designed to cover the use case you describe. However, as far as I understand, they work by generating a deployment script dynamically when applied to the target. The drawback is that you won't have the warm fuzzy feeling that you might get when deploying a pre-tested SQL script.
I've not tried updating data with dacpacs before, so I'd be interested to know how well this works. As far as I recall, it truncates the target tables and repopulates them.
I have no experience with EF migrations so I'd be curious to read any answers on this topic.
We'll probably adopt a hybrid solution. We'd rather not give up on the idea of deployment packages, but on the other hand, due to our applications' nature (small businesses as final users, no DBA, no obligation to upgrade, so multiple "live" database versions coexisting), we can't give up full control of the migration process either, including schema and data. In our case, pre- and post-deployment scripts may not be enough (or at least not comfortable enough) for a full migration the way EF Migrations handle one. Changes like adding/removing seed data, changing a one-to-many to a many-to-many relationship, or even radical database schema changes (and, consequently, data migrations to the new schema from any previously released schema) may be part of our daily work once our first version is released.
So we'll probably use EF Migrations, with their "Up" and "Down" methods for each version release. In principle, each Up will invoke a dacpac with the latest database snapshot (and each Down, the previous one), each with its own deployment parameters for that specific migration. EF Migrations will handle the versioning line, and maybe also some complex parts of the data migration.
We feel more secure with this hybrid approach. We missed automation and schema-change detection in Entity Framework Migrations as much as we missed versioning-line control in the dacpac approach.
I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was on Mark Cassidy's blogpost http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed by at least tenfold!
The next thing I looked at was his first point, which basically suggests creating a version of web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are a lot of other things I could look to remove or disable to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases; they tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's a thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is, that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?
I am writing a website using Django. I need to push the web site out as soon as possible. I don't need a lot of amazing things right now.
I am concerned about future development.
If I enable registration, that means I allow more content to be written by users; if I don't, then only the admins can publish content. The website isn't exactly a CMS.
This is a big problem, as I will continue to add new features and rewrite code (either by adapting third-party apps or by rewriting the app itself). So how would either path affect my database contents?
So the bottom line is: how do I ensure the safety of my data as development continues?
I hope someone can offer a little insights on this matter.
Thank you very much. It's hard to describe my concern, really.
Whatever functionality you add afterwards, if you add new fields, etc., you can still migrate your data to the "new" database.
It becomes more complicated with relationships, because you might have integrity problems. Say you have a Comment model, and say you don't enable registration, so any visitor can comment on certain posts. If you later decide to enable registration, and you decide that ALL comments have to be associated with a user, then you will have problems migrating your data, because you'll have lots of comments for which you'll have to make up a user, or that you'll just have to drop. Of course, there are work-arounds in that case, but it illustrates some of the problems you might encounter later.
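The comment/user problem can be seen concretely with plain sqlite3 (no Django involved; all table and column names here are invented for the sketch): existing anonymous comments have to be backfilled with a made-up user before an "every comment has an owner" rule can hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Original schema: comments with no notion of a user.
c.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT)")
c.executemany("INSERT INTO comment (body) VALUES (?)",
              [("first!",), ("nice post",)])

# Later migration: introduce users and a (for now nullable) user_id column...
c.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
c.execute("ALTER TABLE comment ADD COLUMN user_id INTEGER REFERENCES user(id)")

# ...then backfill the existing anonymous comments with a made-up user --
# the step you cannot avoid if every comment must now have an owner.
c.execute("INSERT INTO user (name) VALUES ('anonymous')")
anon_id = c.lastrowid
c.execute("UPDATE comment SET user_id = ?", (anon_id,))

orphans = c.execute(
    "SELECT COUNT(*) FROM comment WHERE user_id IS NULL").fetchone()[0]
```

The alternative to the backfill is dropping the orphaned comments, which is exactly the unpleasant choice described above.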
Personally, I try to start with a good data model, with only the minimum necessary fields (more fields will come later, with new functionality). I especially try to avoid having to add new foreign keys to already-existing models. For example, it is fine to add a new model later with a foreign key to an existing model, but the opposite is more complicated.
Finally, I am not sure why you hesitate to enable registration. It is actually very simple to do (you can, for example, use django-registration; you would just have to write some urlconf entries and some templates, and that's all).
Hope this helps!
If you are afraid of data migration, just use South...