Django SQLite to Postgres database migration - django

I have a quite large db of 750 MB which I need to migrate from SQLite to Postgres. Because the db is quite large, I am facing some issues that previous questions on the same topic did not have.
A few days ago I migrated a 30 MB SQLite db without any issues using the loaddata and dumpdata commands. But for this large db, one of my apps throws a "Database image is malformed" error when running dumpdata. Another of my apps dumps successfully but does not load. I have seen with the -v 3 verbosity flag that the objects are not even processed. To be precise: while running loaddata, the data from the JSON file is processed first to check for duplicate primary keys and other model constraints, and then that data is used to create model objects. But for this app, the data is never processed in the first place.
Apart from these two commands, there are other methods that do the migration. But the schema is completely changed by those methods, which I don't want. Moreover, I have faced issues where a DurationField becomes a string after migration and I couldn't typecast those fields back, bigint becomes int, varchar becomes text, and so on. I cannot afford these kinds of issues.

I found the causes of these problems.
"Database image is malformed" occurred because I was copying the entire db file directly. As angardi suggested, I compressed the db file with gzip and then copied it, which solved this issue.
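For reference, a minimal sketch of the compress-then-copy workaround in Python, assuming the database file is named db.sqlite3:

import gzip
import shutil

# Compress the SQLite file before copying it between machines, instead
# of copying the raw .sqlite3 file directly:
with open("db.sqlite3", "rb") as src, gzip.open("db.sqlite3.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)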
There were two reasons why loaddata was not loading a certain app. That app contained a model with a foreign key to itself, and one entry of that model had the foreign key pointing at the entry itself. This resulted in an infinite loop, and the app wasn't loading. Even after fixing that entry I was facing the same issue. This happened because I had dumped the data of a single app. One model of the app had a ManyToMany field pointing to a model of another app. While loaddata throws an error when a ForeignKey reference is not found, I think it kept searching for the ManyToMany targets without throwing an error, which turned into an infinite-loop-like situation. All I had to do here was dump the entire database, or the related apps into the same file, as sketched below.
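A hedged sketch of that fix, dumping the related apps into one fixture file so the ManyToMany targets are present at load time (the app labels are assumptions):

from django.core.management import call_command

# Dump the app together with the app its ManyToMany field points at;
# omit the labels entirely to dump the whole database instead:
call_command("dumpdata", "myapp", "otherapp", output="combined.json")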

Read this for the "image is malformed" part.
When migrating from one database to another, you should first dump all data, then run the Django migrations on the new database, and then load the data. That should retain all column types.
When running loaddata, the tables that have no dependencies should be loaded first, and then the ones that depend on those already loaded.
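Putting that sequence together, a minimal sketch using call_command; run the steps separately, switching DATABASES in settings.py to Postgres between the dump and the migrate (the excluded apps and file name are assumptions):

from django.core.management import call_command

# Step 1, against SQLite: dump everything; contenttypes and permissions
# are excluded because migrate recreates them on the new database.
call_command("dumpdata", exclude=["contenttypes", "auth.permission"],
             natural_foreign=True, output="everything.json")

# Step 2, against Postgres: create the schema from the models, which is
# what preserves DurationField, bigint, varchar, and the other column types.
call_command("migrate")

# Step 3, against Postgres: load the dump. With a single file, Django can
# resolve references within it; with several files, pass the ones with
# no dependencies first.
call_command("loaddata", "everything.json")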

Related

Load fixtures + add Page: IntegrityError (duplicate key value)

I have a migration that loads a fixture for populating the database with a basic site structure (from Loading initial data with Django 1.7 and data migrations). After that migration ran, my test adds a (custom) NewsPage. This yields:
IntegrityError at /admin/pages/add/website/newspage/5/
duplicate key value violates unique constraint "wagtailcore_page_pkey"
DETAIL: Key (id)=(3) already exists.
The same happens when I add the page through the admin interface.
It's a bit suspicious that the Page with pk=3 is the first one that is created in my fixture. The other two pk's were already created by Wagtail's migrations.
I've read up about fixtures and migrations, and it seems Postgres won't reset the primary key sequences. I'm assuming this is also my problem here.
I found a possible solution in Django: loaddata in migrations errors, but that failed (with "psycopg2.ProgrammingError: syntax error at or near "LINE 1: BEGIN;""). Trying to execute the gist of it, I ran the sqlsequencereset management command (./manage.py sqlsequencereset wagtailcore myapp), but I still get the error, although now for id=4.
Is my assumption correct that Postgres not resetting the primary key sequences is my problem here?
Does anyone know how to reliably fix that from/after a migration loaded fixtures?
Would it maybe be easier / more reliable to create content in Python code?
Edit (same day):
If I don't follow the example in Loading initial data with Django 1.7 and data migrations, but just run the management command, it works:
from io import StringIO
from django.core.management import call_command

def load_fixture(fixture_file):
    """Load a fixture, capturing loaddata's output."""
    commands = StringIO()
    call_command('loaddata', fixture_file, stdout=commands)
I don't know what the drawbacks of this simpler approach are.
Edit 2 (also same day):
OK, I do know: the fixtures will be loaded against the current model state, not the state the migration was written for, so this will likely break when your models change.
I converted the whole thing to Python code. That works and will likely keep working. Today I learned: don't load fixtures in migrations. (Pity, it would have been a nice shortcut.)
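For the sequence problem itself, the usual post-fixture step is to capture sqlsequencereset's SQL and execute it; colored terminal output is one known cause of "syntax error at or near" failures, so no_color is set in this sketch (the app labels follow the question and are otherwise assumptions):

from io import StringIO
from django.core.management import call_command
from django.db import connection

# Capture the sequence-reset SQL; no_color strips the ANSI escape codes
# that can otherwise make Postgres report a syntax error:
sql = StringIO()
call_command('sqlsequencereset', 'wagtailcore', 'myapp',
             no_color=True, stdout=sql)
with connection.cursor() as cursor:
    cursor.execute(sql.getvalue())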

Schema migration commit changes

I have the following situation:
more than one schema migration
one data migration
It would be simple enough, but I encountered a problem with the data migration. It runs a query for a specific ContentType that I need for django-taggit. The problem is that the model I want to query did not exist until the migration that precedes the data migration, so the query errors out with an empty result.
However, when I run all migrations up to the data migration and then run the data migration by itself, everything works. I've noticed that the migration process doesn't commit changes to the database until all of the migrations are finished, which doesn't work here.
One solution I considered was to manually commit/save changes to the database partway through, but I haven't found a way to do that. Of course, if there are any other ideas or better solutions, I'd be happy to hear them.
This is the code where the data migration errors out:
# ChallengeContest ContentType
challenge_contest_ct = ContentType.objects.get(model='challengecontest')
As you can see, the model challengecontest is the one that was created in a migration preceding the data migration.
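One workaround, sketched below: the ContentType row for a new model is normally created by a post_migrate signal that only fires after the whole migrate run finishes, so a data migration in the same run can create the missing row itself with get_or_create. The app label and migration names here are hypothetical:

from django.db import migrations

def forwards(apps, schema_editor):
    ContentType = apps.get_model('contenttypes', 'ContentType')
    # Create the row if the post_migrate signal hasn't produced it yet:
    challenge_contest_ct, _ = ContentType.objects.get_or_create(
        app_label='myapp', model='challengecontest',
    )
    # ... use challenge_contest_ct for the django-taggit rows ...

class Migration(migrations.Migration):
    dependencies = [
        ('contenttypes', '0002_remove_content_type_name'),
        ('myapp', '0002_challengecontest'),  # the preceding schema migration
    ]
    operations = [migrations.RunPython(forwards, migrations.RunPython.noop)]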
I have found data migrations to be more trouble than they're worth. In my last two jobs we abandoned them, replacing them with one-off management commands.
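A minimal sketch of that management-command alternative; the file path and names are hypothetical:

# myapp/management/commands/backfill_challenge_tags.py
from django.contrib.contenttypes.models import ContentType
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "One-off backfill that would otherwise live in a data migration."

    def handle(self, *args, **options):
        # Runs after migrate has finished, so the ContentType row exists:
        ct = ContentType.objects.get(model='challengecontest')
        self.stdout.write(f"Tagging against content type {ct}")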

Update database records from data fixtures

How can I update an already populated Django database with records from a data fixture (or other serialized records)?
I know I can use Django data fixtures to provide initial data. Can I use the same already written features to update the database from data fixtures (or similar serialized data like a JSON document)?
The “insert or update from serialised data” operation should be idempotent:
If a record doesn't exist (by its key) in the database, it should be inserted.
If a record already exists (by its key) in the database, it should be updated to match the data fixture.
The end state should be that all data from the data fixture is present in the database, regardless of whether the records already existed.
Specifically, can I update existing rows by specifying pk=null and using natural keys?
How can I use the existing Django “load data” features (whether loaddata or something else similar in Django), to read serialised data, insert records if they don't exist and update them if they already exist?
Yes, you can with loaddata:
python3 -m django loaddata path/to/fixture/some_new_data.json
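If you want matching by natural key rather than by pk, a hedged sketch of producing such a fixture; this requires a get_by_natural_key() method on the model's default manager, and the app label is an assumption:

from django.core.management import call_command

# --natural-primary omits pks from the dump, so loaddata identifies
# existing rows by natural key instead of raw pk:
call_command("dumpdata", "myapp", natural_foreign=True,
             natural_primary=True, output="some_new_data.json")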
BUT:
don't call your new data initial_data.json or it will be reloaded every time you run syncdb or a South migration, and it will wipe out any changes made since the fixture was loaded (as mentioned mid-way through this page).
related to the above, loading data via fixtures can be a brittle way to do things unless you're sure that your new data won't arrive while you're writing your fixture. If it does, you risk a clash and either a failure or 'incorrect' data in your DB.
Depending on the nature of your data, it may be better to use a data migration in South. A data migration might be good when you're setting just a couple of values for lots of records (eg, changing the default of a field, or filling in an empty field with data). A data migration also lets you apply some checks/logic to things, via Python, so that you can target which records you update, or update different records in different ways, rather than writing lots of fixtures. If, however, you want to load in a ton of new records, a fixture would make more sense.
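For the data-migration route, the usual building block is update_or_create keyed on the natural key; a minimal sketch with hypothetical model and field names:

from myapp.models import Widget  # hypothetical model

def upsert(rows):
    # Idempotent insert-or-update from already-parsed serialized data:
    # insert when the natural key is missing, update when it exists.
    for row in rows:
        Widget.objects.update_or_create(
            slug=row["slug"],  # the natural key
            defaults={"title": row["title"], "price": row["price"]},
        )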

Fixtures causing problems when migrating

One of my apps has an initial_data.json file in /fixtures, which works great when using vanilla Django to specify some initial data. However, when migrating with South, when it steps through 'Loading initial data for ' I get an IntegrityError, because duplicate data already exists. This makes sense: my migration didn't empty the table, so the initial data from previous calls to syncdb is already there.
How can I either 1) tell South not to load initial data while migrating, or 2) modify initial_data.json or other Django files so that duplicate-data errors are handled gracefully instead of crashing the migrate process in South?
Migrate using the --no-initial-data option, e.g. ./manage.py migrate myapp --no-initial-data.

Django: flush command doesn't completely clear database, reset fails

I rewrote a lot of my models, and since I am just running a test server, I do ./manage.py reset myapp to reset the db tables and everything has been working fine.
But when I tried to do it this time, I got an error:
constraint "owner_id_refs_id_9036cedd" of relation "myapp_tagger" does not exist
So I figured I would just nuke the whole site and start fresh. I ran ./manage.py flush and then syncdb. This did not raise an error and deleted all my data, but it did not update the schema, because when I try to access any of my app's objects I get a column-not-found error. I thought that flush was supposed to drop all tables. The syncdb run said that no fixtures were added.
I assume the error is related to the fact that I changed the tagger model to have a ForeignKey named owner tied to another object.
I have tried adding related_name to the ForeignKey arguments, and nothing seems to work.
I thought that flush was supposed to drop all tables.
No. According to the documentation, manage.py flush doesn't drop the tables. Instead it does the following:
Returns the database to the state it was in immediately after syncdb was executed. This means that all data will be removed from the database, any post-synchronization handlers will be re-executed, and the initial_data fixture will be re-installed.
As stated in chapter 10 of The Django Book in the "Making Changes to a Database Schema" section,
syncdb merely creates tables that don't yet exist in your database — it does not sync changes in models or perform deletions of models. If you add or change a model's field, or if you delete a model, you’ll need to make the change in your database manually.
Therefore, to solve your problem you will need to either:
Delete the database and reissue manage.py syncdb. This is the process that I use when I'm still developing the database schema. I use an initial_data fixture to install some test data, which also needs to be updated when the database schema changes.
Manually issue the SQL commands to modify your database schema (see the sketch below).
Use South.
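For the manual-SQL option above, a minimal sketch run from a Django shell; the table and column names are guesses based on the question:

from django.db import connection

with connection.cursor() as cursor:
    # Add the new ForeignKey column by hand, since syncdb will not
    # alter tables that already exist:
    cursor.execute(
        "ALTER TABLE myapp_tagger ADD COLUMN owner_id integer "
        "REFERENCES myapp_owner (id)"
    )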