Update database records from data fixtures - django

How can I update an already populated Django database with records from a data fixture (or other serialized records)?
I know I can use Django data fixtures to provide initial data. Can I use the same already written features to update the database from data fixtures (or similar serialized data like a JSON document)?
The “insert or update from serialised data” operation should be idempotent:
If a record doesn't exist (by its key) in the database, it should be inserted.
If a record already exists (by its key) in the database, it should be updated to match the data fixture.
The end state should be that all data from the fixture is present in the database, whether or not the records already existed.
Specifically, can I update existing rows by specifying pk=null and using natural keys?
How can I use the existing Django “load data” features (whether loaddata or something else similar in Django), to read serialised data, insert records if they don't exist and update them if they already exist?

Yes, you can with loaddata:
python3 -m django loaddata path/to/fixture/some_new_data.json
BUT:
don't call your new fixture initial_data.json, or it will be reloaded every time you run syncdb or a South migration, and it will wipe out any changes made since the fixture was loaded (as mentioned mid-way through this page).
Related to the above, loading data via fixtures can be a brittle way to do things unless you're sure that no new data will arrive while you're writing your fixture. If it does, you risk a clash and either a failure or 'incorrect' data in your DB.
Depending on the nature of your data, it may be better to use a data migration in South. A data migration might be good when you're setting just a couple of values for lots of records (e.g. changing the default of a field, or filling in an empty field with data). A data migration also lets you apply checks and logic via Python, so that you can target which records you update, or update different records in different ways, rather than writing lots of fixtures. If, however, you want to load a ton of new records, a fixture makes more sense.
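If you'd rather express the insert-or-update logic yourself in Python (in a data migration or a standalone script) rather than in a fixture, a minimal sketch might use update_or_create; the Category model, its fields, and the records list below are assumptions for illustration:

from myapp.models import Category  # hypothetical app and model

# Serialized records keyed by a natural key ("slug" here), e.g. parsed
# from a JSON document.
records = [
    {"slug": "books", "title": "Books"},
    {"slug": "music", "title": "Music & Audio"},
]

for record in records:
    # update_or_create is idempotent: it inserts a Category if none with
    # this slug exists, and updates it to match the record otherwise.
    Category.objects.update_or_create(
        slug=record["slug"],
        defaults={"title": record["title"]},
    )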

Related

Django sqlite to postgres database migration

I have a quite large db of 750 MB which I need to migrate from sqlite to postgres. As the db is quite large, I am facing some issues that some of the previous questions on the same topic did not have.
A few days ago, I migrated one sqlite db of size 30 MB without any issues using the loaddata and dumpdata commands. But for this large db, one of my apps throws a Database image is malformed error when running the dumpdata command. Another of my apps dumps successfully but does not load. I have seen with the -v 3 verbose flag that the objects are not even processed. To be precise: while running the loaddata command, data from the json file is processed first to check duplicate primary keys and other model constraints, and then that data is used to create model objects. But for this app, the data is not processed in the first place.
Apart from these two commands, there are some other methods that do the migration. But the schema is completely changed in those approaches, which I don't want. Moreover, I have faced issues where a DurationField becomes a string after migration and I couldn't typecast those fields; bigint becomes int, varchar becomes text, and so on. I cannot afford this kind of issue.
I found the causes of these problems.
Database image is malformed occurred because I was copying the entire db file directly. As angardi suggested, I compressed the db file using gzip and then copied it, which solved the issue.
There were two issues because of which loaddata was not loading a certain app. That app contained a model with a foreign key to itself, and one of its entries had the foreign key set to that same entry. This resulted in an infinite loop and the app wasn't loading. Even after fixing the entry I was facing the same issue. This happened because I had dumped the data of one specific app only. One of the app's models had a ManyToMany field pointing to a model in another app. Although loaddata throws an error when a ForeignKey reference is not found, I think it kept searching for the ManyToMany references without throwing an error, which turned into an infinite-loop-like situation. All I had to do was dump the entire database, or the related apps, into the same file.
Read this for the image malformed part.
When migrating from one database to another, you should first dump all the data, then run the Django migrations on the new database, and then load the data. That should retain all column types.
When running loaddata, the tables that have no dependencies should be loaded first and then the ones that depend on those already loaded.
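Putting that recommended order together, a minimal sketch of the whole workflow might look like this. The two settings module names are assumptions for illustration; contenttypes and auth.permission are excluded because migrate recreates them on the new database, which commonly avoids primary-key clashes on load:

import subprocess

# 1. Dump everything from the old SQLite database.
subprocess.run(
    ["python", "manage.py", "dumpdata",
     "--natural-foreign", "--natural-primary",
     "--exclude", "contenttypes", "--exclude", "auth.permission",
     "--settings", "settings_sqlite",       # hypothetical settings module
     "--output", "full_dump.json"],
    check=True,
)

# 2. Create the schema on the new PostgreSQL database via migrations, so
#    column types (DurationField, bigint, varchar, ...) come from the
#    models rather than from SQLite's loose typing.
subprocess.run(
    ["python", "manage.py", "migrate",
     "--settings", "settings_postgres"],    # hypothetical settings module
    check=True,
)

# 3. Load the dump into PostgreSQL.
subprocess.run(
    ["python", "manage.py", "loaddata", "full_dump.json",
     "--settings", "settings_postgres"],
    check=True,
)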

Am I required to use django data migrations?

I migrated a database that was manipulated via SQL to use Django's migration module: https://docs.djangoproject.com/en/3.0/topics/migrations/.
Initially I thought of using migrations only for changes in the model, such as changes in columns or new tables, and performing deletes, inserts and updates through SQL, as they happen constantly and writing a migration for each case would be a little impractical.
Could using the migration module without using data migrations cause me some kind of problem or inconsistency in the future?
You can think about a data migration if you made a change that also requires a manual fix to the data.
Maybe you decided to normalize something in the database, e.g. to split a name column into a first name and a last name. If you have only one instance of the application with one database and you are the only developer, then you will probably not write a data migration either. But if you want to make the change on a live production site with 24x7 traffic, or you cooperate with other developers, then you should prepare a data migration for their databases, and thoroughly test it on a copy of live data, so that the update works correctly on the production site without issues and with minimal downtime. If you didn't write a data migration and had no problem immediately, then it is OK, and no worse than a possible ill-conceived data migration.
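A minimal sketch of the "split a name column" data migration mentioned above; the app label, model, field names, and migration numbers are assumptions for illustration:

from django.db import migrations


def split_names(apps, schema_editor):
    # Use the historical model via apps.get_model(), not a direct import,
    # so the migration keeps working as models.py evolves.
    Person = apps.get_model("people", "Person")
    for person in Person.objects.all():
        first, _, last = person.name.partition(" ")
        person.first_name = first
        person.last_name = last
        person.save(update_fields=["first_name", "last_name"])


class Migration(migrations.Migration):

    dependencies = [
        # Depends on the schema migration that added the new columns.
        ("people", "0007_person_first_name_last_name"),
    ]

    operations = [
        migrations.RunPython(split_names, migrations.RunPython.noop),
    ]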

Mix data and schema migrations in one migrations file (Django)?

I've heard the opinion that mixing data migrations and structure migrations is bad practice in Django, even if you specify atomic=False in your Migration class. But I could not find any information on this topic; even my more experienced colleagues could not answer this question.
So, is it bad to mix data and structure migrations? If so, why? What exactly may happen if I do it?
There is an actual reason for not mixing data and schema migrations in one migration, mentioned in the entry for the RunPython operation in the Django docs:
On databases that do support DDL transactions (SQLite and PostgreSQL), RunPython operations do not have any transactions automatically added besides the transactions created for each migration. Thus, on PostgreSQL, for example, you should avoid combining schema changes and RunPython operations in the same migration or you may hit errors like OperationalError: cannot ALTER TABLE "mytable" because it has pending trigger events.
It should also be noted that for databases that do not support DDL transactions, it may be easier to fix the database after an unsuccessful migration attempt when data and schema migration operations are not mixed together, as data migration operations can be rolled back automatically in Django.
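As a concrete illustration of that advice, the schema change and the data change can live in two consecutive migration files instead of one. The app label, model, and field names below are assumptions:

# myapp/migrations/0002_add_slug.py -- schema changes only
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [("myapp", "0001_initial")]
    operations = [
        migrations.AddField(
            model_name="product",
            name="slug",
            field=models.SlugField(blank=True, default=""),
        ),
    ]

# myapp/migrations/0003_populate_slug.py -- data changes only
from django.db import migrations
from django.utils.text import slugify

def populate_slugs(apps, schema_editor):
    Product = apps.get_model("myapp", "Product")
    for product in Product.objects.filter(slug=""):
        product.slug = slugify(product.name)
        product.save(update_fields=["slug"])

class Migration(migrations.Migration):
    dependencies = [("myapp", "0002_add_slug")]
    operations = [
        migrations.RunPython(populate_slugs, migrations.RunPython.noop),
    ]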
In the past the best practice was to keep them separate. The second sentence in this section in the docs says:
Migrations that alter data are usually called “data migrations”; they’re best written as separate migrations, sitting alongside your schema migrations.
But it doesn't list any reasons why. Since Django ~2.0 I've been allowing small data migrations to occur alongside schema migrations. However, there have been times when the migration simply couldn't run with the schema migration. There are two main cases that I've run into.
The data migration takes a long time and shouldn't be a migration in the first place. The resolution was to simply run a script that did what the data migration would have done, but in batches (a sketch of this batched approach follows after these two cases).
Attempting to add/update data, then creating an index. This forced me to split the migration into two separate files. I don't remember the exact error, but it simply wouldn't migrate. This shouldn't cause problems for you unless non-atomic migrations are running, which would leave your DB in an unexpected state.
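For the first case, a batched script run outside the migration framework might look roughly like this; the Record model and its processed field are hypothetical:

from myapp.models import Record  # hypothetical model

def backfill_in_batches(batch_size=1000):
    # Process rows in fixed-size chunks so each UPDATE commits quickly
    # and never holds the long locks one huge migration would.
    while True:
        pks = list(
            Record.objects.filter(processed=False)
            .order_by("pk")
            .values_list("pk", flat=True)[:batch_size]
        )
        if not pks:
            break
        Record.objects.filter(pk__in=pks).update(processed=True)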

How can I add new models and do migrations without restarting the server manually?

For the app I'm building, I need to be able to create a new data model in models.py automatically, as fast as possible.
I created a way to do this by making a separate Python program that opens models.py, edits it, closes it, and runs the migrations automatically, but there must be a better way.
Edit: my method works on my local server but not on PythonAnywhere.
In the Django documentation, I found SchemaEditor, which is exactly what you want. Using the SchemaEditor, you can create models, delete models, add fields, delete fields, etc.
Here's an excerpt:
Django’s migration system is split into two parts; the logic for calculating and storing what operations should be run (django.db.migrations), and the database abstraction layer that turns things like “create a model” or “delete a field” into SQL - which is the job of the SchemaEditor.
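For illustration only, a minimal sketch of driving SchemaEditor directly might look like this. It assumes an installed app named myapp, builds the model class at runtime, and bypasses the migration framework and its history entirely:

from django.db import connection, models

# Build a model class at runtime instead of editing models.py.
attrs = {
    "__module__": "myapp.models",                    # hypothetical app
    "Meta": type("Meta", (), {"app_label": "myapp"}),
    "name": models.CharField(max_length=100),
}
DynamicThing = type("DynamicThing", (models.Model,), attrs)

# Ask the database backend's SchemaEditor to emit the CREATE TABLE.
with connection.schema_editor() as schema_editor:
    schema_editor.create_model(DynamicThing)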
Don't rewrite your models.py file automatically, that is not how it's meant to work. When you need more flexibility in the way you store data, you should do the following:
think hard about what kind of data you want to store and make your data model more abstract to fit more cases, if needed.
Use JSON fields to store arbitrary JSON data with your model (e.g. for the Postgres database; see the sketch after this list)
if it's not a fit, don't use Django's ORM and use a different store (e.g. Redis for key-value or MongoDB for JSON documents)
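A minimal sketch of the JSON-field suggestion above, as it would appear in a models.py; the model and field names are illustrative, and models.JSONField works on all supported backends since Django 3.1:

from django.db import models

class Item(models.Model):
    kind = models.CharField(max_length=50)
    # Arbitrary, schema-less attributes live in one JSON column instead
    # of requiring new columns or tables for each use case.
    attributes = models.JSONField(default=dict, blank=True)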

Django migrations - default entries (inserting rows in tables as part of migration)

Background:
I'm dealing with a quite complex Django application and I'm looking for a way to make my life a little bit easier.
One of the models (tables) serves as the source of "options" for most "select lists" in the system (a simple field <-> dictionary construct). It should always be populated with the default values and/or extended when new entries are provided with a new version of the system.
I manage it all manually now, so:
I run migrations for the main app
I unload the dictionary tables from the dev environment
I insert them manually into the environment I'm currently upgrading
I migrate the rest of the apps, which may contain references to the dictionary tables
All of this because of default values provided in ForeignKeys.
Actual Question:
Is it possible to add table entries (table content) to the makemigrations process for particular tables?
Yes, you can do anything you like in a migration via the RunPython operation. This is called a data migration and is fully documented.
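For the dictionary/options tables described in the question, a minimal sketch of such a data migration might look like this. The app label, the Option model, its fields, and the entries are assumptions; because it uses get_or_create, the migration is safe to run against environments that already contain some of the rows:

from django.db import migrations

DEFAULT_OPTIONS = [
    ("status", "open"),
    ("status", "closed"),
    ("priority", "high"),
]

def seed_options(apps, schema_editor):
    # Use the historical model so the migration stays valid as models change.
    Option = apps.get_model("core", "Option")
    for field, value in DEFAULT_OPTIONS:
        # get_or_create inserts missing entries and leaves existing ones alone.
        Option.objects.get_or_create(field=field, value=value)

class Migration(migrations.Migration):
    dependencies = [("core", "0001_initial")]
    operations = [
        migrations.RunPython(seed_options, migrations.RunPython.noop),
    ]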