Am I required to use Django data migrations? - django

I migrated a database that was manipulated via SQL to use Django's migration module: https://docs.djangoproject.com/en/3.0/topics/migrations/.
Initially I planned to use migrations only for model changes, such as altered columns or new tables, and to keep performing deletes, inserts and updates through plain SQL, since those happen constantly and writing a migration for each one would be impractical.
Could using the migration module without using data migrations cause me some kind of problem or inconsistency in the future?

Consider a data migration when a schema change also requires a manual fix to the existing data.
For example, you may decide to normalize something in the database, such as splitting a name column into first name and last name. If you have only one instance of the application with one database and you are the only developer, you can skip writing a data migration. But if you need to make the change on a live production site with 24/7 traffic, or you work with other developers, then you will probably prepare a data migration for their databases, or at least thoroughly test the change on a copy of the live data so that the update runs on the production site without issues and with minimal downtime. If you don't write a data migration and no problem shows up immediately, that is OK, and it will be no worse than an ill-conceived data migration.
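A rough sketch of what a data migration for the name-splitting example could look like. The function uses the (apps, schema_editor) signature that Django's RunPython operation calls. The app label people, the model Person, the fields name, first_name and last_name, and the migration name in the trailing comment are all hypothetical, and splitting on the first space is a simplifying assumption that will not suit every name.

```python
def split_name(full_name):
    """Split on the first run of whitespace; a single word becomes the first name."""
    parts = (full_name or "").strip().split(None, 1)
    first = parts[0] if parts else ""
    last = parts[1] if len(parts) > 1 else ""
    return first, last


def forwards(apps, schema_editor):
    # Ask the migration state for the model instead of importing it, so the
    # migration keeps working after the model changes later on.
    Person = apps.get_model("people", "Person")  # hypothetical app and model
    for person in Person.objects.all().iterator():
        person.first_name, person.last_name = split_name(person.name)
        person.save(update_fields=["first_name", "last_name"])


# Wiring inside the migration file (needs Django, so shown as a comment):
#
# from django.db import migrations
#
# class Migration(migrations.Migration):
#     dependencies = [("people", "0002_add_first_and_last_name")]  # hypothetical
#     operations = [migrations.RunPython(forwards, migrations.RunPython.noop)]
```

The columns first_name and last_name would be added by an earlier schema migration, and the old name column dropped by a later one, so the data step sits between the two.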

Related

Django Accessing external Database to get data into project database

I'm looking for a "best-practice" guide/solution to the following situation.
I have a Django project with a MySQL DB which I created and manage. I have to import data, every 5 minutes, from a second (external, not managed by me) DB in order to perform some actions. I have read rights for the external DB and all the necessary information.
I have read the Django docs on using multiple databases (in short: register the DB in settings.py, migrate using the --database flag, query/access data by routing to the DB) as well as several questions on this matter on Stack Overflow.
So my plan is:
Register the second database in settings.py, use inspectdb to generate the models, migrate, and define a method which reads data from the external DB and adds it to the internal (own) DB.
However, I do have some questions:
Do I have to register the external DB if I don't manage it?
(Most probably yes, in order to use the ORM or cursors to access the data.)
How can I migrate the models if I don't manage the DB and don't have write permissions? I also don't need all the tables (there are around 250, but only 5 are needed).
(Is a fake migration an option worth considering? I would use inspectdb and migrate only the necessary tables.)
Because I only need to retrieve data from the external DB and not write back, would it suffice to have a method that constantly gets the latest data, like the second solution suggested in this answer?
Any thoughts/ideas/suggestions are welcome!
I would not use Django's ORM for it, but rather just access the DB with psycopg2 and SQL, get the columns you care about into dicts, and work with those. Otherwise any minor change to that external DB's tables may break your Django app, because the models don't match anymore. That could create more headaches than an ORM is worth.
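A minimal sketch of the "plain SQL into dicts" approach. It only relies on the DB-API cursor interface (cursor.description), which psycopg2 also provides, so sqlite3 is used here purely as a stand-in to keep the example runnable without an external server. The table readings and its columns are made up. With psycopg2 the connection would come from psycopg2.connect(...) and the placeholder would be %s instead of ?.

```python
import sqlite3


def fetch_dicts(conn, sql, params=()):
    """Run a read-only query and return each row as a plain dict."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # The first item of each description entry is the column name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Stand-in for the external database (hypothetical table and data).
external = sqlite3.connect(":memory:")
external.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)
external.executemany(
    "INSERT INTO readings (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.0)],
)

# Name only the columns you need, so unrelated changes to the external
# schema do not affect this code.
rows = fetch_dicts(
    external,
    "SELECT id, sensor, value FROM readings WHERE id > ? ORDER BY id",
    (0,),
)
```

The resulting dicts can then be written to your own database through the Django ORM as usual.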

Mix data and schema migrations in one migrations file (Django)?

I've heard the opinion that mixing data migrations and schema migrations in one file is bad practice in Django, even if you specify atomic=False in your Migration class. But I could not find any information on this topic, and even my more experienced colleagues could not answer this question.
So, is it bad to mix data and schema migrations? If so, why? What exactly may happen if I do it?
There is an actual reason for not mixing data and schema migrations in one migration, mentioned in the entry for the RunPython operation in the Django docs:
On databases that do support DDL transactions (SQLite and PostgreSQL), RunPython operations do not have any transactions automatically added besides the transactions created for each migration. Thus, on PostgreSQL, for example, you should avoid combining schema changes and RunPython operations in the same migration or you may hit errors like OperationalError: cannot ALTER TABLE "mytable" because it has pending trigger events.
It should also be noted that for databases that do not support DDL transactions, it may be easier to fix the database after an unsuccessful migration attempt when data and schema migration operations are not mixed together, as data migration operations can be rolled back automatically in Django.
In the past the best practice was to keep them separate. The second sentence in this section in the docs says:
Migrations that alter data are usually called “data migrations”;
they’re best written as separate migrations, sitting alongside your
schema migrations.
But it doesn't list any reasons why. Since Django ~2.0 I've been allowing small data migrations to occur with schema migrations. However, there have been times when the data migration simply couldn't run together with the schema migration. There are two main cases that I've run into.
The data migration takes a long time and shouldn't be a migration in the first place. The resolution was to simply run a script that did what the data migration would have, but in batches.
Attempting to add/update data, then creating an index. This forced me into splitting the migrations into two separate files. I don't remember the exact error, but it simply wouldn't migrate. This shouldn't cause problems for you unless there are non-atomic migrations running which would leave your DB in an unexpected state.
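For the first case above, a batched script might look roughly like the sketch below. It walks the table in primary-key order and commits after every batch, so no single transaction stays open for long. sqlite3 stands in for the real database to keep the example runnable; the table users and the columns email and email_lower are hypothetical. In a Django project the same loop could live in a management command and use the ORM instead.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Fill email_lower from email, committing after each batch of rows."""
    last_id = 0
    total = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return total
        # The ids were read in order, so this range covers exactly this batch.
        conn.execute(
            "UPDATE users SET email_lower = lower(email) WHERE id BETWEEN ? AND ?",
            (ids[0], ids[-1]),
        )
        conn.commit()
        last_id = ids[-1]
        total += len(ids)


# Small demo with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_lower TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("A@Example.org",), ("B@Example.org",), ("C@Example.org",),
     ("D@Example.org",), ("E@Example.org",)],
)
conn.commit()
updated = backfill_in_batches(conn, batch_size=2)
```

Because each batch is committed on its own, an interrupted run can simply be started again.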

Django - data migrations + db dump

Consider I have a DB with some initial data loaded using data migrations. Since the initial loading, the data has been further changed by users of the app via the website. Of course, these changes are not recorded in additional data migrations since they happen in realtime. So the data migrations are somewhat redundant since they don't capture all the changes made by the users.
Now, I want to deploy the app onto a new server and DB. So I take a dump of the current database, then log onto the new server and use the dump to initialize the new DB. What I'm confused about is: if I then run the aforementioned data migrations on the new DB, they will add redundant outdated data, no?
More generally, my confusion lies in how to make data migrations and db dumps work together when deploying an existing web app onto a new server+DB. Is there a better way to think about this?
The dump will include the tables used by migrations to keep track of where the database is in terms of running your migrations. Your new database is going to be at the same place (in terms of migrations) as your current database.
Simply put, if you run
python manage.py migrate
on the new server hooked up to the new database after you "restore" the data, it will say there isn't anything to run.
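To illustrate why nothing is re-run: Django records applied migrations in a table called django_migrations (columns id, app, name, applied), and that table travels with the dump like any other. The sketch below imitates this with sqlite3 and a simplified version of the table; the app name shop and the migration names are made up.

```python
import sqlite3

# Simplified stand-in for the bookkeeping table Django keeps in the database.
old_db = sqlite3.connect(":memory:")
old_db.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY, app TEXT, name TEXT, applied TEXT)"
)
old_db.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("shop", "0001_initial", "2020-01-01 10:00:00"),
        ("shop", "0002_load_initial_data", "2020-01-01 10:00:01"),
    ],
)
old_db.commit()

# "Dump" the old database and "restore" it into a fresh one.
dump_sql = "\n".join(old_db.iterdump())
new_db = sqlite3.connect(":memory:")
new_db.executescript(dump_sql)

# The restored database already lists both migrations as applied.
already_applied = new_db.execute(
    "SELECT app, name FROM django_migrations ORDER BY id"
).fetchall()
print(already_applied)
```

Since the data migration is already listed as applied in the restored database, migrate skips it and the outdated initial data is not loaded again.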

Schema migration commit changes

I have the following situation:
more than one schema migration
one data migration
It would be simple enough but I encountered a problem with the data migration. It sends a query for a specific ContentType which I need for django-taggit. The problem is that the model I want to query didn't exist until the migration that preceded it. That errors out with an empty result from that query.
However, when I run all migrations up to the data migration and then I run the data migration itself, everything works well. I've noticed that a migration process doesn't save changes until all of the migrations are finished which doesn't work for this.
One of the solutions I got to was to manually commit/save changes to the database however I haven't been able to find a way to do it. Of course, if there are any other ideas/better solution I'd be happy to hear them.
This is the code where the data migration errors out:
# ChallengeContest ContentType
challenge_contest_ct = ContentType.objects.get(model='challengecontest')
As you can see, the model challengecontest is the one that was created in the migration preceding the data migration.
I have found data migrations to be more trouble than they're worth. In my last two jobs we abandoned them, replacing them with writing one-off management commands.
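If you do want to keep the data migration, one likely explanation (not confirmed in this thread) is that ContentType rows are created by a post_migrate signal handler, which fires only after the whole migrate run has finished. That would match the behaviour described in the question: the row exists when the two steps are run separately, but not when they run together. A common workaround is get_or_create instead of get. The sketch below assumes that cause; the app label challenges is hypothetical, and the migration would also need a dependency on the contenttypes app.

```python
def get_challenge_contest_ct(apps):
    """Fetch the ContentType row, creating it if it does not exist yet."""
    ContentType = apps.get_model("contenttypes", "ContentType")
    content_type, _created = ContentType.objects.get_or_create(
        app_label="challenges",  # hypothetical: the app that defines the model
        model="challengecontest",
    )
    return content_type


def forwards(apps, schema_editor):
    # Callable for migrations.RunPython in the data migration.
    challenge_contest_ct = get_challenge_contest_ct(apps)
    # ... the rest of the data migration (e.g. the django-taggit rows)
    # continues here, using challenge_contest_ct as before.
```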

Migrating existing Postgres database to use South on Heroku.

I'm pretty new to Django and its deployment on Heroku.
I've got a Postgres database up and running on the app server. My app requirements need me to add a new column to my existing database which has a sizable amount of data in it, which I can't lose.
Looking around, I found a solution described by Mike Ball here.
I have the following queries though:
What exactly is South? (I read the docs but didn't get a clear idea)
Will it help me save and move my existing data from my current database?
As a complete newbie, is the above link an easy way to move the data?
Also, in general, if you could hook me up with a good guide for general DBMS concepts, I'd be very grateful.
Thanks!
What exactly is South? (I read the docs but didn't get a clear idea)
South migrates your database schema as it changes over time. The schema has to start from a Django models.py file. If you used 'manage.py syncdb ...' to create your database, then you can probably use South.
Will it help me save and move my existing data from my current database?
As long as you used syncdb to create your database from a models.py file in Django, South can change that database and add the new column. Basically, South records the changes you make to models.py in migration files; you then apply those migration files to your database, which updates it non-destructively.
As a complete newbie, is the above link an easy way to move the data?
South doesn't move your data. It allows you to add the new columns to your existing database without destroying the database. To move your data you will need to back up the data to a file, copy the file to the other machine, and then restore the backup. That's not what South does.