Do Django db_index migrations run concurrently?

I'm looking to add a multi-column index to a PostgreSQL database. I have a non-blocking SQL command to do this, which looks like this:
CREATE INDEX CONCURRENTLY shop_product_fields_index ON shop_product (id, ...);
When I add db_index to my model and run the migration, will it also run concurrently, or will it block writes? Is a concurrent migration possible in Django?

Django 3.0 added the AddIndexConcurrently and RemoveIndexConcurrently operations for PostgreSQL:
https://docs.djangoproject.com/en/dev/ref/contrib/postgres/operations/#django.contrib.postgres.operations.AddIndexConcurrently
Create a migration and then change migrations.AddIndex to AddIndexConcurrently, imported from django.contrib.postgres.operations. These operations cannot run inside a transaction, so the migration must also set atomic = False.

Since Django 1.10, you can create an index concurrently with RunSQL by disabling the wrapping transaction, i.e. by making the migration non-atomic with atomic = False set as a class attribute on the migration:
from django.db import migrations
class Migration(migrations.Migration):
    atomic = False  # disable the wrapping transaction
    dependencies = []
    operations = [
        migrations.RunSQL('CREATE INDEX CONCURRENTLY ...'),
    ]
RunSQL: https://docs.djangoproject.com/en/stable/ref/migration-operations/#runsql
Non-atomic Migrations: https://docs.djangoproject.com/en/stable/howto/writing-migrations/#non-atomic-migrations

You could use the SeparateDatabaseAndState migration operation to provide a custom SQL command for creating the index. The operation accepts two lists of operations:
- state_operations: operations to apply to the Django model state. They do not affect the database.
- database_operations: operations to apply to the database.
An example migration may look like this:
from django.db import migrations, models
class Migration(migrations.Migration):
    atomic = False
    dependencies = [
        ('myapp', '0001_initial'),
    ]
    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                # operation generated by `makemigrations` to create an ordinary index
                migrations.AlterField(
                    # ...
                ),
            ],
            database_operations=[
                # operation to run custom SQL command (check the output of `sqlmigrate`
                # to see the auto-generated SQL, edit as needed)
                migrations.RunSQL(sql='CREATE INDEX CONCURRENTLY ...',
                                  reverse_sql='DROP INDEX ...'),
            ],
        ),
    ]
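If you write the SQL by hand, it helps to generate the forward and reverse statements together so that the index name always matches. The sketch below is a hypothetical helper (not part of Django) that builds a CREATE INDEX CONCURRENTLY / DROP INDEX CONCURRENTLY pair you could pass as RunSQL(sql=forward, reverse_sql=reverse); the column list ["id", "name"] is only an illustrative assumption standing in for the elided columns in the question.

```python
def concurrent_index_sql(table, columns, name):
    """Return (forward, reverse) SQL for a PostgreSQL concurrent multi-column index.
    Hypothetical helper: identifiers are double-quoted so mixed-case names are kept.
    """
    def quote(identifier):
        # PostgreSQL escapes a double quote inside a quoted identifier by doubling it.
        return '"' + identifier.replace('"', '""') + '"'
    cols = ", ".join(quote(c) for c in columns)
    forward = f"CREATE INDEX CONCURRENTLY {quote(name)} ON {quote(table)} ({cols});"
    # DROP INDEX CONCURRENTLY also refuses to run inside a transaction,
    # which is fine because the migration already sets atomic = False.
    reverse = f"DROP INDEX CONCURRENTLY {quote(name)};"
    return forward, reverse
forward, reverse = concurrent_index_sql("shop_product", ["id", "name"], "shop_product_fields_index")
print(forward)
print(reverse)
```

Note that if a concurrent build fails, PostgreSQL leaves an INVALID index behind, which has to be dropped before the migration is retried.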

Do what tgroshon says for Django 1.10+.
For older versions of Django, I have had success with a more verbose subclassing method:
from django.db import migrations, models
class RunNonAtomicSQL(migrations.RunSQL):
    def _run_sql(self, schema_editor, sqls):
        # Leave (commit) the transaction opened by the schema editor so CONCURRENTLY can run
        if schema_editor.connection.in_atomic_block:
            schema_editor.atomic.__exit__(None, None, None)
        super(RunNonAtomicSQL, self)._run_sql(schema_editor, sqls)
class Migration(migrations.Migration):
    dependencies = [
    ]
    operations = [
        RunNonAtomicSQL(
            "CREATE INDEX CONCURRENTLY",
        )
    ]

You can do something like this:
import django.contrib.postgres.indexes
from django.db import migrations, models
from django.contrib.postgres.operations import AddIndexConcurrently
class Migration(migrations.Migration):
    atomic = False
    dependencies = [
        ("app_name", "parent_migration"),
    ]
    operations = [
        AddIndexConcurrently(
            model_name="mymodel",
            index=django.contrib.postgres.indexes.GinIndex(
                fields=["field1"],
                name="field1_idx",
            ),
        ),
        AddIndexConcurrently(
            model_name="mymodel",
            index=models.Index(
                fields=["field2"], name="field2_idx"
            ),
        ),
    ]
Ref: https://docs.djangoproject.com/en/dev/ref/contrib/postgres/operations/#django.contrib.postgres.operations.AddIndexConcurrently

Before Django 3.0, there was no built-in support for PostgreSQL concurrent index creation in Django.
Here is the ticket requesting this feature: https://code.djangoproject.com/ticket/21039
Instead, you can manually specify any custom RunSQL operation in the migration:
https://docs.djangoproject.com/en/stable/ref/migration-operations/#runsql

Related

Is django data migration immediately applied?

I read the following text in the docs:
"""
Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is active. See below for details.
"""
and I'm running the following data migration:
def fill_query(apps, schema_editor):
    Result = apps.get_model('monitoring', 'Result')
    for r in Result.objects.all():
        r.query = r.monitored_search.query
        r.user_id = r.monitored_search.user_id
        r.save()
class Migration(migrations.Migration):
    dependencies = [
        ('monitoring', '0006_searchresult_user_id'),
    ]
    operations = [
        migrations.RunPython(fill_query),
    ]
But when I query Result objects, I find that all of them still have query and user_id set to null, and my data migration keeps running (there are more than 2 million records in the database).
Will the changes only be applied once the data migration stops running, or is my data migration not working?
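On databases with transactional DDL, such as PostgreSQL, Django runs each migration inside a single transaction by default (atomic = True), so rows saved by fill_query stay invisible to other connections until the whole migration commits. The sketch below imitates that with Python's built-in sqlite3 as a stand-in for PostgreSQL (an assumption made so it runs without a server); the table name result mirrors the question, everything else is illustrative.

```python
import os
import sqlite3
import tempfile
def count_nulls(conn):
    # fetchall() steps the statement to completion so no read lock lingers.
    return conn.execute("SELECT COUNT(*) FROM result WHERE query IS NULL").fetchall()[0][0]
def visibility_demo(rows=3):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        # isolation_level=None: we issue BEGIN/COMMIT ourselves, like an atomic migration.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            # WAL lets the reader see the last committed snapshot while the writer works.
            writer.execute("PRAGMA journal_mode=WAL").fetchall()
            writer.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, query TEXT)")
            writer.executemany("INSERT INTO result (query) VALUES (?)", [(None,)] * rows)
            writer.execute("BEGIN")  # the "migration" opens its transaction
            writer.execute("UPDATE result SET query = 'filled'")
            during = count_nulls(reader)  # another connection, mid-migration
            writer.execute("COMMIT")  # the "migration" finishes
            after = count_nulls(reader)
        finally:
            writer.close()
            reader.close()
    return during, after
print(visibility_demo())  # (3, 0)
```

The reader still sees every row as null while the transaction is open, and sees all of them filled right after the commit, which matches what the question observes during a long-running migration.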

How to fail a custom data migration?

I have written a data migration that initializes new tables with some rows. The objects have foreign keys to other tables, so I have to check that the foreign ids exist. If they don't, I would like to stop the migration with an error message.
I have written two functions: forward and reverse. What is a recommended way of stopping a migration from within forward?
def forward(apps, schema_editor):
    ...
def reverse(apps, schema_editor):
    ...
class Migration(migrations.Migration):
    dependencies = [
        ("my_app", "0001_initial"),
    ]
    operations = [
        migrations.RunPython(code=forward, reverse_code=reverse)
    ]
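One common approach (an assumption here, not an answer given in this thread) is to raise an exception from forward(): migrate then stops with that traceback, and in an atomic migration on a database with transactional DDL the partial work is rolled back. The sketch below imitates this with sqlite3, where the connection's context manager plays the role of the migration's transaction; the tables author/book and the exception class MissingReferenceError are hypothetical.

```python
import sqlite3
class MissingReferenceError(Exception):
    """Hypothetical error raised when a referenced row is absent."""
def forward(conn, books):
    # Like a data migration's forward(): insert rows, but stop with an
    # error as soon as a referenced author id does not exist.
    for book_id, author_id in books:
        found = conn.execute("SELECT 1 FROM author WHERE id = ?", (author_id,)).fetchall()
        if not found:
            raise MissingReferenceError(f"author {author_id} does not exist")
        conn.execute("INSERT INTO book (id, author_id) VALUES (?, ?)", (book_id, author_id))
def run_atomic(conn, books):
    # `with conn:` commits on success and rolls back if the block raises,
    # roughly what an atomic migration does around RunPython.
    with conn:
        forward(conn, books)
def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER)")
    conn.execute("INSERT INTO author (id) VALUES (1)")
    conn.commit()
    return conn
conn = make_db()
try:
    run_atomic(conn, [(10, 1), (11, 2)])  # author 2 is missing
except MissingReferenceError as exc:
    print("migration failed:", exc)
# 0: the insert of book 10 was rolled back together with the failed step
print(conn.execute("SELECT COUNT(*) FROM book").fetchall()[0][0])
```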

Creating a GIN index in Django

I have created a model in Django.
class MyModel(models.Model):
    features = models.TextField(blank=True, default='')
There are several possible ways to store the data in the features field, for example:
feature1;feature2
feature1, feature2
feature1,feature2
And so on. I need to create a GIN index for that field. I would probably do it in PostgreSQL in the following way:
CREATE INDEX features_search_idx ON "mymodel" USING gin (regexp_split_to_array("mymodel"."features", E'[,;\\s]+'));
Would it be possible to do the same thing by a migration?
Yes.
Create an empty migration: python manage.py makemigrations yourapp --empty -n pour_gin
Add a migrations.RunSQL() operation in the migration file.
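from django.db import migrations
class Migration(migrations.Migration):
    dependencies = [
        # ...
    ]
    operations = [
        migrations.RunSQL(
            # Raw string so PostgreSQL receives E'[,;\\s]+' exactly as in the psql command;
            # in a plain string Python would pass E'[,;\s]+', which the E'' literal reads as "s".
            sql=r"""CREATE INDEX features_search_idx ON "mymodel" USING gin (regexp_split_to_array("mymodel"."features", E'[,;\\s]+'));""",
            reverse_sql="DROP INDEX features_search_idx;",
        ),
    ]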
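To check what the indexed expression produces for the storage formats listed in the question, the separator pattern can be tried out in Python. This is an approximation under the assumption that Python's re and PostgreSQL's regex engine treat the class [,;\s]+ the same way for inputs like these; split_features is a hypothetical name.

```python
import re
# Python version of the pattern used in the index: a run of commas,
# semicolons and/or whitespace characters acts as one separator.
SEPARATOR = re.compile(r"[,;\s]+")
def split_features(value):
    """Approximate PostgreSQL's regexp_split_to_array() with the same pattern."""
    return SEPARATOR.split(value)
for raw in ["feature1;feature2", "feature1, feature2", "feature1,feature2"]:
    print(split_features(raw))  # ['feature1', 'feature2'] each time
```

Keep in mind that PostgreSQL only uses an expression index when the query repeats the same expression, e.g. a WHERE clause of the form regexp_split_to_array("mymodel"."features", E'[,;\\s]+') @> ARRAY['feature1'].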

What does RunPython.noop() do?

In the documentation it says,
"Pass the RunPython.noop method to code or reverse_code when you want the operation not to do anything in the given direction. This is especially useful in making the operation reversible."
Sometimes you want to revert a migration. For example, you have added a field, but now you want to bring the database back to the state before the migration. You can do this by reverting the migration [Django-doc] with the command:
python3 manage.py migrate app_name previous_migration_name
Django will then work out how to migrate back to previous_migration_name and perform the necessary operations. For example, if you renamed a field from foo to bar, then Django will rename it from bar back to foo.
Other operations are not reversible. For example, if you remove a field in the migration, and that field has no default and is non-NULLable, then this cannot be reversed. This makes sense: the reverse of removing a field is adding a field, but since there is no value to take for the existing records, what should Django fill in for the recreated field?
A RunPython operation is not reversible by default. In general, one cannot computationally determine the inverse of an arbitrary function, if one even exists; this is a consequence of Rice's theorem [wiki]. But sometimes it is possible. If, for example, we constructed a migration that increments a certain field by one, then the reverse is to decrement that field by one:
from django.db.models import F
from django.db import migrations
def forwards_func(apps, schema_editor):
    MyModel = apps.get_model('my_app', 'MyModel')
    db_alias = schema_editor.connection.alias
    MyModel.objects.using(db_alias).all().update(
        some_field=F('some_field') + 1
    )
def reverse_func(apps, schema_editor):
    MyModel = apps.get_model('my_app', 'MyModel')
    db_alias = schema_editor.connection.alias
    MyModel.objects.using(db_alias).all().update(
        some_field=F('some_field') - 1
    )
class Migration(migrations.Migration):
    dependencies = []
    operations = [
        migrations.RunPython(code=forwards_func, reverse_code=reverse_func),
    ]
But sometimes a (data) migration should do nothing when you migrate it forward, or, more commonly, when you migrate it backwards. Instead of implementing an empty function each time, you can pass a reference to noop, which does nothing:
from django.db import migrations
def forwards_func(apps, schema_editor):
    # … some action …
    pass
class Migration(migrations.Migration):
    dependencies = []
    operations = [
        migrations.RunPython(code=forwards_func, reverse_code=migrations.RunPython.noop),
    ]
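The difference between a real reverse function and noop can be seen without Django. In this framework-free sketch, an in-memory list of dicts stands in for the table (an assumption for illustration), and noop plays the role of RunPython.noop: reverting with noop is allowed, but it leaves the migrated data as it is.

```python
def forwards_func(table):
    # Forward step: increment some_field by one in every row.
    for row in table:
        row["some_field"] += 1
def reverse_func(table):
    # Exact inverse of forwards_func: decrement by one.
    for row in table:
        row["some_field"] -= 1
def noop(table):
    # Does nothing, like RunPython.noop used as reverse_code.
    return None
def migrate_then_revert(table, forward, reverse):
    forward(table)
    reverse(table)
    return table
rows = [{"some_field": 0}, {"some_field": 5}]
# Reverting with reverse_func restores the original values:
print(migrate_then_revert([dict(r) for r in rows], forwards_func, reverse_func))
# [{'some_field': 0}, {'some_field': 5}]
# Reverting with noop succeeds but keeps the migrated values:
print(migrate_then_revert([dict(r) for r in rows], forwards_func, noop))
# [{'some_field': 1}, {'some_field': 6}]
```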

How to solve the problem of Dual behavior of django CustomUser in migration?

I have a data migration, shown below, in which I want to use the create_user method of CustomUser, get an instance of the created user, and use this instance to create an instance of the Partner model.
It is worth mentioning that I have a Partner model that has a one-to-one relationship with CustomUser.
I have two options:
# Option One:
def populate_database_create_partner(apps, schema_editor):
    Partner = apps.get_model('partners', 'Partner')
    CustomUser.objects.create_user(
        id=33,
        email='test_email#email.com',
        password='password',
        first_name='test_first_name',
        last_name="test_last_name",
        is_partner=True,
    )
    u = CustomUser.objects.get(id=33)
    partner = Partner.objects.create(user=u)
class Migration(migrations.Migration):
    dependencies = [
        ('accounts', '0006_populate_database_createsuperuser'),
    ]
    operations = [
        migrations.RunPython(populate_database_create_partner),
    ]
In option one, I see this error:
ValueError: Cannot assign "<CustomUser: test_email#email.com>": "Partner.user" must be a "CustomUser" instance.
I then tried this:
# Option Two:
def populate_database_create_partner(apps, schema_editor):
    Partner = apps.get_model('partners', 'Partner')
    CustomUser = apps.get_model('accounts', 'CustomUser')
    CustomUser.objects.create_user(
        id=33,
        email='test_email#email.com',
        password='password',
        first_name='test_first_name',
        last_name="test_last_name",
        is_partner=True,
    )
    u = CustomUser.objects.get(id=33)
    partner = Partner.objects.create(user=u)
class Migration(migrations.Migration):
    dependencies = [
        ('accounts', '0006_populate_database_createsuperuser'),
    ]
    operations = [
        migrations.RunPython(populate_database_create_partner),
    ]
Then I see this error:
CustomUser.objects.create_user(
AttributeError: 'Manager' object has no attribute 'create_user'
So the create_user method is not available.
If I do not use the create_user method and simply use CustomUser.objects.create(...), I will not be able to set the password here.
Django only keeps limited historical information about each version of your models. One of the things it doesn't keep track of, as documented in the Django migrations documentation, is custom model managers.
The good news is that there's a way to force the migrations system to use your custom manager:
You can optionally serialize managers into migrations and have them available in RunPython operations. This is done by defining a use_in_migrations attribute on the manager class.
As noted, this just allows your migration to use the version of the manager that exists when the migration is run; so, if you later make changes to it, you could break the migration. A safer alternative is to just copy the relevant create_user code into the migration itself.