Is django data migration immediately applied? - django

I read the following in the docs:
"""
Django’s default behavior is to run in autocommit mode. Each query is immediately committed to the database, unless a transaction is active. See below for details.
"""
and I'm running the following data migration:
def fill_query(apps, schema_editor):
    Result = apps.get_model('monitoring', 'Result')
    for r in Result.objects.all():
        r.query = r.monitored_search.query
        r.user_id = r.monitored_search.user_id
        r.save()
class Migration(migrations.Migration):
    dependencies = [
        ('monitoring', '0006_searchresult_user_id'),
    ]
    operations = [
        migrations.RunPython(fill_query),
    ]
But when I query Result, every row still has query and user_id set to null, and my data migration is still running (there are more than 2 million records in the database).
Will the changes be applied only when the data migration finishes, or is my data migration not working?
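The docs quote above hinges on "unless a transaction is active". Django runs each migration inside a transaction on databases that support transactional DDL (PostgreSQL and SQLite), so rows saved by a RunPython function would not be visible from another connection until the migration finishes and commits. A minimal sketch of that visibility rule, using the standard-library sqlite3 module rather than Django (the table name, file name and function name are made up for illustration):

```python
import os
import sqlite3
import tempfile


def visible_before_and_after_commit():
    """Return what a second connection reads before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")  # hypothetical file name
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, query TEXT)")
            writer.execute("INSERT INTO result (id, query) VALUES (1, NULL)")
            writer.commit()

            def read():
                # fetchall() exhausts the cursor so the reader holds no lock afterwards
                rows = reader.execute("SELECT query FROM result WHERE id = 1").fetchall()
                return rows[0][0]

            # sqlite3 opens a transaction implicitly before the UPDATE;
            # nothing is committed until writer.commit() is called.
            writer.execute("UPDATE result SET query = 'filled' WHERE id = 1")
            before = read()

            writer.commit()
            after = read()
        finally:
            reader.close()
            writer.close()
    return before, after
```

Under these assumptions the second connection keeps reading NULL until the writer commits, which would match seeing null values while a long-running migration is still in progress.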

Related

How to fail a custom data migration?

I have written a data migration that initializes new tables with some rows. The objects have foreign keys to other tables, so I have to check that the foreign ids exist. If they don't, I would like to stop the migration with an error message.
I have written two functions: forward and reverse. What is a recommended way of stopping a migration from within forward?
def forward(apps, schema_editor):
    ...
def reverse(apps, schema_editor):
    ...
class Migration(migrations.Migration):
    dependencies = [
        ("my_app", "0001_initial"),
    ]
    operations = [
        migrations.RunPython(code=forward, reverse_code=reverse)
    ]
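One common way to stop a migration from inside forward is to raise an exception: an uncaught exception aborts the migrate command, and on databases with transactional DDL the changes made so far in that migration are rolled back. A minimal sketch, assuming a hypothetical Category model and a hypothetical set of required ids (neither appears in the question):

```python
# Hypothetical placeholders: the "Category" model and these ids are
# illustrative only and are not taken from the question.
REQUIRED_CATEGORY_IDS = {1, 2, 3}


def forward(apps, schema_editor):
    # Use the historical model from the migration's app registry.
    Category = apps.get_model("my_app", "Category")
    existing = set(
        Category.objects.filter(pk__in=REQUIRED_CATEGORY_IDS).values_list("pk", flat=True)
    )
    missing = REQUIRED_CATEGORY_IDS - existing
    if missing:
        # The exception propagates out of RunPython and stops the migration.
        raise RuntimeError(
            f"Cannot seed data, missing Category ids: {sorted(missing)}"
        )
    # ... create the new rows here ...
```

The exception type is a free choice; the message is what shows up in the migrate traceback.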

Django data migration fails unless it's run separately

I've run into this a couple of other times and can't figure out why it happens. When I run the migrations all together through ./manage.py migrate, the last migration (a data migration) fails. The workaround is to run the data migration on its own after the other migrations have completed. How can I run them all automatically with no errors?
I have a series of migrations:
fulfillment/0001.py
order/0041.py (dependency: fulfillment/0001.py)
order/0042.py
order/0043.py
I followed this RealPython article to move a model to a new app, which works perfectly and is covered by migrations #1 to #3. Migration #3 also adds a GenericForeignKey field. Migration #4 is a data migration that simply populates the GenericForeignKey field from the existing ForeignKey field.
from django.db import migrations, models
def copy_to_generic_fk(apps, schema_editor):
    ContentType = apps.get_model('contenttypes.ContentType')
    Order = apps.get_model('order.Order')
    pickup_point_type = ContentType.objects.get(
        app_label='fulfillment',
        model='pickuppoint'
    )
    Order.objects.filter(pickup_point__isnull=False).update(
        content_type=pickup_point_type,
        object_id=models.F('pickup_point_id')
    )
class Migration(migrations.Migration):
    dependencies = [
        ('order', '0042'),
    ]
    operations = [
        migrations.RunPython(copy_to_generic_fk, reverse_code=migrations.RunPython.noop)
    ]
Running the sequence together I get an error:
fake.DoesNotExist: ContentType matching query does not exist.
If I run the migration to #3 then run #4 by itself everything works properly. How can I get them to run in sequence with no errors?
There are two things that might fix the problem. First, look into run_before: https://docs.djangoproject.com/en/3.1/howto/writing-migrations/#controlling-the-order-of-migrations
If you add it to fulfillment #1 and make sure it runs before order #4, it should fix the problem.
Another thing you can do is move your data migration to fulfillment #2; that way you know for sure that all the order migrations and fulfillment #1 have finished.
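Instead of getting the ContentType through .get(), retrieve the model through the apps argument and then use get_for_model(). ContentType rows are created by a post_migrate signal handler, so when all the migrations run in one go the row for a newly added model may not exist yet; get_for_model() creates it if it is missing.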
def copy_to_generic_fk(apps, schema_editor):
    ContentType = apps.get_model('contenttypes', 'ContentType')
    PickupPoint = apps.get_model('fulfillment', 'pickuppoint')
    pickup_point_type = ContentType.objects.get_for_model(PickupPoint)
    ...

How can I initialise group names in Django every time the program runs?

I have this code and I want it to create the groups every time the program runs, so that if the database is deleted the program is still self-sufficient and nobody has to create the groups again. Do you know an easy way to do this?
system_administrator = Group.objects.get_or_create(name='system_administrator')
manager = Group.objects.get_or_create(name='manager')
travel_advisor = Group.objects.get_or_create(name='travel_advisor')
If you lose your DB, you'd have to rerun migrations on a fresh DB before the program could run again, so I think a data migration might be a good solution for this. A data migration is a migration that runs Python code to alter the data in the DB, rather than the schema as a normal migration does.
You could do something like this:
In a new migration file (you can run python manage.py makemigrations --empty yourappname to create an empty migration file for an app):
from django.db import migrations
def generate_groups(apps, schema_editor):
    # Assumes Group is django.contrib.auth's model, which lives in the 'auth' app
    Group = apps.get_model('auth', 'Group')
    Group.objects.get_or_create(name="system_administrator")
    Group.objects.get_or_create(name="manager")
    Group.objects.get_or_create(name="travel_advisor")
class Migration(migrations.Migration):
    dependencies = [
        ('yourappname', 'previous migration'),
        # assumes the stock django.contrib.auth migrations
        ('auth', '0001_initial'),
    ]
    operations = [
        migrations.RunPython(generate_groups),
    ]
Worth reading the docs on this https://docs.djangoproject.com/en/3.0/topics/migrations/#data-migrations
You can do it in the ready method of one of your apps.
from django.apps import AppConfig
class YourApp(AppConfig):
    def ready(self):
        # important: do the import inside the method
        from something import Group
        Group.objects.get_or_create(name='system_administrator')
        Group.objects.get_or_create(name='manager')
        Group.objects.get_or_create(name='travel_advisor')
The problem with the data migration approach is that it only populates the database once, when the migration is first applied. If the groups are deleted after the data migration has run, you will need to populate them again yourself.
Also remember that get_or_create returns a tuple.
group, created = Group.objects.get_or_create(name='manager')
# group is an instance of Group
# created is a boolean

Data migration only executed for the first test

I have a simple data migration, which creates a Group, and which looks like this :
def make_manager_group(apps, schema_editor):
    Group = apps.get_model("auth", "Group")
    managers_group = Group(name="managers")
    managers_group.save()
class Migration(migrations.Migration):
    dependencies = [
        ('my_app', '0001_initial'),
        ('auth', '0006_require_contenttypes_0002'),
    ]
    operations = [
        migrations.RunPython(make_manager_group, reverse_code=lambda *args, **kwargs: True)
    ]
and a simple functional test app containing the following tests :
from django.contrib.auth.models import Group
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
class FunctionalTest(StaticLiveServerTestCase):
    def setUp(self):
        print("Groups : {}".format(Group.objects.all()))
    def test_2(self):
        pass
    def test_1(self):
        pass
When I run the tests, I get :
Creating test database for alias 'default'...
Groups : [<Group: managers>]
.Groups : []
.
Clearly, the group is being created when the test db is created, but when this db is reset between tests, it is reset to an empty db and not to the state it was after all the migrations were applied.
The model itself doesn't contain anything special (I only created one for the migration not to be the first, as in the project I'm working on, but I'm not sure it is needed at all).
Is this a bug, or am I missing something about data migration, to be able to have my group created when every single test starts?
Edit 1 : I'm using Django 1.8.3
Edit 2 : Quick'n'dirty hack added to the setUp of the test class :
from django.contrib.auth.models import Group, Permission
if not Group.objects.all():
    managers_group = Group(name="managers")
    managers_group.save()
    managers_group.permissions.add(
        Permission.objects.get(codename='add_news'),
        Permission.objects.get(codename='change_news'),
        Permission.objects.get(codename='delete_news')
    )
This is anything but DRY, but so far I haven't found another way...
Answering my own question:
It seems to be a reported bug that has since become documented behaviour.
The docs say that TransactionTestCase and its subclasses (such as LiveServerTestCase in my case) do not restore the data inserted by migrations before every test. That data is only present for the first test.
They also say that we can set serialized_rollback to True, which should force a rollback to the populated database. But in my case, I'm getting the same error as in the last message of the bug report.
So I'm going to stick with my dirty hack for now, and maybe create a data fixture, since the docs say that fixtures are loaded every time.
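The behaviour described in this answer can be imitated outside Django. The following is only a simulation with the standard-library sqlite3 module, not Django's implementation: the function name and the simplified auth_group table are assumptions, and the flag merely mimics what setting serialized_rollback = True on the test class is meant to achieve.

```python
import sqlite3


def groups_seen_by_each_test(serialized_rollback, test_count=2):
    """Simulate flushing tables between tests; return the groups each test sees."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE auth_group (name TEXT)")
    # Stands in for the data migration, which runs once when the test DB is created.
    db.execute("INSERT INTO auth_group (name) VALUES ('managers')")
    db.commit()
    # Snapshot of the migrated data, taken once before any test runs.
    snapshot = db.execute("SELECT name FROM auth_group").fetchall()

    seen = []
    for _ in range(test_count):
        rows = db.execute("SELECT name FROM auth_group ORDER BY name").fetchall()
        seen.append([name for (name,) in rows])
        # Tables are emptied after each test ...
        db.execute("DELETE FROM auth_group")
        # ... and refilled from the snapshot only when the option is on.
        if serialized_rollback:
            db.executemany("INSERT INTO auth_group (name) VALUES (?)", snapshot)
        db.commit()
    db.close()
    return seen
```

With the flag off, the second simulated test sees an empty table, which mirrors the "Groups : []" line in the question's output.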

Do django db_index migrations run concurrently?

I'm looking to add a multi-column index to a postgres database. I have a non blocking SQL command to do this which looks like this:
CREATE INDEX CONCURRENTLY shop_product_fields_index ON shop_product (id, ...);
When I add db_index to my model and run the migration, will it also run concurrently or will it block writes? Is a concurrent migration possible in django?
There are AddIndexConcurrently and RemoveIndexConcurrently in Django 3.0:
https://docs.djangoproject.com/en/dev/ref/contrib/postgres/operations/#django.contrib.postgres.operations.AddIndexConcurrently
Create a migration and then change migrations.AddIndex to AddIndexConcurrently. Import it from django.contrib.postgres.operations.
Since Django 1.10 you can create a concurrent index with RunSQL by making the migration non-atomic, which disables the wrapping transaction. To do that, set atomic = False as an attribute on the migration class:
class Migration(migrations.Migration):
    atomic = False # disable transaction
    dependencies = []
    operations = [
        migrations.RunSQL('CREATE INDEX CONCURRENTLY ...')
    ]
RunSQL: https://docs.djangoproject.com/en/stable/ref/migration-operations/#runsql
Non-atomic Migrations: https://docs.djangoproject.com/en/stable/howto/writing-migrations/#non-atomic-migrations
You could use the SeparateDatabaseAndState migration operation to provide a custom SQL command for creating the index. The operation accepts two lists of operations:
state_operations are operations to apply on the Django model state. They do not affect the database.
database_operations are operations to apply to the database.
An example migration may look like this:
from django.db import migrations, models
class Migration(migrations.Migration):
    atomic = False
    dependencies = [
        ('myapp', '0001_initial'),
    ]
    operations = [
        migrations.SeparateDatabaseAndState(
            state_operations=[
                # operation generated by `makemigrations` to create an ordinary index
                migrations.AlterField(
                    # ...
                ),
            ],
            database_operations=[
                # operation to run custom SQL command (check the output of `sqlmigrate`
                # to see the auto-generated SQL, edit as needed)
                migrations.RunSQL(sql='CREATE INDEX CONCURRENTLY ...',
                                  reverse_sql='DROP INDEX ...'),
            ],
        ),
    ]
Do what tgroshon says for Django 1.10+.
For earlier versions of Django I have had success with a more verbose subclassing approach:
from django.db import migrations, models
class RunNonAtomicSQL(migrations.RunSQL):
    def _run_sql(self, schema_editor, sqls):
        if schema_editor.connection.in_atomic_block:
            schema_editor.atomic.__exit__(None, None, None)
        super(RunNonAtomicSQL, self)._run_sql(schema_editor, sqls)
class Migration(migrations.Migration):
    dependencies = [
    ]
    operations = [
        RunNonAtomicSQL(
            "CREATE INDEX CONCURRENTLY",
        )
    ]
You can do something like
import django.contrib.postgres.indexes
from django.db import migrations, models
from django.contrib.postgres.operations import AddIndexConcurrently
class Migration(migrations.Migration):
    atomic = False
    dependencies = [
        ("app_name", "parent_migration"),
    ]
    operations = [
        AddIndexConcurrently(
            model_name="mymodel",
            index=django.contrib.postgres.indexes.GinIndex(
                fields=["field1"],
                name="field1_idx",
            ),
        ),
        AddIndexConcurrently(
            model_name="mymodel",
            index=models.Index(
                fields=["field2"], name="field2_idx"
            ),
        ),
    ]
Ref: https://docs.djangoproject.com/en/dev/ref/contrib/postgres/operations/#django.contrib.postgres.operations.AddIndexConcurrently
Before Django 3.0 there was no built-in support for PostgreSQL concurrent index creation in Django.
Here is the ticket requesting this feature: https://code.djangoproject.com/ticket/21039
Instead, you can manually specify a custom RunSQL operation in the migration:
https://docs.djangoproject.com/en/stable/ref/migration-operations/#runsql