Django: prevent creating objects in concurrency

models:

class CouponUsage(models.Model):
    coupon = models.ForeignKey('Coupon', on_delete=models.CASCADE, related_name="usage")
    date = models.DateTimeField(auto_now_add=True)

class Coupon(models.Model):
    name = models.CharField(max_length=255)
    capacity = models.IntegerField()

    @property
    def remaining(self):
        usage = self.usage.all().count()
        return self.capacity - usage

views:

def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    if coupon.remaining > 0:
        # do something
I don't know how to handle concurrency in the code above. I believe one possible bug is that while the if clause in the view is executing, another CouponUsage object can be created.
How do I go about handling that?
How do I prevent CouponUsage objects from being created while I'm inside the if clause in the view?

One way of doing this would be to rely on the database integrity checks and transactions. Assuming your capacity must always be in the range [0, +infinity) you could change your Coupon model to use a PositiveIntegerField instead of an IntegerField:
class Coupon(models.Model):
    name = models.CharField(max_length=255)
    capacity = models.PositiveIntegerField()
Then you need to update your Coupon capacity every time a CouponUsage is created. You can override the save() method to reflect this change:
from django.db import models, transaction

class CouponUsage(models.Model):
    coupon = models.ForeignKey('Coupon', on_delete=models.CASCADE, related_name="usage")
    date = models.DateTimeField(auto_now_add=True)

    @transaction.atomic
    def save(self, *args, **kwargs):
        if not self.pk:  # This is an insert; you may want to raise an error otherwise
            self.coupon.capacity = models.F('capacity') - 1  # The magic is here: this runs at the database level, so no problem with stale in-memory values
            self.coupon.save()
        super().save(*args, **kwargs)
Now, whenever a CouponUsage is created, you update the capacity of the associated Coupon instance. The key here is that instead of reading the value from the database into Python memory, updating it, and then saving it (which could lead to inconsistent results), the update to capacity is made at the database level using an F expression. This guarantees that no two transactions use the same value.
Notice also that by using a PositiveIntegerField instead of an IntegerField, the database will also guarantee that capacity cannot fall below 0. Therefore, if you now try to create a CouponUsage instance such that the Coupon's capacity would become negative, an exception will be raised, preventing the creation of that CouponUsage.
You now need to take advantage of this in your code by doing something like the following:
def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    try:
        usage = CouponUsage.objects.create(coupon=coupon)
        # Do whatever you want here; you already 'consumed' a coupon
    except IntegrityError:  # Check for the specific exception
        # Sorry, no capacity left
        pass
If, in the event of getting the coupon, you need to do things that may fail, and in such a case you need to 'revert' the usage, you can enclose your whole use_coupon function inside a transaction (see the sketch below).
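A minimal sketch of that, assuming the models above (do_more_work and its failure mode are hypothetical placeholders, not part of the original answer):

from django.db import IntegrityError, transaction

def use_coupon(request):
    coupon = Coupon.objects.get(condition)
    try:
        with transaction.atomic():
            usage = CouponUsage.objects.create(coupon=coupon)
            do_more_work(usage)  # hypothetical follow-up work; if it raises, the atomic block rolls back
    except IntegrityError:
        pass  # no capacity left

If do_more_work() raises, the CouponUsage row and the capacity decrement are rolled back together, 'reverting' the usage.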

Related

Django: object creation in atomic transaction

I have a simple Task model:
class Task(models.Model):
    name = models.CharField(max_length=255)
    order = models.IntegerField(db_index=True)
And a simple task_create view:
def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    Task.objects.filter(order__gte=order).update(order=F('order') + 1)
    new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
The view shifts the existing tasks that go after the newly created one by +1, then creates the new task.
There are lots of users of this method, and I suppose something will go wrong with the ordering one day, because the update and the create should definitely be performed together.
So I just want to be sure: will the following be enough to avoid any data corruption?
from django.db import transaction

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
        return HttpResponse(new_task.id)
1) Should something more be done on the task-creation line, like a select_for_update before filtering the existing Task objects?
2) Does it matter where return HttpResponse() is located: inside the transaction block or outside?
Thanks!
1) Should something more be done on the task-creation line, like a select_for_update before filtering the existing Task objects?
No - what you have currently looks fine and should work the way you want it to.
2) Does it matter where return HttpResponse() is located: inside the transaction block or outside?
Yes, it does matter. You need to return a response to the client regardless of whether your transaction was successful or not - so it definitely needs to be outside of the transaction block. If you did it inside the transaction, the client would get a 500 Server Error if the transaction failed.
However, if the transaction fails, you will not have a new task ID and cannot return it in your response. So you probably need to return different responses depending on whether the transaction succeeds, e.g.:
from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(
            order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
except IntegrityError:
    # Transaction failed - return a response notifying the client
    return HttpResponse('Failed to create task, please try again!')

# If it succeeded, then return a normal response
return HttpResponse(new_task.id)
You could also try to change your model so you don't need to update so many other rows when inserting a new one.
For example, you could try something resembling a doubly-linked list.
(I used long explicit names for fields and variables here).
# models.py
class Task(models.Model):
    name = models.CharField(max_length=255)
    task_before_this_one = models.ForeignKey(
        'self',  # a model can't reference its own name inside its class body
        null=True,
        blank=True,
        related_name='task_before_this_one_set')
    task_after_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        related_name='tasks_after_this_one_set')
Your task at the top of the queue would be the one that has the field task_before_this_one set to null. So to get the first task of the queue:
# these will throw exceptions if there are many instances
first_task = Task.objects.get(task_before_this_one=None)
last_task = Task.objects.get(task_after_this_one=None)
When inserting a new instance, you just need to know after which task it should be placed (or, alternatively, before which task). This code should do that:
def task_create(request):
    new_task = Task.objects.create(
        name=request.POST.get('name'))
    task_before = get_object_or_404(
        Task,  # the model argument was missing from this call
        pk=request.POST.get('task_before_the_new_one'))
    task_after = task_before.task_after_this_one

    # modify the 2 other tasks
    task_before.task_after_this_one = new_task
    task_before.save()
    if task_after is not None:
        # 'task_after' will be None if 'task_before' is the last one in the queue
        task_after.task_before_this_one = new_task
        task_after.save()

    # update the newly created task
    new_task.task_before_this_one = task_before
    new_task.task_after_this_one = task_after  # this could be None
    new_task.save()

    return HttpResponse(new_task.pk)
This method only updates 2 other rows when inserting a new row. You might still want to wrap the whole method in a transaction if there is really high concurrency in your app, but that transaction will only lock up to 3 rows, not all the others as well (see the sketch below).
This approach might be of use to you if you have a very long list of tasks.
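A hedged sketch of that transactional variant, assuming the linked-list model above; select_for_update locks just the predecessor row (the successor row could be locked the same way if needed):

from django.db import transaction
from django.shortcuts import get_object_or_404

def task_create(request):
    with transaction.atomic():
        new_task = Task.objects.create(name=request.POST.get('name'))
        # lock the predecessor row until the transaction commits
        task_before = get_object_or_404(
            Task.objects.select_for_update(),
            pk=request.POST.get('task_before_the_new_one'))
        task_after = task_before.task_after_this_one
        task_before.task_after_this_one = new_task
        task_before.save()
        if task_after is not None:
            task_after.task_before_this_one = new_task
            task_after.save()
        new_task.task_before_this_one = task_before
        new_task.task_after_this_one = task_after
        new_task.save()
    return HttpResponse(new_task.pk)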
EDIT: how to get an ordered list of tasks
This cannot be done at the database level in a single query (as far as I know), but you could try this function:
def get_ordered_task_list():
    # get the first task
    aux_task = Task.objects.get(task_before_this_one=None)
    task_list = []
    while aux_task is not None:
        task_list.append(aux_task)
        aux_task = aux_task.task_after_this_one
    return task_list
As long as you only have a few hundred tasks, this operation should not take so much time that it impacts the response time. But you will have to try that out for yourself, in your environment, with your database and your hardware.

Implementing multiple person relationship

I've made a Facebook-like model, but I want a Personne to be able to have more than one link with another Personne.
I have an intermediary table PersonneRelation with a custom save method. The idea is: when I add a relation to a person, I want to create the same relation the other way around. The problem is that if I call save inside the save method, it's a recursive call. So my idea was to create a class variable and set it to True only when I want to avoid recursion.
Here's how I did it:
class Personne(models.Model):
    user = models.OneToOneField(User)
    relations = models.ManyToManyField('self', through='PersonneRelation',
                                       symmetrical=False)

class PersonneRelation(models.Model):
    is_saving = False
    # TAB_TYPES omitted for brevity
    type_relation = models.CharField(max_length=1,
                                     choices=[(a, b) for a, b in
                                              list(TAB_TYPES.items())],
                                     default=TYPE_FRIEND)
    src = models.ForeignKey('Personne', related_name='src')
    dst = models.ForeignKey('Personne', related_name='dst')
    opposite = models.ForeignKey('PersonneRelation',
                                 null=True, blank=True, default=None)

    def save(self, *args, **kwargs):
        if self.is_saving:
            return super(PersonneRelation, self).save(*args, **kwargs)
        old = None
        if self.pk and self.opposite:
            old = self.type_relation
        retour = super(PersonneRelation, self).save(*args, **kwargs)
        if old:
            PersonneRelation.objects.filter(
                src=self.dst, dst=self.src, opposite=self, type_relation=old
            ).update(type_relation=self.type_relation)
        if self.opposite is None:
            self.opposite = PersonneRelation(
                src=self.dst, dst=self.src, opposite=self,
                type_relation=self.type_relation,
                is_reverse=True)  # is_reverse: field definition omitted above, like TAB_TYPES
            self.opposite.save()
            self.is_saving = True
            self.save()
            self.is_saving = False
        return retour
My question is: is it safe to use a class variable (is_saving) like this? (I don't know how Python deals with such variables.) If not, why not? I feel like it's not OK, so what are the other possibilities for implementing a multiple many-to-many relationship that should behave like this?
Unfortunately, it's not safe, because it's not thread-safe. When two Django threads try to save your model simultaneously, the behaviour can be unpredictable.
If you want more reliable locking, take a look, for example, at Redis locking.
But to be honest, I'd try to implement it using plain reverse relations, perhaps encapsulating the complexity in a custom ModelManager.
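A minimal sketch of that manager idea (my reading of the suggestion, not the answerer's code), assuming the custom save() override is dropped so only plain saves run:

from django.db import models, transaction

class PersonneRelationManager(models.Manager):
    @transaction.atomic
    def create_pair(self, src, dst, type_relation):
        # create both directions in one transaction instead of inside save()
        forward = self.create(src=src, dst=dst, type_relation=type_relation)
        backward = self.create(src=dst, dst=src, type_relation=type_relation,
                               opposite=forward)
        forward.opposite = backward
        forward.save(update_fields=['opposite'])
        return forward

With objects = PersonneRelationManager() declared on PersonneRelation, callers would use PersonneRelation.objects.create_pair(a, b, TYPE_FRIEND) and never touch save() directly.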
Here's how I modified it: I totally removed the save method and used the post_save signal to check:
- if it was created without an opposite side, I create the opposite here (and I can do it here without any problem!), then I update the newly created relation with its opposite;
- if it wasn't created, this is an update, so I just make sure the opposite side is changed as well.
I did this because I'll almost never have to change relationships between people, and when I create new ones there won't be any possible race conditions, because of the context in which I will create new relationships.
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=PersonneRelation)
def signal_receiver(sender, **kwargs):
    created = kwargs['created']
    obj = kwargs['instance']
    if created and not obj.opposite:
        opposite = PersonneRelation(
            src=obj.dst, dst=obj.src, opposite=obj,
            type_relation=obj.type_relation, is_reverse=True)
        opposite.save()
        obj.opposite = opposite
        obj.save()
    elif not created and obj.type_relation != obj.opposite.type_relation:
        obj.opposite.type_relation = obj.type_relation
        obj.opposite.save()
If I get the idea behind your code, then:
Django automatically makes the relation available on both ends, so in your code you can get from the src Personne to the dst Personne via PersonneRelation, and in reverse from dst to src. Therefore there's no need for the additional opposite field on PersonneRelation.
If you need both symmetrical and asymmetrical relations, i.e. src -> dst but not dst -> src for a particular record, then I would suggest adding a boolean field:
class PersonneRelation(models.Model):
    symmetrical = models.BooleanField(default=False)
This way you can check whether symmetrical is True when accessing the relation in your code, to identify whether it's src -> dst only, or both src -> dst and dst -> src. In Facebook terms: if symmetrical is False, src is a subscriber of dst; if it's True, you get a mutual friendship between src and dst (see the illustration below). You might want to define a custom manager to encapsulate this behaviour, though that's a more advanced topic.
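A hedged illustration of that check, assuming the symmetrical field above (the helper name is mine):

from django.db.models import Q

def relations_of(personne):
    # mutual friendships: symmetrical rows, matched in either direction
    friends = PersonneRelation.objects.filter(symmetrical=True).filter(
        Q(src=personne) | Q(dst=personne))
    # subscriptions: asymmetrical rows where personne is the source
    subscriptions = PersonneRelation.objects.filter(symmetrical=False, src=personne)
    return friends, subscriptions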
If you need to check whether a model instance is being created or updated, there's no need for an is_saving boolean. Since you're using an automatic primary key field, you can just check whether pk on the model instance is None. In Django, before a model instance is saved to the DB for the first time ('created'), pk is None; when the instance is 'updated' (it has been read from the DB before and is now being saved with some field values changed), its pk is set to the pk value from the DB. This is how the Django ORM decides whether it should update or create a record (see the sketch below).
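For example, a generic sketch (not tied to any model in this thread):

def save(self, *args, **kwargs):
    creating = self.pk is None  # True for the first INSERT, False for an UPDATE
    super().save(*args, **kwargs)
    if creating:
        ...  # logic that should run only on first save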
In general, when overriding the save() method on a model, or when using signals like pre_save/post_save, take into consideration that the functions you define there might not be called by Django in some circumstances, e.g. when the model is updated in bulk. See the Django docs for more info.

Django Tests: setUpTestData on Postgres throws: "Duplicate key value violates unique constraint"

I am running into a database issue in my unit tests. I think it has something to do with the way I am using TestCase and setUpTestData.
When I try to set up my test data with certain values, the tests throw the following error:
django.db.utils.IntegrityError: duplicate key value violates unique constraint
...
psycopg2.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_productgroup_product_name_48ec6f8d_uniq"
DETAIL: Key (product_name)=(Almonds) already exists.
I changed all of my primary keys and it seems to run fine now. It doesn't seem to affect any of the tests.
However, I'm concerned that I am doing something wrong. When it first happened, I reverted about an hour's worth of work on my app (not that much code for a noob), which corrected the problem.
Then, when I wrote the changes back in, the same issue presented itself again. The TestCase is pasted below. The issue seems to occur after I add the sortrecord items, but it corresponds with the items above them.
I don't want to keep going through and changing primary keys and URLs in my tests, so if anyone sees something wrong with the way I am using this, please help me out. Thanks!
TestCase
class DetailsPageTest(TestCase):

    @classmethod
    def setUpTestData(cls):
        cls.product1 = ProductGroup.objects.create(
            product_name="Almonds"
        )
        cls.variety1 = Variety.objects.create(
            product_group=cls.product1,
            variety_name="non pareil",
            husked=False,
            finished=False,
        )
        cls.supplier1 = Supplier.objects.create(
            company_name="Acme",
            company_location="Acme Acres",
            contact_info="Call me!"
        )
        cls.shipment1 = Purchase.objects.create(
            tag=9,
            shipment_id=9999,
            supplier_id=cls.supplier1,
            purchase_date='2015-01-09',
            purchase_price=9.99,
            product_name=cls.variety1,
            pieces=99,
            kgs=999,
            crackout_estimate=99.9
        )
        cls.shipment2 = Purchase.objects.create(
            tag=8,
            shipment_id=8888,
            supplier_id=cls.supplier1,
            purchase_date='2015-01-08',
            purchase_price=8.88,
            product_name=cls.variety1,
            pieces=88,
            kgs=888,
            crackout_estimate=88.8
        )
        cls.shipment3 = Purchase.objects.create(
            tag=7,
            shipment_id=7777,
            supplier_id=cls.supplier1,
            purchase_date='2014-01-07',
            purchase_price=7.77,
            product_name=cls.variety1,
            pieces=77,
            kgs=777,
            crackout_estimate=77.7
        )
        cls.sortrecord1 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date="2015-02-05",
            bags_sorted=20,
            turnout=199,
        )
        cls.sortrecord2 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date="2015-02-07",
            bags_sorted=40,
            turnout=399,
        )
        cls.sortrecord3 = SortingRecords.objects.create(
            tag=cls.shipment1,
            date='2015-02-09',
            bags_sorted=30,
            turnout=299,
        )
Models
from datetime import datetime
from django.db import models
from django.db.models import Q

class ProductGroup(models.Model):
    product_name = models.CharField(max_length=140, primary_key=True)

    def __str__(self):
        return self.product_name

    class Meta:
        verbose_name = "Product"

class Supplier(models.Model):
    company_name = models.CharField(max_length=45)
    company_location = models.CharField(max_length=45)
    contact_info = models.CharField(max_length=256)

    class Meta:
        ordering = ["company_name"]

    def __str__(self):
        return self.company_name

class Variety(models.Model):
    product_group = models.ForeignKey(ProductGroup)
    variety_name = models.CharField(max_length=140)
    husked = models.BooleanField()
    finished = models.BooleanField()
    description = models.CharField(max_length=500, blank=True)

    class Meta:
        ordering = ["product_group_id"]
        verbose_name_plural = "Varieties"

    def __str__(self):
        return self.variety_name

class PurchaseYears(models.Manager):
    def purchase_years_list(self):
        unique_years = Purchase.objects.dates('purchase_date', 'year')
        results_list = []
        for p in unique_years:
            results_list.append(p.year)
        return results_list

class Purchase(models.Model):
    tag = models.IntegerField(primary_key=True)
    product_name = models.ForeignKey(Variety, related_name='purchases')
    shipment_id = models.CharField(max_length=24)
    supplier_id = models.ForeignKey(Supplier)
    purchase_date = models.DateField()
    estimated_delivery = models.DateField(null=True, blank=True)
    purchase_price = models.DecimalField(max_digits=6, decimal_places=3)
    pieces = models.IntegerField()
    kgs = models.IntegerField()
    crackout_estimate = models.DecimalField(max_digits=6, decimal_places=3, null=True)
    crackout_actual = models.DecimalField(max_digits=6, decimal_places=3, null=True)

    objects = models.Manager()
    purchase_years = PurchaseYears()

    # Keep manager as "objects" in case admin, etc. needs it. Filter can be called like so:
    # Purchase.objects.purchase_years_list()
    # Managers in docs: https://docs.djangoproject.com/en/1.8/intro/tutorial01/

    class Meta:
        ordering = ["purchase_date"]

    def __str__(self):
        return self.shipment_id

    def _weight_conversion(self):
        return round(self.kgs * 2.20462)

    lbs = property(_weight_conversion)

class SortingModelsBagsCalulator(models.Manager):
    def total_sorted(self, record_date, current_set):
        sorted = [SortingRecords['bags_sorted'] for SortingRecords in current_set if
                  SortingRecords['date'] <= record_date]
        return sum(sorted)

class SortingRecords(models.Model):
    tag = models.ForeignKey(Purchase, related_name='sorting_record')
    date = models.DateField()
    bags_sorted = models.IntegerField()
    turnout = models.IntegerField()

    objects = models.Manager()

    def __str__(self):
        return "%s [%s]" % (self.date, self.tag.tag)

    class Meta:
        ordering = ["date"]
        verbose_name_plural = "Sorting Records"

    def _calculate_kgs_sorted(self):
        kg_per_bag = self.tag.kgs / self.tag.pieces
        kgs_sorted = kg_per_bag * self.bags_sorted
        return round(kgs_sorted, 2)

    kgs_sorted = property(_calculate_kgs_sorted)

    def _byproduct(self):
        waste = self.kgs_sorted - self.turnout
        return round(waste, 2)

    byproduct = property(_byproduct)

    def _bags_remaining(self):
        current_set = SortingRecords.objects.values().filter(~Q(id=self.id), tag=self.tag)
        sorted = [SortingRecords['bags_sorted'] for SortingRecords in current_set if
                  SortingRecords['date'] <= self.date]
        remaining = self.tag.pieces - sum(sorted) - self.bags_sorted
        return remaining

    bags_remaining = property(_bags_remaining)
EDIT
It also fails with integers, like so.
django.db.utils.IntegrityError: duplicate key value violates unique constraint "InventoryLogs_purchase_pkey"
DETAIL: Key (tag)=(9) already exists.
UPDATE
So I should have mentioned this earlier, but I completely forgot: I have two unit test files that use the same data. Just for kicks, I set a primary key in both instances of setUpTestData() to the same value and, sure enough, I got the same error.
These two setups were working fine side by side before I added more data to one of them. Now it appears that they need different values. I guess you can only get away with using repeated data for so long.
I continued to get this error without having any duplicate data, but I was able to resolve the issue by instantiating the object and calling its save() method rather than creating the object via Model.objects.create().
In other words, I did this:
@classmethod
def setUpTestData(cls):
    cls.person = Person(first_name="Jane", last_name="Doe")
    cls.person.save()
Instead of this:
@classmethod
def setUpTestData(cls):
    cls.person = Person.objects.create(first_name="Jane", last_name="Doe")
I've been running into this issue sporadically for months now. I believe I just figured out the root cause and a couple of solutions.
Summary
For whatever reason, it seems like the Django test case base classes aren't removing the database records created by (let's just call it) TestCase1 before running TestCase2. So when TestCase2 tries to create records in the database using the same IDs as TestCase1, the database raises a duplicate key exception because those IDs already exist. And even saying the magic word "please" won't help with database duplicate key errors.
The good news is, there are multiple ways to solve this problem! Here are a couple...
Solution 1
Make sure that, if you are overriding the class method tearDownClass, you call super().tearDownClass(). If you override tearDownClass() without calling its super, it will in turn never call TransactionTestCase._post_teardown() nor TransactionTestCase._fixture_teardown(). Quoting from the docstring of TransactionTestCase._post_teardown():
def _post_teardown(self):
    """
    Perform post-test things:
    * Flush the contents of the database to leave a clean slate. If the
      class has an 'available_apps' attribute, don't fire post_migrate.
    * Force-close the connection so the next test gets a clean cursor.
    """
If TestCase.tearDownClass() is not called via super(), then the database is not reset between test cases and you will get the dreaded duplicate key exception.
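A minimal sketch of a correct override (do_my_extra_cleanup is a hypothetical placeholder; the super() call is the part that matters):

from django.test import TestCase

class MyTestCase(TestCase):
    @classmethod
    def tearDownClass(cls):
        do_my_extra_cleanup()  # hypothetical class-level cleanup of your own
        super().tearDownClass()  # without this, Django's own class-level DB cleanup never runs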
Solution 2
Subclass TransactionTestCase and set the class attribute serialized_rollback = True, like this:
class MyTestCase(TransactionTestCase):
    fixtures = ['test-data.json']
    serialized_rollback = True

    def test_name_goes_here(self):
        pass
Quoting from the source:
class TransactionTestCase(SimpleTestCase):
    ...
    # If transactions aren't available, Django will serialize the database
    # contents into a fixture during setup and flush and reload them
    # during teardown (as flush does not restore data from migrations).
    # This can be slow; this flag allows enabling on a per-case basis.
    serialized_rollback = False
When serialized_rollback is set to True, the Django test runner rolls back any transactions inserted into the database between test cases. And batta bing, batta bang... no more duplicate key errors!
Conclusion
There are probably many more ways to implement a solution for the OP's issue, but these two should work nicely. I'd definitely love to have more solutions added by others, for clarity's sake and for a deeper understanding of the underlying Django test case base classes. Phew, say that last line real fast three times and you could win a pony!
The log you provided states DETAIL: Key (product_name)=(Almonds) already exists. Did you verify this in your db?
To prevent such errors in the future, you could prefix all your test data strings with test_.
I discovered the issue, as noted at the bottom of the question.
From what I can tell, the database didn't like me using duplicate data in the setUpTestData() methods of two different tests. Changing the primary key values in the second test corrected the problem.
I think the problem here is that you had a tearDownClass method in your TestCase without a call to super().
That way the Django TestCase loses the transactional functionality behind setUpTestData, so it doesn't clean your test db after the TestCase finishes.
Check the warning in the Django docs here:
https://docs.djangoproject.com/en/1.10/topics/testing/tools/#django.test.SimpleTestCase.allow_database_queries
I had a similar problem that was caused by providing the primary key value to a test case explicitly.
As discussed in the Django documentation, manually assigning a value to an auto-incrementing field doesn't update the field's sequence, which might later cause a conflict.
I solved it by altering the sequence manually:
from django.db import connection

class MyTestCase(TestCase):

    @classmethod
    def setUpTestData(cls):
        Model.objects.create(id=1)
        with connection.cursor() as c:
            c.execute(
                """
                ALTER SEQUENCE "app_model_id_seq" RESTART WITH 2;
                """
            )

Is it possible to write a QuerySet method that modifies the dataset but delays evaluation (similar to prefetch_related)?

I'm working on a QuerySet class that does something similar to prefetch_related, but allows the query to link data that lives in an unconnected database (basically, linking records from the Django app's database to records in a legacy system, using a shared unique key), something along the lines of:
class UserFoo(models.Model):
    ''' Uses the django database & can link to User model '''
    user = models.OneToOneField(User, related_name='userfoo')
    foo_record = models.CharField(
        max_length=32,
        db_column="foo",
        unique=True
    )  # uuid pointing to legacy db table

    @property
    def foo(self):
        if not hasattr(self, '_foo'):
            self._foo = Foo.objects.get(uuid=self.foo_record)
        return self._foo

    @foo.setter
    def foo(self, foo_obj):
        self._foo = foo_obj
and then
class Foo(models.Model):
    '''Uses legacy database'''
    id = models.AutoField(primary_key=True)
    uuid = models.CharField(max_length=32)  # uuid for Foo legacy db table
    …

    @property
    def user(self):
        if not hasattr(self, '_user'):
            self._user = User.objects.get(userfoo__foo_record=self.uuid)
        return self._user

    @user.setter
    def user(self, user_obj):
        self._user = user_obj
Run normally, a query that matches 100 foos (each with, say, 1 user record) will end up requiring 101 queries: one to get the foos, and a hundred for the user records (one lookup per foo, triggered by calling the user property on each foo).
To get around this, I am making something similar to prefetch_related, which pulls all of the matching records for a query by the shared key, meaning I just need one additional query to get the remaining records.
My code looks something like this:
class FooWithUserQuerySet(models.query.QuerySet):
    def with_foo(self):
        qs = self._clone()
        foo_idx = {}
        for record in self.all():
            foo_idx.setdefault(record.uuid, []).append(record)
        users = User.objects.filter(
            userfoo__foo_record__in=foo_idx.keys()
        ).select_related('django', 'relations', 'here')
        user_idx = {}
        for user in users:
            user_idx[user.userfoo.foo_record] = user
        for fid, frecords in foo_idx.items():
            user = user_idx.get(fid)
            for frecord in frecords:
                if user:
                    setattr(frecord, 'user', user)
        return qs
This works, but any extra data saved to a foo is lost if the query is later modified, that is, if the queryset is re-ordered or filtered in any way.
I would like a way to create a method that does exactly what I am doing now, but defers the work until the moment the queryset is evaluated, so that foo records always have a User record.
Some notes:
the example has been highly simplified. There are actually a lot of tables that link up to the legacy data, so, for example, although there is a one-to-one relationship between Foo and User, there will be some cases where a queryset has multiple Foo records with the same key.
the legacy database is on a different server and server platform, so I can't link the two tables using the database server itself.
ideally I'd like the User data to be cached, so that even if the records are sorted or sliced I don't have to run the foo query a second time.
Basically, I don't know enough about the internals of how lazy evaluation of querysets works to do the necessary coding. I have jumped back and forth through the source code of django.db.models.query, but it really is a fairly dense read, and I'm hoping someone who has already worked with this can offer some pointers.
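One possible direction, sketched against Django's private QuerySet internals (so version-sensitive and untested here, and assuming the question's models): _fetch_all() is the single choke point that fills _result_cache whenever a queryset is evaluated, and prefetch_related does its work in the same place. Hooking it lets the user-attaching step run after every evaluation, no matter how the queryset was re-filtered or re-ordered in the meantime:

from django.db import models

class FooWithUserQuerySet(models.query.QuerySet):
    def with_foo(self):
        qs = self._clone()
        qs._with_foo = True
        return qs

    def _clone(self, **kwargs):
        # keep the flag alive across filter()/order_by()/slicing clones
        qs = super()._clone(**kwargs)
        qs._with_foo = getattr(self, '_with_foo', False)
        return qs

    def _fetch_all(self):
        already_loaded = self._result_cache is not None
        super()._fetch_all()  # runs the query and fills self._result_cache
        if getattr(self, '_with_foo', False) and not already_loaded:
            self._attach_users(self._result_cache)

    def _attach_users(self, records):
        # same index-and-assign logic as with_foo() in the question, applied
        # to the freshly evaluated rows
        foo_idx = {}
        for record in records:
            foo_idx.setdefault(record.uuid, []).append(record)
        users = User.objects.filter(
            userfoo__foo_record__in=foo_idx
        ).select_related('userfoo')
        for user in users:
            for frecord in foo_idx.get(user.userfoo.foo_record, ()):
                frecord.user = user

Because the attach step runs inside _fetch_all(), a later .filter() or .order_by() produces a clone that still carries _with_foo and re-attaches users when it is eventually evaluated.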

How do I set a model field to the current count?

Say I have some groups with some items. I would like items to have a unique index within a group:
class Item(models.Model):
    group = models.ForeignKey(Group, null=True)
    index = models.IntegerField(default=0)

    class Meta:
        unique_together = ('group', 'index')

    def save(self, *args, **kwargs):
        if self.pk is None and self.group_id is not None:
            # was: self.thread_index and a bare group_id (a NameError) in the original
            self.index = Item.objects.filter(group_id=self.group_id).count() + 1
        return super(Item, self).save(*args, **kwargs)
But this is problematic because the update is not atomic: another transaction may add another row after I calculate the index and before I write to the database.
I understand that I can get it working by catching IntegrityError and retrying. But I wonder if there's a good way to populate the index atomically as part of the insert command, with SQL something like:
insert into app_item (group_id, index)
select 25, count(*) from app_item where group_id=25
Since you're using Django models, what does adding an index field give you over the primary key that is added to all models unless you specify otherwise? Unless you need items to have sequential indices, which would still be problematic upon deleting objects, the id field would certainly appear to satisfy the requirement as you outlined it.
If you do have to have this field, then an option could be to take advantage of signals. Write a receiver for the object's post_save signal; it could conceivably handle it, along the lines of the sketch below.
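A hedged sketch of that signal idea, assuming the Item model above (note it is still racy between the count and the update unless you add locking or a transaction):

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Item)
def assign_index(sender, instance, created, **kwargs):
    if created and instance.group_id is not None and instance.index == 0:
        count = Item.objects.filter(group_id=instance.group_id).exclude(pk=instance.pk).count()
        # queryset.update() avoids re-firing post_save, unlike instance.save()
        Item.objects.filter(pk=instance.pk).update(index=count + 1)
        instance.index = count + 1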
If you have issues with the signals or custom-save approaches, one possible solution is to calculate the index at runtime by making index a property:
class Item(models.Model):
    group = models.ForeignKey(Group, null=True)

    def _get_index(self):
        "Returns the index of the item within its group"
        return Item.objects.filter(group_id=self.group_id).count() + 1

    index = property(_get_index)
Now use index like a field on the Item model.
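For completeness, a sketch of one more option that none of the answers above mention (my assumption, mirroring the select_for_update pattern from the task-ordering question earlier): serialize index assignment by locking the parent Group row, so the count cannot race:

from django.db import transaction

def create_item(group_id, **fields):
    with transaction.atomic():
        # lock the parent row; concurrent inserts for the same group wait here
        group = Group.objects.select_for_update().get(pk=group_id)
        next_index = Item.objects.filter(group=group).count() + 1
        return Item.objects.create(group=group, index=next_index, **fields)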