How do I deal with this race condition in Django?

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.
In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".
# Model:
class MyObj(models.Model):
    thing = models.ForeignKey(Thing)
    owner = models.ForeignKey(User)
    state = models.BooleanField()

    class Meta:
        unique_together = (('thing', 'owner'),)
# Update or create myobj
@transaction.commit_on_success
def create_or_update_myobj(owner, thing, state):
    try:
        myobj, created = MyObj.objects.get_or_create(owner=owner, thing=thing)
    except IntegrityError:
        myobj = MyObj.objects.get(owner=owner, thing=thing)
        # Will sometimes throw "DoesNotExist: MyObj matching query does not exist"
    myobj.state = state
    myobj.save()
I use an InnoDB MySQL database on Ubuntu.
How do I safely deal with this problem?

This could be an off-shoot of the same problem as here:
Why doesn't this loop display an updated object count every five seconds?
Basically get_or_create can fail - if you take a look at its source, you'll see that it does: get; if that fails: save (plus some trickery); if that still fails: get again; if that still fails: surrender and raise.
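Roughly paraphrased, that flow looks like this (just the shape of the logic, not Django's actual source):
from django.db import IntegrityError

def get_or_create(self, **kwargs):
    # Paraphrased sketch of get_or_create - not the real implementation
    try:
        return self.get(**kwargs), False
    except self.model.DoesNotExist:
        try:
            return self.create(**kwargs), True
        except IntegrityError:
            # Another thread created the row first; try to read it back.
            # Under a frozen REPEATABLE READ snapshot this get() can itself
            # raise DoesNotExist - which is the error from the question.
            return self.get(**kwargs), False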
This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:
- the first thread tries to get it - but it doesn't yet exist,
- so, the thread tries to create it, but before the object is created...
- ...the second thread tries to get it - and this obviously fails,
- now, because of the default AUTOCOMMIT=OFF for the MySQLdb database connection, and the REPEATABLE READ isolation level, both threads have frozen their views of the MyObj table,
- subsequently, the first thread creates its object and returns it gracefully, but...
- ...the second thread cannot create anything, as that would violate the unique constraint,
- and, what's funny, a subsequent get on the second thread doesn't see the object created in the first thread, due to its frozen view of the MyObj table.
So, if you want to safely get_or_create anything, try something like this:
@transaction.commit_on_success
def my_get_or_create(...):
    try:
        obj = MyObj.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = MyObj.objects.get(...)
    return obj
Edited on 27/05/2010
There is also a second solution to the problem - using the READ COMMITTED isolation level instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it - but at least it allows tying views to transactions, without committing in the middle.
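For illustration, one way to request READ COMMITTED on every new MySQL connection is the MySQLdb init_command option in settings.py (a sketch; the database name is a placeholder):
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',  # placeholder
        'OPTIONS': {
            # Ask MySQL for READ COMMITTED instead of the default REPEATABLE READ
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    },
}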
Edited on 22/01/2012
Here are some good blog posts (not mine) about MySQL and Django, related to this question:
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html

Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.
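For example, a minimal sketch of that suggestion, using the model from the question:
# Pass state via defaults so a newly created row gets the right value;
# for an existing row, update it explicitly afterwards.
myobj, created = MyObj.objects.get_or_create(
    owner=owner,
    thing=thing,
    defaults={'state': state},  # only applied when the row is created
)
if not created:
    myobj.state = state
    myobj.save()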

One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.
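A minimal sketch of that retry idea, reusing the function from the question (the attempt count and delay are arbitrary):
import time

def create_or_update_with_retry(owner, thing, state, attempts=3, delay=0.1):
    # Hypothetical wrapper: retry a couple of times, waiting a bit longer
    # each time, before letting the exception propagate.
    for attempt in range(attempts - 1):
        try:
            return create_or_update_myobj(owner, thing, state)
        except MyObj.DoesNotExist:
            time.sleep(delay * (attempt + 1))
    return create_or_update_myobj(owner, thing, state)  # final try, may raise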

Since 2012, Django has had select_for_update, which locks rows until the end of the transaction.
To avoid race conditions in Django + MySQL under default circumstances:
- REPEATABLE READ in MySQL
- READ COMMITTED in Django
you can use this:
with transaction.atomic():
    instance = YourModel.objects.select_for_update().get(id=42)
    instance.evolve()
    instance.save()
The second thread will wait on the lock until the first thread is done; only then will it read the data saved by the first thread, so it will work on updated data.
Then together with get_or_create:
def select_for_update_or_create(...):
    instance = YourModel.objects.filter(
        ...
    ).select_for_update().first()
    if instance is None:
        instance = YourModel.objects.create(...)
    return instance
The function must be called inside a transaction block, otherwise you will get this from Django:
TransactionManagementError: select_for_update cannot be used outside of a transaction
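A hypothetical call site for the helper above might look like this (the field names are illustrative):
from django.db import transaction

# The helper must run inside a transaction for select_for_update to work
with transaction.atomic():
    instance = select_for_update_or_create(owner=owner, thing=thing)
    instance.state = True
    instance.save()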
Also, sometimes it's good to use refresh_from_db(). In a case like:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something
you'd like to see:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db() # 3
instance.attr = response.something
and that line # 3 will greatly reduce the time window of possible race conditions, and thus the chance of one.

Related

django.db.transaction.TransactionManagementError: cannot perform saving of other object in model within transaction

Can't seem to find much info about this. This is NOT happening in a Django test. I'm using DATABASES = { ATOMIC_REQUESTS: True }. Within a method (in a mixin I created) called by the view, I'm trying to perform something like this:
def process_valid(self, view):
    old_id = view.object.id
    view.object.id = None  # need a new instance in db
    view.object.save()
    old_fac = Entfac.objects.get(id=old_id)
    new_fac = view.object
    old_dets = Detfac.objects.filter(fk_ent__id__exact=old_fac.id)
    new_formset = view.DetFormsetClass(view.request.POST, instance=view.object, save_as_new=True)
    if new_formset.is_valid():
        new_dets = new_formset.save()
    new_fac.fk_cancel = old_fac  # need a fk reference to initial fac in new one
    old_fac.fk_cancel = new_fac  # need a fk reference to new in old fac
    # any save() action after this crashes with TransactionManagementError
    new_fac.save()
I do not understand this error. I already created & saved a new object in the db (when I set object.id to None & saved it). Why would creating other objects create an issue for further saves?
I have tried not instantiating the new_dets objects with the formset, but instead explicitly defining them:
new_det = Detfac(...)
new_det.save()
But then again, any further save after that raises the error.
Further details:
Essentially, I have an Entfac model, and a Detfac model that has a foreign key to Entfac. I need to instantiate a new Entfac (distinct in the db), as well as corresponding new Detfac records for the new Entfac. Then I need to change some values in some of the fields for both the new & old objects, and save all that to the db.
Ah. The code above is fine.
But it turns out signals can be bad. I had forgotten that upon saving a Detfac, a signal goes to another class and, depending on the circumstances, adds a record to another table (a sort of history table).
That signal is just a single operation, something like this:
@receiver(post_save, sender=Detfac)
def quantity_adjust_detfac(sender, **kwargs):
    try:
        detfac_qty = kwargs["instance"].qte
        product = kwargs["instance"].fk_produit
        if kwargs["created"]:
            initial = {}  # bunch of values, elided here
            adjustment = HistoQuantity(**initial)
            adjustment.save()
    except TypeError as ex:
        logger.error(f"....")
    except AttributeError as ex:
        logger.error(f"....")
In itself, the fact that THIS wasn't marked as atomic isn't problematic. BUT if one of those exceptions is thrown, THEN I get the TransactionManagementError. I am still not 100% sure why, though the Django docs do mention that when wrapping a whole view in atomic (or any chunk of code, for that matter), a try/except within that block can yield unexpected results, because Django relies on exceptions to decide whether or not to commit the transaction as a whole. And the data I was testing with actually threw the exception (a TypeError when creating the HistoQuantity object).
Wrapping the try/except with a transaction.atomic manager worked, however. I'm guessing that this removed/handled the throw, so the outer atomic could work.
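A sketch of that fix, assuming the same handler as above - the extra atomic block acts as a savepoint, so a failure in the handler rolls back cleanly instead of poisoning the outer transaction:
from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Detfac)
def quantity_adjust_detfac(sender, **kwargs):
    try:
        # Inner atomic block = a savepoint inside the outer transaction:
        # if the work below raises, only this savepoint is rolled back,
        # and the outer transaction stays usable.
        with transaction.atomic():
            detfac_qty = kwargs["instance"].qte
            product = kwargs["instance"].fk_produit
            if kwargs["created"]:
                initial = {}  # bunch of values, elided as above
                adjustment = HistoQuantity(**initial)
                adjustment.save()
    except TypeError as ex:
        logger.error(f"....")
    except AttributeError as ex:
        logger.error(f"....")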

How to make obj.save() without reversing object values in the db in Django

I have a recursive function and obj.save() is inside it.
How do I prevent a query to the db at every iteration?
Does Django's transaction.atomic do that?
If you're using Django >= 2.2 (which you should be, since ALL other versions of Django are 100% out of support as of my writing this, Jan 5, 2020), you can do this:
objs = []
for obj in Entry.objects.filter(...):
    if not obj.condition:
        continue
    obj.headline = 'something!!!'
    obj.author = 'John Smith'
    objs.append(obj)

with transaction.atomic():
    Entry.objects.bulk_update(objs, ['headline', 'author'])
Couple of things to note:
- all the work is done outside of the transaction.atomic
- transaction.atomic means that if anything fails inside that block, the WHOLE piece of work (the transaction) is rolled back, and no part of it is kept. Example: you have 2 authors to save; the first one saves successfully, the second one does not. Because this is inside the transaction.atomic block, neither of them is committed. It has nothing to do with doing it all in one query.
More information can be found here: https://docs.djangoproject.com/en/3.1/ref/models/querysets/#bulk-update

Efficient Bulk Update of Model Records in Django

I'm building a Django app that will periodically take information from an external source, and use it to update model objects.
What I want to be able to do is create a QuerySet which has all the objects which might match the final list, then check which model objects need to be created, updated, and deleted, and then (ideally) perform the update in the fewest number of transactions, and without performing any unnecessary DB operations.
Using update_or_create gets me most of the way to what I want:
jobs = get_current_jobs(host, user)
for host, user, name, defaults in jobs:
    obj, _ = Job.objects.update_or_create(host=host, user=user, name=name, defaults=defaults)
The problem with this approach is that it doesn't delete anything that no longer exists.
I could just delete everything up front, or do something dumb like:
to_delete = set(Job.objects.filter(host=host, user=user)) - set(current)
(which is an option), but I feel like there must already be an elegant solution that doesn't require either deleting everything or reading everything into memory.
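For concreteness, the best I've come up with is a sketch like this, building on the loop above (tracking the primary keys that were seen, then deleting the rest without loading whole objects):
jobs = get_current_jobs(host, user)
seen_pks = []
for h, u, name, defaults in jobs:
    obj, _ = Job.objects.update_or_create(host=h, user=u, name=name, defaults=defaults)
    seen_pks.append(obj.pk)
# Anything for this host/user that was not in the feed gets removed
Job.objects.filter(host=host, user=user).exclude(pk__in=seen_pks).delete()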
You should use Redis for storage, together with the redis Python package in your code. For example:
import json
import redis
import requests

pool = redis.StrictRedis('localhost')
time_in_seconds = 3600  # the time period you want to keep your data
response = requests.get("url_to_ext_source")
# Redis stores strings/bytes, so serialize the JSON payload first
pool.set("api_response", json.dumps(response.json()), ex=time_in_seconds)

Multi-DB Transactions

Django Version 1.10.5 with Postgres 9.6.1
For the last year I've been working in a multi-schema default database environment. However things are beginning to grow to the point I've decided to split the single database into 3 databases.
I've got things working with a master/slave router for all 3 databases.
I am not using the 'default' database key. Instead I have 'db1', 'db2', and 'db3'
The part I am confused about is with transactions in this multi-database environment.
In this example it fails as expected, caused of course by not using @transaction.atomic(using='db1'), which is clear to me:
@transaction.atomic()
def edit(self, context):
    """Edit

    :param dict context: Context
    :return: None
    """
    # Check if employee exists
    try:
        result = Passport.objects.get(pk=self.user.employee_id)
    except Passport.DoesNotExist:
        return False
    result.name = context.get('name')
    result.save()
However I have this strange example, simply because I'm trying to understand... I would have expected this to fail, but it does not:
@transaction.atomic(using='db1')
def edit(self, context):
    """Edit

    :param dict context: Context
    :return: None
    """
    # Check if employee exists
    try:
        result = Passport.objects.get(pk=self.user.employee_id)
    except Passport.DoesNotExist:
        return False
    result.name = context.get('name')
    with transaction.atomic(using='db2'):
        result.save()
The model Passport does not exist in DB2 models at all.
My router is set up so that all writes go to their respective DBs.
So what is the purpose of setting using='db1' in the atomic transaction? I've looked at the source and I see it defaults to 'default' when using is not given.
In the above example I even made another transaction inside of the initial transaction, but this time using='db2', where the model doesn't even exist. I figured that would have failed, but it didn't, and the data was written to the proper database.
I bring this up because there will be situations where I need to interact with all 3 databases, and if a single problem occurs when writing to any of them, all 3 need to be rolled back; on success of everything, all committed of course.
Perhaps someone can help break this down for me so I can understand?
You're interpreting transaction.atomic(using='X') to mean: run the following database commands on X, inside a transaction.
In fact, it just means: open a transaction on database X, and then either commit it or roll it back at the end of the block.
Or, as the documentation puts it:
Under the hood, Django’s transaction management code:
opens a transaction when entering the outermost atomic block;
commits or rolls back the transaction when exiting the outermost block.
The question of which database to use for a given command is determined by your router, not the using clause. So your transaction.atomic(using='db2') block is pointless (it will simply open a transaction on db2 and then close it), but not an error.
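For the all-or-nothing case across all 3 databases, you can nest one atomic block per alias (a sketch; note this is not a true two-phase commit - each transaction still commits separately as its block exits, so a crash between those commits can in principle still leave the databases inconsistent):
from django.db import transaction

# An exception raised anywhere inside rolls back all three transactions;
# the router still decides which database each individual write goes to.
# The object names here are hypothetical.
with transaction.atomic(using='db1'):
    with transaction.atomic(using='db2'):
        with transaction.atomic(using='db3'):
            passport.save()
            invoice.save()
            audit_entry.save()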

Implementing unread/read checking for a message

I have a message model. To this model I want to add a read/unread field, which I did by using a boolean field. Now, if someone reads a message, I want this boolean field to be set to true. I access these messages at different parts of my app, so updating the field manually is going to be tedious.
Is there any way I can get some messages according to some condition, and when the messages are fetched from the db, have the field auto-updated?
Why don't you create a read_message() method on a custom model manager? Have this method return the message you want, whilst also updating the read field on it.
Your new method allows you to replace Message.objects.get() with Message.objects.read_message():
class MessageManager(models.Manager):
    def read_message(self, message_id):
        # This won't fail quietly - it'll raise an ObjectDoesNotExist exception
        message = super(MessageManager, self).get(pk=message_id)
        message.read = True
        message.save()
        return message
Then include the manager on your model:
class Message(models.Model):
    objects = MessageManager()
Obviously you could write other methods that return querysets whilst marking all the messages returned as read.
If you don't want to update your code (places where you call Message.objects.get()), then you could always actually override get() so that it updates the read field. Just replace the read_message function name above with get.
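That override might look like this (a sketch in the style of the manager above):
class MessageManager(models.Manager):
    def get(self, *args, **kwargs):
        # Every Message.objects.get() now marks the fetched message as read
        message = super(MessageManager, self).get(*args, **kwargs)
        if not message.read:
            message.read = True
            message.save()
        return message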
Depending on your database management system, you may be able to install a trigger:
PostgreSQL: http://www.postgresql.org/docs/9.1/static/sql-createtrigger.html
MySQL: http://dev.mysql.com/doc/refman/5.0/en/triggers.html
SQLite: http://www.sqlite.org/lang_createtrigger.html
Of course, this will need to be done manually in the database - outside of the Django application.