How to make obj.save() without reversing object values in the db in django - django

I have recursive function and obj.save() is inside it.
how to prevent the query from db at every iteration?
is django transaction.atomic do that.

If you're using django >= 2.2 (which you should be using.. since ALL other versions of django are 100% out of support as of me writing this Jan 5, 2020) you can do this:
objs = []
for obj in Entry.objects.filter(...):
if not obj.condition:
continue
obj.headline = 'something!!!'
obj.author = 'John Smith'
objs.append(obj)
with transaction.atomic():
Entry.objects.bulk_update(objs, ['headline', 'author'])
Couple of things to note:
all the work is done outside of the transaction.atomic
transaction.atomic means that if anything fails inside that block, it will rollback the WHOLE work (transaction) and not keep a piece of it around. Example: you have 2 authors to save, first one saves successfully, second one does not. Because is inside the transaction atomic, it means both of them are NOT committed. It has nothing to do with doing it all in one query
More information could be found here: https://docs.djangoproject.com/en/3.1/ref/models/querysets/#bulk-update

Related

django.db.transaction.TransactionManagementError: cannot perform saving of other object in model within transaction

Can't seem to find much info about this. This is NOT happening in a django test. I'm using DATABASES = { ATOMIC_REQUESTS: True }. Within a method (in mixin I created) called by the view, I'm trying to perform something like this:
def process_valid(self, view):
old_id = view.object.id
view.object.id = None # need a new instance in db
view.object.save()
old_fac = Entfac.objects.get(id=old_id)
new_fac = view.object
old_dets = Detfac.objects.filter(fk_ent__id__exact = old_fac.id)
new_formset = view.DetFormsetClass(view.request.POST, instance=view.object, save_as_new=True)
if new_formset.is_valid():
new_dets = new_formset.save()
new_fac.fk_cancel = old_fac # need a fk reference to initial fac in new one
old_fac.fk_cancel = new_fac # need a fk reference to new in old fac
# any save() action after this crashes with TransactionManagementError
new_fac.save()
I do not understand this error. I already created & saved a new object in db (when I set the object.id to None & saved that). Why would creating other objects create an issue for further saves?
I have tried not instantiating the new_dets objects with the Formset, but instead explicitely defining them:
new_det = Detfac(...)
new_det.save()
But then again, any further save after that raises the error.
Further details:
Essentially, I have an Entfac model, and a Detfac model that has a foreignkey to Entfac. I need to instantiate a new Enfac (distinct in db), as well as corresponding new Detfac for the new Entfac. Then I need to change some values in some of the fields for both new & old objects, and save all that to db.
Ah. The code above is fine.
But turns out, signals can be bad. I had forgotten that upon saving Detfac, there is a signal that goes to another class and that depending on the circumstances, adds a record to another table (sort of an history table).
Since that signal is just a single operation. Something like that:
#receiver(post_save, sender=Detfac)
def quantity_adjust_detfac(sender, **kwargs):
try:
detfac_qty = kwargs["instance"].qte
product = kwargs["instance"].fk_produit
if kwargs["created"]:
initial = {# bunch of values}
adjustment = HistoQuantity(**initial)
adjustment.save()
else:
except TypeError as ex:
logger.error(f"....")
except AttributeError as ex:
logger.error(f"....")
In itself, the fact that THIS wasn't marked as atomic isn't problematic. BUT if one of those exception throws, THEN I get the transactionmanagementerror. I am still not 100% sure why, tough the django docs do mention that when wrapping a whole view in atomic (or any chunk of code for that matter), then try/except within that block can yield unexpected result, because DJango does rely on exception to decide whether or not to commit the transaction as a whole. And the data I was testing with actually threw the exception (type error when creating the HistoQuantity object).
Wrapping the try/exception with a transaction.atomic manager worked however. Guessing that this... removed/handled the throw, thus the outer atomic could work.

Django when looping over a queryset, when does the db read happen?

I am looping through my database and updating all my Company objects.
for company in Company.objects.filter(updated=False):
driver.get(company.company_url)
company.adress = driver.find_element_by_id("address").text
company.visited = True
company.save()
My problem is that it's taking too long so I wanted to run another instance of this same code, but I'm curious when the actual db reads happen. If company.visited get's changed to True while this loop is running, will still be visited by this loop? What if I added a second check for visited? I don't want to start a second loop if the first instance isn't going to recognize the work of the second instance:
for company in Company.objects.filter(updated=False):
if company.visited:
continue
driver.get(company.company_url)
company.adress = driver.find_element_by_id("address").text
company.visited = True
company.save()
Company.objects.filter(updated=False) translates to an ordinary SQL query:
SELECT * FROM appName_company WHERE updated is false
This SQL query is executed when you start iterating through Company objects. It's executed only once. The second server will not recognize the work of the first one, because they both will go through the same Company objects.
Lock rows to avoid race conditions using atomic transactions and select_for_update():
from django.db import transaction
for company in Company.objects.filter(updated=False):
with transaction.atomic():
Company.objects.select_for_update().get(id=company.id)
if company.visited:
continue
driver.get(company.company_url)
company.adress = driver.find_element_by_id("address").text
company.visited = True
company.save()
You can run this code on multiple servers. Each Company will be processed just once.
If you need to execute this code regularly, I highly recommend using Celery. Dispatch a task per each company, and let multiple workers do the work in parallel:
from celery import shared_task
#shared_task
def dispatch_tasks():
for company in Company.objects.filter(updated=False):
process_company.delay(company.id)
#shared_task
#transaction.atomic
def process_company(company_id):
company = Company.objects.select_for_update().get(id=company_id)
if company.visited:
continue
driver.get(company.company_url)
company.adress = driver.find_element_by_id("address").text
company.visited = True
company.save()
Edit: oh, I see that you've tagged the question with the sqlite tag. I recommend switching to PostgreSQL, as SQLite is really bad at concurrency. My answer should work with SQlite, but locks may slow down the database.

Multi-DB Transactions

Django Version 1.10.5 with Postgres 9.6.1
For the last year I've been working in a multi-schema default database environment. However things are beginning to grow to the point I've decided to split the single database into 3 databases.
I've got things working with a master/slave router for all 3 databases.
I am not using the 'default' database key. Instead I have 'db1', 'db2', and 'db3'
The part I am confused about is with transactions in this multi-database environment.
In this example it fails as expected. Caused of course by not using #transaction.atomic(using='db1') which is clear to me.
#transaction.atomic()
def edit(self, context):
"""Edit
:param dict context: Context
:return: None
"""
# Check if employee exists
try:
result = Passport.objects.get(pk=self.user.employee_id)
except Passport.DoesNotExist:
return False
result.name = context.get('name')
result.save()
However I have this strange example, simply because I'm trying to understand... I would have expected this to fail but it does not:
#transaction.atomic(using='db1')
def edit(self, context):
"""Edit
:param dict context: Context
:return: None
"""
# Check if employee exists
try:
result = Passport.objects.get(pk=self.user.employee_id)
except Passport.DoesNotExist:
return False
result.name = context.get('name')
with transaction.atomic(using='db2'):
result.save()
The model Passport does not exist in DB2 models at all.
My router is setup so that all writes go to each respected DB.
So what is the purpose of setting the using='db1' in the atomic transaction? I've looked at the source and I see it defaults to default when not "using".
In the above example I even made another transaction inside of the initial transaction but this time using='db2' where the model doesn't even exist. I figured that would have failed, but it didn't and the data was written to the proper database.
I bring this up because there will be situations where I need to interact with all 3 databases and if a single problem occurs when writing to all 3 databases, all 3 need to be rolled back or if on success of everything, then committed of course.
Perhaps someone can help break this down for me so I can understand?
You're interpreting transaction.atomic(using='X') to mean: run the following database commands on X, inside a transaction.
In fact, it just means: open a transaction on database X, and then either commit it or roll it back at the end of the block.
Or, as the documentation puts it:
Under the hood, Django’s transaction management code:
opens a transaction when entering the outermost atomic block;
commits or rolls back the transaction when exiting the outermost block.
The question of which database to use for a given command is determined by your router, not the using clause. So your transaction.atomic(using='db2') block is pointless (it will simply open a transaction on db2 and then close it), but not an error.

Django test case db giving inconsistent responses, caching or transaction culprit?

I am seeing some really surprising and frustrating behavior with Django testing. Model objects are being "found" by a related lookup, but no model objects exist. (I apologize for the weird description here...the behavior is bizarre enough that I don't know quite how to describe it. Do the objects exist? Do I exist? Do you??)
I need them to exist, so I have a method in place that creates them if they don't exist. The problem is that on one line, Django finds that they do exist, and therefore they are not created...and then on the next line we can confirm that no such objects exist.
My tests are giving Errors in test_something() related to the absence of the necessary TaskMetadata object.
#the model
class TaskMetadata(models.Model):
task = models.OneToOneField(ContentType)
...
#the test
class SimpleTest(TestCase):
def setUp(self):
some_utility_function()
def test_something(self):
...something that requires TaskMetadata...
def some_utility_function():
task = ...whatever...
ctype = ContentType.objects.get_for_model(task)
try:
ctype.taskmetadata
except TaskMetadata.DoesNotExist:
...create TaskMetadata...
print "Created TaskMetadata object for %s" % task.__name__
else:
print "TaskMetadata object already exists for %s" % task.__name__
print ctype.taskmetadata
print "ALL OF THEM!! %s" % TaskMetadata.objects.all()
and the printed result of some_utility_function():
TaskMetadata object already exists for SomeTask
some task
ALL OF THEM!! [] # <-- NOTE EMPTY QUERYSET
In summary: "Yes, TaskMetadata object exists. Yes, TaskMetadata object exists. No, there are no TaskMetadata objects at all!!"
So, seriously, what on earth is going on here? Is this a cache problem? I tried clearing the cache (wild guess; I don't have CACHES configured in settings.py)
def setUp(self):
cache.clear()
some_utility_function()
Does not help. Transactions maybe? I'm stumped. How do I even debug this?
UPDATE:
See a minimal django project that replicates the issue here.
When the first testcase runs, TaskMetadata.objects.all() is NOT an empty queryset (it is in fact populated with objects, as I would expect); when the second testcase (exactly the same as the first) runs, it is empty.
I suspect this has something to do with database flushing between testcases that is clearing out the TaskMetadata objects, but the related lookup is cached, and so the next time some_utility_function() is called for the next testcase, it doesn't create any TaskMetadata objects. 1) Is that plausible? 2) How to work around it? 3) This is a Django bug, right?
Django bug ticket
In your tearDown method you need to call ContentType.objects.clear_cache(). This is because Django caches calls to ContentType.objects.get_for_model. Having a one-to-one to content type is a bit weird, so I don't think django needs to make any changes for this, especially as it should be a one line fix for you.
The problem here is the "finally" clause.
A finally clause is always executed before leaving the try statement, whether an exception has occurred or not.
http://docs.python.org/2/tutorial/errors.html
So, the finally clause containing the print statements will always be executed.

How do I deal with this race condition in django?

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.
In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".
# Model:
class MyObj(models.Model):
thing = models.ForeignKey(Thing)
owner = models.ForeignKey(User)
state = models.BooleanField()
class Meta:
unique_together = (('thing', 'owner'),)
# Update or create myobj
#transaction.commit_on_success
def create_or_update_myobj(owner, thing, state)
try:
myobj, created = MyObj.objects.get_or_create(owner=user,thing=thing)
except IntegrityError:
myobj = MyObj.objects.get(owner=user,thing=thing)
# Will sometimes throw "DoesNotExist: MyObj matching query does not exist"
myobj.state = state
myobj.save()
I use an innodb mysql database on ubuntu.
How do I safely deal with this problem?
This could be an off-shoot of the same problem as here:
Why doesn't this loop display an updated object count every five seconds?
Basically get_or_create can fail - if you take a look at its source, there you'll see that it's: get, if-problem: save+some_trickery, if-still-problem: get again, if-still-problem: surrender and raise.
This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:
first thread tries to get it - but it doesn't yet exist,
so, the thread tries to create it, but before the object is created...
...second thread tries to get it - and this obviously fails
now, because of the default AUTOCOMMIT=OFF for MySQLdb database connection, and REPEATABLE READ serializable level, both threads have frozen their views of MyObj table.
subsequently, first thread creates its object and returns it gracefully, but...
...second thread cannot create anything as it would violate unique constraint
what's funny, subsequent get on the second thread doesn't see the object created in the first thread, due to the frozen view of MyObj table
So, if you want to safely get_or_create anything, try something like this:
#transaction.commit_on_success
def my_get_or_create(...):
try:
obj = MyObj.objects.create(...)
except IntegrityError:
transaction.commit()
obj = MyObj.objects.get(...)
return obj
Edited on 27/05/2010
There is also a second solution to the problem - using READ COMMITED isolation level, instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it - but at least it allows tying views to transactions, without committing in the middle.
Edited on 22/01/2012
Here are some good blog posts (not mine) about MySQL and Django, related to this question:
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html
Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.
One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.
Since 2012 in Django we have select_for_update which lock rows until the end of the transaction.
To avoid race conditions in Django + MySQL
under default circumstances:
REPEATABLE_READ in the Mysql
READ_COMMITTED in the Django
you can use this:
with transaction.atomic():
instance = YourModel.objects.select_for_update().get(id=42)
instance.evolve()
instance.save()
The second thread will wait for the first thread (lock), and only if first is done, the second will read data saved by first, so it will work on updated data.
Then together with get_or_create:
def select_for_update_or_create(...):
instance = YourModel.objects.filter(
...
).select_for_update().first()
if order is None:
instnace = YouModel.objects.create(...)
return instance
The function must be inside transaction block, otherwise, you will get from Django:
TransactionManagementError: select_for_update cannot be used outside of a transaction
Also sometimes it's good to use refresh_from_db()
In case like:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something
you'd like to see:
instance = MyModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db() # 3
instance.attr = response.something
and that # 3 will reduce a lot a time window of possible race conditions, thus chance for that.