Concurrency issue or something else? .save() method + DB timing - django

So the situation is this:
I have an endpoint A that creates data and calls .save() on it (call this functionA). It also sends a POST request to an external third-party API, which then calls my endpoint B (call this functionB).
def functionA(request):
    with transaction.atomic():
        newData = Blog(title="new blog")
        newData.save()
        # findSavedBlog = Blog.objects.get(title="new blog")
        # print(findSavedBlog)
        # This POST triggers the third party to send a POST to the endpoint that calls functionB.
        r = requests.post('https://www.thirdpartyapi.com/confirm_blog_creation/', some_data)
        return HttpResponse("Result was: " + str(r.status_code))

def functionB(request):
    blogTitle = request.POST.get('blog_title')  # assume this evaluates to 'new blog'
    # sleep(20)
    try:
        findBlog = Blog.objects.get(title=blogTitle)  # same as Blog.objects.get(title="new blog")
    except ObjectDoesNotExist as e:
        print("Blog not found!")
If I uncomment the findSavedBlog portion of functionA, it will print the saved blog, but functionB will still fail.
If I add a sleep to functionB so that the DB can finish writing before I fetch the newly created data, it still fails.
Anyone with knowledge of Django's .save() method and/or some concurrency knowledge help me out here? Much appreciated. Thanks!
EDIT:
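The issue was that I was wrapping all of functionA in an atomic block (I forgot to write that part of functionA originally), which meant that the transaction didn't commit until after functionA returned! The third party called functionB while functionA was still waiting on its POST request, so the new row was not yet visible to functionB's database connection.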
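The effect described in the EDIT can be reproduced outside Django. The following is a minimal sketch that uses the standard-library sqlite3 module as a stand-in for the real database; the table name, the helper names and the two connections (one playing functionA, one playing functionB) are assumptions made for illustration only.

```python
import os
import sqlite3
import tempfile


def count_blogs(conn):
    # fetchall() exhausts the cursor, so no read lock lingers on the file.
    query = "SELECT COUNT(*) FROM blog WHERE title = 'new blog'"
    return conn.execute(query).fetchall()[0][0]


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        # isolation_level=None: BEGIN/COMMIT are issued by hand below.
        writer = sqlite3.connect(path, isolation_level=None)  # plays functionA
        reader = sqlite3.connect(path, isolation_level=None)  # plays functionB
        try:
            writer.execute("CREATE TABLE blog (title TEXT)")

            writer.execute("BEGIN")  # like entering transaction.atomic()
            writer.execute("INSERT INTO blog (title) VALUES ('new blog')")

            seen_by_writer = count_blogs(writer)         # own uncommitted row
            seen_by_reader_before = count_blogs(reader)  # other connection

            writer.execute("COMMIT")  # like leaving the atomic block
            seen_by_reader_after = count_blogs(reader)
        finally:
            writer.close()
            reader.close()
    return seen_by_writer, seen_by_reader_before, seen_by_reader_after


result = run_demo()
print(result)
```

The writer sees its own row straight away, which matches the findSavedBlog observation, while the second connection only sees it after the commit. No amount of sleeping on the reader side helps as long as the writer's transaction stays open.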

Related

Is there a way to disconnect the post_delete signal in simple history?

I need to bulk_create the historical records for each object deleted in a queryset. I think I've coded it correctly, as follows.
def bulk_delete_with_history(objects, model, batch_size=None, default_user=None, default_change_reason="", default_date=None):
    """
    The package `simple_history` logs the deletion one object at a time.
    This does it in bulk.
    """
    model_manager = model._default_manager
    model_manager.filter(pk__in=[obj.pk for obj in objects]).delete()
    history_manager = get_history_manager_for_model(model)
    history_type = "-"
    historical_instances = []
    for instance in objects:
        history_user = getattr(
            instance,
            "_history_user",
            default_user or history_manager.model.get_default_history_user(
                instance),
        )
        row = history_manager.model(
            history_date=getattr(
                instance, "_history_date", default_date or timezone.now()
            ),
            history_user=history_user,
            history_change_reason=get_change_reason_from_object(
                instance) or default_change_reason,
            history_type=history_type,
            **{
                field.attname: getattr(instance, field.attname)
                for field in instance._meta.fields
                if field.name not in history_manager.model._history_excluded_fields
            }
        )
        if hasattr(history_manager.model, "history_relation"):
            row.history_relation_id = instance.pk
        historical_instances.append(row)
    return history_manager.bulk_create(
        historical_instances, batch_size=batch_size
    )
The problem, though, is that I need to disconnect the post_delete signal so that simple history doesn't create a historical record for each object before I create them all at once.
I've tried this but it doesn't work.
models.signals.post_delete.disconnect(HistoricalRecords.post_delete, sender=Customer)
Where Customer is just a class I'm using to test this utility function.
Can anybody advise? Thanks in advance.
I also asked this question on their GitHub page: https://github.com/jazzband/django-simple-history/issues/717
I made a classic mistake with disconnecting signals. One must of course disconnect using the same object that was connected. This is discussed in more detail in the question "django signal disconnect not working".
This now works because I am disconnecting with reference to the correct object.
receiver_object = models.signals.post_delete._live_receivers(Customer)[0]
models.signals.post_delete.disconnect(receiver_object, sender=Customer)
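Why the first attempt failed can be illustrated without Django. The sketch below uses a hypothetical MiniSignal class that keys receivers by the identity of the bound method's instance and function, which is loosely modelled on how Django identifies receivers; it is not Django's implementation, and the Recorder class is likewise invented for the example.

```python
def make_key(receiver):
    # Bound methods are keyed by (instance, function); anything else by itself.
    if hasattr(receiver, "__func__"):
        return (id(receiver.__self__), id(receiver.__func__))
    return id(receiver)


class MiniSignal:
    """Hypothetical stand-in for a signal; not Django's implementation."""

    def __init__(self):
        self._receivers = {}

    def connect(self, receiver):
        self._receivers[make_key(receiver)] = receiver

    def disconnect(self, receiver):
        """Remove the receiver and report whether anything was removed."""
        return self._receivers.pop(make_key(receiver), None) is not None


class Recorder:
    """Plays the role of the object whose method was connected."""

    def post_delete(self, instance=None, **kwargs):
        return "history row for %r" % (instance,)


signal = MiniSignal()
records = Recorder()
signal.connect(records.post_delete)  # a bound method gets registered

# The function looked up on the class is a different object from the
# bound method that was connected, so nothing is removed.
removed_via_class = signal.disconnect(Recorder.post_delete)

# The bound method of the same instance matches the registered key.
removed_via_instance = signal.disconnect(records.post_delete)

print(removed_via_class, removed_via_instance)
```

In this sketch, disconnecting with the function taken from the class silently removes nothing, which mirrors the symptom of the failed attempt above.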
Still, I think it would be nice if django-simple-history provided a bulk_delete utility function like they provide one for bulk_create and bulk_update.

TransactionManagementError While Executing Unittest in Django Rest Framework

I have written a test which checks whether an IntegrityError is raised when there are duplicate records in the database. To create that scenario I issue the same REST API request twice. The code looks like this:
class TestPost(APITestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        common.add_users()

    def tearDown(self):
        super().tearDown()
        self.client.logout()

    def test_duplicate_record(self):
        # first time
        response = self.client.post('/api/v1/trees/', dict(alias="some name", path="some path"))
        # same request second time
        response = self.client.post('/api/v1/trees/', dict(alias="some name", path="some path"))
        self.assertEqual(response.status_code, status.HTTP_400_BAD_REQUEST)
But I get an error stack like this
"An error occurred in the current transaction. You can't "
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
How can I avoid this error? It is certainly undesirable.
I've come across this issue today, and it took me a while to see the bigger picture, and fix it properly. It's true that removing self.client.logout() fixes the issue, and is probably not needed there, but the problem lies in your view.
Tests which are subclasses of TestCase wrap your test cases and tests in db transactions (atomic blocks), and your view somehow breaks that transaction. For me it was swallowing an IntegrityError exception. At that point, the transaction was already broken, but Django didn't know about it, so it couldn't correctly perform a rollback. Any query that would then be executed would cause a TransactionManagementError.
The fix for me was to correctly wrap the view code in yet another atomic block:
try:
    with transaction.atomic():  # savepoint which will be rolled back
        counter.save()  # the operation which is expected to throw an IntegrityError
except IntegrityError:
    pass  # swallow the exception without breaking the transaction
This may not be a problem for you in production, if you're not using ATOMIC_REQUESTS, but I still think it's the correct solution.
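The savepoint pattern from this answer can be sketched at the SQL level with the standard-library sqlite3 module. The table and savepoint names are assumptions for illustration. SQLite is more forgiving than some backends about continuing after a failed statement, so this shows the shape of the pattern rather than reproducing the TransactionManagementError itself.

```python
import sqlite3

# isolation_level=None: transactions are controlled by hand below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tree (alias TEXT UNIQUE)")

conn.execute("BEGIN")  # outer transaction, like the one wrapping a test
conn.execute("INSERT INTO tree (alias) VALUES ('some name')")

duplicate_rejected = False
conn.execute("SAVEPOINT inner_block")  # like entering a nested atomic()
try:
    # Violates the UNIQUE constraint on purpose.
    conn.execute("INSERT INTO tree (alias) VALUES ('some name')")
    conn.execute("RELEASE SAVEPOINT inner_block")
except sqlite3.IntegrityError:
    duplicate_rejected = True
    # Undo only the work done since the savepoint, then discard it.
    conn.execute("ROLLBACK TO SAVEPOINT inner_block")
    conn.execute("RELEASE SAVEPOINT inner_block")

# The outer transaction is still usable for further queries.
rows_in_outer = conn.execute("SELECT COUNT(*) FROM tree").fetchall()[0][0]
conn.execute("COMMIT")
conn.close()

print(duplicate_rejected, rows_in_outer)
```

The point of the inner savepoint is that the failure is rolled back explicitly before the exception is swallowed, so whatever wraps the code can keep issuing queries.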
Try removing self.client.logout() from the tearDown method. Django rolls back the transaction at the end of each test. You shouldn't have to log out manually.

Scrapy Request callback method never called

I am building a CrawlSpider using Scrapy 0.22.2 for Python 2.7.3 and am having problems with Requests, where the callback method that I specify is never called. Here is a snippet from my parsing method that initiates a Request within an elif block:
        elif current_status == "Superseded":
            # Need to do more work here. Have to check whether there is a replacement unit available. If there isn't, download whatever outline is there
            # We need to look for a <td> element which contains "Is superseded by " and follow that link
            updated_unit = hxs.xpath('/html/body/div[@id="page"]/div[@id="layoutWrapper"]/div[@id="twoColLayoutWrapper"]/div[@id="twoColLayoutLeft"]/div[@class="layoutContentWrapper"]/div[@class="outer"]/div[@class="fieldset"]/div[@class="display-row"]/div[@class="display-row"]/div[@class="display-field-info"]/div[@class="t-widget t-grid"]/table/tbody/tr[1]/td[contains(., "Is superseded by ")]/a')
            # need child element a
            updated_unit_link = updated_unit.xpath('@href').extract()[0]
            updated_url = "http://training.gov.au" + updated_unit_link
            print "\033[0;31mSuperceded by "+updated_url+"\033[0m"  # prints in Red for superseded, need to follow this link to current
            yield Request(url=updated_url, callback='sortSuperseded', dont_filter=True)

    def sortSuperseded(self, response):
        print "\033[0;35mtest callback called\033[0m"
There are no errors when I execute this and the url is OK, but sortSuperseded is never called as I never see the 'test callback called' printed in the console.
The url I am extracting is also within the domain that I specify for my CrawlSpider.
allowed_domains = ["training.gov.au"]
Where am I going wrong?
The callback should be the method itself, not its name as a string, so drop the quotes and reference it through self. Change the line:
yield Request(url=updated_url, callback='sortSuperseded', dont_filter=True)
to
yield Request(updated_url, callback=self.sortSuperseded, dont_filter=True)
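The difference between the two lines is the difference between a string and a callable. The sketch below uses a hypothetical run_callback dispatcher and DemoSpider class to show it; neither is Scrapy code, and how a given Scrapy version reacts to a string callback is not something this example claims to reproduce.

```python
class DemoSpider:
    """Hypothetical spider-like class, used only for illustration."""

    def sortSuperseded(self, response):
        return "handled " + response


def run_callback(callback, response):
    """Hypothetical dispatcher: only a callable can be invoked."""
    if not callable(callback):
        raise TypeError(
            "callback must be a callable, got %s" % type(callback).__name__
        )
    return callback(response)


spider = DemoSpider()

# A bound method carries both the function and the instance it belongs to.
ok = run_callback(spider.sortSuperseded, "page")

# A string is only a name; nothing in it can be called.
try:
    run_callback("sortSuperseded", "page")
    string_error = None
except TypeError as exc:
    string_error = str(exc)

print(ok)
print(string_error)
```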

Django: Passing a request directly (inline) to a second view

I'm trying to call a view directly from another (if this is at all possible). I have a view:
def product_add(request, order_id=None):
    # Works. Handles a normal POST check and form submission and redirects
    # to another page if the form is properly validated.
    ...
Then I have a 2nd view, that queries the DB for the product data and should call the first one.
def product_copy_from_history(request, order_id=None, product_id=None):
    product = Product.objects.get(owner=request.user, pk=product_id)
    # I need to somehow set up a form with the product data so that the first
    # view thinks it gets a POST request.
    second_response = product_add(request, order_id)
    return second_response
Since the second one needs to add the product as the first view does it I was wondering if I could just call the first view from the second one.
What I'm aiming for is just passing through the request object to the second view and return the obtained response object in turn back to the client.
Any help is greatly appreciated, criticism as well if this is a bad way to do it. But then please give some pointers on how to stay DRY.
Thanx!
Gerard.
My god, what was I thinking. This would be the cleanest solution, of course:
def product_add_from_history(request, order_id=None, product_id=None):
    """ Add existing product to current order
    """
    order = get_object_or_404(Order, pk=order_id, owner=request.user)
    product = Product.objects.get(owner=request.user, pk=product_id)
    newproduct = Product(
        owner=request.user,
        order=order,
        name=product.name,
        amount=product.amount,
        unit_price=product.unit_price,
    )
    newproduct.save()
    return HttpResponseRedirect(reverse('order-detail', args=[order_id]))
A view is a regular Python function, so you can of course call one from another, provided you pass the proper arguments and handle the result correctly (like a 404...). Whether it is good practice, I don't know. I would move the shared logic into a utility function and call it from both views.
If you are fine with the overhead of calling your API through HTTP you can use urllib to post a request to your product_add request handler.
As far as I know this could add some troubles if you develop with the dev server that comes with django, as it only handles one request at a time and will block indefinitely (see trac, google groups).

How do I deal with this race condition in django?

This code is supposed to get or create an object and update it if necessary. The code is in production use on a website.
In some cases - when the database is busy - it will throw the exception "DoesNotExist: MyObj matching query does not exist".
# Model:
class MyObj(models.Model):
    thing = models.ForeignKey(Thing)
    owner = models.ForeignKey(User)
    state = models.BooleanField()

    class Meta:
        unique_together = (('thing', 'owner'),)

# Update or create myobj
@transaction.commit_on_success
def create_or_update_myobj(owner, thing, state):
    try:
        myobj, created = MyObj.objects.get_or_create(owner=owner, thing=thing)
    except IntegrityError:
        myobj = MyObj.objects.get(owner=owner, thing=thing)
        # Will sometimes throw "DoesNotExist: MyObj matching query does not exist"
    myobj.state = state
    myobj.save()
I use an innodb mysql database on ubuntu.
How do I safely deal with this problem?
This could be an off-shoot of the same problem as here:
Why doesn't this loop display an updated object count every five seconds?
Basically get_or_create can fail - if you take a look at its source, there you'll see that it's: get, if-problem: save+some_trickery, if-still-problem: get again, if-still-problem: surrender and raise.
This means that if there are two simultaneous threads (or processes) running create_or_update_myobj, both trying to get_or_create the same object, then:
first thread tries to get it - but it doesn't yet exist,
so, the thread tries to create it, but before the object is created...
...second thread tries to get it - and this obviously fails
now, because of the default AUTOCOMMIT=OFF for the MySQLdb database connection, and the REPEATABLE READ isolation level, both threads have frozen their views of the MyObj table.
subsequently, first thread creates its object and returns it gracefully, but...
...second thread cannot create anything as it would violate unique constraint
what's funny, subsequent get on the second thread doesn't see the object created in the first thread, due to the frozen view of MyObj table
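The "frozen view" in the steps above can be observed in miniature. The sketch below uses the standard-library sqlite3 module in WAL mode as a stand-in, since it also gives an open read transaction a stable snapshot; this is an analogy to MySQL's REPEATABLE READ rather than the same mechanism, and the table layout and connection names are assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # fetchall() exhausts the cursor before the next statement runs.
    return conn.execute("SELECT COUNT(*) FROM myobj").fetchall()[0][0]


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snapshot.sqlite3")
        # isolation_level=None: BEGIN/COMMIT are issued by hand below.
        first = sqlite3.connect(path, isolation_level=None)
        first.execute("PRAGMA journal_mode=WAL")
        first.execute(
            "CREATE TABLE myobj (owner TEXT, thing TEXT, UNIQUE (owner, thing))"
        )
        second = sqlite3.connect(path, isolation_level=None)
        try:
            second.execute("BEGIN")
            before = count_rows(second)  # the first read fixes the snapshot

            # The first connection creates the object and commits at once.
            first.execute("INSERT INTO myobj VALUES ('alice', 'widget')")

            during = count_rows(second)  # same transaction, same snapshot
            second.execute("COMMIT")     # ending the transaction drops it
            after = count_rows(second)
        finally:
            first.close()
            second.close()
    return before, during, after


snapshot_counts = run_demo()
print(snapshot_counts)
```

The second connection keeps seeing the old state until its own transaction ends, which is why a get issued inside the same transaction cannot find the row that the other thread has already committed.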
So, if you want to safely get_or_create anything, try something like this:
@transaction.commit_on_success
def my_get_or_create(...):
    try:
        obj = MyObj.objects.create(...)
    except IntegrityError:
        transaction.commit()
        obj = MyObj.objects.get(...)
    return obj
Edited on 27/05/2010
There is also a second solution to the problem: using the READ COMMITTED isolation level instead of REPEATABLE READ. But it's less tested (at least in MySQL), so there might be more bugs/problems with it. At least it allows tying views to transactions without committing in the middle.
Edited on 22/01/2012
Here are some good blog posts (not mine) about MySQL and Django, related to this question:
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
http://www.no-ack.org/2011/05/broken-transaction-management-in-mysql.html
Your exception handling is masking the error. You should pass a value for state in get_or_create(), or set a default in the model and database.
One (dumb) way might be to catch the error and simply retry once or twice after waiting a small amount of time. I'm not a DB expert, so there might be a signaling solution.
Since 2012, Django has had select_for_update, which locks rows until the end of the transaction.
To avoid race conditions in Django + MySQL under the default settings (REPEATABLE READ in MySQL, READ COMMITTED in Django), you can use this:
with transaction.atomic():
    instance = YourModel.objects.select_for_update().get(id=42)
    instance.evolve()
    instance.save()
The second thread will wait for the first thread to release the lock. Only once the first is done will the second read the data, so it works on the values the first thread saved.
Then together with get_or_create:
def select_for_update_or_create(...):
    instance = YourModel.objects.filter(
        ...
    ).select_for_update().first()
    if instance is None:
        instance = YourModel.objects.create(...)
    return instance
The function must be inside transaction block, otherwise, you will get from Django:
TransactionManagementError: select_for_update cannot be used outside of a transaction
Also, sometimes it's good to use refresh_from_db().
In a case like:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.attr = response.something
you'd rather write:
instance = YourModel.objects.create(**kwargs)
response = do_request_which_lasts_few_seconds(instance)
instance.refresh_from_db() # 3
instance.attr = response.something
The refresh_from_db() call (marked # 3) greatly narrows the time window in which a race condition can occur, and so reduces the chance of one.