Django get_or_create vs catching IntegrityError - django

I want to insert several User rows in the DB. I don't really care if the insert didn't succeed, as long as I'm notified about it, which I'm able to do in both cases, so which one is better in terms of performance (speed mostly)?
Always insert the row (by calling the model's save method) and catch any IntegrityError exceptions
Call the get_or_create method from the QuerySet class

Think about what you are doing: you want to insert rows into the database, and you don't need to get the object back if it already exists. Ask for forgiveness, and catch the IntegrityError exception:
from django.db import IntegrityError

try:
    user = User(username=username)
    user.save()
except IntegrityError:
    print("Duplicate user")
This way you avoid the overhead of the additional get() lookup that get_or_create() makes.
FYI, EAFP is a common practice/coding style in Python:
Easier to ask for forgiveness than permission. This common Python
coding style assumes the existence of valid keys or attributes and
catches exceptions if the assumption proves false. This clean and fast
style is characterized by the presence of many try and except
statements.
Also see: https://stackoverflow.com/questions/6092992/why-is-it-easier-to-ask-forgiveness-than-permission-in-python-but-not-in-java
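To make the trade-off above concrete, here is a minimal sketch that uses the standard library's sqlite3 module instead of the Django ORM, so it runs without a project. The table and the function names insert_eafp and insert_lbyl are made up for illustration, and the second function only approximates the look-then-create shape of get_or_create().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (username TEXT UNIQUE)")


def insert_eafp(username):
    """Insert and let the UNIQUE constraint report duplicates (one statement)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO user (username) VALUES (?)", (username,))
        return True
    except sqlite3.IntegrityError:
        return False


def insert_lbyl(username):
    """Look first, then insert (two statements when the row is new)."""
    row = conn.execute(
        "SELECT 1 FROM user WHERE username = ?", (username,)
    ).fetchone()
    if row is not None:
        return False
    with conn:
        conn.execute("INSERT INTO user (username) VALUES (?)", (username,))
    return True
```

Both functions report whether the row was inserted; the first one simply skips the extra SELECT when the username is new.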

Related

Any way to handle IntegrityError in Django?

In Django, an IntegrityError can have many causes.
Sometimes it means a unique-constraint conflict; other times it is caused by a failed foreign key check.
Currently the only way we can find the root cause is to convert the exception to text:
(1062, "Duplicate entry '79d3dd88917a11e98d42f000ac192cee-not_created' for key 'fs_cluster_id_dir_e8164dce_uniq'")
But this is hard for a program to interpret. Is there any way for code to identify the root cause of the exception?
For example, if I know it was caused by a unique conflict, I can tell the client that some resource already exists. If I know it was caused by a missing foreign key target, I can tell the client that some parent resource has not been created.
So is there a good way to identify the cause in code?
Don't know about good.
Do you need to identify the exact cause in your code? If you have an alternative way of proceeding that you want to avoid using every time because of efficiency considerations, you might just code:
try:
    # the simple way
except IntegrityError as simplefail:
    try:
        # the complicated way
    except IntegrityError as complexfail:
        # log this utter failure and re-raise one or other of the caught errors
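If the exact cause does matter, one possible approach on MySQL is to look at the numeric error code rather than the message text. The sketch below assumes that the exception's args start with the MySQL error code, as in the tuple shown in the question. The IntegrityError class here is a stand-in so the example runs without Django, and classify_integrity_error and its return labels are hypothetical names. Other database backends report causes differently.

```python
class IntegrityError(Exception):
    """Stand-in for django.db.IntegrityError so the sketch runs without Django."""


# MySQL server error codes
ER_DUP_ENTRY = 1062            # duplicate value for a unique key
ER_ROW_IS_REFERENCED_2 = 1451  # a parent row is still referenced by a child row
ER_NO_REFERENCED_ROW_2 = 1452  # a foreign key points at a missing parent row


def classify_integrity_error(exc):
    """Map the leading error code in exc.args to a coarse, client-friendly label."""
    code = exc.args[0] if exc.args else None
    if code == ER_DUP_ENTRY:
        return "unique_conflict"
    if code == ER_NO_REFERENCED_ROW_2:
        return "missing_parent"
    if code == ER_ROW_IS_REFERENCED_2:
        return "still_referenced"
    return "unknown"
```

A view could then choose its response based on the label instead of parsing the error message.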

Is it ever OK to catch Django ORM errors inside atomic blocks?

Example code:
with transaction.atomic():
    # Create and save some models here
    try:
        model_instance.save()
    except IntegrityError:
        raise SomeCustomError()
Two questions:
1) Will this work as intended and roll back any previously saved models, given that nothing is done in the exception handler except for re-raising a custom error?
2) From a code style perspective, does it make sense to not use a nested transaction inside the try block in cases like this? (That is, only one line of code within the try block, no intention of persisting anything else within the transaction, no writes to the database inside the exception handler, etc.)
Will this work as intended and roll back any previously saved models, given that nothing is done in the exception handler except for re-raising a custom error?
Yes. The rollback will be triggered by the exception (any exception), and as long as you don't touch the database after the database error you won't risk the TransactionManagementError mentioned in the documentation.
From a code style perspective, does it make sense to not use a nested transaction inside the try block in cases like this?
Style is a matter of opinion, but I don't see any point in using a nested transaction here. It makes the code more complicated (not to mention the unnecessary savepoints in the transaction) for no discernible benefit.
You should use the atomic block as in the code below: put it inside the try statement and handle the exception outside it, rather than catching exceptions inside atomic.
try:
    with transaction.atomic():
        SomeModel.objects.get(id=NON_EXISTENT_ID)
except SomeModel.DoesNotExist:
    raise SomeCustomError()
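The behaviour asked about in the first question (an exception translated inside the block still rolls back the earlier writes) can be sketched with the standard library's sqlite3 module, whose connection context manager commits or rolls back much as transaction.atomic() does. This is only an analogy, not Django's implementation, and the table and function names are invented for the example.

```python
import sqlite3


class SomeCustomError(Exception):
    pass


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT UNIQUE)")
conn.execute("INSERT INTO item VALUES ('existing')")
conn.commit()


def save_all():
    with conn:  # plays the role of transaction.atomic()
        conn.execute("INSERT INTO item VALUES ('new')")  # an earlier, valid write
        try:
            conn.execute("INSERT INTO item VALUES ('existing')")  # violates UNIQUE
        except sqlite3.IntegrityError:
            # No further queries here; just translate the error and let it propagate.
            raise SomeCustomError()


try:
    save_all()
except SomeCustomError:
    pass

# The earlier insert of 'new' was rolled back together with the failed one.
names = [row[0] for row in conn.execute("SELECT name FROM item ORDER BY name")]
```

Because the custom exception leaves the with block, the whole transaction is rolled back, including the write that succeeded before the failure.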

Can't execute queries until end of atomic block in my data migration on django 1.7

I have a pretty long data migration that I'm doing to correct an earlier bad migration where some rows were created incorrectly. I'm trying to assign values to a new column based upon old ones; however, sometimes this leads to integrity errors. When this happens, I want to throw away the row that's causing the integrity error.
Here is a code snippet:
def load_data(apps, schema_editor):
    MyClass = apps.get_model('my_app', 'MyClass')
    new_col_mapping = {old_val1: new_val1, ....}
    for inst in MyClass.objects.filter(old_col=c):
        try:
            inst.new_col = new_col_mapping[c]
            inst.save()
        except IntegrityError:
            inst.delete()
Then in the operations of my Migration class I do
operations = [
    migrations.RunPython(load_data)
]
I get the following error when running the migration
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block
I get the feeling that doing
with transaction.atomic():
somewhere is my solution but I'm not exactly sure where the right place is. More importantly I'd like to understand WHY this is necessary
This is similar to the example in the docs.
First, add the required import if you don't have it already.
from django.db import transaction
Then wrap the code that might raise an integrity error in an atomic block.
try:
    with transaction.atomic():
        inst.new_col = new_col_mapping[c]
        inst.save()
except IntegrityError:
    inst.delete()
The reason for the error is explained in the warning block 'Avoid catching exceptions inside atomic!' in the docs. Once Django encounters a database error, it will roll back the atomic block. Attempting any more database queries will cause a TransactionManagementError, which you are seeing. By wrapping the code in an atomic block, only that code will be rolled back, and you can execute queries outside of the block.
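The "only that code will be rolled back" part relies on savepoints. The sketch below shows the mechanics with the standard library's sqlite3 module and hand-written SAVEPOINT statements. It is an illustration under assumptions, not what Django executes verbatim: the table layout and the savepoint name per_row are invented, and SQLite itself does not refuse further queries after an error the way Django's atomic bookkeeping does.

```python
import sqlite3

# isolation_level=None puts the module in autocommit mode so that
# BEGIN / SAVEPOINT / COMMIT can be issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE my_class (id INTEGER PRIMARY KEY, new_col TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO my_class (id, new_col) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, None)],
)

conn.execute("BEGIN")  # the outer transaction, like the one wrapping a migration
for row_id, value in [(2, "a"), (3, "b")]:  # (2, "a") collides with row 1
    conn.execute("SAVEPOINT per_row")  # like entering a nested atomic block
    try:
        conn.execute(
            "UPDATE my_class SET new_col = ? WHERE id = ?", (value, row_id)
        )
        conn.execute("RELEASE per_row")
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO per_row")  # undo only this row's work
        conn.execute("RELEASE per_row")
        # The outer transaction is still usable, so the cleanup query succeeds.
        conn.execute("DELETE FROM my_class WHERE id = ?", (row_id,))
conn.execute("COMMIT")

rows = conn.execute("SELECT id, new_col FROM my_class ORDER BY id").fetchall()
```

Row 2 is deleted after its update fails, row 3 is updated normally, and the outer transaction still commits.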
Each migration is wrapped in one transaction, so when something fails during the migration, all of its operations are cancelled. Because of that, a transaction in which something has failed can't take new queries (they would be cancelled anyway).
Wrapping some operations in with transaction.atomic(): is not a good solution, because you won't be able to cancel that operation when something fails later. Instead, avoid integrity errors by doing more checks before saving data.
It seems that the same exception can have a variety of causes. In my case it was caused by an invalid model field name: I used a greek letter delta 𐤃 in my field name.
It seemed to work fine, and the whole app worked well (perhaps I just didn't try any more complex use cases). The tests, however, raised TransactionManagementError.
I solved the problem by removing 𐤃 from the field name and from all the migration files.
I faced the same issue, but I resolved it by using django.test.TransactionTestCase instead of django.test.TestCase.

Django - Prevent automatic related table fetch

How can I prevent Django, for testing purposes, from automatically fetching related tables not specified in the select_related() call during the initial query?
I have a large application where I make significant use of
select_related() to bring in related model data during each original
query. All select_related() calls are used to specify the specific related models, rather than relying on the default, e.g. select_related('foo', 'bar', 'foo__bar')
As the application has grown, the select_related calls haven't
completely kept up, leaving a number of scenarios where Django happily
and kindly goes running off to the database to fetch related model
rows. This significantly increases the number of database hits, which
I obviously don't want.
I've had some success in tracking these down by checking the queries
generated using the django.db.connection.queries collection, but some
remain unsolved.
I've tried to find a suitable patch location in the django code to raise an
exception in this scenario, making the tracking much easier, but tend
to get lost in the code.
Thanks.
After some more digging, I've found the place in the code to do this.
The file in question is django/db/models/fields/related.py
You need to insert two lines into this file.
Locate class "SingleRelatedObjectDescriptor". You need to change the function __get__() as follows:
def __get__(self, instance, instance_type=None):
    if instance is None:
        return self
    try:
        return getattr(instance, self.cache_name)
    except AttributeError:
        raise Exception("Automated Database Fetch on %s.%s" % (instance._meta.object_name, self.related.get_accessor_name()))
        # leave the old code here for when you revert!
Similarly, in class "ReverseSingleRelatedObjectDescriptor" further down the code, you again need to change __get__() to:
def __get__(self, instance, instance_type=None):
    if instance is None:
        return self
    cache_name = self.field.get_cache_name()
    try:
        return getattr(instance, cache_name)
    except AttributeError:
        raise Exception("Automated Database Fetch on %s.%s" % (instance._meta.object_name, self.field.name))
        # BEWARE: % parameters are different to previous class
        # leave old code here for when you revert
Once you've done this, you'll find that Django raises an exception every time it performs an automatic database lookup. This is pretty annoying when you first start, but it will help you track down those pesky database lookups. Obviously, when you've found them all, it's probably best to revert the database code back to normal. I would only suggest using this during a debugging/performance investigation phase and not in the live production code!
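The idea behind the patch is an ordinary Python descriptor that returns a cached value when one is present and raises instead of falling back to a lazy load. Here is a self-contained sketch of that mechanism in plain Python, without Django. StrictRelated, Book, and the _author_cache attribute are illustrative names and do not correspond to Django internals.

```python
class StrictRelated:
    """Descriptor that returns a preloaded value or raises instead of lazily loading it."""

    def __init__(self, name):
        self.name = name
        self.cache_name = "_%s_cache" % name

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        try:
            return getattr(instance, self.cache_name)
        except AttributeError:
            # A real ORM would query the database here; fail loudly instead.
            raise RuntimeError(
                "Automated Database Fetch on %s.%s"
                % (type(instance).__name__, self.name)
            )


class Book:
    author = StrictRelated("author")
```

Setting book._author_cache before reading book.author mimics a relation that was loaded up front; reading it on a fresh instance raises, which is what makes the missing preload easy to spot in a test run.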
So, you're asking how to stop a method from doing what it's specifically designed to do? I don't understand why you would want to do that.
However, one thing to know about select_related is that it doesn't automatically follow relationships which are defined as null=True. So if you can set your FKs to that for now, the relationship won't be followed.

Django unique constraint + form errors

I'm having some issues with double-posting on my site. I figure a simple unique constraint across all the relevant fields will solve the issue on a database level, but then it just produces a nasty error page for the user. Is there a way I can turn this into a pretty form error instead? Like a non_field_error? Or what approach should I take?
Maybe something like this will help you:
class YourForm(forms.Form):
    # Everything as before.
    ...
    def clean(self):
        cleaned_data = self.cleaned_data
        # .get() avoids a KeyError when the field itself failed validation
        your_unique_key = cleaned_data.get('your_unique_key')
        # exists() returns False when nothing matches; objects.get() would raise DoesNotExist
        if your_unique_key and YourModel.objects.filter(your_unique_key=your_unique_key).exists():
            raise forms.ValidationError("not unique")
        # Always return the full collection of cleaned data.
        return cleaned_data
The clean() method will allow you to access all fields of the form, which might be useful if you have a combined unique key. Otherwise a (slightly shorter) clean_your_unique_key() might suit you better.
And please note that under rare circumstances (race conditions) the form validation might not report a duplicate entry (but it's of course reported by the database engine). But for most applications the provided example will be the easier and more maintainable one, so I still recommend this approach.
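One way to cover the race condition mentioned above is to keep the friendly up-front check and also treat the database constraint as a safety net, turning a late IntegrityError into the same message. The sketch below uses the standard library's sqlite3 module rather than Django forms so that it runs on its own. The post table, the submit function, and the message text are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (author TEXT, title TEXT, UNIQUE (author, title))")

DUPLICATE_MESSAGE = "This entry has already been submitted."


def submit(author, title):
    """Return a list of form-style error messages; an empty list means the row was saved."""
    exists = conn.execute(
        "SELECT 1 FROM post WHERE author = ? AND title = ?", (author, title)
    ).fetchone()
    if exists:
        return [DUPLICATE_MESSAGE]  # the usual path for a double post
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO post (author, title) VALUES (?, ?)", (author, title)
            )
    except sqlite3.IntegrityError:
        # Another request inserted the same row between the check and the insert.
        return [DUPLICATE_MESSAGE]
    return []
```

The user sees the same error either way, and the unique constraint remains the final authority.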
as far as a 'nasty error page' for the user, Django lets you customize your own 500,404 and probably other pages. general info on that:
In order to use the Http404 exception
to its fullest, you should create a
template that is displayed when a 404
error is raised. This template should
be called 404.html and located in the
top level of your template tree.
-- http://docs.djangoproject.com/en/dev/topics/http/views/
another nice way, not as DRY as tux21b's solution but perhaps a little easier to understand for a one-time solution, might be to catch the error intelligently. one way is to do so without even bothering to violate the constraint - a simple query should verify whether the user is about to do something illegal.
okToUpdate = MyModel.objects.filter(parameters=values...).count()
errorExists = False
if okToUpdate > 0:  # an object already exists
    errorExists = True
    errors = {customError: customMessage}
...
if errorExists:
    # render_to_response takes the template name first, then the context
    return render_to_response('customErrorPage.html', errors)
else:
    # return whatever you normally would return
you then use render_to_response to render a custom error page.
(another way is to allow the database violation to occur, then catch that error and do the same thing... i theorize that a DB gets slightly less stress doing a lookup than handling an exception but it's up to you how you like to do things).
JB