OK, I need a little help here.
I have a model with a field slug = models.SlugField(unique=True), and I am trying to set this field on save() by appending a number to the slug if the slug already exists, and so on.
I want to consider race conditions.
def set_uniqslug(self, slug, i=0):
    new_slug = u"{}{}".format(slug, str(i) if i else '')
    try:
        with transaction.atomic():
            self.slug = slugify(new_slug.lower())
            self.save()
            return self
    except IntegrityError:
        i += 1
        return self.set_uniqslug(slug, i)
def save(self, *args, **kwargs):
    if not self.pk:
        self.set_uniqslug(self.name.lower())  # <--- but it does "save" above.
        # I want something like:
        # self.slug = self.get_uniqslug(self.name.lower())
    super(Company, self).save(*args, **kwargs)
My problem is that set_uniqslug() needs to try to save just to find out whether there is an IntegrityError, and in my code it goes into an infinite loop.
How can I know, without saving, whether there would be an IntegrityError, and then just return the unique slug back to the save() method?
Update:
I tried this:
with transaction.atomic():
    if Company.objects.filter(slug=new_slug).exists():
        i += 1
        return self.set_uniqslug(slug, i)
    return new_slug
It is working, but I feel uneasy about wrapping a read in a transaction. Am I blocking other queries or doing anything else bad here?
Your check-and-set version will probably not work. That will depend on your database and its implementation of the transaction isolation levels; but taking PostgreSQL as an example, the default READ COMMITTED isolation level will not prevent another transaction from inserting a row with the same slug in between your check and set.
So use your original, optimistic locking idea. As Hugo Rodger-Brown pointed out, you can avoid the infinite loop by calling the superclass's save().
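Here is a minimal sketch of that idea (assuming a Company model with name and slug fields; this is illustrative, not your exact code):
from django.db import IntegrityError, models, transaction
from django.utils.text import slugify

class Company(models.Model):
    name = models.CharField(max_length=200)
    slug = models.SlugField(unique=True)

    def save(self, *args, **kwargs):
        if self.pk:
            return super(Company, self).save(*args, **kwargs)
        base = slugify(self.name.lower())
        i = 0
        while True:
            self.slug = u"{}{}".format(base, i if i else '')
            try:
                # Calling super().save() here, not self.save(), is what
                # breaks the infinite loop.
                with transaction.atomic():
                    return super(Company, self).save(*args, **kwargs)
            except IntegrityError:
                i += 1  # slug taken; retry with the next suffix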
Finally, you might want to consider an alternative slug format. Many times the slug will incorporate the database id (similar to StackOverflow itself, actually), which eliminates the possibility of duplicate slugs.
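For example, a hedged sketch that builds the slug from the pk (continuing the hypothetical Company model above; the pk makes collisions impossible):
def save(self, *args, **kwargs):
    super(Company, self).save(*args, **kwargs)
    if not self.slug:
        # The pk makes the slug unique by construction.
        self.slug = u"{}-{}".format(self.pk, slugify(self.name.lower()))
        super(Company, self).save(update_fields=["slug"])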
Related
I have fields in multiple related models whose values are fully derived from other fields, both in the model being saved and in related models. I wanted to automate their maintenance so that they are always current/valid, so I wrote a base class that each model inherits from. It overrides .save() and .delete().
It pretty much works, except when multiple updates are triggered via changes to a through model of an M:M relationship between models named Infusate and Tracer (the through model is named InfusateTracer). For example, I have a test that creates 2 InfusateTracer records, which triggers updates to Infusate:
glu_t = Tracer.objects.create(compound=glu)
c16_t = Tracer.objects.create(compound=c16)
io = Infusate.objects.create(short_name="ti")
InfusateTracer.objects.create(infusate=io, tracer=glu_t, concentration=1.0)
InfusateTracer.objects.create(infusate=io, tracer=c16_t, concentration=2.0)
print(f"Name: {io.name}")
Infusate.objects.get(name="ti{C16:0-[5,6-13C5,17O1];glucose-[2,3-13C5,4-17O1]}")
The save() override looks like this:
def save(self, *args, **kwargs):
    # Save the changed value triggering this update so that the derived
    # value of the automatically updated field reflects the new values
    super().save(*args, **kwargs)
    # Update the fields that change due to the above change (if any)
    self.update_decorated_fields()
    # Note: I cannot call save again because I get a duplicate exception,
    # so update_decorated_fields uses setattr:
    # super().save(*args, **kwargs)
    # Percolate changes up to the parents (if any)
    self.call_parent_updaters()
The automatically maintained field updates are performed here. Note that the fields to update, the function that generates their values, and the link to the parent are all maintained in a global returned by get_my_updaters(), whose values come from a decorator I wrote that is applied to the updating functions:
def update_decorated_fields(self):
    for updater_dict in self.get_my_updaters():
        update_fun = getattr(self, updater_dict["function"])
        update_fld = updater_dict["update_field"]
        if update_fld is not None:
            current_val = None
            # ... brevity edit
            new_val = update_fun()
            setattr(self, update_fld, new_val)
            print(
                f"Auto-updated {self.__class__.__name__}.{update_fld} "
                f"using {update_fun.__qualname__} "
                f"from [{current_val}] to [{new_val}]"
            )
And in the test code at the top of this post, where the InfusateTracer linking records are created, this method is crucial to the updates that are not fully happening:
def call_parent_updaters(self):
    parents = []
    for updater_dict in self.get_my_updaters():
        update_fun = getattr(self, updater_dict["function"])
        parent_fld = updater_dict["parent_field"]
        # ... brevity edit
        if parent_inst is not None and parent_inst not in parents:
            parents.append(parent_inst)
    for parent_inst in parents:
        if isinstance(parent_inst, MaintainedModel):
            parent_inst.save()
        elif parent_inst.__class__.__name__ == "ManyRelatedManager":
            if parent_inst.count() > 0 and isinstance(
                parent_inst.first(), MaintainedModel
            ):
                for mm_parent_inst in parent_inst.all():
                    mm_parent_inst.save()
And here's the relevant ordered debug output:
Auto-updated Infusate.name using Infusate._name from [ti] to [ti{glucose-[2,3-13C5,4-17O1]}]
Auto-updated Infusate.name using Infusate._name from [ti{glucose-[2,3-13C5,4-17O1]}] to [ti{C16:0-[5,6-13C5,17O1];glucose-[2,3-13C5,4-17O1]}]
Name: ti{glucose-[2,3-13C5,4-17O1]}
DataRepo.models.infusate.Infusate.DoesNotExist: Infusate matching query does not exist.
Note that the output Name: ti{glucose-[2,3-13C5,4-17O1]} is incomplete (even though the debug output above it is complete: ti{C16:0-[5,6-13C5,17O1];glucose-[2,3-13C5,4-17O1]}). It only contains the information resulting from the creation of the first through record:
InfusateTracer.objects.create(infusate=io, tracer=glu_t, concentration=1.0)
But the subsequent through record created by:
InfusateTracer.objects.create(infusate=io, tracer=c16_t, concentration=2.0)
...(while all the Auto-updated debug output is correct, and is what I expected to see) is not reflected in the final value of the Infusate record's name field, which should be composed of values gathered from 7 different records, as displayed in the last Auto-updated debug line (1 Infusate record, 2 Tracer records, and 4 TracerLabel records)...
Is this due to asynchronous execution, or is it because I should be using something other than setattr to save the changes? I've tested this many times and the result is always the same.
Incidentally, I lobbied our team not to even have these automatically maintained fields, because of their potential to become invalid from DB changes, but the lab people like having them, apparently because that's how the suppliers name the compounds, and they want to be able to copy/paste them in searches, etc.
The problem here is a misconception about how changes are applied, when they are used in the construction of the new derived field value, and when the super().save() method should be called.
Here, I am creating a record:
io = Infusate.objects.create(short_name="ti")
That is related to these 2 records (also being created):
glu_t = Tracer.objects.create(compound=glu)
c16_t = Tracer.objects.create(compound=c16)
Then, those records are linked together in a through model:
InfusateTracer.objects.create(infusate=io, tracer=glu_t, concentration=1.0)
InfusateTracer.objects.create(infusate=io, tracer=c16_t, concentration=2.0)
I had thought (incorrectly) that I had to call super().save() so that when the field values are gathered together to compose the name field, those incoming changes would be included in the name.
However, the self object is what is being used to retrieve those values. It doesn't matter that they aren't saved yet.
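A tiny illustration of that point (hypothetical values, not from the question):
# Unsaved attribute changes on self are readable immediately; the single
# super().save() later writes the final values in one INSERT/UPDATE.
infusate = Infusate(short_name="ti")
infusate.name = infusate.short_name  # visible right away, nothing saved yet
print(infusate.name)                 # -> "ti"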
At this point, it's useful to fill in some of the gaps in the code supplied in the question. This is a portion of the Infusate model:
class Infusate(MaintainedModel):
    id = models.AutoField(primary_key=True)
    name = models.CharField(...)
    short_name = models.CharField(...)
    tracers = models.ManyToManyField(
        Tracer,
        through="InfusateTracer",
    )

    @field_updater_function(generation=0, update_field_name="name")
    def _name(self):
        if self.tracers is None or self.tracers.count() == 0:
            return self.short_name
        return (
            self.short_name
            + "{"
            + ";".join(sorted(map(lambda o: o._name(), self.tracers.all())))
            + "}"
        )
And this was an error I had (incorrectly) inferred to mean that the record had to have been saved before I could access the values:
ValueError: "<Infusate: >" needs to have a value for field "id" before this many-to-many relationship can be used.
when I had tried the following version of my save override:
def save(self, *args, **kwargs):
    self.update_decorated_fields()
    super().save(*args, **kwargs)
    self.call_parent_updaters()
But what this really meant was that I had to test something other than self.tracers is None to see whether any M:M links exist. We can simply check self.id: if it is None, we can infer that self.tracers is not usable yet. So the answer to this question is simply to edit the save method override to:
def save(self, *args, **kwargs):
    self.update_decorated_fields()
    super().save(*args, **kwargs)
    self.call_parent_updaters()
and edit the method that generates the value for the field update to:
@field_updater_function(generation=0, update_field_name="name")
def _name(self):
    if self.id is None or self.tracers is None or self.tracers.count() == 0:
        return self.short_name
    return (
        self.short_name
        + "{"
        + ";".join(sorted(map(lambda o: o._name(), self.tracers.all())))
        + "}"
    )
How does one create a DateField which automatically increments by 1 day, in the way that the pk field does?
For example, I would create a new object with a date of 16/04/2017; the next object would get 17/04/2017, even if they are both submitted on the same day.
How would I do this?
How about overriding the model's save method like this:
from datetime import timedelta

from django.db import models

class MyModel(models.Model):
    date = models.DateField()  # the method below will NOT work if auto_now/auto_now_add are set to True

    def save(self, *args, **kwargs):
        # Count how many objects are already saved with the same or a
        # greater date than this object's date.
        date_gte_count = MyModel.objects.filter(date__gte=self.date).count()
        if date_gte_count:
            # There are objects with the same or a greater date;
            # increase the day by that number.
            self.date += timedelta(days=date_gte_count)
        # save object in db
        super().save(*args, **kwargs)
Of course, the above can also be implemented using Django signals (the pre_save one).
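A minimal sketch of the signal version (same counting logic as above, assuming the MyModel just shown):
from datetime import timedelta

from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=MyModel)
def shift_date_forward(sender, instance, **kwargs):
    # Same counting logic as the save() override above.
    date_gte_count = MyModel.objects.filter(date__gte=instance.date).count()
    if date_gte_count:
        instance.date += timedelta(days=date_gte_count)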
So I worked this out using parts of Nik_m's answer and some of my own knowledge.
I essentially made a while loop which keeps iterating and adding a day, as opposed to Nik_m's answer, which doesn't work after the third object due to its lack of iteration.
def save(self, *args, **kwargs):
    # Keep adding a day until no existing Challenge has this date.
    while Challenge.objects.filter(date=self.date).exists():
        self.date += timedelta(days=1)
    super().save(*args, **kwargs)
EDIT: This answer is no longer valid; it requires a while loop and thus an indefinite number of queries. @Nik_m's modified answer is better.
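For reference, one hedged way to avoid the per-day existence checks entirely is a single aggregate query (a sketch, not the referenced answer):
from datetime import timedelta

from django.db.models import Max

def save(self, *args, **kwargs):
    # One query: find the latest date already taken, then go one past it.
    latest = Challenge.objects.aggregate(Max("date"))["date__max"]
    if latest is not None and latest >= self.date:
        self.date = latest + timedelta(days=1)
    super().save(*args, **kwargs)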
So, I have a model called ScheduleItem:
class ScheduleItem(models.Model):
    agreement = FK
    location = FK
    start = models.DateTimeField()
    end = models.DateTimeField()
    totalHours = DecimalField

    def get_total_hours(self):
        start = timedelta(hours=self.start.hour, minutes=self.start.minute)
        end = timedelta(hours=self.end.hour, minutes=self.end.minute)
        td = (end - start).seconds
        totalHours = Decimal(td / Decimal(60) / Decimal(60))
        return totalHours

    def save(self, *args, **kwargs):
        if self.pk == None:
            super(ScheduleItem, self).save(self, *args, **kwargs)
            self.refresh_from_db()  # to access the datetime values, rather than unicode POST
            self.totalHours = self.get_total_hours()
        else:
            self.totalHours = self.get_total_hours()
        super(ScheduleItem, self).save(self, *args, **kwargs)
This throws primary key errors: I get duplicate entries from the second super(ScheduleItem, self).save(). I cannot for the life of me figure out how to check for pk, access the datetime values, and then save again within the save override method. I've tried moving things around, and I've tried saving within the get_total_hours() function, with nothing but trouble.
I just want the object to be committed to the db so I can get the datetime objects and then calculate the total hours.
I'd rather not convert to datetime within the save function.
Does anyone have any tip or can anyone tell me where I'm going wrong?
You should not pass self to save(). You're calling super().save() as a bound method on an instance, so self is implicitly passed as the first argument. Change it to this:
def save(self, *args, **kwargs):
    if self.pk is None:
        super(ScheduleItem, self).save(*args, **kwargs)
        self.refresh_from_db()  # to access the datetime values, rather than unicode POST
        self.totalHours = self.get_total_hours()
    else:
        self.totalHours = self.get_total_hours()
    super(ScheduleItem, self).save(*args, **kwargs)
You get this weird behaviour because the first positional argument is force_insert, and the model instance evaluates to True. The second call to super().save() tries to force an insert with the same pk you previously saved.
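For reference, this is the shape of Django's Model.save() signature (in the Django versions of this era); the stray self lands in force_insert:
# django/db/models/base.py (signature only)
def save(self, force_insert=False, force_update=False,
         using=None, update_fields=None):
    ...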
I was trying to solve a simple problem on a site where a Person model has a married_to field as a ForeignKey.
To maintain this, when a user (PersonA) changes who they are married to, the following should happen:
The Person that PersonA was previously married to should set its married field to None
The new Person that PersonA is married to should update and set its married field to PersonA (which in turn can trigger that the new Person's potentially previously married Person sets its married field to None)
So what I tried was to override the save method, something along these lines:
if self.pk is not None and self.married is not None:
    orig = Person.objects.get(pk=self.pk)
    orig.married.married = None
    orig.married.save()
if self.married is not None:
    self.married.married = self
    self.married.save()
super(Person, self).save()
I ran into the maximum recursion problem etc. and started searching for answers, but didn't find anything conclusive.
What is the idiomatic way to do this for noobs like me?
Thanks
Option 1: write your code so as to avoid save() method calls:
if self.pk is not None:
    orig = Person.objects.get(pk=self.pk)
    if orig.married is not None:
        # Clear the previous spouse without triggering save()
        Person.objects.filter(pk=orig.married.pk).update(married=None)
if self.married is not None:
    # Point the new spouse back at self, again bypassing save()
    Person.objects.filter(pk=self.married.pk).update(married=self)
super(Person, self).save()
Option 2: alternatively, you can stop the recursion with a conditional for this particular case:
if self.married is not None and self.married.married != self:
    self.married.married = self
    self.married.save()
The right way: perhaps the right relation in your scenario is a OneToOneField (you are talking about a ForeignKey in your question), if a Person can only be married to one other Person.
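A hedged sketch of that variant (field name and options assumed):
class Person(models.Model):
    married = models.OneToOneField(
        'self',
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
    )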
I would implement this as a separate method, not as part of save(). Assuming that married is an FK to the related Person:
class Person(models.Model):
    [...]

    def set_married(self, married_to):
        if self.married_id != married_to.id:  # <-- prevent recursion/looping
            self.married = married_to
            self.save()
            self.married.set_married(self)
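Usage sketch: one call updates both sides and then stops, because on the second pass the guard sees the link is already in place:
alice.set_married(bob)   # sets alice.married and, via the nested call, bob.married
assert bob.married == alice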
Something like this will work.
class Person(models.Model):
    ...
    partner = models.OneToOneField('self', blank=True, null=True, on_delete=models.SET_NULL)

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        # To avoid infinite looping on save, only fill in the reverse
        # side when it isn't set yet.
        if self.partner and not self.partner.partner:
            self.partner.partner = self
            self.partner.save()
On save, this simply makes the two partner fields match (creating a symmetrical relation).
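Usage sketch (assuming the model above):
alice = Person.objects.create()
bob = Person.objects.create(partner=alice)  # save() fills in the reverse side
alice.refresh_from_db()
assert alice.partner == bob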
I have a very large database (6 GB) that I would like to use Django REST Framework with. In particular, I have a model that has a ForeignKey relationship to the django.contrib.auth.models.User table (not so big) and a ForeignKey to a BIG table (let's call it Product). The model can be seen below:
class ShoppingBag(models.Model):
    user = models.ForeignKey('auth.User', related_name='+')
    product = models.ForeignKey('myapp.Product', related_name='+')
    quantity = models.SmallIntegerField(default=1)
Again, there are 6GB of Products.
The serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.RelatedField(many=False)
    user = serializers.RelatedField(many=False)

    class Meta:
        model = ShoppingBag
        fields = ('product', 'user', 'quantity')
So far this is great: I can do a GET on the list and on individual shopping bags, and everything is fine. For reference, the queries (using a query logger) look something like this:
SELECT * FROM myapp_product WHERE product_id=1254
SELECT * FROM auth_user WHERE user_id=12
SELECT * FROM myapp_product WHERE product_id=1404
SELECT * FROM auth_user WHERE user_id=12
...
And so on, for as many shopping bags as are returned.
But I would also like to be able to POST to create new shopping bags, and serializers.RelatedField is read-only. Let's make it read-write:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = serializers.PrimaryKeyRelatedField(many=False)
    user = serializers.PrimaryKeyRelatedField(many=False)
    ...
Now things get bad... GET requests to the list action take more than 5 minutes, and I noticed that my server's memory jumps up to ~6 GB. Why?! Well, back to the SQL queries, and now I see:
SELECT * FROM myapp_products;
SELECT * FROM auth_user;
Ok, so that's not good. Clearly we're doing "prefetch related" or "select_related" or something like that in order to get access to all the products; but this table is HUGE.
Further inspection reveals where this happens on Line 68 of relations.py in DRF:
def initialize(self, parent, field_name):
    super(RelatedField, self).initialize(parent, field_name)
    if self.queryset is None and not self.read_only:
        manager = getattr(self.parent.opts.model, self.source or field_name)
        if hasattr(manager, 'related'):  # Forward
            self.queryset = manager.related.model._default_manager.all()
        else:  # Reverse
            self.queryset = manager.field.rel.to._default_manager.all()
If not readonly, self.queryset = ALL!!
So, I'm pretty sure this is where my problem is, and I need a way to say "don't select_related here", but I'm not 100% sure whether this is the issue or where to deal with it. It seems like everything should be memory-safe with pagination, but this is simply not the case. I'd appreciate any advice.
In the end, we had to simply create our own PrimaryKeyRelatedField subclass to override the default behavior in Django-Rest-Framework. Basically, we ensured that the queryset was empty until we wanted to look up the object, and then we performed the lookup. This was extremely annoying, and I hope the Django-Rest-Framework guys take note of this!
Our final solution:
from django.core.exceptions import ValidationError
from django.utils.encoding import smart_text

class ProductField(serializers.PrimaryKeyRelatedField):
    many = False

    def __init__(self, *args, **kwargs):
        # Hack to ensure ALL products are not loaded
        kwargs['queryset'] = Product.objects.none()
        super(ProductField, self).__init__(*args, **kwargs)

    def field_to_native(self, obj, field_name):
        return unicode(obj)

    def from_native(self, data):
        """
        Perform the query lookup here.
        """
        try:
            return Product.objects.get(pk=data)
        except Product.DoesNotExist:
            msg = self.error_messages['does_not_exist'] % smart_text(data)
            raise ValidationError(msg)
        except (TypeError, ValueError):
            msg = self.error_messages['incorrect_type'] % type(data)
            raise ValidationError(msg)
And then our serializer is as follows:
class ShoppingBagSerializer(serializers.ModelSerializer):
    product = ProductField()
    ...
This hack ensures the entire table isn't loaded into memory; instead, it performs one-off selects based on the data. It's not as computationally efficient, but it also doesn't blast our server with 5-second database queries whose results get loaded into memory!