Updating and fetching a Django model object atomically - django

I want to be able to update an object and read it back atomically. For example, something like a first() chained onto update(), as below:
obj = MyModel.objects.filter(pk=100).update(counter=F('counter') + 1).first()
I know this is an awkward construct, but it shows what I need.
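For the record, I have used a class method like this:
from django.db import transaction
@classmethod
def update_counter(cls, job_id):
    # Lock the row for the duration of the transaction so that
    # concurrent callers cannot interleave their read-modify-write.
    with transaction.atomic():
        job = cls.objects.select_for_update().get(pk=job_id)
        job.counter += 1
        job.save()
    return job
which I call as below to get my updated object:
my_obj = MyModel.update_counter(my_obj.pk)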
But the question is: does Django offer any other model technique for this, given that such read-backs are common and are likely to be used by multiple threads to decide something based on, say, the final count?

Digging deeper, I could not find any direct way to get back the object(s) I am updating in a single chained SQL statement. As Dani Herrera commented, the update and the read have to be two SQL queries. The only mechanism that meets my requirement is therefore the class method I included above. In fact, it will also let me add further field updates to the same class method atomically in the future.
For example, the method could very well be "def update_progress(job_id, final_desired_count)", where I can update more fields, such as "self.progress_percentage = (self.counter / final_desired_count) * 100".
The class method approach may turn out to be a good investment for me.
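To make the "two SQL queries" point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the ORM. The job table, its columns, and the update_and_fetch_counter name are assumptions for illustration only; in Django the same pair of statements would sit inside transaction.atomic().

```python
import sqlite3

def update_and_fetch_counter(conn, job_id):
    """Increment a job's counter and read the row back in one transaction."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        # The increment is computed inside the database (the SQL counterpart
        # of an F('counter') + 1 update), so no stale Python value is written.
        conn.execute(
            "UPDATE job SET counter = counter + 1 WHERE id = ?", (job_id,)
        )
        # Second query: read the row back before the transaction commits.
        row = conn.execute(
            "SELECT id, counter FROM job WHERE id = ?", (job_id,)
        ).fetchone()
    return row

# Hypothetical schema and data, just enough to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, counter INTEGER NOT NULL)")
conn.execute("INSERT INTO job (id, counter) VALUES (100, 0)")
conn.commit()
```

Calling update_and_fetch_counter(conn, 100) returns the freshly updated row as an (id, counter) tuple, or None if no such job exists.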

Related

Peewee How do i call post_delete signal for dependencies?

I see that peewee has signals: http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#signals
But they only work with delete_instance().
For what I hope are obvious reasons, Peewee signals do not work when
you use the Model.insert(), Model.update(), or Model.delete() methods.
These methods generate queries that execute beyond the scope of the
ORM, and the ORM does not know about which model instances might or
might not be affected when the query executes.
Signals work by hooking into the higher-level peewee APIs like
Model.save() and Model.delete_instance(), where the affected model
instance is known ahead of time.
So how do I use signals on records that are dependencies, given that delete_instance() uses delete() on dependencies?
def delete_instance(self, recursive=False, delete_nullable=False):
    if recursive:
        dependencies = self.dependencies(delete_nullable)
        for query, fk in reversed(list(dependencies)):
            model = fk.model
            if fk.null and not delete_nullable:
                model.update(**{fk.name: None}).where(query).execute()
            else:
                model.delete().where(query).execute()
    return type(self).delete().where(self._pk_expr()).execute()
The short answer is that you can't, because recursive deletes are implemented using queries of the form:
DELETE FROM ... WHERE foreign_key_id = X
For example, suppose you have a user who has created 1000 tweets. The tweet.user_id points to the user.id. When you delete that user, peewee issues 2 queries:
DELETE FROM tweets WHERE user_id = 123
DELETE FROM users WHERE id = 123
If it were to call delete_instance() you would end up issuing 1001 queries instead. So it is hopefully quite clear why it is implemented in this way.
If you're deleting rows and need to perform some kind of cleanup on related rows, I'd suggest that you may be better off doing soft-deletes (e.g., setting a status=DELETED) and then processing any relations outside of the signals API.
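A rough sketch of that soft-delete idea, written against Python's built-in sqlite3 module rather than peewee so that it stays self-contained. The users and tweets tables, the status column, and the soft_delete_user and cleanup names are all assumptions for illustration.

```python
import sqlite3

def soft_delete_user(conn, user_id, cleanup):
    """Mark a user as deleted, then run cleanup on each related tweet row."""
    with conn:
        # Flag the user instead of issuing DELETE FROM users.
        conn.execute(
            "UPDATE users SET status = 'DELETED' WHERE id = ?", (user_id,)
        )
        tweets = conn.execute(
            "SELECT id, content FROM tweets WHERE user_id = ? ORDER BY id",
            (user_id,),
        ).fetchall()
    # The related rows still exist, so per-row processing can happen here,
    # in ordinary code, without relying on any signal handler.
    for tweet_id, content in tweets:
        cleanup(tweet_id, content)
    return len(tweets)

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT NOT NULL DEFAULT 'ACTIVE');
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, content TEXT);
    INSERT INTO users (id) VALUES (123);
    INSERT INTO tweets (user_id, content) VALUES (123, 'first'), (123, 'second');
""")

processed = []
count = soft_delete_user(conn, 123, lambda tweet_id, content: processed.append(content))
```

The trade-off is that every query on users now has to filter on status, in exchange for being able to process dependent rows one at a time.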

django save instance between parent and child model class

I came across this problem: on form save, the data needs to be persisted somewhere, then go through a payment process, and on success be retrieved and saved to the proper model.
I have seen this done using the session, but with a hacky way of persisting file uploads when commit=False, and it doesn't seem very Pythonic.
I am thinking of having a model class A and a child class extending A, such as A_Temp:
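from django import forms
from django.db import models
class A(models.Model):
    name = models.CharField(max_length=25)
    image = models.ImageField()
class A_Temp(A):
    pass
class AForm(forms.ModelForm):
    class Meta:
        model = A_Temp
        # ModelForm needs an explicit field list in current Django versions
        fields = '__all__'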
When the model form (for A_Temp) is saved, the data is stored in A_Temp, and when the payment succeeds, the instance is moved to the parent model class A.
Here are the questions:
Has anyone done this before?
How do I properly move an instance of a child model class to the parent model class?
Edit:
There are other ways to do it, such as adding extra fields to the table. Yes, I would have done that if I were using PHP without an ORM framework, but since the ORM in Django is pretty decent, I thought I might try something different.
Since I am asking here, I am clearly not convinced by this approach myself. What are your thoughts?
As suggested in the question comments, adding an extra field to your model containing the payment state may be the easiest approach. Conceptually it's the same object; it's just that the state changes once payment has been made. As you've indicated, you will need logic to purge items from your database that never proceed through the required states, such as payment. This may involve adding both a payment_state and a state_change_time field to your model, the latter indicating when the state last changed. If the state is PAYMENT_PENDING for too long, that record could be purged.
If you take the approach that unpaid items are stored in a different table, as you've suggested, you still have to manage that table to determine when it's safe to delete items. For example, if a payment is never processed, when will you delete the record from the A_Temp table? Also, having a separate table means that you really only have two possible states, paid and unpaid, as determined by the table in which the record occurs. Having a single table with a payment_state may be more flexible in that it allows you to extend the set of states as required. For example, let's say you decide you need the payment states ITEM_SUBMITTED, AWAITING_PAYMENT, PAYMENT_ACCEPTED, PAYMENT_REJECTED. This could all be implemented with a single state field. If this was implemented as you've described, you'd need a separate table for each state.
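The single-state-field idea can be sketched in plain Python, independent of Django. The Item dataclass, the is_stale and purge_stale helpers, and the 24-hour default are assumptions made up for this illustration; only the state names and the payment_state and state_change_time fields come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum

class PaymentState(Enum):
    ITEM_SUBMITTED = "item_submitted"
    AWAITING_PAYMENT = "awaiting_payment"
    PAYMENT_ACCEPTED = "payment_accepted"
    PAYMENT_REJECTED = "payment_rejected"

@dataclass
class Item:
    name: str
    payment_state: PaymentState
    state_change_time: datetime

def is_stale(item, now, max_age=timedelta(hours=24)):
    """True when an item has been awaiting payment for longer than max_age."""
    return (
        item.payment_state is PaymentState.AWAITING_PAYMENT
        and now - item.state_change_time > max_age
    )

def purge_stale(items, now, max_age=timedelta(hours=24)):
    """Return only the items that should be kept."""
    return [item for item in items if not is_stale(item, now, max_age)]
```

In a real model the same check would presumably become a filter on payment_state and state_change_time run by a periodic cleanup job.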
Having said all that, if you're still set on having a separate table structure, you can create a function that copies the values from an instance of A_Temp to A. Something like the following may work, but relationship fields such as ForeignKey are likely to require special attention.
def copy_A_temp_to_A(a, a_temp):
    # Copy each concrete field value from the A_Temp instance onto the A
    # instance, skipping the primary key so the new A row gets its own id.
    for field in a._meta.fields:
        if field.primary_key:
            continue
        setattr(a, field.name, getattr(a_temp, field.name))
When you need to move a record from A_Temp to A, instantiate an A instance, call the copy function, save the instance, and delete the A_Temp instance from the database.

count() returning zero hits in post_save

We have a Dialog and a Comment object. We have a denormalized field, num_comments, on Dialog to keep track of the number of comments. When a new comment is saved (or deleted) we want to increase/decrease this value properly.
from django.db.models import Count
# sender=Comment, called post_save and post_delete
def recalc_comments(sender, instance, created=False, **kwargs):
    # Comments that will be deleted might not have a dialog (when dialog gets deleted)
    if not hasattr(instance, "dialog"):
        return
    dialog = instance.dialog
    dialog.update(
        num_comments=sender.public.filter(dialog=dialog).count(),
        num_commentators=sender.public.filter(dialog=dialog).aggregate(c=Count('user', distinct=True))["c"],
    )
The problem that has started to appear is that the query for num_comments returns zero for the first comment posted. This does not happen every time, and only in cases with approx. > 1000 comments in the result set (not much, I know...).
Could it be that the Comment has not yet been saved to the database when the count() is performed? To complicate things further we are using Johnny Cache (with memcached) as a layer between the ORM and database.
Any input would be greatly appreciated!
As far as I understand, you want to denormalize your database schema for better query performance. In that case I can recommend an application designed specifically for this purpose: django-composition.
As the documentation says:
django-composition provides the abstract way to denormalize data from
your models in simple declarative way through special generic model
field called CompositionField.
Most cases of data denormalization are pretty common so
django-composition has several "short-cuts" fields that handles most
of them.
CompositionField is django model field that provides interface to data
denormalization.
You can also use the ForeignCountField shortcut. It helps to count the number of objects related by a foreign key.

Django ORM: Optimizing queries involving many-to-many relations

I have the following model structure:
class Container(models.Model):
    pass
class Generic(models.Model):
    name = models.CharField(unique=True)
    cont = models.ManyToManyField(Container, null=True)
    # It is possible to have a Generic object not associated with any container,
    # that's why null=True
class Specific1(Generic):
    ...
class Specific2(Generic):
    ...
...
class SpecificN(Generic):
    ...
Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner:
c = Container.objects.get(name='somename') # this gets me the container
items = c.generic_set.all()
# this gets me all Generic objects, that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
    spec = getattr(item, item.get_my_specific_type())
this results in a ton of db hits (one for each Generic record, that relates to a Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
that way the db will be hit once for each Specific type (acceptable, I guess).
I know that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
Not a complete answer, but you can avoid a great number of hits by doing this:
items = list(items)
for item in items:
    spec = getattr(item, item.get_my_specific_type())
instead of this:
for item in items:
    spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing a cast to a Python list, you force the Django ORM to load all elements in your queryset. It then does this in one query.
I accidentally stumbled upon the following post, which pretty much answers your question:
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html

What's the easiest way to perform total calculations on child objects?

I need to perform a range of simple calculations on an Invoice model object, which has a number of Order children associated with it that hold the order quantity and order value etc. This is all managed through the admin form for Invoice (with order as inlines)
The code I'm working with now performs these calcs like this:
Invoice (models.py)
def save(self, *args, **kwargs):
    # get the total of all associated orders, starting from zero
    # so that repeated saves do not accumulate
    self.invoice_net_total = 0
    for order in self.invoice_orders.all():
        self.invoice_net_total += order.order_value
    super(Invoice, self).save(*args, **kwargs)
This causes a few problems when I change a child order's quantity and then save the form: I get the previous total instead of the new total, and only when I save again does the total correct itself. Perhaps this is due to the way Django saves parent and child objects?
The other option I toyed with was moving this calculation to the child order object (prepare for horrible code):
Order (models.py)
def save(self, *args, **kwargs):
    if not self.id:
        # new order: add its value to the invoice total
        self.invoice.invoice_net_total += self.order_value
    else:
        # grab old instance
        old = Order.objects.get(pk=self.id)
        # remove that old total
        self.invoice.invoice_net_total -= old.order_value
        # add new total
        self.invoice.invoice_net_total += self.order_value
    self.invoice.save()
    super(Order, self).save(*args, **kwargs)
Though that's not very effective either.
I'm wondering if anyone could guide me to a straightforward way of ensuring these calculations perform as they should. I thought of signals (relatively new to me) but wondered if that was overkill. I'm probably overthinking this!
Thank you
On a different note, consider not storing the invoice total explicitly at all. Instead, it can be recalculated dynamically each time you query for an invoice.
I think this is better modeling, as it avoids redundant and possibly inconsistent data. SQL and Django are smart enough to compute your invoice total without much overhead each time you need it, especially since you don't do the summation in your program if you use an aggregation function.
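As a sketch of what "let the database do the summation" means, here is the aggregate written against Python's built-in sqlite3 module. The orders table, its columns, and the invoice_net_total function are assumptions for illustration; in Django the equivalent would presumably be an aggregate() call with Sum on the invoice's related orders.

```python
import sqlite3

def invoice_net_total(conn, invoice_id):
    """Sum the order values of one invoice inside the database."""
    row = conn.execute(
        # COALESCE turns the NULL that SUM yields for zero rows into 0.
        "SELECT COALESCE(SUM(order_value), 0) FROM orders WHERE invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    return row[0]

# Hypothetical schema and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, invoice_id INTEGER NOT NULL, order_value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (invoice_id, order_value) VALUES (?, ?)",
    [(1, 10), (1, 25), (2, 7)],
)
conn.commit()
```

Because nothing is stored on the invoice, the total cannot drift out of sync with the orders, at the cost of one aggregate query per read.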
I think you need to explicitly save the child objects before doing the calculation, since your query references the objects as they are in the database, while any object changed in the form may not have been saved yet.