We have a Dialog model and a Comment model. We have a denormalized field, num_comments, on Dialog to keep track of the number of comments. When a new comment is saved (or deleted) we want to increment/decrement this value accordingly.
# sender=Comment; connected to both post_save and post_delete
from django.db.models import Count

def recalc_comments(sender, instance, created=False, **kwargs):
    # Comments that are about to be deleted might not have a dialog
    # (when the dialog itself gets deleted)
    if not hasattr(instance, "dialog"):
        return
    dialog = instance.dialog
    # update() here is a custom instance-level helper on our Dialog model
    dialog.update(
        num_comments=sender.public.filter(dialog=dialog).count(),
        num_commentators=sender.public.filter(dialog=dialog).aggregate(
            c=Count("user", distinct=True)
        )["c"],
    )
The problem that has started to appear is that the query for num_comments returns zero for the first comment posted. This does not happen every time, and only in cases with approx. >1000 comments in the result set (not that many, I know...).
Could it be that the Comment has not yet been saved to the database when the count() is performed? To complicate things further we are using Johnny Cache (with memcached) as a layer between the ORM and database.
Any input would be greatly appreciated!
As far as I understand, you want to denormalize your database schema for the best query performance. In this case I can recommend an application designed specifically for this purpose: django-composition.
As the documentation says:
django-composition provides the abstract way to denormalize data from
your models in simple declarative way through special generic model
field called CompositionField.
Most cases of data denormalization are pretty common so
django-composition has several "short-cuts" fields that handles most
of them.
CompositionField is django model field that provides interface to data
denormalization.
You can also use the ForeignCountField shortcut. It helps count the number of objects related by a foreign key.
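For comparison, the same counting can also be hand-rolled with plain Django signals. Here is a minimal sketch reusing the names from the question (Comment, Dialog, the public manager); the queryset update() writes both counts in one atomic UPDATE statement without re-firing Dialog's own signals:

from django.db.models import Count
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

@receiver(post_save, sender=Comment)
@receiver(post_delete, sender=Comment)
def recalc_comments(sender, instance, **kwargs):
    if not hasattr(instance, "dialog"):  # dialog may already be gone
        return
    stats = sender.public.filter(dialog=instance.dialog).aggregate(
        num_comments=Count("pk"),
        num_commentators=Count("user", distinct=True),
    )
    Dialog.objects.filter(pk=instance.dialog_id).update(**stats)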
I have a long list of "Quotation" objects.
The price of a quotation depends on dozens of child (and grandchild) objects, the lowest level being the rate/hour.
When I change a quotation's child objects, like the rate/hour, the quotation price changes.
I would like to automatically recalculate the price of each quotation that is impacted by any change I make to its constituent objects. I am using Wagtail for the object admin.
I am not sure about the way to do that, should I use signals? Wagtail hooks?
Wagtail models and pages are just Django models.
Are all children (and grandchildren) of the same model? In that case you could just register a post_save signal on the child model. In general, though, I'd recommend using post_save only for non-mutating actions like sending an email when an object changes, unless you take the following things into consideration:
This kind of processing can become very slow very quickly. If you have multiple quotations to update, you'll want to wrap the work in an atomic transaction, otherwise every quotation save is committed to the database separately.
You'll also need to prefetch children and grandchildren to prevent multiple database calls.
When saving other models inside a post_save handler you run the risk of creating an endless loop of post_save signals (see the sketch after this list).
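A minimal sketch of a handler that takes all three points into account; the child model QuotationLine, its price field, and the related_name "children" are all hypothetical here. The queryset update() bypasses save(), so no further post_save signals fire:

from django.db import transaction
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import Quotation, QuotationLine  # hypothetical app/models

@receiver(post_save, sender=QuotationLine)
def update_quotation_price(sender, instance, **kwargs):
    with transaction.atomic():
        quotation = (Quotation.objects
                     .prefetch_related("children")
                     .get(pk=instance.quotation_id))
        total = sum(child.price for child in quotation.children.all())
        # .update() does not call save(), so post_save is not re-triggered
        Quotation.objects.filter(pk=quotation.pk).update(price=total)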
I think a better way would be to add a property called price on Quotation instead of storing the price in the database:
class Quotation(models.Model):
    # Model fields here
    ...

    @property
    def price(self):
        # assumes the child FK uses related_name="children"
        return sum(child.price for child in self.children.all())
This will only calculate a price when you call quotation.price. You could speed up this calculation by prefetching children when receiving Quotations.
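For example, assuming the reverse relation is named "children" as in the sketch above, a single prefetch avoids one extra query per quotation:

quotations = Quotation.objects.prefetch_related("children")
for quotation in quotations:
    # children are already in memory, so this triggers no extra queries
    print(quotation.price)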
I have a React front-end and a Django backend (used as a REST API).
I've inherited the app, and it loads all the user data using many Models and Serializers. It loads very slowly.
It uses a filter to query for a single member, then passes that to a Serializer:
found_account = Accounts.objects.get(id=customer_id)
AccountDetailsSerializer(found_account, context={'request': request}).data
Then there are many nested Serializers:
class AccountDetailsSerializer(serializers.ModelSerializer):
    invoices = InvoiceSerializer(many=True)
    orders = OrderSerializer(many=True)
    ....
From looking at the log, it looks like the ORM issues a crazy number of queries; for some endpoints we end up with 50-60 queries.
Should I look into using select_related and prefetch_related, or would you skip all of that and just write one SQL query doing multiple joins to fetch all the data at once as JSON?
How can I define the prefetch / select_related when I pass in a single object (result of get), and not a queryset to the serializer?
Some db entities don't have links between them, meaning no FK or many-to-many relationships; they just hold a field with the id of another object, but the relationship is not enforced in the database. Will this be an issue for me? Does it mean, once more, that I should skip the select_related approach and write custom SQL for fetching?
How would you suggest approaching the performance tuning of this nightmare of queries?
I recommend initially seeing what effects you get with prefetch_related. It can have a major impact on load time, and is fairly trivial to implement. Going by your example above something like this could alone reduce load time significantly:
class AccountDetailsSerializer(serializers.ModelSerializer):
    class Meta:
        model = AccountDetails
        fields = (
            'invoices',
            'orders',
        )

    invoices = serializers.SerializerMethodField()
    orders = serializers.SerializerMethodField()

    def get_invoices(self, obj):
        qs = obj.invoices.all() \
            .prefetch_related('invoice_sub_object_1') \
            .prefetch_related('invoice_sub_object_2')
        return InvoiceSerializer(qs, many=True, read_only=True).data

    def get_orders(self, obj):
        qs = obj.orders.all() \
            .prefetch_related('orders_sub_object_1') \
            .prefetch_related('orders_sub_object_2')
        return OrderSerializer(qs, many=True, read_only=True).data
As for your question of architecture, I think a lot of other factors play in as to whether and to what degree you should refactor the codebase. In general though, if you are married to Django and DRF, you'll have a better developer experience if you can embrace the idioms and patterns of those frameworks, instead of trying to bypass them with your own fixes.
There's no silver bullet without looking at the code (and the profiling results) in detail.
The only thing that is a no-brainer is enforcing relationships in the models and in the database. This prevents a whole host of bugs, encourages the use of standardized, performant access (rather than concocting SQL on the spot which more often than not is likely to be buggy and slow) and makes your code both shorter and a lot more readable.
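For instance, a bare integer id column can usually be turned into a real, enforced foreign key without even renaming the database column (the model names here are hypothetical):

class Invoice(models.Model):
    # before: account_id = models.IntegerField()
    # after: a real relationship, still stored in the same "account_id" column
    account = models.ForeignKey(
        'Accounts',
        on_delete=models.PROTECT,
        db_column='account_id',
    )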
Other than that, 50-60 queries can be a lot (if you could do the same job with one or two) or it can be just right - it depends on what you achieve with them.
The use of prefetch_related and select_related is important, yes – but only if used correctly; otherwise it can slow you down instead of speeding you up.
Nested serializers are the correct approach if you need the data – but you need to set up your querysets properly in your viewset if you want them to be fast.
Time the main parts of slow views, inspect the SQL queries sent and check if you really need all data that is returned.
Then you can look at the sore spots and gain time where it matters. Asking specific questions on SO with complete code examples can also get you far fast.
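As a first profiling step, with DEBUG=True Django records every query it sends, so you can count and inspect them from a shell. A rough sketch, with the serializer and variable names taken from the question:

from django.db import connection, reset_queries

reset_queries()
data = AccountDetailsSerializer(found_account, context={'request': request}).data
print(len(connection.queries))           # how many queries that one call issued
for q in connection.queries[:5]:
    print(q['time'], q['sql'][:120])     # spot the repeated per-row queries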
If you have just one top-level object, you can refine the approach offered by @jensmtg, doing all the prefetches that you need at that level and then for the lower levels just using ModelSerializers (not SerializerMethodFields) that access the prefetched objects. Look into the Prefetch object that allows nested prefetching.
But be aware that prefetch_related is not free: it involves quite some processing in Python, so you may be better off using flat (db-view-like) joined queries with values() and values_list().
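A rough sketch of the combined approach, reusing the (assumed) names from the question and assuming invoice_sub_object_1 is a foreign key; each Prefetch fetches a whole level in one query, and the nested ModelSerializers then read the already-prefetched objects:

from django.db.models import Prefetch

found_account = (
    Accounts.objects
    .prefetch_related(
        Prefetch('invoices',
                 queryset=Invoice.objects.select_related('invoice_sub_object_1')),
        Prefetch('orders',
                 queryset=Order.objects.prefetch_related('orders_sub_object_1')),
    )
    .get(id=customer_id)
)
data = AccountDetailsSerializer(found_account, context={'request': request}).data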
I am working on a project where I need to recalculate values based on if fields changed or not. Here is an example:
class Model1(models.Model):
    field_a = models.DateTimeField()
    calculated_field_1 = models.ForeignKey('Model2')

class Model2(models.Model):
    field_j = models.DateTimeField()
If field_a changes on Model1, I have to recalculate the value of calculated_field_1 to see if it needs to change as well. The calculations that are done require querying the database to check values of other models and then determining whether the value of the calculated field needs to change.
Example: when field_a changes on an instance, I would have to do this calculation:
result = Model2.objects.filter(field_j__gte=instance.field_a)
if result.exists():
    instance.calculated_field_1 = result.first()
    instance.save(update_fields=('calculated_field_1',))
This is the most basic example I could think of and the queries can be much more complicated than this.
The project started out with one calculation when a field changed so I decided the best approach was to use django signals. Months later the requirements have changed for the project and now there are several other calculations that I had to implement that are very similar to the example above. I have noticed that my post_save function is getting out of hand and I am just wondering what alternatives there are to using signals. Although the post_save calculations I do now take far less than half a second, for the sake of my question pretend they took a second or more.
A valid answer cannot include doing these calculations on the fly when I pull them from the db. We use a validation framework that requires me to set these values on the model and querying on the fly has been an approach we attempted but for performance reasons it was not viable. Also, on field change one of the requirements is that the user needs to see the results of the calculated field so this has to happen synchronously.
What are some alternative approaches to using this pattern?
My application creates several rows of data per customer per day. Each row is modified as necessary using a form. Several modifications to a row may take place daily. At the end of the day the customer will "commit" the changes, at which point no further changes will be allowed. In each row I have a 'stage' field: stage=1 allows edits, stage=2 is committed, no further changes allowed.
How can I update the stage value to 2 on commit?
In my model I have:
@property
def commit_stage(self):
    self.stage = 2
    self.save()
Is this the correct way to do this? And if so, how do I attach this function to a "commit" button?
I suspect you are confused about what properties do. You should absolutely not attach this sort of functionality to a property. It would be extremely surprising behaviour for something which is supposed to retrieve a value from an instance to instead modify it and save it to the db.
You can of course put this in a standard model method. But it's so trivial there is no point in doing so.
In terms of "attaching it to a button", nothing in Django can be called from the front-end without a URL and a view. You need a view that accepts the ID of the model instance from a POST request, gets the instance and modifies its stage value, then saves it. There is nothing different from the form views you already use.
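A minimal sketch of such a view; the model, view, and URL names here are all hypothetical:

from django.shortcuts import get_object_or_404, redirect
from django.views.decorators.http import require_POST

from .models import DataRow  # hypothetical model holding the 'stage' field

@require_POST
def commit_row(request, row_id):
    row = get_object_or_404(DataRow, pk=row_id, stage=1)
    row.stage = 2
    row.save(update_fields=['stage'])
    return redirect('row-list')  # hypothetical URL name

The "commit" button is then just a small form that POSTs the row's ID to this view's URL.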
First of all, sorry if this isn't an appropriate question for StackOverflow. I've tried to make it as generalisable as possible.
I want to create a database (MySQL, site running Django) that has users, who can be allocated a certain number of points for various types of action - it's a collaborative game. My requirements are to obtain:
the number of points a user has
the user's ranking compared to all other users
and the overall leaderboard (i.e. all users ranked in order of points)
This is what I have so far, in my Django models.py file:
class SiteUser(models.Model):
    name = models.CharField(max_length=250)
    email = models.EmailField(max_length=250)
    date_added = models.DateTimeField(auto_now_add=True)

    def points_total(self):
        points_added = PointsAdded.objects.filter(user=self)
        points_total = 0
        for point in points_added:
            points_total += point.points()
        return points_total

class PointsAdded(models.Model):
    user = models.ForeignKey('SiteUser')
    action = models.ForeignKey('Action')
    date_added = models.DateTimeField(auto_now_add=True)

    def points(self):
        # self.action is already the related Action instance
        return self.action.points

class Action(models.Model):
    points = models.IntegerField()
    action = models.CharField(max_length=36)
However it's rapidly becoming clear to me that it's actually quite complex (in Django query terms at least) to figure out the user's ranking and return the leaderboard of users. At least, I'm finding it tough. Is there a more elegant way to do something like this?
This question seems to suggest that I shouldn't even have a separate points table - what do people think? It feels more robust to have separate tables, but I don't have much experience of database design.
This is old, but I'm not sure exactly why you have two separate tables (PointsAdded & Action). It's late, so maybe my mind isn't ticking, but it seems like you just split one table into two without getting any benefit out of it. It's not like there's a one-to-many relationship in it, right?
So first of all, I would combine those two tables. Secondly, you are probably better off storing points_total as a value in your SiteUser table. This is what I think Demitry is trying to allude to, but didn't say explicitly. This way, instead of doing the whole additional query (pulling everything a user has done in their history on the site is expensive) plus looping over the results (even more expensive), you can just pull it as one field. It's denormalizing the data for the greater good.
Just be sure to update the value every time you add something that has points. You can use Django's post_save signal to do that.
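A minimal sketch of that signal, assuming a points_total IntegerField has been added to SiteUser:

from django.db.models import F
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=PointsAdded)
def add_points(sender, instance, created, **kwargs):
    if created:
        # atomic, race-safe increment; no need to re-sum the whole history
        SiteUser.objects.filter(pk=instance.user_id).update(
            points_total=F('points_total') + instance.action.points
        )

The leaderboard and a user's rank then become simple queries:

leaderboard = SiteUser.objects.order_by('-points_total')
rank = SiteUser.objects.filter(points_total__gt=user.points_total).count() + 1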
It's a bit more difficult to have points saved in the same table, but it's totally worth it. You can do very simple ordering/filtering operations if you have the computed points total on the user model. And you can recompute totals only when something changes (not every time you want to show them). Just put some validation logic into post_save signals, make sure to cover this logic with tests, and you're good.
P.S. See denormalization on Wikipedia.