Django Rest Framework: optimize nester serializers performance - django

I have problem with my endpoint performance which returns around 40 items and response takes around 17 seconds.
I have model:
class GameTask(models.Model):
name= models.CharField()
description = RichTextUploadingField()
...
and another model like that:
class TaskLevel(models.Model):
master_task = models.ForeignKey(GameTask, related_name="sub_levels", on_delete-models.CASCADE)
sub_tasks = models.ManyToManyField(GameTask, related_name="master_levels")
...
So basicly I can have "normal" tasks, but when I create TaskLevel object I can add master_task as a task which gonna fasten other tasks added to sub_tasks field.
My serializers look like:
class TaskBaseSerializer(serializers.ModelSerializer):
fields created by serializers.SerializerMethodField()
...
class TaskLevelSerializer(serializers.ModelSerializer):
sub_tasks = serializers.SerializerMethodField()
class Meta:
model = TaskLevel
def get_sub_tasks(self, obj: TaskLevel):
sub_tasks = get_sub_tasks(level=obj, user=self.context["request"].user) # method from other module
return TaskBaseSerializer(sub_tasks, many=True, context=self.context).data
class TaskSerializer(TaskBaseSerializer):
levels_config = serializers.SerializerMethodField()
class Meta:
model = GameTask
def get_levels_config(self, obj: GameTask):
if is_mastertask(obj):
return dict(
sub_levels=TaskLevelSerializer(
obj.sub_levels.all().order_by("number"), many=True, context=self.context
).data,
progress=get_progress(
master_task=obj, user=self.context["request"].user
),
)
return None
When I tried to measure time it turned out that get_levels_config method takes around 0.25 seconds for one multilevel-task (which contain 7 subtasks). Is there any way to improve this performance? If any more detailed methods are needed I will add them

Your code might be suffering from N+1 problem. TaskSerializer.get_levels_config() performs database queries from obj.sub_levels.all().order_by("number").
What happens when serializing multiple instances like:
TaskSerializer(tasks, many=True)
each instance calls .get_levels_config()
You can use prefetch_related & selected_related(more explanation here).
You will have to manually check for prefetched objects since you are using SerializerMethodField. There's also the functions get_progress & get_sub_tasks which I assume does another query.
Here are some examples that can be used around your code:
Prefetching:
GameTask.objects.prefetch_related("sub_levels", "master_levels")
# Accessing deeper level
GameTask.objects.prefetch_related(
"sub_levels",
"master_levels",
"sub_levels__sub_tasks",
).select_related(
"master_levels__master_task",
)
Checking prefetch:
def get_sub_tasks(self, obj: TaskLevel):
if hasattr(obj, "_prefetched_objects_cache") and obj._prefetched_objects_cache.get("sub_tasks", None):
return obj._prefetched_objects_cache
return obj.sub_tasks.all()

Related

How to call instance methods on the nested Serializer with django rest?

I've got two models I'm trying to nest together. Timesheet and Invoice
My InvoiceSerializer looks something like this:
class InvoiceSerializer(serializers.ModelSerializer):
billable_timesheets = serializers.SerializerMethodField()
total_hours_and_cost = serializers.SerializerMethodField()
class Meta:
model = Invoice
fields = (
"hours",
"hour_cost",
"billable_timesheets",
"total_hours_and_cost",
...
)
def get_total_hours_and_cost(self, obj):
return obj.hours * obj.hour_cost
def get_billable_timesheets(self, obj):
"""Getting all timesheets for selected billable period"""
timesheets = obj.project.timesheets.filter(<queryset here>)
return TimesheetSerializer(timesheets, many=True).data
This works fine and all - I can define MethodFields and the correct JSON is returned, great. However, I got a method on my child model (in this case, the Timesheet model) that I need to access and run some calculations on. I'm getting the data necessary via get_billable_timesheets, and now I need to run a method on my Timesheet model called total_duration(). Whenever I try to do something along the lines of
timesheets = self.get_billable_timesheets(obj)
hours = 0
for timesheet in timesheets:
hours += timesheet.total_duration()
I get:
AttributeError: 'collections.OrderedDict' object has no attribute 'total_duration'
What I don't understand is that I'm actually serializing the data already through the get_billable_timesheets method - why am I still receiving this error?
The timesheet in your for loop is an ordered dict instance, since get_billable_timesheets returns serialized data.
A workaround can be,
timesheets = self.get_billable_timesheets(obj)
hours = 0
for timesheet in timesheets:
timesheet_id = timesheet.get('id')
timesheet_obj = Timesheet.objects.get(id=timesheet_id)
hours += timesheet_obj.total_duration()
A working solution for my case was to not return a serialized version of the timesheets, (e.g instead of return TimesheetSerializer(timesheets, many=True).data
I just did return timesheets → did my calculations needed and then serialized the timesheet:
def get_billable_timesheets(self, obj):
"""Getting all timesheets for selected billable period"""
timesheets = obj.project.timesheets.filter(<queryset here>)
return timesheets # instead of a serialized version

Prevent repeating query within SerializerMethodField s

So i got my serializer called like this:
result_serializer = TaskInfoSerializer(tasks, many=True)
And the serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
done_jobs_count = serializers.SerializerMethodField()
total_jobs_count = serializers.SerializerMethodField()
task_status = serializers.SerializerMethodField()
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
def get_done_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id, done_flag=1)
condition = False
# Some complicate logic to determine condition that I can't reveal due to business
result = qs.count() if condition else 0
# this function take around 3 seconds
return result
def get_total_jobs_count(self, obj):
qs = Job.objects.filter(task__task_id=obj.task_id)
# this query take around 3-5 seconds
return qs.count()
def get_task_status(self, obj):
done_count = self.get_done_jobs_count(obj)
total_count = self.get_total_jobs_count(obj)
if done_count >= total_count:
return 'done'
else:
return 'not yet'
When the get_task_status function is called, it call other 2 function and make those 2 costly query again.
Is there any best way to prevent that? And I dont really know the order of those functions to be called, is it based on the order declare in Meta's fields? Or above that?
Edit:
The logic in get_done_jobs_count is a bit complicate and I cannot make it into a single query when get task
Edit 2:
I just bring all those count function into model and use cached_property
https://docs.djangoproject.com/en/2.1/ref/utils/#module-django.utils.functional
But it raise another question: Is that number reliable? I don't understand much about django cache, is that cached_property is only exist for this instance (just until the API get list of tasks return a response) or will it exist for sometime?
I just try cached_property and it did resolve the problem.
Model:
from django.utils.functional import cached_property
from django.db import models
class Task(models.Model):
task_id = models.AutoField(primary_key=True)
task_name = models.CharField(default='')
#cached_property
def done_jobs_count(self):
qs = self.jobs.filter(done_flag=1)
condition = False
# Some complicate logic to determine condition that I can't reveal due to business
result = qs.count() if condition else 0
# this function take around 3 seconds
return result
#cached_property
def total_jobs_count(self):
qs = Job.objects.filter(task__task_id=obj.task_id)
# this query take around 3-5 seconds
return qs.count()
#property
def task_status(self):
done_count = self.done_jobs_count
total_count = self.total_jobs_count
if done_count >= total_count:
return 'done'
else:
return 'not yet'
Serializer:
class TaskInfoSerializer(serializers.ModelSerializer):
class Meta:
model = Task
fields = ('task_id', 'task_name', 'done_jobs_count', 'total_jobs_count', 'task_status')
You could annotate those values to avoid making extra queries. So the queryset passed to the serializer would look something like this (it might change depending on the Django version you're using and the related query name for jobs):
tasks = tasks.annotate(
done_jobs=Count('jobs', filter=Q(done_flag=1)),
total_jobs=Count('jobs'),
)
result_serializer = TaskInfoSerializer(tasks, many=True)
Then the serializer method would look like:
def get_task_status(self, obj):
if obj.done_jobs >= obj.total_jobs:
return 'done'
else:
return 'not yet'
Edit: a cached_property won't help you if you have to call the method for each task instance (which seems the case). The problem is not so much the calculation but whether you have to hit the database for each separate task. You have to focus on getting all the information needed for the calculation in a single query. If that's impossible or too complicated, maybe think about changing the data structure (models) in order to facilitate that.
Using iterator() and counting Iterator might solve your problem.
job_iter = Job.objects.filter(task__task_id=obj.task_id).iterator()
count = len(list(job_iter))
return count
You can use select_related() and prefetch_related() for Retrieve everything at once if you will need them.
Note: if you use iterator() to run the query, prefetch_related() calls will be ignored
You might want to go through documentation for optimisation

Django Rest Framework Ordering on a SerializerMethodField

I have a Forum Topic model that I want to order on a computed SerializerMethodField, such as vote_count. Here are a very simplified Model, Serializer and ViewSet to show the issue:
# models.py
class Topic(models.Model):
"""
An individual discussion post in the forum
"""
title = models.CharField(max_length=60)
def vote_count(self):
"""
count the votes for the object
"""
return TopicVote.objects.filter(topic=self).count()
# serializers.py
class TopicSerializer(serializers.ModelSerializer):
vote_count = serializers.SerializerMethodField()
def get_vote_count(self, obj):
return obj.vote_count()
class Meta:
model = Topic
# views.py
class TopicViewSet(TopicMixin, viewsets.ModelViewSet):
queryset = Topic.objects.all()
serializer_class = TopicSerializer
Here is what works:
OrderingFilter is on by default and I can successfully order /topics?ordering=title
The vote_count function works perfectly
I'm trying to order by the MethodField on the TopicSerializer, vote_count like /topics?ordering=-vote_count but it seems that is not supported. Is there any way I can order by that field?
My simplified JSON response looks like this:
{
"id": 1,
"title": "first post",
"voteCount": 1
},
{
"id": 2,
"title": "second post",
"voteCount": 8
},
{
"id": 3,
"title": "third post",
"voteCount": 4
}
I'm using Ember to consume my API and the parser is turning it to camelCase. I've tried ordering=voteCount as well, but that doesn't work (and it shouldn't)
This is not possible using the default OrderingFilter, because the ordering is implemented on the database side. This is for efficiency reasons, as manually sorting the results can be incredibly slow and means breaking from a standard QuerySet. By keeping everything as a QuerySet, you benefit from the built-in filtering provided by Django REST framework (which generally expects a QuerySet) and the built-in pagination (which can be slow without one).
Now, you have two options in these cases: figure out how to retrieve your value on the database side, or try to minimize the performance hit you are going to have to take. Since the latter option is very implementation-specific, I'm going to skip it for now.
In this case, you can use the Count function provided by Django to do the count on the database side. This is provided as part of the aggregation API and works like the SQL COUNT function. You can do the equivalent Count call by modifying your queryset on the view to be
queryset = Topic.objects.annotate(vote_count=Count('topicvote_set'))
Replacing topicvote_set with your related_name for the field (you have one set, right?). This will allow you to order the results based on the number of votes, and even do filtering (if you want to) because it is available within the query itself.
This would require making a slight change to your serializer, so it pulls from the new vote_count property available on objects.
class TopicSerializer(serializers.ModelSerializer):
vote_count = serializers.IntegerField(read_only=True)
class Meta:
model = Topic
This will override your existing vote_count method, so you may want to rename the variable used when annotating (if you can't replace the old method).
Also, you can pass a method name as the source of a Django REST framework field and it will automatically call it. So technically your current serializer could just be
class TopicSerializer(serializers.ModelSerializer):
vote_count = serializers.IntegerField(read_only=True)
class Meta:
model = Topic
And it would work exactly like it currently does. Note that read_only is required in this case because a method is not the same as a property, so the value cannot be set.
Thanks #Kevin Brown for your great explanation and answer!
In my case I needed to sort a serializerMethodField called total_donation which is the sum of donations from the UserPayments table.
UserPayments has:
User as a foreignKey
sum which is an IntegerField
related_name='payments'
I needed to get the total donations per User but only donations that have a status of 'donated', not 'pending'. Also needed to filter out the payment_type coupon, which is related through two other foreign keys.
I was dumbfounded how to join and filter those donations and then be able to sort it via ordering_fields.
Thanks to your post I figured it out!
I realized it needed to be part of the original queryset in order to sort with ordering.
All I needed to do was annotate the queryset in my view, using Sum() with filters inside like so:
class DashboardUserListView(generics.ListAPIView):
donation_filter = Q(payments__status='donated') & ~Q(payments__payment_type__payment_type='coupon')
queryset = User.objects.annotate(total_donated=Sum('payments__sum', filter=donation_filter ))
serializer_class = DashboardUserListSerializer
pagination_class = DashboardUsersPagination
filter_backends = [filters.OrderingFilter]
ordering_fields = ['created', 'last_login', 'total_donated' ]
ordering = ['-created',]
I will put it here because the described case is not the only one.
The idea is to rewrite the list method of your Viewset to order by any of your SerializerMethodField(s) also without moving your logic from the Serializer to the ModelManager (especially when you work with several complex methods and/or related models)
def list(self, request, *args, **kwargs):
response = super().list(request, args, kwargs)
ordering = request.query_params.get('ordering')
if "-" in ordering:
response.data['results'] = sorted(response.data['results'], key=lambda k: (k[ordering.replace('-','')], ), reverse=True)
else:
response.data['results'] = sorted(response.data['results'], key=lambda k: (k[ordering], ))
return response

Django queryset with Model method containing another queryset

Suppose I have a model, MyModel, with a property method that uses another model's queryset.
class OtherModel(models.Model)
...
class MyModel(models.Model):
simple_attr = models.CharField('Yada yada')
#property
def complex_attr(self):
list_other_model = OtherModel.objects.all()
...
# Complex algorithm using queryset from 'OtherModel' and simple_attr
return result
This causes my get_queryset() method on MyModel to query the database to generate the list_other_model variable every time for every single row.
Which causes my MyModel ListView to generate hundreds of SQL queries. Not efficient.
How can I architect a Manager or get_queryset method to cache the variable list_other_model for each row when using MyModel.objects.all()?
I hope my question makes sense--I'm on my sixth shot of espresso, and still haven't found a way to reduce the db queries.
Not sure if this is the best way to do it, but it works.
If someone posts a better answer, I'll accept theirs.
class OtherModel(models.Model)
...
class MyModelManager(models.Manager):
def get_queryset(self):
self.model.list_other_model = OtherModel.objects.all()
return super(MyModelManager, self).get_queryset()
class MyModel(models.Model):
simple_attr = models.CharField('Yada yada')
list_other_model = None
objects = MyModelManager()
#property
def complex_attr(self):
...
# Complex algorithm using queryset from 'OtherModel' and simple_attr
return result

'private' models, default query sets and chaining methods

I have a private boolean flag on my model, and a custom manager that overwrites the get_query_set method, with a filter, removing private=True:
class myManager(models.Manager):
def get_query_set(self):
qs = super(myManager, self).get_query_set()
qs = qs.filter(private=False)
return qs
class myModel(models.Model):
private = models.BooleanField(default=False)
owner = models.ForeignKey('Profile', related_name="owned")
#...etc...
objects = myManager()
I want the default queryset to exclude the private models be default as a security measure, preventing accidental usage of the model showing private models.
Sometimes, however, I will want to show the private models, so I have the following on the manager:
def for_user(self, user):
if user and not user.is_authenticated():
return self.get_query_set()
qs = super(myManager, self).get_query_set()
qs = qs.filter(Q(owner=user, private=True) | Q(private=False))
return qs
This works excellently, with the limitation that I can't chain the filter. This becomes a problem when I have a fk pointing the myModel and use otherModel.mymodel_set. otherModel.mymodel_set.for_user(user) wont work because mymodel_set returns a QuerySet object, rather than the manager.
Now the real problem starts, as I can't see a way to make the for_user() method work on a QuerySet subclass, because I can't access the full, unfiltered queryset (basically overwriting the get_query_set) form the QuerySet subclass, like I can in the manager (using super() to get the base queryset.)
What is the best way to work around this?
I'm not tied to any particular interface, but I would like it to be as djangoy/DRY as it can be. Obviously I could drop the security and just call a method to filter out private tasks on each call, but I really don't want to have to do that.
Update
manji's answer below is very close, however it fails when the queryset I want isn't a subset of the default queryset. I guess the real question here is how can I remove a particular filter from a chained query?
Define a custom QuerySet (containing your custom filter methods):
class MyQuerySet(models.query.QuerySet):
def public(self):
return self.filter(private=False)
def for_user(self, user):
if user and not user.is_authenticated():
return self.public()
return self.filter(Q(owner=user, private=True) | Q(private=False))
Define a custom manager that will use MyQuerySet (MyQuerySet custom filters will be accessible as if they were defined in the manager[by overriding __getattr__]):
# A Custom Manager accepting custom QuerySet
class MyManager(models.Manager):
use_for_related_fields = True
def __init__(self, qs_class=models.query.QuerySet):
self.queryset_class = qs_class
super(QuerySetManager, self).__init__()
def get_query_set(self):
return self.queryset_class(self.model).public()
def __getattr__(self, attr, *args):
try:
return getattr(self.__class__, attr, *args)
except AttributeError:
return getattr(self.get_query_set(), attr, *args)
Then in the model:
class MyModel(models.Model):
private = models.BooleanField(default=False)
owner = models.ForeignKey('Profile', related_name="owned")
#...etc...
objects = myManager(MyQuerySet)
Now you can:
¤ access by default only public models:
MyModel.objects.filter(..
¤ access for_user models:
MyModel.objects.for_user(user1).filter(..
Because of (use_for_related_fields = True), this same manager wil be used for related managers. So you can also:
¤ access by default only public models from related managers:
otherModel.mymodel_set.filter(..
¤ access for_user from related managers:
otherModel.mymodel_set.for_user(user).filter(..
More informations: Subclassing Django QuerySets & Custom managers with chainable filters (django snippet)
To use the chain you should override the get_query_set in your manager and place the for_user in your custom QuerySet.
I don't like this solution, but it works.
class CustomQuerySet(models.query.QuerySet):
def for_user(self):
return super(CustomQuerySet, self).filter(*args, **kwargs).filter(private=False)
class CustomManager(models.Manager):
def get_query_set(self):
return CustomQuerySet(self.model, using=self._db)
If you need to "reset" the QuerySet you can access the model of the queryset and call the original manager again (to fully reset). However that's probably not very useful for you, unless you were keeping track of the previous filter/exclude etc statements and can replay them again on the reset queryset. With a bit of planning that actually wouldn't be too hard to do, but may be a bit brute force.
Overall manji's answer is definitely the right way to go.
So amending manji's answer you need to replace the existing "model"."private" = False with ("model"."owner_id" = 2 AND "model"."private" = True ) OR "model"."private" = False ). To do that you will need to walk through the where object on the query object of the queryset to find the relevant bit to remove. The query object has a WhereNode object that represents the tree of the where clause, with each node having multiple children. You'd have to call the as_sql on the node to figure out if it's the one you are after:
from django.db import connection
qn = connection.ops.quote_name
q = myModel.objects.all()
print q.query.where.children[0].as_sql(qn, connection)
Which should give you something like:
('"model"."private" = ?', [False])
However trying to do that is probably way more effort than it's worth and it's delving into bits of Django that are probably not API-stable.
My recommendation would be to use two managers. One that can access everything (an escape hatch of sort), the other with the default filtering applied. The default manager is the first one, so you need to play around with the ordering depending on what you need to do. Then restructure your code to know which one to use - so you don't have the problem of having the extra private=False clause in there already.