Django REST Framework: Setting up prefetching for nested serializers - django

My Django-powered app with a DRF API is working fine, but I've started to run into performance issues as the database gets populated with actual data. I've done some profiling with Django Debug Toolbar and found that many of my endpoints issue tens to hundreds of queries in the course of returning their data.
I expected this, since I hadn't previously optimized anything with regard to database queries. Now that I'm setting up prefetching, however, I'm having trouble making use of properly prefetched serializer data when that serializer is nested in a different serializer. I've been using this awesome post as a guide for how to think about the different ways to prefetch.
Currently, my ReadingGroup serializer does prefetch properly when I hit the /api/readinggroups/ endpoint. My issue is the /api/userbookstats/ endpoint, which returns all UserBookStats objects. The related serializer, UserBookStatsSerializer, has a nested ReadingGroupSerializer.
The models, serializers, and viewsets are as follows:
models.py
class ReadingGroup(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL)
    users = models.ManyToManyField(settings.AUTH_USER_MODEL)
    book_type = models.ForeignKey(BookType)
    ....
    <other group related fields>

    def __str__(self):
        return '%s group: %s' % (self.name, self.book_type)


class UserBookStats(models.Model):
    reading_group = models.ForeignKey(ReadingGroup)
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    alias = models.CharField()
    total_books_read = models.IntegerField(default=0)
    num_books_owned = models.IntegerField(default=0)
    fastest_read_time = models.IntegerField(default=0)
    average_read_time = models.IntegerField(default=0)
serializers.py
class ReadingGroupSerializer(serializers.ModelSerializer):
    users = UserSerializer(many=True, read_only=True)
    owner = UserSerializer(read_only=True)

    class Meta:
        model = ReadingGroup
        fields = ('url', 'id', 'owner', 'users')

    @staticmethod
    def setup_eager_loading(queryset):
        # select_related for 'to-one' relationships
        queryset = queryset.select_related('owner')
        # prefetch_related for 'to-many' relationships
        queryset = queryset.prefetch_related('users')
        return queryset


class UserBookStatsSerializer(serializers.HyperlinkedModelSerializer):
    reading_group = ReadingGroupSerializer()
    user = UserSerializer()
    awards = AwardSerializer(source='award_set', many=True)

    class Meta:
        model = UserBookStats
        fields = ('url', 'id', 'alias', 'total_books_read', 'num_books_owned',
                  'average_read_time', 'fastest_read_time', 'awards')

    @staticmethod
    def setup_eager_loading(queryset):
        # select_related for 'to-one' relationships
        queryset = queryset.select_related('user')
        # prefetch_related for 'to-many' relationships
        queryset = queryset.prefetch_related('award_set')
        # setup prefetching for nested serializers
        groups = Prefetch('reading_group', queryset=ReadingGroup.objects.prefetch_related('userbookstats_set'))
        queryset = queryset.prefetch_related(groups)
        return queryset
views.py
class ReadingGroupViewset(viewsets.ModelViewSet):
    def get_queryset(self):
        qs = ReadingGroup.objects.all()
        qs = self.get_serializer_class().setup_eager_loading(qs)
        return qs


class UserBookStatsViewset(viewsets.ModelViewSet):
    def get_queryset(self):
        qs = UserBookStats.objects.all()
        qs = self.get_serializer_class().setup_eager_loading(qs)
        return qs
I've optimized the prefetching for the ReadingGroup endpoint (I actually posted about eliminating duplicate queries for that endpoint here), and now I'm working on the UserBookStats endpoint.
The issue I'm having is that, with my current setup_eager_loading in the UserBookStatsSerializer, it doesn't appear to use the prefetching set up by the eager loading method in the ReadingGroupSerializer. I'm still a little hazy on the syntax for the Prefetch object - I was inspired by this excellent answer to try that approach.
Obviously the get_queryset method of UserBookStatsViewset doesn't call setup_eager_loading for the ReadingGroup objects, but I'm sure there's a way to accomplish the same prefetching.

prefetch_related() supports prefetching inner relations by using double underscore syntax:
queryset = queryset.prefetch_related('reading_group', 'reading_group__users', 'reading_group__owner')
I don't think Django REST provides any elegant solutions out of the box for fetching all necessary fields automatically.
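One way to keep those double-underscore lookups close to the serializer that needs them is to let each serializer list its own relation names and prefix them when it is nested. The sketch below is plain Python with no Django import; prefixed_lookups and the two lists are hypothetical names, not Django or DRF API, and it assumes every relation is loaded with prefetch_related (which also accepts foreign keys, at the cost of one extra query per relation).

```python
def prefixed_lookups(prefix, lookups):
    """Return the parent relation plus each nested lookup joined with '__'."""
    return [prefix] + [f"{prefix}__{lookup}" for lookup in lookups]


# Relations that ReadingGroupSerializer reads (taken from the question).
READING_GROUP_LOOKUPS = ["owner", "users"]

# Lookups for the UserBookStats queryset: its own relations plus the
# relations of the nested ReadingGroupSerializer.
user_book_stats_lookups = ["user", "award_set"] + prefixed_lookups(
    "reading_group", READING_GROUP_LOOKUPS
)

print(user_book_stats_lookups)
# The list could then be passed on as
# queryset.prefetch_related(*user_book_stats_lookups)
```

With this layout the nested serializer's relations are written once and reused wherever that serializer is embedded.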

As an alternative to prefetching all nested relationships manually, there is a package called django-auto-prefetching, which automatically traverses the related fields on your model and serializer to find all the models that need to be mentioned in prefetch_related and select_related calls. All you need to do is add the AutoPrefetchViewSetMixin to your ViewSets:
from django_auto_prefetching import AutoPrefetchViewSetMixin


class ReadingGroupViewset(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
    def get_queryset(self):
        qs = ReadingGroup.objects.all()
        return qs


class UserBookStatsViewset(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
    def get_queryset(self):
        qs = UserBookStats.objects.all()
        return qs
Any extra prefetches with more complex Prefetch objects can be added in the get_queryset method on the ViewSet.

Related

Django Rest FrameWork Reduce number of queries using group by

I am writing an API using Django REST Framework. The API fetches a list of clients. A client has many projects. My API should return the list of clients with the number of projects completed, pending, and in total. The API works, but it issues too many SQL queries. The API is paginated.
class ClientViewSet(ModelViewSet):
    """
    A simple view for creating clients, updating and retrieving
    """
    model = Client
    queryset = Client.objects.all()
    serializer_class = ClientSerializer
Now my client Serializer
class ClientSerializer(serializers.ModelSerializer):
    total_projects_count = serializers.SerializerMethodField()
    on_going_projects_count = serializers.SerializerMethodField()
    completed_projects_count = serializers.SerializerMethodField()

    class Meta:
        model = Client
        fields = '__all__'

    def get_total_projects_count(self, obj):
        return obj.total_projects_count()

    def get_on_going_projects_count(self, obj):
        return obj.on_going_project_count()

    def get_completed_projects_count(self, obj):
        return obj.completed_projects_count()
Project has a client foreign key. I tried to fetch all projects as below and group them using annotate, but annotate worked only on a single field.
projects = Project.objects.filter(client__in=queryset).values('client', 'status')
How can I group by multiple fields and pass that extra data to the serializer? Or is there a better approach? I also tried prefetch_related, but total_projects_count was still executing new SQL queries.
You need to annotate the calculated fields in the queryset and then, instead of calling the methods, use the annotated columns to return the relevant result. This will make sure that all data is retrieved using a single query, which will definitely be faster.
Update your queryset.
class ClientViewSet(ModelViewSet):
    """
    A simple view for creating clients, updating and retrieving
    """
    model = Client
    queryset = Client.objects.annotate(total_projects_count_val=...)
    serializer_class = ClientSerializer
Then, in your serializer, use the annotated column
class ClientSerializer(serializers.ModelSerializer):
    total_projects_count = serializers.SerializerMethodField()
    on_going_projects_count = serializers.SerializerMethodField()
    completed_projects_count = serializers.SerializerMethodField()

    class Meta:
        model = Client
        fields = '__all__'

    def get_total_projects_count(self, obj):
        return obj.total_projects_count_val

    ...
Looking at the method names, I think you will need Case-When annotation.
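To make the Case-When idea concrete, here is a minimal sketch of the SQL shape such an annotation aims for, run against an in-memory SQLite database with the standard library only. The table names, column names, and status values are made up for illustration and are not the asker's schema; a Django annotation would generate its own SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES client(id),
        status TEXT
    );
    INSERT INTO client (id, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO project (client_id, status) VALUES
        (1, 'pending'), (1, 'finished'), (1, 'finished'), (2, 'pending');
    """
)

# One row per client; each CASE expression counts only the matching projects.
rows = conn.execute(
    """
    SELECT c.name,
           COUNT(p.id) AS total,
           SUM(CASE WHEN p.status = 'pending' THEN 1 ELSE 0 END) AS pending,
           SUM(CASE WHEN p.status = 'finished' THEN 1 ELSE 0 END) AS finished
    FROM client AS c
    LEFT OUTER JOIN project AS p ON p.client_id = c.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()

for row in rows:
    print(row)
```

All three counts come back in a single grouped query, so the serializer only has to read attributes instead of running a query per client.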
I reduced the number of queries by using the queries below:
from django.db.models import Count, Q
pending = Count('project', filter=Q(project__status="pending"))
finished = Count('project', filter=Q(project__status="finished"))
queryset = Client.objects.annotate(pending=pending).annotate(finished=finished)
Now I was able to access queryset[0].finished and so on. As I was using the pagination provided by DRF, the generated query was:
SELECT "clients_client"."id",
"clients_client"."created_at",
"clients_client"."updated_at",
"clients_client"."client_name",
"clients_client"."phone_number",
"clients_client"."email",
"clients_client"."address_lane",
"clients_client"."state",
"clients_client"."country",
"clients_client"."zipCode",
"clients_client"."registration_number",
"clients_client"."gst",
COUNT("projects_project"."id") FILTER (WHERE "projects_project"."status" = 'pending') AS "pending",
COUNT("projects_project"."id") FILTER (WHERE "projects_project"."status" = 'finished') AS "finished"
FROM "clients_client"
LEFT OUTER JOIN "projects_project"
ON ("clients_client"."id" = "projects_project"."client_id")
GROUP BY "clients_client"."id"
ORDER BY "clients_client"."id" ASC
LIMIT 6

Using `search=term1,term2` to match multiple tags for the same object using DRF

I'm uplifting an old Django 1.11 codebase to recent versions of Django and Django Rest Framework, but I've run into a hard wall around how the ?search=... filter works when using multiple terms in recent versions of Django Rest Framework.
Up until DRF version 3.6.3 it was possible to do a ?search=term1,term2 endpoint request and have DRF return objects with many-to-many relations in which both search terms matched the same field name, e.g if the model had a many-to-many field called tags relating to some model Tag, then an object with tags cake and baker could be found by DRF by asking for ?search=cake,baker.
In the codebase I'm uplifting, the (reduced) code for this looks like:
class TagQuerySet(models.query.QuerySet):
    def public(self):
        return self


class Tag(models.Model):
    name = models.CharField(unique=True, max_length=150)
    objects = TagQuerySet.as_manager()

    def _get_entry_count(self):
        return self.entries.count()

    entry_count = property(_get_entry_count)

    def __str__(self):
        return str(self.name)

    class Meta:
        ordering = ['name',]


class Entry(models.Model):
    title = models.CharField(max_length=140)
    description = models.CharField(max_length=600, blank=True)
    tags = models.ManyToManyField(Tag, related_name='entries', blank=True)

    def __str__(self):
        return str(self.title)

    class Meta:
        verbose_name_plural = "entries"
        ordering = ['-id']


class EntryCustomFilter(filters.FilterSet):
    tag = django_filters.CharFilter(name='tags__name', lookup_expr='iexact', )

    class Meta:
        model = Entry
        fields = [ 'tags', ]


class EntriesListView(ListCreateAPIView):
    """
    - `?search=` - Searches title, description, and tags
    - `&format=json` - return results in JSON rather than HTML format
    """
    filter_backends = (filters.DjangoFilterBackend, filters.SearchFilter, )
    filter_class = EntryCustomFilter
    search_fields = ('title', 'description', 'tags__name', )
    parser_classes = ( JSONParser, )
However, this kind of behaviour for search got inadvertently changed in 3.6.4, so that DRF now instead only matches if a single relation found through a many-to-many field matches all terms. So, an Entry with a tags field that has relations to Tag(name="cake") and Tag(name="baker") no longer matches, as there is no single Tag that matches both terms, but an Entry with Tag(name="baker of cake") and Tag(name="teller of tales") does match, as there is a single relation that matches both terms.
There is (at least at the time of writing) no documentation that I can find that explains how to achieve this older behaviour for the generic search filter, nor can I find any previously asked questions here on Stackoverflow about making DRF work like this again (or even "at all"). There are some questions around specific field-named filters, but none for search=.
So: what changes can I make here so that ?search=... keeps working as before, while using a DRF version 3.6.4+? I.e. how does one make the ?search=term1,term2 filter find models in which many-to-many fields have separate relations that match one or more of the specified terms?
This is expected behavior in DRF, introduced in 3.6.4 in order to optimize the M2M search/filter. The change was made to prevent a combinatorial explosion when using more than one term (see "SearchFilter time grows exponentially by # of search terms" and its associated PR "Fix SearchFilter to-many behavior/performance" for more details).
In order to perform the same kind of matching as in 3.6.3 and below, you need to create a custom search filter class by extending filters.SearchFilter and adding a custom implementation of the filter_queryset method (the original definition can be found here for DRF v3.6.3).
from rest_framework import filters
import operator
from functools import reduce
from django.db import models
from rest_framework.compat import distinct


class CustomSearchFilter(filters.SearchFilter):
    def required_m2m_optimization(self, view):
        return getattr(view, 'use_m2m_optimization', True)

    def get_search_fields(self, view, request):
        # For DRF versions >=3.9.2 remove this method,
        # as it already has get_search_fields built in.
        return getattr(view, 'search_fields', None)

    def chained_queryset_filter(self, queryset, search_terms, orm_lookups):
        for search_term in search_terms:
            queries = [
                models.Q(**{orm_lookup: search_term})
                for orm_lookup in orm_lookups
            ]
            queryset = queryset.filter(reduce(operator.or_, queries))
        return queryset

    def optimized_queryset_filter(self, queryset, search_terms, orm_lookups):
        conditions = []
        for search_term in search_terms:
            queries = [
                models.Q(**{orm_lookup: search_term})
                for orm_lookup in orm_lookups
            ]
            conditions.append(reduce(operator.or_, queries))
        return queryset.filter(reduce(operator.and_, conditions))

    def filter_queryset(self, request, queryset, view):
        search_fields = self.get_search_fields(view, request)
        search_terms = self.get_search_terms(request)
        if not search_fields or not search_terms:
            return queryset
        orm_lookups = [
            self.construct_search(str(search_field))
            for search_field in search_fields
        ]
        base = queryset
        if self.required_m2m_optimization(view):
            queryset = self.optimized_queryset_filter(queryset, search_terms, orm_lookups)
        else:
            queryset = self.chained_queryset_filter(queryset, search_terms, orm_lookups)
        if self.must_call_distinct(queryset, search_fields):
            # Filtering against a many-to-many field requires us to
            # call queryset.distinct() in order to avoid duplicate items
            # in the resulting queryset.
            # We try to avoid this if possible, for performance reasons.
            queryset = distinct(queryset, base)
        return queryset
Then, replace filters.SearchFilter in your filter_backends with this custom class:
class EntriesListView(ListCreateAPIView):
    filter_backends = (
        filters.DjangoFilterBackend,
        CustomSearchFilter,
        ...
    )
    use_m2m_optimization = False  # this attribute controls the search results
    ...
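The difference between the two matching modes can be seen in a small plain-Python model. This is only an illustration of the semantics described in the question, not ORM code: the entry names and tag lists are invented, only the tags field is modelled, and a substring test stands in for the search lookup.

```python
entries = {
    "bakery": ["cake", "baker"],
    "storybook": ["baker of cake", "teller of tales"],
    "recipe": ["cake"],
}
terms = ["cake", "baker"]


def matches_across_tags(tags, terms):
    """Chained filters: each term may be matched by a different tag."""
    return all(any(term in tag for tag in tags) for term in terms)


def matches_single_tag(tags, terms):
    """Single filter: one tag must match every term."""
    return any(all(term in tag for term in terms) for tag in tags)


across = [name for name, tags in entries.items() if matches_across_tags(tags, terms)]
single = [name for name, tags in entries.items() if matches_single_tag(tags, terms)]

print(across)  # models use_m2m_optimization = False (pre-3.6.4 behaviour)
print(single)  # models use_m2m_optimization = True (3.6.4+ behaviour)
```

The "bakery" entry only shows up in the first list, which is the case the question asks to restore.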

get_queryset vs manager in Django

As far as I know, we fetch data from the database through model managers, right?
e.g. queryset = Model.objects.all()
But sometimes I see code that looks almost the same but is slightly different:
post = self.get_queryset()
which also fetches from the database, but not through a manager.
What is the difference between fetching from the database through a manager and through get_queryset(), and when is each used?
The below example helps you to understand what ModelManager and get_queryset are:
from django.db import models
from django.utils.translation import gettext_lazy as _


class PersonQuerySet(models.QuerySet):
    def authors(self):
        return self.filter(role='A')

    def editors(self):
        return self.filter(role='E')


class PersonManager(models.Manager):
    def get_queryset(self):
        return PersonQuerySet(self.model, using=self._db)

    def authors(self):
        return self.get_queryset().authors()

    def editors(self):
        return self.get_queryset().editors()


class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=(('A', _('Author')), ('E', _('Editor'))))
    people = PersonManager()
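Every model has at least one manager, and get_queryset is the manager method that builds the base QuerySet every other manager method starts from.
With the default manager, Model.objects.all() returns all rows of the model, with no filters applied. (In the example above the manager is declared as people, so Django does not add objects automatically.)
In the above example, we use a custom model manager named PersonManager, where we override get_queryset to return a PersonQuerySet.
The authors method applies the filter role='A'.
The editors method applies the filter role='E'.
So Person.people.all() still returns every person, because the overridden get_queryset applies no filter, while Person.people.authors() returns only authors and Person.people.editors() returns only editors. If get_queryset itself applied a filter, all() would return only the filtered rows, because all() is built on top of get_queryset.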
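The way a manager's methods sit on top of get_queryset can be modelled without Django. The classes below are toy stand-ins written for this illustration (ToyQuerySet, ToyManager, and AuthorManager are not Django classes); they only mimic the delegation pattern, assuming rows are plain dictionaries.

```python
class ToyQuerySet:
    """A list-backed stand-in for a QuerySet (not Django's class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        kept = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]
        return ToyQuerySet(kept)


class ToyManager:
    """Every query method starts from get_queryset()."""

    def __init__(self, rows):
        self._rows = rows

    def get_queryset(self):
        return ToyQuerySet(self._rows)

    def all(self):
        return self.get_queryset()

    def filter(self, **conditions):
        return self.get_queryset().filter(**conditions)


class AuthorManager(ToyManager):
    """Overriding get_queryset() narrows everything the manager returns."""

    def get_queryset(self):
        return super().get_queryset().filter(role="A")


people = [
    {"name": "Ann", "role": "A"},
    {"name": "Ed", "role": "E"},
]

everyone = [p["name"] for p in ToyManager(people).all().rows]
authors_only = [p["name"] for p in AuthorManager(people).all().rows]
print(everyone)
print(authors_only)
```

Calling all() on the second manager returns only authors because the filter lives in get_queryset, which is the same reason a view's self.get_queryset() is the usual place to narrow what a view works with.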

Django Beginner. How do I update all objects and set a certain field to a value that is a function of another field?

EDIT:
I needed a student_count field in course because I'm supposed to use this model for REST API. If you can tell me how to add fields in serializer without adding to model, I'd take that too.
This is my model:
class Course(models.Model):
student_count = models.PositiveIntegerField(default=0)
students = models.ManyToManyField()
What I try to do is the following but it doesn't work.
Course.objects.update(student_count=F('students__count'))
The following works, but it's not ideal:
courses = Course.objects.all()
for course in courses:
    course.student_count = course.students.count()
    course.save()
return Course.objects.all()
Try to add/update your serializer as below,
from django.db.models import Count
class CourseSerializer(serializers.ModelSerializer):
    count = serializers.SerializerMethodField(read_only=True)

    def get_count(self, model):
        return model.students.aggregate(count=Count('id'))['count']

    class Meta:
        model = Course
        fields = ('count',)
You can use the Count() aggregate instead of queryset.count(), which may be faster. See the SO post "aggregate(Count()) vs queryset.count()".

Filtering Django queryset with custom function

I'm developing a REST API for an existing system that uses custom permission handling. I'm attempting to use the built-in generics from the Django REST Framework, but I'm running into trouble filtering the list views using my custom permissions. An example of my current view is:
class WidgetList(generics.ListCreateAPIView):
    permission_classes = (permissions.IsAuthenticated,)
    model = Widget
    serializer_class = WidgetSerializer
    filter_backends = (filters.DjangoFilterBackend,)
    filter_fields = ('widget_type', 'widget_owner')

    def get_queryset(self):
        """
        Overwrite the query set to check permissions
        """
        qs_list = [w.id for w in self.model.objects.all() if
                   canReadWidget(self.request.user, w)]
        return self.model.objects.filter(id__in=qs_list)
This works, however I feel like the get_queryset function could be improved. Because my canReadWidget is custom, I have to evaluate self.model.objects.all() and check which widgets the user can read, but the function must return a query set so I use the id__in=qs_list part. The result being that I make two database calls for what is really just one list fetch.
Is there a standard way to handle this kind of per-object filtering for a generic list view?
At some point, it's better to drop the default generic views or function and roll your own.
You should have a look at the ListModelMixin and override the list to deal with the list instead of turning it into a queryset.
You should adapt the filtering and pagination but you won't hit the DB twice as you currently do.
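The idea of working on a list instead of rebuilding a queryset can be sketched outside DRF. The helper below is a hypothetical illustration: readable_page, the sample widgets, and the ownership rule are invented, and can_read_widget merely stands in for the question's canReadWidget. It filters with the permission check and slices out one page in a single pass over the items.

```python
from itertools import islice


def readable_page(items, can_read, page, page_size):
    """Return one page of the items that pass can_read, scanning items once."""
    allowed = (item for item in items if can_read(item))
    start = (page - 1) * page_size
    return list(islice(allowed, start, start + page_size))


widgets = [{"id": i, "owner": "alice" if i % 2 else "bob"} for i in range(1, 11)]


def can_read_widget(widget):
    # Hypothetical rule standing in for the question's canReadWidget.
    return widget["owner"] == "alice"


page_ids = [w["id"] for w in readable_page(widgets, can_read_widget, page=2, page_size=2)]
print(page_ids)
```

In a DRF view the same logic would live in an overridden list method, with items being the evaluated queryset; that wiring, and any filter backends, are not shown here.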
First, install the django-filter package and register it in settings.py.
Then write this code in a filter.py file:
import django_filters
from .models import CustomUser


class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(label="First Name", lookup_expr='icontains')
    last_name = django_filters.CharFilter(label="Last Name", lookup_expr='icontains')
    email = django_filters.CharFilter(label="Email", lookup_expr='icontains')
    mobile_number = django_filters.CharFilter(label="Mobile No.", lookup_expr='icontains')
    # Change these fields to the ones you want to filter on

    class Meta:
        model = Widget
        fields = {'is_verify'}
In your views file, write this code:
class WidgetViewSet(MyModelViewSet):
    queryset = Widget.objects
    serializer_class = "pass your serializer"

    def get_filter_data(self):
        _data = self.queryset.all()
        data = UserFilter(self.request.GET, queryset=_data)
        return data.qs.order_by('-id')

    def get_queryset(self):
        return self.get_filter_data()