Django REST Framework: reduce the number of queries using group by - django

I am writing an API using Django REST Framework. The API fetches a list of clients. A client has many projects. My API should return the list of clients with the number of projects completed, pending, and in total. My API works, but it issues too many SQL queries. The API is paginated.
class ClientViewSet(ModelViewSet):
    """
    A simple view for creating clients, updating and retrieving
    """
    model = Client
    queryset = Client.objects.all()
    serializer_class = ClientSerializer
Here is my client serializer:
class ClientSerializer(serializers.ModelSerializer):
    total_projects_count = serializers.SerializerMethodField()
    on_going_projects_count = serializers.SerializerMethodField()
    completed_projects_count = serializers.SerializerMethodField()
    class Meta:
        model = Client
        fields = '__all__'
    # Each of these calls a model method that runs its own query per client
    def get_total_projects_count(self, obj):
        return obj.total_projects_count()
    def get_on_going_projects_count(self, obj):
        return obj.on_going_project_count()
    def get_completed_projects_count(self, obj):
        return obj.completed_projects_count()
Project has a foreign key to Client. I tried to fetch all projects as shown below and group them using annotate, but annotate worked only on a single field.
projects = Project.objects.filter(client__in=queryset).values('client', 'status')
How can I group by multiple fields and pass that extra data to the serializer? Or is there a better approach? I also tried prefetch_related, but total_projects_count was still executing new SQL queries.

You need to annotate the calculated fields on the queryset and then, instead of calling the model methods, read the annotated columns in the serializer. All the data is then retrieved in a single query, which avoids the extra queries per client.
Update your queryset:
class ClientViewSet(ModelViewSet):
    """
    A simple view for creating clients, updating and retrieving
    """
    model = Client
    queryset = Client.objects.annotate(total_projects_count_val=...)
    serializer_class = ClientSerializer
Then, in your serializer, use the annotated column:
class ClientSerializer(serializers.ModelSerializer):
    total_projects_count = serializers.SerializerMethodField()
    on_going_projects_count = serializers.SerializerMethodField()
    completed_projects_count = serializers.SerializerMethodField()
    class Meta:
        model = Client
        fields = '__all__'
    def get_total_projects_count(self, obj):
        # Read the value computed by annotate() instead of running a query
        return obj.total_projects_count_val
    ...
Looking at the method names, I think you will need Case-When annotation.
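As a rough illustration of what a Case-When annotation boils down to at the SQL level, the sketch below runs a conditional aggregate with the standard-library sqlite3 module. The table names, column names, and status values are assumptions made up for the demo and are not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema and rows for the demo; real tables come from your models.
SCHEMA = """
CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project (
    id INTEGER PRIMARY KEY,
    client_id INTEGER REFERENCES client(id),
    status TEXT
);
INSERT INTO client (id, name) VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
INSERT INTO project (client_id, status) VALUES
    (1, 'pending'), (1, 'finished'), (1, 'finished'), (2, 'pending');
"""


def project_counts(conn):
    """Return {client_name: (total, pending, finished)} from one grouped query."""
    rows = conn.execute(
        """
        SELECT c.name,
               COUNT(p.id) AS total,
               SUM(CASE WHEN p.status = 'pending' THEN 1 ELSE 0 END) AS pending,
               SUM(CASE WHEN p.status = 'finished' THEN 1 ELSE 0 END) AS finished
        FROM client c
        LEFT JOIN project p ON p.client_id = c.id
        GROUP BY c.id
        ORDER BY c.id
        """
    ).fetchall()
    return {name: (total, pending, finished) for name, total, pending, finished in rows}


def demo_counts():
    """Build the in-memory demo database and return the per-client counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    try:
        return project_counts(conn)
    finally:
        conn.close()
```

The LEFT JOIN keeps clients that have no projects, and every count comes back in the same row as the client, so a serializer would not need a follow-up query per client.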

I reduced the number of queries by using the annotations below:
from django.db.models import Count, Q
pending = Count('project', filter=Q(project__status="pending"))
finished = Count('project', filter=Q(project__status="finished"))
queryset = Client.objects.annotate(pending=pending).annotate(finished=finished)
Now I was able to access queryset[0].finished and so on. As I was using the pagination provided by DRF, the generated query was:
SELECT "clients_client"."id",
"clients_client"."created_at",
"clients_client"."updated_at",
"clients_client"."client_name",
"clients_client"."phone_number",
"clients_client"."email",
"clients_client"."address_lane",
"clients_client"."state",
"clients_client"."country",
"clients_client"."zipCode",
"clients_client"."registration_number",
"clients_client"."gst",
COUNT("projects_project"."id") FILTER (WHERE "projects_project"."status" = 'pending') AS "pending",
COUNT("projects_project"."id") FILTER (WHERE "projects_project"."status" = 'finished') AS "finished"
FROM "clients_client"
LEFT OUTER JOIN "projects_project"
ON ("clients_client"."id" = "projects_project"."client_id")
GROUP BY "clients_client"."id"
ORDER BY "clients_client"."id" ASC
LIMIT 6

Related

How to filter relationships based on their fields in response using Django REST Framework

I have three models: House, Resident, Car.
Each House has many Residents (One to Many). Each Resident has 0 or 1 cars (One to One).
For my frontend, I want to display all the residents of a house that have a car.
Django Rest Framework suggests using Filtering, but this only works at the top level. For example, in my HouseDetailView(generics.RetrieveAPIView), I can only modify the queryset of the House model itself. I want to be able to modify the queryset of the Resident (resident_queryset.exclude(car=None)).
class HouseDetailView(generics.RetrieveAPIView):
    queryset = House.objects.all()
    serializer_class = HouseSerializer
Can/Should I do this all in one request?
Are query parameters my only way of filtering?
If you want to display all the residents of a house that have a car, you can query the Car model instead:
class CarDetailView(generics.RetrieveAPIView):
    queryset = Car.objects.all()
    serializer_class = CarSerializer
serializers.py
class CarSerializer(serializers.ModelSerializer):
    # Expose the resident's name; DRF looks up get_resident_name by default
    resident_name = serializers.SerializerMethodField()
    class Meta:
        model = Car
        fields = ("name", "resident_name")
    def get_resident_name(self, obj):
        return obj.resident.name
You can use Prefetch to filter related objects:
from django.db.models import Prefetch
class HouseDetailView(generics.RetrieveAPIView):
    serializer_class = HouseSerializer
    def get_queryset(self):
        # Only prefetch residents that have a car
        return House.objects.prefetch_related(
            Prefetch('resident_set', queryset=Resident.objects.exclude(car__isnull=True))
        )
Note that resident_set is the default reverse name for the Resident model and may be different for you, depending on the related_name argument.
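Conceptually, a filtered prefetch runs one query for the parent rows and one for the related rows, then joins them in Python. The sketch below imitates that with the standard-library sqlite3 module; the tables, columns, and sample rows are assumptions for the demo only.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and rows; a car row marks its resident as a car owner.
SCHEMA = """
CREATE TABLE house (id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE resident (id INTEGER PRIMARY KEY, house_id INTEGER, name TEXT);
CREATE TABLE car (id INTEGER PRIMARY KEY, resident_id INTEGER UNIQUE, name TEXT);
INSERT INTO house (id, address) VALUES (1, '1 Elm St'), (2, '2 Oak Ave');
INSERT INTO resident (id, house_id, name) VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Cy');
INSERT INTO car (resident_id, name) VALUES (1, 'Hatchback');
"""


def houses_with_car_owners(conn):
    """Two queries in total: houses first, then only the residents owning a car."""
    houses = conn.execute("SELECT id, address FROM house ORDER BY id").fetchall()
    if not houses:
        return []
    house_ids = [house_id for house_id, _ in houses]
    placeholders = ",".join("?" for _ in house_ids)
    residents = conn.execute(
        f"""
        SELECT r.house_id, r.name
        FROM resident r
        JOIN car c ON c.resident_id = r.id
        WHERE r.house_id IN ({placeholders})
        ORDER BY r.id
        """,
        house_ids,
    ).fetchall()
    # Join the two result sets in Python, keyed by house id.
    by_house = defaultdict(list)
    for house_id, name in residents:
        by_house[house_id].append(name)
    return [
        {"address": address, "residents": by_house.get(house_id, [])}
        for house_id, address in houses
    ]


def demo_houses():
    """Build the in-memory demo database and return the nested result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    try:
        return houses_with_car_owners(conn)
    finally:
        conn.close()
```

The number of queries stays at two regardless of how many houses are listed, which is why the filter belongs in the prefetch queryset rather than in a per-object serializer method.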

Displaying group permissions causing duplicate queries in Django

TL;DR: I wanted to serialize a Group along with its permission names, but a lot of duplicate content_type queries from the Permission model occurred. I tried to solve it with prefetching, but that didn't work. What am I doing wrong?
My serializer for the retrieve method is given below:
class RetrieveGroupSerializer(serializers.ModelSerializer):
    user_set = UserSerializer(many=True, read_only=True)
    permissions = PermissionsSerializer(many=True, read_only=True)
    class Meta:
        model = Group
        fields = ('name', 'user_set', 'permissions')
The serializer for list method is given below
class GroupSerializer(serializers.ModelSerializer):
    user_set = UserSerializer(many=True)
    permissions = PermissionsSerializer(many=True)
    class Meta:
        model = Group
        fields = ('url', 'user_set', 'permissions')
The view is given below:
class GroupViewSet(
        mixins.CreateModelMixin,
        mixins.RetrieveModelMixin,
        mixins.UpdateModelMixin,
        mixins.ListModelMixin,
        viewsets.GenericViewSet):
    """
    Creates, updates, and retrieves user groups
    """
    queryset = Group.objects.all().prefetch_related('user_set').prefetch_related('permissions__content_type')
    serializer_class = GroupSerializer
    permission_classes = (
        IsAuthenticated,
    )
    # Per-action serializers; anything not listed falls back to serializer_class
    action_serializer_classes = {
        "create": CreateGroupSerializer,
        "retrieve": RetrieveGroupSerializer,
        "update": UpdateGroupSerializer
    }
    def get_serializer_class(self):
        try:
            return self.action_serializer_classes[self.action]
        except (KeyError, AttributeError):
            return super(GroupViewSet, self).get_serializer_class()
When I use the list method I do not get any duplicate queries, but when I use the retrieve method on a single group instance I get a lot of duplicate queries.
The content_type from the Permission model is queried 62 times.
So I used prefetch_related on the foreign key in the Permission model, but the result is the same.
The same queryset works well for the list method and doesn't cause duplicate queries.
Apart from the duplicate queries, I am also confused about how the same queryset can produce such different results.
That's likely because the browsable API also displays a create / update form that has a dropdown with the content types and won't use the prefetch optimizations.
Try requesting it as JSON and see how many queries it performs, or remove the permissions field on updates to see whether the number of queries changes.

Django REST Framework: Setting up prefetching for nested serializers

My Django-powered app with a DRF API is working fine, but I've started to run into performance issues as the database gets populated with actual data. I've done some profiling with Django Debug Toolbar and found that many of my endpoints issue tens to hundreds of queries in the course of returning their data.
I expected this, since I hadn't previously optimized anything with regard to database queries. Now that I'm setting up prefetching, however, I'm having trouble making use of properly prefetched serializer data when that serializer is nested in a different serializer. I've been using this awesome post as a guide for how to think about the different ways to prefetch.
Currently, my ReadingGroup serializer does prefetch properly when I hit the /api/readinggroups/ endpoint. My issue is the /api/userbookstats/ endpoint, which returns all UserBookStats objects. The related serializer, UserBookStatsSerializer, has a nested ReadingGroupSerializer.
The models, serializers, and viewsets are as follows:
models.py
class ReadingGroup(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL)
    users = models.ManyToManyField(settings.AUTH_USER_MODEL)
    book_type = models.ForeignKey(BookType)
    # ....
    # <other group related fields>
    def __str__(self):
        return '%s group: %s' % (self.name, self.book_type)
class UserBookStats(models.Model):
    reading_group = models.ForeignKey(ReadingGroup)
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    alias = models.CharField()
    total_books_read = models.IntegerField(default=0)
    num_books_owned = models.IntegerField(default=0)
    fastest_read_time = models.IntegerField(default=0)
    average_read_time = models.IntegerField(default=0)
serializers.py
class ReadingGroupSerializer(serializers.ModelSerializer):
    users = UserSerializer(many=True, read_only=True)
    owner = UserSerializer(read_only=True)
    class Meta:
        model = ReadingGroup
        fields = ('url', 'id', 'owner', 'users')
    @staticmethod
    def setup_eager_loading(queryset):
        # select_related for 'to-one' relationships
        queryset = queryset.select_related('owner')
        # prefetch_related for 'to-many' relationships
        queryset = queryset.prefetch_related('users')
        return queryset
class UserBookStatsSerializer(serializers.HyperlinkedModelSerializer):
    reading_group = ReadingGroupSerializer()
    user = UserSerializer()
    awards = AwardSerializer(source='award_set', many=True)
    class Meta:
        model = UserBookStats
        fields = ('url', 'id', 'alias', 'total_books_read', 'num_books_owned',
                  'average_read_time', 'fastest_read_time', 'awards')
    @staticmethod
    def setup_eager_loading(queryset):
        # select_related for 'to-one' relationships
        queryset = queryset.select_related('user')
        # prefetch_related for 'to-many' relationships
        queryset = queryset.prefetch_related('award_set')
        # setup prefetching for nested serializers
        groups = Prefetch('reading_group', queryset=ReadingGroup.objects.prefetch_related('userbookstats_set'))
        queryset = queryset.prefetch_related(groups)
        return queryset
views.py
class ReadingGroupViewset(viewsets.ModelViewSet):
    def get_queryset(self):
        qs = ReadingGroup.objects.all()
        qs = self.get_serializer_class().setup_eager_loading(qs)
        return qs
class UserBookStatsViewset(viewsets.ModelViewSet):
    def get_queryset(self):
        qs = UserBookStats.objects.all()
        qs = self.get_serializer_class().setup_eager_loading(qs)
        return qs
I've optimized the prefetching for the ReadingGroup endpoint (I actually posted about eliminating duplicate queries for that endpoint here), and now I'm working on the UserBookStats endpoint.
The issue I'm having is that, with my current setup_eager_loading in the UserBookStatsSerializer, it doesn't appear to use the prefetching set up by the eager loading method in the ReadingGroupSerializer. I'm still a little hazy on the syntax for the Prefetch object - I was inspired by this excellent answer to try that approach.
Obviously the get_queryset method of UserBookStatsViewset doesn't call setup_eager_loading for the ReadingGroup objects, but I'm sure there's a way to accomplish the same prefetching.
prefetch_related() supports prefetching inner relations by using double underscore syntax:
queryset = queryset.prefetch_related('reading_group', 'reading_group__users', 'reading_group__owner')
I don't think Django REST provides any elegant solutions out of the box for fetching all necessary fields automatically.
As an alternative to prefetching all nested relationships manually, there is a package called django-auto-prefetching that traverses the related fields on your model and serializer to find all the models that need to be mentioned in prefetch_related and select_related calls. All you need to do is add AutoPrefetchViewSetMixin to your ViewSets:
from django_auto_prefetching import AutoPrefetchViewSetMixin
class ReadingGroupViewset(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
    def get_queryset(self):
        qs = ReadingGroup.objects.all()
        return qs
class UserBookStatsViewset(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
    def get_queryset(self):
        qs = UserBookStats.objects.all()
        return qs
Any extra prefetches with more complex Prefetch objects can be added in the get_queryset method on the ViewSet.

Filtering Django queryset with custom function

I'm developing a REST API for an existing system that uses custom permission handling. I'm attempting to use the built-in generics from the Django REST Framework, but I'm running into trouble filtering the list views using my custom permissions. An example of my current view is:
class WidgetList(generics.ListCreateAPIView):
    permission_classes = (permissions.IsAuthenticated,)
    model = Widget
    serializer_class = WidgetSerializer
    filter_backends = (filters.DjangoFilterBackend,)
    filter_fields = ('widget_type', 'widget_owner')
    def get_queryset(self):
        """
        Overwrite the query set to check permissions
        """
        qs_list = [w.id for w in self.model.objects.all() if
                   canReadWidget(self.request.user, w)]
        return self.model.objects.filter(id__in=qs_list)
This works, however I feel like the get_queryset function could be improved. Because my canReadWidget is custom, I have to evaluate self.model.objects.all() and check which widgets the user can read, but the function must return a query set so I use the id__in=qs_list part. The result being that I make two database calls for what is really just one list fetch.
Is there a standard way to handle this kind of per-object filtering for a generic list view?
At some point, it's better to drop the default generic views and roll your own.
Have a look at ListModelMixin and override its list method so that it works with a plain list instead of converting the result back into a queryset.
You will need to adapt the filtering and pagination, but you won't hit the database twice as you currently do.
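A framework-free sketch of that idea: evaluate the rows once, filter them in Python with the custom permission check, and paginate the resulting list by hand. The names readable_page and can_read are hypothetical stand-ins (can_read plays the role of canReadWidget), and the returned dictionary only loosely mimics a paginated response.

```python
def readable_page(items, can_read, page, page_size):
    """Filter already-loaded items with a permission check, then slice one page.

    `items` is any iterable that has been fetched once; `can_read` is a
    callable returning True for items the current user may see.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    visible = [item for item in items if can_read(item)]
    start = (page - 1) * page_size
    return {
        "count": len(visible),
        "results": visible[start:start + page_size],
    }


def demo_readable_page():
    """Demo: the user may read only widgets they own."""
    widgets = [
        {"id": 1, "owner": "ann"},
        {"id": 2, "owner": "bob"},
        {"id": 3, "owner": "ann"},
        {"id": 4, "owner": "ann"},
    ]
    return readable_page(widgets, lambda w: w["owner"] == "ann", page=2, page_size=2)
```

The trade-off is that the whole table is still loaded into memory before filtering, so this only pays off when the permission check cannot be expressed as a database filter.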
First install the django-filter package and add 'django_filters' to INSTALLED_APPS in settings.py.
Put this code in a filter.py file:
import django_filters
from .models import Widget
class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(label="First Name", lookup_expr='icontains')
    last_name = django_filters.CharFilter(label="Last Name", lookup_expr='icontains')
    email = django_filters.CharFilter(label="Email", lookup_expr='icontains')
    mobile_number = django_filters.CharFilter(label="Mobile No.", lookup_expr='icontains')
    # Change these fields to whatever you want to filter on in your own model
    class Meta:
        model = Widget
        fields = ['is_verify']
In your views file, write this code:
class WidgetViewSet(MyModelViewSet):
    queryset = Widget.objects
    serializer_class = WidgetSerializer  # pass your serializer here
    def get_filter_data(self):
        _data = self.queryset.all()
        data = UserFilter(self.request.GET, queryset=_data)
        return data.qs.order_by('-id')
    def get_queryset(self):
        return self.get_filter_data()

Django REST Framework Serializers: Display the latest object of a reverse relationship

The default behavior of the ListAPIView (code below) is to serialize all Report objects and the nested Log objects per Report object. What if I only want the latest Log object to be displayed per Report? How do I go about doing that?
# models.py
class Log(models.Model):
    # ...
    report = models.ForeignKey(Report)
    timestamp = models.DateTimeField(default=datetime.datetime.now)
class Report(models.Model):
    code = models.CharField(max_length=32, unique=True)
    description = models.TextField()
# serializers.py
class LogSerializer(serializers.ModelSerializer):
    class Meta:
        model = Log
class ReportSerializer(serializers.ModelSerializer):
    log_set = LogSerializer(many=True, read_only=True)
    class Meta:
        model = Report
        fields = ('code', 'description', 'log_set')
# views.py
class ReportListView(generics.ListAPIView):
    queryset = Report.objects.all()
    serializer_class = ReportSerializer
I know I can do this by using a SerializerMethodField, but this can be a potentially expensive operation, since there will be an extra SQL query to retrieve the appropriate Log object for each Report object.
class ReportSerializer(serializers.ModelSerializer):
    latest_log = serializers.SerializerMethodField()
    class Meta:
        model = Report
    def get_latest_log(self, obj):
        try:
            latest_log = Log.objects.filter(report_id=obj.id).latest('timestamp')
        except Log.DoesNotExist:
            latest_log = None
        return latest_log
If I have 1000 report objects, there will be 1000 extra queries if I want to render them all. How do I avoid those extra queries other than by using pagination? Can anyone point me in the right direction? Thanks!
EDIT: Regarding the possible duplicate tag, the link alone provided by Mark did not completely clear up the picture for me. Todor's answer was more clear.
You need to somehow annotate latest_log on the Report queryset, so the serializer can use it without making any extra queries.
The simplest way to achieve this is to prefetch all the logs per report. The drawback is that you load into memory all the logs of every report on the page. That is acceptable if a report has around 5 to 15 logs: for a page of 50 reports with about 10 logs each, you load 50*10=500 logs, which is not a big deal. If there are many more logs per report (say 100), you need additional filtering on the prefetch queryset.
Here is some example code.
Prefetch the logs:
# views.py
from django.db.models import Prefetch
class ReportListView(generics.ListAPIView):
    # Newest log first, stored on each report as report.latest_logs
    queryset = Report.objects.all().prefetch_related(
        Prefetch(
            'log_set',
            queryset=Log.objects.all().order_by('-timestamp'),
            to_attr='latest_logs',
        )
    )
    serializer_class = ReportSerializer
Create a helper property for easy access to the latest log:
class Report(models.Model):
    # ...
    @property
    def latest_log(self):
        if hasattr(self, 'latest_logs') and len(self.latest_logs) > 0:
            return self.latest_logs[0]
        # you can eventually implement some fallback logic here
        # to get the latest log with a query if there is no cached latest_logs
        return None
Finally, the serializer just uses the property:
class ReportSerializer(serializers.ModelSerializer):
    latest_log = LogSerializer(read_only=True)
    class Meta:
        model = Report
An example of more advanced filtering of the logs could look like this:
Report.objects.all().prefetch_related(Prefetch('log_set', queryset=Log.objects.all().extra(where=[
    "`myapp_log`.`timestamp` = (\
        SELECT max(timestamp) \
        FROM `myapp_log` l2 \
        WHERE l2.report_id = `myapp_log`.`report_id`\
    )"]
), to_attr='latest_logs'
))
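The correlated subquery can be tried out in isolation. The sketch below uses the standard-library sqlite3 module with assumed report and log tables; the message column and the sample rows are invented for the demo and are not part of the original models.

```python
import sqlite3

# Hypothetical schema and rows; ISO date strings sort correctly as text.
SCHEMA = """
CREATE TABLE report (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE log (
    id INTEGER PRIMARY KEY,
    report_id INTEGER REFERENCES report(id),
    timestamp TEXT,
    message TEXT
);
INSERT INTO report (id, code) VALUES (1, 'R1'), (2, 'R2'), (3, 'R3');
INSERT INTO log (report_id, timestamp, message) VALUES
    (1, '2024-01-01', 'old'),
    (1, '2024-02-01', 'new'),
    (2, '2024-01-15', 'only');
"""


def latest_log_per_report(conn):
    """Return {report_code: latest_message_or_None} using a single query."""
    rows = conn.execute(
        """
        SELECT r.code, l.message
        FROM report r
        LEFT JOIN log l
          ON l.report_id = r.id
         AND l.timestamp = (
               SELECT MAX(l2.timestamp)
               FROM log l2
               WHERE l2.report_id = l.report_id
             )
        ORDER BY r.id
        """
    ).fetchall()
    return dict(rows)


def demo_latest_logs():
    """Build the in-memory demo database and return the latest log per report."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    try:
        return latest_log_per_report(conn)
    finally:
        conn.close()
```

If two logs of the same report share the maximum timestamp, the join returns both rows, so a real implementation would need a tie-breaker such as the highest id.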
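You can use select_related, which fetches the related object in the same query using a JOIN. Note that it only follows forward ForeignKey and OneToOne relations, so it does not apply to the reverse log_set relation as modelled above.
class ReportListView(generics.ListAPIView):
    queryset = Report.objects.select_related('log')
    serializer_class = ReportSerializer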