QuerySet Optimisations in Django

QuerySet Optimisations in Django - django

I was just wondering, I have the following two pseudo-related queries:
organisation = Organisation.objects.get(pk=org_id)
employees = Employee.objects.filter(organisation=organisation).filter(is_active=True)
Each Employee has a ForeignKey relationship with Organisation.
I was wondering if there is anything I can leverage to do the above in one Query in the native Django ORM?
Also, would:
employees = Employee.objects.filter(organisation__id=organisation.id).filter(is_active=True)
Be a quicker way to fetch employees?
For Willem's reference, employees is then used as:
# Before constructing **parameters, it is neccessary to filter out any supurfluous key, value pair that do not correspond to model attributes:
if len(request.GET.getlist('gender[]')) > 0:
parameters['gender__in'] = request.GET.getlist('gender[]')
employees = employees.filter(**parameters)
if len(request.GET.getlist('age_group[]')) > 0:
parameters['age_group__in'] = request.GET.getlist('age_group[]')
employees = employees.filter(**parameters)
results = SurveyResult.objects.filter(
user__in=employees,
created_date__range=date_range,
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)
I omitted this as it seemed like unnecessary complications to my original goal.

Also, would:
employees = Employee.objects.filter(organisation__id=organisation.id).filter(is_active=True)
Be a quicker way to fetch employees?
No, or perhaps marginally, since that is in essence what the Django ORM will do itself: it will simply obtain the primary key of the organisation and then make a query like the one you describe.
If you do not need the organisation itself, you can query with:
employees = Employee.objects.filter(organisation_id=org_pk, is_active=True)
Furthermore you can for example perform a .select_related(..) [Django-doc] on the organisation, to load the data of the organisation in the same query as the one of the employee, although reducing one extra query, usually does not make that much of a difference. Performance is more an issue if iut results in N+1 queries.
We can for example "piggyback" fetching the Organisation details with fetching the employees, like:
employees = list(
Employee.objects.select_related('organization').filter(
organisation_id=org_pk, is_active=True
)
)
if employees: # at least one employee
organization = employees[0].organization
But anyway, as said before the difference between one or two queries is not that much. It is usually more of a problem if you have N+1 queries. It is a bit of a pitty that Django/Python does not seem to have a Haxl [GitHub] equivalent, to enable fast retrieval of (remote) resources through algebraic analysis.
In case you are interested in the Employee servey results, you can query with:
results = SurveyResult.objects.filter(
user__organization_id=org_pk,
created_date__range=date_range,
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)
You can thus omit a separate querying of Employees if you do not need these anyway.
You can furthermore add the filters to your query, like:
emp_filter = {}
genders = request.GET.getlist('gender[]')
if genders:
emp_filter['user__gender__in'] = genders
age_groups = request.GET.getlist('age_group[]')
if age_groups:
emp_filter['user__age_group__in'] = age_groups
results = SurveyResult.objects.filter(
user__organization_id=org_pk,
created_date__range=date_range,
**emp_filter
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)

if you have a foreign key relation between organisation and employees then you can get the employees using the select_related like this:
employees = Employee.objects.selected_related('organisation').filter(is_active=True)
OR
organisation = Organisation.objects.get(pk=org_id)
employees =organisation.employee_set.all() #your_employee_model_name_set.all

Related

Django query set object query taking too much time?

I am using a job model fetching the related jobs data which is also relationship with others model too for one-to-one or foreign key relations i have used select_related() and pass relative model inside and another one manny-to-manny relations i have used prefetch_related() and pass relative model inside. I have used this queryset on page 2 times based on different conditions filter everythings working fine but 1job queryset taking time and 2nd working well I can't understand that what's thing i missed up and how to resolved it. Please help if any one understand.
Model queryset
jobs = Job.objects.filter(
(Q(job_status__job_status_description='Booked') |
Q(job_status__job_status_description='Allocated') |
Q(job_status__job_is_done=True)) &
Q(completed_date__gte=start.strftime("%Y-%m-%d"), completed_date__lte=week_end_date.strftime("%Y-%m-%d")) |
Q(allocated_date__gte=start.strftime("%Y-%m-%d"), allocated_date__lte=week_end_date.strftime("%Y-%m-%d")),
functools.reduce(operator.and_, jobs_q_condition)
).select_related(
'customer',
'job_status',
'customer__book_with',
'customer__frequency'
).prefetch_related(
'customer__booking_road__area__franchise',
'customer__booking_road__area'
).annotate(
job_id=F('id'),
job_window_cleaner=Concat(
'window_cleaner__user__first_name',
Value(' '),
'window_cleaner__user__last_name'
),
job_window_cleaner_booking_road=Concat(
'customer__booking_road__area__default_cleaner__user__first_name',
Value(' '),
'customer__booking_road__area__default_cleaner__user__last_name'),
job_window_cleaner_id1=F('window_cleaner__user__id'),
job_window_cleaner_id2=F('customer__booking_road__area__default_cleaner__user__id'),
address=F('customer__address_line_1'),
wc_wants=F('customer__booking_road__area__default_cleaner__requested_turnover'),
action_on_check_in_str=F('action_on_check_in'),
booking_info_str=F('customer__booking_info'),
book_with_str=F('customer__book_with__book_with'),
is_job_completed = F('job_status__job_is_done'),
job_status_str = F('job_status__job_status_description'),
area_str=F('customer__booking_road__area__area'),
booking_road_str=F('customer__booking_road__booking_road'),
price_str=Coalesce('price_on_day', 'set_price'),
frequency_text_str=F('customer__frequency__frequency_text'),
due_date_str=F('due_date'),
allocated_date_str=F('allocated_date'),
completed_date_str=F('completed_date')
).order_by(
'job_window_cleaner','area_str','booking_road_str','due_date'
)[:20]
due_jobs = Job.objects.filter(
functools.reduce(operator.and_, due_jobs_q_condition)
).select_related(
'customer',
'job_status',
'customer__book_with',
'customer__frequency'
).prefetch_related(
'customer__booking_road__area__franchise',
'customer__booking_road__area'
).annotate(
job_id=F('id'),
job_window_cleaner=Concat(
'customer__booking_road__area__default_cleaner__user__first_name',
Value(' '),
'customer__booking_road__area__default_cleaner__user__last_name'),
job_window_cleaner_id=F('customer__booking_road__area__default_cleaner__user__id'),
address=F('customer__address_line_1'),
wc_wants=F('customer__booking_road__area__default_cleaner__requested_turnover'),
action_on_check_in_str=F('action_on_check_in'),
booking_info_str=F('customer__booking_info'),
book_with_str=F('customer__book_with__book_with'),
is_job_completed = F('job_status__job_is_done'),
job_status_str = F('job_status__job_status_description'),
area_str=F('customer__booking_road__area__area'),
booking_road_str=F('customer__booking_road__booking_road'),
price_str=Coalesce('price_on_day', 'set_price'),
frequency_text_str=F('customer__frequency__frequency_text'),
due_date_str=F('due_date'),
).order_by(
'job_window_cleaner','area_str','booking_road_str','due_date'
)[:20]

How to limit records in tables with n to n relation before joining them , to avoid duplicates in django ORM?

Extending my previous question on stack-overflow. I have four tables:
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, where the relation between B and Category is n to 1)
Relation stores 'Intensity' of A in B. I need to calculate the intensity of A in each Category and find the Maximum result. It is achievable using:
A.objects.values(
'id', 'Relation_set__B__Category_id'
).annotate(
AcIntensity=Sum(F('Relation_set__Intensity'))
).aggregate(
Max(F('AcIntensity'))
)['AcIntensity__max']
Now I need to filter the intensities based on some fields in B beforhand:
A.objects.values(
'id', 'Relation_set__B__Category_id'
).filter(
Relation_set__B__BType=type_filter
).annotate(
AcIntensity=Sum(F('Relation_set__Intensity'))
).aggregate(
Max(F('AcIntensity'))
)['AcIntensity__max']
However I need to avoid duplication resulted due to table join which messes the calculation up.(beside those field to define filtration, I do not need any fields in B)
Is there a way to achieve this using Django ORM?
Update
I think what I need is to limit the records in Relation table (based on B filters) before querying the database. How can I do that?
(Maybe using Prefetch_related and Prefetch?)

Finally I've done it using conditional aggregation.
You could find more details in this stack-overflow post.
So the final code would be:
result = A.objects.values(
'id', 'Relation_set__B__Category_id'
).annotate(
AcIntensity=Sum(
Case(
When(
q_statement_for_filter,then=F('Relation_set__Intensity'),
),
default=0,
output_field=FloatField()
)
)
).exclude(
AcIntensity=0
).aggregate(
Max(F('AcIntensity'))
)['AcIntensity__max']
Notice that 'q_statement_for_filter' cannot be an empty Q().

Django filter by annotated field is too slow

I use DRF and I have model Motocycle, which has > 2000 objects in DB. Model has one brand. I want to search by full_name:
queryset = Motocycle.objects.prefetch_related(
"brand"
).annotate(
full_name=Concat(
'brand__title',
Value(' - '),
'title',
)
)
)
I want to filter by full_name, but query is running very slowly:
(1.156) SELECT "mp_api_motocycle"."id"...
Without filtering with pagination:
(3.980) SELECT "mp_api_motocycle"."id"...
There is some possibilty to make this query faster?

Keep your full_name annotation as a column in the database and add an index to it.
Otherwise, you are doing full table scan while calculating full_name and then sorting by it.

Adding aggregate over filtered self-join field to Admin list_display

I would like to augment one of my model admins with an interesting value. Given a model like this:
class Participant(models.Model):
pass
class Registration(models.Model):
participant = models.ForeignKey(Participant)
is_going = models.BooleanField(verbose_name='Is going')
Now, I would like to show the number of other Registrations for this Participant where is_going is False. So, something akin to this SQL query:
SELECT reg.*, COUNT(past.id) AS not_going_num
FROM registrations AS reg, registrations AS past
WHERE past.participant_id = reg.participant_id AND
past.is_going = False
I think I can extend the Admin's queryset() method according to Django Admin, Show Aggregate Values From Related Model, by annotating it with the extra Count, but I still cannot figure out how to work the self-join and filter into this.
I looked at Self join with django ORM and Django self join , How to convert this query to ORM query, but the former is doing SELECT * AND the latter seems to have data model problems.
Any suggestions on how to solve this?

See edit history for previous version of the answer.
The admin implementation below will display "Not Going Count" for each Registration model. The "Not Going Count" is the count of is_going=False for the registration's participant.
#admin.register(Registration)
class RegistrationAdmin(admin.ModelAdmin):
list_display = ['id', 'participant', 'is_going', 'ng_count']
def ng_count(self, obj):
return obj.not_going_count
ng_count.short_description = 'Not Going Count'
def get_queryset(self, request):
qs = super(RegistrationAdmin, self).get_queryset(request)
qs = qs.filter(participant__registration__isnull=False)
qs = qs.annotate(not_going_count=Sum(
Case(
When(participant__registration__is_going=False, then=1),
default=0,
output_field=models.IntegerField())
))
return qs
Below is a more thorough explanation of the QuerySet:
qs = qs.filter(participant__registration__isnull=False)
The filter causes Django to perform two joins - an INNER JOIN to participant table, and a LEFT OUTER JOIN to registration table.
qs = qs.annotate(not_going_count=Sum(
Case(
When(participant__registration__is_going=False, then=1),
default=0,
output_field=models.IntegerField())
)
))
This is a standard aggregate, which will be used to SUM up the count of is_going=False. This translates into the SQL
SUM(CASE WHEN past."is_going" = False THEN 1 ELSE 0 END)
The sum is generated for each registration model, and the sum belongs to the registration's participant.

I might misunderstood, but you can do for single participant:
participant = Participant.objects.get(id=1)
not_going_count = Registration.objects.filter(participant=participant,
is_going=False).count()
For all participants,
from django.db.models import Count
Registration.objects.filter(is_going=False).values('participant') \
.annotate(not_going_num=Count('participant'))
Django doc about aggregating for each item in a queryset.

filtering the order_by relationship in Django ORM

In the below, product has many writers through contributor, and contributor.role_code defines the exact kind of contribution made to the product. Is it possible with the Django ORM to filter the contributors referenced by the order_by() method below? E.g. I want to order products only by contributors such that contributor.role_code in ['A01', 'B01'].
Product.objects.filter(
product_type__name=choices.PRODUCT_BOOK
).order_by(
'contributor__writer__last_name' # filter which contributors it uses?
)

You can do this via an annotation subquery:
Define the Subquery that represents the thing we want to order by
Annotate the original QuerySet with the Subquery
Order by the annotation.
contributors_for_ordering = Contributor.objects.filter( # 1
product=OuterRef('pk'),
role_code__in=['A01', 'B01'],
).values('writer__last_name')
queryset = Product.objects.filter(
product_type__name=choices.PRODUCT_BOOK
).annotate( # 2
writer_last_name=Subquery(contributors_for_ordering[:1]) # Slice [:1] to ensure a single result
).order_by( # 3
'writer_last_name'
)
Note, however, that there is a potential quirk here. If a Product has contributors with both 'A01' and 'B01' we haven't controlled which one will be used for ordering--we'll get whichever the database returns first. You can add an order_by clause to contributors_for_ordering to deal with that.

To filter on specific values, first build your list of accepted values:
accepted_values = ['A01', 'B01']
Then filter for values in this list:
Product.objects.filter(
product_type__name=choices.PRODUCT_BOOK
).filter(
contributor__role_code__in=accepted_values
).order_by(
'contributor__writer__last_name'
)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

QuerySet Optimisations in Django - django

Related

Django query set object query taking too much time?

How to limit records in tables with n to n relation before joining them , to avoid duplicates in django ORM?

Django filter by annotated field is too slow

Adding aggregate over filtered self-join field to Admin list_display

filtering the order_by relationship in Django ORM

Categories

Resources