Django ORM: Perform conditional `order_by` - django

Say one has this simple model:
from django.db import models
class Foo(models.Model):
    n = models.IntegerField()
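In SQL you can order by a condition, e.g.
select * from foo order by n=7 desc, n=17 desc, n=3 desc, n
This sorts the rows with n = 7 first, then those with n = 17, then those with n = 3, and finally everything else by n ascending. (A comparison such as n=7 evaluates to false or true, and false sorts first in ascending order, so each condition needs desc to bring the matching rows to the front.)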
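To check the point about desc, here is a small experiment with Python's built-in sqlite3 module. SQLite stands in for your database here (an assumption; PostgreSQL and MySQL also sort false/0 before true/1), and the foo table and its values are made up. It runs the condition-based ordering with and without desc.

```python
import sqlite3

# Throwaway in-memory table standing in for the question's Foo model
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (n INTEGER)")
conn.executemany("INSERT INTO foo (n) VALUES (?)", [(1,), (3,), (5,), (7,), (17,), (2,)])

# n = 7 evaluates to 0 (false) or 1 (true); ascending order puts 0 first,
# so the row with n = 7 ends up LAST
asc_rows = [r[0] for r in conn.execute("SELECT n FROM foo ORDER BY n = 7, n")]

# DESC on each comparison moves the matching rows to the front
desc_rows = [
    r[0]
    for r in conn.execute(
        "SELECT n FROM foo ORDER BY n = 7 DESC, n = 17 DESC, n = 3 DESC, n"
    )
]

print(asc_rows)   # [1, 2, 3, 5, 17, 7]
print(desc_rows)  # [7, 17, 3, 1, 2, 5]
```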
How does one do the same with the Django ORM? It is not covered in their order_by docs.

You can work with a generic solution that looks like:
from django.db.models import Case, IntegerField, When, Value
items = [7, 17, 3]
Foo.objects.alias(
    n_order=Case(
        *[When(n=item, then=Value(i)) for i, item in enumerate(items)],
        default=Value(len(items)),
        output_field=IntegerField()
    )
).order_by('n_order', 'n')
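This constructs a chain of conditional expressions [Django-doc] that serves as the primary sort key; rows whose n is not one of the items fall back to ordering by n itself. Note that .alias() was added in Django 3.2; on older versions, use .annotate() instead.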
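Under the hood the Case/When expression becomes a SQL CASE in the ORDER BY clause. The sketch below builds a comparable query by hand with sqlite3 so the ordering can be checked without a Django project. The helper name order_by_priority and the sample data are hypothetical, and the SQL Django actually emits will differ in quoting and aliasing.

```python
import sqlite3


def order_by_priority(conn, priority):
    """Hypothetical helper: rows whose n is in `priority` come first, in that
    order, then everything else by n ascending (like order_by('n_order', 'n'))."""
    if not priority:
        # CASE needs at least one WHEN, so fall back to plain ordering
        return [row[0] for row in conn.execute("SELECT n FROM foo ORDER BY n")]
    # One "WHEN n = ? THEN <position>" per item, like the When(...) comprehension
    whens = " ".join(f"WHEN n = ? THEN {pos}" for pos in range(len(priority)))
    # ELSE len(priority) plays the role of default=Value(len(items))
    sql = f"SELECT n FROM foo ORDER BY CASE {whens} ELSE {len(priority)} END, n"
    return [row[0] for row in conn.execute(sql, list(priority))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (n INTEGER)")
conn.executemany("INSERT INTO foo (n) VALUES (?)", [(v,) for v in (1, 3, 5, 7, 17, 2)])

print(order_by_priority(conn, [7, 17, 3]))  # [7, 17, 3, 1, 2, 5]
```

Values are passed as query parameters rather than pasted into the SQL string; only the integer positions are formatted in.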

You can use .annotate() to assign each record a custom_order value and then use .order_by() to sort the queryset by that value.
For example:
from django.db.models import Case, IntegerField, Value, When
Foo.objects \
    .annotate(custom_order=Case(
        When(n=7, then=Value(0)),
        When(n=17, then=Value(1)),
        When(n=3, then=Value(2)),
        default=Value(3),
        output_field=IntegerField()
    )) \
    .order_by('custom_order', 'n')

Related

Sum aggregation over list items in Django JSONField

I'd like to calculate the sum of all elements in a list inside a JSONField via Django's ORM. The objects basically look like this:
[
{"score": 10},
{"score": 0},
{"score": 40},
...
]
There are several problems that made me use a Raw Query in the end (see SQL query below) but I'd like to know if it is possible with Django's ORM.
SELECT id,
       SUM(elements.score) AS total_score
FROM my_table,
     LATERAL (SELECT
         (jsonb_array_elements(results)->'score')::integer AS score
     ) AS elements
GROUP BY id
ORDER BY total_score DESC
The main problem I faced is that the list in the JSONField needs to be turned into a set of rows via jsonb_array_elements, and afterwards it is impossible to run an aggregate function over the results. Postgres complains:
aggregate function calls cannot contain set-returning function calls
Using a LATERAL join in the FROM clause (as widely suggested) is not possible with the ORM, not even with Django's .extra() queryset method, because it is not possible to specify an additional table that is not quoted in the final query:
Model.objects.annotate(...).extra(
tables="LATERAL (SELECT (jsonb_array_elements('results')->'score')::integer AS score) AS elements"
)
# ERROR: no relation "LATERAL (SELECT ..."
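The per-row sum that the LATERAL query computes can be tried out without PostgreSQL. The sketch below is an SQLite analog, not the PostgreSQL query and not something the Django ORM emits: it assumes SQLite's json_each (JSON support built into most current SQLite builds) standing in for jsonb_array_elements, JSON stored as text, and made-up rows.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, results TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, results) VALUES (?, ?)",
    [
        (1, json.dumps([{"score": 10}, {"score": 0}, {"score": 40}])),
        (2, json.dumps([{"score": 5}, {"score": 5}])),  # repeated score on purpose
    ],
)

# json_each(...) plays the role of the LATERAL jsonb_array_elements(...):
# one output row per array element, correlated with the current my_table row
rows = conn.execute(
    """
    SELECT my_table.id, SUM(json_extract(elements.value, '$.score')) AS total_score
    FROM my_table, json_each(my_table.results) AS elements
    GROUP BY my_table.id
    ORDER BY total_score DESC
    """
).fetchall()
print(rows)  # [(1, 50), (2, 10)]
```

Row 2 sums to 10 because both elements are counted, even though they carry the same score.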
You can annotate the queryset with the score value from the JSONField, Cast it to an integer, retrieve the distinct values, and get the sum of whatever is left. I think the following query should do the trick:
from django.db.models import IntegerField
from django.db.models import Sum
from django.db.models.fields.json import KeyTextTransform
from django.db.models.functions import Cast
Model.objects.annotate(
    score=Cast(
        KeyTextTransform("score", "JSONField_name"),
        IntegerField(),
    )
).values("score").distinct().aggregate(Sum("score"))["score__sum"]
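Note that you will still have to change JSONField_name according to your model. Two caveats: KeyTextTransform looks up the key on the stored JSON value itself, so it yields NULL when that value is an array of objects as in the question; and .values("score").distinct() keeps each distinct score once, so repeated scores are only summed once.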

How to aggregate sum of several previous aggregated values in django ORM

In use: Django 3.2.10, PostgreSQL 13.4
I have the following queryset with the aggregation function Count:
queryset = Model.objects.all().aggregate(
    trues=Count('id', filter=Q(criteria=True)),
    falses=Count('id', filter=Q(criteria=False)),
)
What I want:
queryset = Model.objects.all().aggregate(
    trues=Count('id', filter=Q(criteria=True)),
    falses=Count('id', filter=Q(criteria=False)),
    total=trues+falses,  # <-- THIS
)
How to do this?
There is little you can do after aggregation, as aggregate() returns a Python dict.
I understand your example here is not your real situation, as you could simply do:
Model.objects.aggregate(
    total=(Count('id', filter=Q(criteria=True))
           + Count('id', filter=Q(criteria=False)))
)
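Adding the two Count(...) expressions works because each conditional count is just an expression in the same SELECT. The sqlite3 sketch below writes that SQL out by hand with a CASE inside COUNT, which is roughly what Django emits for Count(..., filter=Q(...)) on databases without a FILTER clause (PostgreSQL gets FILTER (WHERE ...) instead). The model table and its rows are made up. It also shows that a row with a NULL criteria lands in neither count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, criteria BOOLEAN)")
conn.executemany(
    "INSERT INTO model (criteria) VALUES (?)",
    [(1,), (1,), (0,), (None,)],  # two trues, one false, one NULL
)

# COUNT ignores NULLs, so CASE ... THEN id END counts only the matching rows.
# The two conditional counts can be added inside the same SELECT.
trues, falses, total = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN criteria = 1 THEN id END) AS trues,
        COUNT(CASE WHEN criteria = 0 THEN id END) AS falses,
        COUNT(CASE WHEN criteria = 1 THEN id END)
          + COUNT(CASE WHEN criteria = 0 THEN id END) AS total
    FROM model
    """
).fetchone()
print(trues, falses, total)  # 2 1 3
```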
What I want to point out is that Django provides .values().annotate() to produce a GROUP BY clause, as in SQL.
Taking your example:
queryset = Model.objects.values('criteria').annotate(count=Count('id'))
queryset here is still a QuerySet object, so you can modify it further, e.g.
result = queryset.aggregate(
    total=Sum('count')
)
Note that aggregate() returns a dict, not a QuerySet. Hopefully it helps.
It seems you want the total number of false and true criteria, so you can simply do:
total = Model.objects.filter(
    Q(criteria=True) | Q(criteria=False)).count()
or you can use the following (not recommended unless you want to show something in between):
from django.db.models import Count, F, Q, Sum
query = Model.objects.annotate(
    trues=Count('id', filter=Q(criteria=True)),
    falses=Count('id', filter=Q(criteria=False)),
).annotate(
    trues_false=F('trues') + F('falses'),
).aggregate(total=Sum('trues_false'))

Django conditional Subquery aggregate

A simplified example of my model structure would be:
class Corporation(models.Model):
    ...
class Division(models.Model):
    corporation = models.ForeignKey(Corporation, on_delete=models.CASCADE)
class Department(models.Model):
    division = models.ForeignKey(Division, on_delete=models.CASCADE)
    type = models.IntegerField()
Now I want to display a table of corporations where one column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves them, e.g.
class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and Subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw SQL solution
corps = Corporation.objects.raw("""
SELECT
    *,
    (
        SELECT COUNT(*)
        FROM foo_division div
        JOIN foo_department dept ON dept.division_id = div.id
        WHERE div.corporation_id = c.id AND dept.type = 10
    ) AS type_10_count
FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to the one you have provided, with this code:
from django.db.models import Count, IntegerField, OuterRef, Subquery
# Get the number of departments with GROUP BY division__corporation [1]
# .order_by() removes any default ordering, so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as a Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation,
# so .values('count') returns at most a single row with a single column: count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is:
SELECT "corporation"."id", ... (other fields) ...,
    (
        SELECT COUNT("department"."id") AS "count"
        FROM "department"
        INNER JOIN "division" ON ("department"."division_id" = "division"."id")
        WHERE (
            "department"."type" = 10 AND
            "division"."corporation_id" = ("corporation"."id")
        ) GROUP BY "division"."corporation_id"
    ) AS "departments_of_type_10"
FROM "corporation"
One concern is that subqueries can be slow on large tables. However, database query optimizers may be smart enough to turn a subquery into an OUTER JOIN; at least I've heard PostgreSQL does this.
[1] GROUP BY using .values() and .annotate()
[2] order_by() problems
[3] Subquery
You should be able to do this with a Case() expression to count the departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
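A sketch of the SQL shape the Sum(Case(...)) annotation roughly corresponds to, in SQLite with made-up data, assuming the reverse relations are followed with LEFT OUTER JOINs (the exact SQL Django emits may differ). Each joined department row contributes 1 or 0, so the sum counts type-10 departments, and a corporation without divisions still gets 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO foo_corporation (id) VALUES (1), (2), (3);  -- 3 has no divisions
    INSERT INTO foo_division (id, corporation_id) VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO foo_department (division_id, type) VALUES
        (10, 10), (10, 10), (11, 10), (11, 5), (20, 5);
    """
)

# One joined row per department (or one all-NULL row when nothing matches);
# CASE turns each row into 1 or 0 and SUM adds them per corporation
rows = conn.execute(
    """
    SELECT c.id,
           SUM(CASE WHEN dept.type = 10 THEN 1 ELSE 0 END) AS type_10_count
      FROM foo_corporation c
      LEFT JOIN foo_division div ON div.corporation_id = c.id
      LEFT JOIN foo_department dept ON dept.division_id = div.id
     GROUP BY c.id
     ORDER BY c.id
    """
).fetchall()
print(rows)  # [(1, 3), (2, 0), (3, 0)]
```

This relies on the department join being the only multi-valued join in the query; adding another reverse relation to the same annotate() would multiply the rows and inflate the sum.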
I like the following way of doing it:
from django.db.models import Func, OuterRef, Subquery
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func('id', function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments)
)
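The two Subquery answers differ in one detail: the first groups inside the subquery (GROUP BY division.corporation_id), while the second uses a bare COUNT via Func, which adds no GROUP BY. When a corporation has no matching departments, a grouped subquery returns no row, so the annotation is NULL, whereas an ungrouped COUNT returns 0. The SQLite sketch below, with made-up data and the table names from the question's raw SQL, shows the difference; wrapping the first Subquery in Coalesce(..., 0) (from django.db.models.functions) is one way to get 0 there too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE foo_corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE foo_division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE foo_department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO foo_corporation (id) VALUES (1), (2);
    INSERT INTO foo_division (id, corporation_id) VALUES (10, 1), (20, 2);
    INSERT INTO foo_department (division_id, type) VALUES (10, 10), (20, 5);
    """
)

SUBQUERY = """
    SELECT c.id,
           (SELECT COUNT(dept.id)
              FROM foo_department dept
              JOIN foo_division div ON dept.division_id = div.id
             WHERE dept.type = 10 AND div.corporation_id = c.id
             {group_by}) AS type_10_count
      FROM foo_corporation c
     ORDER BY c.id
"""

# Grouped: zero groups for corporation 2, so the scalar subquery is NULL
grouped = conn.execute(
    SUBQUERY.format(group_by="GROUP BY div.corporation_id")
).fetchall()
# Ungrouped: COUNT over zero rows still returns one row containing 0
ungrouped = conn.execute(SUBQUERY.format(group_by="")).fetchall()

print(grouped)    # [(1, 1), (2, None)]
print(ungrouped)  # [(1, 1), (2, 0)]
```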
The more details on this method you can see in this answer: https://stackoverflow.com/a/69020732/10567223

using Filtered Count in django over joined tables returns wrong values

To keep it simple, I have four tables (A, B, Category and Relation): the Relation table stores the Intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So A and B are many-to-many, while B and Category are many-to-one.)
What I need is to calculate the occurrence rate of A in Category which is obtained using:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
ANum = Count('id', distinct=False)
)
Note that if I use distinct=True instead, every single ANum would be equal to 1, which is not the desired outcome. The problem is that I have to filter the calculation based on the dates on which B occurred (and some other fields in the B table).
I am using Django 2.0's feature that makes it possible to pass filter as an argument to an aggregation.
Let's assume:
kwargs = {}
kwargs['relation_set__B__BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Count('id', distinct=False, filter=Q(**kwargs))
)
However the result I get is duplicated due to the table joins and I cannot use distinct=True as I explained. (querying A is also a must since I have to aggregate some other fields on this table as explained in my question here)
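Why distinct=True collapses every count to 1 can be seen on a toy schema. The sqlite3 sketch below uses hypothetical lowercase table and column names (a, b, relation, category_id, bdate) modeled on the question, with made-up rows. Grouping by (A id, category) means every joined row in a group carries the same A id, so COUNT(a.id) counts relation rows while COUNT(DISTINCT a.id) is always 1. The filtered column mirrors Count(..., filter=Q(...)) with a CASE expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER, bdate TEXT);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER);
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id, category_id, bdate) VALUES
        (1, 1, '2023-06-01'), (2, 1, '2024-02-01'), (3, 2, '2024-03-01');
    INSERT INTO relation (a_id, b_id) VALUES (1, 1), (1, 2), (1, 3), (2, 2);
    """
)

rows = conn.execute(
    """
    SELECT a.id, b.category_id,
           COUNT(a.id) AS plain,              -- one per joined relation row
           COUNT(DISTINCT a.id) AS distinct_count,  -- always 1 per group
           COUNT(CASE WHEN b.bdate >= ? THEN a.id END) AS filtered
      FROM a
      JOIN relation r ON r.a_id = a.id
      JOIN b ON b.id = r.b_id
     GROUP BY a.id, b.category_id
     ORDER BY a.id, b.category_id
    """,
    ("2024-01-01",),
).fetchall()
print(rows)  # [(1, 1, 2, 1, 1), (1, 2, 1, 1, 1), (2, 1, 1, 1, 1)]
```

ISO date strings compare correctly as text, which is why the date filter works on the TEXT column here.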
I am using Postgres and Django 2.0.1.
Are there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
    'ANum': Count('relation_set__A_id', distinct=False,
                  filter=Q(**Bkwargs)),
}
sub_filter = (
    Q(relation_set__A_id=OuterRef('id'))
    & Q(Category_id=OuterRef('relation_set__B__Category_id'))
)
# you could annotate 'relation_set__B__Category_id' to the A query and set the field here.
subquery = B.objects.filter(
    sub_filter
).values(
    'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    Anum=Subquery(subquery)
)
I'm still not sure I understand what you want. You write:
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course: you count the associated A object for each A object, so each one counts itself. So I still think you don't want to annotate A objects with Anum, but rather Categories. This should give you the desired number of As in each Category:
Category.objects.annotate(
    Anum=Count(
        'b__relation__a',
        filter=Q(b__BDate__gte=the_start_limit),
        distinct=True
    )
)
'b__relation__a' follows the relations backwards and picks all A objects that are related to the Category. The filter, however, limits the counted relations to certain Bs. distinct=True is needed so that an A related to several Bs in the same Category is counted only once instead of once per joined row.
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.

Django order_by specific order

Is it possible to replicate this kind of specific sql ordering in the django ORM:
order by
(case
when id = 5 then 1
when id = 2 then 2
when id = 3 then 3
when id = 1 then 4
when id = 4 then 5
end) asc
?
Since Django 1.8 you have conditional expressions, so using extra is no longer necessary.
from django.db.models import Case, When, Value, IntegerField
SomeModel.objects.annotate(
    custom_order=Case(
        When(id=5, then=Value(1)),
        When(id=2, then=Value(2)),
        When(id=3, then=Value(3)),
        When(id=1, then=Value(4)),
        When(id=4, then=Value(5)),
        output_field=IntegerField(),
    )
).order_by('custom_order')
It is possible. Since Django 1.8 you can do it the following way:
from django.db.models import Case, When
ids = [5, 2, 3, 1, 4]
order = Case(*[When(id=pk, then=pos) for pos, pk in enumerate(ids)])
queryset = MyModel.objects.filter(id__in=ids).order_by(order)
You could do it with extra() or plain raw(), but they do not work well in more complex situations.
qs.extra(
    select={'o': '(case when id=5 then 1 when id=2 then 2 when id=3 then 3 when id=1 then 4 when id=4 then 5 end)'},
    order_by=['o'],
)
YourModel.objects.raw('select ... order by (case ...)')
For your code, the set of conditions is very limited, so you could easily sort in Python.
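With a short, fixed list of ids the ordering can also be done in Python after fetching the rows. A minimal sketch, assuming the objects expose an id attribute; the Item dataclass and sort_by_id_order helper are hypothetical stand-ins, not Django API.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for a model instance with an id attribute."""
    id: int


def sort_by_id_order(objs, ids):
    """Sort objs so their ids follow the order given in ids.

    Objects whose id is not listed go last; sorted() is stable, so they
    keep their incoming relative order.
    """
    rank = {pk: pos for pos, pk in enumerate(ids)}
    return sorted(objs, key=lambda obj: rank.get(obj.id, len(ids)))


items = [Item(i) for i in range(1, 7)]
ordered = sort_by_id_order(items, [5, 2, 3, 1, 4])
print([item.id for item in ordered])  # [5, 2, 3, 1, 4, 6]
```

With a real queryset you would pass something like list(MyModel.objects.filter(id__in=ids)) as objs; this loads all matching rows into memory, so it only suits small result sets.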