Raw SQL in order by in Django model query

I have a simple query like this (I want 1BHK to come first, then 2BHK, then anything else):
select *
from service_options
order by case space when '1BHK' then 0 when '2BHK' then 1 else 2 end,
space
How do I do this in Django? I have a model named ServiceOption.
I tried this, but with no luck:
ServiceOption.objects.order_by(RawSQL("case space when '1BHK' then 0 when '2BHK' then 1 else 2 end,space"), ()).all()
I don't want to execute raw query with something like
ServiceOption.objects.raw("raw query here")
In Laravel, this can easily be done like this:
Model::query()->orderByRaw('raw order by query here')->get();
Any input will be appreciated. Thank you in advance.

You can work with a .annotate(…) [Django-doc] and then .order_by(…) [Django-doc]:
from django.db.models import Case, IntegerField, Value, When
ServiceOption.objects.annotate(
sp=Case(
When(space='1BHK', then=Value(0)),
When(space='2BHK', then=Value(1)),
default=Value(2),
output_field=IntegerField()
)
).order_by('sp', 'space')
The raw query would come to this
SELECT *, CASE WHEN "service_options"."space" = '1BHK' THEN 0 WHEN "service_options"."space" = '2BHK' THEN 1 ELSE 2 END AS "sp" FROM "service_options" ORDER BY "sp" ASC, "service_options"."space" ASC
Since Django 3.2 you can work with .alias(…) to avoid computing this both as a column and in the ORDER BY clause:
from django.db.models import Case, IntegerField, Value, When
ServiceOption.objects.alias(
    sp=Case(
        When(space='1BHK', then=Value(0)),
        When(space='2BHK', then=Value(1)),
        default=Value(2),
        output_field=IntegerField()
    )
).order_by('sp', 'space')
The raw query would come to this
SELECT * FROM "service_options" ORDER BY CASE WHEN ("service_options"."space" = '1BHK') THEN 0 WHEN ("service_options"."space" = '2BHK') THEN 1 ELSE 2 END ASC, "service_options"."space" ASC
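To check the ordering that the CASE expression produces without setting up a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and the query is the one from the question rather than the exact SQL Django emits.

```python
import sqlite3

# Hypothetical stand-in for the service_options table; only "space" matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_options (id INTEGER PRIMARY KEY, space TEXT)")
conn.executemany(
    "INSERT INTO service_options (space) VALUES (?)",
    [("3BHK",), ("2BHK",), ("Studio",), ("1BHK",)],
)

# Rank 1BHK first, 2BHK second, everything else last; ties are broken by space.
rows = conn.execute(
    """
    SELECT space
    FROM service_options
    ORDER BY CASE space WHEN '1BHK' THEN 0 WHEN '2BHK' THEN 1 ELSE 2 END,
             space
    """
).fetchall()
ordered_spaces = [space for (space,) in rows]
print(ordered_spaces)
```

The secondary sort on space is what keeps the "anything else" group in a stable, alphabetical order.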

Related

Count annotation adds unwanted group by statement for all fields

I want to generate the following query:
select id, (select count(*) from B where B.x = A.x) as c from A
Which should be simple enough with the Subquery expression. Except I get a group by statement added to my count query which I can't get rid of:
from django.contrib.contenttypes.models import ContentType
from django.db.models import Count, F
str(ContentType.objects.annotate(c=F('id')).values('c').query)
# completely fine query with annotated field
'SELECT "django_content_type"."id" AS "c" FROM "django_content_type"'
str(ContentType.objects.annotate(c=Count('*')).values('c').query)
# gets group by for every single field out of nowhere
'SELECT COUNT(*) AS "c" FROM "django_content_type" GROUP BY "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model"'
This makes the result [{'c': 1}, {'c': 1}, {'c': 1}, {'c': 1},...] instead of [{'c': 20}], but a subquery must return exactly one row to be usable.
Since the query is supposed to be used in a subquery I can't use .count() or .aggregate() either since those evaluate instantly and complain about the usage of OuterRef expression.
Example with subquery:
str(ContentType.objects.annotate(fields=Subquery(
Field.objects.filter(model_id=OuterRef('pk')).annotate(c=Count('*')).values('c')
)).query)
Generates
SELECT "django_content_type"."id",
"django_content_type"."app_label",
"django_content_type"."model",
(SELECT COUNT(*) AS "c"
FROM "meta_field" U0
WHERE U0."model_id" = ("django_content_type"."id")
GROUP BY U0."id", U0."model_id", U0."module", U0."name", U0."label", U0."widget", U0."visible", U0."readonly",
U0."desc", U0."type", U0."type_model_id", U0."type_meta_id", U0."is_type_meta", U0."multi",
U0."translatable", U0."conditions") AS "fields"
FROM "django_content_type"
Expected query:
SELECT "django_content_type"."id",
"django_content_type"."app_label",
"django_content_type"."model",
(SELECT COUNT(*) AS "c"
FROM "meta_field" U0
WHERE U0."model_id" = ("django_content_type"."id")) AS "fields"
FROM "django_content_type"
Update: (to add models from real app requested in comments):
class Translation(models.Model):
    field = models.ForeignKey(MetaField, models.CASCADE)
    ref_id = models.IntegerField()
    # ... other fields
class Choice(models.Model):
    meta = models.ForeignKey(MetaField, on_delete=models.PROTECT)
    # ... other fields
I need a query to get number of Translations available for each choice where Translation.field_id refers to Choice.meta_id and Translation.ref_id refers to Choice.id.
The reason there are no foreign keys is that not all meta fields are choice fields (e.g. text fields may also have translations). I could make a separate table for each translatable entity, but this setup should be easy to use with a count subquery that doesn't have a group by statement in it.
UPDATE Here's a query using subquery that should come close to what you want:
str(ContentType.objects.annotate(fields=Subquery(
Field.objects.filter(model_id=OuterRef('pk')).values('model').annotate(c=Count('pk')).values('c')
)).query)
The only thing I did was adding the values('model') group_by clause which makes the Count('pk') actually work since it aggregates all rows into one.
It will return null instead of 0 when there are no related rows, which you can probably transform to 0 using a Coalesce function or a Case ... When ... then.
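The NULL-versus-0 behaviour can be seen at the SQL level with a small sqlite3 sketch. The two tables below are simplified, hypothetical stand-ins for the content type and field tables. In the ORM, the equivalent of the second query would be wrapping the Subquery in Coalesce from django.db.models.functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Content type 1 has two fields, content type 2 has none.
conn.executescript(
    """
    CREATE TABLE content_type (id INTEGER PRIMARY KEY);
    CREATE TABLE field (id INTEGER PRIMARY KEY, model_id INTEGER);
    INSERT INTO content_type (id) VALUES (1), (2);
    INSERT INTO field (model_id) VALUES (1), (1);
    """
)

# Grouped correlated subquery: no matching rows means no group, hence NULL.
grouped = conn.execute(
    """
    SELECT ct.id,
           (SELECT COUNT(f.id) FROM field f
            WHERE f.model_id = ct.id GROUP BY f.model_id)
    FROM content_type ct ORDER BY ct.id
    """
).fetchall()

# Same subquery wrapped in COALESCE so that the missing group becomes 0.
coalesced = conn.execute(
    """
    SELECT ct.id,
           COALESCE((SELECT COUNT(f.id) FROM field f
                     WHERE f.model_id = ct.id GROUP BY f.model_id), 0)
    FROM content_type ct ORDER BY ct.id
    """
).fetchall()
print(grouped, coalesced)
```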
The exact query you want isn't possible with the Django ORM, although you can achieve the same result with
Choice.objects.annotate(c=Count(
    'meta__translation',
    distinct=True,
    filter=Q(meta__translation__ref_id=F('id'))
))
Alternatively look at the django-sql-utils package, as also mentioned in this post.
It is a bit of a dirty hack, but after diving inside Django's ORM code, I found the following works wonderfully for me (I am trying to use your own example's subquery):
counting_subquery = Subquery(
    Field.objects
    .filter(model_id=OuterRef('pk'))
    .annotate(c=Count('*'))
    .values('c')
)
# Note: the next line fixes a bug in the Django ORM, where the subquery defined above
# triggers an unwanted group_by clause in the generated SQL which ruins the count operation.
counting_subquery.query.group_by = True
results = (
    ContentType.objects
    .annotate(fields_count=Subquery(counting_subquery))
)
...
The key is setting group_by to True. That gets rid of the unwanted group_by clause in your SQL.
I am not happy about it, as it relies on Django's undocumented behaviour to work. But I can live with it; I am even less happy about the maintainability of using direct SQL in the subquery...

Django conditional Subquery aggregate

A simplified example of my model structure would be
class Corporation(models.Model):
    ...
class Division(models.Model):
    corporation = models.ForeignKey(Corporation)
class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
Now I want to display a table of corporations in which one column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves them, e.g.
class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
    SELECT
        *,
        (
            SELECT COUNT(*)
            FROM foo_division div
            JOIN foo_department dept ON dept.division_id = div.id
            WHERE div.corporation_id = c.id AND dept.type = 10
        ) AS type_10_count
    FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to one you have provided, with this code
from django.db.models import Count, IntegerField, OuterRef, Subquery
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("department"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
One concern here is that the subquery can be slow on large tables. However, database query optimizers can be smart enough to promote the subquery to an OUTER JOIN; at least I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
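At the SQL level, the Sum(Case(...)) annotation amounts to conditional aggregation over joined rows. The sqlite3 sketch below uses hypothetical, simplified tables and hand-written SQL; it assumes the ORM joins the related tables with LEFT OUTER JOINs, and the SQL Django actually generates may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Corporation 1 has two type-10 departments, corporation 2 has only a type-20
# department, and corporation 3 has no divisions at all.
conn.executescript(
    """
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2), (3);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (division_id, type) VALUES (1, 10), (2, 10), (2, 20), (3, 20);
    """
)

# Each joined row contributes 1 when the department type is 10, otherwise 0.
type_10_counts = conn.execute(
    """
    SELECT c.id,
           SUM(CASE WHEN dept.type = 10 THEN 1 ELSE 0 END) AS type_10_count
    FROM corporation c
    LEFT OUTER JOIN division dv ON dv.corporation_id = c.id
    LEFT OUTER JOIN department dept ON dept.division_id = dv.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
print(type_10_counts)
```

Because the default branch yields 0, corporations without any matching department still appear in the result with a count of 0 instead of NULL.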
I like the following way of doing it:
from django.db.models import F, Func, OuterRef, Subquery
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func(F('id'), function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments)
)
More details on this method can be found in this answer: https://stackoverflow.com/a/69020732/10567223

Django count of related objects with conditions

I'm trying to get the count of related objects with a condition:
Item.objects.annotate(count_subitems=Count('subitems'))
Subitem has a created_at column, which I need to use for filtering the count (greater than a date, less than a date or between dates).
How can I do this with the Django ORM?
Maybe you're looking for something like this:
from datetime import datetime
from django.db.models import Case, IntegerField, Sum, When
Item.objects.annotate(
    count_subitems=Sum(
        Case(
            When(subitems__created_at__lte=datetime.now(), then=1)
        ),
        output_field=IntegerField()
    )
)
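Alternatively, filter on the subitem date first and then annotate. Because the filter precedes the annotation, only the matching subitems are counted, and Items without any matching subitem are left out of the result: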
(Item.objects
.filter(subitems__created_at__lte=datetime.now())
.annotate(count_subitems=Count('subitems')))
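A rough SQL-level sketch of the difference between the two approaches, using sqlite3 and made-up tables. It assumes the filter-then-count query becomes a single inner join with the date condition in the WHERE clause; under that assumption the count covers only the matching subitems, and items without a match drop out. The conditional count keeps every item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Item 1 has one subitem before the cutoff and one after; item 2 only one after.
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE subitem (id INTEGER PRIMARY KEY, item_id INTEGER, created_at TEXT);
    INSERT INTO item (id) VALUES (1), (2);
    INSERT INTO subitem (item_id, created_at) VALUES
        (1, '2020-01-01'), (1, '2021-06-01'), (2, '2021-07-01');
    """
)
cutoff = "2020-12-31"  # hypothetical cutoff date, stored as ISO text

# Filter, then count: rows that fail the condition are gone before grouping.
filter_then_count = conn.execute(
    """
    SELECT item.id, COUNT(subitem.id)
    FROM item
    INNER JOIN subitem ON subitem.item_id = item.id
    WHERE subitem.created_at <= ?
    GROUP BY item.id
    ORDER BY item.id
    """,
    (cutoff,),
).fetchall()

# Conditional count: every item is kept, non-matching subitems count as NULL.
conditional_count = conn.execute(
    """
    SELECT item.id,
           COUNT(CASE WHEN subitem.created_at <= ? THEN subitem.id END)
    FROM item
    LEFT OUTER JOIN subitem ON subitem.item_id = item.id
    GROUP BY item.id
    ORDER BY item.id
    """,
    (cutoff,),
).fetchall()
print(filter_then_count, conditional_count)
```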

Django ORM: is it possible to inject subqueries?

I have a Django model that looks something like this:
class Result(models.Model):
    date = models.DateTimeField()
    subject = models.ForeignKey('myapp.Subject')
    test_type = models.ForeignKey('myapp.TestType')
    summary = models.PositiveSmallIntegerField()
    # more fields about the result like its location, tester ID and so on
Sometimes we want to retrieve all the test results, other times we only want the most recent result of a particular test type for each subject. This answer has some great options for SQL that will find the most recent result.
Also, we sometimes want to bucket the results into different chunks of time so that we can graph the number of results per day / week / month.
We also want to filter on various fields, and for elegance I'd like a QuerySet that I can then make all the filter() calls on, and annotate for the counts, rather than making raw SQL calls.
I have got this far:
qs = Result.objects.extra(select = {
'date_range': "date_trunc('{0}', time)".format("day"), # Chunking into time buckets
'rn' : "ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)"})
qs = qs.values('date_range', 'result_summary', 'rn')
qs = qs.order_by('-date_range')
which results in the following SQL:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" ORDER BY "date_range" DESC
which is kind of approaching what I'd like, but now I need to somehow filter to only get the rows where rn = 1. I tried using the 'where' field in extra(), which gives me the following SQL and error:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" WHERE "rn"=1 ORDER BY "date_range" DESC ;
ERROR: column "rn" does not exist
So I think the query that finds "rn" needs to be a subquery - but is it possible to do that somehow, perhaps using extra()?
I know I could do this with raw SQL but it just looks ugly! I'd love to find a nice neat way where I have a filterable QuerySet.
I guess the other option is to have a field in the model that indicates whether it is actually the most recent result of that test type for that subject...
I've found a way!
qs = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject_id AND T2.time > myapp_result.time))"])
This is based on a different option from the answer I referenced earlier. I can filter or annotate qs with whatever I want.
As an aside, on the way to this solution I tried this:
qq = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject_id AND T2.time > myapp_result.time))"])
qs = Result.objects.filter(id__in=qq)
Django embeds the subquery just as you want it to:
SELECT ...some fields... FROM "myapp_result"
WHERE ("myapp_result"."id" IN (SELECT "myapp_result"."id" FROM "myapp_result"
WHERE (NOT EXISTS(SELECT * FROM myapp_result as T2
WHERE (T2.subject_id = myapp_result.subject_id AND T2.test_type_id = myapp_result.test_type_id AND T2.time > myapp_result.time)))))
I realised this had more subqueries than I need, but I note it here as I can imagine it being useful to know that you can filter one queryset with another and Django does exactly what you'd hope for in terms of embedding the subquery (rather than, say, executing it and embedding the returned values, which would be horrid.)
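The NOT EXISTS condition itself can be tried out with sqlite3. The sketch below uses a cut-down, hypothetical version of the myapp_result table with timestamps stored as ISO text, and keeps only the rows for which no later row exists with the same subject and test type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE myapp_result (
        id INTEGER PRIMARY KEY, subject_id INTEGER, test_type_id INTEGER, time TEXT
    );
    INSERT INTO myapp_result (id, subject_id, test_type_id, time) VALUES
        (1, 1, 1, '2020-01-01'),
        (2, 1, 1, '2020-02-01'),
        (3, 1, 2, '2020-01-15'),
        (4, 2, 1, '2020-03-01'),
        (5, 2, 1, '2020-01-10');
    """
)

# A row is the most recent of its (subject, test type) group when no newer row exists.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id FROM myapp_result
        WHERE NOT EXISTS (
            SELECT * FROM myapp_result AS T2
            WHERE T2.test_type_id = myapp_result.test_type_id
              AND T2.subject_id = myapp_result.subject_id
              AND T2.time > myapp_result.time
        )
        ORDER BY id
        """
    )
]
print(latest_ids)
```

If two rows in the same group share the same latest time, both are returned, since neither has a strictly newer row.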

Django order_by specific order

Is it possible to replicate this kind of specific sql ordering in the django ORM:
order by
(case
when id = 5 then 1
when id = 2 then 2
when id = 3 then 3
when id = 1 then 4
when id = 4 then 5
end) asc
?
Since Django 1.8 you have Conditional Expressions so using extra is not necessary anymore.
from django.db.models import Case, When, Value, IntegerField
SomeModel.objects.annotate(
    custom_order=Case(
        When(id=5, then=Value(1)),
        When(id=2, then=Value(2)),
        When(id=3, then=Value(3)),
        When(id=1, then=Value(4)),
        When(id=4, then=Value(5)),
        output_field=IntegerField(),
    )
).order_by('custom_order')
It is possible. Since Django 1.8 you can do in the following way:
from django.db.models import Case, When
ids = [5, 2, 3, 1, 4]
order = Case(*[When(id=id, then=pos) for pos, id in enumerate(ids)])
queryset = MyModel.objects.filter(id__in=ids).order_by(order)
You could do it with extra() or, more plainly, with raw(), but neither works well in more complex situations.
qs.extra(select={'o': '(case when id=5 then 1 when id=2 then 2 when id=3 then 3 when id=1 then 4 when id=4 then 5 end)'}, order_by=['o'])
YourModel.objects.raw('select ... order by (case ...)')
For your code the set of conditions is very limited, so you could also easily sort in Python.
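Sorting in Python could look like the following minimal sketch. The rows list is a hypothetical stand-in for objects that have already been fetched; with model instances the key would read an attribute such as obj.id instead of a dictionary entry.

```python
# Desired order of primary keys, taken from the question's CASE expression.
ids = [5, 2, 3, 1, 4]

# Map each id to its position in the desired order.
position = {pk: index for index, pk in enumerate(ids)}

# Hypothetical already-fetched rows, in arbitrary order.
rows = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]

ordered = sorted(rows, key=lambda row: position[row["id"]])
ordered_ids = [row["id"] for row in ordered]
print(ordered_ids)
```

This only makes sense when the whole result set is small enough to load into memory, since the database no longer does the ordering.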