How to perform a multiple column group by in Django?
I have only seen examples on one column group by.
Below is the query i am trying to convert to Django ORM.
SELECT order_id,city,locality,login_time,sum(morning_hours),sum(afternoon_hours),sum(evening_hours),sum(total_hours)
FROM orders
GROUPBY order_id,city,locality,login_time`
from django.db.models import Sum
Your_Model.objects.values(
"order_id", "city", "locality", "login_time"
).order_by().annotate(
Sum("morning_hours"),
Sum("afternoon_hours"),
Sum("evening_hours"),
Sum("total_hours"),
)
I hope the above code snippet helps (While answering, I had no idea whether morning hours, afternoon hours, etc are derived columns or existing fields in the table, because you have specified nothing of the sort in your question. Hence, I made my assumption and have answered your question). For grouping by multiple columns, there already exists a question on SO. See this link.
Related
I need to fetch the top performer for each month, here is the below MySql query which gives me the correct output.
select id,Name,totalPoints, createdDateTime
from userdetail
where app=4 and totalPoints in ( select
max(totalPoints)
FROM userdetail
where app=4
group by month(createdDateTime), year(createdDateTime))
order by totalPoints desc
I am new to Django ORM. I am not able to write an equivalent Django query which does the task. I have been struggling with this logic for 2 days. Any help would be highly appreciated.
While the GROUP BY clause in a subquery is slightly difficult to express with the ORM because aggregate() operations don't emit querysets, a similar effect can be achieved with a Window function:
UserDetail.objects.filter(total_points__in=UserDetail.objects.annotate(max_points=Window(
expression=Max('total_points'),
partition_by=[Trunc('created_datetime', 'month')]
)).values('max_points')
)
In general, this sort of pattern is implemented with Subquery expressions. In this case, I've implicitly used a subquery by passing a queryset to an __in predicate.
The Django documentation's notes on using aggregates within subqueries is are also relevant to this sort of query, since you want to use the results of an aggregate in a subquery (which I've avoided by using a window function).
However, I believe your query may not correctly capture what you want to do: as written it could return rows for users who weren't the best in a given month but did have the same score as another user who was the best in any month.
This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.
I have model named IssueFlags with columns:
id, created_at, flags_id, issue_id, comments
I want to get data of unique issue_id (latest created) with info about flags_id, created_at and comments
By sql it's working like this:
SELECT created_at, flags_id, issue_id, comments
FROM Issues_issueflags
group by issue_id
How to do the same in Django? I tried to wrote sth in shell, but there is no attribute group by
IssueFlags.objects.order_by('-created_at')
This above return me only the list of ordered data.
Try doing this way:
from django.db.models import Count
IssueFlags.objects.values('created_at', 'flags_id', 'issue_id', 'comments').order_by('-created_at').annotate(total=Count('issue_id'))
I have written annotate(total=Count('issue_id')) assuming that you would have multiple entries of unique issue_id (Note that you can do all possible types of aggregations like Sum, Count, Max, Avg inside . Also, there already exists answers for, doing group by in django. Also have a look at this link or this link. Also, read this django documentation to get a clear idea on when to place values() before annotate() and when to place it after, and then implement the learning as per your requirement.
Would be happy to help if you have any further doubts.
How can I make an order_by like this ....
p = Product.objects.filter(vendornumber='403516006')\
.order_by('-created').distinct('vendor__name')
The problem is that I have multiple vendors with the same name, and I only want the latest product by the vendor ..
Hope it makes sense?
I got this DB error:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("search_vendor"."name")
"search_product"...
Based on your error message and this other question, it seems to me this would fix it:
p = Product.objects.filter(vendornumber='403516006')\
.order_by('vendor__name', '-created').distinct('vendor__name')
That is, it seems that the DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). So by making the column you use in distinct as the first column in the order_by, I think it should work.
Just matching leftmost order_by() arg and distinct() did not work for me, producing the same error (Django 1.8.7 bug or a feature)?
qs.order_by('project').distinct('project')
however it worked when I changed to:
qs.order_by('project__id').distinct('project')
and I do not even have multiple order_by args.
In case you are hoping to use a separate field for distinct and order by another field you can use the below code
from django.db.models import Subquery
Model.objects.filter(
pk__in=Subquery(
Model.objects.all().distinct('foo').values('pk')
)
).order_by('bar')
I had a similar issue but then with related fields. With just adding the related field in distinct(), I didn't get the right results.
I wanted to sort by room__name keeping the person (linked to residency ) unique. Repeating the related field as per the below fixed my issue:
.order_by('room__name', 'residency__person', ).distinct('room__name', 'residency__person')
See also these related posts:
ProgrammingError: when using order_by and distinct together in django
django distinct and order_by
Postgresql DISTINCT ON with different ORDER BY
Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through documentation filters can only be before or after the annotation which would not yield the desired results.
A workaround is this
for student in Student.objects.all():
student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but does many queries, in a real application this can get impractically long. I think this type of statement is possible in SQL but would prefer to stay with ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combined them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue - I suggest the for loop work around. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
models.Case(
models.When(absence__type='Excused', then=1),
default=0,
output_field=models.IntegerField()
)))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
You are correct - django does not allow you to filter the related objects being counted, without also applying the filter to the primary objects, and therefore excluding those primary objects with a no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
# a query for students
students = Students.objects.all()
# a query to count the student attendances, grouped by type.
attendance_counts = Attendence(student__in=students).values('student', 'type').annotate(abs=Count('pk'))
# regroup that into a dictionary {student -> { type -> count }}
from itertools import groupby
attendance_s_t = dict((s, (dict(t, c) for (s, t, c) in g)) for s, g in groupby(attendance_counts, lambda (s, t, c): s))
# then use them efficiently:
for student in students:
student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.