Annotate in one model Django - django

See my model
https://github.com/rg3915/morris/blob/master/myproject/core/models.py
How to return percent 'uf' of total?
Example:
'uf','quant'
'AM',8
'SP',9
'RJ',4
I try
p=Person.objects.values('uf').annotate(quant=Count('uf', distinct=True)).values_list('uf','quant')
print(p.query)
but return
SELECT "core_person"."uf", COUNT(DISTINCT "core_person"."uf") AS "quant" FROM "core_person" GROUP BY "core_person"."uf", "core_person"."first_name" ORDER BY "core_person"."first_name" ASC
and i need
SELECT "core_person"."uf", COUNT(DISTINCT "core_person"."uf") AS "quant" FROM "core_person" GROUP BY "core_person"."uf" ORDER BY "core_person"."uf" ASC

From your example, I'm going to assume that you are looking to retrieve a list of states and the total number of people that belong to that state.
You were pretty close in your attempt, however you were lacking an order_by() that would have let you choose the order field.
As such your query should look like this:
Person.objects.values('uf').annotate(quant=Count('uf')).order_by('uf').values_list('uf','quant')
I hope you'll notice that I have also removed the distinct=True argument from the Count function, as we want to be able to group all of the similar results.

Related

Django query to fetch top performers for each month

I need to fetch the top performer for each month, here is the below MySql query which gives me the correct output.
select id,Name,totalPoints, createdDateTime
from userdetail
where app=4 and totalPoints in ( select
max(totalPoints)
FROM userdetail
where app=4
group by month(createdDateTime), year(createdDateTime))
order by totalPoints desc
I am new to Django ORM. I am not able to write an equivalent Django query which does the task. I have been struggling with this logic for 2 days. Any help would be highly appreciated.
While the GROUP BY clause in a subquery is slightly difficult to express with the ORM because aggregate() operations don't emit querysets, a similar effect can be achieved with a Window function:
UserDetail.objects.filter(total_points__in=UserDetail.objects.annotate(max_points=Window(
expression=Max('total_points'),
partition_by=[Trunc('created_datetime', 'month')]
)).values('max_points')
)
In general, this sort of pattern is implemented with Subquery expressions. In this case, I've implicitly used a subquery by passing a queryset to an __in predicate.
The Django documentation's notes on using aggregates within subqueries is are also relevant to this sort of query, since you want to use the results of an aggregate in a subquery (which I've avoided by using a window function).
However, I believe your query may not correctly capture what you want to do: as written it could return rows for users who weren't the best in a given month but did have the same score as another user who was the best in any month.

Doctrine 2 How to do a Sum from a Join but Group by 2 columns

I am using Symfony and Doctrine Querybuilder to return a list of Orders with a count of all Products in each Order. That means I have a 1:n Relationship between Orders and OrderProducts the OrderProducts table stores the OrderID, and the quantity of each product. If I do a groupBy o.orderId then I get the correct Sum, but then I do not get the data for all the orderProducts belonging to the orderId so if there are 2 different Products in the orderproducts table then I get qty:2 but only the data for 1 product, so I need to group by oi.orderProductId but obviously as soon I add the second groupBy oi.orderProductId then the TotalCount is showing me the sum for each individual record. I am fully aware that this is the correct behaviour, but I just can not get my head around it if there is a way to achieve what I am trying to do. I assume I have to somehow use a subselect, but I am not sure if this is even possible to achieve. My 2 tables and my attemped so far
tbl order
orderId
dateCreated
tblorderProducts
orderProductid
orderId
qty
return $this->createQueryBuilder('o')
->select(array(
'o AS orderData',
'oi',
'sum(oi.qty)AS noOfItems'))
->join('o.orderItems', 'oi')
->groupBy('o.orderId')
->groupBy('oi.orderProductId')
->getQuery()
->execute();
I hope somebody can give me a hint how I could achieve it. Thank you very much in advance.
The groupBy() method always replaces the current group by clause. Use addGroupBy() method instead.

Django 1.11 Annotating a Subquery Aggregate

This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.

Django get all values Group By particular one field

I want to execute a simple query like:
select *,count('id') from menu_permission group by menu_id
In Django format I have tried:
MenuPermission.objects.all().values('menu_id').annotate(Count('id))
It selects only menu_id. The executed query is:
SELECT `menu_permission`.`menu_id`, COUNT(`menu_permission`.`id`) AS `id__count` FROM `menu_permission` GROUP BY `menu_permission`.`menu_id`
But I need other fields also. If I try:
MenuPermission.objects.all().values('id','menu_id').annotate(Count('id))
It adds 'id' in group by condition.
GROUP BY `menu_permission`.`id`
As a result I am not getting the expected result. How I can get all all fields in the output but group by a single one?
You can try subqueries to do what you need.
In my case I have two tables: Item and Transaction where item_id links to Item
First, I prepare Transaction subquery with group by item_id where I sum all amount fields and mark item_id as pk for outer query.
per_item_total=Transaction.objects.values('item_id').annotate(total=Sum('amount')).filter(item_id=OuterRef('pk'))
Then I select all rows from item plus subquery result as total filed.
items_with_total=Item.objects.annotate(total=Subquery(per_item_total.values('total')))
This produces the following SQL:
SELECT `item`.`id`, {all other item fields},
(SELECT SUM(U0.`amount`) AS `total` FROM `transaction` U0
WHERE U0.`item_id` = `item`.`id` GROUP BY U0.`item_id` ORDER BY NULL) AS `total` FROM `item`
You are trying to achieve this SQL:
select *, count('id') from menu_permission group by menu_id
But normally SQL requires that when a group by clause is used you only include those column names in the select that you are grouping by. This is not a django matter, but that's how SQL group by works.
The rows are grouped by those columns so those columns can be included in select and other columns can be aggregated if you want them to into a value. You can't include other columns directly as they may have more than one value (since the rows are grouped).
For example if you have a column called "permission_code", you could ask for an array of the values in the "permission_code" column when the rows are grouped by menu_id.
Depending on the SQL flavor you are using, this could be in PostgreSQL something like this:
select menu_id, array_agg(permission_code), count(id) from menu_permissions group by menu_id
Similary django queryset can be constructed for this.
Hopefully this helps, but if needed please share more about what you need to do and what your data models are.
The only way currently that it works as expected is to hve your query based on the model you want the GROUP BY to be based on.
In your case it looks like you have a Menu model (menu_id field foreign key) so doing this would give you what you want and will allow getting other aggregate information from your MenuPermission model but will only group by the Menu.id field:
Menu.objects.annotate(perm_count=Count('menupermission__id')).values('perm_count')
Of course there is no need for the "annotate" intermediate step if all you want is that single count.
query = MenuPermission.objects.values('menu_id').annotate(menu_id_count=Count('menu_id'))
You can check your SQL query by print(query.query)
This solution doesn't work, all fields end up in the group by clause, leaving it here because it may still be useful to someone.
model_fields = queryset.model._meta.get_fields()
queryset = queryset.values('menu_id') \
.annotate(
count=Count('id'),
**{field.name: F(field.name) for field in model_fields}
)
What i'm doing is getting the list of fields of our model, and set up a dictionary with the field name as key and an F instance with the field name as a parameter.
When unpacked (the **) it gets interpreted as named arguments passed into the annotate function.
For example, if we had a "name" field on our model, this annotate call would end up being equal to this:
queryset = queryset.values('menu_id') \
.annotate(
count=Count('id'),
name=F("name")
)
you can use the following code:
MenuPermission.objects.values('menu_id').annotate(Count('id)).values('field1', 'field2', 'field3'...)

Using .extra() on fields created by .annotate() in Django

I want to retrieve a sum of two fields (which are aggregations themselves) for each object in a table.
The following may describe a bit better what I'm after but results in an Unknown column in field list-Error:
items = MyModel.objects.annotate(
field1=Sum("relatedModel__someField"),
field2=Sum("relatedModel__someField")).extra(
select={"sum_field1_field2": "field1 + field2"})
I also tried using F() for the field lookups but that gives me an invalid sql statement.
Any ideas on how to solve this are much appreciated.
it this what you want?
items = MyModel.objects.extra(
select = {'sum_field1_field2': 'SUM(relatedModel__someField) + SUM(relatedModel__someField)'},
)
To make it work for many to many or for many to one (reverse) relations, you may use the following:
items = MyModel.objects.extra(
select = {'sum_field1_field2': 'SUM("relatedModel"."someField") + SUM("relatedModel"."someField")'},
)
But this will break also if you need another annotate, like for a count, because extra will add the statement to the GROUP BY clause, whereas aggregate functions are not allowed in there.