I'm working with Django and I need to group a model by multiple fields.
The query I want is:
select field1, field2 from tbl group by field1, field2 having count(*) > 1
Then what I've tried is the following:
Tbl.objects.annotate(cnt=Count('field1','field2')).filter(cnt__gt=1)
But, it didn't work.
Can't group by multiple fields on ORM?
Do I need to write RawQuery?
In your django queryset, you have to first use the .values() to select which columns you wish to group by on.
Tbl.objects.values('field1', 'field2').annotate(cnt=Count('pk')).filter(cnt__gt=1)
# or whatever you needed to do
EDIT
Also, please check previous questions before asking.
Django orm group by multiple columns
Django: GROUP BY two values
Related
I am playing around with django ORM
import django
django.setup()
from django.contrib.auth.models import User, Group
from django.db.models import Count
# All users
print(User.objects.all().count())
# --> 742
# Should be: All users which are in a group.
# But the result is different. I don't understand this.
print(User.objects.filter(groups__in=Group.objects.all()).count())
# --> 1731
# All users which are in a group.
# distinct needed
print(User.objects.filter(groups__in=Group.objects.all()).distinct().count())
# --> 543
# All users which are in a group. Without distinct, annotate seems to do this.
print(User.objects.filter(groups__in=Group.objects.all()).annotate(Count('pk')).count())
# --> 543
# All users which are in no group
print(User.objects.filter(groups__isnull=True).count())
# --> 199
# 199 + 543 = 742 (nice)
I don't understand the second query which returns 1731.
I know that I can use distinct().
Nevertheless 1731 looks like a bug to me.
What is the intention why below query is not distinct/unique?
User.objects.filter(groups__in=Group.objects.all())
Raw MySQL query looks like this:
SELECT user.id, group.id FROM user LEFT JOIN group ON user.group_id = group.id
The result will contain all possible combinations of users and groups and I guess some users belong to more than one group.
You are trying to fetch all users from all groups, but a user can present in multiple groups that's why distinct is required. if you want users ina specific group instead of doing an all try a filter query.
I assume that User.groups is a ForeignKey or some other relationship that associates each User with zero to many Group instances.
So the query which confuses you:
User.objects.filter(groups__in=Group.objects.all())
That query can be described as:
Access the Group model manager (Group.objects).
Make a QuerySet:
Return all Group instances (Group.objects.all()).
Access the User model manager (User.objects).
Make a Queryset:
Join to the Group model, on the User.groups foreign key.
Return every (User + Group) row which has an associated Group.
That is not “all users which are in a group”; instead, it is “All user–group pairs where the group exists”.
By querying on each of the multiple-value User.groups field, you are implying that the query must contain a join from User to Group rows.
Instead, you want:
Access the User model manager (User.objects).
Make a QuerySet:
Return all rows which have groups not empty.
User.objects.filter(groups__isnull=False)
Note that this – “All users which have a non-empty set of associated groups” – is the inverse of another example query you have (“All users which are in no group”).
Since groups is a ManyToManyField the query translated into INNER JOIN statement.
If you print the following you will see the query generated by the QuerySet:
>>> print(User.objects.filter(groups__in=Group.objects.all()).query)
SELECT `auth_user`.`id`, .... , `auth_user`.`date_joined` FROM `auth_user` INNER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE `auth_user_groups`.`group_id` IN (SELECT `auth_group`.`id` FROM `auth_group`)
As you would see the query joins auth_user and auth_user_groups tables.
Where auth_user_groups is the ManyToManyField table not the table for Group model. Thus a user will come more than once.
You would want to use annotate get users having grous, in my case the numbers are following:
$ ./manage.py shell
>>>
>>> from django.contrib.auth.models import User, Group
>>> from django.db.models import Count
>>>
# All users
>>> print(User.objects.all().count())
556
>>>
# All users which are not in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count=0).count())
44
>>>
# All users which are in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).count())
512
>>>
Annotate is similar to distinct in behaviour. It creates a group by query. You can see and inspect the query as following.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).query)
SELECT `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `group_count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) GROUP BY `auth_user`.`id` HAVING COUNT(`auth_user_groups`.`group_id`) > 0 ORDER BY NULL
When you run a 'DISTINCT' query against a database you end up with a listing of each distinct row in the data results. The reason that you have more 'DISTINCT' rows in your Django result is there is a combinatoric cross multiplication going on, creating extra results.
Other answers have mentioned all of this, but since you're asking the why:
The ORM, in this join, would probably allow you to pull fields attached to the group from the query. So if you wanted, say, all these users and all the groups and the group contact for some kind of massive weird mail merge, you could get them.
The post-processing brought on by DISTINCT is narrowing your results down according to the fields you have pulled rather than the rows in the query. If you were to use the PyCharm debugger or something, you might find that the groups aren't as easy to access using various ORM syntax when you have the distinct as when you don't.
I want to execute a simple query like:
select *,count('id') from menu_permission group by menu_id
In Django format I have tried:
MenuPermission.objects.all().values('menu_id').annotate(Count('id))
It selects only menu_id. The executed query is:
SELECT `menu_permission`.`menu_id`, COUNT(`menu_permission`.`id`) AS `id__count` FROM `menu_permission` GROUP BY `menu_permission`.`menu_id`
But I need other fields also. If I try:
MenuPermission.objects.all().values('id','menu_id').annotate(Count('id))
It adds 'id' in group by condition.
GROUP BY `menu_permission`.`id`
As a result I am not getting the expected result. How I can get all all fields in the output but group by a single one?
You can try subqueries to do what you need.
In my case I have two tables: Item and Transaction where item_id links to Item
First, I prepare Transaction subquery with group by item_id where I sum all amount fields and mark item_id as pk for outer query.
per_item_total=Transaction.objects.values('item_id').annotate(total=Sum('amount')).filter(item_id=OuterRef('pk'))
Then I select all rows from item plus subquery result as total filed.
items_with_total=Item.objects.annotate(total=Subquery(per_item_total.values('total')))
This produces the following SQL:
SELECT `item`.`id`, {all other item fields},
(SELECT SUM(U0.`amount`) AS `total` FROM `transaction` U0
WHERE U0.`item_id` = `item`.`id` GROUP BY U0.`item_id` ORDER BY NULL) AS `total` FROM `item`
You are trying to achieve this SQL:
select *, count('id') from menu_permission group by menu_id
But normally SQL requires that when a group by clause is used you only include those column names in the select that you are grouping by. This is not a django matter, but that's how SQL group by works.
The rows are grouped by those columns so those columns can be included in select and other columns can be aggregated if you want them to into a value. You can't include other columns directly as they may have more than one value (since the rows are grouped).
For example if you have a column called "permission_code", you could ask for an array of the values in the "permission_code" column when the rows are grouped by menu_id.
Depending on the SQL flavor you are using, this could be in PostgreSQL something like this:
select menu_id, array_agg(permission_code), count(id) from menu_permissions group by menu_id
Similary django queryset can be constructed for this.
Hopefully this helps, but if needed please share more about what you need to do and what your data models are.
The only way currently that it works as expected is to hve your query based on the model you want the GROUP BY to be based on.
In your case it looks like you have a Menu model (menu_id field foreign key) so doing this would give you what you want and will allow getting other aggregate information from your MenuPermission model but will only group by the Menu.id field:
Menu.objects.annotate(perm_count=Count('menupermission__id')).values('perm_count')
Of course there is no need for the "annotate" intermediate step if all you want is that single count.
query = MenuPermission.objects.values('menu_id').annotate(menu_id_count=Count('menu_id'))
You can check your SQL query by print(query.query)
This solution doesn't work, all fields end up in the group by clause, leaving it here because it may still be useful to someone.
model_fields = queryset.model._meta.get_fields()
queryset = queryset.values('menu_id') \
.annotate(
count=Count('id'),
**{field.name: F(field.name) for field in model_fields}
)
What i'm doing is getting the list of fields of our model, and set up a dictionary with the field name as key and an F instance with the field name as a parameter.
When unpacked (the **) it gets interpreted as named arguments passed into the annotate function.
For example, if we had a "name" field on our model, this annotate call would end up being equal to this:
queryset = queryset.values('menu_id') \
.annotate(
count=Count('id'),
name=F("name")
)
you can use the following code:
MenuPermission.objects.values('menu_id').annotate(Count('id)).values('field1', 'field2', 'field3'...)
I have been trying to group the results of table into Hourly format using DateTimeField.
SQL:
SELECT strftime('%H', created_on), count(*)
FROM users_test
GROUP BY strftime('%H', created_on);
This query works fine, but the corresponding Django query does not.
Django queries I've tried:
Test.objects.extra({'hour': 'strftime("%%H", created_on)'}).values('hour').annotate(count=Count('id'))
# SELECT (strftime("%H", created_on)) AS "hour", COUNT("users_test"."id") AS "count" FROM "users_test" GROUP BY (strftime("%H", created_on)), "users_test"."created_on" ORDER BY "users_test"."created_on" DESC
It adds additional group by "users_test"."created_on", which I guess is giving incorrect results.
It would be great if anyone can explain me this and provide a solution as well.
Environment:
Python 3
Django 1.8.1
Thanks in Advance
References (Possible Duplicates) (But None helping out):
Grouping Django model entries by day using its datetime field
Django - Group By with Date part alone
Django aggregate on .extra values
To fix it, append order_by() to query chain. This will override model Meta default ordering. Like this:
Test
.objects
.extra({'hour': 'strftime("%%H", created_on)'})
.order_by() #<------ here
.values('hour')
.annotate(count=Count('id'))
In my environment ( Postgres also ):
>>> print ( Material
.objects
.extra({'hour': 'strftime("%%H", data_creacio)'})
.order_by()
.values('hour')
.annotate(count=Count('id'))
.query )
SELECT (strftime("%H", data_creacio)) AS "hour",
COUNT("material_material"."id") AS "count"
FROM "material_material"
GROUP BY (strftime("%H", data_creacio))
Learn more in order_by django docs:
If you don’t want any ordering to be applied to a query, not even the default ordering, call order_by() with no parameters.
Side note:
using extra() may introduce SQL injection vulnerability to your code. Use this with precaution and escape any parameters that user can introduce. Compare with docs:
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks .
Please read more about SQL injection protection.
How can we achieve the following via the Django 1.5 ORM:
SELECT TO_CHAR(date, 'IW/YYYY') week_year, COUNT(*) FROM entries GROUP BY week_year;
EDIT: cf. Follow up: Count of Group Elements With Joins in Django in case you need a join.
I had to do something like this recently.
You need to add your week_year column via Django's extra, then you can use that column in the values method.
...it's not obvious but if you then use annotate Django will GROUP BY all of the fields mentioned in the values clause (as described in the docs here https://docs.djangoproject.com/en/dev/topics/db/aggregation/#values)
So your code should look like:
Entry.objects.extra(select={'week_year': "TO_CHAR(date, 'IW/YYYY')"}).values('week_year').annotate(Count('id'))
At the moment I have a query that selects distinct values from a model:
Meeting.objects.values('club').distinct()
In addition to the 'club' field, I also wish to select a 'time' field. In other words I wish to select distinct values of the 'club' field and the associated 'time' field. For example for:
CLUB,TIME
ABC1,10:35
ABC2,10:45
ABC2,10:51
ABC3,11:42
I would want:
ABC1,10:35
ABC2,10:45
ABC3,11:42
What is the syntax for this?
This is possible, but only if your database backend is PostgreSQL. Here how it can be done:
Meeting.objects.order_by('club', 'time').values('club', 'time').distinct('club')
Look documentation for distinct