I know that I can get the number of groups each user has with this query:
User.objects.filter(groups__in=Group.objects.all()).annotate(Count('pk'))
But something is missing:
The users which are in no group at all.
How can I use the django orm to get all users annotated by their group count, incluse users with no group?
You can use Count method with groups attribute:
from django.db.models import Count
User.objects.annotate(group_count=Count('groups'))
I am playing around with django ORM
import django
django.setup()
from django.contrib.auth.models import User, Group
from django.db.models import Count
# All users
print(User.objects.all().count())
# --> 742
# Should be: All users which are in a group.
# But the result is different. I don't understand this.
print(User.objects.filter(groups__in=Group.objects.all()).count())
# --> 1731
# All users which are in a group.
# distinct needed
print(User.objects.filter(groups__in=Group.objects.all()).distinct().count())
# --> 543
# All users which are in a group. Without distinct, annotate seems to do this.
print(User.objects.filter(groups__in=Group.objects.all()).annotate(Count('pk')).count())
# --> 543
# All users which are in no group
print(User.objects.filter(groups__isnull=True).count())
# --> 199
# 199 + 543 = 742 (nice)
I don't understand the second query which returns 1731.
I know that I can use distinct().
Nevertheless 1731 looks like a bug to me.
What is the intention why below query is not distinct/unique?
User.objects.filter(groups__in=Group.objects.all())
Raw MySQL query looks like this:
SELECT user.id, group.id FROM user LEFT JOIN group ON user.group_id = group.id
The result will contain all possible combinations of users and groups and I guess some users belong to more than one group.
You are trying to fetch all users from all groups, but a user can present in multiple groups that's why distinct is required. if you want users ina specific group instead of doing an all try a filter query.
I assume that User.groups is a ForeignKey or some other relationship that associates each User with zero to many Group instances.
So the query which confuses you:
User.objects.filter(groups__in=Group.objects.all())
That query can be described as:
Access the Group model manager (Group.objects).
Make a QuerySet:
Return all Group instances (Group.objects.all()).
Access the User model manager (User.objects).
Make a Queryset:
Join to the Group model, on the User.groups foreign key.
Return every (User + Group) row which has an associated Group.
That is not “all users which are in a group”; instead, it is “All user–group pairs where the group exists”.
By querying on each of the multiple-value User.groups field, you are implying that the query must contain a join from User to Group rows.
Instead, you want:
Access the User model manager (User.objects).
Make a QuerySet:
Return all rows which have groups not empty.
User.objects.filter(groups__isnull=False)
Note that this – “All users which have a non-empty set of associated groups” – is the inverse of another example query you have (“All users which are in no group”).
Since groups is a ManyToManyField the query translated into INNER JOIN statement.
If you print the following you will see the query generated by the QuerySet:
>>> print(User.objects.filter(groups__in=Group.objects.all()).query)
SELECT `auth_user`.`id`, .... , `auth_user`.`date_joined` FROM `auth_user` INNER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE `auth_user_groups`.`group_id` IN (SELECT `auth_group`.`id` FROM `auth_group`)
As you would see the query joins auth_user and auth_user_groups tables.
Where auth_user_groups is the ManyToManyField table not the table for Group model. Thus a user will come more than once.
You would want to use annotate get users having grous, in my case the numbers are following:
$ ./manage.py shell
>>>
>>> from django.contrib.auth.models import User, Group
>>> from django.db.models import Count
>>>
# All users
>>> print(User.objects.all().count())
556
>>>
# All users which are not in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count=0).count())
44
>>>
# All users which are in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).count())
512
>>>
Annotate is similar to distinct in behaviour. It creates a group by query. You can see and inspect the query as following.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).query)
SELECT `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `group_count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) GROUP BY `auth_user`.`id` HAVING COUNT(`auth_user_groups`.`group_id`) > 0 ORDER BY NULL
When you run a 'DISTINCT' query against a database you end up with a listing of each distinct row in the data results. The reason that you have more 'DISTINCT' rows in your Django result is there is a combinatoric cross multiplication going on, creating extra results.
Other answers have mentioned all of this, but since you're asking the why:
The ORM, in this join, would probably allow you to pull fields attached to the group from the query. So if you wanted, say, all these users and all the groups and the group contact for some kind of massive weird mail merge, you could get them.
The post-processing brought on by DISTINCT is narrowing your results down according to the fields you have pulled rather than the rows in the query. If you were to use the PyCharm debugger or something, you might find that the groups aren't as easy to access using various ORM syntax when you have the distinct as when you don't.
Suppose user1 is a staff member belonging to a group g1 in django admin. This group g1 is having access to say a model m1. Now I want to remove the access of m1 model from user1 without removing him from the group g1.
How can it be done?
Implementing it in the code seems to be hackish. It is better to create another group g2, add there user1 and delete him/her from g1. This is the only Django way I know.
I want to somehow join a group with all my permissions.
I want to query ALL permissions, and for each permission query a boolean indicator if the group has it.
So assume this
group = Group.objects.get(pk=1) # specific group
Permission.objects.all().annotate( group has it ??)
group.permissions.all()
won't help since I want to query all permissions.
UPDATE:
Clear explanation:
Assume my Permission table is (pk values): 1, 2 ,3 - total three rows.
Group table: one group with pk=1.
Group-permission (many-to-many) table: group with pk 1, has permission 1,2 (so two rows)
I want to display all the permissions, with an indicator near them whether the group has it.
So in our case I should get :
1 True
2 True
3 False
Cause the group don't have permission with pk=3.
I think the below query would work for you
from django.db.models import Case, When, BooleanField
group_name = 'your group name'
Permission.objects.annotate(
has_perm=Max(Case(
When(group__name=group_name, then=1),
default=0,
output_field=BooleanField()
))
).values_list('name', 'has_perm').order_by('has_perm')
Conditional expressions were introduced in Django 1.8, and aggregation (annotate) is also well documented.
What would be the best method for altering the default group to include a foreign key to a company?
This is using the built - in auth, or would you create a custom auth?
To further update... I would like a many - to - many between groups and company and then a third table for user groups. For example -- company groups (m2m) and user companygroups (m2m)
This will allow organizations to create their own groups if needed...
I have decided to create a m2m between the company table and groups table