Django Model QuerySet Group By Specific Fields

Consider this model and data (ad_id + date is the primary key):
ad_id | date    | clicks
------+---------+-------
3     | 8/10/12 | 124
3     | 7/10/12 | 433
3     | 6/10/12 | 99
4     | 8/10/12 | 23
4     | 7/10/12 | 80
I'm trying to group by ad_id to return the sum of all clicks for each ad.
In SQL terms:
select Ad_id, date, sum(clicks) from ads group by Ad_id
The problem is that Django automatically groups by every field in the model, so the grouping doesn't really work (because each row is unique).
Solutions I've already checked:
I know it is possible to do something like this:
Ad.objects.values('ad_id').annotate(clicks_sum=Sum('clicks'))
But that doesn't help, because it returns dictionaries rather than Ad model instances.
I also can't use raw SQL, because it isn't chainable.
I also tried setting
MyQuerySet.group_by = ['ad_id']
but that doesn't work either.
So I really need to group only by the fields I choose, and still get Ad model instances as the result.

You can perform raw SQL queries using Manager.raw(), in your case that would be:
Ad.objects.raw('select Ad_id, date, sum(clicks) from ads group by Ad_id')
This method takes a raw SQL query, executes it, and returns a RawQuerySet instance. This RawQuerySet instance can be iterated over just like a normal QuerySet to provide model instances.

Related

Django select query with where clause

Sheet_Table
id | ref_id | name | data
---+--------+------+-----
1  | 10     | A    | 9078
2  | 10     | AAA  | 6789
3  | 12     | C    | 345
The Sheet model has the columns id, ref_id, name and data.
Now I want to write this query in Django:
select data from Sheet_Table where ref_id=10
Here the model/table name is Sheet_Table.
The Django documentation on making queries states explicitly that filter(foo=bar) evaluates to a WHERE clause. In your specific case, try this to get just the data values (assuming your model really is called Sheet_Table):
Sheet_Table.objects.filter(ref_id=10).values_list('data', flat=True)
or you can leave off the values_list part if you want to iterate over the model objects (e.g., if you want to examine id as well as data).
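As a rough illustration of what that query does at the SQL level, here is a minimal sketch using Python's sqlite3 module instead of Django. The lowercase table name sheet_table is an assumption, and the ORDER BY is added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sheet_table (id INTEGER PRIMARY KEY, ref_id INTEGER, name TEXT, data INTEGER)"
)
conn.executemany(
    "INSERT INTO sheet_table VALUES (?, ?, ?, ?)",
    [(1, 10, "A", 9078), (2, 10, "AAA", 6789), (3, 12, "C", 345)],
)

# filter(ref_id=10) becomes the WHERE clause; values_list('data', flat=True)
# keeps only the data column and flattens the 1-tuples into plain values.
cursor = conn.execute("SELECT data FROM sheet_table WHERE ref_id = ? ORDER BY id", (10,))
data = [row[0] for row in cursor]
print(data)  # [9078, 6789]
```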

Django-ORM: distinct is needed. Why?

I am playing around with the Django ORM:
import django
django.setup()
from django.contrib.auth.models import User, Group
from django.db.models import Count
# All users
print(User.objects.all().count())
# --> 742
# Should be: All users which are in a group.
# But the result is different. I don't understand this.
print(User.objects.filter(groups__in=Group.objects.all()).count())
# --> 1731
# All users which are in a group.
# distinct needed
print(User.objects.filter(groups__in=Group.objects.all()).distinct().count())
# --> 543
# All users which are in a group. Without distinct, annotate seems to do this.
print(User.objects.filter(groups__in=Group.objects.all()).annotate(Count('pk')).count())
# --> 543
# All users which are in no group
print(User.objects.filter(groups__isnull=True).count())
# --> 199
# 199 + 543 = 742 (nice)
I don't understand the second query, which returns 1731.
I know that I can use distinct().
Nevertheless, 1731 looks like a bug to me.
Why is the query below not distinct?
User.objects.filter(groups__in=Group.objects.all())
Simplified, the raw MySQL query looks like this:
SELECT user.id, user_groups.group_id FROM user INNER JOIN user_groups ON user.id = user_groups.user_id
The result contains one row for every (user, group) membership, and I guess some users belong to more than one group.
You are fetching users from all groups, but a user can be present in several groups, which is why distinct() is required. If you want the users in a specific group, filter on that group instead of using all().
I assume that User.groups is a ForeignKey or some other relationship that associates each User with zero to many Group instances.
So the query which confuses you:
User.objects.filter(groups__in=Group.objects.all())
That query can be described as:
- Access the Group model manager (Group.objects).
- Make a QuerySet:
  - Return all Group instances (Group.objects.all()).
- Access the User model manager (User.objects).
- Make a QuerySet:
  - Join to the Group model, on the User.groups foreign key.
  - Return every (User + Group) row which has an associated Group.
That is not “all users which are in a group”; instead, it is “All user–group pairs where the group exists”.
By filtering on the multi-valued User.groups field, you imply that the query must join User rows to Group rows.
Instead, you want:
- Access the User model manager (User.objects).
- Make a QuerySet:
  - Return all users whose set of groups is not empty.
User.objects.filter(groups__isnull=False)
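Note that this (“All users which have a non-empty set of associated groups”) is the inverse of another example query you have (“All users which are in no group”). Because it still joins through the many-to-many table, add .distinct() if some users belong to several groups.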
Since groups is a ManyToManyField, the query is translated into an INNER JOIN.
If you print the following you will see the query generated by the QuerySet:
>>> print(User.objects.filter(groups__in=Group.objects.all()).query)
SELECT `auth_user`.`id`, .... , `auth_user`.`date_joined` FROM `auth_user` INNER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) WHERE `auth_user_groups`.`group_id` IN (SELECT `auth_group`.`id` FROM `auth_group`)
As you can see, the query joins the auth_user and auth_user_groups tables.
auth_user_groups is the intermediate table for the ManyToManyField, not the table for the Group model, so a user appears once for every group they belong to.
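The row multiplication is easy to reproduce outside Django. The minimal sketch below uses Python's sqlite3 module with hypothetical users (alice in two groups, bob in one, carol in none). It runs the same INNER JOIN shape as the printed query, first without DISTINCT and then with it. Table names follow the printed query; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE auth_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE auth_user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO auth_group VALUES (1, 'staff'), (2, 'editors');
    -- hypothetical memberships: alice in two groups, bob in one, carol in none
    INSERT INTO auth_user_groups VALUES (1, 1), (1, 2), (2, 1);
""")

join_sql = """
    SELECT {cols} FROM auth_user
    INNER JOIN auth_user_groups ON auth_user.id = auth_user_groups.user_id
    WHERE auth_user_groups.group_id IN (SELECT id FROM auth_group)
"""
# One row per membership: alice appears twice.
plain = conn.execute(join_sql.format(cols="auth_user.id")).fetchall()
# DISTINCT collapses the duplicates back to one row per user.
distinct = conn.execute(join_sql.format(cols="DISTINCT auth_user.id")).fetchall()
print(len(plain), len(distinct))  # 3 2
```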
You could use annotate() to get the users that have groups; in my case the numbers are as follows:
$ ./manage.py shell
>>>
>>> from django.contrib.auth.models import User, Group
>>> from django.db.models import Count
>>>
# All users
>>> print(User.objects.all().count())
556
>>>
# All users which are not in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count=0).count())
44
>>>
# All users which are in a group.
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).count())
512
>>>
Here annotate() has an effect similar to distinct(): it produces a GROUP BY query, so each user appears only once. You can inspect the query as follows:
>>> print(User.objects.annotate(group_count=Count('groups')).filter(group_count__gt=0).query)
SELECT `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, COUNT(`auth_user_groups`.`group_id`) AS `group_count` FROM `auth_user` LEFT OUTER JOIN `auth_user_groups` ON (`auth_user`.`id` = `auth_user_groups`.`user_id`) GROUP BY `auth_user`.`id` HAVING COUNT(`auth_user_groups`.`group_id`) > 0 ORDER BY NULL
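The printed SQL can be mimicked with Python's sqlite3 module to show why counting works. The LEFT OUTER JOIN keeps users who have no groups, GROUP BY folds each user back into one row, and HAVING filters on the per-user count. This is a minimal sketch with hypothetical data (alice in two groups, bob in one, carol in none), not Django's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE auth_user_groups (user_id INTEGER, group_id INTEGER);
    INSERT INTO auth_user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO auth_user_groups VALUES (1, 1), (1, 2), (2, 1);
""")

# Same shape as annotate(group_count=Count('groups')).filter(...):
# COUNT(column) ignores the NULL produced for users without groups.
sql = """
    SELECT auth_user.id, COUNT(auth_user_groups.group_id) AS group_count
    FROM auth_user
    LEFT OUTER JOIN auth_user_groups ON auth_user.id = auth_user_groups.user_id
    GROUP BY auth_user.id
    HAVING {having}
    ORDER BY auth_user.id
"""
in_a_group = conn.execute(sql.format(having="COUNT(auth_user_groups.group_id) > 0")).fetchall()
in_no_group = conn.execute(sql.format(having="COUNT(auth_user_groups.group_id) = 0")).fetchall()
print(in_a_group)   # [(1, 2), (2, 1)]
print(in_no_group)  # [(3, 0)]
```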
When you run a DISTINCT query against a database, you get each distinct row of the result only once. The reason you get more rows in your Django result without DISTINCT is that the join multiplies each user by the number of groups they belong to, creating extra results.
Other answers have mentioned all of this, but since you're asking the why:
The ORM, in this join, would probably allow you to pull fields attached to the group from the query. So if you wanted, say, all these users and all the groups and the group contact for some kind of massive weird mail merge, you could get them.
The post-processing brought on by DISTINCT is narrowing your results down according to the fields you have pulled rather than the rows in the query. If you were to use the PyCharm debugger or something, you might find that the groups aren't as easy to access using various ORM syntax when you have the distinct as when you don't.

How to group by AND aggregate with Django

I have a fairly simple query I'd like to make via the ORM, but I can't figure it out.
I have three models:
Location (a place), Attribute (an attribute a place might have), and Rating (a M2M 'through' model that also contains a score field)
I want to pick some important attributes and be able to rank my locations by those attributes - i.e. higher total score over all selected attributes = better.
I can use the following SQL to get what I want:
select location_id, sum(score)
from locations_rating
where attribute_id in (1,2,3)
group by location_id order by sum desc;
which returns
location_id | sum
-------------+-----
21 | 12
3 | 11
The closest I can get with the ORM is:
Rating.objects.filter(
attribute__in=attributes).annotate(
acount=Count('location')).aggregate(Sum('score'))
Which returns
{'score__sum': 23}
i.e. the sum of all, not grouped by location.
Any way around this? I could execute the SQL manually, but would rather go via the ORM to keep things consistent.
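Try this:
from django.db.models import Sum
# The annotation needs a name other than "score", which is already a field on Rating.
Rating.objects.filter(attribute__in=attributes) \
    .values('location') \
    .annotate(total_score=Sum('score')) \
    .order_by('-total_score')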
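The key point is that calling values('location') before annotate() makes the Sum group by location, instead of collapsing everything into one total as aggregate() does. The minimal sketch below runs the equivalent SQL with Python's sqlite3 module. The ratings are hypothetical, and the alias total_score is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations_rating (location_id INTEGER, attribute_id INTEGER, score INTEGER)"
)
# Hypothetical sample ratings: (location_id, attribute_id, score).
conn.executemany(
    "INSERT INTO locations_rating VALUES (?, ?, ?)",
    [(1, 1, 4), (1, 2, 2), (2, 1, 5), (2, 3, 3), (2, 4, 9), (3, 4, 7)],
)

selected = (1, 2, 3)  # the "important" attribute ids
placeholders = ", ".join("?" for _ in selected)
# filter(...) -> WHERE, values('location') -> GROUP BY location_id,
# annotate(total_score=Sum('score')) -> SUM(score), order_by('-total_score') -> ORDER BY ... DESC
ranking = conn.execute(
    f"""SELECT location_id, SUM(score) AS total_score
        FROM locations_rating
        WHERE attribute_id IN ({placeholders})
        GROUP BY location_id
        ORDER BY total_score DESC""",
    selected,
).fetchall()
print(ranking)  # [(2, 8), (1, 6)]
```

Location 3 only has a rating for attribute 4, so the WHERE clause drops it before grouping.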
Can you try this:
Rating.objects.values('location_id').filter(attribute__in=attributes).annotate(sum_score=Sum('score')).order_by('-sum_score')

Latest post by different users

Post has user and date attributes.
How can I turn this
posts = Post.objects.order_by('-date')[:30]
to give me 30 or less posts consisting of the last post by every user?
For example, if I have 4 posts stored, 3 from post.user="Benny" and 1 from post.user="Catherine", it should return 1 post from Benny and 1 from Catherine, ordered by date.
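I would probably use annotate(); you might be able to use extra() to get it down to a single query.
from django.db.models import Max
posts = []
# Skip users with no posts: their last_post is NULL and latest() would raise DoesNotExist.
for u in User.objects.annotate(last_post=Max('post__date')).filter(last_post__isnull=False).order_by('-last_post')[:30]:
    posts.append(u.post_set.latest('date'))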
You could also use raw, which would let you write a SQL query, but still return model instances. For instance:
sql = """
SELECT * FROM app_post
WHERE app_post.date IN
(SELECT MAX(app_post.date) FROM app_post
GROUP BY app_post.user_id)
ORDER BY app_post.date DESC
"""
posts = list(Post.objects.raw(sql))
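The "latest post per user" query can be checked in isolation with Python's sqlite3 module. This sketch uses the question's example (three posts by Benny, one by Catherine) with hypothetical ISO dates, which sort correctly as text. The subquery is correlated on user_id, so a post can only match its own user's maximum date. An uncorrelated IN (SELECT MAX(date) ... GROUP BY user_id) would also accept a post that merely shares a date with some other user's latest post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_post (id INTEGER PRIMARY KEY, user_id TEXT, date TEXT)")
# Hypothetical posts: three by Benny, one by Catherine.
conn.executemany(
    "INSERT INTO app_post (user_id, date) VALUES (?, ?)",
    [("Benny", "2012-10-01"), ("Benny", "2012-10-03"),
     ("Catherine", "2012-10-02"), ("Benny", "2012-10-04")],
)

# Keep a post only if its date equals the maximum date among that same user's posts.
latest = conn.execute(
    """SELECT user_id, date FROM app_post AS p
       WHERE p.date = (SELECT MAX(p2.date) FROM app_post AS p2 WHERE p2.user_id = p.user_id)
       ORDER BY p.date DESC
       LIMIT 30"""
).fetchall()
print(latest)  # [('Benny', '2012-10-04'), ('Catherine', '2012-10-02')]
```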
Untested, but you might be able to get each user's latest post date with:
from django.db.models import Max
Post.objects.values('user').annotate(last_date=Max('date')).order_by('-last_date')

Grouping Custom Attributes in a Query

I have an application that allows "contacts" to be completely customized. My approach is to let the administrator set up all of the fields allowed for a contact. My database is as follows:
Contacts: id, active, lastactive, created_on
Fields: id, label
FieldValues: id, fieldid, contactid, response
So the contacts table only records whether a contact is active, plus its identifier; the fields table only holds each field's label and identifier; and the fieldvalues table is what actually holds the data for contacts (name, address, etc.).
This setup has worked just fine for me until now. The client would now like to pull a cumulative report, say, the number of contacts in each city, grouped by state. Effectively the data would have to look like the following:
California (from fields table)
    Costa Mesa (from fields table) - 5 (counted in fieldvalues table)
    Newport - 2
Connecticut
    Wallingford - 2
    Clinton - 2
    Berlin - 5
The state field might be id 6 and the city field might be id 4. I don't know if I have just been looking at this code way too long to figure it out, or what.
The SQL to create those three tables can be found at https://s3.amazonaws.com/davejlong/Contact.sql
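One way to produce the per-state, per-city counts directly from the EAV layout is to self-join the fieldvalues table. One alias supplies each contact's state and the other supplies its city, matched on contactid. This is a minimal sketch using Python's sqlite3 module. It uses the example field ids from the question (state = 6, city = 4) and hypothetical responses, and it is not the schema from the linked file.

```python
import sqlite3

STATE_FIELD, CITY_FIELD = 6, 4  # example ids from the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fieldvalues (id INTEGER PRIMARY KEY, fieldid INTEGER, contactid INTEGER, response TEXT)"
)
# Hypothetical responses for four contacts.
conn.executemany(
    "INSERT INTO fieldvalues (fieldid, contactid, response) VALUES (?, ?, ?)",
    [(6, 1, "California"), (4, 1, "Costa Mesa"),
     (6, 2, "California"), (4, 2, "Costa Mesa"),
     (6, 3, "California"), (4, 3, "Newport"),
     (6, 4, "Connecticut"), (4, 4, "Berlin")],
)

# s = the contact's state row, c = the same contact's city row; count contacts per pair.
report = conn.execute(
    """SELECT s.response AS state, c.response AS city, COUNT(*) AS contacts
       FROM fieldvalues AS s
       JOIN fieldvalues AS c ON c.contactid = s.contactid AND c.fieldid = ?
       WHERE s.fieldid = ?
       GROUP BY s.response, c.response
       ORDER BY state, city""",
    (CITY_FIELD, STATE_FIELD),
).fetchall()
print(report)
# [('California', 'Costa Mesa', 2), ('California', 'Newport', 1), ('Connecticut', 'Berlin', 1)]
```

Each additional attribute in the report needs another self-join, which is the cost the answer below points out.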
You've got an Entity Attribute Value (EAV) model. Use the field and fieldvalue tables for searching only, i.e. in the WHERE clause. Then make life easier by keeping the full entity's data in a CLOB on the main table (e.g. Contacts.data) in a serialized format (WDDX is good for this). Read the data column out, deserialize it, and work with it on the server side. This is much easier than the myriad of joins you'd otherwise need to reproduce the fully hydrated entity from an EAV setup.
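A minimal sketch of that pattern, with two substitutions: JSON stands in for WDDX, and Python's sqlite3 stands in for the real database. The helper save_contact and the FIELD_IDS label-to-id mapping are hypothetical. Each contact's full data is written once into Contacts.data. The EAV rows are used only to find matching contact ids, and the serialized column is then read back and deserialized.

```python
import json
import sqlite3

FIELD_IDS = {"city": 4, "state": 6}  # hypothetical label -> field id mapping

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, active INTEGER, data TEXT);
    CREATE TABLE fieldvalues (id INTEGER PRIMARY KEY, fieldid INTEGER, contactid INTEGER, response TEXT);
""")

def save_contact(contact_id, fields):
    """Hypothetical helper: store the whole entity as JSON, plus EAV rows for searching."""
    conn.execute(
        "INSERT INTO contacts (id, active, data) VALUES (?, 1, ?)",
        (contact_id, json.dumps(fields)),
    )
    conn.executemany(
        "INSERT INTO fieldvalues (fieldid, contactid, response) VALUES (?, ?, ?)",
        [(FIELD_IDS[label], contact_id, value) for label, value in fields.items()],
    )

save_contact(1, {"state": "California", "city": "Costa Mesa"})
save_contact(2, {"state": "Connecticut", "city": "Berlin"})

# Search through the EAV table (WHERE clause only), then deserialize the stored entity.
rows = conn.execute(
    """SELECT contacts.data FROM contacts
       WHERE contacts.id IN (SELECT contactid FROM fieldvalues WHERE fieldid = ? AND response = ?)""",
    (FIELD_IDS["state"], "California"),
).fetchall()
contacts = [json.loads(data) for (data,) in rows]
print(contacts)  # [{'state': 'California', 'city': 'Costa Mesa'}]
```

The trade-off is that the serialized column and the EAV rows must be kept in sync on every write.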