Django group by in many to many relationships

Django group by in many to many relationships - django

I have a model named Evaluation with following schema:
user = models.ForeignKey(User)
value = models.IntegerField()
The value field will take value in 0,1,2,3.
Now I want to get the count of evaluations of a given user with each value. For example, suppose my data are:
user.id | value
1 | 0
1 | 0
1 | 1
1 | 2
1 | 3
1 | 3
I want to get the result
value | count
0 | 2
1 | 1
2 | 1
3 | 2
I use the query
Evaluation.objects.filter(user=request.user).annotate(count=Count('value')).order_by('value')
But it does not return the correct answer. Could anyone help?

you can do it this way:
Evaluation.objects.filter(user=request.user).values('value').annotate(count=Count('value')).order_by('value')

Add the values() method:
Evaluation.objects.filter(user_id=request.user) \
.values('value').annotate(count=Count('value')) \
.order_by('value')

You could build reverse query and query the User model instead:
User.objects.filter(user=request.user).values('evaluation__value').annotate(count=Count('evaluation__user'))
which will produce below results:
[{'count': 1, 'evaluation__value': 1}, {'count': 1, 'evaluation__value': 2}, {'count': 2, 'evaluation__value': 0}, {'count': 2, 'evaluation__value': 3}]
Additionally you might want to sort the results:
queryset.order_by('-count') # sorts by count desc
Unfortunately you cannot alias the value in values queryset method hence the ugly evaluation__value as field name. See this Django ticket.
HTH.

Related

Django race condition aggregate(Max) in F() expression

Imagine the following model:
class Item(models.Model):
# natural PK
increment = models.PositiveIntegerField(_('Increment'), null=True, blank=True, default=None)
# other fields
When an item is created, I want the increment fields to automatically acquire the maximum value is has across the whole table, +1. For example:
|_item____________________________|
|_id_|_increment__________________|
| 1 | 1 |
| 2 | 2 |
| 4 | 3 | -> id 3 was deleted at some stage..
| 5 | 4 |
| 6 | 5 |
.. etc
When a new Item() comes in and is saved(), how in one pass, and in way that will avoid race conditions, make sure it will have increment 6 and not 7 in case another process does exactly the same thing, at the same time?
I have tried:
with transaction.atomic():
i = Item()
highest_increment = Item.objects.all().aggregate(Max('increment'))
i.increment = highest_increment['increment__max']
i.save()
I would like to be able to create it in a way similar to the following, but that obviously does not work (have checked places like https://docs.djangoproject.com/en/3.2/ref/models/expressions/#avoiding-race-conditions-using-f):
from django.db.models import Max, F
i = Item(
increment=F(Max(increment))
)
Many thanks

Django ORM. Select only duplicated fields from DB

I have table in DB like this:
MyTableWithValues
id | user(fk to Users) | value(fk to Values) | text | something1 | something2 ...
1 | userobject1 | valueobject1 |asdasdasdasd| 123 | 12321
2 | userobject2 | valueobject50 |QWQWQWQWQWQW| 515 | 5555455
3 | userobject1 | valueobject1 |asdasdasdasd| 12345 | 123213
I need to delete all objects where are repeated fields user, value and text, but save one from them. In this example will be deleted 3rd record.
How can I do this, using Django ORM?
PS:
try this:
recs = (
MyTableWithValues.objects
.order_by()
.annotate(max_id=Max('id'), count_id=Count('user__id'))
#.filter(count_id__gt=1)
.annotate(count_values=Count('values'))
#.filter(count_icd__gt=1)
)
...
...
for r in recs:
print(r.id, r.count_id, , r.count_values)
it prints something like this:
1 1 1
2 1 1
3 1 1
...
Dispite the fact, that in database there are duplicated values. I cant understand, why Count function does not work.
Can anybody help me?

You should first be aware of how count works.
The Count method will count for identical rows.
It uses all the fields available in an object to check if it is identical with fields of other rows or not.
So in current situation the count_values is resulting 1 because Count is using all fields excluding id to look for similar rows.
Count is including user,value,text,something1,something2 fields to check for similarity.
To count rows with similar fields you have to use only user,values & text field
Query:
recs = MyTableWithValues.objects
.values('user','values','text')
.annotate(max_id=Max('id'),count_id=Count('user__id'))
.annotate(count_values=Count('values'))
It will return a list of dictionary
print(recs)
Output:
<QuerySet[{'user':1,'values':1,'text':'asdasdasdasd','max_id':3,'count_id':2,'count_values':2},{'user':2,'values':2,'text':'QWQWQWQWQWQW','max_id':2,'count_id':1,'count_values':1}]
using this queryset you can check how many times a row contains user,values & text field with same values

Would a Python loop work for you?
import collections
d = collections.defaultdict(list)
# group all objects by the key
for e in MyTableWithValues.objects.all():
k = (e.user_id, e.value_id, e.text)
d[k].append(e)
for k, obj_list in d.items():
if len(obj_list) > 1:
for e in obj_list[1:]:
# except the first one, delete all objects
e.delete()

Compare fields within relationship on Django ORM

I have two models, route and stop.
A route can have several stop, each stop have a name and a number. On same route, stop.number are unique.
The problem:
I need to search which route has two different stops and one stop.number is less than the other stop.number
Consider the following models:
class Route(models.Model):
name = models.CharField(max_length=20)
class Stop(models.Model):
route = models.ForeignKey(Route)
number = models.PositiveSmallIntegerField()
location = models.CharField(max_length=45)
And the following data:
Stop table
| id | route_id | number | location |
|----|----------|--------|----------|
| 1 | 1 | 1 | 'A' |
| 2 | 1 | 2 | 'B' |
| 3 | 1 | 3 | 'C' |
| 4 | 2 | 1 | 'C' |
| 5 | 2 | 2 | 'B' |
| 6 | 2 | 3 | 'A' |
In example:
Given two locations 'A' and 'B', search which routes have both location and A.number is less than B.number
With the previous data, it should match route id 1 and not route id 2
On raw SQL, this works with a single query:
SELECT
`route`.id
FROM
`route`
LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
WHERE
stop_from.`stop_location_id` = 'A'
AND stop_to.`stop_location_id` = 'B'
AND stop_from.stop_number < stop_to.stop_number
Is this possible to do with one single query on Django ORM as well?

Generally ORM frameworks like Django ORM, SQLAlchemy and even Hibernate is not design to autogenerate most efficient query. There is a way to write this query only using Model objects, however, since I had similar issue, I would suggest to use raw query for more complex queries. Following is link for Django raw query:
[https://docs.djangoproject.com/en/1.11/topics/db/sql/]
Although, you can write your query in many ways but something like following could help.
from django.db import connection
def my_custom_sql(self):
with connection.cursor() as cursor:
cursor.execute("SELECT
`route`.id
FROM
`route`
LEFT JOIN `stop` stop_from ON stop_from.`route_id` = `route`.`id`
LEFT JOIN `stop` stop_to ON stop_to.`route_id` = `route`.`id`
WHERE
stop_from.`stop_location_id` = %s
AND stop_to.`stop_location_id` = %s
AND stop_from.stop_number < stop_to.stop_number", ['A', 'B'])
row = cursor.fetchone()
return row
hope this helps.

Weird behavior in Django queryset union of values

I want to join the sum of related values from users with the users that do not have those values.
Here's a simplified version of my model structure:
class Answer(models.Model):
person = models.ForeignKey(Person)
points = models.PositiveIntegerField(default=100)
correct = models.BooleanField(default=False)
class Person(models.Model):
# irrelevant model fields
Sample dataset:
Person | Answer.Points
------ | ------
3 | 50
3 | 100
2 | 100
2 | 90
Person 4 has no answers and therefore, points
With the query below, I can achieve the sum of points for each person:
people_with_points = Person.objects.\
filter(answer__correct=True).\
annotate(points=Sum('answer__points')).\
values('pk', 'points')
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}]>
But, since some people might not have any related Answer entries, they will have 0 points and with the query below I use Coalesce to "fake" their points, like so:
people_without_points = Person.objects.\
exclude(pk__in=people_with_points.values_list('pk')).\
annotate(points=Coalesce(Sum('answer__points'), 0)).\
values('pk', 'points')
<QuerySet [{'pk': 4, 'points': 0}]>
Both of these work as intended but I want to have them in the same queryset so I use the union operator | to join them:
everyone = people_with_points | people_without_points
Now, for the problem:
After this, the people without points have their points value turned into None instead of 0.
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}, {'pk': 4, 'points': None}]>
Anyone has any idea of why this happens?
Thanks!

I should mention that I can fix that by annotating the queryset again and coalescing the null values to 0, like this:
everyone.\
annotate(real_points=Concat(Coalesce(F('points'), 0), Value(''))).\
values('pk', 'real_points')
<QuerySet [{'pk': 2, 'real_points': 190}, {'pk': 3, 'real_points': 150}, {'pk': 4, 'real_points': 0}]>
But I wish to understand why the union does not work as I expected in my original question.
EDIT:
I think I got it. A friend instructed me to use django-debug-toolbar to check my SQL queries to investigate further on this situation and I found out the following:
Since it's a union of two queries, the second query annotation is somehow not considered and the COALESCE to 0 is not used. By moving that to the first query it is propagated to the second query and I could achieve the expected result.
Basically, I changed the following:
# Moved the "Coalesce" to the initial query
people_with_points = Person.objects.\
filter(answer__correct=True).\
annotate(points=Coalesce(Sum('answer__points'), 0)).\
values('pk', 'points')
# Second query does not have it anymore
people_without_points = Person.objects.\
exclude(pk__in=people_with_points.values_list('pk')).\
values('pk', 'points')
# We will have the values with 0 here!
everyone = people_with_points | people_without_points

django - OR query using lambda

I want to perform OR query using django ORM. I referred this answer and it fits my need.
I have a list of integers which gets generated dynamically. These integers represent user id in a particular table. This table also has a date field. I want to query the database for all user ids in the list for a given date.
For example: From below table, I want records for user ids 2 and 3 for the date 2015-02-28
id | date
---------------
1 | 2015-02-23
1 | 2015-02-25
1 | 2015-02-28
2 | 2015-02-28
2 | 2015-03-01
3 | 2015-02-28
I am unable to figure out which of the following two should be perfect for my use case:
Table.objects.filter(reduce(lambda x, y: (x | y) & Q(date=datetime.date(2015, 2, 28)), [Q(user_id=i) for i in ids])
OR
Table.objects.filter(reduce(lambda x, y: (x | y), [Q(user_id=i) for i in ids]) & Q(date=datetime.date(2015, 2, 28))
Both of the above yield similar output at the moment. Without lambda, below query would fit my need:
Table.objects.filter(Q(user_id=3) & Q(date=datetime.date(2015, 2, 28))| Q(user_id=2) & Q(date=datetime.date(2015, 2, 28)))

I think you do not need reduce and Q objects here, you can just do:
Table.objects.filter(
user_id__in=[2,3],
date=datetime.date(2015, 2, 28),
)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django group by in many to many relationships - django

you can do it this way: Evaluation.objects.filter(user=request.user).values('value').annotate(count=Count('value')).order_by('value')

Add the values() method: Evaluation.objects.filter(user_id=request.user) \ .values('value').annotate(count=Count('value')) \ .order_by('value')

Related

Django race condition aggregate(Max) in F() expression

Django ORM. Select only duplicated fields from DB

Compare fields within relationship on Django ORM

Weird behavior in Django queryset union of values

django - OR query using lambda

Categories

Resources