How to group by related model count in Django ORM

Let's say I have the following models:
class Conversation(Model):
    ...
class Message(Model):
    conversation = models.ForeignKey(Conversation)
The following code counts the number of messages for each conversation:
Conversation.objects \
    .filter(...) \
    .annotate(size=Count("message")) \
    .values("size")
It returns a queryset like {'size': 0}, {'size': 1}, {'size': 0}, {'size': 0}, and so on.
But how do I aggregate over the conversation sizes? That is, how do I obtain a result like (0, 3), (1, 1), where the first element in each pair is the conversation size and the second element is the number of conversations of that size?
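If tallying in Python is acceptable, one possible approach is to fetch the annotated sizes and count them with collections.Counter. The sketch below is only an illustration: the `sizes` list is an assumption standing in for the result of `Conversation.objects.filter(...).annotate(size=Count("message")).values_list("size", flat=True)`, so that the snippet runs without a database.

```python
from collections import Counter

# Stand-in for the annotated queryset values (hypothetical data matching the question)
sizes = [0, 1, 0, 0]

# Tally how many conversations have each size, sorted by size
size_histogram = sorted(Counter(sizes).items())
print(size_histogram)  # [(0, 3), (1, 1)]
```

This moves the second level of grouping out of the database, which is simple but pulls one value per conversation into memory.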

Related

How to get top 5 records in django dict data

I have two tables: 1) visits and 2) disease. The visits table has a column for the disease. I'm trying to get the top 5 diseases from the visits table.
dis = disease.objects.all()
for d in dis:
    v = visits.objects.filter(disease=d.disease_name).count()
    data = {
        d.disease_name: v
    }
    print(data)
This prints every disease with its count, as below:
{'Headache': 2}
{'Cold': 1}
{'Cough': 4}
{'Dog Bite': 0}
{'Fever': 2}
{'Piles': 3}
{'Thyroid': 4}
{'Others': 9}
I want to get top 5 from this list based on count. How to do it?
Thank you all for your replies. I found another simple solution:
from django.db.models import Count
x = visits.objects.values('disease').annotate(disease_count=Count('disease')).order_by('-disease_count')[:5]
print(x)
It returns the following:
<QuerySet [{'disease': 'Others', 'disease_count': 9}, {'disease': 'Thyroid', 'disease_count': 4}, {'disease': 'Cough', 'disease_count': 4}, {'disease': 'Piles', 'disease_count': 3}, {'disease': 'Headache', 'disease_count': 2}]>
I think this is the simplest solution. It works for me.
Add data in a list and sort list based on what you want:
dis = disease.objects.all()
l = list()
for d in dis:
    v = visits.objects.filter(disease=d.disease_name).count()
    data = {
        d.disease_name: v
    }
    l.append(data)
l.sort(reverse=True, key=lambda x: list(x.values())[0])
for i in range(min(len(l), 5)):
    print(l[i])
You can sort these values by writing code like that:
import operator

diseases = list(Disease.objects.values_list('disease_name', flat=True))
visits = list(
    Visits.objects.filter(disease__in=diseases).values_list('disease', flat=True))
data = {}
for name in diseases:
    count = visits.count(name)
    data[name] = count
sorted_data = sorted(data.items(), key=operator.itemgetter(1), reverse=True)
new_data = {}
for idx in range(min(len(sorted_data), 5)):
    item = sorted_data[idx]
    new_data[item[0]] = item[1]
print(new_data)
It's a little messy, but it does the job.
I also reduced the number of queries, so the code should run a bit faster. Wrapping .values_list(...) in list() fetches the values once and keeps them in memory, and counting with Python's list.count() avoids hitting the database once per disease.
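Once the per-disease counts are in memory, collections.Counter can pick the top entries directly. This is a minimal sketch, not taken from any of the answers: `visit_counts` is a hypothetical dict whose numbers are copied from the printed output in the question.

```python
from collections import Counter

# Hypothetical counts, copied from the printed output in the question
visit_counts = {
    'Headache': 2, 'Cold': 1, 'Cough': 4, 'Dog Bite': 0,
    'Fever': 2, 'Piles': 3, 'Thyroid': 4, 'Others': 9,
}

# most_common(5) returns the five (name, count) pairs with the highest counts
top_five = Counter(visit_counts).most_common(5)
for name, count in top_five:
    print(name, count)
```

The same Counter could be built from a flat list of disease names, for example the `visits` list in the answer above, by passing the list to Counter directly.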

Get top n records for each group with Django queryset

I have a model like the following Table,
create table `mytable`
(
    `person` varchar(10),
    `groupname` int,
    `age` int
);
And I want to get the 2 oldest people from each group. The original SQL question and its answers are on Stack Overflow, and one of the solutions that works is:
SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2
You can also check the SQL output on SQL Fiddle.
I just want to know how I can implement this query as a queryset in a Django view.
Another SQL query with similar output would use a window function that annotates each row with its row number within its group, and then keep only the rows whose number is less than or equal to 2 in an outer query (window function results cannot be filtered in the same query's WHERE or HAVING clause).
At the time of writing, Django does not support filtering on a window function result, so you need to compute the row numbers in a first query and filter the Person rows in a second query.
The following code is based on a similar question, but it limits the number of rows returned per group_name.
from django.db.models import F, Window
from django.db.models.functions import RowNumber

# First query: number the rows within each group, oldest first,
# and keep the ids of the first two rows per group.
person_ids = {
    pk
    for pk, row_no_in_group in Person.objects.annotate(
        row_no_in_group=Window(
            expression=RowNumber(),
            partition_by=[F('group_name')],
            order_by=['group_name', F('age').desc(), 'person']
        )
    ).values_list('id', 'row_no_in_group')
    if row_no_in_group <= 2
}
# Second query: fetch the matching people.
filtered_persons = Person.objects.filter(id__in=person_ids)
For the following state of the Person table
>>> Person.objects.order_by('group_name', '-age', 'person').values_list('group_name', 'age', 'person')
<QuerySet [(1, 19, 'Brian'), (1, 17, 'Brett'), (1, 14, 'Teresa'), (1, 13, 'Sydney'), (2, 20, 'Daniel'), (2, 18, 'Maureen'), (2, 14, 'Vincent'), (2, 12, 'Carlos'), (2, 11, 'Kathleen'), (2, 11, 'Sandra')]>
the queries above return
>>> filtered_persons.order_by('group_name', '-age', 'person').values_list('group_name', 'age', 'person')
<QuerySet [(1, 19, 'Brian'), (1, 17, 'Brett'), (2, 20, 'Daniel'), (2, 18, 'Maureen')]>
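The row-numbering idea can also be mirrored in plain Python, which may help when checking the expected output. The sketch below is an illustration only: `rows` is a hard-coded copy of the (group_name, age, person) tuples shown above, already sorted by group and then by age descending, and itertools.groupby plays the role of the partition.

```python
from itertools import groupby, islice
from operator import itemgetter

# Copy of the sample rows above: (group_name, age, person),
# already ordered by group_name, then age descending, then person
rows = [
    (1, 19, 'Brian'), (1, 17, 'Brett'), (1, 14, 'Teresa'), (1, 13, 'Sydney'),
    (2, 20, 'Daniel'), (2, 18, 'Maureen'), (2, 14, 'Vincent'), (2, 12, 'Carlos'),
    (2, 11, 'Kathleen'), (2, 11, 'Sandra'),
]

# Partition by group_name and keep the first two rows of each partition
top_two = [
    row
    for _, group in groupby(rows, key=itemgetter(0))
    for row in islice(group, 2)
]
print(top_two)
```

groupby only merges adjacent rows, so this relies on the input being sorted by group first, just as the window function relies on its order_by.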

Combine two Django Querysets based on common field

I have two Querysets (actually, list of dicts) like:
q1 = M1.objects.filter(id=pk).values('p_id', 'q1_quantity')
# q1: <Queryset[{'p_id': 2, 'q1_quantity': 4}, {'p_id': 3, 'q1_quantity': 5}]>
q2 = M2.objects.filter(p_id__in=[q1[x]['p_id'] for x in range(len(q1))]).values('p_id', 'q2_quantity')
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 2}, {'p_id': 2, 'q2_quantity': 5}, {'p_id': 3, 'q2_quantity': 1}, {'p_id': 3, 'q2_quantity': 7}]>
q1 has distinct key:value pairs, while q2 has repeated keys.
1) I want to sum all the values of q2 by common p_id, such that q2 becomes:
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 7}, {'p_id': 3, 'q2_quantity': 8}]>
2) Then, merge q1 and q2 into q3, based on common p_id, like:
q3 = ?
# q3: <Queryset[{'p_id': 2, 'q1_quantity': 4, 'q2_quantity': 7}, {'p_id': 3, 'q1_quantity': 5, 'q2_quantity': 8}]>
I have looked into union(). But don't know how to go about summing the queryset (q2) and then merging it with q1.
Can someone please help me?
The problem is that your models are inefficient: having two separate models with repeated fields forces you to make two queries. You may want to consider putting all the fields in one model, or having the M2 model extend M1.
models.py
class M(models.Model):
    p_id = #Your Field...
    q1_quantity = #Your Field...
    q2_quantity = #Your Field...
Then in your views.py:
q = M.objects.filter(id=pk).values('p_id', 'q1_quantity', 'q2_quantity')
Potential issue: in the code you posted, the commented output shows a queryset with more than one object, but pk is the primary key and should be unique, so filtering on it should return a queryset with a single object.
1) I want to sum all the values of q2 by common p_id, such that q2 becomes:
# q2: <Queryset[{'p_id': 2, 'q2_quantity': 7}, {'p_id': 3, 'q2_quantity': 8}]>
Used itertools.combinations:
from itertools import combinations
compare = []
for a, b in combinations(q2, 2):
    if a['p_id'] == b['p_id']:
        a['q2_quantity'] += b['q2_quantity']
        if len(compare) <= 0:
            compare.append(a)
        else:
            [compare[d]['q2_quantity'] for d in range(len(compare)) if a['p_id'] == compare[d]['p_id']]
    else:
        if len(compare) <= 0:
            compare.append(a)
            compare.append(b)
        else:
            if any([a['p_id'] == compare[d]['p_id'] for d in range(len(compare))]):
                pass
            else:
                compare.append(a)
            if any([b['p_id'] == compare[d]['p_id'] for d in range(len(compare))]):
                pass
            else:
                compare.append(b)
2) Then, merge q1 and q2 into q3, based on common p_id, like:
q3 = ?
# q3: <Queryset[{'p_id': 2, 'q1_quantity': 4, 'q2_quantity': 7}, {'p_id': 3, 'q1_quantity': 5, 'q2_quantity': 8}]>
As per this SO post:
from collections import defaultdict
from itertools import chain
collector = defaultdict(dict)
for collectible in chain(cp, compare):
    collector[collectible['p_id']].update(collectible.items())
products = list(collector.values())
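A shorter way to do both steps, sketched here under the assumption that q1 and q2 are plain lists of dicts shaped like the commented output in the question, is to sum q2 into a defaultdict keyed by p_id and then attach each total to the matching q1 row. This is an alternative to the combinations approach above, not a restatement of it.

```python
from collections import defaultdict

# Hard-coded copies of the example data from the question
q1 = [{'p_id': 2, 'q1_quantity': 4}, {'p_id': 3, 'q1_quantity': 5}]
q2 = [{'p_id': 2, 'q2_quantity': 2}, {'p_id': 2, 'q2_quantity': 5},
      {'p_id': 3, 'q2_quantity': 1}, {'p_id': 3, 'q2_quantity': 7}]

# Step 1: sum q2_quantity per p_id
totals = defaultdict(int)
for row in q2:
    totals[row['p_id']] += row['q2_quantity']

# Step 2: merge each q1 row with its q2 total (0 if p_id has no q2 rows)
q3 = [{**row, 'q2_quantity': totals[row['p_id']]} for row in q1]
print(q3)
```

If the sum should happen in the database instead, step 1 could probably be expressed as `M2.objects.filter(...).values('p_id').annotate(q2_quantity=Sum('q2_quantity'))`, assuming M2 has those field names.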

Can't query the sum of values using aggregators

I want to sum the values of all existing rows grouping by another field.
Here's my model structure:
class Answer(models.Model):
    person = models.ForeignKey(Person)
    points = models.PositiveIntegerField(default=100)
    correct = models.BooleanField(default=False)
class Person(models.Model):
    # irrelevant model fields
Sample dataset:
Person | Points
------ | ------
4 | 90
3 | 50
3 | 100
2 | 100
2 | 90
Here's my query:
Answer.objects.values('person').filter(correct=True).annotate(points_person=Sum('points'))
And the result (you can see that all the person values are separated):
[{'person': 4, 'points_person': 90}, {'person': 3, 'points_person': 50}, {'person': 3, 'points_person': 100}, {'person': 2, 'points_person': 100}, {'person': 2, 'points_person': 90}]
But what I want (sum of points by each person):
[{'person': 4, 'points_person': 90}, {'person': 3, 'points_person': 150}, {'person': 2, 'points_person': 190}]
Is there any way to achieve this using only queryset filtering?
Thanks!
It turns out I had to invert the query and start from Person rather than Answer, like so:
Person.objects.filter(answer__correct=True).annotate(points=Sum('answer__points'))
Now I get the total summed points for each person correctly.

Break django values down into count of each value

I have a model defined similar to below
class MyModel(models.Model):
    num_attempts = models.IntegerField()
    num_generated = models.IntegerField()
    num_deleted = models.IntegerField()
Assuming my data looked something like this:
id | num_attempts | num_generated | num_deleted
-- | ------------ | ------------- | -----------
1  | 1            | 2             | 0
2  | 2            | 0             | 1
3  | 3            | 2             | 1
4  | 3            | 1             | 2
I want to get a count of the instances at each possible value for each possible field.
For example, a return sample could look like this.
{
    'num_attempts_at_1': 1,
    'num_attempts_at_2': 1,
    'num_attempts_at_3': 2,
    'num_generated_at_0': 1,
    'num_generated_at_1': 1,
    'num_generated_at_2': 2,
    'num_deleted_at_0': 1,
    'num_deleted_at_1': 2,
    'num_deleted_at_2': 1
}
The example above assumes a lot, such as how the keys are named and that the result would be serialized. None of that matters; what matters is how to get the data broken down like that from the database. It would be best to do this in one query if possible.
We are using Postgres as the database.
Here is something sort of close, but not quite:
qs.values('num_attempts', 'num_generated', 'num_deleted').annotate(Count('id'))
This gives the following (not the same data as the example above):
[{'num_attempts': 4, 'id__count': 3, 'num_deleted': 3, 'num_generated': 6}, {'num_attempts': 3, 'id__count': 12, 'num_deleted': 2, 'num_generated': 2}, {'num_attempts': 2, 'id__count': 5, 'num_deleted': 0, 'num_generated': 6}]
Now with some custom python I was able to do this, but really want a database solution if possible.
def get(self, request, *args, **kwargs):
    qs = self.get_queryset()
    return_data = {}
    for obj in qs:
        count = obj.pop('id__count')
        for k, v in obj.items():
            key = "{}_at_{}".format(k, v)
            value = return_data.get(key, 0) + count
            return_data[key] = value
    return Response(return_data)
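The tallying loop above can be checked against the four-row sample table without a view or a database. In this sketch, `tally_field_values` is a hypothetical helper name and `sample` is a hand-built stand-in for the grouped rows the query would return, where each combination of values occurs once.

```python
from collections import Counter

def tally_field_values(grouped_rows):
    """Add each row's id__count to one '<field>_at_<value>' bucket per field."""
    tallies = Counter()
    for row in grouped_rows:
        count = row['id__count']
        for field, value in row.items():
            if field != 'id__count':
                tallies[f"{field}_at_{value}"] += count
    return dict(tallies)

# Hypothetical grouped rows for the sample table (each combination occurs once)
sample = [
    {'num_attempts': 1, 'num_generated': 2, 'num_deleted': 0, 'id__count': 1},
    {'num_attempts': 2, 'num_generated': 0, 'num_deleted': 1, 'id__count': 1},
    {'num_attempts': 3, 'num_generated': 2, 'num_deleted': 1, 'id__count': 1},
    {'num_attempts': 3, 'num_generated': 1, 'num_deleted': 2, 'id__count': 1},
]
result = tally_field_values(sample)
print(result)
```

Unlike the loop in the view, this version does not pop keys from the input rows, so the grouped rows can be reused afterwards.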