Django - annotate with multiple Count

I have a model called Post with two fields, upvotes and downvotes. Both are ManyToManyFields to Profile. This is the model:
class Post(models.Model):
    profile = models.ForeignKey(Profile, on_delete=models.CASCADE)
    title = models.CharField(max_length=300)
    content = models.CharField(max_length=1000)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    subreddit = models.ForeignKey(Subreddit, on_delete=models.CASCADE)
    upvotes = models.ManyToManyField(Profile, blank=True, related_name='upvoted_posts')
    downvotes = models.ManyToManyField(Profile, blank=True, related_name='downvoted_posts')
So, I want to fetch all the posts such that they are in the order of
total(upvotes) - total(downvotes)
So I have used this query:
Post.objects.annotate(
    total_votes=Count('upvotes') - Count('downvotes')
).order_by('total_votes')
The problem is that total_votes always turns out to be zero.
The queries below illustrate the situation:
In [5]: Post.objects.annotate(up=Count('upvotes')).values('up')
Out[5]: <QuerySet [{'up': 1}, {'up': 3}, {'up': 2}]>
In [6]: Post.objects.annotate(down=Count('downvotes')).values('down')
Out[6]: <QuerySet [{'down': 1}, {'down': 1}, {'down': 1}]>
In [10]: Post.objects.annotate(up=Count('upvotes'), down=Count('downvotes'), total=Count('upvotes')-Count('downvotes')).values('up', 'down', 'total')
Out[10]: <QuerySet [{'up': 1, 'down': 1, 'total': 0}, {'up': 3, 'down': 3, 'total': 0}, {'up': 2, 'down': 2, 'total': 0}]>
It seems that up and down end up with the same value (which is actually the value of up). How can I solve this?
I have tried this:
In [9]: Post.objects.annotate(up=Count('upvotes')).annotate(down=Count('downvotes')).values('up', 'down')
Out[9]: <QuerySet [{'up': 1, 'down': 1}, {'up': 3, 'down': 3}, {'up': 2, 'down': 2}]>
but even this gives the same output.

Try using the distinct argument:
Post.objects.annotate(
    total_votes=Count('upvotes', distinct=True) - Count('downvotes', distinct=True)
).order_by('total_votes')
From the docs:
Combining multiple aggregations with annotate() will yield the wrong
results because joins are used instead of subqueries. For most
aggregates, there is no way to avoid this problem, however, the Count
aggregate has a distinct parameter that may help.
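To see why the joins inflate the counts, here is a minimal sketch using the standard-library sqlite3 module instead of the ORM. The table and column names are hypothetical, and the SQL only approximates what Django generates, but the data mirrors the question (1, 3 and 2 upvotes, and 1 downvote per post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE post_upvotes (post_id INTEGER, profile_id INTEGER);
    CREATE TABLE post_downvotes (post_id INTEGER, profile_id INTEGER);
    INSERT INTO post (id) VALUES (1), (2), (3);
    -- post 1: 1 upvote, post 2: 3 upvotes, post 3: 2 upvotes
    INSERT INTO post_upvotes VALUES (1, 10), (2, 10), (2, 11), (2, 12), (3, 10), (3, 11);
    -- every post has exactly 1 downvote
    INSERT INTO post_downvotes VALUES (1, 20), (2, 20), (3, 20);
""")

# Joining both vote tables yields (upvotes x downvotes) rows per post.
JOINED = """
    SELECT p.id, {up} AS up, {down} AS down
    FROM post p
    LEFT JOIN post_upvotes u ON u.post_id = p.id
    LEFT JOIN post_downvotes d ON d.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
"""

plain = conn.execute(
    JOINED.format(up="COUNT(u.profile_id)", down="COUNT(d.profile_id)")
).fetchall()
distinct = conn.execute(
    JOINED.format(up="COUNT(DISTINCT u.profile_id)", down="COUNT(DISTINCT d.profile_id)")
).fetchall()

print(plain)     # [(1, 1, 1), (2, 3, 3), (3, 2, 2)]
print(distinct)  # [(1, 1, 1), (2, 3, 1), (3, 2, 1)]
```

The plain counts reproduce the identical up/down values from the question, because both COUNTs see the same multiplied rows; counting distinct values removes the duplicates.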

(I'm aware that this isn't exactly an answer, but code can't be embedded in a comment.)
A better data model would be
class Post(models.Model):
    # ...
class Vote(models.Model):
    voter = models.ForeignKey(Profile, on_delete=models.PROTECT)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
    score = models.IntegerField()  # either -1 or +1; validate accordingly
    class Meta:
        unique_together = [('voter', 'post')]
This way you could count the current total score for a post simply with
Vote.objects.filter(post=post).aggregate(score=Sum('score'))
However, you should be well aware of the performance implications of doing this (or your original version for that matter) every time. It would be better to add a
score = models.IntegerField(editable=False)
field to the Post, that gets updated with the aggregate score every time a vote is created, modified or deleted.
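As a rough illustration of the single-table vote model, the sketch below uses sqlite3 with hypothetical table names rather than the ORM. Because there is only one join, summing the score column cannot be multiplied by a second relation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE vote (
        voter_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES post(id),
        score INTEGER NOT NULL CHECK (score IN (-1, 1)),
        UNIQUE (voter_id, post_id)  -- one vote per voter per post
    );
    INSERT INTO post (id) VALUES (1), (2), (3);
    INSERT INTO vote VALUES
        (10, 1, 1), (20, 1, -1),
        (10, 2, 1), (11, 2, 1), (12, 2, 1), (20, 2, -1),
        (10, 3, 1), (11, 3, 1), (20, 3, -1);
""")

# Total score per post, highest first; posts without votes would get 0.
ranked = conn.execute("""
    SELECT p.id, COALESCE(SUM(v.score), 0) AS total
    FROM post p
    LEFT JOIN vote v ON v.post_id = p.id
    GROUP BY p.id
    ORDER BY total DESC, p.id
""").fetchall()

print(ranked)  # [(2, 2), (3, 1), (1, 0)]
```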

Related

Accessing one model from within another in Django many-to-one using ForeignKey

Let's imagine we have two models in a many-to-one relationship.
The code below shows that a reporter can have multiple articles:
class Reporter(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()
    def __str__(self):
        return "%s %s" % (self.first_name, self.last_name)
class Article(models.Model):
    headline = models.CharField(max_length=100)
    pub_date = models.DateField(null=True)
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE, null=True)
    def __str__(self):
        return self.headline
Let's see what I have in my database on this model.
# Reporter.objects.all().values()
# <QuerySet [
# {'id': 1, 'first_name': 'John', 'last_name': 'Smith', 'email': 'john@example.com'},
# {'id': 2, 'first_name': 'Paul', 'last_name': 'Jones', 'email': 'paul@example.com'}
# ]>
# Article.objects.all().values()
# <QuerySet [
# {'id': 5, 'headline': "1st headline", 'pub_date': datetime.date(2005, 7, 29),
# 'reporter_id': 1},
# {'id': 6, 'headline': "2nd headline", 'pub_date': datetime.date(2006, 1, 17),
# 'reporter_id': 2},
# {'id': 7, 'headline': '3rd headline', 'pub_date': datetime.date(2005, 7, 27),
# 'reporter_id': 1}
# ]>
The first reporter has two articles and the second has only one.
I need to get the list of all articles for each reporter.
I tried this way (according to django docs):
Article.objects.filter(reporter__first_name='John')
It's okay. It works. I also tried to instantiate the first reporter as 'r1' and then do this:
r1.article_set.all()
And this piece of code works too.
But as I'm new to Django, I suspect that instantiating the first reporter as 'r1' and then making a query is a bit slow, because Django makes me run r1.save() before r1.article_set.all(). It looks like Django makes two queries to the database: the first to save the instance, and the second to run r1.article_set.all().
Is my understanding correct? And how can I query all of a reporter's articles as fast as Article.objects.filter(reporter__first_name='John'), but using the Reporter object?
Thanks
I also tried to instantiate the first reporter as 'r1' and then do this:
r1.article_set.all()
And this piece of code works too. But as I'm new to django, I think that instantiating the first reporter as 'r1' and then making a query is a bit slow.
Yes, but Django can load all the related articles in bulk with a second query. We do this with .prefetch_related(…):
reporters = Reporter.objects.prefetch_related('article_set')
for reporter in reporters:
    print(reporter.first_name)
    print(reporter.article_set.all())
Instead of the N+1 queries that your implementation makes when looping over reporters (one query to fetch all reporters, and one query per reporter to fetch the related articles), this makes two queries: one to fetch all the reporters, and one to fetch all the articles related to any of these reporters. Django then does the joining in Python, so that the articles related to the first reporter r1 end up in r1.article_set.all().
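The two-query pattern can be sketched outside the ORM. The example below uses sqlite3 with hypothetical table names and the sample data from the question; it only illustrates the idea and is not Django's actual implementation:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporter (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, headline TEXT, reporter_id INTEGER);
    INSERT INTO reporter VALUES (1, 'John'), (2, 'Paul');
    INSERT INTO article VALUES
        (5, '1st headline', 1), (6, '2nd headline', 2), (7, '3rd headline', 1);
""")

# Query 1: fetch all reporters.
reporters = conn.execute("SELECT id, first_name FROM reporter ORDER BY id").fetchall()

# Query 2: fetch the articles of all those reporters with a single IN (...) query.
ids = [reporter_id for reporter_id, _name in reporters]
placeholders = ", ".join("?" * len(ids))
articles = conn.execute(
    f"SELECT headline, reporter_id FROM article "
    f"WHERE reporter_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

# The "join" is done in Python: group the articles by their reporter.
by_reporter = defaultdict(list)
for headline, reporter_id in articles:
    by_reporter[reporter_id].append(headline)

result = {name: by_reporter[reporter_id] for reporter_id, name in reporters}
print(result)  # {'John': ['1st headline', '3rd headline'], 'Paul': ['2nd headline']}
```

However many reporters there are, only two queries are issued.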

Django: Group by date then calculate Sum of amount for each date

I have a Django model like the one below. It stores the transactions that happened over time.
class Transaction(models.Model):
    amount = models.FloatField()
    seller = models.ForeignKey(User, related_name='sells', on_delete=models.CASCADE)
    buyer = models.ForeignKey(User, related_name='purchased', on_delete=models.CASCADE)
    created_at_date = models.DateField(auto_now_add=True)
My question:
How can I find the total amount of transactions for each day? For each day it should calculate the Sum of all transactions on that day.
For example, I need to do this for the last 7 days.
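I found the solution. The query below works, assuming a DateTimeField named created_at (with the created_at_date DateField shown above, group by 'created_at_date' instead):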
Transaction.objects.filter().values('created_at__date').order_by('created_at__date').annotate(sum=Sum('amount'))
result will be:
<QuerySet [{'created_at__date': datetime.date(2019, 1, 3), 'sum': 10000000.0}, {'created_at__date': datetime.date(2019, 1, 4), 'sum': 4367566577.0}]>
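The "last 7 days" part of the question is not covered by the query above. As a sketch of the underlying GROUP BY with a date cutoff, here is a sqlite3 example; the table name txn, the sample rows and the fixed "today" are all made up for illustration:

```python
import sqlite3
from datetime import date, timedelta

today = date(2019, 1, 10)  # fixed "today" so the example is reproducible

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (amount REAL, created_at_date TEXT)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [
        (100.0, "2019-01-01"),  # older than 7 days, so it is filtered out
        (40.0, "2019-01-04"),
        (60.0, "2019-01-04"),
        (25.5, "2019-01-09"),
    ],
)

# Only keep rows from the last 7 days, then sum the amounts per day.
since = (today - timedelta(days=7)).isoformat()
daily = conn.execute(
    """
    SELECT created_at_date, SUM(amount)
    FROM txn
    WHERE created_at_date >= ?
    GROUP BY created_at_date
    ORDER BY created_at_date
    """,
    (since,),
).fetchall()

print(daily)  # [('2019-01-04', 100.0), ('2019-01-09', 25.5)]
```

In the ORM, the cutoff would roughly correspond to adding a filter such as created_at_date__gte=since before the values() call.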

Django annotate on property field

I'm using Django 2.0 and Django REST Framework
I have a model like below.
class Contact(models.Model):
    first_name = models.CharField(max_length=100)
class AmountGiven(models.Model):
    contact = models.ForeignKey(Contact, on_delete=models.PROTECT)
    amount = models.FloatField(help_text='Amount given to the contact')
    @property
    def total_payable(self):
        return self.amount
    @property
    def amount_due(self):
        returned_amount = 0
        for returned in self.amountreturned_set.all():
            returned_amount += returned.amount
        return self.total_payable - returned_amount
class AmountReturned(models.Model):
    amount_given = models.ForeignKey(AmountGiven, on_delete=models.CASCADE)
    amount = models.FloatField()
I have to get the top 10 contacts by amount given and by amount due, respectively.
In my view, I'm filtering the data like this:
from django.db.models import Sum
from rest_framework.decorators import api_view
from rest_framework.exceptions import NotFound
from rest_framework.response import Response
@api_view(http_method_names=['GET'])
def top_ten(request):
    filter_type = request.query_params.get('type', None)
    if filter_type == 'due':
        # query for due type
        pass  # not implemented yet; this is what the question is about
    elif filter_type == 'given':
        qs = Contact.objects.filter(
            user=request.user
        ).values('id').annotate(
            amount_given=Sum('amountgiven__amount')
        ).order_by(
            '-amount_given'
        )[:10]
        graph_data = []
        for q in qs:
            d = {}
            contact = Contact.objects.get(pk=q['id'])
            d['contact'] = contact.full_name if contact else 'Unknown'
            d['value'] = q['amount_given']
            graph_data.append(d)
        return Response(graph_data)
    else:
        raise NotFound('No data found for given filter type')
The type query parameter can be due or given.
The code for the given type works fine, as all the fields are in the database. But how can I filter based on the virtual field for the due type?
What I have to do is annotate the Sum of the amount_due property, grouped by contact.
You cannot filter based on a @property.
If I understand your problem correctly, you can aggregate the sum of the related AmountGiven and the sum of AmountReturned, then calculate a due field that holds the result of subtracting the latter from the former.
The query:
from django.db.models import Sum, Value
from django.db.models.functions import Coalesce
Contact.objects.filter(
    amountgiven__amount__gt=0
).annotate(
    due=Sum('amountgiven__amount') - Coalesce(
        Sum('amountgiven__amountreturned__amount'), Value(0)
    )
).order_by('-due').values('due', 'id')
will return:
<QuerySet [{'id': 3, 'due': 2500.0}, {'id': 1, 'due': 2450.0}, {'id': 2, 'due': 1500.0}]>
However, with this solution you cannot distinguish between the individual AmountGiven instances of one Contact. You only get the big picture.
If you want to split the due value per AmountGiven instance, then just annotate like so:
AmountGiven.objects.annotate(
    due=Sum('amount') - Coalesce(Sum('amountreturned__amount'), Value(0))
).order_by('-due').values('due', 'contact__id', 'id')
which returns
<QuerySet [
{'contact__id': 3, 'id': 3, 'due': 2500.0},
{'contact__id': 1, 'id': 1, 'due': 1750.0},
{'contact__id': 2, 'id': 2, 'due': 1500.0},
{'contact__id': 1, 'id': 4, 'due': 350.0},
{'contact__id': 1, 'id': 5, 'due': 350.0}
]>
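The role of Coalesce in these queries can be sketched with plain SQL. The example below uses sqlite3 with hypothetical table names and made-up amounts, and it is not the exact SQL that Django generates. An AmountGiven with no returns gets a NULL sum from the LEFT JOIN, and subtracting NULL yields NULL unless it is replaced by 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amount_given (id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL);
    CREATE TABLE amount_returned (id INTEGER PRIMARY KEY, amount_given_id INTEGER, amount REAL);
    INSERT INTO amount_given VALUES (1, 1, 2000.0), (2, 2, 1500.0);
    -- only the first AmountGiven has had anything returned
    INSERT INTO amount_returned VALUES (1, 1, 250.0);
""")

QUERY = """
    SELECT g.id, g.amount - {returned} AS due
    FROM amount_given g
    LEFT JOIN amount_returned r ON r.amount_given_id = g.id
    GROUP BY g.id
    ORDER BY g.id
"""

without = conn.execute(QUERY.format(returned="SUM(r.amount)")).fetchall()
with_coalesce = conn.execute(
    QUERY.format(returned="COALESCE(SUM(r.amount), 0)")
).fetchall()

print(without)        # [(1, 1750.0), (2, None)]
print(with_coalesce)  # [(1, 1750.0), (2, 1500.0)]
```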
References
Coalesce

Django distinct() not returning distinct values

I have a Session model like this:
class Session(models.Model):
    user = models.ForeignKey(User, null=True, blank=True, on_delete=models.CASCADE, related_name="sessions")
    flavor = models.ForeignKey(Flavor, null=True, blank=True, on_delete=models.CASCADE, related_name="sessions")
    ....
And I'm trying to run a query:
sessions = Session.objects.all().values('flavor__pk', 'user__pk').distinct()
But when I then print the sessions object I get this:
<QuerySet [{'user__pk': 14544, 'flavor__pk': 1}, {'user__pk': 14544, 'flavor__pk': 1}, {'user__pk': None, 'flavor__pk': 30}, {'user__pk': 193, 'flavor__pk': 30}, '...(remaining elements truncated)...']>
If you look closely, the first two entries are exactly the same: {'user__pk': 14544, 'flavor__pk': 1}! Isn't this supposed to be distinct?
I think this code works:
Session.objects.all().values_list('flavor__pk', 'user__pk').distinct()
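One common cause of this symptom, assuming the model has a default ordering in its Meta (the elided part of the model above does not show whether it does), is that Django adds the ordering fields to the SELECT, so DISTINCT compares more columns than the two requested. The sqlite3 sketch below uses a hypothetical created column to illustrate the effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (
        id INTEGER PRIMARY KEY, user_id INTEGER, flavor_id INTEGER, created TEXT
    );
    INSERT INTO session (user_id, flavor_id, created) VALUES
        (14544, 1, '2020-01-01'),
        (14544, 1, '2020-01-02'),
        (193, 30, '2020-01-03');
""")

# With the ordering column pulled into the SELECT, DISTINCT compares all three columns.
with_ordering = conn.execute(
    "SELECT DISTINCT flavor_id, user_id, created FROM session ORDER BY created"
).fetchall()
pairs_with_ordering = [(flavor, user) for flavor, user, _created in with_ordering]

# Without it, DISTINCT only sees the two requested columns.
pairs_plain = conn.execute(
    "SELECT DISTINCT flavor_id, user_id FROM session ORDER BY flavor_id, user_id"
).fetchall()

print(pairs_with_ordering)  # [(1, 14544), (1, 14544), (30, 193)]
print(pairs_plain)          # [(1, 14544), (30, 193)]
```

If that is what is happening here, clearing the ordering with an empty .order_by() before .distinct() is the usual remedy.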

Django ORM: exclude does not return expected result

Django 1.11.4
I'd like to select only frames with lost items.
If an item is lost, lost_day is not empty.
The problem: I can't understand what is going on in the "Wrong result" case. I would expect it to return the same result as the "Correct result". Could you help me understand what is wrong with it?
Wrong result:
>>> Frame.objects.all().exclude(item__lost_day__isnull=True)
<QuerySet [<Frame: 3>]>
Correct result:
>>> Frame.objects.all().filter(item__lost_day__isnull=False)
<QuerySet [<Frame: 1>, <Frame: 3>]>
Self-checking
>>> a = Item.objects.get(pk=1)
>>> a.lost_day
datetime.date(1997, 1, 1)
>>> a.lost_day is None
False
>>> b = Item.objects.get(pk=2)
>>> b.lost_day
datetime.date(1997, 2, 2)
>>> b.lost_day is None
False
models.py
class Frame(models.Model):
    pass
class Item(models.Model):
    frame = models.ForeignKey(Frame,
                              blank=False,
                              on_delete=models.PROTECT,
                              verbose_name=_("frame"))
    lost_day = models.DateField(auto_now=False,
                                auto_now_add=False,
                                blank=True,
                                null=True,
                                verbose_name=_("lost"))
Can you show:
Frame.objects.filter(pk=1).values('item__lost_day')
It looks like frame 1 has several items, some with an empty lost_day and some without, so the result would look like:
[{'item__lost_day': None},
{'item__lost_day': datetime.date(1997, 1, 1)},
{'item__lost_day': None}]
When you query Frame.objects.exclude(item__lost_day__isnull=True), Django returns all frames that have no item with a null lost_day. That is frame 3; frame 1 is dropped because some of its items have null values.
When you query Frame.objects.filter(item__lost_day__isnull=False), Django finds all frames with at least one item whose lost_day is not null. These are frames 1 and 3, as expected.
The main difficulty in your investigation is that you changed the condition (null, not null) and the method (filter, exclude) at the same time.
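The difference between the two queries can be mimicked in plain Python. The data below is hypothetical: frame 1 follows the guess above, while frames 2 and 3 are made up for illustration:

```python
from datetime import date

# frame id -> lost_day of each of its items (None means "not lost")
frames = {
    1: [None, date(1997, 1, 1), None],
    2: [None],
    3: [date(1997, 2, 2)],
}

# filter(item__lost_day__isnull=False): keep frames where at least one item is lost
filtered = [f for f, days in frames.items() if any(d is not None for d in days)]

# exclude(item__lost_day__isnull=True): drop frames where any item is not lost
excluded = [f for f, days in frames.items() if not any(d is None for d in days)]

print(filtered)  # [1, 3]
print(excluded)  # [3]
```

Frame 1 passes the filter because one of its items is lost, but it is removed by the exclude because another of its items is not.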