Django distinct() not returning distinct values

I have a Session model like this:
class Session(models.Model):
    user = models.ForeignKey(User, null=True, blank=True, on_delete=models.CASCADE, related_name="sessions")
    flavor = models.ForeignKey(Flavor, null=True, blank=True, on_delete=models.CASCADE, related_name="sessions")
    ....
And I'm trying to run a query:
sessions = Session.objects.all().values('flavor__pk', 'user__pk').distinct()
But when I then print the sessions object I get this:
<QuerySet [{'user__pk': 14544, 'flavor__pk': 1}, {'user__pk': 14544, 'flavor__pk': 1}, {'user__pk': None, 'flavor__pk': 30}, {'user__pk': 193, 'flavor__pk': 30}, '...(remaining elements truncated)...']>
If you look closely, the first two entries are exactly the same: {'user__pk': 14544, 'flavor__pk': 1}. Isn't this supposed to be distinct?

I think this code works:
Session.objects.all().values_list('flavor__pk', 'user__pk').distinct()
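
One possible cause worth checking, stated as an assumption because the rest of the Session model is elided: if the model has a default Meta.ordering (or the queryset has an order_by()), Django adds the ordering column to the SELECT, so DISTINCT compares it too and rows that look identical in the output are kept. Calling .order_by() with no arguments before .distinct() clears the ordering. The sketch below reproduces the effect in plain SQL with the standard sqlite3 module; the table name session and the created column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "created" stands in for whatever column the model orders by.
conn.execute(
    "CREATE TABLE session ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, flavor_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO session (user_id, flavor_id, created) VALUES (?, ?, ?)",
    [(14544, 1, "2020-01-01"), (14544, 1, "2020-01-02"), (193, 30, "2020-01-03")],
)

# DISTINCT over the two requested columns only: the duplicate pair collapses.
plain = conn.execute("SELECT DISTINCT flavor_id, user_id FROM session").fetchall()

# With the ordering column in the SELECT list, DISTINCT compares it as well,
# so both (1, 14544) rows survive.
with_ordering = conn.execute(
    "SELECT DISTINCT flavor_id, user_id, created FROM session ORDER BY created"
).fetchall()
pairs = [(flavor, user) for flavor, user, _ in with_ordering]
```

If this is what is happening, the Django-side fix would be Session.objects.order_by().values('flavor__pk', 'user__pk').distinct().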

Related

Accessing one model from within another in Django many-to-one using ForeignKey

Let's imagine we have two models in a many-to-one relationship.
The code below shows that a reporter can have multiple articles:
class Reporter(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()

    def __str__(self):
        return "%s %s" % (self.first_name, self.last_name)

class Article(models.Model):
    headline = models.CharField(max_length=100)
    pub_date = models.DateField(null=True)
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE, null=True)

    def __str__(self):
        return self.headline
Let's see what I have in my database on this model.
# Reporter.objects.all().values()
# <QuerySet [
# {'id': 1, 'first_name': 'John', 'last_name': 'Smith', 'email': 'john@example.com'},
# {'id': 2, 'first_name': 'Paul', 'last_name': 'Jones', 'email': 'paul@example.com'}
# ]>
# Article.objects.all().values()
# <QuerySet [
# {'id': 5, 'headline': "1st headline", 'pub_date': datetime.date(2005, 7, 29),
# 'reporter_id': 1},
# {'id': 6, 'headline': "2nd headline", 'pub_date': datetime.date(2006, 1, 17),
# 'reporter_id': 2},
# {'id': 7, 'headline': '3rd headline', 'pub_date': datetime.date(2005, 7, 27),
# 'reporter_id': 1}
# ]>
The first reporter has two articles and the second has only one.
I need to get the list of all articles for each reporter.
I tried it this way (following the Django docs):
Article.objects.filter(reporter__first_name='John')
It's okay. It works. I also tried to instantiate the first reporter as 'r1' and then do this:
r1.article_set.all()
And this piece of code works too.
But as I'm new to Django, I think that instantiating the first reporter as 'r1' and then making a query is a bit slow, because Django makes me run r1.save() and then r1.article_set.all(). It looks like Django makes 2 queries to the database (the first to save the instance, the second to run r1.article_set.all()).
Is my understanding correct? And how can I query all of a reporter's articles as fast as Article.objects.filter(reporter__first_name='John'), but using the Reporter object?
Thanks

I also tried to instantiate the first reporter as 'r1' and then do this:
r1.article_set.all()
And this piece of code works too. But as I'm new to django, I think that instantiating the first reporter as 'r1' and then making a query is a bit slow.
Yes, but Django can load all the related articles in bulk with a second query. We do this with .prefetch_related(…):
reporters = Reporter.objects.prefetch_related('article_set')
for reporter in reporters:
    print(reporter.first_name)
    print(reporter.article_set.all())
Instead of the N+1 queries that your implementation makes (one query to fetch all reporters, and one query per reporter to fetch the related articles), this makes two queries: one to fetch all the reporters, and one to fetch all the articles related to any of these reporters. Django then joins the two result sets in Python, so that the articles related to the first reporter r1 end up in r1.article_set.all().
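
To make the "joining" step concrete, here is a minimal pure-Python sketch of the idea, not Django's actual implementation. The two lists are hypothetical stand-ins for the rows returned by the two queries, and attach_articles is a name made up for this illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for the results of the two queries.
reporters = [{"id": 1, "first_name": "John"}, {"id": 2, "first_name": "Paul"}]
articles = [
    {"id": 5, "headline": "1st headline", "reporter_id": 1},
    {"id": 6, "headline": "2nd headline", "reporter_id": 2},
    {"id": 7, "headline": "3rd headline", "reporter_id": 1},
]


def attach_articles(reporters, articles):
    """Group the articles by reporter_id once, then look each reporter up."""
    by_reporter = defaultdict(list)
    for article in articles:
        by_reporter[article["reporter_id"]].append(article["headline"])
    return {r["first_name"]: by_reporter[r["id"]] for r in reporters}


grouped = attach_articles(reporters, articles)
```

The grouping is a single pass over the articles, so the cost stays at two queries no matter how many reporters there are.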

How to count with filter django query?

I'm trying to get a list of latest 100 posts and also the aggregated count of approved, pending, and rejected posts for the user of that post.
models.py
class BlogPost(models.Model):
    POST_STATUSES = (
        ('A', 'Approved'),
        ('P', 'Pending'),
        ('R', 'Rejected')
    )
    author = models.ForeignKey(User)
    title = models.CharField(max_length=50)
    description = models.TextField()
    status = models.CharField(max_length=1, choices=POST_STATUSES)
views.py
def latest_posts(request):
    latest_100_posts = BlogPost.objects.all()[:100]
I get the latest 100 posts. Now I want to get the author of each post and display their total Approved, Pending, and Rejected counts:
Title Of Post, Author1, 10, 5, 1
Title Of Post2, Author2, 7, 3, 1
Title Of Post3, Author1, 10, 5, 1
...
One thing I've thought about is looping through each of the 100 posts and querying the counts, but that seems very inefficient:
for post in latest_100_posts:
    approved_count = BlogPost.objects.filter(author=post.author, status='A').count()
    pending_count = BlogPost.objects.filter(author=post.author, status='P').count()
    rejected_count = BlogPost.objects.filter(author=post.author, status='R').count()
Is there a more efficient way to do this? I know about using aggregate Count but I'm not sure how to sub-filter on the status field.

You can do this with conditional aggregation.
First, let's add a related_name to the author field in BlogPost:
class BlogPost(models.Model):
    POST_STATUSES = (
        ('A', 'Approved'),
        ('P', 'Pending'),
        ('R', 'Rejected')
    )
    author = models.ForeignKey(User, related_name="user_posts")
    title = models.CharField(max_length=50)
    description = models.TextField()
    status = models.CharField(max_length=1, choices=POST_STATUSES)
Then let's update the queryset:
from django.db.models import Count, Case, When, IntegerField
# Slice inside list() so that only 100 rows are fetched from the database.
top_post_users = list(BlogPost.objects.values_list('author_id', flat=True)[:100])
users = User.objects.filter(pk__in=top_post_users).annotate(
    approved_count=Count(Case(When(user_posts__status="A", then=1), output_field=IntegerField())),
    pending_count=Count(Case(When(user_posts__status="P", then=1), output_field=IntegerField())),
    reject_count=Count(Case(When(user_posts__status="R", then=1), output_field=IntegerField())),
)
users.values('approved_count', 'pending_count', 'reject_count')
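
For intuition, conditional aggregation comes down to COUNT over a CASE expression: rows that do not match the condition yield NULL, and COUNT skips NULLs. The sketch below shows roughly that SQL shape using the standard sqlite3 module. The table name blogpost and the sample rows are made up for illustration, and this is not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blogpost (id INTEGER PRIMARY KEY, author_id INTEGER, status TEXT)"
)
# Hypothetical data: author 1 has 2 approved + 1 pending, author 2 has 1 rejected + 1 approved.
conn.executemany(
    "INSERT INTO blogpost (author_id, status) VALUES (?, ?)",
    [(1, "A"), (1, "A"), (1, "P"), (2, "R"), (2, "A")],
)

# CASE without ELSE yields NULL for non-matching rows, and COUNT ignores NULLs,
# so each column counts only the posts with that status.
rows = conn.execute(
    """
    SELECT author_id,
           COUNT(CASE WHEN status = 'A' THEN 1 END) AS approved_count,
           COUNT(CASE WHEN status = 'P' THEN 1 END) AS pending_count,
           COUNT(CASE WHEN status = 'R' THEN 1 END) AS reject_count
    FROM blogpost
    GROUP BY author_id
    ORDER BY author_id
    """
).fetchall()
```

All three counts come back in one grouped query instead of three queries per post.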

from django.db.models import Count
approved_count = BlogPost.objects.filter(author=post.author, status='A').annotate(approved_count=Count('id'))
pending_count = BlogPost.objects.filter(author=post.author, status='P').annotate(pending_count=Count('id'))
rejected_count = BlogPost.objects.filter(author=post.author, status='R').annotate(rejected_count=Count('id'))

Django - annotate with multiple Count

I have a model called Post which has two fields, upvotes and downvotes. Both are ManyToManyFields to Profile. This is the model:
class Post(models.Model):
    profile = models.ForeignKey(Profile, on_delete=models.CASCADE)
    title = models.CharField(max_length=300)
    content = models.CharField(max_length=1000)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    subreddit = models.ForeignKey(Subreddit, on_delete=models.CASCADE)
    upvotes = models.ManyToManyField(Profile, blank=True, related_name='upvoted_posts')
    downvotes = models.ManyToManyField(Profile, blank=True, related_name='downvoted_posts')
So, I want to fetch all the posts such that they are in the order of
total(upvotes) - total(downvotes)
So I have used this query:
Post.objects.annotate(
    total_votes=Count('upvotes')-Count('downvotes')
).order_by('total_votes')
The problem with this query is that total_votes always turns out to be zero.
The below queries will explain the situation:
In [5]: Post.objects.annotate(up=Count('upvotes')).values('up')
Out[5]: <QuerySet [{'up': 1}, {'up': 3}, {'up': 2}]>
In [6]: Post.objects.annotate(down=Count('downvotes')).values('down')
Out[6]: <QuerySet [{'down': 1}, {'down': 1}, {'down': 1}]>
In [10]: Post.objects.annotate(up=Count('upvotes'), down=Count('downvotes'), total=Count('upvotes')-Count('downvotes')).values('up', 'down', 'total')
Out[10]: <QuerySet [{'up': 1, 'down': 1, 'total': 0}, {'up': 3, 'down': 3, 'total': 0}, {'up': 2, 'down': 2, 'total': 0}]>
It seems that both up and down get the same value (which is actually the value of up). How can I solve this?
I have tried this:
In [9]: Post.objects.annotate(up=Count('upvotes')).annotate(down=Count('downvotes')).values('up', 'down')
Out[9]: <QuerySet [{'up': 1, 'down': 1}, {'up': 3, 'down': 3}, {'up': 2, 'down': 2}]>
but even this gives the same output.

Try using the distinct argument:
Post.objects.annotate(
    total_votes=Count('upvotes', distinct=True)-Count('downvotes', distinct=True)
).order_by('total_votes')
From the docs:
Combining multiple aggregations with annotate() will yield the wrong
results because joins are used instead of subqueries. For most
aggregates, there is no way to avoid this problem, however, the Count
aggregate has a distinct parameter that may help.
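
The effect described in the docs can be reproduced in plain SQL. The sketch below uses the standard sqlite3 module with hypothetical table names (post, post_upvotes, post_downvotes) that only approximate the many-to-many tables Django would create. One post has 3 upvotes and 1 downvote, like the second post in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE post_upvotes (post_id INTEGER, profile_id INTEGER);
    CREATE TABLE post_downvotes (post_id INTEGER, profile_id INTEGER);
    INSERT INTO post (id) VALUES (1);
    INSERT INTO post_upvotes VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO post_downvotes VALUES (1, 20);
    """
)

# Joining both tables yields 3 x 1 = 3 rows for the post, so a plain COUNT
# sees 3 non-NULL values in both columns.
QUERY = """
    SELECT COUNT({modifier} u.profile_id), COUNT({modifier} d.profile_id)
    FROM post p
    LEFT JOIN post_upvotes u ON u.post_id = p.id
    LEFT JOIN post_downvotes d ON d.post_id = p.id
    GROUP BY p.id
"""
inflated = conn.execute(QUERY.format(modifier="")).fetchone()
# COUNT(DISTINCT ...) counts each profile once per post, which removes the duplication.
fixed = conn.execute(QUERY.format(modifier="DISTINCT")).fetchone()
```

Here inflated matches the {'up': 3, 'down': 3} row from the question, while fixed gives the real counts.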

(I'm aware that this isn't exactly an answer, but code can't be embedded in a comment.)
A better data model would be
class Post(models.Model):
    # ...
class Vote(models.Model):
    voter = models.ForeignKey(Profile, on_delete=models.PROTECT)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
    score = models.IntegerField()  # either -1 or +1; validate accordingly

    class Meta:
        unique_together = [('voter', 'post'),]
This way you could count the current total score for a post simply with
from django.db.models import Sum
Vote.objects.filter(post=post).aggregate(score=Sum('score'))
However, you should be well aware of the performance implications of doing this (or your original version for that matter) every time. It would be better to add a
score = models.IntegerField(editable=False)
field to the Post, that gets updated with the aggregate score every time a vote is created, modified or deleted.
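
As a rough illustration of what this data model buys you, here is the same idea in plain SQL via the standard sqlite3 module. The table name vote and the CHECK constraint are assumptions for this sketch. Django's unique_together does map to a UNIQUE constraint, but the score validation would have to be added separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vote (
        voter_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        score INTEGER NOT NULL CHECK (score IN (-1, 1)),
        UNIQUE (voter_id, post_id)
    )
    """
)
conn.executemany(
    "INSERT INTO vote VALUES (?, ?, ?)",
    [(10, 1, 1), (11, 1, 1), (12, 1, -1)],
)

# The total score is a single SUM; COALESCE covers posts without any votes.
total = conn.execute(
    "SELECT COALESCE(SUM(score), 0) FROM vote WHERE post_id = ?", (1,)
).fetchone()[0]

# A second vote by the same voter on the same post violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO vote VALUES (?, ?, ?)", (10, 1, -1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

With one row per (voter, post), a voter cannot both upvote and downvote the same post, which the two ManyToManyFields do not prevent.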

Django QuerySet returns duplicate instance using "filed__in" on list

I have a model like this:
class Post(models.Model):
    STATUS_CHOICE = (
        ('draft', 'Draft'),
        ('published', 'Published'),
    )
    title = models.CharField(max_length=250)
    slug = models.SlugField(max_length=250,
                            unique_for_date='publish')
    author = models.ForeignKey(User, related_name='blog_posts')
    body = models.TextField()
    publish = models.DateTimeField(default=timezone.now)
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)
    status = models.CharField(max_length=10,
                              choices=STATUS_CHOICE,
                              default='draft')
    objects = models.Manager()
    published = PublishedManager()
    tags = TaggableManager()
This model uses tags provided by taggit.
I want to find similar posts according to the number of tags they share (this is actually the example from the book "Django By Example").
I use this:
post_tag_ids = post.tags.values_list('id', flat=True)
similar_posts = Post.published.filter(tags__in=post_tag_ids) \
                              .exclude(id=post.id)
similar_posts = similar_posts.annotate(same_tags=Count('tags')) \
                             .order_by('-same_tags', '-publish')[:4]
I know what the first line means. But suppose the first line returns the tag id list [1,2,3,4,5] and the second line filters all posts. When another post has the tag ids [1,2,6,7], that post shows up twice in similar_posts, because it shares 2 tag ids. So I do not understand what is going on. (I know what annotate means and what aggregation means.)
Why can these three lines of code find similar posts according to the tags they share? (Is there a mistake in the book? I have tried it and it works, so I am curious why.) The third line just aggregates the count of tags on every object, not the count of shared tags. How is this working?
I have tried this:
post_tag_ids = post.tags.values_list('id', flat=True)
Output: <QuerySet [1, 4, 5, 6, 7]>
similar_posts = Post.published.filter(tags__in=post_tag_ids) \
                              .exclude(id=post.id)
Output: <QuerySet [<Post: asdkjakls>, <Post: asdkjakls>, <Post: asdkjakls>, <Post: asdkjakls>, <Post: another post>]>
But after
similar_posts = similar_posts.annotate(same_tags=Count('tags'))
only 2 results remain: <QuerySet [<Post: asdkjakls>, <Post: another post>]>. The duplicates of <Post: asdkjakls> are merged into one.

Try
similar_posts = Post.published.filter(tags__in=post_tag_ids) \
                              .exclude(id=post.id).distinct()
to remove duplicate instances from the queryset. See the Django documentation on distinct().
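
As for why the three lines from the book work: the filter on tags is a JOIN against the tag table, so a post appears once per matching tag, and the annotate() adds a GROUP BY over that same filtered join, so the count is the number of shared tags. The sketch below shows that shape in plain SQL with the standard sqlite3 module. The tagged table is a simplified, hypothetical stand-in for taggit's tables, and the queries are an approximation of what the ORM produces, not its exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tagged (post_id INTEGER, tag_id INTEGER);
    INSERT INTO post VALUES (1, 'current'), (2, 'asdkjakls'), (3, 'another post');
    INSERT INTO tagged VALUES
        (1, 1), (1, 4), (1, 5), (1, 6), (1, 7),
        (2, 1), (2, 4), (2, 5), (2, 6), (2, 9),
        (3, 7), (3, 8);
    """
)
current_tags = (1, 4, 5, 6, 7)  # tag ids of the current post (id 1)
placeholders = ", ".join("?" * len(current_tags))

# filter(tags__in=...).exclude(id=...): one output row per matching tag.
joined = conn.execute(
    f"SELECT p.title FROM post p JOIN tagged t ON t.post_id = p.id "
    f"WHERE t.tag_id IN ({placeholders}) AND p.id != 1 ORDER BY p.id",
    current_tags,
).fetchall()

# annotate(same_tags=Count('tags')): GROUP BY collapses the duplicates and
# counts only the joined rows that passed the filter, i.e. the shared tags.
grouped = conn.execute(
    f"SELECT p.title, COUNT(t.tag_id) AS same_tags FROM post p "
    f"JOIN tagged t ON t.post_id = p.id "
    f"WHERE t.tag_id IN ({placeholders}) AND p.id != 1 "
    f"GROUP BY p.id ORDER BY same_tags DESC",
    current_tags,
).fetchall()
```

Post 2 has five tags in total, but same_tags is 4 because tag 9 never makes it through the filter.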

Django get foreign key object inside Queryset

I have 3 models:
class Event(models.Model):
    cur_datetime = models.DateTimeField(default=datetime.datetime(1970, 1, 1, 0, 0, 0, 0, pytz.UTC), blank=True, null=True)
    week_no = models.IntegerField()
    day_no = models.IntegerField()

class SubEvent(models.Model):
    event = models.ForeignKey(Event, on_delete=models.CASCADE)
    version = models.IntegerField()

class SubSubEvent(models.Model):
    sub_event = models.ForeignKey(SubEvent, on_delete=models.CASCADE)
    length = models.IntegerField()
I want to get a queryset from the SubSubEvent model which includes all the foreign keys as one single object. What I have now is:
querySet = SubSubEvent.objects.filter(sub_event__event__cur_datetime__range=[from_date, to_date])
This returns a queryset, and when I use a for loop to get __dict__ on each of the objects, I get something like this:
{'event_id': 1, '_state': <django.db.models.base.ModelState object at 0x7fd7d9cefeb8>, 'id': 10, 'length': '1'}
This is just a part of what I want to achieve. What I really want is all the fields of the related event instead of just the id number. In other words, all the fields (including data) from Event plus SubEvent plus SubSubEvent in one queryset. This queryset should contain objects with cur_datetime, week_no, day_no, version and length.

It sounds like you're looking for select_related().
qs = SubSubEvent.objects \
    .select_related('sub_event__event') \
    .filter(sub_event__event__cur_datetime__range=[from_date, to_date])
You can then access the related SubEvent and Event resources without hitting the database.
sub_sub_event = qs[0]
sub_event = sub_sub_event.sub_event  # doesn't hit the database
event = sub_sub_event.sub_event.event  # doesn't hit the database
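
Under the hood, select_related() follows the foreign keys with SQL JOINs in the same query. The sketch below shows roughly that shape with the standard sqlite3 module. The table names and sample values are hypothetical, and cur_datetime is left out to keep it short. If flat dictionaries are what you are after, values() with double-underscore lookups such as 'sub_event__version' and 'sub_event__event__week_no' should give a similar result on the Django side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE event (id INTEGER PRIMARY KEY, week_no INTEGER, day_no INTEGER);
    CREATE TABLE subevent (id INTEGER PRIMARY KEY, event_id INTEGER, version INTEGER);
    CREATE TABLE subsubevent (id INTEGER PRIMARY KEY, sub_event_id INTEGER, length INTEGER);
    INSERT INTO event VALUES (1, 32, 4);
    INSERT INTO subevent VALUES (5, 1, 2);
    INSERT INTO subsubevent VALUES (10, 5, 1);
    """
)
conn.row_factory = sqlite3.Row  # rows become addressable by column name

# One query walks both foreign keys, so every field arrives in a single row.
row = conn.execute(
    """
    SELECT sse.id AS id, sse.length AS length, se.version AS version,
           e.week_no AS week_no, e.day_no AS day_no
    FROM subsubevent sse
    JOIN subevent se ON se.id = sse.sub_event_id
    JOIN event e ON e.id = se.event_id
    """
).fetchone()
combined = dict(row)
```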
