Filter and count with django

Suppose I have Post and Vote tables.
Each post can be either liked or disliked (this is the post_type).
class Post(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(verbose_name=_("title"), max_length=100, null=True, blank=True)
    content = models.TextField(verbose_name=_("content"), unique=True)
    ip = models.CharField(verbose_name=_("ip"), max_length=15)

class Vote(models.Model):
    user = models.ForeignKey(User)
    post = models.ForeignKey(Post)
    post_type = models.PositiveSmallIntegerField(_('post_type'))
I want to get posts and annotate each post with number of likes.
What is the best way to do this?

You should add a method to the Post model and call it whenever you need the count.
class Post(models.Model):
    ...
    def likes_count(self):
        return self.vote_set.filter(post_type=1).count()
Use it like this:
p = Post.objects.get(pk=1)
print(p.likes_count())

One approach is to add a method to the Post class that fetches this count, as shown by @sachin-gupta. However, this will generate one extra query for every post that you fetch. If you are fetching posts and their counts in bulk, this is not desirable.
You could annotate the posts in bulk but I don't think your current model structure will allow it, because you cannot filter within an annotation. You could consider changing your structure as follows:
class Vote(models.Model):
    """
    An abstract vote model.
    """
    user = models.ForeignKey(User)
    post = models.ForeignKey(Post)

    class Meta:
        abstract = True

class LikeVote(Vote):
    pass

class DislikeVote(Vote):
    pass
i.e., instead of storing likes and dislikes in one model, you have a separate model for each. Now, you can annotate your posts in bulk, in a single query:
from django.db.models import Count

posts = Post.objects.all().annotate(Count('likevote'))
for post in posts:
    print(post.likevote__count)
Of course, whether or not this is feasible depends on the architecture of the rest of your app, and how many "vote types" you are planning to have. However if you are going to be querying the vote counts of posts frequently then you will need to try and avoid a large number of database queries.
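As a side note, on Django 2.0 or newer you can keep the single Vote model and still annotate in bulk, because aggregates accept a filter argument. A minimal sketch, assuming post_type=1 means a like:
from django.db.models import Count, Q

# Count only the votes whose post_type is 1 (a like) for each post.
posts = Post.objects.annotate(
    likes_count=Count('vote', filter=Q(vote__post_type=1))
)

for post in posts:
    print(post.title, post.likes_count)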

Related

Django Model queries with relationships. How to do the right join

Let's say I have 2 Models:
class Auction(models.Model):
    seller = models.ForeignKey(User, on_delete=models.CASCADE, related_name="seller")
    title = models.CharField(max_length=64)

class Watchlist(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='user_watchlist')
    auction = models.ForeignKey(Auction, on_delete=models.CASCADE, related_name='auction_watchlist')
The view receives a request, creates a context variable with the auction objects that are:
associated with the user who made the request and
that have been added to the Watchlist Model,
sends it to the template.
I have set up my view to work like this:
@login_required
def watchlist(request):
    watchlist_objects = Watchlist.objects.filter(user=request.user)
    auction_objects = Auction.objects.filter(auction_watchlist__in=watchlist_objects).all()
    context = {'watchlist_auctions': auction_objects}
    print(context)
    return render(request, "auctions/watchlist.html", context)
- I make the first query to get the list of items in the watchlist associated with the user.
- Then I use that to get another query from the Auction model, and I pass it to the template.
In the template I can access the attributes of Auction to display them. (title, author, and others that I did not include for simplicity)
The question is:
Is this the "right way? Is there a better way to access the attributes in Auction from the first Watchlist query?
It seems to me that I'm doing something overcomplicated.
This is not that bad, considering that it will probably be executed as one query because of lazy queryset evaluation. You can skip the .all() if you already have .filter().
However, there is a more convenient way to do this, using lookups that span relationships:
auction_objects = Auction.objects.filter(auction_watchlist__user_id=request.user.id)
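For context, the whole view could then be reduced to something like this (a sketch reusing the names from the question):
from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def watchlist(request):
    # One query: follow Auction -> Watchlist -> User through the related_name.
    auction_objects = Auction.objects.filter(auction_watchlist__user=request.user)
    return render(request, "auctions/watchlist.html", {"watchlist_auctions": auction_objects})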

dynamically count vs database record

I have a Post model as below; right now I use number_of_likes to record how many times a post has been liked. Doing it this way, I have to manually maintain the number_of_likes field.
I added this field to Post for mainly two reasons, and I would like to hear your advice:
it is easy to write serialisation using declarative syntax (every post needs this)
I don't need to filter and count on the Like model, which is more expensive than just reading this value from a field
class Post(models.Model):
    ...
    number_of_likes = models.IntegerField()

class Like(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    post = models.ForeignKey(Post, on_delete=models.CASCADE)
I would like to know which method is better: using Like.objects.filter(user=user).count(), or maintaining a new field such as number_of_likes. If I choose the latter, what is the best way to maintain this field?
As @WillemVanOnsem suggested, the best way to display this data is with an annotation. For example:
from django.db.models import Count

posts = Post.objects.annotate(num_of_likes=Count('like'))

# usage
for post in posts:
    print(post.num_of_likes)

# or
posts.values('pk', 'num_of_likes')
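As for the second half of the question (how to maintain a denormalised number_of_likes field, if you decide to keep it): one common approach is to update the counter atomically with an F() expression whenever a Like is created or deleted. A sketch only; add_like and remove_like are hypothetical helpers, not part of the question's code:
from django.db.models import F

def add_like(user, post):
    # Create the Like and bump the cached counter in one atomic UPDATE.
    Like.objects.create(user=user, post=post)
    Post.objects.filter(pk=post.pk).update(number_of_likes=F('number_of_likes') + 1)

def remove_like(user, post):
    deleted, _ = Like.objects.filter(user=user, post=post).delete()
    if deleted:
        Post.objects.filter(pk=post.pk).update(number_of_likes=F('number_of_likes') - 1)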

Django project architecture advice

I have a Django project with a Post model which looks like this:
class BasicPost(models.Model):
    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    published = models.BooleanField(default=False)
    created_date = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=100, blank=False)
    body = models.TextField(max_length=999)
    media = models.ImageField(blank=True)

    def get_absolute_url(self):
        return reverse('basic_post', args=[str(self.pk)])

    def __str__(self):
        return self.title
Also, I use the default User model that comes with Django.
I want to save which posts each user has read so I can send them posts they haven't read.
My question is: what is the best way to do so? If I use a ManyToMany field, should I put it on the User model and save all the posts each user has read, or should I do it in the other direction and put the ManyToMany field on the Post model, saving for each post which users have read it?
There are going to be more than 1 million posts in the Post model and about 50,000 users, and I want the most efficient filters to return unread posts to the user.
If I should use the first option, how do I expand the User model?
thanks!
On your first question (which way to go): I believe that ManyToMany by default creates indices in the DB for both foreign keys. Therefore, wherever you put the relation, in User or in BasicPost, you'll have the direct and reverse relationships working through an index. Django will create for you a pivot table with three columns like: (id, user_id, basic_post_id). Every access to this table will index through user_id or basic_post_id and check that there's a unique couple (user_id, basic_post_id), if any. So it's more within your application that you'll decide whether you filter from a 1 million set or from a 50k posts.
On your second question (how to overload User), it's generally recommended to subclass User from the very beginning. If that's too late and your project is too far advanced for that, you can do this in your models.py:
class BasicPost(models.Model):
    # your code
    readers = models.ManyToManyField(to='auth.User', related_name="posts_already_read")

# "manually" add a method to the User class
def _unread_posts(user):
    return BasicPost.objects.exclude(readers=user)

User.unread_posts = _unread_posts
Haven't run this code though! Hope this helps.
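Usage of the monkey-patched method above would then look something like this (still a sketch, assuming the readers field from the snippet):
# All posts the logged-in user has not read yet.
unread = request.user.unread_posts()

# Or, without touching the User class at all, the equivalent direct query:
unread = BasicPost.objects.exclude(readers=request.user)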
Could you have a separate ReadPost model instead of a potentially large m2m, which you could save when a user reads a post? That way you can just query the ReadPost models to get the data, instead of storing it all in the blog post.
Maybe something like this:
from django.utils import timezone

class UserReadPost(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE, related_name="read_posts")
    seen_at = models.DateTimeField(default=timezone.now)
    post = models.ForeignKey(BasicPost, on_delete=models.CASCADE, related_name="read_by_users")
You could add a unique_together constraint to make sure that only one UserReadPost object is created for each user and post (to make sure you don't count any twice), and use get_or_create() when creating new records.
Then finding the posts a user has read is:
posts = UserReadPost.objects.filter(user=current_user).values_list("post", flat=True)
This could also be extended relatively easily. For example, if your BasicPost objects can be edited, you could add an updated_at field to the post. Then you could compare the seen_at of the UserReadPost field to the updated_at field of the BasicPost to check if they've seen the updated version.
Downside is you'd be creating a lot of rows in the DB for this table.
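Putting the unique_together and get_or_create suggestions together, one possible shape (the updated_at field on BasicPost and the request/post variables here are assumptions for illustration):
from django.db.models import F
from django.utils import timezone

class UserReadPost(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE, related_name="read_posts")
    seen_at = models.DateTimeField(default=timezone.now)
    post = models.ForeignKey(BasicPost, on_delete=models.CASCADE, related_name="read_by_users")

    class Meta:
        # One row per (user, post) pair, so a read is never recorded twice.
        unique_together = ("user", "post")

# Recording a read without creating duplicates:
record, created = UserReadPost.objects.get_or_create(user=request.user, post=post)

# If BasicPost gains an updated_at field, posts whose latest edit the user has
# not seen could be found with:
stale = UserReadPost.objects.filter(user=request.user, seen_at__lt=F("post__updated_at"))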
If you place your posts in chronological order (by created_date, for example), another option could be to extend the user model with a latest_read_post_id field.
In this case:
class BasicPost(models.Model):
    # your code

    def is_read_by(self, user):
        return self.id <= user.latest_read_post_id
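The flip side of the same idea, for fetching everything a user has not yet read (a sketch, assuming the latest_read_post_id field has been added to the user model as described):
# All posts newer than the last one this user has read.
unread = BasicPost.objects.filter(id__gt=request.user.latest_read_post_id)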

Optimizing Django queryset related comparisons

I have a Django app where users upload photos, and leave comments under them. The data models to reflect these objects are Photo and PhotoComment respectively.
There's a third data model called PhotoThreadSubscription. Whenever a user comments under a photo, the user is subscribed to that particular thread via creating an object in PhotoThreadSubscription. This way, he/she can be apprised of comments left in the same thread by other users subsequently.
class PhotoThreadSubscription(models.Model):
    viewer = models.ForeignKey(User)
    viewed_at = models.DateTimeField(db_index=True)
    which_photo = models.ForeignKey(Photo)
Every time a user comments under a photo, I update the viewed_at attribute of the user's PhotoThreadSubscription object for that particular photo. Any comments by other users that have a submission time of greater than viewed_at for that particular thread are therefore new.
Suppose I have a queryset of comments, all belonging to unique photos that never repeat. I want to traverse through this queryset and find the latest unseen comment.
Currently, I'm trying this in a very DB heavy way:
latest_unseen_comment = PhotoComment(id=1)  # i.e. a very old comment
for comment in comments:
    if comment.submitted_on > PhotoThreadSubscription.objects.get(viewer=user, which_photo_id=comment.which_photo_id).viewed_at and comment.submitted_on > latest_unseen_comment.submitted_on:
        latest_unseen_comment = comment
This is obviously not a good way to do it. For one, I don't want to do DB calls in a for loop. How do I manage the above in one call? Specifically, how do I get the relevant PhotoThreadSubscription queryset in one call, and next, how do I use that to calculate the max_unseen_comment? I'm highly confused right now.
class Photo(models.Model):
    owner = models.ForeignKey(User)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
    upload_time = models.DateTimeField(auto_now_add=True, db_index=True)
    latest_comment = models.ForeignKey('PhotoComment', blank=True, null=True, on_delete=models.CASCADE)

class PhotoComment(models.Model):
    which_photo = models.ForeignKey(Photo)
    text = models.TextField(validators=[MaxLengthValidator(250)])
    submitted_by = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
Please ask for clarification if the question seemed hazy.
I think this will do it in a single query:
from django.db.models import F

latest_unseen_comment = (
    comments.filter(which_photo__photothreadsubscription__viewer=user,
                    which_photo__photothreadsubscription__viewed_at__lt=F("submitted_on"))
            .order_by("-submitted_on")
            .first()
)
The key here is using F expressions so that the comparison can be done with each comment's individual date, rather than using a single date hardcoded in the query. After filtering the queryset to only include the comments that are unseen, we then order_by the date of the comment and take the first one.

Working with annotation in Django queryset

I need help with a Django annotation.
I have a Django data model called Photo, and another called PhotoStream (one PhotoStream can have many Photos - detailed models at the end). I get the most recent 200 photos simply by: Photo.objects.order_by('-id')[:200]
To every object in the above queryset, I want to annotate the count of all related photos. A related photo is one which is (i) from the same PhotoStream, (ii) whose timestamp is less than or equal to the time stamp of the object in question.
In other words:
for obj in context["object_list"]:
    count = Photo.objects.filter(which_stream=obj.which_stream).order_by('-upload_time').exclude(upload_time__gt=obj.upload_time).count()
I'm new to this, and can't seem to translate the for loop above into a queryset annotation. Any help?
Here's the photo and photostream data models with relevant fields:
class Photo(models.Model):
    owner = models.ForeignKey(User)
    which_stream = models.ForeignKey(PhotoStream)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
    upload_time = models.DateTimeField(auto_now_add=True, db_index=True)

class PhotoStream(models.Model):
    stream_cover = models.ForeignKey(Photo)
    children_count = models.IntegerField(default=1)
    creation_time = models.DateTimeField(auto_now_add=True)
So far my attempt has been:
Photo.objects.order_by('-id').annotate(num_related_photos=Count('which_stream__photo__upload_time__lte=F('upload_time')))[:200]
Gives an invalid syntax error.
Whereas the following works, but doesn't cater to my timestamp related requirement in (ii) above:
Photo.objects.order_by('-id').annotate(num_related_photos=Count('which_stream__photo'))[:200]
I broke the query down as follows:
from django.db.models import Count, F

Photo.objects.filter(
    which_stream__photo__upload_time__lte=F('upload_time')
).annotate(
    num_related_photos=Count('which_stream__photo')
).order_by('-id')[:200]
It looks counter-intuitive, but works because of the way this creates the underlying SQL. Good explanation here: https://stackoverflow.com/a/7001419/4936905