Working with annotation in Django queryset

I need help with a Django annotation.
I have a Django data model called Photo, and another called PhotoStream (one PhotoStream can have many Photos; detailed models are at the end). I get the most recent 200 photos simply with: Photo.objects.order_by('-id')[:200]
To every object in the above queryset, I want to annotate the count of all related photos. A related photo is one that (i) is from the same PhotoStream and (ii) has a timestamp less than or equal to the timestamp of the object in question.
In other words:
for obj in context["object_list"]:
    count = Photo.objects.filter(which_stream=obj.which_stream).order_by('-upload_time').exclude(upload_time__gt=obj.upload_time).count()
I'm new to this, and can't seem to translate the for loop above into a queryset annotation. Any help?
Here's the photo and photostream data models with relevant fields:
class Photo(models.Model):
    owner = models.ForeignKey(User)
    which_stream = models.ForeignKey(PhotoStream)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
    upload_time = models.DateTimeField(auto_now_add=True, db_index=True)
class PhotoStream(models.Model):
    stream_cover = models.ForeignKey(Photo)
    children_count = models.IntegerField(default=1)
    creation_time = models.DateTimeField(auto_now_add=True)
So far my attempt has been:
Photo.objects.order_by('-id').annotate(num_related_photos=Count('which_stream__photo__upload_time__lte=F('upload_time')))[:200]
This gives an invalid syntax error.
The following works, but doesn't meet the timestamp requirement in (ii) above:
Photo.objects.order_by('-id').annotate(num_related_photos=Count('which_stream__photo'))[:200]

I broke the query down as follows:
Photo.objects.filter(which_stream__photo__upload_time__lte=F('upload_time')).annotate(num_related_photos=Count('which_stream__photo')).order_by('-id')[:200]
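It looks counter-intuitive, but it works because of the SQL this generates: a filter() placed before annotate() constrains the same join that Count uses, so only related photos whose upload_time is less than or equal to the photo's own upload_time are counted. Every photo matches itself, so no photo is dropped from the result. Good explanation here: https://stackoverflow.com/a/7001419/4936905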
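As a rough illustration of the SQL involved, here is a minimal sketch using Python's built-in sqlite3 module. It only approximates the query shape: the table name photo, the integer timestamps, and the direct self-join (Django would go through the PhotoStream table) are simplifying assumptions, not the SQL Django actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo (id INTEGER PRIMARY KEY, which_stream_id INTEGER, upload_time INTEGER)"
)
# Hypothetical data: stream 1 holds photos 1, 2 and 4; stream 2 holds photo 3.
# upload_time is a plain integer here to keep the example small.
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 2, 30), (4, 1, 40)],
)

# The WHERE clause (the filter) and the COUNT (the annotation) act on the
# same joined rows, so only related photos that pass the filter are counted.
rows = conn.execute(
    """
    SELECT p.id, COUNT(related.id) AS num_related_photos
    FROM photo AS p
    JOIN photo AS related
      ON related.which_stream_id = p.which_stream_id
    WHERE related.upload_time <= p.upload_time
    GROUP BY p.id
    ORDER BY p.id DESC
    LIMIT 200
    """
).fetchall()
print(rows)
```

Each row pairs a photo id with the number of photos in the same stream uploaded at or before it, which is the count the for loop in the question computes one query at a time.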

Related

Django Query ManyToMany with Custom Through Table Field Data

I've been trying to figure this one out for a while now but am confused. Every ManyToMany relationship goes through a third table, which isn't that difficult to understand. But when the third table is a custom through table with additional fields, how do you grab the custom fields for each row?
Here's a sample set of models I made. How can I get all the movies a User has watched along with the additional watched and finished fields? This example assumes the user is only allowed to see a movie once, whether they finish it or not, so there will be only one record for each movie they saw.
class Movie(models.Model):
    title = models.CharField(max_length=191)
class User(models.Model):
    username = models.CharField(max_length=191)
    watched = models.ManyToManyField(Movie, through='Watch')
class Watch(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    movie = models.ForeignKey(Movie, on_delete=models.CASCADE)
    watched = models.DateTimeField()
    finished = models.BooleanField()
Penny for your thoughts my friends.
You can use:
from django.db.models import F
my_user.watched.annotate(
    watched=F('watch__watched'),
    finished=F('watch__finished')
)
This will return a QuerySet of Movies that carry .watched and .finished as extra attributes.
That being said, it might be cleaner to just access the watch_set, and thus iterate over the Watch objects and access the .movie object for details about the movie. You can use .select_related(..) to fetch the information about the Movies in the same database query:
for watch in my_user.watch_set.select_related('movie'):
    print(f'{watch.movie.title}: {watch.watched}, {watch.finished}')

Django project architecture advice

I have a Django project with a post model that looks like this:
class BasicPost(models.Model):
    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    published = models.BooleanField(default=False)
    created_date = models.DateTimeField(auto_now_add=True)
    title = models.CharField(max_length=100, blank=False)
    body = models.TextField(max_length=999)
    media = models.ImageField(blank=True)
    def get_absolute_url(self):
        return reverse('basic_post', args=[str(self.pk)])
    def __str__(self):
        return self.title
Also, I use the basic User model that comes with Django.
I want to save which posts each user has read so I can send them posts they haven't read.
My question is what the best way to do this is. If I use a many-to-many field, should I put it on the User model and save all the posts the user has read, or should I go in the other direction and put the many-to-many field on the Post model and save, for each post, which users read it?
There are going to be more than 1 million posts in the Post model and about 50,000 users, and I want the most efficient filters for returning unread posts to the user.
If I should use the first option, how do I extend the User model?
Thanks!
On your first question (which way to go): I believe that a ManyToManyField by default creates indexes in the DB for both foreign keys. Therefore, wherever you put the relation, in User or in BasicPost, both the direct and reverse relationships work through an index. Django will create a join table for you with three columns like (id, user_id, basic_post_id). Every access to this table goes through the index on user_id or basic_post_id, and each (user_id, basic_post_id) pair is unique. So it is more within your application that you'll decide whether you filter from the 1 million posts or from the 50k users.
On your second question (how to extend User): it's generally recommended to subclass User from the very beginning. If that's too late and your project is too far advanced for that, you can do this in your models.py:
from django.contrib.auth.models import User
class BasicPost(models.Model):
    # your code
    readers = models.ManyToManyField(to='auth.User', related_name="posts_already_read")
# "manually" add method to User class
def _unread_posts(user):
    # pass the user object itself; readers__in would require an iterable
    return BasicPost.objects.exclude(readers=user)
User.unread_posts = _unread_posts
Haven't run this code though! Hope this helps.
Could you have a separate ReadPost model instead of a potentially large m2m, which you could save when a user reads a post? That way you can just query the ReadPost models to get the data, instead of storing it all in the blog post.
Maybe something like this:
from django.utils import timezone
class UserReadPost(models.Model):
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE, related_name="read_posts")
    seen_at = models.DateTimeField(default=timezone.now)
    post = models.ForeignKey(BasicPost, on_delete=models.CASCADE, related_name="read_by_users")
You could add a unique_together constraint to make sure that only one UserReadPost object is created for each user and post (to make sure you don't count any twice), and use get_or_create() when creating new records.
Then finding the posts a user has read is:
posts = UserReadPost.objects.filter(user=current_user).values_list("post", flat=True)
This could also be extended relatively easily. For example, if your BasicPost objects can be edited, you could add an updated_at field to the post. Then you could compare the seen_at of the UserReadPost object to the updated_at field of the BasicPost to check whether the user has seen the updated version.
Downside is you'd be creating a lot of rows in the DB for this table.
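To show what the separate read-record table buys you, here is a minimal sketch at the SQL level using Python's built-in sqlite3 module. The table and column names and the helpers mark_read and unread_post_ids are hypothetical; INSERT OR IGNORE only stands in for the effect of get_or_create() combined with a uniqueness constraint, and is not what Django itself executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_read_post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES post(id),
        UNIQUE (user_id, post_id)  -- the role unique_together would play
    );
    INSERT INTO post (id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    """
)

def mark_read(user_id, post_id):
    # A repeated read of the same post by the same user adds no second row.
    conn.execute(
        "INSERT OR IGNORE INTO user_read_post (user_id, post_id) VALUES (?, ?)",
        (user_id, post_id),
    )

def unread_post_ids(user_id):
    # Posts that have no read record for this user.
    rows = conn.execute(
        """
        SELECT id FROM post
        WHERE id NOT IN (SELECT post_id FROM user_read_post WHERE user_id = ?)
        ORDER BY id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

mark_read(7, 1)
mark_read(7, 1)  # duplicate, ignored because of the unique constraint
mark_read(7, 3)
read_rows = conn.execute("SELECT COUNT(*) FROM user_read_post").fetchone()[0]
print(read_rows, unread_post_ids(7))
```

The unread lookup only touches the read records of one user, so its cost depends on how much that user has read rather than on the total number of read records.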
If you present posts in chronological order (by created_date, for example), one option is to extend the user model with a latest_read_post_id field.
In that case:
class BasicPost(models.Model):
    # your code
    def is_read_by(self, user):
        # the post stored in latest_read_post_id has itself been read, hence <=
        return self.id <= user.latest_read_post_id

Django - 'WhereNode' object has no attribute 'output_field' error

I am trying to query and annotate some data from my models:
class Feed(models.Model): # Feed of content
    user = models.ForeignKey(User, on_delete=models.CASCADE)
class Piece(models.Model): # Piece of content (video or playlist)
    removed = models.BooleanField(default=False)
    feed = models.ForeignKey(Feed, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
Other fields are not used in the following queries so I skipped them here.
In my view I need to get a queryset of all feeds of an authenticated user. The annotation should contain the number of pieces that are not removed.
Initially, the Piece model didn't contain the removed field and everything worked great with a queryset like this:
Feed.objects.filter(user=self.request.user).annotate(Count('piece'))
But then I added the removed field to the Piece model and needed to count only pieces that were not removed:
(Feed.objects.filter(user=self.request.user)
    .annotate(Count('piece'), filter=Q(piece__removed=False)))
It gave me the following error:
'WhereNode' object has no attribute 'output_field'
It is only a little fraction of what django outputs on the error page, so if it is not enough, please let me know what else I need to include in my question.
I tried to include output_field with options like models.IntegerField() or models.FloatField() (properly imported) here and there but got some errors which I do not provide here because I believe those actions made no sense.
I am using Django 2.0.3
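The mistake is in where filter is passed. It should be:
(Feed.objects.filter(user=self.request.user)
    .annotate(Count('piece', filter=Q(piece__removed=False))))
filter must be passed to Count, not to annotate. As a keyword argument to annotate, filter=Q(...) is treated as an annotation named filter, and a Q object has no output_field, hence the error.
Reference from Django's documentation:
https://docs.djangoproject.com/en/2.1/topics/db/aggregation/#filtering-on-annotations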
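For intuition about what a filtered Count does, here is a minimal sketch with Python's built-in sqlite3 module. The table names, column names and data are hypothetical, and the CASE expression is only one way a database can express a conditional count; the exact SQL Django generates depends on the backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feed (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE piece (id INTEGER PRIMARY KEY, feed_id INTEGER, removed INTEGER);
    INSERT INTO feed VALUES (1, 42), (2, 42), (3, 99);
    INSERT INTO piece VALUES (1, 1, 0), (2, 1, 1), (3, 1, 0), (4, 2, 1);
    """
)

# The condition lives inside the aggregate, so removed pieces are skipped
# by COUNT while every feed of the user still appears in the result.
rows = conn.execute(
    """
    SELECT feed.id,
           COUNT(CASE WHEN piece.removed = 0 THEN piece.id END) AS piece__count
    FROM feed
    LEFT JOIN piece ON piece.feed_id = feed.id
    WHERE feed.user_id = ?
    GROUP BY feed.id
    ORDER BY feed.id
    """,
    (42,),
).fetchall()
print(rows)
```

Feed 2 has only a removed piece and still shows up with a count of 0. Filtering the queryset itself on piece__removed=False would instead drop such feeds from the result.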

Optimizing Django queryset related comparisons

I have a Django app where users upload photos, and leave comments under them. The data models to reflect these objects are Photo and PhotoComment respectively.
There's a third data model called PhotoThreadSubscription. Whenever a user comments under a photo, the user is subscribed to that particular thread via creating an object in PhotoThreadSubscription. This way, he/she can be apprised of comments left in the same thread by other users subsequently.
class PhotoThreadSubscription(models.Model):
    viewer = models.ForeignKey(User)
    viewed_at = models.DateTimeField(db_index=True)
    which_photo = models.ForeignKey(Photo)
Every time a user comments under a photo, I update the viewed_at attribute of the user's PhotoThreadSubscription object for that particular photo. Any comments by other users that have a submission time of greater than viewed_at for that particular thread are therefore new.
Suppose I have a queryset of comments, all belonging to unique photos that never repeat. I want to traverse through this queryset and find the latest unseen comment.
Currently, I'm trying this in a very DB heavy way:
latest_unseen_comment = PhotoComment(id=1) # i.e. a very old comment
for comment in comments:
    if comment.submitted_on > PhotoThreadSubscription.objects.get(viewer=user, which_photo_id=comment.which_photo_id).viewed_at and comment.submitted_on > latest_unseen_comment.submitted_on:
        latest_unseen_comment = comment
This is obviously not a good way to do it. For one, I don't want to do DB calls in a for loop. How do I manage the above in one call? Specifically, how do I get the relevant PhotoThreadSubscription queryset in one call, and next, how do I use that to calculate the max_unseen_comment? I'm highly confused right now.
class Photo(models.Model):
    owner = models.ForeignKey(User)
    image_file = models.ImageField(upload_to=upload_photo_to_location, storage=OverwriteStorage())
    upload_time = models.DateTimeField(auto_now_add=True, db_index=True)
    latest_comment = models.ForeignKey(blank=True, null=True, on_delete=models.CASCADE)
class PhotoComment(models.Model):
    which_photo = models.ForeignKey(Photo)
    text = models.TextField(validators=[MaxLengthValidator(250)])
    submitted_by = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
Please ask for clarification if the question seemed hazy.
I think this will do it in a single query:
latest_unseen_comment = (
    comments.filter(which_photo__photothreadsubscription__viewer=user,
                    which_photo__photothreadsubscription__viewed_at__lt=F("submitted_on"))
    .order_by("-submitted_on")
    .first()
)
The key here is using F expressions so that the comparison can be done with each comment's individual date, rather than using a single date hardcoded in the query. After filtering the queryset to only include the comments that are unseen, we then order_by the date of the comment and take the first one.
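To see why the F expression matters, here is a minimal sketch of the same comparison written as SQL, using Python's built-in sqlite3 module. The table names, integer timestamps, sample data and the helper latest_unseen_comment_id are all hypothetical, and the join skips the Photo table that Django would traverse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photo_comment (
        id INTEGER PRIMARY KEY, which_photo_id INTEGER, submitted_on INTEGER);
    CREATE TABLE photo_thread_subscription (
        id INTEGER PRIMARY KEY, viewer_id INTEGER, which_photo_id INTEGER, viewed_at INTEGER);
    INSERT INTO photo_comment VALUES (1, 10, 100), (2, 11, 250), (3, 12, 300);
    INSERT INTO photo_thread_subscription VALUES
        (1, 5, 10, 50), (2, 5, 11, 200), (3, 5, 12, 400), (4, 6, 12, 0);
    """
)

def latest_unseen_comment_id(viewer_id):
    # s.viewed_at < c.submitted_on compares two columns row by row, which is
    # what F("submitted_on") expresses in the queryset filter.
    row = conn.execute(
        """
        SELECT c.id
        FROM photo_comment AS c
        JOIN photo_thread_subscription AS s
          ON s.which_photo_id = c.which_photo_id
        WHERE s.viewer_id = ? AND s.viewed_at < c.submitted_on
        ORDER BY c.submitted_on DESC
        LIMIT 1
        """,
        (viewer_id,),
    ).fetchone()
    return row[0] if row else None

print(latest_unseen_comment_id(5))
```

The whole comparison runs in one statement, so the number of database round trips no longer grows with the number of comments.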

Filter and count with django

Suppose I have Post and Vote tables.
Each post can be either liked or disliked (this is the post_type).
class Post(models.Model):
    author = models.ForeignKey(User)
    title = models.CharField(verbose_name=_("title"), max_length=100, null=True, blank=True)
    content = models.TextField(verbose_name=_("content"), unique=True)
    ip = models.CharField(verbose_name=_("ip"), max_length=15)
class Vote(models.Model):
    user = models.ForeignKey(User)
    post = models.ForeignKey(Post)
    post_type = models.PositiveSmallIntegerField(_('post_type'))
I want to get posts and annotate each post with number of likes.
What is the best way to do this?
You can add a method to the Post model and call it whenever you need the count.
class Post(models.Model):
    ...
    def likes_count(self):
        return self.vote_set.filter(post_type=1).count()
Use it like this:
p = Post.objects.get(pk=1)
print(p.likes_count())
One approach is to add a method to the Post class that fetches this count, as shown by @sachin-gupta. However, this will generate one extra query for every post that you fetch. If you are fetching posts and their counts in bulk, this is not desirable.
You could annotate the posts in bulk, but I don't think your current model structure will allow it on older Django versions, because there you cannot filter within an annotation (Django 2.0 and later accept a filter argument on Count). You could consider changing your structure as follows:
class Vote(models.Model):
    """
    An abstract vote model.
    """
    user = models.ForeignKey(User)
    post = models.ForeignKey(Post)
    class Meta:
        abstract = True
class LikeVote(Vote):
    pass
class DislikeVote(Vote):
    pass
i.e., instead of storing likes and dislikes in one model, you have a separate model for each. Now, you can annotate your posts in bulk, in a single query:
from django.db.models import Count
# 'likevote' is the default reverse query name for the LikeVote model
posts = Post.objects.all().annotate(Count('likevote'))
for post in posts:
    print(post.likevote__count)
Of course, whether or not this is feasible depends on the architecture of the rest of your app, and how many "vote types" you are planning to have. However if you are going to be querying the vote counts of posts frequently then you will need to try and avoid a large number of database queries.
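The difference in query counts between a per-post method and a bulk count can be sketched with Python's built-in sqlite3 module. Everything here is illustrative: the table layout, the sample data, the run helper that counts statements, and the assumption (taken from the first answer) that post_type 1 means a like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE vote (id INTEGER PRIMARY KEY, post_id INTEGER, post_type INTEGER);
    INSERT INTO post VALUES (1), (2), (3);
    INSERT INTO vote VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 1);
    """
)

counter = {"queries": 0}

def run(sql, params=()):
    """Execute one statement and record that a query was issued."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# Per-post method: one query for the posts, then one more for each post.
counter["queries"] = 0
per_post = {}
for (post_id,) in run("SELECT id FROM post ORDER BY id"):
    per_post[post_id] = run(
        "SELECT COUNT(*) FROM vote WHERE post_id = ? AND post_type = 1", (post_id,)
    )[0][0]
per_post_queries = counter["queries"]

# Bulk approach: a single grouped query returns every count at once.
counter["queries"] = 0
bulk = dict(
    run(
        """
        SELECT post.id, COUNT(CASE WHEN vote.post_type = 1 THEN vote.id END)
        FROM post
        LEFT JOIN vote ON vote.post_id = post.id
        GROUP BY post.id
        """
    )
)
bulk_queries = counter["queries"]
print(per_post_queries, bulk_queries)
```

Both approaches produce the same counts, but the per-post version issues one query per post on top of the initial fetch, while the grouped version stays at a single query however many posts are listed.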