Django query, annotate a chain of related models - django

I have the following schema with PostgreSQL:
class Video(models.Model):
    title = models.CharField(max_length=255)
    created_at = models.DateTimeField()
    disabled = models.BooleanField(default=False)
    view_count = models.DecimalField(max_digits=10, decimal_places=0)
class TopVideo(models.Model):
    # Field type was missing in the post; OneToOneField is assumed from primary_key=True
    video = models.OneToOneField(Video, on_delete=models.CASCADE, primary_key=True)
class Comment(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    video = models.ForeignKey(Video, related_name="comments", on_delete=models.CASCADE)
The reason I have a TopVideo model is that I have millions of videos and querying them takes a long time on a cheap server. So I have a secondary model that a Celery task flushes and repopulates on each run, which makes the homepage load much faster. The task runs the query shown next and saves the results into the TopVideo model. This way the task may take a long time to run, but the user no longer has to wait for the expensive query.
Before having the TopVideo model, I ran this query for my homepage:
videos = (
    Video.objects.filter(created_at__range=[start, end])
    .annotate(comment_count=Count("comments"))
    .exclude(disabled=True)
    .order_by("-view_count")[:100]
)
This worked perfectly and I had access to "comment_count" in my template, where I could easily show the number of comments each video had.
But now that I make this query:
top_videos = (
    TopVideo.objects.all()
    .annotate(comment_count=Count("video__comments"))
    .select_related("video")
    .order_by("-video__view_count")[:100]
)
and with a simple for-loop,
videos = []
for video in top_videos:
    videos.append(video.video)
I send the videos to the template to render.
My problem is, I no longer have access to the "comment_count" inside the template, and naturally so; I don't send the queryset anymore. How can I now access the comment_count?
Things I tried:
Sending the TopVideo query to template did not work. They're a bunch of TopVideo objects, not Video objects.
I added this piece of code in my template "{{ video.comments.count }}" but this makes 100 requests to the database, which is not really optimal.

You can set .comment_count on your Video objects with:
videos = []
for top_video in top_videos:
    video = top_video.video
    video.comment_count = top_video.comment_count
    videos.append(video)
That being said, it is unclear to me why you are querying through TopVideo if you then strip the TopVideo context from the video.
If you want to obtain the Videos for which there exists a TopVideo object, you can work with:
videos = Video.objects.filter(
    created_at__range=[start, end], topvideo__isnull=False
).annotate(
    comment_count=Count('comments')
).exclude(disabled=True).order_by('-view_count')[:100]
The topvideo__isnull=False will thus filter out Videos that are not TopVideos.

Related

Django - prefetch_related GenericForeignKey results and sort them

I have the below structure, where content modules, which are subclassed from a common model, are attached to pages via a 'page module' model that references them via a GenericForeignKey:
class SitePage(models.Model):
    title = models.CharField()
    # [etc..]
class PageModule(models.Model):
    page = models.ForeignKey(SitePage, db_index=True, on_delete=models.CASCADE)
    module_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    module_id = models.PositiveIntegerField()
    module_object = GenericForeignKey("module_type", "module_id")
class CommonModule(models.Model):
    published_time = models.DateTimeField()
class SingleImage(CommonModule):
    title = models.CharField()
    # [etc..]
class Article(CommonModule):
    title = models.CharField()
    # [etc..]
At the moment, populating pages from this results in a LOT of SQL queries. I want to fetch all the module contents (i.e. all the SingleImage and Article instances) for a given page in the most database-efficient manner.
I can't just do a straight prefetch_related because it "must be restricted to a homogeneous set of results", and I'm fetching multiple content types.
I can get each module type individually:
image_modules = PageModule.objects.filter(
    page=whatever_page,
    module_type=ContentType.objects.get_for_model(SingleImage),
).prefetch_related('module_object')
article_modules = PageModule.objects.filter(
    page=whatever_page,
    module_type=ContentType.objects.get_for_model(Article),
).prefetch_related('module_object')
all_modules = image_modules | article_modules
But I need to sort them:
all_modules.order_by('module_object__published_time')
and I can't because:
"Field 'module_object' does not generate an automatic reverse relation
and therefore cannot be used for reverse querying"
... and I don't think I can add the recommended GenericRelation field to all the content models because there's already content in there.
So... can I do this at all? Or am I stuck?
Following the advice in the comments above I eventually arrived at this code (from 2012!) that has roughly halved the number of queries:
https://gist.github.com/justinfx/3095246
However, as I noted above, it's done that at the expense of creating some fairly inefficient WHERE pk IN() queries, so I've not actually saved much time in total.
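Since the ordering cannot be pushed into SQL through the generic foreign key, one fallback is to sort in Python once the module objects are loaded. Below is a minimal sketch of that idea. It uses plain dataclasses as hypothetical stand-ins for the models above (no database involved, titles and dates made up); with the real models the list would come from the PageModule queryset after its module_object values have been fetched.

```python
from dataclasses import dataclass
from datetime import datetime
from operator import attrgetter


# Hypothetical stand-ins for the Django models in the question.
@dataclass
class SingleImage:
    title: str
    published_time: datetime


@dataclass
class Article:
    title: str
    published_time: datetime


@dataclass
class PageModule:
    module_object: object  # stands in for the GenericForeignKey target


image_modules = [PageModule(SingleImage("Photo", datetime(2020, 3, 1)))]
article_modules = [
    PageModule(Article("Late post", datetime(2020, 5, 1))),
    PageModule(Article("Early post", datetime(2020, 1, 1))),
]

# Sort the combined modules by the related object's published_time in Python.
all_modules = sorted(
    image_modules + article_modules,
    key=attrgetter("module_object.published_time"),
)
titles = [m.module_object.title for m in all_modules]
```

This trades database ordering for an in-memory sort, which is probably fine for the handful of modules on a single page but not for large result sets.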

Django pagination query duplicated, double the time

In my current project I want to do some filtering and ordering on a queryset and show it to the user in a paginated form.
This works fine, however I am not comfortable with the performance.
When I use an order_by statement, either explicitly or implicitly through the model's Meta ordering, I can see in the Debug Toolbar that the query is essentially executed twice.
Once for the paginator count (without the ORDER BY) and once to fetch the objects slice (with ORDER BY).
From my observation this leads to doubling the time it takes.
Is there any way this can be optimized?
Below is a minimal working example, in my actual app I use class based views.
class Medium(models.Model):
    title = models.CharField(verbose_name=_('title'),
                             max_length=256,
                             null=False, blank=False,
                             db_index=True,
                             )
    offered_by = models.ForeignKey(Institution,
                                   verbose_name=_('Offered by'),
                                   on_delete=models.CASCADE,
                                   )
    quantity = models.IntegerField(verbose_name=_('Quantity'),
                                   validators=[
                                       MinValueValidator(0)
                                   ],
                                   null=False, blank=False,
                                   )
    deleted = models.BooleanField(verbose_name=_('Deleted'),
                                  default=False,
                                  )
def index3(request):
    media = Medium.objects.filter(deleted=False, quantity__gte=0)
    media = media.exclude(offered_by_id=request.user.institution_id)
    media = media.filter(title__icontains="funktion")
    media = media.order_by('title')
    paginator = Paginator(media, 25)
    media = paginator.page(1)
    return render(request, 'media/empty2.html', {'media': media})
Debug toolbar sql timings
The query is not exactly duplicated: one is a COUNT query, the other fetches the actual objects for the requested page. This is unavoidable, since Django's Paginator needs to know the total number of objects. However, if the media queryset isn't too large, you can optimise by forcing it to be evaluated (just add a line len(media) before you define the Paginator); the count and the page slice are then served from the queryset's cached results instead of two separate queries.
But note that if media is very large, you might not want to force evaluation, since that loads all the objects into memory.

Django join on cached queryset

I have three models:
class Video(models.Model):
    video_pk = models.AutoField(primary_key=True)
    author_fk = models.ForeignKey(GRUser, related_name='uploaded_videos', db_column='author_fk')
class VideoLike(models.Model):
    video_like_pk = models.AutoField(primary_key=True)
    video_fk = models.ForeignKey(Video, related_name='likes_list', db_column='video_fk')
    author_fk = models.ForeignKey(GRUser, related_name='video_likes_list', db_column='author_fk')
    video_like_dttm = models.DateTimeField(auto_now_add=True)
class VideoStats(models.Model):
    video_fk_pk = models.OneToOneField(Video, primary_key=True, db_column='video_fk_pk', related_name='stats')
    likes_num = models.BigIntegerField(default=0)
To prevent the same database hits I periodically cache some popular videos:
from django.core.cache import cache
qs_all = models.Video.objects.select_related('author_fk', 'stats').filter(publication_status=models.Video.PUBLISHED).order_by('-stats__likes_num')
cache.set('popular_videos_all', qs_all[:length])
When returning those videos through my API (Django-Rest-Framework) I should add video_like_pk of current user's like if the user is authenticated and the like exists.
I can do it using prefetch_related, but that makes two calls to the database and doesn't use the cached results. I want to fetch only the current user's VideoLikes from the database and take the rest from the cache. Is that possible? Or is there a completely different approach that would be better?
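One possible approach, sketched here with plain Python stand-ins rather than the real models: keep the cached videos as they are, fetch only the current user's likes for those video ids in a single query, and merge the two in memory. The user_likes list below is a hypothetical stand-in for the (video id, like id) pairs such a query would return, and my_like_pk is an invented attribute name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedVideo:
    video_pk: int
    my_like_pk: Optional[int] = None  # hypothetical attribute read by the serializer


# Stand-in for the list stored under 'popular_videos_all' in the cache.
cached_videos = [CachedVideo(1), CachedVideo(2), CachedVideo(3)]

# Stand-in for the single per-user query: (video id, video_like_pk) pairs.
user_likes = [(1, 101), (3, 103)]

# Merge in memory: videos the user has not liked keep None.
like_by_video = dict(user_likes)
for video in cached_videos:
    video.my_like_pk = like_by_video.get(video.video_pk)

result = [(v.video_pk, v.my_like_pk) for v in cached_videos]
```

With the models from the question, the pairs could presumably come from a VideoLike queryset filtered on author_fk and video_fk__in and reduced with values_list; that part is an assumption and is not tested here.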

Creating a query with foreign keys and grouping by some data in Django

I have thought about my problem for days and I need a fresh view on it.
I am building a small application for a client for his deliveries.
# models.py - Clients app
class ClientPR(models.Model):
    title = models.CharField(max_length=5,
                             choices=TITLE_LIST,
                             default='mr')
    last_name = models.CharField(max_length=65)
    first_name = models.CharField(max_length=65, verbose_name='Prénom')
    frequency = WeekdayField(default=[])  # Return a CommaSeparatedIntegerField from 0 for Monday to 6 for Sunday...
    [...]
# models.py - Delivery app
class Truck(models.Model):
    name = models.CharField(max_length=40, verbose_name='Nom')
    description = models.CharField(max_length=250, blank=True)
    color = models.CharField(max_length=10,
                             choices=COLORS,
                             default='green',
                             unique=True,
                             verbose_name='Couleur Associée')
class Order(models.Model):
    delivery = models.ForeignKey(OrderDelivery, verbose_name='Delivery')
    client = models.ForeignKey(ClientPR)
    order = models.PositiveSmallIntegerField()
class OrderDelivery(models.Model):
    date = models.DateField(default=d.today())
    truck = models.ForeignKey(Truck, verbose_name='Camion', unique_for_date="date")
So I was trying to build a query and came up with this one:
ClientPR.objects.today().filter(order__delivery__date=date.today()) \
    .order_by('order__delivery__truck', 'order__order')
But it does not do what I really want.
I want a list of client objects (querysets) grouped by truck and ordered by today's delivery order.
The thing is, I want EVERY client for the day, even those not in the delivery list, and a filter cannot give me that.
I can query through the OrderDelivery model, but then I only get the clients in the delivery, not all of them for the day.
Maybe I need a Q object, or even raw SQL?
Maybe I have built my model relationships the wrong way, or I need to scale back what I want to do. For now, I need your help to see the problem with fresh eyes.
Thanks to those who take the time to help me.
After some tests, I decided to go with two queries for one table:
one on the OrderDelivery queryset to get a list of clients grouped by truck, and another on the ClientPR queryset for all the clients without a delivery set for them.
That way, no problem!
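As a rough illustration of how the two results fit together, here is a small sketch in plain Python. The rows and names are made up, and the two lists are hypothetical stand-ins for what the OrderDelivery and ClientPR queries would return.

```python
from collections import defaultdict

# Hypothetical rows from the delivery query: (truck, position, client).
delivery_rows = [
    ("green", 2, "Durand"),
    ("red", 1, "Martin"),
    ("green", 1, "Bernard"),
]
# Hypothetical result of the client query: everyone expected today.
clients_today = ["Bernard", "Durand", "Martin", "Petit"]

# First part: clients grouped by truck, ordered by delivery position.
by_truck = defaultdict(list)
for truck, position, client in sorted(delivery_rows):
    by_truck[truck].append(client)

# Second part: clients of the day with no delivery assigned.
assigned = {client for _, _, client in delivery_rows}
unassigned = [c for c in clients_today if c not in assigned]
```

The template can then render one block per truck followed by a block for the unassigned clients.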

Ordering by a custom model field in django

I am trying to add an additional custom field to a django model. I have been having quite a hard time figuring out how to do the following, and I will be awarding a 150pt bounty for the first fully correct answer when it becomes available (after it is available -- see as a reference Improving Python/django view code).
I have the following model, with a custom def that returns a video count for each user --
class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True)
    positions = models.ManyToManyField('Position', through='PositionTimestamp', blank=True)
    def count(self):
        from django.db import connection
        cursor = connection.cursor()
        cursor.execute(
            """SELECT (
                SELECT COUNT(*)
                FROM videos_video v
                WHERE v.uploaded_by_id = p.id
                OR EXISTS (
                    SELECT NULL
                    FROM videos_videocredit c
                    WHERE c.video_id = v.id
                    AND c.profile_id = p.id
                )
            ) AS Total_credits
            FROM userprofile_userprofile p
            WHERE p.id = %d"""%(int(self.pk))
        )
        return int(cursor.fetchone()[0])
I want to be able to order by the count, i.e., UserProfile.objects.order_by('count'). Of course, I can't do that, which is why I'm asking this question.
Previously, I tried adding a custom model Manager, but the problem with that was I also need to be able to filter by various criteria of the UserProfile model: Specifically, I need to be able to do: UserProfile.objects.filter(positions=x).order_by('count'). In addition, I need to stay in the ORM (cannot have a raw sql output) and I do not want to put the filtering logic into the SQL, because there are various filters, and would require several statements.
How exactly would I do this? Thank you.
My reaction is that you're trying to take a bigger bite than you can chew. Break it into bite size pieces by giving yourself more primitives to work with.
You want to create these two pieces separately so you can call on them:
Does this user get credit for this video? return boolean
For how many videos does this user get credit? return int
Then use a combination of @property, model managers, querysets, and methods that make it easiest to express what you need.
For example you might attach the "credit" to the video model taking a user parameter, or the user model taking a video parameter, or a "credit" manager on users which adds a count of videos for which they have credit.
It's not trivial, but shouldn't be too tricky if you work for it.
"Couldn't you use something like the "extra" queryset modifier?"
See the docs.
I didn't put this in an answer at first because I wasn't sure it would actually work or whether it was what you needed; it was more of a nudge in the (hopefully) right direction.
The docs on that page include this example.
Query:
Blog.objects.extra(
    select={
        'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
    },
)
Resulting SQL:
SELECT blog_blog.*, (SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id) AS entry_count
FROM blog_blog;
Perhaps do something like that, referring to the user id (which you currently have as p.id) as appname_userprofile.id.
Note:
I'm just winging it, so play around a bit.
You could use the shell to output the query as SQL and see what you are getting.
models:
class Positions(models.Model):
    x = models.IntegerField()
    class Meta:
        db_table = 'xtest_positions'
class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True)
    positions = models.ManyToManyField(Positions)
    class Meta:
        db_table = 'xtest_users'
class Video(models.Model):
    usr = models.ForeignKey(UserProfile)
    views = models.IntegerField()
    class Meta:
        db_table = 'xtest_video'
result:
test = UserProfile.objects.annotate(video_views=Sum('video__views')).order_by('video_views')
for t in test:
    print(t.video_views)
doc: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
This is either what you want, or I've completely misunderstood!.. Anywhoo... Hope it helps!