Django join on cached queryset

I have three models:
class Video(models.Model):
    video_pk = models.AutoField(primary_key=True)
    author_fk = models.ForeignKey(GRUser, related_name='uploaded_videos', db_column='author_fk')

class VideoLike(models.Model):
    video_like_pk = models.AutoField(primary_key=True)
    video_fk = models.ForeignKey(Video, related_name='likes_list', db_column='video_fk')
    author_fk = models.ForeignKey(GRUser, related_name='video_likes_list', db_column='author_fk')
    video_like_dttm = models.DateTimeField(auto_now_add=True)

class VideoStats(models.Model):
    video_fk_pk = models.OneToOneField(Video, primary_key=True, db_column='video_fk_pk', related_name='stats')
    likes_num = models.BigIntegerField(default=0)
To avoid hitting the database with the same queries, I periodically cache some popular videos:

from django.core.cache import cache

qs_all = models.Video.objects.select_related('author_fk', 'stats') \
    .filter(publication_status=models.Video.PUBLISHED) \
    .order_by('-stats__likes_num')
cache.set('popular_videos_all', qs_all[:length])
When returning those videos through my API (Django REST Framework), I need to include the video_like_pk of the current user's like, if the user is authenticated and the like exists.
I can do it with prefetch_related, but that makes two database queries and doesn't use the cached results. I want to fetch only the current user's VideoLikes from the database and take the rest from the cache. Is that possible? Or is there a completely different approach which is better?
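One way to do this (a sketch, not from the original thread): run a single query for the current user's likes restricted to the cached videos' PKs, then attach the results in memory. The merge logic is plain Python; dictionaries stand in here for the cached Video objects, and `my_like_pk` is an attribute name chosen for illustration:

```python
# Cached videos (stand-ins for the pickled Video objects from the cache).
cached_videos = [
    {'video_pk': 1, 'title': 'first'},
    {'video_pk': 2, 'title': 'second'},
    {'video_pk': 3, 'title': 'third'},
]

# A single query would fetch only the current user's likes for those PKs, e.g.:
# likes = VideoLike.objects.filter(
#     author_fk=request.user,
#     video_fk__in=[v['video_pk'] for v in cached_videos],
# ).values_list('video_fk_id', 'video_like_pk')
likes = [(1, 101), (3, 103)]  # stand-in (video_fk_id, video_like_pk) rows

# Merge in memory: map video PK -> like PK, then annotate each cached video.
like_by_video = dict(likes)
for video in cached_videos:
    video['my_like_pk'] = like_by_video.get(video['video_pk'])
```

A DRF serializer can then expose `my_like_pk` as a field on each cached video without any per-video queries.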

Django query, annotate a chain of related models

I have the following schema with PostgreSQL.

class Video(models.Model):
    title = models.CharField(max_length=255)
    created_at = models.DateTimeField()
    disabled = models.BooleanField(default=False)
    view_count = models.DecimalField(max_digits=10, decimal_places=0)

class TopVideo(models.Model):
    video = models.OneToOneField(Video, on_delete=models.CASCADE, primary_key=True)

class Comment(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    video = models.ForeignKey(Video, related_name="comments", on_delete=models.CASCADE)
The reason I have a TopVideo model is that I have millions of videos and querying them takes a long time on a cheap server, so I keep a secondary model that is populated by a Celery task; it is flushed and re-populated on each run, which makes the homepage load much faster. The task runs the query shown next and saves the results into the TopVideo model. This way the task may take long to run, but the user no longer has to wait for the expensive query.
Before having the TopVideo model, I ran this query for my homepage:
videos = (
    Video.objects.filter(created_at__range=[start, end])
    .annotate(comment_count=Count("comments"))
    .exclude(disabled=True)
    .order_by("-view_count")[:100]
)
This worked perfectly and I had access to "comment_count" in my template, where I could easily show the number of comments each video had.
But now that I make this query:
top_videos = (
    TopVideo.objects.all()
    .annotate(comment_count=Count("video__comments"))
    .select_related("video")
    .order_by("-video__view_count")[:100]
)
and with a simple for-loop,
videos = []
for video in top_videos:
    videos.append(video.video)
I send the videos to the template to render.
My problem is, I no longer have access to the "comment_count" inside the template, and naturally so; I don't send the queryset anymore. How can I now access the comment_count?
Things I tried:
Sending the TopVideo query to template did not work. They're a bunch of TopVideo objects, not Video objects.
I added this piece of code in my template "{{ video.comments.count }}" but this makes 100 requests to the database, which is not really optimal.
You can set the .comment_count on your Video objects with:

videos = []
for top_video in top_videos:
    video = top_video.video
    video.comment_count = top_video.comment_count
    videos.append(video)

but that being said, it is unclear to me why you are querying through TopVideo if you basically strip the TopVideo context from the video.
If you want to obtain the Videos for which there exists a TopVideo object, you can work with:
videos = Video.objects.filter(
    created_at__range=[start, end], topvideo__isnull=False
).annotate(
    comment_count=Count('comments')
).exclude(disabled=True).order_by('-view_count')[:100]
The topvideo__isnull=False will thus filter out Videos that are not TopVideos.

Django one-to-many relation: optimize code to reduce number of database queries executed

I have 2 models with a one-to-many relation on a MySQL DB:
class Domains(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)

class Kpis(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=50, unique=True)
    description = models.TextField(blank=True, null=True)
    domain_id = models.ForeignKey(Domains, on_delete=models.CASCADE, db_column='domain_id')
In order to fetch ALL the domains with all their related Kpis objects, I use this code with a for loop:

final_list = []
domains_list = Domains.objects.all()
for domain in domains_list:
    # For each domain, get all related KPIs
    domain_kpis = domain.kpis_set.values()
    final_list.append({domain: domain_kpis})

The total number of queries I run is 1 + the number of domains I have, which is quite a lot.
I'm looking for a way to optimize this, preferably down to a single database query. Is this possible?
You can use .prefetch_related(…) [Django-doc] for this:

final_list = []
domains_list = Domains.objects.prefetch_related('kpis_set')
for domain in domains_list:
    # For each domain, get all related KPIs
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain: domain_kpis})

This will make two queries: one to fetch the domains, and a second that fetches all the related Kpis into memory in a single query.
Furthermore, please do not use .values(). You can serialize data to JSON with Django's serializer framework; by using .values() you "erode" the model layer. See the Serializing Django objects section of the documentation for more information.
Just wanted to add that you are asking for a solution to the classic "N+1 queries" problem. Here you can read something about it and also find examples of the prefetch_related method advised in Willem's answer.
Another thing worth mentioning: you probably aren't supposed to build the dict with final_list.append({domain: domain_kpis}); instead you may want to map some field(s) from Domains to some field(s) from Kpis. If that is the case, you can restrict the prefetched columns with Prefetch and .only() (note that the foreign-key column must stay in the .only() list so the prefetch can be joined in memory):

domains_list = Domains.objects.prefetch_related(
    Prefetch('kpis_set', queryset=Kpis.objects.only('domain_id', 'some_field_you_want_to_have'))
)
final_list = []
for domain in domains_list:
    domain_kpis = domain.kpis_set.all()
    final_list.append({domain.some_field: domain_kpis})

This should give another performance boost on big-volume tables.

How to join these two queries in Django?

I have a query like this in Django:

shared_file = File.objects.filter(
    id__in=Share.objects.filter(users_id=log_id).values_list('files', flat=True)
).annotate(count=Count('share__shared_user_id'))
file1 = [i.file_name for i in shared_file]
shared_username = [
    User.objects.filter(
        id__in=Share.objects.filter(users_id=log_id, files__file_name=k)
        .values_list('shared_user_id', flat=True)
    ).values_list('username')
    for k in file1
]
I want to join them so that I can loop over it and find the usernames with whom the file is shared.
You can use select_related() and fetch related objects in one query (and avoid list comprehensions with queries):
I assumed the model definition from your previous question:

class Share(models.Model):
    users = models.ForeignKey(User)
    files = models.ForeignKey(File)
    shared_user_id = models.IntegerField()
    shared_date = models.TextField()
You can fetch Shares with User and File in one shot using:
shares = Share.objects.select_related('users', 'files').filter(users_id=log_id).all()
for share in shares:
    share        # Share instance
    share.users  # related User
    share.files  # related File
Use itertools.chain from the standard library: http://docs.python.org/2/library/itertools.html#itertools.chain
chain(file1, shared_username)
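For illustration, with made-up stand-ins for `file1` and `shared_username`: `chain` simply yields the items of each iterable in turn, while `zip` pairs each file with its usernames, which may be closer to what the question actually asks for:

```python
from itertools import chain

file1 = ['report.pdf', 'notes.txt']            # stand-in for the file names
shared_username = [['alice'], ['bob', 'eve']]  # stand-in for the username lists

# chain: one flat sequence, files first, then the username lists
combined = list(chain(file1, shared_username))

# zip: (file, usernames) pairs, aligned by position
pairs = list(zip(file1, shared_username))
```

With `pairs` you can loop once and see which usernames each file is shared with.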

django prefetch_related with filter

models.py:
class Ingredient(models.Model):
    _est_param = None
    param = models.ManyToManyField(Establishment, blank=True, null=True, related_name='+', through='IngredientParam')

    def est_param(self, establishment):
        if not self._est_param:
            self._est_param, created = self.ingredientparam_set \
                .get_or_create(establishment=establishment)
        return self._est_param

class IngredientParam(models.Model):
    # ingredient params
    active = models.BooleanField(default=False)
    ingredient = models.ForeignKey(Ingredient)
    establishment = models.ForeignKey(Establishment)
I need to fetch all Ingredients with their parameters for an Establishment. Currently I fetch Ingredient.objects.all() and access the params like Ingredient.objects.all()[0].est_param(establishment).active. How can I use Django 1.4's prefetch_related to issue fewer SQL queries? Or maybe there is a better way to store per-Establishment properties of an Ingredient?
Django 1.7 adds the Prefetch object, which you can pass to prefetch_related. It allows you to specify a queryset that provides the filtering. I'm having some problems with it at the moment for getting a singular (latest) entry from a list, but it seems to work very well for getting all the related entries.
You could also check out django-prefetch, which comes up in this question (not a duplicate of this one, despite the vastly different wording).
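A sketch of that Prefetch approach for the models in this question (requires Django ≥ 1.7; `to_attr='est_params'` is an attribute name chosen here for illustration, and `establishment` is assumed to be in scope):

```python
from django.db.models import Prefetch

ingredients = Ingredient.objects.prefetch_related(
    Prefetch(
        'ingredientparam_set',
        # Filter the prefetch down to one establishment's rows.
        queryset=IngredientParam.objects.filter(establishment=establishment),
        to_attr='est_params',  # hypothetical attribute for the filtered rows
    )
)
# Two queries total; each ingredient now carries only this establishment's params.
for ingredient in ingredients:
    params = ingredient.est_params  # list of IngredientParam rows
```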
The following code would fetch all the ingredients and their parameters in 2 queries:
ingredients = Ingredient.objects.all().prefetch_related('ingredientparam_set')
You could then access the parameters you're interested in without further database queries.

Serialize in Django with a queryset that contains defer

I have a little problem: how can I serialize a Django queryset that uses defer?
I have this model :
class Evento(models.Model):
    nome = models.CharField(max_length=100)
    descricao = models.CharField(max_length=200, null=True)
    data_inicio = models.DateTimeField()
    data_fim = models.DateTimeField()
    preco = models.DecimalField(max_digits=6, decimal_places=2)
    consumiveis = models.CharField(max_length=5)
    dress_code = models.CharField(max_length=6)
    guest_list = models.CharField(max_length=15)
    local = models.ForeignKey(Local)
    user = models.ManyToManyField(User, null=True, blank=True)

    def __unicode__(self):
        return unicode('%s %s' % (self.nome, self.descricao))
My query is this:

eventos_totais = Evento.objects.defer("user").filter(data_inicio__gte=default_inicio,
                                                     data_fim__lte=default_fim)

It works fine, I think (how can I check whether the query has really deferred the user field?), but when I do:

json_serializer = serializers.get_serializer("json")()
eventos_totais = json_serializer.serialize(eventos_totais,
                                           ensure_ascii=False,
                                           use_natural_keys=True)

it always follows the natural keys for user and local. I need natural keys for this query because of the local field, but I do not need the user field.
To serialize a subset of your model's fields, specify the fields argument to serializers.serialize():

from django.core import serializers

data = serializers.serialize('xml', SomeModel.objects.all(), fields=('name', 'size'))

Ref: Django Docs
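Applied to the Evento queryset above, that would look something like this (a sketch: the fields tuple lists every Evento field except user, so the serializer skips the many-to-many relation entirely while local still uses its natural key):

```python
from django.core import serializers

eventos_json = serializers.serialize(
    'json',
    eventos_totais,
    ensure_ascii=False,
    use_natural_keys=True,
    # Everything except 'user'; unlisted fields are simply omitted.
    fields=('nome', 'descricao', 'data_inicio', 'data_fim',
            'preco', 'consumiveis', 'dress_code', 'guest_list', 'local'),
)
```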