django prefetch_related id only - django

I'm trying to optimise my queries but prefetch_related insists on joining the tables and selecting all the fields even though I only need the list of ids from the relations table.
You can ignore the 4th query. It's not related to the question.
Related Code:
class Contact(models.Model):
    ...
    Groups = models.ManyToManyField(ContactGroup, related_name='contacts')
    ...

queryset = Contact.objects.all().prefetch_related('Groups')

Django 1.7 added Prefetch objects which let you customise the queryset used when prefetching.
In particular, see only().
In this case, you'd want something like:
from django.db.models import Prefetch

# The related model in the question is ContactGroup
queryset = Contact.objects.all().prefetch_related(
    Prefetch('Groups', queryset=ContactGroup.objects.only('id')))

Django3 prefetch_related

I have worked with Django 2.x, but I'm going to use Django 3.x in my new project.
In version 2, when I needed an outer join, I used prefetch_related and filtered on the prefetched model.
In version 2, prefetch_related ran as a single query, but in version 3 it runs multiple queries.
If I only use Q() on the joined target without prefetch_related, it runs as a single query in version 3.
from django.db import models
from django.db.models import Q
from django.db.models import Prefetch

class Member(models.Model):
    member_no = models.AutoField(primary_key=True)  # AutoField must be the primary key
    member_name = models.CharField(max_length=255)  # max_length is required; 255 is a placeholder

class Permission(models.Model):
    permission_no = models.AutoField(primary_key=True)

class MemberPermission(models.Model):
    member_permission_no = models.AutoField(primary_key=True)
    member_no = models.ForeignKey(
        Member, related_name='members', on_delete=models.CASCADE,
    )
    permission_no = models.ForeignKey(
        Permission, related_name='member_permissions', on_delete=models.CASCADE,
    )
my_permission = Member.objects.prefetch_related('member_permissions').filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 2.X
# member outer join permission & additional query at django 3.x

my_permission = Member.objects.filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 3.X

my_permission = Member.objects.prefetch_related(
    Prefetch('member_permissions', MemberPermission.objects.select_related(
        'permission_no').all())
).filter(Q(members__isnull=False))[:1]
print(my_permission[0].member_permissions.all()[0].permission_no.permission_no)
# member outer join permission & additional query at django 3.x
If I don't use prefetch_related, I get a single query.
But then I can't get the model on the far side of the join (the Permission of a MemberPermission, reached from Member).
How can I get everything in a single query with Prefetch() in Django 3?
This isn't a version difference. It's the way prefetch_related works. It will execute 1 extra query per outer join. However, this is still a lot less than executing 1 query per iteration. The documentation is very clear on this:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related.
So let's say we have 2 outer joins and 1000 matching rows in total:
Number of queries without prefetch_related: 1 + 2*1000 = 2001
Number of queries with prefetch_related: 1 + 2 = 3
So it makes very little sense to worry about that 1 extra query per join.
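To make the arithmetic above concrete, here is a minimal sketch that counts queries against an in-memory SQLite database using only the standard library. It is not Django code: the member, permission and address tables and the run() helper are made up for illustration, and it uses 100 rows instead of 1000 to stay under the bound-parameter limit of older SQLite builds. The second function imitates what prefetch_related does, i.e. one IN query per relationship with the grouping done in Python.

```python
import sqlite3
from collections import defaultdict

N = 100  # number of members (hypothetical data set)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permission (id INTEGER PRIMARY KEY, member_id INTEGER, code TEXT);
    CREATE TABLE address (id INTEGER PRIMARY KEY, member_id INTEGER, city TEXT);
""")
conn.executemany("INSERT INTO member VALUES (?, ?)",
                 [(i, f"member{i}") for i in range(1, N + 1)])
conn.executemany("INSERT INTO permission (member_id, code) VALUES (?, ?)",
                 [(i, f"perm{i}") for i in range(1, N + 1)])
conn.executemany("INSERT INTO address (member_id, city) VALUES (?, ?)",
                 [(i, f"city{i}") for i in range(1, N + 1)])
conn.commit()

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def naive():
    """One query for the members, then one query per member per relationship."""
    for member_id, _name in run("SELECT id, name FROM member"):
        run("SELECT code FROM permission WHERE member_id = ?", (member_id,))
        run("SELECT city FROM address WHERE member_id = ?", (member_id,))


def prefetch_style():
    """One query for the members, then one IN query per relationship."""
    members = run("SELECT id, name FROM member")
    ids = [row[0] for row in members]
    marks = ",".join("?" * len(ids))
    related = {}
    for table, column in (("permission", "code"), ("address", "city")):
        grouped = defaultdict(list)
        sql = f"SELECT member_id, {column} FROM {table} WHERE member_id IN ({marks})"
        for member_id, value in run(sql, ids):
            grouped[member_id].append(value)  # the "join" happens here, in Python
        related[table] = grouped
    return members, related


def count_queries(fn):
    """Reset the counter, call fn, and return how many queries it issued."""
    global query_count
    query_count = 0
    fn()
    return query_count


naive_count = count_queries(naive)
prefetch_count = count_queries(prefetch_style)
print(naive_count, prefetch_count)
```

With 100 members and 2 relationships this prints 201 and 3, that is 1 + 2*100 versus 1 + 2.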

Django how to fetch related objects with a join?

My models are similar to the following:
class Reporter(models.Model):
    def gold_star(self):
        return self.article_set.get().total_views >= 100000

class Article(models.Model):
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE)
    total_views = models.IntegerField(default=0, blank=True)
Then in one of the templates I have this line:
{% if r.gold_star %}<img src="{% static 'gold-star.png' %}">{% endif %}
Obviously django sends as many queries as there are reporters on the page... Ideally this could be just one query, which would select reporters by criteria and join appropriate articles. Is there a way?
EDIT
Neither select_related nor prefetch_related seems to work, as I'm selecting on the Reporter table and then using the RelatedManager to access related data on Article.
In other words, Django doesn't know what to prefetch until there's a non-empty queryset.
Because an article can only have one reporter, it must be possible to join these tables together and then apply a filter to the subquery; I just can't find how it's done in the Django query language.
There's an alternative: select on the Article table and filter by Reporter fields, but there's a problem with that approach. If I deleted all the articles of some reporter, I wouldn't be able to include that reporter in the list, because from the Article point of view that reporter doesn't exist, even though it is still in the Reporter table.
EDIT2
I tried what people suggested in the comments. The following generates desired query:
reporters = Reporter.objects.filter(**query).select_related().annotate(
    gold_star=Case(
        When(article__total_views__gte=0, then=Value(1)),
        default=Value(0),
        output_field=IntegerField()
    )
)
Query generated by django:
SELECT
    `portal_reporter`.`id`,
    ...,
    CASE WHEN `portal_article`.`total_views` >= 0 THEN 1 ELSE 0 END AS `gold_star`
FROM
    `portal_reporter`
    LEFT OUTER JOIN `portal_article`
        ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
    ...
Now I just need to work out how to produce a similar query without the Case/When statements.
EDIT3
If I choose a slightly different strategy, Django selects the wrong join type:
query['article__id__gte'] = 0
reporters = Reporter.objects.filter(**query).select_related()
This code produces a similar query, but with an INNER JOIN instead of the desired LEFT OUTER JOIN:
SELECT
    `portal_reporter`.`id`,
    ...,
FROM
    `portal_reporter`
    INNER JOIN `portal_article`
        ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
    ...
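Why the join type matters can be shown with a small standard-library sketch. The reporter and article tables below are simplified, hypothetical stand-ins for portal_reporter and portal_article, and the SQL is hand-written rather than generated by Django. As far as I understand, a filter such as article__id__gte=0 can never match a reporter without articles, which would explain why an INNER JOIN is emitted in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporter (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        reporter_id INTEGER REFERENCES reporter(id),
        total_views INTEGER
    );
    INSERT INTO reporter VALUES (1, 'alice'), (2, 'bob');
    -- Only alice has an article; bob has none.
    INSERT INTO article VALUES (1, 1, 150000);
""")

# INNER JOIN keeps only reporters that have at least one matching article.
inner_rows = conn.execute("""
    SELECT r.name, a.total_views
    FROM reporter r
    INNER JOIN article a ON r.id = a.reporter_id
    ORDER BY r.id
""").fetchall()

# LEFT OUTER JOIN keeps every reporter and fills the article columns with NULL.
left_rows = conn.execute("""
    SELECT r.name, a.total_views
    FROM reporter r
    LEFT OUTER JOIN article a ON r.id = a.reporter_id
    ORDER BY r.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The first query returns only alice, while the second also returns bob with None for total_views, which is the behaviour wanted for reporters whose articles were all deleted.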
You can use select_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#select-related) to do a join on the related table.
There's also prefetch_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related) which uses an IN clause to fetch the related objects with an extra query. The difference is explained in the docs, but is reproduced below:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
Try annotating a new field gold_star, set to 1 if the reporter has an article with at least 100000 total_views, like this:
from django.db.models import Case, When, Value, IntegerField

reporters = Reporter.objects.annotate(
    gold_star=Case(
        When(article__total_views__gte=100000, then=Value(1)),
        default=Value(0),
        output_field=IntegerField()
    )
)
You can leave the template code as it is.

django prefetch_related many queries

Django 1.7
I have a model:
class Model(models.Model):
    tags = models.ManyToManyField(..)
When I do Model.objects.prefetch_related().... it results in many individual queries being issued to fetch the tags, one for each model.
I'd expect 2 queries to happen: 1 to fetch the models, another to fetch the tags for all models.
How to do that?
EDITED:
I'm using a raw query like Model.objects.prefetch_related('tags').raw_query(..)
You should specify a field name to prefetch:
Model.objects.prefetch_related('tags')
If you use the queryset.raw() method then prefetch_related() logic doesn't work.

prefetch_related for multiple Levels

If my Models look like:
class Publisher(models.Model):
    pass

class Book(models.Model):
    publisher = models.ForeignKey(Publisher)

class Page(models.Model):
    book = models.ForeignKey(Book)
and I would like to get the queryset for Publisher, I do Publisher.objects.all().
If I then want to make sure to prefetch, I can do:
Publisher.objects.all().prefetch_related('book_set')
My questions are:
1. Is there a way to do this prefetching using select_related, or must I use prefetch_related?
2. Is there a way to prefetch the page_set? This does not work:
Publisher.objects.all().prefetch_related('book_set', 'book_set_page_set')
Since Django 1.7, instances of the django.db.models.Prefetch class can be passed as arguments to .prefetch_related. The Prefetch constructor takes a queryset argument, which lets you nest prefetches across multiple levels like this:
Project.objects.filter(
    is_main_section=True
).select_related(
    'project_group'
).prefetch_related(
    Prefetch(
        'project_group__project_set',
        queryset=Project.objects.prefetch_related(
            Prefetch(
                'projectmember_set',
                to_attr='projectmember_list'
            )
        ),
        to_attr='project_list'
    )
)
It is stored into attributes with _list suffix because I use ListQuerySet to process prefetch results (filter / order).
No, you cannot use select_related for a reverse relation. select_related does a SQL join, so a single record in the main queryset needs to reference exactly one in the related table (ForeignKey or OneToOne fields). prefetch_related actually does a totally separate second query, caches the results, then "joins" it into the queryset in python. So it is needed for ManyToMany or reverse ForeignKey fields.
Have you tried two underscores to do the multi level prefetches? Like this: Publisher.objects.all().prefetch_related('book_set', 'book_set__page_set')
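The reason the single-underscore version fails can be sketched in a few lines of plain Python. Django separates the parts of a lookup path with a double underscore (the constant LOOKUP_SEP in django.db.models.constants); the split_prefetch_path helper below is hypothetical and only illustrates how such a path breaks into one attribute name per level.

```python
LOOKUP_SEP = "__"  # the same separator string Django uses between lookup parts


def split_prefetch_path(path):
    """Split a prefetch lookup path into one attribute name per level."""
    return path.split(LOOKUP_SEP)


wrong = split_prefetch_path("book_set_page_set")
right = split_prefetch_path("book_set__page_set")

print(wrong)
print(right)
```

This prints ['book_set_page_set'] and then ['book_set', 'page_set']. The first form is treated as a single attribute name that Publisher does not have, while the second walks from Publisher to its books and then to each book's pages.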

Django ORM: Join to queryset models with foreign key

I need to get a list of all companies, each joined with its company user that has the lowest companyuser id.
There are two models:
class Company(models.Model):
    name = models.CharField(max_length=255)
    kind = models.CharField(max_length=255)

class CompanyUser(models.Model):
    company = models.ForeignKey('Company')
    email = models.EmailField(max_length=40, unique=True)
    # other fields
I've tried something like this:
companies = Company.objects.all().select_related(Min('companyuser__email'))
but it doesn't work. How can I do this with the Django ORM? Is there any way to do it without raw SQL?
from django.db.models import Min
Company.objects.annotate(lowest_companyuser_id=Min("companyuser__id"))
Explanation
select_related() tells Django which related tables to join into the resulting queryset in order to reduce the number of queries, which solves the dreaded "N+1 problem" when looping over a queryset and accessing related objects on each iteration. (see docs)
You were on the right track with Min(), but it has to be used together with the annotate() queryset method. Using annotate() with aggregate expressions like Min(), Max(), Count(), etc. translates into an SQL query that applies the aggregate function with GROUP BY. (see the Django docs on annotate() and the Postgres docs on GROUP BY)
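As a rough illustration of the kind of SQL this produces, here is a standard-library sketch against an in-memory SQLite database. The table names and sample rows are made up, and the SQL is hand-written to resemble what annotate(Min(...)) generates, so treat it as an approximation rather than Django's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE companyuser (
        id INTEGER PRIMARY KEY,
        company_id INTEGER REFERENCES company(id),
        email TEXT UNIQUE
    );
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO companyuser VALUES
        (10, 1, 'a@acme.test'),
        (11, 1, 'b@acme.test'),
        (12, 2, 'c@globex.test');
""")

# One row per company, with the lowest related companyuser id (NULL if none).
rows = conn.execute("""
    SELECT company.id, company.name, MIN(companyuser.id) AS lowest_companyuser_id
    FROM company
    LEFT OUTER JOIN companyuser ON company.id = companyuser.company_id
    GROUP BY company.id, company.name
    ORDER BY company.id
""").fetchall()

for row in rows:
    print(row)
```

Each company appears once. Acme gets 10, Globex gets 12, and Initech, which has no users, gets None.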
As Burhan said, do not rely on the pk, but if you must...
companies = Company.objects.all().order_by('pk')[0]