Django 3 prefetch_related

I have worked with Django 2.x, but I'm going to use Django 3.x in my new project.
In version 2, when I needed an outer join, I used prefetch_related and filtered on the prefetched model.
In version 2, a prefetch_related queryset ran as a single query, but in version 3 it runs as multiple queries.
If I only use a Q() on the joined target without prefetch_related, it runs as a single query in version 3.
from django.db import models
from django.db.models import Q
from django.db.models import Prefetch


class Member(models.Model):
    member_no = models.AutoField(primary_key=True)  # AutoField must be the primary key
    member_name = models.CharField(max_length=100)  # CharField requires max_length


class Permission(models.Model):
    permission_no = models.AutoField(primary_key=True)


class MemberPermission(models.Model):
    member_permission_no = models.AutoField(primary_key=True)
    # related_name matches the 'member_permissions' lookups used below
    member_no = models.ForeignKey(
        Member, related_name='member_permissions', on_delete=models.CASCADE,
    )
    permission_no = models.ForeignKey(
        Permission, related_name='member_permissions', on_delete=models.CASCADE,
    )


my_permission = Member.objects.prefetch_related('member_permissions').filter(
    Q(member_permissions__isnull=False)
)[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 2.X
# member outer join permission & additional query at django 3.x

my_permission = Member.objects.filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 3.X

my_permission = Member.objects.prefetch_related(
    Prefetch(
        'member_permissions',
        MemberPermission.objects.select_related('permission_no').all(),
    )
).filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions.all()[0].permission_no.permission_no)
# member outer join permission & additional query at django 3.x
If I don't use prefetch_related, I get a single query.
But then I can't get the model behind the joined model (the Permission of each MemberPermission, reached from Member) in that same query.
How can I get everything in one query using Prefetch() in Django 3?

This isn't a version difference; it's how prefetch_related works. It executes one extra query per prefetched relationship (lookup) instead of doing the join in SQL. That is still far fewer queries than executing one query per row. The documentation is very clear on this:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related.
So let's say we prefetch 2 relationships and the main query returns 1000 rows:
Number of queries without prefetch_related: 1 + 2*1000 = 2001
Number of queries with prefetch_related: 1 + 2 = 3
So it makes very little sense to worry about that one extra query per relationship.
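To make the "one extra query per relationship" arithmetic concrete, here is a plain sqlite3 sketch (not Django code) of roughly what prefetch_related does for the Prefetch('member_permissions', MemberPermission.objects.select_related('permission_no')) case: one query for the members, one batched IN query for the link rows with the permission table joined in, and the final "join" done in Python. The table names and data are hypothetical, and Django's real SQL differs in detail.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema loosely mirroring Member / Permission / MemberPermission.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_no INTEGER PRIMARY KEY, member_name TEXT);
CREATE TABLE permission (permission_no INTEGER PRIMARY KEY);
CREATE TABLE member_permission (
    member_permission_no INTEGER PRIMARY KEY,
    member_no INTEGER REFERENCES member(member_no),
    permission_no INTEGER REFERENCES permission(permission_no)
);
INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
INSERT INTO permission VALUES (10), (20);
INSERT INTO member_permission VALUES (100, 1, 10), (101, 1, 20), (102, 2, 10);
""")

queries = []  # every SQL statement we send, to count round trips


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the main queryset.
members = run("SELECT member_no, member_name FROM member ORDER BY member_no")
member_ids = [m[0] for m in members]

# Query 2: one batched lookup for the whole relationship; the JOIN to
# permission plays the role of select_related inside the Prefetch.
placeholders = ",".join("?" * len(member_ids))
rows = run(
    "SELECT mp.member_no, p.permission_no FROM member_permission mp "
    "JOIN permission p ON p.permission_no = mp.permission_no "
    f"WHERE mp.member_no IN ({placeholders}) ORDER BY mp.member_permission_no",
    member_ids,
)

# "Joining in Python": attach each child row to its parent.
perms_by_member = defaultdict(list)
for member_no, permission_no in rows:
    perms_by_member[member_no].append(permission_no)

for member_no, name in members:
    print(name, perms_by_member[member_no])
print("queries:", len(queries))  # 2, no matter how many members there are
```

Adding a second prefetched relationship would add exactly one more batched query, which is where the "1 + 2 = 3" above comes from.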

Related

How can I have check constraint in django which check two fields of related models?

from django.db import models
from django.db.models import F, Q


class Order(models.Model):
    order_date = models.DateField()


class OrderLine(models.Model):
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    loading_date = models.DateField()

    class Meta:
        constraints = [
            models.CheckConstraint(
                check=Q(loading_date__gte=F("order__order_date")),
                name="disallow_backdated_loading",
            ),
        ]
I want to make sure that an OrderLine's loading_date is never earlier than its order's order_date.
A CHECK constraint can refer to one column or to several columns of the same table, but not to other tables, so this is not possible with a CHECK constraint.
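To see this limitation in practice, here is a small sqlite3 sketch (SQLite standing in for the real database; the table names are hypothetical) that tries to express the rule as a CHECK constraint with a subquery. SQLite refuses it when the table is created; PostgreSQL likewise rejects subqueries in CHECK constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")

# A CHECK that tries to look up the parent row in another table.
ddl = """
CREATE TABLE order_line (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    loading_date TEXT,
    CHECK (loading_date >= (SELECT order_date FROM orders WHERE orders.id = order_id))
)
"""

try:
    conn.execute(ddl)
    rejected = False
except sqlite3.DatabaseError as exc:  # SQLite reports "subqueries prohibited in CHECK constraints"
    rejected = True
    print("rejected:", exc)
```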
Some databases allow you to define triggers. A trigger runs, for example, when a record is created or updated; it can run SQL queries and reject the creation/update based on their results. However, the Django ORM currently does not support defining triggers.
One could also use a composite primary key of an id and the order's creation date, so that the date is also stored in the OrderLine table and the check can be done at the table level, but Django did not support composite primary keys at the time of writing, for a number of reasons.
That leaves running raw SQL, for example from a migration file with a RunSQL operation [Django-doc], but this will likely be specific to one database.
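As an illustration of that raw-SQL route, here is a sketch of SQLite-specific trigger DDL of the kind one might put into a RunSQL operation, exercised directly with sqlite3. The table and trigger names are hypothetical, and other databases (for example PostgreSQL, which uses trigger functions) need different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_line (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    loading_date TEXT NOT NULL
);

-- Reject a new line whose loading_date is before its order's order_date.
CREATE TRIGGER order_line_no_backdated_insert
BEFORE INSERT ON order_line
BEGIN
    SELECT RAISE(ABORT, 'Can not load before ordering')
    WHERE NEW.loading_date < (SELECT order_date FROM orders WHERE id = NEW.order_id);
END;

-- Same rule when an existing line is changed.
CREATE TRIGGER order_line_no_backdated_update
BEFORE UPDATE OF loading_date, order_id ON order_line
BEGIN
    SELECT RAISE(ABORT, 'Can not load before ordering')
    WHERE NEW.loading_date < (SELECT order_date FROM orders WHERE id = NEW.order_id);
END;
""")

# ISO-8601 date strings compare correctly as text.
conn.execute("INSERT INTO orders VALUES (1, '2024-01-10')")
conn.execute("INSERT INTO order_line VALUES (1, 1, '2024-01-12')")  # allowed


def is_rejected(sql):
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError as exc:  # RAISE(ABORT, ...) usually surfaces as IntegrityError
        print("rejected:", exc)
        return True
    return False


insert_blocked = is_rejected("INSERT INTO order_line VALUES (2, 1, '2024-01-05')")
update_blocked = is_rejected("UPDATE order_line SET loading_date = '2024-01-01' WHERE id = 1")
line_count = conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]
```

Note that these triggers only watch order_line; moving an order's order_date later could still leave existing lines backdated unless a similar trigger is added on the orders table.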
Therefore the most sensible check is probably to override the model's clean() method [Django-doc]. However, Django does not run .clean() before saving an object to the database; this is only done by ModelForms and ModelAdmins. We can thus add a check with:
from django.core.exceptions import ValidationError
from django.db import models


class OrderLine(models.Model):
    order = models.ForeignKey(
        Order,
        on_delete=models.CASCADE
    )
    loading_date = models.DateField()

    def clean(self):
        super().clean()
        # Called by ModelForm/ModelAdmin validation (or an explicit full_clean()), not by save().
        if self.loading_date < self.order.order_date:
            raise ValidationError('Can not load before ordering')

Optimal project organization and querysets

I have 2 models Company and Product with FK on Product:
class Product(Meta):
    company = models.ForeignKey(Company, related_name='products', on_delete=models.CASCADE)
For a view that gathers a company's products (using information from both models), what is the optimal approach?
1) Add the view to the companies app and use this queryset:
Company.objects.prefetch_related('products').get(pk=company_pk)
2) Add the view to the products app and use this queryset:
Product.objects.select_related('company').filter(company=company_pk)
And what about ordering: can it be chained with prefetch_related or select_related?
The Django docs illustrate the difference quite well:
prefetch_related(*lookups)
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries that is caused by accessing related objects, but the strategy is quite different.
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
select_related(*fields)
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
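To compare the two options at the SQL level, here is a rough sqlite3 sketch (hypothetical tables, with a hypothetical product name column) of the statements each strategy issues. In both cases ordering is just an ORDER BY on the product query; in Django that would presumably be Prefetch('products', queryset=Product.objects.order_by('name')) for option 1, or .order_by('name') chained onto the select_related queryset for option 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(id),
    name TEXT
);
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO product VALUES (1, 1, 'widget'), (2, 1, 'anvil'), (3, 1, 'rocket');
""")
company_pk = 1

# Option 1 (prefetch_related style): two queries, matched up in Python.
company = conn.execute(
    "SELECT id, name FROM company WHERE id = ?", (company_pk,)).fetchone()
products_1 = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE company_id IN (?) ORDER BY name", (company[0],))]

# Option 2 (select_related style): one query with a JOIN; every product
# row repeats the company columns.
rows = conn.execute(
    "SELECT p.name, c.name FROM product p "
    "INNER JOIN company c ON c.id = p.company_id "
    "WHERE p.company_id = ? ORDER BY p.name", (company_pk,)).fetchall()
products_2 = [name for name, _company_name in rows]

print(products_1)  # ['anvil', 'rocket', 'widget']
print(products_2)  # same list, from a single query
```

One practical difference: with option 2, a company that has no products yields no rows at all, so the company's own data has to be fetched separately, while option 1 always returns the company.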

Django how to fetch related objects with a join?

My models are similar to the following:
class Reporter(models.Model):
    def gold_star(self):
        # One query per reporter; .get() would fail with zero or several articles.
        return self.article_set.filter(total_views__gte=100000).exists()


class Article(models.Model):
    reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE)
    total_views = models.IntegerField(default=0, blank=True)
Then in one of the templates I have this line:
{% if r.gold_star %}<img src="{% static 'gold-star.png' %}">{% endif %}
Obviously Django sends as many queries as there are reporters on the page. Ideally this would be a single query that selects reporters by some criteria and joins the matching articles. Is there a way?
EDIT
Neither select_related nor prefetch_related seems to work, because I'm selecting on the Reporter table and then using the RelatedManager to access related data on Article.
In other words, Django doesn't know what to prefetch until there's a non-empty queryset.
Because an article can only have one reporter, it's certainly possible to join these tables and then apply a filter in a subquery; I just can't find how to do it in the Django query language.
There's an alternative: select on the Article table and filter by Reporter fields, but that approach has a problem. If I deleted all the articles of some reporter, I wouldn't be able to include that reporter in the list, because from the Article point of view such a reporter doesn't exist, even though it is in the Reporter table.
EDIT2
I tried what people suggested in the comments. The following generates the desired query:
reporters = Reporter.objects.filter(**query).select_related().annotate(
    gold_star=Case(
        When(article__total_views__gte=0, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    )
)
Query generated by django:
SELECT
`portal_reporter`.`id`,
...,
CASE WHEN `portal_article`.`total_views` >= 0 THEN 1 ELSE 0 END AS `gold_star`
FROM
`portal_reporter`
LEFT OUTER JOIN `portal_article`
ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
...
Now I just need to work out how to produce a similar query without the Case/When statements.
EDIT3
If I choose a slightly different strategy, Django selects the wrong join type:
query['article__id__gte'] = 0
reporters = Reporter.objects.filter(**query).select_related()
This code produce similar query but with the INNER JOIN instead of desired LEFT OUTER JOIN:
SELECT
`portal_reporter`.`id`,
...,
FROM
`portal_reporter`
INNER JOIN `portal_article`
ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
...
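A small sqlite3 sketch (hypothetical data; table names borrowed from the generated SQL above) suggests why Django is free to pick INNER JOIN here: a WHERE condition on a column of the joined table, like article__id__gte=0, is never true for the NULL-padded rows a LEFT OUTER JOIN adds, so both join types return the same reporters, and article-less reporters disappear either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portal_reporter (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE portal_article (
    id INTEGER PRIMARY KEY,
    reporter_id INTEGER REFERENCES portal_reporter(id),
    total_views INTEGER NOT NULL DEFAULT 0
);
INSERT INTO portal_reporter VALUES (1, 'has articles'), (2, 'no articles');
INSERT INTO portal_article VALUES (1, 1, 150000), (2, 1, 20);
""")


def reporter_ids(join, where=""):
    sql = ("SELECT DISTINCT r.id FROM portal_reporter r "
           f"{join} portal_article a ON r.id = a.reporter_id {where} ORDER BY r.id")
    return [row[0] for row in conn.execute(sql)]


inner = reporter_ids("INNER JOIN")
left = reporter_ids("LEFT OUTER JOIN")
# NULL >= 0 is not true, so the filter drops reporter 2's NULL-padded row.
left_filtered = reporter_ids("LEFT OUTER JOIN", "WHERE a.id >= 0")
print(inner, left, left_filtered)  # [1] [1, 2] [1]
```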
You can use select_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#select-related) to do a join on the related table.
There's also prefetch_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related) which uses an IN clause to fetch the related objects with an extra query. The difference is explained in the docs, but is reproduced below:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
Try annotating a new field, gold_star, and set it to 1 if the reporter has an article with at least 100000 total_views, like this:
from django.db.models import Case, When, Value, IntegerField

# Joins the article table: a reporter with several articles can appear more than once.
reporters = Reporter.objects.annotate(
    gold_star=Case(
        When(article__total_views__gte=100000, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    )
)
You can leave the template code as it is.
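Because the Case/When annotation joins the article table without aggregating, it returns one row per (reporter, article) pair. The sqlite3 sketch below (hypothetical data) contrasts that SQL shape with an EXISTS subquery, which returns exactly one row per reporter and keeps reporters without articles. On the Django side, the Exists and OuterRef expressions (available since 1.11) should produce the EXISTS form, e.g. Reporter.objects.annotate(gold_star=Exists(Article.objects.filter(reporter=OuterRef('pk'), total_views__gte=100000))); that ORM line is an untested suggestion, not taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portal_reporter (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE portal_article (
    id INTEGER PRIMARY KEY,
    reporter_id INTEGER REFERENCES portal_reporter(id),
    total_views INTEGER NOT NULL DEFAULT 0
);
INSERT INTO portal_reporter VALUES (1, 'has articles'), (2, 'no articles');
INSERT INTO portal_article VALUES (1, 1, 150000), (2, 1, 20);
""")

# Shape of the Case/When annotation: one row per (reporter, article) pair,
# so reporter 1 comes back twice.
case_rows = conn.execute("""
    SELECT r.id,
           CASE WHEN a.total_views >= 100000 THEN 1 ELSE 0 END AS gold_star
    FROM portal_reporter r
    LEFT OUTER JOIN portal_article a ON r.id = a.reporter_id
    ORDER BY r.id, gold_star DESC
""").fetchall()

# EXISTS subquery: one row per reporter; reporters without articles get 0.
exists_rows = conn.execute("""
    SELECT r.id,
           EXISTS (SELECT 1 FROM portal_article a
                   WHERE a.reporter_id = r.id AND a.total_views >= 100000) AS gold_star
    FROM portal_reporter r
    ORDER BY r.id
""").fetchall()

print(case_rows)    # [(1, 1), (1, 0), (2, 0)]
print(exists_rows)  # [(1, 1), (2, 0)]
```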

Django ORM: Join to queryset models with foreign key

I need to get a list of all companies, each joined with its company user that has the lowest CompanyUser id.
There are two models:
class Company(models.Model):
    name = models.CharField(max_length=255)
    kind = models.CharField(max_length=255)


class CompanyUser(models.Model):
    company = models.ForeignKey('Company', on_delete=models.CASCADE)
    email = models.EmailField(max_length=40, unique=True)
    # other fields
I've tried something like this:
companies = Company.objects.all().select_related(Min('companyuser__email'))
but it doesn't work. How can I do this with the Django ORM? Is there any way to do it without raw SQL?
from django.db.models import Min
Company.objects.annotate(lowest_companyuser_id=Min("companyuser__id"))
Explanation
select_related() can be used to tell Django which related tables should be joined into the resulting queryset to reduce the number of queries, namely to solve the dreaded "N+1 problem" when looping over a queryset and accessing related objects in each iteration. (see docs)
With Min() you were on the right track, but it has to be used in conjunction with the annotate() queryset method. Using annotate() with aggregate expressions like Min(), Max(), Count(), etc. translates into an SQL query that uses the corresponding aggregate function with GROUP BY. (see the docs about annotate() in Django, and about GROUP BY in the Postgres docs)
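For intuition, here is a sqlite3 sketch (hypothetical data) of roughly the GROUP BY query that annotate(lowest_companyuser_id=Min("companyuser__id")) corresponds to, followed by one batched lookup for the matching user rows, which is what the question ultimately wanted. In Django that second step could plausibly be CompanyUser.objects.in_bulk(ids); that pairing is a suggestion, not part of the answer. The data also shows that the lowest id and the lowest email (what the Min('companyuser__email') attempt targeted) can belong to different users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, kind TEXT);
CREATE TABLE companyuser (
    id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(id),
    email TEXT UNIQUE
);
INSERT INTO company VALUES (1, 'Acme', 'a'), (2, 'Globex', 'b'), (3, 'Empty', 'c');
INSERT INTO companyuser VALUES
    (5, 1, 'z@acme.test'), (7, 1, 'a@acme.test'), (6, 2, 'g@globex.test');
""")

# Roughly the annotate(Min(...)) shape: LEFT OUTER JOIN + GROUP BY,
# so companies without users still appear, with NULL (None).
annotated = conn.execute("""
    SELECT c.id, c.name, MIN(cu.id) AS lowest_companyuser_id
    FROM company c
    LEFT OUTER JOIN companyuser cu ON cu.company_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# One batched query fetches the users behind those ids.
ids = [row[2] for row in annotated if row[2] is not None]
placeholders = ",".join("?" * len(ids))
users = dict(conn.execute(
    f"SELECT id, email FROM companyuser WHERE id IN ({placeholders})", ids).fetchall())
first_user_email = {cid: users.get(uid) for cid, _name, uid in annotated}

print(annotated)         # [(1, 'Acme', 5), (2, 'Globex', 6), (3, 'Empty', None)]
print(first_user_email)  # {1: 'z@acme.test', 2: 'g@globex.test', 3: None}
```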
As Burhan said, do not rely on the pk, but if you must...
companies = Company.objects.all().order_by('pk')[0]

django prefetch_related id only

I'm trying to optimise my queries but prefetch_related insists on joining the tables and selecting all the fields even though I only need the list of ids from the relations table.
You can ignore the 4th query. It's not related to the question.
Related Code:
class Contact(models.Model):
    ...
    Groups = models.ManyToManyField(ContactGroup, related_name='contacts')
    ...

queryset = Contact.objects.all().prefetch_related('Groups')
Django 1.7 added Prefetch objects, which let you customise the queryset used when prefetching.
In particular, see only().
In this case, you'd want something like:
from django.db.models import Prefetch

queryset = Contact.objects.all().prefetch_related(
    Prefetch('Groups', queryset=ContactGroup.objects.all().only('id')))
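If only the ids are needed, another option is to skip the ContactGroup table entirely and read the many-to-many link table. Assuming Django's default auto-created through model, that would be something like Contact.Groups.through.objects.filter(contact_id__in=contact_ids).values_list('contact_id', 'contactgroup_id'); this is an alternative to the answer, not what it proposes. The sqlite3 sketch below (hypothetical table name and data) shows the resulting SQL shape: one query on the link table with no JOIN, grouped in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contactgroup (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical name for the auto-created many-to-many link table.
CREATE TABLE contact_groups (
    id INTEGER PRIMARY KEY,
    contact_id INTEGER REFERENCES contact(id),
    contactgroup_id INTEGER REFERENCES contactgroup(id)
);
INSERT INTO contact VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cid');
INSERT INTO contactgroup VALUES (10, 'friends'), (20, 'work');
INSERT INTO contact_groups VALUES (1, 1, 10), (2, 1, 20), (3, 2, 20);
""")

contact_ids = [row[0] for row in conn.execute("SELECT id FROM contact ORDER BY id")]

# Read only the id pairs from the link table: no JOIN to contactgroup.
placeholders = ",".join("?" * len(contact_ids))
group_ids = defaultdict(list)
for contact_id, group_id in conn.execute(
        "SELECT contact_id, contactgroup_id FROM contact_groups "
        f"WHERE contact_id IN ({placeholders}) ORDER BY id", contact_ids):
    group_ids[contact_id].append(group_id)

print(dict(group_ids))  # {1: [10, 20], 2: [20]}
```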