My models are similar to the following:
class Reporter(models.Model):
def gold_star(self):
return self.article_set.get().total_views >= 100000
class Article(models.Model):
reporter = models.ForeignKey(Reporter, on_delete=models.CASCADE)
total_views = models.IntegerField(default=0, blank=True)
Then in one of the templates I have this line:
{% if r.gold_star %}<img src="{% static 'gold-star.png' %}">{% endif %}
Obviously django sends as many queries as there are reporters on the page... Ideally this could be just one query, which would select reporters by criteria and join appropriate articles. Is there a way?
EDIT
Neither select_related nor prefetch_related doesn't seem to work as I'm selecting on the Reporter table and then use RelatedManager to access related data on the Article.
In other words django doesn't know what to prefetch until there's non empty queryset.
Because an article can only have one reporter it's for sure possible to join these tables together and then apply filter to subquery, I just can't find how it's done in django query language.
There's alternative - select on the Article table and filter by Reporter fields, but there's a problem with such approach. If I deleted all the articles of some reporter then I wouldn't be able to include that reporter in the list as from the Article point of view such reporter doesn't exist and yet reporter is in the Reporter table.
EDIT2
I tried what people suggested in the comments. The following generates desired query:
reporters = Reporter.objects.filter(**query).select_related().annotate(
gold_star=Case(
When(article__total_views__gte=0, then=Value(1)),
default=Value(0),
output_field=IntegerField()
)
)
Query generated by django:
SELECT
`portal_reporter`.`id`,
...,
CASE WHEN `portal_article`.`total_views` >= 0 THEN 1 ELSE 0 END AS `gold_star`
FROM
`portal_reporter`
LEFT OUTER JOIN `portal_article`
ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
...
Now I just need to work out a way how to produce similar query but without Case/When statements.
EDIT3
If I chose slightly different strategy, then django selects wrong join type:
query['article__id__gte'] = 0
reporters = Reporter.objects.filter(**query).select_related()
This code produce similar query but with the INNER JOIN instead of desired LEFT OUTER JOIN:
SELECT
`portal_reporter`.`id`,
...,
FROM
`portal_reporter`
INNER JOIN `portal_article`
ON (`portal_reporter`.`id` = `portal_article`.`reporter_id`)
WHERE
...
You can use select_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#select-related) to do a join on the related table.
There's also prefetch_related (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related) which uses an IN clause to fetch the related objects with an extra query. The difference is explained in the docs, but is reproduced below:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
Try annotating the new field gold_star and set it to 1 if reporter has an article that has more than 100000 total_views like this:
from django.db.models import Case, When, Value, IntegerField
reporters = Reporter.objects.annotate(
gold_star=Case(
When(article__total_views__gte=100000, then=Value(1)),
default=Value(0),
output_field=IntegerField()
)
)
You can leave the template code as it is.
Related
I had worked by django 2.X. But I'm going to use django3.x at my new project.
At version2, when I should make outer join. I used prefetch_related and filtered about model of prefetch_related.
In version 2, if I use prefetch_related it was queried as single query. but in version 3, queried by multiple query.
If I only use Q() of joined target without prefetch_related, it works single query at version 3.
from django.db import models
from django.db.models import Q
from django.db.models import Prefetch
class Member(models.Model):
member_no = models.AutoField()
member_name = models.CharField()
class Permission(models.Model):
permission_no = models.AutoField()
class MemberPermission(models.Model):
member_permission_no = models.AutoField()
member_no = models.ForeignKey(
Member, related_name='members', on_delete=models.CASCADE,
)
permission_no = models.ForeignKey(
Permission, related_name='member_permissions', on_delete=models.CASCADE,
)
my_permission = Member.objects.prefetch_related('member_permissions').filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 2.X
# member outer join permission & additional query at django 3.x
my_permission = Member.objects.filter(Q(member_permissions__isnull=False))[:1]
print(my_permission[0].member_permissions)
# member outer join permission, single query at django 3.X
my_permission = Member.objects.prefetch_related(
Prefetch('member_permissions', MemberPermission.objects.select_related(
'permission_no').all())
).filter(Q(members__isnull=False))[:1]
print(my_permission[0].member_permissions.all()[0].permission_no.permission_no)
# member outer join permission & additional query at django 3.x
If I don't use prefetch_related, I could get single query.
But if I want to get model of joined model (Permission of MemberPermission by Member) it couldn't.
I wonder how to query once by Prefetch() in django3.
This isn't a version difference. It's the way prefetch_related works. It will execute 1 extra query per outer join. However, this is still a lot less than executing 1 query per iteration. The documentation is very clear on this:
select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related.
So let's say we have a 2 outer joins and in total 1000 matching rows:
Number of queries without prefetch_related: 1 + 2*1000 = 2001
Number of queries with prefetch_related: 1 + 2 = 3
So it makes very little sense to worry about that 1 extra query per join.
I have 2 models Company and Product with FK on Product:
class Product(Meta):
company = models.ForeignKey(Company, related_name='products', on_delete=models.CASCADE)
In case of a View that will gather company products what is the optimal approach(use infor form both models):
1) add the View in companies app and as queryset use:
Company.objects.prefetch_related('products').get(pk=company_pk)
2) add the View in products app and as queryset use:
Product.objects.select_related('company').filter(company=company_pk)
What about ordering can be chained with prefetch or select ?
The Django docs illustrate the difference quite well:
prefetch_related(*lookups)
Returns a QuerySet that will
automatically retrieve, in a single batch, related objects for each of
the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different.
select_related works by creating an SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a ‘many’ relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
select_related(*fields)
Returns a QuerySet that will “follow” foreign-key relationships,
selecting additional related-object data when it executes its query.
This is a performance booster which results in a single more complex
query but means later use of foreign-key relationships won’t require
database queries.
I'm trying to optimise my queries but prefetch_related insists on joining the tables and selecting all the fields even though I only need the list of ids from the relations table.
You can ignore the 4th query. It's not related to the question.
Related Code:
class Contact(models.Model):
...
Groups = models.ManyToManyField(ContactGroup, related_name='contacts')
...
queryset = Contact.objects.all().prefetch_related('Groups')
Django 1.7 added Prefetch objects which let you customise the queryset used when prefetching.
In particular, see only().
In this case, you'd want something like:
queryset = Contact.objects.all().prefetch_related(
Prefetch('Groups', queryset=Group.objects.all().only('id')))
I'm trying to optimise my app by keeping the number of queries to a minimum... I've noticed I'm getting a lot of extra queries when doing something like this:
class Category(models.Model):
id = models.AutoField(primary_key=True)
name = models.CharField(max_length=127, blank=False)
class Project(models.Model):
categories = models.ManyToMany(Category)
Then later, if I want to retrieve a project and all related categories, I have to do something like this :
{% for category in project.categories.all() %}
Whilst this does what I want it does so in two queries. I was wondering if there was a way of joining the M2M field so I could get the results I need with just one query? I tried this:
def category_list(self):
return self.join(list(self.category))
But it's not working.
Thanks!
Which, whilst does what I want, adds an extra query.
What do you mean by this? Do you want to pick up a Project and its categories using one query?
If you did mean this, then unfortunately there is no mechanism at present to do this without resorting to a custom SQL query. The select_related() mechanism used for foreign keys won't work here either. There is (was?) a Django ticket open for this but it has been closed as "wontfix" by the Django developers.
What you want is not seem to possible because,
In DBMS level, ManyToMany relatin is not possible, so an intermediate table is needed to join tables with ManyToMany relation.
On Django level, for your model definition, django creates an ectra table to create a ManyToMany connection, table is named using your two tables, in this example it will be something like *[app_name]_product_category*, and contains foreignkeys for your two database table.
So, you can not even acces to a field on the table with a manytomany connection via django with a such categories__name relation in your Model filter or get functions.
I run a lab annotation website where users can annotate samples with tags relating to disease, tissue type, etc. Here is a simple example from models.py:
from django.contrib.auth.models import User
from django.db import models
class Sample(models.Model):
name = models.CharField(max_length = 255)
tags=models.ManyToManyField('Tag', through = 'Annot')
class Tag(models.Model):
name = models.CharField(max_length = 255)
class Annot(models.Model):
tag = models.ForeignKey('Tag')
sample = models.ForeignKey('Sample')
user = models.ForeignKey(User, null = True)
date = models.DateField(auto_now_add = True)
I'm looking for a query in django's ORM which will return the tags in which two users agree on the annotation of same tag. It would be helpful if I could supply a list of users to limit my query (if someone only believes User1 and User2 and wants to find the sample/tag pairs that only they agree on.)
I think I understood what you need. This one made me think, thanks! :-)
I believe the equivalent SQL query would be something like:
select t.name, s.name, count(user_id) count_of_users
from yourapp_annot a, yourapp_tag t, yourapp_sample s
where a.tag_id = t.id
and s.id = a.sample_id
group by t.name, s.name
having count_of_users > 1
While I try hard not to think in SQL when I'm coming up with django model navigation (it tends to get in the way); when it comes to aggregation queries it always helps me to visualize what the SQL would be.
In django we now have aggregations.
Here is what I came up with:
models.Annot.objects.select_related().values(
'tag__name','sample__name').annotate(
count_of_users=Count('user__id')).filter(count_of_users__gt=1)
The result set will contain the tag, the sample, and the count of users that tagged said sample with said tag.
Breaking it apart for the folks that are not used to django aggregation:
models.Annot.objects.select_related()
select_related() is forcing all tables related to Annot to be retrieved in the same query
This is what will allow me to specify tag__name and sample__name in the values() call
values('tag__name','sample__name')
values() is limiting the fields to retrieve to tag.name and sample.name
This makes sure that my aggregation on count of clients will group by just these fields
annotate(count_of_users=Count('user__id'))
annotate() adds an aggregation as an extra field to a query
filter(count_of_users__gt=1)
And finally I filter on the aggregate count.
If you want to add an additional filter on what users should be taken into account, you need to do this:
models.Annot.objects.filter(user=[... list of users...]).select_related().values(
'tag__name','sample__name').annotate(
count_of_users=Count('user__id')).filter(count_of_users__gt=1)
I think that is it.
One thing... Notice that I used tag__name and sample__name in the query above. But your models do not specify that tag names and sample names are unique.
Should they be unique? Add a unique=True to the field definitions in the models.
Shouldn't they be unique? You need to replace tag__name and sample__name with tag__id and sample__id in the query above.