django prefetch_related & Prefetch nested - django

I have a UserProfile model with a one-to-many relation to Subscription. Each Subscription has a ForeignKey to both Artist and UserProfile, and each Artist has many ReleaseGroup objects. For each UserProfile, I'm trying to return the count of future release groups across its subscriptions.
In short: I want to return the total count of upcoming releases for all of the subscriptions that each user has.
However I'm getting stuck way before I get to count...
context['test_totals'] = UserProfile.objects.prefetch_related(
    Prefetch('subscription_set', queryset=Subscription.objects.prefetch_related(
        Prefetch('artist', queryset=Artist.objects.prefetch_related(
            Prefetch('release_groups',
                     queryset=ReleaseGroup.objects.filter(
                         release_date__gte=startdate
                     ), to_attr='rggg')), to_attr='arti')), to_attr='arts'))
Accessing userprofile.arts|length in the template returns the total number of subscriptions, but rggg and arti return nothing. How can this be done?
I tried filtering on self with, say, filter(profile='userprofile'), but that returns an error. If I could filter on self, I could probably get this to work.

After tons of help from Nicholas Claude LeBlanc, below is the working query:
UserProfile.objects.annotate(rgs=Count(
    Case(
        When(subscriptions__artist__release_groups__release_date__gte=startdate,
             then=F('subscriptions__artist__release_groups__release_date')),
        When(subscriptions__artist__release_groups__release_date__lt=startdate,
             then=None),
        output_field=DateField()
    )
))
As Nicholas suggested, subscriptions is the related_query_name set on the profile field in Subscription.
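For anyone wondering why Count(Case(When(...))) works: SQL's COUNT skips NULLs, so a CASE that yields NULL for past dates only counts the upcoming ones. Below is a minimal sketch of that idea using plain sqlite3 rather than the ORM. The table layout (a release_group table with a profile_id column) and the sample dates are made up for illustration and flatten the real subscription/artist joins.

```python
import sqlite3

# Hypothetical, flattened schema: the real models join through
# Subscription and Artist; here release groups point straight at a profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY);
    CREATE TABLE release_group (
        id INTEGER PRIMARY KEY,
        profile_id INTEGER REFERENCES profile(id),
        release_date TEXT
    );
    INSERT INTO profile (id) VALUES (1), (2);
    INSERT INTO release_group (profile_id, release_date) VALUES
        (1, '2030-01-01'), (1, '2030-06-01'), (1, '2000-01-01'),
        (2, '2000-01-01');
""")

startdate = "2025-01-01"  # ISO dates compare correctly as strings

# CASE without ELSE yields NULL for past dates, and COUNT ignores NULLs.
rows = conn.execute(
    """
    SELECT p.id,
           COUNT(CASE WHEN rg.release_date >= ? THEN rg.release_date END) AS rgs
    FROM profile p
    LEFT JOIN release_group rg ON rg.profile_id = p.id
    GROUP BY p.id
    ORDER BY p.id
    """,
    (startdate,),
).fetchall()
print(rows)
```

Profile 1 ends up with 2 and profile 2 with 0, so profiles without upcoming releases still appear in the result.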

context['test_totals'] = UserProfile.objects.prefetch_related(
    Prefetch(
        'subscription_set',
        queryset=Subscription.objects.select_related(
            'artist', 'profile'
        ).prefetch_related(
            Prefetch(
                'artist__release_groups',
                queryset=ReleaseGroup.objects.filter(
                    release_date__gte=startdate
                ),
                to_attr='release_groups'
            )
        ),
        to_attr='subscriptions'
    )
)
I haven't had the chance to test this, but it should work. You were using prefetch_related on the foreign key artist; prefetch_related is meant for relations that yield a list of items, while select_related is the usual choice for a single-valued foreign key. So you prefetch the subscription_set, use select_related on the artist, and then prefetch the artist__release_groups relationship. Now you should have profile_instance.subscriptions, subscriptions[index].artist, and subscriptions[index].artist.release_groups.
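To make the idea behind a filtered prefetch more concrete: conceptually it runs one query for the parents, one filtered query for all the children at once, and then attaches the children to their parents in Python. Here is a rough sketch of that pattern with plain sqlite3. The table and column names are assumptions for illustration, and this is not how Django implements it internally.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables standing in for Artist and ReleaseGroup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE release_group (
        id INTEGER PRIMARY KEY, artist_id INTEGER, release_date TEXT);
    INSERT INTO artist VALUES (1, 'A'), (2, 'B');
    INSERT INTO release_group (artist_id, release_date) VALUES
        (1, '2030-01-01'), (1, '2000-01-01'), (2, '2031-05-05');
""")
startdate = "2025-01-01"

# Query 1: fetch the parent rows.
artists = conn.execute("SELECT id, name FROM artist ORDER BY id").fetchall()
artist_ids = [artist_id for artist_id, _ in artists]

# Query 2: fetch the filtered children for all parents in one go.
placeholders = ",".join("?" for _ in artist_ids)
children = conn.execute(
    f"SELECT artist_id, release_date FROM release_group "
    f"WHERE artist_id IN ({placeholders}) AND release_date >= ? ORDER BY id",
    (*artist_ids, startdate),
).fetchall()

# The "join" happens in Python: group children by their parent id.
upcoming = defaultdict(list)
for artist_id, release_date in children:
    upcoming[artist_id].append(release_date)

result = {name: upcoming[artist_id] for artist_id, name in artists}
print(result)
```

The total number of queries stays at two no matter how many artists there are, which is the point of prefetching.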
EDIT:
After discussion with the OP: we wanted to use the method below, but the date filter is not applied.
UserProfile.objects.annotate(
    rgs=Count(
        'subscription_set__artist__release_groups',
        filter=Q(subscription_set__artist__release_groups__release_date__gte=startdate),
        distinct=True
    )
)
The real answer is to use Case and When from django.db.models, as the OP and I found. See his answer for the finished query.

Related

Django - Is it possible to prefetch multiple filters of a single field for a queryset?

I know you can prefetch a single filtered queryset, e.g.
Parent.objects.all().prefetch_related(
    Prefetch("child_set", queryset=Child.objects.filter(type="A"))
)
That way, running obj.child_set.all().count() will return the count of related type-A Child objects without running another query.
But what if I wanted to have the B count too? So the following would take 2 queries - can I somehow prefetch them both?
return {
    "a_count": obj.log_set.filter(type="A").all().count(),
    "b_count": obj.log_set.filter(type="B").all().count(),
}
Edit:
I've tried
Parent.objects.all().prefetch_related(
    Prefetch("child_set", queryset=Child.objects.filter(type="A")),
    Prefetch("child_set", queryset=Child.objects.filter(type="B")),
)
But that gives me the following error when I try to access the object:
{
"detail": "Not found."
}
With regard to your main question, you can use several Prefetch objects on the same field with different filters, as long as you assign them different to_attr values, like this:
from django.db.models import Prefetch

Parent.objects.prefetch_related(
    Prefetch(
        "child_set",
        queryset=Child.objects.filter(type="A"),
        to_attr="child_set_a"
    ),
    Prefetch(
        "child_set",
        queryset=Child.objects.filter(type="B"),
        to_attr="child_set_b"
    ),
)

Django ORM: how to make aggregation with filter with annotated field?

I have a view with statistics where I calculate multiple counts on different filters of some base queryset:
qs = Model.objects.filter(...).annotate(a=...)
a = qs.filter(Q(a__lt=5)).count()
b = qs.filter(Q(a__lt=10)).count() # this is just an example, real filters are more complex
...
But each count makes a separate query to the DB and I want to optimize it. I tried aggregation:
qs.aggregate(
a=Count('a', filter=Q(a__lt=5)),
b=Count('a', filter=Q(a__lt=10)),
)
but got an error: django.db.utils.OperationalError: (1054, "Unknown column '__col2' in 'field list'"). I don't even know where this __col2 comes from.
It seems like aggregation doesn't work well with annotation because when I use regular model field inside count.filter instead of annotated field a everything is fine.
If you're on Django 2.2, your approach should work, as explained here. You should probably count on 'pk' as shown there; I'm not sure you can count on the annotation itself:
qs.aggregate(
    a=Count('pk', filter=Q(a__lt=5)),
    b=Count('pk', filter=Q(a__lt=10))
)
If you're on Django 1.11, the above approach doesn't work as Count will always return 1, ignoring the filter. You should use Case ... When:
qs.aggregate(
    a=Sum(
        Case(When(a__lt=5, then=1),
             output_field=IntegerField())
    ),
    b=Sum(
        Case(When(a__lt=10, then=1),
             output_field=IntegerField())
    )
)
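One thing worth checking with the Sum(Case(When(...))) form: without a default, rows that match no When produce NULL, and SQL's SUM over nothing but NULLs is itself NULL, so the aggregate can come back as None instead of 0. The sketch below shows this at the SQL level with sqlite3; the item table and its values are made up for illustration. In the ORM, passing default=0 to Case should have the same effect as the ELSE 0 here.

```python
import sqlite3

# Hypothetical table with a single numeric column "a".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, a INTEGER);
    INSERT INTO item (a) VALUES (7), (8), (12);
""")

# No row has a < 5, so the first SUM only ever sees NULLs.
without_default, with_default, under_ten = conn.execute("""
    SELECT SUM(CASE WHEN a < 5 THEN 1 END),
           SUM(CASE WHEN a < 5 THEN 1 ELSE 0 END),
           SUM(CASE WHEN a < 10 THEN 1 ELSE 0 END)
    FROM item
""").fetchone()

print(without_default, with_default, under_ten)
```

Here without_default is None, with_default is 0, and under_ten is 2.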

Django conditional Subquery aggregate

A simplified example of my model structure would be
class Corporation(models.Model):
    ...
class Division(models.Model):
    corporation = models.ForeignKey(Corporation)
class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
Now I want to display a table of corporations in which one column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw SQL in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11).
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
    SELECT
        *,
        (
            SELECT COUNT(*)
            FROM foo_division div
            JOIN foo_department dept ON dept.division_id = div.id
            WHERE div.corporation_id = c.id AND dept.type = 10
        ) AS type_10_count
    FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to the one you provided, with this code:
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
One concern here is that the subquery can be slow with large tables. However, database query optimizers can be smart enough to promote a subquery to an OUTER JOIN; at least I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
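A caveat with the grouped subquery above: for a corporation that has no departments of the requested type, the subquery returns no row at all, so the annotation comes back as NULL (None in Python) rather than 0. The sketch below reproduces that at the SQL level with sqlite3; the table names are simplified stand-ins for the generated SQL, and the data is made up. In the ORM, wrapping the Subquery in Coalesce from django.db.models.functions with a fallback of 0 is one way to get the same result as the COALESCE here.

```python
import sqlite3

# Simplified stand-ins for the corporation/division/department tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (
        id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation (id) VALUES (1), (2);
    INSERT INTO division (id, corporation_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO department (division_id, type) VALUES
        (1, 10), (2, 10), (2, 5), (3, 5);
""")

# Correlated subquery: grouped count of type-10 departments per corporation.
SUBQUERY = """
    SELECT COUNT(d.id)
    FROM department d
    JOIN division v ON d.division_id = v.id
    WHERE d.type = 10 AND v.corporation_id = c.id
    GROUP BY v.corporation_id
"""

# Corporation 2 has no type-10 departments, so its subquery yields no row.
raw = conn.execute(
    f"SELECT c.id, ({SUBQUERY}) FROM corporation c ORDER BY c.id"
).fetchall()
coalesced = conn.execute(
    f"SELECT c.id, COALESCE(({SUBQUERY}), 0) FROM corporation c ORDER BY c.id"
).fetchall()

print(raw)
print(coalesced)
```

The first result has None for corporation 2, while the COALESCE version reports 0.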
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value

Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
I like the following way of doing it:
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func('id', function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments)
)
You can find more details on this method in this answer: https://stackoverflow.com/a/69020732/10567223

using Filtered Count in django over joined tables returns wrong values

To keep it simple: I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, where the relation between B and Category is n to 1)
What I need is to calculate the occurrence rate of A in Category which is obtained using:
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    ANum=Count('id', distinct=False)
)
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1 which is not the desired outcome. The problem is that I have to filter the calculation based on the dates on which B occurred (and some other fields in the B table).
I am using the Django 2.0 feature that allows filter as an argument in aggregation.
Let's assume:
kwargs= {}
kwargs['relation_set__B__BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    Anum=Count('id', distinct=False, filter=Q(**kwargs))
)
However the result I get is duplicated due to the table joins and I cannot use distinct=True as I explained. (querying A is also a must since I have to aggregate some other fields on this table as explained in my question here)
I am using Postgres and django 2.0.1 .
Are there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
    'ANum': Count('relation_set__A_id', distinct=False,
                  filter=Q(**Bkwargs)),
}
sub_filter = (Q(relation_set__A_id=OuterRef('id')) &
              Q(Category_id=OuterRef('relation_set__B__Category_id')))
# you could annotate 'relation_set__B__Category_id' on the A query and set the field here.
subquery = B.objects.filter(
    sub_filter
).values(
    'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    Anum=Subquery(subquery)
)
I'm still not sure if I understood what you want. You write
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course. You count the associated A-object to each A-object. Each counts itself. So I still think you don't want to annotate A-objects with Anum, but probably Categories. This one should give you the desired number of As in each Category.
Category.objects.annotate(
    Anum=Count(
        'b__relation__a',
        filter=Q(b__BDate__gte=the_start_limit),
        distinct=True
    )
)
'b__relation__a' follows the relations backwards and picks all A-objects that are related to the Category. However, the filter limits the counted relations to certain Bs. The distinct=True is needed so that an A reached through several Bs of the same Category is counted only once.
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.
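To illustrate why the joins inflate the count and what distinct changes, here is a small SQL-level sketch using sqlite3. The table and column names are lower-cased stand-ins for the A, B, Relation and Category tables, and the sample rows are made up: one A is related to two recent Bs in the same category, so it shows up twice after the join.

```python
import sqlite3

# Hypothetical stand-ins for the A, B, Relation and Category tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER, bdate TEXT);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER);
    INSERT INTO category (id) VALUES (1);
    INSERT INTO a (id) VALUES (1), (2);
    INSERT INTO b (id, category_id, bdate) VALUES
        (1, 1, '2030-01-01'), (2, 1, '2030-02-01'), (3, 1, '2000-01-01');
    -- A 1 is related to two recent Bs; A 2 only to the old B.
    INSERT INTO relation (a_id, b_id) VALUES (1, 1), (1, 2), (2, 3);
""")

# Filtered count of As per category, with an optional DISTINCT.
QUERY = """
    SELECT c.id,
           COUNT({distinct} CASE WHEN b.bdate >= ? THEN r.a_id END)
    FROM category c
    LEFT JOIN b ON b.category_id = c.id
    LEFT JOIN relation r ON r.b_id = b.id
    GROUP BY c.id
"""
start = "2025-01-01"
plain = conn.execute(QUERY.format(distinct=""), (start,)).fetchall()
distinct = conn.execute(QUERY.format(distinct="DISTINCT"), (start,)).fetchall()

print(plain)
print(distinct)
```

Without DISTINCT, A 1 is counted once per matching B (giving 2); with DISTINCT, it is counted once (giving 1).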

Django possible to nest prefetch related?

Suppose I have the following models.
class Item(models.Model):
    seller = models.ForeignKey('seller.Seller')

class ItemSet(models.Model):
    items = models.ManyToManyField("Item", related_name="special_items")
What I'd like to do is:
For a given seller, retrieve all items that are stored as special_items in an ItemSet.
The following code is what I came up with. I haven't tried it; it's just a hunch, and I suspect it won't work. :(
(I want to retrieve item_founds and item_specials)
item_founds = Item.objects.filter(seller=seller).prefetch_related(
    Prefetch(
        "special_items",
        queryset=ItemSet.objects.prefetch_related("items")
    )
)
item_specials = Item.objects.none()
for item_found in item_founds.all():
    for special_item in item_found.special_items.all():
        item_specials |= special_item.items.all()
In Django you must always perform your queries from the model that you want in the end. In your case that's seller.Seller. Otherwise you get both poor code and risk bad performance (like in your code where you have two lookups inside a loop...)
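# seller_id is a placeholder for the primary key of the seller you want.
Seller.objects.prefetch_related(
    Prefetch(
        'item_set',  # default reverse accessor for Item.seller (no related_name set)
        queryset=Item.objects.exclude(
            special_items=None
        ),
        to_attr='item_specials'
    )
).get(pk=seller_id)
This should give you a Seller with the special items in its item_specials attribute.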