Conditional Aggregation of Foreign Key fields

I would like to get the count of foreign key objects with Django, where the foreign key to count changes conditionally. Something like the example below:
Game.objects.annotate(
    filled=models.Case(
        models.When(
            GreaterThan(
                models.F("size_max"),
                (
                    models.Count(
                        models.Case(
                            models.When(
                                participant_type=1, then="players"
                            ),
                            models.When(
                                participant_type=2, then="teams",
                            ),
                        ),
                    ),
                ),
            ),
            then=1,
        ),
        default=0,
    )
)
What I'd like to achieve is this:
players and teams are reverse foreign keys to Game. I want to check whether the size_max field of Game exceeds the count of players or teams depending on the participant_type. How would I go about achieving this? Any help would be appreciated.
The query above results in an error: the generated SQL contains a GROUP BY clause with the model name in it, something like
GROUP BY ('Game'), "game"."id"
and I have no idea why this happens.
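One way to pin down the intended result is to write it at the SQL level first. The sketch below is only an illustration using Python's built-in sqlite3 module; the table and column names (game, player, team, game_id) are assumptions, not the asker's real schema. It picks the player count or the team count per game depending on participant_type and compares it with size_max, mirroring the condition in the question. In the ORM, the same shape would presumably be built from separate Count annotations selected with Case/When, but that translation is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema standing in for Game and its reverse foreign keys
    CREATE TABLE game (id INTEGER PRIMARY KEY, size_max INTEGER, participant_type INTEGER);
    CREATE TABLE player (id INTEGER PRIMARY KEY, game_id INTEGER);
    CREATE TABLE team (id INTEGER PRIMARY KEY, game_id INTEGER);
    -- game 1 counts players (2 of 3 used), game 2 counts teams (2 of 2 used)
    INSERT INTO game VALUES (1, 3, 1), (2, 2, 2);
    INSERT INTO player (game_id) VALUES (1), (1), (2);
    INSERT INTO team (game_id) VALUES (1), (2), (2);
""")

rows = conn.execute("""
    SELECT g.id,
           CASE WHEN g.size_max > CASE g.participant_type
                    WHEN 1 THEN (SELECT COUNT(*) FROM player p WHERE p.game_id = g.id)
                    WHEN 2 THEN (SELECT COUNT(*) FROM team t WHERE t.game_id = g.id)
                END
                THEN 1 ELSE 0 END AS filled
    FROM game g
    ORDER BY g.id
""").fetchall()
print(rows)  # [(1, 1), (2, 0)]
```

Counting each relation in its own subquery keeps the two reverse relations from multiplying each other's rows, which a single query joining both players and teams would do.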

Related

Django annotate exclude with Case & When (Conditional Expression)

I'm using Django 2.2.
While building a queryset, I want counts of a related model based on a few conditions, like:
queryset = self.model.objects.filter(user=self.request.user).annotate(
    count_videos=Count('video'),
    count_completed=Count(
        Case(
            When(video__status__in=Video.STATUS_LIST_COMPLETED)
        )
    ),
    count_failed=Count(
        Case(
            When(video__status__in=Video.STATUS_LIST_FAILED)
        )
    ),
    count_pending=Count(
        Case(
            When(
                video__status__not_in=Video.STATUS_LIST_PENDING_EXCLUDE
            )
        )
    )
)
The first three counts work, but for the last one, count_pending, I need the equivalent of exclude(), i.e. count the records whose status is not in the passed list.
How can I use exclude with the above statement?

We can negate the value we pass to the filter= parameter [Django-doc]:
from django.db.models import Count, Q
queryset = self.model.objects.filter(user=self.request.user).annotate(
    count_videos=Count('video'),
    count_completed=Count(
        'video',
        filter=Q(video__status__in=Video.STATUS_LIST_COMPLETED)
    ),
    count_failed=Count(
        'video',
        filter=Q(video__status__in=Video.STATUS_LIST_FAILED)
    ),
    # ~Q(...) negates the condition, like exclude() would
    count_pending=Count(
        'video',
        filter=~Q(video__status__in=Video.STATUS_LIST_PENDING_EXCLUDE)
    )
)
This will result in a query like:
SELECT model.*,
    COUNT(
        CASE WHEN NOT video.status IN STATUS_LIST_PENDING_EXCLUDE
                  AND video.status IS NOT NULL
             THEN video.id
             ELSE NULL END
    ) AS count_pending
FROM model
LEFT OUTER JOIN video ON model.id = video.model_id
GROUP BY model.id
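To see why this SQL counts what we want, here is a small sketch with Python's built-in sqlite3 module. The parent and video tables and the status values are made up for the illustration. COUNT skips NULLs, so a CASE without a matching branch contributes nothing, and a parent row with no videos at all ends up with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical tables: parent stands in for self.model
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE video (id INTEGER PRIMARY KEY, parent_id INTEGER, status TEXT);
    INSERT INTO parent VALUES (1), (2);
    -- parent 1 has four videos, parent 2 has none
    INSERT INTO video (parent_id, status) VALUES
        (1, 'queued'), (1, 'processing'), (1, 'done'), (1, 'failed');
""")

rows = conn.execute("""
    SELECT p.id,
           COUNT(v.id) AS count_videos,
           COUNT(CASE WHEN v.status NOT IN ('done', 'failed') THEN v.id END)
               AS count_pending
    FROM parent p
    LEFT OUTER JOIN video v ON v.parent_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(rows)  # [(1, 4, 2), (2, 0, 0)]
```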

Apologies for replying to a very old question, but this one ranks high in searches on this topic. I needed something very similar: a count with some odd conditions that I couldn't work out with ~Q, so I landed on an annotate that looks like the following. I'm posting it here in case someone happens to need something similar.
I required a count of Reviews completed and of those in progress, but a review whose status was UNTOUCHED was not to be counted in either the 'in progress' or the 'completed' bin. I used Case with the default value set to 1 for the "not" condition (not completed), then wrapped the Case in a Sum as shown. There were about 9 different statuses that indicated 'in progress', and I didn't want to name them all.
.values(___bunch_of_group_by_fields_here___)\
.annotate(
    completed=Sum(Case(
        When(status__in=[Review.REVIEW_COMPLETE], then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    )),
    # essentially: not (review complete or untouched)
    # gets all the statuses between untouched (default first step) and
    # complete (final status in the workflow for a review) without having
    # to specify all the in-between statuses
    inprogress=Sum(Case(
        When(status__in=[Review.REVIEW_COMPLETE, Review.UNTOUCHED],
             then=Value(0)),
        default=Value(1),
        output_field=IntegerField(),
    )),
)

django prefetch_related & Prefetch nested

For each UserProfile, I'm trying to return the count of future release groups. A UserProfile has a one-to-many relation to Subscription, each Subscription has a ForeignKey to both Artist and UserProfile, and each Artist has many ReleaseGroups.
In short: I want to return the total count of upcoming releases for all of the subscription that each of the users have.
However I'm getting stuck way before I get to count...
context['test_totals'] = UserProfile.objects.prefetch_related(
    Prefetch('subscription_set', queryset=Subscription.objects.prefetch_related(
        Prefetch('artist', queryset=Artist.objects.prefetch_related(
            Prefetch('release_groups',
                     queryset=ReleaseGroup.objects.filter(
                         release_date__gte=startdate
                     ), to_attr='rggg')), to_attr='arti')), to_attr='arts'))
Accessing userprofile.arts|length in the template returns the total number of subscriptions, but rggg and arti return nothing. How can this be done?
I tried filtering on self with, say, filter(profile='userprofile'), but that returns an error. If I could filter on self, I could probably get this to work.

After tons of help from Nicholas Cluade LeBlanc, below is the working query:
UserProfile.objects.annotate(rgs=Count(
    Case(
        When(subscriptions__artist__release_groups__release_date__gte=startdate, then=F('subscriptions__artist__release_groups__release_date')),
        When(subscriptions__artist__release_groups__release_date__lt=startdate, then=None),
        output_field=DateField()
    )
))
As Nicholas suggested, subscriptions is the profile related_query_name set in Subscription.

context['test_totals'] = UserProfile.objects.prefetch_related(
    Prefetch(
        'subscription_set',
        queryset=Subscription.objects.select_related(
            'artist', 'profile').prefetch_related(
            Prefetch(
                'artist__release_groups',
                queryset=ReleaseGroup.objects.filter(
                    release_date__gte=startdate
                ),
                to_attr='release_groups'
            )
        ),
        to_attr='subscriptions'
    )
)
I haven't had the chance to test this, but it should work. You were using prefetch_related on the foreign key artist; for a single-valued relation like that, select_related is the usual tool, while prefetch_related is meant for relations that yield a list of items. So you prefetch the subscription_set and use select_related on the artist, then prefetch the artist__release_groups relationship. Now you should have profile_instance.subscriptions, subscriptions[index].artist and subscriptions[index].artist.release_groups.
EDIT:
After discussion with the OP: we wanted to use the method below, but the date filter is not applied.
UserProfile.objects.annotate(
    rgs=Count(
        'subscription_set__artist__release_groups',
        filter=Q(subscription_set__artist__release_groups__release_date__gte=startdate),
        distinct=True
    )
)
The real answer is to use Case and When from django.db.models, as the OP and I found. See his answer for the finished query.

Django conditional Subquery aggregate

A simplified example of my model structure would be
class Corporation(models.Model):
    ...
class Division(models.Model):
    corporation = models.ForeignKey(Corporation)
class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
Now I want to display a table of corporations in which one column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        # One COUNT query per corporation: this is the N+1 problem
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw SQL in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11).
EDIT: Example of a raw SQL solution
corps = Corporation.objects.raw("""
    SELECT
        *,
        (
            SELECT COUNT(*)
            FROM foo_division div
            JOIN foo_department dept ON dept.division_id = div.id
            WHERE div.corporation_id = c.id AND dept.type = 10
        ) AS type_10_count
    FROM foo_corporation c
""")

I think with Subquery we can get SQL similar to the one you have provided, using this code:
from django.db.models import Count, IntegerField, OuterRef, Subquery
# Get the number of departments with GROUP BY division__corporation [1]
# .order_by() removes any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as a Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation,
# so .values('count') always returns a single row with a single column, count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
    (
        SELECT COUNT("department"."id") AS "count"
        FROM "department"
        INNER JOIN "division" ON ("department"."division_id" = "division"."id")
        WHERE (
            "department"."type" = 10 AND
            "division"."corporation_id" = ("corporation"."id")
        ) GROUP BY "division"."corporation_id"
    ) AS "departments_of_type_10"
FROM "corporation"
One concern here is that a subquery can be slow with large tables. However, database query optimizers can be smart enough to turn the subquery into an OUTER JOIN; at least I've heard PostgreSQL does this.
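One detail of the grouped subquery is worth checking: for a corporation with no matching departments, the subquery returns no row at all, so the annotation comes back as NULL (None in Python) rather than 0. The sketch below reproduces this with Python's built-in sqlite3 module on a made-up schema (table names without the app prefix are an assumption) and shows that wrapping the subquery in COALESCE gives 0 instead. In Django, the Coalesce function from django.db.models.functions should play the same role, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical tables mirroring Corporation / Division / Department
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
    INSERT INTO corporation VALUES (1), (2);
    INSERT INTO division VALUES (1, 1), (2, 2);
    -- corporation 1 has two type-10 departments, corporation 2 has none
    INSERT INTO department (division_id, type) VALUES (1, 10), (1, 10), (1, 20), (2, 20);
""")

# Correlated subquery with the same shape as the generated SQL above
subquery = """
    SELECT COUNT(d.id)
    FROM department d
    JOIN division v ON d.division_id = v.id
    WHERE d.type = 10 AND v.corporation_id = c.id
    GROUP BY v.corporation_id
"""
rows = conn.execute(f"""
    SELECT c.id,
           ({subquery}) AS raw_count,
           COALESCE(({subquery}), 0) AS safe_count
    FROM corporation c
    ORDER BY c.id
""").fetchall()
print(rows)  # [(1, 2, 2), (2, None, 0)]
```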
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery

You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)

I like the following way of doing it:
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    # Func with function='Count' emits COUNT(...) without adding a GROUP BY
    count=Func('id', function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments)
)
More details on this method can be found in this answer: https://stackoverflow.com/a/69020732/10567223

using Filtered Count in django over joined tables returns wrong values

To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, whereas the relation between B and Category is n to 1.)
What I need is to calculate the occurrence rate of A in Category, which is obtained using:
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    ANum=Count('id', distinct=False)
)
Please notice that if I use distinct=True instead, every Anum would be equal to 1, which is not the desired outcome. The problem is that I have to filter the calculation based on the dates on which B occurred (and some other fields in the B table).
I am using the feature introduced in Django 2.0 that allows passing filter as an argument to an aggregate.
Let's assume:
kwargs = {}
kwargs['relation_set__B__BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    Anum=Count('id', distinct=False, filter=Q(**kwargs))
)
However, the result I get is duplicated due to the table joins, and I cannot use distinct=True as I explained. (Querying A is also a must, since I have to aggregate some other fields on this table, as explained in my question here.)
I am using Postgres and Django 2.0.1.
Are there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
    'ANum': Count('relation_set__A_id', distinct=False,
                  filter=Q(**Bkwargs)),
}
sub_filter = (Q(relation_set__A_id=OuterRef('id')) &
              Q(Category_id=OuterRef('relation_set__B__Category_id')))
# you could annotate 'relation_set__B__Category_id' onto the A query and set the field here.
subquery = B.objects.filter(
    sub_filter
).values(
    'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
    'id', 'relation_set__B__Category_id'
).annotate(
    Anum=Subquery(subquery)
)

I'm still not sure if I understood what you want. You write
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course. You count the associated A-object to each A-object. Each counts itself. So I still think you don't want to annotate A-objects with Anum, but probably Categories. This one should give you the desired number of As in each Category.
Category.objects.annotate(
    Anum=Count(
        'b__relation__a',
        filter=Q(b__BDate__gte=the_start_limit),
        distinct=True
    )
)
'b__relation__a' follows the relations backwards and picks all A objects that are related to the Category, while the filter limits the counted relations to certain Bs. distinct=True is needed because the joins can return the same A several times per Category; without it, those duplicates would all be counted.
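The effect of the joins on the count can be reproduced in plain SQL. The sketch below uses Python's built-in sqlite3 module with made-up lowercase tables (a, b, category, relation) standing in for the four models; the column names are assumptions. A1 is related to two Bs of the same category, so it appears twice in the joined rows, and only COUNT(DISTINCT ...) reports the number of different As.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical schema: A <--- Relation ---> B ---> Category
    CREATE TABLE category (id INTEGER PRIMARY KEY);
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE relation (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
    INSERT INTO category VALUES (1), (2);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 1), (2, 1);
    -- A1 is linked to B1 and B2 (both in category 1), A2 only to B1
    INSERT INTO relation (a_id, b_id) VALUES (1, 1), (1, 2), (2, 1);
""")

rows = conn.execute("""
    SELECT c.id,
           COUNT(a.id) AS joined_rows,
           COUNT(DISTINCT a.id) AS distinct_as
    FROM category c
    LEFT JOIN b ON b.category_id = c.id
    LEFT JOIN relation r ON r.b_id = b.id
    LEFT JOIN a ON a.id = r.a_id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
print(rows)  # [(1, 3, 2), (2, 0, 0)]
```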
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.

Django count of related objects with conditions

I'm trying to get the count of related objects with a condition:
Item.objects.annotate(count_subitems=Count('subitems'))
Subitem has a created_at column, which I need to use for filtering the count (greater than a date, less than a date or between dates).
How can I do this with the Django ORM?

Maybe you're looking for something like this:
from datetime import datetime
from django.db.models import Case, IntegerField, Sum, When
Item.objects.annotate(
    count_subitems=Sum(
        Case(
            When(subitems__created_at__lte=datetime.now(), then=1),
            default=0,  # non-matching subitems add 0 instead of NULL
            output_field=IntegerField(),
        )
    )
)

Filter on the subitem date first, then count. Because the filter precedes the annotation, only the matching subitems are counted, and Items without any matching subitem are dropped from the result:
(Item.objects
    .filter(subitems__created_at__lte=datetime.now())
    .annotate(count_subitems=Count('subitems')))
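The two answers differ in which Items they return, which is easiest to see in the SQL they roughly correspond to. The sketch below uses Python's built-in sqlite3 module; the item and subitem tables, the fixed cutoff date and the query text are illustrative assumptions, not the SQL Django actually generates. Filtering in WHERE drops item 2 entirely, while the conditional count keeps it with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical tables standing in for Item and Subitem
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE subitem (id INTEGER PRIMARY KEY, item_id INTEGER, created_at TEXT);
    INSERT INTO item VALUES (1), (2);
    -- item 1 has one old and one new subitem, item 2 only a new one
    INSERT INTO subitem (item_id, created_at) VALUES
        (1, '2020-01-01'), (1, '2030-01-01'), (2, '2030-01-01');
""")
params = {"cutoff": "2024-01-01"}  # ISO dates compare correctly as text

# Filter first, then count: items without a match disappear
filtered = conn.execute("""
    SELECT i.id, COUNT(s.id)
    FROM item i
    JOIN subitem s ON s.item_id = i.id
    WHERE s.created_at <= :cutoff
    GROUP BY i.id
    ORDER BY i.id
""", params).fetchall()

# Conditional count: every item is kept, non-matching subitems are skipped
conditional = conn.execute("""
    SELECT i.id, COUNT(CASE WHEN s.created_at <= :cutoff THEN s.id END)
    FROM item i
    LEFT JOIN subitem s ON s.item_id = i.id
    GROUP BY i.id
    ORDER BY i.id
""", params).fetchall()

print(filtered)     # [(1, 1)]
print(conditional)  # [(1, 1), (2, 0)]
```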
