Django count of related objects with conditions - django

I'm trying to get the count of related objects with a condition:
Item.objects.annotate(count_subitems=Count('subitems'))
Subitem has a created_at column, which I need to use for filtering the count (greater than a date, less than a date or between dates).
How can I do this with the Django ORM?

Maybe you're looking for something like this:
from django.db.models import Count, Sum, Case, When, IntegerField
Item.objects.annotate(
count_subitems=Sum(
Case(
When(subitems__created_at__lte=datetime.now(), then=1)
),
output_field=IntegerField()
)
)

Filter the Items that have at least one subitem matching, and then count all the subitems for that Item:
(Item.objects
.filter(subitems__created_at__lte=datetime.now())
.annotate(count_subitems=Count('subitems')))

Related

The annotation 'id' conflicts with a field on the model

In Annotate I am trying to get the count of quires for which is_healthy is True but I am getting an Error The annotation 'id' conflicts with a field on the model.
Any solution to solve this? and why is this causing how can i resolve this?
DeviceHealthHistory.objects.filter(**filter_data).values(
id = F('id'),
).annotate(
healthy_count=Count('id', filter=Q(is_healthy=True)),
)
If you are just looking for count then you use count function fo queryset:
DeviceHealthHistory.objects.filter(**filter_data).filter(is_healthy=True).count()
To get other fields along with the count.
DeviceHealthHistory.objects.filter(**filter_data).values(
'other_field1'
).annotate(
healthy_count=Count('id', filter=Q(is_healthy=True)),
)
You should count with:
DeviceHealthHistory.objects.filter(**filter_data, is_healthy=True).count()
This will filter on the **filter_data and also ensure that it only counts records with is_healthy=True. We then count the number of records.
If you want to "group by" a certain field, like patient_id, you can work with:
DeviceHealthHistory.objects.filter(**filter_data).values('patient_id').annotate(
n=Count('pk', filter=Q(is_healthy=True))
).order_by('patient_id')
This will produce a queryset of dictionaries with 'patient_id' and 'n' as keys, and the patient and the corresponding counts as values.

Django ORM query - Sum inside annotate using when condition

I have a table, lets call it as DummyTable.
It has fields - price_effective, store_invoice_updated_date, bag_status, gstin_code.
Now I want to get the output which does a group by of - month, year from the field store_invoice_updated_date and gstin_code.
Along with that group by I wanna do thse calculations -
Sum of price_effective as 'forward_price_effective' if the bag_status is other than 'return_accepted' or 'rto_bag_accepted'. Dont know how to do an exclude here i.e. using a filter in annotate
Sum of price effective as 'return_price_effective' if the bag_status is 'return_accepted' or 'rto_bag_accepted'.
A field 'total_price' that subtracts the 'return_price_effective' from 'forward_price_effective'.
I have formulated this query, which doesn't work
from django.db.models.functions import TruncMonth
from django.db.models import Count, Sum, When, Case, IntegerField
DummyTable.objects.annotate(month=TruncMonth('store_invoice_updated_date'), year=TruncYear('store_invoice_updated_date')).annotate(forward_price_effective=Sum(Case(When(bag_status__in=['delivery_done']), then=Sum(forward_price_effective)), output_field=IntegerField()), return_price_effective=Sum(Case(When(bag_status__in=['return_accepted', 'rto_bag_accepted']), then=Sum('return_price_effective')), output_field=IntegerField())).values('month','year','forward_price_effective', 'return_price_effective', 'gstin_code')
Solved it by multiple querysets.
Just couldnt find out a way to appropriately use 'Case' with 'When' with 'filter' and 'exclude'.
basic_query = BagDetails.objects.filter(store_invoice_updated_date__year__in=[2018]).annotate(month=TruncMonth('store_invoice_updated_date'), year=TruncYear('store_invoice_updated_date') ).values('year', 'month', 'gstin_code', 'price_effective', 'company_id', 'bag_status')
forward_bags = basic_query.exclude(bag_status__in=['return_accepted', 'rto_bag_accepted']).annotate(
Sum('price_effective')).values('year', 'month', 'gstin_code', 'price_effective', 'company_id')
return_bags = basic_query.filter(bag_status__in=['return_accepted', 'rto_bag_accepted']).annotate(
Sum('price_effective')).values('month', 'gstin_code', 'price_effective', 'company_id')

django prefetch_related & Prefetch nested

I'm trying to return, for each UserProfile, which has one-to-many Subscription, which has a Foreignkey to both Artist and UserProfile, with each artist having many ReleaseGroup, the count of future release groups that each UserProfile have.
In short: I want to return the total count of upcoming releases for all of the subscription that each of the users have.
However I'm getting stuck way before I get to count...
context['test_totals'] = UserProfile.objects.prefetch_related(
Prefetch('subscription_set', queryset=Subscription.objects.
prefetch_related(Prefetch('artist', queryset=Artist.objects.
prefetch_related(Prefetch('release_groups',
queryset=ReleaseGroup.objects.filter(
release_date__gte=startdate
), to_attr='rggg')), to_attr='arti')), to_attr='arts'))
accessing userprofile.arts|length in template returns total number of subscription, but rggg and arti return nothing. How can this be done?
I tried using filtering on self with, say, filter(profile='userprofile)`, but that returns an error. If I could filter on self I could probably get this to work?
After tons of help from Nicholas Cluade LeBlanc, below is the working query:
UserProfile.objects.annotate(rgs=Count(
Case(
When(subscriptions__artist__release_groups__release_date__gte=startdate, then=F('subscriptions__artist__release_groups__release_date')),
When(subscriptions__artist__release_groups__release_date__lt=startdate, then=None),
output_field=DateField()
)
))
As Nicholas suggested, subscriptions is the profile related_query_name set in Subscription.
context['test_totals'] = UserProfile.objects.prefetch_related(
Prefetch(
'subscription_set',
queryset=Subscription.objects.select_related(
'artist', 'profile').prefetch_related(
Prefetch(
'artist__release_groups',
queryset=ReleaseGroup.objects.filter(
release_date__gte=startdate
),
to_attr='release_groups'
)
),
to_attr='subscriptions'
)
)
I haven't had the chance to test this, but it should work. you were using prefetch_related on a foreign key artist which is not supported; prefetch_related is meant for relations to support a list of items. So, you prefetch the subscription_set and use select_related on the artist, then prefetch the artist__release_groups relationship. now you should have profile_instance.subscriptions ...subscriptions[index].artist ...subscriptions[index].artist.release_groups
*EDIT:
After discussion with the OP, we wanted to use this method but the Date filter is not used.
UserProfile.objects.annotate(
rgs=Count(
'subscription_set__artist__release_groups',
filter=Q(subscription_set__artist__release_groups__release_date__gte=startdate),
distinct=True
)
)
The real answer is to use django.db.models Case and When as the OP and I found. See his answer for the finished query

Django conditional Subquery aggregate

An simplified example of my model structure would be
class Corporation(models.Model):
...
class Division(models.Model):
corporation = models.ForeignKey(Corporation)
class Department(models.Model):
division = models.ForeignKey(Division)
type = models.IntegerField()
Now I want to display a table that display corporations where a column will contain the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
...
def get_departments_type_10(self):
return (
Department.objects
.filter(division__corporation=self, type=10)
.count()
)
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I havn't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
SELECT
*,
(
SELECT COUNT(*)
FROM foo_division div ON div.corporation_id = c.id
JOIN foo_department dept ON dept.division_id = div.id
WHERE dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to one you have provided, with this code
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(
departments_subquery.values('count'), output_field=IntegerField()
)
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
Some concerns here is that subquery can be slow with large tables. However, database query optimizers can be smart enough to promote subquery to OUTER JOIN, at least I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
type_10_count=Sum(
Case(
When(division__department__type=10, then=Value(1)),
default=Value(0),
output_field=IntegerField()
)
)
)
I like the following way of doing it:
departments = Department.objects.filter(
type=10,
division__corporation=OuterRef('id')
).annotate(
count=Func('id', 'Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(depatments)
)
The more details on this method you can see in this answer: https://stackoverflow.com/a/69020732/10567223

using Filtered Count in django over joined tables returns wrong values

To keep it simple I have four tables(A, B, Category and Relation), Relation table stores the Intensity of A in B and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, where the relation between B and Category is n to 1)
What I need is to calculate the occurrence rate of A in Category which is obtained using:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
ANum = Count('id', distinct=False)
)
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1 which is not the desired outcome. The problem is that I have to filter the calculation based on the dates that B has been occurred on(and some other fields in B table),
I am using django 2.0's feature which makes using filter as an argument in aggregation possible.
Let's assume:
kwargs= {}
kwargs['relation_set__B____BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Count('id', distinct=False, filter=Q(**kwargs))
)
However the result I get is duplicated due to the table joins and I cannot use distinct=True as I explained. (querying A is also a must since I have to aggregate some other fields on this table as explained in my question here)
I am using Postgres and django 2.0.1 .
Is there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
'ANum': Count('relation_set__A_id', distinct=False,
filter=Q(**Bkwargs),
}
sub_filter = Q(relation_set__A_id=OuterRef('id')) &
Q(Category_id=OuterRef('relation_set__B__Category_id'))
# you could annotate 'relation_set__B__Category_id' to A query an set the field here.
subquery = B.objects.filter(
sub_filter
).values(
'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Subquery(subquery)
)
I'm still not sure if I understood what you want. You write
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course. You count the associated A-object to each A-object. Each counts itself. So I still think you don't want to annotate A-objects with Anum, but probably Categories. This one should give you the desired number of As in each Category.
Category.objects.annotate(
Anum=Count(
'b__relation__a',
filter=Q(b__BDate__gte=the_start_limit),
distinct=True
)
)
'b__relation__a' follows the relations backwards and picks all A-objects that are related to the Category. However the filter limits the counted relations to certain Bs. The distinct=True is needed to avoid a query bug.
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.