Using Annotate & Arithmetic in a Django subquery - django

I am trying to improve my understanding of the Django queryset syntax and am hoping that someone could help me check my understanding.
Could this:
total_packed = (
PackingRecord.objects.filter(
product=OuterRef('pk'), fifolink__sold_out=False
).values('product') # Group by product
.annotate(total=Sum('qty')) # Sum qty for 'each' product
.values('total')
)
total_sold = (
FifoLink.objects.filter(
packing_record__product=OuterRef('pk'), sold_out=False
).values('packing_record__product')
.annotate(total=Sum('sale__qty'))
.values('total')
)
output = obj_set.annotate(
sold=Subquery(total_sold[:1]),
packed=Subquery(total_packed[:1]),
).annotate(
in_stock=F('packed') - F('sold')
)
be safely reduced to this:
in_stock = (
FifoLink.objects.filter(
packing_record__product=OuterRef('pk'), sold_out=False
).values('packing_record__product')
.annotate(total=Sum(F('sale__qty')-F('packing_record__qty')))
.values('total')
)
output = obj_set.annotate(
in_stock=Subquery(in_stock[:1]),
)
Basically, I am trying to move the math being completed in the outer .annotate() into the queryset itself by using the fk relationship instead of running two separate querysets. I think this is allowed, but I am not sure if I am understanding it correctly.
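Whether the two forms agree depends on which rows each aggregate sees. The following is only a SQL-level sketch using Python's sqlite3, with a hypothetical flattened table `link(packed_qty, sale_qty)` standing in for the joined FifoLink rows (the table and column names are assumptions, not the models above). It suggests that a difference of sums and a sum of differences match when both run over the same rows with no NULLs, and diverge once a `sale_qty` is NULL. Note also that the reduced version above subtracts in the opposite order (sold minus packed).

```python
import sqlite3

def stock_both_ways(rows):
    """Return (difference_of_sums, sum_of_differences) for one product.

    `rows` is a list of (packed_qty, sale_qty) pairs; the table and column
    names are hypothetical stand-ins for the joined FifoLink rows.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE link (packed_qty INTEGER, sale_qty INTEGER)")
    con.executemany("INSERT INTO link VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT SUM(packed_qty) - SUM(sale_qty), "
        "       SUM(packed_qty - sale_qty) "
        "FROM link"
    ).fetchone()
    con.close()
    return result

# Same rows, no NULLs: both forms agree.
no_nulls = stock_both_ways([(10, 4), (5, 5)])

# One row without a sale quantity: SUM skips the NULL difference entirely,
# so the packed quantity of that row silently drops out of the second form.
with_null = stock_both_ways([(10, 4), (5, None)])
```

If the real relations can produce a different number of joined rows for the packed side than for the sold side, the two forms can differ for that reason as well, so it is worth comparing both on real data.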

Related

Using annotated value again in a Django queryset

In this annotation, I'm creating a value ever_error that is 1 if has_error is True for any row in a unique 'image__image_category__label', 'image__subject', 'image__event' combination, and 0 otherwise
d = ImageAcquisitionJob.active_objects.values('image__image_category__label', 'image__subject', 'image__event')\
.annotate(
ever_error=(
Max( Case(When(has_error=True, then=Value(1)), default=Value(0), output_field=IntegerField()) )
)
)
Now, my objective is to make a new annotation that sums up the ever_error attribute for each 'image__image_category__label'
d = d.values('image__image_category__label').annotate(has_error=Sum('ever_error'))
This is not possible however because I get the error: Cannot compute Sum('ever_error'): 'ever_error' is an aggregate
The core of my problem is that I need to use the first annotation again as a regular field but this isn't permitted. Is there a way to save the annotated field so that I can use it again?
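At the SQL level, the error reflects the fact that an aggregate cannot be applied to another aggregate within the same SELECT; the inner grouping has to become a subquery first. A rough sketch with Python's sqlite3 and a hypothetical flattened table `job(label, subject, event, has_error)` (an assumption for illustration, not the real schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE job (label TEXT, subject TEXT, event TEXT, has_error INTEGER)"
)
con.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?)",
    [
        ("mri", "s1", "e1", 1),
        ("mri", "s1", "e1", 0),
        ("mri", "s2", "e1", 0),
        ("ct", "s1", "e1", 1),
        ("ct", "s2", "e1", 1),
    ],
)

# Nesting aggregates in a single SELECT is rejected by the database.
try:
    con.execute("SELECT label, SUM(MAX(has_error)) FROM job GROUP BY label")
    nested_allowed = True
except sqlite3.Error:
    nested_allowed = False

# Two levels: the inner query yields ever_error per (label, subject, event),
# the outer query sums those flags per label.
per_label = con.execute(
    """
    SELECT label, SUM(ever_error)
    FROM (
        SELECT label, MAX(has_error) AS ever_error
        FROM job
        GROUP BY label, subject, event
    )
    GROUP BY label
    ORDER BY label
    """
).fetchall()
```

This only illustrates the shape of the query being asked for; how best to express the two levels in the ORM depends on the Django version in use.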

Can we do arithmetic using Django Subqueries?

I am wondering if Django's ORM allows us to do aggregate operations on subqueries, and then do arithmetic with the resulting values.
What would be the proper way to go about something like this:
record = PackingRecord.objects.filter(product=OuterRef('pk'))
packed = FifoLink.objects.filter(packing_record__product=OuterRef('pk'))
output = obj_set.annotate(
in_stock=(Subquery(record.aggregate(Sum('qty'))) - Subquery(packed.aggregate(Sum('sale__qty'))))
).values('id', 'name', 'in_stock')
You certainly can, but as far as I have dug, you cannot use aggregate(). When you try to use aggregate() in a Subquery, Django complains about trying to execute a query that has OuterRefs. The way I do this (I really don't know if this is THE way, though according to the docs it is) is by using annotate(). In a case like the one in your example I'd do something like the following:
records_total = (PackingRecord.objects.filter(product=OuterRef('pk'))
.values('product') # Group by product
.annotate(total=Sum('qty')) # Sum qty for 'each' product
.values('total')
)
packed_total = (FifoLink.objects.filter(packing_record__product=OuterRef('pk'))
.values('packing_record__product') # Group by packing_record__product
.annotate(total=Sum('sale__qty')) # Sum sale__qty for 'each' product
.values('total')
)
output = obj_set.annotate(
r_tot=Subquery(records_total[:1]),
p_tot=Subquery(packed_total[:1])
).annotate(
in_stock=F('r_tot')-F('p_tot')
) # Whatever you need
I did not run the example, so it may need some adjustments here and there.
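One thing to watch with this pattern: a scalar subquery that matches no rows yields NULL, so the subtraction yields NULL too. Below is a SQL-level sketch with Python's sqlite3 on hypothetical tables (`product`, `packing` and `sale` are illustrative names, not the models above). In the ORM, wrapping each Subquery in `Coalesce(..., Value(0))` from `django.db.models.functions` would presumably play the role that `COALESCE` plays here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE packing (product_id INTEGER, qty INTEGER);
    CREATE TABLE sale (product_id INTEGER, qty INTEGER);
    INSERT INTO product VALUES (1, 'apples'), (2, 'pears');
    INSERT INTO packing VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO sale VALUES (1, 4);  -- nothing sold for product 2
    """
)

# Each parenthesised SELECT is correlated with the outer row through p.id,
# which is what OuterRef('pk') expresses in the ORM version.
query = """
    SELECT p.name,
           (SELECT SUM(qty) FROM packing WHERE product_id = p.id)
         - (SELECT SUM(qty) FROM sale WHERE product_id = p.id) AS raw,
           COALESCE((SELECT SUM(qty) FROM packing WHERE product_id = p.id), 0)
         - COALESCE((SELECT SUM(qty) FROM sale WHERE product_id = p.id), 0) AS safe
    FROM product p
    ORDER BY p.id
"""
rows = con.execute(query).fetchall()
```

Here the `raw` column comes back as NULL (None in Python) for the product without sales, while the `safe` column still reports the packed quantity.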

Django - How to use update on an annotated queryset instead of using a for loop?

I have to build a field from an annotation, the field is a score which is equal to the division of two other annotations which are based on different Querysets Expressions.
annotated_queryset = Influencer.objects.annotate(
score_sum=Sum('submission__answer__decisions__score'),
submission_count=Count('submission',
filter=Q(
submission__status__in=Submission.ANSWERED_STATUS
)
)
).annotate(rate=F('score_sum') / F('submission_count'))
This snippet shows how I get my annotation.
Now I'd love to be able to do the following:
annotated_queryset.update(acceptation_rate=F('rate'))
But I get a FieldError: Aggregate functions are not allowed in this query
My only solution is to use the ugly and expensive for loop:
for each in annotated_queryset:
    each.acceptation_rate = each.rate
    each.save()
My questions are :
Why can't I use the annotate + update form ?
Is there any better way than the for loop to do this ? What would it be ?
Edit, Using Subqueries :
annotated_queryset = Influencer.objects.annotate(
score_sum=Sum('submission__answer__score'),
submission_count=Count('submission',
filter=Q(
submission__status__in=Submission.ANSWERED_STATUS
)
)
).annotate(
rate=ExpressionWrapper(
F('score_sum') / F('submission_count'),
output_field=FloatField()
)
)[:1]
Influencer.objects.update(acceptation_rate=Subquery(annotated_queryset))
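At the SQL level, an UPDATE can assign from a correlated scalar subquery, which is roughly what a Subquery filtered on `OuterRef('pk')` would express. The subquery in the edit above has no `OuterRef`, so it is not tied to the row being updated. A sketch with Python's sqlite3 on hypothetical, simplified tables (the `submission.score` column is an assumption that collapses the answer/decision relations into one table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE influencer (id INTEGER PRIMARY KEY, acceptation_rate REAL);
    CREATE TABLE submission (influencer_id INTEGER, score REAL);
    INSERT INTO influencer (id) VALUES (1), (2);
    INSERT INTO submission VALUES (1, 4.0), (1, 2.0), (2, 5.0);
    """
)

# One statement, no Python loop: the inner SELECT is evaluated per row
# because it refers to influencer.id from the outer UPDATE.
con.execute(
    """
    UPDATE influencer
    SET acceptation_rate = (
        SELECT SUM(score) / COUNT(*)
        FROM submission
        WHERE submission.influencer_id = influencer.id
    )
    """
)
rates = con.execute(
    "SELECT id, acceptation_rate FROM influencer ORDER BY id"
).fetchall()
```

An influencer with no submissions would end up with NULL here, since SUM over zero rows is NULL.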

Django conditional Subquery aggregate

A simplified example of my model structure would be
class Corporation(models.Model):
    ...
class Division(models.Model):
    corporation = models.ForeignKey(Corporation)
class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
Now I want to display a table of corporations where a column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
    ...
    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
SELECT
*,
(
SELECT COUNT(*)
FROM foo_division div
JOIN foo_department dept ON dept.division_id = div.id
WHERE div.corporation_id = c.id AND dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to one you have provided, with this code
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(
departments_subquery.values('count'), output_field=IntegerField()
)
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
One concern here is that a subquery can be slow with large tables. However, database query optimizers can be smart enough to promote the subquery to an OUTER JOIN; at least I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
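A detail worth checking in the generated SQL above: because the inner query has a GROUP BY, a corporation with no matching departments gets no row back from the subquery, i.e. NULL rather than 0. A sketch with Python's sqlite3, using hypothetical unprefixed table names and the alias `dv` (both are assumptions for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (
        id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER
    );
    INSERT INTO corporation VALUES (1), (2);
    INSERT INTO division VALUES (1, 1), (2, 2);
    INSERT INTO department VALUES (1, 1, 10), (2, 1, 10), (3, 2, 20);
    """
)

def counts(group_by):
    """Per-corporation type-10 department counts via a correlated subquery."""
    sql = f"""
        SELECT c.id, (
            SELECT COUNT(dept.id)
            FROM department dept
            JOIN division dv ON dept.division_id = dv.id
            WHERE dept.type = 10 AND dv.corporation_id = c.id
            {group_by}
        )
        FROM corporation c
        ORDER BY c.id
    """
    return con.execute(sql).fetchall()

grouped = counts("GROUP BY dv.corporation_id")  # mirrors the generated SQL
ungrouped = counts("")                          # plain COUNT, no grouping
```

With the GROUP BY, corporation 2 comes back with None; without it, the count is 0. In the ORM, wrapping the Subquery in `Coalesce` from `django.db.models.functions` is one way to turn the NULL into 0.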
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
type_10_count=Sum(
Case(
When(division__department__type=10, then=Value(1)),
default=Value(0),
output_field=IntegerField()
)
)
)
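For comparison, the conditional-aggregation form corresponds roughly to a SUM over a CASE expression across LEFT JOINs. The sketch below uses Python's sqlite3 with hypothetical unprefixed table names; it approximates the query shape rather than reproducing the exact SQL Django emits.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE corporation (id INTEGER PRIMARY KEY);
    CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE department (
        id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER
    );
    -- Corporation 2 has a division but no departments; 3 has no divisions.
    INSERT INTO corporation VALUES (1), (2), (3);
    INSERT INTO division VALUES (1, 1), (2, 2);
    INSERT INTO department VALUES (1, 1, 10), (2, 1, 10), (3, 1, 20);
    """
)

# Every joined row contributes 1 if its department is type 10, else 0.
# LEFT JOINs keep corporations that have no divisions or departments.
rows = con.execute(
    """
    SELECT c.id,
           SUM(CASE WHEN dept.type = 10 THEN 1 ELSE 0 END) AS type_10_count
    FROM corporation c
    LEFT JOIN division dv ON dv.corporation_id = c.id
    LEFT JOIN department dept ON dept.division_id = dv.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
```

Because this is a single joined query, adding a second aggregate over a different relation to the same annotate() call can multiply rows and inflate the counts, which is one reason to prefer the subquery form in that situation.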
I like the following way of doing it:
departments = Department.objects.filter(
type=10,
division__corporation=OuterRef('id')
).annotate(
count=Func(F('id'), function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(departments)
)
More details on this method can be found in this answer: https://stackoverflow.com/a/69020732/10567223

using Filtered Count in django over joined tables returns wrong values

To keep it simple I have four tables(A, B, Category and Relation), Relation table stores the Intensity of A in B and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, where the relation between B and Category is n to 1)
What I need is to calculate the occurrence rate of A in Category which is obtained using:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
ANum = Count('id', distinct=False)
)
Please notice that if I use 'distinct=True' instead, each and every 'Anum' would be equal to 1, which is not the desired outcome. The problem is that I have to filter the calculation based on the dates on which B occurred (and some other fields in the B table).
I am using django 2.0's feature which makes using filter as an argument in aggregation possible.
Let's assume:
kwargs= {}
kwargs['relation_set__B__BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Count('id', distinct=False, filter=Q(**kwargs))
)
However the result I get is duplicated due to the table joins and I cannot use distinct=True as I explained. (querying A is also a must since I have to aggregate some other fields on this table as explained in my question here)
I am using Postgres and django 2.0.1 .
Are there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
    'ANum': Count('relation_set__A_id', distinct=False,
                  filter=Q(**Bkwargs)),
}
sub_filter = (
    Q(relation_set__A_id=OuterRef('id')) &
    Q(Category_id=OuterRef('relation_set__B__Category_id'))
)
# you could annotate 'relation_set__B__Category_id' on the A query and set the field here.
subquery = B.objects.filter(
sub_filter
).values(
'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Subquery(subquery)
)
I'm still not sure if I understood what you want. You write
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course. You count the associated A-object to each A-object. Each counts itself. So I still think you don't want to annotate A-objects with Anum, but probably Categories. This one should give you the desired number of As in each Category.
Category.objects.annotate(
Anum=Count(
'b__relation__a',
filter=Q(b__BDate__gte=the_start_limit),
distinct=True
)
)
'b__relation__a' follows the relations backwards and picks all A-objects that are related to the Category. The filter, however, limits the counted relations to certain Bs. The distinct=True is needed so that an A reached through several Bs of the same Category is counted only once.
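The effect of the joins on the count can be seen in a small SQL-level sketch with Python's sqlite3. The tables `category`, `b` and `relation` and their columns are hypothetical stand-ins for the models discussed here, and the CASE expression plays the role of the `filter=` argument.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE category (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER, bdate TEXT);
    CREATE TABLE relation (a_id INTEGER, b_id INTEGER);
    INSERT INTO category VALUES (1);
    INSERT INTO b VALUES
        (1, 1, '2018-01-10'), (2, 1, '2018-02-10'), (3, 1, '2017-06-01');
    -- A 1 is linked to all three Bs, A 2 only to the old one.
    INSERT INTO relation VALUES (1, 1), (1, 2), (1, 3), (2, 3);
    """
)

start_limit = "2018-01-01"
row = con.execute(
    """
    SELECT COUNT(CASE WHEN b.bdate >= :start THEN r.a_id END),
           COUNT(DISTINCT CASE WHEN b.bdate >= :start THEN r.a_id END)
    FROM category cat
    LEFT JOIN b ON b.category_id = cat.id
    LEFT JOIN relation r ON r.b_id = b.id
    WHERE cat.id = 1
    """,
    {"start": start_limit},
).fetchone()
```

The plain count sees A 1 twice (once per recent B), while the distinct count reports it once; which of the two is wanted depends on whether occurrences or distinct A objects are being measured.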
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.