I am wondering if Django's ORM allows us to do aggregate operations on subqueries, and then do arithmetic with the resulting values.
What would be the proper way to go about something like this:
record = PackingRecord.objects.filter(product=OuterRef('pk'))
packed = FifoLink.objects.filter(packing_record__product=OuterRef('pk'))
output = obj_set.annotate(
in_stock=(Subquery(record.aggregate(Sum('qty'))) - Subquery(packed.aggregate(Sum('sale__qty'))))
).values('id', 'name', 'in_stock')
You certainly can, but as far as I have dug, you cannot use aggregate(). When you try to use aggregate() in a Subquery, Django complains about trying to execute a query that has OuterRefs, because aggregate() evaluates the queryset immediately. The way I do this (I really don't know if this is THE way, though according to the docs it is) is by using annotate(). In a case like the one in your example, I'd do something like the following:
from django.db.models import F, OuterRef, Subquery, Sum

records_total = (
    PackingRecord.objects.filter(product=OuterRef('pk'))
    .values('product')  # Group by product
    .annotate(total=Sum('qty'))  # Sum qty for each product
    .values('total')
)
packed_total = (
    FifoLink.objects.filter(packing_record__product=OuterRef('pk'))
    .values('packing_record__product')  # Group by packing_record__product
    .annotate(total=Sum('sale__qty'))  # Sum sale__qty for each product
    .values('total')
)
output = obj_set.annotate(
    r_tot=Subquery(records_total[:1]),
    p_tot=Subquery(packed_total[:1]),
).annotate(
    in_stock=F('r_tot') - F('p_tot')
)  # Whatever you need
I did not run the example, so it may need some adjustments here and there.
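To make the shape of that query concrete, here is a minimal sketch of the SQL idea behind it (two correlated scalar subqueries, then a subtraction), run through Python's built-in sqlite3 rather than Django. The table and column names are assumptions made up for the illustration, not the asker's real schema. It also shows one thing to watch for: a product with no matching rows yields NULL from the subquery, so the sketch wraps each side in COALESCE.

```python
import sqlite3

# Hypothetical schema that loosely mirrors PackingRecord / FifoLink / Sale.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE packing_record (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER);
CREATE TABLE sale (id INTEGER PRIMARY KEY, qty INTEGER);
CREATE TABLE fifo_link (id INTEGER PRIMARY KEY, packing_record_id INTEGER, sale_id INTEGER);
INSERT INTO product VALUES (1, 'apples'), (2, 'pears');
INSERT INTO packing_record VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
INSERT INTO sale VALUES (1, 4), (2, 3);
INSERT INTO fifo_link VALUES (1, 1, 1), (2, 2, 2);
""")

# One correlated subquery per total; COALESCE turns "no rows" (NULL) into 0
# so the subtraction does not become NULL for products without sales.
rows = conn.execute("""
SELECT p.id, p.name,
       COALESCE((SELECT SUM(r.qty)
                 FROM packing_record r
                 WHERE r.product_id = p.id), 0)
     - COALESCE((SELECT SUM(s.qty)
                 FROM fifo_link l
                 JOIN packing_record r ON r.id = l.packing_record_id
                 JOIN sale s ON s.id = l.sale_id
                 WHERE r.product_id = p.id), 0) AS in_stock
FROM product p
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

In the ORM version, the equivalent guard would presumably be wrapping each Subquery in Coalesce from django.db.models.functions; I have not run that against the asker's models.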
In use: django 3.2.10, postgresql 13.4
I have the following queryset with the aggregation function Count:
queryset = Model.objects.all().aggregate(
trues=Count('id', filter=Q(criteria=True)),
falses=Count('id', filter=Q(criteria=False)),
)
What I want:
queryset = Model.objects.all().aggregate(
trues=Count('id', filter=Q(criteria=True)),
falses=Count('id', filter=Q(criteria=False)),
total=trues+falses, <--------------THIS
)
How to do this?
There is little you can do after aggregation, as it returns a Python dict object.
I understand that your example here is not your real situation, as you can simply do
Model.objects.aggregate(
total = (Count('id', filter=Q(criteria=True))
+ Count('id', filter=Q(criteria=False)))
)
What I want to say is that Django provides .values().annotate() to achieve a GROUP BY clause, as in SQL.
Take your example here
queryset = Model.objects.values('criteria').annotate(count=Count('id'))
queryset here is still a 'QuerySet' object, and you can further modify the queryset like
queryset = queryset.aggregate(
total=Sum('count')
)
Hopefully it helps.
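As a rough illustration of both points (the aggregate result is just a dict you can do arithmetic on, and the grouped counts can be summed afterwards), here is a small sketch using Python's built-in sqlite3 instead of Django. The item table and its criteria column are assumptions for the example. Note the row with a NULL criteria: it is counted by the GROUP BY route but not by the trues/falses route, so the two totals can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, criteria INTEGER)")
# Two true rows, one false row, one row where criteria is NULL.
conn.executemany("INSERT INTO item (criteria) VALUES (?)",
                 [(1,), (1,), (0,), (None,)])

# Conditional counts in one pass, similar in spirit to Count('id', filter=Q(...)).
trues, falses = conn.execute(
    "SELECT SUM(CASE WHEN criteria = 1 THEN 1 ELSE 0 END), "
    "       SUM(CASE WHEN criteria = 0 THEN 1 ELSE 0 END) "
    "FROM item"
).fetchone()

# The result is plain Python data, so the total is ordinary arithmetic.
counts = {"trues": trues, "falses": falses}
counts["total"] = counts["trues"] + counts["falses"]

# GROUP BY route, similar in spirit to .values('criteria').annotate(count=Count('id')).
per_group = dict(conn.execute(
    "SELECT criteria, COUNT(id) FROM item GROUP BY criteria"
))
grouped_total = sum(per_group.values())

print(counts, per_group, grouped_total)
```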
It seems you want the total number of rows whose criteria is either true or false, so you can simply do the following:
queryset = Model.objects.all().filter(
Q(criteria=True) | Q(criteria=False)).count()
or you can use the following (not recommended unless you want to show the intermediate values):
from django.db.models import Count, F, Q, Sum
query = (
    Model.objects
    .annotate(
        trues=Count('id', filter=Q(criteria=True)),
        falses=Count('id', filter=Q(criteria=False)),
    )
    .annotate(trues_false=F('trues') + F('falses'))
    .aggregate(total=Sum('trues_false'))
)
A simplified example of my model structure would be
class Corporation(models.Model):
    ...

class Division(models.Model):
    corporation = models.ForeignKey(Corporation)

class Department(models.Model):
    division = models.ForeignKey(Division)
    type = models.IntegerField()
Now I want to display a table of corporations in which one column contains the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
    ...

    def get_departments_type_10(self):
        return (
            Department.objects
            .filter(division__corporation=self, type=10)
            .count()
        )
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I haven't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
    SELECT
        *,
        (
            SELECT COUNT(*)
            FROM foo_division div
            JOIN foo_department dept ON dept.division_id = div.id
            WHERE div.corporation_id = c.id
              AND dept.type = 10
        ) AS type_10_count
    FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to the one you have provided, with this code:
# Get the number of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
    'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as a Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation,
# so .values('count') will always return a single row with a single column, count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(
        departments_subquery.values('count'), output_field=IntegerField()
    )
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
One concern here is that the subquery can be slow with large tables. However, database query optimizers can be smart enough to promote the subquery to an OUTER JOIN; at least, I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
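One caveat with a grouped, correlated subquery like the generated SQL above: for a corporation with no matching departments, the subquery returns no row at all, so the annotation comes back as NULL rather than 0. Here is a minimal sketch of that behaviour using Python's built-in sqlite3; the table names and sample rows are assumptions for the illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE corporation (id INTEGER PRIMARY KEY);
CREATE TABLE division (id INTEGER PRIMARY KEY, corporation_id INTEGER);
CREATE TABLE department (id INTEGER PRIMARY KEY, division_id INTEGER, type INTEGER);
INSERT INTO corporation VALUES (1), (2);
INSERT INTO division VALUES (1, 1), (2, 1), (3, 2);
-- Corporation 1 has two type-10 departments, corporation 2 has none.
INSERT INTO department VALUES (1, 1, 10), (2, 2, 10), (3, 2, 20), (4, 3, 20);
""")

# Same shape as the generated SQL: grouped count, correlated on c.id.
SUBQUERY = """
    SELECT COUNT(dept.id)
    FROM department dept
    JOIN division dv ON dept.division_id = dv.id
    WHERE dept.type = 10 AND dv.corporation_id = c.id
    GROUP BY dv.corporation_id
"""

# Without a fallback, the second corporation gets NULL (None in Python).
raw = conn.execute(
    f"SELECT c.id, ({SUBQUERY}) FROM corporation c ORDER BY c.id"
).fetchall()

# With COALESCE, the missing group is reported as 0.
safe = conn.execute(
    f"SELECT c.id, COALESCE(({SUBQUERY}), 0) FROM corporation c ORDER BY c.id"
).fetchall()

print(raw, safe)
```

In Django terms, the fallback would presumably be Coalesce(Subquery(...), 0) from django.db.models.functions; treat that as a suggestion to verify rather than tested code.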
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
    type_10_count=Sum(
        Case(
            When(division__department__type=10, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
I like the following way of doing it:
from django.db.models import F, Func, OuterRef, Subquery
departments = Department.objects.filter(
    type=10,
    division__corporation=OuterRef('id')
).annotate(
    count=Func(F('id'), function='Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
    departments_of_type_10=Subquery(departments)
)
You can find more details on this method in this answer: https://stackoverflow.com/a/69020732/10567223
To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the intensity of A in B, and Category stores the type of B.
A <--- Relation ---> B ---> Category
(So the relation between A and B is n to n, where the relation between B and Category is n to 1)
What I need is to calculate the occurrence rate of A in Category which is obtained using:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
ANum = Count('id', distinct=False)
)
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1 which is not the desired outcome. The problem is that I have to filter the calculation based on the dates on which B occurred (and some other fields in the B table).
I am using Django 2.0's feature that makes it possible to pass filter as an argument to an aggregate.
Let's assume:
kwargs = {}
kwargs['relation_set__B__BDate__gte'] = the_start_limit
I could use it in my code like:
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Count('id', distinct=False, filter=Q(**kwargs))
)
However the result I get is duplicated due to the table joins and I cannot use distinct=True as I explained. (querying A is also a must since I have to aggregate some other fields on this table as explained in my question here)
I am using Postgres and django 2.0.1 .
Are there any workarounds to achieve what I have in mind?
Update
Got it done using another Subquery:
# subquery
annotation = {
    'ANum': Count('relation_set__A_id', distinct=False,
                  filter=Q(**Bkwargs)),
}
sub_filter = (
    Q(relation_set__A_id=OuterRef('id')) &
    Q(Category_id=OuterRef('relation_set__B__Category_id'))
)
# You could annotate 'relation_set__B__Category_id' on the A query and set the field here.
subquery = B.objects.filter(
sub_filter
).values(
'relation_set__A_id'
).annotate(**annotation).values('ANum')[:1]
# main query
A.objects.values(
'id', 'relation_set__B__Category_id'
).annotate(
Anum = Subquery(subquery)
)
I'm still not sure if I understood what you want. You write
Please notice that If I use 'distinct=True' instead every and each 'Anum' would be equal to 1
Of course. You count the associated A-object to each A-object. Each counts itself. So I still think you don't want to annotate A-objects with Anum, but probably Categories. This one should give you the desired number of As in each Category.
Category.objects.annotate(
Anum=Count(
'b__relation__a',
filter=Q(b__BDate__gte=the_start_limit),
distinct=True
)
)
'b__relation__a' follows the relations backwards and picks all A-objects that are related to the Category. However the filter limits the counted relations to certain Bs. The distinct=True is needed to avoid a query bug.
If you really want "a list of A objects grouped by its id" (and not only the aggregated Anum-count), as you stated in your comment, I don't see an easy way to do that in a single query.
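For what it's worth, the difference between the plain and the distinct filtered count per Category can be sketched in raw SQL. This uses Python's built-in sqlite3, and the table names, columns and sample rows are assumptions for the illustration, not the asker's schema. Category 2 has the same A linked through two different Bs, which is exactly where the two counts diverge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY, category_id INTEGER, bdate TEXT);
CREATE TABLE relation (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
INSERT INTO category VALUES (1), (2);
INSERT INTO b VALUES (1, 1, '2020-01-01'), (2, 1, '2021-06-01'),
                     (3, 2, '2021-07-01'), (4, 2, '2021-08-01');
INSERT INTO relation VALUES (1, 1, 1), (2, 1, 2), (3, 2, 2), (4, 1, 3), (5, 1, 4);
""")

start_limit = "2021-01-01"  # hypothetical stand-in for the_start_limit

# The CASE expression plays the role of filter=Q(b__BDate__gte=...):
# rows outside the date range contribute NULL and are not counted.
rows = conn.execute("""
SELECT c.id,
       COUNT(DISTINCT CASE WHEN b.bdate >= ? THEN r.a_id END) AS distinct_as,
       COUNT(CASE WHEN b.bdate >= ? THEN r.a_id END) AS all_links
FROM category c
LEFT JOIN b ON b.category_id = c.id
LEFT JOIN relation r ON r.b_id = b.id
GROUP BY c.id
ORDER BY c.id
""", (start_limit, start_limit)).fetchall()

print(rows)
```

Which of the two numbers is the "occurrence rate" the question is after depends on whether repeated links of the same A within a Category should count once or several times.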
I am running Django with Postgres and I need to query some records from a table, sort them by rank, and get a unique entry with respect to a foreign key.
Basically my model is something like this:
class BookingCatalog(models.Model):
    .......
    boat = models.ForeignKey(Boat, verbose_name=u"Boat", related_name="booking_catalog")
    is_skippered = models.BooleanField(u'Is Skippered', choices=SKIPPER_CHOICE, default=False)
    rank = models.IntegerField(u"Rank", default=0, db_index=True)
    .......
The idea is to run something like this
BookingCatalog.objects.filter(...).order_by('-rank', 'boat', 'is_skippered').distinct('boat')
Unfortunately, this does not work, since I am using Postgres, which raises this exception:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
What should I do instead?
The fields passed to distinct() have to match the leading fields passed to order_by(). Try using this:
BookingCatalog.objects.filter(...) \
.order_by('boat', '-rank', 'is_skippered') \
.distinct('boat')
The way that I do this is to select the distinct objects first, then use those results to filter another queryset.
# Initial filtering
result = BookingCatalog.objects.filter(...)
# Make the results distinct
result = result.order_by('boat').distinct('boat')
# Extract the pks from the result
result_pks = result.values_list("pk", flat=True)
# Use those result pks to create a new queryset
result_2 = BookingCatalog.objects.filter(pk__in=result_pks)
# Order that queryset
result_2 = result_2.order_by('-rank', 'is_skippered')
print(result_2)
I believe that this results in a single query being executed, which contains a subquery. I would love for someone who knows more about Django to confirm this though.
Ordering by -rank will give you the highest rank of each duplicate, but your overall query results will be ordered by the boat field:
BookingCatalog.objects.filter (...).order_by('boat','-rank','is_skippered').distinct('boat')
For more information, refer to the Django documentation on distinct(), including the notes for Postgres.
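To spell out what the two steps discussed in this thread amount to (keep the best-ranked entry per boat, then re-sort the survivors by rank), here is a small pure-Python sketch. It only emulates the DISTINCT ON semantics on a list of dicts; the sample data and the helper name top_per_boat are made up for the illustration.

```python
from operator import itemgetter

# Hypothetical catalog rows: several entries may share the same boat.
entries = [
    {"pk": 1, "boat": "A", "rank": 3},
    {"pk": 2, "boat": "A", "rank": 9},
    {"pk": 3, "boat": "B", "rank": 5},
    {"pk": 4, "boat": "C", "rank": 7},
    {"pk": 5, "boat": "C", "rank": 1},
]


def top_per_boat(rows):
    """Emulate ORDER BY boat, rank DESC with DISTINCT ON (boat).

    After sorting, the first row seen for each boat is the one kept.
    """
    ordered = sorted(rows, key=lambda r: (r["boat"], -r["rank"]))
    seen, picked = set(), []
    for row in ordered:
        if row["boat"] not in seen:
            seen.add(row["boat"])
            picked.append(row)
    return picked


# Step 1: one row per boat, still ordered by boat.
distinct_rows = top_per_boat(entries)

# Step 2: re-sort the survivors by rank, highest first.
by_rank = sorted(distinct_rows, key=itemgetter("rank"), reverse=True)

print([r["pk"] for r in distinct_rows], [r["pk"] for r in by_rank])
```

In the ORM, step 1 would be the order_by('boat', '-rank').distinct('boat') queryset and step 2 a second queryset filtered with pk__in on the first; ordering the inner queryset only by 'boat' leaves the choice of row per boat up to the database.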
I have a (working) query that looks like
authors = Authors.objects.complicated_queryset()
with_scores = authors.annotate(total_book_score=Sum('books__score'))
It finds all authors who are returned by a complicated_queryset method, and then sums up the total of the scores of their books. However, I wish to amend this QuerySet such that it only includes the scores from the books published in the last year. In pretend syntax:
with_scores = authors.annotate(total_book_score=Sum('books__score'),
filter=Q(books__published=2015))
Is this possible with QuerySets or do I have to write raw SQL (or, I guess, two separate queries) to get that behaviour?
You could try using Case if you're using Django 1.8+
DISCLAIMER: The following code is an approximation. I haven't tested it, so it might not work exactly as written.
# You will need to import:
from django.db.models import Case, F, IntegerField, Sum, When, Value
with_scores = authors.annotate(
    total_book_score=Sum(
        Case(
            When(books__published=2015, then=F('books__score')),
            default=Value(0),
            output_field=IntegerField(),  # Or FloatField if it fits your needs.
        )
    )
)
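For reference, the conditional sum that the Case/When expression is meant to produce looks roughly like this in SQL. The sketch runs on Python's built-in sqlite3, and the author/book tables and their rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER,
                   published INTEGER, score INTEGER);
INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO book VALUES (1, 1, 2015, 4), (2, 1, 2014, 9),
                        (3, 1, 2015, 3), (4, 2, 2013, 8);
""")

# Books outside 2015 contribute 0 to the sum, so every author keeps a row
# and authors without 2015 books end up with a total of 0.
rows = conn.execute("""
SELECT a.name,
       SUM(CASE WHEN b.published = 2015 THEN b.score ELSE 0 END) AS total_book_score
FROM author a
LEFT JOIN book b ON b.author_id = a.id
GROUP BY a.id, a.name
ORDER BY a.id
""").fetchall()

print(rows)
```

On Django 2.0 and later, the same condition can be passed directly as Sum('books__score', filter=Q(books__published=2015)); as far as I know, that variant gives None instead of 0 for authors with no matching books.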