Django conditionally count annotated values in "group-by" statement

Given the following models:
class TheModel(models.Model):
    created_at = models.DateTimeField(default=datetime.now)
class Item(models.Model):
    the_model = models.ForeignKey(TheModel, on_delete=models.CASCADE, related_name='items')
How can I calculate, grouped by day, the number of models and how many of them have more than 2 items?
I tried:
qs = models.TheModel.objects.all()
qs = qs.annotate(contained_items=Count('items'))
result = qs.values('created_at__date').annotate(
    total_count=Count('created_at__date'),
    models_with_contained_items=Count('created_at__date', filter=Q(contained_items__gt=2))
)
But it raises OperationalError: "misuse of aggregate function COUNT()".

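You can do it as follows. The number of items per model is computed in a subquery, because filtering a per-day Count on the per-model Count('items') nests one aggregate inside another, which is what triggers "misuse of aggregate function COUNT()":
from django.db.models import Count, OuterRef, Q, Subquery
from django.db.models.functions import ExtractDay, ExtractMonth, ExtractYear
# Number of items per model, as a correlated subquery
item_count = (
    Item.objects.filter(the_model=OuterRef('pk'))
    .order_by().values('the_model')
    .annotate(n=Count('pk')).values('n')
)
query_set = (
    TheModel.objects
    .annotate(contained_items=Subquery(item_count))
    .annotate(day=ExtractDay('created_at'), month=ExtractMonth('created_at'), year=ExtractYear('created_at'))
    .values('day', 'month', 'year')
    .annotate(
        total_count=Count('pk'),
        models_with_contained_items=Count('pk', filter=Q(contained_items__gt=2)),
    )
    .order_by()
)
Read more about Extract in the Django documentation.
Why is order_by() called at the end? Any ordering still in effect, such as a default ordering on the model, can add its fields to the GROUP BY clause, so the rows would no longer be grouped the way you expect. Calling .order_by() without any parameters tells Django not to apply any ordering, so the rows are grouped only by day, month and year.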
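At the SQL level, the error in the question comes from asking for a per-day COUNT whose condition is itself a per-model COUNT. A minimal sketch of one way around it, using the standard-library sqlite3 module: count the items per model in an inner query first, then aggregate those rows per day. The table names (the_model, item), the helper names and the sample rows are assumptions made for illustration; this is not the SQL Django generates.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE the_model (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            the_model_id INTEGER REFERENCES the_model(id)
        );
    """)
    conn.executemany("INSERT INTO the_model VALUES (?, ?)", [
        (1, "2021-03-01 09:00:00"),
        (2, "2021-03-01 17:30:00"),
        (3, "2021-03-02 08:15:00"),
    ])
    # Model 1 has 3 items, model 2 has 1 item, model 3 has none.
    conn.executemany(
        "INSERT INTO item (the_model_id) VALUES (?)", [(1,), (1,), (1,), (2,)]
    )
    return conn


def per_day_counts(conn):
    """Per day: number of models, and number of models with more than 2 items."""
    sql = """
        SELECT day,
               COUNT(*) AS total_count,
               SUM(CASE WHEN n_items > 2 THEN 1 ELSE 0 END)
                   AS models_with_contained_items
        FROM (
            -- inner query: one row per model with its own item count
            SELECT DATE(m.created_at) AS day, COUNT(i.id) AS n_items
            FROM the_model AS m
            LEFT JOIN item AS i ON i.the_model_id = m.id
            GROUP BY m.id
        ) AS per_model
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(sql).fetchall()


rows = per_day_counts(build_demo_db())
for day, total, with_many in rows:
    print(day, total, with_many)
```

The inner query is the only place where items are counted, so the outer query aggregates plain numbers and never nests one COUNT inside another.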

Related

Django: Filtering a related field by date yields unwanted results

models:
class Vehicle(models.Model):
    licence_plate = models.CharField(max_length=16)
class WorkTime(models.Model):
    work_start = models.DateTimeField()
    work_end = models.DateTimeField()
    # SET_NULL requires the column to be nullable
    vehicle = models.ForeignKey(Vehicle, on_delete=models.SET_NULL, null=True, related_name="work_times")
However, when I try to filter those working times using:
qs = Vehicle.objects.filter(
    work_times__work_start__date__gte="YYYY-MM-DD",
    work_times__work_end__date__lte="YYYY-MM-DD").distinct()
I get results that do not fit the given timeframe. Most commonly, when work_end matches something, it returns everything from WorkTime.
What I would like to have:
for vehicle in qs:
    for work_time in vehicle.work_times.all():
        print(vehicle, work_time.work_start, work_time.work_end)

The filter has no effect on the .work_times of the Vehicles; it only ensures that each Vehicle in qs has at least one WorkTime in the given range.
You can work with a Prefetch object [Django-doc] to filter efficiently on a related manager:
from django.db.models import Prefetch
qs = Vehicle.objects.prefetch_related(
    Prefetch(
        'work_times',
        WorkTime.objects.filter(
            work_start__date__range=('2021-03-01', '2021-03-12')
        ),
        to_attr='filtered_work_times'
    )
)
and then you can work with:
for vehicle in qs:
    for work_time in vehicle.filtered_work_times:
        print(vehicle, work_time.work_start, work_time.work_end)
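To make the two-step behaviour concrete, here is a rough sketch with the standard-library sqlite3 module of what a filtered prefetch amounts to: one query for the vehicles, a second query for only the work times in the range, and a join done in Python. The table names, the function name and the sample rows are assumptions for illustration, and the queries only approximate what Django would run.

```python
import sqlite3
from collections import defaultdict


def vehicles_with_filtered_work_times(conn, start, end):
    """Return [(licence_plate, [(work_start, work_end), ...]), ...]."""
    # Query 1: the vehicles themselves, with no condition on work times.
    vehicles = conn.execute(
        "SELECT id, licence_plate FROM vehicle ORDER BY id"
    ).fetchall()
    if not vehicles:
        return []
    ids = [vehicle_id for vehicle_id, _ in vehicles]
    placeholders = ", ".join("?" for _ in ids)
    # Query 2: only the work times inside the range, for those vehicles.
    rows = conn.execute(
        "SELECT vehicle_id, work_start, work_end FROM work_time "
        f"WHERE vehicle_id IN ({placeholders}) "
        "AND DATE(work_start) BETWEEN ? AND ? "
        "ORDER BY work_start",
        [*ids, start, end],
    ).fetchall()
    # Attach the filtered rows to their vehicle in Python.
    by_vehicle = defaultdict(list)
    for vehicle_id, work_start, work_end in rows:
        by_vehicle[vehicle_id].append((work_start, work_end))
    return [(plate, by_vehicle[vehicle_id]) for vehicle_id, plate in vehicles]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (id INTEGER PRIMARY KEY, licence_plate TEXT);
    CREATE TABLE work_time (
        id INTEGER PRIMARY KEY,
        vehicle_id INTEGER REFERENCES vehicle(id),
        work_start TEXT,
        work_end TEXT
    );
    INSERT INTO vehicle VALUES (1, 'ABC-123'), (2, 'XYZ-789');
    INSERT INTO work_time (vehicle_id, work_start, work_end) VALUES
        (1, '2021-03-05 08:00:00', '2021-03-05 16:00:00'),
        (1, '2021-04-01 08:00:00', '2021-04-01 16:00:00'),
        (2, '2021-02-10 08:00:00', '2021-02-10 16:00:00');
""")
result = vehicles_with_filtered_work_times(conn, "2021-03-01", "2021-03-12")
for plate, work_times in result:
    print(plate, work_times)
```

Note that the second vehicle still appears, with an empty list: the prefetch restricts which work times are attached, not which vehicles are returned. To restrict the vehicles as well, a filter() on the outer queryset is still needed.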

Django: cannot annotate using prefetch calculated attribute

The goal is to sum and annotate working times for each employee over a given time range.
models:
class Employee(models.Model):
    first_name = models.CharField(max_length=64)
class WorkTime(models.Model):
    employee = models.ForeignKey(Employee, on_delete=models.CASCADE, related_name="work_times")
    work_start = models.DateTimeField()
    work_end = models.DateTimeField()
    work_delta = models.IntegerField(default=0)
    def save(self, *args, **kwargs):
        # total_seconds() also counts whole days; .seconds alone would drop them
        self.work_delta = int((self.work_end - self.work_start).total_seconds())
        super().save(*args, **kwargs)
Getting work times for each employee in a given date range:
queryset = Employee.objects.prefetch_related(
    Prefetch(
        'work_times',
        queryset=WorkTime.objects.filter(
            work_start__date__range=("2021-03-01", "2021-03-15")
        ).order_by("work_start"),
        to_attr="filtered_work_times"
    )).all()
Trying to annotate the sum of work_delta to each employee:
queryset.annotate(work_sum=Sum("filtered_work_times__work_delta"))
This causes a FieldError:
Cannot resolve keyword 'filtered_work_times' into field. Choices are: first_name, id, work_times
How would one proceed from here? Using Django 3.1 btw.

You should use filtering on annotations.
I haven't tried, but I think the following code might help you:
from django.db.models import Sum, Q
Employee.objects.annotate(
    work_sum=Sum(
        'work_times__work_delta',
        filter=Q(work_times__work_start__date__range=["2021-03-01", "2021-03-15"])
    )
)

You cannot use prefetch_related values in the query, because prefetching is done separately: Django first fetches the current objects and then makes further queries to fetch the related objects, so the attribute you refer to is not part of the query you want to add it to.
Instead, simply add a filter [Django docs] keyword argument to your aggregation function:
import datetime
from django.db.models import Q, Sum
start_date = datetime.date(2021, 3, 1)
end_date = datetime.date(2021, 3, 15)
result = queryset.annotate(
    work_sum=Sum(
        "work_times__work_delta",
        filter=Q(work_times__work_start__date__range=(start_date, end_date))
    )
)
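A filter on an aggregate is a conditional aggregate: rows outside the condition are skipped by the SUM, but they do not remove the employee from the result. The sketch below shows the idea in plain SQL through the standard-library sqlite3 module, written with CASE WHEN. The table names, the function name and the sample data are assumptions for illustration, not the SQL Django emits.

```python
import sqlite3


def work_sums(conn, start, end):
    """Sum work_delta per employee, counting only work times in the range."""
    sql = """
        SELECT e.first_name,
               SUM(CASE WHEN DATE(w.work_start) BETWEEN ? AND ?
                        THEN w.work_delta END) AS work_sum
        FROM employee AS e
        LEFT JOIN work_time AS w ON w.employee_id = e.id
        GROUP BY e.id
        ORDER BY e.id
    """
    return conn.execute(sql, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE work_time (
        id INTEGER PRIMARY KEY,
        employee_id INTEGER REFERENCES employee(id),
        work_start TEXT,
        work_delta INTEGER
    );
    INSERT INTO employee VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO work_time (employee_id, work_start, work_delta) VALUES
        (1, '2021-03-02 08:00:00', 3600),
        (1, '2021-03-10 08:00:00', 7200),
        (1, '2021-04-01 08:00:00', 1800),
        (2, '2021-02-01 08:00:00', 900);
""")
sums = work_sums(conn, "2021-03-01", "2021-03-15")
for first_name, work_sum in sums:
    print(first_name, work_sum)
```

Employees with no work time in the range get None rather than 0. If a zero is preferred, wrapping the sum in COALESCE (Coalesce in the Django ORM) is one option.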

Django attribute of most recent reverse relation

I have two models:
class Test(models.Model):
    test_id = models.CharField(max_length=20, unique=True, db_index=True)
class TestResult(models.Model):
    test = models.ForeignKey("Test", to_field="test_id", on_delete=models.CASCADE)
    status = models.CharField(max_length=30, choices=status_choices)
with status_choices as an enumeration of tuples of strings.
Some Test objects may have zero related TestResult objects, but most have at least one.
I want to filter Test objects based on their most recent TestResult status.
I have tried this:
queryset = Test.objects.all()
queryset = queryset.annotate(most_recent_result_pk=Max("testresult__pk"))
queryset = queryset.annotate(
    current_status=Subquery(
        TestResult.objects.filter(pk=OuterRef("most_recent_result_pk")).values("status")[:1]
    )
)
But I get the error:
column "u0.status" must appear in the GROUP BY clause or be used in an
aggregate function LINE 1: ...lts_testresult"."id") AS
"most_recent_result_pk", (SELECT U0."status...
I can find the most recent TestResult object fine with the first annotation of the pk, but the second annotation breaks everything. It seems like it ought to be easy to find an attribute of the TestResult object, once its pk is known. How can I do this?

You can do this with one subquery, without annotating the primary key first. Because the foreign key uses to_field="test_id", the subquery is correlated on test_id rather than on the primary key:
from django.db.models import OuterRef, Subquery
queryset = Test.objects.annotate(
    current_status=Subquery(
        TestResult.objects.filter(
            test=OuterRef('test_id')
        ).order_by('-pk').values('status')[:1])
)
This will generate a query that looks like:
SELECT test.*,
       (SELECT U0.status
        FROM testresult U0
        WHERE U0.test_id = test.test_id
        ORDER BY U0.id DESC
        LIMIT 1
       ) AS current_status
FROM test
or without subquery:
from django.db.models import F, Max
queryset = Test.objects.annotate(
    max_testresult=Max('testresult__test__testresult__pk')
).filter(
    testresult__pk=F('max_testresult')
).annotate(
    current_status=F('testresult__status')
)
That being said, ordering by primary key is not a good way to retrieve the latest object. Think of primary keys as black boxes that simply hold a value used to refer to a row.
It is often better to use a column that stores the timestamp:
class TestResult(models.Model):
    test = models.ForeignKey("Test", to_field="test_id", on_delete=models.CASCADE)
    status = models.CharField(max_length=30, choices=status_choices)
    created = models.DateTimeField(auto_now_add=True)
and then query with:
from django.db.models import OuterRef, Subquery
queryset = Test.objects.annotate(
    current_status=Subquery(
        TestResult.objects.filter(
            test=OuterRef('test_id')
        ).order_by('-created').values('status')[:1])
)

Getting distinct objects of a queryset from a reverse relation in Django

class Customer(models.Model):
    name = models.CharField(max_length=189)
class Message(models.Model):
    message = models.TextField()
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE, related_name="messages")
    created_at = models.DateTimeField(auto_now_add=True)
What I want is a queryset of distinct Customers ordered by Message.created_at. My database is MySQL.
I have tried the following:
qs = Customer.objects.all().order_by("-messages__created_at").distinct()
m = Message.objects.all().values("customer").distinct().order_by("-created_at")
m = Message.objects.all().order_by("-created_at").values("customer").distinct()
In the end, I used a set to accomplish this, but I think I might be missing something. My current solution:
customers = set(Message.objects.all().values_list("customer").distinct())
customer_list = list()
for c in customers:
    customer_list.append(c[0])
EDIT
Is it possible to get a list of customers ordered by their last message time, where the queryset also contains the last message value as another field?

Based on your comment, you want to order the customers by their latest message. We can do so by annotating the Customers and then sorting on the annotation:
from django.db.models import Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by("-last_message")
A potential problem is what to do with Customers that have written no message at all. In that case the last_message attribute will be NULL (None in Python). We can specify where these rows go with nulls_first or nulls_last on an F-expression in .order_by(). For example:
from django.db.models import F, Max
Customer.objects.annotate(
    last_message=Max('messages__created_at')
).order_by(F('last_message').desc(nulls_last=True))
A nice bonus is that the Customer objects of this queryset will have an extra attribute: .last_message holds the time at which the customer last wrote a message.
You can also decide to filter those customers out, for example with:
from django.db.models import Max
Customer.objects.filter(
    messages__isnull=False,
).annotate(
    last_message=Max('messages__created_at')
).order_by('-last_message')
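As a rough SQL-level illustration of the annotate-then-order approach, the sketch below uses the standard-library sqlite3 module. The table names, the function name and the sample rows are assumptions, and the "IS NULL" sort key is just one portable way to put customers without messages last; it is not necessarily what Django generates.

```python
import sqlite3


def customers_by_last_message(conn):
    """Customers ordered by latest message, those without messages last."""
    sql = """
        SELECT c.name, MAX(m.created_at) AS last_message
        FROM customer AS c
        LEFT JOIN message AS m ON m.customer_id = c.id
        GROUP BY c.id
        ORDER BY MAX(m.created_at) IS NULL, MAX(m.created_at) DESC
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        created_at TEXT
    );
    INSERT INTO customer VALUES (1, 'Cid'), (2, 'Ann'), (3, 'Ben');
    INSERT INTO message (customer_id, created_at) VALUES
        (2, '2021-03-01 10:00:00'),
        (3, '2021-03-04 09:00:00'),
        (2, '2021-03-05 10:00:00');
""")
ordered = customers_by_last_message(conn)
for name, last_message in ordered:
    print(name, last_message)
```

Because the rows are grouped per customer, each customer appears once, so no DISTINCT is needed.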

Django query with order_by, distinct and limit on Postgresql

I have the following:
class Product(models.Model):
    name = models.CharField(max_length=255)
class Action(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    created_at = models.DateTimeField(auto_now_add=True)
I would like to retrieve the 10 most recent actions ordered by created_at DESC with distinct products.
The following is close to the result but still misses the ordering:
Action.objects.all().order_by('product_id').distinct('product_id')[:10]

Your solution seems like it's trying to do too much. It will also result in 2 separate SQL queries. This would work fine and with only a single query:
action_ids = Action.objects.order_by('product_id', '-created_at')\
    .distinct('product_id').values_list('id', flat=True)
result = Action.objects.filter(id__in=action_ids)\
    .order_by('-created_at')[:10]
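For checking results, the intended semantics (the most recent action of each product, newest first, cut to a limit) can be written down in plain Python. This is only a reference sketch over in-memory data: the Action namedtuple is a stand-in for the model, and latest_distinct_actions is a hypothetical helper, not something to run against a large table.

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for the Action model: only the fields the logic needs.
Action = namedtuple("Action", "id product_id created_at")


def latest_distinct_actions(actions, limit=10):
    """Most recent action per product, newest first, at most `limit` rows."""
    seen_products = set()
    result = []
    for action in sorted(actions, key=lambda a: a.created_at, reverse=True):
        if action.product_id in seen_products:
            continue  # an action for this product was already kept
        seen_products.add(action.product_id)
        result.append(action)
        if len(result) == limit:
            break
    return result


actions = [
    Action(1, 10, datetime(2021, 3, 1)),
    Action(2, 20, datetime(2021, 3, 2)),
    Action(3, 10, datetime(2021, 3, 3)),
    Action(4, 30, datetime(2021, 3, 4)),
]
top_two = latest_distinct_actions(actions, limit=2)
print([a.id for a in top_two])
```

Product 10 has two actions, and only the newer one (id 3) survives; the older one (id 1) never appears, whatever the limit.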

EDIT: this solution works but Ross Lote's is cleaner
This is the way I finally did it, using Django Aggregation:
from django.db.models import Max
actions_id = Action.objects.all().values('product_id') \
    .annotate(action_id=Max('id')) \
    .order_by('-action_id')[:10] \
    .values_list('action_id', flat=True)
result = Action.objects.filter(id__in=actions_id).order_by('-created_at')
By setting values('product_id') we do a group by on product_id.
With annotate() we can use order_by only on fields used in values() or annotate(). Since the created_at field of each action is automatically set to now, ordering on created_at is the same as ordering on id, so annotate(action_id=Max('id')).order_by('-action_id') is the right way.
Finally, we just need to slice our query with [:10].
Hope this helps.
