Django annotate and values(): extra field in 'group by' causes unexpected results

I must be missing something obvious, as the behavior is not as expected for this simple requirement. Here is my model class:
class Encounter(models.Model):
    activity_type = models.CharField(max_length=2,
        choices=(('ip', 'ip'), ('op', 'op'), ('ae', 'ae')))
    cost = models.DecimalField(max_digits=8, decimal_places=2)
I want to find the total cost for each activity type. My query is:
>>> Encounter.objects.values('activity_type').annotate(Sum('cost'))
Which yields:
[{'cost__sum': Decimal("140.00"), 'activity_type': u'ip'},
 {'cost__sum': Decimal("100.00"), 'activity_type': u'op'},
 {'cost__sum': Decimal("0.00"), 'activity_type': u'ip'}]
The result set contains two 'ip' rows. This is because the query groups by activity_type AND cost rather than by activity_type alone, which is not the intended result. The generated SQL for this query is:
SELECT "encounter_encounter"."activity_type",
       SUM("encounter_encounter"."total_cost") AS "total_cost__sum"
FROM "encounter_encounter"
GROUP BY "encounter_encounter"."activity_type",
         "encounter_encounter"."total_cost"  -- <<<< THIS MESSES THINGS UP
ORDER BY "encounter_encounter"."total_cost" DESC
How can I make this query work as expected (and as implied by the docs if I am not getting it wrong) and make it only do a group by on activity_type?

As @Skirmantas correctly pointed out, the problem was related to order_by. Although no ordering is stated explicitly in the query, the default ordering from the model's Meta class is added to it, and the ordering column is then added to the GROUP BY clause because SQL requires it.
The solution is either to remove the default ordering or add an empty order_by() to reset ordering:
>>> Encounter.objects.values('activity_type').annotate(Sum('cost')).order_by()
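The effect of the extra GROUP BY column can be reproduced at the SQL level without Django. The following is a minimal sketch using Python's built-in sqlite3 module; the table name encounter and the three sample rows are assumptions chosen to mirror the output in the question, not the poster's real data.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Encounter model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (activity_type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO encounter VALUES (?, ?)",
    [("ip", 140), ("op", 100), ("ip", 0)],
)

# Shape of the query when a default ordering on cost is in effect:
# the ordering column is also pulled into GROUP BY.
with_ordering = conn.execute(
    "SELECT activity_type, SUM(cost) FROM encounter "
    "GROUP BY activity_type, cost ORDER BY cost DESC"
).fetchall()

# Shape of the query once the ordering no longer contributes a column.
# ORDER BY activity_type is added here only to make the output deterministic.
without_ordering = conn.execute(
    "SELECT activity_type, SUM(cost) FROM encounter "
    "GROUP BY activity_type ORDER BY activity_type"
).fetchall()

print(with_ordering)     # [('ip', 140), ('op', 100), ('ip', 0)]
print(without_ordering)  # [('ip', 140), ('op', 100)]
```

With the cost column in GROUP BY, each distinct (activity_type, cost) pair becomes its own group, which is why 'ip' shows up twice.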

Related

Filter with order for two fields

I have Model Klass with fields like this:
date_start = models.DateField(null=True, blank=True, default=date.today)
date_finish = models.DateField(null=True, blank=True)
As you see, date_start will be usually filled but date_finish may not. If neither one is filled we should not consider this record in further filtering.
I would like to see objects (first N results) ordered by latest date regardless if that's date_start or date_finish. To be exact: considered date shall be date_finish if exists, date_start otherwise.
Please note, that I don't want N/2 finished items and N/2 only started items concatenated but recent N "touched" ones.
My first idea is to provide this extra field of considered_date that would be filled as I proposed but I don't know how to implement this. Shall it be done:
on Model level, so a new DateField is added and its content is always filled with something
selected for 2 separate conditions (N elements each), then given a temporary extra field (without saving it to the db), then the 2 sets joined and ordered again by this new field
Fun fact: I also have BooleanField that indicates if period is closed or not. I needed it for simplicity and filtering but we could use this here as well. It's obviously handled by save() function (default True, set to False if date_finish gets filled).
To be honest this "feature" is not critical in my app. It's just displaying some "latest changes" on welcome page so can be triggered quite often.
Your description of considered_date looks like a perfect use case for the SQL COALESCE function, which returns the first non-NULL value from its arguments.
So, in plain SQL, this returns the value of date_finish if it is not NULL, or date_start otherwise (assuming date_start is never NULL, as it has a default value):
COALESCE(your_table_name.date_finish, your_table_name.date_start);
Django ORM has an API for this function. Using it with annotate queryset method, we can build the query you want without creating extra fields on the model.
Let's call your model TestModel just for convenience
from django.db.models.functions import Coalesce
TestModel.objects.annotate(
    latest_touched_date=Coalesce("date_finish", "date_start")
).order_by("-latest_touched_date")
Please note that this gives the latest date only if date_finish is never earlier than date_start (which I think is true).
You can add exclude() to filter out any records where both date_finish and date_start are NULL:
from django.db.models.functions import Coalesce
TestModel.objects.exclude(date_start__isnull=True, date_finish__isnull=True).annotate(
    latest_touched_date=Coalesce("date_finish", "date_start")
).order_by("-latest_touched_date")
Just slice the queryset to get first N results.
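To illustrate what the database does with such a query, here is a small sketch in plain SQL run through Python's sqlite3 module. The table name klass, the name column, and the sample rows are made up for the illustration; LIMIT 2 plays the role of slicing the queryset to the first N results.

```python
import sqlite3

# Hypothetical table with the two date columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE klass (name TEXT, date_start TEXT, date_finish TEXT)")
conn.executemany(
    "INSERT INTO klass VALUES (?, ?, ?)",
    [
        ("a", "2020-01-01", "2020-03-01"),  # finished: date_finish is used
        ("b", "2020-02-15", None),          # only started: date_start is used
        ("c", None, None),                  # neither date set: filtered out
        ("d", "2020-01-10", "2020-01-20"),
    ],
)

# COALESCE picks date_finish when present and falls back to date_start.
# ISO-formatted date strings sort chronologically, so ORDER BY works on them.
rows = conn.execute(
    "SELECT name, COALESCE(date_finish, date_start) AS latest_touched_date "
    "FROM klass "
    "WHERE NOT (date_start IS NULL AND date_finish IS NULL) "
    "ORDER BY latest_touched_date DESC "
    "LIMIT 2"
).fetchall()

print(rows)  # [('a', '2020-03-01'), ('b', '2020-02-15')]
```

Finished and merely started records are ranked together by a single computed date, so the result is the N most recently touched rows rather than N/2 of each kind.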

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
    location = models.CharField(max_length=10)
    type = models.CharField(max_length=10)
    date = models.DateTimeField()
    attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is what I actually need.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
This assumes that every event is unique: it seems highly unlikely that you would have two events on the same date, in the same location, of the same type. With that assumption, let's begin. (As a formatting note, class names tend to start with a capital letter to differentiate classes from variables or instances.)
from django.db.models import Max
# First, get the latest date for each location/type combination.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty list to store the objects you want.
results_list = []
# Then iterate through 'results', looking up the matching object
# for each group and adding it to the list.
for r in results:
    result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
    results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to check the exact output of Max('date'), but this should get you on the right path.
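The loop above runs one extra query per group. As a database-agnostic illustration of the same idea in a single statement, the grouped result can be joined back to the table. This is only a sketch in plain SQL via Python's sqlite3 module, not a Django ORM recipe; the table name event mirrors the model and the sample rows are invented.

```python
import sqlite3

# Hypothetical table mirroring the fields of the example model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (location TEXT, type TEXT, date TEXT, attendance INTEGER)"
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        ("hall", "talk", "2021-01-01", 10),
        ("hall", "talk", "2021-02-01", 25),
        ("hall", "demo", "2021-01-15", 7),
        ("park", "talk", "2021-03-01", 40),
        ("park", "talk", "2021-01-05", 12),
    ],
)

# The inner query finds the latest date per (location, type) group;
# the join then recovers the full row, including attendance.
latest_rows = conn.execute(
    """
    SELECT e.location, e.type, e.date, e.attendance
    FROM event AS e
    JOIN (
        SELECT location, type, MAX(date) AS latest_date
        FROM event
        GROUP BY location, type
    ) AS g
      ON e.location = g.location
     AND e.type = g.type
     AND e.date = g.latest_date
    ORDER BY e.location, e.type
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Like the loop, this relies on the assumption that no two events share the same location, type, and date; otherwise a group can return more than one row.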

distinct() is not working

I'm trying to group duplicate values, but it's not working. I've googled this many times, and the results all point to the distinct() function, but no matter what I do it doesn't work. I have seen distinct() work in other people's queries, but now that I'm using it myself, it doesn't.
Here is my code:
models.py
class Transaction(models.Model):
    payee = models.CharField(
        max_length=255
    )
views.py
transactions = Transaction.objects.values_list('payee', flat=True).distinct()
output:
[u'YOUR LOCAL SUPERMARKET',
u'CITY OF SPRINGFIELD',
u'SPRINGFIELD WATER UTILITY',
u'DEPOSIT',
u'DEPOSIT']
Notice that DEPOSIT appears twice in the output.
When an ordering is defined, distinct() takes the ordering fields into account when building the SQL, and can therefore return unexpected results.
You can therefore:
either skip the ordering,
call an empty order_by() in your query,
or specify which fields you want distinct() to apply to.
So in your case the query would be
Transaction.objects.order_by('payee').distinct('payee')
This replaces any ordering you might have, and it also makes it clearer what is happening, but it comes at the cost of only being available on PostgreSQL.
Read more here in the docs
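The interaction between ordering and DISTINCT can be seen directly in SQL. Below is a minimal sketch using Python's sqlite3 module. The table name txn and the date column are assumptions: the question does not show which field the model is ordered by, so date stands in for it.

```python
import sqlite3

# Hypothetical table: payee plus an assumed ordering column called date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (payee TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?)",
    [
        ("DEPOSIT", "2020-01-01"),
        ("CITY OF SPRINGFIELD", "2020-01-02"),
        ("DEPOSIT", "2020-01-03"),
    ],
)

# With an ordering on date, the ordering column joins the SELECT list,
# so DISTINCT compares (payee, date) pairs and both deposits survive.
with_ordering = [
    row[0]
    for row in conn.execute("SELECT DISTINCT payee, date FROM txn ORDER BY date")
]

# With no extra ordering column, DISTINCT sees only payee.
without_ordering = [
    row[0]
    for row in conn.execute("SELECT DISTINCT payee FROM txn ORDER BY payee")
]

print(with_ordering)     # ['DEPOSIT', 'CITY OF SPRINGFIELD', 'DEPOSIT']
print(without_ordering)  # ['CITY OF SPRINGFIELD', 'DEPOSIT']
```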

What is the internal function in django to add new tables to a queryset in a sensible way?

In django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
    name = models.CharField(max_length=200)
    urlname = models.CharField(max_length=200)
    entrydate = models.DateTimeField(auto_now_add=True)
class Reading(models.Model):
    book = models.ForeignKey(Book, related_name='readings')
    user = models.ForeignKey(User)
    rating = models.IntegerField()
    entrydate = models.DateTimeField(auto_now_add=True)
readings = Reading.objects.filter(user=user).order_by('entrydate')
friendrating = '(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))' % friend.id
readings = readings.extra(select={'friendrating': friendrating})
At the moment, readings won't work because the join to proj_book is not set up. However, if I add an order_by such as:
.order_by('entrydate', 'book__entrydate')
Django magically knows to add an inner join through the foreign key and I get what I want.
additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.
Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The database has no way to know what proj_book refers to, since that table is not included in the query (neither in the FROM clause nor through a join).
You get the expected results when you add order_by, because that order_by adds an inner join between proj_book and proj_reading.
As far as I understand, if you reference any other column of Book, not just in order_by, you will get similar results.
Q1 = Reading.objects.filter(user=user).exclude(book__name='')  # exclude() forces Django to add the JOIN
Q2 = "select rating from proj_reading where user_id=%d" % user.id
Result = Q1.extra(select={"foo": Q2})
This way, at step Q1, you force Django to join the Book table, which it does not do by default unless you access a field of Book.
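The failure and the fix can be reproduced outside Django. The following is a rough sketch with Python's sqlite3 module, using the table names from the question with made-up rows and a shortened subquery. It shows that a subquery referencing proj_book fails while that table is absent from the outer query, and works once a join brings it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE proj_book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE proj_reading (
        id INTEGER PRIMARY KEY, book_id INTEGER, user_id INTEGER, rating INTEGER
    );
    INSERT INTO proj_book VALUES (1, 'some book');
    INSERT INTO proj_reading VALUES (1, 1, 1, 4);  -- the user's own reading
    INSERT INTO proj_reading VALUES (2, 1, 2, 5);  -- the friend's reading
    """
)

# Correlated subquery that refers to proj_book in the outer query.
subquery = (
    "(SELECT rating FROM proj_reading "
    "WHERE user_id = 2 AND book_id = proj_book.id)"
)

# Without a join, proj_book is unknown to the outer query and the statement fails.
try:
    conn.execute(
        f"SELECT {subquery} AS friendrating, r.id "
        "FROM proj_reading AS r WHERE r.user_id = 1"
    )
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# With the join in place, the same subquery can resolve proj_book.id.
joined = conn.execute(
    f"SELECT {subquery} AS friendrating, r.id "
    "FROM proj_reading AS r "
    "JOIN proj_book ON proj_book.id = r.book_id "
    "WHERE r.user_id = 1"
).fetchall()

print(error)
print(joined)  # [(5, 1)]
```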
You mean:
class SomeModel(models.Model):
    id = models.IntegerField()
    ...
class SomeOtherModel(models.Model):
    otherfield = models.ForeignKey(SomeModel)
qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It won't work because you do not define the table join correctly.
myrating = '(select rating from proj_reading inner join proj_book on (proj_book.id=proj_reading.book_id) where proj_reading.user_id=%d and rating in (1,2,3,4,5,6))' % user.id
This is pseudocode and it is not tested.
But I advise you to use Django filters instead of writing SQL queries.
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
See the documentation for more details.

chain filter and exclude on django model with field lookups that span relationships

I have the following models:
class Order_type(models.Model):
    description = models.CharField()
class Order(models.Model):
    type = models.ForeignKey(Order_type)
    order_date = models.DateField(default=datetime.date.today)
    status = models.CharField()
    processed_time = models.TimeField()
I want a list of the order types that have orders that meet these criteria: (order_date <= today AND processed_time is empty AND status is not blank)
I tried:
qs = Order_type.objects.filter(order__order_date__lte=datetime.date.today(),
    order__processed_time__isnull=True).exclude(order__status='')
This works for the original list of orders:
orders_qs = Order.objects.filter(order_date__lte=datetime.date.today(), processed_time__isnull=True)
orders_qs = orders_qs.exclude(status='')
But qs isn't the right queryset. I think it's actually returning a more narrowly filtered result (no records come back), but I'm not sure why. According to the Django reference, because I'm referencing a related model, I think the exclude works on the original queryset (not the one from the filter), but I don't understand exactly how.
OK, I just thought of this, which I think works, but feels sloppy (Is there a better way?):
qs = Order_type.objects.filter(order__id__in=[o.id for o in orders_qs])
What's happening is that the exclude() query is messing things up for you. Basically, it's excluding any Order_type that has at least one Order without a status, which is almost certainly not what you want to happen.
The simplest solution in your case is to use order__status__gt='' in your filter() arguments. However, you will also need to append distinct() to the end of your query, because otherwise you would get a QuerySet with multiple instances of the same Order_type if it has more than one Order that matches the query. This should work:
qs = Order_type.objects.filter(
    order__order_date__lte=datetime.date.today(),
    order__processed_time__isnull=True,
    order__status__gt='').distinct()
On a side note, in the qs query you gave at the end of the question, you don't have to say order__id__in=[o.id for o in orders_qs], you can simply use order__in=orders_qs (you still also need the distinct()). So this will also work:
qs = Order_type.objects.filter(order__in=Order.objects.filter(
    order_date__lte=datetime.date.today(),
    processed_time__isnull=True).exclude(status='')).distinct()
Addendum (edit):
Here's the actual SQL that Django issues for the above querysets:
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
LEFT OUTER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE ("testapp_order"."order_date" <= E'2010-07-18'
AND "testapp_order"."processed_time" IS NULL
AND "testapp_order"."status" > E'' );
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
INNER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE "testapp_order"."id" IN
(SELECT U0."id" FROM "testapp_order" U0
WHERE (U0."order_date" <= E'2010-07-18'
AND U0."processed_time" IS NULL
AND NOT (U0."status" = E'' )));
EXPLAIN reveals that the second query is ever so slightly more expensive (cost of 28.99 versus 28.64 with a very small dataset).
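To see why the chained exclude() from the question drops too much, the two semantics can be compared on a tiny dataset. This is a sketch in plain SQL via Python's sqlite3 module, not the SQL Django actually emits: the NOT IN form only approximates the "exclude any Order_type that has at least one blank-status Order" behaviour described above. Table names, the sample rows, and the omission of the order_date condition are all simplifications made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_type (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, type_id INTEGER, status TEXT, processed_time TEXT
    );
    INSERT INTO order_type VALUES (1, 'retail');
    INSERT INTO order_type VALUES (2, 'wholesale');
    INSERT INTO orders VALUES (1, 1, 'open', NULL);  -- retail, qualifies
    INSERT INTO orders VALUES (2, 1, '', NULL);      -- retail, blank status
    INSERT INTO orders VALUES (3, 2, 'open', NULL);  -- wholesale, qualifies
    """
)

# Approximation of filter(...).exclude(order__status=''): any type that owns
# at least one blank-status order is removed entirely.
chained_exclude = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT t.description FROM order_type AS t "
        "JOIN orders AS o ON o.type_id = t.id "
        "WHERE o.processed_time IS NULL "
        "AND t.id NOT IN (SELECT type_id FROM orders WHERE status = '') "
        "ORDER BY t.description"
    )
]

# All conditions applied to the same joined order row, as in the
# single filter() call with order__status__gt=''.
single_filter = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT t.description FROM order_type AS t "
        "JOIN orders AS o ON o.type_id = t.id "
        "WHERE o.processed_time IS NULL AND o.status > '' "
        "ORDER BY t.description"
    )
]

print(chained_exclude)  # ['wholesale']
print(single_filter)    # ['retail', 'wholesale']
```

The 'retail' type has one qualifying order and one blank-status order, so it survives only when every condition is tested against the same order row.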