Django aggregate aggregated fields?

I have a model called Item, with m2m relation to User ("owner").
For each item, I need to count the users who own it. That's easy enough with annotate().
But then I need to calculate the ratio between owners of a specific gender and the total owner count for each item. For example, if 2 males own the item out of 5 users, the ratio is 0.4.
What's the best way to do that?

To do this with the ORM, you need conditional aggregates, which Django did not support before version 1.8. http://www.voteruniverse.com/Members/jlantz/blog/conditional-aggregates-in-django proposes a hacky solution that might work.
If you don't need to sort by the ratio, then you can make two calls to annotate, and then compute the ratio in Python. Something like:
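# CountIf stands for whatever conditional aggregate you end up using
items = Item.objects.annotate(ucount=Count('users')).annotate(ccount=CountIf(<condition>))
for item in items:
    # matching owners divided by all owners; float() avoids integer division
    item.ratio = float(item.ccount) / item.ucount if item.ucount else 0.0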
If you don't want to do that, I'd recommend using the extra() method and some custom SQL to get the extra info you want. That method is documented on the Django QuerySet API documentation page.
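The Python-side ratio step can be tried without a database. This is only a minimal sketch, assuming each item already carries the two counts that annotate() would attach; `AnnotatedItem` and `owner_ratio` are hypothetical names used for illustration and are not part of Django.

```python
from collections import namedtuple

# Hypothetical stand-in for an Item that already carries the two annotated counts.
AnnotatedItem = namedtuple("AnnotatedItem", ["name", "ucount", "ccount"])


def owner_ratio(item):
    """Return matching owners / all owners, or 0.0 for an item nobody owns."""
    if item.ucount == 0:
        return 0.0
    return item.ccount / float(item.ucount)


items = [
    AnnotatedItem("book", ucount=5, ccount=2),  # 2 of 5 owners match
    AnnotatedItem("lamp", ucount=0, ccount=0),  # no owners at all
]
ratios = {item.name: owner_ratio(item) for item in items}
```

The zero check matters because an item with no owners would otherwise raise ZeroDivisionError.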

Just off the top of my head, something like the following could work. Iterate on it to get your perfect solution if you wish:
items = Item.objects.annotate(Count('users'))
for item in items:
    total = item.users__count
    # note: each filter().count() below is one extra query per item
    num_males = item.users.filter(gender='M').count()
    num_females = item.users.filter(gender='F').count()

Related

Django Aggregate with several models

I have these models:
class Package(models.Model):
    title = CharField(...)

class Item(models.Model):
    package = ForeignKey(Package)
    price = FloatField(...)

class UserItem(models.Model):
    user = ForeignKey(User)
    item = ForeignKey(Item)
    purchased = BooleanField()
I am trying to achieve two things with the best performance possible:
In my template I would like to calculate, for each package, the sum of the prices of all its items. (Aggregate, I assume?)
More complicated: for each user, I want to sum the price of all items purchased, i.e. those with purchased = True.
Assume I have 10 items in one package, each costing $10; the package sum should be $100. If the user purchased 5 of those items, the second sum should be $50.
I can easily do simple queries with template tags, but I believe it can be done better? (Hopefully.)
To total the price for a specific package a_package you can use this code
Item.objects.filter(package=a_package).aggregate(Sum('price'))
There is a guide on how to do these kinds of queries, and the aggregate documentation describes all the different functions.
This kind of query can also solve your second problem. The price lives on Item, so follow the relation from UserItem:
UserItem.objects.filter(user=a_user, purchased=True).aggregate(Sum('item__price'))
You can also use annotate() to attach the total to each object; see the first link above.
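To make the two totals concrete, here is the arithmetic from the question (10 items at $10, 5 of them purchased) in plain Python. It is a sketch only: the namedtuples are hypothetical stand-ins for the model rows, and the generator expressions mimic what the filter() and Sum() calls compute in the database.

```python
from collections import namedtuple

# Hypothetical plain-Python stand-ins for Item and UserItem rows.
Item = namedtuple("Item", ["package", "price"])
UserItem = namedtuple("UserItem", ["user", "item", "purchased"])

# Ten items at 10.0 each in one package; the user has purchased the first five.
items = [Item(package="starter", price=10.0) for _ in range(10)]
user_items = [
    UserItem("alice", item, purchased=(i < 5)) for i, item in enumerate(items)
]

# Counterpart of filtering Item by package and summing 'price'.
package_total = sum(item.price for item in items if item.package == "starter")

# Counterpart of filtering UserItem by user and purchased, then summing
# the price of the related item.
purchased_total = sum(
    ui.item.price for ui in user_items if ui.user == "alice" and ui.purchased
)
```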
The most elegant way in my opinion would be to define a method total on the Model class and decorate it as a property. This will return the total (using Django ORM's Sum aggregate) for either Package or User.
Example for class Package:
from django.db.models import Sum
...
class Package(models.Model):
    ...
    @property
    def total(self):
        # aggregate() returns a dict such as {'price__sum': 100.0}
        return self.item_set.aggregate(Sum('price'))['price__sum']
In your template code you would use total as any other model attribute. E.g.:
{{ package_instance.total }}
@Vic Smith got the solution.
But I would add a price attribute on the Package model if you want "the best performance possible".
You would connect a post_save signal handler to Item and, if the item was just created, update the related Package object.
This way you can get the package price very quickly, and even do quick sorting, comparing, etc.
Also, I don't really get the purpose of the purchased attribute. You probably want a ManyToMany relationship between Item and User, with UserItem defined as the connecting model via the through parameter.
Anyway, my experience is that you usually want a relationship between Item and a Purchase object, which is linked to User, rather than a direct link (unless you start to get performance issues...). Having Purchase as a record of the event "the user bought this and that" makes things easier to handle.

Custom ordering by calculation in Django

Any solutions for custom calculation sorting in Django? I want to create a view that shows the Top Posts in my Blog. The ranking will be calculated by Post's attributes. Let's just say I have 3 IntegerFields called x, y, and z, and the ranking calculation will be x * y / z.
Any ideas? I would like to do Top Post ever, and also other variations filtered by time such as last 24 hours, 7 days, 1 month, etc.
Thanks!
You can use extra() to retrieve extra calculated column(s) and sort by them:
# some_date is a placeholder for your cutoff date
posts = (MyModel.objects.filter(post_date__lt=some_date)
         .extra(select={'custom_order': "x*y/z"})
         .order_by('custom_order'))
The problem with this approach is that you're writing SQL, so it is not always portable across databases (although, for the example you supplied, this problem is avoided because it's a simple calculation).
Otherwise, you can do the sorting in pure Python:
sorted_models = sorted(MyModel.objects.filter(post_date__lt=some_date),
                       key=lambda my_model: my_model.x * my_model.y / my_model.z)
The extra() queryset method should allow you to do this. See the docs
As you can't order querysets by methods and properties in Django, you have to do the sorting in Python.
Consider turning your calculated field into a property on your model and then you can do this in your view:
sorted_posts = sorted(Post.objects.all(), key=lambda post: post.calculated_field )
Finally you can pass sorted_posts to your list-template.
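A minimal sketch of the Python-side sort, including the time-window variants from the question. It assumes plain objects in place of model instances; `Post`, `rank`, and `top_posts` are hypothetical names, and treating z == 0 as rank 0 is an assumption made here to avoid a division error.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in for a blog post model instance.
Post = namedtuple("Post", ["title", "x", "y", "z", "post_date"])


def rank(post):
    """Ranking x * y / z; a post with z == 0 is ranked 0.0 (an assumption)."""
    return post.x * post.y / float(post.z) if post.z else 0.0


def top_posts(posts, since=None):
    """Return posts sorted by rank, highest first, optionally only those from `since` on."""
    recent = [p for p in posts if since is None or p.post_date >= since]
    return sorted(recent, key=rank, reverse=True)


now = datetime(2024, 1, 10, 12, 0)
posts = [
    Post("old hit", 9, 9, 1, now - timedelta(days=30)),  # rank 81.0
    Post("fresh", 4, 3, 2, now - timedelta(hours=2)),    # rank 6.0
    Post("recent", 5, 4, 1, now - timedelta(days=3)),    # rank 20.0
]

all_time = [p.title for p in top_posts(posts)]
last_week = [p.title for p in top_posts(posts, since=now - timedelta(days=7))]
last_day = [p.title for p in top_posts(posts, since=now - timedelta(hours=24))]
```

Note the reverse=True: a "top posts" list wants the highest rank first, whereas a plain sorted() call orders ascending.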

Django annotation with nested filter

Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through documentation filters can only be before or after the annotation which would not yield the desired results.
A workaround is this
for student in Student.objects.all():
    student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but issues one query per student; in a real application this can get impractically slow. I think this type of statement is possible in SQL, but I would prefer to stay with the ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combining them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue, I suggest the for-loop workaround. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
from django.db import models

students = Student.objects.all().annotate(num_excused_absences=models.Sum(
    models.Case(
        # 'attendance' is the relation name used in the question
        models.When(attendance__type='Excused', then=1),
        default=0,
        output_field=models.IntegerField()
    )))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
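For readers who want to see the idea behind a conditional sum, the sketch below runs the corresponding kind of SQL against an in-memory SQLite database. The table and column names are assumptions chosen to resemble the question's models, and the query is an illustration of the technique rather than the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES student(id),
        type TEXT
    );
    INSERT INTO student (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO attendance (student_id, type) VALUES
        (1, 'Excused'), (1, 'Excused'), (1, 'Unexcused'),
        (2, 'Unexcused');
""")

# Conditional sum: every joined row contributes 1 if it is excused, else 0.
# The LEFT JOIN keeps students that have no attendance rows at all.
rows = conn.execute("""
    SELECT s.name,
           SUM(CASE WHEN a.type = 'Excused' THEN 1 ELSE 0 END) AS num_excused
    FROM student AS s
    LEFT JOIN attendance AS a ON a.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
""").fetchall()
```

Unlike filtering before annotating, this keeps students with zero excused absences in the result, each with a count of 0.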
You are correct: Django does not allow you to filter the related objects being counted without also applying the filter to the primary objects, and therefore excluding those primary objects with no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
from itertools import groupby
from django.db.models import Count

# a query for students
students = Student.objects.all()
# a query to count the student attendances, grouped by type
# (ordered by student, because groupby only merges adjacent rows)
attendance_counts = (Attendance.objects.filter(student__in=students)
                     .values('student', 'type')
                     .annotate(abs=Count('pk'))
                     .order_by('student'))
# regroup that into a dictionary {student -> {type -> count}}
attendance_s_t = dict(
    (s, dict((row['type'], row['abs']) for row in g))
    for s, g in groupby(attendance_counts, lambda row: row['student']))
# then use them efficiently:
for student in students:
    student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
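The regrouping step is the fiddly part, so here it is in isolation. This sketch assumes the grouped query yields one dict per (student, type) pair, which is the shape a values() query followed by annotate() produces; the sample rows themselves are made up.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows in the shape of a values('student', 'type') query with a count.
attendance_counts = [
    {"student": 2, "type": "Excused", "abs": 1},
    {"student": 1, "type": "Excused", "abs": 3},
    {"student": 1, "type": "Unexcused", "abs": 2},
]

# groupby only merges adjacent rows, so sort by the grouping key first.
ordered = sorted(attendance_counts, key=itemgetter("student"))

# Build {student -> {type -> count}}.
attendance_s_t = {
    student: {row["type"]: row["abs"] for row in group}
    for student, group in groupby(ordered, key=itemgetter("student"))
}

# Students missing from the dictionary (here pk 3) fall back to 0.
excused = [attendance_s_t.get(pk, {}).get("Excused", 0) for pk in (1, 2, 3)]
```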
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.

Django query to select parent with nonzero children

I have model with a Foreign Key to itself like this:
class Concept(models.Model):
    name = models.CharField(max_length=200)
    category = models.ForeignKey('self')
But I can't figure out how to select all concepts that have a nonzero number of children. Is this possible with the Django QuerySet API, or must I write custom SQL?
If I understand it correctly, each Concept may have another Concept as parent, and this is set into the category field.
In other words, a Concept with at least a child will be referenced at least once in the category field.
Generally speaking, this is not really easy to get in Django; however if you do not have too many categories, you can think for a query of the like of SELECT * FROM CONCEPTS WHERE CONCEPTS.ID IN (SELECT CATEGORY FROM CONCEPTS); - and this is something you can map easily with Django:
Concept.objects.filter(pk__in=Concept.objects.all().values('category'))
Note that, as stated in the Django documentation, this nested query may have performance issues on certain databases; in that case you can evaluate the inner query into a list first:
Concept.objects.filter(id__in=list(Concept.objects.values_list('category', flat=True)))
But please be aware that this could hit a database limitation; for instance, Oracle allows at most 1000 elements in such a list.
How about something like this:
concepts = Concept.objects.exclude(category=None)
The way you have it written there will require a value for category. Once you have fixed that (with null=True in the field constructor), use this:
Concept.objects.filter(category__isnull=False)
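The answers above select different rows: the IN subquery finds concepts that are referenced as someone's category (they have children), while filtering on a concept's own category field finds concepts that have a parent. The sketch below shows the difference on an in-memory SQLite table; the table layout and sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES concept(id)
    );
    INSERT INTO concept (id, name, category_id) VALUES
        (1, 'animal', NULL),
        (2, 'dog', 1),
        (3, 'cat', 1),
        (4, 'poodle', 2);
""")

# Concepts with at least one child: their id appears in some row's category_id.
has_children = [name for (name,) in conn.execute("""
    SELECT name FROM concept
    WHERE id IN (SELECT category_id FROM concept WHERE category_id IS NOT NULL)
    ORDER BY id
""")]

# Concepts with a parent: their own category_id is set.
has_parent = [name for (name,) in conn.execute("""
    SELECT name FROM concept
    WHERE category_id IS NOT NULL
    ORDER BY id
""")]
```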

How do I get the related objects In an extra().values() call in Django?

Thanks to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
    country = Country.objects.get(name=country_name)
    coin_values = Collectible.objects.filter(country=country.id, type=1).extra(
        select={'count': 'count(1)'},
        order_by=['-count']).values('count', 'face_value', 'currency_type')
    coin_values.query.group_by = ['currency_type_id', 'face_value']
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(
    select={'count': 'count(1)'},
    order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
def coins_by_country(request, country_name):
    country = get_object_or_404(Country, name=country_name)
    cursor = connection.cursor()
    cursor.execute(
        'SELECT count(*), face_value, collection_currency.name '
        'FROM collection_collectible, collection_currency '
        'WHERE collection_collectible.currency_type_id = collection_currency.id '
        'AND country_id=%s AND type=1 '
        'GROUP BY face_value, collection_currency.name',
        [country.id])
    coin_values = cursor.fetchall()
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
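The raw query can be tried outside Django with an in-memory SQLite database. This is a sketch under assumptions: the table names come from the SQL above, but the column types and sample rows are invented, and the implicit join is written as an explicit JOIN with an ORDER BY added so the most common coins come first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection_currency (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE collection_collectible (
        id INTEGER PRIMARY KEY,
        country_id INTEGER,
        type INTEGER,
        face_value INTEGER,
        currency_type_id INTEGER REFERENCES collection_currency(id)
    );
    INSERT INTO collection_currency (id, name) VALUES (1, 'farthing'), (4, 'pence');
    INSERT INTO collection_collectible
        (country_id, type, face_value, currency_type_id) VALUES
        (7, 1, 6, 4), (7, 1, 6, 4), (7, 1, 1, 1), (7, 2, 6, 4), (8, 1, 6, 4);
""")

country_id = 7  # made-up id standing in for country.id

# Count coins (type = 1) of one country per (face value, currency name) pair.
coin_values = conn.execute("""
    SELECT COUNT(*) AS n, c.face_value, cur.name
    FROM collection_collectible AS c
    JOIN collection_currency AS cur ON c.currency_type_id = cur.id
    WHERE c.country_id = ? AND c.type = 1
    GROUP BY c.face_value, cur.name
    ORDER BY n DESC
""", (country_id,)).fetchall()
```

In the ORM, a values('face_value', 'currency_type__name') call followed by an annotate() with Count is the usual way to express this kind of grouping, though that has not been run against these models.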
Have you tried select_related()? http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot and it seems to work well; you can then write coin_values.currency.name.
Also, I don't think you need country=country.id in your filter; country=country should do, but I am not sure what difference that makes other than less typing.