Django aggregation over choices - django

I have a following model:
class VotingRound(models.Model):
pass # here are some unimportant fields
class Vote(models.Model):
voting_round = models.ForeignKey(VotingRound)
vote = models.CharField(choices=...)
Now I have instance of VotingRound and I would like to get to know how many times was each value represented. This is easily done through collections.Counter:
>>> Counter(voting_round_instance.vote_set.values_list('vote', flat=True))
Counter({u'decline': 8, u'neutral': 5, u'approve': 4})
Now I would like to know if there is a way to do this with Django aggregation techniques....
I have found this module, but before using it I wanted to know if there is native way to do it.

yes, you can!
from django.db.models import Count
voting_round_instance.vote_set.values('vote') \
.annotate(count=Count('vote')).distinct()
EDIT : use order_by()
You may also need to make sure the default ordering does not mess up your aggregation. This is especially true when using related object managers.
https://docs.djangoproject.com/en/1.8/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate. This shows up, particularly, when counting things.
from django.db.models import Count
voting_round_instance.vote_set.values('vote') \
.annotate(count=Count('vote')) \
.distinct().order_by()

Related

How to get boolean result in annotate django?

I have a filter which should return a queryset with 2 objects, and should have one different field. for example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both the objects are of same model, but different primary key. I tried usign the below filter:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
I used annotate first time, but it gave me the below error:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django, so what am I doing wrong, and what should be the annotate to get the required objects as shown above?
The solution by #ktowen works well, quite straightforward.
Here is another solution I am using, hope it is helpful too.
queryset = queryset.annotate(is_fielder=ExpressionWrapper(
Q(plays__outdoor_game_role='Fielder'),
output_field=BooleanField(),
),)
Here are some explanations for those who are not familiar with Django ORM:
Annotate make a new column/field on the fly, in this case, is_fielder. This means you do not have a field named is_fielder in your model while you can use it like plays.outdor_game_role.is_fielder after you add this 'annotation'. Annotate is extremely useful and flexible, can be combined with almost every other expression, should be a MUST-KNOWN method in Django ORM.
ExpressionWrapper basically gives you space to wrap a more complecated combination of conditions, use in a format like ExpressionWrapper(expression, output_field). It is useful when you are combining different types of fields or want to specify an output type since Django cannot tell automatically.
Q object is a frequently used expression to specify a condition, I think the most powerful part is that it is possible to chain the conditions:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
Negative(~): filter(~Q(condition))
It is possible to use Q with normal conditions like below:
(Q(condition1)|id__in=[list])
The point is Q object must come to the first or it will not work.
Case When(then) can be simply explained as if con1 elif con2 elif con3 .... It is quite powerful and personally, I love to use this to customize an ordering object for a queryset.
For example, you need to return a queryset of watch history items, and those must be in an order of watching by the user. You can do it with for loop to keep the order but this will generate plenty of similar queries. A more elegant way with Case When would be:
item_ids = [list]
ordering = Case(*[When(pk=pk, then=pos)
for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids)\
.order_by(ordering)
As you can see, by using Case When(then) it is possible to bind those very concrete relations, which could be considered as 1) a pinpoint/precise condition expression and 2) especially useful in a sequential multiple conditions case.
You can use Case/When with annotate
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
is_fielder=Case(
When(plays__outdoor_game_role='Fielder', then=Value(True)),
default=Value(False),
output_field=BooleanField(),
),
)

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
location = models.CharField(max_length=10)
type = models.CharField(max_length=10)
date = models.DateTimeField()
attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is the desired behavior.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model object, they should all be unique. It seems highly unlikely that that you would have two events on the same date, in the same location of the same type. So with that assumption, let's begin: (as a formatting note, class Names tend to start with capital letters to differentiate between classes and variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of the Max(Date), but this should get you on the right path.

Get information from a model using unrelated field

I have these two models:
class A(models.Model):
name=models.CharField(max_length=10)
class D(models.Model):
code=models.IntegerField()
the code field can have a number that exists in model A but it cant be related due to other factors. But what I want know is to list items from A whose value is the same with code
items=D.objects.values('code__name')
would work but since they are not related nor can be related, how can I handle that?
You can use Subquery() expressions in Django 1.11 or newer.
from django.db.models import OuterRef, Subquery
code_subquery = A.objects.filter(id=OuterRef('code'))
qs = D.objects.annotate(code_name=Subquery(code_subquery.values('name')))
The output of qs is a queryset of objects D with an added field code_name.
Footnotes:
It is compiled to a very similar SQL (like the Bear Brown's solution with "extra" method, but without disadvantages of his solution, see there):
SELECT app_d.id, app_d.code,
(SELECT U0.name FROM app_a U0 WHERE U0.id = (app_d.code)) AS code_name
FROM app_d
If a dictionary output is required it can be converted by .values() finally. It can work like a left join i.e. if the pseudo related field allows null (code = models.IntegerField(none=True)) then the objects D are not restricted and the output code_name value could be None. A feature of Subquery is that it returns only one field expression must be eventually repeated for another fields. (That is similar to extra(select={...: "SELECT ..."}), but thanks to object syntax it can be more readable customized than an explicit SQL.)
you can use django extra, replace YOUAPP on your real app name
D.objects.extra(select={'a_name': 'select name from YOUAPP_a where id=code'}).values('a_name')
# Replace YOUAPP^^^^^

django most efficient way to count same field values in a query

Lets say if I have a model that has lots of fields, but I only care about a charfield. Lets say that charfield can be anything so I don't know the possible values, but I know that the values frequently overlap. So I could have 20 objects with "abc" and 10 objects with "xyz" or I could have 50 objects with "def" and 80 with "stu" and i have 40000 with no overlap which I really don't care about.
How do I count the objects efficiently? What I would like returned is something like:
{'abc': 20, 'xyz':10, 'other': 10,000}
or something like that, w/o making a ton of SQL calls.
EDIT:
I dont know if anyone will see this since I am editing it kind of late, but...
I have this model:
class Action(models.Model):
author = models.CharField(max_length=255)
purl = models.CharField(max_length=255, null=True)
and from the answers, I have done this:
groups = Action.objects.filter(author='James').values('purl').annotate(count=Count('purl'))
but...
this is what groups is:
{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "lora"}
(I just filled purl with dummy values)
what I want is
{'waka': 4, 'mora': 5, 'lora': 1}
Hopefully someone will see this edit...
EDIT 2:
Apparently my database (BigTable) does not support the aggregate functions of Django and this is why I have been having all the problems.
You want something similar to "count ... group by". You can do this with the aggregation features of django's ORM:
from django.db.models import Count
fieldname = 'myCharField'
MyModel.objects.values(fieldname)
.order_by(fieldname)
.annotate(the_count=Count(fieldname))
Previous questions on this subject:
How to query as GROUP BY in django?
Django equivalent of COUNT with GROUP BY
This is called aggregation, and Django supports it directly.
You can get your exact output by filtering the values you want to count, getting the list of values, and counting them, all in one set of database calls:
from django.db.models import Count
MyModel.objects.filter(myfield__in=('abc', 'xyz')).\
values('myfield').annotate(Count('myfield'))
You can use Django's Count aggregation on a queryset to accomplish this. Something like this:
from django.db.models import Count
queryset = MyModel.objects.all().annotate(count = Count('my_charfield'))
for each in queryset:
print "%s: %s" % (each.my_charfield, each.count)
Unless your field value is always guaranteed to be in a specific case, it may be useful to transform it prior to performing a count, i.e. so 'apple' and 'Apple' would be treated as the same.
from django.db.models import Count
from django.db.models.functions import Lower
MyModel.objects.annotate(lower_title=Lower('title')).values('lower_title').annotate(num=Count('lower_title')).order_by('num')

Django DB, finding Categories whose Items are all in a subset

I have a two models:
class Category(models.Model):
pass
class Item(models.Model):
cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids (fixed thanks). For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.
Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relation ship on the model without the foreign key. You can filter on it by using its related name (usually just the model name lowercase but it can be manually overwritten), two underscores, and the field name you want to query on.
lets say you require all items to be in the following set:
allowable_items = set([1,3,4])
one bruteforce solution would be to check the item_set for every category as so:
categories_with_allowable_items = [
category for category in
Category.objects.all() if
set([item.id for item in category.item_set.all()]) <= allowable_items
]
but we don't really have to check all categories, as categories_with_allowable_items is always going to be a subset of the categories related to all items with ids in allowable_items... so that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
item.category for item in
Item.objects.select_related('category').filter(pk__in=allowable_items) if
set([siblingitem.id for siblingitem in item.category.item_set.all()]) <= allowable_items
])
if performance isn't really an issue, then the latter of these two (if not the former) should be fine. if these are very large tables, you might have to come up with a more sophisticated solution. also if you're using a particularly old version of python remember that you'll have to import the sets module
I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id in % AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])