Django: most efficient way to count same field values in a query

Let's say I have a model with lots of fields, but I only care about one CharField. That field can hold anything, so I don't know the possible values in advance, but I do know that values frequently repeat. I could have 20 objects with "abc" and 10 with "xyz", or 50 with "def" and 80 with "stu", plus 40000 objects whose values are unique, which I don't really care about.
How do I count the objects efficiently? What I would like returned is something like:
{'abc': 20, 'xyz': 10, 'other': 10000}
or something similar, without making a ton of SQL calls.
EDIT:
I don't know if anyone will see this since I am editing it kind of late, but...
I have this model:
class Action(models.Model):
    author = models.CharField(max_length=255)
    purl = models.CharField(max_length=255, null=True)
and from the answers, I have done this:
groups = Action.objects.filter(author='James').values('purl').annotate(count=Count('purl'))
but...
this is what groups is:
{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "lora"}
(I just filled purl with dummy values)
what I want is
{'waka': 4, 'mora': 5, 'lora': 1}
Hopefully someone will see this edit...
EDIT 2:
Apparently my database (BigTable) does not support the aggregate functions of Django and this is why I have been having all the problems.
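Where the backend cannot run aggregates, as with the BigTable setup described in EDIT 2, one fallback is to fetch only the one column and count in Python. This is a minimal sketch, assuming `rows` stands in for the list of dicts that `values('purl')` returned above (the variable name is made up for illustration):

```python
from collections import Counter

# Stand-in for the dicts returned by .values('purl') in the question.
rows = [{"purl": "waka"}] * 4 + [{"purl": "mora"}] * 5 + [{"purl": "lora"}]

# Count in Python rather than in the database.
counts = dict(Counter(row["purl"] for row in rows))
print(counts)
```

This pulls every matching row out of the datastore, so it probably only suits result sets of modest size.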

You want something similar to "count ... group by". You can do this with the aggregation features of Django's ORM:
from django.db.models import Count
fieldname = 'myCharField'
# Parentheses let the chained calls span several lines.
(MyModel.objects.values(fieldname)
    .order_by(fieldname)
    .annotate(the_count=Count(fieldname)))
Previous questions on this subject:
How to query as GROUP BY in django?
Django equivalent of COUNT with GROUP BY
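For reference, a `values()` plus `annotate(Count(...))` queryset should translate to roughly a `SELECT ..., COUNT(...) ... GROUP BY ...` statement. The sketch below runs that kind of SQL directly with the standard-library sqlite3 module. The table name, column name, and sample data are made up, and this is not the exact SQL Django emits:

```python
import sqlite3

# In-memory database with a hypothetical table standing in for MyModel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, my_char_field TEXT)")
values = ["abc"] * 3 + ["xyz"] * 2 + ["def"]
conn.executemany("INSERT INTO mymodel (my_char_field) VALUES (?)", [(v,) for v in values])

# A single query returns one row per distinct value together with its count.
rows = conn.execute(
    "SELECT my_char_field, COUNT(my_char_field) AS the_count "
    "FROM mymodel GROUP BY my_char_field ORDER BY my_char_field"
).fetchall()
counts = dict(rows)
print(counts)
```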

This is called aggregation, and Django supports it directly.
You can get your exact output by filtering the values you want to count, getting the list of values, and counting them, all in one set of database calls:
from django.db.models import Count
MyModel.objects.filter(myfield__in=('abc', 'xyz')).\
    values('myfield').annotate(Count('myfield'))
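A `values().annotate()` queryset yields one dict per value rather than the single mapping asked for in the question. Here is a sketch of the reshaping step, assuming the `filter()` is dropped so that every value comes back, and assuming rows keyed by `myfield__count`, the default name Django gives an unnamed `Count('myfield')` annotation. The sample rows are invented:

```python
# Hypothetical rows, shaped like values('myfield').annotate(Count('myfield')).
rows = [
    {"myfield": "abc", "myfield__count": 20},
    {"myfield": "xyz", "myfield__count": 10},
    {"myfield": "q1", "myfield__count": 1},
    {"myfield": "q2", "myfield__count": 1},
]

counts = {}
for row in rows:
    # Values seen only once are folded into a shared 'other' bucket.
    key = row["myfield"] if row["myfield__count"] > 1 else "other"
    counts[key] = counts.get(key, 0) + row["myfield__count"]
print(counts)
```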

You can use Django's Count aggregation on a queryset to accomplish this. Something like this:
from django.db.models import Count
# values() makes the annotation group by the field instead of by each object.
queryset = MyModel.objects.values('my_charfield').annotate(count=Count('my_charfield'))
for each in queryset:
    print("%s: %s" % (each['my_charfield'], each['count']))

Unless your field value is guaranteed to always be in a specific case, it may be useful to transform it before counting, so that, for example, 'apple' and 'Apple' are treated as the same value.
from django.db.models import Count
from django.db.models.functions import Lower
MyModel.objects.annotate(lower_title=Lower('title')).values('lower_title').annotate(num=Count('lower_title')).order_by('num')
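The effect of lowercasing before grouping can be seen in plain SQL. This sketch uses sqlite3 with a made-up table and data. It only approximates what the queryset above asks the database to do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, title TEXT)")
titles = ["apple", "Apple", "APPLE", "pear", "Pear", "fig"]
conn.executemany("INSERT INTO mymodel (title) VALUES (?)", [(t,) for t in titles])

# Group on the lowercased value so that case variants share one bucket,
# then sort by the count, smallest first.
rows = conn.execute(
    "SELECT LOWER(title) AS lower_title, COUNT(*) AS num "
    "FROM mymodel GROUP BY lower_title ORDER BY num"
).fetchall()
print(rows)
```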

Related

Django distinct on case sensitive entries

I have the following query:
>>> z = Restaurant.objects.values_list('city',flat=True).order_by('city').distinct()
>>> z
[u'ELURU', u'Eluru', u'Hyderabad']
As you can see, it is not completely distinct because of the case sensitivity. How do I correct this issue?
You can use annotate in conjunction with Lower (or Upper, etc...) to normalize your values and return truly distinct values like this...
from django.db.models.functions import Lower
z = (Restaurant.objects
     .annotate(city_lower=Lower('city'))
     .values_list('city_lower', flat=True)
     .order_by('city_lower')
     .distinct())
Note: Make sure order_by is set to 'city_lower' and not 'city' to avoid duplicates.
I'm not sure you're going to find a solution to this, since Django doesn't (currently) offer a case-insensitive distinct method. It may be better to fix the values in your database anyway, since you don't really want your end users to see their city in all capitals; it looks ugly.
I'd suggest writing a simple routine that you either run once in a data migration, and then prevent the city field from ever getting into this state again, or just run periodically.
Something similar to:
for restaurant in Restaurant.objects.all():
    if restaurant.city != restaurant.city.title():
        restaurant.city = restaurant.city.title()
        restaurant.save()
Try this:
z = Restaurant.objects.extra(select={'tmp_city': 'lower(city)'}).values_list('city', flat=True).order_by('city').distinct('tmp_city')
This works, although it is a little messy. I ended up having to use values, since distinct only works on database tables, regardless of whether or not you use annotate, extra, or rawSQL.
You end up creating an extra field with annotate, and then use that field in your list of dictionaries created by values. Once you have that list of dictionaries, you can use groupby to group dictionaries based on the Lower values key in the values list of dicts. Then, depending on how you want to select the object (in this case, just taking the first object of the group), you can select the version of the distinct that you want.
from django.db.models.functions import Lower
from itertools import groupby
# groupby only merges adjacent rows, so order by the grouping key first.
rows = (Restaurant.objects
        .annotate(city_lower=Lower('city'))
        .order_by('city_lower')
        .values('city_lower', 'city'))
# Keep the first row of each case-insensitive group.
restaurant = [next(g) for k, g in groupby(rows, lambda x: x['city_lower'])]
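The grouping step can be tried without a database. This is a minimal sketch, assuming `rows` looks like the list of dicts produced by `values('city_lower', 'city')` and is already sorted by `city_lower`. The sample data is invented:

```python
from itertools import groupby

# Hypothetical rows, already sorted by the lowercased key.
rows = [
    {"city_lower": "eluru", "city": "ELURU"},
    {"city_lower": "eluru", "city": "Eluru"},
    {"city_lower": "hyderabad", "city": "Hyderabad"},
]

# Take the first spelling found in each case-insensitive group.
distinct_cities = [
    next(group)["city"]
    for _, group in groupby(rows, key=lambda row: row["city_lower"])
]
print(distinct_cities)
```

If the rows are not sorted by the grouping key, `groupby` starts a new group every time the key changes, so duplicates can reappear.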

How do I perform a filter on the FK I'm aggregating on in a QuerySet?

I have a (working) query that looks like
authors = Authors.objects.complicated_queryset()
with_scores = authors.annotate(total_book_score=Sum('books__score'))
It finds all authors who are returned by a complicated_queryset method, and then sums up the total of the scores of their books. However, I wish to amend this QuerySet such that it only includes the scores of books published in the last year. In pretend syntax:
with_scores = authors.annotate(total_book_score=Sum('books__score'),
                               filter=Q(books__published=2015))
Is this possible with QuerySets or do I have to write raw SQL (or, I guess, two separate queries) to get that behaviour?
You could try using Case if you're using Django 1.8+.
DISCLAIMER: The following code is an approximation. I haven't tested it, so it might not work exactly as written.
# You will need these imports:
from django.db.models import Case, F, IntegerField, Sum, Value, When
with_scores = authors.annotate(total_book_score=Sum(
    Case(When(books__published=2015, then=F('books__score')),
         default=Value(0),
         output_field=IntegerField())  # Or FloatField if it fits your needs.
))
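The idea behind a conditional sum is `SUM(CASE WHEN ... THEN ... ELSE 0 END)` in SQL. The sketch below shows that pattern with sqlite3. The table layout and data are assumptions made up for illustration, not the asker's schema or the SQL Django would generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER,
                   published INTEGER, score INTEGER);
INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO book (author_id, published, score) VALUES
    (1, 2015, 4), (1, 2015, 3), (1, 2010, 5), (2, 2012, 2);
""")

# Only books published in 2015 contribute their score; others count as 0.
rows = conn.execute("""
SELECT author.name,
       SUM(CASE WHEN book.published = 2015 THEN book.score ELSE 0 END)
FROM author LEFT JOIN book ON book.author_id = author.id
GROUP BY author.id
ORDER BY author.name
""").fetchall()
totals = dict(rows)
print(totals)
```

Authors with no qualifying books still appear, with a total of 0, which is what separates this from simply filtering the books first.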

Can I sort the objects returned by a queryset by the result of a function?

I am writing a web-based music application and I want to implement a feature that lets the user see the most liked albums of the last week, month, or year.
This is my model:
class Album(models.Model):
    def get_weekly_count(self):
        ...
    def get_monthly_count(self):
        ...
    def get_yearly_count(self):
        ...
class like(models.Model):
    created_at = models.DateField()
    albumID = models.ForeignKey(Album)
Now I want to retrieve the albums that were liked most in the last week, month, or year. I want to do something like this (but I cannot):
Album.objects.all().order_by('get_weekly_count')
Can anyone help me fix it, or suggest another approach to achieve that goal?
The order_by method translates into an SQL ORDER BY, therefore it works only with model fields, which correspond to table columns. It won't work if you intend to sort your elements by a model's method.
So, if you want to accomplish something like
Album.objects.all().order_by('get_weekly_count')
You'll have to do it in Python:
sorted(Album.objects.all(), key=lambda x: x.get_weekly_count())
Performance-wise, this means you'll get your elements with a query and then you'll sort them with python (that's different from getting a sorted queryset in one shot).
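Sorting by a method result in Python looks like this in isolation. The `Album` class here is a plain stand-in with an invented `weekly_likes` attribute, not the Django model from the question:

```python
class Album:
    """Plain stand-in for the model; weekly_likes is a made-up attribute."""

    def __init__(self, title, weekly_likes):
        self.title = title
        self.weekly_likes = weekly_likes

    def get_weekly_count(self):
        return self.weekly_likes


albums = [Album("A", 3), Album("B", 10), Album("C", 1)]

# reverse=True puts the most liked album first.
ranked = sorted(albums, key=lambda album: album.get_weekly_count(), reverse=True)
titles = [album.title for album in ranked]
print(titles)
```

With real model instances, each call to `get_weekly_count()` may run its own query, so this can become slow for large tables.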
Otherwise, if you can express get_weekly_count as raw SQL, you could use it with Count() or the extra() modifier, which makes order_by usable, e.g.:
Album.objects.all().extra(
    select={'weekly_count': "<some SQL>"},
    select_params=(<your params>,),
).order_by('weekly_count')
Have a look at https://docs.djangoproject.com/en/1.8/ref/models/querysets/#extra
According to the documentation, you should use:
from datetime import datetime
from django.db.models import Count
# START_OF_MONTH is a placeholder for the start of the period; '-count' puts the most liked first.
like.objects.filter(created_at__gt=START_OF_MONTH, created_at__lt=datetime.now()).values('albumID').annotate(count=Count('albumID')).order_by('-count')
This will get the results for you in a single DB query. For more details, visit https://docs.djangoproject.com/en/dev/topics/db/aggregation/.

Django aggregation over choices

I have the following model:
class VotingRound(models.Model):
    pass  # here are some unimportant fields
class Vote(models.Model):
    voting_round = models.ForeignKey(VotingRound)
    vote = models.CharField(choices=...)
Now I have an instance of VotingRound and I would like to know how many times each value occurs. This is easily done with collections.Counter:
>>> Counter(voting_round_instance.vote_set.values_list('vote', flat=True))
Counter({u'decline': 8, u'neutral': 5, u'approve': 4})
Now I would like to know if there is a way to do this with Django aggregation techniques....
I have found this module, but before using it I wanted to know if there is native way to do it.
Yes, you can!
from django.db.models import Count
voting_round_instance.vote_set.values('vote') \
    .annotate(count=Count('vote')).distinct()
EDIT: use order_by()
You may also need to make sure the default ordering does not mess up your aggregation. This is especially true when using related object managers.
https://docs.djangoproject.com/en/1.8/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate. This shows up, particularly, when counting things.
from django.db.models import Count
voting_round_instance.vote_set.values('vote') \
    .annotate(count=Count('vote')) \
    .distinct().order_by()
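The effect described in the quoted documentation can be reproduced in plain SQL: once an ordering column joins the GROUP BY, rows that share a vote stop collapsing together. This sketch uses sqlite3. The `created` column is an assumption standing in for whatever field the model's default ordering uses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote (id INTEGER PRIMARY KEY, vote TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO vote (vote, created) VALUES (?, ?)",
    [("approve", "2015-01-01"), ("approve", "2015-01-02"), ("decline", "2015-01-01")],
)

# Grouping on the vote alone: one row per distinct vote.
by_vote = conn.execute(
    "SELECT vote, COUNT(vote) FROM vote GROUP BY vote ORDER BY vote"
).fetchall()

# Grouping on vote plus the ordering column: identical votes are split apart.
by_vote_and_created = conn.execute(
    "SELECT vote, COUNT(vote) FROM vote GROUP BY vote, created ORDER BY vote, created"
).fetchall()

print(by_vote)
print(by_vote_and_created)
```

Calling `order_by()` with no arguments clears the ordering, which should leave only the `values()` fields in the grouping.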

Django DB, finding Categories whose Items are all in a subset

I have two models:
class Category(models.Model):
    pass
class Item(models.Model):
    cat = models.ForeignKey(Category)
I am trying to return all Categories for which all of that category's items belong to a given subset of item ids. For example, all categories for which all of the items associated with that category have ids in the set [1,3,5].
How could this be done using Django's query syntax (as of 1.1 beta)? Ideally, all the work should be done in the database.
Category.objects.filter(item__id__in=[1, 3, 5])
Django creates the reverse relationship on the model without the foreign key. You can filter on it by using its related name (usually just the lowercase model name, but it can be overridden manually), two underscores, and the field name you want to query on.
Let's say you require all items to be in the following set:
allowable_items = set([1,3,4])
One brute-force solution would be to check the item_set of every category, like so:
categories_with_allowable_items = [
    category for category in
    Category.objects.all() if
    set([item.id for item in category.item_set.all()]) <= allowable_items
]
But we don't really have to check all categories, because categories_with_allowable_items is always going to be a subset of the categories related to the items with ids in allowable_items. So that's all we have to check (and this should be faster):
categories_with_allowable_items = set([
    item.cat for item in
    Item.objects.select_related('cat').filter(pk__in=allowable_items) if
    set([siblingitem.id for siblingitem in item.cat.item_set.all()]) <= allowable_items
])
If performance isn't really an issue, then the latter of these two (if not the former) should be fine. If these are very large tables, you might have to come up with a more sophisticated solution. Also, if you're using a particularly old version of Python, remember that you'll have to import the sets module.
I've played around with this a bit. If QuerySet.extra() accepted a "having" parameter I think it would be possible to do it in the ORM with a bit of raw SQL in the HAVING clause. But it doesn't, so I think you'd have to write the whole query in raw SQL if you want the database doing the work.
EDIT:
This is the query that gets you part way there:
from django.db.models import Count
Category.objects.annotate(num_items=Count('item')).filter(num_items=...)
The problem is that for the query to work, "..." needs to be a correlated subquery that looks up, for each category, the number of its items in allowed_items. If .extra had a "having" argument, you'd do it like this:
Category.objects.annotate(num_items=Count('item')).extra(having="num_items=(SELECT COUNT(*) FROM app_item WHERE app_item.id IN %s AND app_item.cat_id = app_category.id)", having_params=[allowed_item_ids])
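For comparison, here is what a raw SQL version with a HAVING clause could look like, run through sqlite3 rather than Django. It is a sketch under assumptions: the `app_category` and `app_item` table names come from the hypothetical query above, the sample rows are invented, and the HAVING condition compares each category's item count against its count of allowed items instead of using a correlated subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_category (id INTEGER PRIMARY KEY);
CREATE TABLE app_item (id INTEGER PRIMARY KEY, cat_id INTEGER);
INSERT INTO app_category VALUES (1), (2), (3);
INSERT INTO app_item VALUES (1, 1), (3, 1), (5, 2), (6, 2), (7, 3);
""")

allowed = [1, 3, 5]
# One "?" placeholder per allowed id keeps the query parameterized.
placeholders = ", ".join("?" for _ in allowed)
sql = f"""
SELECT c.id
FROM app_category c JOIN app_item i ON i.cat_id = c.id
GROUP BY c.id
HAVING COUNT(*) = SUM(CASE WHEN i.id IN ({placeholders}) THEN 1 ELSE 0 END)
ORDER BY c.id
"""
# A category qualifies only if every one of its items is in the allowed set.
category_ids = [row[0] for row in conn.execute(sql, allowed)]
print(category_ids)
```

Because of the inner join, categories with no items at all are left out, so whether that matches the intended behaviour depends on how empty categories should be treated.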