better way to query in django - django

Is there a better way to do this
questionobjects = Questions.objects.all()
for questionobject in questionobjects:
answerobjects = Answers.objects.filter(question=questionobject.id).count()
In the above query Answers model has foreign key relation with Questions. But in the above script the query Answer query executes based on the number of questionobjects.
Suppose there are 10 questionobjects then 10 separate answerobject queries take place. Is there a way to do this with a single query because as the number of questionobjects increases, it will be a problem because the number of answerobjects queries also increase

So it looks like you just care about the count of answers, rather than getting the actual answer objects. You can do this with annotations:
from django.db.models import Count
Question.objects.all().annotate(Count('answers'))

Take a look at annotation: Django Annotation
from django.db.models import Count
questions = Questions.objects.annotate(count=Count('answers'))
Then you can access the count with [q.count for q in questions]

Related

How do I perform a filter on the FK I'm aggregating on in a QuerySet?

I have a (working) query that looks like
authors = Authors.objects.complicated_queryset()
with_scores = authors.annotate(total_book_score=Sum('books__score'))
It finds all authors who are returned by a complicated_queryset method, and then sums up the total of the scores of their books. However, I wish to amend this QuerySet such that it only includes the scores from the books published the last year. In pretend syntax:
with_scores = authors.annotate(total_book_score=Sum('books__score'),
filter=Q(books__published=2015))
Is this possible with QuerySets or do I have to write raw SQL (or, I guess, two separate queries) to get that behaviour?
You could try using Case if you're using Django 1.8+
DISCLAIMER: The following code is an aproximation, I haven't tested this, so this could not work exactly in this way.
# You will need import:
from django.db.models import Sum, IntegerField, Case, When, Value
with_scores = authors.annotate(total_book_score=Sum(
Case(When(books__published=2015, then=Value(F('books__score'))),
default=Value(0), output=IntegerField()) # Or float if it fits your needs.
)
)

Django QuerySet update performance

Which one would be better for performance?
We take a slice of products. which make us impossible to bulk update.
products = Product.objects.filter(featured=True).order_by("-modified_on")[3:]
for product in products:
product.featured = False
product.save()
or (invalid)
for product in products.iterator():
product.update(featured=False)
I have tried QuerySet's in statement too as following.
Product.objects.filter(pk__in=products).update(featured=False)
This line works fine on SQLite. But, it rises following exception on MySQL. So, I couldn't use that.
DatabaseError: (1235, "This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'")
Edit: Also iterator() method causes re-evaluate the query. So, it is bad for performance.
As #Chris Pratt pointed out in comments, the second example is invalid because the objects don't have update methods. Your first example will require queries equal to results+1 since it has to update each object. That might really be costly if you have 1000 products. Ideally you do want to reduce this to a more fixed expense if possible.
This is a similar situation to another question:
Django: Cannot update a query once a slice has been taken
That being said, you would have to do it in at least 2 queries, but you have to be a bit sneaky on how to construct the LIMIT...
Using Q objects for complex queries:
# get the IDs we want to exclude
products = Product.objects.filter(featured=True).order_by("-modified_on")[:3]
# flatten them into just a list of ids
ids = products.values_list('id', flat=True)
# Now use the Q object to construct a complex query
from django.db.models import Q
# This builds a list of "AND id NOT EQUAL TO i"
limits = [~Q(id=i) for i in ids]
Product.objects.filter(featured=True, *limits).update(featured=False)
In some cases it's acceptable to cache QuerySet in array
products = list(products)
Product.objects.filter(pk__in=products).update(featured=False)
Small optimization with values_list
products_id = list(products.values_list('id', flat=True)
Product.objects.filter(pk__in=products_id).update(featured=False)

Django Query Related Field Count

I've got an app where users create pages. I want to run a simple DB query that returns how many users have created more than 2 pages.
This is essentially what I want to do, but of course it's not the right method:
User.objects.select_related('page__gte=2').count()
What am I missing?
You should use aggregates.
from django.db.models import Count
User.objects.annotate(page_count=Count('page')).filter(page_count__gte=2).count()
In my case, I didn't use last .count() like the other answer and it also works nice.
from django.db.models import Count
User.objects.annotate( our_param=Count("all_comments")).filter(our_param__gt=12)
use aggregate() function with django.db.models methods!
this is so useful and not really crushing with other annotation aggregated columns.
*use aggregate() at the last step of calculation, it turns your queryset to dict.
below is my code snippet using them.
cnt = q.values("person__year_of_birth").filter(person__year_of_birth__lte=year_interval_10)\
.filter(person__year_of_birth__gt=year_interval_10-10)\
.annotate(group_cnt=Count("visit_occurrence_id")).aggregate(Sum("group_cnt"))

Django annotation with nested filter

Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through documentation filters can only be before or after the annotation which would not yield the desired results.
A workaround is this
for student in Student.objects.all():
student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but does many queries, in a real application this can get impractically long. I think this type of statement is possible in SQL but would prefer to stay with ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combined them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue - I suggest the for loop work around. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
models.Case(
models.When(absence__type='Excused', then=1),
default=0,
output_field=models.IntegerField()
)))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
You are correct - django does not allow you to filter the related objects being counted, without also applying the filter to the primary objects, and therefore excluding those primary objects with a no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
# a query for students
students = Students.objects.all()
# a query to count the student attendances, grouped by type.
attendance_counts = Attendence(student__in=students).values('student', 'type').annotate(abs=Count('pk'))
# regroup that into a dictionary {student -> { type -> count }}
from itertools import groupby
attendance_s_t = dict((s, (dict(t, c) for (s, t, c) in g)) for s, g in groupby(attendance_counts, lambda (s, t, c): s))
# then use them efficiently:
for student in students:
student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.

django most efficient way to count same field values in a query

Lets say if I have a model that has lots of fields, but I only care about a charfield. Lets say that charfield can be anything so I don't know the possible values, but I know that the values frequently overlap. So I could have 20 objects with "abc" and 10 objects with "xyz" or I could have 50 objects with "def" and 80 with "stu" and i have 40000 with no overlap which I really don't care about.
How do I count the objects efficiently? What I would like returned is something like:
{'abc': 20, 'xyz':10, 'other': 10,000}
or something like that, w/o making a ton of SQL calls.
EDIT:
I dont know if anyone will see this since I am editing it kind of late, but...
I have this model:
class Action(models.Model):
author = models.CharField(max_length=255)
purl = models.CharField(max_length=255, null=True)
and from the answers, I have done this:
groups = Action.objects.filter(author='James').values('purl').annotate(count=Count('purl'))
but...
this is what groups is:
{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "lora"}
(I just filled purl with dummy values)
what I want is
{'waka': 4, 'mora': 5, 'lora': 1}
Hopefully someone will see this edit...
EDIT 2:
Apparently my database (BigTable) does not support the aggregate functions of Django and this is why I have been having all the problems.
You want something similar to "count ... group by". You can do this with the aggregation features of django's ORM:
from django.db.models import Count
fieldname = 'myCharField'
MyModel.objects.values(fieldname)
.order_by(fieldname)
.annotate(the_count=Count(fieldname))
Previous questions on this subject:
How to query as GROUP BY in django?
Django equivalent of COUNT with GROUP BY
This is called aggregation, and Django supports it directly.
You can get your exact output by filtering the values you want to count, getting the list of values, and counting them, all in one set of database calls:
from django.db.models import Count
MyModel.objects.filter(myfield__in=('abc', 'xyz')).\
values('myfield').annotate(Count('myfield'))
You can use Django's Count aggregation on a queryset to accomplish this. Something like this:
from django.db.models import Count
queryset = MyModel.objects.all().annotate(count = Count('my_charfield'))
for each in queryset:
print "%s: %s" % (each.my_charfield, each.count)
Unless your field value is always guaranteed to be in a specific case, it may be useful to transform it prior to performing a count, i.e. so 'apple' and 'Apple' would be treated as the same.
from django.db.models import Count
from django.db.models.functions import Lower
MyModel.objects.annotate(lower_title=Lower('title')).values('lower_title').annotate(num=Count('lower_title')).order_by('num')