Django what is the most effective way of subsetting a queryset? - django

Since each queryset is unique, what is the best way of subsetting a large queryset without generate too many sql queries in a loop?
For instance, I need to produce a report which requires me to loop through all of the data in a database:
for user in users:
notes = Note.objects.filter(owner=user.id)
for note in notes:
answers = Answer.filter(note_id = note.id)
for answer in answers:
#do something
As you can already see how bad this loop will be since each filter statement creates a queryset which hits the database.
What am I suppose to do in this situation to avoid calling the database thousands of times?
Thanks!

If users is QuerySet, then
answers = Answer.objects.filter(note__owner__in=users)
UPDATE: If you want to do something with Answer's note and owner, then
answers = Answer.objects.filter(note__owner__in=users).select_related('note', 'note__owner')
It requires one query.

Try this
answer = Answer.objects.filter(note__owner__in = user_ids)

Related

Django Query without referencing fields

Quick overview:
I have a Forecast model setup which has a workflow_state.
Now I'm trying to query the Forecast for all the forecasts that are in a certain state AND the current person logged in is_staff.
If i was writing a raw query this wouldn't be an issue because i could write something like:
SELECT * FROM forecast WHERE forecast.workflow_state_id in (1,2,3,4) AND 1 = user.is_staff
However, when trying to write this in a queryset I can't figure out how to reference a constant. I don't want to write a raw queryset and if possible want to avoid using the extra field.
Any suggestions would be appreciated. Thanks
Your constant is only a True
Forecast.objects.filter(workflow_state__in=[1,2,3,4], user__is_staf=True)
Your edit makes things rather less than clear, but you seem to be asking how to do a check on the current logged-in user, rather than on the user referenced by the model. In which case, you don't do that in a query at all; your example SQL statement wouldn't work, and neither would doing it in the ORM. You do it in Python, of course:
if request.user.is_staff:
forecasts = Forecast.objects.filter(workflow_state__in=[1,2,3,4])

Appending to QuerySet used as a __in lookup parameter

I'd like to retrieve all posts authored by a user OR by his/her friends:
Currently, I use Q objects:
# friends = QuerySet containing User objects
# user = User object
posts = Post.objects.filter(Q(author__in=friends) | Q(author=user))
Which generates a SQL query that looks like this:
SELECT ... WHERE ("posts_post"."author_id" IN (2, 3) OR "posts_post"."author_id" = 1)
Question:
Is it possible to append the user object to the friends QuerySet to generate a query that looks like this, instead:
SELECT ... WHERE "posts_post"."author_id" IN (2, 3, 1)
Directly using the QuerySet's .append() method does in fact work:
friends.append(user)
posts = Post.objects.filter(author__in=friends)
However, I've seen a number of answers here and elsewhere cautioning against treating QuerySets as basic lists.
Is this latter .append() technique safe? Is it efficient, particularly if the QuerySet is fairly large? Or is there another preferred method? Alternatively, feel free to tell me that I'm being silly and that there's nothing wrong with the Q objects approach!
Many thanks.
Querysets don't have an append method, so if that's working in your context friends is already a list rather than a queryset.
As for performance - my gut feeling is that a queryset that's too large to pull into memory so you can append to it is also going to be too large to do well as a subquery. But as always with performance questions, testing is the only real way to be sure.
You'll definitely take something of a performance hit either way. If you don't need friends for anything else, you could use a values_list queryset to get just the PKs into memory, append user.id, then filter on that list of PKs.
You can use this trick:
friends = Friend.objects.values_list('id', flat=True).order_by('id')
friends.append(user.pk)
posts = Post.objects.filter(author__in=friends)
As far as you save only id's, not the whole queryset this method is pretty safe.

How to update a queryset that has been annotated?

I have the following models:
class Work(models.Model):
visible = models.BooleanField(default=False)
class Book(models.Model):
work = models.ForeignKey('Work')
I am attempting to update some rows like so:
qs=Work.objects.all()
qs.annotate(Count('book')).filter(Q(book__count__gt=1)).update(visible=False)
However, this is giving an error:
DatabaseError: subquery has too many columns
LINE 1: ...SET "visible" = false WHERE "app_work"."id" IN (SELECT...
If I remove the update clause, the query runs with no problems and returns what I am expecting.
It looks like this error happens for queries with an annotate followed by an update. Is there some other way to write this?
Without making a toy database to be able to duplicate your issue and try out solutions, I can at least suggest the approach in Django: Getting complement of queryset as one possible approach.
Try this approach:
qs.annotate(Count('book')).filter(Q(book__count__gt=1))
Work.objects.filter(pk__in=qs.values_list('pk', flat=True)).update(visible=False)
You can also clear the annotations off a queryset quite simply:
qs.query.annotations.clear()
qs.update(..)
And this means you're only firing off one query, not one into another, but don't use this if your query relies on an annotation to filter. This is great for stripping out database-generated concatenations, and the utility rubbish that I occasionally add into model's default queries... but the example in the question is a perfect example of where this would not work.
To add to Oli's answer: If you need your annotations for the update then do the filters first and store the result in a variable and then call filter with no arguments on that queryset to access the update function like so:
q = X.objects.filter(annotated_val=5, annotated_name='Nima')
q.query.annotations.clear()
q.filter().update(field=900)
I've duplicated this issue & believe its a bug with the Django ORM. #acjay answer is a good workaround. Bug report: https://code.djangoproject.com/ticket/25171
Fix released in Django 2 alpha: https://code.djangoproject.com/ticket/19513

Django annotation with nested filter

Is it possible to filter within an annotation?
In my mind something like this (which doesn't actually work)
Student.objects.all().annotate(Count('attendance').filter(type="Excused"))
The resultant table would have every student with the number of excused absences. Looking through documentation filters can only be before or after the annotation which would not yield the desired results.
A workaround is this
for student in Student.objects.all():
student.num_excused_absence = Attendance.objects.filter(student=student, type="Excused").count()
This works but does many queries, in a real application this can get impractically long. I think this type of statement is possible in SQL but would prefer to stay with ORM if possible. I even tried making two separate queries (one for all students, another to get the total) and combined them with |. The combination changed the total :(
Some thoughts after reading answers and comments
I solved the attendance problem using extra sql here.
Timmy's blog post was useful. My answer is based off of it.
hash1baby's answer works but seems equally complex as sql. It also requires executing sql then adding the result in a for loop. This is bad for me because I'm stacking lots of these filtering queries together. My solution builds up a big queryset with lots of filters and extra and executes it all at once.
If performance is no issue - I suggest the for loop work around. It's by far the easiest to understand.
As of Django 1.8 you can do this directly in the ORM:
students = Student.objects.all().annotate(num_excused_absences=models.Sum(
models.Case(
models.When(absence__type='Excused', then=1),
default=0,
output_field=models.IntegerField()
)))
Answer adapted from another SO question on the same topic
I haven't tested the sample above but did accomplish something similar in my own app.
You are correct - django does not allow you to filter the related objects being counted, without also applying the filter to the primary objects, and therefore excluding those primary objects with a no related objects after filtering.
But, in a bit of abstraction leakage, you can count groups by using a values query.
So, I collect the absences in a dictionary, and use that in a loop. Something like this:
# a query for students
students = Students.objects.all()
# a query to count the student attendances, grouped by type.
attendance_counts = Attendence(student__in=students).values('student', 'type').annotate(abs=Count('pk'))
# regroup that into a dictionary {student -> { type -> count }}
from itertools import groupby
attendance_s_t = dict((s, (dict(t, c) for (s, t, c) in g)) for s, g in groupby(attendance_counts, lambda (s, t, c): s))
# then use them efficiently:
for student in students:
student.absences = attendance_s_t.get(student.pk, {}).get('Excused', 0)
Maybe this will work for you:
excused = Student.objects.filter(attendance__type='Excused').annotate(abs=Count('attendance'))
You need to filter the Students you're looking for first to just those with excused absences and then annotate the count of them.
Here's a link to the Django Aggregation Docs where it discusses filtering order.

How to convert a Django QuerySet to a list?

I have the following:
answers = Answer.objects.filter(id__in=[answer.id for answer in answer_set.answers.all()])
then later:
for i in range(len(answers)):
# iterate through all existing QuestionAnswer objects
for existing_question_answer in existing_question_answers:
# if an answer is already associated, remove it from the
# list of answers to save
if answers[i].id == existing_question_answer.answer.id:
answers.remove(answers[i]) # doesn't work
existing_question_answers.remove(existing_question_answer)
I get an error:
'QuerySet' object has no attribute 'remove'
I've tried all sorts to convert the QuerySet to a standard set or list. Nothing works.
How can I remove an item from the QuerySet so it doesn't delete it from the database, and doesn't return a new QuerySet (since it's in a loop that won't work)?
Why not just call list() on the Queryset?
answers_list = list(answers)
This will also evaluate the QuerySet/run the query. You can then remove/add from that list.
You could do this:
import itertools
ids = set(existing_answer.answer.id for existing_answer in existing_question_answers)
answers = itertools.ifilter(lambda x: x.id not in ids, answers)
Read when QuerySets are evaluated and note that it is not good to load the whole result into memory (e.g. via list()).
Reference: itertools.ifilter
Update with regard to the comment:
There are various ways to do this. One (which is probably not the best one in terms of memory and time) is to do exactly the same :
answer_ids = set(answer.id for answer in answers)
existing_question_answers = filter(lambda x: x.answer.id not in answers_id, existing_question_answers)
It is a little hard to follow what you are really trying to do. Your first statement looks like you may be fetching the same exact QuerySet of Answer objects twice. First via answer_set.answers.all() and then again via .filter(id__in=...). Double check in the shell and see if this will give you the list of answers you are looking for:
answers = answer_set.answers.all()
Once you have that cleaned up so it is a little easier for you (and others working on the code) to read you might want to look into .exclude() and the __in field lookup.
existing_question_answers = QuestionAnswer.objects.filter(...)
new_answers = answers.exclude(question_answer__in=existing_question_answers)
The above lookup might not sync up with your model definitions but it will probably get you close enough to finish the job yourself.
If you still need to get a list of id values then you want to play with .values_list(). In your case you will probably want to add the optional flat=True.
answers.values_list('id', flat=True)
By the use of slice operator with step parameter which would cause evaluation of the queryset and create a list.
list_of_answers = answers[::1]
or initially you could have done:
answers = Answer.objects.filter(id__in=[answer.id for answer in
answer_set.answers.all()])[::1]
You can directly convert using the list keyword.
For example:
obj=emp.objects.all()
list1=list(obj)
Using the above code you can directly convert a query set result into a list.
Here list is keyword and obj is result of query set and list1 is variable in that variable we are storing the converted result which in list.
Try this values_list('column_name', flat=True).
answers = Answer.objects.filter(id__in=[answer.id for answer in answer_set.answers.all()]).values_list('column_name', flat=True)
It will return you a list with specified column values
Use python list() function
list(). Force evaluation of a QuerySet by calling list() on it. For
example:
answers = list(answer_set.answers.all())
Why not just call
.values('reqColumn1','reqColumn2') or .values_list('reqColumn1','reqColumn2') on the queryset?
answers_list = models.objects.values('reqColumn1','reqColumn2')
result = [{'reqColumn1':value1,'reqColumn2':value2}]
OR
answers_list = models.objects.values_list('reqColumn1','reqColumn2')
result = [(value1,value2)]
You can able to do all the operation on this QuerySet, which you do for list .
def querySet_to_list(qs):
"""
this will return python list<dict>
"""
return [dict(q) for q in qs]
def get_answer_by_something(request):
ss = Answer.objects.filter(something).values()
querySet_to_list(ss) # python list return.(json-able)
this code convert django queryset to python list
instead of remove() you can use exclude() function to remove an object from the queryset.
it's syntax is similar to filter()
eg : -
qs = qs.exclude(id= 1)
in above code it removes all objects from qs with id '1'
additional info :-
filter() used to select specific objects but exclude() used to remove