Will Django use previously-evaluated results when applying additional filters to a query set?

Let's say I need to do some work both on a set of model objects, as well as a subset of the first set:
things = Thing.objects.filter(active=True)
for thing in things:  # (1)
    pass  # do something with each `thing`
special_things = things.filter(special=True)
for thing in special_things:  # (2)
    pass  # do something more with these things
My understanding is that at point (1) marked in the code above, an actual SQL query something like SELECT * FROM things_table WHERE active=1 will get executed against the database. The QuerySet documentation also says:
When a QuerySet is evaluated, it typically caches its results.
Now my question is, what happens at point (2) in the example Python code above?
Will Django execute a second SQL query, something like SELECT * FROM things_table WHERE active=1 AND special=1?
Or, will it use the cached result from earlier, automatically doing for me behind the scenes something like the more optimal filter(lambda d: d.special == True, things), i.e. avoiding a needless second trip to the database?
Either way, is the current behavior guaranteed (by documentation or something) or should I not rely on it? For example, it is not only a point of optimization, but could also make a possible logic difference if the database table is modified by another thread/process between the two potential queries.

It will execute a second SQL query. filter creates a new queryset, which doesn't copy the results cache.
As for guarantees - well, the docs specify that filter returns a new queryset object. I think you can be confident that that new queryset won't have cached results yet. As further support, the "when are querysets evaluated" docs suggest using .all() to get a new queryset if you want to pick up possibly changed results:
If the data in the database might have changed since a QuerySet was
evaluated, you can get updated results for the same query by calling
all() on a previously evaluated QuerySet.
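To make the "new queryset, empty cache" behavior concrete, here is a minimal pure-Python sketch. ToyQuerySet, its queries_run counter, and the sample rows are hypothetical names invented for illustration. This is not Django's implementation, only a toy model of the behavior described above.

```python
class ToyQuerySet:
    """Toy stand-in for a lazy queryset; NOT Django's implementation."""

    queries_run = 0  # counts simulated database round trips

    def __init__(self, rows, predicates=()):
        self._rows = rows                 # plays the role of the database table
        self._predicates = tuple(predicates)
        self._result_cache = None         # filled on first iteration

    def filter(self, **conditions):
        # Return a NEW object carrying the extra condition.
        # The result cache of the original is deliberately not copied.
        def predicate(row):
            return all(row[key] == value for key, value in conditions.items())
        return ToyQuerySet(self._rows, self._predicates + (predicate,))

    def __iter__(self):
        if self._result_cache is None:
            ToyQuerySet.queries_run += 1  # simulated trip to the database
            self._result_cache = [
                row for row in self._rows
                if all(p(row) for p in self._predicates)
            ]
        return iter(self._result_cache)


table = [
    {"id": 1, "active": True, "special": False},
    {"id": 2, "active": True, "special": True},
    {"id": 3, "active": False, "special": True},
]

things = ToyQuerySet(table).filter(active=True)
first_pass = [row["id"] for row in things]    # runs the first simulated query
second_pass = [row["id"] for row in things]   # served from the cache
special_ids = [row["id"] for row in things.filter(special=True)]  # new object, second query
```

Iterating things twice costs one simulated query, while the filtered copy starts with an empty cache and costs another.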

Related

Which is a more efficient method, using a list comprehension or django's 'values_list' function?

When attempting to return a list of values from django objects, will performance be better using a list comprehension:
[x.value for x in Model.objects.all()]
or calling list() on django's values_list function:
list(Model.objects.values_list('value', flat=True))
and why?
The second approach (using values_list()) is the more efficient one. It changes the SQL query that is sent to the database so that only the requested columns are selected.
The first approach first selects every column of every row from the database, and only afterwards discards everything except the one attribute in Python. So you have already spent the resources to fetch all values with that approach.
You can compare the queries generated by wrapping your QuerySet with str(queryset.query) and it will return the actual SQL query that gets executed.
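See the example below:
from django.db import models

class Model(models.Model):
    # max_length is required for CharField; 100 is an arbitrary example value
    foo = models.CharField(max_length=100)
    bar = models.CharField(max_length=100)

str(Model.objects.all().query)
# SELECT "model"."id", "model"."foo", "model"."bar" FROM "model"
str(Model.objects.values_list("foo").query)
# SELECT "model"."foo" FROM "model"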
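The same contrast can be reproduced without Django. The following is a rough analogue using the standard-library sqlite3 module, assuming a hypothetical three-row table named model. It only illustrates the difference between the two SQL statements, not how the ORM builds its results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, foo TEXT, bar TEXT)")
conn.executemany(
    "INSERT INTO model (foo, bar) VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("c", "z")],
)

# Analogue of [m.foo for m in Model.objects.all()]: every column is fetched,
# then all but one is discarded in Python.
all_columns = conn.execute("SELECT id, foo, bar FROM model ORDER BY id").fetchall()
foo_from_all = [row[1] for row in all_columns]

# Analogue of list(Model.objects.values_list("foo", flat=True)): only the
# needed column is requested from the database.
foo_only = [row[0] for row in conn.execute("SELECT foo FROM model ORDER BY id")]
```

Both lists hold the same values, but the first statement transfers three columns per row and the second transfers one.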
I had also somewhat assumed the argument in the currently-accepted answer would be correct, namely that fetching fewer fields would lead to Model.objects.values_list('foo') taking less time to execute than Model.objects.all(). However, I didn't find this in practice when using %timeit.
I actually found that Model.objects.values_list('foo', flat=True) would take ~2-10x longer than just Model.objects.all(). I found this was the case for:
an empty Django table
a table with tens of rows
a table with millions of rows
Including or removing flat=True seemed to make no significant difference to the execution time of values_list. I would be interested to hear what others find as well.
This makes me think that the "what SQL is executed" point of view is not the whole story: although the values_list ORM query fetches fewer field values from the db, I imagine there is more logic within the Django source code of .all() vs .values_list() which could lead to different additional execution times (including .all() taking less time).
However, to fully address the initial example code, we would also need to factor in any further considerations affecting the execution time due to using a list comprehension [] in the .all() case VS list() in the .values_list() case. The general discussion of list() VS a list comprehension is covered in other questions already.
TL;DR: I imagine it is a trade-off between these two factors:
the apparent difference in execution time between .values_list() and .all() (my tests indicate that we can't simply deduce that fetching fewer fields leads to faster execution; finding the cause needs more investigation of the underlying Django source code)
any differences between using a list comprehension and list()
In my test cases, I generally found the .all() query was actually faster than the .values_list() query, but when also factoring in the transformation to a list, the .values_list scenario would overall take less time. So it may well depend on the scenario...

Apply Q object to one object

I have a complicated query in a Django model and I want to do two things:
(1) Get all objects that satisfy the query
(2) Check if one object satisfies the query
To do (1), I have a Q object encoding the query, and I just do
Model.objects.filter(THE_QUERY)
The query is something like
THE_QUERY = Q(field_1__isnull=False) & Q(field_2__gte=2) & Q(field3=0)
But I don't know how to reuse the query in THE_QUERY for (2). I want to have the predicate of the query in just one place and use that information to do (1) and (2), so that, if I ever have to change the query, both actions would do as expected.
Is there a way to put the query in just one place?
Model.objects.filter(THE_QUERY) returns an unevaluated queryset. You can extend this with extra conditions - in this case, you can add a filter to a specific ID and then an exists() call.
Model.objects.filter(THE_QUERY).filter(pk=my_object_id).exists()
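The idea of keeping the predicate in one place and narrowing it to a single primary key can be sketched outside Django as well. Below is a rough analogue in plain SQL via the standard-library sqlite3 module. The table layout, the PREDICATE constant, and the helpers matching_ids and object_matches are hypothetical names chosen for illustration; the column names mirror the ones in THE_QUERY.

```python
import sqlite3

# Hypothetical table mirroring the fields named in THE_QUERY.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model ("
    "id INTEGER PRIMARY KEY, field_1 TEXT, field_2 INTEGER, field3 INTEGER)"
)
conn.executemany(
    "INSERT INTO model (id, field_1, field_2, field3) VALUES (?, ?, ?, ?)",
    [(1, "a", 5, 0), (2, None, 5, 0), (3, "b", 1, 0), (4, "c", 2, 0)],
)

# The predicate is defined once and reused by both helpers below.
PREDICATE = "field_1 IS NOT NULL AND field_2 >= 2 AND field3 = 0"


def matching_ids(conn):
    """(1) All rows that satisfy the predicate."""
    sql = f"SELECT id FROM model WHERE {PREDICATE} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


def object_matches(conn, object_id):
    """(2) Does one specific row satisfy the same predicate?"""
    sql = f"SELECT EXISTS(SELECT 1 FROM model WHERE ({PREDICATE}) AND id = ?)"
    return bool(conn.execute(sql, (object_id,)).fetchone()[0])
```

Changing PREDICATE changes both helpers at once, which is the property the question asks for.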

How do I tell if a Django QuerySet has been evaluated?

I'm creating a Django queryset by hand and want to just use the Django ORM to read the resulting queryset.query SQL itself without hitting my DB.
I know Django querysets are lazy and I see all the ops that trigger a queryset being evaluated:
https://docs.djangoproject.com/en/1.10/ref/models/querysets/#when-querysets-are-evaluated
But... what if I just want to verify my code is purely building the queryset guts but ISN'T evaluating and hitting my DB yet inadvertently? Are there any attributes on the queryset object I can use to verify it hasn't been evaluated without actually evaluating it?
For querysets that use a select to return lists of model instances, like a basic filter or exclude, the _result_cache attribute is None if the queryset has not been evaluated, or a list of results if it has. The usual caveats about non-public attributes apply.
Note that printing a queryset - although the docs note calling repr() as an evaluation trigger - does not in fact evaluate the original queryset. Instead, internally the original queryset chains into a new one so that it can limit the amount of data printed without changing the limits of the original queryset. It's true that it evaluates a subset of the original queryset and therefore hits the DB, so that's another weakness of this approach if you're in fact trying to use it to monitor all DB traffic.
For other querysets (count, delete, etc) I'm not sure there is a simple way. Maybe watch your database logs, or run in DEBUG mode and check connection.queries as described here:
https://docs.djangoproject.com/en/dev/faq/models/#how-can-i-see-the-raw-sql-queries-django-is-running

Django another optimizing save()

In the process of optimizing queries in my app I noticed something strange. In a given section of code I get the object, update some values and then save. In theory this should execute 2 queries, but in fact it's executing 3: 1 select query when I get the object and 2 when I save the object (another select and then the update!). Removing one query may seem silly, but in this particular method I am updating many objects, so every query I save is one less hit on the db and should speed up the method.
Inspecting the queries shows that the two select queries are different: the first fetches many columns, while the select executed by the save is simple.
Here is the example code:
myobject = room.myobjects.get(id=myobject_id) # one query executed here
myobject.color = color
myobject.shape = shape
myobject.place = place
myobject.save() # two queries executed here
queries:
1) "SELECT `rooms_object`.`id`, `rooms_object`.`room_id`, ......FROM `rooms_object` WHERE (`rooms_object`.`id` = %s AND `rooms_object`.`room_id` = %s )"
2) "SELECT (1) AS `a` FROM `rooms_object` WHERE `rooms_object`.`id` = %s LIMIT 1"
3) "UPDATE ......this ones obvious"
I want the save method to recognize it already has the object in memory and it does not need to get it again....if that is even possible...
The second query is not actually pulling down the object again. It is doing an extremely fast "existence" check on the id before performing an UPDATE query. All that is returned from that query is a single 1, and the field is indexed, so it should be extremely efficient.
The ORM is designed this way for the following reason. First it looks at your object to see if it currently has an ID. If it does, it runs the SELECT to make sure the row really does still exist in the database. If it does, it performs the UPDATE. If somehow the record does not exist, it performs an INSERT. You can test this by creating the object, then deleting the row manually from your database without Django knowing, and then calling save().
This is how it works to make sure django maintains consistency.
If it were a new object, you would only get a single INSERT query, because it knows the object has no id right now.
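The check-then-write sequence described in this answer can be sketched with the standard-library sqlite3 module. The save function, the statements log, and the reduced rooms_object table below are hypothetical and exist only to illustrate the described sequence; this is not Django's save() code, and the exact statements Django issues may differ between versions.

```python
import sqlite3


def save(conn, row, statements):
    """Toy version of the described logic; NOT Django's actual save()."""
    if row.get("id") is None:
        # New object: a single INSERT, no existence check needed.
        statements.append("INSERT")
        cur = conn.execute(
            "INSERT INTO rooms_object (color) VALUES (?)", (row["color"],)
        )
        row["id"] = cur.lastrowid
        return
    # Object already has an id: cheap existence check on the primary key.
    statements.append("SELECT")
    exists = conn.execute(
        "SELECT 1 FROM rooms_object WHERE id = ? LIMIT 1", (row["id"],)
    ).fetchone()
    if exists:
        statements.append("UPDATE")
        conn.execute(
            "UPDATE rooms_object SET color = ? WHERE id = ?",
            (row["color"], row["id"]),
        )
    else:
        # The row vanished from the table, so fall back to an INSERT.
        statements.append("INSERT")
        conn.execute(
            "INSERT INTO rooms_object (id, color) VALUES (?, ?)",
            (row["id"], row["color"]),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms_object (id INTEGER PRIMARY KEY, color TEXT)")

log = []
obj = {"id": None, "color": "red"}
save(conn, obj, log)   # new object
obj["color"] = "blue"
save(conn, obj, log)   # existing row
conn.execute("DELETE FROM rooms_object WHERE id = ?", (obj["id"],))
save(conn, obj, log)   # row deleted behind the object's back
final_rows = conn.execute("SELECT id, color FROM rooms_object").fetchall()
```

The log shows one statement for the new object and two for each later save, matching the query counts in the question.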
This is controlled by the force_update parameter of
Model.save([force_insert=False, force_update=False, using=DEFAULT_DB_ALIAS, update_fields=None])
Set force_update to True to disable existence checking ("SELECT (1) AS a FROM...").
https://docs.djangoproject.com/en/dev/ref/models/instances/

How does Django go about filtering an evaluated queryset?

I've cached a common queryset, which I would like to filter based off of different fields depending on the situation. I'm wondering if by filtering an evaluated queryset if I lose the advantage of caching it in the first place; does Django just create another queryset from scratch that's an aggregate of the querysets involved in creating the cached queryset and the filter that I apply afterwards?
Yes, the results get thrown out.
You can see this from the source: filter() calls _filter_or_exclude(), which calls _clone() and then adds to its query. _clone, you can see, doesn't set the _result_cache attribute.
In general, it's not really clear what it could possibly do to keep the common results. If it's a complicated query with a small result set, it could be replaced by just issuing SQL that checks that the primary key is one of the results you've found, but that's not always going to be more efficient, and in some situations it would confusingly mess with the semantics (if the DB changes in a way that affects the query results in the time between when it's cached and when you do the additional filter).
If you want to force this behavior of saving the IDs manually, you can do that:
# list() evaluates the query once, so the IDs are held in Python
pks = list(SomeObject.objects.filter(...).values_list('pk', flat=True))
some_of_them = SomeObject.objects.filter(pk__in=pks).filter(...)
others = SomeObject.objects.filter(pk__in=pks).filter(...)
You can also of course just do the filtering in Python, e.g. by
common = SomeObject.objects.filter(...)
some_of_them = [m for m in common if m.attribute == 'foo']
others = [m for m in common if m.other_attribute == 'bar']
(You could also use filter(lambda m: m.attribute == 'foo', common) if you preferred, or wrap the definition of common in list to be more explicit.)
Whether one of these approaches or reissuing the query is faster depends a lot on the size of the sets involved, the complexity of the filters, and what indices are present.