I am trying to find out if it is safe to do the following:
items = MyModel.objects.filter(q)
if items:
    list(items)
I know that the QuerySet is being evaluated in the if statement, checking if the database returns an empty set. But I want to know if the value of that evaluation is reused when doing list(items). Is the QuerySet being evaluated here too, or is it using the previously evaluated one?
I know I could just do the following:
items = MyModel.objects.filter(q).all()
if items:
    list(items)
And this would result in one evaluation, but I am just trying to find out what behavior the first variation has. I have gone through these pieces of documentation (1 2) but couldn't really find a straight answer to this matter.
No, it will not execute the query twice (internally .filter(), .all() and .filter().all() are the same). You can check it in the Django shell itself:
from django.db import connection
print(connection.queries)
items = MyModel.objects.filter(q).all()  # or MyModel.objects.filter(q)
if items:
    list(items)
print(connection.queries)
Now, here is the magic of .all():
queryset = MyModel.objects.all()  # no db hit, hit=0
print(list(queryset))             # db hit, hit=1
print(list(queryset))             # no db hit, hit=1
print(list(queryset.all()))       # db hit, hit=2
print(list(queryset.all()))       # db hit, hit=3
That means calling .all() on an already evaluated queryset forces another db hit.
As the Django docs put it: when a QuerySet is evaluated, it typically caches its results. If the data in the database might have changed since a QuerySet was evaluated, you can get updated results for the same query by calling all() on a previously evaluated QuerySet.
It will reuse its cache, because when you do
if items:
it calls the __bool__ method:
def __bool__(self):
    self._fetch_all()
    return bool(self._result_cache)
So, as you can see, __bool__ calls _fetch_all(), which caches the data:
def _fetch_all(self):
    if self._result_cache is None:
        self._result_cache = list(self.iterator())
    if self._prefetch_related_lookups and not self._prefetch_done:
        self._prefetch_related_objects()
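In other words, the pattern from the question only hits the database once. A quick, hedged way to verify this yourself (assuming DEBUG=True so that connection.queries is populated):

from django.db import connection, reset_queries

reset_queries()
items = MyModel.objects.filter(q)   # no query yet
if items:                           # __bool__ -> _fetch_all(), one query, result cached
    list(items)                     # served from _result_cache, no new query
print(len(connection.queries))      # expected: 1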
For better performance, do:
items = MyModel.objects.filter(q)  # no evaluation
if items.exists():                 # evaluates, hits the db
    ...                            # do stuff here; further evaluation hits the db again
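Note that exists() pays off when you only need the boolean and never fetch the objects themselves; a small, hedged sketch (send_notification() is a hypothetical placeholder, not from the question):

if MyModel.objects.filter(q).exists():  # cheap query, fetches at most one row
    send_notification()                 # hypothetical action; no model rows were loaded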
PostgreSQL supports .distinct('field_name') database queries, but SQLite doesn't, so I created a try/except block which should run a simpler query if the user is using sqlite3.
try:
    qs = qs.filter(tag__istartswith=self.q).order_by('tag').distinct('tag')
except NotImplementedError:
    qs = qs.filter(tag__istartswith=self.q)
So if the user is using SQLite I would expect the simple query in the except block to get executed; however, the exception is thrown and the simple query never gets executed:
raise NotImplementedError('DISTINCT ON fields is not supported by this database backend')
NotImplementedError: DISTINCT ON fields is not supported by this database backend
Do you have any idea why this isn't working as expected?
Thanks
Because querysets are lazy: the error is not raised while constructing the query, but only when you actually evaluate it.
You can try forcing your query to be evaluated to catch the exception:
try:
    qs = qs.filter(tag__istartswith=self.q).order_by('tag').distinct('tag')
    dummy_boolean_var = qs.exists()
except NotImplementedError:
    qs = qs.filter(tag__istartswith=self.q)
EDIT: Apparently my untested version did not work, because by the time the exception is raised qs has already been reassigned to the query that includes distinct(), so the except clause builds on the modified query. This is the working version as tested by the OP:
try:
    qa = qs
    qs = qs.filter(tag__istartswith=self.q).order_by('tag').distinct('tag')
    dummy_boolean_var = qs.exists()
except NotImplementedError:
    qs = qa
    qs = qs.filter(tag__istartswith=self.q)
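An alternative sketch, if you prefer not to force an early evaluation: branch on the database backend up front via connection.vendor (this is an assumption about your setup, not part of the original answer):

from django.db import connection

qs = qs.filter(tag__istartswith=self.q)
if connection.vendor == 'postgresql':
    # DISTINCT ON fields is only supported by PostgreSQL
    qs = qs.order_by('tag').distinct('tag')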
Here is some sample code in Django.
[Case 1]
views.py
from sampleapp.models import SampleModel
from django.core.cache import cache
def get_filtered_data(request):
    result = cache.get("result")
    # make cache if result does not exist
    if not result:
        result = SampleModel.objects.filter(field_A="foo")
        cache.set("result", result)
    return render_to_response('template.html', locals(), context_instance=RequestContext(request))
template.html
{% for case in result %}
    <p>{{ case.field_A }}</p>
{% endfor %}
In this case, no query is generated after the cache has been made. I checked this with django-debug-toolbar.
[Case 2]
views.py - added one line result = result.order_by('?')
from sampleapp.models import SampleModel
from django.core.cache import cache
def get_filtered_data(request):
    result = cache.get("result")
    # make cache if result does not exist
    if not result:
        result = SampleModel.objects.filter(field_A="foo")
        cache.set("result", result)
    result = result.order_by('?')
    return render_to_response('template.html', locals(), context_instance=RequestContext(request))
template.html - same as previous one
In this case, it generated a new query even though I had cached the filtered query.
How can I apply random ordering without an additional query?
I can't put order_by('?') in the query when making the cache
(e.g. result = SampleModel.objects.filter(field_A="foo").order_by('?'))
because then it would cache the random order as well.
Is this related to the fact that Django querysets are lazy?
Thanks in advance.
.order_by performs the sorting at the database level.
Here is an example. We store the lazy queryset in the variable results. No query has been made yet:
results = SampleModel.objects.filter(field_A="foo")
Touch the results, for example, by iterating over them:
for r in results:  # here the query is sent to the database
    # ...
Now, if we do it again, no query will be sent to the database, as we already have the results of this exact query:
for r in results:  # no query sent to the database
    # ...
But when you apply .order_by, the query is different, so Django has to send a new request to the database:
for r in results.order_by('?'):  # a new query is sent to the database
    # ...
Solution
When you make the query in Django and you know that you will need all elements from that query (i.e., no OFFSET and LIMIT), then you can process those elements in Python after you get them from the database.
results = list(SampleModel.objects.filter(field_A="foo"))  # the queryset is evaluated and converted to a list here
At that line the query is made and you have all elements in results.
If you need to get random order, do it in python now:
from random import shuffle
shuffle(results)
After that, results will have random order without additional query being send to database.
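Applied to the original view, a rough sketch could look like this (names reused from the question; caching the evaluated list rather than the lazy queryset, with an arbitrary 600-second timeout as an assumption):

from random import shuffle
from django.core.cache import cache
from django.shortcuts import render_to_response
from django.template import RequestContext
from sampleapp.models import SampleModel

def get_filtered_data(request):
    result = cache.get("result")
    if result is None:
        # cache the evaluated list, not the lazy queryset
        result = list(SampleModel.objects.filter(field_A="foo"))
        cache.set("result", result, 600)
    shuffle(result)  # random order per request, no extra query
    return render_to_response('template.html', {'result': result},
                              context_instance=RequestContext(request))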
I have a function that searches with Haystack, and I need to get the comments for each object that Haystack returns in the array. I have this:
def search(request):
    if 'q' in request.GET and request.GET['q']:
        q = request.GET['q']
        results = SearchQuerySet().auto_query(q)
        things = []
        for r in results:
            things.append(r.object)
        return render_to_response('resultados.html',
                                  {'things': things, 'query': q}, context_instance=RequestContext(request))
How do I append to the results the number of comments that each object has?
If I add annotate, the debugger throws: SearchQuerySet has no 'annotate' attribute
SearchQuerySet isn't the ORM QuerySet you're familiar with; it only imitates it. Annotations don't make sense with search engines anyway; you need to put already-prepared data into the index.
Just make another query using the ORM.
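A minimal sketch of that idea, assuming the indexed model is called Thing and its comments are reachable through a reverse relation named comments (both names are assumptions, not taken from the question):

from django.db.models import Count

pks = [r.pk for r in results]  # primary keys of the objects Haystack returned
things = (Thing.objects
               .filter(pk__in=pks)
               .annotate(num_comments=Count('comments')))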
I'm trying to cache access to the Django profile object. I'm using django-redis-cache to cache data in this project. I'm using a snippet for automatically creating a profile if one does not exist. Here is a simplified version of what I am doing (without caching):
User.profile = property(lambda u: UserProfile.objects.get_or_create(user=u)[0])
Whenever profile information is needed, the user.profile property is accessed. That works as expected, however, when I try to cache the profile property, such as in Exhibit 1, I still see SQL queries (in django-debug-toolbar) that are selecting the profile and are not taking advantage of the cache.
Specifically, the cache_single_object() function from Exhibit 2 checks whether a cached value is available. If it is, it returns the value stored under the cache key; if not, it takes the query passed to it (via the "query" argument) and caches the result.
cache_single_object() prints "Hit" or "Miss" to indicate a cache hit or miss. After refreshing twice, everything is reported as a hit (as expected). However, django-debug-toolbar still shows no reduction in query count and still shows queries selecting the profile.
Does anyone have any advice as to how to ensure that the user.profile pulls a cached version of the profile when available? Thanks for reading.
Exhibit 1: myproject/myapp/models.py
def get_or_create_profile(u):
    return cache_utils.cache_single_object(
        "user_get_or_create_profile",
        u.id, UserProfile.objects.get_or_create(user=u)[0])
User.profile = property(lambda u: cache_utils.cache_single_object(
    "user_get_or_create_profile", u.id,
    get_or_create_profile(u)))
Exhibit 2: myproject/cache_utils.py
def cache_single_object(key_prefix, id, query, timeout=500):
    key = '%s_%s' % (key_prefix, id)
    object_list = cache.get(key, None)
    if object_list is None:
        print("Miss %s" % key)
        object_list = query
        cache.set(key, object_list, timeout)
    else:
        print("Hit %s" % key)
    return object_list
Exhibit 3: myproject/templates/mytemplate.py
<div>Example of what's in the template </div>
{{ myobject.owner.profile.bio }}
I think the problem is related to the way you defined your property:
User.profile = property(lambda u: cache_utils.cache_single_object(
    "user_get_or_create_profile", u.id,
    get_or_create_profile(u)))
When you access the profile property, you will always call the method get_or_create_profile(u), which in turn calls:
return cache_utils.cache_single_object(
    "user_get_or_create_profile",
    u.id, UserProfile.objects.get_or_create(user=u)[0])
Having UserProfile.objects.get_or_create(user=u) there is what creates the query every single time, even when you already have the data in the cache: the argument is evaluated before cache_single_object() is even called. I think you should use a util method where you don't evaluate the query every time you call it. Maybe something like this: https://stackoverflow.com/a/2216326/234304
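A minimal sketch of that idea, assuming cache_single_object() is changed to accept a callable so the database is only touched on a cache miss:

# cache_utils.py -- accept a callable instead of an evaluated query
def cache_single_object(key_prefix, id, get_value, timeout=500):
    key = '%s_%s' % (key_prefix, id)
    obj = cache.get(key)
    if obj is None:
        obj = get_value()  # only evaluated on a cache miss
        cache.set(key, obj, timeout)
    return obj

# models.py -- pass a lambda so get_or_create only runs when the cache misses
User.profile = property(lambda u: cache_utils.cache_single_object(
    "user_get_or_create_profile", u.id,
    lambda: UserProfile.objects.get_or_create(user=u)[0]))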
I read here that Django querysets are lazy: a queryset won't be evaluated until it is actually used. I have made a simple pagination using Django's built-in pagination. I didn't realize there were already apps such as "django-pagination" and "django-endless" which do that job for you.
Anyway, I wonder whether the QuerySet is still lazy when, for example, I do this:
entries = Entry.objects.filter(...)
paginator = Paginator(entries, 10)
output = paginator.page(page)
return HttpResponse(output)
And this part is called every time I want to get whichever page I currently want to view.
I need to know since I don't want unnecessary load to the database.
If you want to see where queries are occurring, import django.db.connection and inspect its queries attribute:
>>> from django.db import connection
>>> from django.core.paginator import Paginator
>>> queryset = Entry.objects.all()
Let's create the paginator and see if any queries occur:
>>> paginator = Paginator(queryset, 10)
>>> print(connection.queries)
[]
None yet.
>>> page = paginator.page(4)
>>> page
<Page 4 of 788>
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}]
Creating the page has produced one query, to count how many entries are in the queryset. The entries have not been fetched yet.
Assign the page's objects to the variable 'objects':
>>> objects = page.object_list
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}]
This still hasn't caused the entries to be fetched.
Generate the HttpResponse from the object list:
>>> response = HttpResponse(page.object_list)
>>> print(connection.queries)
[{'time': '0.014', 'sql': 'SELECT COUNT(*) FROM `entry`'}, {'time': '0.011', 'sql': 'SELECT `entry`.`id`, <snip> FROM `entry` LIMIT 10 OFFSET 30'}]
Finally, the entries have been fetched.
It is. Django's pagination uses the same rules/optimizations that apply to querysets.
This means it will start evaluating on return HttpResponse(output).
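A rough illustration of why, using the page size and page number from the example above (the comments describe what I would expect to happen under those assumptions):

entries = Entry.objects.all()       # lazy, no query yet
paginator = Paginator(entries, 10)  # still lazy
page = paginator.page(4)            # one COUNT(*) query to validate the page number
# page.object_list behaves like entries[30:40]: the LIMIT/OFFSET query only runs
# once something iterates it, e.g. HttpResponse(page.object_list)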