What is the recommended idiom for checking whether a query returned any results?
Example:
orgs = Organisation.objects.filter(name__iexact = 'Fjuk inc')
# If any results
# Do this with the results without querying again.
# Else, do something else...
I suppose there are several different ways of checking this, but I'd like to know how an experienced Django user would do it.
Most examples in the docs just ignore the case where nothing was found...
if not orgs:
# The Queryset is empty ...
else:
# The Queryset has results ...
Since version 1.2, Django has QuerySet.exists() method which is the most efficient:
if orgs.exists():
# Do this...
else:
# Do that...
But if you are going to evaluate QuerySet anyway it's better to use:
if orgs:
...
For more information read QuerySet.exists() documentation.
To check the emptiness of a queryset:
if orgs.exists():
# Do something
or you can check for a the first item in a queryset, if it doesn't exist it will return None:
if orgs.first():
# Do something
If you have a huge number of objects, this can (at times) be much faster:
try:
orgs[0]
# If you get here, it exists...
except IndexError:
# Doesn't exist!
On a project I'm working on with a huge database, not orgs is 400+ ms and orgs.count() is 250ms. In my most common use cases (those where there are results), this technique often gets that down to under 20ms. (One case I found, it was 6.)
Could be much longer, of course, depending on how far the database has to look to find a result. Or even faster, if it finds one quickly; YMMV.
EDIT: This will often be slower than orgs.count() if the result isn't found, particularly if the condition you're filtering on is a rare one; as a result, it's particularly useful in view functions where you need to make sure the view exists or throw Http404. (Where, one would hope, people are asking for URLs that exist more often than not.)
The most efficient way (before django 1.2) is this:
if orgs.count() == 0:
# no results
else:
# alrigh! let's continue...
I disagree with the predicate
if not orgs:
It should be
if not orgs.count():
I was having the same issue with a fairly large result set (~150k results). The operator is not overloaded in QuerySet, so the result is actually unpacked as a list before the check is made. In my case execution time went down by three orders.
You could also use this:
if(not(orgs)):
#if orgs is empty
else:
#if orgs is not empty
Related
Take any given queryset, qs = QS.objects.filter(active=True)
Is there i difference between:
if qs:
and
if qs.exists():
regarding load on the db, etc?
Yes, there's a difference:
if qs will use the __nonzero__ method of the QuerySet object, which calls _fetch_all which will in turn actually execute a full query (that's how I interpret it anyway).
exists() does something more efficient, as noted by Ewan. That's why this method... exists.
So, in short, use exists() when you only need to check for existence since that's what it's for.
From the documentation of exists()
Returns True if the QuerySet contains any results, and False if not. This tries to perform the query in the simplest and fastest way possible, but it does execute nearly the same query as a normal QuerySet query.
exists() is useful for searches relating to both object membership in a QuerySet and to the existence of any objects in a QuerySet, particularly in the context of a large QuerySet.
However they then go on to show a few examples and conclude that if qs vs if qs.exists() needs a large queryset for efficiency gains.
A final caveat from the documentation:
Additionally, if a some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.
It produces the same result. From the help on https://docs.djangoproject.com/en/dev/ref/models/querysets/
exists()
Returns True if the QuerySet contains any results, and False if not.
bool()
Testing a QuerySet in a boolean context, ..., will cause the query to be executed. If there is at least one result, the QuerySet is True, otherwise False.
I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.
The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.
You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)
when got a QuerySet by using filter and it's easy to use the following code to do the change and save operation:
qs = SomeModel.objects.filter(owner_id=123)
# suppose qs has 1 or many elements
last_login_time = qs[0].last_login_time
qs[0].last_login_time = datetime.now() # I expect it can assign the new value, but it won't
assertEquals(qs[0].last_login_time, last_login_time) # YES, it doesn't change
qs[0].save() #So it won't update the old record
And after figuring this out, the following code will be used instead and it works:
qs = SomeModel.objects.filter(owner_id=123)
# suppose qs has 1 or many elements
obj = qs[0]
last_login_time = obj.last_login_time
obj.last_login_time = datetime.now() # I expect it can assign the new value, but it will
assertNotEquals(obj.last_login_time, last_login_time) # YES, it does change
obj.save() #So it will update the old record as expected
And I have met some of my friends/colleagues use the first approach to do the record updating. And IMO, it's natural and prone to use. (when you type qs[0] and type obj , they have the same type)
After reading the code(db.models.query), it can be figured out why.(when you subscript the QuerySet it will use the qs = self._clone() and assigning a value won't change at all)
Possible solutions:
make the assigning work for the
subscripting QuerySet
announce the
above first approach is wrong and
let the users know it
So I want to ask:
Is my question a real issue for django?(I'm wondering why django developer not make it work as expected)
What's your suggestion about this issue? And what's your preferred way for such an issue?
Use update for updating fields in a queryset.
I'm not really sure what you're asking here. Are you saying this is a bug? I don't think so, it's clearly defined behaviour: the queryset is lazy, but is evaluated when you iterate or slice it. Each time you do slice it, you get a new object. This is the logical consequence of the fact that slicing by itself doesn't cause the non-sliced queryset to be evaluated - if the result isn't already cached, slicing will perform a single database call with a LIMIT 1 to only get one result. Otherwise, you're left with extremely undesirable side-effects.
Now, if you think this could be better explained in the docs, you're welcome - and encouraged - to submit a bug with a patch that explains it better.
My lack of CS and inexperience is really coming to the forefront in this moment. I've never really handled filtering results server side. I'm thinking that this is not the right way to go about it. I'm using Django....
First, I assume that I can keep it DRYer by keeping this validation in my form definitions. Next, I was concerned about my chained filter statements. How important is it to use Q complex lookups as opposed to chaining filters at this point? I'm just building a prototype and I assume that I'll eventually have to go for a search solution more powerful than full text search.
My big issue right now (besides the length of the code and clearly the inefficiency) is that I'm not sure how to handle my rooms and workers inputs, which are select forms. If the user does not select a value, I want to remove these filters from the process server side. Should I just create two separate conditional series of lookups for these outcomes?
def search(request):
if request.method=='GET' and request.GET.get('region',''):
neighborhoods=request.GET.getlist('region')
min_rent=request.GET.get('min_cost','0')
min_rent=re.sub(r'[,]','',min_cost) #remove any ','
if re.search(r'[^\d]',min_cost):
min_cost=0
else:
min_cost=int(min_cost)
max_cost=request.GET.get('max_cost','0')
max_cost=re.sub(r'[,]','',max_cost) #remove any ','
if re.search(r'[^\d]',max_cost):
max_cost=100000
else:
max_cost=int(max_rent)
date_min=request.GET.get('from','')
date_max=request.GET.get('to','')
if not date_min:
date=(str(datetime.date.today()))
date_min=u'%s' %date
if not date_max:
date_max=u'2013-03-18'
rooms=request.GET.get('rooms',0)
if not rooms:
rooms=0
workers=request.GET.get('workers',0)
if not workers:
workers=0
#I should probably use Q objects here for complex lookups
posts=Post.objects.filter(region__in=region).filter(cost__gt=min_cost).filter(cost__lt=max_cost).filter(availability__gt=date_min).filter(availability__lt=date_max).filter(rooms=rooms).filter(workers=workers)
#return HttpResponse('%s' %posts)
return render_to_response("website/search.html",{'posts':posts),context_instance=RequestContext(request))
First, I assume that I can keep it
DRYer by keeping this validation in my
form definitions.
Yes, I'd put this in a form as it looks like you are using one to display the form anyways? Also, you can put a lot of your date formatting stuff right in the clean_FIELD methods to format the data in the cleaned_data dict. The only issue here is that output is actually modified so your users will see the change from 1,000 to 1000. Either way, I would put this logic in a form method.
# makes the view clean.
if form.is_valid():
form.get_posts(request)
return response
My big issue right now (besides the
length of the code and clearly the
inefficiency) is that I'm not sure how
to handle my rooms and workers inputs,
which are select forms. If the user
does not select a value, I want to
remove these filters from the process
server side. Should I just create two
separate conditional series of lookups
for these outcomes?
Q objects are only for complex lookups. I don't see a need for them here.
I also don't see why you need to chain the filters. I at first wondered if these are m2m, but these types of queries (__gt/__lt) don't behave any differently chaining as there is no overlap between the queries.
# this is more readable / concise.
# I'd combine as many of your queries as you can just for readability.
posts = Posts.objects.filter(
region__in=region,
cost__gte=min_cost,
# etc
)
Now, if you want optional arguments, my suggestion is to use a dictionary of keyword arguments so that you can dynamically populate the kwargs.
keyword_arguments = {
'region__in': region,
'cost__gte': min_cost,
'cost__lt': max_cost,
'availability__gt': date_min,
'availability__lt': date_max,
}
if request.GET.get('rooms'):
keyword_arguments['rooms'] = request.GET['rooms']
if request.GET.get('workers'):
keyword_arguments['workers'] = request.GET['workers']
posts = Posts.objects.filter(**keyword_arguments)
I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.