I'm confused about how distinct() works with Django queries - django

I have this query:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location)
Which works great for telling me how many checkins have happened in my date range for a specific location. But I want know how many checkins were done by unique users. So I tried this:
checkins = CheckinAct.objects.filter(time__range=[start, end], location=checkin.location).values('user').distinct()
But that doesn't work, I get back an empty Array. Any ideas why?
Here is my CheckinAct model:
class CheckinAct(models.Model):
user = models.ForeignKey(User)
location = models.ForeignKey(Location)
time = models.DateTimeField()
----Update------
So now I have updated my query to look like this:
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
But I'm still getting multiple objects back that have the same user, like so:
[{'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
---- Update 2------
Here is something else I tried, but I'm still getting lots of identical user objects back when I log the checkins object.
checkins = CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user')).values('user', 'dcount')
logger.info("checkins!!! : " + str(checkins))
Logs the following:
checkins!!! : [{'user': 15521L}, {'user': 15521L}, {'user': 15521L}]
Notice how there are 3 instances of the same user object. Is this working correctly or not? Is there a difference way to read out what comes back in the dict object? I just need to know how many unique users check into that specific location during the time range.

The answer is actually right in the Django docs. Unfortunately, very little attention is drawn to the importance of the particular part you need; so it's understandably missed. (Read down a little to the part dealing with Items.)
For your use-case, the following should give you exactly what you want:
checkins = CheckinAct.objects.filter(time__range=[start,end], location=checkin.location).\
values('user').annotate(checkin_count=Count('pk')).order_by()
UPDATE
Based on your comment, I think the issue of what you wanted to achieve has been confused all along. What the query above gives you is a list of the number of times each user checked in at a location, without duplicate users in said list. It now seems what you really wanted was the number of unique users that checked in at one particular location. To get that, use the following (which is much simpler anyways):
User.objects.filter(checkinat__location=location).distinct().count()
UPDATE for non-rel support
checkin_users = [(c.user.pk, c.user) for c in CheckinAct.objects.filter(location=location)]
unique_checkins = len(dict(checkin_users))
This works off the principle that dicts have unique keys. So when you convert the list of tuples to a dict, you end up with a list of unique users. But, this will generate 1*N queries, where N is the total amount of checkins (one query each time the user attribute is used. Normally, I'd do something like .select_related('user'), but that too requires a JOIN, which is apparently out. JOINs not being supported seems like a huge downside to non-rel, if true, but if that's the case this is going to be your only option.

You don't want DISTINCT. You actually want Django to do something that will end up giving you a GROUP BY clause. You are also correct that your final solution is to combine annotate() and values(), as discussed in the Django documentation.
What you want to do to get your results is to use annotate first, and then values, such as:
CheckinAct.objects.filter(
time__range=[start, end],
location=checkin.location,
).annotate(dcount=Count('user').values('user', 'dcount')
The Django docs at the link I gave you above show a similarly constructed query (minus the filter aspect, which I added for your case in the proper location), and note that this will "now yield one unique result for each [checkin act]; however, only the [user] and the [dcount] annotation will be returned in the output data". (I edited the sentence to fit your case, but the principle is the same).
Hope that helps!

checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(dcount=Count('user'))
If I am not mistaken, wouldn't the value you want be in the input as "dcount"? As a result, isn't that just being discarded when you decide to output the user value alone?
Can you tell me what happens when you try this?
checkins = CheckinAct.objects.values('user').\
filter(time__range=[start, end], location=checkin.location).\
annotate(Count('user')).order_by()
(The last order_by is to clear any built-in ordering that you may already have at the model level - not sure if you have anything like that, but doesn't hurt to ask...)

Related

Django Query without referencing fields

Quick overview:
I have a Forecast model setup which has a workflow_state.
Now I'm trying to query the Forecast for all the forecasts that are in a certain state AND the current person logged in is_staff.
If i was writing a raw query this wouldn't be an issue because i could write something like:
SELECT * FROM forecast WHERE forecast.workflow_state_id in (1,2,3,4) AND 1 = user.is_staff
However, when trying to write this in a queryset I can't figure out how to reference a constant. I don't want to write a raw queryset and if possible want to avoid using the extra field.
Any suggestions would be appreciated. Thanks
Your constant is only a True
Forecast.objects.filter(workflow_state__in=[1,2,3,4], user__is_staf=True)
Your edit makes things rather less than clear, but you seem to be asking how to do a check on the current logged-in user, rather than on the user referenced by the model. In which case, you don't do that in a query at all; your example SQL statement wouldn't work, and neither would doing it in the ORM. You do it in Python, of course:
if request.user.is_staff:
forecasts = Forecast.objects.filter(workflow_state__in=[1,2,3,4])

Django views - optimum query set in a ForeignKey model

Having the model:
class Notebook(models.Model):
n_id = models.AutoField(primary_key = True)
class Note(models.Model):
b_nbook = models.ForeignKey(Notebook)
the URL pattern passing one parameter:
(r'^(?P<n_id>\d+)/$', 'notebook_notes')
and the following view:
def notebook_notes(request, n_id):
nbook = get_object_or_404(Nbook, pk=n_id)
...
which of the following is the optimum query set, and why? (they both work and pass the notes based to a selected by URL notebook)
notes = nbook.note_set.filter(b_nbook = n_id)
notes = Note.objects.select_related().filter(b_nbook = n_id)
Well you're comparing apples and oranges a bit there. They may return virtually the same, but you're doing different things on both.
Let's take the relational version first. That query is saying get all the notes that belong to nbook. You're then filtering that queryset by only notes that belong to nbook. You're filtering it twice on the same criteria, in effect. Since Django's querysets are lazy, it doesn't really do anything bad, like hit the database multiple times, but it's still unnecessary.
Now, the second version. Here, you're starting with all notes and filtering to just those that belong to the particular notebook. There's only one filter this time, but it's bad form to do it this way. Since it's a relation, you should look it up through the relational format, i.e. nbook.note_set.all(). On this version, though, you're also using select_related(), which wasn't used on the other version.
select_related will attempt to create a join table with any other relations on the model, in this case a Note. However, since the only relation on Note is Notebook and you already have the notebook, it's redundant.
Taking out all the redundancy in those two version leaves you with just:
notes = nbook.note_set.all()
That, too, will return exactly the same results as the other two version, but is much cleaner and standardized.

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.

Django most efficient way to do this?

I have developed a few Django apps, all pretty straight-forward in terms of how I am interacting with the models.
I am building one now that has several different views which, for lack of a better term, are "canned" search result pages. These pages all return results from the same model, but they are filtered on different columns. One page we might be filtering on type, another we might be filtering on type and size, and on yet another we may be filtering on size only, etc...
I have written a function in views.py which is used by each of these pages, it takes a kwargs and in that are the criteria upon which to search. The minimum is one filter but one of the views has up to 4.
I am simply seeing if the kwargs dict contains one of the filter types, if so I filter the result on that value (I just wrote this code now, I apologize if any errors, but you should get the point):
def get_search_object(**kwargs):
q = Entry.objects.all()
if kwargs.__contains__('the_key1'):
q = q.filter(column1=kwargs['the_key1'])
if kwargs.__contains__('the_key2'):
q = q.filter(column2=kwargs['the_key2'])
return q.distinct()
Now, according to the django docs (http://docs.djangoproject.com/en/dev/topics/db/queries/#id3), these is fine, in that the DB will not be hit until the set is evaluated, lately though I have heard that this is not the most efficient way to do it and one should probably use Q objects instead.
I guess I am looking for an answer from other developers out there. My way currently works fine, if my way is totally wrong from a resources POV, then I will change ASAP.
Thanks in advance
Resource-wise, you're fine, but there are a lot of ways it can be stylistically improved to avoid using the double-underscore methods and to make it more flexible and easier to maintain.
If the kwargs being used are the actual column names then you should be able to pretty easily simplify it since what you're kind of doing is deconstructing the kwargs and rebuilding it manually but for only specific keywords.
def get_search_object(**kwargs):
entries = Entry.objects.filter(**kwargs)
return entries.distinct()
The main difference there is that it doesn't enforce that the keys be actual columns and pretty badly needs some exception handling in there. If you want to restrict it to a specific set of fields, you can specify that list and then build up a dict with the valid entries.
def get_search_object(**kwargs):
valid_fields = ['the_key1', 'the_key2']
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
If you want a fancier solution that just checks that it's a valid field on that model, you can (ab)use _meta:
def get_search_object(**kwargs):
valid_fields = [field.name for field in Entry._meta.fields]
filter_dict = {}
for key in kwargs:
if key in valid_fields:
filter_dict[key] = kwargs[key]
entries = Entry.objects.filter(**filter_dict)
return entries.distinct()
In this case, your usage is fine from an efficiency standpoint. You would only need to use Q objects if you needed to OR your filters instead of AND.

How do I get the related objects In an extra().values() call in Django?

Thank to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
country = Country.objects.get(name=country_name)
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(select={'count': 'count(1)'},
order_by=['-count']).values('count', 'face_value', 'currency_type')
coin_values.query.group_by = ['currency_type_id', 'face_value']
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id,
type=1).extra(
select={'count': 'count(1)'},
order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
def coins_by_country(request, country_name):
country = get_object_or_404(Country, name=country_name)
cursor = connection.cursor()
cursor.execute('SELECT count(*), face_value, collection_currency.name FROM collection_collectible, collection_currency WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 group by face_value, collection_currency.name', [country.id] )
coin_values = cursor.fetchall()
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
Have you tried select_related() http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot it seems to work well then you can go coin_values.currency.name.
Also I dont think you need to do country=country.id in your filter, just country=country but I am not sure what difference that makes other than less typing.