Django get count of each age - django

I have this model:
class User_Data(AbstractUser):
date_of_birth = models.DateField(null=True,blank=True)
city = models.CharField(max_length=255,default='',null=True,blank=True)
address = models.TextField(default='',null=True,blank=True)
gender = models.TextField(default='',null=True,blank=True)
And I need to run a django query to get the count of each age. Something like this:
Age || Count
10 || 100
11 || 50
and so on.....

Here is what I did with lambda:
usersAge = map(lambda x: calculate_age(x[0]), User_Data.objects.values_list('date_of_birth'))
users_age_data_source = [[x, usersAge.count(x)] for x in set(usersAge)]
users_age_data_source = sorted(users_age_data_source, key=itemgetter(0))

There's a few ways of doing this. I've had to do something very similar recently. This example works in Postgres.
Note: I've written the following code the way I have so that syntactically it works, and so that I can write between each step. But you can chain these together if you desire.
First we need to annotate the queryset to obtain the 'age' parameter. Since it's not stored as an integer, and can change daily, we can calculate it from the date of birth field by using the database's 'current_date' function:
ud = User_Data.objects.annotate(
age=RawSQL("""(DATE_PART('year', current_date) - DATE_PART('year', "app_userdata"."date_of_birth"))::integer""", []),
)
Note: you'll need to change the "app_userdata" part to match up with the table of your model. You can pick this out of the model's _meta, but this just depends if you want to make this portable or not. If you do, use a string .format() to replace it with what the model's _meta provides. If you don't care about that, just put the table name in there.
Now we pick the 'age' value out so that we get a ValuesQuerySet with just this field
ud = ud.values('age')
And then annotate THAT queryset with a count of age
ud = ud.annotate(
count=Count('age'),
)
At this point we have a ValuesQuerySet that has both 'age' and 'count' as fields. Order it so it comes out in a sensible way..
ud = ud.order_by('age')
And there you have it.
You must build up the queryset in this order otherwise you'll get some interesting results. i.e; you can't group all the annotates together, because the second one for count depends on the first, and as a kwargs dict has no notion of what order the kwargs were defined in, when the queryset does field/dependency checking, it will fail.
Hope this helps.
If you aren't using Postgres, the only thing you'll need to change is the RawSQL annotation to match whatever database engine it is that you're using. However that engine can get the year of a date, either from a field or from its built in "current date" function..providing you can get that out as an integer, it will work exactly the same way.

Related

Django ORM. Filter many to many with AND clause

With the following models:
class Item(models.Model):
name = models.CharField(max_length=255)
attributes = models.ManyToManyField(ItemAttribute)
class ItemAttribute(models.Model):
attribute = models.CharField(max_length=255)
string_value = models.CharField(max_length=255)
int_value = models.IntegerField()
I also have an Item which has 2 attributes, 'color': 'red', and 'size': 3.
If I do any of these queries:
Item.objects.filter(attributes__string_value='red')
Item.objects.filter(attributes__int_value=3)
I will get Item returned, works as I expected.
However, if I try to do a multiple query, like:
Item.objects.filter(attributes__string_value='red', attributes__int_value=3)
All I want to do is an AND. This won't work either:
Item.objects.filter(Q(attributes__string_value='red') & Q(attributes__int_value=3))
The output is:
<QuerySet []>
Why? How can I build such a query that my Item is returned, because it has the attribute red and the attribute 3?
If it's of any use, you can chain filter expressions in Django:
query = Item.objects.filter(attributes__string_value='red').filter(attributes__int_value=3')
From the DOCS:
This takes the initial QuerySet of all entries in the database, adds a filter, then an exclusion, then another filter. The final result is a QuerySet containing all entries with a headline that starts with “What”, that were published between January 30, 2005, and the current day.
To do it with .filter() but with dynamic arguments:
args = {
'{0}__{1}'.format('attributes', 'string_value'): 'red',
'{0}__{1}'.format('attributes', 'int_value'): 3
}
Product.objects.filter(**args)
You can also (if you need a mix of AND and OR) use Django's Q objects.
Keyword argument queries – in filter(), etc. – are “AND”ed together. If you need to execute more complex queries (for example, queries with OR statements), you can use Q objects.
A Q object (django.db.models.Q) is an object used to encapsulate a
collection of keyword arguments. These keyword arguments are specified
as in “Field lookups” above.
You would have something like this instead of having all the Q objects within that filter:
** import Q from django
from *models import Item
#assuming your arguments are kwargs
final_q_expression = Q(kwargs[1])
for arg in kwargs[2:..]
final_q_expression = final_q_expression & Q(arg);
result = Item.objects.filter(final_q_expression)
This is code I haven't run, it's out of the top of my head. Treat it as pseudo-code if you will.
Although, this doesn't answer why the ways you've tried don't quite work. Maybe it has to do with the lookups that span relationships, and the tables that are getting joined to get those values. I would suggest printing yourQuerySet.query to visualize the raw SQL that is being formed and that might help guide you as to why .filter( Q() & Q()) is not working.

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.

chain filter and exclude on django model with field lookups that span relationships

I have the following models:
class Order_type(models.Model):
description = models.CharField()
class Order(models.Model):
type= models.ForeignKey(Order_type)
order_date = models.DateField(default=datetime.date.today)
status = models.CharField()
processed_time= models.TimeField()
I want a list of the order types that have orders that meet this criteria: (order_date <= today AND processed_time is empty AND status is not blank)
I tried:
qs = Order_type.objects.filter(order__order_date__lte=datetime.date.today(),\
order__processed_time__isnull=True).exclude(order__status='')
This works for the original list of orders:
orders_qs = Order.objects.filter(order_date__lte=datetime.date.today(), processed_time__isnull=True)
orders_qs = orders_qs.exclude(status='')
But qs isn't the right queryset. I think its actually returning a more narrowed filter (since no records are present) but I'm not sure what. According to this (django reference), because I'm referencing a related model I think the exclude works on the original queryset (not the one from the filter), but I don't get exactly how.
OK, I just thought of this, which I think works, but feels sloppy (Is there a better way?):
qs = Order_type.objects.filter(order__id__in=[o.id for o in orders_qs])
What's happening is that the exclude() query is messing things up for you. Basically, it's excluding any Order_type that has at least one Order without a status, which is almost certainly not what you want to happen.
The simplest solution in your case is to use order__status__gt='' in you filter() arguments. However, you will also need to append distinct() to the end of your query, because otherwise you'd get a QuerySet with multiple instances of the same Order_type if it has more than one Order that matches the query. This should work:
qs = Order_type.objects.filter(
order__order_date__lte=datetime.date.today(),
order__processed_time__isnull=True,
order__status__gt='').distinct()
On a side note, in the qs query you gave at the end of the question, you don't have to say order__id__in=[o.id for o in orders_qs], you can simply use order__in=orders_qs (you still also need the distinct()). So this will also work:
qs = Order_type.objects.filter(order__in=Order.objects.filter(
order_date__lte=datetime.date.today(),
processed_time__isnull=True).exclude(status='')).distinct()
Addendum (edit):
Here's the actual SQL that Django issues for the above querysets:
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
LEFT OUTER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE ("testapp_order"."order_date" <= E'2010-07-18'
AND "testapp_order"."processed_time" IS NULL
AND "testapp_order"."status" > E'' );
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
INNER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE "testapp_order"."id" IN
(SELECT U0."id" FROM "testapp_order" U0
WHERE (U0."order_date" <= E'2010-07-18'
AND U0."processed_time" IS NULL
AND NOT (U0."status" = E'' )));
EXPLAIN reveals that the second query is ever so slightly more expensive (cost of 28.99 versus 28.64 with a very small dataset).

Retrieving unique results in Django queryset based on column contents

I am not sure if the title makes any sense but here is the question.
Context: I want to keep track of which students enter and leave a classroom, so that at any given time I can know who is inside the classroom. I also want to keep track, for example, how many times a student has entered the classroom. This is a hypothetical example that is quite close to what I want to achieve.
I made a table Classroom and each entry has a Student (ForeignKey), Action (enter,leave), and Date.
My question is how to get the students that are currently inside (ie. their enter actions' date is later than their leave actions' date, or don't have a leave date), and how to specify a date range to get the students that were inside the classroom at that time.
Edit: On better thought I should also add that there are more than one classrooms.
my first attempt was something like this:
students_in = Classroom.objects.filter(classroom__exact=1, action__exact='1')
students_out = Classroom.objects.filter(classroom__exact=1, action__exact='0').values_list('student', flat=True)
students_now = students_in.exclude(student__in=students_out)
where if action == 1 is in, 0 is out.
This however provides the wrong data as soon as a student leaves a classroom and re-enters. She is listed twice in the students_now queryset, as there are two 'enters' and one 'leave'. Also, I can't check upon specific date ranges to see which students have an entry date that is later than their leave date.
To check a field based on the value of another field, use the F() operator.
from django.db.models import F
students_in_classroom_now = Student.objects.filter(leave__gte=F('enter'))
To get all students in the room at a certain time:
import datetime
start_time = datetime.datetime(2010, 1, 21, 10, 0, 0) # 10am yesterday
students_in_classroom_then = Student.objects.filter(enter__lte=start_time,
leave__gte=start_time)
Django gives you the Q() and F() operators, which are very powerful and enough for most of the situations. However I don't think that it will be enough for you. Let's think about your problem at the SQL level.
We have something like a table Classroom ( action, ts, student_id ). In order to know which students are at the classroom right now, we would have to make something like:
with ( /* temporary view with last user_action */
select action, max(ts) xts, student_id
from Classroom
group by action, student_id
) as uber_table
select a.student_id student_id
from uber_table a, uber_table b
where a.action = 'enter'
/* either he entered and never left */
and (a.student_id not in (select student_id from uber_table where action = 'leave')
/* or he left before he entered again, so he's still in */
or (a.student_id = b.student_id and b.action = 'leave' and b.xts < a.xts))
This is, I believe, standard SQL. However, if you're using SQLite or MySQL as database backends (most likely you are), then stuff like the WITH keyword for creating temporary views probably isn't supported and the query will just have to get even more complex. There may be a simpler version but I don't really see it.
My point here is that when you get to this level of complexity, F() and Q() become inadequate tools for the job, so I'd rather recommend that you write the SQL code by hand and use Raw SQL in Django.
Should you need to use the more common data access APIs, you should probably rewrite your data model in the way #Daniel Roseman implied.
By the way, a query for getting people that were inside the classroom in the same interval is just like that one, but all you have to do is limit the last leave ts to the beginning of the interval and the last enter ts to the end of the interval.

How do I get the related objects In an extra().values() call in Django?

Thank to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
country = Country.objects.get(name=country_name)
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(select={'count': 'count(1)'},
order_by=['-count']).values('count', 'face_value', 'currency_type')
coin_values.query.group_by = ['currency_type_id', 'face_value']
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id,
type=1).extra(
select={'count': 'count(1)'},
order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
def coins_by_country(request, country_name):
country = get_object_or_404(Country, name=country_name)
cursor = connection.cursor()
cursor.execute('SELECT count(*), face_value, collection_currency.name FROM collection_collectible, collection_currency WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 group by face_value, collection_currency.name', [country.id] )
coin_values = cursor.fetchall()
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
Have you tried select_related() http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot it seems to work well then you can go coin_values.currency.name.
Also I dont think you need to do country=country.id in your filter, just country=country but I am not sure what difference that makes other than less typing.