Retrieving unique results in Django queryset based on column contents

I am not sure if the title makes any sense, but here is the question.
Context: I want to keep track of which students enter and leave a classroom, so that at any given time I can know who is inside the classroom. I also want to keep track of, for example, how many times a student has entered the classroom. This is a hypothetical example that is quite close to what I want to achieve.
I made a table Classroom and each entry has a Student (ForeignKey), Action (enter,leave), and Date.
My question is how to get the students that are currently inside (i.e. their enter action's date is later than their leave action's date, or they don't have a leave date), and how to specify a date range to get the students that were inside the classroom at that time.
Edit: On second thought, I should also add that there is more than one classroom.
My first attempt was something like this:
students_in = Classroom.objects.filter(classroom__exact=1, action__exact='1')
students_out = Classroom.objects.filter(classroom__exact=1, action__exact='0').values_list('student', flat=True)
students_now = students_in.exclude(student__in=students_out)
where action 1 means in and 0 means out.
This however provides the wrong data as soon as a student leaves a classroom and re-enters. She is listed twice in the students_now queryset, as there are two 'enters' and one 'leave'. Also, I can't check upon specific date ranges to see which students have an entry date that is later than their leave date.

To check a field based on the value of another field, use the F() operator.
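from django.db.models import F, Q
# A student is inside if they have never left, or last left before their latest entry.
students_in_classroom_now = Student.objects.filter(Q(leave__isnull=True) | Q(leave__lt=F('enter')))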
To get all students in the room at a certain time:
import datetime
start_time = datetime.datetime(2010, 1, 21, 10, 0, 0) # 10am yesterday
students_in_classroom_then = Student.objects.filter(enter__lte=start_time,
                                                    leave__gte=start_time)
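Note that these filters assume a different layout from the one in the question: one row per visit, with an enter and a leave column. A minimal sketch of that idea in plain SQL, run through Python's sqlite3 module, might look like the following. The visit table, its entered_at and left_at columns, and the students_inside_at helper are hypothetical names made up for illustration, and a NULL left_at is assumed to mean the student has not left yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-row-per-visit table; timestamps are ISO strings so they sort correctly.
conn.execute("CREATE TABLE visit (student_id INTEGER, entered_at TEXT, left_at TEXT)")
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)", [
    (1, "2010-01-21 09:00", "2010-01-21 09:50"),
    (1, "2010-01-21 09:55", None),               # re-entered, still inside
    (2, "2010-01-21 09:30", "2010-01-21 10:30"),
    (3, "2010-01-21 10:05", None),               # entered after 10:00
])

def students_inside_at(conn, moment):
    """Ids of students with a visit that started by `moment` and had not ended before it."""
    rows = conn.execute(
        "SELECT DISTINCT student_id FROM visit "
        "WHERE entered_at <= ? AND (left_at IS NULL OR left_at >= ?) "
        "ORDER BY student_id",
        (moment, moment),
    ).fetchall()
    return [r[0] for r in rows]

inside_at_10 = students_inside_at(conn, "2010-01-21 10:00")
print(inside_at_10)
```

With one row per visit, a student who leaves and re-enters simply gets a second row, so the "listed twice" problem from the question does not arise.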

Django gives you the Q() and F() operators, which are very powerful and enough for most of the situations. However I don't think that it will be enough for you. Let's think about your problem at the SQL level.
We have something like a table Classroom ( action, ts, student_id ). In order to know which students are at the classroom right now, we would have to make something like:
with uber_table as ( /* temporary view with the last ts per student and action */
    select action, max(ts) xts, student_id
    from Classroom
    group by action, student_id
)
select a.student_id student_id
from uber_table a
left join uber_table b
    on b.student_id = a.student_id and b.action = 'leave'
where a.action = 'enter'
/* either he entered and never left, or he left before he entered again, so he's still in */
and (b.student_id is null or b.xts < a.xts)
This is, I believe, standard SQL. However, if you're using an older SQLite (before 3.8.3) or MySQL (before 8.0) as the database backend, then the WITH keyword for creating temporary views isn't supported and the query will just have to get even more complex. There may be a simpler version but I don't really see it.
My point here is that when you get to this level of complexity, F() and Q() become inadequate tools for the job, so I'd rather recommend that you write the SQL code by hand and use Raw SQL in Django.
Should you need to use the more common data access APIs, you should probably rewrite your data model in the way @Daniel Roseman implied.
By the way, a query for getting the people that were inside the classroom during a given interval is just like that one: all you have to do is limit the last leave ts to the beginning of the interval and the last enter ts to the end of the interval.
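One other way to phrase the same idea, offered only as a sketch: take each student's latest event up to a given moment and keep the students for whom that event is an 'enter'. The example below runs it against an in-memory SQLite database. The classroom_log table, its columns, and the students_inside helper are hypothetical names, and timestamps are stored as ISO strings so that MAX() and <= compare them correctly.

```python
import sqlite3

def students_inside(conn, classroom_id, at):
    """Ids of students whose latest event at or before `at` is an 'enter'."""
    rows = conn.execute(
        """
        SELECT c.student_id
        FROM classroom_log AS c
        JOIN (
            SELECT student_id, MAX(ts) AS last_ts
            FROM classroom_log
            WHERE classroom_id = ? AND ts <= ?
            GROUP BY student_id
        ) AS last
          ON last.student_id = c.student_id AND last.last_ts = c.ts
        WHERE c.classroom_id = ? AND c.action = 'enter'
        ORDER BY c.student_id
        """,
        (classroom_id, at, classroom_id),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classroom_log "
    "(classroom_id INTEGER, student_id INTEGER, action TEXT, ts TEXT)"
)
conn.executemany("INSERT INTO classroom_log VALUES (?, ?, ?, ?)", [
    (1, 10, "enter", "2010-01-21 09:00"),
    (1, 10, "leave", "2010-01-21 09:30"),
    (1, 10, "enter", "2010-01-21 09:45"),  # re-entered, still inside
    (1, 11, "enter", "2010-01-21 09:10"),
    (1, 11, "leave", "2010-01-21 10:15"),
    (2, 12, "enter", "2010-01-21 09:00"),  # a different classroom
])

inside_now = students_inside(conn, 1, "2010-01-21 12:00")
inside_at_10 = students_inside(conn, 1, "2010-01-21 10:00")
print(inside_now, inside_at_10)
```

The same SQL could be handed to Django's raw SQL facilities. Passing an earlier moment as `at` answers the "who was inside at that time" part of the question.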

Related

Annotate one part of a range to a new field

So we've been using a DateTimeRangeField in a booking model to denote start and end. The rationale for this might not have been great —separate start and end fields might have been better in hindsight— but we're over a year into this now and there's no going back.
It's generally been fine except I need to annotate just the end datetime onto a related model's query. And I can't work out the syntax.
Here's a little toy example where I want a list of Employees with end of their last booking annotated on.
class Booking(models.Model):
    timeframe = DateTimeRangeField()
    employee = models.ForeignKey('Employee')
sq = Booking.objects.filter(employee=OuterRef('pk')).values('timeframe')
Employee.objects.annotate(last_on_site=Subquery(sq, output_field=DateTimeField()))
That doesn't work because the annotated value is the range, not the single value. I've tried a heap of modifiers (e.g. __1 and .1), but nothing works.
Is there a way to get just the one value? I guess you could simulate this without the complication of the subquery just doing a simple values lookup. Booking.objects.values('timeframe__start') (or whatever). That's essentially what I'm trying to do here.

Thanks to some help in IRC, it turns out you can use the RangeStartsWith and RangeEndsWith model transform classes directly. These are the things that are normally just registered to provide you with a __startswith filter access to range values, but directly they can pull back the value.
In my example, that means just modifying the annotation slightly:
from django.contrib.postgres.fields.ranges import RangeEndsWith
from django.db.models import OuterRef, Subquery
sq = Booking.objects.filter(employee=OuterRef('pk')).values('timeframe')
Employee.objects.annotate(last_on_site=RangeEndsWith(Subquery(sq[:1])))

Django get count of each age

I have this model:
class User_Data(AbstractUser):
    date_of_birth = models.DateField(null=True, blank=True)
    city = models.CharField(max_length=255, default='', null=True, blank=True)
    address = models.TextField(default='', null=True, blank=True)
    gender = models.TextField(default='', null=True, blank=True)
And I need to run a django query to get the count of each age. Something like this:
Age || Count
10 || 100
11 || 50
and so on.....

Here is what I did with lambda:
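from operator import itemgetter
# list() is needed on Python 3, where map() returns an iterator without .count()
usersAge = list(map(lambda x: calculate_age(x[0]), User_Data.objects.values_list('date_of_birth')))
users_age_data_source = [[x, usersAge.count(x)] for x in set(usersAge)]
users_age_data_source = sorted(users_age_data_source, key=itemgetter(0))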
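The calculate_age helper is not shown in the post, so here is a rough sketch of what the application-side version could look like, using plain dates instead of a queryset. Both calculate_age (with its extra today parameter) and age_counts are assumptions for illustration, not the poster's actual code. Rows without a date of birth are skipped, since the field is nullable.

```python
from collections import Counter
from datetime import date

def calculate_age(born, today):
    """Full years between `born` and `today`."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)

def age_counts(birth_dates, today):
    """Return sorted (age, count) pairs, ignoring missing birth dates."""
    ages = Counter(calculate_age(d, today) for d in birth_dates if d is not None)
    return sorted(ages.items())

sample = [date(2000, 5, 1), date(2000, 12, 31), date(2001, 1, 15), None]
result = age_counts(sample, today=date(2011, 6, 1))
for age, count in result:
    print(age, "||", count)
```

Counter does the counting in one pass, whereas calling usersAge.count(x) for every distinct age rescans the list each time.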

There are a few ways of doing this. I've had to do something very similar recently. This example works in Postgres.
Note: I've written the following code the way I have so that syntactically it works, and so that I can write between each step. But you can chain these together if you desire.
First we need to annotate the queryset to obtain the 'age' parameter. Since it's not stored as an integer, and can change daily, we can calculate it from the date of birth field by using the database's 'current_date' function:
from django.db.models import Count
from django.db.models.expressions import RawSQL
ud = User_Data.objects.annotate(
    age=RawSQL("""(DATE_PART('year', current_date) - DATE_PART('year', "app_userdata"."date_of_birth"))::integer""", []),
)
Note: you'll need to change the "app_userdata" part to match up with the table of your model. You can pick this out of the model's _meta, but this just depends if you want to make this portable or not. If you do, use a string .format() to replace it with what the model's _meta provides. If you don't care about that, just put the table name in there.
Now we pick the 'age' value out so that we get a ValuesQuerySet with just this field
ud = ud.values('age')
And then annotate THAT queryset with a count of age
ud = ud.annotate(
    count=Count('age'),
)
At this point we have a ValuesQuerySet that has both 'age' and 'count' as fields. Order it so it comes out in a sensible way.
ud = ud.order_by('age')
And there you have it.
You must build up the queryset in this order, otherwise you'll get some interesting results. The values('age') call has to come before the Count annotation so that the count is grouped by age. You also can't group all the annotates together, because the second one for count depends on the first, and on Python versions before 3.6 a kwargs dict has no notion of what order the kwargs were defined in, so when the queryset does field/dependency checking, it will fail.
Hope this helps.
If you aren't using Postgres, the only thing you'll need to change is the RawSQL annotation, to match whatever database engine you're using. However that engine gets the year of a date, whether from a field or from its built-in "current date" function, as long as you can get it out as an integer, it will work exactly the same way.

optimal django manytomany query

I'm having trouble reducing the number of queries for a particular view. It's a fairly heavy one but I'm sure it can be reduced:
Profile:
    name = CharField()
Officers:
    club = ManyToManyField(Club, related_name='officers')
    title = CharField()
Club:
    name = CharField()
    members = ManyToManyField(Profile)
Election:
    club = ForeignKey(Club)
    elected = ForeignKey(Profile)
    title = CharField()
    when = DateTimeField()
Clubs have members and officers (president, tournament director). People can be members of multiple clubs etc...
Officers are elected at elections, the results of which are stored.
Given a player, how can I find out the most recently elected officer at each of the player's clubs?
At the moment I have
clubs = Club.objects.filter(members=me).prefetch_related('officers')
for c in clubs:
    officers = c.officers.all()
    most_recent = Election.objects.filter(club=c).filter(elected__in=officers).order_by('-when')[:1].get()
    print(c.name + ' elected ' + most_recent.elected.name + ' most recently')
Problem is the looped query, it's nice and fast if you're a member of 1 club but if you join fifty my database crawls.
Edit:
The answer from Nil does what I want but doesn't get the object. I don't really need the object but I do need another field as well as the datetime. If it's helpful the query:
Club.objects.annotate(last_election=Max('election__when'))
produces the raw SQL
SELECT "organisation_club"."id", "organisation_club"."name", MAX("organisation_election"."when") AS "last_election"
FROM "organisation_club"
LEFT OUTER JOIN "organisation_election" ON ( "organisation_club"."id" = "organisation_election"."club_id" )
GROUP BY "organisation_club"."id", "organisation_club"."name"
I'd really like an ORM answer if at all possible (or a 'mostly' ORM answer).

I believe this is what you're looking for:
from django.db.models import Max, F
Election.objects.filter(club__members=me) \
    .annotate(max_date=Max('club__election__when')) \
    .filter(when=F('max_date')).select_related('elected')
Relations can be followed forwards and backwards again in a single statement, allowing you to annotate the max_date for any election related to the club of the current election. The F class allows you to filter a queryset based on selected fields in SQL, including any extra fields added through annotation, aggregation, joins etc.

What you want can be defined in SQL terms: query the Election table, group the rows by Club and keep only the last election of each club.
Now, how can we translate that into the Django ORM? Looking at the documentation, we learn that we can do it with an annotation. The trick is that you need to think in reverse. You want to annotate (add a new piece of data to) each club with its last election. This gives us:
from django.db.models import Max
Club.objects.annotate(last_election=Max('election__when'))
# Use it in a for loop like this
for club in Club.objects.annotate(last_election=Max('election__when')):
    print(club, club.last_election)
Sadly, this only adds the date, which doesn't answer your question! You want the name or the complete Club object. I searched and I still don't know how to do it properly. If all else fails though, you can still use a raw SQL query in Django using a query like in the first link.

The simplest way I can think of is filtering partially at the application level.
If you do
e = Election.objects.filter(club__members=me).select_related('elected')
or
e = Election.objects.filter(club__in=me.club_set.all()).select_related('elected')
This is a single query and it should get back all the elections that happened for all the clubs that the member me is in. Then you can use Python to just get the most recent date. Of course, if you have many elections per club, you end up fetching much more data than will be used.
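The Python-side step could be sketched roughly as follows. It uses plain namedtuples in place of model instances, so the Election tuple and the latest_per_club helper are stand-ins for illustration rather than the real models. With actual Election objects, e.club would be a Club instance, and keying the dictionary on e.club_id would probably be the safer choice.

```python
from collections import namedtuple
from datetime import datetime

# Stand-in for the Election model: club and elected are plain strings here.
Election = namedtuple("Election", ["club", "elected", "title", "when"])

def latest_per_club(elections):
    """Keep only the most recent election for each club."""
    latest = {}
    for e in elections:
        current = latest.get(e.club)
        if current is None or e.when > current.when:
            latest[e.club] = e
    return latest

elections = [
    Election("Chess", "Alice", "president", datetime(2012, 3, 1)),
    Election("Chess", "Bob", "president", datetime(2013, 3, 1)),
    Election("Go", "Carol", "tournament director", datetime(2011, 6, 15)),
]
latest = latest_per_club(elections)
for club, e in sorted(latest.items()):
    print('%s elected %s most recently at %s' % (club, e.elected, e.when))
```

This is a single pass over the fetched rows, so the cost is in transferring every election from the database, not in the loop itself.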
Another way which should do it in two queries:
from functools import reduce
from django.db.models import Max, Q
# Get all member's clubs & most recent election
clubs = Club.objects.filter(members=me).annotate(last_election=Max('election__when'))
# Create filters for election based on the club id and the latest election time
election_Q = [Q(club__id=c.id) & Q(when=c.last_election) for c in clubs]
# Combine filters with an OR
election_filter = reduce(lambda f1, f2: f1 | f2, election_Q)
# Get elections restricting by specific clubs & election date
elections = Election.objects.filter(election_filter).select_related('elected')
for e in elections:
    print('%s elected %s most recently at %s' % (e.club.name, e.elected, e.when))
This builds upon @Nil's method and uses its result to build a query in Python, then feeds it into the second query. However, there is a limit on the size of a SQL statement, and if the member is in a lot of clubs you may hit it. The limit is fairly high though, and I've only ever reached it when importing large datasets in a single INSERT statement, so I think it should be fine for your purpose.
Sorry I cannot think of a way that the Django ORM can link them together using a single SQL query. The Django ORM is actually quite limited for complex queries so if you really need the efficiency I think it's probably best to write the raw SQL query.

Is Python's slicing syntax used on a model queryset executed at the database level?

model:
class person(models.Model):
    name = models.CharField()
    ...
If I use
persons = person.objects.order_by('name')[0:25]
in the code, is the slice executed on database level (converting to SELECT * FROM person ORDER BY name LIMIT 25) or on the "code" level?

This is made very clear in the documentation (and the answer is yes, it does):
Use a subset of Python’s array-slicing syntax to limit your QuerySet to a certain number of results. This is the equivalent of SQL’s LIMIT and OFFSET clauses.

Yes, slicing gets translated to SQL's LIMIT.
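To make the correspondence concrete, here is a small sketch in plain Python and SQLite, not Django itself. The limit_offset and fetch_slice helpers are hypothetical names, and the code only mimics how a slice [start:stop] maps onto LIMIT and OFFSET.

```python
import sqlite3

def limit_offset(start, stop):
    """Translate the slice [start:stop] into (LIMIT, OFFSET) values."""
    if start < 0 or stop < start:
        raise ValueError("only non-negative, forward slices are handled here")
    return stop - start, start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
names = ["person%02d" % i for i in range(30, 0, -1)]  # inserted in reverse order
conn.executemany("INSERT INTO person VALUES (?)", [(n,) for n in names])

def fetch_slice(conn, start, stop):
    """Fetch the rows that person.objects.order_by('name')[start:stop] would cover."""
    limit, offset = limit_offset(start, stop)
    rows = conn.execute(
        "SELECT name FROM person ORDER BY name LIMIT ? OFFSET ?", (limit, offset)
    ).fetchall()
    return [r[0] for r in rows]

first_25 = fetch_slice(conn, 0, 25)   # like [0:25] -> LIMIT 25 OFFSET 0
middle = fetch_slice(conn, 5, 10)     # like [5:10] -> LIMIT 5 OFFSET 5
print(len(first_25), middle[0], middle[-1])
```

The database returns only the requested rows, which is the point of doing the slice at the SQL level instead of fetching everything and slicing in Python.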

I think it depends on when it's executed.
Django's ORM QuerySets are "lazy", in that they don't actually run until they are iterated over. This lets you do things like this:
persons = person.objects.filter(age__gte=25)
persons = persons.filter(age__lte=50)
persons = persons.exclude(age=30)
persons = persons.order_by('name')
persons = persons[:25]
for p in persons:  # a different name, so the loop does not shadow the person model
    print(p.name)
Which translates to "Get everyone aged 25 or over and 50 or under, excluding anyone who is 30, order by their name, and give me the first 25 records."
Because the QuerySet is lazy, all of that code only creates a single database call, when you actually enter the for loop.
So, yes, technically, the slice translates to a LIMIT, when the ORM enters the loop.
However, what the ORM does behind the scenes is create a Python list of each record the database returns. So, let's say we continue on after the above:
for p in persons: # SQL command is compiled and run, with a list returned
    print(p.name)
persons = persons[:10] # Django just slices the list we already have in memory.
It may seem trivial, or an edge case, but it's important to understand what's happening behind the scenes.

How do I get the related objects In an extra().values() call in Django?

Thanks to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
    country = Country.objects.get(name=country_name)
    coin_values = Collectible.objects.filter(country=country.id, type=1).extra(
        select={'count': 'count(1)'},
        order_by=['-count']).values('count', 'face_value', 'currency_type')
    coin_values.query.group_by = ['currency_type_id', 'face_value']
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?

You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(
    select={'count': 'count(1)'},
    order_by=['-count']
).select_related()

select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
from django.db import connection
def coins_by_country(request, country_name):
    country = get_object_or_404(Country, name=country_name)
    cursor = connection.cursor()
    cursor.execute(
        'SELECT count(*), face_value, collection_currency.name '
        'FROM collection_collectible, collection_currency '
        'WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 '
        'GROUP BY face_value, collection_currency.name', [country.id])
    coin_values = cursor.fetchall()
    return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country})
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.

Have you tried select_related()? http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot and it seems to work well; then, for each object in coin_values, you can go coin.currency_type.name.
Also, I don't think you need to do country=country.id in your filter, just country=country, but I am not sure what difference that makes other than less typing.
