Django ORM - select_related and order_by with foreign keys - django

I have a simple music schema: Artist, Release, Track, and Song. The first 3 are all logical constructs while the fourth (Song) is a specific instance of an (Artist, Release, Track) as an mp3, wav, ogg, whatever.
I am having trouble generating an ordered list of the Songs in the database. The catch is that both Track and Release have an Artist. While Song.Track.Artist is always the performer name, Song.Track.Release.Artist may either be a performer name or "Various Artists" for compilations. I want to be able to sort by one or the other, and I can't figure out the correct way to make this work.
Here's my schema:
class Artist(models.Model):
    name = models.CharField(max_length=512)
class Release(models.Model):
    name = models.CharField(max_length=512)
    artist = models.ForeignKey(Artist)
class Track(models.Model):
    name = models.CharField(max_length=512)
    track_number = models.IntegerField('Position of the track on its release')
    length = models.IntegerField('Length of the song in seconds')
    artist = models.ForeignKey(Artist)
    release = models.ForeignKey(Release)
class Song(models.Model):
    bitrate = models.IntegerField('Bitrate of the song in kbps')
    location = models.CharField('Permanent storage location of the file', max_length=1024)
    owner = models.ForeignKey(User)
    track = models.ForeignKey(Track)
My query should be fairly simple; filter for all songs owned by a specific user, and then sort them by either Song.Track.Artist.name or Song.Track.Release.Artist.name. Here's my code inside a view, which is sorting by Song.Track.Artist.name:
songs = Song.objects.filter(owner=request.user).select_related('track__artist', 'track__release', 'track__release__artist').order_by('player_artist.name')
I can't get order_by to work unless I use tblname.colname. I took a look at the underlying query object's as_sql method, which indicates that when the inner join is made to get Song.Track.Release.Artist the temporary name T6 is used for the Artist table since an inner join was already done on this same table to get Song.Track.Artist:
>>> songs = Song.objects.filter(owner=request.user).select_related('track__artist', 'track__release', 'track__release__artist').order_by('T6.name')
>>> print songs.query.as_sql()
('SELECT "player_song"."id", "player_song"."bitrate", "player_song"."location",
"player_song"."owner_id", "player_song"."track_id", "player_track"."id",
"player_track"."name", "player_track"."track_number", "player_track"."length",
"player_track"."artist_id", "player_track"."release_id", "player_artist"."id",
"player_artist"."name", "player_release"."id", "player_release"."name",
"player_release"."artist_id", T6."id", T6."name" FROM "player_song" INNER JOIN
"player_track" ON ("player_song"."track_id" = "player_track"."id") INNER JOIN
"player_artist" ON ("player_track"."artist_id" = "player_artist"."id") INNER JOIN
"player_release" ON ("player_track"."release_id" = "player_release"."id") INNER JOIN
"player_artist" T6 ON ("player_release"."artist_id" = T6."id") WHERE
"player_song"."owner_id" = %s ORDER BY T6.name ASC', (1,))
When I put this as the table name in order_by it does work (see example output above), but this seems entirely non-portable. Surely there's a better way to do this! What am I missing?

I'm afraid I really can't understand what your question is.
A couple of corrections: select_related has nothing to do with ordering (it doesn't change which rows the queryset returns or their order; it just follows joins to fetch the related objects and cache them), and to order by a field in a related model you use the double-underscore notation, not dotted. For example:
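Song.objects.filter(owner=request.user).order_by('track__artist__name')
and, to sort by the release's artist instead:
Song.objects.filter(owner=request.user).order_by('track__release__artist__name')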
But in your example, you use 'player_artist', which doesn't seem to be a field anywhere in your model. And I don't understand your reference to portability.
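To see why the second join to the artist table gets its own alias (the T6 in the question), here is a minimal sketch using Python's built-in sqlite3 module. The simplified player_* tables and the sample rows are assumptions for illustration only; with the ORM you never write this SQL yourself, because order_by('track__release__artist__name') makes Django choose the alias.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory copy of the schema (simplified, made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE player_artist (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE player_release (id INTEGER PRIMARY KEY, name TEXT, artist_id INTEGER);
        CREATE TABLE player_track (id INTEGER PRIMARY KEY, name TEXT,
                                   artist_id INTEGER, release_id INTEGER);
        CREATE TABLE player_song (id INTEGER PRIMARY KEY, owner_id INTEGER, track_id INTEGER);
        INSERT INTO player_artist VALUES (1, 'Beck'), (2, 'Various Artists'), (3, 'Air');
        INSERT INTO player_release VALUES (1, 'Odelay', 1), (2, 'Compilation', 2);
        INSERT INTO player_track VALUES (1, 'Devils Haircut', 1, 1), (2, 'Sexy Boy', 3, 2);
        INSERT INTO player_song VALUES (1, 1, 1), (2, 1, 2);
    """)
    return conn

def ordered_song_ids(conn, owner_id, by_release_artist=False):
    # The artist table is joined twice, so each join needs its own alias;
    # Django generates such aliases itself (T6 in the question's SQL).
    # order_col only ever takes one of two fixed strings, so the f-string is safe.
    order_col = "release_artist.name" if by_release_artist else "track_artist.name"
    sql = f"""
        SELECT s.id
        FROM player_song s
        JOIN player_track t ON s.track_id = t.id
        JOIN player_artist track_artist ON t.artist_id = track_artist.id
        JOIN player_release r ON t.release_id = r.id
        JOIN player_artist release_artist ON r.artist_id = release_artist.id
        WHERE s.owner_id = ?
        ORDER BY {order_col}, s.id
    """
    return [row[0] for row in conn.execute(sql, (owner_id,))]

conn = build_demo_db()
print(ordered_song_ids(conn, 1))                          # sorted by track artist
print(ordered_song_ids(conn, 1, by_release_artist=True))  # sorted by release artist
```

In the view itself you would keep the ORM form, optionally keeping select_related() so the template can read the related artists without extra queries.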

Related

Django Query - Get list that isnt in FK of another model

I am working on a Django web app that manages payroll based on completed reports and the payroll generated from them. The three models are as follows (I've tried to limit them to the data needed for this question):
class PayRecord(models.Model):
    rate = models.FloatField()
    user = models.ForeignKey(User)
class Payroll(models.Model):
    company = models.ForeignKey(Company)
    name = models.CharField()
class PayrollItem(models.Model):
    payroll = models.ForeignKey(Payroll)
    record = models.OneToOneField(PayRecord, unique=True)
What is the most efficient way to get all the PayRecords that aren't also in PayrollItem, so that I can select them to create a payroll item?
There are 100k records, and my initial attempt (below) takes minutes, which is far from feasible:
records_completed_in_payrolls = [
    p.record.id for p in PayrollItem.objects.select_related(
        'record',
        'payroll'
    )
]
Because PayrollItem has the related field record, you can reach into that model while you filter PayRecord. Using the __isnull lookup on the reverse relation should give you what you want:
PayRecord.objects.filter(payrollitem__isnull=True)
This translates to an SQL statement like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON payroll_payrecord.id = payroll_payrollitem.record_id
WHERE payroll_payrollitem.id IS NULL
Depending on your intentions, you may want to chain on a .select_related (https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-related)
PayRecord.objects.filter(payrollitem__isnull=True).select_related('user')
which translates to something like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id,
payroll_user.id,
payroll_user.name
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON (payroll_payrecord.id = payroll_payrollitem.record_id)
INNER JOIN payroll_user
ON (payroll_payrecord.user_id = payroll_user.id)
WHERE payroll_payrollitem.id IS NULL
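The LEFT OUTER JOIN ... IS NULL anti-join that __isnull=True produces can be tried outside Django. A minimal sketch with the standard-library sqlite3 module, assuming simplified payroll_payrecord and payroll_payrollitem tables and made-up rows:

```python
import sqlite3

def unpaid_record_ids(conn):
    # LEFT OUTER JOIN keeps every pay record; records without a payroll item
    # get NULLs in the payrollitem columns, and the WHERE clause keeps exactly those.
    sql = """
        SELECT r.id
        FROM payroll_payrecord r
        LEFT OUTER JOIN payroll_payrollitem i ON r.id = i.record_id
        WHERE i.id IS NULL
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payroll_payrecord (id INTEGER PRIMARY KEY, rate REAL, user_id INTEGER);
    CREATE TABLE payroll_payrollitem (id INTEGER PRIMARY KEY, payroll_id INTEGER,
                                      record_id INTEGER UNIQUE);
    INSERT INTO payroll_payrecord VALUES (1, 10.0, 1), (2, 12.5, 1), (3, 9.0, 2);
    INSERT INTO payroll_payrollitem VALUES (1, 1, 2);
""")
print(unpaid_record_ids(conn))
```

Because the whole check runs inside the database, no list of ids has to be built in Python, unlike the list comprehension in the question.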

django query aggregation grouping with access to all fields

With models defined like so:
class Athlete(models.Model):
name = models.CharField()
class Event(models.Model):
winner = models.ForeignKey(Athlete)
distance = models.FloatField()
type_choices = [('LJ', 'Long Jump'), ('HJ', 'High Jump')]
type = models.CharField(choices=type_choices)
I want to run a query picking out all the events an Athlete has won, grouped by type. I'm currently doing it like so:
athlete = Athlete.objects.get(name='dave')
events_by_type = Event.objects.values('type').annotate(Count('winner')).filter(winner=athlete)
This gives me a dictionary of event types (short versions) and the number of times the athlete has been the winner. However that's all it gives me. If I then want to dig into one of these events to find the distance or even just the verbose type name, I can't.
Am I going about this in the right way? How can I get the events grouped, but also with access to all their fields as well?
You are not getting the event instances because you are querying Event.objects with the values() method, which returns data only for the specified fields:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#values
Performing this kind of group by with the Django ORM is not straightforward. The proposed solution often is:
q = Event.objects.filter(winner=athlete)
q.query.group_by = ['type']
q.count()
But I'd rather do it in plain Python, maybe something like:
from collections import defaultdict

athlete = Athlete.objects.get(name='dave')
events = Event.objects.filter(winner=athlete)
won_per_type = defaultdict(list)
for e in events:
    won_per_type[e.type].append(e)
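A self-contained version of that loop, using a plain dataclass as a hypothetical stand-in for the Event model so it runs without Django. Each group keeps the full objects, so the distance and the verbose type name both stay available:

```python
from collections import defaultdict
from dataclasses import dataclass

# Same choices as the model; dict() gives a code -> verbose-name lookup.
TYPE_CHOICES = [('LJ', 'Long Jump'), ('HJ', 'High Jump')]
TYPE_LABELS = dict(TYPE_CHOICES)

@dataclass
class Event:
    """Hypothetical stand-in for a Django Event instance."""
    winner: str
    distance: float
    type: str

def group_by_type(events):
    won_per_type = defaultdict(list)
    for e in events:
        won_per_type[e.type].append(e)  # index with [], do not call with ()
    return won_per_type

events = [Event('dave', 7.1, 'LJ'), Event('dave', 2.0, 'HJ'), Event('dave', 7.4, 'LJ')]
grouped = group_by_type(events)
for code, won in grouped.items():
    print(TYPE_LABELS[code], len(won), [e.distance for e in won])
```

With real model instances, e.get_type_display() returns the verbose name directly, so the TYPE_LABELS lookup is only needed for this stand-alone sketch.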

Django complex query without using loop

I have three models:
class Employer(models.Model):
    name = models.CharField(max_length=1000,null=False,blank=False)
    eminence = models.IntegerField(null=False,default=4)
class JobTitle(models.Model):
    name = models.CharField(max_length=1000,null=False,blank=False)
    employer = models.ForeignKey(Employer,unique=False,null=False)
class People(models.Model):
    name = models.CharField(max_length=1000,null=False,blank=False)
    jobtitle = models.ForeignKey(JobTitle,unique=False,null=False)
I would like to list 5 random employers and one job title for each employer. However, the job title should be picked from that employer's 10 job titles with the most people.
One approach could be
employers = Employer.objects.filter(isActive=True).filter(eminence__lt=4).order_by('?')[:5]
for emp in employers:
    jobtitle = JobTitle.objects.filter(employer=emp)... and so on.
However, looping through the selected employers may be inefficient. Is there any way to do it in one query?
Thanks
There is! Check out: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
select_related() tells Django to follow all the foreign key relationships using JOINs. This will result in one large query as opposed to many small queries, which in most cases is what you want. The QuerySet you get will be pre-populated and Django won't have to lazy-load anything from the database.
I've used select_related() in the past to solve almost this exact problem.
I have written the following code and it works. Although I loop over the employers, I assumed that because I used select_related('jobtitle') the loop wouldn't hit the database and would be faster.
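import random
from django.db.models import Count

# random.sample() needs a real sequence, so materialize the queryset with list()
employers = random.sample(list(Employer.objects.select_related('jobtitle').filter(eminence__lt=4,status=EmployerStatus.ACTIVE).annotate(jtt_count=Count('jobtitle')).filter(jtt_count__gt=0)),3)
jtList = []
for emp in employers:
    # order by people_count so the slice really is the 10 most-staffed titles
    jt = random.choice(list(emp.jobtitle_set.filter(isActive=True).annotate(people_count=Count('people')).filter(people_count__gt=0).order_by('-people_count')[:10]))
    jtList.append(jt)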
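Note that select_related() only follows forward ForeignKey and OneToOneField relations, so emp.jobtitle_set.filter(...) inside the loop still runs one query per employer; reverse relations are what prefetch_related() is for. Another option is to fetch the needed rows once and do the random selection in Python. A minimal sketch of that selection logic, assuming the rows have already been loaded as (employer, job title, people count) tuples, for example with something like JobTitle.objects.filter(employer__eminence__lt=4).annotate(people_count=Count('people')).values_list('employer_id', 'name', 'people_count'); the sample data is made up:

```python
import random
from collections import defaultdict

def pick_titles(rows, n_employers=5, top_n=10, rng=random):
    """rows: iterable of (employer, job_title, people_count) tuples.

    Picks up to n_employers random employers that have at least one staffed
    title, then one random title from each employer's top_n titles by people_count.
    """
    titles = defaultdict(list)
    for employer, title, people in rows:
        if people > 0:
            titles[employer].append((people, title))
    chosen = rng.sample(sorted(titles), min(n_employers, len(titles)))
    result = {}
    for employer in chosen:
        top = sorted(titles[employer], reverse=True)[:top_n]  # most people first
        result[employer] = rng.choice(top)[1]
    return result

rows = [
    ('Acme', 'Engineer', 12), ('Acme', 'Manager', 3), ('Acme', 'Intern', 0),
    ('Globex', 'Analyst', 7), ('Initech', 'Developer', 5),
]
print(pick_titles(rows, n_employers=2, rng=random.Random(42)))
```

Passing an explicit random.Random instance keeps the picks reproducible in tests while the default uses the module-level generator.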

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
    ...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
    user = models.ForeignKey(User)
    foo = models.ForeignKey(Foo)
    ...
    class Meta:
        unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND userfoo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for, but since this is the first result on Google when searching for "django annotate outer join", I'll post it here.
Note: tested on Django 1.7
Suppose you have the following models
class User(models.Model):
    name = models.CharField()
class EarnedPoints(models.Model):
    points = models.PositiveIntegerField()
    user = models.ForeignKey(User, related_name="earned_points")
To get total user points you might do something like this:
User.objects.annotate(points=Sum("earned_points__points"))
This works, but it will not return users who have no points. Here we need an outer join, without any direct hacks or raw SQL.
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into a LEFT OUTER JOIN and all users will be returned. Users who have no points will have None in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not necessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
    select={"user_comment": "UserFoo.comment"},
    where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
    params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extra(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourselves.
connection = (
    UserFoo._meta.db_table, User._meta.db_table,  # JOIN these tables
    "user_id", "id",                              # on these fields
)
results.query.join(  # modify the query
    connection,       # with this table connection
    promote=True,     # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem, which I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description of how you can augment a queryset with external raw SQL, without having to care about the actual query that generates the queryset.
Here's a typical scenario: you have a Reddit-like site with a LinkPost model and a UserPostVote model, like this:
class LinkPost(models.Model):
    ...  # some fields
class UserPostVote(models.Model):
    user = models.ForeignKey(User,related_name="post_votes")
    post = models.ForeignKey(LinkPost,related_name="user_votes")
    value = models.IntegerField(null=False, default=0)
where the userpostvote table collects the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it gets tricky, and I was unable to find a clean ORM solution: select_related won't let you fetch only the votes belonging to the logged-in user, looping over the posts would run one query per post instead of a single one, and doing it all in raw SQL means we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query        # the query object of the queryset
q1_alias = q1.get_initial_alias()        # forces the query object to set up its base table alias
(q1str, q1param) = q1.sql_with_params()  # the SQL for the query along with its
                                         # parameters, which are none in this example
We now have the SQL for the queryset, so we just wrap it as a subquery, give it an alias and LEFT OUTER JOIN the votes to it:
q2_augment = ("SELECT B.value AS uservote, A.* "
              "FROM (" + q1str + ") A LEFT OUTER JOIN reddit_userpostvote B "
              "ON A.id = B.post_id AND B.user_id = %s")
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment, q1param + q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.
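The same wrap-and-join idea, sketched with Python's sqlite3 module so it can be run in isolation. The table layouts, rows and the page_sql string are hypothetical stand-ins for what sql_with_params() would return; SQLite uses ? placeholders where Django's backends use %s, and the inner parameters must come before the user id because their placeholders appear first in the combined SQL:

```python
import sqlite3

def augment_with_votes(conn, inner_sql, inner_params, user_id):
    # Wrap the existing query as derived table A and LEFT OUTER JOIN the
    # current user's votes; the user filter sits in the ON clause, so posts
    # this user never voted on are kept with uservote = NULL.
    sql = (
        "SELECT B.value AS uservote, A.* "
        "FROM (" + inner_sql + ") A "
        "LEFT OUTER JOIN reddit_userpostvote B "
        "ON A.id = B.post_id AND B.user_id = ?"
    )
    return conn.execute(sql, tuple(inner_params) + (user_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reddit_linkpost (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reddit_userpostvote (id INTEGER PRIMARY KEY, user_id INTEGER,
                                      post_id INTEGER, value INTEGER);
    INSERT INTO reddit_linkpost VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO reddit_userpostvote VALUES (1, 7, 1, 1), (2, 8, 2, -1);
""")
# Hypothetical stand-in for the paginator's SQL: one "page" of two posts.
page_sql = "SELECT id, title FROM reddit_linkpost ORDER BY id LIMIT ? OFFSET ?"
rows = augment_with_votes(conn, page_sql, (2, 0), user_id=7)
print(rows)
```

Concatenating SQL like this is only reasonable because inner_sql comes from Django's own query compiler, never from user input; all values still travel as bound parameters.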
The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
    # key by foo_id so each Foo can look up its own UserFoo
    userfoos = dict((uf.foo_id, uf) for uf in UserFoo.objects.filter(foo__in=qs))
    for foo in qs:
        foo.userfoo = userfoos.get(foo.pk, None)

for foo in Foo.objects.filter(…).transform(userfoo_retriever):
    print(foo.userfoo)
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
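The transform approach boils down to two simple queries plus a dictionary lookup. A stdlib-only sketch with sqlite3, assuming hypothetical app_foo/app_userfoo tables; the lookup is keyed by foo_id (not by the UserFoo's own primary key) and restricted to one user:

```python
import sqlite3

def attach_userfoos(conn, foo_ids, user_id):
    """Second query: fetch this user's UserFoo rows for the given foos and map
    each foo id to its comment, or None when the user has no row for it."""
    if not foo_ids:
        return {}
    placeholders = ",".join("?" * len(foo_ids))
    sql = (f"SELECT foo_id, comment FROM app_userfoo "
           f"WHERE user_id = ? AND foo_id IN ({placeholders})")
    by_foo = {foo_id: comment for foo_id, comment in conn.execute(sql, (user_id, *foo_ids))}
    return {foo_id: by_foo.get(foo_id) for foo_id in foo_ids}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_foo (id INTEGER PRIMARY KEY);
    CREATE TABLE app_userfoo (id INTEGER PRIMARY KEY, user_id INTEGER, foo_id INTEGER,
                              comment TEXT, UNIQUE (user_id, foo_id));
    INSERT INTO app_foo VALUES (1), (2), (3);
    INSERT INTO app_userfoo VALUES (1, 7, 1, 'mine'), (2, 8, 2, 'not mine');
""")
# First query: the Foo rows themselves.
foo_ids = [row[0] for row in conn.execute("SELECT id FROM app_foo ORDER BY id")]
print(attach_userfoos(conn, foo_ids, user_id=7))
```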
As for outer joins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
from django.db.models import F, Q

Foo.objects.filter(
    Q(userfoo__user=request.user) |
    Q(userfoo=None)  # This forces the use of LEFT OUTER JOIN.
).annotate(
    comment=F('userfoo__comment'),
    # ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
from django.db.models import Case, Q, When

Foo.objects.filter(
    Q(userfoo__isnull=True) | Q(userfoo__isnull=False)
).annotate(bar=Case(
    When(userfoo__user=request.user, then='userfoo__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
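The difference between putting the user condition in the ON clause and in the WHERE clause can be checked directly in SQL. A minimal sqlite3 sketch with hypothetical app_foo/app_userfoo tables, where foo 2 has a UserFoo row only for a different user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_foo (id INTEGER PRIMARY KEY);
    CREATE TABLE app_userfoo (id INTEGER PRIMARY KEY, user_id INTEGER,
                              foo_id INTEGER, bar TEXT);
    INSERT INTO app_foo VALUES (1), (2), (3);
    -- foo 1: current user (7); foo 2: only another user (8); foo 3: nobody
    INSERT INTO app_userfoo VALUES (1, 7, 1, 'a'), (2, 8, 2, 'b');
""")

# Condition in the ON clause: every foo is kept; bar is NULL unless user 7 has a row.
ON_SQL = """
    SELECT f.id, uf.bar FROM app_foo f
    LEFT OUTER JOIN app_userfoo uf ON uf.foo_id = f.id AND uf.user_id = ?
    ORDER BY f.id
"""
# Condition in the WHERE clause (roughly what the Q(...) | Q(...) filters amount to):
# foo 2 is dropped because its only joined row belongs to another user.
WHERE_SQL = """
    SELECT f.id, uf.bar FROM app_foo f
    LEFT OUTER JOIN app_userfoo uf ON uf.foo_id = f.id
    WHERE uf.user_id = ? OR uf.id IS NULL
    ORDER BY f.id
"""
on_rows = conn.execute(ON_SQL, (7,)).fetchall()
where_rows = conn.execute(WHERE_SQL, (7,)).fetchall()
print(on_rows)
print(where_rows)
```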
maparent's comment put me on the right track:
from django.db.models.sql.datastructures import Join

for alias in qs.query.alias_map.values():
    if isinstance(alias, Join):
        alias.nullable = True
qs.query.promote_joins(qs.query.tables)

What is the internal function in django to add new tables to a queryset in a sensible way?

In Django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the Django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
    name = models.CharField(max_length=200)
    urlname = models.CharField(max_length=200)
    entrydate = models.DateTimeField(auto_now_add=True)
class Reading(models.Model):
    book = models.ForeignKey(Book, related_name='readings')
    user = models.ForeignKey(User)
    rating = models.IntegerField()
    entrydate = models.DateTimeField(auto_now_add=True)
readings=Reading.objects.filter(user=user).order_by('entrydate')
friendrating='(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))'%friend.id
readings=readings.extra(select={'friendrating':friendrating})
At the moment, readings won't work because the join to the book table is not set up. However, if I add an order_by such as:
.order_by('entrydate', 'book__entrydate')
Django magically knows to add an inner join through the foreign key and I get what I want.
additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.
Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The database has no way of knowing what proj_book refers to, since that table isn't included in the query (neither in the FROM clause nor in a join).
You are getting expected results, when you add order_by, because that order_by query is adding inner join between proj_book and proj_reading.
As far as I understand, if you refer to any other column of Book, not just in order_by, you will get similar results:
Q1 = Reading.objects.filter(user=user).exclude(book__name='')  # exclude() forces Django to add the JOIN
Q2 = "select rating from proj_reading where user_id=%d" % user.id
Result = Q1.extra(select={'foo': Q2})
This way, at step Q1, you force Django to add the join on the Book table, which it doesn't do by default unless you access a field of the Book table.
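A possible way to avoid the extra join altogether (an alternative, not what the answer above does) is to correlate the subquery on proj_reading.book_id, which is already in the outer FROM clause; the inner copy of proj_reading then needs its own alias so it doesn't shadow the outer one. A minimal sqlite3 sketch with a simplified table and made-up rows (user 1 is the reader, user 2 the friend):

```python
import sqlite3

FRIEND_RATING_SQL = """
    SELECT proj_reading.id,
           proj_reading.rating,
           (SELECT fr.rating FROM proj_reading fr
             WHERE fr.user_id = ? AND fr.book_id = proj_reading.book_id
               AND fr.rating IN (1, 2, 3, 4, 5, 6)) AS friendrating
    FROM proj_reading
    WHERE proj_reading.user_id = ?
    ORDER BY proj_reading.entrydate
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proj_reading (id INTEGER PRIMARY KEY, book_id INTEGER,
                               user_id INTEGER, rating INTEGER, entrydate TEXT);
    INSERT INTO proj_reading VALUES
        (1, 10, 1, 4, '2010-01-01'),   -- user 1 read book 10
        (2, 11, 1, 5, '2010-01-02'),   -- user 1 read book 11
        (3, 10, 2, 6, '2010-01-03');   -- friend (user 2) rated book 10
""")
# Parameter order follows the placeholders: friend id first, then the reader's id.
rows = conn.execute(FRIEND_RATING_SQL, (2, 1)).fetchall()
print(rows)
```

In the question's extra() call this would mean referencing proj_reading.book_id instead of proj_book.id. On Django 1.11 and later, the same correlation can be written with Subquery and OuterRef('book') instead of raw SQL.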
You mean:
class SomeModel(models.Model):
    # the automatic "id" primary key is what otherfield__id refers to below
    ...
class SomeOtherModel(models.Model):
    otherfield = models.ForeignKey(SomeModel)
qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It won't work because you haven't defined the table join correctly.
myrating = '(select rating from proj_reading inner join proj_book on (proj_book.id = proj_reading.book_id) where proj_reading.user_id = %d and rating in (1,2,3,4,5,6))' % user.id
This is pseudocode and it is not tested.
But I'd advise you to use Django filters instead of writing SQL queries:
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
See the documentation for more details.