Annotating a Django queryset with a left outer join? - django

Say I have a model:
class Foo(models.Model):
...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
user = models.ForeignKey(User)
foo = models.ForeignKey(Foo)
...
class Meta:
unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)

A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.

This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.
Note: tested on Djang 1.7
Suppose you have the following models
class User(models.Model):
name = models.CharField()
class EarnedPoints(models.Model):
points = models.PositiveIntegerField()
user = models.ForeignKey(User)
To get total user points you might do something like that
User.objects.annotate(points=Sum("earned_points__points"))
this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.
Hope that helps

Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.
connection = (
UserFoo._meta.db_table, User._meta.db_table, # JOIN these tables
"user_id", "id", # on these fields
)
results.query.join( # modify the query
connection, # with this table connection
promote=True, # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)

I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.
Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:
class LinkPost(models.Model):
some fields....
class UserPostVote(models.Model):
user = models.ForeignKey(User,related_name="post_votes")
post = models.ForeignKey(LinkPost,related_name="user_votes")
value = models.IntegerField(null=False, default=0)
where the userpostvote table collect's the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query # The query object of the queryset
q1_alias = q1.get_initial_alias() # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with
#parameters, which are none in this example
we now have the query for the queryset, and just wrap it, alias and left outer join to it:
q2_augment = "SELECT B.value as uservote, A.*
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.

The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.

You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
for i in qs:
i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
print foo.userfoo
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.

As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])

You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
Q(userfoo_set__user=request.user) |
Q(userfoo_set=None) # This forces the use of LOUTER JOIN.
).annotate(
comment=F('userfoo_set__comment'),
# ... annotate all the fields you'd like to see added here.
)

The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.

maparent's comment put me on the right way:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
if isinstance(alias, Join):
alias.nullable = True
qs.query.promote_joins(qs.query.tables)

Related

Filtering together with annotate in Django - unexpected result [duplicate]

I always assumed that chaining multiple filter() calls in Django was always the same as collecting them in a single call.
# Equivalent
Model.objects.filter(foo=1).filter(bar=2)
Model.objects.filter(foo=1,bar=2)
but I have run across a complicated queryset in my code where this is not the case
class Inventory(models.Model):
book = models.ForeignKey(Book)
class Profile(models.Model):
user = models.OneToOneField(auth.models.User)
vacation = models.BooleanField()
country = models.CharField(max_length=30)
# Not Equivalent!
Book.objects.filter(inventory__user__profile__vacation=False).filter(inventory__user__profile__country='BR')
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')
The generated SQL is
SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") INNER JOIN "library_inventory" T5 ON ("library_book"."id" = T5."book_id") INNER JOIN "auth_user" T6 ON (T5."user_id" = T6."id") INNER JOIN "library_profile" T7 ON (T6."id" = T7."user_id") WHERE ("library_profile"."vacation" = False AND T7."country" = BR )
SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") WHERE ("library_profile"."vacation" = False AND "library_profile"."country" = BR )
The first queryset with the chained filter() calls joins the Inventory model twice effectively creating an OR between the two conditions whereas the second queryset ANDs the two conditions together. I was expecting that the first query would also AND the two conditions. Is this the expected behavior or is this a bug in Django?
The answer to a related question Is there a downside to using ".filter().filter().filter()..." in Django? seems to indicated that the two querysets should be equivalent.
The way I understand it is that they are subtly different by design (and I am certainly open for correction): filter(A, B) will first filter according to A and then subfilter according to B, while filter(A).filter(B) will return a row that matches A 'and' a potentially different row that matches B.
Look at the example here:
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
particularly:
Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects
...
In this second example (filter(A).filter(B)), the first filter restricted the queryset to (A). The second filter restricted the set of blogs further to those that are also (B). The entries select by the second filter may or may not be the same as the entries in the first filter.`
These two style of filtering are equivalent in most cases, but when query on objects base on ForeignKey or ManyToManyField, they are slightly different.
Examples from the documentation.
model
Blog to Entry is a one-to-many relation.
from django.db import models
class Blog(models.Model):
...
class Entry(models.Model):
blog = models.ForeignKey(Blog)
headline = models.CharField(max_length=255)
pub_date = models.DateField()
...
objects
Assuming there are some blog and entry objects here.
queries
Blog.objects.filter(entry__headline_contains='Lennon',
entry__pub_date__year=2008)
Blog.objects.filter(entry__headline_contains='Lennon').filter(
entry__pub_date__year=2008)
For the 1st query (single filter one), it match only blog1.
For the 2nd query (chained filters one), it filters out blog1 and blog2.
The first filter restricts the queryset to blog1, blog2 and blog5; the second filter restricts the set of blogs further to blog1 and blog2.
And you should realize that
We are filtering the Blog items with each filter statement, not the Entry items.
So, it's not the same, because Blog and Entry are multi-valued relationships.
Reference: https://docs.djangoproject.com/en/1.8/topics/db/queries/#spanning-multi-valued-relationships
If there is something wrong, please correct me.
Edit: Changed v1.6 to v1.8 since the 1.6 links are no longer available.
As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.
Example1 (same joined table) :
(example from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)
Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)
This will give you all the Blogs that have one entry with both (entry_headline_contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.
Result:
Book with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}
Example 2 (chained)
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry_headline_contains='Lennon') and then from the result filters (entry__pub_date__year=2008).
The difference is that it will also give you results like:
Book with {entry.headline: 'Lennon', entry.pub_date: 2000}, {entry.headline: 'Bill', entry.pub_date: 2008}
In your case
I think it is this one you need:
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')
And if you want to use OR please read: https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects
From Django docs :
To handle both of these situations, Django has a consistent way of processing filter() calls. Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
It is clearly said that multiple conditions in a single filter() are applied simultaneously.
That means that doing :
objs = Mymodel.objects.filter(a=True, b=False)
will return a queryset with raws from model Mymodel where a=True AND b=False.
Successive filter(), in some case, will provide the same result. Doing :
objs = Mymodel.objects.filter(a=True).filter(b=False)
will return a queryset with raws from model Mymodel where a=True AND b=False too. Since you obtain "first" a queryset with records which have a=True and then it's restricted to those who have b=False at the same time.
The difference in chaining filter() comes when there are multi-valued relations, which means you are going through other models (such as the example given in the docs, between Blog and Entry models). It is said that in that case (...) they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
Which means that it applies the successives filter() on the target model directly, not on previous filter()
If I take the example from the docs :
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
remember that it's the model Blog that is filtered, not the Entry. So it will treat the 2 filter() independently.
It will, for instance, return a queryset with Blogs, that have entries that contain 'Lennon' (even if they are not from 2008) and entries that are from 2008 (even if their headline does not contain 'Lennon')
THIS ANSWER goes even further in the explanation. And the original question is similar.
Sometimes you don't want to join multiple filters together like this:
def your_dynamic_query_generator(self, event: Event):
qs \
.filter(shiftregistrations__event=event) \
.filter(shiftregistrations__shifts=False)
And the following code would actually not return the correct thing.
def your_dynamic_query_generator(self, event: Event):
return Q(shiftregistrations__event=event) & Q(shiftregistrations__shifts=False)
What you can do now is to use an annotation count-filter.
In this case we count all shifts which belongs to a certain event.
qs: EventQuerySet = qs.annotate(
num_shifts=Count('shiftregistrations__shifts', filter=Q(shiftregistrations__event=event))
)
Afterwards you can filter by annotation.
def your_dynamic_query_generator(self):
return Q(num_shifts=0)
This solution is also cheaper on large querysets.
Hope this helps.
Saw this in a comment and I thought it was the simplest explanation.
filter(A, B) is the AND ; filter(A).filter(B) is OR
It's true if every linked model satisfies both conditions

django orm - How to use select_related() on the Foreign Key of a Subclass from its Super Class

I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
field1 = models.CharField(max_length=255)
class B(A):
fk_field = models.ForeignKey('C')
class C(models.Model):
field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
print a.b.fk_field.field2 #Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug, specifically it seems to be ignoring the nullable nature of the A->B relationship, if for example you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and django would use a left join for it. You should probably raise this in the django issue tracker. You could also try using prefetch_related instead of select_related that might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database everytime a C object is accessed from the A object.
for a in the_as:
#Note that this throws an DoesNotExist error if a doesn't have an
#associated b
print a.b.fk_field.field2 #Hits the database everytime.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as] #Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids) #Run a query to get all of the needed C records
for c in the_cs:
for a in the_as:
if a.b.fk_field_id == c.pk: #Throws DoesNotExist if no b associated with a
a.b.fk_field = c
break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.

Chaining multiple filter() in Django, is this a bug?

I always assumed that chaining multiple filter() calls in Django was always the same as collecting them in a single call.
# Equivalent
Model.objects.filter(foo=1).filter(bar=2)
Model.objects.filter(foo=1,bar=2)
but I have run across a complicated queryset in my code where this is not the case
class Inventory(models.Model):
book = models.ForeignKey(Book)
class Profile(models.Model):
user = models.OneToOneField(auth.models.User)
vacation = models.BooleanField()
country = models.CharField(max_length=30)
# Not Equivalent!
Book.objects.filter(inventory__user__profile__vacation=False).filter(inventory__user__profile__country='BR')
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')
The generated SQL is
SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") INNER JOIN "library_inventory" T5 ON ("library_book"."id" = T5."book_id") INNER JOIN "auth_user" T6 ON (T5."user_id" = T6."id") INNER JOIN "library_profile" T7 ON (T6."id" = T7."user_id") WHERE ("library_profile"."vacation" = False AND T7."country" = BR )
SELECT "library_book"."id", "library_book"."asin", "library_book"."added", "library_book"."updated" FROM "library_book" INNER JOIN "library_inventory" ON ("library_book"."id" = "library_inventory"."book_id") INNER JOIN "auth_user" ON ("library_inventory"."user_id" = "auth_user"."id") INNER JOIN "library_profile" ON ("auth_user"."id" = "library_profile"."user_id") WHERE ("library_profile"."vacation" = False AND "library_profile"."country" = BR )
The first queryset with the chained filter() calls joins the Inventory model twice effectively creating an OR between the two conditions whereas the second queryset ANDs the two conditions together. I was expecting that the first query would also AND the two conditions. Is this the expected behavior or is this a bug in Django?
The answer to a related question Is there a downside to using ".filter().filter().filter()..." in Django? seems to indicated that the two querysets should be equivalent.
The way I understand it is that they are subtly different by design (and I am certainly open for correction): filter(A, B) will first filter according to A and then subfilter according to B, while filter(A).filter(B) will return a row that matches A 'and' a potentially different row that matches B.
Look at the example here:
https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
particularly:
Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects
...
In this second example (filter(A).filter(B)), the first filter restricted the queryset to (A). The second filter restricted the set of blogs further to those that are also (B). The entries select by the second filter may or may not be the same as the entries in the first filter.`
These two style of filtering are equivalent in most cases, but when query on objects base on ForeignKey or ManyToManyField, they are slightly different.
Examples from the documentation.
model
Blog to Entry is a one-to-many relation.
from django.db import models
class Blog(models.Model):
...
class Entry(models.Model):
blog = models.ForeignKey(Blog)
headline = models.CharField(max_length=255)
pub_date = models.DateField()
...
objects
Assuming there are some blog and entry objects here.
queries
Blog.objects.filter(entry__headline_contains='Lennon',
entry__pub_date__year=2008)
Blog.objects.filter(entry__headline_contains='Lennon').filter(
entry__pub_date__year=2008)
For the 1st query (single filter one), it match only blog1.
For the 2nd query (chained filters one), it filters out blog1 and blog2.
The first filter restricts the queryset to blog1, blog2 and blog5; the second filter restricts the set of blogs further to blog1 and blog2.
And you should realize that
We are filtering the Blog items with each filter statement, not the Entry items.
So, it's not the same, because Blog and Entry are multi-valued relationships.
Reference: https://docs.djangoproject.com/en/1.8/topics/db/queries/#spanning-multi-valued-relationships
If there is something wrong, please correct me.
Edit: Changed v1.6 to v1.8 since the 1.6 links are no longer available.
As you can see in the generated SQL statements the difference is not the "OR" as some may suspect. It is how the WHERE and JOIN is placed.
Example1 (same joined table) :
(example from https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)
Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)
This will give you all the Blogs that have one entry with both (entry_headline_contains='Lennon') AND (entry__pub_date__year=2008), which is what you would expect from this query.
Result:
Book with {entry.headline: 'Life of Lennon', entry.pub_date: '2008'}
Example 2 (chained)
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
This will cover all the results from Example 1, but it will generate slightly more result. Because it first filters all the blogs with (entry_headline_contains='Lennon') and then from the result filters (entry__pub_date__year=2008).
The difference is that it will also give you results like:
Book with {entry.headline: 'Lennon', entry.pub_date: 2000}, {entry.headline: 'Bill', entry.pub_date: 2008}
In your case
I think it is this one you need:
Book.objects.filter(inventory__user__profile__vacation=False, inventory__user__profile__country='BR')
And if you want to use OR please read: https://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects
From Django docs :
To handle both of these situations, Django has a consistent way of processing filter() calls. Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
It is clearly said that multiple conditions in a single filter() are applied simultaneously.
That means that doing :
objs = Mymodel.objects.filter(a=True, b=False)
will return a queryset with raws from model Mymodel where a=True AND b=False.
Successive filter(), in some case, will provide the same result. Doing :
objs = Mymodel.objects.filter(a=True).filter(b=False)
will return a queryset with raws from model Mymodel where a=True AND b=False too. Since you obtain "first" a queryset with records which have a=True and then it's restricted to those who have b=False at the same time.
The difference in chaining filter() comes when there are multi-valued relations, which means you are going through other models (such as the example given in the docs, between Blog and Entry models). It is said that in that case (...) they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.
Which means that it applies the successives filter() on the target model directly, not on previous filter()
If I take the example from the docs :
Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)
remember that it's the model Blog that is filtered, not the Entry. So it will treat the 2 filter() independently.
It will, for instance, return a queryset with Blogs, that have entries that contain 'Lennon' (even if they are not from 2008) and entries that are from 2008 (even if their headline does not contain 'Lennon')
THIS ANSWER goes even further in the explanation. And the original question is similar.
Sometimes you don't want to join multiple filters together like this:
def your_dynamic_query_generator(self, event: Event):
qs \
.filter(shiftregistrations__event=event) \
.filter(shiftregistrations__shifts=False)
And the following code would actually not return the correct thing.
def your_dynamic_query_generator(self, event: Event):
return Q(shiftregistrations__event=event) & Q(shiftregistrations__shifts=False)
What you can do now is to use an annotation count-filter.
In this case we count all shifts which belongs to a certain event.
qs: EventQuerySet = qs.annotate(
num_shifts=Count('shiftregistrations__shifts', filter=Q(shiftregistrations__event=event))
)
Afterwards you can filter by annotation.
def your_dynamic_query_generator(self):
return Q(num_shifts=0)
This solution is also cheaper on large querysets.
Hope this helps.
Saw this in a comment and I thought it was the simplest explanation.
filter(A, B) is the AND ; filter(A).filter(B) is OR
It's true if every linked model satisfies both conditions

What is the internal function in django to add new tables to a queryset in a sensible way?

In django 1.2:
I have a queryset with an extra parameter which refers to a table which is not currently included in the query django generates for this queryset.
If I add an order_by to the queryset which refers to the other table, django adds joins to the other table in the proper way and the extra works. But without the order_by, the extra parameter is failing. I could just add a useless secondary order_by to something in the other table, but I think there should be a better way to do it.
What is the django function to add joins in a sensible way? I know this must be getting called somewhere.
Here is some sample code. It selects all readings for a given user, and annotates the results with the rating (if any) given by another user stored in 'friend'.
class Book(models.Model):
name = models.CharField(max_length=200)
urlname = models.CharField(max_length=200)
entrydate=models.DateTimeField(auto_now_add=True)
class Reading(models.Model):
book=models.ForeignKey(Book,related_name='readings')
user=models.ForeignKey(User)
rating=models.IntegerField()
entrydate=models.DateTimeField(auto_now_add=True)
readings=Reading.objects.filter(user=user).order_by('entrydate')
friendrating='(select rating from proj_reading where user_id=%d and \
book_id=proj_book.id and rating in (1,2,3,4,5,6))'%friend.id
readings=readings.extra(select={'friendrating':friendrating})
at the moment, readings won't work because the join to readings is not set up correctly. however, if I add an order by such as:
.order_by('entrydate','reading__entrydate')
django magically knows to add an inner join through the foreign key and I get what I want.
additional information:
print readings.query ==>
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
assuming
user.id=1
friend.id=2
the error is:
OperationalError: Unknown column proj_book.id in 'where clause'
and it happens because the table proj_book is not included in the query. To restate what I said above - if I now do readings2=readings.order_by('book__entrydate') I can see the proper join is set up and the query works.
Ideally I'd just like to figure out what the name of the qs.query function is that looks at two tables and figures out how they are joined by foreign keys, and just call that manually.
Your generated query:
select ((select rating from proj_reading where user_id=2 and book_id=proj_book.id and rating in (1,2,3,4,5,6)) as 'hisrating', proj_reading.id, proj_reading.user_id, proj_reading.rating, proj_reading.entrydate from proj_reading where proj_reading.user_id=1;
The db has no way to understand what does it mean by proj_book, since it is not included in (from tables or inner join).
You are getting expected results, when you add order_by, because that order_by query is adding inner join between proj_book and proj_reading.
As far as I understand, if you refer any other column in Book, not just order_by, you will get similar results.
Q1 = Reading.objects.filter(user=user).exclude(Book__name='') # Exclude forces to add JOIN
Q2 = "Select rating from proj_reading where user_id=%d" % user.id
Result = Q1.extra("foo":Q2)
This way, at step Q1, you are forcing DJango to add join on Book table, which is not default, unless you access any field of Book table.
you mean:
class SomeModel(models.Model)
id = models.IntegerField()
...
class SomeOtherModel(models.Model)
otherfield = models.ForeignKey(SomeModel)
qrst = SomeOtherModel.objects.filter(otherfield__id=1)
You can use "__" to create table joins.
EDIT:
It wont work because you do not define table join correctly.
myrating='(select rating from proj_reading inner join proj_book on (proj_book.id=proj_reading_id) where proj_reading.user_id=%d and rating in (1,2,3,4,5,6))'%user.id)'
This is a pesdocode and it is not tested.
But, i advice you to use django filters instead of writing sql queries.
read = Reading.objects.filter(book__urlname__icontains="smith", user_id=user.id, rating__in=(1,2,3,4,5,6)).values('rating')
Documentation for more details.

chain filter and exclude on django model with field lookups that span relationships

I have the following models:
class Order_type(models.Model):
description = models.CharField()
class Order(models.Model):
type= models.ForeignKey(Order_type)
order_date = models.DateField(default=datetime.date.today)
status = models.CharField()
processed_time= models.TimeField()
I want a list of the order types that have orders that meet this criteria: (order_date <= today AND processed_time is empty AND status is not blank)
I tried:
qs = Order_type.objects.filter(order__order_date__lte=datetime.date.today(),\
order__processed_time__isnull=True).exclude(order__status='')
This works for the original list of orders:
orders_qs = Order.objects.filter(order_date__lte=datetime.date.today(), processed_time__isnull=True)
orders_qs = orders_qs.exclude(status='')
But qs isn't the right queryset. I think its actually returning a more narrowed filter (since no records are present) but I'm not sure what. According to this (django reference), because I'm referencing a related model I think the exclude works on the original queryset (not the one from the filter), but I don't get exactly how.
OK, I just thought of this, which I think works, but feels sloppy (Is there a better way?):
qs = Order_type.objects.filter(order__id__in=[o.id for o in orders_qs])
What's happening is that the exclude() query is messing things up for you. Basically, it's excluding any Order_type that has at least one Order without a status, which is almost certainly not what you want to happen.
The simplest solution in your case is to use order__status__gt='' in you filter() arguments. However, you will also need to append distinct() to the end of your query, because otherwise you'd get a QuerySet with multiple instances of the same Order_type if it has more than one Order that matches the query. This should work:
qs = Order_type.objects.filter(
order__order_date__lte=datetime.date.today(),
order__processed_time__isnull=True,
order__status__gt='').distinct()
On a side note, in the qs query you gave at the end of the question, you don't have to say order__id__in=[o.id for o in orders_qs], you can simply use order__in=orders_qs (you still also need the distinct()). So this will also work:
qs = Order_type.objects.filter(order__in=Order.objects.filter(
order_date__lte=datetime.date.today(),
processed_time__isnull=True).exclude(status='')).distinct()
Addendum (edit):
Here's the actual SQL that Django issues for the above querysets:
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
LEFT OUTER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE ("testapp_order"."order_date" <= E'2010-07-18'
AND "testapp_order"."processed_time" IS NULL
AND "testapp_order"."status" > E'' );
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
INNER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE "testapp_order"."id" IN
(SELECT U0."id" FROM "testapp_order" U0
WHERE (U0."order_date" <= E'2010-07-18'
AND U0."processed_time" IS NULL
AND NOT (U0."status" = E'' )));
EXPLAIN reveals that the second query is ever so slightly more expensive (cost of 28.99 versus 28.64 with a very small dataset).