I have these three models:
class Track(models.Model):
title = models.TextField()
artist = models.TextField()
class Tag(models.Model):
name = models.CharField(max_length=50)
class TrackHasTag(models.Model):
track = models.ForeignKey('Track', on_delete=models.CASCADE)
tag = models.ForeignKey('Tag', on_delete=models.PROTECT)
And I want to retrieve all Tracks that are not tagged with a specific tag. This gets me what I want: Track.objects.exclude(trackhastag__tag_id='1').only('id') but it's very slow when the tables grow. This is what I get when printing .query of the queryset:
SELECT "track"."id"
FROM "track"
WHERE NOT ( "track"."id" IN (SELECT U1."track_id" AS Col1
FROM "trackhastag" U1
WHERE U1."tag_id" = 1) )
I would like Django to send this query instead:
SELECT "track"."id"
FROM "track"
LEFT OUTER JOIN "trackhastag"
ON "track"."id" = "trackhastag"."track_id"
AND "trackhastag"."tag_id" = 1
WHERE "trackhastag"."id" IS NULL;
But haven't found a way to do so. Using a Raw Query is not really an option as I have to filter the resulting queryset very often.
The cleanest workaround I have found is to create a view in the database and a model TrackHasTagFoo with managed = False that I use to query like: Track.objects.filter(trackhastagfoo__isnull=True). I don't think this is an elegant nor sustainable solution as it involves adding Raw SQL to my migrations to mantain said view.
This is just one example of a situation where we need to do this kind of left join with an extra condition, but the truth is that we are facing this problem in more parts of our application.
Thanks a lot!
As mentioned in Django #29555 you can use FilteredRelation for this purpose since Django 2.0.
Track.objects.annotate(
has_tag=FilteredRelation(
'trackhastag', condition=Q(trackhastag__tag=1)
),
).filter(
has_tag__isnull=True,
)
What about queryset extras? They do not break ORM and can be further filtered (vs RawSQL)
from django.db.models import Q
Track.objects.filter(
# work around to force left outer join
Q(trackhastag__isnull=True) | Q(trackhastag__isnull=False)
).extra(
# where parameters are “AND”ed to any other search criteria
# thus we need to account for NULL
where=[
'"app_trackhastag"."id" <> %s or "app_trackhastag"."id" is NULL'
],
params=[1],
)
produces this somewhat convoluted query:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
("app_trackhastag"."id" IS NULL OR "app_trackhastag"."id" IS NOT NULL) AND
("app_trackhastag"."id" <> 1 or "app_trackhastag"."id" is NULL)
)
Rationale
Step 1
One straight forward way to have a left outer join with queryset is the following:
Track.objects.filter(trackhastag__isnull=True)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE "app_trackhastag"."id" IS NULL
Step 2
Realize that once step 1 is done (we have a left outer join), we can leverage
queryset's extra:
Track.objects.filter(
trackhastag__isnull=True
).extra(
where=['"app_trackhastag"."id" <> %s'],
params=[1],
)
which gives:
SELECT "app_track"."id", "app_track"."title", "app_track"."artist"
FROM "app_track"
LEFT OUTER JOIN "app_trackhastag"
ON ("app_track"."id" = "app_trackhastag"."track_id")
WHERE (
"app_trackhastag"."id" IS NULL AND
("app_trackhastag"."id" <> 1)
)
Step 3
Playing around extra limitations (All where parameters are “AND”ed to any other search criteria) to come up with final solution above.
Using filters is better than exclude... because wit exclude they will get the entire query first and only than excluding the itens you dont want, while filter get only what you want Like you said Track.objects.filter(trackhastagfoo__isnull=True) is better than Exclude one.
Suggestion: You trying to manually do one ManyToMany Relations, as Mohammad said, why you dont try use ManyToManyField? is more easy to use
Maybe this answer your question: Django Left Outer Join
Enric, why you did not use many to many relation
class Track(models.Model):
title = models.TextField()
artist = models.TextField()
tags = models.ManyToManyField(Tag)
class Tag(models.Model):
name = models.CharField(max_length=50)
And for your question
Track.objects.filter(~Q(tags__id=1))
Related
I am working on a django web app that manages payroll based on reports completed, and then payroll generated. 3 models as follows. (ive tried to limit to data needed for question).
class PayRecord(models.Model):
rate = models.FloatField()
user = models.ForeignKey(User)
class Payroll(models.Model):
company = models.ForeignKey(Company)
name = models.CharField()
class PayrollItem(models.Model):
payroll = models.ForeignKey(Payroll)
record = models.OneToOneField(PayRecord, unique=True)
What is the most efficient way to get all the PayRecords that aren't also in PayrollItem. So i can select them to create a payroll item.
There are 100k records, and my initial attempt takes minutes. Attempt tried below (this is far from feasible).
records_completed_in_payrolls = [
p.report.id for p in PayrollItem.objects.select_related(
'record',
'payroll'
)
]
Because you have the related field record in PayrollItem you can reach into that model while you filter PayRecord. Using the __isnull should give you what you want.
PayRecord.objects.filter(payrollitem__isnull=True)
Translates to a sql statement like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON payroll_payrecord.id = payroll_payrollitem.record_id
WHERE payroll_payrollitem.id IS NULL
Depending on your intentions, you may want to chain on a .select_related (https://docs.djangoproject.com/en/3.1/ref/models/querysets/#select-related)
PayRecord.objects.filter(payrollitem__isnull=True).select_related('user')
which translates to something like:
SELECT payroll_payrecord.id,
payroll_payrecord.rate,
payroll_payrecord.user_id,
payroll_user.id,
payroll_user.name
FROM payroll_payrecord
LEFT OUTER JOIN payroll_payrollitem
ON (payroll_payrecord.id = payroll_payrollitem.record_id)
INNER JOIN payroll_user
ON (payroll_payrecord.user_id = payroll_user.id)
WHERE payroll_payrollitem.id IS NULL
The strangest thing, either I'm missing something basic, or maybe a django bug
for example:
class Author(Model):
name = CharField()
class Parent(Model):
name = CharField(
class Subscription(Model):
parent = ForeignKey(Parent, related_name='subscriptions')
class Book(Model):
name = CharField()
good_book = BooleanField()
author = ForeignKey(Author, related_name='books')
class AggregatePerson(Model):
author = OneToOneField(Author, related_name='+')
parent = OneToOneField(Parent, related_name='+')
when I try:
AggregatePerson.objects.annotate(counter=Count('author__books')).order_by('counter')
everything work correctly. both ordering and fields counter and existing_subs show the correct number BUT if I add the following:
AggregatePerson.objects.annotate(existing_subs=Count('parent__subscriptions')).exclude(existing_subs=0).annotate(counter=Count('author__books')).order_by('counter')
Then counter and existing_subs fields become 18
Why 18? and what am I doing wrong?
Thanks for the help!
EDIT clarification after further research:
is the number of parent__subscriptions, the code breaks even without the exclude, **for some reason counter also gets the value of existing_subs
I found the answer to this issue.
Tl;dr:
You need to add distinct=True inside the Count like this:
AggregatePerson.objects.annotate(counter=Count('author__books', distinct=True))
Longer version:
Adding a Count annotation is adding a LEFT OUTER JOIN behind the scene. Since we add two annotations, both referring to the same table, the number of selected and grouped_by rows is increased since some rows may appear twice (once for the first annotation and another for the second annotation) because LEFT OUTER JOIN allows empty cells (rows) on select from the right table.
(repeating essentials of my reply in another forum)
This looks like a Django bug. Possible workarounds:
1) Add the two annotations in one annotate() call:
...annotate(existing_subs=Count('parent__subscriptions'),counter=Count('author__books'))...
2) Replace the annotation for existing_subs and exclude(existing_subs=0) with an exclude (parent__subscriptions=None).
Say I have a model:
class Foo(models.Model):
...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
user = models.ForeignKey(User)
foo = models.ForeignKey(Foo)
...
class Meta:
unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.
Note: tested on Djang 1.7
Suppose you have the following models
class User(models.Model):
name = models.CharField()
class EarnedPoints(models.Model):
points = models.PositiveIntegerField()
user = models.ForeignKey(User)
To get total user points you might do something like that
User.objects.annotate(points=Sum("earned_points__points"))
this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.
connection = (
UserFoo._meta.db_table, User._meta.db_table, # JOIN these tables
"user_id", "id", # on these fields
)
results.query.join( # modify the query
connection, # with this table connection
promote=True, # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.
Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:
class LinkPost(models.Model):
some fields....
class UserPostVote(models.Model):
user = models.ForeignKey(User,related_name="post_votes")
post = models.ForeignKey(LinkPost,related_name="user_votes")
value = models.IntegerField(null=False, default=0)
where the userpostvote table collect's the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query # The query object of the queryset
q1_alias = q1.get_initial_alias() # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with
#parameters, which are none in this example
we now have the query for the queryset, and just wrap it, alias and left outer join to it:
q2_augment = "SELECT B.value as uservote, A.*
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.
The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
for i in qs:
i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
print foo.userfoo
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
Q(userfoo_set__user=request.user) |
Q(userfoo_set=None) # This forces the use of LOUTER JOIN.
).annotate(
comment=F('userfoo_set__comment'),
# ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
maparent's comment put me on the right way:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
if isinstance(alias, Join):
alias.nullable = True
qs.query.promote_joins(qs.query.tables)
I have the following models:
class Order_type(models.Model):
description = models.CharField()
class Order(models.Model):
type= models.ForeignKey(Order_type)
order_date = models.DateField(default=datetime.date.today)
status = models.CharField()
processed_time= models.TimeField()
I want a list of the order types that have orders that meet this criteria: (order_date <= today AND processed_time is empty AND status is not blank)
I tried:
qs = Order_type.objects.filter(order__order_date__lte=datetime.date.today(),\
order__processed_time__isnull=True).exclude(order__status='')
This works for the original list of orders:
orders_qs = Order.objects.filter(order_date__lte=datetime.date.today(), processed_time__isnull=True)
orders_qs = orders_qs.exclude(status='')
But qs isn't the right queryset. I think its actually returning a more narrowed filter (since no records are present) but I'm not sure what. According to this (django reference), because I'm referencing a related model I think the exclude works on the original queryset (not the one from the filter), but I don't get exactly how.
OK, I just thought of this, which I think works, but feels sloppy (Is there a better way?):
qs = Order_type.objects.filter(order__id__in=[o.id for o in orders_qs])
What's happening is that the exclude() query is messing things up for you. Basically, it's excluding any Order_type that has at least one Order without a status, which is almost certainly not what you want to happen.
The simplest solution in your case is to use order__status__gt='' in you filter() arguments. However, you will also need to append distinct() to the end of your query, because otherwise you'd get a QuerySet with multiple instances of the same Order_type if it has more than one Order that matches the query. This should work:
qs = Order_type.objects.filter(
order__order_date__lte=datetime.date.today(),
order__processed_time__isnull=True,
order__status__gt='').distinct()
On a side note, in the qs query you gave at the end of the question, you don't have to say order__id__in=[o.id for o in orders_qs], you can simply use order__in=orders_qs (you still also need the distinct()). So this will also work:
qs = Order_type.objects.filter(order__in=Order.objects.filter(
order_date__lte=datetime.date.today(),
processed_time__isnull=True).exclude(status='')).distinct()
Addendum (edit):
Here's the actual SQL that Django issues for the above querysets:
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
LEFT OUTER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE ("testapp_order"."order_date" <= E'2010-07-18'
AND "testapp_order"."processed_time" IS NULL
AND "testapp_order"."status" > E'' );
SELECT DISTINCT "testapp_order_type"."id", "testapp_order_type"."description"
FROM "testapp_order_type"
INNER JOIN "testapp_order"
ON ("testapp_order_type"."id" = "testapp_order"."type_id")
WHERE "testapp_order"."id" IN
(SELECT U0."id" FROM "testapp_order" U0
WHERE (U0."order_date" <= E'2010-07-18'
AND U0."processed_time" IS NULL
AND NOT (U0."status" = E'' )));
EXPLAIN reveals that the second query is ever so slightly more expensive (cost of 28.99 versus 28.64 with a very small dataset).
I have a model:
class Trades(models.Model):
userid = models.PositiveIntegerField(null=True, db_index=True)
positionid = models.PositiveIntegerField(db_index=True)
tradeid = models.PositiveIntegerField(db_index=True)
orderid = models.PositiveIntegerField(db_index=True)
...
and I want to execute next query:
select *
from trades t1
inner join trades t2
ON t2.tradeid = t1.positionid and t1.tradeid = t2.positionid
can it be done without hacks using Django ORM?
Thx!
select * ...
will take more work. If you can trim back the columns you want from the right hand side
table=SomeModel._meta.db_table
join_column_1=SomeModel._meta.get_field('field1').column
join_column_2=SomeModel._meta.get_field('field2').column
join_queryset=SomeModel.objects.filter()
# Force evaluation of query
querystr=join_queryset.query.__str__()
# Add promote=True and nullable=True for left outer join
rh_alias=join_queryset.query.join((table,table,join_column_1,join_column_2))
# Add the second conditional and columns
join_queryset=join_queryset.extra(select=dict(rhs_col1='%s.%s' % (rhs,join_column_2)),
where=['%s.%s = %s.%s' % (table,join_column_2,rh_alias,join_column_1)])
Add additional columns to have available to the select dict.
The additional constraints are put together in a WHERE after the ON (), which your SQL engine may optimize poorly.
I believe Django's ORM doesn't support doing a join on anything that isn't specified as a ForeignKey (at least, last time I looked into it, that was a limitation. They're always adding features though, so maybe it snuck in).
So your options are to either re-structure your tables so you can use proper foreign keys, or just do a raw SQL Query.
I wouldn't consider a raw SQL query a "hack". Django has good documentation on how to do raw SQL Queries.