Django ORM subquery: IN clause without substitution

I need to build a query with the Django ORM that looks like this in SQL:
select * from A where id not in (select a_id from B where ... )
I tried this code:
ids = B.objects.filter(...)
a_objects = A.objects.exclude(id__in=Subquery(ids.values('a__id'))).all()
The problem is that instead of a nested select, Django generates a query that looks like
select * from A where id not in (1, 2, 3, 4, 5 ....)
where the IN clause explicitly lists all the ids to exclude, which makes the resulting SQL unreadable when it is printed to the logs. Is it possible to adjust this query so that a nested select is used?

As I understand it, your goal is to get all the A's that no B points to through its foreign key. If that's right, you can use a reverse lookup.
Given models defined like this:
class A(models.Model):
    pass
class B(models.Model):
    # on_delete is required from Django 2.0 on; CASCADE is an assumption here
    a = models.ForeignKey(to=A, related_name='bs', on_delete=models.CASCADE)
You can filter like this:
A.objects.filter(bs__isnull=True)
If you don't define related_name, the reverse manager defaults to b_set, but the name used in lookups defaults to the lowercased model name, so the filter becomes A.objects.filter(b__isnull=True)
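An isnull filter on a reverse relation is usually compiled to a LEFT OUTER JOIN with an IS NULL check. As a rough illustration in plain SQL through Python's sqlite3 module (the tables a and b and their rows are hypothetical stand-ins for the two models), the anti-join and the NOT IN subquery select the same rows as long as a_id is never NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL REFERENCES a(id)
    );
    INSERT INTO a (id) VALUES (1), (2), (3), (4);
    INSERT INTO b (id, a_id) VALUES (10, 1), (11, 1), (12, 3);
""")

# The query from the question: exclude every A that some B points to.
not_in_rows = conn.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT a_id FROM b) ORDER BY id"
).fetchall()

# The anti-join form: keep only the A rows for which the join found no B.
anti_join_rows = conn.execute(
    "SELECT a.id FROM a LEFT OUTER JOIN b ON b.a_id = a.id "
    "WHERE b.id IS NULL ORDER BY a.id"
).fetchall()

print(not_in_rows)
print(anti_join_rows)
```

Both queries return the A rows with ids 2 and 4 here. Which form the database executes faster depends on the engine and the data.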

To filter on B you can write
ids = B.objects.filter(x=x).values_list('a_id', flat=True)
This gives you the ids of the A rows referenced by the matching B rows. Then use
a_objects = A.objects.exclude(id__in=ids)
This works whenever B has a foreign key to A, as in the answer above.

You don't need to do anything special, just use the queryset directly in your filter.
ids = B.objects.filter(...).values('a_id')
a_objects = A.objects.exclude(id__in=ids)
# that should generate the subquery statement
select * from A where NOT (id in (select a_id from B where ... ))

Related

Django - getting list of values after annotating a queryset

I have a Django code like this:
max_id_qs = (
    qs1.values('parent__id')
    .annotate(max_id=Max('id'))
    .values_list('max_id', flat=True)
)
The problem is that when I use max_id_qs in a filter like this:
rs = qs2.filter(id__in=max_id_qs)
the query transforms into a MySQL query of the following structure:
select ... from ... where ... and id in (select max(id) from ...)
whereas the intended result should be
select ... from ... where ... and id in (2342, 233, 663, ...)
In other words, I get a subquery instead of a list of integers in the MySQL query, which slows down the lookup dramatically. This surprised me, because I thought Django's values_list returned a list of values.
So the question is: how should I rewrite the code to get the desired MySQL query, with integers instead of the id in (select ... from ...) subquery?
Querysets are lazy, and .values_list still returns a queryset object. To evaluate it simply convert it into a list:
rs = qs2.filter(id__in=list(max_id_qs))
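At the SQL level, converting to a list means the inner query runs first and its values are then sent as literal parameters of the outer query. Here is a minimal sketch of that two-step pattern with Python's sqlite3 module. The item table and its rows are made up for illustration, and this mimics the effect rather than Django's actual internals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO item (id, parent_id)
    VALUES (1, 100), (2, 100), (3, 200), (4, 200), (5, 300);
""")

# Step 1: evaluate the inner query now (the analogue of list(max_id_qs)).
max_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(id) FROM item GROUP BY parent_id ORDER BY 1"
    )
]

# Step 2: pass the concrete values as bound parameters of the outer query,
# so the database sees "id IN (?, ?, ?)" instead of a nested SELECT.
placeholders = ", ".join("?" for _ in max_ids)
rows = conn.execute(
    f"SELECT id, parent_id FROM item WHERE id IN ({placeholders}) ORDER BY id",
    max_ids,
).fetchall()

print(max_ids)
print(rows)
```

The trade-off is an extra round trip to the database, and with a very large id list the IN clause itself can become a problem.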

Django get all values Group By particular one field

I want to execute a simple query like:
select *,count('id') from menu_permission group by menu_id
In Django format I have tried:
MenuPermission.objects.all().values('menu_id').annotate(Count('id'))
It selects only menu_id. The executed query is:
SELECT `menu_permission`.`menu_id`, COUNT(`menu_permission`.`id`) AS `id__count` FROM `menu_permission` GROUP BY `menu_permission`.`menu_id`
But I need other fields also. If I try:
MenuPermission.objects.all().values('id', 'menu_id').annotate(Count('id'))
It adds 'id' in group by condition.
GROUP BY `menu_permission`.`id`
As a result I am not getting the expected result. How can I get all fields in the output but group by a single one?
You can use subqueries to do what you need.
In my case I have two tables, Item and Transaction, where Transaction.item_id links to Item.
First, I prepare a Transaction subquery grouped by item_id that sums the amount field and matches item_id against the pk of the outer query.
per_item_total = Transaction.objects.values('item_id').annotate(total=Sum('amount')).filter(item_id=OuterRef('pk'))
Then I select all rows from Item, plus the subquery result as a total field.
items_with_total = Item.objects.annotate(total=Subquery(per_item_total.values('total')))
This produces the following SQL:
SELECT `item`.`id`, {all other item fields},
(SELECT SUM(U0.`amount`) AS `total` FROM `transaction` U0
WHERE U0.`item_id` = `item`.`id` GROUP BY U0.`item_id` ORDER BY NULL) AS `total` FROM `item`
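To see what this kind of correlated subquery returns, here is a small sketch in SQLite through Python's sqlite3 module. The table name txn and the sample rows are assumptions for illustration (the table above is called transaction, which is a reserved word in SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE txn (id INTEGER PRIMARY KEY, item_id INTEGER, amount INTEGER);
    INSERT INTO item (id, name) VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
    INSERT INTO txn (id, item_id, amount) VALUES (1, 1, 5), (2, 1, 7), (3, 2, 3);
""")

# Every item row is returned with all of its own columns; the subquery is
# evaluated once per item and contributes a single aggregated value.
rows = conn.execute(
    "SELECT item.id, item.name, "
    "       (SELECT SUM(t.amount) FROM txn t WHERE t.item_id = item.id) AS total "
    "FROM item ORDER BY item.id"
).fetchall()

for row in rows:
    print(row)
```

Items without any transaction still appear, with a total of None (SQL NULL), which is usually what you want from a "all fields plus one aggregate" query.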
You are trying to achieve this SQL:
select *, count('id') from menu_permission group by menu_id
But SQL normally requires that, when a GROUP BY clause is used, the select list contains only the columns you are grouping by plus aggregates of other columns. This is not a Django matter; it is how GROUP BY works in SQL.
The rows are grouped by those columns, so those columns can appear in the select, and other columns can be aggregated into a single value if you want them. You can't include other columns directly, because they may have more than one value per group.
For example if you have a column called "permission_code", you could ask for an array of the values in the "permission_code" column when the rows are grouped by menu_id.
Depending on the SQL flavor you are using, this could be in PostgreSQL something like this:
select menu_id, array_agg(permission_code), count(id) from menu_permission group by menu_id
Similarly, a Django queryset can be constructed for this.
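As a rough sketch of the same idea in SQLite, which has group_concat rather than array_agg, run through Python's sqlite3 module (the permission_code column and the rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE menu_permission (
        id INTEGER PRIMARY KEY,
        menu_id INTEGER,
        permission_code TEXT
    );
    INSERT INTO menu_permission (id, menu_id, permission_code)
    VALUES (1, 1, 'read'), (2, 1, 'write'), (3, 2, 'read');
""")

# menu_id is the grouping column; every other column has to go through an
# aggregate (group_concat, COUNT) to collapse each group into one value.
rows = conn.execute(
    "SELECT menu_id, group_concat(permission_code), COUNT(id) "
    "FROM menu_permission GROUP BY menu_id ORDER BY menu_id"
).fetchall()

# The order inside group_concat is not guaranteed, so sort after splitting.
summary = [(menu_id, sorted(codes.split(",")), n) for menu_id, codes, n in rows]
print(summary)
```

Each menu_id yields exactly one row, with the other columns folded into aggregates.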
Hopefully this helps, but if needed please share more about what you need to do and what your data models are.
Currently, the only way to make this work as expected is to base your query on the model you want the GROUP BY to be based on.
In your case it looks like you have a Menu model (menu_id field foreign key) so doing this would give you what you want and will allow getting other aggregate information from your MenuPermission model but will only group by the Menu.id field:
Menu.objects.annotate(perm_count=Count('menupermission__id')).values('perm_count')
Of course there is no need for the "annotate" intermediate step if all you want is that single count.
query = MenuPermission.objects.values('menu_id').annotate(menu_id_count=Count('menu_id'))
You can check your SQL query by print(query.query)
This solution doesn't work, all fields end up in the group by clause, leaving it here because it may still be useful to someone.
model_fields = queryset.model._meta.get_fields()
queryset = queryset.values('menu_id') \
    .annotate(
        count=Count('id'),
        **{field.name: F(field.name) for field in model_fields}
    )
What I'm doing is getting the list of fields of the model and building a dictionary with each field name as the key and an F instance for that field name as the value.
When unpacked (the **), it is interpreted as keyword arguments passed to annotate().
For example, if the model had a "name" field, this annotate call would be equivalent to:
queryset = queryset.values('menu_id') \
    .annotate(
        count=Count('id'),
        name=F("name")
    )
You can use the following code:
MenuPermission.objects.values('menu_id').annotate(Count('id')).values('field1', 'field2', 'field3'...)

django how to write queryset which matches NESTED SELECT

I need to get a queryset that is equivalent to this SQL:
select * from kraj
where kraj_id in (select kraj_id from klient_kraj where klient_id = 1)
As you can see, the klient_kraj model is filtered and a single column, kraj_id, is returned, which is then used for the outer filter.
I wasn't able to find a way to obtain this queryset using the ORM.
Thanks

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
    ...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
    user = models.ForeignKey(User)
    foo = models.ForeignKey(Foo)
    ...
    class Meta:
        unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw() might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND userfoo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include the extra fields from userfoo, which will then be annotated onto the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for, but since this is the first result on Google when searching for "django annotate outer join", I will post it here.
Note: tested on Django 1.7
Suppose you have the following models
class User(models.Model):
    name = models.CharField()
class EarnedPoints(models.Model):
    points = models.PositiveIntegerField()
    user = models.ForeignKey(User)
To get the total points per user you might do something like this
User.objects.annotate(points=Sum("earnedpoints__points"))
This will work, but it will not return users who have no points; here we need an outer join without any direct hacks or raw SQL.
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earnedpoints__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into a LEFT OUTER JOIN and all users will be returned. Users who have no points will have None in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not necessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
    select={"user_comment": "UserFoo.comment"},
    where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
    params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extra(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourselves.
connection = (
    UserFoo._meta.db_table, User._meta.db_table,  # JOIN these tables
    "user_id", "id",                              # on these fields
)
results.query.join(  # modify the query
    connection,      # with this table connection
    promote=True,    # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment attribute containing the value from UserFoo, or None if it doesn't exist.
print(results[0].user_comment)
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem and was unable to solve it without resorting to raw SQL, but I did not want to rewrite the entire query.
What follows is a description of how you can augment a queryset with external raw SQL, without having to care about the actual query that generates the queryset.
Here's a typical scenario: you have a reddit-like site with a LinkPost model and a UserPostVote model, like this:
class LinkPost(models.Model):
    ...  # some fields
class UserPostVote(models.Model):
    user = models.ForeignKey(User, related_name="post_votes")
    post = models.ForeignKey(LinkPost, related_name="user_votes")
    value = models.IntegerField(null=False, default=0)
where the userpostvote table collects the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
So now you have a page, posts_page, generated by the Django paginator, whose object_list is the queryset of posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it gets tricky, and I was unable to find a clean ORM solution. select_related won't let you fetch only the votes of the logged-in user, looping over the posts would run a bunch of queries instead of one, and doing it all in raw SQL means we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query  # The query object of the queryset
q1_alias = q1.get_initial_alias()  # This forces the query object to generate its SQL
(q1str, q1param) = q1.sql_with_params()  # This gets the SQL for the query along with
                                         # its parameters, which are none in this example
We now have the query for the queryset; we just wrap it, alias it, and left outer join to it:
q2_augment = (
    "SELECT B.value AS uservote, A.* "
    "FROM (" + q1str + ") A LEFT OUTER JOIN reddit_userpostvote B "
    "ON A.id = B.post_id AND B.user_id = %s"
)
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment, q1param + q2param)
Voilà! Now we can access post.uservote for each post in the augmented queryset.
And we hit the database with just a single query.
The two queries you suggest are as good as you're going to get (without using raw()); this type of query isn't representable in the ORM at the present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
    # key by foo_id so that each Foo can look up its own UserFoo
    userfoos = dict((i.foo_id, i) for i in UserFoo.objects.filter(foo__in=qs))
    for i in qs:
        i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).transform(userfoo_retriever):
    print(foo.userfoo)
This approach has been quite successful for this need and for efficiently retrieving M2M values. Your query count won't be quite as low, but on certain databases (cough MySQL cough) two simpler queries can often be faster than one with complex JOINs. Many of the cases where I've most needed it also had additional complexity that would have been even harder to hack into an ORM expression.
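The core of the two-query approach is just an in-memory join on the foreign key. A minimal sketch in plain Python, with dataclasses standing in for the model instances (Foo, UserFoo and attach_userfoos here are illustrative stand-ins, not part of django-queryset-transform):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserFoo:
    pk: int
    foo_id: int
    comment: str


@dataclass
class Foo:
    pk: int
    userfoo: Optional[UserFoo] = None


def attach_userfoos(foos: List[Foo], userfoos: List[UserFoo]) -> List[Foo]:
    """Attach to each Foo the UserFoo that references it, or None."""
    # Index the second result set by foreign key, then do one dict lookup
    # per Foo instead of one database query per Foo.
    by_foo_id = {uf.foo_id: uf for uf in userfoos}
    for foo in foos:
        foo.userfoo = by_foo_id.get(foo.pk)
    return foos


foos = [Foo(pk=1), Foo(pk=2), Foo(pk=3)]
userfoos = [
    UserFoo(pk=50, foo_id=1, comment="first"),
    UserFoo(pk=51, foo_id=3, comment="third"),
]
attach_userfoos(foos, userfoos)
print([(foo.pk, foo.userfoo.comment if foo.userfoo else None) for foo in foos])
```

Note that the dictionary is keyed by foo_id, not by the UserFoo's own pk; a Foo with no matching row ends up with userfoo set to None, which is the left outer join behaviour.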
As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
    Q(userfoo__user=request.user) |
    Q(userfoo=None)  # This forces the use of LEFT OUTER JOIN.
).annotate(
    comment=F('userfoo__comment'),
    # ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
    Q(userfoo__isnull=True) | Q(userfoo__isnull=False)
).annotate(bar=Case(
    When(userfoo__user=request.user, then='userfoo__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't put the request.user condition in filter(): it would remove Foo rows whose only joined UserFoo rows belong to other users, and those are rows you wanted to keep. That is why the condition ideally belongs in the ON clause of the join rather than in the WHERE clause.
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
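The difference between putting the user condition in the ON clause and putting it in the WHERE clause is easy to see in plain SQL. Here is a small sketch with Python's sqlite3 module; the tables mirror the Foo and UserFoo models, while the rows and the user id 7 are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY);
    CREATE TABLE userfoo (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        foo_id INTEGER,
        comment TEXT
    );
    INSERT INTO foo (id) VALUES (1), (2), (3);
    INSERT INTO userfoo (id, user_id, foo_id, comment)
    VALUES (1, 7, 1, 'mine'), (2, 8, 1, 'theirs'), (3, 8, 2, 'other');
""")

current_user = 7

# Condition in ON: every foo survives, with NULL where user 7 has no row.
on_rows = conn.execute(
    "SELECT foo.id, uf.comment FROM foo "
    "LEFT OUTER JOIN userfoo uf ON uf.foo_id = foo.id AND uf.user_id = ? "
    "ORDER BY foo.id",
    (current_user,),
).fetchall()

# Condition in WHERE: foo 2 joins only to another user's row, so it is
# filtered out even though we wanted to keep it.
where_rows = conn.execute(
    "SELECT foo.id, uf.comment FROM foo "
    "LEFT OUTER JOIN userfoo uf ON uf.foo_id = foo.id "
    "WHERE uf.user_id = ? OR uf.user_id IS NULL "
    "ORDER BY foo.id",
    (current_user,),
).fetchall()

print(on_rows)
print(where_rows)
```

The ON version returns all three foo rows, while the WHERE version silently loses foo 2. The CASE expression is a way to work around that when the ORM only lets you influence the WHERE clause.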
maparent's comment put me on the right track:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
    if isinstance(alias, Join):
        alias.nullable = True
qs.query.promote_joins(qs.query.tables)

Limit django queryset by another related table

Let's say I have two Django models like this:
class Spam(models.Model):
    somefield = models.CharField()
class Eggs(models.Model):
    parent_spam = models.ForeignKey(Spam)
    child_spam = models.ForeignKey(Spam)
Given a "Spam" object as input, what would the Django query look like that:
Limits the query based on the parent_spam field in the "Eggs" table
Gives me the corresponding child_spam field
And returns a set of "Spam" objects
In SQL:
SELECT * FROM Spam WHERE id IN (SELECT child_spam FROM Eggs WHERE parent_spam = 'input_id')
I know this is only an example, but this model setup doesn't actually validate as it is - you can't have two separate ForeignKeys pointing at the same model without specifying a related_name. So, assuming the related names are egg_parent and egg_child respectively, and your existing Spam object is called my_spam, this would do it:
[egg.child_spam for egg in my_spam.egg_parent.all()]
or
Spam.objects.filter(egg_child__parent_spam=my_spam)
Even better, define a ManyToManyField('self') on the Spam model, which handles all this for you, then you would do:
my_spam.other_spams.all()
According to your SQL, you need something like this
Spam.objects.filter(
    id__in=Eggs.objects.values_list('child_spam').filter(parent_spam='input_id')
)