Django - getting list of values after annotating a queryset - django

I have a Django code like this:
max_id_qs = qs1.values('parent__id').\
annotate(max_id = Max('id'),).\
values_list('max_id', flat = True)
The problem is that when I use max_id_qs in a filter like this:
rs = qs2.filter(id__in = max_id_qs)
the query transforms into a MySQL query of the following structure:
select ... from ... where ... and id in (select max(id) from ...)
whereas the intended result should be
select ... from ... where ... and id in [2342, 233, 663, ...]
In other words, I get subquery instead of list of integers in the MySQL query which slows down the lookup dramatically. What surprises me is that I thought that Django's values_list returns a list of values.
So the question, how should I rewrite the code to achieve the desired MySQL query with integers instead of id in (select ... from...) subquery

Querysets are lazy, and .values_list still returns a queryset object. To evaluate it simply convert it into a list:
rs = qs2.filter(id__in=list(max_id_qs))

Related

Django orm subquery - in clause without substitution

I need to build a query using Django ORM, that looks like this one in SQL:
select * from A where id not in (select a_id from B where ... )
I try to use such code:
ids = B.objects.filter(...)
a_objects = A.object.exclude(id__in=Subquery(ids.values('a__id'))).all()
The problem is that instead of nested select Django generates query that looks like
select * from A where id not in (1, 2, 3, 4, 5 ....)
where in clause explicitly lists all ids that should be excluded, making result sql unreadable when it is printed into logs. Is it possible to adjst this query, so nested select is used?
So I see that your goal is to get all the A's that have no foreign key relations from B's. If I'm right, then you can just use inverse lookup to do it.
So, when you define models like that:
class A:
pass
class B:
a = ForeignKey(to=a, related_name='bs')
You can filter it like this:
A.objects.filter(bs__isnull=True)
Also, if you don't define related_name, it will default to b_set, so you will be able to A.objects.filter(b_set__isnull=True)
to make a filter on B you can
ids = B.objects.filter(x=x).values_list('id',flat=true)
you get a list of ids then make
a_objects = A.object.exclude(id__in=ids)
as mentioned before if there is a relation
You don't need to do anything special, just use the queryset directly in your filter.
ids = B.objects.filter(...)
a_objects = A.object.exclude(id__in=ids).all()
# that should generate the subquery statement
select * from A where NOT (id in (select a_id from B where ... ))

Raw query with rank over subquery / params not quoted

My Goal
I need PostgreSQL's rank() window function applied to an annotated queryset from Django's ORM. Django's sql query has to be a subquery in order to apply the window function and this is what I'm doing so far:
queryset = Item.objects.annotate(…)
queryset_with_rank = Items.objects.raw("""
select rank() over (order by points), *
from (%(subquery)s)""", { 'subquery': queryset.query }
)
The problem
Unfortunately, the query returned by queryset.query does not quote the parameters used for annotation correctly although the query itself is executed perfectly fine.
Example of returned query
The query returned by queryset_with_rank.query or queryset.query returns the following
"participation"."category" = )
"participation"."category" = amateur)
which I rather expected to be
"participation"."category" = '')
"participation"."category" = 'amateur')
Question
I noticed that the Django documentation states the following about Query.__str__()
Parameter values won't necessarily be quoted correctly, since that is done by the database interface at execution time.
As long as I fix the quotation manually and pass it to Postgres myself, everything works as expected. Is there a way to receive the needed subquery with correct quotation? Or is there an alternative and better approach to applying a window function to a Django ORM queryset altoghether?
As Django core developer Aymeric Augustin said, there's no way to get the exact query that is executed by the database backend beforehand.
I still managed to build the query the way I hoped to, although a bit cumbersome:
# Obtain query and parameters separately
query, params = item_queryset.query.sql_with_params()
# Put additional quotes around string. I guess this is what
# the database adapter does as well.
params = [
'\'{}\''.format(p)
if isinstance(p, basestring) else p
for p in params
]
# Cast list of parameters to tuple because I got
# "not enough format characters" otherwise. Dunno why.
params = tuple(params)
participations = Item.objects.raw("""
select *,
rank() over (order by points DESC) as rank
from ({subquery}
""".format(subquery=query.format(params)), []
)

How to do a filter with a queryset result Django

I was wondering if it is possible to filter a queryset result with a queryset object attribute.
Example:
clients = Client.objects.filter(name__contains=search)
This should return several objects
result = Invoice.objects.filter(client_id=clients.id)
Now I want all the data inside Invoice that corresponds to the clients.id found.
What is the most optimized way to do it? Since Django is a powerful framework, I was wondering if it has a good and fast way to do it without me having to add the primary result to a list and do a for loop.
You can do this by filtering directly Invoices using lookups
result = Invoice.objects.filter(client__name__contains=search)
Alternatively, you can find all clients, extract ids, and filter invoices by those id.
clients = Client.objects.filter(**your_crazy_search).values_list('id', flat=True).all()
result = Invoices.objects.filter(client_id__in=clients_id)
You don't even need to extract ID from clients, that will work just fine:
clients = Client.objects.filter(name__contains=search)
result = Invoices.objects.filter(client__in=clients)
It will result in SQL query:
SELECT * FROM invoices WHERE result.client_id IN (SELECT `id` FROM `client` WHERE ...)

Django ORM: is it possible to inject subqueries?

I have a Django model that looks something like this:
class Result(models.Model):
date = DateTimeField()
subject = models.ForeignKey('myapp.Subject')
test_type = models.ForeignKey('myapp.TestType')
summary = models.PositiveSmallIntegerField()
# more fields about the result like its location, tester ID and so on
Sometimes we want to retrieve all the test results, other times we only want the most recent result of a particular test type for each subject. This answer has some great options for SQL that will find the most recent result.
Also, we sometimes want to bucket the results into different chunks of time so that we can graph the number of results per day / week / month.
We also want to filter on various fields, and for elegance I'd like a QuerySet that I can then make all the filter() calls on, and annotate for the counts, rather than making raw SQL calls.
I have got this far:
qs = Result.objects.extra(select = {
'date_range': "date_trunc('{0}', time)".format("day"), # Chunking into time buckets
'rn' : "ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)"})
qs = qs.values('date_range', 'result_summary', 'rn')
qs = qs.order_by('-date_range')
which results in the following SQL:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" ORDER BY "date_range" DESC
which is kind of approaching what I'd like, but now I need to somehow filter to only get the rows where rn = 1. I tried using the 'where' field in extra(), which gives me the following SQL and error:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" WHERE "rn"=1 ORDER BY "date_range" DESC ;
ERROR: column "rn" does not exist
So I think the query that finds "rn" needs to be a subquery - but is it possible to do that somehow, perhaps using extra()?
I know I could do this with raw SQL but it just looks ugly! I'd love to find a nice neat way where I have a filterable QuerySet.
I guess the other option is to have a field in the model that indicates whether it is actually the most recent result of that test type for that subject...
I've found a way!
qs = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
This is based on a different option from the answer I referenced earlier. I can filter or annotate qs with whatever I want.
As an aside, on the way to this solution I tried this:
qq = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
qs = Result.objects.filter(id__in=qq)
Django embeds the subquery just as you want it to:
SELECT ...some fields... FROM "myapp_result"
WHERE ("myapp_result"."id" IN (SELECT "myapp_result"."id" FROM "myapp_result"
WHERE (NOT EXISTS(SELECT * FROM myapp_result as T2
WHERE (T2.subject_id = myapp_result.subject_id AND T2.test_type_id = myapp_result.test_type_id AND T2.time > myapp_result.time)))))
I realised this had more subqueries than I need, but I note it here as I can imagine it being useful to know that you can filter one queryset with another and Django does exactly what you'd hope for in terms of embedding the subquery (rather than, say, executing it and embedding the returned values, which would be horrid.)

Django: filter a RawQuerySet

i've got some weird query, so i have to execute raw SQL. The thing is that this query is getting bigger and bigger and with lots of optional filters (ordering, column criteria, etc.).
So, given the this query:
SELECT DISTINCT Camera.* FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
This is roughly the Python code:
def get_cameras(features):
query = "SELECT DISTINCT Camera.* FROM Camera c"
i = 1
for f in features:
alias_name = "fc%s" % i
query += "INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
query += " %s "
i += 1
return Camera.objects.raw(query, tuple(features))
This is working great, but i need to add more filters and ordering, for example suppose i need to filter by color and order by price, it starts to grow:
#extra_filters is a list of tuples like:
# [('price', '=', '12'), ('color' = 'blue'), ('brand', 'like', 'lum%']
def get_cameras_big(features,extra_filters=None,order=None):
query = "SELECT DISTINCT Camera.* FROM Camera c"
i = 1
for f in features:
alias_name = "fc%s" % i
query += "INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
query += " %s "
i += 1
if extra_filters:
query += " WHERE "
for ef in extra_filters:
query += "%s %s %s" % ef #not very safe, refactoring needed
if order:
query += "order by %s" % order
return Camera.objects.raw(query, tuple(features))
So, i don't like how it started to grow, i know Model.objects.raw() returns a RawQuerySet, so i'd like to do something like this:
queryset = get_cameras( ... )
queryset.filter(...)
queryset.order_by(...)
But this doesn't work. Of course i could just perform the raw query and after that get the an actual QuerySet with the data, but i will perform two querys. Like:
raw_query_set = get_cameras( ... )
camera.objects.filter(id__in(raw_query_set.ids)) #don't know if it works, but you get the idea
I'm thinking that something with the QuerySet init or the cache may do the trick, but haven't been able to do it.
.raw() is an end-point. Django can't do anything with the queryset because that would require being able to somehow parse your SQL back into the DBAPI it uses to create SQL in the first place. If you use .raw() it is entirely on you to construct the exact SQL you need.
If you can somehow reduce your query into something that could be handled by .extra() instead. You could construct whatever query you like with Django's API and then tack on the additional SQL with .extra(), but that's going to be your only way around.
There's another option: turn the RawQuerySet into a list, then you can do your sorting like this...
results_list.sort(key=lambda item:item.some_numeric_field, reverse=True)
and your filtering like this...
filtered_results = [i for i in results_list if i.some_field == 'something'])
...all programatically. I've been doing this a ton to minimize db requests. Works great!
I implemented Django raw queryset which supports filter(), order_by(), values() and values_list(). It will not work for any RAW query but for typical SELECT with some INNER JOIN or a LEFT JOIN it should work.
The FilteredRawQuerySet is implemented as a combination of Django model QuerySet and RawQuerySet, where the base (left part) of the SQL query is generated via RawQuerySet, while WHERE and ORDER BY directives are generared by QuerySet:
https://github.com/Dmitri-Sintsov/django-jinja-knockout/blob/master/django_jinja_knockout/query.py
It works with Django 1.8 .. 1.11.
It also has a ListQuerySet implementation for Prefetch object result lists of model instances as well, so these can be processed the same way as ordinary querysets.
Here is the example of usage:
https://github.com/Dmitri-Sintsov/djk-sample/search?l=Python&q=filteredrawqueryset&type=&utf8=%E2%9C%93
Another thing you can do is that if you are unable to convert it to a regular QuerySet is to create a View in your database backend. It basically executes the query in the View when you access it. In Django, you would then create an unmanaged model to attach to the View. With that model, you can apply filter as if it were a regular model. With your foreign keys, you would set the on_delete arg to models.DO_NOTHING.
More information about unmanaged models:
https://docs.djangoproject.com/en/2.0/ref/models/options/#managed