Goal: Use RowNumber function to get number for each row, out of that, filter by a value, but retain the appropriate RowNumber given when not applying the filter, since otherwise RowNumber would always return 1.
Before translating to Django ORM, I find it helps to get the SQL syntax, which is this:
SELECT rn.row_number, name
FROM ( SELECT ROW_NUMBER() OVER (ORDER BY name), name
FROM customer ) as rn
WHERE name = 'Juan'
Problem I can´t manage to translate this to Django ORM. I have tried the following:
subq = models.Customer.objects.all().annotate(
rank=Window(
expression=RowNumber(),
order_by=(F('name'))
)
)
Here´s where I don´t know how to continue. How do I tell my models.Customer to use subq as the FROM in its query?
Related
I am trying to produce SQL like so:
SELECT ...
FROM bigtable
INNER JOIN (
SELECT DISTINCT key
FROM smalltable
WHERE smalltable.x = 'user_input'
) subq ON bigtable.key = subq.key
I have tried a handful of stuff with Django, and so far I've got:
subq = Smalltable.objects.filter(x='%s').values("key").distinct("key")
queryset = Bigtable.objects.extra(
tables=[f"({subq.query}) subq"],
where=["bigtable.key = smalltable.key"],
params=["user_input"],
)
The goal here is a cross join on bigtable and the DISTINCT smalltable. The ON clause is then replaced by a condition in the WHERE. In other words, a valid old-school inner join.
And Django ALMOST has it. It is producing SQL like so:
SELECT ...
FROM bigtable, "(SELECT DISTINCT ...) subq"
WHERE (bigtable.key = subq.key)
Note the double quotes - Django expects a table literal only there and is escaping it as so. How can I get this query done, in either this way or another way? It is important for me that it's an actual join vs IN or EXISTS for query planning purposes.
Django is to making a query much more complicated than it needs to be.
A Sentiment may have a User and a Card, and I am getting the Cards which are not in the passed User's Sentiments
This is the query:
Card.objects.all().exclude(sentiments__in=user.sentiments.all())
this is what Django runs:
SELECT * FROM "cards_card"
WHERE NOT ("cards_card"."id" IN (
SELECT V1."card_id" AS "card_id"
FROM "sentiments_sentiment" V1
WHERE V1."id" IN (
SELECT U0."id"
FROM "sentiments_sentiment" U0
WHERE U0."user_id" = 1
)
)
)
This is a version I came up with which didn't do an N-Times full table scan:
Card.objects.raw('
SELECT DISTINCT "id"
FROM "cards_card"
WHERE NOT "id" IN (
SELECT "card_id"
FROM "sentiments_sentiment"
WHERE "user_id" = ' + user_id + '
)
)')
I don't know why Django has to do it with the N-Times scan. I've been scouring the web for answers, but nothing so far. Any suggestions on how to keep the performance but not have to fall back to raw SQL?
A better way of writing this query without the subqueries would be:
Card.objects.all().exclude(sentiments__user__id=user.id)
I want to get a list of the latest purchase of each customer, sorted by the date.
The following query does what I want except for the date:
(Purchase.objects
.all()
.distinct('customer')
.order_by('customer', '-date'))
It produces a query like:
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.
I want to sort by the date, so what the query I really need should look like this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I don't want to sort using python because I still got to page limit the query. There can be tens of thousands of rows in the database.
In fact it is currently sorted by in python now and is causing very long page load times, so that's why I'm trying to fix this.
Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?
The actual models and methods are several pages long, but here is the set of models required for the queryset above.
class Customer(models.Model):
user = models.OneToOneField(User)
class Purchase(models.Model):
customer = models.ForeignKey(Customer)
date = models.DateField(auto_now_add=True)
item = models.CharField(max_length=255)
If I have data like:
Customer A -
Purchase(item=Chair, date=January),
Purchase(item=Table, date=February)
Customer B -
Purchase(item=Speakers, date=January),
Purchase(item=Monitor, date=May)
Customer C -
Purchase(item=Laptop, date=March),
Purchase(item=Printer, date=April)
I want to be able to extract the following:
Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table, date=February)
There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.
This query will be able to extract that:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id"
"shop_purchase.id"
"shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC,
"shop_purchase.date" DESC;
)
AS result
ORDER BY date DESC;
I'm trying to find a way not to have to use raw SQL to achieve this result.
This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.
Here is an example of something that may help:
from django.db.models import Max
Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))
This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:
SELECT "demo_customer"."id",
"demo_customer"."user_id",
MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
"demo_customer"."user_id"
Another option, would be adding a property to your customer model that would look something like this:
#property
def latest_purchase(self):
return self.purchase_set.order_by('-date')[0]
You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).
I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!
Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:
SELECT * FROM (
SELECT DISTINCT ON
"shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date"
FROM "shop_purchase"
ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
) AS result
ORDER BY date DESC;
In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.
If you need to find the last purchase of every customer, you need to do something like:
SELECT "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1
But the problem with the above query is that it will give you only the customer name and date. Using that will not help you in finding the records when you use these results in a subquery.
To use IN you need a list of unique parameters to identify a record, e.g., id
If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:
SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";
Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.
The complete SQL will now be:
SELECT *
FROM shop_customer
WHERE "shop_customer.id" IN
(SELECT max("shop_purchase.id")
FROM shop_purchase
GROUP BY "shop_purchase.customer_id");
and using the Django ORM it looks like:
(Purchase.objects.filter(
id__in=Purchase.objects
.values('customer_id')
.annotate(latest=Max('id'))
.values_list('latest', flat=True)))
Hope it helps!
I have a similar situation and this is how I'm planning to go about it:
query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)
Upside it gives me the query I want. Downside is that it is raw query and I can't append any other queryset filters.
This is my approach if I need some subset of data (N items) along with the Django query. This is example using PostgreSQL and handy json_build_object() function (Postgres 9.4+), but same way you can use other aggregate function in other database system. For older PostgreSQL versions you can use combination of array_agg() and array_to_string() functions.
Imagine you have Article and Comment models and along with every article in the list you want to select 3 recent comments (change LIMIT 3 to adjust size of subset or ORDER BY c.id DESC to change sorting of subset).
qs = Article.objects.all()
qs = qs.extra(select = {
'recent_comments': """
SELECT
json_build_object('comments',
array_agg(
json_build_object('id', id, 'user_id', user_id, 'body', body)
)
)
FROM (
SELECT
c.id,
c.user_id,
c.body
FROM app_comment c
WHERE c.article_id = app_article.id
ORDER BY c.id DESC
LIMIT 3
) sub
"""
})
for article in qs:
print(article.recent_comments)
# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....
class Log:
project = ForeignKey(Project)
msg = CharField(...)
date = DateField(...)
I want to select the four most recent Log entries where each Log entry must have a unique project foreign key. I've tries the solutions on google search but none of them works and the django documentation isn't that very good for lookup..
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But this either return nothing or Log entries of the same project..
Any help would be appreciated!
Queries don't work like that - either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need to do two queries to get the actual Log entries. Something like:
id_list = Log.objects.order_by('-date').values_list('project_id').distinct()[:4]
entries = Log.objects.filter(id__in=id_list)
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(log.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logs.date DESC LIMIT 1;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
latest_ids_per_project = Log.objects.values_list(
'project').annotate(latest=Max('date')).order_by(
'-latest').values_list('project')
log_objects = Log.objects.filter(
id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg"
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC
I was following the documentation on FullTextSearch in postgresql. I've created a tsvector column and added the information i needed, and finally i've created an index.
Now, to do the search i have to execute a query like this
SELECT *, ts_rank_cd(textsearchable_index_col, query) AS rank
FROM client, plainto_tsquery('famille age') query
WHERE textsearchable_index_col ## query
ORDER BY rank DESC LIMIT 10;
I want to be able to execute this with Django's ORM so i could get the objects. (A little question here: do i need to add the tsvector column to my model?)
My guess is that i should use extra() to change the "where" and "tables" in the queryset
Maybe if i change the query to this, it would be easier:
SELECT * FROM client
WHERE plainto_tsquery('famille age') ## textsearchable_index_col
ORDER BY ts_rank_cd(textsearchable_index_col, plainto_tsquery(text_search)) DESC LIMIT 10
so id' have to do something like:
Client.objects.???.extra(where=[???])
Thxs for your help :)
Another thing, i'm using Django 1.1
Caveat: I'm writing this on a wobbly train, with a headcold, but this should do the trick:
where_statement = """plainto_tsquery('%s') ## textsearchable_index_col
ORDER BY ts_rank_cd(textsearchable_index_col,
plainto_tsquery(%s))
DESC LIMIT 10"""
qs = Client.objects.extra(where=[where_statement],
params=['famille age', 'famille age'])
If you were on Django 1.2 you could just call:
Client.objects.raw("""
SELECT *, ts_rank_cd(textsearchable_index_col, query) AS rank
FROM client, plainto_tsquery('famille age') query
WHERE textsearchable_index_col ## query
ORDER BY rank DESC LIMIT 10;""")