Emitting a subquery in the FROM clause in Django

I am trying to produce SQL like so:
SELECT ...
FROM bigtable
INNER JOIN (
SELECT DISTINCT key
FROM smalltable
WHERE smalltable.x = 'user_input'
) subq ON bigtable.key = subq.key
I have tried a handful of things with Django, and so far I've got:
subq = Smalltable.objects.filter(x='%s').values("key").distinct("key")
queryset = Bigtable.objects.extra(
    tables=[f"({subq.query}) subq"],
    where=["bigtable.key = subq.key"],
    params=["user_input"],
)
The goal here is a cross join on bigtable and the DISTINCT smalltable. The ON clause is then replaced by a condition in the WHERE. In other words, a valid old-school inner join.
And Django ALMOST has it. It is producing SQL like so:
SELECT ...
FROM bigtable, "(SELECT DISTINCT ...) subq"
WHERE (bigtable.key = subq.key)
Note the double quotes: Django expects a plain table name there and quotes it as one. How can I get this query done, either this way or some other way? It is important to me that it's an actual join rather than IN or EXISTS, for query-planning purposes.
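If the ORM route stays blocked, the query can be sent as raw SQL with the user input bound as a parameter; Django's Manager.raw(sql, params) takes the same SQL-plus-params pair. The sketch below uses Python's standard-library sqlite3 module instead of Django so it runs on its own. The bigtable/smalltable/key/x names come from the question, while the id column and the sample rows are made up. It also checks that the old-school comma join, with the condition moved into WHERE, returns the same rows as the explicit INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigtable (id INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE smalltable (key TEXT, x TEXT);
    INSERT INTO bigtable (id, key) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'a');
    -- 'a' appears twice for x = 'u1'; DISTINCT keeps it from doubling the joined rows
    INSERT INTO smalltable (key, x) VALUES ('a', 'u1'), ('a', 'u1'), ('b', 'u1'), ('c', 'u2');
""")

# Explicit INNER JOIN against a derived table; the user input is a bound parameter.
JOIN_SQL = """
    SELECT bigtable.id, bigtable.key
    FROM bigtable
    INNER JOIN (
        SELECT DISTINCT key
        FROM smalltable
        WHERE smalltable.x = ?
    ) subq ON bigtable.key = subq.key
    ORDER BY bigtable.id
"""

# The old-school form from the question: comma join, join condition in WHERE.
COMMA_SQL = """
    SELECT bigtable.id, bigtable.key
    FROM bigtable, (
        SELECT DISTINCT key
        FROM smalltable
        WHERE smalltable.x = ?
    ) subq
    WHERE bigtable.key = subq.key
    ORDER BY bigtable.id
"""

rows_join = conn.execute(JOIN_SQL, ["u1"]).fetchall()
rows_comma = conn.execute(COMMA_SQL, ["u1"]).fetchall()
print(rows_join)  # [(1, 'a'), (2, 'b'), (4, 'a')]
```

In Django the same SQL would go to Bigtable.objects.raw(JOIN_SQL, [user_input]) with %s as the placeholder instead of sqlite3's ?, and raw() needs the primary key in the selected columns, which is why bigtable.id is included.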

Related

How to convert this SQL query to Django Queryset?

I have this query, which selects values from two different tables and uses array_agg over the matched IDs. How can I get the same results using a queryset? Thank you!
select
sf.id_s2_users ,
array_agg(sp.id)
from
s2_followers sf
left join s2_post sp on
sp.id_s2_users = sf.id_s2_users1
where
sp.id_s2_post_status = 1
and sf.id_s2_user_status = 1
group by
sf.id_s2_users
You can run raw SQL queries through Django's ORM (Manager.raw(), or a cursor from django.db.connection) if that's what you want. In that case you don't have to change your query at all; see the Django documentation on performing raw SQL queries.
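To see what the query actually returns, here is a sketch that runs it against SQLite through Python's standard-library sqlite3 module. SQLite has no array_agg, so group_concat stands in for it and the result is split back into a list in Python; the sample rows are invented. Note that the condition on sp.id_s2_post_status in the WHERE clause discards the NULL rows the LEFT JOIN would otherwise keep, so followers without a matching post drop out, just as they would with an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s2_followers (id_s2_users INTEGER, id_s2_users1 INTEGER, id_s2_user_status INTEGER);
    CREATE TABLE s2_post (id INTEGER PRIMARY KEY, id_s2_users INTEGER, id_s2_post_status INTEGER);
    INSERT INTO s2_followers VALUES (1, 10, 1), (1, 11, 1), (2, 12, 1), (3, 10, 0);
    INSERT INTO s2_post VALUES (100, 10, 1), (101, 10, 1), (102, 11, 0), (103, 12, 2);
""")

QUERY = """
    SELECT sf.id_s2_users, group_concat(sp.id)  -- stand-in for array_agg(sp.id)
    FROM s2_followers sf
    LEFT JOIN s2_post sp ON sp.id_s2_users = sf.id_s2_users1
    WHERE sp.id_s2_post_status = 1
      AND sf.id_s2_user_status = 1
    GROUP BY sf.id_s2_users
"""

# group_concat returns a comma-separated string; turn it back into a sorted list of ints.
post_ids_by_user = {
    user: sorted(int(post_id) for post_id in ids.split(","))
    for user, ids in conn.execute(QUERY)
}
print(post_ids_by_user)  # {1: [100, 101]}
```

On PostgreSQL, django.contrib.postgres.aggregates.ArrayAgg is the ORM-side counterpart of array_agg.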

Django ORM - use SubQuery in FROM clause

Goal: use the RowNumber function to number every row, then filter by a value while keeping the row number each row had before the filter was applied. Otherwise the numbering is computed over the filtered rows only, so a single match would always get RowNumber 1.
Before translating to the Django ORM, I find it helps to write the SQL first, which is this:
SELECT rn.row_number, name
FROM ( SELECT ROW_NUMBER() OVER (ORDER BY name), name
FROM customer ) as rn
WHERE name = 'Juan'
Problem: I can't manage to translate this to the Django ORM. I have tried the following:
from django.db.models import F, Window
from django.db.models.functions import RowNumber

subq = models.Customer.objects.all().annotate(
    rank=Window(
        expression=RowNumber(),
        order_by=F('name').asc(),
    )
)
Here's where I don't know how to continue. How do I tell models.Customer to use subq as the FROM in its query?
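The difference between filtering outside and inside the numbered subquery can be checked directly. This sketch uses Python's sqlite3 module against SQLite (window functions need SQLite 3.25 or later) rather than Django; the sample names and the explicit row_num alias, used instead of relying on PostgreSQL's implicit row_number column name, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("Ana",), ("Juan",), ("Luis",), ("Beto",)],
)

# Number every customer inside the derived table, then filter in the outer query.
OUTER_FILTER_SQL = """
    SELECT rn.row_num, rn.name
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY name) AS row_num, name
        FROM customer
    ) AS rn
    WHERE rn.name = ?
"""

# Filtering at the same level: WHERE runs before the window, so numbering restarts at 1.
SAME_LEVEL_SQL = """
    SELECT ROW_NUMBER() OVER (ORDER BY name) AS row_num, name
    FROM customer
    WHERE name = ?
"""

outer_rows = conn.execute(OUTER_FILTER_SQL, ["Juan"]).fetchall()
same_level_rows = conn.execute(SAME_LEVEL_SQL, ["Juan"]).fetchall()
print(outer_rows, same_level_rows)  # [(3, 'Juan')] [(1, 'Juan')]
```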

How to enforce Django to use "JOIN VALUES"

I'm having a performance problem where I need to replace a section of my query. Right now I have the following:
select count(*) FROM "mytable" WHERE "field" IN ('v1', 'v2', ..., 'vN');
This can be translated to the Django ORM as:
Mytable.objects.filter(field__in=myvalues).count()
I need to do the following though:
select count(*) FROM "mytable" JOIN (values ('v1', 'v2', ..., 'vN')) as lookup(value) on lookup.value = "mytable".field;
Is there a way to add this to the ORM? I need to do it with the ORM because I already have other filters. In the worst case, I thought of getting the query string and adding the join manually.
I'm using PostgreSQL 9.6.
I found a way after reading the documentation over and over. I also found an old patch for this that was never merged.
It doesn't actually do the join at the top level, but it runs much faster than using __in directly.
What I'm doing is building a RawSQL() expression (available since Django 1.8) from the join query and passing it to __in.
So here is a code example:
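from django.db.models.expressions import RawSQL

# The join against VALUES runs inside the subquery; the outer query only uses IN.
query = """select myfield from mytable join (values
('v1'), ('v2'), ..., ('vN')
) as lookup(value) on lookup.value = mytable.myfield"""
r = RawSQL(query, [])
mymodel.filter(myfield__in=r)  # mymodel: assumed to be a queryset carrying the other filters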
Now it takes milliseconds instead of minutes!
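The ('v1'), ('v2'), ..., ('vN') list has to be generated somehow. A hedged sketch of doing that with one placeholder per value, so user data never lands in the SQL text, is below; values_join_sql is a hypothetical helper, exercised against SQLite via Python's sqlite3 module (placeholder ?, whereas Django's RawSQL expects %s). It writes the lookup as a CTE, WITH lookup(value) AS (VALUES ...), because SQLite does not accept the AS lookup(value) column-alias form from the question; PostgreSQL accepts both.

```python
import sqlite3


def values_join_sql(values):
    """Build a COUNT query joining mytable against a VALUES list.

    Returns (sql, params) with one placeholder per value.
    """
    if not values:
        raise ValueError("values must not be empty")  # VALUES with no rows is not valid SQL
    rows = ", ".join("(?)" for _ in values)
    sql = (
        f"WITH lookup(value) AS (VALUES {rows}) "
        "SELECT count(*) FROM mytable JOIN lookup ON lookup.value = mytable.field"
    )
    return sql, list(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO mytable (field) VALUES (?)", [("v1",), ("v2",), ("v2",), ("v9",)]
)

sql, params = values_join_sql(["v1", "v2", "v3"])
joined_count = conn.execute(sql, params).fetchone()[0]
in_count = conn.execute(
    "SELECT count(*) FROM mytable WHERE field IN (?, ?, ?)", ["v1", "v2", "v3"]
).fetchone()[0]
print(joined_count, in_count)  # 3 3
```

Unlike IN, a join counts a row once per matching list entry, so duplicate values in the list would inflate the count; deduplicate the list first if that can happen.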

subquery in join with doctrine dql

I want to use DQL to create a query which looks like this in SQL:
select
e.*
from
e
inner join (
select
uuid, max(locale) as locale
from
e
where
locale = 'nl_NL' or
locale = 'nl'
group by
uuid
) as e_ on e.uuid = e_.uuid and e.locale = e_.locale
I tried to use QueryBuilder to generate the query and the subquery. I think each does the right thing by itself, but I can't combine them in the join statement. Does anybody know if this is possible with DQL? I can't use native SQL because I want to return real objects, and I don't know which entity this query runs for (I only know the base class, which has the uuid and locale properties).
$subQueryBuilder = $this->_em->createQueryBuilder();
$subQueryBuilder
->addSelect('e.uuid, max(e.locale) as locale')
->from($this->_entityName, 'e')
->where($subQueryBuilder->expr()->in('e.locale', $localeCriteria))
->groupBy('e.uuid');
$queryBuilder = $this->_em->createQueryBuilder();
$queryBuilder
->addSelect('e')
->from($this->_entityName, 'e')
->join('('.$subQueryBuilder.') as', 'e_')
->where('e.uuid = e_.uuid')
->andWhere('e.locale = e_.locale');
You cannot put a subquery in the FROM clause of your DQL.
I will assume that your PK is {uuid, locale}, as discussed with you on IRC. Since your query also matches on two different columns, this can get ugly.
What you can do is put it into the WHERE clause:
select
e
from
MyEntity e
WHERE
e.uuid IN (
select
e2.uuid
from
MyEntity e2
where
e2.locale IN (:selectedLocales)
group by
e2.uuid
)
AND e.locale IN (
select
max(e3.locale) as locale
from
MyEntity e3
where
e3.locale IN (:selectedLocales)
group by
e3.uuid
)
Please note that I used a comparison against a (non-empty) array of locales that you bind to :selectedLocales. This is to avoid destroying the query cache if you want to match against additional locales.
I also wouldn't suggest building this with the query builder unless there's a real advantage in doing so, since it just makes it easier to break the query cache if you add conditions dynamically (also, it would take three query builders!).

Django: Distinct foreign keys

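from django.db import models

class Log(models.Model):
    # on_delete is required since Django 2.0; CASCADE is an assumed choice.
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    msg = models.CharField(...)
    date = models.DateField(...)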
I want to select the four most recent Log entries, where each entry must belong to a different project. I've tried the solutions I found on Google, but none of them work, and the Django documentation isn't much help for this kind of lookup.
I tried stuff like:
Log.objects.all().distinct('project')[:4]
Log.objects.values('project').distinct()[:4]
Log.objects.values_list('project').distinct('project')[:4]
But these either return nothing or return Log entries from the same project.
Any help would be appreciated!
Queries don't work like that, either in Django's ORM or in the underlying SQL. If you want to get unique IDs, you can only query for the ID. So you'll need two queries to get the actual Log entries. Something like:
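# Note: fields used in order_by() are added to the SELECT, which can stop distinct() from deduplicating on project_id alone.
id_list = Log.objects.order_by('-date').values_list('project_id', flat=True).distinct()[:4]
# Match on project_id (not the Log's own id) to fetch those projects' entries.
entries = Log.objects.filter(project_id__in=id_list)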
Actually, you can get the project_ids in SQL. Assuming that you want the unique project ids for the four projects with the latest log entries, the SQL would look like this:
SELECT project_id, max(logs.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4;
Now, you actually want all of the log information. In PostgreSQL 8.4 and later you can use windowing functions, but that doesn't work on other versions/databases, so I'll do it the more complex way:
SELECT logs.*
FROM logs JOIN (
SELECT project_id, max(logs.date) as max_date
FROM logs
GROUP BY project_id
ORDER BY max_date DESC LIMIT 4 ) as latest
ON logs.project_id = latest.project_id
AND logs.date = latest.max_date;
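The join query above can be tried against SQLite through Python's standard-library sqlite3 module. The rows below are made up, and an outer ORDER BY was added only so the output order is deterministic. If two entries of a project share the same latest date, this form returns both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, msg TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO logs (project_id, msg, date) VALUES (?, ?, ?)",
    [
        (1, "p1 old", "2024-01-01"), (1, "p1 new", "2024-01-09"),
        (2, "p2 only", "2024-01-05"),
        (3, "p3 old", "2024-01-02"), (3, "p3 new", "2024-01-08"),
        (4, "p4 only", "2024-01-07"),
        (5, "p5 only", "2024-01-03"),
    ],
)

# Latest date per project (top four projects), joined back to fetch the full rows.
LATEST_SQL = """
    SELECT logs.*
    FROM logs JOIN (
        SELECT project_id, max(logs.date) AS max_date
        FROM logs
        GROUP BY project_id
        ORDER BY max_date DESC LIMIT 4
    ) AS latest
      ON logs.project_id = latest.project_id
     AND logs.date = latest.max_date
    ORDER BY logs.date DESC
"""

rows = conn.execute(LATEST_SQL).fetchall()
msgs = [row[2] for row in rows]  # columns: id, project_id, msg, date
print(msgs)  # ['p1 new', 'p3 new', 'p4 only', 'p2 only']
```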
Now, if you have access to windowing functions, it's a bit neater (I think anyway), and certainly faster to execute:
SELECT * FROM (
SELECT logs.field1, logs.field2, logs.field3, logs.date,
rank() over ( partition by project_id
order by "date" DESC ) as dateorder
FROM logs ) as logsort
WHERE dateorder = 1
ORDER BY logsort.date DESC LIMIT 4;
OK, maybe it's not easier to understand, but take my word for it, it runs worlds faster on a large database.
I'm not entirely sure how that translates to object syntax, though, or even if it does. Also, if you wanted to get other project data, you'd need to join against the projects table.
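The window-function variant can be run the same way with Python's sqlite3 module (SQLite supports window functions from version 3.25). This sketch swaps rank() for ROW_NUMBER(), a deliberate change rather than the answer's choice, so a project whose two newest entries share a date still yields exactly one row. It selects msg and date instead of the placeholder field1..field3 columns, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, project_id INTEGER, msg TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO logs (project_id, msg, date) VALUES (?, ?, ?)",
    [
        (1, "p1 old", "2024-01-01"), (1, "p1 new", "2024-01-09"),
        (2, "p2 only", "2024-01-05"),
        (3, "p3 old", "2024-01-02"), (3, "p3 new", "2024-01-08"),
        (4, "p4 only", "2024-01-07"),
        (5, "p5 only", "2024-01-03"),
    ],
)

# Number each project's entries newest first, keep number 1, then take the top four.
WINDOWED_SQL = """
    SELECT logsort.msg, logsort.date FROM (
        SELECT logs.msg, logs.date,
               ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY date DESC) AS dateorder
        FROM logs
    ) AS logsort
    WHERE logsort.dateorder = 1
    ORDER BY logsort.date DESC
    LIMIT 4
"""

windowed_msgs = [msg for msg, _ in conn.execute(WINDOWED_SQL)]
print(windowed_msgs)  # ['p1 new', 'p3 new', 'p4 only', 'p2 only']
```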
I know this is an old post, but in Django 2.0, I think you could just use:
Log.objects.values('project').distinct().order_by('project')[:4]
You need two querysets. The good thing is it still results in a single trip to the database (though there is a subquery involved).
from django.db.models import Max

latest_ids_per_project = Log.objects.values_list(
    'project').annotate(latest=Max('date')).order_by(
    '-latest').values_list('project')
log_objects = Log.objects.filter(
    id__in=latest_ids_per_project[:4]).order_by('-date')
This looks a bit convoluted, but it actually results in a surprisingly compact query:
SELECT "log"."id",
"log"."project_id",
"log"."msg",
"log"."date"
FROM "log"
WHERE "log"."id" IN
(SELECT U0."id"
FROM "log" U0
GROUP BY U0."project_id"
ORDER BY MAX(U0."date") DESC
LIMIT 4)
ORDER BY "log"."date" DESC