Django 1.11 OuterRef generating invalid Postgres - django

I have a model named Brand with a field named "name", so this works fine:
brands = (Brand
.objects
.annotate(as_string=models.functions.Concat('name',models.Value("''"))))
I have a model named Item with a foreign key to Brand. The following does NOT work:
annotated_brands = brands.filter(pk=models.OuterRef('brand'))
(Item
.objects
.annotate(brand_string=models.Subquery(annotated_brands.values('as_string'))))
Specifically, I get a ProgrammingError:
missing FROM-clause entry for table "policeinventory_item"
LINE 1: ... FROM "PoliceInventory_brand" U0 WHERE U0."id" = (PoliceInve...
This is borne out when I inspect the SQL query. Here is the broken query:
'SELECT "PoliceInventory_item"."id", "PoliceInventory_item"."_created", "PoliceInventory_item"."_created_by_id", "PoliceInventory_item"."_last_updated", "PoliceInventory_item"."_last_updated_by_id", "PoliceInventory_item"."brand_id", "PoliceInventory_item"."type", (SELECT CONCAT(U0."name", \'\') AS "as_string" FROM "PoliceInventory_brand" U0 WHERE U0."id" = (PoliceInventory_item."brand_id") ORDER BY U0."_created" DESC) AS "brand_string" FROM "PoliceInventory_item" ORDER BY "PoliceInventory_item"."_created" DESC'
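Note how the nested id comparison is performed with an unquoted PoliceInventory_item, NOT "PoliceInventory_item" as it is referred to everywhere else. Postgres folds unquoted identifiers to lowercase, which is why the error names policeinventory_item. The following query works as expected: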
'SELECT "PoliceInventory_item"."id", "PoliceInventory_item"."_created", "PoliceInventory_item"."_created_by_id", "PoliceInventory_item"."_last_updated", "PoliceInventory_item"."_last_updated_by_id", "PoliceInventory_item"."brand_id", "PoliceInventory_item"."type", (SELECT CONCAT(U0."name", \'\') AS "as_string" FROM "PoliceInventory_brand" U0 WHERE U0."id" = ("PoliceInventory_item"."brand_id") ORDER BY U0."_created" DESC) AS "brand_string" FROM "PoliceInventory_item" ORDER BY "PoliceInventory_item"."_created" DESC'
The problem seems pretty clearly to be OuterRef incorrectly failing to use the same table reference as the outer query, resulting in a mismatch. Does anyone know how I can persuade OuterRef to behave correctly?
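The lowercase name in the error message is consistent with how PostgreSQL treats identifiers: unquoted names are folded to lowercase, while double-quoted names keep their case. The sketch below is a simplified model of that rule in plain Python. The helper `fold_identifier` is hypothetical and ignores escaped quotes and schema-qualified names; it is only meant to show why the unquoted reference cannot match the quoted table.

```python
def fold_identifier(ident: str) -> str:
    """Simplified model of PostgreSQL identifier folding (hypothetical helper).

    Double-quoted identifiers keep their case; unquoted ones are lowercased.
    """
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1]
    return ident.lower()


# The outer query refers to the table with quotes, the broken subquery without.
quoted = fold_identifier('"PoliceInventory_item"')
unquoted = fold_identifier('PoliceInventory_item')

print(quoted)              # PoliceInventory_item
print(unquoted)            # policeinventory_item
print(quoted == unquoted)  # False: the two references name different tables
```

Under this model, any mixed-case table name (for example one derived from a mixed-case app label) would hit the same mismatch whenever a reference is emitted without quotes.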

Related

Queryset for a sql query

See Filtering unique values for the problem description, sample data and postgres query. I'd like to convert the SQL to a queryset. I feel like I'm close but not quite.
SELECT Column_A, Column_B, Column_C, 0 as RN
FROM TABLE
WHERE COLUMN_C is null and Column_B in (UserA, UserB, UserC)
UNION ALL
SELECT Column_A, Column_B, Column_C, RN
FROM (
SELECT A.*, ROW_NUMBER() over (partition by A.column_C Order by case A.column_B when 'UserA' then 0 else 1 end, U.Time_Created) rn
FROM Table A
INNER JOIN user U
on U.Column_B = A.Column_B
WHERE A.Column_C is not null and A.Column_B in (UserA, UserB, UserC)) B
WHERE RN = 1
This is what I have so far:
qs1 = Table.objects.filter(Column_C__isnull=True).annotate(rn=Value(0))
qs2 = Table.objects.annotate(rn=Window(
expression=RowNumber(),
partition_by=[Column_C],
order_by=[Case(When(Column_B=UserA, then=0), default=1), 'Table_for_Column_B__time_created']
)).filter(Column_C__isnull=False, rn=1)
return qs2.union(qs1)
This doesn't quite work.
django.db.utils.NotSupportedError: Window is disallowed in the filter clause.
Next, I tried pulling the intermediate result in a subquery, to allow for filtering in the outer query, since I only really need rows with row number = 1.
qs1 = Table.objects.filter(Column_C__isnull=True).annotate(rn=Value(0))
qs2 = Table.objects.annotate(rn=Window(
expression=RowNumber(),
partition_by=[Column_C],
order_by=[Case(When(Column_B=UserA, then=0), default=1), 'Table_for_Column_B__time_created']
)).filter(pk=OuterRef('pk'))
qs3 = Table.objects.annotate(rn=Subquery(qs2.values('rn'))).filter(Column_C__isnull=False, rn=1)
return qs3.union(qs1)
No exceptions this time, but this doesn't work. Every row in the table gets row_number=1 annotated. From the original example, the queryset returns all 7 rows instead of filtering to 5.
Is it possible to filter on window expressions?
What are the best practices to keep in mind when converting window queries to subqueries?
Is there a better way to structure the queryset?
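On the second attempt: the subquery is restricted to pk=OuterRef('pk'), so it holds exactly one row per outer row, and ROW_NUMBER() over a single row is always 1. The following is a rough illustration in plain SQL, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table `t`, its columns and its rows are made up for the example and are not the data from the linked question. It contrasts the correlated form with numbering the rows in a derived table and filtering outside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, column_b TEXT, column_c TEXT);
INSERT INTO t (column_b, column_c) VALUES
    ('UserA', 'x'), ('UserB', 'x'), ('UserC', 'y'), ('UserB', 'y');
""")

# Correlated form: the inner query only sees the row with the outer id,
# so each partition holds one row and the row number is always 1.
correlated = conn.execute("""
    SELECT o.id,
           (SELECT ROW_NUMBER() OVER (PARTITION BY u.column_c ORDER BY u.id)
            FROM t AS u
            WHERE u.id = o.id) AS rn
    FROM t AS o
    ORDER BY o.id
""").fetchall()

# Derived-table form: number all rows first, then filter in the outer query.
wrapped = conn.execute("""
    SELECT id, column_c, rn
    FROM (SELECT id, column_c,
                 ROW_NUMBER() OVER (PARTITION BY column_c ORDER BY id) AS rn
          FROM t) AS numbered
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(correlated)  # every row is annotated with rn = 1
print(wrapped)     # one row per column_c value
```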
You should be able to do this without a window expression, using a Subquery.
First create a queryset for the subquery that orders by the Column_B=UserA match and then by time_created:
from django.db.models import Case, When, Q, Subquery, OuterRef
tables_ordered = Table.objects.filter(
Column_C=OuterRef('Column_C')
).annotate(
user_match=Case(When(Column_B=UserA, then=0), default=1)
).order_by('user_match', 'time_created')
Then this subquery returns the first pk for the matched Column_C from the OuterRef, similar to selecting the first row from your window function:
first_pk_for_each_column_c = Subquery(tables_ordered.values('pk')[:1])
Then use two Q objects to create an OR that selects the row if Column_C is NULL or the pk matches the first pk from the subquery:
Table.objects.filter(
Q(Column_C__isnull=True) | Q(pk=first_pk_for_each_column_c)
)
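For reference, the filter above should correspond roughly to a correlated subquery in SQL. Here is a small sketch using Python's sqlite3 module so it can be run without a Django project. The table name `tbl`, the sample rows, and the assumption that time_created lives on the same table are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (
    id INTEGER PRIMARY KEY,
    column_b TEXT,
    column_c TEXT,
    time_created INTEGER
);
INSERT INTO tbl (column_b, column_c, time_created) VALUES
    ('UserB', NULL, 1),   -- id 1: NULL Column_C, always kept
    ('UserB', 'x',  2),   -- id 2: earliest in group x, but not UserA
    ('UserA', 'x',  3),   -- id 3: UserA wins group x
    ('UserC', 'y',  5),   -- id 4: later than id 5 in group y
    ('UserB', 'y',  4),   -- id 5: no UserA in group y, earliest wins
    ('UserC', NULL, 6);   -- id 6: NULL Column_C, always kept
""")

rows = conn.execute("""
    SELECT o.id
    FROM tbl AS o
    WHERE o.column_c IS NULL
       OR o.id = (
            SELECT u.id
            FROM tbl AS u
            WHERE u.column_c = o.column_c
            ORDER BY CASE WHEN u.column_b = 'UserA' THEN 0 ELSE 1 END,
                     u.time_created
            LIMIT 1)
    ORDER BY o.id
""").fetchall()

kept_ids = [row[0] for row in rows]
print(kept_ids)  # [1, 3, 5, 6]
```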

Creating a Django object with field values from a SELECT equivalent

I am trying to figure out how to reproduce the following using Django. Can anyone help?
INSERT INTO table1 (table2_id, a_field)
SELECT table2.id as table2_id, table3.a_field
FROM table2
INNER JOIN table3 ON
table3.table2_id = table2.id
WHERE table2.id = 123
If I've got this correct (not my original query ;-) ), this is doing the following:
Creating an entry in table1 where...
a field named table2_id will match the id of a row in table2 and
a field named a_field will match the field of the same name in a row of table3 and
the table2/table3 rows from which these values are read are identified by a shared table2.id/table3.table2_id relationship and also by the table2 id being 123.
I don't see how such "calculated" field values can be passed to a create() or get_or_create() style command. Is this perhaps possible using Q() objects?
Django's model layer is an ORM.
The ORM way, you need to:
get the table2Entity
construct a new table1Entity with the table2Entity and related table3Entity values
save the table1Entity
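def batch_save_entity2():
    # get() takes keyword lookups, not a bare positional value
    entity2 = Table2Entity.objects.get(pk=123)
    entity1 = Table1Entity()
    entity1.table2_id = entity2.id
    # assumes entity2.entity3 is the related table3 object
    entity1.a_field = entity2.entity3.a_field
    entity1.save()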
Or just execute the SQL directly, without the ORM:
from django.db import connection

def my_custom_sql(self):
    with connection.cursor() as cursor:
        cursor.execute('''
            INSERT INTO table1 (table2_id, a_field)
            SELECT table2.id AS table2_id, table3.a_field
            FROM table2
            INNER JOIN table3 ON table3.table2_id = table2.id
            WHERE table2.id = 123''')
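To try the INSERT ... SELECT statement outside a Django project, here is a minimal sketch against an in-memory SQLite database. The schema and sample rows are assumptions made up for the example, and the helper name `copy_from_table3` is hypothetical. It also passes the id as a bound parameter instead of hard-coding 123; note that sqlite3 uses `?` placeholders, while Django's cursor.execute() uses `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table2 (id INTEGER PRIMARY KEY);
CREATE TABLE table3 (id INTEGER PRIMARY KEY, table2_id INTEGER, a_field TEXT);
CREATE TABLE table1 (id INTEGER PRIMARY KEY, table2_id INTEGER, a_field TEXT);
INSERT INTO table2 (id) VALUES (123), (456);
INSERT INTO table3 (table2_id, a_field) VALUES (123, 'wanted'), (456, 'other');
""")


def copy_from_table3(connection, table2_id):
    """Insert one table1 row per table3 row joined to the given table2 id."""
    cursor = connection.execute(
        """
        INSERT INTO table1 (table2_id, a_field)
        SELECT table2.id, table3.a_field
        FROM table2
        INNER JOIN table3 ON table3.table2_id = table2.id
        WHERE table2.id = ?
        """,
        (table2_id,),
    )
    return cursor.rowcount  # number of rows inserted


inserted = copy_from_table3(conn, 123)
rows = conn.execute("SELECT table2_id, a_field FROM table1").fetchall()
print(inserted, rows)  # 1 [(123, 'wanted')]
```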

Django ORM and GROUP BY

Newcomer to Django here.
I'm currently trying to fetch some data from my model with a query that would need a GROUP BY in SQL.
Here is my simplified model:
class Message(models.Model):
mmsi = models.CharField(max_length=16)
time = models.DateTimeField()
point = models.PointField(geography=True)
I'm basically trying to get the last Message from every distinct mmsi number.
In SQL, that would translate to something like this:
select a.* from core_message a
inner join
(select mmsi, max(time) as time from core_message group by mmsi) b
on a.mmsi=b.mmsi and a.time=b.time;
After some tries, I managed to have something working similarly with Django ORM:
>>> mf=Message.objects.values('mmsi').annotate(Max('time'))
>>> Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max'))
That works, but I find my Django solution quite clumsy. Not sure it's the proper way to do it.
The underlying query looks like this:
>>> print(Message.objects.filter(mmsi__in=mf.values('mmsi'),time__in=mf.values('time__max')).query)
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea FROM "core_message" WHERE ("core_message"."mmsi" IN (SELECT U0."mmsi" FROM "core_message" U0 GROUP BY U0."mmsi") AND "core_message"."time" IN (SELECT MAX(U0."time") AS "time__max" FROM "core_message" U0 GROUP BY U0."mmsi"))
I'd appreciate if you could propose a better solution for this problem.
Thanks !
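One thing worth checking about the two-IN filter above: the mmsi condition and the time condition are evaluated independently, so a row can match when its time happens to be the maximum of a different mmsi. The sketch below reproduces that with Python's sqlite3 module. The sample rows are invented, and integers stand in for timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_message (id INTEGER PRIMARY KEY, mmsi TEXT, time INTEGER);
INSERT INTO core_message (mmsi, time) VALUES
    ('A', 1), ('A', 5),   -- latest for A is time 5
    ('B', 5), ('B', 9);   -- latest for B is time 9, but B also has a row at 5
""")

# Same shape as the SQL Django generated for the two-IN filter.
two_in_filters = conn.execute("""
    SELECT mmsi, time FROM core_message
    WHERE mmsi IN (SELECT mmsi FROM core_message GROUP BY mmsi)
      AND time IN (SELECT MAX(time) FROM core_message GROUP BY mmsi)
    ORDER BY mmsi, time
""").fetchall()

# Same shape as the hand-written join: mmsi and time are matched as a pair.
joined = conn.execute("""
    SELECT a.mmsi, a.time FROM core_message a
    INNER JOIN (SELECT mmsi, MAX(time) AS time
                FROM core_message GROUP BY mmsi) b
        ON a.mmsi = b.mmsi AND a.time = b.time
    ORDER BY a.mmsi
""").fetchall()

print(two_in_filters)  # [('A', 5), ('B', 5), ('B', 9)]
print(joined)          # [('A', 5), ('B', 9)]
```

With real timestamps such collisions may be rare, but nothing in the query rules them out.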
You only need something like this:
Message.objects.all().distinct('mmsi').values('mmsi', 'time').order_by('mmsi','-id')
or like this:
Message.objects.all().values('mmsi').annotate(date_last=Max('time'))
Note: Django translates the latter into this SQL query:
SELECT "message"."mmsi", MAX("message"."time") AS "date_last" FROM "message" GROUP BY "message"."mmsi", "message"."time" ORDER BY "message"."time" DESC
Using the answers and comments, I managed to solve this using a subquery or a simple distinct order by.
Simple distinct order by solution, inspired by @Oriphiel's answer:
Message.objects.distinct('mmsi').order_by('mmsi','-time')
The underlying SQL query looks like this :
SELECT DISTINCT ON ("core_message"."mmsi") "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
ORDER BY "core_message"."mmsi" ASC, "core_message"."time" DESC
Simple and straightforward.
Subquery solution, inspired by @DanielRoseman's comment:
time_order=Message.objects.filter(mmsi=OuterRef('mmsi')).order_by('-time')
Message.objects.filter(id__in=Subquery(time_order.values('id')[:1]))
The underlying SQL query looks like this :
SELECT "core_message"."id", "core_message"."mmsi", "core_message"."time", "core_message"."point"::bytea
FROM "core_message"
WHERE "core_message"."id" IN
(SELECT U0."id" FROM "core_message" U0 WHERE U0."mmsi" = ("core_message"."mmsi") ORDER BY U0."time" DESC LIMIT 1)
A tad more complex, but it gives more flexibility. If I wanted the five most recent messages for every MMSI, I'd just need to change the LIMIT value. In Django, it would look like this:
Message.objects.filter(id__in=Subquery(time_order.values('id')[:5]))
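The same correlated-subquery shape can be tried directly in SQL. Below is a small sketch with Python's sqlite3 module; SQLite has no DISTINCT ON, so only the subquery variant is shown. The sample rows are invented, integers stand in for timestamps, and the helper `latest_per_mmsi` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core_message (id INTEGER PRIMARY KEY, mmsi TEXT, time INTEGER);
INSERT INTO core_message (mmsi, time) VALUES
    ('A', 1), ('A', 2), ('A', 3), ('B', 7), ('B', 8);
""")


def latest_per_mmsi(connection, n):
    """Return (mmsi, time) for the n most recent messages of every mmsi."""
    return connection.execute(
        """
        SELECT m.mmsi, m.time
        FROM core_message AS m
        WHERE m.id IN (
            SELECT u.id
            FROM core_message AS u
            WHERE u.mmsi = m.mmsi
            ORDER BY u.time DESC
            LIMIT ?)
        ORDER BY m.mmsi, m.time DESC
        """,
        (n,),
    ).fetchall()


latest_one = latest_per_mmsi(conn, 1)
latest_two = latest_per_mmsi(conn, 2)
print(latest_one)  # [('A', 3), ('B', 8)]
print(latest_two)  # [('A', 3), ('A', 2), ('B', 8), ('B', 7)]
```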

Using Django Window Functions on a Filtered QuerySet

I ran into a surprising conundrum with Window Functions on Filtered QuerySets.
Consider two models: mymodel and relatedmodel where there is a one to many relationship (i.e. relatedmodel has a ForeignKey into mymodel).
I am using something like this:
window_lag = Window(expression=Lag("pk"), order_by=order_by)
window_lead = Window(expression=Lead("pk"), order_by=order_by)
window_rownnum = Window(expression=RowNumber(), order_by=order_by)
qs1 = mymodel.objects.filter(relatedmodel__field=XXX)
qs2 = qs1.annotate(row=window_rownnum, prior=window_lag, next=window_lead)
qs3 = qs2.filter(pk=myid)
which returns a lovely result for the object pk=myid. I now know its position in the filtered list and its prior and next and I use this to great effect in browsing filtered lists.
Significantly, len(qs1) == len(qs2), which is the size of the list, and len(qs3) == 1.
Alas I just discovered this breaks badly when the filter is less specific:
window_lag = Window(expression=Lag("pk"), order_by=order_by)
window_lead = Window(expression=Lead("pk"), order_by=order_by)
window_rownnum = Window(expression=RowNumber(), order_by=order_by)
qs1 = mymodel.objects.filter(relatedmodel__field__contains=X)
qs2 = qs1.annotate(row=window_rownnum, prior=window_lag, next=window_lead)
qs3 = qs2.filter(pk=myid)
In this instance, qs2 suddenly has more rows in it than qs1! That is, len(qs2) > len(qs1).
Which totally breaks the browser in a sense (as prior and next are not reliable any more). The extra rows are duplicate mymodel objects, wherever more than one relatedmodel object matches the criterion.
I've traced this to the generated SQL.
This is the form of qs1's SQL:
SELECT DISTINCT
"mymodel"."id", "mymodel"."order_by" ....
FROM "mymodel"
INNER JOIN "relatedmodel" ON ("mymodel"."id" = "relatedmodel"."mymodel_id")
WHERE ("relatedmodel"."field"::text LIKE '%X%')
ORDER BY "mymodel"."order_by" ASC
And this runs fine in my database engine as a SQL query, and returns the same number of rows that Django sees. All good.
Then the SQL that qs2 produces resembles:
SELECT DISTINCT
ROW_NUMBER() OVER (ORDER BY "mymodel"."order_by" ASC) AS "row",
LAG("mymodel"."id", 1) OVER (ORDER BY "mymodel"."order_by" ASC) AS "prior",
LEAD("mymodel"."id", 1) OVER (ORDER BY "mymodel"."order_by" ASC) AS "next",
"mymodel"."id", "mymodel"."order_by" ....
FROM "mymodel"
INNER JOIN "relatedmodel" ON ("mymodel"."id" = "relatedmodel"."mymodel_id")
WHERE ("relatedmodel"."field"::text LIKE '%X%')
ORDER BY "mymodel"."order_by" ASC
And again this produces the same number of rows as I see in Django but that's more than qs1 when the relatedmodel matches more than once.
I can doctor the SQL and get what I want, namely by windowing after filtering:
SELECT
ROW_NUMBER() OVER (ORDER BY "order_by" ASC) AS "row",
LAG("id", 1) OVER (ORDER BY "order_by" ASC) AS "prior",
LEAD("mymodel"."id", 1) OVER (ORDER BY "mymodel"."order_by" ASC) AS "next",
"id", "order_by" ....
FROM (
SELECT DISTINCT
"mymodel"."id", "mymodel"."order_by" ....
FROM "mymodel"
INNER JOIN "relatedmodel" ON ("mymodel"."id" = "relatedmodel"."mymodel_id")
WHERE ("relatedmodel"."field"::text LIKE '%X%')
ORDER BY "mymodel"."order_by" ASC
) AS QS
Which works beautifully and returns the same number of rows as qs1 again.
Adding just one window function inside the SELECT causes the DISTINCT to fail for some reason. DISTINCT works fine without a window function (returning only unique mymodel rows) but adding a window function breaks this.
Using the filter as a subquery on the window function works.
And Django supports subqueries, but I can't find a way to apply them here.
So I wonder if there is a way to do this. Namely, to apply the annotation as a wrapper around the QuerySet rather than as additional columns in the queryset.
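A likely explanation for the DISTINCT behaviour: window functions are computed before DISTINCT is applied, so each duplicated joined row already carries its own row number and the rows are no longer identical. Here is a rough reproduction with Python's sqlite3 module (window functions need SQLite 3.25 or newer). The sample rows are invented, and the aliases row_num, prev_id and next_id are stand-ins for "row", "prior" and "next".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mymodel (id INTEGER PRIMARY KEY, order_by INTEGER);
CREATE TABLE relatedmodel (id INTEGER PRIMARY KEY, mymodel_id INTEGER, field TEXT);
INSERT INTO mymodel VALUES (1, 10), (2, 20), (3, 30);
-- mymodel 2 has two related rows that match the filter
INSERT INTO relatedmodel VALUES
    (1, 1, 'aXa'), (2, 2, 'bXb'), (3, 2, 'cXc'), (4, 3, 'dXd');
""")

JOIN_AND_FILTER = """
FROM mymodel
INNER JOIN relatedmodel ON mymodel.id = relatedmodel.mymodel_id
WHERE relatedmodel.field LIKE '%X%'
"""

# qs1 shape: DISTINCT collapses the duplicated mymodel row.
plain = conn.execute(
    "SELECT DISTINCT mymodel.id, mymodel.order_by"
    + JOIN_AND_FILTER
    + "ORDER BY mymodel.order_by"
).fetchall()

# qs2 shape: the row number differs per joined row, so DISTINCT removes nothing.
windowed = conn.execute(
    "SELECT DISTINCT ROW_NUMBER() OVER (ORDER BY mymodel.order_by) AS row_num,"
    " mymodel.id"
    + JOIN_AND_FILTER
    + "ORDER BY mymodel.order_by"
).fetchall()

# Doctored shape: deduplicate in a derived table, then apply the windows.
wrapped = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY order_by) AS row_num,"
    " LAG(id, 1) OVER (ORDER BY order_by) AS prev_id,"
    " LEAD(id, 1) OVER (ORDER BY order_by) AS next_id,"
    " id"
    " FROM (SELECT DISTINCT mymodel.id, mymodel.order_by"
    + JOIN_AND_FILTER
    + ") AS qs ORDER BY order_by"
).fetchall()

print(len(plain), len(windowed))  # 3 4
print(wrapped)  # [(1, None, 2, 1), (2, 1, 3, 2), (3, 2, None, 3)]
```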

Why is the Prefetch object being ignored in prefetch_related?

Using Django 1.7:
My Queryset looks like this:
return super(EmployeeViewSet, self).get_queryset()\
.filter(status__deleted_flag=False, **filter_kwargs)\
.prefetch_related('phone_number_set', 'email_address_set', 'street_address_set',
Prefetch('file_set', EmployeeFile.objects.active().select_related('content_type')))\
.order_by('last_name', 'first_name')
DjDT shows the prefetch query executed
SELECT ••• FROM "employees_employeefile" INNER JOIN "employees_employeefiletype"
ON ("employees_employeefile"."content_type_id" = "employees_employeefiletype"."id" )
WHERE ("employees_employeefile"."deleted_at" IS NULL
AND "employees_employeefile"."owner_id" IN (53, 81, ...))
ORDER BY "employees_employeefile"."created_at" DESC
Then DjDT shows each employee getting requeried for the same list of files
SELECT ••• FROM "employees_employeefile"
WHERE ("employees_employeefile"."owner_id" = 53
AND "employees_employeefile"."deleted_at" IS NULL)
ORDER BY "employees_employeefile"."created_at" DESC
The queries look the same to me, and the ModelManager method used to do the select is the same one: EmployeeFile.objects.active() vs. employee.file_set.active().
Also, I tried removing the select_related('content_type') just in case that was the problem.
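A possible cause, going by the Django documentation for prefetch_related: prefetched results are only reused when the relation is read as-is (for example with .all()); any further chained method that implies a different query, such as employee.file_set.active(), runs a fresh query and ignores the cache. The toy class below models that rule in plain Python. It is not Django code; `FakeRelatedManager` and its attributes are hypothetical and exist only to make the behaviour concrete.

```python
class FakeRelatedManager:
    """Toy stand-in for a related manager with a prefetch cache (not Django)."""

    def __init__(self, fetch):
        self._fetch = fetch       # callable that simulates a database query
        self._prefetched = None   # filled in by prefetch()
        self.query_count = 0

    def _run_query(self):
        self.query_count += 1
        return self._fetch()

    def prefetch(self):
        # Simulates the single prefetch query done up front.
        self._prefetched = self._run_query()

    def all(self):
        # Served from the cache when one exists.
        if self._prefetched is not None:
            return self._prefetched
        return self._run_query()

    def active(self):
        # Any other method builds a new query and bypasses the cache.
        return self._run_query()


file_set = FakeRelatedManager(lambda: ["contract.pdf", "badge.png"])
file_set.prefetch()

file_set.all()
file_set.all()
count_after_all = file_set.query_count     # still 1: cache reused

file_set.active()
count_after_active = file_set.query_count  # 2: the cache was bypassed

print(count_after_all, count_after_active)
```

If this is what is happening, reading employee.file_set.all() should reuse the prefetched list (the Prefetch queryset is already filtered with active()), or Prefetch(..., to_attr=...) can store the result under a separate attribute.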