How to achieve left join without where clause in Django - django

House.objects.filter(owner__isnull=True) will produces following SQL:
SELECT "app_House"."id", "app_House"."name" FROM "app_House"
LEFT OUTER JOIN "app_Owner" ON ("app_House"."id" = "app_Owner"."house_id")
WHERE "app_Owner"."id" IS NULL
Is there a syntax available to force Django to omit the where clause?

If I understand you correctly, you want to select all houses without doing a database query each time you need informations about the owner.
For this Case the django-api hast the select_related() method. In your case you would have to use the following query:
House.objects.all().select_related('owner')

Related

Forcing evaluation of prefetched results in django queries

I'm trying to use select_related/prefetch_related to optimize some queries. However I have issues in "forcing" the queries to be evaluated all at once.
Say I'm doing the following:
fp_query = Fourprod.objects.filter(choisi=True).select_related("fk_fournis")
pf = Prefetch("fourprod", queryset=fp_query) #
products = Products.objects.filter(id__in=fp_query).prefetch_related(pf)
With models:
class Fourprod(models.Model):
fk_produit = models.ForeignKey(to=Produit, related_name="fourprod")
fk_fournis = models.ForeignKey(to=Fournis,related_name="fourprod")
choisi = models.BooleanField(...)
class Produit(models.Model):
... ordinary fields...
class Fournis(models.Model):
... ordinary fields...
So essentially, Fourprod has a fk to Fournis, Produit, and I want to prefetch those when I build the Produits queryset. I've checked in debug that the prefetch actually occurs and it does.
I have a bunch of fields from different models I need to use to compute results. I don't really control the table structure, so I have to work with this. I can't come up with a reasonable query to do it all with the queries (or using raw), so I want to compute stuff python-side. It's a few 1000 objects, so reasonable to do in-memory. So I cast to a list to force the query evaluation:
products = list(products)
At this point, I would think that the Products and the related objects that I have pre-fetched should have been fetched from the DB. In the logs, just after the list() call, I get this:
02/08/22 15:21:08 DEBUG DEFAULT: (0.019) SELECT "products_fourprod"."id", "products_fourprod"."fk_produit_id", "products_fourprod"."fk_fournis_id", "products_fourprod"."choisi", "products_fourprod"."code_four", "products_fourprod"."prix", "products_fourprod"."comment", "products_fournis"."id", "products_fournis"."fk_user_create_id", "products_fournis"."nom", "products_fournis"."adresse", "products_fournis"."ville", "products_fournis"."tel", "products_fournis"."fax", "products_fournis"."contact", "products_fournis"."note", "products_fournis"."pays", "products_fournis"."province", "products_fournis"."postal", "products_fournis"."monnaie", "products_fournis"."tel_long", "products_fournis"."inactif", "products_fournis"."inuse", "products_fournis"."par", "products_fournis"."fk_langue", "products_fournis"."NOTE2" FROM "products_fourprod" LEFT OUTER JOIN "products_fournis" ON ("products_fourprod"."fk_fournis_id" = "products_fournis"."id") WHERE ("products_fourprod"."choisi" AND "products_fourprod"."fk_produit_id" IN (... all Product.id meeting the conditions...)
But then, the list comprehension using the products takes forever to complete:
rows = [[p.id, p.fourprod.first().id, p.desuet, p.no_prod, ... ] for p in products]
With apparently each single call to p.fourprod resulting in a DB hit:
02/08/22 15:26:19 DEBUG DEFAULT: (0.000) SELECT "products_fourprod"."id", "products_fourprod"."fk_produit_id", "products_fourprod"."fk_fournis_id", "products_fourprod"."choisi", "products_fourprod"."code_four", "products_fourprod"."prix", "products_fourprod"."comment", "products_fournis"."id", "products_fournis"."fk_user_create_id", "products_fournis"."nom", "products_fournis"."adresse", "products_fournis"."ville", "products_fournis"."tel", "products_fournis"."fax", "products_fournis"."contact", "products_fournis"."note", "products_fournis"."pays", "products_fournis"."province", "products_fournis"."postal", "products_fournis"."monnaie", "products_fournis"."tel_long", "products_fournis"."inactif", "products_fournis"."inuse", "products_fournis"."par", "products_fournis"."fk_langue", "products_fournis"."NOTE2" FROM "products_fourprod" LEFT OUTER JOIN "products_fournis" ON ("products_fourprod"."fk_fournis_id" = "products_fournis"."id") WHERE ("products_fourprod"."choisi" AND "products_fourprod"."fk_produit_id" = 1185) ORDER BY "products_fourprod"."id" ASC LIMIT 1; args=(1185,)
02/08/22 15:26:19 DEBUG DEFAULT: (0.000) SELECT "products_fourprod"."id", (.... more similar db hits... )
If I remove all the uses of related objects, then the list() call has actually forced the db hit already and the query executes quickly.
So.... if simply calling products = list(products) does not force the db to be queried for the prefetched objects as well, is there any ways I can make django's orm do so?
From the docs:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
first() implies a database query, so that will cause your query to not use the prefetched values.
Try to use p.fourprod.all()[0] instead to access the first related fourprod instead.

Annotate A object with the count of related B objects with Subquery

I am trying to annotate User with the count of delayed leads objects. The calculation of delayed leads is complex (uses RawSQL) implemented using a custom model manager. Hence, I am trying to implement this using a subquery.
sq = Lead.delayed.filter(assigned_to_id=OuterRef('pk'))
User.objects.annotate(num=Count(Subquery(sq.count())))
However, I keep getting this error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
UPDATE:
I tried adding only('id') so my code:
sq = Lead.delayed.filter(assigned_to_id=OuterRef('id')).only('id')
User.objects.annotate(num=Count(Subquery(sq)))
This generated the sql query:
SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`,
`auth_user`.`last_name`, COUNT((SELECT U0.`id` FROM
`lead` U0 WHERE U0.`assigned_to_id` = (`auth_user`.`id`))) AS `count`
FROM `auth_user` GROUP BY `auth_user`.`id`;
This is throwing error:
ERROR 1242 (21000): Subquery returns more than 1 row
I would like to have my query generated as:
SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`,
`auth_user`.`last_name`, (SELECT COUNT(U0.`id`) FROM `marketing_lead` U0 WHERE
(more complex conditions here) U0.`assigned_to_id` = (`auth_user`.`id`)) AS `count`
FROM `auth_user` GROUP BY `auth_user`.`id`;
How can I acheive that using django ORM?
Alternative question label might be How to use Count() not perform grouping (GROUP BY) or How to count all in a Subquery
Check this answer for custom Count function to just perform simple count on any queryset without grouping.
Unfortunately, so far haven't found native django option for this.
Though the answer from Oleg was quite close to my requirement but I was still getting an SQL error on the query generated by django. Hence, I ended up implementing using cursor.

Django annotate and LEFT OUTER JOIN with desired WHERE Clause

Django 1.10.6
Asset.objects.annotate(
coupon_saved=Count(
Q(coupons__device_id='8ae83c6fa52765061360f5459025cb85e6dc8905')
)
).all().query
produces the following query:
SELECT
"assets_asset"."id",
"assets_asset"."title",
"assets_asset"."description",
"assets_asset"."created",
"assets_asset"."modified",
"assets_asset"."uid",
"assets_asset"."org_id",
"assets_asset"."subtitle",
"assets_asset"."is_active",
"assets_asset"."is_generic",
"assets_asset"."file_standalone",
"assets_asset"."file_ios",
"assets_asset"."file_android",
"assets_asset"."file_preview",
"assets_asset"."json_metadata",
"assets_asset"."file_icon",
"assets_asset"."file_image",
"assets_asset"."video_mobile",
"assets_asset"."video_standalone",
"assets_asset"."file_coupon",
"assets_asset"."where_to_buy",
COUNT("games_coupon"."device_id" = 8ae83c6fa52765061360f5459025cb85e6dc8905) AS "coupon_saved"
FROM
"assets_asset"
LEFT OUTER JOIN
"games_coupon"
ON ("assets_asset"."id" = "games_coupon"."asset_id")
GROUP BY
"assets_asset"."id"
I need to get that device_id=X into LEFT OUTER JOIN definition below.
How to achieve?
TL;DR:
The condition should be in filter.
qs = (
Asset.objects
.filter(coupons__device_id='8ae83c6fa52765061360f5459025cb85e6dc8905')
.annotate(coupon_saved=Count('coupons'))
)
If you want only count > 0 then it can be filtered.
qs = qs.filter(coupon_saved__gt=0)
Footnotes: A one to many query is compiled to LEFT OUTER JOIN in order to be possible to get also base objects (Asset) with zero children. JOINs in Django are based every times on a ForeignKey to the primary key or similarly on OneToOne or ManyToMany, other conditions are compiled to WHERE.
Conditions in annotation (that you used) are possible e.g. as part of Conditional Expressions but it is more complicated to be used correctly and useful e.g. if you want to get many aggregations with many conditions by one query without subqueries and if a full scan is acceptable. This is probably not a subject of a question.

Executing Anti Join in Django ORM

I have two models:
class Note(model):
<attribs>
class Permalink(model):
note = foreign key to Note
I want to execute a query: get all notes which don't have a permalink.
In SQL, I would do it as something like:
SELECT * FROM Note WHERE id NOT IN (SELECT note FROM Permalink);
Wondering how to do this in ORM.
Edit: I don't want to get all the permalinks out into my application. Would instead prefer it to run as a query inside the DB.
You should be able to use this query:
Note.objects.filter(permalink_set__isnull=True)
you can use:
Note.objects.exclude(id__in=Permalink.objects.all().values_list('id', flat=True))

django's .extra(where= clauses are clobbered by table-renaming .filter(foo__in=... subselects

The short of it is, the table names of all queries that are inside a filter get renamed to u0, u1, ..., so my extra where clauses won't know what table to point to. I would love to not have to hand-make all the queries for every way I might subselect on this data, and my current workaround is to turn my extra'd queries into pk values_lists, but those are really slow and something of an abomination.
Here's what this all looks like. You can mostly ignore the details of what goes in the extra of this manager method, except the first sql line which points to products_product.id:
def by_status(self, *statii):
return self.extra(where=["""products_product.id IN
(SELECT recent.product_id
FROM (
SELECT product_id, MAX(start_date) AS latest
FROM products_productstatus
GROUP BY product_id
) AS recent
JOIN products_productstatus AS ps ON ps.product_id = recent.product_id
WHERE ps.start_date = recent.latest
AND ps.status IN (%s))""" % (', '.join([str(stat) for stat in statii]),)])
Which works wonderfully for all the situations involving only the products_product table.
When I want these products as a subselect, i do:
Piece.objects.filter(
product__in=Product.objects.filter(
pk__in=list(
Product.objects.by_status(FEATURED).values_list('id', flat=True))))
How can I keep the generalized abilities of a query set, yet still use an extra where clause?
At first: the issue is not totally clear to me. Is the second code block in your question the actual code you want to execute? If this is the case the query should work as expected since there is no subselect performed.
I assume so that you want to use the second code block without the list() around the subselect to prevent a second query being performed.
The django documentation refers to this issue in the documentation about the extra method. However its not very easy to overcome this issue.
The easiest but most "hakish" solution is to observe which table alias is produced by django for the table you want to query in the extra method. You can rely on the persistent naming of this alias as long as you construct the query always in the same fashion (you don't change the order of multiple extra methods or filter calls that cause a join).
You can inspect a query that will be execute in the DB queryset by using:
print Model.objects.filter(...).query
This will reveal the aliases that are used for the tables you want to query.
As of Django 1.11, you should be able to use Subquery and OuterRef to generate an equivalent query to your extra (using a correlated subquery rather than a join):
def by_status(self, *statii):
return self.filter(
id__in=Subquery(ProductStatus.values("product_id").filter(
status__in=statii,
product__in=Subquery(ProductStatus.objects.values(
"product_id",
).annotate(
latest=Max("start_date"),
).filter(
latest=OuterRef("start_date"),
).values("product_id"),
),
)
You could probably do it with Window expressions as well (as of Django 2.0).
Note that this is untested, so may need some tweaks.