Executing Anti Join in Django ORM - django

I have two models:
class Note(model):
<attribs>
class Permalink(model):
note = foreign key to Note
I want to execute a query: get all notes which don't have a permalink.
In SQL, I would do it as something like:
SELECT * FROM Note WHERE id NOT IN (SELECT note FROM Permalink);
Wondering how to do this in ORM.
Edit: I don't want to get all the permalinks out into my application. Would instead prefer it to run as a query inside the DB.

You should be able to use this query:
Note.objects.filter(permalink_set__isnull=True)

you can use:
Note.objects.exclude(id__in=Permalink.objects.all().values_list('id', flat=True))

Related

Django GROUP BY without aggregate

I would like to write the following query in Postgresql using Django ORM:
SELECT t.id, t.field1 FROM mytable t JOIN ... JOIN ... WHERE .... GROUP BY id
Note that there is NO aggregate function (like SUM, COUNT etc.) in the SELECT part.
By the way, it's a perfectly legal SQL to GROUP BY primary key only in Postgresql.
How do I write it in Django ORM?
I saw workarounds like adding .annotate(dummy=Count('*')) to the queryset (but it slows down the execution) or introducing a dummy custom aggregate function (but it's a dirty hack). How to do it in a clean way?

QuerySet in Django

How can I perform this SQL query
SELECT *
FROM table_name
WHERE column_name != value
in the QuerySet of Django?
I tried this but isn't correct way:
To execute sql queries just use raw method of Model like this:
posts = Post.objects.raw("SELECT * FROM table_name WHERE column_name != %s;", [value])
for post in posts:
# do stuff with post object
But i don't think you need raw query (unless you want to get rid of ORM overhead to fetch records quicker) you can just use ORM like this:
posts = Post.objects.all().exclude(column_name=value)
You can do it using Q:
from django.db.models import Q
posts = Post.objects.filter(~Q(column_name=value))
I think you want to do this query using Django ORM (If I am not wrong). You can do this using Q Expression in Django (https://docs.djangoproject.com/en/4.0/topics/db/queries/#s-complex-lookups-with-q-objects). For != you can use ~ sign. In your case, the query will look like this
Post.objects.filter(~Q(<column_name>=<value>))
Another way to use exclude method (https://docs.djangoproject.com/en/4.0/ref/models/querysets/#s-exclude)
Post.objects.exclude(<column_name>=<value>)
Both queries generate the same raw query in your case:
SELECT * FROM <table_name> WHERE NOT (<column_name>=<value)
If you want to do the raw query then you can use raw method (https://docs.djangoproject.com/en/4.0/topics/db/sql/)
posts = Post.objects.raw("SELECT * FROM table_name WHERE column_name != %s;", [value])
If you want to execute a custom raw query directly then use cursor from django.db connection (https://docs.djangoproject.com/en/4.0/topics/db/sql#s-executing-custom-sql-directly)

Annotating a QuerySet with joined tables in Django

How can I annotate a Django QuerySet with data from a custom join expression, without using raw SQL?
I'd like to translate the following query for the Django ORM, without having to use this question :
SELECT a.*, b.name as b_name
FROM a
JOIN b ON ST_Within(ST_Centroid(a.geom), b.geom)
As far as I could tell, the best candidate for doing something like this it the annotate(...) function, but the documentation didn't have anything on how to add a joined table to the annotated QuerySet.
My other idea was to use something similar to ManyToManyField (maybe subclass it) that can use custom ON ... expressions for its joined model.
Any other idea?

Prevent multiple SQL querys with model relations

Is it possible to prevent multiple querys when i use django ORM ? Example:
product = Product.objects.get(name="Banana")
for provider in product.providers.all():
print provider.name
This code will make 2 SQL querys:
1 - SELECT ••• FROM stock_product WHERE stock_product.name = 'Banana'
2 - SELECT stock_provider.id, stock_provider.name FROM stock_provider INNER JOIN stock_product_reference ON (stock_provider.id = stock_product_reference.provider_id) WHERE stock_product_reference.product_id = 1
I confess, i use Doctrine (PHP) for some projects. With doctrine it's possible to specify joins when retrieve the object (relations are populated in object, so no need to query database again for get attribute relation value).
Is it possible to do the same with Django's ORM ?
PS: I hop my question is comprehensive, english is not my primary language.
In Django 1.4 or later, you can use prefetch_related. It's like select_related but allows M2M relations and such.
product = Product.objects.prefetch_related('providers').get(name="Banana")
You still get two queries, though. From the docs:
prefetch_related, on the other hand, does a separate lookup for each relationship, and does the ‘joining’ in Python.
As for packing this down into a single query, Django won't do it like Doctrine because it doesn't do that much post-processing of the result set (Django would have to remove all the redundant column data, since you'll get a row per provider and each of these rows will have a copy of all of product's fields).
So if you want to pack this down to one query, you're going to have to turn it around and run the query on the Provider table (I'm guessing at your schema):
providers = Provider.objects.filter(product__name="Banana").select_related('product')
This should pack it down to one query, but you won't get a single product ORM object out of it, instead needing to get the product fields via providers[k].product.
You can use prefetch_related, sometimes in combination with select_related, to get all related objects in a single query: https://docs.djangoproject.com/en/1.5/ref/models/querysets/#prefetch-related

django orm - How to use select_related() on the Foreign Key of a Subclass from its Super Class

I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
field1 = models.CharField(max_length=255)
class B(A):
fk_field = models.ForeignKey('C')
class C(models.Model):
field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
print a.b.fk_field.field2 #Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug, specifically it seems to be ignoring the nullable nature of the A->B relationship, if for example you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and django would use a left join for it. You should probably raise this in the django issue tracker. You could also try using prefetch_related instead of select_related that might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database everytime a C object is accessed from the A object.
for a in the_as:
#Note that this throws an DoesNotExist error if a doesn't have an
#associated b
print a.b.fk_field.field2 #Hits the database everytime.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as] #Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids) #Run a query to get all of the needed C records
for c in the_cs:
for a in the_as:
if a.b.fk_field_id == c.pk: #Throws DoesNotExist if no b associated with a
a.b.fk_field = c
break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.