Django with filtered join - django

I have an inner join:
SELECT c.name, b.name FROM company c
INNER JOIN branch b ON b.company_id = c.id
WHERE c.id = 5 AND b.employees > 10;
That I got 3 registers.
How do I make this query in Django to returns only 3 registers?
c = Company.objects.prefetch_related("comp_branches").filter(id=5, comp_branches__employees__gt=10)
c[0].comp_branches.all() # returns 5 registers

You can use a Prefetch object [Django-doc]:
from django.db.models import Prefetch
c = Company.objects.prefetch_related(
Prefetch('comp_branches', Branch.objects.filter(employees__gte=10))
).filter(id=5)
Here Brach is the model that is targeted by comp_branches, this thus might be different.
If you .filter(), you do not filter on the related objects you are prefetching, since that is done in a separate query. The comp_brances__employees__gte=10 would only remove Companys from the QuerySet that have no brach with 10 or more employees.

Related

How to combine two filters from two models?

How to combine two filters from two models?
Must be work as AND (&)
Credit.objects.filter(id__in=CreditPayment.objects.filter(security='Deposit - deposit').values('credit__id').distinct(), bank__id=1))
Credit.objects.filter(id__in=Condition.objects.filter(purpose=3).values('credit__id').distinct(), bank__id=1))
you can use django Q with &
from django.db.models import Q
Credit.objects.filter(Q(id__in=CreditPayment.objects.filter(security='Deposit - deposit').values('credit__id').distinct()) & Q(id__in=Condition.objects.filter(purpose=3).values('credit__id').distinct()), bank__id=1))
Given I understood it correctly, you can do this with two filter statements where you join on the Condition model, like:
Credit.objects.filter(
bank__id=1,
condition__security='Deposit - deposit'
).filter(
condition__purpose=3
).distinct()
This yields a query like:
SELECT DISTINCT credit.*
FROM credit
INNER JOIN condition ON credit.id = condition.credit_id
INNER JOIN condition T3 ON credit.id = T3.credit_id
WHERE credit.bank_id = 1
AND condition.security = Deposit - deposit
AND T3.purpose = 3

Django conditional Subquery aggregate

An simplified example of my model structure would be
class Corporation(models.Model):
...
class Division(models.Model):
corporation = models.ForeignKey(Corporation)
class Department(models.Model):
division = models.ForeignKey(Division)
type = models.IntegerField()
Now I want to display a table that display corporations where a column will contain the number of departments of a certain type, e.g. type=10. Currently, this is implemented with a helper on the Corporation model that retrieves those, e.g.
class Corporation(models.Model):
...
def get_departments_type_10(self):
return (
Department.objects
.filter(division__corporation=self, type=10)
.count()
)
The problem here is that this absolutely murders performance due to the N+1 problem.
I have tried to approach this problem with select_related, prefetch_related, annotate, and subquery, but I havn't been able to get the results I need.
Ideally, each Corporation in the queryset should be annotated with an integer type_10_count which reflects the number of departments of that type.
I'm sure I could do something with raw sql in .extra(), but the docs announce that it is going to be deprecated (I'm on Django 1.11)
EDIT: Example of raw sql solution
corps = Corporation.objects.raw("""
SELECT
*,
(
SELECT COUNT(*)
FROM foo_division div ON div.corporation_id = c.id
JOIN foo_department dept ON dept.division_id = div.id
WHERE dept.type = 10
) as type_10_count
FROM foo_corporation c
""")
I think with Subquery we can get SQL similar to one you have provided, with this code
# Get amount of departments with GROUP BY division__corporation [1]
# .order_by() will remove any ordering so we won't get additional GROUP BY columns [2]
departments = Department.objects.filter(type=10).values(
'division__corporation'
).annotate(count=Count('id')).order_by()
# Attach departments as Subquery to Corporation by Corporation.id.
# Departments are already grouped by division__corporation
# so .values('count') will always return single row with single column - count [3]
departments_subquery = departments.filter(division__corporation=OuterRef('id'))
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(
departments_subquery.values('count'), output_field=IntegerField()
)
)
The generated SQL is
SELECT "corporation"."id", ... (other fields) ...,
(
SELECT COUNT("division"."id") AS "count"
FROM "department"
INNER JOIN "division" ON ("department"."division_id" = "division"."id")
WHERE (
"department"."type" = 10 AND
"division"."corporation_id" = ("corporation"."id")
) GROUP BY "division"."corporation_id"
) AS "departments_of_type_10"
FROM "corporation"
Some concerns here is that subquery can be slow with large tables. However, database query optimizers can be smart enough to promote subquery to OUTER JOIN, at least I've heard PostgreSQL does this.
1. GROUP BY using .values and .annotate
2. order_by() problems
3. Subquery
You should be able to do this with a Case() expression to query the count of departments that have the type you are looking for:
from django.db.models import Case, IntegerField, Sum, When, Value
Corporation.objects.annotate(
type_10_count=Sum(
Case(
When(division__department__type=10, then=Value(1)),
default=Value(0),
output_field=IntegerField()
)
)
)
I like the following way of doing it:
departments = Department.objects.filter(
type=10,
division__corporation=OuterRef('id')
).annotate(
count=Func('id', 'Count')
).values('count').order_by()
corporations = Corporation.objects.annotate(
departments_of_type_10=Subquery(depatments)
)
The more details on this method you can see in this answer: https://stackoverflow.com/a/69020732/10567223

Using existing field values in django update query

I want to update a bunch of rows in a table to set the id = self.id. How would I do the below?
from metadataorder.tasks.models import Task
tasks = Task.objects.filter(task_definition__cascades=False)
.update(shared_task_id=self.id)
The equivalent SQL would be:
update tasks_task t join tasks_taskdefinition d
on t.task_definition_id = d.id
set t.shared_task_id = t.id
where d.cascades = 0
You can do this using an F expression:
from django.db.models import F
tasks = Task.objects.filter(task_definition__cascades=False)
.update(shared_task_id=F('id'))
There are some restrictions on what you can do with F objects in an update call, but it'll work fine for this case:
Calls to update can also use F expressions to update one field based on the value of another field in the model.
However, unlike F() objects in filter and exclude clauses, you can’t introduce joins when you use F() objects in an update – you can only reference fields local to the model being updated. If you attempt to introduce a join with an F() object, a FieldError will be raised[.]
https://docs.djangoproject.com/en/dev/topics/db/queries/#updating-multiple-objects-at-once
I stumbled upon this topic and noticed Django's limitation of updates with foreign keys, so I now use raw SQL in Django:
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("UPDATE a JOIN b ON a.b_id = b.id SET a.myField = b.myField")

django filtering vs python filtering with prefetched objects

I'm trying to optimize some Django code, and I've got two similar approach that are performing differetly. Here are some example models:
class A(models.Model):
name = models.CharField(max_length=100)
class B(models.Model):
name = models.CharField(max_length=100)
a = models.ForeignKey(A)
c = models.ForeignKey(C)
class C(models.Model):
name = models.CharField(max_length=100)
For each A object, I'd like to iterate over a subset of its incoming B's, filtered on the their c value. Simple:
for a in A.objects.all() :
for b in a.B_set.filter( c__name='some_val' ) :
print a.name, b.name, c.name
The problem with this is that there is a new database lookup for every a value iterated over.
It seems that the solution is to prefetch the c values which will feed into the filter.
qs_A = A.objects.all().prefetch_related('B_set__c')
Now consider the following two filter approaches:
# Django filter
for a in qs_A :
for b in a.B_set.filter( c__name='some_val' ) :
print a.name, b.name, n.name
# Python filter
for a in qs_A :
for b in filter( lambda b: b.c.name == 'some_val', a.B_set.all() ):
print a.name, b.name, c.name
With the data I'm using, the django filter makes 48 more SQL queries than the python filter (on a 12-element qs_A result set). This makes me believe that the django filter doesn't make use of the prefetched tables.
Could someone explain what is happened?
Perhaps it's possible to apply the filter during the prefetch?
Prefetch and filtering don't have any direct connection... The filtering always happens inside your database, whereas prefetch_related's main purpose is to get data for related objects when outputting them or something similar.
Lesser SQL queries are mostly better, but if you want to optimize your use case you should perform some benchmarking and profiling and not rely on some general statements!
You could probably make your example more efficient if you wouldn't work with A in the first place but with B instead:
qs = B.objects.select_related('a', 'c').filter(c__name='some val')
# maybe you need some filtering for a as well:
# qs = qs.filter(a__x=....)
for b in qs:
print b.a.name, b.name, b.c.name
Maybe you'll need to do some regrouping/ordering after filtering (in python) but if you can already perform all the filtering action in one step it'll be more efficient...
Otherwise maybe look at raw sql queries...

Django: filter a RawQuerySet

i've got some weird query, so i have to execute raw SQL. The thing is that this query is getting bigger and bigger and with lots of optional filters (ordering, column criteria, etc.).
So, given the this query:
SELECT DISTINCT Camera.* FROM Camera c
INNER JOIN cameras_features fc1 ON c.id = fc1.camera_id AND fc1.feature_id = 1
INNER JOIN cameras_features fc2 ON c.id = fc2.camera_id AND fc2.feature_id = 2
This is roughly the Python code:
def get_cameras(features):
query = "SELECT DISTINCT Camera.* FROM Camera c"
i = 1
for f in features:
alias_name = "fc%s" % i
query += "INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
query += " %s "
i += 1
return Camera.objects.raw(query, tuple(features))
This is working great, but i need to add more filters and ordering, for example suppose i need to filter by color and order by price, it starts to grow:
#extra_filters is a list of tuples like:
# [('price', '=', '12'), ('color' = 'blue'), ('brand', 'like', 'lum%']
def get_cameras_big(features,extra_filters=None,order=None):
query = "SELECT DISTINCT Camera.* FROM Camera c"
i = 1
for f in features:
alias_name = "fc%s" % i
query += "INNER JOIN cameras_features %s ON c.id = %s.camera_id AND %s.feature_id = " % (alias_name,alias_name,alias_name)
query += " %s "
i += 1
if extra_filters:
query += " WHERE "
for ef in extra_filters:
query += "%s %s %s" % ef #not very safe, refactoring needed
if order:
query += "order by %s" % order
return Camera.objects.raw(query, tuple(features))
So, i don't like how it started to grow, i know Model.objects.raw() returns a RawQuerySet, so i'd like to do something like this:
queryset = get_cameras( ... )
queryset.filter(...)
queryset.order_by(...)
But this doesn't work. Of course i could just perform the raw query and after that get the an actual QuerySet with the data, but i will perform two querys. Like:
raw_query_set = get_cameras( ... )
camera.objects.filter(id__in(raw_query_set.ids)) #don't know if it works, but you get the idea
I'm thinking that something with the QuerySet init or the cache may do the trick, but haven't been able to do it.
.raw() is an end-point. Django can't do anything with the queryset because that would require being able to somehow parse your SQL back into the DBAPI it uses to create SQL in the first place. If you use .raw() it is entirely on you to construct the exact SQL you need.
If you can somehow reduce your query into something that could be handled by .extra() instead. You could construct whatever query you like with Django's API and then tack on the additional SQL with .extra(), but that's going to be your only way around.
There's another option: turn the RawQuerySet into a list, then you can do your sorting like this...
results_list.sort(key=lambda item:item.some_numeric_field, reverse=True)
and your filtering like this...
filtered_results = [i for i in results_list if i.some_field == 'something'])
...all programatically. I've been doing this a ton to minimize db requests. Works great!
I implemented Django raw queryset which supports filter(), order_by(), values() and values_list(). It will not work for any RAW query but for typical SELECT with some INNER JOIN or a LEFT JOIN it should work.
The FilteredRawQuerySet is implemented as a combination of Django model QuerySet and RawQuerySet, where the base (left part) of the SQL query is generated via RawQuerySet, while WHERE and ORDER BY directives are generared by QuerySet:
https://github.com/Dmitri-Sintsov/django-jinja-knockout/blob/master/django_jinja_knockout/query.py
It works with Django 1.8 .. 1.11.
It also has a ListQuerySet implementation for Prefetch object result lists of model instances as well, so these can be processed the same way as ordinary querysets.
Here is the example of usage:
https://github.com/Dmitri-Sintsov/djk-sample/search?l=Python&q=filteredrawqueryset&type=&utf8=%E2%9C%93
Another thing you can do is that if you are unable to convert it to a regular QuerySet is to create a View in your database backend. It basically executes the query in the View when you access it. In Django, you would then create an unmanaged model to attach to the View. With that model, you can apply filter as if it were a regular model. With your foreign keys, you would set the on_delete arg to models.DO_NOTHING.
More information about unmanaged models:
https://docs.djangoproject.com/en/2.0/ref/models/options/#managed