I am facing one issue while filtering the data .
I am having three models ...
class Product:
size = models.CharField(max_length=200)
class Make(models.Model):
name = models.ForeignKey(Product, related_name='product_set')
class MakeContent(models.Model):
make = models.ForeignKey(Make, related_name='make_set')
published = models.BooleanField()
I can generate a queryset that contains all Makes and each one's related MakeContents where published = True.
Make.objects.filter(make_set__published=True)
I'd like to know if it's possible (without writing SQL directly) for me to generate a queryset that contains all Product and each one's related MakeContents where published = True.
I have tried this
Product.objects.filter(product_set__make_set__published=True)
But it's not working
A subquery can solve the problem.
from django.db.models import Subquery
sub_query = MakeContent.objects.filter(published=True)
Product.objects.filter(
pk__in=Subquery(sub_query.values('make__name__pk'))
)
Use this
Product.objects.filter(make__makecontent__published=True).distinct()
Filtering through ForeignKey works both forward and backwards in Django. You can use the model name in lower case for backward filtering.
https://docs.djangoproject.com/en/3.1/topics/db/queries/#lookups-that-span-relationships
Finally .distinct() is required to remove the multiple values produced due to SQL joins
Related
class Subject(models.Model):
...
students = models.ManyToMany('Student')
type = models.CharField(max_length=100)
class Student(models.Model):
class = models.IntergerField()
dropped = models.BooleanField()
...
subjects_with_dropouts = (
Subject.objects.filter(category=Subject.STEM).
prefetch_related(
Prefetch('students', queryset=Students.objects.filter(class=2020))
.annotate(dropped_out=Case(
When(
students__dropped=True,
then=True,
),
output_field=BooleanField(),
default=False,
))
.filter(dropped_out=True)
)
I am trying to get all Subjects from category STEM, that have dropouts of class 2020, but for some reason I get Subjects that have dropouts from other classes as well.
I know that I can achive with
subjects_with_dropouts = Subject.objects.filter(
category=Subject.STEM,
students__dropped=True,
students__class=2020,
)
But why 1st approach doesn't work? I am using PostgreSQL.
When using prefetch, the joining is done in python. A good way to think of this is that you have two tables in the first query. One of subjects with at least one student who dropped out (note that you are doing an aggregate there (Case) so there is a JOIN with a GROUP BY on student.id), and one of students in class of 2020 (this is separate than the join in the first table). The prefetch just says to join these two separate queries using the through table that contains both of their ids representing a connection that is auto generated by ManyToManyField.
A good way to see what is actually happening is by using print(QuerySet.query) where QuerySet is the instance of the QuerySet (Subject.objects.all()). Or if you have the means, django debug toolbar is a fantastic tool that shows you the EXPLAIN statement of each query in each endpoint.
I am suffering to get a query working despite all I have been trying based on my web search, and I think I need some help before becoming crazy.
I have four models:
class Series(models.Model):
puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
...
class Puzzle(models.Model):
puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
...
class SeriesElement(models.Model):
puzzle = models.ForeignKey(Puzzle,on_delete=models.CASCADE,verbose_name='Puzzle',)
series = models.ForeignKey(Series,on_delete=models.CASCADE,verbose_name='Series',)
puzzle_index = models.PositiveIntegerField(verbose_name='Order',default=0,editable=True,)
class Play(models.Model):
puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True,null=True, on_delete=models.SET_NULL, related_name='plays')
series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
...
each user can play any puzzle several times, each time creating a Play record.
that means that for a given set of (user,series,puzzle) we can have several Play records,
some with puzzle_completed = True, some with puzzle_completed = False
What I am trying (unsuccesfully) to achieve, is to calculate for each series, through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.
For nb_completed_by_user, I have something which works in almost all cases (I have one glitch in one of my test that I cannot explain so far):
Series.objects.annotate(nb_completed_by_user=Count('puzzles',
filter=Q(puzzles__plays__puzzle_completed=True,
puzzles__plays__series_id=F('id'),puzzles__plays__user=user), distinct=True))
For nb_not_completed_by_user, I was able to make a query on Puzzle that gives me the good answer, but I am not able to transform it into a Subquery expression that works without throwing up an error, or to get a Count expression to give me the proper answer.
This one works:
puzzles = Puzzle.objects.filter(~Q(plays__puzzle_completed=True,
plays__series_id=1, plays__user=user),series=s)
but when trying to move to a subquery, I cannot find the way to use the following expression not to throw the error:ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
plays__puzzle_completed=True,plays__series_id=OuterRef('id'), plays__user=user)
and the following Count expression doesn't give me the right result:
Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
puzzle__plays__puzzle_completed=True, puzzle__plays__series_id=F('id'),
puzzle__plays__user=user))
Could anybody explain me how I could obtain both values ?
and eventually to propose me a link which explains clearly how to use subqueries for less-obvious cases than those in the official documentation
Thanks in advance
Edit March 2021:
I recently found two posts which guided me through one potential solution to this specific issue:
Django Count and Sum annotations interfere with each other
and
Django 1.11 Annotating a Subquery Aggregate
I implemented the proposed solution from https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon
having help classes: class SubqueryCount(Subquery) and class SubquerySum(Subquery)
class SubqueryCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = PositiveIntegerField()
class SubquerySum(Subquery):
template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'
def __init__(self, queryset, column, output_field=None, **extra):
if output_field is None:
output_field = queryset.model._meta.get_field(column)
super().__init__(queryset, output_field, column=column, **extra)
It works extremely well ! and is far quicker than the conventional Django Count annotation.
... at least in SQlite, and probably PostgreSQL as stated by others.
But when I tried in a MariaDB environnement ... it crashed !
MariaDB is apparently not able / not willing to handle correlated subqueries as those are considered sub-optimal.
In my case, as I try to get from the database multiple Count/distinct annotations for each record at the same time, I really see a tremendous gain in performance (in SQLite)
that I would like to replicate in MariaDB.
Would anyone be able to help me figure out a way to implement those helper functions for MariaDB ?
What should template be in this environnement?
matthew-schinckel ?
benoit-blanchon ?
rktavi ?
Going a bit deeper and analysis the Django docs a bit more in details, I was finally able to produce a satisfying way to produce a Count or Sum based on subquery.
For simplifying the process, I defined the following helper functions:
To generate the subquery:
def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters={}):
"""
Return a subquery from a given model (work with both FK & M2M)
can add extra filter parameters as dictionary:
Use:
subquery = get_subquery(
app_label='puzzles', model_name='Puzzle',
reference_to_model_object='puzzle_family__target'
)
or directly:
qs.annotate(nb_puzzles=subquery_count(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target')),)
"""
model = apps.get_model(app_label, model_name)
# we need to declare a local dictionary to prevent the external dictionary to be changed by the update method:
parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
parameters.update(filter_parameters)
# putting '__id' instead of '_id' to work with both FK & M2M
return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')
To count the subquery generated through get_subquery:
def subquery_count(subquery):
"""
Use:
qs.annotate(nb_puzzles=subquery_count(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target')),)
"""
return Coalesce(Subquery(subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'), output_field=PositiveIntegerField()), 0)
To sum the subquery generated through get_subquery on the field field_to_sum:
def subquery_sum(subquery, field_to_sum, output_field=None):
"""
Use:
qs.annotate(total_points=subquery_sum(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target'),'points'),)
"""
if output_field is None:
output_field = queryset.model._meta.get_field(column)
return Coalesce(Subquery(subquery.annotate(result=Sum(field_to_sum, output_field=output_field)).order_by().values('result'), output_field=output_field), 0)
The required imports:
from django.db.models import Count, Subquery, PositiveIntegerField, DecimalField, Sum
from django.db.models.functions import Coalesce
I spent so many hours on solving this ...
I hope that this will save many of you all the frustration I went through figuring out the right way to proceed.
I'm new to Django and I'm facing a question to which I didn't an answer to on Stackoverflow.
Basically, I have 2 models, Client and Order defined as below:
class Client(models.Model):
name = models.CharField(max_length=200)
registration_date = models.DateTimeField(default=timezone.now)
# ..
class Order(models.Model):
Client = models.ForeignKey(ModelA, on_delete=models.CASCADE, related_name='orders')
is_delivered = models.BooleanField(default=False)
order_date = models.DateTimeField(default=timezone.now)
# ..
I would like my QuerySet clients_results to fulfill the 2 following conditions:
Client objects fill some conditions (for example, their name start with "d" and they registered in 2019, but it could be more complex)
Order objects I can access by using the orders relationship defined in 'related_name' are only the ones that fills other conditions; for example, order is not delivered and was done in the last 6 weeks.
I could do this directly in the template but I feel this is not the correct way to do it.
Additionally, I read in the doc that Base Manager from Order shouldn't be used for this purpose.
Finally, I found a question relatively close to mine using Q and F, but in the end, I would get the order_id while, ideally, I would like to have the whole object.
Could you please advise me on the best way to address this need?
Thanks a lot for your help!
You probably should use a Prefetch(..) object [Django-doc] here to fetch the related non-delivered Orders for each Client, and stores these in the Clients, but then in a different attribute, since otherwise this can generate confusion.
You thus can create a queryset like:
from django.db.models import Prefetch
from django.utils.timezone import now
from datetime import timedelta
last_six_weeks = now() - timedelta(days=42)
clients_results = Client.objects.filter(
name__startswith='d'
).prefetch_related(
Prefetch(
'orders',
Order.objects.filter(is_delivered=False, order_date__gte=last_six_weeks),
to_attr='nondelivered_orders'
)
)
This will contain all Clients where the name starts with 'd', and each Client object that arises from this queryset will have an attribute nondelivered_orders that contains a list of Orders that are not delivered, and ordered in the last 42 days.
models.py:
class Ingredient(models.Model):
_est_param = None
param = models.ManyToManyField(Establishment, blank=True, null=True, related_name='+', through='IngredientParam')
def est_param(self, establishment):
if not self._est_param:
self._est_param, created = self.ingredientparam_set\
.get_or_create(establishment=establishment)
return self._est_param
class IngredientParam(models.Model):
#ingredient params
active = models.BooleanField(default=False)
ingredient = models.ForeignKey(Ingredient)
establishment = models.ForeignKey(Establishment)
I need to fetch all Ingredient with parametrs for Establishment. First I fetch Ingredients.objects.all() and use all params like Ingredients.objects.all()[0].est_param(establishment).active. How I can use django 1.4 prefetch_related to make less sql queries? May be I can use other way to store individual Establishment properties for Ingredient?
Django 1.7 adds the Prefetch object you can put into prefetch_related. It allows you to specify a queryset which should provide the filtering. I'm having some problems with it at the moment for getting a singular (latest) entry from a list, but it seems to work very well when trying to get all the related entries.
You could also checkout django-prefetch which is part of this question which does not seem a duplicate of this question because of the vastly different wording.
The following code would fetch all the ingredients and their parameters in 2 queries:
ingredients = Ingredients.objects.all().prefetch_related('ingredientparam_set')
You could then access the parameters you're interested in without further database queries.
I have an application where users select their own display columns. Each display column has a specified formula. To compute that formula, I need to join few related columns (one-to-one relationship) and compute the value.
The models are like (this is just an example model, actual has more than 100 fields):
class CompanyCode(models.Model):
"""Various Company Codes"""
nse_code = models.CharField(max_length=20)
bse_code = models.CharField(max_length=20)
isin_code = models.CharField(max_length=20)
class Quarter(models.Model):
"""Company Quarterly Result Figures"""
company_code = models.OneToOneField(CompanyCode)
sales_now = models.IntegerField()
sales_previous = models.IntegerField()
I tried doing:
ratios = {'growth':'quarter__sales_now / quarter__sales_previous'}
CompanyCode.objects.extra(select=ratios)
# raises "Unknown column 'quarter__sales_now' in 'field list'"
I also tried using raw query:
query = ','.join(['round((%s),2) AS %s' % (formula, ratio_name)
for ratio_name, formula in ratios.iteritems()])
companies = CompanyCode.objects.raw("""
SELECT `backend_companycode`.`id`, %s
FROM `backend_companycode`
INNER JOIN `backend_quarter` ON ( `backend_companycode`.`id` = `backend_companyquarter`.`company_code_id` )
""", [query])
#This just gives empty result
So please give me a little clue as to how I can use related columns preferably using 'extra' command. Thanks.
By now the Django documentation says that one should use extra as a last resort.
So here is a query without extra():
from django.db.models import F
CompanyCode.objects.annotate(
growth=F('quarter__sales_now') / F('quarter__sales_previous'),
)
Since the calculation is being done on a single Quarter instance, where's the need to do it in the SELECT? You could just define a ratio method/property on the Quarter model:
#property
def quarter(self):
return self.sales_now / self.sales_previous
and call it where necessary
Ok, I found it out. In above using:
CompanyCode.objects.select_related('quarter').extra(select=ratios)
solved the problem.
Basically, to access any related model data through 'extra', we just need to ensure that that model is joined in our query. Using select_related, the query automatically joins the mentioned models.
Thanks :).