Django 'not in' querying for related fields - django

I want to filter on objects that only have related objects with values in a finite set - here's how I tried to write it:
trips = Trip.objects\
.filter(study=study, field_values__field__name='mode', field_values__int_value__in=modes)\
.exclude(study=study, field_values__field__name='mode', field_values__int_value__not_in=modes)\
.all()
I think this would work, except 'not in' is not a valid operator. Unfortunately, 'not modes' here is an infinite set - it could be any int not in modes, so I can't 'exclude in [not modes].'
How can I write this with a Django query?

You can filter this with:
from django.db.models import Count, F, Q
Trip.objects.filter(
study=study,
field_values__field__name='mode'
).annotate(
total_values=Count('field_values')
).filter(
total_values=Count('field_values', filter=Q(field_values__int_value__in=modes)),
total_values__gt=0
)
Here we count the total number of related field_values whose field name is 'mode', and the number of those whose int_value is in the given modes. If both counts are equal, we know that no value lies outside the set.
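The counting idea itself does not depend on the ORM. Below is a minimal sketch in plain SQL through Python's sqlite3 module. The single field_value table and its columns are hypothetical simplifications, and the SQL that Django actually generates for the queryset above will look different:

```python
import sqlite3

# Hypothetical, simplified schema: one row per field value of a trip.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field_value (trip_id INTEGER, int_value INTEGER)")
conn.executemany(
    "INSERT INTO field_value (trip_id, int_value) VALUES (?, ?)",
    [
        (1, 1), (1, 2),  # trip 1: every value is in the allowed set
        (2, 1), (2, 9),  # trip 2: 9 is outside the allowed set
        (3, 7),          # trip 3: no allowed value at all
    ],
)

modes = [1, 2]
placeholders = ", ".join("?" for _ in modes)

# Keep a trip only if the number of its values equals the number of its
# values that are in `modes`, i.e. nothing lies outside the set.
query = f"""
    SELECT trip_id
    FROM field_value
    GROUP BY trip_id
    HAVING COUNT(*) = SUM(CASE WHEN int_value IN ({placeholders}) THEN 1 ELSE 0 END)
    ORDER BY trip_id
"""
matching_trip_ids = [row[0] for row in conn.execute(query, modes)]
print(matching_trip_ids)
```

Only trip 1 survives: trip 2 has a value outside the set, and trip 3 has no value inside it.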

Related

How to get boolean result in annotate django?

I have a filter which should return a queryset with 2 objects, and should have one different field. for example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both objects are of the same model, but have different primary keys. I tried using the filter below:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
I used annotate first time, but it gave me the below error:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django, so what am I doing wrong, and what should be the annotate to get the required objects as shown above?
The solution by @ktowen works well and is quite straightforward.
Here is another solution I am using, hope it is helpful too.
from django.db.models import BooleanField, ExpressionWrapper, Q
queryset = queryset.annotate(is_fielder=ExpressionWrapper(
Q(plays__outdoor_game_role='Fielder'),
output_field=BooleanField(),
),)
Here are some explanations for those who are not familiar with Django ORM:
annotate() creates a new column/field on the fly, in this case is_fielder. Your model does not need a field named is_fielder; after you add the annotation, each object in the queryset exposes it as an attribute (for example obj.is_fielder), and you can use it in later filter() or order_by() calls. annotate() is extremely useful and flexible, combines with almost every other expression, and is a must-know method in the Django ORM.
ExpressionWrapper lets you wrap a more complicated combination of conditions, in the form ExpressionWrapper(expression, output_field). It is useful when you combine different types of fields or want to specify an output type that Django cannot infer automatically.
Q object is a frequently used expression to specify a condition, I think the most powerful part is that it is possible to chain the conditions:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
Negative(~): filter(~Q(condition))
It is possible to combine Q with normal keyword conditions like below (the arguments are ANDed together):
filter(Q(condition1), id__in=[list])
The point is that Q objects must come first, before any keyword arguments, or it will not work.
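The "Q first" rule is a consequence of Python's own call syntax: positional arguments have to precede keyword arguments. Here is a small sketch of that rule. It deliberately uses a hypothetical stand-in function instead of a real queryset, and a plain string in place of a Q object:

```python
def fake_filter(*conditions, **lookups):
    """Hypothetical stand-in for QuerySet.filter(); it only records its arguments."""
    return conditions, lookups


# Positional argument first, keyword argument second: accepted.
conditions, lookups = fake_filter("Q(condition1)", id__in=[1, 2, 3])

# The reverse order is rejected by Python itself, before any function runs.
try:
    compile("fake_filter(id__in=[1, 2, 3], 'Q(condition1)')", "<example>", "eval")
    reversed_order_ok = True
except SyntaxError:
    reversed_order_ok = False

print(conditions, lookups, reversed_order_ok)
```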
Case When(then) can be explained simply as if con1 elif con2 elif con3 .... It is quite powerful, and personally I love to use it to build a custom ordering for a queryset.
For example, say you need to return a queryset of watch history items in the order in which the user watched them. You could keep the order with a for loop, but that generates plenty of similar queries. A more elegant way with Case When would be:
item_ids = [list]
ordering = Case(*[When(pk=pk, then=pos)
for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids)\
.order_by(ordering)
As you can see, Case When(then) lets you bind each primary key to a concrete position. This makes it 1) a precise, pinpoint condition expression and 2) especially useful when several conditions must be checked in sequence.
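To see what such an ordering expression amounts to in SQL, here is a minimal sketch with Python's sqlite3 module. The item table and its contents are hypothetical, and the statement is only an approximation of what the ORM would generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO item (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

# Watch order chosen by the user; note that it is not sorted by id.
item_ids = [3, 1, 4]

# One "WHEN id = ? THEN ?" branch per item, binding each pk to its position.
whens = " ".join("WHEN id = ? THEN ?" for _ in item_ids)
case_params = []
for pos, pk in enumerate(item_ids):
    case_params.extend([pk, pos])

id_placeholders = ", ".join("?" for _ in item_ids)
query = (
    f"SELECT id FROM item WHERE id IN ({id_placeholders}) "
    f"ORDER BY CASE {whens} END"
)
ordered_ids = [row[0] for row in conn.execute(query, item_ids + case_params)]
print(ordered_ids)
```

A single query returns the rows in the order of item_ids, so no per-item lookup is needed.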
You can use Case/When with annotate:
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
is_fielder=Case(
When(plays__outdoor_game_role='Fielder', then=Value(True)),
default=Value(False),
output_field=BooleanField(),
),
)

Improve Django queryset performance when using annotate Exists

I have a queryset that returns a lot of data, it can be filtered by year which will return around 100k lines, or show all which will bring around 1 million lines.
The objective of this annotate is to generate a xlsx spreadsheet.
Models representation, RelatedModel is manytomany between Model and AnotherModel
Model:
id
field1
field2
field3
RelatedModel:
foreign_key_model (Model)
foreign_key_another (AnotherModel)
Queryset, if the relation exists it will annotate, this annotate is very slow and can take several minutes.
Model.objects.all().annotate(
related_exists=Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
When(related_exists=False, then=Value("The relation doesn't exist!")),
default=Value('This is the default value!'),
output_field=CharField(),
)
).values_list(
'related_column',
'field1',
'field2',
'field3'
)
If the only thing needed is to change how True / False is displayed in the xlsx file, one option is to keep a single related_exists BooleanField annotation and customize how it is converted when creating the xlsx document, e.g. in a serializer. The database should store raw, unformatted values, and the app should prepare them for display to the user.
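A minimal sketch of that application-side conversion, assuming the queryset is read with values_list('related_exists', 'field1', 'field2', 'field3') so that each row is a tuple starting with the boolean. The helper name label_rows is hypothetical:

```python
def label_rows(rows):
    """Replace the leading boolean of each row with a display string."""
    for related_exists, *fields in rows:
        label = (
            "The relation exists!"
            if related_exists
            else "The relation doesn't exist!"
        )
        yield (label, *fields)


# Stand-in for the rows the queryset would return.
rows = [(True, "a", "b", "c"), (False, "d", "e", "f")]
labelled = list(label_rows(rows))
print(labelled)
```

Because label_rows is a generator, it can wrap a queryset iterator and feed the spreadsheet writer row by row without building a second full list in memory.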
Other things to consider:
Indexes to speed-up filtering.
If you have millions of records after filtering, in one table - maybe table partitioning could be considered.
But let's look into raw sql of original query. It will be like this:
SELECT [model_fields],
EXISTS([CLIENT_SELECT]) AS related_exists,
CASE
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation exists!'
WHEN EXISTS([CLIENT_SELECT]) = false THEN 'The relation does not exist!'
ELSE 'This is the default value!'
END AS related_column
FROM model;
Right away we can see that the nested Exists query CLIENT_SELECT appears 3 times. Even though it is exactly the same each time, it may be executed at least 2 and up to 3 times. The database may optimize this so that it costs less than 3x, but it is still not as good as 1x.
First, EXISTS returns either True or False, so we can keep just one check that it is True and make 'The relation does not exist!' the default value.
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
default=Value('The relation does not exist!')
)
Why does related_column perform the same select again instead of taking the value of related_exists?
Because we cannot reference a calculated column while calculating another column. This is a database-level constraint that Django knows about, so it duplicates the expression.
Wait, then we do not actually need the related_exists column; let's keep just related_column with the CASE statement and 1 EXISTS subquery.
Here comes Django: before 3.0 we cannot use expressions in filters without annotating them first.
So, in our case: in order to use Exists in When, we first need to add it as an annotation, but it is then used not as a reference but as a full copy of the expression.
Good news!
Since Django 3.0 we can use expressions that output BooleanField directly in QuerySet filters, without having to annotate first. Exists is one such BooleanField expression.
Model.objects.all().annotate(
related_column=Case(
When(
Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
then=Value('The relation exists!'),
),
default=Value("The relation doesn't exist!"),
output_field=CharField(),
)
)
And only one nested select, and one annotated field.
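The shape of the resulting statement, with a single EXISTS subquery inside the CASE, can be tried out with Python's sqlite3 module. This is only a sketch: the table and column names are hypothetical stand-ins, and the exact SQL Django emits will differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, field1 TEXT)")
conn.execute("CREATE TABLE related_model (model_id INTEGER)")
conn.executemany("INSERT INTO model (id, field1) VALUES (?, ?)", [(1, "a"), (2, "b")])
# Model 1 has two related rows, model 2 has none.
conn.executemany("INSERT INTO related_model (model_id) VALUES (?)", [(1,), (1,)])

# The EXISTS subquery appears exactly once, inside the CASE expression.
query = """
    SELECT
        CASE
            WHEN EXISTS (SELECT 1 FROM related_model r WHERE r.model_id = m.id)
            THEN ?
            ELSE ?
        END AS related_column,
        m.field1
    FROM model m
    ORDER BY m.id
"""
labels = ("The relation exists!", "The relation doesn't exist!")
rows = conn.execute(query, labels).fetchall()
print(rows)
```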
Django 2.1, 2.2
Here's the commit that finalized allowance of boolean expressions although many pre-conditions for it were added earlier. One of them is presence of conditional attribute on expression object and check for this attribute.
So, although not recommended and not tested, there is a small hack that seems to work for Django 2.1 and 2.2 (earlier versions have no conditional check, so they would require more intrusive changes):
create Exists expression instance
monkey patch it with conditional = True
use it as condition in When statement
related_model_exists = Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id')))
setattr(related_model_exists, 'conditional', True)
Model.objects.all().annotate(
related_column=Case(
When(
related_model_exists,
then=Value('The relation exists!'),
),
default=Value("The relation doesn't exist!"),
output_field=CharField(),
)
)
Related checks
The relatedmodel_set__isnull=True check is not suitable for several reasons:
it performs a LEFT OUTER JOIN, which is less efficient than EXISTS
it performs a LEFT OUTER JOIN, i.e. it joins tables, which makes it suitable ONLY in a filter() condition (not in annotate with When), and only for OneToOne or OneToMany (with the One on the relatedmodel side) relations
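The row-multiplication problem behind the second point can be reproduced in plain SQL. Below is a sketch with Python's sqlite3 module, using hypothetical table names and one parent that has two related rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE related_model (model_id INTEGER)")
conn.executemany("INSERT INTO model (id) VALUES (?)", [(1,), (2,)])
# Model 1 has two related rows, model 2 has none.
conn.executemany("INSERT INTO related_model (model_id) VALUES (?)", [(1,), (1,)])

# isnull-style check through a join: one output row per (model, related) pair.
join_rows = conn.execute(
    """
    SELECT m.id, r.model_id IS NULL AS missing
    FROM model m
    LEFT OUTER JOIN related_model r ON r.model_id = m.id
    ORDER BY m.id
    """
).fetchall()

# EXISTS-style check: always exactly one output row per model.
exists_rows = conn.execute(
    """
    SELECT m.id,
           NOT EXISTS (SELECT 1 FROM related_model r WHERE r.model_id = m.id) AS missing
    FROM model m
    ORDER BY m.id
    """
).fetchall()

print(join_rows)
print(exists_rows)
```

With the join, model 1 shows up twice, once per related row, which is why the joined form is a problem when the result feeds an annotation or an export.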
You can considerably simplify your query to:
from django.db.models import Case, CharField, Value, When
Model.objects.all().annotate(
related_column=Case(
When(relatedmodel_set__isnull=True, then=Value("The relation doesn't exist!")),
default=Value("The relation exists!"),
output_field=CharField()
)
)
Where relatedmodel_set is the related_name on your foreign key.

How to calculate count of related many to many objects based on another queryset?

class Zone(Model):
...
class Flight(Model):
zones = ManyToManyField(Zone)
flights = Flight.objects.filter(...)
qs1 = Zone.objects.annotate(
count=flights.filter(zones__pk=F('pk')).distinct().count(), # this is not valid expression
)
Despite having F inside the queryset, using count() in the annotation still throws TypeError: QuerySet.annotate() received non-expression(s): 0., meaning that the queryset was executed in place.
This also doesn't work, but this time it just returns an invalid value (always 1, always counting the single Zone object instead of what is inside the filter):
qs1 = Zone.objects.annotate(
count=Count('pk', filter=flights.filter(zones__pk=F('pk'))), # with 'flight' instead of first 'pk' it also doesn't work
)
A .count() is evaluated eagerly in Django, so Django evaluates flights.filter(zones__pk=F('pk')).distinct().count() immediately, and succeeds in doing so: inside that queryset, F('pk') refers to the primary key of the Flight, so it counts the flights that have a zone whose primary key happens to equal the flight's own primary key. To refer to the outer Zone you would need OuterRef [Django-doc] and an .annotate(..) on the subquery.
But you make this too complex. You can simply annotate with:
from django.db.models import Count, Q
Zone.objects.annotate(
count=Count('flight', distinct=True, filter=Q(flight__…))
)
Here the filter=Q(flight__…) is the part of the filter of your flights. So if the Flights are filtered by a hypothetical active=True, you filter with:
Zone.objects.annotate(
count=Count('flight', distinct=True, filter=Q(flight__active=True))
)
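For intuition, a distinct conditional count like this one corresponds roughly to the SQL below, sketched with Python's sqlite3 module. The table names, the join table and the active column are hypothetical, and depending on the backend Django may express the condition differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zone (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE flight (id INTEGER PRIMARY KEY, active INTEGER)")
conn.execute("CREATE TABLE flight_zones (flight_id INTEGER, zone_id INTEGER)")
conn.executemany("INSERT INTO zone (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO flight (id, active) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 0)],  # flight 3 is inactive
)
conn.executemany(
    "INSERT INTO flight_zones (flight_id, zone_id) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (3, 2)],  # zone 3 has no flights at all
)

# Count each zone's distinct flights, but only those matching the condition.
# Rows that fail the condition yield NULL, which COUNT ignores.
query = """
    SELECT z.id,
           COUNT(DISTINCT CASE WHEN f.active = 1 THEN f.id END) AS flight_count
    FROM zone z
    LEFT OUTER JOIN flight_zones fz ON fz.zone_id = z.id
    LEFT OUTER JOIN flight f ON f.id = fz.flight_id
    GROUP BY z.id
    ORDER BY z.id
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Zones without a matching flight stay in the result with a count of 0, because the condition sits inside the aggregate rather than in a WHERE clause.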

Django Aggregate- Division with Zero Values

I am using Django's aggregate query expression to total some values. The final value is a division expression that may sometimes feature zero as a denominator. I need a way to escape if this is the case, so that it simply returns 0.
I've tried the following, as I've been using something similar in my annotate expressions:
from django.db.models import Sum, F, FloatField, Case, When
def for_period(self, start_date, end_date):
return self.model.objects.filter(
date__range=(start_date, end_date)
).aggregate(
sales=Sum(F("value")),
purchase_cogs=Sum(F('purchase_cogs')),
direct_cogs=Sum(F("direct_cogs")),
profit=Sum(F('profit'))
).aggregate(
margin=Case(
When(sales=0, then=0),
default=(Sum(F('profit')) / Sum(F('value')))*100
)
)
However, it obviously doesn't work, because as the error says:
'dict' object has no attribute 'aggregate'
What is the proper way to handle this?
This will obviously not work; because aggregate returns a dictionary, not a QuerySet (see the docs), so you can't chain two aggregate calls together.
I think using annotate will solve your issue. annotate is almost identical to aggregate, except in that it returns a QuerySet with the results saved as attributes rather than return a dictionary. The result is that you can chain annotate calls, or even call annotate then aggregate.
So I believe something like:
return self.model.objects.filter(
date__range=(start_date, end_date)
).annotate( # call `annotate`
sales=Sum(F("value")),
purchase_cogs=Sum(F('purchase_cogs')),
direct_cogs=Sum(F("direct_cogs")),
profit=Sum(F('profit'))
).aggregate( # then `aggregate`
margin=Case(
When(sales=0, then=0),
default=(Sum(F('profit')) / Sum(F('value')))*100
)
)
should work.
Hope this helps.
I've made it work (in Django 2.0) with:
from django.db.models import Case, F, FloatField, Sum, When
aggr_results = models.Result.objects.aggregate(
at_total_units=Sum(F("total_units")),
ag_pct_units_sold=Case(
When(at_total_units=0, then=0),
default=Sum("sold_units") / (1.0 * Sum("total_units")) * 100,
output_field=FloatField(),
),
)
You can't chain together aggregate statements like that. The docs say:
aggregate() is a terminal clause for a QuerySet that, when invoked,
returns a dictionary of name-value pairs.
It returns a python dict, so you'll need to figure out a way to modify your query to do it all at once. You might be able to replace the first call to aggregate with annotate instead, as it returns a queryset:
Unlike aggregate(), annotate() is not a terminal clause. The output of
the annotate() clause is a QuerySet
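As for the possible division by 0, you could wrap your code in a try/except block, watching for ZeroDivisionError. Note that this only catches a division performed in Python; a division written inside the query is carried out by the database.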
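Since aggregate() returns a plain dict, one option is to aggregate only the sums and do the guarded division in Python afterwards. Below is a minimal sketch; the helper name margin_percent is hypothetical, and the dict keys follow the question's sales and profit aliases. Sum yields None for an empty queryset, so that case is treated like zero:

```python
def margin_percent(totals):
    """Compute profit / sales * 100 from an aggregate() result dict.

    Returns 0.0 when sales is zero or missing instead of dividing.
    """
    sales = totals.get("sales") or 0    # None (empty queryset) counts as 0
    profit = totals.get("profit") or 0
    if sales == 0:
        return 0.0
    return profit / sales * 100


# Stand-ins for what .aggregate(sales=Sum("value"), profit=Sum("profit")) returns.
print(margin_percent({"sales": 200.0, "profit": 50.0}))
print(margin_percent({"sales": 0, "profit": 5.0}))
print(margin_percent({"sales": None, "profit": None}))
```

This keeps the query simple and avoids relying on how a particular database reacts to a zero denominator.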

Django conditional annotation

I'm surprised that this question apparently doesn't yet exist. If it does, please help me find it.
I want to use annotate (Count) and order_by, but I don't want to count every instance of a related object, only those that meet a certain criterion.
To wit, that I might list swallows by the number of green coconuts they have carried:
swallow.objects.annotate(num_coconuts=Count('coconuts_carried__husk__color = "green"').order_by('num_coconuts')
For Django >= 1.8:
from django.db.models import Sum, Case, When, IntegerField
swallow.objects.annotate(
num_coconuts=Sum(Case(
When(coconuts_carried__husk__color="green", then=1),
output_field=IntegerField(),
))
).order_by('num_coconuts')
This should be the right way.
swallow.objects.filter(
coconuts_carried__husk__color="green"
).annotate(
num_coconuts=Count('coconuts_carried')
).order_by('num_coconuts')
Note that when you filter on a related field, the raw SQL is a JOIN plus a WHERE clause. The annotation then acts on the result set, which contains only the related rows selected by the filter. As a consequence, swallows that have carried no green coconuts do not appear in the result at all, rather than appearing with a count of 0.
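The difference between filtering first and counting conditionally can be seen in plain SQL. Below is a sketch with Python's sqlite3 module, using hypothetical and heavily simplified swallow and coconut tables. The conditional version uses ELSE 0 so that swallows without green coconuts show 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swallow (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE coconut (swallow_id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO swallow (id, name) VALUES (?, ?)",
    [(1, "african"), (2, "european"), (3, "unladen")],
)
conn.executemany(
    "INSERT INTO coconut (swallow_id, color) VALUES (?, ?)",
    [(1, "green"), (1, "green"), (1, "brown"), (2, "brown")],
)

# Filter first, then count: only swallows with a green coconut remain.
filtered = conn.execute(
    """
    SELECT s.name, COUNT(c.swallow_id) AS num_coconuts
    FROM swallow s
    JOIN coconut c ON c.swallow_id = s.id
    WHERE c.color = 'green'
    GROUP BY s.id
    ORDER BY num_coconuts, s.id
    """
).fetchall()

# Conditional count: every swallow is kept, non-green rows contribute 0.
conditional = conn.execute(
    """
    SELECT s.name,
           SUM(CASE WHEN c.color = 'green' THEN 1 ELSE 0 END) AS num_coconuts
    FROM swallow s
    LEFT OUTER JOIN coconut c ON c.swallow_id = s.id
    GROUP BY s.id
    ORDER BY num_coconuts, s.id
    """
).fetchall()

print(filtered)
print(conditional)
```

Which form is right depends on whether swallows with no green coconuts should be listed.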