Django CheckConstraint: Elements of reverse ForeignKey lookup must not be empty - django

I've got these models:
class Container(models.Model):
...
class Meta:
constraints = [
models.CheckConstraint(
check=~Q(elements=None),
name='container_must_have_elements'
),
]
class Element(models.Model):
container = models.ForeignKey(Container),
related_name='elements',
on_delete=models.CASCADE
)
I want to enforce the constraint that every Container object must have at least one Element referencing it via the foreign key relation.
As you can see I already added a check constraint. However, the negation operator ~ on the Q object seems to be forbidden. I get django.db.utils.NotSupportedError: cannot use subquery in check constraint when I try to apply the generated migration.
Without the negation operator the constraint seems to be valid (it only fails due to a data integrity error).
Is there another way I can express this constraint so it is supported by CheckConstraint?
(E.g. is there a way to check if the set of elements is not empty?)

I'll answer my own question by summarizing the question's comments.
A check constraint is intended to check every row in a table for a condition, which only takes the row itself into consideration and does not join other tables for this.
Sticking with SQL, one can formulate extended constraints including other tables by defining a function in SQL and calling it from within the constraint.
The CheckConstraint introduced in Django 2.2 only supports conditions on the table itself by using Q objects.
Update:
Since Django 3.1, CheckConstraints not only support Q objects but also boolean Expressions. See the Django 3.2 documentation.

Related

Django 4 how to do annotation over boolean field queryset?

I found this discussion about annotation over querysets. I wonder, if that ever got implemented?
When I run the following annotation:
x = Isolate.objects.all().annotate(sum_shipped=Sum('shipped'))
x[0].sum_shipped
>>> True
shipped is a boolean field, so I would expect here the number of instances where shipped is set to True. Instead, I get True or 1. That is pretty inconvenient behaviour.
There is also this discussion on stackoverflow. However, this only covers django 1.
I would expect something like Sum([True, True, False]) -> 2.
Instead, True is returned.
Seems this was not touched since then. Is that discussion in the second link still the state-of-the-art ???
Is there a better way to do statistics with the content of the database than this, now that Django is in version 4?
My current database is sqlite3 for development and testing, but will be Postgres in production. I would very much like to get database independent results.
SUM(bool_field) on Oracle will return 1 also, because bools in Oracle are just a bit (1 or 0). Postgres has a specific bool type, and you can't sum a True/False without an implicit conversion to int. This is why I say SUM(bool) is only accidentally supported for a subset of databases. The field type that is returned is based on the backend get_db_converters and the actual values that come back.
Example model
class Isolate(models.Model):
isolate_nbr = models.CharField(max_length=10, primary_key=True)
growing = models.BooleanField(default=True)
shipped = models.BooleanField(default=True)
....
I think you have a few options here, an annotate is not one of them at this time.
from django.db.models import Sum, Count
# Use aggregate with a filter
print(Isolate.objects.filter(shipped=True).aggregate(Sum('shipped')))
# Just filter then get the count of the queryset
print(Isolate.objects.filter(shipped=True).count())
# Just use aggregate without filter (this will only aggregate/Sum the True values)
print(Isolate.objects.aggregate(Sum('shipped')))
# WON'T WORK: annotate will only annotate on that row whatever that row's value is, not the aggregate across the table
print(Isolate.objects.annotate(sum_shipped==Count('shipped')).first().sum_shipped)

Improve Django queryset performance when using annotate Exists

I have a queryset that returns a lot of data, it can be filtered by year which will return around 100k lines, or show all which will bring around 1 million lines.
The objective of this annotate is to generate a xlsx spreadsheet.
Models representation, RelatedModel is manytomany between Model and AnotherModel
Model:
id
field1
field2
field3
RelatedModel:
foreign_key_model (Model)
foreign_key_another (AnotherModel)
Queryset, if the relation exists it will annotate, this annotate is very slow and can take several minutes.
Model.objects.all().annotate(
related_exists=Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
When(related_exists=False, then=Value('The relation doesn't exist!')),
default=Value('This is the default value!'),
output_field=CharField(),
)
).values_list(
'related_column',
'field1',
'field2',
'field3'
)
If only thing needed is to change how True / False is displayed in xlsx - one option is to just have one related_exists BooleanField annotation and later customize how it will be converted when creating xlsx document - i.e. in serializer. Database should store raw / unformatted values, and app prepare them to be shown to user.
Other things to consider:
Indexes to speed-up filtering.
If you have millions of records after filtering, in one table - maybe table partitioning could be considered.
But let's look into raw sql of original query. It will be like this:
SELECT [model_fields],
EXISTS([CLIENT_SELECT]) AS related_exists,
CASE
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation exists!'
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation does not exist!'
ELSE 'The relation exists!'
END AS related_column
FROM model;
And right away we can see nested query for Exists CLIENT_SELECT is there 3 times. Even though it is exactly the same, it may be executed minimum 2 times and up to 3 times. Database may optimize it to be faster than 3x, but it still is not optimal as 1x.
First, EXISTS returns either True or False, we can leave just one check that it is True, making 'The relation does not exist!' the default value.
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
default=Value('The relation does not exist!')
Why related_column performs same select again and not takes the value of related_exists?
Because we cannot reference calculated columns while calculating another columns - and this is database level constraint django knows about and duplicates expression.
Wait, then we actually do not need related_exists column, lets just leave related_column with CASE statement and 1 exists subquery.
Here comes Django - we cannot (till 3.0) use expressions in filters without annotating them first.
So, it our case it is like: in order to use Exist in When, we first need to add it as annotation, but it won't be used as a reference, but a full copy of expression.
Good news!
Since Django 3.0 we can use expressions that output BooleanField directly in QuerySet filters, without having to first annotate. Exists is one of such BooleaField expressions.
Model.objects.all().annotate(
related_column=Case(
When(
Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
And only one nested select, and one annotated field.
Django 2.1, 2.2
Here's the commit that finalized allowance of boolean expressions although many pre-conditions for it were added earlier. One of them is presence of conditional attribute on expression object and check for this attribute.
So, although not recommended and not tested it seems quite working little hack for Django 2.1, 2.2 (before there was no conditional check, and it will require more intrusive changes):
create Exists expression instance
monkey patch it with conditional = True
use it as condition in When statement
related_model_exists = Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id')))
setattr(related_model_exists, 'conditional', True)
Model.objects.all().annotate(
related_column=Case(
When(
relate_model_exists,
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
Related checks
relatedmodel_set__isnull=True check is not suitable for several reasons:
it performs LEFT OUTER JOIN - that is less efficient than EXISTS
it performs LEFT OUTER JOIN - it joins tables, this makes it ONLY suitable in filter() condition (not in annotate - When), and only for OneToOne or OneToMany (One is on relatedmodel side) relations
You can considerably simplify your query to:
from django.db.models import Count
Model.objects.all().annotate(
related_column=Case(
When(relatedmodel_set__isnull=True, then=Value("The relation doesn't exist!")),
default=Value("The relation exists!"),
output_field=CharField()
)
)
Where relatedmodel_set is the related_name on your foreign key.

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
location = models.CharField(max_length=10)
type = models.CharField(max_length=10)
date = models.DateTimeField()
attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is the desired behavior.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model object, they should all be unique. It seems highly unlikely that that you would have two events on the same date, in the same location of the same type. So with that assumption, let's begin: (as a formatting note, class Names tend to start with capital letters to differentiate between classes and variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of the Max(Date), but this should get you on the right path.

Django QuerySet access foreign key field directly, without forcing a join

Suppose you have a model Entry, with a field "author" pointing to another model Author. Suppose this field can be null.
If I run the following QuerySet:
Entry.objects.filter(author=X)
Where X is some value. Suppose in MySQL I have setup a compound index on Entry for some other column and author_id, ideally I'd like the SQL to just use "author_id" on the Entry model, so that it can use the compound index.
It turns out that Entry.objects.filter(author=5) would work, no join is done. But, if I say author=None, Django does a join with Author, then add to the Where clause Author.id IS NULL. So in this case, it can't use the compound index.
Is there a way to tell Django to just check the pk, and not follow the link?
The only way I know is to add an additional .extra(where=['author_id IS NULL']) to the QuerySet, but I was hoping some magic in .filter() would work.
Thanks.
(Sorry I was not clearer earlier about this, and thanks for the answers from lazerscience and Josh).
Does this not work as expected?
Entry.objects.filter(author=X.id)
You can either use a model or the model id in a foreign key filter. I can't check right yet if this executes a separate query, though I'd really hope it wouldn't.
If do as you described and do not use select_related() Django will not perform any join at all - no matter if you filter for the primary key of the related object or the related itself (which doesn't make any difference).
You can try:
print Entry.objects.(author=X).query
Assuming that the foreign key to Author has the name author_id, (if you didn't specify the name of the foreign key column for ForeignKey field, it should be NAME_id, if you specified the name, then check the model definition / your database schema),
Entry.objects.filter(author_id=value)
should work.
Second Attempt:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#isnull
Maybe you can have a separate query, depending on whether X is null or not by having author__isnull?
Pretty late, but I just ran into this. I'm using Q objects to build up the query, so in my case this worked fine:
~Q(author_id__gt=0)
This generates sql like
NOT ("author_id" > 0 AND "author_id" IS NOT NULL)
You could probably solve the problem in this question by using
Entry.objects.exclude(author_id__gt=0)

Django query to select parent with nonzero children

I have model with a Foreign Key to itself like this:
class Concept(models.Model):
name = models.CharField(max_length=200)
category = models.ForeignKey('self')
But I can't figure out how I can select all concepts that have nonzero children value. Is this possible with django QuerySet API or I must write custom SQL?
If I understand it correctly, each Concept may have another Concept as parent, and this is set into the category field.
In other words, a Concept with at least a child will be referenced at least once in the category field.
Generally speaking, this is not really easy to get in Django; however if you do not have too many categories, you can think for a query of the like of SELECT * FROM CONCEPTS WHERE CONCEPTS.ID IN (SELECT CATEGORY FROM CONCEPTS); - and this is something you can map easily with Django:
Concept.objects.filter(pk__in=Concept.objects.all().values('category'))
Note that, as stated on Django documentation, this query may have performance issues on certain databases; therefore you should instead put it as a list:
Concept.objects.filter(id__in=list(Concept.objects.all().values('category')))
But please be aware that this could hit some database limitation -- for instance, Oracle allows up to 1000 elements in such lists.
How about something like this:
concepts = Concept.objects.exclude(category=None)
The way you have it written there will require a value for category. Once you have fixed that (with null=True in the field constructor), use this:
Concept.objects.filter(category__isnull=False)