Django conditional many-to-many relation exists check - django

The model is:
class RecordModel(BaseModel):
visibility_setting = models.PositiveIntegerField()
visible_to = models.ManyToManyField(UserModel, blank=True)
I need to return rows depending on row visibility_setting:
if visibility_setting == 0 - return row without any checks,
if visibility_setting == 1 - I need to check if user is in visible_to m2m relation exists.
In second case query works okay, but in first case my approach of setting None on first case not working (as expected):
Feed.objects.filter(
visible_to=Case(
When(visibility_setting=0, then=None), # no data returned in this case, but I want to
# return all rows with visibility_setting=0
When(visibility_setting=1, then=user.id), # rows with visibility_setting=1 are queried okay
)
)
I am a little bit stuck which way to use in this situation? Can we not apply case at all in some conditions or use some specific default to skip visible_to relations check?

You can .filter(…) [Django-doc] with:
from django.db.models import Q
Feed.objects.filter(Q(visibility_setting=0) | Q(visible_to=my_user)).distinct()
the .distinct() call [Django-doc]
prevents returning the same Feed multiple times.

Related

sqlite3.IntegrityError: UNIQUE constraint failed | When using bulk adding many-to-many relation

I have two models, where one of them has a TreeManyToMany field (a many-to-many field for django-mptt models).
class ProductCategory(models.Model):
pass
class Product(models.Model):
categories = TreeManyToManyField('ProductCategory')
I'm trying to add categories to the Product objects in bulk by using the through-table, as seen in this thread or this.
I do this with a list of pairs: product_id and productcategory_id, which is also what my through-table contains. This yields the following error:
sqlite3.IntegrityError: UNIQUE constraint failed: products_product_categories.product_id, products_product_categories.productcategory_id
My code looks as follows:
def bulk_add_cats(generic_df):
# A pandas dataframe where ["product_object"] is the product_id
# and ["found_categories"] is the productcategory_id to add to that product.
generic_df = generic_df.explode("found_categories")
generic_df["product_object"] = generic_df.apply(
lambda x: Product.objects.filter(
merchant = x["forhandler"], product_id = x["produktid"]
).values('id')[0]['id'], axis=1
)
# The actual code
# Here row.product_object is the product_id and row.found_categories is one
# particular productcategory_id, so they make up a pair to add to the through-table.
through_objs = [
Product.categories.through(
product_id = row.product_object,
productcategory_id = row.found_categories
) for row in generic_df.itertuples()
]
Product.categories.through.objects.bulk_create(through_objs, batch_size=1000)
I have also done the following, to check that there are no duplicate pairs in the set of pairs that I want to add. There were none:
print(generic_df[generic_df.duplicated(subset=['product_object','found_categories'], keep=False)])
I suspect that the error happens because some of the product-to-productcategory relations already exist in the table. So maybe I should check if that is the case first, for each of the pairs, and then do the bulk_create. I just want to hope to retain effiency and would be sad if I have to iterate through each pair to check. Is there are way to bulk-update-or-create for this type of problem?
Or what do you think? Any help is highly appreciated :-)
You should probably check if the reason why you are having those errors is because of duplicate data. If that's the case, I think the ignore_conflicts parameter of bulk_create can solve your problem. It will save every valid entry and ignore those who have duplicate conflict.
Check the doc:
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create

How to get boolean result in annotate django?

I have a filter which should return a queryset with 2 objects, and should have one different field. for example:
obj_1 = (name='John', age='23', is_fielder=True)
obj_2 = (name='John', age='23', is_fielder=False)
Both the objects are of same model, but different primary key. I tried usign the below filter:
qs = Model.objects.filter(name='John', age='23').annotate(is_fielder=F('plays__outdoor_game_role')=='Fielder')
I used annotate first time, but it gave me the below error:
TypeError: QuerySet.annotate() received non-expression(s): False.
I am new to Django, so what am I doing wrong, and what should be the annotate to get the required objects as shown above?
The solution by #ktowen works well, quite straightforward.
Here is another solution I am using, hope it is helpful too.
queryset = queryset.annotate(is_fielder=ExpressionWrapper(
Q(plays__outdoor_game_role='Fielder'),
output_field=BooleanField(),
),)
Here are some explanations for those who are not familiar with Django ORM:
Annotate make a new column/field on the fly, in this case, is_fielder. This means you do not have a field named is_fielder in your model while you can use it like plays.outdor_game_role.is_fielder after you add this 'annotation'. Annotate is extremely useful and flexible, can be combined with almost every other expression, should be a MUST-KNOWN method in Django ORM.
ExpressionWrapper basically gives you space to wrap a more complecated combination of conditions, use in a format like ExpressionWrapper(expression, output_field). It is useful when you are combining different types of fields or want to specify an output type since Django cannot tell automatically.
Q object is a frequently used expression to specify a condition, I think the most powerful part is that it is possible to chain the conditions:
AND (&): filter(Q(condition1) & Q(condition2))
OR (|): filter(Q(condition1) | Q(condition2))
Negative(~): filter(~Q(condition))
It is possible to use Q with normal conditions like below:
(Q(condition1)|id__in=[list])
The point is Q object must come to the first or it will not work.
Case When(then) can be simply explained as if con1 elif con2 elif con3 .... It is quite powerful and personally, I love to use this to customize an ordering object for a queryset.
For example, you need to return a queryset of watch history items, and those must be in an order of watching by the user. You can do it with for loop to keep the order but this will generate plenty of similar queries. A more elegant way with Case When would be:
item_ids = [list]
ordering = Case(*[When(pk=pk, then=pos)
for pos, pk in enumerate(item_ids)])
watch_history = Item.objects.filter(id__in=item_ids)\
.order_by(ordering)
As you can see, by using Case When(then) it is possible to bind those very concrete relations, which could be considered as 1) a pinpoint/precise condition expression and 2) especially useful in a sequential multiple conditions case.
You can use Case/When with annotate
from django.db.models import Case, BooleanField, Value, When
Model.objects.filter(name='John', age='23').annotate(
is_fielder=Case(
When(plays__outdoor_game_role='Fielder', then=Value(True)),
default=Value(False),
output_field=BooleanField(),
),
)

Improve Django queryset performance when using annotate Exists

I have a queryset that returns a lot of data, it can be filtered by year which will return around 100k lines, or show all which will bring around 1 million lines.
The objective of this annotate is to generate a xlsx spreadsheet.
Models representation, RelatedModel is manytomany between Model and AnotherModel
Model:
id
field1
field2
field3
RelatedModel:
foreign_key_model (Model)
foreign_key_another (AnotherModel)
Queryset, if the relation exists it will annotate, this annotate is very slow and can take several minutes.
Model.objects.all().annotate(
related_exists=Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
When(related_exists=False, then=Value('The relation doesn't exist!')),
default=Value('This is the default value!'),
output_field=CharField(),
)
).values_list(
'related_column',
'field1',
'field2',
'field3'
)
If only thing needed is to change how True / False is displayed in xlsx - one option is to just have one related_exists BooleanField annotation and later customize how it will be converted when creating xlsx document - i.e. in serializer. Database should store raw / unformatted values, and app prepare them to be shown to user.
Other things to consider:
Indexes to speed-up filtering.
If you have millions of records after filtering, in one table - maybe table partitioning could be considered.
But let's look into raw sql of original query. It will be like this:
SELECT [model_fields],
EXISTS([CLIENT_SELECT]) AS related_exists,
CASE
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation exists!'
WHEN EXISTS([CLIENT_SELECT]) = true THEN 'The relation does not exist!'
ELSE 'The relation exists!'
END AS related_column
FROM model;
And right away we can see nested query for Exists CLIENT_SELECT is there 3 times. Even though it is exactly the same, it may be executed minimum 2 times and up to 3 times. Database may optimize it to be faster than 3x, but it still is not optimal as 1x.
First, EXISTS returns either True or False, we can leave just one check that it is True, making 'The relation does not exist!' the default value.
related_column=Case(
When(related_exists=True, then=Value('The relation exists!')),
default=Value('The relation does not exist!')
Why related_column performs same select again and not takes the value of related_exists?
Because we cannot reference calculated columns while calculating another columns - and this is database level constraint django knows about and duplicates expression.
Wait, then we actually do not need related_exists column, lets just leave related_column with CASE statement and 1 exists subquery.
Here comes Django - we cannot (till 3.0) use expressions in filters without annotating them first.
So, it our case it is like: in order to use Exist in When, we first need to add it as annotation, but it won't be used as a reference, but a full copy of expression.
Good news!
Since Django 3.0 we can use expressions that output BooleanField directly in QuerySet filters, without having to first annotate. Exists is one of such BooleaField expressions.
Model.objects.all().annotate(
related_column=Case(
When(
Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id'))),
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
And only one nested select, and one annotated field.
Django 2.1, 2.2
Here's the commit that finalized allowance of boolean expressions although many pre-conditions for it were added earlier. One of them is presence of conditional attribute on expression object and check for this attribute.
So, although not recommended and not tested it seems quite working little hack for Django 2.1, 2.2 (before there was no conditional check, and it will require more intrusive changes):
create Exists expression instance
monkey patch it with conditional = True
use it as condition in When statement
related_model_exists = Exists(RelatedModel.objects.filter(foreign_key_model=OuterRef('id')))
setattr(related_model_exists, 'conditional', True)
Model.objects.all().annotate(
related_column=Case(
When(
relate_model_exists,
then=Value('The relation exists!'),
),
default=Value('The relation doesn't exist!'),
output_field=CharField(),
)
)
Related checks
relatedmodel_set__isnull=True check is not suitable for several reasons:
it performs LEFT OUTER JOIN - that is less efficient than EXISTS
it performs LEFT OUTER JOIN - it joins tables, this makes it ONLY suitable in filter() condition (not in annotate - When), and only for OneToOne or OneToMany (One is on relatedmodel side) relations
You can considerably simplify your query to:
from django.db.models import Count
Model.objects.all().annotate(
related_column=Case(
When(relatedmodel_set__isnull=True, then=Value("The relation doesn't exist!")),
default=Value("The relation exists!"),
output_field=CharField()
)
)
Where relatedmodel_set is the related_name on your foreign key.

django + postgres and select distinct

I am running django with postgres and I need to query some record from a table, sorting them by rank, and get unique entry in respect of a foreign key.
Basically my model is something like this:
class BookingCatalog(models.Model):
.......
boat = models.ForeignKey(Boat, verbose_name=u"Boat", related_name="booking_catalog")
is_skippered = models.BooleanField(u'Is Skippered',choices=SKIPPER_CHOICE, default=False)
rank = models.IntegerField(u"Rank", default=0, db_index=True)
.......
The idea is to run something like this
BookingCatalog.objects.filter (...).order_by ('-rank', 'boat', 'is_skippered').distinct ('boat')
Unfortunately, this is not working since I am using postgres which raises this exception:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
What should I do instead?
The distinct argument has to match the first order argument. Try using this:
BookingCatalog.objects.filter(...) \
.order_by('boat', '-rank', 'is_skippered') \
.distinct('boat')
The way that I do this is to select the distinct objects first, then use those results to filter another queryset.
# Initial filtering
result = BookingCatalog.objects.filter(...)
# Make the results distinct
result = result.order_by('boat').distinct('boat')
# Extract the pks from the result
result_pks = result.values_list("pk", flat=True)
# Use those result pks to create a new queryset
restult_2 = BookingCatalog.objects.filter(pk__in=result_pks)
# Order that queryset
result_2 = result_2.order_by('-rank', 'is_skippered')
print(result_2)
I believe that this results in a single query being executed, which contains a subquery. I would love for someone who knows more about Django to confirm this though.
..ordering by -rank will give you the lowest rank of each duplicate, but your overall query results will be ordered by boat field
BookingCatalog.objects.filter (...).order_by('boat','-rank','is_skippered').distinct('boat')
For more info on, refer to Django documentation
including for Postgres

Django: ManyToMany filter matching on ALL items in a list

I have such a Book model:
class Book(models.Model):
authors = models.ManyToManyField(Author, ...)
...
In short:
I'd like to retrieve the books whose authors are strictly equal to a given set of authors. I'm not sure if there is a single query that does it, but any suggestions will be helpful.
In long:
Here is what I tried, (that failed to run getting an AttributeError)
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space,
# first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
final_books = QuerySet()
for author in target_authors:
temp_books = candidate_books.filter(authors__in=[author])
final_books = final_books and temp_books
... and here is what I got:
AttributeError: 'NoneType' object has no attribute '_meta'
In general, how should I query a model with the constraint that its ManyToMany field contains a set of given objects as in my case?
ps: I found some relevant SO questions but couldn't get a clear answer. Any good pointer will be helpful as well. Thanks.
Similar to #goliney's approach, I found a solution. However, I think the efficiency could be improved.
# A sample set of authors
target_authors = set((author_1, author_2))
# To reduce the search space, first retrieve those books with just 2 authors.
candidate_books = Book.objects.annotate(c=Count('authors')).filter(c=len(target_authors))
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
candidate_books = candidate_books.filter(authors=author)
final_books = candidate_books
You can use complex lookups with Q objects
from django.db.models import Q
...
target_authors = set((author_1, author_2))
q = Q()
for author in target_authors:
q &= Q(authors=author)
Books.objects.annotate(c=Count('authors')).filter(c=len(target_authors)).filter(q)
Q() & Q() is not equal to .filter().filter(). Their raw SQLs are different where by using Q with &, its SQL just add a condition like WHERE "book"."author" = "author_1" and "book"."author" = "author_2". it should return empty result.
The only solution is just by chaining filter to form a SQL with inner join on same table: ... ON ("author"."id" = "author_book"."author_id") INNER JOIN "author_book" T4 ON ("author"."id" = T4."author_id") WHERE ("author_book"."author_id" = "author_1" AND T4."author_id" = "author_1")
I came across the same problem and came to the same conclusion as iuysal,
untill i had to do a medium sized search (with 1000 records with 150 filters my request would time out).
In my particular case the search would result in no records since the chance that a single record will align with ALL 150 filters is very rare, you can get around the performance issues by verifying that there are records in the QuerySet before applying more filters to save time.
# In each iteration, we filter out those books which don't contain one of the
# required authors - the instance on the iteration.
for author in target_authors:
if candidate_books.count() > 0:
candidate_books = candidate_books.filter(authors=author)
For some reason Django applies filters to empty QuerySets.
But if optimization is to be applied correctly however, using a prepared QuerySet and correctly applied indexes are necessary.