Django count shared many-to-many between objects and count them

I have two models, called Article and Label. A simplified snippet of them is below:
class Label(models.Model):
    name = models.CharField(null=True)
class Article(models.Model):
    title = models.CharField(null=True)
    labels = models.ManyToManyField(Label, related_name='pieces', blank=True)
When viewing a specific article, I would like to display articles which have similar labels to those applied to the article being viewed, ordered by the number of labels that are shared with the article being read (like "similar articles").
I am attempting to perform this operation in the DB but I am struggling to find a queryset which will give me the same functionality as what I have done in Python by pulling all the articles from DB and performing a for-loop on each of them. A non-functioning query attempt of what I am trying to do is below (viewed_article is the article object being viewed):
articles = Article.objects.all()\
.annotate(
tags_count=Article.objects.filter(F('viewed_article.labels')
).count()).order_by(tags_count)

You need to use conditional expressions and a somewhat complicated query to achieve this:
from django.db.models import Case, Count, IntegerField, Sum, When
current_labels = viewed_article.labels.all()
similar_articles = Article.objects.filter(labels__in=current_labels).distinct()\
    .annotate(
        tag_count=Sum(
            Case(
                When(labels__in=current_labels, then=1),
                default=0, output_field=IntegerField()
            )
        )
    ).order_by('-tag_count')
What is happening is:
1. Fetch all articles that share any labels with the current one. distinct() is required to weed out duplicates returned by the underlying JOIN query.
2. Annotate each article with a conditional expression. For each of the article's labels, it checks whether the label is one of the current article's labels and adds 1 to the sum if it is. The result is a count of matching labels.
3. Order the results by the count of matching labels, highest first.
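To make the three steps concrete, here is a minimal sketch of the kind of SQL they boil down to, run with Python's built-in sqlite3 module. The table names (article, article_labels) and the sample rows are assumptions made for illustration, not the tables or SQL Django actually generates. Unlike the queryset above, the sketch also excludes the viewed article itself.

```python
import sqlite3

# Hypothetical schema: a join table standing in for the Article.labels M2M.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE article_labels (article_id INTEGER, label_id INTEGER);
    INSERT INTO article VALUES
        (1, 'viewed'), (2, 'two shared'), (3, 'one shared'), (4, 'none shared');
    INSERT INTO article_labels VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 10), (2, 11), (2, 99),
        (3, 12),
        (4, 99);
""")

def similar_articles(conn, viewed_id):
    """Return (article_id, shared_label_count) pairs, most shared labels first."""
    return conn.execute(
        """
        SELECT al.article_id, COUNT(*) AS tag_count
        FROM article_labels AS al
        WHERE al.label_id IN (
            SELECT label_id FROM article_labels WHERE article_id = ?
        )
        AND al.article_id != ?      -- leave out the article being viewed
        GROUP BY al.article_id      -- one row per candidate article
        ORDER BY tag_count DESC, al.article_id
        """,
        (viewed_id, viewed_id),
    ).fetchall()

result = similar_articles(conn, 1)
print(result)
```

Article 4 shares no label with article 1, so it does not appear in the result at all, which matches the effect of filtering on labels__in before annotating.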

Related

How to get a Count based on a subquery?

I am struggling to get a query working despite everything I have tried based on my web searches, and I think I need some help before going crazy.
I have four models:
class Series(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...
class Puzzle(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...
class SeriesElement(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, verbose_name='Puzzle')
    series = models.ForeignKey(Series, on_delete=models.CASCADE, verbose_name='Series')
    puzzle_index = models.PositiveIntegerField(verbose_name='Order', default=0, editable=True)
class Play(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
    ...
Each user can play any puzzle several times, each time creating a Play record.
That means that for a given set of (user, series, puzzle) we can have several Play records,
some with puzzle_completed = True, some with puzzle_completed = False.
What I am trying (unsuccessfully) to achieve is to calculate for each series, through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.
For nb_completed_by_user, I have something which works in almost all cases (I have one glitch in one of my tests that I cannot explain so far):
Series.objects.annotate(nb_completed_by_user=Count('puzzles',
filter=Q(puzzles__plays__puzzle_completed=True,
puzzles__plays__series_id=F('id'),puzzles__plays__user=user), distinct=True))
For nb_not_completed_by_user, I was able to write a query on Puzzle that gives me the right answer, but I am not able to transform it into a Subquery expression that works without throwing an error, or to get a Count expression to give me the proper answer.
This one works:
puzzles = Puzzle.objects.filter(~Q(plays__puzzle_completed=True,
plays__series_id=1, plays__user=user),series=s)
but when trying to move to a subquery, I cannot find a way to use the following expression without it throwing the error: ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
plays__puzzle_completed=True,plays__series_id=OuterRef('id'), plays__user=user)
and the following Count expression doesn't give me the right result:
Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
puzzle__plays__puzzle_completed=True, puzzle__plays__series_id=F('id'),
puzzle__plays__user=user))
Could anybody explain to me how I could obtain both values?
And, if possible, suggest a link that clearly explains how to use subqueries for less obvious cases than those in the official documentation?
Thanks in advance.
Edit March 2021:
I recently found two posts which guided me through one potential solution to this specific issue:
Django Count and Sum annotations interfere with each other
and
Django 1.11 Annotating a Subquery Aggregate
I implemented the proposed solution from https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon
having help classes: class SubqueryCount(Subquery) and class SubquerySum(Subquery)
class SubqueryCount(Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = PositiveIntegerField()
class SubquerySum(Subquery):
    template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'
    def __init__(self, queryset, column, output_field=None, **extra):
        if output_field is None:
            output_field = queryset.model._meta.get_field(column)
        super().__init__(queryset, output_field, column=column, **extra)
It works extremely well, and is far quicker than the conventional Django Count annotation,
... at least in SQLite, and probably PostgreSQL as stated by others.
But when I tried it in a MariaDB environment ... it crashed!
MariaDB is apparently not able / not willing to handle correlated subqueries, as those are considered sub-optimal.
In my case, as I try to get multiple Count/distinct annotations for each record from the database at the same time, I really see a tremendous gain in performance (in SQLite)
that I would like to replicate in MariaDB.
Would anyone be able to help me figure out a way to implement those helper classes for MariaDB?
What should template be in this environment?
matthew-schinckel ?
benoit-blanchon ?
rktavi ?
Going a bit deeper and analysing the Django docs in more detail, I was finally able to find a satisfying way to produce a Count or Sum based on a subquery.
To simplify the process, I defined the following helper functions.
To generate the subquery:
def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters={}):
    """
    Return a subquery from a given model (work with both FK & M2M)
    can add extra filter parameters as dictionary:
    Use:
    subquery = get_subquery(
        app_label='puzzles', model_name='Puzzle',
        reference_to_model_object='puzzle_family__target'
    )
    or directly:
    qs.annotate(nb_puzzles=subquery_count(get_subquery(
        'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    model = apps.get_model(app_label, model_name)
    # we need to declare a local dictionary to prevent the external dictionary to be changed by the update method:
    parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
    parameters.update(filter_parameters)
    # putting '__id' instead of '_id' to work with both FK & M2M
    return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')
To count the subquery generated through get_subquery:
def subquery_count(subquery):
    """
    Use:
    qs.annotate(nb_puzzles=subquery_count(get_subquery(
        'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    return Coalesce(Subquery(subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'), output_field=PositiveIntegerField()), 0)
To sum the subquery generated through get_subquery on the field field_to_sum:
def subquery_sum(subquery, field_to_sum, output_field=None):
    """
    Use:
    qs.annotate(total_points=subquery_sum(get_subquery(
        'puzzles', 'Puzzle','puzzle_family__target'),'points'),)
    """
    if output_field is None:
        # fall back to the type of the field being summed
        output_field = subquery.model._meta.get_field(field_to_sum)
    return Coalesce(Subquery(subquery.annotate(result=Sum(field_to_sum, output_field=output_field)).order_by().values('result'), output_field=output_field), 0)
The required imports:
from django.apps import apps
from django.db.models import Count, DecimalField, OuterRef, PositiveIntegerField, Subquery, Sum
from django.db.models.functions import Coalesce
I spent so many hours on solving this ...
I hope that this will save many of you all the frustration I went through figuring out the right way to proceed.

How to filter Django annotations on reverse foreign key fields

I am trying to get a count of all related models with a particular field value.
Here is some code...
models.py:
class Author(models.Model):
    name = models.CharField(max_length=100)
class Book(models.Model):
    BAD = "BAD"
    MEH = "MEH"
    GOOD = "GOOD"
    GREAT = "GREAT"
    REVIEW_CHOICES = (
        (BAD, BAD.title()),
        (MEH, MEH.title()),
        (GOOD, GOOD.title()),
        (GREAT, GREAT.title()),
    )
    title = models.CharField(max_length=100)
    review = models.CharField(max_length=100, choices=REVIEW_CHOICES)
    author = models.ForeignKey(Author, related_name="books")
Suppose I want to list the number of each type of reviews for each author.
I have tried:
Authors.object.annotate(n_good_books=Count("books")).filter(books__review="GOOD").values("name", "n_good_books")
I have also tried:
Authors.object.annotate(n_good_books=Count("books", filter=Q(books_review="GOOD"))).values("name", "n_good_books")
But neither of these works.
Any suggestions?
You need to .filter(..) before the .annotate(..), so:
Author.objects.filter(
    books__review="GOOD"  # before the annotate
).annotate(
    n_good_books=Count("books")
)
This will result in a QuerySet of Authors, where each Author has an extra attribute .n_good_books that contains the number of good Books. The downside is that you will only retrieve Authors for which at least one related Book has a good review. As is specified in the documentation:
When used with an annotate() clause, a filter has the effect of
constraining the objects for which an annotation is calculated. For example, you can generate an annotated list of all books that have
a title starting with "Django" using the query:
>>> from django.db.models import Count, Avg
>>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors'))
(..)
Annotated values can also be filtered. The alias for the annotation can be used in filter() and exclude() clauses in the
same way as any other model field.
For example, to generate a list of books that have more than one
author, you can issue the query:
>>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1)
This query generates an annotated result set, and then generates a
filter based upon that annotation.
The Count(..., filter=Q(..)) approach only works since django-2.0, so in django-1.11 this will not work.
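The difference between the two approaches is easiest to see in the SQL they correspond to. Below is a minimal sketch using Python's built-in sqlite3 module; the table names, column names and sample rows are assumptions for illustration, and the SQL is only comparable in shape to what Django emits. Filtering before the aggregate drops authors with no good books, while a conditional count keeps them with a count of 0.

```python
import sqlite3

# Hypothetical tables standing in for the Author and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, review TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (author_id, review) VALUES
        (1, 'GOOD'), (1, 'GOOD'), (1, 'BAD'), (2, 'MEH');
""")

# Comparable to .filter(books__review="GOOD").annotate(Count("books")):
# the WHERE clause removes rows before they are grouped and counted.
filter_then_count = conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM author AS a
    JOIN book AS b ON b.author_id = a.id
    WHERE b.review = 'GOOD'
    GROUP BY a.id
    ORDER BY a.name
""").fetchall()

# Comparable to Count("books", filter=Q(books__review="GOOD")):
# every author stays in the result, only matching books are counted.
conditional_count = conn.execute("""
    SELECT a.name, COUNT(CASE WHEN b.review = 'GOOD' THEN 1 END)
    FROM author AS a
    LEFT JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id
    ORDER BY a.name
""").fetchall()

print(filter_then_count)
print(conditional_count)
```

Bob has no good books, so he is missing from the first result and shows up with 0 in the second.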
@willem-van-onsem has the correct answer to the question I asked.
However, if I wanted to get a count for all book types at once, I could do something like:
from django.db.models import Case, Count, IntegerField, When
Author.objects.annotate(
    n_bad_books=Count(Case(When(books__review="BAD", then=1), output_field=IntegerField())),
    n_meh_books=Count(Case(When(books__review="MEH", then=1), output_field=IntegerField())),
    n_good_books=Count(Case(When(books__review="GOOD", then=1), output_field=IntegerField())),
    n_great_books=Count(Case(When(books__review="GREAT", then=1), output_field=IntegerField())),
)
And he's right, it is very inelegant.

Annotate queryset with whether matching related object exists

I have two models with an explicit many-to-many relationship: a thing, auth.user, and a "favorite" model connecting the two. I want to be able to order my "thing"s by whether or not they are favorited by a particular user. In Sqlite3, the best query I've come up with is (roughly) this:
select
*, max(u.name = "john cleese") as favorited
from thing as t
join favorite as f on f.thing_id = t.id
join user as u on f.user_id = u.id
group by t.id
order by favorited desc
;
The thing tripping me up in my SQL-to-Django translation is the max(u.name = "john cleese") bit. As far as I can tell, Django has support for arithmetic but not equality. The closest I can come is a case statement that doesn't properly group the output rows:
Thing.objects.annotate(favorited=Case(
When(favorites__user=john_cleese, then=Value(True)),
default=Value(False),
output_field=BooleanField()
))
The other direction I've tried is to use RawSQL:
Thing.objects.annotate(favorited=RawSQL('"auth_user"."username" = "%s"', ["john cleese"]))
However, this won't work, because (as far as I'm aware) there's no way to explicitly join the favorite and auth_user tables I need.
Is there something I'm missing?
This will achieve what you (or anyone else googling their way here) want to do:
Thing.objects.annotate(
    favorited=Count(Case(
        When(
            favorites__user=john_cleese,
            then=1
        ),
        # no default: rows that do not match evaluate to NULL, which Count ignores
        output_field=IntegerField(),
    )),
)
From what I read in a related ticket, you can use subquery with the Exists query expression.
Exists is a Subquery subclass that uses an SQL EXISTS statement. In many cases it will perform better than a subquery since the database is able to stop evaluation of the subquery when a first matching row is found.
Assuming the middle model in your case of ManyToMany is called Favorite
from django.db.models import Exists, OuterRef
is_favorited_subquery = Favorite.objects.filter(
    thing_id=OuterRef('pk'),
    user=john_cleese,  # restrict to the user in question; without this, a favorite by any user matches
)
Thing.objects.annotate(favorited=Exists(is_favorited_subquery))
Then you can order by favorited attribute of the query.
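As a rough illustration of what an EXISTS annotation does at the SQL level, here is a minimal sketch with Python's built-in sqlite3 module. The table and column names and the sample rows are assumptions, and the subquery is restricted to one user (also an assumption, to match the "favorited by a particular user" requirement in the question).

```python
import sqlite3

# Hypothetical tables standing in for Thing and the Favorite through model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE favorite (thing_id INTEGER, user_id INTEGER);
    INSERT INTO thing VALUES (1, 'parrot'), (2, 'spam'), (3, 'grail');
    INSERT INTO favorite VALUES (1, 1), (2, 2);
""")

def things_with_favorited_flag(conn, user_id):
    """Return (name, favorited) rows, favorited things first."""
    return conn.execute(
        """
        SELECT t.name,
               EXISTS (
                   SELECT 1 FROM favorite AS f
                   WHERE f.thing_id = t.id   -- correlation, like OuterRef('pk')
                     AND f.user_id = ?
               ) AS favorited
        FROM thing AS t
        ORDER BY favorited DESC, t.name
        """,
        (user_id,),
    ).fetchall()

rows = things_with_favorited_flag(conn, 1)
print(rows)
```

Because the flag comes from a correlated subquery rather than a JOIN, each thing appears exactly once and no GROUP BY is needed.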
I'm not exactly sure what you're trying to achieve, but I would start like this.
from django.db import models
from django.contrib.auth.models import User
class MyUser(models.Model):
    person = models.OneToOneField(User)
class Thing(models.Model):
    thingname = models.CharField(max_length=10)
    favorited_by = models.ManyToManyField(MyUser)
And in your view:
qs = MyUser.objects.get(id=pk_of_user_john_reese).thing_set.all()
Will give you all Thing objects of the given user.
You should have a look in the Django Docs for ManyToMany
I'm using Django for some years now in several smaller and even bigger Projects, but I have never used the RawSQL features. Most times I thought about it, I have had a mistake in my model design.

Annotate filtering -- sum only some of related objects' fields

Let's say there's an Author and he has Books. In order to fetch authors together with the number of written pages, the following can be done:
Author.objects.annotate(total_pages=Sum('book__pages'))
But what if I wanted to sum pages of sci-fi and fantasy books separately? I'd like to end up with an Author, that has total_pages_books_scifi_pages and total_pages_books_fantasy_pages properties.
I know I can do following:
Author.objects.filter(book__category='scifi').annotate(total_pages_books_scifi_pages=Sum('book__pages'))
Author.objects.filter(book__category='fantasy').annotate(total_pages_books_fantasy_pages=Sum('book__pages'))
But how do I do it in one queryset?
from django.db.models import IntegerField, F, Case, When, Sum
categories = ['scifi', 'fantasy']
annotations = {}
for category in categories:
    annotation_name = 'total_pages_books_{}'.format(category)
    case = Case(
        When(book__category=category, then=F('book__pages')),
        default=0,
        output_field=IntegerField()
    )
    annotations[annotation_name] = Sum(case)
Author.objects.filter(
    book__category__in=categories
).annotate(
    **annotations
)
Try:
Author.objects.values("book__category").annotate(total_pages=Sum('book__pages'))
From Django docs:
https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#values:
values()
Ordinarily, annotations are generated on a per-object basis - an annotated QuerySet will return one result for each object in the original QuerySet. However, when a values() clause is used to constrain the columns that are returned in the result set, the method for evaluating annotations is slightly different. Instead of returning an annotated result for each result in the original QuerySet, the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
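The grouping described in that passage can be sketched in plain SQL. The example below uses Python's built-in sqlite3 module with assumed table names and sample rows. It shows that grouping on the category alone gives totals across all authors, whereas adding the author to the grouping (the analogue of listing an author field in values() as well) gives one total per author and category.

```python
import sqlite3

# Hypothetical tables standing in for the Author and Book models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY, author_id INTEGER, category TEXT, pages INTEGER
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book (author_id, category, pages) VALUES
        (1, 'scifi', 100), (1, 'scifi', 50), (1, 'fantasy', 200), (2, 'fantasy', 300);
""")

# Grouping by category only: one row per category, summed over all authors.
by_category = conn.execute("""
    SELECT b.category, SUM(b.pages)
    FROM author AS a JOIN book AS b ON b.author_id = a.id
    GROUP BY b.category
    ORDER BY b.category
""").fetchall()

# Grouping by author and category: one row per unique combination.
by_author_and_category = conn.execute("""
    SELECT a.name, b.category, SUM(b.pages)
    FROM author AS a JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id, b.category
    ORDER BY a.name, b.category
""").fetchall()

print(by_category)
print(by_author_and_category)
```

Note that the second form returns rows rather than one object per author, so the per-category totals end up on separate rows instead of as separate attributes.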

How to do SELECT COUNT(*) GROUP BY and ORDER BY in Django?

I'm using a transaction model to keep track of all the events going through the system
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()
    ......
How do I get the top 5 actors in my system?
In SQL it will basically be
SELECT actor, COUNT(*) as total
FROM Transaction
GROUP BY actor
ORDER BY total DESC
According to the documentation, you should use:
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
values() : specifies which columns are going to be used to "group by"
Django docs:
"When a values() clause is used to constrain the columns that are
returned in the result set, the method for evaluating annotations is
slightly different. Instead of returning an annotated result for each
result in the original QuerySet, the original results are grouped
according to the unique combinations of the fields specified in the
values() clause"
annotate() : specifies an operation over the grouped values
Django docs:
The second way to generate summary values is to generate an independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The order by clause is self explanatory.
To summarize: you group by actor, generating one row per actor, add the annotation (this adds an extra field to the returned values) and finally order the rows by this value.
Refer to https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for more insight
Good to note: when using Count, the field passed to Count does not affect the grouping, which is determined by the unique combinations of the fields in values() (as mentioned above). Count only counts the non-NULL values of the field it is given, so as long as that field is never NULL, the following queries give the same result:
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
Transaction.objects.all().values('actor').annotate(total=Count('id')).order_by('total')
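One caveat worth checking with a quick experiment: SQL's COUNT(column) skips NULLs, so the two queries agree only because actor and id are never NULL. The sketch below uses Python's built-in sqlite3 module with an assumed table txn and sample rows, and counts the nullable acted column as well to show where the results would diverge.

```python
import sqlite3

# Hypothetical table standing in for the Transaction model;
# acted_id is nullable, like the `acted` foreign key in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (
        id INTEGER PRIMARY KEY,
        actor_id INTEGER NOT NULL,
        acted_id INTEGER
    );
    INSERT INTO txn (actor_id, acted_id) VALUES (1, 2), (1, NULL), (2, NULL);
""")

# Same grouping, three different arguments to COUNT.
rows = conn.execute("""
    SELECT actor_id, COUNT(actor_id), COUNT(id), COUNT(acted_id)
    FROM txn
    GROUP BY actor_id
    ORDER BY actor_id
""").fetchall()

for actor_id, by_actor, by_id, by_acted in rows:
    print(actor_id, by_actor, by_id, by_acted)
```

The first two counts match for every actor, while the count over the nullable column is lower wherever it contains NULL.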
Just like @Alvaro has answered, Django's direct equivalent of the GROUP BY statement:
SELECT actor, COUNT(*) AS total
FROM Transaction
GROUP BY actor
is through the use of values() and annotate() methods as follows:
Transaction.objects.values('actor').annotate(total=Count('actor')).order_by()
However one more thing must be pointed out:
If the model has a default ordering defined in class Meta, the .order_by() clause is obligatory for proper results. You cannot skip it even when no ordering is intended.
Further, for high-quality code it is advisable to always put an .order_by() clause after annotate(), even when there is no class Meta: ordering. This makes the statement future-proof: it will work as intended regardless of any future changes to class Meta: ordering.
Let me provide you with an example. If the model had:
class Transaction(models.Model):
    actor = models.ForeignKey(User, related_name="actor")
    acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
    action_id = models.IntegerField()
    class Meta:
        ordering = ['id']
Then such approach WOULDN'T work:
Transaction.objects.values('actor').annotate(total=Count('actor'))
That's because Django performs an additional GROUP BY on every field in class Meta: ordering.
If you print the query:
>>> print(Transaction.objects.values('actor').annotate(total=Count('actor')).query)
SELECT "Transaction"."actor_id", COUNT("Transaction"."actor_id") AS "total"
FROM "Transaction"
GROUP BY "Transaction"."actor_id", "Transaction"."id"
It will be clear that the aggregation would NOT work as intended and therefore the .order_by() clause must be used to clear this behaviour and get proper aggregation results.
See: Interaction with default ordering or order_by() in official Django documentation.
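The effect of the extra GROUP BY column can be reproduced directly in SQL. This is a minimal sketch with Python's built-in sqlite3 module; the table name txn and the sample rows are assumptions, and the two statements mirror the shape of the queries discussed above rather than Django's exact output. An ORDER BY is added to both only to make the printed result deterministic.

```python
import sqlite3

# Hypothetical table standing in for the Transaction model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (id INTEGER PRIMARY KEY, actor_id INTEGER NOT NULL);
    INSERT INTO txn (actor_id) VALUES (1), (1), (1), (2);
""")

# Shape of the query when the default ordering leaks into GROUP BY:
# every (actor_id, id) pair is unique, so each group holds a single row.
with_default_ordering = conn.execute("""
    SELECT actor_id, COUNT(actor_id) AS total
    FROM txn
    GROUP BY actor_id, id
    ORDER BY id
""").fetchall()

# Shape of the query once the ordering is cleared:
# rows are grouped by actor only, giving the intended totals.
ordering_cleared = conn.execute("""
    SELECT actor_id, COUNT(actor_id) AS total
    FROM txn
    GROUP BY actor_id
    ORDER BY actor_id
""").fetchall()

print(with_default_ordering)
print(ordering_cleared)
```

With the id column in the GROUP BY, actor 1 shows up three times with a total of 1 each instead of once with a total of 3.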
If you want reverse order (largest value first), prefix the field name with a minus sign (-).
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('-total')