Annotate queryset with whether matching related object exists - django

I have two models with an explicit many-to-many relationship: Thing, auth.User, and a Favorite model connecting the two. I want to be able to order my Things by whether or not they are favorited by a particular user. In SQLite3, the best query I've come up with is (roughly) this:
select
*, max(u.name = "john cleese") as favorited
from thing as t
join favorite as f on f.thing_id = t.id
join user as u on f.user_id = u.id
group by t.id
order by favorited desc
;
The thing tripping me up in my SQL-to-Django translation is the max(u.name = "john cleese") bit. As far as I can tell, Django supports arithmetic but not equality. The closest I can get is a Case expression that doesn't properly group the output rows:
from django.db.models import BooleanField, Case, Value, When
Thing.objects.annotate(favorited=Case(
    When(favorites__user=john_cleese, then=Value(True)),
    default=Value(False),
    output_field=BooleanField(),
))
The other direction I've tried is to use RawSQL:
Thing.objects.annotate(favorited=RawSQL('"auth_user"."username" = "%s"', ["john cleese"]))
However, this won't work, because (as far as I'm aware) there's no way to explicitly join the favorite and auth_user tables I need.
Is there something I'm missing?

This will achieve what you (or anyone else googling their way here) want to do:
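from django.db.models import Case, Count, IntegerField, When
Thing.objects.annotate(
    favorited=Count(Case(
        When(
            favorites__user=john_cleese,
            then=1,
        ),
        # No default: non-matching rows become NULL, which COUNT() skips.
        # A default of 0 would be counted too, once for every related row.
        output_field=IntegerField(),
    )),
)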
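Why the default matters: COUNT() counts every non-NULL value, so a Case that falls back to 0 is counted for every joined row, favorited or not. The sketch below runs the SQL shape such an annotation roughly corresponds to, using Python's built-in sqlite3 module; the thing/favorite tables, their columns and the data are hypothetical, and this is hand-written SQL, not Django's exact output.
```python
import sqlite3
# Hypothetical tables standing in for Thing and Favorite; user 1 plays "john cleese".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY);
    CREATE TABLE favorite (thing_id INTEGER, user_id INTEGER);
    INSERT INTO thing VALUES (1), (2), (3);
    INSERT INTO favorite VALUES (1, 1), (1, 2), (2, 2);
""")
# Data: thing 1 is favorited by users 1 and 2, thing 2 only by user 2, thing 3 by nobody.
def favorited_counts(user_id, with_default):
    """Per-thing COUNT(CASE ...), optionally with an ELSE 0 branch."""
    else_clause = "ELSE 0" if with_default else ""
    sql = f"""
        SELECT t.id, COUNT(CASE WHEN f.user_id = ? THEN 1 {else_clause} END) AS favorited
        FROM thing AS t
        LEFT JOIN favorite AS f ON f.thing_id = t.id
        GROUP BY t.id
        ORDER BY favorited DESC, t.id
    """
    return conn.execute(sql, (user_id,)).fetchall()
print(favorited_counts(1, with_default=True))   # [(1, 2), (2, 1), (3, 1)]  every row counted
print(favorited_counts(1, with_default=False))  # [(1, 1), (2, 0), (3, 0)]  only user 1's favorite
```
With ELSE 0 even thing 3, which nobody favorited, gets a count of 1 from its single outer-joined row.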

From what I read in a related ticket, you can use a subquery with the Exists() query expression:
Exists is a Subquery subclass that uses an SQL EXISTS statement. In many cases it will perform better than a subquery since the database is able to stop evaluation of the subquery when a first matching row is found.
Assuming the intermediate model of your many-to-many relationship is called Favorite:
from django.db.models import Exists, OuterRef
is_favorited_subquery = Favorite.objects.filter(
    thing_id=OuterRef('pk'),
    user=john_cleese,  # only favorites by this particular user
)
Thing.objects.annotate(favorited=Exists(is_favorited_subquery))
Then you can order by the favorited annotation, e.g. .order_by('-favorited') to list favorited things first.
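For comparison, here is roughly the SQL shape behind Exists() plus an ordering on it: a correlated EXISTS evaluated per outer row, with no GROUP BY needed. The subquery has to be restricted to the particular user, otherwise it answers "favorited by anyone". This is a hand-written sketch using Python's built-in sqlite3 module; the tables and data are hypothetical.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY);
    CREATE TABLE favorite (thing_id INTEGER, user_id INTEGER);
    INSERT INTO thing VALUES (1), (2), (3);
    INSERT INTO favorite VALUES (1, 1), (3, 1), (2, 2);
""")
# Data: user 1 favorited things 1 and 3; user 2 favorited thing 2.
def things_by_favorite(user_id):
    """One row per thing: (id, 1 if user_id favorited it else 0), favorites first."""
    sql = """
        SELECT t.id,
               EXISTS (
                   SELECT 1 FROM favorite AS f
                   WHERE f.thing_id = t.id   -- correlation, like OuterRef(pk)
                     AND f.user_id = ?       -- restrict to one particular user
               ) AS favorited
        FROM thing AS t
        ORDER BY favorited DESC, t.id
    """
    return conn.execute(sql, (user_id,)).fetchall()
print(things_by_favorite(1))  # [(1, 1), (3, 1), (2, 0)]
```
Because EXISTS yields exactly one value per thing, every Thing appears once, including those nobody has favorited.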

I'm not exactly sure what you're trying to achieve, but I would start like this:
from django.db import models
from django.contrib.auth.models import User
class MyUser(models.Model):
    person = models.OneToOneField(User, on_delete=models.CASCADE)
class Thing(models.Model):
    thingname = models.CharField(max_length=10)
    favorited_by = models.ManyToManyField(MyUser)
And in your view:
qs = MyUser.objects.get(id=pk_of_user_john_reese).thing_set.all()
This gives you all Thing objects favorited by the given user.
Have a look at the Django docs on many-to-many relationships.
I have been using Django for some years now in several small and even bigger projects, but I have never used the RawSQL feature. Most times I considered it, it turned out I had made a mistake in my model design.

Related

Django doesn't respect Prefetch filters in annotate

class Subject(models.Model):
    ...
    students = models.ManyToManyField('Student')
    category = models.CharField(max_length=100)
class Student(models.Model):
    class_year = models.IntegerField()  # `class` is a reserved word in Python
    dropped = models.BooleanField()
    ...
subjects_with_dropouts = (
    Subject.objects.filter(category=Subject.STEM)
    .prefetch_related(
        Prefetch('students', queryset=Student.objects.filter(class_year=2020))
    )
    .annotate(dropped_out=Case(
        When(
            students__dropped=True,
            then=True,
        ),
        output_field=BooleanField(),
        default=False,
    ))
    .filter(dropped_out=True)
)
I am trying to get all Subjects in the STEM category that have dropouts from the class of 2020, but for some reason I also get Subjects that have dropouts from other classes.
I know that I can achieve this with:
subjects_with_dropouts = Subject.objects.filter(
    category=Subject.STEM,
    students__dropped=True,
    students__class_year=2020,
)
But why doesn't the first approach work? I am using PostgreSQL.
When using prefetch_related(), the joining is done in Python. A good way to think of it is that you have two separate queries. The first returns subjects with at least one student who dropped out: the Case expression in your annotation references students__dropped, so the main query joins the students table on its own, without any class filter. The second returns students in the class of 2020 (this is separate from the join in the first query). The prefetch then just matches up the results of these two separate queries using the through table that ManyToManyField generates automatically, which holds the ids of both sides of each connection.
A good way to see what is actually happening is print(queryset.query), where queryset is a QuerySet instance such as Subject.objects.all(). Or, if you have the means, django-debug-toolbar is a fantastic tool that shows the SQL of each query for each endpoint, including its EXPLAIN output.
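The two-query behaviour can be reproduced in plain SQL. This is a hand-written sketch using Python's built-in sqlite3 module with hypothetical subject, student and subject_students tables, not the SQL Django actually emits: the main query filters on dropped students of any class, the prefetch query separately fetches the class of 2020, and only putting both conditions on the same join gives the intended answer.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (id INTEGER PRIMARY KEY);
    CREATE TABLE student (id INTEGER PRIMARY KEY, class_year INTEGER, dropped INTEGER);
    CREATE TABLE subject_students (subject_id INTEGER, student_id INTEGER);
    INSERT INTO subject VALUES (1), (2);
    INSERT INTO student VALUES (1, 2019, 1), (2, 2020, 0), (3, 2020, 1);
    INSERT INTO subject_students VALUES (1, 1), (1, 2), (2, 3);
""")
# Data: student 1 (2019) dropped, student 2 (2020) enrolled, student 3 (2020) dropped.
JOIN_STUDENTS = """
    FROM subject AS s
    JOIN subject_students AS ss ON ss.subject_id = s.id
    JOIN student AS st ON st.id = ss.student_id
"""
# Query 1, the main queryset: its annotation joins students on its own,
# so the class filter from the Prefetch never reaches it.
# (DISTINCT only keeps the sketch short; the ORM query may repeat a subject.)
main = [row[0] for row in conn.execute(
    f"SELECT DISTINCT s.id {JOIN_STUDENTS} WHERE st.dropped = 1 ORDER BY s.id")]
# Query 2, the Prefetch: class-of-2020 students for those subjects;
# Python then attaches them to each subject object.
placeholders = ", ".join("?" * len(main))
prefetched = conn.execute(
    f"SELECT s.id, st.id {JOIN_STUDENTS} "
    f"WHERE st.class_year = 2020 AND s.id IN ({placeholders}) ORDER BY s.id, st.id",
    main,
).fetchall()
# Both conditions on the same join, as in the plain filter() version.
combined = [row[0] for row in conn.execute(
    f"SELECT DISTINCT s.id {JOIN_STUDENTS} "
    "WHERE st.dropped = 1 AND st.class_year = 2020 ORDER BY s.id")]
print(main)        # [1, 2]  subject 1 only has a dropout from 2019
print(prefetched)  # [(1, 2), (2, 3)]
print(combined)    # [2]
```
Subject 1 survives the main query because of its 2019 dropout, and the prefetch merely hangs its 2020 students on it afterwards.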

How to get a Count based on a subquery?

I am struggling to get a query working despite everything I have tried based on my web searches, and I think I need some help before I go crazy.
I have four models:
class Series(models.Model):
    puzzles = models.ManyToManyField('Puzzle', through='SeriesElement', related_name='series')
    ...
class Puzzle(models.Model):
    ...
class SeriesElement(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, verbose_name='Puzzle')
    series = models.ForeignKey(Series, on_delete=models.CASCADE, verbose_name='Series')
    puzzle_index = models.PositiveIntegerField(verbose_name='Order', default=0, editable=True)
class Play(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
    ...
Each user can play any puzzle several times, and each play creates a Play record.
That means that for a given (user, series, puzzle) combination we can have several Play records, some with puzzle_completed = True and some with puzzle_completed = False.
What I am trying (unsuccessfully) to achieve is to calculate, for each series and through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.
For nb_completed_by_user, I have something that works in almost all cases (one of my tests has a glitch that I cannot explain yet):
Series.objects.annotate(nb_completed_by_user=Count(
    'puzzles',
    filter=Q(
        puzzles__plays__puzzle_completed=True,
        puzzles__plays__series_id=F('id'),
        puzzles__plays__user=user,
    ),
    distinct=True,
))
For nb_not_completed_by_user, I was able to write a query on Puzzle that gives me the right answer, but I cannot turn it into a Subquery expression that works without throwing an error, nor get a Count expression to give me the proper answer.
This one works:
puzzles = Puzzle.objects.filter(
    ~Q(plays__puzzle_completed=True, plays__series_id=1, plays__user=user),
    series=s,
)
but when I try to turn it into a subquery, I cannot find a way to use the following expression without getting the error: ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
    plays__puzzle_completed=True, plays__series_id=OuterRef('id'), plays__user=user,
)
and the following Count expression doesn't give me the right result:
Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
    puzzles__plays__puzzle_completed=True,
    puzzles__plays__series_id=F('id'),
    puzzles__plays__user=user,
)))
Could anybody explain how I could obtain both values, and perhaps point me to a link that clearly explains how to use subqueries for less obvious cases than those in the official documentation?
Thanks in advance
Edit March 2021:
I recently found two posts which guided me through one potential solution to this specific issue:
Django Count and Sum annotations interfere with each other
and
Django 1.11 Annotating a Subquery Aggregate
I implemented the solution proposed by https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon, which uses two helper classes, SubqueryCount(Subquery) and SubquerySum(Subquery):
from django.db.models import PositiveIntegerField, Subquery
class SubqueryCount(Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = PositiveIntegerField()
class SubquerySum(Subquery):
    template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'
    def __init__(self, queryset, column, output_field=None, **extra):
        if output_field is None:
            # default to the type of the summed column
            output_field = queryset.model._meta.get_field(column)
        super().__init__(queryset, output_field, column=column, **extra)
It works extremely well, and is far quicker than the conventional Django Count annotation, at least in SQLite, and probably in PostgreSQL, as others have stated.
But when I tried it in a MariaDB environment... it crashed!
MariaDB is apparently not able (or not willing) to handle these correlated subqueries, as they are considered sub-optimal.
In my case, since I fetch multiple Count/distinct annotations for each record at the same time, I see a tremendous performance gain (in SQLite) that I would like to replicate in MariaDB.
Would anyone be able to help me figure out a way to implement those helper classes for MariaDB?
What should the template be in this environment?
matthew-schinckel? benoit-blanchon? rktavi?
After digging deeper into the Django docs, I was finally able to find a satisfying way to produce a Count or Sum based on a subquery.
To simplify the process, I defined the following helper functions:
To generate the subquery:
def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters=None):
    """
    Return a subquery from a given model (works with both FK & M2M).
    Extra filter parameters can be passed as a dictionary.
    Use:
        subquery = get_subquery(
            app_label='puzzles', model_name='Puzzle',
            reference_to_model_object='puzzle_family__target'
        )
    or directly:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle', 'puzzle_family__target')),)
    """
    model = apps.get_model(app_label, model_name)
    # build a local dictionary so that update() never modifies the caller's dictionary:
    parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
    parameters.update(filter_parameters or {})
    # '__id' instead of '_id' so that the lookup works with both FK & M2M
    return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')
To count the subquery generated by get_subquery:
def subquery_count(subquery):
    """
    Use:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle', 'puzzle_family__target')),)
    """
    return Coalesce(
        Subquery(
            subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'),
            output_field=PositiveIntegerField(),
        ),
        0,
    )
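Roughly, subquery_count() turns into a correlated scalar subquery that is grouped by the outer key (the values() call before annotate()) and wrapped in COALESCE, so that "no matching rows" reads as 0 instead of NULL. The hand-written sketch below, using Python's built-in sqlite3 module, applies that shape to the question's nb_completed_by_user; the play table layout and the data are hypothetical, and this is not the exact SQL Django generates.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (id INTEGER PRIMARY KEY);
    CREATE TABLE play (id INTEGER PRIMARY KEY, puzzle_id INTEGER, user_id INTEGER,
                       series_id INTEGER, puzzle_completed INTEGER);
    INSERT INTO series VALUES (1), (2), (3);
    INSERT INTO play (puzzle_id, user_id, series_id, puzzle_completed) VALUES
        (10, 7, 1, 0), (10, 7, 1, 1), (10, 7, 1, 1),
        (11, 7, 1, 1),
        (12, 7, 2, 0),
        (10, 8, 3, 1);
""")
# Data: user 7 completed puzzle 10 twice and puzzle 11 once in series 1,
# only attempted puzzle 12 in series 2; series 3 was played by user 8 only.
# s.id plays the role of OuterRef('id'); GROUP BY mirrors values() before annotate();
# COALESCE turns the NULL of an empty subquery into 0.
NB_COMPLETED_SQL = """
    SELECT s.id,
           COALESCE((
               SELECT COUNT(DISTINCT p.puzzle_id)
               FROM play AS p
               WHERE p.series_id = s.id
                 AND p.user_id = ?
                 AND p.puzzle_completed = 1
               GROUP BY p.series_id
           ), 0) AS nb_completed_by_user
    FROM series AS s
    ORDER BY s.id
"""
def nb_completed_by_user(user_id):
    return conn.execute(NB_COMPLETED_SQL, (user_id,)).fetchall()
print(nb_completed_by_user(7))  # [(1, 2), (2, 0), (3, 0)]
```
Counting distinct puzzle ids keeps repeated completions of the same puzzle from inflating the total.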
To sum the subquery generated by get_subquery on the field field_to_sum:
def subquery_sum(subquery, field_to_sum, output_field=None):
    """
    Use:
        qs.annotate(total_points=subquery_sum(get_subquery(
            'puzzles', 'Puzzle', 'puzzle_family__target'), 'points'),)
    """
    if output_field is None:
        # default to the type of the summed field
        output_field = subquery.model._meta.get_field(field_to_sum)
    return Coalesce(
        Subquery(
            subquery.annotate(result=Sum(field_to_sum, output_field=output_field))
            .order_by().values('result'),
            output_field=output_field,
        ),
        0,
    )
The required imports:
from django.apps import apps
from django.db.models import Count, OuterRef, PositiveIntegerField, Subquery, Sum
from django.db.models.functions import Coalesce
I spent so many hours solving this...
I hope this saves many of you the frustration I went through figuring out the right way to proceed.

Django distinct related querying

I have two models: ModelA, a custom user model based on AbstractUser, and ModelB:
class ModelB(models.Model):
    user = models.ForeignKey(User, related_name='modelsb', on_delete=models.CASCADE)
    timestamp = models.DateTimeField(auto_now_add=True)
What I want to find is how many users have created at least one ModelB object on at least 3 of the past 7 days.
So far, I have found a way to do it but I know for sure there is a better one and that is why I am posting this question.
I basically split the query into 2 parts.
Part1:
I added a foo method to the User model that checks whether a user meets the above condition:
def foo(self):
    past_limit = starting_date - timedelta(days=7)
    return self.modelsb.filter(timestamp__gte=past_limit).order_by('timestamp__day').distinct('timestamp__day').count() > 2
Part 2:
In the custom user manager, I find the users that have more than 2 ModelB objects in the last 7 days, then iterate through them, applying the foo method to each one.
This narrows down the number of iterations of the loop (it's basically a filter function, but you get the point).
def boo(self):
    past_limit = timezone.now() - timedelta(days=7)
    candidates = super().get_queryset().annotate(
        rc=Count('modelsb', filter=Q(modelsb__timestamp__gte=past_limit))
    ).filter(rc__gt=2)
    return list(filter(lambda x: x.foo(), candidates))
However, I want to know if there is a more efficient way to do this, that is without the for loop.
You can use conditional annotation.
I haven't been able to test this query, but something like this should work:
from django.db.models import Q, Count
past_limit = starting_date - timedelta(days=7)
users = User.objects.annotate(
    modelsb_in_last_seven_days=Count(
        'modelsb__timestamp__day',
        filter=Q(modelsb__timestamp__gte=past_limit),
        distinct=True,
    )
).filter(modelsb_in_last_seven_days__gte=3)
EDIT:
This solution did not work, because the distinct option does not let you specify which field makes an entry distinct.
I did some experimenting on my own Django instance, and found a way to make this work using Subquery. The idea is to generate a subquery in which we do the distinct counting ourselves.
from django.db.models import Count, IntegerField, OuterRef, Subquery
from django.db.models.functions import TruncDate
counted_modelb = (
    ModelB.objects
    .filter(user=OuterRef('pk'), timestamp__gte=past_limit)
    .annotate(day=TruncDate('timestamp'))
    .order_by()
    .values('user')  # group by user, so the subquery returns a single row
    .annotate(count=Count('day', distinct=True))
    .values('count')
)
query = (
    User.objects
    .annotate(modelsb_in_last_seven_days=Subquery(counted_modelb, output_field=IntegerField()))
    .filter(modelsb_in_last_seven_days__gt=2)
)
This annotates each user with the number of distinct days, since past_limit, on which they created a ModelB object.
In the subquery, values('user') before annotate() makes Django group by the user, and Count(..., distinct=True) does the distinct counting on the day (because a combination of distinct('timestamp__day') and annotate() is unsupported).
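The underlying idea, counting distinct calendar days per user inside a correlated subquery and filtering on the result in the database rather than in a Python loop, can be sketched in plain SQL with Python's built-in sqlite3 module. The tables, the fixed "now" and the data are hypothetical; SQLite's date() stands in for the per-day grouping.
```python
import sqlite3
from datetime import datetime, timedelta
now = datetime(2024, 1, 10, 12, 0)  # fixed "now" so the sketch is repeatable
past_limit = now - timedelta(days=7)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE modelb (id INTEGER PRIMARY KEY, user_id INTEGER, timestamp TEXT);
    INSERT INTO auth_user VALUES (1), (2), (3);
""")
events = [
    # user 1: one object on each of three different days
    (1, now - timedelta(days=1)), (1, now - timedelta(days=2)), (1, now - timedelta(days=3)),
    # user 2: five objects, but only on two distinct days
    (2, now - timedelta(hours=1)), (2, now - timedelta(hours=2)), (2, now - timedelta(hours=3)),
    (2, now - timedelta(days=1)), (2, now - timedelta(days=1, hours=1)),
    # user 3: three days, but one of them is outside the 7-day window
    (3, now - timedelta(days=1)), (3, now - timedelta(days=2)), (3, now - timedelta(days=9)),
]
conn.executemany(
    "INSERT INTO modelb (user_id, timestamp) VALUES (?, ?)",
    [(user_id, ts.isoformat(sep=" ")) for user_id, ts in events],
)
# Correlated subquery: distinct days (not rows) per user since past_limit.
ACTIVE_DAYS_SQL = """
    SELECT u.id,
           (SELECT COUNT(DISTINCT date(b.timestamp))
            FROM modelb AS b
            WHERE b.user_id = u.id AND b.timestamp >= ?) AS active_days
    FROM auth_user AS u
"""
params = (past_limit.isoformat(sep=" "),)
days_per_user = dict(conn.execute(ACTIVE_DAYS_SQL, params).fetchall())
# The filter runs in SQL, on the annotated value.
qualifying = [row[0] for row in conn.execute(
    f"SELECT id FROM ({ACTIVE_DAYS_SQL}) AS t WHERE active_days > 2 ORDER BY id", params)]
print(days_per_user)  # {1: 3, 2: 2, 3: 2}
print(qualifying)     # [1]
```
User 2 has more objects than user 1 but fewer distinct days, which is exactly the case a plain row count gets wrong.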

How to filter Django annotations on reverse foreign key fields

I am trying to get a count of all related models with a particular field value.
Here is some code...
models.py:
class Author(models.Model):
    name = models.CharField(max_length=100)
class Book(models.Model):
    BAD = "BAD"
    MEH = "MEH"
    GOOD = "GOOD"
    GREAT = "GREAT"
    REVIEW_CHOICES = (
        (BAD, BAD.title()),
        (MEH, MEH.title()),
        (GOOD, GOOD.title()),
        (GREAT, GREAT.title()),
    )
    title = models.CharField(max_length=100)
    review = models.CharField(max_length=100, choices=REVIEW_CHOICES)
    author = models.ForeignKey(Author, related_name="books", on_delete=models.CASCADE)
Suppose I want to list the number of each type of reviews for each author.
I have tried:
Author.objects.annotate(n_good_books=Count("books")).filter(books__review="GOOD").values("name", "n_good_books")
I have also tried:
Author.objects.annotate(n_good_books=Count("books", filter=Q(books__review="GOOD"))).values("name", "n_good_books")
But neither of these works.
Any suggestions?
You need to .filter(..) before the .annotate(..), so:
Author.objects.filter(
    books__review="GOOD"  # before the annotate
).annotate(
    n_good_books=Count("books")
)
This will result in a QuerySet of Authors, where each Author has an extra attribute .n_good_books that contains the number of good Books. The flip side is that you will only retrieve Authors for which at least one related Book has a good review. As the documentation specifies:
When used with an annotate() clause, a filter has the effect of constraining the objects for which an annotation is calculated. For example, you can generate an annotated list of all books that have a title starting with "Django" using the query:
>>> from django.db.models import Count, Avg
>>> Book.objects.filter(name__startswith="Django").annotate(num_authors=Count('authors'))
(..)
Annotated values can also be filtered. The alias for the annotation can be used in filter() and exclude() clauses in the same way as any other model field.
For example, to generate a list of books that have more than one author, you can issue the query:
>>> Book.objects.annotate(num_authors=Count('authors')).filter(num_authors__gt=1)
This query generates an annotated result set, and then generates a filter based upon that annotation.
The Count(..., filter=Q(..)) approach is only available since Django 2.0, so it will not work in Django 1.11.
@willem-van-onsem has the correct answer to the question I asked.
However, if I wanted to get a count for all book types at once, I could do something like:
from django.db.models import Case, Count, IntegerField, When
Author.objects.annotate(
    n_bad_books=Count(Case(When(books__review="BAD", then=1), output_field=IntegerField())),
    n_meh_books=Count(Case(When(books__review="MEH", then=1), output_field=IntegerField())),
    n_good_books=Count(Case(When(books__review="GOOD", then=1), output_field=IntegerField())),
    n_great_books=Count(Case(When(books__review="GREAT", then=1), output_field=IntegerField())),
)
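Both ideas, conditional counting per review value and filtering before annotating, map onto plain SQL. Below is a hand-written sketch using Python's built-in sqlite3 module with hypothetical author/book rows (not Django's exact output): COUNT(CASE WHEN ... THEN 1 END) mirrors Count(Case(When(...))) without a default, and the query that filters before grouping shows that authors with no good book disappear from the result.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, review TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO book (author_id, review) VALUES (1, 'GOOD'), (1, 'GOOD'), (1, 'BAD'), (2, 'MEH');
""")
# Cat has no books at all.
REVIEWS = ("BAD", "MEH", "GOOD", "GREAT")
# One COUNT(CASE ...) column per review value; non-matching rows yield NULL and are skipped.
columns = ", ".join(
    f"COUNT(CASE WHEN b.review = ? THEN 1 END) AS n_{review.lower()}_books" for review in REVIEWS
)
per_review = conn.execute(
    f"SELECT a.name, {columns} FROM author AS a "
    "LEFT JOIN book AS b ON b.author_id = a.id GROUP BY a.id ORDER BY a.id",
    REVIEWS,
).fetchall()
# Filter first, then aggregate: only authors with at least one GOOD book remain.
good_only = conn.execute(
    "SELECT a.name, COUNT(b.id) FROM author AS a "
    "JOIN book AS b ON b.author_id = a.id "
    "WHERE b.review = 'GOOD' GROUP BY a.id ORDER BY a.id"
).fetchall()
print(per_review)  # [('Ann', 1, 0, 2, 0), ('Bob', 0, 1, 0, 0), ('Cat', 0, 0, 0, 0)]
print(good_only)   # [('Ann', 2)]
```
The conditional version keeps every author and can compute all four counts in a single pass; the filter-first version is simpler but answers a narrower question.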
And he's right, it is very inelegant.

Ordering by a custom model field in django

I am trying to add an additional custom field to a django model. I have been having quite a hard time figuring out how to do the following, and I will be awarding a 150pt bounty for the first fully correct answer when it becomes available (after it is available -- see as a reference Improving Python/django view code).
I have the following model, with a custom method that returns a video count for each user:
class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True, on_delete=models.CASCADE)
    positions = models.ManyToManyField('Position', through='PositionTimestamp', blank=True)
    def count(self):
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute(
                """SELECT (
                    SELECT COUNT(*)
                    FROM videos_video v
                    WHERE v.uploaded_by_id = p.id
                    OR EXISTS (
                        SELECT NULL
                        FROM videos_videocredit c
                        WHERE c.video_id = v.id
                        AND c.profile_id = p.id
                    )
                ) AS Total_credits
                FROM userprofile_userprofile p
                WHERE p.id = %s""",
                [self.pk],  # let the database driver substitute the value
            )
            return int(cursor.fetchone()[0])
I want to be able to order by the count, i.e., UserProfile.objects.order_by('count'). Of course, I can't do that, which is why I'm asking this question.
Previously, I tried adding a custom model Manager, but the problem with that was I also need to be able to filter by various criteria of the UserProfile model: Specifically, I need to be able to do: UserProfile.objects.filter(positions=x).order_by('count'). In addition, I need to stay in the ORM (cannot have a raw sql output) and I do not want to put the filtering logic into the SQL, because there are various filters, and would require several statements.
How exactly would I do this? Thank you.
My reaction is that you're trying to bite off more than you can chew. Break the problem into bite-size pieces by giving yourself more primitives to work with.
You want to create these two pieces separately so you can call on them:
Does this user get credit for this video? return boolean
For how many videos does this user get credit? return int
Then use whatever combination of @property, model managers, querysets, and methods makes it easiest to express what you need.
For example you might attach the "credit" to the video model taking a user parameter, or the user model taking a video parameter, or a "credit" manager on users which adds a count of videos for which they have credit.
It's not trivial, but shouldn't be too tricky if you work for it.
"couldn't you use something like the "extra" queryset modifier?"
see the docs
I didn't put this in an answer at first because I wasn't sure it would actually work or if it was what you needed - it was more like a nudge in the (hopefully) right direction.
In the docs on that page there is an example:
query:
Blog.objects.extra(
    select={
        'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
    },
)
resulting SQL:
SELECT blog_blog.*, (SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id) AS entry_count
FROM blog_blog;
Perhaps do something like that, referring to the user id that you currently have as p.id as appname_userprofile.id.
Note:
I'm just winging it, so play around a bit.
Perhaps use the shell to print the query as SQL and see what you are getting.
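The underlying trick is that a correlated subquery placed in the SELECT list becomes an ordinary result column, which the outer query can then ORDER BY while any filters stay on the outer query. Here is a hand-written sketch using Python's built-in sqlite3 module that reuses the question's credit subquery, with hypothetical, shortened table names and made-up data:
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userprofile (id INTEGER PRIMARY KEY);
    CREATE TABLE video (id INTEGER PRIMARY KEY, uploaded_by_id INTEGER);
    CREATE TABLE videocredit (video_id INTEGER, profile_id INTEGER);
    INSERT INTO userprofile VALUES (1), (2), (3);
    INSERT INTO video VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO videocredit VALUES (12, 1), (10, 1), (11, 3);
""")
# Data: profile 1 uploaded videos 10 and 11 and is credited on 12 (and on 10 again);
# profile 2 uploaded 12; profile 3 is credited on 11.
rows = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*)
            FROM video AS v
            WHERE v.uploaded_by_id = p.id
               OR EXISTS (SELECT NULL FROM videocredit AS c
                          WHERE c.video_id = v.id AND c.profile_id = p.id)
           ) AS total_credits
    FROM userprofile AS p
    ORDER BY total_credits DESC, p.id
""").fetchall()
print(rows)  # [(1, 3), (2, 1), (3, 1)]
```
Because the subquery counts videos rather than credit rows, video 10 counts once for profile 1 even though it is both uploaded and credited.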
models:
class Positions(models.Model):
    x = models.IntegerField()
    class Meta:
        db_table = 'xtest_positions'
class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True, on_delete=models.CASCADE)
    positions = models.ManyToManyField(Positions)
    class Meta:
        db_table = 'xtest_users'
class Video(models.Model):
    usr = models.ForeignKey(UserProfile, on_delete=models.CASCADE)
    views = models.IntegerField()
    class Meta:
        db_table = 'xtest_video'
result:
test = UserProfile.objects.annotate(video_views=Sum('video__views')).order_by('video_views')
for t in test:
    print(t.video_views)
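One detail worth knowing: users without any videos get None from Sum(), not 0, and where they land in the ordering depends on the database. A hand-written sketch of the LEFT JOIN / SUM / ORDER BY shape, using Python's built-in sqlite3 module with the answer's table names and hypothetical data; wrapping the sum in COALESCE (in Django, Coalesce(Sum('video__views'), 0)) turns the missing total into 0.
```python
import sqlite3
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xtest_users (id INTEGER PRIMARY KEY);
    CREATE TABLE xtest_video (id INTEGER PRIMARY KEY, usr_id INTEGER, views INTEGER);
    INSERT INTO xtest_users VALUES (1), (2), (3);
    INSERT INTO xtest_video (usr_id, views) VALUES (1, 5), (1, 10), (2, 1);
""")
# Data: user 3 has no videos at all.
def views_per_user(coalesce):
    """Total views per user, ascending, optionally replacing NULL with 0."""
    total = "COALESCE(SUM(v.views), 0)" if coalesce else "SUM(v.views)"
    return conn.execute(f"""
        SELECT u.id, {total} AS video_views
        FROM xtest_users AS u
        LEFT JOIN xtest_video AS v ON v.usr_id = u.id
        GROUP BY u.id
        ORDER BY video_views, u.id
    """).fetchall()
print(views_per_user(coalesce=False))  # [(3, None), (2, 1), (1, 15)]  SQLite sorts NULL first
print(views_per_user(coalesce=True))   # [(3, 0), (2, 1), (1, 15)]
```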
doc: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
This is either what you want, or I've completely misunderstood! Anyway... hope it helps!