Django annotate subquery's aggregation - django

So I have three models
class Advert(BaseModel):
    company = models.ForeignKey(Company, on_delete=CASCADE, related_name="adverts")
class Company(BaseModel):
    name = models.CharField(max_length=50)
class OrderRating(BaseModel):
    reported_company = models.ForeignKey(Company, on_delete=CASCADE, related_name='ratings')
    rating = models.DecimalField(
        max_digits=2,
        decimal_places=1,
        validators=[MinValueValidator(1.0), MaxValueValidator(5.0)],
        help_text='Rating from 1.0 to 5.0.'
    )
I'm trying to get the average of all order ratings related to each company and annotate it onto the Advert queryset. When I do this:
qs = Advert.objects.all().annotate(
    avg_rating=Subquery(
        OrderRating.objects.filter(
            reported_company=OuterRef('company')).aggregate(Avg("rating"))["rating__avg"]
    )
)
I get an error back stating:
'This queryset contains a reference to an outer query and may only be used in a subquery.'
I'm not sure where the problem is, since I am calling OuterRef inside a Subquery.

In my experience, subqueries are often a bit tricky and not well documented. They also tend to return the message you are receiving whenever there is some error in the code defining the Subquery (not a very helpful message indeed).
As far as I know, aggregate() does not work in subqueries; you must use annotations instead. So this should work:
qs = Advert.objects.all().annotate(
    avg_rating=Subquery(
        OrderRating.objects.filter(reported_company=OuterRef('company'))
        .values('reported_company')
        .annotate(av=Avg('rating'))
        .values('av')
    )
)
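The likely reason aggregate() fails here is that it is a terminal clause: it runs the query immediately and returns a dict, so the OuterRef gets evaluated outside of any subquery. The values().annotate().values() chain stays a lazy queryset, which Django can embed as a correlated subquery. The sketch below uses the standard-library sqlite3 module to show roughly the SQL shape involved. The table names, column names and sample rows are simplified assumptions, not what Django would generate for these models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE advert (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE orderrating (
        id INTEGER PRIMARY KEY, reported_company_id INTEGER, rating REAL
    );
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO advert VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO orderrating VALUES (1, 1, 4.0), (2, 1, 5.0), (3, 2, 3.0);
""")

# The inner SELECT refers to advert.company_id from the outer query,
# which is the role OuterRef('company') plays in the ORM version.
rows = conn.execute("""
    SELECT advert.id,
           (SELECT AVG(r.rating)
              FROM orderrating AS r
             WHERE r.reported_company_id = advert.company_id
             GROUP BY r.reported_company_id) AS avg_rating
      FROM advert
     ORDER BY advert.id
""").fetchall()

print(rows)
```

Each advert row gets the average rating of its own company, and adverts that share a company share the value.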

Related

Django annotate M2M object with field from through model

Let's say I have the following models:
class Color(models.Model):
    name = models.CharField(max_length=255, unique=True)
    users = models.ManyToManyField(User, through="UserColor", related_name="colors")
class UserColor(models.Model):
    class Meta:
        unique_together = (("user", "color"), ("user", "rank"))
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    color = models.ForeignKey(Color, on_delete=models.CASCADE)
    rank = models.PositiveSmallIntegerField()
I want to fetch all users from the database with their respective colors and color ranks. I know I can do this by traversing across the through model, which makes a total of 3 DB hits:
users = User.objects.prefetch_related(
    Prefetch(
        "usercolor_set",
        queryset=UserColor.objects.order_by("rank").prefetch_related(
            Prefetch("color", queryset=Color.objects.only("name"))
        ),
    )
)
for user in users:
    for usercolor in user.usercolor_set.all():
        print(user, usercolor.color.name, usercolor.rank)
I discovered another way to do this by annotating the rank onto the Color objects, which makes sense because we have a unique constraint on user and color.
users = User.objects.prefetch_related(
    Prefetch(
        "colors",
        queryset=(
            Color.objects.annotate(rank=F("usercolor__rank"))
            .order_by("rank")
            .distinct()
        ),
    )
)
for user in users:
    for color in user.colors.all():
        print(user, color, color.rank)
This approach comes with several benefits:
Makes only 2 DB hits instead of 3.
Don't have to deal with the through object, which I think is more intuitive.
However, it only works if I chain distinct() (otherwise I get duplicate objects) and I'm worried this may not be a legit approach (maybe I just came up with a hack that may not work in all cases).
So is the second solution legit? Is there a better way to do it? Or should I stick to the first one?

Trouble with Django annotation with multiple ForeignKey references

I'm struggling with annotations and haven't found examples that help me understand. Here are relevant parts of my models:
class Team(models.Model):
    team_name = models.CharField(max_length=50)
class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')
    team1_points = models.IntegerField(null=True)
    team2_points = models.IntegerField(null=True)
What I'd like to end up with is an annotation on the Teams objects that would give me each team's total points. Sometimes, a team is match.team1 (so their points are in match.team1_points) and sometimes they are match.team2, with their points stored in match.team2_points.
This is as close as I've gotten, in maybe a hundred or so tries:
teams = Team.objects.annotate(total_points =
    Value(
        (Match.objects.filter(team1=21).aggregate(total=Sum(F('team1_points'))))['total'] or 0 +
        (Match.objects.filter(team2=21).aggregate(total=Sum(F('team2_points'))))['total'] or 0,
        output_field=IntegerField())
)
This works great, but (of course) annotates the total_points for the team with pk=21 to every team in the queryset. If there's a better approach for all this, I'd love to see it, but short of that, if you can show me how to turn those '21' values into a reference to the outer team's pk, I think that will work?
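A side note on the expression above, separate from the pk question: in Python, + binds more tightly than or, so `x or 0 + y or 0` is parsed as `x or (0 + y) or 0`. Whenever the first total is non-zero, the second one is never added. The snippet below illustrates this with made-up integers standing in for the two aggregate() results.

```python
# Hypothetical stand-ins for the two aggregate(...)['total'] values.
team1_total = 7
team2_total = 5

# As written in the question: parsed as  team1_total or (0 + team2_total) or 0
as_written = team1_total or 0 + team2_total or 0

# With explicit parentheses, both sides contribute to the sum.
parenthesized = (team1_total or 0) + (team2_total or 0)

# When the first total is None, the unparenthesized form falls through
# to the second value, which can make the bug look like correct output.
first_missing = None or 0 + team2_total or 0

print(as_written, parenthesized, first_missing)
```

Wrapping each `... or 0` in its own parentheses is the usual way to get the intended sum.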
EDIT: I ended up using a combination of elyas' answers and annotating a raw SQL statement to solve my issues. I was not able to keep normal annotations from dropping non-unique scores from the queryset, but raw SQL seems to work.
Here's that raw annotation:
teams = Team.objects.raw('select id, sum(points) as total_points from (select team1_id as id, team1_points as points from leagueman_match union all select team2_id as id, team2_points as points from leagueman_match) group by id order by total_points desc;')
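To make the raw statement easier to follow, here is a small sketch that runs the same UNION ALL query against an in-memory SQLite database using the standard-library sqlite3 module. The sample rows are invented, and the alias per_side is an addition, since some databases require a derived table to be aliased.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leagueman_match (
        id INTEGER PRIMARY KEY,
        team1_id INTEGER, team2_id INTEGER,
        team1_points INTEGER, team2_points INTEGER
    );
    INSERT INTO leagueman_match VALUES
        (1, 1, 2, 3, 1),
        (2, 2, 1, 3, 3),
        (3, 1, 3, 3, 0);
""")

# Stack the team1 side and the team2 side of every match into one
# (id, points) list, then sum per team. UNION ALL keeps repeated scores.
totals = conn.execute("""
    SELECT id, SUM(points) AS total_points
      FROM (SELECT team1_id AS id, team1_points AS points FROM leagueman_match
            UNION ALL
            SELECT team2_id AS id, team2_points AS points FROM leagueman_match
           ) AS per_side
     GROUP BY id
     ORDER BY total_points DESC, id
""").fetchall()

print(totals)
```

Team 1 scores 3 points in three different matches here, and all three are counted, which is the behaviour the plain annotations were not giving.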
An alternative solution to annotating the QuerySet might be to make total_points a @property of the Team model (depending on the use case):
from django.db.models import Case, Q, Sum, When
class Team(models.Model):
    team_name = models.CharField(max_length=50)
    @property
    def total_points(self):
        return Match.objects.filter(Q(team1=self.id) | Q(team2=self.id)).aggregate(
            total_points=Sum(Case(
                When(team1=self.id, then='team1_points'),
                When(team2=self.id, then='team2_points')
            ))
        )['total_points']
The disadvantage is that it can't be used in subsequent QuerySet operations, e.g. .values() or .order_by().
Django also has a @cached_property decorator, which caches the output of the attribute the first time it is accessed.
Other solutions tried
Originally I thought you could leverage the reverse relations match_team1 and match_team2 from the Team model to generate a simple annotation:
teams = Team.objects.annotate(
    total_points=(
        Sum('match_team1__team1_points', distinct=True)
        + Sum('match_team2__team2_points', distinct=True)
    )
)
Unfortunately this solution encounters difficulties in handling duplicates. The distinct=True argument eliminates the issue of points from the same match being summed more than once. But it introduces a different issue where different matches with the same points scored will be excluded.
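Both problems can be reproduced in plain SQL. The sketch below uses the standard-library sqlite3 module with invented sample rows. The query mirrors the general shape of annotating over two reverse relations at once (two joins from team to the match table); it is an approximation for illustration, not the exact SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY);
    CREATE TABLE leagueman_match (
        id INTEGER PRIMARY KEY,
        team1_id INTEGER, team2_id INTEGER,
        team1_points INTEGER, team2_points INTEGER
    );
    INSERT INTO team VALUES (1), (2);
    INSERT INTO leagueman_match VALUES
        (1, 1, 2, 3, 0),
        (2, 1, 2, 3, 1),
        (3, 2, 1, 4, 2),
        (4, 2, 1, 0, 5);
""")

# Two joins to the same table: every "team 1 as team1" row is paired
# with every "team 1 as team2" row before the sums are taken.
JOINED = """
    SELECT SUM({d}m1.team1_points) + SUM({d}m2.team2_points)
      FROM team AS t
      LEFT JOIN leagueman_match AS m1 ON m1.team1_id = t.id
      LEFT JOIN leagueman_match AS m2 ON m2.team2_id = t.id
     WHERE t.id = 1
     GROUP BY t.id
"""
plain_sum = conn.execute(JOINED.format(d="")).fetchone()[0]
distinct_sum = conn.execute(JOINED.format(d="DISTINCT ")).fetchone()[0]

# Reference value: sum each side separately, then add.
true_total = conn.execute("""
    SELECT (SELECT SUM(team1_points) FROM leagueman_match WHERE team1_id = 1)
         + (SELECT SUM(team2_points) FROM leagueman_match WHERE team2_id = 1)
""").fetchone()[0]

print(plain_sum, distinct_sum, true_total)
```

Without DISTINCT the join inflates the total, and with DISTINCT the two matches in which team 1 scored 3 points collapse into one, so neither figure matches the real total.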
I would perhaps advise refactoring your data model. Even if you find a solution for this specific problem, you may want to think a little ahead.
This is a solution:
class Team(models.Model):
    team_name = models.CharField(max_length=50)
class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')
class Points(models.Model):
    match = models.ForeignKey(
        Match, on_delete=models.CASCADE, related_name='match')
    team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='team')
    points = models.IntegerField(null=True)
With this you can sum up easily the points of any team, and also filter it by matches.

multiple joins on django queryset

For the below sample schema
# schema sample
class A(models.Model):
    n = models.ForeignKey(N, on_delete=models.CASCADE)
    d = models.ForeignKey(D, on_delete=models.PROTECT)
class N(models.Model):
    id = models.AutoField(primary_key=True, editable=False)
    d = models.ForeignKey(D, on_delete=models.PROTECT)
class D(models.Model):
    dsid = models.CharField(max_length=255, primary_key=True)
class P(models.Model):
    id = models.AutoField(primary_key=True, editable=False)
    name = models.CharField(max_length=255)
    n = models.ForeignKey(N, on_delete=models.CASCADE)
# raw query for the result I want
# SELECT P.name
# FROM P, N, A
# WHERE (P.n_id = N.id
#        AND A.n_id = N.id
#        AND A.d_id = 'MY_DSID'
#        AND P.name = 'MY_NAME')
What am I trying to achieve?
I'm trying to find a way to write a single queryset that does the same as the raw query above. So far I have managed it with two querysets, feeding the result of the first into the second to get the final DB records. However, that's 2 hits to the DB, and I want to optimize it by doing everything in one DB hit.
What would the queryset for this kind of raw query be? Or is there a better way to do it?
Above code is here https://dpaste.org/DZg2
You can achieve it using the related_name attribute and functions like select_related and prefetch_related.
This assumes the related name for each model is the model's name plus _items, but it is better to have proper model names and then provide meaningful related names. The related name is how you access the model in the backward direction.
This way, you can use this query to get all models in a single DB hit:
A.objects.all().select_related("n", "d", "n__d").prefetch_related("n__p_items")
I edited the code in the pasted site, however, it will expire soon.
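For comparison, the raw query in the question is a plain inner join with two filters, so it runs as a single statement. The sketch below reproduces it with the standard-library sqlite3 module, once in the comma-join form from the question and once with explicit JOINs. The lowercase table names and the sample rows are assumptions made for the demo. In ORM terms, a join like this is normally expressed by filtering across relations with double underscores, for example something along the lines of P.objects.filter(name="MY_NAME", n__a__d="MY_DSID"), assuming the default reverse lookup name a for model A; that spelling has not been tested against the asker's project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d (dsid TEXT PRIMARY KEY);
    CREATE TABLE n (id INTEGER PRIMARY KEY, d_id TEXT);
    CREATE TABLE a (id INTEGER PRIMARY KEY, n_id INTEGER, d_id TEXT);
    CREATE TABLE p (id INTEGER PRIMARY KEY, name TEXT, n_id INTEGER);
    INSERT INTO d VALUES ('MY_DSID'), ('OTHER_DSID');
    INSERT INTO n VALUES (1, 'MY_DSID'), (2, 'OTHER_DSID');
    INSERT INTO a VALUES (1, 1, 'MY_DSID'), (2, 2, 'OTHER_DSID');
    INSERT INTO p VALUES (1, 'MY_NAME', 1), (2, 'MY_NAME', 2), (3, 'OTHER_NAME', 1);
""")

# Comma-join form, as in the question.
implicit = conn.execute("""
    SELECT p.name FROM p, n, a
     WHERE p.n_id = n.id AND a.n_id = n.id
       AND a.d_id = 'MY_DSID' AND p.name = 'MY_NAME'
""").fetchall()

# Same query with explicit JOINs and bound parameters.
explicit = conn.execute("""
    SELECT p.name
      FROM p
      JOIN n ON p.n_id = n.id
      JOIN a ON a.n_id = n.id
     WHERE a.d_id = ? AND p.name = ?
""", ("MY_DSID", "MY_NAME")).fetchall()

print(implicit, explicit)
```

Only the P row whose N is linked to an A row with the requested dsid comes back, and both spellings return the same result in one round trip.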

Django: Annotate with field from another table (one-to-many)

Good day.
I wish to annotate my model with information from a different table.
class CompetitionTeam(models.Model):
    competition_id = models.ForeignKey('Competition', on_delete=models.CASCADE, to_field='id', db_column='competition_id')
    team_id = models.ForeignKey('Team', on_delete=models.CASCADE, to_field='id', null=True, db_column='team_id')
    ...
class Team(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=30)
    teamleader_id = models.ForeignKey('User', on_delete=models.CASCADE, to_field='id', db_column='teamleader_id')
    ...
class Competition(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=30)
    ...
Looping through my competitions, I wish to retrieve the list of competitionteam objects to be displayed with the relevant team's name. I tried:
CompetitionTeam.objects.filter(competition_id=_competition.id).filter(team_id__in=joined_team_ids).annotate(name=...)
where, instead of the ellipsis, I put Subquery expressions. However, I'm unsure how to match the team_id variable, e.g.:
*.annotate(name=Subquery(Team.objects.filter(id=competitionteam.team_id)).values('name'))
Related is the question: Django annotate field value from another model but I am unsure of how to implement that in this case. In that case, in place of mymodel_id, I used team_id but it only had parameters from the Team object, not my competition team object. I didn't really understand OuterRef but here is my attempt that failed:
CompetitionTeam.objects.filter(competition_id=_competition.id).filter(team_id__in=joined_team_ids).annotate(name=Subquery(Team.objects.get(id=OuterRef('team_id'))))
"Error: This queryset contains a reference to an outer query and may only be used in a subquery."
The solution for my question was:
CompetitionTeam.objects.filter(
    competition_id=_competition.id,
    team_id__in=joined_team_ids
).annotate(
    name=Subquery(
        Team.objects.filter(
            id=OuterRef('team_id')
        ).values('name')
    )
)
Thanks.

Django: Filter in multiple models linked via ForeignKey?

I'd like to create a filter-sort mixin for following values and models:
class Course(models.Model):
    title = models.CharField(max_length=70)
    description = models.TextField()
    max_students = models.IntegerField()
    min_students = models.IntegerField()
    is_live = models.BooleanField(default=False)
    is_deleted = models.BooleanField(default=False)
    teacher = models.ForeignKey(User)
class Session(models.Model):
    course = models.ForeignKey(Course)
    title = models.CharField(max_length=50)
    description = models.TextField(max_length=1000, default='')
    date_from = models.DateField()
    date_to = models.DateField()
    time_from = models.TimeField()
    time_to = models.TimeField()
class CourseSignup(models.Model):
    course = models.ForeignKey(Course)
    student = models.ForeignKey(User)
    enrollment_date = models.DateTimeField(auto_now=True)
class TeacherRating(models.Model):
    course = models.ForeignKey(Course)
    teacher = models.ForeignKey(User)
    rated_by = models.ForeignKey(User)
    rating = models.IntegerField(default=0)
    comment = models.CharField(max_length=300, default='')
A Course could be 'Discrete mathematics 1'
Session are individual classes related to a Course (e.g. 1. Introduction, 2. Chapter I, 3 Final Exam etc.) combined with a date/time
CourseSignup is the "enrollment" of a student
TeacherRating keeps track of a student's rating for a teacher (after course completion)
I'd like to implement following functions
Sort (asc, desc) by Date (earliest Session.date_from), Course.Name
Filter by: Date (earliest Session.date_from and last Session.date_to), Average TeacherRating (e.g. minimum value = 3), CourseSignups (e.g. minimum 5 users signed up)
(these options are passed via a GET parameters, e.g. sort=date_ascending&f_min_date=10.10.12&...)
How would you create a function for that?
I've tried using
denormalization (I just added a field to Course for the required filter/sort criteria and updated it whenever changes happened), but I'm not very satisfied with it (e.g. it needs lots of updates after each TeacherRating).
ForeignKey queries (Course.objects.filter(session__date_from=xxx)), but I might run into performance issues later on.
Thanks for any tips!
In addition to using the Q object for advanced AND/OR queries, get familiar with reverse lookups.
Django creates reverse lookups for foreign key relationships. In your case you can get all Sessions belonging to a Course in one of two ways, each of which can be filtered.
c = Course.objects.get(id=1)
sessions = Session.objects.filter(course__id=c.id) # First way, forward lookup.
sessions = c.session_set.all() # Second way using the reverse lookup session_set added to Course object.
You'll also want to familiarize yourself with annotate() and aggregate(); these allow you to calculate fields and order/filter on the results, using, for example, Count, Sum, Avg, Min, Max, etc.
from django.db.models import Count
courses_with_at_least_five_students = Course.objects.annotate(
    # Reverse relations are referenced by the lowercased model name.
    num_students=Count('coursesignup')
).order_by(
    '-num_students'
).filter(
    num_students__gte=5
)
from datetime import date, timedelta
from django.db.models import Avg, Min
course_earliest_session_within_last_240_days_with_avg_teacher_rating_below_4 = Course.objects.annotate(
    min_session_date_from=Min('session__date_from')
).annotate(
    avg_teacher_rating=Avg('teacherrating__rating')
).order_by(
    'min_session_date_from',
    '-avg_teacher_rating'
).filter(
    # date_from is a DateField, so compare against a date.
    min_session_date_from__gte=date.today() - timedelta(days=240),
    avg_teacher_rating__lte=4
)
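On the performance worry: an annotate() with an aggregate, followed by a filter() on that annotation, is answered by the database in one grouped query, with the filter applied to the groups (GROUP BY plus HAVING). The sketch below shows that shape for the "at least five students" case using the standard-library sqlite3 module. The table names and sample data are assumptions, and the SQL is an approximation of what the ORM would produce, not a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE coursesignup (
        id INTEGER PRIMARY KEY, course_id INTEGER, student_id INTEGER
    );
    INSERT INTO course VALUES
        (1, 'Discrete mathematics 1'), (2, 'Linear algebra'), (3, 'Statistics');
""")

# Hypothetical enrolment numbers per course id.
signups_per_course = {1: 5, 2: 2, 3: 6}
conn.executemany(
    "INSERT INTO coursesignup (course_id, student_id) VALUES (?, ?)",
    [(course_id, student_id)
     for course_id, n in signups_per_course.items()
     for student_id in range(n)],
)

# One grouped query: count signups per course, keep groups with >= 5,
# and order by the count, largest first.
popular = conn.execute("""
    SELECT c.title, COUNT(s.id) AS num_students
      FROM course AS c
      LEFT JOIN coursesignup AS s ON s.course_id = c.id
     GROUP BY c.id
    HAVING COUNT(s.id) >= 5
     ORDER BY num_students DESC
""").fetchall()

print(popular)
```

Whether this is fast enough depends on table sizes and indexes, so it is worth measuring before reaching for denormalization.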
The Q is used to allow you to make logical AND and logical OR in the queries.
I recommend you take a look at complex lookups: https://docs.djangoproject.com/en/1.5/topics/db/queries/#complex-lookups-with-q-objects
The following query might not work in your case (what does the teacher model look like?), but I hope it serves as an indication of how to use the complex lookup.
from django.db.models import Q
Course.objects.filter(Q(session__date__range=(start, end)) &
                      Q(teacher__rating__gt=3))
Unless absolutely necessary I'd indeed steer away from denormalization.
Your sort question wasn't entirely clear to me. Would you like to display Courses, filtered by date_from, and sort it by Date, Name?