I'm struggling with annotations and haven't found examples that help me understand. Here are relevant parts of my models:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')
    team1_points = models.IntegerField(null=True)
    team2_points = models.IntegerField(null=True)
What I'd like to end up with is an annotation on the Team objects that gives me each team's total points. Sometimes a team is match.team1 (so its points are in match.team1_points) and sometimes it is match.team2, with its points stored in match.team2_points.
This is as close as I've gotten, in maybe a hundred or so tries:
teams = Team.objects.annotate(
    total_points=Value(
        # Each "or 0" is parenthesized so that both sums are always added together.
        ((Match.objects.filter(team1=21).aggregate(total=Sum(F('team1_points'))))['total'] or 0)
        + ((Match.objects.filter(team2=21).aggregate(total=Sum(F('team2_points'))))['total'] or 0),
        output_field=IntegerField())
)
This works great, but (of course) annotates the total_points for the team with pk=21 to every team in the queryset. If there's a better approach for all this, I'd love to see it, but short of that, if you can show me how to turn those '21' values into a reference to the outer team's pk, I think that will work?
EDIT: I ended up using a combination of elyas' answers and annotating a raw SQL statement to solve my issues. I was not able to keep normal annotations from dropping non-unique scores from the queryset, but raw SQL seems to work.
Here's that raw annotation:
teams = Team.objects.raw('select id, sum(points) as total_points from (select team1_id as id, team1_points as points from leagueman_match union all select team2_id as id, team2_points as points from leagueman_match) group by id order by total_points desc;')
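To see what the UNION ALL query computes, here is a minimal sketch that runs the same SQL against an in-memory SQLite database with Python's sqlite3 module. The sample rows are made up for illustration and the table holds only the columns the query touches. The alias on the derived table (per_team) and the secondary sort on id are my additions, on the assumption that some databases reject an unaliased subquery in FROM and that a deterministic order is wanted for ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leagueman_match ("
    "id INTEGER PRIMARY KEY, team1_id INTEGER, team2_id INTEGER, "
    "team1_points INTEGER, team2_points INTEGER)"
)
# Made-up sample data: team 1 scores 3 points in two different matches,
# the case that Sum(..., distinct=True) collapses into a single 3.
conn.executemany(
    "INSERT INTO leagueman_match "
    "(team1_id, team2_id, team1_points, team2_points) VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 1), (2, 1, 1, 3), (1, 3, 2, 2)],
)

TOTALS_SQL = """
SELECT id, SUM(points) AS total_points
FROM (
    -- One row per (team, match) appearance, regardless of which side the team was on.
    SELECT team1_id AS id, team1_points AS points FROM leagueman_match
    UNION ALL
    SELECT team2_id AS id, team2_points AS points FROM leagueman_match
) AS per_team
GROUP BY id
ORDER BY total_points DESC, id
"""

totals = conn.execute(TOTALS_SQL).fetchall()
print(totals)
```

Because UNION ALL keeps duplicate rows, equal scores from different matches are all counted.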
An alternative solution to annotating the QuerySet might be to make total_points a @property of the Team model (depending on the use case):
from django.db.models import Case, Q, Sum, When

class Team(models.Model):
    team_name = models.CharField(max_length=50)

    @property
    def total_points(self):
        return Match.objects.filter(Q(team1=self.id) | Q(team2=self.id)).aggregate(
            total_points=Sum(Case(
                When(team1=self.id, then='team1_points'),
                When(team2=self.id, then='team2_points')
            ))
        )['total_points']
The disadvantage is that it can't be used in subsequent QuerySet operations e.g. .values(), .order_by().
Django also provides a @cached_property decorator (django.utils.functional.cached_property), which computes the value the first time the attribute is accessed and caches it on the instance.
Other solutions tried
Originally I thought you could leverage the reverse relations match_team1 and match_team2 from the Team model to generate a simple annotation:
teams = Team.objects.annotate(
    total_points=(
        Sum('match_team1__team1_points', distinct=True)
        + Sum('match_team2__team2_points', distinct=True)
    )
)
Unfortunately this solution has trouble with duplicates. Joining both reverse relations in one query repeats match rows, so points from the same match can be summed more than once. The distinct=True argument eliminates that problem, but it introduces a different one: separate matches in which the team scored the same number of points are collapsed, so those points are counted only once.
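The duplicate problem can be reproduced outside Django. The sketch below uses Python's sqlite3 module and hand-written SQL that roughly mirrors the two LEFT JOINs such an annotation produces; the table names, sample rows, and exact SQL are assumptions for illustration, not the query Django actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leagueman_team (id INTEGER PRIMARY KEY);
CREATE TABLE leagueman_match (
    id INTEGER PRIMARY KEY,
    team1_id INTEGER, team2_id INTEGER,
    team1_points INTEGER, team2_points INTEGER
);
INSERT INTO leagueman_team (id) VALUES (1), (2), (3);
-- Team 1 scores 3 and 3 as team1, then 1 and 2 as team2: 9 points in total.
INSERT INTO leagueman_match VALUES
    (1, 1, 2, 3, 0),
    (2, 1, 3, 3, 0),
    (3, 2, 1, 0, 1),
    (4, 3, 1, 0, 2);
""")

# Joining both relations yields 2 x 2 = 4 rows for team 1.
naive_total, distinct_total = conn.execute("""
SELECT SUM(m1.team1_points) + SUM(m2.team2_points),
       SUM(DISTINCT m1.team1_points) + SUM(DISTINCT m2.team2_points)
FROM leagueman_team AS t
LEFT JOIN leagueman_match AS m1 ON m1.team1_id = t.id
LEFT JOIN leagueman_match AS m2 ON m2.team2_id = t.id
WHERE t.id = 1
GROUP BY t.id
""").fetchone()

# Reference value computed without any join fan-out.
true_total = conn.execute("""
SELECT (SELECT COALESCE(SUM(team1_points), 0) FROM leagueman_match WHERE team1_id = 1)
     + (SELECT COALESCE(SUM(team2_points), 0) FROM leagueman_match WHERE team2_id = 1)
""").fetchone()[0]

print(naive_total, distinct_total, true_total)
```

The plain sum overcounts because every row is repeated, while the DISTINCT sum undercounts because the two 3-point matches are merged into one value.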
I would advise refactoring your data model. Even if you find a solution for this specific problem, you may want to think a little ahead.
This is a solution:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')

class Points(models.Model):
    match = models.ForeignKey(
        Match, on_delete=models.CASCADE, related_name='match')
    team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='team')
    points = models.IntegerField(null=True)
With this you can easily sum the points of any team, and also filter them by match.
Related
I have some models in Django:
# models.py, simplified here
class Category(models.Model):
    """The category an inventory item belongs to. Examples: car, truck, airplane"""
    name = models.CharField(max_length=255)

class UserInterestCategory(models.Model):
    """
    How interested is a user in a given category. `interest` can be set by any method, maybe a neural network or something like that
    """
    user = models.ForeignKey(User, on_delete=models.CASCADE)  # user is the stock Django user
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    interest = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])

class Item(models.Model):
    """This is a product that we have in stock, which we are trying to get a User to buy"""
    model_number = models.CharField(max_length=40, default="New inventory item")
    product_category = models.ForeignKey(Category, null=True, blank=True, on_delete=models.SET_NULL, verbose_name="Category")
I have a list view showing items, and I'm trying to sort by user_interest_category for the currently logged in user.
I have tried a couple different querysets and I'm not thrilled with them:
primary_queryset = Item.objects.all()
# this one works, and it's fast, but only finds items the user ALREADY has an interest in --
primary_queryset = primary_queryset.filter(product_category__userinterestcategory__user=self.request.user).annotate(
    recommended=F('product_category__userinterestcategory__interest')
)
# this one works great but the baby jesus weeps at its slowness
# probably because we are iterating through every user, item, and userinterestcategory in the db
primary_queryset = primary_queryset.annotate(
    recommended=Case(
        When(product_category__userinterestcategory__user=self.request.user, then=F('product_category__userinterestcategory__interest')),
        default=Value(0),
        output_field=IntegerField(),
    )
)
# this one works, but it's still a bit slow -- 2-3 seconds per query:
interest = Subquery(UserInterestCategory.objects.filter(category=OuterRef('product_category'), user=self.request.user).values('interest'))
# annotate() needs a keyword alias for a Subquery expression
primary_queryset = primary_queryset.annotate(recommended=interest)
The third method is workable, but it doesn't seem like the most efficient way to do things. Isn't there a better method than this?
I have two models: the gas station and the price of a product. A price can be for one of up to four product types, and not every station has all four products. I want to query the latest entry for each of those products, preferably in a single query:
class GasStation(models.Model):
    place_id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=255, null=True, blank=True)

class Price(models.Model):
    class Producto(models.TextChoices):
        GASOLINA_REGULAR = 'GR', _('Gasolina regular')
        GASOLINA_PREMIUM = 'GP', _('Gasolina premium')
        DIESEL_REGULAR = 'DR', _('Diesel regular')
        DIESEL_PREMIUM = 'DP', _('Diesel premium')

    product = models.CharField(max_length=2, choices=Producto.choices)
    created = models.DateTimeField(auto_now_add=True)
    updated = models.DateTimeField(auto_now=True)
    price = models.FloatField(null=True, blank=True)
    estacion = models.ForeignKey(GasStation,
                                 on_delete=models.SET_NULL,
                                 null=True,
                                 related_name='prices')
I've tried with:
station.price.filter(product__in=['GR', 'GP', 'DR', 'DP']).latest()
But it only returns the latest of the whole queryset, not the latest price of each product type. I want to avoid querying for each individual product because some stations don't sell all types. Any advice?
You're looking for annotations and Subquery. Below is what I think might work. Your models aren't fully defined. If you need the whole Price instance, then this won't work for you. Subquery can only annotate a single field.
from django.db.models import OuterRef, Subquery

# The foreign key on Price is named "estacion" in the model shown above.
stations = GasStation.objects.annotate(
    latest_regular=Subquery(
        Price.objects.filter(estacion_id=OuterRef("pk"), product="GR").order_by('-updated').values("price")[:1]
    ),
    latest_premium=Subquery(
        Price.objects.filter(estacion_id=OuterRef("pk"), product="GP").order_by('-updated').values("price")[:1]
    ),
    ...
)
station = stations.get(something_here)
station.latest_premium, station.latest_regular
You can make this more concise by using a dict comprehension that iterates over your product short codes and then calling .annotate(**annotations).
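As a rough, database-level illustration of that idea, the sketch below builds one correlated "latest price" subquery per product code with a dict comprehension and runs the result through Python's sqlite3 module. It is plain SQL, not the ORM; the table names, sample rows, and the two diesel alias names are assumptions made up for the example. In Django, the same comprehension would build a dict of Subquery expressions to pass as .annotate(**annotations).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gasstation (place_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE price (
    id INTEGER PRIMARY KEY, estacion_id INTEGER,
    product TEXT, price REAL, updated TEXT
);
INSERT INTO gasstation VALUES (1, 'Centro'), (2, 'Norte');
INSERT INTO price (estacion_id, product, price, updated) VALUES
    (1, 'GR', 20.0, '2024-01-01'),
    (1, 'GR', 21.5, '2024-02-01'),
    (1, 'GP', 23.0, '2024-01-15'),
    (2, 'DR', 19.0, '2024-01-20');
""")

# Product short code -> name of the annotated column (alias names are illustrative).
PRODUCT_ALIASES = {
    "GR": "latest_regular",
    "GP": "latest_premium",
    "DR": "latest_diesel_regular",
    "DP": "latest_diesel_premium",
}

# One correlated scalar subquery per product. The codes are fixed constants
# defined above, never user input, so embedding them in the SQL text is safe here.
subqueries = {
    alias: (
        "(SELECT p.price FROM price AS p "
        f"WHERE p.estacion_id = s.place_id AND p.product = '{code}' "
        "ORDER BY p.updated DESC LIMIT 1)"
    )
    for code, alias in PRODUCT_ALIASES.items()
}

select_list = ", ".join(f"{sql} AS {alias}" for alias, sql in subqueries.items())
query = f"SELECT s.place_id, {select_list} FROM gasstation AS s ORDER BY s.place_id"

rows = conn.execute(query).fetchall()
print(rows)
```

Stations that do not sell a product simply get NULL (None) in that column, so no per-product query is needed.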
So I have three models
class Advert(BaseModel):
    company = models.ForeignKey(Company, on_delete=CASCADE, related_name="adverts")

class Company(BaseModel):
    name = models.CharField(max_length=50)

class OrderRating(BaseModel):
    reported_company = models.ForeignKey(Company, on_delete=CASCADE, related_name='ratings')
    rating = models.DecimalField(
        max_digits=2,
        decimal_places=1,
        validators=[MinValueValidator(1.0), MaxValueValidator(5.0)],
        help_text='Rating from 1.0 to 5.0.'
    )
I'm trying to get the average of all order ratings related to the company and annotate it onto the Advert model. When I do this:
qs = Advert.objects.all().annotate(
    avg_rating=Subquery(
        OrderRating.objects.filter(
            reported_company=OuterRef('company')).aggregate(Avg("rating"))["rating__avg"]
    )
)
I get back an error stating:
'This queryset contains a reference to an outer query and may only be used in a subquery.'
I'm not sure where the problem is, since I am calling OuterRef inside a Subquery.
In my experience, subqueries are often a bit tricky and not well documented. They tend to return the message you are receiving whenever there is some error in the code defining the Subquery (not a very helpful message indeed).
As far as I know, aggregate() does not work in subqueries: it evaluates the queryset immediately, before the OuterRef can be resolved against the outer query. You must use annotations instead. So this should work:
qs = Advert.objects.all().annotate(
    avg_rating=Subquery(
        OrderRating.objects.filter(
            reported_company=OuterRef('company')).values('reported_company').annotate(av=Avg('rating')).values('av')
    )
)
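For intuition, the values().annotate().values() chain corresponds to a correlated subquery with a GROUP BY, evaluated once per outer row. The sketch below shows that shape in plain SQL through Python's sqlite3 module; the table names and sample rows are assumptions for illustration, and the SQL is hand-written rather than copied from Django's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE advert (id INTEGER PRIMARY KEY, company_id INTEGER);
CREATE TABLE orderrating (
    id INTEGER PRIMARY KEY, reported_company_id INTEGER, rating REAL
);
-- Adverts 1 and 2 belong to company 1; advert 3 belongs to company 2 (no ratings).
INSERT INTO advert VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO orderrating VALUES (1, 1, 4.0), (2, 1, 5.0);
""")

rows = conn.execute("""
SELECT a.id,
       (SELECT AVG(r.rating)
        FROM orderrating AS r
        WHERE r.reported_company_id = a.company_id  -- plays the role of OuterRef('company')
        GROUP BY r.reported_company_id) AS avg_rating
FROM advert AS a
ORDER BY a.id
""").fetchall()
print(rows)
```

An advert whose company has no ratings gets NULL (None) rather than being dropped from the result.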
My company has a pretty complicated web of Django models that I don't deal too much with. But sometimes I need to do queries on it. One that I'm doing right now is taking an inconveniently long time. So because I'm not the best at understanding how to use annotations effectively and I don't really understand subqueries at all (and probably some other key Django stuff) I was hoping someone here could help figure out how to do a better job at getting this result quicker.
Here's a facsimile of the relevant models in our database.
class Company(models.Model):
    name = models.CharField(max_length=255)

    @property
    def active_humans(self):
        if hasattr(self, '_active_humans'):
            return self._active_humans
        else:
            self._active_humans = Human.objects.filter(active=True, departments__company=self).distinct()
            return self._active_humans

class Department(models.Model):
    name = models.CharField(max_length=225)
    company = models.ForeignKey(
        'muh_project.Company',
        related_name="departments",
        on_delete=models.PROTECT
    )
    humans = models.ManyToManyField('muh_project.Human', through='muh_project.Job', related_name='departments')

class Job(models.Model):
    name = models.CharField(max_length=225)
    department = models.ForeignKey(
        'muh_project.Department',
        on_delete=models.PROTECT
    )
    human = models.ForeignKey(
        'muh_project.Human',
        on_delete=models.PROTECT
    )

class Human(models.Model):
    active = models.BooleanField(default=True)

    @property
    def fixed_happy_dogs(self):
        # "dogs" is the related_name declared on Dog.human below
        return self.dogs.filter(is_neutered_spayed=True, disposition="happy")

class Dog(models.Model):
    is_neutered_spayed = models.BooleanField(default=True)
    disposition = models.CharField(max_length=225)
    age = models.IntegerField()
    human = models.ForeignKey(
        'muh_project.Human',
        related_name="dogs",
        on_delete=models.PROTECT
    )
    human_job = models.ForeignKey(
        'muh_project.Job',
        blank=True,
        null=True,
        on_delete=models.PROTECT
    )
What I'm trying to do (in the language of this silly toy example) is to get the number of humans with at least one of a certain type of dog for each of some companies. So what I'm doing is running this.
rows = []
company_type = "Tech"
fixed_happy_dogs = Dog.objects.filter(is_neutered_spayed=True, disposition="happy")
old_dogs = fixed_happy_dogs.filter(age__gte=7)
companies = Company.objects.filter(name__icontains=company_type)
for company in companies.order_by('id'):
    humans = company.active_humans
    num_humans = humans.distinct().count()
    humans_with_fixed_happy_dogs = humans.filter(dogs__in=fixed_happy_dogs).distinct().count()
    humans_with_old_dogs = humans.filter(dogs__in=old_dogs).distinct().count()
    rows.append(f'{company.id};{num_humans};{humans_with_fixed_happy_dogs};{humans_with_old_dogs}')
It generally takes anywhere from 45 - 120 seconds to run depending on how many companies I run it over. I'd like to cut that down. I do need the final result as a list of strings as shown.
One piece of low-hanging fruit would be to add a database index to the Dog.disposition column, since it's used in the .filter() call and, without an index, the database probably has to do a sequential scan over the table on every pass through the for loop.
For this task specifically I'd recommend Django Debug Toolbar, where you can see all the SQL queries being run. That helps you pinpoint the slowest ones, and you can then use EXPLAIN to see what goes wrong there.
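The effect of such an index can be checked with the database's query planner. Here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table layout and the index name idx_dog_disposition are assumptions for illustration, and other databases report their plans differently. In Django the index itself would typically be declared with db_index=True on the field and created by a migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dog (id INTEGER PRIMARY KEY, disposition TEXT, "
    "is_neutered_spayed INTEGER, age INTEGER, human_id INTEGER)"
)

QUERY = (
    "SELECT human_id FROM dog "
    "WHERE disposition = 'happy' AND is_neutered_spayed = 1"
)


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index the planner has to scan the whole table.
plan_before = plan(QUERY)

# Hypothetical index name; it only needs to be unique within the database.
conn.execute("CREATE INDEX idx_dog_disposition ON dog (disposition)")

# With the index in place the planner can search it instead.
plan_after = plan(QUERY)

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines varies between SQLite versions, so the useful signal is whether the index name appears in the plan at all.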
I'd like to create a filter-sort mixin for following values and models:
class Course(models.Model):
    title = models.CharField(max_length=70)
    description = models.TextField()
    max_students = models.IntegerField()
    min_students = models.IntegerField()
    is_live = models.BooleanField(default=False)
    is_deleted = models.BooleanField(default=False)
    teacher = models.ForeignKey(User)

class Session(models.Model):
    course = models.ForeignKey(Course)
    title = models.CharField(max_length=50)
    description = models.TextField(max_length=1000, default='')
    date_from = models.DateField()
    date_to = models.DateField()
    time_from = models.TimeField()
    time_to = models.TimeField()

class CourseSignup(models.Model):
    course = models.ForeignKey(Course)
    student = models.ForeignKey(User)
    enrollment_date = models.DateTimeField(auto_now=True)

class TeacherRating(models.Model):
    course = models.ForeignKey(Course)
    teacher = models.ForeignKey(User)
    rated_by = models.ForeignKey(User)
    rating = models.IntegerField(default=0)
    comment = models.CharField(max_length=300, default='')
A Course could be 'Discrete mathematics 1'
Sessions are individual classes related to a Course (e.g. 1. Introduction, 2. Chapter I, 3. Final Exam etc.) combined with a date/time
CourseSignup is the "enrollment" of a student
TeacherRating keeps track of a student's rating for a teacher (after course completion)
I'd like to implement following functions
Sort (asc, desc) by Date (earliest Session.date_from), Course.Name
Filter by: Date (earliest Session.date_from and last Session.date_to), Average TeacherRating (e.g. minimum value = 3), CourseSignups (e.g. minimum 5 users signed up)
(these options are passed via a GET parameters, e.g. sort=date_ascending&f_min_date=10.10.12&...)
How would you create a function for that?
I've tried using
denormalization (I just added a field to Course for each required filter/sort criterion and updated it whenever changes happened), but I'm not very satisfied with it (e.g. it needs lots of updates after each TeacherRating).
ForeignKey queries (Course.objects.filter(session__date_from=xxx)), but I might run into performance issues later on.
Thanks for any tips!
In addition to using the Q object for advanced AND/OR queries, get familiar with reverse lookups.
Django creates reverse lookups for foreign key relationships. In your case you can get all Sessions belonging to a Course in one of two ways, each of which can be filtered.
c = Course.objects.get(id=1)
sessions = Session.objects.filter(course__id=c.id) # First way, forward lookup.
sessions = c.session_set.all() # Second way using the reverse lookup session_set added to Course object.
You'll also want to familiarize yourself with annotate() and aggregate(), which allow you to calculate fields and order/filter on the results. For example, Count, Sum, Avg, Min, Max, etc.
from datetime import date, timedelta
from django.db.models import Avg, Count, Min

# Reverse relations are referenced in lookups by the lowercased model name.
courses_with_at_least_five_students = Course.objects.annotate(
    num_students=Count('coursesignup')
).order_by(
    '-num_students'
).filter(
    num_students__gte=5
)
course_earliest_session_within_last_240_days_with_avg_teacher_rating_below_4 = Course.objects.annotate(
    min_session_date_from=Min('session__date_from')
).annotate(
    avg_teacher_rating=Avg('teacherrating__rating')
).order_by(
    'min_session_date_from',
    '-avg_teacher_rating'
).filter(
    # date_from is a DateField, so compare against a date
    min_session_date_from__gte=date.today() - timedelta(days=240),
    avg_teacher_rating__lte=4
)
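To connect these annotations to the GET parameters from the question, one possible sketch is a small pure-Python helper that turns the query string into filter keyword arguments and an ordering. Only sort and f_min_date come from the question; the other parameter names (f_max_date, f_min_rating, f_min_signups) and the annotation names (first_session, last_session, avg_rating, num_signups) are hypothetical. The date format is assumed to be day.month.two-digit-year, based on the 10.10.12 example.

```python
from datetime import datetime
from urllib.parse import parse_qs

# Allowed sort values mapped to order_by() arguments (annotation names are hypothetical).
SORT_OPTIONS = {
    "date_ascending": "first_session",
    "date_descending": "-first_session",
    "name_ascending": "title",
    "name_descending": "-title",
}


def parse_date(value):
    """Parse dates written like 10.10.12 (assumed to mean day.month.year)."""
    return datetime.strptime(value, "%d.%m.%y").date()


# GET parameter -> (queryset lookup, converter). Names other than f_min_date are made up.
FILTER_OPTIONS = {
    "f_min_date": ("first_session__gte", parse_date),
    "f_max_date": ("last_session__lte", parse_date),
    "f_min_rating": ("avg_rating__gte", float),
    "f_min_signups": ("num_signups__gte", int),
}


def build_query_args(query_string):
    """Return (filters, ordering) for use with .filter(**filters).order_by(ordering)."""
    params = {key: values[-1] for key, values in parse_qs(query_string).items()}
    filters = {}
    for name, (lookup, convert) in FILTER_OPTIONS.items():
        if name in params:
            try:
                filters[lookup] = convert(params[name])
            except ValueError:
                pass  # ignore malformed values instead of failing the request
    # Unknown or missing sort values fall back to sorting by title.
    ordering = SORT_OPTIONS.get(params.get("sort"), "title")
    return filters, ordering


filters, ordering = build_query_args(
    "sort=date_ascending&f_min_date=10.10.12&f_min_signups=5"
)
print(filters, ordering)
```

Inside a view, request.GET would take the place of parse_qs, and the result would be applied to a queryset that already carries the matching annotations. Whitelisting the sort values keeps arbitrary user input out of order_by().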
The Q is used to allow you to make logical AND and logical OR in the queries.
I recommend you take a look at complex lookups: https://docs.djangoproject.com/en/1.5/topics/db/queries/#complex-lookups-with-q-objects
The following query might not work in your case (what does the teacher model look like?), but I hope it serves as an indication of how to use the complex lookup.
from django.db.models import Q
Course.objects.filter(Q(session__date__range=(start,end)) &
Q(teacher__rating__gt=3))
Unless absolutely necessary I'd indeed steer away from denormalization.
Your sort question wasn't entirely clear to me. Would you like to display Courses, filtered by date_from, and sort it by Date, Name?