My company has a pretty complicated web of Django models that I don't deal too much with. But sometimes I need to do queries on it. One that I'm doing right now is taking an inconveniently long time. So because I'm not the best at understanding how to use annotations effectively and I don't really understand subqueries at all (and probably some other key Django stuff) I was hoping someone here could help figure out how to do a better job at getting this result quicker.
Here's a facsimile of the relevant models in our database.
class Company(models.Model):
    name = models.CharField(max_length=255)

    @property
    def active_humans(self):
        # Cache the queryset on the instance so repeated access reuses it.
        if hasattr(self, '_active_humans'):
            return self._active_humans
        else:
            self._active_humans = Human.objects.filter(active=True, departments__company=self).distinct()
            return self._active_humans

class Department(models.Model):
    name = models.CharField(max_length=225)
    company = models.ForeignKey(
        'muh_project.Company',
        related_name="departments",
        on_delete=models.PROTECT
    )
    humans = models.ManyToManyField('muh_project.Human', through='muh_project.Job', related_name='departments')

class Job(models.Model):
    name = models.CharField(max_length=225)
    department = models.ForeignKey(
        'muh_project.Department',
        on_delete=models.PROTECT
    )
    human = models.ForeignKey(
        'muh_project.Human',
        on_delete=models.PROTECT
    )

class Human(models.Model):
    active = models.BooleanField(default=True)

    @property
    def fixed_happy_dogs(self):
        # "dogs" is the related_name declared on Dog.human.
        return self.dogs.filter(is_neutered_spayed=True, disposition="happy")

class Dog(models.Model):
    is_neutered_spayed = models.BooleanField(default=True)
    disposition = models.CharField(max_length=225)
    age = models.IntegerField()
    human = models.ForeignKey(
        'muh_project.Human',
        related_name="dogs",
        on_delete=models.PROTECT
    )
    human_job = models.ForeignKey(
        'muh_project.Job',
        blank=True,
        null=True,
        on_delete=models.PROTECT
    )
What I'm trying to do (in the language of this silly toy example) is to get the number of humans with at least one of a certain type of dog for each of some companies. So what I'm doing is running this.
rows = []
company_type = "Tech"
fixed_happy_dogs = Dog.objects.filter(is_neutered_spayed=True, disposition="happy")
old_dogs = fixed_happy_dogs.filter(age__gte=7)
companies = Company.objects.filter(name__icontains=company_type)
for company in companies.order_by('id'):
    humans = company.active_humans
    num_humans = humans.distinct().count()
    humans_with_fixed_happy_dogs = humans.filter(dogs__in=fixed_happy_dogs).distinct().count()
    humans_with_old_dogs = humans.filter(dogs__in=old_dogs).distinct().count()
    rows.append(f'{company.id};{num_humans};{humans_with_fixed_happy_dogs};{humans_with_old_dogs}')
It generally takes anywhere from 45 - 120 seconds to run depending on how many companies I run it over. I'd like to cut that down. I do need the final result as a list of strings as shown.
One low-hanging fruit would be to add a database index to the column Dog.disposition, since it's used in the .filter() call and the database probably has to do a sequential scan over the table on every pass through the for loop.
For this task specifically I'd recommend using Django Debug Toolbar, which shows every SQL query that is run. That helps you pinpoint the slowest ones, and you can then use EXPLAIN to see what goes wrong there.
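To make the index suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the poster's actual database, so the plan wording will differ from PostgreSQL's EXPLAIN output. The table layout and the index name idx_dog_disposition are assumptions for illustration only. In Django the index would typically be declared with db_index=True on the field, and recent Django versions also expose QuerySet.explain().

```python
import sqlite3

# In-memory stand-in for the Dog table; column names mirror the toy model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dog (id INTEGER PRIMARY KEY, is_neutered_spayed INTEGER, "
    "disposition TEXT, age INTEGER, human_id INTEGER)"
)


def query_plan(connection):
    """Return the planner's description of a filter on dog.disposition."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM dog WHERE disposition = 'happy'"
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # full table scan: no usable index yet
conn.execute("CREATE INDEX idx_dog_disposition ON dog (disposition)")  # hypothetical index name
plan_after = query_plan(conn)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Comparing the two plans shows whether the filter can use the index. The same before/after check applies to whichever query the toolbar reports as slowest.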
Related
I'm struggling with annotations and haven't found examples that help me understand. Here are relevant parts of my models:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')
    team1_points = models.IntegerField(null=True)
    team2_points = models.IntegerField(null=True)
What I'd like to end up with is an annotation on the Teams objects that would give me each team's total points. Sometimes, a team is match.team1 (so their points are in match.team1_points) and sometimes they are match.team2, with their points stored in match.team2_points.
This is as close as I've gotten, in maybe a hundred or so tries:
teams = Team.objects.annotate(
    total_points=Value(
        (Match.objects.filter(team1=21).aggregate(total=Sum(F('team1_points'))))['total'] or 0 +
        (Match.objects.filter(team2=21).aggregate(total=Sum(F('team2_points'))))['total'] or 0,
        output_field=IntegerField(),
    )
)
This works great, but (of course) annotates the total_points for the team with pk=21 to every team in the queryset. If there's a better approach for all this, I'd love to see it, but short of that, if you can show me how to turn those '21' values into a reference to the outer team's pk, I think that will work?
EDIT: I ended up using a combination of elyas' answers and annotating a raw SQL statement to solve my issues. I was not able to keep normal annotations from dropping non-unique scores from the queryset, but raw SQL seems to work.
Here's that raw annotation:
teams = Team.objects.raw('select id, sum(points) as total_points from (select team1_id as id, team1_points as points from leagueman_match union all select team2_id as id, team2_points as points from leagueman_match) group by id order by total_points desc;')
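The shape of that raw query can be tried out in isolation. The following is a small sketch using Python's sqlite3 module with made-up match rows. Only the table and column names come from the query above; the data and the per_side alias are assumptions (some databases require an alias on a derived table). Two of the sample matches have the same score for team 1, to show that UNION ALL keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leagueman_match (
    id INTEGER PRIMARY KEY,
    team1_id INTEGER, team2_id INTEGER,
    team1_points INTEGER, team2_points INTEGER
);
-- Made-up sample data: team 1 scores 3 points in two different matches.
INSERT INTO leagueman_match (team1_id, team2_id, team1_points, team2_points) VALUES
    (1, 2, 3, 1),
    (2, 1, 3, 2),
    (1, 3, 3, 0);
""")

# Stack both sides of every match into (id, points) rows, then sum per team.
totals = conn.execute("""
    SELECT id, SUM(points) AS total_points
    FROM (
        SELECT team1_id AS id, team1_points AS points FROM leagueman_match
        UNION ALL
        SELECT team2_id AS id, team2_points AS points FROM leagueman_match
    ) AS per_side
    GROUP BY id
    ORDER BY total_points DESC
""").fetchall()

print(totals)
```

Because each match contributes exactly one row per side before the GROUP BY, no score is counted twice and none is dropped.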
An alternative solution to annotating the QuerySet might be to make total_points a @property of the Team model (depending on the use case):
from django.db.models import Case, Q, Sum, When

class Team(models.Model):
    team_name = models.CharField(max_length=50)

    @property
    def total_points(self):
        # Pick the points column that belongs to this team in each match, then sum.
        return Match.objects.filter(Q(team1=self.id) | Q(team2=self.id)).aggregate(
            total_points=Sum(Case(
                When(team1=self.id, then='team1_points'),
                When(team2=self.id, then='team2_points')
            ))
        )['total_points']
The disadvantage is that it can't be used in subsequent QuerySet operations e.g. .values(), .order_by().
Django also has a @cached_property decorator (django.utils.functional.cached_property), which caches the value the first time the attribute is accessed.
Other solutions tried
Originally I thought you could leverage the reverse relations match_team1 and match_team2 from the Team model to generate a simple annotation:
teams = Team.objects.annotate(
    total_points=(
        Sum('match_team1__team1_points', distinct=True)
        + Sum('match_team2__team2_points', distinct=True)
    )
)
Unfortunately this solution encounters difficulties in handling duplicates. The distinct=True argument eliminates the issue of points from the same match being summed more than once. But it introduces a different issue where different matches with the same points scored will be excluded.
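The duplicate problem can be reproduced outside Django. This sketch uses sqlite3 and joins a team to its matches through both foreign keys, which is roughly the shape of the SQL such an annotation leads to (an assumption, not the exact SQL Django emits). The team table and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (id INTEGER PRIMARY KEY);
CREATE TABLE leagueman_match (
    id INTEGER PRIMARY KEY,
    team1_id INTEGER, team2_id INTEGER,
    team1_points INTEGER, team2_points INTEGER
);
INSERT INTO team (id) VALUES (1), (2), (3);
-- Team 1 scores 3 and 3 as team1, then 1 and 2 as team2: 9 points in total.
INSERT INTO leagueman_match (team1_id, team2_id, team1_points, team2_points) VALUES
    (1, 2, 3, 0),
    (1, 3, 3, 0),
    (2, 1, 0, 1),
    (3, 1, 0, 2);
""")

# Joining through both relations multiplies rows: 2 home x 2 away = 4 rows.
naive, with_distinct = conn.execute("""
    SELECT SUM(h.team1_points) + SUM(a.team2_points),
           SUM(DISTINCT h.team1_points) + SUM(DISTINCT a.team2_points)
    FROM team AS t
    LEFT JOIN leagueman_match AS h ON h.team1_id = t.id
    LEFT JOIN leagueman_match AS a ON a.team2_id = t.id
    WHERE t.id = 1
""").fetchone()

# Reference value: sum each side separately, with no join between them.
true_total = conn.execute("""
    SELECT (SELECT COALESCE(SUM(team1_points), 0) FROM leagueman_match WHERE team1_id = 1)
         + (SELECT COALESCE(SUM(team2_points), 0) FROM leagueman_match WHERE team2_id = 1)
""").fetchone()[0]

print(naive, with_distinct, true_total)
```

The plain sum over-counts because of the multiplied rows, while the DISTINCT sum under-counts because the two 3-point matches collapse into one value.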
I would advise refactoring your data model. Even if you find a solution for this specific problem, you may want to think a little ahead.
This is a solution:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')

class Points(models.Model):
    match = models.ForeignKey(
        Match, on_delete=models.CASCADE, related_name='match')
    team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='team')
    points = models.IntegerField(null=True)
With this you can easily sum up the points of any team, and also filter them by match.
I have some models in Django:
# models.py, simplified here
class Category(models.Model):
    """The category an inventory item belongs to. Examples: car, truck, airplane"""
    name = models.CharField(max_length=255)

class UserInterestCategory(models.Model):
    """
    How interested is a user in a given category. `interest` can be set by any method, maybe a neural network or something like that
    """
    user = models.ForeignKey(User, on_delete=models.CASCADE)  # user is the stock Django user
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    interest = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])

class Item(models.Model):
    """This is a product that we have in stock, which we are trying to get a User to buy"""
    model_number = models.CharField(max_length=40, default="New inventory item")
    product_category = models.ForeignKey(Category, null=True, blank=True, on_delete=models.SET_NULL, verbose_name="Category")
I have a list view showing items, and I'm trying to sort by user_interest_category for the currently logged in user.
I have tried a couple different querysets and I'm not thrilled with them:
primary_queryset = Item.objects.all()

# this one works, and it's fast, but only finds items the user ALREADY has an interest in --
primary_queryset = primary_queryset.filter(product_category__userinterestcategory__user=self.request.user).annotate(
    recommended=F('product_category__userinterestcategory__interest')
)

# this one works great but the baby jesus weeps at its slowness
# probably because we are iterating through every user, item, and userinterestcategory in the db
primary_queryset = primary_queryset.annotate(
    recommended=Case(
        When(product_category__userinterestcategory__user=self.request.user, then=F('product_category__userinterestcategory__interest')),
        default=Value(0),
        output_field=IntegerField(),
    )
)

# this one works, but it's still a bit slow -- 2-3 seconds per query:
interest = Subquery(UserInterestCategory.objects.filter(category=OuterRef('product_category'), user=self.request.user).values('interest'))
primary_queryset = primary_queryset.annotate(recommended=interest)
The third method is workable, but it doesn't seem like the most efficient way to do things. Isn't there a better method than this?
I have user profiles that are each assigned a manager. I thought using recursion would be a good way to query every employee at every level under a particular manager. The goal is, if the CEO were to sign in, he should be able to query everyone at the company, but if I sign in I can only see people in my immediate team and the people below them, etc., until you get to the low-level employees.
However when I run the following:
def team_training_list(request):
    # pulls all training documents from training document model
    user = request.user
    manager_direct_team = Profile.objects.filter(manager=user)
    query = Profile.objects.filter(first_name='fake')
    trickle_team = manager_loop(manager_direct_team, query)
    # manager_trickle_team = manager_direct_team | trickle_team
    print(trickle_team)

def manager_loop(list, query):
    for member in list:
        user_instance = User.objects.get(username=member)
        has_team = Profile.objects.filter(manager=user_instance)
        if has_team:
            query = query | has_team
            manager_loop(has_team, query)
        else:
            continue
    return query
It only returns the last query that was run instead of the combined queryset that I am trying to grow. I've tried placing 'return' before 'manager_loop(has_team, query)' in order to save the values, but that also kills the loop at the first non-manager employee instead of continuing to the next employee.
I'm new to Django, so if there is a better way than recursion to pull the information that I need, I'd appreciate suggestions on that too.
EDIT:
As requested, here is the profile model.
class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    first_name = models.CharField(max_length=30, blank=False)
    last_name = models.CharField(max_length=30, blank=False)
    email = models.EmailField(blank=True, help_text='Optional')
    receive_email_notifications = models.BooleanField(default=False)
    mobile_number = models.CharField(
        max_length=15,
        blank=True,
        help_text='Optional'
    )
    carrier_options = (
        (None, ''),
        ('@txt.att.net', 'AT&T'),
        ('@messaging.sprintpcs.com', 'Sprint'),
        ('@tmomail.net', 'T-Mobile'),
        ('@vtext.com', 'Verizon'),
    )
    mobile_carrier = models.CharField(max_length=25, choices=carrier_options, blank=True,
                                      help_text='Optional')
    receive_sms_notifications = models.BooleanField(default=False)
    job_title = models.ForeignKey(JobTitle, unique=False, null=True)
    manager = models.ForeignKey(User, unique=False, blank=True, related_name='+', null=True)
Ok, so it's a hierarchical model.
The problem with your current approach is this line:
query = query | has_team
This rebinds the local name query to a new queryset, but it does not rebind the name in the caller, so the result of the recursive call is lost. (Combining two querysets of the same model with | is fine in itself.) You'd also need something like:
query = manager_loop(has_team, query)
to propagate the changes via the returned object.
That said, while Django doesn't have built-in support for recursive queries, there are some third-party packages that do. Older answers (e.g. "Django self-recursive foreignkey filter query for all childs" and "Creating efficient database queries for hierarchical models (django)") recommend django-mptt. Your tag mentions postgres, so this post might be relevant:
https://two-wrongs.com/fast-sql-for-inheritance-in-a-django-hierarchy
If you don't use a third-party approach, it should be possible to clean up the evolution of the queryset - cast it to a set and use update or something, since you're accumulating profiles. But the key error is not using the returned modified object.
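As a rough illustration of the "use the returned object" point, here is a plain-Python sketch that accumulates into a set, as suggested above. It stands in for the queryset version: the reports_by_manager dictionary and the names in it are hypothetical, and the sketch assumes the hierarchy has no cycles.

```python
def collect_team(manager, reports_by_manager):
    """Return everyone below `manager`, at any depth, as a set."""
    team = set()
    for member in reports_by_manager.get(manager, []):
        team.add(member)
        # Keep what the recursive call returns; dropping this value is
        # the bug described above.
        team.update(collect_team(member, reports_by_manager))
    return team


# Hypothetical hierarchy: manager -> direct reports.
reports_by_manager = {
    "ceo": ["ann", "bob"],
    "ann": ["cat"],
    "cat": ["dan"],
}

print(sorted(collect_team("ceo", reports_by_manager)))
print(sorted(collect_team("ann", reports_by_manager)))
```

With Django models, the set could hold primary keys, and a single Profile.objects.filter(pk__in=...) at the end would turn it back into a queryset.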
I'm trying to execute a complex query using Django's ORM and I can't seem to find a nice solution. Namely, I have a web application where users answer questions based on a video. I need to display all the videos for a specified user that have at least one question unanswered (not responded to). I haven't been able to figure it out yet with the ORM ... I know that I could probably write a SQL query for this and just execute it with the raw SQL function, but I really would prefer to stay in the ORM.
Models: Video, Question, Response and default User.
Relationships:
Question has a many to many relation towards video
Response has a foreign key each to Question, Video and User
What the query needs to do:
Display all the videos for a specified user that have at least one video question unanswered (not responded to).
Any help would be awesome! I've been struggling with this for way too long.
EDIT: The models I have are (simplified):
class Video(TimeStampedModel):
    title = models.CharField(max_length=200)
    source_id = models.CharField(max_length=20)

class Question(TimeStampedModel):
    DEMOGRAPHIC_QUESTION = 'd'
    QUESTION_TYPES = (
        (VIDEO_QUESTION, 'Video related question'),
        (DEMOGRAPHIC_QUESTION, 'Demographic question'),
    )
    MULTIPLE_CHOICE = 0
    PLAIN_TEXT = 1
    RESPONSE_TYPE = (
        (MULTIPLE_CHOICE, 'Multiple Choice'),
        (PLAIN_TEXT, 'Plain Text')
    )
    type = models.CharField(max_length=1, choices=QUESTION_TYPES)
    videos = models.ManyToManyField(Video, null=True, blank=True)
    title = models.CharField(max_length=500)
    priority = models.IntegerField()

class Response(TimeStampedModel):
    user = models.ForeignKey(User)
    question = models.ForeignKey(Question)
    video = models.ForeignKey(Video, blank=True, null=True)
    choice = models.ForeignKey(Choice, null=True, blank=True, related_name='selected_choice')
    text = models.CharField(max_length=500, blank=True)

# Not relevant but included for clarity
class Choice(TimeStampedModel):
    question = models.ForeignKey(Question)
    text_response = models.CharField(max_length=500)
    image = models.FileField(upload_to=_get_choice_img_path, blank=True)
    value = models.IntegerField(default=0)
    external_id = models.IntegerField(default=0)
Judging by how your models look, I think something close to the following should work.
q = Response.objects.select_related().filter(user__name=user).filter(response__choice=None)
videos = Video.objects.filter(id__in=q.extra(where=["{}>=1".format(q.count())]).values('video_id'))
Hope you understand what I did there. The first line basically tries to take a natural join of the model objects. The second line is using the query generated in the first line to get the count and checks if it is at least 1, and gets the Videos that belong to that query.
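For comparison, the "at least one unanswered question" condition can be written as a NOT EXISTS check. The sketch below uses sqlite3 with hypothetical table names (video, question_videos for the many-to-many link, response) and made-up rows. It is not the poster's schema and not a drop-in ORM answer, though Django's Exists and OuterRef expressions can express the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (id INTEGER PRIMARY KEY);
CREATE TABLE question_videos (question_id INTEGER, video_id INTEGER);
CREATE TABLE response (user_id INTEGER, question_id INTEGER, video_id INTEGER);
INSERT INTO video (id) VALUES (1), (2);
-- Video 1 has questions 10 and 11; video 2 has question 10 only.
INSERT INTO question_videos VALUES (10, 1), (11, 1), (10, 2);
-- User 7 answered question 10 on both videos, but not question 11.
INSERT INTO response VALUES (7, 10, 1), (7, 10, 2);
""")


def videos_with_unanswered_questions(connection, user_id):
    """Ids of videos that have a linked question this user has not answered."""
    rows = connection.execute("""
        SELECT DISTINCT v.id
        FROM video AS v
        JOIN question_videos AS qv ON qv.video_id = v.id
        WHERE NOT EXISTS (
            SELECT 1 FROM response AS r
            WHERE r.user_id = ?
              AND r.question_id = qv.question_id
              AND r.video_id = v.id
        )
        ORDER BY v.id
    """, (user_id,)).fetchall()
    return [row[0] for row in rows]


print(videos_with_unanswered_questions(conn, 7))
print(videos_with_unanswered_questions(conn, 8))
```

A user with no responses at all gets every video that has at least one question attached.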
I'd like to create a filter-sort mixin for following values and models:
class Course(models.Model):
    title = models.CharField(max_length=70)
    description = models.TextField()
    max_students = models.IntegerField()
    min_students = models.IntegerField()
    is_live = models.BooleanField(default=False)
    is_deleted = models.BooleanField(default=False)
    teacher = models.ForeignKey(User)

class Session(models.Model):
    course = models.ForeignKey(Course)
    title = models.CharField(max_length=50)
    description = models.TextField(max_length=1000, default='')
    date_from = models.DateField()
    date_to = models.DateField()
    time_from = models.TimeField()
    time_to = models.TimeField()

class CourseSignup(models.Model):
    course = models.ForeignKey(Course)
    student = models.ForeignKey(User)
    enrollment_date = models.DateTimeField(auto_now=True)

class TeacherRating(models.Model):
    course = models.ForeignKey(Course)
    teacher = models.ForeignKey(User)
    rated_by = models.ForeignKey(User)
    rating = models.IntegerField(default=0)
    comment = models.CharField(max_length=300, default='')
A Course could be 'Discrete mathematics 1'
Session are individual classes related to a Course (e.g. 1. Introduction, 2. Chapter I, 3 Final Exam etc.) combined with a date/time
CourseSignup is the "enrollment" of a student
TeacherRating keeps track of a student's rating for a teacher (after course completion)
I'd like to implement following functions
Sort (asc, desc) by Date (earliest Session.date_from), Course.Name
Filter by: Date (earliest Session.date_from and last Session.date_to), Average TeacherRating (e.g. minimum value = 3), CourseSignups (e.g. minimum 5 users signed up)
(these options are passed via a GET parameters, e.g. sort=date_ascending&f_min_date=10.10.12&...)
How would you create a function for that?
I've tried using
denormalization (I just added a field to Course for each required filter/sort criterion and updated it whenever changes happened), but I'm not very satisfied with it (e.g. it needs lots of updates after each TeacherRating).
ForeignKey Queries (Course.objects.filter(session__date_from=xxx)), but I might run into performance issues later on..
Thanks for any tips!
In addition to using the Q object for advanced AND/OR queries, get familiar with reverse lookups.
Django creates reverse lookups for foreign key relationships. In your case you can get all Sessions belonging to a Course in one of two ways, each of which can be filtered.
c = Course.objects.get(id=1)
sessions = Session.objects.filter(course__id=c.id) # First way, forward lookup.
sessions = c.session_set.all() # Second way using the reverse lookup session_set added to Course object.
You'll also want to familiarize yourself with annotate() and aggregate(). These allow you to calculate fields (for example with Count, Sum, Avg, Min, Max, etc.) and then order or filter on the results.
from datetime import date, timedelta
from django.db.models import Avg, Count, Min

# Aggregates follow the reverse relation by its lowercased model name.
courses_with_at_least_five_students = Course.objects.annotate(
    num_students=Count('coursesignup')
).order_by(
    '-num_students'
).filter(
    num_students__gte=5
)
course_earliest_session_within_last_240_days_with_avg_teacher_rating_below_4 = Course.objects.annotate(
    min_session_date_from=Min('session__date_from')
).annotate(
    avg_teacher_rating=Avg('teacherrating__rating')
).order_by(
    'min_session_date_from',
    '-avg_teacher_rating'
).filter(
    min_session_date_from__gte=date.today() - timedelta(days=240),
    avg_teacher_rating__lte=4
)
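One possible way to drive those annotations from GET parameters is to translate the parameters into filter keyword arguments and an order_by field first. The sketch below is plain Python and only builds the arguments. The parameter names f_min_rating and f_min_signups, the name_ascending/name_descending sort keys and the default ordering are assumptions; only sort=date_ascending and f_min_date appear in the question.

```python
from datetime import datetime

# Maps the `sort` GET parameter to an order_by() field (names partly assumed).
SORT_OPTIONS = {
    "date_ascending": "min_session_date_from",
    "date_descending": "-min_session_date_from",
    "name_ascending": "title",
    "name_descending": "-title",
}


def build_query_args(params):
    """Translate GET-style parameters into (filter kwargs, order_by field)."""
    filters = {}
    if "f_min_date" in params:
        # Dates arrive as dd.mm.yy, e.g. 10.10.12
        parsed = datetime.strptime(params["f_min_date"], "%d.%m.%y").date()
        filters["min_session_date_from__gte"] = parsed
    if "f_min_rating" in params:
        filters["avg_teacher_rating__gte"] = float(params["f_min_rating"])
    if "f_min_signups" in params:
        filters["num_students__gte"] = int(params["f_min_signups"])
    # Unknown or missing sort keys fall back to sorting by title.
    order_by = SORT_OPTIONS.get(params.get("sort"), "title")
    return filters, order_by


filters, order_by = build_query_args(
    {"sort": "date_ascending", "f_min_date": "10.10.12", "f_min_signups": "5"}
)
print(filters, order_by)
```

The result could then be applied to an annotated queryset as .filter(**filters).order_by(order_by), provided the queryset carries the annotations the keys refer to.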
The Q is used to allow you to make logical AND and logical OR in the queries.
I recommend you take a look at complex lookups: https://docs.djangoproject.com/en/1.5/topics/db/queries/#complex-lookups-with-q-objects
The following query might not work in your case (what does the teacher model look like?), but I hope it serves as an indication of how to use the complex lookup.
from django.db.models import Q
Course.objects.filter(Q(session__date__range=(start,end)) &
Q(teacher__rating__gt=3))
Unless absolutely necessary I'd indeed steer away from denormalization.
Your sort question wasn't entirely clear to me. Would you like to display Courses, filtered by date_from, and sort it by Date, Name?