I have two Models in Django. The first has the hierarchy of what job functions (positions) report to which other positions, and the second is people and what job function they hold.
class PositionHierarchy(models.Model):
    pcn = models.CharField(max_length=50)
    title = models.CharField(max_length=100)
    level = models.CharField(max_length=25)
    report_to = models.ForeignKey('PositionHierarchy', null=True)

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    ...
    position = models.ForeignKey(PositionHierarchy)
When I have a Person record and I want to find the person's manager, I have to do
manager = person.position.report_to.person_set.all()[0]
# Can't use .first() because we haven't upgraded to 1.6 yet
If I'm getting people with a QuerySet, I can join with position and report_to (and avoid a second trip to the database) using Person.objects.select_related('position', 'position__report_to').filter(...). But is there any way to avoid making another trip to the database to get the person_set? I tried adding 'position__report_to__person_set' or just 'position__report_to__person' to the select_related, but that doesn't seem to change the query. Is this what prefetch_related is for?
I'd like to make a custom manager so that when I do a query to get Person records, I also get their PositionHierarchy and their manager's Person record without more round trips to the database. This is what I have so far:
class PersonWithManagerManager(models.Manager):
    def get_query_set(self):
        qs = super(PersonWithManagerManager, self).get_query_set()
        return qs.select_related(
            'position',
            'position__report_to',
        ).prefetch_related(
        )
Yes, that is what prefetch_related() is for. It will require an additional query, but the idea is that it will get all of the related information at once, instead of once per Person.
In your case:
qs.select_related('position__report_to') \
    .prefetch_related('position__report_to__person_set')
should require two queries, regardless of the number of Persons in the original query set.
Compare this example from the documentation:
>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')
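As a rough illustration of why the count stays at two queries, here is a sketch in plain sqlite3 rather than Django. The table names, column names, and sample rows are assumptions made up for this example, and the SQL only approximates the shape of what the ORM would send: one JOIN query for the people and their positions, then one IN query for everyone holding a position that is reported to.

```python
import sqlite3

# Hypothetical schema and data, loosely mirroring the two models above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE position_hierarchy (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        report_to_id INTEGER REFERENCES position_hierarchy(id)
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        position_id INTEGER NOT NULL REFERENCES position_hierarchy(id)
    );
    INSERT INTO position_hierarchy VALUES
        (1, 'Director', NULL), (2, 'Engineer', 1), (3, 'Analyst', 1);
    INSERT INTO person VALUES (1, 'Dana', 1), (2, 'Eli', 2), (3, 'Ana', 3);
""")

queries = []  # every SELECT sent, so the total can be checked afterwards


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: each person joined to their position and the position it reports to.
people = run("""
    SELECT person.id, person.first_name, boss.id
    FROM person
    JOIN position_hierarchy AS pos ON pos.id = person.position_id
    LEFT JOIN position_hierarchy AS boss ON boss.id = pos.report_to_id
    ORDER BY person.id
""")

# Query 2: a single IN query for everyone holding one of those boss positions.
boss_ids = sorted({boss_id for _, _, boss_id in people if boss_id is not None})
placeholders = ", ".join("?" for _ in boss_ids)
holders = run(
    f"SELECT position_id, first_name FROM person "
    f"WHERE position_id IN ({placeholders}) ORDER BY id",
    boss_ids,
)

# The two result sets are stitched together in Python, not in the database.
holders_by_position = {}
for position_id, first_name in holders:
    holders_by_position.setdefault(position_id, []).append(first_name)

managers = {
    name: holders_by_position.get(boss_id, []) for _, name, boss_id in people
}
```

Adding more people to the first result only lengthens the IN list; it does not add queries.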
I have a model in Django in which a field has an FK relationship with the Teacher model. I have come across select_related in Django and want to use it in my view. However, I am not sure whether I should use it in my query or not.
My models:
class Teacher(models.Model):
    name = models.CharField(max_length=255, default="", blank=True)
    address = models.CharField(max_length=255, default="", blank=True)
    college_name = models.CharField(max_length=255, default="", blank=True)

class OnlineClass(models.Model):
    teacher = models.ForeignKey(Teacher, on_delete=models.CASCADE)
My view:
def get(self, request, *args, **kwargs):
    teacher = self.request.user.teacher
    classes = OnlineClass.objects.filter(teacher=teacher)  # confusion is here
    serializer_class = self.get_serializer_class()
    serializer = serializer_class(classes, many=True)
    return Response(serializer.data, status=status.HTTP_200_OK)
I have marked the line in question with a comment. I want to list all the classes of that teacher, and here I have used filter. Can we use select_related here? What I understood is that if I also want to show other fields of the Teacher model, e.g. name or college_name, then I have to use it; otherwise the way I have done it is correct. Also, select_related is only used for GET APIs, not for POST APIs. Is that correct?
First, the easiest way to get all classes per teacher is by using the related_name attribute (https://docs.djangoproject.com/en/3.2/ref/models/fields/#django.db.models.ForeignKey.related_name).
class OnlineClass(models.Model):
    teacher = models.ForeignKey(
        Teacher,
        on_delete=models.CASCADE,
        related_name='classes'
    )

# All classes of a teacher
teacher.classes.all()
When select_related is used, SQL joins are added to the query that Django builds internally. This reduces the number of queries the database engine has to serve, so the data arrives faster. And yes, it only applies to reading.
for obj in OnlineClass.objects.all():
    # This hits the database on every iteration to get the teacher data,
    # with a new query like: select * from teacher_table where id = ...
    print(obj.teacher)

for obj in OnlineClass.objects.select_related('teacher').all():
    # This does not hit the database again.
    # The Django ORM has already fetched the OnlineClass and
    # Teacher data together in a single SQL query with a join.
    print(obj.teacher)
I think that, in your example, with only one teacher, using select_related or not doesn't make a big difference.
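To make the difference in query counts concrete, here is a small sketch in plain sqlite3, not Django. The table names, sample rows, and the run helper are assumptions for illustration only. It compares one lookup per row, as in the loop without select_related, against a single JOIN.

```python
import sqlite3

# Hypothetical tables standing in for Teacher and OnlineClass.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE online_class (
        id INTEGER PRIMARY KEY,
        teacher_id INTEGER NOT NULL REFERENCES teacher(id)
    );
    INSERT INTO teacher VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO online_class VALUES (1, 1), (2, 1), (3, 2);
""")


def run(log, sql, params=()):
    """Execute a query and record it in log so queries can be counted."""
    log.append(sql)
    return conn.execute(sql, params).fetchall()


def lazy_lookup(log):
    # One query for the classes, then one extra query per class for its teacher.
    names = []
    for (teacher_id,) in run(log, "SELECT teacher_id FROM online_class ORDER BY id"):
        rows = run(log, "SELECT name FROM teacher WHERE id = ?", (teacher_id,))
        names.append(rows[0][0])
    return names


def joined_lookup(log):
    # A single query: the teacher columns arrive together with each class row.
    rows = run(log, """
        SELECT teacher.name
        FROM online_class
        JOIN teacher ON teacher.id = online_class.teacher_id
        ORDER BY online_class.id
    """)
    return [name for (name,) in rows]


lazy_log, joined_log = [], []
lazy_names = lazy_lookup(lazy_log)
joined_names = joined_lookup(joined_log)
```

Both functions return the same names; only the number of round trips differs, and the gap grows with the number of class rows.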
select_related is used to select additional data from related objects when the query is executed. It results in a more complex query. But it boosts performance if you have to access related data, since no additional database queries will be required.
See the Django documentation on select_related.
In your code it would be possible to use select_related, but it would be inefficient, because you're not accessing related objects of the queried classes. So using select_related would result in a more complex query without any advantage.
If you wanted to use select_related, the syntax would be classes = OnlineClass.objects.select_related('teacher').filter(teacher=teacher)
I am building a queryset of a model, and want to put an annotation on a related model that I am select_related'ing into the queryset. Hypothetically:
class Book(models.Model):
    # 'Author' is quoted because the class is defined further down
    author = models.ForeignKey('Author', on_delete=models.CASCADE)

class Author(models.Model):
    name = models.TextField()
I am selecting Book.objects.all().select_related(), and want to annotate the authors with awesome=True. I could resolve the queryset and alter the objects myself, but is there a cleaner ORM-oriented way to do it? Inverting the operation to select the Authors instead is undesirable.
For this, prefetch_related may be a more suitable and cleaner solution.
from django.db import models

q = Book.objects.all().prefetch_related(
    models.Prefetch(
        "author",
        queryset=Author.objects.all().annotate(
            awesome=models.Value(
                True,
                output_field=models.BooleanField()
            )
        )
    )
)
This results in two queries to the database instead of the single one that select_related would issue. The second query runs when the queryset is evaluated, so it is still only two queries in total, not an additional one for every book when you access .author, unless a different queryset (e.g. one with filters) is used when accessing the related objects later.
Below is my post model.
class Post(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    content = models.TextField()
    datetime = models.DateTimeField(auto_now_add=True)
    votes = models.ManyToManyField(settings.AUTH_USER_MODEL,
                                   related_name="post_votes", default=None, blank=True)
    tags = models.ManyToManyField(Tag, default=None, blank=True)
I want to filter posts which contain a certain query in their title, content or as the name of one of their tags. To do this I've tried:
query_set = Post.objects.filter(Q(content__icontains=query) |
                                Q(tags__name__icontains=query) |
                                Q(title__icontains=query))
But this often returns QuerySets with duplicate results. I have tried using the distinct method to solve this, but that results in incorrect ordering when I sort the posts later on by the number of votes they have:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
If anybody could help me I would be very grateful.
Jack
The duplicates originate from the fact that you filter on related objects. This means that Django will perform a query with a JOIN in it. You can of course perform a uniqueness filter at the Django/Python level, but that is inefficient in two ways: first, more data is transmitted from the database to the Django server, and second, Python does not handle large collections very well.
Furthermore the line:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
is basically a no-op. QuerySets are immutable: here you did not sort the QuerySet on votes, you constructed a new one that will do that, but you immediately throw it away, since you do nothing with the result.
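The effect of discarding the returned object can be shown with a toy class. ToyQuery below is a hypothetical stand-in written for this example; it is not Django code and only mimics the one property that matters here, namely that order_by returns a new object instead of changing the existing one.

```python
class ToyQuery:
    """Hypothetical stand-in for a QuerySet; not part of Django."""

    def __init__(self, ordering=()):
        self.ordering = tuple(ordering)

    def order_by(self, *fields):
        # Build and return a new object; self is left untouched.
        return ToyQuery(fields)


query_set = ToyQuery()

# The result is thrown away, so query_set keeps its old (empty) ordering.
query_set.order_by("-vote_count", "-datetime")
ordering_when_discarded = query_set.ordering

# Assigning the result is what actually keeps the ordering.
query_set = query_set.order_by("-vote_count", "-datetime")
ordering_when_kept = query_set.ordering
```

The same rule applies to annotate, filter, and distinct: the return value has to be assigned or chained to have any effect.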
You can add the annotation and ordering and thus obtain distinct results later on:
query_set = Post.objects.filter(
    Q(content__icontains=query) |
    Q(tags__name__icontains=query) |
    Q(title__icontains=query)
).annotate(
    vote_count=Count('votes', distinct=True)
).order_by('-vote_count', '-datetime').distinct()
The distinct=True on the Count is necessary, since, as said before, the query acts like a JOIN, and JOINs can act like "multipliers" when counting things, since a row can occur multiple times.
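The multiplier effect can be reproduced with plain sqlite3. The schema below is a simplified assumption for this example (the tag name is stored directly in the join table, and all rows are made up), so it is not the SQL Django generates, only the same counting problem in miniature: one post with two matching tags and three votes.

```python
import sqlite3

# Simplified, hypothetical tables: one post, two matching tags, three votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE post_tags (post_id INTEGER NOT NULL, tag_name TEXT NOT NULL);
    CREATE TABLE post_votes (post_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
    INSERT INTO post VALUES (1, 'ORM tips');
    INSERT INTO post_tags VALUES (1, 'django'), (1, 'django-orm');
    INSERT INTO post_votes VALUES (1, 10), (1, 11), (1, 12);
""")

# Both joins are in the same query, so each vote row is repeated once per
# matching tag row before the COUNT is taken.
sql = """
    SELECT post.id, COUNT({expr}) AS vote_count
    FROM post
    LEFT JOIN post_tags ON post_tags.post_id = post.id
    LEFT JOIN post_votes ON post_votes.post_id = post.id
    WHERE post_tags.tag_name LIKE '%django%'
    GROUP BY post.id
"""

plain_count = conn.execute(sql.format(expr="post_votes.user_id")).fetchall()
distinct_count = conn.execute(
    sql.format(expr="DISTINCT post_votes.user_id")
).fetchall()
```

With two matching tags and three votes, the plain count sees 2 x 3 joined rows, while the DISTINCT count sees the three voters.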
Can someone give me the best approach with an example for the following...
On a page I load the 'Group' object by ID. I also want to list all contacts that belong to that group (with paging).
Because of the paging issue I was thinking of just running a second database query with...
In my view...
group = get_object_or_404(Group, pk=id)
contacts = Contact.objects.filter(group=group)
But this seems wasteful: I'm already getting the Group, so why hit the database twice?
See my model.
class GroupManager(models.Manager):
    def for_user(self, user):
        return self.get_query_set().filter(user=user,)

class Group(models.Model):
    name = models.CharField(max_length=60)
    modified = models.DateTimeField(null=True, auto_now=True,)
    #FK
    user = models.ForeignKey(User, related_name="user")
    objects = GroupManager()

    def get_absolute_url(self):
        return reverse('contacts.views.group', args=[str(self.id)])

class Contact(models.Model):
    first_name = models.CharField(max_length=60)
    last_name = models.CharField(max_length=60)
    #FK
    group = models.ForeignKey(Group)
This is what select_related is designed for:
Returns a QuerySet that will automatically “follow” foreign-key
relationships, selecting that additional related-object data when it
executes its query. This is a performance booster which results in
(sometimes much) larger queries but means later use of foreign-key
relationships won’t require database queries.
In your case it would be:
Group.objects.select_related().get(pk=id)
Now on each FK lookup, you won't hit the database again.
The next step would be to cache the results using the cache API so that you don't hit the database every time the next "page" is requested. This would be useful if your data isn't time-sensitive.
I'm making a little vocabulary-quiz app, and the basic model for a word is this:
class Word(models.Model):
    id = models.AutoField(primary_key=True)
    word = models.CharField(max_length=80)
    id_image = models.ForeignKey(Image)

    def __unicode__(self):
        return self.word

    class Meta:
        db_table = u'word'
The model for words I'm currently quizzing myself on is this:
class WordToWorkOn(models.Model):
    id = models.AutoField(primary_key=True)
    id_student = models.ForeignKey(Student)
    id_word = models.ForeignKey(Word)
    level = models.IntegerField()

    def __unicode__(self):
        return u'%s %s' % (self.id_word.__unicode__(), self.id_student.__unicode__() )

    class Meta:
        db_table = u'word_to_work_on'
Where "level" indicates how well I've learned it. The set of words I've already learned has this model:
class WordLearned(models.Model):
    id = models.AutoField(primary_key=True)
    id_word = models.ForeignKey(Word, related_name='word_to_learn')
    id_student = models.ForeignKey(Student, related_name='student_learning_word')

    def __unicode__(self):
        return u'%s %s' % (self.id_word.__unicode__(), self.id_student.__unicode__() )

    class Meta:
        db_table = u'word_learned'
When a queryset on WordToWorkOn comes back with too few results (because they have been learned well enough to get moved into WordLearned and deleted from WordToWorkOn), I want to find a Word to add to it. The part I don't know a good way to do is to limit it to Words which are not already in WordLearned.
So, generally speaking, I think I want to do an .exclude() of some sort on a queryset of Words, but it needs to exclude based on membership in the WordLearned table. Is there a good way to do this? I find lots of references to joining querysets, but couldn't find a good one on how to do this (probably just don't know the right term to search for).
I don't want to just use a flag on each Word to indicate learned, working on it, or not learned, because eventually this will be a multi-user app and I wouldn't want to have flags for every user. Hence, I thought multiple tables for each set would be better.
All advice is appreciated.
Firstly, a couple of notes about style.
There's no need to prefix the foreign key fields with id_. The underlying database columns that Django creates for those FKs are suffixed with _id anyway, so you'll get something like id_word_id in the db. It'll make your code much clearer if you just call the fields 'word', 'student', etc.
Also, there's no need to specify the id autofields in each model. They are created automatically, and you should only specify them if you need to call them something else. Similarly, no need to specify db_table in your Meta, as this is also done automatically.
Finally, no need to call __unicode__ on the fields in your unicode method. The string interpolation will do that automatically, and again leaving it out will make your code much easier to read. (If you really want to do it explicitly, at least use the unicode(self.word) form.)
Anyway, on to your actual question. You can't 'join' querysets as such - the normal way to do a cross-model query is to have a foreignkey from one model to the other. You could do this:
words_to_work_on = Word.objects.exclude(id__in=WordLearned.objects.filter(student=user).values('word'))
which under the hood will do a subquery to get all the WordLearned objects for the current user and exclude them from the list of words returned.
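The subquery idea can be sketched in plain sqlite3. This is not the SQL Django emits; the table layout, column names, and sample rows are assumptions for illustration, using the shorter word and student names suggested above.

```python
import sqlite3

# Hypothetical tables standing in for Word and WordLearned.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE word_learned (
        id INTEGER PRIMARY KEY,
        word_id INTEGER NOT NULL REFERENCES word(id),
        student_id INTEGER NOT NULL
    );
    INSERT INTO word VALUES (1, 'apple'), (2, 'bread'), (3, 'cloud');
    INSERT INTO word_learned (word_id, student_id) VALUES (1, 7), (2, 8);
""")


def words_not_learned(student_id):
    """Return the words that have no word_learned row for this student."""
    rows = conn.execute(
        """
        SELECT word.word
        FROM word
        WHERE word.id NOT IN (
            SELECT word_id FROM word_learned WHERE student_id = ?
        )
        ORDER BY word.id
        """,
        (student_id,),
    ).fetchall()
    return [text for (text,) in rows]


for_student_7 = words_not_learned(7)
for_student_8 = words_not_learned(8)
```

The inner SELECT is scoped to one student, so the same word can be excluded for one student and still offered to another.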
However, and especially bearing in mind your future requirement for a multiuser app, I think you should restructure your tables. What you want is a ManyToMany relationship between Word and Student, with an intermediary table capturing the status of a Word for a particular Student. That way you can get rid of the WordToWorkOn and WordLearned tables, which are basically duplicates.
Something like:
class Word(models.Model):
    word = models.CharField(max_length=80)
    image = models.ForeignKey(Image)

    def __unicode__(self):
        return self.word

class Student(models.Model):
    # ... name, etc ...
    words = models.ManyToManyField(Word, through='StudentWord')

class StudentWord(models.Model):
    word = models.ForeignKey(Word)
    student = models.ForeignKey(Student)
    level = models.IntegerField()
    learned = models.BooleanField()
Now you can get all the words to learn for a particular student:
words_to_learn = Word.objects.filter(studentword__student=student, studentword__learned=False)
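Roughly speaking, that filter corresponds to a join through the intermediary table. The sketch below uses plain sqlite3 with assumed table names, column names, and sample rows; it is meant to show the shape of the lookup, not Django's exact SQL.

```python
import sqlite3

# Hypothetical tables standing in for Word and the StudentWord through model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL);
    CREATE TABLE student_word (
        id INTEGER PRIMARY KEY,
        word_id INTEGER NOT NULL REFERENCES word(id),
        student_id INTEGER NOT NULL,
        level INTEGER NOT NULL,
        learned INTEGER NOT NULL
    );
    INSERT INTO word VALUES (1, 'apple'), (2, 'bread'), (3, 'cloud');
    INSERT INTO student_word (word_id, student_id, level, learned) VALUES
        (1, 7, 3, 1), (2, 7, 1, 0), (2, 8, 2, 1);
""")


def words_to_learn(student_id):
    """Words this student has started on but not yet learned."""
    rows = conn.execute(
        """
        SELECT word.word
        FROM word
        JOIN student_word ON student_word.word_id = word.id
        WHERE student_word.student_id = ? AND student_word.learned = 0
        ORDER BY word.id
        """,
        (student_id,),
    ).fetchall()
    return [text for (text,) in rows]


student_7 = words_to_learn(7)
student_8 = words_to_learn(8)
```

Because this is an inner join, a word with no student_word row for the student ('cloud' here) is not returned; picking a brand-new word to add would need an exclusion query instead.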