Foreign Key Relationships - django

I have two models
class Subject(models.Model):
name = models.CharField(max_length=100,choices=COURSE_CHOICES)
created = models.DateTimeField('created', auto_now_add=True)
modified = models.DateTimeField('modified', auto_now=True)
syllabus = models.FileField(upload_to='syllabus')
def __unicode__(self):
return self.name
and
class Pastquestion(models.Model):
subject=models.ForeignKey(Subject)
year =models.PositiveIntegerField()
questions = models.FileField(upload_to='pastquestions')
def __unicode__(self):
return str(self.year)
Each Subject can have one or more past questions but a past question can have only one subject. I want to get a subject, and get its related past questions of a particular year. I was thinking of fetching a subject and getting its related past question.
Currently am implementing my code such that I rather get the past question whose subject and year correspond to any specified subject like
this_subject=Subject.objects.get(name=the_subject)
thepastQ=Pastquestion.objects.get(year=2000,subject=this_subject)
I was thinking there is a better way to do this. Or is this already a better way? Please Do tell ?

I think what you want is the related_name property of the ForeignKey field. This creates a link back to the Subject object and provides a manager you can use to query the set.
So to use this functionality, change the foreignkey line to:
subject=models.ForeignKey(Subject, related_name='questions')
Then with an instance of Subject we'll call subj, you can:
subj.questions.filter(year=2000)
I don't think this performs much differently to the technique you have used. Roughly speaking, SQL performance boils down a) whether there's an index and b) how many queries you're issuing. So you need to think about both. One way to find out what SQL your model usage is generating is to use SqlLogMiddleware - and alternatively play with the options in How to show the SQL Django is running It can be tempting when you get going to start issuing queries across relationships - e.g. q = Question.objects.get(year=2000, subject__name=SUBJ_MATHS) but unless you keep a close eye on these types of queries, you can and will kill your app's performance, badly.

Django's query syntax allows you to 'reach into' related objects.
past_questions = Pastquestion.objects.filter(year=2000, subject__name=subject_name)

Related

Filtering a django Prefetch based on more fields from the root object

This is similar to other questions regarding complicated prefetches but seems to be slightly different. Here are some example models:
class Author(Model):
pass
class Book(Model):
authors = ManyToManyField(Author, through='BookAuthor')
class BookAuthor(Model):
book = ForeignKey(Book, ...)
author = ForeignKey(Author, ...)
class Event(Model):
book = ForeignKey(Book, ...)
author = ForeignKey(Author, ...)
In summary: a BookAuthor links a book to one of its authors, and an Event also concerns a book and one of its authors (the latter constraint is unimportant). One could design the relationships differently but it's too late for that now, and in my case, this in fact is part of migrating away from the current setup.
Now suppose I have a BookAuthor instance and want to find any events that relate to that combination of book and author. I can follow either the book or author relations on BookAuthor and their event_set reverse relationship, but neither gives me what I want and, if I have several BookAuthors it becomes a pain.
It seems that the way to get an entire queryset "annotated" onto a model instance is via prefetch_related but as far as I can tell there is no way, in the Prefetch object's queryset property, to refer to the root object's fields. I would like to do something like this:
BookAuthor.objects.filter(...).prefetch_related(
Prefetch(
'book__event_set',
queryset=Event.objects.filter(
author=OuterRef('author')
)
}
)
However, OuterRef only works in subqueries and this is not one. The answer to this question suggests I could use a subquery here but I don't understand how it could ever work: you have to put the subquery inside a query to work in the Prefetch, and the OuterRef refers to that object/DB row; there is no way to get back to the original, root object. If I translate the code there into my situation it is clear that the OuterRef is referring to the outer Event query, not the outer BookAuthor.
To make the question precise: what I want is O(1) queries which, for a queryset of BookAuthors annotate each instance - or one of its foreign keys - with a collection of the corresponding Events. Obviously one can get all the Events and glue everything together in python. I want to avoid this but if anyone has a particularly elegant way of doing that, it would also be useful to know.

Django efficient lookups: Related manager vs whole queryset

Let's say I have the following Django models:
class X(models.Model):
some_field = models.FloatField()
class Y(models.Model):
x = models.ForeignKey(X)
another_field = models.DateField()
Let's say I'm looking for a particular instance of y, with a certain date (lookup_date), belonging to a certain x. Which option would be a more efficient lookup, if any?:
1. Y.objects.get(x=x, another_field=lookup_date)
or using the related manager:
2. x.y_set.get(another_field=lookup_date)
You'll probably find that they produce the same query, you can check this by adding .query to the end of the query which will show the resulting sql.
Y.objects.get(x=x, another_field=lookup_date).query
x.y_set.get(another_field=lookup_date).query
But either way this is a micro optimization and you may find it interesting to read Eric Lippert's performance rant.
Is one of them considered more pythonic?
Not really, I tend to use the second since it can make it slightly easier to conform to pep8's line length standard

django subquery with a join in it

I've got django 1.8.5 and Python 3.4.3, and trying to create a subquery that constrains my main data set - but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
lot = models.ForeignKey(Lot)
country = CountryField()
class Discrete(models.Model):
discrete_id = models.CharField(max_length=200, unique=True)
master_id = models.ForeignKey(Inventory_Master)
location = models.ForeignKey(Location)
lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot, over the Lot_Country, meaning "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US.)
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined it, you can associate multiple Lot_Countrys with one Lot, and a country can only be associated with one lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.

Counting the number of related objects with a certain value in Django

This are simplified models to demonstrate my problem:
class User(models.Model):
username = models.CharField(max_length=30)
total_readers = models.IntegerField(default=0)
class Book(models.Model):
author = models.ForeignKey(User)
title = models.CharField(max_length=100)
class Reader(models.Model):
user = models.ForeignKey(User)
book = models.ForeignKey(Book)
So, we have Users, Books and Readers (Users, who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say, the current user reads a book. Now, I'd like to update the number of total readers for all books of this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I can do, is creating another field in the Reader model called book_author_id. Thus, I denormalize data and can count the Reader objects without having PostgreSQL making the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That will generate query with sub-query, sometimes it will have better performance that query with join. You even go further and end up creating 2 queries separately:
book.author.total_readers = Reader.objects.filter(book_id__in=Book.objects.filter(author=book.author).values_list('id', flat=True)).count()
That will generate 2 queries, one will retrieve list of all book IDs for that author and second will retrieve count of reads for books with ID in that list.
Good solution also may be to create some batch task that will run for example once per hour and count up all reads, but that way you will end up with not live refreshing count of reads.
You can also create celery task that will run just after read is created to generate new value for author. That way you won't have long response time and delay from creating read to counting it up won't be so long.
It's always way better to solve bottlenecks of this sort with good design and maybe a little bit of caching rather than duplicating data in the way you suggest. The total_readers field is data you should generate instead of recording.
class User(models.Model):
username = models.CharField(max_length=30)
#property
def total_readers(self):
cached_value = caching_client.get("readers_"+self.username, None)
if cached_value is None:
cached_value = self.readers()
caching_client.set("readers_"+self.username,
cached_value)
return cached_value
def readers(self):
return Reader.objects.filter(book__author__user=self).count()
There are libraries that do the caching via decorators but I felt it was a pattern you would benefit from seeing expressly. You can also attach a TTL to the cache so that you insure that the value can't be wrong for longer than TTL. You can also regenerate the cache upon creation of a Reader object.
You might actually get some mileage with declaring an m2m and defining through relationships but I have no experience of it.

Django Queryset with filtering on reverse foreign key

I have the following Django model:
class Make:
name = models.CharField(max_length=200)
class MakeContent:
make = models.ForeignKey(Make)
published = models.BooleanField()
I'd like to know if it's possible (without writing SQL directly) for me to generate a queryset that contains all Makes and each one's related MakeContents where published = True.
Yes, I think you want
make = Make.objects.get(pk=1)
make.make_content_set.filter(published=True)
or maybe
make_ids = MakeContent.objects.filter(published=True).values_list('make_id', flat=True)
makes = Make.objects.filter(id__in=make_ids)
I know this is very old question, but I am answering. As I think my answer can help others. I have changed the model a bit as follows. I have used Django 1.8.
class Make(models.Model):
name = models.CharField(max_length=200)
class MakeContent(models.Model):
make = models.ForeignKey(Make, related_name='makecontent')
published = models.BooleanField()
I have used the following queryset.
Make.objects.filter(makecontent__published=True)
You should use distinct() to avoid the duplicate result.
Make.objects.filter(makecontent__published=True).distinct()
I hope it will help.
Django doesn't support the select_related() method for reverse foreign key lookups, so the best you can do without leaving Python is two database queries. The first is to grab all the Makes that contain MakeContents where published = True, and the second is to grab all the MakeContents where published = True. You then have to loop through and arrange the data how you want it. Here's a good article about how to do this:
http://blog.roseman.org.uk/2010/01/11/django-patterns-part-2-efficient-reverse-lookups/
Let me translate Spike's worded answer into codes for future viewers. Please note that each 'Make' can have zero to multiple 'MakeContent'
If the asker means to query 'Make' with AT LEAST ONE 'MakeContent' whose published=True, then Jason Christa's 2nd snippet answers the question.
The snippet is equivalent to
makes = Make.objects.select_related().filter(makecontent__published=True).distinct()
But if the asker means to query 'Make' with ALL 'MakeContent' whose published=True, then following the 'makes' above,
import operator
make_ids = [m.id for m in makes if
reduce(operator.and_, [c.published for c in m.makecontent_set.all()] )
]
makes_query = Make.objects.filter(id__in=make_ids)
contains the desired query.
One more time, it is not clear what was the question, but if it was desired that all related MakeContent objects must have been published this can work:
Make.objects.exclude(MakeContent_set__published=False)
And if at least one of them (as it was in other answers):
Make.objects.filter(MakeContent_set__published=True)