I've got django 1.8.5 and Python 3.4.3, and trying to create a subquery that constrains my main data set - but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
lot = models.ForeignKey(Lot)
country = CountryField()
class Discrete(models.Model):
discrete_id = models.CharField(max_length=200, unique=True)
master_id = models.ForeignKey(Inventory_Master)
location = models.ForeignKey(Location)
lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot, over the Lot_Country, meaning "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US.)
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined it, you can associate multiple Lot_Countrys with one Lot, and a country can only be associated with one lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.
Related
I am working on converting some relatively complex SQL into something that Django can play with. I am trying not to just use the raw SQL, since I think playing with the standard Django toolkit will help me learn more about Django.
I have already managed to break up parts of the sql into chunks, and am tackling them piecemeal to make things a little easier.
Here is the SQL in question:
SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url,
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown')
as eta
FROM i
JOIN i_eta ON i_eta.mpn = i.mpn
WHERE category LIKE 'kids'
ORDER BY i.brand, i.desc, i.colour, FIELD(size, 'xxl','xl','l','ml','m','s','xs','xxs') DESC, size+0, size
Here is what I have (trying to convert line by line):
(grabbed automatically when performing filters)
(have to figure out django doc on coalesce for syntax)
db alias haven't been able to find yet - it is crucial since there is a db view that requires it
already included in the original q
.select_related?
.filter(category="kids")
.objects.order_by('brand','desc','colour') - don't know how to deal with SQL FIELDS
Any advice would be appreciated!
Here's how I would structure this.
First, I'm assuming your models for i and i_eta look something like this:
class I(models.Model):
mpn = models.CharField(max_length=30, primary_key=True)
year = models.CharField(max_length=30)
brand = models.CharField(max_length=30)
desc = models.CharField(max_length=100)
colour = models.CharField(max_length=30)
size = models.CharField(max_length=3)
class IEta(models.Model):
i = models.ForeignKey(I, on_delete=models.CASCADE)
eta = models.DateField()
General thoughts:
To write the coalesce in Django: I would not replace nulls with "Unknown" in the ORM. This is a presentation-layer concern: it should be dealt with in a template.
For date formatting, you can do date formatting in Python.
Not sure what a DB alias is.
For using multiple tables together, you can use either select_related(), prefetch_related(), or do nothing.
select_related() will perform a join.
prefect_related() will get the foreign key ID's from the first queryset, then generate a query like SELECT * FROM table WHERE id in (12, 13, 14).
Doing nothing will work automatically, but has the disadvantage of the SELECT N+1 problem.
I generally prefer prefetch_related().
For customizing the sort order of the size field, you have three options. My preference would be option 1, but any of the three will work.
Denormalize the sort criteria. Add a new field called size_numeric. Override the save() method to populate this field when saving new instances, giving xxl the value 1, xl the value 2, etc.
Sort in Python. Essentially, you use Python's built-in sorting methods to do the sort, rather than sorting it in the database. This doesn't work well if you have thousands of results.
Invoke the MySQL function. Essentially, using annotate(), you add the output of a function to the queryset. order_by() can sort by that function.
This are simplified models to demonstrate my problem:
class User(models.Model):
username = models.CharField(max_length=30)
total_readers = models.IntegerField(default=0)
class Book(models.Model):
author = models.ForeignKey(User)
title = models.CharField(max_length=100)
class Reader(models.Model):
user = models.ForeignKey(User)
book = models.ForeignKey(Book)
So, we have Users, Books and Readers (Users, who have read a Book). Thus, Reader is basically a many-to-many relationship between Book and User.
Now let's say, the current user reads a book. Now, I'd like to update the number of total readers for all books of this book's author:
# get the book (as an example pk=1)
book = Book.objects.get(pk=1)
# save Reader object for this user and this book
Reader(user=request.user, book=book).save()
# count and save the total number of readers for this author in all his books
book.author.total_readers = Reader.objects.filter(book__author=book.author).count()
book.author.save()
By doing so, Django creates a LEFT OUTER JOIN query for PostgreSQL and we get the expected result. However, the database tables are huge and this has become a bottleneck.
In this example, we could simply increase the total_readers by one on each view, instead of actually counting the database rows. However, this is just a simplified model structure and we cannot do this in reality here.
What I can do, is creating another field in the Reader model called book_author_id. Thus, I denormalize data and can count the Reader objects without having PostgreSQL making the LEFT OUTER JOIN with the User table.
Finally, here's my question: Is it possible to create some sort of database index, so that PostgreSQL handles this denormalization automatically? Or do I really have to create this additional model field and redundantly store the author's PK in there?
EDIT - to point out the essential question: I got several great answers, which work for a lot of scenarios. However, they don't solve this actual problem. The only thing I'd like to know, is if it's possible to have PostgreSQL handle such a denormalization automatically - e.g. by creating some sort of database index.
Sometimes, this query can serve better:
book.author.total_readers = Reader.objects.filter(book__in=Book.objects.filter(author=book.author)).count()
That will generate query with sub-query, sometimes it will have better performance that query with join. You even go further and end up creating 2 queries separately:
book.author.total_readers = Reader.objects.filter(book_id__in=Book.objects.filter(author=book.author).values_list('id', flat=True)).count()
That will generate 2 queries, one will retrieve list of all book IDs for that author and second will retrieve count of reads for books with ID in that list.
Good solution also may be to create some batch task that will run for example once per hour and count up all reads, but that way you will end up with not live refreshing count of reads.
You can also create celery task that will run just after read is created to generate new value for author. That way you won't have long response time and delay from creating read to counting it up won't be so long.
It's always way better to solve bottlenecks of this sort with good design and maybe a little bit of caching rather than duplicating data in the way you suggest. The total_readers field is data you should generate instead of recording.
class User(models.Model):
username = models.CharField(max_length=30)
#property
def total_readers(self):
cached_value = caching_client.get("readers_"+self.username, None)
if cached_value is None:
cached_value = self.readers()
caching_client.set("readers_"+self.username,
cached_value)
return cached_value
def readers(self):
return Reader.objects.filter(book__author__user=self).count()
There are libraries that do the caching via decorators but I felt it was a pattern you would benefit from seeing expressly. You can also attach a TTL to the cache so that you insure that the value can't be wrong for longer than TTL. You can also regenerate the cache upon creation of a Reader object.
You might actually get some mileage with declaring an m2m and defining through relationships but I have no experience of it.
So first off, I want to clarify that I am trying to make One-To-Many relationships, not Many-to-One. I already understand how ForeignKeys work.
For the sake of the discussion, I've simplified my models; they're much more field-rich than this in the real implementation.
I have a model, called a ColumnDefinition:
class ColumnDefinition(Model):
column_name = CharField(max_length=32)
column_type = PositiveSmallIntegerField()
column_size = PositiveSmallIntegerField(null=True, blank=True)
I think have a registry. Each registry has a separate set of columns for it's input and output definition. I've put the theoretical "OneToManyField" in there to demonstrate what I'm trying to do.
class Registry(Model):
input_dictionary = OneToManyField(ColumnDefinition)
output_dictionary = OneToManyField(ColumnDefinition)
created_date = DateTimeField(auto_now_add=True, editable=False)
A ColumnDefinition is only ever related to one Registry ever. So it's not a Many to Many relationship. If I put a ForeignKey on the ColumnDefinition instead to create a reverse relationship, it can only create a single reverse, whereas I need both an input and output reverse.
I don't want to have to do anything kludgey like adding a "column_registry_type" field onto ColumnDefinition if I can get around it.
Does anyone have any good ideas on how to solve this problem?
Thanks!
You can add two ForeignKeys on ColumnDefinition, one for input and one for output, and give them separate related_names:
class ColumnDefinition(Model):
...
input_registry = models.ForeignKey(Registry, related_name='input_columns')
output_registry = models.ForeignKey(Registry, related_name='output_columns')
You can then access the set of columns like registry.input_columns.
You can and should define two different ForeignKey fields on ColumnDefinition. Just make sure to specify a related_name value for at least one of them.
https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.ForeignKey.related_name
I have two models
class Subject(models.Model):
name = models.CharField(max_length=100,choices=COURSE_CHOICES)
created = models.DateTimeField('created', auto_now_add=True)
modified = models.DateTimeField('modified', auto_now=True)
syllabus = models.FileField(upload_to='syllabus')
def __unicode__(self):
return self.name
and
class Pastquestion(models.Model):
subject=models.ForeignKey(Subject)
year =models.PositiveIntegerField()
questions = models.FileField(upload_to='pastquestions')
def __unicode__(self):
return str(self.year)
Each Subject can have one or more past questions but a past question can have only one subject. I want to get a subject, and get its related past questions of a particular year. I was thinking of fetching a subject and getting its related past question.
Currently am implementing my code such that I rather get the past question whose subject and year correspond to any specified subject like
this_subject=Subject.objects.get(name=the_subject)
thepastQ=Pastquestion.objects.get(year=2000,subject=this_subject)
I was thinking there is a better way to do this. Or is this already a better way? Please Do tell ?
I think what you want is the related_name property of the ForeignKey field. This creates a link back to the Subject object and provides a manager you can use to query the set.
So to use this functionality, change the foreignkey line to:
subject=models.ForeignKey(Subject, related_name='questions')
Then with an instance of Subject we'll call subj, you can:
subj.questions.filter(year=2000)
I don't think this performs much differently to the technique you have used. Roughly speaking, SQL performance boils down a) whether there's an index and b) how many queries you're issuing. So you need to think about both. One way to find out what SQL your model usage is generating is to use SqlLogMiddleware - and alternatively play with the options in How to show the SQL Django is running It can be tempting when you get going to start issuing queries across relationships - e.g. q = Question.objects.get(year=2000, subject__name=SUBJ_MATHS) but unless you keep a close eye on these types of queries, you can and will kill your app's performance, badly.
Django's query syntax allows you to 'reach into' related objects.
past_questions = Pastquestion.objects.filter(year=2000, subject__name=subject_name)
I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.