Django efficient lookups: Related manager vs whole queryset - django

Let's say I have the following Django models:
class X(models.Model):
some_field = models.FloatField()
class Y(models.Model):
x = models.ForeignKey(X)
another_field = models.DateField()
Let's say I'm looking for a particular instance of y, with a certain date (lookup_date), belonging to a certain x. Which option would be a more efficient lookup, if any?:
1. Y.objects.get(x=x, another_field=lookup_date)
or using the related manager:
2. x.y_set.get(another_field=lookup_date)

You'll probably find that they produce the same query, you can check this by adding .query to the end of the query which will show the resulting sql.
Y.objects.get(x=x, another_field=lookup_date).query
x.y_set.get(another_field=lookup_date).query
But either way this is a micro optimization and you may find it interesting to read Eric Lippert's performance rant.
Is one of them considered more pythonic?
Not really, I tend to use the second since it can make it slightly easier to conform to pep8's line length standard

Related

django subquery with a join in it

I've got django 1.8.5 and Python 3.4.3, and trying to create a subquery that constrains my main data set - but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
lot = models.ForeignKey(Lot)
country = CountryField()
class Discrete(models.Model):
discrete_id = models.CharField(max_length=200, unique=True)
master_id = models.ForeignKey(Inventory_Master)
location = models.ForeignKey(Location)
lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot, over the Lot_Country, meaning "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country (let's say US.)
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But, first of all that gives me the str back, which I don't really want (and changed it to be lot_id, but that's a hack.)
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query - maybe that's what I need? I guess I don't know how to join up to a parent then down into that parent's other child...)
Thanks in advance for your help.
I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course if you print it in your console, you will see a string.
I also think your models need refactoring. The way you have currently defined it, you can associate multiple Lot_Countrys with one Lot, and a country can only be associated with one lot.
If I understand your general model correctly that isn't what you want - you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
Which will give you a queryset of all Discretes whose Lot is in the US.

Foreign Key Relationships

I have two models
class Subject(models.Model):
name = models.CharField(max_length=100,choices=COURSE_CHOICES)
created = models.DateTimeField('created', auto_now_add=True)
modified = models.DateTimeField('modified', auto_now=True)
syllabus = models.FileField(upload_to='syllabus')
def __unicode__(self):
return self.name
and
class Pastquestion(models.Model):
subject=models.ForeignKey(Subject)
year =models.PositiveIntegerField()
questions = models.FileField(upload_to='pastquestions')
def __unicode__(self):
return str(self.year)
Each Subject can have one or more past questions but a past question can have only one subject. I want to get a subject, and get its related past questions of a particular year. I was thinking of fetching a subject and getting its related past question.
Currently am implementing my code such that I rather get the past question whose subject and year correspond to any specified subject like
this_subject=Subject.objects.get(name=the_subject)
thepastQ=Pastquestion.objects.get(year=2000,subject=this_subject)
I was thinking there is a better way to do this. Or is this already a better way? Please Do tell ?
I think what you want is the related_name property of the ForeignKey field. This creates a link back to the Subject object and provides a manager you can use to query the set.
So to use this functionality, change the foreignkey line to:
subject=models.ForeignKey(Subject, related_name='questions')
Then with an instance of Subject we'll call subj, you can:
subj.questions.filter(year=2000)
I don't think this performs much differently to the technique you have used. Roughly speaking, SQL performance boils down a) whether there's an index and b) how many queries you're issuing. So you need to think about both. One way to find out what SQL your model usage is generating is to use SqlLogMiddleware - and alternatively play with the options in How to show the SQL Django is running It can be tempting when you get going to start issuing queries across relationships - e.g. q = Question.objects.get(year=2000, subject__name=SUBJ_MATHS) but unless you keep a close eye on these types of queries, you can and will kill your app's performance, badly.
Django's query syntax allows you to 'reach into' related objects.
past_questions = Pastquestion.objects.filter(year=2000, subject__name=subject_name)

Complex reverse query in Django

In a nutshell: my models are B --> A <-- C, I want to filter Bs where at least one C exists, satisfying some arbitrary conditions and related to the same A as that B. Help with some complicating factors (see below) is also appreciated.
Details:
I'm trying to create a generic model to limit user access to rows in other models. Here's a (simplified) example:
class CanRead(models.Model):
user = models.ForeignKey(User)
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = generic.GenericForeignKey('content_type', 'object_id')
class Direct(models.Model):
...
class Indirect(models.Model):
direct = models.ForeignKey(Direct)
...
class Indirect2(models.Model):
indirect = models.ForeignKey(Indirect)
...
It's not feasible to associate a CanRead to every row in every model (too costly in space), so only some models are expected to have that association (like Direct above). In this case, here's how I'd see if a Direct is accessible to a user or not:
Direct.objects.filter(Q(canread__user=current_user), rest_of_query)
(Unfortunately, this query won't work - in 1.2.5 at least - because of the generic fk; any help with this would be appreciated, but there are workarounds, the real issue is what follows next)
The others' accessibility will be dictated by their relations with other models. So, Indirect will be accessible to an user if direct is accessible, and Indirect2 will be if indirect__direct is, etc.
My problem is, how can I do this query? I'm tempted to write something like:
Indirect.objects.filter(Q(canread__content_object=F('direct'), canread__user=current_user), rest_of_query)
Indirect2.objects.filter(Q(canread__content_object=F('indirect__direct'), canread__user=current_user), rest_of_query)
but that doesn't work (Django expects a relation between CanRead and Indirect - which doesn't exist - for the reverse query to work). If I were writing it directy in SQL, I would do something like:
SELECT *
FROM indirect i
JOIN direct d ON i.direct = d.id
JOIN canread c ON c.object_id = d.id
WHERE
c.content_type = <<content type for Direct>> AND
c.user = <<current user>> AND
<<rest_of_query>>
but I can't translate that query to Django. Is it possible? If not, what would be the least instrusive way of doing it (using as little raw SQL as possible)?
Thanks for your time!
Note: The workaround mentioned would be not to use generic fk... :( I could discard the CanRead model and have many CanReadDirect, CanReadDirect2, CanReadDirect3, etc. It's a minor hassle, but wouldn't hinder my project too much.
For the simple case you've given, the solution is simple:
B.objects.filter(a__c__isnull=False)
For the actual query, here's my try:
Indirect.objects.filter(direct__id__in=
zip(*CanRead.objects.filter(
content_type=ContentType.objects.get_for_model(Direct)
).values_list('id'))[0])
But this way is very slow: you extract IDs from one queryset, then do a query with
where id in (1, 2, 3, ... 10000)
Which is VERY SLOW. We had a similar issue with joins on generic foreign keys in our project and decided to resort to raw queries in the model manager.
class DirectManager(Manager):
def can_edit(self, user):
return self.raw(...)
I'd also recommend checking out the per-row permissions framework in Django 1.3.
access control models are not that simple...
use a well-known access control model such as:
DAC/MAC
or
RBAC
also there is a project called django-rbac.

Django views - optimum query set in a ForeignKey model

Having the model:
class Notebook(models.Model):
n_id = models.AutoField(primary_key = True)
class Note(models.Model):
b_nbook = models.ForeignKey(Notebook)
the URL pattern passing one parameter:
(r'^(?P<n_id>\d+)/$', 'notebook_notes')
and the following view:
def notebook_notes(request, n_id):
nbook = get_object_or_404(Nbook, pk=n_id)
...
which of the following is the optimum query set, and why? (they both work and pass the notes based to a selected by URL notebook)
notes = nbook.note_set.filter(b_nbook = n_id)
notes = Note.objects.select_related().filter(b_nbook = n_id)
Well you're comparing apples and oranges a bit there. They may return virtually the same, but you're doing different things on both.
Let's take the relational version first. That query is saying get all the notes that belong to nbook. You're then filtering that queryset by only notes that belong to nbook. You're filtering it twice on the same criteria, in effect. Since Django's querysets are lazy, it doesn't really do anything bad, like hit the database multiple times, but it's still unnecessary.
Now, the second version. Here, you're starting with all notes and filtering to just those that belong to the particular notebook. There's only one filter this time, but it's bad form to do it this way. Since it's a relation, you should look it up through the relational format, i.e. nbook.note_set.all(). On this version, though, you're also using select_related(), which wasn't used on the other version.
select_related will attempt to create a join table with any other relations on the model, in this case a Note. However, since the only relation on Note is Notebook and you already have the notebook, it's redundant.
Taking out all the redundancy in those two version leaves you with just:
notes = nbook.note_set.all()
That, too, will return exactly the same results as the other two version, but is much cleaner and standardized.

Select DISTINCT individual columns in django?

I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
Product = models.CharField(max_length=20, promary_key=True)
Category = models.CharField(max_length=30)
Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct column names from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print q.query # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet which behaves differently from a QuerySet. When you access say, the first element of q (above) you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
product = models.CharField(max_length=20, primary_key=True)
category = models.CharField(max_length=30)
rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL, just use distinct(columns) (documentation).
Productorder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
User order by with that field, and then do distinct.
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using a .distinct(), presuming that DISTINCT in your database is faster than a python set, but it's great for noodling around the shell.
Update:
This is answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before set is applied. Otherwise, it's a terrible idea from a performance standpoint.