Should entry_set be cached with select_related? My DB is still getting calls even after I use select_related. Here are the pertinent sections:
class Alias(models.Model):
    achievements = models.ManyToManyField('Achievement', through='Achiever')
    def points(self):
        points = 0
        for a in self.achiever_set.all():
            points += a.achievement.points * a.count
        return points
class Achievement(models.Model):
    name = models.CharField(max_length=100)
    points = models.IntegerField(default=1)
class Achiever(models.Model):
    achievement = models.ForeignKey(Achievement, on_delete=models.CASCADE)
    alias = models.ForeignKey(Alias, on_delete=models.CASCADE)
    count = models.IntegerField(default=1)
aliases = Alias.objects.all().select_related()
for alias in aliases:
    print("points : %s" % alias.points())
    for a in alias.achiever_set.all()[:5]:
        print("%s x %d" % (a.achievement.name, a.count))
I'm seeing a big join query at the start, and then individual queries for each achievement, both for the points and for the name lookup.
Is this a bug, or am I doing something wrong?
As of Django 1.4 you can use prefetch_related, which works for ManyToMany and reverse ForeignKey relations:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related
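As a rough, stdlib-only illustration of why the loop in the question keeps hitting the database and what prefetching changes, the sketch below uses sqlite3 instead of Django. The table names, columns, and sample rows are assumptions modelled on the question's Alias/Achievement/Achiever models, and this is not Django's implementation, only the same query pattern: the naive loop issues one query per alias and per achiever row, while the prefetch-style version loads all related rows with a single IN (...) query and regroups them in Python.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alias (id INTEGER PRIMARY KEY);
CREATE TABLE achievement (id INTEGER PRIMARY KEY, name TEXT, points INTEGER);
CREATE TABLE achiever (id INTEGER PRIMARY KEY, alias_id INTEGER,
                       achievement_id INTEGER, count INTEGER);
INSERT INTO alias (id) VALUES (1), (2), (3);
INSERT INTO achievement (id, name, points) VALUES (1, 'first post', 10), (2, 'helpful', 5);
INSERT INTO achiever (alias_id, achievement_id, count) VALUES (1, 1, 1), (1, 2, 3), (2, 2, 2);
""")

# Record every statement SQLite actually runs, so we can count queries.
executed = []
conn.set_trace_callback(executed.append)

# Naive pattern (what the loop in the question does): one query for the aliases,
# one per alias for its achiever rows, and one per achiever for its achievement.
naive_totals = {}
for (alias_id,) in conn.execute("SELECT id FROM alias ORDER BY id").fetchall():
    naive_totals[alias_id] = 0
    achievers = conn.execute(
        "SELECT achievement_id, count FROM achiever WHERE alias_id = ?", (alias_id,)
    ).fetchall()
    for achievement_id, count in achievers:
        (points,) = conn.execute(
            "SELECT points FROM achievement WHERE id = ?", (achievement_id,)
        ).fetchone()
        naive_totals[alias_id] += points * count
naive_query_count = len(executed)

# Prefetch-style pattern: one query for the aliases, then ONE query for all
# of their related rows via IN (...), regrouped in Python.
executed.clear()
alias_ids = [row[0] for row in conn.execute("SELECT id FROM alias ORDER BY id")]
placeholders = ", ".join("?" for _ in alias_ids)
related = conn.execute(
    "SELECT achiever.alias_id, achievement.points, achiever.count "
    "FROM achiever JOIN achievement ON achievement.id = achiever.achievement_id "
    "WHERE achiever.alias_id IN (" + placeholders + ")",
    alias_ids,
).fetchall()
by_alias = defaultdict(list)
for alias_id, points, count in related:
    by_alias[alias_id].append(points * count)
prefetch_totals = {a: sum(by_alias[a]) for a in alias_ids}
prefetch_query_count = len(executed)
```

In Django 1.4+ the prefetch-style pattern should correspond to something like Alias.objects.prefetch_related('achiever_set__achievement'), after which achiever_set.all() is answered from the prefetched cache instead of issuing new queries.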
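select_related() doesn't work with ManyToManyFields, nor with reverse ForeignKey relations such as achiever_set; it only follows single-valued relations (ForeignKey and OneToOneField). At the moment this is not planned, but it might become a future feature. See http://code.djangoproject.com/ticket/6432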
In this case, if you want to make a single query, you have two options:
1) Write your own SQL; it probably won't be pretty or fast.
2) Query on the model that has the foreign key (Achiever). You would be able to use select_related in that case. You still won't be able to access the modelname_set, but with some formatting you would be able to get the data you need in a single query. Neither option is ideal, but you could get it working at a decent speed as well.
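For option 1, the points() total is a good candidate for plain SQL. Below is a minimal sketch with the standard-library sqlite3 module, assuming tables named after the question's models (Django's real table names would carry an app prefix) and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE alias (id INTEGER PRIMARY KEY);
CREATE TABLE achievement (id INTEGER PRIMARY KEY, name TEXT, points INTEGER);
CREATE TABLE achiever (id INTEGER PRIMARY KEY, alias_id INTEGER,
                       achievement_id INTEGER, count INTEGER);
INSERT INTO alias (id) VALUES (1), (2), (3);
INSERT INTO achievement (id, name, points) VALUES (1, 'first post', 10), (2, 'helpful', 5);
INSERT INTO achiever (alias_id, achievement_id, count) VALUES (1, 1, 1), (1, 2, 3), (2, 2, 2);
""")

# Alias.points() for every alias in a single query. LEFT OUTER JOINs keep
# aliases with no achievements; COALESCE turns their NULL sum into 0.
points_by_alias = dict(conn.execute("""
    SELECT alias.id, COALESCE(SUM(achievement.points * achiever.count), 0)
    FROM alias
    LEFT OUTER JOIN achiever ON achiever.alias_id = alias.id
    LEFT OUTER JOIN achievement ON achievement.id = achiever.achievement_id
    GROUP BY alias.id
""").fetchall())
```

With plain (inner) JOINs instead, an alias that has no achievements would be missing from the result rather than showing 0.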
In Django 1.3 you can use QuerySet.values() and do something like:
Alias.objects.all().values('achievements__name', 'achievements__points')  # chain .filter(), .exclude() etc. before .values() as needed
The only drawback is that you get a ValuesQuerySet (an iterable of dictionaries) instead of model instances. But this can simply be overcome by passing all the necessary fields into values(); you have to change your perception ;)
This can save you a few dozen queries...
Details can be found in the Django docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values
Related
Is it possible to filter a Django queryset by model property?
I have a method in my model:
@property
def myproperty(self):
    ...
and now I want to filter by this property like:
MyModel.objects.filter(myproperty=[..])
Is this somehow possible?
Nope. Django filters operate at the database level, generating SQL. To filter based on Python properties, you have to load the objects into Python to evaluate the property, and at that point you've already done all the work to load them.
I might be misunderstanding your original question, but there is a filter builtin in Python:
filtered = filter(lambda x: x.myproperty, MyModel.objects.all())
But it's better to use a list comprehension:
filtered = [x for x in MyModel.objects.all() if x.myproperty]
or even better, a generator expression:
filtered = (x for x in MyModel.objects.all() if x.myproperty)
Riffing off @TheGrimmScientist's suggested workaround, you can make these "SQL properties" by defining them on the Manager or the QuerySet, and reuse/chain/compose them:
With a Manager:
from django.db import models
from django.db.models import F
class CompanyManager(models.Manager):
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))
class Company(models.Model):
    # ...
    objects = CompanyManager()
Company.objects.with_chairs_needed().filter(chairs_needed__lt=4)
With a QuerySet:
from django.db import models
from django.db.models import F
class CompanyQuerySet(models.QuerySet):
    def many_employees(self, n=50):
        return self.filter(num_employees__gte=n)
    def needs_fewer_chairs_than(self, n=5):
        return self.with_chairs_needed().filter(chairs_needed__lt=n)
    def with_chairs_needed(self):
        return self.annotate(chairs_needed=F('num_employees') - F('num_chairs'))
class Company(models.Model):
    # ...
    objects = CompanyQuerySet.as_manager()
Company.objects.needs_fewer_chairs_than(4).many_employees()
See https://docs.djangoproject.com/en/1.9/topics/db/managers/ for more.
Note that I am going off the documentation and have not tested the above.
Looks like using F() with annotations will be my solution to this.
It's not going to filter by @property, since F() talks to the database before objects are brought into Python. But I'm still putting it here as an answer, since my reason for wanting to filter by property was really wanting to filter objects by the result of simple arithmetic on two different fields.
So, something along the lines of:
from django.db.models import F
companies = (
    Company.objects
    .annotate(chairs_needed=F('num_employees') - F('num_chairs'))
    .filter(chairs_needed__lt=4)
)
rather than defining the property to be:
@property
def chairs_needed(self):
    return self.num_employees - self.num_chairs
then doing a list comprehension across all objects.
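To make the trade-off concrete, here is a small sqlite3 sketch (stdlib only; the company table and its rows are invented for illustration) comparing a database-side filter, roughly the kind of SQL the annotate()/filter() chain asks for, with the @property version, which has to load every row into Python before it can filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, "
    "num_employees INTEGER, num_chairs INTEGER)"
)
conn.executemany(
    "INSERT INTO company (name, num_employees, num_chairs) VALUES (?, ?, ?)",
    [("Acme", 10, 8), ("Globex", 50, 40), ("Initech", 5, 5), ("Hooli", 3, 10)],
)

# Database side: the arithmetic and the comparison run in SQL,
# so only matching rows come back.
db_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM company "
        "WHERE (num_employees - num_chairs) < ? ORDER BY id",
        (4,),
    )
]


class Company:
    """Plain-Python stand-in for a model instance with the @property."""

    def __init__(self, name, num_employees, num_chairs):
        self.name = name
        self.num_employees = num_employees
        self.num_chairs = num_chairs

    @property
    def chairs_needed(self):
        return self.num_employees - self.num_chairs


# Python side: every row must be loaded before the property can be evaluated.
loaded = [
    Company(*row)
    for row in conn.execute("SELECT name, num_employees, num_chairs FROM company ORDER BY id")
]
py_names = [c.name for c in loaded if c.chairs_needed < 4]
```

Both produce the same names; the difference is that the database-side version only sends matching rows back, while the property version fetches all of them first.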
I had the same problem, and I developed this simple solution:
objects = [
my_object
for my_object in MyModel.objects.all()
if my_object.myProperty == [...]
]
This is not a performant solution; it shouldn't be used on tables that contain a large amount of data. It's fine as a simple solution or for a small personal project.
PLEASE someone correct me, but I guess I have found a solution, at least for my own case.
I want to work on all those elements whose properties are exactly equal to ... whatever.
But I have several models, and this routine should work for all models. And it does:
def selectByProperties(modelType, specify):
    # Field names are written into the SQL text, so they must come from trusted
    # code; the values are passed separately as query parameters.
    clause = "SELECT * FROM %s" % modelType._meta.db_table
    params = []
    if specify:
        conditions = []
        for field, eqvalue in specify.items():
            conditions.append("%s = %%s" % field)
            params.append(eqvalue)
        clause += " WHERE " + " AND ".join(conditions)
    print(clause, params)
    return modelType.objects.raw(clause, params)
With this universal subroutine, I can select all those elements which exactly equal my dictionary of 'specify' (propertyname, propertyvalue) combinations.
The first parameter takes a model class (a models.Model subclass),
the second a dictionary like:
{"property1" : "77" , "property2" : "12"}
And it creates an SQL statement like
SELECT * FROM appname_modelname WHERE property1 = %s AND property2 = %s
with the parameters ['77', '12'], and returns a RawQuerySet on those elements.
This is a test function:
from myApp.models import myModel
def testSelectByProperties():
    specify = {"property1": "77", "property2": "12"}
    subset = selectByProperties(myModel, specify)
    nameField = "property0"
    ## checking if that is what I expected:
    for i in subset:
        print(i.__dict__[nameField], *[i.__dict__[j] for j in specify])
And? What do you think?
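Because the values end up in SQL, they are safer passed as query parameters than formatted into the string: a value containing a quote, such as O'Brien, otherwise breaks the statement or opens the door to SQL injection. Column names cannot be parameters, so they have to be checked against a known list. Below is a hypothetical stdlib sqlite3 variant of the same helper (select_by_properties and the mymodel table are made up for illustration); note that sqlite3 uses ? placeholders, whereas Django's raw() expects %s. For plain equality on model fields, Django's own modelType.objects.filter(**specify) builds the same WHERE clause with proper escaping and returns a regular QuerySet.

```python
import sqlite3


def select_by_properties(conn, table, allowed_columns, specify):
    """Hypothetical stdlib variant of selectByProperties: rows whose columns
    equal every value in `specify`, with values bound as parameters."""
    # Identifiers cannot be bound as parameters, so reject unknown column
    # names before they are written into the SQL text.
    unknown = set(specify) - set(allowed_columns)
    if unknown:
        raise ValueError("unknown columns: %s" % sorted(unknown))
    sql = "SELECT * FROM %s" % table  # table name must come from trusted code
    params = []
    if specify:
        sql += " WHERE " + " AND ".join("%s = ?" % col for col in specify)
        params = list(specify.values())
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, "
    "property0 TEXT, property1 TEXT, property2 TEXT)"
)
conn.executemany(
    "INSERT INTO mymodel (property0, property1, property2) VALUES (?, ?, ?)",
    [("a", "77", "12"), ("b", "77", "13"), ("c", "O'Brien", "12")],
)
columns = ["property0", "property1", "property2"]

matches = select_by_properties(conn, "mymodel", columns, {"property1": "77", "property2": "12"})
# A quote inside a value is harmless because the value never becomes SQL text.
quoted = select_by_properties(conn, "mymodel", columns, {"property1": "O'Brien"})
```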
I know it is an old question, but for the sake of those landing here, I think it is useful to read the question below and the related answer:
How to customize admin filter in Django 1.4
It may also be possible to use queryset annotations that duplicate the property get/set logic, as suggested e.g. by @rattray and @thegrimmscientist, in conjunction with the property. This could yield something that works both on the Python level and on the database level.
Not sure about the drawbacks, however: see this SO question for an example.
I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
    field1 = models.CharField(max_length=255)
class B(A):
    fk_field = models.ForeignKey('C', on_delete=models.CASCADE)
class C(models.Model):
    field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
    print(a.b.fk_field.field2)  # Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug; specifically, it seems to be ignoring the nullable nature of the A->B relationship. If, for example, you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and Django would use a left join for it. You should probably raise this in the Django issue tracker. You could also try using prefetch_related instead of select_related, which might get around your issue.
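The filtering effect of that INNER JOIN can be reproduced with the standard-library sqlite3 module. The tables below mimic the app_a/app_b/app_c tables from the question's SQL, with made-up rows: for an A without a B, the outer-joined app_b columns are NULL, and NULL never equals app_c.id, so the inner join drops that row, while a second LEFT OUTER JOIN keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_a (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE app_c (id INTEGER PRIMARY KEY, field2 TEXT);
-- multi-table inheritance: app_b's primary key is also its link to app_a
CREATE TABLE app_b (a_ptr_id INTEGER PRIMARY KEY, fk_field_id INTEGER NOT NULL);
INSERT INTO app_a (id, field1) VALUES (1, 'plain A'), (2, 'A that is a B');
INSERT INTO app_c (id, field2) VALUES (10, 'some C');
INSERT INTO app_b (a_ptr_id, fk_field_id) VALUES (2, 10);
""")

base = (
    "SELECT app_a.field1, app_b.fk_field_id, app_c.field2 FROM app_a "
    "LEFT OUTER JOIN app_b ON app_a.id = app_b.a_ptr_id "
)
# As generated in the question: for 'plain A' the app_b columns are NULL, and
# NULL = app_c.id is never true, so the INNER JOIN discards that row.
inner_rows = conn.execute(
    base + "INNER JOIN app_c ON app_b.fk_field_id = app_c.id ORDER BY app_a.id"
).fetchall()
# With both joins LEFT OUTER, every A row survives; missing values are None.
outer_rows = conn.execute(
    base + "LEFT OUTER JOIN app_c ON app_b.fk_field_id = app_c.id ORDER BY app_a.id"
).fetchall()
```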
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database every time a C object is accessed from the A object.
for a in the_as:
    # Note that this throws a DoesNotExist error if a doesn't have an
    # associated b
    print(a.b.fk_field.field2)  # Hits the database every time.
The hack to work around this is to get all of the C objects we need from the database in one query and then have each B object reference them manually. We can do this because the B objects retrieved by the first query already carry the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as]  # Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids)  # Run a query to get all of the needed C records
for c in the_cs:
    for a in the_as:
        if a.b.fk_field_id == c.pk:  # Throws DoesNotExist if no b is associated with a
            a.b.fk_field = c  # no break: several A objects may share the same C
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.
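One way to write the attachment step without the nested loop is to build a dictionary from primary key to C object once, then make a single pass over the A objects. This also attaches a C that is shared by several A objects to every one of them. The sketch uses SimpleNamespace stand-ins (hypothetical, not Django model instances) so that it runs on its own:

```python
from types import SimpleNamespace

# Hypothetical stand-ins: each "a" has a "b" whose fk_field_id points at a C
# primary key, as if loaded with A.objects.select_related('b').
the_as = [
    SimpleNamespace(b=SimpleNamespace(fk_field_id=10, fk_field=None)),
    SimpleNamespace(b=SimpleNamespace(fk_field_id=20, fk_field=None)),
    SimpleNamespace(b=SimpleNamespace(fk_field_id=10, fk_field=None)),  # shares C 10
]
# Stand-in for C.objects.filter(pk__in=c_ids): one object per distinct id.
the_cs = [SimpleNamespace(pk=10, field2="ten"), SimpleNamespace(pk=20, field2="twenty")]

# Build the pk -> C lookup once ...
cs_by_pk = {c.pk: c for c in the_cs}

# ... then attach in a single pass over the A objects; .get() leaves None
# if a C id is missing from the fetched set.
for a in the_as:
    a.b.fk_field = cs_by_pk.get(a.b.fk_field_id)
```

With real model instances the same two steps apply; passing set(c_ids) to the pk__in filter also avoids sending duplicate ids.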
I'd like to check for a particular object's existence within a ManyToMany relation. For instance:
class A(models.Model):
    members = models.ManyToManyField('B')
class B(models.Model):
    pass
results = [some query]
for r in results:
    print(r.has_object)  # True if object is related to some B of pk=1
My first stab at [some query] was A.objects.all().annotate(Count(has_object='members__id=1')) but it looks like I can't put anything more than the field name into the argument to Count. Is there some other way to do this?
You can try
A.objects.filter(members__id=1).exists()
I'm pretty sure there won't be any decently performing way to do this in pure Python until many-to-many prefetching gets implemented in 1.4.
In the meantime, this is how I'd do it by dropping down into SQL:
results = A.objects.all().extra(
select={
'has_object': 'EXISTS(SELECT * FROM myapp_a_members WHERE a_id=myapp_a.id AND b_id=1)'
}
)
Of course, the simpler way would be to refactor your code to operate on two separate querysets:
results_with_member_1 = A.objects.filter(members__id=1)
results_without_member_1 = A.objects.exclude(members__id=1)
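The extra(select=...) approach adds a correlated EXISTS subquery as a computed column, so the flag is computed for every A row in one query. The same SQL can be tried with sqlite3 from the standard library; the table names follow the myapp_a / myapp_a_members naming in that answer, and the rows are invented. SQLite returns the EXISTS result as 1 or 0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myapp_a (id INTEGER PRIMARY KEY);
CREATE TABLE myapp_b (id INTEGER PRIMARY KEY);
-- mirrors the join table behind A.members
CREATE TABLE myapp_a_members (id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
INSERT INTO myapp_a (id) VALUES (1), (2), (3);
INSERT INTO myapp_b (id) VALUES (1), (2);
INSERT INTO myapp_a_members (a_id, b_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# One query: the correlated EXISTS subquery is evaluated for each A row.
rows = conn.execute(
    "SELECT myapp_a.id, "
    "EXISTS(SELECT * FROM myapp_a_members "
    "WHERE a_id = myapp_a.id AND b_id = ?) AS has_object "
    "FROM myapp_a ORDER BY myapp_a.id",
    (1,),
).fetchall()
has_object = {a_id: bool(flag) for a_id, flag in rows}
```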
Say I have a model:
class Foo(models.Model):
    ...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    foo = models.ForeignKey(Foo, on_delete=models.CASCADE)
    ...
    class Meta:
        unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND userfoo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for, but since this question is the first result on Google when searching for "django annotate outer join", I will post it here.
Note: tested on Django 1.7
Suppose you have the following models
class User(models.Model):
    name = models.CharField()
class EarnedPoints(models.Model):
    points = models.PositiveIntegerField()
    user = models.ForeignKey(User, related_name="earned_points", on_delete=models.CASCADE)
To get total user points you might do something like this:
from django.db.models import Sum
User.objects.annotate(points=Sum("earned_points__points"))
This will work, but it will not return users who have no points; here we need an outer join without any direct hacks or raw SQL.
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into a LEFT OUTER JOIN and all users will be returned. Users who have no points will have None in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not necessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
    select={"user_comment": "UserFoo.comment"},
    where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
    params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extra(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourselves.
connection = (
    Foo._meta.db_table, UserFoo._meta.db_table,  # JOIN these tables
    "id", "foo_id",                              # on these fields
)
results.query.join(  # modify the query
    connection,      # with this table connection
    promote=True,    # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print(results[0].user_comment)
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem and was unable to solve it without resorting to raw SQL, but I did not want to rewrite the entire query.
The following describes how you can augment a queryset with external raw SQL, without having to care about the actual query that generates the queryset.
Here's a typical scenario: you have a Reddit-like site with a LinkPost model and a UserPostVote model, like this:
class LinkPost(models.Model):
    # some fields...
    pass
class UserPostVote(models.Model):
    user = models.ForeignKey(User, related_name="post_votes", on_delete=models.CASCADE)
    post = models.ForeignKey(LinkPost, related_name="user_votes", on_delete=models.CASCADE)
    value = models.IntegerField(null=False, default=0)
where the userpostvote table collects the users' votes on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
So now you have a Page, posts_page, generated by the Django paginator, whose object_list is the QuerySet that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it gets tricky, and I was unable to find a clean ORM solution. select_related won't let you fetch only the votes belonging to the logged-in user, looping over the posts would issue a bunch of queries instead of one, and doing it all in raw SQL means we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query  # The query object of the queryset
q1_alias = q1.get_initial_alias()  # This forces the query object to generate its SQL
(q1str, q1param) = q1.sql_with_params()  # This gets the SQL for the query along with its
                                         # parameters, which are none in this example
We now have the query for the queryset, and can just wrap it, alias it, and LEFT OUTER JOIN to it:
q2_augment = (
    "SELECT B.value AS uservote, A.* "
    "FROM (" + q1str + ") A LEFT OUTER JOIN reddit_userpostvote B "
    "ON A.id = B.post_id AND B.user_id = %s"
)
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment, q1param + q2param)
Voilà! Now we can access post.uservote for each post in the augmented queryset.
And we just hit the database with a single query.
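The wrapping trick can be exercised without Django using sqlite3. In this sketch the inner query and its parameters are a hand-written stand-in for what sql_with_params() returns (an assumption, not real Django output), the tables and rows are invented, and sqlite3 uses ? where Django's raw() uses %s. The key detail is parameter order: the inner query's placeholders appear first in the combined SQL, so q1param has to come before q2param.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reddit_linkpost (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reddit_userpostvote (id INTEGER PRIMARY KEY, user_id INTEGER,
                                  post_id INTEGER, value INTEGER NOT NULL DEFAULT 0);
INSERT INTO reddit_linkpost (id, title) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
INSERT INTO reddit_userpostvote (user_id, post_id, value)
    VALUES (7, 1, 1), (8, 2, -1), (7, 3, -1);
""")

# Hand-written stand-in for q1.sql_with_params(): a "page" of two posts.
q1str = "SELECT id, title FROM reddit_linkpost ORDER BY id LIMIT ? OFFSET ?"
q1param = (2, 0)

# Wrap the inner query as derived table A and LEFT OUTER JOIN the votes of
# one user (user_id 7 plays the role of request.user.id).
q2_augment = (
    "SELECT B.value AS uservote, A.* "
    "FROM (" + q1str + ") A "
    "LEFT OUTER JOIN reddit_userpostvote B "
    "ON A.id = B.post_id AND B.user_id = ?"
)
q2param = (7,)

# The inner placeholders come first in the SQL text, so their values come first.
rows = conn.execute(q2_augment, q1param + q2param).fetchall()
```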
The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
    # Key by foo_id so each Foo can look up its own UserFoo.
    userfoos = dict((i.foo_id, i) for i in UserFoo.objects.filter(foo__in=qs))
    for i in qs:
        i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).transform(userfoo_retriever):
    print(foo.userfoo)
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
As for outer joins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
from django.db.models import F, Q
Foo.objects.filter(
    Q(userfoo__user=request.user) |
    Q(userfoo=None)  # This forces the use of LEFT OUTER JOIN.
).annotate(
    comment=F('userfoo__comment'),
    # ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
from django.db.models import Case, Q, When
Foo.objects.filter(
    Q(userfoo__isnull=True) | Q(userfoo__isnull=False)
).annotate(bar=Case(
    When(userfoo__user=request.user, then='userfoo__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
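The difference between putting the user condition in the ON clause and in the WHERE clause is easy to see with sqlite3 from the standard library. The foo/userfoo tables and rows below are made up to mirror the question: foo 1 has a row for user 1, foo 2 only has a row for user 2, and foo 3 has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY);
CREATE TABLE userfoo (id INTEGER PRIMARY KEY, user_id INTEGER, foo_id INTEGER, comment TEXT);
INSERT INTO foo (id) VALUES (1), (2), (3);
INSERT INTO userfoo (user_id, foo_id, comment) VALUES (1, 1, 'mine'), (2, 2, 'theirs');
""")

base = (
    "SELECT foo.id, userfoo.comment FROM foo "
    "LEFT OUTER JOIN userfoo ON foo.id = userfoo.foo_id"
)

# Condition in ON: every foo is kept; comment is NULL where user 1 has no row.
on_rows = conn.execute(
    base + " AND userfoo.user_id = ? ORDER BY foo.id", (1,)
).fetchall()

# Same condition in WHERE (even allowing NULL): foo 2 disappears, because it
# did join to a userfoo row, just one belonging to the wrong user.
where_rows = conn.execute(
    base + " WHERE userfoo.user_id IS NULL OR userfoo.user_id = ? ORDER BY foo.id", (1,)
).fetchall()
```

The WHERE version loses foo 2, which is exactly the row the question wants to keep with a NULL comment.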
maparent's comment put me on the right track:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
    if isinstance(alias, Join):
        alias.nullable = True
qs.query.promote_joins(qs.query.tables)
I'm curious if there's any way to do a query in Django that's not a "SELECT * FROM..." underneath. I'm trying to do a "SELECT DISTINCT columnName FROM ..." instead.
Specifically I have a model that looks like:
class ProductOrder(models.Model):
    Product = models.CharField(max_length=20, primary_key=True)
    Category = models.CharField(max_length=30)
    Rank = models.IntegerField()
where the Rank is a rank within a Category. I'd like to be able to iterate over all the Categories doing some operation on each rank within that category.
I'd like to first get a list of all the categories in the system and then query for all products in that category and repeat until every category is processed.
I'd rather avoid raw SQL, but if I have to go there, that'd be fine. Though I've never coded raw SQL in Django/Python before.
One way to get the list of distinct values in a column from the database is to use distinct() in conjunction with values().
In your case you can do the following to get the names of distinct categories:
q = ProductOrder.objects.values('Category').distinct()
print(q.query)  # See for yourself.
# The query would look something like
# SELECT DISTINCT "app_productorder"."category" FROM "app_productorder"
There are a couple of things to remember here. First, this will return a ValuesQuerySet, which behaves differently from a QuerySet: when you access, say, the first element of q (above), you'll get a dictionary, NOT an instance of ProductOrder.
Second, it would be a good idea to read the warning note in the docs about using distinct(). The above example will work but all combinations of distinct() and values() may not.
PS: it is a good idea to use lower case names for fields in a model. In your case this would mean rewriting your model as shown below:
class ProductOrder(models.Model):
    product = models.CharField(max_length=20, primary_key=True)
    category = models.CharField(max_length=30)
    rank = models.IntegerField()
It's quite simple actually if you're using PostgreSQL: just use distinct(*fields) (see the documentation).
ProductOrder.objects.all().distinct('category')
Note that this feature has been included in Django since 1.4
Use order_by() on that field, and then call distinct():
ProductOrder.objects.order_by('category').values_list('category', flat=True).distinct()
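Ordering by the same field matters because fields used for ordering (including a model's default Meta.ordering, if it has one) can end up in the SELECT list, and DISTINCT then applies to all selected columns. A small sqlite3 sketch with made-up rows shows the effect at the SQL level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_productorder (product TEXT PRIMARY KEY, category TEXT, rank INTEGER)"
)
conn.executemany(
    "INSERT INTO app_productorder VALUES (?, ?, ?)",
    [("p1", "tools", 1), ("p2", "tools", 2), ("p3", "toys", 1)],
)

# DISTINCT over just the category column: one row per category.
only_category = conn.execute(
    "SELECT DISTINCT category FROM app_productorder ORDER BY category"
).fetchall()

# If an ordering column (here rank) is also selected, DISTINCT applies to the
# (category, rank) pair and categories repeat.
with_rank = conn.execute(
    "SELECT DISTINCT category, rank FROM app_productorder ORDER BY rank"
).fetchall()
```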
The other answers are fine, but this is a little cleaner, in that it only gives the values like you would get from a DISTINCT query, without any cruft from Django.
>>> set(ProductOrder.objects.values_list('category', flat=True))
{u'category1', u'category2', u'category3', u'category4'}
or
>>> list(set(ProductOrder.objects.values_list('category', flat=True)))
[u'category1', u'category2', u'category3', u'category4']
And, it works without PostgreSQL.
This is less efficient than using .distinct(), presuming that DISTINCT in your database is faster than a Python set, but it's great for noodling around in the shell.
Update:
This answer is great for making queries in the Django shell during development. DO NOT use this solution in production unless you are absolutely certain that you will always have a trivially small number of results before the set is applied. Otherwise, it's a terrible idea from a performance standpoint.