Checking for object's existence in ManyToMany relation (Django) - django

I'd like to check for a particular object's existence within a ManyToMany relation. For instance:
class A(models.Model):
    members = models.ManyToManyField('B')  # string reference, since B is defined below
class B(models.Model):
    pass
results = [some query]
for r in results:
    print(r.has_object)  # True if object is related to some B of pk=1
My first stab at [some query] was A.objects.all().annotate(Count(has_object='members__id=1')) but it looks like I can't put anything more than the field name into the argument to Count. Is there some other way to do this?

You can try
A.objects.filter(members__id=1).exists()

I'm pretty sure there won't be any decently performing way to do this in pure Python until many-to-many prefetching gets implemented in 1.4.
In the meantime, this is how I'd do it by dropping down into SQL:
results = A.objects.all().extra(
    select={
        'has_object': 'EXISTS(SELECT * FROM myapp_a_members WHERE a_id=myapp_a.id AND b_id=1)'
    }
)
Of course, the simpler way would simply be to refactor your code to operate on two separate querysets:
results_with_member_1 = A.objects.filter(members__id=1)
results_without_member_1 = A.objects.exclude(members__id=1)
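To see what the EXISTS subquery above computes per row, here is a minimal sketch that runs the same SQL against an in-memory SQLite database. The table and column names follow the answer (myapp_a, myapp_a_members, a_id, b_id); the sample rows are made up for illustration, and the exact table names Django generates depend on your app label.

```python
import sqlite3

# In-memory stand-ins for the tables Django would create for A and A.members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_a (id INTEGER PRIMARY KEY);
    CREATE TABLE myapp_a_members (
        id INTEGER PRIMARY KEY, a_id INTEGER, b_id INTEGER);
    INSERT INTO myapp_a (id) VALUES (1), (2), (3);
    -- A#1 is related to B#1 and B#2, A#2 only to B#2, A#3 to nothing.
    INSERT INTO myapp_a_members (a_id, b_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# One row per A, with a 0/1 flag saying whether it is related to B of pk=1.
rows = conn.execute("""
    SELECT myapp_a.id,
           EXISTS(SELECT 1 FROM myapp_a_members
                  WHERE a_id = myapp_a.id AND b_id = ?) AS has_object
    FROM myapp_a
    ORDER BY myapp_a.id
""", (1,)).fetchall()

for a_id, has_object in rows:
    print(a_id, bool(has_object))
```

Every A row is kept and only the flag differs, which is what distinguishes this from filter(members__id=1).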

Related

Using django select_related with an additional filter

I'm trying to find an optimal way to execute a query, but got myself confused with the prefetch_related and select_related use cases.
I have a 3-table foreign key relationship: A has 1-to-many B, and B has 1-to-many C.
class A(models.Model):
    ...
class B(models.Model):
    a = models.ForeignKey(A)
class C(models.Model):
    b = models.ForeignKey(B)
    data = models.TextField(max_length=50)
I'm trying to get a list of all C.data for all instances of A that match a criteria (an instance of A and all its children), so I have something like this:
qs1 = A.objects.all().filter(Q(id=12345)|Q(parent_id=12345))
qs2 = C.objects.select_related('b__a').filter(b__a__in=qs1)  # lookups use the lowercase field names b and a
But I'm wary of the prefetch_related docs, which state that:
any subsequent chained methods which imply a different database query
will ignore previously cached results, and retrieve data using a fresh
database query
I don't know if that applies here (because I'm using select_related), but reading it makes it seem as if anything gained from doing select_related is lost as soon as I do the filter.
Is my two-part query as optimal as it can be? I don't think I need prefetch as far as I'm aware, although I noticed I can swap out select_related with prefetch_related and get the same result.
I think your question is driven by a misconception. select_related (and prefetch_related) are an optimisation, specifically for returning values in related models along with the original query. They are never required.
What's more, neither has any impact at all on filter. Django will automatically do the relevant joins and subqueries in order to make your query, whether or not you use select_related.
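As a rough illustration of that last point, the two-part query boils down to a single SQL statement with a join and a subquery, and nothing in it depends on select_related. The sketch below runs that shape of SQL on an in-memory SQLite database. The table names (app_a, app_b, app_c), the parent_id column, and the sample rows are assumptions for the example, and the SQL Django actually emits may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_a (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE app_b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE TABLE app_c (id INTEGER PRIMARY KEY, b_id INTEGER, data TEXT);
    -- A#12345 is the root, A#2 is its child, A#3 is unrelated.
    INSERT INTO app_a VALUES (12345, NULL), (2, 12345), (3, NULL);
    INSERT INTO app_b VALUES (10, 12345), (20, 2), (30, 3);
    INSERT INTO app_c VALUES
        (100, 10, 'root'), (200, 20, 'child'), (300, 30, 'other');
""")

# The filter on b__a__in=qs1 becomes a join to B plus a subquery over A.
data = [row[0] for row in conn.execute("""
    SELECT app_c.data
    FROM app_c
    INNER JOIN app_b ON app_c.b_id = app_b.id
    WHERE app_b.a_id IN (
        SELECT id FROM app_a WHERE id = ? OR parent_id = ?)
    ORDER BY app_c.id
""", (12345, 12345))]
print(data)
```

If only C.data is needed, select_related adds nothing here; it would matter only if the code later read attributes of the related B or A objects.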

Get information from a model using unrelated field

I have these two models:
class A(models.Model):
    name = models.CharField(max_length=10)
class D(models.Model):
    code = models.IntegerField()
The code field can hold a number that exists in model A, but the two can't be related due to other factors. What I want now is to list the items from A whose value is the same as code.
items=D.objects.values('code__name')
would work but since they are not related nor can be related, how can I handle that?
You can use Subquery() expressions in Django 1.11 or newer.
from django.db.models import OuterRef, Subquery
code_subquery = A.objects.filter(id=OuterRef('code'))
qs = D.objects.annotate(code_name=Subquery(code_subquery.values('name')))
The output of qs is a queryset of objects D with an added field code_name.
Footnotes:
It is compiled to SQL very similar to that of Bear Brown's solution with the extra() method, but without the disadvantages of that solution (see there):
SELECT app_d.id, app_d.code,
(SELECT U0.name FROM app_a U0 WHERE U0.id = (app_d.code)) AS code_name
FROM app_d
If dictionary output is required, it can be converted with .values() at the end. It can work like a left join: if the pseudo-related field allows null (code = models.IntegerField(null=True)), the D objects are not restricted and the output code_name value can be None. A limitation of Subquery is that it returns only one field, so the expression must be repeated for each additional field. (That is similar to extra(select={...: "SELECT ..."}), but thanks to the object syntax it can be customized more readably than explicit SQL.)
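The left-join-like behaviour can be checked by running the compiled SQL from the footnote directly. This is a minimal sketch on an in-memory SQLite database; the table names app_a and app_d come from the SQL above, while the sample rows (including a code with no match and a NULL code) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_d (id INTEGER PRIMARY KEY, code INTEGER);
    INSERT INTO app_a VALUES (1, 'alpha'), (2, 'beta');
    -- D#3 points at a missing A, D#4 has no code at all.
    INSERT INTO app_d VALUES (1, 1), (2, 2), (3, 99), (4, NULL);
""")

# Correlated scalar subquery: one lookup in app_a per row of app_d.
rows = conn.execute("""
    SELECT app_d.id, app_d.code,
           (SELECT U0.name FROM app_a U0 WHERE U0.id = app_d.code) AS code_name
    FROM app_d
    ORDER BY app_d.id
""").fetchall()

for row in rows:
    print(row)
```

All four D rows come back; the ones without a matching A simply get None for code_name instead of being dropped.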
You can use Django's extra(); replace YOUAPP with your real app name:
D.objects.extra(select={'a_name': 'select name from YOUAPP_a where id=code'}).values('a_name')
# Replace YOUAPP in the table name above

django orm - How to use select_related() on the Foreign Key of a Subclass from its Super Class

I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
    field1 = models.CharField(max_length=255)
class B(A):
    fk_field = models.ForeignKey('C')
class C(models.Model):
    field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
    print(a.b.fk_field.field2)  # Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
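The effect of the two join types is easy to reproduce outside Django. The sketch below builds the three tables from the query above in an in-memory SQLite database (the sample rows are made up) and runs the same SELECT once with INNER JOIN and once with LEFT OUTER JOIN on app_c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_c (id INTEGER PRIMARY KEY, field2 TEXT);
    CREATE TABLE app_a (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE app_b (a_ptr_id INTEGER PRIMARY KEY, fk_field_id INTEGER);
    INSERT INTO app_c VALUES (1, 'c1');
    -- A#1 has a B subclass row, A#2 is a plain A with no B.
    INSERT INTO app_a VALUES (1, 'has b'), (2, 'plain a');
    INSERT INTO app_b VALUES (1, 1);
""")

QUERY = """
    SELECT app_a.field1, app_b.fk_field_id, app_c.field2
    FROM app_a
    LEFT OUTER JOIN app_b ON app_a.id = app_b.a_ptr_id
    {join} app_c ON app_b.fk_field_id = app_c.id
    ORDER BY app_a.id
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
outer_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

print(len(inner_rows), len(outer_rows))
```

The INNER JOIN discards the plain A row because its fk_field_id is NULL after the first outer join, which matches the filtering seen with select_related("b", "b__fk_field").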
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug: specifically, it seems to be ignoring the nullable nature of the A->B relationship. If, for example, you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and Django would use a left join for it. You should probably raise this in the Django issue tracker. You could also try using prefetch_related instead of select_related, which might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database every time a C object is accessed from the A object.
for a in the_as:
    # Note that this throws a DoesNotExist error if a doesn't have an
    # associated b
    print(a.b.fk_field.field2)  # Hits the database every time.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as]  # Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids)  # Run a query to get all of the needed C records
for c in the_cs:
    for a in the_as:
        if a.b.fk_field_id == c.pk:  # Throws DoesNotExist if no b associated with a
            a.b.fk_field = c
            break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.
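One way to drop the nested loop is to index the C objects by primary key once and then do a dict lookup per B. The sketch below shows only that matching step, using small hypothetical stand-in classes (FakeB, FakeC) instead of real models so it runs without a database; attach_cs is likewise a name made up for this example.

```python
class FakeC(object):
    """Hypothetical stand-in for a C instance."""
    def __init__(self, pk):
        self.pk = pk


class FakeB(object):
    """Hypothetical stand-in for a B instance with only fk_field_id loaded."""
    def __init__(self, fk_field_id):
        self.fk_field_id = fk_field_id
        self.fk_field = None


def attach_cs(the_bs, the_cs):
    """Point each B at its C using one dict lookup per B."""
    cs_by_pk = {c.pk: c for c in the_cs}
    for b in the_bs:
        b.fk_field = cs_by_pk.get(b.fk_field_id)


# Two of the Bs share the same C.
the_bs = [FakeB(1), FakeB(2), FakeB(1)]
the_cs = [FakeC(1), FakeC(2)]
attach_cs(the_bs, the_cs)
print([b.fk_field.pk for b in the_bs])
```

Unlike the loop with break, this also covers several B records pointing at the same C. With real models, C.objects.in_bulk(c_ids) returns the same kind of pk-to-object dict in one query.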

Complex reverse query in Django

In a nutshell: my models are B --> A <-- C, I want to filter Bs where at least one C exists, satisfying some arbitrary conditions and related to the same A as that B. Help with some complicating factors (see below) is also appreciated.
Details:
I'm trying to create a generic model to limit user access to rows in other models. Here's a (simplified) example:
class CanRead(models.Model):
    user = models.ForeignKey(User)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')
class Direct(models.Model):
    ...
class Indirect(models.Model):
    direct = models.ForeignKey(Direct)
    ...
class Indirect2(models.Model):
    indirect = models.ForeignKey(Indirect)
    ...
It's not feasible to associate a CanRead to every row in every model (too costly in space), so only some models are expected to have that association (like Direct above). In this case, here's how I'd see if a Direct is accessible to a user or not:
Direct.objects.filter(Q(canread__user=current_user), rest_of_query)
(Unfortunately, this query won't work - in 1.2.5 at least - because of the generic fk; any help with this would be appreciated, but there are workarounds, the real issue is what follows next)
The others' accessibility will be dictated by their relations with other models. So, Indirect will be accessible to a user if direct is accessible, and Indirect2 will be if indirect__direct is, etc.
My problem is, how can I do this query? I'm tempted to write something like:
Indirect.objects.filter(Q(canread__content_object=F('direct'), canread__user=current_user), rest_of_query)
Indirect2.objects.filter(Q(canread__content_object=F('indirect__direct'), canread__user=current_user), rest_of_query)
but that doesn't work (Django expects a relation between CanRead and Indirect, which doesn't exist, for the reverse query to work). If I were writing it directly in SQL, I would do something like:
SELECT *
FROM indirect i
JOIN direct d ON i.direct = d.id
JOIN canread c ON c.object_id = d.id
WHERE
c.content_type = <<content type for Direct>> AND
c.user = <<current user>> AND
<<rest_of_query>>
but I can't translate that query to Django. Is it possible? If not, what would be the least intrusive way of doing it (using as little raw SQL as possible)?
Thanks for your time!
Note: The workaround mentioned would be not to use generic fk... :( I could discard the CanRead model and have many CanReadDirect, CanReadDirect2, CanReadDirect3, etc. It's a minor hassle, but wouldn't hinder my project too much.
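For reference, the SQL sketched in the question can be tried as-is on an in-memory SQLite database. The table and column names below follow that SQL (indirect, direct, canread); the integer content type ids, the user ids, and the sample rows are placeholders invented for the example, and the rest_of_query part is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE direct (id INTEGER PRIMARY KEY);
    CREATE TABLE indirect (id INTEGER PRIMARY KEY, direct INTEGER);
    CREATE TABLE canread (
        id INTEGER PRIMARY KEY, user INTEGER,
        content_type INTEGER, object_id INTEGER);
    INSERT INTO direct VALUES (1), (2);
    INSERT INTO indirect VALUES (10, 1), (11, 1), (20, 2);
    -- content_type 7 stands for Direct, 8 for some other model.
    INSERT INTO canread (user, content_type, object_id)
        VALUES (42, 7, 1), (42, 8, 2), (43, 7, 2);
""")

DIRECT_CONTENT_TYPE = 7  # placeholder for the ContentType id of Direct
CURRENT_USER = 42        # placeholder for the current user's id

# Indirect rows whose Direct parent has a CanRead entry for this user.
readable = [row[0] for row in conn.execute("""
    SELECT i.id
    FROM indirect i
    JOIN direct d ON i.direct = d.id
    JOIN canread c ON c.object_id = d.id
    WHERE c.content_type = ? AND c.user = ?
    ORDER BY i.id
""", (DIRECT_CONTENT_TYPE, CURRENT_USER))]
print(readable)
```

The content type condition matters: user 42 also has a CanRead row with object_id 2, but it belongs to a different model, so Indirect 20 is correctly excluded.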
For the simple case you've given, the solution is simple:
B.objects.filter(a__c__isnull=False)
For the actual query, here's my try:
Indirect.objects.filter(direct__id__in=list(
    CanRead.objects.filter(
        content_type=ContentType.objects.get_for_model(Direct)
    ).values_list('object_id', flat=True)))  # object_id holds the Direct pk; list() evaluates the IDs first
But this way is very slow: you extract IDs from one queryset, then do a query with
where id in (1, 2, 3, ... 10000)
Which is VERY SLOW. We had a similar issue with joins on generic foreign keys in our project and decided to resort to raw queries in the model manager.
class DirectManager(Manager):
    def can_edit(self, user):
        return self.raw(...)
I'd also recommend checking out the per-row permissions framework in Django 1.3.
Access control models are not that simple. Use a well-known access control model such as DAC/MAC or RBAC. There is also a project called django-rbac.

django: select_related with entry_set

Should entry_set be cached with select_related? My DB is still getting calls even after I use select_related. The pertinent sections:
class Alias(models.Model):
    achievements = models.ManyToManyField('Achievement', through='Achiever')
    def points(self):
        points = 0
        for a in self.achiever_set.all():
            points += a.achievement.points * a.count
        return points
class Achievement(models.Model):
    name = models.CharField(max_length=100)
    points = models.IntegerField(default=1)
class Achiever(models.Model):
    achievement = models.ForeignKey(Achievement)
    alias = models.ForeignKey(Alias)
    count = models.IntegerField(default=1)
aliases = Alias.objects.all().select_related()
for alias in aliases:
    print("points : %s" % alias.points())
    for a in alias.achiever_set.all()[:5]:
        print("%s x %d" % (a.achievement.name, a.count))
And I'm seeing a big join query at the start, and then individual calls for each achievement. Both for the points and for the name lookup.
Is this a bug, or am I doing something wrong?
With Django 1.4 you can use prefetch_related which will work for ManyToMany relations:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related
select_related() doesn't work with ManyToManyFields. At the moment this is not planned, but it might be a future feature. See http://code.djangoproject.com/ticket/6432
In this case, if you want to make a single query you got two options
1) Make your own SQL, probably won't be pretty or fast.
2) You could also query on the model with the foreign key. You would be able to use select_related in that case. You still won't be able to access the modelname_set, but with some formatting you would be able to get the data you need in a single query. Neither option is ideal, but you could get it working at a decent speed as well.
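To make option 2 concrete for the points() case, here is a rough sketch of the kind of single query you get by starting from the model that holds the foreign keys (Achiever) and joining to Achievement. It runs on an in-memory SQLite database; the table names (app_alias, app_achievement, app_achiever) and sample rows are assumptions for the example, not what Django necessarily generates for your app.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_alias (id INTEGER PRIMARY KEY);
    CREATE TABLE app_achievement (
        id INTEGER PRIMARY KEY, name TEXT, points INTEGER);
    CREATE TABLE app_achiever (
        id INTEGER PRIMARY KEY, achievement_id INTEGER,
        alias_id INTEGER, count INTEGER);
    INSERT INTO app_alias VALUES (1), (2);
    INSERT INTO app_achievement VALUES (1, 'first post', 1), (2, 'veteran', 5);
    INSERT INTO app_achiever (achievement_id, alias_id, count)
        VALUES (1, 1, 3), (2, 1, 2), (1, 2, 4);
""")

# One query: total points per alias, summed over the through table.
totals = dict(conn.execute("""
    SELECT r.alias_id, SUM(t.points * r.count)
    FROM app_achiever r
    INNER JOIN app_achievement t ON r.achievement_id = t.id
    GROUP BY r.alias_id
"""))
print(totals)
```

This replaces one query per achiever row with a single grouped query, at the cost of getting plain numbers back instead of model instances.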
In Django 1.3 you can use QuerySet.values() and do something like:
Alias.objects[.filter().exclude() etc.].values('achievements__name', 'achievements__points')
The only drawback is that you get a ValuesQuerySet (dictionaries rather than model instances) instead of a QuerySet. But this can be overcome simply by passing all the necessary fields into values(); you just have to change your perception ;)
This can save you a few dozen queries...
Details can be found here in django docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.values