In a nutshell: my models are B --> A <-- C, I want to filter Bs where at least one C exists, satisfying some arbitrary conditions and related to the same A as that B. Help with some complicating factors (see below) is also appreciated.
Details:
I'm trying to create a generic model to limit user access to rows in other models. Here's a (simplified) example:
class CanRead(models.Model):
user = models.ForeignKey(User)
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = generic.GenericForeignKey('content_type', 'object_id')
class Direct(models.Model):
...
class Indirect(models.Model):
direct = models.ForeignKey(Direct)
...
class Indirect2(models.Model):
indirect = models.ForeignKey(Indirect)
...
It's not feasible to associate a CanRead to every row in every model (too costly in space), so only some models are expected to have that association (like Direct above). In this case, here's how I'd see if a Direct is accessible to a user or not:
Direct.objects.filter(Q(canread__user=current_user), rest_of_query)
(Unfortunately, this query won't work - in 1.2.5 at least - because of the generic fk; any help with this would be appreciated, but there are workarounds, the real issue is what follows next)
The others' accessibility will be dictated by their relations with other models. So, Indirect will be accessible to an user if direct is accessible, and Indirect2 will be if indirect__direct is, etc.
My problem is, how can I do this query? I'm tempted to write something like:
Indirect.objects.filter(Q(canread__content_object=F('direct'), canread__user=current_user), rest_of_query)
Indirect2.objects.filter(Q(canread__content_object=F('indirect__direct'), canread__user=current_user), rest_of_query)
but that doesn't work (Django expects a relation between CanRead and Indirect - which doesn't exist - for the reverse query to work). If I were writing it directy in SQL, I would do something like:
SELECT *
FROM indirect i
JOIN direct d ON i.direct = d.id
JOIN canread c ON c.object_id = d.id
WHERE
c.content_type = <<content type for Direct>> AND
c.user = <<current user>> AND
<<rest_of_query>>
but I can't translate that query to Django. Is it possible? If not, what would be the least instrusive way of doing it (using as little raw SQL as possible)?
Thanks for your time!
Note: The workaround mentioned would be not to use generic fk... :( I could discard the CanRead model and have many CanReadDirect, CanReadDirect2, CanReadDirect3, etc. It's a minor hassle, but wouldn't hinder my project too much.
For the simple case you've given, the solution is simple:
B.objects.filter(a__c__isnull=False)
For the actual query, here's my try:
Indirect.objects.filter(direct__id__in=
zip(*CanRead.objects.filter(
content_type=ContentType.objects.get_for_model(Direct)
).values_list('id'))[0])
But this way is very slow: you extract IDs from one queryset, then do a query with
where id in (1, 2, 3, ... 10000)
Which is VERY SLOW. We had a similar issue with joins on generic foreign keys in our project and decided to resort to raw queries in the model manager.
class DirectManager(Manager):
def can_edit(self, user):
return self.raw(...)
I'd also recommend checking out the per-row permissions framework in Django 1.3.
access control models are not that simple...
use a well-known access control model such as:
DAC/MAC
or
RBAC
also there is a project called django-rbac.
Related
I am working on converting some relatively complex SQL into something that Django can play with. I am trying not to just use the raw SQL, since I think playing with the standard Django toolkit will help me learn more about Django.
I have already managed to break up parts of the sql into chunks, and am tackling them piecemeal to make things a little easier.
Here is the SQL in question:
SELECT i.year, i.brand, i.desc, i.colour, i.size, i.mpn, i.url,
COALESCE(DATE_FORMAT(i_eta.eta, '%M %Y'),'Unknown')
as eta
FROM i
JOIN i_eta ON i_eta.mpn = i.mpn
WHERE category LIKE 'kids'
ORDER BY i.brand, i.desc, i.colour, FIELD(size, 'xxl','xl','l','ml','m','s','xs','xxs') DESC, size+0, size
Here is what I have (trying to convert line by line):
(grabbed automatically when performing filters)
(have to figure out django doc on coalesce for syntax)
db alias haven't been able to find yet - it is crucial since there is a db view that requires it
already included in the original q
.select_related?
.filter(category="kids")
.objects.order_by('brand','desc','colour') - don't know how to deal with SQL FIELDS
Any advice would be appreciated!
Here's how I would structure this.
First, I'm assuming your models for i and i_eta look something like this:
class I(models.Model):
mpn = models.CharField(max_length=30, primary_key=True)
year = models.CharField(max_length=30)
brand = models.CharField(max_length=30)
desc = models.CharField(max_length=100)
colour = models.CharField(max_length=30)
size = models.CharField(max_length=3)
class IEta(models.Model):
i = models.ForeignKey(I, on_delete=models.CASCADE)
eta = models.DateField()
General thoughts:
To write the coalesce in Django: I would not replace nulls with "Unknown" in the ORM. This is a presentation-layer concern: it should be dealt with in a template.
For date formatting, you can do date formatting in Python.
Not sure what a DB alias is.
For using multiple tables together, you can use either select_related(), prefetch_related(), or do nothing.
select_related() will perform a join.
prefect_related() will get the foreign key ID's from the first queryset, then generate a query like SELECT * FROM table WHERE id in (12, 13, 14).
Doing nothing will work automatically, but has the disadvantage of the SELECT N+1 problem.
I generally prefer prefetch_related().
For customizing the sort order of the size field, you have three options. My preference would be option 1, but any of the three will work.
Denormalize the sort criteria. Add a new field called size_numeric. Override the save() method to populate this field when saving new instances, giving xxl the value 1, xl the value 2, etc.
Sort in Python. Essentially, you use Python's built-in sorting methods to do the sort, rather than sorting it in the database. This doesn't work well if you have thousands of results.
Invoke the MySQL function. Essentially, using annotate(), you add the output of a function to the queryset. order_by() can sort by that function.
I'm trying to find an optimal way to execute a query, but got myself confused with the prefetch_related and select_related use cases.
I have a 3 table foreign key relationship: A -> has 1-many B h-> as 1-many C.
class A(models.model):
...
class B(models.model):
a = models.ForeignKey(A)
class C(models.model):
b = models.ForeignKey(B)
data = models.TextField(max_length=50)
I'm trying to get a list of all C.data for all instances of A that match a criteria (an instance of A and all its children), so I have something like this:
qs1 = A.objects.all().filter(Q(id=12345)|Q(parent_id=12345))
qs2 = C.objects.select_related('B__A').filter(B__A__in=qs1)
But I'm wary of the (Prefetch docs stating that:
any subsequent chained methods which imply a different database query
will ignore previously cached results, and retrieve data using a fresh
database query
I don't know if that applies here (because I'm using select_related), but reading it makes it seem as if anything gained from doing select_related is lost as soon as I do the filter.
Is my two-part query as optimal as it can be? I don't think I need prefetch as far as I'm aware, although I noticed I can swap out select_related with prefetch_related and get the same result.
I think your question is driven by a misconception. select_related (and prefetch_related) are an optimisation, specifically for returning values in related models along with the original query. They are never required.
What's more, neither has any impact at all on filter. Django will automatically do the relevant joins and subqueries in order to make your query, whether or not you use select_related.
I have a database schema similar to this:
class User(models.Model):
… (Some fields irrelevant for this query)
class UserNotifiy(models.Model):
user = models.ForeignKey(User)
target = models.ForeignKey(<Some other Model>)
notification_level = models.SmallPositivIntegerField(choices=(1,2,3))
Now I want to query for all Users that have a UserNotify object for a specific target and at least a specific notification level (e.g. 2).
If I do something like this:
User.objects.filter(usernotify__target=desired_target,
usernotify__notification_level__gte=2)
I get all Users that have a UserNotify object for the specified target and at least one UserNotify object with a notification_level greater or equal to 2. These two UserNotify objects, however, do not have to be identical.
I am aware that I can do something like this:
user_ids = UserNotify.objects.filter(target=desired_target,
notification_level__gte=2).values_list('user_id', flat=True)
users = User.objects.filter(id__in=user_ids).distinct()
But this seems a step too much for me and I believe it executes two queries.
Is there a way to solve my problem with a single query?
Actually I don't see how you can run the first query, given that usernotify is not a valid field name for User.
You should start from UserNotify as you did in your second example:
UserNotify.objects.filter(
target=desired_target,
notification_level__gte=2
).select_related('user').values('user').distinct()
I've been looking for this behaviour but I've never found a better way than the one you describe (creating a query for user ids and inject it in a User query). Note this is not bad since if your database support subqueries, your code should fire only one request composed by a query and a subquery.
However, if you just need a particular field from the User objects (for example first_name), you may try
qs = (UserNotify.objects
.filter(target=desired_target, notification_level__gte=2)
.values_list('user_id', 'user__first_name')
.order_by('user_id')
.distinct('user_id')
)
I am not sure if I understood your question, but:
class User(models.Model):
… (Some fields irrelevant for this query)
class UserNotifiy(models.Model):
user = models.ForeignKey(User, related_name="notifications")
target = models.ForeignKey(<Some other Model>)
notification_level = models.SmallPositivIntegerField(choices=(1,2,3))
Then
users = User.objects.select_related('notifications').filter(notifications__target=desired_target,
notifications__notification_level__gte=2).distinct('id')
for user in users:
notifications = [x for x in user.notifications.all()]
I don't have my vagrant box handy now, but I believe this should work.
Say I have a model that is
class Bottles(models.Model)
BottleCode = models.IntegerField()
class Labels(models.Model)
LabelCode = models.IntegerField()
How do I get a queryset of Bottles where the BottleCode and LabelCode are equal? (i.e. Bottles and Labels with no common Code are excluded)
It can be achieved via extra():
Bottles.objects.extra(where=["Bottles.BottleCode in (select Labels.LabelCode from Labels)"])
You may also need to add an app name prefix to the table names, e.g. app_bottles instead of bottles.
Though #danihp has a point here, if you would often encounter queries like these, when you are trying to relate unrelated models - you should probably think about changing your model design.
I've always found the Django orm's handling of subclassing models to be pretty spiffy. That's probably why I run into problems like this one.
Take three models:
class A(models.Model):
field1 = models.CharField(max_length=255)
class B(A):
fk_field = models.ForeignKey('C')
class C(models.Model):
field2 = models.CharField(max_length=255)
So now you can query the A model and get all the B models, where available:
the_as = A.objects.all()
for a in the_as:
print a.b.fk_field.field2 #Note that this throws an error if there is no B record
The problem with this is that you are looking at a huge number of database calls to retrieve all of the data.
Now suppose you wanted to retrieve a QuerySet of all A models in the database, but with all of the subclass records and the subclass's foreign key records as well, using select_related() to limit your app to a single database call. You would write a query like this:
the_as = A.objects.select_related("b", "b__fk_field").all()
One query returns all of the data needed! Awesome.
Except not. Because this version of the query is doing its own filtering, even though select_related is not supposed to filter any results at all:
set_1 = A.objects.select_related("b", "b__fk_field").all() #Only returns A objects with associated B objects
set_2 = A.objects.all() #Returns all A objects
len(set_1) > len(set_2) #Will always be False
I used the django-debug-toolbar to inspect the query and found the problem. The generated SQL query uses an INNER JOIN to join the C table to the query, instead of a LEFT OUTER JOIN like other subclassed fields:
SELECT "app_a"."field1", "app_b"."fk_field_id", "app_c"."field2"
FROM "app_a"
LEFT OUTER JOIN "app_b" ON ("app_a"."id" = "app_b"."a_ptr_id")
INNER JOIN "app_c" ON ("app_b"."fk_field_id" = "app_c"."id");
And it seems if I simply change the INNER JOIN to LEFT OUTER JOIN, then I get the records that I want, but that doesn't help me when using Django's ORM.
Is this a bug in select_related() in Django's ORM? Is there any work around for this, or am I simply going to have to do a direct query of the database and map the results myself? Should I be using something like Django-Polymorphic to do this?
It looks like a bug, specifically it seems to be ignoring the nullable nature of the A->B relationship, if for example you had a foreign key reference to B in A instead of the subclassing, that foreign key would of course be nullable and django would use a left join for it. You should probably raise this in the django issue tracker. You could also try using prefetch_related instead of select_related that might get around your issue.
I found a work around for this, but I will wait a while to accept it in hopes that I can get some better answers.
The INNER JOIN created by the select_related('b__fk_field') needs to be removed from the underlying SQL so that the results aren't filtered by the B records in the database. So the new query needs to leave the b__fk_field parameter in select_related out:
the_as = A.objects.select_related('b')
However, this forces us to call the database everytime a C object is accessed from the A object.
for a in the_as:
#Note that this throws an DoesNotExist error if a doesn't have an
#associated b
print a.b.fk_field.field2 #Hits the database everytime.
The hack to work around this is to get all of the C objects we need from the database from one query and then have each B object reference them manually. We can do this because the database call that accesses the B objects retrieved will have the fk_field_id that references their associated C object:
c_ids = [a.b.fk_field_id for a in the_as] #Get all the C ids
the_cs = C.objects.filter(pk__in=c_ids) #Run a query to get all of the needed C records
for c in the_cs:
for a in the_as:
if a.b.fk_field_id == c.pk: #Throws DoesNotExist if no b associated with a
a.b.fk_field = c
break
I'm sure there's a functional way to write that without the nested loop, but this illustrates what's happening. It's not ideal, but it provides all of the data with the absolute minimum number of database hits - which is what I wanted.