Django filter on two fields of the same foreign key object - django

I have a database schema similar to this:
class User(models.Model):
… (Some fields irrelevant for this query)
class UserNotifiy(models.Model):
user = models.ForeignKey(User)
target = models.ForeignKey(<Some other Model>)
notification_level = models.SmallPositivIntegerField(choices=(1,2,3))
Now I want to query for all Users that have a UserNotify object for a specific target and at least a specific notification level (e.g. 2).
If I do something like this:
User.objects.filter(usernotify__target=desired_target,
usernotify__notification_level__gte=2)
I get all Users that have a UserNotify object for the specified target and at least one UserNotify object with a notification_level greater or equal to 2. These two UserNotify objects, however, do not have to be identical.
I am aware that I can do something like this:
user_ids = UserNotify.objects.filter(target=desired_target,
notification_level__gte=2).values_list('user_id', flat=True)
users = User.objects.filter(id__in=user_ids).distinct()
But this seems a step too much for me and I believe it executes two queries.
Is there a way to solve my problem with a single query?

Actually I don't see how you can run the first query, given that usernotify is not a valid field name for User.
You should start from UserNotify as you did in your second example:
UserNotify.objects.filter(
target=desired_target,
notification_level__gte=2
).select_related('user').values('user').distinct()

I've been looking for this behaviour but I've never found a better way than the one you describe (creating a query for user ids and inject it in a User query). Note this is not bad since if your database support subqueries, your code should fire only one request composed by a query and a subquery.
However, if you just need a particular field from the User objects (for example first_name), you may try
qs = (UserNotify.objects
.filter(target=desired_target, notification_level__gte=2)
.values_list('user_id', 'user__first_name')
.order_by('user_id')
.distinct('user_id')
)

I am not sure if I understood your question, but:
class User(models.Model):
… (Some fields irrelevant for this query)
class UserNotifiy(models.Model):
user = models.ForeignKey(User, related_name="notifications")
target = models.ForeignKey(<Some other Model>)
notification_level = models.SmallPositivIntegerField(choices=(1,2,3))
Then
users = User.objects.select_related('notifications').filter(notifications__target=desired_target,
notifications__notification_level__gte=2).distinct('id')
for user in users:
notifications = [x for x in user.notifications.all()]
I don't have my vagrant box handy now, but I believe this should work.

Related

Return object when aggregating grouped fields in Django

Assuming the following example model:
# models.py
class event(models.Model):
location = models.CharField(max_length=10)
type = models.CharField(max_length=10)
date = models.DateTimeField()
attendance = models.IntegerField()
I want to get the attendance number for the latest date of each event location and type combination, using Django ORM. According to the Django Aggregation documentation, we can achieve something close to this, using values preceding the annotation.
... the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
So using the example model, we can write:
event.objects.values('location', 'type').annotate(latest_date=Max('date'))
which does indeed group events by location and type, but does not return the attendance field, which is the desired behavior.
Another approach I tried was to use distinct i.e.:
event.objects.distinct('location', 'type').annotate(latest_date=Max('date'))
but I get an error
NotImplementedError: annotate() + distinct(fields) is not implemented.
I found some answers which rely on database specific features of Django, but I would like to find a solution which is agnostic to the underlying relational database.
Alright, I think this one might actually work for you. It is based upon an assumption, which I think is correct.
When you create your model object, they should all be unique. It seems highly unlikely that that you would have two events on the same date, in the same location of the same type. So with that assumption, let's begin: (as a formatting note, class Names tend to start with capital letters to differentiate between classes and variables or instances.)
# First you get your desired events with your criteria.
results = Event.objects.values('location', 'type').annotate(latest_date=Max('date'))
# Make an empty 'list' to store the values you want.
results_list = []
# Then iterate through your 'results' looking up objects
# you want and populating the list.
for r in results:
result = Event.objects.get(location=r['location'], type=r['type'], date=r['latest_date'])
results_list.append(result)
# Now you have a list of objects that you can do whatever you want with.
You might have to look up the exact output of the Max(Date), but this should get you on the right path.

Django ORM - LEFT JOIN with WHERE clause

I have made a previous post related to this problem here but because this is a related but new problem I thought it would be best to make another post for it.
I'm using Django 1.8
I have a User model and a UserAction model. A user has a type. UserAction has a time, which indicates how long the action took as well as a start_time which indicates when the action began. They look like this:
class User(models.Model):
user_type = models.IntegerField()
class UserAction:
user = models.ForeignKey(User)
time = models.IntegerField()
start_time = models.DateTimeField()
Now what I want to do is get all users of a given type and the sum of time of their actions, optionally filtered by the start_time.
What I am doing is something like this:
# stubbing in a start time to filter by
start_time = datetime.now() - datetime.timedelta(days=2)
# stubbing in a type
type = 2
# this gives me the users and the sum of the time of their actions, or 0 if no
# actions exist
q = User.objects.filter(user_type=type).values('id').annotate(total_time=Coalesce(Sum(useraction__time), 0)
# now I try to add the filter for start_time of the actions to be greater than or # equal to start_time
q = q.filter(useraction__start_time__gte=start_time)
Now what this does is of course is an INNER JOIN on UserAction, thus removing all the users without actions. What I really want to do is the equivalent of my LEFT JOIN with a WHERE clause, but for the life of me I can't find how to do that. I've looked at the docs, looked at the source but am not finding an answer. I'm (pretty) sure this is something that can be done, I'm just not seeing how. Could anyone point me in the right direction? Any help would be very much appreciated. Thanks much!
I'm having the same kind of problem as you. I haven't found any proper way of solving the problem yet, but I've found a few fixes.
One way would be looping through all the users:
q = User.objects.filter(user_type=type)
for (u in q):
u.time_sum = UserAction.filter(user=u, start_time__gte=start_time).aggregate(time_sum=Sum('time'))['time_sum']
This method does however a query at the database for each user. It might do the trick if you don't have many users, but might get very time-consuming if you have a large database.
Another way of solving the problem would be using the extra method of the QuerySet API. This is a method that is detailed in this blog post by Timmy O'Mahony.
valid_actions = UserAction.objects.filter(start_time__gte=start_time)
q = User.objects.filter(user_type=type).extra(select={
"time_sum": """
SELECT SUM(time)
FROM userAction
WHERE userAction.user_id = user.id
AND userAction.id IN %s
""" % (%s) % ",".join([str(uAction.id) for uAction in valid_actions.all()])
})
This method however relies on calling the database with the SQL table names, which is very un-Django - if you change the db_table of one of your databases or the db_column of one of their columns, this code will no longer work. It though only requires 2 queries, the first one to get the list of valid userAction and the other one to sum them to the matching user.

Django and writing queries with lots of joins

I have trouble to make these kind of queries with lots of joins. I didn't found examples, but I guess they are not so complicated to write. It's just there are several FKs.
Here is the models.py (not complicated)
class User(AbstractBaseUser, PermissionsMixin): # Django custom user model
# Some stuff
class CliProfile(models.Model):
user = models.OneToOneField(settings.AUTH_USER_MODEL)
class BizProfile(models.Model):
user = models.OneToOneField(settings.AUTH_USER_MODEL)
class Card(models.Model):
linked_client = models.ForeignKey(CliProfile, blank=True, null=True)
class Points(models.Model):
benef_card = models.ForeignKey(Card)
at_owner = models.ForeignKey(BizProfile)
creation_date = models.DateTimeField(auto_now_add=True)
Quick description of the model
a user can be a client (using CliProfile) or a business (using BizProfile)
each card is linked to a client
each card contains a [points - business] association
This way: a client has a card and can has 3 points at Pizza Hut, and 5 points at McDonalds with the same card)
The request I'm trying to write
Functionally speaking, the purpose is a owner (like PizzaHut) can see all his clients (client who have cards which has points at Pizza Hut)
Technically speaking, I'm trying to write a query to get all clients (ie. a CliProfile queryset) whose cards (at least 1 of all) whose points (at least 1 of all) whose owner (there is only 1) whose user (there is only 1) = request.user ?
Do you have any idea how to write such a query? Thanks a lot.
To match fields within models in filter() you need to use two underscores. The following worked for me
CliProfile.objects.filter(card__points__at_owner=request.user)
But #Alex's suggestion makes the most sense unless this was just an example of what you are trying to do.
If you wanted profiles that are associated with one of several cards you can use the __in field lookup:
CliProfile.objects.filter(card__in=IterableOfCards)
Also you don't use == in filter(). That would return True or False and then pass that value in the filter() call effectively making the call filter(True or False) which won't do anything useful. you have to use = because you are passing a named parameter into the filter function.
Why card instead of card_set()?
cart_set only exists within an instance of a CliProfile. You are not in an instance of a CliProfile, you are trying to get a list of them.
You can try it in the terminal and it will tell you the valid choices.
#Note that it doesn't matter what you put after=, since it fails before that is checked.
>>> CliProfile.objects.filter(card_set=True)
FieldError: Cannot resolve keyword 'card_set' into field. Choices are: card, id, user
a CliProfile can be referenced by multiple cards, which is why card_set exists in it but you are trying to match one card. The card whose points at_owner field is request.user.
You would use a_cliprofile_instance.card_set.filter() to get a subset of their cards or a_cliprofile_instance.card_set.all() to display all of their cards

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
user = models.ForeignKey(User)
foo = models.ForeignKey(Foo)
...
class Meta:
unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.
Note: tested on Djang 1.7
Suppose you have the following models
class User(models.Model):
name = models.CharField()
class EarnedPoints(models.Model):
points = models.PositiveIntegerField()
user = models.ForeignKey(User)
To get total user points you might do something like that
User.objects.annotate(points=Sum("earned_points__points"))
this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.
connection = (
UserFoo._meta.db_table, User._meta.db_table, # JOIN these tables
"user_id", "id", # on these fields
)
results.query.join( # modify the query
connection, # with this table connection
promote=True, # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.
Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:
class LinkPost(models.Model):
some fields....
class UserPostVote(models.Model):
user = models.ForeignKey(User,related_name="post_votes")
post = models.ForeignKey(LinkPost,related_name="user_votes")
value = models.IntegerField(null=False, default=0)
where the userpostvote table collect's the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query # The query object of the queryset
q1_alias = q1.get_initial_alias() # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with
#parameters, which are none in this example
we now have the query for the queryset, and just wrap it, alias and left outer join to it:
q2_augment = "SELECT B.value as uservote, A.*
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.
The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
for i in qs:
i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
print foo.userfoo
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
Q(userfoo_set__user=request.user) |
Q(userfoo_set=None) # This forces the use of LOUTER JOIN.
).annotate(
comment=F('userfoo_set__comment'),
# ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
maparent's comment put me on the right way:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
if isinstance(alias, Join):
alias.nullable = True
qs.query.promote_joins(qs.query.tables)

Limit django queryset by another related table

Lets say I have 2 django models like this:
class Spam(models.Model):
somefield = models.CharField()
class Eggs(models.Model):
parent_spam = models.ForeignKey(Spam)
child_spam = models.ForeignKey(Spam)
Given the input of a "Spam" object, how would the django query looks like that:
Limits this query based on the parent_spam field in the "Eggs" table
Gives me the corresponding child_spam field
And returns a set of "Spam" objects
In SQL:
SELECT * FROM Spam WHERE id IN (SELECT child_spam FROM Eggs WHERE parent_spam = 'input_id')
I know this is only an example, but this model setup doesn't actually validate as it is - you can't have two separate ForeignKeys pointing at the same model without specifying a related_name. So, assuming the related names are egg_parent and egg_child respectively, and your existing Spam object is called my_spam, this would do it:
my_spam.egg_parent.child_spam.all()
or
Spam.objects.filter(egg_child__parent_spam=my_spam)
Even better, define a ManyToManyField('self') on the Spam model, which handles all this for you, then you would do:
my_spam.other_spams.all()
According to your sql code you need something like this
Spam.objects.filter(id__in= \
Eggs.objects.values_list('child_spam').filter(parent_spam='input_id'))