Django query aggregate upvotes in backward relation - django

I have two models:
Base_Activity:
some fields
User_Activity:
user = models.ForeignKey(settings.AUTH_USER_MODEL)
activity = models.ForeignKey(Base_Activity)
rating = models.IntegerField(default=0) #Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
up_votes = Count('user_activity__rating'=1),
).order_by(
'up_votes'
)
How can I solve this?

You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
'-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the users with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
'-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
result = cursor.execute(sql_select)
rows = result.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database, I'm not sure the SUM expression there works in all DBMS. Btw I had a perfect Django site to test, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, stackoverflow style. The site is open-source, if you're interested.

Related

Elegant way of fetching multiple objects in custom order

What's an elegant way for fetching multiple objects in some custom order from a DB in django?
For example, suppose you have a few products, each with its name, and you want to fetch three of them to display in a row on your website page, in some fixed custom order. Suppose the names of the products which you want to display are, in order: ["Milk", "Chocolate", "Juice"]
One could do
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
products = [
unordered_products.filter(name="Milk")[0],
unordered_products.filter(name="Chocolate")[0],
unordered_products.filter(name="Juice")[0],
]
And the post-fetch ordering part could be improved to use a name-indexed dictionary instead:
ordered_product_names = ["Milk", "Chocolate", "Juice"]
products_by_name = dict((x.name, x) for x in unordered_products)
products = [products_by_name[name] for name in ordered_product_names]
But is there a more elegant way? e.g., convey the desired order to the DB layer somehow, or return the products grouped by their name (aggregation seems to be similar to what I want, but I want the actual objects, not statistics about them).
You can order your product by a custom order with only one query of your ORM (executing one SQL query only):
ordered_products = Product.objects.filter(
name__in=['Milk', 'Chocolate', 'Juice']
).annotate(
order=Case(
When(name='Milk', then=Value(0)),
When(name='Chocolate', then=Value(1)),
When(name='Juice', then=Value(2)),
output_field=IntegerField(),
)
).order_by('order')
Update
Note
Speaking about "elegant way" (and best practice) I think extra method (proposed by #Satendra) is absolutely to avoid.
Official Django documentation report this about extra :
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks .
Please read more about SQL injection protection.
Optimized version
If you want to handle more items whit only one query you can change my first query and use the Django ORM flexibility as suggested by #Shubhanshu in his answer:
products = ['Milk', 'Chocolate', 'Juice']
ordered_products = Product.objects.filter(
name__in=products
).order_by(Case(
*[When(name=n, then=i) for i, n in enumerate(products)],
output_field=IntegerField(),
))
The output of this command will be similar to this:
<QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>]>
And the SQL generated by the ORM will be like this:
SELECT "id", "name"
FROM "products"
WHERE "name" IN ('Milk', 'Chocolate', 'Juice')
ORDER BY CASE
WHEN "name" = 'Milk' THEN 0
WHEN "name" = 'Chocolate' THEN 1
WHEN "name" = 'Juice' THEN 2
ELSE NULL
END ASC
When there is no relation between the objects that you are fetching and you still wish to fetch (or arrange) them in certain (custom) order, you may try doing this:
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
product_order = ["Milk", "Chocolate", "Juice"]
preserved = Case(*[When(name=name, then=pos) for pos, name in enumerate(product_order)])
ordered_products = unordered_products.order_by(preserved)
Hope it helps!
Try this into meta class from model:
class Meta:
ordering = ('name', 'related__name', )
this get your records ordered by your specified field's
then: chocolate, chocolate blue, chocolate white, juice green, juice XXX, milk, milky, milk YYYY should keep that order when you fetch
Creating a QuerySet from a list while preserving order
This means the order of output QuerySet will be same as the order of list used to filter it.
The solution is more or less same as #PaoloMelchiorre answer
But if there are more items lets say 1000 products in
product_names then you don't have to worry about adding more conditions in Case, you can use extra method of QuerySet
product_names = ["Milk", "Chocolate", "Juice", ...]
clauses = ' '.join(['WHEN name=%s THEN %s' % (name, i) for i, name in enumerate(product_names)])
ordering = 'CASE %s END' % clauses
queryset = Product.objects.filter(name__in=product_names).extra(
select={'ordering': ordering}, order_by=('ordering',))
# Output: <QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>,...]>

Django sum values of a column after 'group by' in another column

I found some solutions here and in the django documentation, but I could not manage to make one query work the way I wanted.
I have the following model:
class Inventory(models.Model):
blindid = models.CharField(max_length=20)
massug = models.IntegerField()
I want to count the number of Blind_ID and then sum the massug after they were grouped.
My currently Django ORM
samples = Inventory.objects.values('blindid', 'massug').annotate(aliquots=Count('blindid'), total=Sum('massug'))
It's not counting correctly (it shows only one), thus it 's not summing correctly. It seems it is only getting the first result... I tried to use Count('blindid', distinct=True) and Count('blindid', distinct=False) as well.
This is the query result using samples.query. Django is grouping by the two columns...
SELECT "inventory"."blindid", "inventory"."massug", COUNT("inventory"."blindid") AS "aliquots", SUM("inventory"."massug") AS "total" FROM "inventory" GROUP BY "inventory"."blindid", "inventory"."massug"
This should be the raw sql
SELECT blindid,
Count(blindid) AS aliquots,
Sum(massug) AS total
FROM inventory
GROUP BY blindid
Try this:
samples = Inventory.objects.values('blindid').annotate(aliquots=Count('blindid'), total=Sum('massug'))

Django ORM: is it possible to inject subqueries?

I have a Django model that looks something like this:
class Result(models.Model):
date = DateTimeField()
subject = models.ForeignKey('myapp.Subject')
test_type = models.ForeignKey('myapp.TestType')
summary = models.PositiveSmallIntegerField()
# more fields about the result like its location, tester ID and so on
Sometimes we want to retrieve all the test results, other times we only want the most recent result of a particular test type for each subject. This answer has some great options for SQL that will find the most recent result.
Also, we sometimes want to bucket the results into different chunks of time so that we can graph the number of results per day / week / month.
We also want to filter on various fields, and for elegance I'd like a QuerySet that I can then make all the filter() calls on, and annotate for the counts, rather than making raw SQL calls.
I have got this far:
qs = Result.objects.extra(select = {
'date_range': "date_trunc('{0}', time)".format("day"), # Chunking into time buckets
'rn' : "ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)"})
qs = qs.values('date_range', 'result_summary', 'rn')
qs = qs.order_by('-date_range')
which results in the following SQL:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" ORDER BY "date_range" DESC
which is kind of approaching what I'd like, but now I need to somehow filter to only get the rows where rn = 1. I tried using the 'where' field in extra(), which gives me the following SQL and error:
SELECT (ROW_NUMBER() OVER(PARTITION BY subject_id, test_type_id ORDER BY time DESC)) AS "rn", (date_trunc('day', time)) AS "date_range", "myapp_result"."result_summary" FROM "myapp_result" WHERE "rn"=1 ORDER BY "date_range" DESC ;
ERROR: column "rn" does not exist
So I think the query that finds "rn" needs to be a subquery - but is it possible to do that somehow, perhaps using extra()?
I know I could do this with raw SQL but it just looks ugly! I'd love to find a nice neat way where I have a filterable QuerySet.
I guess the other option is to have a field in the model that indicates whether it is actually the most recent result of that test type for that subject...
I've found a way!
qs = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
This is based on a different option from the answer I referenced earlier. I can filter or annotate qs with whatever I want.
As an aside, on the way to this solution I tried this:
qq = Result.objects.extra(where = ["NOT EXISTS(SELECT * FROM myapp_result as T2 WHERE (T2.test_type_id = myapp_result.test_type_id AND T2.subject_id = myapp_result.subject ID AND T2.time > myapp_result.time))"])
qs = Result.objects.filter(id__in=qq)
Django embeds the subquery just as you want it to:
SELECT ...some fields... FROM "myapp_result"
WHERE ("myapp_result"."id" IN (SELECT "myapp_result"."id" FROM "myapp_result"
WHERE (NOT EXISTS(SELECT * FROM myapp_result as T2
WHERE (T2.subject_id = myapp_result.subject_id AND T2.test_type_id = myapp_result.test_type_id AND T2.time > myapp_result.time)))))
I realised this had more subqueries than I need, but I note it here as I can imagine it being useful to know that you can filter one queryset with another and Django does exactly what you'd hope for in terms of embedding the subquery (rather than, say, executing it and embedding the returned values, which would be horrid.)

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
user = models.ForeignKey(User)
foo = models.ForeignKey(Foo)
...
class Meta:
unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND foo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
This answer might not be exactly what you are looking for but since its the first result in google when searching for "django annotate outer join" so I will post it here.
Note: tested on Djang 1.7
Suppose you have the following models
class User(models.Model):
name = models.CharField()
class EarnedPoints(models.Model):
points = models.PositiveIntegerField()
user = models.ForeignKey(User)
To get total user points you might do something like that
User.objects.annotate(points=Sum("earned_points__points"))
this will work but it will not return users who have no points, here we need outer join without any direct hacks or raw sql
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into OUTER LEFT JOIN and all users will be returned. users who has no points will have None value in their points attribute.
Hope that helps
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not neccessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extras(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourself.
connection = (
UserFoo._meta.db_table, User._meta.db_table, # JOIN these tables
"user_id", "id", # on these fields
)
results.query.join( # modify the query
connection, # with this table connection
promote=True, # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print results[0].user_comment
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem I was unable to solve without resorting to raw SQL, but I did not want to rewrite the entire query.
Following is a description on how you can augment a queryset with an external raw sql, without having to care about the actual query that generates the queryset.
Here's a typical scenario: You have a reddit like site with a LinkPost model and a UserPostVote mode, like this:
class LinkPost(models.Model):
some fields....
class UserPostVote(models.Model):
user = models.ForeignKey(User,related_name="post_votes")
post = models.ForeignKey(LinkPost,related_name="user_votes")
value = models.IntegerField(null=False, default=0)
where the userpostvote table collect's the votes of users on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
so now you have a QuerySet posts_page generated by the django paginator that selects the posts to display. How do we now add the annotation of the user's vote on each post before rendering it in a template?
Here's where it get's tricky and I was unable to find a clean ORM solution. select_related won't allow you to only get votes corresponding to the logged in user and looping over the posts would do bunch queries instead of one and doing it all raw mean's we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query # The query object of the queryset
q1_alias = q1.get_initial_alias() # This forces the query object to generate it's sql
(q1str, q1param) = q1.sql_with_params() #This gets the sql for the query along with
#parameters, which are none in this example
we now have the query for the queryset, and just wrap it, alias and left outer join to it:
q2_augment = "SELECT B.value as uservote, A.*
from ("+q1str+") A LEFT OUTER JOIN reddit_userpostvote B
ON A.id = B.post_id AND B.user_id = %s"
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.
The two queries you suggest are as good as you're going to get (without using raw()), this type of query isn't representable in the ORM at present time.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
userfoos = dict((i.pk, i) for i in UserFoo.objects.filter(foo__in=qs))
for i in qs:
i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).tranform(userfoo_retriever):
print foo.userfoo
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
As for outerjoins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
Foo.objects.filter(
Q(userfoo_set__user=request.user) |
Q(userfoo_set=None) # This forces the use of LOUTER JOIN.
).annotate(
comment=F('userfoo_set__comment'),
# ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
Foo.objects.filter(
Q(userfoo_set__isnull=True)|Q(userfoo_set__isnull=False)
).annotate(bar=Case(
When(userfoo_set__user_id=request.user, then='userfoo_set__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
maparent's comment put me on the right way:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
if isinstance(alias, Join):
alias.nullable = True
qs.query.promote_joins(qs.query.tables)

How do I get the related objects In an extra().values() call in Django?

Thank to this post I'm able to easily do count and group by queries in a Django view:
Django equivalent for count and group by
What I'm doing in my app is displaying a list of coin types and face values available in my database for a country, so coins from the UK might have a face value of "1 farthing" or "6 pence". The face_value is the 6, the currency_type is the "pence", stored in a related table.
I have the following code in my view that gets me 90% of the way there:
def coins_by_country(request, country_name):
country = Country.objects.get(name=country_name)
coin_values = Collectible.objects.filter(country=country.id, type=1).extra(select={'count': 'count(1)'},
order_by=['-count']).values('count', 'face_value', 'currency_type')
coin_values.query.group_by = ['currency_type_id', 'face_value']
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
The currency_type_id comes across as the number stored in the foreign key field (i.e. 4). What I want to do is retrieve the actual object that it references as part of the query (the Currency model, so I can get the Currency.name field in my template).
What's the best way to do that?
You can't do it with values(). But there's no need to use that - you can just get the actual Collectible objects, and each one will have a currency_type attribute that will be the relevant linked object.
And as justinhamade suggests, using select_related() will help to cut down the number of database queries.
Putting it together, you get:
coin_values = Collectible.objects.filter(country=country.id,
type=1).extra(
select={'count': 'count(1)'},
order_by=['-count']
).select_related()
select_related() got me pretty close, but it wanted me to add every field that I've selected to the group_by clause.
So I tried appending values() after the select_related(). No go. Then I tried various permutations of each in different positions of the query. Close, but not quite.
I ended up "wimping out" and just using raw SQL, since I already knew how to write the SQL query.
def coins_by_country(request, country_name):
country = get_object_or_404(Country, name=country_name)
cursor = connection.cursor()
cursor.execute('SELECT count(*), face_value, collection_currency.name FROM collection_collectible, collection_currency WHERE collection_collectible.currency_type_id = collection_currency.id AND country_id=%s AND type=1 group by face_value, collection_currency.name', [country.id] )
coin_values = cursor.fetchall()
return render_to_response('icollectit/coins_by_country.html', {'coin_values': coin_values, 'country': country } )
If there's a way to phrase that exact query in the Django queryset language I'd be curious to know. I imagine that an SQL join with a count and grouping by two columns isn't super-rare, so I'd be surprised if there wasn't a clean way.
Have you tried select_related() http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
I use it a lot it seems to work well then you can go coin_values.currency.name.
Also I dont think you need to do country=country.id in your filter, just country=country but I am not sure what difference that makes other than less typing.