I need to fetch the top performer for each month, here is the below MySql query which gives me the correct output.
select id,Name,totalPoints, createdDateTime
from userdetail
where app=4 and totalPoints in ( select
max(totalPoints)
FROM userdetail
where app=4
group by month(createdDateTime), year(createdDateTime))
order by totalPoints desc
I am new to Django ORM. I am not able to write an equivalent Django query which does the task. I have been struggling with this logic for 2 days. Any help would be highly appreciated.
While the GROUP BY clause in a subquery is slightly difficult to express with the ORM because aggregate() operations don't emit querysets, a similar effect can be achieved with a Window function:
UserDetail.objects.filter(total_points__in=UserDetail.objects.annotate(max_points=Window(
expression=Max('total_points'),
partition_by=[Trunc('created_datetime', 'month')]
)).values('max_points')
)
In general, this sort of pattern is implemented with Subquery expressions. In this case, I've implicitly used a subquery by passing a queryset to an __in predicate.
The Django documentation's notes on using aggregates within subqueries is are also relevant to this sort of query, since you want to use the results of an aggregate in a subquery (which I've avoided by using a window function).
However, I believe your query may not correctly capture what you want to do: as written it could return rows for users who weren't the best in a given month but did have the same score as another user who was the best in any month.
Related
how can I translate query like this in django orm?
select id, year, month
where (year*100 + month) between 201703 and 201801
Thanks in advance for any help
You can first create an annotation, and then filter on that:
from django.db.models import F
(Modelname.objects
.annotate(yearmonth=F('year')*100+F('month'))
.filter(yearmonth__range=(201703, 201801)))
So here we construct an annotation yearmonth (you can use another name if you like), and make it equal to the year column times 100 plus the month column. Next we can filter on that annotation, and do so by specifying a __range here with two bounds.
Normally this will work for any database system that performs the operations you here perform (multiplying a column with a constant number, adding two values together), as well as do the __range filter (in MySQL this is translated into <var> BETWEEN <min> AND <max>). Since we use Django ORM however, if we later decide to use another database, the query will be translated in the other database query language (given of course that is possible).
How about using something similar to this.
Did you try filter and __range
Created_at will be the field in your DB
ModelName.objects.filter(created_at__range=(start_date, end_date))
Later you can do calculation in your view this is just a workaround.
If you want to run the exactly same query then probably you can run using.
ModelName.objects.raw("select id, year, month
where (year*100 + month) between 201703 and 201801")
This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.
In one of the django apps we use two database engine A and B, both are the same database but with different schemas. We have a table called C in both schemas but using db routing it's always made to point to database B. We have formed a valuelist queryset from one of the models in A, tried to pass the same in table C using filter condition __in but it always fetches empty though there are matching records. When we convert valueslist queryset to a list and use it in table C using filter condition __in it works fine.
Not working
data = modelindbA.objects.values_list('somecolumn',flat=True)
info = C.objects.filter(somecolumn__in=data).values_list
Working
data = modelindbA.objects.values_list('somecolumn',flat=True)
data = list(data)
info = C.objects.filter(somecolumn__in=data).values_list
I have read django docs and other SO questions, couldn't find anything relative. My guess is that since both models are in different database schemas the above is not working. I need assistance on how to troubleshoot this issue.
When you use a queryset with __in, Django will construct a single SQL query that uses a subquery for the __in clause. Since the two tables are in different databases, no rows will match.
By contrast, if you convert the first queryset to a list, Django will go ahead and fetch the data from the first database. When you then pass that data to the second query, hitting the second database, it will work as expected.
See the documentation for the in field lookup for more details:
You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values.... This queryset will be evaluated as subselect statement:
SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')
Because values_list method returns django.db.models.query.QuerySet, not a list.
When you use it with same schema the orm optimise it and should make just one query, but when schemas are different it fails.
Just use list().
I would even recommend to use it for one schema since it can decrease complexity of query and work better on big tables.
I have been trying to group the results of table into Hourly format using DateTimeField.
SQL:
SELECT strftime('%H', created_on), count(*)
FROM users_test
GROUP BY strftime('%H', created_on);
This query works fine, but the corresponding Django query does not.
Django queries I've tried:
Test.objects.extra({'hour': 'strftime("%%H", created_on)'}).values('hour').annotate(count=Count('id'))
# SELECT (strftime("%H", created_on)) AS "hour", COUNT("users_test"."id") AS "count" FROM "users_test" GROUP BY (strftime("%H", created_on)), "users_test"."created_on" ORDER BY "users_test"."created_on" DESC
It adds additional group by "users_test"."created_on", which I guess is giving incorrect results.
It would be great if anyone can explain me this and provide a solution as well.
Environment:
Python 3
Django 1.8.1
Thanks in Advance
References (Possible Duplicates) (But None helping out):
Grouping Django model entries by day using its datetime field
Django - Group By with Date part alone
Django aggregate on .extra values
To fix it, append order_by() to query chain. This will override model Meta default ordering. Like this:
Test
.objects
.extra({'hour': 'strftime("%%H", created_on)'})
.order_by() #<------ here
.values('hour')
.annotate(count=Count('id'))
In my environment ( Postgres also ):
>>> print ( Material
.objects
.extra({'hour': 'strftime("%%H", data_creacio)'})
.order_by()
.values('hour')
.annotate(count=Count('id'))
.query )
SELECT (strftime("%H", data_creacio)) AS "hour",
COUNT("material_material"."id") AS "count"
FROM "material_material"
GROUP BY (strftime("%H", data_creacio))
Learn more in order_by django docs:
If you don’t want any ordering to be applied to a query, not even the default ordering, call order_by() with no parameters.
Side note:
using extra() may introduce SQL injection vulnerability to your code. Use this with precaution and escape any parameters that user can introduce. Compare with docs:
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks .
Please read more about SQL injection protection.
The short of it is, the table names of all queries that are inside a filter get renamed to u0, u1, ..., so my extra where clauses won't know what table to point to. I would love to not have to hand-make all the queries for every way I might subselect on this data, and my current workaround is to turn my extra'd queries into pk values_lists, but those are really slow and something of an abomination.
Here's what this all looks like. You can mostly ignore the details of what goes in the extra of this manager method, except the first sql line which points to products_product.id:
def by_status(self, *statii):
return self.extra(where=["""products_product.id IN
(SELECT recent.product_id
FROM (
SELECT product_id, MAX(start_date) AS latest
FROM products_productstatus
GROUP BY product_id
) AS recent
JOIN products_productstatus AS ps ON ps.product_id = recent.product_id
WHERE ps.start_date = recent.latest
AND ps.status IN (%s))""" % (', '.join([str(stat) for stat in statii]),)])
Which works wonderfully for all the situations involving only the products_product table.
When I want these products as a subselect, i do:
Piece.objects.filter(
product__in=Product.objects.filter(
pk__in=list(
Product.objects.by_status(FEATURED).values_list('id', flat=True))))
How can I keep the generalized abilities of a query set, yet still use an extra where clause?
At first: the issue is not totally clear to me. Is the second code block in your question the actual code you want to execute? If this is the case the query should work as expected since there is no subselect performed.
I assume so that you want to use the second code block without the list() around the subselect to prevent a second query being performed.
The django documentation refers to this issue in the documentation about the extra method. However its not very easy to overcome this issue.
The easiest but most "hakish" solution is to observe which table alias is produced by django for the table you want to query in the extra method. You can rely on the persistent naming of this alias as long as you construct the query always in the same fashion (you don't change the order of multiple extra methods or filter calls that cause a join).
You can inspect a query that will be execute in the DB queryset by using:
print Model.objects.filter(...).query
This will reveal the aliases that are used for the tables you want to query.
As of Django 1.11, you should be able to use Subquery and OuterRef to generate an equivalent query to your extra (using a correlated subquery rather than a join):
def by_status(self, *statii):
return self.filter(
id__in=Subquery(ProductStatus.values("product_id").filter(
status__in=statii,
product__in=Subquery(ProductStatus.objects.values(
"product_id",
).annotate(
latest=Max("start_date"),
).filter(
latest=OuterRef("start_date"),
).values("product_id"),
),
)
You could probably do it with Window expressions as well (as of Django 2.0).
Note that this is untested, so may need some tweaks.