Django count group by date from datetime - django

I'm trying to count the dates users register from a DateTime field. In the database this is stored as '2016-10-31 20:49:38' but I'm only interested in the date '2016-10-31'.
The raw SQL query is:
select DATE(registered_at) registered_date,count(registered_at) from User
where course='Course 1' group by registered_date;
It is possible using 'extra' but I've read this is deprecated and should not be done. It works like this though:
(
    User.objects
    .filter(course='Course 1')
    .extra(select={'registered_date': "DATE(registered_at)"})
    .values('registered_date')
    .annotate(total=Count('registered_at'))
)
Is it possible to do without using extra?
I read that TruncDate can be used and I think this is the correct queryset however it does not work:
(
    User.objects
    .filter(course='Course 1')
    .annotate(registered_date=TruncDate('registered_at'))
    .values('registered_date')
    .annotate(total=Count('registered_at'))
)
I get <QuerySet [{'total': 508346, 'registered_date': None}]> so there is something going wrong with TruncDate.
If anyone understands this better than me and can point me in the right direction that would be much appreciated.
Thanks for your help.

I was trying to do something very similar and was having the same problems as you. I managed to get my problem working by adding in an order_by clause after applying the TruncDate annotation. So I imagine that this should work for you too:
(
    User.objects
    .filter(course='Course 1')
    .annotate(registered_date=TruncDate('registered_at'))
    .order_by('registered_date')
    .values('registered_date')
    .annotate(total=Count('registered_at'))
)
Hope this helps?!

This is an alternative to TruncDate: use the 'registered_at__date' transform and Django does the truncation for you.
from django.db.models import Count
from django.contrib.auth import get_user_model
metrics = {
    'total': Count('registered_at__date')
}
(
    get_user_model().objects
    .filter(course='Course 1')
    .values('registered_at__date')
    .annotate(**metrics)
    .order_by('registered_at__date')
)
For PostgreSQL this translates to the following database query:
SELECT
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date,
COUNT("auth_user"."registered_at") AS "total"
FROM
"auth_user"
GROUP BY
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date
ORDER BY
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date ASC;
From the above example you can see that the Django ORM reverses the roles you might expect from SQL: .values() roughly controls the GROUP BY clause, while .annotate() controls which aggregations are computed and added to the SELECT columns. This feels a little odd at first but is simple once you get the hang of it.
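To make the target of all these querysets concrete, here is a minimal sketch that runs the grouping from the original raw SQL against an in-memory SQLite database with Python's built-in sqlite3 module. The table name users, its columns, and the sample rows are made up for illustration, and SQLite only stands in for the real database, so the SQL Django generates for PostgreSQL or MySQL will look different.

```python
import sqlite3

# In-memory database with a hypothetical "users" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, course TEXT, registered_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (course, registered_at) VALUES (?, ?)",
    [
        ("Course 1", "2016-10-31 20:49:38"),
        ("Course 1", "2016-10-31 08:15:00"),
        ("Course 1", "2016-11-01 09:00:00"),
        ("Course 2", "2016-10-31 10:00:00"),
    ],
)

# Group by the date part only; the time of day is discarded by DATE().
rows = conn.execute(
    """
    SELECT DATE(registered_at) AS registered_date,
           COUNT(registered_at) AS total
    FROM users
    WHERE course = ?
    GROUP BY registered_date
    ORDER BY registered_date
    """,
    ("Course 1",),
).fetchall()

print(rows)
```

Each returned tuple pairs one calendar date with the number of registrations on it, which is the shape the .values(...).annotate(...) querysets above are meant to produce as dictionaries.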

Related

Django-Postgres: how to group by DATE a datetime field with timezone enabled

I am having this problem with PostgreSQL and Django:
I have a lot of events that were created on a certain date at a certain time, which is stored in a datetime field called created.
I want to have aggregations based on the date part of the created field. The simplest example is: how many events are there on each day of this month?
The created field is timezone aware, so the result should change depending on the timezone the user is in. For example, if you created 2 events at 23:30 UTC on 2017-10-02 and view them from UTC+1, you should see them on the 3rd of October at 00:30, and the totals should be added to the 3rd.
I am struggling to find a solution to this problem that works with a lot of data, so running one SQL statement per day is not an option. I want something that translates into:
SELECT count(*) from table GROUP BY date
Now I found a solution for the first part of the problem:
from django.db import connection
truncate_date = connection.ops.date_trunc_sql('day', 'created')
queryset = queryset.extra({'day': truncate_date})
total_list = list(queryset.values('day').annotate(amount=Count('id')).order_by('day'))
Is there a way to add to this the timezone that should be used by the date_trunc_sql function to calculate the day? Or some other function before date_trunc_sql and then chain that one.
Thanks!
You're probably looking for this: timezone aware date_trunc function
However, bear in mind this might conflict with how your Django project is configured: https://docs.djangoproject.com/en/1.11/topics/i18n/timezones/
Django 2.2+ supports passing a timezone to the TruncDay database function through its tzinfo argument.
You can now do the following:
import pytz
from django.db.models import Count
from django.db.models.functions import TruncDay

east_coast = pytz.timezone('America/New_York')
(
    queryset
    .annotate(created_date=TruncDay("created", tzinfo=east_coast))
    .values("created_date")
    .annotate(count=Count("created_date"))
    .order_by("-created_date")
)
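Why the viewer's timezone changes the per-day totals can be shown without a database. The sketch below is plain Python with a hypothetical helper, count_by_local_date, and made-up events. It converts each UTC timestamp to the viewer's offset before taking the date, which is roughly the conversion a timezone-aware truncation asks the database to do.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Three made-up events stored in UTC; two of them are close to midnight.
events_utc = [
    datetime(2017, 10, 2, 23, 30, tzinfo=timezone.utc),
    datetime(2017, 10, 2, 23, 30, tzinfo=timezone.utc),
    datetime(2017, 10, 2, 12, 0, tzinfo=timezone.utc),
]


def count_by_local_date(events, tz):
    """Count events per calendar date as seen from timezone `tz`."""
    return dict(Counter(e.astimezone(tz).date().isoformat() for e in events))


# A viewer on UTC sees all three events on the 2nd.
utc_counts = count_by_local_date(events_utc, timezone.utc)

# A viewer one hour ahead of UTC sees the two late events on the 3rd.
plus_one_counts = count_by_local_date(events_utc, timezone(timedelta(hours=1)))

print(utc_counts)
print(plus_one_counts)
```

A fixed offset is used here only to keep the example self-contained; a named zone such as America/New_York also has to account for daylight saving changes, which is why the queryset above passes a real timezone object.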

Alternative nullif in Django ORM

I'm using Postgres as the database and Django 1.9.
I have a model with a 'price' field, declared with blank=True.
In a ListView I get a queryset, and I want to sort it by price with the price=0 rows at the end.
In SQL I can write it like this:
ORDER BY NULLIF(price, 0) NULLS LAST
How do I write this with the Django ORM, or with raw SQL?
OK, I found an alternative: write your own NullIf with Django's Func.
from django.db.models import Func

class NullIf(Func):
    # Hard-codes 0 as the second argument of NULLIF
    template = 'NULLIF(%(expressions)s, 0)'
And use it for queryset:
queryset.annotate(new_price=NullIf('price')).order_by('new_price')
Edit : Django 2.2 and above have this implemented out of the box. The equivalent code will be
from django.db.models.functions import NullIf
from django.db.models import Value
queryset.annotate(new_price=NullIf('price', Value(0))).order_by('new_price')
You can still ORDER BY price NULLS LAST if you select the price as NULLIF(price, 0) in your SELECT. That way you get the ordering you want, and the data is returned in the way you want. In the Django ORM you would select the price with annotate, e.g. TableName.objects.annotate(new_price=NullIf('price', 0)) (the annotation needs a name that does not clash with the existing price field), and for the NULLS LAST part of the ordering I'd follow the recommendations in Django: Adding "NULLS LAST" to query.
Otherwise you could also ORDER BY NULLIF(price, 0) DESC, but that will reorder the other numeric values. You can also obviously exclude null prices from the query entirely if you don't require them.
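The effect of NULLIF on the ordering can be checked in isolation. This is a minimal sketch against an in-memory SQLite database using Python's sqlite3 module; the product table and its rows are assumptions for illustration. Because NULLS LAST is not available in every SQLite version, the sketch sorts on an "IS NULL" flag first, which is a stand-in that gives the same order here and is not what Postgres or Django would emit.

```python
import sqlite3

# Hypothetical product table: one zero price, one missing price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("free", 0), ("cheap", 5), ("unpriced", None), ("dear", 50)],
)

# NULLIF(price, 0) turns 0 into NULL, so zero and missing prices sort together.
# Sorting on the IS NULL flag first pushes those rows to the end.
rows = conn.execute(
    """
    SELECT name, NULLIF(price, 0) AS new_price
    FROM product
    ORDER BY NULLIF(price, 0) IS NULL, NULLIF(price, 0), name
    """
).fetchall()

print(rows)
```

The priced rows come first in ascending order, followed by the zero-priced and unpriced rows, both of which now carry NULL in new_price.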

Django 1.11 Annotating a Subquery Aggregate

This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
The culprit is that subquery's GROUP BY ... U0."space". It's returning both columns for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but, short of removing the Meta ordering on both these models, is there any way I can unsort a query at query time?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
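The effect of the extra GROUP BY column can be reproduced in plain SQL. The following sketch uses Python's sqlite3 module with a made-up space table (an assumption for illustration, not the real bookings_space schema) and runs the subquery body for a single carpark both ways. Note that SQLite does not raise the "more than one row" error for scalar subqueries the way PostgreSQL does, so the sketch only compares row counts.

```python
import sqlite3

# Hypothetical table: carpark 1 has three spaces, carpark 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE space (id INTEGER PRIMARY KEY, carpark_id INTEGER, space TEXT)"
)
conn.executemany(
    "INSERT INTO space (carpark_id, space) VALUES (?, ?)",
    [(1, "A1"), (1, "A2"), (1, "A3"), (2, "B1")],
)

# With the ordering column leaking into GROUP BY: one row per space.
grouped_with_extra_column = conn.execute(
    "SELECT COUNT(*) FROM space WHERE carpark_id = 1 GROUP BY carpark_id, space"
).fetchall()

# Grouping on the carpark only: a single row holding the real count.
grouped_by_carpark_only = conn.execute(
    "SELECT COUNT(*) FROM space WHERE carpark_id = 1 GROUP BY carpark_id"
).fetchall()

print(grouped_with_extra_column)
print(grouped_by_carpark_only)
```

The first query yields three rows of 1 and the second a single row of 3, which is why clearing the ordering with an empty .order_by() lets the subquery return one value per carpark.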
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I was getting NULLs (None in Python) in the query results, I came to the conclusion that the GROUP BY clause causes the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset so that it does not get a GROUP BY appended at execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds a GROUP BY as soon as it sees an aggregate function being used.
Solution
So you can just create your own aggregate function in such a way that Django does not recognise it as an aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
    date = models.DateField()

class WorkPeriod(models.Model):
    employee = models.ForeignKey(User, on_delete=models.CASCADE)
    start_date = models.DateField()
    end_date = models.DateField()

number_of_non_work_days = NonWorkDay.objects.filter(
    date__gte=OuterRef('start_date'),
    date__lte=OuterRef('end_date'),
).annotate(
    cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
    number_of_work_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.

how does django query work?

my models are designed like so
class Warehouse:
    name = ...
    sublocation = FK(Sublocation)

class Sublocation:
    name = ...
    city = FK(City)

class City:
    name = ...
    state = FK(State)
Now suppose I run this query:
wh = Warehouse.objects.values_list('name', 'sublocation__name',
                                   'sublocation__city__name').first()
It returns the correct result, but how many queries does it run internally? Is Django fetching the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
'name', 'sublocation__name', 'sublocation__city__name').first()
It translates into this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")
It gets the result in a single query. You can count number of queries in your shell like this:
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
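The same idea, one statement with two joins, can be checked at the SQL level without Django. This is a rough sketch using Python's sqlite3 module; the three tables and their rows are assumptions that mirror the models in the question, and set_trace_callback is used to record every statement SQLite runs once tracing is switched on.

```python
import sqlite3

# Hypothetical schema mirroring Warehouse -> Sublocation -> City.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sublocation (
        id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER REFERENCES city(id)
    );
    CREATE TABLE warehouse (
        id INTEGER PRIMARY KEY, name TEXT,
        sublocation_id INTEGER REFERENCES sublocation(id)
    );
    INSERT INTO city VALUES (1, 'Springfield');
    INSERT INTO sublocation VALUES (1, 'Dockside', 1);
    INSERT INTO warehouse VALUES (1, 'Main depot', 1);
    """
)

# Record every statement SQLite executes from this point on.
statements = []
conn.set_trace_callback(statements.append)

# One SELECT with two INNER JOINs fetches values from all three tables.
row = conn.execute(
    """
    SELECT w.name, s.name, c.name
    FROM warehouse w
    INNER JOIN sublocation s ON w.sublocation_id = s.id
    INNER JOIN city c ON s.city_id = c.id
    LIMIT 1
    """
).fetchone()

conn.set_trace_callback(None)
print(row, len(statements))
```

Only the joined SELECT is recorded, so all three names arrive in a single round trip, which matches what the len(c.queries) check above reports for the Django queryset.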
My suggestion would be to write a test for this using assertNumQueries (docs here).
from django.test import TestCase
from yourproject.models import Warehouse
class TestQueries(TestCase):
    def test_query_num(self):
        """
        Assert values_list query executes 1 database query
        """
        values = ['name', 'sublocation__name', 'sublocation__city__name']
        with self.assertNumQueries(1):
            Warehouse.objects.values_list(*values).first()
FYI I'm not sure how many queries are indeed sent to the database, 1 is my current best guess. Adjust the number of queries expected to get this to pass in your project and pin the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The more or less standard way to get good insight into how many and which queries take place during a page render is to use the Django Debug Toolbar. It can tell you precisely how many times this queryset is evaluated.
You can use django-debug-toolbar to see the real queries sent to the database.

Django Query Related Field Count

I've got an app where users create pages. I want to run a simple DB query that returns how many users have created more than 2 pages.
This is essentially what I want to do, but of course it's not the right method:
User.objects.select_related('page__gte=2').count()
What am I missing?
You should use aggregates.
from django.db.models import Count
User.objects.annotate(page_count=Count('page')).filter(page_count__gte=2).count()
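Filtering on an annotated Count corresponds roughly to a GROUP BY with a HAVING clause. The sketch below shows that shape against an in-memory SQLite database via Python's sqlite3 module; the users and page tables and their rows are made up for illustration, and the threshold follows the answer's page_count__gte=2.

```python
import sqlite3

# Hypothetical data: alice has 3 pages, bob 1, carol 2, dave none.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE page (
        id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    INSERT INTO page (user_id) VALUES (1), (1), (1), (2), (3), (3);
    """
)

# Per-user page counts, keeping only users with at least two pages.
per_user_sql = """
    SELECT u.name, COUNT(p.id) AS page_count
    FROM users u
    LEFT JOIN page p ON p.user_id = u.id
    GROUP BY u.id, u.name
    HAVING COUNT(p.id) >= 2
"""
matching = conn.execute(per_user_sql + " ORDER BY u.name").fetchall()

# Counting the grouped rows gives the number of such users.
user_count = conn.execute(f"SELECT COUNT(*) FROM ({per_user_sql})").fetchone()[0]

print(matching)
print(user_count)
```

The list of matching rows corresponds to the filtered queryset, and the outer COUNT(*) corresponds to calling .count() on it.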
In my case I left off the final .count() that the other answer uses, and it also works nicely.
from django.db.models import Count
User.objects.annotate(our_param=Count("all_comments")).filter(our_param__gt=12)
Use the aggregate() function together with the aggregates from django.db.models!
This is very useful, and it does not really clash with other annotated aggregate columns.
*Use aggregate() as the last step of the calculation, because it turns your queryset into a dict.
Below is my code snippet using them.
cnt = q.values("person__year_of_birth").filter(person__year_of_birth__lte=year_interval_10)\
.filter(person__year_of_birth__gt=year_interval_10-10)\
.annotate(group_cnt=Count("visit_occurrence_id")).aggregate(Sum("group_cnt"))