Django + PostgreSQL group by date on datetimefield - django

I have a model with a DateTimeField, and I'm trying to group by date and annotate each group with a count.
E.g.:
import datetime
from django.db.models import Count
today = datetime.date.today()
start = datetime.datetime.combine(today, datetime.time.min) - datetime.timedelta(days=5)
end = datetime.datetime.combine(today, datetime.time.max)
order_totals = Transfer.objects.filter(created__range=[start, end]).values('created').annotate(Count('id'))
The problem with the above is that it groups by every distinct second/millisecond of the datetime field rather than by just the date.
How would I do this?

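You should be able to solve this by using QuerySet.extra to add a column to the query that casts created to a date (created::date is PostgreSQL cast syntax),
e.g.:
qs.filter(...).extra(select={'created_date': 'created::date'}).values('created_date').annotate(count=Count('id'))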
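To see why a date-valued column fixes the grouping, here is a rough stand-alone analogue using Python's built-in sqlite3 module. SQLite has no ::date cast, so DATE() stands in for created::date, and the transfer table and daily_counts helper are hypothetical; this is not the SQL Django would generate.

```python
import sqlite3

def daily_counts(rows):
    """Count rows per calendar date, grouping on DATE(created) rather than the full timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfer (id INTEGER PRIMARY KEY, created TEXT)")
    conn.executemany("INSERT INTO transfer (created) VALUES (?)", [(r,) for r in rows])
    # DATE() drops the time part, so every row from the same day lands in one group
    result = conn.execute(
        "SELECT DATE(created) AS created_date, COUNT(id) "
        "FROM transfer GROUP BY created_date ORDER BY created_date"
    ).fetchall()
    conn.close()
    return result

print(daily_counts([
    "2016-10-31 20:49:38",
    "2016-10-31 08:01:02",
    "2016-11-01 00:00:01",
]))  # [('2016-10-31', 2), ('2016-11-01', 1)]
```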

Starting with Django 1.8, you can also use the new DateTime expression (oddly, it is not documented in the built-in expressions reference).
import pytz
from django.db.models.expressions import DateTime
qs.annotate(created_date=DateTime('created', 'day', pytz.UTC))
If you want to group by created_date, just chain another aggregating expression:
from django.db.models import Count
qs.annotate(created_date=DateTime('created', 'day', pytz.UTC)).values('created_date').annotate(number=Count('id'))
(The seemingly redundant values() call is needed to generate the appropriate GROUP BY; see the aggregation topic in the Django documentation.)
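The difference between grouping on the raw timestamp and on the truncated day can be mimicked in plain Python. This is only an analogy for what the GROUP BY ends up doing: the created values are made up and stand in for Transfer rows, and Counter plays the role of annotate(Count('id')).

```python
from collections import Counter
from datetime import datetime

# Hypothetical values standing in for Transfer.created
created_values = [
    datetime(2015, 6, 1, 9, 15, 0, 120000),
    datetime(2015, 6, 1, 9, 15, 0, 480000),
    datetime(2015, 6, 1, 17, 40, 5),
    datetime(2015, 6, 2, 11, 0, 0),
]

# Like .values('created').annotate(...): every distinct timestamp is its own group
per_timestamp = Counter(created_values)

# Like .values('created_date').annotate(...) after truncating to the day
per_day = Counter(dt.date() for dt in created_values)

print(len(per_timestamp))  # 4
print(sorted(per_day.items()))
```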

Related

Django-Postgres: how to group by DATE a datetime field with timezone enabled

I am having this problem with PostgreSQL and Django:
I have a lot of events, each created on a certain date at a certain time, stored in a datetime field called created.
I want aggregations based on the date part of the created field. The simplest example is: how many events are there on each day of this month?
The created field is timezone-aware, so the result should change depending on the user's time zone. For example, if you created 2 events at 23:30 UTC on 2017-10-02 and view them from UTC+1, you should see them on the 3rd of October at 00:30, and they should be counted in the total for the 3rd.
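The expected bucketing can be checked with the standard library alone. In this sketch a fixed UTC+1 offset stands in for a real user time zone (a named zone would also apply DST rules), and counts_per_local_day is a hypothetical helper rather than anything provided by Django. Converting 23:30 UTC on 2017-10-02 to UTC+1 lands on 2017-10-03, so both events move to the 3rd.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def counts_per_local_day(timestamps, tz):
    """Group timezone-aware datetimes by their calendar date in the viewer's time zone."""
    return Counter(ts.astimezone(tz).date() for ts in timestamps)

# Two events created at 23:30 UTC on 2017-10-02
events = [datetime(2017, 10, 2, 23, 30, tzinfo=timezone.utc)] * 2

utc_plus_1 = timezone(timedelta(hours=1))  # fixed offset, no DST rules

print(counts_per_local_day(events, timezone.utc))  # both on 2017-10-02
print(counts_per_local_day(events, utc_plus_1))    # both on 2017-10-03
```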
I am struggling to find a solution that works with a lot of data, so running one SQL statement per day is not an option. I want something that translates into:
SELECT count(*) from table GROUP BY date
I have found a solution for the first part of the problem:
from django.db import connection
from django.db.models import Count
truncate_date = connection.ops.date_trunc_sql('day', 'created')
queryset = queryset.extra({'day': truncate_date})
total_list = list(queryset.values('day').annotate(amount=Count('id')).order_by('day'))
Is there a way to pass the time zone that date_trunc_sql should use when calculating the day? Or is there some other function I could apply before date_trunc_sql and chain with it?
Thanks!
You're probably looking for this: timezone-aware date_trunc function.
However, bear in mind that this might conflict with how your Django project is configured: https://docs.djangoproject.com/en/1.11/topics/i18n/timezones/
Django 2.2+ supports timezone-aware truncation through its Trunc database functions; for example, TruncDay accepts a tzinfo argument.
You can now do the following:
import pytz
from django.db.models import Count
from django.db.models.functions import TruncDay
east_coast = pytz.timezone('America/New_York')
(queryset.annotate(created_date=TruncDay("created", tzinfo=east_coast))
    .values("created_date")
    .order_by("created_date")
    .annotate(count=Count("created_date"))
    .order_by("-created_date"))
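For a database-side analogue that runs without PostgreSQL, SQLite's DATE() accepts a modifier such as '-5 hours', which shifts UTC timestamps before the date is taken. This is only a sketch: the event table and daily_counts_with_offset helper are hypothetical, a fixed -5 offset matches New York only in winter (EST) because it ignores DST, and Django on PostgreSQL emits AT TIME ZONE with the named zone instead. The descending order mirrors the final order_by("-created_date").

```python
import sqlite3

def daily_counts_with_offset(rows, offset_hours):
    """Group UTC timestamps by local date, shifting by a fixed offset inside the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, created TEXT)")
    conn.executemany("INSERT INTO event (created) VALUES (?)", [(r,) for r in rows])
    # e.g. offset_hours=-5 becomes the SQLite modifier '-5 hours' (fixed offset, no DST)
    modifier = f"{offset_hours:+d} hours"
    result = conn.execute(
        "SELECT DATE(created, ?) AS local_date, COUNT(id) "
        "FROM event GROUP BY local_date ORDER BY local_date DESC",
        (modifier,),
    ).fetchall()
    conn.close()
    return result

rows = [
    "2023-01-10 03:00:00",  # 22:00 on the 9th at UTC-5
    "2023-01-10 12:00:00",
    "2023-01-11 04:59:00",  # 23:59 on the 10th at UTC-5
]
print(daily_counts_with_offset(rows, -5))  # [('2023-01-10', 2), ('2023-01-09', 1)]
```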

Django count group by date from datetime

I'm trying to count the dates users register from a DateTime field. In the database this is stored as '2016-10-31 20:49:38' but I'm only interested in the date '2016-10-31'.
The raw SQL query is:
select DATE(registered_at) registered_date,count(registered_at) from User
where course='Course 1' group by registered_date;
It is possible using extra(), but I've read that this is deprecated and should be avoided. It works like this, though:
from django.db.models import Count
(User.objects.all()
    .filter(course='Course 1')
    .extra(select={'registered_date': "DATE(registered_at)"})
    .values('registered_date')
    .annotate(total=Count('registered_at')))
Is it possible to do this without using extra?
I read that TruncDate can be used, and I think this is the correct queryset, but it does not work:
from django.db.models.functions import TruncDate
(User.objects.all()
    .filter(course='Course 1')
    .annotate(registered_date=TruncDate('registered_at'))
    .values('registered_date')
    .annotate(total=Count('registered_at')))
I get <QuerySet [{'total': 508346, 'registered_date': None}]> so there is something going wrong with TruncDate.
If anyone understands this better than me and can point me in the right direction that would be much appreciated.
Thanks for your help.
I was trying to do something very similar and had the same problem. I got mine working by adding an order_by clause after applying the TruncDate annotation, so I imagine this should work for you too:
(User.objects.all()
    .filter(course='Course 1')
    .annotate(registered_date=TruncDate('registered_at'))
    .order_by('registered_date')
    .values('registered_date')
    .annotate(total=Count('registered_at')))
Hope this helps?!
An alternative to TruncDate is the registered_at__date lookup, which lets Django do the truncation for you.
from django.db.models import Count
from django.contrib.auth import get_user_model
metrics = {
    'total': Count('registered_at__date')
}
(get_user_model().objects.all()
    .filter(course='Course 1')
    .values('registered_at__date')
    .annotate(**metrics)
    .order_by('registered_at__date'))
On PostgreSQL this translates to the following query (the active time zone here is Asia/Kolkata):
SELECT
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date,
COUNT("auth_user"."registered_at") AS "total"
FROM
"auth_user"
GROUP BY
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date
ORDER BY
("auth_user"."registered_at" AT TIME ZONE 'Asia/Kolkata')::date ASC;
From the above example you can see that the Django ORM inverts the usual way of reading SELECT and GROUP BY: .values() roughly controls the GROUP BY columns, while .annotate() controls the added SELECT columns and which aggregations are performed. This feels a little odd at first but is simple once you get the hang of it.

Group objects by dates

clicks = SellerClick.objects.extra({'date' : "date(timestamp)"}).values('date').annotate(count=Count('timestamp'))
The model has a datetime field called timestamp that we are using. I first convert the datetime field to just a date. The rest is guesswork: I need to group by that date and then count how many objects fall on each date.
So the desired result would be a date, then a count, based on how many objects have that date in the timestamp field.
I prefer annotate() over extra():
from django.db.models import Count
from django.db.models.expressions import RawSQL
SellerClick.objects.annotate(
    date=RawSQL('date(timestamp)', []),
).values('date').annotate(count=Count('date'))
You've got everything but an initial queryset there. The extra SQL you're passing doesn't include a select, so you need to give it something to act on.
clicks = (SellerClick.objects.all()
    .extra({'date': "date(timestamp)"})
    .values('date')
    .annotate(count=Count('timestamp')))
Ref: StackOverflow: Count number of records by date in Django

Django DateTimeField with optional time part

I have a field which will represent the start time of an event and I am using the Django DateTimeField for this.
This field is mandatory but sometimes the users will only know the start date and not the time.
Is there anyway to make the time part optional and keep the date part mandatory?
Maybe you should try to separate date from time. There are DateField and TimeField for that.
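If you split the field, the full start can still be rebuilt whenever you need a datetime. A minimal sketch, assuming a required DateField and a TimeField declared with null=True, blank=True (field names are up to you); event_start is a hypothetical helper, and falling back to midnight is just one possible convention.

```python
from datetime import date, datetime, time
from typing import Optional

def event_start(start_date: date, start_time: Optional[time] = None) -> datetime:
    """Combine a mandatory date with an optional time; a missing time falls back to midnight."""
    if start_time is None:
        start_time = time.min  # 00:00, used only when no time was given
    return datetime.combine(start_date, start_time)

print(event_start(date(2013, 6, 25)))               # 2013-06-25 00:00:00
print(event_start(date(2013, 6, 25), time(14, 30)))  # 2013-06-25 14:30:00
```

In a model, the same logic could live in a property that reads the two fields; keeping the time as None elsewhere lets the UI tell "no time given" apart from a real midnight start.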
Example for use in views or models:
You can use strftime to format a datetime value in any format you like.
from datetime import datetime
datetime.now().strftime('%Y-%m-%d')
# returns a string such as '2013-06-25'
Example for use in templates:
you can use the date template filter:
{{ datetime_field|date:"Y-m-d" }}

How to aggregate computed field with django ORM? (without raw SQL)

I'm trying to compute the cumulative duration of some events; 'start' and 'end' are both django.db.models.DateTimeField fields.
What I would like to do should have been written like this:
from django.db.models import F, Sum
from my.models import Event
Event.objects.aggregate(anything=Sum(F('end') - F('start')))
# this first example return:
# AttributeError: 'ExpressionNode' object has no attribute 'split'
# Ok I'll try more SQLish:
Event.objects.extra(select={
'extra_field': 'end - start'
}).aggregate(Sum('extra_field'))
# this time:
# FieldError: Cannot resolve keyword 'extra_field' into field.
I can't aggregate (Sum) start and end separately and then subtract in Python, because the DB can't Sum DateTime values.
Is there a good way to do this without raw SQL?
Can't help Christophe without a Delorean, but I was hitting this error and was able to solve it in Django 1.8 like:
total_sum = Event.objects\
.annotate(anything=Sum(F('end') - F('start')))\
.aggregate(total_sum=Sum('anything'))['total_sum']
When I couldn't upgrade all my dependencies to 1.8, I found this to work with Django 1.7.9 on top of MySQL:
totals = Event.objects.extra(select={
    'extra_field': 'sum(end - start)'
})[0]
# the summed value is available as totals.extra_field
If you are on Postgres, you can use the django-pg-utils package and compute this in the database: convert the duration to seconds and then take the sum.
from pg_utils import Seconds
from django.db.models import F, Sum
Event.objects.aggregate(anything=Sum(Seconds(F('end') - F('start'))))
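Outside PostgreSQL, the same approach (turn each interval into seconds, then let the database sum the numbers) can be sketched with SQLite's julianday(), which returns fractional days. This uses plain sqlite3 with a hypothetical event table and total_duration_seconds helper; it is not the SQL that django-pg-utils produces.

```python
import sqlite3

def total_duration_seconds(events):
    """Sum (end - start) over all rows inside the database, in seconds."""
    conn = sqlite3.connect(":memory:")
    # "end" is quoted because END is an SQL keyword
    conn.execute('CREATE TABLE event (id INTEGER PRIMARY KEY, start TEXT, "end" TEXT)')
    conn.executemany('INSERT INTO event (start, "end") VALUES (?, ?)', events)
    # julianday() gives fractional days; multiplying by 86400 converts to seconds
    (total,) = conn.execute(
        'SELECT SUM((julianday("end") - julianday(start)) * 86400) FROM event'
    ).fetchone()
    conn.close()
    return total  # None when the table is empty (SQL SUM of no rows is NULL)

sample = [
    ("2020-01-01 10:00:00", "2020-01-01 11:00:00"),  # one hour
    ("2020-01-01 12:00:00", "2020-01-01 12:30:00"),  # half an hour
]
print(round(total_duration_seconds(sample)))  # 5400
```

Because julianday() works in floating-point days, expect tiny rounding errors in the raw result.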
This answer doesn't really satisfy me yet. My current workaround works, but it isn't computed in the DB:
from functools import reduce
reduce(lambda h, e: h + (e.end - e.start).total_seconds(), events, 0)
It returns the duration of all events in the queryset in seconds
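The same Python-side workaround can be written with sum() and a timedelta start value instead of reduce(), converting to seconds only once at the end. The Event dataclass below is a hypothetical stand-in for model instances; like the reduce() version, it still fetches every row, so nothing is computed in the database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    # Hypothetical stand-in for the model instances in the queryset
    start: datetime
    end: datetime

def total_seconds(events):
    """Sum the durations as timedeltas, then convert once to seconds."""
    return sum((e.end - e.start for e in events), timedelta()).total_seconds()

events = [
    Event(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 11, 0)),
    Event(datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 1, 12, 30)),
]
print(total_seconds(events))  # 5400.0
```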
Are there better solutions that avoid raw SQL?