Aggregate difference between DateTime fields in Django - django

I have a table containing a series of entries which relate to time periods (specifically, time worked for a client):
task_time:
id | start_time       | end_time         | client (fk)
1  | 08/12/2011 14:48 | 08/12/2011 14:50 | 2
I am trying to aggregate all the time worked for a given client, from my Django app:
time_worked_aggregate = models.TaskTime.objects.\
    filter(client = some_client_id).\
    extra(select = {'elapsed': 'SUM(task_time.end_time - task_time.start_time)'}).\
    values('elapsed')
if len(time_worked_aggregate) > 0:
    time_worked = time_worked_aggregate[0]['elapsed'].total_seconds()
else:
    time_worked = 0
This seems inelegant, but it does work. Or at least so I thought: it turns out that it works fine on a PostgreSQL database, but when I move over to SQLite, everything dies.
A bit of digging suggests that the reason for this is that DateTimes aren't first-class data in SQLite. The following raw SQLite query will do my job:
SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) FROM task_time WHERE ...;
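The raw SQLite query can be tried outside Django with Python's built-in sqlite3 module. This is only a minimal sketch: the table layout follows the question, but the sample rows and the ISO-8601 text storage are assumptions made for illustration.

```python
import sqlite3

# In-memory database with the table layout from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_time ("
    "id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT, client INTEGER)"
)
# Made-up sample rows, stored as ISO-8601 text (an assumption).
conn.executemany(
    "INSERT INTO task_time VALUES (?, ?, ?, ?)",
    [
        (1, "2011-12-08 14:48:00", "2011-12-08 14:50:00", 2),
        (2, "2011-12-08 15:00:00", "2011-12-08 15:30:00", 2),
        (3, "2011-12-08 16:00:00", "2011-12-08 17:00:00", 3),
    ],
)

# strftime('%s', ...) yields the Unix timestamp as text; the subtraction
# coerces both sides to numbers, so SUM returns elapsed seconds.
(elapsed_seconds,) = conn.execute(
    "SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) "
    "FROM task_time WHERE client = ?",
    (2,),
).fetchone()
print(elapsed_seconds)
```

The result is a plain number of seconds rather than an interval, which is why the PostgreSQL-oriented code that calls total_seconds() on the result does not carry over unchanged.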
My question is as follows:
The Python sample above seems roundabout. Can we do this more elegantly?
More importantly at this stage, can we do it in a way that will work on both Postgres and SQLite? Ideally, I'd like not to be writing raw SQL queries and switching on the database backend that happens to be in place; in general, Django is extremely good at protecting us from this. Does Django have a reasonable abstraction for this operation? If not, what's a sensible way for me to do a conditional switch on the backend?
I should mention for context that the dataset is many thousands of entries; the following is not really practical:
sum((task_time.end_time - task_time.start_time for task_time in models.TaskTime.objects.filter(...)), datetime.timedelta())

Almost the same solution as @andri proposed; the final result contains the same data.
ExpressionWrapper - New in Django 1.8.
from datetime import timedelta
from django.db.models import ExpressionWrapper, F, fields
from app.models import MyModel
duration = ExpressionWrapper(F('closed_at') - F('opened_at'), output_field=fields.DurationField())
objects = MyModel.objects.closed().annotate(duration=duration).filter(duration__gt=timedelta(seconds=2))
for obj in objects:
    print(obj.id, obj.duration, obj.duration.seconds)
# sample output
# 807 0:00:57.114017 57
# 800 0:01:23.879478 83
# 804 3:40:06.797188 13206
# 801 0:02:06.786300 126

I think since Django 1.8 we can do better:
I'll show only the annotation step; the aggregation that follows should be straightforward:
from django.db.models import F, Func
SomeModel.objects.annotate(
    duration=Func(F('end_date'), F('start_date'), function='age')
)
[more about postgres age function here: http://www.postgresql.org/docs/8.4/static/functions-datetime.html ]
Each instance of SomeModel will be annotated with a duration field containing the time difference, which in Python will be a datetime.timedelta object [more about datetime timedelta here: https://docs.python.org/2/library/datetime.html#timedelta-objects ]

I would do it in two steps: first annotate each row with the timedelta, then group by client and sum the timedeltas.
The code looks like this:
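from django.db.models import Count, Sum, F
# Step 1: annotate every row with its duration.
times_obj_list = models.TaskTime.objects.annotate(times=F("end_time") - F("start_time"))
# Step 2: group by client and sum; the annotation is referenced by name, as a string.
groupby_obj_list = times_obj_list.values("client").annotate(cnt=Count("id"), seconds=Sum("times")).order_by()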
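In SQL terms, a values("client") call followed by annotate(...) corresponds to a GROUP BY on client. The sketch below shows that grouping with Python's sqlite3 module; the sample rows are made up, and strftime('%s', ...) is the SQLite-specific way of turning the stored text into seconds, so this illustrates the shape of the query rather than what Django emits.

```python
import sqlite3
from datetime import timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_time ("
    "id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT, client INTEGER)"
)
# Made-up sample rows for two clients.
conn.executemany(
    "INSERT INTO task_time VALUES (?, ?, ?, ?)",
    [
        (1, "2011-12-08 14:48:00", "2011-12-08 14:50:00", 2),
        (2, "2011-12-08 15:00:00", "2011-12-08 15:30:00", 2),
        (3, "2011-12-08 16:00:00", "2011-12-08 17:00:00", 3),
    ],
)

# One row per client: number of entries and total elapsed seconds.
rows = conn.execute(
    "SELECT client, COUNT(id), "
    "SUM(strftime('%s', end_time) - strftime('%s', start_time)) "
    "FROM task_time GROUP BY client ORDER BY client"
).fetchall()

# Convert the raw seconds back into timedelta objects on the Python side.
per_client = {
    client: (count, timedelta(seconds=seconds))
    for client, count, seconds in rows
}
print(per_client)
```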

At the time this answer was written, Django's aggregates (Min, Max, Avg, Count and so on) could only be applied to plain fields, not to computed expressions, so using raw SQL was the only way to achieve what you want. When you use raw SQL, database independence is out the window, so unfortunately you're out of luck. You'll have to detect the database and alter the SQL appropriately.

Related

how does django query work?

my models are designed like so
class Warehouse:
    name = ...
    sublocation = FK(Sublocation)
class Sublocation:
    name = ...
    city = FK(City)
class City:
    name = ...
    state = FK(State)
Now, if I run this query:
wh = Warehouse.objects.values_list('name', 'sublocation__name',
                                   'sublocation__city__name').first()
It returns the correct result, but how many queries does it run internally? Is Django fetching the data in one request?
Django makes only one query to the database for getting the data you described.
When you do:
wh = Warehouse.objects.values_list(
'name', 'sublocation__name', 'sublocation__city__name').first()
It translates into this query:
SELECT "myapp_warehouse"."name", "myapp_sublocation"."name", "myapp_city"."name"
FROM "myapp_warehouse" INNER JOIN "myapp_sublocation"
ON ("myapp_warehouse"."sublocation_id" = "myapp_sublocation"."id")
INNER JOIN "myapp_city" ON ("myapp_sublocation"."city_id" = "myapp_city"."id")
It gets the result in a single query. You can count number of queries in your shell like this:
from django.db import connection as c, reset_queries as rq
In [42]: rq()
In [43]: len(c.queries)
Out[43]: 0
In [44]: wh = Warehouse.objects.values_list('name', 'sublocation__name', 'sublocation__city__name').first()
In [45]: len(c.queries)
Out[45]: 1
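The same counting idea can be tried without a Django project. The following is a rough sketch using Python's sqlite3 module and its set_trace_callback hook, which is called for each SQL statement the connection runs. The table names, columns and sample rows are simplified assumptions, not the tables Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the three related tables.
conn.executescript(
    """
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sublocation (
        id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER REFERENCES city(id));
    CREATE TABLE warehouse (
        id INTEGER PRIMARY KEY, name TEXT,
        sublocation_id INTEGER REFERENCES sublocation(id));
    INSERT INTO city VALUES (1, 'Springfield');
    INSERT INTO sublocation VALUES (1, 'North Dock', 1);
    INSERT INTO warehouse VALUES (1, 'Main', 1);
    """
)

# Record every statement executed from this point on.
statements = []
conn.set_trace_callback(statements.append)

# Both joins are resolved inside a single SELECT.
row = conn.execute(
    "SELECT w.name, s.name, c.name FROM warehouse w "
    "INNER JOIN sublocation s ON w.sublocation_id = s.id "
    "INNER JOIN city c ON s.city_id = c.id LIMIT 1"
).fetchone()

conn.set_trace_callback(None)
print(len(statements), row)
```

Following the relations through joins in one SELECT is what keeps the count at one; fetching each related row separately would add one statement per hop.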
My suggestion would be to write a test for this using assertNumQueries (docs here).
from django.test import TestCase
from yourproject.models import Warehouse
class TestQueries(TestCase):
    def test_query_num(self):
        """
        Assert values_list query executes 1 database query
        """
        values = ['name', 'sublocation__name', 'sublocation__city__name']
        with self.assertNumQueries(1):
            # values_list takes field names as positional arguments
            Warehouse.objects.values_list(*values).first()
FYI I'm not sure how many queries are indeed sent to the database, 1 is my current best guess. Adjust the number of queries expected to get this to pass in your project and pin the requirement.
There is extensive documentation on how and when querysets are evaluated in Django docs: QuerySet API Reference.
The pretty much standard way to get good insight into how many and which queries take place during a page render is to use the Django Debug Toolbar. This can tell you precisely how many times this queryset is evaluated.
You can use django-debug-toolbar to see real queries to db

How can I make a Django update with a conditional case?

I would like to use Django to update a field to a different value depending on its current value, but I haven't figured out how to do it without doing 2 separate update statements.
Here's an example of what I'd like to do:
now = timezone.now()
data = MyData.objects.get(pk=dataID)
if data.targetTime < now:
    data.targetTime = now + timedelta(days=XX)
else:
    data.targetTime = data.targetTime + timedelta(days=XX)
data.save()
Now, I'd like to use an update() statement to avoid overwriting other fields on my data, but I don't know how to do it in a single update(). I tried some code like this, but the second update didn't use the up to date time (I ended up with a field equal to the current time) :
# Update the time to the current time
now = timezone.now()
MyData.objects.filter(pk=dataID).filter(targetTime__lt=now).update(targetTime=now)
# Then add the additional time
MyData.objects.filter(pk=dataID).update(targetTime=F('targetTime') + timedelta(days=XX))
Is there a way I can reduce this to a single update() statement? Something similar to the SQL CASE statement?
You need to use conditional expressions, like this
from datetime import timedelta
from django.db.models import Case, When, F
from django.utils import timezone
obj = MyData.objects.get(pk=dataID)
now = timezone.now()
# The CASE expression is evaluated by the database when the row is saved
obj.targetTime = Case(
    When(targetTime__lt=now, then=now + timedelta(days=XX)),
    default=F('targetTime') + timedelta(days=XX)
)
obj.save(update_fields=['targetTime'])
For debugging, try running this right after save to see what SQL queries have just run:
import pprint
from django.db import connection
pprint.pprint(["queries", connection.queries])
I've tested this with integers and it works in Django 1.8, I haven't tried dates yet so it might need some tweaking.
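To see what a conditional expression of this kind boils down to at the database level, here is a small sketch of a single UPDATE with a SQL CASE, run through Python's sqlite3 module. The table name my_data, the integer timestamps and the three-day extension are all assumptions chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; target_time is stored as integer epoch seconds.
conn.execute("CREATE TABLE my_data (id INTEGER PRIMARY KEY, target_time INTEGER)")

now = 1_000_000          # stand-in for the current time
extra = 3 * 86400        # extension of three days, in seconds
conn.executemany(
    "INSERT INTO my_data VALUES (?, ?)",
    [(1, now - 500),     # already expired
     (2, now + 500)],    # still in the future
)

# One statement handles both branches of the if/else.
conn.execute(
    "UPDATE my_data SET target_time = CASE "
    "WHEN target_time < :now THEN :now + :extra "
    "ELSE target_time + :extra END",
    {"now": now, "extra": extra},
)

result = dict(conn.execute("SELECT id, target_time FROM my_data"))
print(result)
```

The expired row restarts from now, while the future row is extended from its existing value, matching the two branches in the question.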
Django 1.9 added the Greatest and Least database functions. This is an adaptation of Benjamin Toueg's answer:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(
    targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX)
)
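The reason a "greatest" function can replace the if/else is that both branches add the same offset to whichever of the two times is later. A plain-Python sketch of that equivalence follows; the function names and sample dates are made up for illustration.

```python
from datetime import datetime, timedelta


def extend_branching(target, now, delta):
    """The if/else logic from the question."""
    if target < now:
        return now + delta
    return target + delta


def extend_greatest(target, now, delta):
    """The same rule written as: later of the two times, plus the offset."""
    return max(target, now) + delta


now = datetime(2024, 1, 10, 12, 0)
delta = timedelta(days=3)
past = datetime(2024, 1, 1)
future = datetime(2024, 2, 1)

for target in (past, now, future):
    print(target, extend_branching(target, now, delta) == extend_greatest(target, now, delta))
```

Here Python's max works because both arguments are real datetime values; inside a queryset update the comparison has to happen in the database, which is what Greatest is for.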
Simple Example for Django 3 and above:
from django.db.models import Case, Value, When, F
MyModel.objects.filter(abc__id__in=abc_id_list)\
    .update(status=Case(
        When(xyz__isnull=False, then=Value("this_value")),
        default=Value("default_value"),
    ))
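If I understand correctly, you take the maximum of the current time and the value in the database.
If that is so, you can do it in one statement. Python's built-in max cannot compare an F expression with a datetime, so use the database function Greatest (Django 1.9+) instead:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX))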
Instead of using queryset.update(...), use obj.save(update_fields=['field_one', 'field_two']) (see https://docs.djangoproject.com/en/dev/ref/models/instances/#specifying-which-fields-to-save), which won't overwrite your existing fields.
It's not possible to do this without a select query first (get), because you're doing two different things based on a conditional (i.e., you can't pass that kind of logic to the database with Django - there are limits to what can be achieved with F), but at least this gets you a single insert/update.
I have figured out how to do it with a raw SQL statement:
cursor = connection.cursor()
cursor.execute("UPDATE `mydatabase_name` SET `targetTime` = CASE WHEN `targetTime` < %s THEN %s ELSE (`targetTime` + %s) END WHERE `dataID` = %s", [timezone.now(), timezone.now() + timedelta(days=XX), timedelta(days=XX), dataID])
transaction.commit_unless_managed()
I'm using this for now and it seems to be accomplishing what I want.

How to aggregate computed field with django ORM? (without raw SQL)

I'm trying to find the cumulated duration of some events, 'start' and 'end' field are both django.db.models.DateTimeField fields.
What I would like to do should have been written like this:
from django.db.models import F, Sum
from my.models import Event
Event.objects.aggregate(anything=Sum(F('start') - F('end')))
# this first example return:
# AttributeError: 'ExpressionNode' object has no attribute 'split'
# Ok I'll try more SQLish:
Event.objects.extra(select={
    'extra_field': 'start - end'
}).aggregate(Sum('extra_field'))
# this time:
# FieldError: Cannot resolve keyword 'extra_field' into field.
I can't aggregate (Sum) start and end separately and then subtract in Python, because the DB can't Sum DateTime objects.
Is there a good way to do this without raw SQL?
Can't help Christophe without a Delorean, but I was hitting this error and was able to solve it in Django 1.8 like:
total_sum = Event.objects\
    .annotate(anything=Sum(F('start') - F('end')))\
    .aggregate(total_sum=Sum('anything'))['total_sum']
When I couldn't upgrade all my dependencies to 1.8, I found this to work with Django 1.7.9 on top of MySQL:
totals = Event.objects.extra(select={
    'extra_field': 'sum(start - end)'
})[0]
If you are on Postgres, then you can use the django-pg-utils package and compute in the database. Cast the duration field into seconds and then take the sum
from pg_utils import Seconds
from django.db.models import F, Sum
Event.objects.aggregate(anything=Sum(Seconds(F('start') - F('end'))))
This answer doesn't really satisfy me yet; my current workaround works, but it's not computed in the DB...
reduce(lambda h, e: h + (e.end - e.start).total_seconds(), events, 0)
It returns the duration of all events in the queryset in seconds
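For reference, the same Python-side fallback can be written with sum() and a timedelta start value. This is just a sketch: Event here is a hypothetical namedtuple standing in for the model instances, and the dates are made up.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in for model instances with start/end attributes.
Event = namedtuple("Event", ["start", "end"])

events = [
    Event(datetime(2015, 1, 1, 9, 0), datetime(2015, 1, 1, 10, 30)),
    Event(datetime(2015, 1, 2, 14, 0), datetime(2015, 1, 2, 14, 45)),
]

# sum() starts from 0 by default, which cannot be added to a timedelta,
# so an empty timedelta is passed as the start value.
total = sum((e.end - e.start for e in events), timedelta())
total_seconds = total.total_seconds()
print(total, total_seconds)
```

Like the reduce version, this still loads every row into Python, so it does not address the wish for a database-side computation.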
Are there better SQL-less solutions?

Making queries using F() and timedelta at django

I have the following model:
class Process(models.Model):
    title = models.CharField(max_length=255)
    date_up = models.DateTimeField(auto_now_add=True)
    days_activation = models.PositiveSmallIntegerField(default=0)
Now I need to query for all Process objects that have expired, according to their value of days_activation.
I tried
from datetime import datetime, timedelta
Process.objects.filter(date_up__lte=datetime.now()-timedelta(days=F('days_activation')))
and received the following error message:
TypeError: unsupported type for timedelta days component: F
I can of course do it in Python:
filter(lambda x: x.date_up <= datetime.now() - timedelta(days=x.days_activation),
       Process.objects.all())
but I really need to produce a django.db.models.query.QuerySet.
7 days == 1 day * 7
F is deep-black Django magic, and the objects that encounter it must belong to the appropriate magical circles to handle it.
In your case, django.db.models.query.filter knows about F, but datetime.timedelta does not. Therefore, you need to keep the F out of the timedelta argument list.
Fortunately, multiplication of timedelta * int is supported by F, so the following can work:
Process.objects.filter(date_up__lte=datetime.now()-timedelta(days=1)*F('days_activation'))
As it turns out, this will work with PostgreSQL, but will not work with SQLite (for which Django 1.11 only supports + and - for timedelta, perhaps because of a corresponding SQLite limitation).
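The difference between the two spellings comes down to ordinary Python operator dispatch. The sketch below uses a hypothetical FieldRef class as a stand-in for F (it is not Django code) to show why the timedelta constructor rejects such an object, while multiplication can be handed over to it.

```python
from datetime import timedelta


class FieldRef:
    """Hypothetical stand-in for an F() expression; not Django code."""

    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Python calls this for `timedelta * FieldRef` once
        # timedelta.__mul__ reports that it cannot handle the operand.
        return ("MUL", other, self.name)


ref = FieldRef("days_activation")

# The constructor needs real numbers, so this fails immediately.
try:
    timedelta(days=ref)
    constructor_error = None
except TypeError as exc:
    constructor_error = str(exc)

# Multiplication is deferred: the result is a description of the
# operation that could later be translated into SQL.
deferred = timedelta(days=1) * ref

print(constructor_error)
print(deferred)
print(timedelta(days=1) * 7 == timedelta(days=7))
```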
You are mixing two layers: the run-time layer and the database layer. F is just a helper which allows you to build slightly more complex queries with the Django ORM. You are using timedelta and F together and expecting the Django ORM to be smart enough to convert them to raw SQL, but as far as I can see it can't. Maybe I am wrong and there is something about the Django ORM I do not know.
Anyway, you can rewrite your ORM call with extra() and build the WHERE clause manually using native SQL functions equivalent to datetime.now() and timedelta.
You have to extend Aggregate. Do like below:
from django.db import models as DM
class BaseSQL(object):
    function = 'DATE_SUB'
    template = '%(function)s(NOW(), interval %(expressions)s day)'
class DurationAgr(BaseSQL, DM.Aggregate):
    def __init__(self, expression, **extra):
        super(DurationAgr, self).__init__(
            expression,
            output_field=DM.DateTimeField(),
            **extra
        )
Process.objects.filter(date_up__lte=DurationAgr('days_activation'))
Hopefully, It will work for you. :)
I tried to use solution by Lutz Prechelt above, but got MySQL syntax error.
It's because we can't perform arithmetic operations with INTERVAL in MySQL.
So, for MySQL my solution is create a custom DB function:
from django.db.models import DateField, F, Func
class MysqlSubDate(Func):
    function = 'SUBDATE'
    output_field = DateField()
Example of usage:
.annotate(remainded_days=MysqlSubDate('end_datetime', F('days_activation')))
Also you can use timedelta, it will be converted into INTERVAL
.annotate(remainded_days=MysqlSubDate('end_datetime', datetime.timedelta(days=10)))

How to put timedelta in django model?

With inspectdb I was able to get a "interval" field from postgres into django. In Django, it was a TextField. The object that I retrieved was indeed a timedelta object!
Now I want to put this timedelta object in a new model. What's the best way to do this? Because putting a timedelta in a TextField results in the str version of the object...
Since Django 1.8 you can use DurationField.
You can trivially normalize a timedelta to a single floating-point number in days or seconds.
Here's the "Normalize to Days" version.
float(timedelta.days) + float(timedelta.seconds) / float(86400)
You can trivially turn a floating-point number into a timedelta.
>>> datetime.timedelta(2.5)
datetime.timedelta(2, 43200)
So, store your timedelta as a float.
Here's the "Normalize to Seconds" version.
timedelta.days*86400+timedelta.seconds
Here's the reverse (using seconds)
datetime.timedelta(someSeconds / 86400.0)
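A quick round-trip check of the float-based approach, as a sketch using only the standard library (the sample duration is arbitrary). It uses timedelta.total_seconds(), which also accounts for microseconds, unlike the days * 86400 + seconds formula.

```python
from datetime import timedelta

td = timedelta(days=2, hours=12)

# Normalize to a single float, in seconds or in days.
as_seconds = td.total_seconds()
as_days = as_seconds / 86400

# Rebuild the timedelta from either representation.
from_seconds = timedelta(seconds=as_seconds)
from_days = timedelta(days=as_days)

print(as_seconds, as_days, from_seconds == td, from_days == td)
```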
First, define your model:
class TimeModel(models.Model):
    time = models.FloatField()
To store a timedelta object:
# td is a timedelta object
TimeModel.objects.create(time=td.total_seconds())
To get the timedelta object out of the database:
# Assume the previously created TimeModel object has an id of 1
td = timedelta(seconds=TimeModel.objects.get(id=1).time)
Note: I'm using Python 2.7 for this example.
https://bitbucket.org/schinckel/django-timedelta-field/src
There is a ticket which dates back to July 2006 relating to this:
https://code.djangoproject.com/ticket/2443
Several patches were written, but one of them was turned into a project:
https://github.com/johnpaulett/django-durationfield
Compared to all the other answers here this project is mature and would have been merged to core except that its inclusion is currently considered to be "bloaty".
Personally, I've just tried a bunch of solutions and this is the one that works beautifully.
from django.db import models
from durationfield.db.models.fields.duration import DurationField
class Event(models.Model):
    start = models.DateTimeField()
    duration = DurationField()
    @property
    def finish(self):
        return self.start + self.duration
Result:
$ evt = Event.objects.create(start=datetime.datetime.now(), duration='1 week')
$ evt.finish
Out[]: datetime.datetime(2013, 6, 13, 5, 29, 29, 404753)
And in admin:
Change event
Duration: 7 days, 0:00:00
For PostgreSQL, use django-pgsql-interval-field here: http://code.google.com/p/django-pgsql-interval-field/
Putting this out there cause it might be another way to solve this problem.
first install this library: https://pypi.python.org/pypi/django-timedeltafield
Then:
import timedelta
class ModelWithTimeDelta(models.Model):
    timedeltafield = timedelta.fields.TimedeltaField()
within the admin you will be asked to enter data into the field with the following format: 3 days, 4 hours, 2 minutes
There is a workaround explained here. If you're using Postgresql, then multiplying the result of F expression with timedelta solves the problem. For example if you have a start_time and a duration in minutes, you can calculate the end_time like this:
YourModel.objects.annotate(
    end_time=ExpressionWrapper(F('start_time') + timedelta(minutes=1) * F('duration'), output_field=DateTimeField())
)