How to aggregate computed field with django ORM? (without raw SQL)

I'm trying to find the cumulative duration of some events; the 'start' and 'end' fields are both django.db.models.DateTimeField fields.
What I would like to do should be writable like this:
from django.db.models import F, Sum
from my.models import Event
Event.objects.aggregate(anything=Sum(F('start') - F('end')))
# This first example raises:
# AttributeError: 'ExpressionNode' object has no attribute 'split'
# OK, I'll try something more SQL-ish:
Event.objects.extra(select={
    'extra_field': 'start - end'
}).aggregate(Sum('extra_field'))
# This time:
# FieldError: Cannot resolve keyword 'extra_field' into field.
I can't aggregate (Sum) start and end separately and then subtract in Python, because the DB can't Sum DateTime objects.
Is there a good way to do this without raw SQL?

Can't help Christophe without a Delorean, but I was hitting this error and was able to solve it in Django 1.8 like:
total_sum = Event.objects\
    .annotate(anything=Sum(F('start') - F('end')))\
    .aggregate(total_sum=Sum('anything'))['total_sum']
When I couldn't upgrade all my dependencies to 1.8, I found this to work with Django 1.7.9 on top of MySQL:
totals = Event.objects.extra(select={
    'extra_field': 'sum(start - end)'
})[0]

If you are on Postgres, you can use the django-pg-utils package and compute in the database. Cast the duration to seconds and then take the sum:
from pg_utils import Seconds
from django.db.models import F, Sum
Event.objects.aggregate(anything=Sum(Seconds(F('start') - F('end'))))

This answer doesn't really satisfy me yet; my current workaround works, but it's not computed in the DB:
from functools import reduce  # reduce is not a builtin on Python 3
reduce(lambda h, e: h + (e.end - e.start).total_seconds(), events, 0)
It returns the total duration of all events in the queryset, in seconds.
Are there better solutions that avoid raw SQL?
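A minimal sketch of the same Python-side fallback, written with sum() instead of reduce(). It assumes each event exposes start and end datetimes; the Event namedtuple and the sample values are hypothetical stand-ins, not the real Django model.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in for model instances with start/end datetimes.
Event = namedtuple("Event", ["start", "end"])


def total_duration_seconds(events):
    """Sum (end - start) over all events and return the total in seconds."""
    # sum() needs a timedelta start value: the default 0 cannot be added
    # to a timedelta.
    total = sum((e.end - e.start for e in events), timedelta())
    return total.total_seconds()


events = [
    Event(datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 10, 30)),   # 90 min
    Event(datetime(2020, 1, 2, 14, 0), datetime(2020, 1, 2, 14, 45)),  # 45 min
]
print(total_duration_seconds(events))
```

Like the reduce() version, this loads every row into Python, so it is only a fallback when the database cannot do the sum.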

Related

Custom transform for DateRangeField to query on delta between start and stop dates in Django

My question is about a custom transform or lookup (I can't figure out which yet).
For example, I have the following model:
from django.contrib.postgres.fields import DateRangeField
class Example(models.Model):
    date_range = DateRangeField()
Each date_range is an object with start and stop dates, for example [2020-02-24, 2020-04-16).
Is it possible to create a transform and/or lookup to filter model instances by the length of the range between the start date and the stop date?
Example
I want to find instances where the difference between the start and stop dates is more than 1 year.
This would be something like:
True - [2020-02-24, 2021-04-16) - delta more than one year
False - [2020-02-24, 2020-04-16) - delta less than one year
Example.objects.filter(date_range__transform_name_here__gt=365)
I can do it via raw SQL, but I don't want to, as this is quite a common task.
Thank you.

I know it's probably a bit late to help you, but you should be able to do this to achieve your goal:
from datetime import timedelta
from django.db.models import DurationField, ExpressionWrapper, F
greater_than_a_year = Example.objects.annotate(
    delta=ExpressionWrapper(
        F('date_range__endswith') - F('date_range__startswith'),
        output_field=DurationField(),
    )
).filter(
    delta__gt=timedelta(days=365)
)
Here we annotate by subtracting the lower portion of the range from the upper, and output as a DurationField() with the name delta. Then we simply filter on delta like any other DurationField.
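To make the comparison concrete, here is a plain-Python sketch of the same "upper minus lower is greater than 365 days" test, applied to the two example ranges from the question. The helper name spans_more_than is hypothetical, and this only illustrates the arithmetic; it is not what the ORM executes.

```python
from datetime import date, timedelta


def spans_more_than(lower, upper, threshold=timedelta(days=365)):
    """Return True if the range [lower, upper) is longer than threshold."""
    return (upper - lower) > threshold


# The two ranges from the question.
long_range = (date(2020, 2, 24), date(2021, 4, 16))
short_range = (date(2020, 2, 24), date(2020, 4, 16))

print(spans_more_than(*long_range))
print(spans_more_than(*short_range))
```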

Django manager with datetime.timedelta object inside F query combined with annotate and filter

I am trying to create a manager method in my app to filter email objects that were created 5/10/15 minutes ago (or whatever interval), counting from exactly now.
I thought I would use annotate to create a new boolean attribute whose value depends on a simple subtraction and division, checking whether the result is greater than 0.
from django.db.models import F
from django.utils import timezone
delta = 60 * 1 * 5
current_date = timezone.now()
qs = self.annotate(passed=((current_date - F('created_at')).seconds // delta > 0)).filter(passed=True)
At the moment my error is:
AttributeError: 'CombinedExpression' object has no attribute 'seconds'
This clearly happens because (current_date - F('created_at')) does not evaluate to a datetime.timedelta object but to a CombinedExpression object.
I see more problems here, e.g. how do I compare the expression to 0?
Anyway, I would appreciate any tips on whether I am close to achieving my goal or whether the entire logic behind this query is incorrect.
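The error can be reproduced with a toy model of how such expressions are built. This sketch is not Django's implementation: ToyExpression is a hypothetical class that only mimics the idea that subtracting a field reference from a datetime yields a deferred expression object rather than a timedelta.

```python
from datetime import datetime


class ToyExpression:
    """Toy stand-in for an ORM expression node (NOT Django's real class)."""

    def __init__(self, description):
        self.description = description

    def __rsub__(self, other):
        # Python calls this for `other - expr` when `other` (here a datetime)
        # does not know how to subtract a ToyExpression.
        return ToyExpression(f"({other!r} - {self.description})")


created_at = ToyExpression("created_at")
diff = datetime(2020, 1, 1) - created_at

# The result is another expression to be evaluated later (by the database in
# the real ORM), so timedelta attributes such as .seconds do not exist on it.
print(type(diff).__name__)
print(hasattr(diff, "seconds"))
```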

Well, I managed to find a solution. It might not be elegant, but it works:
qs = self.annotate(foo=Sum(current_date - F('created_at'))).filter(foo__gt=Sum(timezone.timedelta(seconds=delta)))

Why not something like this:
time_cut_off = timezone.now() - timezone.timedelta(minutes=delta)
qs = self.filter(created_at__gte=time_cut_off)
This will get you the messages created in the last delta minutes. Or were you looking for messages created exactly 5 minutes ago (and if so, how do you define that)?
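The cutoff approach works because "now minus created_at is at most delta" is the same test as "created_at is at or after now minus delta", and the right-hand side can be computed once in Python. A small sketch with plain datetimes; the function name created_within and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def created_within(created_at, now, delta):
    """True if created_at falls within the last `delta` before `now`."""
    cutoff = now - delta         # computed once, outside the query
    return created_at >= cutoff  # same comparison as created_at__gte=cutoff


now = datetime(2020, 1, 1, 12, 0)
five_minutes = timedelta(minutes=5)

print(created_within(datetime(2020, 1, 1, 11, 57), now, five_minutes))
print(created_within(datetime(2020, 1, 1, 11, 50), now, five_minutes))
```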

The documentation provides a simple and elegant solution if your timedelta is a constant:
For date and date/time fields, you can add or subtract a timedelta object. The following would return all entries that were modified more than 3 days after they were published:
>>> from datetime import timedelta
>>> Entry.objects.filter(mod_date__gt=F('pub_date') + timedelta(days=3))
In your case I don't think you even need the F() objects.

How can I make a Django update with a conditional case?

I would like to use Django to update a field to a different value depending on its current value, but I haven't figured out how to do it without doing 2 separate update statements.
Here's an example of what I'd like to do:
now = timezone.now()
data = MyData.objects.get(pk=dataID)
if data.targetTime < now:
    data.targetTime = now + timedelta(days=XX)
else:
    data.targetTime = data.targetTime + timedelta(days=XX)
data.save()
Now, I'd like to use an update() statement to avoid overwriting other fields on my data, but I don't know how to do it in a single update(). I tried some code like this, but the second update didn't use the updated time (I ended up with a field equal to the current time):
# Update the time to the current time
now = timezone.now()
MyData.objects.filter(pk=dataID).filter(targetTime__lt=now).update(targetTime=now)
# Then add the additional time
MyData.objects.filter(pk=dataID).update(targetTime=F('targetTime') + timedelta(days=XX))
Is there a way I can reduce this to a single update() statement? Something similar to the SQL CASE statement?

You need to use conditional expressions, like this:
from datetime import timedelta
from django.db.models import Case, F, When
from django.utils import timezone
obj = MyData.objects.get(pk=dataID)  # renamed from `object` to avoid shadowing the builtin
now = timezone.now()
obj.targetTime = Case(
    When(targetTime__lt=now, then=now + timedelta(days=XX)),
    default=F('targetTime') + timedelta(days=XX),
)
obj.save(update_fields=['targetTime'])
For debugging, try running this right after save to see what SQL queries have just run:
import pprint
from django.db import connection
pprint.pprint(["queries", connection.queries])
I've tested this with integers and it works in Django 1.8. I haven't tried dates yet, so it might need some tweaking.

Django 1.9 added the Greatest and Least database functions. This is an adaptation of Benjamin Toueg's answer:
from datetime import timedelta
from django.db.models import F
from django.db.models.functions import Greatest
from django.utils import timezone
MyData.objects.filter(pk=dataID).update(
    targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX)
)
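Why a single "greatest" replaces the if/else can be checked in plain Python, using ordinary datetimes rather than F() expressions. This is only a sketch of the logic; both function names are hypothetical.

```python
from datetime import datetime, timedelta


def extend_branching(target, now, extra):
    """The original if/else logic from the question."""
    if target < now:
        return now + extra
    return target + extra


def extend_greatest(target, now, extra):
    """Same result in one expression: take the later of the two, then add."""
    return max(target, now) + extra


now = datetime(2020, 1, 10, 12, 0)
extra = timedelta(days=3)
samples = [
    datetime(2020, 1, 1),   # target already in the past
    now,                    # target exactly now
    datetime(2020, 2, 1),   # target still in the future
]
for target in samples:
    print(extend_branching(target, now, extra) == extend_greatest(target, now, extra))
```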

Simple example for Django 3 and above:
from django.db.models import Case, Value, When
MyModel.objects.filter(abc__id__in=abc_id_list).update(
    status=Case(
        When(xyz__isnull=False, then=Value("this_value")),
        default=Value("default_value"),
    )
)

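If I understand correctly, you take the later of now and the value in the database.
If so, you can do it in one statement. Python's built-in max cannot compare an F() expression with a datetime, so use the database function Greatest (Django 1.9+) instead:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX))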

Instead of using queryset.update(...), use obj.save(update_fields=['field_one', 'field_two']) (see https://docs.djangoproject.com/en/dev/ref/models/instances/#specifying-which-fields-to-save), which won't overwrite your existing fields.
It's not possible to do this without a select query first (get), because you're doing two different things based on a conditional (i.e., you can't pass that kind of logic to the database with Django - there are limits to what can be achieved with F), but at least this gets you a single insert/update.

I have figured out how to do it with a raw SQL statement:
cursor = connection.cursor()
cursor.execute("UPDATE `mydatabase_name` SET `targetTime` = CASE WHEN `targetTime` < %s THEN %s ELSE (`targetTime` + %s) END WHERE `dataID` = %s", [timezone.now(), timezone.now() + timedelta(days=XX), timedelta(days=XX), dataID])
transaction.commit_unless_managed()
I'm using this for now and it seems to be accomplishing what I want.

Django + PostgreSQL group by date on datetimefield

I have a model with a DateTimeField, and I'm trying to annotate a count grouped by date.
E.g.:
order_totals = Transfer.objects.filter(created__range=[datetime.datetime.combine(datetime.date.today(), datetime.time.min) + datetime.timedelta(days=-5), datetime.datetime.combine(datetime.date.today(), datetime.time.max)]).values('created').annotate(Count('id'))
The problem with the above is that it groups by every second/millisecond of the datetime field rather than just the date.
How would I do this?

You should be able to solve this by using QuerySet.extra to add a column to the query, e.g.:
qs.filter(...).extra(select={'created_date': 'created::date'}).values('created_date')

Starting with Django 1.8, you can also use the new DateTime expression (oddly, it is not documented in the built-in expressions reference).
import pytz
from django.db.models.expressions import DateTime
qs.annotate(created_date=DateTime('created', 'day', pytz.UTC))
If you want to group by created_date, just chain another aggregating expression:
qs.annotate(created_date=DateTime('created', 'day', pytz.UTC)).values('created_date').annotate(number=Count('id'))
(The seemingly redundant values() call is needed to generate the appropriate GROUP BY; see the aggregation topic in the Django documentation.)
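The grouping these answers aim for can be sketched at the SQL level with Python's built-in sqlite3 module. This is an illustration only: it assumes SQLite (where date() plays the role of Postgres's ::date cast) and uses a hypothetical transfer table with made-up rows.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for Transfer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfer (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO transfer (created) VALUES (?)",
    [
        ("2020-03-01 09:15:00",),
        ("2020-03-01 17:40:12",),
        ("2020-03-02 08:00:00",),
    ],
)

# Truncate each timestamp to its date, then count rows per date.
rows = conn.execute(
    "SELECT date(created) AS created_date, COUNT(id) "
    "FROM transfer GROUP BY date(created) ORDER BY created_date"
).fetchall()

for created_date, number in rows:
    print(created_date, number)
```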

Aggregate difference between DateTime fields in Django

I have a table containing a series of entries which relate to time periods (specifically, time worked for a client):
task_time:
id | start_time       | end_time         | client (fk)
1  | 08/12/2011 14:48 | 08/12/2011 14:50 | 2
I am trying to aggregate all the time worked for a given client, from my Django app:
time_worked_aggregate = models.TaskTime.objects.\
    filter(client=some_client_id).\
    extra(select={'elapsed': 'SUM(task_time.end_time - task_time.start_time)'}).\
    values('elapsed')
if len(time_worked_aggregate) > 0:
    time_worked = time_worked_aggregate[0]['elapsed'].total_seconds()
else:
    time_worked = 0
This seems inelegant, but it does work. Or at least so I thought: it turns out that it works fine on a PostgreSQL database, but when I move over to SQLite, everything dies.
A bit of digging suggests that the reason for this is that DateTimes aren't first-class data in SQLite. The following raw SQLite query will do my job:
SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) FROM task_time WHERE ...;
My question is as follows:
The Python sample above seems roundabout. Can we do this more elegantly?
More importantly at this stage, can we do it in a way that will work on both Postgres and SQLite? Ideally, I'd like not to be writing raw SQL queries and switching on the database backend that happens to be in place; in general, Django is extremely good at protecting us from this. Does Django have a reasonable abstraction for this operation? If not, what's a sensible way for me to do a conditional switch on the backend?
I should mention for context that the dataset is many thousands of entries; the following is not really practical:
sum([task_time.end_time - task_time.start_time for task_time in models.TaskTime.objects.filter(...)], timedelta())
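The raw SQLite query quoted above can be tried with Python's built-in sqlite3 module. In this sketch the table layout follows the question, but the rows are made up and the timestamps are assumed to be stored as ISO-8601 text, which strftime('%s', ...) needs in order to parse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_time ("
    "id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT, client INTEGER)"
)
conn.executemany(
    "INSERT INTO task_time (start_time, end_time, client) VALUES (?, ?, ?)",
    [
        ("2011-12-08 14:48:00", "2011-12-08 14:50:00", 2),  # 2 minutes
        ("2011-12-08 15:00:00", "2011-12-08 15:30:00", 2),  # 30 minutes
        ("2011-12-08 16:00:00", "2011-12-08 16:10:00", 3),  # other client
    ],
)

# strftime('%s', ...) yields Unix seconds, so the difference is in seconds.
(total_seconds,) = conn.execute(
    "SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) "
    "FROM task_time WHERE client = ?",
    (2,),
).fetchone()
print(total_seconds)
```

The sum is computed inside the database, so only one number crosses into Python regardless of how many rows match.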

Almost the same solution as @andri proposed; the final result contains the same data.
ExpressionWrapper is new in Django 1.8.
from datetime import timedelta
from django.db.models import ExpressionWrapper, F, fields
from app.models import MyModel
duration = ExpressionWrapper(F('closed_at') - F('opened_at'), output_field=fields.DurationField())
objects = MyModel.objects.closed().annotate(duration=duration).filter(duration__gt=timedelta(seconds=2))
for obj in objects:
    print(obj.id, obj.duration, obj.duration.seconds)
# sample output
# 807 0:00:57.114017 57
# 800 0:01:23.879478 83
# 804 3:40:06.797188 13206
# 801 0:02:06.786300 126

I think since Django 1.8 we can do better.
I will only sketch the annotation part; the subsequent aggregation should be straightforward:
from django.db.models import F, Func
SomeModel.objects.annotate(
    duration=Func(F('end_date'), F('start_date'), function='age')
)
[More about the Postgres age function here: http://www.postgresql.org/docs/8.4/static/functions-datetime.html ]
Each instance of SomeModel will be annotated with a duration field containing the time difference, which in Python will be a datetime.timedelta object [more about datetime.timedelta here: https://docs.python.org/2/library/datetime.html#timedelta-objects ]

I will do it step by step:
First step: annotate the timedelta.
Second step: group by client and sum the timedelta.
The code looks like this:
from django.db.models import Count, Sum, F
times_obj_list = models.TaskTime.objects.annotate(times=F("end_time") - F("start_time"))
groupby_obj_list = times_obj_list.values("client").annotate(cnt=Count("id"), seconds=Sum("times")).order_by()

Django currently only supports aggregates for Min, Max, Avg and Count, so using raw SQL is the only way to achieve what you want. When you use raw SQL, database-independence is out the window, so unfortunately, you're out of luck. You'll have to just detect the database and alter the SQL appropriately.
