Making queries using F() and timedelta in Django

I have the following model:
class Process(models.Model):
    title = models.CharField(max_length=255)
    date_up = models.DateTimeField(auto_now_add=True)
    days_activation = models.PositiveSmallIntegerField(default=0)
Now I need to query for all Process objects that have expired, according to their value of days_activation.
I tried
from datetime import datetime, timedelta
Process.objects.filter(date_up__lte=datetime.now()-timedelta(days=F('days_activation')))
and received the following error message:
TypeError: unsupported type for timedelta days component: F
I can of course do it in Python:
filter(lambda x: x.date_up <= datetime.now() - timedelta(days=x.days_activation),
       Process.objects.all())
but I really need to produce a django.db.models.query.QuerySet.

7 days == 1 day * 7
F is deep-black Django magic and the objects that encounter it
must belong to the appropriate magical circles to handle it.
In your case, django.db.models.query.filter knows about F, but datetime.timedelta does not.
Therefore, you need to keep the F out of the timedelta argument list.
Fortunately, multiplication of timedelta * int is supported by F,
so the following can work:
Process.objects.filter(date_up__lte=datetime.now()-timedelta(days=1)*F('days_activation'))
As it turns out, this works with PostgreSQL but not with SQLite (for which Django 1.11 only supports + and - for timedelta,
perhaps because of a corresponding SQLite limitation).
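To illustrate why the position of F matters, here is a minimal plain-Python sketch. FieldRef is a hypothetical stand-in for F() and not Django's actual implementation; it only shows that timedelta() needs a real number immediately, while multiplication can be handed over to the other operand and deferred.

```python
from datetime import timedelta


class FieldRef:
    """Hypothetical stand-in for F(): names a column whose value is unknown in Python."""

    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Python calls this for `timedelta * FieldRef` after timedelta's own
        # multiplication has declined; record the operation instead of computing it.
        return ("mul", other, self.name)


ref = FieldRef("days_activation")

# timedelta() wants a number right away, so passing the reference fails.
try:
    timedelta(days=ref)
    failed = False
except TypeError:
    failed = True

# Multiplication is delegated to FieldRef, so nothing is computed yet.
deferred = timedelta(days=1) * ref

# With a plain integer, the identity from the answer holds.
same = timedelta(days=7) == timedelta(days=1) * 7
```

The deferred tuple plays the role of the expression object that the ORM would later translate to SQL.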

You are mixing two layers: the run-time layer and the database layer. The F function is just a helper which allows you to build slightly more complex queries with the Django ORM. You are using timedelta and F together and expecting the Django ORM to be smart enough to convert these things to raw SQL, but as far as I can see, it can't. Maybe I am wrong and am missing something about the Django ORM.
Anyway, you can rewrite your ORM call with extra() and build the WHERE clause manually, using native SQL functions that are equivalent to datetime.now() and timedelta.

You have to extend Aggregate, as shown below:
from django.db import models as DM
class BaseSQL(object):
    function = 'DATE_SUB'
    template = '%(function)s(NOW(), interval %(expressions)s day)'
class DurationAgr(BaseSQL, DM.Aggregate):
    def __init__(self, expression, **extra):
        super(DurationAgr, self).__init__(
            expression,
            output_field=DM.DateTimeField(),
            **extra
        )
Process.objects.filter(date_up__lte=DurationAgr('days_activation'))
Hopefully it will work for you. :)

I tried to use the solution by Lutz Prechelt above, but got a MySQL syntax error.
That is because we can't perform arithmetic operations with INTERVAL in MySQL.
So, for MySQL, my solution is to create a custom DB function:
from django.db.models import DateField, F, Func
class MysqlSubDate(Func):
    function = 'SUBDATE'
    output_field = DateField()
Example of usage:
.annotate(remainded_days=MysqlSubDate('end_datetime', F('days_activation')))
You can also pass a timedelta, which will be converted into an INTERVAL:
.annotate(remainded_days=MysqlSubDate('end_datetime', datetime.timedelta(days=10)))

Related

Django manager with datetime.timedelta object inside F query combined with annotate and filter

I am trying to create a manager method in my app to filter email objects that were created 5, 10, 15 or any other number of minutes ago, counting exactly from now.
I thought I would use annotate to create a new boolean parameter whose state depends on a simple subtraction with division, checking whether the result is bigger than 0.
from django.db.models import F
from django.utils import timezone
delta = 60 * 1 * 5
current_date = timezone.now()
qs = self.annotate(passed=((current_date - F('created_at')).seconds // delta > 0)).filter(passed=True)
At the moment my error says:
AttributeError: 'CombinedExpression' object has no attribute 'seconds'
This clearly happens because (current_date - F('created_at')) does not evaluate to a datetime.timedelta object but to a CombinedExpression object.
I see more problems here, e.g. how to compare the expression to 0.
Anyway, I would appreciate any tips on whether I am close to achieving my goal or whether the entire logic behind this query is incorrect.
Well, I managed to find a solution; it might not be the most elegant one, but it works:
qs = self.annotate(foo=Sum(current_date - F('created_at'))).filter(foo__gt=Sum(timezone.timedelta(seconds=delta)))
Why not something like this:
time_cut_off = timezone.now() - timezone.timedelta(seconds=delta)
qs = self.filter(created_at__gte=time_cut_off)
This will get you the messages created in the last delta seconds. Or were you looking for messages created exactly 5 minutes ago (and if so, how do you define that)?
The documentation provides a simple and elegant solution if your timedelta is a constant:
For date and date/time fields, you can add or subtract a timedelta object. The following would return all entries that were modified more than 3 days after they were published:
>>> from datetime import timedelta
>>> Entry.objects.filter(mod_date__gt=F('pub_date') + timedelta(days=3))
In your case I don't think you even need the F() objects.
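A constant cut-off is an ordinary Python value that is computed before the query is built, which is why no F() is needed. As a rough plain-Python sketch of the same comparison (created_within is a hypothetical helper, and the list stands in for the created_at column):

```python
from datetime import datetime, timedelta, timezone

delta = 60 * 1 * 5  # seconds, as in the question


def created_within(created_at_values, now, seconds):
    """Return the timestamps that are at most `seconds` older than `now`."""
    cutoff = now - timedelta(seconds=seconds)
    return [ts for ts in created_at_values if ts >= cutoff]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stamps = [now - timedelta(minutes=2), now - timedelta(minutes=10)]
recent = created_within(stamps, now, delta)
```

Only the timestamp from two minutes ago survives the five-minute cut-off.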

How to use a tsvector field to perform ranking in Django with postgresql full-text search?

I need to perform a ranking query using postgresql full-text search feature and Django with django.contrib.postgres module.
According to the doc, it is quite easy to do this using the SearchRank class by doing the following:
>>> from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector('body_text')
>>> query = SearchQuery('cheese')
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
This probably works well but this is not exactly what I want since I have a field in my table which already contains tsvectorized data that I would like to use (instead of recomputing tsvector at each search query).
Unfortunately, I can't figure out how to provide this tsvector field to the SearchRank class instead of a SearchVector object on a raw data field.
Is anyone able to indicate how to deal with this?
Edit:
Of course, simply trying to instantiate a SearchVector from the tsvector field does not work and fails with this error (approximate, since I translated it from French):
django.db.utils.ProgrammingError: ERROR: function to_tsvector(tsvector) does not exist
If your model has a SearchVectorField like so:
from django.contrib.postgres.search import SearchVectorField
class Entry(models.Model):
    ...
    search_vector = SearchVectorField()
you would use the F expression:
from django.db.models import F
...
Entry.objects.annotate(
    rank=SearchRank(F('search_vector'), query)
).order_by('-rank')
I've been seeing mixed answers here on SO and in the official documentation. F Expressions aren't used in the documentation for this. However it may just be that the documentation doesn't actually provide an example for using SearchRank with a SearchVectorField.
Looking at the output of .explain(analyze=True):
Without the F Expression:
Sort Key: (ts_rank(to_tsvector(COALESCE((search_vector)::text, ''::text))
When the F Expression is used:
Sort Key: (ts_rank(search_vector, ...)
In my experience, the only difference between using an F expression and the field name in quotes is that the F expression returns much faster but is sometimes less accurate, depending on how you structure the query; enforcing a COALESCE can be useful in some cases. In my case, using the F expression with my SearchVectorField gives about a 3-5x speed boost.
Ensuring your SearchQuery has a config kwarg also improves things dramatically.

Django time difference with F object

I have the following model:
class Assignment(models.Model):
    extra_days = models.IntegerField(default=0)
    due_date = models.DateTimeField()
Where due_date is the date the assignment is due and extra_days is the number of extra days given after the due date to finish the assignment.
I want to create a query that returns all rows where due_date + extra_days is greater than the current date. Here's what I am doing:
from django.utils import timezone
from django.db.models import F
from datetime import datetime
cur_date = timezone.make_aware(datetime.now(), timezone.get_default_timezone())
a = Assignment.objects.filter(extra_days__gt=cur_date - F('due_date'))
When I print a, I get the following error:
  File "c:\Python27\lib\site-packages\MySQLdb\cursors.py", line 204, in execute
    if not self._defer_warnings: self._warning_check()
  File "c:\Python27\lib\site-packages\MySQLdb\cursors.py", line 117, in _warning_check
    warn(w[-1], self.Warning, 3)
Warning: Truncated incorrect DOUBLE value: '2013-09-01 02:54:31'
If I do a time difference that results in, say, 3.1 days, I'm assuming the days difference would still be 3. I think it would be more correct to do something like this:
a = Assignment.objects.filter(due_date__gt=cur_date - timedelta(days=F('extra_days')))
But that also results in an error.
How can I do this without writing a raw SQL query?
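For reference, the truncation mentioned in the question can be seen in plain Python. This is only a sketch of the intended comparison, not an ORM query, and still_open is a hypothetical helper: timedelta.days drops the fractional part, while comparing full datetimes avoids the rounding question altogether.

```python
from datetime import datetime, timedelta

gap = timedelta(days=3, hours=2)  # a bit more than 3 days
whole_days = gap.days             # the fractional part is dropped


def still_open(due_date, extra_days, now):
    """True while due_date plus the extra days lies after `now`."""
    return due_date + timedelta(days=extra_days) > now


due = datetime(2013, 9, 1, 12, 0)
now = due + gap
```

With these values, an assignment with 4 extra days is still open, while one with 3 extra days is not.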
This depends on the database backend you are using.
PostgreSQL can subtract dates directly, so with PostgreSQL the following will work:
from django.db.models import F, Func
from django.db.models.functions import Now
class DaysInterval(Func):
    function = 'make_interval'
    template = '%(function)s(days:=%(expressions)s)'
qs = Assignment.objects.annotate(remaining_days=F('due_date') - Now())
qs.filter(remaining_days__lt=DaysInterval(F('extra_days')))
This results in the following SQL:
SELECT "assignments_assignment"."id",
       "assignments_assignment"."extra_days",
       "assignments_assignment"."due_date",
       ("assignments_assignment"."due_date" - STATEMENT_TIMESTAMP()) AS "remaining_days"
FROM "assignments_assignment"
WHERE ("assignments_assignment"."due_date" - STATEMENT_TIMESTAMP())
      < (make_interval(DAYS:="assignments_assignment"."extra_days"))
For date difference calculations in other database backends see the Datediff function created by Michael Brooks.
It seems like what I'm trying to do is not possible. I ended up writing a raw query:
cursor.execute("SELECT * FROM app_assignment WHERE DATE_ADD(due_date, INTERVAL extra_days DAY) > utc_timestamp()")
I was so repulsed at not being able to use the ORM for doing something so seemingly simple that I considered trying out SQLAlchemy, but a raw query works fine. I always tried workarounds to make sure I could use the ORM, but I'll use raw SQL going forwards for complex queries.
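For anyone who wants to try the raw-SQL idea without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. Note that this swaps in SQLite's datetime() modifiers for MySQL's DATE_ADD; the table layout and the fixed "now" value are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignment (id INTEGER PRIMARY KEY, due_date TEXT, extra_days INTEGER)"
)
conn.executemany(
    "INSERT INTO assignment (id, due_date, extra_days) VALUES (?, ?, ?)",
    [
        (1, "2013-09-01 00:00:00", 0),
        (2, "2013-09-01 00:00:00", 5),
    ],
)

now = "2013-09-03 00:00:00"  # fixed "current time" so the result is reproducible
rows = conn.execute(
    "SELECT id FROM assignment "
    "WHERE datetime(due_date, '+' || extra_days || ' days') > ?",
    (now,),
).fetchall()
open_ids = [row[0] for row in rows]
```

Only the assignment with 5 extra days is still open two days after the due date.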
As far as I know, you cannot pass an F() object as a parameter to another function, since the base class of F() is tree.Node, a class for storing a tree graph which is primarily used for filter constructs in the ORM.
See the definition of F() in django/db/models/expressions.py and of Node in django/utils/tree.py (Django 1.3.4):
class ExpressionNode(tree.Node):
    ...
class F(ExpressionNode):
    """
    An expression representing the value of the given field.
    """
    def __init__(self, name):
        super(F, self).__init__(None, None, False)
        self.name = name
    def __deepcopy__(self, memodict):
        obj = super(F, self).__deepcopy__(memodict)
        obj.name = self.name
        return obj
    def prepare(self, evaluator, query, allow_joins):
        return evaluator.prepare_leaf(self, query, allow_joins)
    def evaluate(self, evaluator, qn, connection):
        return evaluator.evaluate_leaf(self, qn, connection)
you can do something like
Assignment.objects.filter(due_date__gt=F('due_date') - timedelta(days=1))
but not
Assignment.objects.filter(due_date__gt=cur_date - timedelta(days=F('extra_days')))
Correct me if I am wrong. I hope this helps a little.
Just in case anyone else looks for this, here's something that might be worth looking into.
I'm using Django 1.4 and am running into the exact same issue as the OP. Seems that the issue is probably due to timedelta and datetime needing to evaluate before being sent to the database, but the F object is inherently only going to resolve in the database.
I noticed that in Django 1.8, a new DurationField was introduced that looks like it would work directly like Python's timedelta. This should mean that, instead of taking the timedelta of an F object lookup on an IntegerField, one could theoretically use a DurationField, and then the F object wouldn't need to be inside a timedelta at all. Unfortunately, due to dependencies, I'm not currently able to upgrade my project to 1.8 and test this theory.
If anyone else encounters this problem and is able to test my suggestion, I'd love to know. If I resolve my dependencies and can upgrade to 1.8, then I'll be sure to post back with my results.

How can I make a Django update with a conditional case?

I would like to use Django to update a field to a different value depending on its current value, but I haven't figured out how to do it without doing 2 separate update statements.
Here's an example of what I'd like to do:
now = timezone.now()
data = MyData.objects.get(pk=dataID)
if data.targetTime < now:
    data.targetTime = now + timedelta(days=XX)
else:
    data.targetTime = data.targetTime + timedelta(days=XX)
data.save()
Now, I'd like to use an update() statement to avoid overwriting other fields on my data, but I don't know how to do it in a single update(). I tried some code like this, but the second update didn't use the up to date time (I ended up with a field equal to the current time) :
# Update the time to the current time
now = timezone.now()
MyData.objects.filter(pk=dataID).filter(targetTime__lt=now).update(targetTime=now)
# Then add the additional time
MyData.objects.filter(pk=dataID).update(targetTime=F('targetTime') + timedelta(days=XX))
Is there a way I can reduce this to a single update() statement? Something similar to the SQL CASE statement?
You need to use conditional expressions, like this:
from datetime import timedelta
from django.db.models import Case, When, F
from django.utils import timezone
obj = MyData.objects.get(pk=dataID)
now = timezone.now()
obj.targetTime = Case(
    When(targetTime__lt=now, then=now + timedelta(days=XX)),
    default=F('targetTime') + timedelta(days=XX)
)
obj.save(update_fields=['targetTime'])
For debugging, try running this right after save to see what SQL queries have just run:
import pprint
from django.db import connection
pprint.pprint(["queries", connection.queries])
I've tested this with integers and it works in Django 1.8, I haven't tried dates yet so it might need some tweaking.
Django 1.9 added the Greatest and Least database functions. This is an adaptation of Benjamin Toueg's answer:
from django.db.models import F
from django.db.models.functions import Greatest
MyData.objects.filter(pk=dataID).update(
    targetTime=Greatest(F('targetTime'), timezone.now()) + timedelta(days=XX)
)
A simple example for Django 3 and above:
from django.db.models import Case, Value, When
MyModel.objects.filter(abc__id__in=abc_id_list).update(
    status=Case(
        When(xyz__isnull=False, then=Value("this_value")),
        default=Value("default_value"),
    )
)
If I understand correctly, you take the maximum time between now and the value in database.
If that is so, you can do it in one line with the max function:
from django.db.models import F
MyData.objects.filter(pk=dataID).update(targetTime=max(F('targetTime'),timezone.now()) + timedelta(days=XX))
Instead of using queryset.update(...), use obj.save(update_fields=['field_one', 'field_two']) (see https://docs.djangoproject.com/en/dev/ref/models/instances/#specifying-which-fields-to-save), which won't overwrite your existing fields.
It's not possible to do this without a select query first (get), because you're doing two different things based on a conditional (i.e., you can't pass that kind of logic to the database with Django - there are limits to what can be achieved with F), but at least this gets you a single insert/update.
I have figured out how to do it with a raw SQL statement:
cursor = connection.cursor()
cursor.execute("UPDATE `mydatabase_name` SET `targetTime` = CASE WHEN `targetTime` < %s THEN %s ELSE (`targetTime` + %s) END WHERE `dataID` = %s", [timezone.now(), timezone.now() + timedelta(days=XX), timedelta(days=XX), dataID])
transaction.commit_unless_managed()
I'm using this for now and it seems to be accomplishing what I want.
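The single-statement CASE idea can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: it uses SQLite instead of MySQL, a made-up mydata table with timestamps stored as text, and days = 10 standing in for XX.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, target_time TEXT)")
conn.executemany(
    "INSERT INTO mydata (id, target_time) VALUES (?, ?)",
    [
        (1, "2020-01-01 00:00:00"),  # already in the past
        (2, "2020-03-01 00:00:00"),  # still in the future
    ],
)

now = datetime(2020, 2, 1)
days = 10  # stands in for XX
conn.execute(
    "UPDATE mydata SET target_time = CASE "
    "WHEN target_time < :now THEN :now_plus "
    "ELSE datetime(target_time, :shift) END",
    {
        "now": now.strftime(FMT),
        "now_plus": (now + timedelta(days=days)).strftime(FMT),
        "shift": "+%d days" % days,
    },
)
result = dict(conn.execute("SELECT id, target_time FROM mydata"))
```

The past row is reset to now plus 10 days, and the future row is pushed back by 10 days, all in one UPDATE.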

Aggregate difference between DateTime fields in Django

I have a table containing a series of entries which relate to time periods (specifically, time worked for a client):
task_time:
id | start_time       | end_time         | client (fk)
1  | 08/12/2011 14:48 | 08/12/2011 14:50 | 2
I am trying to aggregate all the time worked for a given client, from my Django app:
time_worked_aggregate = models.TaskTime.objects.\
    filter(client = some_client_id).\
    extra(select = {'elapsed': 'SUM(task_time.end_time - task_time.start_time)'}).\
    values('elapsed')
if len(time_worked_aggregate) > 0:
    time_worked = time_worked_aggregate[0]['elapsed'].total_seconds()
else:
    time_worked = 0
This seems inelegant, but it does work. Or at least so I thought: it turns out that it works fine on a PostgreSQL database, but when I move over to SQLite, everything dies.
A bit of digging suggests that the reason for this is that DateTimes aren't first-class data in SQLite. The following raw SQLite query will do my job:
SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) FROM task_time WHERE ...;
My question is as follows:
The Python sample above seems roundabout. Can we do this more elegantly?
More importantly at this stage, can we do it in a way that will work on both Postgres and SQLite? Ideally, I'd like not to be writing raw SQL queries and switching on the database backend that happens to be in place; in general, Django is extremely good at protecting us from this. Does Django have a reasonable abstraction for this operation? If not, what's a sensible way for me to do a conditional switch on the backend?
I should mention for context that the dataset is many thousands of entries; the following is not really practical:
sum((task_time.end_time - task_time.start_time for task_time in models.TaskTime.objects.filter(...)), timedelta())
Almost the same solution as @andri proposed; the final result contains the same data.
ExpressionWrapper - New in Django 1.8.
from datetime import timedelta
from django.db.models import ExpressionWrapper, F, fields
from app.models import MyModel
duration = ExpressionWrapper(F('closed_at') - F('opened_at'), output_field=fields.DurationField())
objects = MyModel.objects.closed().annotate(duration=duration).filter(duration__gt=timedelta(seconds=2))
for obj in objects:
    print(obj.id, obj.duration, obj.duration.seconds)
# sample output
# 807 0:00:57.114017 57
# 800 0:01:23.879478 83
# 804 3:40:06.797188 13206
# 801 0:02:06.786300 126
I think since Django 1.8 we can do better.
I will show only the annotation part; the aggregation that follows should be straightforward:
from django.db.models import F, Func
SomeModel.objects.annotate(
    duration=Func(F('end_date'), F('start_date'), function='age')
)
[more about postgres age function here: http://www.postgresql.org/docs/8.4/static/functions-datetime.html ]
Each instance of SomeModel will be annotated with a duration field containing the time difference, which in Python will be a datetime.timedelta object [more about datetime timedelta here: https://docs.python.org/2/library/datetime.html#timedelta-objects ]
I would do it in two steps: first, annotate each row with the timedelta; second, group by client and sum the timedeltas.
The code looks like this:
from django.db.models import Count, Sum, F
times_obj_list = models.TaskTime.objects.annotate(times=F("end_time") - F("start_time"))
groupby_obj_list = times_obj_list.values("client").annotate(cnt=Count("id"), seconds=Sum("times")).order_by()
Django currently only supports aggregates for Min, Max, Avg and Count, so using raw SQL is the only way to achieve what you want. When you use raw SQL, database-independence is out the window, so unfortunately, you're out of luck. You'll have to just detect the database and alter the SQL appropriately.
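If you do go the raw SQL route on SQLite, the strftime('%s', ...) query from the question can be checked with Python's built-in sqlite3 module. This is just a sketch: the rows are made up, and the timestamps are assumed to be stored as ISO-formatted text, which is what SQLite's date functions expect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_time ("
    "id INTEGER PRIMARY KEY, start_time TEXT, end_time TEXT, client INTEGER)"
)
conn.executemany(
    "INSERT INTO task_time (id, start_time, end_time, client) VALUES (?, ?, ?, ?)",
    [
        (1, "2011-12-08 14:48:00", "2011-12-08 14:50:00", 2),
        (2, "2011-12-08 15:00:00", "2011-12-08 15:30:00", 2),
        (3, "2011-12-08 09:00:00", "2011-12-08 10:00:00", 3),
    ],
)

# strftime('%s', ...) yields seconds since the epoch, so the difference is in seconds.
(total_seconds,) = conn.execute(
    "SELECT SUM(strftime('%s', end_time) - strftime('%s', start_time)) "
    "FROM task_time WHERE client = ?",
    (2,),
).fetchone()
```

For client 2 this adds up 2 minutes and 30 minutes of work, giving the total in seconds.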