filter pandas dataframe for timedeltas - python-2.7

I have a pandas DataFrame containing the timestamp columns 'expiration' and 'date'.
I want to filter for rows where the delta between expiration and date is at most a certain maximum.
fr.expiration - fr.date gives me timedelta values, but I don't know how
to express a filter criterion such as fr[timedelta(fr.expiration-fr.date)<=60days]

For the 60 days you want to compare against, create a timedelta object with that value, timedelta(days=60), and use it in the filter. If the subtraction already gives you timedelta objects, recasting the result to a timedelta is unnecessary.
Finally, check the signs of the timedeltas you are comparing: if expiration can be earlier than date, the difference is negative and will always pass a "<= 60 days" test.

# sashkello
Thanks,
filterfr = filterfr[filterfr.expiration - filterfr.date <= numpy.timedelta64(datetime.timedelta(days=60))]
did the trick.
filterfr.expiration - filterfr.date
resulted in timedelta64 values, and comparing them with a plain datetime.timedelta
raised TypeError: can't compare datetime.timedelta to long.
Converting to numpy.timedelta64 before the comparison worked.
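In current pandas releases the numpy conversion is usually unnecessary: a timedelta64 Series can be compared directly with a pd.Timedelta. The sketch below assumes a recent pandas (not the Python 2.7-era version from the question) and uses made-up dates; it also applies the sign check suggested in the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
fr = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-03-01"]),
    "expiration": pd.to_datetime(["2013-02-15", "2013-06-01", "2013-02-01"]),
})

# Subtracting two datetime64 columns yields a timedelta64[ns] Series.
delta = fr["expiration"] - fr["date"]
max_delta = pd.Timedelta(days=60)

# Keep rows where expiration is between 0 and 60 days after date.
# Without the lower bound, negative deltas (expiration before date)
# would also pass the "<= 60 days" test.
within = fr[(delta >= pd.Timedelta(0)) & (delta <= max_delta)]
print(within)
```

Here the third row has a negative delta, so it passes `delta <= max_delta` on its own but is dropped by the lower bound.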

Related

Django ORM converting date to datetime which is slowing down query 30x

I'm attempting to query a table and filter the results by date on a datetime field:
.filter(bucket__gte = start_date) where bucket is a datetimefield and start_date is a date object.
However, Django converts start_date to a timestamp in the raw SQL, e.g. 2020-02-01 00:00:00, when I want it to just be a date, e.g. 2020-02-01.
For some reason casting bucket to a date or casting start_time to a timestamp makes the query 30x slower.
When I manually write a query and compare bucket directly to a date, e.g. bucket >= '2020-02-01', the query is blazing fast.
How can I get the Django ORM to do this?
It seems most efficient to convert your date to a datetime in Python and then do the lookup in the ORM, since you are filtering on a DateTimeField:
from datetime import datetime
.filter(bucket__gte=datetime.combine(start_date, datetime.min.time()))
If the bucket field is indexed, EXPLAIN for this query should show an Index Scan, which is the desired and most efficient execution plan.
If it is not indexed, the query should still be faster, since you avoid casting.
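With USE_TZ = True, Django emits a RuntimeWarning when a naive datetime is compared with a DateTimeField, so an aware midnight value is a safer variant of the datetime.combine() approach. This sketch assumes UTC is the zone in which start_date should be interpreted; `start_of_day` is a hypothetical helper name, and in a real project you would pick the zone that matches your data (for example Django's current time zone).

```python
from datetime import date, datetime, time, timezone


def start_of_day(d, tz=timezone.utc):
    """Return an aware datetime at 00:00 on date ``d`` in time zone ``tz``.

    Comparing a DateTimeField with a datetime (instead of casting the
    column to a date) leaves an index on the column usable.
    """
    return datetime.combine(d, time.min, tzinfo=tz)


start_date = date(2020, 2, 1)
lower_bound = start_of_day(start_date)  # 2020-02-01 00:00:00+00:00

# Hypothetical usage with the question's model:
# SomeModel.objects.filter(bucket__gte=lower_bound)
```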
For some reason casting bucket to a date or casting start_time to a timestamp makes the query 30x slower.
Yes, casting bucket to a date would prevent use of an index (unless the index was over the casted column). But casting start_time to a timestamp would not. What is it 30 times slower than? You just said python automatically converts it, so, how is that different than casting it? As in, what is your actual code?
When I manually write a query and compare bucket directly to a date ex bucket >= '2020-02-01' the query is blazing fast.
OK, but what is it actually doing?
explain select * from foo where bucket > '2021-03-01';
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on foo (cost=0.00..38.25 rows=753 width=8)
Filter: (bucket > '2021-03-01 00:00:00-05'::timestamp with time zone)
(2 rows)
PostgreSQL is also converting it to a timestamp. Does it give the right answer or the wrong answer?
Try converting your datetime to a date in the filter:
.filter(bucket__date__gte=start_date)

Custom transform for DateRangeField to query on delta between start and stop dates in Django

My question is about a custom transform or lookup (I can't figure it out yet).
For example, I have the following model:
from django.contrib.postgres.fields import DateRangeField
from django.db import models

class Example(models.Model):
    date_range = DateRangeField()
Each date_range is an object with start and stop dates, for example [2020-02-24, 2020-04-16)
Is it possible to create a transform and/or lookup in order to filter instances of the model by the length of the range between the start date and the stop date?
Example
I want to find instances where the difference between the start and stop dates is more than 1 year.
This would be something like
True - [2020-02-24, 2021-04-16) - delta more than one year
False - [2020-02-24, 2020-04-16) - delta less than one year
Example.objects.filter(date_range__transform_name_here__gt=365)
I can do it via raw SQL, but I don't want to, as it is quite a common task.
Thank you.
I know it's probably a bit late to help you, but you should be able to do this to achieve your goal:
from datetime import timedelta
from django.db.models import DurationField, ExpressionWrapper, F

# date_range__startswith / date_range__endswith are the range's lower and upper
# bounds; using transforms inside F() requires Django 3.2 or later.
greater_than_a_year = Example.objects.annotate(
    delta=ExpressionWrapper(
        F('date_range__endswith') - F('date_range__startswith'),
        output_field=DurationField(),
    )
).filter(
    delta__gt=timedelta(days=365)
)
Here we annotate by subtracting the lower portion of the range from the upper, and output as a DurationField() with the name delta. Then we simply filter on delta like any other DurationField.
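To sanity-check the predicate outside the database, the comparison the annotation performs can be mirrored in plain Python. `span_exceeds` is a hypothetical helper; range values loaded by Django expose `.lower` and `.upper` attributes that could be passed in. Note that "more than 365 days" is not quite "more than one year": a range covering exactly one calendar year that includes 29 February is 366 days long and counts as longer.

```python
from datetime import date, timedelta


def span_exceeds(lower, upper, limit=timedelta(days=365)):
    """Return True if the half-open range [lower, upper) is longer than limit.

    Assumes both bounds are set (an unbounded range has lower or upper None).
    """
    return (upper - lower) > limit


print(span_exceeds(date(2020, 2, 24), date(2021, 4, 16)))  # True
print(span_exceeds(date(2020, 2, 24), date(2020, 4, 16)))  # False
```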

django query: get objects where only 'time' is greater than HH:mm no matter what the date is?

I have a model called MinutePrice, which has a DateTimeField as one of its fields.
What I want to do is query objects whose time is later than 15:30, no matter what the date is.
What I've tried:
MinutePrice.objects.filter(Q(date_time__lte="15:30"))
The following error occurred:
ValidationError: ["'15:30' value has an invalid format. It must be in YYYY-MM-DD HH:MM[:ss[.uuuuuu]][TZ] format."]
Any ideas to solve this?
The __time lookup can be used for this:
import datetime
MinutePrice.objects.filter(date_time__time__gt=datetime.time(15, 30))
You need a time value to compare against; comparing the full DateTimeField to the string '15:30' does not work.
datetime.time(15, 30)
should solve your problem.
If you want to get all records whose time is 15:30 or later, you can try a Django query like this:
MinutePrice.objects.filter(Q(date_time__hour__gte=16) | Q(date_time__hour=15, date_time__minute__gte=30))
See the Django QuerySet documentation for the time, hour and minute lookups.
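The two answers are not quite equivalent at the boundary: __time__gt excludes a value of exactly 15:30:00, while the hour/minute version includes it. The plain-Python sketch below mirrors both predicates on hypothetical sample datetimes; the helper names are made up, and it ignores time zones (with USE_TZ enabled, Django evaluates these lookups in the current time zone).

```python
from datetime import datetime, time

CUTOFF = time(15, 30)


def after_cutoff_time_lookup(dt):
    # Mirrors date_time__time__gt=time(15, 30): only the time part is compared.
    return dt.time() > CUTOFF


def after_cutoff_hour_minute(dt):
    # Mirrors Q(date_time__hour__gte=16) | Q(date_time__hour=15, date_time__minute__gte=30)
    return dt.hour >= 16 or (dt.hour == 15 and dt.minute >= 30)


samples = [
    datetime(2021, 1, 4, 9, 0),
    datetime(2021, 1, 4, 15, 30),  # exactly at the cut-off
    datetime(2021, 3, 9, 15, 45),
    datetime(2020, 7, 1, 16, 5),
]
via_time = [d for d in samples if after_cutoff_time_lookup(d)]
via_parts = [d for d in samples if after_cutoff_hour_minute(d)]
```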

Django manager with datetime.timedelta object inside F query combined with annotate and filter

I am trying to create a manager method inside my app to filter email objects that were created 5/10/15 minutes ago (or any other interval), counting exactly from now.
I thought I would use annotate to create a new boolean attribute whose value depends on a simple subtraction with division, checking whether the result is greater than 0.
from django.db.models import F
from django.utils import timezone
delta = 60 * 1 * 5
current_date = timezone.now()
qs = self.annotate(passed=((current_date - F('created_at')).seconds // delta > 0)).filter(passed=True)
At the moment my error says:
AttributeError: 'CombinedExpression' object has no attribute 'seconds'
This is clearly happening because (current_date - F('created_at')) does not evaluate to a datetime.timedelta object but to a CombinedExpression object.
I see more problems here as well, e.g. how to compare the expression to 0.
Anyway, I would appreciate any tips on whether I am close to achieving my goal or whether my entire logic behind this query is incorrect.
Well, I managed to find a solution. It might not be an elegant one, but it works:
qs = self.annotate(foo=Sum(current_date - F('created_at'))).filter(foo__gt=Sum(timezone.timedelta(seconds=delta)))
Why not something like this:
from datetime import timedelta
# delta is in seconds (60 * 1 * 5), so pass it as seconds
time_cut_off = timezone.now() - timedelta(seconds=delta)
qs = self.filter(created_at__gte=time_cut_off)
This will get you the messages created in the last delta seconds. Or were you looking for messages created exactly 5 minutes ago? (If so, how do you define that?)
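Note the direction of the comparison: the question's `passed` flag was meant to be true for objects created at least delta seconds ago, while created_at__gte returns the ones created more recently. The plain-Python sketch below, with a fixed hypothetical "now" and made-up timestamps, shows both sides of the cut-off; `split_by_age` is a hypothetical helper, and created_at__lte=time_cut_off would select the older group.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(created_times, seconds, now):
    """Split timestamps around the cut-off now - seconds.

    recent mirrors created_at__gte=cutoff, older mirrors created_at__lte=cutoff;
    a timestamp exactly at the cut-off lands in both.
    """
    cutoff = now - timedelta(seconds=seconds)
    recent = [t for t in created_times if t >= cutoff]
    older = [t for t in created_times if t <= cutoff]
    return recent, older


now = datetime(2021, 5, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical "now"
created = [now - timedelta(minutes=m) for m in (1, 4, 5, 6, 30)]
recent, older = split_by_age(created, 60 * 1 * 5, now)
```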
The documentation provides a simple and elegant solution if your timedelta is a constant:
For date and date/time fields, you can add or subtract a timedelta object. The following would return all entries that were modified more than 3 days after they were published:
>>> from datetime import timedelta
>>> Entry.objects.filter(mod_date__gt=F('pub_date') + timedelta(days=3))
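The F() expression is what makes the threshold differ per row. When the cut-off is a single value (such as "five minutes before now"), an ordinary comparison against that value is enough. A plain-Python sketch of both cases, using hypothetical (pub_date, mod_date) pairs instead of real Entry rows:

```python
from datetime import datetime, timedelta

# Hypothetical (pub_date, mod_date) pairs standing in for Entry rows.
entries = [
    (datetime(2021, 1, 1), datetime(2021, 1, 2)),
    (datetime(2021, 1, 1), datetime(2021, 1, 10)),
    (datetime(2021, 1, 20), datetime(2021, 1, 21)),
]

# Column vs. column: the threshold differs per row, which is what
# filter(mod_date__gt=F('pub_date') + timedelta(days=3)) expresses.
late_edits = [e for e in entries if e[1] > e[0] + timedelta(days=3)]

# Column vs. constant: one cut-off shared by every row, so no F() is needed,
# e.g. filter(pub_date__gte=cutoff).
cutoff = datetime(2021, 1, 15)
published_after_cutoff = [e for e in entries if e[0] >= cutoff]
```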
In your case I don't think you even need the F() objects.

Django ORM: How can I filter based on an annotation timedelta result

I have a model like this:
class Interval(models.Model):
    start = models.DateTimeField()
    end = models.DateTimeField(null=True)
I would like to query all intervals that are longer than 5 minutes.
I'm able to do intervals = Interval.objects.exclude(end=None).annotate(d=models.F("end") - models.F("start"))
When I do intervals[0].d, I get the interval, which is correct. Now I would like to get only the entries where d is greater than 5 minutes.
I tried intervals = Interval.objects.exclude(end=None).annotate(d=models.F("end") - models.F("start")).filter(d__gt=timedelta(0, 300)), but I get the following error: TypeError: expected string or bytes-like object. It seems to try to match the timedelta against a datetime regex.
Any ideas?
Thanks in advance
I think the problem is that you have to specify the type of the annotated column to Django as a DurationField. So you can write it like:
from datetime import timedelta
from django.db.models import ExpressionWrapper, F, DurationField
delta = ExpressionWrapper(F("end")-F("start"), DurationField())
intervals = (Interval.objects.exclude(end=None)
.annotate(d=delta)
.filter(d__gt=timedelta(0, 300)))
On MySQL, this will construct a query like:
SELECT id, start, end, TIMESTAMPDIFF(MICROSECOND, start, end) AS `d`
FROM interval
WHERE TIMESTAMPDIFF(MICROSECOND, start, end) > 300000000
Here we give Django a hint about how to interpret the d field (as a DurationField), and thus how to "serialize" the timedelta object.
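The numbers in that SQL can be checked with the standard library: timedelta(0, 300) is timedelta(days=0, seconds=300), i.e. five minutes, and in microseconds (the unit TIMESTAMPDIFF uses above) it is 300000000. The sketch below also mirrors the exclude(end=None) plus d__gt filter on hypothetical in-memory intervals.

```python
from datetime import datetime, timedelta

# Same threshold as filter(d__gt=timedelta(0, 300)); the positional
# arguments are (days, seconds).
threshold = timedelta(0, 300)

# Dividing by one microsecond gives the integer the MySQL query compares against.
threshold_us = threshold // timedelta(microseconds=1)

# Hypothetical (start, end) pairs standing in for Interval rows.
intervals = [
    (datetime(2021, 1, 1, 10, 0), datetime(2021, 1, 1, 10, 4)),   # 4 minutes
    (datetime(2021, 1, 1, 11, 0), datetime(2021, 1, 1, 11, 10)),  # 10 minutes
    (datetime(2021, 1, 1, 12, 0), None),                           # open interval
]

# exclude(end=None), then keep durations strictly longer than the threshold.
longer = [(s, e) for s, e in intervals if e is not None and e - s > threshold]
```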