Django ORM: timedelta difference in days between 2 dates

I have a database table that represents expenses associated with a given product.
These expenses, given that they're daily, have a from_date (the date in which they started) and to_date (the date in which they ended). to_date can be null, as these expenses might still be going.
Given 2 Python datetimes, start_date and end_date, I need to produce in the ORM the total spent in the period for my_product.
>>> start_date
datetime.datetime(2021, 8, 20, 0, 0)
>>> end_date
datetime.datetime(2021, 9, 21, 0, 0)
In this case, the expected output should be:
(-104 * (days between 08/20 and 08/25)) + (-113 * (days between 08/26 and 09/21)), where -104 and -113 are the daily values of the two expense rows that overlap the period.
This is what I've got so far:
(
    my_product.income_streams
    .values("product")
    .filter(type=IncomeStream.Types.DAILY_EXPENSE)
    .filter(add_to_commission_basis=True)
    .annotate(period_expenses=Case(
        When(
            Q(from_date__lte=start_date) & Q(to_date__lte=end_date),
            then=ExpressionWrapper(start_date - F('to_date'), output_field=IntegerField()),
        ),
        # Other When cases...
    ))
)  # Sum all period_expenses results and you've got the solution
And this is what's giving me problems:
then=ExpressionWrapper( start_date - F('to_date'), output_field=IntegerField())
This expression always returns 0 (please note this is why I'm not even attempting to multiply by value: that'd be the next step).
Apparently start_date - F('to_date') is not the same as "give me the difference in days between these 2 dates".
You'd accomplish this in Python with a timedelta. What's the equivalent in the ORM?
I've tried with ExtractDay:
then=ExpressionWrapper(ExtractDay(start_date - F('to_date')), output_field=IntegerField())
But I get: django.db.utils.OperationalError: user-defined function raised exception
And also tried with DurationField:
then=ExpressionWrapper(start_date - F('to_date'), output_field=DurationField())
But that also returns zero: datetime.timedelta(0)

Casting start_date into a DateTimeField solves the problem, and casting the difference into a DurationField is the next step.
So:
Cast(Cast(start_date, output_field=DateTimeField()) - F('to_date'), output_field=DurationField())
This will work fine on any database backend, but in order to get the difference in days you need to wrap it in ExtractDay, which throws ValueError: Extract requires native DurationField database support. if you use SQLite.
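For example, a rough sketch of one When branch from the question with the day extraction applied; this only works on backends with native DurationField support (e.g. PostgreSQL), and the names are taken from the question, untested:

from django.db.models import Case, DateTimeField, DurationField, F, IntegerField, Q, When
from django.db.models.functions import Cast, ExtractDay

# Whole days between start_date and to_date
days_between = ExtractDay(
    Cast(
        Cast(start_date, output_field=DateTimeField()) - F('to_date'),
        output_field=DurationField(),
    )
)

qs = (
    my_product.income_streams
    .filter(type=IncomeStream.Types.DAILY_EXPENSE, add_to_commission_basis=True)
    .annotate(period_expenses=Case(
        When(Q(from_date__lte=start_date) & Q(to_date__lte=end_date), then=days_between),
        # Other When cases from the question go here...
        output_field=IntegerField(),
    ))
)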
If you're tied to SQLite and cannot use ExtractDay, you can work with microseconds and convert them to days manually by dividing by 86400000000:
.annotate(
    duration_in_microseconds=ExpressionWrapper(
        F('to_date') - Cast(start_date, output_field=DateTimeField()),
        output_field=IntegerField(),
    )
)
Then:
.annotate(
    duration_in_days=ExpressionWrapper(
        F('duration_in_microseconds') / 86400000000,
        output_field=DecimalField(),
    )
)
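Putting the SQLite-friendly pieces together with the final sum, here is a rough, untested sketch; the boundary-handling Case/When branches from the question are left out for brevity, and value is assumed to be the per-day amount mentioned in the question:

from django.db.models import DateTimeField, DecimalField, ExpressionWrapper, F, IntegerField, Sum
from django.db.models.functions import Cast

totals = (
    my_product.income_streams
    .filter(type=IncomeStream.Types.DAILY_EXPENSE, add_to_commission_basis=True)
    .annotate(
        # Microseconds between from_date and the end of the period
        duration_in_microseconds=ExpressionWrapper(
            Cast(end_date, output_field=DateTimeField()) - F('from_date'),
            output_field=IntegerField(),
        ),
    )
    .annotate(
        duration_in_days=ExpressionWrapper(
            F('duration_in_microseconds') / 86400000000,
            output_field=DecimalField(),
        ),
    )
    .aggregate(
        total_spent=Sum(F('duration_in_days') * F('value'), output_field=DecimalField()),
    )
)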

Related

Using django's ORM to average over "timestamp without time zone" postgres field

Backstory (can skip): I have a database with records of events. Events for each label occurred around a certain period in the year. I would like to find when in the year, more or less, that group of events occurred. Therefore I planned to calculate the average timestamp per group, and do this efficiently with postgres, instead of fetching all the timestamps and calculating locally.
The question: I'm trying to average a timestamp without time zone postgres field with django's excellent ORM like so:
from django.db.models import Avg
ModelName.objects.filter(a_field='some value').aggregate(Avg('time'))
However I'm getting:
function avg(timestamp without time zone) does not exist
LINE 1: SELECT AVG("model_name"."time") AS "time__avg" FROM "m...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Is there a way to do this with django's ORM? If not, how do I work around it?
I had a similar problem, where I wanted to find the average time taken to vote for a particular item. But postgres wouldn't allow taking the average of datetimes. Doing so would result in the following error:
django.db.utils.ProgrammingError: function avg(timestamp with time zone) does not exist
LINE 1: SELECT "votes_item"."name", AVG("votes_vote"."datetime") AS ...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
To make this simpler, consider the following tables, where Vote has a foreign key relation to Item.
The Item table (datetime is the time at which the item was inserted):
id | name   | datetime
1  | Apple  | 22-06-23 11:25:33
2  | Orange | 22-06-22 01:22:18
The Vote table (user is the user who voted for the item, item is the item the user voted for, vote is 1 for a positive vote and -1 for a negative vote, datetime is the time at which the vote was cast):
id | user | item | vote | datetime
1  | 1    | 1    | 1    | 2022-06-22 11:26:18
2  | 3    | 1    | 1    | 2022-06-21 12:26:36
3  | 2    | 1    | 1    | 2022-06-26 01:20:59
I wanted to know the average time at which users voted for each item, e.g. the average time at which users voted for Apple (i.e. an annotate).
Since postgres's avg function doesn't directly accept a datetime, first convert it to seconds (the Unix epoch), take the average, then convert it back to a datetime.
To make things simpler, create two classes as shown below.
from django.db import models

class Epoch(models.expressions.Func):
    template = 'EXTRACT(epoch FROM %(expressions)s)::FLOAT'
    output_field = models.FloatField()

class DateTimeFromFloat(models.expressions.Func):
    template = 'To_TIMESTAMP(%(expressions)s)::TIMESTAMP at time zone \'UTC\''
    output_field = models.DateTimeField()
Read more about Func in this excellent answer.
Now I want to get the average time at which each item was voted for positively.
So I use:
Item.objects.filter(vote__vote=1).annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime')))).values('avg_time', 'name')
The important part:
annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime'))))
Output:
<QuerySet [{'name': 'Apple', 'avg_time': datetime.datetime(2022, 6, 23, 8, 24, 37, 666667, tzinfo=datetime.timezone.utc)}]>
You can perform a similar operation using aggregate.
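For instance, a minimal sketch of the aggregate variant for a single item, assuming a Vote model shaped like the table above (an item foreign key and a datetime field):

from django.db.models import Avg

# Average time at which Apple was voted for, as a single value
Vote.objects.filter(item__name='Apple').aggregate(
    avg_time=DateTimeFromFloat(Avg(Epoch('datetime')))
)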

Filtering django model using date

I need to filter some data from my models using a date. I see some posts that speak about ranges, but I just want to see, for example, the rows of my table from 22/04/2020; in other words, just one day.
Reading the documentation, I understood that I have to do the following,
import datetime
prueba = DevData.objects.order_by('-data_timestamp').filter(data_timestamp=datetime.date(2020, 4, 22))
prueba = loads(serializers.serialize('json', prueba))
for p in prueba:
    print(p)
But the following warning appears:
RuntimeWarning: DateTimeField DevData.data_timestamp received a naive datetime (2020-04-22 00:00:00) while time zone support is active.
And the list is empty. I think the problem is that it is filtering on 2020-04-22 00:00:00 exactly, not the whole day. How can I fix this?
Filtering between two dates works, but I don't know how to do it for just one day.
import datetime
start_date = datetime.date(2020, 4, 22)
end_date = datetime.date(2020, 4, 23)
prueba = DevData.objects.order_by('-data_timestamp').filter(data_timestamp__range= (start_date,end_date))
PS: There are rows on this day.
Thank you very much.
You can make use of the __date lookup [Django-doc] to filter on the date of the timestamp:
import datetime

prueba = DevData.objects.order_by(
    '-data_timestamp'
).filter(
    data_timestamp__date=datetime.date(2020, 4, 22)
)
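If you prefer the __range style from the question, another option (a sketch, assuming USE_TZ is enabled and the default time zone is the one you want) is to build timezone-aware bounds that cover exactly one day, which also silences the naive-datetime warning:

import datetime
from django.utils import timezone

start = timezone.make_aware(datetime.datetime(2020, 4, 22))
end = start + datetime.timedelta(days=1)

prueba = DevData.objects.order_by('-data_timestamp').filter(
    data_timestamp__gte=start,
    data_timestamp__lt=end,  # strictly before midnight of the next day
)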

Django filter by datetime month__gte is not working

I'm using Django 2.2
I have a created field of DateTime type.
I'm using the following command to filter the records which are greater than a specific date:
q.filter(created__year__gte=2020, created__month__gte=3, created__day__gte=1)
In my database there are records for March (3) month and more but not for February (2).
When the above command is executed, it gives me a queryset of data greater than March 1. But when I use the following command,
q.filter(created__year__gte=2020, created__month__gte=2, created__day__gte=28)
where the month is February (2), it does not give any data and the queryset is blank.
Using a datetime object gives the error:
received a naive datetime (2020-03-01 00:00:00) while time zone support is active
Why is the filter not working with gte even when the month is less than 3?
Why do you want to filter like this? In your case it is totally unnecessary. Just filter by the date directly:
q.filter(created__gte=datetime.datetime(2020, 3, 1))
Use:
q.filter(created__date__gte=datetime.date(2020, 3, 1))
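If you do want to compare against a full datetime while time zone support is active, a minimal sketch (assuming the default time zone is the intended one) is to make the datetime aware first, which avoids the naive-datetime warning:

import datetime
from django.utils import timezone

threshold = timezone.make_aware(datetime.datetime(2020, 3, 1))
q.filter(created__gte=threshold)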
Regarding the filter, it is working exactly as written: the month lookup matches fine, but day__gte=28 narrows the result set to data created on days 28-31 of every matching month, not only after February 28, and that is most likely not what you want.
The error was not with the Django setup; it was with the MySQL configuration, due to missing timezone data.
Check this answer for how to resolve this error
https://stackoverflow.com/a/60844090/3719167

Using Django ORM to retrieve recent rows

In SQL, if I wanted to query a table for data from the most recent 10 minutes (regardless of timezones and such), I'd simply do (using postgresql parlance):
select * from table where creation_time > now() - interval'10 mins';
Is there an equivalent way to do something like this using the Django ORM, disregarding what timezone settings one has set for the app? Would be great to get an illustrative example here.
Try this.
Data within the last 10 minutes:
from datetime import datetime, timedelta

time_threshold = datetime.now() - timedelta(minutes=10)
results = Table.objects.filter(createdOn__gte=time_threshold)
Last 10 rows based on the createdOn value:
recentData = Table.objects.all().order_by('-createdOn')[:10]
Last 10 rows if you don't have a createdOn column to filter on:
recentData = Table.objects.all().order_by('-id')[:10]
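Note that with USE_TZ = True, datetime.now() is naive and triggers the same warning seen in the earlier questions; a sketch of the timezone-aware equivalent:

from datetime import timedelta
from django.utils import timezone

time_threshold = timezone.now() - timedelta(minutes=10)
results = Table.objects.filter(createdOn__gte=time_threshold)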

Django: Group by date (day, month, year)

I've got a simple Model like this:
class Order(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    total = models.IntegerField()  # monetary value
And I want to output a month-by-month breakdown of:
How many sales there were in a month (COUNT)
The combined value (SUM)
I'm not sure what the best way to attack this is. I've seen some fairly scary-looking extra-select queries but my simple mind is telling me I might be better off just iterating numbers, starting from an arbitrary start year/month and counting up until I reach the current month, throwing out simple queries filtering for that month. More database work - less developer stress!
What makes most sense to you? Is there a nice way I can pull back a quick table of data? Or is my dirty method probably the best idea?
I'm using Django 1.3. Not sure if they've added a nicer way to GROUP_BY recently.
Django 1.10 and above
The Django documentation lists extra() as due to be deprecated soon (thanks for pointing that out, @seddonym, @Lucas03). I opened a ticket and this is the solution that jarshwah provided.
from django.db.models.functions import TruncMonth
from django.db.models import Count
(
    Sales.objects
    .annotate(month=TruncMonth('created'))  # Truncate to month and add to select list
    .values('month')                        # Group By month
    .annotate(c=Count('id'))                # Select the count of the grouping
    .values('month', 'c')                   # (might be redundant, haven't tested) select month and count
)
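Since the question also asks for the combined value, a hedged sketch of the same query with the sum added (assuming the Order model from the question, with its created and total fields):

from django.db.models import Count, Sum
from django.db.models.functions import TruncMonth

report = (
    Order.objects
    .annotate(month=TruncMonth('created'))            # truncate to the first of the month
    .values('month')                                   # group by month
    .annotate(count=Count('id'), total=Sum('total'))   # sales count and combined value per month
    .order_by('month')
)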
Older versions
from django.db import connection
from django.db.models import Sum, Count
truncate_date = connection.ops.date_trunc_sql('month', 'created')
qs = Order.objects.extra({'month':truncate_date})
report = qs.values('month').annotate(Sum('total'), Count('pk')).order_by('month')
Edits
Added count
Added information for django >= 1.10
Just a small addition to @tback's answer:
It didn't work for me with Django 1.10.6 and postgres. I added order_by() at the end to fix it.
from django.db.models import Count
from django.db.models.functions import TruncMonth

(
    Sales.objects
    .annotate(month=TruncMonth('timestamp'))  # Truncate to month and add to select list
    .values('month')                          # Group By month
    .annotate(c=Count('id'))                  # Select the count of the grouping
    .order_by()
)
Another approach is to use ExtractMonth. I ran into trouble with TruncMonth because only datetimes from a single year were being returned; for example, only the months in 2009. ExtractMonth fixed this problem perfectly and can be used like below:
from django.db.models import Count
from django.db.models.functions import ExtractMonth

(
    Sales.objects
    .annotate(month=ExtractMonth('timestamp'))
    .values('month')
    .annotate(count=Count('id'))
    .values('month', 'count')
)
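If the data spans several years, a hedged sketch that keeps the year in the grouping as well (ExtractYear lives alongside ExtractMonth in django.db.models.functions):

from django.db.models import Count
from django.db.models.functions import ExtractMonth, ExtractYear

(
    Sales.objects
    .annotate(year=ExtractYear('timestamp'), month=ExtractMonth('timestamp'))
    .values('year', 'month')         # group by (year, month) pairs
    .annotate(count=Count('id'))
    .order_by('year', 'month')
)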
from django.db.models import Sum

metrics = {
    'sales_sum': Sum('total'),
}

queryset = (
    Order.objects
    .values('created__month')
    .annotate(**metrics)
    .order_by('created__month')
)
The queryset is a list of dicts, one row per month, with the sum of sales in sales_sum.
#Django 2.1.7
Here's my dirty method. It is dirty.
import datetime, decimal
from django.db.models import Count, Sum
from account.models import Order

d = []
# arbitrary starting dates
year = 2011
month = 12
cyear = datetime.date.today().year
cmonth = datetime.date.today().month

while year <= cyear:
    while (year < cyear and month <= 12) or (year == cyear and month <= cmonth):
        sales = Order.objects.filter(created__year=year, created__month=month).aggregate(Count('total'), Sum('total'))
        d.append({
            'year': year,
            'month': month,
            'sales': sales['total__count'] or 0,
            'value': decimal.Decimal(sales['total__sum'] or 0),
        })
        month += 1
    month = 1
    year += 1
There may well be a better way of looping years/months but that's not really what I care about :)
Here is how you can group data by arbitrary periods of time:
from django.db import models
from django.db.models import F, Sum
from django.db.models.functions import Extract, Cast

period_length = 60 * 15  # 15 minutes

# Annotate each order with a "period" (seconds since the epoch, rounded down to the period)
qs = (
    Order.objects
    .annotate(timestamp=Cast(Extract('created', 'epoch'), models.IntegerField()))
    .annotate(period=(F('timestamp') / period_length) * period_length)
)

# Group orders by period & calculate sum of totals for each period
qs.values('period').annotate(total=Sum('total'))
By month:
Order.objects.filter().extra({'month':"Extract(month from created)"}).values_list('month').annotate(Count('id'))
By Year:
Order.objects.filter().extra({'year':"Extract(year from created)"}).values_list('year').annotate(Count('id'))
By day:
Order.objects.filter().extra({'day':"Extract(day from created)"}).values_list('day').annotate(Count('id'))
Don't forget to import Count
from django.db.models import Count
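Since extra() is on the deprecation path (as noted above), a hedged sketch of the same day/month/year groupings using the Trunc functions instead (Django 1.10+):

from django.db.models import Count
from django.db.models.functions import TruncDay, TruncMonth, TruncYear

Order.objects.annotate(day=TruncDay('created')).values('day').annotate(c=Count('id'))
Order.objects.annotate(month=TruncMonth('created')).values('month').annotate(c=Count('id'))
Order.objects.annotate(year=TruncYear('created')).values('year').annotate(c=Count('id'))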
For django < 1.10
I have an orders table in my database. I am going to count orders per day over the last 3 months.
from datetime import datetime
from itertools import groupby
from dateutil.relativedelta import relativedelta

date_range = datetime.now() - relativedelta(months=3)

aggs = (
    Orders.objects
    .filter(created_at__gte=date_range)
    .extra({'date_created': "date(created_at)"})
    .values('date_created')
)

# groupby only merges consecutive rows with the same key
for key, group in groupby(aggs):
    print(key, len(list(group)))
created_at is a datetime field. What the extra function does is take the date part of the datetime values; if we grouped on the raw datetimes we might not get the counts right, because objects are created at different times within a day.
The for loop prints each date and the corresponding count.
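For completeness, a hedged sketch of the same per-day grouping done in the database with TruncDate and Count (Django 1.11+) instead of grouping in Python:

from datetime import datetime
from dateutil.relativedelta import relativedelta
from django.db.models import Count
from django.db.models.functions import TruncDate

date_range = datetime.now() - relativedelta(months=3)

(
    Orders.objects
    .filter(created_at__gte=date_range)
    .annotate(date_created=TruncDate('created_at'))
    .values('date_created')
    .annotate(count=Count('id'))
    .order_by('date_created')
)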