Django annotate time delta by hours - django

I have a model as follows:
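import datetime
from django.db import models

class WorkTime(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    # pass the callable (no parentheses) so the default is evaluated on each save, not once at import
    entry_date = models.DateField(default=datetime.date.today, verbose_name='date')
    start_time = models.TimeField(verbose_name='start')
    end_time = models.TimeField(verbose_name='end')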
With data as follows:
person, date, start, end
1 01/01/2014 08:00 12:00
1 01/01/2014 13:00 18:00
1 02/01/2014 08:00 12:00
1 02/01/2014 13:00 18:00
1 03/01/2014 08:00 16:00
1 01/02/2014 08:30 12:00
1 01/02/2014 13:00 18:00
2 01/01/2014 09:00 13:00
2 01/01/2014 14:00 18:00
How would one sum up the time delta (i.e. end_time - start_time) and GROUP BY person to show the hours worked by person, as below?
person, hours
1 34:30
2 08:00

I don't know a good way to do this in the ORM. It's pretty straightforward to build a dictionary by iterating through a queryset.
from collections import defaultdict
from datetime import datetime, timedelta

totals = defaultdict(timedelta)  # missing keys start at timedelta(0), so += works
work_times = WorkTime.objects.all()
for work_time in work_times:
    # combine each time with the entry date so the two values can be subtracted
    start = datetime.combine(work_time.entry_date, work_time.start_time)
    end = datetime.combine(work_time.entry_date, work_time.end_time)
    totals[work_time.person] += end - start

# print results
for person, total_time in totals.items():
    # total_time will be a timedelta; you can do some more work to return hours and minutes
    print(person, total_time)
Depending on your usage, the performance of this might be good enough.
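To turn the accumulated timedeltas into the "34:30" style shown in the question, a small formatting step is needed, because str(timedelta) rolls anything over 24 hours into days. The sketch below runs the same loop on the question's sample rows as plain tuples (so it works without Django) and formats each total as HH:MM. The names rows, format_hours and hours_by_person are illustrative only, and the dates are read as DD/MM/YYYY, which is an assumption that does not affect the totals.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

# (person, entry_date, start_time, end_time) rows from the question (dates assumed DD/MM/YYYY)
rows = [
    (1, date(2014, 1, 1), time(8, 0), time(12, 0)),
    (1, date(2014, 1, 1), time(13, 0), time(18, 0)),
    (1, date(2014, 1, 2), time(8, 0), time(12, 0)),
    (1, date(2014, 1, 2), time(13, 0), time(18, 0)),
    (1, date(2014, 1, 3), time(8, 0), time(16, 0)),
    (1, date(2014, 2, 1), time(8, 30), time(12, 0)),
    (1, date(2014, 2, 1), time(13, 0), time(18, 0)),
    (2, date(2014, 1, 1), time(9, 0), time(13, 0)),
    (2, date(2014, 1, 1), time(14, 0), time(18, 0)),
]

def format_hours(td):
    """Render a timedelta as HH:MM, letting the hours exceed 24 (e.g. 34:30)."""
    total_minutes = int(td.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

def hours_by_person(rows):
    """Sum end - start per person, then format each total."""
    totals = defaultdict(timedelta)
    for person, entry_date, start, end in rows:
        totals[person] += datetime.combine(entry_date, end) - datetime.combine(entry_date, start)
    return {person: format_hours(total) for person, total in sorted(totals.items())}

print(hours_by_person(rows))
```

With Django, the same format_hours call can be applied to each value in totals, or to a summed duration returned by the ORM.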

It has been a while, but maybe someone will stumble upon this.
You can do it this way:
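from django.db.models import DurationField, ExpressionWrapper, F, Sum
duration = ExpressionWrapper(F('end_time') - F('start_time'), output_field=DurationField())
queryset = WorkTime.objects.values('person').annotate(sum_delta=Sum(duration)).order_by('person__id')
This sums up all the timedeltas for each person, and you don't have to store extra fields in the database. The ExpressionWrapper with output_field=DurationField() makes the type of the subtraction explicit, so sum_delta comes back as a timedelta.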

Related

How to get a percentage of grouped values over a period of time (at the hour scale) in DAX?

I have a dataset containing the duration (in minutes) of occupancy events over a period of 1 hour in my rooms:
# room date duration
--- ---- ------------------- --------
0 A1 2022-01-01 08:00:00 30
1 A1 2022-01-01 10:00:00 5
2 A1 2022-01-01 16:00:00 30
3 A1 2022-01-02 10:00:00 60
4 A1 2022-01-02 16:00:00 60
...
My date column is linked to a date table in which I have:
# datetime year month monthName day dayOfWeek dayName hour
--- ------------------- ---- ----- --------- --- --------- -------- ----
...
k 2022-01-01 08:00:00 2022 1 January 1 5 Saturday 8
k+1 2022-01-01 09:00:00 2022 1 January 1 5 Saturday 9
...
n 2022-03-01 22:00:00 2022 3 March 1 1 Tuesday 22
I am trying to compute the following percentage through a measure: duration/timeperiod. The reasons for using a measure are:
Being able to use a time slicer and see my percentage update
Being able to use, for example, a bar chart with my date hierarchy and see a percentage at each level of the hierarchy (datetime -> year -> month -> dayOfWeek -> hour)
Attempt
My idea was to create a first measure that would return the number of minutes between the first and the last date currently chosen. Here is what I came up with:
Diff minutes = DATEDIFF(
FIRSTDATE( 'date'[date] ),
LASTDATE( 'date'[date] ),
MINUTE
)
The idea was then to create a second measure that divides the SUM of the durations by the [Diff minutes] measure:
My rate = DIVIDE(
SUM( 'table'[duration] ),
[Diff minutes]
)
I currently face a few issues:
The slicer is set to (2022-01-02 --> 2022-01-03), and a matrix shows datetimes between 2022-01-02 0:00:00 and 2022-01-03 23:00:00, yet my measure returns 1440, which is the number of minutes in one day, not in my selected time period.
The percentage is also wrong, unfortunately. Take the example I highlighted in the screenshot: there are 2 values for the 10h slot, 5 min and 60 min, but the percentage shows 4.51% instead of 54.2%. It is actually the result of 65/1440, 1440 being the total number of minutes for my whole time period, not for my 10h slot.
Examples
1- Let's say I have a slicer over a period of 2 days (2022-01-01 --> 2022-01-02) and my dataset is the one provided before:
I would have a total duration of 185 minutes (30+5+30+60+60)
My time period would be 2 days = 48h = 2880 minutes
The displayed ratio would be: 6.4% (185/2880)
2- With the same slicer, a matrix with hours and percentage would give me:
hour rate
---- -----
0 0.0%
1 0.0%
...
8 25.0% <--- 30 minutes on the 1st of January and 0 minutes on the 2nd
9 0.0% <--- (0+0)/120
10 54.2% <--- (5+60)/120
...
16 75.0% <--- (30+60)/120
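The arithmetic behind these expected figures can be checked with a short sketch. It is Python rather than DAX and only verifies the numbers: the overall rate is total occupied minutes over the minutes in the selected period, and the hourly rate is the minutes for that hour of day over (number of days × 60). The function names are illustrative and not part of any Power BI API; the events are the single-room sample from the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample occupancy events from the question: (room, hourly slot start, minutes occupied)
events = [
    ("A1", datetime(2022, 1, 1, 8), 30),
    ("A1", datetime(2022, 1, 1, 10), 5),
    ("A1", datetime(2022, 1, 1, 16), 30),
    ("A1", datetime(2022, 1, 2, 10), 60),
    ("A1", datetime(2022, 1, 2, 16), 60),
]

def overall_rate(events, days):
    """Total occupied minutes divided by the minutes in the selected period."""
    total = sum(minutes for _, _, minutes in events)
    return total / (days * 24 * 60)

def hourly_rates(events, days):
    """Occupied minutes per hour of day divided by the minutes that hour spans (days * 60)."""
    by_hour = defaultdict(int)
    for _, slot, minutes in events:
        by_hour[slot.hour] += minutes
    return {hour: by_hour[hour] / (days * 60) for hour in range(24)}

rates = hourly_rates(events, days=2)
print(f"overall: {overall_rate(events, days=2):.1%}")  # 185 / 2880
for hour in (8, 9, 10, 16):
    print(hour, f"{rates[hour]:.1%}")
```

The key point it illustrates is that the denominator must shrink with the filter context: 2880 minutes for the whole two-day selection, but only 120 minutes for one hour-of-day row.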
Constraints
The example I provided only has 1 room. In practice, there are n rooms and I would like my measure to return the percentage as the mean of all my rooms.
Would it be possible? Have I chosen the right method?
The DATEDIFF measure you have created should work. I tested it on a report, and when I select some dates it gives me the difference between the first and last selected dates.
Make sure your slicer is interacting with the measure.
In the meantime, I think I found a simpler way to do it.
First, I added a new column to my date table; it seems dubious but is actually helpful:
minutes = 60
This allows me to get rid of the DATEDIFF function. My rate measure now looks like this:
My rate = DIVIDE(
SUM( 'table'[duration] ),
[Number of minutes],
0
)
Here, I use the [Number of minutes] measure, which is simply a SUM of the values in the minutes column. To get accurate results when multiple rooms are selected, I multiplied the number of minutes by the number of rooms:
Number of minutes = COUNTROWS( rooms ) * SUM( 'date'[minutes] )
This now works perfectly with my date hierarchy!
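Why multiplying by the room count gives the mean over rooms can be seen with a small numeric sketch. It is Python, not DAX, and mirrors DIVIDE( SUM( duration ), COUNTROWS( rooms ) * SUM( 'date'[minutes] ), 0 ) for one matrix cell; the function name and the second room "B2" with its 20 minutes are hypothetical, made up only for the example.

```python
def rate(durations, n_rooms, n_hour_rows, minutes_per_row=60):
    """Mirror of the [My rate] measure for one filter context.

    durations    -- every duration (minutes) visible in the cell, across all rooms
    n_rooms      -- COUNTROWS( rooms ) in the cell
    n_hour_rows  -- number of date-table rows in the cell, each worth 60 minutes
    Returns 0 when the denominator is 0, like DIVIDE's third argument.
    """
    denominator = n_rooms * n_hour_rows * minutes_per_row
    return sum(durations) / denominator if denominator else 0

# Hour 10 over two days: A1 has 5 + 60 minutes, hypothetical room B2 has 20 minutes
combined = rate([5, 60, 20], n_rooms=2, n_hour_rows=2)
per_room_mean = (rate([5, 60], 1, 2) + rate([20], 1, 2)) / 2
print(combined, per_room_mean)
```

Because every room shares the same denominator (hours × 60), dividing the pooled minutes by rooms × hours × 60 is algebraically the same as averaging the per-room percentages.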

Passing django-recurrence field via REST API

Folks,
I am using the django-recurrence RecurrenceField in my app, and it's not clear how to format the field when it is passed via a REST API.
Any help is appreciated.
from django.db import models
from recurrence.fields import RecurrenceField

class Course(models.Model):
    title = models.CharField(max_length=200)
    recurrences = RecurrenceField()
It looks like it's based on RFC 2445:
https://www.rfc-editor.org/rfc/rfc2445#section-4.8.5.4
Format Definition: This property is defined by the following notation:
rrule = "RRULE" rrulparam ":" recur CRLF
rrulparam = *(";" xparam)
Example: All examples assume the Eastern United States time zone.
Daily for 10 occurrences:
DTSTART;TZID=US-Eastern:19970902T090000
RRULE:FREQ=DAILY;COUNT=10
==> (1997 9:00 AM EDT)September 2-11
Daily until December 24, 1997:
DTSTART;TZID=US-Eastern:19970902T090000
RRULE:FREQ=DAILY;UNTIL=19971224T000000Z
==> (1997 9:00 AM EDT)September 2-30;October 1-25
(1997 9:00 AM EST)October 26-31;November 1-30;December 1-23
Every other day - forever:
DTSTART;TZID=US-Eastern:19970902T090000
RRULE:FREQ=DAILY;INTERVAL=2
==> (1997 9:00 AM EDT)September2,4,6,8...24,26,28,30;
October 2,4,6...20,22,24
(1997 9:00 AM EST)October 26,28,30;November 1,3,5,7...25,27,29;
Dec 1,3,...
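If the API does accept the field as RFC 2445 text like the RRULE lines above (an assumption; whether a given serializer takes raw text is not confirmed here), a client only needs to produce and read strings of the form RRULE:KEY=VALUE;KEY=VALUE. The sketch below shows that shape. build_rrule and parse_rrule are hypothetical helpers, not part of django-recurrence, and they do no validation of the rule parts.

```python
def build_rrule(**parts):
    """Serialize keyword parts into an RRULE line, e.g. RRULE:FREQ=DAILY;COUNT=10."""
    body = ";".join(f"{key.upper()}={value}" for key, value in parts.items())
    return f"RRULE:{body}"

def parse_rrule(line):
    """Split an 'RRULE:...' line (optionally with ;params before the colon) into a dict."""
    name, _, body = line.partition(":")
    if name.split(";")[0].upper() != "RRULE":
        raise ValueError(f"not an RRULE line: {line!r}")
    return dict(part.split("=", 1) for part in body.split(";") if part)

print(build_rrule(freq="DAILY", count=10))
print(parse_rrule("RRULE:FREQ=DAILY;UNTIL=19971224T000000Z"))
```

Keeping the payload as this plain text keeps the client independent of how the server-side field stores the rule.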

Splitting time by the hour Python

I have a DataFrame df1 like this, where StartTime and EndTime are datetime objects.
StartTime EndTime
9:08 9:10
9:10 9:35
9:35 9:55
9:55 10:10
10:10 10:20
If EndTime.hour is not the same as StartTime.hour, I would like to split the times like this:
StartTime EndTime
9:08 9:10
9:10 9:35
9:35 9:55
9:55 10:00
10:00 10:10
10:10 10:20
Essentially, I want to insert a row into the existing DataFrame df1. I have looked at a ton of examples but haven't figured out how to do this. If my question isn't clear, please let me know.
Thanks
This does what you want:
# load your data into a DataFrame
import datetime as dt
from io import StringIO  # Python 2: from StringIO import StringIO
import pandas as pd

data = """StartTime EndTime
9:08 9:10
9:10 9:35
9:35 9:55
9:55 10:10
10:10 10:20
"""
df = pd.read_csv(StringIO(data), header=0, sep=' ', index_col=None)

# convert strings to pandas Timestamps (we will ignore the date part)
df.StartTime = [dt.datetime.strptime(x, '%H:%M') for x in df.StartTime]
df.EndTime = [dt.datetime.strptime(x, '%H:%M') for x in df.EndTime]

# assumption: all intervals are shorter than 60 minutes,
# i.e. no interval crosses more than one hour boundary
crosses_hour = df.StartTime.dt.hour != df.EndTime.dt.hour

# add rows: copy each interval that crosses an hour and end the copy on the hour
dfa = df[crosses_hour].copy()
dfa.EndTime = dfa.EndTime.dt.floor('60min')

# ...and make the original row start on that hour instead
df.StartTime = df.StartTime.where(~crosses_hour, df.EndTime.dt.floor('60min'))

# bring back together (top/bottom) and sort
df = pd.concat([df, dfa], axis=0)
df = df.sort_values('StartTime')

# convert the Timestamps to times for easy reading
df.StartTime = [x.time() for x in df.StartTime]
df.EndTime = [x.time() for x in df.EndTime]
This yields:
In [40]: df
Out[40]:
StartTime EndTime
0 09:08:00 09:10:00
1 09:10:00 09:35:00
2 09:35:00 09:55:00
3 09:55:00 10:00:00
3 10:00:00 10:10:00
4 10:10:00 10:20:00
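The answer above assumes no interval crosses more than one hour boundary. If longer intervals can occur, one option is to split each interval at every whole hour it crosses. The sketch below does that with the standard library only; split_by_hour is an illustrative name, and it treats each interval as half-open [start, end).

```python
from datetime import datetime, timedelta

def split_by_hour(start, end):
    """Split [start, end) at every whole-hour boundary; handles multi-hour spans."""
    pieces = []
    current = start
    while current < end:
        # first whole hour strictly after `current`
        next_hour = current.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        boundary = min(next_hour, end)
        pieces.append((current, boundary))
        current = boundary
    return pieces

t = lambda h, m: datetime(1900, 1, 1, h, m)
for s, e in split_by_hour(t(9, 30), t(11, 15)):
    print(s.time(), e.time())
```

To use it on the DataFrame, one could build the rows with a comprehension over zip(df.StartTime, df.EndTime) and pass them to pd.DataFrame(..., columns=['StartTime', 'EndTime']); pandas Timestamps support replace() and timedelta arithmetic, so the same function should apply, though that usage is not tested here.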

Pandas Aggregate/Group by based on most recent date

I have a DataFrame as follows, where Id is a string and Date is a datetime:
Id Date
1 3-1-2012
1 4-8-2013
2 1-17-2013
2 5-4-2013
2 10-30-2012
3 1-3-2013
I'd like to consolidate the table so that it shows just one row per Id: the one with the most recent date.
Any thoughts on how to do this?
You can groupby the Id field:
In [11]: df
Out[11]:
Id Date
0 1 2012-03-01 00:00:00
1 1 2013-04-08 00:00:00
2 2 2013-01-17 00:00:00
3 2 2013-05-04 00:00:00
4 2 2012-10-30 00:00:00
5 3 2013-01-03 00:00:00
In [12]: g = df.groupby('Id')
If you are not certain about the ordering, you could do something along these lines:
In [13]: g.agg(lambda x: x.iloc[x.Date.argmax()])
Out[13]:
Date
Id
1 2013-04-08 00:00:00
2 2013-05-04 00:00:00
3 2013-01-03 00:00:00
which for each group grabs the row with largest (latest) date (the argmax part).
If you knew they were in order you could take the last (or first) entry:
In [14]: g.last()
Out[14]:
Date
Id
1 2013-04-08 00:00:00
2 2012-10-30 00:00:00
3 2013-01-03 00:00:00
(Note: they're not in order, so this doesn't work in this case!)
In Hayden's answer, I think using x.loc in place of x.iloc is better, as the index of the df DataFrame could be sparse (in which case iloc will not work).
(I don't have enough reputation on Stack Overflow to post this as a comment on the answer.)
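A sketch of the same idea that avoids the iloc/loc question entirely, assuming a reasonably recent pandas: groupby(...)['Date'].max() gives just the latest date per Id, and groupby(...)['Date'].idxmax() returns index labels, which are then selected with .loc, so a sparse index is not a problem. The sample frame below rebuilds the question's data with ISO-formatted dates.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["1", "1", "2", "2", "2", "3"],
    "Date": pd.to_datetime(["2012-03-01", "2013-04-08", "2013-01-17",
                            "2013-05-04", "2012-10-30", "2013-01-03"]),
})

# Only the latest date per Id (a Series indexed by Id)
latest = df.groupby("Id")["Date"].max()

# Whole rows: idxmax returns index labels, so select them with .loc
latest_rows = df.loc[df.groupby("Id")["Date"].idxmax()]

print(latest)
print(latest_rows)
```

Unlike g.last(), neither version depends on the rows being sorted by date.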

Days of the week Django

Please explain how to do this: I have a week number (52, for example) and a year (2012). How can I get the day numbers (Monday - 24, Tuesday - 25, etc.)? Yes, I read this, but I can't understand how to do it.
Thanks.
I would do it like this:
from datetime import date, timedelta

def get_weekdays(year, week):
    january_first = date(year, 1, 1)
    # go back to the Monday on or before January 1, then forward `week` weeks
    monday_date = january_first + timedelta(days=week * 7 - january_first.weekday())
    # monday, tuesday, .. sunday
    return [(monday_date + timedelta(days=d)).day for d in range(7)]
(My weeks start on Monday.)