Subtract two values - django

I'd like to subtract two values, one in the current record and one in the next record. They are time clock entries, and I want to calculate the amount of time an employee spends on his/her break, so I'll have to subtract the time the employee clocked out from the time the employee clocked back in. This will be done for several records, and at the end I also want a total of all the breaks taken.
So how can I do this? I'm doing this in Django BTW.
UPDATE
The records look a bit like this:
employee_id, rec_date, start_time, end_time
18, 2010-08-23, 09:58:00, 14:13:00
18, 2010-08-23, 14:39:00, 18:47:00
19, 2010-08-23, 14:15:00, 18:31:00
21, 2010-08-23, 12:05:00, 14:52:00
21, 2010-08-23, 15:23:00, 18:49:00
21, 2010-08-31, 08:00:00, 12:00:00
21, 2010-08-31, 12:45:00, 19:00:00

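You'll have to iterate over the clock actions and do the calculation manually. Something of this sort:
from datetime import timedelta

breaks = timedelta()
break_start = None
# Order by time so each clock-out is followed by the matching clock-in.
clock_actions = user.clock_actions.filter(date=desired_day).order_by('time')
for action in clock_actions:
    if action.type == 'CLOCK OUT':
        # A break starts when the employee clocks out.
        break_start = action.time
    elif break_start is not None:
        # Clocking back in ends the break.
        breaks += action.time - break_start
        break_start = None
This assumes action.time is a DateTimeField. If it is a TimeField, combine it with the date first (for example with datetime.combine), because datetime.time objects cannot be subtracted from each other directly.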
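For the record layout shown in the question (one row per work period with start_time and end_time), a break is the gap between the end of one row and the start of the next row for the same employee and date. Here is a minimal sketch in plain Python, assuming the rows have already been fetched as tuples; the function name break_totals and the tuple layout are illustrative and not part of any Django API:

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

# Sample rows from the question: (employee_id, rec_date, start_time, end_time).
records = [
    (18, date(2010, 8, 23), time(9, 58), time(14, 13)),
    (18, date(2010, 8, 23), time(14, 39), time(18, 47)),
    (19, date(2010, 8, 23), time(14, 15), time(18, 31)),
    (21, date(2010, 8, 23), time(12, 5), time(14, 52)),
    (21, date(2010, 8, 23), time(15, 23), time(18, 49)),
    (21, date(2010, 8, 31), time(8, 0), time(12, 0)),
    (21, date(2010, 8, 31), time(12, 45), time(19, 0)),
]


def break_totals(rows):
    """Return {employee_id: total break time} from per-period rows."""
    periods = defaultdict(list)
    for employee_id, rec_date, start, end in rows:
        periods[(employee_id, rec_date)].append((start, end))

    totals = defaultdict(timedelta)
    for (employee_id, rec_date), day_periods in periods.items():
        day_periods.sort()
        # A break is the gap between one period's end and the next period's start.
        for (_, prev_end), (next_start, _) in zip(day_periods, day_periods[1:]):
            gap = (datetime.combine(rec_date, next_start)
                   - datetime.combine(rec_date, prev_end))
            totals[employee_id] += gap
    return dict(totals)


totals = break_totals(records)
grand_total = sum(totals.values(), timedelta())
```

Employees with a single row on a day (such as employee 19) took no break and do not appear in totals.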

Related

Django Window annotation combined with distinct clause

I have a Django model stored in a Postgres DB comprised of values of counts at irregular intervals:
WidgetCount
- Time
- Count
I'm trying to use a window function with Lag to give me a previous row's values as an annotation. My problem is when I try to combine it with some distinct date truncation the window function uses the source rows rather than the distinctly grouped ones.
For example if I have the following rows:
time count
2020-01-20 05:00 15
2020-01-20 06:00 20
2020-01-20 09:00 30
2020-01-21 06:00 35
2020-01-21 07:00 40
2020-01-22 04:00 50
2020-01-22 06:00 54
2020-01-22 09:00 58
And I want to return a queryset showing the first reading per day, I can use:
from django.db.models.functions import Trunc
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"))
Which gives me:
date count
2020-01-20 15
2020-01-21 35
2020-01-22 50
I would like to add an annotation which gives me yesterday's value (so I can show the change per day).
date count yesterday_count
2020-01-20 15
2020-01-21 35 15
2020-01-22 50 35
If I do:
from django.db.models.functions import Trunc, Lag
from django.db.models import F, Window
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
The second row returned gives me 30 for yesterday_count, i.e. it's showing me the previous row before applying the distinct clause.
If I add a partition clause like this:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count"), partition_by=F("date")))
Then yesterday_count is None for all rows.
I can do this calculation in Python if I need to but it's driving me a bit mad and I'd like to find out if what I'm trying to do is possible.
Thanks!
I think the main problem is that you're mixing two kinds of operations: aggregates such as Sum, which produce a grouped queryset when used in an annotation, and expressions such as yesterday_count=Window(expression=Lag("count")), which simply add a new field to each record of the given queryset.
So ordering really matters here. When you try:
WidgetCount.objects.distinct("date").annotate(date=Trunc("time", "day"), yesterday_count=Window(expression=Lag("count")))
The resulting queryset is simply WidgetCount.objects.distinct("date") annotated; no grouping is performed.
I would suggest decoupling your operations so it becomes easier to understand what is happening. Notice that you're iterating over Python objects, so you don't need to make any new queries!
Note that I'm using Sum as the example because I am getting an unexpected error with the FirstValue operator. The idea remains the same: for the first value, change acc_count=Sum("count") to first_count=FirstValue("count").
for truncDate_groups in Row.objects.annotate(trunc_date=Trunc('time','day')).values("trunc_date")\
        .annotate(acc_count=Sum("count")).values("acc_count","trunc_date")\
        .order_by('trunc_date')\
        .annotate(y_count=Window(Lag("acc_count")))\
        .values("trunc_date","acc_count","y_count"):
    print(truncDate_groups)
OUTPUT:
{'trunc_date': datetime.datetime(2020, 1, 20, 0, 0, tzinfo=<UTC>), 'acc_count': 65, 'y_count': None}
{'trunc_date': datetime.datetime(2020, 1, 21, 0, 0, tzinfo=<UTC>), 'acc_count': 75, 'y_count': 162}
{'trunc_date': datetime.datetime(2020, 1, 22, 0, 0, tzinfo=<UTC>), 'acc_count': 162, 'y_count': 65}
It turns out the FirstValue operator must be used inside a Window function, so you can't nest FirstValue and then calculate Lag; in this scenario I'm not sure it can be done. The question becomes how to access the first-value column without nesting windows.
I haven't tested it out locally but I think you want to GROUP BY instead of using DISTINCT here.
WidgetCount.objects.values(
    date=Trunc('time', 'day'),
).order_by('date').annotate(
    date_count=Sum('count'),  # Will trigger a GROUP BY date
).annotate(
    yesterday_count=Window(Lag('date_count')),
)
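For comparison, the pure-Python fallback mentioned in the question can be sketched as follows. It takes the first reading of each day and pairs it with the previous day's first reading. The function name first_per_day_with_previous and the (time, count) tuple layout are assumptions for illustration; in practice the rows would come from something like WidgetCount.objects.values_list("time", "count").

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (time, count).
readings = [
    (datetime(2020, 1, 20, 5), 15),
    (datetime(2020, 1, 20, 6), 20),
    (datetime(2020, 1, 20, 9), 30),
    (datetime(2020, 1, 21, 6), 35),
    (datetime(2020, 1, 21, 7), 40),
    (datetime(2020, 1, 22, 4), 50),
    (datetime(2020, 1, 22, 6), 54),
    (datetime(2020, 1, 22, 9), 58),
]


def first_per_day_with_previous(rows):
    """Return (date, first_count, previous_day_first_count) per day."""
    rows = sorted(rows)
    # groupby needs sorted input; next(group) picks the earliest row of each day.
    firsts = [next(group) for _, group in groupby(rows, key=lambda r: r[0].date())]
    result = []
    previous = None
    for when, count in firsts:
        result.append((when.date(), count, previous))
        previous = count
    return result


daily = first_per_day_with_previous(readings)
```

Note that "previous" here means the previous day that has any reading, not strictly the calendar day before.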

How do I convert an aware datetime to a POSIX timestamp?

I'd like to specify 12 PM on a particular date in the central timezone and adjust for daylight savings time (CDT). I'd then like to convert this to a POSIX timestamp.
The first thing I reached for is:
d = datetime.datetime(2017, 6, 27, 12, tzinfo=???)
But I don't have a concrete CDT class. pytz does however:
z = pytz.timezone('EST5EDT')
d = datetime.datetime(2017, 6, 27, 12, tzinfo=z)
But this does not work (the pytz documentation says as much, but I don't understand why the constructor cannot use the timezone information). Whether I use November (fall back) or June (spring forward), I still get 12:00:00-05:00 as the time portion.
Even if this did work the method to convert to a POSIX timestamp assumes a naive datetime:
posix = time.mktime(d.timetuple())
This timestamp represents 12PM in my local time zone.
Then there is normalize() with code and examples that I find very difficult to follow:
au_dt = au_tz.normalize(utc_dt.astimezone(au_tz))
I also tried to subtract my aware time from an epoch defined at January 1, 1970 but that doesn't work unless the datetime gets the UTC offset correct (see my second point above).
Can anyone help me with a mental model of how this stuff works in general and a solution to this problem in particular?
This solution seems to work:
import pytz
import datetime
# Naive datetime (no timezone).
d = datetime.datetime(2017, 11, 27, 12)
cdt = pytz.timezone('US/Central')
# Give it a central time zone context.
d = cdt.localize(d)
# Determine the time in UTC.
d = d.astimezone(pytz.utc)
# Create a naive POSIX epoch.
epoch = datetime.datetime(1970, 1, 1)
# Give it context in the UTC time zone.
epoch = pytz.utc.localize(epoch)
# Number of seconds that have elapsed since the epoch.
print((d - epoch).total_seconds())
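On Python 3.3 and later, an aware datetime also has a timestamp() method that performs the same subtraction from the UTC epoch. The sketch below checks that the two approaches agree. To stay within the standard library it uses a fixed UTC-6 offset as a stand-in for US/Central in November; that offset is an assumption of this example, and with pytz you would still call localize() so the correct offset is chosen for the date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset stands in for US/Central in November (CST).
cst = timezone(timedelta(hours=-6), "CST")
noon_central = datetime(2017, 11, 27, 12, tzinfo=cst)

# Approach from the answer above: subtract an aware UTC epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
by_subtraction = (noon_central - epoch).total_seconds()

# Built-in equivalent for aware datetimes.
by_method = noon_central.timestamp()

print(by_subtraction, by_method)
```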

Saving checkpoints and resuming training in tensorflow

I was playing with saving checkpoints and resuming training from saved checkpoints. I was following the example given in - https://www.tensorflow.org/versions/r0.8/api_docs/python/train.html#import_meta_graph
To keep things simple, I have not used any 'real' training of a network. I just performed a simple subtraction op and each check point saves the same operation on same tensors again and again.
A minimal example is provided in the form of the following ipython notebook - https://gist.github.com/dasabir/29b8f84c6e5e817a72ce06584e988f10
In the first phase, I run the loop 100 times (by setting the variable 'endIter = 100' in the code) and save a checkpoint every 10th iteration. So the checkpoints being saved are numbered 9, 19, ..., 99. Now when I change the 'endIter' value to, say, 200 and resume training, the checkpoints again start to be saved from 9, 19, ... (not 109, 119, 129, ...). Is there a trick I'm missing?
Can you print out 'latest_ckpt' and see if it points to the latest ckpt file? Also, you need to maintain the global_step using a tf.Variable:
global_step = tf.Variable(0, name='global_step', trainable=False)
...
ckpt = tf.train.get_checkpoint_state(ckpt_dir)
if ckpt and ckpt.model_checkpoint_path:
    print(ckpt.model_checkpoint_path)
    saver.restore(sess, ckpt.model_checkpoint_path)  # restore all variables
start = global_step.eval()  # get last global_step
print("Start from:", start)
for i in range(start, 100):
    ...
    global_step.assign(i).eval()  # set and update (eval) global_step with index i
    saver.save(sess, ckpt_dir + "/model.ckpt", global_step=global_step)
You can take a look at the full example:
https://github.com/nlintz/TensorFlow-Tutorials/pull/32/files
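The idea behind the answer is that the step counter must be stored with the checkpoint and read back before the loop starts. Here is a framework-free sketch of that pattern, using a JSON file in place of a TensorFlow checkpoint. The train function, the state.json file name and the dummy loop body are all hypothetical and are not part of TensorFlow.

```python
import json
import os
import tempfile


def train(ckpt_dir, end_iter, save_every=10):
    """Run a dummy loop up to end_iter, resuming from the last saved step."""
    state_path = os.path.join(ckpt_dir, "state.json")
    start = 0
    if os.path.exists(state_path):
        # Resume: read the step counter that was stored with the checkpoint.
        with open(state_path) as f:
            start = json.load(f)["global_step"]

    saved = []
    for i in range(start, end_iter):
        # ... one training step would go here ...
        if (i + 1) % save_every == 0:
            # Store the number of completed steps so the next run continues after i.
            with open(state_path, "w") as f:
                json.dump({"global_step": i + 1}, f)
            saved.append(i)
    return saved


ckpt_dir = tempfile.mkdtemp()
first_run = train(ckpt_dir, 100)   # saves at 9, 19, ..., 99
second_run = train(ckpt_dir, 200)  # continues at 109, 119, ..., 199
```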

Filter a dataframe

I'm trying to filter a dataframe for a certain date in a column.
The column entries are timestamps, and I try to construct a boolean vector from those, checking for a certain date.
I tried:
filterfr = df[(df.expiration.month==6) & (df.expiration.day==22) & (df.expiration.year==2002)]
It doesn't work, because 'Series' object has no attribute 'month'.
How can this be done?
When you do df.expiration, you get back a Series where the items are the expiration datetimes.
Try comparing to an actual datetime.datetime object:
filterfr = df[df['expiration'] == datetime.datetime(2002, 6, 22)]
You may want to look into using a DatetimeIndex, depending on your dataset. This lets you use the convenient partial-string syntax (in recent pandas versions, row selection by date string goes through .loc):
df.loc['2002-06-22']
To have access to the DatetimeIndex methods you have to wrap it in DatetimeIndex (currently*).
The fastest way is to access the day, month and year attributes (just like you attempted):
expir = pd.DatetimeIndex(df['expiration'])
(expir.day == 22) & (expir.month == 6) & (expir.year == 2002)
Alternative, but slower ways are to use the normalize method (to bring it to the start of the day), or to use the date attribute:
pd.DatetimeIndex(df['expiration']).normalize() == datetime.datetime(2002, 6, 22)
pd.DatetimeIndex(df['expiration']).date == datetime.date(2002, 6, 22)
*From pandas 0.15 onward there is a dt accessor, so you can access these directly:
expir = df['expiration']
expir.dt.day ...
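With the dt accessor, the filter from the question can be written without wrapping the column in a DatetimeIndex. Here is a small sketch on made-up data; the strike column and the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the expiration column mirrors the question.
df = pd.DataFrame({
    "expiration": pd.to_datetime(
        ["2002-06-22 00:00", "2002-06-22 15:30", "2002-07-20 00:00"]
    ),
    "strike": [100, 105, 110],
})

exp = df["expiration"]

# Compare the individual date parts through the dt accessor.
by_parts = df[(exp.dt.year == 2002) & (exp.dt.month == 6) & (exp.dt.day == 22)]

# Or normalize to midnight and compare against a single timestamp.
by_day = df[exp.dt.normalize() == pd.Timestamp("2002-06-22")]
```

Both forms match every row on that calendar day, including rows with a non-midnight time, which a plain equality test against datetime.datetime(2002, 6, 22) would miss.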
This
filterfr = df[df['expiration'] == datetime.datetime(2002, 6, 22)]
worked fine.
However, after doing some filtering, I got an error when trying to do filterfr.expiration[0] or filterfr['expiration'][0] to get the first element in the series.
KeyError: 0L is raised, although there are elements in the series.
The series looks like this:
Name: expiration, Length: 534668, dtype: datetime64[ns]
Shouldn't this actually always work?
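A likely explanation for the KeyError: filtering keeps the original index labels, and [0] on a Series with an integer index looks up the label 0 rather than the first position. If the row labelled 0 was filtered out, the lookup fails. Here is a sketch on made-up data showing positional access with iloc and, alternatively, renumbering the index:

```python
import pandas as pd

# Hypothetical sample data: the row labelled 0 does not match the filter.
df = pd.DataFrame({
    "expiration": pd.to_datetime(["2002-05-18", "2002-06-22", "2002-06-22"]),
})
filterfr = df[df["expiration"] == pd.Timestamp("2002-06-22")]

# The surviving rows keep their original labels, so label 0 is gone.
index_labels = list(filterfr.index)

# Positional access: first element regardless of its label.
first = filterfr["expiration"].iloc[0]

# Alternatively, renumber the rows so that label 0 exists again.
renumbered = filterfr.reset_index(drop=True)
first_again = renumbered["expiration"][0]
```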

Subtraction of two dates in django query

class Case( models.Model ):
created = models.DateTimeField()
modified = models.DateTimeField()
STATUS = (
('Active', 'Active'),
('Hold', 'Hold'),
('Expired', 'Expired'),
('Cancelled', 'Cancelled'),
)
status = models.CharField(max_length=32, choices=STATUS)
Now I want to count the records whose status became Expired less than 2 months ago; cases that expired more than 2 months ago shouldn't be counted.
I have read about subtraction of dates, but it doesn't work in my case.
expired_cases = Case.objects.filter( status = 'Expired', modified__lt = datetime.now() - timedelta(days=60) ).count()
This kind of query may work, but I don't want to hard-code the number of days.
Please help me with this issue.
Thanks in advance :)
You can use Python's calendar module to get the number of days in each month. Note that this list doesn't account for leap years, so you need to implement a solution for that.
>>> import calendar, datetime
>>> days = calendar.mdays
>>> days
[0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
Look up the days of the two preceding months this way, then calculate the timedelta you need. You'll need to strip the first element from the list to make it work.
>>> days = days[1:]
>>> month = datetime.date.today().month - 1  # -1 due to 0-based indexing in list
>>> delta = days[month-1] + days[month-2]
Use the python-dateutil module. Create a Python datetime object out of the modified time and subtract. It should work.
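If you would rather stay within the standard library, the "two calendar months ago" cutoff can be computed with calendar.monthrange, which does account for leap years. This is a minimal sketch; the helper name months_ago is hypothetical, and the commented-out query only illustrates how the cutoff might be used with the Case model from the question.

```python
import calendar
from datetime import datetime


def months_ago(moment, months):
    """Return `moment` shifted back by whole calendar months."""
    month_index = moment.year * 12 + (moment.month - 1) - months
    year, month = divmod(month_index, 12)
    month += 1
    # Clamp the day so that e.g. 30 April minus two months lands on the last day of February.
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))


cutoff = months_ago(datetime(2012, 4, 30, 9, 0), 2)

# Hypothetical use with the Case model from the question:
# Case.objects.filter(status='Expired', modified__gte=cutoff).count()
```

Using modified__gte keeps the cases that expired within the last two months, which matches the stated requirement.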