Convert two Timestamps to Datetime and get their difference in Python - python-2.7

from datetime import datetime
start = datetime.fromtimestamp(float(1485008513.00000))
end = datetime.fromtimestamp(float(1485788517.80000))
# Duration
duration = end - start
My result is:
9 days, 0:40:04.800000
But it should look like this (no days, only hours, minutes and seconds):
216:40:04.800000
Thanks a lot!

Not elegant, and admittedly ugly, but it works (for your example, for durations of less than a day, and for durations well over 1000 days):
import datetime
start = datetime.datetime.fromtimestamp(float(1485008513.0000))
end = datetime.datetime.fromtimestamp(float(1485788517.80000))
# Duration
duration = end - start
dur = str(duration).split(',')
print dur
# A duration of less than a day is not rendered by str() as "0 days, ...", so we fix that
# here by introducing an artificial zero day if split only returns 1 element
if len(dur) < 2:
    dur = ["0", dur[0]]
dayHours = int(dur[0].split()[0]) * 24  # take the day count ("9 days" or "1 day"), multiply by 24
hours = dur[1].split(':')[0]  # get the partial hours of this part
minsSecs = ':'.join(dur[1].split(':')[1:])  # split+join the rest from the hours
# print all combined
print (str(dayHours + int(hours)) + ':' + minsSecs)
Output:
216:40:04.800000
Maybe better:
totSec = duration.total_seconds()
hours = totSec // (60*60)
mins = (totSec - (hours*60*60)) // 60
secs = totSec - (hours*60*60) - mins * 60
# {:02d} zero-pads minutes (a plain {:2} would pad with a space)
print "{:02d}:{:02d}:{:09.6f}".format(int(hours), int(mins), secs)
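A variation on the same idea that stays in integers: working in whole microseconds avoids the float rounding that total_seconds() can introduce for long durations. This is a sketch; format_total_hours is a hypothetical helper name, and it assumes a non-negative timedelta.

```python
from datetime import datetime, timedelta

def format_total_hours(td):
    """Format a non-negative timedelta as H:MM:SS.ffffff, with days folded into the hours."""
    # timedelta stores days, seconds and microseconds; combine them into one integer
    total_us = (td.days * 86400 + td.seconds) * 10**6 + td.microseconds
    hours, rem = divmod(total_us, 3600 * 10**6)
    minutes, rem = divmod(rem, 60 * 10**6)
    seconds, micros = divmod(rem, 10**6)
    return "%d:%02d:%02d.%06d" % (hours, minutes, seconds, micros)

start = datetime.fromtimestamp(1485008513.0)
end = datetime.fromtimestamp(1485788517.8)
print(format_total_hours(end - start))
```

Because hours are never wrapped into days, the result keeps growing past 24 (or 1000) hours without any string splitting.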


Python script | long running | Need suggestions to optimize

I have written this script to generate a dataset of 15-minute time intervals covering 365 days, based on each store's operating hours for every day of the week.
Example: say Store 1 opens at 9 AM and closes at 9 PM every day. That is 12 hours a day, so 12 * 4 = 48 fifteen-minute periods per day, and 48 * 365 = 17,520 periods per year.
The sample dataset contains only 5 sites, but there are about 9,000 sites that this script needs to generate data for.
The script runs fine for a handful of sites (100) and a couple of days (2), but it needs to handle 9,000 sites and 365 days.
I'm looking for suggestions to make it run faster. It will run on a local machine.
Input data: https://drive.google.com/open?id=1uLYRUsJ2vM-TIGPvt5RhHDhTq3vr4V2y
Output data: https://drive.google.com/open?id=13MZCQXfVDLBLFbbmmVagIJtm6LFDOk_T
Please let me know if I can provide anything else to help get this answered.
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta
import pandas as pd
import numpy as np
import cProfile
from datetime import timedelta, date, datetime
#inputs
empty_data = pd.DataFrame(columns=['store','timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"
for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        if weekno<5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['m_f_open_hrs'].iloc[j],data['m_f_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['m_f_close_hrs'].iloc[j],data['m_f_close_min'].iloc[j]),
                   timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        elif weekno==5:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['sat_open_hrs'].iloc[j],data['sat_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['sat_close_hrs'].iloc[j],data['sat_close_min'].iloc[j]),
                   timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
        else:
            dts = [dt.strftime('%Y-%m-%d %H:%M') for dt in
                   datetime_range(datetime(curr_date_year,curr_date_month,curr_date_day,data['sun_open_hrs'].iloc[j],data['sun_open_min'].iloc[j]), datetime(curr_date_year,curr_date_month,curr_date_day, data['sun_close_hrs'].iloc[j],data['sun_close_min'].iloc[j]),
                   timedelta(minutes=15))]
            vert = pd.DataFrame(dts,columns = ['timestamp'])
            vert['store']= data['store'].iloc[j]
            empty_data = pd.concat([vert, empty_data])
final_data = empty_data
I think the most time-consuming part of your script is the datetime calculations.
Try to do all of those calculations with UNIX time instead. It represents time as an integer count of seconds, so you can take two UNIX timestamps and get their difference with a simple subtraction.
In my opinion you should perform all the operations that way, and only once the process has finished convert everything to a more readable date format.
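A minimal sketch of that idea, assuming the opening and closing times are plain integers as in the question's columns. slots_unix and to_text are hypothetical helper names. The wall-clock times are treated as UTC via calendar.timegm, so daylight-saving shifts never enter the arithmetic; the conversion back to text happens once at the end.

```python
import calendar
from datetime import date, datetime, timedelta

STEP = 15 * 60  # 15 minutes, in seconds

def slots_unix(day, open_h, open_m, close_h, close_m, step=STEP):
    """Integer UNIX timestamps for each slot from opening (inclusive) to closing (exclusive)."""
    # timegm interprets the date as UTC midnight; everything after that is integer math
    midnight = calendar.timegm(day.timetuple())
    start = midnight + open_h * 3600 + open_m * 60
    end = midnight + close_h * 3600 + close_m * 60
    return list(range(start, end, step))

def to_text(ts):
    """Convert a UNIX timestamp back to the '%Y-%m-%d %H:%M' format used in the output."""
    return (datetime(1970, 1, 1) + timedelta(seconds=ts)).strftime('%Y-%m-%d %H:%M')

slots = slots_unix(date(2019, 1, 1), 9, 0, 21, 0)
print(len(slots))  # 48, matching the 9 AM to 9 PM example in the question
print(to_text(slots[0]))
```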
Another thing you should change in your script is the code repetition: the three branches are almost identical. Removing it won't improve performance, but it improves readability, makes debugging easier, and improves your skills as a programmer. As a simple example, I have refactored some of the code (you can probably do better than I did; this is just an example).
def datetime_range(start, end, delta):
    current = start
    while current < end:
        yield current
        current += delta
from datetime import timedelta, date, datetime
import numpy as np
import cProfile
import pandas as pd
# inputs
empty_data = pd.DataFrame(columns=['store', 'timestamp'])
start_dt = date(2019, 1, 1)
days = 365
data = "input data | attached to the post"
for i in range(days):
    for j in range(len(data.store)):
        curr_date = start_dt + timedelta(days=i)
        curr_date_year = curr_date.year
        curr_date_month = curr_date.month
        curr_date_day = curr_date.day
        weekno = curr_date.weekday()
        week_range = 'sun'
        if weekno < 5:
            week_range = 'm_f'
        elif weekno == 5:
            week_range = 'sat'
        first_time = datetime(curr_date_year,curr_date_month,curr_date_day,data[week_range + '_open_hrs'].iloc[j],data[week_range + '_open_min'].iloc[j])
        second_time = datetime(curr_date_year,curr_date_month,curr_date_day, data[week_range + '_close_hrs'].iloc[j],data[week_range + '_close_min'].iloc[j])
        dts = [ dt.strftime('%Y-%m-%d %H:%M') for dt in datetime_range(first_time, second_time, timedelta(minutes=15)) ]
        vert = pd.DataFrame(dts, columns = ['timestamp'])
        vert['store']= data['store'].iloc[j]
        empty_data = pd.concat([vert, empty_data])
final_data = empty_data
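One further cost the refactoring above keeps: pd.concat inside the loop builds a new DataFrame from the whole accumulated frame on every iteration, so total work grows roughly quadratically with the number of rows. A sketch of an alternative (a swapped-in technique, not the answerer's) collects plain tuples in a list and builds the DataFrame once at the end, e.g. pd.DataFrame(rows, columns=['store', 'timestamp']). It assumes each store is a plain dict with the question's column names (for example from data.to_dict('records')); day_prefix, generate_rows and the sample store are hypothetical. Rows come out in chronological order, not the reversed order that pd.concat([vert, empty_data]) produces.

```python
from datetime import date, datetime, timedelta

def day_prefix(weekday):
    """Map date.weekday() (Monday=0 ... Sunday=6) to the column prefix of the input data."""
    if weekday < 5:
        return 'm_f'
    if weekday == 5:
        return 'sat'
    return 'sun'

def generate_rows(stores, start, days, step=timedelta(minutes=15)):
    """Return (store, 'YYYY-MM-DD HH:MM') tuples for every slot of every store and day."""
    rows = []
    for i in range(days):
        day = start + timedelta(days=i)  # computed once per day, not once per store
        prefix = day_prefix(day.weekday())
        for store in stores:  # each store is one input row as a dict
            current = datetime(day.year, day.month, day.day,
                               store[prefix + '_open_hrs'], store[prefix + '_open_min'])
            close = datetime(day.year, day.month, day.day,
                             store[prefix + '_close_hrs'], store[prefix + '_close_min'])
            while current < close:
                rows.append((store['store'], current.strftime('%Y-%m-%d %H:%M')))
                current += step
    return rows

# Hypothetical store: 9-21 on weekdays, 10-14 on Saturday, closed (open == close) on Sunday
example_store = {
    'store': 1,
    'm_f_open_hrs': 9, 'm_f_open_min': 0, 'm_f_close_hrs': 21, 'm_f_close_min': 0,
    'sat_open_hrs': 10, 'sat_open_min': 0, 'sat_close_hrs': 14, 'sat_close_min': 0,
    'sun_open_hrs': 12, 'sun_open_min': 0, 'sun_close_hrs': 12, 'sun_close_min': 0,
}
rows = generate_rows([example_store], date(2019, 1, 1), 7)
print(len(rows))  # 256: five weekdays * 48 slots + 16 Saturday slots
```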
Good luck!

django query aggregate function is slow?

I am working with Django to see how it handles large databases. I use a database with the fields name, age, date of birth (dob) and height. The database has about 500,000 entries. I have to find the average height of persons (1) of the same age and (2) born in the same year. The aggregate query takes about 10 s. Is that normal, or am I missing something?
For age:
from django.db.models import Avg
age = [i[0] for i in Data.objects.values_list('age').distinct()]
ht = []
for each in age:
    aggr = Data.objects.filter(age=each).aggregate(ag_ht=Avg('height'))
    ht.append(aggr)
For dob:
age = [i[0].year for i in Data.objects.values_list('dob').distinct()]
for each in age:
    aggr = Data.objects.filter(dob__contains=each).aggregate(ag_ht=Avg('height'))
    ht.append(aggr)
The year has to be extracted from dob. It is SQLite, and I cannot use __year (join).
For these queries to be efficient, you have to create indexes on the age and dob columns.
You will get a small additional speedup by using covering indexes, i.e., using two-column indexes that also include the height column.
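To see the covering-index effect directly, here is a minimal sketch using Python's built-in sqlite3 module on an in-memory database. The table name dd_data and its columns are assumptions mirroring the question's model (Django names tables <app>_<model> by default). EXPLAIN QUERY PLAN should show the per-age average switching from a full table scan to a search of a covering index once a two-column (age, height) index exists; the exact wording of the plan varies between SQLite versions. In Django itself you would declare the index on the model, e.g. db_index=True on the field, or models.Index(fields=['age', 'height']) in Meta.indexes (Django 1.11 and later).

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for `sql`."""
    # the human-readable detail is the last column of each plan row
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
# hypothetical table mirroring the question's model
conn.execute("CREATE TABLE dd_data (id INTEGER PRIMARY KEY, name TEXT, "
             "age INTEGER, dob DATE, height REAL)")

avg_sql = "SELECT AVG(height) FROM dd_data WHERE age = ?"
before = query_plan(conn, avg_sql, (30,))  # no index yet: full table scan

# two-column covering index: the filter column plus the aggregated column
conn.execute("CREATE INDEX dd_data_age_height ON dd_data (age, height)")
after = query_plan(conn, avg_sql, (30,))   # answered from the index alone

for line in before + after:
    print(line)
```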
Full version, timing your loop against the queryset version:
import time
from dd.models import Data
from django.db.models import Avg
from django.db.models.functions import ExtractYear
# for age
start = time.time()
age = [i[0] for i in Data.objects.values_list('age').distinct()]
ht = []
for each in age:
    aggr = Data.objects.filter(age=each).aggregate(ag_ht=Avg('height'))
    ht.append(aggr)
end = time.time()
loop_time = end - start
start = time.time()
qs = Data.objects.values('age').annotate(ag_ht=Avg('height')).order_by('age')
ht_qs = list(qs.values_list('age', 'ag_ht'))  # querysets are lazy: list() runs the query inside the timed block
end = time.time()
qs_time = end - start
print loop_time / qs_time
# for the dob year: your version with an easy refactoring (a set of the years, so each year is queried once)
start = time.time()
years = set([i[0].year for i in Data.objects.values_list('dob').distinct()])
ht_year_loop = []
for each in years:
    aggr = Data.objects.filter(dob__contains=each).aggregate(ag_ht=Avg('height'))
    ht_year_loop.append((each, aggr.get('ag_ht')))
end = time.time()
loop_time = end - start
start = time.time()
qs = Data.objects.annotate(dob_year=ExtractYear('dob')).values('dob_year').annotate(ag_ht=Avg('height'))
ht_qs = list(qs.values_list('dob_year', 'ag_ht'))  # force evaluation so the query is actually timed
end = time.time()
qs_time = end - start
print loop_time / qs_time
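The speedup of the queryset version comes from replacing one query per distinct value with a single GROUP BY query. A rough plain-SQL illustration with the standard sqlite3 module follows; the table, columns and sample rows are made up, and the SQL Django actually generates for values().annotate() and ExtractYear differs in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dd_data (id INTEGER PRIMARY KEY, age INTEGER, dob DATE, height REAL)")
conn.executemany(
    "INSERT INTO dd_data (age, dob, height) VALUES (?, ?, ?)",
    [(30, "1987-03-01", 170.0), (30, "1987-11-20", 180.0), (41, "1976-06-15", 165.0)],
)

# one query for all ages instead of one query per distinct age
by_age = conn.execute(
    "SELECT age, AVG(height) FROM dd_data GROUP BY age ORDER BY age"
).fetchall()

# the year is extracted inside SQL with strftime, again in a single GROUP BY
by_year = conn.execute(
    "SELECT CAST(strftime('%Y', dob) AS INTEGER) AS dob_year, AVG(height) "
    "FROM dd_data GROUP BY dob_year ORDER BY dob_year"
).fetchall()

print(by_age)   # [(30, 175.0), (41, 165.0)]
print(by_year)  # [(1976, 165.0), (1987, 175.0)]
```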

How to get the number of Days in a Specific Month between Two Dates in Python

I have two date fields, campaign_start_date and campaign_end_date. I want to count the number of days in each month that falls between campaign_start_date and campaign_end_date.
For example:
campaign_start_date = September 7 2017
campaign_end_date = November 6 2017
The solution should be:
Total no. of days = 61 days
No. of months = 3 months
Month 1 = 9/7/2017 to 9/30/2017
Month 2 = 10/1/2017 to 10/31/2017
Month 3 = 11/1/2017 to 11/6/2017
No. of days in Month 1 = 24 days
No. of days in Month 2 = 31 days
No. of days in Month 3 = 6 days
How can I achieve this using Python?
So far I have:
#api.multi
def print_date(self):
    start_date = datetime.strptime(self.start_date, "%Y-%m-%d %H:%M:%S")
    end_date = datetime.strptime(self.end_date, "%Y-%m-%d %H:%M:%S")
    campaign_start_date = date(start_date.year,start_date.month,start_date.day)
    campaign_end_date = date(end_date.year,end_date.month,end_date.day)
    duration = (campaign_end_date-campaign_start_date).days
    return True
Calculate the duration in days:
from datetime import date
campaign_start_date = date(2017, 9, 7)
campaign_end_date = date(2017, 11, 6)
duration = (campaign_end_date-campaign_start_date).days  # 60; add 1 to count both the first and the last day (61)
print campaign_start_date, campaign_end_date, duration
Some hints for further calculations:
import calendar
campaign_end_month_start = campaign_end_date.replace(day=1)
days_in_month_campaign_end = (campaign_end_date - campaign_end_month_start).days + 1
range_startmonth = calendar.monthrange(campaign_start_date.year, campaign_start_date.month)
campaign_start_month_ends = campaign_start_date.replace(day=range_startmonth[1])
days_in_month_campaign_begins = (campaign_start_month_ends - campaign_start_date).days + 1  # +1 so the start day itself counts
This way you can calculate the number of days in each month of the campaign (remember to check whether campaign_end_date falls in a different month from campaign_start_date).
For calculations you can also access the fields of a date, e.g.
campaign_start_date.day
campaign_start_date.month
campaign_start_date.year
To count the months involved in your campaign and get a list of them for calculating the duration per month, you can use this (based on m.antkowicz's answer to "Python: get all months in range?"). It's important to set the day to 1 (current = current.replace(day=1)) both before and inside the loop; otherwise you skip a month when your start date is the 31st of a month and the next month is shorter than 31 days, or when you have a longer period:
from datetime import date, timedelta
current = campaign_start_date
result = [current]
current = current.replace(day=1)
while True:
    current += timedelta(days=32)
    current = current.replace(day=1)
    if current > campaign_end_date:  # stop before adding a month past the campaign end
        break
    result.append(current)
print result, len(result)
which prints (with each date formatted via strftime('%Y-%m-%d')):
['2017-09-07', '2017-10-01', '2017-11-01'] 3
Now you can loop over the result list and calculate the number of days per month:
durations = []
for curr in result:
    curr_range = calendar.monthrange(curr.year, curr.month)
    curr_duration = (curr_range[1] - curr.day) + 1
    if curr.month < campaign_end_date.month:
        durations.append(curr_duration)
    else:
        durations.append(campaign_end_date.day)
print durations
which gives you the desired "No:of days in Month x" as a list:
[24, 31, 6]
This is a more robust solution that also handles campaigns spanning different years:
import calendar
from datetime import datetime, timedelta

def get_months_and_durations(start_date, end_date):
    current = start_date
    result = [current]
    current = current.replace(day=1)
    while current <= end_date:
        current += timedelta(days=32)
        current = current.replace(day=1)
        result.append(datetime(current.year, current.month, 1).date())
    durations = []
    for curr in result[:-1]:  # the last entry is the month after end_date
        curr_range = calendar.monthrange(curr.year, curr.month)
        curr_duration = (curr_range[1] - curr.day) + 1
        if curr.month == end_date.month and curr.year == end_date.year:
            # count from curr.day: the 1st, or the start day if the campaign fits in one month
            durations.append(end_date.day - curr.day + 1)
        else:
            durations.append(curr_duration)
    return result[:-1], durations
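For comparison, a sketch that avoids the timedelta(days=32) trick altogether: it steps month by month with plain (year, month) arithmetic and clips the first and last month to the campaign dates. days_per_month is a hypothetical name; both ends are counted inclusively, as in the question's 61-day total.

```python
import calendar
from datetime import date

def days_per_month(start, end):
    """Return [(year, month, days)] for each month touched by the inclusive range start..end."""
    result = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        last_day = calendar.monthrange(year, month)[1]
        # clip the month to the campaign boundaries
        first = start.day if (year, month) == (start.year, start.month) else 1
        last = end.day if (year, month) == (end.year, end.month) else last_day
        result.append((year, month, last - first + 1))
        # advance to the next month, rolling over the year after December
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result

months = days_per_month(date(2017, 9, 7), date(2017, 11, 6))
print(months)                           # [(2017, 9, 24), (2017, 10, 31), (2017, 11, 6)]
print(sum(d for _, _, d in months))     # 61
```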

How to print the difference only in hours, minutes and seconds (like td = 33:23:21) in Python?

import datetime
str1 = "2017-05-01 18:23:22"  # first time string
str2 = "2017-05-02 23:16:22"  # second time string
# Turn them into datetime objects
T1 = datetime.datetime.strptime(str1, "%Y-%m-%d %H:%M:%S")
T2 = datetime.datetime.strptime(str2, "%Y-%m-%d %H:%M:%S")
DIFF = T2 - T1
print DIFF
The output is: 1 day, 4:53:00
Needed output: 28:53:00
Here ya go, add this to the bottom of your code to print out the desired format:
days = DIFF.days
hours, remainder = divmod(DIFF.seconds, 3600)
minutes, seconds = divmod(remainder, 60)
print "%d:%02d:%02d" % (days * 24 + hours, minutes, seconds)

Difference between two dates python/Django

I need to know how to get the time elapsed between edit_date (a column in one of my models) and datetime.now(). My edit_date column is a DateTimeField. (I'm using Python 2.7 and Django 1.10.)
This is the function I'm trying to do:
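from datetime import datetime

def time_in_status(request):
    for item in Reporteots.objects.exclude(edit_date__exact=None):
        date_format = "%Y-%m-%d %H:%M:%S"
        a = datetime.now()  # with USE_TZ = True, use django.utils.timezone.now() so both datetimes are aware
        b = item.edit_date
        c = a - b
        dif = divmod(c.days * 86400 + c.seconds, 60)  # (total minutes, remaining seconds)
        days = str(dif)
        print days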
The only thing I'm getting from this function is the minutes and seconds elapsed. What I need is the elapsed time in the following format:
Time_elapsed = 3d 47m 23s
Any ideas? Let me know if I'm not clear or if you need more information.
Thanks for your attention,
Take a look at dateutil.relativedelta:
http://dateutil.readthedocs.io/en/stable/relativedelta.html
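from dateutil.relativedelta import relativedelta
from datetime import datetime
now = datetime.now()
ago = datetime(2017, 2, 11, 13, 5, 22)
diff = relativedelta(now, ago)  # later date first, so the fields come out positive
# note: diff.days excludes whole months and years (see diff.months, diff.years), and hours are in diff.hours
print "%dd %dm %ds" % (diff.days, diff.minutes, diff.seconds)
I wrote that code from memory, so you may have to tweak it to your needs.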
Try something like
c = a - b
minutes = (c.seconds % 3600) // 60
seconds = c.seconds % 60
print "%sd %sm %ss" % (c.days, minutes, seconds)
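Building on the answer above, a small sketch of a helper that also keeps the hours, which the requested "3d 47m 23s" format leaves out (drop the %dh part if you really do not want them). format_elapsed is a hypothetical name, and it assumes a non-negative timedelta such as now minus edit_date; the two datetimes below are stand-ins.

```python
from datetime import datetime, timedelta

def format_elapsed(td):
    """Format a non-negative timedelta as 'Xd Yh Zm Ws'."""
    # timedelta normalises to days + seconds (0 <= seconds < 86400) + microseconds
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "%dd %dh %dm %ds" % (td.days, hours, minutes, seconds)

edited = datetime(2017, 2, 11, 13, 5, 22)  # stand-in for item.edit_date
now = datetime(2017, 2, 14, 17, 52, 45)    # stand-in for datetime.now()
print(format_elapsed(now - edited))        # 3d 4h 47m 23s
```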