How to add time in a python list - python-2.7

This is the list:
lsty = ['1:07:11', '2:37:28', '07:11', '1:07:11']
Time can be like '2:37:28' (2h 37m 28s) or '07:11' (7m 11s). How can I sum up the list?

You may find the native Python datetime.timedelta object useful. It lets you represent durations in a way that Python understands and perform arithmetic with other timedelta objects.
Perhaps something like this? This is totally untested:
from datetime import timedelta

def sum_times(times):
    total = timedelta(0)
    for time in times:
        # Extract just the time values; timedelta needs numbers, not strings
        time_split = [int(part) for part in time.split(':')]
        if len(time_split) == 2:  # Just mins/secs
            t_delt = timedelta(minutes=time_split[0],
                               seconds=time_split[1])
        else:
            t_delt = timedelta(hours=time_split[0],
                               minutes=time_split[1],
                               seconds=time_split[2])
        total += t_delt  # This is where the magic happens
    # timedelta has no hours/minutes attributes, so derive them from the total seconds
    hours, remainder = divmod(int(total.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return '%d:%02d:%02d' % (hours, minutes, seconds)
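As a quick check of the idea, here is a minimal sketch applied to the list from the question. The helper names parse_duration and format_duration are made up for this illustration; it pads missing leading fields with zeros so that 'MM:SS' and 'H:MM:SS' go through the same path.

```python
from datetime import timedelta

def parse_duration(text):
    """Turn 'H:MM:SS' or 'MM:SS' into a timedelta (hypothetical helper)."""
    parts = [int(part) for part in text.split(':')]
    # Pad on the left so 'MM:SS' becomes [0, MM, SS]
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def format_duration(delta):
    """Format a timedelta as H:MM:SS, letting hours grow past 24."""
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    return '%d:%02d:%02d' % (hours, minutes, seconds)

lsty = ['1:07:11', '2:37:28', '07:11', '1:07:11']
# sum() needs a timedelta start value, otherwise it starts from the int 0
total = sum((parse_duration(t) for t in lsty), timedelta(0))
print(format_duration(total))  # 4:59:01
```

Formatting from total_seconds() rather than str(total) keeps the hours in a single field even if the sum exceeds one day.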

Related

Concatenating variable with parameters in pyomo

I want to concatenate the two variables x and p defined as
from pyomo.environ import *
import numpy as np
model = ConcreteModel()
model.t = ContinuousSet(bounds=(0, 10))
# States
model.x = Var(model.t)
model.p = Param(initialize=2)
I tried (without much hope) the following:
np.concatenate((model.x, model.p), axis=0)
but I get of course a numpy array out of it. I have been looking on the internet for at least 30 minutes and I could not find anything. Which is surprising.
I need this concatenation as it makes further matrix-vector operations much easier....

datetime strptime method for format HH:MM:SS.MICROSECOND

I'm trying to use Python's strptime method to parse a time represented as 11:49:57.74. The standard %H, %M and %S directives parse the hour, minute and second. However, the data is a string (stored in a pandas column with dtype object), and the fractional part after the seconds is left uninterpreted, so I get an error. Could someone please advise how to parse the example so that the seconds and microseconds are correctly interpreted from the time string?
I would then use them to find the time delta between two time stamps.
I'm not sure I have understood your question correctly.
To convert those time strings to datetime objects and calculate the timedelta between two times, you can do the following:
from datetime import datetime

my_string = '11:49:57.74'  # first example time
another_example_time = '13:49:57.74'  # second example time, invented by me for the example
# %f parses the fractional seconds after the decimal point
first_time = datetime.strptime(my_string, "%H:%M:%S.%f")  # extract the first time
second_time = datetime.strptime(another_example_time, "%H:%M:%S.%f")  # extract the second time
# calculate the time delta, always subtracting the earlier time from the later one
if first_time > second_time:
    delta = first_time - second_time
else:
    delta = second_time - first_time
print("The timedelta between %s and %s is: %s" % (first_time, second_time, delta))
Obviously there is no date in these strings, so strptime uses 1900-01-01 by default, as you can see in the printed result:
The timedelta between 1900-01-01 11:49:57.740000 and 1900-01-01 13:49:57.740000 is: 2:00:00
I hope this solution is what you need. Next time provide a little bit more information please, or share an example with the code that you have tried to write.
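One detail that is easy to miss: %f reads the digits after the decimal point as a fraction of a second, so '.74' means 740000 microseconds, not 74. A small sketch (the second timestamp is invented for the illustration) that also shows total_seconds() for getting the difference as a plain number:

```python
from datetime import datetime

fmt = "%H:%M:%S.%f"

# '.74' is padded on the right: 0.74 s == 740000 microseconds
t = datetime.strptime("11:49:57.74", fmt)
print(t.microsecond)  # 740000

# Invented second timestamp, 2 h and 0.76 s later
later = datetime.strptime("13:49:58.5", fmt)
delta = later - t
print(delta.total_seconds())  # 7200.76
```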

Time variable units "day as %Y%m%d.%f" in python iris

I am hoping someone can help. I am processing a few climate models (NetCDF files) in Python using iris. All was working well until I added my last model, which is formatted differently. The unit used for the time variable in the new model is day as %Y%m%d.%f, but in the other models it is days since …. This means that when I try to constrain the time variable I get the following error: AttributeError: 'numpy.float64' object has no attribute 'year'.
I tried adding a year variable using iriscc.add_year(EARTH3, 'time'), but that just raises the error 'Unit has undefined calendar'.
I'm wondering if you know how I might fix this? Do I need to convert the calendar type, or is there a way around that? I'm not sure how to do that anyway!
Thank you!
Erika
EDIT: here is the full code for my file. The model CanESM2 is working, but the model EARTH3 is not; it is the one with the funny time units.
import matplotlib.pyplot as plt
import iris
import iris.coord_categorisation as iriscc
import iris.plot as iplt
import iris.quickplot as qplt
import iris.analysis.cartography
import cf_units
from cf_units import Unit
import datetime
import numpy as np
def main():
    #-------------------------------------------------------------------------
    #bring in all the GCM models we need and give them a name
    CanESM2= '/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/GCM_data/tasmin_Amon_CanESM2_historical_r1i1p1_185001-200512.nc'
    EARTH3= '/exports/csce/datastore/geos/users/s0XXXX/Climate_Modelling/GCM_data/tas_Amon_EC-EARTH_historical_r3i1p1_1850-2009.nc'
    #Load exactly one cube from given file
    CanESM2 = iris.load_cube(CanESM2)
    EARTH3 = iris.load_cube(EARTH3)
    print"CanESM2 time"
    print (CanESM2.coord('time'))
    print "EARTH3 time"
    print (EARTH3.coord('time'))
    #fix EARTH3 time units as they differ from all other models
    t_coord=EARTH3.coord('time')
    t_unit = t_coord.attributes['invalid_units']
    timestep, _, t_fmt_str = t_unit.split(' ')
    new_t_unit_str= '{} since 1850-01-01 00:00:00'.format(timestep)
    new_t_unit = cf_units.Unit(new_t_unit_str, calendar=cf_units.CALENDAR_STANDARD)
    new_datetimes = [datetime.datetime.strptime(str(dt), t_fmt_str) for dt in t_coord.points]
    new_dt_points = [new_t_unit.date2num(new_dt) for new_dt in new_datetimes]
    new_t_coord = iris.coords.DimCoord(new_dt_points, standard_name='time', units=new_t_unit)
    print "EARTH3 new time"
    print (EARTH3.coord('time'))
    #regrid all models to have same latitude and longitude system, all regridded to model with lowest resolution
    CanESM2 = CanESM2.regrid(CanESM2, iris.analysis.Linear())
    EARTH3 =EARTH3.regrid(CanESM2, iris.analysis.Linear())
    #we are only interested in the latitude and longitude relevant to Malawi (has to be slightly larger than country boundary to take into account resolution of GCMs)
    Malawi = iris.Constraint(longitude=lambda v: 32.0 <= v <= 36., latitude=lambda v: -17. <= v <= -8.)
    CanESM2 =CanESM2.extract(Malawi)
    EARTH3 =EARTH3.extract(Malawi)
    #time constraint to make all series the same, for ERAINT this is 1990-2008 and for RCMs and GCMs this is 1961-2005
    iris.FUTURE.cell_datetime_objects = True
    t_constraint = iris.Constraint(time=lambda cell: 1961 <= cell.point.year <= 2005)
    CanESM2 =CanESM2.extract(t_constraint)
    EARTH3 =EARTH3.extract(t_constraint)
    #Convert units to match, CORDEX data is in Kelvin but Observed data in Celsius, we would like to show all data in Celsius
    CanESM2.convert_units('Celsius')
    EARTH3.units = Unit('Celsius') #this fixes EARTH3 which has no units defined
    EARTH3=EARTH3-273 #this converts the data manually from Kelvin to Celsius
    #add year data to files
    iriscc.add_year(CanESM2, 'time')
    iriscc.add_year(EARTH3, 'time')
    #We are interested in plotting the data by year, so we need to take a mean of all the data by year
    CanESM2YR=CanESM2.aggregated_by('year', iris.analysis.MEAN)
    EARTH3YR = EARTH3.aggregated_by('year', iris.analysis.MEAN)
    #Returns an array of area weights, with the same dimensions as the cube
    CanESM2YR_grid_areas = iris.analysis.cartography.area_weights(CanESM2YR)
    EARTH3YR_grid_areas = iris.analysis.cartography.area_weights(EARTH3YR)
    #We want to plot the mean for the whole region so we need a mean of all the lats and lons
    CanESM2YR_mean = CanESM2YR.collapsed(['latitude', 'longitude'], iris.analysis.MEAN, weights=CanESM2YR_grid_areas)
    EARTH3YR_mean = EARTH3YR.collapsed(['latitude', 'longitude'], iris.analysis.MEAN, weights=EARTH3YR_grid_areas)
    #-------------------------------------------------------------------------
    #PART 4: PLOT LINE GRAPH
    #limit x axis
    plt.xlim((1961,2005))
    #assign the line colours and set x axis to 'year' rather than 'time'
    qplt.plot(CanESM2YR_mean.coord('year'), CanESM2YR_mean, label='CanESM2', lw=1.5, color='blue')
    qplt.plot(EARTH3YR_mean.coord('year'), EARTH3YR_mean, label='EC-EARTH (r3i1p1', lw=1.5, color='magenta')
    #set a title for the y axis
    plt.ylabel('Near-Surface Temperature (degrees Celsius)')
    #create a legend and set its location to under the graph
    plt.legend(loc="upper center", bbox_to_anchor=(0.5,-0.05), fancybox=True, shadow=True, ncol=2)
    #create a title
    plt.title('Tas for Malawi 1961-2005', fontsize=11)
    #add grid lines
    plt.grid()
    #show the graph in the console
    iplt.show()

if __name__ == '__main__':
    main()
In Iris, unit strings for time coordinates must be specified in the format <time-period> since <epoch>, where <time-period> is a unit of measure of time, such as 'days', or 'years'. This format is specified by udunits2, the library Iris uses to supply valid units and perform unit conversions.
The time coordinate in this case does not have a unit that follows this format, meaning the time coordinate will not have full time coordinate functionality (this partly explains the Attribute Error in the question). To fix this we will need to construct a new time coordinate based on the values and metadata of the existing time coordinate and then replace the cube's existing time coordinate with the new one.
To do this we'll need to:
1. construct a new time unit based on the metadata contained in the existing time unit
2. take the existing time coordinate's point values and format them as datetime objects, using the format string specified in the existing time unit
3. convert the datetime objects from (2.) to an array of floating-point numbers using the new time unit constructed in (1.)
4. create a new time coordinate from the array constructed in (3.) and the new time unit produced in (1.)
5. remove the old time coordinate from the cube and add the new one.
Here's the code to do this...
import datetime
import cf_units
import iris
import numpy as np
t_coord = EARTH3.coord('time')
t_unit = t_coord.attributes['invalid_units']
timestep, _, t_fmt_str = t_unit.split(' ')
new_t_unit_str = '{} since 1850-01-01 00:00:00'.format(timestep)
new_t_unit = cf_units.Unit(new_t_unit_str, calendar=cf_units.CALENDAR_STANDARD)
new_datetimes = [datetime.datetime.strptime(str(dt), t_fmt_str) for dt in t_coord.points]
new_dt_points = [new_t_unit.date2num(new_dt) for new_dt in new_datetimes]
new_t_coord = iris.coords.DimCoord(new_dt_points, standard_name='time', units=new_t_unit)
# swap the old time coordinate for the new one on the EARTH3 cube
t_coord_dim = EARTH3.coord_dims('time')
EARTH3.remove_coord('time')
EARTH3.add_dim_coord(new_t_coord, t_coord_dim)
I've made an assumption about the best epoch for your time data. I've also made an assumption about the calendar that best describes your data, but you should be able to replace (when constructing new_t_unit) the standard calendar I've chosen with any other valid cf_units calendar without difficulty.
As a final note, it is effectively impossible to change calendar types. This is because different calendar types include and exclude different days. For example, a 360day calendar has a Feb 30 but no May 31 (as it assumes 12 idealised 30 day long months). If you try and convert from a 360day calendar to a standard calendar, problems you hit include what you do with the data from 29 and 30 Feb, and how you fill the five missing days that don't exist in a 360day calendar. For such reasons it's generally impossible to convert calendars (and Iris doesn't allow such operations).
Hope this helps!
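One thing that may be worth checking in the strptime step: the %f directive reads the digits after the decimal point as a fraction of a second, whereas in a 'day as %Y%m%d.%f' value the fraction appears to be a fraction of a day (20490128.25 would be 06:00 on 2049-01-28). Under that assumption, a minimal stdlib-only sketch of the conversion could look like this; the function name day_as_to_datetime is made up for the illustration and is not part of Iris or cf_units.

```python
import datetime
import math

def day_as_to_datetime(value):
    """Convert a 'day as %Y%m%d.%f' float to a datetime.

    Assumes the fractional part is a fraction of a day (hypothetical helper).
    """
    fraction, whole = math.modf(value)
    # The integer part encodes the calendar date as YYYYMMDD
    date = datetime.datetime.strptime(str(int(whole)), '%Y%m%d')
    # Round to whole seconds to absorb floating-point noise in the fraction
    seconds = round(fraction * 86400)
    return date + datetime.timedelta(seconds=seconds)

print(day_as_to_datetime(20490128.25))  # 2049-01-28 06:00:00
```

The resulting datetimes could then be passed to date2num in place of the strptime results, if the fraction-of-a-day reading matches your data.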
This may not be more useful than the answer above, but here is the function I wrote to convert the data from %Y%m%d.%f to a datetime array.
The function creates a complete datetime array with no missing values. It could be modified to account for missing times, but a climate model should not have missing data.
import numpy as np
import pandas as pd

def fromEARTHtime2Datetime(dt, timeVecEARTH):
    """
    Return a complete datetime array built from the EARTH %Y%m%d.%f time
    format, converting it to a more useful time representation such as
    Python datetimes. The result is WITHOUT any missing data!

    Parameters
    ----------
    dt : string
        The time discretization. It can be 1h or 6h, but it always
        needs to be in hours, for example dt = '6h'.
    timeVecEARTH : array of float
        Vector of the times to be converted. For example, the time of
        EARTH is day as %Y%m%d.%f.
        Only this format can be converted to datetime, for example:
        20490128.0, 20490128.25, 20490128.5, 20490128.75 will be converted
        to the datetimes '2049-01-28 00:00:00', '2049-01-28 06:00:00',
        '2049-01-28 12:00:00', '2049-01-28 18:00:00'

    Returns
    -------
    timeArrNew : DatetimeIndex
        The complete datetime array, WITHOUT any missing data,
        for example: DatetimeIndex(['2049-01-28 00:00:00', '2049-01-28 06:00:00',
                                    ...
                                    '2049-02-28 18:00:00', '2049-03-01 00:00:00'],
                                   dtype='datetime64[ns]', length=129, freq='6H')
    """
    # number of time steps per day, e.g. 4 for dt = '6h'
    dtDay = 24 / float(dt[:-1])
    # fractions of a day at which the steps fall, e.g. 0, 0.25, 0.5, 0.75
    partOfDay = np.arange(0, 1, 1 / dtDay)
    hDay = []
    for ip in partOfDay:
        hDay.append('%02.f:00:00' % (24 * ip))
    # map each day fraction to its 'HH:00:00' string
    dictHours = dict(zip(partOfDay, hDay))
    t0Str = str(timeVecEARTH[0])
    timeAux0 = t0Str.split('.')
    # prepend '0.' so that e.g. '25' is looked up as 0.25 rather than 25.0
    timeAux0 = timeAux0[0][0:4] + '-' + timeAux0[0][4:6] + '-' + timeAux0[0][6:] + ' ' + dictHours[float('0.' + timeAux0[1])]
    tendStr = str(timeVecEARTH[-1])
    timeAuxEnd = tendStr.split('.')
    timeAuxEnd = timeAuxEnd[0][0:4] + '-' + timeAuxEnd[0][4:6] + '-' + timeAuxEnd[0][6:] + ' ' + dictHours[float('0.' + timeAuxEnd[1])]
    # build the full range from the first to the last timestamp
    timeArrNew = pd.date_range(timeAux0, timeAuxEnd, freq=dt)
    return timeArrNew

Increase recursion limit and stack size in python 2.7

I'm working with large trees and need to increase the recursion limit on Python 2.7.
Using sys.setrecursionlimit(10000) crashes my kernel, so I figured I needed to increase the stack size.
However I don't know how large the stack size should be. I tried 100 MiB like this threading.stack_size(104857600), but the kernel still dies. Giving it 1 GiB throws an error.
I haven't worked with the threading module yet so am I using it wrong when I just put the above statement at the beginning of my script? I'm not doing any kind of parallel processing, everything is done in the same thread.
My computer has 128 GB of physical RAM, running Windows 10, iPython console in Spyder.
The error displayed is simply:
Kernel died, restarting
Nothing more.
EDIT:
Full code to reproduce the problem. Building the tree works well, though it takes quite long; the kernel only dies during the recursive execution of treeToDict() when reading the whole tree into a dictionary. Maybe there is something wrong with the code of that function. The tree is a non-binary tree:
import pandas as pd
import threading
import sys
import random as rd
import itertools as it
import string
threading.stack_size(104857600)
sys.setrecursionlimit(10000)
class treenode:
    # class to build the tree
    def __init__(self,children,name='',weight=0,parent=None,depth=0):
        self.name = name
        self.weight = weight
        self.children = children
        self.parent = parent
        self.depth = depth
        self.parentname = parent.name if parent is not None else ''

def add_child(node,name):
    # add element to the tree
    # if it already exists at the given node increase weight
    # else add a new child
    for i in range(len(node.children)):
        if node.children[i].name == name:
            node.children[i].weight += 1
            newTree = node.children[i]
            break
    else:
        newTree = treenode([],name=name,weight=1,parent=node,depth=node.depth+1)
        node.children.append(newTree)
    return newTree

def treeToDict(t,data):
    # read the tree into a dictionary
    if t.children != []:
        for i in range(len(t.children)):
            data[str(t.depth)+'_'+t.name] = [t.name, t.children[i].name, t.depth, t.weight, t.parentname]
    else:
        data[str(t.depth)+'_'+t.name] = [t.name, '', t.depth, t.weight, t.parentname]
    for i in range(len(t.children)):
        treeToDict(t.children[i],data)

# Create random dataset that leads to very long tree branches:
# A is an index for each set of data B which becomes one branch
rd.seed(23)
testSet = [''.join(l) for l in it.combinations(string.ascii_uppercase[:20],2)]
A = []
B = []
for i in range(10):
    for j in range(rd.randint(10,6000)):
        A.append(i)
        B.append(rd.choice(testSet))
dd = {"A":A,"B":B}
data = pd.DataFrame(dd)
# The maximum length should be above 5500, use another seed if it's not:
print data.groupby('A').count().max()
# Create the tree
root = treenode([],name='0')
for i in range(len(data.values)):
    if i == 0:
        newTree = add_child(root,data.values[i,1])
        oldses = data.values[i,0]
    else:
        if data.values[i,0] == oldses:
            newTree = add_child(newTree,data.values[i,1])
        else:
            newTree = add_child(root,data.values[i,1])
            oldses = data.values[i,0]
result={}
treeToDict(root,result)
PS: I'm aware the treeToDict() function is faulty in that it will overwrite entries because there can be duplicate keys. For this error this bug is unimportant however.
In my experience, the problem here is not the stack size but the algorithm itself.
It's possible to implement the tree traversal without recursion at all, by using a stack-based depth-first (or queue-based breadth-first) search.
Python-like pseudo-code might look like this:
def traverse_tree(root):
    stack = [root]  # explicit stack replaces the call stack
    while stack:
        cur = stack.pop()
        cur.do_some_awesome_stuff()
        # extend, not append: push each child individually
        stack.extend(cur.get_children())
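This approach scales well: the explicit stack is an ordinary list on the heap, so the depth of tree you can handle is limited by available memory rather than by the recursion limit.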
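To make this concrete, here is a minimal runnable sketch of the iterative idea on a deliberately deep tree. TreeNode and tree_to_dict are simplified stand-ins invented for this example, not the treenode class or treeToDict function from the question, and the dictionary layout is only illustrative.

```python
class TreeNode(object):
    """Simplified stand-in for the question's tree node (hypothetical)."""
    def __init__(self, name, depth=0):
        self.name = name
        self.depth = depth
        self.children = []

def tree_to_dict(root):
    """Visit every node depth-first using an explicit stack, no recursion."""
    data = {}
    stack = [root]
    while stack:
        node = stack.pop()
        key = '%d_%s' % (node.depth, node.name)
        data[key] = [child.name for child in node.children]
        # Push the children so they are visited in later iterations
        stack.extend(node.children)
    return data

# Build one branch far deeper than the default recursion limit
root = TreeNode('0')
node = root
for depth in range(1, 20001):
    child = TreeNode('n%d' % depth, depth)
    node.children.append(child)
    node = child

result = tree_to_dict(root)
print(len(result))  # 20001
```

No call to sys.setrecursionlimit() or threading.stack_size() is needed here, because the traversal never nests function calls.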
As further reading you can try this and that.
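On the stack-size part of the question: threading.stack_size() only affects threads started after the call, so calling it at the top of a script does nothing for code that keeps running in the main thread. If you do want to keep the recursion, a sketch of running it in a worker thread with a larger stack might look like the following. The 64 MiB size, the depth and the helper names are arbitrary choices for the illustration, and a suitable size depends on the platform.

```python
import sys
import threading

def nesting_depth(node):
    """Recursively count how deeply a chain of single-item lists is nested."""
    return 1 + nesting_depth(node[0]) if node else 0

# Build a deep chain of nested lists: [[[ ... [] ... ]]]
DEPTH = 20000
chain = []
for _ in range(DEPTH):
    chain = [chain]

out = []

def worker():
    # Runs in the new thread, which gets the enlarged stack
    out.append(nesting_depth(chain))

sys.setrecursionlimit(DEPTH + 1000)
threading.stack_size(64 * 1024 * 1024)  # must be set before the thread is started
thread = threading.Thread(target=worker)
thread.start()
thread.join()
threading.stack_size(0)  # restore the default for threads created later
print(out)  # [20000]
```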

The queryset's `count` is wrong after `extra`

When I use extra in a certain way on a Django queryset (call it qs), the result of qs.count() is different than len(qs.all()). To reproduce:
Make an empty Django project and app, then add a trivial model:
class Baz(models.Model):
    pass
Now make a few objects:
>>> Baz(id=1).save()
>>> Baz(id=2).save()
>>> Baz(id=3).save()
>>> Baz(id=4).save()
Using the extra method to select only some of them produces the expected count:
>>> Baz.objects.extra(where=['id > 2']).count()
2
>>> Baz.objects.extra(where=['-id < -2']).count()
2
But add a select clause to the extra and refer to it in the where clause, and the count is suddenly wrong, even though the result of all() is correct:
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).all()
[<Baz: Baz object>, <Baz: Baz object>] # As expected
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).count()
0 # Should be 2
I think the problem has to do with django.db.models.sql.query.BaseQuery.get_count(). It checks whether the BaseQuery's select or aggregate_select attributes have been set; if so, it uses a subquery. But django.db.models.sql.query.BaseQuery.add_extra adds only to the BaseQuery's extra attribute, not select or aggregate_select.
How can I fix the problem? I know I could just use len(qs.all()), but it would be nice to be able to pass the extra'ed queryset to other parts of the code, and those parts may call count() without knowing that it's broken.
Redefining get_count() and monkeypatching appears to fix the problem:
def get_count(self):
    """
    Performs a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    if len(self.select) > 1 or self.aggregate_select or self.extra:
        # If a select clause exists, then the query has already started to
        # specify the columns that are to be returned.
        # In this case, we need to use a subquery to evaluate the count.
        from django.db.models.sql.subqueries import AggregateQuery
        subquery = obj
        subquery.clear_ordering(True)
        subquery.clear_limits()
        obj = AggregateQuery(obj.model, obj.connection)
        obj.add_subquery(subquery)
    obj.add_count_column()
    number = obj.get_aggregation()[None]
    # Apply offset and limit constraints manually, since using LIMIT/OFFSET
    # in SQL (in variants that provide them) doesn't change the COUNT
    # output.
    number = max(0, number - self.low_mark)
    if self.high_mark is not None:
        number = min(number, self.high_mark - self.low_mark)
    return number

django.db.models.sql.query.BaseQuery.get_count = quuux.get_count
Testing:
>>> Baz.objects.extra(select={'negid': '0 - id'}, where=['"negid" < -2']).count()
2
Updated to work with Django 1.2.1:
def basequery_get_count(self, using):
    """
    Performs a COUNT() query using the current filter constraints.
    """
    obj = self.clone()
    if len(self.select) > 1 or self.aggregate_select or self.extra:
        # If a select clause exists, then the query has already started to
        # specify the columns that are to be returned.
        # In this case, we need to use a subquery to evaluate the count.
        from django.db.models.sql.subqueries import AggregateQuery
        subquery = obj
        subquery.clear_ordering(True)
        subquery.clear_limits()
        obj = AggregateQuery(obj.model)
        obj.add_subquery(subquery, using=using)
    obj.add_count_column()
    number = obj.get_aggregation(using=using)[None]
    # Apply offset and limit constraints manually, since using LIMIT/OFFSET
    # in SQL (in variants that provide them) doesn't change the COUNT
    # output.
    number = max(0, number - self.low_mark)
    if self.high_mark is not None:
        number = min(number, self.high_mark - self.low_mark)
    return number

models.sql.query.Query.get_count = basequery_get_count
I'm not sure if this fix will have other unintended consequences, however.