How to obtain the "time" values of a schedule - quantlib

Assume a fixed-rate bond with the schedule shown in the sample code below.
I can obtain the number of business days between the schedule dates by using the businessDaysBetween function.
Now I would like the "time value" (the year fraction) of each period. Is there a way to do this without writing a new function?
Here is the expected result:
May 14th, 2012 .5
November 14th, 2012 .5
May 14th, 2013 .5
November 14th, 2013 .5
May 14th, 2014 .5
November 14th, 2014 .5
May 14th, 2015 .5
November 16th, 2015 .505556
May 16th, 2016 .5
November 14th, 2016 .49444
Here is the code:
from QuantLib import *
import pandas as pd
effective_date = Date(14, 11, 2011)
termination_date = Date(14, 11, 2016)
tenor = Period(Semiannual)
calendar = UnitedStates()
business_convention = ModifiedFollowing
termination_business_convention = Following
date_generation = DateGeneration.Forward
end_of_month = False
day_count = Thirty360()
schedule = Schedule(effective_date,
                    termination_date,
                    tenor,
                    calendar,
                    business_convention,
                    termination_business_convention,
                    date_generation,
                    end_of_month)
# Collect (tenor number, date) pairs from the schedule
t = []
for i, d in enumerate(schedule):
    t.append((i + 1, d))
df = pd.DataFrame(t, columns=['tenorNo', 'tenorDate'])
# Business days between each date and the previous one (0 for the first date)
nbDays = []
for x in df['tenorNo']:
    if x == 1:
        tmp = 0
    else:
        tmp = calendar.businessDaysBetween(df['tenorDate'][x-2], df['tenorDate'][x-1])
    nbDays.append(tmp)
df['nbDays'] = nbDays
print(df)
tenorNo tenorDate nbDays
0 1 November 14th, 2011 0
1 2 May 14th, 2012 125
2 3 November 14th, 2012 127
3 4 May 14th, 2013 124
4 5 November 14th, 2013 127
5 6 May 14th, 2014 124
6 7 November 14th, 2014 127
7 8 May 14th, 2015 124
8 9 November 16th, 2015 127
9 10 May 16th, 2016 125
10 11 November 14th, 2016 125

That's what DayCounter instances are for. The time will depend on the day-count convention you choose (for example, you seem to be using 30/360).
Calling
day_count.yearFraction(date1, date2)
will return the time between date1 and date2.
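To see where figures such as .505556 in the expected result come from, here is a minimal pure-Python sketch of a 30/360 year fraction. It is a hand-written, simplified version of the US bond-basis rule (it ignores the end-of-February adjustments) and is not QuantLib's implementation; the function name year_fraction_30_360 is made up for this illustration. With QuantLib itself you would call day_count.yearFraction on each pair of consecutive schedule dates instead.

```python
from datetime import date


def year_fraction_30_360(start, end):
    """Simplified 30/360 (US bond basis) year fraction.

    Assumption: end-of-February adjustments are ignored.
    """
    d1 = min(start.day, 30)          # a start day of 31 is treated as 30
    d2 = end.day
    if d2 == 31 and d1 == 30:        # an end day of 31 becomes 30 only if start is 30/31
        d2 = 30
    days = (360 * (end.year - start.year)
            + 30 * (end.month - start.month)
            + (d2 - d1))
    return days / 360.0


# Last four dates of the schedule from the question
schedule_dates = [date(2015, 5, 14), date(2015, 11, 16),
                  date(2016, 5, 16), date(2016, 11, 14)]

# Year fraction of each period, paired with the period's end date
fractions = [year_fraction_30_360(a, b)
             for a, b in zip(schedule_dates, schedule_dates[1:])]
for d, f in zip(schedule_dates[1:], fractions):
    print(d, round(f, 6))
```

The three periods contain 182, 180 and 178 days under this convention, which gives .505556, .5 and .494444, in line with the expected result in the question.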

Replacing variable entries to be the same in each group

I'm working with panel data in Stata, and I have a set up like the following:
ID  year  value
1   2010
1   2011  20
1   2012  20
1   2013
1   2014
2   2010
2   2011  14
2   2012  14
2   2013  14
2   2014  14
and I want to change the blank entries to be the same as the other entries within that ID, for any year. I.e., I want something like the following:
ID  year  value
1   2010  20
1   2011  20
1   2012  20
1   2013  20
1   2014  20
2   2010  14
2   2011  14
2   2012  14
2   2013  14
2   2014  14
What do you recommend?
If the non-missing values of the variable value are always the same within id, you can use this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id int year byte value
1 2010 .
1 2011 20
1 2012 20
1 2013 .
1 2014 .
2 2010 .
2 2011 14
2 2012 14
2 2013 14
2 2014 14
end
*Get mean of values within id
bysort id : egen value2 = mean(value)
*Transfer values back to original var to maintain var labels etc. then drop value2
replace value = value2
drop value2

Line graph with first half, second half year x-axis?

I have 2 observations per year for the years 2011-2015. The first observation covers January-June and the second covers July-December. To preserve the year, I thought I should create a variable that denotes which half of the year an observation belongs to. But now I'm not sure how to graph it...
year half value
2011 0 10.42
2011 1 10.33
2012 0 11.66
2012 1 11.01
2013 0 14.29
2013 1 10.95
2014 0 12.42
2014 1 7.04
2015 0 7.07
2015 1 6.95
Thank you!
There are many ways to plot such data. Here's one:
clear
input year half value
2011 0 10.42
2011 1 10.33
2012 0 11.66
2012 1 11.01
2013 0 14.29
2013 1 10.95
2014 0 12.42
2014 1 7.04
2015 0 7.07
2015 1 6.95
end
set scheme s1color
gen date = yh(year, half + 1)
format date %th
twoway line value date, ///
|| scatter value date if half == 0, ms(Oh) || scatter value date if half == 1 , ms(Th) ///
legend(order(2 "Jan-June" 3 "Jul-Dec") ring(0) col(1) pos(1)) xtitle("")

Pandas add multiple new columns at once from list of lists

I have a list of timestamp lists where each inner list looks like this:
['Tue', 'Feb', '7', '10:07:40', '2017']
Is it possible with Pandas to add five new columns at the same time to an already created dataframe (same length as the outer list), that are equal to each of these values, with names 'day','month','date','time','year'?
I think you can use DataFrame constructor with concat:
df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})
L = [['Tue', 'Feb', '7', '10:07:40', '2017'],
     ['Tue', 'Feb', '7', '10:07:40', '2017'],
     ['Tue', 'Feb', '7', '10:07:40', '2017']]
cols = ['day','month','date','time','year']
df1 = pd.DataFrame(L, columns=cols)
print (df1)
day month date time year
0 Tue Feb 7 10:07:40 2017
1 Tue Feb 7 10:07:40 2017
2 Tue Feb 7 10:07:40 2017
df2 = pd.concat([df, df1], axis=1)
print (df2)
A B C day month date time year
0 1 4 7 Tue Feb 7 10:07:40 2017
1 2 5 8 Tue Feb 7 10:07:40 2017
2 3 6 9 Tue Feb 7 10:07:40 2017
One liner:
df2 = pd.concat([df, pd.DataFrame(L, columns=['day','month','date','time','year'])], axis=1)
print (df2)
A B C day month date time year
0 1 4 7 Tue Feb 7 10:07:40 2017
1 2 5 8 Tue Feb 7 10:07:40 2017
2 3 6 9 Tue Feb 7 10:07:40 2017
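One caveat: pd.concat with axis=1 aligns rows on the index, so the approach above assumes that df has the default 0..n-1 index. The sketch below uses a hypothetical frame with a different index (for example, one left over after filtering) and passes index=df.index when building the new columns so that they line up row by row.

```python
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[10, 20, 30])

L = [["Tue", "Feb", "7", "10:07:40", "2017"],
     ["Tue", "Feb", "7", "10:07:40", "2017"],
     ["Tue", "Feb", "7", "10:07:40", "2017"]]
cols = ["day", "month", "date", "time", "year"]

# Reusing df.index keeps the new columns aligned with the existing rows
df1 = pd.DataFrame(L, columns=cols, index=df.index)
df2 = pd.concat([df, df1], axis=1)
print(df2)
```

Without index=df.index, df1 would get the index 0, 1, 2, and the concatenation would produce six rows padded with NaN instead of three complete ones.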

python: obtaining a column of dates from the columns of years-months-days

Suppose I have a very simple dataframe:
>>> a
Out[158]:
monthE yearE dayE
0 10 2014 15
1 2 2012 15
2 2 2014 15
3 12 2015 15
4 2 2012 15
Suppose that I want to create the column with the date related to every line, using three columns of integers.
When I have simple numbers it is enough to do like:
>>> datetime.date(1983,11,8)
Out[159]: datetime.date(1983, 11, 8)
If I have to create a column of dates (theoretically a very basic request), instead:
a.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']))
I obtain the following error:
KeyError: ('yearE', u'occurred at index monthE')
I think you can first remove the trailing E from the column names and then use to_datetime, but you then get pandas Timestamps, not Python dates:
df.columns = df.columns.str[:-1]
df['date'] = pd.to_datetime(df)
#if multiple columns filter by subset
#df['date'] = pd.to_datetime(df[['year','month','day']])
print (df)
month year day date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
datetime64[ns]
print (df.date.iloc[0])
2014-10-15 00:00:00
print (type(df.date.iloc[0]))
<class 'pandas.tslib.Timestamp'>
Thank you MaxU for solution:
df['date'] = pd.to_datetime(df.rename(columns = lambda x: x[:-1]))
#if another columns in df
#df['date'] = pd.to_datetime(df[['yearE','monthE','dayE']].rename(columns=lambda x: x[:-1]))
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
But if you really need Python dates, add axis=1 to apply. Note that some pandas datetime functions can then no longer be used, because the column has dtype object:
df['date'] =df.apply(lambda x: datetime.date(x['yearE'],x['monthE'],x['dayE']), axis=1)
print (df)
monthE yearE dayE date
0 10 2014 15 2014-10-15
1 2 2012 15 2012-02-15
2 2 2014 15 2014-02-15
3 12 2015 15 2015-12-15
4 2 2012 15 2012-02-15
print (df.date.dtypes)
object
print (df.date.iloc[0])
2014-10-15
print (type(df.date.iloc[0]))
<class 'datetime.date'>
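If plain datetime.date objects are wanted without a row-wise apply, one possible variation is to build the Timestamps with to_datetime first and then convert them with the .dt.date accessor. This is a sketch on the sample data from the question; the intermediate names parts and stamps are made up for the illustration.

```python
import datetime

import pandas as pd

a = pd.DataFrame({"monthE": [10, 2, 2, 12, 2],
                  "yearE": [2014, 2012, 2014, 2015, 2012],
                  "dayE": [15, 15, 15, 15, 15]})

# to_datetime expects columns named year, month, day, so strip the trailing E
parts = a[["yearE", "monthE", "dayE"]].rename(columns=lambda c: c[:-1])
stamps = pd.to_datetime(parts)

# .dt.date converts each Timestamp into a datetime.date (object dtype column)
a["date"] = stamps.dt.date
print(a)
print(type(a["date"].iloc[0]))
```

As with the apply version, the resulting column holds Python objects, so it is usually better to keep the Timestamp column for calculations and convert only at the end.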

Splitting a column that contains multiple date formats

I have a csv file that contains a column with multiple date formats. I need to split them and get the extracted result in the same format.
Wednesday 12 August 2015
Wednesday 12 August 2015
Friday April 1 2016
Friday April 1 2016
5/12/2016
5/12/2016
This is the file, and I want it in the mm/dd/yy format. My code is as follows:
import re
import csv
import pandas as pd
#delimiters = " ", "/"
#df = pd.read_csv('merged_34.csv')
df = pd.read_csv('test3.csv')
newDate = []
for item in df['serverDatePrettyFirstAction']:
    if '/' in item:
        newDate.append(item)
    else:
        # drop the leading weekday name
        item = item.split(' ', 1)[1]
        newDate.append(item)
df['newDate'] = newDate
df.to_csv('D:/Python/10.36.202.64/newfile.csv', index = False)
And this is what I get:
serverDatePrettyFirstAction newDate
Wednesday 12 August 2015 12-Aug-15
Wednesday 12 August 2015 12-Aug-15
Friday April 1 2016 April 1 2016
Friday April 1 2016 April 1 2016
5/12/2016 5/12/2016
5/12/2016 5/12/2016
Also, is there a way to overwrite the values in the same column itself?
A faster approach would be to use the pandas function to_datetime():
In [2]: df
Out[2]:
Date
0 Wednesday 12 August 2015
1 Wednesday 12 August 2015
2 Friday April 1 2016
3 Friday April 1 2016
4 5/12/2016
5 5/12/2016
In [6]: df['newDate'] = pd.to_datetime(df['Date'])
Result:
In [7]: df
Out[7]:
Date newDate
0 Wednesday 12 August 2015 2015-08-12
1 Wednesday 12 August 2015 2015-08-12
2 Friday April 1 2016 2016-04-01
3 Friday April 1 2016 2016-04-01
4 5/12/2016 2016-05-12
5 5/12/2016 2016-05-12
You can use the third-party dateutil library, as long as your data is not too big (after all, it guesses the format for every value):
import pandas as pd
from dateutil import parser
df = pd.read_csv('test3.csv')
df['newDate'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
df.to_csv('newfile.csv', index=False, date_format='%Y-%m-%d')
To overwrite the values in the same column, use:
df['serverDatePrettyFirstAction'] = df['serverDatePrettyFirstAction'].apply(parser.parse)
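Neither answer formats the result as mm/dd/yy, which is what the question asks for. A minimal sketch of that last step, using the three sample strings from the question in place of the CSV column, is to parse each value with dateutil and format it with strftime. It assumes that 5/12/2016 means May 12 (month first), which is dateutil's default reading.

```python
from dateutil import parser

# Sample values from the question, standing in for the CSV column
raw = ["Wednesday 12 August 2015", "Friday April 1 2016", "5/12/2016"]

# Parse each string, then render it as mm/dd/yy
formatted = [parser.parse(s).strftime("%m/%d/%y") for s in raw]
print(formatted)  # ['08/12/15', '04/01/16', '05/12/16']
```

On a DataFrame, the same expression can be applied per value, for example with apply and a small function that wraps parser.parse and strftime.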