I want to calculate the price at time t minus the price at time t+1 for security i on day k, where the price at time t+1 is taken at least 5 minutes after the price at time t. Hence, I added a variable to mark the interval and used the following code to create a column holding the price at time t+1.
Here is a sample of the input data.
data test4;
length _ric$ 25 type$ 5 interval$ 15 time_l_$ 25 ;
input _ric$ date_l_ time_l_ type$ price interval$;
datalines;
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00
AXPA031407800.U 20131212 9:53:34.990 Trade 5.8 09:50:00
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00
AXPA031407800.U 20131212 9:55:13.011 Trade 5.7 09:55:00
AXPA031407900.U 20131205 9:37:58.420 Trade 6.25 09:35:00
AXPA031407900.U 20131205 9:39:04.996 Trade 6.25 09:35:00
AXPA031407900.U 20131205 9:39:04.996 Trade 6.25 09:35:00
AXPA031407900.U 20131205 9:39:04.996 Trade 6.25 09:35:00
AXPA031407900.U 20131205 9:39:04.996 Trade 6.25 09:35:00
AXPA031407900.U 20131205 9:39:04.996 Trade 6.25 09:35:00
;
data test1;
set test nobs=nobs;
do _i = _n_ to nobs until (new_date ne date_l_ or new_time > interval);
set test (rename=(date_l_=new_date price=new_price interval=new_time)) point=_i;
end;
if (date_l_ ne new_date) or (_i > nobs) then call missing(new_price);
run;
The output data is shown below. However, the _RIC (security name), date_l_, and time_l_ values are changed, and observations are lost: for example, AXPA031407800.U has 13 observations in the input data but only 7 in the output data.
_ric type interval time_l_ date_l_ price new_date new_time new_price
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.85 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.85 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.8 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.8 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.8 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.8 20131212 09:55:00 5.7
AXPA031407800.U Trade 09:50:00 9:55:12.990 20131212 5.8 20131212 09:55:00 5.7
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:55:00 9:37:58.420 20131212 5.7 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
AXPA031407900.U Trade 09:35:00 9:39:04.996 20131205 6.25 20131205 09:35:00
Here is my target result, which creates a new variable, Price_next_interval. The new variable holds the price in the next interval on the same day.
_RIC Date_l_ time_l_ type Price interval Price_next_interval
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00 5.7
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:34.990 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.011 Trade 5.7 09:55:00 .
AXPA031407900.U 20131205 9:37:58.420 Trade 6.25 09:35:00 6.28
AXPA031407900.U 20131205 9:45:04.996 Trade 6.28 09:45:00 6.29
AXPA031407900.U 20131205 9:45:04.996 Trade 6.28 09:45:00 6.29
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
This is easier to do via by-group processing and retain rather than point, in my opinion:
data test4;
length _ric$ 25 type$ 5;
input _ric $ date_l_ :yymmdd8. time_l_ :time. type$ price interval :time. price_next_interval_goal;
format date_l_ yymmddn8. time_l_ interval time.;
datalines;
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00 5.7
AXPA031407800.U 20131212 9:52:56.537 Trade 5.85 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.586 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:13.607 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:53:34.990 Trade 5.8 09:50:00 5.7
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:12.990 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.002 Trade 5.7 09:55:00 .
AXPA031407800.U 20131212 9:55:13.011 Trade 5.7 09:55:00 .
AXPA031407900.U 20131205 9:37:58.420 Trade 6.25 09:35:00 6.28
AXPA031407900.U 20131205 9:45:04.996 Trade 6.28 09:45:00 6.29
AXPA031407900.U 20131205 9:45:04.996 Trade 6.28 09:45:00 6.29
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
AXPA031407900.U 20131205 9:55:04.996 Trade 6.29 09:55:00 .
;
/* Sort into reverse order */
proc sort data = test4 out = want;
by descending date_l_ descending interval;
run;
/* Carry the price forward via retain if we've got to the last row for this interval */
/* N.B. do not populate retained figure until after the row has been output*/
/* Clear the carried-forward figure at the start of each date*/
data want2;
set want;
by descending date_l_ descending interval;
if first.date_l_ then call missing(price_next_interval);
output;
retain price_next_interval;
if last.interval then price_next_interval = price;
run;
/*Sort back into original order*/
proc sort data = want2;
by date_l_ interval;
run;
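For readers following the later pandas questions in this thread, here is a rough pandas sketch of the same idea, purely illustrative and not part of the SAS answer: take the last traded price of each interval, shift it back one interval within each day, and merge it onto the trade-level rows. Unlike the SAS step it also keys on _ric, an extra assumption so that two securities sharing a date do not bleed into each other.
import pandas as pd

# a few rows mirroring the test4 layout above (values taken from the target table)
trades = pd.DataFrame({
    '_ric':     ['AXPA031407900.U'] * 4,
    'date_l_':  ['20131205'] * 4,
    'time_l_':  ['9:37:58.420', '9:45:04.996', '9:45:04.996', '9:55:04.996'],
    'price':    [6.25, 6.28, 6.28, 6.29],
    'interval': ['09:35:00', '09:45:00', '09:45:00', '09:55:00'],
})

# last traded price of each (_ric, day, interval); assumes rows are already in time order
last_per_interval = (trades
                     .groupby(['_ric', 'date_l_', 'interval'], as_index=False)['price']
                     .last())

# shift that price back one interval within each (_ric, day), so every interval
# sees the last price of the *next* interval on the same day
last_per_interval['price_next_interval'] = (
    last_per_interval.groupby(['_ric', 'date_l_'])['price'].shift(-1))

# attach the looked-ahead price to the trade-level rows
trades = trades.merge(
    last_per_interval[['_ric', 'date_l_', 'interval', 'price_next_interval']],
    on=['_ric', 'date_l_', 'interval'], how='left')
print(trades)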
I want to calculate the annual average of sales by quarter in Power BI.
I usually solve this in Excel with an AVERAGEIF between dates (the last date of a quarter and the same date one year earlier).
Below is a sample of my data:
Date Quarter Sale
1/12/2016 2016-Q1 12.5
2/25/2016 2016-Q1 65.1
4/7/2016 2016-Q2 95.5
6/22/2016 2016-Q2 74.5
7/10/2016 2016-Q3 7.3
8/30/2016 2016-Q3 87.6
9/5/2016 2016-Q3 88.4
10/27/2016 2016-Q4 18
11/12/2016 2016-Q4 64.2
12/29/2016 2016-Q4 37.2
1/28/2017 2017-Q1 17.8
3/8/2017 2017-Q1 59.6
4/16/2017 2017-Q2 68.7
6/15/2017 2017-Q2 68.5
7/20/2017 2017-Q3 61.8
8/7/2017 2017-Q3 10.8
9/23/2017 2017-Q3 26.5
10/7/2017 2017-Q4 49.8
11/26/2017 2017-Q4 79.7
12/3/2017 2017-Q4 80.5
1/18/2018 2018-Q1 12.5
3/19/2018 2018-Q1 54.7
4/12/2018 2018-Q2 64.0
6/19/2018 2018-Q2 58.9
7/29/2018 2018-Q3 59.9
8/9/2018 2018-Q3 4.1
9/13/2018 2018-Q3 20.2
The desired result is the table below:
Quarter   1_Yr_Trailing Avg Sale
2017-Q1 52.3
2017-Q2 56.7
2017-Q3 43.3
2017-Q4 52.4
2018-Q1 51.3
2018-Q2 49.9
2018-Q3 48.4
Just to clarify, 2018-Q3 is the average of any sale that was recorded between Sept 30, 2017 and Sept 30, 2018:
10/7/2017 2017-Q4 49.8
11/26/2017 2017-Q4 79.7
12/3/2017 2017-Q4 80.5
1/18/2018 2018-Q1 12.5
3/19/2018 2018-Q1 54.7
4/12/2018 2018-Q2 64.0
6/19/2018 2018-Q2 58.9
7/29/2018 2018-Q3 59.9
8/9/2018 2018-Q3 4.1
9/13/2018 2018-Q3 20.2
Thanks for your help.
regards,
Simon
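Not a Power BI measure, but to make the trailing-window rule concrete, here is a minimal pandas sketch of the AVERAGEIF logic described above: for each quarter, average every sale recorded between that quarter's last date and the same date one year earlier. Column names follow the sample, and the exact figures naturally depend on how the window endpoints are treated.
import pandas as pd

# the tail of the sample data (the rows listed for the 2018-Q3 example)
sales = pd.DataFrame({
    'Date':    ['10/7/2017', '11/26/2017', '12/3/2017', '1/18/2018', '3/19/2018',
                '4/12/2018', '6/19/2018', '7/29/2018', '8/9/2018', '9/13/2018'],
    'Quarter': ['2017-Q4'] * 3 + ['2018-Q1'] * 2 + ['2018-Q2'] * 2 + ['2018-Q3'] * 3,
    'Sale':    [49.8, 79.7, 80.5, 12.5, 54.7, 64.0, 58.9, 59.9, 4.1, 20.2],
})
sales['Date'] = pd.to_datetime(sales['Date'], format='%m/%d/%Y')

rows = []
for quarter, grp in sales.groupby('Quarter'):
    end = grp['Date'].max()                  # last recorded date of the quarter
    start = end - pd.DateOffset(years=1)     # same date one year earlier
    window = sales[(sales['Date'] > start) & (sales['Date'] <= end)]
    rows.append({'Quarter': quarter,
                 '1_Yr_Trailing_Avg_Sale': round(window['Sale'].mean(), 1)})

print(pd.DataFrame(rows))                    # 2018-Q3 comes out to 48.4, as above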
I want to write rolling-mean code for m_tax using Python 2.7 pandas to analyze the time series data from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm).
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
....
Here is what I tried:
pd.rolling_mean(df.resample("1M", fill_method="ffill"), window=60, min_periods=1, center=True).mean()
and I got this result:
m_ta 11.029173
m_tax 17.104283
m_tan 4.848637
month 6.499500
monthly_mean 11.030405
monthly_std 1.836159
m_tax% 0.083348
m_tan% 0.023627
dtype: float64
I also tried this:
s = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/1900', periods=1000))
s = s.cumsum()
r = s.rolling(window=60)
r.mean()
and I got this result:
1900-01-01 NaN
1900-01-02 NaN
1900-01-03 NaN
1900-01-04 NaN
1900-01-05 NaN
1900-01-06 NaN
1900-01-07 NaN
1900-01-08 NaN
...
So I am confused here. Which one should I use? Could someone please give me an idea? Thanks!
Starting with version 0.18.0, rolling() and resample() are DataFrame/Series methods that behave similarly to groupby(); the old top-level function forms (such as pd.rolling_mean) are deprecated.
What's new in pandas version 0.18.0
rolling()/expanding() in pandas version 0.18.0
resample() in pandas version 0.18.0
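For instance, the deprecated function call and its 0.18-style method equivalent look like this (a minimal illustration on made-up data, not specific to your frame):
import numpy as np
import pandas as pd

series = pd.Series(np.random.randn(120),
                   index=pd.date_range('1901-01-01', periods=120, freq='M'))

# deprecated top-level function form (pre-0.18):
# pd.rolling_mean(series, window=60, min_periods=1, center=True)

# equivalent method form from 0.18 onwards
result = series.rolling(window=60, min_periods=1, center=True).mean()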
I can't tell exactly what your desired results are, but maybe something like this is what you want? (You can see the FutureWarning below; it appears because .resample() is now a deferred operation, as the message itself explains.)
>>> df
m_ta m_tax m_taxd m_tan m_tand
datum
1901-01-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02-01 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03-01 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04-01 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05-01 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06-01 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07-01 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08-01 20.7 25.9 1901-08-01 14.7 1901-08-29
>>> df.resample("1M").rolling(3,center=True,min_periods=1).mean()
/Users/john/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
if __name__ == '__main__':
m_ta m_tax m_tan
datum
1901-01-31 -3.400000 4.250000 -10.050000
1901-02-28 -0.333333 7.333333 -6.500000
1901-03-31 5.100000 11.733333 0.033333
1901-04-30 11.400000 18.066667 6.733333
1901-05-31 16.466667 21.833333 11.400000
1901-06-30 20.066667 24.900000 14.566667
1901-07-31 21.366667 26.033333 15.400000
1901-08-31 21.550000 26.650000 15.800000
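As the FutureWarning suggests, with the deferred resample API you would normally aggregate first and then roll. Continuing with the df shown above, a sketch of that order of operations, which should give the same table without the warning:
# resample to month end and aggregate, then apply the rolling mean
monthly = df.resample('1M').mean()
rolled = monthly.rolling(window=3, center=True, min_periods=1).mean()
print(rolled)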
I am trying to scrape time series data into a pandas DataFrame with Python 2.7 from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm). Could somebody please help me write the code? Thanks!
I tried the following code:
html =urllib.urlopen("http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm");
text= html.read();
df=pd.DataFrame(index=datum, columns=['m_ta','m_tax','m_taxd', 'm_tan','m_tand'])
But it doesn't produce anything. I want to display the table as it appears on the page.
You can use BeautifulSoup to parse all the font tags, then split column a, set_index on column idx, and rename_axis(None) to remove the index name:
import pandas as pd
import urllib
from bs4 import BeautifulSoup
html = urllib.urlopen("http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm");
soup = BeautifulSoup(html)
#print soup
fontTags = soup.findAll('font')
#print fontTags
#get the text from the font tags
li = [x.text for x in fontTags]
#drop the first 13 tags, which come before the data rows
df = pd.DataFrame(li[13:], columns=['a'])
#split each row on arbitrary whitespace
df = df.a.str.split(r'\s+', expand=True)
#set the column names
df.columns = ['idx','m_ta','m_tax','m_taxd', 'm_tan','m_tand']
#convert column idx to period
df['idx'] = pd.to_datetime(df['idx']).dt.to_period('M')
#convert columns to datetime
df['m_taxd'] = pd.to_datetime(df['m_taxd'])
df['m_tand'] = pd.to_datetime(df['m_tand'])
#set column idx to index, remove index name
df = df.set_index('idx').rename_axis(None)
print df
m_ta m_tax m_taxd m_tan m_tand
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
1902-01 3.4 7.5 1902-01-25 -2.2 1902-01-15
1902-02 2.8 6.6 1902-02-09 -2.8 1902-02-06
1902-03 5.3 13.3 1902-03-22 -3.5 1902-03-13
1902-04 10.5 15.8 1902-04-21 6.1 1902-04-08
1902-05 12.5 20.6 1902-05-31 8.5 1902-05-10
1902-06 18.5 23.8 1902-06-30 14.4 1902-06-19
1902-07 20.2 25.2 1902-07-01 15.5 1902-07-03
1902-08 21.1 25.4 1902-08-07 14.7 1902-08-13
1902-09 16.1 23.8 1902-09-05 9.5 1902-09-24
1902-10 10.8 15.4 1902-10-12 4.9 1902-10-25
1902-11 2.4 9.1 1902-11-01 -4.2 1902-11-18
1902-12 -3.1 7.2 1902-12-27 -17.6 1902-12-15
1903-01 -0.5 8.3 1903-01-11 -11.5 1903-01-23
1903-02 4.6 13.4 1903-02-23 -2.7 1903-02-17
1903-03 9.0 16.1 1903-03-28 4.9 1903-03-09
1903-04 9.0 16.5 1903-04-29 2.6 1903-04-19
1903-05 16.4 21.2 1903-05-03 11.3 1903-05-19
1903-06 19.0 23.1 1903-06-03 15.6 1903-06-07
... ... ... ... ... ...
1998-07 22.5 30.7 1998-07-23 15.0 1998-07-09
1998-08 22.3 30.5 1998-08-03 14.8 1998-08-29
1998-09 16.0 21.0 1998-09-12 10.4 1998-09-14
1998-10 11.9 17.2 1998-10-07 8.2 1998-10-27
1998-11 3.8 8.4 1998-11-05 -1.6 1998-11-21
1998-12 -1.6 6.2 1998-12-14 -8.2 1998-12-26
1999-01 0.6 4.7 1999-01-15 -4.8 1999-01-31
1999-02 1.5 6.9 1999-02-05 -4.8 1999-02-01
1999-03 8.2 15.5 1999-03-31 3.0 1999-03-16
1999-04 13.1 17.1 1999-04-16 6.1 1999-04-18
1999-05 17.2 25.2 1999-05-31 11.1 1999-05-06
1999-06 19.8 24.4 1999-06-07 12.2 1999-06-22
1999-07 22.3 28.0 1999-07-06 16.3 1999-07-23
1999-08 20.6 26.7 1999-08-09 17.3 1999-08-23
1999-09 19.3 22.9 1999-09-26 15.0 1999-09-02
1999-10 11.5 19.0 1999-10-03 5.7 1999-10-18
1999-11 3.9 12.6 1999-11-04 -2.2 1999-11-21
1999-12 1.3 6.4 1999-12-13 -8.1 1999-12-25
2000-01 -0.7 8.7 2000-01-31 -6.6 2000-01-25
2000-02 4.5 10.2 2000-02-01 -0.1 2000-02-23
2000-03 6.7 11.6 2000-03-09 0.6 2000-03-17
2000-04 14.8 22.1 2000-04-21 5.8 2000-04-09
2000-05 18.7 23.9 2000-05-27 12.3 2000-05-22
2000-06 21.9 29.3 2000-06-14 15.4 2000-06-17
2000-07 20.3 26.6 2000-07-03 14.0 2000-07-16
2000-08 23.8 29.7 2000-08-20 18.5 2000-08-31
2000-09 16.1 21.5 2000-09-14 12.7 2000-09-24
2000-10 14.1 18.7 2000-10-04 8.0 2000-10-23
2000-11 9.0 14.9 2000-11-15 3.7 2000-11-30
2000-12 3.0 9.4 2000-12-14 -6.8 2000-12-24
[1200 rows x 5 columns]
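One caveat worth adding (not part of the original answer): the values parsed out of the HTML are still strings at this point, so before doing any arithmetic on them, such as the rolling means or percentages in the related questions, the temperature columns need converting, for example:
# convert the temperature columns from text to floats
for col in ['m_ta', 'm_tax', 'm_tan']:
    df[col] = df[col].astype(float)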
I am trying to select the following data using pandas for Python 2.7 from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm), restricted to the years 1991 through 2000. Could somebody please help me write the code? Thanks!
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
1902-01 3.4 7.5 1902-01-25 -2.2 1902-01-15
1902-02 2.8 6.6 1902-02-09 -2.8 1902-02-06
1902-03 5.3 13.3 1902-03-22 -3.5 1902-03-13
1902-04 10.5 15.8 1902-04-21 6.1 1902-04-08
1902-05 12.5 20.6 1902-05-31 8.5 1902-05-10
1902-06 18.5 23.8 1902-06-30 14.4 1902-06-19
....
You can use the .dt.year accessor with boolean indexing to select data by the datum column:
#convert column datum to period
df['datum'] = pd.to_datetime(df['datum']).dt.to_period('M')
#convert columns to datetime
df['m_taxd'] = pd.to_datetime(df['m_taxd'])
df['m_tand'] = pd.to_datetime(df['m_tand'])
print df.datum.dt.year
0 1901
1 1901
2 1901
3 1901
4 1901
5 1901
6 1901
7 1901
8 1901
9 1901
10 1901
11 1901
12 1902
13 1902
14 1902
15 1902
16 1902
17 1902
Name: datum, dtype: int64
#change 1901 to 2000
print df[df.datum.dt.year <= 1901]
datum m_ta m_tax m_taxd m_tan m_tand
0 1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1 1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
2 1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
3 1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
4 1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
5 1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
6 1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
7 1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
8 1901-09 15.9 19.9 1901-09-01 11.8 1901-09-09
9 1901-10 12.6 17.9 1901-10-04 8.3 1901-10-31
10 1901-11 4.7 11.1 1901-11-14 -0.2 1901-11-26
11 1901-12 4.2 8.4 1901-12-22 -1.4 1901-12-07
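To get the 1991 to 2000 range actually asked for, the same boolean indexing just needs both bounds:
#keep only rows whose year falls between 1991 and 2000 (inclusive)
mask = (df.datum.dt.year >= 1991) & (df.datum.dt.year <= 2000)
print df[mask]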
I want to write Python 2.7 code to compute the percentage of m_tax and m_tan from the web page (http://owww.met.hu/eghajlat/eghajlati_adatsorok/bp/Navig/202_EN.htm). I already have the DataFrame code, but I couldn't work out the percentage code. Could somebody please help me write it? Thanks!
datum m_ta m_tax m_taxd m_tan m_tand
------- ----- ----- ---------- ----- ----------
1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10
1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15
1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01
1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23
1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05
1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17
1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04
1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29
You can call div with the column sum to add the % columns:
In [66]:
df['m_tax%'],df['m_tan%'] = df['m_tax'].div(df['m_tax'].sum()) * 100, df['m_tan'].div(df['m_tax'].sum()) * 100
df
Out[66]:
datum m_ta m_tax m_taxd m_tan m_tand m_tax% m_tan%
0 1901-01 -4.7 5.0 1901-01-23 -12.2 1901-01-10 3.551136 -8.664773
1 1901-02 -2.1 3.5 1901-02-06 -7.9 1901-02-15 2.485795 -5.610795
2 1901-03 5.8 13.5 1901-03-20 0.6 1901-03-01 9.588068 0.426136
3 1901-04 11.6 18.2 1901-04-10 7.4 1901-04-23 12.926136 5.255682
4 1901-05 16.8 22.5 1901-05-31 12.2 1901-05-05 15.980114 8.664773
5 1901-06 21.0 24.8 1901-06-03 14.6 1901-06-17 17.613636 10.369318
6 1901-07 22.4 27.4 1901-07-30 16.9 1901-07-04 19.460227 12.002841
7 1901-08 20.7 25.9 1901-08-01 14.7 1901-08-29 18.394886 10.440341
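Note that the snippet above divides both columns by df['m_tax'].sum(), which is what the printed m_tan% values reflect. If the intent is instead to express each column as a percentage of its own total, a small variation would be:
# normalize each column by its own sum rather than by the m_tax total
df['m_tax%'] = df['m_tax'].div(df['m_tax'].sum()) * 100
df['m_tan%'] = df['m_tan'].div(df['m_tan'].sum()) * 100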