How to get weighted percentile of each observation in SAS

I have a dataset like this:
data providers;
input prv_id mbr_cnt value;
datalines;
1100 25860 3.9025
4700 71855 8.8566
5500 72147 6.9918
6400 25144 4.5200
7000 58114 9.3391
7900 67222 7.5189
8300 54039 8.9301
8800 2204 3.2221
9400 71600 9.9682
10000 68807 7.6581
10200 16322 8.6505
10700 115118 12.4198
11100 148235 18.2053
11700 56441 8.6987
12100 58556 7.6724
12500 81865 10.1048
12900 18106 3.7881
13400 98701 12.9679
13900 10347 3.7001
14400 45516 6.3924
;
run;
I need to calculate the percentile of each observation, weighted by mbr_cnt. Is there a way to do this in SAS? I tried proc rank data=providers groups=100 out=providers_percentile; but that only gives me the unweighted percentile.

PROC FREQ has a WEIGHT statement and can calculate weighted cumulative percentages.
proc freq data=providers;
ods output list=freqout;
weight mbr_cnt;
tables value * prv_id / list missing;
run;
Not sure if this is exactly what you need.
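If you want the weighted percentile as a new variable on each row, the same idea can be sketched by hand: sort by value and divide the running sum of mbr_cnt by the total. This is only a sketch. The names providers_sorted, providers_wpct, cum_mbr, wtd_pctile and the macro variable total_mbr are made up for illustration, and tied values are not given a shared percentile.

```sas
/* Total weight across all providers, stored in a macro variable */
proc sql noprint;
  select sum(mbr_cnt) into :total_mbr trimmed
  from providers;
quit;

/* Order observations from lowest to highest value */
proc sort data=providers out=providers_sorted;
  by value;
run;

data providers_wpct;
  set providers_sorted;
  cum_mbr + mbr_cnt;                        /* sum statement: running total, retained across rows */
  wtd_pctile = 100 * cum_mbr / &total_mbr;  /* share of members at or below this value */
run;
```

The last row always ends up at 100, which is a quick sanity check on the result.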

Related

Use Proc Summary Statistic in Another Data Step

I have a dataset that contains a list of employees and the number of hours they work.
Name Hours_CD_Max
Bob 455
Dan 675
Jane 543
Suzzy 575
Emily 234
I used a PROC SUMMARY step to calculate the total number of hours worked by these employees.
Proc summary data = staff;
where position = 'PA_FT_UMC';
var Hours_CD_Max;
output out=PAFT_Only_Staff_Totals
sum = Hours_CD_Max_tot;
run;
I would like to take the 'Hours_CD_Max_tot' statistic (2482) from this PROC SUMMARY step and use it elsewhere in my code where I need to make calculations.
For example, I want to take each provider's current hours and divide them by Hours_CD_Max_tot.
So, for example, I want to create a dataset that looks like this:
data PA_FT_UMC_StaffingV2;
set PA_FT_UMC_StaffingV1 PAFT_Only_Staff_Totals;
if position = 'PA_FT_UMC';
PA_FT_Max_Percent= Hours_CD_Max/Hours_CD_Max_tot;
run;
Name Hours_CD_Max Hours_CD_Max_tot
Bob 455 2482
Dan 675 2482
Jane 543 2482
Suzzy 575 2482
Emily 234 2482
I realize I can calculate the % of hours worked using PROC FREQ, but I would really like to have that number (2482) freely available so I can plug it into more complex equations.
If you want to use a variable from another dataset then pull it into your current code.
If there are BY variables, you can merge it onto your other dataset. In this case, where you have just one observation, run a second SET statement on the first iteration of the data step only.
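data PA_FT_UMC_StaffingV2;
/* read the staffing rows only; the totals come from the second SET */
set PA_FT_UMC_StaffingV1;
/* load the one-row total once; variables read by SET are retained */
if _n_=1 then set PAFT_Only_Staff_Totals(keep=Hours_CD_Max_tot);
if position = 'PA_FT_UMC';
PA_FT_Max_Percent= Hours_CD_Max/Hours_CD_Max_tot;
run;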
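For the BY-variable case, a minimal sketch could look like the following. It assumes you want one total per position; the dataset names position_totals, staffing_sorted and staffing_pct are hypothetical, and it assumes PA_FT_UMC_StaffingV1 does not already contain Hours_CD_Max_tot.

```sas
/* One total per position; NWAY keeps only the per-position rows */
proc summary data=staff nway;
  class position;
  var Hours_CD_Max;
  output out=position_totals(drop=_type_ _freq_) sum=Hours_CD_Max_tot;
run;

/* MERGE with BY needs both inputs ordered by position */
proc sort data=PA_FT_UMC_StaffingV1 out=staffing_sorted;
  by position;
run;

data staffing_pct;
  merge staffing_sorted(in=in_staff) position_totals;
  by position;
  if in_staff;   /* keep only rows that exist in the staffing data */
  PA_FT_Max_Percent = Hours_CD_Max / Hours_CD_Max_tot;
run;
```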
Alternatively, you can store the total in a macro variable, and PROC SQL is a quick way to do that.
proc sql noprint;
select sum(hours_cd_max) into :hours_max
from have;
quit;
Then you can use it in your calculations later on by referencing &hours_max.
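Put together with the dataset names from the question, this might look roughly like the sketch below. The WHERE clause and the TRIMMED option are additions not in the snippet above; the filter mirrors the PROC SUMMARY step so the total matches 2482.

```sas
/* Store the filtered total in a macro variable */
proc sql noprint;
  select sum(Hours_CD_Max) into :hours_max trimmed
  from staff
  where position = 'PA_FT_UMC';
quit;

/* The macro variable resolves to a numeric literal before the step runs */
data PA_FT_UMC_StaffingV2;
  set PA_FT_UMC_StaffingV1;
  if position = 'PA_FT_UMC';
  PA_FT_Max_Percent = Hours_CD_Max / &hours_max;
run;
```

Because the value is substituted as text, it can be reused in any later step, including WHERE clauses and titles.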

Cluster Bar Chart

I have data that look like the following:
data have;
format date date9.;
input date:mmddyy10. Intervention _24hrPtVolumeESI_1_5;
datalines;
9/17/2018 0 204
9/24/2018 0 139
10/17/2018 0 527
10/23/2018 1 430
11/01/2018 1 231
;
run;
I would like to create a bar chart where the x axis contains ranges of median wait time (e.g. 100-125, 126-150, etc.) and the times are compared by intervention (0 or 1). Thus, each range would have two bars: one for pre-intervention (0) and one for post-intervention (1). The y axis would simply show the count of how many median scores fell within each x axis range.
I've tried toying around with sgplot code, but that produces sloppy results.
proc sgplot data=WORK.FelaCombo;
vbar _24hrPtVolumeESI_1_5 / response=_24hrPtVolumeESI_1_5 stat=sum
group=intervention nostatlabel
groupdisplay=cluster;
xaxis display=(nolabel);
yaxis grid;
run;
Try using a histogram instead. vbar is more for discrete categories, whereas histogram will automatically create bins.
proc sgplot data=WORK.have;
histogram _24hrPtVolumeESI_1_5 /
scale=count
binstart=100
binwidth=25
group=intervention
transparency=0.5
showbins
;
xaxis display=(nolabel);
yaxis grid;
run;
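If you specifically want two side-by-side bars per range rather than overlaid histograms, one option is to bin the values yourself and keep vbar. A rough sketch, where have_binned and bin_start are made-up names and the 25-wide bins are an assumption taken from the ranges in the question:

```sas
data have_binned;
  set have;
  /* lower edge of a 25-wide bin, e.g. 204 falls in the bin starting at 200 */
  bin_start = floor(_24hrPtVolumeESI_1_5 / 25) * 25;
run;

proc sgplot data=have_binned;
  /* no RESPONSE= given, so each bar shows the number of rows in the bin */
  vbar bin_start / group=intervention groupdisplay=cluster;
  xaxis label="Bin start";
  yaxis grid label="Count";
run;
```

Empty bins are simply absent from the axis with this approach, since vbar treats bin_start as a discrete category.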

Converting daily data to weekly data in SAS

I have the DAILY returns of industry portfolios in SAS.
I would like to calculate the WEEKLY returns.
The daily returns are in percentages, so I think the weekly return should just be the sum of the daily returns during each week.
One obvious problem I am facing is that weeks can contain different numbers of days.
The table I have in SAS is in the following format:
INDUSTRY_NUMBER DATE DAILY_RETURN
Any help would be greatly appreciated.
I have tried this:
proc expand data=Day_result
out=Week_result from=day to=week;
Industry_Number Trading_Date;
convert Value_weighted_return / method=aggregate observed=total;
run;
The daily data is in Day_Result. When I remove the line Industry_Number Trading_Date; i.e.
proc expand data=Day_result
out=Week_result from=day to=week;
convert Value_weighted_return / method=aggregate observed=total;
run;
This runs and aggregates the way I want, but it does so over the whole table rather than for each category.
So if I have 40 categories, I want the weekly returns for each category separately.
The second set of code only gives me one weekly return across all categories.
EXAMPLE DATA:
data have;
format trading_date date9.;
infile datalines dlm=',';
input trading_date:ddmmyy10. industry_number value_weighted_return;
datalines;
19/01/2000,1, -0.008
20/01/2000,1, 0.008
23/01/2000,1, 0.008
24/01/2000,1, -0.007
25/01/2000,1, -0.009
26/01/2000,1, 0.008
27/01/2000,1, -0.008
30/01/2000,1, 0.003
31/01/2000,1, -0.001
01/02/2000,1, 0.004
02/02/2000,2, -0.008
03/02/2000,2, -0.005
06/02/2000,2, -0.004
07/02/2000,2, -0.009
08/02/2000,2, 0.002
09/02/2000,2, 0.006
10/02/2000,2, 0.008
13/02/2000,2, 0.008
14/02/2000,2, 0.002
15/02/2000,2, 0.01
16/02/2000,2, -0.008
;
run;
Sort your data by INDUSTRY_NUMBER and Trading_Date, use INDUSTRY_NUMBER as a BY group, and identify Trading_Date as your time variable with an ID statement.
proc sort data=have;
by industry_number trading_date;
run;
Next, convert your data into a time series to remove any time gaps. Set any missing days to the previous value, since the value does not change on non-trading days (e.g. weekends, bank holidays, etc.).
proc timeseries data=have
out=have_ts;
by industry_number;
id trading_date interval=day
setmissing=previous
accumulate=average
;
var value_weighted_return;
run;
Finally, take the time-series output and convert it from daily to weekly. Since you are using weighted returns, you may want to use observed=average rather than observed=total.
proc expand data=have_ts
out=have_ts_week
from=day
to=week
;
by industry_number;
id trading_date;
convert Value_weighted_return / method=aggregate observed=average;
run;
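If you just want the plain sum of daily returns per week, as described in the question, a sketch without PROC EXPAND is also possible: derive the week each date belongs to and summarise by industry and week. The names have_wk, week_start, week_sum, weekly_return and trading_days are hypothetical, and weeks are assumed to start on Sunday, which is the default for the WEEK interval.

```sas
data have_wk;
  set have;
  /* first day (Sunday) of the week that contains trading_date */
  week_start = intnx('week', trading_date, 0, 'b');
  format week_start date9.;
run;

proc means data=have_wk noprint nway;
  class industry_number week_start;
  var value_weighted_return;
  /* sum of daily returns, plus how many days contributed to each week */
  output out=week_sum(drop=_type_ _freq_)
         sum=weekly_return n=trading_days;
run;
```

The trading_days column makes weeks with fewer days visible, so short weeks can be handled separately if needed.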

SAS: Calculating rolling skew of previous 30 days

I want to calculate the skew of a timeseries (stock returns) of the previous 30 days on a rolling basis (thus, getting daily values).
Dataset looks like:
Stock date month year return
1SF7 1/07/2016 7 2016 0.94
1SF7 5/07/2016 7 2016 0.91
1SF7 6/07/2016 7 2016 0.82
1SF7 7/07/2016 7 2016 0.95
..........
Currently, I tried proc means and just calculate month-end skewness
proc means data=have; by year month;
output out= want (drop= _freq_ _type_ ) skew(return)=Skew_monthly;
run;
Does anyone have an idea for rolling skewness? I know there is a question here that asks for rolling skewness, but the answer to that only outputs one value per 30 days, and I want daily values.
Thankful for any input!
Marc
Thanks, I managed it with the array version:
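data want;
array p{0:29} _temporary_; /* circular buffer holding the last 30 returns */
set have;
by symbol; /* symbol is assumed to be the stock identifier; data sorted by it */
/* clear the buffer whenever a new stock starts */
if first.symbol then call missing(of p{*});
p{mod(_n_,30)} = return;
skew = skewness(of p{*});
run;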
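One possible refinement, sketched below under the same assumptions (data sorted by symbol and date, with symbol as the stock identifier): only report a skew once the buffer holds a full 30 observations, so the first rows of each stock are not based on a short window. The dataset name want_full is made up, and "30 days" here means 30 observations, not 30 calendar days.

```sas
data want_full;
  array p{0:29} _temporary_;   /* temporary arrays are retained across rows */
  set have;
  by symbol;
  if first.symbol then call missing(of p{*});
  p{mod(_n_, 30)} = return;
  /* N() counts non-missing values; skew stays missing until the window is full */
  if n(of p{*}) = 30 then skew = skewness(of p{*});
run;
```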

Calculating Percentile by Date

I have the following dataset:
Date Primary_Occupation Jobs
1/1/2005 Math 23
1/1/2005 Science 7
1/1/2005 Food 10
1/1/2006 Math 10
1/1/2006 Sales 64
1/1/2006 Transportation 21
All the way until 11/1/2015
I am trying to tabulate the percentage of jobs by Primary_Occupation and over time.
I saw that proc univariate has a bunch of percentile options, but none of them seems to be the solution for what I am looking to do.
Here's a template for you to get started. It creates a table with frequencies and percentages. In this example, the output table "summary" contains summary stats for this class of students by sex and age.
proc freq data=sashelp.class;
table sex*age / out=summary;
run;
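Applied to the data in the question, the template might be adapted roughly as follows. This is a sketch only: the dataset names jobs_by_occupation, jobs_sorted and job_pct are hypothetical, and it assumes Jobs is a count that should act as the weight.

```sas
/* BY-group processing needs the data ordered by date */
proc sort data=jobs_by_occupation out=jobs_sorted;
  by date;
run;

proc freq data=jobs_sorted noprint;
  by date;                                /* percentages are computed within each date */
  weight jobs;                            /* each row counts as Jobs observations */
  tables primary_occupation / out=job_pct;
run;
```

The output dataset job_pct then holds, for each date and occupation, the weighted COUNT and its PERCENT of that date's total.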