Grouping observation and forming new variable [duplicate] - sas

In SAS, I have a data set similar to the one below.
ID TRACT meanFA sdFA medianFA
1 t01 0.56 0.14 0.56
1 t02 0.53 0.07 0.52
1 t03 0.71 0.08 0.71
2 t01 0.72 0.09 0.72
2 t02 0.83 0.10 0.86
2 t03 0.59 0.10 0.62
I am not sure if transpose is the right concept here... but I would want the data to look like the one below.
ID t01_meanFA t01_sdFA t01_medianFA t02_meanFA t02_sdFA t02_medianFA t03_meanFA t03_sdFA t03_medianFA
1 0.56 0.14 0.56 0.53 0.07 0.52 0.71 0.08 0.71
2 0.72 0.09 0.72 0.83 0.10 0.86 0.59 0.10 0.62
proc transpose data=TRACT out=newTRACT;
var meanFA sdFA medianFA;
by id;
id tract meanFA sdFA medianFA;
run;
I have been playing around with the SAS code above, but with no success. Any ideas or suggestions would be great!

Double transpose is how you get to that. Get it to a dataset that has one row per desired variable per ID, so
ID=1 variable=t01_meanFA value=0.56
ID=1 variable=t01_sdFA value=0.14
...
ID=2 variable=t01_meanFA value=0.72
...
Then transpose using ID=variable and var=value (or whatever you choose to name those columns). You create the intermediate dataset by creating an array of your values (array vars[3] meanFA sdFA medianFA;) and then iterating over that array, setting variable name to catx('_',tract,vname(vars[n])); (vname gets the variable name of the array element).
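The array approach described above could be sketched as follows. This assumes the input data set is called tract (as in the question) and is already sorted by id; the names tract_long, tract_wide, variable, value and n are placeholders chosen for illustration.

```sas
/* Step 1: one row per ID and per desired output variable */
data tract_long;
  set tract;
  array vars[3] meanFA sdFA medianFA;
  length variable $32;
  do n = 1 to dim(vars);
    /* e.g. t01_meanFA: tract value + name of the array element */
    variable = catx('_', tract, vname(vars[n]));
    value = vars[n];
    output;
  end;
  keep id variable value;
run;

/* Step 2: transpose back to one row per ID */
proc transpose data=tract_long out=tract_wide(drop=_name_);
  by id;
  id variable;   /* values of VARIABLE become the new column names */
  var value;
run;
```

The length statement is there so that the combined name is not truncated to the length of whatever is assigned first.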

You need 2 transposes. Transpose, use a data step to update the _NAME_ variable, and then transpose again.
proc transpose data=tract out=tract2;
by id tract;
run;
data tract2;
format _name_ $32.;
set tract2;
_name_ = strip(tract) || "_" || strip(_name_);
run;
proc transpose data=tract2 out=tract3(drop=_name_);
by id;
/*With no ID statement, the _NAME_ variable is used*/
var col1;
run;

Using example data from this duplicate question.
You can also just do this with a data step.
First, put the maximum sequence number into a macro variable.
proc sql;
select
max(sequence_no) into : maxseq
from
have
;
quit;
Create arrays for your new variables, setting the dimensions with the macro variable. Then loop over each visit, putting the events and notes into their respective variables. Output 1 line per visit.
data want(drop=sequence_no--notes);
do until (last.visit_no);
set have;
by id visit_no;
array event_ (&maxseq);
array notes_ (&maxseq) $;
event_(sequence_no)=event_code;
notes_(sequence_no)=notes;
end;
output;
run;

Related

Converting daily data to weekly data in SAS

I have the DAILY returns of industry portfolios in SAS.
I would like to calculate the WEEKLY returns.
The daily returns are in percentage so I think that should just be the sum of returns during each week.
The obvious problem I am facing is that weeks can contain different numbers of days.
The table I have in SAS is in the following format:
INDUSTRY_NUMBER DATE DAILY_RETURN
Any help would be greatly appreciated.
I have tried this:
proc expand data=Day_result
out=Week_result from=day to=week;
Industry_Number Trading_Date;
convert Value_weighted_return / method=aggregate observed=total;
run;
The daily data is in Day_result. When I remove the Industry_Number Trading_Date line, i.e.
proc expand data=Day_result
out=Week_result from=day to=week;
convert Value_weighted_return / method=aggregate observed=total;
run;
This works, in that it does what I want, but it does it for the whole table rather than for each category.
So if I have 40 categories, I want the weekly returns for each category.
The second set of code gives one weekly return across all categories combined.
EXAMPLE DATA:
data have;
format trading_date date9.;
infile datalines dlm=',';
input trading_date:ddmmyy10. industry_number value_weighted_return;
datalines;
19/01/2000,1, -0.008
20/01/2000,1, 0.008
23/01/2000,1, 0.008
24/01/2000,1, -0.007
25/01/2000,1, -0.009
26/01/2000,1, 0.008
27/01/2000,1, -0.008
30/01/2000,1, 0.003
31/01/2000,1, -0.001
01/02/2000,1, 0.004
02/02/2000,2, -0.008
03/02/2000,2, -0.005
06/02/2000,2, -0.004
07/02/2000,2, -0.009
08/02/2000,2, 0.002
09/02/2000,2, 0.006
10/02/2000,2, 0.008
13/02/2000,2, 0.008
14/02/2000,2, 0.002
15/02/2000,2, 0.01
16/02/2000,2, -0.008
;
run;
Sort your data by INDUSTRY_NUMBER Trading_Date, use INDUSTRY_NUMBER as a by-group, identify your time variable.
proc sort data=have;
by industry_number trading_date;
run;
Next, convert your data into a time series to remove any time gaps. Set any missing days to the previous value, since the value does not change on non-trading days (e.g. weekends, bank holidays, etc.).
proc timeseries data=have
out=have_ts;
by industry_number;
id trading_date interval=day
setmissing=previous
accumulate=average
;
var value_weighted_return;
run;
Finally, take the time-series output and convert it from day to week. Since you are using weights, you may want to use average rather than total.
proc expand data=have_ts
out=have_ts_week
from=day
to=week
;
by industry_number;
id trading_date;
convert Value_weighted_return / method=aggregate observed=average;
run;
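As a rough cross-check on the choice between total and average, the weekly figures can also be computed directly from the raw daily rows with PROC SQL. This is a different technique from the PROC TIMESERIES / PROC EXPAND route above: it does no gap filling, so its mean is taken over actual trading days only. The names week_check, week_start, weekly_sum and weekly_mean are made up for this sketch, and it assumes weeks start on Sunday, which is the default for the SAS WEEK interval.

```sas
proc sql;
  create table week_check as
  select industry_number,
         /* first day (Sunday) of the week containing trading_date */
         intnx('week', trading_date, 0, 'b') as week_start format=date9.,
         sum(value_weighted_return)  as weekly_sum,
         mean(value_weighted_return) as weekly_mean
  from have
  group by industry_number, calculated week_start
  order by industry_number, week_start;
quit;
```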

SAS: robust regression and output coefficients, t values and adj R squares

I am running robust regression by group in SAS.
My data is like
id stock date stock_liq market_liq
1 VOD 1/5/2016 0.03 0.02
1 VOD 2/5/2016 0.04 0.025
... ... ... ... ...
2 SAB 1/5/2016 0.31 0.02
2 SAB 1/5/2016 0.31 0.02
... ... ... ... ...
It's panel data, and each stock has a unique ID. I want to run a robust regression by ID and output the coefficients, t values and adjusted R-squared values.
My code is:
proc robustreg data=have outest= want noprint;
model stock_liq=market_liq ;
by id;
run;
However I don't think the code runs properly. SAS just stops running and the log gives me
"Error: Too many parameters in the model".
Can anyone advise? Thank you!
The syntax is a bit off. Also the requested outputs can be added:
proc robustreg data=have outest= want noprint;
by id;
model stock_liq=market_liq ;
output out=output_sas
p=stock_liq_pred /* predicted values; named so they do not clash with the response stock_liq */
r=stock_liqresid ;
run;
See the documentation for more on the output options.
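The OUTPUT statement above returns predicted values and residuals. For the coefficients and fit statistics per ID, one option is to capture the printed tables with ODS OUTPUT. This is a sketch under assumptions: the ODS table names ParameterEstimates and GoodFit are as I recall them from the ROBUSTREG documentation (run ods trace on; to confirm them on your installation), want_coef and want_fit are placeholder names, and have is assumed to be sorted by id. As far as I know, ROBUSTREG reports chi-square tests rather than t values and an R-square rather than an adjusted R-square, so those are what these tables would contain.

```sas
/* noprint is left off here because it would suppress the ODS tables */
ods output ParameterEstimates=want_coef GoodFit=want_fit;

proc robustreg data=have;
  by id;
  model stock_liq = market_liq;
run;
```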

How to assign the result from %macro to a macro variable

I have a data set with probabilities to purchase a particular product per observation. Here is an example:
DATA probabilities;
INPUT id P_prod1 P_prod2 P_prod3 ;
DATALINES;
1 0.02 0.5 0.32
2 0.6 0.08 0.12
3 0.8 0.34 0.001
;
I need to calculate the median for each product. Here's how I do that:
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type_ _freq_) median=median;
run;
%mend;
At this point I can get the median for each product by calling
%get_median(P_prod1);
Now, the last thing that I want to do is to assign the numeric result for the median to a macro variable. My best guess for how to do that would be something like:
%let med_P_prod1=%get_median(P_prod1);
but unfortunately that does not work.
Can someone help, please?
Cheers!
The simplest solution is to declare a %global macro variable and assign the numeric result to it inside the macro.
%macro get_median (product);
proc means data=probabilities median;
var &product ;
output out=median_data (drop=_type_ _freq_) median=median;
run;
%global macroresult;
proc sql;
select median into :macroresult separated by ' ' from median_data;
quit;
%mend;
(That SQL statement is equivalent to LET in that it defines a macro variable, but it is better at getting results from data.)
I'd also recommend just using the dataset in your code rather than putting the value in a macro variable.
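A short usage sketch, assuming the macro is defined as in the answer above: %let med_P_prod1=%get_median(P_prod1); cannot work because the macro generates PROC steps rather than returning a value, so the macro is called first and the global variable is copied afterwards. The name med_P_prod1 comes from the question; macroresult is the global variable set inside the macro.

```sas
/* Run the macro; it stores the median in the global variable macroresult */
%get_median(P_prod1);

/* Copy the result into a product-specific macro variable */
%let med_P_prod1 = &macroresult;

/* Write the value to the log to check it */
%put NOTE: median of P_prod1 is &med_P_prod1;
```

Because macroresult is overwritten on every call, copy it before calling the macro for the next product.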

sas gchart hbar bars overlap with each other

OK. Finally, I get the chance to address this problem properly. I came across this problem on SAS EG.
First, I have the following dataset:
data test;
infile datalines;
input var1 var2;
datalines;
0.01 200
0.02 200
0.03 200
0.04 200
0.05 200
0.06 200
0.07 200
0.08 200
0.09 200
0.10 200
0.11 200
0.12 200
0.13 200
0.14 200
0.15 200
11111111111111111111111111 200
;
run;
When I try to plot var1 (x-axis) against var2 (y-axis) in a gchart hbar, it works fine:
PROC GCHART DATA=test;
HBAR var1 /
SUMVAR=var2 missing discrete clipref frame;
run;quit;
The chart is
But when I specify goptions reset=all device=gif; the chart becomes:
Clearly, there is an extreme value and all the other bars overlap with each other. Notice that even though I put the discrete option in my hbar statement, it does not seem to work once I add the goptions statement.
Obviously, the purpose here is to just put var1 evenly on x-axis, rather than putting them according to their numeric values. So the first chart is what I want. But I need the goptions in order to output the chart to a gif file.
Has anyone had a similar experience, and what would be the solution? Many thanks.
The easiest solution is to change the type of the HBAR variable (age, which is var1 in the example data) from numeric to character. SAS does not space character values relative to their values, as it tries to with numeric values.
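A minimal sketch of that conversion, assuming the example data set test with var1 and var2 from the question. The names test_char and var1_char are made up for illustration, and best12. is just one reasonable format for turning the numbers into text.

```sas
goptions reset=all device=gif;

/* Create a character copy of the chart variable */
data test_char;
  set test;
  var1_char = strip(put(var1, best12.));
run;

/* Each distinct character value gets its own evenly spaced bar */
proc gchart data=test_char;
  hbar var1_char / sumvar=var2;
run; quit;
```

Note that character values are ordered alphabetically, not numerically, so values with different numbers of digits may appear in an unexpected order.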