How to use proc transpose on variables that contain numbers separated with _? - sas

Hi I am new to sas I have a question regarding proc transpose
I have this data
Input
School Name State School Code 26/07/2009 02/08/2009 09/08/2009 16/08/2009
Northwest High IL 14556 06 06 06 06
Georgia High GA 147 05 05 05 06
Macy Hgh TX 45456 NA NA NA NA
The desired output is
School Name State School Code Date Absent
Northwest High IL 14566 26/07/2009 6
Northwest High IL 14556 02/08/2009 6
Northwest High IL 14556 09/08/2009 6
Northwest High IL 14556 16/08/2009 6
Georgia High GA 147 26/07/2009 5
Georgia High GA 147 02/08/2009 5
Georgia High GA 147 09/08/2009 5
Georgia High GA 147 16/08/2009 6
Macy Hgh TX 45456 26/07/2009 NA
Macy Hgh TX 45456 02/08/2009 NA
Macy Hgh TX 45456 09/08/2009 NA
Macy Hgh TX 45456 16/08/2009 NA
This is the code I have written
proc sort data=work.input;
by School_Name State School_Code;
run;
proc transpose data=work.input out=work.inputModified;
by by School_Name State School_Code;
run
I get this error saying that No variables to transpose I think the issue is since the variables are actual numbers like this _26_07_2009 sas does not recognize them,
But I don't get the desired output the dates are actual variables when imported into sas they become _26_07_2009. Note there are about 185 dates and they are actual variables.
Thanks

The following transpose does the job:
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
var _:;
run;
Notice the _: notation - it picks up all variables which start with an underscore and transposes them.
As I mentioned in the link in my comments earlier, if you do not explicitly specify the variables you want to tranpose- then proc transpose by default looks for numeric variables that are not in the by variable list to transpose. However, since your date variables are read-in as strings [due to the presence of NAs] it was saying NOTE: No variables to transpose.
You can use the following to convert the date and absent columns into numeric columns.
data inputModified2;
set inputModified;
format date date9.;
date = input(compress(tranwrd(_name_,'_','')), ddmmyy8.);
if col1 NE 'NA' then absent = input(col1, 8.);
else absent=.;
drop _name_ col1;
run;

Related

Rolling Window Model for Unbalanced Dataset in SAS

I have an unbalanced panel dataset of the following form (simplified):
data have;
input ID YEAR EARN LAG_EARN;
datalines;
1 1960 450 .
1 1961 310 450
1 1962 529 310
2 1978 10 .
2 1979 15 10
2 1980 8 15
2 1981 10 8
2 1982 15 10
2 1983 8 15
2 1984 10 8
3 1972 1000 .
3 1973 1599 1000
3 1974 1599 1599
;
run;​
I now want to estimate the following model for each ID:
proc reg;
by ID;
EARN = LAG_EARN;
run;
However, I want to do this for rolling windows of some size. Say for example for windows of size 2. The window should only contain non-empty observations. For example, in the case of firm A, the window is applicable from 1961 onwards and thus only one time (since only one year follows after 1961 and the window is supposed to be of size 2).
Finally, I want to get a table with year columns and firm rows. The table should indicate the following: The regression model (with window size 2) has been performed one time for firm A. The quantity of available years, has only allowed one estimation of this model. Put differently, in 1962 the coefficient of the regression model has a value of X based on the 2 year prior window. Applying the same logic to the other two firms, one can get the following table. "X" representing the respective estimated coefficient value in certain year for firm A/B/C based on the 2-year window and "n" indicating the non-existence of such a value:
data want;
input ID 1962 1974 1980 1981 1982 1983 1984;
datalines;
1 X n n n n n n
2 n n X X X X X
3 n X n n n n n
;
run;​
I do not know how to execute this. Furthermore, I would like to create a macro that allows me to estimate different rolling window models while still creating analogous output dataframes. I would appreciate any help with it, since I have been struggling quite some time now.
Try this macro. This will only output if there are non-missing values of lags that you specify.
%macro lag(data=, out=, window=);
data _want_;
set &data.;
by ID;
LAG_EARN = lag&window.(earn);
if(first.ID) then call missing(lag_earn);
if(NOT missing(lag_earn));
run;
proc sort data=_want_;
by year id;
run;
proc transpose data=_want_
out=&out.(drop=_NAME_);
by ID notsorted;
id year;
var lag_earn;
run;
proc sort data=&out.;
by id;
run;
%mend;
%lag(data=have, out=want, window=1);

Proc report- grouping

I have an easy table, and I need to create a complicated report. I tried to do it with proc report using lots of grouping but didn't give me right result. Here is my example table :
campus id year gender
West 35 2013 F
West 35 2014 F
West 35 2015 F
West 38 2014 M
West 38 2015 M
East 48 2014 -
East 48 2015 -
East 55 2013 F
East 55 2014 F
And this is the report I need to create:
west east
2014 2015 2014 2015
total 2 2 2 1
Gender 2 2 2 1
F 1 1 1 -
M 1 1 - -
none - - 1 1
So I have 4 different group: I worked on this code
proc tabulate data=a ;
class gender year ;
table gender, year*n*f=4. ;
by id;
run ;
Do you think I can do total first, then gender. And tehn I can append them?
This doesn't quite match your requested output, but I'm not sure having the total repeated makes sense either. Proc Tabulate works well here:
proc tabulate data=have;
class campus year gender/missing;
table (all='Total' gender='Gender'), campus=''*year=''*n='';
run;

Applying cutoff to data set with IDs

I am using SAS and managed to run proc logistic, which gives me a table like so.
Classification Table
Prob Correct Incorrect Percentages
Level Event Non- Event Non- Correct Sensi- Speci- FALSE FALSE
Event Event tivity ficity POS NEG J
0 33 0 328 0 9.1 100 0 90.9 . 99
0.02 33 62 266 0 26.3 100 18.9 89 0 117.9
0.04 31 162 166 2 53.5 93.9 49.4 84.3 1.2 142.3
0.06 26 209 119 7 65.1 78.8 63.7 82.1 3.2 141.5
How do I include IDs for the rows of data in lib.POST_201505_PRED below that have at least 0.6 probability?
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
model BUYER =
age
tenure
usage
payment
loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505 out=lib.POST_201505_PRED outroc=lib.POST_201505_ROC;
run;
I've been reading the documentation and searching online but haven't found anything on it. I must be searching for the wrong keywords, as I presume this is a frequently used process.
You just need an id-statement to tell SAS your ID-variable identifies your observations;
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
id ID;
model BUYER = age tenure usage payment loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505
out=lib.POST_201505_PRED
outroc=lib.POST_201505_ROC;
run;
Now your output contains all you need.
For instance to print the IDs that get had probability of at least 0.6 assigned of being a BUYER to them;
proc print data=lib.POST_201505_PRED (where=(P_1 GE 0.6));
var ID P_1;
run;
You find these id yourKey; statements throughout the statistical procedures in SAS, for instance ;
proc univariate data=psydata.stroop;
id Subject;
var ReadTime;
run;
** will report the most extreme values of ReadTime as
;
Turns out I just had to include the ids in lib.POST_201505

removing common prefix or suffix

I have a data set contains a series variables named; PG_86xt, AG_86xt,... with same suffix _86xt. How can I remove such suffix while renaming these variables?
I know how to add prefix or suffix. But the logic of removing them seems to be a little bit different. I think proc dataset modify is still the way to go. But the length of substring before suffix (or after prefix) is unknown.
The example on how to add prefix or suffix
data one;
input id name :$10. age score1 score2 score3;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
proc datasets library = work nolist;
modify one;
rename &suffixlist;
quit;
You can use the scan function to get the desired result.
By altering the example you have in the link to fit your example:
data one;
input id name :$10. age PG_86xt AG_86xt IG_86xt;
datalines;
1 George 10 85 90 89
2 Mary 11 99 98 91
3 John 12 100 100 100
4 Susan 11 78 89 100
;
run;
By filtering on only those column that fits your convention (XX_86xt), you could use the first part of the scan for renaming.
proc sql noprint;
select cats(name,'=',scan(name, 1, '_'))
into :suffixlist
separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = 'ONE' and '86xt' = scan(name, 2, '_');
quit;
You can use the index function to find the (first) place in each variable name where the suffix / prefix starts, then use that to construct appropriate parameters for substr. It's a bit more work than the code in your example, but you'll get there.

How to subset automatically in SAS?

I am new to SAS, so this might be a silly type of question.
Assume there are several datasets with similar structure but different column names. I want to get new datasets with the same number of rows but only a subset of columns.
In the following example, Data_A and Data_B are original datasets and SubA and SubBare what I want. What is the efficient way of deriving SubA and SubB?
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
DATA SubA;
set A_auto;
keep A_make A_price;
RUN;
DATA SubB;
set B_auto;
keep B_make B_price;
RUN;
Here's my new answer. This introduces quite a few concepts, but all are necessary to complete this task.
First of all I would store the required part variable names (the suffixes that are common to all datasets) in a new dataset. This keeps them all in one place and makes it easier to change if required.
The next step is to create a regular expression (regex) search string that combines all the names, separated by a pipe (|), which is the regex symbol for or. I've also added a $ symbol to end of the names, this ensures only variables ending with the part names will be selected.
select into :[macroname] is the method to create macro variables within proc sql
Then I set up a macro to extract the specific variable names for the current dataset and use those names to create a view (like my original answer)
The dictionary library referenced in the proc sql is a metadata library that contains information on all active libraries, tables, columns etc, so is a good source of identifying what the actual variable names are called (based on the regex search string created earlier).
You won't need the proc print in your code, I just put it in to show everything is working as expected.
Let me know if this works for you
/* create intial datasets */
DATA A_auto;
LENGTH A_make $ 20;
INPUT A_make $ 1-17 A_price A_mpg A_rep78 A_hdroom A_trunk A_weight A_length A_turn A_displ A_gratio A_foreign;
CARDS;
AMC Concord 4099 22 3 2.5 11 2930 186 40 121 3.58 0
AMC Pacer 4749 17 3 3.0 11 3350 173 40 258 2.53 0
Audi Fox 6295 23 3 2.5 11 2070 174 36 97 3.70 1
;
RUN;
DATA B_auto;
LENGTH B_make $ 20;
INPUT B_make $ 1-17 B_price B_mpg B_rep78 B_hdroom B_trunk B_weight B_length B_turn B_displ B_gratio B_foreign;
CARDS;
Toyota Celica 5899 18 5 2.5 14 2410 174 36 134 3.06 1
Toyota Corolla 3748 31 5 3.0 9 2200 165 35 97 3.21 1
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 1
;
RUN;
/* create dataset containing partial name of variables to keep */
data keepvars;
input part_name $ :20.;
datalines;
_make
_price
;
run;
/* create regular expression search string from partial names */
proc sql noprint;
select
cats(part_name,'$') /* '$' matches end of string */
into
:name_str separated by '|' /* '|' is an 'or' search operator in regular expressions */
from
keepvars;
quit;
%put &name_str.; /* print search string to log */
/* macro to create views from datasets */
%macro create_views (dsname, vwname); /* inputs are dataset name being read in and view name being created */
/* extract specific variable names to be kept, based on search string */
proc sql noprint;
select
name
into
:vars separated by ' '
from
dictionary.columns
where
libname = 'WORK'
and memname = upper("&dsname.")
and prxmatch("/&name_str./",strip(name))>0; /* prxmatch is regular expression search function */
quit;
%put &vars.; /* print variables to keep to log */
/* create views */
data &vwname. / view=&vwname.;
set &dsname. (keep=&vars.);
run;
/* test view by printing */
proc print data=&vwname.;;
run;
%mend create_views;
/* run macro for each dataset */
%create_views(A_auto, SubA);
%create_views(B_auto, SubB);