Stepwise selection method in (SAS 9.3) PROC REG - sas

I'm running a multiple linear regression model in SAS (v. 9.3) using the REG procedure with the SELECTION=STEPWISE option, as follows:
(1) Set the regressors list:
%let regressors = x1 x2 x3;
(2) Run the procedure:
ods output DWStatistic=DW ANOVA=F_Fisher parameterestimates=beta CollinDiag=Collinearita outputstatistics=residui fitstatistics=rsquare;
proc reg data=base_dati outest=reg_multivar edf;
model TD&eq. = &regressors. / selection=stepwise SLSTAY=&signif_amm_multivar_stay. SLENTRY=&signif_amm_multivar_entry. VIF COLLIN adjrsq DW R influence noint;
output out=diagnostic;
quit;
ods output close;
When I add one regressor, say x4, to the macro variable &regressors., the beta estimates change even though the selected variables are the same.
In practice, in both cases the stepwise method selects x1 and x2, but the beta parameters for x1 and x2 in the second case differ from those in the first case.
Could you provide an explanation for that?
It would be nice to have a reference for such explanation.
Thanks all in advance!

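I'm going to guess that you have missing data. PROC REG removes records row-wise (listwise deletion): an observation with a missing value in any variable named in the MODEL statement is dropped from the whole analysis, whether or not stepwise selection ends up keeping that variable. So if you add another candidate variable that happens to have a few missing values, those entire records are excluded, which means you're not actually fitting the two models on the same data.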
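One way to check this guess is to count missing values per candidate regressor and then refit on complete cases only. This is a minimal sketch, assuming the regressors are numeric and named x1 to x4 as in the question; complete_cases is a hypothetical data set name, and the response variable would need the same check.

```sas
/* Count non-missing (N) and missing (NMISS) values for each candidate regressor. */
proc means data=base_dati n nmiss;
  var x1 x2 x3 x4;
run;

/* Keep only rows that are complete on all four candidates.
   complete_cases is a hypothetical name, not from the question. */
data complete_cases;
  set base_dati;
  if nmiss(of x1-x4) = 0;
run;
```

If the stepwise model is run on complete_cases with x1 x2 x3 and again with x1 x2 x3 x4, both runs see the same rows, so identical selections should give identical estimates.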

Related

Predicting a value using certain x values in multiple Linear regression in sas

In a multiple linear regression problem I have to predict a value using x1=77, x2=20, x3=1998.
The code I currently have is:
PROC reg Data=GPA;
model y=x1 x2 x3/i;
Output out=new3 student=student2 p=predict2 r=resid2;
Run;
The code runs, but I'm not sure how to use the input values to predict another value.
Ryan,
An easy way to do this is to add a new row to your data set (test below stands for your data set, GPA in your code) using the following code:
proc sql;
insert into test (x1, x2, x3) values (77, 20, 1998);
quit;
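Then run the model again and you should get your predicted value for y. Because y is missing on the new row, the row is not used to fit the model, but the output data set still contains a predicted value for it.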
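Put together with the code from the question, the whole thing might look like the sketch below. It assumes the data set is GPA with variables y, x1, x2 and x3, and that no other row has a missing y.

```sas
/* Append a row with the new x values; y is left missing on this row. */
proc sql;
  insert into GPA (x1, x2, x3) values (77, 20, 1998);
quit;

/* Refit; the appended row is excluded from the fit but still gets a prediction. */
proc reg data=GPA;
  model y = x1 x2 x3;
  output out=new3 p=predict2;
run;
quit;

/* Show only the appended row (assumes it is the only row with missing y). */
proc print data=new3;
  where y = .;
  var x1 x2 x3 predict2;
run;
```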

how to include all the variables in a dataset in proc reg using a loop and/or macro

I have a number of CSV data sets, each with up to 5 variables named z1, z2, z3, z4 and z5. Any given data set can contain any combination of these variables. I need to run proc reg with all the variables in a particular data set as predictors. For example, if my first data set has z1 and z2, I need to run proc reg with z1 and z2 as the predictor variables. If the next data set has z2, z4 and z5, I need to run proc reg with z2, z4 and z5 as the predictor variables. The y variable is in a separate data set.
Now, my question is: how do I tell SAS to use the variables of the corresponding data set as the predictor variables for proc reg? I have a macro loop set up already to read each individual data set, but I don't know how to use the macro to supply the predictor variables to proc reg.
Any help will be greatly appreciated.
Thank you
If you do not have any other variables that start with Z then you should be able to use Z: to create a variable list.
proc reg data=have ;
model dependent = z: ;
run;
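Inside a macro loop, the same z: list can be reused for every data set. The sketch below is only an illustration: the macro name reg_all, the data set names work.ds1 to work.dsN and work.yvals, and the response name y are all hypothetical, and the merge assumes the y data set and each predictor data set have their rows in the same order.

```sas
/* Hypothetical macro: work.ds1..work.dsN hold the z variables, work.yvals holds y. */
%macro reg_all(n);
  %local i;
  %do i = 1 %to &n;
    /* One-to-one merge (no BY statement): assumes matching row order. */
    data work.both;
      merge work.yvals work.ds&i;
    run;

    /* z: expands to whichever z variables this data set contains. */
    proc reg data=work.both;
      model y = z: ;
    run;
    quit;
  %end;
%mend reg_all;

%reg_all(3)
```

If the data sets share an ID variable, merging with a BY statement on that ID would be safer than relying on row order.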

SAS output confidence interval to plot for strata variable after logistic regression

I want to put the confidence intervals for all levels of the strata variable in a single plot after logistic regression. For example, my SAS code is:
proc logistic data=data1;
model y = x;
strata cv1;
output out=out1 unknown1=x_beta1 unknown2=lowerbound unknown3=upperbound unknown4=strata_variable;
run;
I do not know which keywords (unknown1, unknown2, unknown3) I can use in the output statement. The SAS support page says: "If a STRATA statement is specified, only the PREDICTED=, DFBETAS=, and H= options are available" (here is the link).
My plot statement will be:
proc sgplot data=out1;
scatter y=strata_variable x=x_beta1 / xerrorlower=lowerbound xerrorupper=upperbound
markerattrs=(symbol=circlefilled size=9);
run;
The first plot in this page shows exactly what I want. Sorry I cannot insert any plot as my reputation is not high enough.
I found another way to do this. I wrote a macro do loop to fit the model on each stratum's data, and then added
ods output OddsRatios=odds_temp;
to get the estimate and confidence interval, then merged all the strata together to make the plot I need.
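A possible alternative to the macro loop is BY-group processing, which fits one model per level of cv1 and stacks the odds ratio tables into one data set. This is a sketch rather than the code actually used above; data1_sorted and odds_all are hypothetical names, and the column names OddsRatioEst, LowerCL and UpperCL are what the OddsRatios table normally contains (worth confirming with proc contents). Note that separate per-stratum fits are not the same model as a STRATA statement.

```sas
/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=data1 out=data1_sorted;
  by cv1;
run;

/* One logistic fit per level of cv1; the odds ratio tables are stacked in odds_all. */
ods output OddsRatios=odds_all;
proc logistic data=data1_sorted;
  by cv1;
  model y = x;
run;

/* One row per stratum: point estimate with its confidence limits. */
proc sgplot data=odds_all;
  scatter y=cv1 x=OddsRatioEst / xerrorlower=LowerCL xerrorupper=UpperCL
    markerattrs=(symbol=circlefilled size=9);
run;
```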

use estimates from proc glm to make prediction on another dataset

I'm not so familiar with SAS proc glm. All I have done using proc glm so far is to output parameter estimates and predicted values on training datasets. But I also need to use the fitted model to make predictions (both point estimates and interval estimates) on a testing dataset.
Here is my code.
ods output ParameterEstimates=Pi_Parameters FitStatistics=Pi_Summary PredictedValues=Pi_Fitted;
proc glm data=Train_Pi;
class Area Fo5 Tye M0 M1 M2 M3;
model Pi = Dow Area Fo5 Tye M0|HC M1|HC M2|HC M3|HC/solution p ss3 /*tolerance*/;
run;
But how do I proceed to the next step? I'm looking for something like predict(Model_from_Train_Pi, Test_Pi).
If you're on SAS 9.4 see Jake's answer from this question:
How to predict probability in logistic regression in SAS?
If you're not on 9.4, my answer there applies: add the new data to the original data set.
A third option is PROC SCORE - documentation has an example for proc reg that's almost identical to your question:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_score_sect018.htm
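For the "add the data to the original data set" route, a minimal sketch could look like the following. It assumes Test_Pi has the same predictor variables as Train_Pi; train_test, scored, is_test and the Pi_ output names are hypothetical. The response is set to missing on the test rows so they should not influence the fit but still receive predictions.

```sas
/* Stack training and test rows; blank out the response on test rows. */
data work.train_test;
  set Train_Pi Test_Pi(in=from_test);
  is_test = from_test;          /* IN= variables are not saved, so copy the flag */
  if from_test then Pi = .;     /* missing response: row is scored, not fitted */
run;

proc glm data=work.train_test;
  class Area Fo5 Tye M0 M1 M2 M3;
  model Pi = Dow Area Fo5 Tye M0|HC M1|HC M2|HC M3|HC / solution;
  /* Point prediction, prediction limits (LCL/UCL) and limits for the mean (LCLM/UCLM). */
  output out=work.scored predicted=Pi_hat lcl=Pi_lcl ucl=Pi_ucl
         lclm=Pi_lclm uclm=Pi_uclm;
run;
quit;

/* Predictions for the test rows only. */
proc print data=work.scored;
  where is_test = 1;
run;
```

A class level that appears only in the test data has no estimate from the training rows, so predictions for such rows should be treated with caution.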

Regression with both robust (white) standard errors and CLASS variable for fixed effects

proc glm makes it easy to add fixed effects without creating dummy variables for every possible value of the class variable.
proc reg is able to calculate robust (White) standard errors, but it requires you to create individual dummy variables.
Is there any way to combine these functionalities? I'd like to be able to add a number of class variables and receive White standard errors in my output. For example:
With proc glm, I can do this regression. This will give correct results no matter how many levels are contained in the class variables, but it won't calculate robust standard errors.
proc glm data=ds1;
class class1 class2 class3;
weight n;
model y = c class1 class2 class3 / solution;
run;
With proc reg, I can do:
proc reg data=ds2;
weight n;
model y = x / white;
run;
Which has white standard errors, but doesn't incorporate the fixed effects.
To do that, I might need 50 or more dummy variables and a model statement like model y = x class1_d1 class1_d2 ... class3_dn /white;. This would turn into a crazy number of dummy variables if I started adding interaction terms.
Obviously I could write a macro to create the dummy variables, but this seems like such a basic function that I can't help but think I am missing something obvious (STATA and R both have ways to do this easily). Why can't I either use the class statement in proc reg or get robust standard errors out of proc glm?
I think I found part of the answer although I would be interested in other solutions or tweaks to this one.
proc glmmod can be used to create the dataset for proc reg:
proc glmmod noprint outdesign=ds2 data=ds1;
class class1 class2 class3;
weight n;
model y = c class1 class2 class3;
run;
proc reg data=ds2;
weight n;
model y = col2-col50 / white;
run;
proc glmmod uses the GLM syntax and outputs a regression dataset with all of the dummy variables that proc reg needs.
Not as clean as a single-PROC solution (and you have to keep track of the labels to see what ColXX refers to), but it seems to work perfectly.
I think you can:
(1) remove observations with missing values
(2) demean the independent variables using proc standard
(3) regress the dependent variables on the demeaned independent variables
http://pages.stern.nyu.edu/~adesouza/sasfinphd/index/node60.html
http://pages.stern.nyu.edu/~adesouza/sasfinphd/index/node61.html
The coefficients from the above procedure are exactly the same as those from proc glm (Frisch-Waugh theorem), but you do not have to create dummies (which is your main problem). To get robust standard errors, you can simply run proc reg in step (3) with the white option.
Hope that helps.
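A rough sketch of the three steps for a single class variable follows. It rests on several assumptions: demeaning is done within each level of class1, both y and x are demeaned (the steps above mention only the independent variables; the slope is the same either way, but demeaning y as well keeps the residuals comparable to the dummy-variable fit), the weight from the question is left out, and the reported standard errors do not account for the degrees of freedom used by the group means. The data set names complete, sorted and demeaned are hypothetical.

```sas
/* (1) Drop rows with missing values in the variables used. */
data work.complete;
  set ds1;
  if nmiss(y, x) = 0 and not missing(class1);
run;

/* (2) Demean within each level of class1; MEAN=0 recenters each BY group. */
proc sort data=work.complete out=work.sorted;
  by class1;
run;

proc standard data=work.sorted mean=0 out=work.demeaned;
  by class1;
  var y x;
run;

/* (3) Regress on the demeaned variables with White standard errors. */
proc reg data=work.demeaned;
  model y = x / white;
run;
quit;
```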
I think I have an answer for this (or at least, if I don't, I might find out by posting my solution here).
According to this page one can compute robust standard errors with proc surveyreg by clustering the data so that each observation is its own cluster. Like this:
data mydata;
set mydata;
counter=_n_;
run;
proc surveyreg data=mydata;
cluster counter;
model y=x;
run;
But proc surveyreg takes a class statement, so that one can run e.g.
proc surveyreg data=mydata;
class t;
cluster counter;
model y= t x*t / solution;
run;