how to label regression models? - sas

I'm running two different regressions on the same dataset, and I would like to name them differently to prevent confusion. How can I assign a label to each model?
proc reg data = main outest = main_regression1;
model x= a b c;
run ;
proc reg data = main outest = main_regression2;
model x= a b d;
run ;
data regression_summary ;
set main_regression1 main_regression2 ;
run ;

You can add multiple models within a single proc reg and assign each one a label.
proc reg data = main outest = est;
FirstModel: model x = a b c;
SecondModel: model x = a b d;
run;
Output of est:
_MODEL_ _TYPE_ ...
FirstModel PARMS ...
SecondModel PARMS ...

You can use the (in=) data set option to create a temporary flag variable, and then use that flag to assign a value to another variable. For example:
data regression_summary ;
set main_regression1 (in=A) main_regression2 ;
if A=1 then LABEL = "Model 1";
else if A=0 then LABEL = "Model 2";
run ;
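A variation on the same idea, as a minimal sketch: the INDSNAME= option on the SET statement records which input data set each row came from, so the label does not have to be hard-coded. This assumes main_regression1 and main_regression2 already exist from the question's code; the names src and source are hypothetical.

```sas
/* Assumes main_regression1 and main_regression2 were created by the
   two PROC REG steps in the question. */
data regression_summary;
   length source $41;   /* libref.member name can be up to 41 characters */
   set main_regression1 main_regression2 indsname=src;
   source = src;        /* src is temporary, so copy it to keep it */
run;
```

Each row of regression_summary then carries the two-level name of the data set it was read from (for example WORK.MAIN_REGRESSION1).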

Related

How to reuse SAS output in other procedures

How do I use the output variable of one PROC in another PROC?
I'm new to SAS and have spent many hours trying to solve this problem in the program below.
DATA FA2;
SET FA2;
proc iml;
start main;
use FA2;
read all var {Close};
s = Close;
u = j(nrow(s)-1,1,0);
do i=2 to nrow(s);
u[i-1]=log(s[i]/s[i-1]);
end;
n=nrow(s)-1;
rsigma=sqrt(252/n*(u'*u));
mu = mean(u);
call qntl(q, u);
print q[rowname={"P05", "P95"}];
s = quartile(u);
PRINT mu, rsigma, s;
finish;
run;
PROC UNIVARIATE DATA = FA2;
var u;
run;
ERROR
PROC UNIVARIATE DATA = FA2;
var u;
Variable U not found.
run;
Your first two lines accomplish nothing, so remove them:
data fa2;
set fa2;
You are not creating any output data set within the IML statement, and you really shouldn't reuse the same data set name (FA2) so many times. It makes it harder to debug your code. See the example below.
proc iml;
/** create SAS data set from vectors **/
y = {1,0,3,2,0,3}; /** 6 x 1 vector **/
z = {8,7,6,5,6}; /** 5 x 1 vector **/
c = {A A A B B B}; /** 1 x 6 character vector **/
create MyData var {y z c}; /** create data set **/
append; /** write data in vectors **/
close MyData; /** close the data set **/
quit;
A PROC IML step ends with a QUIT; statement, not RUN;.
PROC UNIVARIATE references your FA2 data set, which does not contain U because you never saved that variable to a data set, so the error is correct. Fixing the issues above should resolve it.
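Putting those points together, here is a minimal sketch of how the log returns could be written to a data set and then analysed. The data set names Prices and Returns and the sample closing prices are made up for illustration; in the real program the input would be FA2.

```sas
/* Hypothetical stand-in for FA2: a handful of closing prices */
data Prices;
   input Close @@;
   datalines;
100 101.5 99.8 102.3 103.1 101.9
;
run;

proc iml;
   use Prices;
   read all var {Close} into s;   /* s is an n x 1 column vector */
   close Prices;

   n = nrow(s);
   /* log returns, u[i-1] = log(s[i]/s[i-1]), computed without a loop */
   u = log(s[2:n] / s[1:(n-1)]);

   /* write u to a new data set so later procedures can read it */
   create Returns var {u};
   append;
   close Returns;
quit;

/* the variable u now exists in Returns */
proc univariate data=Returns;
   var u;
run;
```

Writing the result to a separate data set, rather than reusing the input name, keeps it clear which step produced which variables.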

SAS PROC SURVEYMEANS

I have a task that requires me to input a 'custom' DF value in proc surveymeans. In order to do this, I need to have a "repweights" statement that includes the repweights variables. To get these variables I ran surveymeans that created the repweight variables. I then re-ran the program using these repweights and repweights statement. To check my work I compared the output between the two programs and found it to be slightly different. Should this be the case if I'm using the repweights the program automatically generates? Any help on this would be really appreciated.
Prog 1:
proc surveymeans data = original_data varmethod = JK (outweights = weights_out);
var var1;
strata var2 var3;
weight varweight; /*data contains 10 obs*/
run;
data newdata;
merge original_data
weights_out (keep = repwt1- repwt10) ;
/*repwt's created from outweights statement */
run;
Prog 2:
Proc surveymeans data = newdata varmethod = JK ;
var var1;
repweights repwt1 - repwt10 / df = 20;
run;

Estimating a response value based on known parameters

SAS newbie here.
My question is about PROC REG in SAS; let's assume I have already created a model and now I would like to use this model, and known predictor variables to estimate a response value.
Is there a clean and easy way of doing this in SAS? So far I've been manually grabbing the intercept and the coefficients from the output of my model to calculate the response variable but as you can imagine it can get pretty nasty when you have a lot of covariates. Their user's guide is pretty cryptic...
Thanks in advance.
@Reese is correct. Here is some sample code to get you up the learning curve faster:
/*Data to regress*/
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
/*Data to score*/
data to_score;
_model_ = "Y_on_X";
y = .;
x1 = 1.5;
x2 = -1;
run;
/*Method 1: just put missing values on the input data set and
PROC REG will do it for you*/
data test_2;
set test to_score;
run;
proc reg data=test_2 alpha=.01 outest=est;
Y_on_X: model y = x1 x2;
output out=test2_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
proc print data=test2_out;
run;
/*Method 2: Use the coefficients and the to_score data with
PROC SCORE*/
proc score data=to_score score=est out=scored type=parms;
var x1 x2;
run;
proc print data=scored;
var Y_on_X X1 X2;
run;
Two ways:
1. Append the rows you want to score to the data set you use for estimation, but leave the y value missing. Grab the predictions using the OUTPUT statement in PROC REG.
2. Use PROC SCORE:
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_score_sect018.htm

How to calculate a mean for the non zero values using proc means or proc summary

I want to calculate a mean based only on the nonzero values of given variables, using PROC MEANS only.
I know we can calculate this using PROC SQL, but I want to get it done through PROC MEANS or PROC SUMMARY.
In my study I have 8 variables, so how can I calculate the mean based on nonzero values when I am using all of them in the VAR statement, as below:
proc means data = xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition for non zero variables , it works but can we have something which would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
/* avoid dividing by zero when every value is zero or missing */
if qty > 0 then mean = mean/qty;
else mean = .;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.

PROC Format and proc summary

data a;
input accountno name $;
datalines;
1.01 x
0.999 harshit
1.99 y
2 kumar
3 manali
;
Run;
proc print; run;
proc format;
value h
0-1='g.0-1'
1-3='g.1-3'
;
run;
proc print data = a;
format accountno h.;
run;
proc summary data = a nway;
class accountno;
format accountno h.;
var accountno;
output out = hpd;
run;
proc print; run;
In PROC SUMMARY, it will not accept var accountno and also gives:
WARNING: Variable accountno already exists on file WORK.HPD.
WARNING: The duplicate variables will not be included in the output data set of the output statement number 1.
So what is the solution?
I'm not completely sure what you want to get in the output, but I can tell you why you are getting the warning message.
In PROC SUMMARY, you are using the same variable name in the CLASS statement as in your VAR statement. The procedure is letting you know that this would duplicate a variable name in the resulting output dataset.
You could add an extra variable, a copy of accountno, in the data step that creates dataset 'a', and use that copy in the VAR statement.
If you are trying to just get frequencies of the class variable, remove the var statement completely as in:
proc summary data = a;
class accountno;
output out = freqs;
run;
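The extra-variable suggestion above could look something like the following sketch. It assumes dataset a and format h. from the question already exist; the names a2, accountno_copy and mean_accountno are hypothetical.

```sas
/* Assumes dataset a and format h. were created as in the question. */
data a2;
   set a;
   accountno_copy = accountno;   /* separate analysis variable */
run;

proc summary data = a2 nway;
   class accountno;              /* grouping variable, grouped by format */
   format accountno h.;
   var accountno_copy;           /* no name clash with the CLASS variable */
   output out = hpd n = n mean = mean_accountno;
run;

proc print data = hpd;
run;
```

Because the CLASS and VAR statements now refer to different names, the output data set can hold both the formatted group and the statistics without the duplicate-variable warning.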