SAS newbie here.
My question is about PROC REG in SAS; let's assume I have already created a model and now I would like to use this model, and known predictor variables to estimate a response value.
Is there a clean and easy way of doing this in SAS? So far I've been manually grabbing the intercept and the coefficients from the output of my model to calculate the response variable but as you can imagine it can get pretty nasty when you have a lot of covariates. Their user's guide is pretty cryptic...
Thanks in advance.
#Reese is correct. Here is some sample code to get you up the learning curve faster:
/*Data to regress*/
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
/*Data to score*/
data to_score;
_model_ = "Y_on_X";
y = .;
x1 = 1.5;
x2 = -1;
run;
/*Method 1: just put missing values on the input data set and
PROC REG will do it for you*/
data test_2;
set test to_score;
run;
proc reg data=test_2 alpha=.01 outest=est;
Y_on_X: model y = x1 x2;
output out=test2_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
proc print data=test2_out;
run;
/*Method 2: Use the coefficients and the to_score data with
PROC SCORE*/
proc score data=to_score score=est out=scored type=parms;
var x1 x2;
run;
proc print data=scored;
var Y_on_X X1 X2;
run;
2 ways:
Append the data you want into the data set you're going to use to get estimates but leave the y value blank. Grab the estimates using the output statement from proc reg.
Use Proc Score
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_score_sect018.htm
Related
I can't find a way to summarize the same variable using different weights.
I try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I've to save a huge amount of data and not so much physical space and I should construct like 120 field (a0-a6,b0-b6 etc) that are the same variables just with fixed weight (wgt0-wgt5).
I want to store a dataset with 20 columns (a,b,c..) and 6 weight (wgt0-wgt5) and, on demand, processing a "summary" without an intermediate datastep that oblige me to create 120 fields.
Due to the huge amount of data (more or less 55Gb every month) I'd like also not to use proc sql statement:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
There is a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');;
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the key;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;[enter image description here][1]
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
enter image description here
How to plot the dynamic of the variable y over x in SAS (the easiest way!), for example, temperature changes over time. Thank you very much! (PC-SAS)
Two common ways are the scatter and series statements in Proc SGPLOT.
For example:
data have;
do time = 0 to 24;
time = time + ranuni(123) - 1.2;
temp = 60 + 6 * sin(time / 7.6);
rowNumber + 1;
output;
end;
run;
ods graphics on / width=325px;
proc sgplot data=have;
title "Scatter";
footnote "Created: &sysdate";
scatter x=time y=temp;
run;
proc sgplot data=have;
title "Series";
footnote "Created: &sysdate";
series x=time y=temp / markers;
run;
I am writing an nlmixed procedure, and for the likelihood function, I want to use values generated with the iml procedure. So I am wondering if there is a way of using proc iml inside a proc nlmixed.
proc nlmixed data = xxx;
parms b0=0 b1=0;
mu = exp(b0 + b1*Age);
ll = log(((mu**y)*exp(-mu))/gamma(y+1));
model y~ general(ll);
run;
proc iml;
v = {5,6,7,8,9,10,11,12,13,14};
z = j(10,1,.);
do i = 1 to 6;
z[i] = ((v[i]-5)/5)*((mu**v[i])*exp(-mu))/gamma(v[i]+1);
end;
ll=log(sum(z));
quit;
The idea is to:
use mu from nlmixed inside the proc iml
but the ll from both steps should be inside nlmixed
I need to find a ratio of two mean values, that I have found using proc means.
proc means data=a;
class X Y;
var x1 x2;
run;
Then I get the output mean values for variables x1 and x2 in the two categories of X and Y, but it is x1/x2 for each category that I am interested in, and doing it by hand is not really a solution.
I am not a professional programmer, so I hope there is a simple piece of code that I can understand and use.
You need to precompute x1/x2 or postcompute x1/x2 (Depending on whether you want mean(x1/x2) or mean(x1)/mean(x2), which can have different answers of x1 and x2 have different numbers of responses).
So either (... means fill in what you have already)
data premean;
set have;
x1x2 = x1/x2;
run;
proc means ... ;
class ... ;
var x1x2;
run;
or
proc means ...;
class ... ;
var x1 x2;
output out=postmeans mean=;
run;
data want;
set postmeans;
x1x2=x1/x2;
run;
proc sql noprint;
create table xy_ratio as /* New table name*/
select distinct X, Y, avg(x1)/avg(x2) as x1_x2_ratio /* selects distinct rows containing variables listed here. (Must include group by variables) mean of x1 / mean of x2 to form ratio*/
from a /*source dataset*/
group by X, Y /*Similar to class statement, will provide an average for each distinct combination of X and Y that appear in the dataset*/
;
quit;
I want to perform the standard likelihood ratio test in logsitic regression using SAS. I will have a full logistic model, containing all variables, named A and a nested logistic model B, which is derived by dropping out one variable from A.
If I want to test whether that drop out variable is significant or not, I shall perform a likelihood ratio test of model A and B. Is there an easy way to perform this test (essentially a chi-square test) in SAS using a PROC? Thank you very much for the help.
If you want to perform likelihood ratio tests that are full model v.s. one variable dropped model, you can use the GENMOD procedure with the type3 option.
Script:
data d1;
do z = 0 to 2;
do y = 0 to 1;
do x = 0 to 1;
input n ##;
output;
end; end; end;
cards;
100 200 300 400
50 100 150 200
50 100 150 200
;
proc genmod data = d1;
class y z;
freq n;
model x = y z / error = bin link = logit type3;
run;
Output:
LR Statistics For Type 3 Analysis
Chi-
Source DF Square Pr > ChiSq
y 1 16.09 <.0001
z 2 0.00 1.0000
I'm not sure about a PROC statement that can specifically perform LRT but you can compute the test for nested models.
Script
proc logistic data = full_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_full;
run;
data _null_;
set GlobalTests_full;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_full", ChiSq);
call symput("DF_full", DF);
end;
run;
proc logistic data = reduced_model;
model dependent_var = independent_var(s);
ods output GlobalTests = GlobalTests_reduced;
run;
data _null_;
set GlobalTests_reduced;
if test = "Likelihood Ratio" then do;
call symput("ChiSq_reduced", ChiSq);
call symput("DF_reduced", DF);
end;
run;
data LRT_result;
LR = &ChiSq_full - &ChiSq_reduced;
DF = &DF_full - &DF_reduced;
p = 1 - probchi(ChiSq,DF);
run;
I'm no expert on logistic regression, but I think what you are trying to accomplish can be done with PROC LOGISTIC, using the "SELECTION=SCORE" option on the MODEL statement. There are other SELECTION options available, such as STEPWISE, but I think SCORE matches closest to what you are looking for. I would suggest reading up on it though, because there are some associated options (BEST=, START= STOP=) that you might benefit from too.