I am writing an NLMIXED procedure, and for the likelihood function I want to use values generated with the IML procedure. So I am wondering if there is a way of using PROC IML inside PROC NLMIXED.
proc nlmixed data = xxx;
parms b0=0 b1=0;
mu = exp(b0 + b1*Age);
ll = log(((mu**y)*exp(-mu))/gamma(y+1));
model y~ general(ll);
run;
proc iml;
v = {5,6,7,8,9,10,11,12,13,14};
z = j(10,1,.);
do i = 1 to 6;
z[i] = ((v[i]-5)/5)*((mu**v[i])*exp(-mu))/gamma(v[i]+1);
end;
ll=log(sum(z));
quit;
The idea is to:
use mu from NLMIXED inside the PROC IML step,
but keep the ll from both steps inside NLMIXED (see the sketch below).
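PROC IML cannot run inside PROC NLMIXED, but NLMIXED programming statements support DO loops, so the summation the IML step builds can often be written directly in the likelihood code. A minimal sketch, assuming the sum over the first six elements of v (5 through 10) is the term you want; this is only an illustration of the mechanics, not necessarily the likelihood you ultimately need:
proc nlmixed data=xxx;
parms b0=0 b1=0;
mu = exp(b0 + b1*Age);
/* build the same six terms the IML loop computes, directly in NLMIXED */
z = 0;
do v = 5 to 10; /* v[1]..v[6] of the IML vector */
z = z + ((v-5)/5)*((mu**v)*exp(-mu))/gamma(v+1);
end;
ll = log(z);
model y ~ general(ll);
run;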
I can't find a way to summarize the same variable using different weights.
I'll try to explain it with an example (3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I have to store a huge amount of data with limited physical space, and I would have to construct about 120 fields (a0-a6, b0-b6, etc.) that are the same variables, just with a fixed weight applied (wgt0-wgt5).
I want to store a dataset with 20 columns (a, b, c, ...) and 6 weights (wgt0-wgt5) and, on demand, run a "summary" without an intermediate data step that obliges me to create 120 fields.
Due to the huge amount of data (roughly 55 GB every month), I'd also like to avoid a PROC SQL statement like:
proc sql;
create table pluto
as select sum(db.a * wgt0) as a0, sum(db.a * wgt1) as a1, etc.
quit;
There is a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running PROC SUMMARY once per weight (sketched below), and either using ODS OUTPUT with persist=proc or writing the 20 output datasets and then setting them together.
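As a minimal sketch of that second option, assuming the pippo data from the question with just its three weights (so the loop bound of 3 is illustrative), you could run the summary once per weight, tag each result with the weight name, and set the pieces together:
%macro sum_per_weight;
%do i = 1 %to 3;
proc summary data=pippo missing nway;
var a;
weight wgt&i;
output out=pluto&i (drop=_freq_ _type_) sum()=;
run;
data pluto&i;
set pluto&i;
length weight $8;
weight = "wgt&i"; /* carry the weight name along for the stacked result */
run;
%end;
data pluto;
set pluto1-pluto3;
run;
%mend;
%sum_per_weight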
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the data;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
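For illustration only, a hedged sketch of that PROC SURVEYMEANS route, assuming the weights in pippo really are replicate weights (for example jackknife replicates), which they may well not be; the point is just where the REPWEIGHTS statement goes:
proc surveymeans data=pippo varmethod=jackknife mean sum;
var a;
repweights wgt1-wgt3; /* assumption: these are jackknife replicate weights */
run;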
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
I have this piece of code. However, the macro around PROC UNIVARIATE generates too many separate datasets because the loop over t runs from 1 to 310. How can I modify this code to collect all of the PROC UNIVARIATE output into one dataset, and then adapt the rest of the code for a more efficient run?
%let L=10; %* 10th percentile *;
%let H=%eval(100 - &L); %* 90th percentile*;
%let wlo=V1&L V2&L V3&L ;
%let whi=V1&H V2&H V3&H ;
%let wval=wV1 wV2 wV3 ;
%let val=V1 V2 V3;
%macro winsorise();
%do v=1 %to %sysfunc(countw(&val));
%do t=1 %to 310;
proc univariate data=regressors noprint;
var &val;
output out=_winsor&t._V&v pctlpts=&H &L
pctlpre=&val&t._V&v;
where time_count<=&t;
run;
%end;
data regressors (drop=__:);
set regressors;
if _n_=1 then set _winsor&t._V&v;
&wval&t._V&v=min(max(&val&t._V&v,&wlo&t._V&v),&whi&t._V&v);
run;
%end;
%mend;
Thank you.
Presume you have data with time_count, x1, x2, x3 sampled every 0.5 time units.
data regressors;
call streaminit(123);
do time_count = 0 to 310 by .5;
x1 = 2 ** (sin(time_count/6) * log(time_count+1));
x2 = log2 (time_count+1) + log(time_count/10+.1);
x3 = rand('normal');
output;
end;
format x: 7.3;
run;
Stack the data into groups based on integer time_count levels. The stack is constructed from a full outer self-join with a less-than-or-equal (<=) criterion. Each group is identified by the top time_count in the group.
proc sql;
create table stack as
select
a.time_count
, a.x1
, a.x2
, a.x3
, b.time_count as time_count_group /* save top value in group variable */
from regressors as a
full join regressors as b /* self full join */
on a.time_count <= b.time_count /* triangular criteria */
where
int(b.time_count)=b.time_count /* select integer top values */
order by
b.time_count, a.time_count
;
quit;
Now compute ALL your stats for ALL your variables for ALL your groups in one go. No macro, no muss, no fuss.
proc univariate data=stack noprint;
by time_count_group;
var x1 x2 x3;
output out=_winsor n=group_size pctlpts=90 10 pctlpre=x1_ x2_ x3_;
run;
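If you then want to finish the winsorisation, one possible follow-up (my assumption, using the x1_10/x1_90-style variable names that the PCTLPRE=/PCTLPTS= options above produce) is to merge the group percentiles back onto the stacked data and clip each variable to its group's 10th and 90th percentiles:
data winsorised;
merge stack _winsor; /* one-to-many merge: one percentile row per group */
by time_count_group;
wx1 = min(max(x1, x1_10), x1_90);
wx2 = min(max(x2, x2_10), x2_90);
wx3 = min(max(x3, x3_10), x3_90);
run;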
Can someone tell me if these two would be equivalent?
PROC PHREG DATA=data;
ARRAY start{*} exp_start1-exp_start100;
ARRAY stop{*} exp_STOP1-exp_STOP100;
use=0;
DO I=1 TO 100;
IF stop{I}>start{I} AND start{I}<time2event THEN use=1;
END;
MODEL time2event*censor(1)=use/risklimits covb;
RUN;
AND:
data _want;
set _have;
ARRAY start{*} exp_start1-exp_start100;
ARRAY stop{*} exp_STOP1-exp_STOP100;
use=0;
DO I=1 TO 100;
IF stop{I}>start{I} AND start{I}<time2event THEN use=1;
END;
run;
PROC PHREG DATA=_want;
MODEL time2event*censor(1)=use/risklimits covb;
RUN;
Basically, can the DO loop be outside of the PROC PHREG procedure when trying to incorporate time-dependent covariates?
Thanks!
SAS newbie here.
My question is about PROC REG in SAS; let's assume I have already created a model and now I would like to use this model, and known predictor variables to estimate a response value.
Is there a clean and easy way of doing this in SAS? So far I've been manually grabbing the intercept and the coefficients from the output of my model to calculate the response variable but as you can imagine it can get pretty nasty when you have a lot of covariates. Their user's guide is pretty cryptic...
Thanks in advance.
@Reese is correct. Here is some sample code to get you up the learning curve faster:
/*Data to regress*/
data test;
do i=1 to 100;
x1 = rannor(123);
x2 = rannor(123)*2 + 1;
y = 1*x1 + 2*x2 + 4*rannor(123);
output;
end;
run;
/*Data to score*/
data to_score;
_model_ = "Y_on_X";
y = .;
x1 = 1.5;
x2 = -1;
run;
/*Method 1: just put missing values on the input data set and
PROC REG will do it for you*/
data test_2;
set test to_score;
run;
proc reg data=test_2 alpha=.01 outest=est;
Y_on_X: model y = x1 x2;
output out=test2_out(where=(y=.)) p=predicted ucl=UCL_Pred lcl=LCL_Pred;
run;
quit;
proc print data=test2_out;
run;
/*Method 2: Use the coefficients and the to_score data with
PROC SCORE*/
proc score data=to_score score=est out=scored type=parms;
var x1 x2;
run;
proc print data=scored;
var Y_on_X X1 X2;
run;
2 ways:
Append the data you want into the data set you're going to use to get estimates but leave the y value blank. Grab the estimates using the output statement from proc reg.
Use Proc Score
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_score_sect018.htm
I created the following macro. PROC POWER returns the table pw_out, which contains the column Power. The data _null_ step assigns the value in column Power of pw_out to the macro variable tpw. I want the macro to return the value of tpw, so that in the main program I can call it in a DATA step like:
data test;
set tmp;
pw_tmp=ttest_power(meanA=a, stdA=s1, nA=n1, meanB=a2, stdB=s2, nB=n2);
run;
Here is the code of the macro:
%macro ttest_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw'=&power);
run;
&tpw
%mend ttest_power;
@itzy is correct in pointing out why your approach won't work. But there is a solution maintaining the spirit of your approach: you need to create a power-calculation function using PROC FCMP. In fact, AFAIK, to call a procedure from within a function in PROC FCMP, you need to wrap the call in a macro, so you are almost there.
Here is your macro - slightly modified (mostly to fix the symput statement):
%macro ttest_power;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw', power);
run;
%mend ttest_power;
Now we create a function that will call it:
proc fcmp outlib=work.funcs.test;
function ttest_power_fun(meanA, stdA, nA, meanB, stdB, nB);
rc = run_macro('ttest_power', meanA, stdA, nA, meanB, stdB, nB, tpw);
if rc = 0 then return(tpw);
else return(.);
endsub;
run;
And finally, we can try using this function in a data step:
options cmplib=work.funcs;
data test;
input a s1 n1 a2 s2 n2;
pw_tmp=ttest_power_fun(a, s1, n1, a2, s2, n2);
cards;
0 1 10 0 1 10
0 1 10 1 1 10
;
run;
proc print data=test;
run;
You can't do what you're trying to do this way. Macros in SAS are a little different than in a typical programming language: they aren't subroutines that you can call, but rather just code that generate other SAS code that gets executed. Since you can't run proc power inside of a data step, you can't run this macro from a data step either. (Just imagine copying all the code inside the macro into the data step -- it wouldn't work. That's what a macro in SAS does.)
One way to do what you want would be to read each observation from tmp one at a time, and then run proc power. I would do something like this:
/* First count the observations */
data _null_;
call symputx('nobs',obs);
stop;
set tmp nobs=obs;
run;
/* Now read them one at a time in a macro and call proc power */
%macro power;
%do j=1 %to &nobs;
data _null_;
nrec = &j;
set tmp point=nrec;
call symputx('meanA',meanA);
call symputx('stdA',stdA);
call symputx('nA',nA);
call symputx('meanB',meanB);
call symputx('stdB',stdB);
call symputx('nB',nB);
stop;
run;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
proc append base=pw_out_all data=pw_out; run;
%end;
%mend;
%power;
By using proc append you can store the results of each round of output.
I haven't checked this code so it might have a bug, but this approach will work.
You can invoke a macro which calls procedures, etc. (like the example) from within a data step using call execute(), but it can get a bit messy and difficult to debug; a rough sketch follows.
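For completeness, a rough sketch of that CALL EXECUTE route, assuming a parameterized %ttest_power macro like the one in the question that has been rewritten to append its result to an output data set rather than return a value; note that CALL EXECUTE only queues the generated code to run after this data step ends, so nothing flows back into the step itself:
data _null_;
set tmp;
/* build one %ttest_power call per observation; %nrstr delays macro
   execution until the generated code actually runs after this step */
call execute(cats('%nrstr(%ttest_power)(meanA=', meanA,
                  ', stdA=', stdA, ', nA=', nA,
                  ', meanB=', meanB, ', stdB=', stdB,
                  ', nB=', nB, ')'));
run;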