Using SAS PROC IML, I have a function: CVF(m,p,h,pi,e);
I would like to find the h that minimizes this function. Is there a built-in subroutine to minimize it? Or how can I construct an iterative process for it? All the other variables are defined.
Use call nlpqn(). The function you pass must take only one parameter: the vector you want to optimize over. Here I define a quadratic function whose a, b, and c parameters can also be set. Use the GLOBAL clause on the module and define the fixed variables before the call.
Alternatively, you can put everything in the input vector, and then add constraints to keep those values from moving.
proc iml;
start myfun(x) global(a, b, c);
out = a*x**2 + b*x + c;
return (out);
finish myfun;
a = 1;
b = 2;
c = 4;
optn = {0, /* option 1: 0 -> MIN, 1 -> MAX*/
0 /* Print options 0-5 0 least, 5 most*/
};
init = 0;
call nlpqn(rc, res, "myfun", init, optn);
/*rc > 0 means success*/
print rc res;
quit;
Returns:
The SAS System 08:59 Friday, June 27, 2014 3
rc res
3 -1
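Applied to the original question, the same pattern might look like the sketch below. The body of CVF is not shown in the question, so the module here is a made-up stand-in, and the wrapper name CVFofH and the fixed values are assumptions for illustration only. The wrapper takes h as its single argument and picks up the other arguments through GLOBAL.

```sas
proc iml;
/* Hypothetical stand-in: the real CVF body is not given in the question */
start CVF(m, p, h, pi, e);
   return( (h - m)##2 + p#h + pi + e );
finish CVF;

/* One-argument wrapper: only h moves, the rest are read via GLOBAL */
start CVFofH(h) global(m, p, pi, e);
   return( CVF(m, p, h, pi, e) );
finish CVFofH;

/* Made-up fixed values, defined before the call */
m = 2; p = 1; pi = 0.5; e = 0.1;

optn = {0 0};   /* first 0: minimize; second 0: no printed output */
h0 = 0;         /* starting guess for h */
call nlpqn(rc, hOpt, "CVFofH", h0, optn);
print rc hOpt;  /* rc > 0 means success */
quit;
```

For this stand-in the minimum can be checked by hand: setting the derivative 2(h - m) + p to zero gives h = 1.5.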
You can, of course, build your own optimization routine in the IML language, but there are also built-in optimization routines; see chapter 11 of the IML User's Guide, "Nonlinear Optimization Examples."
This is a programming question, but I'll give you a little of the stats background first. This question refers to part of a data sim for a mixed-effects location scale model (i.e., heterogeneous variances). I'm trying to simulate two MVN variance components using the RANDNORMAL function in IML. Because both variance components are heterogeneous, the variances used by RANDNORMAL will differ across people. Thus, I need IML to select the specific row (e.g., row 1 = person 1) and use the RANDNORMAL function before moving onto the next row, and so on.
My example code below is for 2 people. I use DO to loop through each person's specific variance components (VC1 and VC2). I get the error: "Module RANDNORMAL called again before exit from prior call." I am assuming I need some kind of BREAK or EXIT function in the DO loop, but none I have tried work.
PROC IML;
ColNames = {"ID" "VC1" "VC2"};
A = {1 2 3,
2 8 9};
PRINT A[COLNAME=ColNames];
/*Set mean of each variance component to 0*/
MeanVector = {0, 0};
/*Loop through each person's data using THEIR OWN variances*/
DO i = 1 TO 2;
VC1 = A[i,2];
VC2 = A[i,3];
CovMatrix = {VC1 0,
0 VC2};
CALL RANDSEED(1);
U = RANDNORMAL(2, MeanVector, CovMatrix);
END;
QUIT;
Any help is appreciated. Oh, and I'm using SAS 9.4.
You want to move some things around, but mostly you don't want to overwrite U on every pass: you need to write U's 1st row, then U's 2nd row, if I understand what you're trying to do. The code below is also a bit more efficient, since I allocate the U and _cv matrices once with j() rather than constructing them de novo every time through the loop (which is slow).
proc iml;
a = {1 2 3,2 8 9};
print(a);
_mv = {0,0};
U = J(2,2);
_cv = J(2,2,0);
CALL RANDSEED(1);
do i = 1 to 2;
_cv[1,1] = a[i,2];
_cv[2,2] = a[i,3];
U[i,] = randnormal(1,_mv, _cv);
end;
print(u);
quit;
Your mistake is the line
CovMatrix = {VC1 0, 0 VC2}; /* wrong */
which is not valid SAS/IML syntax. Instead, use @Joe's approach or use
CovMatrix = (VC1 || 0) // (0 || VC2);
For details, see the article "How to build matrices from expressions."
You might also be interested in this article that describes how to carry out this simulation with a block-diagonal matrix: "Constructing block matrices with applications to mixed models."
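Putting the two answers together, one possible version of the loop is sketched below. It uses the DIAG function as another way to build the covariance matrix from an expression (a substitution for the concatenation shown above, not something either answer proposed), passes the mean as a row vector, and seeds the generator once before the loop.

```sas
proc iml;
A = {1 2 3,
     2 8 9};                     /* columns: ID, VC1, VC2 */
MeanVector = {0 0};              /* row vector of means */
call randseed(1);                /* seed once, outside the loop */
U = j(nrow(A), 2, .);            /* one row of draws per person */
do i = 1 to nrow(A);
   CovMatrix = diag(A[i, 2:3]);  /* person i's variances on the diagonal */
   U[i, ] = randnormal(1, MeanVector, CovMatrix);
end;
print U;
quit;
```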
I am trying to use call randgen within proc IML to create 10 random numbers that follow a Weibull distribution with certain parameters. Here is the code I am using (obviously there will be more than just one loop but I am just testing right now):
do i = 1 to 1;
Call randgen(Rands[i,1:Ntimes], 'Weibull', alpha[i], beta[i]);
print (rands[1,1:Ntimes]);
print (alpha[i]) (beta[i]);
end;
For this example Ntimes = 10, alpha[i] = 4.5985111, and beta[i] = 131.79508. My issue is that each of the 10 iterations/random numbers comes back as 1. I used the rweibull function in R with the same parameters and got results that made sense so I am thinking it has something to do with SAS or my code rather than an issue with the parameters. Am I using the Randgen call correctly? Does anyone know why the results would be coming out this way?
This works:
proc iml;
alpha=j(10);
beta=j(10);
alpha[1]=4.59;
beta[1] = 131.8;
Ntimes=10;
rands = j(1,10);
print (rands);
do i = 1 to 1;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
end;
quit;
I don't think you can use Rands[1:Ntimes] that way. I think you would want to fill a temporary matrix and then assign that matrix's values to a row of a larger matrix.
For example:
allRands=j(10,10);
do i = 1 to 10;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
allRands[i,1:10]=Rands;
end;
print(allRands);
Actually, unless you are using an ancient version of SAS/IML, you don't need any loops. Since SAS/IML 12.3, the RANDGEN subroutine accepts a vector of parameters. In your case, define a vector for the alpha and beta parameters. Let's say there are 'Nparam' parameters. Then allocate an N x Nparam matrix to hold the results. With a single call to RANDGEN, you can fill the matrix so that the i_th column is a sample of size N from Weibull(alpha[i], beta[i]), as shown in the following example:
proc iml;
Nparam = 8; N = 1000;
alpha= 1:Nparam; /* assign parameter values */
beta = 10 + (Nparam:1);
rands = j(N,Nparam);
call randgen(rands, 'WEIB', alpha,beta); /* SAS/IML 12.1 */
/* DONE. The i_th column is a sample from Weibull(alpha[i], beta[i])
TEST IT: Compute the mean of each sample: */
mean = mean(rands); std = std(rands);
print (alpha//beta//mean//std)[r={"alpha" "beta" "mean" "std"}];
/* TEST IT: Plot the distribution of each sample (SAS/IML 12.3) */
title "First param"; call histogram(rands[,1]);
title "Last param"; call histogram(rands[,Nparam]);
I've been trying to find the simplest way to generate simulated time series datasets in SAS. I initially experimented with the LAG function, but it requires input data, so it is probably not the best way to go. (See this question: SAS: Using the lag function without a set statement (to simulate time series data.))
Has anyone developed a macro or dataset that enables time series to be generated with an arbitrary number of AR and MA terms? What is the best way to do this?
To be specific, I'm looking to generate what SAS calls an ARMA(p,q) process, where p denotes the autoregressive component (lagged values of the dependent variable), and q is the moving average component (lagged values of the error term).
Thanks very much.
I have developed a macro to attempt to answer this question, but I'm not sure whether this is the most efficient way of doing this. Anyway, I thought it might be useful to someone:
%macro TimeSeriesSimulation(numDataPoints=100, model=y=e,outputDataSetName=ts, maxLags=10);
data &outputDataSetName (drop=j);
array lagy(&maxlags) _temporary_;
array lage(&maxlags) _temporary_;
/*Initialise values*/
e = 0;
y=0;
t=1;
do j = 1 to &maxlags;
lagy(j) = 0;
lage(j) = 0;
end;
output;
do t = 2 to &numDataPoints; /*Change this for number of observations*/
/*SPECIFY MODEL HERE*/
e = rannorm(-1); /*Draw from a N(0,1)*/
&model;
/*Update values of lags on the moving average and autoregressive terms*/
do j = &maxlags-1 to 1 by -1; /*Note you have to do this backwards because otherwise you cascade the current value to all past values!*/
lagy(j+1) = lagy(j);
lage(j+1) = lage(j);
end;
lagy(1) = y;
lage(1) = e;
output;
end;
run;
%mend;
/*Example 1: Unit root*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=lagy(1)+e)
/*Example 2: Simple process with AR and MA components*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=0.5*lagy(1)+0.5*lage(1)+e)
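As a further illustration, the call below simulates an ARMA(2,1) process with the macro above and then fits the same orders as a rough sanity check. The coefficients and the data set name arma21 are made up for this sketch, and the PROC ARIMA step assumes SAS/ETS is licensed.

```sas
/* Example 3 (hypothetical coefficients): ARMA(2,1) */
%TimeSeriesSimulation(numDataPoints=1000,
                      model=y=0.6*lagy(1)-0.3*lagy(2)+0.4*lage(1)+e,
                      outputDataSetName=arma21)

/* Fit an ARMA(2,1) to the simulated series; requires SAS/ETS */
proc arima data=arma21;
   identify var=y noprint;
   estimate p=2 q=1 noconstant method=ml;
run;
quit;
```

The AR estimates should land near 0.6 and -0.3. PROC ARIMA writes the moving average polynomial with a minus sign, so the MA estimate should come out near -0.4 rather than 0.4.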
I've got something like the following:
proc means data = ... missing;
class 1 2 3 4 5;
var a b;
output sum=;
run;
This does what I want it to do, except for the fact that it is very difficult to differentiate between a missing value that represents a total, and a missing value that represents a missing value. For example, the following would appear in my output:
1 2 3 4 5 type sumA sumB
. . . . . 0 num num
. . . . . 1 num num
Ways I can think of handling this:
1) Change missings to a cardinal value prior to proc means. This is definitely doable...but not exactly clean.
2) Format the missings to something else prior, and then use preloadfmt? This is a bit of a pain...I'd really rather not.
3) Somehow use the proc means-generated variable type to determine whether the missing is a missing or a total
4) Other??
I feel like this is clearly a common enough problem that there must be a clean, easy way, and I just don't know what it is.
Option 3, for sure. _TYPE_ is simply a binary number with a 1 for each class variable, in order, that is included in the current row and a 0 for each one that is collapsed into a total. You can use the CHARTYPE option to ask for it to be given explicitly as a string ('01101110' etc.), or work with math if that's more your style.
How exactly you use this depends on what you're trying to accomplish. Rows that have a missing value on them will have a type that suggests a class variable should exist, but doesn't. So for example:
data want;
set have; *post-proc means assuming used CHARTYPE option;
array classvars a b c d e; *whatever they are;
hasmissing=0;
do _t = 1 to dim(classvars);
if char(_type_,_t) = '1' and classvars[_t] = . then hasmissing=1;
end;
*or;
if cmiss(of classvars[*]) = countc(_type_,'0') then hasmissing=0;
else hasmissing=1; *number of 0s = number of missings = total row, otherwise not;
run;
That's a brute force application, of course. You may also be able to identify it based on the number of missings, if you have a small number of types requested. For example, let's say you have 3 class variables (so 0 to 7 values for type), and you only asked for the 3 way combination (7, '111') and the 3 two way combination 'totals' (6,5,3, ie, '110','101','011'). Then:
data want;
set have;
if (_type_=7 and cmiss(of a b c) = 0) or (cmiss(of a b c) = 1) then ... ; *either base row or total row, no missings;
else ... ; *has at least one missing;
run;
Depending on your data, NMISS may also work. That checks to see if the number of missings is appropriate for the type of data.
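To make the CHARTYPE approach concrete, here is a small sketch that uses the SASHELP.CARS sample data in place of the real input; the data set names sums and labelled are assumptions. Because the class variables here are character, a '0' in position t of _TYPE_ can be turned directly into a visible label, while genuinely missing class values (kept by the MISSING option) stay blank.

```sas
/* CHARTYPE makes _TYPE_ a character string with one digit per class variable */
proc means data=sashelp.cars noprint missing chartype;
   class origin type;
   var msrp invoice;
   output out=sums sum=;
run;

data labelled;
   set sums;
   array classvars[2] origin type;   /* same order as the CLASS statement */
   do _t = 1 to dim(classvars);
      /* '0' in position _t: this class variable was summed over */
      if char(_type_, _t) = '0' then classvars[_t] = 'TOTAL';
   end;
   drop _t;
run;
```

The label has to fit within the length of the shortest class variable, otherwise it is truncated.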
Joe's strategy, modified slightly for my exact problem, because it may be useful to somebody at some point in the future.
data want;
set have;
array classvars a b c d e;
do _t = 1 to dim(classvars);
if char(_type_,_t) = '1' and (strip(classvars[_t]) = "" or strip(classvars[_t]) = ".") then classvars[_t] = "TOTAL";
end;
run;
The rationale for the changes is as follows:
1) I'm working with (mostly) character variables, not numeric.
2) I'm not interested in whether a row has any missing values or not, as those are very frequent, and I want to keep them. Instead, I just want the output to differentiate between the missings and the totals, which I have accomplished by replacing the flagged values with a label that indicates a total.
I am trying to compute using two loops. But I am not very familiar with loop elements.
Here is my data:
data try;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
What I am trying to compute is this: for each 'T' value, all elements of a, b, and c need to be used in the equation. Then I create variables v1-v6 to hold the results of the equation for each of T1-T6. After that, I create CS to sum all the elements of v.
So my result dataset will look like this:
rs T a b c v1 v2 v3 v4 v5 v6 CS
0 600 sum of v1
1 600 0.02514 667.53437 0.1638 sum of v2
2 600 0.2766 724.60233 0.30162 sum of v3
3 610 0.01592 792.34628 0.21354 sum of v4
4 615.2869 0.03027 718.30377 0.22097 sum of v5
5 636.0273 0.01967 705.45965 0.16847 sum of v6
I wrote the code below to do this but got errors. Mainly, I am not sure how to use i and j properly to link all elements of the variables. Can someone point out what I got wrong? I am aware that maybe I should not use the sum function to accumulate the elements of a variable, but I am not sure which function to use.
data try3;
set try;
retain v1-v6;
retain t a b c;
array v(*) v1-v6;
array var(*) t a b c;
cs=0;
do i=1 to 6;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2));
cs[i]=sum(of v[i,j]-v[i,j]);
end;
end;
run;
For example, v[1,1] = 0 because there are no values for a, b, and c in the first row.
v[1,2] = (2.89*0.02514**2*(1-0.1638)) / ((0.1638+exp(1.7*0.02514*(600-667.53437))) * ((1+exp(-1.7*0.02514*(600-667.53437)))**2)).
v[1,3] = (2.89*0.2766**2*(1-0.30162)) / ((0.30162+exp(1.7*0.2766*(600-724.60233))) * ((1+exp(-1.7*0.2766*(600-724.60233)))**2)).
v[1,4] uses the next row's values of a, b, and c with the same t as t[1], and so on until the last row. Together these make up v1. Then I need to sum all the elements of v1, v[1,1] + v[1,2] + v[1,3] + ... + v[1,6], to get cs[1].
The SAS language isn't that good at doing these kinds of things, which are essentially matrix calculations. The DATA step normally processes one observation at a time, though you can carry calculations over using the RETAIN statement. It is possible that you could get a cleaner result than this if you had access to PROC IML (which does matrix calculations natively), but assuming that you don't have access to IML, you need to do something like the following. I'm not 100% sure that it is what you need, but I think it is along the right lines:
data try;
infile cards missover;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
data try4(rename=(aa=a bb=b cc=c css=cs tt=t vv1=v1 vv2=v2 vv3=v3 vv4=v4 vv5=v5 vv6=v6));
* Construct arrays into which we will read all of the records;
array t(6);
array a(6);
array b(6);
array c(6);
array v(6,6);
array cs(6);
* Read all six records;
do i=1 to 6;
set try(rename=(t=tt a=aa b=bb c=cc));
t[i] = tt;
a[i] = aa;
b[i] = bb;
c[i] = cc;
end;
* Now do the calculation, which involves values from each
row at each iteration;
do i=1 to 6;
cs[i]=0;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2)));
cs[i]+v[i,j];
end;
* Then output the values for this iteration;
tt=t[i];
aa=a[i];
bb=b[i];
cc=c[i];
css=cs[i];
vv1=v[i,1];
vv2=v[i,2];
vv3=v[i,3];
vv4=v[i,4];
vv5=v[i,5];
vv6=v[i,6];
keep tt aa bb cc vv1-vv6 css;
output try4;
end;
run;
Note that I have to construct arrays of known size, that is you have to know how many input records there are.
The first half of the DATA step constructs arrays into which the values from the input data set are read. We read all of the records, and then we do all of the calculations, since we have all of the values in memory in the arrays.
There is some fiddling with RENAMES so that you can keep the array names t, a, b, c etc but still have variables named a, b, c etc in the output data set.
So hopefully that might help you along a bit. Either that or confuse you because I've misunderstood what you're trying to do!
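For readers who do have PROC IML, a rough sketch of the same calculation follows. It assumes the try data set was read with MISSOVER as above, so the first row has missing a, b, and c. That row is skipped explicitly and its column of v is left missing, which is my reading of the question rather than something stated in it. The names X, tt, aa, bb, and cc are local to this sketch.

```sas
proc iml;
use try;
read all var {t a b c} into X;   /* columns: t, a, b, c */
close try;

tt = X[,1];  aa = X[,2];  bb = X[,3];  cc = X[,4];
n  = nrow(X);
v  = j(n, n, .);   /* v[i,j]: row i's t combined with row j's a, b, c */
cs = j(n, 1, 0);   /* cs[i]: sum of the nonmissing v[i,j] */

do i = 1 to n;
   do j = 1 to n;
      if aa[j] ^= . then do;     /* skip rows that have no a, b, c */
         z = 1.7 # aa[j] # (tt[i] - bb[j]);
         v[i,j] = 2.89 # aa[j]##2 # (1 - cc[j])
                  / ((cc[j] + exp(z)) # (1 + exp(-z))##2);
         cs[i] = cs[i] + v[i,j];
      end;
   end;
end;

print tt v cs;
quit;
```

Unlike the DATA step version, nothing here depends on knowing the number of records in advance, since n is taken from the data.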