Fortran to SAS code conversion - fortran

I want to create 1-, 2-, and 3-dimensional variables/arrays inside PROC IML. My code looks like the following:
proc iml;
start Mean1(x); /*this is 1 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
proc iml;
start Mean2(x); /*this is 2 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
proc iml;
start Mean3(x); /*this is 3 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
I tried it like this:
proc iml;
declare double x[dim(n),dim(n)];
start Mean2(x); /*this is 2 dimension variable or array*/
Mean1(x)=sum(x)/dim(a, x);
finish;
But it's not working. Could you help me?

There are a few things to know here.
SAS/IML matrices are 1-indexed and stored in row-major order, as in C, not column-major as in Fortran.
To my knowledge, there are no 3-dimensional arrays in IML, though I may be mistaken.
All numbers in SAS are doubles.
IML has subscript reduction operators that make means easy and very fast.
To declare a matrix/array, use the J(nrow,ncol,fill) function:
proc iml;
x = J(10,5,1); /*Declare a 10x5 matrix filled with 1s*/
x = normal(x); /*Fills matrix X with random numbers, uses the values in X as the seed*/
mean_all = x[:]; /*mean over all values in x*/
mean_col = x[:,];/*mean of each column */
mean_row = x[,:];/*mean of each row */
print mean_all;
print mean_col;
print mean_row;
quit;
I highly recommend going through the IML documentation. http://support.sas.com/documentation/onlinedoc/iml/index.html
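As a minimal sketch of how the Mean modules from the question could be written: a function module hands back its result with RETURN instead of assigning to the module name. The module names MeanAll and MeanCols and the test matrices are illustrative, not from the original post.

```sas
proc iml;
/* Function module: mean of all elements; works for a vector or a matrix */
start MeanAll(x);
   return( x[:] );
finish;

/* Function module: mean of each column, returned as a row vector */
start MeanCols(x);
   return( x[:,] );
finish;

v = {1 2 3 4};          /* 1 x 4 row vector */
m = {1 2 3, 4 5 6};     /* 2 x 3 matrix */

mean_v    = MeanAll(v);   /* mean of the four elements */
mean_m    = MeanAll(m);   /* mean of all six elements */
mean_cols = MeanCols(m);  /* one mean per column */
print mean_v mean_m mean_cols;
quit;
```

Because the same module accepts a vector or a matrix, separate 1- and 2-dimensional versions are not needed.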

Related

Perform calculations using n and nmiss values

I have the following SAS PROC MEANS statement that works great as it is.
proc means data=MBA_NODUP_APPLICANT_&TERM. missing nmiss n mean median p10 p90 fw = 8;
where ENR = 1;
by SRC_TYPE;
var gmattotal greverb2 grequant2 greanwrt;
run;
However, I am trying to add a new variable that calculates nmiss/(nmiss+n). I don't see any examples of this online, but I also see nothing that says it cannot be done.
To calculate the percent missing, which is what your formula means, just use the OUTPUT statement to generate a dataset with the NMISS and N values. Then add a step to do the arithmetic yourself.
Or you could create a new binary variable using the MISSING() function and take the MEAN of that. The mean of a 1/0 variable is the same as the proportion of values that are 1 (TRUE).
Example:
data test;
set sashelp.cars;
missing_cylinders=missing(cylinders);
run;
proc means data=test nmiss n mean;
var cylinders missing_cylinders ;
run;
So 2/428 is a little less than 0.5%.
The MEANS Procedure

Variable             N Miss     N        Mean
---------------------------------------------
Cylinders                 2   426   5.8075117
missing_cylinders         0   428   0.0046729
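The first suggestion, using the OUTPUT statement and then doing the arithmetic in a separate step, could look roughly like the sketch below. The data set names stats and pct_missing are placeholders chosen for illustration, and sashelp.cars stands in for the real data.

```sas
/* Step 1: write NMISS and N for the analysis variable to a data set */
proc means data=sashelp.cars noprint;
   var cylinders;
   output out=stats nmiss=nmiss n=n;
run;

/* Step 2: compute the proportion missing from the two statistics */
data pct_missing;
   set stats;
   pct_missing = nmiss / (nmiss + n);
run;

proc print data=pct_missing;
run;
```

For several analysis variables, each statistic needs its own output variable names (for example through the AUTONAME option of the OUTPUT statement), and a BY statement produces one output row per BY group.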

How to construct a covariance matrix using PROC IML in SAS

I am writing an assignment in SAS and I'm trying to create a covariance matrix from my data:
PROC IML;
USE nydata;
varNames = {"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"};
READ ALL INTO X; *read the data into the X matrix;
N = NROW(X); *N = number of observations (NROW returns the number of rows);
ONE = J(N,1,1); *J(nrow,ncol,value) creates a matrix of the given dimensions, here an N x 1 vector of ones;
A=ONE`; *vector transpose, which can also be written as T(ONE);
C= A*X; *row vector containing the sum of each column;
MEAN = C/N; *row vector containing the column means;
MEANS = ONE*MEAN;
XM= X-MEANS; *mean-corrected matrix;
SSCPM=XM`*XM; *sum of squares and cross products matrix;
DF=N-1; *degrees of freedom;
S=SSCPM/DF; *covariance matrix;
D=DIAG(S); *keep only the individual variances;
XS=XM*SQRT(INV(D)); *XS is the standardized data. SQRT is the square root and INV is the matrix inverse;
R=t(XS)*XS/(N-1); *compute the correlation matrix;
GV=DET(S); *generalized variance, computed as the determinant of the covariance matrix;
create kovarians from S[colname={"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"}
rowname={"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"}]; /** create data set **/
append from S;
close S;
QUIT;
My data looks something like this. There are no more columns than the ones you see.
I get the following error message:
590 append from S;
ERROR: Number of columns in S does not match with the number of variables in the data set.
statement : APPEND at line 590 column 6
591 close S;
NOTE: Cannot close WORK.S; it is not open.
592 QUIT;
NOTE: Exiting IML.
Would someone pretty please tell me what I'm doing wrong?
Judging from the error message (I don't have the nydata data set or test data to reproduce it), the number of variables in the output data set is not equal to the number of columns in S.
Hope this helps!
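One possible reading of the mismatch, offered as an assumption rather than a confirmed diagnosis: ROWNAME= on the CREATE statement adds a character variable to the data set, so APPEND needs the same ROWNAME= matrix, and CLOSE takes the data set name rather than the matrix name. The sketch below uses sashelp.class and three of its numeric variables as stand-ins for nydata, and the built-in COV function in place of the manual computation.

```sas
proc iml;
varNames = {"Age" "Height" "Weight"};   /* stand-in variables */
use sashelp.class;
read all var varNames into X;           /* read only the listed columns */
close sashelp.class;

S = cov(X);                             /* sample covariance matrix */
rowLabels = t(varNames);                /* column vector of row labels */

/* ROWNAME= appears on both CREATE and APPEND so the column counts agree */
create kovarians from S[colname=varNames rowname=rowLabels];
append from S[rowname=rowLabels];
close kovarians;                        /* close the data set, not the matrix */
quit;
```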

SAS Randgen call with Weibull distribution

I am trying to use call randgen within proc IML to create 10 random numbers that follow a Weibull distribution with certain parameters. Here is the code I am using (obviously there will be more than just one loop but I am just testing right now):
do i = 1 to 1;
Call randgen(Rands[i,1:Ntimes], 'Weibull', alpha[i], beta[i]);
print (rands[1,1:Ntimes]);
print (alpha[i]) (beta[i]);
end;
For this example Ntimes = 10, alpha[i] = 4.5985111, and beta[i] = 131.79508. My issue is that each of the 10 iterations/random numbers comes back as 1. I used the rweibull function in R with the same parameters and got results that made sense so I am thinking it has something to do with SAS or my code rather than an issue with the parameters. Am I using the Randgen call correctly? Does anyone know why the results would be coming out this way?
This works:
proc iml;
alpha=j(10);
beta=j(10);
alpha[1]=4.59;
beta[1] = 131.8;
Ntimes=10;
rands = j(1,10);
print (rands);
do i = 1 to 1;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
end;
quit;
I don't think you can use Rands[1:Ntimes] that way. I think you would want to fill a temporary matrix and then copy that matrix's values into a larger matrix.
For example:
allRands=j(10,10);
do i = 1 to 10;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
allRands[i,1:10]=Rands;
end;
print(allRands);
Actually, unless you are using an ancient version of SAS/IML, you don't need any loops. Since SAS/IML 12.3, the RANDGEN subroutine accepts a vector of parameters. In your case, define a vector for the alpha and beta parameters. Let's say there are 'Nparam' parameters. Then allocate an N x Nparam matrix to hold the results. With a single call to RANDGEN, you can fill the matrix so that the i_th column is a sample of size N from Weibull(alpha[i], beta[i]), as shown in the following example:
proc iml;
Nparam = 8; N = 1000;
alpha= 1:Nparam; /* assign parameter values */
beta = 10 + (Nparam:1);
rands = j(N,Nparam);
call randgen(rands, 'WEIB', alpha,beta); /* vector of parameters */
/* DONE. The i_th column is a sample from Weibull(alpha[i], beta[i])
TEST IT: Compute the mean of each sample: */
mean = mean(rands); std = std(rands);
print (alpha//beta//mean//std)[r={"alpha" "beta" "mean" "std"}];
/* TEST IT: Plot the distribution of each sample (SAS/IML 12.3) */
title "First param"; call histogram(rands[,1]);
title "Last param"; call histogram(rands[,Nparam]);
quit;

Simulating ARMA/ARIMA time series processes in SAS

I've been trying to find the simplest way to generate simulated time series datasets in SAS. I was initially experimenting with the LAG operator, but this requires input data, so it is probably not the best way to go. (See this question: SAS: Using the lag function without a set statement (to simulate time series data.))
Has anyone developed a macro or dataset that enables time series to be generated with an arbitrary number of AR and MA terms? What is the best way to do this?
To be specific, I'm looking to generate what SAS calls an ARMA(p,q) process, where p denotes the autoregressive component (lagged values of the dependent variable), and q is the moving average component (lagged values of the error term).
Thanks very much.
I have developed a macro to attempt to answer this question, but I'm not sure whether this is the most efficient way of doing this. Anyway, I thought it might be useful to someone:
%macro TimeSeriesSimulation(numDataPoints=100, model=y=e,outputDataSetName=ts, maxLags=10);
data &outputDataSetName (drop=j);
array lagy(&maxlags) _temporary_;
array lage(&maxlags) _temporary_;
/*Initialise values*/
e = 0;
y=0;
t=1;
do j = 1 to &maxlags; /*Initialise every lag slot, whatever maxLags is set to*/
lagy(j) = 0;
lage(j) = 0;
end;
output;
do t = 2 to &numDataPoints; /*Change this for number of observations*/
/*SPECIFY MODEL HERE*/
e = rannorm(-1); /*Draw from a N(0,1)*/
&model;
/*Update values of lags on the moving average and autoregressive terms*/
do j = &maxlags-1 to 1 by -1; /*Note you have to do this backwards because otherwise you cascade the current value to all past values!*/
lagy(j+1) = lagy(j);
lage(j+1) = lage(j);
end;
lagy(1) = y;
lage(1) = e;
output;
end;
run;
%mend;
/*Example 1: Unit root*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=lagy(1)+e)
/*Example 2: Simple process with AR and MA components*/
%TimeSeriesSimulation(numDataPoints=1000, model=y=0.5*lagy(1)+0.5*lage(1)+e)
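As a further usage sketch, a higher-order model only needs more lag terms in the model= argument, as long as the highest lag does not exceed maxLags. The coefficients and the data set name arma21 below are arbitrary illustrative choices, and the call assumes the macro above has already been compiled in the session.

```sas
/* ARMA(2,1): y(t) = 0.6*y(t-1) - 0.2*y(t-2) + e(t) + 0.3*e(t-1) */
%TimeSeriesSimulation(numDataPoints=500,
                      model=y=0.6*lagy(1)-0.2*lagy(2)+0.3*lage(1)+e,
                      outputDataSetName=arma21)

/* Inspect the first few simulated observations */
proc print data=arma21(obs=10);
run;
```

Because the macro draws errors with rannorm(-1), each run produces a different series; a fixed positive seed would make the output reproducible.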

SAS computation using double loops

I am trying to compute using two loops. But I am not very familiar with loop elements.
Here is my data:
data try;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
What I am trying to compute is this: for each 't' value, all elements of a, b, and c need to be used in the equation. Then I create variables v1-v6 to hold the results of the equation for each of t1-t6. After that, I create CS as the sum of all the elements of v.
So my result dataset will look like this:
rs T a b c v1 v2 v3 v4 v5 v6 CS
0 600 sum of v1
1 600 0.02514 667.53437 0.1638 sum of v2
2 600 0.2766 724.60233 0.30162 sum of v3
3 610 0.01592 792.34628 0.21354 sum of v4
4 615.2869 0.03027 718.30377 0.22097 sum of v5
5 636.0273 0.01967 705.45965 0.16847 sum of v6
I wrote the code below to do this but got errors. Mainly, I am not sure how to use i and j properly to link all elements of the variables. Can someone point out what I got wrong? I am aware that maybe I should not use the SUM function to add up the elements of a variable, but I am not sure which function to use.
data try3;
set try;
retain v1-v6;
retain t a b c;
array v(*) v1-v6;
array var(*) t a b c;
cs=0;
do i=1 to 6;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2));
cs[i]=sum(of v[i,j]-v[i,j]);
end;
end;
run;
For example, v[1,1] = 0 because there are no values for a, b, and c.
v[1,2] = (2.89*0.02514**2*(1-0.1638)) / ((0.1638+exp(1.7*0.02514*(600-667.53437))) * ((1+exp(-1.7*0.02514*(600-667.53437)))**2)).
v[1,3] = (2.89*0.2766**2*(1-0.30162)) / ((0.30162+exp(1.7*0.2766*(600-724.60233))) * ((1+exp(-1.7*0.2766*(600-724.60233)))**2)).
v[1,4] uses the next row's values of a, b, and c, but t stays the same as t[1], and so on until the last row. That gives v1. Then I need to sum all the elements of v1, v[1,1] + v[1,2] + v[1,3] + ... + v[1,6], to get cs[1].
The SAS language isn't that good at doing these kinds of things, which are essentially matrix calculations. The DATA step normally processes one observation at a time, though you can carry calculations over using the RETAIN statement. It is possible that you could get a cleaner result than this if you had access to PROC IML (which does matrix calculations natively), but assuming that you don't have access to IML, you need to do something like the following. I'm not 100% sure that it is what you need, but I think it is along the right lines:
data try;
infile cards missover;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
data try4(rename=(aa=a bb=b cc=c css=cs tt=t vv1=v1 vv2=v2 vv3=v3 vv4=v4 vv5=v5 vv6=v6));
* Construct arrays into which we will read all of the records;
array t(6);
array a(6);
array b(6);
array c(6);
array v(6,6);
array cs(6);
* Read all six records;
do i=1 to 6;
set try(rename=(t=tt a=aa b=bb c=cc));
t[i] = tt;
a[i] = aa;
b[i] = bb;
c[i] = cc;
end;
* Now do the calculation, which involves values from each
row at each iteration;
do i=1 to 6;
cs[i]=0;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2)));
cs[i]+v[i,j];
end;
* Then output the values for this iteration;
tt=t[i];
aa=a[i];
bb=b[i];
cc=c[i];
css=cs[i];
vv1=v[i,1];
vv2=v[i,2];
vv3=v[i,3];
vv4=v[i,4];
vv5=v[i,5];
vv6=v[i,6];
keep tt aa bb cc vv1-vv6 css;
output try4;
end;
run;
Note that I have to construct arrays of known size; that is, you have to know how many input records there are.
The first half of the DATA step constructs arrays into which the values from the input data set are read. We read all of the records, and then we do all of the calculations, since we have all of the values in memory in the matrices.
There is some fiddling with RENAME= options so that you can keep the array names t, a, b, c etc. but still have variables named a, b, c etc. in the output data set.
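Written out, the calculation in the nested loops is the following, as read directly from the code. The index i runs over the t values, j runs over the rows of a, b, and c, and the upper limit of 6 is the number of input records.

```latex
v_{ij} = \frac{2.89\, a_j^{2}\,(1 - c_j)}
              {\left(c_j + e^{1.7\, a_j (t_i - b_j)}\right)
               \left(1 + e^{-1.7\, a_j (t_i - b_j)}\right)^{2}},
\qquad
CS_i = \sum_{j=1}^{6} v_{ij}
```

The first record has no a, b, or c, so v[i,1] is missing for every i. The sum statement `cs[i]+v[i,j];` skips missing values, so that term should not make CS missing.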
So hopefully that might help you along a bit. Either that or confuse you because I've misunderstood what you're trying to do!