How to construct a covariance matrix using PROC IML in SAS

I am writing an assignment in SAS and I'm trying to create a covariance matrix from my data:
PROC IML;
USE nydata;
varNames = {"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"};
READ ALL INTO X; *read all numeric variables into the X matrix;
N = NROW(X); *N = number of observations: NROW returns the number of rows;
ONE = J(N,1,1); *J(nrow,ncol,value), here an Nx1 vector of ones: J creates a matrix of a given dimension;
A=ONE`; *vector transpose, which can also be written as T(ONE);
C= A*X; *gives a 1 x p matrix containing the sum of each column;
MEAN = C/N; *matrix containing the column means;
MEANS = ONE*MEAN;
XM= X-MEANS; *mean-centered matrix;
SSCPM=XM`*XM; *sum of squares and cross products matrix;
DF=N-1; *degrees of freedom;
S=SSCPM/DF; *covariance matrix;
D=DIAG(S); *keep only the individual variances on the diagonal;
XS=XM*SQRT(INV(D)); *XS = standardized data: SQRT is the square root and INV is the matrix inverse;
R=t(XS)*XS/(N-1); *compute the correlation matrix;
GV=DET(S); *generalized variance: the determinant of the covariance matrix;
create kovarians from S[colname={"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"}
rowname={"aa10i" "ac10a" "ba10" "bc20" "ca10" "sex" "cityrur" "edu3" "hinc3rel" "age" "ga10j" "ca10bin"}]; /** create data set **/
append from S;
close S;
QUIT;
My data looks something like this. There are no more columns than the ones you see.
I get the following error message:
590 append from S;
ERROR: Number of columns in S does not match with the number of variables in the data set.
statement : APPEND at line 590 column 6
591 close S;
NOTE: Cannot close WORK.S; it is not open.
592 QUIT;
NOTE: Exiting IML.
Would someone pretty please tell me what I'm doing wrong?

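Reading the error, and since I do not have the NY data set or test data generated from a DATA step, I assume that the number of variables in the data set is not equal to the number of columns in S. Some likely causes to check:
If nydata contains any other numeric variables, READ ALL INTO X reads those too, because varNames is never used. READ ALL VAR varNames INTO X; reads only the 12 listed variables.
Because the CREATE statement uses ROWNAME=, the data set gets an extra character variable for the row names, so the APPEND statement has to supply them as well. Use the same named matrix in both statements, e.g. CREATE kovarians FROM S[COLNAME=varNames ROWNAME=varNames]; and APPEND FROM S[ROWNAME=varNames];
CLOSE takes the data set name, not the matrix name: CLOSE kovarians; That is why the log says WORK.S is not open.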
Hope this helps!
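Once the data set issue is fixed, it can help to check the matrix algebra itself on a tiny example. The following is a minimal pure-Python sketch (not SAS) of the same steps as the IML code: column means, mean-centering, SSCP / (N - 1), rescaling to correlations, and the determinant. The function names and the 3 x 2 data set are made up for illustration.

```python
# Pure-Python sketch of the PROC IML steps (not SAS code).

def covariance(X):
    """X is a list of N rows, each a list of p numbers (like the IML matrix X)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]   # MEAN = ONE`*X / N
    xm = [[row[j] - means[j] for j in range(p)] for row in X]   # XM = X - ONE*MEAN
    # S = XM` * XM / (N - 1)
    return [[sum(r[i] * r[k] for r in xm) / (n - 1) for k in range(p)]
            for i in range(p)]

def correlation(S):
    """R = D^(-1/2) * S * D^(-1/2), where D holds the variances on its diagonal."""
    p = len(S)
    return [[S[i][k] / (S[i][i] * S[k][k]) ** 0.5 for k in range(p)] for i in range(p)]

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting (GV = DET(S))."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det

X = [[1, 2], [2, 4], [3, 7]]
S = covariance(X)     # approximately [[1.0, 2.5], [2.5, 6.333]]
R = correlation(S)
GV = determinant(S)   # approximately 1 * 6.333 - 2.5**2 = 0.0833
print(S, R, GV)
```

Note that R is computed directly from S here; this gives the same matrix as the IML route through the standardized data XS, because XS` * XS / (N - 1) equals D^(-1/2) * S * D^(-1/2).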

Related

SAS Studio: Finding Values of Column 2 in Column 1 Until Column 2 is Specific Value

I have a simple question that I can't seem to answer. I HAVE a large data set where I am searching for values of column 2 that are found in column 1, until column 2 is a specific value. Sounds like a DO loop but I don't have much experience using them. Please see image as this likely will explain better.
Essentially, I have a "starting" point (with the first_match flag=1). Then, I want to grab the value of column 2 in this row (B in this example). Next, I want to search for this value (B) in column 1. Once I find that row (with column 1 = B & column 2 = C), I again grab the value in column 2 (C). Again, I find where in column 1 this new value occurs and obtain the corresponding value of column 2. I repeat this process until column 2 has a value of Z. That's my stopping point. The WANT table shows my desired output.
My apologies if the above is confusing, but it seems like a simple exercise that I can't seem to solve. Any help would be greatly appreciated. Glad to supply further clarification as well.
Have & Want
I have tried PROC SQL to create flags and grab the appropriate rows, but the code is extremely bulky and doesn't seem efficient. Also, the example I laid out has a desired output table with 3 rows. This may not be the case as the desired output could contain between 1 and 10 rows.
This question has been asked and answered previously.
Path traversal can be done using a DATA Step hash object.
Example:
data have;
length vertex1 vertex2 $8;
input vertex1 vertex2;
datalines;
A B
X B
D B
E B
B C
Q C
C Z
Z X
;
data want(keep=vertex1 vertex2 crumb);
length vertex1 vertex2 $8 crumb $1;
declare hash edges ();
edges.defineKey('vertex1');
edges.defineData('vertex2', 'crumb');
edges.defineDone();
crumb = ' ';
do while (not last_edge);
set have end=last_edge;
edges.add();
end;
trailhead = 'A';
vertex1 = trailhead;
do while (0 = edges.find());
if not missing(crumb) then leave;
output;
edges.replace(key:vertex1, data:vertex2, data:'*');
vertex1 = vertex2;
end;
if not missing(crumb) then output;
stop;
run;
All paths in the data can be discovered with an additional outer loop iterating (HITER) over a hash of the vertex1 values.
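The same lookup idea can be sketched in Python (not SAS), with a dict standing in for the hash keyed on vertex1. This is only an illustration under two assumptions: each column-1 value appears once (as with a hash without MULTIDATA), and, unlike the SAS answer, which stops only when it revisits a vertex, the sketch also stops at 'Z' as the question asks. The function name follow_path is hypothetical.

```python
def follow_path(edges, start, stop="Z"):
    """Follow column 1 -> column 2 links from start until column 2 equals stop."""
    path = []
    visited = set()            # plays the role of the '*' crumb in the SAS hash
    node = start
    while node in edges and node not in visited:
        visited.add(node)
        nxt = edges[node]
        path.append((node, nxt))
        if nxt == stop:        # the question's stopping rule: column 2 = 'Z'
            break
        node = nxt
    return path

# Same rows as the HAVE data set in the answer (vertex1 -> vertex2).
edges = {"A": "B", "X": "B", "D": "B", "E": "B",
         "B": "C", "Q": "C", "C": "Z", "Z": "X"}
print(follow_path(edges, "A"))   # [('A', 'B'), ('B', 'C'), ('C', 'Z')]

# All paths: an outer loop over every vertex1 value, like iterating with HITER.
for start in edges:
    print(start, follow_path(edges, start))
```

The visited set keeps the loop from running forever if the data contain a cycle that never reaches the stop value.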

Perform calculations using n and nmiss values

I have the following SAS PROC MEANS statement that works great as it is.
proc means data=MBA_NODUP_APPLICANT_&TERM. missing nmiss n mean median p10 p90 fw = 8;
where ENR = 1;
by SRC_TYPE;
var gmattotal greverb2 grequant2 greanwrt;
run;
However, I am trying to add a new variable that calculates nmiss/(nmiss+n). I don't see any examples of this online, but also nothing that says it cannot be done.
To calculate the percent missing, which is what your formula means, just use the OUTPUT statement to generate a dataset with the NMISS and N values. Then add a step to do the arithmetic yourself.
Or you could create a new binary variable using the MISSING() function and take the MEAN of that. The mean of a 1/0 variable is the same as the proportion of values that were 1 (TRUE).
Example:
data test;
set sashelp.cars;
missing_cylinders=missing(cylinders);
run;
proc means data=test nmiss n mean;
var cylinders missing_cylinders ;
run;
So 2/428 is a little less than 0.5%.
The MEANS Procedure
Variable             N Miss      N         Mean
------------------------------------------------
Cylinders                 2    426    5.8075117
missing_cylinders         0    428    0.0046729
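Both approaches compute the same number, which a short Python sketch (not SAS) can confirm. Here None stands in for a SAS missing value, and the column is a hypothetical list shaped like CYLINDERS in SASHELP.CARS, with 2 missing values out of 428. The non-missing value 6 is made up.

```python
def fraction_missing(values):
    """nmiss / (nmiss + n), the statistic the question asks for."""
    nmiss = sum(1 for v in values if v is None)
    n = len(values) - nmiss
    return nmiss / (nmiss + n)

def mean_of_indicator(values):
    """Mean of the 1/0 flag missing(v), as in the missing_cylinders variable."""
    flags = [1 if v is None else 0 for v in values]
    return sum(flags) / len(flags)

cylinders = [None, None] + [6] * 426
print(round(fraction_missing(cylinders), 7))   # 0.0046729
print(round(mean_of_indicator(cylinders), 7))  # 0.0046729
```

Multiply by 100 if you want the value displayed as a percent.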

SAS Randgen call with Weibull distribution

I am trying to use call randgen within proc IML to create 10 random numbers that follow a Weibull distribution with certain parameters. Here is the code I am using (obviously there will be more than just one loop but I am just testing right now):
do i = 1 to 1;
Call randgen(Rands[i,1:Ntimes], 'Weibull', alpha[i], beta[i]);
print (rands[1,1:Ntimes]);
print (alpha[i]) (beta[i]);
end;
For this example Ntimes = 10, alpha[i] = 4.5985111, and beta[i] = 131.79508. My issue is that each of the 10 iterations/random numbers comes back as 1. I used the rweibull function in R with the same parameters and got results that made sense so I am thinking it has something to do with SAS or my code rather than an issue with the parameters. Am I using the Randgen call correctly? Does anyone know why the results would be coming out this way?
This works:
proc iml;
alpha=j(10);
beta=j(10);
alpha[1]=4.59;
beta[1] = 131.8;
Ntimes=10;
rands = j(1,10);
print (rands);
do i = 1 to 1;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
end;
quit;
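I don't think you can use Rands[i,1:Ntimes] that way. When you pass an expression such as a submatrix to a subroutine, IML evaluates it into a temporary matrix; RANDGEN fills that temporary, and the values are never copied back into Rands, which keeps whatever it held before. Instead, fill a temporary matrix and then assign its values to the larger matrix.
For example: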
allRands=j(10,10);
do i = 1 to 10;
Call randgen(Rands, 'WEIB', alpha[1],beta[1]);
print (rands);
allRands[i,1:10]=Rands;
end;
print(allRands);
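Python has an analogous pitfall, which makes the symptom easy to reproduce outside SAS: a list slice is a copy, so a function that fills its argument in place never touches the original row. The sketch below is only that analogy, not a description of IML internals. It also shows a parameter-order trap when cross-checking: Python's random.weibullvariate(alpha, beta) takes the scale first and the shape second, whereas SAS's Weibull(a, b) takes the shape first. The name fill_weibull is hypothetical.

```python
import random

def fill_weibull(out, shape, scale, rng):
    """Fill the list `out` in place, the way CALL RANDGEN fills its first argument."""
    for k in range(len(out)):
        # Python's weibullvariate(alpha, beta) takes alpha = scale, beta = shape.
        out[k] = rng.weibullvariate(scale, shape)

rng = random.Random(2024)
ntimes = 10
rands = [[1.0] * ntimes]                 # like J(1, Ntimes): one row of 1s

# A slice is a temporary copy, so the original row never sees the values.
fill_weibull(rands[0][0:ntimes], 4.5985111, 131.79508, rng)
after_slice = list(rands[0])
print(after_slice)                       # still ten 1.0 values

# Passing the row object itself fills it in place.
fill_weibull(rands[0], 4.5985111, 131.79508, rng)
print(rands[0])
```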
Actually, unless you are using an ancient version of SAS/IML, you don't need any loops. Since SAS/IML 12.1, the RANDGEN subroutine accepts vectors of parameters. In your case, define one vector for the alpha parameters and another for the beta parameters. Let's say there are Nparam parameter pairs. Then allocate an N x Nparam matrix to hold the results. With a single call to RANDGEN, you can fill the matrix so that the ith column is a sample of size N from Weibull(alpha[i], beta[i]), as shown in the following example:
proc iml;
Nparam = 8; N = 1000;
alpha= 1:Nparam; /* assign parameter values */
beta = 10 + (Nparam:1);
rands = j(N,Nparam);
call randgen(rands, 'WEIB', alpha,beta); /* SAS/IML 12.1 */
/* DONE. The ith column is a sample from Weibull(alpha[i], beta[i]).
   TEST IT: Compute the mean and standard deviation of each sample: */
mean = mean(rands); std = std(rands);
print (alpha//beta//mean//std)[r={"alpha" "beta" "mean" "std"}];
/* TEST IT: Plot the distribution of each sample (SAS/IML 12.3) */
title "First param"; call histogram(rands[,1]);
title "Last param"; call histogram(rands[,Nparam]);
quit;
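To sanity-check the "one column per parameter pair" layout, here is a Python sketch (not SAS, and not vectorized the way RANDGEN is) that builds the same N x Nparam matrix and compares each column mean with the theoretical Weibull mean, scale * Gamma(1 + 1/shape). It assumes, as in SAS, that alpha is the shape and beta is the scale. Note again that Python's weibullvariate takes the scale first. The function name weibull_columns and the seed are hypothetical.

```python
import math
import random

def weibull_columns(n, shapes, scales, rng):
    """Return an n x p matrix (list of rows); column j is a Weibull(shapes[j], scales[j]) sample."""
    p = len(shapes)
    # weibullvariate(alpha, beta): alpha = scale, beta = shape
    return [[rng.weibullvariate(scales[j], shapes[j]) for j in range(p)] for _ in range(n)]

nparam, n = 8, 1000
alpha = list(range(1, nparam + 1))              # shapes 1..8, as in alpha = 1:Nparam
beta = [10 + k for k in range(nparam, 0, -1)]   # scales 18..11, as in 10 + (Nparam:1)
rands = weibull_columns(n, alpha, beta, random.Random(1))

for j in range(nparam):
    col = [row[j] for row in rands]
    sample_mean = sum(col) / n
    # Theoretical Weibull mean: scale * Gamma(1 + 1/shape)
    expected = beta[j] * math.gamma(1 + 1 / alpha[j])
    print(alpha[j], beta[j], round(sample_mean, 3), round(expected, 3))
```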

Fortran to SAS code conversion

I want to create 1-, 2-, and 3-dimensional variables/arrays inside PROC IML. My code looks like the following:
proc iml;
start Mean1(x); /*this is 1 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
proc iml;
start Mean2(x); /*this is 2 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
proc iml;
start Mean3(x); /*this is 3 dimension variable/array*/
Mean1(x)=sum(x)/dim(x);
finish;
I also tried this:
proc iml;
declare double x[dim(n),dim(n)];
start Mean2(x); /*this is 2 dimension variable or array*/
Mean1(x)=sum(x)/dim(a, x);
finish;
But it's not working. Could you help me?
There are a few things to know here.
SAS/IML matrices are 1-indexed and stored in C-style row-major order, not column-major order like Fortran.
To my knowledge, there are no 3-dimensional arrays in IML; a matrix always has two dimensions. It is always possible I am mistaken.
All numbers in SAS are doubles, so there is no need to declare a type.
IML has nice subscript reduction operators that make means easy and very fast.
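The row-major versus column-major point matters when porting Fortran code that walks an array with a single linear index. A small Python sketch of the two index formulas (1-based, as in both IML and Fortran); the function names are made up for illustration:

```python
def row_major_index(i, j, ncol):
    """1-based linear index of element (i, j) when stored row by row, as IML numbers x[k]."""
    return (i - 1) * ncol + j

def col_major_index(i, j, nrow):
    """1-based linear index of element (i, j) in Fortran's column-major storage."""
    return (j - 1) * nrow + i

# For a 2 x 3 matrix, element (2, 1) is the 4th value in IML but the 2nd in Fortran.
print(row_major_index(2, 1, ncol=3))  # 4
print(col_major_index(2, 1, nrow=2))  # 2
```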
To declare a matrix/array, use the J(nrow,ncol,fill) function:
proc iml;
x = J(10,5,1); /*Declare a 10x5 matrix filled with 1s*/
x = normal(x); /*Fills matrix X with random numbers, uses the values in X as the seed*/
mean_all = x[:]; /*mean over all values in x*/
mean_col = x[:,];/*mean of each column */
mean_row = x[,:];/*mean of each row */
print mean_all;
print mean_col;
print mean_row;
quit;
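For readers coming from Fortran loops, the three reductions above amount to the following plain loops. This is a Python sketch (not IML) on a small made-up 2 x 3 matrix, so the results are easy to verify by hand:

```python
def mean_all(x):
    """Like x[:] in IML: mean over every element."""
    values = [v for row in x for v in row]
    return sum(values) / len(values)

def mean_col(x):
    """Like x[:,] in IML: one mean per column (IML returns a 1 x ncol row)."""
    nrow = len(x)
    return [sum(row[j] for row in x) / nrow for j in range(len(x[0]))]

def mean_row(x):
    """Like x[,:] in IML: one mean per row (IML returns an nrow x 1 column)."""
    return [sum(row) / len(row) for row in x]

x = [[1, 2, 3],
     [4, 5, 6]]
print(mean_all(x))   # 3.5
print(mean_col(x))   # [2.5, 3.5, 4.5]
print(mean_row(x))   # [2.0, 5.0]
```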
I highly recommend going through the IML documentation. http://support.sas.com/documentation/onlinedoc/iml/index.html

SAS computation using double loops

I am trying to do a computation using two nested loops, but I am not very familiar with how loops work.
Here is my data:
data try;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
What I am trying to compute is this: for each T value, all elements of a, b, and c need to be used in the equation. Then I create variables v1-v6 to hold the results of the equation for T1-T6. After that, I create CS to sum all the elements of v.
So my result dataset will look like this:
rs T a b c v1 v2 v3 v4 v5 v6 CS
0 600 sum of v1
1 600 0.02514 667.53437 0.1638 sum of v2
2 600 0.2766 724.60233 0.30162 sum of v3
3 610 0.01592 792.34628 0.21354 sum of v4
4 615.2869 0.03027 718.30377 0.22097 sum of v5
5 636.0273 0.01967 705.45965 0.16847 sum of v6
I wrote the code below to do this, but got errors. Mainly, I am not sure how to use i and j properly to link all the elements of the variables. Can someone point out what I got wrong? I am aware that maybe I should not use the SUM function to add up the elements of a variable, but I am not sure which function to use.
data try3;
set try;
retain v1-v6;
retain t a b c;
array v(*) v1-v6;
array var(*) t a b c;
cs=0;
do i=1 to 6;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2));
cs[i]=sum(of v[i,j]-v[i,j]);
end;
end;
run;
For example, v[1,1] will be 0 because there are no values for a, b, and c in the first row.
v[1,2] = (2.89*0.02514**2*(1-0.1638)) / ((0.1638+exp(1.7*0.02514*(600-667.53437))) * ((1+exp(-1.7*0.02514*(600-667.53437)))**2)).
v[1,3] = (2.89*0.2766**2*(1-0.30162)) / ((0.30162+exp(1.7*0.2766*(600-724.60233))) * ((1+exp(-1.7*0.2766*(600-724.60233)))**2)).
v[1,4] will use the values of a, b, and c from the next row, but t will still be t[1]; continue like this until the last row. That gives v1. Then I need to sum all the elements of v1, v[1,1] + v[1,2] + v[1,3] + ... + v[1,6], to get cs[1].
The SAS language isn't that good at doing these kinds of things, which are essentially matrix calculations. The DATA step normally processes one observation at a time, though you can carry calculations over using the RETAIN statement. It is possible that you could get a cleaner result than this if you had access to PROC IML (which does matrix calculations natively), but assuming that you don't have access to IML, you need to do something like the following. I'm not 100% sure that it is what you need, but I think it is along the right lines:
data try;
infile cards missover;
input rs t a b c;
datalines;
0 600
1 600 0.02514 667.53437 0.1638
2 600 0.2766 724.60233 0.30162
3 610 0.01592 792.34628 0.21354
4 615.2869 0.03027 718.30377 0.22097
5 636.0273 0.01967 705.45965 0.16847
;
run;
data try4(rename=(aa=a bb=b cc=c css=cs tt=t vv1=v1 vv2=v2 vv3=v3 vv4=v4 vv5=v5 vv6=v6));
* Construct arrays into which we will read all of the records;
array t(6);
array a(6);
array b(6);
array c(6);
array v(6,6);
array cs(6);
* Read all six records;
do i=1 to 6;
set try(rename=(t=tt a=aa b=bb c=cc));
t[i] = tt;
a[i] = aa;
b[i] = bb;
c[i] = cc;
end;
* Now do the calculation, which involves values from each
row at each iteration;
do i=1 to 6;
cs[i]=0;
do j=1 to 6;
v[i,j]=(2.89*(a[j]**2*(1-c[j]))/
((c[j]+exp(1.7*a[j]*(t[i]-b[j])))*
((1+exp(-1.7*a[j]*(t[i]-b[j])))**2)));
cs[i]+v[i,j];
end;
* Then output the values for this iteration;
tt=t[i];
aa=a[i];
bb=b[i];
cc=c[i];
css=cs[i];
vv1=v[i,1];
vv2=v[i,2];
vv3=v[i,3];
vv4=v[i,4];
vv5=v[i,5];
vv6=v[i,6];
keep tt aa bb cc vv1-vv6 css;
output try4;
end;
run;
Note that the arrays must have a known size; that is, you have to know how many input records there are.
The first half of the DATA step constructs arrays into which the values from the input data set are read. We read all of the records first and then do all of the calculations, since at that point all of the values are in memory in the arrays.
There is some fiddling with RENAME= options so that you can keep the array names t, a, b, c, etc., but still have variables named a, b, c, etc. in the output data set.
So hopefully that might help you along a bit. Either that or confuse you because I've misunderstood what you're trying to do!
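If you want to cross-check the DATA step output, the same double loop is short in Python (not SAS). This sketch assumes, as the answer's code effectively does, that the first row's missing a, b, and c make v[i,1] missing and that missing terms contribute nothing to the sum, which matches the sum statement cs[i]+v[i,j]. The name term is hypothetical, and indices are 0-based here.

```python
import math

# Rows of the TRY data set: (rs, t, a, b, c); None marks a SAS missing value.
rows = [
    (0, 600.0,    None,    None,      None),
    (1, 600.0,    0.02514, 667.53437, 0.1638),
    (2, 600.0,    0.2766,  724.60233, 0.30162),
    (3, 610.0,    0.01592, 792.34628, 0.21354),
    (4, 615.2869, 0.03027, 718.30377, 0.22097),
    (5, 636.0273, 0.01967, 705.45965, 0.16847),
]

def term(t_i, a_j, b_j, c_j):
    """v[i,j] from the question: uses t from row i and a, b, c from row j."""
    if a_j is None:                  # row with no a, b, c: result is missing
        return None
    z = 1.7 * a_j * (t_i - b_j)
    return (2.89 * a_j ** 2 * (1 - c_j)) / ((c_j + math.exp(z)) * (1 + math.exp(-z)) ** 2)

# Outer loop over i (t from row i), inner loop over j (a, b, c from row j).
v = [[term(ti, aj, bj, cj) for (_, _, aj, bj, cj) in rows] for (_, ti, _, _, _) in rows]
# Like the sum statement cs[i]+v[i,j]: missing terms are skipped.
cs = [sum(x for x in row if x is not None) for row in v]

for (rs, t, *_), total in zip(rows, cs):
    print(rs, t, total)
```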