sum of square errors in proc iml - sas

I'm trying to write code to run a Newton-Raphson optimization. I'm using PROC IML, but when I evaluate the error (e) I need to sum all the squared differences, and I don't know how to tell SAS that in this case I need the sum of the components of the vector rather than the vector itself.
The code is the following:
proc iml; use chap0; read all var{X} into X;
read all var{t} into t;
W=1;
s= exp(X*w)/(1+ exp(X*w)); print s;
e = (s - t) ** 2; /*here I need the result of the sum for that and not the matrix*/
g=2*(s-t)*s*(1-s);
h=2 * s * (1 - s) * (s * (1 - s) + (s - t) * (1 - 2 * s));
count=0;/*init count number*/
do until (e<1e-8);
count=count+1;
w=w0-g/h; /*here I also need the sum of g and h*/
s= exp(X*w)/(1+ exp(X*w));
e = (s - t) ** 2;
wo=w;
end;
Thanks!

You should use the SSQ function to calculate sum of squares in IML.
e=ssq(s-t);
Here are several other ways to do this. Note that the first way gives a different result: it sums the squares of all elements of x and of -y instead of the squares of the differences. It is included only to show how the way you pass the arguments changes the answer.
proc iml;
x = 1:5;
y = j(1,5,2);
e1=ssq(x,-y); *sum of the squares of all elements of x and of -y, with no subtraction, so it is not the same answer;
e2=ssq(x-y); *sum of squares of the differences;
e3=(x-y)*(x-y)`; *a row vector times its transpose gives the sum of its squared elements;
e4=(x-y)[##]; *summation subscript operator, see note;
print e1 e2 e3 e4;
quit;
Rick Wicklin has a post about the ## operator, which is quite useful.
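To make the difference between ssq(x,-y) and ssq(x-y) concrete outside of IML, here is a minimal C++ sketch of the two computations. The function names are hypothetical and are not part of SAS; the sketch only mirrors the arithmetic described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of the squares of every element of v.
double sum_of_squares(const std::vector<double>& v) {
    double total = 0.0;
    for (double value : v) total += value * value;
    return total;
}

// Analogue of ssq(x - y): subtract element by element, then square and sum.
double ssq_of_difference(const std::vector<double>& x,
                         const std::vector<double>& y) {
    assert(x.size() == y.size());  // the vectors must have the same length
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        total += d * d;
    }
    return total;
}

// Analogue of ssq(x, -y): squares of all elements of both arguments.
// Nothing is subtracted, and the sign of y does not matter once squared.
double ssq_of_both(const std::vector<double>& x,
                   const std::vector<double>& y) {
    return sum_of_squares(x) + sum_of_squares(y);
}
```

With x = 1, 2, 3, 4, 5 and y = 2, 2, 2, 2, 2, as in the IML example, ssq_of_difference returns 15 and ssq_of_both returns 75.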

Related

Hyperbolic sine without math.h

I'm new to coding and to C++. For a homework assignment I have to write code for sinh without using math.h. I understand the math behind sinh, but I have no idea how to code it. Any help would be highly appreciated.
According to Wikipedia, there is a Taylor series for sinh:
sinh(x) = x + (pow(x, 3) / 3!) + (pow(x, 5) / 5!) + (pow(x, 7) / 7!) + ...
One challenge is that you are not allowed to use the pow function. The other is calculating the factorial.
The series is a sum of terms, so you'll need a loop:
double sum = 0.0;
for (unsigned int i = 0; i < NUMBER_OF_TERMS; ++i)
{
sum += Term(i);
}
You could implement Term as a separate function, but you may want to take advantage of declaring and using variables in the loop (that the function may not have access to).
Consider that pow(x, N) expands to x * x * x...
This means that in each iteration the previous power is multiplied by x. (This will come in handy later.)
Consider that N! expands to 1 * 2 * 3 * 4 * 5 * ...
This means that in each iteration, the previous value is multiplied by the iteration number.
Let's revisit the loop:
double sum = 0.0;
double power = 1.0;
double factorial = 1.0;
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
// Calculate pow(x, i)
power = power * x;
// Calculate i!
factorial = factorial * i;
}
One issue with the above loop is that the power and the factorial need to be updated on every iteration, but the Taylor series uses only the odd powers. This is solved by adding a term to the sum only on odd iterations:
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
// Calculate pow(x, i)
power = power * x;
// Calculate i!
factorial = factorial * i;
// Calculate sum for odd iterations
if ((i % 2) == 1)
{
// Calculate the term.
sum += //...
}
}
In summary, the pow and factorial functions are broken down into iterative pieces. The iterative pieces are placed into a loop. Since the Taylor Series terms are calculated with odd iteration values, a check is placed into the loop.
The actual calculation of the Taylor Series term is left as an exercise for the OP or reader.
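For readers who want to check their own solution, one possible way to assemble the pieces above is sketched here. The function name sinh_taylor and the terms parameter are assumptions made for illustration, not part of the original answer. The sketch makes no attempt to handle large x or a large number of terms, where the running power and factorial can overflow.

```cpp
#include <cassert>

// Approximates sinh(x) with the first `terms` nonzero terms of the series
// x + x^3/3! + x^5/5! + ..., without using pow or math.h.
double sinh_taylor(double x, unsigned int terms) {
    assert(terms > 0);       // at least one term is needed
    double sum = 0.0;
    double power = 1.0;      // running value of x^i
    double factorial = 1.0;  // running value of i!
    // The k-th nonzero term uses the exponent 2k - 1, so i runs up to 2 * terms - 1.
    for (unsigned int i = 1; i < 2 * terms; ++i) {
        power *= x;
        factorial *= i;
        if (i % 2 == 1) {
            sum += power / factorial;  // add the term only for odd i
        }
    }
    return sum;
}
```

Calling sinh_taylor(1.0, 10) uses the powers 1 through 19 and should agree with sinh(1) to many decimal places. Fewer terms are enough for small x and more are needed as x grows.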

IML correlation from different matrices

Given a matrix X (n x p), I want to split X into Y1 (n x k) and Y2 (n x (p-k)), where Y1 consists of the first k columns of X and Y2 of the remaining columns.
Now, in R I can get the "crossed" correlation between the columns of Y1 and Y2 calling cor(Y1,Y2, use="pairwise.complete.obs"), how can I get the same result in SAS IML where the corr function admits only 1 dataset?
I tried to find an appropriate solution or algorithm to implement it but with bad results.
Can anyone help with this? Pointing me to some literature about this kind of correlation would also be great! I don't want you to code it for me, just some help or a hint about existing functions or algorithms to translate.
Thank you.
EDIT: don't search on the web for crossed correlation, I wrote it simply for trying to explain myself.
Looking up "crossed correlation" leads you to a series of literature on signal processing and a function much like the autocorrelation function. In fact, in R it is documented with acf https://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html.
But that is not what your code is doing. In R:
n = 100
p = 6
k = 2
set.seed(1)
r = rnorm(n*p)
x= matrix(r,n,p)
y1 = x[,1:k]
y2 = x[,(k+1):p]
cor.ys = cor(y1,y2,use="pairwise.complete.obs")
cor.x = cor(x)
(cor.ys - cor.x[1:k,(k+1):p])
You see the result from cor(y1,y2) is just a piece of the correlation matrix from x.
You should be able to put this in IML easily.
I can think of a few ways to do this. The simplest is to compute the full matrix of Pearson correlations (using the pairwise option) and then subset the result. (What DomPazz said.) If you have hundreds of variables and you only want a few of the correlations, it will be inefficient, but it is very simple to program:
proc iml;
n = 100; p = 6; k = 2;
call randseed(1);
x = randfun(n//p, "Normal");
varNames = "x1":"x6";
corr = corr(x, "pearson", "pairwise"); /* full matrix */
idx1 = 1:k; /* specify VAR */
idx2 = (k+1):p; /* specify WITH */
withCorr = corr[idx2, idx1]; /* extract submatrix */
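print withcorr[r=(varNames[idx2]) c=(varNames[idx1])];
/* write x to a data set named test so that it can be checked with PROC CORR */
create test from x[colname=varNames];
append from x;
close test;
quit;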
Outside of SAS/IML you can use PROC CORR and the WITH statement to do the same computation, thereby validating your SAS/IML program:
proc corr data=test noprob nosimple;
var x1-x2;
with x3-x6;
run;
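To spell out what the "crossed" correlation contains, here is a minimal C++ sketch: entry (i, j) is the ordinary Pearson correlation between column i of Y1 and column j of Y2. The names pearson and cross_correlation are hypothetical, each column is assumed to be stored as its own vector, and missing values are not handled (so this does not reproduce the pairwise option).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;

// Pearson correlation between two equally long columns (no missing values).
double pearson(const Column& a, const Column& b) {
    assert(a.size() == b.size() && a.size() > 1);
    const double n = static_cast<double>(a.size());
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;
    double sab = 0.0, saa = 0.0, sbb = 0.0;  // centered cross products
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        sab += da * db;
        saa += da * da;
        sbb += db * db;
    }
    return sab / std::sqrt(saa * sbb);
}

// result[i][j] = correlation of column i of y1 with column j of y2.
std::vector<std::vector<double>> cross_correlation(
        const std::vector<Column>& y1, const std::vector<Column>& y2) {
    std::vector<std::vector<double>> result(
        y1.size(), std::vector<double>(y2.size(), 0.0));
    for (std::size_t i = 0; i < y1.size(); ++i)
        for (std::size_t j = 0; j < y2.size(); ++j)
            result[i][j] = pearson(y1[i], y2[j]);
    return result;
}
```

Computing only these entries avoids building the full p x p matrix, which is the inefficiency mentioned above when only a few of many correlations are needed.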

How to do multiplication between two matrix using IML in SAS

I have a data set named input_data, shown below, that was imported from Excel.
0.353481635 0.704898683 0.078640917 0.813815803 0.510842666 0.240912872 0.986312218 0.781868961 0.682272971
0.443441526 0.653187181 0.753981865 0.34909803 0.84215961 0.793863082 0.047816942 0.176759112 0.54213244
0.21443281 0.142501578 0.927011587 0.407251043 0.290280445 0.90730524 0.677030212 0.770541244 0.915728969
0.583493041 0.685127614 0.119042255 0.067769934 0.795793907 0.405029459 0.817724346 0.594170688 0.345660875
0.816193304 0.636823417 0.036348358 0.027985453 0.117027493 0.436516667 0.593191955 0.916981676 0.574223091
0.766842249 0.743249552 0.400052263 0.809650253 0.683610082 0.42152573 0.050520292 0.329441952 0.868549022
0.112847881 0.462579082 0.526220066 0.320851313 0.944585551 0.233027402 0.66141107 0.8380858 0.120044416
0.873949265 0.118525986 0.590234323 0.481974796 0.668976582 0.466558592 0.934633956 0.643438048 0.053508922
And I have another data set called p below
data p;
input p;
datalines;
0.12
0.23
0.11
0.49
0.52
0.78
0.8
0.03
0.02
run;
proc transpose data = p out=p2;
run;
What I want to do is matrix manipulation in SAS using IML.
I have some code already, but the final calculation gives an error. Can someone give me a hand?
proc iml;
use input_data;
read all var _num_ into x;
print x;
proc iml;
use p2;
read all var _num_ into k;
print k;
proc iml;
Value1 = k * x;
print Value1;
quit;
You have several problems here.
First off, you have three PROC IML statements. PROC IML only keeps its matrices while it is running; once it quits, or a new PROC statement starts, all of them are gone. So remove the second and third PROC IML statements and do everything in a single PROC IML session.
Second, you need to make sure your matrices are correctly ordered and structured. Matrix multiplication works by the following:
m x n * n x p = m x p
Where both N's must be the same. This is rows x columns, so the left-side matrix must have the same number of columns as the right-side matrix has rows. (This is because each element of each row on the left-side matrix is multiplied by the corresponding element in the column on the right-side matrix and then summed, so if the numbers don't match it's not possible to do.)
So you have 8x9 and 9x1, which you transpose to 1x9. So first off, don't transpose p, leave it 9x1. Then, make sure you have the order right (matrix multiplication is NOT commutative, the order matters). k * x means 9x1 * 8x9 which doesn't work (since 1 and 8 aren't the same - remember, the inner two numbers have to match.) x*k does work, since that is 8x9 * 9x1, the two 9s match.
Final output:
proc iml;
use input_data;
read all var _num_ into x;
print x;
use p;
read all var _num_ into k;
print k;
Value1 = x * k;
print Value1;
quit;
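The dimension rule described above (the inner two numbers must match) can also be sketched in C++. This is only an illustration with hypothetical names; it assumes that a matrix is stored as a vector of equally long rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// True when a (m x n) can be multiplied by b (n x p): the number of
// columns of the left matrix must equal the number of rows of the right.
bool conformable(const Matrix& a, const Matrix& b) {
    return !a.empty() && !b.empty() && a[0].size() == b.size();
}

// Returns a * b, which has as many rows as a and as many columns as b.
Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(conformable(a, b));
    const std::size_t m = a.size();
    const std::size_t n = b.size();
    const std::size_t p = b[0].size();
    Matrix c(m, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i][j] += a[i][k] * b[k][j];  // row i of a times column j of b
    return c;
}
```

For an 8x9 matrix x and a 9x1 matrix k, conformable(x, k) is true and the product is 8x1, while conformable(k, x) is false because 1 does not equal 8.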

sas- compare components of a vector without a loop inside proc iml

I'm writing code in PROC IML and I want an IF statement that evaluates each component of a vector and returns another vector, all in one step. Is there a function that does this? Here's the code:
proc iml;
use chap0; read all var{X} into X;
read all var{t} into t;
count=0;/*init count number*/
W=1;
s= exp(X*w)/(1+ exp(X*w));
s1=j(5,1,10);
do step = 1 to 6;
count=count+1;
s= exp(X*w)/(1+ exp(X*w));
if s <0.5 then s1= 0; /**in this part I need to get a vector with 0 and 1**/
if s >0.5 then s1= 1; /*I need to evaluate each component of the vector in this step*/
print s s1;
e = ssq(s - t);
g=2*(s-t)*s`*(1-s);
h=2 * s * (1 - s)` * (s * (1 - s)` + (s - t) * (1 - 2 * s)`);
o=j(1,5,1);
gg=(o*g);
hh=((o*h)*o`);
gi=gg/hh;
w1=w-gi;
s= exp(X*w)/(1+ exp(X*w));
if s <0.5 then s1= 0; /**here again:
in this part I need to get a vector with 0 and 1**/
if s >0.5 then s1= 1;
print s1;
e = ssq(s - t);
e1 = ssq(s1 - t);
w=w1;
print w w1 e e1 count;
end;
Thanks!
It's just as easy as it seems.
proc iml;
s = 1:5;
s1 = (s>3); *this assigns 1 (true) or 0 (false), for each element, based on relation to 3.;
print s1;
quit;

PROC Lifereg - Holding one parameter fixed

In the LIFEREG procedure, you can specify a generalized gamma distribution using the dist = gamma option, which generates an estimate based on the three parameter generalized gamma distribution. SAS states that the standard two parameter gamma distribution isn't available, but it would be if one could fix the Shape parameter to be equal to 1, per http://en.wikipedia.org/wiki/Generalized_gamma_distribution.
Is it possible in LIFEREG to fix the value of a particular parameter, or is there a setup in something like NLMIXED that might work? For reference, the full code I'd be using looks like this:
proc lifereg data=work.data;
model t*event(0) = X / D= Gamma;
run;
You could do a MLE for the 2-parameter gamma distribution in a data step. Snippet:
s = log(meanvar) - meanlogvar;
k = (3 - s + sqrt( (s - 3)**2 + 24 * s )) / (12 * s);
do j=1 to &iterations until( abs(k - ki) < &condition );
ki = k;
k = ki - ( (log(ki) - digamma(ki) - s) / ((1/ki) - trigamma(ki)) );
end;
theta = meanvar / k;
See: http://en.wikipedia.org/wiki/Gamma_distribution#Maximum_likelihood_estimation