given a matrix X(n * p), I want to split X into Y1(n * p-k) and Y2(n * k), where Y1 is composed by the first k columns of X and Y2 the others.
Now, in R I can get the "crossed" correlation between the columns of Y1 and Y2 calling cor(Y1,Y2, use="pairwise.complete.obs"), how can I get the same result in SAS IML where the corr function admits only 1 dataset?
I tried to find an appropriate solution or algorithm to implement it but with bad results.
Can anyone help with this? Also pointing me some literature about this kind or correlation would be great! I don't want you to code it for me, simply some help or hint on existing functions or algorithms to translate.
Thank you.
EDIT: don't search on the web for crossed correlation, I wrote it simply for trying to explain myself.
Looking up "crossed correlation" leads you to a series of literature on signal processing and a function much like the autocorrelation function. In fact, in R it is documented with acf https://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html.
But that is not what your code is doing. In R:
n = 100
p = 6
k = 2
set.seed(1)
r = rnorm(n*p)
x= matrix(r,n,p)
y1 = x[,1:k]
y2 = x[,(k+1):p]
cor.ys = cor(y1,y2,use="pairwise.complete.obs")
cor.x = cor(x)
(cor.ys - cor.x[1:k,(k+1):p])
You see the result from cor(y1,y2) is just a piece of the correlation matrix from x.
You should be able to put this in IML easily.
I can think of a few ways to do this. The simplest is to compute the full matrix of Pearson correlations (using the pairwise option) and then subset the result. (What DomPazz said.) If you have hundreds of variables and you only want a few of the correlations, it will be inefficient, but it is very simple to program:
proc iml;
n = 100; p = 6; k = 2;
call randseed(1);
x = randfun(n//p, "Normal");
varNames = "x1":"x6";
corr = corr(x, "pearson", "pairwise"); /* full matrix */
idx1 = 1:k; /* specify VAR */
idx2 = (k+1):p; /* specify WITH */
withCorr = corr[idx2, idx1]; /* extract submatrix */
print withcorr[r=(varNames[idx2]) c=(varNames[idx1])];
Outside of SAS/IML you can use PROC CORR and the WITH statement to do the same computation, thereby validating your SAS/IML program:
proc corr data=test noprob nosimple;
var x1-x2;
with x3-x6;
run;
Related
I am trying to draw random samples from some distribution as follows:
my code runs but the numbers look strange. so I am not sure what went wrong, maybe some operators. The elements are extremely large.
my attempt:
C_hat=(((x`)*x)**(-1))*((x`)*z);
S=((z-x*c_hat)`)*((z-x*c_hat));
*draw sigma;
sigma = shape(RandWishart(1, 513 - 3 - 2,s**(-1)),4,4);
*draw vec(c);
vec_c_hat= colvec(c_hat`); *vectorization of c_hat;
call randseed(4321);
vec_c = RandNormal(1,vec_c_hat,(sigma`)#(((x`)*x)**(-1)));
c = shape(vec_c,4,4);
print c;
Since you haven't provided data or a reference, it is difficult to guess whether your "strange" and "extremely large" numbers are correct. However, the program looks mostly correct, so check your data.
A minor problem with your program is that you are using the SHAPE function to reshape the vec_c vector into a matrix. You should be using the SHAPECOL function (or transpose the result).
The following program uses the Sashelp.Cars data, which is distributed with SAS, to initialize the X and Z matrices. The program computes a random matrix C which is close to the inverse crossproduct matrix for the data. I've also added some intermediate computations and comments. This version works as expected on the Sashelp.Cars data:
proc iml;
use sashelp.cars;
read all var {weight wheelbase enginesize horsepower} into X;
read all var {mpg_city mpg_highway} into Z;
close;
*normal equations and covariance;
xpx = x`*x;
invxpx = inv(xpx);
C_hat = invxpx*(x`*z);
r = z-x*c_hat;
S = r`*r;
*draw sigma;
call randseed(4321);
DF = nrow(X)-ncol(X)-2;
W = RandWishart(1, DF, inv(S)); /* 1 x (p*p) row vector */
sigma = shape(W, sqrt(ncol(W))); /* reshape to p x p matrix */
*draw vec(c);
vec_c_hat = colvec(c_hat`); /* stack columns of c_hat */
vec_c = RandNormal(1, vec_c_hat, sigma#invxpx);
c = shapecol(vec_c, nrow(C_hat), ncol(C_hat)); /* reshape w/ SHAPECOL */
print C_hat, c;
Can someone please help with the scenario below? I am very new to SaS and am not sure how to get this to work?
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code but not sure its completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
You need to have a look in the SAS help for the topics
"rannor","ranuni","generating random numbers",...
rannor: generating standard normal distributed RVs.
ranuni: uniform distributed RVs.
The argument in rannor is the seed number, not the expected value.
If N(x,y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance???) then the code could be (have a look on the changed order of the statements; the definition of Y has to be after the definition of the random numbers...):
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
X1=1+4*Rannor(seed);
X2=3+rannor(seed);
Noise=rannor(seed);
Y=alpha+beta1*X1+beta2*X2+Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor". There are different concepts to deal with seed numbers in SAS. See the SAS help for these topics, e.g. here
I have data set named input_data below import from EXCEL.
0.353481635 0.704898683 0.078640917 0.813815803 0.510842666 0.240912872 0.986312218 0.781868961 0.682272971
0.443441526 0.653187181 0.753981865 0.34909803 0.84215961 0.793863082 0.047816942 0.176759112 0.54213244
0.21443281 0.142501578 0.927011587 0.407251043 0.290280445 0.90730524 0.677030212 0.770541244 0.915728969
0.583493041 0.685127614 0.119042255 0.067769934 0.795793907 0.405029459 0.817724346 0.594170688 0.345660875
0.816193304 0.636823417 0.036348358 0.027985453 0.117027493 0.436516667 0.593191955 0.916981676 0.574223091
0.766842249 0.743249552 0.400052263 0.809650253 0.683610082 0.42152573 0.050520292 0.329441952 0.868549022
0.112847881 0.462579082 0.526220066 0.320851313 0.944585551 0.233027402 0.66141107 0.8380858 0.120044416
0.873949265 0.118525986 0.590234323 0.481974796 0.668976582 0.466558592 0.934633956 0.643438048 0.053508922
And I have another data set called p below
data p;
input p;
datalines;
0.12
0.23
0.11
0.49
0.52
0.78
0.8
0.03
0.02
run;
proc transpose data = p out=p2;
run;
What I want to do is matrix manipulation in IML using SAS.
I have some code already, but the final calculation got error. Can someone give me a hand?
proc iml;
use input_data;
read all var _num_ into x;
print x;
proc iml;
use p2;
read all var _num_ into k;
print k;
proc iml;
Value1 = k * x;
print Value1;
quit;
You have several problems here.
First off, you have three PROC IML statements. PROC IML only persists values while it's running; once it quits, all of the vectors go away forever. So remove the PROC IMLs.
Second, you need to make sure your matrices are correctly ordered and structured. Matrix multiplication works by the following:
m x n * n x p = m x p
Where both N's must be the same. This is rows x columns, so the left-side matrix must have the same number of columns as the right-side matrix has rows. (This is because each element of each row on the left-side matrix is multiplied by the corresponding element in the column on the right-side matrix and then summed, so if the numbers don't match it's not possible to do.)
So you have 8x9 and 9x1, which you transpose to 1x9. So first off, don't transpose p, leave it 9x1. Then, make sure you have the order right (matrix multiplication is NOT commutative, the order matters). k * x means 9x1 * 8x9 which doesn't work (since 1 and 8 aren't the same - remember, the inner two numbers have to match.) x*k does work, since that is 8x9 * 9x1, the two 9s match.
Final output:
proc iml;
use input_data;
read all var _num_ into x;
print x;
use p;
read all var _num_ into k;
print k;
Value1 = x * k;
print Value1;
quit;
For my SAS project I have to generate pairs of (X,Y) with a distribution Y ~ N(3 + X + .5X^2, sd = 2). I have looked at all of the SAS documentation for normal() and I see absolutely no way to do this. I have tried many different methods and am very frustrated.
I believe this is an example of what the asker wants to do:
data sample;
do i = 1 to 1000;
x = ranuni(1);
y = rand('normal', 3 + x + 0.5*x**2, 2);
output;
end;
run;
proc summary data = sample;
var x y;
output out = xy_summary;
run;
Joe is already more or less there - I think the only key point that needed addressing was making the mean of each y depend on the corresponding x, rather than using a single fixed mean for all the pairs. So rather than 1000 samples from the same Normal distribution, the above generates 1 sample from each of 1000 different Normal distributions.
I've used a uniform [0,1] distribution for x, but you could use any distribution you like.
You generate random numbers in SAS using the rand function. It has all sorts of distributions available; read the documentation to fully understand.
I'm not sure if you can directly use your PDF, but if you're able to use it with a regular normal distribution, you can do that. On top of that, most of the Univariate DFs SAS supports start out with the Uniform distribution and then apply their formula (Discrete or continuous) to that, so that might be the right way to go. That's heading into stat-land which is somewhere I'm averse to going. There isn't a direct way to simply pass a function for X as far as I know, however.
To generate [numsamp] normals with mean M and standard deviation SD:
%let m=0;
%let sd=2;
%let numsamp=100;
data want;
call streaminit(7);
do id = 1 to &numsamp;
y = rand('Normal',&m.,&sd.);
output;
end;
run;
So if I understand what you want right, this might work:
%let m=0;
%let sd=2;
%let numsamp=1000;
data want;
call streaminit(7);
do id = 1 to &numsamp;
x = rand('Normal',&m.,&sd.);
y = 0.5*x**2 + x + 3;
output;
end;
run;
proc means data=want;
var x y;
run;
X has mean 0.5 with SD 1.96 (roughly what you ask for). Y has mean 5 with SD 3.5. If you're asking for Y to have a SD of 2, i'm not sure how to do that.
I am trying to calculate z-scores by creating a variable D from 3 other variables, namely A, B, and C. I am trying to generate D as : D= (A-B)/C but for some reason when I do it, it produces very large numbers. When I did just (A-B) it did not get what it should have when I calculated by hand, instead of -2, I for -105.66.
Variable A is 'long' and variable B is 'float', I am not sure if this is the reason? My stata syntax is:
gen zscore= (height-avheight)/meansd
did not work.
You are confusing scalars and variables. Here's a solution (chop off the first four lines and replace x by height to fit the calculation into your code):
// example data
clear
set obs 50
gen x = runiform()
// summarize
qui su x
// store scalars
sca de mu = r(mean)
sca de sd = r(sd)
// z-score
gen zx = (x - mu) / sd
su zx
x and its z-score zx are variables that take many values, whereas mu and sd are constants. You might code constants in Stata by using scalars or macros.
I am not sure what you are trying to get, but I will use the auto data from Stata to explain. This is basic stuff in Stata. Say I want to test that the price=3
sysuse auto
sum price
#return list which is optional command
scalar myz=(3-r(mean))/r(sd) #r(mean) and r(sd) gives the mean and sd of price, if that is given you can simply enter the value for that
dis myz
-2.0892576
So, z value is -2.09 here.