How to compute regression coefficients with PROC MIXED in SAS?

Here are my data. Data are structured like so: id x1 x2 x3 y.
I used proc mixed to analyze it, but now want to determine regression coefficients and I don't know how to do it. I'm only a beginner with sas. From the results I see that x1, x2, x3 and x1x2x3 are the significant effects, but how to determine the coefficients alpha, beta, gamma, delta, theta:
y = theta + alpha*x1 + beta*x2 + gamma*x3 + delta*x1*x2*x3
This is my code:
ods graphics on;
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id;
run;
ods graphics off;
EDIT 1: Here is a part of the table Solutions for Fixed Effects:
Since x1 has two levels, there are two rows for it in the table. Do I get the effect of x1 by summing these two values: -109.07 for the first row and 0 for the second, or should I do something else? Note that this is 2^k design. The effect of x1 should be computed as half the difference between the average values for y when x1 is high (20) and when it is low (10).

For the model you wrote, x1, x2 and x3 should be treated as continuous variables (no CLASS statement). The Solution for Fixed Effects table then gives the coefficients of your model directly.
proc mixed data=test;
model y=x1 x2 x3 x1*x2*x3/ solution residual;
random id/s;
run;
However, given your code and the values of x1, x2 and x3, it would be better to treat them as categorical variables, as you did. In that case each Estimate in your table is a mean difference between that level and the reference level (the row whose Estimate is 0). The link below may help you understand your results.
http://support.sas.com/kb/38/384.html (explanation of estimation of coefficients)
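To relate such an Estimate to the 2^k "effect" described in the question, here is a minimal Python sketch. It assumes a single two-level factor with no interactions and uses made-up response values (not the asker's data), with the last level taken as the reference.

```python
from statistics import mean

# Hypothetical responses for one two-level factor x1 (illustrative numbers only).
y_low = [12.0, 14.0, 13.0, 15.0]    # x1 = 10
y_high = [22.0, 25.0, 23.0, 24.0]   # x1 = 20

# Reference coding with the last level (x1 = 20) as baseline:
# the intercept is the mean at the reference level, and the estimate
# reported for x1 = 10 is the low mean minus the high mean.
intercept = mean(y_high)
estimate_low = mean(y_low) - mean(y_high)

# 2^k convention from the question: half the difference between
# the average response at the high level and at the low level.
effect_x1 = (mean(y_high) - mean(y_low)) / 2

print(intercept, estimate_low, effect_x1)
```

In this one-factor case the reported Estimate is minus twice the 2^k effect. When interactions are in the model, the row for x1 is the difference at the reference levels of the other factors, so the two quantities no longer match this simply.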

The solution option should generate your estimates. You need to include it on both the model and random statements. You should then see two tables, Solution for Fixed Effects and Solution for Random Effects, that hold the estimates.
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id / s;
run;
The Random Coefficients example in the documentation is close to your question.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm

Related

How to simulate pairs from a joint distribution

I have two normally distributed variables X and Y, with given variances and a given covariance between them. I want to simulate pairs of points (say 200) from their joint distribution, but I can't seem to find a command/way to do this. I want to eventually plot these points in a scatter plot.
so far, I have
set obs 100
set seed 1
gen y = 64*rnormal(1, 5/64)
gen x = 64*rnormal(1, 5/64)
matrix D = (1, .5 | .5, 1)
drawnorm x, y, cov(D)
but this makes an error saying that x and y already exist.
Also, once I have a sample, how would I plot the drawnorm output as a scatter?
A related approach for generating correlated data is to use the corr2data command:
clear
set obs 100
set seed 1
matrix D = (1, .5 \ .5, 1)
drawnorm x1 y1, cov(D)
corr2data x2 y2, cov(D)
. summarize x*

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+-----------------------------------------------------------
          x1 |        100    .0630304    1.036762   -2.808194    2.280756
          x2 |        100    1.83e-09           1   -2.332422    2.238905

. summarize y*

    Variable |        Obs        Mean    Std. Dev.        Min         Max
-------------+-----------------------------------------------------------
          y1 |        100   -.0767662    .9529448   -2.046532    2.726873
          y2 |        100    3.40e-09           1   -2.492884    2.797518
It is important to note that unlike drawnorm, the corr2data approach does not generate data that is a sample from an underlying population.
You can then create a scatter plot as follows:
scatter x1 y1
Or to compare the two approaches in a single graph:
twoway scatter x1 y1 || scatter x2 y2
EDIT:
For specific means and variances you need to specify the mean vector μ and covariance matrix Σ in drawnorm. For example, to draw two random variables that are jointly normally distributed with means of 8 and 12, and variances 5 and 8 respectively, you type:
matrix mu = (8, 12)
scalar cov = 0.4 * sqrt(5 * 8) // assuming a correlation of 0.4
matrix sigma = (5, cov \ cov, 8)
drawnorm double x y, means(mu) cov(sigma)
The means() and cov() options of drawnorm are both documented in the help file.
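For readers who want to see what such a draw amounts to, here is a rough Python sketch using only the standard library. It is not how drawnorm is implemented; it simply builds the second variable from the first with a 2x2 Cholesky factor, using the same means (8, 12), variances (5, 8) and assumed correlation of 0.4. The function names are made up for illustration.

```python
import math
import random

def draw_bivariate_normal(n, mu, var, rho, seed=1):
    """Draw n pairs (x, y) from a bivariate normal via a 2x2 Cholesky factor."""
    rng = random.Random(seed)
    sx, sy = math.sqrt(var[0]), math.sqrt(var[1])
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)   # independent standard normals
        x = mu[0] + sx * z1
        y = mu[1] + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def sample_cov(pairs):
    """Sample covariance of the x and y components."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    return sum((p[0] - mx) * (p[1] - my) for p in pairs) / (n - 1)

# Covariance implied by a correlation of 0.4 and variances 5 and 8.
cov_xy = 0.4 * math.sqrt(5 * 8)

pairs = draw_bivariate_normal(20000, mu=(8, 12), var=(5, 8), rho=0.4)
mean_x = sum(p[0] for p in pairs) / len(pairs)
mean_y = sum(p[1] for p in pairs) / len(pairs)
print(cov_xy, sample_cov(pairs), mean_x, mean_y)
```

The sample covariance and means only approximate the targets, which is the same behaviour as drawnorm and unlike corr2data, which reproduces them exactly in the sample.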
Here is an almost minimal example:
. clear
. set obs 100
number of observations (_N) was 0, now 100
. set seed 1
. matrix D = (1, .5 \ .5, 1)
. drawnorm x y, cov(D)
As the help for drawnorm explains, you must supply new variable names. As x and y already exist, drawnorm threw you out. You also had a superfluous comma that would have triggered a syntax error.
help scatter tells you about scatter plots.

Stata - How to get residuals for original equation using estimates for differenced equation

I have variables y, x1 and x2. I estimate a differenced equation (without the intercept) using reg d.(y x1 x2), nocons. Now I want to get the residuals for the original variables using the estimated coefficients. I can do it by
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
But would there be an easier way? I need to keep those generated residuals for future use. Here is a complete minimal example.
clear all
set obs 100
gen id = floor((_n-1)/5)+1
by id, sort: gen year = 1990+_n
xtset id year
set seed 1
gen x1 = rnormal()
gen x2 = rnormal()
gen y = rnormal()
*** Data generated ***
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
I wonder if there is a flexible approach because sometimes I want to completely change variable names for the regression (e.g., reg dy dx1 dx2, nocons not just reg d.(y x1 x2)). I thought perhaps predict might be helpful but I don't know. Would it be possible to avoid typing the variable names explicitly?
predict will not work since it will create residuals on the differenced scale. You want residuals in terms of the original y, which is unusual, so there is no off-the-shelf solution.
I think the easiest path is to do something like this:
reg d.(y x1 x2), nocons coefl
local vars:colnames e(b) // get a list of coefficients
foreach x of local vars {
local xvar = subinstr("`x'","D.","",1) // strip out the D. prefix from the coefficient names
local diff "`diff' - _b[`x']*`xvar'"
}
gen resid = y `diff'
If you have covariates like dx1 and dx2, you can modify the prefix stripper like this:
local xvar = subinstr("`x'","d","",1) // strip out the first d prefix
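The logic of the loop above (strip the difference prefix from each coefficient name, then subtract coefficient times level variable from y) can be sketched outside Stata as well. The following Python snippet is only an illustration with made-up coefficients and data; the helper names level_name and level_residuals are hypothetical.

```python
def level_name(coef_name, prefix="D."):
    """Strip a leading difference prefix, e.g. 'D.x1' -> 'x1'."""
    return coef_name[len(prefix):] if coef_name.startswith(prefix) else coef_name

def level_residuals(data, depvar, coefs, prefix="D."):
    """Residuals on the original scale: y - sum_j b_j * x_j."""
    n = len(data[depvar])
    resid = []
    for i in range(n):
        fitted = sum(b * data[level_name(name, prefix)][i] for name, b in coefs.items())
        resid.append(data[depvar][i] - fitted)
    return resid

# Made-up coefficients (as if taken from e(b)) and level data.
coefs = {"D.x1": 0.5, "D.x2": -2.0}
data = {
    "y": [1.0, 2.0, 3.0],
    "x1": [2.0, 4.0, 6.0],
    "x2": [0.0, 1.0, 0.5],
}

resid = level_residuals(data, "y", coefs)
print(resid)
```

Passing prefix="d" mirrors the second variant, where the differenced variables are named dx1 and dx2.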

IML correlation from different matrices

Given a matrix X (n * p), I want to split X into Y1 (n * k) and Y2 (n * (p-k)), where Y1 consists of the first k columns of X and Y2 of the remaining ones.
Now, in R I can get the "crossed" correlation between the columns of Y1 and Y2 calling cor(Y1,Y2, use="pairwise.complete.obs"), how can I get the same result in SAS IML where the corr function admits only 1 dataset?
I tried to find an appropriate solution or algorithm to implement it but with bad results.
Can anyone help with this? Pointing me to some literature about this kind of correlation would also be great! I don't want you to code it for me, simply some help or a hint on existing functions or algorithms to translate.
Thank you.
EDIT: don't search on the web for crossed correlation, I wrote it simply for trying to explain myself.
Looking up "crossed correlation" leads you to a series of literature on signal processing and a function much like the autocorrelation function. In fact, in R it is documented with acf https://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html.
But that is not what your code is doing. In R:
n = 100
p = 6
k = 2
set.seed(1)
r = rnorm(n*p)
x= matrix(r,n,p)
y1 = x[,1:k]
y2 = x[,(k+1):p]
cor.ys = cor(y1,y2,use="pairwise.complete.obs")
cor.x = cor(x)
(cor.ys - cor.x[1:k,(k+1):p])
You see the result from cor(y1,y2) is just a piece of the correlation matrix from x.
You should be able to put this in IML easily.
I can think of a few ways to do this. The simplest is to compute the full matrix of Pearson correlations (using the pairwise option) and then subset the result. (What DomPazz said.) If you have hundreds of variables and you only want a few of the correlations, it will be inefficient, but it is very simple to program:
proc iml;
n = 100; p = 6; k = 2;
call randseed(1);
x = randfun(n//p, "Normal");
varNames = "x1":"x6";
corr = corr(x, "pearson", "pairwise"); /* full matrix */
idx1 = 1:k; /* specify VAR */
idx2 = (k+1):p; /* specify WITH */
withCorr = corr[idx2, idx1]; /* extract submatrix */
print withcorr[r=(varNames[idx2]) c=(varNames[idx1])];
Outside of SAS/IML you can use PROC CORR and the WITH statement to do the same computation, thereby validating your SAS/IML program:
proc corr data=test noprob nosimple;
var x1-x2;
with x3-x6;
run;
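The same idea (the cross-correlation of Y1 and Y2 is just a block of the pairwise correlation matrix of X) can be checked with a small, library-free Python sketch. The data are random, two values are set to missing by hand, and pearson_pairwise is a hypothetical helper that drops a pair when either value is missing; it is meant as an illustration of pairwise deletion, not as a replacement for the IML code.

```python
import math
import random

def pearson_pairwise(a, b):
    """Pearson correlation using only the rows where both values are present."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    ma = sum(u for u, _ in pairs) / n
    mb = sum(v for _, v in pairs) / n
    sab = sum((u - ma) * (v - mb) for u, v in pairs)
    saa = sum((u - ma) ** 2 for u, _ in pairs)
    sbb = sum((v - mb) ** 2 for _, v in pairs)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(1)
n, p, k = 100, 6, 2
cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]  # columns of X
cols[0][3] = None    # a couple of missing values, placed arbitrarily
cols[4][10] = None

# Full p x p pairwise correlation matrix of X.
full = [[pearson_pairwise(cols[i], cols[j]) for j in range(p)] for i in range(p)]

# Cross-correlation of Y1 (first k columns) with Y2 (remaining columns).
cross = [[pearson_pairwise(cols[i], cols[j]) for j in range(k, p)] for i in range(k)]

# The same numbers sit in the upper-right block of the full matrix.
block = [row[k:] for row in full[:k]]
print(cross == block)
```

Computing only the k * (p-k) block, as in cross, avoids the full matrix when p is large and only a few correlations are needed.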

In PROC LOGISTIC which value of the parameter is modelled?

My colleague and I are running exactly the same SAS PROC LOGISTIC, but with different input files.
SAS models ooX = 1 when I do it, and ooX = 0 when he does it.
We've checked record counts and FREQ counts for the main variables. They are the same.
Type 3 analysis of effects are the same. MLE estimates are the same, except for the intercept.
Does SAS require input to be sorted a certain way?
PROC LOGISTIC data = TTTT;
class ooX Y1 Y2 Y3 Y4;
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
If your data are not sorted you can specify the order of your outcome variable right after calling PROC LOGISTIC.
I don't have the data, but assuming that ooX is a binary outcome variable with levels 0 and 1, the model will default to modeling ooX = 0 unless you specify that you want it in descending order.
PROC LOGISTIC data = TTTT descending; /* will model ooX = 1 */
class ooX Y1 Y2 Y3 Y4; /* Not sure if it makes sense to have your outcome in the class statement */
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
As explained in the SAS manual (http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect030.htm):
For binary response data with event and nonevent categories, if your event category has a higher Ordered Value, then by default the nonevent is modeled.
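Why the modelled level matters can be seen in the simplest case. In an intercept-only logistic model the fitted intercept is the log-odds of the modelled level, so switching from ooX = 1 to ooX = 0 flips its sign. The short Python sketch below uses made-up counts, not the asker's data.

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1 - p))

# Hypothetical data: 30 of 100 records have ooX = 1.
events, n = 30, 100
p1 = events / n

intercept_event1 = logit(p1)       # intercept when ooX = 1 is modelled
intercept_event0 = logit(1 - p1)   # intercept when ooX = 0 is modelled

print(intercept_event1, intercept_event0)
```

The two intercepts have the same magnitude and opposite signs, which is why comparing the "Probability modeled is ..." note in each log is a quick way to see which level each run used.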

polynomial terms in proc logistic and other regressions

I'd like to do the following regression
proc logistic data=abc;
model y = x x*x x*x*x ....;
run;
Is there a shorthand to generate these polynomial terms? Thanks.
Edit: That will teach me to look closer at the question before I answer. The BAR operator is indeed for interaction - not polynomial effects.
Logistic does not have a shorthand for this yet that I know of, but glimmix does have an experimental technique using the effect statement. For example, this:
effect MyPoly = polynomial(x1-x3/degree=2);
model y = MyPoly;
is the same as
model y = x1 x2 x3 x1*x1 x1*x2 x1*x3 x2*x2 x2*x3 x3*x3;
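If the terms have to be written out by hand, for example to paste into a model statement, the list can be generated programmatically. This is a small Python sketch (the function name polynomial_terms is made up) that enumerates all products of the variables up to a given degree, which reproduces the expansion shown above for degree 2.

```python
from itertools import combinations_with_replacement

def polynomial_terms(variables, degree):
    """All products of the variables of total degree 1..degree, as SAS-style terms."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(variables, d):
            terms.append("*".join(combo))
    return terms

terms = polynomial_terms(["x1", "x2", "x3"], 2)
model_statement = "model y = " + " ".join(terms) + ";"
print(model_statement)
# prints: model y = x1 x2 x3 x1*x1 x1*x2 x1*x3 x2*x2 x2*x3 x3*x3;
```

For the single-variable case in the question, polynomial_terms(["x"], 3) gives x, x*x and x*x*x.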