My colleague and I are running exactly the same SAS PROC LOGISTIC, but with different input files.
SAS models ooX = 1 when I do it, and ooX = 0 when he does it.
We've checked record counts and FREQ counts for the main variables. They are the same.
The Type 3 analyses of effects are the same. The maximum likelihood estimates are the same, except for the intercept.
Does SAS require input to be sorted a certain way?
PROC LOGISTIC data = TTTT;
class ooX Y1 Y2 Y3 Y4;
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
If your data are not sorted, you can specify the order of your outcome variable as an option on the PROC LOGISTIC statement.
I don't have the data, but assuming that ooX is a binary outcome variable with levels 0 and 1, the model will default to modeling ooX = 0 unless you specify that you want it in descending order.
PROC LOGISTIC data = TTTT descending; /* will model ooX = 1 */
class ooX Y1 Y2 Y3 Y4; /* Not sure if it makes sense to have your outcome in the class statement */
model ooX = Y1 Y2 Y3 q1 q2 q3;
RUN;
As explained in the SAS manual (http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_sect030.htm):
For binary response data with event and nonevent categories, if your event category has a higher Ordered Value, then by default the nonevent is modeled.
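To avoid depending on the default ordering at all, the event level can also be named explicitly with the EVENT= response option on the MODEL statement. A minimal sketch using the dataset and variable names from the question, assuming ooX is coded 0/1 and leaving ooX out of the CLASS statement:

```sas
proc logistic data = TTTT;
    class Y1 Y2 Y3 Y4;                          /* predictors only; ooX removed */
    model ooX(event='1') = Y1 Y2 Y3 q1 q2 q3;   /* explicitly model ooX = 1 */
run;
```

The "Response Profile" table and the "Probability modeled is ..." note in the output show which level was actually modeled, so comparing those between the two runs is a quick check.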
I am writing an IF statement to filter out some values which are sequential. Is there a way to write an IF statement to bring out the sequential values?
data H;
input HH $;
cards;
Y1
Y2
Y3
Y4
Y5
; run;
data t;
set H;
if hh in ('Y2' -'Y4');
run;
Use the SCAN function to extract the numeric part, convert it to a number with INPUT, then filter on the numbers you want:
if 2 <= input(scan(hh,1,'Y'), 8.) <= 4;
New data step:
data t;
set H;
if 2 <= input(scan(hh,1,'Y'), 8.) <= 4; /* keeps Y2 through Y4 */
run;
You can take advantage of the fact that < and > work with character variables and sort order:
data t;
set H;
if 'Y2' <= hh <= 'Y4';
run;
However, Y22 would also be sorted between Y2 and Y4.
data H;
input HH $;
cards;
Y1
Y2
Y3
Y4
Y5
Y22
; run;
data t;
set H;
if 'Y2' <= hh <= 'Y4';
run;
So you would need to add additional logic in that case.
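One possible form of that additional logic, assuming the suffixes of interest are single digits (as in Y2 to Y4), is to also require that the value is exactly two characters long. A minimal sketch against the H dataset above:

```sas
data t;
    set H;
    /* the character range keeps Y2..Y4 (and Y22); the length check drops Y22 */
    if 'Y2' <= hh <= 'Y4' and length(hh) = 2;
run;
```

For ranges that span multi-digit suffixes (say Y8 to Y12), character comparison no longer follows numeric order, so comparing the extracted number is the safer route.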
I have variables y, x1 and x2. I estimate a differenced equation (without the intercept) using reg d.(y x1 x2), nocons. Now I want to get the residuals for the original variables using the estimated coefficients. I can do it by
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
But would there be an easier way? I need to keep those generated residuals for future use. Here is a complete minimal example.
clear all
set obs 100
gen id = floor((_n-1)/5)+1
by id, sort: gen year = 1990+_n
xtset id year
set seed 1
gen x1 = rnormal()
gen x2 = rnormal()
gen y = rnormal()
*** Data generated ***
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
I wonder if there is a flexible approach because sometimes I want to completely change variable names for the regression (e.g., reg dy dx1 dx2, nocons not just reg d.(y x1 x2)). I thought perhaps predict might be helpful but I don't know. Would it be possible to avoid typing the variable names explicitly?
predict will not work since it will create residuals on the differenced scale. You want residuals in terms of the original y, which is unusual, so there is no off-the-shelf solution.
I think the easiest path is to do something like this:
reg d.(y x1 x2), nocons coefl
local vars:colnames e(b) // get a list of coefficients
foreach x of local vars {
local xvar = subinstr("`x'","D.","",1) // strip out the D. prefix from the coefficient names
local diff "`diff' - _b[`x']*`xvar'"
}
gen resid = y `diff'
If you have covariates like dx1 and dx2, you can modify the prefix stripper like this:
local xvar = subinstr("`x'","d","",1) // strip out the first d prefix
Can someone please help with the scenario below? I am very new to SAS and am not sure how to get this to work.
Simulate 200 observations from the following linear model:
Y = alpha + beta1 * X1 + beta2 * X2 + noise
where:
• alpha=1, beta1=2, beta2=-1.5
• X1 ~ N(1, 4), X2 ~ N(3,1), noise ~ N(0,1)
I have tried this code but am not sure it's completely accurate:
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
DO i = 1 to 200;
Y=alpha+beta1*X1+beta2*X2+Noise;
X1=Rannor(1);
X2=rannor(3);
Noise=ranuni(0);
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
Have a look in the SAS help for the topics
"rannor", "ranuni", "generating random numbers", ...
rannor: generates standard normally distributed random variables.
ranuni: generates uniformly distributed random variables.
The argument to rannor is the seed, not the expected value.
If N(x,y) in your example means that the random variable is normally distributed with expected value x and standard deviation y (or do you mean the variance?), then the code could be as follows. Note the changed order of the statements: Y has to be computed after the random numbers it depends on have been generated.
DATA ONE;
alpha = 1;
beta1 = 2;
beta2 = -1.5;
RUN;
DATA CALC;
SET ONE;
seed = 1234;
DO i = 1 to 200;
X1=1+4*Rannor(seed);
X2=3+rannor(seed);
Noise=rannor(seed);
Y=alpha+beta1*X1+beta2*X2+Noise;
OUTPUT;
END;
RUN;
PROC PRINT DATA=CALC;
RUN;
There are also variants for generating random numbers, e.g. "call rannor", and there are different ways of handling seeds in SAS. See the SAS help for these topics.
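As an alternative sketch, the same simulation can be written with the RAND function and CALL STREAMINIT, where the mean and standard deviation are passed directly. This version assumes that N(1, 4) in the question means variance 4, i.e. a standard deviation of 2; if 4 is meant as the standard deviation, replace the 2 with 4. The seed value 1234 is arbitrary.

```sas
data calc;
    call streaminit(1234);                /* fix the seed for reproducibility */
    alpha = 1; beta1 = 2; beta2 = -1.5;
    do i = 1 to 200;
        X1    = rand('NORMAL', 1, 2);     /* mean 1, sd 2 (variance 4, assumed) */
        X2    = rand('NORMAL', 3, 1);     /* mean 3, sd 1 */
        Noise = rand('NORMAL', 0, 1);     /* standard normal noise */
        Y = alpha + beta1*X1 + beta2*X2 + Noise;
        output;
    end;
run;

/* sanity check: the fitted coefficients should be close to 1, 2 and -1.5 */
proc reg data=calc;
    model Y = X1 X2;
run;
quit;
```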
Here are my data. Data are structured like so: id x1 x2 x3 y.
I used proc mixed to analyze it, but now I want to determine the regression coefficients and I don't know how to do it. I'm only a beginner with SAS. From the results I see that x1, x2, x3 and x1x2x3 are the significant effects, but how do I determine the coefficients alpha, beta, gamma, delta, theta:
y = theta + alpha*x1 + beta*x2 + gamma*x3 + delta*x1*x2*x3
This is my code:
ods graphics on;
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id;
run;
ods graphics off;
EDIT 1: Here is a part of the table Solutions for Fixed Effects:
Since x1 has two levels, there are two rows for it in the table. Do I get the effect of x1 by summing these two values (-109.07 for the first row and 0 for the second), or should I do something else? Note that this is a 2^k design. The effect of x1 should be computed as half the difference between the average values of y when x1 is high (20) and when it is low (10).
Based on your model, x1, x2, x3 should be treated as continuous variables, then you should be able to get the coefficients in your model.
proc mixed data=test;
model y=x1 x2 x3 x1*x2*x3/ solution residual;
random id/s;
run;
However, based on your code and the values of x1, x2 and x3, it would be better to treat them as categorical variables, as you did. In that case the Estimate in your table is actually the mean difference between two levels. The link below may help you understand your results.
http://support.sas.com/kb/38/384.html (explanation of estimation of coefficients)
The solution option should generate your estimates. You need to include it on both the model and random statements. You should see two tables, Solution for Fixed Effects and Solution for Random Effects, that hold the estimates.
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id / s;
run;
The Random Coefficients example in the documentation is close to your question.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm
I'd like to do the following regression
proc logistic data=abc;
model y = x x*x x*x*x ....;
run;
Is there a shorthand to generate these polynomial terms? Thanks.
Edit: That will teach me to look closer at the question before I answer. The BAR operator is indeed for interaction - not polynomial effects.
As far as I know, PROC LOGISTIC does not yet have a shorthand for this, but GLIMMIX does have an experimental technique using the EFFECT statement. For example, this
effect MyPoly = polynomial(x1-x3/degree=2);
model y = MyPoly;
is the same as
model y = x1 x2 x3 x1*x1 x1*x2 x1*x3 x2*x2 x2*x3 x3*x3;
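If the goal is just the powers of one variable in PROC LOGISTIC, a plain workaround that does not depend on any shorthand is to compute the powers in a DATA step first. A sketch, assuming a dataset abc with response y and predictor x as in the question; abc_poly and x_p1-x_p3 are made-up names, and degree 3 is only an example:

```sas
data abc_poly;
    set abc;
    array xp[3] x_p1-x_p3;      /* x_p1 = x, x_p2 = x**2, x_p3 = x**3 */
    do k = 1 to 3;
        xp[k] = x**k;
    end;
    drop k;
run;

proc logistic data=abc_poly;
    model y = x_p1-x_p3;        /* numbered range list covers all three powers */
run;
```

Changing the degree then only means changing the array size and the loop bound.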