"Automatically" calculate linear combination of parameter estimates with PROC GLM - sas

Background: I have a categorical variable, X, with four levels, which I fit as separate dummy variables. There are thus three dummy variables, representing x=1, x=2, and x=3 (x=0 is the baseline).
Problem/issue: I want to calculate the value of a linear combination of the estimated coefficients on these dummy variables (i.e., use SAS as a calculator). For example, 2*B1 + 2*B2 + B3.
In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the parameters.
In SAS, in a procedure such as PROC GLM, I think I should use the ESTIMATE statement, but I'm not sure how I would specify the "weights" for each variable in this case.
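(For concreteness, a minimal sketch of what that ESTIMATE syntax can look like, assuming the dummies are entered as numeric variables d1, d2, d3 and the outcome is y; all names here are hypothetical:)
proc glm data=mydata;
   model y = d1 d2 d3;
   /* one weight follows each effect name: 2*B1 + 2*B2 + B3 */
   estimate '2*B1 + 2*B2 + B3' d1 2 d2 2 d3 1;
run;
quit;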

You are looking for PROC SCORE. It takes regression or factor estimates output by another procedure and scores a new data set. See here for an example: http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_score_examples02.htm
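A minimal sketch of that workflow, assuming the model is first fit with PROC REG and the dummies are numeric variables d1-d3 (all names hypothetical); OUTEST= saves the parameter estimates and PROC SCORE applies them to another data set:
proc reg data=mydata outest=est;
   model y = d1 d2 d3;
run;

/* TYPE=PARMS says EST holds regression parameter estimates;
   PREDICT requests predicted values rather than residual-type scores */
proc score data=newdata score=est out=scored type=parms predict;
   var d1 d2 d3;
run;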

FYI, PROC MODEL does allow this in its model statement, which may be less work than PROC SCORE. I know PROC MODEL can be used readily in place of PROC REG, but I'm not sure how advanced the models it can handle are, so it may not be an option for more complex models. I was hoping for something with less coding, but given the nature of SAS, I think this and PROC SCORE are the best I'm going to get.
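A sketch of the PROC MODEL route (all names hypothetical); parameters are named explicitly in the model expression, and the ESTIMATE statement evaluates functions of them after the fit:
proc model data=mydata;
   parms b0 b1 b2 b3;                /* regression parameters */
   y = b0 + b1*d1 + b2*d2 + b3*d3;   /* d1-d3 are the dummies */
   fit y;
   estimate 2*b1 + 2*b2 + b3;        /* the linear combination */
run;
quit;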

What if you add your linear combination as a variable in your input dataset?
data myDatasetWithLinCom;
   set mydata;
   LinComb = 2*(x=1) + 2*(x=2) + (x=3); /* equivalent to 2*B1 + 2*B2 + B3 */
run;
Then you can specify LinComb as one of the explanatory variables and look up its coefficient directly in the output.

Related

SAS: how feasible is it to do an iterative multiple imputation

I am new to SAS, and I would like to know how easy or difficult it would be to do iterative multiple imputation in SAS. In R, this is relatively easy.
The algorithm is as follows:
1. impute missing data using a known distribution
2. fit the model to the completed data from step 1
3. use the model fit from step 2 to impute the missing data again
4. repeat the model-fitting and imputation steps 50 times (i.e., 50 data sets total)
5. take every 10th data set and pool the results
Based on my limited experience in SAS, I'm guessing I would have to write a MACRO. I am specifically interested in using proc nlmixed to fit my model. I am not using R because SAS's nlmixed is more flexible and gives more robust results.
proc mi nimpute=n;              /* steps 1 and 4: create the n imputations */
proc sort; by _Imputation_;     /* order the stacked imputed data sets */
proc nlmixed; by _Imputation_;  /* steps 2-3: fit the model to each imputation */
proc mianalyze;                 /* step 5: pool the results */
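Fleshed out, a hedged sketch of that pipeline, assuming a data set mydata with outcome y and a single predictor x, and a simple normal model in NLMIXED (all names and starting values are hypothetical):
proc mi data=mydata nimpute=50 seed=1234 out=mi_out;
   var y x;
run;

proc sort data=mi_out;
   by _Imputation_;
run;

/* capture the per-imputation estimates for pooling */
ods output ParameterEstimates=nlm_parms;
proc nlmixed data=mi_out;
   by _Imputation_;
   parms b0=0 b1=0 s2=1;
   model y ~ normal(b0 + b1*x, s2);
run;

/* MIANALYZE expects the standard-error column to be named StdErr */
data nlm_parms;
   set nlm_parms(rename=(StandardError=StdErr));
run;

proc mianalyze parms=nlm_parms;
   modeleffects b0 b1;
run;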

Fractional logit models in SAS

I'm institutionally constrained to using SAS (yes, I know). I have a basic specification that I run in Stata/R with no problem: a fractional logit model (Papke & Wooldridge, 1996). It's a GLM with a binomial distribution assumption and a logit link function. The data are stationary time series in the unit interval (percentage data).
In Stata this is easily run as
glm Y X, family(binomial) link(logit)
in R it is
aModel <- glm(Y ~ X, family=binomial(link=logit), data = aDataFrame)
Attempting to do this in SAS using proc GLIMMIX:
proc glimmix data=aDataTable method=rspl;
   class someClassifier anotherClassifier;
   model Y = X / dist=binomial link=logit solution;
   random _residual_;
run;
I'm dealing with a panel dataset, which doesn't matter in the R or Stata syntax but appears to be needed information for PROC GLIMMIX, hence my inclusion of a CLASS line. I am able to fit models that are fairly close to the original from Stata/R but differ in nontrivial ways when I look at individual parameters or predicted values (the correlation between the two sets of predicted values is about 0.97). Can anyone advise on the proper way to do a fractional logit in SAS? I think the inclusion of a RANDOM line, as I have above, is one source of trouble, since it seems to add random effects to the model via an extra matrix-vector operation.
It turns out the solution is simple. You need to use:
method = quad
which uses quasi-maximum likelihood estimation, the same as used in Stata and R.
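In other words, something like this sketch of the original call with the method swapped (the RANDOM _RESIDUAL_ line is dropped here, since quadrature applies only to G-side random effects):
proc glimmix data=aDataTable method=quad;
   class someClassifier anotherClassifier;
   model Y = X / dist=binomial link=logit solution;
run;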

SAS Adaptivereg Limiting Breakpoints

I am using SAS's ADAPTIVEREG procedure to create a piecewise linear function with temperature (temp) as my x variable and usage (usage_value) as my y variable. I can use the details output of the procedure to find the ranges of the different linear pieces of the piecewise function. Is there a way to limit the number of ranges (i.e., instead of having 8 linear functions in the piecewise fit, limit it to 5)? Are there any options I can add that would let me limit the number of linear functions?
Below is the code I am using, where a1 is the name of my data set, temp is the independent variable, and usage_value is the dependent variable.
proc adaptivereg data=a1 plots=all details=bases;
   model usage_value = temp;
run;
I love adaptivereg. It's such a cool little procedure.
You can use the df option in the model statement to control the total number of knots to consider, and the maxbasis option to control the maximum number of knots in the final model. The higher the degrees of freedom that you use, the fewer the knots.
proc adaptivereg data=sashelp.air;
   model air = date / df=12 maxbasis=3;
run;
You can also use the alpha= option to fine-tune it. Increasing alpha will result in more knots.
An alternative approach is to use the PBSPLINE transformation in PROC TRANSREG or a SPLINE effect in PROC QUANTREG, respectively:
proc transreg data=sashelp.air;
   model identity(air) = pbspline(date / evenly=6);
run;

proc quantreg data=sashelp.air;
   effect sp = spline(date / knotmethod=equal(12));
   model air = sp / quantile=0.5;
run;

How does SAS calculate standard errors of coefficients in logistic regression?

I am doing a logistic regression of a binary dependent variable on a four-level multinomial (categorical) independent variable. Somebody suggested to me that it was better to enter the independent variable as a multinomial rather than as three binary variables, even though SAS seems to treat the multinomial as if it were three binaries. Their reason was that, given a multinomial, SAS would report standard errors and confidence intervals for the three binary variables 'relative to the omitted category', whereas given three binaries it would report them 'relative to all cases where the variable was zero'.
When I run the regression both ways and compare, nearly all results are the same, including the fit statistics, the odds ratio estimates, and the confidence intervals for the odds ratios. But the coefficient estimates and their confidence intervals differ between the two.
From my reading of the underlying theory, as presented in Hosmer and Lemeshow's Applied Logistic Regression, the estimates and confidence intervals reported by SAS for the coefficients are consistent with the theory for the regression using three binary independent variables, but not for the one using a four-value multinomial.
I think the difference may have something to do with SAS's choice of 'design variables': for the binary regression the values are 0 and 1, whereas for the multinomial they are -1 and 1. But I don't really understand what SAS is doing there.
Does anybody know how SAS's approach differs between the two regressions, and/or can explain the differences in the outputs?
Here is a link to the SAS output:
SAS output
And here is the SAS code:
proc logistic data=tab descending;
   class binB binC binD / descending;
   model y = binD binC binB;
run;

proc logistic data=tab descending;
   class multi / descending;
   model y = multi;
run;
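For what it's worth, the -1/1 design values are PROC LOGISTIC's default effect coding for CLASS variables (PARAM=EFFECT). A sketch of requesting reference-cell (0/1 dummy) coding instead, which should make the CLASS fit match the hand-built binaries:
proc logistic data=tab descending;
   class multi / param=ref ref=first;   /* 0/1 reference coding */
   model y = multi;
run;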

One-way random-effects ANOVA in SAS: PROC GLM or MIXED?

I'm attempting to conduct a simple one-way random-effects ANOVA in SAS. I want to know whether the population variance is significantly different from zero.
On UCLA's idre site, they state to use PROC MIXED as follows:
proc mixed data=in.hsb12 covtest noclprint;
   class school;
   model mathach = / solution;
   random intercept / subject=school;
run;
This makes sense to me given my previous experience with using PROC MIXED.
However, in the text Biostatistical Design and Analysis Using R by Murray Logan, he says that for a one-way ANOVA, fixed and random effects are not distinguished, and he conducts (in R) a "standard" one-way ANOVA even though he is testing the variance, not the means. I've found that in SAS, his R procedure is equivalent to using any of the following:
PROC ANOVA
PROC GLM (the same code, with GLM in place of ANOVA)
PROC GLM with RANDOM statement
The p-values from the above three models are the same, but they differ from the p-value from the PROC MIXED model used by UCLA: for my data, p=0.2508 versus p=0.3138. Although the conclusions don't change in this instance, I'm not really comfortable with this difference.
Can anyone give advice on which one is more appropriate and also why there is this difference?
For your model, the difference between PROC ANOVA and PROC MIXED is only due to numerical noise (the REML estimator of PROC MIXED). However, the p-values mentioned in your question correspond to different tests. To get the F value from the COVTEST output of PROC MIXED, you need to recalculate MS_groups taking the unequal sample sizes into account (either manually, as explained on p. 231 of http://bio.classes.ucsc.edu/bio286/MIcksBookPDFs/QK08.PDF, or simply by running PROC MIXED with the same fixed-model specification as in PROC ANOVA). This paper (http://isites.harvard.edu/fs/docs/icb.topic1140782.files/S98.pdf) provides some examples of the use of PROC MIXED, in addition to the SAS manual.
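In other words, something like this sketch, in which school enters as a fixed effect and the F test of equal school means matches PROC ANOVA/GLM:
proc mixed data=in.hsb12;
   class school;
   model mathach = school;
run;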