PROC MIXED in SAS for mixed models

I have a question about the REPEATED statement in PROC MIXED in SAS. If I don't add a RANDOM statement, is the model no longer a mixed model, since there is no random effect?

No, that is not right. You can often fit the same model using either REPEATED or RANDOM: the RANDOM statement specifies G-side random effects, while the REPEATED statement specifies the R-side (residual) covariance structure, and many covariance models can be written either way. This is a genuinely confusing part of PROC MIXED, and the ambiguity was removed in PROC GLIMMIX (which has no REPEATED statement). To see what your model actually is, write it out in matrix form; the Details section of the MIXED documentation gives plenty of hints on how to do this.
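For example, a random intercept per subject (G-side) and a compound-symmetry residual structure (R-side) imply the same marginal covariance, so the two specifications below typically give equivalent fits. This is just a sketch with hypothetical data set and variable names:
proc mixed data=mydata;
class id trt time;
model y = trt time;
random intercept / subject=id;
run;
proc mixed data=mydata;
class id trt time;
model y = trt time;
repeated time / type=cs subject=id;
run;
The two fits agree as long as the estimated between-subject variance is not forced to be negative.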

Related

Do you still need to include "site" as a random effect when modeling a matched data set?

I am working on a multicenter propensity-matched cohort study. The primary outcome is binary and the secondary outcome is continuous. First I performed multiple imputation to address the missing data. I initially planned exact matching on the sites in addition to matching on the other variables of interest, but got very poor matches. I then used variables that described the characteristics of the sites, which I compared with the site variable using the c-statistic; they had similar values. With these new variables and the other variables of interest I got a much better match.

I then performed within-imputation conditional logistic regression for the binary outcome and pooled the results. For the secondary outcome I used negative binomial regression, including the match ID in the CLASS statement and in the REPEATED statement. Do I need to include 'site' as a random effect in the model? I don't know if this is possible in conditional logistic regression. What would be the best way to model this data after matching? For this study I used SAS for the analysis.
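For reference, here is how I read the two models described above, as a minimal sketch with hypothetical data set and variable names (run within each imputation):
/* Conditional logistic regression for the binary primary outcome, stratified on the match */
proc logistic data=matched;
strata match_id;
model outcome1(event='1') = exposure covariate1;
run;
/* Negative binomial model for the secondary outcome, with the match ID as the cluster */
proc genmod data=matched;
class match_id;
model outcome2 = exposure covariate1 / dist=negbin link=log;
repeated subject=match_id / type=exch;
run;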

Replace missing data in SAS with prediction: Regression Imputation

I have a SAS data set with missing data in multiple columns. I would like to replace the missing data with a prediction based on the other data in the data set. Here is a link that describes the method but doesn't show how to do it. How do I replace the missing values with a prediction?
EDIT:
The method I had in mind was just using PROC REG and then applying the coefficients to the rows with missing data to generate the estimates. Does this answer your question?
PROC STDIZE, PROC EXPAND, and PROC MI are all capable of performing different kinds of imputation on your data, depending on exactly how you want to determine the 'prediction'.
For simple things like replacing with the mean, PROC STDIZE is the way to go. PROC MI is the most advanced: it performs multiple imputation. PROC EXPAND is appropriate if you have time-series data, as it will try to work out the correct value for that point in the series.
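For instance, a minimal sketch of mean imputation with PROC STDIZE (hypothetical data set and variable names; REPONLY replaces only the missing values and leaves the observed values untouched):
proc stdize data=have out=want method=mean reponly;
var x1 x2 x3;
run;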
If you have missing data in multiple columns, you'll need a separate regression for each column. This probably isn't a good way to do this, but to answer the question: what you're asking for is called scoring a data set, and you can use PROC SCORE.
An alternative is to request, in your regression procedure, an OUTPUT data set that contains the predicted values from that regression:
output out=predicted1 p=pred_var_missing;
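Putting the pieces together, a minimal sketch (hypothetical data set and variable names): fit the regression on the complete rows, keep predicted values via OUTPUT, and save the coefficients with OUTEST= so PROC SCORE can apply them to the full data:
proc reg data=have outest=est;
model y = x1 x2;
output out=predicted1 p=pred_y;
run;
quit;
proc score data=have score=est out=scored type=parms;
var x1 x2;
run;
In the scored data set, the prediction appears in a variable named after the regression model (MODEL1 unless you labeled the model).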
As a matter of methodology, I recommend @Joe's method instead.
Adding to @Joe's answer: if you tell us why you want to do this imputation, we can provide better advice. I wrote a blog post called How to Ask a Statistics Question that may help.
However, single imputation is often a bad method. In particular, if you are going to do further analysis on this data (with the imputed values), single imputation will underestimate the variability of the data and give incorrect results.
PROC MI is usually a better approach.
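A minimal sketch of that workflow (hypothetical data set and variable names): impute several times, run the analysis within each imputation, and pool the results with PROC MIANALYZE:
proc mi data=have nimpute=5 seed=12345 out=mi_out;
var y x1 x2;
run;
proc reg data=mi_out outest=est covout;
model y = x1 x2;
by _imputation_;
run;
quit;
proc mianalyze data=est;
modeleffects intercept x1 x2;
run;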

Why do I get different regression outputs in SAS and in Stata when using Prais-Winsten estimation?

I have a time-series data set with a serious serial correlation problem, so I adopted the Prais-Winsten estimator with iterated estimates to fix that. I did the regressions in Stata with the following command:
prais depvar indepvar indepvar2, vce(robust) rhotype(regress)
My colleague wanted to reproduce my results in SAS, so she used the following:
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 iter itprint method=YW;
run;
For the different specifications we ran, some roughly match while others do not. I also noticed that for each specification, Stata runs many more iterations than SAS. I wonder if there is something wrong with my (or my colleague's) code.
Update
Inspired by Joe's comment, I modified my SAS code.
/*Iterated Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=ITYW;
run;
/*Twostep Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=YW;
run;
I have a few suggestions. Note that I'm not a real statistician and am not familiar with the specific estimators here, so this is just a quick read of the docs.
First off, the most likely issue is that SAS appears to use the OLS variance estimation method. That is, your Stata code specifies vce(robust), whereas SAS appears to be using the equivalent of vce(ols). See this page in the SAS documentation, which explains how SAS does the Yule-Walker method of autoregression, compared with this doc page that explains how Stata does it.
Second, you probably should not specify method=YW. SAS distinguishes between simple Yule-Walker estimation (the "two-step" method) and iterated Yule-Walker estimation; method=ITYW is what you want. Since you specify iter, you may well be getting the iterated version anyway, as SAS tends to be smart about that sort of thing, but it's worth verifying.
I would suggest actually turning the iterations off to begin with: have both programs do the two-step method (in Stata, the twostep option; in SAS, remove the iter option and specify method=YW or no method at all) and see how well they match. Once you can get those to match, move on to the iterated versions; it's possible SAS has a different convergence cutoff than Stata and stops iterating earlier.
I'd also suggest trying this with a single dependent and independent variable pair first, as it's possible the two programs handle things differently when you add a second independent variable. Always start simple and then add complexity; a sketch of that stripped-down comparison follows.
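As a concrete starting point (hypothetical variable names, not code from the question): in Stata, prais depvar indepvar, twostep rhotype(regress), and in SAS the two-step Yule-Walker fit on the same single regressor:
proc autoreg data=DATA;
model depvar = indepvar / nlag=1 method=YW;
run;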

SAS Enterprise - Is it possible to show the mathematical formula behind the calculation?

Is it possible to show the mathematical formula / concept behind an analysis done with SAS Enterprise?
Suppose SAS calculates a correlation between two columns of numbers -- is it possible to see exactly what SAS did, from a mathematical perspective?
It is not possible to ask SAS for the mathematical formula itself, no. You can check the documentation; for example, this page gives many of the elementary statistics formulas (such as the variance, UCLM, and so on).
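For instance, the Pearson correlation that PROC CORR reports is the usual sample correlation,
$r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}},$
which is exactly the kind of formula that documentation page spells out.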
If you need the formula behind something more complex that you can't find online, contact your SAS support representative; they may be able to put you in contact with the developer of that particular procedure, for example if you need to know some particular detail of how PROC GLM does something.
If you executed a task, you can usually ask SAS to show you the SAS code that it ran (in most cases it's available by clicking on the task node), but that would be something like proc freq; tables a*b; run;, not a mathematical formula per se.

One-way random-effects ANOVA in SAS: PROC GLM or MIXED?

I'm attempting to conduct a simple one-way random-effects ANOVA in SAS. I want to know if the population variance is significantly different than zero or not.
On UCLA's idre site, they state to use PROC MIXED as follows:
proc mixed data = in.hsb12 covtest noclprint;
class school;
model mathach = / solution;
random intercept / subject = school;
run;
This makes sense to me given my previous experience with using PROC MIXED.
However, in the text Biostatistical Design and Analysis Using R by Murray Logan, he says that for a one-way ANOVA, fixed and random effects are not distinguished, and he conducts (in R) a "standard" one-way ANOVA even though he's testing the variance, not the means. I've found that in SAS, his R procedure is equivalent to using any of the following:
PROC ANOVA
PROC GLM (same as ANOVA, but with GLM in place of ANOVA)
PROC GLM with RANDOM statement
The p-values from the above three models are the same, but they differ from the PROC MIXED model used by UCLA. For my data, the difference is p = 0.2508 versus p = 0.3138. Although the conclusions don't change in this instance, I'm not comfortable with this difference.
Can anyone give advice on which one is more appropriate and also why there is this difference?
For your model, the difference between the PROC ANOVA and PROC MIXED estimates is only due to numerical noise (PROC MIXED uses a REML estimator). However, the p-values mentioned in your question correspond to different tests. To get the ANOVA F value from the COVTEST output of PROC MIXED, you need to recalculate MS_groups taking the unequal sample sizes into account (either manually, as explained on p. 231 of http://bio.classes.ucsc.edu/bio286/MIcksBookPDFs/QK08.PDF, or by running PROC MIXED with the same fixed-model specification as in PROC ANOVA). This paper (http://isites.harvard.edu/fs/docs/icb.topic1140782.files/S98.pdf) provides some examples of the use of PROC MIXED, in addition to the SAS manual.
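As a sketch (not code from the answer above), one way to get the classical expected-mean-squares F test for the variance component without leaving PROC MIXED is to request ANOVA-type estimation, using the data set and variable names from the UCLA example:
proc mixed data=in.hsb12 method=type3;
class school;
model mathach = ;
random school;
run;
With method=type3, PROC MIXED prints a Type 3 analysis-of-variance table whose F test for school should match the PROC ANOVA / PROC GLM results, alongside the usual covariance parameter estimates.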