Kruskal-Wallis test vs. ANOVA in SAS with complex survey data? - sas

I am analyzing the temporal trend (yr) of certain chemicals (a, b, and c).
I used proc sgplot with a series statement to draw a plot and found a decreasing trend.
Because the data are right-skewed, I used the median concentration for each year to draw the plot.
Now I would like to conduct a statistical test of the trend. My data come from NHANES, so I need to use the proc survey** procedures for the analysis. I know I can do an ANOVA test with proc surveyreg by using the ANOVA option in the MODEL statement.
proc surveyreg data=a;
stratum stra;
cluster clus;
weight wt;
model a=yr/anova;
run;
But since the original data are right-skewed, I think it may be better to use a Kruskal-Wallis test on the original data. I don't know how to code that in SAS, and I didn't find anything about it in the proc survey** documentation.
My plan B is to use log-transformed data and the ANOVA test, but I am not sure that is an appropriate approach. Can somebody tell me how to test the normality of the residuals from the ANOVA when using proc surveyreg? I would also like to know whether I can test a, b, and c in one procedure or whether I should write multiple procedures, changing the MODEL statement each time.
Looking forward to your replies. Thank you!
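A minimal sketch of what plan B might look like, under several assumptions: the data set name mydata is hypothetical, the concentrations are positive so the log is defined, and the macro name fit_trend and the output names (resid_a, r, p, and so on) are made up for illustration. As far as I know, proc surveyreg takes one MODEL statement per invocation, so the sketch loops over a, b, and c with a macro. The residual check uses proc univariate, which ignores the weights and the design, so it is only an informal diagnostic.

```sas
/* Assumption: MYDATA holds stra, clus, wt, yr and the analytes a, b, c */
data logdata;
  set mydata;
  array raw{3} a b c;
  array lg{3}  log_a log_b log_c;
  do i = 1 to 3;
    /* log is undefined for values <= 0; those rows stay missing */
    if raw{i} > 0 then lg{i} = log(raw{i});
  end;
  drop i;
run;

%macro fit_trend(vars=a b c);
  %local i v;
  %do i = 1 %to %sysfunc(countw(&vars));
    %let v = %scan(&vars, &i);

    /* Design-based regression of the log concentration on year */
    proc surveyreg data=logdata;
      strata stra;
      cluster clus;
      weight wt;
      model log_&v = yr / anova solution;
      output out=resid_&v residual=r predicted=p;
    run;

    /* Informal, unweighted look at the residual distribution */
    proc univariate data=resid_&v normal;
      var r;
      qqplot r / normal(mu=est sigma=est);
    run;
  %end;
%mend fit_trend;

%fit_trend(vars=a b c)
```

With yr entered as a continuous variable, the model tests a linear trend on the log scale. Adding a `class yr;` statement before the MODEL statement would instead compare the years as groups, which is closer in spirit to an ANOVA or Kruskal-Wallis comparison.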

Related

Getting Chi-Square statistics in proc surveylogistic

By default, proc surveylogistic displays an F test in the "testing the null hypothesis BETA=0" output. Can I somehow change that to a chi-square test?
Usually I use proc logistic, but this time I have a cluster variable, and to my knowledge proc logistic can't handle those.
In the documentation I read that the F and chi-square tests are equivalent, but the significance tests give different results from proc logistic for the same analysis (although the point estimates for the intercept and my independent variable are the same).
I also tried the df=infinity option, but only the name changes; the value stays the same.
Regards

SAS: how feasible is it to do an iterative multiple imputation

I am new to SAS, and I would like to know how easy or difficult it would be to do an iterative multiple imputation in SAS. In R, this is relatively easy.
The algorithm is as follows:
1. impute missing data using known distribution
2. fit model to complete data in 1
3. use model fit in 2 to impute missing data
4. repeat model fitting and imputation steps 50 times (e.g. 50 data sets total)
5. take every 10th dataset and pool the results
Based on my limited experience in SAS, I'm guessing I would have to write a MACRO. I am specifically interested in using proc nlmixed to fit my model. I am not using R because SAS's nlmixed is more flexible and gives more robust results.
proc mi nimpute=n;
proc sort; by _Imputation_;
proc nlmixed; by _Imputation_;
proc mianalyze;

plot entire time series in SAS?

I am trying to make a forecast, and I want to see the entire time series with the forecast period at the end (I need to compare it with another graph of this kind).
SAS 9.4 does not want to comply, however, and only shows me the forecasting part.
What can I do to remedy this?
The code I'm using is:
Proc arima data=logtabell;
identify var=y(12) nlag=24;
estimate p=1 q=2;
forecast lead=12 interval=month id=date out=results;
run;
Your out= dataset will contain all values by default. If you want to specifically see the full graph that it outputs, add the plots option:
proc arima data=logtabell plots=forecast(forecast);
Or, just do it the easy way by getting every plot:
proc arima data=logtabell plots = all;
Also, make sure ods graphics on; is set.
Many SAS procedures have an out= option that writes the results to a data set; here, the forecast statement's out=results does that.
Combine this output with your time series in one graph created with proc sgplot.
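A minimal sketch of that last suggestion, assuming the results data set was produced by the forecast statement shown in the question. The out= data set from proc arima should contain the id variable (date), the original series (y), and the variables FORECAST, L95, and U95; the legend labels are just illustrative.

```sas
proc sgplot data=results;
  /* Shaded band for the 95% confidence limits */
  band x=date lower=l95 upper=u95 / transparency=0.5 legendlabel="95% limits";
  /* Observed series: missing over the forecast horizon */
  series x=date y=y / legendlabel="Observed";
  /* One-step-ahead predictions in sample, then forecasts for the lead period */
  series x=date y=forecast / legendlabel="Forecast";
  xaxis label="Date";
run;
```

Because the whole history and the 12 forecast months sit in one data set, both appear on the same time axis.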

use estimates from proc glm to make predictions on another dataset

I'm not very familiar with SAS proc glm. All I have done with proc glm so far is output parameter estimates and predicted values on training datasets. But I also need to use the fitted model to make predictions on a testing dataset (both point estimates and interval estimates).
Here is my code.
ods output ParameterEstimates=Pi_Parameters FitStatistics=Pi_Summary PredictedValues=Pi_Fitted;
proc glm data=Train_Pi;
class Area Fo5 Tye M0 M1 M2 M3;
model Pi = Dow Area Fo5 Tye M0|HC M1|HC M2|HC M3|HC/solution p ss3 /*tolerance*/;
run;
But how do I proceed to the next step? I am looking for something like predict(Model_from_Train_Pi, Test_Pi).
If you're on SAS 9.4 see Jake's answer from this question:
How to predict probability in logistic regression in SAS?
If you're not on 9.4, my answer there applies: add the test data to the original data set.
A third option is PROC SCORE - documentation has an example for proc reg that's almost identical to your question:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_score_sect018.htm
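A minimal sketch of the "add the test data to the original data set" approach, assuming Test_Pi has the same predictor variables as Train_Pi. The names Train_Test, is_train, Pi_actual, Pi_Scored, Test_Scored, and the output variable names are hypothetical. Rows with a missing response are left out of the fit but still get predicted values and limits from the OUTPUT statement.

```sas
/* Stack the two data sets; blank out the response on test rows so they
   do not influence the fit but still receive predictions. */
data Train_Test;
  set Train_Pi(in=in_train) Test_Pi;
  is_train  = in_train;      /* hypothetical flag: 1 = training row */
  Pi_actual = Pi;            /* hypothetical copy of the observed response */
  if not in_train then Pi = .;
run;

proc glm data=Train_Test;
  class Area Fo5 Tye M0 M1 M2 M3;
  model Pi = Dow Area Fo5 Tye M0|HC M1|HC M2|HC M3|HC / solution ss3;
  output out=Pi_Scored
         predicted=Pi_hat
         lclm=mean_lo uclm=mean_hi    /* 95% limits for the mean response */
         lcl=pred_lo  ucl=pred_hi;    /* 95% limits for an individual prediction */
run;
quit;

/* Keep only the scored test rows */
data Test_Scored;
  set Pi_Scored;
  where is_train = 0;
run;
```

One caveat: a test row whose CLASS variable takes a level that never occurs in the training rows cannot be scored meaningfully, so it is worth checking the level lists before relying on the output.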

One-way random-effects ANOVA in SAS: PROC GLM or MIXED?

I'm attempting to conduct a simple one-way random-effects ANOVA in SAS. I want to know if the population variance is significantly different than zero or not.
On UCLA's idre site, they state to use PROC MIXED as follows:
proc mixed data = in.hsb12 covtest noclprint;
class school;
model mathach = / solution;
random intercept / subject = school;
run;
This makes sense to me given my previous experience with using PROC MIXED.
However, in the text Biostatistical Design and Analysis Using R by Murray Logan, he says for a one-way ANOVA, fixed and random effects are not distinguished and conducts (in R) a "standard" one-way ANOVA even though he's testing the variance, not the means. I've found that in SAS, his R procedure is equivalent to using any of the following:
PROC ANOVA
PROC GLM (same as ANOVA, but with GLM in place of ANOVA)
PROC GLM with RANDOM statement
The p-values from the above three models are the same, but differ from the PROC MIXED model used by UCLA. For my data, it's a difference of p=0.2508 and p=0.3138. Although conclusions don't change in this instance, I'm not really comfortable with this difference.
Can anyone give advice on which one is more appropriate and also why there is this difference?
For your model, the difference between PROC ANOVA and PROC MIXED is only due to numerical noise (PROC MIXED uses the REML estimator). However, the p-values mentioned in your question correspond to different tests. To get the F value from the COVTEST output of PROC MIXED, you need to recalculate MS_groups taking the unequal sample sizes into account (either manually, as explained on p. 231 of http://bio.classes.ucsc.edu/bio286/MIcksBookPDFs/QK08.PDF, or just by using PROC MIXED with the same fixed model specification as in PROC ANOVA). This paper (http://isites.harvard.edu/fs/docs/icb.topic1140782.files/S98.pdf) provides some examples of the use of PROC MIXED, in addition to the SAS manual.
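As a sketch of the second option, assuming the same in.hsb12 data and variables as in the UCLA example, the group effect can be fitted as a fixed effect in PROC MIXED so that the procedure reports an F test rather than the Wald Z test that COVTEST produces. The PROC GLM step with a RANDOM statement is included only for side-by-side comparison.

```sas
/* School entered as a fixed effect: the Type 3 table gives an F test */
proc mixed data=in.hsb12;
  class school;
  model mathach = school;
run;

/* The corresponding F test from PROC GLM, with school declared random */
proc glm data=in.hsb12;
  class school;
  model mathach = school;
  random school / test;
run;
quit;
```

For a one-way layout the two F tests should agree, which makes it easier to see that the discrepancy in the question comes from comparing an F test with the COVTEST Wald test.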