Finding difference in means using KS Test (Weighted data) - sas

I am trying to find the difference in means in my two samples. The distribution is not normal hence I have to resort to KS test. Now, the data is weighted. In proc t-test, there is an option of mentioning weight in the command, like this:
proc ttest data=input;
var variable_x;
class variable_y;
weight weight;
run;
Is there a similar thing for KS test in npar1way procedure? Thanks!

Related

Kruskal-Wallis test vs. ANOVA in SAS with complex survey data?

I am analyzing a temporal trend(yr) of certain chemicals(a b & c).
I use proc sgplot and series statement to draw a plot and found there was a decreasing trend.
Becuase the data is right-skewed, I used the median concentration of each year to draw the plot.
Now I would like to conduct a statistical test on the trend. My data came from the NHANES and need to use the proc survey** to perform analysis. I know I can do an ANOVA test based on proc surveyreg and use ANOVA option in the MODELstatement.
proc suveyreg data=a;
stratum stra;
cluster clus;
weight wt;
model a=yr/anova;
run;
But since the original data is right-skewed, I think maybe it is better to use Kruskal-Wallis test on the original data. But I don't know how to write a code in SAS and I didn't find information in proc survey**-related document.
My plan B is to use the log-transformed data and ANOVA test. But I am not sure if that is an appropriate approach. Can somebody tell me how to get the normality test of the residual in ANOVA while using proc surveyreg? I would also like to know if I can test a b & c in one procedure or I should write multiple procedures with changes in MODEL statement.
Looking forward to your engagement.Thank you!

SAS: how feasible is it to do an iterative multiple imputation

I am new to SAS, and I would like how easy/difficult it would be to try to do an iterative multiple imputation in SAS. In R, this is relatively easy.
The algorithm is as follows:
impute missing data using known distribution
fit model to complete data in 1
use model fit in 2 to impute missing data
repeat model fitting and imputation steps 50 times (e.g. 50 data sets total)
take every 10th dataset and pool the results
Based on my limited experience in SAS, I'm guessing I would have to write a MACRO. I am specifically interested in using proc nlmixed to fit my model. I am not using R because SAS's nlmixed is more flexible and gives more robust results.
proc mi NIMPUTE=n
proc sort; by _Imputation_
proc NLMIXED; by _Imputation_
proc mianalyze;

SAS: choosing the right model

I'm getting a bit lost in alle the possible ways to find association...
I have a dataset in which my subjects are categorized 1, 2 or 3 (depending on genotype polymorphisms). I want to know the association of either polymorphism with ballistic strength (which is a continuous variable).
Since I have one continous independent variable (strength) and one categorical dependant variable (genotype pholymorphism), I taught of using ANOVA, but I'm not sure which test to choose and how to find the exact one in SAS.
Thanks in advance!
This is better asked on Cross Validated (https://stats.stackexchange.com/).
That said, I would look at PROC GENMOD or PROC GLM to get an idea of how the genotypes affect the strength variable.
proc genmod data=test;
class genotype ;
model strength = genotype / type1;
run;
You can use PROC TTEST to compare any two levels. Here I compare A to B.
proc ttest data=test(where=(genotype in ('A', 'B')));
class genotype;
var strength;
run;

use estimates from proc glm to make prediciton on another dataset

I'm not so familiar with SAS proc glm. All I have done using proc glm so far is to output parameter estimates and predicted values on training datasets. But I also need to use the fitted model to make prediction on testing dataset. (both point estimates and interval estimates)
Here is my code.
ods output ParameterEstimates=Pi_Parameters FitStatistics=Pi_Summary PredictedValues=Pi_Fitted;
proc glm data=Train_Pi;
class Area Fo5 Tye M0 M1 M2 M3;
model Pi = Dow Area Fo5 Tye M0|HC M1|HC M2|HC M3|HC/solution p ss3 /*tolerance*/;
run;
But how to proceed to next step? something like predict(Model_from_Train_Pi,Test_Pi)
If you're on SAS 9.4 see Jake's answer from this question:
How to predict probability in logistic regression in SAS?
If not on 9.4, my answer applies for adding the data in to the original data set.
A third option is PROC SCORE - documentation has an example for proc reg that's almost identical to your question:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_score_sect018.htm

One-way random-effects ANOVA in SAS: PROC GLM or MIXED?

I'm attempting to conduct a simple one-way random-effects ANOVA in SAS. I want to know if the population variance is significantly different than zero or not.
On UCLA's idre site, they state to use PROC MIXED as follows:
proc mixed data = in.hsb12 covtest noclprint;
class school;
model mathach = / solution;
random intercept / subject = school;
run;
This makes sense to me given my previous experience with using PROC MIXED.
However, in the text Biostatistical Design and Analysis Using R by Murray Logan, he says for a one-way ANOVA, fixed and random effects are not distinguished and conducts (in R) a "standard" one-way ANOVA even though he's testing the variance, not the means. I've found that in SAS, his R procedure is equivalent to using any of the following:
PROC ANOVA
PROC GLM (same as ANOVA, but with GLM in place of ANOVA)
PROC GLM with RANDOM statement
The p-values from the above three models are the same, but differ from the PROC MIXED model used by UCLA. For my data, it's a difference of p=0.2508 and p=0.3138. Although conclusions don't change in this instance, I'm not really comfortable with this difference.
Can anyone give advice on which one is more appropriate and also why there is this difference?
For your model, the difference between PROC ANOVA and PROC MIXED is only due to numerical noise(REML estimator of PROC MIXED). However, the p-values mentioned in your question correspond to the different tests. In order to get the F value using the output of COVTEST in PROC MIXED, you need to recalculate MS_groups taking into account the unequal sample sizes (either manually as explained on p.231 of http://bio.classes.ucsc.edu/bio286/MIcksBookPDFs/QK08.PDF, or just using PROC MIXED with the same fixed model spec as in PROC ANOVA). This paper (http://isites.harvard.edu/fs/docs/icb.topic1140782.files/S98.pdf) provides some examples of used of PROC MIXED in addition to SAS manual.