I have a campaign result with test and holdout datasets in which the analysis variable (VARIANCE) is not normally distributed. I was trying to use the PROC NPAR1WAY Wilcoxon exact test to get the p-value. For some reason all of the other output is populated, but the Exact Test portion shows null for all of its fields. I am not sure what else to check: none of the values of the VAR variable are null, and the log shows no ERROR message.
PROC NPAR1WAY WILCOXON DATA = TEST1;
CLASS TEST_HOLDOUT_FLAG;
VAR VARIANCE;
EXACT WILCOXON;
RUN;
Result
Wilcoxon Two-Sample Test
Statistic (S) 3.18E+12
Normal Approximation
Z 8.1747
One-Sided Pr > Z <.0001
Two-Sided Pr > |Z| <.0001
t Approximation
One-Sided Pr > Z <.0001
Two-Sided Pr > |Z| <.0001
Exact Test
One-Sided Pr >= S .
Two-Sided Pr >= |S - Mean| .
Z includes a continuity correction of 0.5.
Kruskal-Wallis Test
Chi-Square 66.826
DF 1
Pr > Chi-Square <.0001
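One possibility, offered only as a guess since nothing in the output or log confirms it: with samples this large (the rank-sum statistic is around 3.18E+12), the full exact computation can be infeasible, and the SAS documentation recommends Monte Carlo estimation of the exact p-values for large problems. If that is what is happening here, the EXACT statement options below are worth trying (the MAXTIME= value is just an example), and the log is worth rechecking for a NOTE or WARNING about the exact computation rather than an ERROR.
PROC NPAR1WAY WILCOXON DATA = TEST1;
   CLASS TEST_HOLDOUT_FLAG;
   VAR VARIANCE;
   /* MC requests Monte Carlo estimates of the exact p-values instead of the
      full enumeration; MAXTIME= caps the seconds spent on the exact computation */
   EXACT WILCOXON / MC MAXTIME=600;
RUN;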
Related
I am using Stata to conduct survey analysis and I am running the command:
quietly svy: mean consumption
I need to extract the results and put them in a matrix.
How do I extract the mean, sd and confidence intervals?
I believe for the mean we can run:
matrix[1,2] = e(b)
The following works for me:
webuse nhanes2f
svyset psuid [pweight=finalwgt], strata(stratid)
quietly svy: mean zinc
return list
scalars:
r(level) = 95
macros:
r(mcmethod) : "noadjust"
matrices:
r(table) : 9 x 1
matrix list r(table)
r(table)[9,1]
zinc
b 87.182067
se .49448269
t 176.30965
pvalue 4.244e-48
ll 86.173563
ul 88.190571
df 31
crit 2.0395134
eform 0
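To actually get those numbers into a matrix, as the question asks, one approach is to copy r(table) and subscript it by row name. This is a sketch keyed to the r(table) layout listed above, so check the row names against your own output:
matrix T = r(table)
* pull the point estimate, standard error, and confidence limits for zinc
* the row names (b, se, ll, ul) come from the listing above
matrix results = (T["b","zinc"], T["se","zinc"], T["ll","zinc"], T["ul","zinc"])
matrix colnames results = mean se ll ul
matrix list results
Note that r(table) carries the standard error of the mean rather than the standard deviation; if you need the standard deviation itself, estat sd run after svy: mean should report it.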
When I run the following code to show the predicted probabilities of y (binary) vs. x1 (continuous) at different values of x2 (continuous), the range of x1 goes from its minimum to its maximum.
proc logistic data=data;
model y(event='1') = x1 | x2;
store logiMod;
run;
title "Predicted probabilities";
proc plm source=logiMod;
effectplot slicefit(x=x1 sliceby=x2=0 to 30 by 5);
run;
However, I want to show this graph only for x1 values ranging from 0 to 20 with an increment of 2 if possible. I don't want to change my model. I just want to change the range of the display for the x-axis. How do I do that?
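I am not aware of an EFFECTPLOT option that restricts the x1 range directly, so here is one workaround, offered as a sketch rather than a guaranteed recipe: score the stored model on your own grid of x1 (0 to 20 by 2) at the x2 slices you want, then plot the predictions yourself. The only assumptions beyond the names already in the question are the grid dataset and the ILINK option, which puts the predictions on the probability scale.
data grid;
   do x2 = 0 to 30 by 5;
      do x1 = 0 to 20 by 2;
         output;
      end;
   end;
run;

proc plm source=logiMod;
   /* ILINK applies the inverse link, so the Predicted column is a probability */
   score data=grid out=pred predicted / ilink;
run;

proc sgplot data=pred;
   series x=x1 y=Predicted / group=x2;
   yaxis label="Predicted Probability";
run;
The model itself is untouched; only the plotting grid changes.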
SAS Coding: I perform a t-test on the differences between two groups (independent, but drawn from the same population). The signs of the 'difference' amount and the t-statistic match (i.e., if the mathematical difference between the two groups is negative, the t-statistic is negative; if the difference is positive, the t-statistic is positive).
However, when I run a Wilcoxon rank sum test, the signs of my z-scores don't match the sign (-/+) of the group difference (i.e., when the mathematical difference between the two groups is negative the z-score is positive, and when the difference is positive the z-score is negative).
I have tried sorting the dataset both ascending and descending.
Here's my code:
proc sort data = fundawin3t;
by vb_nvb_TTest;
run;
**Wilcoxon rank sums for vb vs nvb firms.;
proc npar1way data = fundawin3t wilcoxon;
title "NVB vs VB univariate tests and Wilcoxon-Table 4";
var ma_score_2015 age mve roa BM BHAR prcc_f CFI CFF momen6 vb_nvb SERIAL recyc_v;
class vb_nvb_TTest;
run;
Here is my log:
3208
3209 proc sort data = fundawin3t;
3210 by vb_nvb_TTest;
3211 run;
NOTE: Input data set is already sorted, no sorting done.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
3212
3213 **Wilcoxon rank sums for vb vs nvb firms.;
3214 proc npar1way data = fundawin3t wilcoxon;
3215 title "NVB vs VB univariate tests and Wilcoxon-Table 4";
3216 var ma_score_2015 age mve roa BM BHAR prcc_f CFI CFF momen6
tenure vb_nvb SERIAL
3216! recyc_v;
3217 class vb_nvb_TTest;
3218 run;
NOTE: PROCEDURE NPAR1WAY used (Total process time):
real time 6.59 seconds
cpu time 5.25 seconds
RTM — the relevant passage from the PROC NPAR1WAY documentation:
To compute the linear rank statistic S, PROC NPAR1WAY sums the scores of the observations in the smaller of the two samples. If both samples have the same number of observations, PROC NPAR1WAY sums those scores for the sample that appears first in the input data set.
PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic z is greater than its null hypothesis expected value of 0, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to 0, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. The one-sided p-value $P_1(z)$ can be expressed as

$$P_1(z) = \begin{cases} \operatorname{Prob}(Z > z) & \text{if } z > 0, \\ \operatorname{Prob}(Z < z) & \text{if } z \le 0, \end{cases}$$

where $Z$ has a standard normal distribution.
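Tying that back to the question: the reported z is just the standardized form of S, so its sign says whether the rank sum of the group that PROC NPAR1WAY uses for S (the smaller group, or the first one when the group sizes are equal) sits above or below its null expectation. It is not tied to the sign of the raw difference in group means, which is why the two can disagree, and re-sorting the data only matters when the two groups have exactly the same size. In symbols,

$$z = \frac{S - E_0(S)}{\sqrt{\operatorname{Var}_0(S)}},$$

where $E_0(S)$ and $\operatorname{Var}_0(S)$ are the mean and variance of S under the null hypothesis (by default PROC NPAR1WAY also applies a 0.5 continuity correction, which pulls the numerator toward zero).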
I have used the following statement to calculate predicted values of a logistic model
proc logistic data=dev descending outest=model;
   class cat_vars;
   model dep = cont_var cat_var / selection=stepwise slentry=0.1 slstay=0.1
                                  stb lackfit;
   output out=tmp p=probofdefault;
   score data=dev out=Logit_File;
run;
I want to know how to interpret the probabilities I get in Logit_File. Are those values odds ratios (exp(y)), or are they probabilities (odds ratio / (1 + odds ratio))?
Probabilities cannot be odds ratios: a probability lies between 0 and 1, while an odds ratio has no upper bound. The output from SCORE consists of probabilities.
If you consider why the SCORE statement exists in the first place, this should make sense: SCORE is designed to score new data sets using a previously fitted model. It applies the estimates (the odds ratios and so on) of the old model to a new data set.
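For concreteness, here is the standard logistic-regression algebra behind that answer (nothing specific to this code): with linear predictor $x\beta$,

$$\text{odds} = e^{x\beta} = \frac{p}{1-p}, \qquad p = \frac{e^{x\beta}}{1 + e^{x\beta}} = \frac{\text{odds}}{1 + \text{odds}},$$

so the scored values, which sit between 0 and 1, are $p$. In the question's notation, exp(y) would be the odds, and an odds ratio is a ratio of two such odds (for example, $e^{\beta_j}$ for a one-unit change in predictor $j$).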
I need to calculate the following for my dataset. I could calculate the individual PPV (95% CI) and NPV (95% CI), but got a tad confused about how to calculate this:
PPV+NPV-1 (95% CI)
How do I do this calculation?
This page on SAS support gives code as follows:
title 'Sensitivity';
proc freq data=FatComp;
where Response=1;
weight Count;
tables Test / binomial(level="1");
exact binomial;
run;
title 'Specificity';
proc freq data=FatComp;
where Response=0;
weight Count;
tables Test / binomial(level="0");
exact binomial;
run;
title 'Positive predictive value';
proc freq data=FatComp;
where Test=1;
weight Count;
tables Response / binomial(level="1");
exact binomial;
run;
title 'Negative predictive value';
proc freq data=FatComp;
where Test=0;
weight Count;
tables Response / binomial(level="0");
exact binomial;
run;
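The page gives the four pieces separately but not PPV+NPV-1 itself. One way to get a confidence interval for that sum, offered as a rough large-sample sketch rather than anything from the linked page: PPV is estimated from the Test=1 subjects and NPV from the Test=0 subjects, so the two estimates come from disjoint groups, their variances add, and a Wald-type interval follows. The variable names and 1/0 coding below are taken from the FatComp code above.
proc sql;
   create table parts as
   select sum(Count*(Test=1 and Response=1)) as a,   /* test positive, disease positive */
          sum(Count*(Test=1 and Response=0)) as b,   /* test positive, disease negative */
          sum(Count*(Test=0 and Response=1)) as c,   /* test negative, disease positive */
          sum(Count*(Test=0 and Response=0)) as d    /* test negative, disease negative */
   from FatComp;
quit;

data ppv_npv_ci;
   set parts;
   ppv = a / (a + b);
   npv = d / (c + d);
   est = ppv + npv - 1;
   /* the two proportions come from disjoint groups, so the variances add */
   se = sqrt( ppv*(1 - ppv)/(a + b) + npv*(1 - npv)/(c + d) );
   lower = est - probit(0.975)*se;
   upper = est + probit(0.975)*se;
run;

proc print data=ppv_npv_ci noobs;
   var est lower upper;
run;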
I doubt that this is a useful measure. In general you should present sensitivity, specificity, and the positive and negative predictive values. If you want a global measure of accuracy, you should go for the proportion of correctly classified subjects.
If you go to the webpage already suggested by Peter Flom, you can scroll down to a piece of code for overall accuracy. The accuracy can be computed by creating a binary variable indicating whether test and response agree in each observation:
data acc;
set FatComp;
if (test and response) or
(not test and not response) then acc=1;
else acc=0;
run;
proc freq data=acc;
weight count;
tables acc / binomial(level="1");
exact binomial;
run;
Hope it helps