I am fitting a logit model to a household dataset. The dependent variable classifies each household as poor or not poor (1 if it is poor, 0 if it is not):
proc logistic data=regression;
model poor(event="1") = variable1 variable2 variable3 variable4;
run;
Using PROC LOGISTIC in SAS, I obtained the table "Association of Predicted Probabilities and Observed Responses", which gives me the percentage of concordant pairs. However, I need detailed information on how many households are correctly classified as poor, in this way:
I would appreciate your help with this issue.
Add the CTABLE option to your MODEL statement.
model poor(event="1") = variable1 variable2 variable3 variable4 / ctable;
CTABLE classifies the input binary response observations according to
whether the predicted event probabilities are above or below some
cutpoint value z in the range [0,1]. An observation is predicted as an
event if the predicted event probability exceeds or equals z. You can
supply a list of cutpoints other than the default list by specifying
the PPROB= option. Also, you can compute positive and negative
predictive values as posterior probabilities by using Bayes’ theorem.
You can use the PEVENT= option to specify prior probabilities for
computing these statistics. The CTABLE option is ignored if the data
have more than two response levels. This option is not available with
the STRATA statement.
For more information, see the section Classification Table.
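To get the counts at a single cutpoint rather than the default list, PPROB= can be given one value, and the table can be captured with ODS OUTPUT. The sketch below assumes a cutpoint of 0.5 and hypothetical output dataset names (work.ctab, work.pred, work.pred2); Classification is, to my knowledge, the ODS name of the CTABLE table. The last two steps build a plain 2x2 cross-tabulation from the ordinary predicted probabilities. Its counts can differ slightly from CTABLE, which uses bias-adjusted probabilities.

```sas
/* Sketch: the cutpoint 0.5 and the work.* dataset names are assumptions. */
ods output Classification=work.ctab;   /* capture the CTABLE output */
proc logistic data=regression;
   model poor(event="1") = variable1 variable2 variable3 variable4
         / ctable pprob=0.5;           /* classification table at one cutpoint */
   output out=work.pred p=phat;        /* phat = predicted probability that poor=1 */
run;

proc print data=work.ctab noobs;
run;

/* Observed vs. predicted status at the same cutpoint */
data work.pred2;
   set work.pred;
   if not missing(phat);               /* drop rows that could not be scored */
   pred_poor = (phat >= 0.5);          /* 1 = predicted poor, 0 = predicted not poor */
run;

proc freq data=work.pred2;
   tables poor*pred_poor / norow nocol nopercent;
run;
```

The cell with poor=1 and pred_poor=1 is the number of poor households classified correctly at that cutpoint.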
Related
I am using the SAS procedure PSMATCH to balance the cohorts. I calculate the propensity score separately using logistic regression and then pass the generated dataset to PSMATCH using PSDATA. I run multiple iterations of matching (to get the best results) by varying the region, method (optimal, greedy, and variable ratio), distance variable, caliper value, and ratio. Please find the code below:
proc psmatch data=work.&data_set. region=&region_var.;
class &cat_var.;
psdata treatvar = case_cntrl_fl(Treated='1') PS=prop_score;
match method=&mtch_method.(&k_method.=&k_val.) exact=&exact_mtch_var.
stat=&stat_var. caliper(mult=stddev)=&caliper_var.;
assess lps ps var=(prop_score &covar_asses.) / plots = (boxplot cloudplot);
output out(obs=match)=WORK.psm ps=ps lps=lps matchid=_MatchID matchwgt = _MATCHWGT_;
run;
My concern is the number of observations considered for matching (i.e., "All Observations"). The logistic regression dataset contains 531 observations in Treatment Arm 1 and 3252 in Treatment Arm 2. However, the PSMATCH report lists All Observations as 446 for Treatment Arm 1 and 2784 for Treatment Arm 2. The result is the same regardless of the PSMATCH method used.
Can somebody help me understand the possible reason for the drop in counts?
You likely have missing values in your data. If any variable used in the procedure has a missing value, that entire row is excluded from the analysis.
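One way to check this is to count missing values in the dataset passed to PROC PSMATCH. This is only a sketch: it reuses the macro variables and names from the question (&data_set., &cat_var., case_cntrl_fl) and assumes the variables listed in &cat_var. are the categorical ones of interest.

```sas
/* Numeric variables: non-missing (N) and missing (NMISS) counts per treatment arm. */
/* With no VAR statement, PROC MEANS analyzes all numeric variables not used elsewhere. */
proc means data=work.&data_set. n nmiss;
   class case_cntrl_fl / missing;   /* also show rows where the treatment flag is missing */
run;

/* Categorical variables: the MISSING option lists missing values as their own level */
proc freq data=work.&data_set.;
   tables case_cntrl_fl &cat_var. / missing;
run;
```

If the missing counts account for the gap between 531/3252 and 446/2784, the drop comes from incomplete rows.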
Possibly related to this question: How can I print odds ratios as part of the results of a GENMOD procedure?
I am dealing with a wide dataset containing a main exposure variable, a categorical variable Type (four levels), as well as several continuous and binary variables as confounding factors.
Additional info: The dataset contains multiple imputations.
I am using the following code:
Proc genmod;
Class ID Type (ref=first);
Model class1= Type;
estimate 'black' TYPE 0 1 1/exp;
estimate 'white' TYPE 1 0 1/exp;
estimate 'red' TYPE 0 1 0/exp;
Repeated ID;
By imputation;
Run;
I expected the results table to contain, among other things, the exponentiated beta for every level of the categorical variable Type (except that variable's reference group). The actual results table contains neither beta values nor confidence intervals.
What syntax should I use to tell SAS to produce those numbers in the results? I have looked through the SAS documentation, but I have yet to find an answer.
I was looking for a way to compute the effect size for Friedman's test using SAS, but could not find any reference. I wanted to see whether there is any difference between the groups and how large it is.
Here is my code:
proc freq data=mydata;
tables id*b*y / cmh2 scores=rank noprint;
run;
These are the results:
The FREQ Procedure
Summary Statistics for b by y
Controlling for id
Cochran-Mantel-Haenszel Statistics (Based on Rank Scores)
Statistic  Alternative Hypothesis   DF     Value    Prob
1          Nonzero Correlation       1  230.7145  <.0001
2          Row Mean Scores Differ    1  230.7145  <.0001
This question is related to the one posted on Cross Validated, which concerns the general statistical formula for computing the effect size for Friedman's test. Here, I would like to find out how to get the effect size in SAS.
I am doing a logistic regression of a binary dependent variable on a four-value multinomial (categorical) independent variable. Somebody suggested to me that it was better to put the independent variable in as multinomial rather than as three binary variables, even though SAS seems to treat the multinomial as if it were three binaries. Their reason was that, if given a multinomial, SAS would report standard errors and confidence intervals for the three binary variables 'relative to the omitted variable', whereas if given three binaries it would report them 'relative to all cases where the variable was zero'.
When I do the regression both ways and compare, I see that nearly all results are the same, including fit statistics, Odds Ratio estimates and confidence intervals for odds ratios. But the coefficient estimates and conf intervals for those differ between the two.
From my reading of the underlying theory, as presented in Hosmer and Lemeshow's 'Applied Logistic Regression', the estimates and confidence intervals reported by SAS for the coefficients are consistent with the theory for the regression using three binary independent variables, but not for the one using a 4-value multinomial.
I think the difference may have something to do with SAS's choice of 'design variables', as for the binary regression the values are 0 and 1, whereas for the multinomial they are -1 and 1. But I don't really understand what SAS is doing there.
Does anybody know how SAS's approach differs between the two regressions, and/or can explain the differences in the outputs?
Here is a link to the SAS output:
SAS output
And here is the SAS code:
proc logistic data=tab descending;
class binB binC binD / descending;
model y = binD binC binB ;
run;
proc logistic data=tab descending;
class multi / descending;
model y = multi;
run;
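If the difference does come from the design variables, one way to check is to request reference-cell coding explicitly. PROC LOGISTIC uses effect coding (PARAM=EFFECT, design values of -1, 0, and 1) for CLASS variables by default, whereas PARAM=REF produces 0/1 design variables measured against a reference level. A minimal sketch using the names from the code above:

```sas
proc logistic data=tab descending;
   /* PARAM=REF: 0/1 design variables instead of the default effect coding */
   class multi / param=ref descending;
   model y = multi;
run;
```

With PARAM=REF each coefficient is a log odds ratio against the reference level, so the estimates should line up with the three-binary fit, provided the reference level corresponds to the omitted category and the binaries enter as 0/1 variables. The "Class Level Information" table in the output shows the coding actually used.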
Background: I have a categorical variable, X, with four levels that I fit as separate dummy variables. Thus, there are three total dummy variables representing x=1, x=2, x=3 (x=0 is baseline).
Problem/issue: I want to be able to calculate the value of a linear combination (i.e. using SAS as a calculator) of these dummy variables. For example, 2*B1 + 2*B2 + B3.
In Stata, this can be done using the lincom command, which uses the stored beta estimates to calculate linear combinations of the parameters.
In SAS in a procedure such as PROC GLM, I think I should use the ESTIMATE statement, but I'm not sure how I would specify the "weights" for each variable in this case.
You are looking for PROC SCORE. This takes output regression or factor estimates and scores a new data set. See here for an example. http://support.sas.com/documentation/cdl/en/statug/66859/HTML/default/viewer.htm#statug_score_examples02.htm
FYI, PROC MODEL does allow this in the model statement, which may be less work than PROC SCORE. I know PROC MODEL can readily be used in place of PROC REG, but I'm not sure how advanced the modeling in PROC MODEL can be, so it may not be an option for more complex models. I was hoping for something with less coding, but given the nature of SAS, I think this and PROC SCORE are the best I'm going to get.
What if you add your linear combination as a variable in your input dataset?
data myDatasetWithLinCom;
set mydata;
LinComb=2*(x=1)+ 2*(x=2)+(x=3); /* equivalent to 2*B1 + 2*B2 + B3 */
run;
Then you can specify LinComb as one of the explanatory variables and look up the coefficient directly in the output.
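For the ESTIMATE route mentioned in the question, a minimal sketch follows. It assumes the three dummies are created as numeric 0/1 variables named d1, d2, and d3, that the response is called y, and that the working dataset is work.dummies (all hypothetical names). Each number in the ESTIMATE statement is the weight on the corresponding beta.

```sas
/* Assumed names: mydata holds a response y and a 4-level variable x (0,1,2,3). */
data work.dummies;
   set mydata;
   if not missing(x);   /* otherwise a missing x would be coded as baseline */
   d1 = (x = 1);        /* indicator for x=1 */
   d2 = (x = 2);        /* indicator for x=2 */
   d3 = (x = 3);        /* indicator for x=3; x=0 is the baseline */
run;

proc glm data=work.dummies;
   model y = d1 d2 d3 / solution;
   /* Estimate and test the linear combination 2*B1 + 2*B2 + B3 */
   estimate '2*B1 + 2*B2 + B3' d1 2 d2 2 d3 1;
run;
quit;
```

The ESTIMATE output reports the value of the combination with its standard error and t test, which is close to what lincom gives in Stata.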