SAS: Different Odds Ratio from PROC FREQ & PROC LOGISTIC

I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After checking the odds ratios with PROC FREQ, I believe there is some form of error in the odds ratios from PROC LOGISTIC.
The example below is of the response variable "MonthStay" and one of the variables in question "KennelCough". MonthStay = Y and the event of interest is KennelCough = N.
I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations? Or am I totally misunderstanding what's going on? Thanks!
Here is the PROC FREQ code and result:
proc freq data = capstone.adopts_dog order = freq;
tables KennelCough*MonthStay / relrisk;
run;
Here is the PROC LOGISTIC code and results:
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
class Breed(ref='Chihuahua') Gender(ref='Female')
Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y')
Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
/ param=ref;
model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNatIn
NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful
Handling UnderAge Injuris InRegion OutRegion
/ lackfit aggregate scale = none selection = backward rsquare;
output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
[Output tables not reproduced: Class Level Information, Odds Ratio Estimates]

In PROC FREQ you are calculating an unadjusted odds ratio, while in PROC LOGISTIC every odds ratio is adjusted for the other covariates included in the logistic regression model.
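One way to check this, as a minimal sketch: fit PROC LOGISTIC with KennelCough as the only predictor, so that its odds ratio is unadjusted and can be compared with the PROC FREQ estimate. The data set and variable names are taken from the question; dropping the other covariates is an assumption made purely for the comparison. If the two values turn out to be reciprocals of each other, the likely cause is level ordering: PROC FREQ computes the odds ratio from the rows and columns in the order the table is displayed (here controlled by ORDER=FREQ), whereas this model compares KennelCough = N with the reference level Y for the event MonthStay = Y.

```sas
/* Unadjusted model: KennelCough is the only predictor, so its odds  */
/* ratio is not adjusted for any other covariate.                    */
proc logistic data = capstone.adopts_dog;
    class KennelCough(ref='Y') / param=ref;
    model MonthStay(event='Y') = KennelCough;
run;

/* Same two-way table without ORDER=FREQ, so levels appear in their  */
/* default (sorted) order, which makes the direction easier to read. */
proc freq data = capstone.adopts_dog;
    tables KennelCough*MonthStay / relrisk;
run;
```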

Related

proc transreg not outputting curve fit plot

I am using proc transreg to test different transformations in the sashelp.baseball dataset. I request all plots and sometimes I can see a curve fit graph and sometimes I can't. Is there something I am missing if I want to output the regression fit with the code below?
DATA BASEBALL;
SET SASHELP.BASEBALL;
RUN;
ODS GRAPHICS ON;
ODS OUTPUT
NObs = num_obs
FitStatistics = fitstat
Coef = params
;
PROC TRANSREG
DATA=BASEBALL
PLOTS=ALL
SOLVE
SS2
PREDICTED;
;
MODEL_1:
MODEL POWER(logsalary/parameter=1) = log(nruns);
OUTPUT OUT = fitted_model;
RUN;
For clarity, the regression fit plot is a scatter plot with the estimated regression line fitted through it.
The fit plot is generated only when the dependent variable is not transformed. You can create the transformed variable ahead of time to get this graph.
From documentation:
ODS Graph Name: FitPlot
Plot Description: Simple Regression and Separate Group Regressions
Statement and Option: MODEL, a dependent variable that is not
transformed, one non-CLASS independent variable, and at most one CLASS
variable
This code works for me:
PROC TRANSREG
DATA=sashelp.BASEBALL
PLOTS=ALL
SOLVE
SS2
PREDICTED;
;
MODEL_1:
MODEL identity(logsalary) = log(nruns);
OUTPUT OUT = fitted_model;
RUN;
And generates the desired graph.
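If a real transformation of the dependent variable is needed, the "create the transformation ahead of time" suggestion could look like the sketch below. The square-root transformation, the data set name baseball_t and the variable name sqrt_logsalary are all hypothetical choices for illustration; substitute whatever transformation you are actually testing.

```sas
/* Apply the transformation in a DATA step first (assumed: power 0.5). */
data baseball_t;
    set sashelp.baseball;
    sqrt_logsalary = logsalary**0.5;   /* missing stays missing */
run;

ods graphics on;

/* The dependent variable now enters as IDENTITY, i.e. untransformed   */
/* from PROC TRANSREG's point of view, which is what the fit plot      */
/* requires according to the documentation quoted above.               */
proc transreg data = baseball_t plots = all ss2;
    model identity(sqrt_logsalary) = log(nruns);
    output out = fitted_model predicted;
run;
```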

Repeated subject multivariate regression in SAS: PROC GENMOD

I have patient eye data. Each eye is assigned EyeID and each patient is assigned PatientID. Each patient has 2 eyes. I am doing multivariate logistic regression with PROC GENMOD. To adjust for the fact that there are 2 eyes per patient, I used the option repeated subject=PatientID(EyeID). Is this correct?
I have pasted my code below.
proc genmod data=test descend;
class PatientID EyeID Explan1 Explan2 Explan3 / param=ref;
model Therapy = Explan1 Explan2 Explan3/ dist=bin;
repeated subject=PatientID(EyeID) / corr=unstr corrw;
run;
My reputation is not high enough to comment, but the following paper might be helpful to you, since it deals with both repeated measures and the same research subject matter:
http://www2.sas.com/proceedings/sugi29/188-29.pdf
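For comparison, here is a hedged sketch of how a GEE model with the patient as the cluster is commonly written. It assumes that PatientID alone identifies the cluster and that EyeID distinguishes the two eyes within a patient; the variable names come from the question. In the REPEATED statement every distinct level of the SUBJECT= effect is treated as a separate cluster, so a nested effect that combines PatientID and EyeID would likely make each eye its own cluster rather than grouping both eyes of a patient.

```sas
proc genmod data = test descend;
    class PatientID EyeID Explan1 Explan2 Explan3 / param=ref;
    model Therapy = Explan1 Explan2 Explan3 / dist=bin link=logit;
    /* One cluster per patient; WITHIN= orders the two eyes inside it. */
    /* With only two observations per cluster, exchangeable and        */
    /* unstructured working correlations estimate the same single      */
    /* correlation parameter.                                          */
    repeated subject = PatientID / within = EyeID type = exch corrw;
run;
```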

drawing histogram and boxplot in SAS

I wrote the following code in SAS, but I did not get the result I expected.
The histogram comes out grey and the range of the data is not what I specified. What is the problem?
I also got the following warning: WARNING: The MIDPOINTS= list was extended to accommodate the data
What about the color?
axis1 order=(0 to 100000 by 50000);
axis2 order=(0 to 100 by 5);
run;
proc capability data=HW2 noprint;
histogram Mvisits/midpoints=0 to 98000 by 10000
haxis=axis1
cfill=blue;
run;
.......................................
I have the same problem with the boxplot. For example, I got the following plot and I want to change the spacing so that I can see the plot better, but I could not.
The code below is for proc univariate rather than proc capability. I do not have access to SAS/QC to test it, but the user guide shows very similar syntax for the histogram statements, so hopefully you will be able to translate it back.
It looks like you are having problems with the colour because of your output system. Your graphs are probably delivered via ODS, in which case the cfill option does not apply (see here, and note the Traditional Graphics tag).
To change the colour of the histogram bars in ODS output you can use proc template:
proc template;
define style styles.testStyle;
parent = styles.htmlblue;
style GraphDataDefault /
color = green;
end;
run;
ods listing style = styles.testStyle;
proc univariate data = sashelp.cars;
histogram mpg_city;
run;
An example explaining this can be found here.
Alternatively you can use proc sgplot to create a histogram with more control of the colour as follows:
proc sgplot data = sashelp.cars;
histogram mpg_city / fillattrs = (color = red);
run;
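The same proc sgplot approach can also control the binning and the displayed axis range, which is the other half of the original question. This is only a sketch on sashelp.cars: the bin start, bin width and axis values are arbitrary illustrative choices, not values taken from the question's data.

```sas
proc sgplot data = sashelp.cars;
    /* BINSTART= is the midpoint of the first bin, BINWIDTH= its width. */
    histogram mpg_city / binstart = 10 binwidth = 5
                         fillattrs = (color = blue);
    /* Tick marks and range of the horizontal axis.                     */
    xaxis values = (0 to 60 by 10);
run;
```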
As to your question of truncating the histogram: it doesn't really make a great deal of sense to ignore the extreme values, as that will give you an erroneous image of the distribution, which somewhat defeats the purpose of the histogram. That said, you can achieve what you are asking for with a bit of a hack:
data tempData;
set sashelp.cars;
tempClass = 1;
run;
proc univariate data = tempData noprint;
class tempClass;
histogram mpg_city / maxnbin = 5 endpoints = 0 to 25 by 5;
run;
In the above, a dummy class variable tempClass is created and then comparative histograms are requested using the class statement. The maxnbin option limits the number of bins displayed, but only in a comparative histogram.
Your other option is to exclude (or cap) your extreme points before creating the histogram, but this will lead to slightly erroneous frequency counts/percentages/bar heights.
data tempData;
set sashelp.cars;
mpg_city = min(mpg_city, 20);
run;
proc univariate data = tempData noprint;
histogram mpg_city / endpoints = 0 to 25 by 5;
run;
This is a possible approach to the original question (untested, as I have neither SAS/QC nor the data):
proc capability data = HW2 noprint;
histogram Mvisits /
midpoints = 0 to 300000 by 10000
noplot
outhistogram = histData;
run;
proc sgplot data = histData;
vbar _MIDPT_ /
response = _OBSPCT_
fillattrs = (color = blue);
where _MIDPT_ <= 100000;
run;

SAS selecting top logit models by AIC

I have a problem with SAS proc logistic.
I was using the following procedures when I had OLS regression and everything worked OK:
proc reg data = input_data outest = output_data;
model y = x1-x25 / selection = cp aic stop = 10;
run;
quit;
Here I wanted SAS to estimate all possible regressions using combinations of 25 regressors (x1-x25) including no more than 10 regressors in model.
Basically, I want to do the same thing (estimate all possible models having 25 regressors with no more than 10 included in a model and output top-models in a dataset with corresponding AIC) but with logistic regression.
I also know that I can use selection = score in PROC LOGISTIC, but I'm not sure how to use outest= with it, or whether the score chi-square is really a reliable alternative to cp and AIC in PROC REG.
So far, I know how to do stepwise/backward/forward logistic regressions, but these methods do not suit me well and btw they display in the output dataset only the top-1 model, while I want at least top-100.
Any help or advice will be highly appreciated!
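A possible starting point, offered as an untested sketch rather than a full answer: with selection = score, the best = option requests that many top models for each model size, and the resulting table can be captured with ODS OUTPUT. The output data set name top_models and the event level '1' are assumptions. Note that this ranks models by score chi-square and does not report AIC, so the captured candidates would still have to be refitted to compare them on AIC.

```sas
/* Capture the best-subsets table produced by SELECTION=SCORE. */
ods output BestSubsets = top_models;

proc logistic data = input_data;
    /* Up to 100 best models for each size from 1 to 10 regressors. */
    model y(event='1') = x1-x25
          / selection = score best = 100 start = 1 stop = 10;
run;
```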

Sensitivity and specificity

I need to calculate the following for my dataset. I could calculate the individual PPV (95% CI) and NPV (95% CI), but I got a tad confused about how to calculate this:
PPV+NPV-1 (95% CI)
How do I do this calculation?
This page on SAS support gives code as follows:
title 'Sensitivity';
proc freq data=FatComp;
where Response=1;
weight Count;
tables Test / binomial(level="1");
exact binomial;
run;
title 'Specificity';
proc freq data=FatComp;
where Response=0;
weight Count;
tables Test / binomial(level="0");
exact binomial;
run;
title 'Positive predictive value';
proc freq data=FatComp;
where Test=1;
weight Count;
tables Response / binomial(level="1");
exact binomial;
run;
title 'Negative predictive value';
proc freq data=FatComp;
where Test=0;
weight Count;
tables Response / binomial(level="0");
exact binomial;
run;
I doubt that this is a useful measure. In general you should present sensitivity, specificity, and the positive and negative predictive values. If you want a global measure of accuracy, you should go for the proportion of correctly classified subjects.
If you go to the webpage already suggested by Peter Flom, you can scroll down to a piece of code for overall accuracy. The accuracy can be computed by creating a binary variable indicating whether test and response agree in each observation:
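data acc;
set FatComp;
/* acc = 1 when test and response agree, 0 otherwise */
if (test and response) or
(not test and not response) then acc=1;
else acc=0;
run;
/* Proportion of agreeing observations, with an exact binomial CI */
proc freq data=acc;
weight count;
tables acc / binomial(level="1");
exact binomial;
run;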
Hope it helps