I am trying to create a logistic regression model, but I keep getting error messages when I try to fit it.
Here is my sample code:
proc logistic data = cleaned_anes descending;
class default / param=glm;
model default = student balance income;
run;
It says my variables are not found, but they are in my Excel spreadsheet.
When I run a proc glimmix in SAS, sometimes it drops observations.
How do I get the set of dropped/excluded observations or maybe the set of included observations so that I can identify the dropped set?
My current PROC GLIMMIX code is as follows:
%LET EST=inputf.aarefestimates;
%LET MODEL_VAR3 = age Male Yearc2010 HOSPST
Hx_CTSURG Cardiogenic_Shock COPD MCANCER DIABETES;
data work.refmodel;
set inputf.readmref;
Yearc2010 = YEAR - 2010;
run;
PROC GLIMMIX DATA = work.refmodel NOCLPRINT MAXLMMUPDATE=100;
CLASS hospid HOSPST(ref="xx");
ODS OUTPUT PARAMETERESTIMATES = &est (KEEP=EFFECT ESTIMATE STDERR);
MODEL RADM30 = &MODEL_VAR3 /Dist=b LINK=LOGIT SOLUTION;
XBETA=_XBETA_;
LINP=_LINP_;
RANDOM INTERCEPT/SUBJECT= hospid SOLUTION;
OUTPUT OUT = inputf.aar
PRED(BLUP ILINK)=PREDPROB PRED(NOBLUP ILINK)=EXPPROB;
ID XBETA LINP hospst hospid Visitlink Key RADM30;
NLOPTIONS TECH=NRRIDG;
run;
Thank you in advance!
It drops records that have a missing value in any variable used by the model, that is, any variable named in a CLASS, BY, MODEL, or RANDOM statement. So you can check for missing values among those variables to see which records are affected. Usually the output data set also indicates this: records that were not used have no predictions.
You can run the code below.
*copy sample data;
data heart; set sashelp.heart; run;
*Logistic Regression model, ageCHDdiag is missing ;
proc logistic data=heart;
class sex / param=ref;
model status(event='Dead') = ageCHDdiag height weight diastolic;
*generate output data;
output out=want p=pred;
run;
*explicitly flag records as included;
data included;
set want;
if missing(pred) then include='N'; else include='Y';
run;
*check that Y equals total obs included above;
proc freq data=included;
table include;
run;
The output will show:
The LOGISTIC Procedure
Model Information
Data Set WORK.HEART
Response Variable Status
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 5209
Number of Observations Used 1446
And then the PROC FREQ will show:
The FREQ Procedure
                                    Cumulative    Cumulative
include    Frequency     Percent     Frequency       Percent
------------------------------------------------------------
N               3763       72.24          3763         72.24
Y               1446       27.76          5209        100.00
And 1,446 records are included in both of the data sets.
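The same check can also be done before fitting anything. The following is a minimal sketch, assuming the same sashelp.heart data and the same variables as the PROC LOGISTIC step above; the data set name flagged and the variables n_missing and dropped are made up for illustration. It counts missing values across the analysis variables, so the count of dropped='Y' should agree with the number of unused observations if missing values are the only reason records are excluded.

```sas
/* Flag records that have a missing value in any analysis variable. */
data flagged;
  set sashelp.heart;
  /* CMISS counts missing values across numeric and character arguments */
  n_missing = cmiss(status, sex, ageCHDdiag, height, weight, diastolic);
  if n_missing > 0 then dropped = 'Y';
  else dropped = 'N';
run;

/* Compare these counts with "Number of Observations Used" in the log/output */
proc freq data=flagged;
  tables dropped;
run;
```

Flagging on the inputs rather than on the predictions also shows which variable is responsible, since n_missing can be replaced by per-variable MISSING() checks.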
I think I answered my question.
The code line -
OUTPUT OUT = inputf.aar
gives the output of the model. This table includes all the observations used in the PROC step, so I can match it against my input table to find the observations that were dropped.
@Reeza - I had already looked for missing values in every column of the data, but counting the records with missing values was not enough to identify which records were being dropped. Thanks for the suggestion, though.
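The matching step described above could look something like the sketch below. It assumes that Key uniquely identifies a record in work.refmodel (the original post does not say so) and that a record counts as dropped if it is absent from the output table or has a missing PREDPROB; the names in_sorted, out_sorted, and dropped are hypothetical.

```sas
/* Sort both tables by the assumed unique record key */
proc sort data=work.refmodel out=in_sorted;
  by Key;
run;

proc sort data=inputf.aar(keep=Key PREDPROB) out=out_sorted;
  by Key;
run;

/* Keep input records with no usable prediction in the output table */
data dropped;
  merge in_sorted(in=in_input) out_sorted(in=in_output);
  by Key;
  if in_input and (not in_output or missing(PREDPROB));
run;
```

If Key is not unique, a different identifier (or a row counter added to the input before the PROC GLIMMIX step and listed in the ID statement) would be needed for the merge to be reliable.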
I have patient eye data. Each eye is assigned EyeID and each patient is assigned PatientID. Each patient has 2 eyes. I am doing multivariate logistic regression with PROC GENMOD. To adjust for the fact that there are 2 eyes per patient, I used the option repeated subject=PatientID(EyeID). Is this correct?
I have pasted my code below.
proc genmod data=test descend;
class PatientID EyeID Explan1 Explan2 Explan3 / param=ref;
model Therapy = Explan1 Explan2 Explan3/ dist=bin;
repeated subject=PatientID(EyeID) / corr=unstr corrw;
run;
My reputation is not high enough to comment but the following might be helpful to you since it deals both with repeated measures and the same research subject matter.
http://www2.sas.com/proceedings/sugi29/188-29.pdf
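For comparison, here is a hedged sketch of how the patient is more commonly used as the cluster; it is not taken from the linked paper. With subject=PatientID(EyeID), each patient-by-eye combination is treated as its own subject, so unless an eye has several records every cluster has size one and the correlation between the two eyes of a patient is not modelled. The sketch assumes one record per eye and that EyeID takes the same two values (for example left and right) within every patient; if EyeID is unique across all eyes, the WITHIN= option should be left out.

```sas
proc genmod data=test descend;
  class PatientID EyeID Explan1 Explan2 Explan3 / param=ref;
  model Therapy = Explan1 Explan2 Explan3 / dist=bin;
  /* Cluster on the patient; WITHIN= identifies the eye inside each patient. */
  /* With two eyes per patient, exchangeable and unstructured correlation    */
  /* describe the same single within-patient correlation.                    */
  repeated subject=PatientID / within=EyeID corr=exch corrw;
run;
```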
I am seeking to obtain risk ratio estimates from multiply imputed, cluster-correlated data in SAS using log binomial regression using SAS Proc Genmod. I've been able to calculate risk ratio estimates for the raw (non-MI) data, but it seems that the program is hitting a snag in generating an output dataset for me to read into Proc Mianalyze.
I am including a "repeated subjects" statement so that SAS will use robust variance estimation. Without the "repeated subjects" statement, the ODS OUTPUT statement seems to work just fine; however, once I include it, I receive a warning message that my output data set was not generated.
I am open to other approaches and suggestions to generate risk ratio estimates using this data if the genmod/mianalyze combination is not appropriate, but would like to see if I can get this to work! I would prefer SAS, if possible, due to license access issues to other programs, like Stata and SUDAAN. My code is below, where "seroP" is my binomial outcome, "int" is the binomial independent variable of interest (intervention received vs not received), "tf5" is a binomial covariate, age is a continuous covariate, and village specifies the cluster:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
Class int (ref='0') tf5 (ref='0') village /param=ref ;
weight weight;
Model seroP= int tf5 age /
dist=bin Link=logit ;
repeated subject=village/ type=unstr;
estimate 'Beta' int 1 -1/exp;
ods output ParameterEstimates=sc.seroP;
Run;
proc mianalyze parms =sc.seroP;
class int tf5 ;
modeleffects int tf5 age village ;
run;
Thank you for your help!
The short answer is to add the option “PRINTMLE” at the end of the “Repeated” statement. However, the code you posted may not produce what you actually want, so a longer answer follows:
1. The program below is based on SAS 9.3 (or newer versions) for Windows. If you are using an older version, the coding might be different.
2. For PROC MIANALYZE, three ODS tables from PROC GENMOD are required instead of one, namely, 1) the parameter estimate table (_est); 2) the covariance table (_covb); and 3) the parameter index table (parminfo). The first line of the PROC MIANALYZE statement should look like:
PROC MIANALYZE parms = ~_est covb = ~_covb parminfo=parminfo;
where ~_est refers to an ODS parameter table, and ~_covb refers to an ODS covariance table.
There are different types of ODS parameter estimate and covariance tables. The sign “~” should be replaced by a specific set of ODS tables, which will be discussed in the following part.
3. From PROC GENMOD, three different sets of ODS parameter and covariance tables can be generated.
3a) The first set of tables is from a non-repeated model (i.e., without the “repeated” statement). In your case, it looks like:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
…
MODEL seroP= int tf5 age/dist=bin Link=logit COVB; /*adding the option COVB*/
/*repeated subject=village/ type=unstr;*/
/*Note that the above line has been changed to comments*/
…
ODS OUTPUT
/*the estimates from a non-repeated model*/
ParameterEstimates=norepeat_est
/*the covariance from a non-repeated model*/
Covb = nonrepeat_covb
/*the indices of the parameters*/
ParmInfo=parminfo;
Run;
Of note, 1) the option COVB is added in the MODEL statement, so as to obtain the ODS covariance table. 2) The “Repeated” statement is commented out. 3) The “~_est” table is named “norepeat_est”. Similarly, the table “~_covb” is named “nonrepeat_covb”.
3b) The second set of tables contains model-based estimates from a repeated model. In your case, it looks like:
…
MODEL seroP= int tf5 age/dist=bin Link=logit;
REPEATED subject=village/ type=un MODELSE MCOVB;/*options added*/
…
ODS OUTPUT
/*the model-based estimates from a repeated model*/
GEEModPEst=mod_est
/*the model-based covariance from a repeated model*/
GEENCov= mod_covb
/*the indices of the parameters*/
parminfo=parminfo;
Run;
In the “REPEATED” statement, the option MODELSE is to generate model-based parameter estimates, and MCOVB is to generate the model based covariance. Without these options, the corresponding ODS tables (i.e., GEEModPEst and GEENCov) will not be generated. Note that the ODS table names are different from the previous case. In this case, the tables are GEEModPEst and GEENCov. In the previous case (a non-repeated model), the tables were ParameterEstimates and COVB. Here, the ~_est table is named “mod_est”, standing for the model-based estimates. Similarly, the ~_covb table is named “mod_covb”. The ParmInfo table is the same as in the previous model.
3c) A third set contains empirical estimates, also from a repeated model. The empirical estimates are also called ROBUST estimates. Sounds like the results here are what you want. It looks like:
…
MODEL seroP= int tf5 age/dist=bin Link=logit;
REPEATED subject=village/ type=un ECOVB;/*option changed*/
…
ODS OUTPUT
/*the empirical(ROBUST) estimates from a repeated model*/
GEEEmpPEst=emp_est
/*the empirical(ROBUST) covariance from a repeated model*/
GEERCov= emp_covb
/*the indices of the parameters*/
parminfo=parminfo;
Run;
As you may have noticed, in the “Repeated” statement, the option is changed to ECOVB. That way, the empirical covariance table will be generated. Nothing is required to generate the empirical parameter estimates, as they are always produced by the procedure. The ParmInfo table is the same as in the previous cases.
4. Putting it together, you can generate all three sets of tables at the same time. The only requirement is that the option “PRINTMLE” be added, so that the estimates from the non-repeated model are still produced when a REPEATED statement is present. The combined program looks like the following:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
Class int (ref='0') tf5 (ref='0') village /param=ref ;
weight weight;
Model seroP= int tf5 age /
dist=bin Link=logit COVB; /*COVB to have non-repeated model covariance*/
repeated subject=village/ type=UN MODELSE PRINTMLE MCOVB ECOVB;/*all options*/
estimate 'Beta' int 1 -1/exp;
ODS OUTPUT
/*the estimates from a non-repeated model*/
ParameterEstimates=norepeat_est
/*the covariance from a non-repeated model*/
Covb = nonrepeat_covb
/*the indices of the parameters*/
ParmInfo=parminfo
/*the model-based estimates from a repeated model*/
GEEModPEst=mod_est
/*the model-based covariance from a repeated model*/
GEENCov= mod_covb
/*the empirical(ROBUST) estimates from a repeated model*/
GEEEmpPEst=emp_est
/*the empirical(ROBUST) covariance from a repeated model*/
GEERCov= emp_covb
;
Run;
/*Analyzing non-repeated results*/
PROC MIANALYZE parms = norepeat_est covb = nonrepeat_covb parminfo=parminfo; /*names match the ODS OUTPUT statement above*/
class int tf5 ;
modeleffects int tf5 age ; /*village is the clustering variable, not a model effect, so it is not listed*/
run;
/*Analyzing model-based results*/
PROC MIANALYZE parms = mod_est covb = mod_covb parminfo=parminfo;
class int tf5 ;
modeleffects int tf5 age ; /*village is the clustering variable, not a model effect, so it is not listed*/
run;
/*Analyzing empirical(ROBUST) results*/
PROC MIANALYZE parms = emp_est covb = emp_covb parminfo=parminfo;
class int tf5 ;
modeleffects int tf5 age ; /*village is the clustering variable, not a model effect, so it is not listed*/
run;
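One more point on the risk ratios themselves: the MODEL statements above keep LINK=LOGIT from the original post, and exponentiating a logit-link coefficient gives an odds ratio, not a risk ratio. The following is a minimal sketch of the log-binomial version of the robust (empirical) fit, assuming the same data set and variable names as in the question. Log-binomial models sometimes fail to converge, in which case a different approach would be needed.

```sas
proc genmod data=sc.wide_mip descending;
  by _Imputation_;
  class int (ref='0') tf5 (ref='0') village / param=ref;
  weight weight;
  /* LINK=LOG makes each coefficient a log risk ratio */
  model seroP = int tf5 age / dist=bin link=log;
  /* ECOVB requests the empirical (robust) covariance table */
  repeated subject=village / type=un ecovb;
  ods output GEEEmpPEst=emp_est
             GEERCov=emp_covb
             ParmInfo=parminfo;
run;
```

The pooled estimate for int from PROC MIANALYZE is then on the log scale, so the risk ratio and its confidence limits are obtained by exponentiating the pooled estimate and its limits.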
Hopefully it helps. For further reading:
SAS proc genmod with clustered, multiply imputed data
http://www.ats.ucla.edu/stat/sas/v8/mianalyzev802.pdf
http://analytics.ncsu.edu/sesug/2006/ST12_06.PDF
Allison, Paul D. Logistic Regression Using SAS®: Theory and Application, Second Edition (page 226-234). Copyright © 2012, SAS Institute Inc.,Cary, North Carolina, USA.
I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error in the odds ratios from PROC LOGISTIC.
The example below is of the response variable "MonthStay" and one of the variables in question "KennelCough". MonthStay = Y and the event of interest is KennelCough = N.
I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations? Or am I totally misunderstanding what's going on? Thanks!
Here is the PROC FREQ code and result:
proc freq data = capstone.adopts_dog order = freq;
tables KennelCough*MonthStay / relrisk;
run;
Here is the PROC LOGISTIC CODE and results:
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
class Breed(ref='Chihuahua') Gender(ref='Female')
Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y')
Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
/ param=ref;
model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNatIn
NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful
Handling UnderAge Injuris InRegion OutRegion
/ lackfit aggregate scale = none selection = backward rsquare;
output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
[Output tables not reproduced here: Class Level Information, Odds Ratio Estimates]
In PROC FREQ, you are calculating an unadjusted odds ratio, while in PROC LOGISTIC, every odds ratio is adjusted for the other covariates included in the logistic regression model.
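One way to check this is to fit a model with KennelCough as the only predictor, which should give an unadjusted odds ratio comparable to the one from PROC FREQ. This is a sketch that reuses the data set, reference level, and event level from the question; nothing else is assumed.

```sas
/* Single-predictor model: the odds ratio here is unadjusted */
proc logistic data=capstone.adopts_dog;
  class KennelCough(ref='Y') / param=ref;
  model MonthStay(event='Y') = KennelCough;
run;
```

If the two values are reciprocals of each other rather than equal, the difference is one of orientation: PROC FREQ builds its odds ratio from the row and column order of the table (here controlled by ORDER=FREQ), while PROC LOGISTIC compares KennelCough = N against the reference level Y for the event MonthStay = Y.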