I am doing cross-sectional logistic regression modeling of the probability of an event in eyes. Each patient is assigned a PatientID and each eye is assigned an EyeID; there are 2 eyes per patient.
I have attached my code below.
PROC GENMOD data=new descend;
class patientID Explan1(ref="0") Explan2(ref ="0") Gender(ref="M") / param=ref;
model Therapy = PVD_STATUS Explan1 Explan2 Explan3 Gender/ dist=bin;
repeated subject=patientID(EyeID) / corr=unstr corrw;
run;
I get this error code: ERROR: Nesting of continuous variable not allowed.
This could be an issue related to the
repeated subject=patientID(EyeID)
Has anyone encountered this before? Possible solutions?
Set EyeID as a class variable. SAS assumes that it is continuous unless otherwise defined.
PROC GENMOD data=new descend;
class EyeID patientID Explan1(ref="0") Explan2(ref ="0") Gender(ref="M") / param=ref;
model Therapy = PVD_STATUS Explan1 Explan2 Explan3 Gender/ dist=bin;
repeated subject=patientID(EyeID) / corr=unstr corrw;
run;
I have a dataset of patient diagnoses with one diagnosis code per line, resulting in patient diagnoses on multiple lines. Each patient has a unique patientID. I also have age, race, gender, etc. data on these patients.
How do I indicate to SAS when using PROC FREQ, Logistic, Univariate, etc. that they are the same patient?
This is an example of what the data looks like:
patientID  diagnosis  age  gender  lab
1          15.02      65   M       positive
1          250.2      65   M       positive
2          348.2      23   M       negative
2          282.1      23   M       negative
3          .          50   F       positive
I was given data on every patient who has had a certain lab (regardless of positive result), as well as all of their diagnoses, which each appear on a different line (as a different observation to SAS). First, I will need to exclude every patient who has a negative result for the lab, which I plan on using an IF statement for. The lab determines if the patient has disease X. Some patients do not have any additional diseases, other than disease X, such as patient #3.
Analyses I would like to perform:
Calculate the frequency of each disease using PROC FREQ.
Characterize the age and race relationships for each diagnosis using PROC FREQ chi square.
Use PROC LOGISTIC to determine risk factors (age, race, gender, etc.) for developing an additional disease on top of disease X.
Thanks!
The answer to your question is you cannot by default. But when you're processing the data you can account for it easily. IMO keeping it long is easier.
You've asked too many questions above so I'll answer just one, how to count the number of people with disease x.
Proc sort data = have out = unique_disease_patient nodupkey;
By patientID Diag;
Run;
Proc freq data = unique_disease_patient noprint;
Table Diag / out = disease_patient_count;
Run;
Note that this is much easier in SQL
Proc sql;
Create table want as
Select diag, count(distinct patientID) as n_patients
From have
Group by diag;
Quit;
I'm assuming this is homework because you're unlikely to do this in practice except for exploratory analysis.
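For the PROC LOGISTIC part of the question, one way to account for repeated rows is to collapse the long data to one row per patient before modeling. This is only a minimal sketch, assuming the dataset is called have, the variables are named as in the sample data (patientID, diagnosis, age, gender, lab), and a missing diagnosis means no disease other than X. The names patient_level and any_other_dx are hypothetical.

```sas
/* Sketch: one row per lab-positive patient.
   ASSUMPTION: a missing diagnosis means "no additional disease". */
proc sql;
  create table patient_level as
  select patientID,
         max(age)    as age,
         max(gender) as gender,
         /* count() ignores missing values, so this is 1 if the
            patient has at least one recorded diagnosis, else 0 */
         (count(diagnosis) > 0) as any_other_dx
  from have
  where lab = 'positive'          /* drop lab-negative patients */
  group by patientID;
quit;

/* Each patient now contributes exactly one observation */
proc logistic data = patient_level;
  class gender (ref='M') / param=ref;
  model any_other_dx(event='1') = age gender;
run;
```

Age and gender are constant within a patient in the sample data, so taking max() simply carries them through to the patient-level row.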
I am seeking to obtain risk ratio estimates from multiply imputed, cluster-correlated data in SAS using log binomial regression using SAS Proc Genmod. I've been able to calculate risk ratio estimates for the raw (non-MI) data, but it seems that the program is hitting a snag in generating an output dataset for me to read into Proc Mianalyze.
I am including a repeated subjects statement so that SAS will use robust variance estimation. Without the "repeated subjects" statement, the ODS Output statement seems to work just fine; however, once I include the "repeated subjects" statement, I receive a warning message that my output dataset was not generated.
I am open to other approaches and suggestions to generate risk ratio estimates using this data if the genmod/mianalyze combination is not appropriate, but would like to see if I can get this to work! I would prefer SAS, if possible, due to license access issues to other programs, like Stata and SUDAAN. My code is below, where "seroP" is my binomial outcome, "int" is the binomial independent variable of interest (intervention received vs not received), "tf5" is a binomial covariate, age is a continuous covariate, and village specifies the cluster:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
Class int (ref='0') tf5 (ref='0') village /param=ref ;
weight weight;
Model seroP= int tf5 age /
dist=bin Link=logit ;
repeated subject=village/ type=unstr;
estimate 'Beta' int 1 -1/exp;
ods output ParameterEstimates=sc.seroP;
Run;
proc mianalyze parms =sc.seroP;
class int tf5 ;
modeleffects int tf5 age village ;
run;
Thank you for your help!
The short answer is to add the option “PRINTMLE” at the end of the “REPEATED” statement. But the code you posted here may not produce what you actually want, so here is a longer answer:
1. The program below is based on SAS 9.3 (or newer versions) for Windows. If you are using an older version, the coding might be different.
2. PROC MIANALYZE requires three ODS tables from PROC GENMOD instead of one, namely 1) the parameter estimate table (_est); 2) the covariance table (_covb); and 3) the parameter index table (parminfo). The PROC MIANALYZE statement should look like:
PROC MIANALYZE parms = ~_est covb = ~_covb parminfo=parminfo;
where ~_est refers to an ODS parameter estimate table, and ~_covb refers to an ODS covariance table.
There are different types of ODS parameter estimate and covariance tables. The “~” prefix should be replaced by the name of a specific set of ODS tables, as discussed in the following part.
3. From PROC GENMOD, three different sets of ODS parameter and covariance tables can be generated.
3a) The first set of tables is from a non-repeated model (i.e., without the “repeated” statement). In your case, it looks like:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
…
MODEL seroP= int tf5 age/dist=bin Link=logit COVB; /*adding the option COVB*/
/*repeated subject=village/ type=unstr;*/
/*Note that the above line has been changed to comments*/
…
ODS OUTPUT
/*the estimates from a non-repeated model*/
ParameterEstimates=norepeat_est
/*the covariance from a non-repeated model*/
Covb = norepeat_covb
/*the indices of the parameters*/
ParmInfo=parminfo;
Run;
Of note, 1) the option COVB is added in the MODEL statement, so as to obtain the ODS covariance table. 2) The “REPEATED” statement is commented out. 3) The “~_est” table is named “norepeat_est”. Similarly, the “~_covb” table is named “norepeat_covb”.
3b) The second set of tables contains model-based estimates from a repeated model. In your case, it looks like:
…
MODEL seroP= int tf5 age/dist=bin Link=logit;
REPEATED subject=village/ type=un MODELSE MCOVB;/*options added*/
…
ODS OUTPUT
/*the model-based estimates from a repeated model*/
GEEModPEst=mod_est
/*the model-based covariance from a repeated model*/
GEENCov= mod_covb
/*the indices of the parameters*/
parminfo=parminfo;
Run;
In the “REPEATED” statement, the option MODELSE is to generate model-based parameter estimates, and MCOVB is to generate the model based covariance. Without these options, the corresponding ODS tables (i.e., GEEModPEst and GEENCov) will not be generated. Note that the ODS table names are different from the previous case. In this case, the tables are GEEModPEst and GEENCov. In the previous case (a non-repeated model), the tables were ParameterEstimates and COVB. Here, the ~_est table is named “mod_est”, standing for the model-based estimates. Similarly, the ~_covb table is named “mod_covb”. The ParmInfo table is the same as in the previous model.
3c) A third set contains empirical estimates, also from a repeated model. The empirical estimates are also called ROBUST estimates. Sounds like the results here are what you want. It looks like:
…
MODEL seroP= int tf5 age/dist=bin Link=logit;
REPEATED subject=village/ type=un ECOVB;/*option changed*/
…
ODS OUTPUT
/*the empirical(ROBUST) estimates from a repeated model*/
GEEEmpPEst=emp_est
/*the empirical(ROBUST) covariance from a repeated model*/
GEERCov= emp_covb
/*the indices of the parameters*/
parminfo=parminfo;
Run;
As you may have noticed, in the “Repeated” statement, the option is changed to ECOVB. That way, the empirical covariance table will be generated. Nothing is required to generate the empirical parameter estimates, as they are always produced by the procedure. The ParmInfo table is the same as in the previous cases.
4. Putting it all together, you can generate all three sets of tables at the same time. The only requirement is that the option “PRINTMLE” be added, so that the estimates from the non-repeated model are also produced when the REPEATED statement is in place. The combined program looks like the following:
Proc GenMod data=sc.wide_mip descending ; by _Imputation_;
Class int (ref='0') tf5 (ref='0') village /param=ref ;
weight weight;
Model seroP= int tf5 age /
dist=bin Link=logit COVB; /*COVB to have non-repeated model covariance*/
repeated subject=village/ type=UN MODELSE PRINTMLE MCOVB ECOVB;/*all options*/
estimate 'Beta' int 1 -1/exp;
ODS OUTPUT
/*the estimates from a non-repeated model*/
ParameterEstimates=norepeat_est
/*the covariance from a non-repeated model*/
Covb = norepeat_covb
/*the indices of the parameters*/
ParmInfo=parminfo
/*the model-based estimates from a repeated model*/
GEEModPEst=mod_est
/*the model-based covariance from a repeated model*/
GEENCov= mod_covb
/*the empirical(ROBUST) estimates from a repeated model*/
GEEEmpPEst=emp_est
/*the empirical(ROBUST) covariance from a repeated model*/
GEERCov= emp_covb
;
Run;
/*Note: village is the clustering variable, not a model parameter, so it is left out of MODELEFFECTS*/
/*Analyzing non-repeated results*/
PROC MIANALYZE parms = norepeat_est covb = norepeat_covb parminfo=parminfo;
class int tf5 ;
modeleffects int tf5 age ;
run;
/*Analyzing model-based results*/
PROC MIANALYZE parms = mod_est covb = mod_covb parminfo=parminfo;
class int tf5 ;
modeleffects int tf5 age ;
run;
/*Analyzing empirical(ROBUST) results*/
PROC MIANALYZE parms = emp_est covb = emp_covb parminfo=parminfo;
class int tf5 ;
modeleffects int tf5 age ;
run;
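Since the question asks for risk ratios from a log binomial model, note that Link=logit yields odds ratios. The following is only a sketch of a robust-only workflow with a log link, not a tested solution. It assumes that int and tf5 are numeric 0/1 variables, which lets them stay out of the CLASS statement so that each coefficient compares 1 vs 0. The dataset names emp_rr, mi_rr and mi_rr_exp are hypothetical. Log binomial models can fail to converge, so the fit should be checked in each imputation.

```sas
/* Sketch only. ASSUMPTION: int and tf5 are numeric 0/1 variables. */
proc genmod data=sc.wide_mip descending;
  by _Imputation_;
  class village;                       /* cluster identifier only */
  weight weight;
  model seroP = int tf5 age / dist=bin link=log;  /* log link: exp(beta) is a risk ratio */
  repeated subject=village / type=un;
  ods output GEEEmpPEst=emp_rr;        /* empirical (robust) estimates per imputation */
run;

/* Combine across imputations using the robust standard errors */
proc mianalyze parms=emp_rr;
  modeleffects Intercept int tf5 age;
  ods output ParameterEstimates=mi_rr;
run;

/* Back-transform the pooled log risk ratios */
data mi_rr_exp;
  set mi_rr;
  RR = exp(Estimate);
run;
```

The pooled confidence limits reported by PROC MIANALYZE are on the log scale and can be exponentiated in the same way.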
Hopefully it helps. For further reading:
SAS proc genmod with clustered, multiply imputed data
http://www.ats.ucla.edu/stat/sas/v8/mianalyzev802.pdf
http://analytics.ncsu.edu/sesug/2006/ST12_06.PDF
Allison, Paul D. Logistic Regression Using SAS®: Theory and Application, Second Edition (page 226-234). Copyright © 2012, SAS Institute Inc.,Cary, North Carolina, USA.
I'm working on a project and have run into an unexpected issue. After running PROC LOGISTIC on my data, I noticed that a few of the odds ratios and regression coefficients seemed to be the inverse of what they should be. After some investigation using PROC FREQ to compute the odds ratios, I believe there is some form of error with the odds ratios from PROC LOGISTIC.
The example below is of the response variable "MonthStay" and one of the variables in question "KennelCough". MonthStay = Y and the event of interest is KennelCough = N.
I don't know how to remedy this suspected error. Am I missing something in my code to get the correct calculations? Or am I totally misunderstanding what's going on? Thanks!
Here is the PROC FREQ code and result:
proc freq data = capstone.adopts_dog order = freq;
tables KennelCough*MonthStay / relrisk;
run;
Here is the PROC LOGISTIC CODE and results:
proc logistic data = capstone.adopts_dog plots(only)=(roc(id=prob) effect);
class Breed(ref='Chihuahua') Gender(ref='Female')
Color(ref='Black') Source(ref='Stray') EvalCat(ref='TR') SNAtIn(ref='No')
FoodAggro(ref='Y') AnimalAggro(ref='Y') KennelCough(ref='Y') Dental(ref='Y')
Fearful(ref='Y') Handling(ref='Y') UnderAge(ref='Y') InJuris(ref='Alameda County')
InRegion(ref='East Bay SPCA - Dublin') OutRegion(ref='East Bay SPCA - Dublin')
/ param=ref;
model MonthStay(event='Y') = Age Gender Breed Weight Color Source EvalCat SNatIn
NumBehvCond NumMedCond FoodAggro AnimalAggro KennelCough Dental Fearful
Handling UnderAge Injuris InRegion OutRegion
/ lackfit aggregate scale = none selection = backward rsquare;
output out = probdogs4 PREDPROBS=I reschi = pearson h = leverage;
run;
[PROC LOGISTIC output tables not reproduced: Class Level Information; Odds Ratio Estimates]
In PROC FREQ you are calculating an unadjusted odds ratio, while in PROC LOGISTIC every odds ratio is adjusted for the other covariates included in the logistic regression model.
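To compare like with like, one option is to fit a logistic model that contains only KennelCough, which gives an unadjusted odds ratio. A minimal sketch, reusing the dataset, reference level and event coding from the question:

```sas
/* Unadjusted model: KennelCough is the only predictor, so the
   odds ratio is not adjusted for any other covariate */
proc logistic data = capstone.adopts_dog;
  class KennelCough(ref='Y') / param=ref;
  model MonthStay(event='Y') = KennelCough;
run;
```

This odds ratio compares KennelCough = N with Y for the event MonthStay = Y. The PROC FREQ odds ratio is computed from the 2x2 table as it is laid out, so a different row or column order (for example from order = freq) can give the reciprocal of this value.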
I have a problem with SAS proc logistic.
I was using the following procedures when I had OLS regression and everything worked OK:
proc reg data = input_data outest = output_data;
model y = x1-x25 / selection = cp aic stop = 10;
run;
quit;
Here I wanted SAS to estimate all possible regressions using combinations of 25 regressors (x1-x25) including no more than 10 regressors in model.
Basically, I want to do the same thing (estimate all possible models having 25 regressors with no more than 10 included in a model and output top-models in a dataset with corresponding AIC) but with logistic regression.
I also know that I can use selection = score in PROC LOGISTIC, but I'm not sure how to use outest= with it, or whether the score chi-square is really a reliable alternative to Cp and AIC in PROC REG.
So far, I know how to do stepwise/backward/forward logistic regressions, but these methods do not suit me well and btw they display in the output dataset only the top-1 model, while I want at least top-100.
Any help or advice will be highly appreciated!
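For reference, the selection = score approach mentioned above might look like the sketch below. It assumes y is coded 0/1 and that the best-subsets listing is available through ODS under the table name BestSubsets. The dataset name score_models is hypothetical.

```sas
/* Sketch: best-subset selection ranked by score chi-square.
   BEST= keeps that many models for each model size;
   STOP= caps the number of regressors in a model. */
proc logistic data = input_data;
  model y(event='1') = x1-x25 / selection=score best=100 stop=10;
  ods output BestSubsets=score_models;
run;
```

The listing ranks models by score chi-square, not AIC, so one possible follow-up is to refit a shortlist of these models individually and compare their AIC values.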