I think this question is more about SAS syntax than statistics: it concerns the proper REPEATED statement for PROC GENMOD.
I am trying to implement Poisson regression with a log link and robust error variance for survey data.
Here is code for non-survey data that I tested, and it works as intended:
proc genmod data = eyestudy;
class carrot id;
model lenses = carrot/ dist = poisson link = log;
repeated subject = id/ type = unstr;
estimate 'Beta' carrot 1 -1/ exp;
run;
The code above, and more information about Poisson regression with a log link and robust error variance for non-survey data, is here: https://stats.idre.ucla.edu/sas/faq/how-can-i-estimate-relative-risk-in-sas-using-proc-genmod-for-common-outcomes-in-cohort-studies/
Below is an example of how to use PROC GENMOD for survey analysis (but with dist=binomial link=identity and, I think, without robust error variance):
proc genmod data=nis10;
class seqnumt estiapt10;
model r_tet_not_utd = / dist=binomial link=identity;
weight provwt;
repeated subject=seqnumt(estiapt10);
where sex = 2;
run;
Here the strata variable is estiapt10, the cluster variable is seqnumt and the weight variable is provwt.
The code above, and more information about survey data analysis, is here: https://support.sas.com/resources/papers/proceedings13/272-2013.pdf
My strata variable is CSTRATM, my cluster variable is CPSUM and my weight variable is PATWT. The dependent variable is DIETNUTR and the independent variable is age_group_var. My data is in sas_stata. So I tried this code:
proc genmod data=sas_stata;
class age_group_var id CPSUM CSTRATM;
model DIETNUTR = age_group_var/ dist = poisson link = log;
weight PATWT;
repeated subject = id/ type = unstr;
repeated subject = CPSUM(CSTRATM);
estimate 'Beta' age_group_var 1 -1/ exp;
run;
but it gave me this warning:
WARNING: Only the last REPEATED statement is used.
As I understand it, after reading the articles above and some other material, I am doing everything right except for the REPEATED statement. For Poisson regression with a log link and robust error variance for survey data, I assume there should be some combination of the two REPEATED statements in my code above. I tried several ways of combining them, but without any luck.
So my question is: what is the code for Poisson regression with a log link and robust error variance for survey data?
I am not quite sure I understand what CPSUM(CSTRATM) is, but I assume you are looking to use an interaction or nested effect as the subject.
Assuming x = CPSUM(CSTRATM), you could code the effect as:
repeated subject = id * x;
A full description of how to specify effects can be found at
https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_genmod_sect038.htm
This might be a useful read also:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi29/188-29.pdf
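Because GENMOD uses only the last REPEATED statement, another option is to keep a single statement whose subject is the cluster nested within the stratum, as in the NIS example from the question. The following is only a sketch under that assumption, reusing the variable names from the question. With a REPEATED statement, GENMOD reports empirical (robust) standard errors by default. The ESTIMATE contrast 1 -1 assumes age_group_var has exactly two levels, and whether this setup adequately reflects the survey design should be checked against the survey's own documentation.

```sas
proc genmod data=sas_stata;
  /* cluster and stratum must be CLASS variables to be used in SUBJECT= */
  class age_group_var CPSUM CSTRATM;
  model DIETNUTR = age_group_var / dist=poisson link=log;
  weight PATWT;
  /* a single REPEATED statement: PSU nested within stratum */
  repeated subject=CPSUM(CSTRATM);
  /* assumes age_group_var has two levels */
  estimate 'Beta' age_group_var 1 -1 / exp;
run;
```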
I am trying to create a report like this one:
But I am getting a report like this:
I am using PROC REPORT to do this. I don't want the columns in which all the values are zero. AFG, ALL, CapMkt, IIS and PWMDIv are the sublevels of CIB and CRE.
The code I used is:
proc report data=CIB_PWM nowd;
column OWNER_NAME OWNER_EMP_NBR RM_Division RM_Region Owner_Team
Owner_YTD_Rev Owner_Prior_YTD_Rev Owner_Rolling_12_Rev
Attd_Rpt_LOB,(Attd_Rpt_SubLOB, (N));
define Attd_Rpt_LOB / Across ' ' ;
define Attd_Rpt_SubLOB / Across ' ';
define N / ' ';
define OWNER_NAME / group 'OWNER_NAME' format=$488. /*missing order=formatted*/;
define OWNER_EMP_NBR / group 'OWNER_EMP_NBR' format=$80. missing order=formatted;
define RM_Division / group 'RM_Division' missing;
define RM_Region / group 'RM_Region' missing;
define Owner_Team / group 'OWNER_HO_MKT_DEPT' format=$324. missing order=formatted;
define Owner_YTD_Rev / analysis SUM 'Owner_YTD_Rev' format=DOLLAR15. missing;
define Owner_Prior_YTD_Rev / analysis SUM 'Owner_Prior_YTD_Rev' format=DOLLAR15. missing;
define Owner_Rolling_12_Rev / analysis SUM 'Owner_Rolling_12_Rev' format=DOLLAR15. missing;
run;
quit;
Do you think I need to add any more options for this?
Your help will be very much appreciated.
Thank you
Shankar:
You probably figured something out by now.
The 'want' image shows columns with all zeroes, so it doesn't align with the statement "I don't want the columns which has All the values zero."
Perhaps you can post some sample data demonstrating the problem.
Without seeing more, I would suggest creating one or more pre-report steps that look for sets of rows contributing to 'not to be reported' cases and mark them with an exclude_flag. The report step would then be able to use the statement
where not exclude_flag;
If this extra processing still does not produce the desired report, you may need to perform a data transpose step and write a different PROC REPORT step.
So using SAS, I have a number of SAS monthend datasets named as follows:
mydata_201501
mydata_201602
mydata_201603
mydata_201604
mydata_201605
...
mydata_201612
Each has account information at a particular month end. I want to stack the datasets into one dataset using the colon shortcut, rather than writing out the full SET statement, as follows:
data mynewdata;
set mydata_:;
run;
However, there is no datestamp variable within the datasets, so when I stack them I will lose the month-end information for each account. I want to know which row refers to which month end for each account. Is there a way to automatically create a variable that names the table each row comes from? For example, the long-winded way would be this:
data mynewdata;
set mydata_201501 (in=a) mydata_201502 (in=b) mydata_201503 (in=c)...;
if a then tablename = 'mydata_201501';
if b then tablename = 'mydata_201502';
if c...
run;
but is there a quicker way using colon along these lines?
data mynewdata;
set mydata_:;
tablename = _tablelabel_;
run;
thanks
I always find clicking on comment links annoying, so hopefully here's the answer in your context. Use the INDSNAME= SET statement option to assign the dataset name to a variable:
data mynewdata;
set mydata_: indsname=_tablelabel_;
tablename = _tablelabel_;
run;
N.B. you can call _tablelabel_ whatever you want, and you may wish to change it so it doesn't look like a SAS generated variable name.
Note that INDSNAME= only became a SET statement option in SAS 9.2.
Just to be clear, with my particular code, where the datasets were named mydata_yyyymm and I wanted a monthend variable with a datestamp, I was able to produce this using the solution provided by mjsqu as follows (the obs= and keep= data set options are included in case they are needed):
data mynewdata;
set mydata_: (obs=100 keep=xxx xxx) indsname=_tablelabel_;
format monthend yymmdd10.;
monthend = input(scan(_tablelabel_,-1,'_'),yymmn6.);
run;
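One detail worth noting: the yymmn6. informat reads 201501 as the first day of the month (1 January 2015). If an actual month-end date is wanted, a possible variant, offered as a sketch rather than part of the original solution, wraps the result in INTNX with the 'e' (end) alignment:

```sas
data mynewdata;
  set mydata_: indsname=_tablelabel_;
  format monthend yymmdd10.;
  /* yymmn6. reads 201501 as 01JAN2015; INTNX with 'e' moves to the last day of that month */
  monthend = intnx('month', input(scan(_tablelabel_, -1, '_'), yymmn6.), 0, 'e');
run;
```

The INDSNAME= variable itself is not written to the output dataset, so only monthend is kept.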
I use PROC MIXED to build a mixed-effects model. My PROC MIXED step includes a STORE statement, which saves the results of the analysis in a binary file (an item store).
(http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_mixed_sect022.htm )
The code looks something like this:
PROC MIXED data=dat;
class id;
model height = Age;
random intercept;
STORE OUT = plm;
run;
I am wondering if there is any way I can access this file (plm).
Cheers
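For what it is worth, item stores created by a STORE statement are normally read back with PROC PLM. A minimal sketch, assuming the item store is still available under the name plm in the current session:

```sas
proc plm restore=plm;
  /* display the stored parameter estimates and fit statistics */
  show parameters fitstats;
run;
```

PROC PLM can also run post-fitting statements such as LSMEANS, ESTIMATE and SCORE against the stored model without refitting it.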
I'd like to fit a GEE model with an exchangeable working correlation matrix and then apply a Huber-White sandwich estimator to the resulting model to guard against biased results. The code for my GEE model is below:
Proc GENMOD data = Cohort1ONLY;
class SSID SCHIID0809 Ethnicity(ref = "500") ELLbaseline GENDER freeLunch failedInd
GRADE0809(ref = "3")/param = ref;
Model SSMATH0809 = TRT0809 SSMATH0708 SSENG0708 GRADE0809 ELLbaseline GENDER freeLunch
ethnicity failedInd;
repeated subject = SCHIID0809/ type = exch /*corrw: to print the varcov matrix*/;
Run;
I know that the Huber-White sandwich (empirical) estimator can easily be requested in PROC MIXED with the EMPIRICAL option. I have to use GENMOD because of all the reference groups that I've defined above. Is there any way that I can pass the result through a macro that computes the Huber-White sandwich estimator based on the residuals from the GENMOD step above?
I appreciate your help.
-Sepehr
One way is to use the empirical parameter covariance matrix (also known as the robust variance estimator, sandwich estimator or Huber-White method), which PROC GENMOD makes available through the REPEATED statement. When a REPEATED statement is present, the standard errors GENMOD reports are already based on the empirical estimator by default; to display the covariance matrix itself, add the COVB option to the REPEATED statement:
repeated subject={subject id} / covb;
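Applied to the model in the question, this might look like the sketch below. The MODELSE option is an addition for illustration only: it prints model-based standard errors next to the default empirical ones so the two can be compared.

```sas
proc genmod data=Cohort1ONLY;
  class SSID SCHIID0809 Ethnicity(ref="500") ELLbaseline GENDER freeLunch failedInd
        GRADE0809(ref="3") / param=ref;
  model SSMATH0809 = TRT0809 SSMATH0708 SSENG0708 GRADE0809 ELLbaseline GENDER freeLunch
        ethnicity failedInd;
  /* COVB prints the covariance matrix of the estimates;
     MODELSE adds model-based standard errors for comparison */
  repeated subject=SCHIID0809 / type=exch covb modelse;
run;
```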
I’m wondering if someone can help with a coding problem I have.
Background: I have a project that imports some files and uses the data in those files to perform projections. The contents of the files determine some aspects of the size of the output that follows. Put simply, the values in the loaded data drive the size and shape of the tables that follow, and this can vary.
The following code is an example of the problem.
The loaded data will have a variable start year (note wf2009; 2009 is the first year) and a variable range (this example goes from 2009 to 2030, but this will vary too).
proc summary data= labeled_proj_data_hc;
class jurisdiction specialty measure;
types jurisdiction*specialty*measure;
VAR wf2009--wf2030;
output out= sum_labeled_proj_data_hc
sum(wf2009) = y2009
sum(wf2010) = y2010
sum(wf2011) = y2011
sum(wf2012) = y2012;
run;
Where I’m not sure how to proceed is:
sum(wf2009) = y2009
sum(wf2010) = y2010
sum(wf2011) = y2011
sum(wf2012) = y2012;
In the sequence of lines requesting the sum of each column, how can I make this dynamic, so that the start year is populated from a variable and increments yearly until the last year, which is also variable?
Has anyone solved a similar problem?
Cheers,
Is renaming the variables necessary? If not, you can use the : wildcard operator to select all variables whose names begin with 'wf', then just put SUM= in the OUTPUT statement, which will preserve the original names.
Your PROC SUMMARY step would then look like this:
proc summary data= labeled_proj_data_hc;
class jurisdiction specialty measure;
types jurisdiction*specialty*measure;
VAR wf: ;
output out= sum_labeled_proj_data_hc
sum=;
run;
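If the renaming from wf to y is required, one possible approach is to build the rename list from the dataset's metadata and apply it as a data set option on the output. This is a sketch only: it assumes the input dataset is in the WORK library and that every variable whose name starts with 'wf' is one of the year columns; rename_list is a macro variable name made up for this example.

```sas
proc sql noprint;
  /* build pairs such as wf2009=y2009 from the column metadata */
  select catx('=', name, cats('y', substr(name, 3)))
    into :rename_list separated by ' '
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'LABELED_PROJ_DATA_HC'
    and upcase(name) like 'WF%';
quit;

proc summary data=labeled_proj_data_hc;
  class jurisdiction specialty measure;
  types jurisdiction*specialty*measure;
  var wf: ;
  /* SUM= keeps the original names; RENAME= then maps them to y<year> */
  output out=sum_labeled_proj_data_hc(rename=(&rename_list)) sum=;
run;
```

Because the list is generated at run time, the same code should work whatever the first and last years in the loaded data are.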