Difference between estimates from SAS Proc Genmod and Raw averages - sas

I am trying to find the average change in score for a survey response from pre to post at different levels of aggregation, using two different methods. In method 1, the "Raw Method", I calculate the average of the pre scores and the average of the post scores and then take the difference. In method 2, the "Genmod Method", I use pre and post as a categorical variable and take the estimate of the interaction term (stage2*rso). Why doesn't the estimate from method 2 match the average change in scores from method 1? The dataset has multiple pre and post scores for each patient (empi). This is probably a very simple problem, but I need some help understanding it. Here's my dataset and code:
empi Provider RSO SCORE STAGE
1001 A X 16.5 PRE
1001 A X 22.2 POST
1001 A X 14.3 PRE
1001 A X 23.4 POST
111 A X 25.6 PRE
1002 B X 32.3 PRE
1002 B X 12 POST
1001 A X 24.3 PRE
1002 B X 15.6 PRE
1002 B X 23.7 POST
112 A X 10.2 PRE
1234 C Y 13.5 PRE
1234 C Y 34.2 POST
1234 C Y 12.3 PRE
/** Method 1 **/
PROC SQL;
CREATE TABLE RSOSCORES AS
SELECT stage2,
RSO,
AVG(SCORE) AS AVG
FROM TEST1
GROUP BY stage2,
RSO;
QUIT;
/** Method 2 **/
proc genmod data=TEST1;
class empi rso stage2;
model SCORE = stage2 rso stage2*rso;
repeated subject = empi/ type=un corrw;
ods output GEEEmpPEst = myGEE_PEs;
run;
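One way to see where the gap comes from is sketched below. It assumes the same TEST1 dataset and variable names as the code above, and it is only an illustration, not a confirmed fix. With the default GLM coding of CLASS variables, the stage2*rso coefficient is a contrast relative to the reference levels rather than a pre-to-post change, and with type=un the GEE fit weights observations through the working correlation, so its fitted means need not equal the raw cell averages. Requesting LS-means under an independence working correlation should give cell means that are directly comparable to the PROC SQL averages:

```sas
/* Sketch only: same model, but with an independence working correlation */
/* and LS-means for each stage2 x rso cell, plus their differences.      */
proc genmod data=TEST1;
  class empi rso stage2;
  model SCORE = stage2 rso stage2*rso;
  repeated subject=empi / type=ind;
  /* Cell means and pairwise differences (e.g. POST minus PRE within an RSO) */
  lsmeans stage2*rso / diff;
run;
```

If the LS-means from this run match method 1 but those from the type=un run do not, the difference is due to the working correlation rather than to a coding mistake.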

Related

Rolling Window Model for Unbalanced Dataset in SAS

I have an unbalanced panel dataset of the following form (simplified):
data have;
input ID YEAR EARN LAG_EARN;
datalines;
1 1960 450 .
1 1961 310 450
1 1962 529 310
2 1978 10 .
2 1979 15 10
2 1980 8 15
2 1981 10 8
2 1982 15 10
2 1983 8 15
2 1984 10 8
3 1972 1000 .
3 1973 1599 1000
3 1974 1599 1599
;
run;
I now want to estimate the following model for each ID:
proc reg data=have;
by ID;
model EARN = LAG_EARN;
run;
However, I want to do this for rolling windows of some size. Say for example for windows of size 2. The window should only contain non-empty observations. For example, in the case of firm A, the window is applicable from 1961 onwards and thus only one time (since only one year follows after 1961 and the window is supposed to be of size 2).
Finally, I want a table with year columns and firm rows. It should show the following: the regression model (with window size 2) has been estimated once for firm A, because the number of available years allows only one estimation. Put differently, in 1962 the coefficient of the regression model has a value of X based on the prior 2-year window. Applying the same logic to the other two firms gives the following table, where "X" is the estimated coefficient in a given year for firm A/B/C based on the 2-year window and "n" indicates that no such value exists:
data want;
input ID 1962 1974 1980 1981 1982 1983 1984;
datalines;
1 X n n n n n n
2 n n X X X X X
3 n X n n n n n
;
run;
I do not know how to implement this. Furthermore, I would like to create a macro that lets me estimate different rolling window models while still creating analogous output datasets. I would appreciate any help, since I have been struggling with this for quite some time now.
Try this macro. This will only output if there are non-missing values of lags that you specify.
%macro lag(data=, out=, window=);
data _want_;
set &data.;
by ID;
LAG_EARN = lag&window.(earn);
if(first.ID) then call missing(lag_earn);
if(NOT missing(lag_earn));
run;
proc sort data=_want_;
by year id;
run;
proc transpose data=_want_
out=&out.(drop=_NAME_);
by ID notsorted;
id year;
var lag_earn;
run;
proc sort data=&out.;
by id;
run;
%mend;
%lag(data=have, out=want, window=1);
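The macro above transposes the lagged values themselves. If the goal is the regression coefficient per window, a sketch along the following lines may be closer to what the question describes. It assumes the `have` dataset from the question, treats the window size as a number of consecutive years, and uses hypothetical names (`window`, `windows`, `est`, `END_YEAR`, the `Y` column prefix) that are not in the original post:

```sas
%let window = 2;  /* hypothetical: window length in years */

/* Pair each candidate end year with the rows inside its window, */
/* keeping only windows that are full of non-missing lags.       */
proc sql;
  create table windows as
  select a.ID, a.YEAR as END_YEAR, b.YEAR, b.EARN, b.LAG_EARN
  from have as a, have as b
  where a.ID = b.ID
    and b.YEAR between a.YEAR - &window. + 1 and a.YEAR
    and b.LAG_EARN is not missing
  group by a.ID, a.YEAR
  having count(*) = &window.
  order by a.ID, END_YEAR, b.YEAR;
quit;

/* One regression per firm and window end year; OUTEST= keeps the coefficients */
proc reg data=windows outest=est noprint;
  by ID END_YEAR;
  model EARN = LAG_EARN;
run; quit;

/* Firms as rows, end years as columns (Y1962, Y1974, ...), slope as the value */
proc transpose data=est out=want(drop=_NAME_) prefix=Y;
  by ID;
  id END_YEAR;
  var LAG_EARN;
run;
```

Years with no full window for a firm come out as missing values, which play the role of "n" in the wanted table. Note that a window of two observations fits a line through two points exactly, so larger windows are needed for meaningful estimates.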

Variance Covariance Matrix from Proc GLM

I'm using Proc GLM to fit a basic fixed effects model and I want to get the variance/covariance matrix. I know this is very easy to do if you fit a model with proc reg, but the model I'm fitting has a separate slope for each member of a class (over 50 members of the class) and thus I don't want to code dummy variables for all of them.
Is there any way to get the variance-covariance matrix from a fit using proc glm?
Here is an example with made up data and my code. I would like to get a variance-covariance matrix of the estimates.
data example;
input price cat time x2 x3;
cards;
5000 1 1 5.4 50
6000 1 2 6 45
3000 1 3 7 60
4000 2 1 5 50
4500 2 2 5.4 75
4786 3 1 6 33
6500 3 2 5.8 36
1010 3 3 4 41
;;;;
run;
proc glm data=example PLOTS(UNPACK)=DIAGNOSTIC;
class cat;
model price= cat time x2 x3/ noint solution;
run;
I get a parameter estimate for each category (these are essentially nuisance parameters) and then I'm interested in the Covariance matrix of the estimates time, x2 and x3.
Thanks
You need to write the output to a dataset. (I disabled the on-screen output on purpose, but feel free to enable it as needed.)
proc glm data=example noprint;
class cat;
model price= cat time x2 x3 / noint solution ;
output out=from_glm COVRATIO=Cov;
run; quit;
This results in:
price cat time x2 x3 COV
5000 1 1 5.4 50 597.2565
6000 1 2 6 45 8.312725
3000 1 3 7 60 0.0493
....
Edit: Updated the output statement.
For more on the keywords, see https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect020.htm
Hopefully this is what you're after.
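Note that COVRATIO is a per-observation influence statistic, not the covariance matrix of the parameter estimates. If the matrix of the estimates is what's needed, one possible route is sketched below. It assumes PROC MIXED is available and uses a hypothetical output dataset name (`cov_est`). With no RANDOM or REPEATED statement, PROC MIXED fits the same fixed-effects model, and its COVB option requests the covariance matrix of the fixed-effects estimates:

```sas
/* Sketch: same model as the PROC GLM call, fitted with PROC MIXED so that */
/* the COVB option can be used. cov_est is a hypothetical dataset name.    */
ods output CovB=cov_est;
proc mixed data=example;
  class cat;
  model price = cat time x2 x3 / noint solution covb;
run;
```

The rows and columns of `cov_est` follow the order of the effects in the solution table, so the block for time, x2 and x3 is the last three rows and columns. For a model with only fixed effects the default REML residual variance equals the usual mean squared error, so the matrix should agree with what PROC REG's COVB option would report for the equivalent dummy-coded model.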

Applying cutoff to data set with IDs

I am using SAS and managed to run proc logistic, which gives me a table like so.
Classification Table
Prob Level | Correct Event | Correct Non-Event | Incorrect Event | Incorrect Non-Event | Pct Correct | Sensitivity | Specificity | False POS | False NEG | J
0    | 33 | 0   | 328 | 0 | 9.1  | 100  | 0    | 90.9 | .   | 99
0.02 | 33 | 62  | 266 | 0 | 26.3 | 100  | 18.9 | 89   | 0   | 117.9
0.04 | 31 | 162 | 166 | 2 | 53.5 | 93.9 | 49.4 | 84.3 | 1.2 | 142.3
0.06 | 26 | 209 | 119 | 7 | 65.1 | 78.8 | 63.7 | 82.1 | 3.2 | 141.5
How do I include IDs for the rows of data in lib.POST_201505_PRED below that have at least 0.6 probability?
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
model BUYER =
age
tenure
usage
payment
loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505 out=lib.POST_201505_PRED outroc=lib.POST_201505_ROC;
run;
I've been reading the documentation and searching online but haven't found anything on it. I must be searching for the wrong keywords, as I presume this is a frequently used process.
You just need an id-statement to tell SAS your ID-variable identifies your observations;
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
id ID;
model BUYER = age tenure usage payment loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505
out=lib.POST_201505_PRED
outroc=lib.POST_201505_ROC;
run;
Now your output contains all you need.
For instance, to print the IDs that were assigned a probability of at least 0.6 of being a BUYER:
proc print data=lib.POST_201505_PRED (where=(P_1 GE 0.6));
var ID P_1;
run;
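If the matching rows are needed as a dataset rather than a printout, the same cutoff can be applied in a data step. This is a minimal sketch, assuming the identifier variable is named ID and the predicted probability is P_1, as in the code above; `likely_buyers` is a hypothetical dataset name:

```sas
/* Keep only the scored rows at or above the 0.6 cutoff */
data likely_buyers;
  set lib.POST_201505_PRED;
  where P_1 >= 0.6;
  keep ID P_1;
run;
```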
You will find such "id yourKey;" statements throughout the statistical procedures in SAS, for instance:
proc univariate data=psydata.stroop;
id Subject;
var ReadTime;
run;
** will report the most extreme values of ReadTime as
;
Turns out I just had to include the ids in lib.POST_201505

SAS transpose using values as column names and summarize

I'm trying to transpose a dataset, using values as variable names and summarizing numeric data by group. I tried proc transpose and proc report (across) but couldn't get it to work. The only way I know to do this is with a data step (if/else and sum), but then the result doesn't adapt dynamically to new values.
For example I have this data set:
school name subject picked saving expenses
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
and I need this in one line per school: the sum of 'picked' by student name, then the sum of 'picked' by subject, and in the last 3 columns the totals for picked, saving and expenses:
school john ruby peter noname math spanish geography nosubject picked saving expenses
raget 15 15 2 0 13 5 2 12 32 22700 8200
Is it possible for this to adapt dynamically if a new student or subject is added to the school?
It's a little difficult because you're summarising at more than one level, so I've used PROC SUMMARY and chosen different _TYPE_ values. See below:
data have;
infile datalines;
input school $ name $ subject : $10. picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;
run;
proc summary data=have;
class school name subject;
var picked saving expenses;
output out=want1 sum(picked)=picked sum(saving)=saving sum(expenses)=expenses;
run;
proc transpose data=want1 (where=(_type_=5)) out=subs (where=(_NAME_='picked'));
by school;
id subject;
run;
proc transpose data=want1 (where=(_type_=6)) out=names (where=(_NAME_='picked'));
by school;
id name;
run;
proc sql;
create table want (drop=_TYPE_ _FREQ_ name subject) as
select
n.*,
s.*,
w.*
from want1 (where=(_TYPE_=4)) w,
names (drop=_NAME_) n,
subs (drop=_NAME_) s
where w.school = n.school
and w.school = s.school;
quit;
I've also tested this code by adding new schools, names and subjects and they do appear in the final table. You'll note that I haven't hardcoded anything (e.g. no reference to math or John), so the code is dynamic enough.
PROC REPORT is an interesting alternative, particularly if you want the printed output rather than as a dataset. You can use ODS OUTPUT to get the output dataset, but it's messy as the variable names aren't defined for some reason (they're "C2" etc.). The printed output of this one is a little messy also as the header rows don't line up, but that can be fixed with some finagling if that's desired.
data have;
input school $ name $ subject $ picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;;;;
run;
ods output report=want;
proc report nowd data=have;
columns school (name subject),(picked) picked=picked2 saving expenses;
define picked/analysis sum ' ';
define picked2/analysis sum;
define saving/analysis sum ;
define expenses/analysis sum;
define name/across;
define subject/across;
define school/group;
run;

SAS: No valid observations are found Error - Simple Regression

I have a big panel time series dataset. I wish to run this basic SAS regression code:
proc sort data=dataset;
by time_id;
run;
ods output parameterestimates=pe;
proc reg data=dataset;
by time_id;
model y=x1 x2 x3....x15;
run;
quit;
I get this error when I run the code:
ERROR: No valid observations are found.
NOTE: The above message was for the following BY group:
time_id=1
ERROR: No valid observations are found.
NOTE: The above message was for the following BY group:
time_id=2....
Why? My time_id variable exists... is it because I have too many time_id values? If I use firm_id as the BY variable it works, but I want time_id.
Here's a sample of my data (panel time series):
y x firm_id time_id
3.4 100 1 1
2.3 200 1 2
6.5 653 1 3
3 50 2 1
4.34 23 2 2
4.8 55 2 3
1.311 400 3 1
1.23 200 3 2
5.63 50 3 3
You'll get this error message if all values of a particular x variable are missing for a given time_id. Take a look at the example below, where all values of x2 are missing for time_id 1. When you run the code, the Results Output window details the problem (the number of missing observations is the same as the number of observations).
It works for firm_id because you have fewer values than time_id, therefore not all values of a particular x variable are missing for each firm_id.
data have;
input y x1 x2 firm_id time_id;
cards;
3.4 100 . 1 1
2.3 200 200 1 2
6.5 653 653 1 3
3 50 . 2 1
4.34 23 23 2 2
4.8 55 55 2 3
1.311 400 . 3 1
1.23 200 200 3 2
5.63 50 50 3 3
;
run;
proc sort data=have;
by time_id;
run;
ods output parameterestimates=pe;
proc reg data=have;
by time_id;
model y=x1-x2;
run;
quit;
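To find which BY groups are affected before running the regression, a quick count of non-missing and missing values per time_id can help. This is a minimal sketch using the `have` dataset above; with the real data, the VAR list would need to name the actual model variables:

```sas
/* N = non-missing count, NMISS = missing count, per time_id and variable */
proc means data=have n nmiss;
  class time_id;
  var y x1 x2;
run;
```

Any time_id where N is 0 for one of the variables will trigger the "No valid observations" error in PROC REG, because a row is dropped as soon as any model variable is missing.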