Marginal effects of dummy predictors in logistic model in SAS - sas

I'd like to derive marginal effects from a categorical logistic model in SAS.
For continuous predictors, thanks to the SAS manual (22604), I understand to some extent how to calculate marginal effects.
However, that SAS manual only covers marginal effects of continuous predictors on binary or categorical dependent variables, so I'm not sure whether its method can be applied to binary dummy predictors.
Conceptually, I can interpret the marginal effect of a dummy predictor on the dependent variable, but technically I'm not sure the calculation is right.
Of course, it might be better to use the odds ratio. I agree, but I'd also like to use marginal effects.
Thanks for reading.
data crops;
input Crop :$10. x1 rain $ ; /* list input: the data lines below are single-space delimited */
datalines;
Corn 16 1
Corn 15 0
Corn 16 0
Corn 18 0
Corn 15 1
Corn 15 1
Corn 12 1
Soybeans 20 0
Soybeans 24 0
Soybeans 21 1
Soybeans 27 1
Soybeans 12 1
Soybeans 22 1
Cotton 31 0
Cotton 29 0
Cotton 34 0
Cotton 26 1
Cotton 53 0
Cotton 34 1
Sugarbeets 22 1
Sugarbeets 25 0
Sugarbeets 34 1
Sugarbeets 54 1
Sugarbeets 25 1
Sugarbeets 26 1
Clover 12 0
Clover 24 0
Clover 87 0
Clover 51 0
Clover 96 0
Clover 31 1
Clover 56 1
Clover 32 0
Clover 36 0
Clover 53 1
Clover 32 1
;
proc logistic data=crops;
class rain(ref='0');
model crop = rain / link=glogit;
output out=preds predprobs=individual;
ods output ParameterEstimates=betas;
run;
proc transpose data=betas out=rowbetas;
var estimate;
run;
data margeff;
if _n_=1 then set rowbetas;
set preds;
SumBetaPred=col5*IP_Clover + col6*IP_Corn + col7*IP_Cotton + col8*IP_Soybeans;
MEClover=IP_Clover*(col5-SumBetaPred);
MECorn=IP_Corn*(col6-SumBetaPred);
MECotton=IP_Cotton*(col7-SumBetaPred);
MESoybeans=IP_Soybeans*(col8-SumBetaPred);
MESugarbeets=IP_Sugarbeets*(-SumBetaPred);
run;
proc sort nodupkey;
by rain;
run;
proc print;
id rain;
var me:;
run;
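For a 0/1 dummy predictor, one common alternative to the derivative-based formula is the discrete change: the difference in predicted probability between rain=1 and rain=0 for each crop. The following is only a minimal sketch under that definition, not the method from the SAS manual cited above. It assumes the `preds` data set created by the PROC LOGISTIC step above, with `rain` stored as the character values '0' and '1'. The names `byrain`, `discrete_me`, and the `ME_` variables are made up for illustration.

```sas
/* Keep one row of predicted probabilities per level of rain.
   rain is the only predictor, so all rows within a level are identical. */
proc sort data=preds out=byrain nodupkey;
  by rain;
run;

data discrete_me;
  set byrain;
  array p{5}  IP_Clover IP_Corn IP_Cotton IP_Soybeans IP_Sugarbeets;
  array p0{5} _temporary_;   /* holds the rain='0' probabilities across rows */
  array me{5} ME_Clover ME_Corn ME_Cotton ME_Soybeans ME_Sugarbeets;
  if rain='0' then do i=1 to 5;
    p0{i}=p{i};
  end;
  else if rain='1' then do i=1 to 5;
    me{i}=p{i}-p0{i};        /* P(crop | rain=1) - P(crop | rain=0) */
  end;
  if rain='1';               /* output only the row holding the differences */
  keep ME_:;
run;

proc print data=discrete_me;
run;
```

Because each row's predicted probabilities sum to 1, the five differences should sum to zero, which is a quick sanity check on the result.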

Related

PROC TRANSPOSE value column while retaining dates and Hour End

I have data structured like this:
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
And would like to transpose the information so that I end up with this:
Date HourEnd Meter100 Meter200 Meter300
12/01/2007 1 986 47 32
12/01/2007 2 992 45 37
12/01/2007 3 1002 50 40
I have tried numerous PROC TRANSPOSE options and variations and am confusing myself. Any help would be greatly appreciated!
You need to SORT.
data have;
infile cards firstobs=2;
input Meter_ID Date:mmddyy. HourEnd Value;
format date mmddyy10.;
cards;
Meter_ID Date HourEnd Value
100 12/01/2007 1 986
100 12/01/2007 2 992
100 12/01/2007 3 1002
200 12/01/2007 1 47
200 12/01/2007 2 45
200 12/01/2007 3 50
300 12/01/2007 1 32
300 12/01/2007 2 37
300 12/01/2007 3 40
;;;;
run;
proc print;
proc sort data=have;
by date hourend meter_id;
run;
proc print;
run;
proc transpose prefix="Meter"n;
by date hourend;
id meter_id;
var value;
run;
proc print;
run;

SAS warning: complete separation of data points. The maximum likelihood estimate does not exist

I ran a logistic regression in SAS using the data set shown below, but I got several warnings. I tried identifying and excluding the outliers and then testing for multicollinearity, but I am still getting the warnings.
Any advice will be greatly appreciated.
**********************************;
************** database **********;
***********************************;
data D_BP;
input BP Age Weight BSA Dur Pulse Stress;
datalines;
0 47 85.4 1.75 5.1 63 33
0 51 89.4 1.89 7 72 95
0 47 90.9 1.9 6.2 66 8
0 49 89.2 1.83 7.1 69 62
0 48 92.7 2.07 5.6 64 35
0 47 94.4 2.07 5.3 74 90
0 50 95 2.05 10.2 68 47
0 45 87.1 1.92 5.6 67 80
0 46 94.5 1.98 7.4 69 95
0 46 87 1.87 3.6 62 18
0 46 94.5 1.9 4.3 70 12
0 48 90.5 1.88 9 71 99
1 49 94.2 2.1 3.8 70 14
1 49 95.3 1.98 8.2 72 10
1 50 94.7 2.01 5.8 73 99
1 48 99.5 2.25 9.3 71 10
1 49 99.8 2.25 2.5 69 42
1 49 94.1 1.98 5.6 71 21
1 52 101.3 2.19 10 76 98
1 56 95.7 2.09 7 75 99
;
run;
****** do logistic regression **********;
Proc logistic data=work.D_bp;
Model BP=Age Weight BSA Dur Pulse Stress;
Run;
**** identify outlier *********;
proc reg data=work.D_bp plots(only
label)=(RStudentByLeverage CooksD);
model BP=Age Weight BSA Dur Pulse Stress ;
run;
**** After removing outliers ==> assess multicollinearity*********;
**** assessing multicollinearity by 2 ways *********;
proc corr data=work.D_bp ;
Var Age Weight BSA Dur Pulse Stress;
run;
proc reg data=work.D_bp plots;
Model BP=Age Weight BSA Dur Pulse Stress/Collin vif tol;
run;
****** repeat logistic regression after excluding weight **********;
Proc logistic data=work.D_bp;
Model BP=Age BSA Dur Pulse Stress;
Run;
WARNING: There is a complete separation of data points. The maximum likelihood estimate does
not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are
based on the last maximum likelihood iteration. Validity of the model fit is
questionable.
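One commonly suggested way to get finite estimates when the data are completely separated is Firth's penalized likelihood, which PROC LOGISTIC supports through the FIRTH option on the MODEL statement. The step below is only a sketch of that option applied to the `D_BP` data set above. Whether it is appropriate for a sample of 20 observations and six predictors is a judgment call that this sketch does not settle. The `event='1'` choice and the use of `clparm=pl` (profile-likelihood confidence limits) are assumptions, not something stated in the question.

```sas
/* Firth's penalized likelihood keeps the estimates finite under separation */
proc logistic data=work.D_bp;
  model BP(event='1') = Age Weight BSA Dur Pulse Stress / firth clparm=pl;
run;
```

Dropping predictors or using exact methods are other routes. This block only shows the syntax for the penalized fit.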

A dynamic SAS program to consolidate dates of events that are nested within each other

Hello,
I want to write a dynamic program that flags the start and end dates of events nested within the consolidated periods listed at the top of each Pt.ID in the attached example. I can easily do this if there is only one consolidated period per Pt.ID. However, there can be more than one consolidated period per Pt.ID (as shown for the second Pt.ID, 1002). As shown in the example, events that fall within a consolidated period are flagged as "Y" in the flag variable, and events that do not fall within any consolidated period are flagged as "N". How can I write a program that accounts for all of the consolidated periods per Pt.ID, compares them with the dates of the rest of that patient's events, and flags the events that fall within any of those periods?
Thank you.
So join the event records with the period records and calculate whether the event is within the period. Then you could take the MAX over all periods.
For example here is code for your sample that creates a binary 1/0 flag variable called INCLUDED.
data Sample;
infile datalines missover;
input Pt_ID Event_ID Category $ Start_Date : mmddyy10.
Start_Day End_date : mmddyy10. End_day Duration
;
format Start_date End_date mmddyy10.;
datalines;
1001 . Moderate 8/5/2016 256 9/3/2016 285 30
1001 1 Moderate 3/8/2016 106 3/16/2016 114 9
1001 2 Moderate 8/5/2016 256 8/14/2016 265 10
1001 3 Moderate 8/21/2016 272 8/24/2016 275 4
1001 4 Moderate 8/23/2016 274 9/3/2016 285 12
1002 . Severe 11/28/2016 13 12/19/2016 34 22
1002 . Severe 2/6/2017 83 2/28/2017 105 23
1002 1 Severe 11/28/2016 13 12/5/2016 20 8
1002 2 Severe 12/12/2016 27 12/19/2016 34 8
1002 3 Severe 1/9/2017 55 1/12/2017 58 4
1002 4 Severe 2/6/2017 83 2/13/2017 90 8
1002 5 Severe 2/20/2017 97 2/28/2017 105 9
1002 6 Severe 3/17/2017 122 3/24/2017 129 8
1002 7 Severe 5/4/2017 170 5/13/2017 179 10
1002 8 Severe 5/24/2017 190 5/30/2017 196 7
1002 9 Severe 6/9/2017 206 6/13/2017 210 5
;
proc sql ;
create table want as
select a.*
, max(b.start_date <= a.start_date and b.end_date >= a.end_date ) as Included
from sample a
left join sample b
on a.pt_id = b.pt_id and missing(b.event_id)
group by 1,2,3,4,5,6,7,8
order by a.pt_id, a.event_id, a.start_date , a.end_date
;
quit;
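If the "Y"/"N" flag from the question is needed rather than the 1/0 variable, the numeric result can be converted in a short follow-up step. This is a minimal sketch that assumes the `want` table created by the PROC SQL step above. The names `want_flag` and `Flag` are made up for illustration. The period rows (missing `Event_ID`) are left with a blank flag here, which is an assumption about the desired output.

```sas
data want_flag;
  set want;
  length Flag $1;
  /* Flag only the event rows; the consolidated period rows stay blank */
  if not missing(Event_ID) then Flag = ifc(Included, 'Y', 'N');
run;

proc print data=want_flag;
run;
```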

SAS using Datalines - "observation read not used"

I am a complete newbie to SAS and only know basic SQL. I am currently taking a regression class and am having trouble with SAS code.
I am trying to input two columns of data where x variable is State; y variable is # of accidents for a simple regression.
I keep getting this:
ERROR: No valid observations are found.
Number of Observations Read 51
Number of Observations Used 0
Number of Observations with Missing Values 51
Is it because datalines only reads numbers and not characters?
Here is the code as well as the datalines:
Data Firearm_Accidents_1999_to_2014;
ods graphics on;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
Connecticut 0
Delaware 0
District_of_Columbia 0
Florida 350
Georgia 413
Hawaii 0
Idaho 0
Illinois 287
Indiana 288
Iowa 0
Kansas 44
Kentucky 384
Louisiana 562
Maine 0
Maryland 21
Massachusetts 27
Michigan 168
Minnesota 0
Mississippi 332
Missouri 320
Montana 0
Nebraska 0
Nevada 0
New_Hampshire 0
New_Jersey 85
New_Mexico 49
New_York 218
North_Carolina 437
North_Dakota 0
Ohio 306
Oklahoma 227
Oregon 41
Pennsylvania 465
Rhode_Island 0
South_Carolina 324
South_Dakota 0
Tennessee 603
Texas 876
Utah 0
Vermont 0
Virginia 203
Washington 45
West_Virginia 136
Wisconsin 64
Wyoming 0
;
run; proc print;
proc reg data = Firearm_Accidents_1999_to_2014;
model State = Sum_OF_Deaths;
ods graphics off;
run; quit;
OK, some different levels of issues here.
ODS GRAPHICS go before and after procs, not inside them.
When reading a character variable you need to tell SAS using an informat.
This allows you to read in the data. However, your regression has several issues. For one, State is a character variable and you can't do regression with a character variable. I think that issue is beyond this forum. Review your regression basics and check what you're trying to do.
Data Firearm_Accidents_1999_to_2014;
informat state $32.;
Input State Sum_OF_Deaths;
Datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
....
;
run;
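The two points above (an informat for the character variable, and ODS GRAPHICS placed around the procedure) can also be sketched as follows. This variant attaches the informat inline with the colon modifier instead of using a separate INFORMAT statement, and uses only the first six data lines shown above. PROC UNIVARIATE with a HISTOGRAM statement is used purely as an illustrative procedure that produces a graph. It is not the regression the question asked about.

```sas
data Firearm_Accidents_1999_to_2014;
  /* :$32. reads State as character, up to 32 characters, stopping at the blank */
  input State :$32. Sum_OF_Deaths;
  datalines;
Alabama 526
Alaska 0
Arizona 150
Arkansas 246
California 834
Colorado 33
;
run;

/* ODS GRAPHICS statements go outside the procedure, not inside the DATA step */
ods graphics on;
proc univariate data=Firearm_Accidents_1999_to_2014;
  var Sum_OF_Deaths;
  histogram Sum_OF_Deaths;
run;
ods graphics off;
```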

Merge dataset without common variable (By)?

Currently I have two datasets with similar variable lists. Each dataset has a procedure variable. I want to compare the frequency of the procedure variable between datasets. I created a flag in both datasets to identify the source dataset, and was going to merge, but I don't have a common identifier. How do I merge the datasets without deleting any observations? This isn't just a simple MERGE without a BY statement, right?
Currently have:
Data.a Data.b
pproc proc1_numb
70 9
71 15
77 24
80 80
81 42
83 71
86 66
87 125
121 159
125 242
Want Output:
pproc freq
9 1
15 1
24 1
42 1
66 1
70 1
71 2
77 1
80 2
81 1
83 1
86 1
87 1
121 1
125 2
159 1
242 1
If I understand your question properly, you should just concatenate the two datasets into one and rename the variable. Then you can use PROC MEANS to get the frequencies. Something like this:
data all;
set a
b(rename=(proc1_numb=pproc));
run;
proc means nway data=all noprint;
class pproc;
output out=want(drop=_type_ rename=(_freq_=freq));
run;
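The same counts can also be obtained with PROC FREQ, which some may find more direct for a frequency table. The sketch below rebuilds the two sample datasets from the values in the question so that it runs on its own. The work dataset names `a` and `b` follow the answer above. The output dataset from the OUT= option contains COUNT and PERCENT, so PERCENT is dropped and COUNT is renamed to match the wanted layout.

```sas
/* Sample data copied from the question */
data a;
  input pproc @@;
  datalines;
70 71 77 80 81 83 86 87 121 125
;
run;

data b;
  input proc1_numb @@;
  datalines;
9 15 24 80 42 71 66 125 159 242
;
run;

/* Stack the two datasets under a common variable name */
data all;
  set a
      b(rename=(proc1_numb=pproc));
run;

/* One row per distinct pproc value with its frequency */
proc freq data=all noprint;
  tables pproc / out=want(drop=percent rename=(count=freq));
run;

proc print data=want noobs;
run;
```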