I have been tasked with the following problem:
Out of a total 1,000 subjects on aspirin, 80 had heart attacks and 65 had strokes. Out of a total 2,000 subjects on placebo, 240 had heart attacks and 165 had strokes.
I am asked if there is a significant benefit for aspirin therapy for heart attacks and strokes. What is the RR for aspirin use for each of the two outcomes?
My main issue has been setting up the data lines. Here is what I have so far, but my output window doesn't look right.
Another issue is figuring out how to account for the varying sample sizes and the fact that someone might have had a heart attack AND a stroke.
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
Stroke 1-Yes 65
Stroke 2-No 165
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
Edit 1:
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
This is what your data and code should look like, with one data set and one 2x2 table per outcome. You may need to flip the order in the TABLES statement so that the relative risk is calculated appropriately for your situation. I didn't check whether that is the case, as you can easily change it if required.
DATA HeartAttack;
INPUT OUTCOME :$13. EXPOSURE $ COUNT; /* :$13. keeps OUTCOME from being truncated at the default 8 characters */
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
;
PROC FREQ DATA=HeartAttack;
TITLE "Heart Attack Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
DATA Stroke;
INPUT OUTCOME :$13. EXPOSURE $ COUNT; /* same informat as the heart attack step, for consistency */
DATALINES;
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=Stroke;
TITLE "Stroke Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
If you have the raw data, I would recommend working with that instead of creating datalines in the first place. Typing the counts in by hand leaves room for errors, and with the raw data you could also deal with the interaction.
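As a hand check on whatever PROC FREQ prints, the two relative risks can be worked out directly from the counts in the question (risk in the aspirin group divided by risk in the placebo group). This is only the point estimate, not the significance test:

```latex
\[
RR_{\text{heart attack}} = \frac{80/1000}{240/2000} = \frac{0.080}{0.120} \approx 0.67
\qquad
RR_{\text{stroke}} = \frac{65/1000}{165/2000} = \frac{0.0650}{0.0825} \approx 0.79
\]
```

PROC FREQ orders character levels alphabetically by default, so the event of interest may land in column 1 for one table (HeartAttack sorts before NoHeartAttack) and in column 2 for the other (NoStroke sorts before Stroke). Pick the reported relative risk that matches the values above.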
Placebo gives you your expected distribution. We have to handle strokes and heart attacks separately because there is no data on their interaction. (If there is no interaction we'd expect a small number of patients with both, but there could be a negative or masking interaction, e.g. if the heart attacks are fatal, or there could be a cumulative interaction, with heart attacks often preceded by strokes or vice versa.) We can't answer any of those questions.
Once you've got your expected counts, it's simply two chi-squared tests with two bins each, not one chi-squared test with four bins.
(I'll put in a plug for my book Basic Algorithms if you want to code the chi-squared significance test from the ground up, without using any look-up tables.)
Related
I need to calculate the two-sided ttest in SAS.
I generally use the proc ttest adding side=2 but I am not sure if this test works fine or if another way should be preferred to it.
An example of data is the following:
Score Segment Obs Class_obs
1 0 500 15
1 1 500 34
2 0 234 23
2 1 766 65
The p-value should be calculated for each score. Segment indicates whether a condition is met (e.g. for a cut-off of 60, 0 means ‘lower than 60’ while 1 means ‘higher than 60’).
Obs is the number of observations in each segment by score. Class_obs is the number of observations that satisfy a specific condition on the overall population.
Happy to share more info if needed.
I am not able to get a row with ALL using row percentages. I would like the first row to give sum and percentage for column totals. So the percent under borderline for ALL should display 1861 * 100/5049=36.8% and under Desirable to display 1399 * 100/5049=27.7%. Currently it is displaying 100% and I need to change that.
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table (all smoking_status sex),
(all chol_status)*(n*f=8. colpctn) ;
run;
The output is
                              All             Cholesterol Status
                                      Borderline      Desirable           High
                       N  ColPctN     N  ColPctN     N  ColPctN     N  ColPctN
All                 5049   100.00  1861   100.00  1399   100.00  1789   100.00  <- change the cholesterol % to denominator 5049
Smoking Status
Heavy (16-25)       1029    20.38   383    20.58   285    20.37   361    20.18
Light (1-5)          563    11.15   192    10.32   174    12.44   197    11.01
Moderate (6-15)      563    11.15   217    11.66   170    12.15   176     9.84
Non-smoker          2436    48.25   886    47.61   655    46.82   895    50.03
Very Heavy (> 25)    458     9.07   183     9.83   115     8.22   160     8.94
Sex
Female              2770    54.86   959    51.53   803    57.40  1008    56.34
Male                2279    45.14   902    48.47   596    42.60   781    43.66
I think the closest you can get is this:
proc tabulate data=sashelp.heart;* format=8.2;
class chol_status smoking_status sex;
table all*rowpctn=' ' (smoking_status sex)*(n=' '*f=8. colpctn=' '),
(all) (chol_status) ;
run;
That's not quite what you want, and it doesn't look very good. It's the only option within PROC TABULATE, though, because TABULATE won't let you assign statistics to both the rows and the columns: you have to pick one.
PROC REPORT will do what you want, with some effort. Alternatively, you could run this as a two-step process: output the TABULATE results to a dataset, fix the row percentages, then re-print the dataset in either REPORT or TABULATE, without asking it to compute percentages that time.
I've set up a table with age and average spending by age. Age is my dependent variable. In my dataset, I have a lot of members at age 21, so I need to put more weight on that age when I run a regression in SAS. I'm new to SAS. I have used the regression button, but have not written any code. Is there a built-in button for weighting? If not, how would you do this?
Age Ave Spending Total Members
20 $100 35
21 $80 85
22 $75 20
You didn't specify which SAS product you use, but if you use SAS Enterprise Guide, the "Tasks > Regression > Linear Regression" menu gives a "relative weight" option where you can specify Total Members.
If you want to do this programmatically, here is a short example:
DATA regdata;
/* List input reads space-separated values and does not depend on column alignment */
INPUT Age
Ave_spending
total_members;
DATALINES;
20 100 35
21 80 85
22 75 20
;
RUN;
PROC REG DATA=regdata;
WEIGHT total_members;
MODEL Age = Ave_spending;
RUN;
The "Relative Weight" option translates into the "WEIGHT" command you see in the code above.
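For reference, a WEIGHT statement in PROC REG means the fit is done by weighted least squares. With the variables from the code above (y = Age, x = Ave_spending, w = total_members; the symbols b_0 and b_1 for intercept and slope are just notation chosen here), the estimates minimize the weighted residual sum of squares:

```latex
\[
\min_{b_0,\, b_1} \; \sum_{i} w_i \,\bigl(y_i - b_0 - b_1 x_i\bigr)^2
\]
```

So the row for age 21, with 85 members, pulls the fitted line toward itself more strongly than the rows with 35 or 20 members.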
Please help me with my task. I have this data:
Name Age Height Eyes
Dan 25 174 Blue
Dan 54 165 Black
Jane 33 160 Blue
Kate 19 170 Green
I need:
Name Characteristic
Dan 25
174
Blue
Dan 54
165
Black
Jane 33
160
Blue
Kate 19
170
Green
I tried to do it with concatenation:
Characteristic=Age||Height||Eyes
But that puts the characteristics on one line instead of in a column:
Name Characteristic
Dan 25 174 Blue
Dan 54 165 Black
Jane 33160 Blue
Kate 19 170Green
I know I need to split the values somehow to solve this. Please help me with some advice.
You need to convert it all to character to have it one field. You may also be able to use a data _null_ step to create your report.
Here's how you could transpose your data to one field that you could then use proc report on. This is a transpose problem, not concatenation.
data have;
input Name $ Age Height Eyes $;
cards;
Dan 25 174 Blue
Dan 54 165 Black
Jane 33 160 Blue
Kate 19 170 Green
;
run;
data want;
set have;
characteristic = put(age, 8. -l); output;
characteristic = put(height, 8. -l); output;
characteristic = Eyes; output;
drop age height eyes;
run;
If you were creating a text file or a different output this may be what you want:
data _null_;
set have;
file '/folders/myfolders/want.txt' dlm=" ";
put Name "09"x Age;
put "09"x Height;
put "09"x Eyes;
run;
Hope that helps!
I am using SAS and managed to run proc logistic, which gives me a table like so.
Classification Table
            Correct     Incorrect              Percentages
Prob           Non-          Non-           Sensi-  Speci-  False  False
Level  Event  Event  Event  Event  Correct  tivity  ficity    POS    NEG      J
0         33      0    328      0      9.1     100       0   90.9      .     99
0.02      33     62    266      0     26.3     100    18.9     89      0  117.9
0.04      31    162    166      2     53.5    93.9    49.4   84.3    1.2  142.3
0.06      26    209    119      7     65.1    78.8    63.7   82.1    3.2  141.5
How do I include IDs for the rows of data in lib.POST_201505_PRED below that have at least 0.6 probability?
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
model BUYER =
age
tenure
usage
payment
loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505 out=lib.POST_201505_PRED outroc=lib.POST_201505_ROC;
run;
I've been reading the documentation and searching online but haven't found anything on it. I must be searching for the wrong keywords, as I presume this is a frequently used process.
You just need an ID statement to tell SAS which variable identifies your observations:
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
id ID;
model BUYER = age tenure usage payment loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505
out=lib.POST_201505_PRED
outroc=lib.POST_201505_ROC;
run;
Now your output contains all you need.
For instance, to print the IDs that were assigned a probability of at least 0.6 of being a BUYER:
proc print data=lib.POST_201505_PRED (where=(P_1 GE 0.6));
var ID P_1;
run;
You will find these id yourKey; statements throughout the statistical procedures in SAS, for instance:
proc univariate data=psydata.stroop;
id Subject;
var ReadTime;
run;
** will report the most extreme values of ReadTime as
;
Turns out I just had to include the ids in lib.POST_201505