How to do weighting in regression in SAS?

I've set up a table with age and average spending by age. Age is my dependent variable. In my dataset I have a lot of members at age 21, so I need to put more weight on that age when I run the regression in SAS. I'm new to SAS: I have used the regression button, but I have not written code. Is there another built-in button for weighting? Or how would you do this?
Age   Ave Spending   Total Members
20    $100           35
21    $80            85
22    $75            20

You didn't specify which SAS product you use, but if you use SAS Enterprise Guide, the "Tasks > Regression > Linear Regression" menu gives a "relative weight" option where you can specify Total Members.
If you want to do this programmatically, here is a short example:
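DATA regdata;
/* List input: values on each line are separated by blanks */
INPUT Age Ave_spending total_members;
DATALINES;
20 100 35
21 80 85
22 75 20
;
RUN;
PROC REG DATA=regdata;
/* Each row counts in proportion to its number of members */
WEIGHT total_members;
MODEL Age = Ave_spending;
RUN;
QUIT;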
The "Relative weight" option translates into the WEIGHT statement in the code above.
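A minimal sketch to see what the weight changes, assuming the regdata data set above: fit the model once with WEIGHT and once with FREQ, which treats each row as if it were repeated total_members times. The parameter estimates should be the same in both fits; the standard errors and error degrees of freedom differ, because FREQ raises the observation count to the sum of the frequencies while WEIGHT keeps it at three rows. The data set names est_weight, est_freq and compare are illustrative.

```sas
/* Same data as above */
data regdata;
  input Age Ave_spending total_members;
  datalines;
20 100 35
21 80 85
22 75 20
;
run;

/* Weighted least squares: rows count in proportion to total_members */
proc reg data=regdata outest=est_weight;
  weight total_members;
  model Age = Ave_spending;
run;
quit;

/* Frequency version: each row stands for total_members identical observations */
proc reg data=regdata outest=est_freq;
  freq total_members;
  model Age = Ave_spending;
run;
quit;

/* Put the two sets of coefficients side by side */
data compare;
  set est_weight(in=w) est_freq;
  length method $6;
  if w then method = 'WEIGHT';
  else method = 'FREQ';
  keep method Intercept Ave_spending;
run;

proc print data=compare noobs;
run;
```

Which statement fits depends on whether each row is one data point with a precision weight (WEIGHT) or a summary standing in for total_members individual members (FREQ). Neither one recovers the member-level variation that was lost when spending was averaged by age.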

Related

Program for determining if aspirin is significantly better than placebo

I have been tasked with the following problem:
Out of a total of 1,000 subjects on aspirin, 80 had heart attacks and 65 had strokes. Out of a total of 2,000 subjects on placebo, 240 had heart attacks and 165 had strokes.
I am asked whether there is a significant benefit of aspirin therapy for heart attacks and strokes. What is the relative risk (RR) of aspirin use for each of the two outcomes?
My main issue has been setting up the data lines. Here is what I have so far, but my output window doesn't look right.
Another issue is figuring out how to account for the varying sample sizes and the fact that someone might have had a heart attack AND a stroke.
DATA ODDS;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
Stroke 1-Yes 65
Stroke 2-No 165
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
Edit 1:
DATA ODDS;
INPUT OUTCOME :$13. EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=ODDS;
TITLE "Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH;
WEIGHT COUNT;
RUN;
This is what your data and code should look like. You may need to flip the order in the TABLES statement so that the relative risk is calculated appropriately for your situation. I didn't check whether that was the case, as you can easily change it if required.
DATA HeartAttack;
INPUT OUTCOME :$13. EXPOSURE $ COUNT;
DATALINES;
HeartAttack 1-Yes 80
HeartAttack 2-No 240
NoHeartAttack 1-Yes 920
NoHeartAttack 2-No 1760
;
PROC FREQ DATA=HeartAttack;
TITLE "Heart Attack Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
DATA Stroke;
INPUT OUTCOME $ EXPOSURE $ COUNT;
DATALINES;
Stroke 1-Yes 65
Stroke 2-No 165
NoStroke 1-Yes 935
NoStroke 2-No 1835
;
PROC FREQ DATA=Stroke;
TITLE "Stroke Odds Ratio Aspirin";
TABLES EXPOSURE*OUTCOME / CHISQ CMH Relrisk;
WEIGHT COUNT;
RUN;
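As a cross-check on the RELRISK output, here is a sketch that computes each relative risk and a 95% Wald confidence interval on the log scale directly from the counts in the question. The data set and variable names (rr_input, events_aspirin and so on) are made up for this example. PROC FREQ orders character levels alphabetically by default, so for heart attacks the event is column 1 of the table, while for strokes NoStroke sorts before Stroke and the event is column 2; compare these numbers with the matching column of the RELRISK table.

```sas
/* Counts from the question: events and group sizes per arm */
data rr_input;
  length outcome $11;
  input outcome $ events_aspirin n_aspirin events_placebo n_placebo;
  datalines;
HeartAttack 80 1000 240 2000
Stroke 65 1000 165 2000
;
run;

/* Risk per arm, relative risk (aspirin vs placebo), 95% CI on the log scale */
data rr_results;
  set rr_input;
  risk_aspirin = events_aspirin / n_aspirin;
  risk_placebo = events_placebo / n_placebo;
  rr = risk_aspirin / risk_placebo;
  /* Standard error of log(RR) for two independent binomial samples */
  se_log_rr = sqrt(1/events_aspirin - 1/n_aspirin + 1/events_placebo - 1/n_placebo);
  z = quantile('NORMAL', 0.975);
  rr_lower = exp(log(rr) - z*se_log_rr);
  rr_upper = exp(log(rr) + z*se_log_rr);
run;

proc print data=rr_results noobs;
  var outcome risk_aspirin risk_placebo rr rr_lower rr_upper;
  format risk_aspirin risk_placebo rr rr_lower rr_upper 6.3;
run;
```

An interval that excludes 1 suggests a difference between the arms at the 5% level.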
If you have the raw (subject-level) data, I would recommend working with that instead of typing counts into DATALINES in the first place. Typing counts leaves room for errors, and with the raw data you could also deal with the interaction.
Placebo gives you your expected distribution. We have to handle strokes and heart attacks separately because there is no data on the interaction. (If there is no interaction, we'd expect a small number of patients with both; but there could be a negative or masking interaction, e.g. if heart attacks are fatal, or a cumulative interaction, e.g. heart attacks often preceded by strokes or vice versa.) We can't answer any of those questions with these data.
Once you've got your expected counts, it's simply two chi-squared tests with two bins each (event and no event), not one chi-squared test with four bins.
(I'll put in a plug for my book Basic Algorithms if you want to code the chi-squared significance test from the ground up, without using any look-up tables.)
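The two-bin approach can be sketched with the TESTP= option of the TABLES statement in PROC FREQ, which tests a one-way table against specified proportions. Here the aspirin arm is tested against the placebo rates (240/2000 = 12% heart attacks, 165/2000 = 8.25% strokes). The data set names are hypothetical, and the TESTP= percentages must be listed in the order PROC FREQ displays the levels (alphabetical by default).

```sas
/* Aspirin arm only: observed heart attacks and non-events */
data aspirin_ha;
  length outcome $13;
  input outcome $ count;
  datalines;
HeartAttack 80
NoHeartAttack 920
;
run;

/* Expected from placebo: 12% heart attacks. Level order: HeartAttack, NoHeartAttack */
proc freq data=aspirin_ha;
  tables outcome / chisq testp=(12 88);
  weight count;
run;

/* Aspirin arm only: observed strokes and non-events */
data aspirin_stroke;
  length outcome $8;
  input outcome $ count;
  datalines;
NoStroke 935
Stroke 65
;
run;

/* Expected from placebo: 8.25% strokes. Level order: NoStroke, Stroke */
proc freq data=aspirin_stroke;
  tables outcome / chisq testp=(91.75 8.25);
  weight count;
run;
```

Keep in mind that this treats the placebo rates as fixed, known values and ignores their own sampling error; the 2x2 CHISQ/RELRISK analysis in the other answer compares the two samples directly.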

Values in column to reverse order

I need help converting the data values in a column to reverse order, either into a new column or in the same column. I mean the first value in the column should become the last value, and vice versa.
Example:
name age
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
The output I need should look like this:
name age
harry 47
samuel 60
robert 34
jim 29
lowry 56
karl 40
I need the values of both age and name in reverse order, or of only one variable.
First create a variable of the observation number:
data temp;
set have;
ObsNum = _n_;
run;
Then use that variable to sort the dataset:
proc sort data=temp out=want (drop=ObsNum);
by descending ObsNum;
run;
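An alternative sketch that skips the helper variable and the sort, assuming the input data set is named HAVE as above: read the rows from last to first with the POINT= and NOBS= options of the SET statement. The second step covers the "only one variable" case by pairing each row with the AGE value from the mirror-image row; want_all, want_age and age_reversed are illustrative names.

```sas
/* Example data from the question */
data have;
  length name $8;
  input name $ age;
  datalines;
karl 40
lowry 56
jim 29
robert 34
samuel 60
harry 47
;
run;

/* Reverse whole rows: read from the last observation to the first */
data want_all;
  do i = nobs to 1 by -1;
    set have point=i nobs=nobs;
    output;
  end;
  stop; /* POINT= never reaches end of file, so STOP is required */
run;

/* Reverse only AGE: row k receives the AGE value of row nobs-k+1 */
data want_age;
  set have;
  p = nobs - _n_ + 1;
  set have(keep=age rename=(age=age_reversed)) point=p nobs=nobs;
run;
```

POINT= and NOBS= variables are not written to the output data set, so i, p and nobs do not appear in the results.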

Applying cutoff to data set with IDs

I am using SAS and managed to run proc logistic, which gives me a table like so.
Classification Table

Prob    Correct             Incorrect           Percentages
Level   Event   Non-Event   Event   Non-Event   Correct   Sensitivity   Specificity   False POS   False NEG   J
0       33      0           328     0           9.1       100           0             90.9        .           99
0.02    33      62          266     0           26.3      100           18.9          89          0           117.9
0.04    31      162         166     2           53.5      93.9          49.4          84.3        1.2         142.3
0.06    26      209         119     7           65.1      78.8          63.7          82.1        3.2         141.5
How do I include IDs for the rows of data in lib.POST_201505_PRED below that have at least 0.6 probability?
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
model BUYER =
age
tenure
usage
payment
loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505 out=lib.POST_201505_PRED outroc=lib.POST_201505_ROC;
run;
I've been reading the documentation and searching online but haven't found anything on it. I must be searching for the wrong keywords, as I presume this is a frequently used process.
You just need an ID statement to tell SAS which variable identifies your observations:
proc logistic data=lib.POST_201503 outmodel=lib.POST_201503_MODEL descending;
id ID;
model BUYER = age tenure usage payment loyalty_card
/outroc=lib.POST_201503_ROC;
Score data=lib.POST_201505
out=lib.POST_201505_PRED
outroc=lib.POST_201505_ROC;
run;
Now your output contains all you need.
For instance, to print the IDs that were assigned a probability of at least 0.6 of being a BUYER:
proc print data=lib.POST_201505_PRED (where=(P_1 GE 0.6));
var ID P_1;
run;
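If you want to keep the selected IDs as a data set (or flag every scored row) rather than just print them, a DATA step with the same 0.6 cutoff works. To keep the sketch self-contained, work.scored below is a hypothetical stand-in for lib.POST_201505_PRED with made-up ID and P_1 values; point the SET statements at your scored data set instead.

```sas
/* Hypothetical stand-in for lib.POST_201505_PRED (made-up values) */
data work.scored;
  input ID P_1;
  datalines;
101 0.72
102 0.15
103 0.65
104 0.41
;
run;

/* Keep only the IDs at or above the cutoff */
data work.buyers_06;
  set work.scored(keep=ID P_1);
  where P_1 >= 0.6;
run;

/* Or keep every row and add a 0/1 flag for the cutoff */
data work.scored_flagged;
  set work.scored;
  predicted_buyer = (P_1 >= 0.6); /* 1 if P_1 >= 0.6, otherwise 0 */
run;
```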
You will find these id yourKey; statements throughout the statistical procedures in SAS, for instance:
proc univariate data=psydata.stroop;
id Subject;
var ReadTime;
run;
* The Extreme Observations table then lists the most extreme values of ReadTime along with the Subject of each observation;
It turns out I just had to include the IDs in lib.POST_201505.

Check if a column exists and then sum in SAS

This is my input dataset:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03 Col_04 Col_bb
NYC 10 0 44 55 66 34 44
CHG 90 55 4 33 22 34 23
TAR 10 8 0 25 65 88 22
I need to calculate the % of Col_A0 for a specific reference.
For example % col_A0 would be calculated as
10/(10+0+44+55+66+34+44) = 10/253 = 0.0395, i.e. 3.95%
So my output should be
Ref %Col_A0 %Rest
NYC 3.95% 96.05%
CHG 34.48% 65.52%
TAR 4.59% 95.41%
I can do this part, but the issue is that the columns vary.
Col_A0 and Ref are fixed columns, so they will be in the input every time. But the other columns won't always be there, and there can also be additional columns such as Col_10, Col_11 up to Col_30, and Col_cc up to Col_zz.
For example the input data set in some scenarios can be just:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03
NYC 10 0 44 55 66
CHG 90 55 4 33 22
TAR 10 8 0 25 65
So is there a way to write SAS code that checks whether a column exists? Or is there a better way to do it?
This is my current SAS code written in Enterprise Guide.
PROC SQL;
CREATE TABLE output123 AS
select
ref,
(col_A0/(Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb)) FORMAT=PERCENT8.2 AS PERCNT_ColA0,
(1-(col_A0/(Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb))) FORMAT=PERCENT8.2 AS PERCNT_Rest
From Input123;
quit;
In scenarios where some of these columns are missing, I get an error, and if there are additional columns, I miss those. Please advise.
Thanks
I would not use SQL; I would use a regular DATA step:
data want;
set have;
a0_prop = col_a0/sum(of _numeric_);
run;
If you wanted to do this in SQL, the easiest way is to keep (or transform) the dataset in vertical format, i.e. one row per Ref and variable instead of one column per variable. Then you don't need to know how many variables there are.
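A sketch of that vertical approach, assuming the variable names from the question: PROC TRANSPOSE turns every column whose name starts with col_ into one row per Ref, and PROC SQL then sums within each Ref. The names long, varname and value are illustrative. Because nothing refers to Col_01, Col_bb and so on by name, the same code works when columns are missing or new ones appear.

```sas
/* Example input from the question */
data have;
  input Ref $ Col_A0 Col_01 Col_02 Col_aa Col_03 Col_04 Col_bb;
  datalines;
NYC 10 0 44 55 66 34 44
CHG 90 55 4 33 22 34 23
TAR 10 8 0 25 65 88 22
;
run;

/* Long layout: one row per Ref and column; col_: matches every col_ variable */
proc transpose data=have out=long(rename=(_name_=varname col1=value));
  by Ref notsorted;
  var col_:;
run;

/* Share of Col_A0 in the total of all col_ columns, per Ref */
proc sql;
  create table want as
  select Ref,
         sum(case when upcase(varname) = 'COL_A0' then value else 0 end)
           / sum(value) as pct_col_a0 format=percent8.2,
         1 - calculated pct_col_a0 as pct_rest format=percent8.2
  from long
  group by Ref;
quit;
```

GROUP BY returns the rows sorted by Ref, so add an ORDER BY or a later sort if you need the original order.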
If you always want to sum all the numeric columns, then just use:
col_A0 / sum(of _numeric_)
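If the data set can also contain numeric variables that should not be part of the denominator, a variable name prefix list is a safer sketch than _numeric_: col_: expands to every variable whose name starts with col_, so it adapts to whichever columns are present. This uses the smaller example input from the question; pct_col_a0 and pct_rest are illustrative names chosen so they do not themselves start with col_.

```sas
/* Smaller example input from the question */
data have;
  input Ref $ Col_A0 Col_01 Col_02 Col_aa Col_03;
  datalines;
NYC 10 0 44 55 66
CHG 90 55 4 33 22
TAR 10 8 0 25 65
;
run;

data want;
  set have;
  /* col_: matches every variable whose name starts with col_ (case-insensitive) */
  total = sum(of col_:);
  pct_col_a0 = col_a0 / total;
  pct_rest = 1 - pct_col_a0;
  format pct_col_a0 pct_rest percent8.2;
  drop total;
run;
```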

How to use proc transpose on variables that contain numbers separated with _?

Hi, I am new to SAS and have a question about PROC TRANSPOSE.
I have this data
Input
School Name State School Code 26/07/2009 02/08/2009 09/08/2009 16/08/2009
Northwest High IL 14556 06 06 06 06
Georgia High GA 147 05 05 05 06
Macy Hgh TX 45456 NA NA NA NA
The desired output is
School Name State School Code Date Absent
Northwest High IL 14556 26/07/2009 6
Northwest High IL 14556 02/08/2009 6
Northwest High IL 14556 09/08/2009 6
Northwest High IL 14556 16/08/2009 6
Georgia High GA 147 26/07/2009 5
Georgia High GA 147 02/08/2009 5
Georgia High GA 147 09/08/2009 5
Georgia High GA 147 16/08/2009 6
Macy Hgh TX 45456 26/07/2009 NA
Macy Hgh TX 45456 02/08/2009 NA
Macy Hgh TX 45456 09/08/2009 NA
Macy Hgh TX 45456 16/08/2009 NA
This is the code I have written
proc sort data=work.input;
by School_Name State School_Code;
run;
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
run;
I get a message saying "No variables to transpose", so I don't get the desired output. I think the issue is the variable names: the dates are actual variables, and when imported into SAS they become names like _26_07_2009, which SAS does not seem to recognize. Note there are about 185 dates, each of them a separate variable.
Thanks
The following transpose does the job:
proc transpose data=work.input out=work.inputModified;
by School_Name State School_Code;
var _:;
run;
Notice the _: notation: it picks up all variables whose names start with an underscore and transposes them.
As I mentioned in the link in my comments earlier, if you do not explicitly specify the variables you want to transpose, PROC TRANSPOSE by default transposes the numeric variables that are not in the BY list. However, since your date variables are read in as character (because of the NA values), there were no numeric variables to transpose, hence NOTE: No variables to transpose.
You can use the following to convert the date and absent columns into numeric columns.
data inputModified2;
set inputModified;
format date date9.;
date = input(compress(tranwrd(_name_,'_','')), ddmmyy8.);
if col1 NE 'NA' then absent = input(col1, 8.);
else absent=.;
drop _name_ col1;
run;
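For a self-contained run of the steps above, the sketch below builds a small stand-in for the imported data (only two of the date columns, stored as character because of the NA values) and converts the result. It uses two small variations on the answer's code: COMPRESS(_name_, '_') removes the underscores in one call, and the ?? modifier on INPUT turns NA into a missing value without writing invalid-data notes to the log. The data set names long and long2 are illustrative.

```sas
/* Hypothetical stand-in for the imported data: date columns are character */
data input;
  length School_Name $20 State $2 School_Code 8 _26_07_2009 _02_08_2009 $2;
  infile datalines dlm=',';
  input School_Name $ State $ School_Code _26_07_2009 $ _02_08_2009 $;
  datalines;
Northwest High,IL,14556,06,06
Georgia High,GA,147,05,05
Macy Hgh,TX,45456,NA,NA
;
run;

proc sort data=input;
  by School_Name State School_Code;
run;

/* _: selects every variable whose name starts with an underscore */
proc transpose data=input out=long;
  by School_Name State School_Code;
  var _:;
run;

data long2;
  set long;
  format date date9.;
  /* _26_07_2009 -> 26072009 -> SAS date value */
  date = input(compress(_name_, '_'), ddmmyy8.);
  /* ?? suppresses the invalid-data note, so NA becomes missing quietly */
  absent = input(col1, ?? 8.);
  drop _name_ col1;
run;
```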