here I am using proc freq
Proc freq data=external_raw;
table Marital_status;
run;
this the table the shows up:
Marital_status
Frequency
Percent
Cumulative Frequency
Cumulative Percent
1
15851
8.38
15851
8.38
2
122370
64.68
138221
73.06
3
2645
1.40
140866
74.45
4
10216
5.40
151082
79.85
5
32141
16.99
183223
96.84
9
5975
3.16
189198
100.00
I want to change 1="Single", 2= "Married", 3= "Separated", 4= "Divorced", 5= "Widowed" and 9= "Unknown".
Screenshot of above table
Format it with proc format.
proc format;
value status
1 = 'Single'
2 = 'Married'
3 = 'Separated'
4 = 'Divored'
5 = 'Widowed'
9 = 'Unknown'
;
run;
Then apply the format:
proc freq data=have;
format marital_status status.;
table marital_status;
run;
Related
I tried some solutions already here and I am still unable to get a desired output.
The data I have is given below (ID is unique):
data have;
input id code_1 code_2 code_3 code_4 randa randb randc$;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;;;;;
run
I need to get the frequency of only the presence of various codes. (code1, code2 etc..)
The desired output:
Variable Frequency
code_1 3
code_2 1
code_3 2
code_4 3
I tried the solution in this and the code is given below:
ods output onewayfreqs=preds;
proc freq data=have;
tables _all_;
run;
ods output close;
proc tabulate data=preds;
class table frequency;
tables table,frequency;
run;
Output:
Frequenza
1 2 3
N N N
Table 1 . 1
Tabella code_1
Tabella code_2 1 . 1
Tabella code_3 . 2 .
Tabella code_4 1 . 1
Tabella id 4 . .
Tabella randa 4 . .
Tabella randb 4 . .
Tabella randc 4 . .
Also I tried as the code below:
proc freq data=have order=freq;
array codes code_:;
do _n_ = 1 to dim(codes);
table codes(_n_)/list missing out=var1_freq;
end;
run;
But I donot know how to write the code properly.
I am getting output for the code below (only for one code at a time):
proc freq data=have order=freq ;
tables code_1/list missing out=var1_freq;
run;
But how to get for multiple codes? Many thanks for your help..!
The out= option for the tables statement will only produce output for the last variable listed, so you won't get all 4 codes.
You can count the 1 valued code_* variables after transposition.
data have;
input id code_1 code_2 code_3 code_4 randa randb randc $ ;
datalines;
19736 1 0 1 0 5.5 10 11
19737 0 0 0 1 2 4.8 19
19738 1 0 1 1 6 9 2.6
19739 1 1 0 1 1.6 7 8.5
;
data idcodes / view=idcodes;
set have;
array codes code_1-code_4;
do _n_ = 1 to dim (codes);
variable = vname(codes(_n_));
flag = codes(_n_);
output;
end;
keep id variable flag;
run;
proc freq data=idcodes;
where flag;
table variable / out=freqs(keep=variable count);
run;
Presuming codes are only 0/1, you could also sum the codes and transpose the result.
proc means noprint data=have;
var code_:;
output out=flagsum sum=;
run;
proc transpose data=flagsum out=want(rename=(_name_=variable col1=frequency));
var code_:;
run;
I have a situation where I would like to put the value of a variable in the label in SAS.
Example: Median for Total_Days is 2. I would like to put this value in Days_Median_Split label. The median keeps on changing with varying data, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Sample Dataset
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into: cnt
from have;quit;
* step 4 calulate median;
%let median = %sysevalf(&cnt/2 + .5);
*step 5 get median obsevation;
proc sql noprint;
select Total_days into: medianValue
from have
where monotonic()=&median;quit;
*step 6 create label;
data have;
set have;
label Days_Median_split = 'Days_Median_split: Number of Days with Median '
%trim(&medianValue);
run;
I have two models that have output:
output out=m1 pred=p1;
output out=m2 pred=p2;
They run fine and create the below sample observations:
Sample of M2
Obs p1
1 0.98057
2 0.71486
3 0.91951
4 0.93073
5 0.93505
6 0.98788
7 0.94461
8 0.99449
9 0.93282
10 0.88654
AND
Sample of M1
Obs p2
1 0.97988
2 0.70704
3 0.91731
4 0.92880
5 0.93324
6 0.98746
7 0.94386
8 0.99431
9 0.93102
10 0.88404
Next I try to combine the two into a table with the statement:
proc sql;
create table ptable as select *
from m1 as a left join m2 as b on a.cnt=b.cnt;
quit;
But I'm getting this error:
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name A.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
ERROR: Column cnt could not be found in the table/view identified with the correlation name B.
So how do I put p1 and p2 into a table in SAS?
Below is the code used to generate p1 and p2 where DATA = source
/*initial model*/
proc hplogistic data=DATA;
model value(event='1')=X1-X20;
output out=m1 pred=p1;
run;
/*new model*/
proc hplogistic data=data;
model value(event='1')=X1-X20;
output out=m2 pred=p2;
run;
This is a case for performing a merge without a by statement. The tables will be joined row-by-row.
For example
data have1(keep=p1) have2(keep=p2);
do _n_ = 1 to 10;
p1 = 1 / _n_;
p2 = _n_ ** 2;
output;
end;
run;
data want;
merge have1 have2;
*** NO BY STATEMENT ***;
run;
proc print data=want;run;
-------- OUTPUT --------
Obs p1 p2
1 1.00000 1
2 0.50000 4
3 0.33333 9
4 0.25000 16
5 0.20000 25
6 0.16667 36
7 0.14286 49
8 0.12500 64
9 0.11111 81
10 0.10000 100
Merges without BY is atypical in the data processing that I normally do.
Here is the answer that ran the whole way through:
/*initial model*/
proc hplogistic data=DATA;
model value(event='1')=X1-X20;
output out=m1 pred=p1 copyvars=(cnt readmit);
run;
/*new model*/
proc hplogistic data=data;
model value(event='1')=X1-X20;
output out=m2 pred=p2 copyvar=cnt;
run;
cnt had to be included in the output to be joined in the table.
I have a dataset of patient data with each diagnosis on a different line.
This is an example of what it looks like:
patientID diabetes cancer age gender
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
I need to isolate the patients who have a diagnosis of both diabetes and cancer; their unique patient identifier is patientID. Sometimes they are both on the same line, sometimes they aren't. I am not sure how to do this because the information is on multiple lines.
How would I go about doing this?
This is what I have so far:
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(DOB) as DOB
from diab_dx
group by patientID;
quit;
data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;
run;
proc freq data=final;
tables both;
run;
Is this correct?
If you want to learn about data steps lookup how this works.
data pat;
input patientID diabetes cancer age gender:$1.;
cards;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;;;;
run;
data both;
do until(last.patientid);
set pat; by patientid;
_diabetes = max(diabetes,_diabetes);
_cancer = max(cancer,_cancer);
end;
both = _diabetes and _cancer;
run;
proc print;
run;
add a having statement at the end of sql query should do.
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(age) as DOB
from PAT
group by patientID
having calculated diabetes ge 1 and calculated cancer ge 1;
quit;
You might find some coders, especially those coming from statistical backgrounds, are more likely to use Proc MEANS instead of SQL or DATA step to compute the diagnostic flag maximums.
proc means noprint data=have;
by patientID;
output out=want
max(diabetes) = diabetes
max(cancer) = cancer
min(age) = age
;
run;
or for the case of all the same aggregation function
proc means noprint data=have;
by patientID;
var diabetes cancer;
output out=want max= ;
run;
or
proc means noprint data=have;
by patientID;
var diabetes cancer age;
output out=want max= / autoname;
run;
I am using a proc means to calculate the share of the payments made by business line, the data looks like this:
data Test;
input ID Business_Line Payment2017;
Datalines;
1 1 1000
2 1 2000
3 1 3000
4 1 4000
5 2 500
6 2 1500
7 2 3000
;
run;
i'm looking to calculate an additional column which, by group (business_line) calculates the percentage share (weight) of the payment as such:
data Test;
input ID Business_Line Payment2017 share;
Datalines;
1 1 1000 0.1
2 1 2000 0.2
3 1 3000 0.3
4 1 4000 0.4
5 2 500 0.1
6 2 1500 0.3
7 2 3000 0.6
;
run;
the code I have used so far:
proc means data = test noprint;
class ID;
by business_line;
var Payment2017;
output out=test2
sum = share;
weight = payment2017/share;
run;
I have also tried
proc means data = test noprint;
class ID;
by business_line;
var Payment2017 /weight = payment2017;
output out=test3 ;
run;
appreciate the help.
Proc FREQ will compute percentages. You can divide the PERCENT column of the output to get the fraction, or work with percents downstream.
In this example id crosses payment2017 in order to ensure all original rows are part of the output. If the id was not present, and there were any replicate payment amounts, FREQ would aggregate the payment amounts.
proc freq data=have noprint;
by business_line;
table id*payment2017 / out=want all;
weight payment2017 ;
run;
It is convenient to do with proc sql:
proc sql;
select *, payment2017/sum(payment2017) as share from test group by business_line;
quit;
data step:
data have;
do until (last.business_line);
set test;
by business_line notsorted;
total+payment2017;
end;
do until (last.business_line);
set test;
by business_line notsorted;
share=payment2017/total;
output;
end;
call missing(total);
drop total;
run;