I have some sample SAS code below, and I want to know how to display percentages in PROC REPORT using two different decimal formats. For example, I want 100% shown with no decimals, and every other percentage shown with one decimal, such as 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need a custom format that delegates to built-in formats by range:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
/* bracketed existing formats belong in a VALUE statement, not PICTURE */
value pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
run;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The range -1<-<1 covers everything strictly between -1 and 1. A format name in square brackets, such as [PERCENT7.0], tells PROC FORMAT to apply that existing format to the values in that range.
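As a quick check of the range logic, a sketch like the following writes a few values to the log through a nested format. The format name pct100v and the test values are made up for illustration, and it uses a VALUE statement on the assumption that this is where bracketed existing formats are supported.

```sas
/* Hypothetical format name: exactly 1 gets no decimals, everything else one decimal */
proc format;
  value pct100v
    1     = [percent7.0]
    other = [percent7.1];
run;

/* Write each test value and its formatted text to the log */
data _null_;
  length s $7;
  do x = 0.25, 0.5, 0.999, 1;
    s = put(x, pct100v.);
    put x= s=;
  end;
run;
```

In the report, a format like this would be attached with format=pct100v. on the DEFINE statement for mean. Values that are close to but not exactly 1 fall into the OTHER range, so round the value first if those should also display as 100%.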
Try using this format:
define mean/ '(%)' format=percent7.1;
Related
Suppose I have arbitrary diagnostic codes, such as 001, v58, ..., 142, ... How can I construct one indicator column per code that is 1 for the records where that code was found?
Input:
id found code
1 1 001
2 0 v58
3 1 v58
4 1 003
5 0 v58
......
......
15000 0 v58
Output:
id code_001 code_v58 code_003 .......
1 1 0 0
2 0 0 0
3 0 1 0
4 1 0 0
5 0 0 0
.........
.........
You will want to TRANSPOSE the values and name the pivoted columns according to the data (the value of code) with an ID statement.
In real-world data it is often the case that missing diagnoses should be flagged zero, and that has to be done in a subsequent step.
Example:
data have;
input id found code $;
datalines;
1 1 001
2 0 v58
2 1 003 /* second diagnosis result for patient 2 */
3 1 v58
4 1 003
5 0 v58
;
proc transpose data=have out=want(drop=_name_) prefix=code_;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
* missing occurs when an id was not diagnosed with a code;
* if that means the flag should be zero (for logistic modeling perhaps)
* the missings need to be changed to zeroes;
data want;
set want;
array codes code_:;
do _n_ = 1 to dim(codes); /* repurpose automatic variable _n_ for loop index */
if missing(codes(_n_)) then codes(_n_) = 0;
end;
run;
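The transpose step assumes each code appears at most once per id; PROC TRANSPOSE reports an error when an ID value repeats within a BY group. If the real data can repeat a code for a patient, a sketch like the following collapses the duplicates first, keeping the maximum flag. The dataset names have_dup, have_max, and want_max are made up for illustration.

```sas
/* Hypothetical data: patient 2 has code v58 twice */
data have_dup;
  input id found code $;
  datalines;
1 1 001
2 0 v58
2 1 v58
3 1 003
;

/* One row per id*code, keeping found=1 if any duplicate row had it */
proc summary data=have_dup nway;
  class id code;
  var found;
  output out=have_max(drop=_type_ _freq_) max=;
run;

/* Same transpose as above, now safe because id*code is unique */
proc transpose data=have_max out=want_max(drop=_name_) prefix=code_;
  by id;
  id code;
  var found;
run;
```

The zero-filling data step shown above can then be applied to want_max unchanged.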
I have data with:
One binary variable, poor
Two socio-demographic variables, var1 and var2
I would like the poverty rate for each possible combination of var1 and var2, in a single table.
But with three variables in a PROC FREQ table request, I get multiple outputs, one for each value of the first variable in the product:
proc freq data=test;
table var1*var2*poor;
run;
How can I get something close to what I would like?
Try this:
data test;
input var1 var2 poor;
cards;
1 1 1
2 3 0
3 2 1
4 1 1
1 2 1
2 3 0
4 1 0
4 2 0
3 1 1
1 2 0
3 2 0
1 3 1
3 3 0
3 3 0
3 3 1
1 1 0
2 2 0
2 2 1
2 2 1
2 1 1
2 1 1
2 1 1
;
run;
proc tabulate data=test;
class var1 var2 poor;
tables var1,
var2*poor*pctn<poor>={label="%"};
run;
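Since poor is coded 0/1, another way to get the rate is to treat it as an analysis variable and take its mean, which equals the share of observations with poor=1 in each cell. The following is a minimal sketch under that assumption; the dataset test_small and the label text are made up for illustration.

```sas
/* Hypothetical small dataset with a 0/1 poverty flag */
data test_small;
  input var1 var2 poor;
  datalines;
1 1 1
1 1 0
1 2 1
2 1 1
2 2 0
2 2 1
;

/* Mean of a 0/1 variable is the proportion of ones, shown here as a percent */
proc tabulate data=test_small;
  class var1 var2;
  var poor;
  table var1, var2*poor=' '*mean='Poverty rate'*f=percent8.1;
run;
```

This gives one rate per var1*var2 cell instead of separate columns for poor=0 and poor=1.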
I would like to create a table with three variables, where var2 is a percentage of var1 and var3 is a percentage of var2, broken down by class variables that have missing values.
To explain, imagine I have data showing who applied, was interviewed, and was hired for a job, e.g.
data job;
input applied interviewed hired;
datalines;
1 1 1
1 1 1
1 1 1
1 1 0
1 1 0
1 1 0
1 0 .
1 0 .
1 0 .
1 0 .
;
run;
It's very easy to create a table that shows the count of those who applied, then the percentage of those who were interviewed, and then, of those interviewed, the percentage who were hired:
proc tabulate data = job;
var applied interviewed hired;
tables applied * n (interviewed hired) * mean * f=percent6.;
run;
which gives:
applied interviewed hired
10 60% 50%
Now I would like to break that down by several class variables with missing values.
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
If I do one class variable at a time it will give me the correct percentages:
proc tabulate data = have format = 6.;
class sex;
var applied interviewed hired;
tables sex, applied * sum (interviewed hired) * mean * f=percent6.;
run;
Is there a way to do all three class variables in the table at once and get the right percentage for each category, so that the table looks like this:
applied interviewed hired
sex
0 4 75% 33%
1 4 50% 50%
degree
0 4 50% 50%
1 4 75% 33%
exp
0 5 60% 33%
1 4 75% 67%
This is something I must do many, many times and I need to populate tables in a report with the numbers, so I'm looking for a solution where the table can be printed all in one step.
How would you solve this problem?
The problem you're running into is that of missing data. When a case is missing a value for any class variable, it is eliminated from the entire table, unless you specify MISSING in the PROC statement. So, for example, your fourth sex=0 case, who did not interview, was missing EXP; that case didn't show up at all in the table, though you would want it counted under SEX.
You can get the correct numbers, mostly:
proc tabulate data = have format = 6. missing;
class sex degree exp;
var applied interviewed hired;
tables (sex degree exp), applied * sum (interviewed hired) * mean * f=percent6.;
run;
However, you have an extra row that includes those with missing data. You cannot eliminate those rows from the printed output while also including them in the other class calculations; this is just one of those limitations of SAS tabulation. Other PROCs have a similar problem; PROC FREQ is the only one that doesn't do this if you have multiple tables generated, but even then within one table (combined with asterisks) you will have the same issue.
The only way I've found around this is to output the table to a dataset and then filter out those rows, and PROC REPORT or PRINT or TABULATE the data back out.
I think this is close to what you want. You will have to fix the row labels, but it is one PROC TABULATE step.
title;
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
proc print;
run;
proc summary data=have missing ;
class sex degree exp;
ways 1;
output out=stats sum(applied)= mean(interviewed hired)= / levels;
run;
data stats2;
set stats;
if n(of sex degree exp) eq 0 then delete;
run;
proc print;
run;
proc tabulate data=stats2;
class _type_ / descend;
class _level_;
var applied interviewed hired;
tables (_type_*_level_),applied*sum='N'*f=8. (interviewed hired)*sum='Percent'*f=percent6.;
run;
/**/
/* applied interviewed hired*/
/*sex */
/* 0 4 75% 33%*/
/* 1 4 50% 50%*/
/*degree */
/* 0 4 50% 50%*/
/* 1 4 75% 33%*/
/*exp */
/* 0 5 60% 33%*/
/* 1 4 75% 67%*/
Hi my dataset looks something like this:
Var1 Var2 mainvar
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
I want to tabulate Var1 and Var2 based on the value of mainvar (which ranges from 1 to 5) so I tried:
%let class=Var1 Var2;
proc tabulate data=x noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
But this gives me the table without the data being broken down by the values of mainvar. Any help? Thanks!
In general, I think it's best to create a reproducible example. The following works fine for me:
data example ;
input var1 var2 mainvar ;
cards;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;
%let class=Var1 Var2 ;
proc tabulate data=example noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
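If the goal is counts of each Var1 and Var2 level within each value of mainvar, mainvar has to appear in the TABLE statement as well as in CLASS. The following sketch assumes that reading of the question and puts mainvar in the row dimension.

```sas
data example;
  input var1 var2 mainvar;
  datalines;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;

%let class = var1 var2;

/* Rows: one per mainvar value; columns: N for each level of var1 and var2 */
proc tabulate data=example noseps missing;
  class &class mainvar;
  table mainvar, &class;
run;
```

Swapping the two dimensions (table (&class), mainvar;) gives the same counts with mainvar across the top.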
How can I get PROC TABULATE to show the value of a variable, which may be missing, instead of a statistic? Thanks!
For example, I want to show the value of sym, which is either 'x' or missing. How can I do it?
Sample code:
data test;
input tx mod bm $ yr sym $;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
2 1 a 0 x
2 2 a 0 x
2 3 a 0 x
3 1 a 0
3 2 a 0
3 3 a 0 x
1 1 b 0 x
1 2 b 0
1 3 b 0
1 4 b 0
1 5 b 0
2 1 b 0
2 2 b 0
2 3 b 0
2 4 b 0
2 5 b 0
3 1 b 0 x
3 2 b 0
3 3 b 0
1 1 c 0
1 2 c 0 x
1 3 c 0
2 1 c 0
2 2 c 0
2 3 c 0
3 1 c 0
3 2 c 0
3 3 c 0
1 3 a 1 x
2 3 a 1
3 3 a 1
1 3 b 1
2 3 b 1
3 3 b 1
1 3 c 1 x
2 3 c 1
3 3 c 1
;
run;
proc tabulate data=test;
class yr bm tx mod ;
var sym;
table yr*bm, tx*mod;
run;
proc tabulate data=test;
class tx mod bm yr sym;
table yr*bm, tx*mod*sym*n;
run;
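That gives you a 1 for each cell where SYM='x', since N counts the observations in the cell. Because SYM is a class variable and MISSING is not specified, observations with a missing SYM are dropped, so some values disappear from the table compared with your example. (You could easily display the 1 as 'x' with a format that defines 1 = 'x'.)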
proc tabulate data=test;
class tx mod bm yr;
class sym /missing;
table yr*bm, tx*mod*sym=' '*n;
run;
That gives you all of your combinations of the 4 main variables, but includes missing syms as their own column.
If you want to have your cake and eat it too, then you need to redefine SYM to be a numeric variable, so you can use it as a VAR.
proc format;
invalue ISYM
'x'=1 /* read the character x as numeric 1 */
;
value FSYM
1='x';
run;
data test;
infile datalines truncover;
input tx mod bm $ yr sym :ISYM.;
format sym FSYM.;
datalines;
1 1 a 0 x
1 2 a 0 x
1 3 a 0 x
... more lines ...
;
run;
proc tabulate data=test;
class tx mod bm yr;
var sym;
table yr*bm, tx*mod*sym*sum*f=FSYM.;
run;
All of these assume these are unique combination rows. If you start having multiples of yr*bm*tx*mod, you would have a problem here as this wouldn't give you the expected result (sum 1+1+1=3 would not give you an 'x').
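If duplicate yr*bm*tx*mod rows are possible, one way around the summing problem is to request MAX instead of SUM, since the maximum of several 1s is still 1. This is a minimal sketch under that assumption; the names isymx, fsymx, and test_dup are made up for illustration.

```sas
proc format;
  invalue isymx 'x' = 1;   /* hypothetical informat: x is read as 1 */
  value   fsymx 1 = 'x';   /* hypothetical format: 1 displays as x  */
run;

/* Hypothetical data with a duplicated tx*mod*bm*yr combination */
data test_dup;
  infile datalines truncover;
  input tx mod bm $ yr sym :isymx.;
  datalines;
1 1 a 0 x
1 1 a 0 x
1 2 a 0
2 1 a 0 x
;

/* MAX stays at 1 however many duplicate rows carry an x */
proc tabulate data=test_dup;
  class tx mod bm yr;
  var sym;
  table yr*bm, tx*mod*sym=' '*max=' '*f=fsymx.;
run;
```

This treats a cell as 'x' if any of its rows has an 'x', which may or may not be the rule you want for duplicates.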