SAS proc tabulate questions - sas

My dataset looks like this:
ID Var1 Var2 Var3
A 1 0 1
B 0 0 1
B 1 1 0
A 0 0 0
A 1 1 1
My expected output will be:
ID Var1 Var2 Var3
A 2 1 3
B 1 1 1
Can someone help with this?

I've not access to SAS right now but try the following:-
proc tabulate data = in;
class id;
var var:;
table id, sum=''*(var1 var2 var3);
run;

Related

SAS: apply statement over multiple columns

I have a dataset with one id column and three variables:
data have;
input id var1 var2 var3;
datalines;
1 0 1 0
2 1 1 0
3 0 0 2
4 0 4 1
;
run;
I want to use some osrt or data or proc sql step over var1 to var3 to keep as 0 if it is 0, and 1 if it is greater than 0. It should ideally use an array var1 -- var3 as the actual dataset has many more variables.
Try this
data have;
input id var1 var2 var3;
datalines;
1 0 1 0
2 1 1 0
3 0 0 2
4 0 4 1
;
run;
data want;
set have;
array v var1 -- var3;
do over v;
v = v > 0;
end;
run;

In SAS: How to flag unique combinations of a set of variable values

In SAS, how can I create an identifier for each unique combination of a set of variables?
I have, for example, a several thousand observations with a dichotomous value for six variables. There are 2^6 unique combinations for the values of these variables for each observation. I would like to create an identifier for each unique combination, and eventually group my observations according to this value.
Have:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6
---------------------------------------------------------------
ID1 1 1 1 1 1 1
ID2 1 0 1 1 1 1
ID3 0 1 1 1 1 1
ID4 0 0 1 1 1 0
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0
Want:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier
------------------------------------------------------------------------------
ID1 1 1 1 1 1 1 A
ID2 1 1 1 1 1 1 A
ID3 0 1 1 1 1 1 B
ID4 0 0 1 1 1 0 C
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0 Z
A would represent 1, 1, 1, 1, 1, 1 as a unique combination and B would represent 0, 1, 1, 1, 1, 1 etc.
I have thought about creating a dummy variable based on 64 Var1-Var6 conditional statements. I've also thought about concatenating the values from Var1-Var6 into a new row to create a unique identifier.
Is there a more straightforward way of going about this?
I prefer an approach that assigns a specific identifier to a specific combination of the values, rather than one that just generates some arbitrary unique string whenever a new combination comes up.
Proc summary works well with the LEVELS option. This technique works for any values of the group variables numeric or character.
data have;
input (v1-v6)(1.);
cards;
111111
111110
111101
111011
110111
;;;;
proc print;
proc summary data=have nway;
class v1-v6;
output out=unique(drop=_type_) / levels;
run;
Why not just concatenate the values?
So your combinations are:
111111
111110
111101
111011
110111
....
You can use PROC FREQ to check the number of each type.
proc freq data=have;
table var1*var2*var3*var4*var5*var6 / out=want list;
run;
By using the unique values of the given variables' combinations and then creating an alphabetical List of Ids, you can create the result
data inp;
length combined $6.;
input subjectid $4. v1 1. v2 1. v3 1. v4 1. v5 1. v6 1.;
combined=compress(v1||v2||v3||v4||v5||v6);
datalines;
ID1 111111
ID2 011111
ID3 001111
ID4 111110
ID5 000111
ID6 111111
ID7 000111
;
run;
proc sql;
create table uniq
as
select distinct combined from inp order by combined desc;
quit;
data uniq1;
set uniq;
retain alphabet 65;
Id=byte(alphabet) ;
alphabet+1;
drop alphabet;
run;
proc sql;
create table final_ds
as
select subjectid, v1, v2, v3, v4, v5, v6, Id
from inp a
left join uniq1 b
on a.combined=b.combined;
quit;
Assuming the data is sorted by your grouping variables then just use BY group processing.
data want;
set have;
by var1-var6 ;
groupid + first.var6 ;
run;
Or you could just convert the 6 binary variables into a single unique value.
group2 = input(cats(of var1-var6),binary6.);
This has the added value of not requiring that you sort the data, but it does need for none of the grouping variables to be missing.
Result
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier Want groupno group2
ID4 0 0 1 1 1 0 C 1 14
ID3 0 1 1 1 1 1 B 2 31
ID1 1 1 1 1 1 1 A 3 63
ID2 1 1 1 1 1 1 A 3 63

SAS : proc freq on 3 categorical variables

I have data with
One binary variable, poor
Two socio-demographic variables var1 and var2
I would like to have the poverty rate of each of my var1 * var2 possible value, that would look like that :
But with three variables in a proc freq, I get multiple outputs, one for each value of the first variable I put on my product
proc freq data=test;
table var1*var2*poor;
run;
How can I get something close to what I would like ?
Try this
data test;
input var1 var2 poor;
cards;
1 1 1
2 3 0
3 2 1
4 1 1
1 2 1
2 3 0
4 1 0
4 2 0
3 1 1
1 2 0
3 2 0
1 3 1
3 3 0
3 3 0
3 3 1
1 1 0
2 2 0
2 2 1
2 2 1
2 1 1
2 1 1
2 1 1
;
run;
proc tabulate data=test;
class var1 var2 poor;
tables var1,
var2*poor*pctn<poor>={label="%"};
run;

Create different decimal format in SAS proc report and ODS

I have a sample SAS code below. And I want to know how to proc report the percentage using two decimal formats. For example, I want to put 100% as just zero decimal, and all the other percentages which is not 100% with 1 decimal such like 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need to use a picture format:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
picture pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
quit;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The -1 <-< 1 means anything that is between -1 and 1, exclusive. The [PERCENT7.0] is telling it to use that format for that section.
Try using this format:
define mean/ '(%)' format=percent7.1

class in proc tabulate not working

Hi my dataset looks something like this:
Var1 Var2 mainvar
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
I want to tabulate Var1 and Var2 based on the value of mainvar (which ranges from 1 to 5) so I tried:
%let class=Var1 Var2
proc tabulate data=x noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
But this is giving me the table without the data being factored by values of mainvar. Any help? Thanks!
In general, I think it's best to create a reproducible example. The following works fine for me:
data example ;
input var1 var2 mainvar ;
cards;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;
%let class=Var1 Var2 ;
proc tabulate data=example noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;