I have a dataset with one id column and three variables:
data have;
input id var1 var2 var3;
datalines;
1 0 1 0
2 1 1 0
3 0 0 2
4 0 4 1
;
run;
I want to use a sort, DATA step, or PROC SQL step over var1 to var3 that keeps each value as 0 if it is 0 and sets it to 1 if it is greater than 0. Ideally it should use an array over var1 -- var3, as the actual dataset has many more variables.
Try this
data have;
input id var1 var2 var3;
datalines;
1 0 1 0
2 1 1 0
3 0 0 2
4 0 4 1
;
run;
data want;
set have;
array v var1 -- var3;
do over v;
v = v > 0;
end;
run;
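DO OVER still works, but it is the older, implicitly subscripted array syntax, so the same recode could also be written with an explicit index. A minimal sketch, assuming the variables sit next to each other in the dataset so that var1 -- var3 picks them all up; the counter i is an illustrative name, and missing values come out as 0 because a missing value compares lower than any number:

```sas
data want;
  set have;
  array v{*} var1 -- var3;   /* every variable positioned from var1 through var3 */
  do i = 1 to dim(v);
    v{i} = (v{i} > 0);       /* a comparison yields 1 when true, 0 otherwise */
  end;
  drop i;                    /* i is only a loop counter */
run;
```

If missing values should stay missing, the assignment would need a guard such as an IF on missing(v{i}).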
In SAS, how can I create an identifier for each unique combination of a set of variables?
I have, for example, several thousand observations, each with a dichotomous value for six variables. There are 2^6 possible combinations of values for these variables. I would like to create an identifier for each unique combination, and eventually group my observations according to this value.
Have:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6
---------------------------------------------------------------
ID1 1 1 1 1 1 1
ID2 1 1 1 1 1 1
ID3 0 1 1 1 1 1
ID4 0 0 1 1 1 0
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0
Want:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier
------------------------------------------------------------------------------
ID1 1 1 1 1 1 1 A
ID2 1 1 1 1 1 1 A
ID3 0 1 1 1 1 1 B
ID4 0 0 1 1 1 0 C
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0 Z
A would represent 1, 1, 1, 1, 1, 1 as a unique combination and B would represent 0, 1, 1, 1, 1, 1 etc.
I have thought about creating a dummy variable based on 64 conditional statements over Var1-Var6. I've also thought about concatenating the values of Var1-Var6 into a new variable to create a unique identifier.
Is there a more straightforward way of going about this?
I prefer an approach that assigns a specific identifier to a specific combination of the values, rather than one that just generates some arbitrary unique string whenever a new combination comes up.
PROC SUMMARY works well with the LEVELS option, which adds a _LEVEL_ variable numbering each unique combination of the class variables. This technique works for any values of the grouping variables, numeric or character.
data have;
input (v1-v6)(1.);
cards;
111111
111110
111101
111011
110111
;;;;
proc print;
proc summary data=have nway;
class v1-v6;
output out=unique(drop=_type_) / levels;
run;
Why not just concatenate the values?
So your combinations are:
111111
111110
111101
111011
110111
....
You can use PROC FREQ to check the number of each type.
proc freq data=have;
table var1*var2*var3*var4*var5*var6 / out=want list;
run;
Take the distinct combinations of the variables, assign each one a letter in sorted order, and join the letters back to the data:
data inp;
length combined $6.;
input subjectid $4. v1 1. v2 1. v3 1. v4 1. v5 1. v6 1.;
combined=cats(of v1-v6); /* concatenate the six flags without implicit numeric-to-character conversion notes */
datalines;
ID1 111111
ID2 011111
ID3 001111
ID4 111110
ID5 000111
ID6 111111
ID7 000111
;
run;
proc sql;
create table uniq
as
select distinct combined from inp order by combined desc;
quit;
data uniq1;
set uniq;
retain alphabet 65;
Id=byte(alphabet) ;
alphabet+1;
drop alphabet;
run;
proc sql;
create table final_ds
as
select subjectid, v1, v2, v3, v4, v5, v6, Id
from inp a
left join uniq1 b
on a.combined=b.combined;
quit;
Assuming the data is sorted by your grouping variables then just use BY group processing.
data want;
set have;
by var1-var6 ;
groupid + first.var6 ;
run;
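If the data is not already in that order, a sort step could come first. A minimal sketch, assuming the input dataset is named have as above; the intermediate dataset name sorted is illustrative:

```sas
/* Sort a copy so the BY statement in the next step is valid */
proc sort data=have out=sorted;
  by var1-var6;
run;

data want;
  set sorted;
  by var1-var6;
  groupid + first.var6;   /* increments each time a new combination starts */
run;
```

Because var6 is the last BY variable, first.var6 is 1 whenever any of the six values changes, so groupid counts distinct combinations in sort order.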
Or you could just convert the 6 binary variables into a single unique value.
group2 = input(cats(of var1-var6),binary6.);
This has the added advantage of not requiring sorted data, but it does require that none of the grouping variables be missing.
Result
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier groupid group2
ID4 0 0 1 1 1 0 C 1 14
ID3 0 1 1 1 1 1 B 2 31
ID1 1 1 1 1 1 1 A 3 63
ID2 1 1 1 1 1 1 A 3 63
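The binary conversion can be tried on its own with a few of the rows shown above. This is only a sketch: the small have dataset below is typed in for illustration, and the step assumes every flag is 0 or 1 and none is missing.

```sas
data have;
  input SubjectID $ var1-var6;
  datalines;
ID1 1 1 1 1 1 1
ID3 0 1 1 1 1 1
ID4 0 0 1 1 1 0
;
run;

data want;
  set have;
  /* cats() builds a string such as '001110'; binary6. reads it as a base-2 integer */
  group2 = input(cats(of var1-var6), binary6.);
run;
```

For ID4 the string is 001110, which is 8 + 4 + 2 = 14, in line with the table above. Unlike a running counter, this value depends only on the combination itself, so the same combination gets the same identifier in any dataset.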
I have a dataset in SAS and I want to collapse one column into a string for each product. I have attached an image of the input and the required output.
I need the column STRING in the output. Can anyone please help me?
I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following give you what you want? The first loop builds the string for a product and the second loop writes out each of that product's rows:
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;
Here's my quick solution.
data temp;
length cat $20.;
do until (last.products);
set have;
by products notsorted;
cat=catx(',',cat,value);
end;
drop value dates;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.products=temp.products;
quit;
I have some sample SAS code below, and I want to know how to make PROC REPORT display percentages with two different numbers of decimal places. For example, I want 100% shown with zero decimals and every other percentage shown with one decimal, such as 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need a custom format that nests a different percent format for each range:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
value pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
run;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The range -1<-<1 means anything strictly between -1 and 1. The bracketed [PERCENT7.0] tells SAS to apply that existing format to the values in that range.
Try using this format:
define mean/ '(%)' format=percent7.1
Hi my dataset looks something like this:
Var1 Var2 mainvar
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
I want to tabulate Var1 and Var2 based on the value of mainvar (which ranges from 1 to 5) so I tried:
%let class=Var1 Var2;
proc tabulate data=x noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
But this is giving me the table without the data being factored by values of mainvar. Any help? Thanks!
In general, I think it's best to create a reproducible example. The following works fine for me:
data example ;
input var1 var2 mainvar ;
cards;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;
%let class=Var1 Var2 ;
proc tabulate data=example noseps missing FORMCHAR=' ';
class &class mainvar;
table &class;
run;
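To break the counts down by mainvar, the TABLE statement has to mention it; listing a variable only in the CLASS statement makes it available but does not place it in the table. A minimal sketch, assuming the wanted layout is mainvar in the rows and counts for each level of Var1 and Var2 in the columns:

```sas
data example;
  input var1 var2 mainvar;
  datalines;
1 0 1
0 0 1
1 1 3
0 0 2
1 1 5
1 1 4
0 0 3
;
run;

%let class=Var1 Var2;

proc tabulate data=example missing;
  class &class mainvar;
  /* rows: levels of mainvar; columns: N for each level of Var1 and Var2 */
  table mainvar, (&class)*n;
run;
```

Swapping the two dimensions (table (&class), mainvar*n;) would put mainvar across the top instead.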
My dataset looks like this:
ID Var1 Var2 Var3
A 1 0 1
B 0 0 1
B 1 1 0
A 0 0 0
A 1 1 1
My expected output will be:
ID Var1 Var2 Var3
A 2 1 2
B 1 1 1
Can someone help with this?
I don't have access to SAS right now, but try the following:
proc tabulate data = in;
class id;
var var:;
table id, sum=''*(var1 var2 var3);
run;
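PROC TABULATE prints a report; if the sums are needed as a dataset, PROC SUMMARY is one alternative. A minimal sketch, assuming the input is typed in as below and that the output dataset name want is acceptable:

```sas
data have;
  input ID $ Var1 Var2 Var3;
  datalines;
A 1 0 1
B 0 0 1
B 1 1 0
A 0 0 0
A 1 1 1
;
run;

proc summary data=have nway;
  class ID;
  var Var1-Var3;
  /* sum= with no names keeps the original variable names for the sums */
  output out=want(drop=_type_ _freq_) sum=;
run;
```

The NWAY option keeps only the rows for each ID and leaves out the overall total row.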