Flatten Multiple Observations in SAS

I have a data set where a patient can have multiple (and unknown) values for some variables that ends up looking something like this:
ID Var1 Var2 Var3 Var4
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
...
99 Blue Female 14 908
100 Red Male 28 911
I want to pack this data down so that each ID has only a single entry, with indicators for the presence or absence of one of the values in their original slew of entries. So, for example, something like this:
ID YesBlue Var2 Var3 Yes911
1 1 Female 17 1
99 1 Female 14 0
100 0 Male 28 1
Is there a straightforward way to do this in SAS? Or failing that, in Access (where the data is coming from) which I have no idea really how to use.

If your data set is called PATIENTS1, maybe something like this:
proc sql noprint;
create table patients2 as
select *
,case(var1)
when "Blue" then 1
else 0
end as ablue
,case(var4)
when 911 then 1
else 0
end as a911
,max(calculated ablue) as yesblue
,max(calculated a911) as yes911
from patients1
group by id
order by id;
quit;
proc sort data=patients2 out=patients3(drop=var1 var4 ablue a911) nodupkey;
by id;
run;

Here's a data step solution. I'm assuming that the values for Var2 and Var3 are always the same for a given ID.
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
data want (drop=Var1 Var4 _:);
set have;
by ID;
if first.ID then do;
_blue=0;
_911=0;
end;
_blue+(Var1='Blue');
_911+(Var4=911);
if last.ID then do;
YesBlue=(_blue>0);
Yes911=(_911>0);
output;
end;
run;

EDIT: Looks like the same thing Keith said, only written differently.
This should do it:
data test;
input id Var1 $ Var2 $ Var3 Var4;
datalines;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
data flatten(drop=Var1 Var4);
set test;
retain YesBlue;
retain Yes911;
by id;
if first.id then do;
YesBlue = 0;
Yes911 = 0;
end;
if Var1 eq "Blue" then YesBlue = 1;
if Var4 eq 911 then Yes911 = 1;
if last.id then output;
run;

PROC SQL is perfect for things like this. This is similar to DavB's answer, but eliminates the additional sort:
data have;
input ID Var1 $ Var2 $ Var3 Var4;
cards;
1 Blue Female 17 908
1 Blue Female 17 909
1 Red Female 17 910
1 Red Female 17 911
99 Blue Female 14 908
100 Red Male 28 911
;
run;
proc sql;
create table want as
select ID
, max(case(var1)
when 'Blue'
then 1
else 0 end) as YesBlue
, max(var2) as Var2
, max(var3) as Var3
, max(case(var4)
when 911
then 1
else 0 end) as Yes911
from have
group by id
order by id;
quit;
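It also reduces your original data to one row per ID, but it can silently return wrong values if the source is not exactly as you describe: MAX(var2) and MAX(var3) simply pick the largest value within each ID, so they are only safe when those variables are constant within an ID.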
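Since both the data step and the PROC SQL answers rely on Var2 and Var3 being constant within an ID, it may be worth checking that first. The query below is a minimal sketch that assumes the HAVE data set defined above; the column names n_var2 and n_var3 are made up for illustration.

```sas
/* List every ID that has more than one distinct Var2 or Var3 value. */
/* No rows returned means the "constant within ID" assumption holds. */
proc sql;
  select ID
       , count(distinct Var2) as n_var2
       , count(distinct Var3) as n_var3
  from have
  group by ID
  having count(distinct Var2) > 1
      or count(distinct Var3) > 1;
quit;
```

With the sample data shown above, this should return no rows, because each ID has a single Var2 and a single Var3 value.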

Related

Populating a dataset depending on the values of a variable in another dataset

I have two data sets INPUT and OUTPUT.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12    22     8    2x
1219 milk  cream  fish 1:1
;
run;
The OUTPUT dataset has a different structure. The variables do not have the same name.
data work.output;
attrib
variable_1 length=8 format=best12. label="Variable 1"
variable_2 length=$50 format=$50. label="Variable 2"
Variable_3 length=8 format=date9. label="Variable 3";
stop;
run;
OUTPUT will be filled with the values from input based on what is specified in column "transformation" in table INPUT: when "transformation" equals "1:1", I want to fill the OUTPUT ds with the values of the corresponding INPUT dataset. If this were a small excel, I would do copy & paste or a lookup.
For example, obs1 of dataset INPUT has transformation = 1:1, so I want to fill variable_1 of dataset OUTPUT with "apple", variable_2 with "banana" and variable_3 with "oats".
For the second observation of ds INPUT I want to multiply each variable with two and assign them to variable_1 - variable_3 respectively.
In my real dataset I have many more columns, so I need to automate this, probably via an index, since the variable names do not correspond.
You probably need to code each transformation rule separately.
This works for your example. But you did not include any date transformations so variable3 is not used.
data INPUT;
input
id 1-4
var1 $ 6-10
var2 $ 12-17
var3 $ 19-22
transformation $ 24-26
;
datalines;
1023 apple banana oats 1:1
1049 12    22     8    2x
1219 milk  cream  fish 1:1
;
proc transpose data=input prefix=value out=step1;
by id transformation;
var var1-var3 ;
run;
data output;
set step1;
length variable1 8 variable2 $50 variable3 8;
format variable3 date9.;
if transformation='1:1' then variable2=value1;
if transformation='2x' then variable1 = 2*input(value1,32.);
run;
Result
Obs  id    transformation  _NAME_  value1  variable1  variable2  variable3
1    1023  1:1             var1    apple   .          apple      .
2    1023  1:1             var2    banana  .          banana     .
3    1023  1:1             var3    oats    .          oats       .
4    1049  2x              var1    12      24                    .
5    1049  2x              var2    22      44                    .
6    1049  2x              var3    8       16                    .
7    1219  1:1             var1    milk    .          milk       .
8    1219  1:1             var2    cream   .          cream      .
9    1219  1:1             var3    fish    .          fish       .
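If the data should stay wide and the columns are to be matched by position rather than by name, arrays are one way to do the "via index" mapping the question asks about. This is only a sketch under assumptions: it reuses the INPUT data set above, the target names out_c1-out_c3 and out_n1-out_n3 are hypothetical, and character and numeric results are kept in separate arrays because a SAS array cannot mix types.

```sas
data output_by_index;
  set input;
  array src   {*} var1-var3;            /* source columns, addressed by position */
  array out_c {3} $ 50 out_c1-out_c3;   /* hypothetical character targets */
  array out_n {3} out_n1-out_n3;        /* hypothetical numeric targets */
  do i = 1 to dim(src);
    select (transformation);
      when ('1:1') out_c{i} = src{i};                  /* copy as is */
      when ('2x')  out_n{i} = 2 * input(src{i}, 32.);  /* text to number, then double */
      otherwise;                                       /* unknown rule: leave missing */
    end;
  end;
  drop i;
run;
```

Each new transformation rule still needs its own WHEN branch, but the number of columns only affects the array definitions.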

SAS, transpose a table

I want to transform my SAS table from data Have to data want.
I feel I need to use PROC TRANSPOSE, but I could not figure out how to do it.
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
data Want;
input Variable $11.0 MAX MIN SUM;
datalines;
Variable_1 6 0 29
Variable_2 7 1 87
Variable_3 11 3 30
Variable_4 23 5 100
;
You are right, proc transpose is the solution:
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
/* optional: sorting by Stat only fixes the order of the new columns */
proc sort data=Have; by Stat; run;
/* the ID statement names the new columns after the values of Stat */
proc transpose data=have out=want name=Variable;
id stat;
run;
proc print data=want; run;

Transpose multiple columns to rows in SAS

I am new to SAS and I want to transpose the following table in SAS
From
ID  Var1  Var2  Jul-09  Aug-09  Sep-09
1   10    15    200     300
2   5     17            -150    200
to
ID Var1 Var2 Date Transpose
1 10 15 Jul-09 200
1 10 15 Aug-09 300
2 5 17 Aug-09 -150
2 5 17 Sep-09 200
Can anyone help please?
You can use proc transpose to transform the data.
options validvarname=any;
data a;
infile datalines missover;
input ID Var1 Var2 "Jul-09"n "Aug-09"n "Sep-09"n;
datalines;
1 10 15 200 300 .
2 5 17 . -150 200
;
run;
proc transpose data=a out=b(rename=(_NAME_=Date COL1=Transpose));
var "Jul-09"n--"Sep-09"n;
by ID Var1-Var2;
run;

Another option is a data step with an array (shown here with two month columns):
data a;
input ID Var1 Var2 Jul_09 Aug_09;
CARDS;
1 10 15 200 300
2 5 17 -150 200
;
DATA b(drop=i jul_09 aug_09);
array dates_{*} jul_09 aug_09;
set a;
do i=1 to dim(dates_);
this_value=dates_{i};
this_date=input(compress(vname(dates_{i}),'_'),MONYY5.);
output;
end;
format this_date monyy5.;
run;

proc tabulate table with where clause

I have a dataset with three variables: Country and two binary variables (var1 and var2).
I wrote a proc tabulate step:
proc tabulate data=mydata;
class country var1 var2;
table Country, var1 var2;
run;
         Var1      Var2
        0    1    0    1
USA    40   50   40   50
AUS    50   20   50   20
IRE    60   40   60   40
DUB    70   50   70   50
This gives me the counts of both 0s and 1s for var1 and var2.
However, I want only the counts of 1s in this cross table. How can I do that?
If I use a where clause as below, it shows only the rows where both variables are 1:
proc tabulate data=mydata;
class country var1 var2;
table Country, var1 var2;
where var1=1 and var2=1;
run;
With the where clause above, I get only the rows where both variables are 1 at the same time, which is not what I am looking for.
The table I want looks like this:
       Var1  Var2
          1     1
USA      50    50
AUS      20    20
IRE      40    40
DUB      50    50
Is there any other way of doing this?
Change AND to OR.
Truth table for Var1=1, Var2=1 (is the row included?):
Var1  Var2  AND  OR
0     0     N    N
0     1     N    Y
1     0     N    Y
1     1     Y    Y
Since your variables are coded 0,1 you can ask for the SUM statistic to get the "count" of the number of ones.
proc tabulate data=mydata;
class country;
var var1 var2;
table Country, var1*sum var2*sum;
run;

Why informat is not working in SAS

I tried various date formats, but the output does not show any dates. What could be the issue?
data c;
input age gender income color$ doj$;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
You are mixing things up a bit.
The date formats are to be applied on numeric data, not on text data.
So you should not read in doj as $ (text), but as a date (so a date informat).
Try DDMMYY10. for doj on your input statement:
data c;
input age gender income color $ doj :ddmmyy10.;
format doj date9.;
datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
34 2 56000 y 14/09/1967
33 1 45000 b 14/02/1956
;
run;
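If the date has already been read as text and re-reading the raw data is not an option, the INPUT function can convert it afterwards. The sketch below assumes a hypothetical data set C_CHAR in which doj is a 10-character text variable; the names C_CHAR, C_DATES and doj_date are made up for illustration.

```sas
/* Hypothetical starting point: doj was read as text */
data c_char;
  input age gender income color $ doj :$10.;
  datalines;
19 1 14000 W 14/07/1988
45 2 45000 b 15/09/1956
;
run;

data c_dates;
  set c_char;
  doj_date = input(doj, ddmmyy10.);  /* informat: text -> numeric SAS date */
  format doj_date date9.;            /* format: how that number is displayed */
  drop doj;
run;

proc print data=c_dates; run;
```

The informat controls how the text is read into a number, and the format controls only how that number is printed, which is why applying DATE9. to a character variable has no useful effect.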