I have a data set that looks like the following but with multiple patients:
ID Variable val Visit
A Height 5 Base
A Weight 3 Base
A BMI 1 Base
A Height 2 Visit 1
A Weight 4 Visit 1
A BMI 3 Visit 1
data have;
input id var $ val visit $;
cards;
A height 5 base
A weight 3 base
A bmi 1 base
A height 2 visit1
A weight 4 visit1
A bmi 3 visit1
;
I would like to create a 3 new columns that retains the base value for all visits:
ID Variable Value Visit Height Weight BMI
A Height 5 Base 5 3 1
A Weight 3 Base 5 3 1
A BMI 1 Base 5 3 1
A Height 2 Visit 1 5 3 1
A Weight 4 Visit 1 5 3 1
A BMI 3 Visit 1 5 3 1
I would attempt this by retaining the "base" value but I'm a bit stumped on how to approach it.
proc sort data=mydata out=base;
where visit='base';
by id;
run;
data base;
set base;
by id;
if first.id then call missing(weight, height, bmi);
retain height weight bmi;
if var = 'Height' then height = val;
else if var = 'Weight' then weight = val;
else bmi = val;
if last.id then output;
keep height weight bmi id;
run;
proc sql;
create table withbase as
select * from mydata a full join base b on a.id = b.id;
Adding BASELINE to every obs as you are asking for doesn't seem very useful. How about something like this.
data have;
input id:$1. var $ val visit $;
cards;
A height 5 base
A weight 3 base
A bmi 1 base
A height 2 visit1
A weight 4 visit1
A bmi 3 visit1
;;;;
run;
proc sort data=have;
by id var visit;
run;
proc print;
run;
data maybe;
do until(last.var);
set have;
by id var;
if visit eq 'base' then baseline=val;
if not missing(baseline) then do;
change = val - baseline;
end;
output;
end;
run;
proc print;
run;
Related
I want to do some sum calculate for a data set. The challenge is I need to do both row sum AND column Sum by ID. Below is the example.
data have;
input ID var1 var2;
datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;
data want;
input ID var1 var2 sum;
datalines;
1 1 1 12
1 3 2 12
1 2 3 12
2 0 5 9
2 1 3 9
3 0 1 1
;
run;
Using SQL is cool, but SAS has nice data step!
proc sort data=have; by id; run;
data result;
set have;
by id;
retain sum 0;
if first.id then sum=0;
sum=sum+sum(var1,var2);
if last.id then output;
run;
proc sort data=result; by id; run;
data want;
merge have result;
by id;
run;
You will decide what to use...
Use SQL to do all of it in one step. Group only by ID, but keep var1 and var2 in the column selection. This will create the same data in want.
proc sql noprint;
create table want as
select ID
, var1
, var2
, sum(var1) + sum(var2) as sum
from have
group by ID
;
quit;
I am using a proc means to calculate the share of the payments made by business line, the data looks like this:
data Test;
input ID Business_Line Payment2017;
Datalines;
1 1 1000
2 1 2000
3 1 3000
4 1 4000
5 2 500
6 2 1500
7 2 3000
;
run;
i'm looking to calculate an additional column which, by group (business_line) calculates the percentage share (weight) of the payment as such:
data Test;
input ID Business_Line Payment2017 share;
Datalines;
1 1 1000 0.1
2 1 2000 0.2
3 1 3000 0.3
4 1 4000 0.4
5 2 500 0.1
6 2 1500 0.3
7 2 3000 0.6
;
run;
the code I have used so far:
proc means data = test noprint;
class ID;
by business_line;
var Payment2017;
output out=test2
sum = share;
weight = payment2017/share;
run;
I have also tried
proc means data = test noprint;
class ID;
by business_line;
var Payment2017 /weight = payment2017;
output out=test3 ;
run;
appreciate the help.
Proc FREQ will compute percentages. You can divide the PERCENT column of the output to get the fraction, or work with percents downstream.
In this example id crosses payment2017 in order to ensure all original rows are part of the output. If the id was not present, and there were any replicate payment amounts, FREQ would aggregate the payment amounts.
proc freq data=have noprint;
by business_line;
table id*payment2017 / out=want all;
weight payment2017 ;
run;
It is convenient to do with proc sql:
proc sql;
select *, payment2017/sum(payment2017) as share from test group by business_line;
quit;
data step:
data have;
do until (last.business_line);
set test;
by business_line notsorted;
total+payment2017;
end;
do until (last.business_line);
set test;
by business_line notsorted;
share=payment2017/total;
output;
end;
call missing(total);
drop total;
run;
Follow up to
SAS - transpose multiple variables in rows to columns
I have the following code:
data have;
input CX_ID 1. TYPE $1. COUNT_RATE 1. SUM_RATE 2.;
datalines;
1A110
1B220
2A120
;
run;
proc summary data = have nway;
class cx_id;
output out=want (drop = _:)
idgroup(out[2] (count_rate sum_rate)= count sum);
run;
So this table:
CX_ID TYPE COUNT_RATE SUM_RATE
1 A 1 10
1 B 2 20
2 A 1 20
becomes
CX_ID COUNT_1 COUNT_2 SUM_1 SUM_2
1 1 2 10 20
2 1 . 20 .
Which is perfect, but how do I set the names to be
Count_A Count_B Sum_A Sum_B
Or in general whatever the value in the type field of the have table ?
Thank you
A double PROC TRANSPOSE is dynamic and you can add a data step to customize the names easily.
*sample data;
data have;
input CX_ID 1. TYPE $1. COUNT 1. SUM 2.;
datalines;
1A110
1B220
2A120
;
run;
*transpose to long;
proc transpose data=have out=long;
by cx_id type;
run;
*transpose to wide;
proc transpose data=long out=wide;
by cx_id;
var col1;
id _name_ type;
run;
I have the following data where people in households are sorted by age (oldest to youngest):
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
I would like to calculate for each household the maximum age difference between consecutively aged people. So this example would give values of 5 (=25-20), 16 (=32-16) and 32 (=42-10) for households 1, 2 and 3 consecutively.
I could do this using lots of merges (i.e. extract person 1, merge with extract of person 2, and so on), but as there can be upto 20+ people in a household I'm looking for a much more direct method.
Here's a two pass solution. Same first step as the two solutions above, sort by age. In the second step keep track of max_diff per row, at the last record of HouseID output the results. This results in only two passes through the data.
proc sort data=houses; by houseid age;run;
data want;
set houses;
by houseID;
retain max_diff 0;
diff = dif1(age)*-1;
if first.HouseID then do;
diff = .; max_diff=.;
end;
if diff>max_diff then max_diff=diff;
if last.houseID then output;
keep houseID max_diff;
run;
proc sort data=houses; by houseid personid age;run;
data _t1;
set houses;
diff = dif1(age) * (-1);
if personid = 1 then diff = .;
run;
proc sql;
create table want as
select houseid, max(diff) as Max_Diff
from _t1
group by houseid;
proc sort data = house;
by houseid descending age;
run;
data house;
set house;
by houseid;
lag_age = lag1(age);
if first.houseid then age_diff = 0;
age_diff = lag_age - age;
run;
proc sql;
select houseid,max(age_diff) as max_age_diff
from house
group by houseid;
quit;
Working:
First sort the data set using houseid and descending Age.
Second data step will calculate difference between current age value (in PDV) and previous age value in PDV. Then, using sql procedure, we can get the max age difference for each houseid.
Just throwing one more into the mix. This one is a condensed version of Reeza's response.
/* No need to sort by PersonID as age is the only concern */
proc sort data = houses;
by HouseID Age;
run;
data want;
set houses;
by HouseID;
/* Keep the diff when a new row is loaded */
retain diff;
/* Only replace the diff if it is larger than previous */
diff = max(diff, abs(dif(Age)));
/* Reset diff for each new house */
if first.HouseID then diff = 0;
/* Only output the final diff for each house */
if last.HouseID;
keep HouseID diff;
run;
Here is an example using FIRST. and LAST. with one pass (after sort) through the data.
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
Proc sort data=HOUSES;
by houseid descending age ;
run;
Data WANT(keep=houseid max_diff);
format houseid max_diff;
retain max_diff age1 age2;
Set HOUSES;
by houseid descending age ;
if first.houseid and last.houseid then do;
max_diff=0;
output;
end;
else if first.houseid then do;
call missing(max_diff,age1,age2);
age1=age;
end;
else if not(first.houseid or last.houseid) then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
age1=age;
end;
else if last.houseid then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
output;
end;
Run;
I checked out this previous post (LINK) for potential solution, but still not working. I want to sum across rows using the ID as the common identifier. The num variable is constant. The id and comp the two variables I want to use to creat a pct variable, which = sum of [comp = 1] / num
Have:
id Comp Num
1 1 2
2 0 3
3 1 1
2 1 3
1 1 2
2 1 3
Want:
id tot pct
1 2 100
2 3 0.666666667
3 1 100
Currently have:
proc sort data=have;
by id;
run;
data want;
retain tot 0;
set have;
by id;
if first.id then do;
tot = 0;
end;
if comp in (1) then tot + 1;
else tot + 0;
if last.id;
pct = tot / num;
keep id tot pct;
output;
run;
I use SQL for things like this. You can do it in a Data Step, but the SQL is more compact.
data have;
input id Comp Num;
datalines;
1 1 2
2 0 3
3 1 1
2 1 3
1 1 2
2 1 3
;
run;
proc sql noprint;
create table want as
select id,
sum(comp) as tot,
sum(comp)/count(id) as pct
from have
group by id;
quit;
Hi there is a much more elegant solution to your problem :)
proc sort data = have;
by id;
run;
data want;
do _n_ = 1 by 1 until (last.id);
set have ;
by id ;
tot = sum (tot, comp) ;
end ;
pct = tot / num ;
run;
I hope it is clear. I use sql too because I am new and the DOW loop is rather complicated but in your case its pretty straightforward.