SAS macro for mean change in baseline values - sas

I have the following data:
Patient Visit VisitNumber LAB LABVALUE
001 BASELINE 1 LAB1 10
001 DAY 100 2 LAB1 15
001 DAY 200 3 LAB1 12
002 BASELINE 1 LAB1 11
002 DAY 100 2 LAB1 14
002 DAY 200 3 LAB1 12
001 BASELINE 1 LAB2 40
001 DAY 100 2 LAB2 45
001 DAY 200 3 LAB2 42
002 BASELINE 1 LAB2 41
002 DAY 100 2 LAB2 44
002 DAY 200 3 LAB2 42
I would like to create the following table, which summarizes the variable 'LABVALUE' for all patients at each visit (Table 2):
Visit VisitNumber LAB MEAN BASELINEMEAN CHANGEBASEMEAN
BASELINE 1 LAB1 10.5 10.5 .
DAY 100 2 LAB1 14.5 10.5 4
DAY 200 3 LAB1 12 10.5 1.5
BASELINE 1 LAB2 40.5 40.5 .
DAY 100 2 LAB2 44.5 40.5 4
DAY 200 3 LAB2 42 40.5 1.5
I have the following code that generates the change in values from baseline for each visit by patient:
proc sort data=have;
by patient lab visitnumber;
run;
data for_report;
set have;
by patient lab;
retain base_visitnum base_labvalue;
if first.patient then do;
base_visitnum = .;
base_labvalue = .;
end;
if first.lab and visit='BASELINE' then do;
base_visitnumber = visitnumber;
base_labvalue = labvalue;
end;
if not first.lab then do;
delta_labvalue = labvalue - base_labvalue;
end;
run;
This generates the following table:
LAB Visit VisitNumber LABVALUE BASE_VISITNUM BASE_LABVALUE DELTA_LABVALUE
LAB1 BASELINE 1 10 1 10 .
LAB1 DAY 100 2 15 1 10 5
LAB1 DAY 200 3 12 1 10 2
LAB1 BASELINE 1 11 1 11 .
LAB1 DAY 100 2 14 1 11 3
LAB1 DAY 200 3 12 1 11 1
LAB2 BASELINE 1 40 1 10 .
LAB2 DAY 100 2 45 1 10 5
LAB2 DAY 200 3 42 1 10 2
LAB2 BASELINE 1 41 1 11 .
LAB2 DAY 100 2 44 1 11 3
LAB2 DAY 200 3 42 1 11 1
Any insight as to how I can generate Table 2 would be greatly appreciated.

This should get you most of the way there:
proc sql noprint;
create table table2 as
select visit,
visitnumber,
lab,
mean(labvalue) as mean,
mean(base_labvalue) as baselinemean
from for_report
group by visit, visitnumber, lab
;
quit;
I've left some details for you to complete :-)
Also, watch out for the mismatch between base_visitnum and base_visitnumber in your example code.
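For reference, here is one way the remaining detail (the change from the baseline mean) might be filled in. This is only a sketch: it assumes for_report was built by the question's data step, so that every row carries labvalue and the patient's base_labvalue, and it assumes baseline rows are identified by visit='BASELINE'. The column name changebasemean is taken from the desired Table 2.

```sas
proc sql;
  create table table2 as
  select visit,
         visitnumber,
         lab,
         mean(labvalue)      as mean,
         mean(base_labvalue) as baselinemean,
         /* leave the change missing on the baseline row itself */
         case
           when visit = 'BASELINE' then .
           else mean(labvalue) - mean(base_labvalue)
         end as changebasemean
  from for_report
  group by lab, visitnumber, visit
  order by lab, visitnumber;
quit;
```

Note that baselinemean here averages the baseline values of the patients who have a row at that visit, so it can differ between visits if some patients miss a visit.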

Related

How to Capture previous row value and perform subtraction

Table 1 is the main data and Table 2 is the desired output. Closing_Bal is derived as Opening_Bal - EMI; for example, 20 - 2 = 18. I want that value, 18, to appear in the second row under the Opening_Bal column, then compute Opening_Bal - EMI again, and so on until a new LAN appears. When a new LAN appears, the process should start over.
I have tried the LAG function but am not able to make it work row after row.
Try this
data A;
infile datalines dlm = '|' dsd;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov|1|20|2|18
2_Dec|1| |3|
3_Jan|1| |5|
4_Feb|1| |3|
1_Nov|2|30|4|26
2_Dec|2| |3|
3_Jan|2| |2|
4_Feb|2| |5|
5_Mar|2| |6|
;
data B(drop = c);
set A;
by LAN;
if first.LAN then c = Closing_Bal;
if Opening_Bal = . then do;
Opening_Bal = c;
Closing_Bal = Opening_Bal - EMI;
c = Closing_Bal;
end;
retain c;
run;
Result:
Month LAN Opening_Bal EMI Closing_Bal
1_Nov 1 20 2 18
2_Dec 1 18 3 15
3_Jan 1 15 5 10
4_Feb 1 10 3 7
1_Nov 2 30 4 26
2_Dec 2 26 3 23
3_Jan 2 23 2 21
4_Feb 2 21 5 16
5_Mar 2 16 6 10
The problem is that you already have CLOSING_BAL on the input dataset, so when the SET statement reads a new observation it will overwrite the value calculated on the previous observation. Either drop or rename the variable in the source dataset.
Example:
data have;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov 1 20 2 18
2_Dec 1 . 3 .
3_Jan 1 . 5 .
4_Feb 1 . 3 .
1_Nov 2 30 4 26
2_Dec 2 . 3 .
3_Jan 2 . 2 .
4_Feb 2 . 5 .
5_Mar 2 . 6 .
;
data want;
set have (drop=closing_bal);
retain Closing_Bal;
Opening_Bal=coalesce(Opening_Bal,Closing_Bal);
Closing_bal=Opening_bal - EMI ;
run;
Results:
Opening_ Closing_
Obs Month LAN Bal EMI Bal
1 1_Nov 1 20 2 18
2 2_Dec 1 18 3 15
3 3_Jan 1 15 5 10
4 4_Feb 1 10 3 7
5 1_Nov 2 30 4 26
6 2_Dec 2 26 3 23
7 3_Jan 2 23 2 21
8 4_Feb 2 21 5 16
9 5_Mar 2 16 6 10
I am not sure this works
data B;
set A;
by lan;
if not first.lan then do;
opening_bal = lag(closing_bal);
closing_bal = opening_bal - EMI;
end;
run;
because LAG is not executed for every observation: it returns the value stored by its previous call, not the value from the previous row.
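A small sketch of that LAG behaviour, using a made-up dataset (demo, x, prev_always and prev_conditional are illustrative names, not from the question). Each occurrence of LAG keeps its own queue, and the queue only advances when that statement actually runs:

```sas
data demo;
  input id x;
  datalines;
1 10
1 20
2 30
2 40
;

data lag_demo;
  set demo;
  by id;
  /* executed on every row: returns x from the previous row */
  prev_always = lag(x);
  /* executed only when not first.id: returns x from the previous
     time this statement ran, which is not the previous row */
  if not first.id then prev_conditional = lag(x);
run;

proc print data=lag_demo noobs;
run;
```

Here prev_always comes out as ., 10, 20, 30, while prev_conditional comes out as ., ., ., 20. For a running balance like the one in the question, a RETAINed variable is the safer tool.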

Bidirectional Vlookup - flag in the same table - SAS

I need to do this:
table 1:
ID Cod.
1 20
2 102
4 30
7 10
9 201
10 305
table 2:
ID Cod.
1 20
2 50
3 15
4 30
5 25
7 10
10 300
Now, I got a table like this with an outer join:
ID Cod. ID1 Cod1.
1 20 1 20
2 50 . .
. . 2 102
3 15 . .
4 30 4 30
5 25 . .
7 10 7 10
. . 9 201
10 300 . .
. . 10 305
Now I want to add flags that tell me whether each ID has matching values in both tables, like this:
ID Cod. ID1 Cod1. FLag_ID Flag_cod:
1 20 1 20 0 0
2 50 . . 0 1
. . 2 102 0 1
3 15 . . 1 1
4 30 4 30 0 0
5 25 . . 1 1
7 10 7 10 0 0
. . 9 201 1 1
10 300 . . 0 1
. . 10 305 0 1
I would like to know how I can get the flag_ID, specifically to cover the cases of ID = 2 or ID = 10.
Thank you
You can group by a coalescence of id in order to count and compare details.
Example
data table1;
input id code @@; datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;
data table2;
input id code @@; datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;
proc sql;
create table got as
select
table2.id, table2.code
, table1.id as id1, table1.code as code1
, case
when count(table1.id) = 1 and count(table2.id) = 1 then 0 else 1
end as flag_id
, case
when table1.code - table2.code ne 0 then 1 else 0
end as flag_code
from
table1
full join
table2
on
table2.id=table1.id and table2.code=table1.code
group by
coalesce(table2.id,table1.id)
;
quit;
You might also want to look into
Proc COMPARE with BY
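A minimal sketch of what a PROC COMPARE run could look like for these two tables. Note that it uses an ID statement rather than BY, which is my own choice here because each id appears at most once per table; the datasets table1 and table2 are the ones created in the example above.

```sas
/* PROC COMPARE with an ID statement expects both inputs sorted by the ID variable */
proc sort data=table1; by id; run;
proc sort data=table2; by id; run;

proc compare base=table2 compare=table1 listall;
  id id;   /* match observations on id instead of by position */
run;
```

The LISTALL option asks the report to list observations and variables found in only one of the two datasets, and the value comparison section shows the ids whose code differs. It produces a report rather than flag columns, so it complements the SQL approach rather than replacing it.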

Left join PROC SQL using threshold date

I am hoping you can help me! Please help!!!!
I am in SAS using PROC SQL and I have datasets A and B with different measurements (relating to patient's health) as follows:
Dataset A
ID Date measurement_a
1 20JUN2013 52.3
1 12JUL2013 65.6
1 28NOV2014 37.4
1 02DEC2014 61.3
1 22SEP2015 40.5
1 15OCT2015 60.5
2 03JUN2011 46.5
2 19JUL2011 54.1
2 29OCT2012 53.6
...
Dataset B
ID Date measurement_b
1 21MAR2007 43
1 13JUL2007 45
1 07APR2009 47
1 14MAY2009 46
1 16FEB2012 42
1 27AUG2012 53
1 12DEC2012 58
1 20JUN2013 56
1 10DEC2013 53
1 23MAY2014 49
1 17SEP2014 44
1 23SEP2015 40
2 16DEC2011 58
2 22AUG2012 54
2 20FEB2013 56
2 29MAY2013 53
...
What I am looking for: if a date in Dataset B is within 6 months of a date in Dataset A, the two measurements should be matched, and a new variable called "time" should number the rows 1, 2, 3, etc. within each ID. The result should have one row per measurement_a only; in other words, I do not need to retain values of measurement_b that do not match a date in Dataset A. Here is an example of what I mean:
Desired result/dataset:
ID Time measurement_a measurement_b
1 1 52.3 56 (Dataset B Date = 20JUN2013 - Matched exactly)
1 2 65.6 53 (Dataset B date = 10DEC2013 - Within six months of 12JUL2013 [Dataset A Date])
1 3 37.4 44 (Dataset B date = 17SEP2014 - Within six months of 28NOV2014 [Dataset A Date])
1 4 61.3 . (because 17SEP2014 [Dataset B] is closest to 28NOV2014 [Dataset A])
1 5 40.5 40 (because 23SEP2015 [Dataset B] is closest to 22SEP2015 [Dataset A])
1 6 60.5 . (No date in Dataset B that is within 6 months of Date in Dataset A [15OCT2015])
2 1 46.5 . (See below)
2 2 54.1 58 (because 03JUL2011 [Dataset B] is closest to 19JUL2011 [Dataset A])
2 3 53.6 54 (Dataset B date = 22AUG2012 - Within 6 months of Dataset A date = 29OCT2012)
...
I have joined on ID, but handling the dates is proving difficult. I think it could be done with a difference in months in the WHERE clause, as in the following code:
PROC SQL;
CREATE TABLE join_test as
SELECT * FROM data_a as a
LEFT_JOIN data_b as b
ON a.id = b.id
WHERE days(a.Date - b.Date) <= 180 ;
QUIT;
But this does not do the trick.
Can someone please help me?
I really appreciate it. Thanks in advance.
In the join criteria add the use of the SAS function INTCK to compute the number of month intervals between the two date values. Proc SQL does not have a way to introduce a serial count value, so you will have to add that in a subsequent step. A LEFT JOIN will create a result set with every id/date in table A.
Example:
The columns a.date, b_date and c_months_apart were added to show how the join works. You can safely remove them from the select.
proc sql;
create table stage1 as
select
a.id
, a.date
, a.measurement_a
, b.measurement_b
, b.date as b_date
, intck('month', a.date, b.date, 'C') as c_months_apart
from
a left join b
on a.id = b.id
and intck('month', a.date, b.date, 'C') between 0 and 6
order by a.id, a.date, b.date
;
data want;
set stage1;
by id;
if first.id then time=1; else time+1;
run;
Output (want)
measurement_ measurement_ c_months_
ID Date a b b_date apart time
1 20JUN2013 52.3 56 20JUN2013 0 1
1 20JUN2013 52.3 53 10DEC2013 5 2
1 12JUL2013 65.6 53 10DEC2013 4 3
1 28NOV2014 37.4 . . . 4
1 02DEC2014 61.3 . . . 5
1 22SEP2015 40.5 40 23SEP2015 0 6
1 15OCT2015 60.5 . . . 7
2 03JUN2011 46.5 58 16DEC2011 6 1
2 19JUL2011 54.1 58 16DEC2011 4 2
2 29OCT2012 53.6 56 20FEB2013 3 3
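If only the nearest B measurement on either side of each A date is wanted, one possible variation is sketched below. It is not the approach above: it swaps the "between 0 and 6" test for an absolute month difference and keeps, per A row, the B row with the smallest gap in days. The table name closest is hypothetical. It does not stop the same B row from being matched to two different A rows, and ties would produce duplicate rows.

```sas
proc sql;
  create table closest as
  select
    a.id
  , a.date
  , a.measurement_a
  , b.measurement_b
  , b.date as b_date
  from
    a left join b
    on a.id = b.id
    and abs(intck('month', a.date, b.date, 'C')) <= 6   /* either direction */
  group by a.id, a.date
  /* keep the candidate with the smallest gap; unmatched A rows have a
     missing gap, and SAS treats missing = missing as true, so they stay */
  having abs(a.date - b.date) = min(abs(a.date - b.date))
  order by a.id, a.date
  ;
quit;
```

The serial time variable can then be added with the same data step as before (by id; if first.id then time=1; else time+1;).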

SAS, calculate row difference

data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
I have two columns of data, ID and month. The same ID may have multiple rows (1-5). The second column is the enrolled month. I want to create a third column that holds the difference between the current month and the initial month for each ID.
You can do it like this:
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
data calc;
set test;
by id;
retain current_month;
if first.id then do;
current_month=month;
calc_month=0;
end;
if ^first.id then do;
calc_month = month - current_month ;
end;
run;
Krs
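An alternative sketch with PROC SQL, assuming the initial month is always the smallest month within each ID (true for the sample data, but an assumption in general). The output name calc_sql is made up for illustration.

```sas
proc sql;
  create table calc_sql as
  select id,
         month,
         /* SAS remerges the per-ID minimum back onto every row */
         month - min(month) as calc_month
  from test
  group by id
  order by id, month;
quit;
```

With the sample data, calc_month matches the d_month column from the question (0, 11, 21 for ID 1, and so on). SAS writes a note to the log about remerging summary statistics, which is expected here.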

rolling up groups in a matrix

Here is the data I have. I use PROC TABULATE to present it the way it would appear in Excel and to make the visualization easier. The goal is for groups strictly below the diagonal (I know it's a rectangle; I mean the (1,1), (2,2), ..., (7,7) "diagonal") to roll up the column until they hit the diagonal or form a group of at least 75.
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
I've already used if/then statements to regroup certain data values, but I need a general way to do it for other sets.
Thanks in advance
Desired results:
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 104 90 87 45 12 41 88
4 0 94 100 110 22 141 88
5 0 0 0 0 0 0 2
6 0 0 0 0 0 0 6
7 0 0 0 0 0 0 13
8 0 0 0 0 0 0 0
Mock up some categorical data for some patients who have to be counted
data mock;
do patient_id = 1 to 2500;
month = ceil(7*ranuni(123));
age = ceil(8*ranuni(123));
output;
end;
stop;
run;
Create a tabulation of counts (N) similar to the one shown in the question:
options missing='0';
proc tabulate data=mock;
class month age;
table age,month*n=''/nocellmerge;
run;
For each month get the sub-diagonal patient count
proc sql;
/* create table subdiagonal_column as */
select month, count(*) as subdiag_col_freq
from mock
where age > month
group by month;
For each row get the pre-diagonal patient count
/* create table prediagonal_row as */
select age, count(*) as prediag_row_freq
from mock
where age > month
group by age;
quit;
Other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-monotonic categorical values you will need to create surrogate variables that are +1 monotonic. For example:
data mock;
do item_id = 1 to 2500;
pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
output;
end;
run;
proc tabulate data=mock;
class pet place;
table pet,place*n=''/nocellmerge;
run;
proc sql;
create table unq_pets as select distinct pet from mock;
create table unq_places as select distinct place from mock;
data pets;
set unq_pets;
pet_num = _n_;
run;
data places;
set unq_places;
place_num = _n_;
run;
proc sql;
select distinct place_num, mock.place, count(*) as subdiag_col_freq
from mock
join pets on pets.pet = mock.pet
join places on places.place = mock.place
where pet_num > place_num
group by place_num
order by place_num
;
quit;