SAS Mapping between two datasets - sas

I’m trying to map between two datasets using SAS.
Dataset 1
ID Start_Date End_Date
Aaa 1/1/2023 5/1/2023
Bbb 10/1/2023 1/2/2023
Ccc 15/1/2023 27/1/2023
Dataset 2
ID Date
Aaa 4/1/2023
Aaa 10/1/2023
Bbb 1/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Ccc 10/1/2023
I want to keep only the rows in dataset2 whose date falls within the time range (between the start and end date) given in dataset1 for the same ID.
For this example, the output should be as follows:
ID Date
Aaa 4/1/2023
Bbb 15/1/2023
Bbb 31/1/2023

Perhaps a simple inner join will give you the expected result
proc sql;
create table want as
select t2.*
from dataset1 t1
inner join dataset2 t2
on t1.id = t2.id
and t2.date between t1.start_date and t1.end_date
;
quit;
ID date
Aaa 04/01/23
Bbb 15/01/23
Bbb 31/01/23
If your dates are stored as strings instead (which seems to be the case, judging by your desired output), consider using the input() function in the join:
proc sql;
create table want as
select t2.*
from have1 t1
inner join have2 t2
on t1.id = t2.id
and input(t2.date, ddmmyy10.)
between input(t1.start_date, ddmmyy10.)
and input(t1.end_date, ddmmyy10.)
;
quit;
ID date
Aaa 4/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
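To see how the informat reads such strings, a small check like the following may help. It is only a sketch: the dataset name check and the variables date_chr and date_num are made up for illustration. Because ddmmyy10. reads the day first, a value such as 1/2/2023 is taken as 1 February 2023, not 2 January.

```sas
/* Illustrative only: check, date_chr and date_num are hypothetical names */
data check;
    date_chr = '15/1/2023';
    /* ddmmyy10. interprets the string as day/month/year */
    date_num = input(date_chr, ddmmyy10.);
    /* display the numeric SAS date in a readable form */
    format date_num date9.;
    /* write both values to the log for inspection */
    put date_chr= date_num=;
run;
```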

Try this
data one;
input ID $ (Start_Date End_Date)(:ddmmyy10.);
format Start_Date End_Date ddmmyy10.;
datalines;
Aaa 1/1/2023 5/1/2023
Bbb 10/1/2023 1/2/2023
Ccc 15/1/2023 27/1/2023
;
data two;
input ID $ Date :ddmmyy10.;
format Date ddmmyy10.;
datalines;
Aaa 4/1/2023
Aaa 10/1/2023
Bbb 1/1/2023
Bbb 15/1/2023
Bbb 31/1/2023
Ccc 10/1/2023
;
data want(keep = ID Date);
merge one two;
by ID;
if Start_Date <= Date <= End_Date;
run;
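The MERGE with BY above relies on both datasets being sorted by ID, and on dataset one holding a single row per ID so that its Start_Date and End_Date carry over to every matching row of two. Both hold for the sample data. If the sort order is not guaranteed, a step along these lines (a sketch reusing the dataset names one and two from this answer) could be run before the data want step:

```sas
/* Sort both inputs by the BY variable used in the merge */
proc sort data=one;
    by ID;
run;

proc sort data=two;
    by ID;
run;
```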

A double SET can solve this problem.
The first SET reads each row of dataset 2, and the second SET loops over the ranges in dataset 1. The IF outputs only the dates that fall within the range for the same ID.
Try this:
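data dsout;
set ds2; /* ds2 = dataset 2 (ID, Date) */
/* scan the rows of ds1 (dataset 1) by direct access; END= cannot be combined with POINT= */
do i=1 to rec;
set ds1(rename=(ID=ID_)) point=i nobs=rec;
if ID=ID_ and date>=Start_Date and date<=End_Date then do;
output;
leave; /* one matching range is enough, stop scanning */
end;
end;
drop ID_ Start_Date End_Date;
run;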

Related

Conditional Merging in SAS

Hi all, I have the two datasets below, in which I need to map Date1 to Date2 within a range of +/- 7 days for each ID.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql;
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a full join set2 as b
on a.ID = b.ID and (Date1 + 7 >= Date2 >= Date1 - 7)
group by a.ID, Date1 having diff = min(diff);
quit;
I get the below output:
[Image: output I get]
but I need this output:
[Image: expected output]
The output I get is highlighted in yellow when I map using the minimum of Diff, but the output I need is highlighted in green, because the values in Date2 must stay distinct and must not repeat.
That is, because 02-Aug-2015 is already mapped to 02-Aug-2015, and 09-Aug-2015 is already mapped to 09-Aug-2015 of Date1, I need 04-Aug-2015 of Date1 to be mapped to the remaining 11-Aug-2015.
So judging by the comments above, the rules are quite complex:
* Join based on ID
* The absolute difference between Date1 and Date2 should be at most 7 days
* Preference should be given to matching dates that are equal
* If the dates aren't equal, the solution should return as many combinations as possible
* Date1 and Date2 need to be unique
I'm not sure whether this can be done in one proc sql step.
You can't just work with a simple rule, because it is not always the pair with the smallest difference between dates that needs to be returned. On top of that, you need to 'remember' which dates you have already selected per ID, since you can't select them again later on.
First I made an inner join on ID where the difference in dates was <= 7. This gives all valid combinations. Then I put all the dates where the difference was 0 into a macro variable, so I can build a table of all combinations except those involving dates in the macro variable. In the last step you want the solution that returns the most combinations of Date1 and Date2 where both are distinct.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql noprint;
/*Create a macro variable with all the dates for which the difference between date1
and date2 = 0 */
select distinct put(Date1,yymmddn8.) into :dates separated by ','
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
/*Create table with all lines where difference <= 7 but date is not in the ones with
difference = 0 */
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a inner join set2 as b
on a.ID = b.ID where abs(intck('days', Date1, Date2)) <= 7 and
find("&dates.",put(Date1,yymmddn8.)) = 0 and find("&dates.",put(Date2,yymmddn8.)) = 0;
/* Check the number of possible combinations */
create table out as
select a.*,
b.cnt1 + c.cnt2 as combos
from out a left join (select distinct id,
date1,
count(*) as cnt1
from out
group by id, date1) b on a.date1 = b.date1 and a.id = b.id
left join (select distinct id,
date2,
count(*) as cnt2
from out
group by id, date2) c on a.date2 = c.date2 and a.id = c.id
order by id, combos;
quit;
/* Select unique dates per date1, date2 */
data out(keep = id date1 date2);
retain mem1 mem2;
length mem1 mem2 $100.;
set out;
by id combos;
if first.id then do;
mem1 = "0";
mem2 = "0";
end;
date10 = put(date1,yymmddn8.);
date20 = put(date2,yymmddn8.);
if find(mem1,date10) = 0 and find(mem2,date20) = 0 then do;
mem1 = catx(',',mem1,date10);
mem2 = catx(',',mem2,date20);
output;
end;
run;
/* Create a union between lines with no difference in date and lines with difference
in date*/
proc sql;
create table final as
select * from out
union
select a.ID, a.Date1, b.Date2
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
quit;
So this gives a table like:
[Image: final table]

How to merge 2 datasets with different lengths?

I would like to merge 2 datasets with 2 different dimensions.
TABLE1: people
gender name
M raa
F chico
M july
F sergio
TABLE2: serial_numbers
gender serial
M 4
F 5
I want the result to be
result
gender name serial
M raa 4
F chico 5
M july 4
F sergio 5
I'm creating here the datasets to illustrate how to merge both datasets:
data people;
infile cards;
length gender $1
name $10;
input gender name;
cards;
M raa
F chico
M july
F sergio
;
run;
data serial_numbers;
length gender $1
serial 8;
infile cards;
input gender serial;
cards;
M 4
F 5
;
run;
Solution 1: use a proc sql to perform the join.
proc sql;
create table result as
select a.gender, a.name, b.serial
from people a LEFT JOIN serial_numbers b
on a.gender=b.gender;
quit;
proc print data=result;
run;
Solution 2: use a data step to merge both datasets. This requires the datasets to be sorted:
proc sort data=people;
by gender;
run;
proc sort data=serial_numbers;
by gender;
run;
data result;
merge people serial_numbers;
by gender;
run;
proc print data=result;
run;

Extracting info by matching two datasets in SAS

I have two datasets. Both have a common column, ID. I would like to check whether each ID from df1 exists in df2 and extract all such rows from df1. I'm doing this in SAS.
It is easily done in one sql query.
proc sql;
create table extract_from_df1 as
select
*
from
df1
where
id in (select id from df2)
;
quit;
There are lots of ways to do this. For example:
proc sql;
create table compare as select distinct
a.id as id1, b.id as id2
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
and then keep matches. Or you can try:
proc sql;
delete from table2 where id in (select distinct id from table1);
quit;
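For the left-join variant, "keeping the matches" could mean dropping the rows of compare where nothing was found in table2. A minimal sketch, assuming the compare table and its id1/id2 columns from the query above (matches is a made-up name):

```sas
proc sql;
    /* id2 is missing wherever an id from table1 had no partner in table2 */
    create table matches as
    select id1
    from compare
    where id2 is not missing;
quit;
```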
data df1;
input id name $;
cards;
1 abc
2 cde
3 fgh
4 ijk
;
run;
data df2;
input id address $;
cards;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;
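data c;
merge df1(in=x) df2(in=y);
by id; /* match rows on id; both inputs are already sorted by id */
if x and y; /* keep only ids present in both datasets */
keep id name;
run;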
proc print data=c;
run;

How do I use PROC EXPAND to fill in time series observations within a panel (longitudinal) data set?

I'm using this SAS code:
data test1;
input cust_id $
month
category $
status $;
datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
quit;
proc sql;
create view test2 as
select cust_id, input(put(month, 6.), yymmn6.) as month format date9.,
category, status from test1 order by cust_id, month asc;
quit;
proc expand data=test2 out=test3 to=month method=none;
by cust_id;
id month;
quit;
proc print data=test3;
title "after expand";
quit;
and I want to create a dataset that looks like this:
Obs cust_id month category status
1 A 01MAR2000 ABC C
2 A 01APR2000 DEF C
3 A 01MAY2000 . .
4 A 01JUN2000 XYZ 3
5 B 01OCT1999 ASD X
6 B 01NOV1999 . .
7 B 01DEC1999 ASD C
but the output from proc expand just says "Nothing to do. The data set WORK.TEST3 has 0 observations and 0 variables." I don't want/need to change the frequency of the data, just interpolate it with missing values.
What am I doing wrong here? I think proc expand is the correct procedure to use, based on this example and the documentation, but for whatever reason it doesn't create the data.
You need to add a VAR statement. Unfortunately, the variables need to be numeric. So just expand the month by cust_id. Then join back the original values.
proc expand data=test2 out=test3 to=month ;
by cust_id;
id month;
var _numeric_;
quit;
proc sql noprint;
create table test4 as
select a.*,
b.category,
b.status
from test3 as a
left join
test2 as b
on a.cust_id = b.cust_id
and a.month = b.month;
quit;
proc print data=test4;
title "after expand";
quit;

Transposing wide to long in SAS, without extra columns

I'd like to transpose a dataset, but SAS insists on adding a new column if the "by" column has multiple entries.
So if I run
data test;
input a b $ c $ ;
datalines;
1 aaa bbb
1 bbb bbb
2 ccc ccc
3 ccc ccc
;
run;
proc transpose data=test;
by a;
var b c;
run;
I get a table with two columns that looks like this:
1 b aaa bbb
1 c bbb bbb
2 b ccc
2 c ccc
3 b ccc
3 c ccc
What I'd like is a table that looks like this:
1 b aaa
1 c bbb
1 b bbb
1 c bbb
2 b ccc
2 c ccc
3 b ccc
3 c ccc
So instead of adding columns, for each entry, I want SAS to add rows. Any ideas on how to do this?
Just to be clear, this is a toy example! The dataset I'm working with has more columns.
This ought to work (using your example code):
proc transpose data=test out=test_tran1(rename=(_name_ = old_var));
by a;
var b c;
run;
proc transpose data=test_tran1 out=test_tran2(drop=_: rename = (col1=values) where = (not missing(values)));
by a old_var;
var col:;
run;
You can't use PROC TRANSPOSE in a single step with a 'mixed' dataset (multiple rows per by group AND multiple columns) to get long. Transpose only really works well going all one or the other.
Easiest way to get long is usually the data step.
data want;
set test;
array vars b c;
do _i = 1 to dim(vars);
varname = vname(vars[_i]);
value = vars[_i];
output;
end;
keep a varname value;
run;
data output;
set test;
array vars {*} b -- c; * define array composed of list of variables from B to C, have to be of same type;
length varname $32;
keep a varname value;
do i=1 to dim(vars);* loop array (list of variables);
varname= vname(vars(i));* get name of variable that provided value;
value = vars(i);* get the value of variable;
output; *output row;
end;
run;