Hi all i have the below two datasets in which i need to map Date1 with Date2 in a range of (+/- 7)days within an ID.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql;
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a full join set2 as b
on a.ID = b.ID and (Date1 + 7 >= Date2 >= Date1 - 7)
group by a.ID, Date1 having diff = min(diff);
quit;
I get the below output
i need output
Expected Output
***The output i get is highlighted in yellow when i map using min of Diff.
but the output i need is highlighted in green it is because i have to maintain the values in Date2 as distinct and does not repeat.
(i.e) Because 02-Aug-2015 is already mapped with 02-Aug-2015 as well as 09-Aug-2015 mapped with 09-Aug-2015 of Date1 i need the 04-Aug-2015 of Date1 to be mapped with the remaining 11-Aug-2015***
So judging on the comments above the rules are quite complex:
* join based on ID
* Absolute difference between date 1 and date 2 should be less then 7
* Preference for matching dates should be given to dates that are equal
* In case dates aren't equal, the solution should be so that as much as combinations are
returned as possible
* Date1 & Date2 need to be unique
I'm not sure whether the solution can be done in one proc sql step.
You can't just work with a simple rule as it is not always the result with the lowest difference between dates that needs to be returned. Next to that, you need to sort of 'remember' which dates per ID you have selected for an output since you can't select them anymore further on.
First I made an inner join on ID where the difference in dates was <= 7. This gives all possible valid combinations. Then I've put all the dates where the difference in dates was 0 in a macro variable so I can make a table of all possible combinations except dates being in the macro variable. In the last step then you want to have the solution returning the most possible combinations of date1 - date2 where both are distinct.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql noprint;
/*Create a macro variable with all the dates for which the difference between date1
and date2 = 0 */
select distinct put(Date1,yymmddn8.) into: dates seperated by ','
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
/*Create table with all lines where difference <= 7 but date is not in the ones with
difference = 0 */
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a inner join set2 as b
on a.ID = b.ID where abs(intck('days', Date1, Date2)) <= 7 and
find("&dates.",put(Date1,yymmddn8.)) = 0 and find("&dates.",put(Date2,yymmddn8.)) = 0;
/* Check the number of possible combinations */
create table out as
select a.*,
b.cnt1 + c.cnt2 as combos
from out a left join (select distinct id,
date1,
count(*) as cnt1
from out
group by id, date1) b on a.date1 = b.date1 and a.id = b.id
left join (select distinct id,
date2,
count(*) as cnt2
from out
group by id, date2) c on a.date2 = c.date2 and a.id = c.id
order by id, combos;
quit;
/* Select unique dates per date1, date2 */
data out(keep = id date1 date2);
retain mem1 mem2;
length mem1 mem2 $100.;
set out;
by id combos;
if first.id then do;
mem1 = "0";
mem2 = "0";
end;
date10 = put(date1,yymmddn8.);
date20 = put(date2,yymmddn8.);
if find(mem1,date10) = 0 and find(mem2,date20) = 0 then do;
mem1 = catx(',',mem1,date10);
mem2 = catx(',',mem2,date20);
output;
end;
run;
/* Create a union between lines with no difference in date and lines with difference
in date*/
proc sql;
create table final as
select * from out
union
select a.ID, a.Date1, b.Date2
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
quit;
So this gives a table like:
Final Table
Related
Observations from "other_claims" data set are to summed with the observations in the "event_claims" data set under the following conditions:
"Other_claims" occur within a 90-day window of the event_claim, "stay_discharge_dt," are to be summed with the event cost("cost_event").
If the "other_claim" partially overlaps with the 90-day period, only overlapping days are to be included.
The included fraction: (# of overlapping days)/(total # of days of the other_claim)
Here's the sql solution I am considering. I'm curious if this could be more efficient?
data event_claims;
input patient_id stay_admission_dt mmddyy10. #14stay_discharge_dt mmddyy10. doctor cost_event;
format stay_admission_dt stay_discharge_dt mmddyy10.;
datalines;
1 06/10/2019 06/15/2019 45 20000
2 10/18/2018 10/22/2018 78 30000
;
data other_claims;
length patient_id 3. type $19;
input patient_id Type$ service_start_date :mmddyy10. service_end_date :mmddyy10. service_cost dollar7.0;
format service_start_date service_end_date mmddyy10.;
datalines;
1 skilled_nursing 06/15/2019 06/25/2019 $7,000
1 home-health 06/25/2019 08/25/2019 $24,000
1 office_visit 07/1/2019 07/1/2019 $200
1 home_health 08/26/2019 09/26/2019 $12,000
2 er_visit 10/15/2018 10/16/2018 $1,500
2 home_health 10/23/2018 11/23/2018 $8,000
2 outpatient_services 01/18/2019 1/22/2019 $5,000
;
proc sql;
create table events_others as
select a.person_id
,a.stay_admission_dt
,a.stay_discharge_dt
,a.stay_discharge_dt+90 as service_deadline format mmddyy10.
,b.service_start_date
,b.service_end_date
,case when b.service_start_date > calculated service_deadline
or b.service_start_date < a.stay_admission_dt
then "service not payable"
else "payable" end as payable
,case when calculated payable = "payable"
and b.service_end_date > calculated service_deadline
then intck("days",b.service_end_date, calculated service_deadline )
else 0 end as overlap /* When the other claim event exceeds the 90-day window of the*/
,a.service_cost
,b.service_cost as service_cost_other
,case when calculated overlap ne 0
then (intck("days",b.service_start_date,b.service_end_date) + calculated overlap)/intck("days",b.service_start_date,b.service_end_date)
else 0 end as partial_factor
,calculated partial_factor * b.service_cost as final_other_cost format=dollar9.2
from event_claims a
left join other_claims b
on a.person_id=b.person_id
group by a.person_id
,a.stay_admission_dt
,a.stay_discharge_dt
order by a.person_id
,a.stay_admission_dt
;quit;
proc sql;
create table total_cost_of_care as
select a.*
,b.final_other_cost format=dollar9.2
,a.service_cost + final_other_cost as total_episode_cost format=dollar12.2
from events_others a
inner join
(select person_id
,stay_admission_dt
,sum(final_other_cost) as final_other_cost
from events_others
group by person_id
,stay_admission_dt
) b
on (a.person_id=b.person_id
and a.stay_admission_dt=b.stay_admission_dt)
;quit;
Since I am new to SAS I need some help to understand how to combine the overlap date ranges into one row.I want to combine the overlap date ranges when they have matching Id. If the dates don’t overlap then I want to keep them as it is. IF they over lap by Matching Id and drug code Then it should combine into one line. Please look at the same ple data set which I have below and the expected results:
Current Data set:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 1/1/2019
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
Expected results:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 3/31/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/2/2018 8/31/2018
1 100 9/1/2018 5/4/2019
I wrote some SAS code but I am combining the dates even when there is no overlap. I want to write some code which should work in SAS.
PROC SORT DATA=Want OUT=ONE;
BY PERSON_ID BEG_DATE DRUG_CODE END_DATE;
RUN;
data TWO (DROP=PERSON_ID2 DRUG_CODE2 BEG_DATE END_DATE
RENAME=(BEG2=BEG_DOS
END2=END_DOS));
SET ONE;
RETAIN BEG2 END2;
PERSON_ID2=LAG1(PERSON_ID);
DRUG_CODE2=LAG1(DRUG_CODE);
IF PERSON_ID2=PERSON_ID AND DRUG_CODE2=DRUG_CODE AND BEG_DATE LE(END2+1) THEN
DO;
BEG2=MIN(BEG_DATE,BEG2);
END2=MAX(END_DATE,END2);
END;
ELSE
DO;
SEG+1;
BEG2=BEG_DATE;
END2=END_DATE;
END;
FORMAT BEG2 END2 MMDDYY10.;
RUN;
DATA THREE(DROP=BEG_DOS END_DOS SEG);
RETAIN BEG_DATE END_DATE;
SET TWO;
BY PERSON_ID SEG;
FORMAT BEG_DATE END_DATE MMDDYY10.;
IF FIRST.SEG THEN
DO;
BEG_DATE=BEG_DOS;
END;
IF LAST.SEG THEN
DO;
END_DATE = END_DOS;
OUTPUT;
END;
RUN;
This is how I would do it. Create an obs for each ID DRUG and DATE. Flag the gaps and summarize by RUN.
data have;
input ID Drug_Code (BEG End)(:mmddyy.);
format BEG End mmddyyd10.;
cards;
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 90 6/1/2018 8/15/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
;;;;
run;
proc print;
run;
/*1 100 1/1/2018 1/1/2019*/
data exv/ view=exv;
set have;
do date = beg to end;
output;
end;
drop beg end;
format date mmddyyd10.;
run;
proc sort data=exv out=ex nodupkey;
by id drug_code date;
run;
data breaksV / view=BreaksV;
set ex;
by id drug_code;
dif = dif(date);
if first.drug_code then do; dif=1; run=1; end;
if dif ne 1 then run+1;
run;
proc summary data=breaksV nway missing;
class id drug_code run;
var date;
output out=want(drop=_type_) min=Begin max=End;
run;
Proc print;
run;
Computing the extent range composed of overlapping segment ranges requires a good understanding of the range conditions (cases).
Consider the scenarios when sorted by start date (within any larger grouping set, G, such as id and drug)
Let [ and ] be endpoints of a range
# be date values (integers) within
Extent be the combined range that grows
Segment be the range in the current row
Case 1 - Growth. Within G Segment start before Extent end
Segment will either not contribute to Extent or extend it.
[####] Extent
+ [#] Segment range DOES NOT contribute
--------
[####] Extent (do not output a row, still growing)
or
[####] Extent
+ [#####] Segment range DOES contribute
--------
[#######] Extent (do not output a row, still growing)
Case 2 - Terminus. 3 possibilities:
Within G Segment start after Extent end,
Next G reached (different id/drug combination),
End of data reached.
#2 and #3 can be tested by checking the appropriate last. flag.
[####] Extent
+ ..[#] Segment beyond Extent (gap is 2)
--------
[####] output Extent
[#] reset Extent to Segment
You can adjust your rules for Segment being adjacent (gap=0) or close enough (gap < threshold) to mean an Extent is either expanded, or, output and reset to Segment.
Note: The situation is a little more (not shown) complicated for the real world cases of:
missing start means the Segment has an unknown start date (presume it to be epoch (0=01JAN1960, or some date that pre-dates all dates in the data or study)
missing end means the Segment is active today (end date is date when processing data)
Sample code:
data have;
call streaminit(42);
do id = 1 to 10;
do _n_ = 1 to 50;
drug = ceil(rand('UNIFORM', 10));
beg_date = intnx ('MONTH', '01JAN2008'D, rand('UNIFORM',20));
end_date = intnx ('DAY', beg_date, rand('UNIFORM',75));
OUTPUT;
end;
end;
format beg_date end_date yymmdd10.;
run;
proc sort data=have out=segments;
by id drug beg_date end_date;
run;
data want;
set segments;
by id drug beg_date end_date; * will error if incoming data is NOT sorted;
retain ext_beg ext_end;
retain gap_allowed 0; * set to 1 for contiguously adjacent segment ;
if first.drug then do;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 0;
end;
if beg_date <= ext_end + gap_allowed then do;
ext_end = max (ext_end, end_date);
segment_count + 1;
end;
else do;
extent_id + 1;
OUTPUT;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 1;
end;
if last.drug then do;
extent_id + 1;
OUTPUT;
* reset occurs implicitly;
* it will happen at first. logic when control returns to top of step;
end;
format ext_: yymmdd10.;
keep id drug ext_beg ext_end segment_count extent_id;
run;
I have two datasets. Both have a common column- ID. I would like to check if ID from df1 lies in df2 and extract all such rows from df1. I'm doing this in SAS.
It is easily done in one sql query.
proc sql;
create table extract_from_df1 as
select
*
from
df1
where
id in (select id from df2)
;
quit;
There are lots of ways to do this. For example:
proc sql;
create table compare as select distinct
a.id as id1, b.id as id2
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
and then keep matches. Or you can try:
proc sql;
delete from table2 where id2 in select distinct id1 from table1;
quit;
data df1;
input id name $;
cards;
1 abc
2 cde
3 fgh
4 ijk
;
run;
data df2;
input id address $;
cards;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;
data c;
merge df1(in=x) df2(in=y);
if x and y;
keep id name;
run;
proc print data=c;
run;
I'm using this SAS code:
data test1;
input cust_id $
month
category $
status $;
datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
quit;
proc sql;
create view test2 as
select cust_id, input(put(month, 6.), yymmn6.) as month format date9.,
category, status from test1 order by cust_id, month asc;
quit;
proc expand data=test2 out=test3 to=month method=none;
by cust_id;
id month;
quit;
proc print data=test3;
title "after expand";
quit;
and I want to create a dataset that looks like this:
Obs cust_id month category status
1 A 01MAR2000 ABC C
2 A 01APR2000 DEF C
3 A 01MAY2000 . .
4 A 01JUN2000 XYZ 3
5 B 01OCT1999 ASD X
6 B 01NOV1999 . .
7 B 01DEC1999 ASD C
but the output from proc expand just says "Nothing to do. The data set WORK.TEST3 has 0 observations and 0 variables." I don't want/need to change the frequency of the data, just interpolate it with missing values.
What am I doing wrong here? I think proc expand is the correct procedure to use, based on this example and the documentation, but for whatever reason it doesn't create the data.
You need to add a VAR statement. Unfortunately, the variables need to be numeric. So just expand the month by cust_id. Then join back the original values.
proc expand data=test2 out=test3 to=month ;
by cust_id;
id month;
var _numeric_;
quit;
proc sql noprint;
create table test4 as
select a.*,
b.category,
b.status
from test3 as a
left join
test2 as b
on a.cust_id = b.cust_id
and a.month = b.month;
quit;
proc print data=test4;
title "after expand";
quit;
I have a data set with daily data in SAS. I would like to convert this to monthly form by taking differences from the previous month's value by id. For example:
thedate, id, val
2012-01-01, 1, 10
2012-01-01, 2, 14
2012-01-02, 1, 11
2012-01-02, 2, 12
...
2012-02-01, 1, 20
2012-02-01, 2, 15
I would like to output:
thedate, id, val
2012-02-01, 1, 10
2012-02-01, 2, 1
Here is one way. If you license SAS-ETS, there might be a better way to do it with PROC EXPAND.
*Setting up the dataset initially;
data have;
informat thedate YYMMDD10.;
input thedate id val;
datalines;
2012-01-01 1 10
2012-01-01 2 14
2012-01-02 1 11
2012-01-02 2 12
2012-02-01 1 20
2012-02-01 2 15
;;;;
run;
*Sorting by ID and DATE so it is in the right order;
proc sort data=have;
by id thedate;
run;
data want;
set have;
retain lastval; *This is retained from record to record, so the value carries down;
by id thedate;
if (first.id) or (last.id) or (day(thedate)=1); *The only records of interest - the first record, the last record, and any record that is the first of a month.;
* To do END: if (first.id) or (last.id) or (thedate=intnx('MONTH',thedate,0,'E'));
if first.id then call missing(lastval); *Each time ID changes, reset lastval to missing;
if missing(lastval) then output; *This will be true for the first record of each ID only - put that record out without changes;
else do;
val = val-lastval; *set val to the new value (current value minus retained value);
output; *put the record out;
end;
lastval=sum(val,lastval); *this value is for the next record;
run;
You could achieve this using a PROC SQL, and the intnx function to bring last months date forward a month...
proc sql ;
create table lag as
select b.thedate, b.id, (b.val - a.val) as val
from mydata b
left join
mydata a on b.date = intnx('month',a.date,1,'s')
and b.id = a.id
order by b.date, b.id ;
quit ;
This may need tweaking to handle scenarios where the previous month doesn't exist or months which have a different number of days to the previous month.