Finding values not in another dataset in SAS

In SAS I have 2 datasets. Finding the values of a single variable that are not present in the other dataset is easy.
Now I have to compare in the following way:
data dataset1;
input PointA $ PointB $ @6 date date7.;
format date mmddyy10.;
datalines;
NY LV 02Oct2018
NY LV 04Oct2018
NY LV 06Oct2018
;
which gives Dataset1:
Obs PointA PointB Date
1 NY LV 10/02/2002
2 NY LV 10/04/2002
3 NY LV 10/06/2002
Dataset2 has dates from 01Oct2018 to 06Oct2018.
DATE
01Oct2018
02Oct2018
03Oct2018
04Oct2018
05Oct2018
06Oct2018
WANTED: The final output I want is all the values (dates) from Dataset2 that are absent from Dataset1 for each PointA-PointB pair. So my desired output is:
Obs PointA PointB Date
1 NY LV 10/01/2002
2 NY LV 10/03/2002
3 NY LV 10/05/2002
I'm using NOT IN but it gives me only the dates. Somehow I need to include the other variables; in this case PointA, PointB.

Perform a Cartesian join with an ON criterion involving a not-equal (NE) comparison, then remove the original 'have' rows from the result with the EXCEPT set operator.
* use the proper informat! ;
data have;
input PointA $ PointB $ date date9.;
format date mmddyy10.;
datalines;
NY LV 02Oct2018
NY LV 04Oct2018
NY LV 06Oct2018
;
data dates;
input date date9.;
format date mmddyy10.;
datalines;
01Oct2018
02Oct2018
03Oct2018
04Oct2018
05Oct2018
06Oct2018
;
proc sql;
create table have_not_lookup as
select distinct
have.PointA, have.PointB, lookup.date
from
have
join
dates lookup
on
lookup.date NE have.date
except
select * from have
;
quit;
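An alternative sketch of the same idea, assuming the have and dates tables built above: cross join the distinct PointA/PointB pairs with every lookup date, then subtract the rows that already exist. The table name have_not_lookup_alt and the alias pairs are hypothetical names chosen for illustration.

```sas
proc sql;
   create table have_not_lookup_alt as
   /* every PointA/PointB pair combined with every lookup date */
   select pairs.PointA, pairs.PointB, lookup.date
   from (select distinct PointA, PointB from have) as pairs
        cross join dates as lookup
   except
   /* minus the pair/date rows that are already present in have */
   select PointA, PointB, date from have
   ;
quit;
```

For the sample data this should leave the three NY/LV rows for 01, 03 and 05 October 2018. Joining on the distinct pairs rather than on every row of have may also keep the intermediate result smaller when have is large.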

Related

Bundling healthcare claims using SAS/SQL

Observations from the "other_claims" data set are to be summed with the observations in the "event_claims" data set under the following conditions:
Other claims that occur within a 90-day window of the event claim's "stay_discharge_dt" are to be summed with the event cost ("cost_event").
If an other claim only partially overlaps the 90-day period, only the overlapping days are to be included.
The included fraction is (# of overlapping days)/(total # of days of the other claim).
Here's the sql solution I am considering. I'm curious if this could be more efficient?
data event_claims;
input patient_id stay_admission_dt mmddyy10. @14 stay_discharge_dt mmddyy10. doctor cost_event;
format stay_admission_dt stay_discharge_dt mmddyy10.;
datalines;
1 06/10/2019 06/15/2019 45 20000
2 10/18/2018 10/22/2018 78 30000
;
data other_claims;
length patient_id 3. type $19;
input patient_id Type$ service_start_date :mmddyy10. service_end_date :mmddyy10. service_cost dollar7.0;
format service_start_date service_end_date mmddyy10.;
datalines;
1 skilled_nursing 06/15/2019 06/25/2019 $7,000
1 home-health 06/25/2019 08/25/2019 $24,000
1 office_visit 07/1/2019 07/1/2019 $200
1 home_health 08/26/2019 09/26/2019 $12,000
2 er_visit 10/15/2018 10/16/2018 $1,500
2 home_health 10/23/2018 11/23/2018 $8,000
2 outpatient_services 01/18/2019 1/22/2019 $5,000
;
proc sql;
create table events_others as
select a.patient_id
,a.stay_admission_dt
,a.stay_discharge_dt
,a.stay_discharge_dt+90 as service_deadline format mmddyy10.
,b.service_start_date
,b.service_end_date
,case when b.service_start_date > calculated service_deadline
or b.service_start_date < a.stay_admission_dt
then "service not payable"
else "payable" end as payable
,case when calculated payable = "payable"
and b.service_end_date > calculated service_deadline
then intck("day",b.service_end_date, calculated service_deadline )
else 0 end as overlap /* negative number of days by which the other claim runs past the 90-day window */
,a.cost_event
,b.service_cost as service_cost_other
,case when calculated payable ne "payable" then 0 /* outside the window: nothing is included */
when calculated overlap ne 0
then (intck("day",b.service_start_date,b.service_end_date) + calculated overlap)/intck("day",b.service_start_date,b.service_end_date)
else 1 end as partial_factor /* fully inside the window: the whole cost is included */
,calculated partial_factor * b.service_cost as final_other_cost format=dollar9.2
from event_claims a
left join other_claims b
on a.patient_id=b.patient_id
order by a.patient_id
,a.stay_admission_dt
;quit;
proc sql;
create table total_cost_of_care as
select a.*
,b.total_other_cost format=dollar12.2
,a.cost_event + b.total_other_cost as total_episode_cost format=dollar12.2
from events_others a
inner join
(select patient_id
,stay_admission_dt
,sum(final_other_cost) as total_other_cost
from events_others
group by patient_id
,stay_admission_dt
) b
on (a.patient_id=b.patient_id
and a.stay_admission_dt=b.stay_admission_dt)
;quit;
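As a possible simplification, the included fraction can be computed directly from the clipped date range. This is a minimal sketch, assuming days are counted inclusively (a one-day claim counts as 1 day) and that the window runs from the discharge date to discharge + 90 days; both are assumptions rather than rules stated above. The names overlap_demo, window_start, window_end, total_days, overlap_days, fraction and included_cost are hypothetical, and the rows are three of patient 1's other claims.

```sas
data overlap_demo;
   input service_start_date :mmddyy10. service_end_date :mmddyy10. service_cost;
   /* assumed window: discharge date of patient 1 through discharge + 90 days */
   window_start = '15JUN2019'd;
   window_end   = window_start + 90;
   /* inclusive day counts (an assumption, see the lead-in) */
   total_days   = service_end_date - service_start_date + 1;
   /* clip the claim to the window; max(0, ...) covers claims with no overlap */
   overlap_days = max(0, min(service_end_date, window_end)
                       - max(service_start_date, window_start) + 1);
   fraction      = overlap_days / total_days;
   included_cost = fraction * service_cost;
   format service_start_date service_end_date window_start window_end mmddyy10.
          included_cost dollar12.2;
   datalines;
06/25/2019 08/25/2019 24000
07/01/2019 07/01/2019 200
08/26/2019 09/26/2019 12000
;
run;

proc print data=overlap_demo;
run;
```

Because the fraction comes from one min/max expression, fully contained claims get a fraction of 1 and partially overlapping claims get a value between 0 and 1 without separate CASE branches.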

SUM() in SAS from same table where statement

I'm trying to figure out how to make the following happen in SAS:
within each state, I would like to add the "Both" amount to the Item1 and Item2 amounts separately.
One way to do it:
data have;
input state $ compare $ comp_cnt;
datalines;
NY Both 4000
NY Item1 3500
NY Item2 2000
KY Both 5000
KY Item1 3000
KY Item2 4000
;
proc SQL;
select a.state,
a.compare,
a.comp_cnt +b.comp_cnt as comp_cnt
from
(select * from have
where compare ne 'Both')a
left join
(select * from have
where compare ='Both')b
on a.state=b.state;
quit;

Conditional Merging in SAS

Hi all, I have the two datasets below, in which I need to map Date1 to Date2 within a range of +/- 7 days within an ID.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql;
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a full join set2 as b
on a.ID = b.ID and (Date1 + 7 >= Date2 >= Date1 - 7)
group by a.ID, Date1 having diff = min(diff);
quit;
I get the output below:
[Image: output obtained]
The output I need:
[Image: Expected Output]
The output I get, highlighted in yellow, comes from mapping on the minimum of Diff.
The output I need is highlighted in green: the values in Date2 must stay distinct and must not repeat.
That is, because 02-Aug-2015 is already mapped to 02-Aug-2015 and 09-Aug-2015 is already mapped to 09-Aug-2015 of Date1, I need 04-Aug-2015 of Date1 to be mapped to the remaining 11-Aug-2015.
Judging from the comments above, the rules are quite complex:
* Join based on ID
* The absolute difference between Date1 and Date2 should be at most 7 days
* Preference should be given to matching dates that are equal
* When dates aren't equal, the solution should return as many combinations as possible
* Date1 and Date2 need to be unique
I'm not sure whether this can be done in one PROC SQL step.
You can't work with a simple rule, because it is not always the result with the lowest difference between dates that needs to be returned. On top of that, you need to 'remember' which dates you have already selected per ID, since they can't be selected again further on.
First I made an inner join on ID where the difference in dates was <= 7, which gives all possible valid combinations. Then I put all the dates where the difference was 0 in a macro variable, so that I can build a table of all possible combinations excluding those dates. In the last step, you want the solution that returns the most combinations of Date1 and Date2 where both are distinct.
Data set1;
input ID Date1 ddmmyy8.;
Format Date1 Date11.;
Datalines;
001 02-08-15
001 04-08-15
001 06-08-15
002 11-09-15
002 14-09-15
002 17-09-15
;
run;
Data set2;
input ID TYPE $ Date2 ddmmyy8.;
Format Date2 Date11.;
Datalines;
001 TYPE1 02-08-15
001 TYPE2 11-08-15
001 TYPE3 06-08-15
002 TYPE1 07-09-15
002 TYPE2 04-09-15
002 TYPE3 09-08-15
;
run;
Proc sql noprint;
/*Create a macro variable with all the dates for which the difference between date1
and date2 = 0 */
select distinct put(Date1,yymmddn8.) into :dates separated by ','
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
/*Create table with all lines where difference <= 7 but date is not in the ones with
difference = 0 */
create table out as select a.ID, a.Date1, b.Date2,
intck('days', Date1, Date2) as Diff
from set1 as a inner join set2 as b
on a.ID = b.ID where abs(intck('days', Date1, Date2)) <= 7 and
find("&dates.",put(Date1,yymmddn8.)) = 0 and find("&dates.",put(Date2,yymmddn8.)) = 0;
/* Check the number of possible combinations */
create table out as
select a.*,
b.cnt1 + c.cnt2 as combos
from out a left join (select distinct id,
date1,
count(*) as cnt1
from out
group by id, date1) b on a.date1 = b.date1 and a.id = b.id
left join (select distinct id,
date2,
count(*) as cnt2
from out
group by id, date2) c on a.date2 = c.date2 and a.id = c.id
order by id, combos;
quit;
/* Select unique dates per date1, date2 */
data out(keep = id date1 date2);
retain mem1 mem2;
length mem1 mem2 $100.;
set out;
by id combos;
if first.id then do;
mem1 = "0";
mem2 = "0";
end;
date10 = put(date1,yymmddn8.);
date20 = put(date2,yymmddn8.);
if find(mem1,date10) = 0 and find(mem2,date20) = 0 then do;
mem1 = catx(',',mem1,date10);
mem2 = catx(',',mem2,date20);
output;
end;
run;
/* Create a union between lines with no difference in date and lines with difference
in date*/
proc sql;
create table final as
select * from out
union
select a.ID, a.Date1, b.Date2
from set1 as a inner join set2 as b
on a.ID = b.ID and abs(intck('days', Date1, Date2)) = 0;
quit;
So this gives a table like:
[Image: Final Table]

How to merge 2 datasets with different lengths?

I would like to merge 2 datasets with 2 different dimensions.
TABLE1: people
gender name
M raa
F chico
M july
F sergio
TABLE2: serial_numbers
gender serial
M 4
F 5
I want the result to be
result
gender name serial
M raa 4
F chico 5
M july 4
F sergio 5
Here I create the datasets to illustrate how to merge them:
data people;
infile cards;
length gender $1
name $10;
input gender name;
cards;
M raa
F chico
M july
F sergio
;
run;
data serial_numbers;
length gender $1
serial 8;
infile cards;
input gender serial;
cards;
M 4
F 5
;
run;
Solution 1: use a proc sql to perform the join.
proc sql;
create table result as
select a.gender, a.name, b.serial
from people a LEFT JOIN serial_numbers b
on a.gender=b.gender;
quit;
proc print data=result;
run;
Solution 2: use a data step to merge both datasets. This requires both datasets to be sorted by gender first, so the result comes out in sorted order rather than in the original row order:
proc sort data=people;
by gender;
run;
proc sort data=serial_numbers;
by gender;
run;
data result;
merge people serial_numbers;
by gender;
run;
proc print data=result;
run;
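If sorting is undesirable, a hash-object lookup is another option. This is a minimal sketch, assuming the people and serial_numbers datasets created above; the output name result_hash and the hash name h are hypothetical. Unlike the sorted merge, it keeps people in its original row order.

```sas
data result_hash;
   if _n_ = 1 then do;
      /* never executed, but adds gender and serial to the program data vector */
      if 0 then set serial_numbers;
      /* load the small lookup table into memory, keyed by gender */
      declare hash h(dataset: 'serial_numbers');
      h.defineKey('gender');
      h.defineData('serial');
      h.defineDone();
   end;
   set people;
   /* find() returns 0 on a match; otherwise clear the retained serial value */
   if h.find() ne 0 then call missing(serial);
run;

proc print data=result_hash;
run;
```

This approach tends to suit a small lookup table joined to a larger table, since only the lookup table has to fit in memory.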

How to use proc compare to update dataset

I want to use proc compare to update dataset on a daily basis.
work.HAVE1
Date Key Var1 Var2
01Aug2013 K1 a 2
01Aug2013 K2 a 3
02Aug2013 K1 b 4
work.HAVE2
Date Key Var1 Var2
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
Date and Key uniquely determine one record.
How can I use the above two tables to construct the following
work.WANT
Date Key Var1 Var2
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
I don't want to delete the previous data and then rebuild it. I want to modify it by appending new records at the bottom and adjusting the values in VAR1 or VAR2.
I'm struggling with proc compare but it just doesn't return what I want.
proc compare base=work.HAVE1 compare=work.HAVE2 out=WORK.DIFF outnoequal outcomp;
id Date Key;
run;
This will give you new and changed (unequal) records in a single dataset, WORK.DIFF. You'll have to distinguish new from changed records yourself.
However, what you want to achieve is actually a MERGE, which inserts new records and overwrites existing ones, though for performance or other reasons you may not want to re-create the full table.
data work.WANT;
merge work.HAVE1 work.HAVE2;
by Date Key;
run;
Edit1:
/* outdiff option will produce records with _type_ = 'DIF' for matched keys */
proc compare base=work.HAVE1 compare=work.HAVE2 out=WORK.RESULT outnoequal outcomp outdiff;
id Date Key;
run;
data WORK.DIFF_KEYS; /* keys of changed records */
set WORK.RESULT;
where _type_ = 'DIF';
keep Date Key;
run;
/* split NEW and CHANGED */
data
WORK.NEW
WORK.CHANGED
;
merge
WORK.RESULT (where=( _type_ ne 'DIF'))
WORK.DIFF_KEYS (in = d)
;
by Date Key;
if d then output WORK.CHANGED;
else output WORK.NEW;
run;
Edit2:
Now you can just APPEND the WORK.NEW to target table.
For WORK.CHANGED - either use MODIFY or UPDATE statement to update the records.
Depending on the size of the changes, you can also think about PROC SQL; DELETE to delete old records and PROC APPEND to add new values.
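To illustrate the MODIFY route, here is a minimal sketch that updates work.HAVE1 in place from work.HAVE2, assuming Date and Key uniquely identify a record as stated in the question. It handles changed and new records in a single pass rather than going through WORK.NEW and WORK.CHANGED, so it is a variation on the steps listed above, not the same procedure.

```sas
data work.HAVE1;
   modify work.HAVE1 work.HAVE2;
   by Date Key;
   select (_iorc_);
      /* key found in HAVE1: overwrite the existing record in place */
      when (%sysrc(_sok)) replace;
      /* key not found in HAVE1: append the HAVE2 record as a new row */
      when (%sysrc(_dsenmr)) do;
         _error_ = 0;   /* clear the error flag raised by the failed match */
         output;
      end;
      otherwise do;
         put 'ERROR: unexpected _IORC_ value ' _iorc_=;
         stop;
      end;
   end;
run;
```

Because MODIFY changes the table in place, it may be wise to try this on a copy of work.HAVE1 first.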
All PROC COMPARE will do is tell you the differences between 2 datasets. To achieve your goal you need to use an UPDATE statement in a data step. This way, values in HAVE1 are updated with HAVE2 where the date and key match, or a new record is inserted if there is no match.
data have1;
input Date :date9. Key $ Var1 $ Var2;
format date date9.;
datalines;
01Aug2013 K1 a 2
01Aug2013 K2 a 3
02Aug2013 K1 b 4
;
run;
data have2;
input Date :date9. Key $ Var1 $ Var2;
format date date9.;
datalines;
01Aug2013 K1 a 3
01Aug2013 K2 a 3
02Aug2013 K1 b 4
03Aug2013 K2 c 1
;
run;
data want;
update have1 have2;
by date key;
run;