How can I assign values to a dataset based on time and overlapping numerical ranges? - SAS

I have a credit card transaction dataset (let's call it "Trans") with transaction amount, zip code, and date. I have another dataset (let's call it "Key") that lists sales tax rates based on date and geocode. The Key dataset also includes a range of zip codes associated with each geocode represented by 2 variables: Zip Start and Zip End.
Because Geocodes don't align with zip codes, some of the zip code ranges overlap. If this happens, I want to use the lowest sales tax rate associated with the zip code shown in Trans.
Trans dataset:
TransAmount TransDate TransZip
$200 01/07/1998 90010
$12 02/09/2002 90022
Key dataset:
Geocode Rate StartDate EndDate ZipStart ZipEnd
1001 .0825 199701 200012 90001 90084
1001 .085 200101 200812 90001 90084
1002 .0825 199701 200012 90022 90024
1002 .08 200101 200812 90022 90024
Desired output:
TransAmount TransDate TransZip Rate
$200 01/07/1998 90010 .0825
$12 02/09/2002 90022 .08
I used this basic SQL code in SAS, but I run into the problem of overlapping zip codes.
proc sql;
create table output as
select a.*, b.zipstart, b.zipend, b.startdate, b.enddate, b.rate
from Trans.CA_Zip_Cd_Testing a left join Key.CA_rates b
on a.TransZip ge b.zipstart
and a.TransZip le b.zipend
and a.TransDate ge b.StartDate
and a.transDate le b.EndDate
;
quit;

The easiest way to handle this in the query is to add a correlated subquery that returns the minimum rate.
proc sql;
select t.transamount, t.transdate, t.transzip
,(select min(rate) from Key where t.transzip between ZipStart and ZipEnd and t.transdate between startdate and enddate) as Rate
from trans t;
quit;
You could also compute the minimum in a subquery and join on it.
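A minimal sketch of the join variant follows. It rests on assumptions not confirmed by the question: StartDate and EndDate are numeric yyyymm values (such as 199701), TransDate is a true SAS date, and all ZIP variables share one type. The names key_dates, StartDt and EndDt are made up for illustration.

```sas
/* Assumption: StartDate/EndDate are numeric yyyymm values, TransDate is a SAS date. */
data key_dates;
  set Key;
  /* first day of the start month */
  StartDt = input(put(StartDate, 6.), yymmn6.);
  /* last day of the end month */
  EndDt = intnx('month', input(put(EndDate, 6.), yymmn6.), 0, 'e');
  format StartDt EndDt date9.;
run;

proc sql;
  create table want as
  select t.TransAmount, t.TransDate, t.TransZip, min(k.Rate) as Rate
  from Trans as t
  left join key_dates as k
    on  t.TransZip between k.ZipStart and k.ZipEnd
    and t.TransDate between k.StartDt and k.EndDt
  /* one output row per distinct transaction; the lowest matching rate wins */
  group by t.TransAmount, t.TransDate, t.TransZip;
quit;
```

Because of the GROUP BY, two transactions with identical amount, date and ZIP collapse into one row. If the real table has a transaction ID, adding it to the SELECT and GROUP BY lists would avoid that.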

The SAS SQL optimizer is sometimes good and sometimes a challenge. The code below is a bit more complicated, but it will likely be faster. It loads the key table into a hash object, so it is subject to size (memory) constraints on your key table.
data key;
set key;
dummy_key=1;
run;
data want(drop=dummy_key geocode rate startDate endDate zipStart zipEnd rc i);
if _n_ = 1 then do;
if 0 then set key;
declare hash k (dataset:'key',multidata:'y');
k.defineKey('dummy_key');
k.defineData('geocode','rate','startdate','enddate','zipstart','zipend');
k.defineDone();
end;
call missing (of _all_);
set trans;
dummy_key=1;
rc = k.find();
do i=1 to 1000 while (rc=0);
transZipNum = input(transZip,8.); *converts character zip to number. if its already a number then remove;
zipStartNum = input(zipStart,8.);
zipEndNum = input(zipEnd,8.);
if startDate <= transDate <= endDate then do;
if zipStartNum <= transZipNum <= zipEndNum then do;
rate_out = min(rate_out,rate);
end;
end;
rc=k.find_next();
end;
run;

Related

SAS. calculate % from DISTINCT COUNT

I am working in SAS Studio Version: 2022.09.
I am working with survey data and will be tracking Region-Facility that has not submitted a survey in over 3 weeks. Surveys are voluntary but ideally facilities will submit a new survey weekly.
Region | Facility (Type&Name) | Date Survey Submitted
North | Hospital-Baptist Hospital | 1/01/2023
South | PCP-Family Care | 1/01/2023
North | PCP- Primary Medical | 1/08/2023
South | PCP-Family Care | 1/08/2023
North | Hospital-Baptist Hospital | 1/15/2023
North | Hospital-St Mary Hospital | 1/15/2023
West | Daycare-Early Learning | 1/15/2023
West | Hospital-Methodist | 1/15/2023
South | Daycare-Early Learning | 1/15/2023
The code below produces a list of facilities by region that have submitted before but have not submitted in the last 3 weeks. Since we do not expect to be successful with every facility, we will stop following facilities after 10 weeks.
Data have;
set want;
DaysDiff=intck('day', Date, today());
run;
proc sort data=have;
by Facility Region Date;
run;
data have;
set have;
by Facility;
if last.Facility;
run;
proc sort data=have
out=SurveysMissing;
BY Region Facility;
WHERE DaysDiff>21 AND DaysDiff<70;
run;
To assist in determining significance of losing facilities that had not submitted recently, I would like to obtain a %.
[Total # of facilities per REGION that have not submitted survey >21 <70] / [Total # of facilities per REGION that have reported in the last 10 weeks]
/* # facilities not submitted >21 AND <70 */
proc sql;
SELECT Count(Distinct Facility) AS Count, Region
FROM have
WHERE DaysDiff>21 AND DaysDiff <70
GROUP BY Region;
quit;
/*Count of Distinct Facilities per Region*/
proc sql;
SELECT Count(Distinct Facility) AS Count, Region
FROM have
WHERE DaysDiff <70
GROUP BY Region;
quit;
Would I need to create tables and do a left join to calculate %?
Thanks.
In Proc SQL a true condition resolves to 1 and false to 0. You can leverage this feature to compute the ratio of sums of expressions or binary flags.
Example:
Compute the ratio based on a subquery that flags facilities
proc sql;
create table want as
select
region, sum (isquiet_flag) / sum (submitted_flag) as quiet_fraction label = 'Fraction of quiet facilities'
from
( select region, facility
, min(today() - date_submitted ) > 21 as isquiet_flag
, min(today() - date_submitted ) < 70 as submitted_flag
from have
where today() - date_submitted < 70
group by region, facility
)
group by
region
;
quit;
In your last data step for have, add an indicator for a missing survey.
data have;
set have;
by Facility;
if last.Facility;
surveymissing = (daysdiff > 21); * contains 1 if condition is true, otherwise 0;
run;
Then use proc summary to compute your numerator and denominator for each region. The numerator is the sum of surveymissing while the denominator is the count of the same.
proc summary data=have nway;
where daysdiff < 70;
class region;
var surveymissing;
output out=region_summary (drop=_:) sum=SurveysMissing n=TotalFacilities;
run;
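As a hedged follow-up, the percentage itself could be derived from the summary dataset in one more short step. The names region_pct and PctMissing are made up for illustration; SurveysMissing and TotalFacilities come from the OUTPUT statement above.

```sas
data region_pct;
  set region_summary;
  /* guard against division by zero, although N should be at least 1 per region */
  if TotalFacilities > 0 then PctMissing = SurveysMissing / TotalFacilities;
  format PctMissing percent8.1;
run;

proc print data=region_pct noobs;
run;
```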

SAS Macro help to loop monthly sas datasets

I have monthly datasets in a SAS library for customers from Jan 2013 onwards, named CUST_JAN2013, CUST_FEB2013, ..., CUST_OCT2017. Each monthly dataset is large, with about 2 million members, and has two columns (customer number and customer monthly expenses).
I have one input dataset, Cust_Expense, with customer number and month as columns. It has only 250,000 members, and I want to pull expense data for each member from the SPECIFIC monthly SAS dataset by joining on customer number.
Cust_Expense
------------
Customer_Number Month
111 FEB2014
987 APR2017
784 FEB2014
768 APR2017
.....
145 AUG2017
345 AUG2014
I have tried using call execute, but it loops through each of the 250,000 records of the input dataset (Cust_Expense) and joins with the corresponding monthly customer table, which takes too much time.
Is there a way to read the input table (Cust_Expense) by month, so that we read all customers for a specific month and then read that monthly table ONCE to pull all the records for that month, instead of looping 250,000 times?
Depending on what you want the result to be, you can create one output per month by filtering cust_expense per month and joining with the corresponding monthly dataset:
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expense
;
quit;
proc sql;
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
create table want_&month. as
select *
from cust_expense(where=(month="&month.")) t1
inner join cust_&month. t2
on t1.customer_number=t2.customer_number
;
%end;
quit;
%mend;
%want;
Or you could have one output using one join by 'unioning' all those monthly datasets into one and dynamically adding a month column.
%macro want;
proc sql noprint;
select distinct month
into :months separated by ' '
from cust_expense
;
quit;
proc sql;
create table want as
select *
from cust_expense t1
inner join (
%do i=1 %to %sysfunc(countw(&months));
%let month=%scan(&months,&i,%str( ));
%if &i>1 %then union;
select *, "&month." as month
from cust_&month
%end;
) t2
on t1.customer_number=t2.customer_number
and t1.month=t2.month
;
quit;
%mend;
%want;
In either case, I don't really see the point in joining those monthly datasets with the cust_expense dataset. The latter does not seem to hold any information that isn't already present in the monthly datasets.
Your first, best answer is to get rid of these monthly separate tables and make them into one large table with ID and month as key. Then you can simply join on this and go on your way. Having many separate tables like this where a data element determines what table they're in is never a good idea. Then index on month to make it faster.
If you can't do that, then try creating a view that is all of those tables unioned. It may be faster to do that; SAS might decide to materialize the view but maybe not (but if it's extremely slow, then look in your temp table space to see if that's what's happening).
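A rough sketch of such a view, using a data step view rather than an SQL view, is shown below. Only two monthly tables are listed, as an assumption for brevity, and the name cust_all is made up. It also assumes the monthly tables name their key column customer_number, as the earlier answers do.

```sas
/* Assumption: the real SET list would name every CUST_<MONYYYY> table. */
data cust_all / view=cust_all;
  length month $7;
  set cust_jan2013 cust_feb2013 indsname=_src;
  /* _src holds e.g. WORK.CUST_JAN2013; keep the part after the last underscore */
  month = scan(_src, -1, '_');
run;

proc sql;
  create table want as
  select a.*
  from cust_expense as e
  inner join cust_all as a
    on  e.customer_number = a.customer_number
    and e.month = a.month;
quit;
```

INDSNAME= returns the dataset name in upper case, so this matches only if the month values in cust_expense are upper case too (as in the sample, e.g. FEB2014).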
Third option then is probably to make use of SAS formats. Turn the smaller table into a format, using the CNTLIN option. Then a single large datastep will allow you to perform the join.
data want;
set jan feb mar apr ... ;
where put(id,CUSTEXPF1.) = '1';
run;
That makes only one pass through the 250k table and one pass through the monthly tables, plus a very fast format lookup whose cost is effectively zero in this data step (the disk I/O will be slower).
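For illustration, the format could be built from the smaller table roughly as follows. This is a sketch under assumptions: customer_number is numeric, the format name CUSTEXPF is taken from the snippet above, and cust_ids and cntlin are made-up dataset names.

```sas
/* A format needs unique start values, so de-duplicate the customer numbers first. */
proc sort data=cust_expense(keep=customer_number) out=cust_ids nodupkey;
  by customer_number;
run;

data cntlin;
  set cust_ids end=last;
  retain fmtname 'CUSTEXPF' type 'N';
  start = customer_number;
  label = '1';            /* customer is in the lookup table */
  output;
  if last then do;
    hlo = 'O';            /* OTHER: every value not listed above */
    label = '0';
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=cntlin;
run;
```

Note that this lookup tests only whether the customer number is present. It does not check that the month matches, so a month comparison would still be needed if that matters.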
I guess you could output your data to specific datasets, as in this example:
data test;
infile datalines dsd;
input ID : $2. MONTH : $3. ;
datalines;
1,JAN
2,JAN
3,JAN
4,FEB
5,FEB
6,MAR
7,MAR
8,MAR
9,MAR
;
run;
data JAN FEB MAR;
set test;
if MONTH = "JAN" then output JAN;
if MONTH = "FEB" then output FEB;
if MONTH = "MAR" then output MAR;
run;
This avoids looping through all 250,000 IDs individually and relies on the data step's OUTPUT statement instead.
At the end you will have one dataset per month (12 for a full year), each containing the related IDs.
In your case the month value looks like FEB2014, so you would use the SUBSTR function, and the condition in the data step becomes:
...
set test;
...
if SUBSTR(MONTH,1,3)="FEB" then output FEB;
...
Regards

Flagging values based on subsequent occurences using first. retain etc

Thanks in advance to anyone who can help. I have a dataset as below:
data smp;
infile datalines dlm=',';
informat identifier $7. trx_date $9. transaction_id $13. product_description $50. ;
input identifier $ trx_date transaction_id $ product_description $ ;
datalines;
Cust1,11Aug2016,20-0030417313,ONKEN BIOPOT F/FREE STRAWBERRY
Cust1,11Aug2016,20-0030417313,ONKEN BIOPOT F/FREE STRAWBERRY
Cust1,11Aug2016,20-0030417313,ONKEN BIOPOT FULL STRAWB/GRAIN
Cust1,11Aug2016,20-0030417313,RACHELS YOG GREEK NAT F/F/ORG
Cust1,03Nov2016,23-0040737060,RACHELS YOG GREEK NAT F/F/ORG
Cust3,13Feb2016,39-0070595440,COLLECT YOG LEMON
Cust3,21Jun2016,34-0050769524,AF YOG FARMHOUSE STRAWB/REDCUR
Cust3,21Jun2016,34-0050769524,Y/VALLEY GREEK HONEY ORGANIC
Cust3,21Jun2016,34-0050769524,Y/VALLEY THICK LEMON CURD ORG
Cust3,21Jun2016,34-0050769524,Y/VALLEY THICK YOG FRUITY FAVS
Cust3,21Jun2016,34-0050769524,Y/VALLEY THICK YOG STRAWB ORG
Cust3,26Jun2016,39-0430106897,TOTAL GREEK YOGURT 0%
Cust3,14Aug2016,54-0040266755,M/BUNCH SQUASHUMS STRAW/RASP
Cust3,14Aug2016,54-0040266755,MULLER CORNER STRAWBERRY
Cust3,14Aug2016,54-0040266755,TOTAL GREEK YOGURT 0%
Cust3,22Aug2016,54-0050447336,M/BUNCH SQUASHUMS STRAW/RASP
;
For each customer (and each of their purchases, identified by transaction_id), I want to flag each product that is repurchased during the customer's next visit (only the next visit) on a rolling basis. In the dataset above, the correct flags would be on rows 4, 12 and 13, because these products are bought again on the customer's next visit.
I'm trying to do it with the following program:
proc sort data = smp out = td;
by descending identifier transaction_id product_description;
run;
DATA TD2(DROP=tmp_product);
SET td;
BY identifier transaction_id product_description;
RETAIN tmp_product;
IF FIRST.product_description and first.transaction_id THEN DO;
tmp_product = product_description;
END;
ATTRIB repeat_flag FORMAT=$1.;
IF NOT FIRST.product_description THEN DO;
IF tmp_product EQ product_description THEN repeat_flag ='Y';
ELSE repeat_flag = 'N';
END;
RUN;
proc sort data = td2;
by descending identifier transaction_id product_description;
run;
But it's not working. If someone could please help, it would be fab.
Best Wishes
Another method is to create a dummy group variable in both the original dataset and a temporary dataset. In the original dataset, group numbers the visits in date order per customer. In the temporary dataset, numbering starts from the customer's SECOND visit, so a given group number there corresponds to the visit after the one with the same number in the original dataset. With this dummy group, a hash table lookup easily finds products that were repurchased on the next visit. Note that DIF() requires trx_date to be a numeric SAS date, so the character dates in the sample data would need converting first.
proc sort data=smp;
by identifier trx_date;
run;
data have(drop=_group) temp(drop=group rename=(_group=group));
set smp;
by identifier trx_date;
if first.identifier then do;
group=1; _group=0;
end;
if dif(trx_date)>0 then do;
group+1; _group+1;
end;
if _group^=0 then output temp;
output have;
run;
data want;
if 0 then set temp;
if _n_=1 then do;
declare hash h(dataset:'temp');
h.definekey('identifier','group','product_description');
h.definedata('product_description');
h.definedone();
end;
set have;
flag=(h.find()=0);
drop group;
run;
The method below will "look ahead" to the next row (opposite to LAG) after sorting so you can bring comparisons onto the same row for simple logic:
** convert character date to numeric **;
data smp1; set smp;
TRX_DATE_NUM = input(TRX_DATE,ANYDTDTE10.);
format TRX_DATE_NUM mmddyy10.;
run;
** sort **;
proc sort data = smp1;
by IDENTIFIER PRODUCT_DESCRIPTION TRX_DATE_NUM;
run;
** look ahead at the next observations and use logic to identify flags **;
data look_ahead;
set smp1;
by IDENTIFIER;
set smp1 (firstobs = 2
keep = IDENTIFIER PRODUCT_DESCRIPTION TRX_DATE_NUM
rename = (IDENTIFIER = NEXT_ID PRODUCT_DESCRIPTION = NEXT_PROD TRX_DATE_NUM = NEXT_DT))
smp1 (obs = 1 drop = _ALL_);
if last.IDENTIFIER then do;
NEXT_ID = "";
NEXT_PROD = "";
NEXT_DT = .;
end;
run;
** logic says if the next row is the same customer who bought the same product on a different date then flag **;
data look_ahead_final; set look_ahead;
if IDENTIFIER = NEXT_ID and NEXT_PROD = PRODUCT_DESCRIPTION and TRX_DATE_NUM ne NEXT_DT then FLAG = 1;
else FLAG = 0;
run;
There are a few ways to do this; I think the simplest to understand, while still having a reasonable level of performance, is to sort the data in descending date order and then use an array to store the product_descriptions of the last trx_date.
Here I use a 2 dimensional array where the first dimension is just a 1/2 value; each trx_date simultaneously loads one row of the array and checks against the other row of the array (using _array_switch to determine which is being loaded/checked).
You could do the same thing with a hash table, and it would be appreciably faster along with perhaps a bit less complicated in some ways; if you are familiar with hash tables and want to see that solution comment and I or someone else can provide it.
You also could use SQL to do this, and I suspect that is the most common solution overall, but I couldn't quite get it to work, as it has some complexity with subqueries within subqueries the way I was approaching it, and I'm apparently not good enough with those.
Here's the array solution. Set the second dimension of prods to a reasonable maximum for your data. It could even be thousands: this is a temporary array and does not use much memory, so setting it to 32000 or so would not be a big deal.
proc sort data=smp;
by identifier descending trx_date ;
run;
data want;
array prods[2,20] $255. _temporary_;
retain _array_switch 2;
do _n_ = 1 by 1 until (last.trx_date);
set smp;
by identifier descending trx_date;
/* for first row for an identifier, clear out the whole thing */
if first.identifier then do;
call missing(of prods[*]);
end;
/* for first row of a trx_date, clear out the array-row we were looking at last time, and switch _array_switch to the other value */
if first.trx_date then do;
do _i = 1 to dim(prods,2);
if missing(prods[_array_switch,_i]) then leave;
call missing(prods[_array_switch,_i]);
end;
_array_switch = 3-_array_switch;
end;
*now check the array to see if we should set next_trans_flag;
next_trans_flag='N';
do _i = 1 to dim(prods,2);
if missing(prods[_array_switch,_i]) then leave; *for speed;
if prods[_array_switch,_i] = product_description then next_trans_flag='Y';
end;
prods[3-_array_switch,_n_] = product_description; *set for next trx_date;
output;
end;
drop _:;
run;
I think to really answer this you need to generate a list of distinct visit*product combinations. And also a list of the distinct products bought on particular visits.
proc sql noprint ;
create table bought as
select distinct identifier, product_description, trx_date, transaction_id
from smp
order by 1,2,3,4
;
create table all_visits as
select a.identifier, product_description, trx_date, transaction_id
from (select distinct identifier,product_description from bought) a
natural join (select distinct identifier,transaction_id,trx_date from bought) b
order by 1,2,3,4
;
quit;
You can then combine them and make a flag for whether the product was bought on that visit.
data check ;
merge all_visits bought(in=in1) ;
by identifier product_description trx_date transaction_id ;
bought=in1;
run;
You can now use a lead technique to figure out whether they also bought the product on the next visit.
data flag ;
set check ;
by identifier product_description trx_date transaction_id ;
set check(firstobs=2 keep=bought rename=(bought=bought_next)) check(drop=_all_ obs=1);
if last.product_description then bought_next=0;
run;
You can then combine back with the actual purchases and eliminate the extra dummy records.
proc sort data=smp;
by identifier product_description trx_date transaction_id ;
run;
data want ;
merge flag smp (in=in1);
by identifier product_description trx_date transaction_id ;
if in1 ;
run;
Let's put the records back into the original order so we can check the results (this assumes a ROW variable holding the original observation number was added to SMP beforehand).
proc sort; by row; run;
proc print; run;

SAS software: How to delete observations with more than five zeros for the dependent variable

I have consumer panel data with weekly recorded spending at a retail store. The unique identifier is household ID. I would like to delete households with more than five zeros in spending, that is, households that did not make any purchase for more than five weeks. Once such a household is identified, I will delete all observations associated with its ID. Does anyone know how I can implement this procedure in SAS? Thanks.
I think proc SQL would be good here.
This could be done in a single step with a more complex subquery but it is probably better to break it down into 2 steps.
Count how many zeroes each household ID has.
Filter to only include household IDs that have 5 or fewer zeroes.
proc sql;
create table zero_cnt as
select distinct household_id,
sum(case when spending = 0 then 1 else 0 end) as num_zeroes
from original_data
group by household_id;
create table wanted as
select *
from original_data
where household_id in (select distinct household_id from zero_cnt where num_zeroes <= 5);
quit;
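The single-step variant mentioned above might look like the sketch below. It relies on PROC SQL remerging the summary statistic back onto the detail rows (SAS writes a note about this to the log) and on a true comparison evaluating to 1. The output name wanted_onestep is made up.

```sas
proc sql;
  create table wanted_onestep as
  select *
  from original_data
  group by household_id
  /* spending = 0 evaluates to 1 when true, so the sum counts the zero weeks */
  having sum(spending = 0) <= 5;
quit;
```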
Edit:
If the zeroes have to be consecutive then the method of building the list of IDs to exclude is different.
* Sort by ID and date;
proc sort data = original_data out = sorted_data;
by household_id date;
run;
Use the LAG function to check the previous spending amounts.
More info on LAG here: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm
data excluded;
set sorted_data;
by household_id;
array prev{*} _L1-_L4;
_L1 = lag(spending);
_L2 = lag2(spending);
_L3 = lag3(spending);
_L4 = lag4(spending);
* Create running count for the number of observations for each ID;
if first.household_id then spend_cnt = 0;
spend_cnt + 1;
* Check if current ID has at least 5 observations to check. If so, add up current spending and previous 4 and output if they are all zero/missing;
if spend_cnt >= 5 then do;
if sum(spending, of prev[*]) = 0 then output;
end;
keep household_id;
run;
Then just use a subquery or match merge to remove the IDs in the 'excluded' dataset.
proc sql;
create table wanted as
select *
from original_data
where household_id not in(select distinct household_id from excluded);
quit;

Create a loop in SAS which filters 2 variables simultaneously

The question might be quite vague but I could not come up with a decent concise title.
I have data with id, date, amountA and amountB as my variables. The task is to pick the dates that are within 10 days of each other, check whether their amountA values are within 20% of each other, and if they are, pick the one with the highest amountB. This is the data I have:
id date amountA amountB
1 1/15/2014 1000 79
1 1/16/2014 1100 81
1 1/30/2014 700 50
1 2/05/2014 710 80
1 2/25/2014 720 50
This is what I need
id date amountA amountB
1 1/16/2014 1100 81
1 1/30/2014 700 50
1 2/25/2014 720 50
I wrote the code below, but the problem is that it's not automatic and has to be done case by case. I need a way to loop it so that it outputs the results automatically. I am no pro at looping and hence am stuck. Any help is greatly appreciated.
data test2;
set test1;
diff_days=abs(intck('days',first_dt,date));
if diff_days<=10 then flag=1;
else if diff_days>10 then flag=0;
run;
data test3 rem_test3;
set test2;
if flag=1 then output test3;
else output rem_test3;
run;
proc sort data=test3;
by id amountA;
run;
data all_within;
set test3;
by id amountA;
amtA_lag=lag1(amountA);
if first.id then
do;
counter=1;
flag1=1;
end;
if first.id=0 then
do;
counter+1;
diff=abs(amountA-amtA_lag);
if diff<(10/100*amountA) then flag1+1;
else flag1=0;
end;
if last.stay and flag1=counter then output all_within;
run;
If I understand the problem correctly, you want to group all records together that have (no skip of 10+ days) and (amt A w/in 20%)?
Looping isn't your problem - no explicitly coded loop is needed to do this (or at least, the way I think of it). SAS does the data step loop for you.
What you want to do is:
Identify groups. A group is the consecutive records that you want to, among them, collapse to one row. It's not perfectly clear to me how amountA has to behave here - does the whole group need to have less than a maximum difference of 10%, or a record to next record difference of < 10%, or a (current highest amtB of group) < 10% - but you can easily identify all of these rules. Use a RETAINed variable to keep track of the previous amountA, previous date, highest amountB, date associated with the highest amountB, amountA associated with highest amountB.
When you find a record that doesn't fit in the current group, output a record with the values of the previous group.
You shouldn't need two steps for this, although you can if you want to see it more easily - this may be helpful for debugging your rules. Set it so that you have a GroupNum variable, which you RETAIN, and you increment that any time you see a record that causes a new group to start.
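A minimal sketch of the GroupNum idea follows, under one possible reading of the rules: a new group starts when the gap to the previous record exceeds 10 days or amountA differs from the previous record by more than 20%. The thresholds come from the question, the record-to-record comparison is an assumption, and the dataset names grouped and want are made up.

```sas
proc sort data=have;
  by id date;
run;

data grouped;
  set have;
  by id date;
  /* LAG is called on every row so its queue stays aligned with the data */
  prev_date = lag(date);
  prev_amtA = lag(amountA);
  /* the sum statement implicitly RETAINs GroupNum, starting from 0 */
  if first.id
     or date - prev_date > 10
     or abs(amountA - prev_amtA) > 0.2 * prev_amtA
  then GroupNum + 1;
  drop prev_:;
run;

/* keep the record with the highest amountB in each group */
proc sort data=grouped;
  by GroupNum descending amountB;
run;

data want;
  set grouped;
  by GroupNum;
  if first.GroupNum;
run;
```

Keeping GroupNum in the intermediate dataset makes it easy to inspect which records were grouped together before collapsing them, which helps when adjusting the rule.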
I had trouble figuring out the rules...but here is some code that checks each record against the previous for the criteria I think you want.
Data HAVE;
input id date :mmddyy10. amountA amountB ;
format date mmddyy10.;
datalines;
1 1/15/2014 1000 79
1 1/16/2014 1100 81
1 1/30/2014 700 50
1 2/05/2014 710 80
1 2/25/2014 720 50
;
Proc Sort data=HAVE;
by id date;
Run;
Data WANT(drop=Prev_:);
Set HAVE;
Prev_Date=lag(date);
Prev_amounta=lag(amounta);
Prev_amountb=lag(amountb);
If not missing(prev_date);
If date-prev_date<=10 then do;
If (amounta-prev_amounta)/amounta<=.1 then do;
If amountb<prev_amountb then do;
Date=prev_date;
AmountA=prev_amounta;
AmountB=prev_amountb;
end;
end;
end;
Else delete;
Run;
Here is a method that I think should work. The basic approach is:
Find all the pairs of sufficiently close observations
Join the pairs with themselves to get all connected ids
Reduce the groups
Join to the original data and get the desired values
data have;
input
id
date :mmddyy10.
amountA
amountB;
format date mmddyy10.;
datalines;
1 1/15/2014 1000 79
2 1/16/2014 1100 81
3 1/30/2014 700 50
4 2/05/2014 710 80
5 2/25/2014 720 50
;
run;
/* Count the observations */
%let dsid = %sysfunc(open(have));
%let nobs = %sysfunc(attrn(&dsid., nobs));
%let rc = %sysfunc(close(&dsid.));
/* Output any connected pairs */
data map;
array vals[3, &nobs.] _temporary_;
set have;
/* Put all the values in an array for comparison */
vals[1, _N_] = id;
vals[2, _N_] = date;
vals[3, _N_] = amountA;
/* Output all pairs of ids which form an acceptable pair */
do i = 1 to _N_;
if
abs(vals[2, i] - date) < 10 and
abs((vals[3, i] - amountA) / amountA) < 0.2
then do;
id2 = vals[1, i];
output;
end;
end;
keep id id2;
run;
proc sql;
/* Reduce the connections into groups */
create table groups as
select
a.id,
min(min(a.id, a.id2, b.id)) as group
from map as a
left join map as b
on a.id = b.id2
group by a.id;
/* Get the final output */
create table lookup (where = (amountB = maxB)) as
select
have.*,
groups.group,
max(have.amountB) as maxB
from have
left join groups
on have.id = groups.id
group by groups.group;
quit;
The code works for the example data. However, the group reduction is insufficient for more complicated data. Fortunately, there are established approaches for finding all the subgraphs given a set of edges, including one using SAS/OR.