Flag accounts based on reapplication logic (multiple entries by one person) - SAS

I'm doing a case study for which I need some help.
Background: a number of people have applied for a visa, and some have made multiple applications through different channels. For each application I need to find out whether the person reapplied within 30 days of it (Y/N) and, if so, the channel and date of that reapplication. Every entry should be evaluated independently, even when one person has several entries (see attached image). For example, PETE5O made a first application on 5 Aug, but the second was not made within 30 days of it, so the first is not flagged as reapplied. The third, however, was made within 30 days of the second, so the second is flagged as reapplied, and the reapplication channel and date are those of the third application, and so on. A single person can make any number of applications, and there can be multiple reapplications. Please advise what I should do. I have the information in yellow and want to derive the information in blue.
Code:
data have;
format apply_date date9.;
input Id $ Channel $ Apply_date :date9.;
cards;
SAM1D Online 1-Oct-22
SAM1D Kiosk 9-Oct-22
PETE5O Office 5-Aug-22
PETE5O Kiosk 6-Sep-22
PETE5O Online 8-Sep-22
PETE5O Kiosk 5-Oct-22
;
Thanks in advance.
I expect the columns in blue to be populated (see attached image).

You can use a 1:1 merge of the data with itself, with the second copy offset by one row (firstobs=2) and the rename= data set option used to create the variables for the 'lead' condition. The data must be grouped by id and in chronological order within each id.
Example:
proc sort data=have;
by id apply_date;
run;
data want;
merge
have
have(firstobs=2 rename=(id=nextid channel=nextchannel apply_date=nextapply_date))
;
/* no BY statement: row n of HAVE is paired with row n+1 */
length reapply_flag $1;
reapply_flag = 'N'; /* default, also covers gaps of more than 30 days */
if id = nextid and nextapply_date - apply_date <= 30 then do;
reapply_flag = 'Y';
reapply_channel = nextchannel;
reapply_date = nextapply_date;
reapply_interval = nextapply_date - apply_date;
end;
format reapply_date date9.;
drop next:;
run;

To calculate based on future actions, process the data in reverse chronological order. That way you can use LAG() to remember those future events.
proc sort data=have ;
by id descending apply_date;
run;
data want;
set have;
by id descending apply_date;
length reapply_flag $3 reapply_channel $6 reapply_date 8;
format reapply_date date9.;
lag_date=lag(apply_date);
lag_channel=lag(channel);
reapply_flag='No';
if first.id then call missing(of reapply_channel reapply_date);
else if lag_date - apply_date <= 30 then do;
reapply_flag='Yes';
reapply_channel=lag_channel;
reapply_date=lag_date;
end;
drop lag_: ;
run;
Result:
Obs  Id      Channel  Apply_date  reapply_flag  reapply_channel  reapply_date
1    PETE5O  Kiosk    05OCT2022   No                             .
2    PETE5O  Online   08SEP2022   Yes           Kiosk            05OCT2022
3    PETE5O  Kiosk    06SEP2022   Yes           Online           08SEP2022
4    PETE5O  Office   05AUG2022   No                             .
5    SAM1D   Kiosk    09OCT2022   No                             .
6    SAM1D   Online   01OCT2022   Yes           Kiosk            09OCT2022


Related

How to use MS SQL window function in SAS proc SQL

Hi, I am trying to calculate how much the customer paid in the month by subtracting their balance from the next month's balance.
The data looks like this: I want to calculate PaidAmount for A111 in Jun-20 as Balance in Jul-20 - Balance in Jun-20. Can anyone help, please? Thank you.
For this situation there is no need to look ahead as you can create the output you want just by looking back.
data have;
input id date balance ;
informat date yymmdd10.;
format date yymmdd10.;
cards;
1 2020-06-01 10000
1 2020-07-01 8000
1 2020-08-01 5000
2 2020-06-01 10000
2 2020-07-01 8000
3 2020-08-01 5000
;
data want;
set have ;
by id date;
lag_date=lag(date);
format lag_date yymmdd10.;
lag_balance=lag(balance);
payment = lag_balance - balance ;
if not first.id then output;
if last.id then do;
payment=.;
lag_balance=balance;
lag_date=date;
output;
end;
drop date balance;
rename lag_date = date lag_balance=balance;
run;
proc print;
run;
Result:
Obs id date balance payment
1 1 2020-06-01 10000 2000
2 1 2020-07-01 8000 3000
3 1 2020-08-01 5000 .
4 2 2020-06-01 10000 2000
5 2 2020-07-01 8000 .
6 3 2020-08-01 5000 .
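This is looking for a LEAD calculation, which is typically done with PROC EXPAND, but that requires a SAS/ETS license, which not many users have. Another option is to merge the data with itself, offsetting the records by one so that the next month's record is on the same line.
data want;
merge have
have(firstobs=2 keep=clientID balance
rename=(clientID=next_clientID balance=next_balance));
/* no BY statement: a BY would realign the offset copy within each client */
if clientID ne next_clientID then next_balance = .; /* last month of a client */
PaidAmount = Balance - next_balance;
drop next_clientID;
run;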
If months can be missing from your series, this is not a good approach. In that case, do an explicit join using SQL instead. This assumes you have month as a SAS date as well.
proc sql;
create table want as
select t1.*, t1.balance - t2.balance as paidAmount
from have as t1
left join have as t2
on t1.clientID = t2.ClientID
/*joins current month with next month*/
and intnx('month', t1.month, 1, 'b') = intnx('month', t2.month, 0, 'b');
quit;
Code is untested as no test data was provided (I won't type out your data to test code).
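For sites that do license SAS/ETS, the PROC EXPAND route mentioned above might look like the following minimal sketch. It assumes a hypothetical HAVE with a numeric clientID, a SAS date variable month holding the first day of each month, and balance, with one row per client per month, sorted by clientID and month; the sample values are made up for illustration. The PROC EXPAND output is merged back onto HAVE so the original variables are kept alongside the lead value.

```sas
/* Hypothetical sample data: MONTH is the first day of each month,   */
/* one row per client per month, sorted by clientID month.           */
data have;
  input clientID month :yymmdd10. balance;
  format month yymmdd10.;
  datalines;
1 2020-06-01 10000
1 2020-07-01 8000
1 2020-08-01 5000
2 2020-06-01 10000
2 2020-07-01 8000
;

/* Requires SAS/ETS. LEAD 1 copies the next month's balance onto the */
/* current row within each clientID; the last month gets a missing.  */
proc expand data=have out=have_lead from=month method=none;
  by clientID;
  id month;
  convert balance = next_balance / transformout=(lead 1);
run;

/* Bring the lead back onto the original rows and compute the payment. */
data want;
  merge have have_lead(keep=clientID month next_balance);
  by clientID month;
  PaidAmount = balance - next_balance;
run;
```

METHOD=NONE keeps PROC EXPAND from interpolating, so the step only shifts values by one observation within each BY group.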

Calculate average of the last x years

I have the following data
Date value_idx
2002-01-31 .
2002-01-31 24.533
2002-01-31 26.50
2018-02-28 25.2124
2019-09-12 22.251
2019-01-31 24.214
2019-05-21 25.241
2019-05-21 .
2020-05-21 25.241
2020-05-21 23.232
I would need to calculate the average of value_idx of the last 3 years and 7 years.
I tried first to calculate it as follows:
proc sql;
create table table1 as
select date, avg(value_idx) as avg_value_idx
from table
group by date;
quit;
The problem is that I do not know how to calculate the average of value_idx for the last two years instead of per month. So I think I should extract the year, group by that, and then calculate the average.
I hope one of you can help me with this.
You can use CASE to decide which records contribute to which MEAN. You need to clarify what you mean by last 2 or last 7 years. This code will find the value of the maximum date and then compare the year of that date to the year of the other dates.
proc sql;
select
mean(case when year(max_date)-year(date) < 2 then value_idx else . end) as mean_yr2
,mean(case when year(max_date)-year(date) < 7 then value_idx else . end) as mean_yr7
from have,(select max(date) as max_date from have)
;
quit;
Results
mean_yr2 mean_yr7
------------------
24.0358 24.2319
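If "last N years" should instead be measured back from the latest date rather than by calendar year, INTNX with the 'same' alignment can compute the cutoff. A minimal sketch, assuming the question's data in a table named HAVE with a SAS date variable DATE; the 3- and 7-year windows follow the question, a row must fall strictly after the cutoff to count, and the output names avg_rolling, mean_3yr and mean_7yr are illustrative.

```sas
/* The question's data, re-created so the example runs on its own. */
data have;
  input date :yymmdd10. value_idx;
  format date yymmdd10.;
  datalines;
2002-01-31 .
2002-01-31 24.533
2002-01-31 26.50
2018-02-28 25.2124
2019-09-12 22.251
2019-01-31 24.214
2019-05-21 25.241
2019-05-21 .
2020-05-21 25.241
2020-05-21 23.232
;

/* Rolling windows ending at the latest date: a row counts if it is  */
/* later than the same calendar day 3 (or 7) years before MAX_DATE.  */
/* CASE without ELSE yields missing, which MEAN() ignores.           */
proc sql;
  create table avg_rolling as
  select
    mean(case when date > intnx('year', m.max_date, -3, 'same') then value_idx end) as mean_3yr,
    mean(case when date > intnx('year', m.max_date, -7, 'same') then value_idx end) as mean_7yr
  from have, (select max(date) as max_date from have) as m;
quit;
```

With this data there are no observations between 2002 and 2017, so both windows happen to cover the same rows.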
The best way to do this sort of thing in SAS is with native PROCs, as they have a lot of functionality related to grouping.
In this case, we use multilabel formats to control the grouping. I assume you mean 'Last Three Years' as in calendar 2018/2019/2020 and 'Last Seven Years' as calendar 2014-2020. Presumably you can see how to modify this for other time periods - so long as you aren't trying to make the time period relative to each data point.
We create a format that uses the MULTILABEL option (which allows data points to fall in multiple categories), and the NOTSORTED option (to allow us to force the ordering of the labels, otherwise SEVEN is earlier than THREE).
Then, we use it in PROC TABULATE, enabling it with MLF (MultiLabel Format) and preloadfmt order=data which again keeps the ordering correct. This produces a report with the two averages only.
data have;
informat date yymmdd10.;
input Date value_idx;
datalines;
2002-01-31 .
2002-01-31 24.533
2002-01-31 26.50
2017-02-28 25.2124
2017-09-12 22.251
2018-01-31 24.214
2018-05-21 25.241
2019-05-21 .
2020-05-21 25.241
2020-05-21 23.232
;;;;
run;
proc format;
value yeartabfmt (multilabel notsorted)
'01JAN2018'd-'31DEC2020'd = 'Last Three Years'
'01JAN2014'd-'31DEC2020'd = 'Last Seven Years'
other=' '
;
quit;
proc tabulate data=have;
class date/mlf preloadfmt order=data;
var value_idx;
format date yeartabfmt.;
tables date,value_idx*mean;
run;

SAS aggregate rows computation

I'm a beginner SAS user, especially when it comes to computations that aggregate rows.
Here is a question which I believe some of you may have encountered before.
The data I have relates to insurance policies. Here is an example dataset; the columns from left to right are customer number, policy number, policy status, policy start date and policy cancel date (populated if the policy is not active, otherwise missing).
data have;
informat cust_id 8. pol_num $10. status $10. start_date can_date DDMMYY10.;
input cust_id pol_num status start_date can_date;
format start_date can_date date9.;
datalines;
110 P110001 Cancelled 04/12/2004 10/10/2013
110 P110002 Active 01/03/2005 .
123 P123001 Cancelled 21/07/1998 23/04/2013
123 P123003 Cancelled 22/10/1987 01/11/2011
133 P133001 Active 19/02/2001 .
133 P133001 Active 20/02/2002 .
;
run;
Basically I want to roll this policy-level information up to customer level. If a customer holds at least one active policy, their status is 'Active'; if all of their policies are Cancelled, their status becomes 'Inactive'. I also need a customer "start date", which is the earliest policy start date for that customer. If the customer is 'Inactive', I need the customer's latest policy cancel date as the customer's exit date.
Below is what I need:
data want;
informat cust_id 8. status $10. start_date exit_date DDMMYY10.;
input cust_id status start_date exit_date;
format start_date exit_date date9.;
datalines;
110 Active 01/03/2005 .
123 Inactive 22/10/1987 23/04/2013
133 Active 19/02/2001 .
;
run;
Solution in any form would be much appreciated! Either DATA step or PROC SQL is fine.
Thank you so much.
You can do something like this:
proc sql;
create table want as
select cust_id,
case when count(case when status='Active' then 1 end) > 0
then 'Active'
else 'Inactive'
end as status,
min(start_date) as start_date format=date9.,
case when count(case when status='Active' then 1 end) = 0
then max(can_date)
end as exit_date format=date9.
from have
group by cust_id;
quit;
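You could also attack the question in a DATA step. Here's one way that applies the same rules, assuming your data are sorted by cust_id...
data want(keep=cust_id status start_date exit_date);
length status $10;
format start_date exit_date date9.;
retain status start_date exit_date;
set have(rename=(status=pol_status start_date=pol_start));
by cust_id;
if first.cust_id then do;
status = 'Inactive';
start_date = .;
exit_date = .;
end;
if upcase(pol_status) = 'ACTIVE' then status = 'Active';
start_date = min(start_date, pol_start);
exit_date = max(exit_date, can_date);
if last.cust_id then do;
if status = 'Active' then exit_date = .;
output;
end;
run;
/*BEGINNER NOTES*/
*1] RENAME= frees up the names STATUS and START_DATE for the customer-level
values, while the policy-level values are read as POL_STATUS and POL_START;
*2] RETAIN keeps STATUS, START_DATE and EXIT_DATE from one policy row to the
next within a customer;
*3] FIRST.cust_id and LAST.cust_id are 1 on the first and last observation of
each customer, so the values are reset on the first row and written out on the last;
*4] MIN() and MAX() ignore missing values, and UPCASE() standardizes status
because character comparisons are case-sensitive;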

Establishing treatment sample with Panel data in SAS

I have panel data that looks something like this:
ID year dummy
1234 2007 0
1234 2008 0
1234 2009 0
1234 2010 1
1234 2011 1
2345 2008 0
2345 2009 1
2345 2010 1
2345 2011 1
3456 2008 0
3456 2009 0
3456 2010 1
3456 2011 1
With more observations following the same pattern and many more variables that aren't relevant to this problem.
I want to establish a treatment sample of IDs where the dummy variable "switches" at 2010 (is 0 when year<2010 and 1 when year>=2010). In the example data above, 1234 and 3456 would be in the sample and 2345 would not.
I'm fairly new to SAS and I guess I'm not familiar enough with CLASS and BY statements to figure out how to do this.
So far I've done this:
data c_temp;
set c_data_full;
if year < 2010 and dummy=0
then trtmt_grp=1;
else pre_grp=0;
if year >=2010 and dummy=1
then trtmt_grp=1;
run;
But that doesn't do anything about the panel aspect of the data. I can't figure out how to do the last step of selecting only the IDs where trtmt_grp is 1 for every year.
All help is appreciated! Thanks!
I don't think you need a double DoW loop unless you need to append the result to the other rows. A simple single pass should suffice if you just need a single row per ID that matches.
data want;
set have;
by id;
retain grpcheck; *keep its value from one row to the next;
if first.id and year < 2010 then grpcheck=1; *reset for each ID to 1 (kept);
else if first.id and year ge 2010 then grpcheck=0;
if (year<2010) and (dummy=1) then grpcheck=0; *if a non-zero is found before 2010, set to 0;
if (year >= 2010) and (dummy=0) then grpcheck=0; *if a 0 is found at/after 2010, set to 0;
if last.id and year >= 2010 and grpcheck=1; *if still 1 by last.id and it hits at least 2010 then output;
run;
Any time you want to do some logic for each ID (or, each logically grouped set of rows by some variable's value), you start by setting your flag/etc. in an if first.id statement group. Then, modify your flag as appropriate for each row. Then, add an if last.id group which checks to see if the flag is still set when you've hit the last row.
I think you probably want a double DOW loop: the first loop calculates your TRTMT_GRP flag at the ID level, and the second selects the detailed records.
data want ;
do until (last.id);
set c_data_full;
by id dummy ;
if first.dummy and dummy=1 and year=2010 then trtmt_grp=1;
end;
do until (last.id);
set c_data_full;
by id ;
if trtmt_grp=1 then output;
end;
run;
It seems to me that PROC SQL can deliver a pretty straightforward approach:
proc sql;
select distinct id from have
group by id
having sum(year<=2009 and dummy = 1)=0 and sum(year>=2010 and dummy=0) = 0
;
quit;

Is there any PERIOD datatype for SAS SQL?

I have the following table for people running a marathon
person start end
mike 2-Jun-14 2-Aug-14
nike 3-Jul-14 9-Aug-14
mini 1-Aug-14 3-Nov-14
I want to know if a person was "running the marathon" on the 1st of each month. The desired table should look like this
person running on
mike 1-Jul-14
mike 1-Aug-14
nike 1-Aug-14
mini 1-Aug-14
mini 1-Sep-14
... for all other months
Is there any way to get this result in SAS PROC SQL? Other DBMSs (Teradata, Oracle) have this. What is the best approach in SAS?
I would use a DATA STEP to solve this in SAS.
data have;
input person $ from :anydtdte. to :anydtdte.;
format from to date11.;
datalines;
mike 2-Jun-14 2-Aug-14
nike 3-Jul-14 9-Aug-14
mini 1-Aug-14 3-Nov-14
;
run;
data want(keep=person running_on);
set have;
format Running_on date11.;
do Running_on=from to to;
if day(running_on) = 1 then
output;
end;
run;
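The loop above runs once per day between the two dates. A sketch that steps one month at a time with INTNX instead, assuming the same input (re-created here so the block runs on its own); the renamed variables start_dt and end_dt and the output name want_monthly are illustrative.

```sas
data have;
  input person $ from :date9. to :date9.;
  format from to date11.;
  datalines;
mike 2-Jun-14 2-Aug-14
nike 3-Jul-14 9-Aug-14
mini 1-Aug-14 3-Nov-14
;

data want_monthly(keep=person running_on);
  /* rename FROM/TO to more descriptive names */
  set have(rename=(from=start_dt to=end_dt));
  format running_on date11.;
  /* first 1st-of-month on or after the start date */
  running_on = intnx('month', start_dt, 0, 'b');
  if running_on < start_dt then running_on = intnx('month', running_on, 1, 'b');
  /* output every 1st-of-month up to and including the end date */
  do while (running_on <= end_dt);
    output;
    running_on = intnx('month', running_on, 1, 'b');
  end;
run;
```

For mike this yields 01-JUL-2014 and 01-AUG-2014, the same rows the day-by-day loop produces, but with one iteration per month instead of one per day.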