I'm looking to take a variable observation's date and essentially keep rolling it forward by its specified repricing parameter until a target date
the dataset being used is:
data have;
input repricing_frequency date_of_last_repricing end_date;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
run;
so the idea is that i'd keep applying the repricing frequency of either 3 months, 10 months or 15 months to the date_of_last_repricing until it is as close as it can be to the date "31DEC2017". Thanks in advance.
EDIT including my recent workings:
data want;
set have;
repricing_N = intck('Month',date_of_last_repricing,'31DEC2017'd,'continuous');
dateoflastrepricing = intnx('Month',date_of_last_repricing,repricing_N,'E');
format dateoflastrepricing date9.;
informat dateoflastrepricing date9.;
run;
The INTNX function will compute an incremented date value, and allows the resultant interval alignment to be specified (in your case the 'end' of the month n-months hence)
data have;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
* use 12. to read the raw date values in the datalines;
input repricing_frequency date_of_last_repricing: 12. end_date: 12.;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
run;
data want;
set have;
status = 'Original';
output;
* increment and iterate;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
do while (date_of_last_repricing <= end_date);
status = 'Computed';
output;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
end;
run;
If you want to compute only the nearest end date, as when iterating by repricing frequency, you do not have to iterate. You can divide the months apart by the frequency to get the number of iterations that would have occurred.
data want2;
set have;
nearest_end_month = intnx('month', end_date, 0, 'end');
if nearest_end_month > end_date then nearest_end_month = intnx('month', nearest_end_month, -1, 'end');
months_apart = intck('month', date_of_last_repricing, nearest_end_month);
iterations_apart = floor(months_apart / repricing_frequency);
iteration_months = iterations_apart * repricing_frequency;
nearest_end_date = intnx('month', date_of_last_repricing, iteration_months, 'end');
format nearest: date9.;
run;
proc sql;
select id, max(date_of_last_repricing) as nearest_end_date format=date9. from want group by id;
select id, nearest_end_date from want2;
quit;
Related
I am trying to create a query to find number of occurrences in a list in a SAS dataset, for the past 12 Months starting from Last Month
I have created the macro below to be used in my WHERE clause:
%let cur_date = %sysfunc(today(), date9.);
%let pre_date2 = %sysfunc(putn(%sysfunc(intnx(month, %sysfunc(today()), -1, End)),%sysfunc(intnx(month, %sysfunc(today()), -12, End)) date9.)));
%put &pre_date4;
I would appreciate if you can help me with this.
Thanks
You need two macro variables: one for the end of the prior month and one for the first day 12 months prior to last month.
%let last_month = %sysfunc(intnx(month, %sysfunc(today()), -1, E) );
%let last_12_months = %sysfunc(intnx(month, &last_month., -12, B) );
Now you can run your query using between:
where date BETWEEN &last_month. AND &last_12_months.;
Example:
data have;
do i = -36 to 0;
date = intnx('month', today(), i, 'B');
output;
end;
format date date9.;
drop i;
run;
data want;
set have;
where date BETWEEN &last_month. AND &last_12_months.;
run;
Output:
date
01OCT2020
01NOV2020
01DEC2020
01JAN2021
01FEB2021
01MAR2021
01APR2021
01MAY2021
01JUN2021
01JUL2021
01AUG2021
01SEP2021
Member StartDate EndDate
1 27-Jul-17 27-Oct-17
2 27-Aug-19 11-Sep-19
3 28-Mar-17 31-Jul-17
4 17-Jun-19 13-Aug-19
5 21-Mar-17 16-May-17
6 17-Mar-17 05-Jul-17
7 20-Jan-16 11-Apr-16
8 27-Apr-15 09-Jun-15
9 13-Feb-19 19-Mar-19
10 27-May-15 30-Sep-15
11 16-Dec-16 30-Mar-17
12 17-Nov-16 02-Feb-17
data mydata;
set mdata;
weekdays=intck("weekdays",startdate,enddate);
run;
Holiday data is here:
holiday_date description
01/01/2003 New Years Day
17/02/2003 Family Day
18/04/2003 Good Friday
21/04/2003 Easter Monday
19/05/2003 Victoria Day
01/07/2003 Canada Day
04/08/2003 Civic Holiday
01/09/2003 Labour Day
13/10/2003 Thanksgiving Day
11/11/2003 Remembrance Day
25/12/2003 Christmas Day (*)
26/12/2003 Boxing Day (*)
01/01/2004 New Years Day
16/02/2004 Family Day
09/04/2004 Good Friday
12/04/2004 Easter Monday
24/05/2004 Victoria Day
01/07/2004 Canada Day
02/08/2004 Civic Holiday
I want to calculate number of week days excluding holidays. I have a table with the holidays dates. How do I apply it for each record to calculate business days?
This example is copied from SAS documentation.
Example 3: Using Custom Intervals with the INTCK Function
options intervalds=(BankingDays=BankDayDS);
data BankDayDS(keep=begin);
start = '15DEC1998'd;
stop = '15JAN2002'd;
nwkdays = intck('weekday',start,stop);
do i = 0 to nwkdays;
begin = intnx('weekday',start,i);
year = year(begin);
if begin ne holiday('NEWYEAR',year) and
begin ne holiday('MLK',year) and
begin ne holiday('USPRESIDENTS',year) and
begin ne holiday('MEMORIAL',year) and
begin ne holiday('USINDEPENDENCE',year) and
begin ne holiday('LABOR',year) and
begin ne holiday('COLUMBUS',year) and
begin ne holiday('VETERANS',year) and
begin ne holiday('THANKSGIVING',year) and
begin ne holiday('CHRISTMAS',year) then
output;
end;
format begin date9.;
run;
data CountDays;
start = '01JAN1999'd;
stop = '31DEC2001'd;
ActualDays = intck('DAYS',start,stop);
Weekdays = intck('WEEKDAYS',start,stop);
BankDays = intck('BankingDays',start,stop);
format start stop date9.;
run;
title 'Methods of Counting Days';
proc print data=CountDays;
run;
In response to the comment.
data mydata;
input Member (StartDate EndDate)(:date.);
format startdate enddate date11.;
cards;
1 27-Jul-17 27-Oct-17
2 27-Aug-19 11-Sep-19
3 28-Mar-17 31-Jul-17
4 17-Jun-19 13-Aug-19
5 21-Mar-17 16-May-17
6 17-Mar-17 05-Jul-17
7 20-Jan-16 11-Apr-16
8 27-Apr-15 09-Jun-15
9 13-Feb-19 19-Mar-19
10 27-May-15 30-Sep-15
11 16-Dec-16 30-Mar-17
12 17-Nov-16 02-Feb-17
;;;;
run;
proc print;
run;
proc summary data=mydata nway missing;
output out=range(drop=_:) min(startdate)=start max(enddate)=stop;
run;
proc print;
run;
options intervalds=(BankingDays=BankDayDS);
data BankDayDS(keep=begin);
set range;
nwkdays = intck('weekday',start,stop);
do i = 0 to nwkdays;
begin = intnx('weekday',start,i);
year = year(begin);
if begin ne holiday('NEWYEAR',year) and
begin ne holiday('MLK',year) and
begin ne holiday('USPRESIDENTS',year) and
begin ne holiday('MEMORIAL',year) and
begin ne holiday('USINDEPENDENCE',year) and
begin ne holiday('LABOR',year) and
begin ne holiday('COLUMBUS',year) and
begin ne holiday('VETERANS',year) and
begin ne holiday('THANKSGIVING',year) and
begin ne holiday('CHRISTMAS',year) then
output;
end;
format begin date9.;
run;
data mydata;
set mydata;
ActualDays = intck('DAYS',startdate,enddate);
weekdays = intck("weekdays",startdate,enddate);
BankDays = intck('BankingDays',startdate,enddate);
run;
title 'Methods of Counting Days';
proc print data=mydata;
run;
Since I am new to SAS I need some help to understand how to combine the overlap date ranges into one row.I want to combine the overlap date ranges when they have matching Id. If the dates don’t overlap then I want to keep them as it is. IF they over lap by Matching Id and drug code Then it should combine into one line. Please look at the same ple data set which I have below and the expected results:
Current Data set:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 1/1/2019
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
Expected results:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 3/31/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/2/2018 8/31/2018
1 100 9/1/2018 5/4/2019
I wrote some SAS code but I am combining the dates even when there is no overlap. I want to write some code which should work in SAS.
PROC SORT DATA=Want OUT=ONE;
BY PERSON_ID BEG_DATE DRUG_CODE END_DATE;
RUN;
data TWO (DROP=PERSON_ID2 DRUG_CODE2 BEG_DATE END_DATE
RENAME=(BEG2=BEG_DOS
END2=END_DOS));
SET ONE;
RETAIN BEG2 END2;
PERSON_ID2=LAG1(PERSON_ID);
DRUG_CODE2=LAG1(DRUG_CODE);
IF PERSON_ID2=PERSON_ID AND DRUG_CODE2=DRUG_CODE AND BEG_DATE LE(END2+1) THEN
DO;
BEG2=MIN(BEG_DATE,BEG2);
END2=MAX(END_DATE,END2);
END;
ELSE
DO;
SEG+1;
BEG2=BEG_DATE;
END2=END_DATE;
END;
FORMAT BEG2 END2 MMDDYY10.;
RUN;
DATA THREE(DROP=BEG_DOS END_DOS SEG);
RETAIN BEG_DATE END_DATE;
SET TWO;
BY PERSON_ID SEG;
FORMAT BEG_DATE END_DATE MMDDYY10.;
IF FIRST.SEG THEN
DO;
BEG_DATE=BEG_DOS;
END;
IF LAST.SEG THEN
DO;
END_DATE = END_DOS;
OUTPUT;
END;
RUN;
This is how I would do it. Create an obs for each ID DRUG and DATE. Flag the gaps and summarize by RUN.
data have;
input ID Drug_Code (BEG End)(:mmddyy.);
format BEG End mmddyyd10.;
cards;
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 90 6/1/2018 8/15/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
;;;;
run;
proc print;
run;
/*1 100 1/1/2018 1/1/2019*/
data exv/ view=exv;
set have;
do date = beg to end;
output;
end;
drop beg end;
format date mmddyyd10.;
run;
proc sort data=exv out=ex nodupkey;
by id drug_code date;
run;
data breaksV / view=BreaksV;
set ex;
by id drug_code;
dif = dif(date);
if first.drug_code then do; dif=1; run=1; end;
if dif ne 1 then run+1;
run;
proc summary data=breaksV nway missing;
class id drug_code run;
var date;
output out=want(drop=_type_) min=Begin max=End;
run;
Proc print;
run;
Computing the extent range composed of overlapping segment ranges requires a good understanding of the range conditions (cases).
Consider the scenarios when sorted by start date (within any larger grouping set, G, such as id and drug)
Let [ and ] be endpoints of a range
# be date values (integers) within
Extent be the combined range that grows
Segment be the range in the current row
Case 1 - Growth. Within G Segment start before Extent end
Segment will either not contribute to Extent or extend it.
[####] Extent
+ [#] Segment range DOES NOT contribute
--------
[####] Extent (do not output a row, still growing)
or
[####] Extent
+ [#####] Segment range DOES contribute
--------
[#######] Extent (do not output a row, still growing)
Case 2 - Terminus. 3 possibilities:
Within G Segment start after Extent end,
Next G reached (different id/drug combination),
End of data reached.
#2 and #3 can be tested by checking the appropriate last. flag.
[####] Extent
+ ..[#] Segment beyond Extent (gap is 2)
--------
[####] output Extent
[#] reset Extent to Segment
You can adjust your rules for Segment being adjacent (gap=0) or close enough (gap < threshold) to mean an Extent is either expanded, or, output and reset to Segment.
Note: The situation is a little more (not shown) complicated for the real world cases of:
missing start means the Segment has an unknown start date (presume it to be epoch (0=01JAN1960, or some date that pre-dates all dates in the data or study)
missing end means the Segment is active today (end date is date when processing data)
Sample code:
data have;
call streaminit(42);
do id = 1 to 10;
do _n_ = 1 to 50;
drug = ceil(rand('UNIFORM', 10));
beg_date = intnx ('MONTH', '01JAN2008'D, rand('UNIFORM',20));
end_date = intnx ('DAY', beg_date, rand('UNIFORM',75));
OUTPUT;
end;
end;
format beg_date end_date yymmdd10.;
run;
proc sort data=have out=segments;
by id drug beg_date end_date;
run;
data want;
set segments;
by id drug beg_date end_date; * will error if incoming data is NOT sorted;
retain ext_beg ext_end;
retain gap_allowed 0; * set to 1 for contiguously adjacent segment ;
if first.drug then do;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 0;
end;
if beg_date <= ext_end + gap_allowed then do;
ext_end = max (ext_end, end_date);
segment_count + 1;
end;
else do;
extent_id + 1;
OUTPUT;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 1;
end;
if last.drug then do;
extent_id + 1;
OUTPUT;
* reset occurs implicitly;
* it will happen at first. logic when control returns to top of step;
end;
format ext_: yymmdd10.;
keep id drug ext_beg ext_end segment_count extent_id;
run;
I am trying to go through a list of dates and keep only the date range for dates that 5 or more occurrences and delete all others. The example I have is:
data test;
input dt dt2;
format dt dt2 date9.;
datalines;
20000 20001
20000 20002
20000 20003
21000 21001
21000 21002
21000 21003
21000 21004
21000 21005
;
run;
proc sort data = test;
by dt dt2;
run;
data check;
set test;
by dt dt2;
format dt dt2 date9.;
if last.dt = first.dt then
if abs(last.dt2 - first.dt) < 5 then delete;
run;
What I want returned is just one entry, if possible, but I would be happy with the entire appropriate range as well.
The one entry would be a table that has:
start_dt end_dt
21000 21005
The appropriate range is:
21000 21001
21000 21002
21000 21003
21000 21004
21000 21005
My code doesn't work as desired, and I am not sure what changes I need to make.
last.dt2 and first.dt are flags and can have value in (0,1), so condition abs(last.dt2 - first.dt) < 5 is always true.
Use counter variable to count records in group instead:
data check(drop= count);
length count 8;
count=0;
do until(last.dt);
set test;
by dt dt2;
format dt dt2 date9.;
count = count+1;
if last.dt and count>=5 then output;
end;
run;
I'm not sure why you are looking to use the last.dt2 and the first.dt within your delete function so I have turned it around to create your desired output:
data check2;
set test;
by dt ;
format dt dt2 date9.;
if last.dt then do;
if abs(dt2 - dt) >= 5 then output;
end;
run;
Of course, this will only work if your file is sorted on dt and dt2.
Hope this helps.
I have to calculate the correlation and covariance for my daily sales values for an event window. The event window is of 45 day period and my data looks like -
store_id date sales
5927 12-Jan-07 3,714.00
5927 12-Jan-07 3,259.00
5927 14-Jan-07 3,787.00
5927 14-Jan-07 3,480.00
5927 17-Jan-07 3,646.00
5927 17-Jan-07 3,316.00
4978 18-Jan-07 3,530.00
4978 18-Jan-07 3,103.00
4978 18-Jan-07 3,026.00
4978 21-Jan-07 3,448.00
Now, for every store_id, date combination, I need to go back 45 days (there is more data for each combination in my original data set) calculate the correlation between sales and lag(sales) i.e. autocorrelation of degree one. As you can see, the date column is not continuous. So something like (date - 45) is not going to work.
I have gotten till this part -
data ds1;
set ds;
by store_id;
LAG_SALE = lag(sales);
IF FIRST.store_idTHEN DO;
LAG_SALE = .;
END;
run;
For calculating correlation and covariances -
proc corr data=ds1 outp=Corr
by store_id date;
cov; /** include covariances **/
var sales lag_sale;
run;
But how do I insert the event window for each date, store_id combination? My final output should look something like this -
id date corr cov
5927 12-Jan-07 ... ...
5927 14-Jan-07 ... ...
Here is what I've come up with:
First I convert the date to a SAS date, which is the number of days since Jan. 1 1960:
data ds;
set ds (rename=(date=old_date));
date = input(old_date, date11.);
drop old_date;
run;
Then compute lag_sale (I am using the same calculation you used in the question, but make sure this is what you want to do. For some observations the lag sale is the previous recorded date, but for some it is the same store_id and date, just a different observation.):
proc sort data=ds; by store_id; run;
data ds;
set ds;
by store_id;
lag_sale = lag(sales);
if first.store_id then lag_sale = .;
run;
Then set up the final data set:
data final;
length store_id 8 date 8 cov 8 corr 8;
if _n_ = 0;
run;
Then create a macro which takes a store_id and date and runs proc corr. The first part of the macro selects only the data with that store_id and within the past 45 days of the date. Then it runs proc corr. Then it formats proc corr how you want it and appends the results to the "final" data set.
%macro corr(store_id, date);
data ds2;
set ds;
where store_id = &store_id and %eval(&date-45) <= date <=&date
and lag_sale ne .;
run;
proc corr noprint data=ds2 cov outp=corr;
by store_id;
var sales lag_sale;
run;
data corr2;
set corr;
where _type_ in ('CORR', 'COV') and _name_ = 'sales';
retain cov;
date = &date;
if _type_ = 'COV' then cov = lag_sale;
else do;
corr = lag_sale;
output;
end;
keep store_id date corr cov;
run;
proc append base=final data=corr2 force; run;
%mend corr;
Finally run the macro for each store_id/date combination.
proc sort data=ds out=ds3 nodupkey;
by store_id date;
run;
data _null_;
set ds3;
call execute('%corr('||store_id||','||date||');');
run;
proc sort data=final;
by store_id date;
run;