Member StartDate EndDate
1 27-Jul-17 27-Oct-17
2 27-Aug-19 11-Sep-19
3 28-Mar-17 31-Jul-17
4 17-Jun-19 13-Aug-19
5 21-Mar-17 16-May-17
6 17-Mar-17 05-Jul-17
7 20-Jan-16 11-Apr-16
8 27-Apr-15 09-Jun-15
9 13-Feb-19 19-Mar-19
10 27-May-15 30-Sep-15
11 16-Dec-16 30-Mar-17
12 17-Nov-16 02-Feb-17
data mydata;
set mdata;
weekdays=intck("weekdays",startdate,enddate);
run;
Holiday data is here:
holiday_date description
01/01/2003 New Years Day
17/02/2003 Family Day
18/04/2003 Good Friday
21/04/2003 Easter Monday
19/05/2003 Victoria Day
01/07/2003 Canada Day
04/08/2003 Civic Holiday
01/09/2003 Labour Day
13/10/2003 Thanksgiving Day
11/11/2003 Remembrance Day
25/12/2003 Christmas Day (*)
26/12/2003 Boxing Day (*)
01/01/2004 New Years Day
16/02/2004 Family Day
09/04/2004 Good Friday
12/04/2004 Easter Monday
24/05/2004 Victoria Day
01/07/2004 Canada Day
02/08/2004 Civic Holiday
I want to calculate the number of weekdays excluding holidays. I have a table with the holiday dates. How do I apply it to each record to calculate business days?
This example is copied from the SAS documentation.
Example 3: Using Custom Intervals with the INTCK Function
options intervalds=(BankingDays=BankDayDS);
data BankDayDS(keep=begin);
start = '15DEC1998'd;
stop = '15JAN2002'd;
nwkdays = intck('weekday',start,stop);
do i = 0 to nwkdays;
begin = intnx('weekday',start,i);
year = year(begin);
if begin ne holiday('NEWYEAR',year) and
begin ne holiday('MLK',year) and
begin ne holiday('USPRESIDENTS',year) and
begin ne holiday('MEMORIAL',year) and
begin ne holiday('USINDEPENDENCE',year) and
begin ne holiday('LABOR',year) and
begin ne holiday('COLUMBUS',year) and
begin ne holiday('VETERANS',year) and
begin ne holiday('THANKSGIVING',year) and
begin ne holiday('CHRISTMAS',year) then
output;
end;
format begin date9.;
run;
data CountDays;
start = '01JAN1999'd;
stop = '31DEC2001'd;
ActualDays = intck('DAYS',start,stop);
Weekdays = intck('WEEKDAYS',start,stop);
BankDays = intck('BankingDays',start,stop);
format start stop date9.;
run;
title 'Methods of Counting Days';
proc print data=CountDays;
run;
In response to the comment, here is the same approach applied to your data:
data mydata;
input Member (StartDate EndDate)(:date.);
format startdate enddate date11.;
cards;
1 27-Jul-17 27-Oct-17
2 27-Aug-19 11-Sep-19
3 28-Mar-17 31-Jul-17
4 17-Jun-19 13-Aug-19
5 21-Mar-17 16-May-17
6 17-Mar-17 05-Jul-17
7 20-Jan-16 11-Apr-16
8 27-Apr-15 09-Jun-15
9 13-Feb-19 19-Mar-19
10 27-May-15 30-Sep-15
11 16-Dec-16 30-Mar-17
12 17-Nov-16 02-Feb-17
;;;;
run;
proc print;
run;
proc summary data=mydata nway missing;
output out=range(drop=_:) min(startdate)=start max(enddate)=stop;
run;
proc print;
run;
options intervalds=(BankingDays=BankDayDS);
data BankDayDS(keep=begin);
set range;
nwkdays = intck('weekday',start,stop);
do i = 0 to nwkdays;
begin = intnx('weekday',start,i);
year = year(begin);
if begin ne holiday('NEWYEAR',year) and
begin ne holiday('MLK',year) and
begin ne holiday('USPRESIDENTS',year) and
begin ne holiday('MEMORIAL',year) and
begin ne holiday('USINDEPENDENCE',year) and
begin ne holiday('LABOR',year) and
begin ne holiday('COLUMBUS',year) and
begin ne holiday('VETERANS',year) and
begin ne holiday('THANKSGIVING',year) and
begin ne holiday('CHRISTMAS',year) then
output;
end;
format begin date9.;
run;
data mydata;
set mydata;
ActualDays = intck('DAYS',startdate,enddate);
weekdays = intck("weekdays",startdate,enddate);
BankDays = intck('BankingDays',startdate,enddate);
run;
title 'Methods of Counting Days';
proc print data=mydata;
run;
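The documentation example hard-codes US holidays through the HOLIDAY function, while the question's holidays are Canadian and already sit in a table. A minimal sketch of using that table directly, without a custom interval, follows. It assumes the table is a dataset named HOLIDAYS with a numeric SAS date variable HOLIDAY_DATE; the four holiday rows come from the question, and the MEMBERS record is a made-up example chosen to overlap them (swap in MYDATA and your full holiday table). Like INTCK('WEEKDAY', ...), it counts the days after StartDate up to and including EndDate, so it subtracts only holidays that fall on a weekday inside that same window.

```sas
/* Hypothetical holiday table: a few rows from the question.      */
/* Only the dd/mm/yyyy date at the start of each line is read.    */
data holidays;
  input holiday_date :ddmmyy10.;
  format holiday_date date9.;
  datalines;
25/12/2003 Christmas Day (*)
26/12/2003 Boxing Day (*)
01/01/2004 New Years Day
16/02/2004 Family Day
;
run;

/* Hypothetical member record spanning Christmas 2003 */
data members;
  Member    = 1;
  StartDate = '24DEC2003'd;   /* Wednesday */
  EndDate   = '02JAN2004'd;   /* Friday    */
  format StartDate EndDate date11.;
run;

proc sql;
  create table want as
  select m.*,
         /* weekdays after StartDate up to and including EndDate */
         intck('weekday', m.StartDate, m.EndDate) as Weekdays,
         /* same count minus holidays that fall on a weekday in that window */
         intck('weekday', m.StartDate, m.EndDate)
           - (select count(*)
                from holidays as h
               where h.holiday_date >  m.StartDate
                 and h.holiday_date <= m.EndDate
                 and weekday(h.holiday_date) between 2 and 6)
           as BusinessDays
  from members as m;
quit;

proc print data=want;
run;
```

For this record the sketch should give Weekdays=7 and BusinessDays=4 (29, 30 and 31 December and 2 January). If StartDate itself should count, add 1 when it is a weekday that is not a holiday.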
Say I had 5 years of data that were being used to calculate some measure across those aggregated years. Sometimes those are 5 consecutive years and other times data was not available for a given year so it must be skipped. For example 2016-2020 vs 2015-2017 & 2019-2020. In this case data was not available for 2018. I have been given a set of rules for how these years should be presented.
Consecutive years should look like: 2016-2020
Non-consecutive years look slightly different depending on where the missing year(s) occur:
2015-2017 & 2019-2020
2010, 2012, 2014, 2016 & 2017
2015-2018 & 2020
While it would be trivial to produce a comma-separated list of all years used, this is how they want the years presented. These labels are for a series of different measures, so I am attempting to create them automatically within a macro. The number of years of data is also not always 5; it could be 3 or even 10.
The obvious first idea was a DO UNTIL loop that starts at the minimum year and compares each year with the next year used in the analysis to see whether they are consecutive. Since the number of years isn't always 5, this made the most sense so far, but I have not worked with DO UNTIL loops very much, so I couldn't figure out how to progressively build the label over the iterations of the loop while also adhering to these rules.
For this example, let's use the years 2015, 2016, 2017, 2019, 2020.
Any help would be greatly appreciated.
This could be a case of a picture is worth a thousand words.
Example:
/* simulate raw results of a survey of 10 questions over 16 years */
data surveyresults;
call streaminit(20230125);
do qid = 1 to 10;
do year = 2007 to 2022;
if year = 2021 then continue;
if rand('uniform') > 0.85 then continue;
do _n_ = 1 to rand('integer', 30);
pid + 1;
if rand('uniform') > 0.85 then continue;
answercode = rand('integer', 20);
output;
end;
end;
end;
run;
proc sql noprint;
create table stage1 as
select distinct qid, year, 1 as flag
from surveyresults
order by qid, year
;
select catx(' ', min(year), 'to', max(year))
into :year_range
from stage1 ;
quit;
ods html file='plot.html';
proc sgplot data=stage1;
scatter x=year y=qid / markerattrs=(symbol=squarefilled size=8.2%);
xaxis values=(&year_range);
yaxis type=discrete;
run;
ods html close;
This should get you started.
data test;
infile cards dsd;
input x @@;
d = dif(x); /*used to create RUN when dif > 1 increment run*/
if d eq . or d > 1 then run+1;
cards;
2015,2016,2017,2019,2020,2022,2024,2025,2026
;;;;
run;
proc print;
run;
proc summary data=test nway; /*count the number of years in each run*/
class run;
output out=runlen(drop=_type_);
run;
data test; /* merge TEST and RUNLEN*/
length list $128;
do until(last.run); /*loop until last.run*/
merge test runlen;
by run;
if first.run then list = cats(x); /*start of list*/
end;
select(_freq_); /*based on run-length create LIST */
when(1);
when(2) list = catx(' & ',list,x);
otherwise list = catx('-',list,x);
end;
run;
proc print;
run;
There is probably an easier way than this, but it works for your scenarios.
data years;
input year;
cards;
2015
2016
2017
2019
2020
;
run;
/* data years; */
/* input year; */
/* cards; */
/* 2010 */
/* 2012 */
/* 2014 */
/* 2016 */
/* 2017 */
/* ; */
/* run; */
/* data years; */
/* input year; */
/* cards; */
/* 2015 */
/* 2016 */
/* 2017 */
/* 2018 */
/* 2020 */
/* ; */
/* run; */
data want;
merge years years(firstobs=2 rename=(year=next_year)) end=eof;
length year_list $200 interval $20;
retain year_list start_year;
_dif= next_year - year;
if _n_=1 then start_year=year;
if _dif > 1 or eof then do;
if start_year ne year then interval = catx('-', start_year, year);
else interval = put(start_year, 8. -l);
if eof then year_list=catx(" & ", year_list, interval);
else year_list = catx(", ", year_list, interval);
start_year = next_year;
end;
if eof then call symputx('year_list', year_list);
run;
%put &year_list;
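Since the question mentions building these labels inside a macro, here is one way the DATA step above could be wrapped so it can be called once per measure. The macro name YEAR_LABEL and its parameters are made up for this sketch. It follows the same rules as the step above (a run of consecutive years becomes first-last, pieces are separated by ", " and the last piece is joined with " & "), so the question's "2010, 2012, 2014, 2016 & 2017" case comes out as "2010, 2012, 2014 & 2016-2017"; matching that case exactly would need an extra rule.

```sas
/* Hypothetical macro: build a year label and store it in &MVAR */
%macro year_label(data=, var=year, mvar=year_label);
  %global &mvar;
  /* distinct years in ascending order */
  proc sort data=&data(keep=&var) out=_years nodupkey;
    by &var;
  run;
  data _null_;
    /* look-ahead: pair each year with the next one */
    merge _years _years(firstobs=2 rename=(&var=_next)) end=_eof;
    length _list $200 _piece $20;
    retain _list _start;
    if _n_ = 1 then _start = &var;
    /* a gap, or the last year, closes the current run */
    if _eof or _next - &var > 1 then do;
      if _start ne &var then _piece = catx('-', _start, &var);
      else _piece = cats(_start);
      if _eof then _list = catx(' & ', _list, _piece);
      else _list = catx(', ', _list, _piece);
      _start = _next;
    end;
    if _eof then call symputx("&mvar", _list, 'G');
  run;
%mend year_label;

/* usage with the years from the question */
data years;
  input year @@;
  datalines;
2015 2016 2017 2019 2020
;
run;

%year_label(data=years)
%put &=year_label;   /* YEAR_LABEL=2015-2017 & 2019-2020 */
```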
This version creates the combined list. I think it has the features you describe.
data test;
infile cards dsd;
input x @@;
d = dif(x); /*used to create RUN when dif > 1 increment run*/
if d eq . or d > 1 then run+1;
cards;
2015,2016,2017,2019,2020,2022,2024,2025,2026
;;;;
run;
proc print;
run;
data list(keep=combinedlist);
length list $128 combinedList $256;
do while(not eof);
list=' ';
do runlength=1 by 1 until(last.run); /*loop until last.run*/
set test end=eof;
by run;
if first.run then list = cats(x); /*start of list*/
end;
select(runlength); /*based on run-length create LIST */
when(1);
when(2) list = catx(' & ',list,x);
otherwise list = catx('-',list,x);
end;
combinedList = catx(', ',combinedList,list);
end;
output;
stop;
run;
proc print;
run;
Since I am new to SAS, I need some help understanding how to combine overlapping date ranges into one row. I want to combine overlapping date ranges when they have a matching ID. If the dates don't overlap, I want to keep them as they are. If they overlap and have a matching ID and drug code, they should be combined into one line. Please look at the sample data set below and the expected results:
Current Data set:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 1/1/2019
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
Expected results:
ID Drug Code BEG_Date End_Date
1 100 1/1/2018 3/31/2018
1 90 4/1/2018 04/30/2018
1 100 5/1/2018 6/1/2018
1 98 6/2/2018 8/31/2018
1 100 9/1/2018 5/4/2019
I wrote some SAS code, but it combines the dates even when there is no overlap. I want code that works correctly in SAS.
PROC SORT DATA=Want OUT=ONE;
BY PERSON_ID BEG_DATE DRUG_CODE END_DATE;
RUN;
data TWO (DROP=PERSON_ID2 DRUG_CODE2 BEG_DATE END_DATE
RENAME=(BEG2=BEG_DOS
END2=END_DOS));
SET ONE;
RETAIN BEG2 END2;
PERSON_ID2=LAG1(PERSON_ID);
DRUG_CODE2=LAG1(DRUG_CODE);
IF PERSON_ID2=PERSON_ID AND DRUG_CODE2=DRUG_CODE AND BEG_DATE LE(END2+1) THEN
DO;
BEG2=MIN(BEG_DATE,BEG2);
END2=MAX(END_DATE,END2);
END;
ELSE
DO;
SEG+1;
BEG2=BEG_DATE;
END2=END_DATE;
END;
FORMAT BEG2 END2 MMDDYY10.;
RUN;
DATA THREE(DROP=BEG_DOS END_DOS SEG);
RETAIN BEG_DATE END_DATE;
SET TWO;
BY PERSON_ID SEG;
FORMAT BEG_DATE END_DATE MMDDYY10.;
IF FIRST.SEG THEN
DO;
BEG_DATE=BEG_DOS;
END;
IF LAST.SEG THEN
DO;
END_DATE = END_DOS;
OUTPUT;
END;
RUN;
This is how I would do it: create an obs for each ID, DRUG_CODE and DATE, flag the gaps, and summarize by RUN.
data have;
input ID Drug_Code (BEG End)(:mmddyy.);
format BEG End mmddyyd10.;
cards;
1 100 1/1/2018 3/1/2018
1 100 2/1/2018 04/30/2018
1 90 4/1/2018 04/30/2018
1 90 6/1/2018 8/15/2018
1 100 5/1/2018 6/1/2018
1 98 6/1/2018 8/31/2018
1 100 9/1/2018 5/4/2019
;;;;
run;
proc print;
run;
/*1 100 1/1/2018 1/1/2019*/
data exv/ view=exv;
set have;
do date = beg to end;
output;
end;
drop beg end;
format date mmddyyd10.;
run;
proc sort data=exv out=ex nodupkey;
by id drug_code date;
run;
data breaksV / view=BreaksV;
set ex;
by id drug_code;
dif = dif(date);
if first.drug_code then do; dif=1; run=1; end;
if dif ne 1 then run+1;
run;
proc summary data=breaksV nway missing;
class id drug_code run;
var date;
output out=want(drop=_type_) min=Begin max=End;
run;
Proc print;
run;
Computing the extent range composed of overlapping segment ranges requires a good understanding of the range conditions (cases).
Consider the scenarios when sorted by start date (within any larger grouping set, G, such as id and drug)
Let [ and ] be endpoints of a range
# be date values (integers) within
Extent be the combined range that grows
Segment be the range in the current row
Case 1 - Growth. Within G, Segment starts on or before Extent end.
Segment will either not contribute to Extent or extend it.
[####] Extent
+ [#] Segment range DOES NOT contribute
--------
[####] Extent (do not output a row, still growing)
or
[####] Extent
+ [#####] Segment range DOES contribute
--------
[#######] Extent (do not output a row, still growing)
Case 2 - Terminus. 3 possibilities:
Within G, Segment starts after Extent end,
Next G reached (different id/drug combination),
End of data reached.
#2 and #3 can be tested by checking the appropriate last. flag.
[####] Extent
+ ..[#] Segment beyond Extent (gap is 2)
--------
[####] output Extent
[#] reset Extent to Segment
You can adjust your rules for Segment being adjacent (gap=0) or close enough (gap < threshold) to mean an Extent is either expanded, or, output and reset to Segment.
Note: The situation is a little more complicated (not shown) for the real-world cases of:
missing start means the Segment has an unknown start date (presume it to be the epoch, 0=01JAN1960, or some date that predates all dates in the data or study)
missing end means the Segment is active today (end date is date when processing data)
Sample code:
data have;
call streaminit(42);
do id = 1 to 10;
do _n_ = 1 to 50;
drug = ceil(rand('UNIFORM', 10));
beg_date = intnx ('MONTH', '01JAN2008'D, rand('UNIFORM',20));
end_date = intnx ('DAY', beg_date, rand('UNIFORM',75));
OUTPUT;
end;
end;
format beg_date end_date yymmdd10.;
run;
proc sort data=have out=segments;
by id drug beg_date end_date;
run;
data want;
set segments;
by id drug beg_date end_date; * will error if incoming data is NOT sorted;
retain ext_beg ext_end;
retain gap_allowed 0; * set to 1 for contiguously adjacent segment ;
if first.drug then do;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 0;
end;
if beg_date <= ext_end + gap_allowed then do;
ext_end = max (ext_end, end_date);
segment_count + 1;
end;
else do;
extent_id + 1;
OUTPUT;
ext_beg = beg_date;
ext_end = end_date;
segment_count = 1;
end;
if last.drug then do;
extent_id + 1;
OUTPUT;
* reset occurs implicitly;
* it will happen at first. logic when control returns to top of step;
end;
format ext_: yymmdd10.;
keep id drug ext_beg ext_end segment_count extent_id;
run;
I'm looking to take each observation's date and keep rolling it forward by its specified repricing frequency until it reaches a target date.
The dataset being used is:
data have;
input repricing_frequency date_of_last_repricing end_date;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
run;
So the idea is that I'd keep applying the repricing frequency of 3, 10 or 15 months to date_of_last_repricing until it is as close as it can be to the date "31DEC2017". Thanks in advance.
EDIT including my recent workings:
data want;
set have;
repricing_N = intck('Month',date_of_last_repricing,'31DEC2017'd,'continuous');
dateoflastrepricing = intnx('Month',date_of_last_repricing,repricing_N,'E');
format dateoflastrepricing date9.;
informat dateoflastrepricing date9.;
run;
The INTNX function will compute an incremented date value, and allows the resultant interval alignment to be specified (in your case, the 'end' of the month n months hence).
data have;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
* use 12. to read the raw date values in the datalines;
input repricing_frequency date_of_last_repricing: 12. end_date: 12.;
id = _n_; * row number, used as the grouping key in the PROC SQL comparison below;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
run;
data want;
set have;
status = 'Original';
output;
* increment and iterate;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
do while (date_of_last_repricing <= end_date);
status = 'Computed';
output;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
end;
run;
If you only want the nearest date that iterating by the repricing frequency would reach, you do not have to iterate. You can divide the number of months apart by the frequency to get the number of iterations that would have occurred.
data want2;
set have;
nearest_end_month = intnx('month', end_date, 0, 'end');
if nearest_end_month > end_date then nearest_end_month = intnx('month', nearest_end_month, -1, 'end');
months_apart = intck('month', date_of_last_repricing, nearest_end_month);
iterations_apart = floor(months_apart / repricing_frequency);
iteration_months = iterations_apart * repricing_frequency;
nearest_end_date = intnx('month', date_of_last_repricing, iteration_months, 'end');
format nearest: date9.;
run;
proc sql;
select id, max(date_of_last_repricing) as nearest_end_date format=date9. from want group by id;
select id, nearest_end_date from want2;
quit;
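To make the arithmetic in WANT2 concrete, here it is traced for the first row of HAVE only (frequency 3, last repricing 15399 = 28FEB2002, end date 21367 = 02JUL2018). The values in the comments were worked out by hand for this illustration; they also agree with the last 'Computed' row the iterative WANT step produces for that record, since each iteration moves exactly 3 months with 'end' alignment.

```sas
/* Worked example: first row of HAVE, computed directly */
data worked;
  repricing_frequency    = 3;
  date_of_last_repricing = 15399;   /* 28FEB2002 */
  end_date               = 21367;   /* 02JUL2018 */
  /* last month-end on or before END_DATE: 30JUN2018 */
  nearest_end_month = intnx('month', end_date, 0, 'end');
  if nearest_end_month > end_date then
    nearest_end_month = intnx('month', nearest_end_month, -1, 'end');
  /* 196 month boundaries between Feb 2002 and Jun 2018 */
  months_apart     = intck('month', date_of_last_repricing, nearest_end_month);
  /* 65 whole 3-month steps fit */
  iterations_apart = floor(months_apart / repricing_frequency);
  /* 65 * 3 = 195 months after Feb 2002 is May 2018: 31MAY2018 */
  nearest_end_date = intnx('month', date_of_last_repricing,
                           iterations_apart * repricing_frequency, 'end');
  format date_of_last_repricing end_date nearest: date9.;
run;

proc print data=worked noobs;
run;
```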
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month, i.e. for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Create some data with the desired structure;
Data inData (drop=_:); * drop all variables starting with an underscore;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month;
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days;
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
You can convert your month, day and year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test whether it is within the first 5 weekdays or the last 5 weekdays of the month you could do something like this:
first5 = intnx('weekday',intnx('month',today,0,'B')-1,1) <= today
      <= intnx('weekday',intnx('month',today,0,'B')-1,5) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include the week-ends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.
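A sketch that turns these INTNX tests into a DATA step follows. It generates every weekday of 2008 as stand-in data (in the question the rows come from the trade file), rebuilds the date from YEAR, MONTH and DAY with MDY(), and takes the month's first weekday as the weekday following the previous month's last day, so a month that starts on a Saturday or Sunday (March 2008, for example) still gets five weekdays flagged. Holidays are not excluded: for January 2008 it flags 1 January, which the question's data does not contain.

```sas
data flagged;
  do today = '01JAN2008'd to '31DEC2008'd;
    /* stand-in for the question's data: weekdays only (1=Sun, 7=Sat) */
    if weekday(today) in (1, 7) then continue;
    year  = year(today);
    month = month(today);
    day   = day(today);
    /* with real data, build the date from its parts */
    date = mdy(month, day, year);
    /* first weekday on or after the 1st of the month */
    first_wd = intnx('weekday', intnx('month', date, 0, 'B') - 1, 1);
    /* last weekday on or before the month's last day; a weekend day
       belongs to the preceding Friday's WEEKDAY interval */
    last_wd = intnx('weekday', intnx('month', date, 0, 'E'), 0);
    first5 = first_wd <= date <= intnx('weekday', first_wd, 4);
    last5  = intnx('weekday', last_wd, -4) <= date <= last_wd;
    output;
  end;
  format today date first_wd last_wd date9.;
run;

/* sanity check: every month should have exactly five of each flag */
proc summary data=flagged nway;
  class year month;
  var first5 last5;
  output out=counts(drop=_type_ _freq_) sum=n_first n_last;
run;

proc print data=counts;
run;
```

To keep the rows, subset on `first5 or last5` (for example in a WHERE statement) after computing the flags on your own data.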
I have a data set with daily data in SAS. I would like to convert this to monthly form by taking differences from the previous month's value by id. For example:
thedate, id, val
2012-01-01, 1, 10
2012-01-01, 2, 14
2012-01-02, 1, 11
2012-01-02, 2, 12
...
2012-02-01, 1, 20
2012-02-01, 2, 15
I would like to output:
thedate, id, val
2012-02-01, 1, 10
2012-02-01, 2, 1
Here is one way. If you license SAS/ETS, there might be a better way to do it with PROC EXPAND.
*Setting up the dataset initially;
data have;
informat thedate YYMMDD10.;
input thedate id val;
datalines;
2012-01-01 1 10
2012-01-01 2 14
2012-01-02 1 11
2012-01-02 2 12
2012-02-01 1 20
2012-02-01 2 15
;;;;
run;
*Sorting by ID and DATE so it is in the right order;
proc sort data=have;
by id thedate;
run;
data want;
set have;
retain lastval; *This is retained from record to record, so the value carries down;
by id thedate;
if (first.id) or (last.id) or (day(thedate)=1); *The only records of interest - the first record, the last record, and any record that is the first of a month.;
* To do END: if (first.id) or (last.id) or (thedate=intnx('MONTH',thedate,0,'E'));
if first.id then call missing(lastval); *Each time ID changes, reset lastval to missing;
if missing(lastval) then output; *This will be true for the first record of each ID only - put that record out without changes;
else do;
val = val-lastval; *set val to the new value (current value minus retained value);
output; *put the record out;
end;
lastval=sum(val,lastval); *this value is for the next record;
run;
You could achieve this using PROC SQL and the INTNX function to bring last month's date forward a month...
proc sql ;
create table lag as
select b.thedate, b.id, (b.val - a.val) as val
from mydata b
left join
mydata a on b.thedate = intnx('month',a.thedate,1,'s')
and b.id = a.id
order by b.thedate, b.id ;
quit ;
This may need tweaking to handle scenarios where the previous month doesn't exist or months which have a different number of days to the previous month.
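Because the question only needs first-of-month values, one alternative that avoids the self-join is to keep just the rows dated on the 1st and take DIF(VAL) within each ID. This sketch uses the question's sample rows and assumes every ID has a row on the 1st of each month of interest; the INTCK check skips a month whose previous month is absent for that ID, rather than differencing across the gap.

```sas
/* Sample rows from the question */
data have;
  informat thedate yymmdd10.;
  format thedate yymmdd10.;
  input thedate id val;
  datalines;
2012-01-01 1 10
2012-01-01 2 14
2012-01-02 1 11
2012-01-02 2 12
2012-02-01 1 20
2012-02-01 2 15
;
run;

/* keep one value per ID per month: the row dated on the 1st */
proc sort data=have(where=(day(thedate) = 1)) out=firsts;
  by id thedate;
run;

data want;
  set firsts;
  by id;
  /* LAG and DIF run on every row so their queues stay aligned */
  prev_date = lag(thedate);
  change    = dif(val);
  /* output only when the previous row is the same ID, one month earlier */
  if not first.id and intck('month', prev_date, thedate) = 1 then do;
    val = change;
    output;
  end;
  keep thedate id val;
run;

proc sort data=want;
  by thedate id;
run;
```

With the sample data this gives the two rows from the question: 2012-02-01, 1, 10 and 2012-02-01, 2, 1.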