I am trying to write a query that counts occurrences in a list in a SAS dataset for the past 12 months, ending with last month.
I have created the macro variables below to use in my WHERE clause:
%let cur_date = %sysfunc(today(), date9.);
%let pre_date2 = %sysfunc(putn(%sysfunc(intnx(month, %sysfunc(today()), -1, End)),%sysfunc(intnx(month, %sysfunc(today()), -12, End)) date9.)));
%put &pre_date4;
I would appreciate any help with this.
Thanks
You need two macro variables: one for the last day of the prior month and one for the first day of the month 11 months before that. Together they span 12 full months.
%let last_month = %sysfunc(intnx(month, %sysfunc(today()), -1, E) );
%let last_12_months = %sysfunc(intnx(month, &last_month., -11, B) );
Now you can run your query using between:
where date BETWEEN &last_12_months. AND &last_month.;
Example:
data have;
do i = -36 to 0;
date = intnx('month', today(), i, 'B');
output;
end;
format date date9.;
drop i;
run;
data want;
set have;
where date BETWEEN &last_12_months. AND &last_month.;
run;
Output:
date
01OCT2020
01NOV2020
01DEC2020
01JAN2021
01FEB2021
01MAR2021
01APR2021
01MAY2021
01JUN2021
01JUL2021
01AUG2021
01SEP2021
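The macro variables in this answer hold raw SAS date numbers, which is what a WHERE clause needs. The date9. text that the question tried to build is only useful for reading the values in the log. A minimal sketch of that split, assuming the window is the 12 full months ending last month; the names win_start and win_end are illustrative and not from the original post:

```sas
/* Raw SAS date values: usable directly in a WHERE clause */
%let win_end   = %sysfunc(intnx(month, %sysfunc(today()), -1, E));
%let win_start = %sysfunc(intnx(month, %sysfunc(today()), -12, B));

/* Formatted copies, written to the log only for checking */
%put NOTE: window starts %sysfunc(putn(&win_start., date9.));
%put NOTE: window ends   %sysfunc(putn(&win_end., date9.));
```

Counting back 12 months from today and aligning to the beginning gives the same start as counting back 11 months from the end of last month.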
We have a code that we use to create quarterly reports of projects. There is a piece of code, a do loop, that takes the startdate and enddate of each project in our dataset and creates an observation for each month and year that the project took place in. For example if we have a project called "Employment Help" with a startdate value of 01JAN2022 and an enddate value of 01APR2022, the do loop will create 4 observations for this project with the month and year values of 1 2022, 2 2022, 3 2022, and 4 2022. We use this to count how many projects happened during our quarters. We are running into an issue where the do loop is dropping projects and not giving them a month or year value and we are losing projects in our count because of this. The dates are all in the same format.
Here is an example of some data that is pulled in. EXAMPLE 2 is properly pulled into the do loop; EXAMPLE 1 does not get pulled through.
Here is the code:
data test2;
set users3;
do i = 0 to (year(enddate)-year(startdate));
year = year(startdate)+i;
end;
do i = 0 to (month(enddate)-month(startdate));
month = month(startdate)+i;
drop i;
output;
end;
run;
Consider the following example:
data have;
input project$ startdate:date9. enddate:date9.;
format startdate enddate date9.;
datalines;
A 01JAN2022 01APR2022
B 01MAR2022 01JUN2022
C 01NOV2022 01JAN2023
;
run;
The third row produces no output because the difference between the end month number and the start month number is negative (1 - 11 = -10), so the month loop never executes. Instead of doing two loops, one for year and one for month, do a single loop over all of the months from the start date. Use intnx() to generate your months using startdate as the reference month; i offsets each month from the start date. For example:
code                               output
intnx('month', '01JAN2022'd, 0)    01JAN2022
intnx('month', '01JAN2022'd, 1)    01FEB2022
intnx('month', '01JAN2022'd, 2)    01MAR2022
Since you're incrementing by exactly one month for each date, you can get the year and month number in a single loop.
data want;
set have;
do i = 0 to intck('month', startdate, enddate);
month = month(intnx('month', startdate, i) );
year = year(intnx('month', startdate, i) );
output;
end;
drop i;
run;
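To see why project C disappears under the original two-loop code, it can help to print both loop bounds for that row. This is only a small check, with old_upper and new_upper as made-up variable names:

```sas
data _null_;
  startdate = '01NOV2022'd;
  enddate   = '01JAN2023'd;
  /* Upper bound used by the original month loop: 1 - 11 */
  old_upper = month(enddate) - month(startdate);
  /* Upper bound used by the single intck() loop: month boundaries crossed */
  new_upper = intck('month', startdate, enddate);
  put old_upper= new_upper=;
run;
```

The log should show old_upper=-10 new_upper=2. A do loop from 0 to -10 runs zero times, so no rows are written, while 0 to 2 yields the three months November, December and January.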
Your code doesn't seem to handle projects that cross years, i.e., a project that started in 2021 and ended in 2022.
This should get you closer.
data have;
input startdate : date9. enddate : date9.;
format startdate enddate date9.;
cards;
01Jan2022 01Apr2022
01Sep2021 01Apr2022
;;;
run;
data want;
set have;
nmonths = intck('month', startdate, enddate) +1 ;
do i = 0 to nmonths - 1;
* first day of the i-th month of the project, computed before output so date matches month and year;
date = intnx('month', startdate, i, 'b');
month = month(date);
year = year(date);
output;
end;
run;
I have SAS code (SQL) that has to be repeated 25 times, once for each month/year combination (see code below). How can I use a macro in this code?
proc sql;
create table hh_oud_AUG_17 as
select hh_key
,sum(RG_count) as RG_count_aug_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_aug_17
from basis_RG_oud
where valid_from_dt <= "01AUG2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
proc sql;
create table hh_oud_SEP_17 as
select hh_key
,sum(RG_count) as RG_count_sep_17
,case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht_sep_17
from basis_RG_oud
where valid_from_dt <= "01SEP2017"d <= valid_to_dt
group by hh_key
order by hh_key
;
quit;
If you use a data step to do this, you can put all the desired columns in the same output dataset rather than using a macro to create 25 separate datasets:
/*Generate lists of variable names*/
data _null_;
stem1 = "RG_count_";
stem2 = "loyabo_recht_";
month = '01aug2017'd;
length suffix $4 vlist1 vlist2 $1000;
do i = 0 to 24;
suffix = put(intnx('month', month, i, 's'), yymmn4.);
vlist1 = catx(' ', vlist1, cats(stem1,suffix));
vlist2 = catx(' ', vlist2, cats(stem2,suffix));
end;
call symput("vlist1",vlist1);
call symput("vlist2",vlist2);
run;
%put vlist1 = &vlist1;
%put vlist2 = &vlist2;
/*Produce output table*/
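data want;
/* Read the variable definitions from the input table without reading any rows */
if 0 then set basis_RG_oud;
start_month = '01aug2017'd;
/* The array name must differ from the input variable RG_count */
array totals[2, 0:24] &vlist1 &vlist2;
/* One pass per hh_key: add up RG_count for every month the row is valid in */
do _n_ = 1 by 1 until(last.hh_key);
set basis_RG_oud;
by hh_key;
do i = 0 to hbound2(totals);
if valid_from_dt <= intnx('month', start_month, i, 's') <= valid_to_dt
then totals[1,i] = sum(totals[1,i], RG_count);
end;
end;
/* Flag each month where the household total is at least 2 */
do i = 0 to hbound2(totals);
totals[2,i] = totals[1,i] >= 2;
end;
/* One output row per hh_key */
keep hh_key &vlist1 &vlist2;
run;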
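For anyone who does want the macro route that the question asks about, the repeated PROC SQL step can be wrapped in a %do loop. This is only a sketch: the macro name monthly_tables and its parameters are invented here, and the suffix comes out as AUG17 rather than the AUG_17 used in the question because it is produced with the monyy5. format.

```sas
%macro monthly_tables(start=01AUG2017, n=25);
  %local i ref suffix;
  %do i = 0 %to %eval(&n. - 1);
    /* First day of the i-th month after the start month, as a raw date value */
    %let ref    = %sysfunc(intnx(month, "&start."d, &i., B));
    /* Text such as AUG17, used in the table and column names */
    %let suffix = %sysfunc(putn(&ref., monyy5.));

    proc sql;
      create table hh_oud_&suffix. as
      select hh_key
            ,sum(RG_count) as RG_count_&suffix.
            ,case when sum(RG_count) >= 2 then 1 else 0 end as loyabo_recht_&suffix.
      from basis_RG_oud
      where valid_from_dt <= &ref. <= valid_to_dt
      group by hh_key
      order by hh_key
      ;
    quit;
  %end;
%mend monthly_tables;

%monthly_tables(start=01AUG2017, n=25)
```

This still leaves 25 separate tables to combine afterwards, which is the main reason the single-table approaches in the answers are usually easier to work with.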
Create a second data set that enumerates (is a list of) the months to be examined. Cross Join the original data to that second data set. Create a single output table (or view) that contains the month as a categorical variable and aggregates based on that. You will be able to by-group process, classify or subset based on the month variable.
data months;
do month = '01jan2017'd to '31dec2018'd;
output;
month = intnx ('month', month, 0, 'E');
end;
format month monyy7.;
run;
proc sql;
create table want as
select
month, hh_key,
sum(RG_count) as RG_count,
case when sum(RG_count) >=2 then 1 else 0 end as loyabo_recht
from
basis_RG_oud
cross join
months
where
valid_from_dt <= month <= valid_to_dt
group
by month, hh_key
order
by month, hh_key
;
…
/* Some analysis */
BY MONTH;
…
/* Some tabulation */
CLASS MONTH;
TABLE … MONTH …
WHERE year(month) = 2018;
I'm looking to take an observation's date and keep rolling it forward by its specified repricing frequency until a target date.
The dataset being used is:
data have;
input repricing_frequency date_of_last_repricing end_date;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
run;
The idea is to keep applying the repricing frequency of 3, 10 or 15 months to date_of_last_repricing until it is as close as it can get to the date "31DEC2017". Thanks in advance.
EDIT including my recent workings:
data want;
set have;
repricing_N = intck('Month',date_of_last_repricing,'31DEC2017'd,'continuous');
dateoflastrepricing = intnx('Month',date_of_last_repricing,repricing_N,'E');
format dateoflastrepricing date9.;
informat dateoflastrepricing date9.;
run;
The INTNX function computes an incremented date value and lets you specify the alignment of the result within the interval (in your case the 'end' of the month n months ahead).
data have;
format date_of_last_repricing end_date date9.;
informat date_of_last_repricing end_date date9.;
* use 12. to read the raw date values in the datalines;
input repricing_frequency date_of_last_repricing: 12. end_date: 12.;
datalines;
3 15399 21367
10 12265 21367
15 13879 21367
;
run;
data want;
set have;
status = 'Original';
output;
* increment and iterate;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
do while (date_of_last_repricing <= end_date);
status = 'Computed';
output;
date_of_last_repricing = intnx('month',
date_of_last_repricing, repricing_frequency, 'end'
);
end;
run;
If you want to compute only the nearest end date, as when iterating by repricing frequency, you do not have to iterate. You can divide the months apart by the frequency to get the number of iterations that would have occurred.
data want2;
set have;
nearest_end_month = intnx('month', end_date, 0, 'end');
if nearest_end_month > end_date then nearest_end_month = intnx('month', nearest_end_month, -1, 'end');
months_apart = intck('month', date_of_last_repricing, nearest_end_month);
iterations_apart = floor(months_apart / repricing_frequency);
iteration_months = iterations_apart * repricing_frequency;
nearest_end_date = intnx('month', date_of_last_repricing, iteration_months, 'end');
format nearest: date9.;
run;
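proc sql;
/* The sample data has no id column, so repricing_frequency (unique per row here) identifies each row */
select repricing_frequency, max(date_of_last_repricing) as nearest_end_date format=date9. from want group by repricing_frequency;
select repricing_frequency, nearest_end_date from want2;
quit;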
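The arithmetic behind the no-iteration version can be followed on a single made-up case: a 3-month frequency, a last repricing on 28FEB2002 and a target of 31DEC2017. These dates are chosen for illustration and are not taken from the sample data.

```sas
data _null_;
  repricing_frequency = 3;
  last_repricing = '28FEB2002'd;
  target         = '31DEC2017'd;
  /* Month boundaries between the two dates: 15 years * 12 + 10 */
  months_apart = intck('month', last_repricing, target);
  /* Whole repricing steps that fit into that span */
  iterations   = floor(months_apart / repricing_frequency);
  /* Jump that many steps in one call, aligned to the end of the month */
  nearest = intnx('month', last_repricing, iterations * repricing_frequency, 'end');
  put months_apart= iterations= nearest= date9.;
run;
```

The log should show months_apart=190 iterations=63 nearest=30NOV2017: 63 steps of 3 months cover 189 months, and one more step would overshoot the target.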
I'm trying to manipulate my Dispensing_Date to give me the weeknum of the year ending on last Friday for each Date, can this be done? Here is what I have so far...
%let 1= 01012016;
%let 53 = 12302016;
**01 import whiteoak file;
proc import
datafile = "E:\Horizon\Adhoc\AH\whiteoak.xlsx"
out = whiteoak
dbms = XLSX
replace;
run;
** 02 remove dupes to ensure unique rx and fill;
proc sort nodup data=whiteoak;
by Rx_ Refill;
run;
** 03 Filter out holds;
data whiteoak;
set whiteoak;
where (Filled_Status="YES");
run;
** 04 create weekday variable;
data dates;
set whiteoak;
format Dispensing_Date MMDDYY8.;
run;
This is my best guess as to what you are asking.
data _null_;
x = today();
d = intnx('week.7',x,-1,'end');
put (_all_)(=weekdate.);
run;
x=Wednesday, January 4, 2017 d=Friday, December 30, 2016
Does this do what you want?
data weeks;
do date = '22DEC2016'd to '15JAN2017'd;
format date first_friday weekdate.;
sas_week=week(date);
first_friday= intnx('week.7',intnx('year',date,0,'b'),0,'e');
friday_week=1+int((7+date-first_friday)/7) ;
output;
end;
run;
If it does then apply it to your data:
data dates;
set whiteoak;
week = 1 + int((7+Dispensing_Date
- intnx('week.7',intnx('year',Dispensing_Date,0,'b'),0,'e'))/7);
run;
I have monthly data with several observations per day. I have day, month and year variables. How can I retain data from only the first and the last 5 days of each month? I have only weekdays in my data, so the first and last five days of the month change from month to month, i.e., for Jan 2008 the first five days can be the 2nd, 3rd, 4th, 7th and 8th of the month.
Below is an example of the data file. I wasn't sure how to share this so I just copied some lines below. This is from Jan 2, 2008.
Would a variation of first.variable and last.variable work? How can I retain observations from the first 5 days and last 5 days of each month?
Thanks.
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
If you want to examine the data and determine the minimum 5 and maximum 5 values then you can use PROC SUMMARY. You could then merge the result back with the data to select the records.
So if your data has variables YEAR, MONTH and DAY you can make a new data set that has the top and bottom five days per month using simple steps.
proc sort data=HAVE (keep=year month day) nodupkey
out=ALLDAYS;
by year month day;
run;
proc summary data=ALLDAYS nway;
class year month;
output out=MIDDLE
idgroup(min(day) out[5](day)=min_day)
idgroup(max(day) out[5](day)=max_day)
/ autoname ;
run;
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
by year month;
var min_day: max_day: ;
run;
proc sql ;
create table WANT as
select a.*
from HAVE a
inner join DAYS b
on a.year=b.year and a.month=b.month and a.day = b.day
;
quit;
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
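The month-boundary expressions used in that WHERE clause can be checked on one fixed date. A small sketch with 15FEB2016 as an arbitrary test date; the variable names are illustrative:

```sas
data _null_;
  d = '15FEB2016'd;
  /* First day of the month containing d */
  first_day = intnx('month', d, 0, 'B');
  /* Last day of that month, asked for directly with 'E' alignment */
  last_day  = intnx('month', d, 0, 'E');
  /* Same last day, as first day of next month minus one */
  last_alt  = intnx('month', d, 1) - 1;
  put first_day= date9. last_day= date9. last_alt= date9.;
run;
```

The log should show first_day=01FEB2016 last_day=29FEB2016 last_alt=29FEB2016. Note that these are calendar boundaries, so a query built on them selects the first and last five calendar days, not the first and last five days present in the data.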
Create some data with the desired structure;
Data inData (drop=_:); * forget all variables starting with an underscore*;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month;
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days;
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
Use the INTNX() function. You can use INTNX('month',...) to find the beginning and ending days of the month and then use INTNX('weekday',...) to find the first 5 week days and last five week days.
You can convert your month, day, year values into a date using the MDY() function. Let's assume that you do that and create a variable called TODAY. Then to test whether it is within the first 5 weekdays or last 5 weekdays of the month you could do something like this:
first5 = intnx('weekday',intnx('month',today,0,'B'),0) <= today
<= intnx('weekday',intnx('month',today,0,'B'),4) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
Note that those ranges will include the week-ends, but it shouldn't matter if your data doesn't have those dates.
But you might have issues if your data skips holidays.