Extracting years from input files over time interval [SAS]

I'm using SAS macros to run through multiple monthly files and extract variables needed for further analysis. Currently the program takes each monthly file, extracts what's needed and outputs it as the same monthly file. I also have it set to combine everything into a yearly file. This works fine, for the most part:
%let start_date = '31jan2022'd;
%let end_date = '31mar2022'd;
%let num_years = %sysfunc(intck(year,&start_date,&end_date));
data _null_;
call symput('start_loop',compress(intck('month',&start_date,date())*-1));
call symput('end_loop',compress(intck('month',&end_date,date())*-1));
%MACRO MONTH_EXTRACT;
%do l=&start_loop. %to &end_loop.;
data _null_;
call symput ('monyy',put(intnx('month',date(),&l.,'end'),monyy5.));
call symput ('end_mon',put(intnx('month',date(),&l.,'end'),date9.));
call symput ('date',put(intnx('month',date(),&l.,'end'),yymmn.));
run;
%let file=libref.monthly_&date.;
%let file_year=%substr(&date,1,4);
data file_&monyy. (keep=var1 var2 var3);
set &file;
inpt_dt=&date;
proc append base=files_&file_year force data=file_&monyy.;
%end;
%mend;
This works fine when the start date and the end date are contained within the same year. However, what is desired is that, when the start date and end date are not within the same year, a yearly file will be compiled for each year within the interval. Ex.
start_date = '31oct2021'd;
end_date = '31mar2022'd;
This would generate two outputs called files_2021 and files_2022. When I run it with the current code, it only generates the first file for 2021.
I've attempted to add in:
%IF &num_years > 1 %then %do;
%LET start_year=input(substr(&start_date,9,-4),4.0);
start_month=input(&start_date,mmddyy10.);
%LET start_month=substr(start_month,1,2);
%LET end_year=input(substr(%end_date,9,-4),4.0);
end_month=input(&end_date,mmddyy10.);
%LET end_month=substr(end_month,1,2);
%DO file_interval=&start_year %to &end_year;
months=0;
%if file_interval=&start_year %then %do
mstart=&start_month;
%end;
%else %do
mstart=1;
%end;
%if file_interval=&end_year %then %do
mstop=&end_month;
%end;
%else %do
mstop = 12;
%end;
months=mstop-mstart+1;
%end;
I know that a proc append is needed, but I don't know what would be the data to append in this instance. I also know that right now I'm only counting the months. How can I isolate each year and create yearly output files from the monthly?

Instead of looping by year and calculating the month, loop by month and calculate the year. You can increment your loop by referencing off of the start date. For example:
Start Date = 01JAN2020
End Date = 01MAR2020
01JAN2020: 0 Months from Start Date
01FEB2020: 1 Month from Start Date
01MAR2020: 2 Months from Start Date
We can use a combination of intck() and intnx() to calculate this.
%let start_date = 31oct2021;
%let end_date = 31mar2022;
%macro month_extract;
%do i = 0 %to %sysfunc(intck(month, "&start_date."d, "&end_date."d) );
%let date = %sysfunc(intnx(month, "&start_date."d, &i.) );
%let month = %sysfunc(month(&date.), z2.);
%let year = %sysfunc(year(&date.) );
%let file = libref.monthly_&year.&month.;
%let outfile = files_&year.;
%put File: Year: &year. | Month: &month. | File: &file. | Outfile: &outfile.;
%end;
%mend;
%month_extract;
Output:
File: Year: 2021 | Month: 10 | File: libref.monthly_202110 | Outfile: files_2021
File: Year: 2021 | Month: 11 | File: libref.monthly_202111 | Outfile: files_2021
File: Year: 2021 | Month: 12 | File: libref.monthly_202112 | Outfile: files_2021
File: Year: 2022 | Month: 01 | File: libref.monthly_202201 | Outfile: files_2022
File: Year: 2022 | Month: 02 | File: libref.monthly_202202 | Outfile: files_2022
File: Year: 2022 | Month: 03 | File: libref.monthly_202203 | Outfile: files_2022
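To connect this back to the original extract-and-append logic, the %put can be replaced with the DATA step and PROC APPEND from the question. The following is only a sketch: the libref.monthly_YYYYMM naming, var1-var3 and inpt_dt are taken from the question, while the macro parameters and the work.month_YYYYMM staging name are assumptions made for illustration.

```sas
%macro month_extract(start_date=, end_date=);
    %local i date month year file outfile;

    %do i = 0 %to %sysfunc(intck(month, "&start_date."d, "&end_date."d));
        /* Last day of the i-th month after the start date (alignment e = end) */
        %let date    = %sysfunc(intnx(month, "&start_date."d, &i., e));
        %let month   = %sysfunc(month(&date.), z2.);
        %let year    = %sysfunc(year(&date.));
        %let file    = libref.monthly_&year.&month.;  /* assumed input naming */
        %let outfile = files_&year.;

        /* Extract the needed variables from the monthly file */
        data work.month_&year.&month. (keep=var1 var2 var3 inpt_dt);
            set &file.;
            inpt_dt = &date.;
            format inpt_dt date9.;
        run;

        /* Append to the yearly file; PROC APPEND creates the base if it is missing */
        proc append base=&outfile. data=work.month_&year.&month. force;
        run;
    %end;
%mend month_extract;

%month_extract(start_date=31oct2021, end_date=31mar2022)
```

Because the yearly target is derived from each month's own year, the October to December files land in files_2021 and the January to March files in files_2022. Note that rerunning the macro appends the same rows again, so the yearly files may need to be deleted first.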

Related

How to call a macro inside a macro depending on conditions

I have a macro called "Comparison" that compares values from current period with the previous period, and it's working fine.
Edit to explain better: The macro Comparison will compare values from a specific account (revenues, for example) for the month T and T-1. All inside that macro works fine.
Say the current period is T. If the current month is March, June, September or December (Q1, Q2, Q3 or Q4), then I want to compare values from period T with T-1, T-1 with T-2 and T-2 with T-3. If the current month is not in the first condition, then I will only compare T with T-1. There's a variable called YEARMONTH (that can be 202210, for example) that I declare in another part of the code.
So basically I'm trying to run the Comparison macro 1 time if it's not the end of a quarter, or 3 times if it's a quarter.
I'm trying to do it as follows:
%MACRO TEST(YEARMONTH); /*20XXYY*/
%LET MONTH = %SUBSTR(&YEARMONTH,5,2);
%LET CP = &YEARMONTH.;
%LET CP_1 = &YEARMONTH. - 1;
%LET CP_2 = &YEARMONTH. - 2;
%IF &MONTH. = 3 %THEN %DO; %LET CP_3 = &YEARMONTH. - 91; %END
%ELSE %DO; %LET CP_3 = &YEARMONTH. - 3; %END;
%IF &MONTH. IN (3, 6, 9, 12) %THEN %DO;
%Comparison(CP,CP_1);
%Comparison(CP_1,CP_2);
%Comparison(CP_2,CP_3);
%END;
%ELSE %DO;
%Comparison(CP,CP_1);
%END;
%MEND TEST;
Basically I can't test it in SAS as my profile was mistakenly blocked by IT (they were meant to revoke my access to some libraries, but they revoked everything linked to SAS). Considering that the macro "Comparison" is working, will that new Macro work or are there flaws in my code?
This is much easier if you convert your YYYYMM string into an actual date. You have to use & before the macro variable name to pass in the values. You never defined the CP_3 macro variable. You can just use the MOD() function to test whether it is the last month of a quarter.
%macro test(yearmonth);
%local date month cp cp_1 cp_2 cp_3 ;
%let date=%sysfunc(inputn(&yearmonth,yymmn6.));
%let month=%sysfunc(month(&date));
%let cp = %sysfunc(putn(&date,yymmn6.));
%let cp_1 = %sysfunc(intnx(month,&date,-1),yymmn6.);
%let cp_2 = %sysfunc(intnx(month,&date,-2),yymmn6.);
%let cp_3 = %sysfunc(intnx(month,&date,-3),yymmn6.);
%comparison(&cp,&cp_1);
%if %sysfunc(mod(&month,3)) = 0 %then %do;
%comparison(&cp_1,&cp_2);
%comparison(&cp_2,&cp_3);
%end;
%mend test;
Let's make a dummy %COMPARISON() macro and test it:
%macro comparison(one,two);
%put &=one &=two;
%mend;

%test(202201)
ONE=202201 TWO=202112
%test(202203)
ONE=202203 TWO=202202
ONE=202202 TWO=202201
ONE=202201 TWO=202112

Iterate date in loop in SAS

I need help with one query. I have to iterate over dates in a do loop, where the dates are in yymmn6. format (for example 202112), so that once the month reaches 12 it automatically rolls over to the first month of the next year.
///// code////////
%let startmo=202010 ;
%let endmo= 202102;
%macro test;
%do month= &startmo %to &endmo;
Data ABC_&month;
Set test&month;
X=&month ;
%end;
Run;
%mend;
%test;
//////////
Output should be 5 dataset as
ABC_202010
ABC_202011
ABC_202012
ABC_202101
ABC_202102
I need macro variable month to be resolved 202101 once it reached to 202012
Those are not actual DATE values. Just strings that you have imposed your own interpretation on so that they LOOK like dates to you.
Use date values instead; it is then easy to generate strings in the style you need by using a FORMAT.
%macro test(startmo,endmo);
%local offset month month_string;
%do offset = 0 %to %sysfunc(intck(month,&startmo,&endmo));
%let month=%sysfunc(intnx(month,&startmo,&offset));
%let month_string=%sysfunc(putn(&month,yymmn6.));
data ABC_&month_string;
set test&month_string;
X=&month ;
format X monyy7.;
run;
%end;
%mend;
%test(startmo='01OCT2020'd , endmo='01FEB2021'd)
And if you need to convert one of those strings into a date value use an INFORMAT.
%let date=%sysfunc(inputn(202010,yymmn6.));
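As a small illustration of that conversion, the sketch below turns the question's two YYYYMM strings into date values and counts the months in the range. The names start_dt and end_dt are hypothetical, introduced only for this example.

```sas
%let startmo = 202010;
%let endmo   = 202102;

/* Convert the YYYYMM strings into SAS date values (first day of each month) */
%let start_dt = %sysfunc(inputn(&startmo, yymmn6.));
%let end_dt   = %sysfunc(inputn(&endmo, yymmn6.));

/* INTCK counts month boundaries crossed, so add 1 to include both ends */
%put Months in range: %eval(%sysfunc(intck(month, &start_dt, &end_dt)) + 1);

/* The date values could then be passed to the macro, e.g. %test(&start_dt, &end_dt) */
```

For this range the %put should report 5 months, matching the five datasets the question expects.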
I would prefer to use a %DO %WHILE loop: check whether the last two characters are 12 and, if so, move to month 01 of the next year.
%let startmo=202010 ;
%let endmo= 202102;
%macro test;
%do %while(&startmo <= &endmo);
data ABC_&startmo;
set test&startmo;
X=&startmo;
run;
/* advance to the next month inside the loop, otherwise it never ends */
%let mon = %substr(&startmo, 5, 2);
%let yr = %substr(&startmo, 1, 4);
%if &mon = 12 %then %do;
/* December rolls over to January of the next year */
%let startmo = %eval(&yr + 1)01;
%end;
%else %do;
%let startmo = %eval(&startmo + 1);
%end;
%end;
%mend;
%test;

SAS - conditional macro variable

I have a variable date=201611 and I need to create the first day of the next month in the following format '2016-12-01'. The following code works fine for the months up till 11:
%let date = 201611;
%let rok = %sysfunc(substr(&date,1,4));
%let month = %sysfunc(substr(&date,5,2));
%let xdat2_ii = &rok-%eval(&month + 1)-01;
%let xdat1 = %str(%')&xdat2_ii.%str(%');
%put &xdat1;
'2016-12-01'
I need to improve the code so that it also works for month 12, i.e. when the date is 201612 the result should be '2017-01-01'.
My idea was to do it using macro, but it does not work.
%macro promenne;
%if &month < 12 %then %let xdat2_ii = &rok-%eval(&month + 1)-01
%else %if &month= 12 %then %let xdat2_ii = %eval(&rok + 1)-01-01;
%mend promenne;
Thank you for any suggestions which way to go.
When working with dates, it is often easiest to use the built-in date shifting functions, in this case intnx.
/* define variable (this is a macro STRING) */
%let date=201612;
/* convert to SAS date value (numeric, num of days since 01JAN1960) */
%let dateval=%sysfunc(mdy(%substr(&date,5,2),1,%substr(&date,1,4)));
/* finally - shift to beginning of following month and format output */
%let xdat2_ii=%sysfunc(intnx(MONTH,&dateval,1,B),yymmddd10.);
%put &xdat2_ii; /* 2017-01-01 */
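If the surrounding single quotes from the question are also needed, they can be added with the same %str(%') trick the question already uses. This is a minimal sketch that assumes the YYYYMM input shown above and reads it with the yymmn6. informat instead of MDY(); either conversion should give the same date value.

```sas
%let date = 201612;

/* YYYYMM string -> SAS date value (first day of that month) */
%let dateval = %sysfunc(inputn(&date, yymmn6.));

/* Shift to the beginning of the next month, format as yyyy-mm-dd, wrap in quotes */
%let xdat1 = %str(%')%sysfunc(intnx(month, &dateval, 1, b), yymmddd10.)%str(%');

%put &xdat1; /* '2017-01-01' */
```

Because the quotes are macro-quoted, wrapping the value in %unquote() may be necessary when it is used inside generated SAS or SQL code.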

SAS year function not working inside macro

Hello, I am trying to access columns from a library that have a specific date format and apply the YEAR function to them in my macro code, but the output shows the same duplicated value on every row instead of the desired result. My code should return only the year from the input dates.
%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%scan(&varname,&i,%str( ))=year(&varname.);
%end;
run;
proc print data=&outdsn;run;
%MEND;
%dteyear(lib=dte3,outdsn=dtetst);
the input data is as follows
1975-12-04
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03
duplicate output from my code is-->
Obs RFDTC
1 1965-05-19
2 1965-05-19
3 1965-05-19
4 1965-05-19
5 1965-05-19
6 1965-05-19
7 1965-05-19
8 1965-05-19
9 1965-05-19
10 1965-05-19
11 1965-05-19
12 1965-05-19
13 1965-05-19
The basic problem is that the YEAR() function returns a 4-digit number, and the variable's format is YYMMDD10., so the result is formatted as a SAS date very close to 1960 (SAS's beginning of all time).
What I did in the code below was change the format to 4.0, so it displays as a 4-digit number.
If you want to have access to the original date variable, you'll have to create a new variable for the year. I'll leave that to you.
There was an additional problem--that is, YEAR(&varname.) inserts the entire list of variables, not just the one you're working with. It works if there is only one date variable, but not if there are more than one. I fixed this, too.
%macro dteyear(lib=,outdsn=);
proc sql noprint;
select distinct catx(".",libname,memname), name
into :dsns separated by " ", :varname separated by " "
from dictionary.columns
where libname = upcase("&lib") and format=('YYMMDD10.')
order by 1;
quit;
%put &dsns;
%put &varname;
%local olddsn curdsn curvbl i;
data &outdsn.;
set
%let olddsn=;
%do i=1 %to &sqlobs;
%let curdsn=%scan(&dsns,&i,%str( ));
%let curvbl=%scan(&varname,&i,%str( ));
%if &curdsn NE &olddsn
%then %do;
%if &olddsn NE
%then %do;
)
%end;
%let olddsn=&curdsn.;
&curdsn (keep=&curvbl
%end;
%else %do;
&curvbl
%end;
%end;
);
%do i=1 %to &sqlobs;
%let curvbl=%scan(&varname,&i,%str( ));
&curvbl=year(&curvbl.);
format &curvbl 4.0;
%end;
run;
proc print data=&outdsn;run;
%MEND;
data have;
input datevar yymmdd10.;
format datevar yymmdd10.;
cards;
1975-12-04
1977-11-03
1989-09-15
1998-06-17
1999-05-31
2000-08-14
2001-03-11
2007-03-11
2007-12-28
2008-10-07
2009-12-03
run;
options mprint;
%dteyear(lib=work,outdsn=want)
The result, then, is:
Obs datevar
1 1975
2 1977
3 1989
4 1998
5 1999
6 2000
7 2001
8 2007
9 2007
10 2008
11 2009
To convert a date value to just a year you can use the YEAR() function, but you also need to change the format attached to the variable since you will have essentially divided the value stored in it by 365 to convert it from the number of days to the number of years.
rfdtc = year(rfdtc);
format rfdtc 4. ;
Your macro is attempting to read many variables from many datasets and generate a single output dataset. I am not sure the resulting dataset will be of much value to you since it will look like a checker board of missing values. Also if the same variable name appears in more than one input dataset you will get corrupted values because of applying the YEAR() function to value that has already been converted from a date value to a year value.
For example you could end up generating a data step like this:
data WANT ;
set ds1 (keep=datevar1)
ds1 (keep=datevar2)
ds2 (keep=datevar3)
ds3 (keep=datevar3)
;
datevar1=year(datevar1);
datevar2=year(datevar2);
datevar3=year(datevar3);
datevar3=year(datevar3);
format datevar1 datevar2 datevar3 datevar3 4.;
run;
Since both input datasets DS2 and DS3 have a variable named DATEVAR3 you will be applying the YEAR() function to the value twice. That will convert everything to the year 1965.
To eliminate the problem with running the YEAR() function on the same value multiple times and losing the actual year perhaps you just want to apply the YEAR. format instead of converting the stored value.
format datevar1 datevar2 datevar3 datevar4 year. ;
That would still leave the underlying date values different. If you really need the values to be identical, perhaps you could convert each value to the first day of the year? You could use the INTNX() function
datevar1 = intnx('year',datevar1,0,'b');
or the MDY() function
datevar1 = mdy(1,1,year(datevar1));
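If the original date needs to stay available, another option is to store the year in a separate variable instead of overwriting the date. The sketch below uses a two-row sample and a hypothetical variable name rfdtc_year; it is meant as an illustration rather than a drop-in change to the macro.

```sas
/* Small sample with a date variable formatted as YYMMDD10. */
data have;
    input rfdtc yymmdd10.;
    format rfdtc yymmdd10.;
    datalines;
1975-12-04
2009-12-03
;
run;

data want;
    set have;
    /* New numeric variable; it has no date format, so it prints as a plain number */
    rfdtc_year = year(rfdtc);
run;

proc print data=want;
run;
```

Since rfdtc is never modified, YEAR() is applied only once per value and the date format on the original variable does not need to change.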

Split SAS dataset

I have a SAS dataset that looks like this:
id | dept | ...
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 B
10 B
11 B
12 B
13 B
Each observation represents a person.
I would like to split the dataset into "team" datasets, each dataset can have a maximum of 3 observations.
For the example above this would mean creating 3 datasets for dept A (2 of these datasets would contain 3 observations and the third dataset would contain 2 observations). And 2 datasets for dept B (1 containing 3 observations and the other containing 2 observations).
Like so:
First dataset (deptA1):
id | dept | ...
1 A
2 A
3 A
Second dataset (deptA2)
id | dept | ...
4 A
5 A
6 A
Third dataset (deptA3)
id | dept | ...
7 A
8 A
Fourth dataset (deptB1)
id | dept | ...
9 B
10 B
11 B
Fifth dataset (deptB2)
id | dept | ...
12 B
13 B
The full dataset I'm using contains thousands of observations with over 50 depts. I can work out how many datasets per dept are required, and I think a macro is the best way to go as the number of datasets required is dynamic. But I can't figure out the logic to create the datasets so that they have a maximum of 3 observations. Any help appreciated.
Another version.
Compared to DavB's version, it processes the input data only once and splits it into several tables in a single DATA step.
Also, if a more complex splitting rule is required, it can be implemented in the DATA step view WORK.SOURCE_PREP.
data WORK.SOURCE;
infile cards;
length ID 8 dept $1;
input ID dept;
cards;
1 A
2 A
3 A
4 A
5 A
6 A
7 A
8 A
9 B
10 B
11 B
12 B
13 B
14 C
15 C
16 C
17 C
18 C
19 C
20 C
;
run;
proc sort data=WORK.SOURCE;
by dept ID;
run;
data WORK.SOURCE_PREP / view=WORK.SOURCE_PREP;
set WORK.SOURCE;
by dept;
length table_name $32;
if first.dept then do;
count = 1;
table = 1;
end;
else count + 1;
if count > 3 then do;
count = 1;
table + 1;
end;
/* variable TABLE_NAME to hold table name */
TABLE_NAME = catt('WORK.', dept, put(table, 3. -L));
run;
/* prepare list of tables */
proc sql noprint;
create table table_list as
select distinct TABLE_NAME from WORK.SOURCE_PREP where not missing(table_name)
;
%let table_cnt=&sqlobs;
select table_name into :table_list separated by ' ' from table_list;
select table_name into :tab1 - :tab&table_cnt from table_list;
quit;
%put &table_list;
%macro loop_when(cnt, var);
%do i=1 %to &cnt;
when ("&&&var.&i") output &&&var.&i;
%end;
%mend;
data &table_list;
set WORK.SOURCE_PREP;
select (TABLE_NAME);
/* generate OUTPUT statements */
%loop_when(&table_cnt, tab)
end;
run;
You could try this:
%macro split(inds=,maxobs=);
proc sql noprint;
select distinct dept into :dept1-:dept9999
from &inds.
order by dept;
select ceil(count(*)/&maxobs.) into :numds1-:numds9999
from &inds.
group by dept
order by dept;
quit;
%let numdept=&sqlobs;
data %do i=1 %to &numdept.;
%do j=1 %to &&numds&i;
dept&&dept&i&&j.
%end;
%end;;
set &inds.;
by dept;
if first.dept then counter=0;
counter+1;
%do i=1 %to &numdept.;
%if &i.=1 %then %do;
if
%end;
%else %do;
else if
%end;
dept="&&dept&i" then do;
%do k=1 %to &&numds&i.;
%if &k.=1 %then %do;
if
%end;
%else %do;
else if
%end;
counter<=&maxobs.*&k. then output dept&&dept&i&&k.;
%end;
end;
%end;
run;
%mend split;
%split(inds=YOUR_DATASET,maxobs=3);
Just replace the INDS parameter value in the %SPLIT macro call with the name of your input data set. The data set must be sorted by dept, because the DATA step uses BY dept.