Suppose that I have a library save which contains daily files thefile_dly_yyyymmdd,
e.g. save.thefile_dly_20150831, save.thefile_dly_20150901, ... , save.thefile_dly_20210731.
I want to perform some manipulation on the historical data in this library. However, I only want to extract the files within a specific date range, and I only want to keep the last file of each month, e.g. I want to extract save.thefile_dly_20150831, save.thefile_dly_20150930, save.thefile_dly_20151031, etc.
Something like the following.
%macro loop_through(start,end);
%do i = &start. %to &end.;
%if %sysfunc(exist(SAVE.THEFILE_DLY_&i.)) %then %do;
/* Do some data processing on the file */
%end;
%end;
%mend;
%loop_through(20150831,20210731);
The problem is that the code above loops through every single integer between 20150831 and 20210731 (most of which, such as 20150832, are not valid dates), which is not optimal. It also processes every file that exists for the month, not just the file for the last day of each month.
How can I adjust it? Any advice will be appreciated.
To loop over calendar intervals, iterate over the number of intervals. Use the INTCK() function to calculate the number of intervals requested, and the INTNX() function to calculate each interval's date; with the END alignment, INTNX() returns the last day of the month.
%macro loop_through(start,end);
%local offset ymd dsname ;
%do offset = 0 %to %sysfunc(intck(month,&start,&end));
%let ymd=%sysfunc(intnx(month,&start,&offset,end),yymmddn8.);
%let dsname=SAVE.THEFILE_DLY_&ymd;
%if %sysfunc(exist(&dsname)) %then %do;
/* Do some data processing on the file */
%end;
%end;
%mend;
%loop_through('01AUG2015'd,'01JUL2021'd);
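If the last file in a month is not always dated on the calendar month-end (for example, when no file is written at weekends), the INTNX() approach skips that month. An alternative not in the original answer is to ask DICTIONARY.TABLES which files actually exist and keep the latest one per month. A minimal sketch, assuming the member names always follow THEFILE_DLY_yyyymmdd; the LIBNAME pointing SAVE at the WORK directory and the sample tables exist only for the demonstration:

```sas
/* Demo assumption: SAVE points at the WORK directory; use your real path instead */
libname save "%sysfunc(pathname(work))";

/* Hypothetical sample tables: September has no file dated the 30th */
data save.thefile_dly_20150828 save.thefile_dly_20150831 save.thefile_dly_20150929;
  x = 1;
run;

%let monthend_list=;
proc sql noprint;
  /* Keep the latest existing THEFILE_DLY_ table within each YYYYMM group */
  select cats('SAVE.', max(memname))
    into :monthend_list separated by ' '
    from dictionary.tables
    where libname = 'SAVE'
      and substr(memname, 1, 12) = 'THEFILE_DLY_'
      and substr(memname, 13, 8) between '20150831' and '20210731'
    group by substr(memname, 13, 6)
    order by 1;
quit;

%put NOTE: Month-end files: &monthend_list;
```

The resulting MONTHEND_LIST can then be looped over with %SCAN().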
If you really want to allow the user of the macro to pass in YYYYMMDD digit strings instead of actual SAS date values then add some logic to your macro to convert the digit strings into actual date values. For example:
%let start=%sysfunc(inputn(&start,yymmdd8.));
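Combining that conversion with the loop above, a complete sketch might look like this. The LIBNAME pointing SAVE at the WORK directory and the WORK.MONTHEND_yyyymmdd output names are assumptions for the demonstration only:

```sas
/* Demo assumption: SAVE points at the WORK directory; use your real path instead */
libname save "%sysfunc(pathname(work))";

%macro loop_through(start,end);
  %local offset ymd dsname;
  /* Accept YYYYMMDD digit strings and turn them into SAS date values */
  %let start=%sysfunc(inputn(&start,yymmdd8.));
  %let end=%sysfunc(inputn(&end,yymmdd8.));
  /* One iteration per month between START and END */
  %do offset = 0 %to %sysfunc(intck(month,&start,&end));
    /* Alignment END returns the last calendar day of the month */
    %let ymd=%sysfunc(intnx(month,&start,&offset,end),yymmddn8.);
    %let dsname=SAVE.THEFILE_DLY_&ymd;
    %if %sysfunc(exist(&dsname)) %then %do;
      /* Placeholder processing: copy the month-end file to WORK (hypothetical name) */
      data work.monthend_&ymd;
        set &dsname;
      run;
    %end;
  %end;
%mend loop_through;

%loop_through(20150831,20210731);
```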
Related
I am trying to write a macro in which my parameter has multiple values, a few of which are prefixed with a space. I want to be able to read in the strings along with the space, but because space is the default delimiter, this is causing issues.
%macro ab(where_p=);
data want;
set have;
%DO I =1 %TO %SYSFUNC(COUNTW(&WHERE_P));
%IF %LENGTH(&WHERE_P) > 0 %THEN %DO;
B_&I=%SCAN(%STR(&WHERE_P),&I);
%end;
%end;
run;
%mend;
%ab(WHERE_P=" ATF" " TRUST");
Here it is not able to read in the values as is: it reads a space as one string, then ATF as the next, then a space again and TRUST as the next. Instead, it should read ' ATF' as the first string and ' TRUST' as the second.
Can someone help me read in such data using the %SCAN function?
Thanks
Just use the functionality of the %SCAN() function to handle this: its Q modifier tells it to ignore delimiters that are inside quotation marks. If the data includes the delimiter, then the values need to be quoted.
%let WHERE_P=" ATF" " TRUST";
%let word1 = %scan(&where_p,1,%str( ),q);
So your loop should look like:
%IF %LENGTH(&WHERE_P) %THEN %DO I =1 %TO %SYSFUNC(COUNTW(&WHERE_P,%str( ),q));
B_&I=%SCAN(&where_p,&I,%str( ),q);
%end;
...
%ab(WHERE_P=" ATF" " TRUST");
Or you could use a different delimiter that does NOT appear in the data. If you want to pass in leading spaces without actual quotes then you need to use macro quoting.
%IF %LENGTH(&WHERE_P) %THEN %DO I =1 %TO %SYSFUNC(COUNTW(&WHERE_P,|));
B_&I=%sysfunc(quote(%qSCAN(&where_p,&I,|)));
%end;
...
%ab(WHERE_P=%str( ATF| TRUST));
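Putting the Q-modifier approach together, a complete version of the question's macro might look like this. It is a sketch, assuming a data set HAVE exists (a one-row hypothetical sample is created here):

```sas
/* Hypothetical one-row input so the sketch runs on its own */
data have;
  id = 1;
run;

%macro ab(where_p=);
  %local i n;
  %let n = 0;
  /* Q modifier: blanks inside quotation marks are not treated as delimiters */
  %if %length(&where_p) %then %let n = %sysfunc(countw(&where_p,%str( ),q));
  data want;
    set have;
    %do i = 1 %to &n;
      /* %SCAN with Q keeps the quotes, so B_&i gets a valid literal like " ATF" */
      b_&i = %scan(&where_p,&i,%str( ),q);
    %end;
  run;
%mend ab;

%ab(where_p=" ATF" " TRUST");
```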
Try:
%macro ab(where_p=);
%let array_size = %EVAL(%SYSFUNC(COUNTC(&WHERE_P, '"'))/2);
data want;
set have;
array B_(&array_size) $20 (&where_p);
run;
%mend;
%ab(WHERE_P=" ATF" " TRUST" );
First, find the number of items, which is the number of double quotes divided by 2.
Then create an array of that size and assign the initial values using &WHERE_P directly.
If you want to allow strings in WHERE_P longer than 20 characters, you need to change the length in the ARRAY statement.
Consider the following text value '1/3/2016' from a data set. This is a badly formatted date value that I cannot correct using ANYDTDTE., as I am on SAS 9.0. In this string the day and month are also the wrong way round. This is actually 03JAN2016 in DATE9. format.
Therefore I have attempted to correct all of the above with the following macro:
%macro date_cats();
proc sql noprint;
select scan(matchdate,1,'/'), scan(matchdate,2,'/'), strip(scan(matchdate,3,'/')) into :month, :day, :year
from test;
quit;
%let padder = 0;
%if %length(&month) < 2 %then
%let month = %sysfunc(cats(&padder., &month.));
%put &month.;
%if %length(&day) < 2 %then
%let day = %sysfunc(cats(&padder., &day.));
%put &day.;
%put %sysfunc(cats(&day., &month., &year.));
%mend;
%date_cats();
The three %put statements produce the following in the log:
01
03
132016
Can anyone tell me why, in the final %PUT statement, CATS is either dropping the added '0' characters or reverting to the macro variable values from before they were padded?
Thanks
Don't use CATS() to generate macro variables.
First, it is totally unneeded, since you can concatenate macro variable values just by placing their references next to each other. Replace
%let month = %sysfunc(cats(&padder., &month.));
with
%let month = &padder.&month.;
Second, when evaluating the arguments to functions like CATS() that accept either numeric or character values, %SYSFUNC() checks whether your strings are numbers. In your case they are numbers, so the leading zeros disappear. In other cases this can cause SAS to generate warning messages.
Third, if you really want to convert a string like 'M/D/Y' into a string like 'DMY', then, assuming the string contains a valid date, just use an informat and a format to do the conversion.
%let have=1/20/2015 ;
%let want=%sysfunc(inputn(&have,mmddyy10.),ddmmyyn8.);
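Applied to the data set from the question, the same informat-and-format conversion can be done in a DATA step, which avoids CATS() and the padding logic entirely. A minimal sketch, assuming a table TEST with a character variable MATCHDATE as in the question (the sample row and the macro variable name DMY are hypothetical):

```sas
/* Hypothetical table mirroring the question: MATCHDATE holds M/D/YYYY text */
data test;
  length matchdate $10;
  matchdate = '1/3/2016';
run;

data _null_;
  set test(obs=1);
  /* Read the M/D/YYYY text as a SAS date, then write it back out as DDMMYYYY */
  call symputx('dmy', put(input(matchdate, mmddyy10.), ddmmyyn8.));
run;

%put &dmy;
```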
CATS sees the values as numbers and converts them automatically, which unhelpfully drops the leading zeros.
Generally, for macro variables you can just place the references next to each other:
%put &day.&month.&year.;
Hi, I am trying to rename variables using a SAS macro loop.
%Let t1=12Mth;
%Let t2=20;
%Let t3=30;
%Let t4=40;
%Let t5=50;
%Let t6=60;
%macro Re(time);
%Do I = 1 %to &time.;
data MilkNew;
set Milk;
rename MT&&t&I..Sp=MTSp&&t&I.;
run;
%end;
%mend Re;
%Re(6)
This loop is meant to rename MT...Sp to MTSp...; for example, MT20SP to MTSp20.
When I run my loop there is no error, but the variable names are not changed in MilkNew at all.
Where does the problem come from? Thanks!
If the only purpose of the macro is to rename the variables in the data set, then why read all the data with a SET statement? Your data set is probably small, so you don't notice how inefficient that is. Instead, use the MODIFY statement in PROC DATASETS to accomplish the same thing more efficiently, because it changes only the data set's descriptor and does not rewrite the data. Here's an alternative macro for you.
%macro renamevar(dsname, time);
%local lib ds i;
%let lib = %sysfunc(coalescec(%scan(&dsname, -2, %str(.)), work));
%let ds = %scan(&dsname, -1, %str(.));
proc datasets lib=&lib nolist;
modify &ds;
rename
%do i = 1 %to &time;
mt&&t&i..Sp=MTSp&&t&i.
%end;
;
quit;
%mend;
%renamevar(milk, 6);
Here's the log after the macro call:
NOTE: Renaming variable mt12MthSp to MTSp12Mth.
NOTE: Renaming variable mt20Sp to MTSp20.
NOTE: Renaming variable mt30Sp to MTSp30.
NOTE: Renaming variable mt40Sp to MTSp40.
NOTE: Renaming variable mt50Sp to MTSp50.
NOTE: Renaming variable mt60Sp to MTSp60.
NOTE: MODIFY was successful for WORK.MILK.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
You should move the loop so that it generates only the RENAME statement (or even just the old=new name pairs). What is happening now is that you keep overwriting MilkNew, so only the last RENAME has any effect.
%macro Re(time);
data MilkNew;
set Milk;
%do I = 1 %to &time.;
rename MT&&t&I..Sp=MTSp&&t&I.;
%end;
run;
%mend Re;
%Re(6)
You should have seen the last variable name in the loop (the 6th) changed. That's because you repeated the same data step, reading the same unchanged source data set each time and writing to a destination different from the source, so each step 'forgot' the changes made in the earlier steps.
So this would have worked, though in a minute I'll get to why this isn't a good way to do it.
%Let t1=12Mth;
%Let t2=20;
%Let t3=30;
%Let t4=40;
%Let t5=50;
%Let t6=60;
%macro Re(time);
%Do I = 1 %to &time.;
data Milk;
set Milk;
rename MT&&t&I..Sp=MTSp&&t&I.;
run;
%end;
%mend Re;
data milk;
input
MT12mthSP
MT20SP
MT30SP
MT40SP
MT50SP
MT60SP
;
datalines;
12 20 30 40 50 60
;;;;
run;
%Re(6)
Here I had it make all the changes to Milk and save them back to that data set. If you want to preserve Milk, first make Milk_New, then use Milk_New in both the SET and DATA statements.
Second, you should not run a new data step for each change. Macros don't have to contain a data step; they can generate statements inside a data step.
So for example:
%macro Re(time);
%Do I = 1 %to &time.;
rename MT&&t&I..Sp=MTSp&&t&I.;
%end;
%mend Re;
data milk_new;
set milk;
%Re(6);
run;
Even better would be to generate this list outside of a macro entirely; look up "generating code SAS" for suggestions on that.
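One way to do that (a sketch, not from the original answer) is to let PROC SQL read the variable names from DICTIONARY.COLUMNS, build the old=new pairs, and pass them to PROC DATASETS. It assumes every variable to rename is named MT&lt;suffix&gt;SP, as in the question; the macro variable RENAMES is a hypothetical name:

```sas
/* Hypothetical sample data matching the question's variable names */
data milk;
  input MT12mthSP MT20SP MT30SP MT40SP MT50SP MT60SP;
  datalines;
12 20 30 40 50 60
;
run;

proc sql noprint;
  /* Build pairs such as MT20SP=MTSp20 for every MT...SP variable in WORK.MILK */
  select cats(name, '=MTSp', substr(name, 3, length(name) - 4))
    into :renames separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'MILK'
      and upcase(name) like 'MT%SP';
quit;

/* Rename in place; only the descriptor changes */
proc datasets lib=work nolist;
  modify milk;
  rename &renames;
quit;
```

Because the names come from the data set itself, no t1-t6 macro variables need to be maintained by hand.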
If you didn't see any renames at all, you may also have an issue where a label is present on the column(s). That won't affect your use of the variable name, but it will make the output confusing. Use
label _all_;
Or, inside your macro loop, include a label-clearing statement (label &lt;varname&gt;;, using the original variable name from before the rename) to fix that.
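If you want to clear every label in place, one option not mentioned in the original answer is an ATTRIB statement in PROC DATASETS, which changes only the data set's metadata. A minimal sketch, with hypothetical sample data:

```sas
/* Hypothetical sample: a labelled variable whose label differs from its name */
data milk;
  MT20SP = 20;
  label MT20SP = 'Milk at 20';
run;

proc datasets lib=work nolist;
  modify milk;
  /* Remove the labels from every variable in WORK.MILK */
  attrib _all_ label=' ';
quit;
```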
I am an intermediate user of SAS, but I have limited knowledge of arrays and macros. I have a set of code that prompts the user to enter a date range. For example, the user might enter December 1, 2015 to December 5, 2015. For simplicity, imagine the code looks like:
data new; set old;
if x1='01DEC2015'd then y="TRUE";
run;
I need to run this same code for every day in the date prompt range, so for the 1st, 2nd, 3rd, 4th, and 5th. My first thought was to create an array that contains the dates, but I am not sure how I would do that. My second thought was to create a macro, but I can't figure out how to feed a list through a macro.
Also, just FYI, the code is a lot longer and more complicated than just a data step.
The following macro can be used as a framework for your code:
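%MACRO test(startDate, endDate);
%LOCAL i;
/* startDate and endDate must be numeric SAS date values, because %DO only loops over integers */
%DO i=&startDate %TO &endDate;
/* data steps go here */
/* example */
DATA test;
SET table;
IF x1 = &i THEN y = "true";
RUN;
%END;
%MEND;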
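A runnable sketch of this framework, assuming a table named TABLE with a numeric date variable X1 (both hypothetical sample data). Because %DO only loops over integers, the call converts the dates with INPUTN first, and each day writes its own output data set (the TEST_yyyymmdd names are hypothetical) so that one day's results do not overwrite another's:

```sas
/* Hypothetical sample table with a date variable X1 */
data table;
  do x1 = '28NOV2015'd to '07DEC2015'd;
    output;
  end;
  format x1 date9.;
run;

%macro test(startDate, endDate);
  %local i;
  %do i = &startDate %to &endDate;
    /* One output data set per day, e.g. TEST_20151201 */
    data test_%sysfunc(putn(&i,yymmddn8.));
      set table;
      if x1 = &i then y = "true";
    run;
  %end;
%mend test;

/* Convert DATE9. text into SAS date values before calling the macro */
%test(%sysfunc(inputn(01DEC2015,date9.)),%sysfunc(inputn(05DEC2015,date9.)))
```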
Look into CALL EXECUTE: write a DATA _NULL_ step with a DO loop over the days that calls your macro once per day. Getting the string right for CALL EXECUTE can sometimes be tricky, but it is worth the effort overall.
data sample;
do date='01Jan2014'd to '31Jan2014'd;
output;
end;
run;
%macro print_date(date);
proc print data=sample;
where date="&date"d;
format date date9.;
run;
%mend;
%let date_start=05Jan2014;
%let date_end=11Jan2014;
data _null_;
do date1="&date_start"d to "&date_end"d by 1;
str='%print_date('||put(date1, date9.)||');';
call execute(str);
end;
run;
I wonder if there is a way of detecting whether a data set is empty, i.e. it has no observations.
In other words, how can I get the number of observations in a specific data set,
so that I can write an If statement to set some conditions?
Thanks.
It's easy with PROC SQL. Do a count and put the results in a macro variable.
proc sql noprint;
select count(*) into :observations from library.dataset;
quit;
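To use the count in the kind of If logic the question asks about, a sketch along these lines should work. The macro name REPORT_NOBS is hypothetical, and the macro wrapper is there because %IF in open code is only allowed in recent SAS releases:

```sas
proc sql noprint;
  select count(*) into :observations from sashelp.class;
quit;

/* Hypothetical helper: branch on the count captured above */
%macro report_nobs(ds, n);
  /* %EVAL turns the stored count into a plain integer for the comparison */
  %if %eval(&n) = 0 %then %put NOTE: &ds is empty.;
  %else %put NOTE: &ds has %eval(&n) observations.;
%mend report_nobs;

%report_nobs(sashelp.class, &observations)
```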
There are lots of different ways; I tend to use a macro function with OPEN() and ATTRN(). Below is a simple example that works well most of the time. If you are going to be dealing with data views or more complex situations, such as a data set with records marked for deletion or active where clauses, then you might need more robust logic.
%macro nobs(ds);
%local dsid nobs rc;
%let DSID=%sysfunc(OPEN(&ds.,IN));
%let NOBS=%sysfunc(ATTRN(&DSID,NOBS));
%let RC=%sysfunc(CLOSE(&DSID));
&NOBS
%mend;
/* Here is an example */
%put %nobs(sashelp.class);
Here's the more complete example that @cmjohns was talking about. It returns 0 if the data set is empty and -1 if it does not exist, and it has options to handle deleted observations and where clauses (note that using a where clause can make the macro take a long time on very large data sets).
Usage Notes:
This macro returns the number of observations in a data set. If the data set does not exist, then -1 is returned. I would not recommend this for use with ODBC libnames; use it only against SAS tables.
Parameters:
iDs - The libname.dataset that you want to check.
iWhereClause (Optional) - A where clause to apply
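iNobsType (Optional) - Either NOBS or NLOBSF. See the SAS 9 documentation for descriptions.
iVerbose (Optional) - Set to 0 to suppress the warning written when the data set cannot be opened.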
Macro definition:
%macro nobs(iDs=, iWhereClause=1, iNobsType=nlobsf, iVerbose=1);
%local dsid nObs rc;
%if "&iWhereClause" eq "1" %then %do;
%let dsID = %sysfunc(open(&iDs));
%end;
%else %do;
%let dsID = %sysfunc(open(&iDs(where=(&iWhereClause))));
%end;
%if &dsID %then %do;
%let nObs = %sysfunc(attrn(&dsID,&iNobsType));
%let rc = %sysfunc(close(&dsID));
%end;
%else %do;
%if &iVerbose %then %do;
%put WARNING: MACRO.NOBS.SAS: %sysfunc(sysmsg());
%end;
%let nObs = -1;
%end;
&nObs
%mend;
Example Usage:
%put %nobs(iDs=sashelp.class);
%put %nobs(iDs=sashelp.class, iWhereClause=height gt 60);
%put %nobs(iDs=this_dataset_doesnt_exist);
Results
19
12
-1
Installation
I recommend setting up a SAS autocall library and placing this macro in your autocall location.
PROC SQL is not efficient when the data set is large. ATTRN is a good method, but this can also be done in a plain DATA step. Here is an efficient solution that gets the number of observations, even for billions of rows, while reading only the first two rows, because the NOBS= variable is filled in from the data set's header information:
data DS1;
set DS nobs=i;
if _N_ =2 then stop;
No_of_obs=i;
run;
The trick is producing an output even when the dataset is empty.
data CountObs;
i=1;
set Dataset_to_Evaluate point=i nobs=j; * 'point' avoids review of full dataset*;
No_of_obs=j;
output; * Produces a value before "stop" interrupts processing *;
stop; * Needed whenever 'point' is used *;
keep No_of_obs;
run;
proc print data=CountObs;
run;
The above code is the simplest way I've found to produce the number of observations even when the dataset is empty. I've heard NOBS can be tricky, but the above can work for simple applications.
A slightly different approach:
proc contents data=library.dataset out=nobs;
run;
proc summary data=nobs nway;
class nobs;
var delobs;
output out=nobs_summ sum=;
run;
This will give you a dataset with one observation; the variable nobs has the value of number of observations in the dataset, even if it is 0.
I guess I am reinventing the wheel here with so many answers already, but I do see some other methods that count from the actual data set, which might take a long time for huge data sets. Here is a more efficient method that reads the count from the metadata:
proc sql;
/* DICTIONARY values are stored in uppercase */
select nlobs from sashelp.vtable where libname = "LIBRARY" and memname = "DATASET";
quit;
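To use the count in later code, the same lookup can store it in a macro variable. A minimal sketch against SASHELP.CLASS, which ships with SAS (the macro variable name NLOBS is hypothetical):

```sas
proc sql noprint;
  /* LIBNAME and MEMNAME values in DICTIONARY.TABLES are uppercase */
  select nlobs into :nlobs
    from dictionary.tables
    where libname = 'SASHELP' and memname = 'CLASS';
quit;

%put NOTE: SASHELP.CLASS has %left(&nlobs) observations.;
```

For views, NLOBS is missing, so this metadata lookup is best kept to SAS data sets.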