Delete older versions in a library that have a date suffix older than today's sysdate - sas

I am totally new to SAS but have been tasked with cleaning up one of our SAS processes.
We have a SAS Library which has a number of versions of a table which have a date suffix.
For example: COL_DATA_TABLE_2022_02_15
Due that our update creates one of these every time the process is run, I would like to delete all that do not equal the one I am creating when the process runs.
The process runs infrequently. So there could be days when the process is not run.
Code creating the new version:
proc datasets lib=Lib1 nolist ;
change COL_DATA_TABLE = COL_DATA_TABLE_&sysdate. ;
quit ;
What code should I use to remove the previous versions of the table? As they are out of date, taking into account that the date suffix could be any date.
For those in the library now, I will delete them manually.

would deleting all the COL_DATA_TABLE_: tables before the CHANGE work?
proc datasets lib=Lib1 nolist ;
delete COL_DATA_TABLE_:; *note name with _:;
run;
change COL_DATA_TABLE = COL_DATA_TABLE_&sysdate. ;
quit;

Related

How to accumulate the records of a newly imported table with the records of another table that I have stored on the servers in SAS?

I am new to SAS and I have the following problem:
When trying to join records I just imported (in one table) with records I have stored in another table.
What happens is that I am going to run the code in SAS daily, and I need the table that I am going to create today (17/05/2021) by importing a file 'X', to join the table that I created yesterday (16/05/2021) by importing a file 'Y'.
And so the code will be executed tomorrow, the next day and so on.
In conclusion the records will accumulate as the days go by.
To tackle this problem, I am first creating two variables, one with the date of the day the code will be executed and the other with the date of the last execution.
%let daily_date = 20210423; /*AAAAMMDD*/
%let last_execution_date = 20210422; /*AAAAMMDD*/
Then the import of a file is done, we can see that the name of this created table has the date of the day in which the code is being executed.
data InputAC.RA_ratings&daily_date;
infile "&ruta_InputRA." FIRSTOBS=2
dsd lrecl=4096 truncover;
input
#1 RA_Customer_ID $10.
#11 Rating_ID 10.
#21 ISRM_Model_Overlay_ID $10.
#31 Constant_ID 10.
#41 Value $100.
;
run;
proc sort data=inputac.RA_ratings&daily_date;
by RA_Customer_ID Rating_ID;
quit;
Finally the union of InputAC.RA_ratings&daily_date with InputAC.RA_ratings&last_execution_date is made. ('InputAC.RA_ratings&last_execution_date' should be the table that was imported at an earlier date than today.)
data InputAC.RA_ratings&fec_diario;
merge
InputAC.RA_ratings&fec_diario
InputAC.RA_ratings&ultima_fecha_de_ejecucion;
by RA_Customer_ID Rating_ID;
run;
This is how the tables are being stored on the server.
(Ignore date 20210413, let's imagine it is 20210422)
However, I have to perform this task without using the variable 'last_execution_date'.
I've been researching but I still can't find any SAS function that can help me with this problem.
I hope someone can help me, thank you very much in advance.
This is a pretty complex and interesting question from an operations point of view. The answer depends on a few things.
How much control do you have over the execution of this process?
Is "yesterday" guaranteed, or does the process need to work if "last execution date" is not yesterday?
What should happen if the process is run twice today?
The best practices way to solve this is to have a dataset (or table) that stores the last execution date. That allows you to handle #2 trivially, and the answer to #3 might guide exactly how you store this but is easily handled anyway.
Say for example you have a table, MetaAC.LastExecDate (or, in spanish, MetaAC.UltimaFecha or similar). It could store things this way:
data LastExecDate;
timestamp = datetime();
execdate = input(&daily_date,yymmdd8.);
run;
proc append base=MetaAC.LastExecDate data=LastExecDate;
run;
This lets you store an arbitrary execdate even if it's not today, and also store when you ran it (for audit purposes), and you could even add who ran it if that's interesting (there is a macro variable &sysuserid or similar). Then put all this at the bottom of your process, and it updates as you go.
Then, you can pull out from this the exact info you want - for example,
proc sql;
select max(execdate)
into :last_exec_date
from MetaAC.LastExecDate
where execdate ne today()
;
quit;
Now, if you don't have control over this for some reason, you could determine this in a different way. Again, the exact process depends on your circumstances and your answers to 2 and 3.
If your answer to 2 is you always want it to be yesterday, then this is really easy - just do this:
%let daily_date=20210517;
%let last_execution_date = %sysfunc(putn(%sysevalf(%sysfunc(inputn(&daily_date,yymmdd8.))-1),yymmddn8.));
%put &=last_execution_date;
The two %sysfuncs just do the input/put from SAS datastep inside the macro language, and %sysevalf lets you do math.
If you don't want it to always be the prior day (if there are weekends, or other days you don't necessarily want to assume it's the prior day), then your best bet is to either use the dictionary tables to look at what's there and find the largest date prior to your date, or maybe use a x command to look at the folder and do the same thing (might be easier to use OS command than to use SQL for this, sometimes SQL dictionary tables can be slow).

How to record the number of SAS startups?

I am designing an auto SAS program. I want it execute at the very first time I start SAS everyday and it should be executed only once. That is to say, I may start SAS several times this day, but the auto program will be executed only the first time I start SAS.
There are also some restricts:
1. It won't be executed if I have not use my SAS one day;
2. It won't be executed if I happen to working on SAS at daybreak;
I think recording the number of SAS startups is the key but have no idea on how to record it. Thanks for any hints.
Same as Quentin's comment
Add code such as the following to your autoexec.
options nodsnferr;
data _null_;
if not exist ('sasuser.laststart') then
call execute ('%include "my-once-a-day.sas";');
set sasuser.laststart;
if date < today() then
call execute ('%include "my-once-a-day.sas";');
run;
options nodsnferr;
data sasuser.laststart;
date = today();
run;
If you run multiple concurrent SAS sessions with different autoexecs and sasuser paths the above is not sufficient.

SAS - Choose last dataset in library that satisfies specific name convention

Say I have the a library named mylib.
Within the mylib library, the following datasets are held:
mylib.data_yearly_2015
mylib.data_yearly_2016
mylib.data_yearly_2017
mylib.data_yearly_2018
mylib.data_yearly_2015
mylib.data_mtly_01JUN2015
mylib.data_mtly_01DEC2015
mylib.data_mtly_01JUN2016
mylib.data_mtly_01DEC2016
mylib.data_mtly_01JUN2017
mylib.data_mtly_01DEC2017
Now I need to write a macro that will specifically choose the latest data_mtly_xxxxxx table from the mylib library.
For example, in the current stage, it should choose mylib.data_mtly_01DEC2017
If, however, a new dataset gets added, for example mylib.data_mtly_01JUN2018, it would have to choose that table.
How can I go about doing this in SAS?
Get a list of all data sets
Get the date portion using SCAN() and INPUT()
Get max date.
Proc sql noprint;
Select max(input(scan(name, -1, ‘_’), date9.) ) into :latest_date
From sashelp.vtable
Where upcase(libname) = ‘MYLIB’ and upcase(memname) like ‘DATA_MTLY_%’;
Quit;
Now you should have the latest date value in a macro variable and can use that in your code.
%put &latest_date.;
If it looks like a number and not a date, you’ll need a format applied but you should be able to convert it using PUT().
Note: code is untested.

permanently save modified dataset

I know this is a very basic question but my code keeps failing when trying to run what I found through the help documentation.
Up to now I have been running an analysis project off of the .WORK directory which I understand gets wiped out every time a session ends. I have done a bunch of data cleaning and preparation and do not want to have to do that every time before I start my analysis.
So I understand, from reading this: https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001310720.htm that I have to output the cleaned dataset to a non-temporary directory.
Steps I have taken so far:
1) created a new Library called "Project"
2) Saved it in a folder that I have under "my folders" in SAS
3) My code for saving the cleaned dataset to the "Project" library is as follows:
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Then I run this code in a new program:
PROC PRINT DATA=PROJECT.FAA_ALL;
RUN;
It says there are no observations and that the dataset is essentially empty.
Can some tell me where I'm going wrong?
Your problem is the PROC SORT
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Should be
PROC SORT DATA=FAA_ALL OUT= PROJECT.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
That DATA PROJECT.FAA_ALL was starting a Data Step creating a blank data set.
Something else worth mentioning: your data step didn't do what you might have expected because you had no set statement. Your code was equivalent to:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET _NULL_;
RUN;
PROJECT.FAA_ALL is empty because nothing was read in.
The SORT procedure implicitly sorts a dataset in-place. You could have SAS move the sorted data by adding the set statement to your data step:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET WORK.FAA_ALL;
RUN;
However, this still takes two steps, and requires extra disk I/O. Using the out option in a SAS procedure (as in DomPazz's answer) is almost always faster and more efficient than using a data step just to move data.

Run Duration in SAS enterprise miner

I have the following problem. We have several streams in Enterprise Miner and we would like to be able to tell how long was each run. I have tried to create a macro that would save the start and end time/date but the problem is that global variables defined in a node, are not seen anymore in a subsequent node (so are global only inside a node, but not between nodes). How people usually solve the problem? Any idea or suggestion?
Thanks, Umberto
Just write out timestamps to log (EM should produce a global log in the same fashion that EG and DI do)
Either use:
data _null_;
datetime = datetime();
put datetime= datetime20.;
run;
or macro language:
%put EM node started at at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
with a higly customised message you have read the log in SAS looking for those strings using regex.
Solution 2:
Other option is to created a table in a library that is visible from EM and EG for example and have sql inserts at the beginning/end of your process.
proc sql;
create table EM_logger
(jobcode char(100),
timestamp num informat=datetime20. format=datetime20.);
quit;
proc sql;
insert into EM_logger values('Begining Linear Reg',%sysfunc(datetime()));
quit;
data w;
do i=1 to 10000000;
output;
end;
run;
proc sql;
insert into EM_logger values('End Linear Reg',%sysfunc(datetime()));
quit;
Table layout can be as complex as you want and as long as you can access it you can get your statistics.
Hope it helps