Hope someone can shed some light on this for me.
I have a process that uses the below table. There is a subsequent table (resource5) that has the same data as resource4 - basically I can use either table - not sure why there's two to be honest but it may come in handy here.
Both tables are updated sequentially twice an hour at irregular intervals, so I cannot schedule around them, and each table seems to take around 5 minutes to update.
I always need the latest available data, and other data is live so I'm hitting the table quite frequently (every 15 mins).
Is there a way to check whether resource4 is available to be locked by my process? If it is, run the data step; if not, hit resource5 instead; and if resource5 is not available either, quit the entire process so that nothing else (other proc sql steps against Oracle) tries to run.
As long as work.resource4 appears and is usable then all is well.
All my code does is this, once it's in WORK I can do whatever without fear of an issue.
data resource4;
set publprev.resource4;
run;
PS. I'm using SAS EG on Windows to make the change. The process is then exported as a .sas file with all of the code and runs on a Unix SAS server via crontab through a shell script, which also creates a log file. Probably not the most efficient way to schedule this stuff but it is what I have.
Many thanks in advance.
You can use the function open to check if a table is available to you for reading (i.e. copying to WORK).
You will have to use a macro variable to pass the name of the available data set to your DATA step.
Example:
* NOTE: The DATA Step will automatically close resources it open()s;
%let RESOURCE=NONE AVAILABLE;
data _null_;
if open ('publprev.resource4') ne 0 then do;
call symput('RESOURCE', 'publprev.resource4');
stop;
end;
if open ('publprev.resource5') ne 0 then do;
call symput('RESOURCE', 'publprev.resource5');
end;
run;
data work.resource;
set &RESOURCE;
run;
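The question also asks for the whole process to quit when neither table can be opened. A minimal sketch of one way to do that, assuming the RESOURCE macro variable was set by the DATA _NULL_ step above; the macro name copy_resource is hypothetical. Note that the open() check and the copy are separate steps, so a table could still become locked in between.

```sas
/* Hypothetical wrapper: copy the available table, or cancel the job */
%macro copy_resource;
  %if %superq(RESOURCE) = %str(NONE AVAILABLE) %then %do;
    %put ERROR: neither publprev.resource4 nor publprev.resource5 could be opened.;
    /* Cancels the submitted code; in batch this ends the SAS job */
    %abort cancel;
  %end;

  data work.resource4;
    set &RESOURCE;
  run;
%mend copy_resource;

%copy_resource
```

Because the copy is always written to work.resource4, later steps do not need to know which source table was used.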
I am new to SAS and I have the following problem: I need to join records I have just imported into one table with records I have stored in another table.
What happens is that I am going to run the code in SAS daily, and I need the table that I am going to create today (17/05/2021) by importing a file 'X', to join the table that I created yesterday (16/05/2021) by importing a file 'Y'.
And so the code will be executed tomorrow, the next day and so on.
In conclusion the records will accumulate as the days go by.
To tackle this problem, I am first creating two variables, one with the date of the day the code will be executed and the other with the date of the last execution.
%let daily_date = 20210423; /*YYYYMMDD*/
%let last_execution_date = 20210422; /*YYYYMMDD*/
Then the import of a file is done, we can see that the name of this created table has the date of the day in which the code is being executed.
data InputAC.RA_ratings&daily_date;
infile "&ruta_InputRA." FIRSTOBS=2
dsd lrecl=4096 truncover;
input
#1 RA_Customer_ID $10.
#11 Rating_ID 10.
#21 ISRM_Model_Overlay_ID $10.
#31 Constant_ID 10.
#41 Value $100.
;
run;
proc sort data=inputac.RA_ratings&daily_date;
by RA_Customer_ID Rating_ID;
run;
Finally the union of InputAC.RA_ratings&daily_date with InputAC.RA_ratings&last_execution_date is made. ('InputAC.RA_ratings&last_execution_date' should be the table that was imported at an earlier date than today.)
data InputAC.RA_ratings&daily_date;
merge
InputAC.RA_ratings&daily_date
InputAC.RA_ratings&last_execution_date;
by RA_Customer_ID Rating_ID;
run;
This is how the tables are being stored on the server.
(Ignore date 20210413, let's imagine it is 20210422)
However, I have to perform this task without using the variable 'last_execution_date'.
I've been researching but I still can't find any SAS function that can help me with this problem.
I hope someone can help me, thank you very much in advance.
This is a pretty complex and interesting question from an operations point of view. The answer depends on a few things.
1. How much control do you have over the execution of this process?
2. Is "yesterday" guaranteed, or does the process need to work if "last execution date" is not yesterday?
3. What should happen if the process is run twice today?
The best practices way to solve this is to have a dataset (or table) that stores the last execution date. That allows you to handle #2 trivially, and the answer to #3 might guide exactly how you store this but is easily handled anyway.
Say for example you have a table, MetaAC.LastExecDate (or, in Spanish, MetaAC.UltimaFecha or similar). It could store things this way:
data LastExecDate;
timestamp = datetime();
* INPUT needs a character argument, so quote the macro variable;
execdate = input("&daily_date",yymmdd8.);
run;
proc append base=MetaAC.LastExecDate data=LastExecDate;
run;
This lets you store an arbitrary execdate even if it's not today, and also store when you ran it (for audit purposes), and you could even add who ran it if that's interesting (there is a macro variable &sysuserid or similar). Then put all this at the bottom of your process, and it updates as you go.
Then, you can pull out from this the exact info you want - for example,
proc sql noprint;
/* format the date as YYYYMMDD so it can be used as a table-name suffix */
select put(max(execdate), yymmddn8.)
into :last_exec_date
from MetaAC.LastExecDate
where execdate ne today()
;
quit;
Now, if you don't have control over this for some reason, you could determine this in a different way. Again, the exact process depends on your circumstances and your answers to 2 and 3.
If your answer to 2 is you always want it to be yesterday, then this is really easy - just do this:
%let daily_date=20210517;
%let last_execution_date = %sysfunc(putn(%sysevalf(%sysfunc(inputn(&daily_date,yymmdd8.))-1),yymmddn8.));
%put &=last_execution_date;
The two %sysfuncs just do the input/put from SAS datastep inside the macro language, and %sysevalf lets you do math.
If you don't want it to always be the prior day (if there are weekends, or other days when you can't assume it's the prior day), then your best bet is either to use the dictionary tables to look at what's there and find the largest date prior to your date, or to use an X command to look at the folder and do the same thing. The OS command might be easier than SQL for this, since SQL dictionary tables can sometimes be slow.
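The dictionary-table route could be sketched as follows. This assumes every stored table is named RA_RATINGS followed by an 8-digit YYYYMMDD date and lives in the InputAC library, as in the question, and that &daily_date is already set.

```sas
proc sql noprint;
  /* Library and member names are stored in upper case in dictionary.tables */
  select max(substr(memname, 11))
    into :last_execution_date trimmed
    from dictionary.tables
    where libname = 'INPUTAC'
      and length(memname) = 18
      and substr(memname, 1, 10) = 'RA_RATINGS'
      /* names share a fixed layout, so text order equals date order */
      and memname < "RA_RATINGS&daily_date";
quit;

%put &=last_execution_date;
```

If no earlier table exists, the macro variable comes back blank, so the merge step should be skipped in that case.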
I am designing an automatic SAS program. I want it to execute the very first time I start SAS each day, and only once. That is to say, I may start SAS several times that day, but the program should be executed only the first time I start SAS.
There are also some restrictions:
1. It won't be executed on a day when I have not used SAS;
2. It won't be executed if I happen to be working in SAS at daybreak;
I think recording the number of SAS startups is the key, but I have no idea how to record it. Thanks for any hints.
Same as Quentin's comment
Add code such as the following to your autoexec.
options nodsnferr;
data _null_;
if not exist ('sasuser.laststart') then
call execute ('%include "my-once-a-day.sas";');
set sasuser.laststart;
if date < today() then
call execute ('%include "my-once-a-day.sas";');
run;
options dsnferr; /* restore the default once the check is done */
data sasuser.laststart;
date = today();
run;
If you run multiple concurrent SAS sessions with different autoexecs and sasuser paths the above is not sufficient.
I have one main SAS program that calls two other SAS programs.
These two programs create formats using proc format cntlin from large data sets. The formats are temporary, meaning they reside in the WORK library, and are used in the main program to assign formats to some variables.
In main sas program almost 15 large data sets are created in work library.
Some proc sql joins and data step merges are happening
We have index creation on data sets using proc datasets.
We also used proc sort
Wherever possible, where is used instead of if
The mprint, mlogic and symbolgen options are enabled
And some small logic wise performance tuning is done.
Most of the data sets are created in the WORK library. If we clear the whole WORK space, the previously created formats are lost. We don't want to lose the formats until the end of the job because they are used throughout the SAS program.
The job takes 1TB of SAS WORK space, so I want to reduce that usage.
Can someone please suggest what optimizations we can make to use less disk space as well as less memory?
Write the format catalogs to a different folder.
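A minimal sketch of that suggestion, assuming a folder outside WORK is available. The path, the libref fmtlib, the data set fmt_source and the format REGION are all hypothetical placeholders.

```sas
/* Hypothetical permanent folder for the format catalog */
libname fmtlib "/some/folder/outside/work";

/* Small stand-in for the large CNTLIN source data set */
data work.fmt_source;
  retain fmtname 'REGION' type 'N';
  input start label $;
  datalines;
1 North
2 South
;
run;

/* Store the format in fmtlib.formats instead of work.formats */
proc format cntlin=work.fmt_source library=fmtlib;
run;

/* Let SAS find formats stored in fmtlib */
options fmtsearch=(fmtlib work library);

/* WORK data sets can now be deleted without losing the formats */
proc datasets library=work memtype=data kill nolist;
quit;
```

In a real job you would more likely delete specific intermediate data sets as soon as they are no longer needed, rather than clearing all of WORK at once.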
I have developed a SAS process in Enterprise Guide 7.1 that sends e-mails daily (if need be).
The way it works is this:
[external program] generates a file which specifies who needs to be emailed and the subject matter.
My sas process then looks like this:
1. import this file.
2. manipulate this file.
3. generate emails based on contents of manipulated file.
The problem is, everything crashes if the original file imported in step 1 is empty. Is there a way to run the import, check if the dataset is empty, and then if it is terminate the entire sas process tree?
Thank you in advance, I've been searching but to no avail.
Best way would be to put step 2 and 3 completely in a macro and only execute it when step1 dataset is not empty.
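/* step 1: import the file into dataset mydata */
data _null_;
/* NOBS= is set at compile time, so store it before SET executes; */
/* otherwise an empty dataset ends the step before the macro variable is created */
call symputx('mydata_count', number);
stop;
set mydata nobs=number;
run;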
%macro m;
%if &mydata_count > 0 %then %do;
/* step 2: manipulate this file */
/* step 3: generate emails */
%end;
%mend;
%m;
As an alternative you could use the statements "endsas" or "abort", which both terminate your job and session, but they can have unwanted side effects. You can easily find these statements and information about them by searching for them together with the keyword sas.
Although the two statements do what you originally wanted, I would recommend the logical approach I posted first, because it gives you more control over what is happening and avoids those side effects.
IMO a better way is to start using a macro like %runquit;. See my answer here. https://stackoverflow.com/a/31390442/214994
Basically instead of using run; or quit; at the end of a step you use %runquit;. If any errors occurred during that step then the rest of the SAS process will be aborted. If running in batch, the entire process is killed. If running interactively, code execution stops, but your interactive session remains open.
EDIT: This assumes you get some kind of error message or warning if the file is empty.
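The linked answer is not reproduced here. A minimal sketch of what such a macro might look like, assuming that an error in the preceding step leaves the automatic variable SYSERR above 4 and that %abort cancel gives the stop behaviour described above:

```sas
%macro runquit;
  /* Close whatever step is currently open */
  ; run; quit;
  /* SYSERR greater than 4 indicates the step ended with an error */
  %if &syserr > 4 %then %do;
    %put ERROR: step failed with SYSERR=&syserr.. Cancelling the remaining code.;
    %abort cancel;
  %end;
%mend runquit;

/* Usage: end each step with the macro call instead of run or quit */
data work.class;
  set sashelp.class;
%runquit;
```

An empty input file only triggers this if the import step itself ends with an error, which matches the caveat in the EDIT.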
I found this piece of code online
data _null_;
set sashelp.class;
if mod(_n_,5)=0 then
rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
s = sleep(1); /* contrived delay - 1 second on Windows */
run;
I would like to know if you had any idea of how to adapt this piece to a proc sql statement, so I could track the progress of a long query...
For example
proc sql;
create table test as
select * from work.mytable
where mycolumn="thisvalue";
quit;
and somewhere in the statement above we would include the
rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
You wouldn't be able to directly check the progress of a SQL query, unfortunately (if it's operating on SAS datasets, anyway), except by monitoring the physical size of the table (you can do a directory listing of your WORK directory, or depending on how it's building the table, the Utility directory). However, it may or may not be linear; SQL might, for example, use a hash strategy which would not necessarily take up disk space until it was fairly close to being done.
For SQL, you're best off looking at the query plan to tell how long something's going to take. There are several guides out there, such as The SQL Optimizer Project, which explains the _METHOD and _TREE options among other things.
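As a sketch, the two options can be added to the PROC SQL statement of the query from the question (work.mytable and mycolumn come from that example). They write the chosen join and access methods to the log; they do not report live progress.

```sas
proc sql _method _tree;
  create table test as
  select * from work.mytable
  where mycolumn = "thisvalue";
quit;
```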