I run some SAS queries (in Base SAS 9.4) every hour. Is there any way to schedule them to run at a certain time every hour, in a certain order? Thanks for your help :)
I'd advise using some other means of scheduling processes (such as cron or Jenkins).
However, here is something that might be of use.
%macro do_every_hour;
%do %while(1); /* Never ends, so be sure this is what you want. */
data time; /* Record when this round began. */
begin=datetime();
run;
%do_the_queries; /* Your own queries go here. */
data time; /* Work out how long the queries took. */
set time;
end=datetime();
time_remain=max(3600-(end-begin), 0); /* Seconds left in the hour (datetime values are in seconds); never negative. */
call symputx("sleep_time", time_remain); /* Store the number in a macro variable for clarity's sake. */
run;
%let rc=%sysfunc(sleep(&sleep_time, 1)); /* Wait for the next round; the second argument sets the unit to seconds. */
%end; /* End of the %do loop. */
%mend do_every_hour;
%do_every_hour;
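If the queries should start at a fixed point in each hour rather than one hour after the previous start, the wait can be computed from the clock instead. This is a minimal sketch, assuming the top of the hour is the target; the variable names now and next_run are only illustrative.

```sas
/* Sketch: seconds remaining until the start of the next hour. */
data _null_;
  now = datetime();
  /* 'b' aligns the result to the beginning of the next hour interval */
  next_run = intnx('hour', now, 1, 'b');
  call symputx('sleep_time', next_run - now);
run;
%put NOTE: Sleeping for &sleep_time seconds.;
```

The resulting sleep_time could then be passed to the SLEEP function in the same way as in the loop above.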
Related
Instead of commenting in and out large blocks of code when developing, I am looking for an equivalent to the stop statement outside a data step which stops a SAS script at a certain point, ideally without throwing an error, nor setting brackets, nor defining own macros. All I found as minimal workaround is something like the following:
%put --- This is code I would like to execute;
data _null_;
abort cancel file;
run;
%put --- This is code which should be temporarily disabled;
Is there a shorter or cleaner way (in terms of log output) to stop executing a SAS script without quitting SAS altogether?
Related questions
Ending a SAS-Stored process properly inspired my example code.
break/exit script is about a slightly different problem
Create a macro, stop_submission:
%macro stop_submission;
%abort cancel;
%mend;
Sample use:
%macro stop_submission;
%abort cancel;
%mend;
data one;
set sashelp.class;
run;
%stop_submission
data two;
set sashelp.class;
run;
data three;
set sashelp.class;
run;
This will log something like:
29167 data one;
29168 set sashelp.class;
29169 run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.ONE has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
29170
29171 %stop_submission
ERROR: Execution canceled by an %ABORT CANCEL statement.
NOTE: The SAS System stopped processing due to receiving a CANCEL request.
I am designing an automatic SAS program. I want it to execute the very first time I start SAS each day, and only once. That is to say, I may start SAS several times in a day, but the program should run only the first time.
There are also some restrictions:
1. It should not run on a day when I do not use SAS at all;
2. It should not run if I happen to be working in SAS at daybreak;
I think recording the number of SAS startups is the key, but I have no idea how to record it. Thanks for any hints.
Same as Quentin's comment
Add code such as the following to your autoexec.
options nodsnferr;
data _null_;
if not exist ('sasuser.laststart') then
call execute ('%include "my-once-a-day.sas";');
set sasuser.laststart;
if date < today() then
call execute ('%include "my-once-a-day.sas";');
run;
options dsnferr; /* restore the default once the check is done */
data sasuser.laststart;
date = today();
run;
If you run multiple concurrent SAS sessions with different autoexecs and sasuser paths, the above is not sufficient.
Is there a way to get a list of all the outputs (datasets/files) created by a step(iteration) in SAS?
I tried using the automatic variables but all that I could get was the last created dataset using &syslast and &sysdsn variables. But what if a data step creates multiple datasets? How can I get their names/details automatically in SAS without using any list, etc keywords? Is there a way possible?
Please Suggest!
Thank you!
I don't believe this is possible. The only way I can think of is to parse the log following your data step / iteration.
For this you can use something like:
/* set up a fresh log prior to your iteration */
%let logloc=%sysfunc(pathname(work))/mylog.txt;
proc printto log="&logloc" new;
run;
/* run your iteration */
data mystep with lots of output datasets;
set something;
run;
/* return to normal logging */
proc printto log=log;
run;
data _null_;
infile "&logloc";
input;
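if _infile_=:'NOTE: The data set' then do; /* source lines in the log are prefixed with line numbers, so match the NOTE instead */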
/* perform log scanning */
/* will likely need some complex logic to be robust!*/
end;
run;
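As a rough illustration of the log scanning step, the sketch below pulls the data set name out of each "NOTE: The data set ... has ..." line in the redirected log. It assumes the note text has that usual form with the name as the fifth word, and that &logloc is still defined from the code above; the output data set name created is made up for the example.

```sas
/* Sketch: collect the names of data sets reported as created in the log. */
data created;
  infile "&logloc" truncover;
  input line $char256.;
  length dsname $ 41;
  if line =: 'NOTE: The data set' then do;
    /* NOTE: The data set WORK.ONE has ... -> fifth word is the name */
    dsname = scan(line, 5, ' ');
    output;
  end;
  keep dsname;
run;
```

Notes for views, or notes that wrap at a narrow line size, would need extra handling.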
PROC SCAPROC will report this in the log, with the caveat that you have to run the process first and then you'll get the output.
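A minimal sketch of how PROC SCAPROC is typically wrapped around the code of interest follows. The record file name is an assumption for the example, and the data step in the middle stands in for your own iteration; as far as I know the recorded file contains comment lines marked JOBSPLIT that list input and output data sets.

```sas
/* Sketch: start recording, run the step, then write the analysis out. */
proc scaproc;
  record "%sysfunc(pathname(work))/scaproc_record.txt";
run;

data one two;
  set sashelp.class;
run;

proc scaproc;
  write;
run;
```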
I am struggling to figure out the best way to generate random numbers reproducibly using multiple SAS data steps.
To do it in one data step is straightforward: just use CALL STREAMINIT at the start of the data step.
However, if I then use a second data step, I can't figure out any way to continue the sequence of random numbers. If I don't use CALL STREAMINIT at all in the second data step, then the random numbers in the second data step are not reproducible. If I use CALL STREAMINIT with the same seed, I get the same random numbers as in the first data step.
The only thing I can think of is to use CALL STREAMINIT with a different seed in each data step. Somehow that seems less satisfactory to me than using just one long random number sequence starting with the first data step.
So for example I could do something like this:
%macro myrandom;
%do i = 1 %to 10;
data dataset&i;
call streaminit(&i);
[do stuff involving random numbers]
run;
%end;
%mend;
But somehow using a predictable sequence of seeds seems like cheating. Should I be worried about that? Is that actually a perfectly acceptable way of doing it, or is there a better way?
Here is my attempt at this:
%macro dataset_rand(_num,_rows);
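data dataset;
call streaminit(123); /* seed once, before the loop; repeated calls in the same step have no further effect */
do i = 0 to &_rows - 1;
c = rand("UNIFORM");
varnum = mod(i,&_num.) +1; /* which output dataset this row goes to */
output;
end;
run;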
data %do i = 1 %to &_num.;
dataset&i.
%end;
;
set dataset;
%do j = 1 %to &_num;
if varnum = &j. then
output dataset&j.;
%end;
run;
%mend;
%dataset_rand(10,100);
Here I ran one step to create every single row with a single random variable and another variable that will be used to assign it to a dataset.
The inputs are _num and _rows, which let you choose how many tables and how many rows in total, so the example (10,100) creates 10 tables of 10 rows, with dataset1 holding the 1st, 11th ... 91st members of the random sequence.
That said, I don't know of any reason why 10 datasets with 10 seeds would be any better or worse than 1 dataset with 1 seed split into 10.
Using RANUNI or similar (the 'old' random number streams), you would use call ranuni to accomplish this. This lets you save the seed for the next round, and then you could call symputx that value to the next datastep and re-start the same stream. That's because the output value for one pseudorandom value is a direct variation on the seed for the next in that algorithm.
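A minimal sketch of that CALL RANUNI approach, assuming a starting seed of 7 and a macro variable named next_seed (both chosen only for illustration):

```sas
/* First step: draw 10 numbers and save the updated seed. */
data first;
  seed = 7;
  do i = 1 to 10;
    call ranuni(seed, x); /* updates seed on every call */
    output;
  end;
  call symputx('next_seed', seed);
  keep x;
run;

/* Second step: restart the stream from the saved seed. */
data second;
  seed = &next_seed;
  do i = 1 to 10;
    call ranuni(seed, x);
    output;
  end;
  keep x;
run;
```

Together, first and second should hold the same 20 values that a single data step would produce from seed 7 with 20 calls.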
However, using RAND, the seed is more complicated (it's not really just one value, after the first number was called). From the documentation:
The RAND function is started with a single seed. However, the state of the process cannot be captured by a single seed. You cannot stop and restart the generator from its stopping point.
This is of course a simplification (obviously SAS is capable of doing so, it just doesn't open up the right hooks for you to do so, presumably as it's not as straightforward as call ranuni is).
What you can do, though, is use the macro language, depending on exactly what you're trying to do. Using %syscall and %sysfunc, you can get a single stream that goes across data steps.
However, one caveat: it doesn't look like you can ever reset it. From documentation on Seed Values:
When the RANUNI function is called through the macro language by using %SYSFUNC, one pseudo-random number stream is created. You cannot change the seed value unless you close SAS and start a new SAS session. The %SYSFUNC macro produces the same pseudo-random number stream as the DATA steps that generated the data sets A, B, and C for the first macro invocation only. Any subsequent macro calls produce a continuation of the single stream.
This is specific to the ranuni family, but it looks like it is also true for the rand family.
So, start up a new session of SAS, and run this:
%macro get_rands(seed=0, n=, var=, randtype=Uniform, randargs=);
%local i;
%syscall streaminit(seed);
%do i = 1 %to &n;
&var. = %sysfunc(rand(&randtype. &randargs.));
output;
%end;
%mend get_rands;
data first;
%get_rands(seed=7,n=10,var=x);
run;
data second;
%get_rands(n=10,var=x);
run;
data whole;
call streaminit(7);
do _i = 1 to 20;
x = rand('Uniform');
output;
end;
run;
But don't make the mistake of running it twice in one session.
Otherwise, your best bet is to generate your random numbers once, then use them in multiple data steps. If you use BY groups, it's easy to manage things this way. If you have specific questions how to implement your project in this way, let us know in a new question.
Not sure if it's much easier, but you can use the STREAM subroutine to generate multiple independent streams from the same initial seed. Below is an example, slightly modified from the doc on CALL STREAM.
%macro RepeatRand(N=, Repl=, seed=);
%do k = 1 %to &Repl;
data dataset&k;
call streaminit('PCG', &seed);
call stream(&k);
do i = 1 to &N;
u = rand("uniform");
output;
end;
run;
%end;
%mend;
%RepeatRand(N=8, Repl=2, seed=36457);
In PROC SQL we can specify the STIMER option:
The PROC SQL option STIMER | NOSTIMER specifies whether PROC SQL writes timing information for each statement to the SAS log, instead of writing a cumulative value for the entire procedure. NOSTIMER is the default.
How can I get timing information in the same way for a data step? I am not using a PROC SQL step.
data h;
select name,empid
from employeemaster;
quit;
PROC SQL steps individually are effectively separate data steps, so in a sense you always get the identical information from SAS. What you're asking is effectively how to find out how long 'select name' takes versus 'empid'.
There's not a direct way to get the timing of an individual statement in a data step, but you could write data step code to find out. The problem is that the data step is executed row-wise, so it's really quite different from the PROC SQL STIMER details; almost nothing you do in a data step will take very long by itself, unless you are doing something more complex like a hash table lookup. What takes long is writing out the data first, and reading in the data second.
You do have some options for troubleshooting long data steps, if that's your concern. OPTIONS MSGLEVEL=I will give you information about index usage, merge details, etc., which can be helpful if you aren't sure why it is taking a long time to do certain things (see http://goo.gl/bpGWL in SAS documentation for more info). You can write your own timestamp:
data test;
set sashelp.class sashelp.class;
_t=time();
put _t=;
run;
Odds are that won't show you much of use, since most data step iterations won't take very long, but if you are doing something fancy it might help. You could also use conditional statements to print the time only at certain intervals, for example at FIRST.ID in a process that works BY ID.
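As a sketch of printing the time only at group boundaries, the example below uses sashelp.class with SEX as a stand-in for the ID variable (that data set has no ID column); the name class_sorted is made up for the example.

```sas
/* Sort so that BY-group processing is possible. */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data test;
  set class_sorted;
  by sex;
  /* Write a timestamp to the log only when a new group starts. */
  if first.sex then do;
    _t = datetime();
    put 'Starting group ' sex= _t= datetime20.;
  end;
  drop _t;
run;
```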
Ultimately, though, the information you already get from the log notes is what is most useful. In PROC SQL you need the STIMER information because SQL is doing several things at once, while data step programming makes you lay everything out step by step. Example:
PROC SQL;
create table X as select * from A,B where A.ID=B.ID;
quit;
is one step, but with PROC SORT and a data step this would be:
proc sort data=a; by ID; run;
proc sort data=b; by ID; run;
data x;
merge a(in=a) b(in=b);
by id;
if a and b;
run;
For that you would get information on the duration of each of those steps (the two sorts and the merge) in SAS, which is similar to what STIMER would tell you.
No way.
PROC SQL STIMER logs timing for each separately executable SQL statement/query.
In a data step, as you may know, the implicit loop runs observation by observation, so any statement timing would be per observation, let's say transactional. In any case, this would not show where the time is actually being spent: waiting for disk reads, writes, etc.
So I guess this won't be very usable. In general, SAS performance is I/O driven.
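For step-level detail beyond the default real time and CPU time notes, the FULLSTIMER system option may help. A minimal sketch, using sashelp.class as a placeholder input; the exact statistics reported depend on the operating environment:

```sas
/* Ask SAS to write fuller resource statistics for each step to the log. */
options fullstimer;

data test;
  set sashelp.class;
run;

/* Switch back to the shorter timing notes. */
options nofullstimer;
```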