Quit vs Run statements in SAS - sas

In SAS, what is the difference between 'quit' and 'run'? statements? I cannot figure out when to use 'quit' and when to use 'run'? For example, why is proc datasets using quit but proc contents using run

This dates back to where SAS used to be a mainframe program (and still can be!).
RUN; is a command for SAS to run the submitted statements. Back in the older mainframe days, statements would've been submitted to SAS one at a time (or in batches, but the core concept here is that each line is separate from SAS's point of view). SAS accepts statements without doing anything until it hits a RUN; or something else that would create a step boundary (another DATA or PROC line, usually). In a data step, or a non-interactive proc (proc means, for example - a proc that can only do one set of instructions, and then exits), run tells it to do (whatever) and then return to a blank slate.
QUIT; is used in interactive programming environments. IML, SQL, many of the regression and modelling PROCs, FORMAT, TEMPLATE, DATASETS, etc. - all can be used interactively, meaning, more than one set of instructions can be sent to them.
In these interactive cases, you want SAS to go ahead and run some of the instructions, but still keep that PROC's environment open - your next statement would be in the same PROC, for example. Some of those run immediately - PROC SQL is a good example of this - while some (particularly the modelling PROCs) RUN; does something (tells it to run the model so far) but it won't exit the proc until QUIT; is encountered (or another step boundary that requires it to exit, i.e. a data/proc statement). These are called "run groups", and "run group processing" is the term you'll see associated with that.
You will find that some people put run; quit; at every point that run; or quit; might be appropriate; that doesn't hurt anything, though it isn't really 'right', either. And there are some cases where it's needed to do that!
One example:
/* first run group*/
proc gplot data=sales;
title1 "Sales Summary";
plot sales*model_a;
run;
/* second run group */
plot sales*model_b;
run;
quit;
(from run-group processing )

Related

How to accumulate the records of a newly imported table with the records of another table that I have stored on the servers in SAS?

I am new to SAS and I have the following problem:
When trying to join records I just imported (in one table) with records I have stored in another table.
What happens is that I am going to run the code in SAS daily, and I need the table that I am going to create today (17/05/2021) by importing a file 'X', to join the table that I created yesterday (16/05/2021) by importing a file 'Y'.
And so the code will be executed tomorrow, the next day and so on.
In conclusion the records will accumulate as the days go by.
To tackle this problem, I am first creating two variables, one with the date of the day the code will be executed and the other with the date of the last execution.
%let daily_date = 20210423; /*AAAAMMDD*/
%let last_execution_date = 20210422; /*AAAAMMDD*/
Then the import of a file is done, we can see that the name of this created table has the date of the day in which the code is being executed.
data InputAC.RA_ratings&daily_date;
infile "&ruta_InputRA." FIRSTOBS=2
dsd lrecl=4096 truncover;
input
#1 RA_Customer_ID $10.
#11 Rating_ID 10.
#21 ISRM_Model_Overlay_ID $10.
#31 Constant_ID 10.
#41 Value $100.
;
run;
proc sort data=inputac.RA_ratings&daily_date;
by RA_Customer_ID Rating_ID;
quit;
Finally the union of InputAC.RA_ratings&daily_date with InputAC.RA_ratings&last_execution_date is made. ('InputAC.RA_ratings&last_execution_date' should be the table that was imported at an earlier date than today.)
data InputAC.RA_ratings&fec_diario;
merge
InputAC.RA_ratings&fec_diario
InputAC.RA_ratings&ultima_fecha_de_ejecucion;
by RA_Customer_ID Rating_ID;
run;
This is how the tables are being stored on the server.
(Ignore date 20210413, let's imagine it is 20210422)
However, I have to perform this task without using the variable 'last_execution_date'.
I've been researching but I still can't find any SAS function that can help me with this problem.
I hope someone can help me, thank you very much in advance.
This is a pretty complex and interesting question from an operations point of view. The answer depends on a few things.
How much control do you have over the execution of this process?
Is "yesterday" guaranteed, or does the process need to work if "last execution date" is not yesterday?
What should happen if the process is run twice today?
The best practices way to solve this is to have a dataset (or table) that stores the last execution date. That allows you to handle #2 trivially, and the answer to #3 might guide exactly how you store this but is easily handled anyway.
Say for example you have a table, MetaAC.LastExecDate (or, in spanish, MetaAC.UltimaFecha or similar). It could store things this way:
data LastExecDate;
timestamp = datetime();
execdate = input(&daily_date,yymmdd8.);
run;
proc append base=MetaAC.LastExecDate data=LastExecDate;
run;
This lets you store an arbitrary execdate even if it's not today, and also store when you ran it (for audit purposes), and you could even add who ran it if that's interesting (there is a macro variable &sysuserid or similar). Then put all this at the bottom of your process, and it updates as you go.
Then, you can pull out from this the exact info you want - for example,
proc sql;
select max(execdate)
into :last_exec_date
from MetaAC.LastExecDate
where execdate ne today()
;
quit;
Now, if you don't have control over this for some reason, you could determine this in a different way. Again, the exact process depends on your circumstances and your answers to 2 and 3.
If your answer to 2 is you always want it to be yesterday, then this is really easy - just do this:
%let daily_date=20210517;
%let last_execution_date = %sysfunc(putn(%sysevalf(%sysfunc(inputn(&daily_date,yymmdd8.))-1),yymmddn8.));
%put &=last_execution_date;
The two %sysfuncs just do the input/put from SAS datastep inside the macro language, and %sysevalf lets you do math.
If you don't want it to always be the prior day (if there are weekends, or other days you don't necessarily want to assume it's the prior day), then your best bet is to either use the dictionary tables to look at what's there and find the largest date prior to your date, or maybe use a x command to look at the folder and do the same thing (might be easier to use OS command than to use SQL for this, sometimes SQL dictionary tables can be slow).

Understanding SAS output data sets

SAS has several forms it uses to create output data sets from within a procedure. It is not always clear whether or not a particular procedure can generate a data set and, if it seems to be able to, it's not always clear how.
Off the top of my head, here are some examples of how widely the syntax can differ.
Example 1
proc sort data = sashelp.baseball out = baseball_sorted;
by
league
division
;
run;
Example 2
proc means noprint data = baseball_sorted;
by
league
division
;
var nHits;
output
out = baseball_avg_hits (drop = _TYPE_ _FREQ_)
mean = mean_hits
;
run;
Example 3
ods exclude all;
ods output
statistics = baseball_statistics
equality = baseball_ftest
;
proc ttest data = baseball_sorted;
class league;
var nHits;
run;
ods exclude none;
Example 4
The PROC ANOVA OUTSTAT= option.
It seems almost as if SAS has implemented each of these willy-nilly. Is the SAS syntax dictating how to create a data set directed by some consistent approach I am not seeing or is it truly capricious and arbitrary?
For PROC code, the syntax for outputting data is often specific to that procedure, which often feels willy-nilly. (Your examples 1, 2, 4) I think PROC developers are given a lot of freedom, and remember that many of these PROCS are 30+ years old.
The great thing about the Output Delivery System (ODS, your example 3) is it provides a single syntax for outputting data, regardless of the procedure. So you can use the ODS OUTPUT statement with (almost?) any PROC. The names and structures of the output objects will of course vary between PROCs. So if you are looking for a consistent approach, I would focus on using ODS OUTPUT. ODS was added in V7 (I think).
It would be interesting to try to find an example of an output dataset which could be made by a PROC but could not be made by ODS OUTPUT. I hope there aren't any. If that is the case, you could consider the range of OUTPUT statements/options within PROCs as legacy code.
Agree with Quentin. You have to remember that there are SAS systems out there running code written in the 80s. SAS would have a huge headache if they forced every team to rewrite all the procedures and then forced their customers to change all their code. SAS has been around since the 60s and the organic growth of the syntax is to be expected.
FWIW, having an OUT= statement makes sense on things with no graphical output. I.E. PROC SORT or PROC TRANSPOSE.
The way I see it there are four main ways to specify the output data sets.
In the PROC statement you may be able to specify some type of output statements or options, such as OUT= OUTEST=.
In the main statement of the procedure, ie MODEL/TABLE can have options that allow for output. ie PROC FREQ has an OUT= on the TABLE statement.
An explicit OUTPUT statement within a procedure. These are typically from older procedures. ie PROC MEANS
ODS tables which are relatively newer method, more frequently used these days since the format aligns with what you'd expect to see.
Yes, there are multiple places to check, but fortunately the SAS documentation for procedures is relatively clear with the options and how to use/specify the outputs.
If I've missed anything that seems different post in the comments and I can update this.
PS. Although SAS is definitely bad, trying to navigate different packages/modules in Python to export an XLSX file isn't straight forward either. Some packages support some options others don't. I've given up on asking why these days and just accept it as peculiarities of the different languages at this point.

SAS: Track progress inside of a Proc Sql statement

I found this piece of code online
data _null_;
set sashelp.class;
if mod(_n_,5)=0 then
rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
s = sleep(1); /* contrived delay - 1 second on Windows */
run;
I would like to know if you had any idea of how to adapt this piece to a proc sql statement, so I could track the progress of a long query...
For example
proc sql;
create table test as
select * from work.mytable
where mycolumn="thisvalue";
quit;
and somewhere in the statement above we would include the
rc = dosubl(cats('SYSECHO "OBS N=',_n_,'";'));
You wouldn't be able to directly check the progress of a SQL query, unfortunately (if it's operating on SAS datasets, anyway), except by monitoring the physical size of the table (you can do a directory listing of your WORK directory, or depending on how it's building the table, the Utility directory). However, it may or may not be linear; SQL might, for example, use a hash strategy which would not necessarily take up disk space until it was fairly close to being done.
For SQL, you're best off looking at the query plan to tell how long something's going to take. There are several guides out there, such as The SQL Optimizer Project, which explains the _METHOD and _TREE options among other things.

Get ODS output table names without actually having to run a PROC?

Question: is there a way to just get the ODS table names from a PROC without running the program up to that point with ods trace on?
Background: I often need to output an ODS data set from a PROC, but the only method I know of to get the list of available data sets is to insert
ods trace on;
before the PROC, then run the program, then review the log file to find the appropriate data set name, then insert my ods output statement, and re-run.
In a time-consuming program, that process can take a lot of time, and it just seems inefficient to have to run a program in order to figure out how to continue programming.
I can't find any SAS documentation that lists the available ODS tables by PROC, but if something like that exists, that would be a great answer to this question. I know the ODS output tables vary depending on which options are specified, but it still seems like a comprehensive list could be compiled, with notes about whether each table is dependent on PROC-specific options.
I'd also love it if there were something like a meta-PROC where a PROC name could be specified, and the ODS table names are returned, without running any other code.
There is a compilation in the SAS doc. This is for 9.4, but there is one in all the versions.
http://support.sas.com/documentation/cdl/en/odsug/66611/HTML/default/viewer.htm#p0mnbijm0t6w1cn1dpf3q5suxk4u.htm
You can run it with options obs=1; (just reset to obs=max later) if the output procedure is capable of running with zero observations (in general, if you're not doing any programmatic code generation, this is probably the case). You can also likely create a test procedure run that does not use your actual data (although that depends on what you're doing). (You cannot use obs=0, as that would produce no output.)
For example:
options obs=1;
ods trace on;
proc freq data=sashelp.class;
run;
ods trace off;
options obs=max;
You also may be able to determine the names from the installed templates, if it is a procedure that uses templates. For example, PROC FREQ does.
Bring up the Results explorer, and right click on the "Results" node up at the top. Select "Templates". That opens the Template explorer. Then look about for your procedure. Most of the statistical procedures are in Sashelp.Tmplstat. FREQ is not, however; it is in Sashelp.Tmplbase, under Base.Freq. Each of the PROC FREQ entries that starts with define table ... is a separate ODS table; the primary ones for PROC FREQ are Onewayfreqs and CrosstabFreqs, but generally all of the ones with a similar icon to those are tables (the blue ones are dynamic variables).
For PROC REG, for example, it is in the SASHELP.tmplstat folder (SASHELP.tmplstat.Reg), and has a few dozen tables available to see, generally with logical names. Not every table is produced from every run (it depends on what you ask for and what the PROC decides is needed), and I'm not sure every single one is available to intercept via ODS, but most of them are.

how to get timing information of a data step query

I just wanted to know like in proc sql we define stimer option.
The PROC SQL option STIMER | NOSTIMER specifies whether PROC SQL writes timing information for each statement to the SAS log, instead of writing a cumulative value for the entire procedure. NOSTIMER is the default.
Now in same way how to specify timing information in data set step. I am not using proc sql step
data h;
select name,empid
from employeemaster;
quit;
PROC SQL steps individually are effectively separate data steps, so in a sense you always get the identical information from SAS. What you're asking is effectively how to find out how long 'select name' takes versus 'empid'.
There's not a direct way to get the timing of an individual statement in a data step, but you could write data step code to find out. The problem is that the data step is executed row-wise, so it's really quite different from the PROC SQL STIMER details; almost nothing you do in a data step will take very long by itself, unless you are doing something more complex like a hash table lookup. What takes long is writing out the data first, and reading in the data second.
You do have some options for troubleshooting long data steps, if that's your concern. OPTIONS MSGLEVEL=I will give you information about index usage, merge details, etc., which can be helpful if you aren't sure why it is taking a long time to do certain things (see http://goo.gl/bpGWL in SAS documentation for more info). You can write your own timestamp:
data test;
set sashelp.class sashelp.class;
_t=time();
put _t=;
run;
Odds are that won't show you much of use since most data step iterations won't take very long but if you are doing something fancy it might help. You could also use conditional statements to only print the time at certain intervals - when at FIRST.ID for example in a process that works BY ID;.
Ultimately though the information you already get from notes is what is most useful. In PROC SQL you need the STIMER information because SQL is doing several things at once, while SAS lets/makes you do everything out step-wise. Example:
PROC SQL;
create table X as select * from A,B where A.ID=B.ID;
quit;
is one step - but in SAS this would be:
proc sort data=a; by ID; run;
proc sort data=b; by ID; run;
data x;
merge a(in=a) b(in=b);
by id;
if a and b;
run;
For that you would get information on the duration of each of those steps (the two sorts and the merge) in SAS, which is similar to what STIMER would tell you.
No way.
PROC SQL STIMER logs timing for each separately executable SQL statement/query.
In data step, as you may know, the data step looping occurs, observation per observation, so the data step statement timing would be something like per observation, let's say transactional. Anyway this would not describe all the details where the time is being spent - waiting for disk reads, writes, etc.
So I guess this won't be very usable. In general, SAS performance is I/O driven.