List all the output created by a SAS step - sas

Is there a way to get a list of all the outputs (datasets/files) created by a step(iteration) in SAS?
I tried using the automatic variables but all that I could get was the last created dataset using &syslast and &sysdsn variables. But what if a data step creates multiple datasets? How can I get their names/details automatically in SAS without using any list, etc keywords? Is there a way possible?
Please Suggest!
Thank you!

I don't believe this is possible. The only way I can think of is to parse the log following your data step / iteration.
For this you can use something like:
/* set up a fresh log prior to your iteration */
%let logloc=%sysfunc(pathname(work))/mylog.txt;
proc printto log="&logloc" new;
run;
/* run your iteration */
data mystep with lots of output datasets;
set something;
run;
/* return to normal logging */
proc printto log=log;
run;
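data _null_;
infile "&logloc" truncover;
input;
/* each created data set is reported as "NOTE: The data set LIB.NAME has ..." */
if _infile_ =: 'NOTE: The data set' then do;
dsname = scan(_infile_, 5, ' ');
put dsname=;
/* will likely need more complex logic to be robust, e.g. for notes that wrap across lines */
end;
run;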

PROC SCAPROC will report this in the log, with the caveat that you have to run the process first and then you'll get the output.
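A minimal sketch of the PROC SCAPROC approach, assuming the record is written to a file in the WORK directory (the file name scaproc.txt and the data sets a and b are illustrative only). The recorded comments should include one line per output data set containing the text DATASET OUTPUT, which the last step prints to the log:

```sas
/* start recording; the file location is a hypothetical choice */
proc scaproc;
   record "%sysfunc(pathname(work))/scaproc.txt";
run;

/* the step to analyse: one DATA step creating two data sets */
data a b;
   set sashelp.class;
   if sex = 'F' then output a;
   else output b;
run;

/* stop recording and write the analysis to the record file */
proc scaproc;
   write;
run;

/* print the recorded lines that describe output data sets */
data _null_;
   infile "%sysfunc(pathname(work))/scaproc.txt" truncover;
   input;
   if index(_infile_, 'DATASET OUTPUT') then put _infile_;
run;
```

As with the log-parsing approach, the information is only available after the step has run.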

Why in SAS data step where works but not if

Working code:
data t2;
set t1;
where a like "%SR";
run;
Code errored:
data t2;
set t1;
if a like "%SR";
run;
Error message:
ERROR 388-185: Expecting an arithmetic operator.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
It complained about 'like'
Any idea?
LIKE is not an operator that SAS code understands. The only reason it works in WHERE is because WHERE statement supports SQL syntax such as LIKE and BETWEEN to make it easier to push the WHERE condition into a remote database.
Use some other way to test if the last two letters are SR. Here are two methods.
if 'SR' = substrn(a,length(a)-1);
if 'RS' =: left(reverse(a));
The most similar solution is to use prxmatch:
data t2;
set t1;
if prxmatch('/SR\s*$/', a); /* case-sensitive, anchored at the end of the value, trailing blanks ignored */
run;
Note that this is much slower than WHERE with LIKE, and Tom's solutions are faster if there's a reasonable way to do them (as there is in the example).
The LIKE operator is not understood by the DATA Step IF statement.
LIKE is available to DATA Step in the WHERE statement, the WHERE= data set option, or PROC SQL WHERE clause.
data have;
input text $CHAR20.;
datalines;
ABCEFG
YESSR
Mark JR
Mark SR
;
data want;
set have;
where text like '%SR'; /* where statement */
run;
data want;
set have(where=(text like '%SR')); /* where= option */
run;
proc sql;
create table want as
select text from have
where text like '%SR' /* where clause */
;
quit;

permanently save modified dataset

I know this is a very basic question but my code keeps failing when trying to run what I found through the help documentation.
Up to now I have been running an analysis project out of the WORK library, which I understand gets wiped out every time a session ends. I have done a bunch of data cleaning and preparation and do not want to have to do that every time before I start my analysis.
So I understand, from reading this: https://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001310720.htm that I have to output the cleaned dataset to a non-temporary directory.
Steps I have taken so far:
1) created a new Library called "Project"
2) Saved it in a folder that I have under "my folders" in SAS
3) My code for saving the cleaned dataset to the "Project" library is as follows:
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Then I run this code in a new program:
PROC PRINT DATA=PROJECT.FAA_ALL;
RUN;
It says there are no observations and that the dataset is essentially empty.
Can someone tell me where I'm going wrong?
Your problem is the PROC SORT
PROC SORT DATA=FAA_ALL NODUPKEY;
BY GROUND_SPEED;
DATA PROJECT.FAA_ALL;
RUN;
Should be
PROC SORT DATA=FAA_ALL OUT= PROJECT.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
The DATA PROJECT.FAA_ALL statement started a new DATA step with no SET statement, so it created a data set with no variables instead of saving the sorted data.
Something else worth mentioning: your data step didn't do what you might have expected because you had no set statement. Your code was equivalent to:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET _NULL_;
RUN;
PROJECT.FAA_ALL is empty because nothing was read in.
Without an OUT= option, the SORT procedure sorts a dataset in place, replacing the input dataset with the sorted version. You could have SAS copy the sorted data by adding a SET statement to your data step:
PROC SORT DATA=WORK.FAA_ALL NODUPKEY;
BY GROUND_SPEED;
RUN;
DATA PROJECT.FAA_ALL;
SET WORK.FAA_ALL;
RUN;
However, this still takes two steps and requires extra disk I/O. Using the OUT= option in a SAS procedure (as in DomPazz's answer) is almost always faster and more efficient than using a data step just to move data.
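To tie this together, here is a sketch of the whole round trip. The folder path is a hypothetical placeholder (use whatever folder the PROJECT library actually points to), and the libref has to be assigned again in each new session before the saved data can be read:

```sas
/* the path is hypothetical; point it at the folder that holds the library */
libname project '/folders/myfolders/project';

/* sort and save to the permanent library in one step */
proc sort data=work.faa_all out=project.faa_all nodupkey;
   by ground_speed;
run;

/* in a later session: reassign the libref, then read the saved copy */
libname project '/folders/myfolders/project';

proc print data=project.faa_all(obs=10);
run;
```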

Run Duration in SAS enterprise miner

I have the following problem. We have several streams in Enterprise Miner and we would like to be able to tell how long each run took. I have tried to create a macro that saves the start and end time/date, but the problem is that global variables defined in a node are no longer visible in a subsequent node (so they are global only inside a node, not between nodes). How do people usually solve this problem? Any ideas or suggestions?
Thanks, Umberto
Just write out timestamps to log (EM should produce a global log in the same fashion that EG and DI do)
Either use:
data _null_;
datetime = datetime();
put datetime= datetime20.;
run;
or macro language:
%put EM node started at %sysfunc(time(),timeampm.) on %sysfunc(date(),worddate.).;
With a highly customised message, you can then read the log back into SAS and search for those strings using a regex.
Solution 2:
Another option is to create a table in a library that is visible from both EM and EG, for example, and run SQL inserts at the beginning and end of your process.
proc sql;
create table EM_logger
(jobcode char(100),
timestamp num informat=datetime20. format=datetime20.);
quit;
proc sql;
insert into EM_logger values('Beginning Linear Reg',%sysfunc(datetime()));
quit;
data w;
do i=1 to 10000000;
output;
end;
run;
proc sql;
insert into EM_logger values('End Linear Reg',%sysfunc(datetime()));
quit;
Table layout can be as complex as you want and as long as you can access it you can get your statistics.
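As a rough sketch of getting a duration back out, assuming the EM_logger table above holds only the rows for a single run, the elapsed time is the difference between the latest and earliest timestamps (the column aliases are illustrative):

```sas
/* elapsed time between the first and last logged rows */
proc sql;
   select min(timestamp) as run_start format=datetime20.,
          max(timestamp) as run_end   format=datetime20.,
          max(timestamp) - min(timestamp) as elapsed format=time12.
   from EM_logger;
quit;
```

If several runs share the table, a run identifier column would be needed so the query can group by it.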
Hope it helps

Perform Iterative Operations on OUTEST or OUTSTAT variables in SAS?

In SAS, how can I assign a variable coming from either the OUTEST or OUTSTAT functions to be used in a loop?
For example, say I want to run some sort of iterative analysis until my mean (average) reaches a certain threshold. I know how to extract the mean using either OUTEST or OUTSTAT, but then how can I perform operations or blocks of code on it?
Thank you.
If you are interested in details, I am trying to perform backward selection of VIFs (to remove multicollinearity). Unfortunately, SAS doesn't seem to have a 'SELECTION=BACKWARD' feature for this...
EDIT: Updated with sample code:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
/* PRELIMINARY PROC REG ON ALL FIELDS*/
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST1;
RUN;
/* RETAIN NON-NULL VIF FIELDS ONLY */
DATA NO_NULL_VIF;
SET PAREST1 (WHERE=(VarianceInflation <> .));
RUN;
/* CREATE VARIABLE LIST OF NON-NULL VIF FIELDS */
PROC SQL;
SELECT VARIABLE
INTO :NO_NULL_VIF_FIELDS SEPARATED BY ' '
FROM NO_NULL_VIF;
QUIT;
/* RE-RUN REGRESSION WITH NON-NULL VIF FIELDS ONLY */
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &NO_NULL_VIF_FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
/* START ITERATION OF DROPPING THE HIGHEST VIF UNTIL THE CRITERIA IS MET */
???
%MEND;
%MULTICOLLINEARITY(, RESPONSE, &INPUT_FIELDS,???)
And by criteria I mean VIF_MAX < N, where N is some threshold specified in the macro. For example, if we want to retain only fields with VIF less than 5, then it should drop the highest one, re-run the PROC REG, drop the highest, re-run, and so on until the highest one is less than 5.
First off - I'd verify that you can't do this using PROC MODEL. I'm not a regression guy so I don't know for sure. Might be worth posting on a more stat-focused site; CV isn't really appropriate since they're not generally trying to answer software questions, but maybe communities.sas.com . I would find it surprising if this wasn't directly possible in PROC MODEL and/or in one of the more complicated procs.
Second, the way I'd approach this is to write a recursive macro. Take out the first part (the non-null VIF fields) and either move that to an outer macro that just runs once, or make it an expectation of the programmer to do on his/her own (unless this is not feasible, and/or can change with iterations - not something I'm knowledgeable of). Then do something like this:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
ods _all_ close;
%put Running with &fields; *note which fields currently running;
*also may want to include a run # counter as parameter;
PROC REG DATA=TABLE_&TABLE_SUFFIX.;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
quit;
*Data step to analyse PAREST2 and see if any of the fields can be dropped;
proc sort data=parest2;
by descending varianceinflation;
run;
data _null_;
set parest2(obs=1);
if varianceinflation > &max_vif then do;
fields_run = tranwrd("&fields",trim(variable),' ');
if not missing(fields_run) then do;
call_string = cats('%multicollinearity(',"&table_suffix.,&yvar.,",fields_run,",&max_vif.)");
call execute(call_string);
end;
end;
else do;
put "Stopped with Max VIF:" variable "=" varianceinflation;
end;
run;
ods preferences;
%MEND MULTICOLLINEARITY;
Then you call it once with the full field list, and it calls itself in the CALL EXECUTE if there is still a parameter left. An incremented # of runs may be helpful (both to see how many times it ran in your log, and to be able to make sure that you don't end up in an infinite loop if you make a mistake with the fields variable deletion.)
I would run this with OPTION NONOTES NOSOURCE; and none of the symbogen/mprint stuff on, so you can just get the %put/put statements in your log.
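For comparison, here is a sketch of a different technique from the recursive macro above: a plain %DO %WHILE loop that copies the highest VIF into macro variables with CALL SYMPUTX and tests it with %SYSEVALF. The macro name vif_loop, its parameters, and the max_runs guard are all hypothetical, and it assumes every field has a non-missing VIF:

```sas
/* Hypothetical iterative variant; names and defaults are illustrative */
%macro vif_loop(data, yvar, fields, max_vif, max_runs=20);
   %local run done top_var top_vif;
   %let run  = 0;
   %let done = 0;

   %do %while (&done = 0 and &run < &max_runs);
      %let run = %eval(&run + 1);

      ods exclude all;     /* suppress printed output; ODS OUTPUT still captures */
      proc reg data=&data;
         model &yvar = &fields / vif noint;
         ods output ParameterEstimates=parest;
      run;
      quit;
      ods exclude none;

      proc sort data=parest;
         by descending VarianceInflation;
      run;

      /* copy the largest VIF into macro variables the loop can test */
      data _null_;
         set parest(obs=1);
         call symputx('top_var', Variable);
         call symputx('top_vif', VarianceInflation);
      run;

      %put Run &run: highest VIF is &top_vif for &top_var;

      %if %sysevalf(&top_vif > &max_vif) %then %do;
         /* rebuild the field list without the first (highest-VIF) row */
         proc sql noprint;
            select Variable into :fields separated by ' '
            from parest(firstobs=2);
         quit;
         %if &sqlobs = 0 %then %let done = 1;   /* nothing left to drop */
      %end;
      %else %let done = 1;
   %end;

   %put Stopped after &run run(s) with fields: &fields;
%mend vif_loop;
```

The max_runs parameter plays the role of the run counter suggested above: it caps the number of regressions so a mistake in the field list cannot loop forever.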

How to remove Warning from SAS log?

I am getting a log warning stating:
WARNING: 21 observations omitted due to missing ID values
I was transposing the dataset using this code:
PROC TRANSPOSE DATA= PT OUT= PT;
BY SOC_NM PT_NM;
ID TREATMENT;
VAR COUNT;
RUN;
I want to remove this warning from the log. Is there any option available in SAS for this?
Thank you for your help.
You need to decide whether you are keeping the TREATMENT=' ' records or not. If you want to keep them, then you need to assign a nonmissing value to TREATMENT. If not, then the WHERE statement like vasja's answer will work.
Will adding WHERE clause do the job for you?
PROC TRANSPOSE DATA= PT OUT= PT;
BY SOC_NM PT_NM;
ID TREATMENT;
VAR COUNT;
WHERE NOT MISSING(TREATMENT);
RUN;
Before transposing, add this condition in a data step (this assumes TREATMENT is numeric and 99 is not a real value):
if TREATMENT=. then TREATMENT=99;
After transposing, drop the variable "_99".
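Put together, that suggestion might look like the sketch below. It assumes TREATMENT is numeric, that 99 is not a genuine treatment code, and that PT is already sorted by the BY variables; the data set names pt_filled and pt_wide are hypothetical:

```sas
/* replace missing ID values with a placeholder code */
data pt_filled;
   set pt;
   if missing(treatment) then treatment = 99;
run;

/* transpose, then drop the placeholder column on the way out */
proc transpose data=pt_filled out=pt_wide(drop=_99);
   by soc_nm pt_nm;
   id treatment;
   var count;
run;
```

Unlike the WHERE approach, this keeps BY groups whose only records had a missing TREATMENT, so the two methods can return different numbers of rows.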
There's no option to remove warning messages from the log. If you really must keep your code as is then you can use PROC PRINTTO to temporarily divert the log output to an external file. However, this means you won't see anything in the log for that particular step, so it is not something I would recommend unless you are very sure of what you are doing. Check out the example code below, you'll see that only the steps creating tables a and c show in the log.
data a;
run;
proc printto log='c:\temp\temp.log';
run;
data b;
run;
proc printto;
run;
data c;
run;