Controlling export (ODS or PROC) based on Number of Observations - sas

I currently have a SAS process that generates multiple data sets (whether they have observations or not). I want to determine a way to control the export procedure based on the total number of observations (if nobs > 0, then export). My first attempt was something primitive using if/then logic comparing a select into macro var (counting obs in a data set) -
DATA _NULL_;
SET A_EXISTS_ON_B;
IF &A_E > 0 THEN DO;
FILE "C:\Users\ME\Desktop\WORKLIST_T &PDAY..xls";
PUT TASK;
END;
RUN;
The issue here is that I don't have a way to write multiple sets to the same workbook with multiple sheets(or do I?)
In addition, whenever I try and add another "Do" block, with similar logic, the execution fails. If this cannot be done with a data null, would ODS be the answer?

The core of what you want to do, conditionally execute code, can be done one of a number of ways.
Let's imagine we have a short macro that exports a dataset to excel. Simple as pie.
%macro export_to_excel(data=,file=,sheet=);
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%mend export_to_excel;
Now let's say we want to do this conditionally. How we do it depends, to some degree, on how we call this macro in our code now.
Let's say you have:
%let wherecondition=1; *always true!;
data class;
set sashelp.class;
if &wherecondition. then output;
run;
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1);
Now you want to make this so it only exports if class has some rows in it, right. So you get the # of obs in class:
proc sql;
select count(1) into :classobs from class;
quit;
And now you need to incorporate that somehow. In this case, the easiest way is to add a condition to the export macro. Open code doesn't allow conditional executing of code, so it needs to be in a macro.
So we do:
%macro export_to_excel(data=,file=,sheet=,condition=1);
%if &condition. %then %do;
proc export data=&data. outfile=&file. dbms=excel replace;
sheet=&sheet.;
run;
%end;
%mend export_to_excel;
And you add the count to the call:
%export_to_excel(data=class,file="c:\temp\class.xlsx", sheet=class1,condition=&classobs.)
Tada, now it won't try to export when it's 0. Great.
If this code is already in a macro, you don't have to alter the export macro itself. You can simply put that %if %then part around the macro call. But that's only if the whole thing is already a macro - %if isn't allowed outside of macros (sorry).
Now, if you're exporting a whole bunch of datasets, and you're generating your export calls from something, you can add the condition there, more easily and more smoothly than this.
Basically, either make by hand (if that makes sense), or use proc sql or proc contents or (other method of your choice) to make a dataset that contains one row per dataset-to-export, with four variables: dataset name, file to export, sheet to export (unless that's the same as the dataset name), and count of observations for that dataset. Often the first three would be made by hand, and then merged/updated via sql or something else to the count of obs per dataset.
Then you can generate calls to export, like so:
proc sql;
select cats('%export_to_excel(data=',dataname,',file=',filename,',sheet=',sheetname,')')
into :explist separated by ' '
from datasetwithnames
where obsnum>0;
quit;
&explist.; *this actually executes them;
Assuming obsnum is the new variable you created with the # of obs, and the other variables are obviously named. That won't pull a line with anything with 0 observations - so it never tries to execute the export. That works with the initial export macro just as well as with the modified one.

Suggest you google around for different approaches to writing XLS files.
Regarding using a DATA step or PROC step, the DATA step is tolerant of datasets that have 0 obs. If the SET statement reads a dataset that has 0 obs, it will simply end the step. So you don't need special logic. Most PROCS also accomodate 0 obs dataset without throwing a warning or error.
For example:
1218 *Make a 0 obs dataset;
1219 data empty;
1220 x=1;
1221 stop;
1222 run;
NOTE: The data set WORK.EMPTY has 0 observations and 1 variables.
1223
1224 data want;
1225 put "I run before SET statement.";
1226 set empty;
1227 put "I do not run after SET statement.";
1228 run;
I run before SET statement.
NOTE: There were 0 observations read from the data set WORK.EMPTY.
NOTE: The data set WORK.WANT has 0 observations and 1 variables.
1229
1230 proc print data=empty;
1231 run;
NOTE: No observations in data set WORK.EMPTY.
But note as Joe points out, PROC EXPORT will happily export a dataset with 0 obs and write an file with 0 records, overwriting if it was there already. e.g.:
1582 proc export data=sashelp.class outfile="d:\junk\class.xls";
1583 run;
NOTE: File "d:\junk\class.xls" will be created if the export process succeeds.
NOTE: "CLASS" range/sheet was successfully created.
1584
1585 data class;
1586 stop;
1587 set sashelp.class;
1588 run;
NOTE: The data set WORK.CLASS has 0 observations and 5 variables.
1589
1590 *This will replace class.xls";
1591 proc export data=class outfile="d:\junk\class.xls" replace;
1592 run;
NOTE: "CLASS" range/sheet was successfully created.
ODS statements would likely do the same.
I use a macro to check if a dataset is empty. SO answers like:
How to detect how many observations in a dataset (or if it is empty), in SAS?

Related

How to skip code if created dataset has zero rows

I have a job which at first imports some xlsx files, then connects to multiple DB tables. Based on conditions, the job selects rows to output, and creates an excel file to send on to the final end-user.
Sometimes, that job returns zero rows, which is acceptable; in that case, I would prefer to create an empty excel file with only the variables, but not run the other code (checking/cleaning code).
How can I conditionally execute code only when there are results?
Something like this:
I get 0 rows
If Result = 0 then Go to *"here"*
Else *"just run the code further"*
You have a few useful things that can help you here.
First off, PROC SQL sets a macro variable SQLOBS, which is particularly useful in identifying how many records were returned from the last SQL query it ran.
proc sql;
select * from sashelp.class;
quit;
%put I returned &SQLOBS rows;
You might use this to drive further processing, either with %IF blocks as Tom notes in comments or other methods I will cover below.
You can also check how many rows are in a dataset explicitly, if you prefer a slightly more robust option.
proc sql;
select count(*) into :class_count from sashelp.class;
quit;
%put I returned &class_count rows;
For very large datasets, there are faster options (using the dataset descriptors, dictionary tables, or a few other options), but for most tables this is fine.
Either way, what I would typically do with a program I intended to run in production would be then to drive the rest of the program from macros.
%macro whatIWantToDo(params);
...
do stuff
...
%mend whatIWantToDo;
proc sql;
mySqlStuff;
quit;
%if &sqlobs. gt 0 %then %do;
%whatIWantToDo(params);
%end;
%else %do;
%put Nothing to do;
%end;
Another option is to use call execute; this is appropriate if your data drives the macro parameters. The big advantage of call execute is that it only runs if you have data rows - if you have zero, it won't do anything!
Say you have some datasets to run code on. You could have up to twelve - one per month - but only have them for the current calendar year, so in Jan you have one, Feb you have two, etc. You could do this:
data mydata_jan mydata_feb mydata_mar;
set sashelp.class;
run;
%macro printit(data=);
title "Printing &data.";
proc print data=&data;
run;
title;
%mend printit;
data _null_;
set sashelp.vtable;
where upcase(memname) like 'MYDATA_%' and nobs gt 0;
callstr = cats('%printit(data=',memname,')');
call execute(Callstr);
run;
First I make the datasets, with a name I can programmatically identify. Then I make the macro that I want to run on each (this could be checking, cleaning, whatever). Then I use sashelp.vtable which shows which tables are created, and check the nobs variable (number of observations) is more than zero. Then I use call execute to run the macro on that dataset!

Is there any way to apply conditional logic for keeping/dropping columns in SAS?

I'm trying to drop particular columns in SAS based on a macro variable, my hands are slightly tied in terms of what code I can use - so I need a solution in BASE SAS.
I've already tried wrapping the drop/keep in an if, but I know the drop happens at run time so this won't work.
Example:
data dropsomecolumns;
if &somemacro =1 then do;
drop somecol1 somecol2;
end;
run;
You either need to use macro code to conditionally generate the code you want.
data dropsomecolumns;
set have;
%if &somemacro =1 %then %do;
drop somecol1 somecol2;
%end;
run;
Or change so that the macro variable has the list of columns to drop.
%let drop_columns=somecol1 somecol2;
data dropsomecolumns(drop=&drop_columns);
set have;
run;
Note that the drop statement will give a warning if no variables are listed, but the drop= dataset option will not give that warning.
You have have no variables listed and it will work fine and assumes not are to be dropped. So if the macro variable is a list of variables just use:
drop &drop_columns;
This works fine:
data demo;
set sashelp.class;
drop ;
run;
So no conditional logic needed.

SAS: proc reg and macro

i have a data that contain 30 variable and 2000 Observations.
I want to calculate regression in a loop, whan in each step I delete the i row in the data.
so in the end I need thet my output will be 2001 regrsion, one for the regrsion on all the data end 2000 on each time thet I drop a row.
I am new to sas, and I tray to find how to do it withe macro, but I didn't understand.
Any comments and help will be appreciated!
This will create the data set I was talking about in my comment to Chris.
data del1V /view=del1v;
length group _obs_ 8;
set sashelp.class nobs=nobs;
_obs_ = _n_;
group=0;
output;
do group=1 to nobs;
if group eq _n_ then;
else output;
end;
run;
proc sort out=analysis;
by group;
run;
DATA NEW;
DATA OLD;
do i = 1 to 2001;
IF _N_ ^= i THEN group=i;
else group=.;
output;
end;
proc sort data=new;
by group;
proc reg syntax;
by group;
run;
This will create a data set that is much longer. You will only call proc reg once, but it will run 2001 models.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. SAS can create a data set with the GROUP column to differentiate the results.
I edited my original answer per #data null suggestion. I agree that the above is probably faster, though I'm not as confident that it would be 100x faster. I do not know enough about the costs of the overhead of proc reg versus the cost of the group by statement and a larger data set. Regardless the answer above is simpler programming. Here is my original answer/alternate approach.
You can do this within a macro program. It will have this general structure:
%macro regress;
%do i=1 %to 2001;
DATA NEW;
DATA OLD;
IF _N_=&I THEN DELETE;
RUN;
proc reg syntax;
run;
%end;
%mend;
%regress
Macros are an advanced programming function in SAS. The macro program is required in order to do a loop of proc reg. The %'s are indicative of macro functions. &i is a macro variable (& is the prefix of a macro variable that is being called). The macro is created in a block that starts and ends with %macro / %mend, and called by %regress.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. Use &i to create a different data set each time and then append together as part of the macro loop.

Using a series of values for a SAS macro parameter

I'm looking for a way to use a series of values for a macro parameter instead of a single value. I'm basically manipulating a series of files for consecutive months (May 2014 to Sept 2015) and I've written a macro to take advantage of the naming conventions. However, I'm still manually writing out the months to use the macro. I'm doing this many times over with lots of different files from this month. Is there a way to have the parameter reference a list of values and go through them like an array/do-loop? I've looked into %ARRAY as a possibility but that doesn't seem to do what I'm looking for unless I'm not seeing it's full capability. I've attached a sample of this code below.
%MACRO memmonth(monyr=);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr;
run;
data work.mem_&monyr;
set work.mem_&monyr;
count_&monyr=count;
run;
%MEND memmonth;
%memmonth(monyr=may14)
%memmonth(monyr=jun14)
%memmonth(monyr=jul14)
%memmonth(monyr=aug14)
%memmonth(monyr=sep14)
%memmonth(monyr=oct14)
%memmonth(monyr=nov14)
%memmonth(monyr=dec14)
%memmonth(monyr=jan15)
%memmonth(monyr=feb15)
%memmonth(monyr=mar15)
%memmonth(monyr=apr15)
%memmonth(monyr=may15)
%memmonth(monyr=jun15)
%memmonth(monyr=jul15)
%memmonth(monyr=aug15)
%memmonth(monyr=sep15)
In general I would recommend passing the list of values as a space delimited list and adding looping logic in the macro. If spaces are valid characters in the values then use some other delimiter. Do not use comma as the delimiter as it means you will need to use macro quoting to call the macro.
So your basic macro is this.
%macro memmonth(monyr);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%mend memmonth;
%memmonth(may14)
%memmonth(jun14)
You could change it to this.
%macro memmonth(monyrlist);
%local i monyr;
%do i=1 %to %sysfunc(countw(&monyrlist));
%let monyr=%scan(&monyrlist,&i);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%end;
%mend memmonth;
%memmonth(may14 jun14)
If you always want to process all of the months in an interval then you could just pass in the start and end month of the interval.
%macro memmonth(start,end);
%local i monyr;
%do i=0 %to %sysfunc(intck(month,"01&start"d,"01&end"d));
%let monyr=%sysfunc(intnx(month,"01&start"d,&i),monyy5.);
proc freq data=work.both_&monyr ;
table var1/ out=work.mem_&monyr (rename=(count=count_&monyr)) ;
run;
%end;
%mend memmonth;
%memmonth(may14,sep15)
If you have a source list, whether it is a text file or a dataset, you can use a simple data step to generate macro calls. So if you have an input dataset with the variable MONYR then your driver program would look like this:
data _null_;
set mylist ;
call execute(cats('%nrstr(memmonth)(',MONYR,')'));
run;
If the source is a file with the names then replace the SET statement with the appropriate INFILE and INPUT statements. If the source is a directory name then look at one of the many SAS ways to read the names of files in a directory into a dataset and use that to drive the macro call generation.

Perform Iterative Operations on OUTEST or OUTSTAT variables in SAS?

In SAS, how can I assign a variable coming from either the OUTEST or OUTSTAT functions to be used in a loop?
For example, say I want to run some sort of iterative analysis until my mean (average) reaches a certain threshold. I know how to extract the mean using either OUTEST or OUTSTAT, but then how can I perform operations or blocks of code on it?
Thank you.
If you are interested in details, I am trying to perform backward selection of VIFs (to remove multicollinearity). Unfortunately, SAS doesn't seem to have a 'SELECTION=BACKWARD' feature for this...
EDIT: Updated with sample code:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
/* PRELIMINARY PROC REG ON ALL FIELDS*/
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST1;
RUN;
/* RETAIN NON-NULL VIF FIELDS ONLY */
DATA NO_NULL_VIF;
SET PAREST1 (WHERE=(VarianceInflation <> .));
RUN;
/* CREATE VARIABLE LIST OF NON-NULL VIF FIELDS */
PROC SQL;
SELECT VARIABLE
INTO :NO_NULL_VIF_FIELDS SEPARATED BY ' '
FROM NO_NULL_VIF;
QUIT;
/* RE-RUN REGRESSION WITH NON-NULL VIF FIELDS ONLY */
PROC REG DATA=TABLE_&TABLE_SUFFIX. NOPRINT;
MODEL &YVAR = &NO_NULL_VIF_FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
/* START ITERATION OF DROPPING THE HIGHEST VIF UNTIL THE CRITERIA IS MET */
???
%MEND;
%MULTICOLLINEARITY(, RESPONSE, &INPUT_FIELDS,???)
And by criteria I mean VIF_MAX < N where N is some threshold specified in the macro. For example, if we want to retain only fields with VIF less than 5, then it should drop the highest one, re-run the PROC REG, drop the highest, re-run, etc. etc. until the highest on is less than 5.
First off - I'd verify that you can't do this using PROC MODEL. I'm not a regression guy so I don't know for sure. Might be worth posting on a more stat-focused site; CV isn't really appropriate since they're not generally trying to answer software questions, but maybe communities.sas.com . I would find it surprising if this wasn't directly possible in PROC MODEL and/or in one of the more complicated procs.
Second, the way I'd approach this is to write a recursive macro. Take out the first part (the non-null VIF fields) and either move that to an outer macro that just runs once, or make it an expectation of the programmer to do on his/her own (unless this is not feasible, and/or can change with iterations - not something I'm knowledgeable of). Then do something like this:
%MACRO MULTICOLLINEARITY(TABLE_SUFFIX,YVAR,FIELDS,MAX_VIF);
ods _all_ close;
%put Running with &fields; *note which fields currently running;
*also may want to include a run # counter as parameter;
PROC REG DATA=TABLE_&TABLE_SUFFIX.;
MODEL &YVAR = &FIELDS / VIF COLLIN NOINT;
ODS OUTPUT PARAMETERESTIMATES=PAREST2;
RUN;
quit;
*Data step to analyse PAREST2 and see if any of the fields can be dropped;
proc sort data=parest2;
by descending varianceinflation;
run;
data _null_;
set parest2(obs=1);
if varianceinflation > &max_vif then do;
fields_run = tranwrd("&fields",trim(variable),' ');
if not missing(fields_run) then do;
call_string = cats('%multicollinearity(',"&table_suffix.,&yvar.,",fields_run,",&max_vif.)");
call execute(call_string);
end;
end;
else do;
put "Stopped with Max VIF:" variable "=" varianceinflation;
run;
ods preferences;
%MEND MULTICOLLINEARITY;
Then you call it once with the full field list, and it calls itself in the CALL EXECUTE if there is still a parameter left. An incremented # of runs may be helpful (both to see how many times it ran in your log, and to be able to make sure that you don't end up in an infinite loop if you make a mistake with the fields variable deletion.)
I would run this with OPTION NONOTES NOSOURCE; and none of the symbogen/mprint stuff on, so you can just get the %put/put statements in your log.