cross validation in SAS - sas

I have split my data into 5 folds in SAS. So I have
s1,s2,s3,s4,s5
I was wondering what's the best way to iterate through each of the folds to perform cross validation. For example, the first iteration I want to use s1 as my test set and s2,3,4,5 as the training sets, the second iteration use s2 as the test and s1,3,4,5 as training etc.
What kind of loop in SAS would accomplish this?
Thanks!

A macro is probably the easiest way to make this repeatable.
%macro Validate(cur, i);
  %local j;
  %do j = 1 %to 5;
    /* The macro language has no <> operator; use NE for "not equal" */
    %if &j ne &i %then %do;
      data &cur._&j.;
        set &cur. s&j.;
        /* <validation steps> */
      run;
    %end;
  %end;
%mend Validate;
data _null_;
  do i = 1 to 5;
    /* Single quotes plus %nrstr delay the macro call until this step has finished */
    call execute(cats('%nrstr(%Validate)(s', i, ',', i, ')'));
  end;
run;
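If the goal is one training set and one test set per iteration, a minimal sketch could look like the following. It assumes the folds are named s1 to s5 as in the question; the macro name cv_folds and the output names train_i and test_i are made up for illustration.

```sas
%macro cv_folds(k=5, prefix=s);
  %local i j train;
  %do i = 1 %to &k;
    /* Build the list of every fold except fold i */
    %let train = ;
    %do j = 1 %to &k;
      %if &j ne &i %then %let train = &train &prefix.&j;
    %end;

    /* Training set: the other k-1 folds stacked together */
    data train_&i;
      set &train;
    run;

    /* Test set: the held-out fold */
    data test_&i;
      set &prefix.&i;
    run;

    /* Fit the model on train_&i and score test_&i here */
  %end;
%mend cv_folds;

%cv_folds(k=5, prefix=s)
```

Each pass through the outer %DO loop holds out one fold, so the modelling step only needs to refer to train_&i and test_&i.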

PROC GLMSELECT performs k-fold cross validation and offers multiple selection methods for choosing the best model. It was experimental in SAS 9.1 and has been production since 9.2.
Further info here
Hope this helps.
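As a rough sketch of what that can look like (sashelp.class and the chosen variables are only stand-ins for your own data and model), the MODEL statement can request 5-fold cross validation and use it to choose the model:

```sas
/* SEED= makes the random fold assignment repeatable */
proc glmselect data=sashelp.class seed=12345;
  /* CHOOSE=CV picks the step with the best cross-validation error;
     CVMETHOD=RANDOM(5) splits the data into 5 random folds */
  model weight = height age
        / selection=stepwise(choose=cv) cvmethod=random(5);
run;
```

With this approach the folds are created by the procedure itself, so the pre-split s1 to s5 datasets would not be needed.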

Related

Is there a way to have a conditional transpose in SAS?

I need to make my transpose conditional. A flow I'm creating in EG allows you to turn some sections of the flow off. It does this by using a macro variable (e.g. &myvariable). When &myvariable is set to 0, that section of the flow will be filtered out, so effectively, no rows of data pass through that section.
This works great, but stacks/ transposes will not work when there is no data.
I need it so that when there is no data to go into the transpose, it instead just makes a padded column to replicate what would have been the output of that program.
It needs to be in base SAS, I'm using Enterprise Guide. I've already tried using the conditional logic functionality in EG, but it's not appropriate because I need an ordered list.
''' some conditional logic?
if &myvariable = 0 then do;
format padded_col1 $10.;
else do;
'''transpose
proc transpose data= some_dataset;
by id;
id year;
var income;
run;
'''
You need macro logic here, not data step logic. If you have SAS 9.4M5 or later, you can use %IF/%THEN/%DO in open code; on earlier releases you need to wrap it in a macro.
%if &myVariable = 0 %then %do;
  /* SAS code that builds the padded column */
%end;
%else %do;
  /* conditional PROC TRANSPOSE */
%end;
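Filled in, this might look something like the sketch below. It assumes SAS 9.4M5 or later for the open-code %IF, and the output name transposed and the layout of the placeholder dataset are guesses at what the downstream steps expect.

```sas
%let myvariable = 0;  /* normally set elsewhere in the flow */

%if &myvariable = 0 %then %do;
  /* Section is switched off: create an empty dataset with the padded column */
  data transposed;
    length id 8 padded_col1 $10;
    stop;  /* write no rows, only the column definitions */
  run;
%end;
%else %do;
  /* Section is switched on: run the real transpose */
  proc transpose data=some_dataset out=transposed;
    by id;
    id year;
    var income;
  run;
%end;
```

Either branch leaves a dataset called transposed behind, so later steps in the flow can refer to it unconditionally.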

SAS Macro for multiple datasets

I am new to SAS. I have 12(Monthly data) data sets in a folder.
Names of data sets are:
201401
201402
201403
...
201411
201412
Each data contain 10 Variables. Variable names are same for all data.
I want only 3 Variables among 10 and rename data by new_201401 and so on.
I am trying it manually by using Keep Var1 Var2 Var3; but is there any easy way or macro so we can make it fast? Thanks in advance.
I think this will do the trick:
%macro keep(table,var1,var2,var3,set);
data &table (keep=&var1 &var2 &var3);
set &set;
run;
%mend keep;
You can rename them using the following macro (note: the %if conditions are just split out to include a leading 0 for single digit months):
%macro monthly(year=,prefix=) ;
%do i=1 %to 12 ;
%if %eval(&i<10) %then Data_&year.0&i=&prefix&i ;
%else Data_&year&i=&prefix&i ;
%end ;
%mend monthly ;
You can then, for example, pass these values into PROC DATASETS for whatever years you need:
proc datasets library=work ;
change %monthly(year=2014,prefix=new_) %monthly(year=2015,prefix=new2_);
run ;
quit ;
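The keep and rename steps can also be combined in a single loop. The sketch below is only one way to do it: the macro name keep_months is made up, and it assumes the input datasets are called data_201401 to data_201412, since a name that starts with a digit is not a valid SAS dataset name under the default settings.

```sas
%macro keep_months(year=, vars=, inprefix=data_, outprefix=new_);
  %local m mm;
  %do m = 1 %to 12;
    /* Z2. pads the month with a leading zero: 1 becomes 01 */
    %let mm = %sysfunc(putn(&m, z2.));
    data &outprefix.&year.&mm.;
      set &inprefix.&year.&mm. (keep=&vars);
    run;
  %end;
%mend keep_months;

%keep_months(year=2014, vars=Var1 Var2 Var3)
```

The call above would read data_201401 through data_201412 and write new_201401 through new_201412, each with only the three listed variables.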

SAS macro do-loop with data step

This is my first macro, so my apologies if I missed something simple.
I need to do the same data step six (or more) times and append each one to the first, so I tried a do-loop within a macro. Everything works with the loop removed, but once the do-loop is added, I get errors that either say I have an extra %end or an extraneous %mend. All ideas welcome. Thanks!
%macro freeze_samples(orig_file=, samples= , Start_Freeze_Incr=,
End_Freeze_Incr= );
%do i = 1 %to &samples;
data freeze_slice_&i;
set &orig_file;
(do stuff)
run;
* If we have more than one slice, append to previous slice(s).;
%if &i > 1 %then %do;
proc append base = temp_1 data = temp_&i;
run;
%end;
%end;
%mend;
I think you either have a problem you didn't include in the text (i.e., in the 'do stuff' section) or you have a bad session (i.e., you fixed the problem, but something left over from a previous run is still interfering). This runs fine (given I don't know what you're doing):
%macro freeze_samples(orig_file=, samples= , Start_Freeze_Incr=,
End_Freeze_Incr= );
%do i = 1 %to &samples;
data freeze_slice_&i;
set &orig_file;
*(do stuff);
run;
* If we have more than one slice, append to previous slice(s).;
%if &i > 1 %then %do;
proc append base = freeze_slice_1 data = freeze_slice_&i;
run;
%end;
%end;
%mend;
%freeze_samples(orig_file=sashelp.class,samples=2,start_freeze_incr=1,end_freeze_incr=5);
I would note that you're probably better off not doing whatever you're doing this way; in SAS, there is usually a better way than splitting data off into multiple datasets. But since I don't know what you're doing I can't really suggest the better way beyond recommending reading this article and keeping it in mind (even if you're doing something different than bootstrapping, the concept applies to almost everything in SAS).
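As a rough illustration of the single-dataset idea (a sketch only: the names freeze_all and slice are made up, and the real "do stuff" logic is unknown), every slice can be written to one dataset with an index variable and then processed with BY-group statements:

```sas
/* Write each input row once per slice, tagged with the slice number */
data freeze_all;
  set sashelp.class;
  do slice = 1 to 2;
    /* per-slice logic would go here */
    output;
  end;
run;

proc sort data=freeze_all;
  by slice;
run;

/* One pass over the data handles every slice */
proc means data=freeze_all noprint;
  by slice;
  var height;
  output out=slice_means mean=mean_height;
run;
```

This avoids both the macro loop and the PROC APPEND step, because the slices never exist as separate datasets.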

"For in" loop equivalent in SAS 9.3

I have been searching for a while for an equivalent of the for-in loop (as in Python or R) in the SAS 9.3 macro language. The DO loop seems to be the solution, but it doesn't work exactly as I want.
I found a way to do it in a data step with a DO loop, but it doesn't work in the macro language.
For example, in a data step, this code works:
DATA _NULL_;
DO i = 1,3,5,9;
PUT i;
END;
RUN;
The log then shows, as expected:
1
3
5
9
When I try to do the same with a %DO loop in a macro, I get an error.
%MACRO test();
%DO i = 1,2,4,9 ;
%PUT i = &i;
%END;
%MEND;
%test();
The log shows these messages:
ERROR: Expected %TO not found in %DO statement.
ERROR: A dummy macro will be compiled
I'm quite new to SAS and Stack Overflow, so I hope my question is not too stupid. This is so simple in Python and R that there must be a simple way to do it in SAS.
Thanks for your help - J. Muller
The closest I've ever come across to this pattern in SAS macro language is this:
%MACRO test();
%let j=1;
%let vals=1 2 4 9;
%do %while(%scan(&vals,&j) ne );
%let i=%scan(&vals, &j);
%put &i;
%let j=%eval(&j+1);
%end;
%MEND;
%test();
(Warning: untested, as I no longer have a SAS installation I can test this out on.)
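A variation on the same idea counts the items first and then uses an ordinary iterative %DO loop. This is a sketch: the macro name for_each and its parameter are made up, and it assumes a non-empty, space-separated list.

```sas
%macro for_each(vals);
  %local j i;
  /* COUNTW returns the number of space-separated items in the list */
  %do j = 1 %to %sysfunc(countw(&vals, %str( )));
    %let i = %scan(&vals, &j, %str( ));
    %put i = &i;
  %end;
%mend for_each;

%for_each(1 2 4 9)
```

Passing the list as a parameter keeps the loop body independent of the particular values.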
You can certainly get around it this way:
options mindelimiter=',';
options minoperator;
%MACRO test();
%DO i = 1 %to 9 ;
%if &i in (1,2,4,9) %then %do;
%PUT i = &i;
%END;
%end;
%MEND;
%test();
However, I think you can usually avoid this sort of call by executing your macro multiple times rather than attempting to control the loop inside the macro. For example, imagine a dataset and a macro:
data have;
input x;
datalines;
1
2
4
9
;;;;
run;
%macro test(x);
%put &x;
%mend test;
Now you want to call %test() once for each value in that list. Okay, easy to do.
proc sql;
select cats('%test(',x,')') into :testcall separated by ' ' from have;
quit;
&testcall;
That works just as well as your %do in loop, except it's data driven, meaning if you want to change the calls you just change the dataset (or if your data changes, the call automatically changes!). In general, SAS is more effective when designed as data driven programming rather than as entirely written code.
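Another common way to drive the calls from the same dataset is CALL EXECUTE, sketched below on the assumption that the have dataset and the %test macro defined above already exist. Wrapping the macro name in %NRSTR delays execution of each call until the data step has finished.

```sas
data _null_;
  set have;
  /* Queues %test(1), %test(2), %test(4), %test(9) to run after this step */
  call execute(cats('%nrstr(%test)(', x, ')'));
run;
```

This avoids storing the generated calls in a macro variable, which can matter when the list of calls is very long.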

How to detect how many observations in a dataset (or if it is empty), in SAS?

I wonder if there is a way of detecting whether a data set is empty, i.e. it has no observations.
Or in another saying, how to get the number of observations in a specific data set.
So that I can write an If statement to set some conditions.
Thanks.
It's easy with PROC SQL. Do a count and put the results in a macro variable.
proc sql noprint;
select count(*) into :observations from library.dataset;
quit;
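To use the count in a condition, something like the following sketch could work. It uses sashelp.class as a stand-in dataset, and the macro name report_if_empty is made up; wrapping the %IF in a macro keeps it valid on releases that do not allow %IF in open code.

```sas
proc sql noprint;
  select count(*) into :observations from sashelp.class;
quit;

/* Re-assigning the value strips the leading blanks that INTO leaves behind */
%let observations = &observations;

%macro report_if_empty;
  %if &observations = 0 %then %put NOTE: The dataset is empty.;
  %else %put NOTE: The dataset has &observations observations.;
%mend report_if_empty;

%report_if_empty
```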
There are lots of different ways, I tend to use a macro function with open() and attrn(). Below is a simple example that works great most of the time. If you are going to be dealing with data views or more complex situations like having a data set with records marked for deletion or active where clauses, then you might need more robust logic.
%macro nobs(ds);
%let DSID=%sysfunc(OPEN(&ds.,IN));
%let NOBS=%sysfunc(ATTRN(&DSID,NOBS));
%let RC=%sysfunc(CLOSE(&DSID));
&NOBS
%mend;
/* Here is an example */
%put %nobs(sashelp.class);
Here's the more complete example that @cmjohns was talking about. It will return 0 if the dataset is empty, -1 if it is missing, and has options to handle deleted observations and where clauses (note that using a where clause can make the macro take a long time on very large datasets).
Usage Notes:
This macro will return the number of observations in a dataset. If the dataset does not exist then -1 will be returned. I would not recommend this for use with ODBC libnames, use it only against SAS tables.
Parameters:
iDs - The libname.dataset that you want to check.
iWhereClause (Optional) - A where clause to apply
iNobsType (Optional) - Either NOBS or NLOBSF. See the SAS 9 documentation for descriptions.
Macro definition:
%macro nobs(iDs=, iWhereClause=1, iNobsType=nlobsf, iVerbose=1);
%local dsid nObs rc;
%if "&iWhereClause" eq "1" %then %do;
%let dsID = %sysfunc(open(&iDs));
%end;
%else %do;
%let dsID = %sysfunc(open(&iDs(where=(&iWhereClause))));
%end;
%if &dsID %then %do;
%let nObs = %sysfunc(attrn(&dsID,&iNobsType));
%let rc = %sysfunc(close(&dsID));
%end;
%else %do;
%if &iVerbose %then %do;
%put WARNING: MACRO.NOBS.SAS: %sysfunc(sysmsg());
%end;
%let nObs = -1;
%end;
&nObs
%mend;
Example Usage:
%put %nobs(iDs=sashelp.class);
%put %nobs(iDs=sashelp.class, iWhereClause=height gt 60);
%put %nobs(iDs=this_dataset_doesnt_exist);
Results
19
12
-1
Installation
I recommend setting up a SAS autocall library and placing this macro in your autocall location.
PROC SQL is not efficient on large datasets. ATTRN is a good method, but the same thing can be done in a plain data step. Here is an efficient solution that returns the number of observations, even for billions of rows, by reading just one row:
data DS1;
set DS nobs=i;
if _N_ =2 then stop;
No_of_obs=i;
run;
The trick is producing an output even when the dataset is empty.
data CountObs;
i=1;
set Dataset_to_Evaluate point=i nobs=j; * 'point' avoids review of full dataset*;
No_of_obs=j;
output; * Produces a value before "stop" interrupts processing *;
stop; * Needed whenever 'point' is used *;
keep No_of_obs;
run;
proc print data=CountObs;
run;
The above code is the simplest way I've found to produce the number of observations even when the dataset is empty. I've heard NOBS can be tricky, but the above can work for simple applications.
A slightly different approach:
proc contents data=library.dataset out=nobs;
run;
proc summary data=nobs nway;
class nobs;
var delobs;
output out=nobs_summ sum=;
run;
This will give you a dataset with one observation; the variable nobs has the value of number of observations in the dataset, even if it is 0.
I guess I am trying to reinvent the wheel here with so many answers already. But I do see some other methods trying to count from the actual dataset - this might take a long time for huge datasets. Here is a more efficient method:
proc sql;
/* libname and memname are stored in uppercase in the dictionary tables */
select nlobs from sashelp.vtable where libname = "LIBRARY" and memname = "DATASET";
quit;