SAS include code into data step - sas

I have dynamically created a file myfile.sas with the following content:
and a = 0
and b = 0
Now I want to include this file into a data step:
data y;
set x;
if 1=1
%include incl("myfile.sas")
then selektion=0;
else selektion=1;
run;
The result should be:
data y;
set x;
if 1=1
and a=0
and b=0
then selektion=0;
else selektion=1;
run;
However I get the following error:
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
Is it possible to include the file into the if statement?

Indeed, that doesn't work. You can use %include within a data or proc step to add some lines to it but not within an incomplete statement.
Had your myfile.sas looked like this:
if 1=1
and a = 0
and b = 0
you could have written
data y;
set x;
%include "myfile.sas";
then selektion=0;
else selektion=1;
run;
Couldn't you have these lines in a macro instead of a file?
%macro mymacro;
and a=0
and b=0
%mend;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;
If that myfile.sas has to stay as is, you could work around it in this rather convoluted (but still generic) way:
filename myfile temp;
data _null_;
file myfile;
infile 'myfile.sas' eof=end;
input;
if _n_=1 then put '%macro mymacro;';
put _infile_;
return;
end:
put '%mend;';
run;
%include myfile;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;

The %INCLUDE needs to be at a statement boundary. You could put the IF 1=1 into the same file or into another file. Make sure to include a semicolon to end the %INCLUDE statement, but don't include a semicolon in the contents of the file.
data y;
set x;
%include incl("if1file.sas","myfile.sas") ;
then selektion=0;
else selektion=1;
run;
A better solution might be to put the code into a macro variable (if less than 64K bytes).
%let condition=
and a = 0
and b = 0
;
data y;
set x;
if 1=1 &condition then selektion=0;
else selektion=1;
run;
If it is longer than 64K bytes then define it as a macro instead.
%macro condition;
and a = 0
and b = 0
%mend;
data y;
set x;
if 1=1 %condition then selektion=0;
else selektion=1;
run;

According to SAS documentation:
%INCLUDE Statement
Brings a SAS programming statement, data lines, or both, into a current SAS program.
The injection you are attempting is not a complete statement, so it fails. A more specific description of the action you are describing would be %INLINE. However, there is no such SAS statement.
Let's call a program that outputs code a 'codegener' and the output it produces the 'codegen'.
In the context of your use, the codegen is specific to a single statement. This strongly suggests the codegener should place the codegen in a macro variable (for ease of later use) instead of a file.
Suppose the codegener uses data about statement construction:
DATA statements_meta;
length varname $32 operator $10 value $200;
input varname operator value;
datalines;
a = 0
b = 0
run;
and the codegener is a DATA step
DATA _null_;
file "myfile.snippet";
... looping logic over data for statement construction ...
put " and " varname " = 0 ";
...
run;
Change the codegener to be more like the following:
DATA _null_;
length snippet $32000;
snippet = "";
... looping logic over data for statement construction ...
snippet = catx (" ", snippet, "and", varname, operator, value);
... end loop
call symput('snippet', trim(snippet));
stop;
run;
...
DATA ...
if 1=1 &snippet then ... else ...
run;
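One way to fill in the looping logic, as a minimal sketch. It assumes the statements_meta data set defined earlier (columns varname, operator, value) and reads all of its rows within a single iteration of the step. The flag name last_row is made up for this illustration, and CALL SYMPUTX is used in place of CALL SYMPUT only because it strips leading and trailing blanks on its own.

```sas
/* Sketch only: builds the condition text from statements_meta
   (varname, operator, value as defined above). */
data _null_;
  length snippet $32000;
  /* read every metadata row inside one iteration of the step */
  do until (last_row);   /* last_row: illustrative name for the END= flag */
    set statements_meta end=last_row;
    /* CATX trims each piece and joins them with single blanks */
    snippet = catx(" ", snippet, "and", varname, operator, value);
  end;
  /* store the finished text in macro variable SNIPPET */
  call symputx('snippet', snippet);
  stop;
run;

/* inspect the generated text in the log before using it */
%put NOTE: snippet=&snippet;
```

Writing the text to the log first makes it easier to spot a malformed condition before it ends up inside the IF statement.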

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to SAS macros and loops, but I have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution, which is fully data driven:
1. Store the number of terms in a macro variable to set the size of the array.
2. Load the terms to search into a temporary array.
3. Loop through the array and search each observation for the terms.
4. Exit the loop as soon as a term is found to speed up the process.
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
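A rough sketch of what such a macro could look like, reusing the code-generation step above. The macro name flag_terms, the fileref _code, and every parameter name are made up for illustration, and an out= parameter for the result data set is added as a further assumption. The example call uses the TERMS data set and SASHELP.BASEBALL from the question.

```sas
/* Hypothetical macro: its name and all parameter names are illustrative. */
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
  filename _code temp;

  /* write one assignment statement that ORs an INDEX() call per term */
  data _null_;
    set &terms end=eof;
    file _code;
    if _n_ = 1 then put "&flag = " @;
    else put ' or ' @;
    put "index(&var, " &termvar :$quote. ')' @;
    if eof then put ';';
  run;

  /* pull the generated statement into the step that flags the rows */
  data &out;
    set &data;
    %include _code / source2;
  run;

  filename _code clear;
%mend flag_terms;

/* example call with the data sets from the question */
%flag_terms(data=sashelp.baseball, var=team,
            terms=terms, termvar=search_term, out=want);
```

The SOURCE2 option echoes the generated statement to the log, which helps when checking what the macro actually produced.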

print warning to log if then else statement sas

I am writing an if/then/else statement, where the final else is :
if variable2 = 'foo' then variable = 'bar'
else variable = .
Can I print a custom 'warning' to the log file that has a list or array of the variable2 names where
variable = .
You can use the PUTLOG statement to write messages to the log.
if variable2 = 'foo' then variable = 'bar' ;
else do;
variable = . ;
putlog "WARNING: bad value " variable2 = ;
end ;
It looks like the OP wanted a single message with all the relevant variable names listed. There are various ways to do that... the easiest would be a quick PROC SQL on the output data set after this step finishes:
proc sql noprint;
select distinct variable2 into :bad_vals separated by ' '
from my_data
where variable = .
;
quit;
%put WARNING: Bad values of VARIABLE: &bad_vals;
This would be fine for relatively small data sets; for big data sets you could avoid the extra pass through the data by maintaining a list of relevant values while the initial data step is running, and just print the message once at the end of the data step.
data mydata;
length bad_vals $ 10000;
retain bad_vals; /* keep the accumulated list across observations */
drop bad_vals;
set in end=end;
...blah...
...else do;
variable = .;
bad_vals = strip(bad_vals) || ' ' || variable2;
end;
if end then do;
putlog 'WARNING: Bad values of VARIABLE:' bad_vals;
end;
run;
or you could use a macro var instead of the data step var bad_vals, etc.

What is the simplest way to either display the data, if there are observations, or create an empty record stating that the dataset was empty?

I have looked around quite a bit for something of this nature, and the majority of sources give examples of counting the number of observations etc.
But what I am actually after is a simple piece of code that will check to see if there are any observations in the dataset. If that condition is met, the program needs to continue as normal; if it is not met, I would like a new record to be created with a variable stating that the dataset is empty.
I have seen macros and SQL code that can accomplish this, but what I would like to know is whether it is possible to do the same in SAS code. I know the code I have below does not work, but any insight would be appreciated.
Data TEST;
length VAR1 $200.;
set sashelp.class nobs=n;
call symputx('nrows',n);
obs= &nrows;
if obs = . then VAR1= "Dataset is empty"; output;
Run;
You could do it by always appending a 1-row data set with the empty dataset message, and then delete the message if it doesn't apply.
data empty_marker;
length VAR1 $200;
VAR1='Dataset is empty';
run;
Data TEST;
length VAR1 $200.;
set
sashelp.class nobs=n
empty_marker (in=marker)
;
if (marker) and _n_ > 1 then delete;
Run;
The easiest way I can think of is to use the NOBS= option on the SET statement to check the number of records. The trick is that you don't want to actually read from an empty data set: that terminates the DATA step immediately, so nothing after the SET ever runs. Because NOBS= is assigned at compile time, you can put the SET behind an IF condition that is always false and still get the number of observations.
data test1;
format x best. msg $32.;
stop;
run;
data test1;
if _n_ = 0 then
set test1 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test1;
/*Normal code here*/
output;
run;
So this populates the nobs value with 0. The if clause sees the 0 and allows you to set the message and output that value. The stop then terminates the DATA step. Outside of that check, do your normal data step code. You need the final output statement because of the first one: once a step contains an explicit OUTPUT statement, SAS no longer writes an observation automatically at the end of each iteration.
Here it works for a data set with values.
data test2;
format x best. msg $32.;
do x=1 to 5;
msg="Yup";
output;
end;
run;
data test2;
if _n_ = 0 then
set test2 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test2;
y=x+1;
output;
run;

Loading SAS code from text file inside a data step

Assume that I have a SAS data step, where I subtract every observation (say, I only have variable X) from its mean:
data tmp;
set tmp;
x = x-2;
run;
Let's say mean is not always 2 and I have another script that creates a text file with one line, which contains:
x = x-2;
Now, the question is, is there any way I can have something like:
data tmp;
set tmp;
load text_file;
run;
To do the same thing as the first data step? In other words, I want a solution that relies on using the content of the file (either as I showed in the data step or within a macro).
%INCLUDE will do what you want. Assuming your text file "c:\mycode.sas" has the line
x=x-2;
then you can do this:
data tmp;
set tmp;
%include "c:\mycode.sas";
run;
I'd note that this is a really, really bad way to do this, but it's what you asked for.
If I wanted to subtract the mean of x from x (standardizing the data), I'd either use PROC STDIZE, or do this:
proc means data=tmp;
var x;
output out=x_mean mean=x_bar;
run;
data want;
set tmp;
if _n_ = 1 then set x_mean;
x=x-x_bar;
run;
Or, PROC STDIZE (included in SAS/STAT):
proc stdize data=tmp out=want_std method=mean;
var x;
run;

Macro returning a value

I created the following macro. Proc power returns table pw_out containing column Power. The data _null_ step assigns the value in column Power of pw_out to macro variable tpw. I want the macro to return the value of tpw, so that in the main program, I can call it in a DATA step like:
data test;
set tmp;
pw_tmp=ttest_power(meanA=a, stdA=s1, nA=n1, meanB=a2, stdB=s2, nB=n2);
run;
Here is the code of the macro:
%macro ttest_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw'=&power);
run;
&tpw
%mend ttest_power;
@itzy is correct in pointing out why your approach won't work. But there is a solution maintaining the spirit of your approach: you need to create a power-calculation function using PROC FCMP. In fact, AFAIK, to call a procedure from within a function in PROC FCMP, you need to wrap the call in a macro, so you are almost there.
Here is your macro - slightly modified (mostly to fix the symput statement):
%macro ttest_power;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw', power);
run;
%mend ttest_power;
Now we create a function that will call it:
proc fcmp outlib=work.funcs.test;
function ttest_power_fun(meanA, stdA, nA, meanB, stdB, nB);
rc = run_macro('ttest_power', meanA, stdA, nA, meanB, stdB, nB, tpw);
if rc = 0 then return(tpw);
else return(.);
endsub;
run;
And finally, we can try using this function in a data step:
options cmplib=work.funcs;
data test;
input a s1 n1 a2 s2 n2;
pw_tmp=ttest_power_fun(a, s1, n1, a2, s2, n2);
cards;
0 1 10 0 1 10
0 1 10 1 1 10
;
run;
proc print data=test;
run;
You can't do what you're trying to do this way. Macros in SAS are a little different than in a typical programming language: they aren't subroutines that you can call, but rather just code that generate other SAS code that gets executed. Since you can't run proc power inside of a data step, you can't run this macro from a data step either. (Just imagine copying all the code inside the macro into the data step -- it wouldn't work. That's what a macro in SAS does.)
One way to do what you want would be to read each observation from tmp one at a time, and then run proc power. I would do something like this:
/* First count the observations */
data _null_;
call symputx('nobs',obs);
stop;
set tmp nobs=obs;
run;
/* Now read them one at a time in a macro and call proc power */
%macro power;
%do j=1 %to &nobs;
data _null_;
nrec = &j;
set tmp point=nrec;
call symputx('meanA',meanA);
call symputx('stdA',stdA);
call symputx('nA',nA);
call symputx('meanB',meanB);
call symputx('stdB',stdB);
call symputx('nB',nB);
stop;
run;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
proc append base=pw_out_all data=pw_out; run;
%end;
%mend;
%power;
By using proc append you can store the results of each round of output.
I haven't checked this code so it might have a bug, but this approach will work.
You can invoke a macro which calls procedures, etc. (like the example) from within a datastep using call execute(), but it can get a bit messy and difficult to debug.
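A minimal sketch of that call execute() approach, under a few assumptions: tmp holds one row per scenario with the variables meanA, stdA, nA, meanB, stdB and nB (as in the looping answer above), and one_power is a hypothetical wrapper macro written for this illustration. Wrapping the call in %nrstr keeps the macro from executing while the DATA step is still running, so each generated call runs only after the step has finished.

```sas
/* Hypothetical wrapper: runs PROC POWER once and appends the result. */
%macro one_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
  proc power;
    twosamplemeans test=diff_satt
      groupmeans   = &meanA | &meanB
      groupstddevs = &stdA | &stdB
      groupns      = (&nA &nB)
      power        = .;
    ods output Output=pw_out;
  run;
  proc append base=pw_out_all data=pw_out; run;
%mend one_power;

/* Queue one macro call per observation of tmp. */
data _null_;
  set tmp;
  call execute(cats(
    '%nrstr(%one_power(',
    'meanA=', meanA, ',stdA=', stdA, ',nA=', nA,
    ',meanB=', meanB, ',stdB=', stdB, ',nB=', nB,
    '))'
  ));
run;
```

The results accumulate in pw_out_all, so that data set should either not exist beforehand or have a compatible structure.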