Assume that I have a SAS data step where I subtract the mean from every observation (say, I only have the variable X):
data tmp;
set tmp;
x = x-2;
run;
Let's say mean is not always 2 and I have another script that creates a text file with one line, which contains:
x = x-2;
Now, the question is, is there any way I can have something like:
data tmp;
set tmp;
load text_file;
run;
To do the same thing as the first data step? In other words, I want a solution that relies on using the content of the file (either as I showed in the data step or within a macro).
%INCLUDE will do what you want. Assuming your text file "c:\mycode.sas" has the line
x=x-2;
then you can do this:
data tmp;
set tmp;
%include "c:\mycode.sas";
run;
I'd note that this is a really, really bad way to do this, but it's what you asked for.
If I wanted to subtract the mean of x from x (centering the data), I'd either use PROC STDIZE, or do this:
proc means data=tmp;
var x;
output out=x_mean mean=x_bar;
run;
data want;
set tmp;
if _n_ = 1 then set x_mean;
x=x-x_bar;
run;
Or, PROC STDIZE (included in SAS/STAT):
proc stdize data=tmp out=want_std method=mean;
var x;
run;
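If the goal is only to avoid hard-coding the 2, a macro variable is another option that keeps the original data step shape. This is a minimal sketch, not something from the answer above: the names x_bar and want_sql are made up for illustration, and it assumes tmp has a numeric variable x.

```sas
/* Compute the mean once and store it in a macro variable.         */
/* FORMAT=BEST32. keeps more digits than the default conversion;   */
/* TRIMMED strips the leading and trailing blanks from the value.  */
proc sql noprint;
  select mean(x) format=best32. into :x_bar trimmed
  from tmp;
quit;

/* Use the macro variable where the literal 2 used to be. */
data want_sql;
  set tmp;
  x = x - &x_bar;
run;
```

The value passes through text on its way into the macro variable, so the PROC MEANS merge shown earlier is the safer choice when full numeric precision matters.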
So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to SAS macros and loops, but I have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution which is fully data driven.
1. Store the number of terms in a macro variable to assign the length of arrays
2. Load terms to search into a temporary array
3. Loop through for each word and search the terms
4. Exit loop if you find the term to help speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
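A macro along those lines might look like the sketch below. The macro name flag_terms and all of its parameter names are hypothetical, and the out= parameter is an extra assumption so the input data set is not overwritten. It simply wraps the code-generation step and the %include shown above.

```sas
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
  /* Temporary file that will hold the generated assignment statement */
  filename code temp;

  /* Write: <flag> = index(<var>,"t1") > 0 or index(<var>,"t2") > 0 ... ; */
  data _null_;
    set &terms end=eof;
    file code;
    if _n_ = 1 then put "&flag = " @;
    else put ' or ' @;
    put "index(&var, " &termvar :$quote. ') > 0' @;
    if eof then put ';';
  run;

  /* Pull the generated statement into the data step */
  data &out;
    set &data;
    %include code / source2;
  run;

  filename code clear;
%mend flag_terms;

/* Example call using the data sets from the question */
%flag_terms(data=test, var=team, terms=terms, termvar=search_term, out=want);
```

Note that index() is case sensitive, so this behaves differently from a find() call with the 'i' modifier.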
I'm facing a problem when I try to put data into a character variable.
I have a long transposed dataset with three variables: date (by which I transposed beforehand), var (which has three different values, the names of my previous variables) and col1 (which holds the values of my previous variables).
Now I want to create a fourth variable which also has three different values. My problem is that my code creates the variable, but it always contains missing values.
data pair2;
set data1;
if var="BNNESR" or var="BNNESR_r" or var="BNNESR_t" then output;
length all $ 20;
all=" ";
if var="BNNESR" then all="pdev";
if var="BNNESR_t" then all="trigger";
if var="BNNESR_r" then all="rdev";
drop var;
run;
Afterwards I want to transpose it back by the "all" variable. I know I could just rename the old vars before I transpose and then just keep them.
But the complete calculation goes on and will eventually be turned into a macro, where doing it that way is not so easy.
Your program will just subset the input data and add a new variable that is empty because you are writing the data out before you assign any value to the new variable.
Use a subsetting IF (or WHERE) statement instead of using an explicit OUTPUT statement. Once your data step has an explicit OUTPUT statement then SAS no longer automatically writes the observation at the end of the data step iteration.
data pair2;
set data1;
if var="BNNESR" or var="BNNESR_r" or var="BNNESR_t" ;
length all $20;
if var="BNNESR" then all="pdev";
else if var="BNNESR_t" then all="trigger";
else if var="BNNESR_r" then all="rdev";
drop var;
run;
Since the list in the IF statement matches the values in the recode step then perhaps you want to just use a DELETE statement instead?
data pair2;
set data1;
length all $20;
if var="BNNESR" then all="pdev";
else if var="BNNESR_t" then all="trigger";
else if var="BNNESR_r" then all="rdev";
else delete;
drop var;
run;
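To see the effect of an explicit OUTPUT statement in isolation, here is a small self-contained sketch. The data set names have, early and late are made up for the illustration.

```sas
data have;
  input var :$10.;
  datalines;
BNNESR
OTHER
;
run;

/* Explicit OUTPUT: the row is written at this point, before ALL   */
/* has been assigned, and nothing is written at the end of the     */
/* iteration. ALL is blank in the result.                          */
data early;
  set have;
  if var = "BNNESR" then output;
  length all $20;
  all = "pdev";
run;

/* Subsetting IF: non-matching rows are dropped, and matching rows */
/* are written automatically at the end of the iteration, after    */
/* ALL has been assigned.                                          */
data late;
  set have;
  if var = "BNNESR";
  length all $20;
  all = "pdev";
run;
```

Both data sets keep only the BNNESR row; they differ only in whether ALL has a value when the row is written.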
I have dynamically created a file myfile.sas with the following content:
and a = 0
and b = 0
Now I want to include this file into a data step:
data y;
set x;
if 1=1
%include incl("myfile.sas")
then selektion=0;
else selektion=1;
run;
The result should be:
data y;
set x;
if 1=1
and a=0
and b=0
then selektion=0;
else selektion=1;
run;
However I get the following error:
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
Is this possible to include the file into the if statement?
Indeed, that doesn't work. You can use %include within a data or proc step to add some lines to it but not within an incomplete statement.
Had your myfile.sas looked like this:
if 1=1
and a = 0
and b = 0
you could have written
data y;
set x;
%include "myfile.sas";
then selektion=0;
else selektion=1;
run;
Couldn't you have these lines in a macro instead of a file?
%macro mymacro;
and a=0
and b=0
%mend;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;
If that myfile.sas has to stay as is, you could work around it in this rather convoluted (but still generic) way:
filename myfile temp;
data _null_;
file myfile;
infile 'myfile.sas' eof=end;
input;
if _n_=1 then put '%macro mymacro;';
put _infile_;
return;
end:
put '%mend;';
run;
%include myfile;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;
The %INCLUDE needs to be at a statement boundary. You could put the IF 1=1 into the same file or into another file. Make sure to include a semi-colon to end the %INCLUDE command, but don't include a semi-colon in the contents of the file.
data y;
set x;
%include incl("if1file.sas","myfile.sas") ;
then selektion=0;
else selektion=1;
run;
A better solution might be to put the code into a macro variable (if less than 64K bytes).
%let condition=
and a = 0
and b = 0
;
data y;
set x;
if 1=1 &condition then selektion=0;
else selektion=1;
run;
If it is longer than 64K bytes then define it as a macro instead.
%macro condition;
and a = 0
and b = 0
%mend;
data y;
set x;
if 1=1 %condition then selektion=0;
else selektion=1;
run;
According to SAS documentation:
%INCLUDE Statement
Brings a SAS programming statement, data lines, or both, into a current SAS program.
The injection you are attempting is not a complete statement, so it fails. A more specific description of the action you are describing would be %INLINE. However, there is no such SAS statement.
Let's call a program that outputs code a 'codegener' and the output it produces the 'codegen'.
In the context of your use, the codegen is specific to a single statement. This strongly suggests the codegener should be placing the codegen in a macro variable (for ease of later use) instead of a file.
Suppose the codegener uses data about statement construction:
DATA statements_meta;
length varname $32 operator $10 value $200;
input varname operator value;
datalines;
a = 0
b = 0
run;
and the codegener is a DATA step
DATA _null_;
file "myfile.snippet";
... looping logic over data for statement construction ...
put " and " varname " = 0 ";
...
run;
Change the codegener to be more like the following:
DATA _null_;
length snippet $32000;
snippet = "";
... looping logic over data for statement construction ...
snippet = catx (" ", snippet, "and", varname, operator, value);
... end loop
call symput('snippet', trim(snippet));
stop;
run;
...
DATA ...
if 1=1 &snippet then ... else ...
run;
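Filling in the elided looping logic, one possible runnable version of that codegener is sketched below. It assumes the statements_meta layout shown above (varname, operator, value); the input data set x and its rows are made up so the final step has something to run on.

```sas
data statements_meta;
  length varname $32 operator $10 value $200;
  input varname operator value;
  datalines;
a = 0
b = 0
;
run;

/* Codegener: accumulate one "and <var> <op> <value>" piece per row */
data _null_;
  length snippet $32000;
  retain snippet;
  set statements_meta end=eof;
  snippet = catx(' ', snippet, 'and', varname, operator, value);
  /* Store the finished text once the last row has been read */
  if eof then call symputx('snippet', snippet);
run;

%put &=snippet;

/* Hypothetical input data for the final step */
data x;
  a = 0; b = 0; output;
  a = 1; b = 0; output;
run;

data y;
  set x;
  if 1=1 &snippet then selektion=0;
  else selektion=1;
run;
```

Here the implicit loop of the DATA step over statements_meta plays the role of the looping logic, and RETAIN carries the partial snippet from one row to the next.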
I created the following macro. PROC POWER returns the table pw_out, which contains the column Power. The data _null_ step assigns the value of Power in pw_out to the macro variable tpw. I want the macro to return the value of tpw, so that in the main program I can call it in a DATA step like:
data test;
set tmp;
pw_tmp=ttest_power(meanA=a, stdA=s1, nA=n1, meanB=a2, stdB=s2, nB=n2);
run;
Here is the code of the macro:
%macro ttest_power(meanA=, stdA=, nA=, meanB=, stdB=, nB=);
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw'=&power);
run;
&tpw
%mend ttest_power;
@itzy is correct in pointing out why your approach won't work. But there is a solution maintaining the spirit of your approach: you need to create a power-calculation function using PROC FCMP. In fact, AFAIK, to call a procedure from within a function in PROC FCMP, you need to wrap the call in a macro, so you are almost there.
Here is your macro - slightly modified (mostly to fix the symput statement):
%macro ttest_power;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
data _null_;
set pw_out;
call symput('tpw', power);
run;
%mend ttest_power;
Now we create a function that will call it:
proc fcmp outlib=work.funcs.test;
function ttest_power_fun(meanA, stdA, nA, meanB, stdB, nB);
rc = run_macro('ttest_power', meanA, stdA, nA, meanB, stdB, nB, tpw);
if rc = 0 then return(tpw);
else return(.);
endsub;
run;
And finally, we can try using this function in a data step:
options cmplib=work.funcs;
data test;
input a s1 n1 a2 s2 n2;
pw_tmp=ttest_power_fun(a, s1, n1, a2, s2, n2);
cards;
0 1 10 0 1 10
0 1 10 1 1 10
;
run;
proc print data=test;
run;
You can't do what you're trying to do this way. Macros in SAS are a little different than in a typical programming language: they aren't subroutines that you can call, but rather just code that generates other SAS code that gets executed. Since you can't run proc power inside of a data step, you can't run this macro from a data step either. (Just imagine copying all the code inside the macro into the data step -- it wouldn't work. That's what a macro in SAS does.)
One way to do what you want would be to read each observation from tmp one at a time, and then run proc power. I would do something like this:
/* First count the observations */
data _null_;
call symputx('nobs',obs);
stop;
set tmp nobs=obs;
run;
/* Now read them one at a time in a macro and call proc power */
%macro power;
%do j=1 %to &nobs;
data _null_;
nrec = &j;
set tmp point=nrec;
call symputx('meanA',meanA);
call symputx('stdA',stdA);
call symputx('nA',nA);
call symputx('meanB',meanB);
call symputx('stdB',stdB);
call symputx('nB',nB);
stop;
run;
proc power;
twosamplemeans test=diff_satt
groupmeans = &meanA | &meanB
groupstddevs = &stdA | &stdB
groupns = (&nA &nB)
power = .;
ods output Output=pw_out;
run;
proc append base=pw_out_all data=pw_out; run;
%end;
%mend;
%power;
By using proc append you can store the results of each round of output.
I haven't checked this code so it might have a bug, but this approach will work.
You can invoke a macro which calls procedures, etc. (like the example) from within a datastep using call execute(), but it can get a bit messy and difficult to debug.
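A minimal sketch of the call execute() pattern follows. The macro report_row and the data set tmp_demo are stand-ins made up for the illustration; in practice the macro body could be the PROC POWER call plus a PROC APPEND.

```sas
/* Hypothetical stand-in macro; it only writes a note to the log */
%macro report_row(id=, a=);
  %put NOTE: row &id has a=&a;
%mend report_row;

data tmp_demo;
  a = 0; output;
  a = 1; output;
run;

data _null_;
  set tmp_demo;
  /* Build one macro call per observation. %NRSTR keeps the macro  */
  /* from executing while this step is still running; the call is  */
  /* queued and runs after the step ends.                          */
  call execute(cats('%nrstr(%report_row(id=', _n_, ',a=', a, '))'));
run;
```

Because the queued code only runs once the data step has finished, the macro cannot hand a value back to the observation that triggered it; results have to be collected in a data set or in macro variables instead.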
data abc;
a = 1; output;
a = 99; output;
run;
proc format;
invalue abc
99 = .
other = _same_;
value abc
99 = .
other = _same_;
run;
proc means data = abc;
format a abc.;
informat a abc.;
var a;
run;
I would expect the above code to give me a mean of 1 for the variable a. But it doesn't, in proc means it doesn't seem to want to use the format I have defined. Is there an option I can turn on to make it work?
Formats and informats don't work that way. Informats change the incoming data before it gets saved in a SAS data set. Formats change the way data is presented for output, but the underlying data remains unchanged. Additionally, formats don't apply to calculations.
You might try something like this:
data abc;
a = 1; output;
a = 99; output;
run;
data def;
set abc;
if a = 99 then a = .N;
run;
proc means data = def;
var a;
run;
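For comparison, an informat does take effect when it is used at the point where raw data is read. The sketch below reuses the invalue definition from the question; the data set name abc_read is made up, and it assumes the values arrive as raw text through an INPUT statement.

```sas
proc format;
  invalue abc
    99    = .
    other = _same_;
run;

/* The informat is applied while the raw values are being read, */
/* so 99 is stored as missing in the data set itself.           */
data abc_read;
  input a :abc.;
  datalines;
1
99
;
run;

proc means data=abc_read;
  var a;
run;
```

This only helps when the data is read from raw text; for a value that is already stored in a data set, a recoding step like the one above is still needed.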
As far as I know, formats only affect how values are displayed. They are not taken into account in any analysis.