Report how many times a condition is met in SAS

Given a data step like this:
data tmp;
do i=1 to 10;
if 3<i<7 then do;
some stuff;
end;
end;
run;
I want to write to the log how many times the if statement is true. For example, in this example, I want to have a line in the log that says:
If statement true 3 times
because the condition is true when i is 4, 5, or 6. How can I do this?

Using retain to keep a counter variable, it's pretty easy to increment a count of how many times an if condition was met.
data tmp;
retain Counter 0;
do i=1 to 10;
if 3<i<7 then do;
Counter+1;
*some stuff;
end;
end;
put 'If statement true ' Counter 'time(s).';
run;
Note that this writes to the log only once, because the put statement is the last thing executed before the data step terminates (the data step in the example iterates only once). For a data step that iterates more than once (e.g. when a set statement reads data in from another dataset), you'd want to tell SAS to report only at the end of the step. You'd do it like this:
* create an example input data set;
data exampleData;
do i=1 to 10;
output;
end;
run;
* use a variable 'eof' to indicate the end of the input dataset;
data new;
set exampleData end=eof;
retain Counter 0;
if 3<i<7 then do;
Counter+1;
*some stuff;
end;
if eof then put 'If statement true ' Counter 'time(s).';
run;
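As a side note, the sum statement (Counter+1) already initializes its variable to 0 and retains it across iterations, so the explicit retain is optional. Below is a minimal sketch of the same report without it. The use of data _null_ and putlog is my own choice here, on the assumption that only the log message is wanted and no output dataset is needed.

```sas
* create the same example input data set as above;
data exampleData;
  do i=1 to 10;
    output;
  end;
run;

* _null_ avoids creating an output dataset (an assumption: only the log line is needed);
data _null_;
  set exampleData end=eof;
  * the sum statement initializes Counter to 0 and retains it implicitly;
  if 3<i<7 then Counter+1;
  * putlog always writes to the log, even if a FILE statement is active;
  if eof then putlog 'If statement true ' Counter 'time(s).';
run;
```

If the input dataset could be empty, the step stops at the set statement before the eof check runs, so nothing is reported in that case.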

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but have tried searching and piecing together code from some online sites, unfortunately, when I run it, it does nothing, not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution, which is fully data driven. The numbers match the comments in the code:
1. Store the number of terms in a macro variable to set the size of the array
2. Load the terms to search for into a temporary array
3. Loop through the array and search for each term
4. Exit the loop as soon as a term is found, to help speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
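A rough sketch of what such a macro could look like, wrapping the code-generation step above. The macro name flag_terms and all of its parameter names are made up for illustration, and it assumes the term dataset has at least one observation (with an empty term list no assignment statement is generated, so the flag variable is never created).

```sas
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
  /* temporary file that will hold the generated assignment statement */
  filename code temp;

  /* write: <flag> = index(<var>,"term1") or index(<var>,"term2") ... ; */
  data _null_;
    set &terms. end=eof;
    file code;
    if _n_ = 1 then put "&flag. = " @;
    else put ' or ' @;
    put "index(&var., " &termvar. :$quote. ')' @;
    if eof then put ';';
  run;

  /* pull the generated statement into the data step */
  data &out.;
    set &data.;
    %include code / source2;
  run;

  filename code clear;
%mend flag_terms;

/* example call using the datasets from the question */
%flag_terms(data=test, var=team, terms=terms, termvar=search_term, out=want)
```

Note that index is case sensitive, so the terms have to match the case used in the searched variable.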

nesting do until eof in do while loop

Is it possible to nest a do until (eof) in a do until or do while loop? I have not been able to get it to work.
Data
data have;
do i = 1 to 3;
output;
end;
run;
Example - quits after first j
data want;
j = 1;
do while (j < 4);
do until (eof);
set have end=eof;
end;
call missing(eof);
output;
j + 1;
end;
run;
Let me know if I haven't made my problem clear enough. Thanks!
In general you can use any type of DO loop inside any other type.
Your particular program is going to stop the first time it tries to run the SET statement on the second iteration of the outer DO loop, since there are no more observations to be read. Most normal SAS data steps stop in the same way, when they have exhausted their input stream.
If you really want to re-read the whole dataset then use the POINT= option on the SET statement. Make sure to have a way to stop the step since it will no longer be able to read past the end of the input stream.
data want;
do j=1 to 4;
do p=1 to nobs;
set have point=p nobs=nobs ;
* Do something there ;
end;
* Do something else here ;
end;
* and maybe something here too ;
stop;
run;

Backend Processing of SET Statement and use of Continue and Leave Statement in SAS

I am beginner to SAS Programming.
I have written a piece of code to understand how this works, but I don't understand why, after reaching the continue statement, it still goes on to the output a statement.
Given below is the code :
data a B;
put 'entering do DATASTEP' ;
do i=1 to 4;
put 'entering do loop'" " i;
if (i=1) then do;
put 'value of i is 1'" " i;
put 'Entering the loop' ;
put j=_N_;
if _N_ = 2 then continue;
set sashelp.class(firstobs=1 obs=5);
put 'Ouside the loop';
output a;
end;
if (i=2) then do;
put 'value of i is 2'" " i;
put 'Entering the loop' ;
put j=_n_;
set sashelp.class(firstobs=6 obs=10);
put 'Ouside the loop';
output B;
end;
end;
put 'GETING OUT OF THE DATASTEP';
run;
To see what I mean, please run this, and then we can discuss the output datasets and the log.
Thanks in advance.
It looks to me like the CONTINUE is working fine.
Normally SAS will stop the data step when you read past the end of the input data. Without the CONTINUE statement that would be when it tried to read from the first SET statement for the 6th time. But since you skipped it once it will stop when it tries to execute the second SET statement for the 6th time.
Here is a simplified version of what your data step is doing. Notice how it reads the records in 1,6,7,2,8,3,9,4,10,5 order.
data sample;
do i=1 to 10; output; end;
run;
data _null_ ;
if _n_^=2 then do;
set sample (firstobs=1 obs=5);
put i=;
end;
set sample (firstobs=6 obs=10);
put i=;
run;
i=1
i=6
i=7
i=2
i=8
i=3
i=9
i=4
i=10
i=5

SAS - repeating a data step to solve for a value

Is it possible to repeat a data step a number of times (like you might in a %do-%while loop) where the number of repetitions depends on the result of the data step?
I have a data set with a numeric variable A. I calculate a new variable result = min(1, A). I would like the average value of result to equal a target, and I can get there by scaling variable A by a constant k. That is, solve for k where target = average(min(1,A*k)), where k and target are constants and A is a list.
Here is what I have so far:
filename f0 'C:\Data\numbers.csv';
filename f1 'C:\Data\target.csv';
data myDataSet;
infile f0 dsd dlm=',' missover firstobs=2;
input A;
init_A = A; /* store the initial value of A */
run;
/* read in the target value (1 observation) */
data targets;
infile f1 dsd dlm=',' missover firstobs=2;
input target;
K = 1; * initialise the constant K;
run;
%macro iteration; /* I need to repeat this macro a number of times */
data myDataSet;
retain key 1;
set myDataSet;
set targets point=key;
A = INIT_A * K; /* update the value of A */
result = min(1, A);
run;
/* calculate average result */
proc sql;
create table estimate as
select avg(result) as estimate0
from myDataSet;
quit;
/* compare estimate0 to target and update K */
data targets;
set targets;
set estimate;
K = K * (target / estimate0);
run;
%mend iteration;
I can get the desired answer by running %iteration a few times, but ideally I would like to run the iteration until (target - estimate0 < 0.01). Is such a thing possible?
Thanks!
I had a similar problem to this just the other day. The approach below is what I used. You will need to change the loop structure from a for loop to a do while loop (or whatever suits your purposes):
First perform an initial scan of the table to figure out your loop termination conditions and get the number of rows in the table:
data read_once;
set sashelp.class end=eof;
if eof then do;
call symput('number_of_obs', cats(_n_) );
call symput('number_of_times_to_loop', cats(3) );
end;
run;
Make sure results are as expected:
%put &=number_of_obs;
%put &=number_of_times_to_loop;
Loop over the source table again multiple times:
data final;
do loop=1 to &number_of_times_to_loop;
do row=1 to &number_of_obs;
set sashelp.class point=row;
output;
end;
end;
stop; * REQUIRED BECAUSE WE ARE USING POINT=;
run;
Two part answer.
First, it's certainly possible to do what you say. There are some examples of code that works like this available online, if you want a working, useful-code example of iterative macros; for example, David Izrael's seminal Rakinge macro, which performs a rimweighting procedure by iterating over a relatively simple process (proc freqs, basically). This is pretty similar to what you're doing. In the process it looks in the datastep at the various termination criteria, and outputs a macro variable that is the total number of criteria met (as each stratification variable separately needs to meet the termination criterion). It then checks %if that criterion is met, and terminates if so.
The core of this is two things. First, you should have a fixed maximum number of iterations, unless you like infinite loops. That number should be larger than the largest reasonable number you should ever need, often by around a factor of two. Second, you need convergence criteria such that you can terminate the loop if they're met.
For example:
data have;
x=5;
run;
%macro reduce(data=, var=, amount=, target=, iter=20);
data want;
set &data.;
run;
%let calc=.;
%let _i=0;
%do %until (&calc.=&target. or &_i.=&iter.);
%let _i = %eval(&_i.+1);
data want;
set want;
&var. = &var. - &amount.;
call symputx('calc',&var.);
run;
%end;
%if &calc.=&target. %then %do;
%put &var. reduced to &target. in &_i. iterations.;
%end;
%else %do;
%put &var. not reduced to &target. in &iter. iterations. Try a larger number.;
%end;
%mend reduce;
%reduce(data=have,var=x,amount=1,target=0);
That is a very simple example, but it has all of the same elements. I prefer to use do-until and increment on my own but you can do the opposite also (as %rakinge does). Sadly the macro language doesn't allow for do-by-until like the data step language does. Oh well.
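Applied to the question, the same two ingredients (a maximum number of iterations and a convergence check) might look roughly like the sketch below. The stand-in data values, the wrapper macro name solve, and its parameters tol and maxiter are all assumptions for illustration. The iteration macro is the one from the question, and the check uses the absolute difference rather than the signed one.

```sas
/* hypothetical stand-in data; the question reads these from CSV files */
data myDataSet;
  input init_A;
  A = init_A;
  datalines;
0.3
0.6
0.8
;
run;

data targets;
  target = 0.5;
  K = 1;
run;

/* the iteration macro from the question */
%macro iteration;
  data myDataSet;
    retain key 1;
    set myDataSet;
    set targets point=key;
    A = init_A * K;
    result = min(1, A);
  run;
  proc sql noprint;
    create table estimate as
    select avg(result) as estimate0 from myDataSet;
  quit;
  data targets;
    set targets;
    set estimate;
    K = K * (target / estimate0);
  run;
%mend iteration;

/* hypothetical wrapper: repeat until converged or maxiter is reached */
%macro solve(tol=0.01, maxiter=20);
  %local i converged;
  %let i = 0;
  %let converged = 0;
  %do %until (&converged. = 1 or &i. = &maxiter.);
    %let i = %eval(&i. + 1);
    %iteration
    /* targets now holds target and the estimate0 from this pass */
    data _null_;
      set targets;
      call symputx('converged', abs(target - estimate0) < &tol.);
    run;
  %end;
  %put NOTE: stopped after &i. iteration(s), converged=&converged.;
%mend solve;

%solve(tol=0.01, maxiter=20)
```

The K stored in targets at the end is the value already updated for the next pass, while estimate0 reflects the K used in the last pass.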
Secondly, you can often do things like this inside a single data step. Even in older versions (9.2 etc.), you can do all of what you ask above in a single data step, though it might look a little clunky. In 9.3+, and particularly 9.4, there are ways to run that proc sql inside the data step and get the result back without waiting for another data step, using RUN_MACRO or DOSUBL and/or the FCMP language. Even something simple, like this:
data have;
initial_a=0.3;
a=0.3;
target=0.5;
output;
initial_a=0.6;
a=0.6;
output;
initial_a=0.8;
a=0.8;
output;
run;
data want;
k=1;
do iter=1 to 20 until (abs(target-estimate0) < 0.001);
do _n_ = 1 to nobs;
if _n_=1 then result_tot=0;
set have nobs=nobs point=_n_;
a=initial_a*k;
result=min(1,a);
result_tot+result;
end;
estimate0 = result_tot/nobs;
k = k * (target/estimate0);
end;
output;
stop;
run;
That does it all in one data step. I'm cheating a bit because I'm writing my own data step iterator, but that's fairly common in this sort of thing, and it is very fast. Macros iterating multiple data steps and proc sql steps will be much slower typically as there is some overhead from each one.

What is the simplest way to either display the data, if there are observations, or create an empty record stating that the dataset was empty?

I have looked around quite a bit for something of this nature, and the majority of sources give examples of counting the number of observations etc.
But what I am actually after is a simple piece of code that will check to see if there are any observations in the dataset, if that condition is met then the program needs to continue as normal, but if the condition is not met then I would like a new record to be created with a variable stating that the dataset is empty.
I have seen macros and SQL code that can accomplish this, but what I would like to know is whether it is possible to do the same in SAS code. I know the code I have below does not work, but any insight would be appreciated.
Data TEST;
length VAR1 $200.;
set sashelp.class nobs=n;
call symputx('nrows',n);
obs= &nrows;
if obs = . then VAR1= "Dataset is empty"; output;
Run;
You could do it by always appending a 1-row data set with the empty dataset message, and then delete the message if it doesn't apply.
data empty_marker;
length VAR1 $200;
VAR1='Dataset is empty';
run;
Data TEST;
length VAR1 $200.;
set
sashelp.class nobs=n
empty_marker (in=marker)
;
if (marker) and _n_ > 1 then delete;
Run;
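To see the empty case in action, here is a small sketch that runs the same logic against a zero-observation copy of sashelp.class. The dataset names empty_class and test_empty are made up for the example.

```sas
/* zero-observation copy of sashelp.class (same variables, no rows) */
data empty_class;
  set sashelp.class (obs=0);
run;

data empty_marker;
  length VAR1 $200;
  VAR1 = 'Dataset is empty';
run;

data test_empty;
  length VAR1 $200;
  set
    empty_class
    empty_marker (in=marker)
  ;
  /* the marker row is kept only when it is the very first row read */
  if (marker) and _n_ > 1 then delete;
run;
```

Because empty_class contributes no rows, the marker row is read when _n_ is 1 and survives, so test_empty ends up with a single row whose VAR1 says the dataset is empty and whose other variables are missing.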
The easiest way I can think of is to use the nobs= option on the SET statement to check the number of records. The trick is that you don't want to actually read from an empty data set, because that terminates the DATA step before any later statement can use the nobs value. So you put that SET statement behind an always-false if condition: nobs is still filled in when the step is compiled, but the SET never executes.
data test1;
format x best. msg $32.;
stop;
run;
data test1;
if _n_ = 0 then
set test1 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test1;
/*Normal code here*/
output;
run;
So this populates the nobs value with 0. The if clause sees the 0 and allows you to set the message and output that value. Use the stop to then terminate the DATA Step. Outside of that check, do your normal data step code. You need the ending output statement because of the first. Once the compiler sees an output it will not do it automatically for you.
Here it works for a data set with values.
data test2;
format x best. msg $32.;
do x=1 to 5;
msg="Yup";
output;
end;
run;
data test2;
if _n_ = 0 then
set test2 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test2;
y=x+1;
output;
run;