Variable (constant) based on another variable in SAS - if-statement

I need help with the following. My input dataset is as follows:
If one of the values in the QC column is a FAIL, all of the values in the last column 'Final' should be REPEAT, irrespective of what other values are found in the QC column. Desired output dataset:
Thank you.
The following code does not give expected results as no condition is specified for other qc values.
data exp;
set exp;
if QC = "FAIL" then do;
FINAL= "REPEAT";
end;
run;

You need to read the data twice. The first time to figure out if there are any QC failures. The second time to get the records again so you can attach the new variable and write them to the output dataset. The first pass can stop as soon as you find any failure.
data want ;
do while(not eof1);
set have end=eof1;
if qc = 'FAIL' then do;
final='REPEAT';
eof1=1;
end;
end;
do while(not eof2);
set have end=eof2;
output;
end;
stop;
run;

You have to process the entire data set to determine if no change is needed.
This example checks whether FAIL occurs in any row and, if it does, sets every row to REPEAT.
data have;
id+1;
input value qc $ @@;
datalines;
. FAIL 1 PASS 0 PASS 1 PASS . FAIL
;
%let repeat_flag = 0;
data _null_;
set have;
where qc = 'FAIL';
call symputx ('repeat_flag',1);
stop;
run;
%if &repeat_flag %then %do;
data have;
set have;
final = 'REPEAT';
run;
%end;
To set everything to REPEAT only when the number of FAILs reaches a threshold, add a condition to the DATA _NULL_ step. Because of the WHERE statement, _n_ counts the FAIL rows read so far.
data _null_;
set have;
where qc = 'FAIL';
if _n_ >= 3;
call symputx ('repeat_flag',1);
stop;
run;


Calling different call execute on if condition - SAS

My intent is to build a dataset in a two-step process, driven by the 'frame' dataset, using call execute. I need 'a_dataset', which does not exist yet.
I read the 'frame' dataset row by row:
In the first row, since 'a_dataset' doesn't exist, I call the macro nexds.
In the second row, I check whether 'a_dataset' exists, find that it does, and call the macro exds.
Unfortunately my code is not working. For each row in 'frame' the existence check is always false, so the nexds macro runs twice.
DATA WORK.frame;
INFILE DATALINES4
/*DLM='7F'x*/
DLM=' '
MISSOVER
DSD ;
INPUT
from : $CHAR25.
tojoin : $CHAR7. ;
DATALINES4;
dataset join1
dataset join2
;;;;
proc delete data=a_dataset;run;
%macro exds(dsn,varname);
data &dsn.;
set &dsn.;
&varname. = 'exist';
run;
%mend;
%macro nexds(dsn,varname);
data &dsn.;
&varname. = 'notexist';
run;
%mend;
data _null_;
set frame;
name = strip(tojoin);
dsname = cat('a_',strip(from));
if ~exist(dsname) then put 'notexist';
else put 'exist';
if ~exist(dsname) then call execute('%nexds('||dsname||','||name||')');
else call execute('%exds('||dsname||','||name||')');
run;
It runs these two lines of code:
1 + data a_dataset; join1 = 'notexist'; run;
2 + data a_dataset; join2 = 'notexist'; run;
Instead I want:
1 + data a_dataset; join1 = 'notexist'; run;
2 + data a_dataset; set a_dataset; join2 = 'exist'; run;
In the log, the put statements write:
notexist
notexist
It seems as if the if condition is checked up front for every row of 'frame', and not after the code generated for each single row has run.
You are not understanding how CALL EXECUTE() works. The code you generate is stored up to run after the current step finishes.
Since both observations in FRAME have the same value of FROM the test for whether or not the dataset exists will have the same value both times since any code generated by the first observation has not had a chance to run yet.
Move the test and branching logic into the macro instead.
DATA frame;
INPUT from :$25. tojoin :$7. ;
DATALINES4;
dataset join1
dataset join2
;;;;
%macro make(dsn,varname);
data &dsn ;
%if %sysfunc(exist(&dsn)) %then %do;
set &dsn;
&varname = 'exist';
%end;
%else %do;
&varname = 'notexist';
%end;
run;
%mend make;
proc delete data=a_dataset;run;
options mprint;
data _null_;
set frame;
call execute(cats('%nrstr(%make)(', 'a_', from, ',' , tojoin, ')' ));
run;
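The timing described in this answer can be seen in isolation with a small sketch. The dataset name demo_ds is hypothetical and only used for illustration. The first step queues code with CALL EXECUTE and immediately tests for the dataset; the second step repeats the test after the queued code has had a chance to run.

```sas
/* demo_ds is a hypothetical name; remove any leftover copy first */
proc datasets library=work nolist nowarn;
  delete demo_ds;
quit;

data _null_;
  /* the generated step is queued and runs only after this step ends */
  call execute('data demo_ds; x = 1; run;');
  exists_now = exist('demo_ds');
  put 'During the generating step: ' exists_now=;
run;

data _null_;
  /* by now the queued step has run, so the dataset is there */
  exists_later = exist('demo_ds');
  put 'In the next step: ' exists_later=;
run;
```

The log should show exists_now=0 and exists_later=1. This is the same reason both rows of FRAME saw a missing dataset in the question.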

How to get number of rows equal to zero instead of nothing if table is empty in SAS

I have the following code, which stores the number of rows in a table inside the macro-variable n_vars:
data cont_vars;
set var_list;
where flg_categorical = 0;
call symput('n_vars', _n_);
%put &n_vars;
run;
Currently if the resulting table is empty then n_vars resolves to nothing and I want it to be set equal to 0 so I can use it in a %do x = 1 %to &n_vars; loop later.
If the where statement indeed leads to no observation in the dataset, the following code should then set the n_vars macro-variable to 0:
data want;
if eof then call symputx('n_vars',_n_-1);
set have end=eof;
where flg_categorical = 0;
run;
Please consider creating a Minimal Reproducible Example.
data have;
infile datalines delimiter=',';
input x1 flg_categorical;
datalines;
1,1
6,2
3,2
;
run;
data want;
if eof then call symputx('n_vars',_n_-1);
set have end=eof;
where flg_categorical = 0;
run;
%put &=n_vars;
Result:
N_VARS=0
Anyway, I don't see why you would use a %do loop that runs from 1 to 0.
Maybe you just want to run a different set of steps when the data set is empty. If that is the case, I would consider using:
%if &n_vars. > 0 %then %do; /* data set is not empty */
...
%end;
%else %do; /* data set is empty */
...
%end;
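For reference, an iterative %do loop whose stop value is below its start value (with the default positive increment) simply executes zero times, so a value of 0 in n_vars is safe to use as the upper bound. A minimal sketch, with the macro name loop_demo chosen only for illustration:

```sas
%macro loop_demo(n_vars);
  /* the body is skipped entirely when n_vars is 0 */
  %do x = 1 %to &n_vars;
    %put NOTE: iteration &x of &n_vars;
  %end;
  %put NOTE: loop finished for n_vars=&n_vars;
%mend loop_demo;

%loop_demo(0)
%loop_demo(2)
```

The first call should write only the "loop finished" note; the second writes two iteration notes before it.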

Do loop for creating new variables in SAS

I am trying to run this code
data swati;
input facility_id$ loan_desc : $50. sys_name :$50.;
cards;
fac_001 term_loan RM_platform
fac_001 business_loan IQ_platform
fac_002 business_loan BUSES_termloan
fac_002 business_loan RM_platform
fac_003 overdrafts RM_platform
fac_003 RCF IQ_platform
fac_003 term_loan BUSES_termloan
;
proc contents data=swati out=contents(keep=name varnum);
run;
proc sort data=contents;
by varnum;
run;
data contents;
set contents ;
where varnum in (2,3);
run;
data contents;
set contents;
summary=catx('_',name, 'summ');
run;
data _null_;
set contents;
call symput ("name" || put(_n_ , 10. -L), name);
call symput ("summ" || put (_n_ , 10. -L), summary);
run;
options mlogic symbolgen mprint;
%macro swati;
%do i = 1 %to 2;
proc sort data=swati;
by facility_id &&name&i.;
run;
data swati1;
set swati;
by facility_id &&name&i.;
length &&summ&i. $50.;
retain &&summ&i.;
if first.facility_id then do;
&&summ&i.="";
end;
if first.&&name&i. = last.&&name&i. then &&summ&i.=catx(',',&&name&i., &&summ&i.);
else if first.&&name&i. ne last.&&name&i. then &&summ&i.=&&name&i.;
run;
if last.facility_id ;
%end;
%mend;
%swati;
This code is meant to create two new variables, loan_desc_summ and sys_name_summ, which hold all the loan_desc values in one line and all the sys_name values in one line, separated by commas, for example (term_loan, business_loan), (RM_platform, IQ_platform). But if a customer has only one loan_desc the loan_summ should only have its value twice.
The problem is that after running the do loop I get a dataset with only sys_name_summ and not loan_desc_summ. I want the dataset with all five variables: facility_id, loan_desc, sys_name, loan_desc_summ, sys_name_summ.
Could you please help me find out whether there is a problem in the do loop?
Your loop is always starting with the same input dataset (swati) and generating a new dataset (SWATI1). So only the last time through the loop has any effect. Each loop would need to start with the output of the previous run.
You also need to fix your logic for eliminating the duplicates.
For example you could change the macro to:
%macro swati;
data swati1;
set swati;
run;
%do i = 1 %to 2;
proc sort data=swati1;
by facility_id &&name&i.;
run;
data swati1;
set swati1;
by facility_id &&name&i ;
length &&summ&i $500 ;
retain &&summ&i ;
if first.facility_id then &&summ&i = ' ' ;
if first.&&name&i then &&summ&i = catx(',',&&summ&i,&&name&i);
/* keeps one row per facility, so a later pass only sees that row */
if last.facility_id ;
run;
%end;
%mend;
Also your program could be a lot smaller if you just used arrays.
data want ;
set have ;
by facility_id ;
array one loan_desc sys_name ;
array two $500 loan_desc_summ sys_name_summ ;
retain loan_desc_summ sys_name_summ ;
do i=1 to dim(one);
if first.facility_id then two(i)=one(i) ;
else if not findw(two(i),one(i),',','t') then two(i)=catx(',',two(i),one(i));
end;
if last.facility_id;
drop i loan_desc sys_name ;
run;
If you want to make it more flexible you can put the list of variable names into a macro variable.
%let varlist=loan_desc sys_name;
You could then generate the list of new names easily.
%let varlist2=%sysfunc(tranwrd(&varlist,%str( ),_summ%str( )))_summ ;
Then you can use the macro variables in the ARRAY, RETAIN and DROP statements.
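As a sketch of that last step, the array version could look like this once the two macro variables are plugged in. It assumes the swati dataset from the question exists and that the names in varlist are separated by single blanks; the intermediate dataset name have is just the one used in the array example.

```sas
%let varlist=loan_desc sys_name;
%let varlist2=%sysfunc(tranwrd(&varlist,%str( ),_summ%str( )))_summ ;

/* BY-group processing needs the input sorted by facility_id */
proc sort data=swati out=have;
  by facility_id;
run;

data want;
  set have;
  by facility_id;
  array one &varlist;          /* source variables */
  array two $500 &varlist2;    /* one summary variable per source variable */
  retain &varlist2;
  do i = 1 to dim(one);
    if first.facility_id then two(i) = one(i);
    /* append a value only if it is not already in the list */
    else if not findw(two(i), one(i), ',', 't') then two(i) = catx(',', two(i), one(i));
  end;
  if last.facility_id;
  drop i &varlist;
run;
```

Adding a third variable then only means extending varlist; the arrays, RETAIN and DROP follow automatically.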

Find three most recent data year for each row

I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know it's wrong because of the nested loop: it gives recent_1, recent_2 and recent_3 the same value. However, I haven't figured out how to create recent_1, recent_2 and recent_3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end;
run;
%mend;
%test();
You don't need to use a macro to do this - just some arrays:
data Maternal_care_recent;
set wb;
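keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
/* index the array by year so mc[2013] is MATERNAL_CARE_2013 */
array mc {2004:2013} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {3} recent_1-recent_3;
rc = 0;
/* walk back from the latest year until three non-missing values are found */
do i = 2013 to 2004 by -1 while (rc < 3);
if mc[i] ne . then do;
rc = rc + 1;
recent[rc] = mc[i];
end;
end;
run;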
Maybe I don't get your request, but I went by your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)". Based on that, I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets named DS1 and DS2 (in general, one per variable starting with DT), each with up to 3 observations per location: DS1 for the first variable, DS2 for the second variable.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what is done when. SAS processes your code step by step:
First, before your SAS code is even compiled, your macro variables are resolved and your macro code is executed.
Then the resulting Base SAS code is compiled.
Finally the compiled code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code, interpreted before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run through your %do i = 2013 %to 2004 %by -1, it is filled in as MATERNAL_CARE_2013, the second time as MATERNAL_CARE_2012, etc.
Then the macro %if statement is interpreted, and as the text string MATERNAL_CARE_2013 is never equal to a dot, the condition is TRUE every time,
so recent_&rc. = MATERNAL_CARE_&i. is passed to the compiler unconditionally, without any test on the data. That is why all three recent_ variables end up with the same value.
You can see that if you run your code with options mprint;
The solution:
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evaluated, and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test of whether the SAS variable MATERNAL_CARE_2013 is missing, and that is just what you wanted.
Remark:
It is not essential that I moved the if statement above the %do rc loop. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end; and your ifs and dos with an end;
Remark:
You do not need %let rc = 1, because %do rc = 1 %to 3 already initialises &rc.
For completeness: SAS is compiled stepwise.
The next PROC or DATA step and its macro code are only considered once the previous one has executed.
That is why you can write macro variables from a data step or an SQL select into that will influence the code you compile in your next step,
something you cannot do, for instance, with C++ precompilation.
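The difference between the macro-time test and the run-time test can be tried out with a small sketch. The macro name timing_demo and the one-variable data step are made up for illustration; the variable is deliberately set to missing.

```sas
options mprint;
%macro timing_demo(i);
  data _null_;
    maternal_care_&i = .;
    /* macro-time test: compares the text MATERNAL_CARE_2013 with a dot */
    %if MATERNAL_CARE_&i ne . %then %do;
      put 'macro test was true, so this PUT was generated';
    %end;
    /* run-time test: checks the value of the data step variable */
    if maternal_care_&i ne . then put 'variable is not missing';
    else put 'variable is missing';
  run;
%mend timing_demo;

%timing_demo(2013)
```

With mprint on, the log should show that the first PUT statement is part of the generated step even though the variable is missing, while the ordinary if statement takes the "missing" branch.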
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;

What is the simplest way to either display the data, if there are observations, or create an empty record stating that the dataset was empty?

I have looked around quite a bit for something of this nature, and the majority of sources give examples of counting the number of observations and the like.
What I am actually after is a simple piece of code that checks whether there are any observations in the dataset. If there are, the program should continue as normal; if there are none, I would like a new record to be created with a variable stating that the dataset is empty.
I have seen macros and SQL code that can accomplish this, but what I would like to know is whether it is possible to do the same in plain SAS data step code. I know the code I have below does not work, but any insight would be appreciated.
Data TEST;
length VAR1 $200.;
set sashelp.class nobs=n;
call symputx('nrows',n);
obs= &nrows;
if obs = . then VAR1= "Dataset is empty"; output;
Run;
You could do it by always appending a 1-row data set with the empty dataset message, and then delete the message if it doesn't apply.
data empty_marker;
length VAR1 $200;
VAR1='Dataset is empty';
run;
Data TEST;
length VAR1 $200.;
set
sashelp.class nobs=n
empty_marker (in=marker)
;
if (marker) and _n_ > 1 then delete;
Run;
The easiest way I can think of is to use the NOBS= option of the SET statement to check the number of records. The trick is that you don't want to actually read from an empty data set: that terminates the DATA step immediately, before any later statement can act on the count. So you put the SET inside an always-false IF statement. NOBS= is assigned at compile time, so the count is available even though that SET never executes.
data test1;
format x best. msg $32.;
stop;
run;
data test1;
if _n_ = 0 then
set test1 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test1;
/*Normal code here*/
output;
run;
So this populates the nobs value with 0. The if clause sees the 0 and allows you to set the message and output that value. Use the stop to then terminate the DATA Step. Outside of that check, do your normal data step code. You need the ending output statement because of the first. Once the compiler sees an output it will not do it automatically for you.
Here it works for a data set with values.
data test2;
format x best. msg $32.;
do x=1 to 5;
msg="Yup";
output;
end;
run;
data test2;
if _n_ = 0 then
set test2 nobs=nobs;
if ^nobs then do;
msg = "NO RECORDS";
output;
stop;
end;
set test2;
y=x+1;
output;
run;