If-else then do in SAS

Hi, I would like to know how to structure multiple nested if-then conditions.
I would like to do the following:
data _ ;
set _ ;
if condition 1 is true then do;
if sub condition 1 is true then _ ;
else if sub condition2 is true then _;
else if ... ;
end;
else if condition 2 is true then do; /* Is it right? */
if sub condition 1 is true then _ ;
else if sub condition2 is true then _;
else if ... ;
end;
run;
Could you please tell me what the right structure is? Should I use else if or else do?
For example: condition1 can take the values 1 or 0. The sub-conditions (which I will call test1, test2, test3, ...) are other conditions. So I would have something like:
data _ ;
set _ ;
if condition1 = 1 then do;
if test1 = . then test3=test2; else test3=test1;
else if test1 = 'My test' or test2= 'My test' then test3=test2 else test3=test2;
end;
else if condition1=0 then do;
if test1 = . then test3=test2; else test3=test1;
else if test1 = 'My test' or test2= 'My test' then test3=test2 else test3=test2;
end;
else test3=test2;
run;
A sample of data could be:
condition1  test1    test2
1           .        M
0           My test  .
1           Love     home
0           Home     .
What I would like to do is, based on the value of condition1:
if condition1 is 1 and test1 is missing (.), then assign the value of test2 to test3; otherwise test3=test1; and so on.
My expected output would then be:
condition1  test1    test2  test3
1           .        M      M
0           My test  .      My test
1           Love     home   Love
0           Home     .      Home

Not sure what you are asking, but perhaps this will help you.
You can think of the nested ifs as additional conditions. So if you had
if test1 then do;
if test2 then statement1 ;
else if test3 then statement2 ;
end;
You could re-write it as
if test1 and test2 then statement1 ;
else if test1 and test3 then statement2 ;
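Applied to the sample data from the question, the nested form can be sketched as below. This is a minimal sketch, assuming test1 and test2 are character variables (the sample values suggest so); the dataset names have and want and the length of 10 are assumptions made for illustration. It uses missing() rather than "= ." because missing() works for both character and numeric variables.

```sas
/* Sample data from the question; DSD lets empty fields be read as missing */
data have;
  infile datalines dsd truncover;
  length condition1 8 test1 test2 $10;
  input condition1 test1 $ test2 $;
  datalines;
1,,M
0,My test,
1,Love,home
0,Home,
;
run;

data want;
  set have;
  length test3 $10;
  if condition1 = 1 then do;
    /* inner IF/ELSE belongs to the DO block and needs its own semicolons */
    if missing(test1) then test3 = test2;
    else test3 = test1;
  end;
  else if condition1 = 0 then do;
    if missing(test1) then test3 = test2;
    else test3 = test1;
  end;
  else test3 = test2;   /* any other value of condition1 */
run;
```

Since both DO blocks apply the same rule here, the whole chain could likely be collapsed to a single assignment such as test3 = coalescec(test1, test2); the nested form only pays off when the branches differ.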

Related

Several if statements

I want to flag Komp and Bauspar: '-' if the value is < 1, '+' if it is > 1, and no flag if it is blank.
I tried the following, but it somehow produces two 2022_Bauspar_flag columns.
Can you give me a hint?
Thanks a lot.
Kind regards,
Ben
%macro target_years2(table,type);
%local name_Bauspar name_Komp;
data &table ;
set work.&table;
%let name_Komp = "2022_ZZ_Komp"n;
%let name_Bauspar = "2022_ZZ_Bauspar"n;
&name_Komp = (1+("2022_Komposit"n-"2022_Komposit_Ziel"n)/"2022_Komposit_Ziel"n);
&name_Bauspar = (1+("2022_Bausparen"n-"2022_Bausparen_Ziel"n)/"2022_Bausparen_Ziel"n);
/*create ZZ_flags*/
if &name_Komp > 1 THEN do;
"2022_ZZ_Komp_flag"n = '+';
end;
else if &name_Komp < 1 and &name_Komp <> . THEN do;
"2022_ZZ_Komp_flag"n = '-';
end;
else if &name_Bauspar > 1 THEN do;
"2022_ZZ_Baupar_flag"n = '+';
end;
else if &name_Bauspar < 1 and &name_Bauspar <> . THEN do;
"2022_ZZ_Bauspar_flag"n = '-';
end;
else do;
end;
run;
%mend;
%target_years2(Produktion_temp,Produktion)
It is difficult to help you, as you do not provide any output or a detailed explanation of what is wrong.
Note that if you want to compute both columns for each observation, you need to split your IF statement in two. In a single IF / ELSE IF chain, the Bauspar conditions are not evaluated once one of the Komp conditions is true.
I understand you want to compute two derived columns 2022_ZZ_Komp_flag and 2022_ZZ_Bauspar_flag with the following condition:
if associated macro variable &name_ > 1 then flag is +
if associated macro variable &name_ < 1 then flag is -
if associated macro variable &name_ = . then flag is missing
With the following dataset
data have;
input zz_komp zz_baupar;
cards;
0.9 1.1
1.1 0.8
. 2
0.8 .
;
The following code
data want;
set have;
"2022_ZZ_Komp_flag"n = ifc(zz_komp > 1, '+', '-');
"2022_ZZ_Baupar_flag"n = ifc(zz_baupar > 1, '+', '-');
if missing(zz_komp) then "2022_ZZ_Komp_flag"n = '';
if missing(zz_baupar) then "2022_ZZ_Baupar_flag"n = '';
run;
Produces
Is it the expected result?
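As an alternative to IFC, the "split the IF statement" advice can be sketched with two independent IF / ELSE IF chains, one per flag. This is a sketch under assumptions: the input column names zz_komp and zz_bauspar and the dataset names are illustrative, and VALIDVARNAME=ANY is assumed so that name literals such as "2022_ZZ_Komp_flag"n are accepted. It tests for missing values with missing() because, in DATA step expressions, <> is the MAX operator rather than "not equal" (it means "not equal" only in WHERE clauses), so a test like "x <> ." does not do what it appears to.

```sas
options validvarname=any;

data have;
  input zz_komp zz_bauspar;
  cards;
0.9 1.1
1.1 0.8
. 2
0.8 .
;

data want;
  set have;
  length "2022_ZZ_Komp_flag"n "2022_ZZ_Bauspar_flag"n $1;

  /* Chain 1: Komp flag only */
  if zz_komp > 1 then "2022_ZZ_Komp_flag"n = '+';
  else if not missing(zz_komp) and zz_komp < 1 then "2022_ZZ_Komp_flag"n = '-';

  /* Chain 2: Bauspar flag, evaluated regardless of chain 1 */
  if zz_bauspar > 1 then "2022_ZZ_Bauspar_flag"n = '+';
  else if not missing(zz_bauspar) and zz_bauspar < 1 then "2022_ZZ_Bauspar_flag"n = '-';
run;
```

Unlike the IFC version, a value of exactly 1 is left unflagged here, which matches the "< 1" and "> 1" wording of the question.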
You have a typo in your code. You assign to Baupar_flag in one case and to Bauspar_flag in the other, which creates two separate columns:
else if &name_Bauspar > 1 THEN do;
"2022_ZZ_Baupar_flag"n = '+';
------
end;
else if &name_Bauspar < 1 and &name_Bauspar <> . THEN do;
"2022_ZZ_Bauspar_flag"n = '-';
-------

How does the SET statement work in a DO loop in SAS?

I have studied the SET statement inside a DO loop in SAS, but I don't understand how it works.
I create the following example dataset a1:
/* Create data a1 */
data a1 ;
input fruit $ ;
cards ;
melon
apple
orange
;
run ;
proc print data=a1 ;
title "Results of a1" ;
run;
Then, I create the following new dataset c1 :
/* Create data c1 using a1 -- This is a upper code block */
data c1 ;
do i = 1 to 3 ;
set a1 ;
count + 1 ;
N_VAR = _N_ ;
ERR_VAR = _ERROR_ ;
output ;
end;
run ;
proc print data=c1 LABEL ;
LABEL N_VAR = "_N_" ;
LABEL ERR_VAR = "_ERROR_" ;
title "Results of c1" ;
run ;
Question: Why doesn't the upper code block produce the same output as the code block below? I don't understand how the SET statement works in a DO loop. What concept am I missing?
/* My expectation for c1 -- This is a below code block */
data my_expectation ;
input i fruit $ count N ERROR ;
cards ;
1 melon 1 1 0
1 apple 2 2 0
1 orange 3 3 0
2 melon 4 1 0
2 apple 5 2 0
2 orange 6 3 0
3 melon 7 1 0
3 apple 8 2 0
3 orange 9 3 0
;
run;
proc print data=my_expectation label ;
LABEL N = "_N_" ;
LABEL ERROR = "_ERROR_" ;
title "The result that I expected for c1" ;
run ;
I attached result image file below.
Thank you for your attention.
Each SET statement sets up an independent reading stream.
A DATA step is an implicit loop.
After the DO loop iterates 3 times the implicit DATA step loop returns control to the top of the step.
At the second implicit iteration, the DO loop is entered, and in its first iteration the SET statement is reached (for the 4th time). The input data set (A1) has no more observations, so the DATA step ends.
You can observe the flow behavior with this version of your DATA step:
data c1 ;
put 'TOP';
do i = 1 to 3 ;
put i= 'pre SET';
set a1 ;
put i= 'post SET';
count + 1 ;
N_VAR = _N_ ;
ERR_VAR = _ERROR_ ;
output ;
end;
put 'BOTTOM';
run;
Aside:
When a DATA step does not have any explicit OUTPUT statements, the step implicitly outputs an observation when control reaches the bottom of the step. Some statements prevent flow from reaching the bottom, such as a DELETE statement or a subsetting IF statement that fails.
I answered your "why" question; @Tom showed you how to produce your expected result with a DATA step. The result is a cross join, which SQL can also perform:
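The implicit-output behavior described in this aside can be seen with a small sketch. The dataset names implicit_out, subset_out, and explicit_out are assumptions chosen for illustration; a1 is the fruit dataset from the question.

```sas
data a1;
  input fruit $;
  cards;
melon
apple
orange
;

/* No OUTPUT statement: one row is written each time control reaches the bottom */
data implicit_out;
  set a1;
run;

/* Subsetting IF: when it fails, control never reaches the bottom, so no row is written */
data subset_out;
  set a1;
  if fruit ne 'apple';
run;

/* Explicit OUTPUT: the implicit output is switched off, rows are written only here */
data explicit_out;
  set a1;
  if fruit = 'apple' then output;
run;
```

With the three-row a1, the first step should keep all three rows, the second should drop apple, and the third should keep only apple.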
data a1 ;
input fruit $ ;
cards ;
melon
apple
orange
;
data replicates;
do i = 1 to 3;
output;
end;
run;
proc sql;
create table want as
select i, a1.*
from replicates cross join a1
;
quit;
If you want to output each observation three times then move the DO loop after the SET.
set a1;
do i=1 to 3; output; end;
If you really want to read through the dataset three times then you either need three separate SET statements
i=1;
set a1;
output;
i=2;
set a1;
output;
i=3;
set a1;
output;
or use the POINT= option to explicitly control which observation you are reading with the SET statement.
do i=1 to 3 ;
do p=1 to nobs;
set a1 point=p nobs=nobs ;
output;
end;
end;
stop;
Most DATA steps stop when they read past the end of the input. Since that cannot happen with the POINT= option, you need the STOP statement to prevent the DATA step from repeating forever.
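Put together as a complete step, the POINT= approach might look like the sketch below. It assumes the a1 dataset from the question; the variable obs_no is a hypothetical name added here to stand in for the observation number the asker expected from _N_ (the real _N_ stays at 1, because the whole job runs inside a single iteration of the implicit DATA step loop).

```sas
data a1;
  input fruit $;
  cards;
melon
apple
orange
;

data c1;
  do i = 1 to 3;              /* three passes over a1 */
    do p = 1 to nobs;         /* nobs is filled in at compile time by NOBS= */
      set a1 point=p nobs=nobs;
      count + 1;              /* running row counter across all passes */
      obs_no = p;             /* position within a1; p itself is not written out */
      output;
    end;
  end;
  stop;                       /* required: POINT= never hits end of file */
run;
```

The POINT= and NOBS= variables are temporary and are not written to c1, which is why the position is copied into a separate variable.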

if statement conditions are embedded in a column

I have a SAS table that has the if condition embedded in the condition1 column of that table. To be more explicit, I created a test dataset:
data test;
infile datalines delimiter=',';
input x1 x2 flag $ condition1 $ value_cond1_true $ value_cond1_false $ ;
datalines;
1,5, ,x1>x2,A,B
6,5, ,x2>x1,D,A
3,2, , ,C,D
;
run;
I am wondering if it is possible to write code that emits the if statement directly into the SAS code, instead of creating a separate macro variable for each observation (&cond1_1, &cond1_2, ... &cond1_n).
Here is what I would like to do (I know it is not possible to use call symput this way):
data final;
set test;
/* For each observation */
do i=1 to _n_;
/* Creating macro-variables for the if condition */
call symput("cond1",CONDITION1);
call symput("value_cond1_true",VALUE_COND1_TRUE);
call symput("value_cond1_false",VALUE_COND1_FALSE);
/* If the cond1 macro-variable is not empty then do */
if %sysevalf(%superq(cond1)=, boolean) = 0 then do;
if &cond1. then flag = &value_cond1_true.;
else flag = &value_cond1_false.;
end;
/* If the cond1 macro-variable is empty then */
else flag = "X";
end;
run;
Data cannot modify the statements of a running DATA step.
There is no 'dynamic expression resolver' in the DATA step.
There are some options, though:
1. Use the data to write source code.
   A different conditional has to be performed for each row (n).
2. Use resolve() to dynamically evaluate an expression in the macro system.
   The values of the variables have to be substituted into the conditional for each row (n).
Write a program
filename evals temp;
data _null_;
file evals;
set test;
length statement $256;
put 'if _n_ = ' _n_ ' then do;';
if missing(condition1) then
statement = 'flag="X";'; /* 'call missing(flag);'; */
else
statement = 'flag = ifc('
|| trim(condition1) || ','
|| quote(trim(value_cond1_true )) || ','
|| quote(trim(value_cond1_false ))
|| ');';
put statement;
put 'end;';
run;
options source2;
data want;
set test;
length flag $8;
%include evals;
keep x1 x2 flag;
run;
filename evals;
RESOLVE function
data want;
set test;
length flag $8 cond expr $256;
cond = condition1;
cond = transtrn(cond,'x1',cats(x1));
cond = transtrn(cond,'x2',cats(x2));
expr = 'ifc(' || trim(cond) || ',' ||
trim(value_cond1_true) || ',' ||
trim(value_cond1_false) ||
')';
if not missing (condition1) then
flag = resolve ('%sysfunc(' || trim(expr) || ')');
else
flag = "X";
keep x1 x2 flag;
run;
If you are going to use the data to write code then take advantage of the power of the PUT statement.
data have;
infile cards dsd truncover;
input x1 x2 condition1 $ value_cond1_true $ value_cond1_false $ ;
cards;
1,5,x1>x2,A,B
6,5,x2>x1,D,A
3,2,,C,D
;
filename code temp;
data _null_;
set have;
file code;
if condition1 ne ' ';
put
'if _n_=' _n_ 'then flag=ifc(' condition1
',' value_cond1_true :$quote.
',' value_cond1_false :$quote.
');'
;
run;
data want;
set have;
length flag $8 ;
flag='X';
%include code / source2;
run;
Results:
84 data want;
85 set have;
86 length flag $8 ;
87 flag='X';
88 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file ...\#LN00059.
89 +if _n_=1 then flag=ifc(x1>x2 ,"A" ,"B" );
90 +if _n_=2 then flag=ifc(x2>x1 ,"D" ,"A" );
NOTE: %INCLUDE (level 1) ending.
91 run;
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 6 variables.

CASE WHEN statements not returning a false

I am trying to create a flag that shows a 1, when a variable match_flg = total_match_flg, otherwise return a 0.
When I run the following code
proc sql;
create table xxxxxxx as
select*,
CASE
when match_flg = total_match_flg then 1 else 0
end as keep_flg
quit;
it returns all 1s, but I am sure that for some rows in the dataset the condition is false and should return 0.
What am I doing wrong?
Is it because you're not reading any data in with a from statement?
I ran similar code (added a from) and it ran fine.
Edit: Including my test data;
data test;
do i = 1 to 10;
match_flag = i;
total_match_flag = 10-i;
output;
end;
drop i;
run;
proc sql;
create table x as
select *,
case
when match_flag = total_match_flag then 1 else 0
end as keep_flg
from test;
quit;
As a sidenote, case can be clumsy to use. Have a look at the IFC/IFN functions instead.
http://www.lexjansen.com/wuss/2012/28.pdf
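For comparison, the same flag written with IFN might look like the sketch below. It reuses the test data from this answer; the output table names x_ifn and x_bool are assumptions for illustration.

```sas
data test;
  do i = 1 to 10;
    match_flag = i;
    total_match_flag = 10 - i;
    output;
  end;
  drop i;
run;

proc sql;
  /* IFN(condition, value-if-true, value-if-false) returns a numeric value */
  create table x_ifn as
  select *,
         ifn(match_flag = total_match_flag, 1, 0) as keep_flg
  from test;

  /* A comparison already evaluates to 1 or 0, so it can be selected directly */
  create table x_bool as
  select *,
         (match_flag = total_match_flag) as keep_flg
  from test;
quit;
```

In this test data only the row with match_flag = 5 has equal values, so keep_flg should be 1 for that row and 0 elsewhere in both tables.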

Checking for proper ordering of numeric, time, etc

My data looks something like this:
data tmp ;
input id var1 - var5 ;
datalines ;
1 1 2 3 4 5
2 1 2 . . .
3 1 . . . 4
4 . 3 . . .
5 . . . . 5
6 1 3 2 2 3
7 5 3 7 8 9
8 1 . . . 2
9 1 . 2 3 4
;
run ;
I'm trying to determine if n variables are properly 'ordered.' By ordered, I mean numerically or sequential in time (or even alphabetic). So in this example, my desired output would be:
dummy = 1 1 1 1 1 0 0 1 1 since the ones where dummy = 1 are in correct order.
It would be trivial if I had complete data:
if var1 <= var2 <= ... <= varn then dummy = 1; else dummy = 0;
Unfortunately, I do not have complete data. The problem may be that SAS treats . as a very small number(?) and also that I cannot perform arithmetic on ., since this also failed:
if 0 * (var1 = .) + var1 <=
var1 * (var2 = .) + var2 <=
var2 * (var3 = .) + var3 <= ... <=
var_n-1 * (varn = .) + varn
then dummy = 1;
else dummy = 0;
Basically this would check to see if a variable is . and if it is, then use the previous value in the inequality, but if it is not missing, proceed as normal. This works sometimes, but still requires most of the info to be nonmissing.
I have also tried something like:
if var2 = max(var1, var2) & var1 <= var2 &
var3 = max(var1 -- var3) & var2 <= var3 & ...
but this approach also needs complete data. And I have tried transposing the data into a long format so that I can just delete the missing columns (and only keep variables I am interested in knowing the order of) but a transposed data set of thousands of variables isn't useful to me (if you would convert back to wide, there would still be missing columns).
Clearly, I am not the best SASer, but I would ideally like to write a macro or something since this issue comes up for me a lot (basically just a data check to see if dates are in order and occur when they should be regarding their relative timeline).
Here is all the code:
data tmp ;
input id var1 - var5 ;
datalines ;
1 1 2 3 4 5
2 1 2 . . .
3 1 . . . 4
4 . 3 . . .
5 . . . . 5
6 1 3 2 2 3
7 5 3 7 8 9
8 1 . . . 2
9 1 . 2 3 4
;
run ;
data tmp1 ;
set tmp ;
if var1 <= var2 <= var3 <= var4 <= var5 then dummy1 = 1 ; else dummy1 = 0 ;
if 0 * (var1 = .) + var1 <=
var1 * (var2 = .) + var2 <=
var2 * (var3 = .) + var3 <=
var3 * (var4 = .) + var4 <=
var4 * (var5 = .) + var5
then dummy2 = 1 ;
else dummy2 = 0 ;
if var2 = max(var1,var2) & var1 ~= var2 &
var3 = max(var1, var2, var3) & var2 ~= var3 &
var4 = max(var1, var2, var3, var4) & var3 ~= var4 &
var5 = max(var1, var2, var3, var4, var5) & var4 ~= var5
then dummy3 = 1 ;
else dummy3 = 0 ;
* none of dummy1 - 3 pick up the observations that are in proper order ;
run ;
data tmp1_varsIwant ;
set tmp1 ;
keep id var1 -- var5 ;
run ;
proc transpose data = tmp1_varsIwant out = tmp1_long ;
by id ;
run ;
data tmp1_long ;
set tmp1_long ;
if col1 = . then delete ;
if _name_ in('var6', 'var999') then delete ;
run ;
proc sort data = tmp1_long ;
by id col1 ;
run ;
Maybe you could force all the logic into one conditional, but it's probably simpler to use a loop like this:
data tmp1 ;
set tmp ;
array vars (*) var1-var5;
last_highest = .;
dummy = 1;
do i = 1 to 5;
if vars(i) > . and vars(i) < last_highest then do;
dummy = 0;
leave;
end;
last_highest = coalesce(vars(i),last_highest);
end;
run ;
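Since the question asks for something reusable, the loop above could be wrapped in a macro along these lines. This is only a sketch: the macro name check_order and its parameters in=, out=, vars=, and flag= are hypothetical, and the helper variables _i and _last are assumed not to clash with existing columns.

```sas
%macro check_order(in=, out=, vars=, flag=dummy);
  data &out;
    set &in;
    array _v (*) &vars;          /* any variable list, e.g. var1-var5 */
    _last = .;                   /* last non-missing value seen so far */
    &flag = 1;
    do _i = 1 to dim(_v);        /* dim() avoids hard-coding the count */
      if not missing(_v(_i)) then do;
        if not missing(_last) and _v(_i) < _last then do;
          &flag = 0;             /* out of order: stop checking this row */
          leave;
        end;
        _last = _v(_i);
      end;
    end;
    drop _i _last;
  run;
%mend check_order;

/* Usage with the sample data from the question */
%check_order(in=tmp, out=tmp1, vars=var1-var5)
```

Missing values are skipped rather than compared, so a row such as 1 . . . 4 is still flagged as ordered, and ties count as ordered because only a strict decrease resets the flag to 0.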