IF-THEN vs IF in SAS

What is the difference between IF and IF-THEN?
For example, the following statement
if type='H' then output;
vs
if type='H';
output;

An if-then statement conditionally executes code. If the condition is met for a given observation, whatever follows the 'then' before the ; is executed, otherwise it isn't. In your example, since what follows is output, only observations with type 'H' are output to the data set(s) being built by the data step. You can also have an if-then-do statement, such as in the following code:
if type = 'H' then do;
i=1;
output;
end;
If-then-do statements conditionally execute code between the do; and the end;. Thus the above code executes i=1; and output; only if type equals 'H'.
An if without a then is a "subsetting if". According to SAS documentation:
A subsetting IF statement tests the condition after an observation is
read into the Program Data Vector (PDV). If the condition is true, SAS
continues processing the current observation. Otherwise, the
observation is discarded, and processing continues with the next
observation.
Thus if the condition of a subsetting if (ex. type='H') is not met, the observation is not output to the data set being created by the data step. In your example, only observations where type is 'H' will be output.
In summary, both of your examples produce the same result, but by different means. if type='H' then output; explicitly outputs only the observations where type is 'H', while if type='H'; output; discards observations where type is not 'H' before they reach the output; statement. Note that in the latter you don't need the output; because there is an implicit output at the end of each SAS data step iteration, which is only overridden if the step contains an explicit output; statement.
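A minimal sketch to check that equivalence, assuming a small made-up data set called have (the data set name, variable names, and values are illustrative only and do not come from the question):

```sas
/* Hypothetical input data, for illustration only */
data have;
  input type $ value;
  datalines;
H 1
A 2
H 3
;
run;

/* IF-THEN: the explicit OUTPUT runs only when the condition is true */
data explicit;
  set have;
  if type = 'H' then output;
run;

/* Subsetting IF: non-matching observations are discarded,
   matching ones are written by the implicit output */
data subset;
  set have;
  if type = 'H';
run;

/* Compare the two results; both should hold only the type 'H' rows */
proc compare base=explicit compare=subset;
run;
```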

They're similar but not identical. In a data step, if is a subsetting statement, and all records not satisfying the condition are dropped. From the documentation:
"Continues processing only those observations that meet the condition of the specified expression."
if-then functions more like the if statement in other languages: it conditionally executes the statement that follows the then clause. A somewhat contrived example:
data baz;
set foo;
if type = 'H';
x = x + 1;
run;
data baz;
set foo;
if type='H' then x = x + 1;
run;
In both examples x will be incremented by 1 if type = 'H', but in the first data step baz will not contain any observations with type not equal to 'H'.
Nowadays it seems like most things that used to be accomplished by if are done using where.
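As a rough sketch of the where alternative mentioned above, here is the first data step rewritten with a WHERE statement. The contents of foo are made up for illustration. One difference worth keeping in mind: WHERE filters observations before they are read into the data step, so it can only reference variables that already exist in the input data set, whereas a subsetting if can also test variables computed in the step.

```sas
/* Hypothetical input data, for illustration only */
data foo;
  input type $ x;
  datalines;
H 1
A 2
H 3
;
run;

/* WHERE selects the rows before they enter the data step */
data baz;
  set foo;
  where type = 'H';
  x = x + 1;
run;
```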

Related

difference between using if _n_ =1 then do; and not using it with PRXPARSE() in SAS

Created a dataset :
data x;
infile datalines truncover;
input name $100.;
datalines;
Deepanshu
How are you, deepanshu
dipanshu
deepanshu is a good boy
My name is deepanshu
Deepanshu Bhalla
Deepanshuuu
DeepanshuBhalla
Bhalla Deepanshu
;
run;
Wrote the following code :
data test;
set x;
if _n_ =1 then do;
retain re;
re = prxparse("s/(Deepanshu\s?Bhalla|bhalla\s?Deepanshu|Deepanshu)/Soumya Pandey/i");
end;
new_data = prxchange(re, -1, name);
proc print;
run;
and a similar one but without the
if _n_ =1 then do; end; retain;
data test;
set x;
re = prxparse("s/(Deepanshu\s?Bhalla|bhalla\s?Deepanshu|Deepanshu)/Soumya Pandey/i");
new_data = prxchange(re, -1, name);
proc print;
run;
Both of the testing codes gave the same result. What is the difference between them?
DATA Step code blocks with the construct
if _n_ = 1 then do;
...
end;
cause the interior statements to occur during only the 1st iteration of the implicit loop.
Retaining a variable that does not come from an input data set prevents its value from being reset to missing at the top of the implicit loop. RETAIN can also initialize a variable with a literal value at compilation time, which does not need an if _n_=1 guard. Initializations from a computed assignment or INPUT require the guard (except in special situations such as prxparse).
For the case of interior statement re = prxparse(...)
As stated by @whymath, the DATA Step compiler is being improved in each release of SAS, and there are now some implicit guards against recompiling a static regular expression pattern.
You will see the same code construct used for initializing hash objects.
Tip of the day:
If you do not specify an argument, the RETAIN statement causes the values of all variables that are created with INPUT or assignment statements to be retained from one iteration of the DATA step to the next.
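A minimal sketch of the same if _n_ = 1 construct applied to a hash object, as mentioned above. The data sets lookup and main, and the variables id and name, are hypothetical names chosen for illustration:

```sas
/* Hypothetical lookup table */
data lookup;
  input id name $;
  datalines;
1 Ann
2 Bob
;
run;

/* Hypothetical data to be matched against the lookup table */
data main;
  input id;
  datalines;
1
2
3
;
run;

data matched;
  if _n_ = 1 then do;
    /* Never executes, but defines id and name in the PDV at compile time */
    if 0 then set lookup;
    /* Build the hash object once, on the first iteration only */
    declare hash h(dataset: 'lookup');
    h.defineKey('id');
    h.defineData('name');
    h.defineDone();
  end;
  set main;
  /* name is retained because it comes from a SET statement,
     so clear it before each lookup */
  call missing(name);
  rc = h.find();   /* rc = 0 when the key is found */
  drop rc;
run;
```

Without the guard, the hash object would be declared and loaded again on every iteration, which is the kind of repeated work the initialization block is meant to avoid.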
The first one uses the if _n_ = 1 then ...; retain; statements, which is a very good and practical programming technique called an initialization block. It executes only on the first iteration of the data step and avoids compiling the regular expression on every iteration.
However, this technique may be considered old-fashioned. In recent versions of SAS (mine is SAS 9.4M5), we don't need to write this initialization block any more, because there is some internal parser optimization. Here is the specification from the SAS Help Center:
If perl-regular-expression is a constant or if it uses the /o option, the Perl regular expression is compiled only once. Successive calls to PRXPARSE do not cause a recompile, but returns the regular-expression-id for the regular expression that was already compiled. This behavior simplifies the code because you do not need to use an initialization block (IF _N_ =1) to initialize Perl regular expressions.
So I prefer your second way; it is a true SASor's way.

SAS: adding character variables in data step without setting the length in advance

In a SAS data step, if one creates a character variable he has to be careful in choosing the right length in advance. The following data step returns a wrong result when var1=case2, since 'var2' is truncated to 2 characters and is equal to 'ab', which is obviously not what we want. The same happens replacing var2=' ' with length var2 $2. This kind of procedure is quite prone to errors.
data b; set a;
var2 = '  ';
if var1 = 'case1' then var2='xy';
if var1 = 'case2' then var2='abcdefg';
run;
I was unable to find a way to just define 'var2' as a character, without having to care for its length (side note: if left unspecified, the length is 8).
Do you know if it is possible?
If not, can you perhaps suggest a more robust workaround, something similar to an sql "case", "decode", etc, to allocate different values to a new string variable that does not suffer from this length issue?
SAS data step code is very flexible compared to most computer languages (and certainly compared to other languages created in the early 1970s) in that you are not forced to define variables before you start using them. The data step compiler waits to define the variable until it needs to. But like any computer program it has rules that it follows. When it cannot tell anything about the variable, the variable is defined as numeric. If it sees that the variable should be character, it sets the length of the variable from the information available at the first reference. So if the first place you use the variable in your code is an assignment of a string constant that is 2 bytes long, then the variable has a length of 2. If it is the result of a character function where the length is unknown, then the default length is 200. If the reference uses a format or informat, then the length is set to match the width of the format/informat. If there is no additional information, then the length is 8.
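A small sketch of the first-reference rule described above, using the VLENGTH function to report the length the compiler assigned. The variable names a and b are made up for illustration:

```sas
data _null_;
  a = 'xy';              /* first reference: a gets length 2 */
  a = 'abcdefg';         /* value is truncated to fit length 2 */
  length b $20;          /* length declared before first use */
  b = 'abcdefg';         /* fits, no truncation */
  len_a = vlength(a);    /* compile-time length of a */
  len_b = vlength(b);    /* compile-time length of b */
  put a= len_a= b= len_b=;
run;
```

The log line written by PUT should show a holding only 'ab' with a length of 2, while b keeps the full string with a length of 20.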
You can also use PROC SQL code if you want. In that case the rules of ANSI SQL apply for how variable types are determined.
In your particular example the assignment of blanks to the variable is not needed, since all newly created variables are set to missing (all blanks in the case of character variables) when the data step iteration starts. Note that if VAR2 is not new (i.e. it is already defined in dataset A) then you cannot change its length anyway.
So just replace the assignment statement with a length statement.
data b;
set a;
length var2 $20;
if var1 = 'case1' then var2='ab';
if var1 = 'case2' then var2='abcdefg';
run;
SAS is not going to change the language at this point; they have too many users with existing code bases. Perhaps they will make a new language at some point in the future.

error in do loop SAS

I need my temp dataset to contain 2 columns, word1 and word2. Both will have blank values. The upper bound of the do loop will change; 2 is just a dummy number.
Can someone tell me how to interpret this error?
data temp(drop=k);
do k=1 to 2;
word&k=.;
output;
end;
run;
Logs -
180
WARNING: Apparent symbolic reference K not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
You need to use an array, not a macro variable; you're misunderstanding how macro variables work.
data temp(drop=k);
array word[2];
do k=1 to 2;
word[k]=.;
output;
end;
run;
Macro variables are an entirely different system. They require a different kind of loop (a %do loop), and that loop has to be inside a macro to work the way you're trying to use it.
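For comparison, a rough sketch of what the macro-based version could look like. The macro name make_words and its parameter n are hypothetical, not from the question. The %do loop runs while the macro generates code, so the data step that SAS actually compiles contains the plain statements word1 = .; and word2 = .;

```sas
%macro make_words(n);
  %local k;                 /* keep the loop counter local to this macro */
  data temp;
    %do k = 1 %to &n;
      word&k = .;           /* generates word1 = .; word2 = .; ... */
    %end;
  run;
%mend make_words;

%make_words(2)
```

The array approach is usually simpler; the macro version is mainly useful when the generated variable names or statements cannot be expressed with an array.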

SAS Macro Works Standalone, But Not When Looped

I have a large dataset where I am storing macro parameters. The macro is itself used to call a number of other macros, each of which runs a number of operations.
Ideally, I'd like to use another macro to loop over each row of the dataset, construct (using PROC SQL) a macro call, store it in a macro variable :CALL, and call the variable at every iteration of the loop (with a PUT &CALL.;) That is:
%macro OUTER_LOOP(DS);
%let K = ;
%COUNT_ROWS(DS, K); /* This stores the number of rows in DS in K. */
%do i = 1 %to &K.;
proc sql noprint; ...; quit; /* Create the macro call, and store it in :CALL. */
%put &CALL.;
%end;
%mend;
%OUTER_LOOP;
This doesn't work as expected: some of the internal checks that exist in my macro indicate several datasets created by the macro are missing. Curiously, when I don't run this in a macro loop (i.e. I manually create a macro call, row-by-row, and execute it), no error occurs.
Has anyone experienced this issue? If so, is anyone familiar with a solution that would still allow me to loop over macro calls? I know that CALL EXECUTE(); (in the data step) runs different parts of the macro at different times--is that what is occurring in this case, as well?
I would add %put Loop iterating: i=&i k=&k ; inside the DO loop. That will let you see how many times the loop iterates. One possibility is the loop is exiting earlier than you intend it to. If that is the case, the cause could be a collision between the macro variable i you use for the looping in %Outer_Loop and another macro variable i you use in one of the inner macros you call. As a general rule, it's a good idea to define macro variables as %LOCAL to the macro they are defined in. Doing that will prevent such macro variable collisions. But without seeing the inner macros, that's just one possibility.
You could also add %put %superq(Call) ; inside the do loop. That will show you the macro calls that are being generated, so you can check you are getting the expected parameter values in each call.
Most likely a scoping issue. Your sub-macros are likely overwriting the values of your macro variables in your calling-macros.
You can fix this by declaring all your variables as local variables using the %local statement. If there are macro variables that you need to access after the macros have run, explicitly declare them as %global.
So for the macro you have listed above you will need the below line:
%local k i;
Don't forget you need to do this for any sub-macros that are called, and so on...
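A minimal sketch of the collision and its fix, using two hypothetical macros named outer and inner. If the %local i; line is removed from inner, its loop reuses the i that belongs to outer and leaves it at 4, so the outer loop stops after its first iteration:

```sas
%macro inner;
  %local i;                 /* without this, inner changes outer's i */
  %do i = 1 %to 3;
    %put inner: i=&i;
  %end;
%mend inner;

%macro outer;
  %local i;
  %do i = 1 %to 2;
    %put outer: i=&i;
    %inner
  %end;
%mend outer;

%outer
```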
You can avoid a lot of these types of problems by generating the code yourself. For your example you could move the logic that generates the code from SQL to a data step, and then instead of a macro you just need a data step. You don't even need to know the number of observations in the dataset in advance.
filename code temp ;
data _null_;
set DS ;
file code ;
put '.... generated code based on values in current data ...' ;
run;
%include code / source2 ;
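A more concrete sketch of the same idea, under stated assumptions: the macro my_macro, its parameters dsn and var, and the params data set are all hypothetical stand-ins, since the real macro and parameter table are not shown in the question.

```sas
/* Hypothetical macro standing in for the real one */
%macro my_macro(dsn=, var=);
  %put Running my_macro with dsn=&dsn var=&var;
%mend my_macro;

/* Hypothetical parameter table, one macro call per row */
data params;
  length dsn $32 var $32;
  input dsn $ var $;
  datalines;
sashelp.class age
sashelp.cars mpg_city
;
run;

filename code temp;

/* Write one macro call per observation to the temporary file.
   Single quotes keep the macro from being resolved here, and
   +(-1) removes the blank that PUT adds after each variable. */
data _null_;
  set params;
  file code;
  put '%my_macro(dsn=' dsn +(-1) ', var=' var +(-1) ')';
run;

/* Run the generated calls; SOURCE2 echoes them to the log */
%include code / source2;
```

Because the calls are written to a file first, you can inspect the generated code before running it, and each call executes as ordinary top-level code rather than inside a macro loop.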

SAS: Difference between IF-THEN and IF-THEN-DO Statements?

I am new to SAS and would like to know what the difference is between "IF-THEN" and "IF-THEN-DO" statements in SAS?
Put simply, if-then is for one statement, and if-then-do is for a block of statements. An if without then in a data step is a subsetting if: observations that do not meet the condition are not written to the output data set.
Example:
data x;
set y;
if a = 1 then /* one statement follows */
b=2;
if a = 1 then do; /* a block of statements follows, up to the end statement, similar to braces in other programming languages */
b=2;
c=3;
end;
if a = 1; /*only when a = 1 data will be written to x*/
run;
SAS evaluates the expression in an IF-THEN statement to produce a result that is either non-zero, zero, or missing. A non-zero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.
If the conditions that are specified in the IF clause are met, the IF-THEN statement executes a SAS statement for observations that are read from a SAS data set, for records in an external file, or for computed values. An optional ELSE statement gives an alternative action if the THEN clause is not executed. The ELSE statement, if used, must immediately follow the IF-THEN statement.
Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements. Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent IF-THEN statements are not evaluated. (Source: support.sas.com)
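A short sketch of the IF-THEN/ELSE chain described above. The data set graded, the variables score and grade, and the cut-off values are made up for illustration:

```sas
data graded;
  input score;
  length grade $4;
  /* Only the first true branch runs; later branches are skipped */
  if score >= 90 then grade = 'high';
  else if score >= 50 then grade = 'mid';
  else grade = 'low';
  datalines;
95
60
10
;
run;
```

If the two ELSE keywords were removed, SAS would evaluate all three statements for every observation, and the last one would always overwrite grade with 'low'.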
The DO statement is the simplest form of DO group processing. The statements between the DO and END statements are called a DO group. You can nest DO statements within DO groups.
A simple DO statement is often used within IF-THEN/ELSE statements to designate a group of statements to be executed depending on whether the IF condition is true or false. (Source: support.sas.com)
Regards,
Vasilij