How to run/not run SAS or SQL code based on conditional output? - sas

I have a SAS program with a macro that will output a different list of variables based on the input criteria. For example, with %MACRO(OPTION1), I get three variables, but with %MACRO(OPTION2), I get four variables. The name of all of the variables is fixed, but it's just a matter of if they are created or not (based on the option).
How can I adjust the macro so that any option inputted by the user will still allow the macro to run? In other words, how can I tell it to ignore some variables if they don't exist.
Fortunately, I am not restricted to any specific procedure, but it would probably have to be either in a DATA step (macro language) or a PROC SQL statement (where clause or some other conditional statement).

This is answerable in the general, as an approach to programming.
The first rule:
Use macro parameters explicitly when the amount of code is small.
This means, if you want to (say) do a PROC MEANS on something, but the variable differed, you could do:
%macro run_means(var=);
proc means data=sashelp.class;
var &var.;
run;
%mend run_means;
%run_means(var=height);
%run_means(var=weight);
and so on. Don't put conditional logic inside the macro; keep it external. This includes lists of variables: pass the whole list in as a parameter rather than writing it into the macro. If it's a long list, store it in a macro variable in your main program and pass that macro variable. The macro itself should strive to accept whatever it's given; today you have two sets of variables, tomorrow you might have three, or a slightly different version of one of them. It's easier to change what you pass to the macro than to change the macro.
This concept will feel comfortable to folks used to object oriented programming, in particular the modular approach, although the separation of data is a bit different.
The second rule:
When substantial parts of a macro vary based on a parameter, separate that code into multiple macros.
In this case, let's say you have two things you want to do: run a PROC MEANS, or run a PROC FREQ, depending on if it's a character or numeric variable. Here, I suggest a general rule of not putting all of that into one macro. It's possible, but it's generally a bad idea. Adding to the previous macro, if you wanted to do this for sashelp.class, I'd do it like this:
%macro run_freq(var=);
proc freq data=sashelp.class;
tables &var.;
run;
%mend run_freq;
%run_means(var=height);
%run_means(var=weight);
%run_freq (var=sex);
How you create these may be programmatic. A lot depends on what you're doing and how you're generating the code; and sometimes in the middle of your macro, you generate the value that determines which of the two things you do. I would still write the portion that varies as a separate macro, though; you can then add logic to call the appropriate macro, and allow it to be more legible.
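As a rough sketch of that last point, a small dispatcher can check whether a variable exists, look up its type, and then call the appropriate macro. The macro name run_stats and its data= parameter are hypothetical, and run_means and run_freq are repeated here only so the block runs on its own:

```sas
/* The two single-purpose macros from above, repeated for completeness. */
%macro run_means(var=);
  proc means data=sashelp.class;
    var &var.;
  run;
%mend run_means;

%macro run_freq(var=);
  proc freq data=sashelp.class;
    tables &var.;
  run;
%mend run_freq;

/* Hypothetical dispatcher: skips variables that do not exist,     */
/* otherwise picks the macro that matches the variable's type.     */
%macro run_stats(data=sashelp.class, var=);
  %local dsid varnum type rc;
  %let dsid = %sysfunc(open(&data.));
  %if &dsid. = 0 %then %do;
    %put ERROR: could not open &data..;
    %return;
  %end;
  %let varnum = %sysfunc(varnum(&dsid., &var.));   /* 0 if not found */
  %if &varnum. > 0 %then
    %let type = %sysfunc(vartype(&dsid., &varnum.)); /* N or C */
  %let rc = %sysfunc(close(&dsid.));

  %if &varnum. = 0 %then %do;
    %put NOTE: &var. was not found in &data. and is skipped.;
  %end;
  %else %if &type. = N %then %do;
    %run_means(var=&var.)
  %end;
  %else %do;
    %run_freq(var=&var.)
  %end;
%mend run_stats;

%run_stats(var=height)
%run_stats(var=sex)
%run_stats(var=income)  /* not in sashelp.class, so only a note is written */
```

The existence check is what lets the same call work whichever option produced the variables; the analysis code itself stays in the small macros.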

Related

how to get macro variable to evaluate math?

I have the following SAS macro snippet:
%macro processLink(uuid=, name=, cluster_external_ipaddress=);
%let unix_starttime = 1000000*(&starttime - '01JAN1970:00:00'dt);
%let unix_endtime = 1000000*(&endtime - '01JAN1970:00:00'dt);
...
when this runs it just creates the variable as a string ie
=1000000*(dhms(today()-1,0,0,0) - '01JAN1970:00:00'dt)
instead of the unix timestamp in usecs.
using unix_starttime = 1000000*(&starttime - '01JAN1970:00:00'dt); outside the macro in a data step works
do i need a null datastep in the macro for this to work as intended ?
Thanks
In general if you want to work with DATA you are better off using SAS code and not MACRO code. You can use CALL SYMPUTX() to generate a macro variable if you need it later.
data _null_;
call symputx('unix_starttime',1000000*(&starttime - '01JAN1970:00:00'dt));
...
run;
You can use %eval() to do simple integer arithmetic and comparisons. If you need to use floating point numbers (or date/time/datetime literals) then you need to use %sysevalf().
%let unix_starttime=%sysevalf(1000000*(&starttime - '01JAN1970:00:00'dt));
In general, anything after a %let statement is treated as pure text. However, there are functions available to wrap around the text which tell SAS to perform a mathematical operation.
These are %eval, used for integer calculations, or %sysevalf where calculations involving decimals are required.
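So you could put %let unix_starttime = %sysevalf(1000000*(&starttime - '01JAN1970:00:00'dt)); (%sysevalf rather than %eval, because %eval cannot handle the datetime literal.)
If you ever need to include a function in a %let statement, precede the function name with %sysfunc. That applies here if &starttime itself resolves to a function call such as dhms(...), as the output in the question suggests.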
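A minimal sketch of the pure macro route, assuming the start time should be midnight yesterday as in the question's output. Each function call goes through %sysfunc, and the arithmetic with the datetime literal goes through %sysevalf:

```sas
/* Assumption: starttime is midnight yesterday, built without a DATA step. */
%let starttime = %sysfunc(dhms(%eval(%sysfunc(today()) - 1), 0, 0, 0));

/* %sysevalf understands the datetime literal and non-integer arithmetic. */
%let unix_starttime = %sysevalf(1000000 * (&starttime. - '01JAN1970:00:00'dt));

%put starttime=&starttime. unix_starttime=&unix_starttime.;
```

If the value is only needed inside a DATA step anyway, the CALL SYMPUTX approach is usually simpler to read.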

What does this block of SAS code do?

I have sas code that I need to partially convert to c++ code, however I am struggling understand its function. I have no experience with sas, and after a few hours of various tutorials and examples I have made very little progress. I don't have access to any of the input data or any corresponding output either. The code follows the following format, but I've changed the variable names:
data data1;
set data2;
output;
if type='ABCD' and zone=1 then do;
type='BCDE'; spec='CDE'; sub='ABCD DEF'; output;
type='EFGH'; spec='FGH'; output;
type='ABCD'; spec='DEF';
end;
The code then continues on, however I only need to understand the logic of this if statement. In the actual code there are many of these statements but they all follow the same structure, understanding one should help me to understand them all. The variable values are only important insofar as type and uniqueness, if variables here share a value then that is true in the original code as well, otherwise they are different.
I know that the program is designed to take combinations of type/spec/zone and convert them into other type/spec combinations but I can't seem to grasp the logic.
The DATA and SET statements define the target and source, respectively.
The first OUTPUT statement will ensure that the target has at least one copy of every record read from the source data.
The code inside the DO END block of the IF/THEN statement will cause two additional records to be written when it runs. They will have different values for the TYPE, SPEC and SUB variables as the assignment statements indicate. At the end of the DO block the values of TYPE, SPEC and SUB will have been set to 'ABCD','DEF' and 'ABCD DEF', respectively.
So if your input is
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
UNK,UNK,UNK,0
The values written by the part of the code you posted would be:
TYPE,SPEC,SUB,ZONE
ABCD,UNK,UNK,0
ABCD,XX,YY,1
BCDE,CDE,ABCD DEF,1
EFGH,FGH,ABCD DEF,1
UNK,UNK,UNK,0
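To check this behaviour yourself, the example can be run as below. This is only a sketch: the variable lengths are assumptions, and the step is closed with RUN right after the IF block, whereas the original program continues.

```sas
/* Sample input matching the table above; lengths are assumed. */
data data2;
  infile datalines dsd truncover;
  length type $4 spec $3 sub $8;
  input type spec sub zone;
  datalines;
ABCD,UNK,UNK,0
ABCD,XX,YY,1
UNK,UNK,UNK,0
;
run;

data data1;
  set data2;
  output;                               /* copy every input record as-is */
  if type='ABCD' and zone=1 then do;
    type='BCDE'; spec='CDE'; sub='ABCD DEF'; output;  /* extra record 1 */
    type='EFGH'; spec='FGH'; output;                  /* extra record 2, sub unchanged */
    type='ABCD'; spec='DEF';            /* reset; not written in this truncated step */
  end;
run;

proc print data=data1;
run;
```

For a port to another language, the key point is that each OUTPUT writes the current values of all variables at that moment, so one input record can yield several output records.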

SAS - How to determine the number of variables in a used range?

I imagine what I'm asking is pretty basic, but I'm not entirely certain how to do it in SAS.
Let's say that I have a range of variables, or an array, x1-xn. I want to be able to run a program that uses the number of variables within that range as part of its calculation. But I want to write it in such a way that, if I add variables to that range, it will still function.
Essentially, I want to be able to create a variable that if I have x1-x6, the variable value is '6', but if I have x1-x7, the value is '7'.
I know that :
var1=n(of x1-x6)
will return the number of non-missing numeric variables.. but I want this to work if there are missing values.
I hope I explained that clearly and that it makes sense.
Couple of things.
First off, when you put a range like you did:
x1-x7
That will always evaluate to seven items, whether or not those variables exist. That simply evaluates to
x1 x2 x3 x4 x5 x6 x7
So it's not very interesting to ask how many items are in that, unless you're generating that through a macro (and if you are, you probably can have that macro indicate how many items are in it).
But the range x1--x7 or x: both are more interesting problems, so we'll continue.
If the variables are all of a single type (even if you don't know which type), the easiest way to do this is to create an array and then use the dim function.
data _null_;
x3=3; /* numeric here: every array element must share one type */
array _temp x1-x7;
count = dim(_temp);
put count=;
run;
That doesn't work, though, if there are multiple types (numeric and character) at hand. If there are, then you need to do something more complex.
The next easiest solution is to combine nmiss and n. This works if they're all numeric, or if you're tolerant of the log messages this will create.
data _null_;
x3='ABC';
count = nmiss(of x1-x7) + n(of x1-x7);
put count=;
run;
nmiss counts the missing values and n counts the non-missing numeric values, so their sum is the total number of variables. Here x3 is counted with the nmiss group.
Unfortunately, there is not a c version of n, or we'd have an easier time with this (combining c and cmiss). You could potentially do this in a macro function, but that would get a bit messy.
Fortunately, there is a third option that is tolerant of character variables: combining countw with catq.
data _null_;
x3='ABC';
x4=' ';
count = countw(catq('dm','|',of x1-x7),'|','q');
put count=;
run;
This will count all variables, numeric or character, with no conversion notes.
What you're doing here is concatenating all of the variables together with a delimiter between them, so [x1]|[x2]|[x3]..., and then counting the number of "words" in that string, where a word is anything delimited by "|". Even missing values will create something, so .|.|ABC|.|.|.|. will have 7 "words".
The 'm' argument to CATQ tells it to even include missing values (spaces) in the concatenation. The 'q' argument to COUNTW tells it to ignore delimiters inside quotes (which CATQ adds by default).
If you use a version before CATQ is available (sometime in 9.2 it was added I believe), then you can use CATX, but you lose the modifiers, meaning you have more trouble with empty strings and embedded delimiters.

Evaluate string variable that contains macro reference

I've got a table with DataSet names in, some of which contain macro references in the name.
e.g. Monthly_Data_&YYMM (where YYMM is the latest month)
I want to keep the Table with this string, but then have a new variable with the evaluated DataSet name.
e.g. Monthly_Data_&YYMM, Monthly_Data_1612
I can't work out a way to do this. If I read the dataset as a macro variable it returns as the required name, but I can't then join it on the same row as the non evaluated reference.
I'm sure this must be possible, and probably quite easy, but I just can't get my head around how to do this.
Many thanks
You can use the resolve function to do this, e.g.
%let YYMM = 1601;
data mydata;
dsname = 'Monthly_Data_&YYMM';
dsname_resolved = resolve(dsname);
run;
N.B. all macro variables used in your column of names must be defined in your session with the correct values at the point when the resolve function executes. If two different data sets used the same macro variable in their name, but it took different values at different times, you will need to redefine the macro variable and run your logic separately, possibly via separate data steps or call symput + symget.

SAS Macro Works Standalone, But Not When Looped

I have a large dataset where I am storing macro parameters. The macro is itself used to call a number of other macros, each of which runs a number of operations.
Ideally, I'd like to use another macro to loop over each row of the dataset, construct (using PROC SQL) a macro call, store it in a macro variable :CALL, and call the variable at every iteration of the loop (with a PUT &CALL.;) That is:
%macro OUTER_LOOP(DS);
%let K = ;
%COUNT_ROWS(DS, K); /* This stores the number of rows in DS in K. */
%do i = 1 %to &K.;
proc sql noprint; ...; quit; /* Create the macro call, and store it in :CALL. */
%put &CALL.;
%end;
%mend;
%OUTER_LOOP;
This doesn't work as expected: some of the internal checks that exist in my macro indicate several datasets created by the macro are missing. Curiously, when I don't run this in a macro loop (i.e. I manually create a macro call, row-by-row, and execute it), no error occurs.
Has anyone experienced this issue? If so, is anyone familiar with a solution that would still allow me to loop over macro calls? I know that CALL EXECUTE(); (in the data step) runs different parts of the macro at different times--is that what is occurring in this case, as well?
I would add %put Loop iterating: i=&i k=&k ; inside the DO loop. That will let you see how many times the loop iterates. One possibility is the loop is exiting earlier than you intend it to. If that is the case, the cause could be a collision between the macro variable i you use for the looping in %Outer_Loop and another macro variable i you use in one of the inner macros you call. As a general rule, it's a good idea to define macro variables as %LOCAL to the macro they are defined in. Doing that will prevent such macro variable collisions. But without seeing the inner macros, that's just one possibility.
You could also add %put %superq(Call) ; inside the do loop. That will show you the macro calls that are being generated, so you can check you are getting the expected parameter values in each call.
Most likely a scoping issue. Your sub-macros are likely overwriting the values of your macro variables in your calling-macros.
You can fix this by declaring all your variables as local variables using the %local statement. If there are macro variables that you need to access after the macros have run, explicitly declare them as %global.
So for the macro you have listed above you will need the below line:
%local k i;
Don't forget you need to do this for any sub-macros that are called, and so on...
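A small sketch of the collision, using two hypothetical macros that both loop on i. With the %local statements in place the outer loop runs twice; without the %local in %inner, the inner loop would leave i at 4 in the outer macro's scope, and the outer loop would be expected to stop after its first pass.

```sas
%macro inner;
  %local i;                 /* keeps this i separate from the caller's i */
  %do i = 1 %to 3;
    %put NOTE: inner iteration &i.;
  %end;
%mend inner;

%macro outer;
  %local i;
  %do i = 1 %to 2;
    %inner
    %put NOTE: outer iteration &i.;
  %end;
%mend outer;

%outer
```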
You can avoid a lot of these types of problems by generating the code yourself. For your example you could move the logic that generates the code from SQL to a data step; then, instead of a macro, you just need a data step. You don't even need to know the number of observations in the dataset in advance.
filename code temp ;
data _null_;
set DS ;
file code ;
put '.... generated code based on values in current data ... ';
run;
%include code / source2 ;
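As a concrete sketch of the same idea, the step below writes one macro call per row and then includes the generated file. The macro report_one and its parameters are hypothetical stand-ins for the real macro, and sashelp.class stands in for the parameter dataset.

```sas
/* Hypothetical macro to be called once per row. */
%macro report_one(name=, age=);
  %put NOTE: processing name=&name. age=&age.;
%mend report_one;

filename code temp;

data _null_;
  set sashelp.class(obs=3);   /* stand-in for the parameter dataset */
  file code;
  /* Single quotes keep the macro call as plain text in the file. */
  /* +(-1) backs up over the blank that list PUT adds after each value. */
  put '%report_one(name=' name +(-1) ', age=' age +(-1) ')';
run;

/* SOURCE2 echoes the generated calls to the log as they run. */
%include code / source2;
```

Because the calls are plain text in a file, you can inspect exactly what will run before including it, and no looping macro variable exists to collide with anything.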