I am trying to build a custom transformation in SAS DI. This transformation will "act" on columns in an input data set, producing the desired output. For simplicity let's assume the transformation will use input_col1 to compute output_col1, input_col2 to compute output_col2, and so on up to some specified number of columns to act on (let's say 2).
In the Code Options section of the custom transformation users are able to specify (via prompts) the names of the columns to be acted on; for example, a user could specify that input_col1 should refer to the column named "order_datetime" in the input dataset, and either make a similar specification for input_col2 or else leave that prompt blank.
Here is the code I am using to generate the output for the custom transformation:
data cust_trans;
set &_INPUT0;
i=1;
do while(i<3);
call symputx('index',i);
result = myfunc("&&input_col&index");
output_col&index = result; /*what is proper syntax here?*/
i = i+1;
end;
run;
Here myfunc refers to a custom function I made using proc fcmp which works fine.
The custom transformation works fine if I do not try to take into account the variable number of input columns to act on (i.e. if I use "&&input_col&i" instead of "&&input_col&index" and just use the column result on the output table).
However, I'm having two issues with trying to make the approach more dynamic:
I get the following warning on the line containing
result = myfunc("&&input_col&index"):
WARNING: Apparent symbolic reference INDEX not resolved.
I do not know how to have the assignment to the desired output column happen dynamically; i.e., depending on the iteration of the do loop I'd like to assign the output value to the corresponding output column.
I feel confident that the solution to this must be well known amongst experts, but I cannot find anything explaining how to do this.
Any help is greatly appreciated!
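You can't use macro variables that depend on data step variables in this manner. Macro variable references are resolved when the step is compiled, not at run time. That is why &index is reported as unresolved: CALL SYMPUTX does not execute until the step runs, by which point the reference has already been processed.
So you either have to use a macro loop such as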
%do i = 1 %to .. ;
which is fine if you're in a macro (it won't work outside of an actual macro), or you need to use an array.
data cust_trans;
set &_INPUT0;
array in[2] &input_col1 &input_col2; *or however you determine the input columns;
array output_col[2]; *automatically names the results;
do i = 1 to dim(in);
result = myfunc(in[i]); *You quote the input - I cannot see what your function is doing, but it is probably wrong to do so;
output_col[i] = result; *assigns to output_col1, output_col2 in turn;
end;
run;
That's the way you'd normally do that. I don't know what myfunc does, and I also don't know why you quote "&&input_col&index." when you pass it to the function, but that would be a strange way to operate unless you want the name of the input column as text (and don't want to know what data is in that variable). If you do, then pass vname(in[i]), which returns the name of the variable as a character string.
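For comparison, the %do approach mentioned above could be sketched as follows. This is only a sketch under assumptions: the macro name apply_myfunc and its parameter n are made up for illustration, the prompts are assumed to arrive as macro variables input_col1, input_col2, and so on, a blank prompt is assumed to produce an empty macro variable, and myfunc is assumed to take the column value rather than its name.

```sas
/* Hypothetical wrapper macro: apply_myfunc and n= are illustrative names. */
%macro apply_myfunc(n=2);
  data cust_trans;
    set &_INPUT0;
    %do i = 1 %to &n;
      /* Assumption: a blank prompt leaves input_col&i empty, so skip it. */
      %if %length(&&input_col&i) > 0 %then %do;
        /* &&input_col&i resolves to the column name, e.g. order_datetime */
        output_col&i = myfunc(&&input_col&i);
      %end;
    %end;
  run;
%mend apply_myfunc;

%apply_myfunc(n=2)
```

Because the %do loop runs while the code is being generated, the data step that SAS finally compiles contains one plain assignment per non-blank prompt and no loop at all.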
I have a dataset with a variable called pt with observations 8.1,8.2,8.3 etc and a variable called mean with values like 8.24 8.1 8.234 etc. Which are paired with each other.
I want to be able to set my put informat to the formats from the variable num.
I get the errors "Expecting an arithmetic expression"
"the symbol is not recognized and will be ignored" and "syntax error" from my code. (underlining the &fmt. part)
if pt=&type;
call symput("fmt",pt);
fmt_mean = putn(mean,&fmt.);
Thanks in advance for your help.
The macro processor's work is done before SAS compiles and runs the data step. So trying to place the value into a macro variable and then use it immediately to generate and execute SAS code will not work.
But since you are using the PUTN() function it can use the value of an actual variable, so there is no need to put the format into a macro variable.
fmt_mean = putn(mean,pt);
Please, post your data set and data step. Your description is hard to understand.
However, the solution seems to be simple: do not use macro variables! You don't need them here. Unlike the put() function, which expects the format to be known at compile time (which is when macro variables can be used), its analog putn() takes the format as its second argument at run time, so it can be a variable. Of course, it runs a little slower because of that flexibility. So your code can look like this:
data ...;
set ...(keep=mean pt);
fmt_mean = putn(mean, pt);
run;
where the pt variable may be numeric, e.g. 8.2, or character, e.g. '8.2'.
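A small self-contained run of the same idea might look like the sketch below. The dataset names have and want and the length chosen for fmt_mean are assumptions; the pt and mean values are the ones quoted in the question, with pt read as a character variable.

```sas
/* Sample data using the values from the question; pt is read as character. */
data have;
  input pt $ mean;
  datalines;
8.1 8.24
8.2 8.1
8.3 8.234
;
run;

data want;
  set have;
  length fmt_mean $ 12;          /* assumed width for the formatted text */
  fmt_mean = putn(mean, pt);     /* format taken from pt on each row */
run;

proc print data=want;
run;
```

Each row is formatted with its own w.d format, which is something a single macro variable resolved before the step runs could not do.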
If you want to understand how SAS macro works and what call symput does look here:
https://stackoverflow.com/a/69979074/7864377
I've gone in circles on this one for 1.5 hours, so I'm giving in and asking for help here. What I'm trying to do is dead simple but I cannot for the life of me find a link describing the process.
I have the following data step:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
%useful_macro(&macro_input_date);
run;
where a date value is passed to a macro function (I'm new to these). I'd like to use the numeric value of the date value - let's be wild and say I want to get the value of the year, multiply it by the day value, and subtract the remainder after dividing the month value by 3. I can't seem to get just the year value out of the input. I've tried various things such as
symget, both "naked" and prepended with "%", with arguments that represent all possible permutations of the following variants:
have a naked reference to the variable, e.g. macro_input_date
enclose in single quotes, e.g. 'macro_input_date'
enclose in double quotes, e.g. "macro_input_date"
prepend with the ampersand, e.g. &macro_input_date
direct call to %sysfunc(year(<argument as variously specified above>))
Can anyone tell me what I am missing?
Thanks!
Given that you asked about macro functions, I'll guess that your example date processing is just an example. Talking about macro functions in general, it's important to understand that a macro function will (generally) not be doing any processing of its own; it will just be generating some data step code to do some task. So, for something like your contrived example, the data step code would be something like:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = year(some_date) * day(some_date) - mod(month(some_date), 3);
run;
To macroise this, you don't need to transfer the data values to the macro, you just need to transfer the variable name:
%macro date_func(var=);
year(&var) * day(&var) - mod(month(&var), 3)
%mend;
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var=some_date);
run;
Note that the value of the var parameter here is the literal text some_date, not the value of the some_date data step variable. There are other ways to do it of course - you could actually pass this macro a date literal and it would still work:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var="21apr2017"d);
run;
so it all depends on exactly what you're trying to do... maybe you want to assign the result to another macro variable, so it doesn't need to be part of a data step at all, in which case you could do a similar thing with %sysfunc functions etc.
If you're just trying to get the year, you would do something like:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
yearval = substr(symget('macro_input_date'),6,4);
put yearval=;
run;
Your macro value (&macro_input_date) is not the actual date value (14610) but is the text 01JAN2000. So you cannot use the year function (unless you INPUT it back); you would use substr to grab the year part.
Of course, this is all sort of pointless as going to/from macro variable doesn't really accomplish much here.
Are you just having trouble with date literals? Your data step code
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
run;
is just going to do the same thing as
%let macro_input_date=01JAN2000 ;
Now if you want to treat that string of characters as if it represents a date then you need to either wrap it up as a date literal
"&macro_input_date"d
Or convert it.
%sysfunc(inputn(&macro_input_date,date9.))
Why not just store the actual date value into the macro variable?
call symputx('macro_input_date',some_date);
Then it wouldn't look like a date to you but it would look like a date to the YEAR() function.
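As a rough sketch of that last suggestion, the calculation from the question (year times day, minus the remainder of month divided by 3) could be done entirely in macro code once the macro variable holds the numeric date value. The macro variable names yr, dy, mon and result are assumptions made for illustration.

```sas
data _null_;
  some_date = "01JAN2000"d;
  /* Store the numeric date value (14610), not the formatted text. */
  call symputx('macro_input_date', some_date);
run;

/* %sysfunc lets macro code call the ordinary date functions. */
%let yr  = %sysfunc(year(&macro_input_date));
%let dy  = %sysfunc(day(&macro_input_date));
%let mon = %sysfunc(month(&macro_input_date));

/* year * day - mod(month, 3), done with integer macro arithmetic */
%let result = %eval(&yr * &dy - %sysfunc(mod(&mon, 3)));
%put result=&result;
```

For 01JAN2000 this works out to 2000 * 1 - 1, so the log line should read result=1999.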
Within proc IML of the SAS system from within a user-defined module I want to be able to access the data in a vector by taking as an argument the name of the vector.
For example in the code below the module called "test" builds the two strings "x_vec1" and "x_vec2" but does not print the contents of these two vectors, but rather prints their names (I think submit blocks produce the same results).
This concept is very easily accomplished using macros by calling IML within macros, but I want to do it purely in IML to keep the code "clean". Whilst IML is brilliant for statistical work I need more than this - I need to keep my code short by having an analogous concept as the macro variable which can be resolved within a function at run time.
proc iml;
start test(ep);
str = concat("x_",ep);
print str;
finish;
x_vec1 = J(10,1,34);
x_vec2 = J(10,1,67);
run test("vec1");
run test("vec2");
quit;
The short answer to your question is that you need the VALUE function.
proc iml;
x_vec1 = J(10,1,34);
str = cats('x_','vec1');
_temp = value(str);
print(_temp);
quit;
However, it doesn't work in a subroutine quite like you programmed it. That's because the subroutine needs to know that it's allowed to access the variable x_vec1 as a global variable - which you can't actually do here since you're specifying it on the fly.
You can get around this by PUSHing the command back to the calling environment, however.
start test(ep);
str = cats('print(x_',ep,');');
print(str);
call push(str);
finish;
Not sure if this gives you the flexibility you need; if you are trying to do something more complex you may want to clarify what the final result needs to be.
Using Joe's answer above, this new example accesses 2 vectors called "x_vec1" and "x_vec2" and adds a user defined amount to each element. Thus I think this approach can be used to create re-usable code that takes vector/matrix names as arguments.
proc iml;
start test(ep,n);
str1 = cats('print(x_',ep,');');
print str1;
str2 = cats('y_',ep,'= x_',ep,'+',char(n),';');
print str2;
str3 = cats('print(y_',ep,');');
print str3;
call queue(str1,str2,str3);
finish;
x_vec1 = J(10,1,34);
x_vec2 = J(10,1,67);
run test("vec1",10);
run test("vec2",100);
quit;
I’m pretty new with do loops in SAS and I know that I am trying to make this loop work like a MATLAB script. I haven’t found many helpful tips online as most of the do-loop examples are just for calculations, not actually checking to see if the row before the current one has the same value.
Here is my issue that I need to solve:
I want to look at each policy number below and see if the one before is the same; if it is, I want to flag it.
Policy
26X0118907
26X0375309
26X0375309
26X0527509
I would consider i=1 to be the first policy(26X0118907) and i=2 to be the second policy (26X0375309).
In this case according to the code (that doesn't work) below this increment would be flagged as ‘B’. Do you know how to properly code a situation like this?
data AF_Inforce_&thestate.;
set AF_Inforce_&thestate.;
by Rating_St;
if first.Rating_St then counter=0;
counter+1;
myloop:
do i=2 to counter;
P2(i)=Policy(i);
P1(i)=Policy(i-1);
if P1(i)=P2(i) then flag='A';
else flag='B';
end;
return;
run;
The first thing you need to learn coming from MATLAB or a similar language is that SAS is different. In particular, the DATA step is its own DO loop, looping over records.
Second, it's a bit complicated to access data across rows. However, there are a few tricks.
Vasja showed you one (lag, which doesn't actually go to a previous record, but sort of acts like it does). dif does the same thing except that it returns the difference from the lagged value, so if your policy number had been numeric, Vasja's code could be rewritten as dif(policy)=0 instead of policy=lag(policy) (though this works only for numerics).
A better trick in my opinion in your case is to use by group processing. Normally this works with sorted fields, but here it doesn't matter if it's sorted: you just want to know if two consecutive rows are identical, right?
data want;
set have;
by rating_st policy notsorted;
if first.policy and last.policy then recflag='A';
else if first.rating_st then recflag='A';
else recflag='B';
run;
I don't know that I understand your rules entirely, but they're probably going to be some form of this. I put the two possibilities there, you might just want the second one (ie, you don't care if it's singular or just the first). The first would flag only singular policies.
Try looking at LAG function (it "remembers" the values of a variable in a queue)
Your code should go like this:
data AF_Inforce_&thestate.;
set AF_Inforce_&thestate.;
by Rating_St;
if first.Rating_St = 0 and Policy=LAG(Policy) then flag='A';
else flag='B';
run;
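One caution with LAG: it only updates its queue when it actually executes, so a common defensive pattern is to call it unconditionally on every row and test the stored result afterwards. The sketch below does that with the four policies from the question. The dataset names have and want, the helper variable prev_policy, and the Rating_St value TX are assumptions, and whether the original condition could ever skip the LAG call depends on how SAS evaluates the expression, which is not verified here.

```sas
/* Sample rows from the question; the state value TX is made up. */
data have;
  input Rating_St $ Policy :$10.;
  datalines;
TX 26X0118907
TX 26X0375309
TX 26X0375309
TX 26X0527509
;
run;

data want;
  set have;
  by Rating_St;
  prev_policy = lag(Policy);     /* executed on every row */
  if not first.Rating_St and Policy = prev_policy then flag = 'A';
  else flag = 'B';
  drop prev_policy;
run;
```

With these rows only the third one repeats the policy directly above it, so it is the only row flagged 'A'.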
totalSUPPLY= sum(of supply1-supply485);
I've got this simple calculation to make (in SAS) from a table that I've transposed (hence the variable names). I have to do this several times, and the number of supply variables is not the same for each calculation. I.e. in the above example it's 485, but I do it later in my analysis and it's 350.
My question: Is there a way to 'wildcard' the number of 'supply' columns? Basically, I want something like this (but this doesn't work): totalSUPPLY= sum(of supply1-supply%);
Also: If there is an easier way to do the same I'm open to (and would actually prefer) that.
Thanks everyone!
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
array supplies{*} supply:;
totalSUPPLY = sum(of supplies{*});
run;
N.B. using a : wildcard like this will only pick up matching variables that are present in the PDV at the point when you create the array, so the array definition has to come after the set statement. Also, it only works for variables with a common prefix, not those with a common suffix.
As Joe has pointed out, the following more concise code also works:
data yoursummary;
set yourdata; /*dataset containing supply1-supply485*/
totalSUPPLY = sum(of supply:);
run;
Of course, if you declare an array it's then easier to do related things like checking how many variables are being added together, or looping through the variables in the array and applying the same logic to each one in turn.
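A sketch of those related uses, assuming every variable with the supply prefix is numeric; the variable n_supply and the choice to replace missing values with 0 are illustrative only.

```sas
data yoursummary;
  set yourdata;                      /* dataset containing supply1-supplyN */
  array supplies{*} supply:;         /* must come after the SET statement */
  n_supply = dim(supplies);          /* how many variables were matched */
  do i = 1 to dim(supplies);
    /* apply the same logic to each variable in turn */
    if missing(supplies{i}) then supplies{i} = 0;
  end;
  totalSUPPLY = sum(of supplies{*});
  drop i;
run;
```

Since dim() is evaluated against whatever the wildcard matched, the same step works unchanged whether there are 485 supply columns or 350.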