SAS Data integration studio - sas

Long time reader first-time questioner.
Using SAS Data Integration studio, when you create a summary transformation in the table options advanced tab you can add a where statement to your code automatically. Unfortunately, it adds some code that makes this resolve incorrectly. Putting the following in the where text box:
TESTFIELD = "TESTVALUE"
creates
%let _INPUT_options = %nrquote(WHERE = %(TESTFIELD = %"TESTVALUE%"%));
In the code, used
proc tabulate data = &_INPUT (&_INPUT_options)
But resolves to
WHERE = (TESTFIELD = "TESTVALUE")
_
22
ERROR: Syntax error while parsing WHERE clause. ERROR 22-322: Syntax
error, expecting one of the following: a name, a quoted string, a
numeric constant, a datetime constant,
a missing value, (, *, +, -, :, INPUT, NOT, PUT, ^, ~.
My question is this: Is there a way to add a function to the where statement box that would allow this quotation mark to be properly added here?
Note that all functions get the preceding % when added to the where statement automatically and I have no control over that. This seems like something that should be relatively easy to fix but I haven't found a simple way yet.

The % are simply escaping the " and () characters; they're perfectly harmless, really. The bigger problem is the %NRQUOTE "quotes" (which are nonprinting characters that tell SAS this is macro-quoted); they mess up the WHERE processing.
Use %UNQUOTE( ... ) to remove these.
Example:
data have;
testfield="TESTVALUE";
output;
testfield="AMBASDF";
output;
run;
%let _INPUT_options = %nrquote(WHERE = %(TESTFIELD = %"TESTVALUE%"%));
%put &=_input_options;
data want;
set have(%unquote(&_INPUT_options.));
run;

Thank you all for your responses. Long story short, I ended up creating a SAS Troubleshooting ticket. The analyst told me that they have now documented the issue, which should now be resolved in a future iteration of DI.
The temporary solution was to create a new transformation with a slight alteration: wrapping the input options in %UNQUOTE (as mentioned above by Joe) in the source code:
proc tabulate data = &_INPUT (%unquote(&_INPUT_options)) %unquote(&procOptions);
For those interested you will need to create the transformation in a public subfolder of your project so others can use it. Not what I was hoping for, but a workable solution while waiting for the version update.

Related

SAS Apparent symbolic reference when there is an ampersand in data

This appears to be an old problem but I haven't seen an answer that fully addresses it. Totally possible I just missed it.
I'm consuming data that has a text field called fullDescription that contains a string like (made up but fits the pattern):
"00001234456 Wells Fargo DR FM AT&T PYMT 00987600"
I'm attempting to parse the data and dig out tidbits like "Wells Fargo" and "AT&T". However, when I manipulate "AT&T" SAS tries to read it as "AT" then the variable value for T. It stopped erroring (but still warns) when I instituted this line:
%LET description = %SYSFUNC(COMPRESS(%BQUOTE(&&fullDescription&row),'',P));
This, at least, returns "00001234456 Wells Fargo DR FM ATT PYMT 00987600" (missing ampersand) but still throws:
WARNING: Apparent symbolic reference T not resolved
I haven't figured out a way to prevent the warning. Is there a way to leave the ampersand in but not treat it as a variable? If that's not possible, can I cleanse it once and not get the error?
You are almost there. Just wrap the value in %nrstr() when you define the macro variable fullDescription.
data _null_;
call symputx('fullDescription','%nrstr(00001234456 Wells Fargo DR FM AT&T PYMT 00987600)');
run;
%LET description = %SYSFUNC(COMPRESS(%bquote(&fullDescription),'',P));
%put &=description;
This no longer produces the warning.
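If the value is already stored without %nrstr(), another option is to read it back with %SUPERQ, which returns the macro variable's value without attempting to resolve any & or % inside it. This is only a sketch: it assumes the value arrives through CALL SYMPUTX from a single-quoted string (or a data set variable), as in the example above.

```sas
/* Store the raw text; single quotes keep the macro processor away from AT&T */
data _null_;
  call symputx('fullDescription', '00001234456 Wells Fargo DR FM AT&T PYMT 00987600');
run;

/* %SUPERQ takes the macro variable NAME (no ampersand) and masks its value */
%let description = %superq(fullDescription);
%put &=description;
```

The ampersand stays in the value, and the log should show no "Apparent symbolic reference" warning because the value is never rescanned.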

How to choose indexed assignment variable dynamically in SAS?

I am trying to build a custom transformation in SAS DI. This transformation will "act" on columns in an input data set, producing the desired output. For simplicity let's assume the transformation will use input_col1 to compute output_col1, input_col2 to compute output_col2, and so on up to some specified number of columns to act on (let's say 2).
In the Code Options section of the custom transformation users are able to specify (via prompts) the names of the columns to be acted on; for example, a user could specify that input_col1 should refer to the column named "order_datetime" in the input dataset, and either make a similar specification for input_col2 or else leave that prompt blank.
Here is the code I am using to generate the output for the custom transformation:
data cust_trans;
set &_INPUT0;
i=1;
do while(i<3);
call symputx('index',i);
result = myfunc("&&input_col&index");
output_col&index = result; /*what is proper syntax here?*/
i = i+1;
end;
run;
Here myfunc refers to a custom function I made using proc fcmp which works fine.
The custom transformation works fine if I do not try to take into account the variable number of input columns to act on (i.e. if I use "&&input_col&i" instead of "&&input_col&index" and just use the column result on the output table).
However, I'm having two issues with trying to make the approach more dynamic:
I get the following warning on the line containing
result = myfunc("&&input_col&index"):
WARNING: Apparent symbolic reference INDEX not resolved.
I do not know how to have the assignment to the desired output column happen dynamically; i.e., depending on the iteration of the do loop I'd like to assign the output value to the corresponding output column.
I feel confident that the solution to this must be well known amongst experts, but I cannot find anything explaining how to do this.
Any help is greatly appreciated!
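You can't use macro variables that depend on data step variables in this manner. Macro variable references are resolved when the step is compiled, before any row is read, so a CALL SYMPUTX inside the loop cannot supply &index to the same step at run time.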
So you either have to
%do i = 1 %to .. ;
which is fine if you're in a macro (it won't work outside of an actual macro), or you need to use an array.
data cust_trans;
set &_INPUT0;
array in[2] &input_col1 &input_col2; *or however you determine the input columns;
array output_col[2]; *automatically names the results;
do i = 1 to dim(in);
result = myfunc(in[i]); *You quote the input - I cannot see what your function is doing, but it is probably wrong to do so;
output_col[i] = result; *the array reference selects output_col1, output_col2, ...;
end;
run;
That's the way you'd normally do that. I don't know what myfunc does, and I also don't know why you quote "&&input_col&index." when you pass it to it, but that would be a strange way to operate unless you want the name of the input column as text (and don't want to know what data is in that variable). If you do, then pass vname(in[i]) which passes the name of the variable as a character.
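For the macro-loop option mentioned above, a minimal sketch could look like the following. It assumes the prompts create macro variables input_col1 and input_col2 that each hold a column name, and that myfunc takes the column's value. The macro name cust_trans_loop and its parameter n are hypothetical.

```sas
%macro cust_trans_loop(n=2);
  data cust_trans;
    set &_INPUT0;
    /* The %DO loop runs while the step is being built, so it writes */
    /* one assignment statement per column into the data step.       */
    %do i = 1 %to &n;
      output_col&i = myfunc(&&input_col&i);
    %end;
  run;
%mend cust_trans_loop;

%cust_trans_loop(n=2)
```

With n=2 this generates output_col1 = myfunc(&input_col1) and output_col2 = myfunc(&input_col2). A prompt left blank would produce an empty argument, so a real transformation would need to skip those columns.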

How do I retrieve numerical value of macro argument set in data step

I've gone in circles on this one for 1.5 hours, so I'm giving in and asking for help here. What I'm trying to do is dead simple but I cannot for the life of me find a link describing the process.
I have the following data step:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
%useful_macro(&macro_input_date);
run;
where a date value is passed to a macro function (I'm new to these). I'd like to use the numeric value of the date value - let's be wild and say I want to get the value of the year, multiply it by the day value, and subtract the remainder after dividing the month value by 3. I can't seem to get just the year value out of the input. I've tried various things such as
symget, both "naked" and prepended with "%", with arguments that represent all possible permutations of the following variants:
have a naked reference to the variable, e.g. macro_input_date
enclose in single quotes, e.g. 'macro_input_date'
enclose in double quotes, e.g. "macro_input_date"
prepend with the ampersand, e.g. &macro_input_date
direct call to %sysfunc(year(<argument as variously specified above>))
Can anyone tell me what I am missing?
Thanks!
Given that you asked about macro functions, I'll guess that your example date processing is just an example. Talking about macro functions in general, it's important to understand that a macro function will (generally) not be doing any processing of its own, it will just be generating some data step code to do some task. So, for something like your contrived example, the data step code would be something like:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = year(some_date) * day(some_date) - mod(month(some_date), 3);
run;
To macroise this, you don't need to transfer the data values to the macro, you just need to transfer the variable name:
%macro date_func(var=);
year(&var) * day(&var) - mod(month(&var), 3)
%mend;
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var=some_date);
run;
Note that the value of the var parameter here is the literal text some_date, not the value of the some_date data step variable. There are other ways to do it of course - you could actually pass this macro a date literal and it would still work:
data out;
set in; * Assume this contains a numeric called 'some_date';
result = %date_func(var="21apr2017"d);
run;
so it all depends on exactly what you're trying to do... maybe you want to assign the result to another macro variable, so it doesn't need to be part of a data step at all, in which case you could do a similar thing with %sysfunc functions etc.
If you're just trying to get the year, you would do something like:
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
yearval = substr(symget('macro_input_date'),6,4);
put yearval=;
run;
Your macro value (&macro_input_date) is not the actual date value (14610) but is the text 01JAN2000. So you cannot use the year function (unless you INPUT it back), you would use substr to grab the year part.
Of course, this is all sort of pointless as going to/from macro variable doesn't really accomplish much here.
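If you would rather use the YEAR function than SUBSTR, the text can be read back into a date with INPUT, as mentioned above. A small sketch under the same setup (the variable name yearval is reused from the example):

```sas
data _null_;
  some_date = "01JAN2000"D;
  call symput('macro_input_date', left(put(some_date, date9.)));
  /* SYMGET fetches the text 01JAN2000 at run time; INPUT turns it back into a date value */
  yearval = year(input(symget('macro_input_date'), date9.));
  put yearval=;   /* log: yearval=2000 */
run;
```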
Are you just having trouble with date literals? Your data step code
data _null_;
some_date = "01JAN2000"D;
call symput('macro_input_date',left(put(some_date,date9.)));
run;
is just going to do the same thing as
%let macro_input_date=01JAN2000 ;
Now if you want to treat that string of characters as if it represents a date then you need to either wrap it up as a date literal
"&macro_input_date"d
Or convert it.
%sysfunc(inputn(&macro_input_date,date9))
Why not just store the actual date value into the macro variable?
call symputx('macro_input_date',some_date);
Then it wouldn't look like a date to you but it would look like a date to the YEAR() function.
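With the numeric value stored, the calculation described in the question can be done entirely in macro code through %SYSFUNC. This is a sketch only; the helper macro variables yr, dy, rem and result are names chosen for illustration.

```sas
data _null_;
  some_date = "01JAN2000"d;
  call symputx('macro_input_date', some_date);   /* stores the number 14610 */
run;

/* Each %SYSFUNC call applies a data step function to the stored date value */
%let yr  = %sysfunc(year(&macro_input_date));
%let dy  = %sysfunc(day(&macro_input_date));
%let rem = %sysfunc(mod(%sysfunc(month(&macro_input_date)), 3));

/* year * day - mod(month, 3), using integer arithmetic */
%let result = %eval(&yr * &dy - &rem);
%put &=result;   /* log: RESULT=1999 */
```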

SAS compare character records according to 2 different variable groupings

I am trying to find records which are not grouped similarly according to 2 different variables (all variables have character format).
My variables are appln_id (unique), earliest_filing_id (groupings) and docdb_family_id (groupings). The data set comprises around 25,000 different appln_id, but only 15446 different earliest_filing_id and 15755 different docdb_family_id. So there is a difference of ca. 300 records between these 2 groupings (potentially more, because groupings might also change).
Now what I would like to do is to see all cases which are not similarly grouped. Here is an example:
appln_id earliest_filing_id docdb_family_id
10137202 10137202 30449399
10272131 10137202 30449399
10272153 10137202 !!25768424!!
You can see that the last case differs and should be on my list that I hope to create.
I was trying to solve it with either a Proc compare, a Call sortc or a by+if...then coding but failed so far to come up with a good solution.
I am not using SAS for that long yet...
Your help is super appreciated!
Grazie
Annina
Sounds like you want to use BY group processing to assign a new group variable.
Make sure your data is sorted and then run something like this to create a new GROUPID variable.
data want ;
set have ;
by EARLIEST_FILING_ID DOCDB_FAMILY_ID ;
groupid + first.docdb_family_id ;
run;
If my understanding is correct, you want to select unique docdb_family_id. Try this:
proc sql;
select * from yourfile group by docdb_family_id having count(*)=1;
quit;
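Another way to read the question is that a record is "not similarly grouped" when its earliest_filing_id maps to more than one docdb_family_id. Under that assumption, a sketch that lists every record in such a group could be the following. The table name have is taken from the first answer and the output name mismatched is hypothetical.

```sas
proc sql;
  create table mismatched as
  select *
  from have
  where earliest_filing_id in
        /* filing groups that contain more than one distinct family id */
        (select earliest_filing_id
         from have
         group by earliest_filing_id
         having count(distinct docdb_family_id) > 1)
  order by earliest_filing_id, docdb_family_id;
quit;
```

For the three example rows this would return all three, since filing group 10137202 contains both 30449399 and 25768424. Swapping the two variable names gives the check in the other direction.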

SAS new variable name using macro variable

I am trying to create a new variable based on the value of a macro variable. However, SAS highlights 'vari' as red, seemingly indicating that I am doing something wrong. The statement still seems to get executed correctly though. Any thoughts?
%let i=7;
data d1;
set d1;
vari&i=7;
run;
The SAS syntax highlighter is an aid, but there are many situations where it is not "correct". Particularly for the macro language, it can't always guess how symbols will resolve. It doesn't have all the information (or intelligence) that the SAS word scanner/tokenizer has. I use syntax highlighting as a hint that something might be wrong, but I ignore it when I've checked the code and confirmed it is correct.
The code in your example is fine.
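If you want to confirm what the macro processor produces regardless of the highlighting, one simple check is to echo the resolved name to the log with %PUT. A minimal sketch:

```sas
%let i = 7;
/* The macro processor resolves &i before the text is used anywhere */
%put Generated name: vari&i;   /* log: Generated name: vari7 */
```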