SAS Macro quoting issues

I am trying to perform operations on binary data saved into a macro variable. The datastep below successfully saves the data into the macro variable without any issues:
data _null_;
infile datalines truncover ;
attrib x length=$300 informat=$300. format=$300.;
input x $300.;
put x=;
call symput ('str',cats(x));
datalines4;
‰PNG > IHDR ) ) ëŠZ sRGB ®Î=é gAMA ±^üa pHYs ;à ;ÃÇo¨d ZIDAT8OåŒ[ À½ÿ¥Ó¼”Ö5Dˆ_v#aw|+¸AnŠ‡;6<ÞóRÆÒÈeFõU/'“#f™Ù÷&É|&t"<ß}4¯à6†Ë-Œ_È(%<É'™èNß%)˜Î{- IEND®B`‚
;;;;
run;
When I try and use the contents of the macro variable in any way, the combinations of reserved characters are making it impossible to work with. The following reserved characters are in the value, and are not matched:
&%'"()
I've tried every combination of macro quoting functions I can think of and I can't even get the value to print using a %put():
%put %nrbquote(&str);
Results in:
SYMBOLGEN: Macro variable STR resolves to ‰PNG > IHDR ) ) ëŠZ sRGB ®Î=é gAMA
±^üa pHYs ;à ;ÃÇo¨d ZIDAT8OåŒ[
À½ÿ¥Ó¼”Ö5Dˆ_v#aw|+¸AnŠ‡;6<ÞóRÆÒÈeFõU/'“#f™Ù÷&É|&t"<ß}4¯à6†Ë-Œ_È(%<É'™èNß%)˜Î{-
IEND®B`‚
ERROR: The value É is not a valid SAS name.
ERROR: The SAS Macro Facility has encountered an I/O error. Canceling submitted statements.
NOTE: The SAS System stopped processing due to receiving a CANCEL request.
Ultimately, what I'd like to do is convert these values to a base64 encoding using the following statement (I've pre-calculated the length of the base64 format for ease-of-debugging):
%let base64_string = %sysfunc(putc(%nrbquote(&str),$base64x244.));

You can use %SUPERQ() to quote a macro variable without having to first expand it. Note that it takes the name of macro variable and not the value as its argument.
%let base64_string = %sysfunc(putc(%superq(str),$base64x244.));
But why not just do the transformation in a DATA STEP and avoid the macro quoting issues?
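A minimal sketch of the DATA step route suggested above, assuming the value fits in a 300-byte character variable. The variable name b64, its length of 400, and the sample text assigned to x are illustrative choices, not taken from the question. Because the encoding happens before anything reaches the macro processor, the reserved characters never need macro quoting, and the resulting base64 text contains only letters, digits, +, / and =.

```sas
data _null_;
  length x $300 b64 $400;
  /* Sample value containing unmatched macro trigger characters.    */
  /* Inside single quotes the macro processor leaves & and % alone. */
  x = 'text with &%''"() in it';
  /* TRIM() keeps trailing blanks of x out of the encoded value.    */
  b64 = put(trim(x), $base64x400.);
  /* CALL SYMPUTX strips the trailing blanks of b64.                */
  call symputx('base64_string', b64);
run;

%put &=base64_string;
```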

Related

Can I nest %sysfunc-functions or achieve similar results?

I have the following strings that are used (in different variations) as variable names:
Data variables;
input variable :$32.;
datalines;
Exkl_UtgUtl_Flyg
Exkl_UtgUtl_Tag
Exkl_UtgUtl_Farja
Exkl_UtgUtl_Hyrbil
Exkl_UtgUtl_Bo
Exkl_UtgUtl_Aktiv
Exkl_UtgUtl_Annat
;
run;
In order to reference related variables I need to turn variables of the type "Exkl_UtgUtl_Flyg" into variables of the type "UtgUtl_FlygSSEK_Pers" and "UtgUtl_FlygSSEK_PPmedel". I try to do this in the following macro, along with other manipulations:
%macro imputera_saknad_utgift(variabel);
DATA IBIS3_5;
SET IBIS3_5;
if &variabel=1 and %sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_Pers))=. then
%sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_Pers))=%sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_PPmedel));
RUN;
%mend imputera_saknad_utgift;
The documentation stated that %sysfunc can't be nested, but mentioned something about alternating %sysfunc- and %qsysfunc-functions, so I tried that. I then try to execute the code:
data _null_;
set variabler2;
call execute(cats('%imputera_saknad_utgift(',utgifter_inte_missing,')'));
run;
This does not seem to work however. The cats-function seems to have worked, but not the nested TRANWRD-function:
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
cpu time 0.12 seconds
5 + DATA IBIS3_5; SET IBIS3_5; if Exkl_UtgUtl_Bo=1 and Exkl_UtgUtl_BoSSEK_Pers=. then
Exkl_UtgUtl_BoSSEK_Pers=Exkl_UtgUtl_BoSSEK_PPmedel;
How do I make this work? The output should look something like:
DATA IBIS3_5; SET IBIS3_5; if Exkl_UtgUtl_Bo=1 and UtgUtl_BoSSEK_Pers=. then
UtgUtl_BoSSEK_Pers=UtgUtl_BoSSEK_PPmedel;
I don't think your macro variable values have quote characters in them, so this code is not going to work:
%qsysfunc(TRANWRD(&variabel,'Exkl_',''))
That call looks for the 7-character string 'Exkl_' (quotes included) and would replace it with the 2-character string '' (two quote characters next to each other).
You probably meant to search for Exkl_ instead. You probably also do not want to use %QSYSFUNC() here since that will preserve the space that TRANWRD() will insert. You could use %SYSFUNC() to avoid having that leading space as part of the value. Or perhaps use the TRANSTRN() function instead since that function, unlike TRANWRD(), can translate to an empty string instead of a single space.
Example:
439 %let variable=Exkl_UtgUtl_Flyg ;
440 %put %qsysfunc(TRANWRD(&variable,'Exkl_','')) ;
Exkl_UtgUtl_Flyg
441 %put %qsysfunc(TRANWRD(&variable,Exkl_,)) ;
UtgUtl_Flyg
442 %put %sysfunc(TRANWRD(&variable,Exkl_,)) ;
UtgUtl_Flyg
443 %put %qsysfunc(TRANSTRN(&variable,Exkl_,)) ;
UtgUtl_Flyg
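Putting that advice back into the original macro might look like the sketch below. The helper macro variable bas is a name introduced here for illustration. The unquoted TRANWRD() call is the same form shown in the log above, and storing the result with %let drops any leading blank, so the prefix only has to be computed once.

```sas
%macro imputera_saknad_utgift(variabel);
  %local bas;
  /* Strip the Exkl_ prefix once; no quote characters around the arguments. */
  %let bas = %sysfunc(tranwrd(&variabel,Exkl_,));
  data IBIS3_5;
    set IBIS3_5;
    /* &bas.SSEK_Pers resolves to e.g. UtgUtl_BoSSEK_Pers */
    if &variabel = 1 and &bas.SSEK_Pers = . then
      &bas.SSEK_Pers = &bas.SSEK_PPmedel;
  run;
%mend imputera_saknad_utgift;
```

Called as %imputera_saknad_utgift(Exkl_UtgUtl_Bo), this should generate the DATA step the question lists as the desired output.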

SAS - append string macro variable to data set name

I'm trying to append a string macro variable to a data set name in SAS. I want to create datasets that read something like work.cps2020jan and work.cps2020feb. But that's not what I am getting. My code:
%macro loop(values);
%let count=%sysfunc(countw(&values));
%do i = 1 %to &count;
%let value=%qscan(&values,&i,%str(,));
%put &value;
data work.cps2020&value.;
set "A:\cpsb2020&value" ;
mth = "&value.";
keep
PEMLR
mth
;
run;
%end;
%mend;
%loop(%str(jan,feb));
Running this code results in the following output in the log:
NOTE: There were 138697 observations read from the data set
A:\cpsb2020jan.
NOTE: The data set WORK.CPS2020 has 138697 observations and 2 variables.
NOTE: The data set WORK.JAN has 138697 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 4.29 seconds
cpu time 0.20 seconds
feb
NOTE: There were 139248 observations read from the data set
A:\cpsb2020feb.
NOTE: The data set WORK.CPS2020 has 139248 observations and 2 variables.
NOTE: The data set WORK.FEB has 139248 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 4.44 seconds
cpu time 0.15 seconds
I don't understand why my macro creates two datasets per loop instead of one dataset per loop called work.cps2020jan and work.cps2020feb. If I change &value. to &i. SAS outputs work.cps20201 and work.cps20202. But that's not what I want.
Any insights?
The %QSCAN macro function will mask its result with special invisible (non-printable) characters only visible to the macro processor system.
What happened is that
data work.cps2020&value.;
was seen as
data work.cps2020<mask-character><non-masked part of symbol value><mask-character>;
during executor processing, which treated the non-printable mask character as a non-syntax token separator, resulting in a DATA statement listing two output tables.
data work.cps2020 jan;
The positions of mask characters in a macro variable can be observed (in the LOG) using %put _user_, or the actual symbol contents can be captured from a metadata view such as SASHELP.VMACRO or DICTIONARY.MACROS.
Let's simplify your macro and add some logging and symbol capture:
%macro loop(values);
%local count i;
%let count=%sysfunc(countw(&values));
%do i = 1 %to &count;
%let value=%qscan(&values,&i,%str(,));
%put _user_; %*--- log them masks;
data x&i; %* --- symbol capture;
set sashelp.vmacro;
where name like '%VALUE%';
value_hex = put (value,$HEX40.);
run;
%* --- do the step that creates two tables;
data work.cps2020&value.;
set sashelp.class;
run;
%end;
%mend;
options nomprint nosymbolgen nomlogic;
%loop(%str(jan,feb));
proc print data=x1 noobs style(data)=[fontsize=14pt fontfamily="Courier"];
var value:;
run;
[Image: LOG snippet. The little boxes are the special invisible masking characters, shown as an image capture because Stack Overflow / HTML won't show non-printable characters.]
[Image: the same LOG text, copied and pasted into Notepad2, shows the mask characters as control characters.]
The Proc PRINT of the captured macro symbol data will expose the hexadecimal masking characters
06 macro %quote start
08 macro %quote end
01 macro %str start
02 macro %str end
1E masked version of comma
&value is returned as quoted by %qscan(). Use %scan() instead. Quoted macro variables can sometimes cause issues on resolution when they're used in this way. It's best to only quote them when needed, such as in a %put statement that has a % sign within it.
You don't need %qscan(). If the value contained any characters that need macro quoting then they would be invalid for use in a member name anyway. So use %scan() instead.
But when used inside of a macro the tokenizer will sometimes mistakenly see things like xxx&mvar as two tokens even when there are no special characters in &mvar. You can group the value you are generating to work around that.
For example by making a new macro variable
%let dsn=cps2020&value.;
data work.&dsn. ;
Or use the %unquote() function:
data %unquote(work.cps2020&value.);
Or use a name literal:
data work."cps2020&value."n;
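For reference, a sketch of the whole loop with %scan() swapped in for %qscan(), which is the first fix suggested above. SASHELP.CLASS stands in for the asker's A:\ files, as in the earlier answer, so the macro can be run anywhere; the KEEP statement is omitted for the same reason.

```sas
%macro loop(values);
  %local count i value;
  %let count = %sysfunc(countw(&values));
  %do i = 1 %to &count;
    /* %scan() returns an unquoted value, so no mask characters */
    /* end up inside the data set name.                         */
    %let value = %scan(&values,&i,%str(,));
    data work.cps2020&value.;
      set sashelp.class;
      mth = "&value.";
    run;
  %end;
%mend loop;

%loop(%str(jan,feb))
```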

Using %PUT to correctly format the dynamic file name

I have a SAS script that reads in a CSV file and stores it in a SAS data set:
LIBNAME IN '\\path\Data';
FILENAME CSV '\\path\Data\DT.csv';
DATA IN.DT;
INFILE CSV DLM=',' DSD FIRSTOBS=1;
INPUT KEY VALUE1 VALUE2;
RUN;
I want to change it such that instead of expecting the input to be named DT.csv, it would accept an input named DT-2016-03-03-TEST.csv, or whatever the current date is. In other words, I need to use a dynamic value in my FILENAME statement.
Here is what I have so far:
%LET curday = %SYSFUNC(day("&sysdate"d));
%LET curmonth = %SYSFUNC(month("&sysdate"d));
%LET curyear = %SYSFUNC(year("&sysdate"d));
%PUT %SYSFUNC(PUTN(&curday, z2.));
FILENAME CSV "\\path\Data\DT-&curyear-&curmonth-&curday-TEST.csv";
But the string it generates is like Data\DT-2016-3-3-TEST.csv rather than Data\DT-2016-03-03-TEST.csv.
In other words, the leading zeros are not there. What am I doing incorrectly?
You'll need to use either a macro variable or a big group of macro functions (whichever you'd like). We'll go with creating macro variables for readability purposes. Based upon what you've said, we know a few things about the pattern:
1. It starts with DT-
2. It has today's date in a yyyy-mm-dd format
3. It ends in .csv
Two of these are static values, and one needs to be dynamic in a specific format. Let's get crackin'.
Start off by storing the path in its own macro variable. This makes the code more generalizable to other applications (i.e. you can copy/paste old code for new programs! It's good to be lazy in the programming world).
%let path = \\path\data;
Next, let's build our dynamic pattern using a %let statement. We know it starts with DT-:
%let file = DT-
We can now cross #1 off the list! Let's knock out #2.
Two functions will help us get this in the order that we want:
%sysfunc()
today()
We'll encapsulate today() with %sysfunc(). %sysfunc() lets us run most non-macro-language SAS functions, and also has the added benefit of returning the value in a format that you desire using an additional argument. This is really helpful for us here.
So, let's grab today's date as a numeric SAS date, then convert it to yymmddx10 format, where x is some delimiter keyword. We'll use yymmddd10. - that is, a format that specifies yyyy-mm-dd. The extra d means dash.
%let file = DT-%sysfunc(today(), yymmddd10.)
2 is now out of the way. Hard part's over! All we need to do is append .csv to it, and we'll be all set.
%let file = DT-%sysfunc(today(), yymmddd10.).csv;
You can confirm the macro variable file's value with a %put statement:
%put NOTE: This is my filename: &file;
You should see in green text in the log NOTE: This is my filename: DT-2016-03-03.csv
Now, we'll just put it all together:
%let path = \\path\data;
%let file = DT-%sysfunc(today(), yymmddd10.).csv;
libname IN "&path";
filename CSV "&path\&file";
data in.DT;
infile csv dlm=',' dsd firstobs=1;
input key value1 value2;
run;
You've now got a dynamic way to read in these CSVs, and you can adapt this code elsewhere. Awesomesauce. I think you've earned yourself a celebratory coffee, and maybe a biscotti or two; don't go too crazy.
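The question's file name also carries a -TEST suffix before the extension (DT-2016-03-03-TEST.csv). A small variation on the same pattern, sketched here under the assumption that the suffix is fixed text, would be:

```sas
%let path = \\path\Data;
/* Same date pattern as above, with the literal -TEST suffix appended. */
%let file = DT-%sysfunc(today(), yymmddd10.)-TEST.csv;

%put NOTE: This is my filename: &file;

filename CSV "&path\&file";
```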
Stu's answer is absolutely correct. Here is the tl;dr version.
%put echoes text to the log. All you are doing is "putting" the result of %SYSFUNC(PUTN(&curday, z2.)) to the log. You are not updating the value in &curday.
Try
%LET curday = %SYSFUNC(PUTN(&curday, z2.));
Do the same for curmonth.
Take the time and read Stu's answer.
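If you prefer to keep the three separate macro variables, one way to apply the fix is to let %sysfunc() do the formatting through its optional second argument. This is a sketch along the lines of the original code. Note that &sysdate holds the date the SAS session started, which can differ from today's date in a long-running session.

```sas
/* The second argument of %sysfunc() formats the returned value, */
/* so day and month come back zero-padded to two digits.         */
%let curday   = %sysfunc(day("&sysdate"d), z2.);
%let curmonth = %sysfunc(month("&sysdate"d), z2.);
%let curyear  = %sysfunc(year("&sysdate"d));

%put NOTE: &curyear-&curmonth-&curday;

filename CSV "\\path\Data\DT-&curyear-&curmonth-&curday-TEST.csv";
```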

file statement in data step to export comma delimited text file

Problem: suppose I do not know the variable names or the number of variables, or imagine there are too many variables to list in the PUT statement.
In the following case I know there are 3 variables:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put region mtg sendmail;
run;
I tried using put _all_;
And the output is:
region=N mtg=24NOV1999 sendmail=10OCT1999 _ERROR_=0 _N_=1
region=S mtg=28DEC1999 sendmail=13NOV1999 _ERROR_=0 _N_=2
region=E mtg=03DEC1999 sendmail=19OCT1999 _ERROR_=0 _N_=3
region=W mtg=04OCT1999 sendmail=20AUG1999 _ERROR_=0 _N_=4
This gives named output rather than comma-delimited output.
My desired output would be
N,24NOV1999,10OCT1999
S,28DEC1999,13NOV1999
E,03DEC1999,19OCT1999
W,04OCT1999,20AUG1999
This is the right one:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (~);
run;
This should help you.
The reason you have so many answers that seem to work, but have different characters, is that the important thing is changing _all_ to (_all_). The arguments after that are not important.
Explained in some detail here, you actually have two entirely different things going on when you write
put _all_;
and
put (_all_) (:);
Programmers familiar with the concept of an overloaded function will find that as the simplest way to think of this. If put sees _all_, it calls one version of put. If it sees (_all_) (or any list of variables with ( ) around it), it calls another (expanding _all_ to its variable list). Notice that if you try
put (_all_);
It fails, and it fails with errors suggesting it is trying to call formatted input (ie, it asks you why you don't have another ( there, which would be the normal thing in formatted input after a list with ( ).)
By itself, _all_ is an argument to put that specifically tells it to use named output to output all variables in the dataset. Hence the variable=value format of the output. So in the first example, _all_ is a constant - an argument - nothing more.
In the second example, though, (_all_) is a variable list, which contains all variables as if they were typed in, space delimited. So
put (_all_) (:);
is equivalent to
put (name sex age height weight) (:);
if used with SASHELP.CLASS. Adding anything - a colon, a tilde, an ampersand, etc. - that is legal in the context of formatted output will cause that to be used.
Note that
put _all_ #;
Does not cause that to happen - apparently # (or ## or / or //) are all legal arguments to put _all_.
Interestingly, _numeric_ and _character_ do not have an analogous shortcut - clearly this is an explicit, special case just for _all_. They cannot be used without parens. put _numeric_; gives an error that _numeric_ is not a legal variable name. But, put (_numeric_) (:); is perfectly legal.
Try the colon modifier option.
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (:);
run;
Another option is to read the names from the SASHELP.VCOLUMN table, create a macro variable that lists the columns and include that in your put statement.
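A rough sketch of that SASHELP.VCOLUMN approach, assuming the data set is WORK.MEETING as in the question. The macro variable name varlist is made up for this example, and the sample data is included only so the block runs on its own.

```sas
/* Sample data matching the question's desired output. */
data meeting;
  input region $ mtg $ sendmail $;
  datalines;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
;
run;

/* Collect the column names, in data set order, into one macro variable. */
/* LIBNAME and MEMNAME are stored in upper case in the view.             */
proc sql noprint;
  select name into :varlist separated by ' '
    from sashelp.vcolumn
    where libname = 'WORK' and memname = 'MEETING'
    order by varnum;
quit;

data _null_;
  set meeting;
  file 'C:\Users\Desktop\meeting2.txt' dsd dlm=',';  /* path from the question */
  put &varlist;
run;
```

This is more work than put (_all_) (:); but the generated list can be filtered or reordered in the query if only some columns are wanted.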
The documentation is a bit scarce:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000176623.htm
:
enables you to specify a format that the PUT statement uses to write the variable value. All leading and trailing blanks are deleted, and each value is followed by a single blank.
~
enables you to specify a format that the PUT statement uses to write the variable value. SAS displays the formatted value in quotation marks even if the formatted value does not contain the delimiter. SAS deletes all leading and trailing blanks, and each value is followed by a single blank. Missing values for character variables are written as a blank (" ") and, by default, missing values for numeric variables are written as a period (".").
It is easiest to just use a variable list followed by a format list. Syntax is:
(<variable list>) (<format list>)
The values in the format list are repeated until the variables in the variable list are exhausted. The format list can include format modifiers like :,&,~ or = and cursor movement commands like /, +n, or #n.
Also you should add the DSD option to your FILE statement so that missing values are properly represented in the CSV file as having nothing between the delimiters.
So your program reduces to:
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' DSD dlm=',';
put (_all_) (:) ;
run;
The problem you had with PUT _ALL_; is that when _ALL_ is used by itself it is treated differently than when it is part of a variable list inside of (). As a variable list it does not include system generated variables such as _N_ or FIRST. or LAST. variables generated by BY statements.
Note that if you want to use _ALL_ in a variable list and still get named output you can use the = format modifier in the format list.
put (_all_) (=) ;
No, I'm Spartacus!
data _null_;
set meeting;
file 'C:\Users\Desktop\meeting2.txt' dlm=',';
put (_all_) (&);
run;
data meeting;
input region $ mtg $ sendmail $;
cards;
N 24NOV1999 10OCT1999
S 28DEC1999 13NOV1999
E 03DEC1999 19OCT1999
W 04OCT1999 20AUG1999
;
run;
proc export data=meeting
outfile='c:\input\meeting.txt'
dbms=tab replace;
delimiter=',';
run;
Hope this is helpful, even for a large number of variables.

macro variable is uninitialized after %let statement in sas

I want to create something in SAS that works like an Excel lookup function. Basically, I set the values for macro variables var1, var2, ... and I want to find their index number according to the ref table. But I get the following messages in the data step.
NOTE: Variable A is uninitialized.
NOTE: Variable B is uninitialized.
NOTE: Variable NULL is uninitialized.
When I print the variables &num1,&num2, I get nothing. Here is my code.
data ref;
input index varname $;
datalines;
0 NULL
1 A
2 B
3 C
;
run;
%let var1=A;
%let var2=B;
%let var3=NULL;
data temp;
set ref;
if varname=&var1 then call symput('num1',trim(left(index)));
if varname=&var2 then call symput('num2',trim(left(index)));
if varname=&var3 then call symput('num3',trim(left(index)));
run;
%put &num1;
%put &num2;
%put &num3;
I can get the correct values for &num1,&num2,.. if I type varname='A' in the if-then statement. And if I subsequently change the statement back to varname=&var1, I can still get the required output. But why is it so? I don't want to input the actual string value and then change it back to macro variable to get the result every time.
Solution to immediate problem
You need to wrap your macro variables in double quotes if you want SAS to treat them as string constants. Otherwise, it will treat them the same way as any other random bits of text it finds in your data step.
Alternatively, you could re-define the macro vars to include the quotes.
As a further option, you could use the symget or resolve functions, but these are not usually needed unless you want to create a macro variable and use it again within the same data step. If you use them as a replacement for double quotes they tend to use a lot more CPU as they will evaluate the macro vars once per row by default - normally, macro vars are evaluated just once, at compile time, before your code executes.
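A sketch of the double-quote fix applied to the code from the question. CALL SYMPUTX is used here in place of call symput with trim(left()), since it strips blanks on its own; that swap is a convenience, not part of the fix.

```sas
data ref;
  input index varname $;
  datalines;
0 NULL
1 A
2 B
3 C
;
run;

%let var1 = A;
%let var2 = B;
%let var3 = NULL;

data _null_;
  set ref;
  /* "&var1" becomes the string constant "A"; a bare &var1 would */
  /* be read as a reference to a variable named A.               */
  if varname = "&var1" then call symputx('num1', index);
  if varname = "&var2" then call symputx('num2', index);
  if varname = "&var3" then call symputx('num3', index);
run;

%put &=num1 &=num2 &=num3;
```

With the sample data, num1, num2 and num3 should resolve to 1, 2 and 0. Macro variables created this way in open code persist for the rest of the session, which would explain why &num1 still printed a value after the literal 'A' was swapped back for &var1.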
A better approach?
For the sort of lookup you're doing, you actually don't need to use a dataset at all - you can instead define a custom format, which gives you much more flexibility in how you can use it. E.g. this creates a format called lookup:
proc format;
value lookup
1 = 'A'
2 = 'B'
3 = 'C'
other = '#N/A' /*Since this is what vlookup would do :) */
;
run;
Then you can use the format like so:
%let testvar = 1;
%let testvar_lookup = %sysfunc(putn(&testvar, lookup.));
Or in a data step:
data _null_;
var1 = 1;
format var1 lookup.;
put var1=;
run;