SAS - append string macro variable to data set name - sas

I'm trying to append a string macro variable to a data set name in SAS. I want to create datasets that read something like work.cps2020jan and work.cps2020feb. But that's not what I am getting. My code:
%macro loop(values);
%let count=%sysfunc(countw(&values));
%do i = 1 %to &count;
%let value=%qscan(&values,&i,%str(,));
%put &value;
data work.cps2020&value.;
set "A:\cpsb2020&value" ;
mth = "&value.";
keep
PEMLR
mth
;
run;
%end;
%mend;
%loop(%str(jan,feb));
Running this code results in the following output in the log:
NOTE: There were 138697 observations read from the data set
A:\cpsb2020jan.
NOTE: The data set WORK.CPS2020 has 138697 observations and 2 variables.
NOTE: The data set WORK.JAN has 138697 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 4.29 seconds
cpu time 0.20 seconds
feb
NOTE: There were 139248 observations read from the data set
A:\cpsb2020feb.
NOTE: The data set WORK.CPS2020 has 139248 observations and 2 variables.
NOTE: The data set WORK.FEB has 139248 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 4.44 seconds
cpu time 0.15 seconds
I don't understand why my macro creates two datasets per loop instead of one dataset per loop called work.cps2020jan and work.cps2020feb. If I change &value. to &i. SAS outputs work.cps20201 and work.cps20202. But that's not what I want.
Any insights?

The %QSCAN macro function will mask it's result with special invisible (non-printable) characters only visible to the macro processor system.
What happened is that
data work.cps2020&value.;
was seen as
data work.cps2020<mask-character><non-masked part of symbol value><mask-character>;
during executor processing, which treated the non-printable mask character as a non-syntax token separator, resulting in a DATA statement listing two output tables.
data work.cps2020 jan;
The positions of mask characters in a macro variable can be observed (in the LOG) using %put _user_, or, the actual symbol contents can be captured from a metadata view such as SASHELP.VMACRO or DICTIONARY.MACRO
Let's simplify your macro and add some logging and symbol capture
%macro loop(values);
%local count i;
%let count=%sysfunc(countw(&values));
%do i = 1 %to &count;
%let value=%qscan(&values,&i,%str(,));
%put _user_; %*--- log them masks;
data x&i; %* --- symbol capture;
set sashelp.vmacro;
where name like '%VALUE%';
value_hex = put (value,$HEX40.);
run;
%* --- do the step that creates two tables;
data work.cps2020&value.;
set sashelp.class;
run;
%end;
%mend;
options nomprint nosymbolgen nomlogic;
%loop(%str(jan,feb));
proc print data=x1 noobs style(data)=[fontsize=14pt fontfamily="Courier"];
var value:;
run;
LOG snippet, those little boxes are the special invisible masking characters (I am showing them in image captures because stack overflow / html won't show non-printable characters)
Same LOG text, copy and pasted into Notepad2 show the mask characters as control characters
The Proc PRINT of the captured macro symbol data will expose the hexadecimal masking characters
06 macro %quote start
08 macro %quote end
01 macro %str start
02 macro %str end
1E masked version of comma

&value is returned as quoted by %qscan(). Use %scan() instead. Quoted macro variables can sometimes cause issues on resolution when they're used in this way. It's best to only quote them when needed, such as in a %put statement that has a % sign within it.

You don't need %qscan(). If the value contained any characters that need macro quoting then they would be invalid for use in a member name anyway. So use %scan() instead.
But when used inside of a macro the tokenizer will sometimes mistakenly see things like xxx&mvar as two tokens even when there are no special characters in &mvar. You can group the value you are generating to work around that.
For example by making a new macro variable
%let dsn=cps2020&value.;
data work.&dsn. ;
Or use the %unquote() function:
data %unquote(work.cps2020&value.);
Or use a name literal:
data work."cps2020&value."n;

Related

Can I nest %sysfunc-functions or achieve similar results?

I have the following strings that are used (in different variations) as variable names:
Data variables;
input variable;
datalines;
Exkl_UtgUtl_Flyg
Exkl_UtgUtl_Tag
Exkl_UtgUtl_Farja
Exkl_UtgUtl_Hyrbil
Exkl_UtgUtl_Bo
Exkl_UtgUtl_Aktiv
Exkl_UtgUtl_Annat
;
run;
In order to reference related variables I need to turn variables of the type "Exkl_UtgUtl_Flyg" to variables of the type "UtgUtl_FlygSSEK_Pers" and "UtgUtl_FlygSSEK_PPmedel".I try to do this in the following macro, along with other manipulations:
%macro imputera_saknad_utgift(variabel);
DATA IBIS3_5;
SET IBIS3_5;
if &variabel=1 and %sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_Pers))=. then
%sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_Pers))=%sysfunc(cats(%qsysfunc(TRANWRD(&variabel,'Exkl_','')),SSEK_PPmedel));
RUN;
%mend imputera_saknad_utgift;
The documentation stated that %sysfunc can't be nested, but mentioned something about alternating
%sysfunc- and %qsysfunc-functions so I tried that. I then try to execute the code:
data _null_;
set variabler2;
call execute(cats('%imputera_saknad_utgift(',utgifter_inte_missing,')'));
run;
This does not seem to work however. The cats-function seems to have worked, but not the nested TRANWRD-function:
NOTE: DATA statement used (Total process time):
real time 0.11 seconds
cpu time 0.12 seconds
5 + DATA IBIS3_5; SET IBIS3_5; if Exkl_UtgUtl_Bo=1 and Exkl_UtgUtl_BoSSEK_Pers=. then
Exkl_UtgUtl_BoSSEK_Pers=Exkl_UtgUtl_BoSSEK_PPmedel;
How do I make this work? The output should look something like:
DATA IBIS3_5; SET IBIS3_5; if Exkl_UtgUtl_Bo=1 and UtgUtl_BoSSEK_Pers=. then
UtgUtl_BoSSEK_Pers=UtgUtl_BoSSEK_PPmedel;
I don't think your macro variable values have quote characters in them, so this code is not going to work:
%qsysfunc(TRANWRD(&variabel,'Exkl_',''))
Since it is looking to replace the 7 character string 'Exkl_' with just the two character string '', two quotes next to each other.
You probably meant to search for Exkl_ instead. You probably also do not want to use %QSYSFUNC() here since that will preserve the space that TRANWRD() will insert. You could use %SYSFUNC() to avoid having that leading space as part of the value. Or perhaps use the TRANSTRN() function instead since that function, unlike TRANWRD(), can translate to an empty string instead of a single space.
Example:
439 %let variable=Exkl_UtgUtl_Flyg ;
440 %put %qsysfunc(TRANWRD(&variable,'Exkl_','')) ;
Exkl_UtgUtl_Flyg
441 %put %qsysfunc(TRANWRD(&variable,Exkl_,)) ;
UtgUtl_Flyg
442 %put %sysfunc(TRANWRD(&variable,Exkl_,)) ;
UtgUtl_Flyg
443 %put %qsysfunc(TRANSTRN(&variable,Exkl_,)) ;
UtgUtl_Flyg

Using SAS SET statement with numbered macro variables

I'm trying to create a custom transformation within SAS DI Studio to do some complicated processing which I will want to reuse often. In order to achieve this, as a first step, I am trying to replicate the functionality of a simple APPEND transformation.
To this end, I've enabled multiple inputs (max of 10) and am trying to leverage the &_INPUTn and &_INPUT_count macro variables referenced here. I would like to simply use the code
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set &_INPUT0 - &&_INPUT&max_input_index;
keep col1 col2 col3;
run;
However, I receive the following error:
ERROR: Missing numeric suffix on a numbered data set list (WORK.SOME_INPUT_TABLE-WORK.ANOTHER_INPUT_TABLE)
because the macro variables are resolved to the names of the datasets they refer to, whose names do not conform to the format required for the
SET dataset1 - dataset9;
statement. How can I get around this?
Much gratitude.
You need to create a macro that loops through your list and resolves the variables. Something like
%macro list_tables(n);
%do i=1 %to &n;
&&_INPUT&i
%end;
%mend;
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set %list_tables(&max_input_index);
keep col1 col2 col3;
run;
The SET statement will need a list of the actual dataset names since they might not form a sequence of numeric suffixed names.
You could use a macro %DO loop if are already running a macro. Make sure to not generate any semi-colons inside the %DO loop.
set
%do i=1 %to &_inputcount ; &&_input&i %end;
;
But you could also use a data step to concatenate the names into a single macro variable that you could then use in the SET statement.
data _null_;
call symputx('_input1',symget('_input'));
length str $500 ;
do i=1 to &_inputcount;
str=catx(' ',str,symget(cats('_input',i)));
end;
call symputx('_input',str);
run;
data .... ;
set &_input ;
...
The extra CALL SYMPUTX() at the top of the data step will handle the case when count is one and SAS only creates the _INPUT macro variable instead of creating the series of macro variables with the numeric suffix. This will set _INPUT1 to the value of _INPUT so that the DO loop will still function.

SAS Macro quoting issues

I am trying to perform operations on binary data saved into a macro variable. The datastep below successfully saves the data into the macro variable without any issues:
data _null_;
infile datalines truncover ;
attrib x length=$300 informat=$300. format=$300.;
input x $300.;
put x=;
call symput ('str',cats(x));
datalines4;
‰PNG > IHDR ) ) ëŠZ sRGB ®Î=é gAMA ±^üa pHYs ;à ;ÃÇo¨d ZIDAT8OåŒ[ À½ÿ¥Ó¼”Ö5Dˆ_v#aw|+¸AnŠ‡;6<ÞóRÆÒÈeFõU/'“#f™Ù÷&É|&t"<ß}4¯à6†Ë-Œ_È(%<É'™èNß%)˜Î{- IEND®B`‚
;;;;
run;
When I try and use the contents of the macro variable in any way, the combinations of reserved characters are making it impossible to work with. The following reserved characters are in the value, and are not matched:
&%'"()
I've tried every combination of macro quoting functions I can think of and I can't even get the value to print using a %put():
%put %nrbquote(&str);
Results in:
SYMBOLGEN: Macro variable STR resolves to ‰PNG > IHDR ) ) ëŠZ sRGB ®Î=é gAMA
±^üa pHYs ;à ;ÃÇo¨d ZIDAT8OåŒ[
À½ÿ¥Ó¼”Ö5Dˆ_v#aw|+¸AnŠ‡;6<ÞóRÆÒÈeFõU/'“#f™Ù÷&É|&t"<ß}4¯à6†Ë-Œ_È(%<É'™èNß%)˜Î{-
IEND®B`‚
ERROR: The value É is not a valid SAS name.
ERROR: The SAS Macro Facility has encountered an I/O error. Canceling submitted statements.
NOTE: The SAS System stopped processing due to receiving a CANCEL request.
Ultimately, what I'd like to do is convert these values to a base64 encoding using the following statement (I've pre-calculated the length of the base64 format for ease-of-debugging):
%let base64_string = %sysfunc(putc(%nrbquote(&str),$base64x244.));
You can use %SUPERQ() to quote a macro variable without having to first expand it. Note that it takes the name of macro variable and not the value as its argument.
%let base64_string = %sysfunc(putc(%superq(str),$base64x244.));
But why not just do the transformation in a DATA STEP and avoid the macro quoting issues?

Do Loop using CALL SYMPUT

Having some problems with Do Loop Concepts. I have a static date (can be any date for that matter) defined with -
%LET DATE = %SYSFUNC(TODAY());
%PUT &DATE;
I need to create a series of macro variables that hold values of that date (&DATE) incremented by 10 days, so I used a simple data step to achieve this -
DATA _NULL_;
CALL SYMPUT('DATE10',&DATE+10);
CALL SYMPUT('DATE20',&DATE+20);
CALL SYMPUT('DATE30',&DATE+30);
RUN;
This method is OK for increments of 10 up to 30 days after the initial value of &DATE. I am now tasked with extending the report to execute on dates that extend to 250 days (incremented by 10 days) from the value of &DATE. Assuming a DO LOOP would be the most efficient execution method, I am having trouble understanding how the loop would "create" a new macro var (ex. &Date150) within the loop. Assuming the syntax below is correct, I am not sure what the next/correct step would be :
DATA _NULL_;
DO I=10 TO 150 BY 10;
CALL SYMPUT('DATE10',&DATE);
END;
RUN;
How would I "increment" the actual name of the macro var (&DATE10,&Date20...&Date150) in the loop while executing the creation of a macro var based on 10 day increments?
Use the I variable as part of your variable name, via a concatenation function, cats() is probably appropriate. Also, a personal preference, but I prefer Call SymputX as it removes any extra spaces.
DATA _NULL_;
DO I=10 TO 150 BY 10;
CALL SYMPUTX(cats('DATE', i), &DATE+i);
END;
RUN;
Consider placing the values into a single macro variable. As long as the list is not longer than maximum length of a macro variable.
DATA _NULL_;
length dates $32767 ;
date=today();
DO I=10 TO 150 BY 10;
dates=catx(' ',dates,date+i);
end;
CALL SYMPUTx('dates',dates);
RUN;
Then in your reporting code you can either just use the list of dates.
proc report ;
where date in (&dates);
...
run;
Or if you have macro you can use a %DO loop.
%do i=1 %to %sysfunc(countw(&dates));
%let date=%scan(&dates,&i);
proc report;
where date=&date;
....
%end;
Easy enough - pass in a variable as the first argument to call symput that contains the name of the macro variable you want to create.

Categorical variables with macro

I am trying to create categorical variables in sas. I have written the following macro, but I get an error: "Invalid symbolic variable name xxx" when I try to run. I am not sure this is even the correct way to accomplish my goal.
Here is my code:
%macro addvars;
proc sql noprint;
select distinct coverageid
into :coverageid1 - :coverageid9999999
from save.test;
%do i=1 %to &sqlobs;
%let n=coverageid&i;
%let v=%superq(&n);
%let f=coverageid_&v;
%put &f;
data save.test;
set save.test;
%if coverageid eq %superq(&v)
%then &f=1;
%else &f=0;
run;
%end;
%mend addvars;
%addvars;
You're combining macro code with data step code in a way that isn't correct. %if = macro language, meaning you are actually evaluating whether the text "coverageid" is equal to the text that %superq(&v) evaluates to, not whether the contents of the coverageid variable equal the value in &v. You could just convert %if to if, but even if you got that to work properly it would be hideously inefficient (you're rewriting the dataset N times, so if you have 1500 values for coverageID you rewrite the entire 500MB dataset or whatnot 1500 times, instead of just once).
If what you want to do is take the variable 'coverageid' and convert it to a set of variables that consist of all possible values of coverageid, 1/0 binary, for each, there are a nubmer of ways to do it. I'm fairly sure the ETS module has a procedure that just does this, but I don't recall it off the top of my head - if you were to post this to the SAS mailing list, one of the guys there would undoubtedly have it quickly.
The simple way for me, is to do this with entirely datastep code. First determine how many potential values there are for COVERAGEID, then assign each to a direct value, then assign the value to the correct variable.
If the COVERAGEID values are consecutive (ie, 1 to some number, no skips, or you don't mind skipping) then this is easy - set up an array and iterate over it. I will assume they are NOT consecutive.
*First, get the distinct values of coverageID. There are a dozen ways to do this, this works as well as any;
proc freq data=save.test;
tables coverageid/out=coverage_values(keep=coverageid);
run;
*Then save them into a format. This converts each value to a consecutive number (so the lowest value becomes 1, the next lowest 2, etc.) This is not only useful for this step, but it can be useful in the future in converting back.;
data coverage_values_fmt;
set coverage_values;
start=coverageid;
label=_n_;
fmtname='COVERAGEF';
type='i';
call symputx('CoverageCount',_n_);
run;
*Import the created format;
proc format cntlin=coverage_values_fmt;
quit;
*Now use the created format. If you had already-consecutive values, you could skip to this step and skip the input statement - just use the value itself;
data save.test_fin;
set save.test;
array coverageids coverageid1-coverageid&coveragecount.;
do _t = 1 to &coveragecount.;
if input(coverageid,COVERAGEF.) = _t then coverageids[_t]=1;
else coverageids[_t]=0;
end;
drop _t;
run;
Here's another way that doesn't use formats, and may be easier to follow.
First, just make some test data:
data test;
input coverageid ##;
cards;
3 27 99 105
;
run;
Next, create a data set with no observations but one variable for each level of coverageid. Note that this approach allows arbitrary values here.
proc transpose data=test out=wide(drop=_name_);
id coverageid;
run;
Finally, create a new data set that combines the initial data set and the wide one. Then, for each level of x, look at each categorical variable and decide whether to turn it "on".
data want;
set test wide;
array vars{*} _:;
do i=1 to dim(vars);
vars{i} = (coverageid = substr(vname(vars{i}),2,1));
end;
drop i;
run;
The line
vars{i} = (coverageid = substr(vname(vars{i}),2));
may require more explanation. vname returns the name of the variable, and since we didn't specify a prefix in proc transpose, all variables are named something like _1, _2, etc. So we take the substring of the variable name that starts in the second position, and compare it to coverageid; if they're the same, we set the variable to 1; otherwise it evaluates to 0.