Using SAS SET statement with numbered macro variables - sas

I'm trying to create a custom transformation within SAS DI Studio to do some complicated processing which I will want to reuse often. In order to achieve this, as a first step, I am trying to replicate the functionality of a simple APPEND transformation.
To this end, I've enabled multiple inputs (max of 10) and am trying to leverage the &_INPUTn and &_INPUT_count macro variables referenced here. I would like to simply use the code
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set &_INPUT0 - &&_INPUT&max_input_index;
keep col1 col2 col3;
run;
However, I receive the following error:
ERROR: Missing numeric suffix on a numbered data set list (WORK.SOME_INPUT_TABLE-WORK.ANOTHER_INPUT_TABLE)
because the macro variables are resolved to the names of the datasets they refer to, whose names do not conform to the format required for the
SET dataset1 - dataset9;
statement. How can I get around this?
Much gratitude.

You need to create a macro that loops through your list and resolves the variables. Something like
%macro list_tables(n);
%do i=1 %to &n;
&&_INPUT&i
%end;
%mend;
data work.APPEND_DATA / view=work.APPEND_DATA;
%let max_input_index = %sysevalf(&_INPUT_count - 1,int);
set %list_tables(&max_input_index);
keep col1 col2 col3;
run;

The SET statement will need a list of the actual dataset names since they might not form a sequence of numeric suffixed names.
You could use a macro %DO loop if are already running a macro. Make sure to not generate any semi-colons inside the %DO loop.
set
%do i=1 %to &_inputcount ; &&_input&i %end;
;
But you could also use a data step to concatenate the names into a single macro variable that you could then use in the SET statement.
data _null_;
call symputx('_input1',symget('_input'));
length str $500 ;
do i=1 to &_inputcount;
str=catx(' ',str,symget(cats('_input',i)));
end;
call symputx('_input',str);
run;
data .... ;
set &_input ;
...
The extra CALL SYMPUTX() at the top of the data step will handle the case when count is one and SAS only creates the _INPUT macro variable instead of creating the series of macro variables with the numeric suffix. This will set _INPUT1 to the value of _INPUT so that the DO loop will still function.

Related

Create a macro that applies translate on multiple columns that you define in a dataset

I'm new to programming in SAS and I would like to do 2 macros, the first one I have done and it consists of giving 3 parameters: name of the input table, name of the column, name of the output table. What this macro does is translate the rare or accented characters, passing it a table and specifying in which column you want the rare characters to be translated:
The code to do this macro is this:
%macro translate_column(table,column,name_output);
*%LET table = TEST_MACRO_TRNSLT;
*%let column = marca;
*%let name_output = COSAS;
PROC SQL;
CREATE TABLE TEST AS
SELECT *
FROM &table.;
QUIT;
data &NAME_OUTPUT;
set TEST;
&column.=tranwrd(&column., "Á", "A");
run;
%mend;
%translate_column(TEST_MACRO_TRNSLT,marca,COSAS);
The problem comes when I try to do the second macro, that I want to replicate what I do in the first one but instead of having the columns that I can introduce to 1, let it be infinite, that is, if in a data set I have 4 columns with characters rare, can you translate the rare characters of those 4 columns. I don't know if I have to put a previously made macro in a parameter and then make a kind of loop or something in the macro.
The same by creating a kind of array (I have no experience with this) and putting those values in a list (these would be the different columns you want to iterate over) or in a macrovariable, it may be that passing this list as a function parameter works.
Could someone give me a hand on this? I would be very grateful
Either use an ARRAY or a %DO loop.
In either case use a space delimited list of variable names as the value of the COLUMN input parameter to your macro.
%translate_column
(table=TEST_MACRO_TRNSLT
,column=var1 varA var2 varB
,name_output=COSAS
);
So here is ARRAY based version:
%macro translate_column(table,column,name_output);
data &NAME_OUTPUT;
set &table.;
array __column &column ;
do over __column;
__column=ktranslate(__column, "A", "Á");
end;
run;
%mend;
Here is %DO loop based version
%macro translate_column(table,column,name_output);
%local index name ;
data &NAME_OUTPUT;
set &table.;
%do index=1 %to %sysfunc(countw(&column,%str( )));
%let name=%scan(&column,&index,%str( ));
&name = ktranslate(&name, "A", "Á");
%end;
run;
%mend;
Notice I switched to using KTRANSLATE() instead of TRANWRD. That means you could adjust the macro to handle multiple character replacements at once
&name = ktranslate(&name,'AO','ÁÓ');
The advantage of the ARRAY version is you could do it without having to create a macro. The advantage of the %DO loop version is that it does not require that you find a name to use for the array that does not conflict with any existing variable name in the dataset.

How to access nth element of array within loop in macro SAS?

I'm trying to get the ith item of an array within %do loop in macro definition and create a dataset with the name of the element, but all I can get is something like "z1" etc. This is what I got so far
%macro print(set,groupvar);
proc sql ;
select put(count(distinct &groupvar),1.)
into :hm
from &set
;
select distinct set
into :z1-:z&sysmaxlong
from &set
;
quit;
data %do i =1 %to &hm;
%scan(&z, &i);
%end;
;
%mend;
I also tried z[&i] instead of %scan(&z,&i) but still no luck
Issues:
You have a macro variable SET but you have the distinct SET in the select query, is that what you intended?
select distinct set
You don't need to specify the end of a series when creating macro variables.
select distinct &set into :z1- from &set; quit;
You created a series of macro variables Z1-Zn but try and use SCAN to retrieve the values? They're not stored in an array, its stored in a series of macro variables such as Z1, Z2 etc.
An output statement can not specify a dynamic destination table. You will need to create wallpaper code to output to the appropriate table based on splitting criteria.
Your macro will need to create macro variables to support this code template
data &out1 &out2 … &outN;
set input_data;
select;
when (&case1) output &out1;
when (&case2) output &out2;
…
when (&caseN) output &outN;
otherwise;
end;
run;
Some slick SQL could support the template
data &outlist;
set input_data;
select;
&whenStatements;
otherwise;
end;
run;
The hash object .output() method can specify a dynamic destination for saving hash contents. Consider the case of "Split a table into subset tables whose names are based on a variables values" aka "Split one data set into several data sets named according to a group variable". Some methods require a pre-parse of the data (as in your posted code), others do not.
There is some hash based splitter code at https://www.devenezia.com/downloads/sas/samples/hash-6.sas
You can find other data splitters on sas community.
If you want to use %scan() then just create one macro variable.
select distinct set
into :z separated by ' '
from &set
;
%let hm=&sqlobs;
The DATA statement will end at the first semi-colon. But you are generating &hm+1 semi-colons instead of just one. Remove the spurious semi-colon from inside the %do loop.
data
%do i =1 %to &hm;
%scan(&z, &i)
%end;
;

SAS Rename variables using a list of variables in a macro

A novice in SAS here. I am trying to rename variables in a data set by using the new values I have in a list. Since I have multiple files with over 100 variables that need to be renamed, I created the following macro and I am trying to pass the list with the new names. However, I am not sure how to pass the list of variables and loop through it properly in the macro. Right now I am getting an error in the %do loop that says: "ERROR: The %TO value of the %DO I loop is invalid."
Any guidance will be greatly appreciated.
The list of new variables comes from another macro and it is saved in &newvars.
The number of variables in the files are the same number in the list, and the order they should be replaced is the same.
%macro rename(lib,dsn,newname);
proc sql noprint;
select nvar into :num_vars from dictionary.tables
where libname="&LIB" and memname="&DSN";
select distinct(nliteral(name)) into:vars
from dictionary.columns
where libname="&LIB" and memname="&DSN";
quit;
run;
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i == &&newname&i.
%end;
;
quit;
run;
%mend rename;
%rename(pga3,selRound,&newvars);
Thank you in advance.
You are getting that error message because the macro variable NUM_VARS is not being set because no observations met your first where condition.
The LIBNAME and MEMNAME fields in the metadata tables are always uppercase and you called the macro with lowercase names.
You can use the %upcase() macro function to fix that. While you are at it you can eliminate the first query as SQL will count the number of variables for you in the second query. Also if you want that query to generate multiple macro variables with numeric suffixes you need to modify the into clause to say that. The DISTINCT keyword is not needed as a dataset cannot have two variables with the same name.
select nliteral(name)
into :vars1 -
from dictionary.columns
where libname=%upcase("&LIB") and memname=%upcase("&DSN")
;
%let num_vars=&sqlobs;
You also should tell it what order to generate the names. Were the new names generated in expectation that the list would be in the order the variables exist in the dataset? If so use the VARNUM variable in the ORDER BY clause. If in alphabetical order then use NAME in the ORDER BY clause.
How are you passing in the new names?
Is it a space delimited list? If so your final step should look more like this:
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i = %scan(&newname,&i,%str( ))
%end;
;
run; quit;
If NEWNAME has the base name to use for a series of variable names with numeric suffixes then you would want this:
&&vars&i = &newname&i
If instead you are passing into NEWNAME a base string to use to locate a series of macro variables with numeric suffixes then the syntax would be more like this.
&&vars&i = &&&newname&i
So if NEWNAME=XXX and I=1 then on the first pass of the macro processor that line will transform into
&vars1 = &XXX1
And on the second pass the values of VARS1 and XXX1 would be substituted.

How can I peform same datastep across many variables in SAS?

I have data that looks like this and has 500 variables with a target:
var1 var2 var3 var4 ... var500 target
The names of the variables are not sequential as above so I don't think I can use something like var1:var500. I want to loop through the variables to create graphs. Some of the variables are continous and some are nominal.
for var1 through var500
if nominal then create graphtypeA var[i] * target
else if continous then create graphtypeB var[i] * target
end;
I can easily create a second table that has the data type in it to check against. Arrays seem like they might be useful to peform this task of looping through variables. Something like:
data work.mydata;
set archive.mydata;
array myarray{501] myarray1 - myarray501
do i=1 to 500;
proc sgpanel;
panelby myarray[501];
histogram myarray[i];
end;
run;
This doesn't work though and it doens't check to see what type of variable it is. If we assume I have another sas.dataset that has varname and vartype (continuous, nominal) how can I loop through to create the desired graphs for the given vartype? Thanks in advance.
Basically, you need to loop over some variables, apply some logic to determine the variable type, then produce output depending on the variable type. While there are many approaches to this problem, one solution is to select your variables into a macro variable, loop over this "list" (not a formal data structure) of variables, and use macro control logic to designate different subroutines for numeric and character variables.
I'll use the sashelp.cars data set to illustrate. In this example the variable origin is your 'Target' variable and the variables Make, Type, Horsepower, and Cylinders are the numeric and character variables.
* get some data;
data set1 (keep = Make Type Origin Horsepower Cylinders);
set sashelp.cars;
run;
* create dataset of variable names and types;
proc contents data = set1
out = vars
noprint;
run;
* get variable names and variable types (1=numeric, 2=character)
* into two macro variable "lists" where each entry is seperated
* by a space;
proc sql noprint;
select name, type
into :varname separated by ' ', :vartype separated by ' '
from vars
where name <> "Make";
quit;
* put the macro variables to the log to confirm they are what
* you expect
%put &varname;
%put &vartype;
Now, use a macro to loop over the values in the macro variable list. The countw function counts the number of variables, and uses this number as the loop iterator limit. The scan function reads in each variable name and type by its relative position in the respective macro variable lists. For each variable the type is then evaluated and a plot is produced depending on whether it is character or numeric. In this example, a histogram with density plot is produced for numeric variables and a bar chart of frequency counts is produced for character variables.
The loop logic is general, and Proc sgpanel and Proc sgplot cab be modified or replaced with other desired data step processing or procedures.
* turn on options that are useful for
* macro debugging, turn them off
* when using in production;
options mlogic mprint symbolgen;
%macro plotter;
%do i = 1 %to %sysfunc(countw(&varname));
%let nextvar = %scan(&varname, &i, %str( ));
%let nextvartype = %scan(&vartype, &i, %str( ));
%if &nextvartype. = 1 %then %do;
proc sgpanel data=set1 noautolegend;
title "&nextvar. Distribution";
panelby Origin;
histogram &nextvar.;
density &nextvar.;
run;
%end;
%if &nextvartype. = 2 %then %do;
proc sgplot data=set1;
title "&nextvar. Count by Origin";
vbar &nextvar. /group= origin;
run;
%end;
%end;
%mend plotter;
*call the macro;
%plotter;
Unfortunately it is not possible to use arrays outside a data step in the way that you propose here, at least not in any very efficient way. However, there are quite a few options available to you. One would be just to call your graphing proc once and tell it to graph every numeric variable in your dataset, e.g. like so:
proc univariate data = sashelp.class;
var _NUMERIC_;
histogram;
run;
If the variables you want to graph that are of the same type are adjacent in the column order of your dataset, you can use a double-dash list, e.g.
proc univariate data = sashelp.class;
var age--weight;
histogram;
run;
In general you should seek to avoid calling procs or running data steps separately for every variable - it is nearly always more efficient to call them just once and process everything in one go.

Categorical variables with macro

I am trying to create categorical variables in sas. I have written the following macro, but I get an error: "Invalid symbolic variable name xxx" when I try to run. I am not sure this is even the correct way to accomplish my goal.
Here is my code:
%macro addvars;
proc sql noprint;
select distinct coverageid
into :coverageid1 - :coverageid9999999
from save.test;
%do i=1 %to &sqlobs;
%let n=coverageid&i;
%let v=%superq(&n);
%let f=coverageid_&v;
%put &f;
data save.test;
set save.test;
%if coverageid eq %superq(&v)
%then &f=1;
%else &f=0;
run;
%end;
%mend addvars;
%addvars;
You're combining macro code with data step code in a way that isn't correct. %if = macro language, meaning you are actually evaluating whether the text "coverageid" is equal to the text that %superq(&v) evaluates to, not whether the contents of the coverageid variable equal the value in &v. You could just convert %if to if, but even if you got that to work properly it would be hideously inefficient (you're rewriting the dataset N times, so if you have 1500 values for coverageID you rewrite the entire 500MB dataset or whatnot 1500 times, instead of just once).
If what you want to do is take the variable 'coverageid' and convert it to a set of variables that consist of all possible values of coverageid, 1/0 binary, for each, there are a nubmer of ways to do it. I'm fairly sure the ETS module has a procedure that just does this, but I don't recall it off the top of my head - if you were to post this to the SAS mailing list, one of the guys there would undoubtedly have it quickly.
The simple way for me, is to do this with entirely datastep code. First determine how many potential values there are for COVERAGEID, then assign each to a direct value, then assign the value to the correct variable.
If the COVERAGEID values are consecutive (ie, 1 to some number, no skips, or you don't mind skipping) then this is easy - set up an array and iterate over it. I will assume they are NOT consecutive.
*First, get the distinct values of coverageID. There are a dozen ways to do this, this works as well as any;
proc freq data=save.test;
tables coverageid/out=coverage_values(keep=coverageid);
run;
*Then save them into a format. This converts each value to a consecutive number (so the lowest value becomes 1, the next lowest 2, etc.) This is not only useful for this step, but it can be useful in the future in converting back.;
data coverage_values_fmt;
set coverage_values;
start=coverageid;
label=_n_;
fmtname='COVERAGEF';
type='i';
call symputx('CoverageCount',_n_);
run;
*Import the created format;
proc format cntlin=coverage_values_fmt;
quit;
*Now use the created format. If you had already-consecutive values, you could skip to this step and skip the input statement - just use the value itself;
data save.test_fin;
set save.test;
array coverageids coverageid1-coverageid&coveragecount.;
do _t = 1 to &coveragecount.;
if input(coverageid,COVERAGEF.) = _t then coverageids[_t]=1;
else coverageids[_t]=0;
end;
drop _t;
run;
Here's another way that doesn't use formats, and may be easier to follow.
First, just make some test data:
data test;
input coverageid ##;
cards;
3 27 99 105
;
run;
Next, create a data set with no observations but one variable for each level of coverageid. Note that this approach allows arbitrary values here.
proc transpose data=test out=wide(drop=_name_);
id coverageid;
run;
Finally, create a new data set that combines the initial data set and the wide one. Then, for each level of x, look at each categorical variable and decide whether to turn it "on".
data want;
set test wide;
array vars{*} _:;
do i=1 to dim(vars);
vars{i} = (coverageid = substr(vname(vars{i}),2,1));
end;
drop i;
run;
The line
vars{i} = (coverageid = substr(vname(vars{i}),2));
may require more explanation. vname returns the name of the variable, and since we didn't specify a prefix in proc transpose, all variables are named something like _1, _2, etc. So we take the substring of the variable name that starts in the second position, and compare it to coverageid; if they're the same, we set the variable to 1; otherwise it evaluates to 0.