how to use macro to change names of a table column ,except the ones that you point out.numeric type columns are added prefix "XC_" and char type columns are added prefix "XN_"
A proper utility macro will accept all the processing abstractions as parameters:
data=, the data set to operate on
copy=, the variables to leave alone (copy is an homage to the copy statement in TRANSPOSE)
char_prefix=XC_, prefix to apply to names of not-copied character variables, default is XC_
num_prefix=XN_, prefix to apply to names of not-copied numeric variables, default is XN_
The innards of a utility macro is a black box. Sometimes the design of the innards are specified to allow DATA and PROC steps to occur.
Sample code
Proc SQL is used to fill a macro variable with a list of old=new name pairs that can be used in a RENAME statement executed by Proc DATASETS
%macro Fixer ( data=, copy=, char_prefix=XC_, num_prefix=XN_ );
%let syslast = &data;
%local lib mem rename_clause;
%let lib = %scan(&syslast,1);
%let mem = %scan(&syslast,2);
proc sql noprint;
select
trim(name) || '=' ||
case type
when 'num' then "&num_prefix" || name
when 'char' then "&char_prefix" || name
else ''
end
into :rename_clause separated by ' '
from
dictionary.columns
where
libname = "&lib"
and memname = "&mem"
and indexw (%upcase("©"), upcase(name)) = 0
;
proc datasets nolist;
modify &data;
rename &rename_clause;
run;
quit;
%mend;
data class;
set sashelp.class;
teacher = 'Davidowski';
run;
options mprint;
%Fixer ( data=class, copy=Name )
Other times the innards must not generate any code. For this question such a macro would use macro function %SYSFUNC to access data set functions such as open, close, attrn, vartype, varname as it tests renaming criteria and accumulates old=new name pairs that will be emitted for use by the callee.
Related
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
Working in SAS here, and have a lot of column names that I'd like to drop a pattern from. This is pretty straightforward in R:
colnames(data) <- gsub('drop_pattern', '', colnames(data))
But is there an equivalently elegant SAS way?
You can use the RENAME statement in PROC DATASETS to modify the names of variables in a dataset without having to make a new dataset.
proc datasets lib=mylib nolist;
modify mydata ;
rename freddrop_patterndy = freddy samdrop_patternmy=sammy ;
run;
quit;
You can use any number of functions, including those that support regular expressions, to construct a new name from an old name. For example if you just want to remove some constant text then something like this could work:
new_name = transtrn(old_name,'drop_pattern',trimn(' '));
You can use a query against the metadata of the variable names to generate the oldname=newname pairs into a macro variable.
proc sql noprint ;
select catx('=',name,transtrn(old_name,'drop_pattern',trimn(' '))
into :rename_list separated by ' '
from dictionary.column
where libname='MYLIB' and memname='MYDATA' and index(name,'drop_pattern')
;
quit;
Then you can use the macro variable in your code. You will probably need to skip this step if there are no names that need to be changed.
%if &sqlobs %then %do ;
proc datasets lib=mylib nolist;
modify mydata ;
rename &rename_list ;
run;
quit;
%end;
Note if you have set the VALIDVARNAME option to ANY then you will need to use the NLITERAL() function when generating the oldname=newname pairs to handle names that might not follow normal naming rules.
select catx('=',nliteral(name),nliteral(transtrn(old_name,'drop_pattern',trimn(' ')))
A novice in SAS here. I am trying to rename variables in a data set by using the new values I have in a list. Since I have multiple files with over 100 variables that need to be renamed, I created the following macro and I am trying to pass the list with the new names. However, I am not sure how to pass the list of variables and loop through it properly in the macro. Right now I am getting an error in the %do loop that says: "ERROR: The %TO value of the %DO I loop is invalid."
Any guidance will be greatly appreciated.
The list of new variables comes from another macro and it is saved in &newvars.
The number of variables in the files are the same number in the list, and the order they should be replaced is the same.
%macro rename(lib,dsn,newname);
proc sql noprint;
select nvar into :num_vars from dictionary.tables
where libname="&LIB" and memname="&DSN";
select distinct(nliteral(name)) into:vars
from dictionary.columns
where libname="&LIB" and memname="&DSN";
quit;
run;
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i == &&newname&i.
%end;
;
quit;
run;
%mend rename;
%rename(pga3,selRound,&newvars);
Thank you in advance.
You are getting that error message because the macro variable NUM_VARS is not being set because no observations met your first where condition.
The LIBNAME and MEMNAME fields in the metadata tables are always uppercase and you called the macro with lowercase names.
You can use the %upcase() macro function to fix that. While you are at it you can eliminate the first query as SQL will count the number of variables for you in the second query. Also if you want that query to generate multiple macro variables with numeric suffixes you need to modify the into clause to say that. The DISTINCT keyword is not needed as a dataset cannot have two variables with the same name.
select nliteral(name)
into :vars1 -
from dictionary.columns
where libname=%upcase("&LIB") and memname=%upcase("&DSN")
;
%let num_vars=&sqlobs;
You also should tell it what order to generate the names. Were the new names generated in expectation that the list would be in the order the variables exist in the dataset? If so use the VARNUM variable in the ORDER BY clause. If in alphabetical order then use NAME in the ORDER BY clause.
How are you passing in the new names?
Is it a space delimited list? If so your final step should look more like this:
proc datasets library = &LIB;
modify &DSN;
rename
%do i = 1 %to &num_vars.;
&&vars&i = %scan(&newname,&i,%str( ))
%end;
;
run; quit;
If NEWNAME has the base name to use for a series of variable names with numeric suffixes then you would want this:
&&vars&i = &newname&i
If instead you are passing into NEWNAME a base string to use to locate a series of macro variables with numeric suffixes then the syntax would be more like this.
&&vars&i = &&&newname&i
So if NEWNAME=XXX and I=1 then on the first pass of the macro processor that line will transform into
&vars1 = &XXX1
And on the second pass the values of VARS1 and XXX1 would be substituted.
I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like a dynamic code to create variables called Adam_ref, Bob_ref etc.. which will work even if there a different dataset with different names (i.e. don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in a macro variables you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variable, there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Refs" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while th sashelp alias can be used anywhere. I tend to use sashelp alias since I work in windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space delimited macro vaiable with the desired names.
Step 2 To build the empty data set then this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $")||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||"= "||compress(_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;