What is the simplest way to put all column names of a given dataset into a macro variable?
Another question, how should I put names of all "numeric" columns into one macro variable and names of all character columns into another macro variable?
Depending on what you're doing you can also reference all character variables with _character_ and numerical variables with _numeric_.
*doesn't do anything but illustrates how to reference all variables of a specific type;
data want;
set sashelp.class;
array _c1(*) _character_;
array _n1(*) _numeric_;
run;
If you need to do this for a number of tables then you could try defining a macro for that purpose, e.g.:
%macro get_cols(lib,mem,mvar,type);
%global &mvar;
proc sql noprint;
select
name
into
:&mvar separated by ' '
from
dictionary.columns
where
libname eq upcase("&lib")
and memname eq upcase("&mem")
%if %upcase(&type) ne ALL %then
and upcase(type) eq upcase("&type");
;
quit;
%put &mvar = &&&mvar;
%mend get_cols;
Then you could call it, as required:
/* character variables */
%get_cols(sashelp,class,cvars,char);
/* numeric variables */
%get_cols(sashelp,class,nvars,num);
/* all variables */
%get_cols(sashelp,class,vars,all);
using proc sql into clause with dictionary.columns is one of easiest way to make macro variables for your purpose. Below query uses sashelp.class, you can use your table name and libname instead
/* for all variables*/
Proc sql noprint;
select name into :macvar1 separated by ',' from
dictionary.columns
where upcase(memname) = 'CLASS'
and upcase(libname) = 'SASHELP';
/* for numeric variables*/
Proc sql noprint;
select name into :macvar2 separated by ',' from
dictionary.columns
where upcase(memname) = 'CLASS'
and upcase(libname) = 'SASHELP'
and upcase(type) = 'NUM';
/* for character variables*/
Proc sql noprint;
select name into :macvar3 separated by ',' from
dictionary.columns
where upcase(memname) = 'CLASS'
and upcase(libname) = 'SASHELP'
and upcase(type) = 'CHAR';
%put value of all variables is &macvar1;
%put value of numeric variables is &macvar2;
%put value of character variables is &macvar3;
Related
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
I would like to know is it possible to perform an action that keeps only columns that contain a certain character.
For example, lets say that I have columns: name, surname, sex, age.
I want to keep only columns that start with letter 's' (surname and sex).
How do I do that?
There's several variations on how to filter out names.
For prefixes or lists of variables it's pretty easy. For suffixes or more complex patterns it keeps more complicated. In general you can short cut lists as follows:
_numeric_ : all numeric variables
_character_ : all character variables
_all_ : all variables
prefix1 - prefix# : all variables with the same prefix assuming they're numbered
prefix: : all variables that start with prefix
firstVar -- lastVar : variables based on location between first and last variable, including the first and last.
first-numeric-lastVar : variables that are numeric based on location between first and last variable
Anything more complex requires that you filter it via the metadata list. SAS basically keeps some metadata about each data set so you can query that information to build your lists. Data about columns and types are in the sashelp.vcolumn or dictionary.column data set.
To filter all columns that have the word mpg for example:
*generate variable list;
proc sql noprint;
select name into :var_list separated by " "
from sashelp.vcolumn
where libname = 'SASHELP' and memname = 'CARS'
and lowcase(name) like '%mpg%';
quit;
*check log for results;
%put &var_list;
*verification from original table;
proc contents data=sashelp.cars;
run;
*example of usage;
data want;
set sashelp.cars;
keep &var_list;
run;
Some more details are available in this blog post and here (documentation).
If you want do keep only variables that start with an s, then use name prefix list operator :.
data want;
set have(keep=s:);
run;
It's possible.
In the code below I created a macro variable that has the name of columns that have in a table. After run the code you will have the name of columns you want.
PROC SQL;
SELECT
NAME
INTO:
NMVAR /* SAVE IN MACRO VARIABLE */
FROM SASHELP.VCOLUMN
WHERE
LIBNAME EQ "YOUR LIBNAME" AND /* THE NAME OF LIB MUST BE WRITTEN IN UPPERCASE */
MEMNAME EQ "YOUR TABLE" AND /* THE NAME OF 'TABLE/DATA SET' MUST BE WRITTEN IN UPPERCASE */
SUBSTR(NAME,1,1) EQ "S";
RUN;
For complex variable name selection filtering, such as regular expressions, or lookup in external metadata control table, you will need to process the metadata of the table itself to construct source that can be applied.
This example demonstrates two, of many, ways that source code can be generated.
metadata table from target table, Proc CONTENTS
process metadata, Proc SQL
construct source code
Expectation of name lists < 64K
SQL INTO :<macro-variable> for source code expected to be < 64K characters
Very large name lists or robust
A macro that streams source code from metadata table
From a data set with 50,000 variables select the columns whose name contains 2912
data have;
retain id 'HOOPLA12345' x1-x50000 .;
stop;
run;
* obtain metadata of target table;
proc contents noprint data=have
out=varlist_table
( keep=name
where= (
prxmatch('/x.*2912.*/',name) /* name selection criteria */
)
);
run;
* Short lists;
* construct source code for name list;
proc sql noprint;
select name into :varlist separated by ' ' from varlist_table;
data want;
set have (keep=&varlist); /* apply generated source code */
run;
* Arbitrary or Long lists expected;
%macro stream_column (data=, column=);
%local dsid index &column;
%let dsid=%sysfunc(open(&data(keep=&column)));
%if &dsid %then %do;
%syscall SET(dsid);
%do %while (0=%sysfunc(fetch(&dsid)));
&&&column. /* emit column value from table */
%end;
%let dsid = %sysfunc(close(&dsid));
%end;
%mend;
options mprint;
data want2;
set have (keep=
/* stream source code as macro text emissions */
%stream_column(data=varlist_table,column=name)
);
run;
Issue:
I'm looking to reverse the order of all columns in a sas dataset. Should I achieve this by first transposing and then using a loop to reverse the order of the columns? This is my logic...
Step One:
data pre_transpose;
set sashelp.class;
*set &&dataset&i.. ;
_row_ + 1; * Unique identifier ;
length _charvar_ $20; * Create 1 character variable ;
run;
Step One Output:
Step Two: Do I Reverse Columns Here?
proc transpose data = pre_transpose out = middle (where = (lowcase(_name_) ne '_row_'));
by _row_;
var _all_;
quit;
Step Two Output:
EDIT:
I have attempted this:
/* use proc sql to create a macro variable for column names */
proc sql noprint;
select varnum, nliteral(name)
into :varlist, :varlist separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = 'all_character'
order by varnum desc;
quit;
/* Use retain to maintain format */
data reverse_columns;
retain &varlist.;
set all_character;
run;
But I did not achieve the results I was looking for - the column order is not reversed.
You just need to get the list of variable names. One way is to use the metadata. Do if your dataset is member HAVE in libref WORK then you could use this to get the list of variable names into a single macro variable.
proc sql noprint;
select varnum , nliteral(name)
into :varlist, :varlist separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE'
order by varnum desc
;
quit;
You could then use the macro variable in a data step like this.
data want ;
retain &varlist ;
set have ;
run;
Note that the value of libname and memname in DICTIONARY.COLUMNS is in uppercase only.
I would like to create a macro that add a suffix to variable names in a dataset. below is my code:
%macro add_suffix(library=,dataset=,suffix=);
proc sql noprint;
select cat(name, ' = ', cats('&suffix.',name )) into :rename_list separated by ' ' from
dictionary.columns where libname = '&library.' and memname= '&dataset.';
quit;
proc datasets library=&library nolist nodetails;
modify &dataset;
rename &rename_list;
run;
quit;
%mend;
%add_suffix(library=OUTPUT,dataset=CA_SPREADS,suffix=CA);
It gives error messages:
NOTE: No rows were selected.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
WARNING: Apparent symbolic reference RENAME_LIST not resolved.
NOTE: Line generated by the invoked macro "ADD_SUFFIX".
2 rename &rename_list; run;
-
22
76
NOTE: Enter RUN; to continue or QUIT; to end the procedure.
ERROR 22-322: Expecting a name.
ERROR 76-322: Syntax error, statement will be ignored.
If I put the library and dataset names in quotation mark, it works for the first block i.e. add values to string rename_list but not for the proc dataset step
Macro triggers like % and & are not honored inside of single quotes. That is why you are not getting any hits on your SQL query. There is no library name that has an & as the first character.
The reason it looked like it was sort of working is that when you use this in your SQL statement
catx('=',name,cats('&prefix.',name))
then you end up with a string like
age=&prefix.age
And that will actually work because the reference to the macro variable PREFIX will resolve when you run the RENAME statement.
You should just use double quotes instead.
%macro change_names(library=,dataset=,prefix=,suffix=);
%local rename_list;
proc sql noprint;
select catx('=',name,cats("&prefix",name,"&suffix"))
into :rename_list separated by ' '
from dictionary.columns
where libname = %upcase("&library")
and memname = %upcase("&dataset")
;
quit;
%if (&sqlobs) %then %do;
proc datasets library=&library nolist nodetails;
modify &dataset;
rename &rename_list;
run;
quit;
%end;
%else %put WARNING: Did not find any variables for &library..&dataset..;
%mend change_names;
%change_names(library=OUTPUT,dataset=CA_SPREADS,prefix=CA);
Your macro variables are not being resolved because you're wrapping them in single quotes ' rather than double quotes ".
You should uppercase the libname and memname parameters of your macro as these are always in uppercase in dictionary.columns.
Tested and works. Longer but maybe more beginner friendly approach. Input the dataset name and what suffix you want to add.
example: %add_suffix(orders, _old); /* will add _old suffix to all variables.*/
%macro Add_Suffix(Dataset, suffix);
proc contents noprint
data=work.&dataset out=sjm_tmp(keep=NAME);
run;
data sjm_tmp2;
set sjm_tmp;
foobar=cats(name, '=',NAME,'&suffix.');
run;
proc sql noprint;
select foobar into :sjm_list separated by ' ' from sjm_tmp2;
quit;
proc datasets library = work nolist;
modify &dataset;
rename &sjm_list;
quit;
proc datasets library=work noprint;
delete sjm_tmp sjm_tmp2 ;
run;
%mend Add_Suffix;
I have a very large number of datasets that are not consistently formatted - I am trying to read them into SAS and normalize them.
The basic need here is to locate a 'key column' that contains a certain string - from there I know what to do with all the variables to the left and right of that column.
The 'GREP' macro from the sas website (http://support.sas.com/kb/33/078.html) seems like it can handle this, but I need help adapting the code in the following ways:
1 - I only need to search one dataset at a time, already in the 'work' library.
2 - I need to capture the name of the variable (and the position number of it) that prints to the log at the end of this macro. This seems like it would be easy but it just returns the last column in the dataset instead of the (correct) column that prints to the log at the end.
Current code below:
%macro grep(librf,string); /* parameters are unquoted, libref name, search string */
%let librf = %upcase(&librf);
proc sql noprint;
select left(put(count(*),8.)) into :numds
from dictionary.tables
where libname="&librf";
select memname into :ds1 - :ds&numds
from dictionary.tables
where libname="&librf";
%do i=1 %to &numds;
proc sql noprint;
select left(put(count(*),8.)) into :numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
/* create list of variable names and store in a macro variable */
%if &numvars > 0 %then %do;
select name into :var1 - :var&numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
quit;
data _null_;
set &&ds&i;
%do j=1 %to &numvars;
if &&var&j = "&string" then
put "String &string found in dataset &librf..&&ds&i for variable &&var&j";
%end;
run;
%end;
%end;
%mend;
%grep(work,Source Location);
The log returns: "String Source Location found in dataset WORK.RAW_IMPORT for variable C" (the third), which is correct.
I just need usable macro variables equal to "C" and "3" at the end. This macro will be part of a larger macro (or a prelude to it) so the two macro variables need to reset with each dataset I run through it. Thanks for any help offered.
Please find the modification below, basically what I have done was to create global macro variables for dataset name and variable name which will feed as input to get the variable position using VARNUM function as below, ( change identified by **** )
%macro grep(librf,string);
%let librf = %upcase(&librf);
proc sql noprint;
select left(put(count(*),8.)) into :numds
from dictionary.tables
where libname="&librf";
select memname into :ds1 - :ds&numds
from dictionary.tables
where libname="&librf";
%do i=1 %to &numds;
proc sql noprint;
select left(put(count(*),8.)) into :numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
/* create list of variable names and store in a macro variable */
%if &numvars > 0 %then %do;
select name into :var1 - :var&numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
quit;
%global var_pos var_nm var_ds;
data _null_;
set &&ds&i;
%do j=1 %to &numvars;
**** ADDED NEW CODE HERE ****;
if &&var&j = "&string" then do; /* IF-DO nesting */;
call symputx("var_nm","&&var&j"); /*Global Macro variable for Variable Name */
call symputx("var_ds","&&ds&i"); /*Global Macro variable for Dataset Name */
put "String &string found in dataset &librf..&&ds&i for variable &&var&j";
%end;
run;
**** ADDED NEW CODE HERE ****;
%let dsid=%sysfunc(open(&var_ds,i)); /* Open Data set */
%let var_pos=%sysfunc(varnum(&dsid,&var_nm)); /* Variable Position */
%let rc=%sysfunc(close(&dsid)); /* Close Data set */;
%end;
%end;
%mend;
%grep(work,Source Location);
%put &=var_nm &=var_ds &=var_pos;