Renaming all variables from a SAS Table - sas

I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However I don't how to proceed. I would guess that RegEx would be the go to but I am not familiar with it.
As a structure example, some times ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same but for the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;

Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
Now you could pull that list into a macro variable. So a space delimited list of old=new pairs is useful for rename.
proc sql noprint;
select catx('=',name,newname) into :renames
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=DS1 compare=DS2 ;
var &oldlist;
with &newlist;
run;

Related

Rename SAS columns using pattern matching

Working in SAS here, and have a lot of column names that I'd like to drop a pattern from. This is pretty straightforward in R:
colnames(data) <- gsub('drop_pattern', '', colnames(data))
But is there an equivalently elegant SAS way?
You can use the RENAME statement in PROC DATASETS to modify the names of variables in a dataset without having to make a new dataset.
proc datasets lib=mylib nolist;
modify mydata ;
rename freddrop_patterndy = freddy samdrop_patternmy=sammy ;
run;
quit;
You can use any number of functions, including those that support regular expressions, to construct a new name from an old name. For example if you just want to remove some constant text then something like this could work:
new_name = transtrn(old_name,'drop_pattern',trimn(' '));
You can use a query against the metadata of the variable names to generate the oldname=newname pairs into a macro variable.
proc sql noprint ;
select catx('=',name,transtrn(old_name,'drop_pattern',trimn(' '))
into :rename_list separated by ' '
from dictionary.column
where libname='MYLIB' and memname='MYDATA' and index(name,'drop_pattern')
;
quit;
Then you can use the macro variable in your code. You will probably need to skip this step if there are no names that need to be changed.
%if &sqlobs %then %do ;
proc datasets lib=mylib nolist;
modify mydata ;
rename &rename_list ;
run;
quit;
%end;
Note if you have set the VALIDVARNAME option to ANY then you will need to use the NLITERAL() function when generating the oldname=newname pairs to handle names that might not follow normal naming rules.
select catx('=',nliteral(name),nliteral(transtrn(old_name,'drop_pattern',trimn(' ')))

sas: how to create a variable containing a list of different variables in 2 datasets

I'm kinda new to SAS.
I have 2 datasets: set1 and set2.
I'd like to get a list of variables that's in set2 but not in set1.
I know I can easily see them by doing proc compare and then listvar,
however, i wish to copy&paste the whole list of different variables instead of copying one by one from the report generated.
i want either a macro variable containing a list of all different variables separated by space, or printing out all variables in plain texts that I can easily copy everything.
proc contents data=set1 out=cols1;
proc contents data=set2 out=cols2;
data common;
merge cols1 (in=a) cols2 (in=b);
by name;
if not a and b;
keep name;
run;
proc sql;
select name into :commoncols separated by ','
from work.common;
quit;
Get the list of variable names and then compare the lists.
Conceptually the simplest way see what is in a dataset is to use proc contents.
proc contents data=set1 noprint out=content1 ; run;
proc contents data=set2 noprint out=content2 ; run;
Now you just need to find the names that are in one and not the other.
An easy way is with PROC SQL set operations.
proc sql ;
create table in1_not_in2 as
select name from content1
where upcase(name) not in (select upcase(name) from content2)
;
create table in2_not_in1 as
select name from content2
where upcase(name) not in (select upcase(name) from content1)
;
quit;
You could also push the lists into macro variables instead of datasets.
proc sql noprint ;
select name from content1
into :in1_not_in2 separated by ' '
where upcase(name) not in (select upcase(name) from content2)
;
select name from content2
into :in2_not_in1 separated by ' '
where upcase(name) not in (select upcase(name) from content1)
;
quit;
Then you could use the macro variables to generate other code.
data both;
set set1(drop=&in1_not_in2) set2(drop=&in2_not_in1) ;
run;

Reading a list of name to a SAS Macro

I am trying to read a list of values into a macro, so that the macro variable would contain the table name and create a column that would contain the table name.
My attempt, which is wrong, was trying to use the code below, and erroring out because of the line " '&tbl' as Table_Dt ". The code below is inefficient, so feel free to enhance it. Thanks for your help.
%macro flat(tbl);
proc sql exec feedback stimer noprint outobs=5;
CREATE TABLE &tbl as
SELECT
ID,
DOB,
'&tbl' as Table_Dt
FROM &tbl..flat_file;
QUIT;
%mend flat;
%flat(flat0113);
%flat(flat0213);
...
%flat(flat1213);
As you are basically processing a list, this could also be done using call execute. No need to write all the information to macro variables. All tables/libraries are already stored in the sashelp tables and therefore are ready for list processing.
data _null_;
set sashelp.vslib (where=(substr(libname,1,4) = 'FLAT')) end =eof;
if _n_ = 1 then call execute ('proc sql exec feedback stimer noprint outobs=5;');
call execute ('
CREATE TABLE '|| libname ||' AS
SELECT ID,
DOB,
"'||compress(libname)||'" as Table_Dt
FROM '||compress(libname)||'.flat_file
;
');
if eof then call execute ('QUIT;');
run;
Macros in quotation marks will only resolve with double quotes, not single. If you want to do a more efficient way, you can do so with the following modified code. I am assuming that you are reading from libraries named flat0113, flat0213, etc.
Step 1: Get a list of all the libnames with the word "flat" in it
proc sql noprint;
select distinct libname
, count(libname)
into: tbl_list separated by ' '
, total_tbls
from sashelp.vmember
where libname LIKE 'FLAT%'
;
quit;
This will create two macro variables: &tbl_list, and &total_tbls.
&tbl_list holds the values flat0113 flat0213 flat ... flat1213.
&total_tbls holds the total number of values in &tbl_list.
Step 2: Loop through the newly created list
%macro readTables;
%do i = 1 %to &total_tbls;
%let tbl = %scan(tbl_list, &i);
proc sql exec feedback stimer noprint outobs=5;
CREATE TABLE &tbl as
SELECT
ID,
DOB,
"&tbl" as Table_Dt
FROM &tbl..flat_file;
quit;
%end;
%mend;
%readTables;
This will read each individual value from &tbl_list one by one until the very end of the list.

macro to transpose and add a prefix after ordering columns' names

I need to run a macro that does a transpose for many variables (and creates a table for each one), orders the columns names, which are numeric, but also adds as a prefix the variable's name (which is a string).
I have a macro in SAS to perform a transpose with different variables as var in the transpose. The code is:
%macro transponer(var);
proc transpose data=labo2.A_svm_200711_200806
out=labo2.D_tr_&var.0;
var &var;
id mes;
by cid;
run;
/*......more code.....*/
select cats(name, '=', &var, name)
into :prefijolista
separated by ' '
from dictionary.columns
where libname='LABO2' and memname= cats('D_TR_',upcase(&var))
and name like '_20%';
quit;
%put &prefijolista;
%mend;
Since mes is numeric I wanted to order the variable, that's why I didn't introduce the "prefix &var" in the proc transpose but instead I did it after the retain (that was useful to order the columns).
The problem starts when I try to introduce the prefix (after the ordering).
Since one of the variables' name is for example "monto", I get the following error (because it is the var variable in the transpose and it's not a column name in the transposed table):
The following columns were not found in the contributing tables:
monto.
My next step would be:
proc datasets library=labo2;
modify D_tr_&var.0;
rename &prefijolista;
quit;
But I cant do it untill I get the previous one done.
So I don't know how to order the columns after the transpose and also add the prefix.
How can I solve this?
Thanks!
You need to rename the columns using something like PROC DATASETS.
proc datasets lib=work nolist;
modify myDataSet;
rename old_col_name = new_col_name;
run;
quit;
A documentation example is available in the Base SAS guide under the doc for PROC DATASETS. It is available online at: http://support.sas.com/documentation/cdl/en/proc/67327/HTML/default/viewer.htm#n0mfav25learpan1lerk79jsp30n.htm
The problem was that &var inside the cats function inside a macro hast to use
" "
Also you could use
sysfunc(cats(D_TR, &a)
So finally the code will remain like:
%let a = %upcase(&var);
%put &a;
%let b=%sysfunc(cats(D_TR_,&a));
%put &b;
proc sql;
select cats(name, '=', "&var" , name)
into :prefijolista
separated by ' '
from dictionary.columns
where libname='LABO2' and memname= "&b"
and name like '_20%';
quit;
%put &prefijolista;
%put "&b";
PROC datasets library=LABO2;
modify &b;
rename &prefijolista;
quit;
%put "ult" &b;
Not very straightforward, but worked. :)

SAS - Creating variables from macro variables

I have a SAS dataset which has 20 character variables, all of which are names (e.g. Adam, Bob, Cathy etc..)
I would like a dynamic code to create variables called Adam_ref, Bob_ref etc.. which will work even if there a different dataset with different names (i.e. don't want to manually define each variable).
So far my approach has been to use proc contents to get all variable names and then use a macro to create macro variables Adam_ref, Bob_ref etc..
How do I create actual variables within the dataset from here? Do I need a different approach?
proc contents data=work.names
out=contents noprint;
run;
proc sort data = contents; by varnum; run;
data contents1;
set contents;
Name_Ref = compress(Name||"_Ref");
call symput (NAME, NAME_Ref);
%put _user_;
run;
If you want to create an empty dataset that has variables named like some values you have in a macro variables you could do something like this.
Save the values into macro variables that are named by some pattern, like v1, v2 ...
proc sql;
select compress(Name||"_Ref") into :v1-:v20 from contents;
quit;
If you don't know how many values there are, you have to count them first, I assumed there are only 20 of them.
Then, if all your variables are character variables of length 100, you create a dataset like this:
%macro create_dataset;
data want;
length %do i=1 %to 20; &&v&i $100 %end;
;
stop;
run;
%mend;
%create_dataset; run;
This is how you can do it if you have the values in macro variable, there is probably a better way to do it in general.
If you don't want to create an empty dataset but only change the variable names, you can do it like this:
proc sql;
select name into :v1-:v20 from contents;
quit;
%macro rename_dataset;
data new_names;
set have(rename=(%do i=1 %to 20; &&v&i = &&v&i.._ref %end;));
run;
%mend;
%rename_dataset; run;
You can use PROC TRANSPOSE with an ID statement.
This step creates an example dataset:
data names;
harry="sally";
dick="gordon";
joe="schmoe";
run;
This step is essentially a copy of your step above that produces a dataset of column names. I will reuse the dataset namerefs throughout.
proc contents data=names out=namerefs noprint;
run;
This step adds the "_Refs" to the names defined before and drops everything else. The variable "name" comes from the column attributes of the dataset output by PROC CONTENTS.
data namerefs;
set namerefs (keep=name);
name=compress(name||"_Ref");
run;
This step produces an empty dataset with the desired columns. The variable "name" is again obtained by looking at column attributes. You might get a harmless warning in the GUI if you try to view the dataset, but you can otherwise use it as you wish and you can confirm that it has the desired output.
proc transpose out=namerefs(drop=_name_) data=namerefs;
id name;
run;
Here is another approach which requires less coding. It does not require running proc contents, does not require knowing the number of variables, nor creating a macro function. It also can be extended to do some additional things.
Step 1 is to use built-in dictionary views to get the desired variable names. The appropriate view for this is dictionary.columns, which has alias of sashelp.vcolumn. The dictionary libref can be used only in proc sql, while th sashelp alias can be used anywhere. I tend to use sashelp alias since I work in windows with DMS and can always interactively view the sashelp library.
proc sql;
select compress(Name||"_Ref") into :name_list
separated by ' '
from sashelp.vcolumn
where libname = 'WORK'
and memname = 'NAMES';
quit;
This produces a space delimited macro vaiable with the desired names.
Step 2 To build the empty data set then this code will work:
Data New ;
length &name_list ;
run ;
You can avoid assuming lengths or create populated dataset with new variable names by using a slightly more complicated select statement.
For example
select compress(Name)||"_Ref $")||compress(put(length,best.))
into :name_list
separated by ' '
will generate a macro variable which retains the previous length for each variable. This will work with no changes to step 2 above.
To create populated data set for use with rename dataset option, replace the select statement as follows:
select compress(Name)||"= "||compress(_Ref")
into :name_list
separated by ' '
Then replace the Step 2 code with the following:
Data New ;
set names (rename = ( &name_list)) ;
run ;