Appending text to all column names at once in SAS

I have a table with columns col1 to col10.
I would like to add a prefix to every column name, for example italy_col1 to italy_col10.
How can I achieve this without a macro?
Since I am joining multiple tables, I want to add the text "Italy" to all columns in table 1 and "USA" to all columns in table 2. I tried the example below, but it doesn't suit my requirement:
https://support.sas.com/kb/48/674.html
The CATS function there concatenates the values in the columns of the table rather than changing the column names. Any suggestions?

One way is to generate macro variables and then use those in your code.
First get lists of variables to rename from TABLE1 and TABLE2.
proc sql noprint;
select catx('=',name,cats('Italy_',name)) into :rename1 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE1" and upcase(name) ne 'ID'
;
select catx('=',name,cats('USA_',name)) into :rename2 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE2" and upcase(name) ne 'ID'
;
quit;
Then use the list of rename pairs in your code that merges the datasets.
data want;
merge table1(rename=(&rename1)) table2(rename=(&rename2));
by id;
run;
Note this will only work when the list of rename pairs is short enough to fit into a single macro variable (at most 65,534 bytes). If the list is longer, just use another method, such as a data step, to generate the same code.
Also watch out for variable names that are too long. SAS has a limit of 32 bytes for variable names, so adding 4 or 6 extra characters (USA_ or Italy_) might result in names that are too long. You might just truncate to 32 characters, but then you risk forming duplicate names.
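As a rough sketch of the data step alternative mentioned above: the step below writes a RENAME statement to a temporary file and then pulls it in with %INCLUDE, so no macro variable is involved. The fileref CODE is a made-up name, and this version renames TABLE1 in place with PROC DATASETS rather than using the RENAME= dataset option, so treat it as one possible variation and not the only way to do it. It also assumes the variable names are ordinary SAS names (no spaces or special characters).

```sas
/* Sketch only: CODE is a hypothetical fileref; TABLE1 and ID come from the example above */
filename code temp;

data _null_;
  set sashelp.vcolumn end=last;
  where libname = 'WORK' and memname = 'TABLE1' and upcase(name) ne 'ID';
  file code;
  /* Open the RENAME statement on the first selected row */
  if _n_ = 1 then put 'rename';
  /* One old=new pair per line, so there is no overall length limit */
  put '  ' name '=Italy_' name;
  /* Close the statement after the last pair */
  if last then put ';';
run;

/* Apply the generated statement; this changes TABLE1 itself */
proc datasets lib=work nolist;
  modify table1;
  %include code;
  run;
quit;
```

The same pattern can be repeated with 'USA_' for TABLE2 before merging the two tables by ID.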
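To see in advance whether any name would break the 32-byte limit, a query along these lines could be run first. It is only a sketch that reuses the TABLE1 and 'Italy_' names from the example above; any row it prints is a variable whose prefixed name would be too long.

```sas
/* List variables whose prefixed name would exceed 32 bytes */
proc sql;
  select name, length(name) + length('Italy_') as new_length
    from dictionary.columns
    where libname = 'WORK' and memname = 'TABLE1'
      and upcase(name) ne 'ID'
      and length(name) + length('Italy_') > 32;
quit;
```

If the query returns no rows, the generated rename list is safe to use as far as name length is concerned.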

Related

SAS - Keep only columns listed in a separate dataset

I have two datasets. The first, big_dataset, has around 3000 columns, most of which are never used. The second, column_list, contains a single column called column_name with around 100 values. Each value is the name of a column I want to keep.
I want to filter big_dataset so that only columns in column_list are kept, and the rest are discarded.
If I were using Pandas dataframes in Python, this would be a trivial task:
cols = column_list['column_name'].tolist()
smaller_dataset = big_dataset[cols]
However, I can't figure out the SAS equivalent. Proc Transpose doesn't let me turn the rows into headers. I can't figure out a statement in the data step that would let this work, and as far as I'm aware this isn't something that Proc SQL could handle. I've read through the docs on Proc Datasets and that doesn't seem to have what I need either.
To obtain a list of columns from column_list to use against big_dataset, you can query the column_list table and put the result into a macro variable. This can be achieved with PROC SQL and the SEPARATED BY clause:
proc sql noprint;
select column_name
into :cols separated by ','
from column_list;
create table SMALLER_DATASET AS
select &cols.
from WORK.BIG_DATASET;
quit;
Alternatively you may use SEPARATED BY ' ' and then use the resulting list in a KEEP statement or dataset option:
proc sql noprint;
select column_name
into :cols separated by ' '
from column_list;
quit;
data small_dataset;
set big_dataset (keep=&cols.);
/* or keep=&cols.; */
run;

Renaming all variables from a SAS Table

I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However, I don't know how to proceed. I would guess that RegEx would be the go-to, but I am not familiar with it.
As a structural example, some time ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same, but adding the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;
Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
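To check the three steps on a single name before running them against the real list, a throwaway data step like the following can help. It is a minimal sketch that hardcodes the sample name from the question; the $32 length is an assumption chosen to match the maximum SAS variable name length.

```sas
/* Trace the transformation on one sample name from the question */
data _null_;
  length name newname $32;
  name = 'X1000_a010_b020';
  newname = tranwrd(upcase(name), '_A', '_A0');   /* X1000_A0010_B020   */
  newname = tranwrd(newname, '_B', '_B0');        /* X1000_A0010_B0020  */
  newname = cats(char(newname, 1), '_', substr(newname, 2)); /* X_1000_A0010_B0020 */
  put name= newname=;
run;
```

The result is in upper case, which does not matter for renaming because SAS variable names are not case sensitive.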
Now you could pull that list into a macro variable. A space-delimited list of old=new pairs is what a RENAME statement needs.
proc sql noprint;
select catx('=',name,newname) into :renames separated by ' '
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
proc sql noprint;
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
quit;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=D1 compare=D2 ;
var &oldlist;
with &newlist;
run;

SAS - Creating smaller datasets from a bigger one iteratively based on the distinct values of a certain column

I have a huge dataset with a column (variable) that has 100 distinct values. I want to split this dataset into 100 smaller datasets, one per distinct value, and to do so in a loop (iteratively). It has been suggested that I use a macro, but I'm unable to do that.
Solution with macro variables only: build the list of dataset names and the output statements in PROC SQL with an INTO clause:
proc sql;
select distinct 'WORK.cars_'|| origin
, 'when ("'|| trim(origin) ||'") output cars_'|| origin
into :cars_data separated by ' '
, :cars_when separated by '; '
from sashelp.cars;
quit;
Leave out the double quotes for an integer criterion.
For a float criterion, convert the decimal point to an underscore in the dataset name, since a dot is not valid there.
data &cars_data.;
set sashelp.cars;
select (origin);
&cars_when.;
end;
run;
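For the integer case mentioned above, the same pattern might look like the sketch below. It uses SASHELP.CLASS and its numeric AGE variable purely as an illustration (the macro variable names CLASS_DATA and CLASS_WHEN are made up), and it assumes the criterion has no missing values, since a missing value would produce an invalid dataset name.

```sas
/* Build dataset names and WHEN clauses from a numeric variable (no quotes needed) */
proc sql noprint;
  select distinct cats('WORK.class_', age)
       , cats('when (', age, ') output class_', age)
    into :class_data separated by ' '
       , :class_when separated by '; '
    from sashelp.class;
quit;

/* One pass over the data writes each row to the dataset for its age */
data &class_data.;
  set sashelp.class;
  select (age);
    &class_when.;
  end;
run;
```

CATS converts the numeric value to text and strips the blanks, so no TRIM call is needed here.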

How to change the column headers of a sas dataset into an observation?

I have written SAS code which generates many SAS datasets. Now I want to append all of them to a single Excel file. So first I want to convert the column headers of each SAS dataset into its first observation, then leave space between the datasets (by adding a blank observation). How can we do it?
One way to do this would be to use dictionary.columns:
proc sql;
create table Attribute as
select * from dictionary.columns;
quit;
Read through the table and check which attributes you are interested in. In your case you are probably interested in the column NAME, which holds the names of all columns.
Restrict the table by adding a where clause to the proc sql step that identifies the columns you need (library, member type, dataset name), e.g. where upcase(libname)= "WORK"
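data attribute;
/* n and length are placeholders: set n to the number of rows in ATTRIBUTE */
/* and length to the longest name (32 is the maximum for a SAS name) */
array column [ n ] $ length ;
do i=1 to n;
set attribute ( keep = name) ;
column [ i ] = name ;
end;
run;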
Then I would proceed with a data step. You could store the column names in a macro variable with select ... into :, but either way you still need to hardcode the array size n, or use some other method that stores the values in a single observation. Also remember to define the length and type of the array accordingly. You can name the variables in the resulting dataset ATTRIBUTE by adding var1-varn after the length in the array statement.
For simplicity I use a set statement to read the observations one at a time and store the value of NAME, the column name reported by dictionary.columns, into the array.
Note that a non-temporary array creates variables in the output dataset.
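One way to avoid hardcoding n is to count the columns first and pass the count through a macro variable. The following is only a sketch under assumptions: WORK.DS1 stands in for one of your datasets, HEADER is a made-up output name, and the dataset is assumed to have at least one column.

```sas
/* WORK.DS1 is a hypothetical dataset name; substitute your own */
proc sql noprint;
  /* Count the columns so the array size does not have to be hardcoded */
  select count(*) into :n trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'DS1';

  /* One row per column, in the order the columns appear in DS1 */
  create table attribute as
    select name, varnum
    from dictionary.columns
    where libname = 'WORK' and memname = 'DS1'
    order by varnum;
quit;

/* Collapse the rows of ATTRIBUTE into a single observation of names */
data header;
  array column [&n] $ 32;   /* 32 bytes is the longest possible SAS name */
  do i = 1 to &n;
    set attribute (keep = name);
    column[i] = name;
  end;
  drop i name;
run;
```

The resulting HEADER dataset has one row with variables column1 to column&n holding the column names.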
To add the blank row as well, use:
data younameit ;
merge attribute attribute(firstobs=2 keep=name rename=(name=_name));
output;
if name ne _name then do;
call missing(of _all_);
output;
end;
run;
Because the second copy of the dataset starts one observation later, _name holds the name from the next row. Column names are not duplicated within one dataset, so name and _name always differ, and each valid observation (written by the first output statement) is followed by an empty row produced by call missing(of _all_); output;
Sounds like you just want to combine the datasets and write the results to the Excel file. Do you really need the extra empty row?
libname out xlsx 'myfile.xlsx';
data out.report ;
set ds1 ds2 ...;
run;
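CALL LABEL copies each variable's label (or its name, if it has no label) into the variable you give as the second argument, which must be character. So ensure that all your columns are character, then in your data step use: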
array names{*} _character_;
do i=1 to dim(names);
call label(names{i}, names{i});
end;
output;

Renaming Column with Dynamic Name

I have a column in a .xls file (imported with the excelcs engine) whose name is dynamic and changes from day to day.
I am wondering how to reference and rename that dynamic column within SAS without knowing beforehand what it will be called?
This depends some on how it dynamically changes. If it is entirely unpredictable - you cannot write code to figure it out, or to eliminate the other known columns - your simplest option may be to use GETNAMES=NO, and then set the names yourself.
If it is in some way predictable (such as it is "MYDYNAMIC_XXXX" where XXXX changes in some fashion), you can probably figure it out from dictionary.columns. (Modify libname/memname/etc. as appropriate; memname is the dataset name.)
proc sql;
select name into :dynname
from dictionary.columns
where libname='WORK' and memname='MYDATASET'
and name like 'MYDYNAMIC_%';
quit;
Alternately, you could use a NOT(IN(...)) clause to eliminate the known column names, if you need to know that.
Finally, if it is in one consistent location, easier than using GETNAMES=NO may be to query dictionary.columns based on the variable number (where varnum=5 for example if it is the fifth variable number).
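The NOT IN variant, followed by the actual rename, could be sketched roughly as follows. MYDATASET comes from the example above; the known column names ID, REPORT_DATE and AMOUNT and the target name DYNAMIC_COL are invented for illustration. The sketch assumes exactly one column is left after excluding the known ones; if several remain, only the first is stored in the macro variable.

```sas
/* ID, REPORT_DATE, AMOUNT and DYNAMIC_COL are hypothetical names */
proc sql noprint;
  /* NLITERAL quotes the name as a name literal if it is not a standard SAS name */
  select nliteral(name) into :dynname trimmed
    from dictionary.columns
    where libname = 'WORK' and memname = 'MYDATASET'
      and upcase(name) not in ('ID', 'REPORT_DATE', 'AMOUNT');
quit;

/* Give the dynamic column a fixed name so later code can rely on it */
proc datasets lib=work nolist;
  modify mydataset;
  rename &dynname = dynamic_col;
  run;
quit;
```

After this step the rest of the program can refer to DYNAMIC_COL regardless of what the column was called in that day's file.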
Expanding on Joe's last comment - is the column in the same position, just called something different?
If so, you can use the dictionary.columns table, selecting the specific column number, and storing the corresponding column name in a macro variable.
Example, your column is the 5th column in Excel/dataset...
/* Pull column name */
proc sql ;
select name into :DYNVAR
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and varnum = 5 ;
quit ;
/* Then to reference the column simply substitute it for &DYNVAR */
data want ;
set sashelp.class (keep=&DYNVAR) ;
run ;
You could then extend this to multiple columns if necessary...
/* Pull column name */
proc sql ;
select name into :DYNVARS separated by ' '
from dictionary.columns
where libname = 'SASHELP'
and memname = 'CLASS'
and varnum in (1,4,5) ;
quit ;
/* Then to reference the columns simply substitute it for &DYNVARS */
data want ;
set sashelp.class (keep=&DYNVARS) ;
run ;