SAS column name replacement scheme

I have a pass-through query using proc sql (excerpt below), in which some of the resulting names are modified since they aren't valid SAS names. 1A is replaced by _A, 2A is replaced by _A0, and some other changes are made. My questions are:
Is there a document which explains the rules for name replacement e.g. 2A becomes _A0?
Is it possible for me to change the way SAS corrects the names? For example, can I make 1A become _1A instead of _A?
proc sql;
connect to oracle as clc([omitted]);
CREATE table out.bk_ald as
SELECT *
FROM connection to bpm (
SELECT
, "1A"
, "1B"
, "1C"
, "1D"
, "1E"
, "2A"
, "2B"
, "2C"
...

You cannot change the algorithm, and I am not sure whether it is published. But you could rename the columns yourself on the Oracle side:
select * from connection to oracle (select "1A" as "_1A", ...);
Or rename on the SAS side. SAS will store the original name as the variable's LABEL. You could query the metadata and use that to rename the variables.
proc contents data=bk_ald noprint out=contents; run;
proc sql noprint ;
select catx('=',name,cats('_',label)) into :rename separated by ' '
from contents
where upcase(name) ne upcase(label)
;
quit;
data want ;
set bk_ald;
rename &rename ;
run;
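As a minimal sketch of the SAS-side rename, the toy dataset below stands in for the pass-through result. The variable names and labels are made up for illustration (they mimic 1A becoming _A and 2A becoming _A0); only the PROC CONTENTS and PROC SQL steps come from the answer.

```sas
/* Toy stand-in for the pass-through result: SAS-safe names, */
/* with the original Oracle names kept as labels (assumed).  */
data bk_ald;
  _A  = 1;
  _A0 = 2;
  label _A = '1A' _A0 = '2A';
run;

proc contents data=bk_ald noprint out=contents; run;

proc sql noprint;
  /* CATX takes the delimiter first, then the pieces to join */
  select catx('=', name, cats('_', label))
    into :rename separated by ' '
  from contents
  where upcase(name) ne upcase(label);
quit;

/* The log should show the pairs _A=_1A and _A0=_2A */
%put rename=&rename;
```

Writing the list to the log before using it in a RENAME statement makes it easy to spot a pair that would produce an invalid or duplicate name.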

Related

Renaming all variables from a SAS Table

I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However, I don't know how to proceed. I would guess that RegEx would be the way to go, but I am not familiar with it.
As a structure example, some time ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same, but for the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;
Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
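To check the three steps on a single name, a small sketch like the following can be run on its own. The input value is the sample pattern from the question; nothing else is assumed.

```sas
data _null_;
  length name $32;
  name = 'X1000_a010_b020';
  /* Same three steps as above, applied to one sample name */
  newname = tranwrd(upcase(name), '_A', '_A0');
  newname = tranwrd(newname, '_B', '_B0');
  newname = cats(char(newname, 1), '_', substr(newname, 2));
  put name= newname=;   /* newname=X_1000_A0010_B0020 */
run;
```

The result is upper case, which does not matter for renaming because SAS variable names are not case sensitive.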
Now you could pull that list into a macro variable. So a space delimited list of old=new pairs is useful for rename.
proc sql noprint;
select catx('=',name,newname) into :renames separated by ' '
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=DS1 compare=DS2 ;
var &oldlist;
with &newlist;
run;

SAS - Change date format returned from database

I'm pulling data from many Teradata tables that have dates stored in MM/DD/YYYY format (ex: 8/21/2003, 10/7/2013). SAS returns them as DDMMMYYYY, or DATE9 format (ex: 21AUG2003, 07OCT2013). Is there a way to force SAS to return date variables as MM/DD/YYYY, or MMDDYY10 format? I know I can manually specify this for specific columns, but I have a macro set up to execute the same query for 65 different tables:
%macro query(x);
proc sql;
connect using dbase;
create table &x. as select * from connection to dbase
(select *
from table.&x.);
disconnect from dbase;
quit;
%mend query;
%query(bankaccount);
%query(budgetcat);
%query(timeattendance);
Some of these tables will have date variables and some won't. So I'd like the value to be returned as MMDDYY10 format by default. Thanks for your help!
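Per the comments on my question, I was able to figure this out using the FMTINFO function. I pretty much used the code below; replacing DDMMYYS10. with MMDDYY10. in the generated FORMAT statement gives the format I asked about: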
proc contents data=mylib._all_ noprint out=contents;
run;
data _null_;
set contents;
where fmtinfo(format,'cat')='date';
by libname memname ;
if first.libname then call execute(catx(' ','proc datasets nolist lib=',libname,';')) ;
if first.memname then call execute(catx(' ','modify',memname,';format',name)) ;
else call execute(' '||trim(name)) ;
if last.memname then call execute(' DDMMYYS10.; run;') ;
if last.libname then call execute('quit;') ;
run;
Found here:
https://communities.sas.com/t5/SAS-Procedures/Change-DATE-formats-to-DDMMYYS10-for-ALL-unknown-number-date/td-p/366637
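For one table, the idea can be sketched as follows. The table and column names (work.bankaccount, opened) are hypothetical; the PROC DATASETS step is the kind of code the CALL EXECUTE calls above generate, written with MMDDYY10. instead of DDMMYYS10.

```sas
/* Hypothetical table with one date column that arrives as DATE9. */
data work.bankaccount;
  opened = '21AUG2003'd;
  format opened date9.;
run;

/* FMTINFO reports the category of a format name; */
/* date formats are expected to report 'date'.    */
data _null_;
  cat = fmtinfo('DATE', 'cat');
  put cat=;
run;

/* Change only the format stored in the table header; */
/* the underlying date values are not rewritten.      */
proc datasets nolist lib=work;
  modify bankaccount;
  format opened mmddyy10.;
  run;
quit;
```

Because PROC DATASETS edits the header in place, this stays cheap even when it is repeated across all 65 tables.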

sql does not respect my fcmp function length

Can anybody explain to me how to get PROC SQL to give the results of my custom function the length I specify in the function definition? Datastep does it fine, but SQL gives me the default length of 200 characters.
Here is code that demonstrates the issue:
proc fcmp outlib = work.funcs.funcs ;
* Note type/length specification ;
function testy(istr $) $11 ;
return ('bibbitybobb') ;
endsub ;
quit ;
options cmplib = work.funcs ;
data from_dstep ;
set sashelp.class ;
tes = testy(name) ;
run ;
proc sql ;
create table from_sql as
select *
, testy(name) as tes
from sashelp.class
;
describe table from_dstep ;
describe table from_sql ;
quit ;
My log on that is:
47 describe table from_dstep ;
NOTE: SQL table WORK.FROM_DSTEP was created like:
create table WORK.FROM_DSTEP( bufsize=65536 )
(
Name char(8),
Sex char(1),
Age num,
Height num,
Weight num,
tes char(11)
);
48 describe table from_sql ;
NOTE: SQL table WORK.FROM_SQL was created like:
create table WORK.FROM_SQL( bufsize=65536 )
(
Name char(8),
Sex char(1),
Age num,
Height num,
Weight num,
tes char(200)
);
As you can see, the datastep gave me a length of 11 on my 'tes' variable, but sql gives me 200.
Is there a way to get length 11 when using SQL?
Unfortunately, I don't think so. SQL and the data step work differently in this regard, and some built-in functions show the same issue (CATS and CATX, for example, have different default lengths in SQL than in the data step). I think it has to do with how compilation works in the data step versus interpretation in SQL. I believe I've seen a statement that this is expected behavior, but I can't find it right now; if you'd like more detail and nobody else here can provide it, consider opening a track with SAS support to see what they say.
You can directly set it in SQL of course:
proc sql ;
create table from_sql as
select *
, testy(name) as tes length 11
from sashelp.class
;
quit;

Reading a list of names into a SAS Macro

I am trying to read a list of values into a macro, so that the macro variable would contain the table name and create a column that would contain the table name.
My attempt, which is wrong, was trying to use the code below, and erroring out because of the line " '&tbl' as Table_Dt ". The code below is inefficient, so feel free to enhance it. Thanks for your help.
%macro flat(tbl);
proc sql exec feedback stimer noprint outobs=5;
CREATE TABLE &tbl as
SELECT
ID,
DOB,
'&tbl' as Table_Dt
FROM &tbl..flat_file;
QUIT;
%mend flat;
%flat(flat0113);
%flat(flat0213);
...
%flat(flat1213);
As you are basically processing a list, this could also be done using call execute. No need to write all the information to macro variables. All tables/libraries are already stored in the sashelp tables and therefore are ready for list processing.
data _null_;
set sashelp.vslib (where=(substr(libname,1,4) = 'FLAT')) end =eof;
if _n_ = 1 then call execute ('proc sql exec feedback stimer noprint outobs=5;');
call execute ('
CREATE TABLE '|| libname ||' AS
SELECT ID,
DOB,
"'||compress(libname)||'" as Table_Dt
FROM '||compress(libname)||'.flat_file
;
');
if eof then call execute ('QUIT;');
run;
Macro variables resolve only inside double quotes, not single quotes. For a more efficient approach, you can use the following modified code. I am assuming that you are reading from libraries named flat0113, flat0213, etc.
Step 1: Get a list of all the libnames with the word "flat" in it
proc sql noprint;
select distinct libname
into :tbl_list separated by ' '
from sashelp.vmember
where libname LIKE 'FLAT%'
;
/* SQLOBS holds the number of rows returned by the SELECT above */
%let total_tbls = &sqlobs;
quit;
This will create two macro variables: &tbl_list, and &total_tbls.
&tbl_list holds the values FLAT0113 FLAT0213 ... FLAT1213 (libnames are stored in upper case).
&total_tbls holds the total number of values in &tbl_list.
Step 2: Loop through the newly created list
%macro readTables;
%do i = 1 %to &total_tbls;
%let tbl = %scan(&tbl_list, &i);
proc sql exec feedback stimer noprint outobs=5;
CREATE TABLE &tbl as
SELECT
ID,
DOB,
"&tbl" as Table_Dt
FROM &tbl..flat_file;
quit;
%end;
%mend;
%readTables;
This will read each individual value from &tbl_list one by one until the very end of the list.
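The quoting rule that caused the original error can be seen in isolation with a sketch like this. The value flat0113 is taken from the question; the variable names in_single and in_double are made up for illustration.

```sas
%let tbl = flat0113;

data _null_;
  in_single = '&tbl';   /* stays as the literal text &tbl    */
  in_double = "&tbl";   /* macro variable resolves to flat0113 */
  put in_single= in_double=;
run;
```

The log line reads in_single=&tbl in_double=flat0113, which is why the column in the original macro held the text &tbl rather than the table name.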

SAS : How to iterate a dataset elements within the proc sql WHERE statement?

I need to create multiple tables using proc sql
proc sql;
/* first city */
create table London as
select * from connection to myDatabase
(select * from mainTable
where city = 'London');
/* second city */
create table Beijing as
select * from connection to myDatabase
(select * from mainTable
where city = 'Beijing');
/* . . the same thing for other cities */
quit;
The names of those cities are in the sas table myCities
How can I embed the data step into proc sql in order to iterate through all cities ?
proc sql noprint;
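/* single quotes: most databases treat double-quoted text as an identifier (second QUOTE argument needs SAS 9.4 or later) */
select quote(trim(city_varname), "'") into :cities separated by ',' from myCities;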
quit;
*This step above creates a list as a macro variable to be used with the in() operator below. EDIT: Per Joe's comment, added quote() function so that each city will go into the macro-var list within quotes, for proper referencing by in() operator below.
create table all_cities as
select * from connection to myDatabase
(select * from mainTable
where city in (&cities));
*this step is just the step you provided in your question, slightly modified to use in() with the macro-variable list defined above.
One relatively simple solution to this is to do this entirely in a data step. Assuming you can connect via libname (which if you can connect via connect to you probably can), let's say the libname is mydb. Using a similar construction to Max Power's for the first portion:
proc sql noprint;
select city_varname
into :citylist separated by ' '
from myCities;
select cats('%when(var=',city_varname,')')
into :whenlist separated by ' '
from myCities;
quit;
%macro when(var=);
when ("&var.") output &var.;
%mend when;
data &citylist.;
set mydb.mainTable;
select(city);
&whenlist.
otherwise;
end;
run;
If you're using most of the data in mainTable, this probably wouldn't be much slower than doing it database-side, as you're moving all of the data anyway - and likely it would be faster since you only hit the database once.
Even better would be to pull this to one table (like Max shows), but this is a reasonable method if you do need to create multiple tables.
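To make the macro expansion concrete, here is roughly what the generated data step looks like for two cities. This is a sketch with a made-up work.mainTable standing in for mydb.mainTable; the amount column and the sample rows are assumptions.

```sas
/* Hypothetical stand-in for the database table */
data work.mainTable;
  input city :$20. amount;
  datalines;
London 10
Beijing 20
London 30
Paris 40
;
run;

/* What the data step resolves to when &citylist is   */
/* London Beijing and &whenlist holds the two %when calls */
data London Beijing;
  set work.mainTable;
  select (city);
    when ("London")  output London;
    when ("Beijing") output Beijing;
    otherwise;   /* rows for other cities are dropped */
  end;
run;
```

With this input, London receives two rows and Beijing one, and the table is read only once.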
You need to put your proc sql code into a SAS Macro.
Create a macro-variable for City (in my example I called the macro-variable "City").
Execute the macro from a data step program. Since the data step processes each observation once, there is no need to create complex logic to iterate.
data mycities;
infile datalines dsd;
input macrocity :$32.;
datalines;
London
Beijing
Buenos_Aires
;
run;
%macro createtablecity(city=);
proc sql;
/* all cities */
create table &city. as
select * from connection to myDatabase
(select * from mainTable
where city = %unquote(%str(%')&city.%str(%')));
quit;
%mend;
data _null_;
set mycities;
city = macrocity;
call execute(cats('%nrstr(%createtablecity)(city=', city, ')'));
run;
Similar to the other solutions here really, maybe a bit simpler... Pull out a distinct list of cities, place into macros, run SQL query within a do loop.
Proc sql noprint;
Select distinct city
Into :n1-:n999
From connection to mydb
(Select *
From mainTable)
;
/* SQLOBS holds the number of distinct cities just stored in n1, n2, ... */
%let c = &sqlobs;
Quit;
%macro createTables;
%do a=1 %to &c;
Proc sql;
Create table &&n&a as
Select *
From connection to myDb
(Select *
From mainTable
Where city=%unquote(%str(%')&&n&a%str(%')))
;
Quit;
%end;
%mend createTables;
%createTables;