Length and concat in PROC SQL SAS - sas

I want to define length for some particular columns in select statement and i want to concatenate the two columns i.e sponsor id and sponsor like "ABC-123" in SAS proc sql . Please help
here is the code
proc sql;
select
project_id,
sponsor_id,
empl_country,
region,
empl_dept_descr,
empl_bu_descr,
sponsor,
full_name,
mnth_name FROM Stage0;
quit;

The CATX function will concatenate any number of arguments, of any type, strip the values and place a common delimiter (also stripped) between each. For example:
proc sql;
create table want as
select
catx('-', name, age) as name_age length=20
, catx(':', name, sex, height) as name_gender_height
from sashelp.class;
The length of a new variable will be 200 characters if the variable the CATX result is being assigned to does not have a length specified.
Stripping means the leading and trailing spaces are removed. Arguments that are missing values do not become part of the concatenation.
SAS Documentation for CATX

If you don't know the length you can use a strip() function which will remove leading and trailing spaces, in this case it will remove the spaces generated by the catx() default length:
strip(catx('-', sponsor_id, sponsor))
Code:
proc sql;
select
project_id,
sponsor_id,
empl_country,
region,
empl_dept_descr,
empl_bu_descr,
sponsor,
full_name,
mnth_name ,
/* New column */
strip(catx('-', sponsor_id, sponsor)) as new_id
FROM Stage0;
quit;

Related

How to remove first 4 and last 3 letters, digits or punctuation using PROC SQL in SAS Enterprise Guide?

I am trying to remove 009, and ,N from 009,A,N just to obtain the letter A in my dataset. Please guide how do I obtain such using PROC SQL in SAS or even in the data step. I want to keep the variable name the same and just remove the above-mentioned digits, letters and punctuation from the data.
Is your original value saved in one variable? If so, you can utilize the scan function in a data step or in Proc SQL to extract the second element from a string that contains comma delimiter:
data want (drop=str rename=(new_str=str));
set orig;
length new_str $ 1;
new_str = strip(scan(str,2,','));
run;
proc sql;
create table work.want
as select *, strip(scan(str,2,',')) as new_str length=1
from orig;
quit;
In Proc SQL, if you want to replace the original column with the updated column, you can replace the * in the SELECT clause with the names of all columns other than original variable you are modifying.
What you code really depends on what other values there are in the other rows of the data set.
From the one sample value you provide the following code could be what you want.
data have;
input myvar $char80.;
datalines;
009,A,N
009-A-N
Okay
run;
proc sql;
update work.have
set myvar = scan(myvar,2,',')
where count(myvar,',') > 1
;

Renaming all variables from a SAS Table

I have two SAS tables which are the same, only the column names aren't the same.
The first table D1 has 80 column names that have the following pattern X1000_a010_b020 and the second table D2 has 80 column names that have the following pattern X_1000_a0010_b0020. Please note that they are not in the same order.
I want to make sure that all the columns from D1 have the same names as in D2. In other words, I want to add the underscore after the X and add a 0 after all the a's and b's.
However I don't how to proceed. I would guess that RegEx would be the go to but I am not familiar with it.
As a structure example, some times ago I was using the following code to replace spaces in a column name with an underscore. I would like to do the same but for the underscore after the X and the 0 after the a's and b's.
%macro rename_vars(table);
%local rename_list sqlobs;
proc sql noprint;
select catx('=',nliteral(name),translate(trim(name),'_',' '))
into :rename_list separated by ' '
from sashelp.vcolumn
where libname=%upcase("%scan(work.&table,-2,.)")
and memname=%upcase("%scan(&table,-1,.)")
and indexc(trim(name),' ')
;
quit;
%if &sqlobs %then %do ;
proc datasets lib=%scan(WORK.&table,-2);
modify %scan(&table,-1);
rename &rename_list;
run;
quit;
%end;
%mend rename_vars;
Your example code seems to show you have a plan for how to implement the renaming so let's just concentrate on generating the OLDNAME <-> NEWNAME pairs. You can generate a list of names in a particular dataset with PROC CONTENTS or querying DICTIONARY.COLUMNS with SQL code (or SASHELP.VCOLUMN with any tool). So let's assume you have a dataset named CONTENTS that contains a variable named NAME. So the goal is to create a new variable, which we can call NEWNAME.
So let's just translate the three transformations you say you need directly into individual actions. You can collapse the steps if you want, but there is no pressing need for efficiency in this operation.
data fixed_names;
set contents;
newname = tranwrd(upcase(name),'_A','_A0');
newname = tranwrd(newname,'_B','_B0');
newname = cats(char(newname,1),'_',substr(newname,2));
keep name newname;
run;
Now you could pull that list into a macro variable. So a space delimited list of old=new pairs is useful for rename.
proc sql noprint;
select catx('=',name,newname) into :renames
from fixed_names
where newname ne upcase(name)
;
quit;
Or if the goal is to literally compare the two datasets you might want to generate one list of old names and a separate list of new names.
select name,newname
into :oldlist separated by ' '
, :newlist separated by ' '
from fixed_names
;
Which you could then use with PROC COMPARE directly without any need to rename any variables.
proc compare data=DS1 compare=DS2 ;
var &oldlist;
with &newlist;
run;

appending text to all columns at once in sas

I have a tables which have columns from col1 to col10.
I would like to append a string such as italy_col1 to italy_col10.
how can I achieve this without a macro.
Since i am joining multiple table i want to append a text "Italy" for all column in table 1 and "USA" in table 2. I tried below example it doesnt suit my requirement
https://support.sas.com/kb/48/674.html
cats function appends all the values in the column of the tables. Any suggestions?
One way is to generate macro variables and then use those in your code.
First get lists of variables to rename from TABLE1 and TABLE2.
proc sql noprint;
select catx('=',name,cats('Italy_',name) into :rename1 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE1" and upcase(name) ne 'ID'
;
select catx('=',name,cats('USA_',name) into :rename2 separated by ' '
from dictionary.columns
where libname="WORK" and memname="TABLE2" and upcase(name) ne 'ID'
;
quit;
Then use the list of rename pairs in your code that merges the datasets.
data want;
merge table1(rename=(&rename1)) table2(rename=(&rename2));
by id;
run;
Note this will only work when the number of variables to rename is small enough to fit into a single macro variable. If the list is longer just use another method, such as a data step, to generate the same code.
Also watch out for variable names that are too long. SAS has a limit of 32 bytes for variable names so adding 4 or 7 extra characters might result in names that are too long. You might just truncate to 32 characters , but then you risk forming duplicate names.

SAS Proc SQL Trim not working?

I have a problem that seems pretty simple (probably is...) but I can't get it to work.
The variable 'name' in the dataset 'list' has a length of 20. I wish to conditionally select values into a macro variable, but often the desired value is less than the assigned length. This leaves trailing blanks at the end, which I cannot have as they disrupt future calls of the macro variable.
I've tried trim, compress, btrim, left(trim, and other solutions but nothing seems to give me what I want (which is 'Joe' with no blanks). This seems like it should be easier than it is..... Help.
data list;
length id 8 name $20;
input id name $;
cards;
1 reallylongname
2 Joe
;
run;
proc sql;
select trim(name) into :nameselected
from list
where id=2;
run;
%put ....&nameselected....;
Actually, there is an option, TRIMMED, to do what you want.
proc sql noprint;
select name into :nameselected TRIMMED
from list
where id=2;
quit;
Also, end PROC SQL with QUIT;, not RUN;.
It works if you specify a separator:
proc sql;
select trim(name) into :nameselected separated by ''
from list
where id=2;
run;

SAS sum variables using name after a proc transpose

I have a table with postings by category (a number) that I transposed. I got a table with each column name as _number for example _16, _881, _853 etc. (they aren't in order).
I need to do the sum of all of them in a proc sql, but I don't want to create the variable in a data step, and I don't want to write all of the columns names either . I tried this but doesn't work:
proc sql;
select sum(_815-_16) as nnl
from craw.xxxx;
quit;
I tried going to the first number to the last and also from the number corresponding to the first place to the one corresponding to the last place. Gives me a number that it's not correct.
Any ideas?
Thanks!
You can't use variable lists in SQL, so _: and var1-var6 and var1--var8 don't work.
The easiest way to do this is a data step view.
proc sort data=sashelp.class out=class;
by sex;
run;
*Make transposed dataset with similar looking names;
proc transpose data=class out=transposed;
by sex;
id height;
var height;
run;
*Make view;
data transpose_forsql/view=transpose_forsql;
set transposed;
sumvar = sum(of _:); *I confirmed this does not include _N_ for some reason - not sure why!;
run;
proc sql;
select sum(sumvar) from transpose_Forsql;
quit;
I have no documentation to support this but from my experience, I believe SAS will assume that any sum() statement in SQL is the sql-aggregate statement, unless it has reason to believe otherwise.
The only way I can see for SAS to differentiate between the two is by the way arguments are passed into it. In the below example you can see that the internal sum() function has 3 arguments being passed in so SAS will treat this as the SAS sum() function (as the sql-aggregate statement only allows for a single argument). The result of the SAS function is then passed in as the single parameter to the sql-aggregate sum function:
proc sql noprint;
create table test as
select sex,
sum(sum(height,weight,0)) as sum_height_and_weight
from sashelp.class
group by 1
;
quit;
Result:
proc print data=test;
run;
sum_height_
Obs Sex and_weight
1 F 1356.3
2 M 1728.6
Also note a trick I've used in the code by passing in 0 to the SAS function - this is an easy way to add an additional parameter without changing the intended result. Depending on your data, you may want to swap out the 0 for a null value (ie. .).
EDIT: To address the issue of unknown column names, you can create a macro variable that contains the list of column names you want to sum together:
proc sql noprint;
select name into :varlist separated by ','
from sashelp.vcolumn
where libname='SASHELP'
and memname='CLASS'
and upcase(name) like '%T' /* MATCHES HEIGHT AND WEIGHT */
;
quit;
%put &varlist;
Result:
Height,Weight
Note that you would need to change the above wildcard to match your scenario - ie. matching fields that begin with an underscore, instead of fields that end with the letter T. So your final SQL statement will look something like this:
proc sql noprint;
create table test as
select sex,
sum(sum(&varlist,0)) as sum_of_fields_ending_with_t
from sashelp.class
group by 1
;
quit;
This provides an alternate approach to Joe's answer - though I believe using the view as he suggests is a cleaner way to go.