SAS Safe Column Names (Updated) - sas

Is there a simple way in SAS to convert a string to a SAS-safe name that would be used as a column name?
ie.
Rob Penridge ----> Rob_Penridge
$*#'Blah#* ----> ____Blah__
I'm using a proc transpose and then want to work with the renamed columns after the transpose.
EDIT:
8 year follow-up... is there now a better way to do this? I feel like I saw a better method sometime back but I'm struggling to find any documentation/examples now that I need to do it.

proc transpose will take those names without any modification, as long as you set options validvarname=any;
If you want to work with the columns afterwards, you can use the NLITERAL function to construct named literals that can be used to refer to them:
options validvarname=any;
/* Create dataset and transpose it */
data zz;
var1 = "Rob Penridge";
var2 = 5;
output;
var1 = "$*#'Blah#*";
var2 = 100;
output;
run;
proc transpose
data = zz
out = zz_t;
id var1;
run;
/* Refer to the transposed columns in the dataset using NLITERAL */
data _null_;
set zz;
call symput(cats("name", _n_), nliteral(var1));
run;
data blah;
set zz_t;
&name1. = &name1. + 5;
&name2. = &name2. + 200;
run;

May try perl regular expression function.
Since for column name, the first character should not be numerical, it's more complicated then.
data _null_;
name1 = "1$*#' Blah1#*";
name2 = prxchange("s/[^A-Za-z_]/_/",1,prxchange("s/[^A-Za-z_0-9]/_/",-1,name1));
put name2;
run;

Take a look at the VALIDVARNAME System Option. It might allow you to accept non-valid SAS names.
Also the NOTNAME function could facilitate in helping find invalid characters.

How about using SAS's regular expression functionality? For example:
data names;
set name;
name_cleaned = prxchange('s/[^a-z0-9 ]/_/i', -1, name);
run;
This will convert anything that isn't a letter, number, or space into a _. You can add other characters that you want to allow to the list after the 9. Just be aware that some characters are "special" and must be preceded by a \.

You could also use the IDLABEL statement in the transpose to add labels that match the original values. Then use the VARLABEL function to retrieve the labels and work with them that way.

Related

How to remove first 4 and last 3 letters, digits or punctuation using PROC SQL in SAS Enterprise Guide?

I am trying to remove 009, and ,N from 009,A,N just to obtain the letter A in my dataset. Please guide how do I obtain such using PROC SQL in SAS or even in the data step. I want to keep the variable name the same and just remove the above-mentioned digits, letters and punctuation from the data.
Is your original value saved in one variable? If so, you can utilize the scan function in a data step or in Proc SQL to extract the second element from a string that contains comma delimiter:
data want (drop=str rename=(new_str=str));
set orig;
length new_str $ 1;
new_str = strip(scan(str,2,','));
run;
proc sql;
create table work.want
as select *, strip(scan(str,2,',')) as new_str length=1
from orig;
quit;
In Proc SQL, if you want to replace the original column with the updated column, you can replace the * in the SELECT clause with the names of all columns other than original variable you are modifying.
What you code really depends on what other values there are in the other rows of the data set.
From the one sample value you provide the following code could be what you want.
data have;
input myvar $char80.;
datalines;
009,A,N
009-A-N
Okay
run;
proc sql;
update work.have
set myvar = scan(myvar,2,',')
where count(myvar,',') > 1
;

Combine one column's values into a single string

This might sound awkward but I do have a requirement to be able to concatenate all the values of a char column from a dataset, into one single string. For example:
data person;
input attribute_name $ dept $;
datalines;
John Sales
Mary Acctng
skrill Bish
;
run;
Result : test_conct = "JohnMarySkrill"
The column could vary in number of rows in the input dataset.
So, I tried the code below but it errors out when the length of the combined string (samplkey) exceeds 32K in length.
DATA RECKEYS(KEEP=test_conct);
length samplkey $32767;
do until(eod);
SET person END=EOD;
if lengthn(attribute_name) > 0 then do;
test_conct = catt(test_conct, strip(attribute_name));
end;
end;
output; stop;
run;
Can anyone suggest a better way to do this, may be break down a column into chunks of 32k length macro vars?
Regards
It would very much help if you indicated what you're trying to do but a quick method is to use SQL
proc sql NOPRINT;
select name into :name_list separated by ""
from sashelp.class;
quit;
%put &name_list.;
As you've indicated macro variables do have a size limit (64k characters) in most installations now. Depending on what you're doing, a better method may be to build a macro that puts the entire list as needed into where ever it needs to go dynamically but you would need to explain the usage for anyone to suggest that option. This answers your question as posted.
Try this, using the VARCHAR() option. If you're on an older version of SAS this may not work.
data _null_;
set sashelp.class(keep = name) end=eof;
length long_var varchar(1000000);
length want $256.;
retain long_var;
long_var = catt(long_var, name);
if eof then do;
want = md5(long_var);
put want;
end;
run;

extracting a list of observations from a single sas cell

I have a sas dataset that has a list of variables embedded within a single character variable, delimited by pipes. It looks something like this:
Obs. List_of_forms
1,"|FormA(04-15-2003)||FormB(04-15-2004)|",
2,"|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|"
I would like to extract each of the items delimited by pipes as individual variables, so the data would look something like this:
Obs., form1, form2, form3
1,"FormA(04-15-2003)","FormB(04-15-2004)",.,
2,"FormA(04-15-2002)","FormA(04-15-2003)","FormB(04-15-2003)"
But I'm at a loss for how to do this. I've thought about coding a do-loop to iterate through each pipe, but this seems needlessly complex. Any advice for a more elegant solution?
Use the SCAN() function. First we can setup your example data.
data have ;
obs+1;
input list_of_forms $60. ;
cards;
|FormA(04-15-2003)||FormB(04-15-2004)|
|FormA(04-15-2002)||FormA(04-15-2003)||FormB(04-15-2003)|
;;;;
Now we can convert it to multiple columns.
data want;
set have ;
array form (3) $60 ;
do i=1 to dim(form);
form(i) = scan(list_of_forms,i,'|');
end;
drop i;
run;
To make it more dynamic you could find the maximum number of values over the whole dataset and replace the hard coded upper bound of 3 on the new variables.
proc sql noprint ;
select max(countw(list_of_forms,'|'))
into :nforms
from have
;
run;
...
array form (&nforms) $60 ;

Text manipulation of macro list variables to stack datasets with automated names

I have written a macro that accepts a list of variables, runs a proc mixed model using each variable as a predictor, and then exports the results to a dataset with the variable name appended to it. I am trying to figure out how to stack the results from all of the variables in a single data set.
Here is the macro:
%macro cogTraj(cog,varlist);
%let j = 1;
%let var = %scan(&varlist, %eval(&j));
%let solution = sol;
%let outsol = &solution.&var.;
%do %while (&var ne );
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &outsol;
run;
%let j = %eval(&j + 1);
%let var = %scan(&varlist, %eval(&j));
%let outsol = &solution.&var.;
%end;
%mend;
/* Example */
%cogTraj(mmmscore, varlist = bio1 bio2 bio3);
The result would be the creation of Solbio1, Solbio2, and Solbio3.
I have created a macro variable containing the "varlist" (Ideally, I'd like to input a macro variable list as the argument but I haven't figured out how to deal with the scoping):
%let biolist = bio1 bio2 bio3;
I want to stack Solbio1, Solbio2, and Solbio3 by using text manipulation to add "Sol" to the beginning of each variable. I tried the following, outside of any data step or macro:
%let biolistsol = %add_string( &biolist, Sol, location = prefix);
without success.
Ultimately, I want to do something like this;
data Solbio_stack;
set %biolistsol;
run;
with the result being a single dataset in which Solbio1, Solbio2, and Solbio3 are stacked, but I'm sure I don't have the right syntax.
Can anyone help me with the text string/dataset stacking issue? I would be extra happy if I could figure out how to change the macro to accept %biolist as the argument, rather than writing out the list variables as an argument for the macro.
I would approach this differently. A good approach for the problem is to drive it with a dataset; that's what SAS is good at, really, and it's very easy.
First, construct a dataset that has a row for each variable you're running this on, and a variable name that contains the variable name (one per row). You might be able to construct this using PROC CONTENTS or sashelp.vtable or dictionary.tables, if you're using a set of variables from one particular dataset. It can also come from a spreadsheet you import, or a text file, or anything else really - or just written as datalines, as below.
So your example would have this dataset:
data vars_run;
input name $ cog $;
datalines;
bio1 mmmscore
bio2 mmmscore
bio3 mmmscore
;;;;
run;
If your 'cog' is fairly consistent you don't need to put it in the data, if it is something that might change you might also have a variable for it in the data. I do in the above example include it.
Then, you write the macro so it does one pass on the PROC MIXED - ie, the inner part of the %do loop.
%macro cogTraj(cog=,var=, sol=sol);
proc mixed data = datuse;
model &cog = &var &var*year /solution cl;
random int year/subject = id;
ods output SolutionF = &sol.&var.;
run;
%mend cogTraj;
I put the default for &sol in there. Now, you generate one call to the macro from each row in your dataset. You also generate a list of the sol sets.
proc sql;
select cats('%cogTraj(cog=',cog,',var=',name,',sol=sol)')
into :callList
sepearated by ' '
from have;
select cats('sol',name') into :solList separated by ' '
from have;
quit;
Next, you run the macro:
&callList.
And then you can do this:
data sol_all;
set &solList.;
run;
All done, and a lot less macro variable parsing which is messy and annoying.

How to combine text and numbers in catx statement

The variable upc is already defined in my cool dataset. How do I convert it to a macro variable? I am trying to combine both text and numbers. For example blah should equal upc=123;
data cool;
set cool;
blah = catx("","upc=&upc","ccc")
run;
If upc is a numeric variable and you just want to include its value into some character string then you don't need to do anything special. Concatenation function will convert it into character before concatenating automatically:
data cool;
blah = catx("","upc=",upc,"ccc");
run;
The result:
upc----blah
123 upc= 123ccc
BTW, if you want to concatenate strings without blanks between them, you can use function CATS(), which strips all leading and trailing spaces from each argument.
The following test code works for my SAS 9.3 x64 PC.
Please note that:
1.symputx() provide the connection between dataset and macro variables.
2.cats() will be more appropriate than catx() if delimiting characters are not needed.
3.If you did not attempt to create a new data set, data _NULL_ is fine.
You can check the log to see that the correct values are being assigned.
Bill
data a;
input ID $ x y ##;
datalines;
A 1 10 A 2 20 A 3 30
;
run;
options SymbolGen MPrint MLogic MExecNote MCompileNote=all;
data _NULL_;
set a;
call symputx(cats("blah",_N_),cats(ID,x),"G");
run;
%put blah1=&blah1;
%put blah2=&blah2;
%put blah3=&blah3;