SAS - Define array of letters

Is there a shorthand in SAS for defining a sequence of letters in an array?
Many languages possess a mechanism for doing so easily and I imagine SAS does too, although I'm unable to find a reference for it.
For instance, in R I could do
> x <- letters[1:4]
> x
[1] "a" "b" "c" "d"
In Python, one way is
>>> import string
>>> list(string.ascii_lowercase[:4])
['a', 'b', 'c', 'd']
In SAS, I currently am having to list the letters explicitly,
data _null_;
array letters (4) $ _temporary_ ('a', 'b', 'c', 'd');
do i = 1 to hbound(letters);
put letters(i);
end;
run;

You can use the COLLATE() function to generate a string of single-byte characters. If you don't know the ASCII code for the start of the block of characters you want, use the RANK() function to look it up.
So if you only want four characters starting from 'a', you could do it this way (the assignment truncates the result to the variable's length of 4).
length str $4 ;
str = collate(rank('a'));
Or you could use the optional second parameter to COLLATE(), which gives the ending position, to control how many characters you get.
length str $4 ;
str = collate(rank('a'),rank('a')+vlength(str)-1);
There is no need for an "array", just use a variable.
data _null_;
length str $4 ;
str = collate(rank('a'));
do i=1 to vlength(str);
ch = char(str,i);
put i= ch= :$quote. ;
end;
run;
Result:
i=1 ch="a"
i=2 ch="b"
i=3 ch="c"
i=4 ch="d"
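If an array really is required, as in the question, the same COLLATE() string can be used to fill one. The following is only a sketch under that assumption; the array name and its size are illustrative, and it assumes an ASCII session like the rest of this answer.

```sas
/* Sketch: fill a temporary array from a COLLATE() string. */
/* The array name and size are illustrative assumptions.   */
data _null_;
  array letters[4] $1 _temporary_;
  length str $4 ch $1;
  /* characters from 'a' up to as many as the array holds */
  str = collate(rank('a'), rank('a') + dim(letters) - 1);
  do i = 1 to dim(letters);
    letters[i] = char(str, i);  /* i-th character of the string */
  end;
  do i = 1 to dim(letters);
    ch = letters[i];
    put i= ch=;
  end;
run;
```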

Not that I'm aware of, but it is trivial to write a macro to do that.
%macro letter_sequence(start=1,end=, lower=1);
%local i addon;
%if &lower=1 %then %let addon=96;
%else %let addon=64;
%do i = &start+&addon. %to &end.+&addon.;
"%sysfunc(byte(&i.))"
%end;
%mend letter_sequence;
data test;
array x[4] $ (%letter_sequence(end=4));
put x[2]=;
run;

Another option is to use the collate function and the call pokelong routine:
/*Upper case*/
data _null_;
array a[26] $1;
call pokelong(collate(65,65+25),addrlong(a1),26);
put _All_;
run;
/*Lower case*/
data _null_;
array a[26] $1;
call pokelong(collate(97,97+25),addrlong(a1),26);
put _All_;
run;
This bypasses all the usual mechanisms for assigning values for individual variables and takes advantage of the default memory layout used by SAS for character arrays, copying the whole alphabet in one go starting at the address for the first element.
N.B. call pokelong might not be available in some locked-down SAS environments, e.g. SAS University Edition. Also, this might not work properly with temporary arrays in SAS 9.1.3 or earlier on some platforms.
I think this is the only way to do this in SAS without either hard-coding your letters or writing some sort of loop.

You can use a combination of the rank function (which converts a character to its ASCII value) and the byte function (which converts back the other way).
data _null_;
length seq $51; /* define seq as character variable */
do i = rank('a') to rank('d'); /* loop through ascii values of required letters */
call catx(' ',seq,byte(i)); /* concatenate letters */
end;
put seq; /* print final output */
run;

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but have tried searching and piecing together code from some online sites, unfortunately, when I run it, it does nothing, not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's a temporary array solution that is fully data driven:
1. Store the number of terms in a macro variable to set the size of the array.
2. Load the terms to search for into a temporary array.
3. For each observation, loop through the array and search for each term.
4. Exit the loop as soon as a term is found, to speed up the process.
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
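A macro along those lines might look like the sketch below. The macro name, its parameter names and the fileref `_gen` are all hypothetical and chosen only for illustration; the body is the same code-generation approach shown above, wrapped in a macro. Note that INDEX() is case sensitive, and that search terms containing & or % would be seen by the macro processor because the generated literals are double quoted.

```sas
/* Sketch only: macro and parameter names are assumptions, not from the original answer. */
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
  filename _gen temp;

  /* Write one assignment statement: flag = index(...) or index(...) ... ; */
  data _null_;
    set &terms end=eof;
    file _gen;
    if _n_ = 1 then put "&flag = " @;
    else put ' or ' @;
    put "index(&var, " &termvar :$quote. ')' @;
    if eof then put ';';
  run;

  /* Pull the generated statement into the step that creates the flag */
  data &out;
    set &data;
    %include _gen / source2;
  run;

  filename _gen clear;
%mend flag_terms;

/* Example call using the datasets from the question */
%flag_terms(data=sashelp.baseball, var=team, terms=terms, termvar=search_term, out=want)
```

If the terms dataset is empty, no statement is generated and the flag variable is not created, so a real implementation would probably want to handle that case.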

load values from datasets into arrays and use them in a datastep

I have 5 separate datasets (actually many more, but I want to shorten the code) named dk33, dk34, dk35, dk51 and dk63. Each dataset contains a numeric field, surv_probs. I would like to load the values into 5 arrays and then use the arrays in a data step (result), but I need advice on the best way to do it.
I am getting an error when I use the macro setarrays (code below):
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to Tom's question:
*the macro getlength (when executed) creates 5 macro variables named dk33_rows, dk34_rows, dk35_rows, dk51_rows, dk63_rows
*the macro readin (when executed) creates 5 macro variables dk33_list, dk34_list, dk35_list, dk51_list, dk63_list, each containing a comma-separated string of the values from the column, e.g. 0.99994,0.1999,0.1111
*the macro setarrays (when executed) creates 5 arrays, dk33_arr, dk34_arr, ..., holding the parsed values from the macro variables created by readin
I find that "macro arrays" like VAR1,VAR2,.... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that. Or if the list is short enough put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
But either way it is also better to avoid trying to write macro code that needs more than three &'s. Build up the reference in multiple steps. Build a macro variable that has the name of the macro variable you want to reference and then pull the value of that into another macro variable. It might take more lines of code, but you can more easily understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
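The single-list alternative mentioned above (one macro variable plus %SCAN()) can be sketched as follows. The macro name `loop_datasets` and its `list=` parameter are hypothetical; the loop body only writes a message to the log where real code would generate a step for each dataset.

```sas
/* Sketch: iterate over a space-delimited list held in one macro variable. */
/* Macro and parameter names are illustrative assumptions.                 */
%macro loop_datasets(list=);
  %local i dsname;
  %do i = 1 %to %sysfunc(countw(&list, %str( )));
    %let dsname = %scan(&list, &i, %str( ));
    %put NOTE: processing dataset &dsname;
  %end;
%mend loop_datasets;

%loop_datasets(list=dk33 dk34 dk35 dk51 dk63)
```

This needs no indirect references at all, so there is never more than one & in front of a name.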
Before you begin using macro code (or other code generation techniques), make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need for macro code, or for copying values, or even counts, into macro variables. For example, instead of getting the count of the observations you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* Whatever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;

Converting numeric variables to character in SAS

I have two datasets, both with the same variable names. In one of the datasets two variables are character, whereas in the other dataset all variables are numeric. I use the following code to convert the numeric variables to character, but the numbers are changing, e.g. 490.6 -> 491.
How can I do the conversion so that the numbers wouldn't change?
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, 8.) ;
Day2 = put(Day2_Character, 8.) ;
drop Day14_Character Day2_Character ;
run;
Your posted code is confused. Half of it looks like code to convert from character to numeric and half looks like it is for the other direction.
To convert to character use the PUT() function. Normally you will want to left align the resulting string. You can use the -L modifier on the end of the format specification to left align the value.
So to convert numeric variables DAY14 and DAY2 to character variables of length $8 you could use code like this:
data want ;
set have (rename=(Day14=Day14_Numeric Day2=Day2_Numeric)) ;
Day14 = put(Day14_Numeric, best8.-L) ;
Day2 = put(Day2_Numeric, best8.-L) ;
drop Day14_Numeric Day2_Numeric ;
run;
Remember that you use the PUT statement or PUT() function with formats to convert values to text, and the INPUT statement or INPUT() function with informats to convert text to values.
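As a small illustration of the two directions, here is a sketch using the value from the question; the variable names are made up for the example.

```sas
/* Sketch: format for numeric -> character, informat for character -> numeric. */
/* Variable names are illustrative.                                            */
data _null_;
  length txt $8;
  num  = 490.6;
  txt  = put(num, best8.-L);  /* PUT() with a format, left aligned */
  back = input(txt, 8.);      /* INPUT() with an informat          */
  put num= txt= back=;
run;
```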
Change the format to something like Best8.2:
data tst ;
set data (rename=(Day14=Day14_Character Day2=Day2_Character)) ;
Day14 = put(Day14_Character, best8.2) ;
Day2 = put(Day2_Character, best8.2) ;
drop Day14_Character Day2_Character ;
run;
Here is an example:
data test;
input r ;
datalines;
500.04
490.6
;
run;
data test1;
set test;
num1 = put(r, 8.2);
run;
If you do not want to specify the width and number of decimal places, you can just use the BEST. format and SAS will choose how to write each value based on the data. However, the length of the resulting variable may be large unless you specify it explicitly. This will still retain your numbers as in the original variable.

Create data set from macro variable SAS

I have a macro variable which stores a string of names, for example:
%let operation = add subtract divide multiply;
I want to transpose each element of the macro variable into a data set variable, so that each element appears as an observation. So the data set should look like:
<obs> <operation>
<1> add
<2> subtract
<3> divide
<4> multiply
Use the SCAN() function. The default delimiters will work for your example, otherwise you can specify the exact delimiters to use.
%let operation= add subtract divide multiply;
data want ;
length obs 8 operation $20 ;
do obs=1 by 1 until (operation=' ');
operation=scan("&operation",obs);
if operation ne ' ' then output;
end;
run;
I still don't know enough about what you have and what you want. This example is contrived but may give you some help regarding syntax etc.
%let operation = add subtract multiply divide;
data operation;
length &operation 8;
array operation[*] &operation (2 3 10 4);
put 'NOTE: ' (operation[*])(=);
run;
*data set of names;
proc transpose data=operation(obs=0) out=names name=operation;
var &operation;
run;
proc print;
run;

Quotation mark SAS (+) PROC FORMAT value|invalue

I'm still stuck on how SAS treats special characters.
%macro mFormat();
%do i=1 %to &numVar. ;
proc format library = work ;
invalue $ inf&&nomVar&i..s
%do j=1 %to &&numMod&i.;
"%superq(tb&i.mod&j.)" = &j.
%end;
;
run;
proc format library = work ;
value f&&nomVar&i..s
%do k=1 %to &&numMod&i.;
&k. = "%superq(tb&i.mod&k.)"
%end;
;
run;
%end;
%mend mFormat;
%mFormat();
As you can see, the program is supposed to create the formats and the informats for each variable. My only problem is when the variable name resolves to Brand, which contains
GOTAN-GOTAN
FRANCES-FRANCES
+&DECO-+DECO&
etc ...
These names lead to this error:
“ERROR: This range is repeated, or values overlap:”
I hope I can force SAS to read those names. Or perhaps this is not the best approach to generate FORMATS and INFORMATS for variables that contain these characters (&, %, -, ', ").
Because your macro is using so many global macro variables, it's hard to see the problem. That error message indicates that your macro is generating duplicate ranges for PROC FORMAT. The complete error message should tell you which range is in error; if that is all you see, my guess is that more than one of your macro variables resolves to a blank.
There is no restriction on using hyphens when defining PROC FORMAT ranges. I made up this little example to illustrate:
proc format library = work ;
invalue infs
'GOTAN-GOTAN' = 1
'FRANCES-FRANCES' = 2
'+&DECO-+DECO&' = 3;
value fs
1 = 'GOTAN-GOTAN'
2 = 'FRANCES-FRANCES'
3 = '+&DECO-+DECO&';
run;
data a;
test = 'FRANCES-FRANCES';
in_test = input(test,infs.);
put test= in_test= in_test= fs.;
run;
Although you may find some trick to solve your macro problem, I'd suggest you toss that out and use the CNTLIN option of PROC FORMAT to use a data set to create your custom formats and informats. That would certainly make things easier to maintain and might also help create some useful metadata for your project. Here is a simple example to create the same format and informat as above:
data fmt_defs;
length fmtname start label $32 type $1;
fmtname = 'INFS';
type = 'I';
start = 'GOTAN-GOTAN'; label = '1'; output;
start = 'FRANCES-FRANCES'; label = '2'; output;
start = '+&DECO-+DECO&'; label = '3'; output;
fmtname = 'FS';
type = 'N';
start = '1'; label='GOTAN-GOTAN'; output;
start = '2'; label='FRANCES-FRANCES'; output;
start = '3'; label='+&DECO-+DECO&'; output;
run;
proc format library = work cntLin=fmt_defs;
run;
You can find much more information about PROC FORMAT in the online documentation.
Good luck,
Bob
I think the hyphen is the problem for the samples you provided. Maybe you could use a character replacement function such as TRANSLATE to change the hyphen (or other problem characters) to something else, like a space or underscore.
%Let Test=One-Two;
%Put &test;
%Let Test=%sysfunc(translate(&test,%str(_),%str(-)));
%Put &test;