load values from datasets into arrays and use them in a datastep - sas

I have 5 separate datasets (actually many more, but I want to keep the code short) named dk33, dk34, dk35, dk51 and dk63. Each dataset contains a numeric field, surv_probs. I would like to load the values into 5 arrays and then use the arrays in a data step (result), but I need advice on the best way to do it.
I get the following errors when I use the macro setarrays (code below):
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to Tom's question:
* The macro getlength, when executed, creates 5 macro variables named dk33_rows, dk34_rows, dk35_rows, dk51_rows, dk63_rows.
* The macro readin, when executed, creates 5 macro variables named dk33_list, dk34_list, dk35_list, dk51_list, dk63_list. Each contains a comma-separated string of the values from the column, e.g. 0.99994,0.1999,0.1111
* The macro setarrays, when executed, creates 5 arrays, dk33_arr, dk34_arr, ..., holding the parsed values from the macro variables created by readin.

I find that "macro arrays" like VAR1, VAR2, ... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that, or, if the list is short enough, put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
Either way, it is also better to avoid writing macro code that needs more than three &'s. Build up the reference in multiple steps: build a macro variable that holds the name of the macro variable you want to reference, and then pull the value of that into another macro variable. It might take more lines of code, but it is much easier to understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
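For the single-macro-variable option mentioned above, a minimal sketch might look like the following. The names dslist and list_datasets are made up for illustration; the dataset names are the ones from the question.

```sas
/* Hypothetical list of dataset names held in ONE macro variable */
%let dslist = dk33 dk34 dk35 dk51 dk63;

%macro list_datasets;
  %local i ds;
  /* COUNTW gives the number of space-delimited names in the list */
  %do i = 1 %to %sysfunc(countw(&dslist, %str( )));
    /* %SCAN pulls out the i-th name; no &&var&i indirection needed */
    %let ds = %scan(&dslist, &i, %str( ));
    %put NOTE: item &i is &ds;
  %end;
%mend list_datasets;

%list_datasets
```

Inside the %DO loop, &ds can be used wherever a dataset name is needed, so a single & is enough throughout.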
Before you begin using macro code (or other code generation techniques), make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need for macro code, or for copying values, or even counts, into macro variables. For example, instead of getting the count of the observations, you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* What ever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but I have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution, which is fully data driven.
1. Store the number of terms in a macro variable to set the length of the array
2. Load the terms to search into a temporary array
3. Loop through each term and search for it
4. Exit the loop as soon as a term is found, to speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
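A rough sketch of such a macro, wrapping the same code-generation step shown above, could look like this. The macro name flag_terms, all of its parameter names, and the fileref _code are assumptions made for illustration; an extra out= parameter is added so the input dataset is not overwritten.

```sas
/* Hypothetical macro: flags rows of &data where &var contains any  */
/* value of &termvar from the &terms dataset.                        */
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
  filename _code temp;

  /* Write one assignment statement: flag = index(..)>0 or index(..)>0 ... ; */
  data _null_;
    set &terms end=eof;
    file _code;
    if _n_ = 1 then put "&flag = " @;
    else put ' or ' @;
    put "index(&var, " &termvar :$quote. ') > 0' @;
    if eof then put ';';
  run;

  /* Pull the generated statement into the data step */
  data &out;
    set &data;
    %include _code / source2;
  run;

  filename _code clear;
%mend flag_terms;

/* Example call, assuming the TERMS dataset built in the question */
%flag_terms(data=sashelp.baseball, var=team, terms=terms,
            termvar=search_term, out=want)
```

Note that INDEX is case sensitive; if a case-insensitive match is wanted, FIND with the 'i' modifier (as in the other answer) could be generated instead.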

put values to a file using functions without creating new variables

I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
data _null_;
set sashelp.class;
array _n[*] _numeric_;
array _f[3] _temporary_;
put _n_ @;
do _n_ = 1 to dim(_f);
_f[_n_] = log(_n[_n_]);
put _f[_n_]= @;
end;
put ;
run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until it gets a valid SAS varname as output from the md5() function. You could even then test the target dataset to make sure the variable doesn't already exist, and build that logic into the above function if needed.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
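The existence check mentioned above could be sketched as a small function-style macro. The name var_exists is an assumption for illustration; OPEN, VARNUM and CLOSE are standard SAS functions called through %SYSFUNC.

```sas
/* Hypothetical helper: returns 1 if variable &var exists in dataset &ds, else 0 */
%macro var_exists(ds, var);
  %local dsid rc result;
  %let result = 0;
  %let dsid = %sysfunc(open(&ds));          /* 0 if the dataset cannot be opened */
  %if &dsid %then %do;
    /* VARNUM returns the variable's position, or 0 if it is not found */
    %if %sysfunc(varnum(&dsid, &var)) > 0 %then %let result = 1;
    %let rc = %sysfunc(close(&dsid));
  %end;
  &result
%mend var_exists;

%put Age exists: %var_exists(sashelp.class, Age);
%put Hashed name exists: %var_exists(sashelp.class, C34FD80ED9E856160E59FCEBF37F00D2);
```

A name generator could call this in its loop and keep incrementing the counter until the check returns 0.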

SAS:how to use index to pick out macro array variable

I created a macro array using:
proc sql;
select distinct variable into:numVarList separated by ' ' from Map_num;
I used:
%put &numVarList{1};
and it gave me all the variables: var1 var2 var3{1}
How do I use an index to pick out one element of the macro array?
Update 20180305
It is strange that
%put &numVarList.;
gives: age agenc_non_ccbt_fnd_bal chmtpd_tmpnt_bal crnyr_cnter_tdnum
while
%put %sysfunc(scan(&numVarList.,1,str( )));
gives: age agnc_non_ccb
Why, and how do I fix it?
You do not create an array with your select. The result is just a string: var1 var2 var3
However you can access each element with the scan-function:
%let first_ele = %scan(&numVarList.,1,%str( ));
The result is: var1
You can also loop your string like this:
%do i=1 %to %sysfunc(countw(&numVarList.,%str( )));
%put %scan(&numVarList.,&i.,%str( ));
%end;
Concatenation of values
proc sql;
select distinct variable into:numVarList separated by ' ' from Map_num;
populates a single macro variable with a value, that can be construed as a list, which is a concatenation of the distinct values in the column named "variable".
For such a list you would scan out the individual items as shown by @zuluk.
In your case when the original values are names of variables, the resolution of the concatenation can be used directly as part of a SAS statement that accepts variable name lists, such as Proc PRINT; VAR &numVarList or DATA _NULL_; ARRAY v &numVarList
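As a small illustration of using the list directly, here is a sketch that assumes the list holds the numeric variable names of sashelp.class (hard-coded here so the example is self-contained; the variable total is made up).

```sas
/* Stand-in for the list that the SELECT ... INTO would produce */
%let numVarList = Age Height Weight;

/* The resolved list is used as-is wherever a variable list is accepted */
proc print data=sashelp.class;
  var &numVarList;
run;

data _null_;
  set sashelp.class;
  array v &numVarList;          /* array over the listed variables */
  total = sum(of v[*]);
  put Name= total=;
run;
```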
Macro array
The concept macro-array is simply a set of macro variables (which can be thought of as 'symbols' when too many 'variable' ideas are colliding) with a common basename and increasing numeric suffix. Such a set of macro variables is created by using a slightly different syntax in Proc SQL.
select distinct variable
into :symbol1-:symbol9999
from Map_num
The 9999 represents a large number that you do not expect to exceed. If the data has N <= 9999 rows then only N macro variables will be created. If N > 9999, only 9999 macro variables will be created. Caution: too many macro variables can fill the macro symbol table and cause errors in your SAS session. For me, macro arrays are more a programming concept than a programming construct.
For example
Proc SQL noprint;
select name into :name1-:name9999 from sashelp.class;
%let name_count = &sqlobs;
quit;
%put NOTE: &=name1;
%put NOTE: &=name2;
%put NOTE: name&name_count=%superq(name&name_count); * almost same as next;
%put NOTE: name&name_count=&&name&name_count; * almost same as prev;
When the name of the macro array is itself held in a macro variable (one level of abstraction), complete resolution is achieved by coding the 'tricky triple-hat' &&&
%macro log_macroArray (basename);
%local i count_symbol value_symbol;
%let count_symbol = &basename._count;
%do i = 1 %to &&&count_symbol;
%let value_symbol = &basename.&i;
%put NOTE: &value_symbol=&&&value_symbol;
%end;
%mend;
%log_macroArray(name);
The SAS macro system 'loops' internally during its value resolution phase, collapsing each && to & at each step of its internal evaluation.
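A tiny worked example of that rescanning, with made-up macro variable names (name1, i, which):

```sas
%let name1 = Alfred;
%let i     = 1;
%let which = name1;   /* holds the NAME of another macro variable */

/* Pass 1: && -> &, &i -> 1      gives &name1 */
/* Pass 2: &name1                gives Alfred */
%put &&name&i;

/* Pass 1: && -> &, &which -> name1   gives &name1 */
/* Pass 2: &name1                     gives Alfred */
%put &&&which;
```

Turning on options symbolgen; before running this writes each resolution step to the log, which can help when the ampersands get hard to follow.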
Building on @zuluk's answer, you cannot use an operator (like { }) to access a macro "array" since it's not a part of the language and it's not possible to overload operators in SAS... mostly ... but you can do a function-style macro easily.
proc sql;
select name into :namelist separated by ' '
from sashelp.class;
quit;
%macro marray(list, n);
%scan(&list.,&n.)
%mend marray;
%put %marray(&namelist,2);
That is pretty close to what you're looking for, just not quite the same syntax. If you then wanted to build new variables/etc., you could do so through the macro as well, though it might be more complicated to write a general macro given there are lots of ways you might want to do that. Here's a non-function-style version.
%macro m_to_array(list, n);
*optionally - if you want to not specify n;
%let n = %sysfunc(countw(&&&list));
%do _i = 1 %to &n;
%global &list.&_i.;
%let &list.&_i. = %scan(&&&list.,&_i.);
%end;
%mend m_to_array;
%m_to_array(namelist);
%put _global_;

Create SAS macro to create a macro variable

I have created a SAS macro, macro A, that takes in a variable name and returns transformed versions of that name i.e. if you run %A(asdf) you get out asdf_log asdf_exp asdf_10. I want to write another macro, macro B, that takes the output from the first macro and appends it together into a new macro variable.
%macro B(varList, outputName);
%let &outputName =
%A(var1);
%A(var2);
;
%mend
Is almost what I want to do, except that it obviously doesn't compile.
I am also not sure if this is possible in SAS.
As a further complication, the input to macro B is a list of variable that I want to run macro A for and append into one long list of variable names.
Why? Because I have a macro that runs on a list of variables and I want to run it on a transformed variable list.
Example:
I have %let varList = x y; and I want as an output x_log x_exp x_10 y_log y_exp y_10. To do this I want two macros one, macro A, that returns the transformed variables names:
%macro A(var);
&var._log
&var._exp
&var._10
%mend
I can't get the second macro (B as written above) to work properly.
So if the inner macro is just returning characters, that is it doesn't actually generate any non macro statements, then why not make the outer one work the same way?
%macro inner(x);
&x._log &x._exp &x._10
%mend;
%macro outer(list);
%local i;
%do i=1 %to %sysfunc(countw(&list));
%inner(%scan(&list,&i))
%end;
%mend outer;
%let want=%outer(X y Z);
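This is not too hard. You need to loop over the values in varList, appending results to outputName. You also need to declare outputName as GLOBAL so it will be accessible outside %B. Note that the current value of the output variable has to be read with &&&outputName, because &outputName alone resolves only to its name.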
%macro B(varList, outputName);
%global &outputName;
%let &outputName = ;
%local i n var;
%let n = %sysfunc(countw(&varList));
%do i=1 %to &n;
%let var = %scan(&varList,&i);
%let &outputName = &&&outputName %A(&var);
%end;
%mend;

merge 6000 variables with the merge command line

I've got the following issue and I'm not sure on how to do this.
I'm trying to merge 6000 variables through the code below
Please find below the piece of code I've written for two of the variables
data big_aat_1;
merge Aat_1(rename=(var14=var14_t0 var28=var28_t_0))
Aat_2(rename=(var14=var14_t_1 var28=var28_t_1))
Aat_3(rename=(var14=var14_t_2 var28=var28_t_2))
Aat_4(rename=(var14=var14_t_3 var28=var28_t_3))
Aat_5(rename=(var14=var14_t_4 var28=var28_t_4))
Aat_6(rename=(var14=var14_t_5 var28=var28_t_5));
by nouv_date;
run;
My aim is to try to automate my piece of code for the 6000 variables I have and keep the way I'm doing it e.g. with the merge.
With all the variables, the result would look like the one below. The ... represents the rest of the variables.
data big_aat_1;
merge Aat_1(rename=(var14=var14_t0 var28=var28_t_0 var37=var37_t_0 ...))
Aat_2(rename=(var14=var14_t_1 var28=var28_t_1 var37=var37_t_1 ...))
Aat_3(rename=(var14=var14_t_2 var28=var28_t_2 var37=var37_t_2 ...))
Aat_4(rename=(var14=var14_t_3 var28=var28_t_3 var37=var37_t_3 ...))
Aat_5(rename=(var14=var14_t_4 var28=var28_t_4 var37=var37_t_4 ...))
Aat_6(rename=(var14=var14_t_5 var28=var28_t_5 var37=var37_t_5 ...));
by nouv_date;
run;
There are 2 things I need to state
1) I have a dataset / table that contains all the distinct variable names (e.g. var14, var28 ...). It would be great if I can use it. The name of the dataset is dicoAg
2) I need to keep the merge for some reasons I cannot talk about here.
If you have any insight
I started creating test data sets (you obviously already have them):
%MACRO P;
%DO I=1 %TO 6;
data aat_&I;
%DO J=1 %TO 6000;
var&J=&J;
%END;
nouv_date=1;output;
run;
%END;
%MEND;
%P;
and then I used proc contents to have a list of the variables (you can skip this step and use dicoAg):
proc contents data=aat_1 varnum out=vars;run;
and then you have sas write the rename code for you:
data _NULL_;
set vars /*dicoAg*/(where=(NAME^="nouv_date")) end=fine;
file "MyPath\Rename.sas";
if _N_=1 then do;
put '%MACRO RENAME(J=); ';
put '(rename=( ';
end;
/*instead of NAME use the variable in dicoAg which contains all the variables' names*/
put ' ' NAME '=' NAME +(-1) '_&J';
if fine then do;
put ' )) ';
put '%MEND; ';
end;
run;
you include the code:
%include "MyPath\Rename.sas";
and at the end you write the macro to do the merge:
%MACRO P;
data big_aat_1;
merge
%DO D=1 %TO 6;
aat_&D. %RENAME(J=&D)
%END;
;
by nouv_date;
run;
%MEND;
%P;
Everyone,
Without going into full details, my man-a and I did that
data big_aat_1;
merge %do j=1 %to 6 ; Aat_&j(rename=(%do i=1 %to &&&&nvar&&pays&l ; &&&&var&&pays&l.._&i=&&&&var&&pays&l.._&i.._t%eval(&j-1) %end ; )) %end ; ;
by nouv_date;
run;
Not perfect nor superbly efficient but doing the trick.
Explanation :
&&&&nvar&&pays&l is the max number of variables
&&&&var&&pays&l.._&i is the variable
The results will give you something like this
merge Aat_1(rename=( var1=var1_t0 var31=var31_t0 var60=var60_t0 var90=var90_t0 var119=var119_t0 ...
Aat_6(rename=( var1=var1_t5 var31=var31_t5 var60=var60_t5 var90=var90_t5 var119=var119_t5...
Best.