put values to a file using functions without creating new variables - sas

I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?

Temporary array?
data _null_;
set sashelp.class;
array _n[*] _numeric_;
array _f[3] _temporary_;
put _n_ @;
do _n_ = 1 to dim(_f);
_f[_n_] = log(_n[_n_]);
put _f[_n_]= @;
end;
put ;
run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988

The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_

I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and appends a counter to it. It increments the counter until the md5() hash starts with a letter, which makes the result a valid SAS variable name. You could also check the target dataset to make sure the variable doesn't already exist there; if it does, build that logic into the above macro.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
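The existence check mentioned above could be sketched roughly as follows. This is a minimal sketch, not part of the original answer: the macro name `var_exists` and the test name `_no_such_var_` are hypothetical, and it relies only on the standard OPEN, VARNUM and CLOSE functions called through %SYSFUNC.

```sas
/* Hypothetical helper: resolves to 1 if variable &var exists */
/* in dataset &ds, otherwise to 0.                            */
%macro var_exists(ds, var);
  %local dsid pos rc;
  %let pos = 0;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    /* VARNUM returns the variable's position, or 0 if it is absent */
    %let pos = %sysfunc(varnum(&dsid, &var));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %eval(&pos > 0)
%mend var_exists;

%put Age exists: %var_exists(sashelp.class, Age);
%put Candidate exists: %var_exists(sashelp.class, _no_such_var_);
```

A generated name could be passed through this check, and the counter in the generator incremented until the check resolves to 0.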

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to SAS macros and loops, but I have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution, which is fully data driven.
1. Store the number of terms in a macro variable to set the length of the array
2. Load the terms to search for into a temporary array
3. Loop through the terms and search each word for them
4. Exit the loop as soon as a term is found, to speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.

load values from datasets into arrays and use them in a datastep

I have 5 separate datasets (actually many more, but I want to shorten the code) named dk33, dk34, dk35, dk51, dk63; each dataset contains a numeric field, surv_probs. I would like to load the values into 5 arrays and then use the arrays in a data step (result), but I need advice on the best way to do it.
I am getting error when I use the macro: setarrays: (code below)
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to toms question:
*macro getlength(when executed) creates 5 macro variables named: dk33_rows,dk34_rows,dk35_rows,dk51_rows,dk63_rows
*the macro readin(when executed):creates 5 macro variables dk33_list,dk34_list,dk35_list,dk51_list,dk63_list. Each containing a string which is comma separates the values from the column: eg.: 0.99994,0.1999,0.1111
*the macro setarrays creates 5 arrays,when executed, dk33_arr,dk34_arr,... holding the parsed values from the macro variables created by readin
I find that "macro arrays" like VAR1,VAR2,.... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that. Or if the list is short enough put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
But either way it is also better to avoid writing macro code that needs more than three &'s. Build up the reference in multiple steps: build a macro variable that holds the name of the macro variable you want to reference, and then pull the value of that into another macro variable. It might take more lines of code, but you can more easily understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
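The other option mentioned above, a single macro variable read with %SCAN(), might look like the sketch below. The names `dslist` and `list_datasets` are hypothetical and only the dataset names come from the question; the %PUT stands in for whatever code you want to generate per dataset.

```sas
/* Hypothetical list; the dataset names are taken from the question. */
%let dslist = dk33 dk34 dk35 dk51 dk63;

%macro list_datasets;
  %local i ds;
  /* COUNTW gives the number of blank-separated items in the list */
  %do i = 1 %to %sysfunc(countw(&dslist, %str( )));
    %let ds = %scan(&dslist, &i, %str( ));
    /* Replace this line with the code to generate for each dataset */
    %put NOTE: item &i is &ds;
  %end;
%mend list_datasets;

%list_datasets
```

With this layout, adding a dataset means editing one %LET, and no reference needs more than one ampersand.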
Before you begin using macro code (or other code generation techniques) make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need for macro code, or for copying values, or even counts, into macro variables. For example, instead of getting the count of the observations you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* What ever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;

rename SAS variables in reverse order using do loops

I have 10 variables (var1-var10), which I need to rename var10-var1 in SAS. So basically I need var10 to be renamed var1, var9 var2, var8 var3, and so on.
This is the code that I used based on this paper, http://analytics.ncsu.edu/sesug/2005/PS06_05.PDF:
%macro new;
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%do j=1 %to 10 %by 1;
var.&i=var.&j
%end;
%end;
;
%mend new;
%new;
The problem I'm having is that it only renames var1 as var10, so the last iteration in the do-loop.
Thanks in advance for any help!
Emily
You really don't need to do that; you can rename variables with list references, especially if they've been named sequentially.
ie:
rename var1-var10 = var10-var1;
Here's a test that demonstrates this:
data check;
array var(10) var1-var10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
output;
run;
data want;
set check;
rename var1-var10 = var10-var1;
run;
If you do need to do it manually for some reason, then you need two arrays. Once you've assigned the variable you've lost the old variable so you can't access it anymore. So you need some sort of temporary array to hold the new values.
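A minimal sketch of that two-array approach is shown below, using the same test data as above. The output name `want_manual` and the array names `v` and `tmp` are hypothetical. Note that this moves the values between the variables rather than renaming them, which gives the same result for this data.

```sas
data check;
  array var(10) var1-var10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  output;
run;

data want_manual;
  set check;
  array v[10] var1-var10;
  array tmp[10] _temporary_;
  /* stage the original values so they survive the overwrite */
  do i = 1 to 10;
    tmp[i] = v[i];
  end;
  /* write the staged values back in reverse order */
  do i = 1 to 10;
    v[i] = tmp[11 - i];
  end;
  drop i;
run;
```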
While Reeza's answer is correct, it's probably worth going through why your method didn't work - which is another reasonable, if convoluted, way to do it.
First off, you have some minor syntax issues, such as a misplaced semicolon, periods in the wrong places (They end macro variable names, not begin them), and a missing run statement; we'll ignore those and fix them as we change the code.
Second, you have two nested loops, when you don't really want that. You don't want to do the inner code 10 times (once per iteration of j) for each iteration of i (so 100 times total); you want to do the inner code once for each iteration of both i and j.
Let's see what this fix, then, gives us:
data temp;
array var[10];
do _n_ = 1 to 15;
do _i = 1 to 10;
var[_i] = _i;
end;
output;
end;
drop _i;
run;
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.;
%end;
run;
%mend new;
%new();
Okay, so this now does something closer to what you want; but you have an issue, right? You lose the values for the second half (well, really the first half since you use %by -1) since they're not stored in a separate place.
You could do this by having a temporary dumping area where you stage the original variables, allowing you to simultaneously change the values and access the originals. A common array-based method (rather than macro-based) works this way. Here's how it would look in a macro.
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
_var&i. = var&i.;
var&i.=coalesce(_var&j., var&j.);
%end;
drop _:;
run;
%mend new;
We use coalesce() which returns the first nonmissing argument; for the first five iterations it uses var&j. but the second five iterations use _var&j. instead. Rather than use this function you could also just prepopulate the variable.
A much better option though is to use rename, as Reeza does in the above answer, but presented here with something more like your original answer:
%macro new();
data temp_one;
set temp;
rename
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.
%end;
;
run;
%mend new;
This works because rename does not actually move things around - it just sets the value of "please write this value out to _____ variable on output" to something different.
This is actually what the author in the linked paper proposes, and I suspect you just missed the rename bit. That's why you have the single semicolon after the whole thing (since it's just one rename statement, so just one ; ) rather than individual semicolons after each iteration (as you'd need with assignment).

Put list of values into several macro variables in SAS

I have a list of values defined in a macro variable, e.g.,
%let datelist = 20100614 20120309 20151215;
Now, I want to put these values into the corresponding number of macro variables. In this case, I want to put them into Date1, Date2, Date3.
Of course, I could manually type out:
%let Date1 = 20100614;
%let Date2 = 20120309;
%let Date3 = 20151215;
How can I do that in a dynamic way so that if there were 25 dates, or 2, it would still work?
Ok, I'll suggest a data step, because I prefer that over macro loops any day.
Use COUNTW() to count the number of loops required and use CALL SYMPUTX to create the macro variables. You should look into the third parameter of the function if you want to control the scope of the macro variable.
%let datelist = 20100614 20120309 20151215;
data _null_;
word = "&datelist";
n=countw(word);
do i=1 to n;
call symputx('date'||Put(i, 8. -l), scan(word, i));
end;
run;
%put &date1.;
%put &date2.;
%put &date3.;
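The third argument of CALL SYMPUTX mentioned above matters once the data step runs inside a macro. The following is a minimal sketch under that assumption; the macro name `make_dates` and the extra macro variable `date_count` are hypothetical additions, not part of the original answer.

```sas
%macro make_dates(datelist);
  data _null_;
    n = countw("&datelist", ' ');
    do i = 1 to n;
      /* 'G' writes to the global symbol table, so the values */
      /* are still available after the macro has finished     */
      call symputx(cats('date', i), scan("&datelist", i, ' '), 'G');
    end;
    call symputx('date_count', n, 'G');
  run;
%mend make_dates;

%make_dates(20100614 20120309 20151215)
%put &date_count dates: &date1 &date2 &date3;
```

Without the 'G', the variables would most likely land in the macro's local symbol table (it is non-empty because of the parameter) and disappear when the macro ends.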

Find three most recent data year for each row

I have a data set with one row for each country and 100 columns (10 variables with 10 data years each).
For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive).
This is what I have so far, but I know it's wrong because of the nested loop: it puts the same value in recent_1, recent_2 and recent_3. However, I haven't figured out how to create recent_1, recent_2 and recent_3 without two loops.
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004 -- MATERNAL_CARE_2013 recent_1 recent_2 recent_3;
%let rc = 1;
%do i = 2013 %to 2004 %by -1;
%do rc = 1 %to 3 %by 1;
%if MATERNAL_CARE_&i. ne . %then %do;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
%end;
%end; run; %mend; %test();
You don't need to use a macro to do this - just some arrays:
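data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_2004-MATERNAL_CARE_2013 recent_1-recent_3;
* index the array by year so that mc[2013] is MATERNAL_CARE_2013;
array mc {2004:2013} MATERNAL_CARE_2004-MATERNAL_CARE_2013;
array recent {3} recent_1-recent_3;
rc = 0;
* walk back from the latest year and stop once three values are found;
do i = 2013 to 2004 by -1 while (rc < 3);
if mc[i] ne . then do;
rc = rc + 1;
recent[rc] = mc[i];
end;
end;
run;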
Maybe I don't get your request, but according to your description:
"For each variable I am trying to make a new data set with the three most recent data years for that variable for each country (which might not be successive)" I created this sample dataset with dt1 and dt2 and 2 locations.
The output will be 2 datasets (and generally the number of the variables starting with DT) named DS1 and DS2 with 3 observations for each country, the first one for the first variable, the second one for the second variable.
This is the sample dataset:
data sample_ds;
length city $10 dt1 dt2 8.;
infile datalines dlm=',';
input city $ dt1 dt2;
datalines;
MS,5,0
MS,3,9
MS,3,9
MS,2,0
MS,1,8
MS,1,7
CA,6,1
CA,6,.
CA,6,.
CA,2,8
CA,1,5
CA,0,4
;
This is the sample macro:
%macro help(ds=);
data vars(keep=dt:); set &ds; if _n_ not >0; run;
%let op = %sysfunc(open(vars));
%let nvrs = %sysfunc(attrn(&op,nvars));
%let cl = %sysfunc(close(&op));
%do idx=1 %to &nvrs.;
proc sort data=&ds(keep=city dt&idx.) out=ds&idx.(where=(dt&idx. ne .)) nodupkey; by city DESCENDING dt&idx.; run;
data ds&idx.; set ds&idx.;
retain cnt;
by city DESCENDING dt&idx.;
if first.city then cnt=0; else cnt=cnt+1;
run;
data ds&idx.(drop=cnt); set ds&idx.(where=(cnt<3)); rename dt&idx.=act&idx.; run;
%end;
%mend;
You will run this macro with:
%help(ds=sample_ds);
In the first statement of the macro I select the variables on which I want to iterate:
data vars(keep=dt:); set &ds; if _n_ not >0; run;
Work on this if you want to make this work for your code, or simply rename your variables as DT1 DT2...
Let me know if it is correct for you.
When writing macro code, always keep in mind what has to be done when. SAS processes your code stepwise.
Before your sas code is even compiled, your macro variables are resolved and your macro code is executed
Then the resulting SAS Base code is compiled
Finally the code is executed.
When you write %if MATERNAL_CARE_&i. ne . %then %do, this is macro code interpreted before compilation.
At that time MATERNAL_CARE_&i. is not a variable but a text string containing a macro variable.
The first time you run through your %do i = 2013 %to 2004 %by -1, it is filled in as MATERNAL_CARE_2013, the second time as MATERNAL_CARE_2012, etc.
Then the macro %if statement is interpreted, and as the text string MATERNAL_CARE_2013 is never equal to a dot, the condition is always TRUE,
so recent_&rc. = MATERNAL_CARE_&i. is passed to your compiler unconditionally, for every year and every &rc., whether or not the value is missing. That is why all three recent_ variables end up with the same value.
You can see that if you run your code with options mprint;
The solution:
options mprint;
%macro test();
data Maternal_care_recent;
set wb;
keep country MATERNAL_CARE_: recent_:;
** The : acts as a wild card here **;
%do i = 2013 %to 2004 %by -1;
if MATERNAL_CARE_&i. ne . then do;
%do rc = 1 %to 3 %by 1;
recent_&rc. = MATERNAL_CARE_&i.;
%end;
end;
%end;
run;
%mend;
%test();
Now, before compilation of if MATERNAL_CARE_&i. ne . then do, only the &i. is evaluated, and if MATERNAL_CARE_2013 ne . then do is passed to the compiler.
The compiler will see this as a test of whether the SAS variable MATERNAL_CARE_2013 has a missing value, and that is just what you wanted.
Remark:
It is not essential that I moved the if statement above the %do rc loop. It is just more efficient because the condition is then evaluated less often.
It is however essential that you close your %ifs and %dos with an %end and your ifs and dos with an end;
Remark:
You do not need %let rc = 1, because %do rc = 1 %to 3 already initialises &rc.
For completeness: SAS compiles stepwise.
The next PROC or data step and its macro code are only considered once the previous one has executed.
That is why you can write macro variables from a data step or an SQL select into that will influence the code you compile in your next step,
something you cannot do with, for instance, C++ precompilation.
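To see the difference in timing for yourself, a small experiment along these lines may help. It is only a sketch: the macro name `timing_demo` and the macro variable `macro_result` are hypothetical, and the data step creates its own missing value instead of reading the wb dataset.

```sas
%global macro_result;

%macro timing_demo;
  /* Macro-level test: runs before the data step is compiled */
  /* and compares two pieces of text, not data values        */
  %if MATERNAL_CARE_2013 ne . %then %let macro_result = TRUE;
  %else %let macro_result = FALSE;

  data _null_;
    MATERNAL_CARE_2013 = .;
    /* Data step test: runs at execution time on the actual value */
    if MATERNAL_CARE_2013 ne . then put 'Data step: value present';
    else put 'Data step: value missing';
  run;
%mend timing_demo;

%timing_demo
%put Macro-level comparison resolved to: &macro_result;
```

The %PUT line shows how the macro comparison resolved, while the data step message in the log reflects the value that was actually stored in the variable.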
Thanks everyone. Found a hybrid solution from a few solutions posted.
data sample_ds;
infile datalines dlm=',';
input country $ maternal_2004 maternal_2005
maternal_2006 maternal_2007 maternal_2008 maternal_2009 maternal_2010 maternal_2011 maternal_2012 maternal_2013;
datalines;
MS,5,0,5,0,5,.,5,.,5,.
MW,3,9,5,0,5,0,5,.,5,0
WE,3,9,5,0,5,.,.,.,.,0
HU,2,0,5,.,5,.,5,0,5,0
MI,1,8,5,0,5,0,5,.,5,0
HJ,1,7,5,0,5,0,.,0,.,0
CJ,6,1,5,0,5,0,5,0,5,0
CN,6,1,.,5,0,5,0,5,0,5
CE,6,5,0,5,0,.,0,5,.,8
CT,2,5,0,5,0,5,0,5,0,9
CW,1,5,0,5,0,5,.,.,0,7
CH,0,5,0,5,0,.,0,.,0,5
;
%macro test(var);
data &var._recent;
set sample_ds;
keep country &var._1 &var._2 &var._3;
array mc {*} &var._2004-&var._2013;
array recent {*} &var._1-&var._25;
count=1;
do i = 10 to 1 by -1;
if mc[i] ne . then do;
recent[count] = mc[i];
count=count+1;
end;
end;
run;
%mend;