I have 10 variables (var1-var10), which I need to rename var10-var1 in SAS. So basically I need var10 to be renamed var1, var9 var2, var8 var3, and so on.
This is the code that I used based on this paper, http://analytics.ncsu.edu/sesug/2005/PS06_05.PDF:
%macro new;
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%do j=1 %to 10 %by 1;
var.&i=var.&j
%end;
%end;
;
%mend new;
%new;
The problem I'm having is that it only renames var1 as var10, so the last iteration in the do-loop.
Thanks in advance for any help!
Emily
You really don't need to do that, you can rename variable with list references, especially if they've been named sequentially.
ie:
rename var1-var10 = var10-var1;
Here's a test that demonstrates this:
data check;
array var(10) var1-var10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
output;
run;
data want;
set check;
rename var1-var10 = var10-var1;
run;
If you do need to do it manually for some reason, then you need two arrays. Once you've assigned the variable you've lost the old variable so you can't access it anymore. So you need some sort of temporary array to hold the new values.
While Reeza's answer is correct, it's probably worth going through why your method didn't work - which is another reasonable, if convoluted, way to do it.
First off, you have some minor syntax issues, such as a misplaced semicolon, periods in the wrong places (They end macro variable names, not begin them), and a missing run statement; we'll ignore those and fix them as we change the code.
Second, you have two nested loops, when you don't really want that. You don't want to do the inner code 10 times (once per iteration of j) for each iteration of i (so 100 times total); you want to do the inner code once for each iteration of both i and j.
Let's see what this fix, then, gives us:
data temp;
array var[10];
do _n_ = 1 to 15;
do _i = 1 to 10;
var[_i] = _i;
end;
output;
end;
drop _i;
run;
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.;
%end;
run;
%mend new;
%new();
Okay, so this now does something closer to what you want; but you have an issue, right? You lose the values for the second half (well, really the first half since you use %by -1) since they're not stored in a separate place.
You could do this by having a temporary dumping area where you stage the original variables, allowing you to simultaneously change the values and access the original. A common array-based method (rather than macro based) works this way. Here's how it would look like in a macro.
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
_var&i. = var&i.;
var&i.=coalesce(_var&j., var&j.);
%end;
drop _:;
run;
%mend new;
We use coalesce() which returns the first nonmissing argument; for the first five iterations it uses var&j. but the second five iterations use _var&j. instead. Rather than use this function you could also just prepopulate the variable.
A much better option though is to use rename, as Reeza does in the above answer, but presented here with something more like your original answer:
%macro new();
data temp_one;
set temp;
rename
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.
%end;
;
run;
%mend new;
This works because rename does not actually move things around - it just sets the value of "please write this value out to _____ variable on output" to something different.
This is actually what the author in the linked paper proposes, and I suspect you just missed the rename bit. That's why you have the single semicolon after the whole thing (since it's just one rename statement, so just one ; ) rather than individual semicolons after each iteration (as you'd need with assignment).
Related
I am processing a dataset, the contents of which I do not know in advance. My target SAS instance is 9.3, and I cannot use SQL as that has certain 'reserved' names (such as "user") that cannot be used as column names.
The puzzle looks like this:
data _null_;
set some.dataset; file somefile;
/* no problem can even apply formats */
put name age;
/* how to do this without making new vars? */
put somefunc(name) max(age);
run;
I can't put var1=somefunc(name); put var1; as that may clash with a source variable named var1.
I'm guessing the answer is to make some macro function that will read the dataset header and return me a "safe" (non-clashing) variable, or an fcmp function in a format, but I thought I'd check with the community to see - is there some "old school" way to outPUT directly from a function, in a data step?
Temporary array?
34 data _null_;
35 set sashelp.class;
36 array _n[*] _numeric_;
37 array _f[3] _temporary_;
38 put _n_ #;
39 do _n_ = 1 to dim(_f);
40 _f[_n_] = log(_n[_n_]);
41 put _f[_n_]= #;
42 end;
43 put ;
44 run;
1 _f[1]=2.6390573296 _f[2]=4.2341065046 _f[3]=4.7229532216
2 _f[1]=2.5649493575 _f[2]=4.0342406382 _f[3]=4.4308167988
3 _f[1]=2.5649493575 _f[2]=4.1789920363 _f[3]=4.5849674787
4 _f[1]=2.6390573296 _f[2]=4.1399550735 _f[3]=4.6298627986
5 _f[1]=2.6390573296 _f[2]=4.1510399059 _f[3]=4.6298627986
6 _f[1]=2.4849066498 _f[2]=4.0483006237 _f[3]=4.4188406078
7 _f[1]=2.4849066498 _f[2]=4.091005661 _f[3]=4.4367515344
8 _f[1]=2.7080502011 _f[2]=4.1351665567 _f[3]=4.7229532216
9 _f[1]=2.5649493575 _f[2]=4.1351665567 _f[3]=4.4308167988
The PUT statement does not accept a function invocation as a valid item for output.
A DATA step does not do columnar functions as you indicated with max(age) (so it would be even less likely to use such a function in PUT ;-)
Avoid name collisions
My recommendation is to use a variable name that is highly unlikely to collide.
_temp_001 = somefunc(<var>);
_temp_002 = somefunc2(<var2>);
put _temp_001 _temp_002;
drop _temp_:;
or
%let tempvar = _%sysfunc(rand(uniform, 1e15),z15.);
&tempvar = somefunc(<var>);
put &tempvar;
drop &tempvar;
%symdel tempvar;
Repurpose
You can re-purpose any automatic variable that is not important to the running step. Some omni-present candidates include:
numeric variables:
_n_
_iorc_
_threadid_
_nthreads_
first.<any-name> (only tweak after first. logic associated with BY statement)
last.<any-name>
character variables:
_infile_ (requires an empty datalines;)
_hostname_
avoid
_file_
_error_
I think you would be pretty safe choosing some unlikely to collide names. An easy way to generate these and still make the code somewhat readable would be to just hash a string to create a valid SAS varname and use a macro reference to make the code readable. Something like this:
%macro get_low_collision_varname(iSeed=);
%local try cnt result;
%let cnt = 0;
%let result = ;
%do %while ("&result" eq "");
%let try = %sysfunc(md5(&iSeed&cnt),hex32.);
%if %sysfunc(anyalpha(%substr(&try,1,1))) gt 0 %then %do;
%let result = &try;
%end;
%let cnt = %eval(&cnt + 1);
%end;
&result
%mend;
The above code takes a seed string and just adds a number to the end of it. It iterates the number until it gets a valid SAS varname as output from the md5() function. You could even then test the target dataset name to make sure the variable doesn't already exist. If it does build that logic into the above function.
Test it:
%let my_var = %get_low_collision_varname(iSeed=this shouldnt collide);
%put &my_var;
data _null_;
set sashelp.class;
&my_var = 1;
put _all_;
run;
Results:
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=1
Name=Alice Sex=F Age=13 Height=56.5 Weight=84 C34FD80ED9E856160E59FCEBF37F00D2=1 _ERROR_=0 _N_=2
This doesn't specifically answer the question of how to achieve it without creating new varnames, but it does give a practical workaround.
I have 5 separate datasets(actually many more but i want to shorten the code) named dk33,dk34,dk35,dk51,dk63, each dataset contains a numeric field: surv_probs. I would like to load the values into 5 arrays and then use the arrays in a datastep(result), however, I need advice what is the best way to do it.
I am getting error when I use the macro: setarrays: (code below)
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
WARNING: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation
marks.
ERROR: Illegal reference to the array dk33_arr.
Here is the main code.
%let var1 = dk33;
%let var2 = dk34;
%let var3 = dk35;
%let var4 = dk51;
%let var5 = dk63;
%let varN = 5;
/*put length of each column into macro variables */
%macro getlength;
%do i=1 %to &varN;
proc sql noprint;
select count(surv_probs)
into : &&var&i.._rows
from work.&&var&i;
quit;
%end;
%mend;
/*load values of column:surv_probs into macro variables*/
%macro readin;
%do i=1 %to &varN;
proc sql noprint;
select surv_probs
into: &&var&i.._list separated by ","
from &&var&i;
quit;
%end;
%mend;
data _null_;
call execute('%readin');
call execute('%getlength');
run;
/* create arrays*/
%macro setarrays;
%do i=1 %to 1;
j=1;
array &&var&i.._arr{&&&&&&var&i.._rows};
do while(scan("&&&&&&var&i.._list",j,",") ne "");
&&var&i.._arr = scan("&&&&&&var&i.._list",j,",");
j=j+1;
end;
%end;
%mend;
data result;
%setarrays
put dk33_arr(1);
* some other statements where I use the arrays*
run;
Answer to toms question:
*macro getlength(when executed) creates 5 macro variables named: dk33_rows,dk34_rows,dk35_rows,dk51_rows,dk63_rows
*the macro readin(when executed):creates 5 macro variables dk33_list,dk34_list,dk35_list,dk51_list,dk63_list. Each containing a string which is comma separates the values from the column: eg.: 0.99994,0.1999,0.1111
*the macro setarrays creates 5 arrays,when executed, dk33_arr,dk34_arr,... holding the parsed values from the macro variables created by readin
I find that "macro arrays" like VAR1,VAR2,.... are generally more trouble than they are worth. Either keep your list of dataset names in an actual dataset and generate code from that. Or if the list is short enough put the list into a single macro variable and use %SCAN() to pull out the items as you need them.
But either way it is also better to avoid trying to write macro code that needs more than three &'s. Build up the reference in multiple steps. Build a macro variable that has the name of the macro you want to reference and then pull the value of that into another macro variable. It might take more lines of code, but you can more easily understand what is happening.
%let i=1 ;
%let mvarname=var&i;
%let dataset_name=&&&mvarname;
Before you begin using macro code (or other code generation techniques) make sure you know what code you are trying to generate. If you want to load a variable into a temporary array you can just use a DO loop. There is no need to macro code, or copying values, or even counts, into macro variables. For example instead of getting the count of the observations you could just make your temporary array larger than you expect to ever need.
data test1 ;
if _n_=1 then do;
do i=1 to nobs_dk33;
array dk33 (1000) _temporary_;
set dk33 nobs=nobs_dk33 ;
dk33(i)=surv_probs;
end;
do i=1 to nobs_dk34;
array dk34 (1000) _temporary_;
set dk34 nobs=nobs_dk34 ;
dk34(i)=surv_probs;
end;
end;
* What ever you are planning to do with the DK33 and DK34 arrays ;
run;
Or you could transpose the dataset first.
proc transpose data=dk33 out=dk33_t prefix=dk33_ ;
var surv_probs ;
run;
Then your later step is easier since you can just use a SET statement to read in the one observation that has all of the values.
data test;
if _n_=1 then do;
set dk33_t ;
array dk33 dk33_: ;
end;
....
run;
I have the following variables: A_Bldg B_Bldg C_Bldg D_Bldg. I want to multiply them by INTSF and store the result in a new variable, Sale_i. For example, A_Bldg * INTSF = Sale_A, B_Bldg * INTSF = Sale_B, and so on.
My code is:
%macro loopit(mylist);
%let n=%sysfunc(countw(&mylist));
%do J = 1 %to &n;
%let i = %scan(&mylist,&J);
data test;
set data;
sale_&i. = &i._Bldg * INTSF;
run;
%end;
%mend;
%let list = A B C D;
%loopit(&list);
This only produces Sale_D, which is the last letter in the list. How do I get Sales A-C to appear? The first four lines of code are so I can loop through the text A-D. I thought about doing it with arrays, but didn't know how to choose the variables based on the A-D indicators. Thanks for your help!
You're currently looping through your list and recreating the test dataset every time, so it only appears to have sale_d because you're only viewing the last iteration.
You can clean up your loop by scanning through your list in one data step to solve your problem:
%let list = A B C D;
%macro loopit;
data test;
set data;
%do i = 1 %to %sysfunc(countw(&list.));
%let this_letter = %scan(&list., &i.);
sale_&this_letter. = &this_letter._Bldg * INTSF;
%end;
run;
%mend loopit;
%loopit;
Your %DO loop is in the wrong place. But really you do not need to use macro code to do something that the native SAS code can already do.
data want;
set have ;
array in A_Bldg B_Bldg C_Bldg D_Bldg ;
array out sale_1-sale4 ;
do i=1 to dim(in);
out(i)=intsf*in(i);
end;
run;
Is it possible to repeat a data step a number of times (like you might in a %do-%while loop) where the number of repetitions depends on the result of the data step?
I have a data set with numeric variables A. I calculate a new variable result = min(1, A). I would like the average value of result to equal a target and I can get there by scaling variable A by a constant k. That is solve for k where target = average(min(1,A*k)) - where k and target are constants and A is a list.
Here is what I have so far:
filename f0 'C:\Data\numbers.csv';
filename f1 'C:\Data\target.csv';
data myDataSet;
infile f0 dsd dlm=',' missover firstobs=2;
input A;
init_A = A; /* store the initial value of A */
run;
/* read in the target value (1 observation) */
data targets;
infile f1 dsd dlm=',' missover firstobs=2;
input target;
K = 1; * initialise the constant K;
run;
%macro iteration; /* I need to repeat this macro a number of times */
data myDataSet;
retain key 1;
set myDataSet;
set targets point=key;
A = INIT_A * K; /* update the value of A /*
result = min(1, A);
run;
/* calculate average result */
proc sql;
create table estimate as
select avg(result) as estimate0
from myDataSet;
quit;
/* compare estimate0 to target and update K */
data targets;
set targets;
set estimate;
K = K * (target / estimate0);
run;
%mend iteration;
I can get the desired answer by running %iteration a few times, but Ideally I would like to run the iteration until (target - estimate0 < 0.01). Is such a thing possible?
Thanks!
I had a similar problem to this just the other day. The below approach is what I used, you will need to change the loop structure from a for loop to a do while loop (or whatever suits your purposes):
First perform an initial scan of the table to figure out your loop termination conditions and get the number of rows in the table:
data read_once;
set sashelp.class end=eof;
if eof then do;
call symput('number_of_obs', cats(_n_) );
call symput('number_of_times_to_loop', cats(3) );
end;
run;
Make sure results are as expected:
%put &=number_of_obs;
%put &=number_of_times_to_loop;
Loop over the source table again multiple times:
data final;
do loop=1 to &number_of_times_to_loop;
do row=1 to &number_of_obs;
set sashelp.class point=row;
output;
end;
end;
stop; * REQUIRED BECAUSE WE ARE USING POINT=;
run;
Two part answer.
First, it's certainly possible to do what you say. There are some examples of code that works like this available online, if you want a working, useful-code example of iterative macros; for example, David Izrael's seminal Rakinge macro, which performs a rimweighting procedure by iterating over a relatively simple process (proc freqs, basically). This is pretty similar to what you're doing. In the process it looks in the datastep at the various termination criteria, and outputs a macro variable that is the total number of criteria met (as each stratification variable separately needs to meet the termination criterion). It then checks %if that criterion is met, and terminates if so.
The core of this is two things. First, you should have a fixed maximum number of iterations, unless you like infinite loops. That number should be larger than the largest reasonable number you should ever need, often by around a factor of two. Second, you need convergence criteria such that you can terminate the loop if they're met.
For example:
data have;
x=5;
run;
%macro reduce(data=, var=, amount=, target=, iter=20);
data want;
set have;
run;
%let calc=.;
%let _i=0;
%do %until (&calc.=&target. or &_i.=&iter.);
%let _i = %eval(&_i.+1);
data want;
set want;
&var. = &var. - &amount.;
call symputx('calc',&var.);
run;
%end;
%if &calc.=&target. %then %do;
%put &var. reduced to &target. in &_i. iterations.;
%end;
%else %do;
%put &var. not reduced to &target. in &iter. iterations. Try a larger number.;
%end;
%mend reduce;
%reduce(data=have,var=x,amount=1,target=0);
That is a very simple example, but it has all of the same elements. I prefer to use do-until and increment on my own but you can do the opposite also (as %rakinge does). Sadly the macro language doesn't allow for do-by-until like the data step language does. Oh well.
Secondly, you can often do things like this inside a single data step. Even in older versions (9.2 etc.), you can do all of what you ask above in a single data step, though it might look a little clunky. In 9.3+, and particularly 9.4, there are ways to run that proc sql inside the data step and get the result back without waiting for another data step, using RUN_MACRO or DOSUBL and/or the FCMP language. Even something simple, like this:
data have;
initial_a=0.3;
a=0.3;
target=0.5;
output;
initial_a=0.6;
a=0.6;
output;
initial_a=0.8;
a=0.8;
output;
run;
data want;
k=1;
do iter=1 to 20 until (abs(target-estimate0) < 0.001);
do _n_ = 1 to nobs;
if _n_=1 then result_tot=0;
set have nobs=nobs point=_n_;
a=initial_a*k;
result=min(1,a);
result_tot+result;
end;
estimate0 = result_tot/nobs;
k = k * (target/estimate0);
end;
output;
stop;
run;
That does it all in one data step. I'm cheating a bit because I'm writing my own data step iterator, but that's fairly common in this sort of thing, and it is very fast. Macros iterating multiple data steps and proc sql steps will be much slower typically as there is some overhead from each one.
I would like to create variables containing lagged values of a given variable for a large number of lags. How could I do this? I try the following:
data out;
set in;
do i = 1 to 50;
%let j = i;
lag_&j = Lag&j.(x);
end;
run;
How can I get the loop variable i into the macro variable j or how to use it directly to create the appropriately named variable and for the Lag function?
Chris J answers the question, but here i'll provide my preferred way of doing this.
%macro lagvar(var=,num=);
%do _iter = 1 %to &num.;
lag_&_iter. = lag&_iter.(&var.);
%end;
%mend lagvar;
data out;
set in;
%lagvar(var=x,num=50); *semicolon optional here;
run;
This is a more modular usage of the macro loop (and more readable, assuming you use intelligent names - the above is okay, you could do even more with the name if you wanted to be very clear, and of course add comments).
You're mixing macro & datastep syntax incorrectly...
You need a macro-loop (%DO instead of do) to generate the datastep code (i.e. lag1-lag50), and macro-loops need to be within a macro.
%MACRO LAGLOOP ;
data out ;
set in ;
%DO J = 1 %TO 50 ;
lag_&J = lag&J(x) ;
%END ;
run ;
%MEND ;
%LAGLOOP ;