Some context:
I have a comma-separated string of numbers (unordered, but within the known range 1 to 78), and I want to use those numbers to create specific indicator variables, so that
"64,2,3" => var_64 = 1; var_02 = 1; var_03 = 1; (all the others, such as var_01, are missing)
I came up with two solutions: one uses a macro %DO loop and the other a data step DO loop. The non-macro solution was to first initialize all variables var_01 - var_78 (via a macro), put them into an array, and then set the values of this array while looping through the string word by word.
I then realized that it would be much easier to use the loop iterator as a macro variable, and I came up with this MWE:
%macro fast(w,l);
do p = 1 to &l.;
%do j = 1 %to 9;
if &j. = scan(&w.,p,",") then var_0&j. = 1 ;
%end;
%do j = 10 %to 78;
if &j. = scan(&w.,p,",") then var_&j. = 1 ;
%end;
end;
%mend;
data want;
string = "2,4,64,54,1,4,7";
l = countw(string,",");
%fast(string,l);
run;
It works (no errors, no warnings, expected result), but I am unsure about mixing macro %DO loops and data step DO loops. Could this lead to any inconsistencies, or should I just stay with the non-macro solution?
Your current code is comparing numbers like 1 to strings like "1".
&j. = scan(&w.,p,",")
It will work as long as the strings can be converted into numbers, but it is not good practice: SAS converts the values implicitly and writes a conversion note to the log. It is better to convert the strings into numbers explicitly:
input(scan(&w.,p,","),32.)
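Applied to the question's macro, the explicit conversion might look like the sketch below. Each list item is converted once per iteration, so every generated IF compares a number to a number. The helper variable item and the use of %sysfunc(putn(&j., z2.)) to produce the zero-padded suffixes 01 to 09 (which lets one %DO loop replace the two in the question) are assumptions of this sketch, not part of the original code.

```sas
%macro fast(w, l);
  do p = 1 to &l.;
    /* convert the p-th list item to a number once, explicitly */
    item = input(scan(&w., p, ","), 32.);
    %do j = 1 %to 78;
      /* putn(..., z2.) zero-pads 1-9 to 01-09, so one %DO loop covers all names */
      if item = &j. then var_%sysfunc(putn(&j., z2.)) = 1;
    %end;
  end;
  drop p item;
%mend fast;

data want;
  string = "2,4,64,54,1,4,7";
  l = countw(string, ",");
  %fast(string, l);
run;
```

The macro still generates 78 IF statements per list item, which is why the array approach is usually preferable.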
You can do what you want with an array. Use the number generated from the next item in the list as the index into the array.
data want;
string = "2,4,64,54,1,4,7";
array var_ var_01-var_78 ;
do index=1 to countw(string,",");
var_[input(scan(string,index,","),32.)]=1;
end;
drop index;
run;
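The array version assumes every item is a whole number from 1 to 78; an out-of-range or non-numeric item stops the step with an "Array subscript out of range" error. If your strings might contain such values, a guarded variant could look like this. The extra item 99 in the test string and the choice to log and skip bad items are assumptions for illustration.

```sas
data want;
  string = "2,4,64,54,1,4,7,99";   /* 99 is deliberately out of range */
  array var_ var_01-var_78;
  do index = 1 to countw(string, ",");
    /* ?? suppresses the invalid-data note for non-numeric items */
    item = input(scan(string, index, ","), ?? 32.);
    /* only use whole numbers between 1 and dim(var_) as subscripts */
    if 1 <= item <= dim(var_) and item = int(item) then var_[item] = 1;
    else putlog "NOTE: skipped list item " index= item=;
  end;
  drop index item;
run;
```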
I have a variable (exam1 in the code below) that holds two kinds of values, both stored as character strings: one is "ND", the other is a number from 0 to 100. I want to convert "ND" to 0 and the other strings to numeric values, for example "1" (character) to 1 (numeric).
Here's my code attempt:
data cleaned_up(drop = exam_1);
set dataset.df(rename=(exam1=exam_1));
select (exam1);
when ('ND') do;
exam1 = 0;
end;
when ;
exam1 = input(exam_1,2.);
end;
otherwise;
end;
Clearly not working. What am I doing wrong?
There are a couple of problems with your code. Putting the rename as a dataset option on the input dataset performs the rename before the data is read in, so exam1 no longer exists: it is now called exam_1. That column is still defined as character, so the input function won't work as written.
You need to keep the existing column, create a new numeric column to do the conversion, then drop the old column and rename the new one. This can be done as a dataset option against the output dataset.
The tranwrd function replaces every occurrence of 'ND' with '0'; input with the best12. informat then reads all the values as numbers. You don't have to match the informat width to the number of digits (e.g. 2. for 2 digits, 3. for 3 digits, etc.).
data cleaned_up (drop=exam1 rename=(exam_1=exam1));
set df;
exam_1 = input(tranwrd(exam1,'ND','0'),best12.);
run;
You are using select(exam1) when it should be select(exam_1). You can use SELECT for this purpose, but I think a simple IF condition solves this much more easily:
data test;
length source $32;
do source='99', '34.5', '105', 'ND';
output;
end;
run;
data result(drop = convertedValue);
set test;
if (source eq 'ND') then do;
result = 0;
end;
else do;
convertedValue = input(source,??best.);
if not missing(convertedValue) then do;
if (0 <= round(convertedValue, 1E-12) <= 100) then do;
result = convertedValue;
end;
end;
end;
run;
input(source,??best.) tries to convert source to a number. If that fails (e.g. the value contains a word), it returns a missing value without writing an invalid-data note to the log or setting _ERROR_, and execution simply continues.
round(convertedValue,1E-12) is used to avoid floating-point precision errors in the comparison. If you want to be absolutely safe, you could use something like:
if (0 < round(convertedValue,1E-12) < 100
    or abs(round(convertedValue,1E-12)) < 1E-10
    or abs(round(convertedValue-100,1E-12)) < 1E-10
   ) then result = convertedValue;
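Since SELECT also works once it refers to exam_1, here is a sketch of the question's step with that fix applied. The dataset name have and its sample values are assumptions for illustration; exam_1 keeps the original character value and a new numeric exam1 is created by the first assignment.

```sas
data have;
  length exam1 $3;
  input exam1 $;
  datalines;
99
34
ND
;
run;

data cleaned_up(drop=exam_1);
  set have(rename=(exam1=exam_1));
  /* exam_1 holds the original character value; exam1 is created as numeric */
  select (exam_1);
    when ('ND') exam1 = 0;
    otherwise   exam1 = input(exam_1, ?? best12.);
  end;
run;
```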
Try using the ifc function, then convert to a numeric variable.
data have;
input x $3.;
_x=input(ifc(x='ND','0',x),best12.);
cards;
3
10
ND
;
run;
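A different technique, not used in the answers above, is a user-defined informat built with PROC FORMAT's INVALUE statement: 'ND' maps to 0 and every other value is read with best12. The informat name ndnum and the DEFAULT=12 width (set so a value such as '100' is not truncated to the length of 'ND') are assumptions of this sketch.

```sas
proc format;
  /* DEFAULT=12: width used when the informat is referenced without one */
  invalue ndnum (default=12)
    'ND'  = 0
    other = [best12.];
run;

data have;
  input x $3.;
  _x = input(x, ndnum.);
  datalines;
3
100
ND
;
run;
```

The mapping then lives in one reusable informat instead of being repeated in every data step.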
I have 10 variables (var1-var10), which I need to rename var10-var1 in SAS. So basically I need var10 to be renamed var1, var9 var2, var8 var3, and so on.
This is the code that I used based on this paper, http://analytics.ncsu.edu/sesug/2005/PS06_05.PDF:
%macro new;
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%do j=1 %to 10 %by 1;
var.&i=var.&j
%end;
%end;
;
%mend new;
%new;
The problem I'm having is that it only renames var1 as var10, so the last iteration in the do-loop.
Thanks in advance for any help!
Emily
You really don't need to do that; you can rename variables with variable lists, especially if they've been named sequentially.
For example:
rename var1-var10 = var10-var1;
Here's a test that demonstrates this:
data check;
array var(10) var1-var10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
output;
run;
data want;
set check;
rename var1-var10 = var10-var1;
run;
If you do need to do it manually for some reason, then you need two arrays. Once you've assigned a new value to a variable, its old value is gone and you can't access it anymore, so you need some sort of temporary array to hold the values while you swap them.
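A minimal sketch of that two-array approach, assuming the check dataset from the test above: a _temporary_ array keeps a copy of the original values (it is never written to the output), and the real variables are then filled from that copy in reverse order. The array names v and orig are illustrative.

```sas
data check;
  array v[10] var1-var10 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  output;
run;

data want_manual;
  set check;
  array v[10] var1-var10;
  /* _temporary_ array: holds copies of the originals, not written out */
  array orig[10] _temporary_;
  do _i = 1 to 10;
    orig[_i] = v[_i];
  end;
  /* var1 receives the old var10, var2 the old var9, and so on */
  do _i = 1 to 10;
    v[_i] = orig[11 - _i];
  end;
  drop _i;
run;
```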
While Reeza's answer is correct, it's probably worth going through why your method didn't work, since it is another reasonable, if convoluted, way to do it.
First off, you have some minor syntax issues, such as a misplaced semicolon, periods in the wrong places (they end macro variable names, not begin them), and a missing run statement; we'll ignore those and fix them as we change the code.
Second, you have two nested loops, when you don't really want that. You don't want to do the inner code 10 times (once per iteration of j) for each iteration of i (so 100 times total); you want to do the inner code once for each iteration of both i and j.
Let's see what this fix, then, gives us:
data temp;
array var[10];
do _n_ = 1 to 15;
do _i = 1 to 10;
var[_i] = _i;
end;
output;
end;
drop _i;
run;
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.;
%end;
run;
%mend new;
%new();
Okay, so this now does something closer to what you want; but you have an issue, right? You lose the values for the second half (well, really the first half since you use %by -1) since they're not stored in a separate place.
You can fix this with a temporary staging area that holds the original values, so you can change the values while still accessing the originals. A common array-based (rather than macro-based) method works this way. Here's how it would look in a macro.
%macro new();
data temp_one;
set temp;
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
_var&i. = var&i.;
var&i.=coalesce(_var&j., var&j.);
%end;
drop _:;
run;
%mend new;
We use coalesce(), which returns the first nonmissing argument: for the first five iterations it uses var&j., but for the second five it uses _var&j. instead. Rather than use this function, you could also just prepopulate the staging variables.
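A sketch of the prepopulating variant: one %DO loop stages every original value first, and a second loop assigns only from the staged copies, so no coalesce() is needed. The test data mirrors the temp step above, except that the array is named v here to avoid sharing a name with the VAR function.

```sas
data temp;
  array v[10] var1-var10;
  do _row = 1 to 15;
    do _i = 1 to 10;
      v[_i] = _i;
    end;
    output;
  end;
  drop _row _i;
run;

%macro new();
data temp_one;
  set temp;
  /* first pass: stage a copy of every original value */
  %do i = 1 %to 10;
    _var&i. = var&i.;
  %end;
  /* second pass: read only from the staged copies */
  %do i = 1 %to 10;
    var&i. = _var%eval(11 - &i.);
  %end;
  drop _:;
run;
%mend new;
%new();
```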
A much better option though is to use rename, as Reeza does in the above answer, but presented here with something more like your original answer:
%macro new();
data temp_one;
set temp;
rename
%do i=10 %to 1 %by -1;
%let j = %eval(11-&i.);
var&i.=var&j.
%end;
;
run;
%mend new;
This works because rename does not actually move any values around; it only changes the name under which each variable is written out to the output dataset.
This is actually what the author in the linked paper proposes, and I suspect you just missed the rename bit. That's why you have the single semicolon after the whole thing (since it's just one rename statement, so just one ; ) rather than individual semicolons after each iteration (as you'd need with assignment).
I'm trying to find the max of four variables, Value_A Value_B Value_C Value_D, within a macro. I thought I could do %sysfunc(max(value_&i.)) but that isn't working. My full code is:
%let i = (A B C D);
%macro maxvalue;
data want;
set have;
%do j = 1 %to %sysfunc(countw(&list.));
%let i = %scan(&list.,&j.);
value_&i.= Sale_&i. - int_&i.
Max_Value = %sysfunc(max(value_&i.));
%end;
run;
%mend maxvalue;
%maxvalue;
I should specify that I only want the max of the four variables for each observation. Thanks for your help!
Aside from the typo (%let i=(A B C D); should be %let list=(A B C D)), you're a) overcomplicating it, and b) confusing macro syntax with data step syntax. Whilst you could do this using a macro, there is no need.
Given the variables in question are all prefixed in a similar manner (although it would be even better if they were numerically suffixed, e.g. Value1, Value2), it's far easier to use arrays and the appropriate functions:
data want ;
set have ;
array sale{*} Sale_A Sale_B Sale_C Sale_D ;
array int{*} Int_A Int_B Int_C Int_D ;
array value{*} Value_A Value_B Value_C Value_D ;
/* Iterate over array */
do i = 1 to dim(sale) ;
value{i} = sum(sale{i},-int{i}) ;
end ;
max_value = max(of value{*}) ;
run ;
As mentioned above, you're overcomplicating this, but you can achieve what you're trying to do with macro logic by including another %DO loop inside your max_value assignment. This takes the max of your four variables plus a trailing missing value (the . after the loop, which absorbs the final comma); since max ignores missing values, it produces the desired result:
%let list = A B C D;
%macro maxvalue;
data want;
set have;
%do j = 1 %to %sysfunc(countw(&list.));
%let i = %scan(&list.,&j.);
value_&i.= Sale_&i. - int_&i.;
%end;
max_value = max(
%do x = 1 %to %sysfunc(countw(&list.));
%let y = %scan(&list.,&x.);
value_&y.,
%end; .
);
run;
%mend maxvalue;
%maxvalue;
Why not just rename your variables to SALE_1 to SALE_4? Then you can reference them with a simple variable list SALE_1-SALE_4.
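With numeric suffixes, the arrays and the max call can use plain numbered range lists. This sketch assumes the variables have already been renamed to sale_1-sale_4 and int_1-int_4; the sample values in have are made up for illustration, and the array is named ints to avoid sharing a name with the INT function.

```sas
data have;
  input sale_1-sale_4 int_1-int_4;
  datalines;
10 20 30 40 1 2 3 4
;
run;

data want;
  set have;
  array sale{4}  sale_1-sale_4;
  array ints{4}  int_1-int_4;
  array value{4} value_1-value_4;
  do i = 1 to 4;
    value{i} = sale{i} - ints{i};
  end;
  /* numbered range list: no need to spell out each name */
  max_value = max(of value_1-value_4);
  drop i;
run;
```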
If you are going to use non-numeric suffixes on lists of similarly named variables, then perhaps what you really need is a simple function-style macro that generates the lists of variable names from a base name and a list of suffix values.
%macro generate_names(base,list);
&base%sysfunc(tranwrd(%sysfunc(compbl(&list)),%str( ),%str( &base)))
%mend generate_names;
Then it is easier to generate variable lists to use for ARRAY statements
%let suffixes=A B C D;
array sale %generate_names(Sale_,&suffixes);
array int %generate_names(Int_,&suffixes);
array value %generate_names(Value_,&suffixes);
and other statements.
max_value = max(of %generate_names(Value_,&suffixes)) ;
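Put together, a complete step using generate_names might look like the sketch below. The sample data in have is made up for illustration, and the array is named ints rather than int to avoid sharing a name with the INT function.

```sas
%macro generate_names(base, list);
&base%sysfunc(tranwrd(%sysfunc(compbl(&list)),%str( ),%str( &base)))
%mend generate_names;

%let suffixes = A B C D;

data have;
  input Sale_A Sale_B Sale_C Sale_D Int_A Int_B Int_C Int_D;
  datalines;
10 20 30 40 1 2 3 4
;
run;

data want;
  set have;
  /* each call expands to e.g. Sale_A Sale_B Sale_C Sale_D */
  array sale  %generate_names(Sale_, &suffixes);
  array ints  %generate_names(Int_, &suffixes);
  array value %generate_names(Value_, &suffixes);
  do i = 1 to dim(sale);
    value{i} = sale{i} - ints{i};
  end;
  max_value = max(of %generate_names(Value_, &suffixes));
  drop i;
run;
```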
I'm trying to do some performance testing, and I can't figure out how to write a macro
%generate(n_rows,n_cols);
that would generate a table with n_rows rows and n_cols columns, filled with random numbers/strings.
I tried using this link:
http://bi-notes.com/2012/08/benchmark-io-performance/
But I quickly run into a memory issue.
Thanks!
Try this. I added two input parameters, so you now specify a number of numeric columns and a number of character columns. I also added the ability to define the output dataset name (plus a seed for the random numbers).
%macro generate(n_rows,n_num_cols,n_char_cols,outdata=test,seed=0);
data &outdata;
array nums[&n_num_cols];
array chars[&n_char_cols] $;
temp = "abcdefghijklmnopqrstuvwxyz";
do i=1 to &n_rows;
do j=1 to &n_num_cols;
nums[j] = ranuni(&seed);
end;
do j=1 to &n_char_cols;
chars[j] = substr(temp,ceil(ranuni(&seed)*18),8);
end;
output;
end;
drop i j temp;
run;
%mend;
%generate(10,10,10,outdata=test);
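The macro above uses RANUNI; SAS documentation generally recommends the RAND function, seeded with CALL STREAMINIT, for new code. A sketch of the same idea with RAND follows; the macro name generate_rand, the fixed default seed 12345 (which makes the table repeatable), and building each string from 8 independently drawn letters are assumptions of this sketch, not part of the answer above.

```sas
%macro generate_rand(n_rows, n_num_cols, n_char_cols, outdata=test, seed=12345);
data &outdata;
  array nums[&n_num_cols];
  array chars[&n_char_cols] $ 8;
  length _letters $26;
  _letters = "abcdefghijklmnopqrstuvwxyz";
  call streaminit(&seed);               /* fixed seed: repeatable data */
  do _row = 1 to &n_rows;
    do _j = 1 to &n_num_cols;
      nums[_j] = rand('uniform');
    end;
    do _j = 1 to &n_char_cols;
      chars[_j] = ' ';
      /* append 8 letters, each drawn independently */
      do _k = 1 to 8;
        chars[_j] = cats(chars[_j], substr(_letters, 1 + floor(26 * rand('uniform')), 1));
      end;
    end;
    output;
  end;
  drop _:;
run;
%mend generate_rand;

%generate_rand(1000, 10, 10, outdata=test_rand);
```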
I would like to create variables containing lagged values of a given variable for a large number of lags. How could I do this? I tried the following:
data out;
set in;
do i = 1 to 50;
%let j = i;
lag_&j = Lag&j.(x);
end;
run;
How can I get the loop variable i into the macro variable j or how to use it directly to create the appropriately named variable and for the Lag function?
Chris J answers the question, but here I'll provide my preferred way of doing this.
%macro lagvar(var=,num=);
%do _iter = 1 %to &num.;
lag_&_iter. = lag&_iter.(&var.);
%end;
%mend lagvar;
data out;
set in;
%lagvar(var=x,num=50); *semicolon optional here;
run;
This is a more modular use of the macro loop, and more readable, assuming you use intelligent names (the above is okay; you could do even more with the names if you wanted to be very clear, and of course add comments).
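To see what %lagvar actually generates, here is a small run with three lags on made-up data (x = 1 to 5, an assumption for illustration). With options mprint, the generated assignments lag_1 = lag1(x); through lag_3 = lag3(x); are echoed to the log, which makes it clear that the macro loop writes ordinary data step statements before the step runs.

```sas
data in;
  do x = 1 to 5;
    output;
  end;
run;

%macro lagvar(var=, num=);
  %do _iter = 1 %to &num.;
    lag_&_iter. = lag&_iter.(&var.);
  %end;
%mend lagvar;

options mprint;  /* echo the generated assignments to the log */

data out;
  set in;
  %lagvar(var=x, num=3)
run;
```

On the last row, x is 5 and lag_1, lag_2 and lag_3 are 4, 3 and 2; on the first row all three are missing.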
You're mixing macro and data step syntax incorrectly.
You need a macro loop (%DO instead of DO) to generate the data step code (i.e. lag1 to lag50), and macro loops need to be within a macro.
%MACRO LAGLOOP ;
data out ;
set in ;
%DO J = 1 %TO 50 ;
lag_&J = lag&J(x) ;
%END ;
run ;
%MEND ;
%LAGLOOP ;