Writing the Ackermann function in a SAS data step - sas

In my quest to understand recursive programming in SAS, I have tried many times, without success, to write a version of the two-argument Ackermann function.
The function is defined as:
Ack(m, n) = n + 1 if m = 0
Ack(m, n) = Ack(m - 1, 1) if m > 0 and n = 0
Ack(m, n) = Ack(m - 1, Ack(m, n - 1)) if m > 0 and n > 0
I only want to calculate it for m and n ranging from 0 to 3, because for m >= 4 the returned values become large very rapidly.
I was shooting for a relatively simple output; something like:
Ack(0,0) = 1
Ack(0,1) = 2
Ack(0,2) = 3
Ack(0,3) = 4
Ack(1,0) = 2
Ack(1,1) = 3
And so on to Ack(3,3) = 61
I was unable to find any reference online to someone doing this in SAS. So, if someone could help me with this, I would really appreciate it!
Thanks!

proc fcmp implementation:
/* Define */
proc fcmp outlib=work.funcs.math;
  function ackerman(m, n);
    if m = 0 then return(n + 1);
    else if n = 0 then return(ackerman(m - 1, 1));
    else return(ackerman(m - 1, ackerman(m, n - 1)));
  endsub;
run;
quit;
/*Test*/
option cmplib = work.funcs;
proc fcmp;
  out = ackerman(3,2);
  put "Testing Function Call";
  put "ackerman(3,2) returns:" out;
quit;
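Since the question asks about the data step: a function compiled with proc fcmp can also be called from a data step once the cmplib option points at its library. The following is a minimal sketch, assuming the work.funcs library defined above still exists in the session; the dataset name ack_table is made up for illustration. Deep recursion may run into resource limits for larger arguments, so the loop stays within 0 to 3.

```sas
/* Assumes ackerman() was compiled into work.funcs as shown above */
options cmplib=work.funcs;

data ack_table;            /* hypothetical output dataset name */
  do m = 0 to 3;
    do n = 0 to 3;
      result = ackerman(m, n);   /* recursive FCMP function called from the data step */
      output;
    end;
  end;
run;

proc print data=ack_table noobs;
run;
```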

Here is a SAS/AF class implementation
sasuser.examples.ackermanclass.scl
Class Ackerman extends sashelp.fsp.object.class;
compute: public method
m: num
n: num
return = num;
if m=0 then return n+1;
if m > 0 then do;
if n = 0 then return compute ( m-1, 1 );
if n > 0 then return compute ( m-1, compute ( m, n-1 ) );
throw _new_ SASHelp.Classes.SCLException ("Ackerman compute, invalid args: n=" || cats(n));
end;
throw _new_ SASHelp.Classes.SCLException ("Ackerman compute, invalid args: m=" || cats(m));
endmethod;
EndClass;
sasuser.examples.ackermantest.scl
init:
declare sasuser.examples.ackerman.class ackerman
= _new_ sasuser.examples.ackerman.class();
do m = 0 to 3;
do n = 0 to 3;
put m= n= 'result=' ackerman.compute(m,n);
end;
end;
return;
Test with AFA C=sasuser.examples.ackermantest.scl
m=0 n=0 result=1
m=0 n=1 result=2
m=0 n=2 result=3
m=0 n=3 result=4
m=1 n=0 result=2
m=1 n=1 result=3
m=1 n=2 result=4
m=1 n=3 result=5
m=2 n=0 result=3
m=2 n=1 result=5
m=2 n=2 result=7
m=2 n=3 result=9
m=3 n=0 result=5
m=3 n=1 result=13
m=3 n=2 result=29
m=3 n=3 result=61

Here is a Proc DS2 example that uses recursion:
proc ds2;
data _null_;
method ackerman(int m, int n) returns int;
if m=0 then return n+1;
if m > 0 then do;
if n = 0 then return ackerman ( m-1, 1 );
if n > 0 then return ackerman ( m-1, ackerman ( m, n-1 ) );
return -1;
end;
return -1;
end;
method init();
declare int m n result;
do m = 0 to 3;
do n = 0 to 3;
result = ackerman(m,n);
put m= n= result=;
end;
end;
end;
enddata;
run;
quit;

It is difficult to do recursion in normal SAS code. But it is easy in macro code.
%macro a(m,n);
%if %sysfunc(verify(&m.&n,0123456789)) %then %do;
%put WARNING: Invalid input to macro &sysmacroname.. Use only non-negative integers.;
.
%end;
%else %if (&m=0) %then %eval(&n+1);
%else %if (&n=0) %then %a(%eval(&m-1),1);
%else %a(%eval(&m-1),%a(&m,%eval(&n-1)));
%mend a;
If you must use it with the values of dataset variables then you might consider using the resolve() function.
data testa;
do i=0 to 3; do j=0 to 3;
a=input(resolve(cats('%a(',i,',',j,')')),32.);
output;
end;end;
run;
proc print; run;
Results
Obs i j a
1 0 0 1
2 0 1 2
3 0 2 3
4 0 3 4
5 1 0 2
6 1 1 3
7 1 2 4
8 1 3 5
9 2 0 3
10 2 1 5
11 2 2 7
12 2 3 9
13 3 0 5
14 3 1 13
15 3 2 29
16 3 3 61
Of course if you are only ever going to use arguments from 0 to 3 then perhaps you just need to use an array lookup instead.
data testb;
array _a(0:3,0:3) _temporary_
(1 2 3 4
2 3 4 5
3 5 7 9
5 13 29 61
);
do i=0 to 3; do j=0 to 3; a=_a(i,j); output; end; end;
run;

Related

Sum consecutive observations in a dataset SAS

I have a dataset that looks like:
Hour Flag
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
I want to have an output dataset like:
Total_Hours Count
2 2
3 1
4 1
As you can see, I want to count the number of hours included in each period with consecutive "1s". A missing value ends the consecutive sequence.
How should I go about doing this? Thanks!
You'll need to do this in two steps. First step is making sure the data is sorted properly and determining the number of hours in a consecutive period:
PROC SORT DATA = <your dataset>;
BY hour;
RUN;
DATA work.consecutive_hours;
SET <your dataset> END = lastrec;
RETAIN
total_hours 0
;
IF flag = 1 THEN total_hours = total_hours + 1;
ELSE
DO;
IF total_hours > 0 THEN output;
total_hours = 0;
END;
/* Need to output last record */
IF lastrec AND total_hours > 0 THEN output;
KEEP
total_hours
;
RUN;
Now a simple SQL statement:
PROC SQL;
CREATE TABLE work.hour_summary AS
SELECT
total_hours
,COUNT(*) AS count
FROM
work.consecutive_hours
GROUP BY
total_hours
;
QUIT;
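The frequency step could also be done with PROC FREQ instead of SQL. A minimal sketch, assuming work.consecutive_hours was created by the data step above; the output dataset name work.hour_summary_freq is made up for illustration:

```sas
/* Count how often each run length occurs */
proc freq data=work.consecutive_hours noprint;
  /* OUT= writes one row per distinct total_hours, with COUNT and PERCENT */
  tables total_hours / out=work.hour_summary_freq(keep=total_hours count);
run;
```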
You will have to do two things:
compute the run lengths
compute the frequency of the run lengths
When using the implicit data step loop, each run length can be computed and maintained in a retained tracking variable: increment it on a non-missing value, and output and reset it on a missing value or at end of data.
The frequencies of the run lengths can then be computed with Proc FREQ.
An alternative is to use an explicit loop and a hash for frequency counts.
Example:
data have; input
Hour Flag; datalines;
1 1
2 1
3 .
4 1
5 1
6 .
7 1
8 1
9 1
10 .
11 1
12 1
13 1
14 1
;
data _null_;
declare hash counts(ordered:'a');
counts.defineKey('length');
counts.defineData('length', 'count');
counts.defineDone();
do until (end);
set have end=end;
if not missing(flag) then
length + 1;
if missing(flag) or end then do;
if length > 0 then do;
if counts.find() eq 0
then count+1;
else count=1;
counts.replace();
length = 0;
end;
end;
end;
counts.output(dataset:'want');
run;
An alternative
data _null_;
if _N_ = 1 then do;
dcl hash h(ordered : "a");
h.definekey("Total_Hours");
h.definedata("Total_Hours", "Count");
h.definedone();
end;
do Total_Hours = 1 by 1 until (last.Flag);
set have end=lr;
by Flag notsorted;
end;
Count = 1;
if Flag then do;
if h.find() = 0 then Count+1;
h.replace();
end;
if lr then h.output(dataset : "want");
run;
Several weeks ago, @Richard taught me how to use a DOW loop and a direct-addressing array. Today, I pass it on to you.
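data want(keep=Total_Hours Count);
  /* bin[k] holds the number of runs of length k; assumes no run is longer than 99 */
  array bin[99] _temporary_;
  do until(eof1);
    set have end=eof1;
    if Flag then count + 1;
    if ^Flag or eof1 then do;
      /* guard against a zero-length run (e.g. two missing flags in a row),
         which would otherwise index the array out of range */
      if count then bin[count] + 1;
      count = .;
    end;
  end;
  do i = 1 to dim(bin);
    Total_Hours = i;
    Count = bin[i];
    if Count then output;
  end;
run;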
Thanks again to Richard, who also suggested this article to me.

SAS make summary statistic not available in proc mean

I have a table with very many columns, but in order to explain my
problem I will use this simple table.
data test;
input a b c;
datalines;
0 0 0
1 1 1
. 4 2
;
run;
I need to calculate common summary statistics such as min, max and number of missing values. But I also need to calculate some special numbers, such as the number of values above a certain level (in this example >0 and >1).
I can use PROC MEANS, but it only gives me results for standard statistics like min, max, etc.
What I want is a result in the following format:
var minval maxval nmiss n_above1 n_above2
a 0 1 1 1 0
b 0 4 0 2 1
c 0 2 0 2 1
I have been able to produce this format for one variable with this rather
stupid code:
data result;
set test(keep =b) end=last;
variable = 'b';
retain minval maxval;
if _n_ = 1 then do;
minval = 1e50;
maxval = -1e50;
end;
if minval > b then minval = b;
if maxval < b then maxval = b;
if b=. then nmiss+1;
if b>0 then n_above1+1;
if b>1 then n_above2+1;
if last then do;
output;
end;
drop b;
run;
This produces the following table:
variable minval maxval nmiss n_above1 n_above2
b 0 4 0 2 1
I know there has to be a better way to do this. I am used to Python and Pandas. There I would simply loop through each variable, calculate the different summary statistics and append the result to a new dataframe for each variable.
I can probably also use PROC SQL, as in the next example:
proc sql;
create table res as
select count(case when a > 0 then 1 end) as n_above1_a,
count(case when b > 0 then 1 end) as n_above1_b,
count(case when c > 0 then 1 end) as n_above1_c
from test;
quit;
This gives me:
n_above1_a n_above1_b n_above1_c
1 2 2
But this does not solve my problem.
If you add a unique identifier to each row then you can just use PROC TRANSPOSE and PROC SQL to get your result.
data test;
input a b c;
id+1;
datalines;
0 0 0
1 1 1
. 4 2
;
proc transpose data=test out=tall ;
by id ;
run;
proc sql noprint ;
create table want as
select _name_
, min(col1) as minval
, max(col1) as maxval
, sum(missing(col1)) as nmiss
, sum(col1>1) as n_above1
, sum(col1>2) as n_above2
from tall
group by _name_
;
quit;
Result
Obs _NAME_ minval maxval nmiss n_above1 n_above2
1 a 0 1 1 0 0
2 b 0 4 0 1 1
3 c 0 2 0 1 0
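The query above counts values above 1 and above 2, while the table in the question corresponds to the thresholds >0 and >1. A sketch of the same query with those thresholds, assuming the tall dataset from the PROC TRANSPOSE step above (the names want_q and variable are made up for illustration). A missing value compares lower than any number in SAS, so it is not counted by either comparison.

```sas
proc sql;
  create table want_q as
  select _name_               as variable
       , min(col1)            as minval
       , max(col1)            as maxval
       , sum(missing(col1))   as nmiss
       , sum(col1 > 0)        as n_above1   /* threshold from the question: > 0 */
       , sum(col1 > 1)        as n_above2   /* threshold from the question: > 1 */
  from tall
  group by _name_
  ;
quit;
```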

If condition is not executed when the first condition is verified SAS

The following code runs, but the result is wrong.
When the first condition is true, the code doesn't work correctly.
lag1 remains missing when the value should be 3...
Thanks for your help.
DATA VALUES;
INPUT VAL caract$ var1 var2;
DATALINES;
1 a 12 0
1 c 0 4
1 c 3 2
2 a 3 2
2 b 15 16
2 b 4 1
3 a 12 13
3 c 12 13
4 c 14 15
5 b 14 0
6 b 14 15
7 a 12 15
7 c 12 15
8 c 14 15
9 c 14 5
10 c 13 7
;
RUN;
%macro lag_var(dataset, lag);
data &dataset&lag;
set &dataset;
by VAL;
%do i=0 %to &lag;
if caract eq 'b' then
lag&i=lag&i(var1);
else lag&i = lag&i(var2);
%end;
if first.VAL then do;
count=0;
%do i=1 %to &lag;
lag&i=.;
%end;
end;
count+1;
%do i=1 %to &lag;
if (not first.VAL and count<=&i) then do;
lag&i=.;
end;
%end;
maxi = max(of lag1 - lag&lag);
run;
%mend lag_var;
%lag_var(VALUES,3);
It is surely related to conditional execution of the LAG function. Try changing to something like this using TEMP1 and TEMP2 variables to hold the lagged values.
%do i=0 %to &lag;
temp1=lag&i(var1);
temp2=lag&i(var2);
if caract eq 'b' then
lag&i=temp1;
else lag&i = temp2;
%end;
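The reason this matters is that each LAG call keeps its own queue, and the queue is only updated when the call actually executes. So a LAG inside IF/ELSE returns the value from the last row on which that branch ran, not from the previous row. A small sketch with made-up data (the dataset lag_demo and its variables are hypothetical) shows the difference:

```sas
data lag_demo;
  input grp $ x;
  /* Conditional call: this queue is updated only on rows where grp = 'b' */
  if grp = 'b' then cond_lag = lag(x);
  /* Unconditional call: this queue is updated on every row */
  always_lag = lag(x);
  datalines;
a 1
b 2
a 3
b 4
;
run;

proc print data=lag_demo noobs;
run;
```

On the last row, cond_lag is 2 (the value of x on the previous 'b' row), while always_lag is 3 (the value of x on the previous row).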

SAS count a sequence of equal numbers

I wish to get the last number in a sequence of equal numbers. For example, I have the following dataset:
X
1
1
0
0
0
1
1
0
Given that sequence of numbers, I need to extract the last number of each sequence of ones, just before a 0 appears. This is what I want:
X Seq
1 1
1 2
1 3
0 1
0 2
0 3
1 1
1 2
0 1
1 1
1 2
1 3
0 1
I need to create a new dataset with the numbers in bold (the last Seq value of each run of ones), that is:
Seq1
3
2
3
Thanks for any advice.
One more option - use the NOTSORTED BY group option.
Data want;
Set have;
By x NOTSORTED;
Retain count;
If first.x then count=1;
Else count+1;
If last.x and x = 1 then output; /* keep only the runs of ones, as requested */
Keep count;
Run;
You can create group variables using a lag and then keep the last observation of each group you have created:
data temp;
input x $;
datalines;
1
1
1
0
0
0
1
1
0
1
1
1
0
;
run;
data temp2;
set temp;
retain flag;
if lag(x) > x then flag = _n_;
if x = 0 then delete;
run;
data temp3 (keep = seq1);
set temp2;
seq1 + 1;
by flag;
if first.flag then seq1 = 1;
if last.flag then save = 1;
if missing(save) then delete;
run;
Use a proc summary with notsorted option here:
data math;
input x;
datalines;
1
1
1
0
0
0
1
1
0
1
0
1
1
1
1
0
1
;
run;
proc summary data=math;
by x notsorted;
class x;
output Out=z;
run;
data z (drop=_type_ x);
set z (rename=(_FREQ_=COUNT));
where _type_=1 and x=1;/*if you are looking for other number then 1, replace it here*/
run;
proc print data=z noobs;
run;
result is:
Here's a solution using only one data step with a retain statement.
data have;
input x @@;
output;
datalines;
1 1 1 0 0 0 1 1 0 1 0 1 1 1 1 0 1
;
data want(keep = count);
set have end = last;
retain x_previous . count .;
if x = 0 then do;
if x_previous = 1 then do;
output;
count = 0;
end;
end;
else if x = 1 then count + 1;
if last = 1 and count > 0 then output;
x_previous = x;
run;
Results
count
-----
3
2
1
4
1

Using Retain Statement for Mathematical Operations in SAS

I have a dataset with 4 observations (rows) per person.
I want to create three new variables that calculate the difference between the second and first, third and second, and fourth and third rows.
I think retain can do this, but I'm not sure how.
Or do I need an array?
Thanks!
data test;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
data test;
set test;
by person notsorted;
retain pos;
array diffs{*} diff0-diff3;
retain diff0-diff3;
if first.person then do;
pos = 0;
end;
pos + 1;
diffs{pos} = dif(var);
if last.person then output;
drop var diff0 pos;
run;
Why not use the LAG function?
data test; input person var;
cards;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
run;
data test; set test;
by person;
LagVar=Lag(Var);
difference=var-Lagvar;
if first.person then difference=.;
run;
An alternative approach without arrays.
/*-- Data from simonn's answer --*/
data SO1019005;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
/*-- Why not just do a transpose? --*/
proc transpose data=SO1019005 out=NewData;
by person;
run;
/*-- Now calculate your new vars --*/
data NewDataWithVars;
set NewData;
NewVar1 = Col2 - Col1;
NewVar2 = Col3 - Col2;
Newvar3 = Col4 - Col3;
run;
Why not use the dif() function instead?
/* test data */
data one;
do id = 1 to 2;
do v = 1 to 4 by 1;
output;
end;
end;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs id v
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
*/
/* now create diff within id */
data two;
set one;
by id notsorted; /* assuming already in order */
dif = ifn(first.id, ., dif(v));
run;
proc print data=two;
run;
/* on lst
Obs id v dif
1 1 1 .
2 1 2 1
3 1 3 1
4 1 4 1
5 2 1 .
6 2 2 1
7 2 3 1
8 2 4 1
*/
data output_data;
  retain count previous_value diff1 diff2 diff3;
  set input_data;
  by person;
  /* count = position of the row within the person, starting at 0 */
  if first.person then do;
    count = 0;
  end;
  else do;
    count = count + 1;
    if count = 1 then diff1 = abs(value - previous_value);
    if count = 2 then diff2 = abs(value - previous_value);
    if count = 3 then do;
      diff3 = abs(value - previous_value);
      /* fourth row of the person: all three differences are known */
      output output_data;
    end;
  end;
  previous_value = value;
run;