I'm new to tail recursion in C++. My project requires that I make all my functions tail recursive. I've tested the following code and it works correctly. However, I'm not sure whether the way I've done it qualifies as tail recursion.
static int sum_helper(list_t hList, int accumulator){
if (list_isEmpty(hList))
return accumulator;
else {
accumulator += list_first(hList);
hList = list_rest(hList);
return sum_helper(hList, accumulator);
}
}
int sum(list_t list){
/*
// EFFECTS: returns the sum of each element in list
// zero if the list is empty.
*/
if (list_isEmpty(list))
return 0;
return sum_helper(list, 0);
}
Thanks!
In short, you don't do anything after the recursive call (sum_helper). This means that control never needs to come back to the current invocation: the recursive call can return its result directly to whoever called you, so the current stack frame can be thrown away (reused) instead of a new one being pushed.
Take the example of the normal factorial function
int fact(int x)
{
if(x == 0)
return 1;
else
return x * fact(x-1);
}
This is not tail recursive since the value of fact(x-1) needs to be returned, then multiplied by x. Instead, we can cheat a little and pass an accumulator too. See this:
int fact(int x, int acc)
{
if(x == 0)
return acc; // Technically, acc * 1, but that's the identity anyway.
else
return fact(x-1, acc*x);
}
Here, the last thing the function does is call fact(x-1, acc*x). Afterwards, we don't need to do anything with the called function's return value except pass it straight back, hence we don't need to return to the current frame. For this reason, the compiler can throw away the stack frame and apply other optimisations.
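To see what the compiler gains from this, here is roughly the loop it can turn the accumulator version into (a sketch of the transformation, not literal compiler output):
int fact_iter(int x, int acc)
{
    for (;;) {
        if (x == 0)
            return acc;
        acc = acc * x;   // same as the argument acc*x in the tail call
        x = x - 1;       // same as the argument x-1 in the tail call
    }
}
No new stack frame is created per step; the "call" has become a jump back to the top.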
Disclaimer: I've probably applied the factorial algorithm wrong, but you get the gist. Hopefully.
It's tail-recursion provided list_t doesn't have a non-trivial destructor. If it does have a non-trivial destructor, the destructor needs to run after the recursive call returns and before the function itself returns.
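To make that concrete, here is a sketch (purely hypothetical, assuming list_t were a class type with a destructor) of why the call would then no longer be in tail position:
static int sum_helper(list_t hList, int accumulator) {
    if (list_isEmpty(hList))
        return accumulator;
    accumulator += list_first(hList);
    return sum_helper(list_rest(hList), accumulator);
    // If list_t had a non-trivial destructor, ~list_t() for hList would
    // conceptually run here, *after* the recursive call returns, so there
    // would still be pending work and the frame could not simply be discarded.
}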
Bonus:
int sum(list_t hList, int accumulator = 0) {
    return list_isEmpty(hList)
        ? accumulator
        : sum(list_rest(hList), accumulator + list_first(hList));
}
But tastes vary; some people might like yours more.
From a theoretical point of view, yes, it's tail recursion (provided that hList does not have a nontrivial destructor). But from a practical point of view it depends on your compiler and its settings. Let's take a look at the assembly generated for this simple code:
#include <cstdlib>
struct list{
int head;
list * tail;
};
int sum_helper(list * l, int accumulator){
if (l == NULL)
return accumulator;
else {
accumulator += l->head;
return sum_helper(l->tail, accumulator);
}
}
Optimisations ON : (g++ -O2 ..., boring part omitted):
testq %rdi, %rdi
movl %esi, %eax
je .L2
...
.L6:
...
jne .L6 <-- loop
.L2:
rep
ret
This is clearly a loop. But when you disable optimisations, you get:
_Z10sum_helperP4listi:
.LFB6:
...
jne .L2
movl -12(%rbp), %eax
jmp .L3
.L2:
...
call _Z10sum_helperP4listi <-- recursion
.L3:
leave
.cfi_def_cfa 7, 8
ret
Which is recursive.
I understand tail recursion, but I have been assigned to write code that finds the Nth Fibonacci number.
To begin, this code does work. It's not the best way, but it's one way -- however, I'm starting to worry that it isn't tail recursive. The code is here:
static int fib_tail_helper(int n, int result) {
if (n == 0) {
return result;
}
else if (result == 0) {
return fib_tail_helper(n - 1, result + 1);
}
else {
return fib_tail_helper(n - 1, result + fib_tail_helper(n - 1, 0));
}
}
int fib_tail(int n) {
/*
// REQUIRES: n >= 0
// EFFECTS: computes the Nth Fibonacci number
// fib(0) = 0
// fib(1) = 1
// fib(n) = fib(n-1) + fib(n-2) for (n>1).
// MUST be tail recursive
*/
return fib_tail_helper(n, 0);
}
I'm mostly worried about the "return fib_tail_helper(n - 1, result + fib_tail_helper(n - 1, 0))" line.
I feel as if that would use another stack, and thus be non-tail-recursive... Can anyone give some input?
Thanks!!
No it is not tail-recursive.
The compiler needs to evaluate the inner fib_tail_helper argument first, which means it will build up on the order of n-1 stack frames evaluating that inner call before it can make the outer fib_tail_helper call whose value is returned.
To show that it's not tail-recursive a transformation might help:
static int fib_tail_helper(int n, int result) {
if (n == 0) {
return result;
}
else if (result == 0) {
return fib_tail_helper(n - 1, result + 1);
}
else {
int tailrecursivePreventingValue = fib_tail_helper(n - 1, 0);
return fib_tail_helper(n - 1, result + tailrecursivePreventingValue);
}
}
It does exactly the same as your code but introduces an explanatory variable. You can see that there are 2 calls to fib_tail_helper() in the last else-block: the first one is not in tail position, because its result is still needed, and having two recursive calls per level also means exponential running time.
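For contrast, a common way to make this genuinely tail-recursive (a sketch, not taken from the question) is to carry the two previous Fibonacci values as accumulators, so that the single recursive call is the last thing the function does:
static int fib_helper(int n, int prev, int curr) {
    if (n == 0) {
        return prev;
    }
    return fib_helper(n - 1, curr, prev + curr);   // tail call: nothing pending
}

int fib(int n) {
    // fib(0) = 0, fib(1) = 1, fib(2) = 1, ...
    return fib_helper(n, 0, 1);
}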
Tail recursion is a clever implementation of recursion that does not consume additional stack space per call.
It works like this:
If a function calls itself as its last action, that is called "tail recursion".
In this special case, a compiler can forgo doing an actual function call. It can execute a goto back to the beginning of the function. The code of the function will run again, just as if it had been called. When the function terminates, it will return to the last address on the stack, which is the function that originally called the recursive function.
This approach guarantees that the stack does not overflow, no matter how deep the recursion goes. That is what is so great about tail recursion.
The bad news is that the C++ standard does NOT guarantee tail-call elimination; it is left to the compiler as an optimization.
The good news is that it is trivially easy to implement tail recursion.
You simply replace the final function call with a goto back to the beginning of the function.
(This is just you writing the goto that the compiler would generate if it performed tail-call elimination for you.)
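Applied to the sum_helper from the question (so treat this as a sketch against that list_t API), the manual transformation looks like this:
static int sum_helper(list_t hList, int accumulator) {
begin:
    if (list_isEmpty(hList))
        return accumulator;
    accumulator += list_first(hList);
    hList = list_rest(hList);
    goto begin;   // replaces: return sum_helper(hList, accumulator);
}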
How do I pass data around my program without copying it every time?
Specifically, when calling sim(ohlc) I want to just pass a pointer or reference; I don't want to copy the data into the function.
This is the program I made, but I'm not sure this is the best way to do it (especially when it comes to speed and memory usage).
I think I'm not passing the pointer to sim(ohlc) like I should, but if I try sim(&ohlc) I don't know how to change the sim function to accept that.
struct ohlcS {
vector<unsigned int> timestamp;
vector<float> open;
vector<float> high;
vector<float> low;
vector<float> close;
vector<float> volume;
} ;
ohlcS *read_csv(string file_name) {
// open file and read stuff
if (read_error)
return NULL;
static ohlcS ohlc;
ohlc.timestamp.push_back(read_value);
return &ohlc;
}
int sim(ohlcS* ohlc) {
// do stuff
return 1;
}
main() {
ohlcS *ohlc = read_csv(input_file);
results = sim(ohlc);
}
It's C++, use a reference. It's safe, since you return a static object.
static ohlc ohlc_not_found;
ohlc &read_csv(string file_name) {
// open file and read stuff
if(error_while_opening)
{
return ohlc_not_found;
}
static ohlc loc_ohlc;
loc_ohlc.timestamp.push_back(read_value);
return loc_ohlc;
}
int sim(const ohlc& par_ohlc) {
// do stuff
return 1;
}
....
ohlc& var_ohlc = read_csv(input_file);
if(&var_ohlc == &ohlc_not_found) // compare addresses: ohlc has no operator==
{
// error handling
return;
}
results = sim(var_ohlc);
If you want to modify par_ohlc in sim, do not make it const.
and it's not recommended to use ohlc for both class and variable name :(
In the line:
results = sim(ohlc);
you are passing the ohlc pointer to the sim function; no deep copy of the data is done, only the pointer value itself (32 or 64 bits, depending on the platform) is copied.
This pushes the address (32 bit value) onto the stack.
results = sim(ohlc);
Like:
; ...
push eax ; addr of struct/class/whatever
call function ; jump to function
; ...
function:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; ebp+8 is the 32 bit value you pushed before onto the stack
; -> your pointer
Take a look at this and maybe that too.
Version 2
; ...
push eax ; addr of struct/class/whatever
jmp function ; jump to function
autolbl001:
; ...
function:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; ebp+8 is the 32 bit value you pushed before onto the stack
; ...
jmp autolbl001
I have this piece of code which accesses some information about a point on an 'x' and 'y' axis. This information is later used to draw some points onto the screen.
This is how the code works:
//MAX_X_COORD has a value of 169
//MAX_Y_COORD has a value of 55
void RedrawFromDATAtable()
{
COORD pos;
HANDLE tmpbuf = CreateConsoleScreenBuffer(GENERIC_WRITE , NULL, NULL, CONSOLE_TEXTMODE_BUFFER, NULL);
WriteConsoleA(tmpbuf, " ", 1, NULL, NULL);
if(SetConsoleActiveScreenBuffer(tmpbuf)==0)
{MessageBoxA(NULL, "ERROR", "ERROR", 0);return;}
bufferdata_ex * dptr;
//bufferdata_ex * y_dptr;
int * lcol(new int); //Increases speed by reducing function calls - Experienced about twice as fast drawing!
for(short x=0;x<MAX_X_COORD;x++)
{
//y_dptr = bridge->DATA[x];
for(short y=0;y<MAX_Y_COORD;y++)
{
//dptr = (y_dptr+y); //Rewrite to use a constant pointer!
dptr = &(_bridge->DATA[x][y]);
if(dptr->InUse==true)
{
{
pos.X = x;
pos.Y = y;
SetConsoleCursorPosition(output, pos);
//haschanged = false;
}
if(!(*lcol==dptr->color)) //Need for a new color?
{ SetConsoleTextAttribute(output, dptr->color);lcol = &dptr->color;}
char c((char)dptr->sym);
WriteConsoleA(output, &c, 1, NULL, NULL);
lcol = &dptr->color;
}
}
}
SetConsoleTextAttribute(output, bridge->current_color);
SetConsoleCursorPosition(output, last_mouse_position);
SetConsoleActiveScreenBuffer(output);
CloseHandle(tmpbuf);
delete lcol;
}
Cut to the chase!
Alright!
So recently I had a thought that accessing the array like that would slow down my code. As far as I know, whenever you access an element in an array, the processor takes the base address of the array and, from there, multiplies the size of the elements by the index to find the address of the specified element.
My thought was that asking the processor to do that on every access, instead of creating a pointer to the address once and then using that to process my elements, would slow down my code.
So I rewrote the code to the following:
void RedrawFromDATAtable()
{
COORD pos;
HANDLE tmpbuf = CreateConsoleScreenBuffer(GENERIC_WRITE , NULL, NULL, CONSOLE_TEXTMODE_BUFFER, NULL);
WriteConsoleA(tmpbuf, " ", 1, NULL, NULL);
if(SetConsoleActiveScreenBuffer(tmpbuf)==0)
{MessageBoxA(NULL, "ERROR", "ERROR", 0);return;}
bufferdata_ex * dptr;
bufferdata_ex * y_dptr;
int * lcol(new int); //Increases speed by reducing function calls - Experienced about twice as fast drawing!
for(short x=0;x<MAX_X_COORD;x++)
{
y_dptr = _bridge->DATA[x];
for(short y=0;y<MAX_Y_COORD;y++)
{
dptr = (y_dptr+y); //Rewrite to use a constant pointer!
//dptr = &(bridge->DATA[x][y]);
if(dptr->InUse==true)
{
{
pos.X = x;
pos.Y = y;
SetConsoleCursorPosition(output, pos);
//haschanged = false;
}
if(!(*lcol==dptr->color)) //Need for a new color?
{ SetConsoleTextAttribute(output, dptr->color);lcol = &dptr->color;}
char c((char)dptr->sym);
WriteConsoleA(output, &c, 1, NULL, NULL);
lcol = &dptr->color;
}
}
}
SetConsoleTextAttribute(output, bridge->current_color);
SetConsoleCursorPosition(output, last_mouse_position);
SetConsoleActiveScreenBuffer(output);
CloseHandle(tmpbuf);
delete lcol;
}
The idea seems perfectly fine to me, but the problem is that the first piece of code is faster than the second piece of code!
So my question is: Why is it the first piece of code is faster than the second piece of code?
For those who don't like to read:
Why is the first piece of code faster than the other?
The first one takes 0.0919 seconds to finish, whereas the second takes 0.226 seconds.
Also this is a copy of how the assembly handles the pointers:
//No pointers
dptr = &(bridge->DATA[x][y]);
001B41C6 mov eax,dword ptr [this]
001B41C9 mov ecx,dword ptr [eax+14h]
001B41CC movsx edx,word ptr [x]
001B41D0 imul edx,edx,370h
001B41D6 lea eax,[ecx+edx+1D4h]
001B41DD movsx ecx,word ptr [y]
001B41E1 shl ecx,4
001B41E4 add eax,ecx
001B41E6 mov dword ptr [dptr],eax
//With pointers
//Pointing to DATA[x]
012C41A5 mov eax,dword ptr [this]
012C41A8 mov ecx,dword ptr [eax+14h]
012C41AB movsx edx,word ptr [x]
012C41AF imul edx,edx,370h
012C41B5 lea eax,[ecx+edx+1D4h]
012C41BC mov dword ptr [y_dptr],eax
//Pointing to DATA[x]+y
012C41E0 movsx eax,word ptr [y]
012C41E4 shl eax,4
012C41E7 add eax,dword ptr [y_dptr]
012C41EA mov dword ptr [dptr],eax
Other than this part of the code, the rest is identical.
Looking only at the assembly we see an extra mov (the assignment of y_dptr).
Seeing how this is done on every iteration of the (outer) loop, and there are no other differences in the code, this could be the reason for the performance decrease.
Other than that, there is really nothing in your code that takes advantage of the pointer magic you are trying to use.
E.g. you use dptr = (y_dptr+y);, where you could drop either dptr or y_dptr by incrementing the pointer directly (y_dptr++;) on each iteration. That is pointer arithmetic you are not taking advantage of and could be improved, as in the sketch below.
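For instance, the inner loop could be rewritten along these lines (a sketch using the names from the question, assuming DATA[x] is a contiguous row):
y_dptr = _bridge->DATA[x];                    // start of row x
for (short y = 0; y < MAX_Y_COORD; ++y, ++y_dptr)
{
    if (y_dptr->InUse)
    {
        // ... same drawing code as before, using y_dptr instead of dptr ...
    }
}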
I read about recursion in Programming Interviews Exposed (3rd ed.) where they present the following recursive factorial function:
int factorial(int n){
if (n > 1) { /* Recursive case */
return factorial(n-1) * n;
} else { /* Base case */
return 1;
}
}
On the bottom of the same page (page 108) they talk about tail-recursive functions:
Note that when the value returned by the recursive call is itself immediately returned, as in the preceding definition for factorial, the function is tail-recursive.
But is this really the case here? The last call in the function is the * call, so won't this stack frame be preserved (if we don't take compiler optimization into account)? Is this really tail-recursive?
You can rewrite it to be tail-recursive:
int factorial2(int n, int accum); // forward declaration so the call below compiles

int factorial(int n){
return factorial2(n, 1);
}
int factorial2(int n, int accum) {
if (n < 1) {
return accum;
} else {
return factorial2(n - 1, accum * n);
}
}
No, it's not tail-recursive. The result being returned by factorial(n-1) still has to be multiplied by n, which requires that factorial(n) regain control (thus mandating that the call to factorial(n-1) be a call rather than a jump).
With that said, even if it were tail-recursive, the compiler still might not do TCO on it. Depends on the compiler and the optimizations that you ask it to do.
Quoting from this link: tail recursion using factorial as example
factorial(n) {
if (n == 0) return 1;
return n * factorial(n - 1);
}//equivalent to your code
This definition is NOT tail-recursive since the recursive call to
factorial is not the last thing in the function
(its result has to be multiplied by n)
Tail recursion is a special case of recursion in which the last operation of the function is the recursive call. In a tail-recursive function, there are no pending operations to be performed on return from a recursive call.
The function you mentioned is not tail recursive because there is a pending operation, i.e. the multiplication, to be performed on return from the recursive call.
If instead you did this:
int factorial(int n,int result)
{
if (n > 1)
{ /* Recursive case */
return factorial(n-1,n*result);
}
else
{ /* Base case */
return result;
}
}
it would be a tail-recursive function, since it has no pending operation on return from the recursive call.
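One practical note (my addition, not part of the answer above): callers shouldn't need to know the right starting value for result, so you would typically keep the original one-argument signature as a thin wrapper, something like:
int factorial(int n)
{
    return factorial(n, 1);   // overload resolution picks the two-argument version above
}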
The book 'Modern Compiler Design' is a nice book about compilers. In its source code, something that is bothering me is the AST, or Abstract Syntax Tree. Suppose we want to write a parenthesized expression parser which parses something like ((2+3)*4) * 2. The book says that we have an AST like:
          ((2+3)*4) * 2
           /    |    \
   ((2+3)*4)    *     2
    /    |   \
 (2+3)   *    4
  / | \
 2  +  3
So should I save the tree in memory, or just use recursive calls? Note: if I don't store it in memory, how can I convert it to machine code?
Parser code:
int parse(Expression &expr)
{
if(token.class=='D')
{
expr.type='D';
expr.value=token.val-'0';
get_next_token();
return 1;
}
if(token.class=='(')
{
expr.type='P';
get_next_token();
        parse(expr.left);
        parse_operator(&expr.op);
        parse(expr.right);
if(token.class!=')')
Error("missing )");
get_next_token();
return 1;
}
return 0;
}
Grammar is:
expr -> digit | (expr op expr)
digit -> 0|1|2....|9
op -> +|*
You can store the tree in memory, or you can directly produce the required output code. Storing the intermediate form is normally done so that you can do some processing on the code at a higher level before generating output.
In your case, for example, it would be simple to discover that your expression contains no variables and therefore the result is a fixed number. Looking at only one node at a time, however, this is not possible. To be more explicit: if, after looking at "2*", you generate machine code for computing the double of something, that code is wasted when the other part turns out to be, say, "3", because your program will compute "3" and then compute the double of that every time, while just loading "6" would be equivalent but shorter and faster.
If you want to generate machine code, then you first need to know what kind of machine the code is going to be generated for... the simplest model uses a stack-based approach. In this case you need no register allocation logic, and it's easy to compile directly to machine code without the intermediate representation. Consider this small example that handles just integers, four operations, unary negation and variables... you will notice that no data structure is used at all: source code characters are read and machine instructions are written to the output...
#include <stdio.h>
#include <stdlib.h>
void error(const char *what) {
fprintf(stderr, "ERROR: %s\n", what);
exit(1);
}
void compileLiteral(const char *& s) {
int v = 0;
while (*s >= '0' && *s <= '9') {
v = v*10 + *s++ - '0';
}
printf(" mov eax, %i\n", v);
}
void compileSymbol(const char *& s) {
printf(" mov eax, dword ptr ");
while ((*s >= 'a' && *s <= 'z') ||
(*s >= 'A' && *s <= 'Z') ||
(*s >= '0' && *s <= '9') ||
(*s == '_')) {
putchar(*s++);
}
printf("\n");
}
void compileExpression(const char *&);
void compileTerm(const char *& s) {
if (*s >= '0' && *s <= '9') {
// Number
compileLiteral(s);
} else if ((*s >= 'a' && *s <= 'z') ||
(*s >= 'A' && *s <= 'Z') ||
(*s == '_')) {
// Variable
compileSymbol(s);
} else if (*s == '-') {
// Unary negation
s++;
compileTerm(s);
printf(" neg eax\n");
} else if (*s == '(') {
// Parenthesized sub-expression
s++;
compileExpression(s);
if (*s != ')')
error("')' expected");
s++;
} else {
error("Syntax error");
}
}
void compileMulDiv(const char *& s) {
compileTerm(s);
for (;;) {
if (*s == '*') {
s++;
printf(" push eax\n");
compileTerm(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" imul ebx\n");
} else if (*s == '/') {
s++;
printf(" push eax\n");
compileTerm(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" idiv ebx\n");
} else break;
}
}
void compileAddSub(const char *& s) {
compileMulDiv(s);
for (;;) {
if (*s == '+') {
s++;
printf(" push eax\n");
compileMulDiv(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" add eax, ebx\n");
} else if (*s == '-') {
s++;
printf(" push eax\n");
compileMulDiv(s);
printf(" mov ebx, eax\n");
printf(" pop eax\n");
printf(" sub eax, ebx\n");
} else break;
}
}
void compileExpression(const char *& s) {
compileAddSub(s);
}
int main(int argc, const char *argv[]) {
if (argc != 2) error("Syntax: simple-compiler <expr>\n");
compileExpression(argv[1]);
return 0;
}
For example running the compiler with 1+y*(-3+x) as input you get as output
mov eax, 1
push eax
mov eax, dword ptr y
push eax
mov eax, 3
neg eax
push eax
mov eax, dword ptr x
mov ebx, eax
pop eax
add eax, ebx
mov ebx, eax
pop eax
imul ebx
mov ebx, eax
pop eax
add eax, ebx
However this approach of writing compilers doesn't scale well to an optimizing compiler.
While it's possible to get some optimization by adding a "peephole" optimizer in the output stage, many useful optimizations are possible only by looking at the code from a higher point of view.
Also, even the bare machine code generation could benefit from seeing more code, for example to decide which register to assign to what, or to decide which of the possible assembly implementations would be best for a specific code pattern.
For example the same expression could be compiled by an optimizing compiler to
mov eax, dword ptr x
sub eax, 3
imul dword ptr y
inc eax
Nine times out of ten you'll save the AST in memory for whatever you are doing after lexing and parsing are done.
Once you have an AST you can do a number of things:
Evaluate it directly (perhaps using recursion, perhaps using your own custom stack)
Transform it into some other output, such as code in another language or some other type of translation.
Compile it to preferred instruction set
etc.
You can create an AST with Dijkstra's Shunting-yard algorithm.
At some point you will have the whole expression or AST in memory, though, unless you calculate results immediately while parsing. That works for (sub-)expressions containing only literals or compile-time constants, but not for anything involving variables computed at runtime.
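A minimal sketch of that evaluate-while-parsing idea, restricted to the fully parenthesised digit grammar from the question (the function names are made up):
int evalExpr(const char *&s);        // forward declaration

int evalPrimary(const char *&s) {
    if (*s == '(') {                 // ( expr op expr )
        ++s;                         // skip '('
        int v = evalExpr(s);
        ++s;                         // skip ')'
        return v;
    }
    return *s++ - '0';               // single digit literal
}

int evalExpr(const char *&s) {
    int v = evalPrimary(s);
    while (*s == '+' || *s == '*') { // left-to-right, no precedence: fine for the grammar above
        char op = *s++;
        int rhs = evalPrimary(s);
        v = (op == '+') ? v + rhs : v * rhs;
    }
    return v;
}

// evalExpr on "((2+3)*4)*2" yields 40 on the fly, with no tree stored.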
So should I save a tree in memory or just use recursive calls;
You'll use recursive calls in your parser to build the tree in memory.
And of course, you want to keep the tree in memory to process it.
An optimizing compiler keeps several representations of the code in memory (and transforms them).
The answer to the question depends on whether you want a compiler, an interpreter, or something in between (an interpreter wrapped around an intermediate language). If you want an interpreter, a recursive descent parser will at the same time evaluate the expression, so there is no need to hold it in memory. If you want a compiler, then a constant expression like the example can and should be optimised, but most expressions will operate on variables, and you need to convert to tree form as an intermediate step before converting to a linear form.
A hybrid compiler / interpreter will usually compile expressions, but it doesn't have to. It's often a cheap way of producing a program that outputs an executable: simply wrap the interpreter up with the source code. Matlab uses this technique - code used to be genuinely compiled, but there were problems with consistency with the interactive version. However, I wouldn't let the difficulty of generating a parse tree for expressions decide the issue.