Tail recursion in C++ - c++

Can someone show me a simple tail-recursive function in C++?
Why is tail recursion better, if it even is?
What other kinds of recursion are there besides tail recursion?

A simple tail recursive function:
unsigned int f( unsigned int a ) {
if ( a == 0 ) {
return a;
}
return f( a - 1 ); // tail recursion
}
Tail recursion is basically when:
there is only a single recursive call
that call is the last statement in the function
And it's not "better", except in the sense that a good compiler can remove the recursion, transforming it into a loop. This may be faster and will certainly save on stack usage. The GCC compiler can do this optimisation.
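To make the "transforming it into a loop" part concrete, here is a minimal sketch of a tail-recursive function next to the loop a compiler might effectively produce from it. The names gcd_recursive and gcd_loop are made up for this illustration, and whether a given compiler actually performs the transformation depends on the compiler and the optimisation level.

```cpp
// Tail-recursive form: the recursive call is the last thing the function does.
unsigned int gcd_recursive(unsigned int a, unsigned int b) {
    if (b == 0) {
        return a;
    }
    return gcd_recursive(b, a % b);  // tail call: nothing is left to do afterwards
}

// Roughly the loop an optimising compiler may reduce it to: rebind the
// parameters and jump back to the top instead of making a call.
unsigned int gcd_loop(unsigned int a, unsigned int b) {
    while (b != 0) {
        const unsigned int next = a % b;
        a = b;
        b = next;
    }
    return a;
}
```

Both versions compute the same result; only the recursive one can consume a stack frame per step if the optimisation is not applied.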

Tail recursion in C++ looks the same as in C or any other language.
void countdown( int count ) {
if ( count ) return countdown( count - 1 );
}
Tail recursion (and tail calling in general) requires clearing the caller's stack frame before executing the tail call. To the programmer, tail recursion is similar to a loop, with return reduced to working like goto first_line;. The compiler needs to detect what you are doing, though, and if it doesn't, there will still be an additional stack frame. Most compilers support it, but writing a loop or goto is usually easier and less risky.
Non-recursive tail calls can enable arbitrary branching (like a goto to the first line of some other function), which is a more distinctive facility.
Note that in C++, there cannot be any object with a nontrivial destructor in the scope of the return statement. The end-of-function cleanup would require the callee to return back to the caller, eliminating the tail call.
Also note (in any language) that tail recursion requires the entire state of the algorithm to be passed through the function argument list at each step. (This is clear from the requirement that the function's stack frame be eliminated before the next call begins… you can't be saving any data in local variables.) Furthermore, no operation can be applied to the function's return value before it's tail-returned.
int factorial( int n, int acc = 1 ) {
if ( n == 0 ) return acc;
else return factorial( n-1, acc * n );
}

Tail recursion is a special case of a tail call. A tail call is where the compiler can see that there are no operations that need to be done upon return from a called function -- essentially turning the called function's return into its own. The compiler can often do a few stack fix-up operations and then jump (rather than call) to the address of the first instruction of the called function.
One of the great things about this, besides eliminating some returns, is that you also cut down on stack usage. On some platforms or in OS code the stack can be quite limited, and even on advanced machines like the x86 CPUs in our desktops, decreasing the stack usage like this can improve data cache performance.
Tail recursion is where the called function is the same as the calling function. This can be turned into loops, which is exactly the same as the jump in the tail call optimization mentioned above. Since this is the same function (callee and caller) there are fewer stack fixups that need to be done before the jump.
The following shows a common way to do a recursive call which would be more difficult for a compiler to turn into a loop:
int sum(int a[], unsigned len) {
if (len==0) {
return 0;
}
return a[0] + sum(a+1,len-1);
}
This is simple enough that many compilers could probably figure it out anyway, but as you can see, an addition still has to happen after the called sum returns its number, so a simple tail call optimization is not possible.
If you did:
static int sum_helper(int acc, unsigned len, int a[]) {
if (len == 0) {
return acc;
}
return sum_helper(acc+a[0], len-1, a+1);
}
int sum(int a[], unsigned len) {
return sum_helper(0, len, a);
}
You would be able to take advantage of the calls in both functions being tail calls. Here the sum function's main job is to move a value and clear a register or stack position. The sum_helper does all of the math.
Since you mentioned C++ in your question I'll mention some special things about that.
C++ hides some things from you which C does not. Of these, destructors are the main thing that will get in the way of tail call optimization.
int boo(yin * x, yang *y) {
dharma z = x->foo() + y->bar();
return z.baz();
}
In this example the call to baz is not really a tail call because z needs to be destructed after the return from baz. I believe that the rules of C++ may make the optimization more difficult even in cases where the variable is not needed for the duration of the call, such as:
int boo(yin * x, yang *y) {
dharma z = x->foo() + y->bar();
int u = z.baz();
return qwerty(u);
}
z may have to be destructed after the return from qwerty here.
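One way to sidestep this, sketched below, is to confine the object to an inner block so its destructor has already run by the time the final call is made. Dharma and this qwerty are hypothetical stand-ins for the types and functions above; they append to a log string only so that the order of events can be observed. Whether the compiler then really emits a jump instead of a call is still up to the compiler.

```cpp
#include <string>

// Hypothetical stand-in for the `dharma` type: it records its
// construction and destruction in a log so the ordering is visible.
struct Dharma {
    std::string& log;
    explicit Dharma(std::string& l) : log(l) { log += "ctor;"; }
    ~Dharma() { log += "dtor;"; }
    int baz() const { return 21; }
};

// Hypothetical stand-in for `qwerty`.
int qwerty(int u, std::string& log) {
    log += "qwerty;";
    return 2 * u;
}

int boo(std::string& log) {
    int u = 0;
    {
        Dharma z(log);
        u = z.baz();
    }  // z is destroyed here, before the final call
    return qwerty(u, log);  // no cleanup remains after this call
}
```

After calling boo, the log reads "ctor;dtor;qwerty;", showing that the destructor no longer stands between the call and the return.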
Another thing would be implicit type conversion, which can happen in C as well, but can be more complicated and more common in C++.
For instance:
static double sum_helper(double acc, unsigned len, double a[]) {
if (len == 0) {
return acc;
}
return sum_helper(acc+a[0], len-1, a+1);
}
int sum(double a[], unsigned len) {
return sum_helper(0.0, len, a);
}
Here sum's call to sum_helper is not a tail call because sum_helper returns a double and sum will need to convert that into an int.
In C++ it is quite common to return an object reference which may have all kinds of different interpretations, each of which could be a different type conversion.
For instance:
bool write_it(int it) {
return cout << it;
}
Here there is a call made to cout.operator<< as the last statement. cout will return a reference to itself (which is why you can string lots of things together in a list separated by << ), which you then force to be evaluated as a bool, which ends up calling another of cout's methods, operator bool(). This cout.operator bool() could be called as a tail call in this case, but operator<< could not.
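A caveat if you try this with a C++11 or later compiler: the stream's conversion to bool is explicit there, so it is not applied implicitly in a return statement and the conversion has to be spelled out. The sketch below shows that form. Taking the stream as a parameter is an assumption made here so the function can be exercised without writing to cout.

```cpp
#include <ostream>
#include <sstream>

// Writes `it` to `out` and reports whether the stream is still in a good state.
// The `out` parameter is an addition for illustration; the original used cout.
bool write_it(std::ostream& out, int it) {
    // operator<< returns the stream; the cast then invokes the stream's
    // explicit operator bool, which is true when no error flag is set.
    return static_cast<bool>(out << it);
}
```

The point about tail calls is unchanged: the conversion happens after operator<< returns, so operator<< itself is not in tail position.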
EDIT:
One thing that is worth mentioning is that a major reason tail call optimization is possible in C is that the compiler knows the called function will store its return value in the same place where the calling function would have to store its own return value.

Tail recursion is a trick to actually cope with two issues at the same time. The first is executing a loop when it is hard to know the number of iterations to do.
Though this can be worked out with simple recursion, the second problem arises which is that of stack overflow due to the recursive call being executed too many times. The tail call is the solution, when accompanied by a "compute and carry" technique.
In basic CS you learn that a computer algorithm needs to have an invariant and a termination condition. This is the base for building the tail recursion.
All computation happens in the argument passing.
All results must be passed onto function calls.
The tail call is the last call, and occurs at termination.
Simply put, no computation may happen on the return value of your function.
Take for example the computation of a power of 10, which is trivial and can be written by a loop.
Should look something like
template<typename T> T pow10(T const p, T const res =1)
{
// p is const, so pass p - 1 instead of --p; stop when p reaches 0
return p == 0 ? res : pow10(p - 1, 10*res);
}
For example, p = 4 gives this execution:
ret,p,res
-,4,1
-,3,10
-,2,100
-,1,1000
-,0,10000
10000,-,-
It is clear that the compiler just has to copy values without changing the stack pointer and when the tail call happens just to return the result.
Tail recursion is very important because it lends itself to ready-made compile-time evaluation, e.g. the above can be rewritten as:
template<int N,int R=1> struct powc10
{
int operator()() const
{
return powc10<N-1, 10*R>()();
}
};
template<int R> struct powc10<0,R>
{
int operator()() const
{
return R;
}
};
This can be used as powc10<9>()() to compute the 9th power at compile time (an exponent of 10 would overflow a 32-bit int).
Most compilers limit the depth of nested calls, so the tail call trick helps. There are no loops in template metaprogramming, so you have to use recursion.
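For comparison, since C++11 the same compute-and-carry shape can also be written as a constexpr function instead of a class template. This is only a sketch; pow10c is a name invented here, and long long is used so that 10^10 fits.

```cpp
// Tail-recursive power of ten, usable in constant expressions.
// A single return statement keeps it valid even under the C++11 rules.
constexpr long long pow10c(unsigned p, long long res = 1) {
    return p == 0 ? res : pow10c(p - 1, 10 * res);
}

// Checked by the compiler; a wrong value would be a compile error.
static_assert(pow10c(0) == 1, "10^0");
static_assert(pow10c(4) == 10000, "10^4");
static_assert(pow10c(10) == 10000000000LL, "10^10");
```

The same function can still be called at run time with a non-constant argument.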

Tail recursion does not exist really at compiler level in C++.
Although you can write programs that use tail recursion, you do not get the inherent benefits of tail recursion implemented by supporting compilers/interpreters/languages. For instance, Scheme supports a tail recursion optimization so that it basically will change recursion into iteration. This makes it faster and invulnerable to stack overflows. C++ does not have such a thing (at least not in any compiler I've seen).
Apparently tail recursion optimizations exist in both MSVC++ and GCC. See this question for details.

Wikipedia has a decent article on tail recursion. Basically, tail recursion is better than regular recursion because it's trivial to optimize it into an iterative loop, and iterative loops are generally more efficient than recursive function calls. This is particularly important in functional languages where you don't have loops.
For C++, it's still good if you can write your recursive loops with tail recursion since they can be better optimized, but in such cases, you can generally just do it iteratively in the first place, so the gain is not as great as it would be in a functional language.

Related

I can't get the right output that I want and the answer changes every time

So I am trying to code for this question:
Yes, I have to use arrays since it is a requirement.
Consider the problem of adding two n-bit binary integers, stored in two n-element arrays A and B. The sum of the two integers should be stored in binary form in an (n+1) element array C . State the problem formally and write pseudocode for adding the two integers.
I know that the ans array contains the correct output at the end of the addd function. However, I am not able to output that answer.
Below is my code. Please help me figure where in the code I'm going wrong, and what I can do to change it so it works. I will be very grateful.
#include <iostream>
using namespace std;
int * addd(int a[], int n1, int b[], int n2)
{
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
int ans[s];
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
//cout<<"Carry "<<carry<<endl;
ans[0]=carry;
return ans;
}
int main(int argc, const char * argv[]) {
// insert code here...
int a[]={0,0,0,1,1,1};
int n1=sizeof(a)/sizeof(a[0]);
int b[]={1,0,1,1,0,1};
int n2=sizeof(b)/sizeof(b[0]);
int *p=addd(a,6,b,6);
// cout<<p[1]<<endl;
// cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
return 0;
}
using namespace std;
Don't write using namespace std;. I have a summary I paste in from a file of common issues when I'm active in the Code Review Stack Exchange, but I don't have that here. Instead, you should just declare the symbols you need, like using std::cout;
int * addd(int a[], int n1, int b[], int n2)
The parameters of the form int a[] are very odd. This comes from C and is actually transformed into int* a and is not passing the array per-se.
The inputs should be const.
The names are not clear, but I'm guessing that n1 is the size of the array? In the C++ Core Guidelines, you'll see that passing a pointer plus length is strongly discouraged. The Guidelines Support Library (GSL) supplies a simple span type to use for this instead.
And the length should be size_t not int.
Based on the description, I think each element is only one bit, right? So why are the arrays of type int? I'd use bool or perhaps int8_t as being easier to work with.
What are you returning? If a and b and their lengths are the input, where is the output that you are returning a pointer to the beginning of? This is not giving value semantics, as you are returning a pointer to something that must exist elsewhere so what is its lifetime?
int s;
int ans[s];
return ans;
Well, there's your problem. First of all, declaring an array with a size that's not a constant is not even legal in standard C++. (This is a GNU extension that implements C's VLA feature, but not without issues, as it breaks the C++ type system.)
Regardless of that, you are returning a pointer to the first element of the local array, so what happens to the memory when the function returns? Boom.
int s;
No. Initialize values when they are created.
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
Learn the library.
How about:
const size_t s = 1+std::max(n1,n2);
and then the portable way to get your memory is:
std::vector<int> ans(s);
Your main logic will not work if one array is shorter than the other. The shorter input should behave as if it had leading zeros to match. Consider abstracting the problem of "getting the next bit" so you don't duplicate the code for handling each input and make an unreadable mess. You really should have learned to use collections and iterators first.
now:
return ans;
would work as intended since it is a value. You just need to declare the function to be the right type. So just use auto for the return type and it knows.
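Putting those pieces together, a minimal sketch of the whole function might look like the following. It assumes the inputs are passed as std::vector<int> with the most significant bit first (the original passes raw arrays), and it treats a shorter input as if it had leading zeros.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Adds two binary numbers stored most-significant-bit first.
// The result has one more element than the longer input.
std::vector<int> addd(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t s = 1 + std::max(a.size(), b.size());
    std::vector<int> ans(s);  // value-initialized to zeros

    std::size_t i = a.size(), j = b.size(), k = s;
    int carry = 0;
    while (k > 1) {
        // A shorter input behaves as if it had leading zeros.
        const int bit_a = (i > 0) ? a[--i] : 0;
        const int bit_b = (j > 0) ? b[--j] : 0;
        const int sum = bit_a + bit_b + carry;
        ans[--k] = sum % 2;
        carry = sum / 2;
    }
    ans[0] = carry;
    return ans;  // returned by value, so nothing dangles
}
```

With the inputs from the question, 000111 plus 101101, this yields 0110100.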
int n1=sizeof(a)/sizeof(a[0]);
Noooooooo.
There is a standard function, std::size (since C++17), that gives the size of a built-in primitive array. But really, this should be done automatically as part of the passing, not as a separate thing, as noted earlier.
int *p=addd(a,6,b,6);
You wrote 6 instead of n1 etc.
Anyway, with the previous edits, it becomes:
using std::size;
const auto p = addd (a, size(a), b, size(b));
Finally, concerning:
cout<<p[0]<<" "<<p[1]<<" "<<p[2]<<" "<<p[3]<<" "<<p[4]<<" "<<p[5]<<" "<<p[6]<<endl;
How about using loops?
for (auto val : p) cout << val;
cout << '\n';
Oh, and don't use endl. The flush it forces is rarely needed, since cout is flushed anyway at normal program exit and before input is read from cin, and it's slow. Modern best practice is to use '\n' and then flush explicitly if/when needed (like, never).
Let's look at:
int ans[s];
Apart from the fact that this is not even part of the C++ standard (your compiler is probably giving you warnings about it), that statement allocates temporary memory on the stack, which is deallocated when the function exits. That's why you get different results every time: you are reading garbage, i.e. memory that might have been overwritten in the meantime.
You can replace it for example with
int* ans = new int[s];
Don't forget though to deallocate the memory when you have finished using the buffer (outside the function), to avoid memory leakage.
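If tracking the delete[] by hand feels error-prone, one alternative (not what is proposed above, just a sketch) is to let std::unique_ptr own the buffer so it is released automatically. The helper name make_bits is invented for this example.

```cpp
#include <cstddef>
#include <memory>

// Returns a zero-initialized heap buffer of `s` ints. The unique_ptr owns
// the memory and calls delete[] automatically when it goes out of scope.
std::unique_ptr<int[]> make_bits(std::size_t s) {
    return std::unique_ptr<int[]>(new int[s]());
}
```

The caller can index the result like a plain array, e.g. p[0], and never has to free it explicitly.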
Some other notes:
int s;
if(n1<n2) {s=n2+1;}
else {s=n1+1;}
This can be more elegantly written as:
const int s = (n1 < n2) ? n2 + 1 : n1 + 1;
Also, the actual computation code is imprecise as it leads to wrong results if n1 is not equal to n2: You need further code to finish processing the remaining bits of the longest array. By the way you don't need to check on k > 0 because of the way you have defined s.
The following should work:
int i=n1-1, j=n2-1, k=s-1;
int carry=0;
while(i>=0 && j>=0)
{
ans[k]=(a[i]+b[j]+carry)%2;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
while(i>=0) {
ans[k]=(a[i]+carry)%2;
carry=(a[i]+carry)/2;
i--; k--;
}
while(j>=0) {
ans[k]=(b[j]+carry)%2;
carry=(b[j]+carry)/2;
j--; k--;
}
ans[0]=carry;
return ans;
}
If You Must Only Use C Arrays
Returning ans is returning the pointer to a local variable. The object the pointer refers to is no longer valid after the function has returned, so trying to read it would lead to undefined behavior.
One way to fix this is to pass in the address to an array to hold your answer, and populate that, instead of using a VLA (which is a non-standard C++ extension).
A VLA (variable length array) is an array which takes its size from a run-time computed value. In your case:
int s;
//... code that initializes s
int ans[s];
ans is a VLA because you are not using a constant to determine the array size. However, that is not a standard feature of the C++ language (it is an optional one in the C language).
You can modify your function so that ans is actually provided by the caller.
int * addd(int a[], int n1, int b[], int n2, int ans[])
{
//...
And then the caller would be responsible for passing in a large enough array to hold the answer.
Your function also appears to be incomplete.
while(i>=0 && j>=0 && k>0)
{
ans[k]=(a[i]+b[j]+carry)%2;
//cout<<k<<" "<<ans[k]<<endl;
carry=(a[i]+b[j]+carry)/2;
i--; j--; k--;
}
If one array is shorter than the other, then the index for the shorter array will reach 0 first. Then, when that corresponding index goes negative, the loop will stop, without handling the remaining terms in the longer array. This essentially makes the corresponding entries in ans be uninitialized. Reading those values results in undefined behavior.
To address this, you should populate the remaining entries in ans with the correct calculation based on carry and the remaining entries in the longer array.
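A sketch of how both fixes could look together is shown below: the caller supplies the output array, and an exhausted input contributes zeros. It returns void rather than the pointer used in the signature above, which is a simplification made for this example, and it assumes the caller has provided room for max(n1, n2) + 1 elements.

```cpp
// Adds two binary numbers (most significant bit first) into `ans`.
// Assumption: the caller supplies `ans` with room for max(n1, n2) + 1 ints.
void addd(const int a[], int n1, const int b[], int n2, int ans[]) {
    int i = n1 - 1, j = n2 - 1;
    int k = (n1 < n2 ? n2 : n1);  // index of the last element of ans
    int carry = 0;
    while (k > 0) {
        // Once an input is exhausted, treat its missing bits as zero.
        const int bit_a = (i >= 0) ? a[i--] : 0;
        const int bit_b = (j >= 0) ? b[j--] : 0;
        const int sum = bit_a + bit_b + carry;
        ans[k--] = sum % 2;
        carry = sum / 2;
    }
    ans[0] = carry;
}
```

Every element of ans is written exactly once, so nothing is left uninitialized even when the lengths differ.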
A More C++ Approach
The original answer above was provided assuming you were constrained to only using C style arrays for both input and output, and that you wanted an answer that would allow you to stay close to your original implementation.
Below is a more C++ oriented solution, assuming you still need to provide C arrays as input, but otherwise no other constraint.
C Array Wrapper
A C array does not provide the amenities that you may be accustomed to have when using C++ containers. To gain some of these nice to have features, you can write an adapter that allows a C array to behave like a C++ container.
template <typename T, std::size_t N>
struct c_array_ref {
typedef T ARR_TYPE[N];
ARR_TYPE &arr_;
typedef T * iterator;
typedef std::reverse_iterator<T *> reverse_iterator;
c_array_ref (T (&arr)[N]) : arr_(arr) {}
std::size_t size () { return N; }
T & operator [] (int i) { return arr_[i]; }
operator ARR_TYPE & () { return arr_; }
iterator begin () { return &arr_[0]; }
iterator end () { return begin() + N; }
reverse_iterator rbegin () { return reverse_iterator(end()); }
reverse_iterator rend () { return reverse_iterator(begin()); }
};
Use C Array References
Instead of passing in two arguments as information about the array, you can pass in the array by reference, and use template argument deduction to deduce the array size.
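As a small standalone illustration of that deduction (count_ones is a hypothetical helper, not part of the solution itself), the array size N below is never passed explicitly:

```cpp
#include <cstddef>

// N is deduced from the argument's array type, so no separate length
// parameter is needed. Passing a plain pointer would not compile.
template <std::size_t N>
std::size_t count_ones(const int (&bits)[N]) {
    std::size_t ones = 0;
    for (std::size_t i = 0; i < N; ++i) {
        ones += (bits[i] != 0);
    }
    return ones;
}
```

Calling count_ones(a) with int a[] = {1,0,0,0,1,1,1}; instantiates the template with N = 7.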
Return a std::array
Although you cannot return a local C array like you attempted in your question, you can return an array that is wrapped inside a struct or class. That is precisely what the convenience container std::array provides. When you use C array references and template argument deduction to obtain the array size, you can now compute at compile time the proper array size that std::array should have for the return value.
template <std::size_t N1, std::size_t N2>
std::array<int, ((N1 < N2) ? N2 : N1) + 1>
addd(int (&a)[N1], int (&b)[N2])
{
Normalize the Input
It is much easier to solve the problem if you assume the arguments have been arranged in a particular order. If you always want the second argument to be the larger array, you can do that with a simple recursive call. This is perfectly safe, since we know the recursion will happen at most once.
if (N2 < N1) return addd(b, a);
Use C++ Containers (or Look-Alike Adapters)
We can now convert our arguments to the adapter shown earlier, and also create a std::array to hold the output.
c_array_ref<int, N1> aa(a);
c_array_ref<int, N2> bb(b);
std::array<int, std::max(N1, N2)+1> ans;
Leverage Existing Algorithms if Possible
To deal with the shortcomings of your original program, you can adjust your implementation a bit to remove special cases. One way to do that is to first store the result of adding the longer array to 0 into the output. This can be accomplished with a simple call to std::copy.
ans[0] = 0;
std::copy(bb.begin(), bb.end(), ans.begin() + 1);
Since we know the input consists of only 1s and 0s, we can compute straight addition from the shorter array into the longer array, without concern for carry (that will be addressed in the next step). To compute this addition, we apply std::transform with a lambda expression.
std::transform(aa.rbegin(), aa.rend(), ans.rbegin(),
ans.rbegin(),
[](int a, int b) -> int { return a + b; });
Lastly, we can make a pass over the output array to fix up the carry computation. After doing so, we are ready to return the result. The return is possible because we are using std::array to represent the answer.
for (auto i = ans.rbegin(); i != ans.rend()-1; ++i) {
*(i+1) += *i / 2;
*i %= 2;
}
return ans;
}
A Simpler main Function
We now only need to pass in the two arrays to the addd function, since template type deduction will discover the sizes of the arrays. In addition, the output generator can be handled more easily with an ostream_iterator.
int main(int, const char * []) {
int a[]={1,0,0,0,1,1,1};
int b[]={1,0,1,1,0,1};
auto p=addd(a,b);
std::copy(p.begin(), p.end(),
std::ostream_iterator<int>(std::cout, " "));
return 0;
}
If I may editorialize a bit... I think this is a deceptively difficult question for beginners, and as-stated should flag problems in the design review long before any attempt at coding. It's telling you to do things that are not good/typical/idiomatic/proper in C++, and distracting you with issues that get in the way of the actual logic to be developed.
Consider the core algorithm you wrote (and Antonio corrected): that can be understood and discussed without worrying about just how A and B are actually passed in for this code to use, or exactly what kind of collection it is. If they were std::vector, std::array, or primitive C array, the usage would be identical. Likewise, how does one return the result out of the code? You populate ans here, and how it is gotten into and/or out of the code and back to main is not relevant.
Primitive C arrays are not first-class objects in C++ and there are special rules (inherited from C) on how they are passed as arguments.
Returning is even worse, and returning dynamic-sized things was a major headache in C and memory management like this is a major source of bugs and security flaws. What we want is value semantics.
Second, using arrays and subscripts is not idiomatic in C++. You use iterators and abstract over the exact nature of the collection. If you were interested in writing super-efficient back-end code that doesn't itself deal with memory management (it's called by other code that deals with the actual collections involved) it would look like std::merge which is a venerable function that dates back to the early 90's.
template< class InputIt1, class InputIt2, class OutputIt >
OutputIt merge( InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first );
You can find others with similar signatures, that take two different ranges for input and outputs to a third area. If you write addp exactly like this, you could call it with primitive C arrays of hardcoded size:
int8_t A[] {0,0,0,1,1,1};
int8_t B[] {1,0,1,1,0,1};
int8_t C[ ??? ];
using std::begin; using std::end;
addp (begin(A),end(A), begin(B), end(B), begin(C));
Note that it's up to the caller to have prepared an output area large enough, and there's no error checking.
However, the same code can be used with vectors, or even any combination of different container types. This could populate a std::vector as the result by passing an insertion iterator. But in this particular algorithm that's difficult since you're computing it in reverse order.
std::array
Improving upon the situation with primitive C arrays, you could use the std::array class which is exactly the same array but without the strange passing/returning rules. It's actually just a primitive C array inside a wrapping struct. See this documentation: https://en.cppreference.com/w/cpp/container/array
So you could write it as:
using BBBNum1 = std::array<int8_t, 6>;
BBBNum1 addp (const BBBNum1& A, const BBBNum1& B) { ... }
The code inside can use A[i] etc. in the same way you are, but it also can get the size via A.size(). The issue here is that the inputs are the same length, and the output is the same as well (not 1 larger). Using templates, it could be written to make the lengths flexible but still only specified at compile time.
std::vector
The vector is like an array but with a run-time length. It's dynamic, and the go-to collection you should reach for in C++.
using BBBNum2 = std::vector<int8_t>;
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B) { ... }
Again, the code inside this function can refer to B[j] etc. and use B.size() exactly the same as with the array collection. But now, the size is a run-time property, and can be different for each one.
You would create your result, as in my first post, by giving the size as a constructor argument, and then you can return the vector by-value. Note that the compiler will do this efficiently and not actually have to copy anything if you write:
auto C = addp (A, B);
now for the real work
OK, now that this distraction is at least out of the way, you can worry about actually writing the implementation. I hope you are convinced that using vector instead of a C primitive array does not affect your problem logic or even the (available) syntax of using subscripts. Especially since the problem referred to pseudocode, I interpret its use of "array" as "suitable indexable collection" and not specifically the primitive C array type.
The issue of going through 2 sequences together and dealing with differing lengths is actually a general purpose idea. In C++20, the Range library has things that make quick work of this. Older 3rd party libraries exist as well, and you might find it called zip or something like that.
But, let's look at writing it from scratch.
You want to read an item at a time from two inputs, but neatly make it look like they're the same length. You don't want to write the same code three times, or elaborate on the cases where A is shorter or where B may be shorter... just abstract out the idea that they are read together, and if one runs out it provides zeros.
This is its own piece of code that can be applied twice, to A and to B.
class backwards_bit_reader {
const BBBNum2& x;
size_t index;
public:
backwards_bit_reader(const BBBNum2& x) : x{x}, index{x.size()} {}
bool done() const { return index == 0; }
int8_t next()
{
if (done()) return 0; // keep reading infinite leading zeros
--index;
return x[index];
}
};
Now you can write something like:
backwards_bit_reader A_in { A };
backwards_bit_reader B_in { B };
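int carry = 0;
size_t k = C.size(); // C is the result, one element longer than the longer input
while (!A_in.done() || !B_in.done()) { // continue until both inputs are used up
const int a = A_in.next();
const int b = B_in.next();
const int c = a+b+carry;
carry = c/2; // update
C[--k]= c%2;
}
C[0]= carry; // the final bit, one longer than the input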
It can be written far more compactly, but this is clear.
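For reference, here is a self-contained sketch that puts the reader and the loop into one function that compiles on its own. The function name addp and the choice to return the result by value follow the earlier discussion; the loop runs until both readers are exhausted so that the zero-extension actually comes into play.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using BBBNum2 = std::vector<std::int8_t>;

// Reads bits from least to most significant and yields zeros once exhausted.
class backwards_bit_reader {
    const BBBNum2& x;
    std::size_t index;
public:
    explicit backwards_bit_reader(const BBBNum2& v) : x(v), index(v.size()) {}
    bool done() const { return index == 0; }
    std::int8_t next() {
        if (done()) return 0;  // infinite leading zeros
        --index;
        return x[index];
    }
};

BBBNum2 addp(const BBBNum2& A, const BBBNum2& B) {
    BBBNum2 C(1 + std::max(A.size(), B.size()));
    std::size_t k = C.size();
    int carry = 0;
    backwards_bit_reader A_in{A};
    backwards_bit_reader B_in{B};
    // Keep going until BOTH inputs are used up.
    while (!A_in.done() || !B_in.done()) {
        const int c = A_in.next() + B_in.next() + carry;
        carry = c / 2;
        C[--k] = static_cast<std::int8_t>(c % 2);
    }
    C[0] = static_cast<std::int8_t>(carry);  // the final bit
    return C;
}
```

There is only one loop body, and it does not care which of the two inputs is shorter.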
another approach
The problem is, is writing backwards_bit_reader beyond what you've learned thus far? How else might you apply the same logic to both A and B without duplicating the statements?
You should be learning to recognize what's sometimes called "code smell". Repeating the same block of code multiple times, and repeating the same steps with nothing changed but which variable it's applying to, should be seen as ugly and unacceptable.
You can at least cut back the cases by ensuring that B is always the longer one, if they are of different length. Do this by swapping A and B if that's not the case, as a preliminary step. (Actually implementing that well is another digression)
But the logic is still nearly duplicated, since you have to deal with the possibility of the carry propagating all the way to the end. Just now you have 2 copies instead of 3.
Extending the shorter one, at least in façade, is the only way to write one loop.
how realistic is this problem?
It's simplified to the point of being silly, but if it's not done in base 2 but with larger values, this is actually implementing multi-precision arithmetic, which is a real thing people want to do. That's why I named the type above BBBNum for "Bad Binary Bignum".
Getting down to an actual range of memory and wanting the code to be fast and optimized is also something you want to do sometimes. The BigNum is one example; you often see this with string processing. But we'll want to make an efficient back-end that operates on memory without knowing how it was allocated, and higher-level wrappers that call it.
For example:
void addp (const int8_t* a_begin, const int8_t* a_end,
const int8_t* b_begin, const int8_t* b_end,
int8_t* result_begin, int8_t* result_end);
will use the provided range for output, not knowing or caring how it was allocated, and taking input that's any contiguous range without caring what type of container is used to manage it as long as it's contiguous. Note that as you saw with the std::merge example, it's more idiomatic to pass begin and end rather than begin and size.
But then you have helper functions like:
BBBNum2 addp (const BBBNum2& A, const BBBNum2& B)
{
BBBNum2 result (1+std::max(A.size(),B.size()));
addp (A.data(), A.data()+A.size(), B.data(), B.data()+B.size(), result.data(), result.data()+result.size());
return result;
}
Now the casual user can call it using vectors and a dynamically-created result, but it's still available to call for arrays, pre-allocated result buffers, etc.

In recursive DP, break up recursion call by storing variables: inefficient?

Suppose I am solving a dynamic programming problem recursively (top down). For example, a recursive solution to the longest common subsequence problem:
LCS(S,n,T,m)
{
if (n==0 || m==0) return 0;
if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
return result;
}
Often in such a DP problem at some point we have to take the max of some expressions, representing returns to different choices we can make. In the above case we have the max of two simple expressions, but in worse cases it can be the max of three or four quite complicated expressions involving long function calls. In such situations, I am often tempted to give these complicated expressions their own variable names, to make the code more readable. In the above case that would mean I would write
LCS(S,n,T,m)
{
if (n==0 || m==0) return 0;
if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
else {
a = LCS(S,n-1,T,m);
b = LCS(S, n, T, m-1);
result = max(a, b);
}
return result;
}
(In this simplified case a and b are not complicated, but in other cases they are, and there may be even more arguments to the max function, so this could really help it be more understandable.)
My Question: Is this a terrible idea? As I understand it, I'm adding a variable to each layer of the call stack, and I'm thinking that could be wasteful. But on the other hand, at each layer it has to calculate the temporary variable LCS(S,n,T,m) anyway (I'm thinking in terms of C++, say), and as far as I know, there might be not much difference in cost between the two ways.
If this is a terrible idea, is there a more efficient way to break up a complicated recursive function call to make it more readable?
C++ has the "As-If" rule, which states that a compiler can do whatever it wants so long as the observable effects are indistinguishable from what is defined by the standard to happen. In this case, it's trivial to prove both fragments have the same meaning, and a compiler will likely emit identical instructions for both.
Note: You aren't doing dynamic programming here, as you don't memoise parameter / result pairs.
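To illustrate what memoising would add, here is a minimal top-down sketch in C++. The lowercase name lcs, the use of std::string, and the -1 sentinel for "not computed yet" are choices made for this example. It keeps the named temporaries a and b from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Top-down LCS with memoisation. memo[n][m] caches the answer for the
// first n characters of S and the first m characters of T; -1 means
// "not computed yet".
int lcs(const std::string& S, std::size_t n,
        const std::string& T, std::size_t m,
        std::vector<std::vector<int>>& memo) {
    if (n == 0 || m == 0) return 0;
    int& result = memo[n][m];
    if (result != -1) return result;  // already solved this subproblem
    if (S[n - 1] == T[m - 1]) {
        result = 1 + lcs(S, n - 1, T, m - 1, memo);
    } else {
        // Named temporaries, as in the question.
        const int a = lcs(S, n - 1, T, m, memo);
        const int b = lcs(S, n, T, m - 1, memo);
        result = std::max(a, b);
    }
    return result;
}

// Convenience wrapper that sets up the cache.
int lcs(const std::string& S, const std::string& T) {
    std::vector<std::vector<int>> memo(
        S.size() + 1, std::vector<int>(T.size() + 1, -1));
    return lcs(S, S.size(), T, T.size(), memo);
}
```

With the cache, each (n, m) pair is computed once, so the work is bounded by the table size rather than growing exponentially.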

what is the difference between a recursive version of "find" and a not recursive one?

In the book Accelerated C++ Programming, on page 205, there are the following two implementations of find
template <class In, class X> In find(In begin, In end, const X& x)
I would like to know whether there is any difference in performance between the following two implementations, or whether they are actually the same after compilation.
non-recursive
template <class In, class X> In find(In begin, In end, const X& x)
{
while (begin != end && *begin != x)
++begin;
return begin;
}
recursive
template <class In, class X> In find(In begin, In end, const X& x)
{
if (begin == end || *begin == x)
return begin;
begin++;
return find(begin, end, x);
}
By using Compiler Explorer suggested by Kerrek I got the following
non-recursive https://godbolt.org/g/waKUF2
recursive https://godbolt.org/g/VKNnYZ
It seems to be exactly the same after the compilation? (If I use the tool correctly.. Sorry, I am very new to C++)
Recursive functions add additional elements to the stack. This can potentially cause stack overflow errors, depending on the state of the stack before the recursion starts and the number of times you recurse.
Each function call pushes data onto the stack, including the return address. This continues until the data is found. At that point, each call returns the value that the call it made returned, until we finally get back to the function that called the original find.
The exact amount of data stored for each function call depends on the calling convention and architecture. There is also overhead from pushing data on the stack which can make the algorithm slower, but that depends on the algorithm.
This is strictly for recursion that is not tail call optimized.
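The recursive find from the question ends in a tail call, which is why an optimizing compiler can emit the same code for both versions. A small sketch for checking that the two behave the same; the names find_iter, find_rec and same_result are mine (renamed so both versions can coexist without clashing with std::find), while the bodies follow the question.

```cpp
#include <vector>

template <class In, class X>
In find_iter(In begin, In end, const X& x)
{
    while (begin != end && *begin != x)
        ++begin;
    return begin;
}

template <class In, class X>
In find_rec(In begin, In end, const X& x)
{
    if (begin == end || *begin == x)
        return begin;
    ++begin;
    return find_rec(begin, end, x);   // tail call: nothing is left to do afterwards
}

// Hypothetical check: both versions return the same iterator for a given input.
bool same_result(const std::vector<int>& v, int x)
{
    return find_iter(v.begin(), v.end(), x) == find_rec(v.begin(), v.end(), x);
}
```

Without optimization the recursive version still uses one stack frame per element visited, so the equivalence is about results, not about stack usage.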
For the most part recursion is slower, and takes up more of the stack as well. The main advantage of recursion is that for problems like tree traversal it makes the algorithm a little easier or more "elegant".
Check out some of the comparisons:
Recursion vs Iteration

Tail-recursion with objects

I have a recursive function that I would like to make tail-recursive. My actual problem is more complex and context-dependent. But the issue I would like to solve is demonstrated with this simple program:
#include <iostream>
struct obj
{
int n;
operator int&() { return n; }
};
int tail(obj n)
{
return tail(obj{ n + 1 > 1000 ? n - 1000 : n + 1 });
}
int main()
{
tail(obj{ 1 });
}
It seems natural that this is tail-recursive. It is not, though, because the destructor of obj n has to be called each time. At least MSVC13 and MSVC15 do not optimize this. If I replace obj with int and change the calls accordingly, it becomes tail-recursive as expected.
My actual question is: Is there an easy way to make this tail-recursive apart from just replacing obj with int? I am aiming for performance benefits, so playing around with heap-allocated memory and new is most likely not helpful.
Short Answer: No.
Longer Answer: You might find a way to achieve this but certainly no easy one.
Since tail call optimization is not required by the standard, you can never know for sure if some minor change to your program will make the compiler fail to optimize the code.
Worse, consider what happens when you need to debug your program. The compiler will almost certainly not optimize advanced tail calls with debugger flags, which means that your program will only work correctly in release mode. This will make the program much harder to maintain.
Alternative to tail recursion
Just write a loop. It can always be done and it is likely to be much, much less convoluted. It also doesn't use the heap, so the overhead will be much smaller.
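A rough sketch of what the loop could look like. The recursion in the question never terminates, so this version assumes a hypothetical steps parameter that bounds the number of iterations; step and run_loop are illustrative names, not part of the original program.

```cpp
struct obj
{
    int n;
    operator int&() { return n; }
};

// One update, using the same arithmetic as the argument of the
// recursive call in the question.
obj step(obj n)
{
    return obj{ n + 1 > 1000 ? n - 1000 : n + 1 };
}

// Hypothetical driver: apply the update `steps` times with no recursion
// and no heap allocation. The object is simply overwritten on each pass.
int run_loop(obj n, int steps)
{
    for (int i = 0; i < steps; ++i)
        n = step(n);
    return n;
}
```

Because nothing here depends on the compiler recognising a tail call, the behaviour is the same in debug and release builds.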
Since you use a temporary, I assume you don't need the object after the recursive call.
One fairly hackish solution is to allocate an object, pass a pointer to it, and reallocate it before making the recursive call, to which you pass the object you newly constructed.
struct obj
{
int n;
operator int&() { return n; }
};
int tail_impl(obj*& n)
{
int n1 = *n + 1 > 1000 ? *n - 1000 : *n + 1;
delete n;
n = new obj{n1};
return tail_impl(n);
}
int tail(obj n)
{
obj *n1 = new obj{n};
auto ret = tail_impl(n1);
delete n1;
return ret;
}
int main()
{
tail(obj{ 1 });
}
I've obviously omitted some crucial exception safety details. However GCC is able to turn tail_impl into a loop, since it is indeed tail recursion.

why is this factorial recursive function inefficient?

In the book "Think like a programmer", the following recursive function is said to be "highly inefficient" and I can't figure out why (the book does not explain). It doesn't seem like there are any unnecessary calculations being done. Is it because of the overhead of calling so many functions (well the same function multiple times) and thus setting up environments for each call to the function?
int factorial(int n) {
if (n == 1) return 1;
else return n * factorial(n-1);
}
It is inefficient in two ways, and you hit one of them:
It is recursive, instead of iterative. This will be highly inefficient if tail-call optimization is not enabled. (To learn more about tail-call optimization, look here.) It could have been done like this:
int factorial(int n)
{
int result = 1;
while (n > 0)
{
result *= n;
n--;
}
return result;
}
or, alternatively, with a for loop.
However, as noted in comments above, it doesn't really matter how efficient it is if an int can't even hold the result. It really should be longs, long longs, or even big-ints.
The second inefficiency is simply not having an efficient algorithm. This list of factorial algorithms shows some more efficient ways of computing the factorial by decreasing the number of numerical operations.
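On the point about int being too small for the result: a minimal sketch of the iterative version with a 64-bit unsigned result. The name factorial_u64 and the choice to throw on oversized input are my assumptions; the limit of 20 follows from 21! being larger than the maximum 64-bit unsigned value.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical variant: 64-bit result, rejects inputs whose factorial cannot fit.
std::uint64_t factorial_u64(unsigned n)
{
    if (n > 20)  // 21! exceeds the range of a 64-bit unsigned integer
        throw std::overflow_error("factorial_u64: result does not fit in 64 bits");

    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}
```

Anything beyond 20! needs a big-integer type, as the answer says.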
There is significant function call overhead in C when not using a compiler that implements tail call optimization.
Function call overhead is the extra time and memory necessary for a computer to properly set up a function call.
Tail call optimization is a method of turning recursive functions like the one given into a loop.
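Strictly speaking, the factorial in the question is not written as a tail call, because the multiplication by n happens after the recursive call returns. A common rewrite, sketched here with a hypothetical accumulator parameter acc, moves the multiplication into the argument so that the call is the last thing the function does. Whether the compiler then turns it into a loop still depends on the compiler and the optimization level.

```cpp
// Tail-recursive form: the running product travels in `acc` (hypothetical name).
int factorial_acc(int n, int acc = 1)
{
    if (n <= 1)
        return acc;
    return factorial_acc(n - 1, acc * n);  // nothing is left to do after this call
}
```

Called as factorial_acc(5), the default argument supplies the initial accumulator value of 1.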
I think the book writer may want to tell readers to not abuse recursion. For this function you could just use:
int factorial(int n) {
int res = 1;
for (int i = 1; i <= n; i++) {
res = res * i;
}
return res;
}
Recursion is slower and also uses more stack memory. It takes time to push information onto the stack and to pop it again. The main advantage of recursion is that it makes the algorithm a little easier to understand or more "elegant".
To compute the factorial we can use a for loop, which is better in terms of both memory use and running time.
int num=4;
int fact = 1;
for (;num>1;num--)
{
fact = fact*num;
}
//display fact