Tail Recursion, why is it efficient? [duplicate]


Very simply, what is tail-call optimization?
More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of why?

Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function. The most common use is tail-recursion, where a recursive function written to take advantage of tail-call optimization can use constant stack space.
Scheme is one of the few programming languages that guarantee in the spec that any implementation must provide this optimization, so here are two examples of the factorial function in Scheme:
(define (fact x)
  (if (= x 0) 1
      (* x (fact (- x 1)))))
(define (fact x)
  (define (fact-tail x accum)
    (if (= x 0) accum
        (fact-tail (- x 1) (* x accum))))
  (fact-tail x 1))
The first function is not tail recursive because when the recursive call is made, the function needs to keep track of the multiplication it needs to do with the result after the call returns. As such, the stack looks as follows:
(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6
In contrast, the stack trace for the tail recursive factorial looks as follows:
(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6
As you can see, we only need to keep track of the same amount of data for every call to fact-tail because we are simply returning the value we get right through to the top. This means that even if I were to call (fact 1000000), I need only the same amount of space as (fact 3). This is not the case with the non-tail-recursive fact, and as such large values may cause a stack overflow.

Let's walk through a simple example: the factorial function implemented in C.
We start with the obvious recursive definition:
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    return n * fac(n - 1);
}
A function ends with a tail call if the last operation before the function returns is another function call. If this call invokes the same function, it is tail-recursive.
Even though fac() looks tail-recursive at first glance, it is not, because what actually happens is
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    unsigned acc = fac(n - 1);
    return n * acc;
}
i.e., the last operation is the multiplication, not the function call.
However, it's possible to rewrite fac() to be tail-recursive by passing the accumulated value down the call chain as an additional argument and passing only the final result up again as the return value:
unsigned fac_tailrec(unsigned acc, unsigned n); /* declare before use */
unsigned fac(unsigned n)
{
    return fac_tailrec(1, n);
}
unsigned fac_tailrec(unsigned acc, unsigned n)
{
    if (n < 2) return acc;
    return fac_tailrec(n * acc, n - 1);
}
Now, why is this useful? Because we return immediately after the tail call, we can discard the previous stack frame before invoking the function in tail position or, in the case of recursive functions, reuse the stack frame as-is.
The tail-call optimization transforms our recursive code into
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
This can be inlined into fac() and we arrive at
unsigned fac(unsigned n)
{
    unsigned acc = 1;
TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
which is equivalent to
unsigned fac(unsigned n)
{
    unsigned acc = 1;
    for (; n > 1; --n)
        acc *= n;
    return acc;
}
As we can see here, a sufficiently advanced optimizer can replace tail recursion with iteration, which is more efficient because it avoids the function call overhead and uses only a constant amount of stack space.

TCO (Tail Call Optimization) is the process by which a smart compiler can make a call to a function without using additional stack space. This is only possible if the last instruction executed in a function f is a call to a function g (note: g can be f). The key here is that f no longer needs its stack space: it simply calls g and then returns whatever g returns. In this case, the optimization lets g run and return its value directly to whoever called f.
This optimization can make recursive calls take constant stack space, rather than stack space that grows with every call.
Example: this factorial function is not tail-call optimizable:
from dis import dis
def fact(n):
    if n == 0:
        return 1
    return n * fact(n-1)
dis(fact)
2 0 LOAD_FAST 0 (n)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 2 (==)
6 POP_JUMP_IF_FALSE 12
3 8 LOAD_CONST 2 (1)
10 RETURN_VALUE
4 >> 12 LOAD_FAST 0 (n)
14 LOAD_GLOBAL 0 (fact)
16 LOAD_FAST 0 (n)
18 LOAD_CONST 2 (1)
20 BINARY_SUBTRACT
22 CALL_FUNCTION 1
24 BINARY_MULTIPLY
26 RETURN_VALUE
This function does something besides calling another function in its return statement (the multiplication).
The function below is tail-call optimizable (although CPython itself never performs TCO):
def fact_h(n, acc):
    if n == 0:
        return acc
    return fact_h(n-1, acc*n)
def fact(n):
    return fact_h(n, 1)
dis(fact)
2 0 LOAD_GLOBAL 0 (fact_h)
2 LOAD_FAST 0 (n)
4 LOAD_CONST 1 (1)
6 CALL_FUNCTION 2
8 RETURN_VALUE
This is because the last thing to happen in any of these functions is to call another function.
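The same idea applies when g is a different function from f. Below is a minimal C++ sketch using two hypothetical, mutually recursive functions, is_even and is_odd: each one's last action is a call to the other, so both calls are in tail position. C++ does not guarantee this optimization; GCC enables it via -foptimize-sibling-calls at -O2 and above, so an unoptimized build may still use one stack frame per call.

```cpp
// Mutually recursive functions: the last action of each is a call to the
// other, so both are tail calls (sibling calls, since g != f).
bool is_odd(unsigned n);  // forward declaration so is_even can call it

bool is_even(unsigned n)
{
    if (n == 0) return true;
    return is_odd(n - 1);   // tail call: nothing left to do after it returns
}

bool is_odd(unsigned n)
{
    if (n == 0) return false;
    return is_even(n - 1);  // tail call back into is_even
}
```

For example, is_even(10) and is_odd(7) are both true. Whether a very large argument runs in constant stack space depends on the compiler and its flags.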

Probably the best high-level description I have found for tail calls, recursive tail calls and tail call optimization is the blog post
"What the heck is: A tail call"
by Dan Sugalski. On tail call optimization he writes:
Consider, for a moment, this simple function:
sub foo (int a) {
    a += 15;
    return bar(a);
}
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form return somefunc(); into the low-level sequence pop stack frame; goto somefunc();. In our example, that means before we call bar, foo cleans itself up and then, rather than calling bar as a subroutine, we do a low-level goto operation to the start of bar. Foo's already cleaned itself out of the stack, so when bar starts it looks like whoever called foo has really called bar, and when bar returns its value, it returns it directly to whoever called foo, rather than returning it to foo which would then return it to its caller.
And on tail recursion:
Tail recursion happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto back to the beginning of yourself, which is a darned simple thing to do.
So that this:
sub foo (int a, int b) {
    if (b == 1) {
        return a;
    } else {
        return foo(a*a + a, b - 1);
    }
}
gets quietly turned into:
sub foo (int a, int b) {
label:
    if (b == 1) {
        return a;
    } else {
        a = a*a + a;
        b = b - 1;
        goto label;
    }
}
What I like about this description is how succinct and easy to grasp it is for those coming from an imperative language background (C, C++, Java).
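To make the quoted pseudo-code concrete, here is a rough C++ transcription of both versions (the names foo_recursive and foo_loop are hypothetical). The loop is the goto version rewritten with while, which is what the transformation amounts to. It assumes b >= 1, as the pseudo-code implicitly does, and uses unsigned long long so that overflow wraps instead of being undefined.

```cpp
// Tail-recursive form: the recursive call is the very last operation.
// Precondition (assumed): b >= 1.
unsigned long long foo_recursive(unsigned long long a, unsigned long long b)
{
    if (b == 1) {
        return a;
    }
    return foo_recursive(a * a + a, b - 1);
}

// The same function after tail-call elimination by hand:
// overwrite the parameters and jump back to the top (here, loop).
unsigned long long foo_loop(unsigned long long a, unsigned long long b)
{
    while (b != 1) {
        a = a * a + a;
        b = b - 1;
    }
    return a;
}
```

Both return 42 for foo(2, 3): a goes from 2 to 6 to 42 while b counts down from 3 to 1.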

GCC C minimal runnable example with x86 disassembly analysis
Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.
This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.
This in turn saves memory and improves performance, since memory accesses are often the main thing that makes programs slow nowadays.
As an input, we give GCC a non-optimized, naive, stack-based factorial:
tail_call.c
#include <stdio.h>
#include <stdlib.h>
unsigned factorial(unsigned n) {
    if (n == 1) {
        return 1;
    }
    return n * factorial(n - 1);
}
int main(int argc, char **argv) {
    unsigned input;
    if (argc > 1) {
        input = strtoul(argv[1], NULL, 0);
    } else {
        input = 5;
    }
    printf("%u\n", factorial(input));
    return EXIT_SUCCESS;
}
Compile and disassemble:
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
-o tail_call.out tail_call.c
objdump -d tail_call.out
where -foptimize-sibling-calls is, according to man gcc, the option for sibling calls, a generalization of tail calls:
-foptimize-sibling-calls
Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
as mentioned at: How do I check if gcc is performing tail-recursion optimization?
I chose -O1 because:
the optimization is not done with -O0. I suspect that this is because some required intermediate transformations are missing.
-O3 produces ungodly efficient code that would not be very educational, although it is also tail call optimized.
Disassembly with -fno-optimize-sibling-calls:
0000000000001145 <factorial>:
1145: 89 f8 mov %edi,%eax
1147: 83 ff 01 cmp $0x1,%edi
114a: 74 10 je 115c <factorial+0x17>
114c: 53 push %rbx
114d: 89 fb mov %edi,%ebx
114f: 8d 7f ff lea -0x1(%rdi),%edi
1152: e8 ee ff ff ff callq 1145 <factorial>
1157: 0f af c3 imul %ebx,%eax
115a: 5b pop %rbx
115b: c3 retq
115c: c3 retq
With -foptimize-sibling-calls:
0000000000001145 <factorial>:
1145: b8 01 00 00 00 mov $0x1,%eax
114a: 83 ff 01 cmp $0x1,%edi
114d: 74 0e je 115d <factorial+0x18>
114f: 8d 57 ff lea -0x1(%rdi),%edx
1152: 0f af c7 imul %edi,%eax
1155: 89 d7 mov %edx,%edi
1157: 83 fa 01 cmp $0x1,%edx
115a: 75 f3 jne 114f <factorial+0xa>
115c: c3 retq
115d: 89 f8 mov %edi,%eax
115f: c3 retq
The key difference between the two is that:
the -fno-optimize-sibling-calls version uses callq, which is the typical non-optimized function call.
This instruction pushes the return address to the stack, therefore increasing it.
Furthermore, this version also does push %rbx, which pushes %rbx to the stack.
GCC does this because it stores edi, which holds the first function argument (n), into ebx, then calls factorial.
GCC needs to do this because it is preparing for another call to factorial, which will use the new edi == n-1.
It chooses ebx because this register is callee-saved (see What registers are preserved through a linux x86-64 function call), so the subcall to factorial won't change it and lose n.
the -foptimize-sibling-calls version does not use any instructions that push to the stack: it only does goto-like jumps within factorial with the instructions je and jne.
Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.
Tested in Ubuntu 18.10, GCC 8.2.

Note first of all that not all languages support it.
TCO applies to a special case of recursion. The gist of it is: if the last thing you do in a function is call itself (i.e., it calls itself from the "tail" position), the compiler can optimize this to act like iteration instead of standard recursion.
You see, normally during recursion, the runtime needs to keep track of all the recursive calls, so that when one returns it can resume at the previous call and so on. (Try manually writing out the result of a recursive call to get a visual idea of how this works.) Keeping track of all the calls takes up space, which gets significant when the function calls itself a lot. But with TCO, it can just say "go back to the beginning, only this time change the parameter values to these new ones." It can do that because nothing after the recursive call refers to those values.
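As a second illustration of "go back to the beginning with new parameter values", Euclid's greatest-common-divisor algorithm is naturally tail recursive. This C++ sketch (the function names are hypothetical) shows the recursive form next to the loop a TCO-capable compiler could reduce it to; whether a given compiler actually does so depends on its optimization settings.

```cpp
// Tail-recursive Euclid: the recursive call is the last thing done,
// and nothing after it needs the old values of a and b.
unsigned gcd_recursive(unsigned a, unsigned b)
{
    if (b == 0) return a;
    return gcd_recursive(b, a % b);  // "start over with these new parameters"
}

// What the optimization effectively produces: reassign and loop.
unsigned gcd_loop(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

gcd_recursive(48, 18) and gcd_loop(48, 18) both return 6.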

Look here:
http://tratt.net/laurie/tech_articles/articles/tail_call_optimization
As you probably know, recursive function calls can wreak havoc on a stack; it is easy to run out of stack space quickly. Tail call optimization is a way to write a recursive-style algorithm that uses constant stack space, so the stack does not keep growing until you get a stack overflow.

The recursive function approach has a problem. It builds up a call stack of size O(n), which makes our total memory cost O(n). This makes it vulnerable to a stack overflow error, where the call stack gets too big and runs out of space.
Tail call optimization (TCO) addresses this: it can optimize recursive functions to avoid building up a tall call stack, and hence saves the memory cost.
Some languages and implementations perform TCO (for example, ES2015 JavaScript specifies proper tail calls in strict mode, Ruby's MRI offers it as an opt-in compiler option, and C compilers such as GCC and Clang apply it as an optimization), whereas Python and Java do not.
JavaScript's support is discussed here: http://2ality.com/2015/06/tail-call-optimization.html

We should ensure that there are no goto statements in the function itself; this is taken care of by the function call being the last thing in the callee function.
Deep recursion can benefit from this optimization, but at small scale, the instruction overhead of turning the call into a tail call can defeat the purpose.
With TCO, infinite recursion may turn into a function that runs forever instead of crashing with a stack overflow:
void eternity()
{
    eternity();
}

In a functional language, tail call optimization is as if a function call could return a partially evaluated expression as the result, which would then be evaluated by the caller.
f x = g x
f 6 reduces to g 6. So if the implementation could return g 6 as the result, and then call that expression, it would save a stack frame.
Also, given
f x = if c x then g x else h x
f 6 reduces to either g 6 or h 6. So if the implementation evaluates c 6 and finds it is true, then it can reduce
if true then g x else h x ---> g x
and otherwise
if false then g x else h x ---> h x
A simple interpreter without tail call optimization might look like this:
class simple_value;
class simple_expression
{
    ...
public:
    virtual simple_value *DoEvaluate() const = 0;
};
class simple_value
{
    ...
};
class simple_function : public simple_expression
{
    ...
private:
    simple_expression *m_Function;
    simple_expression *m_Parameter;
public:
    virtual simple_value *DoEvaluate() const
    {
        vector<simple_expression *> parameterList;
        parameterList.push_back(m_Parameter);
        return m_Function->Call(parameterList);
    }
};
class simple_if : public simple_function
{
private:
    simple_expression *m_Condition;
    simple_expression *m_Positive;
    simple_expression *m_Negative;
public:
    simple_value *DoEvaluate() const
    {
        // Each branch is evaluated by a nested call, so the C++ stack grows.
        if (m_Condition->DoEvaluate()->IsTrue())
        {
            return m_Positive->DoEvaluate();
        }
        else
        {
            return m_Negative->DoEvaluate();
        }
    }
};
An interpreter with tail call optimization might look like this:
class tco_expression
{
    ...
public:
    virtual tco_expression *DoEvaluate() const = 0;
    virtual bool IsValue()
    {
        return false;
    }
};
class tco_value : public tco_expression
{
    ...
public:
    virtual bool IsValue()
    {
        return true;
    }
};
class tco_function : public tco_expression
{
    ...
private:
    tco_expression *m_Function;
    tco_expression *m_Parameter;
public:
    virtual tco_expression *DoEvaluate() const
    {
        vector<tco_expression *> parameterList;
        tco_expression *function = const_cast<tco_function *>(this);
        // Keep stepping until a value comes back,
        // instead of recursing on the C++ stack.
        while (!function->IsValue())
        {
            function = function->DoCall(parameterList);
        }
        return function;
    }
    tco_expression *DoCall(vector<tco_expression *> &p_ParameterList)
    {
        p_ParameterList.push_back(m_Parameter);
        return m_Function;
    }
};
class tco_if : public tco_function
{
private:
    tco_expression *m_Condition;
    tco_expression *m_Positive;
    tco_expression *m_Negative;
public:
    tco_expression *DoEvaluate() const
    {
        // Return the chosen branch unevaluated; the caller's loop evaluates it.
        if (m_Condition->DoEvaluate()->IsTrue())
        {
            return m_Positive;
        }
        else
        {
            return m_Negative;
        }
    }
};
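The tco_ classes above are a form of what is usually called a trampoline: instead of making a nested call, evaluation returns the next thing to evaluate, and one loop keeps stepping until it reaches a value. Since the interpreter classes rely on elided parts, here is a separate, self-contained C++ sketch of the trampoline idea applied to an accumulator-style factorial. The Step type and the finish, defer, fact_step and run names are hypothetical, not part of the interpreter above.

```cpp
#include <functional>
#include <utility>

// One step of a trampolined computation: either a finished value,
// or a thunk that produces the next step.
struct Step {
    bool is_done;
    unsigned long long value;     // meaningful only when is_done is true
    std::function<Step()> next;   // meaningful only when is_done is false
};

Step finish(unsigned long long v) { return Step{true, v, nullptr}; }
Step defer(std::function<Step()> f) { return Step{false, 0, std::move(f)}; }

// Accumulator-style factorial that never calls itself directly:
// it returns a thunk describing the tail call instead.
Step fact_step(unsigned long long n, unsigned long long acc)
{
    if (n < 2) return finish(acc);
    return defer([n, acc] { return fact_step(n - 1, n * acc); });
}

// The trampoline: a plain loop, so the C++ stack depth stays constant
// no matter how many "tail calls" are made.
unsigned long long run(Step s)
{
    while (!s.is_done) {
        s = s.next();
    }
    return s.value;
}
```

run(fact_step(5, 1)) returns 120. Because every "tail call" becomes one more iteration of the while loop in run, the stack does not grow even for a million steps; the price is constructing a std::function per step (which may allocate), the usual cost of trampolining.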

Related

In C++, is there any way to specialise a function template for specific values of arguments?

I have a broadly used function foo(int a, int b) and I want to provide a special version of foo that performs differently if a is, say, 1.
a) I don't want to go through the whole code base and change all occurrences of foo(1, b) to foo1(b), because the rules on arguments may change and I don't want to keep going through the code base whenever they do.
b) I don't want to burden function foo with an "if (a == 1)" test because of performance issues.
It seems to me to be a fundamental skill of the compiler to call the right code based on what it can see in front of it. Or is this a missing feature of C++ that currently requires macros or something similar to handle?

Simply write
inline int foo(int a, int b)
{
    if (a == 1) {
        // skip complex code and call easy code
        return call_easy(b);
    } else {
        // complex code here
        return do_complex(a, b);
    }
}
When you call
foo(1, 10);
the optimizer will/should simply insert a call to call_easy(10).
Any decent optimizer will inline the function and detect that it has been called with a == 1. Also, I think the whole constexpr approach mentioned in other posts is nice, but not really necessary in your case. constexpr is very useful if you want to resolve values at compile time, but you simply asked to switch code paths based on a value at runtime. The optimizer should be able to detect that.
To detect that, the optimizer needs to see your function definition at every place where your function is called. Hence the inline requirement, although compilers such as Visual Studio have a "generate code at link time" feature that relaxes this requirement somewhat.
Finally, you might want to look at the C++20 attributes [[likely]] and [[unlikely]]. I haven't worked with them yet, but they are supposed to tell the compiler which execution path is likely, as a hint to the optimizer.
And why don't you experiment a little and look at the generated code in the debugger or a disassembler? That will give you a feel for the optimizer. Don't forget that the optimizer is likely only active in Release builds :)

Templates work at compile time, and you want to decide at runtime, which is never possible with templates. If and only if you really can call your function with constexpr values, then you can change to a template, but the call becomes foo<1,2>() instead of foo(1,2). As for "performance issues"... that's really funny! If that single compare instruction is your performance problem... yes, then you have done everything else super perfectly :-)
BTW: If you already call with constexpr values and the function is visible in the compilation unit, you can be sure the compiler already knows to optimize it away...
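For completeness, here is a minimal C++ sketch of the template route described above, assuming a is always known at compile time at the call site. call_easy and do_complex are hypothetical placeholders for the two code paths. An explicit specialization for A == 1 replaces the runtime test, at the cost of changing the call syntax from foo(1, b) to foo<1>(b).

```cpp
// Hypothetical stand-ins for the "easy" and "complex" code paths.
int call_easy(int b) { return b; }
int do_complex(int a, int b) { return a * b + 1; }

// Primary template: the value of a is a compile-time template argument.
template <int A>
int foo(int b)
{
    return do_complex(A, b);
}

// Explicit specialization for A == 1, chosen at compile time:
// no runtime if (a == 1) is ever evaluated.
template <>
int foo<1>(int b)
{
    return call_easy(b);
}
```

Here foo<1>(10) goes straight to call_easy(10), while foo<2>(10) uses the general path.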
But there is another way to handle such things if you sometimes really have constexpr values and the algorithm inside the function can be constexpr-evaluated. In that case, you can decide inside the function whether it was called in a constexpr context. If so, you can run a full compile-time algorithm, which can also contain your if (a == 1) and will be fully evaluated at compile time. If the function is not called in a constexpr context, it runs as before without any additional overhead.
To make such a decision at compile time, we need the current C++ standard (C++20)!
#include <iostream>
#include <type_traits>
constexpr int foo(int a, int)
{
    if (std::is_constant_evaluated())
    { // this part is fully evaluated at compile time!
        if (a == 1)
        {
            return 1;
        }
        else
        {
            return 2;
        }
    }
    else
    { // and the rest runs as before at runtime
        if (a == 0)
        {
            return 3;
        }
        else
        {
            return 4;
        }
    }
}
int main()
{
    constexpr int res1 = foo(1, 0); // fully evaluated at compile time
    constexpr int res2 = foo(2, 0); // also fully at compile time
    std::cout << res1 << std::endl;
    std::cout << res2 << std::endl;
    std::cout << foo(5, 0) << std::endl; // here we go at runtime
    std::cout << foo(0, 0) << std::endl; // here we go at runtime
}
That code will print:
1
2
4
3
So we do not need to go with classic templates, no need to change the rest of the code but have full compile time optimization if possible.

@Sebastian's suggestion works, at least in the simple case, with all optimisation levels except -O0 in g++ 9.3.0 on Ubuntu 20.04 in C++20 mode. Thanks again.
The disassembly below shows main always calling the correct subfunction, func1 or func2, directly instead of the top-level func(). A similar disassembly with -O0 shows only the top-level func() being called, leaving the decision to run time, which is not desired.
I hope this will work in production code and perhaps with multiple hard coded arguments.
Breakpoint 1, main () at p1.cpp:24
24 int main() {
(gdb) disass /m
Dump of assembler code for function main():
6 inline void func(int a, int b) {
7
8 if (a == 1)
9 func1(b);
10 else
11 func2(a,b);
12 }
13
14 void func1(int b) {
15 std::cout << "func1 " << " " << " " << b << std::endl;
16 }
17
18 void func2(int a, int b) {
19 std::cout << "func2 " << a << " " << b << std::endl;
20 }
21
22 };
23
24 int main() {
=> 0x0000555555555286 <+0>: endbr64
0x000055555555528a <+4>: push %rbp
0x000055555555528b <+5>: push %rbx
0x000055555555528c <+6>: sub $0x18,%rsp
0x0000555555555290 <+10>: mov $0x28,%ebp
0x0000555555555295 <+15>: mov %fs:0x0(%rbp),%rax
0x000055555555529a <+20>: mov %rax,0x8(%rsp)
0x000055555555529f <+25>: xor %eax,%eax
25
26 X x1;
27
28 int b=1;
29 x1.func(1,b);
0x00005555555552a1 <+27>: lea 0x7(%rsp),%rbx
0x00005555555552a6 <+32>: mov $0x1,%esi
0x00005555555552ab <+37>: mov %rbx,%rdi
0x00005555555552ae <+40>: callq 0x55555555531e <X::func1(int)>
30
31 b=2;
32 x1.func(2,b);
0x00005555555552b3 <+45>: mov $0x2,%edx
0x00005555555552b8 <+50>: mov $0x2,%esi
0x00005555555552bd <+55>: mov %rbx,%rdi
0x00005555555552c0 <+58>: callq 0x5555555553de <X::func2(int, int)>
33
34 b=3;
35 x1.func(1,b);
0x00005555555552c5 <+63>: mov $0x3,%esi
0x00005555555552ca <+68>: mov %rbx,%rdi
0x00005555555552cd <+71>: callq 0x55555555531e <X::func1(int)>
36
37 b=4;
38 x1.func(2,b);
0x00005555555552d2 <+76>: mov $0x4,%edx
0x00005555555552d7 <+81>: mov $0x2,%esi
0x00005555555552dc <+86>: mov %rbx,%rdi
0x00005555555552df <+89>: callq 0x5555555553de <X::func2(int, int)>
39
40 return 0;
0x00005555555552e4 <+94>: mov 0x8(%rsp),%rax
0x00005555555552e9 <+99>: xor %fs:0x0(%rbp),%rax
0x00005555555552ee <+104>: jne 0x5555555552fc <main()+118>
0x00005555555552f0 <+106>: mov $0x0,%eax
0x00005555555552f5 <+111>: add $0x18,%rsp
0x00005555555552f9 <+115>: pop %rbx
0x00005555555552fa <+116>: pop %rbp
0x00005555555552fb <+117>: retq
0x00005555555552fc <+118>: callq 0x555555555100 <__stack_chk_fail@plt>
End of assembler dump.

I can do x = y = z. How come x < y < z is not allowed in C++? [duplicate]

This question already has answers here:
Is (4 > y > 1) a valid statement in C++? How do you evaluate it if so?
(5 answers)
Language support for chained comparison operators (x < y < z)
(5 answers)
Closed 3 years ago.
I'm new to programming and have a question about using multiple operators on a single line.
Say, I have
int x = 0;
int y = 1;
int z = 2;
In this example, I can use a chain of assignment operators: x = y = z;
Yet how come I can't use: x < y < z;?

You can do that, but the results will not be what you expect.
bool can be implicitly converted to int. In that case, false becomes 0 and true becomes 1.
Let's say we have the following:
int x = -2;
int y = -1;
int z = 0;
Expression x < y < z will be evaluated as such:
x < y < z
(x < y) < z
(-2 < -1) < 0
(true) < 0
1 < 0
false
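The evaluation above can be checked directly. In this minimal C++ sketch (the function names are hypothetical), the first function spells out the grouping the compiler applies to x < y < z, and the second shows the comparison that is usually intended:

```cpp
// x < y < z groups as (x < y) < z: the bool from x < y is promoted
// to int (0 or 1) and only then compared with z.
bool chained_as_written(int x, int y, int z)
{
    return (x < y) < z;  // parenthesized to show the compiler's grouping
}

// What is usually meant: both comparisons must hold.
bool in_order(int x, int y, int z)
{
    return x < y && y < z;
}
```

With x = -2, y = -1, z = 0, chained_as_written returns false while in_order returns true.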
Operator = is different: it returns its left-hand operand (after the assignment), so you can chain it:
x = y = z
x = (y = z)
//y holds the value of z now
x = (y)
//x holds the value of y now
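One way to see that the built-in assignment really yields its left operand is to take the address of an assignment expression. A small sketch with hypothetical function names; note that this is C++-specific, because in C the result of an assignment is not an lvalue:

```cpp
// In C++, the result of built-in y = z is y itself (an lvalue),
// not a temporary copy, so its address is the address of y.
bool assignment_yields_left_operand()
{
    int y = 1;
    int z = 2;
    int* p = &(y = z);        // address of the assignment's result
    return p == &y && y == 2; // it is y, and y now holds z's value
}

// Because of that, x = y = z groups as x = (y = z) and copies z into both.
int chained_assignment()
{
    int x = 0, y = 1, z = 2;
    x = y = z;
    return x + y;  // both are now 2
}
```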
gcc gives me the following warning after trying to use x < y < z:
prog.cc:18:3: warning: comparisons like 'X<=Y<=Z' do not have their mathematical meaning [-Wparentheses]
18 | x < y < z;
| ~~^~~
Which is pretty self-explanatory. It works, but not as one may expect.
Note: a class can define its own operator=, which may also do unexpected things when chained (nothing says "I hate you" better than an operator which doesn't follow basic rules and idioms). Fortunately, this cannot be done for primitive types like int:
#include <iostream>
class A
{
public:
    A& operator= (const A& other)
    {
        n = other.n + 1;
        return *this;
    }
    int n = 0;
};
int main()
{
    A a, b, c;
    a = b = c;
    std::cout << a.n << ' ' << b.n << ' ' << c.n; //2 1 0, these objects are not equal!
}
Or even simpler:
class A
{
public:
    void operator= (const A& other)
    {
    }
    int n = 0;
};
int main()
{
    A a, b, c;
    a = b = c; //doesn't compile
}

x = y = z
You can think of the built-in assignment operator, =, for fundamental types as returning a reference to the object being assigned to. That's why it's not surprising that the above works.
y = z returns a reference to y, then
x = y
x < y < z
The "less than" operator, <, returns true or false which would make one of the comparisons compare against true or false, not the actual variable.
x < y returns true or false, then
true or false < z where the boolean gets promoted to int which results in
1 or 0 < z
Workaround:
x < y < z should be written:
x < y && y < z
If you do this kind of manual BinaryPredicate chaining a lot, or have a lot of operands, it's easy to make mistakes and forget a condition somewhere in the chain. In that case, you can create helper functions to do the chaining for you. Example:
#include <functional> // std::less, used in the example below
// matching exactly two operands
template<class BinaryPredicate, class T>
inline bool chain_binary_predicate(BinaryPredicate p, const T& v1, const T& v2)
{
    return p(v1, v2);
}
// matching three or more operands
template<class BinaryPredicate, class T, class... Ts>
inline bool chain_binary_predicate(BinaryPredicate p, const T& v1, const T& v2,
                                   const Ts&... vs)
{
    return p(v1, v2) && chain_binary_predicate(p, v2, vs...);
}
And here's an example using std::less:
// bool r = 1<2 && 2<3 && 3<4 && 4<5 && 5<6 && 6<7 && 7<8
bool r = chain_binary_predicate(std::less<int>{}, 1, 2, 3, 4, 5, 6, 7, 8); // true
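If C++17 is available, the same chaining can also be written without recursion, using a fold expression over &&. This is an alternative sketch, not the answer's own code; chain_fold is a hypothetical name, and for brevity it assumes all operands have the same type T:

```cpp
#include <functional>

// C++17 fold-expression variant: compare each operand with the previous
// one; && evaluates left to right and stops at the first failing pair.
// Assumes every operand has the same type T.
template <class BinaryPredicate, class T, class... Ts>
bool chain_fold(BinaryPredicate p, const T& first, const Ts&... rest)
{
    const T* prev = &first;  // left-hand operand of the next comparison
    return ((p(*prev, rest) && (prev = &rest, true)) && ...);
}
```

chain_fold(std::less<int>{}, 1, 2, 3, 4) is true and chain_fold(std::less<int>{}, 1, 3, 2) is false; with a single operand the empty fold over && yields true.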

It is because you see those expressions as "chain of operators", but C++ has no such concept. C++ will execute each operator separately, in an order determined by their precedence and associativity (https://en.cppreference.com/w/cpp/language/operator_precedence).
(Expanded after C Perkins's comment)
James, your confusion comes from looking at x = y = z; as some special case of chained operators. In fact it follows the same rules as every other case.
This expression behaves as it does because the assignment = is right-to-left associative and returns its left-hand operand (after the assignment). There are no special rules, so don't expect them for x < y < z.
By the way, x == y == z will not work the way you might expect either.
See also this answer.
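To make the x == y == z remark concrete, here is a minimal C++ sketch (hypothetical function names). The first function writes out the grouping the compiler uses, (x == y) == z, which compares a bool promoted to 0 or 1 against z:

```cpp
// x == y == z groups as (x == y) == z: the inner comparison yields a bool,
// promoted to 0 or 1 before being compared with z.
bool all_equal_naive(int x, int y, int z)
{
    return (x == y) == z;  // parenthesized to show the compiler's grouping
}

// The intended meaning.
bool all_equal(int x, int y, int z)
{
    return x == y && y == z;
}
```

all_equal_naive(2, 2, 2) returns false, because 1 == 2 is false, and all_equal_naive(0, 1, 0) returns true, because 0 == 0, even though the three values are not all equal. all_equal gives the intended answer in both cases.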

C and C++ don't actually have the idea of "chained" operations. Each operation has a precedence, and they just follow the precedence using the results of the last operation like a math problem.
Note: I go into a low level explanation which I find to be helpful.
If you want to read a historical explanation, Davislor's answer may be helpful to you.
I also put a TL;DR at the bottom.
For example, std::cout isn't actually chained:
std::cout << "Hello!" << std::endl;
is actually relying on << grouping from left to right and on each call returning *this (the stream) by reference, so it actually does this:
std::ostream &tmp = std::operator<<(std::cout, "Hello!");
tmp.operator<<(std::endl);
(This is why printf is usually faster than std::cout in non-trivial outputs, as it doesn't require multiple function calls).
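The mechanism that makes << look chained is nothing more than each call returning a reference to the same object. A toy C++ sketch with a hypothetical StringSink class (not a standard type) shows this:

```cpp
#include <string>

// Toy stream-like class: each operator<< appends to a buffer and returns
// *this, so the next << in the expression operates on the same object.
class StringSink {
public:
    StringSink& operator<<(const std::string& s)
    {
        buffer_ += s;
        return *this;
    }
    StringSink& operator<<(int n)
    {
        buffer_ += std::to_string(n);
        return *this;
    }
    const std::string& str() const { return buffer_; }

private:
    std::string buffer_;
};
```

After StringSink s; s << "x = " << 42;, s.str() is "x = 42": the expression groups as (s << "x = ") << 42, and the inner call hands back s itself.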
You can actually see this in the generated assembly (with the right flags):
#include <iostream>
int main(void)
{
std::cout << "Hello!" << std::endl;
}
clang++ --target=x86_64-linux-gnu -Oz -fno-exceptions -fomit-frame-pointer -fno-unwind-tables -fno-PIC -masm=intel -S
I am showing x86_64 assembly below, but don't worry, I documented it explaining each instruction so anyone should be able to understand.
I demangled and simplified the symbols. Nobody wants to read std::basic_ostream<char, std::char_traits<char> > 50 times.
# Logically, read-only code data goes in the .text section. :/
.globl main
main:
# Align the stack by pushing a scratch register.
# Small ABI lesson:
# Functions must have the stack 16 byte aligned, and that
# includes the extra 8 byte return address pushed by
# the call instruction.
push rax
# Small ABI lesson:
# On the System-V (non-Windows) ABI, the first two
# function parameters go in rdi and rsi.
# Windows uses rcx and rdx instead.
# Return values go into rax.
# Move the reference to std::cout into the first parameter (rdi)
# "offset" means "the address of" the symbol, used as an
# immediate value; with -fno-PIC the linker fills in the
# absolute address of objects and literals.
mov edi, offset std::cout
# Move the pointer to our string literal into the second parameter (rsi/esi)
mov esi, offset .L.str
# rax = std::operator<<(rdi /* std::cout */, rsi /* "Hello!" */);
call std::operator<<(std::ostream&, const char*)
# Small ABI lesson:
# In almost all ABIs, member function calls are actually normal
# functions with the first argument being the 'this' pointer, so this:
# Foo foo;
# foo.bar(3);
# is actually called like this:
# Foo::bar(&foo /* this */, 3);
# Move the returned reference to the 'this' pointer parameter (rdi).
mov rdi, rax
# Move the address of std::endl to the first 'real' parameter (rsi/esi).
mov esi, offset std::ostream& std::endl(std::ostream&)
# rax = rdi.operator<<(rsi /* std::endl */)
call std::ostream::operator<<(std::ostream& (*)(std::ostream&))
# Zero out the return value.
# On x86, `xor dst, dst` is preferred to `mov dst, 0`.
xor eax, eax
# Realign the stack by popping to a scratch register.
pop rcx
# return eax
ret
# Bunch of generated template code from iostream
# Logically, text goes in the .rodata section. :/
.rodata
.L.str:
.asciz "Hello!"
Anyway, the = operator is right-associative (it groups from right to left).
struct Foo {
    Foo();
    // Don't forget Foo(const Foo&); as well
    Foo& operator=(const Foo& other);
    int x; // avoid any cheating
};
void set3Foos(Foo& a, Foo& b, Foo& c)
{
    a = b = c;
}
// ...is equivalent to:
void set3Foos(Foo& a, Foo& b, Foo& c)
{
    // a = (b = c)
    Foo& tmp = b.operator=(c);
    a.operator=(tmp);
}
Note: This is why the Rule of 3/Rule of 5 is important, and why inlining these is also important:
set3Foos(Foo&, Foo&, Foo&):
# Align the stack *and* save a preserved register
push rbx
# Backup `a` (rdi) into a preserved register.
mov rbx, rdi
# Move `b` (rsi) into the first 'this' parameter (rdi)
mov rdi, rsi
# Move `c` (rdx) into the second parameter (rsi)
mov rsi, rdx
# rax = rdi.operator=(rsi)
call Foo::operator=(const Foo&)
# Move `a` (rbx) into the first 'this' parameter (rdi)
mov rdi, rbx
# Move the returned Foo reference `tmp` (rax) into the second parameter (rsi)
mov rsi, rax
# rax = rdi.operator=(rsi)
call Foo::operator=(const Foo&)
# Restore the preserved register
pop rbx
# Return
ret
These "chain" because they all return the same type.
But < returns bool.
bool isInRange(int x, int y, int z)
{
    return x < y < z;
}
It evaluates from left to right:
bool isInRange(int x, int y, int z)
{
    bool tmp = x < y;
    bool ret = (tmp ? 1 : 0) < z;
    return ret;
}
isInRange(int, int, int):
# ret = 0 (we need manual zeroing because setl doesn't zero for us)
xor eax, eax
# (compare x, y)
cmp edi, esi
# ret = ((x < y) ? 1 : 0);
setl al
# (compare ret, z)
cmp eax, edx
# ret = ((ret < z) ? 1 : 0);
setl al
# return ret
ret
TL;DR:
x < y < z is pretty useless.
You probably want the && operator if you want to check x < y and y < z.
bool isInRange(int x, int y, int z)
{
    return (x < y) && (y < z);
}
bool isInRange(int x, int y, int z)
{
    if (!(x < y))
        return false;
    return y < z;
}

The historical reason for this is that C++ inherited these operators from C, which inherited them from an earlier language named B, which was based on BCPL, based on CPL, based on Algol.
Algol introduced “assignations” in 1968, which made assignments into expressions that returned a value. This allowed an assignment statement to pass its result along to the right-hand side of another assignment statement. This allowed chaining assignments. The = operator had to be parsed from right to left for this to work, which is the opposite of every other operator, but programmers had been used to that quirk since the ’60s. All the C-family languages inherited this, and C introduced a few others that work the same way.
The reason that serious bugs like if (euid = 0) or a < b < c compile at all is because of a simplification made in BCPL: truth values and numbers have the same type and can be used interchangeably. The B in BCPL stood for “Basic,” and the way it made itself so simple was to ditch the type system. All expressions were weakly-typed and the size of a machine register. Just one set of operators &, |, ^ and ~ did double duty for both integer and Boolean expressions, which let the language eliminate the Boolean type. Thus, a < b < c converts a < b into the numeric value of true or false, and compares that to c. In order for ~ to work as both bitwise and logical not, BCPL needed to define true as ~false, which is ~0. On most machines, that represents -1, but on some, it could be INT_MIN, a trap value, or -0. So, you could pass the “rvalue” of true to an arithmetic expression, but it wouldn’t be meaningful.
B, the predecessor of C, decided to keep the general idea, but go back to the Algol value of 1 for TRUE. This meant that ~ no longer changed TRUE to FALSE or vice versa. Since B didn’t have strong typing that could determine at compile time whether to use logical or bitwise not, it needed to create a separate ! operator. It also defined all nonzero integer values as truthy. It kept using bitwise & and |, even though these were now broken (1&2 is false even though both operands are truthy).
C added the && and || operators, to allow short-circuit optimization and, secondarily, to fix that problem with AND. It chose not to add a logical-xor, true to its designers' philosophy of letting us shoot ourselves in the foot, so ^ breaks if we use it on a pair of different truthy numbers. (If you want a robust logical-xor, use !!p ^ !!q.) Then, the designers made the very dubious choice not to add back a Boolean type, even though they had completely undone every benefit of eliminating it in the first place, and not having one now made the language more complicated, not less. Both C++ and the C standard library would later define bool, but by then it was too late. They were stuck with three more operators than they’d started with, and they had made typing = when you meant == into a deadly trap that has caused many security bugs.
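Here is a minimal sketch of that xor pitfall. The helper names bitwise_xor and logical_xor are hypothetical. With p = 1 and q = 2 both operands are truthy, yet 1 ^ 2 is 3, which is nonzero and therefore also truthy. Normalizing each operand with !! first gives the expected 0.

```cpp
// Bitwise ^ on raw "truthy" ints: 1 ^ 2 == 3 (nonzero), so it only
// behaves like a logical XOR when both operands are already 0 or 1.
int bitwise_xor(int p, int q)
{
    return p ^ q;
}

// Normalizes each operand to 0 or 1 with !! before applying ^,
// which gives a proper logical XOR for any int values.
int logical_xor(int p, int q)
{
    return !!p ^ !!q;
}
```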
Modern compilers try to mitigate the problems by assuming that any use of =, < and so on that violates most coding standards is probably a typo, and at least warning you about it. If you really meant to do that (one common example is if (errcode = library_call()) to both check if the call failed and save the error code in case it did), the convention is that an extra pair of parentheses tells the compiler you really meant it. So, a compiler would accept if ( 0 != (errcode = library_call()) ) without complaint. Since C++11 you could also write if ( const auto errcode = library_call() ), and since C++17 if ( const auto errcode = library_call(); errcode != 0 ). Similarly, the compiler would accept (foo < bar) < baz, but what you probably meant is foo < bar && bar < baz.
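A small sketch of the C++17 form, assuming a hypothetical library_call that returns 0 on success and a nonzero error code on failure. The bool parameter exists only so the example can force either outcome.

```cpp
// Hypothetical stand-in for a library function: returns 0 on success
// and a nonzero error code (here 42) on failure.
int library_call(bool fail)
{
    return fail ? 42 : 0;
}

// C++17 if-statement with an init-statement: errcode is scoped to the
// if/else, and the comparison with 0 is explicit, so there is no
// assignment inside the condition for the compiler to warn about.
int report(bool fail)
{
    if (const auto errcode = library_call(fail); errcode != 0)
        return errcode;   // failure path: hand back the saved error code
    else
        return 0;         // success path
}
```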

Even though it looks like you are assigning to multiple variables at the same time, it is actually a chain of sequential assignments. Specifically, y = z is evaluated first. The built-in = operator assigns the value of z to y and then returns an lvalue reference to y (source). That reference is then used to assign to x. So the code is basically equivalent to this
y = z;
x = y;
Applying the same logic to the comparison statement, with the difference that this one is evaluated left to right (source), we get the equivalent of
const bool first_comparison = x < y;
first_comparison < z;
Now, bool can be promoted to int, but that is not what you want most of the time. As to why the language doesn't do what you want: these operators are only defined as binary operators. Chained assignment works because assignment has no other useful result to return, so it was designed to return a reference to its left operand, which enables these semantics. Comparisons, on the other hand, are required to return a bool, and therefore they cannot be chained in a meaningful way without introducing new, potentially breaking features to the language.
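Both equivalences can be checked directly. In this sketch (the function names are illustrative), the built-in assignment y = z yields y itself, so a reference bound to its result has the same address as y. The comparison chain, spelled out step by step, ends up comparing a bool with an int.

```cpp
// Built-in assignment yields an lvalue referring to its left operand:
// y = z returns y itself, which x = ... then reads from.
bool assignment_returns_left_operand()
{
    int x = 0, y = 0, z = 7;
    int& r = (y = z);   // binds to y, not to a temporary copy
    x = r;              // same as the second step of x = y = z
    return &r == &y && x == 7 && y == 7;
}

// The comparison chain, spelled out the way the compiler parses it.
bool chained_less(int x, int y, int z)
{
    const bool first_comparison = x < y;   // true or false
    return first_comparison < z;           // bool promoted to 0 or 1
}
```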

You can use x<y<z, but it does not give the result that you expect!
x<y<z is evaluated as (x<y)<z. Then x<y results in a boolean that will be either true or false. When you try to compare a boolean with the integer z, it gets integer promotion, with false being 0 and true being 1 (this is clearly defined by the C++ standard).
Demonstration:
#include <iostream>
using namespace std;
int main()
{
    int x=1,y=2,z=3;
    cout << "x<y: "<< (x<y) << endl;    // 1 since 1 is smaller than 2
    cout << "x<y<z: "<< (x<y<z) <<endl; // 1 since boolean (x<y) is true, which is
                                        // promoted to 1, which is smaller than 3
    z=1;
    cout << "x<y<z: "<< (x<y<z) <<endl; // 0 since boolean (x<y) is true, which is
                                        // promoted to 1, which is not smaller than 1
}
You can use x=y=z, but it might not be what you expect either!
Be aware that = is the assignment operator and not the comparison for equality! = works right to left, copying the value on the right into the "lvalue" on the left. So here, it copies the value of z into y, then copies the value in y into x.
If you use this expression in a conditional (if, while, ...), it will be true if the final value of x (which is the value of z) is nonzero and false otherwise, whether or not x, y and z were equal beforehand.
Demonstration:
#include <iostream>
using namespace std;
int main()
{
    int x=1,y=2,z=3;
    if (x=y=z)   // assignment, not a comparison
        cout << "Ouch! it's true and now all variables are 3" <<endl;
    z=0;
    if (x=y=z)
        cout <<"Whatever"<<endl;
    else
        cout << "Ouch! it's false and now all the variables are 0"<<endl;
}
You can use x==y==z, but it might still not be what you expect!
Same as for x<y<z, except that the comparison is for equality. So you'll end up comparing a promoted boolean with an integer value, and not at all checking that all values are equal!
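A sketch showing how far x==y==z is from "all equal" (the function names are just for illustration). With all three values equal to 2 the chained form is false, because (2==2) becomes 1 and 1 is not 2. With 1, 2, 0 it is true, because (1==2) becomes 0 and 0 equals 0.

```cpp
// x == y == z is parsed as (x == y) == z: the bool result is promoted
// to 0 or 1 and then compared with z.
bool chained_equal(int x, int y, int z)
{
    return (x == y) == z;
}

// What "all three are equal" actually requires.
bool all_equal(int x, int y, int z)
{
    return x == y && y == z;
}
```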
Conclusions
If you want to compare more than 2 items in a chained way, just rewrite the expression comparing the terms two at a time:
(x<y && y<z) // same truth value as the mathematical x<y<z
(x==y && y==z) // true if and only if all three terms are equal
Chaining the assignment operator is allowed, but tricky. It is sometimes used to initialize several variables at once, but it is not recommended as a general practice.
int i, j;
for (i=j=0; i<10 && j<5; j++) // trick !!
j+=2;
for (int i=0, j=0; i<10 && j<5; j++) // declaring both variables at once is cleaner (this comma separates declarators; it is not the comma operator)
j+=2;

I can use x = y = z. Why not x < y < z?
You're essentially asking about syntax-idiomatic consistency here.
Well, just take consistency in the other direction: you should just avoid using x = y = z. After all, it is not an assertion that x, y and z are equal; it is two consecutive assignments. And because it looks like a statement of equality, this double assignment is a bit confusing.
So, just write:
y = z;
x = y;
instead, unless there's a very particular reason to push everything into a single statement.

Difficulty in understanding let and lambda usage in Scheme

I am having a hard time understanding the scope of the following code:
(define (create-counter (x 1))
(let ([count 0])
(lambda()
(let ([temp count])
(set! count (+ x count)) temp))))
If I use:
(let ((c (create-counter ))) (+ (c) (c) (c) (c)))
the code works. However, if I try:
(+ (create-counter)(create-counter)(create-counter)(create-counter))
it does not work and gives me 0. Can someone please help me understand this thoroughly? If possible, please compare it to another language like C/C++; that would make it easier for me to get a hold of this. Thanks

(define (create-counter (x 1))
(let ([count 0])
(lambda()
(let ([temp count])
(set! count (+ x count)) temp))))
Translates to:
auto create_counter(int x=1){
int count=0;
return [x,count]()mutable{
int r=count;
count+=x;
return r;
};
}
A simple C++14 function returning a closure object.
When you do this:
(let ((c (create-counter ))) (+ (c) (c) (c) (c)))
It is:
auto c = create_counter();
auto r = c()+c()+c()+c();
return r;
It creates one counter, then runs it 4 times, returning 0 1 2 3 and adding to 6.
In this case:
(+ ((create-counter))((create-counter))((create-counter))((create-counter)))
It is:
auto r = create_counter()()+create_counter()()+create_counter()()+create_counter()();
return r;
Which creates 4 counters, and runs each one once. The first time you run a counter you get 0. So this adds to 0.
The closure object has state. It returns a bigger number each time you call it.
Now, you may not be familiar with C++11/14 lambdas.
auto create_counter(int x=1){
int count=0;
return [x,count]()mutable{
int r=count;
count+=x;
return r;
};
}
Is
struct counter {
int x,count;
int operator()(){
int r=count;
count+=x;
return r;
};
};
counter create_counter(int x=1){
return {x,0};
}
with some syntax sugar.
I fixed what seems to be a syntax error in your original code. I am no expert, so maybe I got it wrong.
As an aside, a briefer create_counter looks like:
auto create_counter(int x=1){
return [=,count=0]()mutable{
int r=count;
count+=x;
return r;
};
}
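To tie the C++ translation back to the two Scheme expressions, here is a sketch that reuses the briefer C++14 create_counter. The two wrapper functions are hypothetical names chosen to mirror the two calls in the question. One counter called four times yields 0 + 1 + 2 + 3 = 6, while four fresh counters called once each yield 0.

```cpp
// C++14 closure with an init-capture: each call to create_counter
// produces an independent object with its own count.
auto create_counter(int x = 1)
{
    return [=, count = 0]() mutable {
        int r = count;   // value to hand back for this call
        count += x;      // state kept inside this particular closure
        return r;
    };
}

// Mirrors (let ((c (create-counter))) (+ (c) (c) (c) (c))):
// one counter, called four times, yields 0 + 1 + 2 + 3.
int one_counter_four_calls()
{
    auto c = create_counter();
    return c() + c() + c() + c();
}

// Mirrors (+ ((create-counter)) ((create-counter)) ...):
// four fresh counters, each called once, each yields 0.
int four_counters_one_call_each()
{
    return create_counter()() + create_counter()() +
           create_counter()() + create_counter()();
}
```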

When you call "create-counter", it creates a counter and then returns a procedure that refers to that particular counter. When you call "create-counter" four times, you're creating four separate counters; each procedure refers to its own counter. When you call "create-counter" once and then the resulting procedure four times, it's creating just one counter, and incrementing it four times.
It's a bit hard to compare this to C, since C has no closures and C++ only gained them (as lambdas) in C++11; in standard C, it's not possible to return a function that's defined inside of another function.
The closest analog might be a "counter" object in C++; think of "create-counter" as the constructor for an object containing a single integer, and the resulting procedure as an "increment" method that increments the counter contained in that object. In your second example, then, you're creating four distinct objects, where in your first example, you're creating one object and calling its "increment" method four times.

How does tail recursion really help over traditional recursion?

I was reading up about the difference between tail recursion and traditional recursion and found it mentioned that "Tail Recursion however is a form of recursion that doesn’t use any stack space, and thus is a way to use recursion safely."
I am struggling to understand how.
Compare finding the factorial of a number using traditional and tail recursion:
Traditional recursion
/* traditional recursion, called as fun(5) */
int fun(int n)
{
if(n == 0)
return 1;
return n * fun(n-1);
}
Here, the call stack would look like:
5 * fun(4)
|
4 * fun(3)
|
3 * fun(2)
|
2 * fun(1)
|
1 * fun(0)
|
1
Tail recursion
/* tail recursion, called as fun(5,1) */
int fun(int n, int sofar)
{
int ret = 0;
if(n == 0)
return sofar;
ret = fun(n-1,sofar*n);
return ret;
}
However, even here, the variable 'sofar' would hold 5, 20, 60, 120, 120 at different points.
But once return is called from the base case, which is recursive invocation #4, it still has to return 120 to recursive invocation #3, then to #2, #1 and back to main.
So what I mean is that the stack is used, and every time you return to the previous call, the variables at that point in time can be seen, which means they are being saved at each step.
Unless the tail recursion were written like below, I can't understand how it saves stack space.
/* tail recursion */
fun(5,1)
int fun(int n, int sofar)
{
int ret = 0;
if(n == 0)
return 'sofar' back to main function, stop recursing back; just a one-shot return
ret = fun(n-1,sofar*n);
return ret;
}
PS: I have read a few threads on SO and came to understand what tail recursion is; however, this question is more about why it saves stack space. I could not find a similar question where this was discussed.

The trick is that if the compiler notices the tail recursion, it can compile a goto instead. It will generate something like the following code:
int fun_optimized(int n, int sofar)
{
start:
if(n == 0)
return sofar;
sofar = sofar*n;
n = n-1;
goto start;
}
And as you can see, the stack space is reused for each iteration.
Note that this optimization can only be done if the recursive call is the very last action in the function, that is tail recursion (try doing it manually to the non-tail case and you'll see that's just impossible).
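A sketch that puts the question's tail-recursive factorial next to the loop a compiler may reduce it to. The names fact_tail and fact_loop are illustrative, and unsigned long long is used only to avoid overflow for larger n. Note that neither C nor C++ guarantees this optimization; GCC, for example, enables sibling-call optimization (-foptimize-sibling-calls) at -O2 and above, so an unoptimized build may still grow the stack. Both functions compute the same values; the difference is that the loop reuses one frame.

```cpp
// Tail-recursive factorial: the recursive call is the last thing the
// function does, so an optimizing compiler may turn it into a jump.
unsigned long long fact_tail(unsigned n, unsigned long long sofar)
{
    if (n == 0)
        return sofar;
    return fact_tail(n - 1, sofar * n);
}

// The goto version from the answer, written as an ordinary loop:
// n and sofar are overwritten in place, so one stack frame is reused.
unsigned long long fact_loop(unsigned n, unsigned long long sofar)
{
    while (n != 0) {
        sofar = sofar * n;
        n = n - 1;
    }
    return sofar;
}
```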

A function call is tail recursive when the (recursive) function call is performed as the final action. Since the current recursive instance is done executing at that point, there is no need to maintain its stack frame.
In this case, creating a stack frame on top of the current stack frame is nothing more than waste.
When the compiler recognizes a recursion as tail recursion, it need not create nested stack frames for each call; instead, it can reuse the current stack frame. This is equivalent in effect to a goto statement, and it makes the function call iterative rather than recursive.
Note that in traditional recursion, every recursive call must complete before the pending multiplications can be performed:
fun(5)
5 * fun(4)
5 * (4 * fun(3))
5 * (4 * (3 * fun(2)))
5 * (4 * (3 * (2 * fun(1))))
5 * (4 * (3 * (2 * 1)))
120
Nested stack frames are needed in this case. Look at the wiki for more information.
In the case of tail recursion, the variable sofar is updated with each call of fun:
fun(5, 1)
fun(4, 5)
fun(3, 20)
fun(2, 60)
fun(1, 120)
fun(0, 120)
120
There is no need to save the stack frame of the current recursive call.
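To check the trace above, here is a sketch of the tail-recursive fun that records each call's (n, sofar) arguments. The extra trace parameter and the name fun_traced are additions for illustration only. For fun(5, 1) it records six calls, ending with (1, 120) and (0, 120), and every call simply returns what the next one returns.

```cpp
#include <utility>
#include <vector>

// Records the (n, sofar) arguments of every call, then recurses.
// The recursive call is still the last action, i.e. a tail call.
unsigned long long fun_traced(unsigned n, unsigned long long sofar,
                              std::vector<std::pair<unsigned, unsigned long long>>& trace)
{
    trace.emplace_back(n, sofar);
    if (n == 0)
        return sofar;
    return fun_traced(n - 1, sofar * n, trace);
}
```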

tactics to think as a computer

I have a question from an exam in which I need to deduce the output of the following code:
#include <iostream>
int bar(int x, int y);
int baz(int x, int y);
int foo(int a) {
    std::cout << 'F';
    if (a <= 1) return 1;
    return bar(a, foo(a-1));
}
int bar(int x, int y) {
    std::cout << 'B';
    if (x > y) return baz(x, y);
    return baz(y, x);
}
int baz(int x, int y) {
    std::cout << 'Z';
    if (y == 0) return 0;
    return baz(x, y-1) + x;
}
int main() {
    foo(3);
}
My question is: what tactic is best for solving this kind of question? I'm not allowed to use a PC, of course.
P.S. You can use eager evaluation as in C++ or normal-order evaluation (the output will be different, of course, but I'm interested in tactics only). I tried to solve it using a stack, writing down every function I call, but it is still complicated.
Thanks in advance for any help.

I would use a "bottom-to-top" attempt:
baz is the function that is called by the others but doesn't call other functions (except itself). It outputs 'Z' exactly y + 1 times, and the return code is x*y (you add x after each call).
bar is the "next higher" function: it outputs 'B' once and calls baz with its lower argument as the second parameter, so the return code is x*y, too.
foo is the "top" function (right after main) and it's the most complicated one. It outputs 'F' not only once, but a times (because of the foo(a-1) call at the end, which is evaluated before the bar call). The bar call multiplies a and foo(a-1), which will multiply a-1 and foo(a-2) and so on, until foo(1) is evaluated and returns 1. So the return code is a * (a-1) * ... * 2 * 1, that is, a!.
This is not a complete analysis; e.g., we don't know in which order the characters will be output, but it is a rough scheme of what happens. As you and other people in the comments pointed out, this is what you want: tactics instead of a complete answer.
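To check the bottom-up analysis against a concrete run (outside the exam, of course), here is a C++ sketch of the three functions. The std::string& out parameter is an addition that collects the characters instead of printing them. For foo(3) the analysis predicts a return value of 3! = 6, and baz(4, 3) should emit 'Z' four times and return 12. Running it also settles the ordering question: foo(3) produces FFFBZZBZZZ.

```cpp
#include <string>

// Forward declarations: foo calls bar, and bar calls baz, before
// their definitions appear.
int bar(int x, int y, std::string& out);
int baz(int x, int y, std::string& out);

int foo(int a, std::string& out)
{
    out += 'F';
    if (a <= 1) return 1;
    // foo(a - 1) runs to completion before bar's body appends 'B'.
    return bar(a, foo(a - 1, out), out);
}

int bar(int x, int y, std::string& out)
{
    out += 'B';
    if (x > y) return baz(x, y, out);   // smaller argument goes second
    return baz(y, x, out);
}

int baz(int x, int y, std::string& out)
{
    out += 'Z';
    if (y == 0) return 0;
    return baz(x, y - 1, out) + x;      // adds x once per level: x*y total
}
```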

What I'd probably do is to start with the main() function at the top left corner of the page, write down the first line executed, keeping track of local variables etc., then write the next line under it and so on.
But when a function is called, also move right by one column, writing down the function's name and the actual value of the input arguments for that invocation first and then proceeding with the lines in that function.
When you return from the function, move left and write the return value between the two columns.
Also, keep a separate area for the "standard output", where all the printed text goes.
These steps should take you through most of "think like a computer" problems.
