Is there partial tail call optimization for recursive functions? - c++

How can I do tail call optimization on g++ on a function that is not completely tail recursive?
For example:
void foo(Node *n) {
    if (n == nullptr) return;
    foo(n->left);
    cout << n->datum;
    foo(n->right);
}
The call foo(n->left) is not a tail call, but foo(n->right) is. Is there a way to optimize this?

The answer is: ask your compiler, very nicely.
The C++ specification allows the compiler to implement any optimization as long as the observable results remain the same.
In the shown code, this partial tail call optimization will obviously produce identical observable results. Whether it actually happens depends entirely on your compiler. The C++ specification does not require the compiler to perform tail call optimization here, but it doesn't prohibit it, either. It is entirely up to your compiler, and it is fairly likely that a modern C++ compiler will do this at a sufficiently aggressive optimization level.

Related

Can the extra conditional check at the call site be optimised out by the compiler easily?

In the case where you need to check the return value at the call site, is it easy for the compiler to optimise the check away if the value is already checked in the function itself? Does it make a difference whether the function is inline? I tried looking at the assembly code to check for jumps, but I'm afraid I don't understand it at all. I'm talking about a situation like this:
#include <cstdlib>
#include <iostream>

int* try_get()
{
    static int anint;
    anint = std::rand() % 2;
    if (anint) return &anint;
    else return nullptr;
}

int main()
{
    int* p = try_get();
    if (p) // The value was already tested in the function.
           // Is optimisation of this easy? Does it depend on whether the function is inline?
    {
        std::cout << "Hello";
    }
}
A C++ compiler is allowed to perform any optimization that has no observable effects, however the C++ standard does not require any C++ compiler to perform any such optimization (except those that are required by the C++ specification itself, such as mandatory copy elision). Except for the required optimizations, everything else is entirely at your C++ compiler's discretion.
If the compiler has access both to the function definition and its call site, and the compiler can work out that this particular optimization has no observable effects, then the compiler can certainly optimize it out. Whether your compiler will do that can only be answered by looking at your compiler's compiled code. And even determining what your compiler actually does will not, of course, bear any relevance to what any other compiler would do.
Whether or not the function in question is inline, or not, may or may not be a factor that your compiler considers when deciding whether to perform this optimization.
And, finally, even looking at what your compiler produced for a particular translation unit may not paint the entire picture, either. Many current C++ compilers feature link-time optimization, where the combined mighty forces of the compiler and the linker produce additional optimizations and code transformations in the final, linked executable.
So the only definitive answer here is to actually look at the linked code in your final executable, in order to figure out whether any particular optimization took place; and, of course, that is a highly technical matter.

Evaluating an 'if' clause at compile time

Consider the following code snippet:
#include <limits>
#include <stdexcept>
void g(unsigned) {
    // ...
}
template<typename UIntT>
void f(UIntT n)
{
    if constexpr (std::numeric_limits<UIntT>::max() > std::numeric_limits<unsigned>::max())
    {
        if (n > std::numeric_limits<unsigned>::max())
            throw std::length_error("Too long.");
    }
    g(n);
}
I wonder whether the 'if constexpr' clause is really useful here. Aren't compilers smart enough to find out whether the 'if' clause can ever be true for a given UIntT? If so, is this mandated by the standard?
Aren't compilers smart enough to find out whether the if clause can ever be true for a given UIntT?
Most are.
If so, is this mandated by the standard?
No. Some optimizations have been given a name (RVO etc.) and have later been incorporated into the language standard, but dead-code elimination of this kind isn't standardized (to my knowledge).
... but constexpr is
There's no way a conforming compiler would keep that block (if the condition is false) in your resulting binary - however you decide to optimize your code.
This use of if constexpr has no observable difference from an if according to the C++ standard.
However, slightly different variants of it could result in a difference in which symbols a compilation unit uses, since a discarded if constexpr branch is not instantiated; it seems plausible that this could cause observable differences.
Most modern compilers can and will reduce that to if (false) during optimization even without constexpr, and dead-branch elimination is a pretty simple optimization. In a debug build they might leave the dead code alone, whereas with constexpr they would eliminate it even then.
Compiler explorer is great to answer specific cases of this kind of question, as it makes it pretty easy to see the generated assembly of every major compiler. So if you want to know if there is a difference in a default MSVC 2015 debug or release setup, you can see it there.

LLVM tail call optimization

Here is my understanding of things:
A function "f" is tail recursive when calling itself is its last action.
Tail recursion can be significantly optimized by forming a loop instead of calling the function again: the function's parameters are updated in place, and the body is run again. This is called recursive tail call optimization.
LLVM implements recursive tail call optimization when using the fastcc, GHC, or HiPE calling convention.
http://llvm.org/docs/CodeGenerator.html#tail-call-optimization
I have some questions:
Let's consider the silly example:
int h(int x){
    if (x <= 0)
        return x;
    else
        return h(x-1);
}
1) In their example, the keyword "tail" precedes the call. Elsewhere I read that this keyword is optional. Suppose the function above is translated to LLVM appropriately; do the last few lines need to be
%x1 = load i32* %x
%m = tail call fastcc i32 @h(i32 %x1)
ret i32 %m
2) What is the meaning of the inreg option in their example?
3) I would not want to perform tail call optimizations all over the place, only for recursive functions. Is there a way I can get LLVM to only perform the optimization (when available) for recursive functions?
Apparently the answer is yes. You have to change the definition of h to see this (because the optimizer is too good! It figures out that h is either the identity or returns 0).
Consider
int factorial (int x, int y){
    if (x==0)
        return y;
    else
        return factorial(x-1,y*x);
}
Compiled with clang -S -emit-llvm, so that no optimization is performed. One sees that no calling conventions are directly specified, which means that the default calling convention is enough to support tail recursion optimization (whether or not it supports tail calling in general is a different story -- it would be interesting to know, but I guess that is really a different question).
The file emitted by clang -S -emit-llvm is main.s (assuming the factorial definition is in main.c). If you run
opt -O3 main.s -S -o mainOpt.s
then you can see that the tail recursion is eliminated. There is an optimization pass called tailcallelim, which is enabled at -O3. It's hard to tell exactly, because the help output of opt --help says only that -O3 is similar to gcc -O3.
The point is that we can see that the calling convention does not need to be specified for this. Maybe fastcc is not needed, or maybe it is the default? So (1) is partially answered; however, I still do not know (2) or (3).
There are two different things here:
You can optimise self-recursive tail calls into a loop. LLVM provides an optimisation pass that does this. It does not require a specific calling convention.
You can use a different calling convention to guarantee tail call optimisation of all calls in tail position (i.e. including calls to other functions). With LLVM, you need to specify the calling convention on the function, on the call instruction and mark the call as a tail call.
Sounds like you want the former.

C++ use templates to avoid compiler from checking a boolean

Let's say I have a function:
template <bool stuff>
inline void doSomething() {
    if(stuff) {
        cout << "Hello" << endl;
    }
    else {
        cout << "Goodbye" << endl;
    }
}
And I call it like this:
doSomething<true>();
doSomething<false>();
It would print out:
Hello
Goodbye
What I'm really wondering is does the compiler fully optimize this?
When I call the templated function with true, will it create a function that just outputs "Hello" and avoids the if statement and the code for "Goodbye"?
This would be really useful for this one giant function I just wrote that's supposed to be very optimized and avoid as many unnecessary if statement checks as possible. I have a very good feeling it would, at least in a release build with optimizations if not in a debug build with no optimizations.
Disclaimer: No one can guarantee anything.
That said, this an obvious and easy optimization for any compiler. It's quite safe to say that it will be optimized away, unless the optimizer is, well, practically useless.
Since your "true" and "false" are constants, you are unambiguously creating an obvious dead branch in each instantiation, and the compiler should optimize it away. "Should" is meant literally here: I would consider it a major, major problem if an "optimising" compiler did not do dead-branch removal.
In other words, if your compiler cannot optimize this, it is the use of that compiler that should be evaluated, not the code.
So, I would say your gut feeling is correct: while yes, no "guarantees" as such can be made on each and every compiler, I would not use a compiler incapable of performing simplistic optimizations in any production environment, and of course not in any performance critical one. (In release builds of course).
So, use it. Any modern optimizing compiler will optimize it away because it is a trivial optimization. If in doubt, check disassembly, and if it is not optimized, change the compiler to something more modern.
In general, if you are writing any kind of performance-critical code, you must rely, at least to some extent, on compiler optimizations.
This is inherently up to the compiler, so you'd have to check the compiler's documentation or the generated code. But in simple cases like this, you can easily implement the optimization yourself:
template <bool stuff>
inline void doSomething();

template<>
inline void doSomething<true>() {
    cout << "Hello" << endl;
}

template<>
inline void doSomething<false>() {
    cout << "Goodbye" << endl;
}
But "optimization" isn't really the right word to use since this might actually degrade performance. It's only an optimization if it actually benefits your code performance.
Indeed, it really creates two functions, but
premature optimization is the root of all evil
especially if you're changing your code structure because of a simple if statement. I doubt that this will affect performance. Also, the boolean must be a compile-time constant; that means you can't take a runtime-evaluated variable and pass it to the function. How should the compiler know which instantiation to call? In that case you'll have to evaluate it manually and call the appropriate function yourself.
Compilers are really good at constant folding. That is, in this case it would surprise me if the check would stay until after optimization. A non-optimized build might still have the check. The easiest way to verify is to create assembler output and check.
That said, it is worth noting that the compiler has to check both branches for correctness, even if it only ever uses one branch. This frequently shows up, e.g., when using slightly different algorithms for Random Access Iterators and other iterators. The condition would depend on a type-trait and one of the branches may fail to compile depending on operations tested for by the traits. The committee has discussed turning off this checking under the term static if although there is no consensus, yet, on how the features would look exactly (if it gets added).
If I understand you correctly, you want (in essence) to end up with 'two' functions that are optimised for either a true or a false input, so that they don't need to check that flag?
Aside from any trivial optimisation that may yield (I'm against premature optimisation; I believe in maintainability before measurement before optimisation), I would say: why not refactor your function to actually be two functions? If they have common code, then that code could be refactored out too. However, if the requirement is such that the refactoring is non-optimal, then I'd replace it with a #define refactoring.

Two questions about inline functions in C++

I have a question about compiling an inline function in C++.
Can a recursive function work with inline? If yes, then please describe how.
I am sure a loop can't work with it, but I have read somewhere that recursion would work if we pass constant values.
My friend sent me an inline recursive function with a constant parameter and told me it would work, but it does not work on my laptop: there is no error at compile time, but at run time it displays nothing and I have to terminate it by force.
inline f(int n) {
    if(n<=1)
        return 1;
    else {
        n=n*f(n-1);
        return n;
    }
}
how does this work?
I am using Turbo C++ 3.2.
Also, if an inline function's code is too large, can the compiler automatically change it into a normal function?
thanks
This particular function definitely can be inlined. That is because the compiler can figure out that this particular form of recursion (tail-recursion) can be trivially turned into a normal loop. And with a normal loop it has no problem inlining it at all.
Not only can the compiler inline it, it can even calculate the result for a compile-time constant without generating any code for the function.
With GCC 4.4
int fac = f(10);
produced this instruction:
movl $3628800, 4(%esp)
You can easily verify when checking assembly output, that the function is indeed inlined for input that is not known at compile-time.
I suppose your friend was trying to say that if given a constant, the compiler could calculate the result entirely at compile time and just inline the answer at the call site. C++0x actually has a mechanism for this called constexpr, but there are limits to how complex the code is allowed to be. Even with the current version of C++, though, it is possible; it depends entirely on the compiler.
This function may be a good candidate given that it clearly only references the parameter to calculate the result. Some compilers even have non-portable attributes to help the compiler decide this. For example, gcc has pure and const attributes (listed on that page I just linked) that inform the compiler that this code only operates on the parameters and has no side effects, making it more likely to be calculated at compile time.
Even without this, it will still compile! The reason why is that the compiler is allowed to not inline a function if it decides. Think of the inline keyword more of a suggestion than an instruction.
Assuming that the compiler doesn't calculate the whole thing at compile time, inlining is not completely possible without other optimizations applied (see EDIT below) since it must have an actual function to call. However, it may get partially inlined. In that case the compiler will inline the initial call, but also emit a regular version of the function which will get called during recursion.
As for your second question, yes, size is one of the factors that compilers use to decide if it is appropriate to inline something.
If running this code on your laptop takes a very long time, then it is possible that you just gave it very large values and it is simply taking a long time to calculate the answer. The code looks OK, but keep in mind that values of 13! and above will overflow a 32-bit int. What value did you attempt to pass?
The only way to know what actually happens is to compile it and look at the assembly generated.
PS: you may want to look into a more modern compiler if you are concerned with optimizations. For Windows there are MinGW and free versions of Visual C++. For *NIX there is of course g++.
EDIT: There is also a thing called Tail Recursion Optimization which allows compilers to convert certain types of recursive algorithms to iterative, making them better candidates for inlining. (In addition to making them more stack space efficient).
Recursive function can be inlined to certain limited depth of recursion. Some compilers have an option that lets you to specify how deep you want to go when inlining recursive functions. Basically, the compiler "flattens" several nested levels of recursion. If the execution reaches the end of "flattened" code, the code calls itself in usual recursive fashion and so on. Of course, if the depth of recursion is a run-time value, the compiler has to check the corresponding condition every time before executing each original recursive step inside the "flattened" code. In other words, there's nothing too unusual about inlining a recursive function. It is like unrolling a loop. There's no requirement for the parameters to be constant.
What you mean by "I am sure about loop can't work" is not clear. It doesn't seem to make much sense. Functions with a loop can be easily inlined and there's nothing strange about it.
What you are trying to say about your example that "displays nothing" is not clear either. There is nothing in the code that would "display" anything, so no wonder it "displays nothing". On top of that, you posted invalid code: the C++ language does not allow function definitions without an explicit return type.
As for your last question, yes, the compiler is completely free to implement an inline function as "normal" function. It has nothing to do with function being "too large" though. It has everything to do with more-or-less complex heuristic criteria used by that specific compiler to make the decision about inlining a function. It can take the size into account. It can take other things into account.
You can inline recursive functions. The compiler normally unrolls them to a certain depth; in VS you can even set a pragma for this, and the compiler can also do partial inlining. It essentially converts the recursion into loops. Also, as @Evan Teran said, the compiler is not forced to inline a function that you suggest at all. It might totally ignore you, and that's perfectly valid.
The problem with the code is not in that inline function. The constantness or not of the argument is pretty irrelevant, I'm sure.
Also, seriously, get a new compiler. There's modern free compilers for whatever OS your laptop runs.
One thing to keep in mind: according to the standard, inline is a suggestion, not an absolute guarantee. In the case of a recursive function, the compiler would not always be able to compute the recursion limit. Modern compilers are getting extremely smart; a previous answer shows the compiler evaluating a constant inline call and simply generating the result. But consider
bigint fac = factorialOf(userInput)
there's no way the compiler can figure that one out.
As a side note, most compilers tend to ignore inlines in debug builds unless specifically instructed not to do so - makes debugging easier
Tail recursion can be converted to a loop as long as the compiler can satisfactorily rearrange the internal representation to get the recursion's conditional test at the end. In that case it can generate code that re-expresses the recursive function as a simple loop.
As far as issues like tail recursion rewrites and partial expansion of recursive functions go, these are usually controlled by the optimization switches. All modern compilers are capable of pretty significant optimization, but sometimes things do go wrong.
Remember that the inline keyword merely sends a request, not a command, to the compiler. The compiler may ignore this request if the function definition is too long or too complicated, and compile the function as a normal function.
Some of the cases where inlining may not happen are:
For functions returning values, if a loop, a switch, or a goto exists.
For functions not returning values, if a return statement exists.
If the function contains static variables.
If the function is recursive.
Hence, in C++, inline recursive functions may not be inlined.