LLVM tail call optimization - llvm

Here is my understanding of things:
A function "f" is tail recursive when calling itself is its last action.
Tail recursion can be optimized significantly by forming a loop instead of calling the function again: the function's parameters are updated in place and the body is run again. This is called recursive tail call optimization.
LLVM implements recursive tail call optimization when using fastcc, GHC, or the HiPE calling convention.
http://llvm.org/docs/CodeGenerator.html#tail-call-optimization
I have some questions:
Let's consider the silly example:
int h(int x) {
    if (x <= 0)
        return x;
    else
        return h(x - 1);
}
1) In their example, the keyword "tail" precedes call. Elsewhere I read that this keyword is optional. Suppose the function above is translated to LLVM appropriately; do the last few lines need to be
%x1 = load i32, ptr %x
%m = tail call fastcc i32 @h(i32 %x1)
ret i32 %m
2) What is the meaning of the inreg option in their example?
3) I would not want to perform tail call optimizations all over the place, only for recursive functions. Is there a way I can get LLVM to only perform the optimization (when available) for recursive functions?

Apparently the answer is yes. You have to change the definition of h to see this (because the optimizer is too good! It figures out that h is either the identity or returns 0).
Consider
int factorial(int x, int y) {
    if (x == 0)
        return y;
    else
        return factorial(x - 1, y * x);
}
Compile it with clang -S -emit-llvm, so that no optimization is performed. No calling convention is specified in the output, which means that the default calling convention is enough to support tail recursion optimization (whether or not it supports tail calls in general is a different story; it would be interesting to know, but I guess that is really a different question).
The file emitted by clang -S -emit-llvm is main.s (assuming the factorial definition is in main.c). If you run
opt -O3 main.s -S -o mainOpt.s
then you can see that the tail recursion is eliminated. There is an optimization pass called tailcallelim, which appears to be enabled as part of -O3. It is hard to tell, because the help text from opt --help says only that -O3 is similar to gcc -O3.
The point is that the calling convention does not need to be specified for this. Maybe fastcc is not needed, or maybe it is the default? So (1) is partially answered; however, I still do not know the answer to (2) or (3).

There are two different things here:
You can optimise self-recursive tail calls into a loop. LLVM provides an optimisation pass that does this. It does not require a specific calling convention.
You can use a different calling convention to guarantee tail call optimisation of all calls in tail position (i.e. including calls to other functions). With LLVM, you need to specify the calling convention on the function, on the call instruction and mark the call as a tail call.
Sounds like you want the former.
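As a rough illustration of the first point, here is a hand-written C++ sketch of what eliminating the self-recursive tail call amounts to for the factorial function above. The name factorial_loop is made up for this example, and the loop is only an approximation of what the pass produces, not its actual output.

```cpp
// Tail-recursive form, as in the answer above.
int factorial(int x, int y) {
    if (x == 0)
        return y;
    return factorial(x - 1, y * x);  // call in tail position
}

// Hand-written equivalent (hypothetical name): the parameters are
// updated in place and the body runs again, so no new stack frame
// is needed per step.
int factorial_loop(int x, int y) {
    while (x != 0) {
        y = y * x;  // new value of the second argument
        x = x - 1;  // new value of the first argument
    }
    return y;
}
```

Both versions compute the same value for non-negative x; only the recursive one can exhaust the stack if the call is not optimized.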

Related

Is there partial tail call optimization for recursive functions?

How can I do tail call optimization on g++ on a function that is not completely tail recursive?
For example:
void foo(Node *n) {
    if (n == nullptr) return;
    foo(n->left);
    cout << n->datum;
    foo(n->right);
}
Here foo(n->left) is not a tail call, but foo(n->right) is. Is there a way to optimize this?
The answer is: ask your compiler, very nicely.
The C++ specification allows the compiler to implement any optimization as long as the observable results remain the same.
In the shown code, the partial tail optimization will obviously produce identical observable results. Whether it actually happens, that depends entirely on your compiler. The C++ specification does not require the compiler to perform tail optimization here, but it doesn't prohibit it, either. This is entirely up to your compiler, and it's fairly likely that a modern C++ compiler will do this, at a sufficiently aggressive optimization level.

How will I know whether an inline function is actually replaced at the place where it is called or not?

I know that a call to an inline function is either replaced by the function body at the call site or treated as a normal function call.
But how will I know whether a given call was actually replaced, since the decision to inline is made at compile time?
Programmatically, at run time, you cannot.
And the truth of the matter is: you don't need to know.
A compiler can choose to inline functions that are not marked inline, or ignore the request for functions explicitly marked inline. It is completely the wish (read: wisdom) of the compiler, and you should trust the compiler to do its job judiciously. Most mainstream compilers do this job well.
If your question is purely from an academic point of view, then there are a couple of options available:
Analyze generated Assembly Code:
You can check the assembly code to check if the function code is inlined at point of calling.
How to generate the assembly code?
For gcc:
Use the -S switch when compiling.
For example:
g++ -S FileName.cpp
The generated assembly code is created as file FileName.s.
For MSVC:
Use the /FA Switch from command line.
In the generated assembly code, look for a call instruction that targets the particular function.
Use Compiler specific Warnings and Diagnostics:
Some compilers will emit a warning if they fail to comply with an inline function request.
For example, in gcc, the -Winline command option will emit a warning if the compiler does not inline a function that was declared inline.
Check the GCC documentation for more detail:
-Winline
Warn if a function that is declared as inline cannot be inlined. Even with this option, the compiler does not warn about failures to inline functions declared in system headers.
The compiler uses a variety of heuristics to determine whether or not to inline a function. For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.
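A small test file makes both approaches easy to try. The file name inline_check.cpp and the two functions below are made up for this sketch; the commands in the comments use only the -S, -O2 and -Winline switches discussed above, and what you actually see will depend on your compiler version and optimization level.

```cpp
// Hypothetical file: inline_check.cpp
//
// Assembly check:  g++ -O2 -S inline_check.cpp
//   then look in inline_check.s for a call instruction that targets
//   square (its mangled name is _Z6squarei under the Itanium ABI).
// Warning check:   g++ -O2 -Winline -c inline_check.cpp

// Small function that is a likely inlining candidate.
inline int square(int x) {
    return x * x;
}

// Caller: if square was inlined, the generated code for this function
// contains the multiplications directly and no call to square.
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```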
Check the generated code. If the function is expanded, you'll see its body, as opposed to a call or similar instruction.
You can use tools for listing symbols from object files such as nm on Linux. If the function was inlined, it will not be listed in nm output - it became part of some other function. Also you will not be able to put breakpoint on this function by name in debugger.
If you need to make sure that a function is inlined and are OK with a proprietary extension in MS VC++, check out the __forceinline declarator. The compiler will either inline the function or, if it falls into the list of documented special cases, emit a warning, so you will know the inlining status.
Not endorsing it in any way.
With gdb, if you cannot call a function, one possible reason is that the function has been inlined. Flipping the reasoning, if you can call a function inside gdb, an out-of-line copy of that function exists.
The decision whether or not to inline a function is made by the compiler. And since it is made by the compiler, yes, it can be made at compile time only.
So, if you look at the assembly code (with gcc, the -S option produces it), you can see whether your function has been inlined or not.
There is a way to determine if a function is inline programmatically, without looking at the assembly code. This answer is taken from here.
Say you want to check whether a specific call is inlined. You would go about it like this. The compiler inlines functions, but for functions that are exported (and almost all functions are exported) it needs to keep a non-inlined, addressable copy of the function that can be called from the outside world.
To check whether your function my_function is inlined, compare the my_function function pointer (which refers to the non-inlined copy) with the current value of the PC. Here is how I did it in my environment (GCC 7, x86_64):
#include <stdio.h>
/* Returns the address the caller will resume at, i.e. roughly the caller's PC. */
void * __attribute__((noinline)) get_pc() { return __builtin_return_address(0); }
void my_function() {
    void* pc = get_pc();
    asm volatile("": : :"memory");
    printf("Function pointer = %p, pc = %p\n", (void*)&my_function, pc);
}
int main() {
    my_function();
}
If the function is not inlined, the difference between the current value of the PC and the value of the function pointer should be small; otherwise it will be larger. On my system, when my_function is not inlined I get the following output:
Function pointer = 0x55fc17902500, pc = 0x55fc1790257b
If the function is inlined, I get:
Function pointer = 0x55ddcffc6560, pc = 0x55ddcffc4c6a
For the non-inlined version the difference is 0x7b, and for the inlined version it is 0x18f6.
Look at the size of the object files; they differ between the inlined and non-inlined builds.
Use nm "obj_file" | grep "fun_name"; the output differs as well.
Compile with gcc -Winline -O1.
Compare the assembly code.
The answers above are very useful; I am just adding some points to keep in mind while writing inline functions.
Remember, inlining is only a request to the compiler, not a command. The compiler can ignore the request for inlining. The compiler may not perform inlining in circumstances such as:
1) If a function contains a loop. (for, while, do-while)
2) If a function contains static variables.
3) If a function is recursive.
4) If a function return type is other than void, and the return statement doesn’t exist in function body.
5) If a function contains switch or goto statement.
Complete info: https://www.geeksforgeeks.org/inline-functions-cpp/
Some compilers reportedly also do not inline a function that returns an address; this is compiler-specific at best, since nothing in the language requires it.

How deep do compilers inline functions?

Say I have some functions, each of about two simple lines of code, and they call each other like this: A calls B calls C calls D ... calls K. (So basically it's a long series of short function calls.) How deep will compilers usually go in the call tree to inline these functions?
The question is not meaningful.
If you think about inlining and its consequences, you'll realise that it:
Avoids a function call (with all the register saving/frame adjustment)
Exposes more context to the optimizer (dead stores, dead code, common sub-expression elimination...)
Duplicates code (bloating the instruction cache and the executable size, among other things)
When deciding whether to inline or not, the compiler thus performs a balancing act between the potential bloat created and the speed gain expected. This balancing act is affected by options: for gcc, -O3 means optimize for speed while -Os means optimize for size, and on inlining they have nearly opposite behaviors!
Therefore, what matters is not the "nesting level" but the number of instructions (possibly weighted, as not all are created equal).
This means that a simple forwarding function:
int foo(int a, int b) { return foo(a, b, 3); }
is essentially "transparent" from the inlining point of view.
On the other hand, a function with a hundred lines of code is unlikely to get inlined. The exception is that static free functions called only once are almost systematically inlined, since that does not create any duplication.
From these two examples we get a hunch of how the heuristics behave:
the fewer instructions the function has, the better for inlining
the less often it is called, the better for inlining
Beyond that, there are parameters you can set to influence the decision one way or the other (MSVC has __forceinline, which hints strongly at inlining; gcc has the -finline-limit flag to "raise" the threshold on the instruction count; etc.)
On a tangent: do you know about partial inlining?
It was introduced in gcc 4.6. The idea, as the name suggests, is to partially inline a function, mostly to avoid the overhead of a function call when the function is "guarded" and may (in some cases) return nearly immediately.
For example:
void foo(Bar* x) {
    if (not x) { return; } // null pointer, pfff!
    // ... BIG BLOCK OF STATEMENTS ...
}
void bar(Bar* x) {
    // DO 1
    foo(x);
    // DO 2
}
could get "optimized" as:
void foo#0(Bar* x) {
    // ... BIG BLOCK OF STATEMENTS ...
}
void bar(Bar* x) {
    // DO 1
    if (x) { foo#0(x); }
    // DO 2
}
Of course, once again the heuristics for inlining apply, but they apply more discriminately!
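The same split can be written by hand in valid C++, which may help when reading compiler output. In this sketch, Bar, its hits field and the name foo_body are hypothetical, and the noinline attribute is a GCC/Clang extension used here only to keep the large part out of line.

```cpp
// Hypothetical type standing in for the Bar of the example above.
struct Bar {
    int hits;
};

// The "big block of statements", deliberately kept out of line.
__attribute__((noinline)) void foo_body(Bar* x) {
    x->hits += 1;  // placeholder for the expensive work
}

// The cheap guard: small enough that inlining it costs almost nothing.
inline void foo(Bar* x) {
    if (!x) { return; }  // null pointer: return immediately
    foo_body(x);
}

void bar(Bar* x) {
    // DO 1
    foo(x);  // after inlining, only the null check remains at this call site
    // DO 2
}
```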
And finally, unless you use WPO (Whole Program Optimization) or LTO (Link Time Optimization), functions can only be inlined if their definition is in the same TU (Translation Unit) as the call site.
I've seen compilers inline more than 5 functions deep. But at some point, it basically becomes a space-efficiency trade-off that the compiler makes. Every compiler is different in this aspect. Visual Studio is very conservative with inlining. GCC (under -O3) and the Intel Compiler love to inline...

Optimizing g++ when indexing an array with an injective function

I have a for loop where, at each step i, it processes an array element p[f(i)], where f(i) is an injective (one-to-one) map from 1...n to 1...m (m > n). So there is no data coupling in the loop, and all compiler optimization techniques such as pipelining can be used. But how can I inform g++ of the injectivity of f(i)? Or do I even need to (can g++ figure that out)?
Assuming that f doesn't rely on any global state and produces no side effects, you can tag it with the const attribute:
int f(int i) __attribute__((const));
If f does rely on global state but is still a pure function of its inputs and that global state (and produces no side effects), you can use the slightly weaker pure attribute.
These attributes let gcc make more optimizations than it otherwise could, although I don't know if these will be helpful in your case. Take a look at the generated assembly code and see if they help.
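A minimal sketch of the two attributes, using the GCC/Clang __attribute__ syntax. The bodies of f and f_with_offset, the global g_offset and the helper process_all are all invented for illustration; the attributes promise properties to the compiler but do not, by themselves, tell it that f is injective.

```cpp
// const: the result depends only on the argument, never on memory.
__attribute__((const)) int f(int i) {
    return 2 * i + 1;  // hypothetical injective map
}

static int g_offset = 3;  // hypothetical global state

// pure: may read global state, but has no side effects.
__attribute__((pure)) int f_with_offset(int i) {
    return 2 * i + g_offset;
}

// Loop of the shape described in the question: element p[f(i)] is updated.
void process_all(int* p, int n) {
    for (int i = 0; i < n; ++i) {
        p[f(i)] += 1;
    }
}
```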
You could also try processing the loop with a temporary storage array, i.e.:
temp[i]= process(p[f(i)]);
then copy the results back:
p[f(i)]= temp[i];
Assuming you declared p and temp to be restricted pointers, the compiler has enough information to optimize a little more aggressively.
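Put together, the two-pass idea might look like the sketch below. C++ has no standard restrict keyword, so this uses the __restrict__ extension of GCC and Clang; f, process and update are placeholder names and bodies chosen for the example.

```cpp
#include <cstddef>

// Hypothetical injective index map and per-element operation.
inline std::size_t f(std::size_t i) { return 2 * i; }
inline int process(int v) { return v * v; }

// p and temp are promised not to alias, so the first loop has no
// dependency between iterations that the compiler needs to worry about.
void update(int* __restrict__ p, int* __restrict__ temp, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        temp[i] = process(p[f(i)]);  // gather and process
    }
    for (std::size_t i = 0; i < n; ++i) {
        p[f(i)] = temp[i];           // scatter the results back
    }
}
```

Whether this is faster than the single-loop version depends on the target and the compiler; comparing the generated assembly is the way to find out.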
If the definition of f() is in scope and is inlineable, most any good compiler should first inline it into the function; the next optimization passes should then be able to rewrite the code as if the function call wasn't there.

Two questions about inline functions in C++

I have a question about compiling an inline function in C++.
Can a recursive function work with inline? If yes, then please describe how.
I am sure about loop can't work with it but I have read somewhere recursive would work, If we pass constant values.
My friend sent me an inline recursive function that is called with a constant argument and told me that it would work, but it does not work on my laptop: there is no error at compile time, but at run time it displays nothing and I have to terminate it with a forced break.
inline f(int n) {
    if(n<=1)
        return 1;
    else {
        n=n*f(n-1);
        return n;
    }
}
how does this work?
I am using turbo 3.2
Also, if the code of an inline function is too large, can the compiler automatically turn it into a normal function?
thanks
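This particular function definitely can be inlined. That is because the compiler can figure out that this particular form of recursion can be turned into a normal loop. (Strictly speaking it is not a tail call, because the multiplication happens after the recursive call returns, but an optimizer can introduce an accumulator to make it one.) And with a normal loop it has no problem inlining it at all.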
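To make the recursion-to-loop step concrete, here is a hand-written sketch of the stages. The original f multiplies after the recursive call returns, so an accumulator is introduced first. The names f_acc and f_loop are made up, f is given an explicit int return type so that the code compiles, and this is an illustration of the idea rather than what any particular compiler emits.

```cpp
// Original shape: the multiplication happens after the recursive call.
int f(int n) {
    if (n <= 1)
        return 1;
    return n * f(n - 1);
}

// Step 1: carry the partial product in an accumulator, which puts the
// recursive call in tail position.
int f_acc(int n, int acc) {
    if (n <= 1)
        return acc;
    return f_acc(n - 1, acc * n);
}

// Step 2: replace the tail call with a loop.
int f_loop(int n) {
    int acc = 1;
    while (n > 1) {
        acc *= n;
        --n;
    }
    return acc;
}
```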
Not only can the compiler inline it, it can even calculate the result for a compile-time constant without generating any code for the function.
With GCC 4.4
int fac = f(10);
produced this instruction:
movl $3628800, 4(%esp)
By checking the assembly output, you can easily verify that the function is indeed inlined for input that is not known at compile time.
I suppose your friend was trying to say that, if given a constant, the compiler could calculate the result entirely at compile time and just inline the answer at the call site. C++11 (formerly known as C++0x) actually has a mechanism for this called constexpr, but there are limits to how complex the code is allowed to be. Even without it, this is possible; it depends entirely on the compiler.
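With constexpr the compile-time evaluation can be requested explicitly instead of hoped for. This is a minimal sketch assuming a C++11 (or later) compiler; the single-return form is used because C++11 limits a constexpr function body to one return statement.

```cpp
// Factorial written so that it can be evaluated during compilation.
constexpr int f(int n) {
    return n <= 1 ? 1 : n * f(n - 1);
}

// Compilation fails here if the value is wrong or cannot be computed
// at compile time.
static_assert(f(10) == 3628800, "f(10) is evaluated at compile time");

// Initialized with a constant; no call to f is needed at run time.
constexpr int fac = f(10);
```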
This function may be a good candidate, given that it clearly references only its parameter to calculate the result. Some compilers even have non-portable attributes to help the compiler decide this. For example, gcc has the pure and const attributes, which inform the compiler that the code only operates on its parameters and has no side effects, making it more likely to be calculated at compile time.
Even without this, it will still compile! The reason why is that the compiler is allowed to not inline a function if it decides. Think of the inline keyword more of a suggestion than an instruction.
Assuming that the compiler doesn't calculate the whole thing at compile time, inlining is not completely possible without other optimizations applied (see EDIT below) since it must have an actual function to call. However, it may get partially inlined. In that case the compiler will inline the initial call, but also emit a regular version of the function which will get called during recursion.
As for your second question, yes, size is one of the factors that compilers use to decide if it is appropriate to inline something.
If running this code on your laptop takes a very long time, then it is possible that you just gave it very large values and it is simply taking a long time to calculate the answer. The code looks OK, but keep in mind that 13! and anything larger will overflow a 32-bit int. What value did you attempt to pass?
The only way to know what actually happens is to compile it and look at the generated assembly.
PS: you may want to look into a more modern compiler if you are concerned with optimizations. For Windows there is MinGW and free versions of Visual C++. For *NIX there is of course g++.
EDIT: There is also a thing called Tail Recursion Optimization which allows compilers to convert certain types of recursive algorithms to iterative, making them better candidates for inlining. (In addition to making them more stack space efficient).
A recursive function can be inlined to a certain limited depth of recursion. Some compilers have an option that lets you specify how deep you want to go when inlining recursive functions. Basically, the compiler "flattens" several nested levels of recursion. If the execution reaches the end of the "flattened" code, the code calls itself in the usual recursive fashion, and so on. Of course, if the depth of recursion is a run-time value, the compiler has to check the corresponding condition before executing each original recursive step inside the "flattened" code. In other words, there's nothing too unusual about inlining a recursive function. It is like unrolling a loop. There's no requirement for the parameters to be constant.
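A hand-written picture of such flattening, for a factorial inlined to a depth of two. The names fact and fact_flattened and the chosen depth are arbitrary; a real compiler would produce something equivalent at the machine-code level rather than in source form.

```cpp
// Ordinary recursive definition, kept as the out-of-line fallback.
unsigned long long fact(unsigned n) {
    return n <= 1 ? 1ULL : n * fact(n - 1);
}

// Two levels of the recursion written out inline. The termination
// condition is re-checked before each flattened step, because n is
// only known at run time.
unsigned long long fact_flattened(unsigned n) {
    if (n <= 1) return 1;
    unsigned long long r = n;  // level 1
    unsigned m = n - 1;
    if (m <= 1) return r;
    r *= m;                    // level 2
    m -= 1;
    if (m <= 1) return r;
    return r * fact(m);        // past the flattened depth: a real call
}
```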
What you mean by "I am sure about loop can't work" is not clear. It doesn't seem to make much sense. Functions with a loop can be easily inlined and there's nothing strange about it.
What you are trying to say about your example that "displays nothing" is not clear either. There is nothing in the code that would "display" anything, so it is no wonder it "displays nothing". On top of that, you posted invalid code: the C++ language does not allow function declarations without an explicit return type.
As for your last question, yes, the compiler is completely free to implement an inline function as "normal" function. It has nothing to do with function being "too large" though. It has everything to do with more-or-less complex heuristic criteria used by that specific compiler to make the decision about inlining a function. It can take the size into account. It can take other things into account.
You can inline recursive functions. The compiler normally unrolls them to a certain depth (in VS you can even use a pragma for this), and the compiler can also do partial inlining. It essentially converts the recursion into loops. Also, as @Evan Teran said, the compiler is not forced to inline a function just because you suggest it. It might totally ignore you, and that's perfectly valid.
The problem with the code is not in that inline function. The constantness or not of the argument is pretty irrelevant, I'm sure.
Also, seriously, get a new compiler. There's modern free compilers for whatever OS your laptop runs.
One thing to keep in mind: according to the standard, inline is a suggestion, not an absolute guarantee. In the case of a recursive function, the compiler will not always be able to compute the recursion limit. Modern compilers are getting extremely smart (a previous response shows the compiler evaluating a call with a constant argument and simply generating the result), but consider
bigint fac = factorialOf(userInput)
There is no way the compiler can figure that one out.
As a side note, most compilers tend to ignore inline in debug builds unless specifically instructed otherwise, which makes debugging easier.
Tail recursion can be converted to a loop as long as the compiler can satisfactorily rearrange the internal representation to get the recursion's conditional test at the end. In that case it can generate code that re-expresses the recursive function as a simple loop.
As for issues like tail recursion rewrites, partial expansion of recursive functions, etc., these are usually controlled by the optimization switches. All modern compilers are capable of pretty significant optimization, but sometimes things do go wrong.
Remember that the inline keyword merely sends a request, not a command, to the compiler. The compiler may ignore this request if the function definition is too long or too complicated, and compile the function as a normal function.
Some of the cases where inlining may not happen are:
For functions returning values, if a loop, a switch or a goto exists.
For functions not returning values, if a return statement exists.
If the function contains static variables.
If the inline function is recursive.
Hence in C++ inline recursive functions may not be inlined.