I found an interesting quiz about gcc today: http://ridiculousfish.com/blog/posts/will-it-optimize.html
How come this code
int factorial(int x) {
if (x > 1) return x * factorial(x-1);
else return 1;
}
can be translated by the compiler into
int factorial(int x) {
int result = 1;
while (x > 1) result *= x--;
return result;
}
Is this true? How does gcc do it?
You already know that gcc can optimize a tail-recursive function into a loop. The other thing that gcc can do (and is mentioned in your link) is try to optimize a non-tail-recursive function into a tail-recursive function.
Your factorial function is here:
int factorial(int x) {
if (x > 1) return x * factorial(x-1);
else return 1;
}
Now I'm going to try to make as few changes as possible and rewrite this as tail-recursive. First, I'll flip the if test:
int factorial(int x) {
if (!(x > 1)) return 1;
else return x * factorial(x-1);
}
Next, I'll remove the unneeded else:
int factorial(int x) {
if (!(x > 1)) return 1;
return x * factorial(x-1);
}
This is almost tail-recursive, but it is returning x * factorial() and not just factorial(). The typical way to make this tail recursive is to include a second parameter, which is an accumulator.
int factorial(int x, int accumulator = 1) {
if (!(x > 1)) return accumulator;
return factorial(x - 1, x * accumulator);
}
Now this is a tail-recursive function, and it can be optimized into a loop.
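To make that last step concrete, here is a hand-written sketch (my own illustration, not gcc's actual output) of the loop the compiler can derive from the accumulator version: the recursive call becomes a jump back to the top, and the parameters are simply reassigned.
int factorial(int x) {
    int accumulator = 1;
    while (x > 1) {                     // the tail call becomes a backwards jump
        accumulator = x * accumulator;  // accumulator takes x * accumulator
        x = x - 1;                      // x takes x - 1
    }
    return accumulator;
}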
The compiler could transform that code into something tail-call optimisable by putting the multiplication before the recursive function call:
int factorial(int x) {
return factorial_tail_call(x, 1);
}
int factorial_tail_call(int x, int result) {
if (x > 1) return factorial_tail_call(x-1, result*x);
return result;
}
By performing the evaluation of result*x before factorial_tail_call is called recursively, the compiler can determine that x and result are no longer needed after the call. Hence, it can pop them from the stack, which proves that the stack doesn't need to grow.
Can you see the resemblance to the code it was transformed into? The 1 is in the same place, the condition x > 1 is in the same place, and return result; is in the same place. It's all just a different way of expressing the same algorithm, provided the compiler implements tail call optimisation. By moving the multiplication expression into an argument and putting the code from your post in comments to the right, you can see the resemblance in functionality, and how the compiler managed to make the remainder of the transformation:
int factorial(int x) {
return factorial_tail_call(x, 1); // int result = 1;
}
int factorial_tail_call(int x, int result) {
if (x > 1) return factorial_tail_call(x-1, result*x); // while (x > 1) result *= x--;
return result; // return result;
}
§5.1.2.3p4 of n1570.pdf
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
Compilers are smart things, written by far better programmers than most of us. If the compiler can figure out that two pieces of code are equivalent, then it can choose whichever of the two it wishes (with some restrictions, described in the quotation below). For example, it could replace a loop that calculates and prints the first thousand primes with a single printf expression.
§5.1.2.3p6 of n1570.pdf
The least requirements on a conforming implementation are:
— Accesses to volatile objects are evaluated strictly according to the
rules of the abstract machine.
— At program termination, all data written into files shall be
identical to the result that execution of the program according to the
abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take
place as specified in 7.21.3. The intent of these requirements is that
unbuffered or line-buffered output appear as soon as possible, to
ensure that prompting messages actually appear prior to a program
waiting for input.
This is the observable behavior of the program.
That's one reason why micro-optimisation is futile.
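As a tiny concrete illustration of the as-if rule (easy to verify on a compiler explorer; this is my own toy example, not from the standard): under -O2, gcc and clang typically compile the loop below down to the equivalent of printing the constant 500500, because only the output is observable behaviour.
#include <cstdio>
int main()
{
    long sum = 0;
    for (long i = 1; i <= 1000; ++i)
        sum += i;               // the whole loop can fold away under the as-if rule
    std::printf("%ld\n", sum);  // typically becomes the equivalent of printing 500500
}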
If another thread modifies a string that strlen is processing, that's a race condition. Race conditions are undefined behaviour. You need to guard the string with a mutex to ensure this doesn't happen, or learn better multi-threading paradigms. Which book are you reading?
§5.1.2.4p25 of n1570.pdf
The execution of a program contains a data race if it contains two
conflicting actions in different threads, at least one of which is not
atomic, and neither happens before the other. Any such data race
results in undefined behavior.
I came across the following question:
#include <stdio.h>
int f(int &x, int c)
{
c = c - 1;
if (c == 0) return 1;
x = x + 1;
return f(x, c) * x;
}
int main()
{
int p = 5;
printf("%d", f(p, p));
}
As far as I have worked out, the recursive calls should work, and with each recursive call the value of c should decrease by 1 and the value of x should increase by 1. However, if I calculate it that way, the answer comes out to be 3024, whereas on executing, the output is 6561. I am not really sure where this answer is coming from.
I have the link to the article this question is taken from, but I fail to understand how x will remain constant, as described in this link: https://www.geeksforgeeks.org/c-references-question-1/
Can someone help me with the working behind this code?
Any result is possible here, because the order of evaluation is unspecified (more on that below). With the order this particular compiler happened to choose:
Since x is passed by reference, the last assignment to it is what matters. Thus, we have:
9*9*9*9*1 = 6561
The evaluation of f(x, c) * x is delayed, and all the subsequent recursive calls have access to the same x. We increment x this way until c is equal to 0, meaning we increment x up to the point where it is 9.
When we hit the base case (c == 0), the current call returns 1, while x is already 9.
Then we evaluate the multiplication 1 * x * x * x * x, where x is equal to 9. Thus, the 6561.
Flawless explanation? Not so much.
The fun part is that there is no concept of left-to-right or right-to-left evaluation in C++:
Order of evaluation of any part of any expression, including order of
evaluation of function arguments is unspecified (with some exceptions
listed below). The compiler can evaluate operands and other
subexpressions in any order, and may choose another order when the
same expression is evaluated again.
This means that though this logic gets a pass this time, it may not work out the same way next time, for me or for you if you run it yourself.
Often I convert some if statements into boolean expressions for code compactness. For instance, if I have something like
int foo(int x)
{
if (x > 5) return 100 + 5;
return 100;
}
I'll do it like
int foo(int x)
{
return 100 + (x > 5) * 5;
}
This is very simple so no problem, the thing is when I have multiple tests, I can greatly simplify them (at the expense of readability but that's a different issue).
So the question is whether that (x > 5) evaluation is as onerous as explicitly branching on it.
In both cases the expression (x > 5) has to be evaluated, and as already demonstrated, both versions compile to the same assembly even without any optimization enabled.
However, the Philosophy section of C++ Core Guidelines has these two rules you would do well to pay heed to:
P.1: Express ideas directly in code
P.3: Express intent
Though these rules cannot be enforced in any way, adhering to them will make you adopt the version with the if statement.
Doing so will make it less onerous for those who have to maintain the code after you and even yourself a few months later.
You seem to be conflating C++ language constructs with patterns in the assembly. It may have been viable to reason about code on this level given the compilers of the late eighties or early nineties. At this point, however, compilers apply a lot of optimizations and transformations whose correctness or utility is not even obvious to the average programmer. A very simple example is the common beginner's mistake of assuming the following equivalences:
std::uint16_t a = ...;
a *= 2; // a multiplication in assembly
a *= 17; // ditto
a /= 3; // a division in assembly
They may then be surprised to find out that their compiler of choice translates these into the assembly equivalent of e.g.:
a <<= 1u;
a = (a << 4u) + a; // or even (a << 4u) | a if a < 16
a *= 43691u;
Note that the last transformation is only allowed if a is known to be a multiple of the divisor, so you may not see this kind of optimization all too often. How does it even work? In mathematical terms, uint16_t can be thought of as the residue class ring Z/(2^16)Z, and in this ring, there exists a multiplicative inverse for any element that is coprime to 2^16 (i.e. not divisible by 2). If d (e.g. 3) is coprime to 2, it has such an inverse, and then dividing by d is simply equivalent to multiplying by the inverse of d if the remainder is known to be zero. (I won't go into how this inverse can be calculated here.)
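If you want to convince yourself that the inverse trick works, here is a small self-check (my own illustration, not compiler output). 43691 is the inverse of 3 in Z/(2^16)Z because 3 * 43691 = 131073 ≡ 1 (mod 65536):
#include <cstdint>
#include <cstdio>

int main()
{
    // For every multiple of 3 that fits in uint16_t, multiplying by the
    // modular inverse 43691 (mod 2^16) yields the exact quotient a / 3.
    for (std::uint32_t i = 0; i <= 65535u; i += 3) {
        std::uint16_t a = static_cast<std::uint16_t>(i);
        std::uint16_t q = static_cast<std::uint16_t>(a * 43691u); // truncation is the mod 2^16
        if (q != a / 3)
            std::printf("mismatch at %u\n", i);
    }
    std::printf("checked all multiples of 3 up to 65535\n");
}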
Here is another surprising optimization:
long arithsum(long n)
{
long result = 0;
for (long i=0; i<=n; ++i)
result += i;
return result;
}
GCC with -O3 rather mundanely translates this into an unrolled loop of additions. My version (9.0.0svn-something) of Clang, however, will pull a Gauss on you if you do this, and translate this into something like:
long arithsum(long n)
{
return (n * (n+1)) >> 1;
}
Anyway, the same caveats apply to if/switch etc. – while these are control flow structures, and so you'd think they correspond to branching, this may not be so. Likewise, what appears to be a non-branching operation might be translated to a branching operation if the compiler has an optimization rule under which this seems beneficial, or even if it is just unable to translate its own AST or intermediate representation into machine code without use of branching (on the given architecture).
TL;DR: Before you try to outsmart your compiler, figure out which assembly the compiler produces for the straightforward / readable code in the first place. If this assembly is good, there is no point in making the code more subtle / less readable.
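For example, a quick way to see what your compiler produces from the command line (assuming gcc or clang; arithsum.cpp is whatever file holds the code above) is:
g++ -O3 -S -o - arithsum.cpp
which prints the generated assembly to standard output; the interactive equivalent is the Compiler Explorer at https://godbolt.org/.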
Assuming that by onerous you mean the conversion to 1/0: sure, it works in C/C++ due to implicit conversion from bool, but it might not in other languages. If that's what you want to achieve, why not use the ternary operator (? :), which also makes the code more readable:
int foo(int x) {
return (x > 5) ? (100 + 5) : 100;
}
Also read this Stack Overflow question -- bool to int conversion
We know it can in Java and JavaScript.
But the question is, can the condition below ever evaluate to true in C or C++?
if(a==1 && a==2 && a==3)
printf("SUCCESS");
EDIT
If a were an integer.
Depends on your definition of "a is an integer":
int a__(){ static int r; return ++r; }
#define a a__() //a is now an expression of type `int`
int main()
{
return a==1 && a==2 && a==3; //returns 1
}
Of course:
int f(int b) { return b==1&&b==2&&b==3; }
will always return 0; and optimizers will generally replace the check with exactly that.
If we put macro magic aside, I can see one way to answer this question positively, but it requires a bit more than just standard C. Let's assume we have an extension allowing us to use the __attribute__((at(ADDRESS))) attribute, which places a variable at a specific memory location (available in some ARM compilers, for example, like ARM GCC). Let's assume we have a hardware counter register at the address ADDRESS which increments on each read. Then we could do something like this:
volatile int a __attribute__((at(ADDRESS)));
The volatile forces the compiler to generate a register read each time the comparison is performed, so the counter will increment 3 times. If the initial value of the counter is 1, the statement will evaluate to true.
P.S. If you don't like the at attribute, the same effect can be achieved with a linker script, by placing a into a specific memory section.
If a is of a primitive type (i.e. all the == and && operators are built in), if you stay within defined behavior, if there's no way for another thread to modify a in the middle of execution (technically that would itself be undefined behavior - see comments - but I left it here anyway because it's the example given in the Java question), and if there is no preprocessor magic involved (see the chosen answer), then I don't believe there is any way for this to evaluate to true. However, as you can see from that list of conditions, there are many scenarios in which the expression could evaluate to true, depending on the types used and the context of the code.
In C, yes it can. If a is uninitialised then (even if there is no UB, as discussed here), its value is indeterminate, reading it gives indeterminate results, and comparing it to other numbers therefore also gives indeterminate results.
As a direct consequence, a could compare true with 1 in one moment, then compare true with 2 instead the next moment. It can't hold both those values simultaneously, but it doesn't matter, because its value is indeterminate.
In practice I'd be surprised to see the behaviour you describe, though, because there's no real reason for the actual storage to change in memory in the time between the two comparisons.
In C++, sort of. The above is still true there, but reading an indeterminate value is always an undefined operation in C++ so really all bets are off.
Optimisations are allowed to aggressively bastardise your code, and when you do undefined things this can quite easily result in all manner of chaos.
So I'd be less surprised to see this result in C++ than in C but, if I did, it would be an observation without purpose or meaning because a program with undefined behaviour should be entirely ignored anyway.
And, naturally, in both languages there are "tricks" you can do, like #define a (x++), though these do not seem to be in the spirit of your question.
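For completeness, here is what one such trick looks like in full, using the #define a (x++) idea from above; with x starting at 1, the three reads yield 1, 2 and 3 in turn because && sequences its operands left to right:
#include <cstdio>
int main()
{
    int x = 1;
    #define a (x++)                     // each mention of `a` reads x, then increments it
    if (a == 1 && a == 2 && a == 3)     // the reads see 1, then 2, then 3
        std::printf("SUCCESS\n");
}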
The following program randomly prints seen: yes or seen: no, depending on whether at some point in the execution of the main thread (a == 0 && a == 1 && a == 2) evaluated to true.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
_Atomic int a = 0;
_Atomic int relse = 0;
void *writer(void *arg)
{
++relse;
while (relse != 2);
for (int i = 100; i > 0; --i)
{
a = 0;
a = 1;
a = 2;
}
return NULL;
}
int main(void)
{
int seen = 0;
pthread_t pt;
if (pthread_create(&pt, NULL, writer, NULL)) exit(EXIT_FAILURE);
++relse;
while (relse != 2);
for (int i = 100; i > 0; --i)
seen |= (a == 0 && a == 1 && a == 2);
printf("seen: %s\n", seen ? "yes":"no");
pthread_join(pt, NULL);
return 0;
}
As far as I am aware, this does not contain undefined behavior at any point, and a is of an integer type, as required by the question.
Obviously this is a race condition, and so whether seen: yes or seen: no is printed depends on the platform the program is run on. On Linux, x86_64, gcc 8.2.1 both answers appear regularly. If it doesn't work, try increasing the loop counters.
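If you want to try it yourself, a typical build-and-run line (assuming the file is saved as race.c, with gcc on Linux) is:
gcc -std=c11 -O2 -pthread race.c -o race && ./race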
int f(int &x, int c)
{
c = c - 1;
if (c == 0) return 1;
x = x + 1;
return f(x, c) * x;
}
int x = 5;
cout << f(x,5);
In the example above the four possible answers to choose from are:
3024
6561
55440
161051
Function f(int &x, int c) is called four times after the first call before it reaches the base case, where it returns the result, which is 6561. My guess was 3024, but I was wrong. Even though the variable x, which is passed by reference, is incremented in each call of f(int &x, int c) and takes the values 6->7->8->9 respectively, the final result of this recursion is equal to 9^4.
So my question is: variable x is passed by reference and is equal to 9 when the recursion reaches the base case. Does that mean that all the stages of the recursion will see this value for x, even if they had a different value when they were called?
No, there are more than four answers to choose from.
The fetch of x for the recursive function call and the fetch of x for the right-hand side of the multiplication are not sequenced with respect to each other; as such, the evaluation order is unspecified.
This doesn't mean that there is some particular evaluation order that you merely need to figure out. It means that the final results can:
Vary depending on the compiler.
Vary each time this program executes.
The evaluation order may also be different for each individual recursive call. Each recursive call can end up using a different evaluation order, too. "Unspecified" means "unspecified". Any possibility can happen. Each individual time.
I didn't bother to calculate all actual possibilities here. It's better to invest one's own time on something that should work properly, instead of on something that obviously can never work properly.
If you want a specific evaluation order, it's going to be either this:
int y=x;
return f(x, c) * y;
Or this:
int y=f(x, c);
return y * x;
This evaluation order is now specified.
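As a hedged illustration (my own completion of the first variant above, not the original answerer's code), pinning the order this way makes the program print 3024 deterministically:
#include <cstdio>

int f(int &x, int c)
{
    c = c - 1;
    if (c == 0) return 1;
    x = x + 1;
    int y = x;             // read x now, before the recursive call can change it
    return f(x, c) * y;    // each level multiplies by the value x had at that level
}

int main()
{
    int p = 5;
    std::printf("%d\n", f(p, p));  // 6 * 7 * 8 * 9 = 3024
}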
Possible Duplicate:
Can a recursive function be inline?
What are the trade-offs of making recursive functions inline?
Recursive functions that can be optimised by tail-call elimination can certainly be inlined. If the last thing a function does is call itself, then it can be converted into a plain loop.
Arbitrary recursive functions can't be inlined for the same reason a snake can't swallow its own tail.
[Edit: just noticed that although your title says "be inlined", your actual question says "making functions inline". The two effectively have nothing to do with one another, they just have confusingly similar names. In modern compilers, the primary effect of inline is the thing that originally in C99 was (I think) just a necessary detail to make inline work at all: to permit multiple definitions of a symbol with external linkage. That's because modern compilers don't pay a whole lot of attention to the programmer's opinion of whether a function should be inlined. They do pay some, though, so the confusion of concepts persists. I've answered the question in the title, which is the decision the compiler makes, not the question in the body, which is the decision the programmer makes.]
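To illustrate that "multiple definitions" point (a minimal sketch of the C++ semantics, nothing to do with recursion): a function defined inline in a header may appear in every translation unit that includes it, and the linker folds the copies instead of complaining about a duplicate symbol.
// twice.h -- hypothetical header, included from any number of .cpp files
inline int twice(int x) { return 2 * x; }  // one definition per TU is allowed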
Inlining is not necessarily an all-or-nothing deal. One strategy which compilers use to decide whether to inline is to keep inlining function calls until the resulting code is "too big", where "big" is defined by some hopefully sensible heuristic.
So consider the following recursive function (which deliberately is not simply tail-recursive):
int triangle(int n) {
if (n == 1) return 1;
return n + triangle(n-1);
}
If it's called like this:
int t100() {
return triangle(100);
}
Then there's no particular reason in principle that the usual rules that the compiler uses for inlining shouldn't result in this:
int t100() {
// inline call to triangle(100)
int result;
if (100 == 1) { result = 1; } else {
// inline call to triangle(99)
int t99;
if (100 - 1 == 1) { t99 = 1; } else {
// inline call to triangle(98)
int t98;
if (100 - 1 - 1 == 1) { t98 = 1; } else {
// oops, "too big", no more inlining
t98 = triangle(100 - 1 - 1 - 1) + 98;
}
t99 = t98 + 99;
}
result = t99 + 100;
}
return result;
}
Obviously the optimiser will have a field day with that, so it's much "smaller" than it looks:
int t100() {
return triangle(97) + 297; // 297 = 100 + 99 + 98
}
The code in triangle itself could be "unrolled" a few steps by a few levels of inlining, in exactly the same way, except that it doesn't have the benefits of constants:
int triangle(int n) {
if (n == 1) return 1;
if (n == 2) return 3;
if (n == 3) return 6;
return triangle(n-3) + 3*n - 3; // n + (n-1) + (n-2) = 3*n - 3
}
I doubt whether compilers actually do this, though; I don't think I've ever noticed it. [Edit: MSVC does if you tell it to, thanks peterchen.]
There's an obvious potential benefit in saving call overhead, but as against that people don't really expect recursive functions to get inlined, and there's no particular guarantee that the usual inlining heuristics will perform well with recursive functions (where there are two different places, the call site and the recursive call, that might be inlined, with different benefits in each case). Furthermore, it's difficult at compile time to estimate how deep the recursion will go, and the inline heuristics might like to take account of the call depth to make decisions. So it may be that the compiler just doesn't bother.
Functional language compilers are typically a lot more aggressive dealing with recursion than C or C++ compilers. The relevant trade-off there is that so many functions written in functional languages are recursive, that performance might be hopeless if the compiler couldn't optimise tail-recursion. So Lisp programmers typically rely on good optimisation of recursive functions, whereas C and C++ programmers typically don't.
If your compiler does not support it, you can try manually inlining instead...
int factorial(int n) {
int result = 1;
if (n-- == 0) {
return result;
} else {
result *= 1;
if (n-- == 0) {
return result;
} else {
result *= 2;
if (n-- == 0) {
return result;
} else {
result *= 3;
if (n-- == 0) {
return result;
} else {
result *= 4;
if (n-- == 0) {
return result;
} else {
// ...
}
}
}
}
}
}
See the problem yet?
Tail recursion (a special case of recursion) can be inlined by smart compilers.
Now, hold on. A tail-recursive function could be unrolled and inlined pretty easily. Apparently there are compilers that do this, but I am not aware of specifics.
Of course. Any function can be inlined if it makes sense to do it:
int f(int i)
{
if (i <= 0) return 1;
else return i * f(i - 1);
}
int main()
{
return f(10);
}
pseudo assembly (f is inlined in main):
main:
mov r0, #10 ; Pass 10 to f
f:
cmp r0, #0 ; arg <= 0? ...
bgt 1f ; ... if not, jump to the recursive case
mov r0, #1 ; ... if so, return 1
ret
1:
mov r0, -(sp) ; if not, save arg.
dec r0 ; pass arg - 1 to f
call f ; just because it's inlined doesn't mean I can't call it.
mul r0, (sp)+ ; compute the result
ret ; done.
;-)
When you call an ordinary function, you change the sequential execution order and jump (call or jmp) to the address where the function resides. Inlining means that the function's instructions are placed at every occurrence of a call to it, so there is no single place to jump to; it also enables other optimisations, like eliminating the pushing and popping of function parameters.
When you know that the recursive chain will normally not be very long, you could inline up to a predefined level (I don't know whether any existing compiler is intelligent enough for this today).
Inlining a recursive function is much like unrolling a loop. You will end up with much duplicate code -- but in some cases it could be worthwhile:
The number of recursive calls (the length of the chain) is normally short (in cases where it gets longer than the predefined depth, just fall back to a normal recursive call)
The overhead of the function calls is relatively big compared to the logic -- so do some "unrolling", for example five instances, and only then make a recursive call again -- this would save about 80% of the call overhead
Of course there's the tail-recursive special case -- but this was mentioned by others.
Of course it can be declared inline. The inline keyword is just a hint to the compiler; in many cases the compiler just ignores it, and depending on the compiler you could end up in one of these situations:
Some compilers can turn tail recursion into plain loops, and thus inline them normally.
Non-tail recursion could be inlined up to a given depth, usually decided by the compiler.
I've never encountered a practical application for that, as the cost of a call isn't high enough anymore to offset the increase in code size.
[edit] (to clarify: even though I like to toy with these things, and often check what code my compiler generates for "funny stuff" just out of curiosity, I haven't encountered a use case where any such unrolling helped significantly. This doesn't mean they don't exist or couldn't be constructed.)
The only place where it would help is precalculating low iterations during compile time. However, in my experience this immensely increases compile times for often negligible runtime performance benefits.
Note that Visual Studio 2008 (and earlier) gives you quite some control over this:
#pragma inline_recursion(on)
#pragma inline_depth(N)
__forceinline
Be careful with the latter, it can easily overload the compiler :)
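A hedged sketch of how those switches fit together (illustrative only; the function is my own example, not from the original answer):
#pragma inline_recursion(on)   // allow the inliner to follow recursive calls
#pragma inline_depth(8)        // ...but give up after 8 levels

__forceinline int triangle(int n)
{
    return n <= 1 ? 1 : n + triangle(n - 1);
}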
Inline means that at each place a call to a function marked inline is made, the compiler places a copy of that function's code there. This avoids the function-calling mechanism and its usual pushing and popping of arguments on the stack, saving time in gazillion-calls-per-second situations. You see the consequences for static variables and stuff like that? All gone...
So, if you had an inlined recursive call, either your compiler is super smart and can figure out whether the number of copies is deterministic, or it will say "Cannot make it inline", because it wouldn't know when to stop.