I'm a beginner in C++. Yesterday I read about recursive functions, so I decided to write my own. Here's what I wrote:
int returnZero(int anyNumber) {
    if (anyNumber == 0)
        return 0;
    else {
        anyNumber--;
        return returnZero(anyNumber);
    }
}
When I do this: int zero1 = returnZero(4793);, it causes a stack overflow. However, if I pass the value 4792 as the argument, no overflow occurs.
Any ideas as to why?
Whenever you call a function, including recursively, the return address and often the arguments are pushed onto the call stack. The stack is finite, so if the recursion is too deep you'll eventually run out of stack space.
What surprises me is that it only takes 4793 calls on your machine to overflow the stack. This is a pretty small stack. By way of comparison, running the same code on my computer requires ~100x as many calls before the program crashes.
The size of the stack is configurable. On Unix, the command is ulimit -s.
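If you want to check the limit from inside the program instead of from the shell, POSIX systems expose it through getrlimit; here is a minimal sketch (Unix-only, and the numbers you see will depend on how your system is configured):
#include <iostream>
#include <sys/resource.h>   // getrlimit, RLIMIT_STACK (POSIX)

int main() {
    rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        // rl.rlim_cur is the soft limit that ulimit -s reflects, reported here in bytes
        std::cout << "stack soft limit: " << rl.rlim_cur << " bytes\n";
        std::cout << "stack hard limit: " << rl.rlim_max << " bytes\n";
    }
    return 0;
}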
Given that the function is tail-recursive, some compilers might be able to optimize the recursive call away by turning it into a jump. Some compilers might take your example even further: when asked for maximum optimizations, gcc 4.7.2 transforms the entire function into:
int returnZero(int anyNumber) {
    return 0;
}
This requires exactly two assembly instructions:
_returnZero:
xorl %eax, %eax
ret
Pretty neat.
You just hit your system's call-stack size limit; that's what's happening. For some reason the stack on your system is tiny: a depth of 4793 function calls is rather small.
Your stack is limited in size, so with 4793 calls you hit the limit while 4792 just squeezes in under it. Each function call uses some space on the stack for housekeeping and possibly arguments.
This page gives an example of what a stack looks like during a recursive function call.
My guess is your stack is exactly big enough to fit 4792 entries - today. Tomorrow or the next day, that number might be different. Recursive programming can be dangerous, and this example illustrates why. We try not to let recursion get this deep, or 'bad' things can happen.
Any "boundless" recursion, that is, recursive calls that aren't naturally limited to a small(ish) number, will have this effect. Exactly where the limit lies depends on the OS and on the environment the function is called in (the compiler, which function calls the recursive function, etc.).
If you add another variable, say int x[10]; to your function that calls your recursive function, the number needed to crash it will change (probably by about 5 or so).
Compile it with a different compiler (or even different compiler settings, e.g. optimization turned on) and it will probably change again.
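If the computation genuinely has to handle large inputs, the usual fix is not a bigger stack but rewriting the recursion as a loop. Here is a sketch of an iterative equivalent of the function above (returnZeroIterative is just an illustrative name; as noted earlier, an optimizing compiler reduces it to return 0; anyway):
// Same result as the recursive version, but constant stack usage.
int returnZeroIterative(int anyNumber) {
    while (anyNumber != 0)
        anyNumber--;
    return 0;
}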
Using recursion, you can compute the SuperDigit (digit sum):
#include <iostream>
using namespace std;

static int sum = 0;

int getSum(int n) {
    if (n > 0) {
        sum += n % 10;   // add the last digit
        getSum(n / 10);  // recurse on the remaining digits
    }
    return sum;
}

int main() {
    int n = 8596854;
    cout << getSum(n) << endl;  // prints 45 (8+5+9+6+8+5+4)
    return 0;
}
Related
I am interested in the execution speed of a function called through a pointer. I found initially that calling through a function pointer passed in as a parameter is slower than calling through a locally declared function pointer. Please see the following code; you can see I have two function calls, both of which ultimately execute a lambda through a function pointer.
#include <chrono>
#include <iostream>
using namespace std;
__attribute__((noinline)) int plus_one(int x) {
return x + 1;
}
typedef int (*FUNC)(int);
#define OUTPUT_TIME(msg) std::cout << "Execution time (ns) of " << msg << ": " << std::chrono::duration_cast<chrono::nanoseconds>(t_end - t_start).count() << std::endl;
#define START_TIMING() auto const t_start = std::chrono::high_resolution_clock::now();
#define END_TIMING(msg) auto const t_end = std::chrono::high_resolution_clock::now(); OUTPUT_TIME(msg);
auto constexpr g_count = 1000000;
__attribute__((noinline)) int speed_test_no_param() {
    int r;
    auto local_lambda = [](int a) {
        return plus_one(a);
    };
    FUNC f = local_lambda;
    START_TIMING();
    for (auto i = 0; i < g_count; ++i)
        r = f(100);
    END_TIMING("speed_test_no_param");
    return r;
}

__attribute__((noinline)) int speed_test_with_param(FUNC &f) {
    int r;
    START_TIMING();
    for (auto i = 0; i < g_count; ++i)
        r = f(100);
    END_TIMING("speed_test_with_param");
    return r;
}

int main() {
    int ret = 0;
    auto main_lambda = [](int a) {
        return plus_one(a);
    };
    ret += speed_test_no_param();
    FUNC fp = main_lambda;
    ret += speed_test_with_param(fp);
    return ret;
}
Built on Ubuntu 20.04 with:
g++ -ggdb -ffunction-sections -O3 -std=c++17 -DNDEBUG=1 -DRELEASE=1 -c speed_test.cpp -o speed_test.o && g++ -o speed_test -Wl,-gc-sections -Wl,--start-group speed_test.o -Wl,--rpath='$ORIGIN' -Wl,--end-group
The results were not surprising; for any given number of runs, we see that the version without the parameter is clearly the fastest. Here is just one run; all of the many times I have run, this yields the same result:
Execution time (ns) of speed_test_no_param: 74
Execution time (ns) of speed_test_with_param: 1173849
When I dig into the assembly, I found what I believe is the reason for this. The code for speed_test_no_param() is:
0x000055555555534b call 0x555555555310 <plus_one(int)>
... whereas the code for speed_test_with_param is more complicated; a fetch of the address of the lambda, then a jump to the plus_one function:
0x000055555555544e call QWORD PTR [rbx]
...
0x0000555555555324 jmp 0x555555555310 <plus_one(int)>
(On compiler explorer at https://godbolt.org/z/b4hqYx7Eo. Different compiler but similar assembly; timing code commented out.)
What I didn't expect though is that when I reduce the number of calls down to 1 from 1000000 (auto constexpr g_count = 1), the results are flipped with the parameter version being the fastest:
Execution time (ns) of speed_test_no_param: 61
Execution time (ns) of speed_test_with_param: 31
I have also run this many times; the parameter version is always the fastest.
I do not understand why this is; given this conflicting evidence, I no longer believe a call through a parameter is necessarily slower than one through a local variable, but looking at the assembly suggests it really should be.
Can someone please explain?
UPDATE
As per the comment below, ordering matters. When I call speed_test_with_param() first, speed_test_no_param() is the faster of the two! Yet when I call speed_test_no_param() first, speed_test_with_param() is the faster! Any explanation would be greatly appreciated!
With multiple loop iterations in the C++ source, the fast version is only doing one in asm, because you gave the optimizer enough visibility to prove that's equivalent.
Why ordering matters with just one iteration: probably warm-up effects in the library code for std::chrono (see "Idiomatic way of performance evaluation?").
Can you confirm my suspicion that the call without the parameter should technically be the fastest, because the version with the parameter involves a memory read to find the location to call?
Much more significant is whether the compiler can constant-propagate the function pointer and see what function is being called; notice how speed_test_with_param has an actual loop that calls g_count times, but speed_test_no_param can see it's calling plus_one. Clang sees through the local lambda and the noinline to notice it has no side-effects, so it only calls it once.
It doesn't inline, but it still does inter-procedural optimization. With GCC, you could block that by using __attribute__((noipa)). GCC's noclone attribute can also stop it from making a copy of the function with constant-propagation into it, but noipa is I think stronger. noinline isn't sufficient for benchmarking stuff that becomes trivial to optimize when the compiler can see everything. But I don't think clang has anything like that.
You can make functions opaque to the optimizer by putting them in separate source files and not using -flto or other options like gcc's -fwhole-program.
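For example, a rough sketch of the GCC-only attribute mentioned above (the function and loop names below are illustrative, not from the question; clang would need the separate-source-file approach instead):
// GCC: noipa blocks inlining, cloning, and constant propagation into the call,
// so the optimizer has to emit a real call and cannot fold the loop away.
__attribute__((noipa)) int plus_one_opaque(int x) {
    return x + 1;
}

int run_benchmark_once() {
    int r = 0;
    for (int i = 0; i < 1000000; ++i)
        r = plus_one_opaque(100);
    return r;
}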
The only reason store/reload is involved with the function pointer is because you passed it by reference for no reason, even though it's just a single pointer. If you pass it by value (https://godbolt.org/z/WEvvsvoxb) you can see call rbx in the loop.
Apparently clang couldn't hoist the load because it wasn't sure the caller's function-pointer wouldn't be modified by the call, because it was making a stand-alone version of speed_test_with_param that would work with any caller and any arg, not just the one main passes. So constprop didn't happen.
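A sketch of the by-value variant being described (timing macros omitted; the function name is changed only to keep it distinct from the original):
typedef int (*FUNC)(int);

// Passing the function pointer by value: the callee owns its own copy, so the
// compiler can keep it in a register (the call rbx seen on Godbolt) instead of
// reloading it through a reference on every iteration.
__attribute__((noinline)) int speed_test_by_value(FUNC f) {
    int r = 0;
    for (int i = 0; i < 1000000; ++i)
        r = f(100);
    return r;
}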
An indirect call can mispredict more easily, and yes store/reload adds a few cycles more latency before the prediction can be checked.
So yes, in general you'd expect it to be slower when the function to be called is a function-pointer arg, not a compile-time-constant fptr initialized within the calling function where the compiler can see the definition of what it's calling even if you artificially limit it.
If it becomes a call some_name instead of call rbx, that's still faster even if it does still have to loop like you were trying to make it.
(Microbenchmarking is hard, especially when you're trying to benchmark a C++ concept which can optimize differently depending on context; you have to know enough about compilers, optimization, and assembly to realize what makes the difference and what you're actually measuring. There isn't a meaningful answer to some questions, like "how fast or slow is the + operator?", even if you limit it to integers, because it can optimize away with constants, or vectorize, or not depending on how it's used.)
You're benchmarking a single iteration, which subjects you to cache effects and other warmup costs. The entire reason we normally run benchmarks several times is to amortize out these kinds of effects.
Caching refers to the memory hierarchy: your actual RAM is significantly slower than your CPU (and disk even more so), so to speed things up your CPU has a cache (often, multiple caches) which stores the most recently accessed bits of memory. The first time you start your program, it will need to be loaded from disk into RAM; thereafter, it will need to be loaded from RAM into the CPU caches. Uncached memory accesses can be orders of magnitude slower than cached memory accesses. As your program runs, various bits of code and data will be loaded from RAM and cached; hence, subsequent executions of the same bit of code will often be faster than the first execution.
Other effects can include things like lazy dynamic linking and lazy initializations, wherein certain functions will perform extra work the first time they're called (for example, resolving dynamic library loads or initializing static data). These can all contribute to the first iteration being slower than subsequent iterations.
To address these issues, always make sure to run your benchmarks multiple times - and when possible, run your entire benchmark suite a few times in one process and take the lowest (fastest) run.
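A minimal sketch of that advice (best_of and the repeat count are arbitrary choices, not part of the original benchmark):
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <limits>

// Run a benchmarked callable several times and report the fastest run in ns,
// so one-off warm-up costs (cold caches, lazy dynamic linking, ...) are discarded.
template <typename Fn>
std::int64_t best_of(int repeat_count, Fn&& fn) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeat_count; ++i) {
        auto t0 = std::chrono::high_resolution_clock::now();
        fn();
        auto t1 = std::chrono::high_resolution_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        best = std::min<std::int64_t>(best, ns);
    }
    return best;
}

int main() {
    volatile int sink = 0;                          // keep the measured work observable
    std::cout << best_of(10, [&] { sink = sink + 1; }) << " ns (fastest of 10 runs)\n";
    return 0;
}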
I was just going through this Wikipedia entry. Out of curiosity, to find the stack size allocated to a simple process, I tried this:
#include <iostream>
using namespace std;

int main()
{
    static int count = 0;
    cout << " Count = " << count++ << endl;
    main();    // note: calling main() recursively is not actually legal C++
    return 0;
}
Compiler: Dev-C++
I got output ending at Count = 43385.
Up to this point everything is fine and understandable. By the way, from the last count, i.e. 43385, can I guess the maximum stack size on a 32-bit machine? What if I say 4 bytes per call (4 bytes for the return address on the stack for each call)? I may sound silly on this.
Now, if I modify my program to:
void foo()
{
    static int count = 0;
    cout << " Count = " << count++ << endl;
    foo();
}

int main()
{
    foo();
    return 0;
}
With this I get a stack overflow at count 130156 (OK, fine).
But my question is: if I add one function in between main and foo, the count decreases by 1 (130155); with 2 functions between foo and main the count decreases by 2 (130154), and so on. Why this behavior? Is it because one slot is consumed on the stack for each intermediate function's return address?
First, make sure your program really increments count (count++).
Stack size is not fixed; most compilers let you specify it. It also depends on factors such as the platform, the toolchain, ulimit, and other parameters. Many static and dynamic properties can influence it.
There are three kinds of memory limits for a 32-bit Windows process:
Static data - 2GB
Dynamic data - 2GB
Stack data - 1GB (the stack size is set by the linker, the default is 1MB. This can be increased using the Linker property System > Stack Reserve Size).
Using your program, you can estimate the current stack size.
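One hedged way to do that estimate empirically (not from the answer above; it assumes the stack grows downward as on x86, and should be compiled without optimizations so the frames are not collapsed):
#include <cstdint>
#include <iostream>

void probe(int depth, std::uintptr_t top) {
    int local = depth;    // a local whose address marks roughly where this frame sits
    if (depth == 0) {
        std::cout << "approx. bytes per call: "
                  << (top - reinterpret_cast<std::uintptr_t>(&local)) / 100 << '\n';
        return;
    }
    probe(depth - 1, top);
}

int main() {
    int anchor = 0;
    probe(100, reinterpret_cast<std::uintptr_t>(&anchor));   // ~100 frames between anchor and the deepest local
    return 0;
}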
The articles "Memory limits for applications on Windows", "Stack overflow from recursion in C", and "Maximum stack size of a C program" will help you.
I wrote a function using recursion. While testing it, it turned out that the function is killed without any obvious reason while the recursion is still running.
To test this, I wrote an infinite recursion.
On my PC this function quits after about 2 seconds and the last output is about 327400.
The last number isn't always the same.
I am using Ubuntu Lucid Lynx, the GCC compiler and Eclipse as IDE. If somebody has an idea what the problem is and how I can prevent the program from exiting I would be really pleased.
#include <iostream>

void rek(double x){
    std::cout << x << std::endl;
    rek(x + 1);
}

int main(int argc, char **argv) {
    rek(1);
}
You are most likely overflowing the stack, at which point your program will be summarily killed. The depth of the stack will always limit the amount you can recurse, and if you are hitting that limit, it means your algorithm needs to change.
I think you are right in expecting the code to run forever, as explained in
How do I check if gcc is performing tail-recursion optimization?
your code should be able to run forever and ever if gcc performs tail-recursion optimization. On my machine it looks like -O3 actually makes gcc generate tail calls and flatten the stack. :-)
I suggest you set the optimization flag to -O2 or -O3.
You are causing a stack overflow (running out of stack space) because you don't provide an exit condition.
void rek(double x){
    if (x > 10)
        return;
    std::cout << x << std::endl;
    rek(x + 1);
}
Are you expecting this to work forever?
It won't. At some point you're going to run out of stack.
This is funny, talking about stack overflow on stackoverflow.com. ;) The call stack is limited (you can customize it in the project settings), but at some point, when you have infinitely recursing calls, it will be exceeded and your program will be terminated.
If you want to avoid a stack overflow with infinite recursion, you're unfortunately going to have to delve into some assembly in order to change the stack so that a new activation record isn't constantly pushed onto the stack, which after some point will cause the overflow. Because you make the recursive call at the end of the function, the transformation is known, in languages where recursion is popular (e.g., Lisp, Scheme, Haskell), as tail-call optimization. It prevents a stack overflow by basically transforming the tail call into a loop. It would be something like this in C (note: I'm using inline assembly with gcc on x86, and I changed your argument to int from double in order to simplify the assembly. Also, I've changed to C from C++ in order to avoid name-mangling of function names. Finally, the "\n\t" at the end of each statement is not an actual assembly command but is needed for inline assembly in gcc):
#include <stdio.h>
void rek(int x)
{
printf("Value for x: %d\n", x);
//we now duplicate the equvalent of `rek(x+1);` with tail-call optimization
__asm("movl 8(%ebp), %eax\n\t" //get the value of x off the stack
"incl %eax\n\t" //add 1 to the value of x
"movl 4(%ebp), %ecx\n\t" //save the return address on the stack
"movl (%ebp), %edx\n\t" //save the caller's activation record base pointer
"addl $12, %ebp\n\t" //erase the activation record
"movl %ebp, %esp\n\t" //reset the stack pointer
"pushl %eax\n\t" //push the new value of x on the stack for function call
"pushl %ecx\n\t" //push the return value back to the caller (i.e., main()) on the stack
"movl %edx, %ebp\n\t" //restore the old value of the caller's stack base pointer
"jmp rek\n\t"); //jump to the start of rek()
}
int main()
{
rek(1);
printf("Finished call\n"); //<== we never get here
return 0;
}
Compiled with gcc 4.4.3 on Ubuntu 10.04, this ran pretty much "forever" in an infinite loop with no stack overflow, whereas without the tail-call optimization it crashed with a segmentation fault pretty quickly. You can see from the comments in the __asm section how the stack activation record space is being "recycled" so that each new call does not use up space on the stack. This involves saving the key values in the old activation record (the previous caller's activation record base pointer and the return address) and restoring them, but with the arguments changed for the next recursive call to the function.
And again, other languages, mainly functional languages, perform tail-call optimization as a base-feature of the language. So a tail-call recursive function in Scheme/Lisp/etc. won't overflow the stack since this type of stack manipulation is done under-the-hood for you when a new function call is made as the last statement of an existing function.
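In C or C++ you would normally just let the compiler do this, or express the same recycled-frame idea directly as a loop; here is a sketch of what the tail-call-optimized version is morally equivalent to (rek_as_loop is an illustrative name):
#include <cstdio>

// The "recursive call" becomes a jump back to the top of the function with the
// argument updated in place, so the single stack frame is reused instead of growing.
void rek_as_loop(int x)
{
    for (;;) {
        std::printf("Value for x: %d\n", x);
        x = x + 1;    // same state change as rek(x + 1)
    }
}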
Well, you have defined an infinite recursion that overflows the stack, which kills your app. If you really want to print all the numbers, then use a loop.
#include <iostream>

int main()
{
    double x = 1;
    while (true)
    {
        std::cout << x << std::endl;
        x += 1;
    }
}
Each recursive method should implement an exit condition; otherwise you will get a stack overflow and the program will terminate.
In your case there is no condition on the parameter you are passing to the function; hence it runs forever and eventually crashes.
Possible Duplicate:
Can a recursive function be inline?
What are the trade-offs of making recursive functions inline?
Recursive functions that can be optimised by tail-end recursion can certainly be inlined. If the last thing a function does is call itself, then it can be converted into a plain loop.
Arbitrary recursive functions can't be inlined for the same reason a snake can't swallow its own tail.
[Edit: just noticed that although your title says "be inlined", your actual question says "making functions inline". The two effectively have nothing to do with one another, they just have confusingly similar names. In modern compilers, the primary effect of inline is the thing that originally in C99 was (I think) just a necessary detail to make inline work at all: to permit multiple definitions of a symbol with external linkage. That's because modern compilers don't pay a whole lot of attention to the programmer's opinion of whether a function should be inlined. They do pay some, though, so the confusion of concepts persists. I've answered the question in the title, which is the decision the compiler makes, not the question in the body, which is the decision the programmer makes.]
Inlining is not necessarily an all-or-nothing deal. One strategy which compilers use to decide whether to inline, is to keep inlining function calls until the resulting code is "too big". "Big" is defined by some hopefully sensible heuristic.
So consider the following recursive function (which deliberately is not simply tail-recursive):
int triangle(int n) {
    if (n == 1) return 1;
    return n + triangle(n-1);
}
If it's called like this:
int t100() {
    return triangle(100);
}
Then there's no particular reason in principle that the usual rules that the compiler uses for inlining shouldn't result in this:
int t100() {
    // inline call to triangle(100)
    int result;
    if (100 == 1) { result = 1; } else {
        // inline call to triangle(99)
        int t99;
        if (100 - 1 == 1) { t99 = 1; } else {
            // inline call to triangle(98)
            int t98;
            if (100 - 1 - 1 == 1) { t98 = 1; } else {
                // oops, "too big", no more inlining
                t98 = triangle(100 - 1 - 1 - 1) + 98;
            }
            t99 = t98 + 99;
        }
        result = t99 + 100;
    }
    return result;
}
Obviously the optimiser will have a field day with that, so it's much "smaller" than it looks:
int t100() {
    return triangle(97) + 297;
}
The code in triangle itself could be "unrolled" a few steps by a few levels of inlining, in exactly the same way, except that it doesn't have the benefits of constants:
int triangle(int n) {
    if (n == 1) return 1;
    if (n == 2) return 3;
    if (n == 3) return 6;
    return triangle(n-3) + 3*n - 3;
}
}
I doubt whether compilers actually do this, though; I don't think I've ever noticed it. [Edit: MSVC does if you tell it to, thanks peterchen.]
There's an obvious potential benefit in saving call overhead, but as against that people don't really expect recursive functions to get inlined, and there's no particular guarantee that the usual inlining heuristics will perform well with recursive functions (where there are two different places, the call site and the recursive call, that might be inlined, with different benefits in each case). Furthermore, it's difficult at compile time to estimate how deep the recursion will go, and the inline heuristics might like to take account of the call depth to make decisions. So it may be that the compiler just doesn't bother.
Functional language compilers are typically a lot more aggressive dealing with recursion than C or C++ compilers. The relevant trade-off there is that so many functions written in functional languages are recursive, that performance might be hopeless if the compiler couldn't optimise tail-recursion. So Lisp programmers typically rely on good optimisation of recursive functions, whereas C and C++ programmers typically don't.
If your compiler does not support it, you can try manually inlining instead...
int factorial(int n) {
    int result = 1;
    if (n-- == 0) {
        return result;
    } else {
        result *= 1;
        if (n-- == 0) {
            return result;
        } else {
            result *= 2;
            if (n-- == 0) {
                return result;
            } else {
                result *= 3;
                if (n-- == 0) {
                    return result;
                } else {
                    result *= 4;
                    if (n-- == 0) {
                        return result;
                    } else {
                        // ...
                    }
                }
            }
        }
    }
}
See the problem yet?
Tail recursion (a special case of recursion) can be inlined by smart compilers.
Now, hold on. A tail-recursive function could be unrolled and inlined pretty easily. Apparently there are compilers that do this, but I am not aware of specifics.
Of course. Any function can be inlined if it makes sense to do it:
int f(int i)
{
    if (i <= 0) return 1;
    else return i * f(i - 1);
}

int main()
{
    return f(10);
}
pseudo assembly (f is inlined in main):
main:
mov r0, #10 ; Pass 10 to f
f:
cmp r0, #0 ; arg <= 0? ...
bgt 1f          ; if arg > 0, go to the recursive case
mov r0, #1 ; ... is so, return 1
ret
1:
mov r0, -(sp) ; if not, save arg.
dec r0 ; pass arg - 1 to f
call f ; just because it's inlined doesn't mean I can't call it.
mul r0, (sp)+ ; compute the result
ret ; done.
;-)
When you call an ordinary function, you change the sequential execution order and jump (call or jmp) to the address where the function resides. Inlining means that the function's instructions are placed at every call site, so there is no single place to jump to; other kinds of optimisation can then also be applied, like eliminating the pushing/popping of function parameters.
When you know that the recursive chain will normally not be very long, you could do inlining up to a predefined level (I don't know if any existing compiler is intelligent enough for this today).
Inlining a recursive function is much like unrolling a loop. You will end up with much duplicate code -- but in some cases it could be worthwhile:
The number of recursive calls (the length of the chain) is normally short (when it gets longer than the predefined depth, just fall back to normal recursion).
The overhead of the function calls is relatively big compared to the logic -- so do some "unrolling", for example five instances, and then make a recursive call again -- this would save about 80% of the call overhead. (A sketch of this idea follows below.)
Of course there is the tail-recursive special case -- but this was mentioned by others.
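A hedged sketch of that unroll-a-few-levels-then-recurse idea, reusing the triangle function from the earlier answer; the template trick is just one way to spell a fixed unroll depth by hand, not something a particular compiler is claimed to do:
#include <iostream>

int triangle(int n);                        // ordinary recursive fallback, defined below

// Expand the first Depth levels of the recursion; once the budget is used up,
// fall back to the plain recursive call.
template <int Depth>
int triangle_unrolled(int n) {
    if (n == 1) return 1;
    return n + triangle_unrolled<Depth - 1>(n - 1);
}

template <>
int triangle_unrolled<0>(int n) {           // depth budget exhausted: recurse normally
    return triangle(n);
}

int triangle(int n) {
    if (n == 1) return 1;
    return n + triangle(n - 1);
}

int main() {
    std::cout << triangle_unrolled<5>(100) << std::endl;   // prints 5050
    return 0;
}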
Of course it can be declared inline. The inline keyword is just a hint to the compiler; in many cases the compiler simply ignores it, and depending on the compiler one of these situations may apply:
Some compilers can turn tail recursion into a plain loop, and thus inline it normally.
Non-tail recursion could be inlined up to a given depth, usually decided by the compiler.
I've never encountered a practical application for that, as the cost of a call isn't high enough anymore to offset the increase in code size.
[edit] (To clarify: even though I like to toy with these things, and often check what code my compiler generates for "funny stuff" just out of curiosity, I haven't encountered a use case where any such unrolling helped significantly. That doesn't mean they don't exist or couldn't be constructed.
The only place where it would help is precalculating low iterations at compile time. However, in my experience this immensely increases compile times for often negligible runtime performance benefits.)
Note that Visual Studio 2008 (and earlier) gives you quite some control over this:
#pragma inline_recursion(on)
#pragma inline_depth(N)
__forceinline
Be careful with the latter; it can easily overload the compiler :)
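A hedged sketch of how those switches might be combined (whether the recursion actually gets expanded still depends on the compiler's heuristics and version):
// MSVC-specific: allow recursive calls to be considered for inline expansion,
// but cap the expansion depth at 8 levels.
#pragma inline_recursion(on)
#pragma inline_depth(8)

inline int triangle(int n) {
    if (n == 1) return 1;
    return n + triangle(n - 1);
}

int t100() {
    return triangle(100);    // the first few levels may be expanded in place
}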
Inline means that at each place where a call to a function marked inline is made, the compiler places a copy of the function's code there. This avoids the function-calling mechanism and its usual argument stack pushing/popping, saving time in gazillion-calls-per-second situations. You see the consequences for static variables and stuff like that? All gone...
So, if you had an inlined recursive call, either your compiler is super smart and figures out whether the number of copies is deterministic, or it will say "Cannot make it inline" because it wouldn't know when to stop.