Why is the recursion depth non-deterministic (C++)?

Repeated runs of the following C++ program give a different maximum number of recursion calls (varying by approximately 100 function calls) before a segmentation fault.
#include <iostream>
void recursion(int i)
{
    std::cout << "iteration: " << ++i << std::endl;
    recursion(i);
}
int main()
{
    recursion(0);
}
I compiled the file main.cpp with
g++ -O0 main.cpp -o main
The same issue has been discussed elsewhere for Java. In both cases, the answers are based on Java-related concepts: JIT, garbage collection, the HotSpot optimizer, etc.
Why does the maximum number of recursions vary for C++?

Your recursion never logically terminates. It only terminates when your program crashes due to lack of stack space.
A certain amount of stack space is used for every recursive call, but in C++, it's not defined exactly how much stack space is available and how much is used per recursive call.
The stack space used per call may vary by optimization settings, linker options, alignment requirements, how your program is launched, and a ton of other things.
Bottom line: you have coded a bug, and you are running afoul of undefined behavior in your compiler and platform. If you want to figure out exactly how much stack space your program has on its current thread, your platform will have APIs you can call to get that value.

What happens when you blow the stack is not a guaranteed crash. Depending on the system, you could just be trashing memory in a relatively random bit of your memory space.
What is in that memory might depend on what memory allocations occurred, how much contiguous memory the OS handed to you when you asked for some, ASLR, or whatever.
Undefined behaviour in C++ is not predictable.

Beyond the C++ aspect: Following the comments of Eljay and n.'pronouns'.m, I turned off ASLR. This post describes how to do that. In short, ASLR can be disabled via
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
and enabled via
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
After disabling ASLR, the number of recursions before the system segmentation fault is constant for repeated execution of the described program.

Related

My C and C++ code is taking a long time to execute ... Any tips? [duplicate]

I switched to C++ because I heard it's 400 times faster than Python, but when I made an infinite loop that increments a variable and prints its value, Python seems to be faster. How can that be?
And how can I optimize it?
Python script:
x = 1
while 1:
    print(x)
    x += 1
C++ code:
#include <iostream>
using namespace std;
int main() {
    int x = 1;
    while (1) {
        cout << x << endl;
        x++;
    }
}
I tried optimizing it by putting this command:
ios_base::sync_with_stdio(false);
The speed became almost identical to Python's, but not faster.
I did search for this topic, but I didn't find anything that explains why.
C++'s std::endl flushes the stream, python's print does not. Try using "\n", that should speed up the C++ code.
You are not benchmarking the language, you are benchmarking the OS.
The time it takes to display text (by the windowing system) is longer than the time to prepare the characters (by your code) by orders of magnitude.
You will obtain the same behavior with any language.
C++'s advantage over Python doesn't lie in operations constrained by the OS, such as printing to the console, but rather in the following:
It is statically typed, which minimizes the run-time overhead that comes with dynamic typing and run-time type checks
C++ is compiled (and highly optimized), while Python is (mostly) interpreted
Its memory management model (Python uses managed objects that require garbage collection)
C++ can give you more control when implementing performance-critical code (up to using assembly and taking advantage of specific hardware)

Will reading out-of-bounds of a stack-allocated array cause any problems in real world?

Even though it is bad practice, is there any way the following code could cause trouble in real life? Note that I am only reading out of bounds, not writing:
#include <iostream>
int main() {
    int arr[] = {1, 2, 3};
    std::cout << arr[3] << '\n';
}
As mentioned, it is not "safe" to read beyond the end of the stack. But it sounds like you're really trying to ask what could go wrong? and, typically, the answer is "not much". Your program would ideally crash with a segfault, but it might just keep on happily running, unaware that it's entered undefined behavior. The results of such a program would be garbage, of course, but nothing's going to catch on fire (probably...).
People mistakenly write code with undefined behavior all the time, and a lot of effort has been spent trying to help them catch such issues and minimize their harm. Programs run in user space cannot affect other programs on the same machine thanks to isolated address spaces and other features, and software like sanitizers can help detect UB and other issues during development. Typically you can just fix the issue and move on to more important things.
That said, UB is, as the name suggests, undefined. Which means your computer is allowed to do whatever it wants once you ask it to execute UB. It could format your hard drive, fry your processor, or even "make demons fly out of your nose". A reasonable computer wouldn't do those things, but it could.
The most significant issue with a program that enters UB is simply that it's not going to do what you wanted it to do. If you are trying to delete /foo but you read off the end of the stack you might end up passing /bar to your delete function instead. And if you access memory that an attacker also has access to you could wind up executing code on their behalf. A large number of major security vulnerabilities boil down to some line of code that triggers UB in just the wrong way that a malicious user can take advantage of.
Depends on what you mean by stack. If it is the whole stack, then no, you can't do that, it will lead to a segmentation fault. Not because there is the memory of other processes there (that's not how it works), but rather because there is NOTHING there. You can heuristically see this by looking at the various addresses the program uses. The stack for example is at ~0x7f7d4af48040, which is beyond what any computer would have as memory. The memory your program sees is different from the physical memory.
If you mean read beyond the stack frame of the current method: yes, you can technically do that safely. Here is an example
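#include <cstddef>
#include <cstdlib>
#include <iostream>
void stacktrace(){
    std::cerr << "Received SIGSEGV. Stack trace:\n";
    void** bp;
    // Read the current frame pointer (x86-64, GCC/Clang inline assembly).
    asm(R"(
        .intel_syntax noprefix
        mov %[bp], rbp
        .att_syntax
    )"
    : [bp] "=r" (bp));
    std::size_t i = 0;
    while(true){
        // bp[1] is the return address stored just above the saved frame pointer.
        std::cerr << "[" << i++ << "] " << bp[1] << '\n';
        // Stop once the saved frame pointer lies below the current one
        // (for example, a null pointer at the outermost frame).
        if(bp > *bp) break;
        bp = (void**) *bp;
    }
    std::exit(1);
}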
This is a very basic program I wrote to see whether I could manually generate a stack trace. It might not be obvious if you are unfamiliar, but on x64 the address contained in rbp is the base of the current stack frame (provided the compiler keeps frame pointers). In C++, the stack frame would look like:
return pointer
previous value of rbp [the caller's frame base] <- rbp points here
local variables (may be some other stuff like stack cookie)
...
local variables <- rsp points here [rsp = stack pointer]
The address decreases the lower you go. In the example I gave above you can see that I get the value of rbp, which points outside the current stack frame, and move from there. So you can read from memory beyond the stack frame, but you generally shouldn't, and even so, why would you want to?
Note: Evg pointed this out. If you read some object beyond the stack, that will probably trigger a segfault, depending on the object type, so this should only be done if you are very sure of what you're doing.
If you don't own the memory, or you do own it but you haven't initialized it, you are not allowed to read it. This might seem like a pedantic and useless rule. After all, the memory is there and I am not trying to overwrite anything, right? What is a byte among friends, let me read it.
The point is that C++ is a high level language. The compiler only tries to interpret what you have coded and translate it to assembly. If you type in nonsense, you will get out nonsense. It's a bit like forcing someone to translate "askjds" from English to German.
But does this ever cause problems in real life? I roughly know what asm instructions are going to be generated. Why bother?
This video talks about a bug in Facebook's string implementation where they read a byte of uninitialized memory which they did own, but it caused a very difficult-to-find bug nevertheless.
The point is that silicon is not intuitive. Do not try to rely on your intuitions.

Maximum recursive function calls in C/C++ before stack is full and gives a segmentation fault?

I was working on a problem where I used a recursive function to build a segment tree. For larger values it started giving a segmentation fault. At first I thought it might be caused by an out-of-bounds array index, but later I thought it might be because the program stack was growing too big.
I wrote this code to count the maximum number of recursive calls allowed before the system gives a segfault.
#include <iostream>
using namespace std;
void recur(long long int);
int main()
{
    recur(0);
    return 0;
}
void recur(long long int v)
{
    v++;
    cout << v << endl;
    recur(v);
}
After running the above code I got value of v to be 261926 and 261893 and 261816 before getting segmentation fault and all values were close to these.
Now I know that this would depend on machine to machine, and the size of the stack of the function being called but can someone explain the basics of how to keep safe from seg-faults and what is a soft limit that one can keep in mind.
The number of recursion levels you can do depends on the call-stack size combined with the size of local variables and arguments that are placed on such a stack. Aside from "how the code is written", just like many other memory related things, this is very much dependent on the system you're running on, what compiler you are using, optimisation level [1], and so on. Some embedded systems I've worked on, the stack would be a few hundred bytes, my first home computer had 256 bytes of stack, where modern desktops have megabytes of stack (and you can adjust it, but eventually you will run out)
Doing recursion at unlimited depth is not a good idea, and you should look at changing your code so that "it doesn't do that". You need to understand the algorithm and understand to what depth it will recurse, and whether that is acceptable in your system. There is unfortunately nothing anyone can do at the time the stack runs out (at best your program crashes, at worst it doesn't, but instead causes something ELSE to go wrong, such as the stack or heap of some other application getting messed up!)
On a desktop machine, I'd think it's acceptable to have a recursion depth of a few hundred to some thousands, but not much more than this - and that is if you have small usage of stack in each call - if each call is using up kilobytes of stack, you should limit the call level even further, or reduce the need for stack-space.
If you need to have more recursion depth than that, you need to re-arrange the code - for example using a software stack to store the state, and a loop in the code itself.
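A minimal sketch of that rearrangement, assuming a segment-tree-like split of a range: the pending work is kept in a std::vector on the heap and processed in a loop. The function name count_segment_nodes is hypothetical and not taken from the question's code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical example: counts the nodes a segment tree over [lo, hi)
// would have, using an explicit heap-allocated stack instead of
// recursive calls.
std::size_t count_segment_nodes(std::size_t lo, std::size_t hi) {
    if (lo >= hi) return 0;  // empty range: no nodes
    std::size_t nodes = 0;
    // The "software stack": each entry is a range still to be visited.
    std::vector<std::pair<std::size_t, std::size_t>> pending;
    pending.emplace_back(lo, hi);
    while (!pending.empty()) {
        const std::pair<std::size_t, std::size_t> range = pending.back();
        pending.pop_back();
        ++nodes;
        if (range.second - range.first > 1) {
            // Not a leaf: split in the middle, as the recursive version would.
            const std::size_t mid = range.first + (range.second - range.first) / 2;
            pending.emplace_back(range.first, mid);
            pending.emplace_back(mid, range.second);
        }
    }
    return nodes;
}
```

The depth is now limited by available heap memory rather than by the call stack, at the cost of managing the state by hand.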
[1] Using g++ -O2 on your posted code, I got to 50 million and counting, and I expect if I leave it long enough, it will restart at zero because it keeps going forever - this since g++ detects that this recursion can be converted into a loop, and does that. Same program compiled with -O0 or -O1 does indeed stop at a little over 200000. With clang++ -O1 it just keeps going. The clang-compiled code is still running as I finished writing the rest of the code, at 185 million "recursions".
There is (AFAIK) no well established limit. (I am answering from a Linux desktop point of view).
On desktops and laptops the default stack size is a few megabytes (as of 2015). On Linux you could use setrlimit(2) to change it (to a reasonable figure; don't expect to be able to set it to a gigabyte these days), and you could use getrlimit(2) or parse /proc/self/limits (see proc(5)) to query it. On embedded microcontrollers, or inside the Linux kernel, the entire stack may be much more limited (to a few kilobytes in total).
When you create a thread using pthread_create(3) you could use an explicit pthread_attr_t and use pthread_attr_setstack(3) to set the stack space.
BTW, with recent GCC, you might compile all your software (including the standard C library) with split stacks (so pass -fsplit-stack to gcc or g++)
Lastly, your example is a tail call, and GCC can optimize that (into a jump with arguments). I checked that if you compile with g++ -O2 (using GCC 4.9.2 on Linux/x86-64/Debian), the recursion is transformed into a genuine loop and the stack does not grow indefinitely (your program ran for nearly 40 million calls to recur in a minute, then I interrupted it). In better languages like Scheme or OCaml there is a guarantee that tail calls are indeed compiled iteratively (the tail-recursive call then becomes the usual, or even the only, looping construct).
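To illustrate the transformation, here is a hedged sketch of a tail-recursive function next to the loop an optimizer may effectively produce from it. Both function names and the limit parameter are assumptions added so that the example terminates; they are not part of the original program, and C++ does not guarantee that this optimization happens.

```cpp
// Tail-recursive form: the recursive call is the last thing the
// function does. `limit` is a hypothetical stopping condition.
long long count_recursive(long long v, long long limit) {
    if (v >= limit) return v;
    return count_recursive(v + 1, limit);
}

// Roughly what an optimizer may turn the function above into: the call
// becomes a jump back to the top, so no new stack frame is needed.
long long count_loop(long long v, long long limit) {
    while (v < limit) {
        v = v + 1;
    }
    return v;
}
```

Without the optimization, count_recursive uses one stack frame per step, so a large limit can still overflow the stack; count_loop uses constant stack space regardless of the limit.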
CyberSpok is excessive in his comment (hinting to avoid recursions). Recursions are very useful, but you should limit them to a reasonable depth (e.g. a few thousand), and you should take care that call frames on the call stack are small (less than a kilobyte each), so in practice allocate and deallocate most of the data on the C heap. The GCC -fstack-usage option is really useful for reporting the stack usage of every compiled function. See this and that answers.
Notice that continuation passing style is a canonical way to transform recursions into iterations (then you trade stack frames with dynamically allocated closures).
Some clever algorithms replace a recursion with fancy modifying iterations, e.g. the Deutsch-Schorr-Waite graph marking algorithm.
For Linux-based applications, we can use the getrlimit and setrlimit APIs to query and set various kernel resource limits, such as core file size, CPU time, stack size, nice values, maximum number of processes, etc. RLIMIT_STACK is the resource name for the stack defined in the Linux kernel. Below is a simple program to retrieve the stack size:
#include <iostream>
#include <sys/time.h>
#include <sys/resource.h>
#include <errno.h>
using namespace std;
int main()
{
    struct rlimit sl;
    int returnVal = getrlimit(RLIMIT_STACK, &sl);
    if (returnVal == -1)
    {
        cout << "Error. errno: " << errno << endl;
    }
    else if (returnVal == 0)
    {
        cout << "stackLimit soft - max : " << sl.rlim_cur << " - " << sl.rlim_max << endl;
    }
}

Limit recursive calls in C++ (about 5000)?

In order to find the limit of recursive calls in C++, I tried this function:
#include <cstdio>
void recurse ( int count ) // Each call gets its own count
{
    printf("%d\n", count);
    // It is not necessary to increment count since each function's
    // variables are separate (so each count will be initialized one greater)
    recurse ( count + 1 );
}
This program halts when count is equal to 4716, so the limit is just 4716!
I'm a little bit confused: why does the program stop execution when count is equal to 4716?
PS: Executed under Visual Studio 2010.
Thanks
The limit of recursive calls depends on the size of the stack. The C++ language is not limiting this (from memory, there is a lower limit of how many function calls a standards conforming compiler will need to support, and it's a pretty small value).
And yes, recursing "infinitely" will stop at some point or another. I'm not entirely sure what else you expect.
It is worth noting that designing software to do "boundless" recursion (or recursion that runs into the hundreds or thousands) is a very bad idea. There is no (standard) way to find out the limit of the stack, and you can't recover from a stack overflow crash.
You will also find that if you add an array or some other data structure [and use it, so it doesn't get optimized out], the recursion limit goes lower, because each stack-frame uses more space on the stack.
Edit: I actually would expect a higher limit, I suspect you are compiling your code in debug mode. If you compile it in release mode, I expect you get several thousand more, possibly even endless, because the compiler converts your tail-recursion into a loop.
The stack size is dependent on your environment.
In *NIX for instance, you can modify the stack size in the environment, then run your program and the result will be different.
In Windows, you can change it this way (source):
$ editbin /STACK:reserve[,commit] program.exe
You've probably run out of stack space.
Every time you call the recursive function, it needs to push a return address on the stack so it knows where to return to after the function call.
It crashes at 4716 because it just happens to run out of stack space after about 4716 iterations.

What makes EXE's grow in size?

My executable was 364KB in size. It did not use a Vector2D class so I implemented one with overloaded operators.
I changed most of my code from
point.x = point2.x;
point.y = point2.y;
to
point = point2;
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB. What exactly causes it to grow in size?
The compiler probably optimised your operator overload by inlining it. So it effectively compiles to the same code as your original example would. So you may have cut down a lot of lines of code by overloading the assignment operator, but when the compiler inlines, it takes the contents of your assignment operator and sticks it inline at the calling point.
Inlining is one of the ways an executable can grow in size. It's not the only way, as you can see in other answers.
What makes EXE’s grow in size?
External libraries, especially static libraries and debugging information, total size of your code, runtime library. More code, more libraries == larger exe.
To reduce size of exe, you need to process exe with gnu strip utility, get rid of all static libraries, get rid of C/C++ runtime libraries, disable all runtime checks and turn on compiler size optimizations. Working without CRT is a pain, but it is possible. Also there is a wcrt (alternative C runtime) library created for making small applications (by the way, it hasn't been updated/maintained during last 5 years).
The smallest exe that I was able create with msvc compiler is somewhere around 16 kilobytes. This was a windows application that displayed single window and required msvcrt.dll to run. I've modified it a bit, and turned it into practical joke that wipes out picture on monitor.
For impressive exe size reduction techniques, you may want to look at .kkrieger. It is a 3D first person shooter, 96 kilobytes total. The game has a large and detailed level, supports shaders, real-time shadows, etc., i.e. it is comparable with Sauerbraten (see screenshots). The smallest working Windows application (3D demo with music) I ever encountered was 4 kilobytes big, and used compression techniques and (probably) undocumented features (i.e. the fact that a *.com executable could unpack and launch a win32 exe on Windows XP).
In most cases, size of *.exe shouldn't really bother you (I haven't seen a diskette for a few years), as long as it is reasonable (below 100 megabytes). For example of "unreasonable" file size see debug build of Qt 4 for mingw.
This resulted in removing nearly 1/3 of my lines of code and yet my exe is still 364KB.
Most likely it is caused by external libraries used by compiler, runtime checks, etc.
Also, this is an assignment operation. If you aren't using custom types for x (with copy constructor), "copy" operation is very likely to result in small number of operations - i.e. removing 1/3 of lines doesn't guarantee that your code will be 1/3 shorter.
If you want to see how much impact your modification made, you could "ask" the compiler to produce an asm listing for both versions of the program, then compare the results (manually or with diff). Or you could disassemble and compare both versions of the executable. But I'm certain that using GNU strip or removing extra libraries will have more effect than removing assignment operators.
What type is point? If it's two floats, then the compiler will implicitly do a member-by-member copy, which is the same thing you did before.
EDIT: Apparently some people in today's crowd didn't understand this answer and compensated by downvoting. So let me elaborate:
Lines of code have NO relation to the executable size. The source code tells the compiler what assembly line to create. One line of code can cause hundreds if not thousands of assembly instructions. This is particularly true in C++, where one line can cause implicit object construction, destruction, copying, etc.
In this particular case, I suppose that "point" is a class with two floats, so using the assignment operator will perform a member-by-member copy, i.e. it takes every member individually and copies it. Which is exactly the same thing he did before, except that now it's done implicitly. The resulting assembly (and thus executable size) is the same.
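A small sketch of that point, assuming the class really is just two floats with no user-defined copy assignment. The struct and function names below are hypothetical stand-ins, not the asker's actual code.

```cpp
// Hypothetical stand-in for the asker's class: two floats and no
// user-defined copy assignment, so the compiler generates a
// member-by-member copy.
struct Vector2D {
    float x;
    float y;
};

// Copies each member by hand, as in the original code.
void copy_members(Vector2D& dst, const Vector2D& src) {
    dst.x = src.x;
    dst.y = src.y;
}

// Uses the implicitly generated copy assignment operator.
void copy_whole(Vector2D& dst, const Vector2D& src) {
    dst = src;
}
```

Both functions leave dst in the same state, and an optimizing compiler will typically emit very similar machine code for them, which is why the shorter source does not shrink the executable.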
Executables are most often sized in 'pages' rather than discrete bytes.
I think this a good example why one shouldn't worry too much about code being too verbose if you have a good optimizing compiler. Instead always code clearly so that fellow programmers can read your code and leave the optimization to the compiler.
Some links to look into
http://www2.research.att.com/~bs/bs_faq.html#Hello-world
GCC C++ "Hello World" program -> .exe is 500kb big when compiled on Windows. How can I reduce its size?
http://www.catch22.net/tuts/minexe
As for Windows, lots of compiler options in VC++ may be activated like RTTI, exception handling, buffer checking, etc. that may add more behind the scenes to the overall size.
When you compile a C or C++ program into an executable, the compiler translates your code into machine code, applying optimizations as it sees fit.
But simply, more code = more machine code to generate = more size to the executable.
Also, check whether you have a lot of static/global objects. These substantially increase your exe size if they are not zero-initialized.
For example:
int temp[100] = {0};
int main()
{
}
The size of the above program is 9140 bytes on my Linux machine.
If I initialize the temp array to 5, the size shoots up by around 400 bytes. The size of the program below on my Linux machine is 9588 bytes.
int temp[100] = {5};
int main()
{
}
This is because zero-initialized global objects go into the .bss segment, which is zero-filled at once during program startup, whereas the contents of non-zero-initialized objects are embedded in the exe itself.