Why stack overflow on some machines, but segmentation fault on another? - c++

Just out of curiosity, I'm trying to generate a stack overflow. This code generates a Stack Overflow according to the OP, but when I run it on my machine, it generates a segmentation fault:
#include <iostream>
using namespace std;
int num = 11;
unsigned long long int number = 22;
int Divisor()
{
int result;
result = number%num;
if (result == 0 && num < 21)
{
num+1;
Divisor();
if (num == 20 && result == 0)
{
return number;
}
}
else if (result != 0)
{
number++;
Divisor();
}
}
int main ()
{
Divisor();
cout << endl << endl;
system ("PAUSE");
return 0;
}
Also, according to this post, some examples there should also do the same. Why is it I get segmentation faults instead?

Why is it I get segmentation faults instead?
The segmentation fault you're seeing is a side effect of the stack overflow: the stack overflow is the cause, and the segmentation fault is the result.
From the Wikipedia article on "stack overflow" (emphasis mine):
.... When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash.

A stack overflow can lead to the following errors:
SIGSEGV (segmentation violation) signal for the process.
SIGILL (illegal instruction) signal.
SIGBUS (bus error) signal, raised for an access to an invalid address.
For more, read Program Error Signals. Since the behavior is undefined, any of the above can occur, depending on the system and architecture.
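As a small illustration of the signals listed above, here is a sketch that maps signal numbers to their conventional names. The function name signal_name is hypothetical. SIGSEGV and SIGILL come from <csignal>; SIGBUS is a POSIX addition, so the sketch only refers to it when the macro exists.

```cpp
#include <csignal>
#include <string>

// Map a few signal numbers to their conventional names.
// SIGSEGV and SIGILL are declared by <csignal>; SIGBUS is POSIX-only,
// so it is guarded by a preprocessor check.
std::string signal_name(int sig) {
    if (sig == SIGSEGV) return "SIGSEGV";
    if (sig == SIGILL)  return "SIGILL";
#ifdef SIGBUS
    if (sig == SIGBUS)  return "SIGBUS";
#endif
    return "other";
}
```

Which of these a stack overflow actually produces still depends on the platform, as noted above.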

You are essentially asking: what is the behavior of undefined behavior?
The answer is: undefined behavior is behavior which is not defined. Anything might happen.
Researching why you get a certain undefined behavior on a certain system is most often a pointless exercise.
Undefined, unspecified and implementation-defined behavior
In the case of stack overflow, the program might overwrite other variables in RAM, or corrupt the running function's own return address, or attempt to modify memory outside its given address range etc etc. Depending on system, you might get hardware exceptions and various error signals such as SIGSEGV (on POSIX systems), or sudden program crashes, or "program seems to be working fine", or something else.

The other answers posted are all correct.
However, if the intent of your question is to understand why you do not see a printed error stating that a stack overflow has occurred, the answer is that some run-time libraries explicitly detect and report stack overflows, while others do not, and simply crash with a segfault.
In particular, it looks like at least some versions of Windows detect stack overflows and turn them into exceptions, since the documentation suggests you can handle them.

A stack overflow is a cause, a segmentation fault is the result.
On Linux and other Unix-like systems, a segmentation fault may be the result of a stack overflow, among other things. You don't get any specific information that the program encountered a stack overflow.
In the first post you're linking, the person is running the code on Windows, which may behave differently and, for example, detect a stack overflow specifically.

I guess you're using a compiler that doesn't have stack checking enabled.
Stack checking is a rather simple mechanism: it kills the program, stating that a stack overflow happened, as soon as the stack pointer moves past the stack bound. It is often disabled for optimisation purposes, because a program will almost certainly crash on a stack overflow anyway.
Why a segfault? Well, without stack checking enabled, your program doesn't stop after using up the stack. It continues right into unrelated (and quite often protected) memory, which it tries to modify to use as another stack frame for a new function invocation. Madness ensues, and a segfault happens.
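For comparison, here is a sketch of the same kind of search written with loops instead of recursion, so stack use stays constant however long the search runs. It assumes the intent of Divisor() is to find the smallest number divisible by every integer in a range; the name smallest_multiple and its parameters are hypothetical, and stepping by hi is a shortcut that the original code does not use.

```cpp
#include <cassert>

// Smallest positive integer divisible by every value in [lo, hi].
// Brute-force search written as loops, so no stack frames accumulate.
unsigned long long smallest_multiple(unsigned lo, unsigned hi) {
    assert(lo >= 1 && lo <= hi);
    // The answer must be a multiple of hi, so only multiples of hi are tried.
    for (unsigned long long candidate = hi; ; candidate += hi) {
        bool divisible_by_all = true;
        for (unsigned d = lo; d <= hi; ++d) {
            if (candidate % d != 0) {
                divisible_by_all = false;
                break;
            }
        }
        if (divisible_by_all) return candidate;
    }
}
```

Calling smallest_multiple(11, 20) covers the range used in the question without any deep call chain.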

Related

cout prints char[] containing more characters than set length? [duplicate]

#include<iostream>
using namespace std;
int main(void)
{
char name[5];
cout << "Name: ";
cin.getline(name, 20);
cout << name;
}
Output:
Name: HelloWorld
HelloWorld
Shouldn't this give an error or something?
Also when I write an even longer string,
Name: HelloWorld Goodbye
HelloWorld Goodbye
cmd exits with an error.
How is this possible?
Compiler: G++ (GCC 7), Nuwen
OS: Windows 10
It's called a buffer overflow and is a common source of bugs and exploits. It's the developer's responsibility to ensure it doesn't happen. Character strings are printed until the first '\0' character is reached.
The code produces "undefined behavior". This means anything might happen. In your case, the program happens to appear to work. It might, however, do something completely different with different compiler flags or on a different system.
Shouldn't this give an error or something.
No. The compiler cannot know that you will input a long string, thus there cannot be any compiler error. You also don't throw any runtime exception here. It is up to you to make sure the program can handle long strings.
Your code has encountered UB, also known as undefined behaviour, which Wikipedia defines as the result of executing computer code whose behavior is not prescribed by the language specification to which the code adheres. It usually occurs when you do not define variables properly, in this case a char array that is too small.
Even the -Wall flag will not give any warning. You can use tools like valgrind and gdb to detect memory leaks and buffer overflows.
You can check those questions:
Array index out of bound in C
No out of bounds error
They have competent answers.
My short answer, based on those already given in the questions I posted:
Your code invokes undefined behavior (a buffer overflow), so it doesn't necessarily give an error when you run it once. Some other time it may. It's a matter of chance.
When you enter a longer string, you actually corrupt the memory (stack) of the program (i.e. you overwrite memory that should contain some program-related data with your data), and so the exit code of your program ends up being different from 0, which is interpreted as an error. The longer the string, the higher the chance of breaking something (sometimes even short strings do).
You can read more here: https://en.wikipedia.org/wiki/Buffer_overflow

Segmentation fault - why and how does it work?

Both functions defined below try to allocate 10M of memory on the stack. But the segmentation fault happens only in the second case and not in the first, and I am trying to understand why.
Function definition 1:
a(int *i)
{
char iptr[50000000];
*i = 1;
}
Function definition 2:
a()
{
char c;
char iptr[5000000];
printf("&c = 0x%lx, iptr = 0x%x ... ", &c, iptr);
fflush(stdout);
c = iptr[0];
printf("ok\n");
}
According to my understanding, local variables that are not allotted memory dynamically are stored in the stack section of the program. So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
Hence if above stated is true, then segmentation fault should occur in both the cases (i.e. also in case 1).
The website (http://web.eecs.utk.edu/courses/spring2012/cs360/360/notes/Memory/lecture.html) from where I picked this states that the segfault happens in function 2 in a when the code attempts to push iptr on the stack for the printf call. This is because the stack pointer is pointing to the void. Had we not referenced anything at the stack pointer, our program should have worked.
I need help understanding this last statement and my earlier doubt related to this.
So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
No, that cannot be done. When compiling a function, the compiler does not know what the call stack will be when the function is called, so it will assume that you know what you are doing (which might or might not be the case). Also note that the amount of stack space may be affected by both compile-time and runtime restrictions (in Linux you can set the stack size with ulimit in the shell that starts the process).
I need help understanding this last statement and my earlier doubt related to this.
I would not attempt to look too much into that statement, it is not standard but rather based on knowledge of a particular implementation that is not even described there, and thus is built on some assumptions that are not necessarily true.
It assumes that the act of allocating the array does not 'touch' the allocated memory (in some debug builds in some implementations that is false), and thus that whether you attempt to allocate 1 byte or 100M, the allocation is fine as long as your program does not touch the data -- this need not be the case.
It also assumes that the arguments of the function printf are passed on the stack (this is actually the case in all implementations I know, due to the variadic nature of the function). With the previous assumption, the array would overflow the stack (assuming a stack of <10M) but would not crash, as the memory is not accessed. To call printf, however, the value of the argument would be pushed onto the stack beyond the array. That write lands beyond the space allocated for the stack, and the program crashes.
Again, all of this is implementation-specific, not defined by the language.
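Since the stack size is a runtime limit rather than something the compiler checks, a program can query it. The sketch below assumes a POSIX system (getrlimit and RLIMIT_STACK are not part of standard C++), and the function name query_stack_soft_limit is hypothetical.

```cpp
#include <sys/resource.h>  // POSIX only: getrlimit, RLIMIT_STACK

// Store the soft stack-size limit (in bytes) in `bytes` and return true.
// Returns false if the query fails or the limit is unlimited.
bool query_stack_soft_limit(unsigned long long& bytes) {
    struct rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) != 0) return false;
    if (limit.rlim_cur == RLIM_INFINITY) return false;
    bytes = static_cast<unsigned long long>(limit.rlim_cur);
    return true;
}
```

This is the same limit that ulimit adjusts from the shell, so the value may differ between two runs of the same binary.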
Error in your code is being thrown by the following code:
; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page. "**This line throws the error**"
jmp short cs10
_chkstk endp
end
This is from the chkstk.asm file, which provides stack checking on procedure entry. The file explicitly defines:
_PAGESIZE_ equ 1000h
As for an explanation of your problem, this question tells you everything you need, as mentioned by Shafik Yaghmour.
Your printf format string assumes that pointers, ints (%x), and longs (%lx) are all the same size; this may be false on your platform, leading to undefined behavior. Use %p instead. I intended to make this a comment, but can't yet.
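A sketch of the %p suggestion, using snprintf so the result can be inspected; the helper name format_address is hypothetical. The exact text produced by %p is implementation-defined, so no particular output is assumed.

```cpp
#include <cstdio>
#include <string>

// Format a pointer portably: %p is the conversion meant for pointers,
// and the argument is passed as a pointer to void.
std::string format_address(const void* p) {
    char buffer[64];
    int n = std::snprintf(buffer, sizeof buffer, "%p", p);
    return (n > 0) ? std::string(buffer) : std::string();
}
```

In the question's code this would correspond to printf("&c = %p, iptr = %p ... ", (void *)&c, (void *)iptr).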
I am surprised no one noticed that the first function allocates 10 times as much space as the second. There are seven zeros after the 5 in the first function, whereas the second has six zeros after the 5 :-)
I compiled it with gcc-4.6.3 and got a segmentation fault in the first function but not in the second. After I removed the additional zero in the first function, the seg fault went away. Adding a zero in the second function introduced the seg fault. So at least in my case, the reason for this seg fault is that the program could not allocate the required space on the stack. I would be happy to hear about observations that differ from the above.
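When a buffer of this size is really needed, one common alternative is to put it on the heap. The sketch below uses std::vector; the function name first_byte_of_large_buffer is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Allocate a large buffer on the heap instead of the stack.
// std::vector throws std::bad_alloc if the allocation fails,
// rather than running past the end of the stack.
char first_byte_of_large_buffer(std::size_t size) {
    std::vector<char> iptr(size);  // elements are value-initialized to zero
    return iptr.empty() ? '\0' : iptr[0];
}
```

The stack frame then holds only the vector object itself, whatever the requested size.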

Buffer array overflow in for loop in c

When would a program crash in a buffer overrun case?
#include<stdio.h>
#include<stdlib.h>
main() {
char buff[50];
int i=0;
for( i=0; i <100; i++ )
{
buff[i] = i;
printf("buff[%d]=%d\n",i,buff[i]);
}
}
What will happen to the first 50 bytes assigned, and when will the program crash?
On my Ubuntu machine, compiled with gcc and run as a.out, it crashes when i is 99:
>>
buff[99]=99
*** stack smashing detected ***: ./a.out terminated
Aborted (core dumped)
<<
I would like to know why this does not crash when the assignment happens at buff[51] in the for loop.
It is undefined behavior. You can never predict when (or if at all) it crashes, but you cannot rely upon it 'not crashing' and code an application.
Reasoning
The rationale is that there is no compile-time or run-time index bounds checking on C arrays. Such checks are available through checked accessors such as std::vector::at() and in arrays in some higher-level languages. So when your program accesses memory beyond the allocated range, what happens depends on whether it merely corrupts another variable on your program's stack or touches memory the process is not allowed to access, so a crash cannot be predicted. The program only crashes when the OS is forced to intervene or when it can no longer function correctly.
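To illustrate the contrast with unchecked C arrays, here is a sketch of the question's loop using std::array and its checked accessor at(). The function name fill_checked is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Write `count` values into a 50-element buffer using at(), which checks
// the index and throws std::out_of_range instead of corrupting memory.
// Returns how many writes succeeded.
std::size_t fill_checked(std::size_t count) {
    std::array<char, 50> buff{};
    std::size_t written = 0;
    try {
        for (std::size_t i = 0; i < count; ++i) {
            buff.at(i) = static_cast<char>(i);
            ++written;
        }
    } catch (const std::out_of_range&) {
        // Index 50 is the first one past the end; stop here.
    }
    return written;
}
```

Here the failure is reported at the first out-of-range index, rather than at some later and unpredictable point.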
Example
Say you were inside a function call, and immediately next to your array was the return address, i.e. the address your program uses to return to the function it was called from. Suppose you corrupted that, and now your program tries to return to the corrupted value, which is not a valid address. It would crash in such a situation.
The worst case is when you silently modify another variable's value and, because no crash occurs, never discover what went wrong.
Since it seems you have allocated on the stack the buffer, the app possibly will crash on the first occasion you overwrite an instruction which is to be executed, possibly somewhere in the code of the for loop... at least that's how it's supposed to be in theory.
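The "*** stack smashing detected ***" message in the question is most likely produced by the compiler's stack protector (for example GCC's -fstack-protector family of options), which places a guard value near local buffers and checks it when the function returns. The sketch below only mimics that idea with a hypothetical GuardedBuffer struct; it is not how the compiler implements the check.

```cpp
#include <cstdint>

// Hypothetical guard value ("canary"); real stack protectors choose it at run time.
constexpr std::uint32_t kCanary = 0xDEADC0DEu;

// Illustration of a guard value stored next to a buffer.
// Real stack protectors are inserted by the compiler; this struct only mimics the idea.
struct GuardedBuffer {
    char data[50] = {};
    std::uint32_t canary = kCanary;

    // The check runs only when called, e.g. at the end of a function,
    // not at the moment the buffer is overrun.
    bool intact() const { return canary == kCanary; }
};
```

A check of this kind reports the damage only when it runs, which is consistent with the question's output: every iteration up to buff[99] is printed before the program aborts.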

Why am I not getting a segmentation fault with this code? (Bus error)

I had a bug in my code that went like this.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Ideally this should give a segfault. However, I saw this give a bus error.
Wikipedia says something to the order of 'Bus error is when the program tries to access an unaligned memory location or when you try to access a physical (not virtual) memory location that does not exist or is not allowed. '
The second part of the above statement sounds similar to a seg fault. So my question is, when do you get a SIGBUS and when a SIGSEGV?
EDIT:-
Quite a few people have mentioned the context. I'm not sure what context would be needed, but this was a buffer overflow lying inside a static class function that gets called from a number of other class functions. If there's something more specific that I can give which will help, do ask.
Anyways, someone had commented that I should simply write better code. I guess the point of asking this question was "can an application developer infer anything from a SIGBUS versus a SIGSEGV?" (picked from that blog post below)
As you probably realize, the base cause is undefined behavior in your
program. In this case, it leads to an error detected by the hardware,
which is caught by the OS and mapped to a signal. The exact mapping
isn't really specified (and I've seen integral division by zero result
in a SIGFPE), but generally: SIGSEGV occurs when you access out of
bounds, SIGBUS for other accessing errors, and SIGILL for an illegal
instruction. In this case, the most likely explanation is that your
bounds error has overwritten the return address on the stack. If the
return address isn't correctly aligned, you'll probably get a SIGBUS,
and if it is, you'll start executing whatever is there, which could
result in a SIGILL. (But the possibility of executing random bytes as
code is what the standards committee had in mind when they defined
“undefined behavior”. Especially on machines with no memory
protection, where you could end up jumping directly into the OS.)
A segmentation fault is never guaranteed when you're doing fishy stuff with memory. It all depends on a lot of factors (how the compiler lays out the program in memory, optimizations etc).
What may be illegal for a C++ program may not be illegal for a program in general. For instance the OS doesn't care if you step outside an array. It doesn't even know what an array is. However it does care if you touch memory that doesn't belong to you.
A segmentation fault occurs if you try to access a virtual address that is not mapped to your process. On most operating systems, memory is mapped in pages of a few kilobytes; this means that you often won't get a fault if you write off the end of an array, since there is other valid data following it in the memory page.
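The page argument can be made concrete with a little arithmetic. The sketch below computes how many bytes lie between an address and the end of its page; the function name bytes_to_page_end is hypothetical, and the page size is passed in because it varies by system (4096 bytes is a common value, used here only as an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes from `address` to the end of the page containing it.
// Writes that stay within this distance land on the same page, so the
// hardware has no page-level reason to fault.
std::size_t bytes_to_page_end(std::uintptr_t address, std::size_t page_size) {
    assert(page_size > 0);
    return page_size - static_cast<std::size_t>(address % page_size);
}
```

For an address 16 bytes into a 4096-byte page, 4080 bytes remain on that page, so a small overrun stays on mapped memory and goes unnoticed by the OS.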
A bus error indicates a more low-level error; a wrongly-aligned access or a missing physical address are two reasons, as you say. However, the first is not happening here, since you're dealing with bytes, which have no alignment restriction; and I think the second can only happen on data accesses when memory is completely exhausted, which probably isn't happening.
However, I think you might also get a bus error if you try to execute code from an invalid virtual address. This could well be what is happening here - by writing off the end of a local array, you will overwrite important parts of the stack frame, such as the function's return address. This will cause the function to return to an invalid address, which (I think) will give a bus error. That's my best guess at what particular flavour of undefined behaviour you are experiencing here.
In general, you can't rely on segmentation faults to catch buffer overruns; the best tool I know of is valgrind, although that will still fail to catch some kinds of overrun. The best way to avoid overruns when working with strings is to use std::string, rather than pretending that you're writing C.
In this particular case, you don't know what kind of garbage you have in the format string. That garbage could potentially result in treating the remaining arguments as those of an "aligned" data type (e.g. int or double). Treating an unaligned area as an aligned argument definitely causes SIGBUS on some systems.
Given that your string is made up of two other strings, each with a maximum of 20 characters, yet you are putting it into a field of 25 characters, that is where your first issue lies. You have a good chance of overstepping your bounds.
The variable desc should be at least 41 characters long (20 + 20 + 1 [for the space you insert]).
Use valgrind or gdb to figure out why you are getting a seg fault.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Just by looking at this code, I can assume that name and address can each be 20 chars long. If so, does that not imply that desc should be at least 20+20+1 chars long? (1 char for the space between name and address, as specified in the sprintf.)
That could be one reason for the segfault. There could be other reasons as well. For example, what if name is longer than 20 chars?
So better you use std::string:
std::string name;
std::string address;
std::string desc = name + " " + address;
char const *char_desc = desc.c_str(); // if at all you need this

Debug stack corruption

Now I am debugging a large project, which has a stack corruption: the application fails.
I would like to know how to find (debug) such stack corruption code with Visual Studio 2010?
Here's an example of some code which causes stack problems, how would I find less obvious cases of this type of corruption?
void foo()
{
int i = 10;
int *p = &i;
p[-2] = 100;
}
Update
Please note that this is just an example. I need to find such bad code in the current project.
There's one technique that can be very effective with these kinds of bugs, but it'll only work on a subset of them that has a few characteristics:
the corrupting value must be stable (ie., as in your example, when the corruption occurs, it's always 100), or at least something that can be readily identified in a simple expression
the corruption has to occur at a particular address on the stack
the corrupting value is unusual enough that you won't be hit with a slew of false positives
Note that the second condition may seem unlikely at first glance because the stack can be used in so many different ways depending on the runtime actions. However, stack usage is generally pretty deterministic. The problem is that a particular stack location can be used for so many different things that the problem is really item #3.
Anyway, if your bug has these characteristics, you should identify the stack address (or one of them) that gets corrupted, then set a memory breakpoint for a write to that address with a condition that causes it to break only if the value written is the corrupting value. In Visual Studio, you can do this by creating a "New Data Breakpoint..." in the Breakpoints window, then right-clicking the breakpoint to set the condition.
If you end up getting too many false positives, it might help to narrow the scope of the breakpoint by leaving it disabled until some point in the execution path that's closer to the bug (if you can identify such a time), or set the hit count high enough to remove most of the false positives.
An additional complication is the address of the stack may change from run to run - in this case, you'll have to take care to set the breakpoint on each run (the lower bits of the address should be the same).
I believe your question quotes an example of stack corruption, and what you are asking is not why it crashes.
If so: it crashes because it invokes undefined behavior, since the index -2 points to an unknown memory location.
To answer the question on profiling your application:
You can use Rational Purify Plus for Visual Studio to check for memory overwrites and access errors.
This is UB: p[-2] = 100;
You can access p with the operator[] in this(p[i]) way, but in this case i is an invalid value. So p[-2] points to an invalid memory location and causes Undefined Behaviour.
To find it you should debug your app and find where it crashes, and hopefully, it'll be at a place where something is actually wrong.