Segmentation fault - why and how does it work? - c++

Both of the functions defined below try to allocate 10M of memory on the stack. But the segmentation fault happens only in the second case and not in the first, and I am trying to understand why.
Function definition 1:
a(int *i)
{
char iptr[50000000];
*i = 1;
}
Function definition 2:
a()
{
char c;
char iptr[5000000];
printf("&c = 0x%lx, iptr = 0x%x ... ", &c, iptr);
fflush(stdout);
c = iptr[0];
printf("ok\n");
}
According to my understanding, local variables that are not allotted memory dynamically are stored in the stack section of the program. So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
Hence if above stated is true, then segmentation fault should occur in both the cases (i.e. also in case 1).
The website (http://web.eecs.utk.edu/courses/spring2012/cs360/360/notes/Memory/lecture.html) from where I picked this states that the segfault happens in function 2 in a when the code attempts to push iptr on the stack for the printf call. This is because the stack pointer is pointing to the void. Had we not referenced anything at the stack pointer, our program should have worked.
I need help understanding this last statement and my earlier doubt related to this.

So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
No, that cannot be done. When compiling a function, the compiler does not know what the call stack will be when the function is called, so it will assume that you know what you are doing (which might or might not be the case). Also note that the amount of stack space may be affected by both compile-time and runtime restrictions (in Linux you can set the stack size with ulimit in the shell that starts the process).
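As a rough illustration of the runtime side of this, the sketch below reads the stack limit that the process actually received, assuming a POSIX system that provides getrlimit() and RLIMIT_STACK. The helper names query_stack_limit and print_stack_limit are made up for this example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft stack limit of the current process.
 * Returns 0 on success and stores the limit in *out, -1 on failure. */
int query_stack_limit(rlim_t *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;   /* may be RLIM_INFINITY if the stack is unlimited */
    return 0;
}

/* Hypothetical helper: print the soft limit in bytes. */
void print_stack_limit(void)
{
    rlim_t lim;

    if (query_stack_limit(&lim) != 0)
        perror("getrlimit");
    else if (lim == RLIM_INFINITY)
        puts("stack limit: unlimited");
    else
        printf("stack limit: %llu bytes\n", (unsigned long long)lim);
}
```

Because the value is only known when the process starts, a compiler cannot check a large local array against it at build time.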
I need help understanding this last statement and my earlier doubt related to this.
I would not attempt to look too much into that statement, it is not standard but rather based on knowledge of a particular implementation that is not even described there, and thus is built on some assumptions that are not necessarily true.
It assumes that the act of allocating the array does not 'touch' the allocated memory (in some debug builds in some implementations that is false) and thus whether you attempt to allocate 1 byte or 100M if the data is not touched by your program the allocation is fine --this need not be the case.
It also assumes that the arguments of the function printf are passed on the stack (this is actually the case in all implementations I know, due to the variadic nature of the function). With the previous assumption, the array would overflow the stack (assuming a stack of <10M), but would not crash as the memory is not accessed. To be able to call printf, however, the value of the argument would be pushed onto the stack beyond the array. This writes to memory, and that write is beyond the space allocated for the stack, so it crashes.
Again, all this is implementation, not defined by the language.

The error in your code is being thrown by the following code:
; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page. "**This line throws the error**"
jmp short cs10
_chkstk endp
end
This comes from the chkstk.asm file, which provides stack checking on procedure entry. The file explicitly defines:
_PAGESIZE_ equ 1000h
As for an explanation of your problem, the question linked by Shafik Yaghmour tells you everything you need.

Your printf format string assumes that pointers, ints (%x), and longs (%lx) are all the same size; this may be false on your platform, leading to undefined behavior. Use %p instead. I intended to make this a comment, but can't yet.
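A minimal sketch of the %p suggestion, written against a small stand-in buffer rather than the question's huge array. The helper names format_address and show_addresses are invented for this example, and the text that %p produces is implementation-defined, so no particular output is claimed.

```c
#include <stdio.h>

/* Hypothetical helper: format an address with %p.
 * %p expects a void *, so the pointer is converted explicitly. */
int format_address(char *out, size_t out_size, const void *addr)
{
    return snprintf(out, out_size, "%p", (void *)addr);
}

/* Prints the two addresses from the question without assuming that
 * pointers, ints and longs have the same size. */
void show_addresses(void)
{
    char c = 'x';
    char iptr[16] = {0};   /* small stand-in for the question's array */
    char text[64];

    if (format_address(text, sizeof text, &c) > 0)
        printf("&c   = %s\n", text);
    if (format_address(text, sizeof text, iptr) > 0)
        printf("iptr = %s\n", text);
}
```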

I am surprised no one noticed that the first function allocates 10 times as much space as the second function. There are seven zeros after the 5 in the first function, whereas the second function has six zeros after the 5 :-)
I compiled it with gcc-4.6.3 and got segmentation fault on the first function but not on the second function. After I removed the additional zero in the first function, seg fault went away. Adding a zero in the second function introduced the seg fault. So at least in my case, the reason of this seg fault is that the program could not allocate the required space on the stack. I would be happy to hear about the observations that differ from the above.

Related

Will reading out-of-bounds of a stack-allocated array cause any problems in real world?

Even though it is bad practice, is there any way the following code could cause trouble in real life? Note that I am only reading out of bounds, not writing:
#include <iostream>
int main() {
int arr[] = {1, 2, 3};
std::cout << arr[3] << '\n';
}
As mentioned, it is not "safe" to read beyond the end of the stack. But it sounds like you're really trying to ask what could go wrong? and, typically, the answer is "not much". Your program would ideally crash with a segfault, but it might just keep on happily running, unaware that it's entered undefined behavior. The results of such a program would be garbage, of course, but nothing's going to catch on fire (probably...).
People mistakenly write code with undefined behavior all the time, and a lot of effort has been spent trying to help them catch such issues and minimize their harm. Programs run in user space cannot affect other programs on the same machine thanks to isolated address spaces and other features, and software like sanitizers can help detect UB and other issues during development. Typically you can just fix the issue and move on to more important things.
That said, UB is, as the name suggests, undefined. Which means your computer is allowed to do whatever it wants once you ask it to execute UB. It could format your hard drive, fry your processor, or even "make demons fly out of your nose". A reasonable computer wouldn't do those things, but it could.
The most significant issue with a program that enters UB is simply that it's not going to do what you wanted it to do. If you are trying to delete /foo but you read off the end of the stack you might end up passing /bar to your delete function instead. And if you access memory that an attacker also has access to you could wind up executing code on their behalf. A large number of major security vulnerabilities boil down to some line of code that triggers UB in just the wrong way that a malicious user can take advantage of.
Depends on what you mean by stack. If it is the whole stack, then no, you can't do that, it will lead to a segmentation fault. Not because there is the memory of other processes there (that's not how it works), but rather because there is NOTHING there. You can heuristically see this by looking at the various addresses the program uses. The stack for example is at ~0x7f7d4af48040, which is beyond what any computer would have as memory. The memory your program sees is different from the physical memory.
If you mean reading beyond the stack frame of the current method: yes, you can technically do that safely. Here is an example:
void stacktrace(){
std::cerr << "Received SIGSEGV. Stack trace:\n";
void** bp;
asm(R"(
.intel_syntax noprefix
mov %[bp], rbp
.att_syntax
)"
: [bp] "=r" (bp));
size_t i = 0;
while(true){
std::cerr << "[" << i++ << "] " << bp[1] << '\n';
if(bp > *bp) break;
bp = (void**) *bp;
}
exit(1);
}
This is a very basic program I wrote to see whether I could manually generate a stack trace. It might not be obvious if you are unfamiliar, but on x64 the address contained in rbp is the base of the current stack frame. In C++, the stack frame would look like:
return pointer
previous value of rbp [rbp = base pointer] <- rbp points here
local variables (may be some other stuff like stack cookie)
...
local variables <- rsp points here
The address decreases the lower you go. In the example I gave above you can see that I get the value of rbp, which points outside the current stack frame, and move from there. So you can read from memory beyond the stack frame, but you generally shouldn't, and even so, why would you want to?
Note: Evg pointed this out. If you read some object beyond the stack, that might (and probably will) trigger a segfault, depending on the object type, so this should only be done if you are very sure of what you're doing.
If you don't own the memory or you do own it but you haven't initialized it, you are not allowed to read it. This might seem like a pedantic and useless rule. After all, the memory is there and I am not trying to overwrite anything, right? What is a byte among friends, let me read it.
The point is that C++ is a high level language. The compiler only tries to interpret what you have coded and translate it to assembly. If you type in nonsense, you will get out nonsense. It's a bit like forcing someone to translate "askjds" from English to German.
But does this ever cause problems in real life? I roughly know what asm instructions are going to be generated. Why bother?
This video talks about a bug in Facebook's string implementation, where they read a byte of uninitialized memory which they did own, but it nevertheless caused a very difficult-to-find bug.
The point is that silicon is not intuitive. Do not try to rely on your intuitions.

I don't understand memory allocation and strcpy [duplicate]

Here's a sample of my code:
char chipid[13];
void initChipID(){
write_to_logs("Called initChipID");
strcpy(chipid, string2char(twelve_char_string));
write_to_logs("Chip ID: " + String(chipid));
}
Here's what I don't understand: even if I define chipid as char[2], I still get the expected result printed to the logs.
Why is that? Shouldn't the allocated memory space for chipid be overflown by the strcpy, and only the first 2 char of the string be printed?
Here's what I don't understand: even if I define chipid as char[2], I
still get the expected result printed to the logs.
Then you are (un)lucky. You are especially lucky if the undefined behavior produced by the overflow does not manifest as corruption of other data, yet also does not crash the program. The behavior is undefined, so you should not interpret whatever manifestation it takes as something you should rely upon, or that is specified by the language.
Why is that?
The language does not specify that it will happen, and it certainly doesn't specify why it does happen in your case.
In practice, the manifestation you observe is as if the strcpy writes the full data into memory at the location starting at the beginning of your array and extending past its end, overwriting anything else your program may have stored in that space, and that the program subsequently reads it back via a corresponding overflowing read.
Shouldn't the allocated memory space for chipid be
overflown by the strcpy,
Yes.
and only the first 2 char of the string be
printed?
No, the language does not specify what happens once the program exercises UB by performing a buffer overflow (or by other means). But also no, C arrays are represented in memory simply as a flat sequence of contiguous elements, with no explicit boundary. This is why C strings need to be terminated. String functions do not see the declared size of the array containing a string's elements, they see only the element sequence.
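Since string functions only see the element sequence, the size has to be passed along explicitly. The sketch below is one way to do that with standard snprintf. The helper name copy_bounded is an assumption of this example and does not come from the question's code.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: copy src into dst without writing more than
 * dst_size bytes. The result is always NUL-terminated when dst_size > 0.
 * Returns 1 if the whole string fit, 0 if it had to be truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full output would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);

    return needed >= 0 && (size_t)needed < dst_size;
}
```

With a char[2] destination and a twelve-character source, this stores one character plus the terminator and reports truncation, instead of overflowing the array.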
You have it part correct: "the allocated memory space for chipid be overflowed by the strcpy" -- this is true. And this is why you get the full result (well, the result of an overflow is undefined, and could be a crash or other result).
C/C++ gives you a lot of power when it comes to memory. And with great power comes great responsibility. What you are doing gives undefined behaviour, meaning it may work. But it will definitely give problems later on when strcpy is writing to memory it is not supposed to write to.
You will find that you can get away with A LOT of things in C/C++. However, things like these will give you headaches later on when your program unexpectedly crashes, and this could be in an entirely different part of your program, which makes it difficult to debug.
Anyway, if you are using C++, you should use std::string, which makes things like this a lot easier.

Buffer array overflow in for loop in c

When would a program crash in a buffer overrun case
#include<stdio.h>
#include<stdlib.h>
main() {
char buff[50];
int i=0;
for( i=0; i <100; i++ )
{
buff[i] = i;
printf("buff[%d]=%d\n",i,buff[i]);
}
}
What will happen to first 50 bytes assigned, when would the program crash?
On my Ubuntu machine, compiling with gcc and running a.out, I see it crashing when i is 99:
>>
buff[99]=99
*** stack smashing detected ***: ./a.out terminated
Aborted (core dumped)
<<
I would like to know why this is not crashing when assignment happening at buff[51] in the for loop?
It is undefined behavior. You can never predict when (or if at all) it crashes, but you cannot rely upon it 'not crashing' and code an application.
Reasoning
The rationale is that there is no compile-time or run-time index-out-of-bounds checking for C arrays. Such checking is available in, for example, std::vector::at() or in arrays of other higher-level languages. So whenever your program accesses memory beyond the allocated range, the outcome depends on whether it simply corrupts another variable on your program's stack, touches memory that is not mapped into your process, or does something else. One can therefore never predict a crash, which only occurs in extreme cases: the program crashes only in a state that forces the OS to intervene, or when it is no longer possible for your program to function correctly.
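Because the language does no checking for you, any bounds check has to be written by hand. The following is a minimal sketch of what that could look like for the loop in the question. The helpers store_checked and fill_checked are hypothetical names used only here.

```c
#include <stddef.h>

/* Hypothetical helper: store value at buf[index] only if index is in range.
 * Returns 1 on success, 0 if the index is out of bounds. */
int store_checked(char *buf, size_t len, size_t index, char value)
{
    if (index >= len)
        return 0;
    buf[index] = value;
    return 1;
}

/* Hypothetical helper: write 0, 1, 2, ... into buf, stopping at the first
 * index that is rejected. Returns the number of elements written. */
size_t fill_checked(char *buf, size_t len, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (!store_checked(buf, len, i, (char)i))
            break;
    }
    return i;
}
```

Called with a 50-element buffer and a count of 100, fill_checked stops after 50 writes instead of running past the end of the array.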
Example
Say you were inside a function call, and immediately next to your array was the RETURN address, i.e. the address your program uses to return to the function it was called from. Suppose you corrupted that and now your program tries to return to the corrupted value, which is not a valid address. Hence it would crash in such a situation.
The worst happens when you silently modified another field's value and didn't even discover what was wrong assuming no crash occurred.
Since it seems you have allocated the buffer on the stack, the app will possibly crash on the first occasion you overwrite an instruction which is to be executed, possibly somewhere in the code of the for loop... at least that's how it's supposed to be in theory.

off-by-one error with string functions (C/C++) and security potentials

So this code has the off-by-one error:
void foo (const char * str) {
char buffer[64];
strncpy(buffer, str, sizeof(buffer));
buffer[sizeof(buffer)] = '\0';
printf("whoa: %s", buffer);
}
What can malicious attackers do if she figured out how the function foo() works?
Basically, to what kind of security potential problems is this code vulnerable?
I personally thought that the attacker can't really do anything in this case, but I heard that they can do a lot of things even if they are limited to work with 1 byte.
The only off-by-one error I see here is this line:
buffer[sizeof(buffer)] = '\0';
Is that what you're talking about? I'm not an expert on these things, so maybe I'm overlooking something, but since the only thing that will ever get written to that wrong byte is a zero, I think the possibilities are quite limited. The attacker can't control what's being written there. Most likely it would just cause a crash, but it could also cause tons of other odd behavior, all of it specific to your application. I don't see any code injection vulnerability here unless this error causes your app to expose another such vulnerability that would be used as the vector for the actual attack.
Again, take with a grain of salt...
Read Shell Coder's Handbook 2nd Edition for lots of information.
Disclaimer: This is inferred knowledge from some research I just did, and should not be taken as gospel.
It's going to overwrite part or all of your saved frame pointer with a null byte - that's the reference point that your calling function will use to offset its memory accesses. So at that point the calling function's memory operations are going to a different location. I don't know what that location will be, but you don't want to be accessing the wrong memory. I won't say you can do anything, but you might be able to do something.
How do I know this (really, how did I infer this)? Smashing the stack for Fun and Profit by Aleph One. It's quite old, and I don't know if Windows or Compilers have changed the way the stack behaves to avoid these problems. But it's a starting point.
example1.c:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
To understand what the program does to call function() we compile it with
gcc using the -S switch to generate assembly code output:
$ gcc -S -o example1.s example1.c
By looking at the assembly language output we see that the call to
function() is translated to:
pushl $3
pushl $2
pushl $1
call function
This pushes the 3 arguments to function backwards into the stack, and
calls function(). The instruction 'call' will push the instruction pointer
(IP) onto the stack. We'll call the saved IP the return address (RET). The
first thing done in function is the procedure prolog:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
This pushes EBP, the frame pointer, onto the stack. It then copies the
current SP onto EBP, making it the new FP pointer. We'll call the saved FP
pointer SFP. It then allocates space for the local variables by subtracting
their size from SP.
We must remember that memory can only be addressed in multiples of the
word size. A word in our case is 4 bytes, or 32 bits. So our 5 byte buffer
is really going to take 8 bytes (2 words) of memory, and our 10 byte buffer
is going to take 12 bytes (3 words) of memory. That is why SP is being
subtracted by 20. With that in mind our stack looks like this when
function() is called (each space represents a byte):
bottom of                                                            top of
memory                                                               memory
           buffer2       buffer1   sfp   ret   a     b     c
<------   [            ][        ][    ][    ][    ][    ][    ]

top of                                                            bottom of
stack                                                                 stack
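The size arithmetic in the quoted passage (5 bytes become 8, 10 bytes become 12, 20 in total) can be reproduced with a small rounding helper. This is only a sketch of the calculation as the quoted text describes it for a 4-byte word. The function names are invented here, and a modern compiler may lay out and pad locals differently.

```c
#include <stddef.h>

/* Hypothetical helper: round size up to the next multiple of word. */
size_t round_up_to_word(size_t size, size_t word)
{
    return (size + word - 1) / word * word;
}

/* Space for buffer1[5] and buffer2[10] with a 4-byte word,
 * following the reasoning in the quoted text. */
size_t example1_locals_size(void)
{
    return round_up_to_word(5, 4) + round_up_to_word(10, 4);
}
```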
What can malicious attackers do if she
figured out how the function foo()
works? Basically, to what kind of
security potential problems is this
code vulnerable?
This is probably not the best example of a bug that could be easily exploited for security purposes, although it could be exploited to potentially crash the code simply by using a string of 64 characters or longer.
While it certainly is a bug that will corrupt the address immediately after the array (on the stack) with a single zero byte, there is no easy way for a hacker to inject data into the corrupted area. Calling the printf() function will push parameters on the stack and may clear the zero that was written out of array bounds and lead to a potentially unterminated string being passed to printf.
However, without intimate knowledge of what goes on in printf (and needing to exploit printf as well as foo), a hacker would be hard pressed to do anything other than crash your code.
FWIW, this is a good reason to compile with warnings on or to use functions like strncpy_s, which both respect the buffer size and include a terminating null even if the copied string is larger than the buffer. With strncpy_s, the line "buffer[sizeof(buffer)] = '\0';" is not even necessary.
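For comparison, the off-by-one can also be avoided with plain strncpy by terminating at the last valid index. The sketch below assumes a helper named copy_terminated, which is not part of the question's code.

```c
#include <string.h>

/* Hypothetical helper: strncpy with a correct terminator.
 * The last valid index of a dst_size-byte buffer is dst_size - 1,
 * so that is where the final '\0' goes. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;                        /* nothing can be stored */
    strncpy(dst, src, dst_size - 1);   /* leave room for the terminator */
    dst[dst_size - 1] = '\0';          /* inside the array, not one past it */
}
```

In foo() this would be called as copy_terminated(buffer, sizeof(buffer), str), so a 64-byte buffer holds at most 63 characters plus the terminator.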
The issue is that you don't have permission to write to the item after the array. When you asked for 64 chars for buffer, the system is required to give you at least 64 bytes. It's normal for the system to give you more than that -- in which case the memory belongs to you and there is no problem in practice -- but it is possible that even the first byte after the array belongs to "somebody else."
So what happens if you overwrite it? If the "somebody else" is actually inside your program (maybe in a different structure or thread) the operating system probably won't notice you trampled on that data, but that other structure or thread might. There's no telling what data should be there or how trampling over it will affect things.
In this case you allocated buffer on the stack, which means (1) the somebody else is you, and in fact is your current stack frame, and (2) it's not in another thread (but could affect other local variables in the current stack frame).

c++ what happens if you print more characters with sprintf, than the char pointer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
But what exactly happens if the char buffer has less memory allocated than will be printed to it?
That is, what if x is smaller than the length of the string that sprintf produces from its second parameter?
I am asking because I get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
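If truncation is not acceptable, one option is to ask snprintf for the required length first and then allocate exactly that much. This is a sketch under the assumption of a C99-conforming snprintf, which returns the length the output would have had. The function name format_count is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "helloworld <count>" in a buffer that is
 * sized from the formatted length. Returns a malloc'd string that the
 * caller must free, or NULL on failure. */
char *format_count(int count)
{
    /* With a NULL buffer and size 0, snprintf only computes the length. */
    int needed = snprintf(NULL, 0, "helloworld %d", count);
    char *text;

    if (needed < 0)
        return NULL;
    text = malloc((size_t)needed + 1);   /* +1 for the terminating NUL */
    if (text == NULL)
        return NULL;
    snprintf(text, (size_t)needed + 1, "helloworld %d", count);
    return text;
}
```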
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description about the whole problem (buffer overflows) here. There's also a comment that some architectures provide a snprintf routine that takes an additional parameter defining the maximum length (in your case x). If your compiler doesn't provide it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you often won't notice the error when you have written only one or two bytes too many (e.g. forgot to make room for a NUL), but get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times; you will get a memory corruption error. As far as I remember, I did something like this:
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when the destructor of the vector is called, my program gets a memory corruption error. If the size is less than 10, it works successfully.
Can you spell Buffer Overflow ? One possible result will be stack corruption, and make your app vulnerable to Stack-based exploitation.