Stack corruption in C++ - c++

In C++, in which way the stack may get corrupted. One way I guess is to overwriting the stack variables by accessing an array beyond its boundaries. Is there any other way that it can get corrupted?

You could have a random/undefined pointer that ends up pointing to the stack, and write though that.
An assembly function could incorrectly setup/modify/restore the stack
Cosmic waves could flips bits in the stack.
Radioactive elements in the chip's casing could flip bits.
Anything in the kernel could go wrong and accidentally change your stack memory.
But those are not particular to C++, which doesn't have any idea of the stack.

Violations of the One Definition Rule can lead to stack corruption. The following example looks stupid, but I've seen it a couple of times with different libraries compiled in different configurations.
header.h
struct MyStruct
{
int val;
#ifdef LARGEMYSTRUCT
char padding[16];
#endif
}
file1.cpp
#define LARGEMYSTRUCT
#include "header.h"
//Here it looks like MyStruct is 20 bytes in size
void func(MyStruct s)
{
memset(s.padding, 0, 16); //corrupts the stack as below file2.cpp does not have LARGEMYSTRUCT declared and declares Mystruct with 4 bytes
return; //Will probably crash here as the return pointer has been overwritten
}
file2.cpp
#include "header.h"
//Here it looks like MyStruct is only 4 bytes in size.
extern void func(MyStruct s);
void caller()
{
MyStruct s;
func(s); //push four bytes on to the stack
}

Taking pointers to stack variables is a good way:
void foo()
{
my_struct s;
bar(&s);
}
If bar keeps a copy of the pointer then anything can happen in the future.
Summing up: Stack corruption happens when there's stray pointers pointing to the stack.

The C++ standard does not define stack/heap. Further, there are a number of ways to invoke undefined behavior in a program -- all of which may corrupt your stack (it's UB, after all). The short answer is -- your question is too vague to have a meaningful answer.

Calling a function with the wrong calling convention.
(though this is technically compiler-specific, not a question of C++, every C++ compiler has to deal with that.)

Throwing an exception inside a destructor is a good candidate. It would mess up the stack unwinding.

Related

When exactly is stack allocated [duplicate]

This question already has answers here:
At what exact moment is a local variable allocated storage?
(5 answers)
Closed 6 years ago.
Even in C (not just C++) you can declare variables at the start of a code block, which is enclosed in curly braces.
Example:
#include <stdio.h>
void use_stack(int cnt)
{
if (cnt<=16) {
int a[16];
int i;
a[0]=3;
a[1]=5;
for (i=2;i<cnt;i++) {
a[i]=a[i-1]+a[i-2];
}
printf("a[%d] == %d\n",cnt-1,a[cnt-1]);
}
else {
printf("cnt is too big\n");
}
}
Now I know that variables like the array a[16] are allocated on the stack in this case.
I was wondering if the space for this array is allocated at the start of the function (first opening curly brace) or at the start of the block where it is declared (opening curly brace after if).
From examining the assembler code it seems the compiler allocates the space for a[16] directly at the entry of the function.
I actually expected that the stack would be allocated (stack pointer decreased) at the declaration of a[16] and that the stack would be de-allocated (stack pointer increased) at the end of the corresponding if code block.
But this does not happen it seems (stack for a[16] is allocated directly at function entry, even if a[16] is not used in the else branch).
Has anyone an explanation why this is the case ?
So is there any part of the C language standard, which explains this behavior, or does it have to do with things like "longjmp" or signal handling, which maybe require that the stack pointer is "constant" inside a function ?
Note: The reason why I assumed the stack would be allocated/deallocated at the start/end of the code block is, because in C++ the constructor/destructor of objects allocated on the stack will be called at the start/end of the code block. So if you examine the assembler code of a C++ program you will notice that the stack is still allocated at the function entry; just the constructor/destructor call will be done at the start/end of the code block.
I am explicitly interested why stack is not allocated/deallocated at the start/end of a code block using curly braces.
The question: At what exact moment is a local variable allocated storage? is only about a local variable allocated at the start of a function. I am surprised that stack allocation for variables allocated later inside a code block is also done at the function entry.
So far the answers have been:
Something to do with optimization
Might different for C, C++
Stack is not even mentioned in the C language specification
So: I am interested in the answer for C... (and I strongly believe that the answer will apply to C++ also, but I am not asking about C++ :-)).
Optimization: Here is an example which will directly demonstrate why I am so surprised and why I am quite sure that this is not an optimization:
#include <stdio.h>
static char *stackA;
static char *stackB;
static void calc(int c,int *array)
{
int result;
if (c<=0) {
// base case c<=0:
stackB=(char *)(&result);
printf("stack ptr calc() = %p\n",stackB);
if (array==NULL) {
printf("a[0] == 1\n");
} else {
array[0]=1;
}
return;
}
// here: c>0
if (array==NULL) {
// no array allocated, so allocate it now
int i;
int a[2500];
// calculate array entries recursively
calc(c-1,a);
// calculate current array entry a[c]
a[c]=a[c-1]+3;
// print full array
for(i=0;i<=c;i++) {
printf("a[%d] = %d\n",i,a[i]);
}
} else {
// array already allocated
calc(c-1,array);
// calculate current array entry a[c]
array[c]=array[c-1]+3;
}
}
int main()
{
int a;
stackA=(char *)(&a);
printf("stack ptr main() = %p\n",stackA);
calc(9,NULL);
printf("used stack = %d\n",(int)(stackA-stackB));
}
I am aware that this is an ugly program :-).
The function calc calculates n*3 + 1 for all 0<=n<=c in a recursive fashion.
If you look at the code for calc you notice that the array a[2500] is only declared when the input parameter array to the function is NULL.
Now this only happens in the call to calc which is done in main.
The stackA and stackB pointers are used to calculate a rough estimate how much stack is used by this program.
Now: int a[2500] should consume around 10000 bytes (4 bytes per integer, 2500 entries). So you could expect that the whole program consumes around 10000 bytes of stack + something additional (for overhead when calc is called recursively).
But: It turns out this program consumes around 100000 bytes of stack (10 times as much as expected). The reason is, that for each call of calc the array a[2500] is allocated, even if it is only used in the first call. There are 10 calls to calc (0<=c<=9) and so you consume 100000 bytes of stack.
It does not matter if you compile the program with or without optimization
GCC-4.8.4 and clang for x64, MS Visual C++ 2010, Windriver for DIAB (for PowerPC) all exhibit this behavior
Even weirder: C99 introduces Variable Length Arrays. If I replace int a[2500]; in the above code with int a[2500+c]; then the program uses less stack space (around 90000 bytes less).
Note: If I only change the call to calc in main to calc(1000,NULL); the program will crash (stack overflow == segmentation fault). If I additionally change to int a[2500+c]; the program works and uses less than 100KB stack. I still would like to see an answer, which explains why a Variable Length Array does not lead to a stack overflow whereas a fixed length array does lead to a stack overflow, in particular since this fixed length array is out of scope (except for the first invocation of calc).
So what's the reason for this behavior in C ?
I do not believe that GCC/clang both simply are not able to do better; I strongly believe there has to be a technical reason for this. Any ideas ?
Answer by Google
After more googling: I strongly believe this has something to do with "setjmp/longjmp" behavior. Google for "Variable Length Array longjmp" and see for yourself. It seems longjmp is hard to implement if you do not allocate all arrays at function entry.
The language rules for automatic storage only guarantees that the last allocated is the first deallocated.
A compiler can implement this logical stack any way it sees fit.
If it can prove that a function isn't recursive it can even allocated the storage at program start-up.
I believe that the above applies to C as well as C++, but I'm no C expert.
Please, when you ask about the details of a programming language, limit the question to one language at a time.
There is no technical reason for this other than choices that compiler makers made. It's less generated code and faster executing code to always reserve all the stack space we'll need at the beginning of the function. So all the compilers made the same reasonable performance tradeoff.
Try using a variable length array and you'll see that the compiler is fully capable of generating code that "allocates" stack just for a block. Something like this:
void
foo(int sz, int x)
{
extern void bar(char *);
if (x) {
char a[sz];
bar(a);
} else {
char a[10];
bar(a);
}
}
My compiler generates code that always reserves stack space for the x is false part, but the space for the true part is only reserved if x is true.
How this is done isn't regulated by any standard. The C and C++ standards don't mention the stack at all, in theory those languages could be used even on computers that don't have a stack.
On computers that do have a stack, how this is done is specified by the ABI of the given system. Often, stack space is reserved at the point when the program enters the function. But compilers are free to optimize the code so that the stack space is only reserved when a certain variable is used.
At any rate, the point where you declare the variable has no relation to when it gets allocated. In your example, int a[16] is either allocated when the function is entered, or it is allocated just before the first place where it is used. It doesn't matter if a is declared inside the if statement or at the top of the function.
In C++ however, there is the concept of constructors. If your variable is an object with a constructor, then that constructor will get executed at the point where the variable is declared. Meaning that the variable must be allocated before that happens.
Alf has explained the limitations and freedoms that the standard specifies (or rather, what it doesn't specify).
I'll suggest an answer to the question
why stack is not allocated/deallocated at the start/end of a code block using curly braces.
The compilers that you tested chose to do so (I'm actually just guessing, I didn't write any of them), because of better run-time performance and simplicity (and of course, because it is allowed).
Allocating 96 bytes (arbitrary example) once takes about half as much time as allocating 48 bytes twice. And third as much times as allocating 32 bytes thrice.
Consider a loop as an extreme case:
for(int i = 0; i < 1000000; i++) {
int j;
}
If j is allocated at the start of the function, there is one allocation. If j is allocated within the loop body, then there will be a million allocations. Less allocations is better (faster).
Note: The reason why I assumed the stack would be allocated/deallocated at the start/end of the code block is, because in C++ the constructor/destructor of objects allocated on the stack will be called at the start/end of the code block.
Now you know that you were wrong to have assumed so. As written in a fine answer in the linked question, the allocation/dealocation need not coincide with the construction/destruction.
Imagine I would use a[100000]
That is approaching a very significant fraction of total stack space. You should allocate large blocks of memory like that dynamically.

How to make uninitiated pointer not equal to 0/null?

I am a c++ learner. Others told me "uninitiatied pointer may point to anywhere". How to prove that by code.?I made a little test code but my uninitiatied pointer always point to 0. In which case it does not point to 0? Thanks
#include <iostream>
using namespace std;
int main() {
int* p;
printf("%d\n", p);
char* p1;
printf("%d\n", p1);
return 0;
}
Any uninitialized variable by definition has an indeterminate value until a value is supplied, and even accessing it is undefined. Because this is the grey-area of undefined behaviour, there's no way you can guarantee that an uninitialized pointer will be anything other than 0.
Anything you write to demonstrate this would be dictated by the compiler and system you are running on.
If you really want to, you can try writing a function that fills up a local array with garbage values, and create another function that defines an uninitialized pointer and prints it. Run the second function after the first in your main() and you might see it.
Edit: For you curiosity, I exhibited the behavior with VS2015 on my system with this code:
void f1()
{
// junk
char arr[24];
for (char& c : arr) c = 1;
}
void f2()
{
// uninitialized
int* ptr[4];
std::cout << (std::uintptr_t)ptr[1] << std::endl;
}
int main()
{
f1();
f2();
return 0;
}
Which prints 16843009 (0x01010101). But again, this is all undefined behaviour.
Well, I think it is not worth to prove this question, because a good coding style should be used and this say's: Initialise all variables! One example: If you "free" a pointer, just give them a value like in this example:
char *p=NULL; // yes, this is not needed but do it! later you may change your program an add code beneath this line...
p=(char *)malloc(512);
...
free(p);
p=NULL;
That is a safe and good style. Also if you use free(p) again by accident, it will not crash your program ! In this example - if you don't set NULL to p after doing a free(), your can use the pointer by mistake again and your program would try to address already freed memory - this will crash your program or (more bad) may end in strange results.
So don't waste time on you question about a case where pointers do not point to NULL. Just set values to your variables (pointers) ! :-)
It depends on the compiler. Your code executed on an old MSVC2008 displays in release mode (plain random):
1955116784
1955116784
and in debug mode (after croaking for using unitialized pointer usage):
-858993460
-858993460
because that implementation sets uninitialized pointers to 0xcccccccc in debug mode to detect their usage.
The standard says that using an uninitialized pointer leads to undefined behaviour. That means that from the standard anything can happen. But a particular implementation is free to do whatever it wants:
yours happen to set the pointers to 0 (but you should not rely on it unless it is documented in the implementation documentation)
MSVC in debug mode sets the pointer to 0xcccccccc in debug mode but AFAIK does not document it (*), so we still cannot rely on it
(*) at least I could not find any reference...

buffer overflow not affecting const variable?

I don't know much about hacking, c, assembly, memory and all that stuff so I coudln't resolve my question by my self.
So, buffer overflow overflows into other address of variables and corrupts them. So I tested and it really does. And I thought that if it can overflow constant variable buffer overflow must be super powered and I tested, but it doesn't overflow const variable.
Why is that?
int a;
char buffer[8];
and
const int a;
char buffer[8];
has address of variable 'buffer' in front of address of variable 'a' by size of the 'buffer'. Is there something special in const variable when allocated to a memory?
my sample code:
#include <stdio.h>
#include <string.h>
int main() {
char buffer1[8];
const int a=0; //vs int a=0;
char buffer2[8];
strcpy(buffer1,"one");
strcpy(buffer2,"two");
printf("Buffer 2 [%p]: %s\n",buffer2,buffer2);
printf("a [%p]: %d\n",&a,a);
printf("Buffer 1 [%p]: %s\n",buffer1,buffer1);
printf("\nCopy Buffer\n\n");
strcpy(buffer2,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
printf("Buffer 2 [%p]: %s\n",buffer2,buffer2);
printf("a [%p]: %d\n",&a,a);
printf("Buffer 1 [%p]: %s\n",buffer1,buffer1);
return 0;
}
Three things come to mind:
This is undefined behavior, so all bets are off the table.
The compiler doesn't even have to look at what a is. If I were a compiler, I would just look at that code and say "a is always zero, so I'm just going to go ahead and replace a with 0." The thing is, when you say printf("%p %d", &a, a);, the compiler doesn't even have to get the contents of a. It knows it's going to be zero, always and forever.* So it can just change that code to printf("%p %d", &a, 0);.
Even if a were not const, the compiler would be allowed to "cache" the value of a in a register. It only needs to look up the value once, and then it knows that a never changes*, so it can reuse the value it looked up from before.
*Compilers will make a lot of assumptions, like "this code doesn't invoke undefined behavior." If you invoke undefined behavior, the compiler might make some "wrong" assumptions. Which it probably does, in this case.
Buffer Overflow is UB, so anything can happen.
Also, your compiler could optimize away your const variable, because it has a constant value determined at compile time.
An implementation is allowed to put a const-qualified object in a different memory segment altogether, so that may be why it isn't affected (emphasis on may; since behavior on buffer overflow is undefined, there could be any number of reasons).
Online C 2011 standard, section 6.7.3, footnote 132:
The implementation may place a const object that is not volatile in a read-only region of
storage. Moreover, the implementation need not allocate storage for such an object if its address is
never used.

How to find if a variable is allocated in stack or heap?

Stumbled upon this interview question somewhere,
In C,
Given a variable x, how do you find out if the space for that variable is allocated on the stack or heap?
(Is there any way to find it out programatically and not having to go through the symbol table, etc? And does finding if the space is allocated in stack or heap has any practical implications?)
No, not in general.
Do you know of gcc -fsplit-stack ?
It is up to the implementation to decide whether to allocate a contiguous stack or a stack where blocks are interleaved with heap blocks in memory. Good luck figuring out whether a block was allocated for the heap or the stack when the latter is split.
If you are working on an architecture that stores the stack on a larger address than the heap, you could compare the variable address with the bottom of the stack. Using the pthread threading API, this comparison would look like this:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <inttypes.h>
int is_stack(void *ptr)
{
pthread_t self = pthread_self();
pthread_attr_t attr;
void *stack;
size_t stacksize;
pthread_getattr_np(self, &attr);
pthread_attr_getstack(&attr, &stack, &stacksize);
return ((uintptr_t) ptr >= (uintptr_t) stack
&& (uintptr_t) ptr < (uintptr_t) stack + stacksize);
}
The test:
int main()
{
int x;
int *p1 = malloc(sizeof(int));
int *p2 = &x;
printf("%d %d\n", is_stack(p1), is_stack(p2));
return 0;
}
...prints 0 1, as expected.
The above code will not detect storage from stacks in other threads. To do that, the code would need to track all created threads.
This is NOT guaranteed by any standard BUT
on most platforms the stack grows down from highest address available, and heap grows up from the bottom if the most significant byte of the address is in the top half of the available memory space for your platform, and you haven't allocated gigabytes of memory, it's a pretty good bet it's on the stack.
#include <iostream>
#include <stdlib.h>
int main()
{
int x = 0;
int* y = new int;
unsigned int a1 = (int) &x;
unsigned int a2 = (int) y;
std::cout<<std::hex<<a1<<" "<<a2<<std::endl;
}
gives the output ffbff474 21600 on the machine I'm typing this.
It might be a trick question. Variables have either automatic or static storage duration[*]. You can fairly safely say that automatics are allocated "on the stack", at least assuming they aren't optimized into registers. It's not a requirement of the standard that there be "a stack", but a conforming C implementation must maintain a call stack and associate automatic variables with the levels of the call stack. So whatever the details of what it actually does, you can pretty much call that "the stack".
Variables with static storage duration generally inhabit one or more data sections. From the POV of the OS, data sections might be allocated from the heap before the program starts, but from the POV of the program they have no relation to the "free store".
You can tell the storage duration of a variable by examining its definition in the source -- if it's in function scope then it's automatic unless marked static. If it's not in function scope then it has static duration regardless of whether or not it is marked static (since the static keyword means something different there).
There is no portable way to tell the storage duration of a variable from its address, but particular implementations might provide ways to do it, or tools that you can use with greater or lesser reliability to take a guess.
Objects can also have dynamic storage duration (which generally is what "allocated on the heap" is intended to mean), but such objects are not variables, so that would be the trick if there is one.
[*] Or thread-local in C11 and C++11.
I don't think it has solutions. The code may adjust var's address by stack(heap) address scope, but it's would not be an exact way. At most, the code can only run in some certain platforms.
No it's not possible to determine that by memory location, the compiler would have to support it with isstack() to be portable.
One possible way to track memory allocation SPECIFICALY ON THE HEAP in C++ is overloading the new operator to keep tracking that for you. Then you know that if the memory is not allocated on the heap, it must be allocated on the stack.
#include <iostream>
#include <string>
// Variable to track how many allocations has been done.
static uint32_t s_alloc_count = 0;
void* operator new(size_t size){
s_alloc_count++;
std::cout << "Allocating " << size << " bytes\n";
return malloc(size);
}
Tracking the variable s_alloc_count you should be capable of see how many allocations on the heap has been done and have the size of these allocations printed on the console. Using debug tools like breakpoints, running the code "step-by-step" and console logging is one way to track where are these allocations. This is not an automated way but is a way to do that.
OBS: This tip should be only used for tests, avoid this type of code in production code.

What happens to class members when malloc is used instead of new?

I'm studying for a final exam and I stumbled upon a curious question that was part of the exam our teacher gave last year to some poor souls. The question goes something like this:
Is the following program correct, or not? If it is, write down what the program outputs. If it's not, write down why.
The program:
#include<iostream.h>
class cls
{ int x;
public: cls() { x=23; }
int get_x(){ return x; } };
int main()
{ cls *p1, *p2;
p1=new cls;
p2=(cls*)malloc(sizeof(cls));
int x=p1->get_x()+p2->get_x();
cout<<x;
return 0;
}
My first instinct was to answer with "the program is not correct, as new should be used instead of malloc". However, after compiling the program and seeing it output 23 I realize that that answer might not be correct.
The problem is that I was expecting p2->get_x() to return some arbitrary number (whatever happened to be in that spot of the memory when malloc was called). However, it returned 0. I'm not sure whether this is a coincidence or if class members are initialized with 0 when it is malloc-ed.
Is this behavior (p2->x being 0 after malloc) the default? Should I have expected this?
What would your answer to my teacher's question be? (besides forgetting to #include <stdlib.h> for malloc :P)
Is this behavior (p2->x being 0 after malloc) the default? Should I have expected this?
No, p2->x can be anything after the call to malloc. It just happens to be 0 in your test environment.
What would your answer to my teacher's question be? (besides forgetting to #include for malloc :P)
What everyone has told you, new combines the call to get memory from the freestore with a call to the object's constructor. Malloc only does half of that.
Fixing it: While the sample program is wrong. It isn't always wrong to use "malloc" with classes. It is perfectly valid in a shared memory situation you just have to add an in-place call to new:
p2=(cls*)malloc(sizeof(cls));
new(p2) cls;
new calls the constructor, malloc will not. So your object will be in an unknown state.
The actual behaviour is unknown, because new acts pretty the same like malloc + constructor call.
In your code, the second part is missing, hence, it could work in one case, it could not, but you can't say exactly.
Why can't 0 be an arbitrary number too? Are you running in Debug mode? What compiler?
VC++ pre-fills newly allocated memory with a string of 0xCC byte values (in debug mode of-course) so you would not have obtained a zero for an answer if you were using it.
Malloc makes no guarantuee to zero out the memory it allocated and the result of the programm is undefined.
Otherwise there are many other things that keep this program from being correct C++. cout is in namespace std, malloc needs to included through#include <cstdlib> and iostream.h isn't standard compliant either.