I was recently reading about stack & heap corruption in C & C++. The author of the website demonstrates stack corruption using below example.
#include<stdio.h>
int main(void)
{
int b = 10;
int a[3];
a[0] = 1;
a[1] = 2;
a[2] = 3;
printf(" b = %d \n",b);
a[3] = 12; // oops it is invalid, behaviour is undefined
printf(" b = %d \n",b);
printf("address of b= %x\n",&b);
printf("address of a[3]= %x\n",&a[3]);
return 0;
}
I tested above program on visual studio 2010 compiler (VC++) & it gives me runtime error that says:
stack around variable a gets corrupted
Now my question: is stack corrupted for lifetime or it is only for the time during when above erroneous program was being executed?
Same way, I know that deleting same pointer twice might do really bad things like heap corruption.
The following code:
int* p=new int();
delete p;
delete p; // oops disaster here, undefined behaviour
When the above code fragment executes the VC++ shows heap corruption error at runtime.
It is Undefined Behaviour. You cannot know what will happen if you do 'forbidden' things. You have no guarantee that your program will work well.
You have to be careful with terminology here. Will the stack be "corrupted" for the remainder of your program's life? It may be; it may not be. In this instance you've only corrupted data within the current stack frame, so once you're out of that function call, in practice your "corruption" will have gone.
But that's not quite the whole story. Since you've overwritten a variable with bytes that aren't supposed to be there, what knock-on effects might that have on your program? The consequences of this memory corruption could feasibly be logically passed on to other function scopes, or even other computers if you're sending this data over a network connection and the data is no longer in the expected form. (Typically, your data protocol will have safety features built into it to detect and discard unexpected forms of data; but, that's up to you.)
The same is true of heap corruption. Any time you overwrite the bytes of something that is not supposed to be overwritten, and any time you do so with arbitrary or unknowable data, you run the risk of potentially catastrophic consequences that may logically last well beyond the lifetime of your program.
Within the scope of C++ as a language, this condition is summed up in a specific phrase: undefined behaviour. It states that you can't really rely on anything at all after you've corrupted your memory. Once you've invoked UB, all bets are off.
The one guarantee that you usually have in practice is that your OS will not allow you to directly overwrite any memory that does not belong to your program. That is, corrupting the memory of other processes or of the OS itself is very difficult. The memory model of modern OSs is deliberately designed that way in order to keep programs isolated and prevent this kind of damage from broken programs and/or viruses.
C++ as well as C does not have array boundary overflow or underflow check. However, you can abstract out, you may define an array with overloaded index operator (operator []) where you can check for array index out of bounds and act accordingly. When you delete a pointer using delete ptr (when ptr is allocated through new), the space allocated before is returned back to heap space, however the value of the pointer becomes same as before. So, it;s a good programming practice that you should make the ptr NULL after delete, e.g.
int* p=new int();
...
if (p) {
delete p;
p = (int *) NULL;
}
// double deletion is prevented, and ptr us not dangling any more
if (p) {
delete p;
p = (int *) NULL;
}
However, stack or heap corruption, if at all, lies confined within the program space and when the program terminates, normally or abnormally, all memory occupies are released back to the operating system
Related
This Question statement is came in picture due to statement made by user (Georg Schölly 116K Reputation) in his Question Should one really set pointers to `NULL` after freeing them?
if this Question statement is true
Then How data will corrupt I am not getting ?
Code
#include<iostream>
int main()
{
int count_1=1, count_2=11, i;
int *p=(int*)malloc(4*sizeof(int));
std::cout<<p<<"\n";
for(i=0;i<=3;i++)
{
*(p+i)=count_1++;
}
for(i=0;i<=3;i++)
{
std::cout<<*(p+i)<<" ";
}
std::cout<<"\n";
free(p);
p=(int*)malloc(6*sizeof(int));
std::cout<<p<<"\n";
for(i=0;i<=5;i++)
{
*(p+i)=count_2++;
}
for(i=0;i<=3;i++)
{
std::cout<<*(p+i)<<" ";
}
}
Output
0xb91a50
1 2 3 4
0xb91a50
11 12 13 14
Again it is allocating same memory location after freeing (0xb91a50), but it is working fine, isn't it ?
You do not reuse the old pointer in your code. After p=(int*)malloc(6*sizeof(int));, p point to a nice new allocated array and you can use it without any problem. The data corruption problem quoted by Georg would occur in code similar to that:
int *p=(int*)malloc(4*sizeof(int));
...
free(p);
// use a different pointer but will get same address because of previous free
int *pp=(int*)malloc(6*sizeof(int));
std::cout<<p<<"\n";
for(i=0;i<=5;i++)
{
*(pp+i)=count_2++;
}
p[2] = 23; //erroneouly using the old pointer will corrupt the new array
for(i=0;i<=3;i++)
{
std::cout<<*(pp+i)<<" ";
}
Setting the pointer to NULL after you free a block of memory is a precaution with the following advantages:
it is a simple way to indicate that the block has been freed, or has not been allocated.
the pointer can be tested, thus preventing access attempts or erroneous calls to free the same block again. Note that free(p) with p a null pointer is OK, as well as delete p;.
it may help detect bugs: if the program tries to access the freed object, a crash is certain on most targets if the pointer has been set to NULL whereas if the pointer has not been cleared, modifying the freed object may succeed and result in corrupting the heap or another object that would happen to have been allocated at the same address.
Yet this is not a perfect solution:
the pointer may have been copied and these copies still point to the freed object.
In your example, you reuse the pointer immediately so setting it to NULL after the first call to free is not very useful. As a matter of fact, if you wrote p = NULL; the compiler would probably optimize this assignment out and not generate code for it.
Note also that using malloc() and free() in C++ code is frowned upon. You should use new and delete or vector templates.
I'm trying to figure out what is the difference between those three kinds of problems associated with memory models.
If I want to simulate a memory leak scenario, I can create a pointer without calling corresponding delete method.
int main() {
// OK
int * p = new int;
delete p;
// Memory leak
int * q = new int;
// no delete
}
If I want to simulate a double free scenario, I can free a pointer twice and this part memory will be assigned twice later.
a = malloc(10); // 0xa04010
b = malloc(10); // 0xa04030
c = malloc(10); // 0xa04050
free(a);
free(b); // To bypass "double free or corruption (fasttop)" check
free(a); // Double Free !!
d = malloc(10); // 0xa04010
e = malloc(10); // 0xa04030
f = malloc(10); // 0xa04010 - Same as 'd' !
However, I don't know what is accessing freed memory. Can anybody give me an example of accessing freed memory?
Memory leaks are bad.
Double frees are worse.
Accessing freed memory is worser.
Memory leaks
This is not an error per se. A leaking program is stil valid. It may not be a problem. But this is still bad; with time, your program will reserve memory from the host and never release it. If the host's memory is full before the program completion, you run into troubles.
Double frees
Per the standard, this is undefined behaviour. In practice, this is almost always a call to std::abort() by the C++ runtime.
Accessing freed memory
Also undefined behaviour. But in some case, nothing bad will happen. You'll test your program, put it in production. And some day, for no apparent reason, it will break. And it will break hard: randomly. The best time to rework your résumé.
And here is how to access freed memory:
// dont do this at home
int* n = new int{};
delete n;
std::cout << *n << "\n"; // UNDEFINED BEHAVIOUR. DONT.
Your examples of a memory leak (allocating memory but freeing it) and double-free (passing a pointer to allocated memory to free / delete more than once) are correct.
Performing a double-free does not however mean that a section of memory will be returned more than once by malloc as your example indicates. What it does do is invoke undefined behavior, meaning the behavior of your program cannot be predicted going forward.
Accessing free'ed memory means freeing a pointer and then subsequently trying to use it:
int *a = malloc(10 * sizeof(int)); // allocate memory
free(a); // free memory
print("a[0]=%d\n", a[0]); // illegal: use after free
You are correct about making a memory leak and a double-free. Accessing freed memory happens when you dereference a pointer that has been freed:
int *ptr = malloc(sizeof(int));
*ptr = 123;
free(ptr);
int invalid = *ptr; // Accessing freed memory
Problems like this are notoriously hard to detect, because the program continues to work as expected for some time. If you expect to reuse the pointer variable at some later time, it is a good idea to assign it NULL immediately after calling free. This way a subsequent dereference would fail fast.
I'm trying to figure out what is the difference between those three kinds of problems associated with memory models.
memory leak - you dynamically allocate memory and never release it.
double free - you dynamically allocate memory and release it multiple times
accessing after free - you dynamically allocate memory then release and access that memory after release.
int main()
{
int *pnPtr = new int;
delete pnPtr;
*pnPtr = 4;
cout<<*pnPtr;
}
Ans) 4 while I tried to execute in visual studio 2010.
Please, explain me how 4 is displayed as output?
what did you expect it to return?
Understand, that delete only frees the allocated memory pointed at, but leaves the pointer as it is. You can still use the pointer to do something to the pointed address.
int main()
{
int *pnPtr = new int; // allocate memory, pnPtr now points to it
delete pnPtr; // delete allocated memory, pnPtr still points to that location
*pnPtr = 4; // set memory at pointed address to 4
cout<<*pnPtr;
}
The pointer pnPtr still points to somewhere in memory, calling delete does not change the value of pnPtr itself. The memory is just no longer allocated for your process.
delete does not change the value of the pointer, it destroys the int size memory association but if not reallocated then it can be available. That has happened in your case.
Issue is due to Dangling pointers that arise during object destruction, when an object that has an incoming reference(pointer) is freed or de-allocated, without modifying the value of the pointer, so that the pointer still points to the memory location of the de-allocated memory.
Assume if the system reallocates the previously freed memory to another process, if the original program then de-references the (now) dangling pointer, unpredictable behavior may result, as the memory may now contain completely different data. This is especially the case if the program writes data to memory pointed by a dangling pointer a silent corruption of unrelated data may result, leading to:
Subtle bugs that can be extremely difficult to find, or
cause segmentation faults or
general protection faults.
So, it is recommended to reset the freed/deleted pointer as follows:
int main()
{
int *pnPtr = new int;
delete pnPtr;
pnPtr = null;
// To-do Logic
return 0;
}
I am sorry if I may not have phrased the question correctly, but in the following code:
int main() {
char* a=new char[5];
a="2222";
a[7]='f'; //Error thrown here
cout<<a;
}
If we try to access a[7] in the program, we get an error because we haven't been assigned a[7].
But if I do the same thing in a class :
class str
{
public:
char* a;
str(char *s) {
a=new char[5];
strcpy(a,s);
}
};
int main()
{
str s("ssss");
s.a[4]='f';s.a[5]='f';s.a[6]='f';s.a[7]='f';
cout<<s.a<<endl;
return 0;
}
The code works, printing the characters "abcdfff".
How are we able to access a[7], etc in the code when we have only allocated char[5] to a while we were not able to do so in the first program?
In your first case, you have an error:
int main()
{
char* a=new char[5]; // declare a dynamic char array of size 5
a="2222"; // assign the pointer to a string literal "2222" - MEMORY LEAK HERE
a[7]='f'; // accessing array out of bounds!
// ...
}
You are creating a memory leak and then asking why undefined behavior is undefined.
Your second example is asking, again, why undefined behavior is undefined.
As others have said, it's undefined behavior. When you write to memory out of bounds of the allocated memory for the pointer, several things can happen
You overwrite an allocated, but unused and so far unimportant location
You overwrite a memory location that stores something important for your program, which will lead to errors because you've corrupted your own memory at that point
You overwrite a memory location that you aren't allowed to access (something out of your program's memory space) and the OS freaks out, causing an error like "AccessViolation" or something
For your specific examples, where the memory is allocated is based on how the variable is defined and what other memory has to be allocated for your program to run. This may impact the probability of getting one error or another, or not getting an error at all. BUT, whether or not you see an error, you shouldn't access memory locations out of your allocated memory space because like others have said, it's undefined and you will get non-deterministic behavior mixed with errors.
int main() {
char* a=new char[5];
a="2222";
a[7]='f'; //Error thrown here
cout<<a;
}
If we try to access a[7] in the program, we get an error because we
haven't been assigned a[7].
No, you get a memory error from accessing memory that is write-protected, because a is pointing to the write-only memory of "2222", and by chance two bytes after the end of that string is ALSO write-protected. If you used the same strcpy as you use in the class str, the memory access would overwrite some "random" data after the allocated memory which is quite possibly NOT going to fail in the same way.
It is indeed invalid (undefined behaviour) to access memory outside of the memory you have allocated. The compiler, C and C++ runtime library and OS that your code is produced with and running on top of is not guaranteed to detect all such things (because it can be quite time-consuming to check every single operation that accesses memory). But it's guaranteed to be "wrong" to access memory outside of what has been allocated - it just isn't always detected.
As mentioned in other answers, accessing memory past the end of an array is undefined behavior, i.e. you don't know what will happen. If you are lucky, the program crashes; if not, the program continues as if nothing was wrong.
C and C++ do not perform bounds checks on (simple) arrays for performance reasons.
The syntax a[7] simply means go to memory position X + sizeof(a[0]), where X is the address where a starts to be stored, and read/write. If you try to read/write within the memory that you have reserved, everything is fine; if outside, nobody knows what happens (see the answer from #reblace).
I made a small program that looked like this:
void foo () {
char *str = "+++"; // length of str = 3 bytes
char buffer[1];
strcpy (buffer, str);
cout << buffer;
}
int main () {
foo ();
}
I was expecting that a stack overflow exception would appear because the buffer had smaller size than the str but it printed out +++ successfully... Can someone please explain why would this happened ?
Thank you very much.
Undefined Behavior(UB) happened and you were unlucky it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior and UB does not warrant a crash. Anything might happen.
Undefined behavior means that the behavior cannot be defined.
You don't get a stack overflow because it's undefined behaviour, which means anything can happen.
Many compilers today have special flags that tell them to insert code to check some stack problems, but you often need to explicitly tell the compiler to enable that.
Undefined behavior...
In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal, local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often int -- for example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, room for four bytes (of four chars, if you prefer that term).
Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.
If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer) chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well the end of the space you allocated, and still have things appear to work. You've allocated buffer in main. The only other thing defined in main is str, but it's just a string literal -- so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.
In reality, since this is in main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack, so copying the string into the destination might appear to work until/unless the source string was quite long (e.g., several kilobytes).
As to why it will often fail much sooner than that though: in a typical case, the stack grows downward, but the addresses used by buffer[n] will grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from main to the startup code that called main -- therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified) you'll end up overwriting the return address from main. In that case, the code inside main will often appear to work fine, but as soon as execution (tries to) return from main, it'll end up using that data you just wrote as the return address, at which point you're a lot more likely to see visible problems.
Outlining what happens:
Either you are lucky and it crashes at once. Or because it's undefined technically you could end up writing to a memory address used by something else. say that you had two buffers, one buffer[1] and one longbuffer[100] and assume that the memory address at buffer[2] could be the same as longbuffer[0] which would mean that long buffer now terminates at longbuffer[1] (because the null-termination).
char *s = "+++";
char longbuffer[100] = "lorem ipsum dolor sith ameth";
char buffer[1];
strcpy (buffer, str);
/*
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[0] = \0 <- since assigning s will null terminate (i.e. add a \0)
*/
std::cout << longbuffer; // will output: +
Hope that helps in clarifying please note it's not very likely that these memory addresses will be the same in the random case, but it could happen, and it doesn't even need to be the same type, anything can be at buffer[2] and buffer[3] addresses before being overwritten by the assignment. Then the next time you try to use your (now destroyed) variable it might well crash, and thats when debugging become a bit tedious since the crash doesn't seem to have much to do with the real problem. (i.e. it crashes when you try to access a variable on your stack while the real problem is that you somewhere else in your code destroyed it).
There is no explicit bounds checking, or exception throwing on strcpy - it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking for bounds etc. or switch to using std::string.
In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but in production, your code barfs - not a situation that you want.
Stack corruption is happening, its an undefined behaviour, luckily crash didnt occur. Do the below modifications in your program and run it will crash surely because of stack corruption.
void foo () {
char *str = "+++"; // length of str = 3 bytes
int a = 10;
int *p = NULL;
char buffer[1];
int *q = NULL;
int b = 20;
p = &a;
q = &b;
cout << *p;
cout << *q;
//strcpy (buffer, str);
//Now uncomment the strcpy it will surely crash in any one of the below cout statment.
cout << *p;
cout << *q;
cout << buffer;
}