Stack overflow for string in C++?

Stack overflow for string in C++? - c++

I made a small program that looked like this:
void foo () {
char *str = "+++"; // length of str = 3 bytes
char buffer[1];
strcpy (buffer, str);
cout << buffer;
}
int main () {
foo ();
}
I was expecting that a stack overflow exception would appear because the buffer had smaller size than the str but it printed out +++ successfully... Can someone please explain why would this happened ?
Thank you very much.

Undefined Behavior(UB) happened and you were unlucky it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior and UB does not warrant a crash. Anything might happen.
Undefined behavior means that the behavior cannot be defined.

You don't get a stack overflow because it's undefined behaviour, which means anything can happen.
Many compilers today have special flags that tell them to insert code to check some stack problems, but you often need to explicitly tell the compiler to enable that.

Undefined behavior...
In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal, local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often int -- for example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, room for four bytes (of four chars, if you prefer that term).
Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.
If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer) chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well the end of the space you allocated, and still have things appear to work. You've allocated buffer in main. The only other thing defined in main is str, but it's just a string literal -- so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.
In reality, since this is in main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack, so copying the string into the destination might appear to work until/unless the source string was quite long (e.g., several kilobytes).
As to why it will often fail much sooner than that though: in a typical case, the stack grows downward, but the addresses used by buffer[n] will grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from main to the startup code that called main -- therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified) you'll end up overwriting the return address from main. In that case, the code inside main will often appear to work fine, but as soon as execution (tries to) return from main, it'll end up using that data you just wrote as the return address, at which point you're a lot more likely to see visible problems.

Outlining what happens:
Either you are lucky and it crashes at once. Or because it's undefined technically you could end up writing to a memory address used by something else. say that you had two buffers, one buffer[1] and one longbuffer[100] and assume that the memory address at buffer[2] could be the same as longbuffer[0] which would mean that long buffer now terminates at longbuffer[1] (because the null-termination).
char *s = "+++";
char longbuffer[100] = "lorem ipsum dolor sith ameth";
char buffer[1];
strcpy (buffer, str);
/*
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[0] = \0 <- since assigning s will null terminate (i.e. add a \0)
*/
std::cout << longbuffer; // will output: +
Hope that helps in clarifying please note it's not very likely that these memory addresses will be the same in the random case, but it could happen, and it doesn't even need to be the same type, anything can be at buffer[2] and buffer[3] addresses before being overwritten by the assignment. Then the next time you try to use your (now destroyed) variable it might well crash, and thats when debugging become a bit tedious since the crash doesn't seem to have much to do with the real problem. (i.e. it crashes when you try to access a variable on your stack while the real problem is that you somewhere else in your code destroyed it).

There is no explicit bounds checking, or exception throwing on strcpy - it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking for bounds etc. or switch to using std::string.
In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but in production, your code barfs - not a situation that you want.

Stack corruption is happening, its an undefined behaviour, luckily crash didnt occur. Do the below modifications in your program and run it will crash surely because of stack corruption.
void foo () {
char *str = "+++"; // length of str = 3 bytes
int a = 10;
int *p = NULL;
char buffer[1];
int *q = NULL;
int b = 20;
p = &a;
q = &b;
cout << *p;
cout << *q;
//strcpy (buffer, str);
//Now uncomment the strcpy it will surely crash in any one of the below cout statment.
cout << *p;
cout << *q;
cout << buffer;
}

Related

C++ new[] operator creates array of length = length + 1?

Why does the new[] operator in C++ actually create an array of length + 1? For example, see this code:
#include <iostream>
int main()
{
std::cout << "Enter a positive integer: ";
int length;
std::cin >> length;
int *array = new int[length]; // use array new. Note that length does not need to be constant!
//int *array;
std::cout << "I just allocated an array of integers of length " << length << '\n';
for (int n = 0; n<=length+1; n++)
{
array[n] = 1; // set element n to value 1
}
std::cout << "array[0] " << array[0] << '\n';
std::cout << "array[length-1] " << array[length-1] << '\n';
std::cout << "array[length] " << array[length] << '\n';
std::cout << "array[length+1] " << array[length+1] << '\n';
delete[] array; // use array delete to deallocate array
array = 0; // use nullptr instead of 0 in C++11
return 0;
}
We dynamically create an array of length "length" but we are able to assign a value at the index length+1. If we try to do length+2, we get an error.
Why is this? Why does C++ make the length = length + 1?

It doesn’t. You’re allowed to calculate the address array + n, for the purpose of checking that another address is less than it. Trying to access the element array[n] is undefined behavior, which means the program becomes meaningless and the compiler is allowed to do anything whatsoever. Literally anything; one old version of GCC, if it saw a #pragma directive, started a roguelike game on the terminal. (Thanks, Revolver_Ocelot, for reminding me: that was technically implementation-defined behavior, a different category.) Even calculating the address array + n + 1 is undefined behavior.
Because it can do anything, the particular compiler you tried that on decided to let you shoot yourself in the foot. If, for example, the next two words after the array were the header of another block in the heap, you might get a memory-corruption bug. Or maybe a compiler stored the array at the top of your memory space, the address &array[n+1] is aNULL` pointer, and trying to dereference it causes a segmentation fault. Or maybe the next page of memory is not readable or writable and trying to access it crashes the program with a protection fault. Or maybe the implementation bounds-checks your array accesses at runtime and crashes the program. Maybe the runtime stuck a canary value after the array and checks later to see if it was overwritten. Or maybe it happens, by accident, to work.
In practice, you really want the compiler to catch those bugs for you instead of trying to track down the bugs that buffer overruns cause later. It would be better to use a std::vector than a dynamic array. If you must use an array, you want to check that all your accesses are in-bounds yourself, because you cannot rely on the compiler to do that for you and skipping them is a major cause of bugs.

If you write or read beyond the end of an array or other object you create with new, your program's behaviour is no longer defined by the C++ standard.
Anything can happen and the compiler and program remain standard compliant.
The most likely thing to happen in this case is you are corrupting memory in the heap. In a small program this "seems to work" as the section of the heap ypu use isn't being used by any other code, in a larger one you will crash or behave randomly elsewhere in a seemingoy unrelated bit of code.
But arbitrary things could happen. The compiler could prove a branch leads to access beyond tue end of an array and dead-code eliminate paths that lead to it (UB that time travels), or it could hit a protected memory region and crash, or it could corrupt heap management data and cause a future new/delete to crash, or nasal demons, or whatever else.

At the for loop you are assigning elements beyond the bounds of the loop and remember that C++ does not do bounds checking.
So when you initialize the array you are initializing beyond the bounds of the array (Say the user enters 3 for length you are initializing 1 to array[0] through array[5] because the condition is n <= length + 1;
The behavior of the array is unpredictable when you go beyond its bounds, but most likely your program will crash. In this case you are going 2 elements beyonds its bounds because you have used = in the condition and length + 1.

There is no requirement that the new [] operator allocate more memory than requested.
What is happening is that your code is running past the end of the allocated array. It therefore has undefined behaviour.
Undefined behaviour means that the C++ standard imposes no requirements on what happens. Therefore, your implementation (compiler and standard library, in this case) will be equally correct if your program SEEMS to work properly (as it does in your case), produces a run time error, trashes your system drive, or anything else.
In practice, all that is happening is that your code is writing to memory, and later reading from that memory, past the end of the allocated memory block. What happens depends on what is actually in that memory location. In your case, whatever happens to be in that memory location is able to be modified (in the loop) or read (in order to print to std::cout).
Conclusion: the explanation is not that new[] over-allocates. It is that your code has undefined behaviour, so can seem to work anyway.

What happens when assigning a value to non-allocated memory?

int main()
{
char* p = new char('a');
*reinterpret_cast<int*>(p) = 43523;
return 0;
}
This code runs fine but how safe is it? It should have only allocated one byte of memory but it doesn't seem to be having any problem filling 4 bytes with data. What can happen with the other 3 bytes that are not allocated?

char* p = new char('a');
ASCII code of 'a' is 97, so this is equivalent to:
char* p = new char(97);
Single character (1 byte) space is allocated with *p='a'
Now, you are trying to put more than 1-byte value, that's certainly risky, even if this works, I mean runs without any segmentation fault. You are overwriting some other parts of memory that you don't own or even if own, must be for some other purpose.

The code is not safe. Your program has undefined behaviour. Anything at all could happen.

The code causes heap buffer overflow. You're overriding memory which you don't own and can be in use in other parts of the program.

C++ stack and heap corruption

I was recently reading about stack & heap corruption in C & C++. The author of the website demonstrates stack corruption using below example.
#include<stdio.h>
int main(void)
{
int b = 10;
int a[3];
a[0] = 1;
a[1] = 2;
a[2] = 3;
printf(" b = %d \n",b);
a[3] = 12; // oops it is invalid, behaviour is undefined
printf(" b = %d \n",b);
printf("address of b= %x\n",&b);
printf("address of a[3]= %x\n",&a[3]);
return 0;
}
I tested above program on visual studio 2010 compiler (VC++) & it gives me runtime error that says:
stack around variable a gets corrupted
Now my question: is stack corrupted for lifetime or it is only for the time during when above erroneous program was being executed?
Same way, I know that deleting same pointer twice might do really bad things like heap corruption.
The following code:
int* p=new int();
delete p;
delete p; // oops disaster here, undefined behaviour
When the above code fragment executes the VC++ shows heap corruption error at runtime.

It is Undefined Behaviour. You cannot know what will happen if you do 'forbidden' things. You have no guarantee that your program will work well.

You have to be careful with terminology here. Will the stack be "corrupted" for the remainder of your program's life? It may be; it may not be. In this instance you've only corrupted data within the current stack frame, so once you're out of that function call, in practice your "corruption" will have gone.
But that's not quite the whole story. Since you've overwritten a variable with bytes that aren't supposed to be there, what knock-on effects might that have on your program? The consequences of this memory corruption could feasibly be logically passed on to other function scopes, or even other computers if you're sending this data over a network connection and the data is no longer in the expected form. (Typically, your data protocol will have safety features built into it to detect and discard unexpected forms of data; but, that's up to you.)
The same is true of heap corruption. Any time you overwrite the bytes of something that is not supposed to be overwritten, and any time you do so with arbitrary or unknowable data, you run the risk of potentially catastrophic consequences that may logically last well beyond the lifetime of your program.
Within the scope of C++ as a language, this condition is summed up in a specific phrase: undefined behaviour. It states that you can't really rely on anything at all after you've corrupted your memory. Once you've invoked UB, all bets are off.
The one guarantee that you usually have in practice is that your OS will not allow you to directly overwrite any memory that does not belong to your program. That is, corrupting the memory of other processes or of the OS itself is very difficult. The memory model of modern OSs is deliberately designed that way in order to keep programs isolated and prevent this kind of damage from broken programs and/or viruses.

C++ as well as C does not have array boundary overflow or underflow check. However, you can abstract out, you may define an array with overloaded index operator (operator []) where you can check for array index out of bounds and act accordingly. When you delete a pointer using delete ptr (when ptr is allocated through new), the space allocated before is returned back to heap space, however the value of the pointer becomes same as before. So, it;s a good programming practice that you should make the ptr NULL after delete, e.g.
int* p=new int();
...
if (p) {
delete p;
p = (int *) NULL;
}
// double deletion is prevented, and ptr us not dangling any more
if (p) {
delete p;
p = (int *) NULL;
}
However, stack or heap corruption, if at all, lies confined within the program space and when the program terminates, normally or abnormally, all memory occupies are released back to the operating system

Why Does Array With 1 element Allow 2k elements? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why don’t i get “Segmentation Fault”?
Why does this code work? If the first element only hold the first characer, then where are the rest of characters being stored? And if this is possible, why aren't we using this method?
Notice line 11: static char c[1]. Using one element, you can store as much characters as you want. I use static to keep the memory location alive outside of the function when pointing to it later.
#include <stdio.h>
void PutString( const char* pChar ){
for( ; *pChar != 0; pChar++ )
{
putchar( *pChar );
}
}
char* GetString(){
static char c[1];
int i = 0;
do
{
c[i] = getchar();
}while( c[i++] != '\n' );
c[i] = '\0';
return c;
}
void main(){
PutString( "Enter some text: " );
char* pChar = GetString();
PutString( "You typed the following:\n" );
PutString( pChar );
}

C doesn't check for array boundaries, so no error is thrown. However, characters after the first one will be stored in memory not allocated by the program. If the string is short, this may work, but a long enough string will corrupt enough memory to crash the process.

You can write wherever you want:
char *bad = 0xABCDEF00;
bad[0] = 'A';
But you shouldn't. Who knows what the above lines of code will do? In the very best case, your program will crash. In the worst case, you've corrupted memory and won't find out until much later. (And good luck tracking down the source!)
To answer your specific questions, it doesn't "work". The rest of the characters are stored directly after the array.

You are just very (un)lucky, that you are not overwriting some other data structures. The array definitely cannot store as much characters as you want - sooner or later you either silently corrupt your memory (in the worse case), or hit a segfault by accesing a memory your process hasn't mapped. The fact that it works is likely because the compiler didn't place any other data after your c[1]. Just try to add a second array, let's say static char d[1]; after c, and then try reading from it - you'll see the second character from c.

C++ does not do bounds checking on arrays. That's for performance reasons; checking every array index to see if it's outside the bounds would incur an unacceptable runtime overhead. Avoiding overhead has always been a design goal of C++.
If you want bounds checking, you should use std::vector instead, which does provides it as an optional feature through std::vector::at().

In this case the behavior is undefined: according to the compiler / the current state of the memory / ..., it may seems to run fine, it may write corrupted chars, or it may crash because of a sefault.
Linking against Electric Fence or running in valgrind may help to find such errors at runtime.

When does the space occupied by a variable get deallocated in c++?

Doesn't the space occupied by a variable get deallocated as soon as the control is returned from the function??
I thought it got deallocated.
Here I have written a function which is working fine even after returning a local reference of an array from function CoinDenom,Using it to print the result of minimum number of coins required to denominate a sum.
How is it able to print the right answer if the space got deallocated??
int* CoinDenom(int CoinVal[],int NumCoins,int Sum) {
int min[Sum+1];
int i,j;
min[0]=0;
for(i=1;i<=Sum;i++) {
min[i]=INT_MAX;
}
for(i=1;i<=Sum;i++) {
for(j=0;j< NumCoins;j++) {
if(CoinVal[j]<=i && min[i-CoinVal[j]]+1<min[i]) {
min[i]=min[i-CoinVal[j]]+1;
}
}
}
return min; //returning address of a local array
}
int main() {
int Coins[50],Num,Sum,*min;
cout<<"Enter Sum:";
cin>>Sum;
cout<<"Enter Number of coins :";
cin>>Num;
cout<<"Enter Values";
for(int i=0;i<Num;i++) {
cin>>Coins[i];
}
min=CoinDenom(Coins,Num,Sum);
cout<<"Min Coins required are:"<< min[Sum];
return 0;
}

The contents of the memory taken by local variables is undefined after the function returns, but in practice it'll stay unchanged until something actively changes it.
If you change your code to do some significant work between populating that memory and then using it, you'll see it fail.

What you have posted is not C++ code - the following is illegal in C++:
int min[Sum+1];
But in general, your program exhibits undefined behaviour. That means anything could happen - it could even appear to work.

The space is "deallocated" when the function returns - but that doesn't mean the data isn't still there in memory. The data will still be on the stack until some other function overwrites it. That is why these kinds of bugs are so tricky - sometimes it'll work just fine (until all the sudden it doesn't)

You need to allocate memory on the heap for return variable.
int* CoinDenom(int CoinVal[],int NumCoins,int Sum) {
int *min= new int[Sum+1];
int i,j;
min[0]=0;
for(i=1;i<=Sum;i++) {
min[i]=INT_MAX;
}
for(i=1;i<=Sum;i++) {
for(j=0;j< NumCoins;j++) {
if(CoinVal[j]<=i && min[i-CoinVal[j]]+1<min[i]) {
min[i]=min[i-CoinVal[j]]+1;
}
}
}
return min; //returning address of a local array
}
min=CoinDenom(Coins,Num,Sum);
cout<<"Min Coins required are:"<< min[Sum];
delete[] min;
return 0;
In your case you able to see the correct values only, because no one tried to change it. In general this is unpredictable situation.

That array is on the stack, which in most implementations, is a pre-allocated contiguous block of memory. You have a stack pointer that points to the top of the stack, and growing the stack means just moving the pointer along it.
When the function returned, the stack pointer was set back, but the memory is still there and if you have a pointer to it, you could access it, but it's not legal to do so -- nothing will stop you, though. The memory values in the array's old space will change the next time the stack depth runs over the area where the array is.

The variable you use for the array is allocated on stack and stack is fully available to the program - the space is not blocked or otherwise hidden.
It is deallocated in the sense that it can be reused later for other function calls and in the sense that destructors get called for variables allocated there. Destructors for integers are trivial and don't do anything. That's why you can access it and it can happen that the data has not been overwritten yet and you can read it.

The answer is that there's a difference between what the language standard allows, and what turns out to work (in this case) because of how the specific implementation works.
The standard says that the memory is no longer used, and so must not be referenced.
In practice, local variables on the stack. The stack memory is not freed until the application terminates, which means you'll never get an access violation/segmentation fault for writing to stack memory. But you're still violating the rules of C++, and it won't always work. The compiler is free to overwrite it at any time.
In your case, the array data has simply not been overwritten by anything else yet, so your code appears to work. Call another function, and the data gets overwritten.

How is it able to print the right answer if the space got deallocated??
When memory is deallocated, it still exists, but it may be reused for something else.
In your example, the array has been deallocated but its memory hasn't yet been reused, so its contents haven't yet been overwritten with other values, which is why you're still able from it the values that you wrote.
The fact that it won't have been reused yet is not guaranteed; and the fact that you can even read from it at all after it's deallocated is also not guaranteed: so don't do it.

This might or might not work, behaviour is undefined and it's definitely wrong to do it like this. Most compilers also give a compiler warning, for example GCC:
test.cpp:8: warning: address of local variable `min' returned

Memory is like clay that never hardens. Allocating memory is like taking some clay out of the clay pot. Maybe you make a cat and a cheeseburger. When you have finished, you deallocate the clay by putting your figures back into the pot, but just being put into the pot does not make them lose their shape: if you or someone else looks into the pot, they will continue to observe your cat and cheeseburger sitting on the top of the clay stack until someone else comes along and makes them into something else.
The NAND gates in the memory chips are the clay, the strata that holds the NAND gates is the clay pot, and the particular voltages that represent the value of your variables are your sculptures. Those voltages do not change just because your program has taken them off the list of things it cares about.

You need to understand the stack. Add this function,
void f()
{
int a[5000];
memset( a, 0, sizeof(a) );
}
and then call it immediately after calling CoinDenom() but before writing to cout. You'll find that it no longer works.
Your local variables are stored on the stack. CoinDenom() returns a memory address that points into the stack. Very simplified and leaving out lots of details, say the stack pointer is pointing to 0x1000 just before you call CoinDenom. An int* (Coins) is pushed on the stack. This becomes CoinVal[]. Then an int, Num which becomes NumCoins. Then another int, Sum which becomes Sum. Thats 3 ints at 4 bytes/int. Then space for the local variables:
int min[Sum+1];
int i,j;
which would be (Sum + 3) * 4 bytes/int. Say Sum = 2, that gives us another 20 bytes total, so the stack pointer gets incremented by 32 bytes to 0x1020. (All of main's locals are below 0x1000 on the stack.) min is going to point to 0x100c. When CoinDenom() returns, the stack pointer is decremented "freeing" that memory but unless another function is called to have it's own locals allocated in that memory, nothing's going to happen to change what's stored in that memory.
See http://en.wikipedia.org/wiki/Calling_convention for more detail on how the stack is managed.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js