Confusion about stack growth and addressing - c++

I am trying to better understand items on a stack and how they are addressed. The article I found here seems to indicate that when MIPS stack is initialized, a fixed amount of memory is allocated and the stack grows down to the stack limit which would appear to be smaller addresses. I would assume that based on this logic a stack overflow would occur when 0x0000 was traversed?
I realize MIPS is big endian, but does that change how the stack grows? I wrote what I believed would be a quick way to observe this on an x86_64 machine, but the stack appears to grow up, as I originally assumed it did.
#include <iostream>
#include <vector>
int main() {
std::vector<int*> v;
for( int i = 0; i < 10; i++ ) {
v.push_back(new int);
std::cout << v.back() << std::endl;
}
}
I'm also confused by the fact that not all of the memory address's do not appear to be contiguous, which makes me think I did something stupid. Could somebody please clarify?

The stack on x86 machines also grows downwards. Endianness is unrelated to the direction in which the stack grows.
The stack of a machine has absolutely nothing to do with std::vector<>. Also, new int allocates heap memory, so it tells you absolutely nothing about the stack.
In order to see in which direction the stack grows, you need to do something like this:
recursive( 5 );
void recursive( int n )
{
if( n == 0 )
return;
int a;
printf( "%p\n", &a );
recursive( n - 1 );
}
(Note that if your compiler is smart enough to optimize tail recursion, then you will need to tell it to not optimize it, otherwise the observations will be all wrong.)

essentially there are 3 types of memory you use in programming: static, dynamic/heap and stack.
Static memory is pre-allocated by the compiler and consist of the constants and variables declared statically in your program.
Heap is the memory which you can freely allocate and release
Stack is the memory which gets allocated for all local variables declared in a function. This is important because every time you call the function a new memory for its variables is allocated. So, that every call to a function will assure that it has its own unique copy of the variables. And every time you return from the function the memory gets freed.
It absolutely does not matter how the stack is managed as soon as it follows the above rules. It is convenient however to have program memory to be allocated in the lower address space and grow up, and the stack to start from a top memory space and grow down. Most systems implement this scheme.
In general there is a stack pointer register/variable which points so the current stack address. when a function gets called it decrease this address by the number of bytes it needs for its variables. when it calls the next function, this new one will start with the new pointer already decreased by the caller. When the function returns it restores the pointer which it started from.
There could be different schemes but as far as I know, mips and i86 follow this one.
And essentially there is only one virtual memory space in the program. This is up to the operating system and/or compiler how to use it. The compiler will split the memory in the logical regions for its own use and handle them, hopefully, according to the calling conventions defined in the platform documents.
So, in our example, v and i are allocated on the function stack. cout is static. every new int allocate space in heap. v is not a simple variable but a struct which contains fields which it needs to manage the list. it needs space for all these internals. So, every push_back modifies those fields to point to the allocated 'int' in some way. push_back() and back() are function calls and allocate their own stacks for internal variables to not interfere with the top function.

Related

stack overflow eror in c++ [duplicate]

I am using Dev C++ to write a simulation program. For it, I need to declare a single dimensional array with the data type double. It contains 4200000 elements - like double n[4200000].
The compiler shows no error, but the program exits on execution. I have checked, and the program executes just fine for an array having 5000 elements.
Now, I know that declaring such a large array on the stack is not recommended. However, the thing is that the simulation requires me to call specific elements from the array multiple times - for example, I might need the value of n[234] or n[46664] for a given calculation. Therefore, I need an array in which it is easier to sift through elements.
Is there a way I can declare this array on the stack?
No there is no(we'll say "reasonable") way to declare this array on the stack. You can however declare the pointer on the stack, and set aside a bit of memory on the heap.
double *n = new double[4200000];
accessing n[234] of this, should be no quicker than accessing n[234] of an array that you declared like this:
double n[500];
Or even better, you could use vectors
std::vector<int> someElements(4200000);
someElements[234];//Is equally fast as our n[234] from other examples, if you optimize (-O3) and the difference on small programs is negligible if you don't(+5%)
Which if you optimize with -O3, is just as fast as an array, and much safer. As with the
double *n = new double[4200000];
solution you will leak memory unless you do this:
delete[] n;
And with exceptions and various things, this is a very unsafe way of doing things.
You can increase your stack size. Try adding these options to your link flags:
-Wl,--stack,36000000
It might be too large though (I'm not sure if Windows places an upper limit on stack size.) In reality though, you shouldn't do that even if it works. Use dynamic memory allocation, as pointed out in the other answers.
(Weird, writing an answer and hoping it won't get accepted... :-P)
Yes, you can declare this array on the stack (with a little extra work), but it is not wise.
There is no justifiable reason why the array has to live on the stack.
The overhead of dynamically allocating a single array once is neglegible (you could say "zero"), and a smart pointer will safely take care of not leaking memory, if that is your concern.
Stack allocated memory is not in any way different from heap allocated memory (apart from some caching effects for small objects, but these do not apply here).
Insofar, just don't do it.
If you insist that you must allocate the array on the stack, you will need to reserve 32 megabytes of stack space first (preferrably a bit more). For that, using Dev-C++ (which presumes Windows+MingW) you will either need to set the reserved stack size for your executable using compiler flags such as -Wl,--stack,34000000 (this reserves somewhat more than 32MiB), or create a thread (which lets you specify a reserved stack size for that thread).
But really, again, just don't do that. There's nothing wrong with allocating a huge array dynamically.
Are there any reasons you want this on the stack specifically?
I'm asking because the following will give you a construct that can be used in a similar way (especially accessing values using array[index]), but it is a lot less limited in size (total max size depending on 32bit/64bit memory model and available memory (RAM and swap memory)) because it is allocated from the heap.
int arraysize= 4200000;
int *heaparray= new int[arraysize];
...
k= heaparray[456];
...
delete [] heaparray;
return;

Large number of null variables with the total size exceeding the memory in C

I have a very basic question related to NULL variables in C. Consider a hypothetical 64-bit system with very limited memory say 4KB and with a large number of integer pointers all set to NULL, such that the total size exceeds the available memory. Will such a program compile and execute?
Assume that the program doesn't have to do anything meaningful, just do declarations to a bunch of null integer pointers(of the sort int *x = NULL) and terminate.
Even though you did this:
int *x = NULL;
memory is still allocated for storing the pointer x (despite there being NULL on the right hand side). Memory in such case, if x is automatic variable was allocated on the stack.
If you had used malloc on the right hand side you would additionally have claimed memory from the heap.
Now if you create many such pointers which will exceed available stack memory you will get stack overflow on run time - but if you don't use these pointers they might as well get optimized away.
If you declare but don't use a variable which has no side effects the compiler will optimize it out of existence. So no, this is not a way to go out of memory.
If you don't have optimizations turned on, you could create enough variables on the stack to cause a stack overflow. You could also just create a really big array on the stack.
That said, it's quite easy to run out of memory, and you don't need to do it with copious quantities of int pointers. No matter how you manage to run out of memory, it won't stop you from compiling the program successfully.

Why can't we allocate dynamic memory on the stack?

Allocating stuff on the stack is awesome because than we have RAII and don't have to worry about memory leaks and such. However sometimes we must allocate on the heap:
If the data is really big (recommended) - because the stack is small.
If the size of the data to be allocated is only known at runtime (dynamic allocation).
Two questions:
Why can't we allocate dynamic memory (i.e. memory of size that is
only known at runtime) on the stack?
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
Edit: I know some compilers support Variable Length Arrays - which is dynamically allocated stack memory. But that's really an exception to the general rule. I'm interested in understanding the fundamental reasons for why generally, we can't allocate dynamic memory on the stack - the technical reasons for it and the rational behind it.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
It's more complicated to achieve this. The size of each stack frame is burned-in to your compiled program as a consequence of the sort of instructions the finished executable needs to contain in order to work. The layout and whatnot of your function-local variables, for example, is literally hard-coded into your program through the register and memory addresses it describes in its low-level assembly code: "variables" don't actually exist in the executable. To let the quantity and size of these "variables" change between compilation runs greatly complicates this process, though it's not completely impossible (as you've discovered, with non-standard variable-length arrays).
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable
This is just a consequence of the syntax. C++'s "normal" variables happen to be those with automatic or static storage duration. The designers of the language could technically have made it so that you can write something like Thing t = new Thing and just use a t all day, but they did not; again, this would have been more difficult to implement. How do you distinguish between the different types of objects, then? Remember, your compiled executable has to remember to auto-destruct one kind and not the other.
I'd love to go into the details of precisely why and why not these things are difficult, as I believe that's what you're after here. Unfortunately, my knowledge of assembly is too limited.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
Technically, this is possible. But not approved by the C++ standard. Variable length arrays(VLA) allows you to create dynamic size constructs on stack memory. Most compilers allow this as compiler extension.
example:
int array[n];
//where n is only known at run-time
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
We can. Whether you do it or not depends on implementation details of a particular task at hand.
example:
int i;
int *ptr = &i;
We can allocate variable length space dynamically on stack memory by using function _alloca. This function allocates memory from the program stack. It simply takes number of bytes to be allocated and return void* to the allocated space just as malloc call. This allocated memory will be freed automatically on function exit.
So it need not to be freed explicitly. One has to keep in mind about allocation size here, as stack overflow exception may occur. Stack overflow exception handling can be used for such calls. In case of stack overflow exception one can use _resetstkoflw() to restore it back.
So our new code with _alloca would be :
int NewFunctionA()
{
char* pszLineBuffer = (char*) _alloca(1024*sizeof(char));
…..
// Program logic
….
//no need to free szLineBuffer
return 1;
}
Every variable that has a name, after compilation, becomes a dereferenced pointer whose address value is computed by adding (depending on the platform, may be "subtracting"...) an "offset value" to a stack-pointer (a register that contains the address the stack actually is reaching: usually "current function return address" is stored there).
int i,j,k;
becomes
(SP-12) ;i
(SP-8) ;j
(SP-4) ;k
To let this "sum" to be efficient, the offsets have to be constant, so that they can be encode directly in the instruction op-code:
k=i+j;
become
MOV (SP-12),A; i-->>A
ADD A,(SP-8) ; A+=j
MOV A,(SP-4) ; A-->>k
You see here how 4,8 and 12 are now "code", not "data".
That implies that a variable that comes after another requires that "other" to retain a fixed compile-time defined size.
Dynamically declared arrays can be an exception, but they can only be that last variable of a function. Otherwise, all the variables that follows will have an offset that have to be adjusted run-time after that array allocation.
This creates the complication that dereferencing the addresses requires arithmetic (not just a plain offset) or the capability to modify the opcode as variables are declared (self modifying code).
Both the solution becomes sub-optimal in term of performance, since all can break the locality of the addressing, or add more calculation for each variable access.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
You can with Microsoft compilers using _alloca() or _malloca(). For gcc, it's alloca()
I'm not sure it's part of the C / C++ standards, but variations of alloca() are included with many compilers. If you need aligned allocation, such a "n" bytes of memory starting on a "m" byte boundary (where m is a power of 2), you can allocate n+m bytes of memory, add m to the pointer and mask off the lower bits. Example to allocate hex 1000 bytes of memory on a hex 100 boundary. You don't need to preserve the value returned by _alloca() since it's stack memory and automatically freed when the function exits.
char *p;
p = _alloca(0x1000+0x100);
(size_t)p = ((size_t)0x100 + (size_t)p) & ~(size_t)0xff;
Most important reason is that Memory used can be deallocated in any order but stack requires deallocation of memory in a fixed order i.e LIFO order.Hence practically it would be difficult to implement this.
Virtual memory is a virtualization of memory, meaning that it behaves as the resource it is virtualizing (memory). In a system, each process has a different virtual memory space:
32-bits programs: 2^32 bytes (4 Gigabytes)
64-bits programs: 2^64 bytes (16 Exabytes)
Because virtual space is so big, only some regions of that virtual space are usable (meaning that only some regions can be read/written just as if it were real memory). Virtual memory regions are initialized and made usable through mapping. Virtual memory does not consume resources and can be considered unlimited (for 64-bits programs) BUT usable (mapped) virtual memory is limited and use up resources.
For every process, some mapping is done by the kernel and other by the user code. For example, before even the code start executing, the kernel maps specific regions of the virtual memory space of a process for the code instructions, global variables, shared libraries, the stack space... etc. The user code uses dynamic allocation (allocation wrappers such as malloc and free), or garbage collectors (automatic allocation) to manage the virtual memory mapping at application-level (for example, if there is no enough free usable virtual memory available when calling malloc, new virtual memory is automatically mapped).
You should differentiate between mapped virtual memory (the total size of the stack, the total current size of the heap...) and allocated virtual memory (the part of the heap that malloc explicitly told the program that can be used)
Regarding this, I reinterpret your first question as:
Why can't we save dynamic data (i.e. data whose size is only known at runtime) on the stack?
First, as other have said, it is possible: Variable Length Arrays is just that (at least in C, I figure also in C++). However, it has some technical drawbacks and maybe that's the reason why it is an exception:
The size of the stack used by a function became unknown at compile time, this adds complexity to stack management, additional register (variables) must be used and it may impede some compiler optimizations.
The stack is mapped at the beginning of the process and it has a fixed size. That size should be increased greatly if variable-size-data is going to be placed there by default. Programs that do not make extensive use of the stack would waste usable virtual memory.
Additionally, data saved on the stack must be saved and deleted in Last-In-First-Out order, which is perfect for local variables within functions but unsuitable if we need a more flexible approach.
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable?
As this answer explains, we can.
Read a bit about Turing Machines to understand why things are the way they are. Everything was built around them as the starting point.
https://en.wikipedia.org/wiki/Turing_machine
Anything outside of this is technically an abomination and a hack.

unable to initialize double array

I have 4 gig RAM and the following lines that throws a stackoverflow exception:
int main()
{
double X[4096*512], Y[4096*512], Z[4096*512];
return 0;
}
each double takes 8 bytes space, so my three arrays should be 3*4096*512*8/1024/1024 = 48 Mbyte big, can somebody explain the error or is 48 Mbyte too much to handle?
You are declaring in the stack, normally the stack in OS are limited (eg: 1MB), you could expanded when compiling (eg: in GCC use -Wl,stack_size,134217728 128Mb) but don't recommend.
Better use std::vector<double>.
#include <vector>
int main() {
std::vector<double> X(4096*512), Y(4096*512), Z(4096*512);
return 0;
}
If you want to avoid the overhead of std::vector, you can allocate the arrays on the heap
double *myArray = new double[SIZE];
and remember to free them.
delete [] myArray;
Expanding my comment (not a thorough description, you should do more research), there are two types of memory, the "stack" and the "heap". The stack is the local working memory of a piece of software, while the heap is the big pool. The stack holds all the local variables of a function call. Anything local declared within the function will be stored on the stack. The stack had a nifty property that it can be "pushed": when we call the next function, we go further down the stack and start over. But this means that the amount of stack memory needs to be generally reserved for the lifetime of a program, so we limit it to a small amount: general 1 to a few megabytes.
When you run out of stack memory, you get a "Stack Overflow" ("Now thats what that means!")
The heap is kindof the rest of the memory. The heap is where the program stores any dynamic memory and (possibly, it can be more complicated) global variables. The heap is where things like malloc and new puts its memory.
Whenever you declare a local variable, it is stored on the stack. This isn't a problem for small variables, but arrays like the ones you have get HUGE easy.
If you don't want to worry about new or malloc you can use things like std::vector, which will put the large amount of data on the heap while giving you local variable semantics.
Again, this is "basic programming" so you should get really familiar with this subject.

What do C++ arrays init to?

So I can fix this manually so it isn't an urgent question but I thought it was really strange:
Here is the entirety of my code before the weird thing that happens:
int main(int argc, char** arg) {
int memory[100];
int loadCounter = 0;
bool getInput = true;
print_memory(memory);
and then some other unrelated stuff.
The print memory just prints the array which should've initialized to all zero's but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(the plus and the filler zeros are just for formatting since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros)
It also isn't memory leaking because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?
Since you asked "What do C++ arrays init to?", the answer is they init to whatever happens to be in the memory they have been allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.
The array you are using is stack allocated:
int memory[100];
When the particular function scope exits (In this case main) or returns, the memory will be reclaimed and it will not leak. This is how stack allocated memory works. In this case you allocated 100 integers (32 bits each on my compiler) on the stack as opposed to on the heap. A heap allocation is just somewhere else in memory hopefully far far away from the stack. Anyways, heap allocated memory has a chance for leaking. Low level Plain Old Data allocated on the stack (like you wrote in your code) won't leak.
The reason you got random values in your function was probably because you didn't initialize the data in the 'memory' array of integers. In release mode the application or the C runtime (in windows at least) will not take care of initializing that memory to a known base value. So the memory that is in the array is memory left over from last time the stack was using that memory. It could be a few milli-seconds old (most likely) to a few seconds old (less likely) to a few minutes old (way less likely). Anyways, it's considered garbage memory and it's to be avoided at all costs.
The problem is we don't know what is in your function called print_memory. But if that function doesn't alter the memory in any ways, than that would explain why you are getting seemingly random values. You need to initialize those values to something first before using them. I like to declare my stack based buffers like this:
int memory[100] = {0};
That's a shortcut for the compiler to fill the entire array with zero's.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
Not sure what compiler you are using, but if you are using a microsoft compiler with visual studio you should be just fine.
In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide abstractions (I mean hardware abstractions, not language abstractions like OO features). They are dessigned to work close to metal that is, the language uses the hardware directly (Memory in this case) without abstractions.
That means when you declare a local variable, int a for example, what the compiler does is to say "Ok, im going to interpret the chunk of memory [A,A + sizeof(int)] as an integer, which I call 'a'" (Where A is the offset between the beginning of that chunk and the start address of function's stack frame).
As you can see, the compiler only "assigns" memory-segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed in a machine, and the machine has only a memory and a CPU. There is no magic.
So what is the value of a variable when the function execution starts? The value represented with the data which the chunk of memory of the variable has. Commonly, that data has no sense from our current point of view (Could be part of the data used previously by a string, for example), so when you access that variable you get extrange values. Thats what we call "garbage": Data previously written which has no sense in our context.
The same applies to an array: An array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A,A + (length of the array)*sizeof(type of array elements)]. So as in the variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You could achieve that using an initialiser list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the memory-chunk which the array is with that values.
Sidenote: Non-POD types and static storage
The things explained above only applies to POD types such as basic types and arrays of basic types. With non-POD types like classes the compiler adds calls to the constructor of the variables, which are designed to initialise the values (attributes) of a class instance.
In addition, even if you use POD types, if variables have static storage specification, the compiler initializes its memory with a default value, because static variables are allocated at program start.
the local variable on stack is not initialized in c/c++. c/c++ is designed to be fast so it doesn't zero stack on function calls.
Before main() runs, the language runtime sets up the environment. Exactly what it's doing you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate your stack space on entering main is not guaranteed clean.
Anything that needs clean stack or malloc or new space gets to clean it itself. Plenty of things don't. C[++] isn't in the business of doing unnecessary things. In C++ a class object can have non-trivial constructors that run implicitly, those guarantee the object's set up for use, but arrays and plain scalars don't have constructors, if you want an inital value you have to declare an initializer.