Stack overflow - static memory vs. dynamic memory

Stack overflow - static memory vs. dynamic memory - c++

If you write int m[1000000]; inside the main function of C/C++, it will get a runtime error for stack overflow. Instead if you write vector<int> m; and then push_back 1000000 elements there, it will run fine.
I am very curious about why this is happening. They both are local memory, aren't they? Thanks in advance.

Yes, the vector itself is an automatic (stack) object. But the vector holds a pointer to its contents (an internal dynamic array), and that will be allocated on the heap (by default). To simplify a little, you can think of vector as doing malloc/realloc or new[] calls internally (actually it uses an allocator).
EDIT: As I noted, automatic variables are allocated on the stack, while malloc generally allocates on the heap. The available memory for each are platform and even configuration-specific, but the available stack memory is typically much more limited.

The amount of stack memory is limited, because it has to be reserved in advance. However, the amount of heap memory will typically expend up to much higher limits imposed by your OS, but "nearly" reaching the limits of your virtual address space (2GB for a 32-bit machine, a whole lot more for a 64-bit machine).
You can increase the amount of reserved stack space, typically as a setting to your linker.

int m[1000000] - It will allocate the 1000000 ints on stack. as stack is limited so it will throw the stack overflow runtime error.
vector m; and then push_back of 1000000 elements are working because internally vector allocates the memory on heap not on stack. so in your application stack only the vector object is present so it is not throwing the stack overflow runtime error.

The vector object itself is on the stack; but internally it will allocate memory from the heap as needed to store an arbitary number of elements. So the stack cost is small and fixed for it.

Related

Why program crashes, when ifstream reads to a buffer greater than ~1MiB? [duplicate]

I'm having some troubles with a large array of vectors in C++.
Basicaly I wan't an array of 2 millions elements, and each elements is a vector<int> (It's for building an adjacency list).
So when I do vector<int> myList[10] it works great, but when i do vector<int> myList[2000000] it does not work and I don't know why.
I tried to do unsigned long int var = 2000000; vector<int> myList[var]; but still the same error. (I don't know what is the error, my program just crash)
If you have any idea,
Thanks

There's a big difference between heap and stack memory. The heap is the nice big space where you can dynamically allocate gigabytes of memory - the stack is much more constrained in terms of allocation size (and is determined at compile time).
If defining a local variable, that means it lives on the stack (like your array). With 2 million elements, that's at least 2MB being allocated (or assuming ~24B of stack usage per vector, more like 48MB), which is quite a lot for the stack, and likely causes the crash. Dynamically allocating an array of vectors (or preferably just allocating a vector of vectors) ensures that the bulk of the memory is being allocated from the heap, which might prevent this crash.
You can also increase the size of the stack using compiler flags, but that's generally not preferable to just dynamic allocation.

This is helpfull
Dynamically allocate memory for my_List. or
Declare your array of vector of int's(my_List) as a global variable and size a `const. Thier storage locations are by design big enough to allocate such large mermory size.
For local variable, the stack segment might be to small to allocate 2e6*24B.

When is heap memory prefered over stack memory

I know that local arrays are created on the stack, and have automatic storage duration, since they are destroyed when the function they're in ends. They necessarily have a fixed size:
{
int foo[16];
}
Arrays created with operator new[] have dynamic storage duration and are stored on the heap. They can have varying sizes.
{
const int size = 16;
int* foo = new int[size];
// do something with foo
delete[] foo;
}
The size of the stack is fixed and limited for every process.
My question is:
Is there a rule of thumb when to switch from stack memory to heap memory, in order to reduce the stack memory consumption?
Example:
double a[2] is perfectly reasoable;
double a[1000000000] will most likely result in a stack overflow, if the stack size is 1mb
Where is a reasonable limit to switch to dynamic allocation?

See this answer for a discussion about heap allocation.
Where is a reasonable limit to switch to dynamic allocation?
In several cases, including:
too large automatic variables. As a rule of thumb, I recommend avoiding call frames of more than a few kilobytes (and a call stack of more than a megabytes). That limit might be increased if you are sure that your function is not usable recursively. On many small embedded systems, the stack is much more limited (e.g. to a few kilobytes) so you need to limit even more each call frame (e.g. to only a hundred bytes). BTW, on some systems, you can increase the call stack limit much more (perhaps to several gigabytes), but this is also a sysadmin issue.
non LIFO allocation discipline, which happens quite often.
Notice that most C++ standard containers allocate their data in the heap, even if the container is on the stack. For example, an automatic variable of vector type, e.g. a local std::vector<double> autovec; has its data heap allocated (and released when the vector is destroyed). Read more about RAII.

Why can't we allocate dynamic memory on the stack?

Allocating stuff on the stack is awesome because than we have RAII and don't have to worry about memory leaks and such. However sometimes we must allocate on the heap:
If the data is really big (recommended) - because the stack is small.
If the size of the data to be allocated is only known at runtime (dynamic allocation).
Two questions:
Why can't we allocate dynamic memory (i.e. memory of size that is
only known at runtime) on the stack?
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
Edit: I know some compilers support Variable Length Arrays - which is dynamically allocated stack memory. But that's really an exception to the general rule. I'm interested in understanding the fundamental reasons for why generally, we can't allocate dynamic memory on the stack - the technical reasons for it and the rational behind it.

Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
It's more complicated to achieve this. The size of each stack frame is burned-in to your compiled program as a consequence of the sort of instructions the finished executable needs to contain in order to work. The layout and whatnot of your function-local variables, for example, is literally hard-coded into your program through the register and memory addresses it describes in its low-level assembly code: "variables" don't actually exist in the executable. To let the quantity and size of these "variables" change between compilation runs greatly complicates this process, though it's not completely impossible (as you've discovered, with non-standard variable-length arrays).
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable
This is just a consequence of the syntax. C++'s "normal" variables happen to be those with automatic or static storage duration. The designers of the language could technically have made it so that you can write something like Thing t = new Thing and just use a t all day, but they did not; again, this would have been more difficult to implement. How do you distinguish between the different types of objects, then? Remember, your compiled executable has to remember to auto-destruct one kind and not the other.
I'd love to go into the details of precisely why and why not these things are difficult, as I believe that's what you're after here. Unfortunately, my knowledge of assembly is too limited.

Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
Technically, this is possible. But not approved by the C++ standard. Variable length arrays(VLA) allows you to create dynamic size constructs on stack memory. Most compilers allow this as compiler extension.
example:
int array[n];
//where n is only known at run-time
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
We can. Whether you do it or not depends on implementation details of a particular task at hand.
example:
int i;
int *ptr = &i;

We can allocate variable length space dynamically on stack memory by using function _alloca. This function allocates memory from the program stack. It simply takes number of bytes to be allocated and return void* to the allocated space just as malloc call. This allocated memory will be freed automatically on function exit.
So it need not to be freed explicitly. One has to keep in mind about allocation size here, as stack overflow exception may occur. Stack overflow exception handling can be used for such calls. In case of stack overflow exception one can use _resetstkoflw() to restore it back.
So our new code with _alloca would be :
int NewFunctionA()
{
char* pszLineBuffer = (char*) _alloca(1024*sizeof(char));
…..
// Program logic
….
//no need to free szLineBuffer
return 1;
}

Every variable that has a name, after compilation, becomes a dereferenced pointer whose address value is computed by adding (depending on the platform, may be "subtracting"...) an "offset value" to a stack-pointer (a register that contains the address the stack actually is reaching: usually "current function return address" is stored there).
int i,j,k;
becomes
(SP-12) ;i
(SP-8) ;j
(SP-4) ;k
To let this "sum" to be efficient, the offsets have to be constant, so that they can be encode directly in the instruction op-code:
k=i+j;
become
MOV (SP-12),A; i-->>A
ADD A,(SP-8) ; A+=j
MOV A,(SP-4) ; A-->>k
You see here how 4,8 and 12 are now "code", not "data".
That implies that a variable that comes after another requires that "other" to retain a fixed compile-time defined size.
Dynamically declared arrays can be an exception, but they can only be that last variable of a function. Otherwise, all the variables that follows will have an offset that have to be adjusted run-time after that array allocation.
This creates the complication that dereferencing the addresses requires arithmetic (not just a plain offset) or the capability to modify the opcode as variables are declared (self modifying code).
Both the solution becomes sub-optimal in term of performance, since all can break the locality of the addressing, or add more calculation for each variable access.

Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
You can with Microsoft compilers using _alloca() or _malloca(). For gcc, it's alloca()
I'm not sure it's part of the C / C++ standards, but variations of alloca() are included with many compilers. If you need aligned allocation, such a "n" bytes of memory starting on a "m" byte boundary (where m is a power of 2), you can allocate n+m bytes of memory, add m to the pointer and mask off the lower bits. Example to allocate hex 1000 bytes of memory on a hex 100 boundary. You don't need to preserve the value returned by _alloca() since it's stack memory and automatically freed when the function exits.
char *p;
p = _alloca(0x1000+0x100);
(size_t)p = ((size_t)0x100 + (size_t)p) & ~(size_t)0xff;

Most important reason is that Memory used can be deallocated in any order but stack requires deallocation of memory in a fixed order i.e LIFO order.Hence practically it would be difficult to implement this.

Virtual memory is a virtualization of memory, meaning that it behaves as the resource it is virtualizing (memory). In a system, each process has a different virtual memory space:
32-bits programs: 2^32 bytes (4 Gigabytes)
64-bits programs: 2^64 bytes (16 Exabytes)
Because virtual space is so big, only some regions of that virtual space are usable (meaning that only some regions can be read/written just as if it were real memory). Virtual memory regions are initialized and made usable through mapping. Virtual memory does not consume resources and can be considered unlimited (for 64-bits programs) BUT usable (mapped) virtual memory is limited and use up resources.
For every process, some mapping is done by the kernel and other by the user code. For example, before even the code start executing, the kernel maps specific regions of the virtual memory space of a process for the code instructions, global variables, shared libraries, the stack space... etc. The user code uses dynamic allocation (allocation wrappers such as malloc and free), or garbage collectors (automatic allocation) to manage the virtual memory mapping at application-level (for example, if there is no enough free usable virtual memory available when calling malloc, new virtual memory is automatically mapped).
You should differentiate between mapped virtual memory (the total size of the stack, the total current size of the heap...) and allocated virtual memory (the part of the heap that malloc explicitly told the program that can be used)
Regarding this, I reinterpret your first question as:
Why can't we save dynamic data (i.e. data whose size is only known at runtime) on the stack?
First, as other have said, it is possible: Variable Length Arrays is just that (at least in C, I figure also in C++). However, it has some technical drawbacks and maybe that's the reason why it is an exception:
The size of the stack used by a function became unknown at compile time, this adds complexity to stack management, additional register (variables) must be used and it may impede some compiler optimizations.
The stack is mapped at the beginning of the process and it has a fixed size. That size should be increased greatly if variable-size-data is going to be placed there by default. Programs that do not make extensive use of the stack would waste usable virtual memory.
Additionally, data saved on the stack must be saved and deleted in Last-In-First-Out order, which is perfect for local variables within functions but unsuitable if we need a more flexible approach.
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable?
As this answer explains, we can.

Read a bit about Turing Machines to understand why things are the way they are. Everything was built around them as the starting point.
https://en.wikipedia.org/wiki/Turing_machine
Anything outside of this is technically an abomination and a hack.

Size of dynamic array vs static array

What is the maximum size of static array, and dynamic array? I think that there is no limit for dynamic array but why static arrays have a limited size?

Unhandled exception at 0x011164A7 in StackOverflow.exe: 0xC00000FD: Stack overflow (parameters: 0x00000000, 0x00482000)
This looks more like a runtime error. More precisely - stack overflow.
In most places the size of array is limited only by available memory. However, the limit on stack allocated objects is usually much more severe. By default, it's 1Mb on Windows and 8Mb on Linux. It looks like your array and other data already on the stack is taking more space than the limit.
There are few ways to avoid this error:
Make this array static or declare it at top level of your module. This way it will be allocated in .bss segment instead of stack.
Use malloc/new to explicitly allocate this array on heap.
Use C++ collections such as std::vector instead of arrays.
Increase stack size limit. On Linux this can be done with ulimit -s unlimited

The maximum size of an array is determined by the amount of memory that a program can access. On a 32-bit system, the maximum amount of memory that can be addressed by a pointer is 2^32 bytes which is 4 gigabytes. The actual limit may be less, depending on operating system implementation details.
Note that this has nothing to do with the amount of physical memory you have available. Even on a machine with substantially less than 1 GB of RAM, you can allocate a 2 GB array... it's just going to be slow, as most of the array will be in virtual memory, swapped out to disk.

C++ do stack memory allocation/deallocation defragement memory

C++ do stack memory allocation/deallocation defragement memory.
when i declare local memory variables, they are allocated and deallocated, does it makes memory fragemented?
this can be very important from memory point of view
how much memory is availabe for stack?
can I allocate
char sam[999999999999999];

No, memory is not fragmented. How much memory you can allocate on stack is implementation-defined, usually something around 1 megabyte.

Allocating builtins on the stack shouldn't result in fragmentation. However if you allocate something like string on the stack, while the stack itself won't get fragmented the string allocates memory on the heap which could wind up fragmented.
Generally the stack is extremely small compared to the heap - something like 1-64MB depending on your platform.

how much memory is available for stack?
It depends.
It depends on the compiler and it depends on the parameters you launch your binary with (since compilers may decide to defer the definition of the stack size to the runtime). It also depends on the OS and the available resources.
One point of interest, gcc is working on SplitStacks. A number of languages already offer this (Go for example), the idea is that the stack can then grow on demand. At this point, the limit becomes: how much can the OS allocate in one go ?
I haven't experimented it yet... don't even know if this is fully implemented.

By doing allocation and deallocation on stack memory will not be frgameneted as how much memory is required for a stack variable is known at the compile time itself. Regarding the maximum amount of memory you can allocate on stack depends on the compiler. By default, for VS compilers its 1 MB and can be chnaged through compiler options.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js