What are some of the technical differences between memory that is allocated with the new operator and memory that is allocated via a simple variable declaration, such as int var? Does C++ have any form of automatic memory management?
In particular, I have a couple questions. First, since with dynamic memory you have to declare a pointer to store the address of the actual memory you work with, doesn't dynamic memory use more memory? I don't see why the pointer is necessary at all unless you're declaring an array.
Secondly, if I were to make a simple function such as this:
int myfunc() { int x = 2; int y = 3; return x+y; }
...And call it, would the memory allocated by the function be freed as soon as its scope of existence has ended? What about with dynamic memory?
Note: This answer is way too long. I'll pare it down sometime. Meanwhile, comment if you can think of useful edits.
To answer your questions, we first need to define two areas of memory called the stack and the heap.
The stack
Imagine the stack as a stack of boxes. Each box represents the execution of a function. At the beginning, when main is called, there is one box sitting on the floor. Any local variables you define are in that box.
A simple example
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return a + b;
}
In this case, you have one box on the floor with the variables argc (an integer), argv (a pointer to the first element of an array of char pointers), a (an integer), and b (an integer).
More than one box
int do_stuff(int a, int b); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return do_stuff(a, b);
}
int do_stuff(int a, int b)
{
int c = a + b;
c++;
return c;
}
Now, you have a box on the floor (for main) with argc, argv, a, and b. On top of that box, you have another box (for do_stuff) with a, b, and c.
This example illustrates two interesting effects.
As you probably know, a and b were passed-by-value. That's why there is a copy of those variables in the box for do_stuff.
Notice that you don't have to free or delete or anything for these variables. When your function returns, the box for that function is destroyed.
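To make the two effects concrete, here is a minimal sketch. The names add_and_bump and caller_a_after_call are made up for illustration and are not part of the example above. The callee changes its own copy of a, and the caller's a is unaffected.

```cpp
#include <cassert>

// Hypothetical helper: receives its own copies of a and b in its own box.
int add_and_bump(int a, int b)
{
    a += 10;        // changes only this frame's copy of a
    int c = a + b;  // c lives in this frame and disappears on return
    return c;       // the value is copied out to the caller
}

// Hypothetical caller: its a and b are untouched by the call.
int caller_a_after_call()
{
    int a = 3;
    int b = 4;
    int result = add_and_bump(a, b);
    assert(result == 17);
    return a;       // still 3, the callee only modified its copy
}
```

Nothing here is freed by hand: every variable goes away when its function returns.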
Box overflow
int do_stuff(int a, int b); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return do_stuff(a, b);
}
int do_stuff(int a, int b)
{
return do_stuff(a, b);
}
Here, you have a box on the floor (for main, as before). Then, you have a box (for do_stuff) with a and b. Then, you have another box (for do_stuff calling itself), again with a and b. And then another. And soon, you have a stack overflow.
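For contrast, a sketch of recursion that does stop. The extra depth parameter is an assumption added for illustration; it acts as a base case, so only a bounded number of boxes are ever stacked.

```cpp
// Hypothetical variant of do_stuff with a depth limit (the base case).
int do_stuff(int a, int b, int depth)
{
    if (depth == 0) {
        return a + b;                 // base case: stop stacking boxes
    }
    return do_stuff(a, b, depth - 1); // one more box, but at most `depth` of them
}
```

Calling do_stuff(3, 4, 100) stacks about a hundred frames and then unwinds them one by one as each call returns.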
Summary of the stack
Think of the stack as a stack of boxes. Each box represents a function executing, and that box contains the local variables defined in that function. When the function returns, that box is destroyed.
More technical stuff
Each "box" is officially called a stack frame.
Ever notice how your variables have "random" default values? When an old stack frame is "destroyed", it just stops being relevant. It doesn't get zeroed out or anything like that. The next time a stack frame uses that section of memory, you see bits of old stack frame in your local variables. (Strictly speaking, reading an uninitialized local variable is undefined behavior, so initialize your variables before using them.)
The heap
This is where dynamic memory allocation comes into play.
Imagine the heap as an endless green meadow of memory. When you call malloc or new, a block of memory is allocated in the heap. You are given a pointer to access this block of memory.
int main(int argc, char * argv[])
{
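int * a = new int(7); // allocate one int on the heap and initialize it
int result = *a;
delete a; // release the heap memory
return result;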
}
Here, a new integer's worth of memory is allocated on the heap. You get a pointer named a that points to that memory.
a is a local variable, and so it is in main's "box"
Rationale for dynamic memory allocation
Sure, using dynamically allocated memory seems to waste a few bytes here and there for pointers. However, there are things that you just can't (easily) do without dynamic memory allocation.
Returning an array
int * create_array(); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int * intarray = create_array();
return intarray[0];
}
int * create_array()
{
int intarray[5];
intarray[0] = 0;
return intarray;
}
What happens here? You "return an array" in create_array. In actuality, you return a pointer, which just points to the part of the create_array "box" that contains the array. What happens when create_array returns? Its box is destroyed, and you can expect your array to become corrupt at any moment.
Instead, use dynamically allocated memory.
int * create_array(); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int * intarray = create_array();
int return_value = intarray[0];
delete[] intarray;
return return_value;
}
int * create_array()
{
int * intarray = new int[5];
intarray[0] = 0;
return intarray;
}
Because function returning does not modify the heap, your precious intarray escapes unscathed. Remember to delete[] it after you're done though.
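As an alternative that this answer does not cover, std::vector keeps the array on the heap but handles the delete[] for you. A minimal sketch, where create_vector is a hypothetical replacement for create_array:

```cpp
#include <vector>

// Hypothetical replacement for create_array: the vector owns its heap buffer.
std::vector<int> create_vector()
{
    std::vector<int> values(5); // five ints, value-initialized to 0
    values[0] = 42;
    return values;              // moved (or elided) out, the buffer is not copied
}
```

The caller simply writes std::vector<int> v = create_vector(); and the buffer is released when v goes out of scope.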
Dynamic memory lives on the heap as opposed to the stack. The lifetime of dynamic memory is from the time of allocation, to the time of deallocation. With local variables, their lifetime is limited to the function / block they are defined in.
Regarding your question about the memory usage in the function, in your example the memory for your local variables would be freed at the end of the function. However, if the memory was dynamically allocated with new, it would not be automatically disposed, and you would be responsible for explicitly using delete to free the memory.
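Regarding automatic memory management, the C++ Standard Library provides smart pointers for this. The original one was auto_ptr, which was deprecated in C++11 and removed in C++17; in current code use std::unique_ptr or std::shared_ptr instead.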
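A minimal sketch of the smart-pointer approach, using std::unique_ptr and std::make_unique (C++14). The function name sum_with_unique_ptr is made up; it mirrors myfunc from the question but puts the two ints on the heap.

```cpp
#include <memory>

// Hypothetical heap-based version of myfunc from the question.
int sum_with_unique_ptr()
{
    std::unique_ptr<int> x = std::make_unique<int>(2);
    std::unique_ptr<int> y = std::make_unique<int>(3);
    return *x + *y; // both ints are deleted automatically when x and y go out of scope
}
```

There is no delete anywhere: the unique_ptr destructors run at the end of the function, even if an exception is thrown.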
Memory allocated by "new" ends up on the heap.
Local variables declared in a function reside in that function's frame, which is placed on the stack when the function is called.
Read about stack vs heap allocation here: http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html
Memory allocated with the new operator is taken from a memory section called the "heap", while ordinary local variables use a memory section shared with procedure/function calls (the "stack").
You only need to worry about the dynamic memory allocations you made yourself with new. Variables whose size is known at compile time (defined in the source) are automatically freed at the end of their scope (end of function/procedure, block, ...).
The big difference between "dynamic" and "ordinary" memory is reflected rather well in the question itself.
C++ gives you very little built-in support for dynamic memory.
When you use dynamic memory, you are entirely responsible for it yourself. You have to allocate it. If you forget to do so and try to access it through your pointer, you will get plenty of unpleasant surprises. You also have to free the memory, and if you forget that for any reason, you will get even more surprises. Such errors are among the most difficult to find in C/C++ programs.
You need an extra pointer because you need some way to reach your new memory. A block of memory (dynamic or not) is, by itself, nothing a programming language can work with. You need a way to access it, and that is done through variables. But variables in languages like C++ are stored in "ordinary" memory. So you need "pointers": a pointer is a form of indirection that says "No, I am not the value you are looking for, but I point to it". Pointers are the only way to access dynamic memory in C++.
By contrast, "ordinary" memory can be accessed directly, and allocation and freeing are done automatically by the language itself.
Dynamic memory and pointers are the biggest source of problems in C++, but they are also a very powerful concept: when you use them correctly, you can do much more than with ordinary memory.
That is also the reason plenty of libraries have functions or whole modules for dealing with dynamic memory. The auto_ptr example mentioned in a parallel answer is one such attempt: it addresses the problem that dynamic memory should be reliably released at the end of a function.
Normally you will use dynamic memory only when you really need it. You will not use it for a single integer variable, but for arrays or for building larger data structures in memory.
Related
I am new to C and C++. I understand that whenever a function is called, its variables get memory allocated on the stack, that includes the case where the variable happens to be a pointer that points to data allocated on the heap via malloc or new (but I heard it is not guaranteed that the storage allocated by malloc is 100% on the Heap, please correct me if I am wrong). For example,
void fn() {
Member *p = new Member();
}
Or
void fn() {
int *p = (int*) malloc( sizeof(int) * 10 );
}
Please correct if I am wrong, in both cases, variable p (which holds the address to the object allocated on the heap) is on the stack, and it points to the object on the heap.
So is it correct to say that all the variables we declare are on the stack even though they might point to something on the heap?
Let’s say the address of local variable pointer p is loaded at memory address 001, it has the address of the member object located on Heap, and that address is 002. We can draw a diagram like this.
If that is correct, my next question is, can we have a pointer that is actually located on the heap, and it points to a variable located on Stack? If it is not possible, can that pointer point to a variable located on Heap?
Maybe another way to phrase this question is: in order to access something in heap, we can only access it via pointers on the stack??
A possible diagram could look like this
If that is possible, Can I have an example here?
Yes, you can put your pointer on the free store (heap) and have it point to a variable on the stack. The trick is to create a pointer to a pointer (int**):
int main()
{
int i = 0; // int on the stack
int** ip = new int*; // create an int* (int pointer) on the free store (heap)
// ip (the int**) is still on the stack
*ip = &i;
// Now your free store (heap) located pointer points
// to your stack based variable i
delete ip; // clean up
}
NOTE: The terms "heap" and "stack" are general, well understood, computing terms. In C++ they are referred to in the Standard as the "free store" and (although not directly named) a "stack" is 100% implied (eg. through references to "stack-unwinding") and therefore required.
stack and heap are not specifically defined by the standard. Those are implementation details.
In this context, "heap" refers to the region of memory from which an allocator hands out dynamically allocated blocks, keeping track of which parts are in use and which are free. Read more here
Here is a diagram for a simple heap so that you can have a mental model of it:
Keep in mind that this is not exactly what real allocators use. Despite the shared name, the free store is generally not implemented with the heap (priority queue) data structure. Allocators typically use free lists, size-class bins and similar techniques, and the details differ between operating systems and runtime libraries.
Whereas a stack is much simpler:
can we have a pointer that is actually located on the heap, and it points to a variable located on Stack?
Yes, it's possible but rarely needed:
#include <iostream>
int main( )
{
int a_variable_on_stack { 5 };
int** ptr_on_stack { new int*( &a_variable_on_stack ) };
std::cout << "address of `a_variable_on_stack`: " << &a_variable_on_stack << '\n'
<< "address of ptr on the heap: " << ptr_on_stack << '\n'
<< "value of ptr on the heap: " << *ptr_on_stack << '\n';
std::cin.get( );
}
Possible output:
address of `a_variable_on_stack`: 0x47eb5ffd2c
address of ptr on the heap: 0x1de33cc3810
value of ptr on the heap: 0x47eb5ffd2c
Notice how the address of a_variable_on_stack and value of ptr stored on heap are both 0x47eb5ffd2c. In other words, a pointer on the heap is holding the address of a variable that is on the stack.
In short:
Variables declared within a function are allocated on the stack, and can point to whatever you want (to the addresses of other variables on the stack or of variables on the heap).
The same goes for variables allocated on the heap. They can point to the addresses of other variables on the heap or of variables on the stack. There is no limitation here.
However, variables declared on the stack are temporary by nature, and when the function returns, their memory is reclaimed. Therefore it is not good practice to keep pointers to stack variables unless you know the function has not finished yet (i.e. you use a local variable's address within the same function, or in functions called from that function). A common mistake of novice C/C++ developers is to return the address of a variable declared on the stack. When the function returns, this memory is reclaimed and will soon be reused for other function calls, so accessing this address is undefined behavior.
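A small sketch of the safe case described above: passing the address of a local variable down into a function that is called while the owning function is still running. The names set_to_seven and caller are hypothetical.

```cpp
// Hypothetical callee: writes through a pointer it was handed.
void set_to_seven(int* target)
{
    *target = 7;
}

// Hypothetical caller: hands out the address of its own local variable.
int caller()
{
    int local = 0;
    set_to_seven(&local); // fine: caller's frame is still alive during the call
    return local;         // returned by value, so nothing dangles afterwards
}
```

The pointer never outlives local, which is what makes this pattern safe. Returning &local from caller would be the novice mistake mentioned above.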
I am new to C and C++.
Your question is not C or C++ specific, but it is about programming languages in general.
... whenever a function is called, its variables get memory allocated on the stack ...
This is correct: Nearly all compilers do it this way.
However, there are exceptions - for example on SPARC or TriCore CPUs, which have a special feature...
... allocated on the heap via malloc ...
malloc never allocates memory on the stack but on the heap.
... is not guaranteed that the storage allocated by malloc is 100% on the heap ...
Unlike the word "stack", the meaning of the word "heap" differs a bit from situation to situation.
In some cases, the word "heap" is used to specify a certain memory area that is used by malloc and new.
If there is not enough memory in that memory area, malloc (or new) asks the operating system for memory in a different memory area.
However, other people would also call that memory area "heap".
... in both cases, variable p is on the stack, and it points to the object on the heap.
This is correct.
... can we have a pointer that is actually located on the heap, and it points to a variable located on Stack?
Sure:
int ** allocatedMemory;
void myFunction()
{
int variableOnStack;
allocatedMemory = (int **)malloc(sizeof(int *));
*allocatedMemory = &variableOnStack;
...
}
The variable allocatedMemory points to some data on the heap and that data is a pointer to a variable (variableOnStack) on the stack.
However, when the function myFunction() returns, the variable variableOnStack no longer exists. Let's say the function otherFunction() is called after myFunction():
void otherFunction()
{
int a;
int b;
...
}
Now we don't know if *allocatedMemory points to a, to b or even the "return address" because we don't know which of the two variables is stored at the same address as variableOnStack.
Bad things may happen if we write to **allocatedMemory now...
In order to access something in heap, we can only access it via pointers on the stack??
... diagram "B" ...
To access some data on the heap, you definitely need some pointer that is not stored on the heap.
This pointer can be:
A global or static variable
In my example above, allocatedMemory is a global variable.
Global and static variables are stored in a completely different memory area (neither heap nor stack)
A local variable on the stack
A local variable in a CPU register
(I already wrote that local variables are not always stored on the stack)
Theoretically, the situation in diagram "B" is possible: Simply overwrite the variable allocatedMemory by NULL (or another pointer).
However, a program cannot directly access data on the heap.
This means that p* (which is some data on the heap) cannot be accessed any more if there is no more pointer "outside" the heap that points to p*.
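To illustrate the point about needing a pointer "outside" the heap, here is a sketch of a two-element chain. Node and sum_chain are hypothetical names. The only way into the chain is the local variable head; the second node is reached through a pointer that is itself stored on the heap.

```cpp
#include <memory>

// Hypothetical singly linked node. When a Node lives on the heap,
// its `next` member is a pointer stored on the heap as well.
struct Node {
    int value;
    std::unique_ptr<Node> next;
};

int sum_chain()
{
    // `head` is a local variable (the root); both Nodes live on the heap.
    auto head = std::make_unique<Node>();
    head->value = 1;
    head->next = std::make_unique<Node>();
    head->next->value = 2;

    int total = 0;
    for (const Node* n = head.get(); n != nullptr; n = n->next.get()) {
        total += n->value; // each step follows a pointer stored on the heap
    }
    return total; // head's destructor frees the whole chain
}
```

If head were overwritten with a raw pointer and no owner remained, both nodes would become unreachable, which is the situation described for diagram "B".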
Context
I was experimenting with getting C strings in C++ without allocating memory on the heap and came across this in testing:
#include <stddef.h>
#include <stdlib.h>
char* get_empty_c_string(size_t length) {
char buffer[length];
char *string = buffer;
for (size_t i = 0; i ^ length; i++) *(string + i) = '\0';
return string;
}
int main(void) {
char *string = get_empty_c_string(20u); // Allocated on heap?
// or stack?
return 0;
}
Question
Is the C string returned allocated on heap or stack?
As far as I know:
Heap allocation occurs with the calloc, malloc & realloc C standard functions or new & new[] C++ keywords.
Stack allocation in most other cases.
The array buffer is a variable length array (VLA), meaning its size is determined at runtime. As a variable local to a function, it resides on the stack. The pointer string then points to that array, and that pointer is returned. And because the returned pointer points to a local stack variable which goes out of scope, attempting to use that pointer will invoke undefined behavior.
Also, note that VLAs are a C only feature.
There is no way in standard C++ to obtain runtime-sized memory of automatic storage duration (which usually maps to stack memory).
Therefore a proper string of any length cannot be obtained on the stack. You can only allocate a buffer with a maximal size and use strings up to that length in the program. (Something similar is usually done by std::string as so-called short string optimization.)
Furthermore, you cannot return pointers or references to variables with automatic storage duration from a function. When the function returns the variables are destroyed and the pointer/reference becomes invalid. You can only ever use the stack-allocation until the function returns. You can however return the variable by-value.
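A sketch of the "buffer with a maximal size, returned by value" idea, using std::array. The capacity kMaxLength is an arbitrary assumption for illustration, and this get_empty_c_string is a hypothetical variant of the function in the question.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMaxLength = 32; // hypothetical fixed capacity

// Returns a zero-filled buffer by value: no heap allocation, no dangling pointer.
std::array<char, kMaxLength> get_empty_c_string()
{
    std::array<char, kMaxLength> buffer{}; // every element is '\0'
    return buffer;                         // copied (or elided) into the caller's object
}
```

In the caller, auto s = get_empty_c_string(); gives an object with automatic storage duration, and s.data() is a char* that stays valid for as long as s is in scope.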
As @PaulMcKenzie points out, your implementation of get_empty_c_string() is not valid standard C++ and may fail to compile: arrays that are local variables of a function need a size that is known at compile time, because the compiler reserves a fixed amount of stack space for them when the function is invoked.
I can see that you're trying to have dynamic memory allocation as part of the function itself, which is why you need heap allocators such as new or malloc.
I've always declared my arrays using this method:
bool array[256];
However, I've recently been told to declare my arrays using:
bool* array = new bool[256];
What is the difference and which is better? Honestly, I don't fully understand the second way, so an explanation on that would be helpful too.
bool array[256];
This allocates a bool array with automatic storage duration.
It will be automatically cleaned up when it goes out of scope.
In most implementations this would be allocated on the stack if it's not declared static or global.
Allocations/deallocations on the stack are computationally really cheap compared to the alternative. It also might have some advantages for data-locality but that's not something you usually have to worry about. But you might need to be careful of allocating many large arrays to avoid a stack overflow.
bool* array = new bool[256];
This allocates an array with dynamic storage duration.
You need to clean it up yourself with a call to delete[] later on. If you do not then you will leak memory.
Alternatively (as mentioned by @Fibbles) you can use smart pointers to express the desired ownership/lifetime requirements. This leaves the responsibility of cleaning up to the smart-pointer class, which helps a lot with guaranteeing deletion, even in cases of exceptions.
It has the advantage of being able to pass it to outer scopes and other objects without copying (RVO will avoid copying for the first case too in certain cases, but storing it as a data-member and other uses can't be optimized in the first case).
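A minimal sketch of the smart-pointer variant for the array case, assuming C++14 for std::make_unique. The helper count_set_flags is hypothetical and only exists to show the array being used and released.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: counts the true flags in a heap array owned by unique_ptr.
std::size_t count_set_flags()
{
    constexpr std::size_t size = 256;
    // make_unique<bool[]> value-initializes the elements, so all start as false.
    std::unique_ptr<bool[]> flags = std::make_unique<bool[]>(size);
    flags[3] = true;
    flags[200] = true;

    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (flags[i]) {
            ++count;
        }
    }
    return count; // delete[] runs automatically when flags goes out of scope
}
```

The unique_ptr<bool[]> specialization calls delete[] rather than delete, so the array form is released correctly.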
The first is allocation of memory on the stack:
// inside main (or function, or non-static member of class) -> stack
int main() {
bool array[256];
}
or possibly in static memory:
// outside main (and any function, or static member of class) -> static
bool array[256];
int main() {
}
The last is allocation of dynamic memory (on the heap):
int main() {
bool* array = new bool[256];
delete[] array; // you should not forget to release memory allocated in heap
}
The advantage of dynamic memory is that it can be created with a variable number of elements (not 256, but a number taken from user input, for example). But you have to release it yourself every time.
More about stack, static and heap memory and when you should use each is here: Stack, Static, and Heap in C++
The difference is static vs dynamic allocation, as previous answers have indicated. There are reasons for using one over the other. This video by Herb Sutter explains when you should use what. https://www.youtube.com/watch?v=JfmTagWcqoE It is just over 1 1/2 hours.
My preference is to use
bool array[256];
unless there's a reason to do otherwise.
Mike
I read about C++ dynamic memory allocation. Here is my code:
#include <iostream>
using namespace std;
int main()
{
int t;
cin>>t;
int a[t];
return 0;
}
What is the difference between the above and the following:
int* a=new(nothrow) int[t];
Use dynamic allocation:
when you need control over when an object is created and destroyed; or
when you need to create a local object that's too big to risk putting on the stack; or
when the size of a local array isn't a constant
To answer your specific question: int a[t]; isn't valid C++, since an array size must be constant. Some compilers allow such variable-length arrays as an extension, borrowed from C; but you shouldn't use them, unless you don't mind being tied to that compiler.
So you'd want dynamic allocation there, either the easy way, managed by RAII:
std::vector<int> a(t);
// use it, let it clean itself up when it goes out of scope
or the hard way, managed by juggling pointers and hoping you don't drop them:
int* a=new int[t];
// use it, hope nothing throws an exception or otherwise leaves the scope
delete [] a; // don't forget to delete it
Your first example is a C99-style variable-length array allocation, which occurs on the stack and whose lifetime is the same as that of other local variables.
The second example is a typical C++ dynamic memory allocation, which comes from the heap and whose lifetime extends until delete[] a is reached; without that call the memory is "leaked". The end of the lifetime occurs when the array is destroyed by delete[], which can happen after the current local scope has ended.
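Since the question uses new(nothrow), here is a sketch of what that form changes: on failure it returns a null pointer instead of throwing std::bad_alloc. The helper try_allocate is a hypothetical name, and wrapping the result in std::unique_ptr is my addition so the delete[] cannot be forgotten.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: returns an empty owner instead of throwing on failure.
std::unique_ptr<int[]> try_allocate(std::size_t count)
{
    int* raw = new (std::nothrow) int[count]; // nullptr on failure, no exception
    return std::unique_ptr<int[]>(raw);       // owns raw, or nothing if raw is null
}
```

A caller would check the result before use, for example if (auto a = try_allocate(t)) { a[0] = 1; }.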
I am trying to understand the difference between the stack and heap memory, and this question on SO as well as this explanation did a pretty good job explaining the basics.
In the second explanation however, I came across an example to which I have a specific question, the example is this:
It is explained that the object m is allocated on the heap, I am just wondering if this is the full story. According to my understanding, the object itself indeed is allocated on the heap as the new keyword has been used for its instantiation.
However, isn't it that the pointer to object m is at the same time allocated on the stack? Otherwise, how would the object itself, which of course is sitting on the heap, be accessed? I feel that for the sake of completeness this should have been mentioned in the tutorial; leaving it out causes a bit of confusion for me. I hope someone can clear this up and confirm my understanding that this example should basically come with two statements:
1. a pointer to object m has been allocated on the stack
2. the object m itself (so the data that it carries, as well as access to its methods) has been allocated on the heap
Your understanding may be correct, but the statements are wrong:
A pointer to object m has been allocated on the stack.
m is the pointer. It is on the stack. Perhaps you meant pointer to a Member object.
The object m itself (the data that it carries, as well as access to its methods) has been allocated on the heap.
It would be correct to say that the object pointed to by m is created on the heap.
In general, any function/method local object and function parameters are created on the stack. Since m is a function local object, it is on the stack, but the object pointed to by m is on the heap.
"stack" and "heap" are general programming jargon. In particular , no storage is required to be managed internally via a stack or a heap data structure.
C++ has the following storage durations
static
automatic
dynamic
thread
Roughly, dynamic corresponds to "heap", and automatic corresponds to "stack".
Moving on to your question: a pointer can have any of these four storage durations, and the object it points to can also have any of them. Some examples:
void func()
{
int *p = new int; // automatic pointer to dynamic object
int q; // automatic object
int *r = &q; // automatic pointer to automatic object
static int *s = p; // static pointer to dynamic object
static int *u = r; // static pointer to automatic object (bad idea)
thread_local int **t = &s; // thread pointer to static object
}
Named variables declared with no specifier are automatic if within a function, or static otherwise.
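A small sketch of the difference between static and automatic storage duration inside a function. The function count_calls is hypothetical; its return value encodes both variables so the difference is visible.

```cpp
// Hypothetical counter: `calls` has static storage duration, `local` is automatic.
int count_calls()
{
    static int calls = 0; // initialized once, keeps its value between calls
    int local = 0;        // re-created and re-initialized on every call
    ++calls;
    ++local;
    return calls * 10 + local; // tens digit: calls so far, ones digit: local
}
```

The first call returns 11 and the second returns 21: calls keeps growing, while local starts from 0 every time.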
When you declare a variable in a function, it goes on the stack (unless you declare it static or thread_local). So your variable Member* m is created on the stack. Note that by itself, m is just a pointer; until you initialize it, it doesn't point to anything meaningful. You can use it to point to an object on either the stack or heap, or to nothing at all.
Declaring a variable in a class or struct is different -- those go wherever the class or struct is instantiated.
To create something on the heap, you use new or std::malloc (or their variants). In your example, you create an object on the heap using new and assign its address to m. Objects on the heap need to be released to avoid memory leaks. If allocated using new, you need to use delete; if allocated using std::malloc, you need to use std::free. The better approach is usually to use a "smart pointer", which is an object that holds a pointer and has a destructor that releases it.
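To show what "an object that holds a pointer and has a destructor that releases it" means, here is a deliberately tiny hand-written owner. IntOwner and use_owner are hypothetical names; in real code you would normally reach for std::unique_ptr rather than writing your own.

```cpp
// Hypothetical minimal owner for a single heap-allocated int.
class IntOwner {
public:
    explicit IntOwner(int value) : ptr_(new int(value)) {}
    ~IntOwner() { delete ptr_; }                   // released when the owner dies

    IntOwner(const IntOwner&) = delete;            // no copies, so no double delete
    IntOwner& operator=(const IntOwner&) = delete;

    int& operator*() const { return *ptr_; }       // access the owned int

private:
    int* ptr_;
};

int use_owner()
{
    IntOwner owner(41); // the int lives on the heap, owner itself on the stack
    *owner += 1;
    return *owner;      // value is copied out, then owner's destructor frees the int
}
```

The owner is an ordinary local variable, so the heap memory is tied to the lifetime of something on the stack.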
Yes, the pointer is allocated on the stack but the object that pointer points to is allocated on the heap. You're correct.
However, isn't it that the pointer to object m is at the same time
allocated on the stack?
I suppose you meant the Member object. The pointer is allocated on the stack and will last there for the entire duration of the function (or its scope). After that, the code might still work:
#include <iostream>
using namespace std;
struct Object {
int somedata;
};
Object** globalPtrToPtr; // A global: it lives in static storage (often called
// the "data segment"), which is neither heap nor stack
void function() {
Object* pointerOnTheStack = new Object;
globalPtrToPtr = &pointerOnTheStack;
cout << "*globalPtrToPtr = " << *globalPtrToPtr << endl;
} // pointerOnTheStack is NO LONGER valid after the function exits
int main() {
function(); // stores the address of a local pointer that dies on return
// This can give an access violation,
// a different value after the pointer destruction
// or even the same value as before, randomly - Undefined Behavior
cout << "*globalPtrToPtr = " << *globalPtrToPtr << endl;
return 0;
}
http://ideone.com/BwUVgm
The above code stores the address of a pointer residing on the stack (and leaks memory too because it doesn't free Object's allocated memory with delete).
Since after exiting the function the pointer is "destroyed" (i.e. its memory can be used for whatever pleases the program), you can no longer safely access it.
The above program can run properly, crash, or give you a different result. Accessing an object after its lifetime has ended (freed or out-of-scope memory) is undefined behavior.