Proper stack and heap usage in C++? - c++

I've been programming for a while, but it's been mostly Java and C#, so I've never actually had to manage memory on my own. I recently began programming in C++, and I'm a little confused about when I should store things on the stack and when to store them on the heap.
My understanding is that variables which are accessed very frequently should be stored on the stack, while objects, rarely used variables, and large data structures should all be stored on the heap. Is this correct?

No, the difference between stack and heap isn't performance. It's lifespan: any local variable inside a function (anything you do not malloc() or new) lives on the stack. It goes away when you return from the function. If you want something to live longer than the function that declared it, you must allocate it on the heap.
class Thingy { /* ... */ };

Thingy* foo()
{
    int a;                              // this int lives on the stack
    Thingy B;                           // B lives on the stack and is destroyed when foo() returns
    Thingy* pointerToB = &B;            // this points to an address on the stack
    Thingy* pointerToC = new Thingy();  // this makes a Thingy on the heap;
                                        // pointerToC contains its address

    // This is safe: the new Thingy lives on the heap and outlives foo().
    // Whoever you pass this to must remember to delete it!
    return pointerToC;

    // This would NOT be safe: B lives on the stack and is destroyed when
    // foo() returns, so whoever used the returned pointer would probably crash!
    // return pointerToB;
}
For a clearer understanding of what the stack is, come at it from the other end -- rather than try to understand what the stack does in terms of a high level language, look up "call stack" and "calling convention" and see what the machine really does when you call a function. Computer memory is just a series of addresses; "heap" and "stack" are inventions of the compiler.

I would say:
Store it on the stack, if you CAN.
Store it on the heap, if you NEED TO.
Therefore, prefer the stack to the heap. Some possible reasons that you can't store something on the stack are:
It's too big - in multithreaded programs on a 32-bit OS, the stack has a small, fixed size (fixed at thread-creation time, at least), typically just a few megabytes, so that you can create lots of threads without exhausting address space. For 64-bit programs, or single-threaded (Linux, anyway) programs, this is not a major issue. Under 32-bit Linux, single-threaded programs usually use dynamic stacks which can keep growing until they reach the top of the heap.
You need to access it outside the scope of the original stack frame - this is really the main reason.
With any sensible compiler you can allocate objects whose size is not fixed at compile time on the heap (usually arrays whose size is not known until run time).
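For illustration, a minimal sketch of that case using std::vector (the function name is hypothetical): the element count is not known until run time, so the elements cannot live in a fixed-size stack frame.

#include <cstddef>
#include <vector>

// n is not known until run time, so the elements cannot live in a
// fixed-size stack frame; std::vector puts them on the heap for you.
void makeBuffer(std::size_t n)
{
    std::vector<double> buffer(n);
    // ... use buffer ...
}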

It's more subtle than the other answers suggest. There is no absolute divide between data on the stack and data on the heap based on how you declare it. For example:
std::vector<int> v(10);
In the body of a function, that declares a vector (dynamic array) of ten integers on the stack. But the storage managed by the vector is not on the stack.
Ah, but (the other answers suggest) the lifetime of that storage is bounded by the lifetime of the vector itself, which here is stack-based, so it makes no difference how it's implemented - we can only treat it as a stack-based object with value semantics.
Not so. Suppose the function was:
void GetSomeNumbers(std::vector<int>& result)
{
    std::vector<int> v(10);
    // fill v with numbers
    result.swap(v);
}
So anything with a swap function (and any complex value type should have one) can serve as a kind of rebindable reference to some heap data, under a system which guarantees a single owner of that data.
Therefore the modern C++ approach is to never store the address of heap data in naked local pointer variables. All heap allocations must be hidden inside classes.
If you do that, you can think of all variables in your program as if they were simple value types, and forget about the heap altogether (except when writing a new value-like wrapper class for some heap data, which ought to be unusual).
You merely have to retain one special bit of knowledge to help you optimise: where possible, instead of assigning one variable to another like this:
a = b;
swap them like this:
a.swap(b);
because it's much faster and it doesn't throw exceptions. The only requirement is that you don't need b to continue to hold the same value (it's going to get a's value instead, which would be trashed in a = b).
The downside is that this approach forces you to return values from functions via output parameters instead of the actual return value. But they're fixing that in C++0x with rvalue references.
In the most complicated situations of all, you would take this idea to the general extreme and use a smart pointer class such as shared_ptr which is already in tr1. (Although I'd argue that if you seem to need it, you've possibly moved outside Standard C++'s sweet spot of applicability.)
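As a historical footnote to the C++0x remark above: that fix did land. A minimal sketch, assuming C++11 or later, where the swap idiom becomes unnecessary because returning by value moves (or elides) the heap buffer:

#include <vector>

// Under C++11's rvalue references, the swap idiom above becomes
// unnecessary: returning by value moves (or elides) the heap buffer.
std::vector<int> GetSomeNumbers()
{
    std::vector<int> v(10);
    // fill v with numbers
    return v;   // no deep copy of the heap storage
}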

You would also store an item on the heap if it needs to be used outside the scope of the function in which it is created. One idiom used with stack objects is called RAII: a stack-based object serves as a wrapper for a resource, and when the object is destroyed, the resource is cleaned up. Stack-based objects are easier to keep track of when you might be throwing exceptions: you don't need to concern yourself with deleting a heap-based object in an exception handler. This is why raw pointers are not normally used in modern C++; you would use a smart pointer, which is a stack-based wrapper for a raw pointer to a heap-based object.
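To make that concrete, here is a minimal sketch of the idiom, assuming C++11's std::unique_ptr (Resource and useResource are hypothetical names):

#include <memory>

struct Resource { /* ... */ };   // hypothetical resource type

void useResource()
{
    // The unique_ptr is a stack-based wrapper around the heap object:
    // when it is destroyed (on normal return or during exception
    // unwinding), the Resource is deleted automatically, so no cleanup
    // is needed in an exception handler.
    std::unique_ptr<Resource> r(new Resource());
    // ... work that might throw ...
}   // ~unique_ptr runs here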

To add to the other answers, it can also be about performance, at least a little bit. Not that you should worry about this unless it's relevant for you, but:
Allocating on the heap requires finding and tracking a block of memory, which is not a constant-time operation (and takes some cycles and overhead). This gets slower as memory becomes fragmented and/or as you get close to using 100% of your address space. On the other hand, stack allocations are constant-time, basically "free" operations.
Another thing to consider (again, really only important if it becomes an issue) is that the stack size is typically fixed and can be much smaller than the heap size. So if you're allocating large objects or many small objects, you probably want to use the heap; if you run out of stack space, you get this site's titular error: a stack overflow. Not usually a big deal, but another thing to consider.

The stack is more efficient and easier for managing scoped data.
But the heap should be used for anything larger than a few KB (it's easy in C++: just create a boost::scoped_ptr on the stack to hold a pointer to the allocated memory).
Consider a recursive algorithm that keeps calling into itself: it's very hard to limit or even guess the total stack usage! Whereas on the heap, the allocator (malloc() or new) can indicate out-of-memory by returning NULL or throwing.
Source: the Linux kernel, whose stacks are no larger than 8KB!
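A sketch of that pattern using std::unique_ptr, the modern standard equivalent of boost::scoped_ptr (names are hypothetical):

#include <memory>

void process()
{
    // A 1 MB buffer would risk overflowing a small (e.g., 8 KB) stack,
    // so put it on the heap; only the owning pointer lives on the stack.
    std::unique_ptr<char[]> buffer(new char[1024 * 1024]);
    // ... use buffer.get() ...
}   // freed automatically here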

For completeness, you may read Miro Samek's article about the problems of using the heap in the context of embedded software.
A Heap of Problems

The choice of whether to allocate on the heap or on the stack is largely made for you, depending on how your variable is declared. If you allocate something dynamically, using a "new" call, you are allocating from the heap. Local variables and function parameters are allocated on the stack, while global variables get static storage of their own.

In my opinion there are two deciding factors:
1) Scope of the variable
2) Performance
I would prefer to use the stack in most cases, but if you need access to a variable outside its scope, you can use the heap.
To improve performance while using the heap, you can also allocate one larger heap block and carve variables out of it (a simple pool), which can be faster than allocating each variable at a different memory location.

This has probably been answered quite well already. I would like to point you to the series of articles below for a deeper understanding of the low-level details. Alex Darby has a series of articles where he walks you through with a debugger. Here is Part 3, about the stack.
http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/

Related

Difference between Node* n = new Node() and Node n in C++ [duplicate]


Prefer heap over stack?

I recently dove into graphics programming, and I noticed that many graphics engines (e.g., Ogre) and many coders overall prefer to initialize class instances dynamically. Here's an example from Ogre Basic Tutorial 1:
//...
Ogre::Entity* ogreHead = mSceneMgr->createEntity("Head", "ogrehead.mesh");
Ogre::SceneNode* headNode = mSceneMgr->getRootSceneNode()->createChildSceneNode("HeadNode");
//...
The data members and methods of ogreHead and headNode are then accessed as ogreHead->blabla.
Why mess around with object pointers instead of plain objects?
BTW, I've also read somewhere that heap memory allocation is much slower than stack memory allocation.
Heap allocation is, inevitably, much slower than stack allocation. More on "How much slower?" later. However, in many cases, the choice is "made for you", for several reasons:
Stack is limited. And if you run out, the application almost always gets terminated - there is no real good recovery, even printing an error message to say "I ran out of stack" may be hard...
Stack allocation "goes away" when you leave the function where the allocation was made.
Variable sizes are much better handled on the heap: C++ does not cope with "variable length arrays" very well, and they're certainly not guaranteed to work in all compilers.
How much slower is heap over stack?
We'll get to "and does it matter" in a bit.
For a given allocation, stack allocation is simply a subtract operation [1], whereas at the very minimum new or malloc will be a function call, and probably even the most simple allocator will be several dozen instructions; in complex cases, thousands [because memory has to be gotten from the OS, and cleared of its previous content]. So it's anything from 10x to "infinitely" slower, give or take. Exact numbers will depend on the exact system the code is running in, the size of the allocation, and often on "previous calls to the allocator" (e.g., a long list of "freed" allocations can make allocating a new object slower, because a good fit has to be searched for). And of course, unless you do the "ostrich" method of heap management, you also need to free the object and cope with "out of memory", which adds more code/time to the execution and complexity of the code.
With some reasonably clever programming, however, this can mostly be hidden. For example, allocating something that stays allocated for a long time, over the lifetime of the object, will be "nothing to worry about". Allocating objects from the heap for every pixel or every triangle in a 3D game would CLEARLY be a bad idea. But if the lifetime of the object is many frames or even the entire game, the time to allocate and free it will be nearly nothing.
Similarly, instead of doing 10000 individual object allocations, make one allocation for 10000 objects. An object pool is one such concept; see the sketch below.
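A deliberately minimal object-pool sketch (all names hypothetical; T must be default-constructible, and this toy never recycles individual slots):

#include <cstddef>
#include <vector>

// One up-front allocation serves many fixed-size objects, so handing
// one out is an index bump instead of a full trip through new/malloc.
template <typename T>
class Pool {
public:
    explicit Pool(std::size_t count) : storage_(count), next_(0) {}

    // Hand out the next free slot, or nullptr when the pool is exhausted.
    T* allocate() {
        return next_ < storage_.size() ? &storage_[next_++] : nullptr;
    }

private:
    std::vector<T> storage_;  // one big allocation for `count` objects
    std::size_t next_;        // index of the next unused slot
};

Usage would look like Pool<Triangle> pool(10000);, after which each pool.allocate() costs an index increment rather than a trip through the general-purpose allocator.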
Further, often the allocation time isn't where the time is spent. For example, reading a triangle list from a file from a disk will take much longer than allocating the space for the same triangle list - even if you allocate each single one!
To me, the rule is:
Does it fit nicely on the stack? Typically a few kilobytes is fine, many kilobytes not so good, and megabytes definitely not ok.
Is the number of objects (e.g., the size of an array of objects) known, with a maximum small enough that you can fit it on the stack?
Do you know what the object will be? In other words abstract/polymorphic classes will probably need to be allocated on the heap.
Is its lifetime the same as the scope it is in? If not, use the heap (or stack further down, and pass it up the stack).
[1] Or an add, if the stack "grows towards high addresses" - I don't know of a machine which has such an architecture, but it is conceivable and I think some have been made. C certainly makes no promises as to which way the stack grows, or anything else about how the runtime stack works.
The scope of the stack is limited: it only exists within a function. Now, modern user-interfacing programs are usually event driven, which means that a function of yours is invoked to handle an event, and then that function must return in order for the program to continue running. So, if your event handler function wishes to create an object which will remain in existence after the function has returned, clearly, that object cannot be allocated on the stack of that function, because it will cease to exist as soon as the function returns. That's the main reason why we allocate things on the heap.
There are other reasons, too.
Sometimes, the exact size of a class is not known during compilation time. If the exact size of a class is not known, it cannot be created on the stack, because the compiler needs to have precise knowledge of how much space it needs to allocate for each item on the stack.
Furthermore, factory methods like whatever::createEntity() are often used. If you have to invoke a separate method to create an object for you, then that object cannot be created on the stack, for the reason explained in the first paragraph of this answer.
Why pointers instead of objects?
Because pointers help make things fast. If you pass an object by value, to another function, for example
shoot(Ogre::Entity ogre)
instead of
shoot(Ogre::Entity* ogrePtr)
If ogre isn't a pointer, you are passing the whole object into the function rather than a reference to it. If the compiler doesn't optimize, you are left with an inefficient program. There are other reasons too: with the pointer, you can modify the passed-in object (some argue references are better, but that's a different discussion). Otherwise you would spend too much time copying modified objects back and forth.
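For comparison, a small sketch (Entity is a stand-in type): a reference avoids the copy just as well as a pointer, with no heap involved.

// A (const) reference avoids the copy without involving the heap.
struct Entity { /* large object */ };

void shoot(const Entity& target);   // no copy, cannot modify target
void aim(Entity& target);           // no copy, may modify target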
Why heap?
In some sense, the heap is a safer type of memory to access and allows you to safely reset/recover. If you call new and there's no memory left, you can flag that as an error. If you are using the stack, there is actually no good way to know you have caused a stack overflow without some other supervising program, at which point you are already in the danger zone.
It depends on your application. The stack has local scope, so when an object goes out of scope, its memory is deallocated; if you need the object in some other function, there's no real way to do that.
This applies more at the OS level: the heap is comparatively much larger than the stack, especially in multi-threaded applications where each thread can have a limited stack size.

Is it better to use heap or stack variables?

An experienced C++ user told me that I should strive for using heap variables, i.e.:
A* obj = new A("A");
as opposed to:
A obj("A");
Aside from all that stuff about using pointers being nice and flexible, he said it's better to put things on the heap rather than the stack (something about the stack being smaller than the heap?). Is it true? If so why?
NB: I know about issues with lifetime. Let's assume I have managed the lifetime of these variables appropriately (i.e., the only criterion of concern is heap vs. stack storage, with no lifetime concerns).
Depending on the context, we can consider heap or stack. Every thread gets a stack, and the thread executes instructions by invoking functions. When a function is called, the function's variables are pushed onto the stack, and when the function returns, the stack rolls back and the memory is reclaimed. Now, there is a size limitation for the thread-local stack; it varies and can be tweaked to some extent. Given this, if every object is created on the stack and an object requires a lot of memory, the stack space will be exhausted, resulting in a stack overflow. Besides this, if an object is to be accessed by multiple threads, then storing such an object on the stack makes no sense.
Thus small variables, small objects whose size can be determined at compile time, and pointers should be stored on the stack. The concern with storing objects on the heap (or free store) is that memory management becomes more difficult: there is a risk of memory leaks, which are bad, and if the application tries to access an object which has already been deleted, an access violation can occur and crash the application.
C++11 introduces smart pointers (shared, unique) to make memory management with the heap easier. The actual referenced object lives on the heap but is encapsulated by the smart pointer, which is always on the stack. Hence, when the stack rolls back, during a function return or an exception, the smart pointer's destructor deletes the actual object on the heap. In the case of a shared pointer, a reference count is maintained, and the actual object is deleted when the reference count reaches zero.
http://en.wikipedia.org/wiki/Smart_pointer
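A minimal sketch of the mechanism described above (the demo function is hypothetical):

#include <memory>

void demo()
{
    // Both shared_ptr objects live on the stack; the int lives on the heap.
    std::shared_ptr<int> p = std::make_shared<int>(42);
    std::shared_ptr<int> q = p;   // reference count becomes 2
}   // both pointers are destroyed; the count hits 0 and the int is deleted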
There are no general rules regarding use of stack allocated vs heap allocated variables. There are only guidelines, depending on what you are trying to do.
Here are some pros and cons:
Heap Allocation:
Pros:
more flexible - in case you have a lot of information that is not available at compile-time
bigger in size - you can allocate more - however, it's not infinite, so at some point your program might run out of memory if allocations/deallocations are not handled correctly
Cons:
slower - dynamic allocation is usually slower than stack allocation
may cause memory fragmentation - allocating and deallocating objects of different sizes will make the memory look like Swiss cheese :) causing some allocations to fail if there is no memory block of the required size available
harder to maintain - as you know, each dynamic allocation must be followed by a deallocation, which must be done by the user - this is error prone, as there are a lot of cases where people forget to match every malloc() call with a free() call, or every new with a delete
Stack allocation:
Pros:
faster - which is important mostly on embedded systems (I believe that for embedded there is a MISRA rule which forbids dynamic allocation)
does not cause memory fragmentation
makes the behavior of applications more deterministic - e.g. it removes the possibility of running out of memory at some arbitrary point
less error prone - as the user does not need to handle deallocation
Cons:
less flexible - you have to have all information available at compile-time (data size, data structure, etc.)
smaller in size - however there are ways to calculate total stack size of an application, so running out of stack can be avoided
I think this captures a few of the pros and cons. I'm sure there are more.
In the end it depends on what your application needs.
The stack should be preferred to the heap, because stack-allocated variables are automatic variables: their destruction happens automatically when the program leaves their scope.
In fact, the lifespan of objects created on the stack and on the heap is different:
The local variables of a function or a code block {} (not allocated by new) are on the stack. They are automatically destroyed when you return from the function (their destructors are called and their memory is freed).
But if you need an object to be used outside of the function, you will have to allocate it on the heap (using new) or return a copy.
Example:
void myFun()
{
    A onStack;           // on the stack
    A* onHeap = new A(); // on the heap
    // Do things...
}   // end of the function: onStack is destroyed, but the object onHeap points to is still alive
In this example, the object onHeap points to will still have its memory allocated when the function ends, so if you don't keep a pointer to it somewhere, you won't be able to delete it and free the memory. That's a memory leak: the memory is lost until the program ends.
However, if you were to return a pointer to onStack, then since onStack was destroyed when exiting the function, using that pointer would cause undefined behaviour, while using the pointer to the heap object would still be perfectly valid.
To better understand how stack variables work, you should look up information about the call stack, such as this article on Wikipedia. It explains how variables are stacked up for use within a function.
It is always better to avoid using new as much as possible in C++.
However, there are times when you cannot avoid it. For example:
Wanting variables to exist beyond their scopes.
So it should be horses for courses, really, but if you have a choice, always avoid heap-allocated variables.
The answer is not as clear cut as some would make you believe.
In general, you should prefer automatic variables (on the stack) because it's just plain easier. However some situations call for dynamic allocations (on the heap):
unknown size at compile time
extensible (containers use heap allocation internally)
large objects
The latter is a bit tricky. In theory, automatic variables could be allocated without limit, but computers are finite, and worse still, most of the time the size of the stack is finite too (which is an implementation issue).
Personally, I use the following guideline:
local objects are allocated automatically
local arrays are deferred to std::vector<T> which internally allocates them dynamically
It has served me well (which is just anecdotal evidence, obviously).
Note: you can (and probably should) tie the life of the dynamically allocated object to that of a stack variable using RAII: smart pointers or containers.
C++ has no mention of the Heap or the Stack. As far as the language is concerned they do not exist/are not separate things.
As for a practical answer - use what works best - do you need fast - do you need guarantees. Application A might be much better with everything on the Heap, App B might fragment OS memory so badly it kills the machine - there is no right answer :-(
Simply put, don't manage your own memory unless you need to. ;)
Stack = data whose size and layout are fixed at compile time (not dynamic).
Heap = dynamic data allocated during run time (very dynamic).
Although the pointers themselves live on the stack, those pointers are beautiful because they open the doors to dynamic, spontaneous creation of data (depending on how you code your program).
(But I'm just a savage, so why does it matter what I say.)

Why should C++ programmers minimize use of 'new'?

I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is claimed automatically at the end of scope. As soon as execution of current code block (delimited using {}) is completed, memory for all variables in that block is automatically collected. This is also the moment where destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file (sketched just after this list). In this case, even if the stack could hold the entire file contents, you could not return from the function and keep the allocated memory block.
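A sketch of that readfile function, assuming the usual <fstream>/<sstream> approach: the string's heap-backed buffer survives the return, even though every local variable here dies.

#include <fstream>
#include <sstream>
#include <string>

std::string readfile(const std::string& path)
{
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();   // read the whole file
    return contents.str();    // returned by value; the heap buffer travels with it
}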
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
#include <string>

int main(int argc, char* argv[])
{
    std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main(int argc, char* argv[])
{
    std::string* program = new std::string(argv[0]); // Bad!
    delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("foo_bar");
}

Line::~Line() {
    delete mString;
}
Is actually a lot more risky to use than the following one:
class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "foo_bar";
    // note: there is a cleaner way to write this.
}
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main()
{
    Line l1;
    Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory and both will be released at the end of the program.
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
    Line borders[4];
};
Then
int main()
{
    Table table;
}
allocates four std::string instances, four Line instances, one Table instance, and all the strings' contents, and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate space on the stack for every local scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" returns early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions (typically as a reference or const-reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis)). And so forth.
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must eventually be deleted lest they leak: the destructor won't be called, the memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e. on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is auto-freed upon function return.
Smart pointers like unique_ptr, shared_ptr solve the dangling reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; there can be a performance impact for overusing new. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns - you cannot pass a reference to those back to the caller, only by copying, returning or moving by value.
C++ doesn't employ a memory manager of its own; other languages like C# and Java have a garbage collector to handle the memory.
C++ implementations typically use operating system routines to allocate memory, and too much new/delete can fragment the available memory.
With any application, if memory is used frequently, it's advisable to preallocate it and release it when it's no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within the scope of a function is a proven technique.
The downside of using stack objects is that it creates multiple copies of objects when returning from functions, passing to functions, etc. However, smart compilers are well aware of these situations and optimize them well for performance.
It's really tedious in C++ when memory is allocated in one place and released in another: the responsibility for the release is always a question, so mostly we rely on commonly accessible pointers, stack objects (as much as possible) and techniques like auto_ptr (RAII objects).
The best thing is that you have control over the memory, and the worst thing is that you have no control at all if the application's memory management is done improperly. Crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for doing as few new's as possible have been missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or it may already have a suitable memory location ready; we don't know. If your program needs to have consistent and predictable execution time (like in a real-time system or game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization
Yes, you heard me. Your OS needs to make sure your page tables are consistent, and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads, you are actually serialising your threads (I've done this with 32 CPUs, each hitting new to get a few hundred bytes each; ouch! That was a royal p.i.t.a. to debug.)
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, in this case ensuring that each call to new() must be immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that, by the time the second new() is called, it is guaranteed that the first object has already been wrapped in its smart pointer, thus preventing any leaks in case an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to @Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
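Sketching that solution (T1, T2 and foo standing in for the snippet above): make_shared allocates and constructs the owning pointer in one step, so no raw pointer is ever exposed to a leak window.

#include <memory>

struct T1 {};
struct T2 {};
void foo(std::shared_ptr<T1>, std::shared_ptr<T2>);

void caller()
{
    // No naked `new`: each object is owned by a shared_ptr from the
    // moment it exists, so an exception cannot leak either one.
    foo(std::make_shared<T1>(), std::make_shared<T2>());
}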
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
    File foo = File("foo.dat");
    // Do things
}
Now, observe that when you fall off that block after the end-brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collection to find it.
Had you written
{
File * foo = new File("foo.dat");
you would want to match it explicitly with
delete foo;
}
or even better, allocate your File * as a "smart pointer". If you aren't careful about that it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new, you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File were something like:
class File {
private:
    FileImpl* fd;
public:
    File(String fn) { fd = new FileImpl(fn); }
};
Then FileImpl will still be allocated on the heap.
And yes, you'd better be sure to have
~File() { delete fd; }
in the class as well; without it, you'll leak memory from the heap even if you didn't apparently allocate on the heap at all.
new() shouldn't be used as little as possible. It should be used as carefully as possible. And it should be used as often as necessary as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model then there's no need to use new(), with the associated delete() and checking of NULL pointers.
In the case where you have lots of short-lived objects allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope then new() is the right answer. Just make sure that you pay attention to when and how you call delete() and the possibilities of NULL pointers, using deleted objects and all of the other gotchas that come with the use of pointers.
When you use new, objects are allocated to the heap. It is generally used when you anticipate expansion. When you declare an object such as,
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens the potential for memory leaks. Objects placed on the stack are not prone to memory leaks!
One notable reason to avoid overusing the heap is performance - specifically, the performance of the default memory management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order leads not only to memory fragmentation, but also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve, allowing you to mitigate the inherent disadvantages of traditional heap implementations while still letting you use the heap as necessary.
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous. (int *i; i = new int[9999];? really? int i[9999]; is much clearer.) I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects don't typically take up much space in memory (one byte per character, in a string). And if they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusion and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about the issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable - it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements. So the original poster's code actually has absolutely no needs for new.
I think the poster meant to say "You do not have to allocate everything on the heap" rather than on the stack.
Basically, objects are allocated on the stack (if the object size allows, of course) because of the cheap cost of stack allocation, rather than on the heap, which involves quite some work by the allocator and adds verbosity because you then have to manage the data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1, ctor_arg2} for a constructor with arguments). That's why you usually just have no reason to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values the number of which is not known at compile time.
Due to limitations of the C++ implementation on common machines: to prevent a stack overflow caused by allocating too much space when creating values the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to new, but ensure that memory gets released where the "pointer" goes out of scope.
and for this reason, it is an official item in the C++ community Coding Guidelines to avoid explicit new and delete: Guideline R.11.
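A compact sketch of those two replacements (assuming C++14 for std::make_unique; names are hypothetical):

#include <memory>
#include <vector>

void demo()
{
    std::vector<int> values(1000);               // runtime-sized storage, no explicit new
    auto owned = std::make_unique<double>(3.14); // heap object, deleted automatically at scope exit
}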
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is that the library we are using provides value semantics and makes dynamic allocation unnecessary; std::string is a good example.
For object-oriented code, however, using a pointer (which means using new to create the object beforehand) is a must. In order to simplify the complexity of resource management, we have dozens of tools to make it as simple as possible, such as smart pointers. The object-based paradigm or generic paradigm assumes value semantics and requires less or no new, just as the other posters have stated.
Traditional design patterns, especially those mentioned in the GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest-to-read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar: it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have already been encoded into various classes. Furthermore, if you need a usage pattern for which there isn't already a standard class, you can write your own class that encodes it!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point to add to all the correct answers above: it depends on what sort of programming you are doing. Kernel development on Windows, for example: the stack is severely limited, and you might not be able to take page faults as you can in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.

When should I use the new keyword in C++?

I've been using C++ for a short while, and I've been wondering about the new keyword. Simply, should I be using it, or not?
With the new keyword...
MyClass* myClass = new MyClass();
myClass->MyField = "Hello world!";
Without the new keyword...
MyClass myClass;
myClass.MyField = "Hello world!";
From an implementation perspective, they don't seem that different (but I'm sure they are)... However, my primary language is C#, and of course the 1st method is what I'm used to.
The difficulty seems to be that method 1 is harder to use with the std C++ classes.
Which method should I use?
Update 1:
I recently used the new keyword for heap memory (or free store) for a large array that was going out of scope (i.e., being returned from a function). Where before I was using the stack, which caused half of the elements to be corrupt outside the scope, switching to heap usage ensured that the elements were intact. Yay!
Update 2:
A friend of mine recently told me there's a simple rule for using the new keyword: every time you type new, type delete.
Foobar *foobar = new Foobar();
delete foobar; // TODO: Move this to the right place.
This helps to prevent memory leaks, as you always have to put the delete somewhere (i.e. when you cut and paste it to either a destructor or otherwise).
Method 1 (using new)
Allocates memory for the object on the free store (This is frequently the same thing as the heap)
Requires you to explicitly delete your object later. (If you don't delete it, you could create a memory leak)
Memory stays allocated until you delete it. (i.e. you could return an object that you created using new)
The example in the question will leak memory unless the pointer is deleted; and it should always be deleted, regardless of which control path is taken, or if exceptions are thrown.
Method 2 (not using new)
Allocates memory for the object on the stack (where all local variables go). There is generally less memory available for the stack; if you allocate too many objects, you risk a stack overflow.
You won't need to delete it later.
Memory is no longer allocated when it goes out of scope. (i.e. you shouldn't return a pointer to an object on the stack)
As far as which one to use; you choose the method that works best for you, given the above constraints.
Some easy cases:
If you don't want to worry about calling delete, (and the potential to cause memory leaks) you shouldn't use new.
If you'd like to return a pointer to your object from a function, you must use new
There is an important difference between the two.
Everything not allocated with new behaves much like value types in C# (and people often say that those objects are allocated on the stack, which is probably the most common/obvious case, but not always true). More precisely, objects allocated without using new have automatic storage duration.
Everything allocated with new is allocated on the heap, and a pointer to it is returned, exactly like reference types in C#.
Anything allocated on the stack has to have a constant size, determined at compile-time (the compiler has to set the stack pointer correctly, or if the object is a member of another class, it has to adjust the size of that other class). That's why arrays in C# are reference types. They have to be, because with reference types, we can decide at runtime how much memory to ask for. And the same applies here. Only arrays with constant size (a size that can be determined at compile-time) can be allocated with automatic storage duration (on the stack). Dynamically sized arrays have to be allocated on the heap, by calling new.
(And that's where any similarity to C# stops)
Now, anything allocated on the stack has "automatic" storage duration (you can actually declare a variable as auto, but this is the default if no other storage type is specified so the keyword isn't really used in practice, but this is where it comes from)
Automatic storage duration means exactly what it sounds like, the duration of the variable is handled automatically. By contrast, anything allocated on the heap has to be manually deleted by you.
Here's an example:
void foo() {
    bar b;               // automatic storage: b itself lives on the stack
    bar* b2 = new bar(); // the pointer is on the stack; the bar it points to is on the heap
}
This function creates three values worth considering:
On line 1, it declares a variable b of type bar on the stack (automatic duration).
On line 2, it declares a bar pointer b2 on the stack (automatic duration), and calls new, allocating a bar object on the heap. (dynamic duration)
When the function returns, the following will happen:
First, b2 goes out of scope (order of destruction is always opposite of order of construction). But b2 is just a pointer, so nothing happens, the memory it occupies is simply freed. And importantly, the memory it points to (the bar instance on the heap) is NOT touched. Only the pointer is freed, because only the pointer had automatic duration.
Second, b goes out of scope, so since it has automatic duration, its destructor is called, and the memory is freed.
And the bar instance on the heap? It's probably still there. No one bothered to delete it, so we've leaked memory.
From this example, we can see that anything with automatic duration is guaranteed to have its destructor called when it goes out of scope. That's useful. But anything allocated on the heap lasts as long as we need it to, and can be dynamically sized, as in the case of arrays. That is also useful. We can use that to manage our memory allocations. What if a class allocated some memory on the heap in its constructor, and deleted that memory in its destructor? Then we could get the best of both worlds: safe memory allocations that are guaranteed to be freed again, but without the limitations of forcing everything to be on the stack.
And that is pretty much exactly how most C++ code works.
Look at the standard library's std::vector, for example. The vector object itself is typically allocated on the stack, but it can be dynamically sized and resized, and it does this by internally allocating memory on the heap as necessary. The user of the class never sees this, so there's no chance of leaking memory or forgetting to clean up what you allocated.
This principle is called RAII (Resource Acquisition Is Initialization), and it extends to any resource that must be acquired and released: network sockets, files, database connections, synchronization locks. All of them can be acquired in the constructor and released in the destructor, so you're guaranteed that every resource you acquire will be freed again.
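A minimal sketch of the idea (a toy wrapper for illustration only; real code would just use std::vector or a smart pointer):
#include <cstddef>

class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]) {} // acquire the resource in the constructor
    ~IntBuffer() { delete[] data_; }                         // release it in the destructor

    // copying would create two owners of the same buffer, so forbid it
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }

private:
    int* data_;
};

void useBuffer() {
    IntBuffer buf(1000); // heap memory acquired here
    buf[0] = 42;
}                        // buf goes out of scope: destructor runs, memory freed automatically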
As a general rule, never use new/delete directly in your high-level code. Always wrap them in a class that manages the memory for you and ensures it gets freed again. (Yes, there may be exceptions to this rule. In particular, smart pointers require you to call new directly and pass the pointer to the smart pointer's constructor, which then takes ownership and ensures delete is called correctly. But this is still a very important rule of thumb.)
The short answer is: if you're a beginner in C++, you should never be using new or delete yourself.
Instead, you should use smart pointers such as std::unique_ptr and std::make_unique (or less often, std::shared_ptr and std::make_shared). That way, you don't have to worry nearly as much about memory leaks. And even if you're more advanced, best practice would usually be to encapsulate the custom way you're using new and delete into a small class (such as a custom smart pointer) that is dedicated just to object lifecycle issues.
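For example (this assumes C++14 for std::make_unique; Widget is just a placeholder type):
#include <memory>

struct Widget {
    int value = 0;
};

std::unique_ptr<Widget> makeWidget() {
    auto w = std::make_unique<Widget>(); // heap allocation, owned by the unique_ptr
    w->value = 42;
    return w; // ownership moves to the caller; delete happens automatically when the owner is destroyed
}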
Of course, behind the scenes, these smart pointers are still performing dynamic allocation and deallocation, so code using them would still have the associated runtime overhead. Other answers here have covered these issues, and how to make design decisions on when to use smart pointers versus just creating objects on the stack or incorporating them as direct members of an object, well enough that I won't repeat them. But my executive summary would be: don't use smart pointers or dynamic allocation until something forces you to.
Which method should I use?
This is almost never determined by your typing preferences but by the context. If you need to keep the object across several stack frames, or if it's too heavy for the stack, you allocate it on the free store. Also, since you are allocating an object, you are also responsible for releasing the memory: look up the delete operator.
To ease the burden of free-store management, people have invented things like auto_ptr and unique_ptr (auto_ptr is now deprecated and was removed in C++17; prefer unique_ptr). I strongly recommend you take a look at these. They might even be of help to your typing issues ;-)
If you are writing in C++ you are probably writing for performance. Using new and the free store is much slower than using the stack (especially when using threads) so only use it when you need it.
As others have said, you need new when your object needs to live outside the function or object scope, the object is really large or when you don't know the size of an array at compile time.
Also, try to avoid ever using delete. Wrap your new into a smart pointer instead. Let the smart pointer call delete for you.
There are some cases where a smart pointer isn't smart. Never store std::auto_ptr<> inside an STL container: it will delete the pointer too soon because of copy operations inside the container. Another case is when you have a really large STL container of pointers to objects. boost::shared_ptr<> will have a ton of speed overhead as it bumps the reference counts up and down. The better way to go in that case is to put the STL container into another object and give that object a destructor that will call delete on every pointer in the container, as sketched below.
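A rough sketch of that last pattern (assuming the container owns raw pointers; in modern C++ a std::vector<std::unique_ptr<T>> does the same job more safely):
#include <vector>

struct Node { /* ... */ };

class NodeList {
public:
    NodeList() = default;
    // copying would delete every pointer twice, so forbid it
    NodeList(const NodeList&) = delete;
    NodeList& operator=(const NodeList&) = delete;

    ~NodeList() {
        for (Node* n : nodes_) // delete each pointer exactly once
            delete n;
    }

    void add(Node* n) { nodes_.push_back(n); } // the list takes ownership

private:
    std::vector<Node*> nodes_;
};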
Without the new keyword you're storing that on the call stack. Storing excessively large variables on the stack will lead to stack overflow.
If your variable is used only within the context of a single function, you're better off using a stack variable, i.e., Option 2. As others have said, you do not have to manage the lifetime of stack variables - they are constructed and destructed automatically. Also, allocating/deallocating a variable on the heap is slow by comparison. If your function is called often enough, you'll see a tremendous performance improvement if you use stack variables instead of heap variables.
That said, there are a couple of obvious instances where stack variables are insufficient.
If the stack variable has a large memory footprint, then you run the risk of overflowing the stack. By default, the stack size of each thread is 1 MB on Windows. It is unlikely that you'll create a single stack variable that is 1 MB in size, but you have to keep in mind that stack utilization is cumulative. If your function calls a function which calls another function which calls another function which..., the stack variables in all of these functions take up space on the same stack. Recursive functions can run into this problem quickly, depending on how deep the recursion is. If this is a problem, you can increase the size of the stack (not recommended) or allocate the variable on the heap (recommended; a sketch follows below).
The other, more likely condition is that your variable needs to "live" beyond the scope of your function. In this case, you'd allocate the variable on the heap so that it can be reached outside the scope of any given function.
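To illustrate the footprint point (a sketch; the exact limit varies by platform and build settings):
#include <vector>

void large() {
    // double big[10000000];           // ~80 MB of automatic storage: very likely a stack overflow
    std::vector<double> big(10000000); // the vector object is on the stack, but its
                                       // ~80 MB of elements live on the heap
}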
The simple answer is yes: new creates an object on the heap (with the unfortunate side effect that you have to manage its lifetime by explicitly calling delete on it), whereas the second form creates an object on the stack in the current scope, and that object will be destroyed when it goes out of scope.
Are you passing myClass out of a function, or expecting it to exist outside that function? As some others said, it is all about scope when you aren't allocating on the heap: when you leave the function, the object goes away (eventually). One of the classic beginner mistakes is to create a local object of some class in a function and return a pointer to it without having allocated it on the heap. I can remember debugging this kind of thing back in my earlier days doing C++.
C++ Core Guidelines R.11: Avoid using new and delete explicitly.
Things have changed significantly since most answers to this question were written. Specifically, C++ has evolved as a language, and the standard library is now richer. Why does this matter? Because of a combination of two factors:
Using new and delete is potentially dangerous: memory might leak if you don't keep a very strong discipline of deleting everything you've allocated once it's no longer used, and never deleting what's not currently allocated.
The standard library now offers smart pointers which encapsulate the new and delete calls, so that you don't have to take care of managing allocations on the free store/heap yourself. So do other containers, in the standard library and elsewhere.
This has evolved into one of the C++ community's "core guidelines" for writing better C++ code, as the linked document shows. Of course, there are exceptions to this rule: somebody needs to write those encapsulating classes which do use new and delete; but that somebody is rarely yourself.
Adding to #DanielSchepler's valid answer:
The second method creates the instance on the stack, along with things like local variables declared int and the parameters passed into the function.
The first method makes room for a pointer on the stack, which you've set to the location in memory where a new MyClass has been allocated on the heap - or free store.
The first method also requires that you delete what you create with new, whereas in the second method, the class is automatically destructed and freed when it falls out of scope (the next closing brace, usually).
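For reference, the two methods under discussion presumably look something like this (MyClass as in the question):
MyClass* a = new MyClass(); // first method: pointer on the stack, object on the heap; you must delete a later
MyClass b;                  // second method: object itself on the stack, destroyed automatically at scope end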
The short answer is yes: the "new" keyword is incredibly important, because when you use it, the object data is stored on the heap as opposed to the stack, and that is the most important difference!