If a class needs to allocate memory dynamically (e.g. std::vector), is it acceptable for the class to simply allocate and deallocate the memory internally, using operator new or malloc?
The answer isn't entirely obvious to me. The lack of a system managing memory allocation, as in garbage-collected languages, is obviously empowering; but on the other hand, it is precisely this lack of coordination that ends up wasting memory. For instance, it would be quite trivial to make a 'fake' allocator that just passes stack memory to an object which would, under normal circumstances, require dynamic memory, but which the programmer can assert will never need more than X bytes (see the sketch below).
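For illustration (this snippet is not part of the original question), C++17's polymorphic memory resources standardize exactly this idea: a stack buffer backs the container's allocations.

#include <cstddef>
#include <memory_resource>
#include <vector>

int main() {
    // A stack buffer we assert is large enough for this container's needs.
    std::byte buffer[1024];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);

    // The vector's allocations come from 'buffer', not the general heap
    // (if the buffer runs out, pmr falls back to its upstream resource,
    // which defaults to new/delete).
    std::pmr::vector<int> v(&pool);
    for (int i = 0; i < 10; ++i) v.push_back(i);
}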
Perhaps you think that this issue is irrelevant in the days of large address spaces, but it feels a bit lame to fall back on the hardware; this is C++, after all.
EDIT
I realize now how cryptic I was with the question... Let me explain it a bit better.
When I say 'wasting memory', I specifically mean the kind of memory-wasting that happens with heap fragmentation. Reducing heap fragmentation is the most compelling point of making a memory managing system in C++, since (as many comments have pointed out) destructors already handle the resource management side of things. When your allocations are essentially random (you don't know where your new memory is in relation to other allocated memory) and every class could potentially allocate, you run into the sort of problem that data oriented design tries to fix: poor data locality.
So the question is: would it make sense for there to be a class that does the memory management, object management, heap compaction, and maybe statistics tracking (for debugging purposes) to make the most efficient use of memory and data locality?
[In this view, every class or function that allocates memory dynamically has to get a reference to that class, somehow.]
Or is it better to let every class be able to allocate without necessarily making it part of the interface of that class?
If a class needs to allocate memory dynamically (e.g. std::vector), is it acceptable for the class to simply allocate and deallocate the memory internally, using operator new or malloc?
Usually, we have two kinds of classes:
managers of resources (including dynamic memory);
"business logic" classes.
Most of the time, we shouldn't mix the layers of resource management and domain logic.
So, if your class is a manager of a raw resource, it allocates/deallocates, initializes/deinitializes its only resource and does nothing else. In this case, new is OK and even necessary (e.g. you can't use std::vector when writing your own dynamic array; otherwise you wouldn't need to write it at all). See RAII.
If your class contains some app logic, it should not explicitly allocate dynamic memory, open sockets, etc.; it uses other RAII classes for that. At this higher level, C++ gives you something that GC languages don't: RAII owners can manage files, sockets, and so on, i.e. any kind of resource, not just raw bytes of heap memory. So you don't need manual Java/C#-style try-with-resources everywhere you create an object that manages something other than raw memory; the compiler does it for you as soon as you have a RAII class for that resource (see the sketch below).
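A minimal sketch of that layering (FileHandle and ConfigReader are illustrative names): the RAII manager owns exactly one raw resource, while the logic class never touches fopen/fclose directly.

#include <cstdio>
#include <stdexcept>

class FileHandle {                       // resource-manager layer
    std::FILE* f_;
public:
    explicit FileHandle(const char* path) : f_(std::fopen(path, "r")) {
        if (!f_) throw std::runtime_error("open failed");
    }
    ~FileHandle() { std::fclose(f_); }
    FileHandle(const FileHandle&) = delete;             // one owner only
    FileHandle& operator=(const FileHandle&) = delete;
    std::FILE* get() const { return f_; }
};

class ConfigReader {                     // "business logic" layer
    FileHandle file_;                    // cleaned up automatically, even on exceptions
public:
    explicit ConfigReader(const char* path) : file_(path) {}
    // ... parsing logic using file_.get() ...
};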
Context: I'm working on a project where a client needs us to use custom dynamic memory allocation instead of allocating objects on the stack. Note that the objects in question have sizes known at compile time and don't even require dynamic allocation. Which makes me wonder:
What are some contexts where custom dynamic memory allocation of objects can be better than allocating objects on the stack (where size is known at compile time)?
An example. If Dog is a class, then instead of just declaring Dog puppy; they want us to do
Dog* puppy = nullptr;
custom_alloc(&puppy);   // presumably takes Dog** (or similar) so it can set puppy
new (puppy) Dog();      // placement new: runs the constructor in that raw storage
// do stuff
puppy->~Dog();          // explicit destructor call
custom_free(puppy);
The real custom_alloc function is not known to us. To make the program run, the custom_alloc we were given is a wrapper around malloc, and custom_free is a wrapper around free.
I do not like this approach and was wondering when it can actually be useful, or what they are really trying to solve by doing this.
Possible reasons:
Stack size is limited; while typical thread libraries allocate 1-10 MB for each thread's stack, it's not uncommon for the limit to be set lower for applications where hundreds or thousands of threads are expected to be launched concurrently (e.g. high traffic webservers; Microsoft IIS used to use a 256 KB limit, and only upped it to 512 KB for 64 bit setups).
You may want to keep an object around after the function has returned (without using globals). While NRVO and/or move semantics do mean it's often relatively cheap to return the object by value, when NRVO doesn't apply, copying around a single pointer is cheaper than just about anything else.
Auditing/tracing: They may want to use their custom function for specific types to keep track of memory allocation patterns
Persistent storage: The allocator may be backed by a memory mapped file; for structured data, that file may double as long term storage
Performance: Custom allocators (e.g. Intel's TBB) have been known to dramatically reduce runtime in certain circumstances. This is more a justification for using a custom allocator instead of the default allocator; custom allocators generally won't beat stack storage (except in really niche cases where memory locality might be improved by removing large objects from the stack and putting them in their own dedicated storage).
(Likely a terrible idea) Avoiding exception handling cleanup overhead. If your classes are RAII, then code has to be generated to clean them up along various code paths in case of an exception. Raw pointers don't generate any such code. Of course, if you don't take measures to perform the cleanup on exception yourself, this means memory leaks, but in rare cases (e.g. when you expect the program to exit completely, and you want the OS to handle memory cleanup) this might provide a minor "benefit".
A combination of the above: They may want to be able to swap between a tracing allocator and a performance allocator by linking different runtime libraries to provide custom_alloc
All that said, their approach is pretty awful; requiring manual placement new and destructor invocation is unpleasant (std::unique_ptr/std::shared_ptr could help a bit by providing custom deleter functors that do this work for you, as sketched below, but it's ugly even so). Typically, if you need a custom allocator, you'd define appropriate overloads of operator new/operator delete. That way, avoiding stack allocation (for whatever reason) isn't nearly so unpleasant; you just replace logically stack-allocated variables with std::unique_ptrs (created via std::make_unique), and your code remains fairly simple.
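For illustration, a sketch of the unique_ptr-with-custom-deleter approach; since the real signatures are unknown, this assumes a malloc-like custom_alloc and a free-like custom_free:

#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Assumed stand-ins for the client's allocator (here just wrapping malloc/free).
void* custom_alloc(std::size_t n) { return std::malloc(n); }
void  custom_free(void* p)        { std::free(p); }

// Deleter that runs the destructor, then returns the storage to custom_free.
template <typename T>
struct CustomDeleter {
    void operator()(T* p) const {
        if (p) {
            p->~T();
            custom_free(p);
        }
    }
};

template <typename T, typename... Args>
std::unique_ptr<T, CustomDeleter<T>> make_custom(Args&&... args) {
    void* raw = custom_alloc(sizeof(T));
    if (!raw) throw std::bad_alloc{};
    try {
        return std::unique_ptr<T, CustomDeleter<T>>(
            new (raw) T(std::forward<Args>(args)...));  // placement new
    } catch (...) {
        custom_free(raw);  // the constructor threw: release the raw storage
        throw;
    }
}

struct Dog {};

int main() {
    auto puppy = make_custom<Dog>();  // ~Dog() and custom_free run automatically
}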
I'm new to C++ and I'm wondering why I should even bother using new and delete. They can cause problems (memory leaks), and I don't get why I shouldn't just initialize a variable without the new operator. Can someone explain this to me? It's hard to google that specific question.
For historical and efficiency reasons, C++ (and C) memory management is explicit and manual.
Sometimes, you might allocate on the call stack (e.g. by using VLAs or alloca(3)). However, that is not always possible, because
stack size is limited (depending on the platform, to a few kilobytes or a few megabytes).
memory usage is not always FIFO or LIFO. It does happen that you need to allocate memory which will be freed (or become useless) much later during execution, in particular because it might be the result of some function (and the caller, or its caller, would release that memory).
You definitely should read about garbage collection and dynamic memory allocation. In some languages (Java, Ocaml, Haskell, Lisp, ....) or systems, a GC is provided, and is in charge of releasing memory of useless (more precisely unreachable) data. Read also about weak references. Notice that most GCs need to scan the call stack for local pointers.
Notice that it is possible, but difficult, to have quite efficient garbage collectors (though usually not in C++). For some programs, OCaml (with a generational copying GC) is faster than the equivalent C++ code (with explicit memory management).
Managing memory explicitly has the advantage (important in C++) that you don't pay for something you don't need. It has the inconvenience of putting more burden on the programmer.
In C or C++ you might sometimes consider using Boehm's conservative garbage collector. With C++ you might sometimes need to use your own allocator instead of the default std::allocator. Read also about smart pointers, reference counting, std::shared_ptr, std::unique_ptr, std::weak_ptr, the RAII idiom, and the rule of three (which in C++11 becomes the rule of five). The recent wisdom is to avoid explicit new and delete (e.g. by using standard containers and smart pointers).
Be aware that the most difficult situations in memory management are arbitrary, possibly cyclic, graphs of references; a sketch of the cycle problem follows.
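A minimal illustration (not from the original answer) of why cycles defeat reference counting, and how the weak references mentioned above (std::weak_ptr) break them:

#include <memory>

// Two nodes owning each other via shared_ptr would never reach a reference
// count of zero; making the back edge weak breaks the ownership cycle.
struct Node {
    std::shared_ptr<Node> next;  // owning edge
    std::weak_ptr<Node> prev;    // non-owning back edge
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;  // if prev were shared_ptr, a and b would leak
}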
On Linux and some other systems, valgrind is a useful tool to hunt memory leaks.
The alternative, allocating on the stack, will cause you trouble: stack sizes are often limited to megabyte magnitudes, and you'll get lots of value copies. You'll also have problems sharing stack-allocated data between function calls.
There are alternatives: using std::shared_ptr (C++11 onwards) will do the delete for you once the shared pointer is no longer being used. A technique referred to by the hideous acronym RAII is exploited by the shared-pointer implementation; I mention it explicitly since most resource-cleanup idioms are RAII-based. You can also make use of the comprehensive data structures available in the C++ Standard Template Library, which eliminate the need to get your hands too dirty with explicit memory management.
But formally, every new must be balanced with a delete. Similarly for new[] and delete[].
Indeed, in many cases new and delete are not needed; you can just use standard containers instead, leaving the allocation/deallocation management to them.
One of the reasons for which you may need to use allocation explicitly is for objects where the identity is important (i.e. they are not just values that can be copied around).
For example, if you have a GUI "window" object, then making copies probably doesn't make sense, and thus you're more or less ruling out all standard containers (they're designed for objects that can be copied and assigned). In this case, if the object needs to survive the function that creates it, probably the simplest solution is to explicitly allocate it on the heap, possibly using a smart pointer to avoid leaks or use-after-delete.
In other cases it may be important to avoid copies not because they're illegal, but because they're not very efficient (big objects); explicitly handling the instance's lifetime may be a better (faster) solution.
Another case where explicit allocation/deallocation may be the best option is complex data structures that cannot be represented by the standard library, for example a tree in which each node is also part of a doubly-linked list (sketched below).
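A hedged sketch of such a node (illustrative only, with raw pointers; real code would pick an explicit ownership scheme): each node carries both tree links and intrusive list links, so no standard container owns the nodes, and the structure must manage their lifetimes itself.

// A node that belongs simultaneously to a tree and to a doubly-linked list
// (e.g. for iteration in insertion order). No standard container models
// this combination, so the enclosing structure must own and free the nodes.
struct Node {
    // tree links
    Node* parent = nullptr;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
    // intrusive doubly-linked-list links
    Node* prevInList = nullptr;
    Node* nextInList = nullptr;
    int value = 0;
};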
Modern C++ styles often frown on explicit calls to new and delete outside of specialized resource management code.
This is not because the stack/automatic storage is sufficient, but rather because RAII smart resource owners (be they containers, shared pointers, or something else) make almost all direct memory wrangling unnecessary. And as the problem of memory management is often error-prone, this makes your code more robust, easier to read, and sometimes faster (as the fancy resource owners can use techniques you might not bother with everywhere).
This is exemplified by the rule of zero: write no destructor, no copy/move assignment, no copy/move constructor. Store state in smart storage, and have it handle things for you (a sketch follows).
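A brief sketch of a rule-of-zero class (the names are illustrative): all state lives in smart owners, so the compiler-generated special member functions are already correct, and we write none of them.

#include <string>
#include <utility>
#include <vector>

class Polygon {
    std::string name_;              // manages its own heap buffer
    std::vector<double> vertices_;  // likewise
public:
    explicit Polygon(std::string name) : name_(std::move(name)) {}
    void addVertex(double v) { vertices_.push_back(v); }
    // No destructor, no copy/move constructor, no copy/move assignment:
    // the defaults do the right thing because the members own their memory.
};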
None of the above applies when you yourself are writing smart memory-owning classes. This is a rare thing to need to do, however. It also requires C++14 (for std::make_unique) to get rid of the penultimate excuse to call new.
Now, the free store is still used under the above style, just not directly. The free store (aka the heap) is needed because automatic storage (aka the stack) only supports really simple object lifetime rules (scope-based, compile-time deterministic size and count, FILO order). As runtime-sized and runtime-counted data is common, and object lifetimes are often not that simple, the free store is used by most programs. Sometimes copying an object around on the stack is enough to make the simple lifetime less of a problem, but at other times identity is important.
The final reason is stack overflow. On some C++ implementations, the stack/automatic storage is seriously constrained in size. What's more, there is rarely if ever a reliable failure mode when you put too much stuff in it. By storing large data on the free store, we can reduce the chance that the stack will overflow.
First, if you don't need dynamic allocation, don't use it.
The most frequent reason for needing dynamic allocation is that the object will have a lifetime which is determined by the program logic rather than lexical scope. The new and delete operators are designed to support explicitly managed lifetimes.

Another common reason is that the size or structure of the "object" is determined at runtime. For simple cases (arrays, etc.) there are standard classes (std::vector) which will handle this for you, but for more complicated structures (e.g. graphs and trees), you'll have to do this yourself. (The usual technique here is to create a class representing the graph or tree, and have it manage the memory.)

And there is the case where the object must be polymorphic, and the actual type won't be known until runtime. (There are some tricky ways of handling this without dynamic allocation in the simplest cases, but in general, you'll need dynamic allocation.) In this case, std::unique_ptr might be appropriate to handle the delete, or if the object must be shared, std::shared_ptr (although usually, objects which must be shared fall into the first category, above, and so smart pointers aren't appropriate).

There are probably other reasons as well, but these are the three that I've encountered the most often. A sketch of the polymorphic case follows.
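A hedged sketch of the polymorphic case (Shape, Circle, and Square are illustrative names): the concrete type is chosen at runtime, so dynamic allocation is needed, and std::unique_ptr handles the delete.

#include <memory>

struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};

struct Circle : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159 * r * r; }
};

struct Square : Shape {
    double s;
    explicit Square(double side) : s(side) {}
    double area() const override { return s * s; }
};

// The caller receives an owning pointer; the delete happens automatically.
std::unique_ptr<Shape> makeShape(bool wantCircle) {
    if (wantCircle) return std::make_unique<Circle>(1.0);
    return std::make_unique<Square>(2.0);
}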
Only in simple programs can you know beforehand how much memory you'd use; in general, you cannot foresee it.
However, with modern C++11 you generally rely on standard containers like std::vector and std::map for memory allocation, and the use of smart pointers helps you avoid memory leaks, so you don't really need to use new and delete explicitly by hand.
When you use new, your object is stored on the heap, and it remains there until you manually delete it. Without new, your object goes on the stack and is destroyed automatically when it goes out of scope.
The stack is set to a fixed size, so if there is no room left to place a new object, a stack overflow occurs. This often happens when a lot of nested functions are being called, or when there is an infinite recursive call. The heap, by contrast, can be grown by the operating system if its current size is too small to accommodate new memory.
Another reason may be that you are explicitly calling an external library or API with a C-style interface. Setting up a callback in such cases often means context data must be supplied and returned in the callback, and such an interface usually provides only a 'simple' void* or int*. Allocating an object or struct with new is appropriate for such cases (you can delete it later in the callback, should you need to); a sketch follows.
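A hedged sketch of this pattern (register_callback, fire_event, and CallbackContext are illustrative stand-ins, not a real API):

#include <memory>

// Stand-in for a C-style API: stores a function pointer plus an opaque
// context pointer and invokes them later.
static void (*g_fn)(void*);
static void* g_ctx;
void register_callback(void (*fn)(void*), void* context) { g_fn = fn; g_ctx = context; }
void fire_event() { g_fn(g_ctx); }

struct CallbackContext { int requestId; };

void on_event(void* p) {
    // Reclaim ownership of the context allocated in main(); it is deleted
    // automatically when ctx goes out of scope at the end of the callback.
    std::unique_ptr<CallbackContext> ctx(static_cast<CallbackContext*>(p));
    // ... use ctx->requestId ...
}

int main() {
    auto ctx = std::make_unique<CallbackContext>(CallbackContext{42});
    register_callback(on_event, ctx.release());  // hand ownership through the void*
    fire_event();                                // the context is deleted inside on_event
}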
I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is reclaimed automatically at the end of the scope. As soon as execution of the current code block (delimited by {}) is completed, memory for all variables in that block is automatically collected. This is also the moment when destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from a function and keep the allocated memory block.
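A rough sketch of that readfile example (one possible implementation): the buffer backing the returned string lives on the heap, so it survives the end of the function's stack frame and is handed to the caller by value.

#include <fstream>
#include <sstream>
#include <string>

std::string readfile(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();   // the string's storage is heap-backed
    return contents.str();    // moved (or elided) out to the caller
}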
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
int main(int argc, char* argv[])
{
    std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main(int argc, char* argv[])
{
    std::string* program = new std::string(argv[0]); // Bad!
    delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("foo_bar");
}

Line::~Line() {
    delete mString;
}
is actually a lot riskier to use than the following one:
class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "foo_bar";
    // note: there is a cleaner way to write this (a member initializer list).
}
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main()
{
    Line l1;
    Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory, and both will be released at the end of the program.
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
    Line borders[4];
};
Then
int main()
{
    Table table;
}
allocates four std::string instances, four Line instances, and one Table instance, along with all the strings' contents, and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate stack space for every local-scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" returns early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions (typically as a reference or const reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis)). And so forth.
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must eventually be deleted lest they leak: the destructor won't be called, memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e. on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is freed automatically upon function return.
Smart pointers like unique_ptr and shared_ptr solve the dangling-reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; overusing new can have a performance impact. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns; you cannot pass a reference to them back to the caller, only hand them back by copying, returning, or moving by value.
C++ doesn't employ a memory manager of its own; other languages like C# and Java have a garbage collector to handle memory.
C++ implementations typically use operating-system routines to allocate memory, and too much new/delete can fragment the available memory.
With any application, if memory is used frequently, it's advisable to preallocate it and release it when no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within function scope is a proven technique.
The downside of using stack objects is that they create multiple copies of objects on return, when passed to functions, etc. However, smart compilers are well aware of these situations and optimize them well for performance.
It's really tedious in C++ if memory is allocated and released in two different places. The responsibility for the release is always a question, and mostly we rely on some commonly accessible pointers, stack objects (as far as possible), and techniques like auto_ptr (RAII objects).
The best thing is that you have control over memory, and the worst thing is that you will not have any control over memory if improper memory management is employed in the application. The crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for using new as little as possible have been missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or it may already have a suitable memory location ready; we don't know. If your program needs to have consistent and predictable execution time (like in a real-time system or game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization
Yes, you heard me. Your OS needs to make sure your page tables are consistent, and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads, you are actually serialising your threads. (I've done this with 32 CPUs, each hitting new to get a few hundred bytes each. Ouch! That was a royal p.i.t.a. to debug.)
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
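A minimal, self-contained sketch of the make_shared fix (T1, T2, and foo are stand-ins for the types in the snippet above): allocation and ownership transfer happen inside a single call for each argument, so neither object can leak if the other's construction throws.

#include <memory>

struct T1 {};
struct T2 {};
void foo(std::shared_ptr<T1>, std::shared_ptr<T2>) {}

int main() {
    foo(std::make_shared<T1>(), std::make_shared<T2>());  // exception-safe even pre-C++17
}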
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, in this case ensuring that each call to new is immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that, by the time the second new is called, the first object is guaranteed to have already been wrapped in its smart pointer, thus preventing any leak if an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to @Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
    File foo = File("foo.dat");
    // Do things
}
Now, observe that when you fall off that block after the end brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collector to find it.
Had you written
{
    File* foo = new File("foo.dat");
you would want to match it explicitly with
    delete foo;
}
or, even better, allocate your File* as a "smart pointer". If you aren't careful about that, it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File is something like:
class File {
private:
    FileImpl* fd;
public:
    File(String fn) { fd = new FileImpl(fn); }
};
Then FileImpl will still be allocated on the heap, even though the File object itself is on the stack.
And yes, you'd better be sure to have
~File() { delete fd; }
in the class as well; without it, you'll leak memory from the heap even though you apparently never allocated on the heap at all.
new shouldn't be used as little as possible; it should be used as carefully as possible. And it should be used as often as necessary, as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model, then there's no need to use new, with the associated delete and checking of NULL pointers.
In the case where you have lots of short-lived objects, allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope, then new is the right answer. Just make sure that you pay attention to when and how you call delete, and to the possibilities of NULL pointers, use of deleted objects, and all the other gotchas that come with the use of pointers.
When you use new, objects are allocated on the heap. It is generally used when you anticipate expansion. When you declare an object such as
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens the potential for memory leaks. Objects placed on the stack are not prone to memory leaks!
One notable reason to avoid overusing the heap is performance -- specifically, the performance of the default memory-management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order not only leads to memory fragmentation, but also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve, allowing you to mitigate the inherent disadvantages of traditional heap implementations while still using the heap as necessary (a toy sketch follows).
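A toy sketch of the pool idea (illustrative only; it ignores alignment and thread safety): because every block is the same size, allocation and deallocation are O(1) free-list operations, sidestepping general-purpose heap bookkeeping and fragmentation.

#include <cstddef>
#include <vector>

class FixedPool {
    std::vector<std::byte> storage_;   // one contiguous slab
    std::vector<void*> freeList_;      // available blocks
public:
    FixedPool(std::size_t blockSize, std::size_t blockCount)
        : storage_(blockSize * blockCount) {
        for (std::size_t i = 0; i < blockCount; ++i)
            freeList_.push_back(storage_.data() + i * blockSize);
    }
    void* allocate() {
        if (freeList_.empty()) return nullptr;  // pool exhausted
        void* p = freeList_.back();
        freeList_.pop_back();
        return p;
    }
    void deallocate(void* p) { freeList_.push_back(p); }
};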
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous (int* i; i = new int[9999];? Really? int i[9999]; is much clearer), I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects don't typically take up much space in memory (one byte per character in a string). If they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusion and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable: it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements, so the original poster's code actually has absolutely no need for new.
I think the poster meant to say: you do not have to allocate everything on the heap rather than the stack.
Basically, objects are allocated on the stack (if the object's size allows, of course) because of the cheap cost of stack allocation, as opposed to heap-based allocation, which involves quite some work by the allocator and adds verbosity because you then have to manage the data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1, ctor_arg2} for a constructor with arguments). That's why you usually have no reason to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values, the number of which is not known at compile time.
Due to limitations of C++ implementations on common machines: to prevent a stack overflow caused by allocating too much space the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to new, but ensure that memory gets released where the "pointer" goes out of scope.
and for this reason, avoiding explicit calls to new and delete is an official item in the C++ Core Guidelines: Guideline R.11 (a sketch of both alternatives follows).
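A minimal sketch of both points:

#include <memory>
#include <vector>

int main() {
    // Case 1: a runtime-variable number of values, with no explicit new/delete.
    std::vector<int> values;
    for (int i = 0; i < 1000; ++i) values.push_back(i);  // heap-backed, freed automatically

    // Case 2: a single dynamic allocation (e.g. too big for the stack),
    // released automatically when the smart pointer goes out of scope.
    auto big = std::make_unique<std::vector<double>>(1000000);
}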
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is when the library we are using provides value semantics and makes dynamic allocation unnecessary; std::string is a good example.
For object-oriented code, however, using a pointer (which means using new to create the object beforehand) is a must. In order to simplify the complexity of resource management, we have dozens of tools to make it as simple as possible, such as smart pointers. The object-based paradigm or generic paradigm assumes value semantics and requires less or no new, just as the posters elsewhere have stated.
Traditional design patterns, especially those mentioned in GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest-to-read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar: it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have been encoded into various classes. Furthermore, if you need a usage pattern for which there isn't already a standard class, you can write your own class that encodes it!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point in addition to all the correct answers above: it depends on what sort of programming you are doing. Kernel development on Windows, for example: the stack is severely limited, and you might not be able to take page faults as you can in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.
I've been programming for a while, but it's been mostly Java and C#. I've never actually had to manage memory on my own. I recently began programming in C++, and I'm a little confused about when I should store things on the stack and when to store them on the heap.
My understanding is that variables which are accessed very frequently should be stored on the stack, while objects, rarely used variables, and large data structures should all be stored on the heap. Is this correct?
No, the difference between stack and heap isn't performance. It's lifespan: any local variable inside a function (anything you do not malloc() or new) lives on the stack. It goes away when you return from the function. If you want something to live longer than the function that declared it, you must allocate it on the heap.
class Thingy { /* ... */ };  // needs a definition, not just a declaration, to create one on the stack

Thingy* foo()
{
    int a;                              // this int lives on the stack
    Thingy B;                           // this Thingy lives on the stack and will be destroyed when we return from foo
    Thingy* pointerToB = &B;            // this points to an address on the stack
    Thingy* pointerToC = new Thingy();  // this makes a Thingy on the heap;
                                        // pointerToC contains its address

    // This is safe: the Thingy pointed to by pointerToC lives on the heap and outlives foo().
    // Whoever you pass this to must remember to delete it!
    return pointerToC;

    // This would NOT be safe: B lives on the stack and will be destroyed when foo() returns,
    // so whoever uses the returned pointer will probably cause a crash!
    // return pointerToB;
}
For a clearer understanding of what the stack is, come at it from the other end -- rather than try to understand what the stack does in terms of a high level language, look up "call stack" and "calling convention" and see what the machine really does when you call a function. Computer memory is just a series of addresses; "heap" and "stack" are inventions of the compiler.
I would say:
Store it on the stack, if you CAN.
Store it on the heap, if you NEED TO.
Therefore, prefer the stack to the heap. Some possible reasons that you can't store something on the stack are:
It's too big: in multithreaded programs on a 32-bit OS, the stack has a small size, fixed (at thread-creation time, at least) at typically just a few megs, so that you can create lots of threads without exhausting address space. For 64-bit programs, or single-threaded (Linux, anyway) programs, this is not a major issue. Under 32-bit Linux, single-threaded programs usually use dynamic stacks which can keep growing until they reach the top of the heap.
You need to access it outside the scope of the original stack frame - this is really the main reason.
It is possible, with sensible compilers, to allocate non-fixed-size objects on the stack (usually arrays whose size is not known at compile time).
It's more subtle than the other answers suggest. There is no absolute divide between data on the stack and data on the heap based on how you declare it. For example:
std::vector<int> v(10);
In the body of a function, that declares a vector (dynamic array) of ten integers on the stack. But the storage managed by the vector is not on the stack.
Ah, but (the other answers suggest) the lifetime of that storage is bounded by the lifetime of the vector itself, which here is stack-based, so it makes no difference how it's implemented - we can only treat it as a stack-based object with value semantics.
Not so. Suppose the function was:
void GetSomeNumbers(std::vector<int>& result)
{
    std::vector<int> v(10);
    // fill v with numbers
    result.swap(v);
}
So anything with a swap function (and any complex value type should have one) can serve as a kind of rebindable reference to some heap data, under a system which guarantees a single owner of that data.
Therefore the modern C++ approach is to never store the address of heap data in naked local pointer variables. All heap allocations must be hidden inside classes.
If you do that, you can think of all variables in your program as if they were simple value types, and forget about the heap altogether (except when writing a new value-like wrapper class for some heap data, which ought to be unusual).
You merely have to retain one special bit of knowledge to help you optimise: where possible, instead of assigning one variable to another like this:
a = b;
swap them like this:
a.swap(b);
because it's much faster and it doesn't throw exceptions. The only requirement is that you don't need b to continue to hold the same value (it's going to get a's value instead, which would be trashed in a = b).
The downside is that this approach forces you to return values from functions via output parameters instead of the actual return value. But they're fixing that in C++0x with rvalue references.
In the most complicated situations of all, you would take this idea to the general extreme and use a smart pointer class such as shared_ptr which is already in tr1. (Although I'd argue that if you seem to need it, you've possibly moved outside Standard C++'s sweet spot of applicability.)
You would also store an item on the heap if it needs to be used outside the scope of the function in which it is created. One idiom used with stack objects is called RAII: using a stack-based object as a wrapper for a resource, so that when the object is destroyed, the resource is cleaned up. Stack-based objects are easier to keep track of when you might be throwing exceptions: you don't need to concern yourself with deleting a heap-based object in an exception handler. This is why raw pointers are not normally used in modern C++; you would use a smart pointer, which can be a stack-based wrapper for a raw pointer to a heap-based object.
To add to the other answers, it can also be about performance, at least a little bit. Not that you should worry about this unless it's relevant for you, but:
Allocating on the heap requires finding and tracking a block of memory, which is not a constant-time operation (and takes some cycles and overhead). This can get slower as memory becomes fragmented and/or as you get close to using 100% of your address space. On the other hand, stack allocations are constant-time, basically "free" operations.
Another thing to consider (again, really only important if it becomes an issue) is that the stack size is typically fixed and can be much smaller than the heap size. So if you're allocating large objects or many small objects, you probably want to use the heap; if you run out of stack space, you hit the site's titular error: a stack overflow. Not usually a big deal, but another thing to consider.
The stack is more efficient, and makes it easier to manage scoped data.
But the heap should be used for anything larger than a few KB (it's easy in C++: just create a boost::scoped_ptr on the stack to hold a pointer to the allocated memory).
Consider a recursive algorithm that keeps calling into itself. It's very hard to limit or even guess the total stack usage! Whereas on the heap, the allocator (malloc() or new) can indicate out-of-memory by returning NULL or throwing.
Source: the Linux kernel, whose stack is no larger than 8 KB!
For completeness, you may read Miro Samek's article about the problems of using the heap in the context of embedded software.
A Heap of Problems
The choice of whether to allocate on the heap or on the stack is one that is made for you, depending on how your variable is declared. If you allocate something dynamically, using a "new" call, you are allocating from the heap. If you declare something as a local variable or a function parameter, it is allocated on the stack. (Global variables, for their part, live in static storage, not on the stack.)
In my opinion there are two deciding factors
1) Scope of variable
2) Performance.
I would prefer to use the stack in most cases, but if you need access to a variable outside its scope, you can use the heap.
To enhance performance while using the heap, you can also create a dedicated heap block and allocate from it, which can help performance compared with allocating each variable at a different memory location.
This has probably been answered quite well already. I would like to point you to the series of articles below for a deeper understanding of the low-level details. Alex Darby has a series of articles where he walks you through with a debugger. Here is Part 3, about the stack.
http://www.altdevblogaday.com/2011/12/14/c-c-low-level-curriculum-part-3-the-stack/