Just a design/optimization question. When do you store pointers or objects and why? For example, I believe both of these work (barring compile errors):
class A {
    std::unique_ptr<Object> object_ptr;
public:
    A();
};
A::A() : object_ptr(new Object()) {}

class B {
    Object object;
public:
    B();
};
B::B() : object(Object()) {}
I believe one difference shows up when instantiating on the stack versus the heap?
For example:
int main() {
    std::unique_ptr<A> a_ptr;
    std::unique_ptr<B> b_ptr;
    a_ptr.reset(new A()); // (*object_ptr) on heap, (*a_ptr) on heap?
    b_ptr.reset(new B()); // (*b_ptr) on heap, its object member on heap?
    A a; // a on stack, (*object_ptr) on heap?
    B b; // b on stack, object on stack?
}
Also, sizeof(A) should be < sizeof(B), assuming Object is bigger than a pointer?
Are there any other issues that I am missing?
(Daniel reminded me about the inheritance issue in his related post in the comments)
So: stack allocation is generally faster than heap allocation, but sizeof(A) is smaller than sizeof(B). Is this one of those tradeoffs that cannot be settled without profiling the case in question, even with move semantics? Or are there rules of thumb for when one is more advantageous than the other?
(San Jacinto corrected me: it is stack/heap allocation that is faster, not the stack/heap itself)
I would guess that repeated copy construction multiplies the cost (3 copies would incur roughly 3x the hit of constructing the first instance). But with move construction available, might it be more advantageous to use the stack as much as possible?
Here is a related question, but not exactly the same.
C++ STL: should I store entire objects, or pointers to objects?
Thanks!
If you have a big object inside your A class, then I'd store a pointer to it, but for small objects or primitive types you really don't need to store pointers in most cases.
Also, whether something is stored on the stack or on the heap (free store) is really implementation dependent, and A a is not always guaranteed to be on the stack.
It's better to call this an automatic object, because its storage duration is determined by the scope of the function it is declared in. When the function returns, a will be destroyed.
Pointers require the use of new, and that does carry some overhead, but on today's machines I'd say it is trivial in most cases. Of course, if you start newing up millions of objects, you will start seeing performance issues.
Each situation is different, and when you should and shouldn't use a pointer, instead of an automatic object, is largely dependent on your situation.
This depends on a lot of specific factors, and either approach can have its merits. I'd say if you will exclusively use the outer object through dynamic allocation, then you might as well make all the members direct members and avoid the additional member allocation. On the other hand, if the outer object is allocated automatically, large members should probably be handled through a unique_ptr.
There's an additional benefit to handling members only through pointers: You remove compile-time dependencies, and the header file for the outer class may be able to get away with a forward-declaration of the inner class, rather than requiring full inclusion of the inner class's header ("PIMPL"). In large projects this sort of decoupling may turn out to be economically sensible.
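For illustration, a minimal sketch of that decoupling, with hypothetical Outer/Inner classes:

// outer.h -- compiles without including inner.h
#include <memory>

class Inner;  // a forward declaration suffices for a pointer member

class Outer {
public:
    Outer();
    ~Outer();  // defined in outer.cpp, where Inner is a complete type
private:
    std::unique_ptr<Inner> inner_;
};

// outer.cpp
#include "outer.h"
#include "inner.h"

Outer::Outer() : inner_(std::make_unique<Inner>()) {}
Outer::~Outer() = default;  // unique_ptr's deleter needs Inner complete here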
The heap is not "slower" than the stack. Heap allocation can be slower than stack allocation, and poor cache locality may cause a lot of cache misses if you design your objects and data structures in such a way that there is not a lot of contiguous memory access. So from this standpoint, it depends on what your design and usage goals are.
Even setting this aside, you have to question your copy semantics too. If you want deep copies of your objects (and your objects' objects are also deeply copied), then why even store pointers? If it's okay to have shared memory due to copy semantics, then store pointers but make sure you don't free the memory twice in the dtor.
I tend to use pointers under two conditions: class member initialization order matters deeply, and I'm injecting dependencies into an object. In most other cases, I use non-pointer types.
edit: There are two additional cases when I use pointers: 1) to avoid circular include dependencies (although I may use a reference in some cases), 2) With the intention of using polymorphic function calls.
There are a few cases where you have almost no choice but to store a pointer. One obvious one is when you're creating something like a binary tree:
template <class T>
struct tree_node {
    tree_node *left, *right;
    T data;
};
In this case, the definition is basically recursive, and you don't know up-front how many descendants a tree node might have. You're pretty much stuck with (at least some variation of) storing pointers, and allocating descendant nodes as needed.
There are also cases like dynamic strings where you have only a single object (or array of objects) in the parent object, but its size can vary over a wide enough range that you just about need to (at least provide for the possibility to) use dynamic allocation. With strings, small sizes are common enough that there's a fairly widely-used "short string optimization", where the string object directly includes enough space for strings up to some limit, as well as a pointer to allow dynamic allocation if the string exceeds that size:
template <class T>
class some_string {
    static const size_t limit = 20;
    size_t allocated;
    size_t in_use;
    union {
        T short_data[limit];
        T *long_data;
    };
    // ...
};
A less obvious reason to use a pointer instead of directly storing a sub-object is for the sake of exception safety. Just for one obvious example, if you store only pointers in a parent object, that can (usually does) make it trivial to provide a swap for those objects that gives the nothrow guarantee:
template <class T>
class parent {
    T *data;
    friend void swap(parent &a, parent &b) throw() {
        T *temp = a.data;
        a.data = b.data;
        b.data = temp;
    }
};
With only a couple of (usually valid) assumptions:
the pointers are valid to start with, and
assigning valid pointers will never throw an exception
...it's trivial for this swap to give the nothrow guarantee unconditionally (i.e., we can just say: "swap will not throw"). If parent stored objects directly instead of pointers, we could only guarantee that conditionally (e.g., "swap will throw if and only if the copy constructor or assignment operator for T throws").
For C++11, using a pointer like this often (usually?) makes it easy to provide an extremely efficient move constructor (that also gives the nothrow guarantee). Of course, using a pointer to (most of) the data isn't the only possible route to fast move construction -- but it is an easy one.
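A minimal sketch of that route, reusing the hypothetical parent from above (copy operations omitted for brevity):

template <class T>
class parent {
    T *data;
public:
    parent() : data(new T()) {}
    parent(parent &&other) noexcept : data(other.data) {
        other.data = nullptr;   // leave the source safely destructible
    }
    ~parent() { delete data; }  // deleting a null pointer is a no-op
};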
Finally, there are the cases I suspect you had in mind when you asked the question -- ones where the logic involved doesn't necessarily indicate whether you should use automatic or dynamic allocation. In this case, it's (obviously) a judgement call. From a purely theoretical viewpoint, it probably makes no difference at all which you use in these cases. From a practical viewpoint, however, it can make quite a bit of difference. Even though neither the C nor C++ standard guarantees (or even hints at) anything of the sort, the reality is that on most typical systems, objects using automatic allocation will end up on the stack. On most typical systems (e.g., Windows, Linux) the stack is limited to only a fairly small fraction of the available memory (typically on the order of single-digit to low double-digit megabytes).
This means that if all the objects of these types that exist at any given time might exceed a few megabytes (or so), you need to ensure that (at least most of) the data is allocated dynamically, not automatically. There are two ways to do that: you can either leave it to the user to allocate the parent objects dynamically when/if they might exceed the available stack space, or else you can have the user work with relatively small "shell" objects that allocate space dynamically on the user's behalf.
If that's at all likely to be an issue, it's almost always preferable for the class to handle the dynamic allocation instead of forcing the user to do so. This has two obvious good points:
The user gets to use stack-based resource management (SBRM, aka RAII), and
The effects of limited stack space are limited instead of "percolating" through the whole design.
Bottom line: especially for a template where the type being stored isn't known up-front, I'd tend to favor a pointer and dynamic allocation. I'd reserve direct storage of sub-objects primarily to situations where I know the stored type will (almost?) always be quite small, or where profiling has indicated that dynamic allocation is causing a real speed problem. In the latter case, however, I'd give at least some thought to alternatives like overloading operator new for that class.
Related
I'm starting with the assumption that, generally, it is a good idea to allocate small objects in the stack, and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass {
public:
    SomeClass() { /* initialize some potentially big STL container */ }
private:
    std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass {
public:
    SomeOtherClass() { /* initialize some potentially big STL container,
                          but is it allocated in any different way? */ }
private:
    std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions.
std::unique_ptr<SomeClass> someClassFactory() {
    return std::make_unique<SomeClass>();
}

std::unique_ptr<SomeOtherClass> someOtherClassFactory() {
    return std::make_unique<SomeOtherClass>();
}
Use case:
int main() {
    // in my case I can reliably assume that objects themselves
    // are going to always be allocated in dynamic memory
    auto pSomeClassObject(someClassFactory());
    auto pSomeOtherClassObject(someOtherClassFactory());
    return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage for choosing A or B? Specifically, should I generally choose design A because it's simpler or are there more considerations? Is B morally wrong because it can dangle for a std::vector?
tl;dr : Is it wrong to have a smart pointer pointing to a STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some google keywords lead me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and the only advantage is that it gains an additional possible state: "vector doesn't exist". However, if you care about that state, use boost::optional<std::vector<double>> instead. You should almost never have a heap-allocated container, and definitely never via a unique_ptr. It actually works fine - no "dangling" - it's just pointlessly slow.
Using std::unique_ptr here is just wasteful unless your goal is a compiler firewall (basically hiding the compile-time dependency on vector, but then you'd need a forward declaration of a standard container).
You're adding an indirection but, more importantly, the full contents of SomeClass turn into 3 separate memory blocks to load when accessing them (the SomeClass block containing the unique_ptr, pointing to std::vector's block, pointing to its element array). In addition you're paying one extra superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers themselves are small objects, not big ones, pointing to potentially big blocks. vector isn't big; its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. Before move semantics, there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable), but with move semantics in place it's worth thinking of it as a little handle pointing to big, dynamic contents that can be shallow-copied and swapped cheaply.
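A quick way to convince yourself of this (the sizes below are typical for a 64-bit implementation, not guaranteed):

#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<double> big(1000000);            // ~8 MB of element data on the heap
    std::cout << sizeof(big) << '\n';            // typically 24: just the small handle
    std::vector<double> stolen = std::move(big); // shallow: a few pointers change hands
}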
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third party library that provides some data structures whose contents can get very large but they fail to provide those cheap, non-invalidating move/swap semantics, then you might superficially wrap it around a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
I agree with @MooingDuck; I don't think using std::unique_ptr has any compelling advantages. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).
I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is claimed automatically at the end of scope. As soon as execution of current code block (delimited using {}) is completed, memory for all variables in that block is automatically collected. This is also the moment where destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from a function and keep the allocated memory block.
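As a sketch of that second case: a readfile written with std::string can return the data precisely because the string's buffer is heap-allocated under the hood (error handling omitted, stream types assumed from the standard library):

#include <fstream>
#include <sstream>
#include <string>

std::string readfile(const std::string &path) {
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();  // the string's buffer lives on the heap...
    return contents.str();   // ...so the data can outlive this stack frame
}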
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
int main(int argc, char* argv[])
{
    std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main(int argc, char* argv[])
{
    std::string *program = new std::string(argv[0]);  // Bad!
    delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("foo_bar");
}

Line::~Line() {
    delete mString;
}
Is actually a lot more risky to use than the following one:
class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "foo_bar";
    // note: there is a cleaner way to write this.
}
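The cleaner way the comment alludes to is a member initializer list, which constructs mString directly instead of default-constructing it and then assigning:

Line::Line() : mString("foo_bar") {}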
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main()
{
    Line l1;
    Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory and both will be released at the end of the program.
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
    Line borders[4];
};

Then

int main()
{
    Table table;
}
allocates four std::string instances, four Line instances, one Table instance, and all the strings' contents, and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate space—on the stack—for every local scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" issues return early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions, typically as a reference or const-reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis). And so forth.
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must be eventually deleted lest they leak. The destructor won't be called, memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e., on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is freed automatically when the scope is exited.
Smart pointers like unique_ptr, shared_ptr solve the dangling reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; there can be a performance impact for overusing new. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns - you cannot hand the caller a reference to them; you can only return their contents by value, copying or moving them out.
C++ doesn't employ any memory manager of its own. Other languages like C# and Java have a garbage collector to handle the memory.
C++ implementations typically use operating system routines to allocate the memory, and too much new/delete can fragment the available memory.
With any application, if memory is used frequently it's advisable to preallocate it and release it when no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within the scope of a function is a proven technique.
The downside of using stack objects is that they get copied when returning from functions, passing to functions, etc. However, smart compilers are well aware of these situations and optimize them well for performance.
It's really tedious in C++ if memory is allocated and released in two different places. The responsibility for the release is always a question, and mostly we rely on commonly accessible pointers, stack objects (as much as possible) and techniques like auto_ptr (RAII objects).
The best thing is that you have control over the memory, and the worst thing is that you will not have any control if the application uses improper memory management. Crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for doing as few new's as possible have been missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or it may already have a suitable memory location ready; we don't know. If your program needs to have consistent and predictable execution time (like in a real-time system or game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization point
Yes, you heard me. Your OS needs to make sure your page tables are consistent and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads you are actually serialising your threads (I've done this with 32 CPUs, each hitting on new to get a few hundred bytes each, ouch! That was a royal p.i.t.a. to debug.)
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
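With make_shared, allocation and ownership are established in one indivisible step, so there is no window in which a raw pointer is left unowned:

foo(std::make_shared<T1>(), std::make_shared<T2>());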
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, in this case ensuring that each call to new() must be immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that, by the time the second new() is called, it is guaranteed that the first object has already been wrapped in its smart pointer, thus preventing any leaks in case an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to @Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
    File foo = File("foo.dat");
    // Do things
}
Now, observe that when you fall off that block after the end-brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collection to find it.
Had you written
{
File * foo = new File("foo.dat");
you would want to match it explicitly with
delete foo;
}
or even better, allocate your File * as a "smart pointer". If you aren't careful about that it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File is something like:
class File {
private:
    FileImpl *fd;
public:
    File(std::string fn) { fd = new FileImpl(fn); }
};
Then FileImpl will still be allocated on the heap, even though the File object itself appears on the stack.
And yes, you'd better be sure to have
~File() { delete fd; }
in the class as well; without it, you'll leak memory from the heap even though you apparently didn't allocate on the heap at all.
new() shouldn't be used as little as possible. It should be used as carefully as possible. And it should be used as often as necessary as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model then there's no need to use new(), with the associated delete() and checking of NULL pointers.
In the case where you have lots of short-lived objects, allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope then new() is the right answer. Just make sure that you pay attention to when and how you call delete() and the possibilities of NULL pointers, using deleted objects and all of the other gotchas that come with the use of pointers.
When you use new, objects are allocated to the heap. It is generally used when you anticipate expansion. When you declare an object such as,
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens the potential for memory leaks. Objects placed on the stack are not prone to memory leaks!
One notable reason to avoid overusing the heap is for performance -- specifically involving the performance of the default memory management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order leads not only to memory fragmentation, but it also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve, allowing you to mitigate the inherent disadvantages of traditional heap implementations, while still allowing you to use the heap as necessary.
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous. (int *i; i = new int[9999];? really? int i[9999]; is much clearer.) I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects typically don't take up much space in memory (one byte per character, in a string). And if they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusion and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about the issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable - it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements. So the original poster's code actually has absolutely no needs for new.
I think the poster meant to say "You do not have to allocate everything on the heap" rather than on the stack.
Basically, objects are allocated on the stack (if the object size allows, of course) because of the cheap cost of stack-allocation, rather than heap-based allocation which involves quite some work by the allocator, and adds verbosity because then you have to manage data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1,ctor_arg2} for a constructor with arguments). That's why usually you just have no reason to want to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values the number of which is not known at compile time.
Due to limitations of C++ implementations on common machines - to prevent a stack overflow caused by allocating too much space when creating values the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to new, but ensure that memory gets released where the "pointer" goes out of scope.
and for this reason, it is an official item in the C++ Core Guidelines to avoid explicit new and delete: Guideline R.11.
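A small sketch combining both alternatives (the names are illustrative):

#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

struct Widget { int id = 0; };

int main() {
    std::size_t n = 0;
    std::cin >> n;                        // element count known only at run time
    std::vector<int> values(n);           // heap storage without any explicit new/delete

    auto w = std::make_unique<Widget>();  // heap object, released when w goes out of scope
    w->id = 42;
}                                         // both allocations are freed here automatically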
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is when the library we are using provides value semantics and makes dynamic allocation unnecessary. std::string is a good example.
For object-oriented code, however, using a pointer - which means using new to create it beforehand - is a must. In order to simplify the complexity of resource management, we have dozens of tools to make it as simple as possible, such as smart pointers. The object-based paradigm or generic paradigm assumes value semantics and requires little or no new, just as the posters elsewhere stated.
Traditional design patterns, especially those mentioned in GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest-to-read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar - it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have been encoded into various classes. Furthermore, if you need a usage pattern for which there isn't already a standard class, you can write your own class that encodes it!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point to all the above correct answers: it depends on what sort of programming you are doing. Kernel development in Windows, for example -> the stack is severely limited and you might not be able to take page faults like in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.
Coming from a C# background, I have only the vaguest idea of memory management in C++ - all I know is that I would have to free memory manually. As a result, my C++ code is written in such a way that objects of type std::vector, std::list, and std::map are freely instantiated, used, but not freed.
I didn't realize this point until I was almost done with my program; now my code consists of the following kinds of patterns:
struct Point_2
{
    double x;
    double y;
};

struct Point_3
{
    double x;
    double y;
    double z;
};
list<list<Point_2>> Computation::ComputationJob
    (list<Point_3> pts3D, vector<Point_2> vectors)
{
    map<Point_2, double> pt2DMap = ConstructPointMap(pts3D);
    vector<Point_2> vectorList = ConstructVectors(vectors);
    list<list<Point_2>> faceList2D = ConstructPoints(vectorList, pt2DMap);
    return faceList2D;
}
My question is: must I free every.single.one of the lists used (in the above example, this means I would have to free pt2DMap, vectorList and faceList2D)? That would be very tedious! I might just as well rewrite my Computation class so that it is less prone to memory leaks.
Any idea how to fix this?
No: if objects are not allocated with new, they need not be freed/deleted explicitly. When they go out of scope, they are deallocated automatically. When that happens, the destructor is called, which should deallocate all objects that they refer to. (This is called Resource Acquisition Is Initialization, or RAII, and standard classes such as std::list and std::vector follow this pattern.)
If you do use new, then you should either use a smart pointer (scoped_ptr) or explicitly call delete. The best place to call delete is in a destructor (for reasons of exception safety), though smart pointers should be preferred whenever possible.
What I can say in general is that the C++ standard containers make copies of your objects behind the scenes. You have no control over that. What this means is that if construction of your objects (Point_2 in your case) involves any resource allocations (e.g. new or malloc calls), then you have to write custom versions of the copy constructor and destructor that make this behave sensibly when your map decides to copy Point_2s around. Usually this involves techniques like reference counting.
Many people find it much easier to just put pointers to complex object into standard containers, rather than the objects themselves.
If you don't do anything special in constructors or destructors for your objects (which now appears to be the case for you), then there's no problem whatsoever. Some containers (like maps) will be doing dynamic allocations behind the scenes, but that is effectively invisible to you. The containers worry about their resource allocations. You only have to worry about yours.
All STL containers clean up their contents automatically; all you must take care of is cleaning up data you allocate dynamically yourself (i.e., the rule is: take care of the pointers).
For example, if you have list<MyType> - a list containing objects of some custom type - then on destruction it will call ~MyType(), which should take care of properly cleaning up the object's contents (i.e., if MyType holds pointers to allocated memory, you should delete them in its destructor).
On the other hand, if you use list<MyType*>, the container does not know how to clean this up properly: it stores scalar values (just like integers) and will destroy just the pointers themselves, without cleaning up the pointed-to content, so you need to do that manually.
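For instance, a sketch of what the pointer case requires, and a common way around it (MyType is a placeholder):

#include <list>
#include <memory>

struct MyType { /* ... */ };

void raw_pointer_list() {
    std::list<MyType*> items;
    items.push_back(new MyType);
    for (MyType *p : items) delete p;  // manual cleanup, or the objects leak
}

void smart_pointer_list() {
    std::list<std::unique_ptr<MyType>> items;
    items.push_back(std::make_unique<MyType>());
}  // element destructors run automatically; nothing to clean up by hand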
A really good piece of advice (it helped me a lot years ago :) ) when switching from Java/C# to C++ is to carefully trace each dynamic memory object's lifecycle: a) where it gets created, b) where it gets used, c) where and when it gets deleted.
Make sure it is cleaned only once and does not get referenced after that!
In C++, if I have a class that needs to hold a member which could be dynamically allocated and used through a pointer, or not, like this:
class A {
    type a;
};

or

class A {
    A();
    ~A();
    type* a;
};
and in the constructor:
A::A() {
    a = new type();
}
and destructor:
A::~A() {
    delete a;
}
are there any advantages or disadvantages to either one, aside from the dynamic one requiring more code? Do they behave differently (aside from the pointer having to be dereferenced) or is one slower than the other? Which one should I use?
There are several differences:
The size of every member must be known when you're defining a class. This means you must include your type header, and you can't just use a forward-declaration as you would with a pointer member (since the size of all pointers is known). This has implications for #include clutter and compile times for large projects.
The memory for the data member is part of the enclosing class instance, so it will be allocated at the same time, in the same place, as all the other class members (whether on the stack or the heap). This has implications for data locality - having everything in the same place could potentially lead to better cache utilization, etc. Stack allocation will likely be a tad faster than heap allocation. Declaring too many huge object instances could blow your stack quicker.
The pointer type is trickier to manage - since it doesn't automatically get allocated or destroyed along with the class, you need to make sure to do that yourself. This becomes tricky with multiple pointer members - if you're newing all of them in the constructor and halfway through the process there's an exception, the destructor doesn't get called and you have a memory leak. It's better to assign pointer members to a smart-pointer wrapper (such as std::unique_ptr) immediately; this way the cleanup gets handled automatically (and you don't need to worry about deleting them in the destructor, often saving you from writing one at all). Also, any time you're handling resources manually you need to worry about copy constructors and assignment operators.
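A sketch of that approach (Part and Widget are hypothetical types):

#include <memory>

struct Part { /* ... */ };

class Widget {
    std::unique_ptr<Part> left_;
    std::unique_ptr<Part> right_;
public:
    // If the second allocation throws, left_'s destructor still runs,
    // so nothing leaks - and no hand-written destructor is needed.
    Widget()
        : left_(std::make_unique<Part>()),
          right_(std::make_unique<Part>()) {}
};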
With the pointer you have more control, but also more responsibilities. You have more control in the sense that you can decide the lifetime of the object more precisely, while without the pointer the lifetime is essentially equal to the lifetime of the containing object. Also, with the pointer the member could actually be an instance of a subclass of the pointer type.
Performance-wise, using the pointer does mean more memory usage, more memory fragmentation, and dereferencing does take a small amount of time. For all but the most performance-critical code, none of this is really worth worrying about, however.
The main difference is that the pointer can potentially point somewhere else.
edit
Laurence's answer isn't wrong, but it's a bit general. In specific, dynamic allocation is going to be slightly slower. Dereferencing through the pointer is likewise going to be very slightly slower. Again, this is not a lot of speed loss, and the flexibility it buys may well be very much worth it.
The main difference is that if you don't use a pointer, the memory for the inner member will be allocated as a part of the memory allocated for the containing object. If you use new, you will get memory in separate chunks (you already seem to have proper creation and destruction of the referenced object down)
You need to understand the implications of the default copy constructor and copy assignment operator when using raw pointers. The raw pointer gets copied in both cases. In other words, you will end up with multiple objects (and raw pointers) pointing to the same memory location. Therefore, your destructor written as above will attempt to delete the same memory multiple times.
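A minimal sketch of the conventional fix (the Rule of Three), reusing the placeholder type from the question and assuming it is copyable:

class A {
    type *a;
public:
    A() : a(new type()) {}
    ~A() { delete a; }

    // Without these, the compiler-generated copies would share 'a',
    // and the destructor would then delete the same memory twice.
    A(const A &other) : a(new type(*other.a)) {}
    A &operator=(const A &other) {
        if (this != &other) {
            type *fresh = new type(*other.a);  // allocate first, for exception safety
            delete a;
            a = fresh;
        }
        return *this;
    }
};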
If the member variable should live beyond the lifetime of the object, or if its ownership should be transferred to another object, then the member should be dynamically (heap) allocated using "new". If it is not, then it is often the best choice to make it a direct member of the class in order to simplify code and lessen the burden on the memory-allocator. Memory allocation is expensive.
I know that delete[] will destroy all array elements and then release the memory.
I initially thought that the compiler needs it just to call the destructor for all elements in the array, but I also have a counter-argument to that, which is:
The heap memory allocator must know the number of bytes allocated, and using sizeof(Type) it's possible to find the number of elements and call the appropriate number of destructors for an array, preventing resource leaks.
So is my assumption correct or not? Please clear up my doubt.
In short, I am not getting the purpose of the [] in delete[].
Scott Meyers says in his Effective C++ book: Item 5: Use the same form in corresponding uses of new and delete.
The big question for delete is this: how many objects reside in the memory being deleted? The answer to that determines how many destructors must be called.
Does the pointer being deleted point to a single object or to an array of objects? The only way for delete to know is for you to tell it. If you don't use brackets in your use of delete, delete assumes a single object is pointed to.
Also, the memory allocator might allocate more space than required to store your objects, and in this case dividing the size of the memory block returned by the size of each object won't work.
Depending on the platform, the _msize (windows), malloc_usable_size (linux) or malloc_size (osx) functions will tell you the real length of the block that was allocated. This information can be exploited when designing growing containers.
Another reason why it won't work is that Foo* foo = new Foo[10] calls operator new[] to allocate the memory. Then delete[] foo; calls operator delete[] to deallocate the memory. As those operators can be overloaded, you have to adhere to the convention; otherwise delete foo; calls operator delete, which may have an implementation incompatible with operator delete[]. It's a matter of semantics, not just keeping track of the number of allocated objects to later issue the right number of destructor calls.
See also:
[16.14] After p = new Fred[n], how does the compiler know there are n objects to be destructed during delete[] p?
Short answer: Magic.
Long answer: The run-time system stores the number of objects, n, somewhere where it can be retrieved if you only know the pointer, p. There are two popular techniques that do this. Both these techniques are in use by commercial-grade compilers, both have tradeoffs, and neither is perfect. These techniques are:
Over-allocate the array and put n just to the left of the first Fred object.
Use an associative array with p as the key and n as the value.
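A rough picture of the first technique (purely illustrative - real layouts are implementation-defined):

// For `Fred *p = new Fred[n];` the over-allocation technique might lay
// the block out as:
//
//   [ size_t n ][ Fred #0 ][ Fred #1 ] ... [ Fred #n-1 ]
//                ^
//                p (the address handed back to the program)
//
// delete[] p can then read n from just before p, run ~Fred() n times
// (typically in reverse order of construction), and free the whole block.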
EDIT: after having read @AndreyT's comments, I dug into my copy of Stroustrup's "The Design and Evolution of C++" and excerpted the following:
How do we ensure that an array is correctly deleted? In particular, how do we ensure that the destructor is called for all elements of an array?
...
Plain delete isn't required to handle both individual objects and arrays. This avoids complicating the common case of allocating and deallocating individual objects. It also avoids encumbering individual objects with information necessary for array deallocation.
An intermediate version of delete[] required the programmer to specify the number of elements of the array.
...
That proved too error prone, so the burden of keeping track of the number of elements was placed on the implementation instead.
As @Marcus mentioned, the rationale may have been "you don't pay for what you don't use".
EDIT2:
In "The C++ Programming Language, 3rd edition", §10.4.7, Bjarne Stroustrup writes:
Exactly how arrays and individual objects are allocated is implementation-dependent. Therefore, different implementations will react differently to incorrect uses of the delete and delete[] operators. In simple and uninteresting cases like the previous one, a compiler can detect the problem, but generally something nasty will happen at run time.
The special destruction operator for arrays, delete[], isn’t logically necessary. However, suppose the implementation of the free store had been required to hold sufficient information for every object to tell if it was an individual or an array. The user could have been relieved of a burden, but that obligation would have imposed significant time and space overheads on some C++ implementations.
The main reason why it was decided to keep separate delete and delete[] is that these two entities are not as similar as it might seem at the first sight. For a naive observer they might seem to be almost the same: just destruct and deallocate, with the only difference in the potential number of objects to process. In reality, the difference is much more significant.
The most important difference between the two is that delete might perform polymorphic deletion of objects, i.e. the static type of the object in question might be different from its dynamic type. delete[] on the other hand must deal with strictly non-polymorphic deletion of arrays. So, internally these two entities implement logic that is significantly different and non-intersecting between the two. Because of the possibility of polymorphic deletion, the functionality of delete is not even remotely the same as the functionality of delete[] on an array of 1 element, as a naive observer might incorrectly assume initially.
Contrary to the strange claims made in some other answers, it is, of course, perfectly possible to replace delete and delete[] with just a single construct that would branch at the very early stage, i.e. it would determine the type of the memory block (array or not) using the household information that would be stored by new/new[], and then jump to the appropriate functionality, equivalent to either delete or delete[]. However, this would be a rather poor design decision, since, once again, the functionality of the two is too different. Forcing both into a single construct would be akin to creating a Swiss Army Knife of a deallocation function. Also, in order to be able to tell an array from a non-array we'd have to introduce an additional piece of household information even into a single-object memory allocations done with plain new. This might easily result in notable memory overhead in single object allocations.
But, once again, the main reason here is the functional difference between delete and delete[]. These language entities possess only apparent skin-deep similarity that exists only at the level of naive specification ("destruct and free memory"), but once one gets to understand in detail what these entities really have to do one realizes that they are too different to be merged into one.
P.S. This is BTW one of the problems with the suggestion about sizeof(type) you made in the question. Because of the potentially polymorphic nature of delete, you don't know the type in delete, which is why you can't obtain any sizeof(type). There are more problems with this idea, but that one is already enough to explain why it won't fly.
The heap itself knows the size of an allocated block - you only need the address. Look at how free() works - you only pass the address and it frees the memory.
The difference between delete (delete[]) and free() is that the former two first call the destructors, then free the memory (possibly using free()). The problem is that delete[] also has only one argument - the address - and with only that address it needs to know the number of objects to run destructors on. So new[] uses some implementation-defined way of recording the number of elements somewhere - usually it prepends the array with the element count. Now delete[] relies on that implementation-specific data to run the destructors and then frees the memory (again, using only the block address).
delete[] just calls a different implementation (function);
There's no reason an allocator couldn't track it (in fact, it would be easy enough to write your own).
I don't know the reason they did not manage it, or the history of the implementation, if I were to guess: Many of these 'well, why wasn't this slightly simpler?' questions (in C++) came down to one or more of:
compatibility with C
performance
In this case, performance. Using delete vs delete[] is easy enough; I believe it could all be abstracted away from the programmer and still be reasonably fast (for general use). delete[] requires only a few additional function calls and operations (omitting destructor calls), but that cost is per call to delete, and unnecessary because the programmer generally knows the type he/she is dealing with (if not, there's likely a bigger problem at hand). So it just avoids calling through the allocator. Additionally, these single allocations may not need to be tracked by the allocator in as much detail; treating every allocation as an array would require additional entries for the count even for trivial allocations. This amounts to multiple levels of allocator implementation simplification, which actually matters to many people, given that it is a very low-level domain.
This is more complicated.
The keyword and the convention to use it to delete an array was invented for the convenience of implementations, and some implementations do use it (I don't know which though. MS VC++ does not).
The convenience is this:
In all other cases, you know the exact size to be freed by other means. When you delete a single object, you can have the size from compile-time sizeof(). When you delete a polymorphic object by base pointer and you have a virtual destructor, you can have the size as a separate entry in vtbl. If you delete an array, how would you know the size of memory to be freed, unless you track it separately?
The special syntax would allow tracking such size only for an array - for instance, by putting it before the address that is returned to the user. This takes up additional resources and is not needed for non-arrays.