Why is creating STL containers dynamically considered bad practice? - c++

Title says it.
Sample of bad practice:
std::vector<Point>* FindPoints()
{
    std::vector<Point>* result = new std::vector<Point>();
    //...
    return result;
}
What's wrong with it if I delete that vector later?
I mostly program in C#, so this problem is not very clear for me in C++ context.

As a rule of thumb, you don't do this because the less you allocate on the heap, the less you risk leaking memory. :)
std::vector is useful also because it automatically manages the memory used for the vector in RAII fashion; by allocating it on the heap you now require an explicit deallocation (with delete result) to avoid leaking its memory. The thing is made complicated by exceptions, which can alter your return path and skip any delete you put on the way. (In C# you don't have such problems because inaccessible memory is simply reclaimed periodically by the garbage collector.)
If you want to return an STL container you have several choices:
just return it by value; in theory you would incur a copy penalty because of the temporaries that are created in the process of returning result, but newer compilers should be able to elide the copy using NRVO. There may also be std::vector implementations that implement a copy-on-write optimization like many std::string implementations do, but I've never heard of one.
On C++0x compilers, instead, move semantics should kick in, avoiding any copy.
Store the pointer to result in an ownership-transferring smart pointer like std::auto_ptr (or std::unique_ptr in C++0x), and change the return type of your function to std::auto_ptr<std::vector<Point> >; that way, your pointer is always encapsulated in a stack object that is automatically destroyed when the function exits (in any way), and that destroys the vector if it's still owned by it. It's also completely clear who owns the returned object (see the sketch after this list).
Make the result vector a parameter passed by reference by the caller, and fill that one instead of returning a new vector.
Hardcore STL option: you would instead provide your data as iterators; the client code would then use std::copy+std::back_inserter or whatever to store such data in whichever container it wants. Not seen much (it can be tricky to code right) but it's worth mentioning.
As #Steve Jessop pointed out in the comments, NRVO works completely only if the return value is used directly to initialize a variable in the calling method; otherwise, the compiler can still elide the construction of the temporary return value, but the assignment operator for the variable to which the return value is assigned may still be called (see #Steve Jessop's comments for details).
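For illustration, here's a minimal sketch of the smart-pointer option above, using C++0x's std::unique_ptr (Point and the function body are placeholders):

#include <memory>
#include <vector>

struct Point { double x, y; };

std::unique_ptr<std::vector<Point>> FindPoints()
{
    std::unique_ptr<std::vector<Point>> result(new std::vector<Point>());
    //...
    return result; // ownership moves to the caller; nothing leaks if an exception is thrown
}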

Creating anything dynamically is bad practice unless it's really necessary. There's rarely a good reason to create a container dynamically, so it's usually not a good idea.
Edit: Usually, instead of worrying about things like how fast or slow returning a container is, most of the code should deal only with an iterator (or two) into the container.

Creating objects dynamically in general is considered a bad practice in C++. What if an exception is thrown from your "//..." code? You'll never be able to delete the object. It is easier and safer to simply do:
std::vector<Point> FindPoints()
{
    std::vector<Point> result;
    //...
    return result;
}
Shorter, safer, more straightforward... As for performance, modern compilers will optimize away the copy on return, and if they can't, move constructors will be executed, so this is still a cheap operation.

Perhaps you're referring to this recent question: C++: vector<string> *args = new vector<string>(); causes SIGABRT
One liner: It's bad practice because it's a pattern that's prone to memory leaks.
You're forcing the caller to accept dynamic allocation and take charge of its lifetime. It's ambiguous from the declaration whether the pointer returned is a static buffer, a buffer owned by some other API (or object), or a buffer that's now owned by the caller. You should avoid this pattern in any language (including plain C) unless it's clear from the function name what's going on (e.g. strdup, malloc).
The usual way is to instead do this:
void FindPoints(std::vector<Point>* ret) {
    std::vector<Point> result;
    //...
    ret->swap(result);
}

void caller() {
    //...
    std::vector<Point> foo;
    FindPoints(&foo);
    // foo deletes itself
}
All objects are on the stack, and all the deletion is taken care of by the compiler. Or just return by value, if you're running a C++0x compiler+STL, or don't mind the copy.

I like Jerry Coffin's answer. Additionally, if you want to avoid returning a copy, consider passing the result container by reference and using the swap() method when needed.
void FindPoints(std::vector<Point> &points)
{
    std::vector<Point> result;
    //...
    result.swap(points);
}

Programming is the art of finding good compromises. Dynamically allocated memory can have its place, of course, and I can even think of problems where a good compromise between code complexity and efficiency is obtained using std::vector<std::vector<T>*>.
However, std::vector does a great job of hiding most needs for dynamically allocated arrays, and managed pointers are often a perfect solution for dynamically allocated single instances. This means it's just not very common to find cases where an unmanaged dynamically allocated container (or dynamically allocated anything, actually) is the best compromise in C++.
This in my opinion doesn't make dynamic allocation "bad", but just "suspect" if you see it in code, because there's a high probability that a better solution is possible.
In your case, for example, I see no reason to use dynamic allocation; just making the function return an std::vector would be efficient and safe. With any decent compiler, Return Value Optimization will be used when assigning to a newly declared vector, and if you need to assign the result to an existing vector you can still do something like:
FindPoints().swap(myvector);
which will not copy any data but just twiddle some pointers (note that you cannot use the apparently more natural myvector.swap(FindPoints()) because of a sometimes-annoying C++ rule that forbids binding temporaries to non-const references).
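A small sketch of that rule, assuming FindPoints returns std::vector<Point> by value:

std::vector<Point> myvector;
FindPoints().swap(myvector);    // OK: member swap is called on the temporary itself
// myvector.swap(FindPoints()); // error: the temporary cannot bind to std::vector<Point>&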
In my experience the biggest source of needs of dynamically allocated objects are complex data structures where the same instance can be reached using multiple access paths (e.g. instances are at the same time both in a doubly linked list and indexed by a map). In the standard library containers are always the only owner of the contained objects (C++ is a copy semantic language) so it may be difficult to implement those solutions efficiently without the pointer and dynamic allocation concept.
Often you can still find reasonable-enough compromises that just use standard containers, however (maybe paying some extra O(log N) lookups that you could have avoided), and that, considering the much simpler code, can IMO be the best compromise in most cases.

Related

For a data member, is there any difference between dynamically allocating this variable(or not) if the containing object is already in dynamic memory?

I'm starting with the assumption that, generally, it is a good idea to allocate small objects in the stack, and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass {
public:
    SomeClass() { /* initialize some potentially big STL container */ }
private:
    std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass {
public:
    SomeOtherClass() { /* initialize some potentially big STL container,
                          but is it allocated in any different way? */ }
private:
    std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions.
std::unique_ptr<SomeClass> someClassFactory() {
    return std::make_unique<SomeClass>();
}

std::unique_ptr<SomeOtherClass> someOtherClassFactory() {
    return std::make_unique<SomeOtherClass>();
}
Use case:
int main() {
    // in my case I can reliably assume that objects themselves
    // are going to always be allocated in dynamic memory
    auto pSomeClassObject(someClassFactory());
    auto pSomeOtherClassObject(someOtherClassFactory());
    return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage for choosing A or B? Specifically, should I generally choose design A because it's simpler or are there more considerations? Is B morally wrong because it can dangle for a std::vector?
tl;dr : Is it wrong to have a smart pointer pointing to a STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some google keywords lead me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and its only advantage is that it adds an additional possible state: "the vector doesn't exist". However, if you care about that state, use boost::optional<std::vector> instead. You should almost never have a heap-allocated container, and definitely never hold one through a unique_ptr. It actually works fine, no "dangling"; it's just pointlessly slow.
Using std::unique_ptr here is just wasteful unless your goal is a compiler firewall (basically hiding the compile-time dependency on vector, but then you'd need a way to forward-declare the standard container).
You're adding an indirection but, more importantly, the full contents of SomeClass turn into 3 separate memory blocks to load when accessing the contents (SomeClass merged with/containing unique_ptr's block, pointing to std::vector's block, pointing to its element array). In addition, you're paying one extra, superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers themselves are small objects pointing to potentially big blocks; vector isn't big, its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. Before move semantics there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable), but with move semantics in place it's worth thinking of it as a little handle pointing to big, dynamic contents that can be shallow-copied and swapped cheaply.
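A minimal sketch of that point (C++11); moving a vector just steals its internal buffer, so no unique_ptr wrapper is needed for a cheap transfer:

#include <utility>
#include <vector>

int main() {
    std::vector<double> big(1000000, 3.14);
    std::vector<double> other = std::move(big); // steals the heap buffer; no elements are copied
    // big is left in a valid but unspecified (typically empty) state
}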
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third-party library that provides data structures whose contents can get very large but which fails to provide those cheap, non-invalidating move/swap semantics, then you might wrap one in a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
I agree with #MooingDuck; I don't think using std::unique_ptr has any compelling advantages. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).

Why should C++ programmers minimize use of 'new'?

I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is claimed automatically at the end of the scope. As soon as execution of the current code block (delimited using {}) is completed, memory for all variables in that block is automatically collected. This is also the moment when destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from the function and keep the allocated memory block (a sketch follows below).
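A minimal sketch of that readfile function, assuming the file fits in memory (error handling omitted); std::string carries the heap-allocated contents out of the function:

#include <fstream>
#include <sstream>
#include <string>

std::string readfile(const std::string& path) {
    std::ifstream in(path.c_str());
    std::ostringstream buffer;
    buffer << in.rdbuf();  // stream the whole file into the buffer
    return buffer.str();   // the string's heap block travels out with the return value
}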
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
int main ( int argc, char* argv[] )
{
    std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main ( int argc, char* argv[] )
{
    std::string * program = new std::string(argv[0]); // Bad!
    delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("foo_bar");
}

Line::~Line() {
    delete mString;
}
is actually a lot riskier to use than the following one:
class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "foo_bar";
    // note: there is a cleaner way to write this.
}
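(For reference, the cleaner way hinted at in the comment is a member initializer list, which constructs the string directly instead of default-constructing it and then assigning:)

Line::Line() : mString("foo_bar") {}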
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main ()
{
    Line l1;
    Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory and both will be released at the end of the program.
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
    Line borders[4];
};
Then
int main ()
{
    Table table;
}
allocates four std::string instances, four Line instances, one Table instance, and all the strings' contents, and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate space—on the stack—for every local scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" returns early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions (typically as a reference or const reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis)). And so forth.
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must eventually be deleted lest they leak: the destructor won't be called, memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e. on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is freed automatically upon function return.
Smart pointers like unique_ptr, shared_ptr solve the dangling reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; there can be a performance impact for overusing new. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns; you cannot pass a reference to them back to the caller, only copy, return, or move them by value.
C++ doesn't employ a memory manager of its own. Other languages like C# and Java have a garbage collector to handle memory.
C++ implementations typically use operating system routines to allocate memory, and too much new/delete can fragment the available memory.
With any application, if memory is used frequently it's advisable to preallocate it and release it when it is no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within the scope of a function is a proven technique.
The downside of stack objects is that multiple copies of objects are created on returning, passing to functions, etc. However, smart compilers are well aware of these situations and optimize them well for performance.
It's really tedious in C++ if memory is allocated and released in two different places. The responsibility for release is always a question, and mostly we rely on commonly accessible pointers, stack objects (wherever possible) and techniques like auto_ptr (RAII objects).
The best thing is that you have control over the memory; the worst thing is that you have no control at all if the application's memory management is improper. Crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for doing as few news as possible have been missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or the allocator may already have a suitable memory location ready; we don't know. If your program needs consistent and predictable execution time (as in a real-time system or a game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization
Yes, you heard me. Your OS needs to make sure your page tables are consistent, and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads you are actually serialising your threads (I've done this with 32 CPUs, each hitting new to get a few hundred bytes each, ouch! That was a royal p.i.t.a. to debug).
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
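A minimal sketch of the fix, reusing the hypothetical foo, T1 and T2 from the snippet above:

foo(std::make_shared<T1>(), std::make_shared<T2>());
// each allocation and its wrapping into a shared_ptr now happen as one step,
// so a failure in the other argument cannot leak it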
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, ensuring that each call to new is immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that by the time the second new is called, the first object is guaranteed to have already been wrapped in its smart pointer, thus preventing any leak in case an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to #Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
    File foo = File("foo.dat");
    // Do things
}
Now, observe that when you fall off that block after the end-brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collection to find it.
Had you written
{
    File* foo = new File("foo.dat");
you would want to match it explicitly with
    delete foo;
}
or even better, allocate your File * as a "smart pointer". If you aren't careful about that it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new then you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File were something like:
class File {
private:
    FileImpl* fd;
public:
    File(String fn) { fd = new FileImpl(fn); }
};
Then the FileImpl will still be allocated on the heap, even though the File object itself lives on the stack.
And yes, you'd better be sure to have
~File() { delete fd; }
in the class as well; without it, you'll leak memory from the heap even if you didn't apparently allocate on the heap at all.
new() shouldn't be used as little as possible. It should be used as carefully as possible. And it should be used as often as necessary, as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model then there's no need to use new(), with the associated delete() and checking of NULL pointers.
In the case where you have lots of short-lived objects, allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope then new() is the right answer. Just make sure that you pay attention to when and how you call delete() and the possibilities of NULL pointers, using deleted objects and all of the other gotchas that come with the use of pointers.
When you use new, objects are allocated on the heap. It is generally used when you anticipate expansion. When you declare an object such as,
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens the potential for memory leaks. Objects placed on the stack are not prone to memory leaks!
One notable reason to avoid overusing the heap is for performance -- specifically involving the performance of the default memory management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order leads not only to memory fragmentation, but it also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve, allowing you to mitigate the inherent disadvantages of traditional heap implementations while still using the heap as necessary.
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous. (int *i; i = new int[9999];? Really? int i[9999]; is much clearer.) I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects typically don't take up much space in memory (one byte per character in a string). If they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusion and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable - it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements. So the original poster's code actually has absolutely no need for new.
I think the poster meant to say you do not have to allocate everything on the heap rather than on the stack.
Basically, objects are allocated on the stack (if the object size allows, of course) because of the cheap cost of stack-allocation, rather than heap-based allocation which involves quite some work by the allocator, and adds verbosity because then you have to manage data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1,ctor_arg2} for a constructor with arguments). That's why usually you just have no reason to want to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values the number of which is not known at compile time.
Due to limitations of the C++ implementation on common machines: to prevent a stack overflow caused by allocating too much space when creating values the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to new, but ensure that memory gets released where the "pointer" goes out of scope.
and for this reason, it is an official item in the C++ community Coding Guidelines to avoid explicit new and delete: Guideline R.11.
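A minimal sketch (C++14) of how those two cases are covered without writing new yourself; the sizes are arbitrary:

#include <array>
#include <cstddef>
#include <memory>
#include <vector>

int main() {
    std::size_t n = 100000;      // size known only at run time
    std::vector<int> values(n);  // case 1: runtime-sized storage, no explicit new

    // case 2: an object too big for the stack, still no explicit new
    auto big = std::make_unique<std::array<double, 1000000>>();
}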
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is that the library we are using provides value semantics and makes dynamic allocation unnecessary. std::string is a good example.
For object-oriented code, however, using a pointer - which means using new to create the object beforehand - is a must. In order to simplify the complexity of resource management, we have dozens of tools to make it as simple as possible, such as smart pointers. The object-based paradigm or generic paradigm assumes value semantics and requires less or no new, just as the posters elsewhere stated.
Traditional design patterns, especially those mentioned in the GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest-to-read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar: it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have been encoded into various classes. Furthermore, if you need a usage pattern for which there isn't already a standard class, you can write your own class that encodes it!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point in addition to all the above correct answers: it depends on what sort of programming you are doing. Kernel development in Windows, for example: the stack is severely limited, and you might not be able to take page faults as you can in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.

Free Memory Occupied by Std List, Vector, Map etc

Coming from a C# background, I have only the vaguest idea of memory management in C++: all I know is that I would have to free memory manually. As a result, my C++ code is written in such a way that objects of type std::vector, std::list, and std::map are freely instantiated, used, but not freed.
I didn't realize this point until I was almost done with my program; now my code consists of the following kinds of patterns:
struct Point_2
{
    double x;
    double y;
};

struct Point_3
{
    double x;
    double y;
    double z;
};

list<list<Point_2>> Computation::ComputationJob(list<Point_3> pts3D,
                                                vector<Point_2> vectors)
{
    map<Point_2, double> pt2DMap = ConstructPointMap(pts3D);
    vector<Point_2> vectorList = ConstructVectors(vectors);
    list<list<Point_2>> faceList2D = ConstructPoints(vectorList, pt2DMap);
    return faceList2D;
}
My question is, must I free every.single.one of these list usages (in the above example, that means I would have to free pt2DMap, vectorList and faceList2D)? That would be very tedious! I might just as well rewrite my Computation class so that it is less prone to memory leaks.
Any idea how to fix this?
No: if objects are not allocated with new, they need not be freed/deleted explicitly. When they go out of scope, they are deallocated automatically. When that happens, the destructor is called, which should deallocate all objects that they refer to. (This is called Resource Acquisition Is Initialization, or RAII, and standard classes such as std::list and std::vector follow this pattern.)
If you do use new, then you should either use a smart pointer (scoped_ptr) or explicitly call delete. The best place to call delete is in a destructor (for reasons of exception safety), though smart pointers should be preferred whenever possible.
What I can say in general is that the C++ standard containers make copies of your objects behind the scenes. You have no control over that. What this means is that if construction of your objects (Point_2 in your case) involves any resource allocations (e.g. new or malloc calls), then you have to write custom versions of the copy constructor and destructor that make this behave sensibly when your map decides to copy Point_2s around. Usually this involves techniques like reference counting.
Many people find it much easier to just put pointers to complex objects into standard containers, rather than the objects themselves.
If you don't do anything special in the constructors or destructors of your objects (which appears to be the case for you), then there's no problem whatsoever. Some containers (like maps) will be doing dynamic allocations behind the scenes, but that is effectively invisible to you. The containers worry about their resource allocations; you only have to worry about yours.
All STL containers clean up their contents automatically; all you must take care of is cleaning up the data you allocate dynamically (i.e. the rule is: take care of the pointers).
For example, if you have list<MyType> - the list contains objects of some custom type inside - on destruction it will call ~MyType(), which should take care of properly cleaning up the object's contents (i.e. if MyType has pointers to allocated memory inside, you should delete them in the destructor).
On the other hand, if you start using list<MyType*>, the container does not know how to clean this up properly; it contains scalar values (just like integers) and will destroy just the pointers themselves, without cleaning up the pointed-to content, so you need to clean that up manually (as sketched below).
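A minimal sketch of the difference (MyType is illustrative):

#include <list>

struct MyType { /* ... */ };

int main()
{
    std::list<MyType> byValue;
    byValue.push_back(MyType());   // destroyed automatically with the list

    std::list<MyType*> byPointer;
    byPointer.push_back(new MyType());
    // the list only destroys the pointers themselves; free the objects manually:
    for (std::list<MyType*>::iterator it = byPointer.begin();
         it != byPointer.end(); ++it)
        delete *it;
}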
A really good piece of advice (it helped me a lot years ago :) ) when switching from Java/C# to C++ is to carefully trace each dynamic memory object's lifecycle: a) where it gets created, b) where it gets used, c) where and when it gets deleted.
Make sure it is cleaned up only once and is not referenced after that!

Deleting a reference

Is this valid? An acceptable practice?
typedef vector<int> intArray;

intArray& createArray()
{
    intArray* arr = new intArray(10000, 0);
    return *arr;
}

int main(int argc, char *argv[])
{
    intArray& array = createArray();
    //..........
    delete &array;
    return 0;
}
The behavior of the code will be your intended behavior. Now, the problem is that while you might consider that programming is about writing something for the compiler to process, it is just as much about writing something that other programmers (or you in the future) will understand and be able to maintain. The code you provided will in many cases be equivalent, for the compiler, to using pointers, but for other programmers, it will just be a potential source of errors.
References are meant to be aliases to objects that are managed somewhere else, somehow else. In general, people will be surprised when they encounter delete &ref, and in most cases programmers won't expect having to perform a delete on the address of a reference, so chances are that in the future someone is going to call the function and forget about deleting, and you will have a memory leak.
In most cases, memory can be better managed by the use of smart pointers (if you cannot use other high level constructs like std::vectors). By hiding the pointer away behind the reference you are making it harder to use smart pointers on the returned reference, and thus you are not helping but making it harder for users to work with your interface.
Finally, the good thing about references is that when you read them in code, you know that the lifetime of the object is managed somewhere else and you need not worry about it. By using a reference instead of a pointer you are basically going back to a single solution (as in C, where there were only pointers), and suddenly extra care must be taken with all references to figure out whether memory must be managed there or not. That means more effort, more time to think about memory management, and less time to worry about the actual problem being solved, with the extra strain of unusual code: people are used to looking for memory leaks with pointers and expect none from references.
In a few words: having memory held by reference hides from the user the requirement to handle the memory and makes it harder to do so correctly.
Yes, I think it will work. But if I saw something like this in any code I worked on, I would rip it out and refactor right away.
If you intend to return an allocated object, use a pointer. Please!
It's valid... but I don't see why you'd ever want to do it. It's not exception safe, and std::vector is going to manage the memory for you anyway. Why new it?
EDIT: If you are returning new'd memory from a function, you should return the pointer, lest users of your function's heads explode.
Is this valid?
Yes.
An acceptable practice?
No.
This code has several problems:
The guideline of designing for least surprising behavior is broken: you return something that "looks like" an object but must be deleted by the client code (that should mean a pointer - a reference should be something that always points to a valid object).
your allocation can fail. Even if you check the result in the allocating function, what will you return? An invalid reference? Do you rely on the allocation throwing an exception for such a case?
As a design principle, consider either creating a RAII object that is responsible for managing the lifetime of your object (in this case a smart pointer) or deleting the pointer at the same abstraction level that you created it:
typedef vector<int> intArray;

intArray& createArray()
{
    intArray* arr = new intArray(10000, 0);
    return *arr;
}

void deleteArray(intArray& object)
{
    delete &object;
}

int main(int argc, char *argv[])
{
    intArray& array = createArray();
    //..........
    deleteArray(array);
    return 0;
}
This design improves coding style consistency (allocation and deallocation are hidden and implemented at the same abstraction level) but it would still make more sense to work through a pointer than a reference (unless the fact that your object is dynamically allocated must remain an implementation detail for some design reason).
It will work, but I'm afraid it's flat-out unacceptable practice. There's a strong convention in the C++ world that memory management is done with pointers. Your code violates this convention and is liable to trip up just about anyone who uses it.
It seems like you're going out of your way to avoid returning a raw pointer from this function. If your concern is having to check repeatedly for a valid pointer in main, you can use a reference for the processing of your array. But have createArray return a pointer, and make sure that the code which deletes the array takes it as a pointer too. Or, if it's really as simple as this, simply declare the array on the stack in main and forego the function altogether. (Initialization code in that case could take a reference to the array object to be initialized, and the caller could pass its stack object to the init code.)
It is valid in the sense that the compiler accepts it and it runs successfully. However, this kind of coding practice makes code harder for readers and maintainers because of:
Manual memory management
Vague ownership transfer to client side
But there is a subtle point in this question: the efficiency requirement. Sometimes we cannot return by value because the object might be too big or bulky, as in this example (10000 * sizeof(int)). For that reason, we should use pointers if we need to transfer objects between different parts of our code. But this doesn't mean the above implementation is acceptable, because for this kind of requirement there is a very useful tool: smart pointers. So the design decision is up to the programmer, but for this kind of implementation detail, the programmer should use accepted patterns like smart pointers.

How do you ensure, while writing the C++ code itself, that it will not cause any memory leaks?

Running valgrind or purify would be the next steps.
But while writing the code itself, how do you ensure that it will not cause any memory leaks?
You can ensure the following things:
1: The number of news equals the number of deletes.
2: Every opened file descriptor is closed.
Is there anything else?
Use the RAII idiom everywhere you can
Use smart pointers, e.g. std::auto_ptr, where appropriate. (Don't use auto_ptr in any of the standard collections, as it won't work the way you think it will.)
Avoid creating objects dynamically wherever possible. Programmers coming from Java and other similar languages often write stuff like:
string * s = new string( "hello world" );
when they should have written:
string s = "hello world";
Similarly, they create collections of pointers when they should create collections of values. For example, if you have a class like this:
class Person {
public:
    Person( const string & name ) : mName( name ) {}
    ...
private:
    string mName;
};
Rather than writing code like:
vector <Person *> vp;
or even:
vector <shared_ptr <Person> > vp;
instead use values:
vector <Person> vp;
You can easily add to such a vector:
vp.push_back( Person( "neil butterworth" ) );
and all the memory for both Person and the vector is managed for you. Of course, if you need a collection of polymorphic types, you should use (smart) pointers
Use Smart Pointers
Use RAII
Hide the default copy ctor and operator=() in EVERY CLASS, unless a) your class is trivial and only uses native types and YOU KNOW IT ALWAYS WILL BE SO, or b) you explicitly define your own.
On 1) RAII: the idea is to have deletes happen automatically. If you find yourself thinking "I just called new, I'll need to remember to call delete somewhere", then you're doing something wrong. The delete should either a) be automatic or b) be put in a dtor (and which dtor should be obvious).
On 2) Hiding defaults. Identifying rogue default copy ctors etc. can be a nightmare; the easiest thing is to avoid them by hiding them. If you have a generic "root" object that everything inherits from (which can be handy for debugging/profiling anyway), hide the defaults there; then when something tries to assign/copy an inheriting class, the compiler barfs because the ctors etc. aren't available on the base class (as sketched below).
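A minimal sketch of that hiding trick (pre-C++11 style to match the answer; RootObject and Widget are illustrative, and in C++11 you would write = delete instead):

class RootObject {
protected:
    RootObject() {}
private:
    RootObject(const RootObject&);            // declared, never defined
    RootObject& operator=(const RootObject&); // declared, never defined
};

class Widget : public RootObject { /* ... */ };
// Widget b(a); // compiler error: the copy ctor is inaccessible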
Minimize the calls to new by using the STL containers for storing your data.
I'm with Glen and jalf regarding RAII at every opportunity.
IMHO you should aim to write completely delete-free code. The only explicit deletes should be in your smart pointer class implementations. If you find yourself wanting to write a delete, go and find an appropriate smart pointer type instead. If none of the "industry standard" ones (Boost's, etc.) fit and you find yourself wanting to write some bizarre new one, chances are your architecture is broken, or at the least there will be maintenance difficulties in future.
I've long held that explicit "delete" is to memory management what "goto" is to flow control. More on this in this answer.
I always use std::auto_ptr when I need to create a new object on the heap.
std::auto_ptr<Foo> CreateFoo()
{
    return std::auto_ptr<Foo>(new Foo());
}
Even if you just call CreateFoo() and ignore the result, it won't leak: the temporary auto_ptr destroys the Foo.
The basic steps are twofold:
Firstly, be aware that every new requires a delete. So, when you use the new operator, up your awareness of what that object will be doing, how it will be used, and how its lifetime will be managed.
Secondly, make sure that you never overwrite a pointer. You can do this by using a smart pointer class instead of raw pointers, but if you do, make absolutely sure you never use it with implicit conversion. (An example: using the MSXML library, I created a CCOMPtr smart pointer to hold nodes; to get a node you call the get_Node method, passing in the address of the smart pointer, which had a conversion operator that returned the underlying pointer type. Unfortunately, this meant that if the smart pointer already held data, that member data would be overwritten, leaking the previous node.)
I think those 2 cases are the times when you might leak memory. If you only use the smart pointer directly, never allowing its internal data to be exposed, you're safe from the latter issue. If you wrap all your code that uses new and delete in a class (i.e. using RAII), then you're pretty safe from the former too.
Avoiding memory leaks in C++ is very easy if you do the above.
Two simple rules of thumb:
Never call delete explicitly (outside a RAII class, that is). Every memory allocation should be the responsibility of a RAII class which calls delete in the destructor.
Almost never call new explicitly. If you do, you should immediately wrap the resulting pointer in a smart pointer, which takes ownership of the allocation, and works as above.
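A minimal sketch of rule 2, using auto_ptr to match the era of this thread (Gadget is illustrative):

#include <memory>

struct Gadget { /* ... */ };

void useGadget()
{
    std::auto_ptr<Gadget> g(new Gadget()); // wrapped immediately, per rule 2
    // ... use *g ...
}   // Gadget deleted automatically here, even if an exception is thrown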
In your own RAII classes, two common pitfalls are:
Failure to handle copying correctly: Who takes ownership of the memory if the object is copied? Do they create a new allocation? Do you implement both copy constructor and assignment operator? Does the latter handle self assignment?
Failure to consider exception safety. What happens if an exception is thrown during an operation (an assignment, for example)? Does the object revert to a consistent state? (It should always do this, no matter what.) Does it roll back to the state it had before the operation? (It should do this when possible.) std::vector has to handle this during push_back, for example. It might cause the vector to resize, which means 1) a memory allocation which may throw, and 2) all the existing elements have to be copied, each of which may throw. An algorithm like std::sort has to deal with it too: it has to call a user-supplied comparer, which could potentially throw! If that happens, is the sequence left in a valid state? Are temporary objects destructed cleanly?
If you handle the above two cases in your RAII classes, it is pretty much impossible for them to leak memory.
And if you use RAII classes to wrap all resource allocations (memory allocations, file handles, database connections and any other type of resource that has to be acquired and released), then your application can not leak memory.
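As a minimal sketch, here is one common way (the copy-and-swap idiom) to handle both pitfalls at once in a RAII class; Buffer is illustrative:

#include <algorithm>
#include <cstddef>

class Buffer {
    char* data_;
    std::size_t size_;
public:
    explicit Buffer(std::size_t n) : data_(new char[n]), size_(n) {}
    Buffer(const Buffer& other) : data_(new char[other.size_]), size_(other.size_) {
        std::copy(other.data_, other.data_ + size_, data_);
    }
    Buffer& operator=(Buffer other) {   // parameter is copied before *this is touched,
        std::swap(data_, other.data_);  // so a throwing copy leaves *this intact;
        std::swap(size_, other.size_);  // self-assignment is also handled
        return *this;                   // old data freed by other's destructor
    }
    ~Buffer() { delete[] data_; }
};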
Make sure shared memory created by your application is freed if nobody's using it anymore, clean up memory mapped files...
Basically, make sure you clean up any type of resource your application directly or indirectly creates. File descriptors are only one type of resource your application may use during runtime.
If you build a tree or graph recursively in your code for your data structure, it may eat all of your memory.
There are static code analysis tools available that do this sort of thing; Wikipedia is a good place to start looking. Basically, outside of being careful and choosing the correct containers, you cannot make guarantees about the code you write; hence the need for tools such as valgrind and gdb.
Incorporate valgrind unit and system testing early in your development cycle and use it consistently.