C++ dynamic memory detail

I'm a C and Java programmer, so memory allocation and OOP aren't anything new to me. But, I'm not sure about how exactly to avoid memory leaks with C++ implementation of objects. Namely:
string s1("0123456789");
string s2 = s1.substr(0,3);
s2 now has a new string object, so it must be freed via:
delete &s2;
Right?
Moreover, am I correct to assume that I'll have to delete the address for any (new) object returned by a function, regardless of the return type not being a pointer or reference? It just seems weird that an object living on the heap wouldn't be returned as a pointer when it must be freed.

No.
You only need to free memory you allocate (i.e. via new or malloc).

No,
Both s1 and s2 will get destructed when out of scope.
s1.substr() will create a temporary object that you don't have to think of.
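As a minimal sketch of the point above (the helper name first_three is made up for illustration and is not from the question), both strings are ordinary automatic objects and no delete appears anywhere:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not from the question: returns the first three characters.
std::string first_three(const std::string& s1) {
    // substr() returns a std::string by value. s2 is an ordinary automatic
    // object, not something that was created with new.
    std::string s2 = s1.substr(0, 3);
    assert(s2.size() <= 3);
    return s2;
}   // no delete needed: local objects are destroyed automatically

// Small usage example.
void demo_first_three() {
    std::string s1("0123456789");
    assert(first_three(s1) == "012");
}
```

Any buffer the strings allocate internally is released by the std::string destructor, which is why writing delete &s2; would be an error rather than a cleanup.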

There are a couple of layers to the answers.
First, variables can be declared in several different ways:
As local variables (either inside a function or as class members). Examples: int i, or int i = 42. These have automatic storage duration (before C++11 these examples were technically shorthand for auto int i, or auto int i = 42, although the auto keyword was virtually never used that way; since C++11, auto means type deduction instead). What this means is that these variables will be automatically freed when they go out of scope. Their destructor is guaranteed to be called (no matter how you leave the scope, whether by a function return or by throwing an exception), and then the memory they use is freed. Local variables declared like this are allocated on the stack.
As static variables (with the static keyword, implying static storage duration, as opposed to the automatic duration shown above). These just stay around for the duration of the program, and so don't have to be freed.
On the heap, with dynamic allocation (new int or new int(42)). These have to be freed manually by calling delete.
So at the lowest level, you basically just have to preserve symmetry. If something was allocated with new, free it with delete; malloc is freed by free, and new[] by delete[]. And variables that are declared without any of these are automatically handled, and should not be manually freed.
Now, to keep memory management simple, RAII is typically used. The technique is based on the observation that only dynamically allocated objects have to be manually freed, and that local variables give you a very convenient hook for implementing custom actions when a local variable goes out of scope.
So, the dynamic allocation is wrapped up inside a separate class, which can be allocated as a local object on the stack. It can then, in its constructor, make any dynamic allocations you need, and its destructor cleans it up with the necessary delete calls.
This means that you essentially never have to call delete in your top-level code. It'll virtually always be hidden behind a RAII object's destructor. new calls become rare too, but are still used together with smart pointers (such as this: boost::shared_ptr<int>(new int(42)), which dynamically allocates an integer and then passes it to a smart pointer, which takes over ownership of it and cleans it up automatically).
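The wrapping described above could look roughly like this. It is only a sketch: IntBuffer and sum_of_squares are hypothetical names invented for the example, and in real code std::vector<int> already does this job.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical RAII wrapper around a dynamically allocated int array.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~IntBuffer() { delete[] data_; }   // new[] is paired with delete[]

    // Copying would lead to a double delete, so it is disabled in this sketch.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    int* data_;
    std::size_t size_;
};

// The caller never writes delete: buf is a local variable, so its
// destructor runs on every exit path from the function.
int sum_of_squares(std::size_t n) {
    IntBuffer buf(n);
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<int>(i * i);
    int total = 0;
    for (std::size_t i = 0; i < buf.size(); ++i) total += buf[i];
    return total;
}

// Small usage example: 0 + 1 + 4 + 9.
void demo_int_buffer() {
    assert(sum_of_squares(4) == 14);
}
```

The only new[] and delete[] in the program sit inside the class, which is the symmetry rule from above applied once instead of at every call site.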

Both s1 and s2 are auto allocated. You don't delete those.
You only delete objects which you have created using new.
C++ knows three allocation modes: auto, static and dynamic. Read up on those.
Auto values, like your strings in the example are freed automatically when they leave scope. Their destructor is called automatically. Any memory that the strings allocated dynamically during their operation is freed when the string destructor is called.

Like everyone else said - no delete is needed here. Unless you see a new, you (generally) don't need a delete. I do want to add that s2 doesn't have a new string object. s2 is a string object from the time that it is declared. Assigning a slice of s1 to s2 modifies s2 so that it contains the same characters that are in the substring.
It's really important to understand what is going on here and it will become even more important the more you dig into C++. Oh, and congrats on learning a new language.

No, s2 does not need to be manually deleted. It is a local variable on the stack that will get automatically destroyed once it goes out of scope, just as it was automatically allocated when it was declared. Generally you only delete things that you allocated with new.
Temporary objects returned by functions are managed automatically and are destroyed at the end of the statement. If they are needed longer, they are usually copied to a local variable before the temporary is destroyed (e.g. by a simple initialization like Object o = f(); or as in the line with the substr() call in your example).
If a function returns a pointer, the pointed-to object is not automatically managed like this. The documentation should state who is responsible for deleting the object once all work is done. By convention, whoever allocated the object is usually responsible for deleting it, but the details need to be documented somewhere.

One of the most important concepts to understand in modern C++ (especially coming from a C background) is RAII. C++ encapsulates resources like memory (or mutexes, or transactions) inside classes that "acquire" the resource on construction (the string constructor allocates dynamic memory) and "release" it on destruction (destruction of the string class frees it). Since destruction of stack objects is deterministic (the stack based object expires when the enclosing scope does), the release doesn't have to be written, and will happen even if an exception is thrown.
So, no. In most of my coding, I never have to write an explicit delete (or delete[]), because the resources are managed by string, an STL container, or a shared_ptr or scoped_ptr.
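To illustrate the "even if an exception is thrown" part, here is a small sketch. Handle, risky and the open_handles counter are all hypothetical, standing in for whatever resource is being managed:

```cpp
#include <cassert>
#include <stdexcept>

int open_handles = 0;   // stands in for some scarce resource (assumption)

// Hypothetical RAII type: acquires in the constructor, releases in the destructor.
struct Handle {
    Handle()  { ++open_handles; }
    ~Handle() { --open_handles; }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};

void risky(bool fail) {
    Handle h;                                     // resource acquired
    if (fail) throw std::runtime_error("boom");   // stack unwinding still destroys h
}   // normal exit: h is destroyed here

// Returns how many handles are still open after a failed call.
int handles_after_failure() {
    try {
        risky(true);
    } catch (const std::runtime_error&) {
        // nothing to clean up by hand: h was already released
    }
    return open_handles;
}

void demo_exception_safety() {
    assert(handles_after_failure() == 0);
}
```

There is no cleanup code in the catch block because the destructor already ran while the exception was propagating.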

why do you have to free s2 manually?
deletion of dynamic memory in s2 will be handled by the std::string destructor.

Related

Can i delete a statically defined variable using delete operator? [duplicate]

I have an object with a vector of pointers to other objects in it, something like this:
class Object {
...
vector<Object*> objlist;
...
};
Now, Objects will be added to list in both of these ways:
Object obj;
obj.objlist.push_back(new Object);
and
Object name;
Object* anon = &name;
obj.objlist.push_back(anon);
If I make a destructor that is simply
~Object() {
for (int i = 0; i < objlist.size(); i++) {
delete objlist[i];
objlist[i] = NULL;
}
}
Will there be any adverse consequences in terms of when it tries to delete an object that was not created with new?
Yes, there will be adverse effects.
You must not delete an object that was not allocated with new. If the object was allocated on the stack, your compiler has already generated a call to its destructor at the end of its scope. This means you will call the destructor twice, with potentially very bad effects.
Besides calling the destructor twice, you will also attempt to deallocate a memory block that was never allocated. The new operator presumably puts the objects on the heap; delete expects to find the object in the same region the new operator puts them. However, your object that was not allocated with new lives on the stack. This will very probably crash your program (if it does not already crash after calling the destructor a second time).
You'll also get in deep trouble if your Object on the heap lives longer than your Object on the stack: you'll get a dangling reference to somewhere on the stack, and you'll get incorrect results/crash the next time you access it.
The general rule I see is that stuff that live on the stack can reference stuff that lives on the heap, but stuff on the heap should not reference stuff on the stack because of the very high chances that they'll outlive stack objects. And pointers to both should not be mixed together.
No, you can only delete what you newed
Object* anon = &name;
When name goes out of scope, you will have an invalid pointer in your vector.
What you're actually asking is whether it's safe to delete an object not allocated via new through the delete operator, and if so, why?
Unfortunately, this is obfuscated by some other problems in your code. As mentioned, when name goes out of scope, you're going to end up with an invalid pointer.
See zneak's answer for why your original question doesn't result in a safe operation, and why the scope for name actually matters.
This will not work - if you delete an object that wasn't allocated by new you've violated the rules of the delete operator.
If you need to have your vector store objects that may or may not need to be deleted, you'll need to keep track of that somehow. One option is to use a smart pointer that keeps track of whether the pointed-to object is dynamic or not. For example, shared_ptr<> allows you to specify a deallocator object when constructing the shared_ptr<>, and as the docs mention:
For example, a "no-op" deallocator is useful when returning a shared_ptr to a statically allocated object
However, you should still be careful when passing pointers to automatic variables - if the vector's lifetime is longer than the lifetime of the variable then it'll be referring to garbage at some point.
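A rough sketch of the shared_ptr-with-deleter idea, using std::shared_ptr from C++11. The NoOpDeleter type and the live counter are assumptions added for the example; they are not part of the question's code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Object {
    static int live;   // number of Objects currently alive (for the demo only)
    Object()  { ++live; }
    ~Object() { --live; }
    Object(const Object&) = delete;
    Object& operator=(const Object&) = delete;

    std::vector<std::shared_ptr<Object>> objlist;
};
int Object::live = 0;

// Hypothetical deleter that does nothing, for objects owned by someone else.
struct NoOpDeleter {
    void operator()(Object*) const {}
};

int live_objects_after_scope() {
    {
        Object name;   // automatic object
        Object obj;    // declared after name, so it is destroyed before name

        // Heap object: the shared_ptr will delete it.
        obj.objlist.push_back(std::make_shared<Object>());
        // Automatic object: the no-op deleter leaves it alone.
        obj.objlist.push_back(std::shared_ptr<Object>(&name, NoOpDeleter()));

        assert(Object::live == 3);
    }
    return Object::live;   // everything was destroyed exactly once
}

void demo_noop_deleter() {
    assert(live_objects_after_scope() == 0);
}
```

This only works here because obj is destroyed before name. If the vector outlived name, the stored pointer would dangle exactly as described above, deleter or not.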

A Java programmer wants to understand a C++ code: Use a method of an object without new

I have been using Java for years. Now I need to understand a piece of a C++ program.
TimeStamp theTimeStamp;
theTimeStamp.update();
What puzzles me is why don't we write
TimeStamp theTimeStamp = new();
My intuition is, to use an object, a memory space should be first allocated and associated with the object.
I guess this is a point where Java and C++ differ fundamentally? Could you clarify?
[EDITED] I originally wrote 'TimeStamp theTimeStamp = malloc();'
You would never write TimeStamp theTimeStamp = malloc(); in C++.
First, because malloc is from C and should really not be used in C++. Second, it is unaware of its context and needs a parameter to specify how much memory it must allocate, and returns an untyped pointer which you'd need to cast.
Instead you'd e.g. write
TimeStamp * theTimeStamp = new TimeStamp();
See - that's very similar to Java. Notice the * in there? That's for specifying that theTimeStamp is a pointer (in Java, every variable of a user-defined type is a pointer/reference, so you don't have to care about explicitly stating this).
In C++, however, you can choose whether you want
C++ to automatically handle the creation and destruction with the variable scope (i.e., without the *, as is done in your first code example). This means however, that as soon as theTimeStamp goes out of scope (i.e. usually at the end of the block where the variable is defined), the variable will be destroyed automatically.
Or if you want to do the dynamic memory allocation yourself (the default case in Java) - but in contrast to Java, you'd also have to care about the deletion of the object yourself in C++.
Having to take care of deletion "manually" is why such raw pointers are usually not used directly in C++. Instead, so-called smart pointer types are used, e.g. std::shared_ptr from the C++11 standard. They spare you the chore of having to do the deletion manually (and probably forgetting about it in many cases). There are other smart pointer types as well; shared_ptr however provides the closest resemblance to what Java does - you can assign one shared_ptr to another, thereby keeping the object it points to alive, and only when the last of the shared pointers pointing to an object is destroyed will the object pointed to also be destroyed.
Whenever you can, it is however preferable in C++ to refrain from using pointers at all, and instead using automatically allocated variables.
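The shared_ptr behaviour described above can be sketched like this. The TimeStamp here is a hypothetical stand-in, since the real class is not shown in the question:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the TimeStamp class in the question.
struct TimeStamp {
    int ticks = 0;
    void update() { ++ticks; }
};

long shared_ownership_demo() {
    std::shared_ptr<TimeStamp> a = std::make_shared<TimeStamp>();
    a->update();
    {
        std::shared_ptr<TimeStamp> b = a;   // both point to the same object
        b->update();
        assert(a.use_count() == 2);
    }                                       // b is destroyed; the object stays alive
    assert(a->ticks == 2);
    return a.use_count();                   // a is now the only owner
}   // a is destroyed here, and the TimeStamp with it

void demo_shared_ptr() {
    assert(shared_ownership_demo() == 1);
}
```

Unlike Java's garbage collector, the object is destroyed at a known point: when the last shared_ptr referring to it goes away.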
Think of this in this way. In java when you want an int, you don't do
int i = new int(5);
i++;
You do
int i = 5;
i++;
In Java there is a distinction between a primitive and Objects. Primitives are allocated on the stack without new, and are destroyed at the end of scope. Objects are allocated with new and are garbage collected.
In C++ every class you write is by default like a primitive. It gets created on the stack, and gets destroyed at the end of scope. You can control what happens at creation and destruction by writing the Constructor and Destructor. Now this works great as long as your variables are limited by scope. When this is not the case, you can allocate the object of the class on the heap using new (old style) or make_unique/make_shared (modern style).
In your example, as no arguments are passed, the constructor of TimeStamp is called with no arguments to create a new TimeStamp on the stack. This variable will be destroyed automatically when its scope ends (for a function's local variable, when the function's stack frame is released).
malloc allocates an amount of memory passed as a parameter and returns a void* to a block of that size. This memory is allocated on the heap, and will not be deleted when the scope runs out, and must be freed explicitly.
Seeing as this is C++ however, you do not want to be using malloc and free, you should instead stick to the friendly C++ variants new and delete.
An automatic variable is a variable which is allocated and deallocated automatically when program flow enters and leaves the variable's context.
All variables declared within a block of code are automatic by default.
So when the flow reaches
TimeStamp theTimeStamp;
it automatically allocates this object on the stack using the default constructor. The destructor is invoked automatically too, when the flow reaches the closing }
You can also allocate it using dynamic memory:
TimeStamp *theTimeStamp = new TimeStamp(); //calling default constructor
and release it manually with delete theTimeStamp; when you are done.
Never use malloc or free to allocate or release a class object.
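To see the two variants side by side, here is a small sketch that records when the constructor and destructor run. TimeStamp is again a hypothetical stand-in, and the global events log exists only for the demonstration:

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;   // records what happened, in order (demo only)

// Hypothetical stand-in for the class in the question.
struct TimeStamp {
    TimeStamp()  { events.push_back("construct"); }
    ~TimeStamp() { events.push_back("destroy"); }
    void update() { events.push_back("update"); }
};

void automatic_version() {
    TimeStamp theTimeStamp;   // default constructor runs here
    theTimeStamp.update();
}                             // destructor runs here, at the closing brace

void dynamic_version() {
    TimeStamp* theTimeStamp = new TimeStamp();   // default constructor runs here
    theTimeStamp->update();
    delete theTimeStamp;   // destructor runs only because of this line
}

void demo_lifetimes() {
    events.clear();
    automatic_version();
    assert(events.size() == 3);
    assert(events[0] == "construct");
    assert(events[1] == "update");
    assert(events[2] == "destroy");
}
```

Both versions produce the same sequence of events, but only the second one depends on the programmer remembering the delete.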

New Object variations

This is a very newbie question, but something completely new to me. In my code, and everywhere I have seen it before, new objects are created as such...
MyClass x = new MyClass(factory);
However, I just saw some example code that looks like this...
MyClass x(factory);
Does that do the same thing?
Not at all.
The first example uses dynamic memory allocation, i.e., you are allocating an instance of MyClass on the heap (as opposed to the stack). You would need to call delete on that pointer manually or you end up with a memory leak. Also, operator new returns a pointer, not the object itself, so your code would not compile. It needs to change to:
MyClass* x = new MyClass(factory);
The second example allocates an instance of MyClass on the stack. This is very useful for short lived objects as they will automatically be cleaned up when they leave the current scope (and it is fast; cleaning up the stack involves nothing more than incrementing or decrementing a pointer).
This is also how you would implement the Resource Acquisition is Initialization pattern, more commonly referred to as RAII. The destructor for your type would clean up any dynamically allocated memory, so when the stack allocated variable goes out of scope any dynamically allocated memory is cleaned up for you without the need for any outside calls to delete.
No. When you use new, you create objects off the heap that you must then delete later. In addition, you really need MyClass*. The other form creates an object on the stack that will be automatically destroyed at end of scope.
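Side by side, the two forms might look like this. Factory and the body of MyClass are invented for the sketch, since the question does not show them:

```cpp
#include <cassert>

struct Factory { int seed; };   // hypothetical

// Hypothetical class whose constructor takes a factory, as in the question.
class MyClass {
public:
    explicit MyClass(const Factory& f) : value_(f.seed * 2) {}
    int value() const { return value_; }
private:
    int value_;
};

int stack_version(const Factory& factory) {
    MyClass x(factory);   // automatic object
    return x.value();
}                         // x is destroyed here, nothing to clean up

int heap_version(const Factory& factory) {
    MyClass* x = new MyClass(factory);   // note the pointer type
    int v = x->value();
    delete x;                            // required, or the object leaks
    return v;
}

void demo_two_forms() {
    Factory factory{21};
    assert(stack_version(factory) == 42);
    assert(heap_version(factory) == 42);
}
```

Both functions compute the same result; the difference is only in who is responsible for ending the object's lifetime.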

When should I use the new keyword in C++?

I've been using C++ for a short while, and I've been wondering about the new keyword. Simply, should I be using it, or not?
With the new keyword...
MyClass* myClass = new MyClass();
myClass->MyField = "Hello world!";
Without the new keyword...
MyClass myClass;
myClass.MyField = "Hello world!";
From an implementation perspective, they don't seem that different (but I'm sure they are)... However, my primary language is C#, and of course the 1st method is what I'm used to.
The difficulty seems to be that method 1 is harder to use with the std C++ classes.
Which method should I use?
Update 1:
I recently used the new keyword for heap memory (or free store) for a large array which was going out of scope (i.e. being returned from a function). Where before I was using the stack, which caused half of the elements to be corrupt outside of scope, switching to heap usage ensured that the elements were intact. Yay!
Update 2:
A friend of mine recently told me there's a simple rule for using the new keyword; every time you type new, type delete.
Foobar *foobar = new Foobar();
delete foobar; // TODO: Move this to the right place.
This helps to prevent memory leaks, as you always have to put the delete somewhere (i.e. when you cut and paste it to either a destructor or otherwise).
Method 1 (using new)
Allocates memory for the object on the free store (This is frequently the same thing as the heap)
Requires you to explicitly delete your object later. (If you don't delete it, you could create a memory leak)
Memory stays allocated until you delete it. (i.e. you could return an object that you created using new)
The example in the question will leak memory unless the pointer is deleted; and it should always be deleted, regardless of which control path is taken, or if exceptions are thrown.
Method 2 (not using new)
Allocates memory for the object on the stack (where all local variables go). There is generally less memory available for the stack; if you allocate too many objects, you risk stack overflow.
You won't need to delete it later.
Memory is no longer allocated when it goes out of scope. (i.e. you shouldn't return a pointer to an object on the stack)
As far as which one to use; you choose the method that works best for you, given the above constraints.
Some easy cases:
If you don't want to worry about calling delete, (and the potential to cause memory leaks) you shouldn't use new.
If you'd like to return a pointer to your object from a function, you must use new
There is an important difference between the two.
Everything not allocated with new behaves much like value types in C# (and people often say that those objects are allocated on the stack, which is probably the most common/obvious case, but not always true). More precisely, objects allocated without using new have automatic storage duration.
Everything allocated with new is allocated on the heap, and a pointer to it is returned, exactly like reference types in C#.
Anything allocated on the stack has to have a constant size, determined at compile-time (the compiler has to set the stack pointer correctly, or if the object is a member of another class, it has to adjust the size of that other class). That's why arrays in C# are reference types. They have to be, because with reference types, we can decide at runtime how much memory to ask for. And the same applies here. Only arrays with constant size (a size that can be determined at compile-time) can be allocated with automatic storage duration (on the stack). Dynamically sized arrays have to be allocated on the heap, by calling new.
(And that's where any similarity to C# stops)
Now, anything allocated on the stack has "automatic" storage duration (you can actually declare a variable as auto, but this is the default if no other storage type is specified so the keyword isn't really used in practice, but this is where it comes from)
Automatic storage duration means exactly what it sounds like, the duration of the variable is handled automatically. By contrast, anything allocated on the heap has to be manually deleted by you.
Here's an example:
void foo() {
bar b;
bar* b2 = new bar();
}
This function creates three values worth considering:
On line 1, it declares a variable b of type bar on the stack (automatic duration).
On line 2, it declares a bar pointer b2 on the stack (automatic duration), and calls new, allocating a bar object on the heap. (dynamic duration)
When the function returns, the following will happen:
First, b2 goes out of scope (order of destruction is always opposite of order of construction). But b2 is just a pointer, so nothing happens, the memory it occupies is simply freed. And importantly, the memory it points to (the bar instance on the heap) is NOT touched. Only the pointer is freed, because only the pointer had automatic duration.
Second, b goes out of scope, so since it has automatic duration, its destructor is called, and the memory is freed.
And the bar instance on the heap? It's probably still there. No one bothered to delete it, so we've leaked memory.
From this example, we can see that anything with automatic duration is guaranteed to have its destructor called when it goes out of scope. That's useful. But anything allocated on the heap lasts as long as we need it to, and can be dynamically sized, as in the case of arrays. That is also useful. We can use that to manage our memory allocations. What if the Foo class allocated some memory on the heap in its constructor, and deleted that memory in its destructor? Then we could get the best of both worlds: safe memory allocations that are guaranteed to be freed again, but without the limitations of forcing everything to be on the stack.
And that is pretty much exactly how most C++ code works.
Look at the standard library's std::vector for example. That is typically allocated on the stack, but can be dynamically sized and resized. And it does this by internally allocating memory on the heap as necessary. The user of the class never sees this, so there's no chance of leaking memory, or forgetting to clean up what you allocated.
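For instance, a function can build and return a dynamically sized sequence without a single new or delete. The function name first_squares is made up for this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: n is only known at run time.
std::vector<int> first_squares(std::size_t n) {
    std::vector<int> v;   // the vector object itself is a local variable
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i * i));   // grows its heap storage as needed
    }
    assert(v.size() == n);
    return v;   // returned by value; the caller's vector now owns the storage
}

void demo_vector() {
    std::vector<int> squares = first_squares(5);
    assert(squares.size() == 5);
    assert(squares[4] == 16);
}   // squares is destroyed here and frees its storage
```

This also covers the "large array returned from a function" case from the question's first update: returning the vector by value is safe, whereas returning a pointer to a local array is not.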
This principle is called RAII (Resource Acquisition is Initialization), and it can be extended to any resource that must be acquired and released. (network sockets, files, database connections, synchronization locks). All of them can be acquired in the constructor, and released in the destructor, so you're guaranteed that all resources you acquire will get freed again.
As a general rule, never use new/delete directly from your high level code. Always wrap it in a class that can manage the memory for you, and which will ensure it gets freed again. (Yes, there may be exceptions to this rule. In particular, smart pointers require you to call new directly, and pass the pointer to its constructor, which then takes over and ensures delete is called correctly. But this is still a very important rule of thumb)
The short answer is: if you're a beginner in C++, you should never be using new or delete yourself.
Instead, you should use smart pointers such as std::unique_ptr and std::make_unique (or less often, std::shared_ptr and std::make_shared). That way, you don't have to worry nearly as much about memory leaks. And even if you're more advanced, best practice would usually be to encapsulate the custom way you're using new and delete into a small class (such as a custom smart pointer) that is dedicated just to object lifecycle issues.
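Applied to the code in the question, that could look roughly as follows. This sketch assumes C++14 for std::make_unique, and the MyClass definition is a guess at what the question's class contains:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Guess at the class from the question: a single string field.
struct MyClass {
    std::string MyField;
};

// Hypothetical factory function: the object lives on the free store and is
// owned by the returned unique_ptr.
std::unique_ptr<MyClass> make_greeting() {
    auto myClass = std::make_unique<MyClass>();
    myClass->MyField = "Hello world!";
    return myClass;
}

std::size_t greeting_length() {
    std::unique_ptr<MyClass> myClass = make_greeting();
    return myClass->MyField.size();
}   // the MyClass object is deleted here, by the unique_ptr destructor

void demo_unique_ptr() {
    assert(make_greeting()->MyField == "Hello world!");
    assert(greeting_length() == 12);
}
```

The syntax at the point of use is the same -> as with a raw pointer, so code written in the "method 1" style changes very little.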
Of course, behind the scenes, these smart pointers are still performing dynamic allocation and deallocation, so code using them would still have the associated runtime overhead. Other answers here have covered these issues, and how to make design decisions on when to use smart pointers versus just creating objects on the stack or incorporating them as direct members of an object, well enough that I won't repeat them. But my executive summary would be: don't use smart pointers or dynamic allocation until something forces you to.
Which method should I use?
This is almost never determined by your typing preferences but by the context. If you need to keep the object across a few stacks or if it's too heavy for the stack you allocate it on the free store. Also, since you are allocating an object, you are also responsible for releasing the memory. Lookup the delete operator.
To ease the burden of using free-store management people have invented stuff like auto_ptr and unique_ptr (note that auto_ptr was deprecated in C++11 and removed in C++17, so prefer unique_ptr). I strongly recommend you take a look at these. They might even be of help to your typing issues ;-)
If you are writing in C++ you are probably writing for performance. Using new and the free store is much slower than using the stack (especially when using threads) so only use it when you need it.
As others have said, you need new when your object needs to live outside the function or object scope, the object is really large or when you don't know the size of an array at compile time.
Also, try to avoid ever using delete. Wrap your new into a smart pointer instead. Let the smart pointer call delete for you.
There are some cases where a smart pointer isn't smart. Never store std::auto_ptr<> inside a STL container. It will delete the pointer too soon because of copy operations inside the container. Another case is when you have a really large STL container of pointers to objects. boost::shared_ptr<> will have a ton of speed overhead as it bumps the reference counts up and down. The better way to go in that case is to put the STL container into another object and give that object a destructor that will call delete on every pointer in the container.
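The "container inside an owning object" approach from the last sentence might be sketched like this. Item and ItemStore are hypothetical names, and the live counter is there only so the effect can be checked. In C++11 and later, a std::vector<std::unique_ptr<Item>> gives the same single-owner behaviour without reference counting and without a hand-written destructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type with an instance counter for the demo.
struct Item {
    static int live;
    Item()  { ++live; }
    ~Item() { --live; }
    Item(const Item&) = delete;
    Item& operator=(const Item&) = delete;
};
int Item::live = 0;

// Hypothetical owner: every pointer handed to add() is deleted in the destructor.
class ItemStore {
public:
    ItemStore() = default;
    ~ItemStore() {
        for (Item* p : items_) delete p;
    }
    // Copying would make two stores delete the same pointers.
    ItemStore(const ItemStore&) = delete;
    ItemStore& operator=(const ItemStore&) = delete;

    // Takes ownership of p (a sketch: it does not guard against push_back throwing).
    void add(Item* p) { items_.push_back(p); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<Item*> items_;
};

int live_items_after_scope() {
    {
        ItemStore store;
        for (int i = 0; i < 3; ++i) store.add(new Item);
        assert(store.size() == 3);
        assert(Item::live == 3);
    }   // store's destructor deletes all three Items
    return Item::live;
}

void demo_item_store() {
    assert(live_items_after_scope() == 0);
}
```

Only pointers that really came from new may be passed to add(); mixing in addresses of stack objects would bring back the problem discussed in the related question above.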
Without the new keyword you're storing that on call stack. Storing excessively large variables on stack will lead to stack overflow.
If your variable is used only within the context of a single function, you're better off using a stack variable, i.e., Option 2. As others have said, you do not have to manage the lifetime of stack variables - they are constructed and destructed automatically. Also, allocating/deallocating a variable on the heap is slow by comparison. If your function is called often enough, you'll see a tremendous performance improvement if you use stack variables versus heap variables.
That said, there are a couple of obvious instances where stack variables are insufficient.
If the stack variable has a large memory footprint, then you run the risk of overflowing the stack. By default, the stack size of each thread is 1 MB on Windows. It is unlikely that you'll create a stack variable that is 1 MB in size, but you have to keep in mind that stack utilization is cumulative. If your function calls a function which calls another function which calls another function which..., the stack variables in all of these functions take up space on the same stack. Recursive functions can run into this problem quickly, depending on how deep the recursion is. If this is a problem, you can increase the size of the stack (not recommended) or allocate the variable on the heap using the new operator (recommended).
The other, more likely condition is that your variable needs to "live" beyond the scope of your function. In this case, you'd allocate the variable on the heap so that it can be reached outside the scope of any given function.
The simple answer is yes - new creates an object on the heap (with the unfortunate side effect that you have to manage its lifetime by explicitly calling delete on it), whereas the second form creates an object on the stack in the current scope, and that object will be destroyed when it goes out of scope.
Are you passing myClass out of a function, or expecting it to exist outside that function? As some others said, it is all about scope when you aren't allocating on the heap. When you leave the function, it goes away (eventually). One of the classic mistakes made by beginners is to create a local object of some class in a function and return a pointer or reference to it without allocating it on the heap. I can remember debugging this kind of thing back in my earlier days doing C++.
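A short sketch of the safe and unsafe ways to get an object out of a function. MyClass here is a hypothetical stand-in with a single int member, and the broken variant is left commented out on purpose:

```cpp
#include <cassert>
#include <memory>

struct MyClass { int value; };   // hypothetical stand-in

// Fine: the local object is copied or moved into the caller's variable.
MyClass make_by_value() {
    MyClass local{42};
    return local;
}

// Broken (commented out deliberately): the returned pointer would refer to
// a local that no longer exists once the function has returned.
// MyClass* make_dangling() {
//     MyClass local{42};
//     return &local;
// }

// If the caller really needs a pointer, the object has to outlive the call.
// Here a unique_ptr owns it, so no explicit delete is needed.
std::unique_ptr<MyClass> make_on_heap() {
    return std::unique_ptr<MyClass>(new MyClass{7});
}

void demo_returning_objects() {
    MyClass a = make_by_value();
    assert(a.value == 42);
    std::unique_ptr<MyClass> b = make_on_heap();
    assert(b->value == 7);
}
```

Returning by value is usually the first thing to try; the heap version is for cases where the object must be shared or is too expensive to copy or move.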
C++ Core Guidelines R.11: Avoid using new and delete explicitly.
Things have changed significantly since most answers to this question were written. Specifically, C++ has evolved as a language, and the standard library is now richer. Why does this matter? Because of a combination of two factors:
Using new and delete is potentially dangerous: memory might leak if you don't keep a very strong discipline of deleting everything you've allocated when it's no longer used, and of never deleting what's not currently allocated.
The standard library now offers smart pointers which encapsulate the new and delete calls, so that you don't have to take care of managing allocations on the free store/heap yourself. So do other containers, in the standard library and elsewhere.
This has evolved into one of the C++ community's "core guidelines" for writing better C++ code, as the linked document shows. Of course, there are exceptions to this rule: somebody needs to write those encapsulating classes which do use new and delete; but that someone is rarely yourself.
Adding to @DanielSchepler's valid answer:
The second method creates the instance on the stack, along with such things as something declared int and the list of parameters that are passed into the function.
The first method makes room for a pointer on the stack, which you've set to the location in memory where a new MyClass has been allocated on the heap - or free store.
The first method also requires that you delete what you create with new, whereas in the second method, the class is automatically destructed and freed when it falls out of scope (the next closing brace, usually).
The short answer is yes: the "new" keyword is incredibly important, because when you use it the object data is stored on the heap as opposed to the stack, and that is the most important difference.