I'm working against a model which consists of a number of different types (Properties, Parent, Child, etc.). Each type is associated with a set of functions from a C API. For example:
Type "Properties":
char* getName(PropertiesHandle);
char* getDescription(PropertiesHandle);
Type "Parent"
PropertiesHandle getProperties(ParentHandle);
ChildHandle getFirstChild(ParentHandle);
Type "Child"
PropertiesHandle getProperties(ChildHandle);
ParentHandle getParent(ChildHandle);
ChildHandle getNextChild(ChildHandle);
I have in turn created a set of C++ interfaces for this C API library, as follows:
class IProperties
{
public:
virtual std::string getName() = 0;
virtual std::string getDescription() = 0;
};
class IChild; // forward declaration, since IParent and IChild refer to each other
class IParent
{
public:
virtual std::shared_ptr<IProperties> getProperties() = 0;
virtual std::shared_ptr<IChild> getFirstChild() = 0;
};
class IChild
{
public:
virtual std::shared_ptr<IProperties> getProperties() = 0;
virtual std::shared_ptr<IParent> getParent() = 0;
virtual std::shared_ptr<IChild> getNextChild() = 0;
};
I then implement these interfaces via the classes Properties, Parent and Child.
So a Child instance will take its specific ChildHandle via its constructor and its getParent function will look something like this:
std::shared_ptr<IParent> getParent()
{
// get the parent handle and wrap it in a Parent object
return std::shared_ptr<IParent>(new Parent(_c_api->getParent(_handle)));
}
Is it reasonable for me to return a shared_ptr here, in your opinion? I can't use std::unique_ptr, since Google Mock requires parameters and return values of mocked methods to be copyable, and I'm mocking these interfaces in my tests via Google Mock.
I'm also thinking about how things might get optimized in the future, which might present the possibility of circular references. These could arise if caching is used to avoid multiple calls to the C API (for example, there is no need for a child to establish its parent more than once), combined with, say, the Child constructor taking its Parent. This would then require the use of weak_ptrs, which would change the interfaces and a lot of my code...
The key question is: what are the semantics of the returned pointer?
If the returned parent/child/properties object has a lifetime independent of the returning (presumably, in some sense, owning) object, it's reasonable to return shared_ptr: this indicates that the caller and callee have equal rights to decide the object's lifetime.
std::shared_ptr<IChild> child = parent->getFirstChild();
// now I can keep child around ... if parent is destroyed, one
// orphaned subtree is magically kept around. Is this desirable?
If the returned object has a lifetime dependent on the callee's own lifetime, then:
shared_ptr will wrongly suggest that it's meaningful for the caller to extend the returned object's lifetime beyond that of the callee;
unique_ptr will wrongly suggest transfer of ownership;
a raw pointer doesn't explicitly make any misleading promises, but doesn't give any hint about correct use either.
So, if the caller is just getting a working reference to your object's internal state, without either transfer of ownership or extension of object lifetime, no smart pointer is called for.
Consider just returning a reference.
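A minimal sketch of that option, assuming the wrapper owns its properties object outright. The Properties and Parent classes here are hypothetical stand-ins for the implementation classes in the question, with the C API calls replaced by plain strings:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the question's Properties wrapper.
class Properties {
public:
    Properties(std::string name, std::string description)
        : name_(std::move(name)), description_(std::move(description)) {}
    const std::string& getName() const { return name_; }
    const std::string& getDescription() const { return description_; }
private:
    std::string name_;
    std::string description_;
};

// The parent holds its properties as a plain member, so their lifetime
// is tied to the parent and no smart pointer appears in the interface.
class Parent {
public:
    explicit Parent(Properties props) : props_(std::move(props)) {}
    // The returned reference is valid only while this Parent is alive.
    const Properties& getProperties() const { return props_; }
private:
    Properties props_;
};
```

A caller writes `const Properties& props = parent.getProperties();` and cannot accidentally extend or take over the object's lifetime. As far as mocking goes, Google Mock provides a ReturnRef action for methods that return references.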
There's nothing wrong with returning a shared_ptr, but I'll try to convince you that this might not be the best option.
By using a smart pointer you gain the advantage of safety, but the users of your API lose the flexibility of using the type of smart pointer that best fits their needs and instead have to always use shared_ptr.
It also depends on how much you value safety over flexibility, but I would personally consider returning a naked pointer and allowing the users to choose the smart pointer they want. Of course, if it is necessary that I use shared_ptr for some reason, I will.
shared_ptr is fine, but it does impose some limitations on the end user, such as requiring C++11 support. A raw pointer, or a trait that allows them to tailor the smart pointer, may provide more flexibility to the end user.
Regardless of the pointer, I suggest being careful with the semantics introduced by the implementation. With the current implementation, in which a new wrapper is instantiated for every accessor call, equivalence checks will fail. Consider the following code:
auto child = parent->getFirstChild();
if ( parent == child->getParent() ) // Will be false, as they point to different
// instantiations of Parent.
...
if ( child->getParent() == child->getParent() ) // False for the same reason.
...
auto sibling = child->getNextChild();
if ( parent == sibling->getParent() ) // Also false for the same reason.
...
Also, when using std::shared_ptr, it can be worthwhile to use std::make_shared, which can allocate the object and its control block together and so reduce some of the allocation overhead.
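One possible way around the failing equivalence checks is to compare the underlying handles rather than the wrapper pointers. The sketch below assumes the handle type is equality-comparable; ParentHandle as an int, the handle() accessor and wrapParent() are all hypothetical names, not part of the question's API:

```cpp
#include <cassert>
#include <memory>

// Hypothetical opaque handle; the real C API's handle type is not shown.
using ParentHandle = int;

class Parent {
public:
    explicit Parent(ParentHandle handle) : handle_(handle) {}
    ParentHandle handle() const { return handle_; }
private:
    ParentHandle handle_;
};

// Two wrappers are equivalent when they wrap the same underlying handle,
// even if they are distinct C++ objects.
inline bool operator==(const Parent& a, const Parent& b) {
    return a.handle() == b.handle();
}

// Stand-in for an accessor that creates a fresh wrapper on every call.
// std::make_shared is used instead of new plus the shared_ptr constructor.
inline std::shared_ptr<Parent> wrapParent(ParentHandle handle) {
    return std::make_shared<Parent>(handle);
}
```

With this in place, `wrapParent(42) == wrapParent(42)` is still false because the pointers differ, but `*wrapParent(42) == *wrapParent(42)` is true.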
With respect to smart pointers and new C++11/14 features, I am wondering what the best-practice return values and function parameter types would be for classes that have these facilities:
A factory function (outside of the class) that creates objects and returns them to users of the class. (For example opening a document and returning an object that can be used to access the content.)
Utility functions that accept objects from the factory functions, use them, but do not take ownership. (For example a function that counts the number of words in the document.)
Functions that keep a reference to the object after they return (like a UI component that takes a copy of the object so it can draw the content on the screen as needed.)
What would the best return type be for the factory function?
If it's a raw pointer the user will have to delete it correctly which is problematic.
If it returns a unique_ptr<> then the user can't share it if they want to.
If it's a shared_ptr<> then will I have to pass around shared_ptr<> types everywhere? This is what I'm doing now and it's causing problems as I'm getting cyclic references, preventing objects from being destroyed automatically.
What is the best parameter type for the utility function?
I imagine passing by reference will avoid incrementing a smart pointer reference count unnecessarily, but are there any drawbacks of this? The main one that comes to mind is that it prevents me from passing derived classes to functions taking parameters of the base-class type.
Is there some way that I can make it clear to the caller that it will NOT copy the object? (Ideally so that the code will not compile if the function body does try to copy the object.)
Is there a way to make it independent of the type of smart pointer in use? (Maybe taking a raw pointer?)
Is it possible to have a const parameter to make it clear the function will not modify the object, without breaking smart pointer compatibility?
What is the best parameter type for the function that keeps a reference to the object?
I'm guessing shared_ptr<> is the only option here, which probably means the factory class must return a shared_ptr<> also, right?
Here is some code that compiles and hopefully illustrates the main points.
#include <iostream>
#include <memory>
#include <string>
struct Document {
std::string content;
};
struct UI {
std::shared_ptr<Document> doc;
// This function is not copying the object, but holding a
// reference to it to make sure it doesn't get destroyed.
void setDocument(std::shared_ptr<Document> newDoc) {
this->doc = newDoc;
}
void redraw() {
// do something with this->doc
}
};
// This function does not need to take a copy of the Document, so it
// should access it as efficiently as possible. At the moment it
// creates a whole new shared_ptr object which I feel is inefficient,
// but passing by reference does not work.
// It should also take a const parameter as it isn't modifying the
// object.
int charCount(std::shared_ptr<Document> doc)
{
// I realise this should be a member function inside Document, but
// this is for illustrative purposes.
return doc->content.length();
}
// This function is the same as charCount() but it does modify the
// object.
void appendText(std::shared_ptr<Document> doc)
{
doc->content.append("hello");
return;
}
// Create a derived type that the code above does not know about.
struct TextDocument: public Document {};
std::shared_ptr<TextDocument> createTextDocument()
{
return std::shared_ptr<TextDocument>(new TextDocument());
}
int main(void)
{
UI display;
// Use the factory function to create an instance. As a user of
// this class I don't want to have to worry about deleting the
// instance, but I don't really care what type it is, as long as
// it doesn't stop me from using it the way I need to.
auto doc = createTextDocument();
// Share the instance with the UI, which takes a copy of it for
// later use.
display.setDocument(doc);
// Use a free function which modifies the object.
appendText(doc);
// Use a free function which doesn't modify the object.
std::cout << "Your document has " << charCount(doc)
<< " characters.\n";
return 0;
}
I'll answer in reverse order so as to begin with the simple cases.
Utility functions that accept objects from the factory functions, use them, but do not take ownership. (For example a function that counts the number of words in the document.)
If you are calling a factory function, you are always taking ownership of the created object by the very definition of a factory function. I think what you mean is that some other client first obtains an object from the factory and then wishes to pass it to the utility function that does not take ownership itself.
In this case, the utility function should not care at all how ownership of the object it operates on is managed. It should simply accept a (probably const) reference or – if “no object” is a valid condition – a non-owning raw pointer. This will minimize the coupling between your interfaces and make the utility function most flexible.
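As a rough sketch of what that looks like for the question's example: the Document and TextDocument types are taken from the question, while the reference-taking signatures are my suggestion rather than the asker's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct Document {
    std::string content;
};

// A derived type the utility functions know nothing about.
struct TextDocument : Document {};

// Read-only access: a const reference, independent of how the caller
// manages ownership. No smart pointer is copied.
inline std::size_t charCount(const Document& doc) {
    return doc.content.length();
}

// Mutating access: a non-const reference.
inline void appendText(Document& doc) {
    doc.content.append("hello");
}
```

The caller dereferences whatever it holds: `charCount(*doc)` works the same for a shared_ptr, a unique_ptr or a raw pointer, and `charCount(local)` works for a stack object. A TextDocument binds to `const Document&` without any conversion of the smart pointer itself.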
Functions that keep a reference to the object after they return (like a UI component that takes a copy of the object so it can draw the content on the screen as needed.)
These should take a std::shared_ptr by value. This makes it clear from the function's signature that they take shared ownership of the argument.
Sometimes, it can also be meaningful to have a function that takes unique ownership of its argument (constructors come to mind). Those should take a std::unique_ptr by value (or by rvalue reference) which will also make the semantics clear from the signature.
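A small sketch of both signatures, assuming the Document and UI types from the question; the Archive class is a made-up example of a constructor that takes unique ownership:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Document {
    std::string content;
};

// Shares ownership: the by-value shared_ptr parameter says so.
class UI {
public:
    void setDocument(std::shared_ptr<Document> doc) { doc_ = std::move(doc); }
    const std::shared_ptr<Document>& document() const { return doc_; }
private:
    std::shared_ptr<Document> doc_;
};

// Takes unique ownership: callers must std::move their pointer in.
class Archive {
public:
    explicit Archive(std::unique_ptr<Document> doc) : doc_(std::move(doc)) {}
    const Document& document() const { return *doc_; }
private:
    std::unique_ptr<Document> doc_;
};
```

At the call site, `ui.setDocument(doc)` leaves the caller's shared_ptr intact, while `Archive archive(std::move(owned))` leaves the caller's unique_ptr empty, so the transfer is visible in the calling code too.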
A factory function (outside of the class) that creates objects and returns them to users of the class. (For example opening a document and returning an object that can be used to access the content.)
This is the difficult one, as there are good arguments for both std::unique_ptr and std::shared_ptr. The only thing that is clear is that returning an owning raw pointer is no good.
Returning a std::unique_ptr is lightweight (no overhead compared to returning a raw pointer) and conveys the correct semantics of a factory function. Whoever called the function obtains exclusive ownership over the fabricated object. If needed, the client can construct a std::shared_ptr out of a std::unique_ptr at the cost of a dynamic memory allocation.
On the other hand, if the client is going to need a std::shared_ptr anyway, it would be more efficient to have the factory use std::make_shared to avoid the additional dynamic memory allocation. Also, there are situations where you simply must use a std::shared_ptr, for example if the destructor of the managed object is non-virtual and the smart pointer is to be converted to a smart pointer to a base class. But a std::shared_ptr has more overhead than a std::unique_ptr, so if the latter is sufficient, we would rather avoid that overhead.
So in conclusion, I'd come up with the following guideline:
If you need a custom deleter, return a std::shared_ptr.
Else, if you think that most of your clients are going to need a std::shared_ptr anyway, utilize the optimization potential of std::make_shared.
Else, return a std::unique_ptr.
Of course, you could avoid the problem by providing two factory functions, one that returns a std::unique_ptr and one that returns a std::shared_ptr so each client can use what best fits its needs. If you need this frequently, I guess you can abstract most of the redundancy away with some clever template meta-programming.
What would the best return type be for the factory function?
unique_ptr would be best. It prevents accidental leaks, and the user can release ownership from the pointer, or transfer ownership to a shared_ptr (which has a constructor for that very purpose), if they want to use a different ownership scheme.
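For illustration, here is a sketch of such a factory. The name openDocument is hypothetical; Document is the struct from the question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Document {
    std::string content;
};

// Hypothetical factory: hands exclusive ownership to the caller.
inline std::unique_ptr<Document> openDocument(const std::string& text) {
    std::unique_ptr<Document> doc(new Document());
    doc->content = text;
    return doc;
}
```

A caller who wants shared ownership can write `std::shared_ptr<Document> shared = openDocument("abc");` directly, or `std::shared_ptr<Document> shared = std::move(owned);` for a unique_ptr obtained earlier, after which `owned` is empty.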
What is the best parameter type for the utility function?
A reference, unless the program flow is so convoluted that the object might be destroyed during the function call, in which case shared_ptr or weak_ptr. (In either case, it can refer to a base class, and add const qualifiers, if you want that.)
What is the best parameter type for the function that keeps a reference to the object?
shared_ptr or unique_ptr, if you want it to take responsibility for the object's lifetime and not otherwise worry about it. A raw pointer or reference, if you can (simply and reliably) arrange for the object to outlive everything that uses it.
Most of the other answers cover this, but @T.C. linked to a few really good guidelines which I'd like to summarise here:
Factory function
A factory that produces a reference type should return a unique_ptr by default, or a shared_ptr if ownership is to be shared with the factory.
-- GotW #90
As others have pointed out, you as the recipient of the unique_ptr can convert it to a shared_ptr if you wish.
Function parameters
Don’t pass a smart pointer as a function parameter unless you want to use or manipulate the smart pointer itself, such as to share or transfer ownership.
Prefer passing objects by value, *, or &, not by smart pointer.
-- GotW #91
This is because when you pass by smart pointer, you increment the reference counter at the start of the function, and decrement it at the end. These are atomic operations, which require synchronisation across multiple threads/processors, so in heavily multithreaded code the speed penalty can be quite high.
When you're in the function the object is not going to disappear because the caller still holds a reference to it (and can't do anything with the object until your function returns) so incrementing the reference count is pointless if you're not going to keep a copy of the object after the function returns.
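The extra count can be observed with use_count(). This is only a sketch to make the effect visible; Widget and the two function names are made up for the example.

```cpp
#include <cassert>
#include <memory>

struct Widget {};

// By value: the parameter is a copy of the caller's shared_ptr, so the
// count is one higher for the duration of the call.
inline long countWhenPassedByValue(std::shared_ptr<Widget> w) {
    return w.use_count();
}

// By const reference: no shared_ptr is copied, so the count is unchanged.
inline long countWhenPassedByConstRef(const std::shared_ptr<Widget>& w) {
    return w.use_count();
}
```

With a single owner, the first function reports 2 and the second reports 1. A function that never touches the smart pointer itself would take `const Widget&` or `Widget&` instead, as the guideline says.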
For functions that don't take ownership of the object:
Use a * if you need to express null (no object), otherwise prefer to use a &; and if the object is input-only, write const widget* or const widget&.
-- GotW #91
This doesn't force your caller to use a particular smart pointer type - any smart pointer can be converted into a normal pointer or a reference. So if your function doesn't need to keep a copy of the object or take ownership of it, use a raw pointer. As above, the object won't disappear in the middle of your function because the caller is still holding on to it (except in special circumstances, which you would already be aware of if this is an issue for you.)
For functions that do take ownership of the object:
Express a “sink” function using a by-value unique_ptr parameter.
void f( unique_ptr<widget> );
-- GotW #91
This makes it clear the function takes ownership of the object. A raw owning pointer that you might have from legacy code can still be handed over by explicitly wrapping it in a unique_ptr at the call site.
For functions that take shared ownership of the object:
Express that a function will store and share ownership of a heap object using a by-value shared_ptr parameter.
-- GotW #91
I think these guidelines are very useful. Read the pages the quotes came from for more background and in-depth explanation; it's worth it.
I would return a unique_ptr by value in most situations. Most resources shouldn't be shared, since that makes it hard to reason about their lifetimes. You can usually write your code in such a way to avoid shared ownership. In any case, you can make a shared_ptr from the unique_ptr, so it's not like you're limiting your options.
I'm currently trying to find out how to properly use the shared_ptr feature of C++11 in C++ APIs. The main area where I need it is in container classes (Like nodes in a scene graph for example which may contain a list of child nodes and a reference to the parent node and stuff like that). Creating copies of the nodes is not an option and using references or pointers is pain in the ass because no one really knows who is responsible for destructing the nodes (And when someone destructs a node which is still referenced by some other node the program will crash).
So I think using shared_ptr may be a good idea here. Let's take a look at the following simplified example (Which demonstrates a child node which must be connected to a parent node):
#include <memory>
#include <iostream>
using namespace std;
class Parent {};
class Child {
private:
shared_ptr<Parent> parent;
public:
Child(const shared_ptr<Parent>& parent) : parent(parent) {}
Parent& getParent() { return *parent.get(); }
};
int main() {
// Create parent
shared_ptr<Parent> parent(new Parent());
// Create child for the parent
Child child(parent);
// Some other code may need to get the parent from the child again like this:
Parent& p = child.getParent();
...
return 0;
}
This API forces the user to use a shared_ptr for creating the actual connection between the child and the parent. But in other methods I want a more simple API, that's why the getParent() method returns a reference to the parent and not the shared_ptr.
My first question is: Is this a correct usage of shared_ptr? Or is there room for improvement?
My second question is: How do I properly react to null pointers? Because the getParent method returns a reference, the user may think it can never return NULL. But that's wrong, because it will dereference a null pointer when someone passes a shared pointer containing a null pointer to the constructor. Actually I don't want null pointers. The parent must always be set. How do I properly handle this? By manually checking the shared pointer in the constructor and throwing an exception when it contains NULL? Or is there a better way? Maybe some sort of non-nullable shared pointer?
Using shared pointers for the purpose you describe is reasonable and increasingly common in C++11 libraries.
A few points to note:
On an API, taking a shared_ptr as an argument forces the caller to construct a shared_ptr. This is definitely a good move where there is a transfer of ownership of the pointee. In cases where the function merely uses a shared_ptr, it may be acceptable to take a reference to the object or the shared_ptr.
You are using shared_ptr<Parent> to hold a back reference to the parent object whilst using one in the other direction. This will create a retain cycle, resulting in objects that never get deleted. In general, use a shared_ptr when referencing from the top down, and a weak_ptr when referencing up. Watch out in particular for delegate/callback/observer objects - these almost always want a weak_ptr to the callee. You also need to take care around lambdas if they are executing asynchronously. A common pattern is to capture a weak_ptr.
Passing shared pointers by reference rather than value is a stylistic point with arguments for and against. Clearly, when passing by reference you are not passing ownership (i.e. not increasing the reference count on the object). On the other hand, you are not incurring the overhead either. There is a danger that you under-reference objects this way. On a more practical level, with a C++11 compiler and standard library, passing a temporary (or a std::move'd pointer) by value results in a move rather than a copy construction and is very nearly free anyway. However, passing by reference makes debugging considerably easier, as you won't be repeatedly stepping into shared_ptr's constructor.
Construct your shared_ptr with std::make_shared rather than new() and shared_ptr's constructor
shared_ptr<Parent> parent = std::make_shared<Parent>();
With modern compilers and libraries this can save a call to new().
Both shared_ptr and weak_ptr can contain NULL, just as any other pointer can. You should get in the habit of checking before dereferencing, and probably assert()ing liberally too. For the constructor case, you can always accept NULL pointers and instead throw at the point of use.
You might consider using a typedef for your shared pointer type. One style that is sometimes used is as follows:
typedef std::shared_ptr<Parent> Parent_P;
typedef std::weak_ptr<Parent> Parent_WkP;
typedef std::shared_ptr<Child> Child_P;
typedef std::weak_ptr<Child> Child_WkP;
It's also useful to know that in header files you can declare a shared_ptr<Type> after only a forward declaration of Type, without having seen its full definition. This can save a lot of header bloat.
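To make the point about retain cycles concrete, here is a minimal sketch of the top-down shared_ptr, bottom-up weak_ptr arrangement. The struct layout and the addChild helper are assumptions for the example, not the asker's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Child;

struct Parent {
    std::vector<std::shared_ptr<Child>> children;  // owning, top down
};

struct Child {
    std::weak_ptr<Parent> parent;  // non-owning, bottom up
};

// Hypothetical helper that links a new child to its parent.
inline std::shared_ptr<Child> addChild(const std::shared_ptr<Parent>& parent) {
    auto child = std::make_shared<Child>();
    child->parent = parent;
    parent->children.push_back(child);
    return child;
}
```

Because the back reference is weak, it does not contribute to the parent's use count, so releasing the last shared_ptr to the parent destroys it. A child that outlives its parent finds that `child->parent.lock()` returns an empty pointer instead of dangling.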
The way that you are using shared pointers is correct with 2 caveats.
Your tree of parents and children must share the lifetime of the pointers with other objects. If your parent-child tree will be the sole user of the pointer, please use a unique_ptr. If another object controls the lifetime of the pointer and you only want to reference it, you may be better off using a weak_ptr; if the lifetime is guaranteed to exceed that of your parent-child tree, a raw pointer may be suitable. Please remember that with shared_ptr you can get circular references, so it is not a silver bullet.
As for how to control NULL pointers: well, this all comes down to the contract implicit in your API. If the user is not allowed to supply a null pointer, you just need to document this fact. The best way to do this is to include an assert that the pointer is not null. This will crash your application in debug mode (if the pointer is null) but will not incur a runtime penalty on your release binary. If, however, a null pointer is an allowed input for some reason, then you need to provide correct error handling in the case of a null pointer.
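If the check should survive in release builds, the alternative the asker mentions (throwing from the constructor) is also straightforward. This is a sketch of that variant, not a recommendation over assert; the choice of std::invalid_argument is mine.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

class Parent {};

class Child {
public:
    // Rejects a null parent up front, so getParent() can safely
    // return a reference for the rest of the object's life.
    explicit Child(std::shared_ptr<Parent> parent) : parent_(std::move(parent)) {
        if (!parent_) {
            throw std::invalid_argument("Child requires a non-null parent");
        }
    }
    Parent& getParent() const { return *parent_; }
private:
    std::shared_ptr<Parent> parent_;
};
```

Since the invariant is established in the constructor and the member is never reassigned, no other method needs to repeat the check.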
Children do not own their parents. Rather, it's the other way around. If children need to be able to get their parents, then use a non-owning pointer or reference. Use shared (or better, unique if you can) pointer for parent to child.
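A rough sketch of that layout for a scene-graph-like tree, assuming nodes are only ever created through their parent. The Node class and its member names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Node {
public:
    Node() : parent_(nullptr) {}
    // Copying (and moving) is disabled so that the children's back
    // pointers can never refer to a relocated parent.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    // The parent takes ownership of the new child and returns a
    // non-owning reference to it.
    Node& addChild() {
        children_.push_back(std::unique_ptr<Node>(new Node()));
        children_.back()->parent_ = this;
        return *children_.back();
    }
    Node* parent() const { return parent_; }  // null for the root
    std::size_t childCount() const { return children_.size(); }
private:
    Node* parent_;                                 // non-owning
    std::vector<std::unique_ptr<Node>> children_;  // owning
};
```

Destroying the root destroys the whole tree, children before their parent's storage is released, and there is no reference count anywhere. The raw back pointer is safe here because a child cannot outlive the parent that owns it.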
I've been thinking about the possible use of delete this in c++, and I've seen one use.
Because you can say delete this only when an object is on heap, I can make the destructor private and stop objects from being created on stack altogether. In the end I can just delete the object on heap by saying delete this in a random public member function that acts as a destructor. My questions:
1) Why would I want to force the object to be made on the heap instead of on the stack?
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
Any scheme that uses delete this is somewhat dangerous, since whoever called the function that does that is left with a dangling pointer. (Of course, that's also the case when you delete an object normally, but in that case, it's clear that the object has been deleted). Nevertheless, there are somewhat legitimate cases for wanting an object to manage its own lifetime.
It could be used to implement a nasty, intrusive reference-counting scheme. You would have functions to "acquire" a reference to the object, preventing it from being deleted, and then "release" it once you've finished, deleting it if no one else has acquired it, along the lines of:
class Nasty {
public:
Nasty() : references(1) {}
void acquire() {
++references;
}
void release() {
if (--references == 0) {
delete this;
}
}
private:
~Nasty() {}
size_t references;
};
// Usage
Nasty * nasty = new Nasty; // 1 reference
nasty->acquire(); // get a second reference
nasty->release(); // back to one
nasty->release(); // deleted
nasty->acquire(); // BOOM!
I would prefer to use std::shared_ptr for this purpose, since it's thread-safe, exception-safe, works for any type without needing any explicit support, and prevents access after deleting.
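For comparison, roughly the same sequence with std::shared_ptr, plus a weak_ptr to show how access after deletion is caught. Resource and readIfAlive are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <memory>

struct Resource {
    int value = 7;
};

// Returns the value if the resource still exists, or -1 if it is gone.
inline int readIfAlive(const std::weak_ptr<Resource>& observer) {
    if (std::shared_ptr<Resource> alive = observer.lock()) {
        return alive->value;
    }
    return -1;
}
```

Usage mirrors the acquire/release calls: `auto first = std::make_shared<Resource>();` is one reference, `auto second = first;` is two, and each `reset()` releases one. After the last release, `readIfAlive(observer)` returns -1 instead of touching a deleted object.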
More usefully, it could be used in an event-driven system, where objects are created, and then manage themselves until they receive an event that tells them that they're no longer needed:
class Worker : EventReceiver {
public:
Worker() {
start_receiving_events(this);
}
virtual void on(WorkEvent) {
do_work();
}
virtual void on(DeleteEvent) {
stop_receiving_events(this);
delete this;
}
private:
~Worker() {}
void do_work();
};
1) Why would I want to force the object to be made on the heap instead of on the stack?
1) Because the object's lifetime is not logically tied to a scope (e.g., function body, etc.). Either because it must manage its own lifespan, or because it is inherently a shared object (and thus, its lifespan must be attached to those of its co-dependent objects). Some people here have pointed out some examples like event handlers, task objects (in a scheduler), and just general objects in a complex object hierarchy.
2) Because you want to control the exact location where code is executed for the allocation / deallocation and construction / destruction. The typical use-case here is that of cross-module code (spread across executables and DLLs (or .so files)). Because of issues of binary compatibility and separate heaps between modules, it is often a requirement that you strictly control in what module these allocation-construction operations happen. And that implies the use of heap-based objects only.
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
Well, your use-case is really just a "how-to" not a "why". Of course, if you are going to use a delete this; statement within a member function, then you must have controls in place to force all creations to occur with new (and in the same translation unit as the delete this; statement occurs). Not doing this would just be very very poor style and dangerous. But that doesn't address the "why" you would use this.
1) As others have pointed out, one legitimate use-case is where you have an object that can determine when its job is over and consequently destroy itself. For example, an event handler deleting itself when the event has been handled, a network communication object that deletes itself once the transaction it was appointed to do is over, or a task object in a scheduler deleting itself when the task is done. However, this leaves a big problem: signaling to the outside world that it no longer exists. That's why many have mentioned the "intrusive reference counting" scheme, which is one way to ensure that the object is only deleted when there are no more references to it. Another solution is to use a global (singleton-like) repository of "valid" objects, in which case any accesses to the object must go through a check in the repository and the object must also add/remove itself from the repository at the same time as it makes the new and delete this; calls (either as part of an overloaded new/delete, or alongside every new/delete calls).
However, there is a much simpler and less intrusive way to achieve the same behavior, albeit less economical. One can use a self-referencing shared_ptr scheme. As so:
class AutonomousObject {
private:
std::shared_ptr<AutonomousObject> m_shared_this;
protected:
AutonomousObject(/* some params */);
public:
virtual ~AutonomousObject() { };
template <typename... Args>
static std::weak_ptr<AutonomousObject> Create(Args&&... args) {
std::shared_ptr<AutonomousObject> result(new AutonomousObject(std::forward<Args>(args)...));
result->m_shared_this = result; // link the self-reference.
return result; // return a weak-pointer.
};
// this is the function called when the life-time should be terminated:
void OnTerminate() {
m_shared_this.reset(); // drop the self-reference; the object is destroyed here if no one else holds it.
};
};
With the above (or some variation upon this crude example, depending on your needs), the object stays alive for as long as it deems necessary, and for as long as anyone else holds a shared_ptr to it. The weak-pointer mechanism serves as the proxy through which outside users of the object can query for its existence. This scheme makes the object a bit heavier (it has a shared pointer in it), but it is easier and safer to implement. Of course, you have to make sure that the object eventually deletes itself, but that's a given in this kind of scenario.
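Here is a stripped-down, compilable variant of the scheme with an outside user querying the object through the weak pointer. The Session class is a made-up example, and it calls the no-argument std::shared_ptr::reset() to drop the self-reference.

```cpp
#include <cassert>
#include <memory>

class Session {
public:
    static std::weak_ptr<Session> Create() {
        std::shared_ptr<Session> self(new Session());
        self->self_ = self;  // the object keeps itself alive
        return self;         // implicitly converts to weak_ptr
    }
    void OnTerminate() {
        // Dropping the self-reference destroys the object unless an
        // outside user currently holds a locked shared_ptr.
        self_.reset();
    }
private:
    Session() {}
    std::shared_ptr<Session> self_;
};
```

An outside user calls `handle.lock()` and checks the result before touching the object. If termination happens while such a locked pointer exists, destruction is simply deferred until that pointer goes out of scope.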
2) The second use-case I can think of ties in to the second motivation for restricting an object to be heap-only (see above), however, it applies also for when you don't restrict it as such. If you want to make sure that both the deallocation and the destruction are dispatched to the correct module (the module from which the object was allocated and constructed), you must use a dynamic dispatching method. And for that, the easiest is to just use a virtual function. However, a virtual destructor is not going to cut it because it only dispatches the destruction, not the deallocation. The solution is to use a virtual "destroy" function that calls delete this; on the object in question. Here is a simple scheme to achieve this:
struct CrossModuleDeleter; //forward-declare.
class CrossModuleObject {
private:
virtual void Destroy() /* final */;
public:
CrossModuleObject(/* some params */); //constructor can be public.
virtual ~CrossModuleObject() { }; //destructor can be public.
//.... whatever...
friend struct CrossModuleDeleter;
template <typename... Args>
static std::shared_ptr< CrossModuleObject > Create(Args&&... args);
};
struct CrossModuleDeleter {
void operator()(CrossModuleObject* p) const {
p->Destroy(); // do a virtual dispatch to reach the correct deallocator.
};
};
// In the cpp file:
// Note: This function should not be inlined, so stash it into a cpp file.
void CrossModuleObject::Destroy() {
delete this;
};
template <typename... Args>
std::shared_ptr< CrossModuleObject > CrossModuleObject::Create(Args&&... args) {
return std::shared_ptr< CrossModuleObject >( new CrossModuleObject(std::forward<Args>(args)...), CrossModuleDeleter() );
};
The above kind of scheme works well in practice, and it has the nice advantage that the class can act as a base class with no additional intrusion by this virtual-destroy mechanism in the derived classes. You can also modify it for the purpose of allowing only heap-based objects (as usual, by making constructors and destructors private or protected). Without the heap-based restriction, the advantage is that you can still use the object as a local variable or data member (by value) if you want, but, of course, there will be loopholes left for users of the class to avoid.
As far as I know, these are the only legitimate use-cases I have ever seen anywhere or heard of (and the first one is easily avoidable, as I have shown, and often should be).
The general reason is that the lifetime of the object is determined by some factor internal to the class, at least from an application viewpoint. Hence, it may very well be a private method which calls delete this;.
Obviously, when the object is the only one to know how long it's needed, you can't put it on a random thread stack. It's necessary to create such objects on the heap.
It's generally an exceptionally bad idea. There are very few legitimate cases; for example, COM objects have enforced intrusive reference counting. You'd only ever do this for a very specific situational reason, never for a general-purpose class.
1) Why would I want to force the object to be made on the heap instead of on the stack?
Because its life span isn't determined by the scoping rule.
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
You use delete this when the object is the one best placed to be responsible for its own life span. One of the simplest examples I know of is a window in a GUI. The window reacts to events, a subset of which mean that the window has to be closed and thus deleted. In the event handler the window does a delete this. (You may delegate the handling to a controller class. But the situation "window forwards event to controller class which decides to delete the window" isn't much different from delete this: the window event handler will be left with the window deleted. You may also need to decouple the close from the delete, but your rationale won't be related to the desirability of delete this.)
delete this;
can be useful at times and is usually used for a control class that also controls the lifetime of another object. With intrusive reference counting, the class it is controlling is one that derives from it.
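As a rough sketch of the intrusive reference counting mentioned above: the names RefCounted, Widget, addRef and release are made up for illustration, the counter is not thread-safe, and a real design would likely add more safeguards.

```cpp
#include <cassert>

// Hypothetical controlling base class: it decides when the derived object dies.
class RefCounted {
public:
    void addRef() { ++count_; }
    void release() {
        if (--count_ == 0)
            delete this;  // nothing may touch *this after this line
    }
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
protected:
    RefCounted() : count_(1) {}   // the creator holds the first reference
    virtual ~RefCounted() {}      // protected: destruction only via release()
private:
    int count_;
};

// Hypothetical derived class; it can only be created on the heap.
class Widget : public RefCounted {
public:
    static Widget* create(bool* destroyed) { return new Widget(destroyed); }
private:
    explicit Widget(bool* destroyed) : destroyed_(destroyed) {}
    ~Widget() override { *destroyed_ = true; }  // flag lets a caller observe destruction
    bool* destroyed_;
};
```

Every addRef must be paired with exactly one release; the object destroys itself when the last reference goes away.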
The outcome of using such a class should be to make lifetime handling easier for users or creators of your class. If it doesn't achieve this, it is bad practice.
A legitimate example may be where you need a class to clean up all references to itself before it is destructed. In such a case, you "tell" the class whenever you are storing a reference to it (in your model, presumably) and then on exit, your class goes around nulling out these references or whatever before it calls delete this on itself.
This should all happen "behind the scenes" for users of your class.
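A minimal sketch of that idea, assuming a single hypothetical Registry that stores non-owning pointers (Entity, Registry and close are illustrative names, not from any particular framework):

```cpp
#include <cassert>
#include <cstddef>
#include <set>

class Entity;

// Hypothetical registry: it knows about entities but does not own them.
class Registry {
public:
    void add(Entity* e) { entries_.insert(e); }
    void remove(Entity* e) { entries_.erase(e); }
    std::size_t size() const { return entries_.size(); }
private:
    std::set<Entity*> entries_;
};

class Entity {
public:
    static Entity* create(Registry& r) { return new Entity(r); }

    // Called when the entity learns that it is no longer needed.
    void close() {
        registry_.remove(this);  // clean up references to ourselves first
        delete this;             // then self-destruct; no member access after this
    }
private:
    explicit Entity(Registry& r) : registry_(r) { registry_.add(this); }
    ~Entity() {}                 // private: only close() can destroy an Entity
    Registry& registry_;
};
```

Users only ever call create and close; the bookkeeping with the registry stays behind the scenes.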
"Why would I want to force the object to be made on the heap instead of on the stack?"
Generally when you force that it's not because you want to as such, it's because the class is part of some polymorphic hierarchy, and the only legitimate way to get one is from a factory function that returns an instance of a different derived class according to the parameters you pass it, or according to some configuration that it knows about. Then it's easy to arrange that the factory function creates them with new. There's no way that users of those classes could have them on the stack even if they wanted to, because they don't know in advance the derived type of the object they're using, only the base type.
Once you have objects like that, you know that they're destroyed with delete, and you can consider managing their lifecycle in a way that ultimately ends in delete this. You'd only do this if the object is somehow capable of knowing when it's no longer needed, which usually would be (as Mike says) because it's part of some framework that doesn't manage object lifetime explicitly, but does tell its components that they've been detached/deregistered/whatever[*].
If I remember correctly, James Kanze is your man for this. I may have misremembered, but I think he occasionally mentions that in his designs delete this isn't just used but is common. Such designs avoid shared ownership and external lifecycle management, in favour of networks of entity objects managing their own lifecycles. And where necessary, deregistering themselves from anything that knows about them prior to destroying themselves. So if you have several "tools" in a "toolbelt" then you wouldn't construe that as the toolbelt "owning" references to each of the tools, you think of the tools putting themselves in and out of the belt.
[*] Otherwise you'd have your factory return a unique_ptr or auto_ptr to encourage callers to stuff the object straight into the memory management type of their choice, or you'd return a raw pointer but provide the same encouragement via documentation. All the stuff you're used to seeing.
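For comparison, the more familiar factory style from the footnote might look like the sketch below. Shape, Circle, Square and makeShape are hypothetical names; the point is only that the caller never learns the derived type and receives ownership through a unique_ptr.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Shape {
public:
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// The factory picks the derived class from its parameter, so callers
// cannot put the result on the stack: they only know the base type.
std::unique_ptr<Shape> makeShape(int sides) {
    if (sides == 4)
        return std::unique_ptr<Shape>(new Square());
    return std::unique_ptr<Shape>(new Circle());
}
```

A caller who prefers shared ownership can move the result straight into a std::shared_ptr<Shape>.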
A good rule of thumb is not to use delete this.
Simply put, whatever uses new should also be responsible for calling delete when it is done with the object. This also avoids the problem of not knowing whether the object lives on the stack or the heap.
Once upon a time I was writing some plugin code. I believe I mixed builds (debug for the plugin, release for the main code, or maybe the other way around) because one part had to be fast. Or maybe it was another situation, such as the main program already being released and built with gcc while the plugin was being debugged/tested with VC. When the main code deleted something from the plugin, or the plugin deleted something from the main code, a memory issue would occur, because the two used different memory pools or malloc implementations. So I had a private dtor and a virtual function called deleteThis().
-edit- Now I might consider overloading the delete operator, using a smart pointer, or simply stating that callers must never delete the object themselves. It depends on the situation, and overloading new/delete should usually not be done unless you really know what you're doing (don't do it). I decided to use deleteThis() because I found it easier than the C-like way of thing_alloc and thing_free, and deleteThis() felt like the more OOP way of doing it.
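Roughly, the deleteThis() arrangement could look like this sketch. IPlugin, PluginImpl, createPlugin and liveCount are invented names, and everything lives in one translation unit here, whereas in the real situation the implementation would sit in the plugin binary.

```cpp
#include <cassert>

// Interface seen by the main program. The destructor is protected,
// so the host cannot write "delete plugin" and must call deleteThis().
class IPlugin {
public:
    virtual int run() = 0;
    virtual void deleteThis() = 0;
protected:
    virtual ~IPlugin() {}
};

namespace plugin_side {

int liveCount = 0;  // only here so the sketch can show the object is gone

class PluginImpl : public IPlugin {
public:
    PluginImpl() { ++liveCount; }
    int run() override { return 42; }
    // Runs inside the plugin's code, so the plugin's own allocator frees
    // the memory that the plugin's own new allocated.
    void deleteThis() override { delete this; }
private:
    ~PluginImpl() override { --liveCount; }
};

IPlugin* createPlugin() { return new PluginImpl(); }

}  // namespace plugin_side
```

The host calls plugin_side::createPlugin(), uses the object, and finishes with p->deleteThis() instead of delete p.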
Related to: C++ private pointer "leaking"?
According to Effective C++ (Item 28), "avoid returning handles (references, pointers, or iterators) to object internals. It increases encapsulation, helps const member functions act const, and minimizes the creation of dangling handles."
Returning objects by value is the only way I can think of to avoid returning handles. This to me suggests I should return private object internals by value as much as possible.
However, returning an object by value requires a copy constructor, which goes against the Google C++ Style Guide's "DISALLOW_COPY_AND_ASSIGN" recommendation.
As a C++ newbie, unless I am missing something, I find that these two suggestions conflict with each other.
So my questions are: is there no silver bullet which allows efficient reference returns to object internals that aren't susceptible to dangling pointers? Is the const reference return as good as it gets? In addition, should I not be using pointers for private object fields that often? What is a general rule of thumb for choosing when to store private instance fields of objects as by value or by pointer?
(Edit) For clarification, Meyers' example dangling pointer code:
class Rectangle {
public:
const Point& upperLeft() const { return pData->ulhc; }
const Point& lowerRight() const { return pData->lrhc; }
...
};
class GUIObject { ... };
const Rectangle boundingBox(const GUIObject& obj);
If the client creates a function with code such as:
GUIObject *pgo; // point to some GUIObject
const Point *pUpperLeft = &(boundingBox(*pgo).upperLeft());
"The call to boundingBox will return a new, temporary Rectangle object [(called temp from here.)] upperLeft will then be called on temp, and that call will return a reference to an internal part of temp, in particular, to one of the Points making it up...at the end of the statement, boundingBox's return value temp will be destroyed, and that will indirectly lead to the destruction of temp's Points. That, in turn, will leave pUpperLeft pointing to an object that no longer exists." Meyers, Effective C++ (Item 28)
I think he is suggesting to return Point by value instead to avoid this:
const Point upperLeft() const { return pData->ulhc; }
The Google C++ style guide is, shall we say, somewhat "special" and has led to much discussion on various C++ newsgroups. Let's leave it at that.
Under normal circumstances I would suggest that following the guidelines in Effective C++ is generally considered to be a good thing; in your specific case, returning an object instead of any sort of reference to an internal object is usually the right thing to do. Most compilers are pretty good at handling large return values (Google for Return Value Optimization, pretty much every compiler does it).
If measurements with a profiler suggest that returning a value is becoming a bottleneck, then I would look at alternative methods.
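To make the by-value suggestion concrete, here is a small sketch. It assumes RectData is a plain struct holding the two Points and that pData is a shared_ptr; the elided parts of the Rectangle in the question are filled in only as far as needed to compile, and the coordinates are arbitrary.

```cpp
#include <cassert>
#include <memory>

struct Point { int x; int y; };
struct RectData { Point ulhc; Point lrhc; };  // assumed layout

class Rectangle {
public:
    Rectangle(Point ul, Point lr) : pData(new RectData{ul, lr}) {}
    // Returning copies: the caller's Point does not depend on the
    // lifetime of this Rectangle.
    Point upperLeft() const { return pData->ulhc; }
    Point lowerRight() const { return pData->lrhc; }
private:
    std::shared_ptr<RectData> pData;
};

// Stand-in for boundingBox(const GUIObject&); it returns a temporary.
Rectangle boundingBox() { return Rectangle(Point{1, 2}, Point{3, 4}); }
```

With this version, Point ul = boundingBox().upperLeft(); stays valid after the temporary Rectangle is destroyed, because ul is its own object.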
First, let's look at this statement in context:
According to Effective C++ (Item 28), "avoid returning handles (references, pointers, or iterators) to object internals. It increases encapsulation, helps const member functions act const, and minimizes the creation of dangling handles."
This is basically talking about a class's ability to maintain invariants (properties that remain unchanged, roughly speaking).
Let's say you have a button widget wrapper, Button, which stores an OS-specific window handle to the button. If the client using the class had access to the internal handle, they could tamper with it using OS-specific calls like destroying the button, making it invisible, etc. Basically by returning this handle, your Button class sacrifices any control it originally had over the button handle.
You want to avoid these situations in such a Button class by providing everything you can do with the button as methods in this Button class. Then you don't need to ever return a handle to the OS-specific button handle.
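A sketch of such a Button, using a made-up fake_os namespace in place of any real windowing API (none of these functions exist in an actual OS):

```cpp
#include <cassert>

// Hypothetical stand-in for an OS windowing API.
namespace fake_os {
struct ButtonState { bool visible; };
typedef ButtonState* Handle;
Handle createButton() { return new ButtonState{true}; }
void destroyButton(Handle h) { delete h; }
void setVisible(Handle h, bool v) { h->visible = v; }
bool isVisible(Handle h) { return h->visible; }
}  // namespace fake_os

class Button {
public:
    Button() : handle_(fake_os::createButton()) {}
    ~Button() { fake_os::destroyButton(handle_); }
    Button(const Button&) = delete;
    Button& operator=(const Button&) = delete;

    // Everything a client may do goes through methods, so the class
    // keeps control over the handle and can maintain its invariants.
    void show() { fake_os::setVisible(handle_, true); }
    void hide() { fake_os::setVisible(handle_, false); }
    bool visible() const { return fake_os::isVisible(handle_); }

private:
    fake_os::Handle handle_;  // never handed out
};
```

Because there is no accessor for handle_, a client cannot destroy the underlying button behind the wrapper's back.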
Unfortunately, this doesn't always work in practice. Sometimes you have to return the handle or pointer or some other internal by reference for various reasons. Let's take boost::scoped_ptr, for instance. It is a smart pointer designed to manage memory through the internal pointer it stores. It has a get() method which returns this internal pointer. Unfortunately, that allows clients to do things like:
delete my_scoped_ptr.get(); // wrong
Nevertheless, this compromise was required because there are many cases where we are working with C/C++ APIs that require regular pointers to be passed in. Compromises are often necessary to satisfy libraries which don't accept your particular class but do accept one of its internals.
In your case, try to think if your class can avoid returning internals this way by instead providing functions to do everything one would want to do with the internal through your public interface. If not, then you've done all you can do; you'll have to return a pointer/reference to it but it would be a good habit to document it as a special case. You should also consider using friends if you know which places need to gain access to the class's internals in advance; this way you can keep such accessor methods private and inaccessible to everyone else.
"Returning objects by value is the only way I can think of to avoid returning handles. This to me suggests I should return private object internals by value as much as possible."
No, if you can return a copy, then you can equally return by const reference. The clients cannot (under normal circumstances) tamper with such internals.
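For instance, with a hypothetical Person class (the names are illustrative only), an accessor returning a const reference avoids the copy while still preventing modification:

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {
public:
    explicit Person(std::string name) : name_(std::move(name)) {}
    // No copy is made, and the caller cannot modify name_ through
    // the returned reference (short of a const_cast).
    const std::string& name() const { return name_; }
private:
    std::string name_;
};
```

The reference is only valid while the Person is alive, so the dangling problem from Item 28 still applies if the Person is a temporary.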
It really depends on the situation. If you want the caller to see changes made inside the called method, pass by reference. Remember that passing by value can be a fairly heavy operation: it requires a call to the copy constructor, which in essence has to allocate and fill enough memory to hold a copy of your object.
One thing you can do is fake pass by value: pass the argument to a method that takes a const reference to your object. The method cannot modify the argument, which of course means the caller does not expect to see any changes to the object.
Avoid pass by value unless you actually need the copy.
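A tiny sketch of the difference, using a made-up Tracked type that counts how often its copy constructor runs:

```cpp
#include <cassert>

// Hypothetical type that records every copy made of it.
struct Tracked {
    static int copies;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

// Pass by value: the copy constructor runs for an lvalue argument.
void byValue(Tracked t) { (void)t; }

// "Fake" pass by value: no copy, and the callee cannot modify the argument.
void byConstRef(const Tracked& t) { (void)t; }
```

Calling byConstRef(t) leaves Tracked::copies unchanged, while byValue(t) increments it by one.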
I've always used the following rule for signatures of functions that return ref-counted objects based on whether they do an AddRef or not, but want to explain it to my colleagues too... So my question is, is the rule described below a widely followed rule? I'm looking for pointers to (for example) coding rules that advocate this style.
If the function does not add a reference to the object, it should be returned as the return value of the function:
class MyClass
{
protected:
IUnknown *getObj() { return m_obj; }
private:
IUnknown *m_obj;
};
However, if the function adds a reference to the object, then a pointer-to-pointer of the object is passed as a parameter to the function:
class MyClass
{
public:
void getObj(IUnknown **outObj) { *outObj = m_obj; (*outObj)->AddRef(); }
private:
IUnknown *m_obj;
};
It's much more typical to use the reference-counting smart pointers for cases when a new object is created and the caller has to take ownership of it.
I've used this same style on projects with a lot of COM. It was taught to me by a couple of people who learned it when they worked at NuMega on a little thing called SoftICE. I think this is also the style taught in the book "Essential COM" by Don Box. At one point in time this book was considered the Bible for COM. I think the only reason this isn't still the case is that COM has become so much more than just COM.
All that said, I prefer CComPtr and other smart pointers.
One approach is to never use the function's return value. Only use output parameters, as in your second case. This is already a rule anyway in published COM interfaces.
Here's an "official" reference but, as is typical, it doesn't even mention your first case: http://support.microsoft.com/kb/104138
But inside a component, banning return values makes for ugly code. It is much nicer to have composability - i.e. putting functions together conveniently, passing the return value of one function directly as an argument to another.
Smart pointers allow you to do that. They are banned in public COM interfaces but then so are non-HRESULT return values. Consequently, your problem goes away. If you want to use a return value to pass back an interface pointer, do it via a smart pointer. And store members in smart pointers as well.
However, suppose for some reason you didn't want to use smart pointers (you're crazy, by the way!) then I can tell you that your reasoning is correct. Your function is acting as a "property getter", and in your first example it should not AddRef.
So your rule is correct (although there's a bug in your implementation which I'll come to in a second, as you may not have spotted it.)
This function wants an object:
void Foo(IUnknown *obj);
It doesn't need to affect obj's refcount at all, unless it wants to store it in a member variable. It certainly should NOT be the responsibility of Foo to call Release on obj before it returns! Imagine the mess that would create.
Now this function returns an object:
IUnknown *Bar();
And very often we like to compose functions, passing the output of one directly to another:
Foo(Bar());
This would not work if Bar had bumped up the refcount of whatever it returned. Who's going to Release it? So Bar does not call AddRef. This means that it is returning something that it stores and manages, i.e. it's effectively a property getter.
Also if the caller is using a smart pointer, p:
p = Bar();
Any sane smart pointer is going to AddRef when it is assigned an object. If Bar had also AddRef-ed, well, we have again leaked one count. This is really just a special case of the same composability problem.
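To see the counts, here is a sketch with a mock object instead of the real IUnknown. MockUnknown, Ptr and Owner are made up for illustration, and the mock's Release never deletes anything so that the count can be inspected afterwards.

```cpp
#include <cassert>

// Mock stand-in for a COM object; NOT the real IUnknown.
class MockUnknown {
public:
    MockUnknown() : refs_(1) {}
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() { return --refs_; }  // mock only: no delete
    unsigned long refs() const { return refs_; }
private:
    unsigned long refs_;
};

// Minimal hypothetical smart pointer that AddRefs on assignment.
class Ptr {
public:
    Ptr() : p_(nullptr) {}
    ~Ptr() { if (p_) p_->Release(); }
    Ptr(const Ptr&) = delete;
    Ptr& operator=(const Ptr&) = delete;
    Ptr& operator=(MockUnknown* p) {
        if (p) p->AddRef();      // take our own reference first
        if (p_) p_->Release();   // then drop the old one
        p_ = p;
        return *this;
    }
private:
    MockUnknown* p_;
};

class Owner {
public:
    MockUnknown* Bar() { return &obj_; }  // property getter: no AddRef
    unsigned long refs() const { return obj_.refs(); }
private:
    MockUnknown obj_;
};

void Foo(MockUnknown* obj) { (void)obj; }  // uses obj only during the call
```

Both Foo(owner.Bar()) and p = owner.Bar() leave the count balanced once p goes out of scope, which would not be the case if Bar had called AddRef itself.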
Output parameters (pointer-to-pointer) are different, because they aren't affected by the composability problem in the same way:
Again, smart pointers provide the most common case, using your second example:
myClass.getObj(&p);
The smart pointer isn't going to do any ref-counting here, so getObj has to do it.
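Now we come to the bug. Suppose smart pointer p already points to something when you pass it to getObj. Your getObj overwrites the stored pointer without calling Release on the old object, so the reference p was holding is leaked.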
The corrected version is:
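void getObj(IUnknown **outObj)
{
    // Release whatever the caller's pointer already held, so it is not leaked
    if (*outObj != 0)
        (*outObj)->Release();
    *outObj = m_obj;
    // m_obj may be null, so guard the AddRef as well
    if (*outObj != 0)
        (*outObj)->AddRef();
}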
In practice, people make that mistake so often that I find it simpler to make my smart pointer assert if operator& is called when it already has an object.
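Something along these lines, again with a mock object rather than the real IUnknown (MockUnknown and Ptr are illustrative names, and the mock's Release never deletes):

```cpp
#include <cassert>

// Mock stand-in for a COM object; NOT the real IUnknown.
class MockUnknown {
public:
    MockUnknown() : refs_(1) {}
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() { return --refs_; }  // mock only: no delete
    unsigned long refs() const { return refs_; }
private:
    unsigned long refs_;
};

// Hypothetical smart pointer whose operator& refuses to hand out the
// address of a pointer that is already in use.
class Ptr {
public:
    Ptr() : p_(nullptr) {}
    ~Ptr() { if (p_) p_->Release(); }
    Ptr(const Ptr&) = delete;
    Ptr& operator=(const Ptr&) = delete;
    MockUnknown** operator&() {
        assert(p_ == nullptr && "operator& on a non-empty pointer would leak");
        return &p_;
    }
    MockUnknown* get() const { return p_; }
private:
    MockUnknown* p_;
};

class MyClass {
public:
    // Output-parameter getter as in the question: it must AddRef.
    void getObj(MockUnknown** outObj) { *outObj = &obj_; (*outObj)->AddRef(); }
    unsigned long refs() const { return obj_.refs(); }
private:
    MockUnknown obj_;
};
```

With this in place, getObj can stay as simple as in the question, and passing an already-filled pointer is caught in debug builds instead of silently leaking.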