Use of shared pointers in public interfaces - C++

We have a pretty standard tree API using shared pointers that looks roughly like this (implementations omitted for brevity):
#include <memory>
#include <vector>

class node;
using node_ptr = std::shared_ptr<node>;

class node : public std::enable_shared_from_this<node> {
    std::weak_ptr<node> parent;
    std::vector<node_ptr> children;

public:
    virtual ~node() = default;
    virtual void do_something() = 0;
    void add_child(node_ptr new_child);
    void remove_child(node_ptr child);
    node_ptr get_parent();
    const std::vector<node_ptr>& get_children();
};

class derived_node : public node {
    derived_node() = default;

public:
    virtual void do_something() override;
    static node_ptr create(/* args... */);
};

// More derived node types...
This works just fine and prevents nodes from being leaked, as you'd imagine. However, I've read in various other answers on SO that using std::shared_ptr in a public API like this is considered bad style and should be avoided.
Obviously this ventures into opinion-based territory, so a couple of concrete questions to avoid this question being closed :-)
Are there any well-known pitfalls to using shared_ptrs in interfaces like this, which we have so far been fortunate enough to avoid?
If so, is there a commonly-used (I hesitate to say "idiomatic") alternative formulation which avoids said pitfalls but still allows for simple memory management for users?
Thanks.

It's not bad style; it depends on your goals and assumptions.
A few projects I've worked on with hard constraints required us to avoid shared_ptrs because we wanted to manage our own memory, so any third-party libs that would require the use of shared_ptrs were out.
Another reason you might wish to avoid shared_ptrs is that their use is somewhat opinionated. Some projects will wrap everything in them and just pretend it's like having a GC language (urgh!). Other projects will treat shared_ptrs with a little more restraint, and only use them for things that actually have shared ownership.
Most of the third-party APIs (certainly not all) I've worked with operate on the principle that if you've allocated it, you destroy it. So long as you're very clear about the ownership of the resource, it doesn't cause too much issue.

std::shared_ptr is there to manage ownership, so for a print_tree function, prefer
void print_tree(const node& root); // no ownership transfer
over
void print_tree(const std::shared_ptr<node>& root);
The latter requires a shared_ptr, so it may require the construction of a shared_ptr from the object (whereas retrieving the object from a shared_ptr is a simple getter).
Now, for your getters, you mainly have the choice between:
shared_ptr, if you want to share ownership with the user;
weak_ptr, a secure reference to an internal;
pointer/reference, an insecure reference to an internal.
By secure and insecure, I mean that if the object is destroyed, you can test for that with a weak_ptr, but not with a simple pointer. The security has some overhead though, so there is a trade-off.
If accessors are for local usage only, and not for keeping a reference around, a pointer/reference may be a good option. As an example, std::vector iterators are unusable once the vector is destroyed, so they are good for local usage, but it may be dangerous to keep an iterator around as a reference (though possible).
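To make the trade-off concrete, here is a minimal sketch of the three accessor flavours as members of the node class from the question (the accessor names are illustrative, not part of the original API):

// shares ownership: the caller can keep the child alive on its own
node_ptr get_child_shared(std::size_t i) { return children[i]; }

// secure reference: the caller must lock(), and can detect destruction
std::weak_ptr<node> get_child_weak(std::size_t i) { return children[i]; }

// insecure reference: cheapest, but only safe for local, short-lived use
node* get_child_raw(std::size_t i) { return children[i].get(); }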
Do you expect/allow the user to keep a reference to a node while allowing the root (or parent) to be destroyed? What should happen to that node, from the user's point of view?
For void add_child(node_ptr new_child);, you clearly take ownership. You may hide the shared_ptr entirely if the node constructs its child itself, something like:
template <typename NodeType, typename... Ts>
std::weak_ptr<NodeType> add_child(Ts&&... args)
{
    auto n = std::make_shared<NodeType>(std::forward<Ts>(args)...);
    // or
    // auto n = NodeType::create(std::forward<Ts>(args)...);

    // ... set the parent and append to the children vector ...

    return n; // implicit conversion to weak_ptr: the caller observes, the tree owns
}
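Usage might then look like this (a sketch; it assumes derived_node gives add_child access to a suitable constructor):

auto child = root->add_child<derived_node>(/* args... */);
if (auto c = child.lock()) {
    c->do_something(); // the tree keeps the node alive; the caller only borrows it
}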

One reason this may be considered bad style is the additional overhead involved in reference counting, which is often (never say never) not actually needed in external APIs, since they tend to fall into one of two categories:
An API that receives a pointer, acts on it and returns: a raw pointer will usually work better, since the function does not need to manage the pointer in any way.
An API that manages the pointer, such as in your case: a std::unique_ptr will usually be a better fit, and it has zero overhead (see the sketch below).
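For the tree in the question, a unique_ptr-based interface might look like this sketch (illustrative only; a raw, non-owning back-pointer for the parent is a common companion to unique ownership):

#include <memory>
#include <vector>

class node {
    node* parent = nullptr;                      // non-owning back-pointer
    std::vector<std::unique_ptr<node>> children; // sole owners of the children

public:
    virtual ~node() = default;

    node* add_child(std::unique_ptr<node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();            // non-owning observer for the caller
    }

    std::unique_ptr<node> remove_child(node* child); // hands ownership back to the caller
};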

Related

A very weak reference (cannot be turned into shared)

I'd like to know if there is any smart pointer concept that implements the "very weak reference" idea.
This would basically be a weak_ptr that cannot be turned into a shared_ptr: when you have very_weak_refs out there, you are sure that the strong-ref count can never go up.
This would allow better "strong ownership" of memory by managers, and delivering very weak references into the wild would still allow clients to access the data using a good old raw pointer through a .lock_get() function or equivalent (a name designed to mirror what you would usually write as .lock().get()).
You don't have the same security on the data, because your object may be destroyed while you use it; but if your environment is controlled enough that you know the manager cannot clean up its data while you are processing, you're still fine using raw pointers locally after having checked against nullptr after lock_get().
Have any of you wished for a similar thing? Any more info/intelligence/thoughts on that?
Thanks.
Rationale: the motivation is that weak_ptr has the "security flaw" of being convertible to shared, and therefore, after distributing weak references into the wild, you have basically done the same as distributing shared ones, because anybody can keep very long-lived shared refs to your data, effectively preventing correct cleanup by the entities that were supposed to be strong (the manager).
This is solved by very-weak-refs: when you distribute that kind of object in your manager's public interface, you are sure that when you delete your last shared ref, your data is deleted.
For me, the whole concept of weak references works only with well-behaved clients who understand that they should promote their weak refs into shareds for only small amounts of time.
Unfortunately, what you are asking for is impossible with the traditional interface of smart pointers.
The issue is one of lifetime. A weak_ptr cannot be used to access the object directly, because it does not guarantee that said object will live long enough: the object might be pulled right out from under your feet.
Example:
#include <iostream>
#include <memory>

int main() {
    std::shared_ptr<int> sp(new int(4));
    std::weak_ptr<int> wp(sp);
    if (not wp.expired()) {
        sp.reset();
        std::cout << *wp << "\n"; // Access wp? But there is nothing there!
                                  // (hypothetical: weak_ptr deliberately has no
                                  // operator*, precisely to forbid this race)
    }
}
Thus, for better or worse, there is no other choice than recovering a shared pointer from the weak pointer any time you actually need to access the object without controlling the duration of this access.
This last point, however, is our clue. A simple idea is to write a well-behaved client of weak_ptr yourself, and change the way it allows the external world to access the data. For example:
#include <memory>

template <typename T>
class very_weak_ptr {
public:
    very_weak_ptr() {}
    explicit very_weak_ptr(std::weak_ptr<T> wp): _wp(wp) {}

    template <typename F>
    void apply(F&& f) {
        std::shared_ptr<T> sp = _wp.lock(); // pins the object for the duration of f
        f(sp.get());                        // passes nullptr if the object is gone
    }

private:
    std::weak_ptr<T> _wp;
}; // class very_weak_ptr
Note: there is one remaining flaw: enable_shared_from_this allows one to recover a std::shared_ptr<T> from the very instance of T; you can add a compile-time check on T to prevent usage of this class with such objects.
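Usage of this sketch might look like the following (the callback must tolerate a null pointer, since the object may already be gone):

std::shared_ptr<int> sp = std::make_shared<int>(42);
very_weak_ptr<int> vwp(sp);

vwp.apply([](int* p) {
    if (p) { /* safe to use *p for the duration of the call */ }
});

Note that apply still bumps the strong count internally for the duration of the call; the point is merely that clients can never retain that strong reference beyond it.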
What you ask for is functionally equivalent to the following, starting from a std::shared_ptr<>:
initially creating a std::weak_ptr<> thereto, plus .get()-ting a raw pointer;
before use of the raw pointer, checking that the weak_ptr<> hasn't .expired().
Have any of you wished for a similar thing? Any more info/intelligence/thoughts on that? thanks.
No... if you need to check weak_ptr<>::expired() anyway, you might as well get a valid shared_ptr<>. What do you really think this is going to achieve? "Better 'strong ownership'" when you somehow know/require that the manager can't release the object during the period you're using the raw pointer: it doesn't add up...

Delegating object destruction

I've found this class definition (T has to derive from TBase); passResponsabilityToBarDestructor is not the actual name of the function, sadly.
template <typename T>
class Foo
{
public:
    Foo(const std::string& aName, Bar& aBar)
    {
        const TBase* myObj = static_cast<const TBase*>(new T);
        NiceNameSpace::passResponsabilityToBarDestructor(aName, myObj, aBar);
    }
    virtual ~Foo() {}
};
I'm wondering if it is well designed.
When I write a class, I tend to avoid delegating destruction, since I don't know whether the delegated class (in this case Bar) is going to be modified by someone unaware that passResponsabilityToBarDestructor has to call a member function of aBar which saves the pointer myObj somewhere and deletes it when appropriate.
I would like to know:
whether this class is well designed;
whether my design efforts (when I cannot use smart pointers, I get headaches trying to write classes that destroy things in the same class that constructs them) are a waste of time.
Delegation of destruction actually helps in many cases. I have come across code where the cost of destruction is quite heavy, so the designers did not want to destroy objects during the call flow; instead they delegated destruction to another thread, to be performed in the background (ideally when the system is not busy). In such cases, a garbage collector (in another thread) destroys the objects.
This is also sometimes used for quick switches of data (for cases of data refresh), deleting the old data at leisure; it is a similar concept to GC in Java. A sketch of the idea follows.
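Here is a minimal sketch of that pattern, assuming a background thread that periodically drains a queue of doomed objects (all names here are illustrative):

#include <memory>
#include <mutex>
#include <utility>
#include <vector>

template <typename T>
class deferred_deleter {
public:
    void schedule(std::unique_ptr<T> obj) {  // hot path: just park the object
        std::lock_guard<std::mutex> lock(m_);
        doomed_.push_back(std::move(obj));
    }

    void drain() {                           // background thread: destroy at leisure
        std::vector<std::unique_ptr<T>> victims;
        {
            std::lock_guard<std::mutex> lock(m_);
            victims.swap(doomed_);
        }
        // victims' (possibly heavy) destructors run here, off the hot path
    }

private:
    std::mutex m_;
    std::vector<std::unique_ptr<T>> doomed_;
};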
As to whether this particular design is efficient/useful: if you add the overall context, it may help us offer some suggestions. Effectively, I have given some hints on the second part of your question. HTH!

Is it bad practice to have a unique_ptr member that has a raw pointer accessor?

I have a class that has a unique_ptr member, and this class retains sole ownership of this object. However, external classes may require access to this object. In this case, should I just return a raw pointer? shared_ptr doesn't seem to be correct because that would imply that the accessing class now shares ownership of that memory, whereas I want to make it clear that the original class is the sole owner.
For example, perhaps I have a tree class that owns a root node. Another class may wish to explore the tree for some reason, and requires a pointer to the root node to do this. A partial implementation might look like:
class Tree
{
public:
    Node* GetRoot()
    {
        return m_root.get(); // hand out a non-owning pointer
    }

private:
    std::unique_ptr<Node> m_root;
};
Is this bad practice? What would a better solution be?
A more normal implementation might be for the Tree to expose iterators or provide a visit mechanism to explore the tree, rather than exposing the implementation details of the Tree itself. Exposing the implementation details means that you can never change the tree's underlying structure without risk of breaking who-knows-how-many clients of that code.
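A visit mechanism along those lines might look like this sketch (Visit, VisitImpl, and Node::Children() are illustrative, not part of the original code):

class Tree
{
public:
    template <typename F>
    void Visit(F&& f) const                // clients see nodes, never the tree's structure
    {
        VisitImpl(m_root.get(), f);
    }

private:
    template <typename F>
    void VisitImpl(const Node* n, F& f) const
    {
        if (!n) return;
        f(*n);                             // pre-order traversal (illustrative choice)
        for (const auto& child : n->Children())
            VisitImpl(child.get(), f);
    }

    std::unique_ptr<Node> m_root;
};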
If you absolutely insist there's a need for this, at least return the pointer as const, e.g. const Node* GetRoot() const, because external clients should absolutely not be mutating the tree structure.
Scott Meyers, Effective C++ Item 15 says yes, although the primary reason is for interactivity with legacy code. If you are in a controlled environment - e.g. a startup where there is little legacy code and it's easy to extend a class as needed or in a homework assignment - there may be little need and it may only encourage using the class incorrectly. But the more reusable approach is to expose raw resources.
Keep in mind that when people say "avoid pointers" they really mean "use ownership semantics", and that pointers are not inherently bad.
Returning a non-owning pointer is an acceptable way of handing out references that may also be null, and it isn't any more dangerous than the alternatives. Now, in your case this only makes sense if your tree is meant to be used as a tree, rather than using a tree internally as std::map does.
It's uncommon to do this, though, as usually you'll want the pointers and resources abstracted away.

delete this & private destructor

I've been thinking about the possible uses of delete this in C++, and I've seen one use.
Because you can say delete this only when an object is on the heap, I can make the destructor private and stop objects from being created on the stack altogether. In the end I can delete the heap object by saying delete this in some public member function that acts as a destructor. My questions:
1) Why would I want to force the object to be made on the heap instead of on the stack?
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
Any scheme that uses delete this is somewhat dangerous, since whoever called the function that does it is left with a dangling pointer. (Of course, that's also the case when you delete an object normally, but then it's clear that the object has been deleted.) Nevertheless, there are somewhat legitimate cases for wanting an object to manage its own lifetime.
It could be used to implement a nasty, intrusive reference-counting scheme. You would have functions to "acquire" a reference to the object, preventing it from being deleted, and then "release" it once you've finished, deleting it if no one else has acquired it, along the lines of:
class Nasty {
public:
    Nasty() : references(1) {}
    void acquire() {
        ++references;
    }
    void release() {
        if (--references == 0) {
            delete this;
        }
    }
private:
    ~Nasty() {}        // private: stack instances and plain 'delete' are forbidden
    size_t references;
};

// Usage
Nasty* nasty = new Nasty;  // 1 reference
nasty->acquire();          // get a second reference
nasty->release();          // back to one
nasty->release();          // deleted
nasty->acquire();          // BOOM!
I would prefer to use std::shared_ptr for this purpose, since it's thread-safe, exception-safe, works for any type without needing any explicit support, and prevents access after deleting.
More usefully, it could be used in an event-driven system, where objects are created, and then manage themselves until they receive an event that tells them that they're no longer needed:
class Worker : EventReceiver {
public:
    Worker() {
        start_receiving_events(this);
    }
    virtual void on(WorkEvent) {
        do_work();
    }
    virtual void on(DeleteEvent) {
        stop_receiving_events(this);
        delete this;   // nothing may touch 'this' after this line
    }
private:
    ~Worker() {}
    void do_work();
};
1) Why would I want to force the object to be made on the heap instead of on the stack?
1) Because the object's lifetime is not logically tied to a scope (e.g., function body, etc.). Either because it must manage its own lifespan, or because it is inherently a shared object (and thus, its lifespan must be attached to those of its co-dependent objects). Some people here have pointed out some examples like event handlers, task objects (in a scheduler), and just general objects in a complex object hierarchy.
2) Because you want to control the exact location where code is executed for the allocation / deallocation and construction / destruction. The typical use-case here is that of cross-module code (spread across executables and DLLs (or .so files)). Because of issues of binary compatibility and separate heaps between modules, it is often a requirement that you strictly control in what module these allocation-construction operations happen. And that implies the use of heap-based objects only.
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
Well, your use-case is really just a "how-to" not a "why". Of course, if you are going to use a delete this; statement within a member function, then you must have controls in place to force all creations to occur with new (and in the same translation unit as the delete this; statement occurs). Not doing this would just be very very poor style and dangerous. But that doesn't address the "why" you would use this.
1) As others have pointed out, one legitimate use-case is where you have an object that can determine when its job is over and consequently destroy itself. For example, an event handler deleting itself when the event has been handled, a network communication object that deletes itself once the transaction it was appointed to do is over, or a task object in a scheduler deleting itself when the task is done. However, this leaves a big problem: signaling to the outside world that it no longer exists. That's why many have mentioned the "intrusive reference counting" scheme, which is one way to ensure that the object is only deleted when there are no more references to it. Another solution is to use a global (singleton-like) repository of "valid" objects, in which case any accesses to the object must go through a check in the repository and the object must also add/remove itself from the repository at the same time as it makes the new and delete this; calls (either as part of an overloaded new/delete, or alongside every new/delete calls).
However, there is a much simpler and less intrusive way to achieve the same behavior, albeit less economical: one can use a self-referencing shared_ptr scheme, like so:
class AutonomousObject {
private:
    std::shared_ptr<AutonomousObject> m_shared_this;

protected:
    AutonomousObject(/* some params */);

public:
    virtual ~AutonomousObject() {}

    template <typename... Args>
    static std::weak_ptr<AutonomousObject> Create(Args&&... args) {
        std::shared_ptr<AutonomousObject> result(new AutonomousObject(std::forward<Args>(args)...));
        result->m_shared_this = result; // link the self-reference.
        return result;                  // return a weak pointer.
    }

    // this is the function called when the lifetime should be terminated:
    void OnTerminate() {
        m_shared_this.reset(); // drop the self-reference; this destroys the object
                               // unless someone else still holds a shared_ptr to it.
    }
};
With the above (or some variation upon this crude example, depending on your needs), the object will be alive for as long as it deems necessary and no one else is using it. The weak-pointer mechanism serves as the proxy through which possible outside users query for the existence of the object. This scheme makes the object a bit heavier (it has a shared pointer in it), but it is easier and safer to implement. Of course, you have to make sure that the object eventually deletes itself, but that's a given in this kind of scenario.
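Hypothetical usage of the above (the caller never owns the object outright):

std::weak_ptr<AutonomousObject> handle = AutonomousObject::Create(/* some params */);

if (auto obj = handle.lock()) {
    // the object is pinned for the duration of 'obj'...
}
// ...but only the object itself, by calling OnTerminate(), decides when it dies.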
2) The second use-case I can think of ties in to the second motivation for restricting an object to be heap-only (see above), however, it applies also for when you don't restrict it as such. If you want to make sure that both the deallocation and the destruction are dispatched to the correct module (the module from which the object was allocated and constructed), you must use a dynamic dispatching method. And for that, the easiest is to just use a virtual function. However, a virtual destructor is not going to cut it because it only dispatches the destruction, not the deallocation. The solution is to use a virtual "destroy" function that calls delete this; on the object in question. Here is a simple scheme to achieve this:
struct CrossModuleDeleter; // forward-declare.

class CrossModuleObject {
private:
    virtual void Destroy() /* final */;

public:
    CrossModuleObject(/* some params */); // constructor can be public.
    virtual ~CrossModuleObject() {}       // destructor can be public.

    //.... whatever...

    friend struct CrossModuleDeleter;

    template <typename... Args>
    static std::shared_ptr<CrossModuleObject> Create(Args&&... args);
};

struct CrossModuleDeleter {
    void operator()(CrossModuleObject* p) const {
        p->Destroy(); // do a virtual dispatch to reach the correct deallocator.
    }
};

// In the cpp file:
// Note: This function should not be inlined, so stash it into a cpp file.
void CrossModuleObject::Destroy() {
    delete this;
}

template <typename... Args>
std::shared_ptr<CrossModuleObject> CrossModuleObject::Create(Args&&... args) {
    return std::shared_ptr<CrossModuleObject>(new CrossModuleObject(std::forward<Args>(args)...),
                                              CrossModuleDeleter());
}
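Callers would then write something like:

auto obj = CrossModuleObject::Create(/* some params */);
// When the last shared_ptr dies, CrossModuleDeleter calls Destroy(), so the
// 'delete this' executes in the module that allocated the object.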
The above kind of scheme works well in practice, and it has the nice advantage that the class can act as a base class with no additional intrusion by this virtual-destroy mechanism in the derived classes. You can also modify it to allow only heap-based objects (as usual, by making constructors/destructors private or protected). Without the heap-based restriction, the advantage is that you can still use the object as a local variable or data member (by value) if you want; but, of course, there will be loopholes left for users of the class to avoid.
As far as I know, these are the only legitimate use-cases I have ever seen anywhere or heard of (and the first one is easily avoidable, as I have shown, and often should be).
The general reason is that the lifetime of the object is determined by some factor internal to the class, at least from an application viewpoint. Hence, it may very well be a private method which calls delete this;.
Obviously, when the object is the only one to know how long it's needed, you can't put it on a random thread stack. It's necessary to create such objects on the heap.
It's generally an exceptionally bad idea. There are a very few cases; for example, COM objects have enforced intrusive reference counting. You'd only ever do this with a very specific situational reason, never for a general-purpose class.
1) Why would I want to force the object to be made on the heap instead of on the stack?
Because its life span isn't determined by the scoping rule.
2) Is there another use of delete this apart from this? (supposing that this is a legitimate use of it :) )
You use delete this when the object is the best-placed one to be responsible for its own lifespan. One of the simplest examples I know of is a window in a GUI. The window reacts to events, a subset of which mean that the window has to be closed and thus deleted. In the event handler the window does a delete this. (You may delegate the handling to a controller class, but the situation "window forwards event to controller class which decides to delete the window" isn't much different from delete this: the window event handler will be left with the window deleted. You may also need to decouple the close from the delete, but your rationale won't be related to the desirability of delete this.)
delete this; can be useful at times, and is usually used for a control class that also controls the lifetime of another object. With intrusive reference counting, the class it is controlling is one that derives from it.
The outcome of using such a class should be to make lifetime handling easier for users or creators of your class. If it doesn't achieve this, it is bad practice.
A legitimate example may be where you need a class to clean up all references to itself before it is destructed. In such a case, you "tell" the class whenever you are storing a reference to it (in your model, presumably) and then on exit, your class goes around nulling out these references or whatever before it calls delete this on itself.
This should all happen "behind the scenes" for users of your class.
"Why would I want to force the object to be made on the heap instead of on the stack?"
Generally when you force that it's not because you want to as such, it's because the class is part of some polymorphic hierarchy, and the only legitimate way to get one is from a factory function that returns an instance of a different derived class according to the parameters you pass it, or according to some configuration that it knows about. Then it's easy to arrange that the factory function creates them with new. There's no way that users of those classes could have them on the stack even if they wanted to, because they don't know in advance the derived type of the object they're using, only the base type.
Once you have objects like that, you know that they're destroyed with delete, and you can consider managing their lifecycle in a way that ultimately ends in delete this. You'd only do this if the object is somehow capable of knowing when it's no longer needed, which usually would be (as Mike says) because it's part of some framework that doesn't manage object lifetime explicitly, but does tell its components that they've been detached/deregistered/whatever[*].
If I remember correctly, James Kanze is your man for this. I may have misremembered, but I think he occasionally mentions that in his designs delete this isn't just used but is common. Such designs avoid shared ownership and external lifecycle management, in favour of networks of entity objects managing their own lifecycles. And where necessary, deregistering themselves from anything that knows about them prior to destroying themselves. So if you have several "tools" in a "toolbelt" then you wouldn't construe that as the toolbelt "owning" references to each of the tools, you think of the tools putting themselves in and out of the belt.
[*] Otherwise you'd have your factory return a unique_ptr or auto_ptr to encourage callers to stuff the object straight into the memory management type of their choice, or you'd return a raw pointer but provide the same encouragement via documentation. All the stuff you're used to seeing.
A good rule of thumb is not to use delete this.
Simply put, the thing that uses new should be responsible enough to use delete when done with the object. This also avoids the problems of not knowing whether an object is on the stack or the heap.
Once upon a time I was writing some plugin code. I believe I mixed builds (debug for the plugin, release for the main code, or maybe the other way around) because one part needed to be fast; or maybe another situation occurred, such as the main code being a release build on gcc while the plugin was being debugged/tested on VC. When the main code deleted something from the plugin, or the plugin deleted something from the main code, a memory issue would occur, because the two used different memory pools or malloc implementations. So I had a private dtor and a virtual function called deleteThis().
Edit: Nowadays I might consider overloading the delete operator, using a smart pointer, or simply stating that callers must never delete the object themselves. It depends; and overloading new/delete should usually never be done unless you really know what you're doing (don't do it). I decided to use deleteThis() because I found it easier than the C-like way of thing_alloc and thing_free; deleteThis() felt like the more OOP way of doing it.
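A minimal sketch of that deleteThis() pattern, assuming the interface class is implemented inside the plugin (all names here are illustrative):

// Interface visible to the host application:
class plugin_thing {
public:
    virtual void do_work() = 0;
    virtual void deleteThis() = 0;  // destruction is routed into the plugin
protected:
    ~plugin_thing() {}              // non-public: 'delete p' in the host won't compile
};

// Inside the plugin's translation unit:
class plugin_thing_impl : public plugin_thing {
public:
    void do_work() override { /* ... */ }
    void deleteThis() override { delete this; }  // uses the plugin's heap/CRT
};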

What shared_ptr policy to use with asynchronous scheme?

I have my own multi-threaded service which handles some commands. The service consists of a command parser, worker threads with queues, and some caches. I don't want to keep an eye on each object's life-cycle, so I use shared_ptrs very extensively. Every component uses shared_ptrs in its own way:
the command parser creates shared_ptrs and stores them in a cache;
workers bind shared_ptrs to functors and put them in queues;
caches temporarily or permanently hold some shared_ptrs;
the data referenced by a shared_ptr can itself hold other shared_ptrs.
And there is another underlying service (for example, a command receiver and sender) that has the same structure, but uses its own caches, workers, and shared_ptrs. It's independent from my service and is maintained by another developer.
It's a complete nightmare when I try to track all the shared_ptr dependencies to prevent cross-references.
Is there a way to specify some shared_ptr "interface" or "policy", so that I will know which shared_ptrs I can safely pass to the underlying service without inspecting the code or interacting with its developer? The policy should describe each shared_ptr's owning cycle; for example, a worker holds the functor with its bound shared_ptr from the dispatch() call until some other function call, while a cache holds its shared_ptrs from the cache's constructor call until the cache's destructor call.
I'm especially curious about the shutdown situation, when the application may freeze while waiting for the threads to join.
There is no silver bullet... and shared_ptr certainly is not one.
My first question would be: do you need all those shared pointers?
The best way to avoid cyclic references is to define the lifetime policy of each object and make sure they are compatible. This can be easily documented (see the sketch after this list):
you pass me a reference, I expect the object to live throughout the function call, but no more
you pass me a unique_ptr, I am now responsible for the object
you pass me a shared_ptr, I expect to be able to keep a handle to the object myself without adversely affecting you
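In code, those three contracts are simply three signatures (widget and the function names are illustrative):

#include <memory>
class widget;

void print(const widget& w);              // borrowed for the duration of the call
void consume(std::unique_ptr<widget> w);  // ownership transferred to the callee
void watch(std::shared_ptr<widget> w);    // callee may keep a co-owning handle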
Now, there are rare situations where the use of shared_ptr is indeed necessary. The mention of caches leads me to think that this might be your case, at least for some uses.
In this case, you can (at least informally) enforce a layering approach.
Define a number of layers, from 0 (the base) to infinite
Each type of object is ascribed to a layer, several types may share the same layer
An object of type A might only hold a shared_ptr to an object of type B if, and only if, Layer(A) > Layer(B)
Note that we expressly forbid sibling relationships. With this scheme, no cycle of references can ever be formed. Indeed, we obtain a DAG (Directed Acyclic Graph).
Now, when a type is created, it must be ascribed a layer number, and this must be documented (preferably in the code).
An object may change layer, however:
if its layer number decreases, then you must reexamine the references it holds (easy)
if its layer number increases, then you must reexamine all the references to it (hard)
Note: by convention, types of objects which cannot hold any reference are usually in the layer 0.
Note 2: I first stumbled upon this convention in an article by Herb Sutter, where he applied it to mutexes to try to prevent deadlocks. This is an adaptation to the current issue.
This can be enforced a bit more automatically (by the compiler), as long as you are ready to rework your existing code base.
We create a new SharedPtr class aware of our layering scheme:
#include <memory>

template <typename T>
constexpr unsigned getLayer() { return T::Layer; }

template <typename T, unsigned L>
class SharedPtrImpl {
public:
    explicit SharedPtrImpl(T* t): _p(t)
    {
        // compile-time check: a pointer may only point down the layering
        static_assert(L > getLayer<T>(), "Layering Violation");
    }

    T* get() const { return _p.get(); }
    T& operator*() const { return *this->get(); }
    T* operator->() const { return this->get(); }

private:
    std::shared_ptr<T> _p;
};
Each type that may be held in such a SharedPtr is given its layer statically, and we use a base class to help us out:
template <unsigned L>
struct LayerMember {
    static unsigned const Layer = L;

    template <typename T>
    using SharedPtr = SharedPtrImpl<T, L>;
};
And now, we can easily use it:
class Foo: public LayerMember<3> {
public:
    // ...

private:
    SharedPtr<Bar> _bar; // statically checked!
};
However, this coding approach is a little more involved, so I think that convention may well be sufficient ;)
You should look at weak_ptr. It complements shared_ptr but does not keep objects alive, so it is very useful when you might have circular references.
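A minimal sketch of breaking a cycle with weak_ptr (the types are illustrative, loosely modelled on the cache/command setup from the question):

#include <memory>

struct command;

struct cache_entry {
    std::shared_ptr<command> cmd;      // the cache owns the command...
};

struct command {
    std::weak_ptr<cache_entry> origin; // ...the command only observes the entry back

    void run() {
        if (auto entry = origin.lock()) {
            // the entry still exists, and is pinned for the duration of 'entry'
        }
    }
};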