I have a directed acyclic graph, composed of Node objects. Each node has a list of std::shared_ptrs to other nodes, which are its children in the graph. I have lots of useful methods I need, such as inserting/emplacing/reparenting nodes, testing if a node is an ancestor of another, etc. Some are standard STL-like methods, and some are specific to directed acyclic graphs and specific to my needs.
The question is, When such a method takes a node as a parameter, should it take a reference? of a weak_ptr? or a shared_ptr? I tried to examine use cases but it's hard to tell. What's the best design here? I'm new to smart pointers and I'm not sure what's the best choice. Should I treat shared_ptr<Node> as "the representation" of node objects? Or maybe the way to choose is more sophisticated?
Thanks in advance
Only pass a shared_ptr (by value) or copy it when the set of owners is meaningfully extended. It's safe, and preferred, to pass pointers when dealing with nodes as pure information.
Note the std::enable_shared_from_this facility to retrieve the correct std::shared_ptr from any graph object. With that base class, a valid naked pointer and a shared pointer are essentially equivalent. I'm not sure how much, if any, overhead it adds. (It definitely ensures that there will be no additional heap fragmentation, which std::make_shared also does.)
Passing shared_ptr anywhere is just an optimization of functionality elegantly provided by shared_from_this. But when you do, pass them by const reference, since they are just providing information without actively arbitrating ownership.
Related
I have a conainter, lets say a std::list<int>, which I would like to share between objects. One of the objects is known to live longer than the others, so he will hold the container. In order to be able to access the list, the other objects may have a pointer to the list.
Since the holder object might get moved, I'll need to wrap the list with a unique_ptr:
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::list<int>& list; };
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects:
class LongLiveHolder { std::unique_ptr<NonExistentListNode<int>> back; };
class ShortLiveObject { NonExistentListNode<int>& back; };
, which would save me a redundant dereference when accessing the list, except that I would no longer have the full std::list interface to use with the shorter-lived object- just the node pointers.
Can I somehow get rid of this extra layer of indirection, while still having the std::list interface in the shorter-lived object?
Preface
You may be overthinking the cost of the extra indirection from the std::unique_ptr (unless you have a lot of these lists and you know that usages of them will be frequent and intermixed with other procedures). In general, I'd first trust my compiler to do smart things. If you want to know the cost, do performance profiling.
The main purpose of the std::unique_ptr in your use-case is just to have shared data with a stable address when other data that reference it gets moved. If you use the list member of the long-lived object multiple times in a single procedure, you can possibly help your compiler to help you (and also get some nicer-to-read code) when you use the list through the long-lived object by making a variable in the scope of the procedure that stores a reference to the std::list pointed to by the std::unique_ptr like:
void fn(LongLiveHolder& holder) {
auto& list {holder.list.get()};
list.<some_operation_1>(...);
list.<some_operation_2>(...);
list.<some_operation_3>(...);
}
But again, you should inspect the generated machine code and do performance profiling if you really want to know what kind of difference it makes.
If Context Permits, Write your own List
You said:
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects: [...]
Considering Changes in what is the First Node
What if the first node of the list is allowed to be deleted? What if a new node is allowed to be inserted at the beginning of the list? You'd need a very specific context for those to not be requirements. What you want in your short-lived object is a view abstractions which supports the same interface as the actual list but just doesn't manage the lifetime of the list contents. If you implement the view abstraction as a pointer to the list's first node, then how will the view object know about changes to what the "real"/lifetime-managing list considers to be the first node? It can't- unless the lifetime-managing list keeps an internal list of all views of itself which are alive and also updates those (which itself is a performance and space overhead), and even then, what about the reverse? If the view abstraction was used to change what's considered the first node, how would the lifetime-managing list know about that change? The simplest, sane solution is to have an extra level of indirection: make the view point to the list instead of to what was the list's first node when the view was created.
Considering Requirements on Time Complexity of getting the list size
I'm pretty sure a std::list can't just hold pointers to front and back nodes. For one thing, since c++11 requires that std::list::size() is O(1), std::list probably has to keep track of its size at all times in a counter member- either storing it in itself, or doing some kind of size-tracking in each node struct, or some other implementation-defined behaviour. I'm pretty sure the simplest and most performant way to have multiple moveable references (non-const pointers) to something that needs to do this kind of bookkeeping is to just add another level of indirection.
You could try to "skip" the indirection layer required by the bookkeeping for specific cases that don't require that information, which is the iterators/node-pointers approach, which I'll comment on later. I can't think of a better place or way to store that bookkeeping other than with the collection itself. Ie. If the list interface has requirements that require such bookkeeping, an extra layer of indirection for each user of the list implementation has a very strong design rationale.
If Context Permits
If you don't care about having O(1) to get the size of your list, and you know that what is considered the first node will not change for the lifetime of the short-lived object, then you can write your own List class list-view class and make your own context-specific optimizations. That's one of the big selling-points of languages like C++: You get a nice standard library that does commonly useful things, and when you have a specific scenario where some features of those tools aren't required and are resulting in unnecessary overhead, you can build your own tool/abstraction (or possibly use someone else's library).
Commentary on std::unique_ptr + reference
Your first snippet works, but you can probably get some better implicit constructors and such for SortLiveObject by using std::reference_wrapper, since the default implicity-declared copy-assignment and default-construct functions get deleted when there's a reference member.
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::reference_wrapper<std::list<int>> list; };
Commentary on std::shared_ptr + std::weak_ref
Like #Adrian Maire suggested, std::shared_ptr in the longer-lived, object which might move while the shorter-lived object exists, and std::weak_ptr in the shorter-lived object is a working approach, but it probably has more overhead (at least coming from the ref-count) than using std::unique_ptr + a reference, and I can't think of any generalized pros, so I wouldn't suggest it unless you already had some other reason to use a std::shared_ptr. In the scenario you gave, I'm pretty sure you do not.
Commentary on Storing iterators/node-pointers in the short-lived object
#Daniel Langr already commented about this, but I'll try to expand.
Specifically for std::list, there is a possible standard-compliant solution (with several caveats) that doesn't have the extra indirection of the smart pointer. Caveats:
You must be okay with only having an iterator interface for the shorter-lived object (which you indicated that you are not).
The front and back iterators must be stable for the lifetime of the shorter-lived object. (the iterators should not be deleted from the list, and the shorter-lived object won't see new list entries that are pushed to the front or back by someone using the longer-lived object).
From cppreference.com's page for std::list's constructors:
After container move construction (overload (8)), references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this. The current standard makes this guarantee via the blanket statement in [container.requirements.general]/12, and a more direct guarantee is under consideration via LWG 2321.
From cppreference.com's page for std::list:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
But I am not a language lawyer. I could be missing something important.
Also, you replied to Daniel saying:
Some iterators get invalid when moving the container (e.g. insert_iterator) #DanielLangr
Yes, so if you want to be able to make std::input_iterators, use the std::unique_ptr + reference approach and construct short-lived std::input_iterators when needed instead of trying to store long-lived ones.
If the list owner will be moved, then you need some memory address to share somehow.
You already indicated the unique_ptr. It's a decent solution if the non-owners don't need to save it internally.
The std::shared_ptr is an obvious alternative.
Finally, you can have a std::shared_ptr in the owner object, and pass std::weak_ptr to non-owners.
This is a more general question that I'm trying to resolve for C++ best practices. Suppose I want to create objects which store references to each other, like a graph. All objects are owned by the same object, like a Graph object to all the Nodes, which is to say the ownership is fixed.
Here's my idea: a class Graph has a std::vector of Nodes, each Node has a std::vector of Nodes representing its list of connections. I'm wondering how best to implement this in terms of smart pointers? To my understanding, ownership is unique so the Graph vector should be std::vector<std::unique_ptr<Node>> nodes and I can populate that as needed. But the connections vector, how can I get each node to store references to its connections? These would only be read-only references, and maybe it would be better to name all the nodes and only store the names, or to store connections in the Graph. But is there a good way of storing references to the connection nodes as if they were const pointers?
Note: this is really about ownership and smart pointers, not about data structures, the graph example is just an example.
When discussing "Best Practices", it's important to consider what your quality-attributes and needs are for the code.
There is no "right" or "wrong" answer in the example of code such as a Graph; there are varying degrees that solve different problems in different ways -- and it depends strongly on the way its intended to be used.
By-far the simplest way to solve such a problem is for the main container (Graph) to have strong ownership in the with unique_ptr, and to only view the lifetime in the internal elements (Node) with a raw pointer, e.g.:
class Graph
{
...
private:
std::vector<std::unique_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<const Node*> m_connected_nodes;
};
This would work well, since Node cannot mutate its connected nodes, and since Graph assumes that Node will never outlive it.
However, this approach does not work if you ever want Node to outlive Graph, or if you want Node to be used across multiple Graph objects. If it lives between different Graphs, then you may run the risk of a Node referring to a dangling pointer -- and this would be bad.
If this is the case, you might need to consider a different ownership pattern, such as shared_ptr and weak_ptr ownership:
class Graph
{
...
private:
std::vector<std::shared_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<std::weak_ptr<const Node*>> m_connected_nodes;
};
In this case, Nodes only weakly know other Node objects, whereas Graph is the strong owner of them. This prevents the dangling issue, but incurs additional overhead now for the shared_ptr's control node, and for having to check for whether it's alive before accessing weak_ptr nodes.
So the correct answer is: It depends. If you can get away with the former approach, that's probably the cleanest; you always have 1 owner, and thus the logic is simple and easy to follow.
I'm wondering how best to implement this in terms of smart pointers?
By not using them. Use a vector of nodes for the graph: std::vector<Node>. This is a reasonable default choice until you have a good reason to do otherwise.
But is there a good way of storing references to the connection nodes as if they were const pointers?
Yes. Const pointers are a good way of storing as if they were const pointers. (And by "const pointer", I presume we are actually talking about pointer to const).
A reference wrapper is another choice. Although it has the advantage of not having representation for null, it does have the downside of clumsy syntax.
I am encountering the same problem over and over, in the last weeks. Boiled down to its core, I build a (directed acyclic) hierarchy of objects like:
a -> c
b -> c
b -> d
An instance can have more than one parent, and more than one child. c might be a value shared among readers and writers. b might be a composite value. The hierarchy is easily created by a factory - e.g. a.setChild(new C()). Later, a client only cares about the root, on which she calls getValue() or setValue().
My question is: Who cleans up the hierarchy? Who is responsible to call delete on b or c?
Option - The factory created the nodes, the factory must delete the nodes:
I do not like this option, because I understand a factory as a replacement for new. It feels weird to keep a factory until the instances it created can be destroyed.
Option - "Smart" pointers:
Much less good, because it pollutes the interface, and introduces a lot of complexity for a simple thing such as pointers.
Option - A graph-like class that does the memory management:
The class collects all nodes a, b, c, d, ... and provides access to the root. The nodes themselves reference each other, but do not delete children, or parents. If the "graph" or composite manager is destroyed, it destroys all nodes.
I prefer the last option. It, however, complicates the construction of the hierarchy. Either you build the hierarchy outside, and tell the graph about every single node, or you build the hierarchy inside the graph, which smells like option 1. The graph can only delete, what it knows. Therefore, memory leaks are somehow in the design, if you have to pass the nodes to the graph.
Is there a pattern for this problem? Which strategy do you prefer?
Edit 1 - Sep 1st, 2014: Sorry, for being unspecific with respect to smart pointers. I tried to avoid yet another "when to use smart pointers question", and instead focus the question on alternative solutions. However, I am willing to use smart pointers, if they indeed are the best option (or if necessary).
In my opinion, the signature setChild(C* child) should be preferred over setChild(std::shared_ptr<C> child) for the same reason as for-each loops should be preferred over iterators. However, I must admit that like std:string a shared pointer is more specific about its semantic.
In terms of complexity, every operation inside a node, now, have to deal with shared pointers. std::vector<C*> becomes std::vector< std::shared_ptr<C> >, ... Furthermore, every pointer carries a reference count around, which could be avoided if there were other options.
I should add that I develop a low level part of a real-time system. It is not a firmware but close.
Edit 2 - Sep 1st, 2014:
Thank you for all the input. My specific problem is: I get a byte array with lets say sensor data. At some point, I am told which value is written where in that array. On the one hand, I want to have a mapping from a position in the array to a primitive value (int32, double, ...). On the other hand, I want to merge primitive values to complex types (structures, vectors, ...). Unfortunately, not all mappings are bidirectional. Eg. I can read a comparison between values, but I can not write values according to the comparison result. Therefore, I separated readers and writers and let them, if necessary, access the same value.
Option - "Smart" pointers: Much less good, because it pollutes the
interface, and introduces a lot of complexity for a simple thing such
as pointers.
Smart pointers are pointers with memory management, which is exactly what you need.
You are not forced to expose the smart pointers in your interface, you can get a raw pointer from the smart one as long as the object is still owned by a smart pointer somewhere (Note that this is not necessarily the best idea though, smart pointers in an interface is far from being an ugly thing).
Actually exposing raw pointers in your interface indirectly introduce much more pollution and complexity because you need to document ownership and lifetime rules which are explicit when using smart pointers, which makes them even more simpler to use than "simple pointers".
Also it is most likely the "futur" way of doing things: since c++11/14 smart pointers are part of the standard as much as std::string, would you say that std::string pollutes the interface and introduces a lot of complexity compared to a const char* ?
If you have performance issues, then the question is: will your hand crafted alternative solution be truly more performant (this need to be measured) and is it worth it in term of development time required to develop the feature and maintain the code ?
I have a class that has a unique_ptr member, and this class retains sole ownership of this object. However, external classes may require access to this object. In this case, should I just return a raw pointer? shared_ptr doesn't seem to be correct because that would imply that the accessing class now shares ownership of that memory, whereas I want to make it clear that the original class is the sole owner.
For example, perhaps I have a tree class that owns a root node. Another class may wish to explore the tree for some reason, and requires a pointer to the root node to do this. A partial implementation might look like:
class Tree
{
public:
Node* GetRoot()
{
return m_root.Get();
}
private:
std::unique_ptr<Node> m_root;
};
Is this bad practice? What would a better solution be?
A more normal implementation might be for the Tree to expose iterators or provide a visit mechanism to explore the tree, rather than exposing the implementation details of the Tree itself. Exposing the implementation details means that you can never change the tree's underlying structure without risk of breaking who-knows-how-many clients of that code.
If you absolutely insist there's a need for this, at least return the pointer as const, such as const Node* GetRoot() const because external clients should absolutely not be mutating the tree structure.
Scott Meyers, Effective C++ Item 15 says yes, although the primary reason is for interactivity with legacy code. If you are in a controlled environment - e.g. a startup where there is little legacy code and it's easy to extend a class as needed or in a homework assignment - there may be little need and it may only encourage using the class incorrectly. But the more reusable approach is to expose raw resources.
Keep in mind that when people say "avoid pointers" they really mean "use ownership semantics", and that pointers are not inherently bad.
Returning a non-owning pointer is an acceptable way of getting references that may also be null and isn't anymore dangerous than the alternatives. Now, in your case this would only make sense if your tree was meant to be used as a tree, rather than using a tree internally such as std::map does.
It's uncommon to do this though as usually you'll want the pointers and resources abstracted away.
I've been evaluating various smart pointer implementations (wow, there are a LOT out there) and it seems to me that most of them can be categorized into two broad classifications:
1) This category uses inheritance on the objects referenced so that they have reference counts and usually up() and down() (or their equivalents) implemented. IE, to use the smart pointer, the objects you're pointing at must inherit from some class the ref implementation provides.
2) This category uses a secondary object to hold the reference counts. For example, instead of pointing the smart pointer right at an object, it actually points at this meta data object... Who has a reference count and up() and down() implementations (and who usually provides a mechanism for the pointer to get at the actual object being pointed to, so that the smart pointer can properly implement operator ->()).
Now, 1 has the downside that it forces all of the objects you'd like to reference count to inherit from a common ancestor, and this means that you cannot use this to reference count objects that you don't have control over the source code to.
2 has the problem that since the count is stored in another object, if you ever have a situation that a pointer to an existing reference counted object is being converted into a reference, you probably have a bug (I.E., since the count is not in the actual object, there is no way for the new reference to get the count... ref to ref copy construction or assignment is fine, because they can share the count object, but if you ever have to convert from a pointer, you're totally hosed)...
Now, as I understand it, boost::shared_pointer uses mechanism 2, or something like it... That said, I can't quite make up my mind which is worse! I have only ever used mechanism 1, in production code... Does anyone have experience with both styles? Or perhaps there is another way thats better than both of these?
"What is the best way to implement smart pointers in C++"
Don't! Use an existing, well tested smart pointer, such as boost::shared_ptr or std::tr1::shared_ptr (std::unique_ptr and std::shared_ptr with C++ 11)
If you have to, then remember to:
use safe-bool idiom
provide an operator->
provide the strong exception guarantee
document the exception requirements your class makes on the deleter
use copy-modify-swap where possible to implement the strong exception guarantee
document whether you handle multithreading correctly
write extensive unit tests
implement conversion-to-base in such a way that it will delete on the derived pointer type (policied smart pointers / dynamic deleter smart pointers)
support getting access to raw pointer
consider cost/benifit of providing weak pointers to break cycles
provide appropriate casting operators for your smart pointers
make your constructor templated to handle constructing base pointer from derived.
And don't forget anything I may have forgotten in the above incomplete list.
Just to supply a different view to the ubiquitous Boost answer (even though it is the right answer for many uses), take a look at Loki's implementation of smart pointers. For a discourse on the design philosophy, the original creator of Loki wrote the book Modern C++ Design.
I've been using boost::shared_ptr for several years now and while you are right about the downside (no assignment via pointer possible), I think it was definitely worth it because of the huge amount of pointer-related bugs it saved me from.
In my homebrew game engine I've replaced normal pointers with shared_ptr as much as possible. The performance hit this causes is actually not so bad if you are calling most functions by reference so that the compiler does not have to create too many temporary shared_ptr instances.
Boost also has an intrusive pointer (like solution 1), that doesn't require inheriting from anything. It does require changing the pointer to class to store the reference count and provide appropriate member functions. I've used this in cases where memory efficiency was important, and didn't want the overhead of another object for each shared pointer used.
Example:
class Event {
public:
typedef boost::intrusive_ptr<Event> Ptr;
void addRef();
unsigned release();
\\ ...
private:
unsigned fRefCount;
};
inline void Event::addRef()
{
fRefCount++;
}
inline unsigned Event::release(){
fRefCount--;
return fRefCount;
}
inline void intrusive_ptr_add_ref(Event* e)
{
e->addRef();
}
inline void intrusive_ptr_release(Event* e)
{
if (e->release() == 0)
delete e;
}
The Ptr typedef is used so that I can easily switcth between boost::shared_ptr<> and boost::intrusive_ptr<> without changing any client code
If you stick with the ones that are in the standard library you will be fine.
Though there are a few other types than the ones you specified.
Shared: Where the ownership is shared between multiple objects
Owned: Where one object owns the object but transfer is allowed.
Unmovable: Where one object owns the object and it can not be transferred.
The standard library has:
std::auto_ptr
Boost has a couple more than have been adapted by tr1 (next version of the standard)
std::tr1::shared_ptr
std::tr1::weak_ptr
And those still in boost (which in relatively is a must have anyway) that hopefully make it into tr2.
boost::scoped_ptr
boost::scoped_array
boost::shared_array
boost::intrusive_ptr
See:
Smart Pointers: Or who owns you baby?
It seems to me this question is kind of like asking "Which is the best sort algorithm?" There is no one answer, it depends on your circumstances.
For my own purposes, I'm using your type 1. I don't have access to the TR1 library. I do have complete control over all the classes I need to have shared pointers to. The additional memory and time efficiency of type 1 might be pretty slight, but memory usage and speed are big issues for my code, so type 1 was a slam dunk.
On the other hand, for anyone who can use TR1, I'd think the type 2 std::tr1::shared_ptr class would be a sensible default choice, to be used whenever there isn't some pressing reason not to use it.
The problem with 2 can be worked around. Boost offers boost::shared_from_this for this same reason. In practice, it's not a big problem.
But the reason they went with your option #2 is that it can be used in all cases. Relying on inheritance isn't always an option, and then you're left with a smart pointer you can't use for half your code.
I'd have to say #2 is best, simply because it can be used in any circumstances.
Our project uses smart pointers extensively. In the beginning there was uncertainty about which pointer to use, and so one of the main authors chose an intrusive pointer in his module and the other a non-intrusive version.
In general, the differences between the two pointer types were not significant. The only exception being that early versions of our non-intrusive pointer implicitly converted from a raw pointer and this can easily lead to memory problems if the pointers are used incorrectly:
void doSomething (NIPtr<int> const &);
void foo () {
NIPtr<int> i = new int;
int & j = *i;
doSomething (&j); // Ooops - owned by two pointers! :(
}
A while ago, some refactoring resulted in some parts of the code being merged, and so a choice had to be made about which pointer type to use. The non-intrusive pointer now had the converting constructor declared as explicit and so it was decided to go with the intrusive pointer to save on the amount of code change that was required.
To our great surprise one thing we did notice was that we had an immediate performance improvement by using the intrusive pointer. We did not put much research into this, and just assumed that the difference was the cost of maintaining the count object. It is possible that other implementations of non-intrusive shared pointer have solved this problem by now.
What you are talking about are intrusive and non-intrusive smart pointers. Boost has both. boost::intrusive_ptr calls a function to decrease and increase the reference count of your object, everytime it needs to change the reference count. It's not calling member functions, but free functions. So it allows managing objects without the need to change the definition of their types. And as you say, boost::shared_ptr is non-intrusive, your category 2.
I have an answer explaining intrusive_ptr: Making shared_ptr not use delete. In short, you use it if you have an object that has already reference counting, or need (as you explain) an object that is already referenced to be owned by an intrusive_ptr.