Handles vs Smart pointers. What to use?

Handles vs Smart pointers. What to use? - c++

I'm starting to develop a graphical engine just for practicing purposes. One of the first questions that arised is either to use handles or smart pointers to refer to my class instances.
From my point of view:
Smart pointers pros: created under demand, they do not have the problem of becoming stale pointers; cons: as they are in a linked list, searching for a pointer is an O(n) operation.
Handles pros: search is O(1), object relocation is O(1); cons: can became stale pointers, creating a new handle forces the system to check for the first NULL entry in the handles table.
Which one to choose? Please explain your selection.
EDITED:
I want to clarify some points after your comments and answers.
I don't mean smart pointers are a linked list in the way of "are represented by a STL linked list". I mean they behave, in some way as a linked list (if you move one object from one memory block to another, you need to iterate the full list of smart pointers to update all references to this object properly -it can be done with a linked list -).
And I don't mean handles exactly as opaque pointers or pointer to implementation models. I mean having a global handle table (an array of pointers) so when I request an object, I get a dereferenceable instance containing the index in this table where the actual pointer to the object can be found. So, if I move the object from one block to another, just updating the pointer entry in the handle table I get all pointers automatically updated at the same time.

Neither of those definitions fit what's normally used. Smart pointers aren't in a linked-list in any way at all. Usually you use the observer pattern to keep a vector of raw pointers to objects that still exist if you need to iterate them or something. Handles as you describe them are pretty much only used for binary compatibility reasons and never in-process.
Use smart pointers, they take care of themselves.

The term "handle" is a broad term that, essentially, means an identifier to an object.
A pointer or smart pointer falls under this definition, so you need to pick a terser term for your Option 2.
"Handle"
|
/------+-------\
/ | \
/ | \
Pointer Reference Other Identififer
| | \
|----+----| `T&` \
| | |---+------|
`T*` `shared_ptr<T>` Text Number (e.g. HWND in WinAPI)
If I assume that you mean some fixed, memory-abstracted "other identifier" then, sure, you can employ this. You don't necessarily have an either/or scenario here. You probably want to use smart pointers anyway (for lifetime management if nothing else), and smart pointers don't need to be in a linked list.
You could have a std::map<your_identifier_type, std::shared_ptr<T> > to map your fixed, user-defined identifier to a [potentially-changing] smart pointer.
Disclaimer: This diagram was hastily drawn and represents my vision of the terminology tree as it stands now, half an hour after getting out of bed. There may be minor discrepancies with other views, but it should give a fairly reliable impression of things.

Related

Sharing a `std::list` without adding a (redundant) reference to it

I have a conainter, lets say a std::list<int>, which I would like to share between objects. One of the objects is known to live longer than the others, so he will hold the container. In order to be able to access the list, the other objects may have a pointer to the list.
Since the holder object might get moved, I'll need to wrap the list with a unique_ptr:
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::list<int>& list; };
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects:
class LongLiveHolder { std::unique_ptr<NonExistentListNode<int>> back; };
class ShortLiveObject { NonExistentListNode<int>& back; };
, which would save me a redundant dereference when accessing the list, except that I would no longer have the full std::list interface to use with the shorter-lived object- just the node pointers.
Can I somehow get rid of this extra layer of indirection, while still having the std::list interface in the shorter-lived object?

Preface
You may be overthinking the cost of the extra indirection from the std::unique_ptr (unless you have a lot of these lists and you know that usages of them will be frequent and intermixed with other procedures). In general, I'd first trust my compiler to do smart things. If you want to know the cost, do performance profiling.
The main purpose of the std::unique_ptr in your use-case is just to have shared data with a stable address when other data that reference it gets moved. If you use the list member of the long-lived object multiple times in a single procedure, you can possibly help your compiler to help you (and also get some nicer-to-read code) when you use the list through the long-lived object by making a variable in the scope of the procedure that stores a reference to the std::list pointed to by the std::unique_ptr like:
void fn(LongLiveHolder& holder) {
auto& list {holder.list.get()};
list.<some_operation_1>(...);
list.<some_operation_2>(...);
list.<some_operation_3>(...);
}
But again, you should inspect the generated machine code and do performance profiling if you really want to know what kind of difference it makes.
If Context Permits, Write your own List
You said:
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects: [...]
Considering Changes in what is the First Node
What if the first node of the list is allowed to be deleted? What if a new node is allowed to be inserted at the beginning of the list? You'd need a very specific context for those to not be requirements. What you want in your short-lived object is a view abstractions which supports the same interface as the actual list but just doesn't manage the lifetime of the list contents. If you implement the view abstraction as a pointer to the list's first node, then how will the view object know about changes to what the "real"/lifetime-managing list considers to be the first node? It can't- unless the lifetime-managing list keeps an internal list of all views of itself which are alive and also updates those (which itself is a performance and space overhead), and even then, what about the reverse? If the view abstraction was used to change what's considered the first node, how would the lifetime-managing list know about that change? The simplest, sane solution is to have an extra level of indirection: make the view point to the list instead of to what was the list's first node when the view was created.
Considering Requirements on Time Complexity of getting the list size
I'm pretty sure a std::list can't just hold pointers to front and back nodes. For one thing, since c++11 requires that std::list::size() is O(1), std::list probably has to keep track of its size at all times in a counter member- either storing it in itself, or doing some kind of size-tracking in each node struct, or some other implementation-defined behaviour. I'm pretty sure the simplest and most performant way to have multiple moveable references (non-const pointers) to something that needs to do this kind of bookkeeping is to just add another level of indirection.
You could try to "skip" the indirection layer required by the bookkeeping for specific cases that don't require that information, which is the iterators/node-pointers approach, which I'll comment on later. I can't think of a better place or way to store that bookkeeping other than with the collection itself. Ie. If the list interface has requirements that require such bookkeeping, an extra layer of indirection for each user of the list implementation has a very strong design rationale.
If Context Permits
If you don't care about having O(1) to get the size of your list, and you know that what is considered the first node will not change for the lifetime of the short-lived object, then you can write your own List class list-view class and make your own context-specific optimizations. That's one of the big selling-points of languages like C++: You get a nice standard library that does commonly useful things, and when you have a specific scenario where some features of those tools aren't required and are resulting in unnecessary overhead, you can build your own tool/abstraction (or possibly use someone else's library).
Commentary on std::unique_ptr + reference
Your first snippet works, but you can probably get some better implicit constructors and such for SortLiveObject by using std::reference_wrapper, since the default implicity-declared copy-assignment and default-construct functions get deleted when there's a reference member.
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::reference_wrapper<std::list<int>> list; };
Commentary on std::shared_ptr + std::weak_ref
Like #Adrian Maire suggested, std::shared_ptr in the longer-lived, object which might move while the shorter-lived object exists, and std::weak_ptr in the shorter-lived object is a working approach, but it probably has more overhead (at least coming from the ref-count) than using std::unique_ptr + a reference, and I can't think of any generalized pros, so I wouldn't suggest it unless you already had some other reason to use a std::shared_ptr. In the scenario you gave, I'm pretty sure you do not.
Commentary on Storing iterators/node-pointers in the short-lived object
#Daniel Langr already commented about this, but I'll try to expand.
Specifically for std::list, there is a possible standard-compliant solution (with several caveats) that doesn't have the extra indirection of the smart pointer. Caveats:
You must be okay with only having an iterator interface for the shorter-lived object (which you indicated that you are not).
The front and back iterators must be stable for the lifetime of the shorter-lived object. (the iterators should not be deleted from the list, and the shorter-lived object won't see new list entries that are pushed to the front or back by someone using the longer-lived object).
From cppreference.com's page for std::list's constructors:
After container move construction (overload (8)), references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this. The current standard makes this guarantee via the blanket statement in [container.requirements.general]/12, and a more direct guarantee is under consideration via LWG 2321.
From cppreference.com's page for std::list:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
But I am not a language lawyer. I could be missing something important.
Also, you replied to Daniel saying:
Some iterators get invalid when moving the container (e.g. insert_iterator) #DanielLangr
Yes, so if you want to be able to make std::input_iterators, use the std::unique_ptr + reference approach and construct short-lived std::input_iterators when needed instead of trying to store long-lived ones.

If the list owner will be moved, then you need some memory address to share somehow.
You already indicated the unique_ptr. It's a decent solution if the non-owners don't need to save it internally.
The std::shared_ptr is an obvious alternative.
Finally, you can have a std::shared_ptr in the owner object, and pass std::weak_ptr to non-owners.

Memory management in the composite pattern

I am encountering the same problem over and over, in the last weeks. Boiled down to its core, I build a (directed acyclic) hierarchy of objects like:
a -> c
b -> c
b -> d
An instance can have more than one parent, and more than one child. c might be a value shared among readers and writers. b might be a composite value. The hierarchy is easily created by a factory - e.g. a.setChild(new C()). Later, a client only cares about the root, on which she calls getValue() or setValue().
My question is: Who cleans up the hierarchy? Who is responsible to call delete on b or c?
Option - The factory created the nodes, the factory must delete the nodes:
I do not like this option, because I understand a factory as a replacement for new. It feels weird to keep a factory until the instances it created can be destroyed.
Option - "Smart" pointers:
Much less good, because it pollutes the interface, and introduces a lot of complexity for a simple thing such as pointers.
Option - A graph-like class that does the memory management:
The class collects all nodes a, b, c, d, ... and provides access to the root. The nodes themselves reference each other, but do not delete children, or parents. If the "graph" or composite manager is destroyed, it destroys all nodes.
I prefer the last option. It, however, complicates the construction of the hierarchy. Either you build the hierarchy outside, and tell the graph about every single node, or you build the hierarchy inside the graph, which smells like option 1. The graph can only delete, what it knows. Therefore, memory leaks are somehow in the design, if you have to pass the nodes to the graph.
Is there a pattern for this problem? Which strategy do you prefer?
Edit 1 - Sep 1st, 2014: Sorry, for being unspecific with respect to smart pointers. I tried to avoid yet another "when to use smart pointers question", and instead focus the question on alternative solutions. However, I am willing to use smart pointers, if they indeed are the best option (or if necessary).
In my opinion, the signature setChild(C* child) should be preferred over setChild(std::shared_ptr<C> child) for the same reason as for-each loops should be preferred over iterators. However, I must admit that like std:string a shared pointer is more specific about its semantic.
In terms of complexity, every operation inside a node, now, have to deal with shared pointers. std::vector<C*> becomes std::vector< std::shared_ptr<C> >, ... Furthermore, every pointer carries a reference count around, which could be avoided if there were other options.
I should add that I develop a low level part of a real-time system. It is not a firmware but close.
Edit 2 - Sep 1st, 2014:
Thank you for all the input. My specific problem is: I get a byte array with lets say sensor data. At some point, I am told which value is written where in that array. On the one hand, I want to have a mapping from a position in the array to a primitive value (int32, double, ...). On the other hand, I want to merge primitive values to complex types (structures, vectors, ...). Unfortunately, not all mappings are bidirectional. Eg. I can read a comparison between values, but I can not write values according to the comparison result. Therefore, I separated readers and writers and let them, if necessary, access the same value.

Option - "Smart" pointers: Much less good, because it pollutes the
interface, and introduces a lot of complexity for a simple thing such
as pointers.
Smart pointers are pointers with memory management, which is exactly what you need.
You are not forced to expose the smart pointers in your interface, you can get a raw pointer from the smart one as long as the object is still owned by a smart pointer somewhere (Note that this is not necessarily the best idea though, smart pointers in an interface is far from being an ugly thing).
Actually exposing raw pointers in your interface indirectly introduce much more pollution and complexity because you need to document ownership and lifetime rules which are explicit when using smart pointers, which makes them even more simpler to use than "simple pointers".
Also it is most likely the "futur" way of doing things: since c++11/14 smart pointers are part of the standard as much as std::string, would you say that std::string pollutes the interface and introduces a lot of complexity compared to a const char* ?
If you have performance issues, then the question is: will your hand crafted alternative solution be truly more performant (this need to be measured) and is it worth it in term of development time required to develop the feature and maintain the code ?

Pointers to objects in a set or in a vector - does it matter?

just came a across a situation where I needs to store heap-allocated pointers (to a class B) in an STL container. The class that owns the privately held container (class A) also creates the instances of B. Class A will be able to return a const pointers to B instances for clients of A.
Now, does it matter if these pointer are stored in a set or a vector? I thought of having a set just to verify that no duplicates are stored but since addresses are stored, two B pointers with the same data can be stored (unless I provide a comparison class for data comparison I presume).
Any thoughts on this (quite vague) subject? What are the pros/cons for the alternatives? Are smart_pointers something to look into?
Please ask me if anything imperative is unclear, thank you!

There's nothing wrong with storing pointers in a standard container - be it a vector, set, map, or whatever. You just have to be aware of who owns that memory and make sure that it's released appropriately. When choosing a container, choose the container that makes the most sense for your needs. vector is great for random access and appending but not so great for inserting elsewhere in the container. list deals with insertions extremely well, but it doesn't have random access. Sets ensure that there are no duplicates in the container and it's sorted (though the sorting isn't very useful if the set holds pointers and you don't give a comparator function) whereas a map is a set of key-value pairs, so sorting and access is done by key. Etc. Etc. Every container has its pros and cons and which is best for a particular situation depends entirely on that situation.
As for pointers, again, having pointers in containers is fine. The issue that you need to worry about is who owns the memory and therefore must worry about freeing it. If there is a clear object that owns what a particular pointer points to, then it should probably be that object which frees it. If it's essentially the container which owns the memory, then you need to make sure that you delete all of the pointers in the container before the container is destroyed.
If you are concerned with there being multiple pointers to the same data floating around or there is no clear owner for a particular chunk of memory, then smart pointers are a good solution. Boost's shared_ptr would probably be a good one to use, and shared_ptr will be part of C++0x. Many would suggest that you should always use shared pointers, but there is some overhead involved and whether it's best for your particular application will depend entirely on your application.
Ultimately, you need to be aware of the strengths and weaknesses of the various container types and determine what the best container is for whatever you're doing. The same goes for how to deal with pointer management. You need to write your program in a way that it's clear who owns a particular chunk of memory and make sure that that owner frees it when appropriate. Shared pointers are just one solution for that (albeit an excellent one). What the best solution is depends on the particulars of your program.

Why would there be any duplicates in the first place? If class A is the sole entity responsible for creating the instances, and it holds the container privately, meaning there's no way for others to mutate it, it seems to me that there should be no cause for duplicates. Well, if there is, won't that be remediable by some checking prior to adding the pointer to the vector?
I don't know why it would matter if you store a pointer in what kind of container. Containers don't really manipulate their data, they only provide access to them in different ways. So, it's up to you :)

If you need to store pointers in stl containers, use shared_ptr.
Now, the set sounds completely wrong. What are you going to do with those?
If you need to add and remove, then list.
If you need to iterate over range, or all, then vector.
If you need to access specific, knowing a key, then map.
Take a look at others as well. One size doesn't fit all.

My answer is that any decisions that you make have be be with your goal in mind. If you need a 'no duplicates allowed' rule that a set enforces, then use a set. If not then you might want to use a vector or any container might do the trick.
As for smart_pointers yes, they are really really useful. Should they be used? I don't know, once again I don't know what your end goal is or the problem that you are trying to solve with them.
Basically it comes down this. If I said "I want to use a hammer. What do you think of that?" you would probably say "Well what for, I know that hammers are pretty good for nails and wood scenarios but they could also be used as a tool to hurt people or maybe as a book stand. Look, just wait a second, what is this for again?" The problem is that I have not, really said why I want to use a hammer. I have not said what goal I am trying to achieve.
So if you have sort of an overall goal then why not let us know, then it will be obvious if you are using the right tools for the job and we can help you more.

In my opinion, stick with vector unless you have a real reason not to. Sets come with some runtime overhead as well as quite large semantic overhead compared to a vector.

Handling one object in some containers

I want to store pointers to one instance of an object in some (two or more) containers. I've met one problem in this idea: how I can handle removing of this object. Objects have rather stormy life (I am talking about game, but I think this situation is not so specific) and can be removed rather often. To my mind this problem is divided into two problems
1.
How should I signal to containers about deletion? In C# I used to create boolean property IsDead in stored objects, so each iteration of the main loop at first finds 'dead' objects and removes them. No circular reference and everything is rather clear :-) Is this technique correct?
2.
Even if I implement this technique in C++ I meet difficulty with calling destructors if this object is in some containers. Even if I create some kind of a field 'IsDead' and remove dead object from all lists, I had to free memory.
After reading some articles I have an idea that I should have one 'main' container with shared_ptr to all my objects, and other containers should store weak_ptr to them, so only main container checks object's status and others look only at shared_ptr. Are my intentions correct or is there another solution?

It sounds like you're looking for shared_ptr<T>.
http://msdn.microsoft.com/en-us/library/bb982026.aspx
This is a reference counted ptr in C++ that enables easy sharing of objects. The shared_ptr<T> can be freely handed out to several objects. As the shared_ptr instances are copied around and destucted the internal reference counter will be updated appropriately. When all references are removed the underlying data will be deleted.

Boost shared_ptr use_count function

My application problem is the following -
I have a large structure foo. Because these are large and for memory management reasons, we do not wish to delete them when processing on the data is complete.
We are storing them in std::vector<boost::shared_ptr<foo>>.
My question is related to knowing when all processing is complete. First decision is that we do not want any of the other application code to mark a complete flag in the structure because there are multiple execution paths in the program and we cannot predict which one is the last.
So in our implementation, once processing is complete, we delete all copies of boost::shared_ptr<foo>> except for the one in the vector. This will drop the reference counter in the shared_ptr to 1. Is it practical to use shared_ptr.use_count() to see if it is equal to 1 to know when all other parts of my app are done with the data.
One additional reason I'm asking the question is that the boost documentation on the shared pointer shared_ptr recommends not using "use_count" for production code.
Edit -
What I did not say is that when we need a new foo, we will scan the vector of foo pointers looking for a foo that is not currently in use and use that foo for the next round of processing. This is why I was thinking that having the reference counter of 1 would be a safe way to ensure that this particular foo object is no longer in use.

My immediate reaction (and I'll admit, it's no more than that) is that it sounds like you're trying to get the effect of a pool allocator of some sort. You might be better off overloading operator new and operator delete to get the effect you want a bit more directly. With something like that, you can probably just use a shared_ptr like normal, and the other work you want delayed, will be handled in operator delete for that class.
That leaves a more basic question: what are you really trying to accomplish with this? From a memory management viewpoint, one common wish is to allocate memory for a large number of objects at once, and after the entire block is empty, release the whole block at once. If you're trying to do something on that order, it's almost certainly easier to accomplish by overloading new and delete than by playing games with shared_ptr's use_count.
Edit: based on your comment, overloading new and delete for class sounds like the right thing to do. If anything, integration into your existing code will probably be easier; in fact, you can often do it completely transparently.
The general idea for the allocator is pretty much the same as you've outlined in your edited question: have a structure (bitmaps and linked lists are both common) to keep track of your free objects. When new needs to allocate an object, it can scan the bit vector or look at the head of the linked list of free objects, and return its address.
This is one case that linked lists can work out quite well -- you (usually) don't have to worry about memory usage, because you store your links right in the free object, and you (virtually) never have to walk the list, because when you need to allocate an object, you just grab the first item on the list.
This sort of thing is particularly common with small objects, so you might want to look at the Modern C++ Design chapter on its small object allocator (and an article or two since then by Andrei Alexandrescu about his newer ideas of how to do that sort of thing). There's also the Boost::pool allocator, which is generally at least somewhat similar.

If you want to know whether or not the use count is 1, use the unique() member function.

I would say your application should have some method that eliminates all references to the Foo from other parts of the app, and that method should be used instead of checking use_count(). Besides, if use_count() is greater than 1, what would your program do? You shouldn't be relying on shared_ptr's features to eliminate all references, your application architecture should be able to eliminate references. As a final check before removing it from the vector, you could assert(unique()) to verify it really is being released.

I think you can use shared_ptr's custom deleter functionality to call a particular function when the last copy has been released. That way, you're not using use_count at all.
You would need to hold something other than a copy of the shared_ptr in your vector so that the shared_ptr is only tracking the outstanding processing.
Boost has several examples of custom deleters in the shared_ptr docs.

I would suggest that instead of trying to use the shared_ptr's use_count to keep track, it might be better to implement your own usage counter. this way you will have full control over this rather than using the shared_ptr's one which, as you rightly suggest, is not recommended. You can also pre-set your own counter to allow for the number of threads you know will need to act on the data, rather than relying on them all being initialised at the beginning to get their copies of the structure.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js