Design of (shared_ptr + weak_ptr) compatible with raw pointers

Design of (shared_ptr + weak_ptr) compatible with raw pointers - c++

Preamble
In C++11 there is std::shared_ptr + std::weak_ptr combo. Despite being very useful, it has a nasty issue: you cannot easily construct shared_ptr from a raw pointer. As a result of this flaw, such smart pointers usually become "viral": people start to completely avoid raw pointers and references, and use exclusively shared_ptr and weak_ptr smart pointers all over the code. Because there is no way to pass a raw reference into a function expecting a smart pointer.
On the other hand, there is boost::intrusive_ptr. It is equivalent to std::shared_ptr and can easily be constructed from raw pointer, because reference counter is contained within the object. Unfortunately, there is no weak_ptr companion to it, so there is no way to have non-owning references which you could check for being invalid. In fact, some believe that weak companion for intrusive_ptr is impossible.
Now, there is std::enable_shared_from_this, which embeds a weak_ptr directly into your class, so that you could construct shared_ptr from pointer to object. But there is small limitation (at least one shared_ptr must exist), and it still does not allow the obvious syntax: std::shared_ptr(pObject).
Also, there is a std::make_shared, which allocates reference counters and the user's object in a single memory chunk. This is very close to the concept of intrusive_ptr, but the user's object can be destroyed independently of the reference counting block. Also, this concept has an inevitable drawback: the whole memory block (which can be large) is deallocated only when all weak_ptr-s are gone.
Question
The main question is: how to create a pair of shared_ptr/weak_ptr, which would have the benefits of both std::shared_ptr/std::weak_ptr and boost::intrusive_ptr?
In particular:
shared_ptr models shared ownership over the object, i.e. the object is destroyed exactly when the last shared_ptr pointing to it is destroyed.
weak_ptr does not model ownership over the object, and it can be used to solve the circular dependency problem.
weak_ptr can be checked for being valid: it is valid when there exists a shared_ptr pointing to the object.
shared_ptr can be constructed from a valid weak_ptr.
weak_ptr can be constructed from a valid raw pointer to the object. Raw pointer is valid if there exists at least one weak_ptr still pointing to that object. Constructing weak_ptr from invalid pointer results in undefined behavior.
The whole smart pointer system should be cast-friendly, like the abovementioned existing systems.
It is OK for being intrusive, i.e. asking the user to inherit once from given base class. Holding the object's memory when the object is already destroyed is also OK. Thread safety is very good to have (unless being too inefficient), but solutions without it are also interesting. It is OK to allocate several chunks of memory per object, though having one memory chunk per object is preferred.

Points 1-4 and 6 are already modelled by shared_ptr/weak_ptr.
Point 5 makes no sense. If lifetime is shared, then there is no valid object if a weak_ptr exists but a shared_ptr does not. Any raw pointer would be an invalid pointer. The lifetime of the object has ended. The object is no more.
A weak_ptr does not keep the object alive, it keeps the control block alive. A shared_ptr keeps both the control block and the controlled object alive.
If you don't want to "waste" memory by combining the control block with the controlled object, don't call make_shared.
If you don't want shared_ptr<X> to be passed virally into functions, don't pass it. Pass a reference or const reference to the X. You only need to mention shared_ptr in the argument list if you intend on managing the lifetime in the function. If you simply want to perform operations on what the shared_ptr is pointing at, pass *p or *p.get() and accept a [const] reference.

Override new on the object to allocate a control block before the instance of the object.
This is pseudo-intrusive. Conversion to from raw pointer is possible, because of the known offset. The object can be destroyed without a problem.
The reference counting block holds a strong and weak count, and a function object to destroy the object.
Downside: it doesn't work polymorphically very well.
Imagine we have:
struct A {int x;};
struct B {int y;};
struct C:B,A {int z;};
then we allocate a C this way.
C* c = new C{};
and store it in an A*:
A* a = c;
We then pass this to a smart-pointer-to-A. It expects the control block to be immediately before the address a points to, but because B exists before A in the inheritance graph of C, there is an instance of B there instead.
That seems less than ideal.
So we cheat. We again replace new. But it instead registers the pointer value and size with a registry somewhere. There we store the weak/strong pointer counts (etc).
We rely on a linear address space and class layout. When we have a pointer p, we simply look for whose range of address it is in. Then we know the strong/weak counts.
This one has horrible performance in general, especially multi-threaded, and relies upon undefined behavior (pointer comparisons for pointers not pointing to the same object, or less order in such cases).

In theory, it is possible to implement intrusive version of shared_ptr and weak_ptr, but it might be unsafe due to C++ language limitations.
Two reference counters (strong and weak) are stored in the base class RefCounters of the managed object. Any smart pointer (either shared or weak) contains a single pointer to the managed object. Shared pointers own the object itself, and shared + weak pointers together own the memory block of the object. So when the last shared pointer is gone, object is destroyed, but its memory block remains alive as long as there is at least one weak pointer to it. Casting pointers works as expected, given that all the involved types are still inherited from the RefCounted class.
Unfortunately, in C++ it is usually forbidden to work with members of object after the object is destroyed, although most implementations should allow doing that without problems. More details about legibility of the approach can be found in this question.
Here is the base class required for the smart pointers to work:
struct RefCounters {
size_t strong_cnt;
size_t weak_cnt;
};
struct RefCounted : public RefCounters {
virtual ~RefCounted() {}
};
Here is a part of shared pointer definition (shows how object is destroyed and memory chunk is deallocated):
template<class T> class SharedPtr {
static_assert(std::is_base_of<RefCounted, T>::value);
T *ptr;
RefCounters *Counter() const {
RefCounters *base = ptr;
return base;
}
void DestroyObject() {
ptr->~T();
}
void DeallocateMemory() {
RefCounted *base = ptr;
operator delete(base);
}
public:
~SharedPtr() {
if (ptr) {
if (--Counter()->strong_cnt == 0) {
DestroyObject();
if (Counter()->weak_cnt == 0)
DeallocateMemory();
}
}
}
...
};
Full code with sample is available here.

Related

A shared pointer which is conceptually owned by one, unique, object

What is the canonical way to deal with shared pointers in C++ when there is a clear case to argue that "one, unique object owns the pointer"?
For example, if a shared_ptr is a member of a particular class, which is responsible for initializing the pointer, then it could be argued that this class should also have the final say on when the pointer is deleted.
In other words, it may be the case that when the owning class goes out of scope (or is itself delete'd that any remaining references to the pointer no longer make sense. This may be due to related variables which were members of the destroyed class.
Here is a sketch of an example
class Owner
{
Owner()
{
p.reset(malloc_object(arguments), free_object);
}
std::shared_ptr<type> get() { return p; }
// seems strange because now something somewhere
// else in the code can hold up the deletion of p
// unless a manual destructor is written
~Owner()
{
p.reset(nullptr); // arduous
}
std::shared_ptr<type> p;
int a, b, c; // some member variables which are logically attached to p
// such that neither a, b, c or p make sense without each other
}
One cannot use a unique_ptr as this would not permit the pointer to be returned by the get function, unless a raw pointer is returned. (Is this is an acceptable solution?)
A unique_ptr in combination with returning weak_ptr from the get function might make sense. But this is not valid C++. weak_ptr is used in conjunction with shared_ptr.
A shared_ptr with the get function returning weak_ptr is better than a raw pointer becuase in order to use the weak pointer, it has to be converted to a shared pointer. This will fail if the reference count is already zero and the object has been deleted.
However using a shared_ptr defeats the point, since ideally a unique_ptr would be chosen because there can then only be one thing which "owns" the pointed to data.
I hope the question is clear enough, it was quite difficult to explain since I can't copy the code I am working with.

It is ok to return the shared_ptr there, what will happen is that the pointer will still be held somewhere outside the Owner class. Since your doing p.reset(nullptr); at the destructor, whoever was holding that shared_ptr will now be holding a pointer to null.
Using weak_ptr with shared_ptr is also a good solution, the problem is the same which is the fact that the best class to represent p is unique_ptr as you described.
The path I would choose is to hold a unique_ptr which seems more adequate and to implement the get() function like this:
type* get() { return p.get(); }
The behaviour is the same and the code is clearer since having p as unique_ptr will give clarity on how it should be used.

Will instance of shared_ptr<Base> and shared_ptr<Derived> with same raw pointer share reference count?

Let's say I have two classes, Base and Derived, where Derived inherits from Base. Now, let's say I execute the following code:
shared_ptr<Derived> derivedPtr = make_shared<Derived>();
shared_ptr<Base> basePtr = derivedPtr;
Will the copying of derivedPtr to basePtr result in derivedPtr's reference count being updated (so that derivedPtr.use_count() and basePtr.use_count() equal 2)? Or, since the two instances of shared_ptr are different types, will the two have a separate reference count that isn't shared (so that derivedPtr.use_count() and basePtr.use_count() equal 1)?

So shared_ptr is more than just a pointer and a reference count.
It is a pointer and a pointer to a control block. That control block contains a strong count, a weak count, and a destruction function.
There are 3 ways to construct a shared_ptr.
First, you can construct it from a raw pointer. When that happens, it allocates a control block and sticks a "destroyer" function into it to destroy the raw pointer memory (delete t;).
Second, you can use make_shared. This allocates one block with space for both the control block and the object in it. It then sets the destroyer up to just destroy the object, and not recycle the memory. The destructor of the control block cleans up both memory allocations.
Third, there is the aliasing constructors. These share control blocks (and hence destruction code), but have a different object pointer.
The most common aliasing constructor is the one that creates a pointer-to-base, which you are doing above. The pointer-to-base differs from the shared ptr you created it from, but the control block remains the same. So whenever the control block hits 0 strong reference counts, it destroys the object as its original derived object.
The rarer one can be used to return shared pointers to member variables, like this:
struct Bob {
int x;
};
auto pBob = std::make_shared<Bob>();
pBob->x = 7;
auto pInt = std::shared_ptr<int>( pBob, &(pBob->x) );
now pInt is a pointer to pBob->x that shares the reference counting of the Bob created 2 lines above (where we made pBob).
pBob = {};
now the last pointer to the Bob is gone, but the object survives, kept alive by the pInt's control block (and strong count) ownership.
Then when we:
pInt = {};
finally the Bob is deallocated.
The cast-to-base implicit conversion you did in your question is just a variation of this.
This second aliasing constructor can also be used to do extremely strange things, but that is another topic.
shared/weak ptr is one of those cases where it seems you can just "monkey code" it without understanding it, but in my experience using shared ownership is sufficiently hard that fully understanding shared ptr is (a) easier than getting shared ownership right, and (b) makes getting shared ownership right easier.

C++11 Reference count smart pointer design

I am reading this,
http://www.informit.com/articles/article.aspx?p=31529&seqNum=5
and author explain three types of smart pointer design (see pictures at the end of the post).
I believe current GCC, CLang and probably Visual C++ uses smart pointers with control block.
I can imagine why intrusive reference counting is not used, but what is the problem with second implementation - smart pointer with pointer to pointer block? There should be two pointer de-references, but smart pointer object size will be just half.
smart pointer with control block
smart pointer with pointer to pointer block
smart pointer with intrusive reference counting

One important reason is performance, shared_ptr::get() doesn't have to dereference a pointer to find the object address if it's stored directly inside the shared_ptr object.
But apart from performance, the smart pointer with pointer to pointer block implementation wouldn't support all the things you can do with shared_ptr e.g.
std::shared_ptr<int> pi(new int(0));
std::shared_ptr<void> pv = pi;
std::shared_ptr<int> pi2 = static_pointer_cast<int>(pv);
struct A {
int i;
};
std::shared_ptr<A> pa(new A);
std::shared_ptr<int> pai(pa, pa->i);
struct B { virtual ~B() = default; };
struct C : B { };
std::shared_ptr<B> pb(new C);
std::shared_ptr<C> pc = std::dynamic_pointer_cast<C>(pb);
In these examples pv, pai and pb store a pointed that is not the same type as the pointer owned by the control block, so there must be a second pointer (which might be a different type) stored in the shared_ptr itself.
For pv and pb it would be possible to make it work, by converting the pointer stored in the control block to the type that needs to be returned. That would work in some cases, although there are examples using multiple inheritance that would not work correctly.
But for the pai example (which uses the aliasing constructor) there is no way to make that work without storing a pointer separate to the one in the control block, because the two pointers are completely unrelated types and you can't convert between them.
You said in a comment:
I see and in case of make_shared, second pointer points to the address internal to the allocated block. (I actually tried this already and it seems that way)
Yes, that's correct. There is still a second pointer, but both poitners refer into the same block of memory. This has the advantage that only one memory allocation is needed instead of two separate ones for the object and the control block. Additionally, the object and control block are adjacent in memory so are more likely to share a cache line. If the CPU has got the ref-count in its cache already then it probably also has the object in its cache, so accessing them both is faster and means there is another cache line available to be used for other data.

Should a pointer be the same before and after adding to a unique_ptr?

I have a std::vector of unique_ptrs and I'm happy to have them manage the life cycle of those objects.
However I require storing other pointers to those objects for convenience. I know that once unique_ptr removes something, those other pointers will dangle. But I'm more concerned about the validity of those pointers before and after unique_ptr gets them.
I do not always create via new within the unique_ptr itself, for example I might pass new Something as a function parameter in which case the unique_ptr is using move on that pointer into itself inside the function.
But I might also new Something before I pass it into a function that then assigned it a unique_ptr.
Once an object is assigned to a unique_ptr I can get a pointer to it via get(). But can I always assume that this get() pointer points to the same place as the pointer initially obtained via new if the original pointer was created before the assignment to a unique_ptr ?
My assumption is Yes, and that even if the vector resizes and reallocates, the unique_ptr as well as any other pointers to the objects in memory remain the same.

Yes, a std::unique_ptr<T> holds a pointer to T, and it will not alter the value between initialization and later retrieval with get()
A common use of a unique_ptr is to assign one "parent" object ownership of a dynamically-allocated "subobject", in a similiar same way as:
struct A
{
B b;
}
int main()
{
A a = ...;
B* p = &a.b;
}
In the above b is a true subobject of A.
Compare this to:
struct A
{
unique_ptr<B> b = new B(...);
}
int main()
{
A a = ...;
B* p = a.b.get();
}
In the above A and (*b) have a similar relationship to the first example, except here the B object is allocated on the heap. In both cases the destructor of A will destroy the "subobject". This "on heap" subobject structure may be preferable in some cases, for example because B is a polymorphic base type, or to make B an optional/nullable subobject of A.
The advantage of using unique_ptr over a raw pointer to manage this ownership relationship is that it will automatically destroy it in As destructor, and it will automatically move construct and move assign it as part of A.
As usual, in both cases, you must be careful that the lifetime of any raw pointers to the subobject are enclosed by the lifetime of the owning object.

Yes, you are correct, because unique_ptr does not copy the object; therefore, it has to point to the same address. However, once you give a pointer to a unique_ptr to own, you should not use that raw pointer any more, because the unique_ptr could be destroyed and deallocate the memory, and turn your raw pointer into a dangling pointer. Perhaps shared_ptr would be better for your situation.

shared_ptr deletes the object

void ClassName::LocalMethod( )
{
boost::shared_ptr<ClassName> classNamePtr( this );
//some operation with classNamePtr
return;
}
Here the object is getting released when it returns from LocalMethod() since classNamePtr is out of scope. Isn't the shared_ptr smart enough to know that the ClassName object is still in scope and not to delete it?

What does it mean to create a shared_ptr to an object? It means that the holder of the shared_ptr now assumes ownership over the object. Ownership meaning that the object will be deleted when he so desires. When the holder of the shared_ptr destroys its shared_ptr, that will cause the object to potentially be destroyed, assuming that there are no other shared_ptrs to that object.
When a shared_ptr is a member of a class, that means that the lifetime of the object pointed to by the shared_ptr is at least as long as the object that the shared_ptr is a member of. When a shared_ptr is on the stack, this means that the lifetime of the object that the shared_ptr is pointing to will be at least as long as the scope it was created in. Once the object falls off the stack, it may be deleted.
The only time you should ever take a pointer and wrap it into a shared_ptr is when you are allocating the object initially. Why? Because an object does not know whether it is in a shared_ptr or not. It can't know. This means that the person who creates the original shared_ptr now has the responsibility to pass it around to other people who need to share ownership of that memory. The only way shared ownership works is through the copy constructor of shared_ptr. For example:
shared_ptr<int> p1 = new int(12);
shared_ptr<int> p2 = p1.get();
shared_ptr<int> p3 = p1;
The copy constructor of shared_ptr creates shared ownership between p1 and p3. Note that p2 does not share ownership with p1. They both think they have ownership over the same memory, but that's not the same as sharing it. Because they both think that they have unique ownership of it.
Therefore, when the three pointers are destroyed, the following will happen. First, p3 will be destroyed. But since p3 and p1 share ownership of the integer, the integer will not be destroyed yet. Next, p2 will be destroyed. Since it thinks that it is the only holder of the integer, it will then destroy it.
At this point, p1 is pointing to deleted memory. When p1 is destroyed, it thinks that it is the only holder of the integer, so it will then destroy it. This is bad, since it was already destroyed.
Your problem is this. You are inside an instance of a class. And you need to call some functions of yours that take a shared_ptr. But all you have is this, which is a regular pointer. What do you do?
You're going to get some examples that suggest enable_shared_from_this. But consider a more relevant question: "why do those functions take a shared_ptr as an argument?"
The type of pointer a function takes is indicative of what that function does with its argument. If a function takes a shared_ptr, that means that it needs to own the pointer. It needs to take shared ownership of the memory. So, look at your code and ask whether those functions truly need to take ownership of the memory. Are they storing the shared_ptr somewhere long-term (ie: in an object), or are they just using them for the duration of the function call?
If it's the latter, then the functions should take a naked pointer, not a shared_ptr. That way, they cannot claim ownership. Your interface is then self-documenting: the pointer type explains ownership.
However, it is possible that you could be calling functions that truly do need to take shared ownership. Then you need to use enable_shared_from_this. First, your class needs to be derived from enable_shared_from_this. Then, in the function:
void ClassName::LocalMethod()
{
boost::shared_ptr<ClassName> classNamePtr(shared_from_this());
//some operation with classNamePtr
return;
}
Note that there is a cost here. enable_shared_from_this puts a boost::weak_ptr in the class. But there is no virtual overhead or somesuch; it doesn't make the class virtual. enable_shared_from_this is a template, so you have to declare it like this:
class ClassName : public boost::enable_shared_from_this<ClassName>

Isn't the shared_ptr smart enough to know that the ClassName object is
still in scope and not to delete it?
That's not how shared_ptr works. When you pass a pointer while constructing a shared_ptr, the shared_ptr will assume ownership of the pointee (in this case, *this). In other words, the shared_ptr assumes total control over the lifetime of the pointee by virtue of the fact that the shared_ptr now owns it. Because of this, the last shared_ptr owning the pointee will delete it.
If there will be no copies of classNamePtr outside of ClassName::LocalMethod(), you can pass a deleter that does nothing while constructing classNamePtr. Here's an example of a custom deleter being used to prevent a shared_ptr from deleting its pointee. Adapting the example to your situation:
struct null_deleter // Does nothing
{
void operator()(void const*) const {}
};
void ClassName::LocalMethod()
{
// Construct a shared_ptr to this, but make it so that it doesn't
// delete the pointee.
boost::shared_ptr<ClassName> classNamePtr(this, null_deleter());
// Some operation with classNamePtr
// The only shared_ptr here will go away as the stack unwinds,
// but because of the null deleter it won't delete this.
return;
}
You can also use enable_shared_from_this to obtain a shared_ptr from this. Note that the member function shared_from_this() only works if you have an existing shared_ptr already pointing to this.
class ClassName : public enable_shared_from_this<ClassName>
{
public:
void LocalMethod()
{
boost::shared_ptr<ClassName> classNamePtr = shared_from_this();
}
}
// ...
// This must have been declared somewhere...
shared_ptr<ClassName> p(new ClassName);
// before you call this:
p->LocalMethod();
This is the more appropriate, "official" method and it's much less hackish than the null deleter method.
It could also be that you don't actually need to create a shared_ptr in the first place. What goes into the section commented //some operation with classNamePtr? There might be an even better way than the first two ways.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js