Any equivalent of Rust's Arc::try_unwrap in C++?

This is a C++ ecosystem question, though it is easiest to ask by referring to Rust.
Are there stable implementations of thread-safe, reference-counted smart pointers that support "unwrapping" the value in a thread-safe manner, under the condition that the reference count is exactly 1, as in https://doc.rust-lang.org/std/sync/struct.Arc.html#method.try_unwrap?
Coarsely speaking, std::shared_ptr is similar to Arc, but this use case does not seem to be supported, nor does it appear straightforward to implement (e.g. see https://en.cppreference.com/w/cpp/memory/shared_ptr/use_count#Notes).

The exhaustive API of std::shared_ptr is available online (see cppreference) and as you can see there is no built-in support.
Furthermore, due to race-conditions with the promotion of std::weak_ptr, it is not possible to safely use use_count or unique to implement such functionality -- and unique was deprecated in C++17 and removed in C++20.
As a result, the functionality is simply not available with std::shared_ptr.
There may be other implementations of std::shared_ptr which offer this functionality -- though Boost's doesn't appear to.
As noted in the notes of use_count, the primary difficulty in implementing this function is the potential race-condition with weak_ptr promotion. That is, a naive:
// Susceptible to race-conditions, do not use!
if (pointer.use_count() == 1) {
    return std::move(*pointer);
}
return std::nullopt;
Would not work because between the check and the actual move, a new shared owner may have appeared in another thread allowing concurrent access to the value.
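The promotion problem can be seen even without threads. The following is a minimal single-threaded sketch (the function name owners_after_promotion is made up for illustration): a weak_ptr does not show up in use_count(), yet it can create a second owner right after the check has passed. In a multi-threaded program the same promotion could happen on another thread between the check and the move.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: returns the number of owners seen after a weak_ptr
// has been promoted, even though use_count() reported 1 just before.
long owners_after_promotion() {
    std::shared_ptr<int> owner = std::make_shared<int>(42);
    std::weak_ptr<int> observer = owner;  // weak references are not counted by use_count()

    assert(owner.use_count() == 1);       // the "unique owner" check passes here...

    std::shared_ptr<int> second_owner = observer.lock();  // ...and a new owner appears anyway
    return owner.use_count();
}
```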
The only ways to have this functionality safely are:
The shared_ptr implementation does not support weak pointers in the first place.
The shared_ptr implementation provides it, and ensures the absence of race condition with weak_ptr promotion.
I note that the latter typically requires locking the same lock used for weak_ptr promotion; hence why it cannot be provided externally.
A weaker variant could be implemented if unique also guaranteed the absence of any weak_ptr. It would not be strictly equivalent, since the presence of any weak_ptr would cause it to fail, but it could still be useful in the many scenarios where no weak_ptr is ever created.
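To illustrate the first option, here is a minimal sketch of a reference-counted handle that has no weak pointers at all. The class name Arc and its members make and try_unwrap are hypothetical and only mirror the Rust naming; this is not an existing library. Because a new owner can only be created by copying an existing handle, a count of 1 seen by the caller that holds the last handle cannot grow behind its back.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical minimal handle: strong references only, no weak pointers.
template <typename T>
class Arc {
    struct Block {
        std::atomic<long> strong{1};
        T value;
        template <typename... Args>
        explicit Block(Args&&... args) : value(std::forward<Args>(args)...) {}
    };

    Block* block_;
    explicit Arc(Block* block) noexcept : block_(block) {}

public:
    template <typename... Args>
    static Arc make(Args&&... args) {
        return Arc(new Block(std::forward<Args>(args)...));
    }

    Arc(const Arc& other) noexcept : block_(other.block_) {
        if (block_) block_->strong.fetch_add(1, std::memory_order_relaxed);
    }
    Arc(Arc&& other) noexcept : block_(std::exchange(other.block_, nullptr)) {}
    Arc& operator=(Arc other) noexcept {
        std::swap(block_, other.block_);
        return *this;
    }
    ~Arc() {
        // The last owner to drop its reference frees the block.
        if (block_ && block_->strong.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete block_;
        }
    }

    const T& operator*() const {
        assert(block_ != nullptr);
        return block_->value;
    }

    // Moves the value out if `handle` is the only owner; otherwise returns
    // std::nullopt and leaves `handle` valid and unchanged.
    static std::optional<T> try_unwrap(Arc&& handle) {
        Block* block = handle.block_;
        if (block == nullptr) return std::nullopt;
        long expected = 1;
        if (!block->strong.compare_exchange_strong(expected, 0,
                                                   std::memory_order_acquire,
                                                   std::memory_order_relaxed)) {
            return std::nullopt;  // someone else still owns the value
        }
        handle.block_ = nullptr;  // the handle no longer owns anything
        std::optional<T> value(std::move(block->value));
        delete block;
        return value;
    }
};
```

A call such as Arc<int>::try_unwrap(std::move(handle)) yields the value only when no copy of the handle is alive. The design relies entirely on the absence of weak pointers; adding them would bring back the promotion race described above.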

Related

Partial template specialization of std::atomic for smart pointers

Background
Since C++11, atomic operations on std::shared_ptr can be done via the std::atomic_... free functions found here, because instantiating std::atomic as shown below is not possible:
std::atomic<std::shared_ptr<T>>
This is due to the fact that std::atomic only accepts TriviallyCopyable types, and std::shared_ptr (or std::weak_ptr) is not trivially copyable.
However, as of C++20, these functions have been deprecated and replaced by the partial template specialization of std::atomic for std::shared_ptr, as described here.
Question
I am not sure of
Why std::atomic_... got replaced.
Techniques used to enable the partial template specialization of std::atomic for smart pointers.
Several proposals for atomic<shared_ptr> or something of that nature explain a variety of reasons. Of particular note is P0718, which tells us:
The C++ standard provides an API to access and manipulate specific shared_ptr objects atomically, i.e., without introducing data races when the same object is manipulated from multiple threads without further synchronization. This API is fragile and error-prone, as shared_ptr objects manipulated through this API are indistinguishable from other shared_ptr objects, yet subject to the restriction that they may be manipulated/accessed only through this API. In particular, you cannot dereference such a shared_ptr without first loading it into another shared_ptr object, and then dereferencing through the second object.
N4058 explains a performance issue with regard to how you have to go about implementing such a thing. Since shared_ptr is typically bigger than a single pointer in size, atomic access typically has to be implemented with a spinlock. So either every shared_ptr instance has a spinlock even if it never gets used atomically, or the implementation of those atomic functions has to have a lookaside table of spinlocks for individual objects. Or use a global spinlock.
None of these are problems if you have a type dedicated to being atomic.
atomic<shared_ptr> implementations can use the usual techniques for atomic<T> when T is too large to fit into a CPU atomic operation. They get around the TriviallyCopyable restriction by fiat: the standard requires that they exist and be atomic, so the implementation makes it so. C++ implementations don't have to play by the same rules as regular C++ programs.
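As a rough illustration of the difference between the two APIs, the sketch below wraps a pointer in a hypothetical SharedSlot class (the name and its interface are made up). When the standard library advertises the C++20 specialization through the __cpp_lib_atomic_shared_ptr feature-test macro, the slot is a dedicated std::atomic<std::shared_ptr<...>>; otherwise it falls back to the older free functions, which work on a plain shared_ptr that nothing stops other code from touching non-atomically.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical wrapper: a slot that several threads may read and replace.
class SharedSlot {
public:
    explicit SharedSlot(std::shared_ptr<const int> initial) : ptr_(std::move(initial)) {
        assert(load() != nullptr);  // this sketch assumes the slot is never empty
    }

    // Returns a private copy; only that copy may be dereferenced.
    std::shared_ptr<const int> load() const {
#if defined(__cpp_lib_atomic_shared_ptr)
        return ptr_.load();
#else
        return std::atomic_load(&ptr_);
#endif
    }

    void store(std::shared_ptr<const int> next) {
#if defined(__cpp_lib_atomic_shared_ptr)
        ptr_.store(std::move(next));
#else
        std::atomic_store(&ptr_, std::move(next));
#endif
    }

private:
#if defined(__cpp_lib_atomic_shared_ptr)
    // C++20: the type itself says "atomic access only".
    std::atomic<std::shared_ptr<const int>> ptr_;
#else
    // Pre-C++20: indistinguishable from any other shared_ptr.
    std::shared_ptr<const int> ptr_;
#endif
};
```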

Use constructor in place of atomic.store() when atomicity is not currently needed

I use std::atomic for atomicity. Still, in some places in the code, atomicity is not needed by the program logic. In those cases, I'm wondering whether it is OK, both pedantically and practically, to use a constructor in place of store() as an optimization. For example,
// p.store(nullptr, std::memory_order_relaxed);
new (&p) std::atomic<node*>(nullptr);
In accord with the standard, whether this works depends entirely on the implementation of std::atomic<T>. If it is lock-free for that T, then the implementation probably just stores a T. If it isn't lock-free, things get more complex, since it may store a mutex or some other thing.
The thing is, you don't know what std::atomic<T> stores. This matters because if it stores a const-qualified object or a reference type, then reusing the storage here will cause problems. The pointer returned by placement-new can certainly be used, but if a const or reference type is used, the original object name p cannot.
Why would std::atomic<T> store a const or reference type? Who knows; my point is that, because its implementation is not under your control, then pedantically you cannot know how any particular implementation behaves.
As for "practically", it's unlikely that this will cause a problem. Especially if the atomic<T> is always lock-free.
That being said, "practically" should also include some notion of how other users will interpret this code. While people experienced with doing things like reusing storage will be able to understand what the code is doing, they will likely be puzzled by why you're doing it. That means you'll need to either stick a comment on that line or make a (template) function non_atomic_reset.
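A helper along those lines might look like the sketch below. The name non_atomic_reset comes from the suggestion above; the signature and the checks are assumptions, and the caveats about what std::atomic<T> stores internally still apply. It is only meaningful when the caller can guarantee that no other thread touches the object at that moment, and a plain store with std::memory_order_relaxed remains the simpler alternative.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical helper: re-creates the atomic in place instead of storing to it.
// The caller must guarantee that no other thread accesses `target` concurrently.
template <typename T>
void non_atomic_reset(std::atomic<T>& target,
                      typename std::atomic<T>::value_type value) noexcept {
    // Skipping the destructor call is only reasonable if it is trivial.
    static_assert(std::is_trivially_destructible<std::atomic<T>>::value,
                  "sketch assumes a trivially destructible std::atomic<T>");
    // A non-lock-free atomic may hold extra state (e.g. a lock); do not reuse it.
    assert(target.is_lock_free());
    ::new (static_cast<void*>(std::addressof(target))) std::atomic<T>(value);
}
```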
Also, it should be noted that std::shared_ptr uses atomic increments/decrements for its reference counter. I bring that up because there is no std::single_threaded_shared_ptr that doesn't use atomics, or a special constructor that doesn't use atomics. So even in cases where you're using shared_ptr in pure single-threaded code, those atomics are still firing. This was considered a reasonable tradeoff by the C++ standards committee.
Atomics aren't cheap, but (most of the time) they're not so expensive that using unusual mechanisms like this to bypass an atomic store is a good idea. As always, profile to see if the code obfuscation is worth it.

Why atomic overloads for shared_ptr exist

Why are there atomic overloads for shared_ptr as described here, rather than a specialization of std::atomic that deals with shared_ptrs? This seems inconsistent with the object-oriented patterns employed by the rest of the C++ standard library.
And just to make sure I am getting this right, when using shared_ptrs to implement the read copy update idiom we need to do all accesses (reads and writes) to shared pointers through these functions right?
Because:
std::atomic may be instantiated with any TriviallyCopyable type T.
Source: http://en.cppreference.com/w/cpp/atomic/atomic
And
std::is_trivially_copyable<std::shared_ptr<int>>::value == false;
Thus, before C++20 added a dedicated partial specialization, you could not instantiate std::atomic<> with std::shared_ptr<>. However, automatic memory management is useful in multi-threading, so those overloads were provided. They are most likely not lock-free, however (lock-freedom being one of the big draws of using std::atomic<> in the first place); they probably use a lock to provide synchronization.
As for your second question: yes.
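A read-copy-update sketch using only those overloads could look as follows. The names Config, g_config, read_config and append_value are made up for illustration. Note that every access to the shared slot goes through std::atomic_load or std::atomic_compare_exchange_weak, and that these free functions are deprecated from C++20 on.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using Config = std::vector<int>;  // hypothetical shared data

// The shared slot. It must only ever be accessed through the atomic overloads.
std::shared_ptr<const Config> g_config = std::make_shared<const Config>();

// Reader: atomically grab a snapshot, then read it without further synchronization.
std::shared_ptr<const Config> read_config() {
    return std::atomic_load(&g_config);
}

// Writer: copy the current snapshot, modify the copy, then try to publish it.
void append_value(int value) {
    std::shared_ptr<const Config> current = std::atomic_load(&g_config);
    std::shared_ptr<const Config> updated;
    do {
        assert(current != nullptr);
        auto copy = std::make_shared<Config>(*current);
        copy->push_back(value);
        updated = std::move(copy);
        // On failure, `current` is refreshed with the latest published
        // snapshot and the copy is rebuilt from it.
    } while (!std::atomic_compare_exchange_weak(&g_config, &current, updated));
}
```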

What's the overhead from shared_ptr being thread-safe?

std::shared_ptr is guaranteed to be thread-safe. I don't know what mechanism the typical implementations use to ensure this, but surely it must have some overhead. And that overhead would be present even in the case that your application is single-threaded.
Is the above the case? And if so, does that mean it violates the principle of "you don't pay for what you don't use" if you aren't using the thread-safety guarantees?
If we check the cppreference page for std::shared_ptr, it states the following in the Implementation notes section:
To satisfy thread safety requirements, the reference counters are typically incremented and decremented using std::atomic::fetch_add with std::memory_order_relaxed.
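As a rough picture of what such a counter looks like, here is a minimal sketch. The class RefCount is hypothetical and is not how any particular standard library lays out its control block. The increment can be relaxed because a new reference is always created from an existing one; the decrement in this sketch uses acquire-release ordering so that the thread that drops the last reference sees all earlier writes to the object before destroying it.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter, starting at one owner.
class RefCount {
public:
    void acquire() noexcept {
        count_.fetch_add(1, std::memory_order_relaxed);
    }

    // Returns true if the caller just dropped the last reference
    // and is therefore responsible for destroying the object.
    bool release() noexcept {
        long previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);
        return previous == 1;
    }

    long get() const noexcept {
        return count_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<long> count_{1};
};
```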
It is interesting to look at an actual implementation. For example, the libstdc++ implementation document here says:
For the version of shared_ptr in libstdc++ the compiler and library are fixed, which makes things much simpler: we have an atomic CAS or we don't, see Lock Policy below for details.
The Selecting Lock Policy section says (emphasis mine):
There is a single _Sp_counted_base class, which is a template parameterized on the enum __gnu_cxx::_Lock_policy. The entire family of classes is parameterized on the lock policy, right up to __shared_ptr, __weak_ptr and __enable_shared_from_this. The actual std::shared_ptr class inherits from __shared_ptr with the lock policy parameter selected automatically based on the thread model and platform that libstdc++ is configured for, so that the best available template specialization will be used. This design is necessary because it would not be conforming for shared_ptr to have an extra template parameter, even if it had a default value. The available policies are:
[...]
3. _S_Single
This policy uses a non-reentrant add_ref_lock() with no locking. It is used when libstdc++ is built without --enable-threads.
and further says (emphasis mine):
For all three policies, reference count increments and decrements are done via the functions in ext/atomicity.h, which detect if the program is multi-threaded. If only one thread of execution exists in the program then less expensive non-atomic operations are used.
So at least in this implementation you don't pay for what you don't use.
At least in the boost code on i386, boost::shared_ptr was implemented using an atomic CAS operation. This meant that while it has some overhead, it is quite low. I'd expect any implementation of std::shared_ptr to be similar.
In tight loops in high performance numerical code I found some speed-ups by switching to raw pointers and being really careful. But for normal code - I wouldn't worry about it.

c++: Loki StrongPtr looks unsafe to me, is that so?

I am currently looking at the most popular smart pointer implementations, such as Boost's shared and weak pointers as well as Loki's SmartPtr and StrongPtr, since I want to implement my own. From what I understand, Loki's StrongPtr looks unsafe to me, but I suspect that I am misunderstanding it, so I'd like to discuss whether it's safe or not. The reason I think it's not safe is that, as far as I can tell, it does not treat weak pointers (that is, a StrongPtr, where false indicates it's weak) with enough care:
for instance the dereferencing functions:
PointerType operator -> ()
{
    KP::OnDereference( GetPointer() ); // this only asserts by default, as far as I know
    // could be invalidated right here
    return GetPointer();
}
In a multithreaded environment a weak pointer could be invalidated at any time, so this function might return an invalidated pointer.
As far as my understanding goes, you would have to create a strong instance of the pointer you are dereferencing to ensure that it does not get invalidated halfway through. I think that's also the reason why Boost does not allow you to dereference a weak_ptr without creating a shared_ptr instance first. Loki's StrongPtr constructor suffers from the same problem, I think.
Is this a problem or am I reading the src wrong?
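For comparison, this is the access pattern that std::weak_ptr (and boost::weak_ptr) forces on the caller. The sketch uses the standard library rather than Loki, and the function name read_if_alive is made up: the weak reference is first promoted with lock(), and the object is only touched through the resulting strong reference, which keeps it alive for the duration of the access.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical helper: reads through a weak reference only after
// promoting it to a strong one.
std::optional<int> read_if_alive(const std::weak_ptr<int>& weak) {
    if (std::shared_ptr<int> strong = weak.lock()) {
        // `strong` keeps the object alive until the end of this scope.
        return *strong;
    }
    return std::nullopt;  // the object is already gone
}
```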
Regarding the use of assert, it's a programming error to use operator-> on an empty StrongPtr<> instance; i.e., it is the caller's responsibility to ensure that the StrongPtr<> instance is non-empty before dereferencing it. Why should anything more than an assert be needed? That said, if you deem some other behavior more appropriate than assert, then that's exactly what the policy is for.
This is a fundamental difference between preconditions and postconditions; here's a long but very good thread on the subject: comp.lang.c++.moderated: Exceptions. Read in particular the posts by D. Abrahams, as he explains in detail what I'm stating as understood fact. ;-]
Regarding the thread-safety of StrongPtr<>, I suspect most of Loki predates any serious thread-safety concerns; on the other hand, boost::shared_ptr<> and std::shared_ptr<> are explicitly guaranteed to be thread-safe, so I'm sure their implementations make for a "better" (though much more complicated) basis for study.
After reading carefully, I think I saw the rationale.
StrongPtr objects are dual in that they represent both Strong and Weak references.
The mechanism of assert works great on a Strong version. On a Weak version, it is the caller's responsibility to ensure that the object referenced will live long enough. This can be achieved either:
automatically, if you know that you have a Strong version
manually, by creating a Strong instance
The benefit with respect to std::shared_ptr is that you can avoid creating a new object when you already know that the item will outlive your use. It's an arguable design decision, but it works great for experts (of which Alexandrescu undoubtedly is one). It may not have been targeted at regular users (us), for whom enforcing that a Strong version be taken would have been much better, imho.
One could also argue that it's always easier to criticize with the benefit of hindsight. Loki, for all its greatness, is old.