What's the overhead from shared_ptr being thread-safe?

What's the overhead from shared_ptr being thread-safe? - c++

std::shared_ptr is guaranteed to be thread-safe. I don't know what mechanism the typical implementations use to ensure this, but surely it must have some overhead. And that overhead would be present even in the case that your application is single-threaded.
Is the above the case? And if so, does that means it violates the principle of "you don't pay for what you don't use", if you aren't using the thread-safety guarantees?

If we check out cppreference page for std::shared_ptr they state the following in the Implementation notes section:
To satisfy thread safety requirements, the reference counters are typically incremented and decremented using std::atomic::fetch_add with std::memory_order_relaxed.
It is interesting to note an actual implementation, for example the libstdc++ implementation document here says:
For the version of shared_ptr in libstdc++ the compiler and library
are fixed, which makes things much simpler: we have an atomic CAS or
we don't, see Lock Policy below for details.
The Selecting Lock Policy section says (emphasis mine):
There is a single _Sp_counted_base class, which is a template
parameterized on the enum __gnu_cxx::_Lock_policy. The entire family
of classes is parameterized on the lock policy, right up to
__shared_ptr, __weak_ptr and __enable_shared_from_this. The actual std::shared_ptr class inherits from __shared_ptr with the lock policy
parameter selected automatically based on the thread model and
platform that libstdc++ is configured for, so that the best available
template specialization will be used. This design is necessary because
it would not be conforming for shared_ptr to have an extra template
parameter, even if it had a default value. The available policies are:
[...]
3._S_Single
This policy uses a non-reentrant add_ref_lock() with no locking. It is used when libstdc++ is built without --enable-threads.
and further says (emphasis mine):
For all three policies, reference count increments and decrements are
done via the functions in ext/atomicity.h, which detect if the program
is multi-threaded. If only one thread of execution exists in the
program then less expensive non-atomic operations are used.
So at least in this implementation you don't pay for what you don't use.

At least in the boost code on i386, boost::shared_ptr was implemented using an atomic CAS operation. This meant that while it has some overhead, it is quite low. I'd expect any implementation of std::shared_ptr to be similar.
In tight loops in high performance numerical code I found some speed-ups by switching to raw pointers and being really careful. But for normal code - I wouldn't worry about it.

Related

Any equivalent of Rust's Arc::try_unwrap in c++?

This is a c++ ecosystem question - though it is easiest to ask to refer to Rust.
Are there stable implementations of a thread-safe / reference count smart pointers which support to "unwrap" it in a thread-safe manner - under the condition that there is ref-count of exactly 1, as in https://doc.rust-lang.org/std/sync/struct.Arc.html#method.try_unwrap.
Coarsely, speaking std::shared_ptr is similar to ARC, but this use-case seems not to be supported, nor does it appear straight forward to implement (e.g. see https://en.cppreference.com/w/cpp/memory/shared_ptr/use_count#Notes).

The exhaustive API of std::shared_ptr is available online (see cppreference) and as you can see there is no built-in support.
Furthermore, due to race-conditions with the promotion of std::weak_ptr, it is not possible to safely use use_count or unique to implement such functionality -- and unique was deprecated in C++17 and removed in C++20.
As a result, the functionality is simply not available with std::shared_ptr.
There may be other implementations of std::shared_ptr which offer this functionality -- though Boost's doesn't appear to.
As noted in the notes of use_count, the primary difficulty in implementing this function is the potential race-condition with weak_ptr promotion. That is, a naive:
// Susceptible to race-conditions, do not use!
if (pointer.use_count() == 1) {
return std::move(*pointer);
}
return std::nullopt;
Would not work because between the check and the actual move, a new shared owner may have appeared in another thread allowing concurrent access to the value.
The only ways to have this functionality safely are:
The shared_ptr implementation does not support weak pointers in the first place.
The shared_ptr implementation provides it, and ensures the absence of race condition with weak_ptr promotion.
I note that the latter typically requires locking the same lock used for weak_ptr promotion; hence why it cannot be provided externally.
A weaker variant could be implemented if unique were also guaranteeing the absence of weak_ptr. Although it would not be strictly equivalent as the presence of any weak_ptr would cause it to fail, it could still be useful in many scenarios where no weak_ptr is created.

Partial template specialization of std::atomic for smart pointers

Background
Since C++11, atomic operations on std::shared_ptr can be done via std::atomic_... methods found here, because the partial specialization as shown below is not possible:
std::atomic<std::shared_ptr<T>>
This is due to the fact that std::atomic only accepts TriviallyCopyable types, and std::shared_ptr (or std::weak_ptr) is not trivially copyable.
However, as of C++20, these methods have been deprecated, and got replaced by the partial template specialization of std::atomic for std::shared_ptr as described here.
Question
I am not sure of
Why std::atomic_... got replaced.
Techniques used to enable the partial template specialization of std::atomic for smart pointers.

Several proposals for atomic<shared_ptr> or something of that nature explain a variety of reasons. Of particular note is P0718, which tells us:
The C++ standard provides an API to access and manipulate specific shared_ptr objects atomically, i.e., without introducing data races when the same object is manipulated from multiple threads without further synchronization. This API is fragile and error-prone, as shared_ptr objects manipulated through this API are indistinguishable from other shared_ptr objects, yet subject to the restriction that they may be manipulated/accessed only through this API. In particular, you cannot dereference such a shared_ptr without first loading it into another shared_ptr object, and then dereferencing through the second object.
N4058 explains a performance issue with regard to how you have to go about implementing such a thing. Since shared_ptr is typically bigger than a single pointer in size, atomic access typically has to be implemented with a spinlock. So either every shared_ptr instance has a spinlock even if it never gets used atomically, or the implementation of those atomic functions has to have a lookaside table of spinlocks for individual objects. Or use a global spinlock.
None of these are problems if you have a type dedicated to being atomic.
atomic<shared_ptr> implementations can use the usual techniques for atomic<T> when T is too large to fit into a CPU atomic operation. They get to get around the TriviallyCopyable restriction by fiat: the standard requires that they exist and be atomic, so the implementation makes it so. C++ implementations don't have to play by the same rules as regular C++ programs.

Are c++ standard library containers like std::queue guaranteed to be reentrant?

I've seen people suggest that I should wrap standard containers such as std::queue and std::vector in a mutex lock or similar if i wish to use them.
I read that as needing a lock for each individual instance of a container being accessed by multiple threads, not per type or any utilization of the c++ standard library. But this assumes that the standard containers and standard library is guaranteed to be re-entrant.
Is there such a guarantee in the language?

The standard says:
Except where explicitly specified in this standard, it is implementation-defined which functions in the Standard C++ library may be recursively reentered.
Then it proceeds to specify that a function must be reentrant in, if I count them correctly, zero cases.
If one is to strictly follow the standard in this regard, the standard library suddenly becomes rather limited in its usefulness. A huge number of library functions call user-supplied functions. Writers of these functions, especially if those are themselves released as a library, in general don't know where they will be called from.
It is completely reasonable to assume that e.g. any constructor may be called from emplace_back of any standard container; if the user wishes to eliminate any uncertainty, he must refrain from any calls to emplace_back in any constructor. Any copy constructor is callable from e.g. vector::resize or sort, so one cannot manage vectors or do sorting in copy constructors. And so on, ad libitum.
This includes calling any third party component that might reasonably be using the standard library.
All these restrictions together probably mean that a large part of the standard library cannot be used in real world programs at all.
Update: this doesn't even start taking threads into consideration. With multiple threads, at least functions that deal with containers and algorithms must be reentrant. Imagine that std::vector::operator[] is not reentrant. This would mean that one cannot access two different vectors at the same time from two different threads! This is clearly not what the standard intends. I understand that this is your primary interest. To reiterate, no, I don't think there is reentrancy guarantee; and no, I don't think absence of such guarantee is reasonable in any way. --- end update.
My conclusion is that this is probably an oversight. The standard should mandate that all standard functions must be reentrant, unless otherwise specified.
I would
completely ignore the possibility of of any standard function being non-reentrant, except when it is clear that the function cannot be reasonably made reentrant.
raise an issue with the standards committee.

[Answer left for historical purposes, but see n.m.'s answer. There's no requirement on individual functions, but there is a single global non-requirement]
Yes, the standard guarantees reentrancy of member functions of standard containers.
Let me define what (non)-reentrancy means for functions. A reentrant function can be called with well-defined behavior on a thread while it is already on the call stack of that thread, i.e. executing. Obviously, this can only happen if the control flow temporarily left the reentrant function via a function call. If the behavior is not well-defined, the function is not reentrant.
(Leaf functions can't be said to be reentrant or non-reentrant, as the flow of control can only leave a leaf function by returning, but this isn't critical to the analysis).
Example:
int fac(int n) { return n==0 ? 1 : n * fac(n-1); }
The behavior of fac(3) is to return 6, even while fac(4) is running. Therefore, fac is reentrant.
The C++ Standard does define the behavior of member functions of standard containers. It also defines all restrictions under which such behavior is guaranteed. None of the member functions of standard containers have restrictions with respect to reentrancy. Therefore, any implementation which would restrict reentrancy is non-conformant.

C++ atomic with non-trivial type?

Reading the docs on boost::atomic and on std::atomic leaves me confused as to whether the atomic interface is supposed to support non-trivial types?
That is, given a (value-)type that can only be written/read by enclosing the read/write in a full mutex, because it has a non-trivial copy-ctor/assignment operator, is this supposed to be supported by std::atomic (as boost clearly states that it is UB).
Am I supposed to provide the specialization the docs talk about myself for non-trivial types?
Note: I was hitting on this because I have a cross-thread callback object boost::function<bool (void)> simpleFn; that needs to be set/reset atomically. Having a separate mutex / critical section or even wrapping both in a atomic-like helper type with simple set and get seem easy enough, but is there anything out of the box?

Arne's answer already points out that the Standard requires trivially copyable types for std::atomic.
Here's some rationale why atomics might not be the right tool for your problem in the first place: Atomics are the fundamental building primitives for building thread-safe data structures in C++. They are supposed to be the lowest-level building blocks for constructing more powerful data structures like thread-safe containers.
In particular, atomics are usually used for building lock-free data structures. For locking data structures primitives like std::mutex and std::condition_variable are a way better match, if only for the fact that it is very hard to write blocking code with atomics without introducing lots of busy waiting.
So when you think of std::atomic the first association should be lock-free (despite the fact that most of the atomic types are technically allowed to have blocking implementations). What you describe is a simple lock-based concurrent data structure, so wrapping it in an atomic should already feel wrong from a conceptual point of view.
Unfortunately, it is currently not clear how to express in the language that a data structure is thread-safe (which I guess was your primary intent for using atomic in the first place). Herb Sutter had some interesting ideas on this issue, but I guess for now we simply have to accept the fact that we have to rely on documentation to communicate how certain data structures behave with regards to thread-safety.

The standard specifies (§29.5,1) that
The type of the template argument T shall be trivially copyable
Meaning no, you cannot use types with non-trivial copy-ctor or assignment-op.
However, like with any template in namespace std, you are free to specialize the template for any type that it has not been specialized for by the implementation. So if you really want to use std::atomic<MyNonTriviallyCopyableType>, you have to provide the specialization yourself. How that specialization behaves is up to you, meaning, you are free to blow off your leg or the leg of anyone using that specialization, because it's just outside the scope of the standard.

C++11 STL containers and thread safety

I have trouble finding any up-to-date information on this.
Do C++11 versions of STL containers have some level of thread safety guaranteed?
I do expect that they don't, due to performance reasons. But then again, that's why we have both std::vector::operator[] and std::vector::at.

Since the existing answers don't cover it (only a comment does), I'll just mention 23.2.2 [container.requirements.dataraces] of the current C++ standard specification which says:
implementations are required to avoid data races when the contents of the contained object in different elements in the same sequence, excepting vector<bool>, are modified concurrently.
i.e. it's safe to access distinct elements of the same container, so for example you can have a global std::vector<std::future<int>> of ten elements and have ten threads which each write to a different element of the vector.
Apart from that, the same rules apply to containers as for the rest of the standard library (see 17.6.5.9 [res.on.data.races]), as Mr.C64's answer says, and additionally [container.requirements.dataraces] lists some non-const member functions of containers that can be called safely because they only return non-const references to elements, they don't actually modify anything (in general any non-const member function must be considered a modification.)

I think STL containers offer the following basic thread-safety guarantee:
simultaneous reads of the same object are OK
simultaneous read/writes of different objects are OK
But you have to use some form of custom synchronization (e.g. critical section) if you want to do something different, like e.g. simultaneous writes on the same object.

No. Check out PPL or Intel TBB for thread safe STL-like containers.
Like others have noted they have usual "multiple reader thread safety" but that is even pre C++11. Ofc this doesnt mean single writer multiple readers. It means 0 writers. :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js