Synchronizing global reference-counted resource with atomics -- is relaxed appropriate here? - c++

I have a global reference-counted object obj that I want to protect from data races by using atomic operations:
T* obj; // initially nullptr
std::atomic<int> count; // initially zero
My understanding is that I need to use std::memory_order_release after I write to obj, so that the other threads will be aware of it being created:
void increment()
{
if (count.load(std::memory_order_relaxed) == 0)
obj = std::make_unique<T>();
count.fetch_add(1, std::memory_order_release);
}
Likewise, I need to use std::memory_order_acquire when reading the counter, to ensure the thread has visibility of obj being changed:
void decrement()
{
count.fetch_sub(1, std::memory_order_relaxed);
if (count.load(std::memory_order_acquire) == 0)
obj.reset();
}
I am not convinced that the code above is correct, but I'm not entirely sure why. I feel like after obj.reset() is called, there should be a std::memory_order_release operation to inform other threads about it. Is that correct?
Are there other things that can go wrong, or is my understanding of atomic operations in this case completely wrong?

It is wrong regardless of memory ordering.
As #MaartenBamelis pointed out for concurrent calling of increment the object is constructed twice. And the same is true for concurrent decrement: object is reset twice (which may result in double destructor call).
Note that there's disagreement between T* obj; declaration and using it as unique_ptr but neither raw pointer not unique pointer are safe for concurrent modification. In practice, reset or delete will check pointer for null, then delete and set it to null, and these steps are not atomic.
fetch_add and fetch_sub are fetch and op instead of just op for a reason: if you don't use the value observed during operation, it is likely to be a race.

This code is inherently racey. If two threads call increment at the same time when count is initially 0, both will see count as 0, and both will create obj (and race to see which copy is kept; given unique_ptr has no special threading protections, terrible things can happen if two of them set it at once).
If two threads decrement at the same time (holding the last two references), and finish the fetch_sub before either calls load, both will reset obj (also bad).
And if a decrement finishes the fetch_sub (to 0), then another thread increments before the decrement load occurs, the increment will see the count as 0 and reinitialize. Whether the object is cleared after being replaced, or replaced after being cleared, or some horrible mixture of the two, will depend on whether increment's fetch_add runs before or after decrement's load.
In short: If you find yourself using two separate atomic operations on the same variable, and testing the result of one of them (without looping, as in a compare and swap loop), you're wrong.
More correct code would look like:
void increment() // Still not safe
{
// acquire is good for the != 0 case, for a later read of obj
// or would be if the other writer did a release *after* constructing an obj
if (count.fetch_add(1, std::memory_order_acquire) == 0)
obj = std::make_unique<T>();
}
void decrement()
{
if (count.fetch_sub(1, std::memory_order_acquire) == 1)
obj.reset();
}
but even then it's not reliable; there's no guarantee that, when count is 0, two threads couldn't call increment, both of them fetch_add at once, and while exactly one of them is guaranteed to see the count as 0, said 0-seeing thread might end up delayed while the one that saw it as 1 assumes the object exists and uses it before it's initialized.
I'm not going to swear there's no mutex-free solution here, but dealing with the issues involved with atomics is almost certainly not worth the headache.
It might be possible to confine the mutex to inside the if() branches, but taking a mutex is also an atomic RMW operation (and not much more than that for a good lightweight implementation) so this doesn't necessarily help a huge amount. If you need really good read-side scaling, you'd want to look into something like RCU instead of a ref-count, to allow readers to truly be read-only, not contending with other readers.

I don't really see a simple way of implementing a reference-counted resource with atomics. Maybe there's some clever way that I haven't thought of yet, but in my experience, clever does not equal readable.
My advice would be to implement it first using a mutex. Then you simply lock the mutex, check the reference count, do whatever needs to be done, and unlock again. It's guaranteed correct:
std::mutex mutex;
int count;
std::unique_ptr<T> obj;
void increment()
{
auto lock = std::scoped_lock{mutex};
if (++count == 1) // Am I the first reference?
obj = std::make_unique<T>();
}
void decrement()
{
auto lock = std::scoped_lock{mutex};
if (--count == 0) // Was I the last reference?
obj.reset();
}
Although at this point, I would just use a std::shared_ptr instead of managing the reference count myself:
std::mutex mutex;
std::weak_ptr<T> obj;
std::shared_ptr<T> acquire()
{
auto lock = std::scoped_lock{mutex};
auto sp = obj.lock();
if (!sp)
obj = sp = std::make_shared<T>();
return sp;
}
I believe this also makes it safe when exceptions may be thrown when constructing the object.
Mutexes are surprisingly performant, so I expect that locking code is plenty quick unless you have a highly specialized use case where you need code to be lock-free.

Related

Is implementation of double checked singleton thread-safe?

I know that the common implementation of thread-safe singleton looks like this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}
But why they say that it is a thread-safe implementation?
For example, the first thread can pass both tests on pInstance == 0, create new Singleton and assign it to the temp pointer and then start assignment pInstance = temp (as far as I know, the pointer assignment operation is not atomic).
At the same time the second thread tests the first pInstance == 0, where pInstance is assigned only half. It's not nullptr but not a valid pointer too, which then returned from the function.
Can such a situation happen? I didn't find the answer anywhere and seems that it is a quite correct implementation and I don't understand anything
It's not safe by C++ concurrency rules, since the first read of pInstance is not protected by a lock or something similar and thus does not synchronise correctly with the write (which is protected). There is therefore a data race and thus Undefined Behaviour. One of the possible results of this UB is precisely what you've identified: the first check reading a garbage value of pInstance which is just being written by a different thread.
The common explanation is that it saves on acquiring the lock (a potentially time-expensive operation) in the more common case (pInstance already valid). However, it's not safe.
Since C++11 and beyond guarantees initialisation of function-scope static variables happens only once and is thread-safe, the best way to create a singleton in C++ is to have a static local in the function:
Singleton& Singleton::instance() {
static Singleton s;
return s;
}
Note that there's no need for either dynamic allocation or a pointer return type.
As Voo mentioned in comments, the above assumes pInstance is a raw pointer. If it was std::atomic<Singleton*> instead, the code would work just fine as intended. Of course, it's then a question whether an atomic read is that much slower than obtaining the lock, which should be answered via profiling. Still, it would be a rather pointless exercise, since the static local variables is better in all but very obscure cases.

Is this a correct C++11 double-checked locking version with shared_ptr?

This article by Jeff Preshing states that the double-checked locking pattern (DCLP) is fixed in C++11. The classical example used for this pattern is the singleton pattern but I happen to have a different use case and I am still lacking experience in handling "atomic<> weapons" - maybe someone over here can help me out.
Is the following piece of code a correct DCLP implementation as described by Jeff under "Using C++11 Sequentially Consistent Atomics"?
class Foo {
std::shared_ptr<B> data;
std::mutex mutex;
void detach()
{
if (data.use_count() > 1)
{
std::lock_guard<std::mutex> lock{mutex};
if (data.use_count() > 1)
{
data = std::make_shared<B>(*data);
}
}
}
public:
// public interface
};
No, this is not a correct implementation of DCLP.
The thing is that your outer check data.use_count() > 1 accesses the object (of type B with reference count), which can be deleted (unreferenced) in mutex-protected part. Any sort of memory fences cannot help there.
Why data.use_count() accesses the object:
Assume these operations have been executed:
shared_ptr<B> data1 = make_shared<B>(...);
shared_ptr<B> data = data1;
Then you have following layout (weak_ptr support is not shown here):
data1 [allocated with B::new()] data
--------------------------
[pointer type] ref; --> |atomic<int> m_use_count;| <-- [pointer type] ref
|B obj; |
--------------------------
Each shared_ptr object is just a pointer, which points to allocated memory region. This memory region embeds object of type B plus atomic counter, reflecting number of shared_ptr's, pointed to given object. When this counter becomes zero, memory region is freed(and B object is destroyed). Exactly this counter is returned by shared_ptr::use_count().
UPDATE: Execution, which can lead to accessing memory which is already freed (initially, two shared_ptr's point to the same object, .use_count() is 2):
/* Thread 1 */ /* Thread 2 */ /* Thread 3 */
Enter detach() Enter detach()
Found `data.use_count()` > 1
Enter critical section
Found `data.use_count()` > 1
Dereference `data`,
found old object.
Unreference old `data`,
`use_count` becomes 1
Delete other shared_ptr,
old object is deleted
Assign new object to `data`
Access old object
(for check `use_count`)
!! But object is freed !!
Outer check should only take a pointer to object for decide, whether to need aquire lock.
BTW, even your implementation would be correct, it has a little sence:
If data (and detach) can be accessed from several threads at the same time, object's uniqueness gives no advantages, since it can be accessed from the several threads. If you want to change object, all accesses to data should be protected by outer mutex, in that case detach() cannot be executed concurrently.
If data (and detach) can be accessed only by single thread at the same time, detach implementation doesn't need any locking at all.
This constitutes a data race if two threads invoke detach on the same instance of Foo concurrently, because std::shared_ptr<B>::use_count() (a read-only operation) would run concurrently with the std::shared_ptr<B> move-assignment operator (a modifying operation), which is a data race and hence a cause of undefined behavior. If Foo instances are never accessed concurrently, on the other hand, there is no data race, but then the std::mutex would be useless in your example. The question is: how does data's pointee become shared in the first place? Without this crucial bit of information, it is hard to tell if the code is safe even if a Foo is never used concurrently.
According to your source, I think you still need to add thread fences before the first test and after the second test.
std::shared_ptr<B> data;
std::mutex mutex;
void detach()
{
std::atomic_thread_fence(std::memory_order_acquire);
if (data.use_count() > 1)
{
auto lock = std::lock_guard<std::mutex>{mutex};
if (data.use_count() > 1)
{
std::atomic_thread_fence(std::memory_order_release);
data = std::make_shared<B>(*data);
}
}
}

Is reference counting thread safe

For example consider
class ProcessList {
private
std::vector<std::shared_ptr<SomeObject>> list;
Mutex mutex;
public:
void Add(std::shared_ptr<SomeObject> o) {
Locker locker(&mutex); // Start of critical section. Locker release so will the mutex - In house stuff
list.push_back(std::make_shared<SomeObject>(o).
}
void Remove(std::shared_ptr<SomeObject> o) {
Locker locker(&mutex); // Start of critical section. Locker release so will the mutex - In house stuff
// Code to remove said object but indirectly modifying the reference count in copy below
}
void Process() {
std::vector<std::shared_ptr<SomeObject>> copy;
{
Locker locker(&mutes);
copy = std::vector<std::shared_ptr<SomeObject>>(
list.begin(), list.end()
)
}
for (auto it = copy.begin(), it != copy.end(); ++it) {
it->Procss(); // This may take time/add/remove to the list
}
}
};
One thread runs Process. Multiple threads run add/remove.
Will the reference count be safe and always correct - or should a mutex be placed around that?
Yes, the standard (at ยง20.8.2.2, at least as of N3997) that's intended to require that the reference counting be thread-safe.
For your simple cases like Add:
void Add(std::shared_ptr<SomeObject> o) {
Locker locker(&mutex);
list.push_back(std::make_shared<SomeObject>(o).
}
...the guarantees in the standard are strong enough that you shouldn't need the mutex, so you can just have:
void Add(std::shared_ptr<SomeObject> o) {
list.push_back(std::make_shared<SomeObject>(o).
}
For some of your operations, it's not at all clear that thread-safe reference counting necessarily obviates your mutex though. For example, inside of your Process, you have:
{
Locker locker(&mutes);
copy = std::vector<std::shared_ptr<SomeObject>>(
list.begin(), list.end()
)
}
This carries out the entire copy as an atomic operation--nothing else can modify the list during the copy. This assures that your copy gives you a snapshot of the list precisely as it was when the copy was started. If you eliminate the mutex, the reference counting will still work, but your copy might reflect changes made while the copy is being made.
In other words, the thread safety of the shared_ptr only assures that each individual increment or decrement is atomic--it doesn't assure that manipulations of the entire list are atomic as the mutex does in this case.
Since your list is actually a vector, you should be able to simplify the copying code a bit to just copy = list.
Also note that your Locker seems to be a subset of what std::lock_guard provides. It appears you could use:
std::lock_guard<std::mutex> locker(&mutes);
...in its place quite easily.
It will be an overhead to have a mutex for reference counting.
Internally, mutexes use atomic operations, basically a mutex does internal thread safe reference counting. So you can just use atomics for your reference counting directly instead of using a mutex and essentially doing double the work.
Unless your CPU architecture have an atomic increment/decrement and you used that for reference count, then, no, it's not safe; C++ makes no guarantee on the thread safety of x++/x-- operation on any of its standard types.
Use atomic<int> if your compiler supports them (C++11), otherwise you'll need to have the lock.
Further references:
https://www.threadingbuildingblocks.org/docs/help/tbb_userguide/Atomic_Operations.htm

Safe destruction of a map containing shared_ptr

I was reading the Mir project source code and I stumbled upon this piece of code :
void mir::frontend::ResourceCache::free_resource(google::protobuf::Message* key)
{
std::shared_ptr<void> value;
{
std::lock_guard<std::mutex> lock(guard);
auto const& p = resources.find(key);
if (p != resources.end())
{
value = p->second;
}
resources.erase(key);
}
}
I have seen this before in other projects as well. It holds a reference to the value in the map before its erasure, even when the bloc is protected by a lock_guard. I'm not sure why they hold a reference to the value by using std::shared_ptr value.
What are the repercussions if we remove the value = p->second ?
Will someone please enlighten me ?
This is the code http://bazaar.launchpad.net/~mir-team/mir/trunk/view/head:/src/frontend/resource_cache.cpp
My guess it that this is done to avoid running the destructor of value inside the locked code. This lock is meant to protect the modification of the map, and running some arbitrary code, such as the destructor of another object with it locked, is not needed nor wanted.
Just imagine that the destructor of value accesses indirectly, for whatever reason to the map, or to another thread-shared structure. There are chances that you end up in a deadlock.
The bottom end is: run as little code as possible from locked code, but not less. And never call a external, unknown, function (such as the shared_ptr deleter or a callback) from locked code.
The goal is to move the actual execution of the shared_ptr deleter till after the lock is released. This way, if the deleter (or the destructor using the default deleter) takes a long time, the lock is not held for that operation.
If you were to remove the value = p->second, the value would be destroyed while the lock is held. Since the lock protects the map, but not the actual values, it would hold the lock longer than is strictly necessary.

Is this a valid alternative to the double checked locking?

I would like to use singleton pattern in a multithreaded program. Double-checked locking method seems suitable for its efficiency, however this method is broken and not easy to get right.
I write the following code hoping that it works as an alternative to the double-checked locking. Is it a correct implementation of a thread-safe singleton pattern?
static bool created = false;
static Instance *instance = 0;
Instance *GetInstance() {
if (!created) {
Lock lock; // acquire a lock, parameters are omitted for simplicity
if (!instance) {
instance = new Instance;
} else {
created = true;
}
}
return instance;
}
The first call will create Instance. The second call will set created to true. And finally, all other calls will return a well initialized instance.
http://voofie.com/content/192/alternative-to-double-checked-locking-and-the-singleton-pattern/
No, this doesn't help. If the writes to created and instance are non-atomic then there is no guarantee that the values are visible to a thread that doesn't lock the mutex.
e.g. Thread 1 calls getInstance. created is false, and instance is null, so it locks the mutex and creates a new instance. Thread 1 calls getInstance again, and this time sets created to true. Thread 2 now calls getInstance. By the vagaries of the processor's memory management it sees created as true, but there is no guarantee that it also sees instance as non-null, and even if it does there is no guarantee that the memory values for the pointed-to instance are consistent.
If you're not using atomics then you need to use mutexes, and must use them for all accesses to a protected variable.
Additional info: If you are using mutexes, then the compiler and runtime work together to ensure that when one thread releases a mutex lock and another thread acquires a lock on that same mutex then the second thread can see all the writes done by the first. This is not true for non-atomic accesses, and may or may not be true for atomic accesses, depending on what memory ordering constraints the compiler and runtime guarantee for you (with C++11 atomics you can choose the ordering constraints).
It has the same reliability of the double-checked locking.
You can get more with "triple check", or even "quadruple-check", but full reliability can be demonstrated to be impossible.
Please note that declaring a local-static variable will make the compiler to implement itself your same logic.
#include<memory>
#include "Instance.h" //or whatever...
Instance* GetInstance()
{
static std::unique_ptr<Instance> p(new Instance);
return p.get();
}
If the compiler is configured for multithreading environment, it sould protect the static p with a mutex and manage the lock when initializing p at the very first call. It also should chain p destruction to the tail of the "at_exit" chain, so that -at program end- proper destruction will be performed.
[EDIT]
Since this is a requirement for C++11 and is implemented only in some C++03 pre-standard, check the compiler implementation and settings.
Right now, I can only ensure MinGW 4.6 on and VS2010 already did it.
No. There's absolutely no difference between your code and double
checked locking. The correct implementation is:
static std::mutex m;
Singleton&
Singleton::instance()
{
static Singleton* theOneAndOnly;
std::lock_guard l(m);
if (theOneAndOnly == NULL)
theOneAndOnly = new Singleton;
return *theOneAndOnly;
}
It's hard to imagine a case where this would cause a problem, and it is
guaranteed. You do aquire the lock each time, but aquiring an
uncontested mutex should be fairly cheap, you're not accessing
Singleton's that much, and if you do end up having to access it in the
middle of a tight loop, there's nothing to stop you from acquiring a
reference to it before entering the loop, and using it.
This code contains a race-condition, in that created can be read while it is concurrently being written to by a different thread.
As a result, it has undefined behaviour, and is not a valid way of writing that code.
As KennyTM pointed out in the comments, a far better alternative is:
Instance* GetInstance() { static Instance instance; return &instance; }
your solution will work fine, but it does one more check.
the right double check locking looks like,
static Instance *instance = 0;
Instance *GetInstance() {
if (instance == NULL) //first check.
{
Lock lock; //scope lock.
if (instance == NULL) //second check, the second check must under the lock.
instance = new Instance;
}
return instance;
}
the double check locking will have a good performance for it does not acquire the lock every time. and it is thread-safe the locking will gurentee there is only one instance created.