In the C++ Seasoning talk by Sean Parent (https://youtu.be/W2tWOdzgXHA), at 33:41, when he starts talking about “no raw synchronization primitives”, he presents an example to show that we will get it wrong with raw synchronization primitives. The example is a bad copy-on-write class:
template <typename T>
class bad_cow {
struct object_t {
explicit object_t(const T& x) : data_m(x) { ++count_m; }
atomic<int> count_m;
T data_m;
};
object_t* object_m;
public:
explicit bad_cow(const T& x) : object_m(new object_t(x)) { }
~bad_cow() { if (0 == --object_m->count_m) delete object_m; }
bad_cow(const bad_cow& x) : object_m(x.object_m) { ++object_m->count_m; }
bad_cow& operator=(const T& x) {
if (object_m->count_m == 1) {
// label #2
object_m->data_m = x;
} else {
object_t* tmp = new object_t(x);
--object_m->count_m; // bug #1
// this solves bug #1:
// if (0 == --object_m->count_m) delete object_m;
object_m = tmp;
}
return *this;
}
};
He then asks the audience to find the bug, which is the bug #1 as he confirms.
But a more obvious bug I guess, is when some thread is about to proceed to execute a line of code that I have denoted with label #2, while all of a sudden, some other thread just destroys the object and the destructor is called, which deletes object_m. So, the first thread will encounter a deleted memory location.
Am I right? It doesn't seem so!
Assuming the rest of the program isn't buggy, that shouldn't happen, because each thread should have its own reference-count object referencing the data_m object. Therefore, if thread B has a bad_cow object that references the data-object, then thread A cannot (or at least should not) ever delete that object, because the count_m field can never drop to zero as long as there remains at least one reference-count object pointing to it.
Of course, a buggy program might encounter the race condition you suggest -- for example, a thread might be holding only a raw pointer to the data-object, rather than a bad_cow that increments its reference count; or a buggy thread might call delete on the object explicitly rather than relying on the bad_cow class to handle deletion properly.
Your objection doesn't hold because *this at that moment is pointing to the object and the count is 1. The counter cannot get to 0 unless someone is not playing this game correctly (but in that case anything can happen anyway).
Another similar objection could be that, while you're assigning to *this and executing inside the #2 branch, another thread makes a copy of *this. Even if this second thread is just reading the pointed-to object, it may see it mutate suddenly because of the assignment. The problem in this case is that the count was 1 when the mutating thread entered the if, but it increased immediately afterwards.
However, this is also a bad objection, because this code handles concurrent access to the pointed-to object (as std::shared_ptr does, for example), but you are not allowed to mutate and read a single instance of the bad_cow class from different threads. In other words, a single instance of bad_cow cannot be used from multiple threads without additional synchronization if some of them are writers. Distinct instances of bad_cow pointing to the same storage, on the other hand, are safe to use from different threads (after fix #1, of course).
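To make fix #1 concrete, here is a minimal sketch of the class with the fix applied. The names cow, release and get are introduced only for this illustration and are not from the talk. The sketch also sets the counter to 1 explicitly in the constructor instead of incrementing a default-constructed atomic, and it deletes cow-to-cow assignment to stay short; both are assumptions on my part.

```cpp
#include <atomic>

// Sketch only: copy-on-write wrapper with fix #1 applied.
template <typename T>
class cow {
    struct object_t {
        explicit object_t(const T& x) : count_m(1), data_m(x) {}
        std::atomic<int> count_m;  // number of cow instances sharing this object
        T data_m;
    };
    object_t* object_m;

    // Drop our reference; the last owner deletes the shared object.
    void release() { if (0 == --object_m->count_m) delete object_m; }

public:
    explicit cow(const T& x) : object_m(new object_t(x)) {}
    cow(const cow& x) : object_m(x.object_m) { ++object_m->count_m; }
    cow& operator=(const cow&) = delete;  // kept out of the sketch
    ~cow() { release(); }

    cow& operator=(const T& x) {
        if (object_m->count_m == 1) {
            // We are the only owner, so mutating in place is fine.
            object_m->data_m = x;
        } else {
            object_t* tmp = new object_t(x);
            release();  // fix #1: delete the old object if we were the last owner
            object_m = tmp;
        }
        return *this;
    }

    const T& get() const { return object_m->data_m; }
};
```

As in the answers above, this only covers distinct cow instances that share storage; a single instance still must not be written and read from different threads without extra synchronization.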
This article by Jeff Preshing states that the double-checked locking pattern (DCLP) is fixed in C++11. The classical example used for this pattern is the singleton pattern but I happen to have a different use case and I am still lacking experience in handling "atomic<> weapons" - maybe someone over here can help me out.
Is the following piece of code a correct DCLP implementation as described by Jeff under "Using C++11 Sequentially Consistent Atomics"?
class Foo {
std::shared_ptr<B> data;
std::mutex mutex;
void detach()
{
if (data.use_count() > 1)
{
std::lock_guard<std::mutex> lock{mutex};
if (data.use_count() > 1)
{
data = std::make_shared<B>(*data);
}
}
}
public:
// public interface
};
No, this is not a correct implementation of DCLP.
The problem is that your outer check data.use_count() > 1 accesses the object (of type B, together with its reference count), which can be deleted (unreferenced) in the mutex-protected part. No memory fence of any sort can help there.
Why data.use_count() accesses the object:
Assume these operations have been executed:
shared_ptr<B> data1 = make_shared<B>(...);
shared_ptr<B> data = data1;
Then you have following layout (weak_ptr support is not shown here):
data1 [allocated with B::new()] data
--------------------------
[pointer type] ref; --> |atomic<int> m_use_count;| <-- [pointer type] ref
|B obj; |
--------------------------
Each shared_ptr object is just a pointer to an allocated memory region. This memory region embeds the object of type B plus an atomic counter that reflects the number of shared_ptrs pointing to the object. When this counter becomes zero, the memory region is freed (and the B object is destroyed). This counter is exactly what shared_ptr::use_count() returns.
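A small single-threaded sketch of how that counter moves as shared_ptrs to the same object come and go. The function name observe_use_count is made up for this example, and int stands in for B.

```cpp
#include <memory>
#include <vector>

// Records use_count() at three points in the life of a shared object.
std::vector<long> observe_use_count() {
    std::vector<long> seen;
    std::shared_ptr<int> data1 = std::make_shared<int>(7);
    seen.push_back(data1.use_count());      // one owner
    {
        std::shared_ptr<int> data = data1;  // second owner, same counter
        seen.push_back(data1.use_count());
    }                                       // second owner is gone again
    seen.push_back(data1.use_count());
    return seen;
}
```

Both shared_ptrs read the same counter, which is why calling use_count() touches the shared memory region and not just the local pointer.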
UPDATE: An execution that can lead to accessing memory which is already freed (initially, two shared_ptrs point to the same object and .use_count() is 2):
/* Thread 1 */                  /* Thread 2 */                  /* Thread 3 */
Enter detach()                  Enter detach()
Found `data.use_count()` > 1
Enter critical section
Found `data.use_count()` > 1
                                Dereference `data`,
                                found old object.
Unreference old `data`,
`use_count` becomes 1
                                                                Delete other shared_ptr,
                                                                old object is deleted
Assign new object to `data`
                                Access old object
                                (for check `use_count`)
                                !! But object is freed !!
The outer check should only read a pointer to the object in order to decide whether the lock needs to be acquired.
BTW, even if your implementation were correct, it would make little sense:
If data (and detach) can be accessed from several threads at the same time, the object's uniqueness gives no advantage, since the object can still be accessed from several threads. If you want to change the object, all accesses to data must be protected by an outer mutex, and in that case detach() cannot be executed concurrently.
If data (and detach) can be accessed by only a single thread at a time, the detach implementation doesn't need any locking at all.
This constitutes a data race if two threads invoke detach on the same instance of Foo concurrently, because std::shared_ptr<B>::use_count() (a read-only operation) would run concurrently with the std::shared_ptr<B> move-assignment operator (a modifying operation), which is a data race and hence a cause of undefined behavior. If Foo instances are never accessed concurrently, on the other hand, there is no data race, but then the std::mutex would be useless in your example. The question is: how does data's pointee become shared in the first place? Without this crucial bit of information, it is hard to tell if the code is safe even if a Foo is never used concurrently.
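For contrast, here is a rough sketch of the general shape of double-checked locking with sequentially consistent atomics, i.e. the lazy-initialization case the pattern is normally shown with. Widget, get_instance, construct_from_threads and the g_ globals are hypothetical names for this illustration, and this is not copied from the linked article. The key difference from detach() is that the outer check only reads an atomic pointer and never dereferences anything that another thread could free.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

struct Widget { int value = 42; };

std::atomic<Widget*> g_instance{nullptr};
std::mutex g_mutex;
std::atomic<int> g_constructions{0};  // only here to observe the behavior

Widget* get_instance() {
    Widget* p = g_instance.load();    // outer check, seq_cst by default
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(g_mutex);
        p = g_instance.load();        // inner check, under the lock
        if (p == nullptr) {
            p = new Widget;           // intentionally never freed in this sketch
            ++g_constructions;
            g_instance.store(p);
        }
    }
    return p;
}

// Calls get_instance() from n threads and reports how many Widgets were built.
int construct_from_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { get_instance(); });
    for (auto& t : threads) t.join();
    return g_constructions.load();
}
```

The published pointer is never replaced or deleted afterwards, which is exactly the property the detach() code above does not have.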
According to your source, I think you still need to add thread fences before the first test and after the second test.
std::shared_ptr<B> data;
std::mutex mutex;
void detach()
{
std::atomic_thread_fence(std::memory_order_acquire);
if (data.use_count() > 1)
{
auto lock = std::lock_guard<std::mutex>{mutex};
if (data.use_count() > 1)
{
std::atomic_thread_fence(std::memory_order_release);
data = std::make_shared<B>(*data);
}
}
}
I was reading the Mir project source code and I stumbled upon this piece of code :
void mir::frontend::ResourceCache::free_resource(google::protobuf::Message* key)
{
std::shared_ptr<void> value;
{
std::lock_guard<std::mutex> lock(guard);
auto const& p = resources.find(key);
if (p != resources.end())
{
value = p->second;
}
resources.erase(key);
}
}
I have seen this before in other projects as well. It holds a reference to the value in the map before erasing it, even though the block is protected by a lock_guard. I'm not sure why they hold a reference to the value in the std::shared_ptr named value.
What are the repercussions if we remove the value = p->second?
Will someone please enlighten me?
This is the code http://bazaar.launchpad.net/~mir-team/mir/trunk/view/head:/src/frontend/resource_cache.cpp
My guess is that this is done to avoid running the destructor of value inside the locked code. This lock is meant to protect modifications of the map; running arbitrary code, such as the destructor of another object, while it is held is neither needed nor wanted.
Just imagine that the destructor of value indirectly accesses, for whatever reason, the map or another thread-shared structure. Chances are that you end up in a deadlock.
The bottom line is: run as little code as possible while holding a lock, but no less. And never call an external, unknown function (such as the shared_ptr deleter or a callback) from locked code.
The goal is to move the actual execution of the shared_ptr deleter till after the lock is released. This way, if the deleter (or the destructor using the default deleter) takes a long time, the lock is not held for that operation.
If you were to remove the value = p->second, the value would be destroyed while the lock is held. Since the lock protects the map, but not the actual values, it would hold the lock longer than is strictly necessary.
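Here is a stripped-down sketch of the same idea outside of Mir. Cache, save and size are names invented for this example, and int keys replace the protobuf pointers. The point is that the last reference is moved out of the map while the lock is held, and the deleter only runs once the lock has been released.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

class Cache {
public:
    void save(int key, std::shared_ptr<void> value) {
        std::lock_guard<std::mutex> lock(guard);
        resources[key] = std::move(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(guard);
        return resources.size();
    }

    void free_resource(int key) {
        std::shared_ptr<void> value;  // declared outside the locked scope
        {
            std::lock_guard<std::mutex> lock(guard);
            auto it = resources.find(key);
            if (it != resources.end()) {
                value = std::move(it->second);  // keep the resource alive for now
                resources.erase(it);
            }
        }
        // value is destroyed here, after the lock is released, so a deleter
        // that calls back into the cache does not deadlock.
    }

private:
    mutable std::mutex guard;
    std::map<int, std::shared_ptr<void>> resources;
};
```

If the resource were destroyed inside the locked scope instead, a deleter that calls size() would try to lock the same std::mutex twice on one thread, which is undefined behavior and in practice usually a deadlock.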
I believe I've got a good handle on at least the basics of multi-threading in C++, but I've never been able to get a clear answer on locking a mutex around shared resources in the constructor or the destructor. I was under the impression that you should lock in both places, but recently coworkers have disagreed. Pretend the following class is accessed by multiple threads:
class TestClass
{
public:
TestClass(const float input) :
mMutex(),
mValueOne(1),
mValueTwo("Text")
{
//**Does the mutex need to be locked here?
mValueTwo.Set(input);
mValueOne = mValueTwo.Get();
}
~TestClass()
{
//Lock Here?
}
int GetValueOne() const
{
Lock(mMutex);
return mValueOne;
}
void SetValueOne(const int value)
{
Lock(mMutex);
mValueOne = value;
}
CustomType GetValueTwo() const
{
Lock(mMutex);
return mValueTwo;
}
void SetValueTwo(const CustomType type)
{
Lock(mMutex);
mValueTwo = type;
}
private:
Mutex mMutex;
int mValueOne;
CustomType mValueTwo;
};
Of course everything should be safe through the initialization list, but what about the statements inside the constructor? In the destructor would it be beneficial to do a non-scoped lock, and never unlock (essentially just call pthread_mutex_destroy)?
Multiple threads cannot construct the same object, nor should any thread be allowed to use the object before it's fully constructed. So, in sane code, construction without locking is safe.
Destruction is a slightly harder case. But again, proper lifetime management of your object can ensure that an object is never destroyed when there's a chance that some thread(s) might still use it.
A shared pointer can help in achieving this, e.g.:
- construct the object in a certain thread
- pass shared pointers to every thread that needs access to the object (including the thread that constructed it, if needed)
- the object will be destroyed when all threads have released the shared pointer
But obviously, other valid approaches exist. The key is to keep proper boundaries between the three main stages of an object's lifetime : construction, usage and destruction. Never allow an overlap between any of these stages.
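The shared pointer approach could look roughly like this. SharedCounter and run_workers are hypothetical names, and the sketch assumes every thread gets its own copy of the shared_ptr before it starts.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct SharedCounter {
    void hit() {
        std::lock_guard<std::mutex> lock(mutex);  // usage stage: always locked
        ++hits;
    }
    int total() {
        std::lock_guard<std::mutex> lock(mutex);
        return hits;
    }
private:
    std::mutex mutex;
    int hits = 0;
};

int run_workers(int n) {
    // Construction stage: only this thread can see the object, no lock needed.
    auto counter = std::make_shared<SharedCounter>();

    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        // Each worker owns its own shared_ptr copy for as long as it runs.
        workers.emplace_back([counter] { counter->hit(); });
    }
    for (auto& w : workers) w.join();

    // Destruction stage: happens when the last shared_ptr copy goes away,
    // which here is this function's copy after all workers have finished.
    return counter->total();
}
```

The destructor never has to take the mutex, because nobody can still hold a reference when it runs.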
They don't have to be locked in the constructor, as the only way anyone external can get access to that data at that point is if you pass them around from the constructor itself (or do some undefined behaviour, like calling a virtual method).
[Edit: Removed part about destructor, since as a comment rightfully asserts, you have bigger issues if you're trying to access resources from an object which might be dead]
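A reduced sketch of the class from the question along these lines, using std::mutex in place of the question's Mutex and Lock types. GuardedValue is a made-up name, and the doubling in the constructor is only there to have some statement in the constructor body.

```cpp
#include <mutex>

class GuardedValue {
public:
    explicit GuardedValue(int input) : mValue(0) {
        // No lock here: no other thread can have a reference to *this yet,
        // as long as the constructor does not hand "this" to anyone.
        mValue = input * 2;
    }

    // No lock in the destructor either: if another thread could still be
    // using the object at this point, locking would not save the program.
    ~GuardedValue() = default;

    int Get() const {
        std::lock_guard<std::mutex> lock(mMutex);  // named guard, held until return
        return mValue;
    }

    void Set(int value) {
        std::lock_guard<std::mutex> lock(mMutex);
        mValue = value;
    }

private:
    mutable std::mutex mMutex;  // mutable so that const getters can lock
    int mValue;
};
```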
Derived from this question and related to this question:
If I construct an object in one thread and then convey a reference/pointer to it to another thread, is it thread un-safe for that other thread to access the object without explicit locking/memory-barriers?
// thread 1
Obj obj;
anyLeagalTransferDevice.Send(&obj);
while(1); // never let obj go out of scope
// thread 2
anyLeagalTransferDevice.Get()->SomeFn();
Alternatively: is there any legal way to convey data between threads that doesn't enforce memory ordering with regards to everything else the thread has touched? From a hardware standpoint I don't see any reason it shouldn't be possible.
To clarify: the question is with regard to cache coherency, memory ordering and whatnot. Can Thread 2 get and use the pointer before Thread 2's view of memory includes the writes involved in constructing obj? To misquote Alexandrescu(?): "Could a malicious CPU designer and compiler writer collude to build a standard-conforming system that makes that break?"
Reasoning about thread-safety can be difficult, and I am no expert on the C++11 memory model. Fortunately, however, your example is very simple. I rewrite the example, because the constructor is irrelevant.
Simplified Example
Question: Is the following code correct? Or can the execution result in undefined behavior?
// Legal transfer of pointer to int without data race.
// The receive function blocks until send is called.
void send(int*);
int* receive();
// --- thread A ---
/* A1 */ int* pointer = receive();
/* A2 */ int answer = *pointer;
// --- thread B ---
int answer;
/* B1 */ answer = 42;
/* B2 */ send(&answer);
// wait forever
Answer: There may be a data race on the memory location of answer, and thus the execution results in undefined behavior. See below for details.
Implementation of Data Transfer
Of course, the answer depends on the possible and legal implementations of the functions send and receive. I use the following data-race-free implementation. Note that only a single atomic variable is used, and all memory operations use std::memory_order_relaxed. Basically this means, that these functions do not restrict memory re-orderings.
std::atomic<int*> transfer{nullptr};
void send(int* pointer) {
transfer.store(pointer, std::memory_order_relaxed);
}
int* receive() {
while (transfer.load(std::memory_order_relaxed) == nullptr) { }
return transfer.load(std::memory_order_relaxed);
}
Order of Memory Operations
On multicore systems, a thread can see memory changes in a different order than other threads see them. In addition, both compilers and CPUs may reorder memory operations within a single thread for efficiency - and they do this all the time. Atomic operations with std::memory_order_relaxed do not participate in any synchronization and do not impose any ordering.
In the above example, the compiler is allowed to reorder the operations of thread B, and execute B2 before B1, because the reordering has no effect on the thread itself.
// --- valid execution of operations in thread B ---
int answer;
/* B2 */ send(&answer);
/* B1 */ answer = 42;
// wait forever
Data Race
C++11 defines a data race as follows (N3290 C++11 Draft): "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior." And the term happens before is defined earlier in the same document.
In the above example, B1 and A2 are conflicting and non-atomic operations, and neither happens before the other. This is obvious, because I have shown in the previous section, that both can happen at the same time.
That's the only thing that matters in C++11. In contrast, the Java Memory Model also tries to define the behavior if there are data races, and it took them almost a decade to come up with a reasonable specification. C++11 didn't make the same mistake.
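For comparison, here is a sketch of the same transfer with a release store and an acquire load, which is the usual way to get a happens-before edge between B1 and A2. The helper run_handoff and the yield in the spin loop are additions for this illustration.

```cpp
#include <atomic>
#include <thread>

std::atomic<int*> transfer{nullptr};

void send(int* pointer) {
    // Release: writes made before this store become visible to a thread
    // whose acquire load reads the stored pointer.
    transfer.store(pointer, std::memory_order_release);
}

int* receive() {
    int* p = nullptr;
    // Acquire: pairs with the release store in send().
    while ((p = transfer.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    return p;
}

int run_handoff() {
    int answer = 0;
    int seen = 0;
    std::thread a([&seen] { seen = *receive(); });              // A1, A2
    std::thread b([&answer] { answer = 42; send(&answer); });   // B1, B2
    a.join();
    b.join();
    transfer.store(nullptr);  // do not leave a dangling pointer behind
    return seen;
}
```

With this pairing B1 happens before A2, so the two accesses to answer no longer form a data race.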
Further Information
I'm a bit surprised that these basics are not well known. The definitive source of information is the section Multi-threaded executions and data races in the C++11 standard. However, the specification is difficult to understand.
A good starting point is Hans Boehm's talks, available e.g. as online videos:
Threads and Shared Variables in C++11
Getting C++ Threads Right
There are also a lot of other good resources, I have mentioned elsewhere, e.g.:
std::memory_order - cppreference.com
There is no parallel access to the same data, so there is no problem:
Thread 1 starts execution of Obj::Obj().
Thread 1 finishes execution of Obj::Obj().
Thread 1 passes reference to the memory occupied by obj to thread 2.
Thread 1 never does anything else with that memory (soon after, it falls into infinite loop).
Thread 2 picks-up the reference to memory occupied by obj.
Thread 2 presumably does something with it, undisturbed by thread 1 which is still infinitely looping.
The only potential problem is if Send didn't act as a memory barrier, but then it wouldn't really be a "legal transfer device".
As others have alluded to, the only way in which a constructor is not thread-safe is if something somehow gets a pointer or reference to it before the constructor is finished, and the only way that would occur is if the constructor itself has code that registers the this pointer to some type of container which is shared across threads.
Now in your specific example, Branko Dimitrijevic gave a good complete explanation how your case is fine. But in the general case, I'd say to not use something until the constructor is finished, though I don't think there's anything "special" that doesn't happen until the constructor is finished. By the time it enters the (last) constructor in an inheritance chain, the object is pretty much fully "good to go" with all of its member variables being initialized, etc. So no worse than any other critical section work, but another thread would need to know about it first, and the only way that happens is if you're sharing this in the constructor itself somehow. So only do that as the "last thing" if you are.
It is only safe (sort of) if you wrote both threads, and know the first thread is not accessing it while the second thread is. For example, if the thread constructing it never accesses it after passing the reference/pointer, you would be OK. Otherwise it is thread unsafe. You could change that by making all methods that access data members (read or write) acquire a lock.
I hadn't read this question until now... I will still post my comments:
Static Local Variable
There is a reliable way to construct objects in a multi-threaded environment: using a static local variable (static local variable-CppCoreGuidelines).
From the above reference: "This is one of the most effective solutions to problems related to initialization order. In a multi-threaded environment the initialization of the static object does not introduce a race condition (unless you carelessly access a shared object from within its constructor)."
Also note from the reference, if the destruction of X involves an operation that needs to be synchronized you can create the object on the heap and synchronize when to call the destructor.
Below is an example I wrote to show the Construct On First Use Idiom, which is basically what the reference talks about.
#include <chrono>
#include <iostream>
#include <thread>
#include <utility>
#include <vector>
class ThreadConstruct
{
public:
ThreadConstruct(int a, float b) : _a{a}, _b{b}
{
std::cout << "ThreadConstruct construct start" << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(2));
std::cout << "ThreadConstruct construct end" << std::endl;
}
void get()
{
std::cout << _a << " " << _b << std::endl;
}
private:
int _a;
float _b;
};
struct Factory
{
template<class T, typename ...ARGS>
static T& get(ARGS&&... args)
{
//thread safe object instantiation
static T instance(std::forward<ARGS>(args)...);
return instance;
}
};
//thread pool
class Threads
{
public:
Threads()
{
for (size_t num_threads = 0; num_threads < 5; ++num_threads) {
thread_pool.emplace_back(&Threads::run, this);
}
}
void run()
{
//thread safe constructor call
ThreadConstruct& thread_construct = Factory::get<ThreadConstruct>(5, 10.1);
thread_construct.get();
}
~Threads()
{
for(auto& x : thread_pool) {
if(x.joinable()) {
x.join();
}
}
}
private:
std::vector<std::thread> thread_pool;
};
int main()
{
Threads thread;
return 0;
}
Output:
ThreadConstruct construct start
ThreadConstruct construct end
5 10.1
5 10.1
5 10.1
5 10.1
5 10.1
There is a scenario that I need to solve with the shared_ptr and weak_ptr smart pointers.
Two threads, thread 1 and thread 2, are using a shared object called A. Each of the threads has a reference to that object. Thread 1 decides to delete object A, but at the same time thread 2 might be using it. If I use shared_ptr to hold object A's references in each thread, the object won't get deleted at the right time.
What should I do to be able to delete the object when it's supposed to be deleted, and prevent an error in other threads that are using that object at the same time?
There are two cases:
One thread owns the shared data
If thread1 is the "owner" of the object and thread2 needs to just use it, store a weak_ptr in thread2. Weak pointers do not participate in reference counting, instead they provide a way to access a shared_ptr to the object if the object still exists. If the object doesn't exist, weak_ptr will return an empty/null shared_ptr.
Here's an example:
class CThread2
{
private:
boost::weak_ptr<T> weakPtr;
public:
void SetPointer(boost::shared_ptr<T> ptrToAssign)
{
weakPtr = ptrToAssign;
}
void UsePointer()
{
boost::shared_ptr<T> basePtr;
basePtr = weakPtr.lock();
if (basePtr)
{
// pointer was not deleted by thread a and still exists,
// so it can be used.
}
else
{
// thread1 must have deleted the pointer
}
}
};
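The same idea as a small self-contained sketch with the standard library types, in case you are not using Boost. Observer and read_or are names made up for this example, and int stands in for T.

```cpp
#include <memory>

class Observer {
public:
    explicit Observer(const std::shared_ptr<int>& owner) : weak_(owner) {}

    // Returns the observed value, or fallback if the owner has already
    // released the object.
    int read_or(int fallback) const {
        // lock() either yields a shared_ptr that keeps the object alive for
        // the duration of this call, or an empty shared_ptr.
        if (std::shared_ptr<int> p = weak_.lock()) {
            return *p;
        }
        return fallback;
    }

private:
    std::weak_ptr<int> weak_;  // does not keep the object alive
};
```

While the temporary shared_ptr returned by lock() exists, the owning thread resetting its pointer will not destroy the object, so the object can outlive the owner's reset by the length of one read_or call.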
My answer to this question (link) might also be useful.
The data is truly owned by both
If either of your threads can perform the deletion, then you cannot use what I describe above. Since both threads need to know the state of the pointer in addition to the underlying object, this may be a case where a "pointer to a pointer" is useful.
boost::shared_ptr< boost::shared_ptr<T> >
or (via a raw ptr)
shared_ptr<T>* sharedObject;
or just
T** sharedObject;
Why is this useful?
- You only have one referrer to T (in fact shared_ptr is pretty redundant)
- Both threads can check the status of the single shared pointer (is it NULL? Was it deleted by the other thread?)
Pitfalls:
- Think about what happens when both sides try to delete at the same time, you may need to lock this pointer
Revised Example:
class CAThread
{
private:
boost::shared_ptr<T>* sharedMemory;
public:
void SetPointer(boost::shared_ptr<T>* ptrToAssign)
{
assert(ptrToAssign != NULL);
sharedMemory = ptrToAssign;
}
void UsePointer()
{
// lock as needed
if (sharedMemory->get() != NULL)
{
// pointer was not deleted by thread a and still exists,
// so it can be used.
}
else
{
// other thread must have deleted the pointer
}
}
void AssignToPointer()
{
// lock as needed
sharedMemory->reset(new T);
}
void DeletePointer()
{
// lock as needed
sharedMemory->reset();
}
};
I'm ignoring all the concurrency issues with the underlying data, but that's not really what you're asking about.
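To show what "lock as needed" could mean for the pointer itself, here is one possible sketch where the single shared pointer and its mutex live together in one object that both threads reference. SharedSlot and its member names are hypothetical, and this uses std::shared_ptr and std::mutex instead of Boost.

```cpp
#include <memory>
#include <mutex>

template <typename T>
class SharedSlot {
public:
    // Returns a copy of the pointer; empty if the object was deleted.
    // The caller's copy keeps the object alive while it is being used.
    std::shared_ptr<T> get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ptr_;
    }

    void assign(std::shared_ptr<T> p) {
        std::lock_guard<std::mutex> lock(mutex_);
        ptr_.swap(p);  // the previous value leaves in p, destroyed after unlock
    }

    void reset() {
        std::shared_ptr<T> old;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            old.swap(ptr_);
        }
        // old is released here, outside the lock.
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<T> ptr_;
};
```

Either thread can call reset(); if both do so at the same time, the second one simply swaps out an already empty pointer. Note that a reader holding a copy from get() delays the actual destruction until it lets go of that copy.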
Qt has a QPointer class that does this. The pointers are automatically set to 0 if what they're pointed at is deleted.
(Of course, this would only work if you're interested in integrating Qt into your project.)