I am having a hard time understanding memory barriers and why barriers are needed in the following code? (Taken from the wikipedia about double checked locking)
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;
Singleton* Singleton::getInstance() {
Singleton* tmp = m_instance.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire); // <- 1
if (tmp == nullptr) {
std::lock_guard<std::mutex> lock(m_mutex);
tmp = m_instance.load(std::memory_order_relaxed);
if (tmp == nullptr) {
tmp = new Singleton;
std::atomic_thread_fence(std::memory_order_release); // <- 2
m_instance.store(tmp, std::memory_order_relaxed);
}
}
return tmp;
}
Why does fence 2 exist? Doesn't the lock make sure that access to the m_instance is atomic and will not be affected by code reordering?
Also could someone please give me an example of a race condition that would be present if the barriers were removed and the m_instance variable were not an std::atomic?
Related
libc++ shared_ptr implementation release() for the sake of simplicity can be depicted as:
void release()
{
if (ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
{
delete this;
}
}
Why doesn't libc++ split it into release decrement and acquire fence?
void release()
{
if (ref_count.fetch_sub(1, std::memory_order_release) == 1)
{
std::atomic_thread_fence(std::memory_order_acquire);
delete this;
}
}
as Boost recommends, which looks superior as it doesn't impose acquire mem order for all but the last decrement.
I have known that mutex can also bring the effect as memory barrier from here: Can mutex replace memory barriers, but I always see there is an memory barrier using in c++ singleton example as below, is the memory barrier unnecessary?
Singleton* Singleton::getInstance() {
Singleton* tmp = m_instance.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);
if (tmp == nullptr) {
std::lock_guard<std::mutex> lock(m_mutex); // using mutex here
tmp = m_instance.load(std::memory_order_relaxed);
if (tmp == nullptr) {
tmp = new Singleton;
assert(tmp != nullptr);
std::atomic_thread_fence(std::memory_order_release); // using memory barrier here
m_instance.store(tmp, std::memory_order_relaxed);
}
}
return tmp;
}
If you can use C++11, you do not need to program your own protection.
As also referenced here, all the needed stuff is already part of C++11. Copied from there:
For the singleton pattern, double-checked locking is not needed:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
— § 6.7 [stmt.dcl] p4
Singleton& GetInstance() {
static Singleton s;
return s;
}
The implementation will provide a memory barrier or whatever to protect your concurrent access. So keep it simple as given in the example!
So I did some reading: https://en.wikipedia.org/wiki/Double-checked_locking and http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/ . I found this code for using it
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;
Singleton* Singleton::getInstance() {
Singleton* tmp = m_instance.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_acquire);
if (tmp == nullptr) {
std::lock_guard<std::mutex> lock(m_mutex);
tmp = m_instance.load(std::memory_order_relaxed);
if (tmp == nullptr) {
tmp = new Singleton;
std::atomic_thread_fence(std::memory_order_release);
m_instance.store(tmp, std::memory_order_relaxed);
}
}
return tmp;
}
and there is one thing that is not clear to me. Does it work differently than following code without fences?
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;
Singleton* Singleton::getInstance() {
Singleton* tmp = m_instance.load(std::memory_order_acquire);
if (tmp == nullptr) {
std::lock_guard<std::mutex> lock(m_mutex);
tmp = m_instance.load(std::memory_order_acquire);
if (tmp == nullptr) {
tmp = new Singleton;
m_instance.store(tmp, std::memory_order_release);
}
}
return tmp;
}
What I mean if I replace fences with appropriate memory order in load/store, does it work the same?
The difference between the two constructs is explained in a follow-up article on the same site: Acquire and Release Fences Don't Work the Way You'd Expect. Basically, the fence guarantees that all the atomic stores after the fence will be visible "not before" all the stores before the fence. The store with memory_order_release parameter only makes such a guarantee for the stores to the variable concerned by the store instruction.
In your example, you only have one atomic, m_instance, so the two constructs are functionally equivalent and the one without the fences is probably more performant.
In C++ and the Perils of Double-Checked Locking, the authors give an example on how to implement the pattern correctly.
Singleton* Singleton::instance () {
Singleton* tmp = pInstance;
... // insert memory barrier (1)
if (tmp == 0) {
Lock lock;
tmp = pInstance;
if (tmp == 0) {
tmp = new Singleton;
... // insert memory barrier (2)
pInstance = tmp;
}
}
return tmp;
}
What I couldn't figure out, though, is if the first memory barrier must be after Singleton* tmp = pInstance;? (EDIT: To be clear, I understand that the barrier is needed. What I don't understand is if it must come after assigning tmp) If so why? Is the following not valid?
Singleton* Singleton::instance () {
... // insert memory barrier (1)
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* tmp = new Singleton;
... // insert memory barrier (2)
pInstance = tmp;
}
}
return pInstance;
}
It is essential. Otherwise, reads that occur after the if may be prefetched by the CPU before the copy, which would be a disaster. In the case where pInstance is not NULL and we don't acquire any locks, you must guarantee that reads that occur after the read of pInstance in the code are not re-ordered to before the read of pInstance.
Consider:
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return tmp->foo;
What happens if the CPU reads tmp->foo before tmp? For example, the CPU could optimize this to:
bool loaded = false;
int return_value = 0;
if (pInstance != NULL)
{ // do the fetch early
return_value = pInstance->foo;
loaded = true;
}
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return loaded ? return_value : tmp->foo;
Notice what this does? The read of tmp->foo has now moved to before the check if the pointer is non-NULL. This is a perfectly legal memory prefetch optimization (speculative read) that a CPU might do. But it's absolutely disastrous to the logic of double checked locking.
It is absolutely vital that code after the if (tmp == 0) not prefetch anything from before we see pInstance as non-NULL. So you need something to prevent the CPU from reorganizing the code's memory operations as above. A memory barrier does this.
Why are you still talking about the paper from 2004? C++ 11 guarantees static variables are initialized only once. Here is your fullly-working, 100% correct singleton (which, of course, is an anti-pattern on it's own):
static TheTon& TheTon::instance() {
static TheTon ton;
return ton;
}
I have been reading about thread safe singletons and the implementation I find everywhere has a getInstance() method something like this:
Singleton* getInstance()
{
if ( !initialized )
{
lock();
if ( !initialized )
{
instance = new Singleton();
initialized = true;
}
unlock();
}
return instance;
}
Is this actually thread safe?
Have I missed something or is there a small chance this function will return an uninitialized instance because 'initialized' may be reordered and set before instance?
This article is on a slightly different topic but the top answer describes why I think the above code is not thread safe:
Why is volatile not considered useful in multithreaded C or C++ programming?
Not a good idea. Look for double check locking. For instance:
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405726
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405772
It is indeed not thread safe, because after the pointer gets returned you still work with it, although the mutex is unlocked again.
What you can do is making the child class which inherits from singleton, thread safe. Then you're good to go.
Below is the code for a thread-safe singleton, using Double Check and temporary variable. A temporary variable is used to construct the object completely first and then assign it to pInstance.
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}