Does a C++ singleton need a memory barrier when using a mutex?

I know that a mutex can also act as a memory barrier (see Can mutex replace memory barriers), but I always see a memory barrier used in C++ singleton examples like the one below. Is the memory barrier unnecessary?
Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex); // using mutex here
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            assert(tmp != nullptr);
            std::atomic_thread_fence(std::memory_order_release); // using memory barrier here
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}

If you can use C++11, you do not need to write your own protection.
As also referenced here, everything needed is already part of C++11. Copied from there:
For the singleton pattern, double-checked locking is not needed:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
— § 6.7 [stmt.dcl] p4
Singleton& GetInstance() {
    static Singleton s;
    return s;
}
The implementation will provide whatever memory barriers are needed to protect your concurrent access. So keep it simple, as in the example!

Related

Memory barriers and the singleton pattern

I am having a hard time understanding memory barriers, and why barriers are needed in the following code (taken from the Wikipedia article on double-checked locking):
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire); // <- 1
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release); // <- 2
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
Why does fence 2 exist? Doesn't the lock ensure that access to m_instance is atomic and not affected by code reordering?
Also, could someone please give me an example of a race condition that would occur if the barriers were removed and m_instance were not a std::atomic?

Double checked locking: Fences and atomics

So I did some reading: https://en.wikipedia.org/wiki/Double-checked_locking and http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/. I found this code there:
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release);
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
and there is one thing that is not clear to me. Does it work differently than following code without fences?
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_acquire);
        if (tmp == nullptr) {
            tmp = new Singleton;
            m_instance.store(tmp, std::memory_order_release);
        }
    }
    return tmp;
}
What I mean is: if I replace the fences with the appropriate memory orders on the load/store, does it work the same?
The difference between the two constructs is explained in a follow-up article on the same site: Acquire and Release Fences Don't Work the Way You'd Expect. Basically, the fence guarantees that all atomic stores issued after the fence become visible "not before" all the stores issued before it. A store with the memory_order_release parameter makes that guarantee only for the variable targeted by that store.
In your example, you only have one atomic, m_instance, so the two constructs are functionally equivalent and the one without the fences is probably more performant.

Acquire barrier in the double checked locking pattern

In C++ and the Perils of Double-Checked Locking, the authors give an example on how to implement the pattern correctly.
Singleton* Singleton::instance () {
    Singleton* tmp = pInstance;
    ... // insert memory barrier (1)
    if (tmp == 0) {
        Lock lock;
        tmp = pInstance;
        if (tmp == 0) {
            tmp = new Singleton;
            ... // insert memory barrier (2)
            pInstance = tmp;
        }
    }
    return tmp;
}
What I couldn't figure out, though, is whether the first memory barrier must come after Singleton* tmp = pInstance;. (EDIT: To be clear, I understand that the barrier is needed. What I don't understand is whether it must come after assigning tmp.) If so, why? Is the following not valid?
Singleton* Singleton::instance () {
    ... // insert memory barrier (1)
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            Singleton* tmp = new Singleton;
            ... // insert memory barrier (2)
            pInstance = tmp;
        }
    }
    return pInstance;
}
It is essential. Otherwise, reads that occur after the if may be speculatively executed by the CPU before the read of pInstance, which would be a disaster. In the case where pInstance is not NULL and we don't acquire any locks, you must guarantee that reads occurring after the read of pInstance in the code are not reordered to before the read of pInstance.
Consider:
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return tmp->foo;
What happens if the CPU reads tmp->foo before tmp? For example, the CPU could optimize this to:
bool loaded = false;
int return_value = 0;
if (pInstance != NULL) { // do the fetch early
    return_value = pInstance->foo;
    loaded = true;
}
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return loaded ? return_value : tmp->foo;
Notice what this does? The read of tmp->foo has now moved to before the check if the pointer is non-NULL. This is a perfectly legal memory prefetch optimization (speculative read) that a CPU might do. But it's absolutely disastrous to the logic of double checked locking.
It is absolutely vital that code after the if (tmp == 0) not prefetch anything from before we see pInstance as non-NULL. So you need something to prevent the CPU from reorganizing the code's memory operations as above. A memory barrier does this.
Why are you still talking about a paper from 2004? C++11 guarantees that static local variables are initialized only once. Here is your fully working, 100% correct singleton (which, of course, is an anti-pattern in its own right):
TheTon& TheTon::instance() {
    static TheTon ton;
    return ton;
}

Mutex when returning object value

If I understand how C++ compilers handle local variables then IsShutdownInProgress() does not need any locking since the shutdownInProgress static variable will be placed on the stack. Am I correct?
class MyClass
{
private:
    // Irrelevant code commented away
    static pthread_mutex_t mutex;
    static bool shutdownInProgress;
public:
    static void ShutdownIsInProgress()
    {
        pthread_mutex_lock(&mutex);
        shutdownInProgress = true;
        pthread_mutex_unlock(&mutex);
    }
    static bool IsShutdownInProgress()
    {
        // pthread_mutex_lock(&mutex);
        // pthread_mutex_unlock(&mutex);
        return shutdownInProgress;
    }
};
Am I correct?
No. This will make a copy of the variable to return, but reading it to make that copy without synchronisation is a data race, with undefined behaviour. You'll need to make a local copy of it with the mutex locked:
static bool IsShutdownInProgress()
{
    pthread_mutex_lock(&mutex);
    bool result = shutdownInProgress;
    pthread_mutex_unlock(&mutex);
    return result;
}
or, using a less error-prone RAII lock type:
static bool IsShutdownInProgress()
{
    lock_guard lock(mutex);
    return shutdownInProgress;
}
In C++11, you might consider std::atomic<bool> for more convenient, and perhaps more efficient, access to simple types from multiple threads.
Race conditions are unrelated to whether a variable is located on the heap or on the stack. A race condition is when one thread is modifying a variable (a memory location) and another thread is reading or modifying the same variable. There is no guarantee that the modification of a bool is atomic so the posted code has a race condition, and therefore undefined behaviour.
A fix would be to store the value of the bool when the mutex is held and return the variable:
static bool IsShutdownInProgress()
{
    pthread_mutex_lock(&mutex);
    bool result = shutdownInProgress;
    pthread_mutex_unlock(&mutex);
    return result;
}
C++11 introduced std::mutex and std::lock_guard, which could be used here; lock_guard avoids the need for a temporary variable to hold the bool value for the return:
static std::mutex mtx_;
static bool IsShutdownInProgress()
{
    std::lock_guard<std::mutex> lk(mtx_);
    return shutdownInProgress;
}
C++11 also introduced std::atomic<>, which ensures the modification is atomic and avoids the need for an explicit lock:
static std::atomic<bool> shutdownInProgress;
static bool IsShutdownInProgress()
{
    return shutdownInProgress;
}
If C++11 is unavailable, boost::atomic was introduced in Boost 1.53.0, and Boost also has the equivalent boost::mutex and boost::lock_guard.
Yes, it needs a lock.
C++11's memory model says you have a data race if one thread writes a value while another thread reads it, because a read and a write may each be non-atomic.
In this case the function returns a local copy, but to produce that copy the compiler must read shutdownInProgress, which may be concurrently modified by another thread calling ShutdownIsInProgress().
An easy way to solve this is to make shutdownInProgress an atomic:
static std::atomic<bool> shutdownInProgress;
If you make it atomic, you don't need any locks at all for either function.

Thread safe singleton in C++

I have been reading about thread safe singletons and the implementation I find everywhere has a getInstance() method something like this:
Singleton* getInstance()
{
    if ( !initialized )
    {
        lock();
        if ( !initialized )
        {
            instance = new Singleton();
            initialized = true;
        }
        unlock();
    }
    return instance;
}
Is this actually thread safe?
Have I missed something or is there a small chance this function will return an uninitialized instance because 'initialized' may be reordered and set before instance?
This article is on a slightly different topic but the top answer describes why I think the above code is not thread safe:
Why is volatile not considered useful in multithreaded C or C++ programming?
Not a good idea. Look up double-checked locking. For instance:
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405726
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405772
It is indeed not thread safe: after the pointer is returned you still work with the object, even though the mutex has been unlocked again.
What you can do is make the child class that inherits from the singleton thread-safe. Then you're good to go.
Below is code for a thread-safe singleton using double-checked locking and a temporary variable. The temporary is used so the object is fully constructed before being assigned to pInstance.
Singleton* Singleton::instance() {
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            Singleton* temp = new Singleton; // initialize to temp
            pInstance = temp; // assign temp to pInstance
        }
    }
    return pInstance;
}