In C++ and the Perils of Double-Checked Locking, the authors give an example of how to implement the pattern correctly:
Singleton* Singleton::instance () {
    Singleton* tmp = pInstance;
    ... // insert memory barrier (1)
    if (tmp == 0) {
        Lock lock;
        tmp = pInstance;
        if (tmp == 0) {
            tmp = new Singleton;
            ... // insert memory barrier (2)
            pInstance = tmp;
        }
    }
    return tmp;
}
What I couldn't figure out, though, is whether the first memory barrier must come after Singleton* tmp = pInstance;. (EDIT: To be clear, I understand that the barrier is needed. What I don't understand is whether it must come after the assignment to tmp.) If so, why? Is the following not valid?
Singleton* Singleton::instance () {
    ... // insert memory barrier (1)
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            Singleton* tmp = new Singleton;
            ... // insert memory barrier (2)
            pInstance = tmp;
        }
    }
    return pInstance;
}
It is essential. Otherwise, reads that occur after the if may be prefetched by the CPU before the read of pInstance, which would be a disaster. In the case where pInstance is not NULL and we don't acquire any locks, you must guarantee that reads occurring after the read of pInstance in the code are not reordered to before it.
Consider:
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return tmp->foo;
What happens if the CPU reads tmp->foo before tmp? For example, the CPU could optimize this to:
bool loaded = false;
int return_value = 0;
if (pInstance != NULL) { // do the fetch early
    return_value = pInstance->foo;
    loaded = true;
}
Singleton* tmp = pInstance;
if (tmp == 0) { ... }
return loaded ? return_value : tmp->foo;
Notice what this does? The read of tmp->foo has now moved to before the check of whether the pointer is non-NULL. This is a perfectly legal memory-prefetch optimization (a speculative read) that a CPU might do. But it's absolutely disastrous for the logic of double-checked locking.
It is absolutely vital that code after the if (tmp == 0) not prefetch anything until after we have seen pInstance as non-NULL. So you need something to prevent the CPU from reordering the code's memory operations as above. A memory barrier does this.
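In C++11 terms, the required placement looks like this (a minimal sketch, not from the paper; it assumes pInstance has been changed to a std::atomic<Singleton*> so the relaxed load is well-defined, and read_foo is a hypothetical caller):
#include <atomic>

struct Singleton { int foo; };              // stand-in for the real class
std::atomic<Singleton*> pInstance{nullptr};

int read_foo() {
    Singleton* tmp = pInstance.load(std::memory_order_relaxed); // read the pointer first
    std::atomic_thread_fence(std::memory_order_acquire);        // barrier (1): after the read
    // The acquire fence forbids the read of tmp->foo below from being
    // reordered to before the read of pInstance, defeating the
    // speculative prefetch shown above.
    return tmp ? tmp->foo : 0;
}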
Why are you still talking about the paper from 2004? C++11 guarantees that static local variables are initialized exactly once. Here is your fully working, 100% correct singleton (which, of course, is an anti-pattern in its own right):
TheTon& TheTon::instance() { // note: no `static` on the out-of-class definition
    static TheTon ton;       // initialized exactly once, even under concurrency
    return ton;
}
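For context, the matching class declaration would look something like this (a sketch; everything except instance() is an assumption). The `static` keyword appears only here, on the in-class declaration, which is why the definition above was written without it:
class TheTon {
public:
    static TheTon& instance();      // `static` belongs on the declaration
private:
    TheTon() = default;             // constructors stay private so callers
    TheTon(const TheTon&) = delete; // cannot make extra instances
    TheTon& operator=(const TheTon&) = delete;
};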
I know that a mutex can also act as a memory barrier (from here: Can mutex replace memory barriers), but I always see a memory barrier used in the C++ singleton example below. Is the memory barrier unnecessary?
Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex); // using mutex here
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            assert(tmp != nullptr);
            std::atomic_thread_fence(std::memory_order_release); // using memory barrier here
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
If you can use C++11, you do not need to implement your own protection.
As also referenced here, everything needed is already part of C++11. Copied from there:
For the singleton pattern, double-checked locking is not needed:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
— § 6.7 [stmt.dcl] p4
Singleton& GetInstance() {
    static Singleton s;
    return s;
}
The implementation will provide a memory barrier or whatever else is needed to protect your concurrent access. So keep it simple, as in the example!
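As a usage illustration, concurrent first calls are safe without any locking on the caller's side (a minimal sketch; the empty Singleton body is a stand-in):
#include <thread>

class Singleton {};                // stand-in body

Singleton& GetInstance() {
    static Singleton s;            // the runtime serializes this initialization
    return s;
}

int main() {
    // Both threads may race to arrive first; C++11 guarantees exactly one
    // initialization and blocks the loser until it has completed.
    std::thread t1([] { GetInstance(); });
    std::thread t2([] { GetInstance(); });
    t1.join();
    t2.join();
}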
I was looking at a lock-free single-producer/single-consumer circular buffer when I started thinking about speculative execution and its effect on this simple code.
With this implementation, only one unique thread may call the push() function and another unique thread may call the pop() function.
Here is the Producer code:
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); //(1)
    const auto next_tail = increment(current_tail);
    if(next_tail != _head.load(std::memory_order_acquire)) //(2)
    {
        _array[current_tail] = item; //(3)
        _tail.store(next_tail, std::memory_order_release); //(4)
        return true;
    }
    return false; // full queue
}
Here is the Consumer code:
bool pop(Element& item)
{
    const auto current_head = _head.load(std::memory_order_relaxed); //(1)
    if(current_head == _tail.load(std::memory_order_acquire)) //(2)
        return false; // empty queue
    item = _array[current_head]; //(3)
    _head.store(increment(current_head), std::memory_order_release); //(4)
    return true;
}
The Question
What if push() were compiled into the following function because of speculative execution?
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); // 1
    const auto next_tail = increment(current_tail);
    // The load is performed before the test; that much is valid
    const auto head = _head.load(std::memory_order_acquire);
    // Here is the speculation: the CPU speculates that the test will succeed,
    // so the store happens speculatively AND it respects the memory order
    // due to the read-acquire
    _array[current_tail] = item;
    _tail.store(next_tail, std::memory_order_release);
    // Note that in this case the test checks whether it has to roll the memory back
    if(next_tail == head) // the code was next_tail != _head.load(std::memory_order_acquire)
    {
        // We roll the memory back, but pop() may have been called in the
        // meantime and seen the invalid speculative state
        _array[current_tail - 1] = item;
        _tail.store(next_tail - 1, std::memory_order_release);
        return false; // full queue
    }
    return true;
}
To me, for this to be perfectly valid, the push function should make sure the barrier is issued only after the condition succeeds:
bool push(const Element& item)
{
    const auto current_tail = _tail.load(std::memory_order_relaxed); // 1
    const auto next_tail = increment(current_tail);
    if(next_tail != _head.load(std::memory_order_relaxed)) // 2
    {
        // Here we are sure that nothing can be reordered before the condition
        std::atomic_thread_fence(std::memory_order_acquire); // 2.1
        _array[current_tail] = item; // 3
        _tail.store(next_tail, std::memory_order_release); // 4
        return true;
    }
    return false; // full queue
}
re: your proposed reordering: no, the compiler can't invent writes to atomic variables.
Runtime speculation also can't invent writes that actually become visible to other threads. It can put whatever it wants in its own private store buffer, but the correctness of earlier branches must be checked before a store can become visible to other threads.
Normally this works by in-order retirement: an instruction can only retire (become non-speculative) once all previous instructions are retired/non-speculative. A store can't commit from the store buffer to L1d cache until after the store instruction retires.
re: the title: no, speculative execution still has to respect the memory model. If a CPU wants to speculatively load past an incomplete acquire-load, it can, but only if it checks to make sure those load results are still valid when they're "officially" allowed to happen.
x86 CPUs do in practice do this, because the strong x86 memory model means that all loads are acquire-loads, so any out-of-order loading has to be speculative and rolled back if it's not valid. (This is why you can get memory-order mis-speculation pipeline nukes.)
So asm works the way the ISA rules say it works, and C++ compilers know that. Compilers use this to implement the C++ memory model on top of the target ISA.
If you do an acquire-load in C++, it really works as an acquire-load.
You can mentally model your logic for possible compile-time + run-time reordering according to the C++ reordering rules as written. See http://preshing.com/20120913/acquire-and-release-semantics/.
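As a concrete model of that rule, here is the classic acquire/release pairing (a minimal sketch of my own, not from the linked article; the names payload and ready are assumptions):
#include <atomic>

int payload = 0;                   // plain, non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                     // A
    ready.store(true, std::memory_order_release);     // B: A cannot sink below B
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {} // C: D cannot hoist above C
    // Because B synchronizes-with C, this read is guaranteed to see 42,
    // no matter how aggressively the hardware speculates underneath.
    int v = payload;                                  // D
    (void)v;
}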
I am having a hard time understanding memory barriers and why they are needed in the following code (taken from the Wikipedia article on double-checked locking):
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire); // <- 1
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release); // <- 2
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
Why does fence 2 exist? Doesn't the lock make sure that access to m_instance is atomic and will not be affected by reordering?
Also, could someone please give me an example of a race condition that would be present if the barriers were removed and m_instance were not a std::atomic?
So I did some reading: https://en.wikipedia.org/wiki/Double-checked_locking and http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/. I found this code there:
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_relaxed);
        if (tmp == nullptr) {
            tmp = new Singleton;
            std::atomic_thread_fence(std::memory_order_release);
            m_instance.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
and there is one thing that is not clear to me: does it work differently than the following code without fences?
std::atomic<Singleton*> Singleton::m_instance;
std::mutex Singleton::m_mutex;

Singleton* Singleton::getInstance() {
    Singleton* tmp = m_instance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(m_mutex);
        tmp = m_instance.load(std::memory_order_acquire);
        if (tmp == nullptr) {
            tmp = new Singleton;
            m_instance.store(tmp, std::memory_order_release);
        }
    }
    return tmp;
}
What I mean is: if I replace the fences with the appropriate memory order on the loads/stores, does it work the same?
The difference between the two constructs is explained in a follow-up article on the same site: Acquire and Release Fences Don't Work the Way You'd Expect. Basically, the release fence guarantees that all atomic stores after the fence become visible "not before" all the stores before the fence. A store with the memory_order_release parameter only makes that guarantee for the one variable being stored.
In your example, you only have one atomic, m_instance, so the two constructs are functionally equivalent, and the one without the fences is probably more performant.
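To make the distinction concrete, here is a minimal sketch of my own (the names a, b, and payload are assumptions, not from the article): the fence orders payload before every later store, while the release store covers only itself.
#include <atomic>

std::atomic<int> a{0}, b{0};
int payload = 0;

// Fence version: the release fence orders `payload = 42` before *all*
// subsequent atomic stores, so a thread that acquire-reads either `a`
// or `b` and sees 1 is guaranteed to also see payload == 42.
void publish_with_fence() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);
    a.store(1, std::memory_order_relaxed);
    b.store(1, std::memory_order_relaxed);
}

// Release-store version: only the store to `a` carries release semantics;
// acquire-reading `b` alone establishes no ordering with `payload`.
void publish_with_release_store() {
    payload = 42;
    a.store(1, std::memory_order_release);
    b.store(1, std::memory_order_relaxed);
}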
I have been reading about thread-safe singletons, and the implementation I find everywhere has a getInstance() method that looks something like this:
Singleton* getInstance()
{
    if ( !initialized )
    {
        lock();
        if ( !initialized )
        {
            instance = new Singleton();
            initialized = true;
        }
        unlock();
    }
    return instance;
}
Is this actually thread safe?
Have I missed something or is there a small chance this function will return an uninitialized instance because 'initialized' may be reordered and set before instance?
This article is on a slightly different topic but the top answer describes why I think the above code is not thread safe:
Why is volatile not considered useful in multithreaded C or C++ programming?
Not a good idea. Look up double-checked locking. For instance:
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405726
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405772
It is indeed not thread-safe, because after the pointer is returned you keep working with the object even though the mutex has been unlocked again.
What you can do is make the derived class, the one that inherits from the singleton, thread-safe. Then you're good to go.
Below is the code for a thread-safe singleton using double-checked locking and a temporary variable. The temporary variable is used to construct the object completely first and only then assign it to pInstance.
Singleton* Singleton::instance() {
    if (pInstance == 0) {
        Lock lock;
        if (pInstance == 0) {
            Singleton* temp = new Singleton; // initialize temp first
            pInstance = temp;                // then assign temp to pInstance
        }
    }
    return pInstance;
}
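Note that with a plain pointer, nothing stops the compiler or CPU from reordering the construction and the assignment, which is why the paper cited at the top places its barrier (2) exactly between those two steps. Here is a sketch of the same idea with that ordering made explicit (my assumptions: pInstance is changed to a std::atomic<Singleton*>, g_mutex stands in for Lock, and the empty Singleton body is a placeholder):
#include <atomic>
#include <mutex>

class Singleton {};                          // stand-in body
std::atomic<Singleton*> pInstance{nullptr};
std::mutex g_mutex;

Singleton* instance() {
    Singleton* tmp = pInstance.load(std::memory_order_acquire);
    if (tmp == nullptr) {
        std::lock_guard<std::mutex> lock(g_mutex);
        tmp = pInstance.load(std::memory_order_relaxed); // the mutex already orders this
        if (tmp == nullptr) {
            Singleton* temp = new Singleton;                  // construct fully first...
            pInstance.store(temp, std::memory_order_release); // ...then publish
            tmp = temp;
        }
    }
    return tmp;
}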