Double-checked locking in C++11? [duplicate] - c++

This question already has answers here:
Double-Checked Lock Singleton in C++11
(3 answers)
Closed 9 years ago.
Here is a Java example problem from http://www.ibm.com/developerworks/java/library/j-dcl/index.html
public static Singleton getInstance()
{
if (instance == null) //#4
{
synchronized(Singleton.class) { //#1
if (instance == null) //#2
instance = new Singleton(); //#3
}
}
return instance;
}
It seems this isn't safe because #3 can set instance to not null before the constructor is executed thus when another thread checks instance on #4 it will not be null and return a instance that hasn't been constructed properly.
Apparently using a function variable wont help because it may be optimized out or just be executed in a way that also sets the value to instance when we don't want it to.
I was thinking wouldn't the easiest way is to have a function new Singleton(); so it is completed before assigning it to instance. Now the issue is how do I tell C++ a function should NOT be inline? I think Singleton* make_singleton() volatile should do that but i'm positive I am wrong.

I'll ignore the singleton bits for a while and assume you need this for lazy initialization and not for silly things like singletons.
I suggest forgetting about double-checked locking. C++ provides a very useful tool for this kind of situation in the form of std::call_once, so use that.
template <typename T>
struct lazy {
public:
// needs constraining to prevent from doing copies
// see: http://flamingdangerzone.com/cxx11/2012/06/05/is_related.html
template <typename Fun>
explicit lazy(Fun&& fun) : fun(std::forward<Fun>(fun)) {}
T& get() const {
std::call_once(flag, [this] { ptr.reset(fun()); });
return *ptr;
}
// more stuff like op* and op->, implemented in terms of get()
private:
std::once_flag flag;
std::unique_ptr<T> ptr;
std::function<T*()> fun;
};
// --- usage ---
lazy<foo> x([] { return new foo; });

This is exactly the type of situation atomics are designed for. By storing the result into an atomic, you know that the compiler can't sequence any critical stores or operations after the atomic is set. Atomics are both for emitting processor instruction primitives to ensure the necessary sequential consistency (e.g., for cache coherency across cores) and for telling the compiler which semantics must be preserved (and therefore to limit the types of reorderings it can do). If you use an atomic here, it won't matter if the function is inlined because whatever inlining the compiler does will have to preserve the semantics of the atomic itself.
You may also be interested in looking into std::call_once which is also designed for this situation, and more specifically for the situation where multiple threads may need something done, but exactly one of them should do it.

Related

How to create a singleton object using std::unique_ptr of a class with private destructor? [duplicate]

I've created all singletons in my program with that document in mind:
http://erdani.com/publications/DDJ_Jul_Aug_2004_revised.pdf
(in case anyone wondered why singleton, all of them are factories and some of them store some global settings concerning how they should create instances).
Each of them looks somehow like this:
declaration:
class SingletonAndFactory {
static SingletonAndFactory* volatile instance;
public:
static SingletonAndFactory& getInstance();
private:
SingletonAndFactory();
SingletonAndFactory(
const SingletonAndFactory& ingletonFactory
);
~SingletonAndFactory();
};
definition:
boost::mutex singletonAndFactoryMutex;
////////////////////////////////////////////////////////////////////////////////
// class SingletonAndFactory {
SingletonAndFactory* volatile singletonAndFactory::instance = 0;
// public:
SingletonAndFactory& SingletonAndFactory::getInstance() {
// Singleton implemented according to:
// "C++ and the Perils of Double-Checked Locking".
if (!instance) {
boost::mutex::scoped_lock lock(SingletonAndFactoryMutex);
if (!instance) {
SingletonAndFactory* volatile tmp = (SingletonAndFactory*) malloc(sizeof(SingletonAndFactory));
new (tmp) SingletonAndFactory; // placement new
instance = tmp;
}
}
return *instance;
}
// private:
SingletonAndFactory::SingletonAndFactory() {}
// };
Putting aside question what design of singleton is the best (since it would start a pointless flame war) my question is: would it benefit me to replace normal pointer with std::unique_ptr? In particular, would it call singleton's destructor on program exit? If so how would I achieve it? When I tried to add something like friend class std::unique_ptr<SingletonAndFactory>; it didn't worked out since compiler keep on complaining that the destructor is private.
I know it doesn't matter in my current project since none of factories have something that would require cleaning of any sort, but for future reference I would like to know how to implement such behavior.
It's not the unique_ptr itself that does the deletion, it's the deleter. So if you wanted to go with the friend approach, you'd have to do this:
friend std::unique_ptr<SingletonFactory>::deleter_type;
However, I don't think it's guaranteed that the default deleter will not delegate the actual delete to another function, which would break this.
Instead, you might want to supply your own deleter, perhaps like this:
class SingletonFactory {
static std::unique_ptr<SingletonFactory, void (*)(SingletonFactory*)> volatile instance;
public:
static SingletonFactory& getInstance();
private:
SingletonFactory();
SingletonFactory(
const SingletonFactory& ingletonFactory
);
~SingletonFactory();
void deleter(SingletonFactory *d) { d->~SingletonFactory(); free(d); }
};
And in the creation function:
SingletonFactory* volatile tmp = (SingletonFactory*) malloc(sizeof(SingletonFactory));
new (tmp) SingletonFactory; // placement new
instance = decltype(instance)(tmp, &deleter);
In C++11, you can guarantee thread-safe lazy initialisation and destruction at the end of the program using a local static:
SingletonAndFactory& SingletonAndFactory::getInstance() {
static SingletonAndFactory instance;
return instance;
}
Beware that this can still cause lifetime issues, as it may be destroyed before other static objects. If they try to access it from their destructors, then you'll be in trouble.
Before that, it was impossible (although the above was guaranteed by many compilers). As described in the document you link to, volatile has nothing to do with thread synchronisation, so your code has a data race and undefined behaviour. Options are:
Take the (potentially large) performance hit of locking to check the pointer
Use whatever non-portable atomic intrinsics your compiler provides to test the pointer
Forget about thread-safe initialisation, and make sure it's initialised before you start your threads
Don't use singletons
I favour the last option, since it solves all the other problems introduced by the Singleton anti-pattern.

C++11: Safe double checked locking for lazy initialization. Possible?

I have read many questions considering thread-safe double checked locking (for singletons or lazy init). In some threads, the answer is that the pattern is entirely broken, others suggest a solution.
So my question is: Is there a way to write a fully thread-safe double checked locking pattern in C++? If so, how does it look like.
We can assume C++11, if that makes things easier. As far as I know, C++11 improved the memory model which could yield the needed improvements.
I do know that it is possible in Java by making the double-check guarded variable volatile. Since C++11 borrowed large parts of the memory model from the one of Java, so I think it could be possible, but how?
Simply use a static local variable for lazily initialized Singletons, like so:
MySingleton* GetInstance() {
static MySingleton instance;
return &instance;
}
The (C++11) standard already guarantees that static variables are initialized in a threadsafe manner and it seems likely that the implementation of this at least as robust and performant as anything you'd write yourself.
The threadsafety of the initialization can be found in §6.7.4 of the (C++11) standard:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
Since you wanted to see a valid DCLP C++11 implementation, here is one.
The behavior is fully thread-safe and identical to GetInstance() in Grizzly's answer.
std::mutex mtx;
std::atomic<MySingleton *> instance_p{nullptr};
MySingleton* GetInstance()
{
auto *p = instance_p.load(std::memory_order_acquire);
if (!p)
{
std::lock_guard<std::mutex> lck{mtx};
p = instance_p.load(std::memory_order_relaxed);
if (!p)
{
p = new MySingleton;
instance_p.store(p, std::memory_order_release);
}
}
return p;
}

Should I use const to make objects thread-safe?

I wrote a class which instances may be accessed by several threads. I used a trick to remember users they have to lock the object before using it. It involves keeping only const instances. When in the need to read or modify sensitive data, other classes should call a method (which is const, thus allowed) to get a non-const version of the locked object. Actually it returns a proxy object containing a pointer to the non-const object and a scoped_lock, so it unlocks the object when going out of scope. The proxy object also overloads operator-> so the access to the object is transparent.
This way, shooting onself's foot by accessing unlocked objects is harder (there is always const_cast).
"Clever tricks" should be avoided, and this smells bad anyway.
Is this design really bad ?
What else can I or should I do ?
Edit: Getters are non-const to enforce locking.
Basic problem: a non-const reference may exist elsewhere. If that gets written safely, it does not follow that it can be read safely -- you may look at an intermediate state.
Also, some const methods might (legitimately) modify hidden internal details in a thread-unsafe way.
Analyse what you're actually doing to the object and find an appropriate synchronisation mode.
If your clever container really does know enough about the objects to control all their synchronisation via proxies, then make those objects private inner classes.
This is clever, but unfortunately doomed to fail.
The problem, underlined by spraff, is that you protect against reads but not against writes.
Consider the following sequence:
unsigned getAverageSalary(Employee const& e) {
return e.paid() / e.hired_duration();
}
What happens if we increment paid between the two function calls ? We get an incoherent value.
The problem is that your scheme does not explicitly enforce locking for reads.
Consider the alternative of a Proxy pattern: The object itself is a bundle of data, all privates. Only a Proxy class (friend) can read/write its data, and when initializing the Proxy it grabs the lock (on the mutex of the object) automatically.
class Data {
friend class Proxy;
Mutex _mutex;
int _bar;
};
class Proxy {
public:
Proxy(Data& data): _lock(data._mutex), _data(data) {}
int bar() const { return _data._bar; }
void bar(int b) { _data._bar = b; }
private:
Proxy(Proxy const&) = delete; // disable copy
Lock _lock;
Data& _data;
};
If I wanted to do what you are doing, I would do one of the following.
Method 1:
shared_mutex m; // somewhere outside the class
class A
{
private:
int variable;
public:
void lock() { m.lock(); }
void unlock() { m.unlock(); }
bool is_locked() { return m.is_locked(); }
bool write_to_var(int newvalue)
{
if (!is_locked())
return false;
variable = newvalue;
return true;
}
bool read_from_var(int *value)
{
if (!is_locked() || value == NULL)
return false;
*value = variable;
return true;
}
};
Method 2:
shared_mutex m; // somewhere outside the class
class A
{
private:
int variable;
public:
void write_to_var(int newvalue)
{
m.lock();
variable = newvalue;
m.unlock();
}
int read_from_var()
{
m.lock();
int to_return = variable;
m.unlock();
return to_return;
}
};
The first method is more efficient (not locking-unlocking all the time), however, the program may need to keep checking the output of every read and write to see if they were successful. The second method automatically handles the locking and so the programmer wouldn't even know the lock is there.
Note: This is not code for copy-paste. It shows a concept and sketches how it's done. Please don't comment saying you forgot some error checking somewhere.
This sounds a lot like Alexandrescu's idea with volatile. You're not
using the actual semantics of const, but rather exploiting the way the
type system uses it. In this regard, I would prefer Alexandrescu's use
of volatile: const has very definite and well understood semantics,
and subverting them will definitely cause confusion for anyone reading
or maintaining the code. volatile is more appropriate, as it has no
well defined semantics, and in the context of most applications, is not
used for anything else.
And rather than returning a classical proxy object, you should return a
smart pointer. You could actually use shared_ptr for this, grabbing
the lock before returning the value, and releasing it in the deleter
(rather than deleting the object); I rather fear, however, that this
would cause some confusion amongst the readers, and I would probably go
with a custom smart pointer (probably using shared_ptr with the custom
deleter in the implementation). (From your description, I suspect that
this is closer to what you had in mind anyway.)

Double-Checked Lock Singleton in C++11

Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance;
...
static Tp &
instance()
{
if (!m_instance.load(std::memory_order_relaxed))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!m_instance.load(std::memory_order_acquire))
{
Tp * i = new Tp;
m_instance.store(i, std::memory_order_release);
}
}
return * m_instance.load(std::memory_order_relaxed);
}
Is the std::memory_model_acquire of the load operation superfluous? Is it possible to further relax both load and store operations by switching them to std::memory_order_relaxed? In that case, is the acquire/release semantic of std::mutex enough to guarantee its correctness, or a further std::atomic_thread_fence(std::memory_order_release) is also required to ensure that the writes to memory of the constructor happen before the relaxed store? Yet, is the use of fence equivalent to have the store with memory_order_release?
EDIT: Thanks to the answer of John, I came up with the following implementation that should be data-race free. Even though the inner load could be non-atomic at all, I decided to leave a relaxed load in that it does not affect the performance. In comparison to always have an outer load with the acquire memory order, the thread_local machinery improves the performance of accessing the instance of about an order of magnitude.
static Tp &
instance()
{
static thread_local Tp *instance;
if (!instance &&
!(instance = m_instance.load(std::memory_order_acquire)))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!(instance = m_instance.load(std::memory_order_relaxed)))
{
instance = new Tp;
m_instance.store(instance, std::memory_order_release);
}
}
return *instance;
}
I think this a great question and John Calsbeek has the correct answer.
However, just to be clear a lazy singleton is best implemented using the classic Meyers singleton. It has garanteed correct semantics in C++11.
§ 6.7.4
... If control enters
the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for
completion of the initialization. ...
The Meyer's singleton is preferred in that the compiler can aggressively optimize the concurrent code. The compiler would be more restricted if it had to preserve the semantics of a std::mutex. Furthermore, the Meyer's singleton is 2 lines and virtually impossible to get wrong.
Here is a classic example of a Meyer's singleton. Simple, elegant, and broken in c++03. But simple, elegant, and powerful in c++11.
class Foo
{
public:
static Foo& instance( void )
{
static Foo s_instance;
return s_instance;
}
};
That implementation is not race-free. The atomic store of the singleton, while it uses release semantics, will only synchronize with the matching acquire operation—that is, the load operation that is already guarded by the mutex.
It's possible that the outer relaxed load would read a non-null pointer before the locking thread finished initializing the singleton.
The acquire that is guarded by the lock, on the other hand, is redundant. It will synchronize with any store with release semantics on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread. That load doesn't even need to be atomic—no stores can happen from another thread.
See Anthony Williams' series on C++0x multithreading.
See also call_once.
Where you'd previously use a singleton to do something, but not actually use the returned object for anything, call_once may be the better solution.
For a regular singleton you could do call_once to set a (global?) variable and then return that variable...
Simplified for brevity:
template< class Function, class... Args>
void call_once( std::once_flag& flag, Function&& f, Args&& args...);
Exactly one execution of exactly one of the functions, passed as f to the invocations in the group (same flag object), is performed.
No invocation in the group returns before the abovementioned execution of the selected function is completed successfully

Any comments to this (hopefully) 100% safe double-check-locking replacement for singletons

As Scott Meyers and Andrei Alexandrescu outlined in this article the simple try to implement the double-check locking implementation is unsafe in C++ specifically and in general on multi-processor systems without using memory barriers.
I was thinking a little bit about that and came to a solution that avoids using memory barriers and should also work 100% safe in C++. The trick is to store a copy of the pointer to the instance thread-local so each thread has to acquire the lock for the first times it access the singleton.
Here is a little sample code (syntax not checked; I used pthread but all other threading libs could be used):
class Foo
{
private:
Helper *helper;
pthread_key_t localHelper;
pthread_mutex_t mutex;
public:
Foo()
: helper(NULL)
{
pthread_key_create(&localHelper, NULL);
pthread_mutex_init(&mutex);
}
~Foo()
{
pthread_key_delete(&localHelper);
pthread_mutex_destroy(&mutex);
}
Helper *getHelper()
{
Helper *res = pthread_getspecific(localHelper);
if (res == NULL)
{
pthread_mutex_lock(&mutex);
if (helper == NULL)
{
helper = new Helper();
}
res = helper;
pthread_mutex_unlock(&mutex);
pthread_setspecific(localHelper, res);
}
return res;
}
};
What are your comments/opinions?
Do you find any flaws in the idea or the implementation?
EDIT:
Helper is the type of the singleton object (I know the name is not the bet...I took it from the Java examples in the Wikipedia article about DCLP).
Foo is the Singleton container.
EDIT 2:
Because it seems to be a little bit misunderstanding that Foo is not a static class and how it is used, here an example of the usage:
static Foo foo;
.
.
.
foo.getHelper()->doSomething();
.
.
.
The reason that Foo's members are not static is simply that I was able to create/destroy the mutex and the TLS in the constructor/destructor.
If a RAII version of a C++ mutex / TLS class is used Foo can easily be switched to be static.
You seem to be calling:
pthread_mutex_init(&mutex);
...in the Helper() constructor. But that constructor is itself called in the function getHelper() (which should be static, I think) which uses the mutex. So the mutex appears to be initialised twice or not at all.
I find the code very confusing, I must say. Double-checked locking is not that complex. Why don't you start again, and this time create a Mutex class, which does the initialisation, and uses RAI to release the underlying pthread mutex? Then use this Mutex class to implement your locking.
This isn't the double-checked locking pattern. Most of the potential thread safety issues of the pattern are due to the fact the the a common state is read outside of a mutually exclusive lock, and then re-checked inside it.
What you are doing is checking a thread local data item, and then checking the common state inside a lock. This is more like a standard single check singleton pattern with a thread local cached value optimization.
To a casual glance it does look safe, though.
Looks interesting! Clever use of thread-local storage to reduce contention.
But I wonder if this is really different from the problematic approach outlined by Meyers/Alexandrescu...?
Say you have two threads for which the singleton is uninitialized (e.g. thread local slot is empty) and they run getHelper in parallel.
Won't they get into the same race over the helper member? You're still calling operator new and you're still assigning that to a member, so the risk of rogue reordering is still there, right?
EDIT: Ah, I see now. The lock is taken around the NULL-check, so it should be safe. The thread-local replaces the "first" NULL-check of the DCLP.
Obviously a few people are misunderstanding your intent/solution. Something to think about.
I think the real question to be asked is this:
Is calling pthread_getspecific() cheaper than a memory barrier?
Are you trying to make a thread-safe singleton or, implement Meyers & Andrescu's tips? The most straightforward thing is to use a
pthread_once
as is done below. Obviously you're not going for lock-free speed or anything, creating & destroying mutexes as you are so, the code may as well be simple and clear - this is the kind of thing pthread_once was made for. Note, the Helper ptr is static, otherwise it'd be harder to guarantee that there's only one Helper object:
// Foo.h
#include <pthread.h>
class Helper {};
class Foo
{
private:
static Helper* s_pTheHelper;
static ::pthread_once_t once_control;
private:
static void createHelper() { s_pTheHelper = new Helper(); }
public:
Foo()
{ // stuff
}
~Foo()
{ // stuff
}
static Helper* getInstance()
{
::pthread_once(&once_control, Foo::createHelper);
return s_pTheHelper;
}
};
// Foo.cpp
// ..
Helper* Foo::s_pTheHelper = NULL;
::pthread_once_t Foo::once_control = PTHREAD_ONCE_INIT;
// ..