I would like to use the singleton pattern in a multithreaded program. Double-checked locking seems suitable because of its efficiency; however, it is broken and not easy to get right.
I wrote the following code hoping that it works as an alternative to double-checked locking. Is it a correct implementation of a thread-safe singleton pattern?
static bool created = false;
static Instance *instance = 0;
Instance *GetInstance() {
    if (!created) {
        Lock lock; // acquire a lock, parameters are omitted for simplicity
        if (!instance) {
            instance = new Instance;
        } else {
            created = true;
        }
    }
    return instance;
}
The first call will create the Instance. The second call will set created to true. And finally, all other calls will return a well-initialized instance.
http://voofie.com/content/192/alternative-to-double-checked-locking-and-the-singleton-pattern/
No, this doesn't help. If the writes to created and instance are non-atomic then there is no guarantee that the values are visible to a thread that doesn't lock the mutex.
e.g. Thread 1 calls GetInstance. created is false and instance is null, so it locks the mutex and creates a new instance. Thread 1 calls GetInstance again, and this time sets created to true. Thread 2 now calls GetInstance. By the vagaries of the processor's memory management it sees created as true, but there is no guarantee that it also sees instance as non-null, and even if it does, there is no guarantee that the memory values of the pointed-to instance are consistent.
If you're not using atomics then you need to use mutexes, and must use them for all accesses to a protected variable.
Additional info: If you are using mutexes, then the compiler and runtime work together to ensure that when one thread releases a mutex lock and another thread acquires a lock on that same mutex then the second thread can see all the writes done by the first. This is not true for non-atomic accesses, and may or may not be true for atomic accesses, depending on what memory ordering constraints the compiler and runtime guarantee for you (with C++11 atomics you can choose the ordering constraints).
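To make the point about atomics concrete, here is a minimal sketch (assuming C++11 and a placeholder `Instance` type) of the question's scheme with `created` turned into a `std::atomic<bool>`. A `std::mutex` stands in for the question's `Lock`, whose parameters were omitted. The acquire load pairs with the release store, so a thread that sees `created == true` is also guaranteed to see the published `instance` and the object it points to; `instance` itself is only ever written under the mutex.

```cpp
#include <atomic>
#include <mutex>

// Placeholder for the question's Instance type (assumption).
struct Instance {
    int value = 7;
};

namespace {
std::atomic<bool> created{false};  // published flag, read without the lock
Instance* instance = nullptr;      // only ever written while holding instance_mutex
std::mutex instance_mutex;
}

Instance* GetInstance() {
    // acquire: pairs with the release store below, so seeing 'true' also
    // makes the earlier write of 'instance' (and the constructor's writes) visible.
    if (!created.load(std::memory_order_acquire)) {
        std::lock_guard<std::mutex> lock(instance_mutex);
        if (!instance)
            instance = new Instance;  // first lock holder creates the object
        else
            created.store(true, std::memory_order_release);  // later holder publishes
    }
    return instance;
}
```

A thread that takes the slow path has already acquired the mutex, so it sees the pointer written by the first lock holder; only the fast path depends on the acquire/release pair.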
It has the same reliability as double-checked locking.
You can get more with a "triple check" or even a "quadruple check", but without proper synchronization full reliability can be shown to be impossible.
Note that declaring a local static variable makes the compiler implement the same logic itself.
#include<memory>
#include "Instance.h" //or whatever...
Instance* GetInstance()
{
static std::unique_ptr<Instance> p(new Instance);
return p.get();
}
If the compiler is configured for a multithreaded environment, it should protect the static p with a mutex and manage the lock when initializing p on the very first call. It should also register p's destruction in the "atexit" chain, so that proper destruction is performed at program end.
[EDIT]
Since this is a C++11 requirement that only some compilers implemented before the standard (in C++03 mode), check your compiler's implementation and settings.
Right now, I can only vouch that MinGW 4.6 onward and VS2010 already do it.
No. There's absolutely no difference between your code and double-checked locking. The correct implementation is:
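#include <mutex>
static std::mutex m;
Singleton&
Singleton::instance()
{
    static Singleton* theOneAndOnly = nullptr;
    std::lock_guard<std::mutex> l(m); // taken on every call; explicit template argument for pre-C++17
    if (theOneAndOnly == nullptr)
        theOneAndOnly = new Singleton;
    return *theOneAndOnly;
}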
It's hard to imagine a case where this would cause a problem, and it is guaranteed to be correct. You do acquire the lock each time, but acquiring an uncontested mutex should be fairly cheap, you're not accessing Singletons that much, and if you do end up having to access one in the middle of a tight loop, there's nothing to stop you from obtaining a reference to it before entering the loop and using that.
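The last suggestion can be sketched as follows. The `touch()` counter and the `hot_loop` function are hypothetical additions that only make the effect visible: the singleton is looked up once, so the mutex is taken once rather than on every iteration. Note that `touch()` itself is not synchronized, so this shows hoisting the lookup, not safe concurrent use of the object.

```cpp
#include <mutex>

class Singleton {
public:
    static Singleton& instance();
    void touch() { ++touches_; }  // not synchronized (illustration only)
    long touches() const { return touches_; }
private:
    Singleton() = default;
    long touches_ = 0;
};

namespace {
std::mutex g_singleton_mutex;
}

// Same shape as the answer above: the lock is taken on every call.
Singleton& Singleton::instance() {
    static Singleton* theOneAndOnly = nullptr;
    std::lock_guard<std::mutex> lock(g_singleton_mutex);
    if (theOneAndOnly == nullptr)
        theOneAndOnly = new Singleton;
    return *theOneAndOnly;
}

// Hypothetical hot loop: look the singleton up once, outside the loop.
long hot_loop(int iterations) {
    Singleton& s = Singleton::instance();  // one lock acquisition
    for (int i = 0; i < iterations; ++i)
        s.touch();                          // no lock inside the loop
    return s.touches();
}
```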
This code contains a race condition, in that created can be read while it is concurrently being written by a different thread.
As a result, it has undefined behaviour, and is not a valid way of writing that code.
As KennyTM pointed out in the comments, a far better alternative is:
Instance* GetInstance() { static Instance instance; return &instance; }
Your solution will work fine, but it does one more check.
The right double-checked locking looks like this:
static Instance *instance = 0;
Instance *GetInstance() {
    if (instance == NULL) // first check
    {
        Lock lock; // scoped lock
        if (instance == NULL) // second check; it must be done under the lock
            instance = new Instance;
    }
    return instance;
}
Double-checked locking performs well because it does not acquire the lock on every call, and it is thread-safe: the lock guarantees that only one instance is created.
Related
I have a global reference-counted object obj that I want to protect from data races by using atomic operations:
T* obj; // initially nullptr
std::atomic<int> count; // initially zero
My understanding is that I need to use std::memory_order_release after I write to obj, so that the other threads will be aware of it being created:
void increment()
{
if (count.load(std::memory_order_relaxed) == 0)
obj = std::make_unique<T>();
count.fetch_add(1, std::memory_order_release);
}
Likewise, I need to use std::memory_order_acquire when reading the counter, to ensure the thread has visibility of obj being changed:
void decrement()
{
count.fetch_sub(1, std::memory_order_relaxed);
if (count.load(std::memory_order_acquire) == 0)
obj.reset();
}
I am not convinced that the code above is correct, but I'm not entirely sure why. I feel like after obj.reset() is called, there should be a std::memory_order_release operation to inform other threads about it. Is that correct?
Are there other things that can go wrong, or is my understanding of atomic operations in this case completely wrong?
It is wrong regardless of memory ordering.
As @MaartenBamelis pointed out, when increment is called concurrently the object can be constructed twice. The same is true for concurrent decrement: the object can be reset twice (which may result in a double destructor call).
Note that the declaration T* obj; disagrees with using obj as a unique_ptr, but neither a raw pointer nor a unique_ptr is safe for concurrent modification. In practice, reset or delete will check the pointer for null, then delete it and set it to null, and these steps are not atomic as a whole.
fetch_add and fetch_sub are fetch-and-op rather than just op for a reason: if you don't use the value observed during the operation, you probably have a race.
This code is inherently racy. If two threads call increment at the same time when count is initially 0, both will see count as 0, and both will create obj (and race to see which copy is kept; given that unique_ptr has no special threading protections, terrible things can happen if two threads set it at once).
If two threads decrement at the same time (holding the last two references), and finish the fetch_sub before either calls load, both will reset obj (also bad).
And if a decrement finishes the fetch_sub (to 0), then another thread increments before the decrement load occurs, the increment will see the count as 0 and reinitialize. Whether the object is cleared after being replaced, or replaced after being cleared, or some horrible mixture of the two, will depend on whether increment's fetch_add runs before or after decrement's load.
In short: If you find yourself using two separate atomic operations on the same variable, and testing the result of one of them (without looping, as in a compare and swap loop), you're wrong.
More correct code would look like:
void increment() // Still not safe
{
// acquire is good for the != 0 case, for a later read of obj
// or would be if the other writer did a release *after* constructing an obj
if (count.fetch_add(1, std::memory_order_acquire) == 0)
obj = std::make_unique<T>();
}
void decrement()
{
if (count.fetch_sub(1, std::memory_order_acquire) == 1)
obj.reset();
}
but even then it's not reliable: nothing prevents two threads from calling increment while count is 0 and performing their fetch_add at nearly the same time. Exactly one of them is guaranteed to see the count as 0, but that thread might be delayed while the one that saw 1 assumes the object exists and uses it before it is initialized.
I'm not going to swear there's no mutex-free solution here, but dealing with the issues involved with atomics is almost certainly not worth the headache.
It might be possible to confine the mutex to inside the if() branches, but taking a mutex is also an atomic RMW operation (and not much more than that for a good lightweight implementation) so this doesn't necessarily help a huge amount. If you need really good read-side scaling, you'd want to look into something like RCU instead of a ref-count, to allow readers to truly be read-only, not contending with other readers.
I don't really see a simple way of implementing a reference-counted resource with atomics. Maybe there's some clever way that I haven't thought of yet, but in my experience, clever does not equal readable.
My advice would be to implement it first using a mutex. Then you simply lock the mutex, check the reference count, do whatever needs to be done, and unlock again. It's guaranteed correct:
std::mutex mutex;
int count;
std::unique_ptr<T> obj;
void increment()
{
auto lock = std::scoped_lock{mutex};
if (++count == 1) // Am I the first reference?
obj = std::make_unique<T>();
}
void decrement()
{
auto lock = std::scoped_lock{mutex};
if (--count == 0) // Was I the last reference?
obj.reset();
}
Although at this point, I would just use a std::shared_ptr instead of managing the reference count myself:
std::mutex mutex;
std::weak_ptr<T> obj;
std::shared_ptr<T> acquire()
{
auto lock = std::scoped_lock{mutex};
auto sp = obj.lock();
if (!sp)
obj = sp = std::make_shared<T>();
return sp;
}
I believe this also makes it safe when exceptions may be thrown when constructing the object.
Mutexes are surprisingly performant, so I expect that locking code is plenty quick unless you have a highly specialized use case where you need code to be lock-free.
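A small usage sketch of the `weak_ptr` approach, assuming a hypothetical `Resource` type that counts its constructions. While at least one `shared_ptr` is alive, every `acquire()` returns the same object; once the last one is released, the next call creates a fresh instance. `std::lock_guard` is used instead of `std::scoped_lock` so the sketch also builds as C++11/14.

```cpp
#include <memory>
#include <mutex>

// Hypothetical resource that counts how many times it has been constructed.
struct Resource {
    static int constructions;
    Resource() { ++constructions; }  // only ever called under g_mutex below
};
int Resource::constructions = 0;

namespace {
std::mutex g_mutex;
std::weak_ptr<Resource> g_obj;  // does not keep the object alive by itself
}

// Same logic as the acquire() above, specialized to Resource.
std::shared_ptr<Resource> acquire() {
    std::lock_guard<std::mutex> lock(g_mutex);
    std::shared_ptr<Resource> sp = g_obj.lock();  // empty if expired
    if (!sp)
        g_obj = sp = std::make_shared<Resource>();
    return sp;
}
```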
I believe I've got a good handle on at least the basics of multi-threading in C++, but I've never been able to get a clear answer on locking a mutex around shared resources in the constructor or the destructor. I was under the impression that you should lock in both places, but recently coworkers have disagreed. Pretend the following class is accessed by multiple threads:
class TestClass
{
public:
TestClass(const float input) :
mMutex(),
mValueOne(1),
mValueTwo("Text")
{
//**Does the mutex need to be locked here?
mValueTwo.Set(input);
mValueOne = mValueTwo.Get();
}
~TestClass()
{
//Lock Here?
}
int GetValueOne() const
{
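Lock lock(mMutex); // named guard: "Lock(mMutex);" would declare a new variable called mMutex instead of locking the member
return mValueOne;
}
void SetValueOne(const int value)
{
Lock lock(mMutex);
mValueOne = value;
}
CustomType GetValueTwo() const
{
Lock lock(mMutex);
return mValueTwo;
}
void SetValueTwo(const CustomType type)
{
Lock lock(mMutex);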
mValueTwo = type;
}
private:
Mutex mMutex;
int mValueOne;
CustomType mValueTwo;
};
Of course everything should be safe through the initialization list, but what about the statements inside the constructor? In the destructor would it be beneficial to do a non-scoped lock, and never unlock (essentially just call pthread_mutex_destroy)?
Multiple threads cannot construct the same object, nor should any thread be allowed to use the object before it's fully constructed. So, in sane code, construction without locking is safe.
Destruction is a slightly harder case. But again, proper lifetime management of your object can ensure that an object is never destroyed when there's a chance that some thread(s) might still use it.
A shared pointer can help achieve this, e.g.:
construct the object in a certain thread
pass shared pointers to every thread that needs access to the object (including the thread that constructed it if needed)
the object will be destroyed when all threads have released the shared pointer
But obviously, other valid approaches exist. The key is to keep proper boundaries between the three main stages of an object's lifetime: construction, usage, and destruction. Never allow any of these stages to overlap.
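The three steps above might look like this in practice; `Shared`, `run_workers` and the `observer` parameter are hypothetical names used only for illustration. The object is constructed before any worker starts, each worker gets its own `shared_ptr` copy, and the object is destroyed only after the last copy is released, which the test checks through a `weak_ptr`.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical shared object; the atomic counter lets all workers use it
// concurrently without extra locking in this sketch.
struct Shared {
    std::atomic<int> uses{0};
};

int run_workers(int n, std::weak_ptr<Shared>& observer) {
    auto obj = std::make_shared<Shared>();  // 1. construction, in this thread
    observer = obj;                         // lets the caller observe destruction
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([obj] {        // 2. usage: each worker owns a copy
            obj->uses.fetch_add(1);
        });
    for (auto& t : workers)
        t.join();                           // no worker can touch obj after this
    return obj->uses.load();
}   // 3. destruction: the last shared_ptr (obj) is released here
```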
They don't have to be locked in the constructor, as the only way anyone external can get access to that data at that point is if you pass them around from the constructor itself (or do some undefined behaviour, like calling a virtual method).
[Edit: Removed part about destructor, since as a comment rightfully asserts, you have bigger issues if you're trying to access resources from an object which might be dead]
I need to write class that loads shared libraries. The dlopen() / dlerror() sequence needs a lock to be thread safe.
class LibLoader {
public:
LibLoader(string whichLib);
bool Load() { Wait(lock); ... dlopen() ... dlerror() ... }
bool Unload() { Wait(lock); ... dlclose() ... dlerror() ... }
bool IsLoaded() {...}
// ... access to symbols...
private:
static Lock lock;
};
Lock LibLoader::lock;
The users of this class (there will be several at the same time) will want to make a LibLoader a static member of their own class, to avoid loading a shared library again for every object of that class:
class NeedsALib {
public:
NeedsALib() { if (!myLib.IsLoaded()) { myLib.Load(); } }
private:
static LibLoader myLib;
};
LibLoader NeedsALib::myLib(...);
The problem with this code is that it may (and eventually will) crash, because it relies on the order in which statics are destroyed when the program terminates. If the lock is destroyed before myLib, it will crash.
How can this be written in a safe manner that is thread safe and doesn't rely on the order of static destruction ?
Unfortunately, I think the only way to avoid this is to both use unportable one-time initialization directives, and to avoid destroying the lock at all. There are two basic problems you need to deal with:
What happens if two threads race to access the lock for the first time? [ie, you cannot lazy-create the lock]
What happens if the lock is destroyed too early? [ie, you cannot statically-create the lock]
The combination of these constraints forces you to use a non-portable mechanism to create the lock.
On pthreads, the most straightforward way to handle this is with the PTHREAD_MUTEX_INITIALIZER, which lets you statically initialize locks:
class LibLoader{
static pthread_mutex_t mutex;
// ...
};
// never destroyed
pthread_mutex_t LibLoader::mutex = PTHREAD_MUTEX_INITIALIZER;
On Windows, you can use synchronous one-time initialization.
Alternatively, if you can guarantee that there will be only one thread before main runs, you can use the singleton pattern without destruction and simply force the lock to be touched before main():
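class LibLoader {
class init_helper {
public:
init_helper() { LibLoader::getLock(); } // touch the lock before main() runs
};
static init_helper _ih;
static Lock *_theLock;
static Lock *getLock() {
if (!_theLock)
_theLock = new Lock(); // never deleted
return _theLock;
}
// ...
};
LibLoader::init_helper LibLoader::_ih; // no 'static' on out-of-class definitions
Lock *LibLoader::_theLock; // zero-initialized before any dynamic initialization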
Note that this makes the possibly non-portable (but highly likely to be true) assumption that static objects of POD type are not destroyed until all non-POD static objects have been destroyed. I'm not aware of any platform in which this is not the case.
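With C++11, the same two constraints (no racy lazy creation, no premature destruction) can also be met portably by a function-local static pointer to a heap-allocated lock that is deliberately never deleted. This is a sketch assuming `std::mutex` can replace the question's `Lock`; `lib_loader_lock`, `load_library` and `load_count` are hypothetical names, and the actual `dlopen()`/`dlerror()` calls are left out.

```cpp
#include <mutex>

// Created on first use (initialization of a block-scope static is
// thread-safe since C++11) and intentionally never destroyed, so static
// destructors running at program exit can still lock it.
std::mutex& lib_loader_lock() {
    static std::mutex* lock = new std::mutex;
    return *lock;
}

int load_count = 0;  // hypothetical bookkeeping, protected by the lock

// Hypothetical stand-in for LibLoader::Load().
bool load_library() {
    std::lock_guard<std::mutex> guard(lib_loader_lock());
    ++load_count;  // the dlopen()/dlerror() sequence would go here
    return true;
}
```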
Wrapping up the requirements: multiple LibLoader instances are needed, each for a different library, but a single lock must exist to ensure they don't overwrite each others' error codes.
One way would be to rely on static initialization and destruction order within a file.
A better way would be not to make LibLoader a static field of NeedsALib (and similar classes). It seems these client classes could be passed an instance of the right LibLoader in their constructor.
If creating LibLoader instances outside of its client classes isn't convenient, you can make all static fields (the lock and the loaders) pointers and use singleton pattern with lazy initialization. Then as you create the first loader, it ends up creating the lock as well. Singleton itself would require locking here, but you could perhaps run it before spawning your threads. Destruction would also be explicit and under your control. You can also do this with loaders only (retaining static lock).
Also, if LibLoader doesn't have a lot of state to store, you can make each client class (NeedsALib etc) instantiate its own LibLoader. This is admittedly quite wasteful, though.
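A rough sketch of the constructor-injection approach, assuming a simplified, hypothetical `LibLoader` in which a counter stands in for the real `dlopen()`/`dlerror()` calls. `Load()` checks and loads under the same lock, so several clients sharing one loader load the library only once, and the loader's lifetime is owned by whoever creates it (for example `main`) rather than by the static destruction order.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified LibLoader (no real dl* calls).
class LibLoader {
public:
    explicit LibLoader(std::string whichLib) : name_(std::move(whichLib)) {}

    // Idempotent: the check and the load happen under the same lock.
    bool Load() {
        std::lock_guard<std::mutex> guard(lock());
        if (!loaded_) {
            ++loads_;  // the dlopen()/dlerror() sequence would go here
            loaded_ = true;
        }
        return loaded_;
    }
    bool IsLoaded() const {
        std::lock_guard<std::mutex> guard(lock());
        return loaded_;
    }
    int LoadCount() const {
        std::lock_guard<std::mutex> guard(lock());
        return loads_;
    }
    const std::string& Name() const { return name_; }

private:
    // One lock shared by all loaders. As a function-local static it is
    // destroyed at exit, after any LibLoader that is a local of main().
    static std::mutex& lock() {
        static std::mutex m;
        return m;
    }
    std::string name_;
    bool loaded_ = false;
    int loads_ = 0;
};

// The client is handed its loader instead of owning a static one.
class NeedsALib {
public:
    explicit NeedsALib(LibLoader& lib) : lib_(lib) { lib_.Load(); }
    bool ready() const { return lib_.IsLoaded(); }
private:
    LibLoader& lib_;
};
```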
Is the following singleton implementation data-race free?
static std::atomic<Tp *> m_instance;
...
static Tp &
instance()
{
if (!m_instance.load(std::memory_order_relaxed))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!m_instance.load(std::memory_order_acquire))
{
Tp * i = new Tp;
m_instance.store(i, std::memory_order_release);
}
}
return * m_instance.load(std::memory_order_relaxed);
}
Is the std::memory_order_acquire of the inner load operation superfluous? Is it possible to further relax both load and store operations by switching them to std::memory_order_relaxed? In that case, are the acquire/release semantics of std::mutex enough to guarantee correctness, or is a further std::atomic_thread_fence(std::memory_order_release) also required to ensure that the constructor's writes to memory happen before the relaxed store? Finally, is using the fence equivalent to performing the store with memory_order_release?
EDIT: Thanks to John's answer, I came up with the following implementation, which should be data-race free. Even though the inner load need not be atomic at all, I decided to keep a relaxed load since it does not affect performance. Compared with always performing the outer load with acquire ordering, the thread_local machinery improves the performance of accessing the instance by about an order of magnitude.
static Tp &
instance()
{
static thread_local Tp *instance;
if (!instance &&
!(instance = m_instance.load(std::memory_order_acquire)))
{
std::lock_guard<std::mutex> lock(m_mutex);
if (!(instance = m_instance.load(std::memory_order_relaxed)))
{
instance = new Tp;
m_instance.store(instance, std::memory_order_release);
}
}
return *instance;
}
I think this is a great question, and John Calsbeek has the correct answer.
However, just to be clear, a lazy singleton is best implemented using the classic Meyers singleton. It has guaranteed correct semantics in C++11.
§ 6.7.4
... If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization. ...
The Meyers singleton is preferable because the compiler can aggressively optimize the concurrent code; it would be more restricted if it had to preserve the semantics of a std::mutex. Furthermore, the Meyers singleton is two lines long and virtually impossible to get wrong.
Here is a classic example of a Meyers singleton: simple, elegant, and broken in C++03, but simple, elegant, and powerful in C++11.
class Foo
{
public:
static Foo& instance( void )
{
static Foo s_instance;
return s_instance;
}
};
That implementation is not race-free. The atomic store of the singleton, while it uses release semantics, will only synchronize with the matching acquire operation—that is, the load operation that is already guarded by the mutex.
It's possible that the outer relaxed load would read a non-null pointer before the locking thread finished initializing the singleton.
The acquire that is guarded by the lock, on the other hand, is redundant. It will synchronize with any store with release semantics on another thread, but at that point (thanks to the mutex) the only thread that can possibly store is the current thread. That load doesn't even need to be atomic—no stores can happen from another thread.
See Anthony Williams' series on C++0x multithreading.
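Putting the answer's observations together, a corrected version of the question's first implementation might look like the sketch below. The `Singleton<Tp>` wrapper is an assumption, since the question only shows the members. The outer load becomes acquire so it pairs with the release store, the inner load can be relaxed because the mutex already orders it, and the loaded pointer is returned directly instead of being reloaded.

```cpp
#include <atomic>
#include <mutex>

template <class Tp>
class Singleton {
public:
    static Tp& instance() {
        // acquire: pairs with the release store, so a non-null pointer
        // implies the constructor's writes are visible as well.
        Tp* p = m_instance.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> lock(m_mutex);
            // relaxed is enough here: any earlier store was made by a
            // previous holder of m_mutex, which the lock already orders.
            p = m_instance.load(std::memory_order_relaxed);
            if (!p) {
                p = new Tp();  // value-initialized; never deleted in this sketch
                m_instance.store(p, std::memory_order_release);
            }
        }
        return *p;
    }
private:
    static std::atomic<Tp*> m_instance;
    static std::mutex m_mutex;
};

template <class Tp> std::atomic<Tp*> Singleton<Tp>::m_instance{nullptr};
template <class Tp> std::mutex Singleton<Tp>::m_mutex;
```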
See also call_once.
Where you'd previously use a singleton to do something, but not actually use the returned object for anything, call_once may be the better solution.
For a regular singleton you could do call_once to set a (global?) variable and then return that variable...
Simplified for brevity:
template< class Function, class... Args >
void call_once( std::once_flag& flag, Function&& f, Args&&... args );
Exactly one execution of exactly one of the functions, passed as f to the invocations in the group (same flag object), is performed.
No invocation in the group returns before the abovementioned execution of the selected function is completed successfully
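As a sketch of the "set a global variable with call_once, then return it" idea, assuming a placeholder `Config` type and a hypothetical `get_config` function:

```cpp
#include <mutex>

struct Config {
    int level = 3;  // placeholder contents (assumption)
};

namespace {
std::once_flag g_config_flag;
Config* g_config = nullptr;  // written exactly once, inside call_once
}

Config& get_config() {
    // Exactly one caller runs the lambda; concurrent callers wait until it
    // has completed, and its write to g_config is visible to all of them.
    std::call_once(g_config_flag, [] { g_config = new Config; });
    return *g_config;
}
```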
I am running some thread-safe code here. I am using a mutex to protect the section of code that must be run by only one thread at a time. The problem is that with this code I sometimes end up with two mutex objects. This is a static function, by the way. How do I make sure only one mutex object gets created?
/*static*/ MyClass::GetResource()
{
if (m_mutex == 0)
{
// make a new mutex object
m_mutex = new MyMutex();
}
m_mutex->Lock();
Simply create m_mutex outside of GetResource(), before it can ever be called - this removes the critical section around the actual creation of the mutex.
void MyClass::Init()
{
m_mutex = new MyMutex;
}
MyClass::GetResource()
{
m_mutex->Lock();
...
m_mutex->Unlock();
}
The issue is that the thread could be interrupted after checking whether m_mutex is 0 but before it creates the mutex, allowing another thread to run through the same code.
Don't assign to m_mutex right away. Create a new mutex, and then do an atomic compare exchange.
You don't mention your target platform, but on Windows:
MyClass::GetResource()
{
if (m_mutex == 0)
{
// make a new mutex object
MyMutex* mutex = new MyMutex();
// Only set if mutex is still NULL.
if (InterlockedCompareExchangePointer(reinterpret_cast<PVOID volatile*>(&m_mutex), mutex, NULL) != NULL)
{
// someone else beat us to it.
delete mutex;
}
}
m_mutex->Lock();
Otherwise, replace with whatever compare/swap function your platform provides.
Another option is to use one-time initialization support, which is available on Windows Vista and up, or just pre-create the mutex if you can.
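On C++11 and later, the same compare-and-swap idea can be written portably with `std::atomic`. This sketch assumes `std::mutex` in place of `MyMutex` and a hypothetical `GetMutex()` accessor; as in the Windows version, the thread that loses the race deletes its own mutex.

```cpp
#include <atomic>
#include <mutex>

class MyClass {
public:
    static std::mutex& GetMutex() {
        std::mutex* m = s_mutex.load(std::memory_order_acquire);
        if (m == nullptr) {
            std::mutex* fresh = new std::mutex;
            // Publish only if still null; on failure 'm' receives the winner.
            if (s_mutex.compare_exchange_strong(m, fresh,
                                                std::memory_order_acq_rel,
                                                std::memory_order_acquire)) {
                m = fresh;
            } else {
                delete fresh;  // someone else beat us to it
            }
        }
        return *m;
    }
private:
    static std::atomic<std::mutex*> s_mutex;
};

std::atomic<std::mutex*> MyClass::s_mutex{nullptr};
```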
Lazy mutex initialization isn't really appropriate for static methods; you need some guarantee that nobody races to the initialization. The following uses the compiler to generate a single static mutex for the class.
/* Header (.hxx) */
class MyClass
{
...
private:
static MyMutex m_mutex; // Declares, "this mutex exists, somewhere." ('mutable' is not allowed on static members)
};
/* Compilation Unit (.cxx) */
MyMutex MyClass::m_mutex; // The aforementioned, "somewhere."
MyClass::GetResource()
{
m_mutex.Lock();
...
m_mutex.Unlock();
}
Some other solutions require extra assumptions about your fellow programmers. With the "call Init()" method, for instance, you have to be sure that the initialization method was called, and everybody has to know this rule.
Why use a pointer anyway? Why not replace the pointer with an actual instance that does not require dynamic memory management? This avoids the race condition, and does not impose a performance hit on every call into the function.
Since it is only to protect one specific section of code, simply declare it static inside the function.
/*static*/ MyClass::GetResource()
{
static MyMutex mutex; // initialized once, the first time control passes here
mutex.Lock();
// ...
mutex.Unlock();
}
The variable is a local variable with static storage duration. It is explicitly stated in the Standard:
An implementation is permitted to perform early initialization of other block-scope variables with static or thread storage duration under the same conditions that an implementation is permitted to statically initialize a variable with static or thread storage duration in namespace scope (3.6.2). Otherwise such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization. If the initialization exits by throwing an exception, the initialization is not complete, so it will be tried again the next time control enters the declaration. If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
The last sentence is of particular interest to you.
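A runnable sketch of the function-local static approach, with `std::mutex` standing in for `MyMutex` and the protected section reduced to incrementing a hypothetical counter. Because the block-scope static is initialized exactly once even when several threads arrive concurrently, all threads lock the same mutex and no increments are lost.

```cpp
#include <mutex>
#include <thread>
#include <vector>

int g_counter = 0;  // hypothetical shared state, protected by the static mutex

void GetResource() {
    static std::mutex mutex;  // one instance, initialized on first use (C++11)
    std::lock_guard<std::mutex> lock(mutex);
    ++g_counter;              // the protected section
}

// Calls GetResource concurrently from several threads.
int hammer(int threads, int iterations) {
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i)
                GetResource();
        });
    for (auto& th : pool)
        th.join();
    return g_counter;
}
```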