efficient thread-safe singleton in C++ - c++

The usual pattern for a singleton class is something like
static Foo &getInst()
{
static Foo *inst = NULL;
if(inst == NULL)
inst = new Foo(...);
return *inst;
}
However, it's my understanding that this solution is not thread-safe, since 1) Foo's constructor might be called more than once (which may or may not matter) and 2) inst may not be fully constructed before it is returned to a different thread.
One solution is to wrap a mutex around the whole method, but then I'm paying for synchronization overhead long after I actually need it. An alternative is something like
static Foo &getInst()
{
static Foo *inst = NULL;
if(inst == NULL)
{
pthread_mutex_lock(&mutex);
if(inst == NULL)
inst = new Foo(...);
pthread_mutex_unlock(&mutex);
}
return *inst;
}
Is this the right way to do it, or are there any pitfalls I should be aware of? For instance, are there any static initialization order problems that might occur, i.e. is inst always guaranteed to be NULL the first time getInst is called?

If you are using C++11, here is a right way to do this:
Foo& getInst()
{
static Foo inst(...);
return inst;
}
According to new standard there is no need to care about this problem any more. Object initialization will be made only by one thread, other threads will wait till it complete.
Or you can use std::call_once. (more info here)

Your solution is called 'double checked locking' and the way you've written it is not threadsafe.
This Meyers/Alexandrescu paper explains why - but that paper is also widely misunderstood. It started the 'double checked locking is unsafe in C++' meme - but its actual conclusion is that double checked locking in C++ can be implemented safely, it just requires the use of memory barriers in a non-obvious place.
The paper contains pseudocode demonstrating how to use memory barriers to safely implement the DLCP, so it shouldn't be difficult for you to correct your implementation.

Herb Sutter talks about the double-checked locking in CppCon 2014.
Below is the code I implemented in C++11 based on that:
class Foo {
public:
static Foo* Instance();
private:
Foo() {}
static atomic<Foo*> pinstance;
static mutex m_;
};
atomic<Foo*> Foo::pinstance { nullptr };
std::mutex Foo::m_;
Foo* Foo::Instance() {
if(pinstance == nullptr) {
lock_guard<mutex> lock(m_);
if(pinstance == nullptr) {
pinstance = new Foo();
}
}
return pinstance;
}
you can also check complete program here: http://ideone.com/olvK13

Use pthread_once, which is guaranteed that the initialization function is run once atomically.
(On Mac OS X it uses a spin lock. Don't know the implementation of other platforms.)

TTBOMK, the only guaranteed thread-safe way to do this without locking would be to initialize all your singletons before you ever start a thread.

Your alternative is called "double-checked locking".
There could exist multi-threaded memory models in which it works, but POSIX does not guarantee one

ACE singleton implementation uses double-checked locking pattern for thread safety, you can refer to it if you like.
You can find source code here.

Does TLS work here? https://en.wikipedia.org/wiki/Thread-local_storage#C_and_C++
For example,
static _thread Foo *inst = NULL;
static Foo &getInst()
{
if(inst == NULL)
inst = new Foo(...);
return *inst;
}
But we also need a way to delete it explicitly, like
static void deleteInst() {
if (!inst) {
return;
}
delete inst;
inst = NULL;
}

The solution is not thread safe because the statement
inst = new Foo();
can be broken down into two statements by compiler:
Statement1: inst = malloc(sizeof(Foo));
Statement2: inst->Foo();
Suppose that after execution of statement 1 by one thread context switch occurs. And 2nd thread also executes the getInstance() method. Then the 2nd thread will find that the 'inst' pointer is not null. So 2nd thread will return pointer to an uninitialized object as constructor has not yet been called by the 1st thread.

Related

std::shared_ptr::unique(), copying and thread safety

I have a shared_ptr stored in a central place which can be accessed by multiple threads through a method getPointer(). I want to make sure that only one thread uses the pointer at one time. Thus, whenever a thread wants to get the pointer I test if the central copy is the only one via std::shared_ptr::unique() method. If it returns yes, I return the copy assuming that unique()==false as long as that thread works on the copy. Other threads trying to access the pointer at the same time receive a nullptr and have to try again in the future.
Now my question:
Is it theoretically possible that two different threads calling getPointer() can get mutual access to the pointer despite the mutex guard and the testing via unique() ?
std::shared_ptr<int> myPointer; // my pointer is initialized somewhere else but before the first call to getPointer()
std::mutex myMutex;
std::shared_ptr<int> getPointer()
{
std::lock_guard<std::mutex> guard(myMutex);
std::shared_ptr<int> returnValue;
if ( myPointer.unique() )
returnValue = myPointer;
else
returnValue = nullptr;
return returnValue;
}
Regards
Only one "active" copy can exist at a time.
It is protected by the mutex until after a second shared_ptr is created at which point a subsequent call (once it gets the mutex after the first call has exited) will fail the unique test until the initial caller's returned shared_ptr is destroyed.
As noted in the comments, unique is going away in c++20, but you can test use_count == 1 instead, as that is what unique does.
Your solution seems overly complicated. It exploits the internal workings of the shared pointer to deduce a flag value. Why not just make the flag explicit?
std::shared_ptr<int> myPointer;
std::mutex myMutex;
bool myPointerIsInUse = false;
bool GetPermissionToUseMyPointer() {
std::lock_guard<std::mutex guard(myMutex);
auto r = (! myPointerIsInUse);
myPointerIsInUse ||= myPointerIsInUse;
return r;
}
bool RelinquishPermissionToUseMyPointer() {
std::lock_guard<std::mutex guard(myMutex);
myPointerIsInUse = false;
}
P.S., If you wrap that in a class with a few extra bells and whistles, it'll start to look a lot like a semaphore.

Thread safe singleton in C++

I have been reading about thread safe singletons and the implementation I find everywhere has a getInstance() method something like this:
Singleton* getInstance()
{
if ( !initialized )
{
lock();
if ( !initialized )
{
instance = new Singleton();
initialized = true;
}
unlock();
}
return instance;
}
Is this actually thread safe?
Have I missed something or is there a small chance this function will return an uninitialized instance because 'initialized' may be reordered and set before instance?
This article is on a slightly different topic but the top answer describes why I think the above code is not thread safe:
Why is volatile not considered useful in multithreaded C or C++ programming?
Not a good idea. Look for double check locking. For instance:
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405726
http://www.drdobbs.com/cpp/c-and-the-perils-of-double-checked-locki/184405772
It is indeed not thread safe, because after the pointer gets returned you still work with it, although the mutex is unlocked again.
What you can do is making the child class which inherits from singleton, thread safe. Then you're good to go.
Below is the code for a thread-safe singleton, using Double Check and temporary variable. A temporary variable is used to construct the object completely first and then assign it to pInstance.
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}

Read-write thread-safe smart pointer in C++, x86-64

I develop some lock free data structure and following problem arises.
I have writer thread that creates objects on heap and wraps them in smart pointer with reference counter. I also have a lot of reader threads, that work with these objects. Code can look like this:
SmartPtr ptr;
class Reader : public Thread {
virtual void Run {
for (;;) {
SmartPtr local(ptr);
// do smth
}
}
};
class Writer : public Thread {
virtual void Run {
for (;;) {
SmartPtr newPtr(new Object);
ptr = newPtr;
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) // wait for crash :(
}
When I create thread-local copy of ptr it means at least
Read an address.
Increment reference counter.
I can't do these two operations atomically and thus sometimes my readers work with deleted object.
The question is - what kind of smart pointer should I use to make read-write access from several threads with correct memory management possible? Solution should exist, since Java programmers don't even care about such a problem, simply relying on that all objects are references and are deleted only when nobody uses them.
For PowerPC I found http://drdobbs.com/184401888, looks nice, but uses Load-Linked and Store-Conditional instructions, that we don't have in x86.
As far I as I understand, boost pointers provide such functionality only using locks. I need lock free solution.
boost::shared_ptr have atomic_store which uses a "lock-free" spinlock which should be fast enough for 99% of possible cases.
boost::shared_ptr<Object> ptr;
class Reader : public Thread {
virtual void Run {
for (;;) {
boost::shared_ptr<Object> local(boost::atomic_load(&ptr));
// do smth
}
}
};
class Writer : public Thread {
virtual void Run {
for (;;) {
boost::shared_ptr<Object> newPtr(new Object);
boost::atomic_store(&ptr, newPtr);
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;)
}
EDIT:
In response to comment below, the implementation is in "boost/shared_ptr.hpp"...
template<class T> void atomic_store( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock_pool<2>::scoped_lock lock( p );
p->swap( r );
}
template<class T> shared_ptr<T> atomic_exchange( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );
sp.lock();
p->swap( r );
sp.unlock();
return r; // return std::move( r )
}
With some jiggery-pokery you should be able to accomplish this using InterlockedCompareExchange128. Store the reference count and pointer in a 2 element __int64 array. If reference count is in array[0] and pointer in array[1] the atomic update would look like this:
while(true)
{
__int64 comparand[2];
comparand[0] = refCount;
comparand[1] = pointer;
if(1 == InterlockedCompareExchange128(
array,
pointer,
refCount + 1,
comparand))
{
// Pointer is ready for use. Exit the while loop.
}
}
If an InterlockedCompareExchange128 intrinsic function isn't available for your compiler then you may use the underlying CMPXCHG16B instruction instead, if you don't mind mucking around in assembly language.
The solution proposed by RobH doesn't work. It has the same problem as the original question: when accessing the reference count object, it might already have been deleted.
The only way I see of solving the problem without a global lock (as in boost::atomic_store) or conditional read/write instructions is to somehow delay the destruction of the object (or the shared reference count object if such thing is used). So zennehoy has a good idea but his method is too unsafe.
The way I might do it is by keeping copies of all the pointers in the writer thread so that the writer can control the destruction of the objects:
class Writer : public Thread {
virtual void Run() {
list<SmartPtr> ptrs; //list that holds all the old ptr values
for (;;) {
SmartPtr newPtr(new Object);
if(ptr)
ptrs.push_back(ptr); //push previous pointer into the list
ptr = newPtr;
//Periodically go through the list and destroy objects that are not
//referenced by other threads
for(auto it=ptrs.begin(); it!=ptrs.end(); )
if(it->refCount()==1)
it = ptrs.erase(it);
else
++it;
}
}
};
However there are still requirements for the smart pointer class. This doesn't work with shared_ptr as the reads and writes are not atomic. It almost works with boost::intrusive_ptr. The assignment on intrusive_ptr is implemented like this (pseudocode):
//create temporary from rhs
tmp.ptr = rhs.ptr;
if(tmp.ptr)
intrusive_ptr_add_ref(tmp.ptr);
//swap(tmp,lhs)
T* x = lhs.ptr;
lhs.ptr = tmp.ptr;
tmp.ptr = x;
//destroy temporary
if(tmp.ptr)
intrusive_ptr_release(tmp.ptr);
As far as I understand the only thing missing here is a compiler level memory fence before lhs.ptr = tmp.ptr;. With that added, both reading rhs and writing lhs would be thread-safe under strict conditions: 1) x86 or x64 architecture 2) atomic reference counting 3) rhs refcount must not go to zero during the assignment (guaranteed by the Writer code above) 4) only one thread writing to lhs (using CAS you could have several writers).
Anyway, you could create your own smart pointer class based on intrusive_ptr with necessary changes. Definitely easier than re-implementing shared_ptr. And besides, if you want performance, intrusive is the way to go.
The reason this works much more easily in java is garbage collection. In C++, you have to manually ensure that a value is not just starting to be used by a different thread when you want to delete it.
A solution I've used in a similar situation is to simply delay the deletion of the value. I create a separate thread that iterates through a list of things to be deleted. When I want to delete something, I add it to this list with a timestamp. The deleting thread waits until some fixed time after this timestamp before actually deleting the value. You just have to make sure that the delay is large enough to guarantee that any temporary use of the value has completed.
100 milliseconds would have been enough in my case, I chose a few seconds to be safe.

Double checked locking on C++: new to a temp pointer, then assign it to instance

Anything wrong with the following Singleton implementation?
Foo& Instance() {
if (foo) {
return *foo;
}
else {
scoped_lock lock(mutex);
if (foo) {
return *foo;
}
else {
// Don't do foo = new Foo;
// because that line *may* be a 2-step
// process comprising (not necessarily in order)
// 1) allocating memory, and
// 2) actually constructing foo at that mem location.
// If 1) happens before 2) and another thread
// checks the foo pointer just before 2) happens, that
// thread will see that foo is non-null, and may assume
// that it is already pointing to a a valid object.
//
// So, to fix the above problem, what about doing the following?
Foo* p = new Foo;
foo = p; // Assuming no compiler optimisation, can pointer
// assignment be safely assumed to be atomic?
// If so, on compilers that you know of, are there ways to
// suppress optimisation for this line so that the compiler
// doesn't optimise it back to foo = new Foo;?
}
}
return *foo;
}
No, you cannot even assume that foo = p; is atomic. It's possible that it might load 16 bits of a 32-bit pointer, then be swapped out before loading the rest.
If another thread sneaks in at that point and calls Instance(), you're toasted because your foo pointer is invalid.
For true security, you will have to protect the entire test-and-set mechanism, even though that means using mutexes even after the pointer is built. In other words (and I'm assuming that scoped_lock() will release the lock when it goes out of scope here (I have little experience with Boost)), something like:
Foo& Instance() {
scoped_lock lock(mutex);
if (foo != 0)
foo = new Foo();
return *foo;
}
If you don't want a mutex (for performance reasons, presumably), an option I've used in the past is to build all singletons before threading starts.
In other words, assuming you have that control (you may not), simply create an instance of each singleton in main before kicking off the other threads. Then don't use a mutex at all. You won't have threading problems at that point and you can just use the canonical don't-care-about-threads-at-all version:
Foo& Instance() {
if (foo != 0)
foo = new Foo();
return *foo;
}
And, yes, this does make your code more dangerous to people who couldn't be bothered to read your API docs but (IMNSHO) they deserve everything they get :-)
Why not keep it simple?
Foo& Instance()
{
scoped_lock lock(mutex);
static Foo instance;
return instance;
}
Edit: In C++11 where threads is introduced into the language. The following is thread safe. The language guarantees that instance is only initialized once and in a thread safe manor.
Foo& Instance()
{
static Foo instance;
return instance;
}
So its lazily evaluated. Its thread safe. Its very simple. Win/Win/Win.
This depends on what threading library you're using. If you're using C++0x you can use atomic compare-and-swap operations and write barriers to guarantee that double-checked locking works. If you're working with POSIX threads or Windows threads, you can probably find a way to do it. The bigger question is why? Singletons, it turns out, are usually unnecessary.
the new operator in c++ always invovle 2-steps process :
1.) allocating memory identical to simple malloc
2.) invoke constructor for given data type
Foo* p = new Foo;
foo = p;
the code above will make the singleton creation into 3 step, which is even vulnerable to problem you trying to solve.
Why don't you just use a real mutex ensuring that only one thread will attempt to create foo?
Foo& Instance() {
if (!foo) {
pthread_mutex_lock(&lock);
if (!foo) {
Foo *p = new Foo;
foo = p;
}
pthread_mutex_unlock(&lock);
}
return *foo;
}
This is a test-and-test-and-set lock with free readers. Replace the above with a reader-writer lock if you want reads to be guaranteed safe in a non-atomic-replacement environment.
edit: if you really want free readers, you can write foo first, and then write a flag variable fooCreated = 1. Checking fooCreated != 0 is safe; if fooCreated != 0, then foo is initialized.
Foo& Instance() {
if (!fooCreated) {
pthread_mutex_lock(&lock);
if (!fooCreated) {
foo = new Foo;
fooCreated = 1;
}
pthread_mutex_unlock(&lock);
}
return *foo;
}
It has nothing wrong with your code. After the scoped_lock, there will be only one thread in that section, so the first thread that enters will initialize foo and return, and then second thread(if any) enters, it will return immediately because foo is not null anymore.
EDIT: Pasted the simplified code.
Foo& Instance() {
if (!foo) {
scoped_lock lock(mutex);
// only one thread can enter here
if (!foo)
foo = new Foo;
}
return *foo;
}
Thanks for all your input. After consulting Joe Duffy's excellent book, "Concurrent Programming on Windows", I am now thinking that I should be using the code below. It's largely the code from his book, except for some renames and the InterlockedXXX line. The following implementation uses:
volatile keyword on both the temp and "actual" pointers to protect against re-ordering from the compiler.
InterlockedCompareExchangePointer to protect against reordering from
the CPU.
So, that should be pretty safe (... right?):
template <typename T>
class LazyInit {
public:
typedef T* (*Factory)();
LazyInit(Factory f = 0)
: factory_(f)
, singleton_(0)
{
::InitializeCriticalSection(&cs_);
}
T& get() {
if (!singleton_) {
::EnterCriticalSection(&cs_);
if (!singleton_) {
T* volatile p = factory_();
// Joe uses _WriterBarrier(); then singleton_ = p;
// But I thought better to make singleton_ = p atomic (as I understand,
// on Windows, pointer assignments are atomic ONLY if they are aligned)
// In addition, the MSDN docs say that InterlockedCompareExchangePointer
// sets up a full memory barrier.
::InterlockedCompareExchangePointer((PVOID volatile*)&singleton_, p, 0);
}
::LeaveCriticalSection(&cs_);
}
#if SUPPORT_IA64
_ReadBarrier();
#endif
return *singleton_;
}
virtual ~LazyInit() {
::DeleteCriticalSection(&cs_);
}
private:
CRITICAL_SECTION cs_;
Factory factory_;
T* volatile singleton_;
};

Does a static object within a function introduce a potential race condition?

I'm curious about the following code:
class MyClass
{
public:
MyClass() : _myArray(new int[1024]) {}
~MyClass() {delete [] _myArray;}
private:
int * _myArray;
};
// This function may be called by different threads in an unsynchronized manner
void MyFunction()
{
static const MyClass _myClassObject;
[...]
}
Is there a possible race condition in the above code? Specifically, is the compiler likely to generate code equivalent to the following, "behind the scenes"?
void MyFunction()
{
static bool _myClassObjectInitialized = false;
if (_myClassObjectInitialized == false)
{
_myClassObjectInitialized = true;
_myClassObject.MyClass(); // call constructor to set up object
}
[...]
}
... in which case, if two threads were to call MyFunction() nearly-simultaneously, then _myArray might get allocated twice, causing a memory leak?
Or is this handled correctly somehow?
There's absolutely a possible race condition there. Whether or not there actually is one is pretty damn undefined. You shouldn't use such code in single-threaded scenarios because it's bad design, but it could be the death of your app in multithreaded. Anything that is static const like that should probably go in a convenient namespace, and get allocated at the start of the application.
Use a semaphore if you're using multiple threads, its's what they're for.