I know that the common implementation of a thread-safe singleton looks like this:
Singleton* Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
Singleton* temp = new Singleton; // initialize to temp
pInstance = temp; // assign temp to pInstance
}
}
return pInstance;
}
But why do they say that it is a thread-safe implementation?
For example, the first thread can pass both tests on pInstance == 0, create a new Singleton, assign it to the temp pointer, and then start the assignment pInstance = temp (as far as I know, pointer assignment is not atomic).
At the same time, the second thread performs the first pInstance == 0 test while pInstance is only half assigned. It is not nullptr, but it is not a valid pointer either, and it is then returned from the function.
Can such a situation happen? I couldn't find the answer anywhere; everything I read says this is a correct implementation, and I don't understand why.
It's not safe by C++ concurrency rules, since the first read of pInstance is not protected by a lock or something similar and thus does not synchronise correctly with the write (which is protected). There is therefore a data race and thus Undefined Behaviour. One of the possible results of this UB is precisely what you've identified: the first check reading a garbage value of pInstance which is just being written by a different thread.
The common explanation is that it saves on acquiring the lock (a potentially time-expensive operation) in the more common case (pInstance already valid). However, it's not safe.
Since C++11, the standard guarantees that initialisation of function-scope static variables happens only once and is thread-safe, so the best way to create a singleton in C++ is to have a static local in the function:
Singleton& Singleton::instance() {
static Singleton s;
return s;
}
Note that there's no need for either dynamic allocation or a pointer return type.
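As a minimal sketch of how that guarantee can be observed, the following calls instance() from several threads and checks that the constructor ran once and that every thread got the same object. The construction counter and the all_threads_see_same_instance helper are assumptions added for illustration; they are not part of the original answer.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical class for illustration; it counts how often its constructor runs.
class Singleton {
public:
    static Singleton& instance() {
        static Singleton s;  // initialised exactly once, even under contention (C++11)
        return s;
    }
    static int constructions() { return constructed.load(); }

    Singleton(const Singleton&) = delete;
    Singleton& operator=(const Singleton&) = delete;

private:
    Singleton() { constructed.fetch_add(1); }
    static std::atomic<int> constructed;
};

std::atomic<int> Singleton::constructed{0};

// Calls instance() from n threads and reports whether all of them saw the same object.
bool all_threads_see_same_instance(int n) {
    std::vector<Singleton*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = &Singleton::instance(); });
    for (auto& t : threads) t.join();
    for (Singleton* p : seen)
        if (p != seen[0]) return false;
    return true;
}
```

Each thread writes only its own slot of the vector, so the helper itself introduces no data race.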
As Voo mentioned in the comments, the above assumes pInstance is a raw pointer. If it were std::atomic<Singleton*> instead, the code would work as intended. Of course, it's then a question of whether an atomic read is really that much faster than obtaining the lock, which should be answered via profiling. Still, it would be a rather pointless exercise, since the static local variable is better in all but very obscure cases.
I have a global reference-counted object obj that I want to protect from data races by using atomic operations:
T* obj; // initially nullptr
std::atomic<int> count; // initially zero
My understanding is that I need to use std::memory_order_release after I write to obj, so that the other threads will be aware of it being created:
void increment()
{
if (count.load(std::memory_order_relaxed) == 0)
obj = std::make_unique<T>();
count.fetch_add(1, std::memory_order_release);
}
Likewise, I need to use std::memory_order_acquire when reading the counter, to ensure the thread has visibility of obj being changed:
void decrement()
{
count.fetch_sub(1, std::memory_order_relaxed);
if (count.load(std::memory_order_acquire) == 0)
obj.reset();
}
I am not convinced that the code above is correct, but I'm not entirely sure why. I feel like after obj.reset() is called, there should be a std::memory_order_release operation to inform other threads about it. Is that correct?
Are there other things that can go wrong, or is my understanding of atomic operations in this case completely wrong?
It is wrong regardless of memory ordering.
As @MaartenBamelis pointed out, with concurrent calls to increment the object is constructed twice. The same is true for concurrent calls to decrement: the object is reset twice (which may result in a double destructor call).
Note that the T* obj; declaration disagrees with its use as a unique_ptr, but neither a raw pointer nor a unique_ptr is safe for concurrent modification. In practice, reset or delete will check the pointer for null, then delete the object and set the pointer to null, and these steps are not atomic.
fetch_add and fetch_sub are fetch and op instead of just op for a reason: if you don't use the value observed during operation, it is likely to be a race.
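To illustrate why the returned value matters, here is a small sketch in which many threads increment a counter and exactly one of them learns that it was the first. The function name count_first_observers is made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Returns how many of n threads observed the counter as 0 at the moment they incremented it.
int count_first_observers(int n) {
    std::atomic<int> count{0};
    std::atomic<int> saw_zero{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&count, &saw_zero] {
            // fetch_add returns the value held immediately before the increment,
            // so "read the old value" and "add one" form a single indivisible step.
            if (count.fetch_add(1, std::memory_order_relaxed) == 0)
                saw_zero.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads) t.join();
    return saw_zero.load();
}
```

Had each thread done a separate load followed by a fetch_add, several threads could have seen 0; with the fetched value, only one can.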
This code is inherently racy. If two threads call increment at the same time when count is initially 0, both will see count as 0, and both will create obj (and race to see which copy is kept; given unique_ptr has no special threading protections, terrible things can happen if two of them set it at once).
If two threads decrement at the same time (holding the last two references), and finish the fetch_sub before either calls load, both will reset obj (also bad).
And if a decrement finishes the fetch_sub (to 0), then another thread increments before the decrement load occurs, the increment will see the count as 0 and reinitialize. Whether the object is cleared after being replaced, or replaced after being cleared, or some horrible mixture of the two, will depend on whether increment's fetch_add runs before or after decrement's load.
In short: If you find yourself using two separate atomic operations on the same variable, and testing the result of one of them (without looping, as in a compare and swap loop), you're wrong.
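For reference, the "compare and swap loop" shape mentioned above looks roughly like this. The helper try_add_ref is hypothetical: it takes a reference only if the count is currently non-zero, and it does not by itself solve the create/destroy race in the question.

```cpp
#include <atomic>

// Hypothetical helper: increment the count only if it is currently non-zero.
// Returns true if a reference was taken.
bool try_add_ref(std::atomic<int>& count) {
    int observed = count.load(std::memory_order_relaxed);
    while (observed != 0) {
        // On failure, compare_exchange_weak stores the current value into `observed`,
        // so the decision is always re-made against the value actually in the counter.
        if (count.compare_exchange_weak(observed, observed + 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed))
            return true;
    }
    return false;  // the count was zero; the caller must not touch the object
}
```

The test and the update happen in one atomic operation, and the loop retries whenever another thread changed the counter in between.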
More correct code would look like:
void increment() // Still not safe
{
// acquire is good for the != 0 case, for a later read of obj
// or would be if the other writer did a release *after* constructing an obj
if (count.fetch_add(1, std::memory_order_acquire) == 0)
obj = std::make_unique<T>();
}
void decrement()
{
if (count.fetch_sub(1, std::memory_order_acquire) == 1)
obj.reset();
}
but even then it's not reliable; there's no guarantee that, when count is 0, two threads couldn't call increment, both of them fetch_add at once, and while exactly one of them is guaranteed to see the count as 0, said 0-seeing thread might end up delayed while the one that saw it as 1 assumes the object exists and uses it before it's initialized.
I'm not going to swear there's no mutex-free solution here, but dealing with the issues involved with atomics is almost certainly not worth the headache.
It might be possible to confine the mutex to inside the if() branches, but taking a mutex is also an atomic RMW operation (and not much more than that for a good lightweight implementation) so this doesn't necessarily help a huge amount. If you need really good read-side scaling, you'd want to look into something like RCU instead of a ref-count, to allow readers to truly be read-only, not contending with other readers.
I don't really see a simple way of implementing a reference-counted resource with atomics. Maybe there's some clever way that I haven't thought of yet, but in my experience, clever does not equal readable.
My advice would be to implement it first using a mutex. Then you simply lock the mutex, check the reference count, do whatever needs to be done, and unlock again. It's guaranteed correct:
std::mutex mutex;
int count;
std::unique_ptr<T> obj;
void increment()
{
auto lock = std::scoped_lock{mutex};
if (++count == 1) // Am I the first reference?
obj = std::make_unique<T>();
}
void decrement()
{
auto lock = std::scoped_lock{mutex};
if (--count == 0) // Was I the last reference?
obj.reset();
}
Although at this point, I would just use a std::shared_ptr instead of managing the reference count myself:
std::mutex mutex;
std::weak_ptr<T> obj;
std::shared_ptr<T> acquire()
{
auto lock = std::scoped_lock{mutex};
auto sp = obj.lock();
if (!sp)
obj = sp = std::make_shared<T>();
return sp;
}
I believe this also makes it safe when exceptions may be thrown when constructing the object.
Mutexes are surprisingly performant, so I expect that locking code is plenty quick unless you have a highly specialized use case where you need code to be lock-free.
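A short sketch of how the weak_ptr variant behaves over an object's lifetime, assuming a hypothetical Resource type that counts its live instances (the names Resource, resource_cache and acquire_resource are invented for the example):

```cpp
#include <atomic>
#include <memory>
#include <mutex>

// Hypothetical resource type that tracks how many instances are alive.
struct Resource {
    static std::atomic<int> alive;
    Resource() { alive.fetch_add(1); }
    ~Resource() { alive.fetch_sub(1); }
};
std::atomic<int> Resource::alive{0};

std::mutex resource_mutex;
std::weak_ptr<Resource> resource_cache;  // observes the object without owning it

// Returns the shared object, creating it if no owner currently exists.
std::shared_ptr<Resource> acquire_resource() {
    std::lock_guard<std::mutex> lock(resource_mutex);
    std::shared_ptr<Resource> sp = resource_cache.lock();
    if (!sp) {
        sp = std::make_shared<Resource>();  // if this throws, the cache is left unchanged
        resource_cache = sp;
    }
    return sp;
}
```

The reference count lives inside shared_ptr, so there is no separate decrement function: the object is destroyed when the last shared_ptr returned by acquire_resource goes out of scope, and the next call creates a fresh one.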
So my concern is hitting a function which creates a thread before main() gets called. At that point, thread-safe functions need to have a mutex which is ready. Only, there is no way to guarantee the order in which global objects get initialized in C++ (well, to some extent we can, but still, in a large project, good luck with that!)
So I like to have my global objects, such as singletons, allocated at runtime. That way, they get initialized the first time you call their instance() function and that means they are ready when needed. However, if I have two or more threads running, they may each end up creating an instance unless I make sure to have a valid lock in that function.
I want to make sure that using the following initialization:
pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
always happens at compile time, and thus that global_mutex is always going to be ready for pthread_mutex_lock().
This means my singleton (and other classes that need initialization before main() gets called) can rely on that mutex as in the following simplified code:
// singleton.h
class singleton
{
public:
static singleton * instance();
...other functions...
private:
singleton();
};
// singleton.cpp
pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
singleton * global_singleton = nullptr;
singleton::singleton()
{
// do some initialization
}
singleton * singleton::instance()
{
pthread_mutex_lock(&global_mutex); // <- works even before main() called?
if(global_singleton == nullptr)
{
global_singleton = new singleton();
}
pthread_mutex_unlock(&global_mutex);
return global_singleton;
}
Note: I'm not asking about the validity of the singleton pattern, just about the mutex. Also, no, you can't safely use a smart pointer in the global variable, because you can't know whether it will be initialized before you call the instance() function. If you want to return a smart pointer, you can always use a bare pointer to a smart pointer. That changes very little for the singleton: it still won't get destroyed automatically.
In C++11, function-static variables are required by the standard to be initialized in a thread-safe fashion. How the implementation pulls that off is up to it, but the compiler is required to generate something to ensure thread safety.
So you can do this:
singleton * singleton::instance()
{
static auto global_singleton = new singleton();
return global_singleton;
}
Of course, you must understand that complex interactions of these "somethings" that make static initialization thread-safe may well lead to deadlocks. This is yet another reason why the singleton pattern should be avoided.
One solution is to use the pthread_once mechanism which is designed to provide a guaranteed one-time initialisation for libraries:
pthread_once
Adapting your example, something like:
#include <pthread.h>
static pthread_once_t lib_is_initialized = PTHREAD_ONCE_INIT;
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
void initialize_lib()
{
pthread_mutex_init(&global_mutex, NULL);
}
singleton * singleton::instance()
{
pthread_once(&lib_is_initialized, initialize_lib);
pthread_mutex_lock(&global_mutex);
if(global_singleton == nullptr)
{
global_singleton = new singleton();
}
pthread_mutex_unlock(&global_mutex);
return global_singleton;
}
However, this is more complex than necessary. You can rely on the safe initialisation of a static variable being performed once, so I would recommend using the simpler solution posted by @Nicol Bolas.
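For comparison, standard C++11 offers std::call_once as a portable counterpart to pthread_once. The sketch below is not the code from either answer; Widget, widget_instance and the init counter are hypothetical names used only to show the mechanism. std::once_flag has a constexpr constructor, so the flag itself needs no dynamic initialisation.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's singleton class.
struct Widget {
    int value = 42;
};

std::once_flag widget_once;        // constant-initialised
Widget* widget_ptr = nullptr;
std::atomic<int> widget_inits{0};  // counts how often the initialiser ran

Widget* widget_instance() {
    // The callable runs exactly once; concurrent callers block until it has finished,
    // and its writes are visible to every caller once call_once returns.
    std::call_once(widget_once, [] {
        widget_ptr = new Widget;   // intentionally never deleted, as in the question
        widget_inits.fetch_add(1);
    });
    return widget_ptr;
}

// Calls widget_instance() from n threads and returns how many initialisations happened.
int hammer_widget_instance(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { widget_instance(); });
    for (auto& t : threads) t.join();
    return widget_inits.load();
}
```

No separate mutex is needed around the null check here, because the once-flag already serialises the first call.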
Reading this article about Double Checked Locking Pattern in C++, I reached the place (page 10) where the authors demonstrate one of the attempts to implement DCLP "correctly" using volatile variables:
class Singleton {
public:
static volatile Singleton* volatile instance();
private:
static volatile Singleton* volatile pInstance;
};
// from the implementation file
volatile Singleton* volatile Singleton::pInstance = 0;
volatile Singleton* volatile Singleton::instance() {
if (pInstance == 0) {
Lock lock;
if (pInstance == 0) {
volatile Singleton* volatile temp = new Singleton;
pInstance = temp;
}
}
return pInstance;
}
After such example there is a text snippet that I don't understand:
First, the Standard’s constraints on observable behavior are only for
an abstract machine defined by the Standard, and that abstract machine
has no notion of multiple threads of execution. As a result, though
the Standard prevents compilers from reordering reads and writes to
volatile data within a thread, it imposes no constraints at all on
such reorderings across threads. At least that’s how most compiler
implementers interpret things. As a result, in practice, many
compilers may generate thread-unsafe code from the source above.
and later:
... C++’s abstract machine is single-threaded, and C++ compilers may
choose to generate thread-unsafe code from source like the above,
anyway.
These remarks relate to execution on a uniprocessor, so it's definitely not about cache-coherence issues.
If the compiler can't reorder reads and writes to volatile data within a thread, how can it reorder reads and writes across threads for this particular example thus generating thread-unsafe code?
The pointer to the Singleton may be volatile, but the data within the singleton is not.
Imagine Singleton has int x, y, z; as members, set to 15, 16, 17 in the constructor for some reason.
volatile Singleton* volatile temp = new Singleton;
pInstance = temp;
OK, temp is written before pInstance. When are x,y,z written relative to those? Before? After? You don't know. They aren't volatile, so they don't need to be ordered relative to the volatile ordering.
Now a thread comes in and sees:
if (pInstance == 0) { // first check
And let's say pInstance has been set, is not null.
What are the values of x,y,z? Even though new Singleton has been called, and the constructor has "run", you don't know whether the operations that set x,y,z have run or not.
So now your code goes and reads x,y,z and crashes, because it was really expecting 15,16,17, not random data.
Oh wait, pInstance is a volatile pointer to volatile data! So x,y,z is volatile right? Right? And thus ordered with pInstance and temp. Aha!
Almost. Any reads from *pInstance will be volatile, but the construction via new Singleton was not volatile. So the initial writes to x,y,z were not ordered. :-(
So you could, maybe, make the members volatile int x, y, z; OK. However...
C++ now has a memory model, even if it didn't when the article was written. Under the current rules, volatile does not prevent data races. volatile has nothing to do with threads. The program is UB. Cats and Dogs living together.
Also, although this is pushing the limits of the standard (i.e. it gets vague as to what volatile really means), an all-knowing, all-seeing, full-program-optimizing compiler could look at your uses of volatile and say "no, those volatiles don't actually connect to any IO memory addresses etc, they really aren't observable behaviour, I'm just going to make them non-volatile"...
I think they're referring to the cache coherency problem discussed in section 6 ("DCLP on Multiprocessor Machines"). With a multiprocessor system, the processor/cache hardware may write out the value for pInstance before the values are written out for the allocated Singleton. This can cause a 2nd CPU to see the non-NULL pInstance before it can see the data it points to.
This requires a hardware fence instruction to ensure all the memory is updated before other CPUs in the system can see any of it.
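In C++11 such fences can be requested portably with std::atomic_thread_fence. The following is a sketch of double-checked locking written with explicit fences, under the assumption that the pointer is a std::atomic accessed with relaxed operations; FencedSingleton and get_fenced_instance are names invented for the example, and the members x, y, z mirror the 15, 16, 17 illustration used earlier.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical singleton with three plain members set in the constructor.
struct FencedSingleton {
    int x, y, z;
    FencedSingleton() : x(15), y(16), z(17) {}
};

std::atomic<FencedSingleton*> fenced_instance{nullptr};
std::mutex fenced_mutex;

FencedSingleton* get_fenced_instance() {
    FencedSingleton* p = fenced_instance.load(std::memory_order_relaxed);
    // Acquire fence: pairs with the release fence below, so a thread that sees the
    // non-null pointer also sees the constructor's writes to x, y and z.
    std::atomic_thread_fence(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(fenced_mutex);
        p = fenced_instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new FencedSingleton;
            // Release fence: the construction above cannot be reordered past the store.
            std::atomic_thread_fence(std::memory_order_release);
            fenced_instance.store(p, std::memory_order_relaxed);
        }
    }
    return p;
}
```

The compiler emits whatever hardware barrier the target needs; on some architectures that may be no extra instruction at all.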
If I'm understanding correctly they are saying that in the context of a single-thread abstract machine the compiler may simply transform:
volatile Singleton* volatile temp = new Singleton;
pInstance = temp;
Into:
pInstance = new Singleton;
Because the observable behavior is unchanged. Then this brings us back to the original problem with double checked locking.
I have read many questions considering thread-safe double checked locking (for singletons or lazy init). In some threads, the answer is that the pattern is entirely broken, others suggest a solution.
So my question is: is there a way to write a fully thread-safe double-checked locking pattern in C++? If so, what does it look like?
We can assume C++11, if that makes things easier. As far as I know, C++11 improved the memory model, which could yield the needed improvements.
I do know that it is possible in Java by making the double-check guarded variable volatile. Since C++11 borrowed large parts of its memory model from Java's, I think it should be possible, but how?
Simply use a static local variable for lazily initialized Singletons, like so:
MySingleton* GetInstance() {
static MySingleton instance;
return &instance;
}
The (C++11) standard already guarantees that static variables are initialized in a thread-safe manner, and it seems likely that the implementation of this is at least as robust and performant as anything you'd write yourself.
The thread safety of the initialization is specified in §6.7 [stmt.dcl], paragraph 4 of the (C++11) standard:
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
Since you wanted to see a valid DCLP C++11 implementation, here is one.
The behavior is fully thread-safe and identical to GetInstance() in Grizzly's answer.
std::mutex mtx;
std::atomic<MySingleton *> instance_p{nullptr};
MySingleton* GetInstance()
{
auto *p = instance_p.load(std::memory_order_acquire);
if (!p)
{
std::lock_guard<std::mutex> lck{mtx};
p = instance_p.load(std::memory_order_relaxed);
if (!p)
{
p = new MySingleton;
instance_p.store(p, std::memory_order_release);
}
}
return p;
}
I would like to use the singleton pattern in a multithreaded program. The double-checked locking method seems suitable because of its efficiency; however, this method is broken and not easy to get right.
I wrote the following code hoping that it works as an alternative to double-checked locking. Is it a correct implementation of a thread-safe singleton pattern?
static bool created = false;
static Instance *instance = 0;
Instance *GetInstance() {
if (!created) {
Lock lock; // acquire a lock, parameters are omitted for simplicity
if (!instance) {
instance = new Instance;
} else {
created = true;
}
}
return instance;
}
The first call will create Instance. The second call will set created to true. And finally, all other calls will return a well initialized instance.
http://voofie.com/content/192/alternative-to-double-checked-locking-and-the-singleton-pattern/
No, this doesn't help. If the writes to created and instance are non-atomic then there is no guarantee that the values are visible to a thread that doesn't lock the mutex.
e.g. Thread 1 calls getInstance. created is false, and instance is null, so it locks the mutex and creates a new instance. Thread 1 calls getInstance again, and this time sets created to true. Thread 2 now calls getInstance. By the vagaries of the processor's memory management it sees created as true, but there is no guarantee that it also sees instance as non-null, and even if it does there is no guarantee that the memory values for the pointed-to instance are consistent.
If you're not using atomics then you need to use mutexes, and must use them for all accesses to a protected variable.
Additional info: If you are using mutexes, then the compiler and runtime work together to ensure that when one thread releases a mutex lock and another thread acquires a lock on that same mutex then the second thread can see all the writes done by the first. This is not true for non-atomic accesses, and may or may not be true for atomic accesses, depending on what memory ordering constraints the compiler and runtime guarantee for you (with C++11 atomics you can choose the ordering constraints).
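As a small illustration of choosing ordering constraints with C++11 atomics, the sketch below publishes an ordinary variable through an atomic flag using release and acquire ordering. The function name publish_and_read is an assumption made for the example.

```cpp
#include <atomic>
#include <thread>

// Returns the value the reader thread observed in the plain (non-atomic) variable.
int publish_and_read() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};
    int observed = -1;

    std::thread writer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // publishes everything written before it
    });
    std::thread reader([&] {
        // The acquire load that reads `true` synchronises with the release store,
        // so the write to payload is guaranteed to be visible afterwards.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        observed = payload;
    });

    writer.join();
    reader.join();
    return observed;
}
```

With memory_order_relaxed on both sides the flag would still become true eventually, but reading payload would then be a data race.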
It has the same reliability as double-checked locking.
You can get more with "triple check", or even "quadruple-check", but full reliability can be demonstrated to be impossible.
Please note that declaring a local static variable makes the compiler implement this same logic for you.
#include<memory>
#include "Instance.h" //or whatever...
Instance* GetInstance()
{
static std::unique_ptr<Instance> p(new Instance);
return p.get();
}
If the compiler is configured for a multithreaded environment, it should protect the static p with a mutex and manage the lock when initializing p at the very first call. It should also chain p's destruction to the tail of the "at_exit" chain, so that proper destruction is performed at program end.
[EDIT]
Since this is a requirement in C++11 and is implemented only by some pre-standard C++03 compilers, check your compiler's implementation and settings.
Right now, I can only vouch that MinGW 4.6 onward and VS2010 already did it.
No. There's absolutely no difference between your code and double
checked locking. The correct implementation is:
static std::mutex m;
Singleton&
Singleton::instance()
{
static Singleton* theOneAndOnly;
std::lock_guard<std::mutex> l(m);
if (theOneAndOnly == NULL)
theOneAndOnly = new Singleton;
return *theOneAndOnly;
}
It's hard to imagine a case where this would cause a problem, and it is
guaranteed. You do acquire the lock each time, but acquiring an
uncontested mutex should be fairly cheap, you're not accessing
Singletons that much, and if you do end up having to access it in the
middle of a tight loop, there's nothing to stop you from acquiring a
reference to it before entering the loop, and using it.
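The loop advice could be sketched as follows, assuming a hypothetical Counter singleton built the same always-lock way; Counter, add, total and sum_up_to are invented for the illustration.

```cpp
#include <mutex>

static std::mutex counter_mutex;

// Hypothetical singleton that takes the lock on every call to instance().
class Counter {
public:
    static Counter& instance() {
        static Counter* theOneAndOnly = nullptr;
        std::lock_guard<std::mutex> l(counter_mutex);
        if (theOneAndOnly == nullptr)
            theOneAndOnly = new Counter;
        return *theOneAndOnly;
    }
    void add(int n) { total_ += n; }  // not synchronised; single-threaded use in this sketch
    int total() const { return total_; }

private:
    Counter() = default;
    int total_ = 0;
};

// Adds 1..n to the counter and returns how much the total grew.
int sum_up_to(int n) {
    Counter& counter = Counter::instance();  // lock taken once, before the loop
    const int before = counter.total();
    for (int i = 1; i <= n; ++i)
        counter.add(i);                      // no lock acquisition inside the loop
    return counter.total() - before;
}
```

Only the lookup of the instance is protected by the mutex; any state the singleton holds still needs its own synchronisation if several threads use it.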
This code contains a race-condition, in that created can be read while it is concurrently being written to by a different thread.
As a result, it has undefined behaviour, and is not a valid way of writing that code.
As KennyTM pointed out in the comments, a far better alternative is:
Instance* GetInstance() { static Instance instance; return &instance; }
Your solution will work fine, but it does one more check than necessary.
The right double-checked locking looks like this:
static Instance *instance = 0;
Instance *GetInstance() {
if (instance == NULL) //first check.
{
Lock lock; //scope lock.
if (instance == NULL) //second check, the second check must be under the lock.
instance = new Instance;
}
return instance;
}
Double-checked locking performs well because it does not acquire the lock every time. It is thread-safe in that the locking guarantees that only one instance is created.