I've seen a singleton implemented using double-checked locking like this:
Foo& Foo::Instance()
{
static std::unique_ptr<Foo> instance;
if (!instance)
{
boost::unique_lock<boost::mutex> lock(MUTEX);
if (!instance)
instance.reset(new Foo());
}
return *instance;
}
I know that double-checked locking is fixed in C++11, but our project uses Visual C++ 2010, which doesn't support all C++11 features.
I'm curious: in which ways is this code unsafe?
Problem 1
In the Meyers-Alexandrescu paper, there is a clear example of how naive DCL may fail because of an "unexpected" order of the actual instructions.
Singleton* Singleton::instance() {
if (pInstance == 0) { // 1st test
Lock lock;
if (pInstance == 0) { // 2nd test
pInstance = new Singleton;
}
}
return pInstance;
}
Here, pInstance = new Singleton; actually consists of 3 steps: memory allocation, the Singleton constructor, and assignment, and the compiler may place them in a different order. If the assignment happens before the constructor, a concurrent thread may end up using an uninitialized piece of memory instead of a valid Singleton instance.
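To make the hazard concrete, here is a hand-expanded sketch of one ordering the compiler is allowed to pick for pInstance = new Singleton;. The Singleton type with its value member and the function name create_in_hazardous_order are made up for illustration; a real compiler would do this internally, not in source code.

```cpp
#include <new>

// Hypothetical stand-in for the Singleton in the paper's example.
struct Singleton {
    int value;
    Singleton() : value(42) {}
};

Singleton* pInstance = nullptr;

// What "pInstance = new Singleton;" may effectively become after reordering.
void create_in_hazardous_order() {
    void* raw = ::operator new(sizeof(Singleton)); // step 1: allocate
    pInstance = static_cast<Singleton*>(raw);      // step 3 moved early: publish
    // A second thread running the unlocked first test at this point sees a
    // non-null pInstance that still points at unconstructed memory.
    new (raw) Singleton;                           // step 2: construct
}
```

In a single thread the end result is indistinguishable from the "natural" order, which is exactly why the compiler is free to do it.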
I wonder: does it still apply to my example, where instead of plain pointer unique_ptr is used?
From the first look, instance.reset(new Foo()); seems OK: reset is called only after Foo is fully initialized. But what if inlining occurs? I'm in doubt about thread-safety of this.
Problem 2
Another concern: is static std::unique_ptr<Foo> instance; itself safe? AFAIK, static X x; translates to something like this:
if (!x_is_initialized) {
x_is_initialized = true;
x = new X();
}
So in my case, unique_ptr may be allocated by thread #1 (but not constructed yet!), then thread #2 takes over, and it sees that unique_ptr is seemingly initialized and has some value (actually a bogus pointer, which has not yet been replaced by nullptr). Thread #2 happily dereferences that bogus and program crashes with access violation.
Can it actually happen?
Don't optimize prematurely
Unfortunately, MSVC 2010 does not support 'magic statics' that, in effect, perform automatic double-checked locking. But before you start optimizing here: Do you REALLY need it? Don't complicate your code unless it's really necessary. In particular, since MSVC 2010 does not fully support C++11, you don't have any portable, standard way that guarantees proper multi-threading.
The way to get it to work
However, you can use boost::atomic<Foo*> to deal with the problem, and the compiler will most likely handle it correctly. If you really want to be sure, check the compiled assembly code in both debug and release mode. The assignment to an atomic pointer is guaranteed to take place after the construction, even if code is inlined. This is due to special compiler intrinsics for atomic operations, which are guaranteed not to be reordered with writes that are supposed to happen before the write to the atomic variable.
The following code should do the trick:
Foo & Foo::Instance()
{
static boost::atomic<Foo *> instance; // zero-initialized, since static
if ( !instance.load() )
{
boost::lock_guard<boost::mutex> lock(mutex);
if ( !instance.load() )
{
// this code is guaranteed to be called at most once.
instance = new Foo;
std::atexit( []{ delete &Instance(); } );
}
}
return *instance.load();
}
Problem 1
Your compiler might still reorder things in some optimization pass. If the compiler doesn't, then the processor might do some instruction reordering. Unless you use genuine atomics with their special instructions, or thread-safe constructs like mutexes and condition variables, you will get races if you access a variable from different threads at the same time and at least one of them is writing. Never EVER do that. Again, boost::atomic will do the job (most likely).
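For comparison, on a compiler with C++11 <atomic> the same pattern can be written with explicit memory orders, which makes the required ordering visible. This is only a sketch under assumptions: it swaps boost::atomic and boost::mutex for std::atomic and std::mutex, and Foo with its constructions counter is a placeholder type, not the class from the question.

```cpp
#include <atomic>
#include <mutex>

// Placeholder singleton type; the counter exists only to observe construction.
class Foo {
public:
    static Foo& Instance();
    static int constructions;
private:
    Foo() { ++constructions; }
};

int Foo::constructions = 0;

Foo& Foo::Instance() {
    static std::atomic<Foo*> instance(nullptr);
    static std::mutex init_mutex;

    // Acquire pairs with the release store below: a thread that sees a
    // non-null pointer also sees the fully constructed Foo behind it.
    Foo* p = instance.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // The mutex already orders this load against the store below.
        p = instance.load(std::memory_order_relaxed);
        if (!p) {
            p = new Foo; // intentionally never deleted in this sketch
            instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

The sketch leaks the Foo on purpose, which sidesteps the clean-up question rather than answering it.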
Problem 2
That is exactly what magic statics are supposed to do: They safely initialize static variables that are accessed concurrently. MSVC 2010 does not support this. Therefore, don't use it. The code that is produced by the compiler will be unsafe. What you suspected in your question can in theory really happen. By the way: The memory for static variables is reserved at program start-up and is AFAIK zero-initialized. No new operator is called to reserve the memory for the static variable.
Still a problem?
The std::atexit() function might not be implemented thread-safely in MSVC 2010 and should possibly not be used at all, or should only be used in the main() thread. Most implementations of double-checked locking ignore this clean-up problem, and it is no problem as long as the destructor of Foo does nothing important. The unfreed memory, file handles and so forth will be reclaimed by the operating system anyway. Examples of something important that could be done in the destructor are notifying another process about something, or serializing state to disk that will be loaded at the next start of the application. If you are interested in double-checked locking, there's a really good talk, Lock-Free Programming (or, Juggling Razor Blades) by Herb Sutter, that covers this topic.
The implementation is not thread safe: there is a data race since it's possible that some thread is accessing instance at the same time that another is modifying it:
static std::unique_ptr<Foo> instance;
if (!instance) // Read instance
{
boost::unique_lock<boost::mutex> lock(MUTEX);
if (!instance)
instance.reset(new Foo()); // Modify instance
}
Programs with data races have undefined behavior. Realistically, for VC++ on x86 it could result in:
Reader threads seeing a pointer value in instance to a Foo that isn't fully constructed. The compiler is free to reorder the store into instance with the construction of the Foo, since such a reordering would not be observable in a program without data races.
Multiple Foos being constructed simultaneously and leaked. The compiler may optimize away the check of !instance inside the lock, since it already knows the value of instance from the earlier check and that value could not have changed in a program without data races. This results in multiple threads allocating Foos and storing them in the unique_ptr. The check of the contained pointer value could be optimized out inside reset as well, resulting in leaked Foo objects.
For the question of whether or not static std::unique_ptr<Foo> instance; is thread-safe in the first place, the answer is a firm maybe. std::unique_ptr's default constructor is constexpr in C++11, so instance should be zero-filled during constant initialization before main is even entered. Of course VS2010 does not support constexpr, so your guess is as good as mine. Examination of the generated assembly should give you an idea.
If you're OK with C++11, everything just got a whole lot simpler thanks to §6.7.4:
If control enters
the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for
completion of the initialization
How simple? This simple:
Foo& Foo::Instance()
{
// this is thread-safe in C++11
static Foo instance;
return instance;
}
To what degree VC2010 supports this, I do not know. And while I realize this doesn't literally answer your questions about your specific problems (I believe that instance.reset() solves the simple pointer problem, since reset() is basically just an assignment, but am not sure), hopefully it's helpful anyway.
It seems fairly common to have pthread mutexes that are intended to exist until the end of a program's lifetime. Often these are created using PTHREAD_MUTEX_INITIALIZER.
Here's a short but complete code example showing what I'm referring to:
#include <pthread.h>
#include <iostream>
void log(char const * const message) {
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&mutex);
std::cout << message << std::endl;
pthread_mutex_unlock(&mutex);
}
struct Object final {
Object() { log("Object::Object()"); }
~Object() { log("Object::~Object()"); }
};
Object const object;
int main(int const argc, const char * argv[]) {
log("main()");
// Here the program would enter a main loop, with log() potentially being
// called from multiple threads at various times.
}
Output:
Object::Object()
main()
Object::~Object()
Error-checking and RAII wrapping for the lock are omitted for brevity.
This example includes a thread-safe (at least that's the intent) logging function that's intended to be available throughout the lifetime of the program, including during deinitialization of objects with static storage duration, possibly (although not in this case) across multiple translation units, meaning that deinitialization order may be indeterminate.
The issue is that there's no opportunity to safely destroy the mutex, as it may be needed at any point in the program's lifetime. (In practice I don't intend to have objects with static storage duration and nontrivial destructors, but I'm interested in addressing the issue nonetheless.)
The first question that arises is whether mutexes initialized using PTHREAD_MUTEX_INITIALIZER need to be destroyed using pthread_mutex_destroy(). At least some versions of the documentation include this wording:
In cases where default mutex attributes are appropriate, the macro
PTHREAD_MUTEX_INITIALIZER can be used to initialize mutexes. The
effect shall be equivalent to dynamic initialization by a call to
pthread_mutex_init() with parameter attr specified as NULL, except
that no error checks are performed.
This suggests that if pthread_mutex_destroy() is expected to be called on mutexes initialized using pthread_mutex_init(), then it's expected to be called on mutexes initialized using PTHREAD_MUTEX_INITIALIZER as well.
However, in searching online and here on Stack Overflow, I've found disagreement as to whether it's required. For example, here someone offers the following quote from a book on Linux development:
It is not necessary to call pthread_mutex_destroy() on a mutex that
was statically initialized using PTHREAD_MUTEX_INITIALIZER.
On the other hand, in this thread it's argued that explicitly destroying such a mutex is in fact required.
I've also seen it argued that it's not necessary to clean up mutexes in these sorts of circumstances regardless of how they're initialized because the resources will be reclaimed anyway. (This would presumably be the same logic behind the 'construct on first use and deliberately leak memory' idiom sometimes used for singletons and other objects with static storage duration.)
I found a number of other threads that touch on the subject, with a mix of opinions on if/how mutexes should be destroyed. I'll also mention that I believe I've seen production code from credible sources that initializes mutexes with PTHREAD_MUTEX_INITIALIZER and never destroys them.
I've gone into some detail here as a matter of due diligence, but my question is (I think) fairly simple. It would be useful to have mutexes that are available from initialization to the very end of the program's lifetime. I suspect not cleaning up such mutexes wouldn't cause any problems, but that approach troubles me. And even though some say mutexes initialized using PTHREAD_MUTEX_INITIALIZER don't need to be cleaned up, that seems contrary to the documentation and to various claims made by others.
In summary, is there a safe and reasonable way to manage pthread mutexes that are intended to be available until the end of a program's lifetime? Is there any standard best practice here that I've failed to stumble upon in my search?
Since initializing with PTHREAD_MUTEX_INITIALIZER is equivalent to calling pthread_mutex_init, it is OK to call pthread_mutex_destroy to destroy such a mutex.
However, calling pthread_mutex_destroy is not required; the resources will be reclaimed by the OS at program exit. Since it is an object with a trivial destructor, it is not destroyed as part of static cleanup at program exit, and so is safe to use until the end of the program.
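The example in the question left out error checking and RAII wrapping. A minimal sketch of both might look like the following; ScopedLock and log_line are hypothetical names, the line counter exists only so the behavior can be observed, and the mutex is deliberately never passed to pthread_mutex_destroy().

```cpp
#include <pthread.h>
#include <iostream>
#include <stdexcept>

// Hypothetical RAII guard: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(pthread_mutex_t& m) : mutex_(m) {
        if (pthread_mutex_lock(&mutex_) != 0)
            throw std::runtime_error("pthread_mutex_lock failed");
    }
    ~ScopedLock() { pthread_mutex_unlock(&mutex_); }
private:
    ScopedLock(const ScopedLock&);            // non-copyable
    ScopedLock& operator=(const ScopedLock&);
    pthread_mutex_t& mutex_;
};

// Logs one line and returns the number of lines logged so far.
int log_line(const char* message) {
    // Statically initialized and never destroyed, so the mutex itself stays
    // valid for the whole lifetime of the program.
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static int lines = 0;
    ScopedLock lock(mutex);
    std::cout << message << std::endl;
    return ++lines;
}
```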
class SelfTesting
{
private:
char * pChar;
SelfTesting()
{
pChar = new char[1024 * 1024 * 1024];
}
public:
static SelfTesting * pSelf;
static SelfTesting& GetInst()
{
if (!pSelf)
{
boost::lock_guard<boost::mutex> lock(g_boost_mutex);
if (!pSelf)
{
pSelf = new SelfTesting;
}
}
return *pSelf;
}
};
Generally, I know the problem is caused by the following sequence:
1. allocate memory for `SelfTesting`
2. store the pointer to the allocated memory in `pSelf`
3. initialize `SelfTesting`
If another thread touches the pointer between steps 2 and 3, a data race occurs. From
If the pointer copy is not atomic, a data race can also occur. From
I know that I can use a local static variable to implement this pattern in C++11. My question is: is the above implementation thread-safe, or is its behavior undefined, when I am using a C++ standard before C++11? Does boost::mutex make sure that pSelf is updated once the lock is released?
As written, this pattern is not safe in any version of C++. In C++11 terms, you have a data race between the outer read of pSelf and the write of pSelf.
In general, pre-C++11, there's no such thing as guaranteed safety of multi-threaded code. The most important thing about C++11 was that it introduced into the abstract model of C++ execution the idea that there might be multiple threads of execution in the first place. Before that, there were absolutely no guarantees whatsoever about multi-threaded execution, because the concept didn't exist. Any situation where you have more than one thread is undefined as far as the standard is concerned, because it doesn't define what a thread is.
This means basically that any multi-threaded code you write in C++98 is entirely dependent on the particular implementation you use. There were some things you could rely on with mainstream compilers, though in the development of C++11, several behaviors (especially optimizations) were found in the various compilers that were not actually safe when there were multiple threads.
But this is all irrelevant, because the code as you wrote it was never guaranteed to be safe in any compiler I'm aware of. With Visual C++, you could have made it safe by making pSelf volatile (VC++ treated volatile variables sort of like atomics), but that wouldn't have worked in GCC. In GCC, you would have to use the atomic intrinsics, or the Boost.Atomic library.
I wonder, is it safe to implement like this? :
typedef shared_ptr<Foo> FooPtr;
FooPtr *gPtrToFooPtr; // global variable
// init (before any thread has been created)
void init()
{
gPtrToFooPtr = new FooPtr(new Foo);
}
// thread A, B, C, ..., K
// Once thread Z execute read_and_drop(),
// no more call to read() from any thread.
// But it is possible even after read_and_drop() has returned,
// some thread is still in read() function.
void read()
{
FooPtr a = *gPtrToFooPtr;
// do useful things (read only)
}
// thread Z (executed once)
void read_and_drop()
{
FooPtr b = *gPtrToFooPtr;
// do useful things with b (read only)
b.reset();
}
We do not know which thread would do the actual release.
Does boost's shared_ptr do the release safely under circumstance like this?
According to boost's document, thread safety of shared_ptr is:
A shared_ptr instance can be "read" (accessed using only const
operations) simultaneously by multiple threads. Different shared_ptr
instances can be "written to" (accessed using mutable operations such
as operator= or reset) simultaneously by multiple threads.
As far as I can tell, the code above does not violate any of the thread safety criteria mentioned above, and I believe the code should run fine. Can anyone tell me if I am right or wrong?
Thanks in advance.
Edited 2012-06-20 01:00 UTC+9
The pseudo code above works fine. The shared_ptr implementation is guaranteed to work correctly when multiple threads are accessing instances of it (each thread MUST access its own instance of shared_ptr, created with the copy constructor).
Note that in the pseudo code above, you must delete gPtrToFooPtr to have the shared_ptr implementation finally release (drop the reference count by one) the object it owns(not proper expression since it is not an auto_ptr, but who cares ;) ). And in this case, you must be aware of the fact that it may cause SIGSEGV in multithreaded application.
How do you define 'safe' here? If you define it as 'I want to make sure that the object is destroyed exactly once', then YES, the release is safe. However, the problem is that the two threads share one smart pointer in your example. This is not safe at all. The reset() performed by one thread might not be visible to the other thread.
As stated by the documentation, smart pointers offer the same guarantees as built-in types (i.e., pointers). Therefore, it is problematic to perform an unguarded write while another thread might still be reading. It is undefined when the reading thread will see the writes of the other one. Therefore, while one thread calls reset(), the pointer might NOT be reset in the other thread, since the shared_ptr instance itself is shared.
If you want some sort of thread safety, you have to use two shared pointer instances. Then, of course, resetting one of them WILL NOT release the object, since the other thread still has a reference to it. Usually this behaviour is intended.
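The ownership behavior described above can be seen in a small single-threaded sketch. It uses std::shared_ptr as a stand-in for boost::shared_ptr, and the Foo type and function name are invented for illustration. In a real program, each copy has to be taken while no other thread is writing to the source shared_ptr.

```cpp
#include <memory>

struct Foo {
    int value;
    explicit Foo(int v) : value(v) {}
};

// Returns true if the Foo survives a reset() of the "global" handle while a
// reader still holds its own copy, and is destroyed once that copy is gone.
bool reader_copy_keeps_object_alive() {
    std::shared_ptr<Foo> global = std::make_shared<Foo>(7);
    std::weak_ptr<Foo> observer = global;      // lets us watch the lifetime

    std::shared_ptr<Foo> reader_copy = global; // taken before the reset
    global.reset();                            // the owner drops its reference

    bool alive_while_reader_holds =
        !observer.expired() && reader_copy->value == 7;

    reader_copy.reset();                       // last reference goes away
    return alive_while_reader_holds && observer.expired();
}
```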
However, I think the bigger problem is that you are misusing shared_ptrs. It is quite uncommon to use pointers of shared_ptrs and to allocate the shared_ptr on the heap (using new). If you do that, you have the problem you wanted to avoid using smart pointers again (you have to manage the lifetime of the shared_ptr now). Maybe check out some example code about smart pointers and their usage first.
For your own good, I will be honest.
Your code is doing many things and almost all are simply useless and absurd.
typedef shared_ptr<Foo> FooPtr;
FooPtr *gPtrToFooPtr; // global variable
A raw pointer to a smart pointer cancels the advantage of automatic resource management and does not solve any problem.
void read()
{
FooPtr a = *gPtrToFooPtr;
// do useful things (read only)
}
a is not used in any meaningful way.
{
FooPtr b = ...
b.reset();
}
b.reset() is useless here, b is about to be destroyed anyway. b has no purpose in this function.
I am afraid you have no idea what you are doing, what smart pointers are for, how to use shared_ptr, and how to do MT programming; so, you end up with this absurd pile of useless features to not solve the problem.
What about doing simple things simply:
Foo f;
// called before others functions
void init() {
// prepare f
}
// called in many threads {R1, R2, ... Rn} in parallel
void read()
{
// use f (read-only)
}
// called after all threads {R1, R2, ... Rn} have terminated
void read_and_drop()
{
// reset f
}
read_and_drop() must not be called before it can be guaranteed that other threads are not reading f.
To your edit:
Why not call reset() first on the global shared_ptr?
If you were the last one to access the object, fine it is deleted, then you delete the shared_ptr on the heap.
If some other thread still uses it, you reduce the ref count by one, and "disconnect" the global ptr from the (still existing) object that is pointed-to. You can then safely delete the shared_ptr on the heap without affecting any thread that might still use it.
I am working on a set that is frequently read but rarely written.
class A {
boost::shared_ptr<std::set<int> > _mySet;
public:
void add(int v) {
boost::shared_ptr<std::set<int> > tmpSet(new std::set<int>(*_mySet));
tmpSet->insert(v); // insert to tmpSet
_mySet = tmpSet; // swap _mySet
}
void check(int v) {
boost::shared_ptr<std::set<int> > theSet = _mySet;
if (theSet->find(v) != theSet->end()) {
// do something irrelevant
}
}
};
In the class, add() is only called by one thread and check() is called by many threads. check() does not care whether _mySet is the latest or not. Is the class thread-safe? Is it possible that the thread executing check() would observe swap _mySet happening before insert to tmpSet?
This is an interesting use of shared_ptr to implement thread safety.
Whether it is OK depends on the thread-safety guarantees of
boost::shared_ptr. In particular, does it establish some sort of
fence or membar, so that you are guaranteed that all of the writes in
the constructor and insert functions of set occur before any
modification of the pointer value becomes visible?
I can find no thread safety guarantees whatsoever in the Boost
documentation of smart pointers. This surprises me, as I was sure that
there was some. But a quick look at the sources for 1.47.0 shows none,
and that any use of boost::shared_ptr in a threaded environment will
fail. (Could someone please point me to what I'm missing. I can't
believe that boost::shared_ptr has ignored threading.)
Anyway, there are three possibilities: you can't use the shared pointer
in a threaded environment (which seems to be the case), the shared
pointer ensures its own internal consistency in a threaded environment,
but doesn't establish ordering with regards to other objects, or the
shared pointer establishes full ordering. Only in the last case will
your code be safe as is. In the first case, you'll need some form of
lock around everything, and in the second, you'll need some sort of
fences or membar to ensure that the necessary writes are actually done
before publishing the new version, and that they will be seen before
trying to read it.
You do need synchronization, it is not thread safe. Generally it doesn't matter, even something as simple as shared += value; is not thread safe.
Look here, for example, with regard to the thread safety of shared_ptr: Is boost shared_ptr <XXX> thread safe?
I would also question your allocation/swapping in add() and use of shared_ptr in check()
update:
I went back and re-read the docs for shared_ptr ... It is most likely thread-safe in your particular case, since the reference counting for shared_ptr is thread-safe. However, you are (IMHO) adding unnecessary complexity by not using a read/write lock.
Eventually this code should be thread safe:
atomic_store(&_mySet, tmpSet);
and
theSet = atomic_load(&_mySet);
(instead of simple assignments)
But I don't know the current status of atomicity support for shared_ptr.
Note that adding atomicity to shared_ptr in a lock-free manner is a really difficult thing, so even if atomicity is implemented it may rely on mutexes or user-mode spinlocks and, therefore, may sometimes suffer from performance issues.
Edit: Perhaps the volatile qualifier should also be added to the _mySet member variable, but I'm not sure that it is strictly required by the semantics of atomic operations.
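As a rough sketch of how the add()/check() pair from the question could look with these two calls: the version below assumes a C++11 standard library and uses std::shared_ptr with the free functions std::atomic_load and std::atomic_store from <memory> (deprecated since C++20 in favor of std::atomic<std::shared_ptr<T>>). The SetPtr typedef and the bool return value of check() are additions for illustration.

```cpp
#include <memory>
#include <set>

class A {
    typedef std::shared_ptr<const std::set<int> > SetPtr;
    SetPtr set_;
public:
    A() : set_(std::make_shared<std::set<int> >()) {}

    // Single writer: copy, modify the private copy, then publish it.
    void add(int v) {
        SetPtr current = std::atomic_load(&set_);
        std::shared_ptr<std::set<int> > copy =
            std::make_shared<std::set<int> >(*current);
        copy->insert(v);
        std::atomic_store(&set_, SetPtr(copy));
    }

    // Many readers: take a snapshot and search it without further locking.
    bool check(int v) const {
        SetPtr snapshot = std::atomic_load(&set_);
        return snapshot->count(v) != 0;
    }
};
```

Readers may see a slightly stale set, which matches the requirement that check() does not care whether it has the latest version.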
I've been reading about thread-safe singleton patterns here:
http://en.wikipedia.org/wiki/Singleton_pattern#C.2B.2B_.28using_pthreads.29
And it says at the bottom that the only safe way is to use pthread_once - which isn't available on Windows.
Is that the only way of guaranteeing thread safe initialisation?
I've read this thread on SO:
Thread safe lazy construction of a singleton in C++
And it seems to hint at an atomic OS-level swap-and-compare function, which I assume on Windows is:
http://msdn.microsoft.com/en-us/library/ms683568.aspx
Can this do what I want?
Edit: I would like lazy initialisation and for there to only ever be one instance of the class.
Someone on another site mentioned using a global inside a namespace (and he described a singleton as an anti-pattern) - how can it be an "anti-pattern"?
Accepted Answer:
I've accepted Josh's answer as I'm using Visual Studio 2008 - NB: For future readers, if you aren't using this compiler (or 2005) - Don't use the accepted answer!!
Edit:
The code works fine except the return statement - I get an error:
error C2440: 'return' : cannot convert from 'volatile Singleton *' to 'Singleton *'.
Should I modify the return value to be volatile Singleton *?
Edit: Apparently const_cast<> will remove the volatile qualifier. Thanks again to Josh.
A simple way to guarantee cross-platform thread safe initialization of a singleton is to perform it explicitly (via a call to a static member function on the singleton) in the main thread of your application before your application starts any other threads (or at least any other threads that will access the singleton).
Ensuring thread safe access to the singleton is then achieved in the usual way with mutexes/critical sections.
Lazy initialization can also be achieved using a similar mechanism. The usual problem encountered with this is that the mutex required to provide thread-safety is often initialized in the singleton itself which just pushes the thread-safety issue to initialization of the mutex/critical section. One way to overcome this issue is to create and initialize a mutex/critical section in the main thread of your application then pass it to the singleton via a call to a static member function. The heavyweight initialization of the singleton can then occur in a thread-safe manner using this pre-initialized mutex/critical section. For example:
// A critical section guard - create on the stack to provide
// automatic locking/unlocking even in the face of uncaught exceptions
class Guard {
private:
LPCRITICAL_SECTION CriticalSection;
public:
Guard(LPCRITICAL_SECTION CS) : CriticalSection(CS) {
EnterCriticalSection(CriticalSection);
}
~Guard() {
LeaveCriticalSection(CriticalSection);
}
};
// A thread-safe singleton
class Singleton {
private:
static Singleton* Instance;
static CRITICAL_SECTION InitLock;
CRITICAL_SECTION InstanceLock;
Singleton() {
// Time consuming initialization here ...
InitializeCriticalSection(&InstanceLock);
}
~Singleton() {
DeleteCriticalSection(&InstanceLock);
}
public:
// Not thread-safe - to be called from the main application thread
static void Create() {
InitializeCriticalSection(&InitLock);
Instance = NULL;
}
// Not thread-safe - to be called from the main application thread
static void Destroy() {
delete Instance;
DeleteCriticalSection(&InitLock);
}
// Thread-safe lazy initializer
static Singleton* GetInstance() {
Guard guard(&InitLock); // named object, so the lock is held until the function returns
if (Instance == NULL) {
Instance = new Singleton;
}
return Instance;
}
// Thread-safe operation
void doThreadSafeOperation() {
Guard guard(&InstanceLock); // named object, so the lock is held until the function returns
// Perform thread-safe operation
}
};
However, there are good reasons to avoid the use of singletons altogether (and why they are sometimes referred to as an anti-pattern):
They are essentially glorified global variables
They can lead to high coupling between disparate parts of an application
They can make unit testing more complicated or impossible (due to the difficulty in swapping real singletons with fake implementations)
An alternative is to make use of a 'logical singleton' whereby you create and initialise a single instance of a class in the main thread and pass it to the objects which require it. This approach can become unwieldy where there are many objects which you want to create as singletons. In this case the disparate objects can be bundled into a single 'Context' object which is then passed around where necessary.
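A minimal sketch of the 'Context' idea follows. Logger, Settings, Context and Worker are all hypothetical names chosen for illustration, and the sketch ignores synchronization: members shared between threads would still need their own locking.

```cpp
#include <string>
#include <vector>

// Hypothetical services that might otherwise have been singletons.
struct Logger {
    std::vector<std::string> lines;
    void write(const std::string& line) { lines.push_back(line); }
};

struct Settings {
    int retries;
    Settings() : retries(3) {}
};

// The single Context bundles them. It is created once by the main thread
// and passed by reference to whatever needs it.
struct Context {
    Logger logger;
    Settings settings;
};

class Worker {
public:
    explicit Worker(Context& context) : context_(context) {}
    void run() {
        context_.logger.write(
            "retries=" + std::to_string(context_.settings.retries));
    }
private:
    Context& context_;
};
```

Because Worker receives its dependencies explicitly, a test can hand it a Context with whatever Logger contents or Settings values it likes.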
If you are using Visual C++ 2005/2008 you can use the double-checked locking pattern, since "volatile variables behave as fences". This is the most efficient way to implement a lazy-initialized singleton.
From MSDN Magazine:
Singleton* GetSingleton()
{
volatile static Singleton* pSingleton = 0;
if (pSingleton == NULL)
{
EnterCriticalSection(&cs);
if (pSingleton == NULL)
{
try
{
pSingleton = new Singleton();
}
catch (...)
{
// Something went wrong.
}
}
LeaveCriticalSection(&cs);
}
return const_cast<Singleton*>(pSingleton);
}
Whenever you need access to the singleton, just call GetSingleton(). The first time it is called, the static pointer will be initialized. After it's initialized, the NULL check will prevent locking for just reading the pointer.
DO NOT use this on just any compiler, as it's not portable. The standard makes no guarantees on how this will work. Visual C++ 2005 explicitly adds to the semantics of volatile to make this possible.
You'll have to declare and initialize the CRITICAL_SECTION elsewhere in code. But that initialization is cheap, so lazy initialization is usually not important.
While I like the accepted solution, I just found another promising lead and thought I should share it here: One-Time Initialization (Windows)
You can use an OS primitive such as a mutex or critical section to ensure thread-safe initialization; however, this will incur an overhead each time your singleton pointer is accessed (due to acquiring a lock). It's also non-portable.
There is one clarifying point you need to consider for this question. Do you require ...
That one and only one instance of a class is ever actually created
Many instances of a class can be created but there should only be one true definitive instance of the class
There are many samples on the web to implement these patterns in C++. Here's a Code Project Sample
The following explains how to do it in C#, but the exact same concept applies to any programming language that would support the singleton pattern
http://www.yoda.arachsys.com/csharp/singleton.html
What you need to decide is whether you want lazy initialization or not. Lazy initialization means that the object contained inside the singleton is created on the first call to it
ex :
MySingleton::getInstance()->doWork();
if that call isn't made until later on, there is a danger of a race condition between the threads as explained in the article. However, if you put
MySingleton::getInstance()->initSingleton();
at the very beginning of your code where you assume it would be thread safe, then you are no longer lazy initializing, you will require "some" more processing power when your application starts. However it will solve a lot of headaches about race conditions if you do so.
If you are looking for a more portable, and easier solution, you could turn to boost.
boost::call_once can be used for thread safe initialization.
It's pretty simple to use, and will be part of the next C++0x standard.
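For illustration, here is a sketch using std::call_once, the form that was eventually standardized in C++11; the Boost version has a similar but not identical interface. MySingleton and its members are placeholder names, and the constructions counter exists only to observe the behavior.

```cpp
#include <mutex>

class MySingleton {
public:
    static MySingleton& getInstance() {
        // call_once runs create() exactly once, even if several threads
        // arrive here at the same time; the others wait until it finishes.
        std::call_once(once_, &MySingleton::create);
        return *instance_;
    }
    static int constructions; // only for observing the behavior
private:
    MySingleton() { ++constructions; }
    static void create() { instance_ = new MySingleton; } // never deleted here
    static std::once_flag once_;
    static MySingleton* instance_;
};

int MySingleton::constructions = 0;
std::once_flag MySingleton::once_;
MySingleton* MySingleton::instance_ = nullptr;
```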
The question does not say whether the singleton has to be lazily constructed.
Since many answers assume that it does, I will assume so for the first part of the discussion:
Given that the language itself is not thread-aware, and given compiler optimizations, writing a portable, reliable C++ singleton is very hard (if not impossible); see "C++ and the Perils of Double-Checked Locking" by Scott Meyers and Andrei Alexandrescu.
I've seen many of the answers resort to synchronization objects on the Windows platform by using a CriticalSection, but a CriticalSection is only thread-safe when all the threads are running on a single processor, which today is probably not the case.
MSDN cite: "The threads of a single process can use a critical section object for mutual-exclusion synchronization. ".
And http://msdn.microsoft.com/en-us/library/windows/desktop/ms682530(v=vs.85).aspx
clarifies it further:
A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process.
Now, if "lazy-constructed" is not a requirement, the following solution is both cross-module safe and thread-safe, and even portable:
struct X { };
X * get_X_instance()
{
static X x;
return &x;
}
extern int X_singleton_helper = (get_X_instance(), 1);
It's cross-module-safe because we use locally-scoped static object instead of file/namespace scoped global object.
It's thread-safe because X_singleton_helper must be assigned its value before main or DllMain is entered (for the same reason, it is not lazily constructed). In this expression the comma is an operator, not punctuation.
"extern" is used explicitly here to prevent the compiler from optimizing the variable out (as the Scott Meyers article warns, the big enemy is the optimizer), and also to keep static-analysis tools such as PC-lint quiet. "Before main/DllMain" is what Scott Meyers calls the "single-threaded startup part" in "Effective C++ 3rd", Item 4.
However, I'm not sure whether the language standard allows the compiler to optimize out the call to get_X_instance(); please comment.
There are many ways to do thread-safe Singleton* initialization on Windows. In fact, some of them are even cross-platform. In the SO thread that you linked to, they were looking for a Singleton that is lazily constructed in C++, which is a bit more specific, and can be a bit trickier to do right, given the intricacies of the memory model you are working under.
which you should never use