Portable thread-safe lazy singleton

Portable thread-safe lazy singleton - c++

Greetings to all.
I'm trying to write a thread safe lazy singleton for future use. Here's the best I could come up with. Can anyone spot any problems with it? The key assumption is that static initialization occurs in a single thread before dynamic initialisations. (this will be used for a commercial project and company is not using boost :(, life would be a breeze otherwise :)
PS: Haven't check that this compiles yet, my apologies.
/*
There are two difficulties when implementing the singleton pattern:
Problem (a): The "global variable instantiation fiasco". TODO: URL
This is due to the unspecified order in which global variables are initialised. Static class members are equivalent
to a global variable in C++ during initialisation.
Problem (b): Multi-threading.
Care must be taken to ensure that the mutex initialisation is handled properly with respect to problem (a).
*/
/*
Things achieved, maybe:
*) Portable
*) Lazy creation.
*) Safe from unspecified order of global variable initialisation.
*) Thread-safe.
*) Mutex is properly initialise when invoked during global variable intialisation:
*) Effectively lock free in instance().
*/
/************************************************************************************
Platform dependent mutex implementation
*/
class Mutex
{
public:
void lock();
void unlock();
};
/************************************************************************************
Threadsafe singleton
*/
class Singleton
{
public: // Interface
static Singleton* Instance();
private: // Static helper functions
static Mutex* getMutex();
private: // Static members
static Singleton* _pInstance;
static Mutex* _pMutex;
private: // Instance members
bool* _pInstanceCreated; // This is here to convince myself that the compiler is not re-ordering instructions.
private: // Singletons can't be coppied
explicit Singleton();
~Singleton() { }
};
/************************************************************************************
We can't use a static class member variable to initialised the mutex due to the unspecified
order of initialisation of global variables.
Calling this from
*/
Mutex* Singleton::getMutex()
{
static Mutex* pMutex = 0; // alternatively: static Mutex* pMutex = new Mutex();
if( !pMutex )
{
pMutex = new Mutex(); // Constructor initialises the mutex: eg. pthread_mutex_init( ... )
}
return pMutex;
}
/************************************************************************************
This static member variable ensures that we call Singleton::getMutex() at least once before
the main entry point of the program so that the mutex is always initialised before any threads
are created.
*/
Mutex* Singleton::_pMutex = Singleton::getMutex();
/************************************************************************************
Keep track of the singleton object for possible deletion.
*/
Singleton* Singleton::_pInstance = Singleton::Instance();
/************************************************************************************
Read the comments in Singleton::Instance().
*/
Singleton::Singleton( bool* pInstanceCreated )
{
fprintf( stderr, "Constructor\n" );
_pInstanceCreated = pInstanceCreated;
}
/************************************************************************************
Read the comments in Singleton::Instance().
*/
void Singleton::setInstanceCreated()
{
_pInstanceCreated = true;
}
/************************************************************************************
Fingers crossed.
*/
Singleton* Singleton::Instance()
{
/*
'instance' is initialised to zero the first time control flows over it. So
avoids the unspecified order of global variable initialisation problem.
*/
static Singleton* instance = 0;
/*
When we do:
instance = new Singleton( instanceCreated );
the compiler can reorder instructions and any way it wants as long
as the observed behaviour is consistent to that of a single threaded environment ( assuming
that no thread-safe compiler flags are specified). The following is thus not threadsafe:
if( !instance )
{
lock();
if( !instance )
{
instance = new Singleton( instanceCreated );
}
lock();
}
Instead we use:
static bool instanceCreated = false;
as the initialisation indicator.
*/
static bool instanceCreated = false;
/*
Double check pattern with a slight swist.
*/
if( !instanceCreated )
{
getMutex()->lock();
if( !instanceCreated )
{
/*
The ctor keeps a persistent reference to 'instanceCreated'.
In order to convince our-selves of the correct order of initialisation (I think
this is quite unecessary
*/
instance = new Singleton( instanceCreated );
/*
Set the reference to 'instanceCreated' to true.
Note that since setInstanceCreated() actually uses the non-static
member variable: '_pInstanceCreated', I can't see the compiler taking the
liberty to call Singleton's ctor AFTER the following call. (I don't know
much about compiler optimisation, but I doubt that it will break up the ctor into
two functions and call one part of it before the following call and the other part after.
*/
instance->setInstanceCreated();
/*
The double check pattern should now work.
*/
}
getMutex()->unlock();
}
return instance;
}

No, this will not work. It is broken.
The problem has little/nothing to do with the compiler. It has to do with the order in which a second CPU will 'see' what the first CPU has done to memory. The memory (and caches) will be consistent, but the timing of WHEN each CPU decides to write or read each part of memory/cache is indeterminate.
So for CPU1:
instance = new Singleton( instanceCreated );
instance->setInstanceCreated();
Let's consider the compiler first. There is NO reason why the compiler doesn't reorder or otherwise alter these functions. Maybe like:
temp_register = new Singleton(instanceCreated);
temp_register->setInstanceCreated();
instance = temp_register;
or many other possibilities - like you said as long as single-threaded observed behaviour is consistent. This DOES include things like " break up the ctor into two functions and call one part of it before the following call and the other part after."
Now, it probably wouldn't break it up into 2 calls, but it would INLINE the ctor, particularly since it is so small. Then, once inlined, everything may be reordered, as if the ctor was broken in 2, for example.
In general, I would say not only is it possible that the compiler reordered things, it is probable - ie for the code you have, there is probably a reordering (once inlined, and inlining is likely) that is 'better' than the order given by the C++ code.
But let's leave that aside, and try to understand the real issues of double-checked locking.
So, let's just assume the compiler didn't reorder anything. What about the CPU? Or more importantly CPUs - plural.
The first CPU, 'CPU1' needs to follow the instructions given by the compiler, in particular, it needs to write to memory the things it has been told to write:
instance,
instanceCreated
other member variable of the Singleton (ie your Singleton does DO something, and has some state, doesn't it?)
Actually, that 'other member variable' stuff is really important. Important for your singleton - that's its real purpose right?, and important for our discussion. So let's give it a name: important_data. ie instance->important_data. And maybe instance->important_function(), which uses important_data. Etc.
As mentioned, let's assume the compiler has written the code such that these items are written in the order you are expecting, namely:
important_data - written inside the ctor, called from
instance = new Singleton(instanceCreated);
instance - assigned right after new/ctor returns
instanceCreated - inside setInstanceCreated()
Now, the CPU hands these writes off to the memory bus. Know what the memory bus does? IT REORDERS THEM. The CPU and architecture has the same constraints as the compiler - ie make sure this one CPU sees things consistently - ie single threaded consistent. So if, for example, instance and instanceCreated are on the same cache-line (highly likely, actually), they might be written together, and since they were just read, that cache-line is 'hot', so maybe they get written FIRST before important_data, so that that cache-line can be retired to make room for the cache-line where important_data lives.
Did you see that? instanceCreated and instance were just committed to memory BEFORE important_data. Note that CPU1 doesn't care, because it is living in a single-threaded world...
So now introduce CPU2:
CPU2 comes in, sees instanceCreated == true and instance != NULL and thus goes off and decides to call Singleton::Instance()->important_function(), which uses important_data, which is uninitialized. CRASH BANG BOOM.
By the way, it gets worse. So far, we've seen that the compiler could reorder, but we're pretending it didn't. Let's go one step further and pretend that CPU1 did NOT reorder any of the memory writing. Are we OK now?
No. Of course not.
Just as CPU1 decided to optimize/reorder its memory writes, CPU2 can REORDER ITS READS!
CPU2 comes in and sees
if (!instanceCreated) ...
so it needs to read instanceCreated. Ever heard of 'speculative execution'? (Great name for a FPS game, by the way). If the memory bus isn't busy doing anything, CPU2 might pre-read some other values 'hoping' that instanceCreated is true. ie it may pre-read important_data for example. Maybe important_data (or the uninitialized, possibly re-claimed-by-the-allocator memory that will become important_data) is already in CPU2's cache. Or maybe (more likely?) CPU2 just free'd that memory, and the allocator wrote NULL in its first 4 bytes (allocators often use that memory for their free-lists), so actually, the memory soon-to-become important_data may actually still be in the write queue of CPU2. In that case, why would CPU2 bother re-reading that memory, when it hasn't even finished writing it yet!? (it wouldn't - it would just get the values from its write-queue.)
Did that make sense? If not, imagine that the value of instance (which is a pointer) is 0x17e823d0. What was that memory doing before it became (becomes) the Singleton? Is that memory still in the write-queue of CPU2?...
Or basically, don't even think about why it might want to do so, but realize that CPU2 might read important_data first, then instanceCreated second. So even though CPU1 may have wrote them in order CPU2 sees 'crap' in important_data, then sees true in instanceCreated (and who knows what in instance!). Again, CRASH BANG BOOM. Or BOOM CRASH BANG, since by now you realize that the order isn't guaranteed...

It's usually better to have a non-lazy singleton which does nothing in its constructor, and then in GetInstance do a thread-safe call once to a function which allocates any expensive resources. You're already creating a Mutex non-lazily, so why not just put the mutex and some kind of Pimpl in your Singleton object?
By the way, this is easier on Posix:
struct Singleton {
static Singleton *GetInstance() {
pthread_once(&control, doInit);
return instance;
}
private:
static void doInit() {
// slight problem: we can't throw from here, or fail
try {
instance = new Singleton();
} catch (...) {
// we could stash an error indicator in a static member,
// and check it in GetInstance.
std::abort();
}
}
static pthread_once_t control;
static Singleton *instance;
};
pthread_once_t Singleton::control = PTHREAD_ONCE_INIT;
Singleton *Singleton::instance = 0;
There do exist pthread_once implementations for Windows and other platforms.

If you wish to see an in-depth discussion of Singletons, the various policies about their lifetime and the thread safety issues, I can only recommend a good read: "Modern C++ Design" by Alexandrescu.
The implementation is presented on the web in Loki, find it here!
And yes, it does hold in a single header file. So I would really encourage you to at least grab the file and read it, and better yet read the book to have the full-blown reflection.

At global scope in your code:
/************************************************************************************
Keep track of the singleton object for possible deletion.
*/
Singleton* Singleton::_pInstance = Singleton::Instance();
This makes your implementation not lazy. Presumably you want to set _pInstance to NULL at global scope, and assign to it after you construct the singleton inside Instance() before you unlock the mutex.

More food for thought from Meyers & Alexandrescu, with Singleton being the specific target: C++ and the Perils of Double-Checked Locking. It's a bit of a prickly problem.

Related

Synchronizing method calls on shared object from multiple threads

I am thinking about how to implement a class that will contain private data that will be eventually be modified by multiple threads through method calls. For synchronization (using the Windows API), I am planning on using a CRITICAL_SECTION object since all the threads will spawn from the same process.
Given the following design, I have a few questions.
template <typename T> class Shareable
{
private:
const LPCRITICAL_SECTION sync; //Can be read and used by multiple threads
T *data;
public:
Shareable(LPCRITICAL_SECTION cs, unsigned elems) : sync{cs}, data{new T[elems]} { }
~Shareable() { delete[] data; }
void sharedModify(unsigned index, T &datum) //<-- Can this be validly called
//by multiple threads with synchronization being implicit?
{
EnterCriticalSection(sync);
/*
The critical section of code involving reads & writes to 'data'
*/
LeaveCriticalSection(sync);
}
};
// Somewhere else ...
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
Shareable<ActualType> *ptr = static_cast<Shareable<ActualType>*>(lpParameter);
T copyable = /* initialization */;
ptr->sharedModify(validIndex, copyable); //<-- OK, synchronized?
return 0;
}
The way I see it, the API calls will be conducted in the context of the current thread. That is, I assume this is the same as if I had acquired the critical section object from the pointer and called the API from within ThreadProc(). However, I am worried that if the object is created and placed in the main/initial thread, there will be something funky about the API calls.
When sharedModify() is called on the same object concurrently,
from multiple threads, will the synchronization be implicit, in the
way I described it above?
Should I instead get a pointer to the
critical section object and use that instead?
Is there some other
synchronization mechanism that is better suited to this scenario?

When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
It's not implicit, it's explicit. There's only only CRITICAL_SECTION and only one thread can hold it at a time.
Should I instead get a pointer to the critical section object and use that instead?
No. There's no reason to use a pointer here.
Is there some other synchronization mechanism that is better suited to this scenario?
It's hard to say without seeing more code, but this is definitely the "default" solution. It's like a singly-linked list -- you learn it first, it always works, but it's not always the best choice.

When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Implicit from the caller's perspective, yes.
Should I instead get a pointer to the critical section object and use that instead?
No. In fact, I would suggest giving the Sharable object ownership of its own critical section instead of accepting one from the outside (and embrace RAII concepts to write safer code), eg:
template <typename T>
class Shareable
{
private:
CRITICAL_SECTION sync;
std::vector<T> data;
struct SyncLocker
{
CRITICAL_SECTION &sync;
SyncLocker(CRITICAL_SECTION &cs) : sync(cs) { EnterCriticalSection(&sync); }
~SyncLocker() { LeaveCriticalSection(&sync); }
}
public:
Shareable(unsigned elems) : data(elems)
{
InitializeCriticalSection(&sync);
}
Shareable(const Shareable&) = delete;
Shareable(Shareable&&) = delete;
~Shareable()
{
{
SyncLocker lock(sync);
data.clear();
}
DeleteCriticalSection(&sync);
}
void sharedModify(unsigned index, const T &datum)
{
SyncLocker lock(sync);
data[index] = datum;
}
Shareable& operator=(const Shareable&) = delete;
Shareable& operator=(Shareable&&) = delete;
};
Is there some other synchronization mechanism that is better suited to this scenario?
That depends. Will multiple threads be accessing the same index at the same time? If not, then there is not really a need for the critical section at all. One thread can safely access one index while another thread accesses a different index.
If multiple threads need to access the same index at the same time, a critical section might still not be the best choice. Locking the entire array might be a big bottleneck if you only need to lock portions of the array at a time. Things like the Interlocked API, or Slim Read/Write locks, might make more sense. It really depends on your thread designs and what you are actually trying to protect.

C++: Thread Safety in a Signal/Slot Library

I'm implementing a Signal/Slot framework, and got to the point that I want it to be thread-safe. I already had a lot of support from the Boost mailing-list, but since this is not really boost-related, I'll ask my pending question here.
When is a signal/slot implementation (or any framework that calls functions outside itself, specified in some way by the user) considered thread-safe? Should it be safe w.r.t. its own data, i.e. the data associated to its implementation details? Or should it also take into account the user's data, which might or might not be modified whatever functions are passed to the framework?
This is an example given on the mailing-list (Edit: this is an example use-case --i.e. user code--. My code is behind the calls to the Emitter object):
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
In the above code, it might happen that Event is emitted, causing someFunction to be executed. If somePtr is non-null, but becomes null just after the if, but before the assignment, we're in trouble. From the point of view of thread2, this is not obvious because it is disconnecting someFunction before calling cleanupPtr.
I can see why this could potentially lead to trouble, but who's responsibility is this? Should my library protect the user from using it in every irresponsible but imaginable way?

I suspect there is no clearly good answer, but clarity will come from documenting the guarantees you wish to make about concurrent access to an Emitter object.
One level of guarantee, which to me is what is implied by a promise of thread safety, is that:
Concurrent operations on the object are guaranteed to leave the object in a consistent state (at least, from the point of view of the accessing threads.)
Non-commutative operations will be performed as if they were scheduled serially in some (unknown) order.
Then the question is, what does the emit method promise semantically: passing control to the connected routine, or evaluation of the function? If the former, then your work sounds like it is already done; if the latter, then the 'as-if ordered' requirement would mean that you need to enforce some level of synchronisation.
Users of the library can work with either, provided it is clear what is being promised.

Firstly the simplest possibility: If you don't claim your library to be thread-safe, you don't have to bother about this.
(But even) if you do:
In your example the user would have to take care about thread-safety, since both functions could be dangerous, even without using your event-system (IMHO, this is a pretty good way to determine who should take care about those kind of problems). A possible way for him to do this in C++11 could be:
#include <mutex>
// A mutex is used to control thread-acess to a shared resource
std::mutex _somePtr_mutex;
int* somePtr = nullptr;
void someFunction()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
if(somePtr)
*somePtr = 17;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
void cleanupPtr()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
int *tmp = somePtr;
somePtr = null;
delete tmp;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}

The last question is easy. If you say your library is threadsafe, it should threadsafe. It makes no sense to say it is partly threadsafe or, it is only threadsafe if you do not abuse it. In that case you have to explain what exactly is not threadsafe.
Now to your first question regarded someFunction:
The operation is non atomic. Which means the CPU can interrupt between the if and the assigment. And that will happen, I know that :-) The other thread can erase the pointer anytime. Even between two short and fast looking statements.
Now to cleanupPtr:
I am not a compiler expert, but if you want to be shure that your assigment take place in the same moment you wrote it in code you should write the keyword volatile in front of the declaration of somePtr. The compiler will now know that you use that attribute in a multithreaded situation and will not buffer the value in a register of the CPU.
If you have a thread situation with a reader thread and a writer thread, the keyword volatile can (IMHO) be enough to sync them. As long as the attributes you use to exchange information between threads are generic.
For other situations you can use mutex or atomics. I will give you an example for mutex. I use C++11 for that, but it works similar with previous versions of C++ using boost.
Using mutex:
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
std::recursive_mutex g_mutex;
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = null;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
I only added a recursive mutex here without changing any other code of the sample, even if it's now cargo code.
There are two kinds of mutex in the std. A utterly useless std::mutex and the std::recursive_mutex which work like you expect a mutex should work. The std::mutex exclude the access of any further call even from the same thread. Which can happen if a method which needs mutex protection calls a public method which use the same mutex. std::recursive_mutex is reentrant for the same thread.
Atomics (or interlocks in win32) are another way, but only to exchange values between threads or access them concurrently. Your example is missing such values, but in your case, I would look a little deeper in them (std::atomic).
UPDATE
If your are the user of a library which is not explicit declared as threadsafe by the developer, take it as non threadsafe and shield every call to it with a mutex lock.
To stick with the example. If you cannot change someFunction the you have to wrap the function like:
void threadsafeSomeFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
someFunction();
}

Proper compiler intrinsics for double-checked locking?

When implementing double-checked locking, what is the proper way to do the memory and/or compiler barriers when implementing double-checked locking for initialization?
Something like std::call_once isn't what I want; it's way too slow. It's typically just implemented on top of pthread_mutex_lock and EnterCriticalSection respective to OS.
In my programs, I often run into initialization cases where the initialization is safe to repeat, as long as exactly one thread gets to set the final pointer. If another thread beats it to setting the final pointer to the singleton object, it deletes what it created and makes use of the other thread's. I also often use this in cases where it doesn't matter which thread "wins" because they all come up with the same result.
Here's an unsafe, overly-contrived example, using Visual C++ intrinsics:
MyClass *GetGlobalMyClass()
{
static MyClass *const UNSET_POINTER = reinterpret_cast<MyClass *>(
static_cast<intptr_t>(-1));
static MyClass *volatile s_object = UNSET_POINTER;
if (s_object == UNSET_POINTER)
{
MyClass *newObject = MyClass::Create();
if (_InterlockedCompareExchangePointer(&s_object, newObject,
UNSET_POINTER) != UNSET_POINTER)
{
// Another thread beat us. If Create didn't return null, destroy.
if (newObject)
{
newObject->Destroy(); // calls "delete this;", presumably
}
}
}
return s_object;
}
On a weakly-ordered memory architecture, my understanding is that it's possible that the new value of s_object is visible to other threads before other variables written inside MyClass::Create or MyClass::MyClass are visible. Also, the compiler itself could arrange the code this way in the absence of a compiler barrier (in Visual C++, _WriteBarrier, but _InterlockedCompareExchange acts as a barrier).
Do I need like a store fence intrinsic function in there or something in order to ensure that MyClass's variables are visible to all threads before s_object becomes somethings besides -1?

Fortunately, the rules in C++ are very simple:
If there is a data race, the behaviour is undefined.
In you code the data race is caused by the following read, which conflicts with the write operation in __InterlockedCompareExchangePointer.
if (s_object.m_void == UNSET_POINTER)
A thread-safe solution without blocking might look as follows. Note that on x86 a load operation with sequential consistency has basically no overhead compared to a regular load operation. If you care about other architectures, you can also use acquire release instead of sequential consistency.
static std::atomic<MyClass*> s_object{nullptr};
MyClass* o = s_object.load(std::memory_order_seq_cst);
if (o == nullptr) {
o = new MyClass{...};
MyClass* expected = nullptr;
if (!s_object.compare_exchange_strong(expected, o, std::memory_order_seq_cst)) {
delete o;
o = expected;
}
}
return o;

For a proper C++11 implementation any function-local static variable will be constructed in a thread-safe fashion by the first thread passing through this variable.

Warm up some variables in C++

I have a C++ library that works on some numeric values, this values are not available at compile time but are immediatly available at runtime and are based on machine-related details, in short I need values like display resolution, the number of CPU cores and so on.
The key points of my question are:
I can't ask to the user to input this values ( both the coders/user of my lib and the final user )
I need to do this warm up only once the application starts, it's a 1 time only thing
this values are later used by methods and classes
the possible solutions are:
build a data structure Data, declare some Data dummy where dummy is the name of the variable used to store everything and the contructor/s will handle the one time inizialization for the related values
wrap something like the first solution in a method like WarmUp() and putting this method right after the start for the main() ( it's also a simple thing to remember and use )
the big problems that are still unsolved are:
the user can declare more than 1 data structure since Data it's a type and there are no restrictions about throwing 2-4-5-17 variables of the same type in C++
the WarmUp() method can be a little intrusive in the design of the other classes, it can also happen that the WarmUp() method is used in local methods and not in the main().
I basically need to force the creation of 1 single instance of 1 specific type at runtime when I have no power over the actual use of my library, or at least I need to design this in a way that the user will immediately understand what kind of error is going on keeping the use of the library intuitive and simple as much as possible.
Can you see a solution to this ?
EDIT:
my problems are also more difficult due to the fact that I'm also trying to get a multi-threading compatible data structure.

What about to use lazily-create singleton? E.g
struct Data
{
static Data & instance()
{
static Data data_;
return data_;
}
private:
Data()
{
//init
}
}
Data would be initialized on first use, when you call Data::instance()
Edit: as for multithreading, read efficient thread-safe singleton in C++
Edit2 realisation using boost::call_once

First, as others have said, the obvious (and best) answer to
this is a singleton. Since you've added the multithreading
requirement, however: there are two solutions, depending on
whether the object will be modified by code using the singleton.
(From your description, I gather not.) If not, then it is
sufficient to use the "naïve" implementation of a singleton, and
ensure that the singleton is initialized before threads are
started. If no thread is started before you enter main (and
I would consider it bad practice otherwise), then something like
the following is largely sufficient:
class Singleton
{
static Singleton const* ourInstance;
Singleton();
Singleton( Singleton const& );
Singleton& operator=( Singleton const& );
public:
static Singleton const& instance();
};
and in the implementation:
Singleton const* Singleton::ourInstance = &Singleton::instance();
Singleton const&
Singleton::instance()
{
if ( ourInstance == NULL ) {
ourInstance = new Singleton;
}
return *ourInstance;
}
No locking is necessary, since no thread will be modifying
anything once threading starts.
If the singleton is mutable, then you have to protect all access
to it. You could do something like the above (without the
const, obviously), and leave the locking to the client, but in
such cases, I'd prefer locking in the instance function, and
returning an std::shared_ptr with a deleter which frees the
lock, which was acquired in the instance function. I think
something like the following could work (but I've never actually
needed it, and so haven't tried it):
class Singleton
{
static Singleton* ourInstance;
static std::mutex ourMutex;
class LockForPointer
{
public:
operator()( Singleton* )
{
Singleton::ourMutex.unlock();
}
};
class LockForInstance
{
bool myOwnershipIsTransfered;
public:
LockForInstance
: myOwnershipIsTransfered( false )
{
Singleton::ourMutex.lock();
}
~LockForInstance()
{
if ( !myOwnershipIsTransfered ) {
Singleton::ourMutex.unlock();
}
}
LockForPointer transferOwnership()
{
myOwnershipIsTransfered = true;
return LockForPointer();
}
};
public:
static std::shared_ptr<Singleton> instance();
};
and the implementation:
static Singleton* ourInstance = NULL;
static std::mutex ourMutex;
std::shared_ptr<Singleton>
Singleton::instance()
{
LockForInstance lock;
if ( ourInstance == NULL ) {
ourInstance = new Singleton;
}
return std::shared_ptr<Singleton>( ourInstance, lock.transferOwnership() );
}
This way, the same lock is used for the check for null and for
accessing the data.

Warmup is usually related to performance issues (and makes me think of the processor cache, see the __builtin_prefetch of GCC).
Maybe you want to make a singleton class, there are many solutions to this (e.g. this C++ singleton tutorial).
Also, if performance is the primary concern, you could consider that the configured parameters are given at initialization (or even at installation) time. Then, you could specialize your code, perhaps as simply as having templates and instanciating them at first by emitting (at runtime = initialization time) the appropriate C++ stub source code and compiling it (at "runtime", e.g. at initialization) and dynamically loading it (using plugins & dlopen ...). See also this answer.

Thread safe container

There is some exemplary class of container in pseudo code:
class Container
{
public:
Container(){}
~Container(){}
void add(data new)
{
// addition of data
}
data get(size_t which)
{
// returning some data
}
void remove(size_t which)
{
// delete specified object
}
private:
data d;
};
How this container can be made thread safe? I heard about mutexes - where these mutexes should be placed? Should mutex be static for a class or maybe in global scope? What is good library for this task in C++?

First of all mutexes should not be static for a class as long as you going to use more than one instance. There is many cases where you should or shouldn't use use them. So without seeing your code it's hard to say. Just remember, they are used to synchronise access to shared data. So it's wise to place them inside methods that modify or rely on object's state. In your case I would use one mutex to protect whole object and lock all three methods. Like:
class Container
{
public:
Container(){}
~Container(){}
void add(data new)
{
lock_guard<Mutex> lock(mutex);
// addition of data
}
data get(size_t which)
{
lock_guard<Mutex> lock(mutex);
// getting copy of value
// return that value
}
void remove(size_t which)
{
lock_guard<Mutex> lock(mutex);
// delete specified object
}
private:
data d;
Mutex mutex;
};

Intel Thread Building Blocks (TBB) provides a bunch of thread-safe container implementations for C++. It has been open sourced, you can download it from: http://threadingbuildingblocks.org/ver.php?fid=174 .

First: sharing mutable state between threads is hard. You should be using a library that has been audited and debugged.
Now that it is said, there are two different functional issue:
you want a container to provide safe atomic operations
you want a container to provide safe multiple operations
The idea of multiple operations is that multiple accesses to the same container must be executed successively, under the control of a single entity. They require the caller to "hold" the mutex for the duration of the transaction so that only it changes the state.
1. Atomic operations
This one appears simple:
add a mutex to the object
at the start of each method grab a mutex with a RAII lock
Unfortunately it's also plain wrong.
The issue is re-entrancy. It is likely that some methods will call other methods on the same object. If those once again attempt to grab the mutex, you get a dead lock.
It is possible to use re-entrant mutexes. They are a bit slower, but allow the same thread to lock a given mutex as much as it wants. The number of unlocks should match the number of locks, so once again, RAII.
Another approach is to use dispatching methods:
class C {
public:
void foo() { Lock lock(_mutex); foo_impl(); }]
private:
void foo_impl() { /* do something */; foo_impl(); }
};
The public methods are simple forwarders to private work-methods and simply lock. Then one just have to ensure that private methods never take the mutex...
Of course there are risks of accidentally calling a locking method from a work-method, in which case you deadlock. Read on to avoid this ;)
2. Multiple operations
The only way to achieve this is to have the caller hold the mutex.
The general method is simple:
add a mutex to the container
provide a handle on this method
cross your fingers that the caller will never forget to hold the mutex while accessing the class
I personally prefer a much saner approach.
First, I create a "bundle of data", which simply represents the class data (+ a mutex), and then I provide a Proxy, in charge of grabbing the mutex. The data is locked so that the proxy only may access the state.
class ContainerData {
protected:
friend class ContainerProxy;
Mutex _mutex;
void foo();
void bar();
private:
// some data
};
class ContainerProxy {
public:
ContainerProxy(ContainerData& data): _data(data), _lock(data._mutex) {}
void foo() { data.foo(); }
void bar() { foo(); data.bar(); }
};
Note that it is perfectly safe for the Proxy to call its own methods. The mutex will be released automatically by the destructor.
The mutex can still be reentrant if multiple Proxies are desired. But really, when multiple proxies are involved, it generally turns into a mess. In debug mode, it's also possible to add a "check" that the mutex is not already held by this thread (and assert if it is).
3. Reminder
Using locks is error-prone. Deadlocks are a common cause of error and occur as soon as you have two mutexes (or one and re-entrancy). When possible, prefer using higher level alternatives.

Add mutex as an instance variable of class. Initialize it in constructor, and lock it at the very begining of every method, including destructor, and unlock at the end of method. Adding global mutex for all instances of class (static member or just in gloabl scope) may be a performance penalty.

The is also a very nice collection of lock-free containers (including maps) by Max Khiszinsky
LibCDS1 Concurrent Data Structures
Here is the documentation page:
http://libcds.sourceforge.net/doc/index.html
It can be kind of intimidating to get started, because it is fully generic and requires you register a chosen garbage collection strategy and initialize that. Of course, the threading library is configurable and you need to initialize that as well :)
See the following links for some getting started info:
initialization of CDS and the threading manager
http://sourceforge.net/projects/libcds/forums/forum/1034512/topic/4600301/
the unit tests ((cd build && ./build.sh ----debug-test for debug build)
Here is base template for 'main':
#include <cds/threading/model.h> // threading manager
#include <cds/gc/hzp/hzp.h> // Hazard Pointer GC
int main()
{
// Initialize \p CDS library
cds::Initialize();
// Initialize Garbage collector(s) that you use
cds::gc::hzp::GarbageCollector::Construct();
// Attach main thread
// Note: it is needed if main thread can access to libcds containers
cds::threading::Manager::attachThread();
// Do some useful work
...
// Finish main thread - detaches internal control structures
cds::threading::Manager::detachThread();
// Terminate GCs
cds::gc::hzp::GarbageCollector::Destruct();
// Terminate \p CDS library
cds::Terminate();
}
Don't forget to attach any additional threads you are using:
#include <cds/threading/model.h>
int myThreadFunc(void *)
{
// initialize libcds thread control structures
cds::threading::Manager::attachThread();
// Now, you can work with GCs and libcds containers
....
// Finish working thread
cds::threading::Manager::detachThread();
}
1 (not to be confuse with Google's compact datastructures library)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js