C++ Thread Safe Lazy Load

I have a property which is similar to the following:
Foo* myFoo_m;
Foo getMyFoo() const
{
    if (myFoo_m == NULL)
    {
        myFoo_m = new Foo();
        // perform initialization
    }
    return *myFoo_m;
}
This works well in a single-threaded environment, but how do I handle this in a multi-threaded environment? Most of the info I've found deals with static singletons, but in this case, myFoo is a public instance property.
I am porting this over from C# (where I can use Lazy) and Java (where I can use double check locking), but it doesn't seem that there is a straightforward way to do this in C++. I cannot rely on any external libraries (no BOOST), and this needs to work on windows and linux. I also cannot use C++11.
Any insight would be good. I am new to C++.

If you have access to C++11, you can use std::mutex to prevent multiple threads from initializing the lazy section at the same time. (Note: std::mutex only became available on Windows with VS2012.)
You can even perform a scoped acquisition of the mutex with std::lock_guard:
mutable std::mutex m_init_mutex; // mutable so the const getter can lock it
Foo getMyFoo() const
{
    std::lock_guard<std::mutex> lock(m_init_mutex);
    if (myFoo_m == NULL)
    {
        myFoo_m = new Foo();
        // perform initialization
    }
    return *myFoo_m;
}
EDIT: The OP has now stated that C++11 isn't an option, but perhaps this answer will be useful in the future.

By saying "no C++11", "no Boost or other third-party code", "must work on Windows and Linux", you have restricted yourself to using implementation-specific locking mechanisms.
I think your best option is to define a simple lock class for yourself, and implement it to use pthread_mutex on Linux and a CriticalSection on Windows. Possibly you already have some platform-specific code, to start the threads in the first place.
You could try something like Windows Services for UNIX to avoid writing platform-specific code, but it's probably not worth it for one lock. And although it's supplied by Microsoft, you'd probably consider it an external library anyway.
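As a rough illustration of the "define a simple lock class yourself" suggestion, here is a minimal C++03 sketch. The names SimpleLock, ScopedLock and Owner are hypothetical, Foo is only a stand-in for the real class, and the getter simply takes the lock on every call rather than attempting double-checked locking.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical minimal lock: CRITICAL_SECTION on Windows, pthread_mutex_t elsewhere.
class SimpleLock {
public:
#ifdef _WIN32
    SimpleLock()  { InitializeCriticalSection(&cs_); }
    ~SimpleLock() { DeleteCriticalSection(&cs_); }
    void lock()   { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
#else
    SimpleLock()  { pthread_mutex_init(&mutex_, NULL); }
    ~SimpleLock() { pthread_mutex_destroy(&mutex_); }
    void lock()   { pthread_mutex_lock(&mutex_); }
    void unlock() { pthread_mutex_unlock(&mutex_); }
private:
    pthread_mutex_t mutex_;
#endif
    SimpleLock(const SimpleLock&);             // non-copyable, C++03 style
    SimpleLock& operator=(const SimpleLock&);
};

// Scoped guard so the lock is released on every exit path.
class ScopedLock {
public:
    explicit ScopedLock(SimpleLock& l) : lock_(l) { lock_.lock(); }
    ~ScopedLock() { lock_.unlock(); }
private:
    SimpleLock& lock_;
    ScopedLock(const ScopedLock&);
    ScopedLock& operator=(const ScopedLock&);
};

// Stand-in for the real Foo (assumption for this sketch).
struct Foo { int value; Foo() : value(42) {} };

class Owner {
public:
    Owner() : myFoo_m(NULL) {}
    ~Owner() { delete myFoo_m; }
    // Every call takes the lock: simple and correct, at the cost of some contention.
    Foo& getMyFoo() const {
        ScopedLock guard(initLock_m);
        if (myFoo_m == NULL) {
            myFoo_m = new Foo();   // perform initialization
        }
        assert(myFoo_m != NULL);
        return *myFoo_m;
    }
private:
    mutable SimpleLock initLock_m;   // mutable: locked from a const member
    mutable Foo* myFoo_m;            // mutable: assigned from a const member
};
```

Returning a reference instead of a copy is a choice made for this sketch; the platform-specific parts are confined to SimpleLock, so the rest of the code stays portable.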

Warning: I didn't see the "no C++11" requirement, so please disregard the answer.
Since C++11 mandates that static variable initialization be thread-safe, here's a simple way that you might consider "cheating":
Foo init_foo()
{
    // initialize and return a Foo
}
Foo & get_instance_lazily()
{
    static Foo impl = init_foo();
    return impl;
}
The instance will be initialized the first time that you call get_instance_lazily(), and thread-safely so.


How to release the object in a TLS-slot at thread exit on Windows?

For example, in a multi-threaded program:
struct TLSObject;
void foo()
{
    TLSObject* p = static_cast<TLSObject*>(TlsGetValue(slot));
    if (p == 0) {
        p = new TLSObject;
        TlsSetValue(slot, p);
    }
    // doing something with p
}
The first call to foo() in any thread creates a new TLSObject.
My question is: how do I delete a TLSObject if I don't use boost::thread and boost::thread_specific_ptr?
boost::thread_specific_ptr can do cleanup work at thread exit, but I guess it depends on boost::thread rather than working with normal OS threads, and it's slow.
Instead of TlsAlloc, use FlsAlloc (and related Fls* functions). With FLS, you register a cleanup callback which the OS will call on the thread before the thread terminates, giving you the opportunity to clean up.
Alright. For Windows Vista and above, as James McNellis said - we
could use FlsCallback.
For a DLL, we could just use DllMain: if the reason parameter equals DLL_THREAD_DETACH, we do the cleanup. An alternative might be to use _pRawDllMain, which is just like another DllMain; you can find it in the Boost source.
For an EXE, we could use a TLS callback; please have a look here and here, and, of course, at the Boost source. In practice it works on Windows XP, but I found that optimizations may make it ineffective, so be careful with optimizations, or make an explicit reference to the pointer to your callback function.
Save the code below to tls.cpp and add it to your project; it will work whether the project is an EXE or a DLL. Note that for a DLL on Windows Vista and above, the onThreadExit function may be called twice: once from dll_callback and once from tls_callback.
#include <windows.h>
extern void onThreadExit();
static void NTAPI tls_callback(PVOID, DWORD reason, PVOID)
{
    if (reason == DLL_THREAD_DETACH) {
        onThreadExit();
    }
}
static BOOL WINAPI dll_callback(LPVOID, DWORD reason, LPVOID)
{
    if (reason == DLL_THREAD_DETACH) {
        onThreadExit();
    }
    return TRUE;
}
#pragma section(".CRT$XLY",long,read)
extern "C" __declspec(allocate(".CRT$XLY")) PIMAGE_TLS_CALLBACK _xl_y = tls_callback;
extern "C"
{
    extern BOOL (WINAPI * const _pRawDllMain)(HANDLE, DWORD, LPVOID) = &dll_callback;
}
#pragma comment(linker, "/INCLUDE:__tls_used")
#pragma comment(linker, "/INCLUDE:__xl_y")
If you think it's obscure, use Boost's at_thread_exit, which hides the complexity. In fact, the code above is a simplified version of the Boost TLS implementation. If you do not want to use Boost, this is an alternative on Windows.
Or, a more generic way: thread_local.
A 'boost::thread_specific_ptr' should work on any thread (according to the answer to my question: Check if thread is a boost thread)
About it being slow, yes, it isn't ideal. However, what you can do is use whatever normal TLS mechanism you wish (I used the GCC specific modifier) and then create an additional thread_specific_ptr which cleans up the data (create a wrapper to your true TLS pointer). So creation and deletion of the TLS is a bit expensive, but access is unaffected.
You should be able to use one of the many scope exit mechanisms to achieve this, for example this one.
Another alternative would be to wrap your TLSObject into an RAII class that releases the object when the RAII wrapper is destroyed. This is a very common resource management pattern and definitely applicable here.
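To make the RAII and thread_local suggestions concrete, here is a minimal sketch that assumes a C++11 compiler. TLSHolder and the g_destroyed counter are hypothetical names added for illustration; the counter exists only so the cleanup can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Counts destroyed objects so the cleanup can be observed (illustration only).
std::atomic<int> g_destroyed(0);

struct TLSObject {
    int counter = 0;
    ~TLSObject() { ++g_destroyed; }
};

// Hypothetical RAII holder: owns a lazily created TLSObject and deletes it in
// its destructor, which runs at thread exit for thread_local objects.
class TLSHolder {
public:
    TLSHolder() : p_(nullptr) {}
    ~TLSHolder() { delete p_; }
    TLSHolder(const TLSHolder&) = delete;
    TLSHolder& operator=(const TLSHolder&) = delete;

    TLSObject* get() {
        if (p_ == nullptr) p_ = new TLSObject;   // first use on this thread
        return p_;
    }
private:
    TLSObject* p_;
};

int foo() {
    thread_local TLSHolder holder;   // one holder per thread
    TLSObject* p = holder.get();
    assert(p != nullptr);
    return ++p->counter;             // doing something with p
}
```

Each thread that calls foo() gets its own TLSObject, and the holder's destructor releases it when that thread exits, without any explicit TlsAlloc bookkeeping.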

Is Mutex required for 1 byte shared memory

In my case, one thread reads the value and decides whether it needs to change it, something like the code below:
void set(bool status)
{
    if(status == m_status)
        return;
    m_status = status;
}
Is this possible?
Using a synchronization object for boolean state is overkill.
On Windows you can use Interlocked Variable Access.
For cross platform solution .. see Boost Atomic
std::atomic from C++11 is also a solution
I think you need to clarify your question a bit. Is it possible? Yes. Is it necessary? Probably. Are there other ways to do it? Yes, as another answer has noted.
Don't forget to unlock when you're done with the things you want to change. And just a stylistic note, I find it much clearer to use your 'if' statement to encase the code block instead of return'ing out of the function. Like this:
void set(bool status)
{
    if(status != m_status)
    {
        m_status = status;
    }
}
Just my opinion, of course.
Generally it's not possible. It will work most of the time on most platforms, but it's formally undefined, and there are cases where cache coherency issues will come back to haunt you.
If you can get C++11, use std::atomic<bool> from the new <atomic> header. If not, you should use a legacy compiler-specific equivalent: Windows has the Interlocked* functions, and GCC has the __sync builtins. There is actually a cross-platform implementation of the important bits of the C++11 standard buried deep in the Boost.Interprocess library, but it's unfortunately not exposed to the user.
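A minimal sketch of the std::atomic<bool> approach, assuming C++11 is available. StatusFlag is a hypothetical wrapper around the question's m_status, and returning whether the value changed is an addition made for this example.

```cpp
#include <atomic>
#include <cassert>

class StatusFlag {
public:
    StatusFlag() : m_status(false) {}

    // Returns true if this call changed the stored value.
    bool set(bool status) {
        // exchange() atomically stores the new value and returns the old one,
        // so the compare and the write cannot be interleaved by another thread.
        return m_status.exchange(status) != status;
    }

    bool get() const { return m_status.load(); }

private:
    std::atomic<bool> m_status;
};

// Usage sketch: only the call that actually flips the flag reports a change.
inline void example_usage() {
    StatusFlag flag;
    bool first = flag.set(true);    // false -> true
    bool second = flag.set(true);   // already true
    assert(first && !second);
    (void)first; (void)second;      // avoid unused warnings when NDEBUG is set
}
```

Unlike the plain bool version, the read and the write happen as one atomic step, so no separate mutex is needed for this single flag.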

Coroutines in C or C++?

It seems like coroutines are normally found in higher level languages.
There seem to be several different definitions of them as well. I am trying to find a way to have coroutines in C like the ones we have in Lua:
function foo()
    print("foo", 1)
    coroutine.yield()
    print("foo", 2)
end
There's no language level support for coroutines in either C or C++.
You could implement them using assembler or fibres, but the result would not be portable and in the case of C++ you'd almost certainly lose the ability to use exceptions and be unable to rely on stack unwinding for cleanup.
In my opinion you should either use a language that supports them or not use them; implementing your own version in a language that doesn't support them is asking for trouble.
There is a new (as of version 1.53.0) coroutine library in the Boost C++ library: http://www.boost.org/doc/libs/1_53_0/libs/coroutine/doc/html/index.html
I'm unaware of a C library--I came across this question looking for one.
Sorry - neither C nor C++ has support for coroutines. However, a simple search for "C coroutine" yields the following fascinating treatise on the problem: http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html, although you may find his solution a bit - um - impractical.
Nowadays, C++ provides coroutines natively as part of C++20.
Concerning the C language, coroutines are not supported natively, but several libraries provide them. Some are not portable, as they rely on architecture-dependent assembly instructions, but others are portable because they use library functions such as setjmp()/longjmp() or getcontext()/setcontext()/makecontext()/swapcontext(). There are also some original proposals, like this one, which uses the C language trick from Duff's device.
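To give an idea of what the Duff's device trick looks like, here is a small hand-written sketch. It is not taken from any of the libraries mentioned, and the Counter type is hypothetical. The switch statement jumps back into the middle of the loop on each call, so the function appears to resume after its last "yield".

```cpp
#include <cassert>

// A hand-written "coroutine" that yields 0, 1, ..., limit - 1 and then -1 forever.
struct Counter {
    int state;   // where to resume: 0 = start, 1 = just after the yield
    int i;       // loop variable must outlive a single call, so it lives here
    int limit;

    explicit Counter(int n) : state(0), i(0), limit(n) { assert(n >= 0); }

    int next() {
        switch (state) {
        case 0:
            for (i = 0; i < limit; ++i) {
                state = 1;
                return i;      // "yield i"
        case 1:;               // the next call resumes here, inside the loop
            }
        }
        state = 2;             // finished: no case label matches from now on
        return -1;
    }
};
```

All state that must survive a yield has to be stored outside the function's local variables, which is the main practical limitation of this technique.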
N.B.: On my side, I designed this library.
There's a bunch of coroutine libraries for C++. Here's one from RethinkDB.
There's also my header-only library, which is tailored to be used with callbacks. I've tried Boost coroutines, but I don't use them yet because of their incompatibility with valgrind. My implementation uses ucontext.h and works fine under valgrind so far.
With "standard" coroutines you have to jump through some hoops to use them with callbacks. For example, here is how a working thread-safe (but leaking) Cherokee handler looks with Boost coroutines:
typedef coroutine<void()> coro_t;
auto lock = make_shared<std::mutex>();
coro_t* coro = new coro_t ([handler,buffer,&coro,lock](coro_t::caller_type& ca)->void {
  p1: ca(); // Pass the control back in order for the `coro` to initialize.
  coro_t* coro_ = coro; // Obtain a copy of the self-reference in order to call self from callbacks.
  cherokee_buffer_add (buffer, "hi", 2); handler->sent += 2;
  lock->lock(); // Prevents the thread from calling the coroutine while it still runs.
  std::thread later ([coro_,lock]() {
    //std::this_thread::sleep_for (std::chrono::milliseconds (400));
    lock->lock(); // Wait for the coroutine to cede before resuming it.
    (*coro_)(); // Continue from p2.
  }); later.detach();
  p2: ca(); // Relinquish control to `cherokee_handler_frople_step` (returning ret_eagain).
  cherokee_buffer_add (buffer, ".", 1); handler->sent += 1;
});
(*coro)(); // Back to p1.
lock->unlock(); // Now the callback can run.
and here is how it looks with mine:
struct Coro: public glim::CBCoro<128*1024> {
  cherokee_handler_frople_t* _handler; cherokee_buffer_t* _buffer;
  Coro (cherokee_handler_frople_t *handler, cherokee_buffer_t* buffer): _handler (handler), _buffer (buffer) {}
  virtual ~Coro() {}
  virtual void run() override {
    cherokee_buffer_add (_buffer, "hi", 2); _handler->sent += 2;
    yieldForCallback ([&]() {
      std::thread later ([this]() {
        //std::this_thread::sleep_for (std::chrono::milliseconds (400));
      }); later.detach();
    });
    cherokee_buffer_add_str (_buffer, "."); _handler->sent += 1;
  }
};

c++ threads - parallel processing

I was wondering how to execute two processes in a dual-core processor in c++.
I know threads (or multi-threading) is not a built-in feature of c++.
There is threading support in Qt, but I did not understand anything from their reference. :(
So, does anyone know a simple way for a beginner to do it? Cross-platform support (like Qt) would be very helpful since I am on Linux.
Try Multithreading in C++0x part 1: Starting Threads as a 101. If your compiler does not have C++0x support, then stay with Boost.Thread.
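For a sense of what starting threads with the C++0x (now C++11) library looks like, here is a minimal sketch that splits a sum across two threads. The function names sum_range and parallel_sum are made up for this example, and on some toolchains the program must be linked with the platform's thread library (for example -pthread with GCC).

```cpp
#include <cassert>
#include <thread>

// Sums the integers in [begin, end) into *out. Each thread writes only to its
// own output variable, so no locking is needed.
void sum_range(long begin, long end, long* out) {
    long total = 0;
    for (long v = begin; v < end; ++v) total += v;
    *out = total;
}

// Splits the range [0, n) in two halves and processes them on two threads.
long parallel_sum(long n) {
    assert(n >= 0);
    long low = 0, high = 0;
    std::thread t1(sum_range, 0L, n / 2, &low);
    std::thread t2(sum_range, n / 2, n, &high);
    t1.join();   // wait for both halves before combining the results
    t2.join();
    return low + high;
}
```

On a dual-core machine the operating system is free to schedule the two threads on different cores; the code itself only expresses that the two halves are independent.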
Take a look at Boost.Thread. This is cross-platform and a very good library to use in your C++ applications.
What specifically would you like to know?
The POSIX thread (pthreads) library is probably your best bet if you just need a simple threading library, it has implementations both on Windows and Linux.
A guide can be found e.g. here. A Win32 implementation of pthreads can be downloaded here.
Edit: Didn't see you were on Linux. In that case I'm not 100% sure but I think the libraries are probably already bundled in with your GCC installation.
I'd recommend using the Boost libraries Boost.Thread instead. This will wrap platform specifics of Win32 and Posix, and give you a solid set of threading and synchronization objects. It's also in very heavy use, so finding help on any issues you encounter on SO and other sites is easy.
You can search for a free PDF book "C++-GUI-Programming-with-Qt-4-1st-ed.zip" and read Chapter 18 about Multi-threading in Qt.
Concurrent programming features supported by Qt include (but are not limited to) the following:
Read Write Lock
Wait Condition
Thread Specific Storage
However, be aware of the following trade-offs with Qt:
Performance penalties vs native threading libraries. POSIX thread (pthreads) has been native to Linux since kernel 2.4 and may not substitute for < process.h > in W32API in all situations.
Inter-thread communication in Qt is implemented with SIGNAL and SLOT constructs. These are NOT part of the C++ language and are implemented as macros that require proprietary code generators provided by Qt to be fully compiled.
If you can live with the above limitations, just follow these recipes for using QThread:
#include < QtCore >
Derive your own class from QThread. You must implement a public function run() that returns void to contain instructions to be executed.
Instantiate your own class and call start() to kick off a new thread.
Sample code:
#include <QtCore>
class MyThread : public QThread {
public:
    void run() {
        // do something
    }
};
int main(int argc, char** argv) {
    MyThread t1, t2;
    t1.start(); // default implementation from QThread::start() is fine
    t2.start(); // another thread
    t1.wait(); // wait for thread to finish
    t2.wait(); // also wait for the second thread before it is destroyed
    return 0;
}
As an important note, since C++11 the standard library itself supports concurrent execution:
#include <future>
#include <string>
class Example
{
public:
    auto DoStuff() -> std::string
    {
        return "Doing Stuff";
    }
    auto DoStuff2() -> std::string
    {
        return "Doing Stuff 2";
    }
};
int main()
{
    Example EO;
    auto func_pointer = &Example::DoStuff;
    std::future<std::string> thread_one = std::async(std::launch::async, func_pointer, &EO); //Launching upon declaring
    auto func_pointer_2 = &Example::DoStuff2;
    std::future<std::string> thread_two = std::async(std::launch::deferred, func_pointer_2, &EO);
    thread_two.get(); //Launching upon calling
}
Both std::async (std::launch::async, std::launch::deferred) and std::thread are fully compatible with Qt, and in some cases may be better at working in different OS environments.
For parallel processing, see this.

Reader/Writer Locks in C++

I'm looking for a good reader/writer lock in C++. We have a use case of a single infrequent writer and many frequent readers and would like to optimize for this. Preferable I would like a cross-platform solution, however a Windows only one would be acceptable.
Since C++ 17 (VS2015) you can use the standard:
#include <mutex> // std::unique_lock
#include <shared_mutex>
typedef std::shared_mutex Lock;
typedef std::unique_lock< Lock > WriteLock;
typedef std::shared_lock< Lock > ReadLock;
Lock myLock;
void ReadFunction()
{
    ReadLock r_lock(myLock);
    //Do reader stuff
}
void WriteFunction()
{
    WriteLock w_lock(myLock);
    //Do writer stuff
}
For older compiler versions and standards you can use boost to create a read-write lock:
#include <boost/thread/locks.hpp>
#include <boost/thread/shared_mutex.hpp>
typedef boost::shared_mutex Lock;
typedef boost::unique_lock< Lock > WriteLock;
typedef boost::shared_lock< Lock > ReadLock;
Newer versions of boost::thread have read/write locks (1.35.0 and later, apparently the previous versions did not work correctly).
They have the names shared_lock, unique_lock, and upgrade_lock and operate on a shared_mutex.
Using standard pre-tested, pre-built stuff is always good (for example, Boost as another answer suggested), but this is something that's not too hard to build yourself. Here's a dumb little implementation pulled out from a project of mine:
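#include <pthread.h>
struct rwlock {
    pthread_mutex_t lock;
    pthread_cond_t read, write;
    unsigned readers, writers, read_waiters, write_waiters;
};
void reader_lock(struct rwlock *self) {
    pthread_mutex_lock(&self->lock);
    if (self->writers || self->write_waiters) {
        self->read_waiters++;
        do pthread_cond_wait(&self->read, &self->lock);
        while (self->writers || self->write_waiters);
        self->read_waiters--;
    }
    self->readers++;
    pthread_mutex_unlock(&self->lock);
}
void reader_unlock(struct rwlock *self) {
    pthread_mutex_lock(&self->lock);
    self->readers--;
    if (self->write_waiters)
        pthread_cond_signal(&self->write);
    pthread_mutex_unlock(&self->lock);
}
void writer_lock(struct rwlock *self) {
    pthread_mutex_lock(&self->lock);
    if (self->readers || self->writers) {
        self->write_waiters++;
        do pthread_cond_wait(&self->write, &self->lock);
        while (self->readers || self->writers);
        self->write_waiters--;
    }
    self->writers = 1;
    pthread_mutex_unlock(&self->lock);
}
void writer_unlock(struct rwlock *self) {
    pthread_mutex_lock(&self->lock);
    self->writers = 0;
    if (self->write_waiters)
        pthread_cond_signal(&self->write);
    else if (self->read_waiters)
        pthread_cond_broadcast(&self->read);
    pthread_mutex_unlock(&self->lock);
}
void rwlock_init(struct rwlock *self) {
    self->readers = self->writers = self->read_waiters = self->write_waiters = 0;
    pthread_mutex_init(&self->lock, NULL);
    pthread_cond_init(&self->read, NULL);
    pthread_cond_init(&self->write, NULL);
}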
pthreads not really being Windows-native, but the general idea is here. This implementation is slightly biased towards writers (a horde of writers can starve readers indefinitely); just modify writer_unlock if you'd rather the balance be the other way around.
Yes, this is C and not C++. Translation is an exercise left to the reader.
Greg Rogers pointed out that the POSIX standard does specify pthread_rwlock_*. This doesn't help if you don't have pthreads, but it stirred my mind into remembering: Pthreads-w32 should work! Instead of porting this code to non-pthreads for your own use, just use Pthreads-w32 on Windows, and native pthreads everywhere else.
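For comparison, here is a minimal sketch of using the pthread_rwlock_* functions directly from C++. The RwLock wrapper class and its method names are hypothetical and chosen only for this example; error handling is reduced to an assert on initialization.

```cpp
#include <cassert>
#include <pthread.h>

// Thin C++ wrapper over the POSIX read-write lock (hypothetical class name).
class RwLock {
public:
    RwLock() {
        int rc = pthread_rwlock_init(&lock_, NULL);
        assert(rc == 0);
        (void)rc;   // avoid an unused warning when NDEBUG is set
    }
    ~RwLock() { pthread_rwlock_destroy(&lock_); }

    void lock_shared() { pthread_rwlock_rdlock(&lock_); }   // reader side
    void lock()        { pthread_rwlock_wrlock(&lock_); }   // writer side

    // Non-blocking attempt to take the write lock; true on success.
    bool try_lock()    { return pthread_rwlock_trywrlock(&lock_) == 0; }

    // The same call releases either a read lock or a write lock.
    void unlock()      { pthread_rwlock_unlock(&lock_); }

private:
    pthread_rwlock_t lock_;
    RwLock(const RwLock&);             // non-copyable, C++03 style
    RwLock& operator=(const RwLock&);
};
```

With Pthreads-w32 on Windows and native pthreads elsewhere, a wrapper of this kind should compile unchanged on both platforms, though that has not been verified here.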
Whatever you decide to use, benchmark your work load against simple locks, as read/write locks tend to be 3-40x slower than simple mutex, when there is no contention.
Here is some reference
C++17 supports std::shared_mutex . It is supported in MSVC++ 2015 and 2017.
Edit: The MSDN Magazine link isn't available anymore. The CodeProject article is now available on https://www.codeproject.com/Articles/32685/Testing-reader-writer-locks and sums it up pretty nicely. Also I found a new MSDN link about Compound Synchronisation Objects.
There is an article about reader-writer locks on MSDN that presents some implementations of them. It also introduces the Slim reader/writer lock, a kernel synchronisation primitive introduced with Vista. There's also a CodeProject article about comparing different implementations (including the MSDN article's ones).
Intel Thread Building Blocks also provide a couple of rw_lock variants:
They have a spin_rw_mutex for very short periods of contention and a queueing_rw_mutex for longer periods of contention. The former can be used in particularly performance sensitive code. The latter is more comparable in performance to that provided by Boost.Thread or directly using pthreads. But profile to make sure which one is a win for your access patterns.
Boost.Thread has supported reader-writer locks since release 1.35.0. The good thing about this is that the implementation is highly cross-platform, peer-reviewed, and is actually a reference implementation for the upcoming C++0x standard.
I can recommend the ACE library, which provides a multitude of locking mechanisms and is ported to various platforms.
Depending on the boundary conditions of your problem, you may find the following classes useful:
ACE_Write_Guard and ACE_Read_Guard
Here is a good and lightweight implementation suitable for most tasks.
Multiple-Reader, Single-Writer Synchronization Lock Class for Win32 by Glenn Slayde
#include <mutex> // std::unique_lock
#include <shared_mutex>
class Foo {
 public:
  void Write() {
    std::unique_lock lock{mutex_};
    // ...
  }
  void Read() {
    std::shared_lock lock{mutex_};
    // ...
  }
 private:
  std::shared_mutex mutex_;
};
You could copy Sun's excellent ReentrantReadWriteLock. It includes features such as optional fairness, lock downgrading, and of course reentrancy.
Yes it's in Java, but you can easily read and transpose it to C++, even if you don't know any Java. The documentation I linked to contains all the behavioral properties of this implementation so you can make sure it does what you want.
If nothing else, it's a guide.