What are the C++98 and C++11 memory models for local arrays and interactions with threads?
I'm not referring to the C++11 thread_local keyword, which pertains to global and static variables.
Instead, I'd like to find out what is the guaranteed behavior of threads for arrays that are allocated at compile-time. By compile-time I mean "int array[100]", which is different to allocation using the new[] keyword. I do not mean static variables.
For example, let's say I have the following struct/class:
struct xyz { int array[100]; };
and the following function:
void fn(int x) {
xyz dog;
for(int i=0; i<100; ++i) { dog.array[i] = x; }
// do something else with dog.array, eg. call another function with dog as parameter
}
Is it safe to call fn() from multiple threads?
It seems that the C++ memory model is: all local non-static variables and arrays are allocated on the stack, and that each thread has its own stack. Is this true (ie. is this officially part of the standard) ?
Such variables are allocated on the stack, and since each thread has its own stack, it is perfectly safe to use local arrays. They are not different from e.g. local ints.
C++98 didn't say anything about threads. Programs otherwise written to C++98 but which use threads do not have a meaning that is defined by C++98. Of course, it's sensible for threading extensions to provide stable, private local variables to threads, and they usually do. But there can exist threads for which this is not the case: for instance processes created by vfork on some Unix systems, whereby the parent and child will execute in the same stack frame, since the v in vfork means not to clone the address space, and vfork does not redirect the new process to a different function, either.
In C++11, there is threading support. Local variables in separate activation chains in separate C++11 threads do not interfere. But if you go outside the language and whip out vfork or anything resembling it, then all bets are off, like before.
But here is something. C++ has closures now. What if two threads both invoke the same closure? Then you have two threads sharing the same local variables. A closure is like an object and its captured local variables are like members. If two or more threads call the same closure, then you have a de facto multi-threaded object whose members (i.e. captured lexical variables) are shared.
A thread can also simply pass the address of its local variables to another thread, thereby causing them to become shared.
Related
Consider a program with three threads A,B,C.
They have a shared global object G.
I want to use an atomic variable(i) inside G which is written by Thread B and Read by A.
My approach was:
declare i in G as:
std::atomic<int> i;
write it from thread B using a pointer to G as:
G* pG; //this is available inside A and B
pG->i = 23;
And read it from thread A using the same way.
int k = pG->i;
Is my approach correct if these threads try to access this variable simultaneously.?
Like JV says, it depends what your definition of "correct" is. See http://preshing.com/20120612/an-introduction-to-lock-free-programming/. If it doesn't need to synchronize with anything, you should use std::memory_order_relaxed stores instead of the default sequential consistency stores, so it compiles to more efficient asm (no memory barrier instructions).
But yes, accessing an atomic struct member through a pointer is fine, as long as the pointer itself is initialized before the threads start.
If the struct is a global, then don't use a pointer to it, just access the global directly. Having a separate variable that always points to the same global is an extra level of indirection for no benefit.
If you want to change the pointer, it also needs to be std::atomic<struct foo *> pG, and changing it gets complicated as far as deciding when it's safe to free the old data after changing it.
I am (re-) learning C++ (again, after many years of Java and Python), but it seems I am not familiar with the concepts of heap and stack any more. I am reading threads like this one, which makes it quite clear: local variables in functions/methods live on the stack and will be destroyed when leaving that function/method. For objects that I need longer I would need to allocate memory on the heap.
Fine. But not I am reading a bit of C++ code and see the following (simplified):
struct MyStruct
{
int Integer;
bool Boolean;
}
// in header file
class MyClass
{
TArray<MyStruct> MyArray; // TArray is a custom dynamic array in the framework
void FillArray();
TArray<MyStruct> GetArray() { return MyArray; }
}
// in cpp file
void MyClass::FillArray()
{
for(int i = 0; i < 10; ++i)
{
MyStruct s; // These objects are created on the stack, right?
s.Integer = i;
s.Boolean = true;
MyArray.Add(s);
}
}
So there is a custom struct MyStruct, and a class that has a container MyArray. Then at some point I call MyClass->FillArray() to initialize the array. In that method I create objects of MyStruct on the stack (i.e. not via new) and add them to the array. From my understanding these objects within the array should be destroyed as soon as the FillArray() methods returns.
Now at some point later in code I call MyClass->GetArray(). And to my surprise the array returned does indeed contain all the struct objects that have been created before.
Why are these struct objects still there, why have they not been destroyed as soon as the FillArray() method returns?
Matthias, you seem to have grasped the concept of "stack" and "heap" variables correctly enough. The "magic" you missed is that MyArray::Add clones the provided s and thus its value lives on in the MyArray instance.
If you find "stack-" variables mysterious just think of them as any scoped variable. A scoped variable is destructed when it goes out of scope. And a local variable of a function is just a special case of a scope.
To have data "live on" outside a scope you need to create it on the heap (use new) and pass on the pointer to its location by value. This is also true if you want to share data between threads. Another thread-of-execution is just execution using a call-stack of its own. And thus two threads calling the same function will have their separate local (stack) variables. So understanding the stack-concept is good when you wander into the land of multi-threading.
In garbage collected languages, on the other hand, all data is treated as if "on the heap" and accessed thorough reference counted "pointers". Thus in e.g., C# and Java you don not normally talk about "stack-" versus "heap-" variables.
Many C++ classes implements some "magic" internally to optimise its internal storage for you. C++ strings for example may implement short string optimisation in that stores "short strings" locally (on the stack) and longer strings on the heap (using new). It then applies the RAII idiom to clean up any heap-memory on destruction. In this way you as client does normally not have to care. Just as in your example where you do not have to care how TArray handles its internal memory. You just make sure you pass the values to it correctly.
Good luck with learning C++ again :)
MyArray.Add takes a struct that's in the stack.
The question is wether it takes it by reference or by value.
If it takes it by reference (pointer), then once it goes out of scope you'll end up in a mess.
If it takes it by value, it copies the values (creating duplicates in a different scope).
It seems like what MyArray.Add does is create a new struct with the same values as it's parameter.
My specific question is that when implementing a singleton class in C++, is there any substantial differences between the two below codes regarding performance, side issues or something:
class singleton
{
// ...
static singleton& getInstance()
{
// allocating on heap
static singleton* pInstance = new singleton();
return *pInstance;
}
// ...
};
and this:
class singleton
{
// ...
static singleton& getInstance()
{
// using static variable
static singleton instance;
return instance;
}
// ...
};
(Note that dereferencing in the heap-based implementation should not affect performance, as AFAIK there is no extra machine-code generated for dereferencing. It's seems only a matter of syntax to distinguish from pointers.)
UPDATE:
I've got interesting answers and comments which I try to summarize them here. (Reading detailed answers is recommended for those interested.):
In the singleton using static local variable, the class destructor is automatically invoked at process termination, whereas in the dynamic allocation case, you have to manage object destruction someway at sometime, e.g. by using smart pointers:
static singleton& getInstance() {
static std::auto_ptr<singleton> instance (new singleton());
return *instance.get();
}
The singleton using dynamic allocation is "lazier" than the static singleton variable, as in the later case, the required memory for the singleton object is (always?) reserved at process start-up (as part of the whole memory required for loading program) and only calling of the singleton constructor is deferred to getInstance() call-time. This may matter when sizeof(singleton) is large.
Both are thread-safe in C++11. But with earlier versions of C++, it's implementation-specific.
The dynamic allocation case uses one level of indirection to access the singleton object, whereas in the static singleton object case, direct address of the object is determined and hard-coded at compile-time.
P.S.: I have corrected the terminology I'd used in the original posting according to the #TonyD's answer.
the new version obviously needs to allocate memory at run-time, whereas the non-pointer version has the memory allocated at compile time (but both need to do the same construction)
the new version won't invoke the object's destructor at program termination, but the non-new version will: you could use a smart pointer to correct this
you need to be careful that some static/namespace-scope object's destructors don't invoke your singleton after its static local instance's destructor has run... if you're concerned about this, you should perhaps read a bit more about Singleton lifetimes and approaches to managing them. Andrei Alexandrescu's Modern C++ Design has a very readable treatment.
under C++03, it's implementation-defined whether either will be thread safe. (I believe GCC tends to be, whilst Visual Studio tends not -comments to confirm/correct appreciated.)
under C++11, it's safe: 6.7.4 "If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization." (sans recursion).
Discussion re compile-time versus run-time allocation & initialisation
From the way you've worded your summary and a few comments, I suspect you're not completely understanding a subtle aspect of the allocation and initialisation of static variables....
Say your program has 3 local static 32-bit ints - a, b and c - in different functions: the compiler's likely to compile a binary that tells the OS loader to leave 3x32-bits = 12 bytes of memory for those statics. The compiler decides what offsets each of those variables is at: it may put a at offset 1000 hex in the data segment, b at 1004, and c at 1008. When the program executes, the OS loader doesn't need to allocate memory for each separately - all it knows about is the total of 12 bytes, which it may or may not have been asked specifically to 0-initialise, but it may want to do anyway to ensure the process can't see left over memory content from other users' programs. The machine code instructions in the program will typically hard-code the offsets 1000, 1004, 1008 for accesses to a, b and c - so no allocation of those addresses is needed at run-time.
Dynamic memory allocation is different in that the pointers (say p_a, p_b, p_c) will be given addresses at compile time as just described, but additionally:
the pointed-to memory (each of a, b and c) has to be found at run-time (typically when the static function first executes but the compiler's allowed to do it earlier as per my comment on the other answer), and
if there's too little memory currently given to the process by the Operating System for the dynamic allocation to succeed, then the program library will ask the OS for more memory (e.g. using sbreak()) - which the OS will typically wipe out for security reasons
the dynamic addresses allocated for each of a, b and c have to be copied back into the pointers p_a, p_b and p_c.
This dynamic approach is clearly more convoluted.
The main difference is that using a local static the object will be destroyed when closing the program, instead heap-allocated objects will just be abandoned without being destroyed.
Note that in C++ if you declare a static variable inside a function it will be initialized the first time you enter the scope, not at program start (like it happens instead for global static duration variables).
In general over the years I switched from using lazy initialization to explicit controlled initialization because program startup and shutdown are delicate phases and quite difficult to debug. If your class is not doing anything complex and just cannot fail (e.g. it's just a registry) then even lazy initialization is fine... otherwise being in control will save you quite a lot of problems.
A program that crashes before entering the first instruction of main or after executing last instruction of main is harder to debug.
Another problem of using lazy construction of singletons is that if your code is multithread you've to pay attention to the risk of having concurrent threads initializing the singleton at the same time. Doing initialization and shutdown in a single thread context is simpler.
The possible races during initialization of function-level static instances in multithreaded code has been resolved since C++11, when the language added official multithreading support: for normal cases proper synchronization guards are automatically added by the compiler so this is not a concern in C++11 or later code. However if initialization of a static in function a calls function b and vice-versa you can risk a deadlock if the two functions are called the first time at the same time by different threads (this is not an issue only if the compiler uses a single mutex for all statics). Note also that calling the function that contains a static object from within the initialization code of the static object recursively is not permitted.
What is the problem with having static variables (especially within functions) in multithreaded programs?
Thanks.
Initialization is not thread-safe. Two threads can enter the function and both may initialize the function-scope static variable. That's not good. There's no telling what the result might be.
In C++0x, initialization of function-scope static variables will be thread-safe; the first thread to call the function will initialize the variable and any other threads calling that function will need to block until that initialization is complete.
I don't think there are currently any compiler + standard library pairs that fully implement the C++0x concurrency memory model and the thread support and atomics libraries.
To pick an illustrative example at random, take an interface like asctime in the C library. The prototype looks like this:
char *
asctime(const struct tm *timeptr);
This implicitly must have some global buffer to store the characters in the char* returned. The most common and simple way to accomplish this would be something like:
char *
asctime(const struct tm *timeptr)
{
static char buf[MAX_SIZE];
/* TODO: convert timeptr into string */
return buf;
}
This is totally broken in a multi-threaded environment, because buf will be at the same address for each call to asctime(). If two threads call asctime() at the same time, they run the risk of overwriting each other's results. Implicit in the contract of asctime() is that the characters of the string will stick around until the next call to asctime(), and concurrent calls breaks this.
There are some language extensions that work around this particular problem in this particular example via thread-local storage (__thread,__declspec(thread)). I believe this idea made it into C++0x as the thread_local keyword.
Even so I would argue it's a bad design decision to use it this way, for similar reasons as for why it's bad to use global variables. Among other things, it may be thought of as a cleaner interface for the caller to maintain and provide this kind of state, rather than the callee. These are subjective arguments, however.
A static variable usually means multiple invocations of your function would share a state and thus interfere with one another.
Normally you want your functions to be self contained; have local copies of everything they work on and share nothing with the outside world bar parameters and return values. (Which, if you think a certain way, aren't a part of the function anyway.)
Consider:
int add(int x, int y);
definitely thread-safe, local copies of x and y.
void print(const char *text, Printer *printer);
dangerous, someone outside might be doing something with the same printer, e.g. calling another print() on it.
void print(const char *text);
definitely non-thread-safe, two parallel invocations are guaranteed to use the same printer.
Of course, there are ways to secure access to shared resources (search keyword: mutex); this is just how your gut feeling should be.
Unsynchronized parallel writes to a variable are also non-thread-safe most of the time, as are a read and write. (search keywords: synchronization, synchronization primitives [of which mutex is but one], also atomicity/atomic operation for when parallel access is safe.)
Is there a way to implement a singleton object in C++ that is:
Lazily constructed in a thread safe manner (two threads might simultaneously be the first user of the singleton - it should still only be constructed once).
Doesn't rely on static variables being constructed beforehand (so the singleton object is itself safe to use during the construction of static variables).
(I don't know my C++ well enough, but is it the case that integral and constant static variables are initialized before any code is executed (ie, even before static constructors are executed - their values may already be "initialized" in the program image)? If so - perhaps this can be exploited to implement a singleton mutex - which can in turn be used to guard the creation of the real singleton..)
Excellent, it seems that I have a couple of good answers now (shame I can't mark 2 or 3 as being the answer). There appears to be two broad solutions:
Use static initialisation (as opposed to dynamic initialisation) of a POD static variable, and implementing my own mutex with that using the builtin atomic instructions. This was the type of solution I was hinting at in my question, and I believe I knew already.
Use some other library function like pthread_once or boost::call_once. These I certainly didn't know about - and am very grateful for the answers posted.
Basically, you're asking for synchronized creation of a singleton, without using any synchronization (previously-constructed variables). In general, no, this is not possible. You need something available for synchronization.
As for your other question, yes, static variables which can be statically initialized (i.e. no runtime code necessary) are guaranteed to be initialized before other code is executed. This makes it possible to use a statically-initialized mutex to synchronize creation of the singleton.
From the 2003 revision of the C++ standard:
Objects with static storage duration (3.7.1) shall be zero-initialized (8.5) before any other initialization takes place. Zero-initialization and initialization with a constant expression are collectively called static initialization; all other initialization is dynamic initialization. Objects of POD types (3.9) with static storage duration initialized with constant expressions (5.19) shall be initialized before any dynamic initialization takes place. Objects with static storage duration defined in namespace scope in the same translation unit and dynamically initialized shall be initialized in the order in which their definition appears in the translation unit.
If you know that you will be using this singleton during the initialization of other static objects, I think you'll find that synchronization is a non-issue. To the best of my knowledge, all major compilers initialize static objects in a single thread, so thread-safety during static initialization. You can declare your singleton pointer to be NULL, and then check to see if it's been initialized before you use it.
However, this assumes that you know that you'll use this singleton during static initialization. This is also not guaranteed by the standard, so if you want to be completely safe, use a statically-initialized mutex.
Edit: Chris's suggestion to use an atomic compare-and-swap would certainly work. If portability is not an issue (and creating additional temporary singletons is not a problem), then it is a slightly lower overhead solution.
Unfortunately, Matt's answer features what's called double-checked locking which isn't supported by the C/C++ memory model. (It is supported by the Java 1.5 and later — and I think .NET — memory model.) This means that between the time when the pObj == NULL check takes place and when the lock (mutex) is acquired, pObj may have already been assigned on another thread. Thread switching happens whenever the OS wants it to, not between "lines" of a program (which have no meaning post-compilation in most languages).
Furthermore, as Matt acknowledges, he uses an int as a lock rather than an OS primitive. Don't do that. Proper locks require the use of memory barrier instructions, potentially cache-line flushes, and so on; use your operating system's primitives for locking. This is especially important because the primitives used can change between the individual CPU lines that your operating system runs on; what works on a CPU Foo might not work on CPU Foo2. Most operating systems either natively support POSIX threads (pthreads) or offer them as a wrapper for the OS threading package, so it's often best to illustrate examples using them.
If your operating system offers appropriate primitives, and if you absolutely need it for performance, instead of doing this type of locking/initialization you can use an atomic compare and swap operation to initialize a shared global variable. Essentially, what you write will look like this:
MySingleton *MySingleton::GetSingleton() {
if (pObj == NULL) {
// create a temporary instance of the singleton
MySingleton *temp = new MySingleton();
if (OSAtomicCompareAndSwapPtrBarrier(NULL, temp, &pObj) == false) {
// if the swap didn't take place, delete the temporary instance
delete temp;
}
}
return pObj;
}
This only works if it's safe to create multiple instances of your singleton (one per thread that happens to invoke GetSingleton() simultaneously), and then throw extras away. The OSAtomicCompareAndSwapPtrBarrier function provided on Mac OS X — most operating systems provide a similar primitive — checks whether pObj is NULL and only actually sets it to temp to it if it is. This uses hardware support to really, literally only perform the swap once and tell whether it happened.
Another facility to leverage if your OS offers it that's in between these two extremes is pthread_once. This lets you set up a function that's run only once - basically by doing all of the locking/barrier/etc. trickery for you - no matter how many times it's invoked or on how many threads it's invoked.
Here's a very simple lazily constructed singleton getter:
Singleton *Singleton::self() {
static Singleton instance;
return &instance;
}
This is lazy, and the next C++ standard (C++0x) requires it to be thread safe. In fact, I believe that at least g++ implements this in a thread safe manner. So if that's your target compiler or if you use a compiler which also implements this in a thread safe manner (maybe newer Visual Studio compilers do? I don't know), then this might be all you need.
Also see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2513.html on this topic.
You can't do it without any static variables, however if you are willing to tolerate one, you can use Boost.Thread for this purpose. Read the "one-time initialisation" section for more info.
Then in your singleton accessor function, use boost::call_once to construct the object, and return it.
For gcc, this is rather easy:
LazyType* GetMyLazyGlobal() {
static const LazyType* instance = new LazyType();
return instance;
}
GCC will make sure that the initialization is atomic. For VC++, this is not the case. :-(
One major issue with this mechanism is the lack of testability: if you need to reset the LazyType to a new one between tests, or want to change the LazyType* to a MockLazyType*, you won't be able to. Given this, it's usually best to use a static mutex + static pointer.
Also, possibly an aside: It's best to always avoid static non-POD types. (Pointers to PODs are OK.) The reasons for this are many: as you mention, initialization order isn't defined -- neither is the order in which destructors are called though. Because of this, programs will end up crashing when they try to exit; often not a big deal, but sometimes a showstopper when the profiler you are trying to use requires a clean exit.
While this question has already been answered, I think there are some other points to mention:
If you want lazy-instantiation of the singleton while using a pointer to a dynamically allocated instance, you'll have to make sure you clean it up at the right point.
You could use Matt's solution, but you'd need to use a proper mutex/critical section for locking, and by checking "pObj == NULL" both before and after the lock. Of course, pObj would also have to be static ;)
.
A mutex would be unnecessarily heavy in this case, you'd be better going with a critical section.
But as already stated, you can't guarantee threadsafe lazy-initialisation without using at least one synchronisation primitive.
Edit: Yup Derek, you're right. My bad. :)
You could use Matt's solution, but you'd need to use a proper mutex/critical section for locking, and by checking "pObj == NULL" both before and after the lock. Of course, pObj would also have to be static ;) . A mutex would be unnecessarily heavy in this case, you'd be better going with a critical section.
OJ, that doesn't work. As Chris pointed out, that's double-check locking, which is not guaranteed to work in the current C++ standard. See: C++ and the Perils of Double-Checked Locking
Edit: No problem, OJ. It's really nice in languages where it does work. I expect it will work in C++0x (though I'm not certain), because it's such a convenient idiom.
read on weak memory model. It can break double-checked locks and spinlocks. Intel is strong memory model (yet), so on Intel it's easier
carefully use "volatile" to avoid caching of parts the object in registers, otherwise you'll have initialized the object pointer, but not the object itself, and the other thread will crash
the order of static variables initialization versus shared code loading is sometimes not trivial. I've seen cases when the code to destruct an object was already unloaded, so the program crashed on exit
such objects are hard to destroy properly
In general singletons are hard to do right and hard to debug. It's better to avoid them altogether.
I suppose saying don't do this because it's not safe and will probably break more often than just initializing this stuff in main() isn't going to be that popular.
(And yes, I know that suggesting that means you shouldn't attempt to do interesting stuff in constructors of global objects. That's the point.)