The significance of separate stack-space for threads - c++

I have long known that threads each have separate stack space but shared heap memory.
But I recently found some code that made me question exactly what that means.
Here is a shortened version of the code:
void SampleFunction()
{
    CRemoteMessage rMessage;
    rMessage.StartBackgroundAsync(); // Kick off a background thread.

    /* Do other long-running work here...
     * but don't leave SampleFunction.
     */

    rMessage.GetReply();    // Blocks if needed, but the background work is mostly done by now.
    rMessage.ProcessReply();
}
In this code, rMessage is a local stack variable, yet it spends most of its time being used by a background thread. Is this safe? How exactly can the background thread access a stack variable of this thread?

Generally speaking, the stack and the heap are both part of a process's memory space, which is shared between its threads. Nothing prevents you from sharing stack-allocated variables.
Each thread, however, has its own set of registers, including a stack pointer (and its derivatives), so each thread maintains a separate stack (otherwise it would be impossible for the threads to call functions and do whatever they need). You can choose to break this separation if you want.

I think the confusion here is that you think of a thread's stack as a separate entity that can only be accessed by that one thread. That's not how this works.
Every process has one large memory space for its use, and every thread can read (and write!) everything in this space; the separation into stack space and heap is a higher-level design decision. To the background thread it doesn't matter whether the memory it receives is allocated on another thread's stack or on the heap.
There are even rare situations where you create a new stack for a thread yourself; it makes no difference to the thread itself.

Related

Cache efficiency with static member in thread

I'm currently writing an application with multiple worker threads running in parallel. The main part of the program executes before the workers, and each worker is put to sleep when it has finished its task:
MainLoop()
{
    // ...
    SoundManager::PlaySound("sound1.mp3"); // Queue a sound to be played; stored in a list in SoundManager
    SoundManager::PlaySound("sound2.mp3");
    SoundManager::PlaySound("sound3.mp3");
    // ...
    SoundThreadWorker.RunJob();        // Wake up the thread and play every sound queued in SoundManager
    // Run other threads
    SoundThreadWorker.WaitForFinish(); // Wait until the thread has finished its tasks; the thread is put to sleep (but not closed)
    // Wait for other threads
    // ...
}

// In the SoundThreadWorker class, running in a different thread from the main loop
RunJob()
{
    SoundManager::PlayAllSound(); // Play all sounds stored in SoundManager
}
In this case, the static variable storing all the sounds should be safe, because no sounds are added while the thread is running.
Is this cache efficient?
I have read the following in https://www.agner.org/optimize/optimizing_cpp.pdf:
"The different threads need separate storage. No function or class
that is used by multiple threads should rely on static or global
variables. (See thread-local storage p. 28) The threads have each
their stack. This can cause cache contentions if the threads share
the same cache."
I have a hard time understanding how static variables are stored in cache and how they are used by each thread. Do I have two instances of SoundManager in cache, since threads do not share their stacks? Do I need to create shared memory to avoid this problem?
That passage is about memory that is changed, not about memory that remains constant. Sharing constants between threads is fine.
When you have multiple CPUs each updating the same place, they have to be sending their changes back and forth to each other all the time. This results in contention for 'owning' a particular piece of memory.
Often the ownership isn't explicit. But when one CPU tells all the others that a particular cache line needs to be invalidated because it just changed something there, all the other CPUs have to evict the value from their caches. The effect is that the CPU that last modified a piece of memory effectively 'owns' the cache line it was in.
And, again, this is only an issue for things that are changed.
Also, the view of memory and cache that I gave you is rather simplistic. Please don't use it when reasoning about the thread safety of a particular piece of code. It's sufficient to understand why multiple CPUs updating the same piece of memory is bad for your cache, but it's not sufficient for understanding which CPU's version of a particular memory location ends up being used by the others.
A memory location that doesn't change during the lifetime of a thread being used by multiple threads will result in that memory location appearing in multiple CPU caches. But this isn't a problem. Nor is it a problem for a particular memory location that doesn't change to be stored in the L2 and L3 caches that are shared between CPUs.

C++ constructor memory synchronization

Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);

class Foo {
public:
    Foo() {
        i = 0;
        InitializeComplexClass(&c);
    }
private:
    ComplexClass c;
    int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor, and take corresponding reader locks in any method that accesses the members?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86, but could be instead running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to know about your new object, you have to hand over the object / signal the other thread somehow, and signalling a thread means writing to memory. Both x86 and x64 perform all memory writes in order; the CPU does not reorder these operations with respect to each other. This is called "Total Store Ordering": the CPU's write queue works first-in, first-out.
Given that you create the object first and then pass it on to another thread, these changes to memory also occur in order, and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of the object are guaranteed to have been available to that thread even earlier (if the thread had somehow known where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this TSO guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread via the corresponding API also performs synchronisation; otherwise it would not be a synchronisation API.
x86 and x64 CPUs may reorder writes past reads, but that is not relevant here. Just for better understanding: writes can be ordered after reads because writes to memory go through a write queue, and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (updates that have gone through their own write queues).
This topic has been made unbelievably confusing for many, but in the end there are only a couple of things an x86/x64 programmer has to worry about:
- First, the existence of the write queue (one should not be worried about the read cache at all).
- Second, concurrent writing and reading of the same variable from different threads when the variable is not of atomic width, which may cause data tearing; in that case you need synchronisation mechanisms.
- Finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
If you do :
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards so that both threads can safely interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need guards. However, do not pass a stack pointer as in your example; pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.

Do cpp object methods have their own stack frame?

I have a hypothesis here, but it's a little tough to verify.
When two threads invoke the same method of the same object instance, is there a unique stack frame for each calling thread? In a compiled binary, I understand a class to be a static code section filled with function definitions in memory, the only difference between objects being the this pointer that is passed under the hood.
But then each calling thread must have its own stack frame; otherwise two threads entering the same member function of the same object instance would corrupt one another's local variables.
Just to reiterate: I'm not asking whether two threads can corrupt the object's data by both modifying it through this at the same time - I'm well aware of that. I'm asking whether, when two threads enter the same method of the same instance at the same time, the local variables of that context occupy the same places in memory. My assumption is that they do not.
You are correct. Each thread makes use of its own stack and each stack makes local variables distinct between threads.
This is not specific to C++, though; it's just the way processors function. (That is, modern processors - some older processors had only one stack, like the 6502, which had only 256 bytes of stack and no real capability to run threads...)
Objects may be on the stack and shared between threads, so you can end up modifying an object that lives on another thread's stack - but only if you share that specific pointer.
You are right that different threads have unique stacks. That is not a feature of C++ but something provided by the OS. Class objects won't necessarily be different, though; it depends on how they are allocated. Different threads can share heap objects, which might lead to concurrency problems.
Local variables of any function or class method are stored on the calling thread's stack (in a stack frame), so it doesn't matter from which thread you call the method - each call uses that thread's own stack during execution.
A slightly different way to put it: each method call creates its own stack frame.
NOTE: static variables will be the same for all threads.
Of course, there exist techniques to access another method's stack memory during execution, but those are hacks.

Boost thread - Out of scope possibility

I was curious about the accuracy of the following code
for(int i = 0; i < 5; i++)
{
    SomeClass* ptrinst = new SomeClass();
    boost::thread t( boost::bind(&SomeClass::SomeMethod, ptrinst) );
    ......
}
What would happen to the running thread when t runs out of scope ?
Since the main thread does not call t.join(), it will continue to run its loop, spawning additional threads, and then continue onwards. So the answer is: under your current code, the child threads will not interact with your parent thread (at least not directly).
Also note that the thread class is a strange beast - the only thing that happens when t falls out of scope is that your main thread no longer has a handle to call t.join() on. Falling out of scope in the parent thread has zero impact on the child thread. Once you spawn the child thread, it is essentially decoupled from the parent (well, the globals and dynamically allocated memory that were visible in the parent are also visible to the child, but you will need mutexes if you want to mutate those globals). As I mention later in this post, you need a solid understanding of memory visibility and ownership within a threading context; just reading my comments here probably will not be enough.
If you want the main thread to wait on the completion of the child threads, you need to store those threads in a std::vector<boost::thread> v; outside of your loop and then in a second loop, call join on all those instances.
Your current code looks a bit suspect, as you are invoking an instance method through bind - that's fine in itself, but I wouldn't normally expect that instance method to call delete this;, which means it's up to the parent thread to clean up (and the parent thread shouldn't clean up until the child threads are done). However, there is no way for it to clean up at the right time without some kind of thread synchronization, so a memory leak or a nasty race condition is almost assured. (Suppose you put a delete ptrinst; in the ... portion of your main loop in an attempt to clean up: without synchronization, you may delete the pointer before the child threads are done using it.)
Also, you may want to use std::thread and std::bind in place of the boost versions.
One last note: I suspect you are still experimenting with the use of threads. If so, it may be a good idea to read up and experiment a lot more with simpler examples before you try to fix this code. Otherwise, you may be setting yourself up for a world of hurt (debugging hell including race conditions, weird memory-synchronization issues, etc...).
Try to build a more solid understanding of what happens with memory and threads: what memory is visible to what threads and what memory can and cannot be shared.

Is it safe to pass (synchronously) stack-allocated memory to other thread?

Recently I heard that memory on the stack is not shared with other threads while memory on the heap is.
I normally do:
HWND otherThreadHwnd;
DWORD commandId;
// initialize commandId and otherThreadHwnd

struct MyData {
    int data1_;
    long data2_;
    void* chunk_;
};

int abc() {
    MyData myData;
    // initialize myData
    SendMessage(otherThreadHwnd, commandId, 0, reinterpret_cast<LPARAM>(&myData));
    // read myData
    return 0;
}
Is it alright to do this?
Yes, it is safe in this instance.
Data on the stack only exists for the lifetime of the function call. Since SendMessage is a synchronous, blocking call, the data will be valid for the duration of that call.
This code would be broken if you replace SendMessage with a call to PostMessage, SendNotifyMessage, or SendMessageCallback, since they would not block and the function may have returned before the target window received the message.
I think two different issues are being confused by whoever told you that "memory in the stack is not shared with other threads":
- object lifetime - the data on the stack is only valid as long as the thread doesn't leave the scope of the variable. In the example you give, you handle this by making the call to the other thread synchronously.
- memory address visibility - the address space of a process is shared among the threads in that process, so variables addressable by one thread are addressable by the others. If you are passing the address to a thread in a different process, the situation is quite different and you'd need some other mechanism (which might be ensuring the memory block is mapped into both processes - but I don't think that can normally be done with stack memory).
Yes, it is okay.
SendMessage works in blocking mode. Even if myData is allocated on the stack, its address is still visible to all threads in the process. Each thread has its own private stack, but data on the stack can be explicitly shared, as your code does. However, as you guessed, do not use PostThreadMessage in such a case.
What you heard about is "potential infringement of privacy", which is sharing the data on one thread's private stack with another thread.
Although it is not encouraged, it is only a "potential" problem--with correct synchronization, it can be done safely. In your case, this synchronization is done by ::SendMessage(); it will not return until the message is processed in the other thread, so the data will not go out of scope on the main thread's stack. But beware that whatever you do with this pointer in the worker thread, it must be done before returning from the message handler (if you're storing it somewhere, be sure to make a copy).
As others have said already, how you have it written is just fine, and in general, nothing will immediately fail when passing a pointer to an object on the stack to another thread as long as everything's synchronized. However, I tend to cringe a little when doing so because things that seem threadsafe can get out of their intended order when an exception occurs or if one of the threads is involved with asynchronous IO callbacks. In the case of an exception in the other thread during your call to SendMessage, it may return 0 immediately. If the exception is later handled in the other thread, you may have an access violation. Yet another potential hazard is that whatever's being stored on the stack can never be forcibly disposed of from another thread. If it gets stuck waiting for some callback, object, etc, forever and the user has decided to cancel or quit the application, there is no way for the working thread to be sure the stalled thread has tidied up whatever objects are on its stack.
My point is this: In simple scenarios as you've described where everything works perfectly, nothing ever changes, and no outside dependencies fail, sharing pointers to the local stack is safe - but since allocating on the heap is really just as simple, and it gives you the opportunity to explicitly control the object's lifetime from any thread in extenuating circumstances, why not just use the heap?
Finally, I strongly suggest that you be very careful with the void* chunk_ member of your MyData structure, as it is not threadsafe as described if it's copied in the other thread.