I have a hypothesis here, but it's a little tough to verify.
Is there a unique stack frame for each calling thread when two threads invoke the same method of the same object instance? In a compiled binary, I understand a class to be a static section of code containing the function definitions, and the only difference between different objects is the this pointer that is passed under the hood.
If that is so, the calling thread must have its own stack frame; otherwise two threads accessing the same member function of the same object instance would corrupt one another's local variables.
Just to reiterate: I'm not asking whether two threads can corrupt the object's data by both modifying it at the same time; I'm well aware of that. I'm asking whether, when two threads enter the same method of the same instance at the same time, the local variables of that context occupy the same places in memory. My assumption is that they do not.
You are correct. Each thread uses its own stack, and that stack keeps local variables distinct between threads.
This is not specific to C++, though; it's just the way modern processors work. (Some older processors were different: the 6502, for example, had only 256 bytes of stack and no real capability to run threads.)
Objects may themselves live on the stack and be shared between threads, so you can end up modifying an object that sits on another thread's stack. But that happens only if you share that specific pointer.
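To make this concrete, here's a minimal sketch (the Counter class and its bump method are invented for illustration). Two threads call the same member function on the same instance; printing the address of a local variable shows each call getting its own stack slot:

#include <iostream>
#include <mutex>
#include <thread>

struct Counter {
    std::mutex io_mutex; // only serializes the output below

    void bump() {
        int local = 0;   // lives in the calling thread's own stack frame
        ++local;
        std::lock_guard<std::mutex> lock(io_mutex);
        std::cout << "thread " << std::this_thread::get_id()
                  << " sees &local = " << &local << '\n';
    }
};

int main() {
    Counter c; // a single instance shared by both threads
    std::thread t1(&Counter::bump, &c);
    std::thread t2(&Counter::bump, &c);
    t1.join();
    t2.join();
}

On typical implementations the two printed addresses differ: same code, same object, two distinct stack frames.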
You are right that different threads have unique stacks. That is not a feature of C++ itself but something provided by the OS. Class objects won't necessarily be distinct, though; that depends on how they are allocated. Different threads can share heap objects, which can lead to concurrency problems.
Local variables of any function or class method are stored in the calling thread's own stack (more precisely, in a stack frame within that thread's stack), so it doesn't matter from which thread you call a method: each call uses its own stack frame during execution.
A slightly different explanation: each method call creates its own stack frame (not a whole stack).
NOTE: static variables are the exception; they are shared by all threads.
Of course, there exist techniques to access another method's stack memory during execution, but those are essentially hacks.
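As a small illustration of that note (next_id is a made-up function), a function-local static is shared across threads, unlike a true local, so unsynchronized access to it is a data race:

#include <thread>

int next_id() {
    static int counter = 0; // one copy shared by every thread, unlike a true local
    return ++counter;       // unsynchronized: a data race if called concurrently
}

int main() {
    std::thread t([] { next_id(); }); // both threads increment the same counter
    next_id();
    t.join();
}

In real code you would make counter a std::atomic<int> or guard it with a mutex.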
I have read that each function invocation pushes a stack frame onto the global call stack, and once the function call completes, that frame is popped off and control passes to the return address taken from the popped frame. If a called function calls yet another function, another return address is pushed onto the top of the same call stack, and so on, with the information stacking up and unstacking as the program dictates.
I was wondering what's at the base of the global call stack in a C or C++ program?
I did some searching on the internet, but none of the sources explicitly mention it. Is the call stack empty when our program starts, with usage beginning only once a function is called? OR is the address that main() has to return to implicitly pushed as the base of our call stack, so that main() has a stack frame of its own? I expect main() to have a stack frame, since we always return something at the end of main() and there needs to be some address to return to. OR is this dependent on the compiler/OS and different per implementation?
It would be helpful if someone has some informative links about this or could provide details on the process that goes into it.
main() is invoked by the libc code that handles setting up the environment for the executable etc. So by the time main() is called, the stack already has at least one frame created by the caller.
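One way to see this for yourself (assuming Linux with glibc and gdb; frame names and counts vary by platform and glibc version) is to break at main and print a backtrace; roughly:

$ gdb ./a.out
(gdb) break main
(gdb) run
(gdb) backtrace
#0  main ()
#1  __libc_start_call_main ()
#2  __libc_start_main ()
#3  _start ()

Frames #1 through #3 are the startup code that eventually called main().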
I'm not sure there is a universal answer, as the stack may be implemented differently per architecture. For example, a stack may grow upward (i.e. the stack pointer value increases when pushing onto the stack) or downward.
Exiting main() is usually done by calling an operating system function to indicate that the program wishes to terminate (with the specified return code), so I don't expect a return address for main() to be present on the stack; but this may differ per operating system and even per compiler.
I'm not sure why you need to know this, as this is typically something you leave up to the system.
First of all, there is no such thing as a "global call stack". Each thread has a stack, and the stack of the main thread often looks quite different from the stack of any thread spawned later on. And mostly, each of these "stacks" is just an arbitrary memory segment currently declared to be used as such, sub-allocated from any suitable memory pool.
And due to compiler optimizations such as inlining, many function calls usually won't even end up on the stack, meaning there isn't necessarily a distinguishable stack frame. You are only guaranteed that you can reference variables you put on the stack, not that the compiler must preserve anything you didn't explicitly reference.
There is not even a guarantee that the memory layout of your call stack must be organized in distinguishable frames. Return addresses are never guaranteed to be part of the stack frame; that just happens to be an implementation detail on architectures where data and return addresses may co-exist in the same address space. (There are architectures which require return addresses to be stored in a different address space than the data used in the call stack.)
That aside, yes, there is code which is executed outside of the main() function: specifically, initializers for global static variables, and code to set up the runtime environment (env, call parameters, stdin/stdout), etc.
E.g. when linked against libc, there is __libc_start_main, which calls your main function after initialization is done, and cleans up when your main function returns.
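You can observe code running before main() with a global object whose constructor has a side effect (a small sketch; the BeforeMain name is made up):

#include <cstdio>

struct BeforeMain {
    BeforeMain() { std::puts("constructed before main()"); }
};

BeforeMain before_main; // dynamic initialization runs during startup,
                        // before main() is entered (on typical glibc setups,
                        // from within __libc_start_main's init phase)

int main() {
    std::puts("main() starts");
}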
__libc_start_main is roughly the point where the stack starts being used, as far as you can see from within the program. That's not entirely true, though: some loader code has already executed in kernel space, reserving memory for your process to operate in initially (including memory for the future stack), initializing registers and memory to well-defined values, etc.
Right before actually "starting" your process, after dropping out of kernel mode, pointers to the future stack and to the first instruction of your program are loaded into the corresponding processor registers. Effectively, that's where __libc_start_main (or whatever initialization function your runtime uses) starts running, and the stack visible to you starts building up.
Getting back into the kernel usually involves an interrupt, which doesn't follow the stack either; it may simply swap the contents of the corresponding processor registers. (E.g. if you call a function in the kernel, the memory required for the call stack inside that call is not allocated from your stack, but from one you don't even have access to.)
Either way, everything that happens before main() is called, and whenever you enter a syscall, is implementation dependent, and you are not guaranteed any specific observable behavior. Messing around with processor registers, and thereby altering the program flow, is also far outside defined behavior as far as a pure C/C++ runtime is concerned.
On every system I have seen, a stack is set up by the time main() is called. It has to be, or just declaring a variable inside main would fail. A stack is set up whenever a thread or process is created, so any thread of execution has a stack. Further, in every assembly language I know, a register or fixed memory location holds the current stack pointer, so the concept of a stack always exists (the stack pointer might be bad, but stack operations are built into every mainstream instruction set).
I have, in a Server object, multiple threads doing the same task. Those threads are initialized with a Server::* member-function routine.
In this routine there is an infinite loop with some processing.
I was wondering whether it is thread safe to use the same method for multiple threads. I'm not worried about the fields of the class; if I want to read or write them I will use a mutex. But what about the routine itself?
Since a function is an address, will those threads be running in the same memory zone?
Do I need to create a method with the same code for every thread?
PS: I start the threads with std::thread(&Server::Task, this).
There is no problem with two threads running the same function at the same time (whether it's a member function or not).
In terms of instructions, it's similar to if you had two threads reading the same field at the same time - that's fine, they both get the same value. It's when you have one writing and one reading, or two writing, that you can start to have race conditions.
In C++ every thread is allocated its own call stack. This means that all local variables which exist only in the scope of a given thread's call stack belong to that thread alone. However, in the case of shared data or resources, such as a global data structure or a database, it is possible for different threads to access these at the same time. One solution to this synchronization problem is to use std::mutex, which you are already doing.
While the function itself might live at the same address in memory, you aren't writing to it from multiple locations: the function's code is immutable, and local variables scoped inside that function are stacked per thread.
If your writes are protected and the fetches don't pull stale data, you're as safe as you could possibly need to be on most architectures and implementations out there.
Behind the scenes, int Server::Task(std::string arg) is very similar to int Server__Task(Server* this, std::string arg). Just like multiple threads can execute the same function, multiple threads can also execute the same member function - even with the same arguments.
A mutex ensures that no conflicting changes are made, and that each thread sees every prior change. But since code does not change, you don't need a mutex for it, just as you don't need a mutex for string literals.
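Putting that together, here is a minimal sketch of the pattern in the question (Server, Task, and the request counter are made-up names): every worker runs the same member function, loop counters live on each worker's own stack, and only the shared field needs the mutex.

#include <mutex>
#include <thread>
#include <vector>

class Server {
public:
    void Task() {                                  // same code, run by every worker
        for (int i = 0; i < 1000; ++i) {           // i lives on each thread's own stack
            std::lock_guard<std::mutex> lock(m_);  // shared state needs the lock...
            ++requests_;                           // ...the code itself does not
        }
    }

private:
    std::mutex m_;
    long requests_ = 0;
};

int main() {
    Server s;
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back(&Server::Task, &s);   // std::thread, not std::mutex
    for (auto& t : workers)
        t.join();
}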
If I have shared an object between two threads (by passing in a void pointer to the object), what happens if they both try to call (different) methods at the same time? I'm not worried about the member variables themselves; there's a mutex in place for other reasons that luckily covers that already. The two threads call disjoint methods, so there's no possibility of overlap that way, but I wasn't sure what the behavior would be if main calls thing.a() while the thread calls thing.b() at the same time (or even if they just overlap, for that matter).
Nothing special would happen: each thread has its own stack, and each call (even to the same function) would have its own call frame and its own set of arguments and local variables.
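As a small sketch of the scenario (Thing, a, and b are made-up names), the two calls simply proceed in parallel, each in its own stack frame:

#include <thread>

struct Thing {
    void a() { int x = 1; (void)x; }   // touches only its own locals
    void b() { int y = 2; (void)y; }   // likewise: no shared state involved
};

int main() {
    Thing thing;
    std::thread t(&Thing::b, &thing);  // the spawned thread runs thing.b()
    thing.a();                         // main runs thing.a() concurrently
    t.join();                          // each call had its own frame and locals
}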
I have long known that threads each have separate stack space but share heap memory.
But I recently found some code that made me question exactly what that meant.
Here is a shortened version of the code:
void SampleFunction()
{
    CRemoteMessage rMessage;
    rMessage.StartBackgroundAsync(); // Kick off a background thread.

    /* Do other long-running work here...
     * but don't leave SampleFunction.
     */

    rMessage.GetReply();   // Blocks if needed, but the background work is mostly done by now.
    rMessage.ProcessReply();
}
In this code, rMessage is a local stack variable, but it spends most of its time being used by a background thread. Is this safe? How exactly is the background thread able to access a stack variable of this thread?
Generally speaking, the stack and the heap are both part of the memory space that is shared between threads. Nothing prevents you from sharing stack-allocated variables.
Each thread, however, has its own set of registers, including a stack pointer (and its derivatives), so separate stacks can be maintained; otherwise the threads couldn't independently call functions and do whatever they need. You can choose to break this separation if you want.
I think the confusion here is that you think of the stack of a thread as a separate entity that can only be accessed by the one thread. That's not how this works.
Every process has one large memory space for its use, and every thread can read (and write!) everything in this space; the separation into stack space and heap is a higher-level design decision. For the background thread it doesn't matter whether the memory it receives is allocated on another thread's stack or on the heap.
There are even rare situations where you want to create a new stack for a thread yourself; it makes no difference to the thread itself.
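Here is a minimal sketch of why this is safe when lifetimes are handled (worker and sample are made-up names, standing in for CRemoteMessage's background machinery): the background thread writes through a pointer into the first thread's stack, and the join keeps that stack slot alive for the whole time it is used.

#include <thread>

void worker(int* shared) {
    *shared += 1;                  // writes into the other thread's stack frame
}

void sample() {
    int value = 41;                // lives on this thread's stack
    std::thread t(worker, &value); // hand the background thread its address
    t.join();                      // join before 'value' goes out of scope,
}                                  // or the worker could write to a dead slot

int main() {
    sample();
}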
If I have a C++ object created in the main thread, and then start another thread, and from that thread I call a public member function of the object I created, what happens?
Is it different if the public function has parameters or if it manipulates private object members?
Does it behave differently on windows, linux or mac os?
What happens if the object is created on the stack?
There are two points that matter:
first, as usual, you need to ensure that the lifetime of the instance exceeds the duration of its usage.
second, access to variables shared across multiple threads needs to be synchronized to prevent race conditions.
That's all folks.
Each thread has its own stack, and thus you can have concurrent streams of execution. It is your own duty to make the object thread-safe.
It does not matter. However, private members are a candidate for race conditions.
If you create the object on the stack, another thread can still reach it through a pointer, but you must make sure it stays alive for as long as that thread uses it.
If I have a C++ object created in the main thread, and then start another thread, and from that thread I call a public member function of the object I created, what happens?
It depends on lifetime of the object.
If the object is created on the heap (dynamic memory, using new), then the other thread will access the members of the object correctly (assuming no race conditions), unless the object's lifetime was ended by calling delete in the first thread.
If the object is created on the stack (locally) in the first thread, then you have undefined behavior if the object's lifetime ends before it is accessed in the second thread.
Why can you access an object on the stack from the second thread at all?
Each thread has its own stack, but unless the object created on the first thread's stack is still valid and alive, the second thread would be accessing an address that no longer refers to a valid object.
Note that each process has an address space and all threads in the same process share that address space; hence the address of the variable can be accessed from the second thread. However, you need to ensure that the address still contains a valid object.
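A small sketch of both cases (Widget and the function names are made up): the heap object lives until delete, while the stack object is safe only because the join keeps it alive while the other thread uses it.

#include <thread>

struct Widget {
    void poke() {}
};

void heap_case() {
    auto* w = new Widget;              // heap: lifetime ends only at delete
    std::thread t([w] { w->poke(); });
    t.join();
    delete w;                          // must come after the last use
}

void stack_case() {
    Widget w;                          // lives on this thread's stack
    std::thread t([&w] { w.poke(); }); // fine only because we join below,
    t.join();                          // keeping w alive while t uses it
}

int main() {
    heap_case();
    stack_case();
}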
Is it different if the public function has parameters or if it manipulates private object members?
Access specifiers and multithreading are not related at all.
Same access specifier rules apply in all threads.
Does it behave differently on windows, linux or mac os?
The answer to the first question is guaranteed on all operating systems.
Compared to calling it from the original thread, there should be no difference if the object was created on the heap. However, there are some pitfalls, usually discussed under the term "thread safety". If you access the same member from different threads, you have to ensure that accessing the same resources does not lead to a race condition.
To avoid race conditions you can use different kinds of locks, for instance mutexes. When using lock objects there is another pitfall: the danger of deadlocks, if two accessors wait for each other and the original lock never gets released.
It will work perfectly well. Objects do not belong to any specific thread, and their methods can equally well be called from anywhere.
However, and this is important, calling member functions from two threads at the same time will cause problems if you update some data in one thread while reading it in another. You need either to arrange your code so this can't happen, or to ensure that your threads coordinate access (most likely with a mutex).
What happens is exactly what happens if you call it from the same thread: the same machine code gets executed. The only potential difference is that you can have several threads accessing the object at the same time; it's up to you to protect against this (at least if any of the threads is modifying the object; otherwise, no protection is needed).
In the case of an object on the stack, you have to consider lifetime issues, but this is the case anyway: save a pointer to an object on the stack in a global variable, then leave the scope where the object was defined, and the global variable becomes a dangling pointer; trying to access the object through it is undefined behavior (and calling a non-static member function on it counts as using it). Whether the access is from the same thread or a different thread doesn't change anything.
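That last point can be illustrated with a short sketch (Obj and fill are made-up names): the global pointer outlives the stack object it points to, and using it afterwards is undefined behavior regardless of which thread does it.

#include <thread>

struct Obj {
    void touch() {}
};

Obj* global = nullptr;        // visible to every thread

void fill() {
    Obj local;
    global = &local;          // points into fill()'s stack frame
}                             // local dies here: global now dangles

int main() {
    fill();
    // global->touch();       // undefined behavior from this thread...
    std::thread t([] { /* ...and just as undefined from this one */ });
    t.join();
}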