Does the C++ standard provide a guarantee about the non-overlapping nature of thread stacks (as in, stacks of threads started by std::thread)? In particular, is there a guarantee that each thread will have its own, exclusive, allocated range in the process's address space for its thread stack? Where is this described in the standard?
For example
#include <atomic>
#include <bit>
#include <cstdint>
#include <iostream>
#include <thread>

std::uintptr_t foo() {
    auto integer = int{0};
    return std::bit_cast<std::uintptr_t>(&integer);
    // ...
}

void bar(std::uint64_t id, std::atomic<std::uint64_t>& atomic) {
    while (atomic.load() != id) {}
    std::cout << foo() << std::endl;
    atomic.fetch_add(1);
}

int main() {
    auto atomic = std::atomic<std::uint64_t>{0};
    auto one = std::thread{[&]() { bar(0, atomic); }};
    auto two = std::thread{[&]() { bar(1, atomic); }};
    one.join();
    two.join();
}
Can this ever print the same value twice? It feels like the standard should be providing this guarantee somewhere, but I'm not sure.
The C++ standard does not even require that function calls are implemented using a stack (or that threads have stack in this sense).
The current C++ draft says this about overlapping objects:
Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and occupy disjoint bytes of storage.
And in the (non-normative) footnote:
Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference ([intro.execution]).
In your example, I do not think the threads synchronize as (probably) intended, so the lifetimes of the integer objects do not necessarily overlap, and both objects can therefore be placed at the same address.
If the code were fixed to synchronize properly and foo were manually inlined into bar, in such a way that the integer object still exists when its address is printed, then there would have to be two objects allocated at different addresses because the difference is observable.
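For illustration, here is a hedged sketch of that fixed variant (the rendezvous via an atomic counter is my own addition, not part of the question): both locals are alive at the moment their addresses are printed, so the implementation must give them distinct addresses.

#include <atomic>
#include <bit>
#include <cstdint>
#include <iostream>
#include <thread>

std::atomic<int> arrived{0};

void bar() {
    int integer = 0;                // still alive when printed below
    arrived.fetch_add(1);
    while (arrived.load() < 2) {}   // rendezvous: both locals now coexist
    // The two output lines may interleave; that does not affect the
    // address guarantee being illustrated.
    std::cout << std::bit_cast<std::uintptr_t>(&integer) << '\n';
}

int main() {
    std::thread one{bar};
    std::thread two{bar};
    one.join();
    two.join();
}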
However, none of this tells you whether stackful coroutines can be implemented in C++ without compiler help. Real-world compilers make assumptions about the execution environment that are not reflected in the C++ standard and are only implied by the ABI standards. Particularly relevant to stack-switching coroutines is the fact that the address of the thread descriptor and thread-local variables does not change while executing a function (because they can be expensive to compute and the compiler emits code to cache them in registers or on the stack).
This is what can happen:
Coroutine runs on thread A and accesses errno.
Coroutine is suspended from thread A.
Coroutine resumes on thread B.
Coroutine accesses errno again.
At this point, thread B will access the errno value of thread A, which thread A might well be using for something completely different.
This problem is avoided if a coroutine is only ever resumed on the same thread on which it was suspended, which is very restrictive and probably not what most coroutine library authors have in mind. The worst part is that resuming on the wrong thread is likely to appear to work most of the time, because some widely-used thread-local variables (such as errno) that are not quite thread-local do not immediately result in obviously buggy programs.
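A minimal sketch of the hazard, with the suspension point stubbed out (suspend_and_maybe_migrate is a hypothetical placeholder for a fiber library's yield, not a real API): if the compiler caches the address of errno across the call and the coroutine wakes on another thread, the second access touches the wrong thread's errno.

#include <cerrno>
#include <cstdio>

// Hypothetical stand-in for a stackful-coroutine suspension point; a real
// fiber library could resume this function's stack on a different thread.
void suspend_and_maybe_migrate() { /* no-op stub for illustration */ }

void coroutine_body() {
    errno = 0;                    // compiler may compute &errno once here...
    suspend_and_maybe_migrate();  // ...the coroutine may wake on thread B...
    if (errno != 0) {             // ...but a cached address would still refer
        std::puts("stale errno"); // to thread A's errno slot
    }
}

int main() {
    coroutine_body();
}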
For all the Standard cares, implementations could call new __StackFrameFoo when foo() needs a stack frame. Where those frames end up, who knows.
The chief rule is that different objects have different addresses, and that includes objects which "live on the stack". But the rule only applies to two objects which exist at the same time, and then only as far as the comparison is done with proper thread synchronization. And of course, comparing addresses does hinder the optimizer, which might need to assign an address to an object that could otherwise be optimized out.
Related
In the C++11 standard, the machine model changed from a single-threaded machine to a multi-threaded machine.
Does this mean that the typical static int x; void func() { x = 0; while (x == 0) {} } example of an optimized-out read will no longer happen in C++11?
EDIT: for those who don't know this example (I'm seriously astonished), please read this: https://en.wikipedia.org/wiki/Volatile_variable
EDIT2:
OK, I was really expecting that everyone who knew what volatile is had seen this example.
If you use the code in the example, the read of the variable in the loop will be optimized out, making the loop endless.
The solution, of course, is to use volatile, which forces the compiler to read the variable on each access.
My question is whether this is an obsolete problem in C++11: since the machine model is multi-threaded, the compiler should consider concurrent access to a variable to be possible in the system.
Whether it is optimized out depends entirely on compilers and what they choose to optimize away. The C++98/03 memory model does not recognize the possibility that x could change between the setting of it and the retrieval of the value.
The C++11 memory model does recognize that x could be changed. However, it doesn't care. Non-atomic access to variables (i.e., not using std::atomic or proper mutexes) yields undefined behavior. So it's perfectly fine for a C++11 compiler to assume that x never changes between the write and the reads, since undefined behavior can mean, "the function never sees x change, ever."
Now, let's look at what C++11 says about volatile int x;. If you put that in there, and you have some other thread mess with x, you still have undefined behavior. Volatile does not affect threading behavior. C++11's memory model does not define reads or writes from/to x to be atomic, nor does it require the memory barriers needed for non-atomic reads/writes to be properly ordered. volatile has nothing to do with it one way or the other.
Oh, your code might work. But C++11 doesn't guarantee it.
What volatile tells the compiler is that it can't optimize memory reads from that variable. However, CPU cores have different caches, and most memory writes do not immediately go out to main memory. They get stored in that core's local cache, and may be written... eventually.
CPUs have ways to force cache lines out into memory and to synchronize memory access among different cores. These memory barriers allow two threads to communicate effectively. Merely reading from memory in one core that was written in another core isn't enough; the core that wrote the memory needs to issue a barrier, and the core that's reading it needs to have had that barrier complete before reading it to actually get the data.
volatile guarantees none of this. Volatile works with "hardware, mapped memory and stuff" because the hardware that writes that memory makes sure the cache issue is taken care of. If CPU cores issued a memory barrier after every write, you could basically kiss any hope of performance goodbye. So C++11 has specific language saying when constructs are required to issue a barrier.
volatile is about memory access (when to read); threading is about memory integrity (what is actually stored there).
The C++11 memory model is specific about what operations will cause writes in one thread to become visible in another. It's about memory integrity, which is not something volatile handles. And memory integrity generally requires both threads to do something.
For example, if thread A locks a mutex, does a write, and then unlocks it, the C++11 memory model only requires that write to become visible to thread B if thread B later locks it. Until it actually acquires that particular lock, it's undefined what value is there. This stuff is laid out in great detail in section 1.10 of the standard.
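As a hedged sketch of that rule (the names are mine, not from the standard): the write to data below is guaranteed to be visible to the reader only because both threads lock the same mutex.

#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
int data = 0;
bool ready = false;

void writer() {
    std::lock_guard<std::mutex> lk(m);
    data = 42;       // happens-before any later locked read
    ready = true;
}

void reader() {
    for (;;) {  // busy-wait is inefficient but keeps the example short
        std::lock_guard<std::mutex> lk(m);
        if (ready) {                    // we acquired the lock after the
            std::cout << data << '\n';  // writer's unlock, so data is visible
            return;
        }
    }
}

int main() {
    std::thread a{writer}, b{reader};
    a.join();
    b.join();
}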
Let's look at the code you cite, with respect to the standard. Section 1.10, p8 speaks of the ability of certain library calls to cause a thread to "synchronize with" another thread. Most of the other paragraphs explain how synchronization (and other things) build an order of operations between threads. Of course, your code doesn't invoke any of this. There is no synchronization point, no dependency ordering, nothing.
Without such protection, without some form of synchronization or ordering, 1.10 p21 comes in:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Your program contains two conflicting actions (reading from x and writing to x). Neither is atomic, and neither is ordered by synchronization to happen before the other.
Thus, you have achieved undefined behavior.
So the only case where you get guaranteed multithreaded behavior by the C++11 memory model is if you use a proper mutex or std::atomic<int> x with the proper atomic load/store calls.
Oh, and you don't need to make x volatile too. Anytime you call a (non-inline) function, that function or something it calls could modify a global variable. So it cannot optimize away the read of x in the while loop. And every C++11 mechanism to synchronize requires calling a function. That just so happens to invoke a memory barrier.
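A minimal sketch of the fix described above (my own illustration, not from the question): with std::atomic the loop has defined behavior, and the load cannot be hoisted out.

#include <atomic>
#include <thread>

std::atomic<int> x{0};

void waiter() {
    while (x.load(std::memory_order_acquire) == 0) {
        // the atomic load must be repeated; it cannot be optimized away
    }
}

int main() {
    std::thread t{waiter};
    x.store(1, std::memory_order_release);  // makes the waiter's loop exit
    t.join();
}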
Intel developer zone mentions, "Volatile: Almost Useless for Multi-Threaded Programming"
The volatile keyword is used in this signal handler example from cppreference.com
#include <csignal>
#include <iostream>

namespace
{
    volatile std::sig_atomic_t gSignalStatus;
}

void signal_handler(int signal)
{
    gSignalStatus = signal;
}

int main()
{
    // Install a signal handler
    std::signal(SIGINT, signal_handler);

    std::cout << "SignalValue: " << gSignalStatus << '\n';
    std::cout << "Sending signal " << SIGINT << '\n';
    std::raise(SIGINT);
    std::cout << "SignalValue: " << gSignalStatus << '\n';
}
I have a C++ class C which contains some code, including a static variable which is meant to only be read, and perhaps a constexpr static function. For example:
#include <array>
#include <cstddef>

template<std::size_t T>
class C {
public:
    // some functions
    void func1();
    void func2();
    static constexpr std::size_t sfunc1() { return T; }

private:
    std::size_t var1;
    std::array<std::size_t, 10000> array1;
    static int svar1;
};
The idea is to use the thread-affinity mechanisms of OpenMP 4.5 to control the socket (NUMA architecture) on which various instances of this class are executed (and therefore also to place each instance in a memory location close to its socket, to avoid using the interconnect between the NUMA nodes). It is my understanding that since this code contains a static variable, it is effectively shared between all class instances, so I won't have control over the memory location where the static variable will be placed upon thread creation. Is this correct? But I presume the other, non-static variables will be located at memory locations close to the socket being used? Thanks
You have to assume that the thread stack, thread-bound malloc, and thread-local storage will allocate in the thread's "local" memory - so any auto or new variables should be well placed, at least for the thread they were created on, though I don't know which compilers support that kind of allocation model; but as you say, static non-const data can only exist in one location. I guess if the compiler recognises const segments or constructed const segments, then after construction they could be duplicated per zone and then mapped to the same logical address? Again, I don't know if compilers are doing that automagically.
Non-const statics are going to be troublesome. Presumably these statics are helping to perform some sort of thread synchronisation. If they contain flags that are read often and written rarely, then for best performance it may be better for the writer to write to a number of registered copies (one per zone), with each thread reading through a thread-local pointer to its zone's copy, rather than having half (or three quarters of) the readers always be slow. Of course, that ceases to be a simple atomic write, and a single mutex just puts you back where you started. I suspect this is roll-your-own-code territory.
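A hedged sketch of that replication idea (the zone count and per-thread pointer assignment are illustrative assumptions; a real version would discover the topology via libnuma or OS APIs):

#include <atomic>

constexpr int kZones = 2;            // illustrative NUMA zone count
std::atomic<int> flag_copy[kZones];  // one copy of the flag per zone

// Each thread would point this at its own zone's copy at startup.
thread_local std::atomic<int>* my_flag = &flag_copy[0];

void writer_set(int v) {
    for (auto& f : flag_copy)        // rare write: update every zone's copy
        f.store(v, std::memory_order_release);
}

int reader_get() {                   // frequent read: local-zone copy only
    return my_flag->load(std::memory_order_acquire);
}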
The simple case that shouldn't be forgotten: if objects are passed between threads, then potentially a thread could be accessing a non-local object.
Can external I/O be relied upon as a form of cross-thread synchronization?
To be specific, consider the pseudocode below, which assumes the existence of network/socket functions:
int a;          // Globally accessible data.
socket s1, s2;  // Platform-specific.

void thread1();
void thread2();

int main() {
    // Set up + connect two sockets to (the same) remote machine.
    s1 = ...;
    s2 = ...;

    std::thread t1{thread1}, t2{thread2};
    t1.join();
    t2.join();
}

void thread1() {
    a = 42;
    send(s1, "foo");
}

void thread2() {
    recv(s2);  // Blocking receive (error handling omitted).
    f(a);      // Use a; should be 42.
}
We assume that the remote machine only sends data to s2 upon receiving the "foo" from s1. If this assumption fails, then certainly undefined behavior will result. But if it holds (and no other external failure occurs like network data corruption, etc.), does this program produce defined behavior?
"Never", "unspecified (depends on implementation)", "depends on the guarantees provided by the implementation of send/recv" are example answers of the sort I'm expecting, preferably with justification from the C++ standard (or other relevant standards, such as POSIX for sockets/networking).
If "never", then changing a to be a std::atomic<int> initialized to a definite value (say 0) would avoid undefined behaviour, but then is the value guaranteed to be read as 42 in thread2 or could a stale value be read? Do POSIX sockets provide a further guarantee that ensures a stale value will not be read?
If "depends", do POSIX sockets provide the relevant guarantee to make it defined behavior? (How about if s1 and s2 were the same socket instead of two separate sockets?)
For reference, the standard I/O library has a clause which seems to provide an analogous guarantee when working with iostreams (27.2.3¶2 in N4604):
If one thread makes a library call a that writes a value to a stream and, as a result, another thread reads this value from the stream through a library call b such that this does not result in a data race, then a’s write synchronizes with b’s read.
So is it a matter of the underlying network library/functions being used providing a similar guarantee?
In practical terms, it seems the compiler can't reorder accesses to the global a with respect to the send and recv functions (as they could use a in principle). However, the thread running thread2 could still read a stale value of a unless some kind of memory barrier / synchronization guarantee is provided by the send/recv pair itself.
Short answer: No, there is no generic guarantee that a will be updated. My suggestion would be to send the value of a along with "foo" - e.g. "foo, 42", or something like it. That is guaranteed to work, and is probably not significant overhead. [There may of course be other reasons why that doesn't work well.]
Long rambling stuff that doesn't really answer the problem:
Global data is not guaranteed to be "visible" immediately in different cores of multicore processors without further operations. Yes, most modern processors are "coherent", but not every model of every brand is guaranteed to be. So if thread2 runs on a processor that has already cached a copy of a, it cannot be guaranteed that the value of a is 42 at the point when you call f.
The C++ standard guarantees that global variables are loaded after the function call, so the compiler is not allowed to do:
tmp = a;
recv(...);
f(tmp);
but as I said above, cache operations may be needed to guarantee that all processors see the same value at the same time. If send and recv take long enough or touch enough memory [there is no direct measure of how long or how much], you may see the correct value most or even all of the time, but there is no guarantee for ordinary types that they are ACTUALLY updated outside of the thread that last wrote the value.
std::atomic will help on some types of processors, but there is no guarantee that this is "visible" in a second thread or on a second processor core at any reasonable time after it was changed.
The only practical solution is to have some kind of "repeat until I see it change" code - this may require one value that is (for example) a counter, and one value that is the actual value - if you want to be able to say "a is now 42. I've set a again; it's 42 this time too". If a represents, for example, the number of data items available in a buffer, it is probably "it changed value" that matters, and you just check "is this the same as last time?". The std::atomic operations have guarantees with regard to ordering, which allow you to ensure that "if I update this field, the other field is guaranteed to appear at the same time or before it". So you can use that to guarantee, for example, that a pair of data items is published together: "there is a new value" (for example a counter indicating the "version number" of the current data) and "the new value is X".
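A hedged sketch of that counter-plus-value pattern (a one-shot, single-writer publication; the names are mine): the release/acquire pair guarantees that once the reader observes the new counter, it also sees the payload.

#include <atomic>

std::atomic<unsigned> version{0};
int payload = 0;   // plain data, published via the version counter

void publish(int v) {  // single writer, called once
    payload = v;
    version.fetch_add(1, std::memory_order_release);  // value before counter
}

bool try_read(unsigned& last_seen, int& out) {
    unsigned now = version.load(std::memory_order_acquire);
    if (now == last_seen)
        return false;   // nothing new yet; caller retries
    out = payload;      // the acquire saw the release, so payload is visible
    last_seen = now;
    return true;
}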
Of course, if you KNOW what processor architectures your code will run on, you can plausibly make more advanced guesses as to what the behaviour will be. For example all x86 and many ARM processors use the cache-interface to implement atomic updates on a variable, so by doing an atomic update on one core, you can know that "no other processor will have a stale value of this". But there are processors available that do not have this implementation detail, and where an update, even with an atomic instruction, will not be updated on other cores or in other threads until "some time in the future, uncertain when".
In general, no, external I/O can't be relied upon for cross-thread synchronization.
The question is out-of-scope of the C++ standard itself, as it involves the behavior of external/OS library functions. So whether the program is undefined behavior depends on any synchronization guarantees provided by the network I/O functions. In the absence of such guarantees, it is indeed undefined behavior. Switching to (initialized) atomics to avoid undefined behavior still wouldn't guarantee the "correct" up-to-date value will be read. To ensure that within the realms of the C++ standard would require some kind of locking (e.g. spinlock or mutex), even though it seems like waiting shouldn't be required due to the real-time ordering of the situation.
In general, the notion of "real-time" synchronization (involving visibility rather than merely ordering) required to avoid having to potentially wait after the recv returns before loading a isn't supported by the C++ standard. At a lower level, this notion does exist however, and would typically be implemented through inter-processor interrupts, e.g. FlushProcessWriteBuffers on Windows, or sys_membarrier on x86 Linux. This would be inserted after the store to a before send in thread1. No synchronization or barrier would be required in thread2. (It also seems like a simple SFENCE in thread1 might suffice on x86 due to its strong memory model, at least in the absence of non-temporal loads/stores.)
A compiler barrier shouldn't be needed in either thread for the reasons outlined in the question (call to an external function send, which for all the compiler knows could be acquiring an internal mutex to synchronize with the other call to recv).
Insidious problems of the sort described in section 4.3 of Hans Boehm's paper "Threads Cannot be Implemented as a Library" should not be a concern as the C++ compiler is thread-aware (and in particular the opaque functions send and recv could contain synchronization operations), so transformations introducing writes to a after the send in thread1 are not permissible under the memory model.
This leaves the open question of whether the POSIX network functions provide the necessary guarantees. I highly doubt it, as on some of the architectures with weak memory models, they are highly non-trivial and/or expensive to provide (requiring a process-wide mutex or IPI as mentioned earlier). On x86 specifically, it's almost certain that accessing a shared resource like a socket will entail an SFENCE or MFENCE (or even a LOCK-prefixed instruction) somewhere along the line, which should be sufficient, but this is unlikely to be enshrined in a standard anywhere. Edit: In fact, I think even the INT to switch to kernel mode entails a drain of the store buffer (the best reference I have to hand is this forum post).
Suppose that we have the following bit of code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
void guarantee(bool cond, const char *msg) {
    if (!cond) {
        fprintf(stderr, "%s", msg);
        exit(1);
    }
}

bool do_shutdown = false;  // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in Thread 1. Intended behavior is to block until
   trigger_shutdown() is called. */
void wait_for_shutdown_signal() {
    int res;
    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");
    while (!do_shutdown) {  // while loop guards against spurious wakeups
        res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
        guarantee(res == 0, "Could not wait for shutdown cond");
    }
    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

/* Called in Thread 2. */
void trigger_shutdown() {
    int res;
    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");
    do_shutdown = true;
    res = pthread_cond_signal(&shutdown_cond);
    guarantee(res == 0, "Could not signal shutdown cond");
    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?
The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.
In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?
Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?
Of course the current C and C++ standards say nothing on the subject.
As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.
When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.
Threads Cannot be Implemented as a Library covers some specific issues that are required for pthreads to actually be usable, but are not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm about what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".
Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_shutdown in a register in your code, then even if you marked it volatile, it might perversely choose not to refresh your CPU's local cache between the synchronizing operation and reading do_shutdown. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.
That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.
If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?
Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.
Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.
As far as I know there's nothing directly said about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits that reading of the variable to be optimized away only if the behavior can be proven to be "as if" it were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.
Also note that the pthread library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(): Does guarding a variable with a pthread mutex guarantee it's also not cached?
Now, if do_shutdown were static (no external linkage) and you had several threads using that static variable, all defined in the same module (i.e., the address of the static variable was never taken and passed to another module), that might be a different story. For example, say that you have a single function that uses such a variable and start several thread instances running that function. In that case, a standards-conforming compiler implementation might cache the value across function calls, since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).
So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that, because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering - you should rely on APIs provided by pthreads or the OS to ensure that. (As a side note, recent versions of Microsoft's compilers do document that volatile enforces full memory barriers, but I've read opinions indicating that this isn't required by the standard.)
The hand-waving answers are all wrong. Sorry to be harsh.
There is no way that "the compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown."
If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.
It's absurd; a compiler cannot possibly understand what pthread_ functions do unless it has built-in knowledge of POSIX threads.
From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracking down a bug in a bit of code that was caused by a set of pointer assignments being cached and unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still did not have access to the proper pointer values. Moving the pointer assignments and associated mutex locking to a separate function did fix the problem.
Most of the time, the definition of reentrance is quoted from Wikipedia:
A computer program or routine is described as reentrant if it can be safely called again before its previous invocation has been completed (i.e., it can be safely executed concurrently). To be reentrant, a computer program or routine:
Must hold no static (or global) non-constant data.
Must not return the address of static (or global) non-constant data.
Must work only on the data provided to it by the caller.
Must not rely on locks to singleton resources.
Must not modify its own code (unless executing in its own unique thread storage).
Must not call non-reentrant computer programs or routines.
How is safely defined?
If a program can be safely executed concurrently, does it always mean that it is reentrant?
What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
Also,
Are all recursive functions reentrant?
Are all thread-safe functions reentrant?
Are all recursive and thread-safe functions reentrant?
While writing this question, one thing comes to mind:
Are terms like reentrance and thread safety absolute at all, i.e., do they have fixed, concrete definitions? For if they are not, this question is not very meaningful.
1. How is safely defined?
Semantically. In this case, it is not a hard-defined term. It just means "you can do that without risk".
2. If a program can be safely executed concurrently, does it always mean that it is reentrant?
No.
For example, let's have a C++ function that takes both a lock, and a callback as a parameter:
#include <mutex>

typedef void (*callback)();

std::mutex m;

void foo(callback f)
{
    m.lock();
    // use the resource protected by the mutex

    if (f) {
        f();
    }

    // use the resource protected by the mutex
    m.unlock();
}
Another function could well need to lock the same mutex:
void bar()
{
    foo(nullptr);
}
At first sight, everything seems ok… But wait:
int main()
{
    foo(bar);
    return 0;
}
If the mutex lock is not recursive, then here's what will happen in the main thread:
main will call foo.
foo will acquire the lock.
foo will call bar, which will call foo.
the 2nd foo will try to acquire the lock, fail and wait for it to be released.
Deadlock.
Oops…
Ok, I cheated, using the callback thing. But it's easy to imagine more complex pieces of code having a similar effect.
3. What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
You can smell a problem if your function has/gives access to a modifiable persistent resource, or has/gives access to a function that smells.
(Ok, 99% of our code should smell, then… See last section to handle that…)
So, studying your code, one of those points should alert you:
The function has state (i.e. it accesses a global variable, or even a class member variable).
The function can be called by multiple threads, or could appear twice in the stack while the process is executing (i.e. the function could call itself, directly or indirectly). Functions taking callbacks as parameters smell a lot.
Note that non-reentrancy is viral: a function that could call a possibly non-reentrant function cannot be considered reentrant.
Note, too, that C++ methods smell because they have access to this, so you should study the code to be sure they have no funny interaction.
4.1. Are all recursive functions reentrant?
No.
In multithreaded cases, a recursive function accessing a shared resource could be called by multiple threads at the same moment, resulting in bad/corrupted data.
In singlethreaded cases, a recursive function could use a non-reentrant function (like the infamous strtok), or use global data without handling the fact the data is already in use. So your function is recursive because it calls itself directly or indirectly, but it can still be recursive-unsafe.
4.2. Are all thread-safe functions reentrant?
In the example above, I showed how an apparently thread-safe function was not reentrant. OK, I cheated because of the callback parameter. But then, there are multiple ways to deadlock a thread by having it acquire a non-recursive lock twice.
4.3. Are all recursive and thread-safe functions reentrant?
I would say "yes" if by "recursive" you mean "recursive-safe".
If you can guarantee that a function can be called simultaneously by multiple threads, and can call itself, directly or indirectly, without problems, then it is reentrant.
The problem is evaluating this guarantee… ^_^
5. Are the terms like reentrance and thread safety absolute at all, i.e. do they have fixed concrete definitions?
I believe they do, but then, evaluating whether a function is thread-safe or reentrant can be difficult. This is why I used the term smell above: you can find that a function is not reentrant, but it can be difficult to be sure a complex piece of code is reentrant.
6. An example
Let's say you have an object, with one method that needs to use a resource:
struct MyStruct
{
    P * p;

    void foo()
    {
        if (this->p == nullptr)
        {
            this->p = new P();
        }

        // lots of code, some using this->p

        if (this->p != nullptr)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
The first problem is that if somehow this function is called recursively (i.e. this function calls itself, directly or indirectly), the code will probably crash, because this->p will be deleted at the end of the innermost call, yet will probably still be used before the end of the first call.
Thus, this code is not recursive-safe.
We could use a reference counter to correct this:
struct MyStruct
{
    size_t c;
    P * p;

    void foo()
    {
        if (c == 0)
        {
            this->p = new P();
        }

        ++c;

        // lots of code, some using this->p

        --c;

        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
This way, the code becomes recursive-safe… But it is still not reentrant because of multithreading issues: We must be sure the modifications of c and of p will be done atomically, using a recursive mutex (not all mutexes are recursive):
#include <mutex>

struct MyStruct
{
    std::recursive_mutex m;
    size_t c;
    P * p;

    void foo()
    {
        m.lock();
        if (c == 0)
        {
            this->p = new P();
        }
        ++c;
        m.unlock();

        // lots of code, some using this->p

        m.lock();
        --c;
        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
        m.unlock();
    }
};
And of course, this all assumes the lots of code is itself reentrant, including the use of p.
And the code above is not even remotely exception-safe, but this is another story… ^_^
7. Hey 99% of our code is not reentrant!
That is quite true for spaghetti code. But if you partition your code correctly, you will avoid reentrancy problems.
7.1. Make sure all functions have NO state
They must only use the parameters, their own local variables, other functions without state, and return copies of the data if they return at all.
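A hedged toy example of such a stateless function (my own, not the author's):

#include <string>

// Depends only on its argument and returns a copy: no globals, no statics,
// no side effects.
std::string greet(const std::string& name)
{
    return "Hello, " + name;
}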
7.2. Make sure your object is "recursive-safe"
An object method has access to this, so it shares a state with all the methods of the same instance of the object.
So, make sure the object can be used at one point in the stack (i.e. calling method A), and then, at another point (i.e. calling method B), without corrupting the whole object. Design your object to make sure that upon exiting a method, the object is stable and correct (no dangling pointers, no contradicting member variables, etc.).
7.3. Make sure all your objects are correctly encapsulated
No one else should have access to their internal data:
// bad
int & MyObject::getCounter()
{
    return this->counter;
}

// good
int MyObject::getCounter()
{
    return this->counter;
}

// good, too
void MyObject::getCounter(int & p_counter)
{
    p_counter = this->counter;
}
Even returning a const reference could be dangerous if the user retrieves the address of the data, as some other portion of the code could modify it without the code holding the const reference being told.
7.4. Make sure the user knows your object is not thread-safe
Thus, the user is responsible for using mutexes when an object is shared between threads.
The objects from the STL are designed not to be thread-safe (for performance reasons), and thus, if a user wants to share a std::string between two threads, the user must protect access to it with concurrency primitives.
7.5. Make sure your thread-safe code is recursive-safe
This means using recursive mutexes if you believe the same resource can be used twice by the same thread.
"Safely" is defined exactly as the common sense dictates - it means "doing its thing correctly without interfering with other things". The six points you cite quite clearly express the requirements to achieve that.
The answers to your 3 questions is 3× "no".
Are all recursive functions reentrant?
NO!
Two simultaneous invocations of a recursive function can easily screw up each other, if they access the same global/static data, for example.
Are all thread-safe functions reentrant?
NO!
A function is thread-safe if it doesn't malfunction if called concurrently. But this can be achieved e.g. by using a mutex to block the execution of the second invocation until the first finishes, so only one invocation works at a time. Reentrancy means executing concurrently without interfering with other invocations.
Are all recursive and thread-safe functions reentrant?
NO!
See above.
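To make the mutex point concrete, a hedged sketch (my illustration, not the answerer's): increment is thread-safe because concurrent callers serialize on m, but it is not reentrant, since a nested call from the same thread (say, from a callback or signal handler) would deadlock on the non-recursive mutex.

#include <mutex>

std::mutex m;
int counter = 0;

void increment()
{
    std::lock_guard<std::mutex> lock(m);  // serializes concurrent callers;
    ++counter;                            // a nested call from this same
}                                         // thread would block on m forever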
The common thread:
Is the behavior well defined if the routine is called while it is interrupted?
If you have a function like this:
int add( int a , int b ) {
    return a + b;
}
Then it is not dependent upon any external state. The behavior is well defined.
If you have a function like this:
int gValue = 0;  // global state shared by every caller

int add_to_global( int a ) {
    return gValue += a;
}
The result is not well defined on multiple threads. Information could be lost if the timing was just wrong.
The simplest form of a reentrant function is something that operates exclusively on the arguments passed and constant values. Anything else takes special handling or, often, is not reentrant. And of course the arguments must not reference mutable globals.
Now I have to elaborate on my previous comment. paercebal's answer is incorrect. In the example code, didn't anyone notice that the mutex, which was supposed to be a parameter, wasn't actually passed in?
I dispute the conclusion. I assert: for a function to be safe in the presence of concurrency, it must be re-entrant. Therefore concurrent-safe (usually written thread-safe) implies re-entrant.
Neither thread safety nor re-entrancy has anything to say about arguments: we're talking about concurrent execution of the function, which can still be unsafe if inappropriate parameters are used.
For example, memcpy() is thread-safe and re-entrant (usually). Obviously it will not work as expected if called with pointers to the same targets from two different threads. That's the point of the SGI definition, placing the onus on the client to ensure accesses to the same data structure are synchronised by the client.
It is important to understand that in general it is nonsense to have thread-safe operation include the parameters. If you've done any database programming you will understand. The concept of what is "atomic" and might be protected by a mutex or some other technique is necessarily a user concept: processing a transaction on a database can require multiple un-interrupted modifications. Who can say which ones need to be kept in sync but the client programmer?
The point is that "corruption" doesn't have to be messing up the memory on your computer with unserialised writes: corruption can still occur even if all individual operations are serialised. It follows that when you're asking if a function is thread-safe, or re-entrant, the question means for all appropriately separated arguments: using coupled arguments does not constitute a counter-example.
There are many programming systems out there: OCaml is one, and I think Python as well, which have lots of non-reentrant code in them but use a global lock to interleave thread access. These systems are not re-entrant and they're not thread-safe or concurrent-safe; they operate safely simply because they prevent concurrency globally.
A good example is malloc. It is not re-entrant and not thread-safe. This is because it has to access a global resource (the heap). Using locks doesn't make it safe: it's definitely not re-entrant. If the interface to malloc had been designed properly, it would be possible to make it re-entrant and thread-safe:
malloc(heap*, size_t);
Now it can be safe because it transfers the responsibility for serialising shared access to a single heap to the client. In particular, no work is required if there are separate heap objects. If a common heap is used, the client has to serialise access. Using a lock inside the function is not enough: just consider a malloc that locks a heap*, and then a signal comes along and calls malloc on the same pointer: deadlock. The signal can't proceed, and the client can't either, because it is interrupted.
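A toy sketch of that interface (a trivial bump arena of my own, not a real allocator): with separate heaps there is no shared state, and therefore nothing to lock.

#include <cstddef>

struct heap {                        // per-client arena owned by the caller
    unsigned char buf[1024];
    std::size_t used = 0;
};

void* my_malloc(heap* h, std::size_t n) {
    if (h->used + n > sizeof h->buf)
        return nullptr;              // arena exhausted
    void* p = h->buf + h->used;
    h->used += n;                    // the caller serialises access to *h
    return p;
}

int main() {
    heap a, b;                       // separate heaps: no locking needed
    void* p = my_malloc(&a, 16);
    void* q = my_malloc(&b, 16);
    return (p && q) ? 0 : 1;
}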
Generally speaking, locks do not make things thread-safe; they actually destroy safety by inappropriately trying to manage a resource that is owned by the client. Locking has to be done by the object manufacturer; that's the only code that knows how many objects are created and how they will be used.
The "common thread" (pun intended!?) amongst the points listed is that the function must not do anything that would affect the behaviour of any recursive or concurrent calls to the same function.
So, for example, static data is an issue because it is owned by all threads; if one call modifies a static variable, then all threads use the modified data, thus affecting their behaviour. Self-modifying code (although rarely encountered, and in some cases prevented) would be a problem, because although there are multiple threads, there is only one copy of the code; the code is essentially static data too.
Essentially to be re-entrant, each thread must be able to use the function as if it were the only user, and that is not the case if one thread can affect the behaviour of another in a non-deterministic manner. Primarily this involves each thread having either separate or constant data that the function works on.
All that said, point (1) is not necessarily true; for example, you might legitimately and by design use a static variable to retain a recursion count to guard against excessive recursion or to profile an algorithm.
A thread-safe function need not be reentrant; it may achieve thread safety by specifically preventing reentrancy with a lock, and point (6) says that such a function would not be reentrant. Regarding point (6), a function that calls a thread-safe function that locks is not safe for use in recursion (it will deadlock), and is therefore not said to be reentrant, though it may nonetheless be safe for concurrency. It would still be re-entrant in the sense that multiple threads can have their program counters in such a function simultaneously (just not within the locked region). Maybe this helps to distinguish thread safety from reentrancy (or maybe adds to your confusion!).
The answers to your "Also" questions are "No", "No" and "No". Just because a function is recursive and/or thread-safe, that doesn't make it re-entrant.
Each of these types of functions can fail on all the points you quote. (Though I'm not 100% certain of point 5.)
A non-reentrant function is one that maintains a static context. On the first call, a new context is created for you; on subsequent calls you don't pass the context as a parameter, since it is kept implicitly for the convenience of token analysis, e.g. strtok in C. If the context has not been cleared, there might be errors.
/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n", str);
    pch = strtok (str, " ,.-");
    while (pch != NULL)
    {
        printf ("%s\n", pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}
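For contrast, POSIX offers strtok_r, which makes the hidden context an explicit argument; a minimal adaptation of the same example (assuming a POSIX platform):

/* strtok_r example: the save pointer is the caller-owned context */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "- This, a sample string.";
    char * save = NULL;
    char * pch = strtok_r (str, " ,.-", &save);
    while (pch != NULL)
    {
        printf ("%s\n", pch);
        pch = strtok_r (NULL, " ,.-", &save);
    }
    return 0;
}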
By contrast, a reentrant function can be called at any time with the same result and no side effects, because there is no hidden context.
From the thread-safety point of view, it just means that only one modification of a shared variable happens at a time within the process; so you add a lock guard to ensure only one change to a shared field at a time.
So thread safety and reentrancy are two different things, viewed from different angles: reentrancy says you should clear (or make explicit) the context before the next analysis; thread safety says you should keep accesses to shared fields ordered.
The terms "Thread-safe" and "re-entrant" mean only and exactly what their definitions say. "Safe" in this context means only what the definition you quote below it says.
"Safe" here certainly doesn't mean safe in the broader sense that calling a given function in a given context won't totally hose your application. Altogether, a function might reliably produce a desired effect in your multi-threaded application but not qualify as either re-entrant or thread-safe according to the definitions. Oppositely, you can call re-entrant functions in ways that will produce a variety of undesired, unexpected and/or unpredictable effects in your multi-threaded application.
A recursive function can be anything, and re-entrant has a stronger definition than thread-safe, so the answers to your numbered questions are all no.
Reading the definition of re-entrant, one might summarize it as meaning a function which will not modify anything beyond what you call it to modify. But you shouldn't rely on just the summary.
Multi-threaded programming is extremely difficult in the general case. Knowing which parts of one's code are re-entrant is only part of the challenge. Thread safety is not additive. Rather than trying to piece together re-entrant functions, it's better to use an overall thread-safe design pattern and use that pattern to guide your use of every thread and shared resource in your program.