Do non-reentrancy issues arise only when we have multiple threads? - c++

I was looking into the old concept of writing re-entrant code. They say not to use global or static variables because they are prone to non-deterministic behaviour. However, I am not sure whether this applies when there is only a single thread. I understand that thread safety and reentrancy are two different concepts. When using C++, we avoid global and static variables. We even try to avoid singletons. But the question is what will happen to this piece of code, borrowed from the wiki, if it is running on a single thread. Is it prone to non-deterministic behaviour?
int g_var = 1;
int f()
{
    g_var = g_var + 2;
    return g_var;
}
int g()
{
    return f() + 2;
}
Another part is that people say re-entrancy and thread safety are unrelated. However, in this example we can make the results predictable by putting a mutex in function g. So, when two threads execute g() concurrently, the results are deterministic. So we actually fixed the non-reentrant code by adding a thread-safety mechanism. Please clear up my concepts here. Am I missing something, or is my understanding not right?

Reentrancy problems have more to do with e.g. shared libraries than threads. Remember that shared libraries are shared, there is only one copy of a shared library loaded, no matter how many processes use the library. This of course means that global and static data is shared as well, which can lead to problems. And normal in-process mechanisms like thread mutexes will not help here, since those are only per process, you have to use inter-process facilities to protect this data. And doing it is often too much work, it's often easier to avoid the issue completely by not having global or static data in shared libraries.

Note that this program will be compiled to something like this:
int g_var = 1;
int f()
{
    int tmp = g_var;
    // hardware interrupt might invoke isr() here!
    tmp += 2;
    g_var = tmp;
    return tmp;
}
int g()
{
    int tmp = f();
    return tmp + 2;
}
So if f() is interrupted in the middle and reentered, the two invocations of f() will increase g_var by only 2 and not 4.
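The lost update can be reproduced on a single thread. Below is a minimal sketch, assuming the "interrupt" is modelled by a callback that runs between the load and the store; the interrupt parameter and run_reentrant_demo are made up for this illustration and are not part of the code above.

```cpp
#include <cassert>
#include <functional>

int g_var = 1;

// f() written out as load / modify / store. The optional hook stands in for
// an interrupt or callback that fires between the load and the store.
int f(const std::function<void()>& interrupt = nullptr)
{
    int tmp = g_var;          // load
    if (interrupt)
        interrupt();          // f() may be re-entered here
    tmp += 2;                 // modify the (now possibly stale) copy
    g_var = tmp;              // store, overwriting whatever the hook wrote
    return tmp;
}

// Runs f() once and re-enters it from the hook, all on one thread.
int run_reentrant_demo()
{
    g_var = 1;
    f([] { f(); });           // two invocations of f() in total
    return g_var;             // 3 rather than 5: one update was lost
}
```

No threads are involved, which is the point: the nested call completes first, and the outer call then stores a value computed from the old g_var.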

You must take care in the following situations:
Multi-threaded programs.
Multi-process programs where several processes share the same variables.
Programs using hardware interrupts and interrupt service routines.
Programs using certain forms of callback functions.
The code you posted uses none of the above so it will work perfectly fine.
Another part is people say that re-entrancy and thread safe are unrelated.
Re-entrant usually means "this code is written in such a manner, that it needs no protection mechanisms". For example, a function that only uses local variables and no library calls is re-entrant.
Thread-safe means that the code is properly using mechanisms such as mutexes or critical sections to protect shared resources.
The opposite of re-entrant is non-re-entrant. And when you fix non-re-entrant code, you make it thread-safe. So the terms are related but not synonymous: they mean different things.

Reentrancy problems are not unique to multi-threaded code.
Consider the following (single thread) code:
time_t t = time(NULL);
struct tm *a;
struct tm *b;
a = localtime(&t);
t += 3600;
b = localtime(&t);
Here you have an issue, since localtime() is not reentrant.
It (usually) returns a pointer to static storage.
Thus while the struct tm a and b pointers should have different content, they're now the same. a == b, since the last call would change the same struct that a points to.
The above code can be fixed for a single thread program like so:
time_t t = time(NULL);
struct tm *tmp;
struct tm a;
struct tm b;
tmp = localtime(&t);
a = *tmp;
t += 3600;
tmp = localtime(&t);
b = *tmp;
For multi-threaded programs, non-reentrant functions introduce another set of problems, as multiple threads could call localtime() at the same time, leading to unpredictable race conditions. Some platforms that support threads implement localtime() and other non-reentrant functions using thread-local storage, which reduces the problem to that of a single-threaded program.
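Reentrancy problems related to multi-threading can normally be solved with a mutex. To use localtime() safely, you could create your own function that protects both the call to localtime() and the copying out of the result:
static std::mutex lmutex;
struct tm *my_localtime(const time_t *t, struct tm *result)
{
    std::lock_guard<std::mutex> guard(lmutex); // held until the function returns
    struct tm *tmp = localtime(t);
    *result = *tmp; // copy out of the static buffer while still locked
    return result;
}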
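Where it is available, the reentrant variant avoids the shared static buffer altogether. A minimal sketch, assuming a POSIX system that provides localtime_r (it is not part of standard C++); local_times_one_hour_apart is a made-up helper name.

```cpp
#include <cassert>
#include <time.h>

// Fills a and b with the local time at t and at t + 1 hour.
// localtime_r writes into caller-supplied storage, so the second call
// cannot clobber the result of the first.
void local_times_one_hour_apart(time_t t, struct tm& a, struct tm& b)
{
    struct tm* ok = localtime_r(&t, &a);
    assert(ok != nullptr);
    t += 3600;
    ok = localtime_r(&t, &b);
    assert(ok != nullptr);
    (void)ok; // silence unused-variable warnings when NDEBUG is defined
}
```

Because each caller owns its struct tm, this form needs no mutex and no intermediate copy.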


Is using volatile on shared memory safe?

Let's suppose the following:
I have two processes on Linux / Mac OS.
I have mmap on shared memory (or in a file).
Then in both processes I have following:
struct Data{
    volatile int reload = 0; // using int because is more standard
    // more things in the future...
};
void *mmap_memory = mmap(...);
Data *data = static_cast<Data *>(mmap_memory); // suppose size is sufficient and all OK
Then in one of the processes I do:
data->reload = 1;
And in the other I do:
if (data->reload == 1)
Will this be thread / inter process safe?
Idea is from here:
This can not be safe with std::atomic<>, since it does not "promise" anything about shared memory. Also constructing/destructing from two different processes is not clear at all.
Will this be thread / inter process safe?
From your own link:
One problematic and common assumption is that volatile is equivalent to “atomic”. This is not the case. All the volatile keyword denotes is that the variable may be modified externally, and thus reads/writes cannot be optimized.
Your code needs atomic access to the value. if (data->reload == 1) won't work if it reads some partial/intermediate value from data->reload.
And never mind what happens if multiple threads do read 1 from data->reload - your posted code doesn't handle that at all.
Also see Why is volatile not considered useful in multithreaded C or C++ programming?
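For comparison, here is a minimal sketch of what atomic access to such a flag could look like. It assumes Linux/POSIX, a lock-free std::atomic<int>, and an anonymous shared mapping inherited across fork() rather than the file-backed mapping from the question. The C++ standard says nothing about processes; it only recommends that lock-free atomics be address-free, so treat this as platform behaviour rather than a language guarantee. run_shared_flag_demo is a made-up name.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct Data {
    std::atomic<int> reload;
};

// Returns the flag value the parent observes after the child has set it,
// or -1 if the mapping or the fork failed.
int run_shared_flag_demo()
{
    void* mem = mmap(nullptr, sizeof(Data), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    Data* data = new (mem) Data;           // constructed once, before fork()
    assert(data->reload.is_lock_free());   // a lock-based atomic would not work across processes
    data->reload.store(0, std::memory_order_relaxed);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(Data));
        return -1;
    }
    if (pid == 0) {                        // child: the "writer" process
        data->reload.store(1, std::memory_order_release);
        _exit(0);
    }

    waitpid(pid, nullptr, 0);              // parent: the "reader" process
    int seen = data->reload.load(std::memory_order_acquire);
    munmap(mem, sizeof(Data));
    return seen;
}
```

Constructing the object in exactly one place, before the second process exists, sidesteps the question of who constructs and destroys it.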

How to avoid destroying and recreating threads inside loop?

I have a loop that creates and uses two threads. The threads always do the same thing, and I'm wondering how they can be reused instead of being created and destroyed on each iteration. Some other operations are done inside the loop that affect the data the threads process. Here is a simplified example:
const int args1 = foo1();
const int args2 = foo2();
vector<string> myVec = populateVector();
int a = 1;
for (int i = 0; i < 100; i++)
{
    auto func = [&](const int arg){
        //do stuff involving variable a
    };
    thread t1(func, args1);
    thread t2(func, args2);
    t1.join(); // both threads must finish before a changes
    t2.join();
    a = 2 * a;
}
Is there a way to have t1 and t2 restart? Is there a design pattern I should look into? I ask because adding threads made the program slightly slower when I thought it would be faster.
You can use std::async as suggested in the comments.
What you're trying to do is also a very common use case for a thread pool. A simple header-only implementation that I commonly use is here
To use this library, create the pool outside of the loop with a number of threads set during construction. Then enqueue a function in which a thread will go off and execute. With this library, you'll be getting a std::future (much like the std::async steps) and this is what you'd wait on in your loop.
Generally, you'd want to make access to any shared data thread-safe with mutexes (or other means, there are a lot of ways to do this), but under very specific circumstances you won't need to.
In this case,
so long as the vector isn't being increased in size (doesn't need to reallocate)
and each thread is only reading items, or each item is modified by only one thread at a time,
then you wouldn't need to worry about synchronization.
Though it's just good habit to do the sync anyway... When other people eventually modify the code, they're not going to know your rules and will cause issues.
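To make the pool idea concrete, here is a minimal sketch of two worker threads that are created once and reused on every iteration. TinyPool and run_loop are made-up names for this illustration and are not the API of the library mentioned above; jobs are restricted to returning int to keep the code short.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TinyPool {
public:
    explicit TinyPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~TinyPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queues a job and returns a future for its result.
    std::future<int> enqueue(std::function<int()> job) {
        auto task = std::make_shared<std::packaged_task<int()>>(std::move(job));
        std::future<int> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!stop_); // enqueueing after shutdown would be a bug
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (stop_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // run outside the lock so workers do not serialize
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> jobs_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Mirrors the loop in the question: two jobs per iteration, threads reused.
int run_loop() {
    TinyPool pool(2);
    int a = 1;
    int total = 0;
    for (int i = 0; i < 10; ++i) {
        auto f1 = pool.enqueue([a] { return a + 1; });
        auto f2 = pool.enqueue([a] { return a + 2; });
        total += f1.get() + f2.get(); // wait for both before changing a
        a = 2 * a;
    }
    return total;
}
```

Capturing a by value in each job means the workers never read the variable while the loop is updating it, so no extra locking is needed for this particular pattern.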

Setting all TLS (thread local storage) variables to a new, single value in C++

I have a class Foo with the following thread-specific static member:
__declspec(thread) static bool s_IsAllAboutThatBass;
In the implementation file it is initialized like so:
__declspec(thread) bool Foo::s_IsAllAboutThatBass = true;
So far so good. Now, any thread can flip this bool willy nilly as they deem fit. Then the problem: at some point I want each thread to reset that bool to its initial true value.
How can I slam all instances of the TLS to true from a central thread?
I've thought of ways I could do this with synchronization primitives I know about, like critical sections, read/write sections, or events, but nothing fits the bill. In my real use cases I am unable to block any of the other threads for any significant length of time.
Any help is appreciated. Thank you!
Edit: Plan A
One idea is to use a generation token, or cookie that is read by all threads and written to by the central thread. Each thread can then have a TLS for the last generation viewed by that thread when grabbing s_isAllAboutThatBass via some accessor. When the thread local cookie differs from the shared cookie, we increment the thread local one and update s_isAllAboutThatBass to true.
Here is a lightweight implementation of "Plan A" using the C++11 standard atomic variable and the thread_local specifier. (If your compiler doesn't support them, replace them with vendor-specific facilities.)
#include <atomic>
struct Foo {
    static std::atomic<unsigned> s_TokenGeneration;
    static thread_local unsigned s_LocalToken;
    static thread_local bool s_LocalState;
    // for central thread
    void signalResetIsAllAboutThatBass() {
        ++s_TokenGeneration;
    }
    // accessor for other threads
    void setIsAllAboutThatBass(bool b) {
        unsigned currToken = s_TokenGeneration;
        s_LocalToken = currToken;
        s_LocalState = b;
    }
    bool getIsAllAboutThatBass() const {
        unsigned currToken = s_TokenGeneration;
        if (s_LocalToken < currToken) {
            // reset thread-local token & state
            s_LocalToken = currToken;
            s_LocalState = true;
        }
        return s_LocalState;
    }
};
std::atomic<unsigned> Foo::s_TokenGeneration;
thread_local unsigned Foo::s_LocalToken = 0u;
thread_local bool Foo::s_LocalState = true;
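A small usage sketch of the same generation-token idea, written with free functions so that it stands alone. The names (g_generation, t_seen, t_flag, demo_reset) are made up for this illustration, and the promise/future handshake exists only to make the ordering deterministic for the demo.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

std::atomic<unsigned> g_generation{0}; // written by the central thread
thread_local unsigned t_seen = 0;      // last generation this thread observed
thread_local bool t_flag = true;       // the per-thread state

void signal_reset() { g_generation.fetch_add(1, std::memory_order_release); }

void set_flag(bool b) {
    t_seen = g_generation.load(std::memory_order_acquire);
    t_flag = b;
}

bool get_flag() {
    unsigned current = g_generation.load(std::memory_order_acquire);
    if (t_seen != current) {           // a reset was signalled since the last access
        t_seen = current;
        t_flag = true;
    }
    return t_flag;
}

// A worker flips its flag to false; the central thread signals a reset;
// the worker then sees true again on its next read.
bool demo_reset() {
    std::promise<void> flipped, reset_done;
    std::future<void> flipped_f = flipped.get_future();
    std::future<void> reset_f = reset_done.get_future();
    bool before = true, after = false;
    std::thread worker([&] {
        set_flag(false);
        before = get_flag();           // still false: no reset yet
        flipped.set_value();
        reset_f.wait();                // wait for the central thread
        after = get_flag();            // lazily reset to true
    });
    flipped_f.wait();
    signal_reset();
    reset_done.set_value();
    worker.join();
    return !before && after;
}
```

The central thread never touches another thread's storage and never blocks the workers; each thread applies the reset itself the next time it reads the flag. Comparing with != rather than < also keeps the check valid if the counter ever wraps around.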
The simplest answer is: you can't. The reason that it's called thread local storage is because only its thread can access it. Which, by definition, means that some other "central thread" can't get to it. That's what it's all about, by definition.
Now, depending on how your hardware and compiler platform implement TLS, there might be a trick around it, if your implementation of TLS works by mapping TLS variables to different virtual memory addresses. Typically, what happens is that one CPU register is thread-specific, it's set to point to different memory addresses, and all TLS variables are accessed as relative addresses.
If that is the case, you could, perhaps, derive some thread-safe mechanism by which each thread takes a pointer to its TLS variable, and puts it into a non-TLS container, that your "central thread" can get to.
And, of course, you must keep all of that in sync with your threads, and clean things up after each thread terminates.
You'll have to figure out whether this is the case on your platform with a trivial test: declare a TLS variable, then compare its pointer address in two different threads. If it's different, you might be able to work around it in this fashion. Technically, this kind of pointer comparison is non-portable and implementation-defined, but by this time you're already far into implementation-specific behavior.
But if the addresses are the same, it means that your implementation uses virtual memory addressing to implement TLS. Only the executing thread has access to its TLS variable, period, and there is no practical means by which any "central thread" could look at other threads' TLS variables. It's enforced by your operating system kernel. The "central thread" must cooperate with each thread, and make arrangements to access the thread's TLS variables using typical means of interthread communication.
The cookie approach would work fine, and you don't need to use a TLS slot to implement it, just a local variable inside your thread procedure. To handle the case where the cookie changes value between the time that the thread is created and the time that it starts running (there is a small delay), you would have to pass the current cookie value as an input parameter for the thread creation, then your thread procedure can initialize its local variable to that value before it starts checking the live cookie for changes.
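intptr_t g_cookie = 1;
pthread_rwlock_t g_lock;
void* thread_proc(void *arg)
{
    intptr_t cookie = (intptr_t)arg;
    while (keepRunningUntilSomeCondition) {
        pthread_rwlock_rdlock(&g_lock);
        if (cookie != g_cookie) {
            cookie = g_cookie;
            s_IsAllAboutThatBass = true;
        }
        pthread_rwlock_unlock(&g_lock);
    }
    return NULL;
}
void createThread()
{
    pthread_t thread;
    pthread_create(&thread, NULL, &thread_proc, (void*)g_cookie);
}
void signalThreads()
{
    pthread_rwlock_wrlock(&g_lock);
    ++g_cookie; // each thread resets its flag when it notices the change
    pthread_rwlock_unlock(&g_lock);
}
int main()
{
    pthread_rwlock_init(&g_lock, NULL);
    // use createThread() and signalThreads() as needed...
    return 0;
}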

Proper compiler intrinsics for double-checked locking?

When implementing double-checked locking for initialization, what is the proper way to do the memory and/or compiler barriers?
Something like std::call_once isn't what I want; it's way too slow. It's typically just implemented on top of pthread_mutex_lock and EnterCriticalSection respective to OS.
In my programs, I often run into initialization cases where the initialization is safe to repeat, as long as exactly one thread gets to set the final pointer. If another thread beats it to setting the final pointer to the singleton object, it deletes what it created and makes use of the other thread's. I also often use this in cases where it doesn't matter which thread "wins" because they all come up with the same result.
Here's an unsafe, overly-contrived example, using Visual C++ intrinsics:
MyClass *GetGlobalMyClass()
{
    static MyClass *const UNSET_POINTER = reinterpret_cast<MyClass *>(
        static_cast<intptr_t>(-1));
    static MyClass *volatile s_object = UNSET_POINTER;
    if (s_object == UNSET_POINTER)
    {
        MyClass *newObject = MyClass::Create();
        if (_InterlockedCompareExchangePointer(&s_object, newObject,
            UNSET_POINTER) != UNSET_POINTER)
        {
            // Another thread beat us. If Create didn't return null, destroy.
            if (newObject)
                newObject->Destroy(); // calls "delete this;", presumably
        }
    }
    return s_object;
}
On a weakly-ordered memory architecture, my understanding is that it's possible that the new value of s_object is visible to other threads before other variables written inside MyClass::Create or MyClass::MyClass are visible. Also, the compiler itself could arrange the code this way in the absence of a compiler barrier (in Visual C++, _WriteBarrier, but _InterlockedCompareExchange acts as a barrier).
Do I need something like a store fence intrinsic in there in order to ensure that MyClass's variables are visible to all threads before s_object becomes something besides -1?
Fortunately, the rules in C++ are very simple:
If there is a data race, the behaviour is undefined.
In your code the data race is caused by the following read, which conflicts with the write operation in _InterlockedCompareExchangePointer.
if (s_object == UNSET_POINTER)
A thread-safe solution without blocking might look as follows. Note that on x86 a load operation with sequential consistency has basically no overhead compared to a regular load operation. If you care about other architectures, you can also use acquire release instead of sequential consistency.
static std::atomic<MyClass*> s_object{nullptr};
MyClass* o = s_object.load(std::memory_order_seq_cst);
if (o == nullptr) {
    o = new MyClass{...};
    MyClass* expected = nullptr;
    if (!s_object.compare_exchange_strong(expected, o, std::memory_order_seq_cst)) {
        delete o;
        o = expected;
    }
}
return o;
For a proper C++11 implementation any function-local static variable will be constructed in a thread-safe fashion by the first thread passing through this variable.
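Both options can be sketched side by side. This is a minimal illustration under assumptions: Widget stands in for MyClass, the acquire/release orderings replace the sequentially consistent ones used above, and all_threads_agree is a made-up helper that only exists to exercise the code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Widget { int value = 42; };

std::atomic<Widget*> g_widget{nullptr};

// Lock-free lazy initialization: several threads may construct a Widget,
// but only one pointer is published; the losers delete their copy.
Widget* get_widget() {
    Widget* w = g_widget.load(std::memory_order_acquire);
    if (w == nullptr) {
        Widget* fresh = new Widget;
        Widget* expected = nullptr;
        if (g_widget.compare_exchange_strong(expected, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
            w = fresh;        // we won: our object is now visible to everyone
        } else {
            delete fresh;     // another thread won the race
            w = expected;     // the failed exchange loaded the winner's pointer
        }
    }
    return w;
}

// The function-local static alternative: since C++11 the initialization
// is performed exactly once, even with concurrent callers.
Widget& get_widget_static() {
    static Widget instance;
    return instance;
}

// Calls get_widget() from n threads and checks they all got the same object.
bool all_threads_agree(int n) {
    assert(n > 0);
    std::vector<Widget*> seen(n, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&seen, i] { seen[i] = get_widget(); });
    for (auto& t : threads) t.join();
    for (Widget* w : seen)
        if (w == nullptr || w != seen[0]) return false;
    return true;
}
```

The release half of the successful exchange is what makes the Widget's members visible before the pointer is, and the acquire load is what lets readers rely on that.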

Do I need to use volatile keyword if I declare a variable between mutexes and return it?

Let's say I have the following function.
std::mutex mutex;
int getNumber()
{
    mutex.lock();
    int size = someVector.size();
    mutex.unlock();
    return size;
}
Is this a place to use the volatile keyword when declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program has, and it is assumed that only one thread (other than the modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit: If you need to do some work that relies on size being correct, you will need to wrap that whole task with a mutex.
You haven't mentioned what the type of the mutex variable is, but assuming it is an std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing a lot of optimizations. So you don't need to worry about return value optimization or some other optimization allowing the size() query to be performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
    std::lock_guard<std::mutex> l(mutex); // lock is acquired
    int size = someVector.size();
    return size;
} // lock is released automatically when l goes out of scope
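To illustrate the earlier point about the result being outdated once the lock is released, here is a minimal sketch in which the size check and the work that depends on it happen under a single lock. g_mutex, g_values, append and try_get_last are made-up names for this example.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

std::mutex g_mutex;
std::vector<int> g_values;

// Writer side: every modification takes the same mutex.
void append(int v) {
    std::lock_guard<std::mutex> lock(g_mutex);
    g_values.push_back(v);
}

// Reader side: the emptiness check and the element access are one
// critical section, so the vector cannot change in between.
bool try_get_last(int& out) {
    std::lock_guard<std::mutex> lock(g_mutex);
    if (g_values.empty())
        return false;
    out = g_values.back();
    return true;
}
```

Calling a separate getNumber() and then indexing the vector would leave a window between the two steps in which another thread could shrink it.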
volatile is a keyword that tells the compiler to actually perform every read and write of the variable rather than optimizing any of them away. Here is an example:
int example_function() {
    int a;
    volatile int b;
    a = 1; // this is ignored because nothing reads it before it is assigned again
    a = 2; // same here
    a = 3; // this is the last one, so a write takes place
    b = 1; // b gets written here, because b is volatile
    b = 2; // and again
    b = 3; // and again
    return a + b;
}
What is the real use of this? I've seen it in delay functions (keep the CPU busy for a bit by making it count up to a number) and in systems where several threads might look at the same variable. It can sometimes help a bit with multi-threaded things, but it isn't really a threading thing and is certainly not a silver bullet.