Is using volatile on shared memory safe? - c++

Let's suppose the following:
I have two processes on Linux / Mac OS.
I mmap shared memory (or a file).
Then in both processes I have the following:
struct Data{
    volatile int reload = 0; // using int because it is more standard
    // more things in the future...
};
void *mmap_memory = mmap(...);
Data *data = static_cast<Data *>(mmap_memory); // suppose size is sufficient and all OK
Then in one of the processes I do:
//...
data->reload = 1;
//...
And in the other I do:
while(...){
    do_some_work();
    //...
    if (data->reload == 1)
        do_reload();
}
Will this be thread / inter process safe?
Idea is from here:
https://embeddedartistry.com/blog/2019/03/11/improve-volatile-usage-with-volatile_load-and-volatile_store/
Note:
This cannot be made safe with std::atomic<>, since it does not "promise" anything about shared memory. Also, constructing/destructing it from two different processes is not clear at all.

Will this be thread / inter process safe?
No.
From your own link:
One problematic and common assumption is that volatile is equivalent to “atomic”. This is not the case. All the volatile keyword denotes is that the variable may be modified externally, and thus reads/writes cannot be optimized.
Your code needs atomic access to the value. if (data->reload == 1) won't work if it reads some partial/intermediate value from data->reload.
And never mind what happens if multiple threads do read 1 from data->reload; your posted code doesn't handle that at all.
Also see Why is volatile not considered useful in multithreaded C or C++ programming?
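For completeness, here is a rough sketch (not something the standard guarantees for inter-process use) of how this is often done in practice: place a lock-free std::atomic<int> in the mapped region, construct it in place from exactly one process, and have the reader consume the flag with exchange() so only one pass performs the reload. It relies on std::atomic<int> being lock-free, which is typical on Linux/macOS but is an assumption, not a promise.
#include <atomic>
#include <new>

struct Data {
    std::atomic<int> reload; // must be lock-free to be usable across processes
    // more things in the future...
};

Data *attach(void *mmap_memory, bool creator) {
    Data *data = static_cast<Data *>(mmap_memory);
    if (creator) {
        data = new (mmap_memory) Data;                    // construct in place, once
        data->reload.store(0, std::memory_order_relaxed);
    }
    // consider asserting data->reload.is_lock_free() at startup
    return data;
}

// writer process:
//   data->reload.store(1, std::memory_order_release);
// reader process:
//   if (data->reload.exchange(0, std::memory_order_acquire) == 1)
//       do_reload();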

Related

Libcurl - CURLSHOPT_LOCKFUNC - how to use it?

Could someone explain the nuances of using the CURLSHOPT_LOCKFUNC option?
I found an example here.
I can't quite understand: is an array of mutexes being used to lock access to the data?
Here is the relevant part of the code from the example:
static pthread_mutex_t lockarray[NUM_LOCKS];
Is this an array of mutexes?
//.....
static void lock_cb(CURL *handle, curl_lock_data data,
                    curl_lock_access access, void *userptr)
{
  (void)access;
  (void)userptr;
  (void)handle;
  pthread_mutex_lock(&lockarray[data]);
}
So, from this section of code, can I conclude that different mutexes are used for the CURLSHOPT_LOCKFUNC locking?
I don't understand how that works. Isn't more than one mutex needed to synchronize the threads' access to the data?
And this part of the code is also not clear:
pthread_mutex_lock(&lockarray[data]);
What kind of variable is data?
The documentation says: https://curl.se/libcurl/c/CURLSHOPT_LOCKFUNC.html
The data argument tells what kind of data libcurl wants to lock. Make
sure that the callback uses a different lock for each kind of data.
What does "what kind of data libcurl wants to lock" mean? I can't understand it.
Yes, the code is using an array of mutexes.
curl_lock_data is an enum that is defined in curl.h and specifies the different types of data that curl uses locks for:
/* Different data locks for a single share */
typedef enum {
CURL_LOCK_DATA_NONE = 0,
/* CURL_LOCK_DATA_SHARE is used internally to say that
* the locking is just made to change the internal state of the share
* itself.
*/
CURL_LOCK_DATA_SHARE,
CURL_LOCK_DATA_COOKIE,
CURL_LOCK_DATA_DNS,
CURL_LOCK_DATA_SSL_SESSION,
CURL_LOCK_DATA_CONNECT,
CURL_LOCK_DATA_LAST
} curl_lock_data;
Since the values are sequential, the code is using them as indexes into the array.
So, for example, when multiple threads are accessing shared cookie data, they will lock/unlock the CURL_LOCK_DATA_COOKIE mutex.
curl_lock_access is another enum defined in curl.h, which specifies why a given lock is being obtained:
/* Different lock access types */
typedef enum {
CURL_LOCK_ACCESS_NONE = 0, /* unspecified action */
CURL_LOCK_ACCESS_SHARED = 1, /* for read perhaps */
CURL_LOCK_ACCESS_SINGLE = 2, /* for write perhaps */
CURL_LOCK_ACCESS_LAST /* never use */
} curl_lock_access;
The difference between these reasons doesn't matter for a mutex, which has only a notion of exclusive access, meaning only 1 thread at a time can hold the lock. That is why the code is not using the access parameter.
But other types of locks, for instance an RW (reader/writer) lock, do have a concept of shared access (i.e., multiple threads can hold the lock at the same time) versus exclusive access. When multiple threads want to access the same shared data just for reading, it makes sense for them not to block each other. Only when a thread wants to modify the shared data should other threads be blocked from accessing that data, for reading or writing, until the modification is finished.
For example:
const int NUM_LOCKS = CURL_LOCK_DATA_LAST + 1;
static pthread_rwlock_t lockarray[NUM_LOCKS]; // initialize each entry, e.g. with pthread_rwlock_init()
...
static void lock_cb(CURL *handle, curl_lock_data data,
                    curl_lock_access access, void *userptr)
{
  (void)handle;
  (void)userptr;
  switch (access) {
    case CURL_LOCK_ACCESS_SHARED:
      pthread_rwlock_rdlock(&lockarray[data]);
      break;
    case CURL_LOCK_ACCESS_SINGLE:
      pthread_rwlock_wrlock(&lockarray[data]);
      break;
    default:
      break;
  }
}

static void unlock_cb(CURL *handle, curl_lock_data data,
                      curl_lock_access access, void *userptr)
{
  (void)handle;
  (void)access;
  (void)userptr;
  pthread_rwlock_unlock(&lockarray[data]);
}
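For context, a rough sketch of how such callbacks are typically registered with the share interface (error handling omitted; it assumes the locks in lockarray have already been initialized, e.g. with pthread_rwlock_init()):
CURLSH *share = curl_share_init();
curl_share_setopt(share, CURLSHOPT_LOCKFUNC, lock_cb);
curl_share_setopt(share, CURLSHOPT_UNLOCKFUNC, unlock_cb);
curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE); // share cookie data
curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);    // share the DNS cache

// every easy handle that should use the shared data:
curl_easy_setopt(handle, CURLOPT_SHARE, share);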

C++: __sync_synchronize() still needed with std::atomic?

I've been running into an infrequent but re-occurring race condition.
The program has two threads and uses std::atomic. I'll simplify the critical parts of the code to look like:
std::atomic<uint64_t> b; // flag, initialized to 0
uint64_t data[100]; // shared data, initialized to 0
thread 1 (publishing):
// set various shared variables here, for example
data[5] = 10;
uint64_t a = b.exchange(1); // signal to thread 2 that data is ready
thread 2 (receiving):
if (b.load() != 0) { // signal that data is ready
// read various shared variables here, for example:
uint64_t x = data[5];
// race condition sometimes (x sometimes not consistent)
}
The odd thing is that when I add __sync_synchronize() to each thread, then the race condition goes away. I've seen this happen on two different servers.
i.e. when I change the code to look like the following, then the problem goes away:
thread 1 (publishing):
// set various shared variables here, for example
data[5] = 10;
__sync_synchronize();
uint64_t a = b.exchange(1); // signal to thread 2 that data is ready
thread 2 (receiving):
if (b.load() != 0) { // signal that data is ready
__sync_synchronize();
// read various shared variables here, for example:
uint64_t x = data[5];
}
Why is __sync_synchronize() necessary? It seems redundant as I thought both exchange and load ensured the correct sequential ordering of logic.
The architecture is x86_64, Linux, g++ 4.6.2.
Whilst it is impossible to say from your simplified code what actually goes on in your actual application, the fact that __sync_synchronize helps, and the fact that this function is a memory barrier tells me that you are writing things in the one thread that the other thread is reading, in a way that isn't atomic.
An example (here b is an std::atomic<object *> shared between the threads):
thread_1:
object *p = new object;
p->x = 1;
b.exchange(p); /* give pointer p to other thread */
thread_2:
object *p = b.load();
if (p->x == 1) do_stuff();
else error("Huh?");
This may very well trigger the error-path in thread2, because the write to p->x has not actually been completed when thread 2 reads the new pointer value p.
Adding a memory barrier, in this case in the thread_1 code, should fix this. Note that for THIS case, a memory barrier in thread_2 will not do anything - it may alter the timing and appear to fix the problem, but it won't be the right thing. You may still need memory barriers on both sides if you are reading/writing memory that is shared between two threads.
I understand that this may not be precisely what your code is doing, but the concept is the same - __sync_synchronize ensures that memory reads and memory writes have completed for ALL of the instructions before that function call [which isn't a real function call; it inlines a single instruction that waits for any pending memory operations to complete].
Noteworthy is that operations on std::atomic ONLY affect the actual data stored in the atomic object. Not reads/writes of other data.
Sometimes you also need a "compiler barrier" to avoid the compiler moving stuff from one side of an operation to another:
std::atomic<bool> flag(false);
int value = 0;

value = 42;
flag.store(true);
....
another thread:
while(!flag.load());
print(value);
Now, there is a chance that the compiler generates the first form as:
flag.store(true);
value = 42;
Now, that wouldn't be good, would it? std::atomic is guaranteed to be a "compiler barrier", but in other cases, the compiler may well shuffle stuff around in a similar way.
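For reference, a minimal sketch of the same publish/consume pattern written with explicit release/acquire orderings, which in standard C++ also order the surrounding ordinary reads and writes relative to the flag (the names mirror the question's simplified code):
#include <atomic>
#include <cstdint>

std::atomic<uint64_t> b{0}; // flag
uint64_t data[100] = {};    // shared data

// thread 1 (publishing):
void publish() {
    data[5] = 10;                          // ordinary write
    b.store(1, std::memory_order_release); // writes above become visible before the flag
}

// thread 2 (receiving):
void consume() {
    if (b.load(std::memory_order_acquire) != 0) {
        uint64_t x = data[5]; // guaranteed to see the value written before the release store
        (void)x;
    }
}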

Proper compiler intrinsics for double-checked locking?

When implementing double-checked locking for initialization, what is the proper way to handle the memory and/or compiler barriers?
Something like std::call_once isn't what I want; it's way too slow. It's typically just implemented on top of pthread_mutex_lock and EnterCriticalSection, depending on the OS.
In my programs, I often run into initialization cases where the initialization is safe to repeat, as long as exactly one thread gets to set the final pointer. If another thread beats it to setting the final pointer to the singleton object, it deletes what it created and makes use of the other thread's. I also often use this in cases where it doesn't matter which thread "wins" because they all come up with the same result.
Here's an unsafe, overly-contrived example, using Visual C++ intrinsics:
MyClass *GetGlobalMyClass()
{
    static MyClass *const UNSET_POINTER = reinterpret_cast<MyClass *>(
        static_cast<intptr_t>(-1));
    static MyClass *volatile s_object = UNSET_POINTER;

    if (s_object == UNSET_POINTER)
    {
        MyClass *newObject = MyClass::Create();
        if (_InterlockedCompareExchangePointer(&s_object, newObject,
                                               UNSET_POINTER) != UNSET_POINTER)
        {
            // Another thread beat us. If Create didn't return null, destroy.
            if (newObject)
            {
                newObject->Destroy(); // calls "delete this;", presumably
            }
        }
    }
    return s_object;
}
On a weakly-ordered memory architecture, my understanding is that it's possible that the new value of s_object is visible to other threads before other variables written inside MyClass::Create or MyClass::MyClass are visible. Also, the compiler itself could arrange the code this way in the absence of a compiler barrier (in Visual C++, _WriteBarrier, but _InterlockedCompareExchange acts as a barrier).
Do I need a store-fence intrinsic or something similar in there to ensure that MyClass's variables are visible to all threads before s_object becomes something besides -1?
Fortunately, the rules in C++ are very simple:
If there is a data race, the behaviour is undefined.
In your code the data race is caused by the following read, which conflicts with the write operation in _InterlockedCompareExchangePointer:
if (s_object == UNSET_POINTER)
A thread-safe solution without blocking might look as follows. Note that on x86 a load operation with sequential consistency has basically no overhead compared to a regular load operation. If you care about other architectures, you can also use acquire/release instead of sequential consistency.
static std::atomic<MyClass*> s_object{nullptr};

MyClass* o = s_object.load(std::memory_order_seq_cst);
if (o == nullptr) {
    o = new MyClass{...};
    MyClass* expected = nullptr;
    if (!s_object.compare_exchange_strong(expected, o, std::memory_order_seq_cst)) {
        delete o;
        o = expected;
    }
}
return o;
In a proper C++11 implementation, any function-local static variable will be constructed in a thread-safe fashion by the first thread that passes through its declaration.
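For that static-initialization route, a minimal sketch, assuming MyClass::Create() can simply be called from the accessor:
MyClass *GetGlobalMyClass()
{
    // C++11 guarantees this initializer runs exactly once, even if several
    // threads call GetGlobalMyClass() concurrently; later calls just read it.
    static MyClass *const s_object = MyClass::Create();
    return s_object;
}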

Does the non-reentrancy issue arise only when we have multiple threads?

I was looking into the old concept of writing re-entrant code. They say: don't use global or static variables, because they are prone to non-deterministic behaviour. However, I am not sure whether this applies when there is a single thread. I understand that thread safety and reentrancy are two different concepts. In C++ we avoid using global and static variables; we even try to avoid singletons. But the question is: what will happen to this piece of code, borrowed from the wiki, if it is running on a single thread? Is it prone to non-deterministic behaviour?
http://en.wikipedia.org/wiki/Reentrancy_%28computing%29
int g_var = 1;

int f()
{
    g_var = g_var + 2;
    return g_var;
}

int g()
{
    return f() + 2;
}
Another point: people say that re-entrancy and thread safety are unrelated. However, in this example we can make the results predictable by putting a mutex in function g(), so when two threads execute g() concurrently the results are deterministic. So we actually fixed the non-reentrant code by adding a thread-safety mechanism. Please clear up my concepts here. Am I missing something, or is my understanding wrong?
Reentrancy problems have more to do with e.g. shared libraries than threads. Remember that shared libraries are shared, there is only one copy of a shared library loaded, no matter how many processes use the library. This of course means that global and static data is shared as well, which can lead to problems. And normal in-process mechanisms like thread mutexes will not help here, since those are only per process, you have to use inter-process facilities to protect this data. And doing it is often too much work, it's often easier to avoid the issue completely by not having global or static data in shared libraries.
Note that this program will be compiled to something like this:
int g_var = 1;

int f()
{
    int tmp = g_var;
    // hardware interrupt might invoke isr() here!
    tmp += 2;
    g_var = tmp;
    return tmp;
}

int g()
{
    int tmp = f();
    return tmp + 2;
}
So if f() is interrupted in the middle and reentered, the two invocations of f() will increase g_var by only 2 and not 4.
You must take care in the following situations:
Multi-threaded programs.
Multi-process programs where several processes share the same variables.
Programs using hardware interrupts and interrupt service routines.
Programs using certain forms of callback functions.
The code you posted uses none of the above, so it will work perfectly fine.
Another point: people say that re-entrancy and thread safety are unrelated.
Re-entrant usually means "this code is written in such a manner, that it needs no protection mechanisms". For example, a function that only uses local variables and no library calls is re-entrant.
Thread-safe means that the code is properly using mechanisms such as mutexes or critical sections to protect shared resources.
The opposite of re-entrant is non-re-entrant. And when you fix non-re-entrant code, you make it thread-safe. So the terms are related but not synonymous: they mean different things.
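To illustrate the distinction with a small sketch (add_two and the mutex are hypothetical, not from the question): a re-entrant function keeps all of its state in arguments and locals, while the global-counter version can be made thread-safe with a mutex yet still is not re-entrant.
#include <mutex>

// Re-entrant: uses only its argument and locals, so it can safely be
// re-entered (for example from a signal handler) without any locking.
int add_two(int value)
{
    return value + 2;
}

// Not re-entrant (it touches shared global state), but thread-safe:
// access to g_var is serialized by a mutex. Note the mutex does not make
// it re-entrant; re-entering f() from a handler that interrupted f()
// itself would deadlock on the lock.
int g_var = 1;
std::mutex g_var_mutex;

int f()
{
    std::lock_guard<std::mutex> lock(g_var_mutex);
    g_var = g_var + 2;
    return g_var;
}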
Reentrancy problems are not unique to multi-threaded code.
Consider the following (single thread) code:
time_t t = time(NULL);
struct tm *a;
struct tm *b;
a = localtime(&t);
t += 3600;
b = localtime(&t);
Here you have an issue, since localtime() is not reentrant.
It (usually) returns a pointer to static storage.
Thus while the structs that a and b point to should have different content, they are now the same: a == b, since the last call changes the very struct that a points to.
The above code can be fixed for a single-threaded program like so:
time_t t = time(NULL);
struct tm *tmp;
struct tm a;
struct tm b;

tmp = localtime(&t);
a = *tmp;            /* copy the result before the next call overwrites it */

t += 3600;
tmp = localtime(&t);
b = *tmp;
For multi-threaded programs, another set of problems is introduced by non-reentrant functions, as multiple threads could call localtime() at the same time, leading to unpredictable race conditions. Some platforms that support threads implement localtime() and other non-reentrant functions using thread-local storage, reducing the problem to roughly that of a single-threaded program.
Reentrancy problems related to multi-threading can normally be solved with a mutex. To use localtime() safely, you could create your own function that protects the call to localtime() and copies out the result:
static pthread_mutex_t lmutex = PTHREAD_MUTEX_INITIALIZER;

void my_localtime(const time_t *t, struct tm *result)
{
    struct tm *tmp;

    pthread_mutex_lock(&lmutex);
    tmp = localtime(t);   /* shared static buffer, protected by the mutex */
    *result = *tmp;       /* copy out while still holding the lock */
    pthread_mutex_unlock(&lmutex);
}
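On POSIX systems there is also a reentrant variant, localtime_r(), which writes into a caller-supplied buffer and avoids the shared static storage entirely; a minimal sketch of the earlier example using it:
#include <time.h> /* POSIX localtime_r */

time_t t = time(NULL);
struct tm a;
struct tm b;

localtime_r(&t, &a);  /* result written directly into a */
t += 3600;
localtime_r(&t, &b);  /* result written directly into b; a is untouched */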

Do I need to use volatile keyword if I declare a variable between mutexes and return it?

Let's say I have the following function.
std::mutex mutex;
int getNumber()
{
    mutex.lock();
    int size = someVector.size();
    mutex.unlock();
    return size;
}
Is this a place to use the volatile keyword when declaring size? Will return value optimization or something else break this code if I don't use volatile? The size of someVector can be changed from any of the numerous threads the program has, and it is assumed that only one thread (other than the modifiers) calls getNumber().
No. But beware that the size may not reflect the actual size AFTER the mutex is released.
Edit: If you need to do some work that relies on the size being correct, you will need to wrap that whole task with a mutex.
You haven't mentioned what the type of the mutex variable is, but assuming it is an std::mutex (or something similar meant to guarantee mutual exclusion), the compiler is prevented from performing a lot of optimizations. So you don't need to worry about return value optimization or some other optimization allowing the size() query to be performed outside of the mutex block.
However, as soon as the mutex lock is released, another waiting thread is free to access the vector and possibly mutate it, thus changing the size. Now, the number returned by your function is outdated. As Mats Petersson mentions in his answer, if this is an issue, then the mutex lock needs to be acquired by the caller of getNumber(), and held until the caller is done using the result. This will ensure that the vector's size does not change during the operation.
Explicitly calling mutex::lock followed by mutex::unlock quickly becomes unfeasible for more complicated functions involving exceptions, multiple return statements etc. A much easier alternative is to use std::lock_guard to acquire the mutex lock.
int getNumber()
{
    std::lock_guard<std::mutex> l(mutex); // lock is acquired
    int size = someVector.size();
    return size;
} // lock is released automatically when l goes out of scope
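Following up on the earlier point that the caller may need to keep holding the lock, here is a sketch (assuming the caller can see the same mutex and vector) where the lock is held for the whole task, so the size cannot go stale while it is being used:
void useVectorSize()
{
    std::lock_guard<std::mutex> l(mutex);   // hold the lock for the whole task
    std::size_t size = someVector.size();
    // ... work that relies on size staying accurate ...
}   // only after this point can other threads modify someVector again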
volatile is a keyword that you use to tell the compiler to actually perform every read and write of the variable and not to optimize them away. Here is an example:
int example_function() {
    int a;
    volatile int b;

    a = 1; // this is ignored because nothing reads it before it is assigned again
    a = 2; // same here
    a = 3; // this is the last one, so a write takes place

    b = 1; // b gets written here, because b is volatile
    b = 2; // and again
    b = 3; // and again

    return a + b;
}
What is the real use of this? I've seen it in delay functions (keep the CPU busy for a bit by making it count up to a number) and in systems where several threads might look at the same variable. It can sometimes help a bit with multi-threaded things, but it isn't really a threading thing and is certainly not a silver bullet.