Locking when accessing shared memory for reading - c++

If I am accessing shared memory for reading only, to check a condition for an if() block, should I still lock the mutex? E.g.
mutex_lock();
if (var /* shared memory */) {
}
mutex_unlock();
Is locking here needed and good practice?

If the variable you are reading could be written to concurrently, then yes, you should acquire a lock on the mutex.
You can read it atomically without a lock only if your compiler provides the necessary primitives: either the atomic features that come with C11 and C++11 or some other language extension provided by your compiler. In that case you could move the mutex acquisition inside the conditional, but if you wait until after the test to acquire the mutex, another thread may change the variable between the time you test it and the time you acquire the mutex:
if (example) {
// "example" variable could be changed here by another thread.
mutex_lock();
// Now the condition might be false!
mutex_unlock();
}
Therefore, I would suggest acquiring the mutex before the conditional, unless profiling has pinpointed mutex acquisition as a bottleneck. (And in the case where the tested variable is larger than a CPU register -- a 64-bit number on a 32-bit CPU, for example -- then you don't even have the option of delaying mutex acquisition without some other kind of atomic fetch or compare primitive.)
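A minimal sketch of the "lock before the conditional" advice, assuming C++11 and std::mutex rather than the question's mutex_lock()/mutex_unlock() calls. The names state_mutex, ready, handled, handle_if_ready and mark_ready are hypothetical and exist only for illustration.

```cpp
#include <cassert>  // only needed if you want to check the sketch with assert
#include <mutex>

// Hypothetical shared state, named here only for illustration.
std::mutex state_mutex;
bool ready = false;  // shared flag, guarded by state_mutex
int handled = 0;     // number of handled events, guarded by state_mutex

// Test the flag and act on it while holding the lock, so no other
// thread can change `ready` between the check and the action.
bool handle_if_ready() {
    std::lock_guard<std::mutex> guard(state_mutex);
    if (!ready) {
        return false;
    }
    ready = false;
    ++handled;
    return true;
}

// Writers take the same mutex before changing the flag.
void mark_ready() {
    std::lock_guard<std::mutex> guard(state_mutex);
    ready = true;
}
```

Because the check and the update happen inside one critical section, the condition cannot become false between the test and the action.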

Related

Use mutex or not in a concurrent reading

I am programming in C++ on Linux using the pthreads library. I use a mutex to protect some shared variables, but I am not sure whether a mutex is necessary in this specific case.
I have 3 threads. The shared variable is a string (global variable).
Thread1 changes its value and afterwards, thread2 and thread3 read its value and store it in another string.
In this case, the string's value is only modified by one thread. Is a mutex still necessary to protect a shared variable that is read concurrently by two threads?
"Thread1 changes its value and afterwards ..." -- if "afterwards" means that the other threads are created after the change, there's no need for a mutex; thread creation synchronizes memory. If it means anything else then you need some form of synchronization, in part because "afterwards" in different threads is meaningless without synchronization.
What you should use is a shared_mutex (std::shared_mutex in C++17, std::shared_timed_mutex in C++14, or the Boost version if you cannot use either). Then, take a shared_lock when you want to read the string and a unique_lock when you want to write to it.
If two shared locks meet, they don't collide and they don't block, but a shared lock and a unique lock collide and one of the locks blocks until the other finishes.
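The shared/unique locking described above could look roughly like this. This is a sketch that assumes C++14 and uses std::shared_timed_mutex; the SharedString class and its member names are hypothetical.

```cpp
#include <cassert>  // only needed if you want to check the sketch with assert
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical wrapper; the class and member names are illustrative only.
class SharedString {
public:
    // Readers take a shared lock, so several threads can read at once.
    std::string get() const {
        std::shared_lock<std::shared_timed_mutex> lock(mutex_);
        return value_;
    }

    // A writer takes a unique lock, which excludes readers and other writers.
    void set(const std::string& v) {
        std::unique_lock<std::shared_timed_mutex> lock(mutex_);
        value_ = v;
    }

private:
    mutable std::shared_timed_mutex mutex_;
    std::string value_;
};
```

Returning a copy from get() keeps the lock scope small: the caller works on its own string after the shared lock has been released.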
Since you are using pthreads, you can use a pthread_rwlock_t.
For updating the object, it would be locked using pthread_rwlock_wrlock() to get a write lock; all readers would access the object only after obtaining a shared read lock with pthread_rwlock_rdlock(). Since the write lock is exclusive, and the read lock is shared, you'd get the behavior you desire.
An example of the use of pthread_rwlock_t read/write locks can be found at http://www.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.genprogc/using_readwrite_locks.htm.
A good summary of the available calls for use on a pthread_rwlock_t lock can be found at https://docs.oracle.com/cd/E19455-01/806-5257/6je9h032u/index.html. I've reproduced the table listing the operations:
Operation                                       Man page
Initialize a read-write lock                    pthread_rwlock_init(3THR)
Read lock on read-write lock                    pthread_rwlock_rdlock(3THR)
Read lock with a nonblocking read-write lock    pthread_rwlock_tryrdlock(3THR)
Write lock on read-write lock                   pthread_rwlock_wrlock(3THR)
Write lock with a nonblocking read-write lock   pthread_rwlock_trywrlock(3THR)
Unlock a read-write lock                        pthread_rwlock_unlock(3THR)
Destroy a read-write lock                       pthread_rwlock_destroy(3THR)
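As a rough illustration of the calls listed above, here is a sketch of a string guarded by a pthread_rwlock_t. The globals g_lock and g_text and the helper functions write_text and read_text are hypothetical names, and error handling is reduced to assert for brevity.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>

// Hypothetical globals for illustration.
static pthread_rwlock_t g_lock = PTHREAD_RWLOCK_INITIALIZER;
static std::string g_text;  // guarded by g_lock

// Writer: takes the exclusive write lock before modifying the string.
void write_text(const std::string& v) {
    int rc = pthread_rwlock_wrlock(&g_lock);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    g_text = v;
    pthread_rwlock_unlock(&g_lock);
}

// Reader: takes the shared read lock and returns a private copy.
std::string read_text() {
    int rc = pthread_rwlock_rdlock(&g_lock);
    assert(rc == 0);
    (void)rc;
    std::string copy = g_text;
    pthread_rwlock_unlock(&g_lock);
    return copy;
}
```

Several threads can be inside read_text() at the same time, while write_text() waits until no reader or other writer holds the lock.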

Do I need a memory barrier for a change notification flag between threads?

I need a very fast (in the sense "low cost for reader", not "low latency") change notification mechanism between threads in order to update a read cache:
The situation
Thread W (Writer) updates a data structure (S) (in my case a setting in a map) only once in a while.
Thread R (Reader) maintains a cache of S and does read this very frequently. When Thread W updates S Thread R needs to be notified of the update in reasonable time (10-100ms).
Architecture is ARM, x86 and x86_64. I need to support C++03 with gcc 4.6 and higher.
Code
is something like this:
// variables shared between threads
bool updateAvailable;
SomeMutex dataMutex;
std::string myData;
// variables used only in Thread R
std::string myDataCache;
// Thread W
dataMutex.Lock();
myData = "newData";
updateAvailable = true;
dataMutex.Unlock();
// Thread R
if(updateAvailable)
{
    dataMutex.Lock();
    myDataCache = myData;
    updateAvailable = false;
    dataMutex.Unlock();
}
doSomethingWith(myDataCache);
My Question
In Thread R no locking or barriers occur in the "fast path" (no update available).
Is this an error? What are the consequences of this design?
Do I need to qualify updateAvailable as volatile?
Will R get the update eventually?
My understanding so far
Is it safe regarding data consistency?
This looks a bit like "Double Checked Locking". According to http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html a memory barrier can be used to fix it in C++.
However the major difference here is that the shared resource is never touched/read in the Reader fast path. When updating the cache, the consistency is guaranteed by the mutex.
Will R get the update?
Here is where it gets tricky. As I understand it, the CPU running Thread R could cache updateAvailable indefinitely, effectively moving the Read way way before the actual if statement.
So the update could take until the next cache flush, for example when another thread or process is scheduled.
Use C++ atomics and make updateAvailable a std::atomic<bool>. The reason is that it is not just the CPU that can see an old version of the variable; the compiler in particular cannot see the side effects of another thread, so it may never refetch the variable, and the thread then never sees the updated value. Additionally, this way you get a guaranteed atomic read, which you don't have if you just read the plain value.
Other than that, you could potentially get rid of the lock: if, for example, the producer only ever produces data when updateAvailable is false, the mutex is unnecessary because std::atomic<> enforces proper ordering of the reads and writes. If that's not the case, you'll still need the lock.
You do have to use a memory fence here. Without the fence, there is no guarantee that updates will ever be seen on the other thread. In C++03 you have the option of either using platform-specific ASM code (mfence on Intel, no idea about ARM) or using OS-provided atomic set/get functions.
Do I need to qualify updateAvailable as volatile?
Because volatile has no connection to the threading model in C++, you should use atomics to make your program strictly standard-conformant:
In C++11 or newer, the preferred way is to use atomic<bool> with memory_order_relaxed store/load:
atomic<bool> updateAvailable;
//Writer
....
updateAvailable.store(true, std::memory_order_relaxed); //set (under mutex locked)
// Reader
if(updateAvailable.load(std::memory_order_relaxed)) // check
{
...
updateAvailable.store(false, std::memory_order_relaxed); // clear (under mutex locked)
....
}
gcc since 4.7 supports similar functionality with its __atomic builtins.
As for gcc 4.6, there seems to be no strictly conformant way to avoid fences when accessing the updateAvailable variable. In practice, a memory fence costs far less than the 10-100ms notification budget, so you can use gcc's __sync atomic builtins:
int updateAvailable = 0;
//Writer
...
__sync_fetch_and_or(&updateAvailable, 1); // set to non-zero
....
//Reader
if(__sync_fetch_and_and(&updateAvailable, 1)) // check, but never change
{
...
__sync_fetch_and_and(&updateAvailable, 0); // clear
...
}
Is it safe regarding data consistency?
Yes, it is safe. Your reason is absolutely correct here:
the shared resource is never touched/read in the Reader fast path.
This is NOT double-check locking!
It is explicitly stated in the question itself.
When updateAvailable is false, the Reader thread uses the variable myDataCache, which is local to the thread (no other threads use it). With the double-checked locking scheme, all threads use the shared object directly.
Why memory fences/barriers are NOT NEEDED here
The only variable, accessed concurrently, is updateAvailable. myData variable is accessed with mutex protection, which provides all needed fences. myDataCache is local to the Reader thread.
When the Reader thread sees updateAvailable to be false, it uses the myDataCache variable, which is changed only by that thread itself. Program order guarantees correct visibility of changes in that case.
As for visibility guarantees for the variable updateAvailable, the C++11 standard provides such guarantees for atomic variables even without fences. 29.3 p13 says:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Jonathan Wakely has confirmed in chat that this paragraph applies even to memory_order_relaxed accesses.
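Putting the pieces of this answer together, a sketch of the whole pattern in C++11 might look like the following. It assumes std::mutex in place of the question's SomeMutex; the function names publish and refreshCache are hypothetical, while updateAvailable, dataMutex and myData follow the question.

```cpp
#include <atomic>
#include <cassert>  // only needed if you want to check the sketch with assert
#include <mutex>
#include <string>

// Shared between threads (names follow the question).
std::atomic<bool> updateAvailable(false);
std::mutex dataMutex;
std::string myData;  // guarded by dataMutex

// Called by thread W: update the data and raise the flag under the mutex.
void publish(const std::string& newData) {
    std::lock_guard<std::mutex> guard(dataMutex);
    myData = newData;
    updateAvailable.store(true, std::memory_order_relaxed);
}

// Called by thread R: `cache` is private to R. The fast path is a single
// relaxed load; the mutex is taken only when an update is pending.
// Returns true if the cache was refreshed.
bool refreshCache(std::string& cache) {
    if (!updateAvailable.load(std::memory_order_relaxed)) {
        return false;
    }
    std::lock_guard<std::mutex> guard(dataMutex);
    cache = myData;
    updateAvailable.store(false, std::memory_order_relaxed);
    return true;
}
```

The relaxed accesses are enough here only because myData itself is never read outside the mutex; the mutex supplies the ordering for the data, and the flag is just a hint to take the slow path.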

pthreads locking scheme to allow concurrent reads of a shared data structure

Let's say you have some code that both reads and writes to a data structure. If you have multiple threads executing this code (and sharing the data structure), is there some arrangement that would achieve the following:
Allow 2 or more concurrent reads, with no writes
Disallow 2 or more writes
Disallow 1 or more reads concurrently with 1 or more writes
A single mutex that is locked during any read and any write achieves goals 2 and 3, but fails to achieve goal 1. Is there some solution that achieves all three goals?
Assume that it is not possible to devise a scheme where different sub-sections of the data structure can be protected with different mutexes.
My clunky approach to this is:
Have one mutex per thread, and each thread locks its own mutex when it needs to read.
Have one additional 'global' mutex. When any thread wants to write, it first locks this global mutex. Then it goes through a loop of pthread_mutex_trylock() on all of the thread-specific mutexes until it has locked them all, then performs the write, then unlocks them all. Finally, it unlocks the global mutex.
This approach seems unlikely to be very efficient, however.
Thanks in advance,
Henry
Pthreads includes reader-writer locks that have this behaviour. You initialise them in an analogous way to mutexes - either statically:
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
or dynamically with pthread_rwlock_init().
To lock for reading (shared) you use pthread_rwlock_rdlock(), and to lock for writing (exclusive) you use pthread_rwlock_wrlock(). There are also "trylock" and "timedlock" variations of these.
You can, of course, also build such a lock from pthreads mutex and condition variables. For example, you could implement the reader-side lock as:
pthread_mutex_lock(&mutex);
readers++;
pthread_mutex_unlock(&mutex);
The writer-side lock is:
pthread_mutex_lock(&mutex);
while (readers > 0)
    pthread_cond_wait(&cond, &mutex);
The reader-side unlock is:
pthread_mutex_lock(&mutex);
if (--readers == 0)
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
And the writer-side unlock is:
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
(This is just for interest's sake - you are better off using the built-in reader-writer locks, because those can be implemented directly using architecture-specific code which may well be more efficient than using the other pthreads primitives).
Note also that in a real implementation you would want to consider the case of readers overflowing.
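For reference, the four fragments above can be collected into one class. This is only a sketch of the scheme as described: the class name SimpleRwLock and the active_readers() helper are hypothetical, return codes are ignored, and, as noted, the built-in pthread_rwlock_t is the better choice in real code.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical wrapper around the mutex/condition-variable scheme above.
class SimpleRwLock {
public:
    SimpleRwLock() : readers_(0) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~SimpleRwLock() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    SimpleRwLock(const SimpleRwLock&) = delete;
    SimpleRwLock& operator=(const SimpleRwLock&) = delete;

    // Reader-side lock: register as a reader, then release the mutex.
    void read_lock() {
        pthread_mutex_lock(&mutex_);
        ++readers_;
        pthread_mutex_unlock(&mutex_);
    }

    // Reader-side unlock: the last reader wakes a waiting writer.
    void read_unlock() {
        pthread_mutex_lock(&mutex_);
        assert(readers_ > 0);
        if (--readers_ == 0)
            pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
    }

    // Writer-side lock: wait until no readers remain. The mutex stays
    // held for the whole write, which keeps out new readers and writers.
    void write_lock() {
        pthread_mutex_lock(&mutex_);
        while (readers_ > 0)
            pthread_cond_wait(&cond_, &mutex_);
    }

    // Writer-side unlock: wake another waiting writer and release the mutex.
    void write_unlock() {
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
    }

    // Helper for inspection only; must not be called while holding write_lock().
    int active_readers() {
        pthread_mutex_lock(&mutex_);
        int n = readers_;
        pthread_mutex_unlock(&mutex_);
        return n;
    }

private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    int readers_;
};
```

Note that in this scheme a steady stream of readers can keep readers_ above zero indefinitely, so a writer may starve; that is one more reason to prefer the built-in lock.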

Boost, mutex concept

I am new to multi-threading programming, and confused about how Mutex works. In the Boost::Thread manual, it states:
Mutexes guarantee that only one thread can lock a given mutex. If a code section is surrounded by a mutex locking and unlocking, it's guaranteed that only a thread at a time executes that section of code. When that thread unlocks the mutex, other threads can enter to that code region:
My understanding is that a Mutex is used to protect a section of code from being executed by multiple threads at the same time, NOT to protect the memory address of a variable. It's hard for me to grasp the concept: what happens if I have 2 different functions trying to write to the same memory address?
Is there something like this in Boost library:
lock a memory address of a variable, e.g., double x, lock (x); so that other threads with a different function cannot write to x.
do something with x, e.g., x = x + rand();
unlock (x)
Thanks.
The mutex itself only ensures that only one thread of execution can lock the mutex at any given time. It's up to you to ensure that modification of the associated variable happens only while the mutex is locked.
C++ does give you a way to do that a little more easily than in something like C. In C, it's pretty much up to you to write the code correctly, ensuring that anywhere you modify the variable, you first lock the mutex (and, of course, unlock it when you're done).
In C++, it's pretty easy to encapsulate it all into a class with some operator overloading:
#include <boost/thread/mutex.hpp>

class protected_int {
    int value; // this is the value we're going to share between threads
    boost::mutex m; // guards writes to value
public:
    operator int() { return value; } // we'll assume no lock needed to read
    protected_int &operator=(int new_value) {
        m.lock(); // taken on every assignment
        value = new_value;
        m.unlock(); // released immediately afterwards
        return *this;
    }
};
Obviously I'm simplifying that a lot (to the point that it's probably useless as it stands), but hopefully you get the idea, which is that most of the code just treats the protected_int object as if it were a normal variable.
When you do that, however, the mutex is automatically locked every time you assign a value to it, and unlocked immediately thereafter. Of course, that's pretty much the simplest possible case -- in many cases, you need to do something like lock the mutex, modify two (or more) variables in unison, then unlock. Regardless of the complexity, however, the idea remains that you centralize all the code that does the modification in one place, so you don't have to worry about locking the mutex in the rest of the code. Where you do have two or more variables together like that, you generally will have to lock the mutex to read, not just to write -- otherwise you can easily get an incorrect value where one of the variables has been modified but the other hasn't.
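The last point, that reads must also lock once two variables change together, can be sketched as follows. The example assumes C++11 and std::mutex instead of Boost, and the ProtectedRange class with its low/high pair is a hypothetical illustration.

```cpp
#include <cassert>  // only needed if you want to check the sketch with assert
#include <mutex>
#include <utility>

// Hypothetical example: a range whose two bounds must change together.
class ProtectedRange {
public:
    ProtectedRange() : low_(0), high_(0) {}

    // Both members are modified inside one critical section.
    void set(int low, int high) {
        std::lock_guard<std::mutex> guard(m_);
        low_ = low;
        high_ = high;
    }

    // Reading also locks, so a caller never sees a new low_ paired
    // with an old high_.
    std::pair<int, int> get() const {
        std::lock_guard<std::mutex> guard(m_);
        return std::make_pair(low_, high_);
    }

private:
    mutable std::mutex m_;
    int low_;
    int high_;
};
```

All access goes through set() and get(), so the rest of the code never has to remember to lock the mutex itself.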
No, there is nothing in boost(or elsewhere) that will lock memory like that.
You have to protect the code that access the memory you want protected.
what happen if I have 2 different functions trying to write to the same memory address.
Assuming you mean 2 functions executing in different threads, both functions should lock the same mutex, so only one of the threads can write to the variable at a given time.
Any other code that accesses (either reads or writes) the same variable will also have to lock the same mutex; failing to do so results in nondeterministic behavior.
It is possible to do non-blocking atomic operations on certain types using Boost.Atomic. These operations are non-blocking and generally much faster than a mutex. For example, to add something atomically you can do:
boost::atomic<int> n(10);
n.fetch_add(5, boost::memory_order_acq_rel);
This code atomically adds 5 to n.
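The same operation exists on std::atomic in C++11. As a small sketch that assumes the standard library rather than Boost.Atomic, here is a hypothetical ID generator, next_id(), that relies on fetch_add returning the value held before the addition.

```cpp
#include <atomic>
#include <cassert>  // only needed if you want to check the sketch with assert

// Hypothetical counter shared by all threads.
std::atomic<int> id_counter(0);

// Returns a distinct ID on every call, even when called from several
// threads at once: fetch_add performs the read-modify-write as one
// atomic step and returns the previous value.
int next_id() {
    return id_counter.fetch_add(1, std::memory_order_relaxed);
}
```

Relaxed ordering is sufficient in this sketch because only the counter itself is shared; if the ID were used to publish other data, a stronger ordering or a mutex would be needed.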
In order to protect a memory address shared by multiple threads in two different functions, both functions have to use the same mutex ... otherwise you will run into a scenario where threads in either function can indiscriminately access the same "protected" memory region.
So boost::mutex works just fine for the scenario you describe, but you just have to make sure that for a given resource you're protecting, all paths to that resource lock the exact same instance of the boost::mutex object.
I think the detail you're missing is that a "code section" is an arbitrary section of code. It can be two functions, half a function, a single line, or whatever.
So the portions of your 2 different functions that hold the same mutex when they access the shared data, are "a code section surrounded by a mutex locking and unlocking" so therefore "it's guaranteed that only a thread at a time executes that section of code".
Also, this is explaining one property of mutexes. It is not claiming this is the only property they have.
Your understanding is correct with respect to mutexes. They protect the section of code between the locking and unlocking.
As for what happens when two threads write to the same location of memory, they are serialized. One thread writes its value, then the other thread writes to it. The problem with this is that you don't know which thread will write first (or last), so the code is not deterministic.
Finally, to protect a variable itself, you can find a similar concept in atomic variables. Atomic variables are variables that are protected by either the compiler or the hardware, and can be modified atomically. That is, the three phases you mention (read, modify, write) happen atomically. Take a look at Boost atomic_count.

One reader. One writer. Some general questions about mutexes and atomic-builtins

I have a parent and a worker thread that share a bool flag and a std::vector. The parent only reads (i.e., reads the bool or calls my_vector.empty()); the worker only writes.
My questions:
Do I need to mutex protect the bool flag?
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows, "Ok, if I read the shared memory just before it was updated.. that's still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reader is reading, because after all, the reader is only reading!"
If someone can point me in the right direction, that would be just great. I am on GCC 4.3, and Intel x86 32-bit.
Thanks a lot!
Do I need to mutex protect the bool flag?
Not necessarily, an atomic instruction would do. By atomic instruction I mean a compiler intrinsic function that a) prevents compiler reordering/optimization and b) results in atomic read/write and c) issues an appropriate memory fence to ensure visibility between CPUs (not necessary for current x86 CPUs which employ MESI cache coherency protocol). Similar to gcc atomic builtins.
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
Depends on the CPU. For Intel CPUs - yes. See Intel® 64 and IA-32 Architectures Software Developer's Manuals.
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
The difference between atomics and mutexes is that the latter can put the waiting thread to sleep until the mutex is released. With atomics you can only busy-spin.
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
You do.
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows, "Ok, if I read the shared memory just before it was updated.. that's still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reader is reading, because after all, the reader is only reading!"
Depending on the implementation, vector.empty() may involve reading two buffer begin/end pointers and subtracting or comparing them, hence there is a chance that you read a new version of one pointer and an old version of another one without a mutex. Surprising behaviour may ensue.
From the C++11 standard's point of view, you have to protect the bool with a mutex, or alternatively use std::atomic<bool>. Even when you are sure that your bool is read and written atomically anyway, there is still the chance that the compiler optimizes away accesses to it because it does not know about other threads that could potentially access it.
If for some reason you absolutely need the last bit of performance of your platform, consider reading the "Intel 64 and IA-32 Architectures Software Developer's Manual", which will tell you how things work under the hood on your architecture. But of course, this will make your program unportable.
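A sketch of how the parent/worker state could be arranged along these lines, assuming a C++11 compiler (which the question's GCC 4.3 is not). The names done, items_mutex, items and the four helper functions are hypothetical.

```cpp
#include <atomic>
#include <cassert>  // only needed if you want to check the sketch with assert
#include <mutex>
#include <vector>

// Hypothetical names for the state shared by parent and worker.
std::atomic<bool> done(false);  // the flag: atomic, no mutex needed
std::mutex items_mutex;
std::vector<int> items;         // guarded by items_mutex

// Worker side: every modification of the vector happens under the mutex.
void add_item(int v) {
    std::lock_guard<std::mutex> guard(items_mutex);
    items.push_back(v);
}

void finish() { done.store(true); }

// Parent side: even the "read-only" empty() call takes the mutex,
// because it may read several internal pointers of the vector.
bool has_items() {
    std::lock_guard<std::mutex> guard(items_mutex);
    return !items.empty();
}

bool is_done() { return done.load(); }
```

The lock in has_items() is held only for the duration of empty(), so the writer is blocked for a very short time.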
Answers:
You will need to protect the bool (or any other variable for that matter) that has the possibility of being operated on by two or more threads at the same time. You can either do this with a mutex or by operating on the bool atomically.
Bool reads and bool writes may be atomic operations, but two sequential operations are certainly not (e.g., a read and then a write). More on this later.
Atomic builtins provide a solution to the problem above: the ability to read and write a variable in a step that cannot be interrupted by another thread. This makes the operation atomic.
If you are using the bool flag as your 'mutex' (that is, only the thread that sets the bool flag to true has permission to modify the vector) then you're OK. The mutual exclusion is managed by the boolean, and as long as you're modifying the bool using atomic operations you should be all set.
To answer this, let me use an example:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
if (flag == false) // check to see if the mutex is owned
{
flag = true; // obtain ownership of the flag (the mutex)
// manipulate the vector
flag = false; // release ownership of the flag
}
}
In the above code, in a multithreaded environment, it is possible for the thread to be preempted between the if statement (the read) and the assignment (the write), which means it is possible for two (or more) threads with this kind of code to both "own" the mutex (and the rights to the vector) at the same time. This is why atomic operations are crucial: they ensure that in the above scenario the flag will only be set by one thread at a time, therefore ensuring the vector will only be manipulated by one thread at a time.
Note that setting the flag back to false need not be an atomic operation because this instance is the only one with rights to modify it.
A rough (read: untested) solution may look something like:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
// check to see if the mutex is owned and obtain ownership if possible
if (__sync_bool_compare_and_swap(&flag, false, true))
{
// manipulate the vector
flag = false; // release ownership of the flag
}
}
The documentation for the atomic builtin reads:
The “bool” version returns true if the comparison is successful and newval was written.
Which means the operation will check whether flag is false and, if it is, set the value to true. If the value was false, true is returned; otherwise false is returned. All of this happens in one atomic step, so it is guaranteed not to be preempted by another thread.
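For comparison, the same try-to-own-the-flag idea can be written with C++11 atomics. This is a sketch that swaps in std::atomic<bool>::compare_exchange_strong for the GCC builtin; the function try_append is hypothetical. Unlike the plain assignment above, it releases the flag with a release store so that the vector changes are visible to the next owner.

```cpp
#include <atomic>
#include <cassert>  // only needed if you want to check the sketch with assert
#include <vector>

std::atomic<bool> flag(false);
std::vector<char> my_vector;  // only touched while the flag is owned

// Try once to take ownership of the flag, append c, then release it.
// Returns false if another thread owned the flag at that moment.
bool try_append(char c) {
    bool expected = false;
    // Atomically: if flag == expected (false), set it to true.
    if (!flag.compare_exchange_strong(expected, true,
                                      std::memory_order_acquire)) {
        return false;  // someone else owns the flag
    }
    my_vector.push_back(c);
    flag.store(false, std::memory_order_release);  // give up ownership
    return true;
}
```

A caller that must not skip work would retry in a loop, which is the busy-spinning that a real mutex avoids.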
I don't have the expertise to answer your entire question but your last bullet is incorrect in cases in which reads are non-atomic by default.
A context switch can happen anywhere: the reader can be switched out partway through a read, the writer can be switched in and complete its write, and then the reader finishes its read. The reader would then see neither the first value nor the second value, but potentially some wildly inaccurate intermediate value.