pthreads locking scheme to allow concurrent reads of a shared data structure

Let's say you have some code that both reads and writes to a data structure. If you have multiple threads executing this code (and sharing the data structure), is there some arrangement that would achieve the following:
Allow 2 or more concurrent reads, with no writes
Disallow 2 or more writes
Disallow 1 or more reads concurrently with 1 or more writes
A single mutex that is locked during any read and any write achieves goals 2 and 3, but fails to achieve goal 1. Is there some solution that achieves all three goals?
Assume that it is not possible to devise a scheme where different sub-sections of the data structure can be protected with different mutexes.
My clunky approach to this is:
Have one mutex per thread, and each thread locks its own mutex when it needs to read.
Have one additional 'global' mutex. When any thread wants to write, it first locks this global mutex. Then it goes through a loop of pthread_mutex_trylock() on all of the thread-specific mutexes until it has locked them all, then performs the write, then unlocks them all. Finally, it unlocks the global mutex.
This approach seems unlikely to be very efficient, however.
Thanks in advance,
Henry

Pthreads includes reader-writer locks that have this behaviour. You initialise them in an analogous way to mutexes - either statically:
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
or dynamically with pthread_rwlock_init().
To lock for reading (shared) you use pthread_rwlock_rdlock(), and to lock for writing (exclusive) you use pthread_rwlock_wrlock(). There are also "trylock" and "timedlock" variations of these.
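For example, here is a minimal sketch of how these calls guard a shared value (error checking omitted; the function and variable names are illustrative):

#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_value;   /* stands in for the shared data structure */

int reader(void)
{
    pthread_rwlock_rdlock(&rwlock);   /* shared: any number of readers may hold this */
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void writer(int v)
{
    pthread_rwlock_wrlock(&rwlock);   /* exclusive: waits out readers and other writers */
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}

This satisfies all three of your goals: concurrent readers proceed in parallel, writers exclude each other, and readers and writers exclude each other.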
You can, of course, also build such a lock from a pthreads mutex and condition variable. Assuming this shared state:

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int readers = 0;

you could, for example, implement the reader-side lock as:
pthread_mutex_lock(&mutex);
readers++;
pthread_mutex_unlock(&mutex);
The writer-side lock is (note that the writer keeps the mutex locked for the duration of the write, which is what excludes other writers and new readers):
pthread_mutex_lock(&mutex);
while (readers > 0)
    pthread_cond_wait(&cond, &mutex);
The reader-side unlock is:
pthread_mutex_lock(&mutex);
if (--readers == 0)
    pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
And the writer-side unlock is:
pthread_cond_signal(&cond);
pthread_mutex_unlock(&mutex);
(This is just for interest's sake - you are better off using the built-in reader-writer locks, because those can be implemented directly using architecture-specific code which may well be more efficient than using the other pthreads primitives).
Note also that in a real implementation you would want to consider the case of readers overflowing.

Related

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
    std::unique_lock lock(mutex);
    cv.wait(lock, []() { return values[threadIndex] != 0; });
    // May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do we need some memory fence or lock before this
                 // to ensure that the new non-zero values will be visible
                 // to other threads immediately after they wake up?
Doesn't notify_all() store some atomic value, thereby enforcing memory ordering? I have not been able to clarify this.
UPD: according to Superlokkus' answer and an answer here, we have to acquire a lock to ensure that memory writes become visible to the other threads (memory propagation); otherwise the threads in my case may read zero values.
I also missed this quote about condition_variable, which answers my question directly: even an atomic variable has to be modified under the lock when the modification must become visible to the waiting thread immediately.
Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread.
I guess you are mixing up the memory ordering of so-called atomics with the mechanics of classic lock-based synchronization.
When you have a datum that is shared between threads - let's say an int - one thread cannot simply read it while another thread might be writing to it at the same time. Otherwise we would have a data race.
For a long time we got around this with classic lock-based synchronization:
The threads share at least a mutex and the int. To read or to write, a thread has to acquire the lock first, meaning it waits on the mutex. Mutexes are built so that concurrent lock attempts are safe. The thread that wins the race for the mutex can read or change the int, and should then unlock it so that others can read/write too. Using a condition variable as you did just makes the pattern "readers wait for a change of a value by a writer" more efficient: readers are woken up by the cv instead of repeatedly locking, reading, and unlocking, which would be busy waiting.
So because you hold the lock after waiting on the mutex - or, in your case, after correctly waiting on the condition variable (the mutex is still needed) - you can change the int, and readers will read the new value after the writer has written it, never the old one.
UPDATE: One thing I have to add, which might also be the cause of confusion: condition variables are subject to so-called spurious wakeups. That means a reader thread might wake up, with the mutex locked, even though the writer did not notify any thread. So you have to check whether the writer actually woke you up. This is usually done by the writer changing another datum just to signal it, or, where suitable, by using the same datum you wanted to share anyway. The predicate overload of std::condition_variable::wait exists just to make this "check and go back to sleep" code look a bit prettier. Based on your question I don't know whether you want to use your values for this job.
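For reference, the predicate overload used in the question is equivalent to the classic check-and-wait loop (a sketch reusing the names from the question):

std::unique_lock<std::mutex> lock(mutex);
while (values[threadIndex] == 0)   // re-check the condition after every wakeup, spurious or not
    cv.wait(lock);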
However, the snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex-based patterns have drawbacks and are slow: only one thread at a time can make progress. This is where so-called atomics, like std::atomic<int>, come into play. They can be written and read by multiple threads concurrently without a mutex. Memory ordering only becomes a consideration there, as an optimization for cases where you use several of them together in a meaningful way, or where you don't need the "after the write, I never see the old value" guarantee. With the default memory ordering, memory_order_seq_cst, you would also be fine.
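For illustration, a minimal sketch (the counter is my own example, not a name from the question):

#include <atomic>

std::atomic<int> counter{0};

// Any number of threads may call these concurrently without a mutex.
void writer_thread() { counter.store(42); }     // memory_order_seq_cst by default
int  reader_thread() { return counter.load(); } // memory_order_seq_cst by default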

Why would I want to lock two mutexes in one function - that too with deferred lock?

https://en.cppreference.com/w/cpp/thread/lock_tag
void transfer(bank_account &from, bank_account &to, int amount)
{
    // lock both mutexes without deadlock
    std::lock(from.m, to.m);
    // make sure both already-locked mutexes are unlocked at the end of scope
    std::lock_guard<std::mutex> lock1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(to.m, std::adopt_lock);
    // equivalent approach:
    // std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
    // std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);
    // std::lock(lock1, lock2);
    from.balance -= amount;
    to.balance += amount;
}
What do they gain by locking two mutexes at once?
What have they gained by deferred lock here?
Please explain the reason behind that decision of theirs.
If I modify a bank account without locking it, someone else could try to modify it at the same time. This is a race and the result will be undefined behaviour (usually lost or magically created money).
While transferring money, I am modifying 2 bank accounts. So they both need to be locked.
The problem is that when locking more than one thing, every locker must lock and unlock in the same order, otherwise we get deadlocks.
When it's bank accounts, there is no natural order of locks. Thousands of threads could be transferring money in all directions.
So we need a method of locking more than one mutex in a way that works around this - this is std::lock.
std::lock merely locks the mutexes - it does not guarantee unlocking on exit from the current code block.
std::lock_guard<> unlocks the mutex it refers to upon destruction (see RAII). This makes the code behave correctly in all circumstances - even when an exception causes an early exit from the current code block without the flow of control passing over a statement such as to.m.unlock().
A good explanation (with examples) here: https://wiki.sei.cmu.edu/confluence/display/cplusplus/CON53-CPP.+Avoid+deadlock+by+locking+in+a+predefined+order
Extension on Richard Hodges's answer
What do they gain by locking two mutexes at once?
Richard explained it nicely already; just to be a little more explicit: we avoid deadlock this way (std::lock is implemented such that deadlock won't occur).
What have they gained by deferred lock here?
Deferring the lock means not acquiring it immediately. That's important, because if the locks were acquired immediately, that would happen without any protection against deadlock (which the subsequent std::lock then provides).
About deadlock avoidance (see std::lock):
Locks the given Lockable objects lock1, lock2, ..., lockn using a deadlock avoidance algorithm to avoid deadlock.
The objects are locked by an unspecified series of calls to lock, try_lock, and unlock. [...]
Side note: another, much simpler algorithm that avoids deadlocks is to always lock the bank account with e.g. the lower account number (AN) first, as sketched below. If a thread is waiting for the lock with the higher AN, then the thread holding it either already has both locks acquired or is waiting for its second lock - which cannot be the first thread's, as that one must have a yet higher AN.
This holds for an arbitrary number of threads: any thread holding a lower lock can only be waiting for a higher one. If you draw a directed graph with an edge from A to B whenever A is waiting for a lock that B holds, you get a (multi-)tree structure, and never a circular substructure (which would indicate a deadlock).
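A minimal sketch of this ordering scheme, using the bank_account type from the example above and assuming a hypothetical comparable member called number:

void transfer_ordered(bank_account &from, bank_account &to, int amount)
{
    // Always lock the account with the lower account number first.
    bank_account *first = &from;
    bank_account *second = &to;
    if (second->number < first->number)
        std::swap(first, second);   // requires <utility>
    std::lock_guard<std::mutex> lock1(first->m);
    std::lock_guard<std::mutex> lock2(second->m);
    from.balance -= amount;
    to.balance += amount;
}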
The bank account data structure has a lock for each account.
When transferring money from one account to another, we need to lock both accounts (since we are removing money from one and adding it to another). We would like this operation not to deadlock, so we lock both at once using std::lock, which ensures there is no deadlock.
After we finish the transaction we need to make sure we release the lock. This code does that using RAII. With the adopt_lock tag, we make the object adopt an already locked mutex (which will be released when lock1 falls out of scope).
With the defer_lock tag, we create a unique_lock for a currently unlocked mutex, with the intention of locking it later. Again, it will be unlocked when the unique_lock falls out of scope.
from and to are 2 accounts which may be used anywhere in the application separately.
By having a mutex for each account, you make sure nobody else uses from or to while you do the transfer.
lock_guard will release the mutex when exiting the function.

Use a mutex or not for concurrent reading

I am programming in C++ on Linux and I am using the pthreads library. I am using a mutex to protect some shared variables, but I am not sure whether in this specific case the mutex is necessary.
I have 3 threads. The shared variable is a string (a global variable).
Thread1 changes its value, and afterwards thread2 and thread3 read its value and store it in another string.
In this case the string's value is only modified by one thread. Is a mutex still necessary to protect the shared variable during a concurrent read by the two other threads?
"Thread1 changes it's value and afterwards ..." -- if "afterwards" means that the other threads are created after the change, there's no need for a mutex; thread creation synchronizes memory. If it means anything else then you need some form of synchronization, in part because "afterwards" in different threads is meaningless without synchronization.
What you should use is a shared mutex: std::shared_mutex is C++17; for C++14 there is std::shared_timed_mutex, and Boost provides shared_mutex if you can use neither. Then you take a shared_lock when you want to read the string and a unique_lock when you want to write to it.
If two shared locks meet, they don't collide and don't block each other, but a shared lock and a unique lock collide, and one of the two blocks until the other is released.
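A minimal sketch of that scheme, assuming C++17 and illustrative names:

#include <shared_mutex>
#include <string>

std::shared_mutex mtx;    // protects shared_str
std::string shared_str;

std::string read_it()     // e.g. thread2 and thread3
{
    std::shared_lock<std::shared_mutex> lock(mtx);  // shared: readers don't block each other
    return shared_str;
}

void write_it(const std::string &s)                 // e.g. thread1
{
    std::unique_lock<std::shared_mutex> lock(mtx);  // exclusive: blocks readers and other writers
    shared_str = s;
}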
Since you are using pthreads, you can use a pthread_rwlock_t.
For updating the object, it would be locked using pthread_rwlock_wrlock() to get a write lock; all readers would access the object only after obtaining a shared read lock with pthread_rwlock_rdlock(). Since the write lock is exclusive, and the read lock is shared, you'd get the behavior you desire.
An example of the use of pthread_rwlock_t read/write locks can be found at http://www.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.genprogc/using_readwrite_locks.htm.
A good summary of the available calls for use on a pthread_rwlock_t lock can be found at https://docs.oracle.com/cd/E19455-01/806-5257/6je9h032u/index.html. I've reproduced the table listing the operations:
Operation                                            Function
Initialize a read-write lock                         pthread_rwlock_init(3THR)
Read lock on read-write lock                         pthread_rwlock_rdlock(3THR)
Read lock with a nonblocking read-write lock         pthread_rwlock_tryrdlock(3THR)
Write lock on read-write lock                        pthread_rwlock_wrlock(3THR)
Write lock with a nonblocking read-write lock        pthread_rwlock_trywrlock(3THR)
Unlock a read-write lock                             pthread_rwlock_unlock(3THR)
Destroy a read-write lock                            pthread_rwlock_destroy(3THR)
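As a brief illustration, the nonblocking read-lock variant can be used like this (a sketch; rwlock is assumed to be an already-initialized pthread_rwlock_t):

if (pthread_rwlock_tryrdlock(&rwlock) == 0) {
    /* got the shared lock: read the object, then release it */
    pthread_rwlock_unlock(&rwlock);
} else {
    /* a writer holds the lock (or an error occurred): do something else and retry later */
}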

Locking when accessing shared memory for reading

If I am accessing shared memory for reading only, to check a condition for an if() block, should I still lock the mutex? E.g.
mutex_lock();
if (var /* shared memory */) {
}
mutex_unlock();
Is locking here needed and good practice?
If the variable you are reading could be written to concurrently, then yes, you should acquire a lock on the mutex.
You could only read it atomically if your compiler provides the necessary primitives for that; these could be either the atomic features that come with C11 and C++11 or some other language extension provided by your compiler. Then you could move the mutex acquisition into the conditional, but if you wait until after the test to acquire the mutex, someone else may change the variable between the time you test it and the time you acquire the mutex:
if (example) {
    // "example" variable could be changed here by another thread.
    mutex_lock();
    // Now the condition might be false!
    mutex_unlock();
}
Therefore, I would suggest acquiring the mutex before the conditional, unless profiling has pinpointed mutex acquisition as a bottleneck. (And in the case where the tested variable is larger than a CPU register -- a 64-bit number on a 32-bit CPU, for example -- then you don't even have the option of delaying mutex acquisition without some other kind of atomic fetch or compare primitive.)
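If the tested variable really is a single small flag, the atomic alternative mentioned above could look like this C++11 sketch (the names are illustrative; the staleness caveat from the snippet above still applies):

#include <atomic>

std::atomic<bool> var{false};   // the tested flag, now atomic

void check()
{
    if (var.load()) {   // atomic read: no mutex needed for the test itself
        // The value may already be stale here; acquire the mutex before
        // acting on any state that must stay consistent with the flag.
    }
}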

How to implement a recursive MRSW lock?

I need a fully-recursive multiple-reader/single-writer lock (shared mutex) for my project - I don't agree with the notion that you shouldn't need one if you have complete const-correctness (there was some discussion about that on the boost mailing list); in my case the lock should protect a completely transparent cache, which would be mutable in any case.
As for the semantics of recursive MRSW locks, I think the only ones that make sense are that acquiring an exclusive lock in addition to a shared one temporarily releases the shared one, to be reacquired when the exclusive one is released.
This has the somewhat strange effect that unlocking can block, but I can live with that - writing rarely happens anyway, and recursive locking usually only happens through recursive code paths, in which case the caller has to be prepared for the call to block in any case. To avoid it, one can still simply upgrade the lock instead of locking recursively.
Acquiring a shared lock on top of an exclusive one should obviously just increase the lock count.
So the question becomes - how should I implement it? The usual approach with a critical section and two semaphores doesn't work here because - as far as I can see - the woken-up thread has to handshake by inserting its thread id into the lock's owner map.
I suppose it would be doable with two condition variables and a couple of mutexes, but the sheer number of synchronization primitives it would end up using sounds like a bit too much overhead for my taste.
An idea which just sprang to mind is to use TLS to remember the type of lock I'm holding (and possibly the local lock counts). I still have to think it through - but I'll post the question for now.
Target platform is Win32 but that shouldn't really matter. Note that I'm specifically targeting Win2k so anything related to the new MRSW lock primitive in Windows 7 is not relevant for me. :-)
Okay, I solved it.
It can be done with just 2 semaphores, a critical section, and almost no more locking than for a regular non-recursive MRSW lock (there is obviously some more CPU time spent inside the lock, because that multiset must be managed) - but it's tricky. The structure I came up with looks like this:
// Protects everything that follows, except mWriterThreadId and mRecursiveUpgrade
CRITICAL_SECTION mLock;
// Semaphore to wait on for a read lock
HANDLE mSemaReader;
// Semaphore to wait on for a write lock
HANDLE mSemaWriter;
// Number of threads waiting for a write lock.
int mWriterWaiting;
// Number of times the writer entered the write lock.
int mWriterActive;
// Number of threads inside a read lock. Note that this does not include
// recursive read locks.
int mReaderActiveThreads;
// Whether or not the current writer obtained the lock by a recursive
// upgrade. Note that this member might be set outside the critical
// section, so it should only be read from by the writer during his
// unlock.
bool mRecursiveUpgrade;
// This member contains the current thread id once for each
// (recursive) read lock held by the current thread in addition to an
// undefined number of other thread ids which may or may not hold a
// read lock, even inside the critical section (!).
std::multiset<unsigned long> mReaderActive;
// If there is no writer this member contains 0.
// If the current thread is the writer this member contains his
// thread-id.
// Otherwise it can contain either of them, even inside the
// critical section (!).
// Also note that it might be set outside the critical section.
unsigned long mWriterThreadId;
Now, the basic idea is this:
Full update of mWriterWaiting and mWriterActive for an unlock is performed by the unlocking thread.
For mWriterThreadId and mReaderActive this is not possible, as the waiting thread needs to insert itself once it has been released.
So the rule is that you may never access those two members except to check whether you are holding a read lock or are the current writer - specifically, they may not be used to check whether there are any readers/writers; for that you have to use the (somewhat redundant, but necessary for this reason) mReaderActiveThreads and mWriterActive.
I'm currently running some test code (which has been going on deadlock- and crash-free for 30 minutes or so) - when I'm sure that it's stable and I've cleaned up the code somewhat I'll put it on some pastebin and add a link in a comment here (just in case someone else ever needs this).
Well, I did some thinking. Starting from the simple "two semaphores and a critical section" approach, one adds a writer lock count and an owning-writer TID to the structure.
Unlock still sets most of the new status inside the critsec. Readers still normally increase the lock count - recursive read locking simply adds a non-existing reader to the counter.
During the writer's lock() I compare the owning TID, and if the writer already owns the lock, the write lock counter is increased.
Setting the new writer TID can't be done by unlock() - it doesn't know which thread will be woken - but if writers reset it back to zero in their unlock() that's not a problem: the current thread id won't ever be zero, and setting it is an atomic operation.
All of this sounds simple enough - but one nasty problem is left: a recursive reader-reader lock while a writer is waiting will deadlock. And I don't know how to solve that short of making the lock reader-biased... somehow I need to know whether I already own a read lock.
Using TLS doesn't sound too great after I realized that the number of available slots might be rather limited...
As far as I understand, you need to provide your writer exclusive access to the data, while readers can operate simultaneously (if this is not what you want, please clarify your question).
I think you need to implement a sort of "inverse semaphore", i.e. a semaphore that will block a thread when positive, and signal all waiting threads when zero. If you do this, you can use two such semaphores for your program. The operation of your threads could then be the following:
Reader:
(1) wait on sem A
(2) increase sem B
(3) read operation
(4) decrease sem B
Writer:
(1) increase sem A
(2) wait on sem B
(3) write operation
(4) decrease sem A
In this way the writer will perform the write operation as soon as all pending readers have finished reading. As soon as your writer finishes, readers can resume their operation without blocking each other.
I am not familiar with the Windows mutex/semaphore facilities, but I can think of a way to implement such semaphores using the POSIX threads API (combining a mutex, a counter and a condition variable).
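A minimal sketch of such an "inverse semaphore", written with C++11 primitives standing in for the POSIX ones (the class and member names are mine):

#include <condition_variable>
#include <mutex>

class inverse_semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
public:
    void increase() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
    }
    void decrease() {
        std::lock_guard<std::mutex> lock(m_);
        if (--count_ == 0)
            cv_.notify_all();   // signal all threads waiting for zero
    }
    void wait_zero() {          // block while the count is positive
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }
};

With two such objects A and B, a reader would do A.wait_zero(); B.increase(); read; B.decrease(); and a writer would do A.increase(); B.wait_zero(); write; A.decrease();, matching the steps listed above.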