The Boost documentation for upgradable and shared locks says that while shared locks are held, only one thread may hold an upgradable lock. So if other threads try to acquire an upgradable lock while shared lock(s) are held along with the upgradable lock, they will block.
Is there some deadlock possibility that I am missing when more than one thread acquires upgradable locks along with one (or more than one) shared lock? Or is this just a logical requirement (a "should not do this" sort of thing)?
Note that I am not talking about exclusively locked states. Only an upgradeable locked state. If an upgradable lock is held along with other shared locks it is in essence a READ lock. Then why can't two upgradable locks be held together?
Is there some deadlock possibility that I am missing when more than one thread acquire upgradable locks
TL;DR Yes, there is.
along with one (or more than one shared lock)
This doesn't really affect the deadlock possibility. Allowing shared locks while upgradeable lock exists is simply a feature of upgradeable locks, compared to an exclusive lock.
Let us first consider what upgradeable locks can be used for. We shall imagine the following situation:
Several writer threads must check a condition (a read operation), then depending on that, modify the state
Checking the condition is expensive
The condition is satisfied rarely.
Other threads also read the state.
Now, let us consider that we only have reader (shared) / writer (exclusive) locks, with no upgradeable lock:
Writer takes an exclusive lock, and starts checking the condition
Other threads must block while the expensive check operation is running - even if they only need to read.
It may be considered a disadvantage that the read portion of the check-write cycle blocks even reading threads. So, let us consider an alternative:
1. Writer takes a shared lock, and starts checking the condition
2. Other threads may also take shared locks
3. Writer has checked the condition, and releases the read lock
4. The condition was satisfied, and writer now tries to take an exclusive lock to proceed
Between 3. and 4. more than one writer may have finished checking the state - since they can check concurrently - and are now racing to take the exclusive lock. Only one can win, and others must block. While they are blocking, the winner is modifying the state.
In this situation, the writers waiting to grab the exclusive lock cannot assume that the condition they checked is still valid! Another writer may have grabbed the lock before them, and the state has now been modified. They could ignore this, which might lead to undefined behaviour, depending on the relation between the condition and the modification. Or, they could check the condition again when they get the exclusive lock, which takes us back to the first approach, except with potentially redundant checks that were useless because of the race. Either way, this approach is worse than the first one.
A solution for the situation above is the privileged read (upgradeable) lock:
Writer takes a privileged lock, and starts checking the condition
Other threads may take shared locks
Writer has checked the condition which was satisfied, and upgrades to exclusive lock (must block until other locks have been released)
Let us consider a situation where multiple writers have been granted a privileged lock. They get to check the expensive condition concurrently, which is nice, but they still have a race to upgrade the lock. This time, the race results in a deadlock, because each writer holds the read lock and waits for all read locks to be released before it can upgrade.
If the upgradeable lock is exclusive in relation to other upgradeable locks, then this deadlock cannot occur, there is no race between the expensive check and the modification, and reader threads may still operate while the writer is checking.
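To make the pattern concrete, here is a minimal sketch using Boost's upgrade locks; the shared state and the expensive_check/modify functions are illustrative placeholders, not anything from the question:

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

// Illustrative shared state and operations (placeholders only).
boost::shared_mutex mtx;
int state = 0;

bool expensive_check(int s) { return s % 1000 == 0; } // rarely true, slow in reality
void modify(int& s)         { ++s; }

void writer()
{
    // Only one thread at a time may hold the upgrade lock,
    // but shared (read) locks may coexist with it.
    boost::upgrade_lock<boost::shared_mutex> upgradable(mtx);

    if (expensive_check(state))   // read-only phase: readers are not blocked
    {
        // Atomic upgrade: blocks until all shared locks are released, and no
        // other writer can sneak in between the check and the modification.
        boost::upgrade_to_unique_lock<boost::shared_mutex> exclusive(upgradable);
        modify(state);            // exclusive phase
    }
}

void reader()
{
    boost::shared_lock<boost::shared_mutex> read(mtx);
    // ... read state concurrently with other readers and the checking writer ...
}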
When using a shared_mutex, there is exclusive access and shared access. Exclusive access only allows one thread to access the resource while others are blocked until the thread holding the lock releases it. Shared access is when multiple threads are allowed to access the resource, but under a "read lock". What is a "read lock"? I don't understand the meaning of a read lock. Can someone give code examples of a "read lock"?
My guess of a read lock: I thought that a read lock only allows threads to run the code and not modify anything. Apparently I was wrong when I tried some code and it didn't work as I thought.
Again, can someone help me understand what a read lock is.
Thanks.
Your guess is very close to being right but there's a slight subtlety.
The read/write distinction shows intent of the locker, it does not limit what the locker actually does. If a read-locker decides to modify the protected resource, it can, and hilarity will ensue. However, that's no different to a thread modifying the protected resource with no lock at all. The programmer must follow the rules, or pay the price.
As to the distinction, a read lock is for when you only want to read the shared resource. A billion different threads can do this safely because the resource is not changing while they do it.
A write lock, on the other hand, means you want to change the shared resource, so there should be no readers (or other writers) with an active lock.
It's a way of optimising locks based on usage. For example, consider the following lock queue, with the earliest entries on the right, and no queue jumping (readers jumping in front of writers if the writer is currently blocked) allowed:
(queue here) -> RR W RRRR -> (action here)
The "resource controller" can allow all four R lockers at the queue head access at the same time.
The W locker will then have to wait until all of those locks are released before gaining access.
The final two R lockers must wait until the W locker releases but can then both get access concurrently.
In other words, the possible lock holders at any given time are (mutually exclusive):
any number of readers.
one writer.
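To make the read/write distinction concrete, here is a minimal sketch using C++17's std::shared_mutex; the names shared_data, read_data and write_data are purely illustrative:

#include <shared_mutex>
#include <mutex>
#include <string>

std::shared_mutex mtx;
std::string shared_data;                       // the protected resource

std::string read_data()                        // takes a "read lock"
{
    std::shared_lock<std::shared_mutex> lock(mtx);   // shared ownership: any
    return shared_data;                              // number of readers at once
}

void write_data(const std::string& value)      // takes a "write lock"
{
    std::unique_lock<std::shared_mutex> lock(mtx);   // exclusive ownership: waits
    shared_data = value;                             // for all readers and writers
}

Note that nothing stops a buggy function from modifying shared_data while holding only the shared_lock; as explained above, the distinction expresses intent, and it is up to the programmer to respect it.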
As an aside, there are other possible optimisation strategies you can use as well.
For example (as alluded to earlier), you may want a reader to be able to jump the queue if it looks like the writer would have to wait a while anyway (such as there being a large number of current readers).
That's fraught with danger since you could starve a write locker, but it can be done provided there are limits on the number of queue jumpers.
I have done it in the past. From memory (a long ago memory of an embedded OS built in BCPL for the 6809 CPU, which shows more of my age than I probably want to), the writer included with its lock request the maximum number of queue jumpers it was happy with (defaulting to zero), and the queueing of future readers took that into account.
A write-lock only allows one lock to be acquired at a time.
A read-lock only allows other read-locks to be acquired at the same time, but prevents write-locks.
If you try to acquire a write-lock, you will be forced to wait until every other lock (read or write) has been released. Once you have the write-lock, no one else can get a lock of either kind.
If you try to acquire a read-lock, you will have to wait for any write-lock to be released. If there is no write-lock, you will obtain the read-lock, regardless how many other threads have a read-lock.
Whether or not you modify data is entirely up to you to police. If you have a read-lock, you should not be modifying shared data.
A read lock is just another way of saying that a thread of execution has acquired shared ownership of the mutex. It says nothing about what that thread can actually do.
It is up to the programmer not to update whatever shared resource is being protected by the mutex under a shared lock. To do that, acquire an exclusive lock.
How is C++ mutex implemented under the hood? Does it only use Dekker's, Peterson's or other algorithms to enforce mutual exclusion, or does it also use hardware support such as interrupts and compare-and-swap (CAS)?
Is it possible to implement an entire multi-threading library consisting of mutex and condition variable without any hardware support ?
The standard requires that there are atomic operations. Not necessarily CAS, but at least exchange. std::atomic_flag is required to be truly atomic; CAS is superfluous for it, a simple exchange (test-and-set) is satisfactory.
Note that any algorithm such as Dekker's, Peterson's or others would at least require atomic load and atomic store, and they will not work if load and store are non-atomic. Also, non-atomic types do not enforce the memory ordering that those algorithms rely on.
It is not really required that std::mutex be based on operating system calls, or that there are operating system calls at all.
In theory, std::mutex could be a spinning-only mutex.
It could also be a blocking-only mutex.
In practice, a good implementation would first try to acquire the lock with atomic operations, and only resort to an OS wait when it actually has to block.
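As an illustration of the "exchange is enough" point, here is a minimal spin-only mutex built on std::atomic_flag. This is only a sketch; it is not how any particular standard library implements std::mutex, which would additionally fall back to an OS wait under contention:

#include <atomic>

// Minimal spin-only "mutex" built from std::atomic_flag (test-and-set, i.e.
// an atomic exchange). Illustrative only: it busy-waits instead of blocking.
class spin_mutex {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire ordering: the critical section cannot be reordered above the lock.
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a real implementation would spin briefly, then block in the OS
        }
    }
    void unlock() {
        flag.clear(std::memory_order_release);  // release ordering publishes our writes
    }
};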
It asks the operating system to do it.
On Linux that'll probably be using pthreads (How does pthread implemented in linux kernel 3.2?).
On Windows it's with the Windows API. You can try asking Microsoft about that one.
The standard library implementation won't be making direct hardware access. This is what operating systems are for.
On Linux, when the mutex is uncontended, the lock and unlock happen in user-space only using atomic instructions, such as compare-and-swap. In a contended case a call into the kernel is required to wait on the mutex till it's been unlocked (on lock) or to wake up waiters (on unlock). Locking user-space mutexes does not disable interrupts.
Here is the source code for low-level mutex primitives in glibc, the comments are instructive:
/* Low-level locks use a combination of atomic operations (to acquire and
release lock ownership) and futex operations (to block until the state
of a lock changes). A lock can be in one of three states:
0: not acquired,
1: acquired with no waiters; no other threads are blocked or about to block
for changes to the lock state,
>1: acquired, possibly with waiters; there may be other threads blocked or
about to block for changes to the lock state.
We expect that the common case is an uncontended lock, so we just need
to transition the lock between states 0 and 1; releasing the lock does
not need to wake any other blocked threads. If the lock is contended
and a thread decides to block using a futex operation, then this thread
needs to first change the state to >1; if this state is observed during
lock release, the releasing thread will wake one of the potentially
blocked threads.
Much of this code takes a 'private' parameter. This may be:
LLL_PRIVATE: lock only shared within a process
LLL_SHARED: lock may be shared across processes.
Condition variables contain an optimization for broadcasts that requeues
waiting threads on a lock's futex. Therefore, there is a special
variant of the locks (whose name contains "cond") that makes sure to
always set the lock state to >1 and not just 1.
Robust locks set the lock to the id of the owner. This allows detection
of the case where the owner exits without releasing the lock. Flags are
OR'd with the owner id to record additional information about lock state.
Therefore the states of robust locks are:
0: not acquired
id: acquired (by user identified by id & FUTEX_TID_MASK)
The following flags may be set in the robust lock value:
FUTEX_WAITERS - possibly has waiters
FUTEX_OWNER_DIED - owning user has exited without releasing the futex. */
And pthread_mutex_lock code.
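For illustration, here is a rough sketch of the 0 / 1 / >1 scheme the comment above describes, in the spirit of Drepper's "Futexes Are Tricky". It is not the actual glibc code; the futex wrapper, the cast from std::atomic<int> to int*, and the memory orderings are simplifying assumptions:

#include <atomic>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

class futex_mutex {
    // 0: free, 1: locked with no waiters, 2: locked, possibly with waiters.
    std::atomic<int> state{0};

    static void futex_call(std::atomic<int>* addr, int op, int val) {
        // Assumes std::atomic<int> is layout-compatible with int (true in practice).
        syscall(SYS_futex, reinterpret_cast<int*>(addr), op, val, nullptr, nullptr, 0);
    }

public:
    void lock() {
        int c = 0;
        // Fast path: 0 -> 1 with a single user-space CAS, no kernel call.
        if (state.compare_exchange_strong(c, 1, std::memory_order_acquire))
            return;
        // Slow path: mark the lock contended (state 2) and sleep in the kernel.
        if (c != 2)
            c = state.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            futex_call(&state, FUTEX_WAIT_PRIVATE, 2);   // sleep while state == 2
            c = state.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        // If the state was 2, there may be waiters: wake one of them.
        if (state.exchange(0, std::memory_order_release) == 2)
            futex_call(&state, FUTEX_WAKE_PRIVATE, 1);
    }
};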
Mutex implementation requires multiple threads to agree on the value of a shared variable - the mutex, whether it is locked or unlocked. While mutual exclusion can in principle be built from only atomic loads and stores (e.g. Peterson's algorithm), practical mutex implementations rely on an atomic read-modify-write operation such as exchange or compare-and-swap, which lets a thread observe and update the lock word in a single step.
I found out that Windows implements a slim reader-writer lock (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa904937%28v=vs.85%29.aspx ). Unfortunately (for me) this rw-lock is neither FIFO nor fair (in any sense).
Is there a way, with some workaround, to make the Windows rw-lock fair or FIFO?
If not, in which scenarios would you use the windows slim rw-lock?
It is unlikely you can change the slim lock itself to be fair, especially since the documentation doesn't indicate any method of doing so, and most locks today are unfair for performance reasons.
That said, it is fairly straightforward to roll your own approximately-FIFO lock that is still very slim, using Windows events and a 64-bit control word that you manipulate with compare-and-swap. Here's an outline:
The state of the lock is reflected in the control word, which is manipulated atomically to transition between the states; this allows threads to enter the lock (if allowed) with a single atomic operation and no kernel switch (that's the performance part of "slim"). The events are used to notify waiting threads when threads need to block, and can be allocated on demand (that's the low memory footprint of "slim").
The lock control word has the following states:
(1) Free - no readers or writers, and no waiters. Any thread can acquire the lock for reading or writing by atomically transitioning the lock into state (2) or (4).
(2) N readers in the lock. There are N readers in the lock at the moment. New readers can immediately acquire the lock by adding 1 to the count - use a field of 30 bits or so within the control word to represent this count. Writers must block (perhaps after spinning). When readers leave the lock, they decrement the count, which may transition to state (1) when the last reader leaves (although they don't need to do anything special in a (2) -> (1) transition).
(3) State (2) + waiting writers + 0 or more waiting readers. In this state, there are 1 or more readers still in the lock, but at least one waiting writer. The writers should wait on an auto-reset event, which is designed, although not guaranteed, to release waiters in FIFO order. There is a field in the control word to indicate how many writers are waiting. In this state, new readers that want to enter the lock cannot, and set a reader-waiting bit instead, and block on the reader-waiting event. New writers increment the waiting-writer count and block on the writer-waiting event. When the last reader leaves (setting the reader-count field to 0), it signals the writer-waiting event, releasing the longest-waiting writer to enter the lock.
(4) Writer in the lock. When a writer is in the lock, all readers queue up and wait on the reader-waiting event. All incoming writers increment the waiting-writer count and queue up as usual on the writer-waiting event. There may even be some waiting readers when the writer acquires the lock because of state (3) above, and these are treated identically. When the writer leaves the lock, it checks for waiting writers and readers and either unblocks a writer or all readers, depending on policy, discussed below.
All the state transitions discussed above are done atomically using compare-and-swap. The typical pattern is that any of the lock() or unlock() calls look at the control word, determine what state they are in and what transition needs to happen (following the rules above), calculate the new control word in a temporary then attempt to swap in the new control word with compare-and-swap. Sometimes that attempt fails because another thread concurrently modified the control word (e.g., another reader entered the lock, incrementing the reader count). No problem, just start over from "determine state..." and try again. Such races are rare in practice since the state word calculation is very short, and that's just how things work with CAS-based complex locks.
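As a sketch of that pattern, here is what the reader-entry fast path might look like. The control-word layout, the constant names and the function name below are purely illustrative assumptions, not the real SRW lock or any particular implementation:

#include <atomic>
#include <cstdint>

// Illustrative control-word layout:
//   bits  0..29 : count of readers currently in the lock
//   bit   30    : a writer currently holds the lock
//   bits 31..63 : waiting-writer count (non-zero means new readers must queue up)
std::atomic<uint64_t> ctrl{0};

constexpr uint64_t READER_ONE      = 1;
constexpr uint64_t WRITER_HELD     = uint64_t(1) << 30;
constexpr uint64_t WRITERS_WAITING = ~uint64_t(0) << 31;

// The common reader fast path: one CAS, no kernel call.
bool try_enter_reader()
{
    uint64_t old = ctrl.load(std::memory_order_relaxed);
    for (;;) {
        // A writer is in the lock, or writers are queued: take the slow path
        // instead (set the reader-waiting bit and block on the reader event).
        if (old & (WRITER_HELD | WRITERS_WAITING))
            return false;
        // Compute the new control word and try to swap it in; on failure 'old'
        // is refreshed with the current value and we simply recompute and retry.
        if (ctrl.compare_exchange_weak(old, old + READER_ONE,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed))
            return true;   // we are now counted as a reader in the lock
    }
}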
This lock design is "slim" in almost every sense. Performance-wise, it is near the top of what you can get for a general purpose design. In particular, the common fast paths of (a) a reader entering the lock with 0 or more readers already in the lock, (b) a reader leaving the lock with 0 or more readers still in the lock and (c) a writer entering/leaving an uncontended lock are all about as fast as possible in the usual case: a single atomic operation. Furthermore, the reader entry and exit paths are "lock free" in the sense that incoming readers do not temporarily take a mutex internal to the rwlock, manipulate state, and then unlock it while entering/leaving the lock. That approach is slow and subject to issues whenever a reader thread performs a context switch at the critical moment while it holds the internal lock. Such approaches do not scale to heavy reader activity with a short rwlock critical section: even though multiple readers can, in theory, enter the critical section, they all bottleneck on entering and leaving the internal lock (which happens twice for every enter/exit operation) and performance is worse than a normal mutex!
It is also lightweight in that it only needs a couple of Windows Event objects, and these objects can be allocated on demand - they are only needed when contention occurs and a state transition that requires blocking is about to occur. That's how CRITICAL_SECTION objects work.
The lock above is fair in the sense that readers won't starve writers, and writers are served in FIFO order. How writers interact with waiting readers is up to your policy for who to unblock when the lock becomes free after a writer unlocks and there are both waiting readers and writers. One simple policy is to unblock all waiting readers.
In this design, writers will alternate in FIFO order with FIFO batches of readers. Writers are FIFO relative to other writers, and reader batches are FIFO relative to other reader batches, but the relationship between writers and readers isn't exactly FIFO: because all incoming readers are added to the same reader-waiting set, in the case that there are already several waiting writers, arriving readers all go into the next "batch" to be released, which actually puts them ahead of writers that are already waiting. That's quite reasonable though: readers all go at once, so adding more readers to the batch doesn't necessarily cost much and probably increases efficiency, and if you served every thread in strict FIFO order, the lock would reduce in behavior to a simple mutex under contention.
Another possible design is to always unblock writers if any are waiting. This favors writers at the expense of readers and does mean that a never-ending stream of writers could block out readers indefinitely. This approach makes sense where you know your writes are latency sensitive and you either aren't worried about reader starvation, or you know it can't occur due to the design of your application (e.g., because there is only one possible writer at a time).
Beyond that, there are a lot of other policies possible, such as favoring writers up until readers have been waiting for a certain period, or limiting reader batch sizes, or whatever. They are mostly possible to implement efficiently since the bookkeeping is generally limited to the slow paths where threads will block anyway.
I've glossed over some implementation details and gotchas here (in particular, the need to be careful when making the transitions that involve blocking to avoid "missed wakeup" issues) - but this definitely works. I've written such a lock before the slim rwlock existed to fill the need for a fast high-performance rwlock, and it performs very well. Other tradeoffs are possible too, e.g., for designs in which reads are expected to dominate, contention can be reduced by splitting up the control word across cache lines, at the cost of more expensive write operations.
One final note - this lock is a bit fatter, in memory use, than the Windows one in the contended case, because it allocates one or two Windows Events per lock, while the slim lock avoids this. The slim lock likely manages that by directly supporting the slim lock behavior in the kernel, so the control word can directly be used as part of the kernel-side waitlist. You can't reproduce that exactly, but you can still remove the per-lock overhead in another way: use thread-local storage to allocate your two events per thread rather than per lock. Since a thread can only be waiting on one lock at a time, you only need this structure once per thread. That brings it into line with the slim lock in memory use (unless you have very few locks and a ton of threads).
this rw-lock is neither fifo nor is it fair
I wouldn't expect anything to do with threading to be "fair" or "FIFO" unless it says so explicitly. In this case, I would expect write locks to take priority, since a writer might never be able to obtain the lock if there are a lot of reading threads, but then I also wouldn't assume that's the case, and I've not read the MS documentation for a better understanding.
That said, it sounds like your issue is that you have a lot of contention on the lock, caused by write threads, because otherwise your readers would always be able to share the lock. You should probably examine why your writing thread(s) are trying to hold the lock for so long; buffering up adds, for example, will help mitigate this.
I have a managed shared memory segment which has a boost::interprocess::interprocess_mutex and a boost::interprocess::interprocess_condition variable. I have 2 processes accessing the shared memory, and they synchronize access based on the mutex and condition. I have come across a case where my first process blocks in the notify_all method. Initially I thought this was a non-blocking method, but it seems the interprocess condition uses an internal mutex to synchronize itself.
The case where I get this deadlock is when process 2 is killed ungracefully while it is waiting on the condition; this, I believe, prevents the condition's internal mutex from being unlocked, and then when I run process 2 again it blocks. Is there any way to reset or clean up the interprocess condition the second time I start process 2?
http://www.boost.org/doc/libs/1_48_0/boost/interprocess/sync/interprocess_mutex.hpp
Are you using timed lock?
For a simple dead-lock avoidance algorithm take a look here: Wikipedia
It's for threads but I believe it can be used with interprocess_locks.
Recursively, only one thread is allowed to pass through a lock. If other threads enter the lock, they must wait until the initial thread that passed through completes n number of times. But if the number of threads that enter locking equal the number that are locked, assign one thread as the super-thread, and only allow it to run (tracking the number of times it enters/exits locking) until it completes.
After a super-thread is finished, the condition changes back to using the logic from the recursive lock, and the exiting super-thread
sets itself as not being a super-thread
notifies the locker that other locked, waiting threads need to re-check this condition
If a deadlock scenario exists, set a new super-thread and follow that logic. Otherwise, resume regular locking.
Note that the above algorithm doesn't solve livelock situations, to prevent such behaviour use a semaphore if possible.
I was stunned to notice that interprocess_mutex doesn't implement deadlock-avoidance algorithms, since these days most mutexes, e.g. std::mutex and boost::mutex, already do.
I guess it's due to OS-specific limitations.
For more flexibility try using a named_upgradable_mutex
Use a timed lock, catch the timeout when the other process crashes, and remove the upgradable mutex! This type also allows elevated privileges to be obtained by either thread!
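As a sketch of that suggestion (using the timed_lock return value to detect the timeout, and an illustrative mutex name "my_app_mutex" that is not from the question), one recovery path might look like this:

#include <boost/interprocess/sync/named_upgradable_mutex.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

using namespace boost::interprocess;

int main()
{
    named_upgradable_mutex mtx(open_or_create, "my_app_mutex");

    // If the other process died while holding the mutex, a plain lock() would
    // block forever. A timed lock lets us detect that and clean up instead.
    boost::posix_time::ptime deadline =
        boost::posix_time::microsec_clock::universal_time() +
        boost::posix_time::seconds(3);

    if (!mtx.timed_lock(deadline)) {
        std::cerr << "mutex seems abandoned - removing and starting over\n";
        named_upgradable_mutex::remove("my_app_mutex");
        // ... recreate the mutex and reinitialise the shared state as needed ...
        return 1;
    }

    // ... critical section ...
    mtx.unlock();
    return 0;
}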
Suppose there are N threads (readers) waiting on a condition variable, which are notified by another thread (the producer). Now all N readers will proceed by trying to acquire the lock they refer to, one at a time. Now suppose the producer wants to lock the same mutex again, for some reason, before any of those woken readers has even started to acquire the lock. By the standard, is there any guarantee that the producer will successfully (try to) enter its critical section only after all the notified readers have started their locking step?
There is no guarantee about scheduling, other than the rather fuzzy one given at §1.10 paragraph 2:
Implementations should ensure that all unblocked threads eventually make progress. [ Note: Standard library functions may silently block on I/O or locks. Factors in the execution environment, including externally-imposed thread priorities, may prevent an implementation from making certain guarantees of forward progress. —end note ]
If you want to make sure no reader acquires the lock before the producer, you can simply acquire the lock before notifying.
If you want to make sure the producer can only acquire the lock after all the readers are done with it you need some more complex synchronization, possibly involving some kind of counter.
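A minimal sketch of the first suggestion, where the producer holds the mutex while notifying so that no reader can acquire it before the producer finishes its critical section; the ready flag and the function names are illustrative:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool ready = false;                       // illustrative condition

void producer()
{
    std::unique_lock<std::mutex> lock(m);
    ready = true;
    cv.notify_all();                      // notified readers wake up, but each must
                                          // still acquire 'm' before wait() returns
    // ... the producer's own critical section continues here; no reader can
    // obtain the mutex until this scope ends and the lock is released ...
}

void reader()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, []{ return ready; });   // wait() re-acquires the mutex on wakeup
    // ... read the shared data ...
}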