goroutine blocks when calling RWMutex RLock twice after an RWMutex Unlock

goroutine blocks when calling RWMutex RLock twice after an RWMutex Unlock - concurrency

var mu sync.RWMutex
go func() {
mu.RLock()
defer mu.RUnlock()
mu.RLock() // In my real scenario this second lock happened in a nested function.
defer mu.RUnlock()
// More code.
}()
mu.Lock()
mu.Unlock() // The goroutine above still hangs.
If a function read-locks a read/write mutex twice, while another function write-locks and then write-unlocks that same mutex, the original function still hangs.
Why is that? Is it because there's a serial order in which mutexes allow code to execute?
I've just solved a scenario like this (which took me hours to pinpoint) by removing the second mu.RLock() line.

This is one of several standard behaviours for a read-write lock. What Wikipedia calls "Write-preferring RW locks".
The documentation for sync's RWMutex.Lock says:
To ensure that the lock eventually becomes available, a blocked Lock call excludes new readers from acquiring the lock.
Otherwise a series of readers that each acquired the read lock before the previous released it could starve out writes indefinitely.
This means that it is always unsafe to call RLock on a RWMutex that the same goroutine already has read locked. (Which by the way is also true of Lock on regular mutexes as well, as Go's mutexes do not support recursive locking.)
The reason it is unsafe is that if the goroutine ever blocks getting the second read lock (due to a blocked writer) it will never release the first read lock. This will cause every future lock call on the mutex to block forever, deadlocking part or all of the program. Go will only detect a deadlock if all goroutines are blocked.

Related

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, []() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do need some memory fence or lock before this
// to ensure that new non-zero values will be visible
// in other threads immediately after waking up?
Doesn't notify_all() store some atomic value therefore enforcing memory ordering? I did not clarified it.
UPD: according to Superlokkus' answer and an answer here: we have to acquire a lock to ensure memory writes visibility in other threads (memory propagation), otherwise threads in my case may read zero values.
Also I missed this quote here about condition_variable, which specifically answers my question. Even an atomic variable has to be modified under a lock in a case when the modification must become visible immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.

I guess you are mixing up memory ordering of so called atomic values and the mechanisms of classic lock based synchronization.
When you have a datum which is shared between threads, lets say an int for example, one thread can not simply read it while the other thread might be write to it meanwhile. Otherwise we would have a data race.
To get around this for long time we used classic lock based synchronization:
The threads share at least a mutex and the int. To read or to write any thread has to hold the lock first, meaning they wait on the mutex. Mutexes are build so that they are fine that this can happen concurrently. If a thread wins gettting the mutex it can change or read the int and then should unlock it, so others can read/write too. Using a conditional variable like you used is just to make the pattern "readers wait for a change of a value by a writer" more efficient, they get woken up by the cv instead of periodically waiting on the lock, reading, and unlocking, which would be called busy waiting.
So because you hold the lock in any after waiting on the mutex or in you case, correctly (mutex is still needed) waiting on the conditional variable, you can change the int. And readers will read the new value after the writer was able to wrote it, never the old. UPDATE: However one thing if have to add, which might also be the cause of confusion: Conditional variables are subject for so called spurious wakeups. Meaning even though you write did not have notified any thread, a read thread might still wake up, with the mutex locked. So you have to check if you writer actually waked you up, which is usually done by the writer by changing another datum just to notify this, or if its suitable by using the same datum you already wanted to share. The lambda parameter overload of std::condition_variable::wait was just made to make the checking and going back to sleep code looking a bit prettier. Based on your question now I don't know if you want to use you values for this job.
However at snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex based patters have some drawbacks and are slow, only one thread at a time can do something. This is were so called atomics, like std::atomic<int> came into play. They can be written and read at the same time without an mutex by multiple threads concurrently. Memory ordering is only a thing to consider there and an optimization for cases where you uses several of them in a meaningful way or you don't need the "after the write, I never see the old value" guarantee. However with it's default memory ordering memory_order_seq_cst you would also be fine.

What does std::mutex prevent threads from modifying?

What part of memory gets locked by mutex when .lock() or .try_lock(), is it just the function or is it the whole program that gets locked?

Nothing is locked except the mutex. Everything else continues running (until it tries to lock an already locked mutex that is). The mutex is only there so that two threads cannot run the code between a mutex lock and a mutex unlock at the same time.

A mutex doesn't really lock anything, except for itself. You can think of a mutex as being a gate where you can only unlock it from the inside. When the gate is locked, any thread that tries to lock the mutex will sit there at the gate and wait for the current thread that is behind the gate to unlock it and let them in. When they gate is not locked then when you call lock you can just go in, close and lock the gate, and now no threads can get past the gate until you unlock it and let them in.

A mutex doesn't lock anything. You just use a mutex to communicate to other parts of your code that they should consider whatever you decide needs to be protected from access by several threads at the same time to be off-limits for now.
You could think of a mutex as something like a boolean okToModify. Whenever you want to edit something, you check if okToModify is true. If it is, you set it to false (preventing any other threads from modifying it), change it, then set okToModify back to true to tell the other threads you're done and give them a chance to modify:
// WARNING! This code doesn't actually work as a lock!
// it is just an example of the concept.
struct LockedInt {
bool okToModify; // This would be your mutex instead of a bool.
int integer;
};
struct LockedInt myLockedInt = { true, 0 };
...
while (myLockedInt.okToModify == false)
; // wait doing nothing until whoever is modifying the int is done.
myLockedInt.okToModify = false; // Prevent other threads from getting out of while loop above.
myLockedInt.integer += 1;
myLockedInt.okToModify = true; // Now other threads get out of the while loop if they were waiting and can modify.
The while loop and okToModify = false above is basically what locking a mutex does, and okToModify = true is what unlocking a mutex does.
Now, why do we need mutexes and don't use booleans? Because a thread could be running at the same time as those three lines above. The code for locking a mutex actually guarantees that the waiting for okToModify to become true and setting okToModify = false happen in one go, and therefore no other thread can get "in between the lines", for example by using a special machine-code instruction called "compare-and-exchange".
So do not use booleans instead of mutexes, but you can think of a mutex as a special, thread-safe boolean.

m.lock() doesn't really lock anything. What it does is, it waits to take ownership of the mutex. A mutex always either is owned by exactly one thread or else it is available. m.lock() waits until the mutex becomes available, and then it takes ownership of it in the name of the calling thread.
m.unlock releases the mutex (i.e., it relinquishes ownership), causing the mutex to once again become available.
Mutexes also perform another very important function. In modern C++, when some thread T performs a sequence of assignments of various values to various memory locations, the system makes no guarantees about when other threads U, V, and W will see those assignments, whether the other threads will see the assignments happen in the same order in which thread T performed them, or even, whether the other threads will ever see the assignments.
There are some quite complicated rules governing things that a programmer can do to ensure that different threads see a consistent view of shared memory objects (Google "C++ memory model"), but here's one simple rule:
Whatever thread T did before it releases some mutex M is guaranteed to be visible to any other thread U after thread U subsequently locks the same mutex M.

lock guard - will it queue multiple request

I have a four member functions that can be called multiple times asynchronously from other piece of code - but since these functions are making use of its class member variables, I need to ensure that until one call execution is not over the second should not start but be in queue.
I have heard of lock guard feature in C++ that make a code block - in my case as automatic lock for a duration for a function :
void DoSomeWork()
{
std::lock_guard<std::mutex> lg(m); // Lock will be held from here to end of function
--------;
return;
}
Since my four class methods do independent work should I have four mutex one for each lock guard for each member function. Will the async calls made be in some sort of queue if a lock guard is active?
I mean if there are say 10 calls made to that member method at same time - so once 1st call acquires the lock guard the remaining 9 call request will wait until lock is free and take up execution one by one?

If a mutex is locked, the next request to lock it will block until the the previous thread holding the lock has unlocked it.
Note that attempting to lock a mutex multiple times from a single thread is undefined behavior. Don't do that.
For more information see e.g. this std::mutex reference.

Assuming you mean multiple threads issuing locks for the same mutex, based on prior questions, there's no queuing for pthreads or posix synchronization types. Say multiple threads each have a loop that starts with a lock and ends with an unlock, looping right back to the lock request, in which case the same thread can keep getting the lock, and none of the other threads will run (there's a very small chance that a time slice could occur between the unlock and lock, switching context to another thread). Using conditional variables also have an issue with spurious wakeup.
https://en.wikipedia.org/wiki/Spurious_wakeup
Based on testing, Windows native synchronization types, (CreateMutex, CreateSemaphore, WaitForSingleObject, WaitForMultipleObjects) do queue requests, but I haven't found it documented.
Some server applications on some operating systems will install a device driver that runs at kernel level in order to workaround the limitations of synchronization types on those operating systems.

unlock the mutex after condition_variable::notify_all() or before?

Looking at several videos and the documentation example, we unlock the mutex before calling the notify_all(). Will it be better to instead call it after?
The common way:
Inside the Notifier thread:
//prepare data for several worker-threads;
//and now, awaken the threads:
std::unique_lock<std::mutex> lock2(sharedMutex);
_threadsCanAwaken = true;
lock2.unlock();
_conditionVar.notify_all(); //awaken all the worker threads;
//wait until all threads completed;
//cleanup:
_threadsCanAwaken = false;
//prepare new batches once again, etc, etc
Inside one of the worker threads:
while(true){
// wait for the next batch:
std::unique_lock<std::mutex> lock1(sharedMutex);
_conditionVar.wait(lock1, [](){return _threadsCanAwaken});
lock1.unlock(); //let sibling worker-threads work on their part as well
//perform the final task
//signal the notifier that one more thread has completed;
//loop back and wait until the next task
}
Notice how the lock2 is unlocked before we notify the condition variable - should we instead unlock it after the notify_all() ?
Edit
From my comment below: My concern is that, what if the worker spuriously awakes, sees that the mutex is unlocked, super-quickly completes the task and loops back to the start of while. Now the slow-poke Notifier finally calls notify_all(), causing the worker to loop an additional time (excessive and undesired).

There are no advantages to unlocking the mutex before signaling the condition variable unless your implementation is unusual. There are two disadvantages to unlocking before signaling:
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
There is a performance penalty for unlocking before signaling that is avoided by unlocking after signaling. If you signal before you unlock, a good implementation will know that your signal cannot possibly render any thread ready-to-run because the mutex is held by the calling thread and any thread affects by the condition variable necessarily cannot make forward progress without the mutex. This permits a significant optimization (often called "wait morphing") that is not possible if you unlock first.
So signal while holding the lock unless you have some unusual reason to do otherwise.

should we instead unlock it after the notify_all() ?
It is correct to do it either way but you may have different behavior in different situations. It is quite difficult to predict how it will affect performance of your program - I've seen both positive and negative effects for different applications. So it is better you profile your program and make decision on your particular situation based on profiling.

As mentioned here : cppreference.com
The notifying thread does not need to hold the lock on the same mutex
as the one held by the waiting thread(s); in fact doing so is a
pessimization, since the notified thread would immediately block
again, waiting for the notifying thread to release the lock.
That said, documentation for wait
At the moment of blocking the thread, the function automatically calls
lck.unlock(), allowing other locked threads to continue.
Once notified (explicitly, by some other thread), the function
unblocks and calls lck.lock(), leaving lck in the same state as when
the function was called. Then the function returns (notice that this
last mutex locking may block again the thread before returning).
so when notified wait will re-attempt to gain the lock and in that process it will get blocked again till original notifying thread releases the lock.
So I'll suggest that release the lock before calling notify. As done in example on cppreference.com and most importantly
Don't be Pessimistic.

David's answer seems to me wrong.
First, assuming the simple case of two threads, one waiting for the other on a condition variable, unlocking first by the notifier will not waken the other waiting thread, as the signal has not arrived. Then the notify call will immediately waken the waiting thread. You do not need any special optimizations.
On the other hand, signalling first has the potential of waking up a thread and making it sleep immediately again, as it cannot hold the lock—unless wait morphing is implemented.
Wait morphing does not exist in Linux at least, according to the answer under this StackOverflow question: Which OS / platforms implement wait morphing optimization?
The cppreference example also unlocks first before signalling: https://en.cppreference.com/w/cpp/thread/condition_variable/notify_all
It explicit says:
The notifying thread does not need to hold the lock on the same mutex as the one held by the waiting thread(s). Doing so may be a pessimization, since the notified thread would immediately block again, waiting for the notifying thread to release the lock, though some implementations recognize the pattern and do not attempt to wake up the thread that is notified under lock.

should we instead unlock it after the notify_all() ?
After reading several related posts, I've formed the opinion that it's purely a performance issue. If OS supports "wait morphing", unlock after; otherwise, unlock before.
I'm adding an answer here to augment that of #DavidSchwartz 's. Particularly, I'd like to clarify his point 1.
If you unlock before you signal, the signal may wake a thread that choose to block on the condition variable after you unlocked. This can lead to a deadlock if you use the same condition variable to signal more than one logical condition. This kind of bug is hard to create, hard to diagnose, and hard to understand. It is trivially avoided by always signaling before unlocking. This ensures that the change of shared state and the signal are an atomic operation and that race conditions and deadlocks are impossible.
The 1st thing I said is that, because it's a CV and not a Mutex, a better term for the so-called "deadlock" might be "sleep paralysis" - a mistake some programs make is that
a thread that's supposed to wake
went to sleep due to not rechecking the condition it's been waiting for before wait'ng again.
The 2nd thing is that, when waking some other thread(s),
the default choice should be broadcast/notify_all (broadcast is the POSIX term, which is equivalent to its C++ counterpart).
signal/notify is an optimized special case used for when there's only 1 other thread is waiting.
Finally 3rd, David is adamant that
it's better to unlock after notify,
because it can avoid the "deadlock" which I've been referring to as "sleep paralysis".
If it's unlock then notify, then there's a window where another thread (let's call this the "wrong" thread) may i.) acquire the mutex, ii.)going into wait, and iii.) wake up. The steps i. ii. and iii. happens too quickly, consumed the signal, leaving the intended (let's call it "correct") thread in sleep.
I discussed this extensively with David, he clarified that only when all 3 points are violated ( 1. condvar associated with several separate conditions and/or didn't check it before waiting again; 2. signal/notify only 1 thread when there're more than 1 other threads using the condvar; 3. unlock before notify creating a window for race condition ), the "sleep paralysis" would occur.
Finally, my recommendation is that, point 1 and 2 are essential for correctness of the program, and fixing issues associated with 1 and 2 should be prioritized over 3, which should only be a augmentative "last resort".
For the purpose of providing reference, manpage for signal/broadcast and wait contains some info from version 3 of Single Unix Specification that gave some explanations on point 1 and 2, and partly 3. Although specified for POSIX/Unix/Linux in C, it's concepts are applicable to C++.
As of this writing (2023-01-31), the 2018 edition of version 4 of Single Unix Specification is released, and the drafting of version 5 is underway.

Why do both the notify and wait function of a std::condition_variable need a locked mutex

On my neverending quest to understand std::contion_variables I've run into the following. On this page it says the following:
void print_id (int id) {
std::unique_lock<std::mutex> lck(mtx);
while (!ready) cv.wait(lck);
// ...
std::cout << "thread " << id << '\n';
}
And after that it says this:
void go() {
std::unique_lock<std::mutex> lck(mtx);
ready = true;
cv.notify_all();
}
Now as I understand it, both of these functions will halt on the std::unqique_lock line. Until a unique lock is acquired. That is, no other thread has a lock.
So say the print_id function is executed first. The unique lock will be aquired and the function will halt on the wait line.
If the go function is then executed (on a separate thread), the code there will halt on the unique lock line. Since the mutex is locked by the print_id function already.
Obviously this wouldn't work if the code was like that. But I really don't see what I'm not getting here. So please enlighten me.

What you're missing is that wait unlocks the mutex and then waits for the signal on cv.
It locks the mutex again before returning.
You could have found this out by clicking on wait on the page where you found the example:
At the moment of blocking the thread, the function automatically calls lck.unlock(), allowing other locked threads to continue.
Once notified (explicitly, by some other thread), the function unblocks and calls lck.lock(), leaving lck in the same state as when the function was called.

There's one point you've missed—calling wait() unlocks the mutex. The thread atomically (releases the mutex + goes to sleep). Then, when woken by the signal, it tries to re-acquire the mutex (possibly blocking); once it acquires it, it can proceed.
Notice that it's not necessary to have the mutex locked for calling notify_*, only for wait*

To answer the question as posed, which seems necessary regarding claims that you should not acquire a lock on notification for performance reasons (isn't correctness more important than performance?): The necessity to lock on "wait" and the recommendation to always lock around "notify" is to protect the user from himself and his program from data and logical races. Without the lock in "go", the program you posted would immediately have a data race on "ready". However, even if ready were itself synchronized (e.g. atomic) you would have a logical race with a missed notification, because without the lock in "go" it is possible for the notify to occur just after the check for "ready" and just before the actual wait, and the waiting thread may then remain blocked indefinitely. The synchronization on the atomic variable itself is not enough to prevent this. This is why helgrind will warn when a notification is done without holding the lock. There are some fringe cases where the mutex lock is really not required around the notify. In all of these cases, there needs to be a bidirectional synchronization beforehand so that the producing thread can know for sure that the other thread is already waiting. IMO these cases are for experts only. Actually, I have seen an expert, giving a talk about multi-threading, getting this wrong — he thought an atomic counter would suffice. That said, the lock around the wait is always necessary for correctness (or, at least, an operation that is atomic with the wait), and this is why the standard library enforces it and atomically unlocks the mutex on entering the wait.
POSIX condition variables are, unlike Windows events, not "idiot-proof" because they are stateless (apart from being aware of waiting threads). The recommendation to use a lock on the notify is there to protect you from the worst and most common screwups. You can build a Windows-like stateful event using a mutex + condition var + bool variable if you like, of course.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js