When to use recursive mutex?

When to use recursive mutex? - c++

I understand recursive mutex allows mutex to be locked more than once without getting to a deadlock and should be unlocked the same number of times. But in what specific situations do you need to use a recursive mutex? I'm looking for design/code-level situations.

For example when you have function that calls it recursively, and you want to get synchronized access to it:
void foo() {
... mutex_acquire();
... foo();
... mutex_release();
}
without a recursive mutex you would have to create an "entry point" function first, and this becomes cumbersome when you have a set of functions that are mutually recursive. Without recursive mutex:
void foo_entry() {
mutex_acquire(); foo(); mutex_release(); }
void foo() { ... foo(); ... }

Recursive and non-recursive mutexes have different use cases. No mutex type can easily replace the other. Non-recursive mutexes have less overhead, and recursive mutexes have in some situations useful or even needed semantics and in other situations dangerous or even broken semantics. In most cases, someone can replace any strategy using recursive mutexes with a different safer and more efficient strategy based on the usage of non-recursive mutexes.
If you just want to exclude other threads from using your mutex protected resource, then you could use any mutex type, but might want to use the non-recursive mutex because of its smaller overhead.
If you want to call functions recursively, which lock the same mutex, then they either
have to use one recursive mutex, or
have to unlock and lock the same non-recursive mutex again and again (beware of concurrent threads!) (assuming this is semantically sound, it could still be a performance issue), or
have to somehow annotate which mutexes they already locked (simulating recursive ownership/mutexes).
If you want to lock several mutex-protected objects from a set of such objects, where the sets could have been built by merging, you can choose
to use per object exactly one mutex, allowing more threads to work in parallel, or
to use per object one reference to any possibly shared recursive mutex, to lower the probability of failing to lock all mutexes together, or
to use per object one comparable reference to any possibly shared non-recursive mutex, circumventing the intent to lock multiple times.
If you want to release a lock in a different thread than it has been locked, then you have to use non-recursive locks (or recursive locks which explicitly allow this instead of throwing exceptions).
If you want to use synchronization variables, then you need to be able to explicitly unlock the mutex while waiting on any synchronization variable, so that the resource is allowed to be used in other threads. That is only sanely possible with non-recursive mutexes, because recursive mutexes could already have been locked by the caller of the current function.

I encountered the need for a recursive mutex today, and I think it's maybe the simplest example among the posted answers so far:
This is a class that exposes two API functions, Process(...) and reset().
public void Process(...)
{
acquire_mutex(mMutex);
// Heavy processing
...
reset();
...
release_mutex(mMutex);
}
public void reset()
{
acquire_mutex(mMutex);
// Reset
...
release_mutex(mMutex);
}
Both functions must not run concurrently because they modify internals of the class, so I wanted to use a mutex.
Problem is, Process() calls reset() internally, and it would create a deadlock because mMutex is already acquired.
Locking them with a recursive lock instead fixes the problem.

If you want to see an example of code that uses recursive mutexes, look at the sources for "Electric Fence" for Linux/Unix. 'Twas one of the common Unix tools for finding "bounds checking" read/write overruns and underruns as well as using memory that has been freed, before Valgrind came along.
Just compile and link electric fence with sources (option -g with gcc/g++), and then link it with your software with the link option -lefence, and start stepping through the calls to malloc/free. http://elinux.org/Electric_Fence

It would certainly be a problem if a thread blocked trying to acquire (again) a mutex it already owned...
Is there a reason to not permit a mutex to be acquired multiple times by the same thread?

In general, like everyone here said, it's more about design. A recursive mutex is normally used in a recursive functions.
What others fail to tell you here is that there's actually almost no cost overhead in recursive mutexes.
In general, a simple mutex is a 32 bits key with bits 0-30 containing owner's thread id and bit 31 a flag saying if the mutex has waiters or not. It has a lock method which is a CAS atomic race to claim the mutex with a syscall in case of failure. The details are not important here. It looks like this:
class mutex {
public:
void lock();
void unlock();
protected:
uint32_t key{}; //bits 0-30: thread_handle, bit 31: hasWaiters_flag
};
a recursive_mutex is normally implemented as:
class recursive_mutex : public mutex {
public:
void lock() {
uint32_t handle = current_thread_native_handle(); //obtained from TLS memory in most OS
if ((key & 0x7FFFFFFF) == handle) { // Impossible to return true unless you own the mutex.
uses++; // we own the mutex, just increase uses.
} else {
mutex::lock(); // we don't own the mutex, try to obtain it.
uses = 1;
}
}
void unlock() {
// asserts for debug, we should own the mutex and uses > 0
--uses;
if (uses == 0) {
mutex::unlock();
}
}
private:
uint32_t uses{}; // no need to be atomic, can only be modified in exclusion and only interesting read is on exclusion.
};
As you see it's an entirely user space construct. (base mutex is not though, it MAY fall into a syscall if it fails to obtain the key in an atomic compare and swap on lock and it will do a syscall on unlock if the has_waitersFlag is on).
For a base mutex implementation: https://github.com/switchbrew/libnx/blob/master/nx/source/kernel/mutex.c

If you want to be able to call public methods from different threads inside other public methods of a class and many of these public methods change the state of the object, you should use a recursive mutex. In fact, I make it a habit of using by default a recursive mutex unless there is a good reason (e.g. special performance considerations) not to use it.
It leads to better interfaces, because you don't have to split your implementation among non-locked and locked parts and you are free to use your public methods with peace of mind inside all methods as well.
It leads also in my experience to interfaces that are easier to get right in terms of locking.

Seems no one mentioned it before, but code using recursive_mutex is way easier to debug, since its internal structure contains identifier of a thread holding it.

Related

C++ atomics: how to allow only a single thread to access a function?

I'd like to write a function that is accessible only by a single thread at a time. I don't need busy waits, a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:
std::atomic<bool> busy (false);
bool func()
{
if (m_busy.exchange(true) == true)
return false;
// ... do stuff ...
m_busy.exchange(false);
return true;
}
Is the logic for the atomic exchange correct?
Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.

Your atomic swap implementation might work. But trying to do thread safe programming without a lock is most always fraught with issues and is often harder to maintain.
Unless there's a performance improvement that's needed, then std::mutex with the try_lock() method is all you need, eg:
std::mutex mtx;
bool func()
{
// making use of std::unique_lock so if the code throws an
// exception, the std::mutex will still get unlocked correctly...
std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
bool gotLock = lck.owns_lock();
if (gotLock)
{
// do stuff
}
return gotLock;
}

Your code looks correct to me, as long as you leave the critical section by falling out, not returning or throwing an exception.
You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire. (But does need to be an atomic RMW like exchange or compare_exchange_strong)
Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing is is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.
Your algo is exactly like a spinlock, but without retry if the lock's already taken. (i.e. just a try_lock). All the reasoning about necessary memory-order for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in #selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or whatever, you're never actually blocking i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.
Rolling your own with an atomic<bool> is probably good; using std::mutex here gains you nothing; you want it to be doing only this for try-lock and unlock. That's certainly possible (with some extra function-call overhead), but some implementations might do something more. You're not using any of the functionality beyond that. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire / release, it's easy to get that right yourself.
The usual performance reason to not roll your own locking is that mutex will be tuned for the OS and typical hardware, with stuff like exponential backoff, x86 pause instructions while spinning a few times, then fallback to a system call. And efficient wakeup via system calls like Linux futex. All of this is only beneficial to the blocking behaviour. .try_lock leaves that all unused, and if you never have any thread sleeping then unlock never has any other threads to notify.
There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this. This will make your function exception-safe, making sure to always unlock before exiting, if it got the lock.

C++: Recursive locks - are there any drawbacks?

The background: I have a few threads that should access shared data. One of the threads might lock a Mutex, and within the mutual exclusion block, some functions (of the same thread) might call the very same lock again.
-I don't want to create many Mutexes
-I don't want to give up locking (obviously)
-I'd rather not change the design as it's quite a big change
void funcB()
{
lock(MA);
...
unlock(MA);
}
void funcA()
{
lock(MA);
...
funcB();
...
unlock(MA);
}
It seems the only way to go is - use a recursive lock. Are there any drawbacks to using this feature?
Of course, if you think of any other way to solve this case, please share

are there any drawbacks?
Slight performance penalty - measure if you care.
any other way to solve
You could give funcB a bool should_lock = true argument, or lots of variations on the theme, e.g. have one overload that locks a mutex then calls another overload that expects a reference to an already locked mutex (perhaps use an assert to check it's locked in debug builds): then funcA can call the latter.

std::mutex vs std::recursive_mutex as class member

I have seen some people hate on recursive_mutex:
http://www.zaval.org/resources/library/butenhof1.html
But when thinking about how to implement a class that is thread safe (mutex protected), it seems to me excruciatingly hard to prove that every method that should be mutex protected is mutex protected and that mutex is locked at most once.
So for object oriented design, should std::recursive_mutex be default and std::mutex considered as an performance optimization in general case unless it is used only in one place (to protect only one resource)?
To make things clear, I'm talking about one private nonstatic mutex. So each class instance has only one mutex.
At the beginning of each public method:
{
std::scoped_lock<std::recursive_mutex> sl;

Most of the time, if you think you need a recursive mutex then your design is wrong, so it definitely should not be the default.
For a class with a single mutex protecting the data members, then the mutex should be locked in all the public member functions, and all the private member functions should assume the mutex is already locked.
If a public member function needs to call another public member function, then split the second one in two: a private implementation function that does the work, and a public member function that just locks the mutex and calls the private one. The first member function can then also call the implementation function without having to worry about recursive locking.
e.g.
class X {
std::mutex m;
int data;
int const max=50;
void increment_data() {
if (data >= max)
throw std::runtime_error("too big");
++data;
}
public:
X():data(0){}
int fetch_count() {
std::lock_guard<std::mutex> guard(m);
return data;
}
void increase_count() {
std::lock_guard<std::mutex> guard(m);
increment_data();
}
int increase_count_and_return() {
std::lock_guard<std::mutex> guard(m);
increment_data();
return data;
}
};
This is of course a trivial contrived example, but the increment_data function is shared between two public member functions, each of which locks the mutex. In single-threaded code, it could be inlined into increase_count, and increase_count_and_return could call that, but we can't do that in multithreaded code.
This is just an application of good design principles: the public member functions take responsibility for locking the mutex, and delegate the responsibility for doing the work to the private member function.
This has the benefit that the public member functions only have to deal with being called when the class is in a consistent state: the mutex is unlocked, and once it is locked then all invariants hold. If you call public member functions from each other then they have to handle the case that the mutex is already locked, and that the invariants don't necessarily hold.
It also means that things like condition variable waits will work: if you pass a lock on a recursive mutex to a condition variable then (a) you need to use std::condition_variable_any because std::condition_variable won't work, and (b) only one level of lock is released, so you may still hold the lock, and thus deadlock because the thread that would trigger the predicate and do the notify cannot acquire the lock.
I struggle to think of a scenario where a recursive mutex is required.

should std::recursive_mutex be default and std::mutex considered as an performance optimization?
Not really, no. The advantage of using non-recursive locks is not just a performance optimization, it means that your code is self-checking that leaf-level atomic operations really are leaf-level, they aren't calling something else that uses the lock.
There's a reasonably common situation where you have:
a function that implements some operation that needs to be serialized, so it takes the mutex and does it.
another function that implements a larger serialized operation, and wants to call the first function to do one step of it, while it is holding the lock for the larger operation.
For the sake of a concrete example, perhaps the first function atomically removes a node from a list, while the second function atomically removes two nodes from a list (and you never want another thread to see the list with only one of the two nodes taken out).
You don't need recursive mutexes for this. For example you could refactor the first function as a public function that takes the lock and calls a private function that does the operation "unsafely". The second function can then call the same private function.
However, sometimes it's convenient to use a recursive mutex instead. There's still an issue with this design: remove_two_nodes calls remove_one_node at a point where a class invariant doesn't hold (the second time it calls it, the list is in precisely the state we don't want to expose). But assuming we know that remove_one_node doesn't rely on that invariant this isn't a killer fault in the design, it's just that we've made our rules a little more complex than the ideal "all class invariants always hold whenever any public function is entered".
So, the trick is occasionally useful and I don't hate recursive mutexes to quite the extent that article does. I don't have the historical knowledge to argue that the reason for their inclusion in Posix is different from what the article says, "to demonstrate mutex attributes and thread extensons". I certainly don't consider them the default, though.
I think it's safe to say that if in your design you're uncertain whether you need a recursive lock or not, then your design is incomplete. You will later regret the fact that you're writing code and you don't know something so fundamentally important as whether the lock is allowed to be already held or not. So don't put in a recursive lock "just in case".
If you know that you need one, use one. If you know that you don't need one, then using a non-recursive lock isn't just an optimization, it's helping to enforce a constraint of the design. It's more useful for the second lock to fail, than for it to succeed and conceal the fact that you've accidentally done something that your design says should never happen. But if you follow your design, and never double-lock the mutex, then you'll never find out whether it's recursive or not, and so a recursive mutex isn't directly harmful.
This analogy might fail, but here's another way to look at it. Imagine you had a choice between two kinds of pointer: one that aborts the program with a stacktrace when you dereference a null pointer, and another one that returns 0 (or to extend it to more types: behaves as if the pointer refers to a value-initialized object). A non-recursive mutex is a bit like the one that aborts, and a recursive mutex is a bit like the one that returns 0. They both potentially have their uses -- people sometimes go to some lengths to implement a "quiet not-a-value" value. But in the case where your code is designed to never dereference a null pointer, you don't want to use by default the version that silently allows that to happen.

I'm not going to directly weigh in on the mutex versus recursive_mutex debate, but I thought it would be good to share a scenario where recursive_mutex'es are absolutely critical to the design.
When working with Boost::asio, Boost::coroutine (and probably things like NT Fibers although I'm less familiar with them), it is absolutely essential that your mutexes be recursive even without the design problem of re-entrancy.
The reason is because the coroutine based approach by its very design will suspend execution inside a routine and then subsequently resume it. This means that two top level methods of a class might "be being called at the same time on the same thread" without any sub calls being made.

Is there a facility in boost to allow for write-biased locking?

If I have the following code:
#include <boost/date_time.hpp>
#include <boost/thread.hpp>
boost::shared_mutex g_sharedMutex;
void reader()
{
boost::shared_lock<boost::shared_mutex> lock(g_sharedMutex);
boost::this_thread::sleep(boost::posix_time::seconds(10));
}
void makeReaders()
{
while (1)
{
boost::thread ar(reader);
boost::this_thread::sleep(boost::posix_time::seconds(3));
}
}
boost::thread mr(makeReaders);
boost::this_thread::sleep(boost::posix_time::seconds(5));
boost::unique_lock<boost::shared_mutex> lock(g_sharedMutex);
...
the unique lock will never be acquired, because there are always going to be readers. I want a unique_lock that, when it starts waiting, prevents any new read locks from gaining access to the mutex (called a write-biased or write-preferred lock, based on my wiki searching). Is there a simple way to do this with boost? Or would I need to write my own?

Note that I won't comment on the win32 implementation because it's way more involved and I don't have the time to go through it in detail. That being said, it's interface is the same as the pthread implementation which means that the following answer should be equally valid.
The relevant pieces of the pthread implementation of boost::shared_mutex as of v1.51.0:
void lock_shared()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.exclusive || state.exclusive_waiting_blocked)
{
shared_cond.wait(lk);
}
++state.shared_count;
}
void lock()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.shared_count || state.exclusive)
{
state.exclusive_waiting_blocked=true;
exclusive_cond.wait(lk);
}
state.exclusive=true;
}
The while loop conditions are the most relevant part for you. For the lock_shared function (read lock), notice how the while loop will not terminate as long as there's a thread trying to acquire (state.exclusive_waiting_blocked) or already owns (state.exclusive) the lock. This essentially means that write locks have priority over read locks.
For the lock function (write lock), the while loop will not terminate as long as there's at least one thread that currently owns the read lock (state.shared_count) or another thread owns the write lock (state.exclusive). This essentially gives you the usual mutual exclusion guarantees.
As for deadlocks, well the read lock will always return as long as the write locks are guaranteed to be unlocked once they are acquired. As for the write lock, it's guaranteed to return as long as the read locks and the write locks are always guaranteed to be unlocked once acquired.
In case you're wondering, the state_change mutex is used to ensure that there's no concurrent calls to either of these functions. I'm not going to go through the unlock functions because they're a bit more involved. Feel free to look them over yourself, you have the source after all (boost/thread/pthread/shared_mutex.hpp) :)
All in all, this is pretty much a text book implementation and they've been extensively tested in a wide range of scenarios (libs/thread/test/test_shared_mutex.cpp and massive use across the industry). I wouldn't worry too much as long you use them idiomatically (no recursive locking and always lock using the RAII helpers). If you still don't trust the implementation, then you could write a randomized test that simulates whatever test case you're worried about and let it run overnight on hundreds of thread. That's usually a good way to tease out deadlocks.
Now why would you see that a read lock is acquired after a write lock is requested? Difficult to say without seeing the diagnostic code that you're using. Chances are that the read lock is acquired after your print statement (or whatever you're using) is completed and before state_change lock is acquired in the write thread.

Boost, mutex concept

I am new to multi-threading programming, and confused about how Mutex works. In the Boost::Thread manual, it states:
Mutexes guarantee that only one thread can lock a given mutex. If a code section is surrounded by a mutex locking and unlocking, it's guaranteed that only a thread at a time executes that section of code. When that thread unlocks the mutex, other threads can enter to that code region:
My understanding is that Mutex is used to protect a section of code from being executed by multiple threads at the same time, NOT protect the memory address of a variable. It's hard for me to grasp the concept, what happen if I have 2 different functions trying to write to the same memory address.
Is there something like this in Boost library:
lock a memory address of a variable, e.g., double x, lock (x); So
that other threads with a different function can not write to x.
do something with x, e.g., x = x + rand();
unlock (x)
Thanks.

The mutex itself only ensures that only one thread of execution can lock the mutex at any given time. It's up to you to ensure that modification of the associated variable happens only while the mutex is locked.
C++ does give you a way to do that a little more easily than in something like C. In C, it's pretty much up to you to write the code correctly, ensuring that anywhere you modify the variable, you first lock the mutex (and, of course, unlock it when you're done).
In C++, it's pretty easy to encapsulate it all into a class with some operator overloading:
class protected_int {
int value; // this is the value we're going to share between threads
mutex m;
public:
operator int() { return value; } // we'll assume no lock needed to read
protected_int &operator=(int new_value) {
lock(m);
value = new_value;
unlock(m);
return *this;
}
};
Obviously I'm simplifying that a lot (to the point that it's probably useless as it stands), but hopefully you get the idea, which is that most of the code just treats the protected_int object as if it were a normal variable.
When you do that, however, the mutex is automatically locked every time you assign a value to it, and unlocked immediately thereafter. Of course, that's pretty much the simplest possible case -- in many cases, you need to do something like lock the mutex, modify two (or more) variables in unison, then unlock. Regardless of the complexity, however, the idea remains that you centralize all the code that does the modification in one place, so you don't have to worry about locking the mutex in the rest of the code. Where you do have two or more variables together like that, you generally will have to lock the mutex to read, not just to write -- otherwise you can easily get an incorrect value where one of the variables has been modified but the other hasn't.

No, there is nothing in boost(or elsewhere) that will lock memory like that.
You have to protect the code that access the memory you want protected.
what happen if I have 2 different functions trying to write to the same
memory address.
Assuming you mean 2 functions executing in different threads, both functions should lock the same mutex, so only one of the threads can write to the variable at a given time.
Any other code that accesses (either reads or writes) the same variable will also have to lock the same mutex, failure to do so will result in indeterministic behavior.

It is possible to do non-blocking atomic operations on certain types using Boost.Atomic. These operations are non-blocking and generally much faster than a mutex. For example, to add something atomically you can do:
boost::atomic<int> n = 10;
n.fetch_add(5, boost:memory_order_acq_rel);
This code atomically adds 5 to n.

In order to protect a memory address shared by multiple threads in two different functions, both functions have to use the same mutex ... otherwise you will run into a scenario where threads in either function can indiscriminately access the same "protected" memory region.
So boost::mutex works just fine for the scenario you describe, but you just have to make sure that for a given resource you're protecting, all paths to that resource lock the exact same instance of the boost::mutex object.

I think the detail you're missing is that a "code section" is an arbitrary section of code. It can be two functions, half a function, a single line, or whatever.
So the portions of your 2 different functions that hold the same mutex when they access the shared data, are "a code section surrounded by a mutex locking and unlocking" so therefore "it's guaranteed that only a thread at a time executes that section of code".
Also, this is explaining one property of mutexes. It is not claiming this is the only property they have.

Your understanding is correct with respect to mutexes. They protect the section of code between the locking and unlocking.
As per what happens when two threads write to the same location of memory, they are serialized. One thread writes its value, the other thread writes to it. The problem with this is that you don't know which thread will write first (or last), so the code is not deterministic.
Finally, to protect a variable itself, you can find a near concept in atomic variables. Atomic variables are variables that are protected by either the compiler or the hardware, and can be modified atomically. That is, the three phases you comment (read, modify, write) happen atomically. Take a look at Boost atomic_count.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js