I'm writing thread-safe class in C++. All of its public methods use locks (non-recursive spin locks). Private methods are lock-free. So, everything should be OK: user calls public method, it locks object and then does the work through private methods. But I got dead lock when a public method calls another public method. I've read that recursive mutexes are bad, cause it's difficult to debug them. So I use C's stdio way: public method Foo() only locks the object and calls Foo_nolock() to do the whole work. But I don't like these _nolock() methods. I think it duplicates my code.
So I've got an idea: I'll make lock-free class BarNoLock, and thread-safe class Bar that has only one member: an instance of BarNoLock. And all Bar's methods will only lock this member and call it's methods.
Is it a good idea or maybe there are some better patterns/practices? Thanks.
Update: I know about pimpl and bridge. I ask about multi-threading patterns, not OOP.
I'm not sure why recursive mutexes would be considered bad, see this question for a discussion of them.
Recursive Lock (Mutex) vs Non-Recursive Lock (Mutex)
But I don't think that's necessarily your problem because Win32 critical sections support multiple entries from the same thread without blocking. From the doc:
When a thread owns a critical section, it can make additional calls to EnterCriticalSection or TryEnterCriticalSection without blocking its execution. This prevents a thread from deadlocking itself while waiting for a critical section that it already owns. To release its ownership, the thread must call LeaveCriticalSection one time for each time that it entered the critical section. There is no guarantee about the order in which waiting threads will acquire ownership of the critical section
So maybe you were doing something else wrong when you were deadlocking yourself? Having to work around not deadlocking yourself on the same mutex from the same thread with weird function call semantics is not something you should have to do.
Looks like you have reinvented the Bridge Pattern. Sounds perfectly in order.
Related
Ran across this destructor in a codebase I am debugging.
ManagerImpl::~ManagerImpl() {
// don't go away if some thread is still hitting us
boost::unique_lock<boost::mutex> l(m_mutex);
}
Does it actually serve any useful purpose in a multi-threaded program? It looks like kludge.
I assume the idea is to defer destruction if another thread is calling a function that locks the mutex, but is it even effectively at doing that? ElectricFence segfaults would have me believe otherwise.
It is probably trying to postpone destruction until another thread unlocks the mutex and leaves another member function.
However, this wouldn't prevent another thread calling that function again after the lock in the destructor has been released.
There must be more interaction between threads (which you don't show) to make this code make sense. Still, thought, this doesn't seem to be robust code.
I have seen some people hate on recursive_mutex:
http://www.zaval.org/resources/library/butenhof1.html
But when thinking about how to implement a class that is thread safe (mutex protected), it seems to me excruciatingly hard to prove that every method that should be mutex protected is mutex protected and that mutex is locked at most once.
So for object oriented design, should std::recursive_mutex be default and std::mutex considered as an performance optimization in general case unless it is used only in one place (to protect only one resource)?
To make things clear, I'm talking about one private nonstatic mutex. So each class instance has only one mutex.
At the beginning of each public method:
{
std::scoped_lock<std::recursive_mutex> sl;
Most of the time, if you think you need a recursive mutex then your design is wrong, so it definitely should not be the default.
For a class with a single mutex protecting the data members, then the mutex should be locked in all the public member functions, and all the private member functions should assume the mutex is already locked.
If a public member function needs to call another public member function, then split the second one in two: a private implementation function that does the work, and a public member function that just locks the mutex and calls the private one. The first member function can then also call the implementation function without having to worry about recursive locking.
e.g.
class X {
std::mutex m;
int data;
int const max=50;
void increment_data() {
if (data >= max)
throw std::runtime_error("too big");
++data;
}
public:
X():data(0){}
int fetch_count() {
std::lock_guard<std::mutex> guard(m);
return data;
}
void increase_count() {
std::lock_guard<std::mutex> guard(m);
increment_data();
}
int increase_count_and_return() {
std::lock_guard<std::mutex> guard(m);
increment_data();
return data;
}
};
This is of course a trivial contrived example, but the increment_data function is shared between two public member functions, each of which locks the mutex. In single-threaded code, it could be inlined into increase_count, and increase_count_and_return could call that, but we can't do that in multithreaded code.
This is just an application of good design principles: the public member functions take responsibility for locking the mutex, and delegate the responsibility for doing the work to the private member function.
This has the benefit that the public member functions only have to deal with being called when the class is in a consistent state: the mutex is unlocked, and once it is locked then all invariants hold. If you call public member functions from each other then they have to handle the case that the mutex is already locked, and that the invariants don't necessarily hold.
It also means that things like condition variable waits will work: if you pass a lock on a recursive mutex to a condition variable then (a) you need to use std::condition_variable_any because std::condition_variable won't work, and (b) only one level of lock is released, so you may still hold the lock, and thus deadlock because the thread that would trigger the predicate and do the notify cannot acquire the lock.
I struggle to think of a scenario where a recursive mutex is required.
should std::recursive_mutex be default and std::mutex considered as an performance optimization?
Not really, no. The advantage of using non-recursive locks is not just a performance optimization, it means that your code is self-checking that leaf-level atomic operations really are leaf-level, they aren't calling something else that uses the lock.
There's a reasonably common situation where you have:
a function that implements some operation that needs to be serialized, so it takes the mutex and does it.
another function that implements a larger serialized operation, and wants to call the first function to do one step of it, while it is holding the lock for the larger operation.
For the sake of a concrete example, perhaps the first function atomically removes a node from a list, while the second function atomically removes two nodes from a list (and you never want another thread to see the list with only one of the two nodes taken out).
You don't need recursive mutexes for this. For example you could refactor the first function as a public function that takes the lock and calls a private function that does the operation "unsafely". The second function can then call the same private function.
However, sometimes it's convenient to use a recursive mutex instead. There's still an issue with this design: remove_two_nodes calls remove_one_node at a point where a class invariant doesn't hold (the second time it calls it, the list is in precisely the state we don't want to expose). But assuming we know that remove_one_node doesn't rely on that invariant this isn't a killer fault in the design, it's just that we've made our rules a little more complex than the ideal "all class invariants always hold whenever any public function is entered".
So, the trick is occasionally useful and I don't hate recursive mutexes to quite the extent that article does. I don't have the historical knowledge to argue that the reason for their inclusion in Posix is different from what the article says, "to demonstrate mutex attributes and thread extensons". I certainly don't consider them the default, though.
I think it's safe to say that if in your design you're uncertain whether you need a recursive lock or not, then your design is incomplete. You will later regret the fact that you're writing code and you don't know something so fundamentally important as whether the lock is allowed to be already held or not. So don't put in a recursive lock "just in case".
If you know that you need one, use one. If you know that you don't need one, then using a non-recursive lock isn't just an optimization, it's helping to enforce a constraint of the design. It's more useful for the second lock to fail, than for it to succeed and conceal the fact that you've accidentally done something that your design says should never happen. But if you follow your design, and never double-lock the mutex, then you'll never find out whether it's recursive or not, and so a recursive mutex isn't directly harmful.
This analogy might fail, but here's another way to look at it. Imagine you had a choice between two kinds of pointer: one that aborts the program with a stacktrace when you dereference a null pointer, and another one that returns 0 (or to extend it to more types: behaves as if the pointer refers to a value-initialized object). A non-recursive mutex is a bit like the one that aborts, and a recursive mutex is a bit like the one that returns 0. They both potentially have their uses -- people sometimes go to some lengths to implement a "quiet not-a-value" value. But in the case where your code is designed to never dereference a null pointer, you don't want to use by default the version that silently allows that to happen.
I'm not going to directly weigh in on the mutex versus recursive_mutex debate, but I thought it would be good to share a scenario where recursive_mutex'es are absolutely critical to the design.
When working with Boost::asio, Boost::coroutine (and probably things like NT Fibers although I'm less familiar with them), it is absolutely essential that your mutexes be recursive even without the design problem of re-entrancy.
The reason is because the coroutine based approach by its very design will suspend execution inside a routine and then subsequently resume it. This means that two top level methods of a class might "be being called at the same time on the same thread" without any sub calls being made.
I've been looking for causes for deadlocks and strategies/tools to avoid and detect them.
Another potential cause for deadlocks is to have blocking functions calling other blocking functions in a circular way, so that eventually a call never returns.
Sometimes this is hard to discover, specially in very large projects.
So, are there any tools/libraries/techiques that allow to automate the detection of circular calls in a program?
EDIT:
I code mostly in C and C++ so, if possible, give any information about the topic that is applicable to those languages.
Nevertheless, it seems this topic is scarcely covered in SO, so answers for other languages are ok too. although maybe those deserve a topic of its own if someone finds it relevant
Thanks.
Circular (or recursive) calls that try to acquire the same non-reentrant lock are one of the easiest to debug blocking scenarios: locking is deterministic, and can be easily checked. When the application locks, fire up the debugger and look at the stack trace to understand what locks are held and why.
As to general solutions for the problem of locking... you can look into some libraries that provide mutex ordering, and detect when you are trying to lock on a mutex out of order. This type of solutions might be complex to implement correctly, but once in place it ensures that you cannot enter a deadlock condition, as it forces all processes to obtain the locks in the same order (i.e. if process A holds lock La, and it tries to acquire lock Lb for which the ordering is correct, then it can either succeed or lock, but whichever process is holding lock Lb cannot try to lock La as the ordering constraint would not be met).
If you are on Linux there 2 Valgrind tools for detecting deadlocks and race conditions: Helgrind, DRD. They both complement each other and it's worth to check for thread errors by both of them.
In linux you can use valgrind to detect deadlocks, use --tool=helgrind.
Best way to detect deadlocks (IMO) is to make a test program that calls all the functions in a random order in like 30 different threads 10000s of times.
If you get a deadlock you can use VS2010 "Parallel Stacks" window. Debug->Windows->Parallel Stacks
This window will show you all the stacks, so you can find the methods that are deadlocking.
A simple strategy I use to write thread-safe objects:
A thread safe object should be safe when its public methods are called, so you don't get deadlocks when it is used.
So, the idea is to just lock all the public methods that access the object's data.
Besides that you need to insure that within the class' code you never call a public method. If you need to use one of the public methods, then make that method private, and wrap the private method with a public method that locks and then calls it.
If you want better lock granularity you could just create objects for each part that has its own lock, and lock it like I suggested. Then use encapsulation to combine those classes to the one class.
Example:
class Blah {
MyData data;
Lock lock;
public:
DataItem GetData(int index)
{
ReadLock read(lock);
return LocalGetData(index);
}
DataItem FindData(string key)
{
ReadLock read(lock);
DataItem item;
//find the item, can use LocalGetData() to get the item without deadlocking
return item;
}
void PutData(DataItem item)
{
ReadLock write(lock);
//put item in database
}
private:
DataItem LocalGetData(int index)
{
return data[index];
}
}
You could find a tool that builds a call graph, and check the graph for cycles.
Otherwise, there are a number of strategies for detecting deadlocks or other circularities, but they all depend on having some sort of supporting infrastructure in place.
There are deadlock avoidance strategies, having to do with assigning lock priorities and ordering the locks according to priority. These require code changes and enforcing the standards, though.
I'm using the C++ boost::thread library, which in my case means I'm using pthreads. Officially, a mutex must be unlocked from the same thread which locks it, and I want the effect of being able to lock in one thread and then unlock in another. There are many ways to accomplish this. One possibility would be to write a new mutex class which allows this behavior.
For example:
class inter_thread_mutex{
bool locked;
boost::mutex mx;
boost::condition_variable cv;
public:
void lock(){
boost::unique_lock<boost::mutex> lck(mx);
while(locked) cv.wait(lck);
locked=true;
}
void unlock(){
{
boost::lock_guard<boost::mutex> lck(mx);
if(!locked) error();
locked=false;
}
cv.notify_one();
}
// bool try_lock(); void error(); etc.
}
I should point out that the above code doesn't guarantee FIFO access, since if one thread calls lock() while another calls unlock(), this first thread may acquire the lock ahead of other threads which are waiting. (Come to think of it, the boost::thread documentation doesn't appear to make any explicit scheduling guarantees for either mutexes or condition variables). But let's just ignore that (and any other bugs) for now.
My question is, if I decide to go this route, would I be able to use such a mutex as a model for the boost Lockable concept. For example, would anything go wrong if I use a boost::unique_lock< inter_thread_mutex > for RAII-style access, and then pass this lock to boost::condition_variable_any.wait(), etc.
On one hand I don't see why not. On the other hand, "I don't see why not" is usually a very bad way of determining whether something will work.
The reason I ask is that if it turns out that I have to write wrapper classes for RAII locks and condition variables and whatever else, then I'd rather just find some other way to achieve the same effect.
EDIT:
The kind of behavior I want is basically as follows. I have an object, and it needs to be locked whenever it is modified. I want to lock the object from one thread, and do some work on it. Then I want to keep the object locked while I tell another worker thread to complete the work. So the first thread can go on and do something else while the worker thread finishes up. When the worker thread gets done, it unlocks the mutex.
And I want the transition to be seemless so nobody else can get the mutex lock in between when thread 1 starts the work and thread 2 completes it.
Something like inter_thread_mutex seems like it would work, and it would also allow the program to interact with it as if it were an ordinary mutex. So it seems like a clean solution. If there's a better solution, I'd be happy to hear that also.
EDIT AGAIN:
The reason I need locks to begin with is that there are multiple master threads, and the locks are there to prevent them from accessing shared objects concurrently in invalid ways.
So the code already uses loop-level lock-free sequencing of operations at the master thread level. Also, in the original implementation, there were no worker threads, and the mutexes were ordinary kosher mutexes.
The inter_thread_thingy came up as an optimization, primarily to improve response time. In many cases, it was sufficient to guarantee that the "first part" of operation A, occurs before the "first part" of operation B. As a dumb example, say I punch object 1 and give it a black eye. Then I tell object 1 to change it's internal structure to reflect all the tissue damage. I don't want to wait around for the tissue damage before I move on to punch object 2. However, I do want the tissue damage to occur as part of the same operation; for example, in the interim, I don't want any other thread to reconfigure the object in such a way that would make tissue damage an invalid operation. (yes, this example is imperfect in many ways, and no I'm not working on a game)
So we made the change to a model where ownership of an object can be passed to a worker thread to complete an operation, and it actually works quite nicely; each master thread is able to get a lot more operations done because it doesn't need to wait for them all to complete. And, since the event sequencing at the master thread level is still loop-based, it is easy to write high-level master-thread operations, as they can be based on the assumption that an operation is complete (more precisely, the critical "first part" upon which the sequencing logic depends is complete) when the corresponding function call returns.
Finally, I thought it would be nice to use inter_thread mutex/semaphore thingies using RAII with boost locks to encapsulate the necessary synchronization that is required to make the whole thing work.
man pthread_unlock (this is on OS X, similar wording on Linux) has the answer:
NAME
pthread_mutex_unlock -- unlock a mutex
SYNOPSIS
#include <pthread.h>
int
pthread_mutex_unlock(pthread_mutex_t *mutex);
DESCRIPTION
If the current thread holds the lock on mutex, then the
pthread_mutex_unlock() function unlocks mutex.
Calling pthread_mutex_unlock() with a mutex that the
calling thread does not hold will result in
undefined behavior.
...
My counter-question would be - what kind of synchronization problem are you trying to solve with this? Most probably there is an easier solution.
Neither pthreads nor boost::thread (built on top of it) guarantee any order in which a contended mutex is acquired by competing threads.
Sorry, but I don't understand. what will be the state of your mutex in line [1] in the following code if another thread can unlock it?
inter_thread_mutex m;
{
m.lock();
// [1]
m.unlock();
}
This has no sens.
There's a few ways to approach this. Both of the ones I'm going to suggest are going to involve adding an additional piece of information to the object, rather adding a mechanism to unlock a thread from a thread other than the one that owns it.
1) you can add some information to indicate the object's state:
enum modification_state { consistent, // ready to be examined or to start being modified
phase1_complete, // ready for the second thread to finish the work
};
// first worker thread
lock();
do_init_work(object);
object.mod_state = phase1_complete;
unlock();
signal();
do_other_stuff();
// second worker thread
lock()
while( object.mod_state != phase1_complete )
wait()
do_final_work(obj)
object.mod_state = consistent;
unlock()
signal()
// some other thread that needs to read the data
lock()
while( object.mod_state != consistent )
wait();
read_data(obj)
unlock()
Works just fine with condition variables, because obviously you're not writing your own lock.
2) If you have a specific thread in mind, you can give the object an owner.
// first worker
lock();
while( obj.owner != this_thread() ) wait();
do_initial_work(obj);
obj.owner = second_thread_id;
unlock()
signal()
...
This is pretty much the same solution as my first solution, but more flexible in the adding/removing of phases, and less flexible in the adding/removing of threads.
To be honest, I'm not sure how inter thread mutex would help you here. You'd still need a semaphore or condition variable to signal the passing of the work to the second thread.
Small modification to what you already have: how about storing the id of the thread which you want to take the lock, in your inter_thread_whatever? Then unlock it, and send a message to that thread, saying "I want you execute whatever routine it is that tries to take this lock".
Then the condition in lock becomes while(locked || (desired_locker != thisthread && desired_locker != 0)). Technically you've "released the lock" in the first thread, and "taken it again" in the second thread, but there's no way that any other thread can grab it in between, so it's as if you've transferred it directly from one to the other.
There's a potential problem, that if a thread exits or is killed, while it's the desired locker of your lock, then that thread deadlocks. But you were already talking about the first thread waiting for a message from the second thread to say that it has successfully acquired the lock, so presumably you already have a plan in mind for what happens if that message is never received. To that plan, add "reset the desired_locker field on the inter_thread_whatever".
This is all very hairy, though, I'm not convinced that what I've proposed is correct. Is there a way that the "master" thread (the one that's directing all these helpers) can just make sure that it doesn't order any more operations to be performed on whatever is protected by this lock, until the first op is completed (or fails and some RAII thing notifies you)? You don't need locks as such, if you can deal with it at the level of the message loop.
I don't think it is a good idea to say that your inter_thread_mutex (binary_semaphore) can be seen as a model of Lockable. The main issue is that the main feature of your inter_thread_mutex defeats the Locakble concept. If inter_thread_mutex was a model of lockable you will expect in In [1] that the inter_thread_mutex m is locked.
// thread T1
inter_thread_mutex m;
{
unique_lock<inter_thread_mutex> lk(m);
// [1]
}
But as an other thread T2 can do m.unlock() while T1 is in [1], the guaranty is broken.
Binary semaphores can be used as Lockables as far as each thread tries to lock before unlocking. But the main goal of your class is exactly the contrary.
This is one of the reason semaphores in Boost.Interprocess don't use lock/unlock to name the functions, but wait/notify. Curiously these are the same names used by conditions :)
A mutex is a mechanism for describing mutually exclusive blocks of code. It does not make sense for these blocks of code to cross thread boundaries. Trying to use such a concept in such an counter intuitive way can only lead to problems down the line.
It sounds very much like you're looking for a different multi-threading concept, but without more detail it's hard to know what.
I understand recursive mutex allows mutex to be locked more than once without getting to a deadlock and should be unlocked the same number of times. But in what specific situations do you need to use a recursive mutex? I'm looking for design/code-level situations.
For example when you have function that calls it recursively, and you want to get synchronized access to it:
void foo() {
... mutex_acquire();
... foo();
... mutex_release();
}
without a recursive mutex you would have to create an "entry point" function first, and this becomes cumbersome when you have a set of functions that are mutually recursive. Without recursive mutex:
void foo_entry() {
mutex_acquire(); foo(); mutex_release(); }
void foo() { ... foo(); ... }
Recursive and non-recursive mutexes have different use cases. No mutex type can easily replace the other. Non-recursive mutexes have less overhead, and recursive mutexes have in some situations useful or even needed semantics and in other situations dangerous or even broken semantics. In most cases, someone can replace any strategy using recursive mutexes with a different safer and more efficient strategy based on the usage of non-recursive mutexes.
If you just want to exclude other threads from using your mutex protected resource, then you could use any mutex type, but might want to use the non-recursive mutex because of its smaller overhead.
If you want to call functions recursively, which lock the same mutex, then they either
have to use one recursive mutex, or
have to unlock and lock the same non-recursive mutex again and again (beware of concurrent threads!) (assuming this is semantically sound, it could still be a performance issue), or
have to somehow annotate which mutexes they already locked (simulating recursive ownership/mutexes).
If you want to lock several mutex-protected objects from a set of such objects, where the sets could have been built by merging, you can choose
to use per object exactly one mutex, allowing more threads to work in parallel, or
to use per object one reference to any possibly shared recursive mutex, to lower the probability of failing to lock all mutexes together, or
to use per object one comparable reference to any possibly shared non-recursive mutex, circumventing the intent to lock multiple times.
If you want to release a lock in a different thread than it has been locked, then you have to use non-recursive locks (or recursive locks which explicitly allow this instead of throwing exceptions).
If you want to use synchronization variables, then you need to be able to explicitly unlock the mutex while waiting on any synchronization variable, so that the resource is allowed to be used in other threads. That is only sanely possible with non-recursive mutexes, because recursive mutexes could already have been locked by the caller of the current function.
I encountered the need for a recursive mutex today, and I think it's maybe the simplest example among the posted answers so far:
This is a class that exposes two API functions, Process(...) and reset().
public void Process(...)
{
acquire_mutex(mMutex);
// Heavy processing
...
reset();
...
release_mutex(mMutex);
}
public void reset()
{
acquire_mutex(mMutex);
// Reset
...
release_mutex(mMutex);
}
Both functions must not run concurrently because they modify internals of the class, so I wanted to use a mutex.
Problem is, Process() calls reset() internally, and it would create a deadlock because mMutex is already acquired.
Locking them with a recursive lock instead fixes the problem.
If you want to see an example of code that uses recursive mutexes, look at the sources for "Electric Fence" for Linux/Unix. 'Twas one of the common Unix tools for finding "bounds checking" read/write overruns and underruns as well as using memory that has been freed, before Valgrind came along.
Just compile and link electric fence with sources (option -g with gcc/g++), and then link it with your software with the link option -lefence, and start stepping through the calls to malloc/free. http://elinux.org/Electric_Fence
It would certainly be a problem if a thread blocked trying to acquire (again) a mutex it already owned...
Is there a reason to not permit a mutex to be acquired multiple times by the same thread?
In general, like everyone here said, it's more about design. A recursive mutex is normally used in a recursive functions.
What others fail to tell you here is that there's actually almost no cost overhead in recursive mutexes.
In general, a simple mutex is a 32 bits key with bits 0-30 containing owner's thread id and bit 31 a flag saying if the mutex has waiters or not. It has a lock method which is a CAS atomic race to claim the mutex with a syscall in case of failure. The details are not important here. It looks like this:
class mutex {
public:
void lock();
void unlock();
protected:
uint32_t key{}; //bits 0-30: thread_handle, bit 31: hasWaiters_flag
};
a recursive_mutex is normally implemented as:
class recursive_mutex : public mutex {
public:
void lock() {
uint32_t handle = current_thread_native_handle(); //obtained from TLS memory in most OS
if ((key & 0x7FFFFFFF) == handle) { // Impossible to return true unless you own the mutex.
uses++; // we own the mutex, just increase uses.
} else {
mutex::lock(); // we don't own the mutex, try to obtain it.
uses = 1;
}
}
void unlock() {
// asserts for debug, we should own the mutex and uses > 0
--uses;
if (uses == 0) {
mutex::unlock();
}
}
private:
uint32_t uses{}; // no need to be atomic, can only be modified in exclusion and only interesting read is on exclusion.
};
As you see it's an entirely user space construct. (base mutex is not though, it MAY fall into a syscall if it fails to obtain the key in an atomic compare and swap on lock and it will do a syscall on unlock if the has_waitersFlag is on).
For a base mutex implementation: https://github.com/switchbrew/libnx/blob/master/nx/source/kernel/mutex.c
If you want to be able to call public methods from different threads inside other public methods of a class and many of these public methods change the state of the object, you should use a recursive mutex. In fact, I make it a habit of using by default a recursive mutex unless there is a good reason (e.g. special performance considerations) not to use it.
It leads to better interfaces, because you don't have to split your implementation among non-locked and locked parts and you are free to use your public methods with peace of mind inside all methods as well.
It leads also in my experience to interfaces that are easier to get right in terms of locking.
Seems no one mentioned it before, but code using recursive_mutex is way easier to debug, since its internal structure contains identifier of a thread holding it.