I have a "Device" class representing the connection of a peripheral hardware device. Scores of member functions ("device functions") are called on each Device object by clients.
class Device {
public:
std::timed_mutex mutex_;
void DeviceFunction1();
void DeviceFunction2();
void DeviceFunction3();
void DeviceFunction4();
// void DeviceFunctionXXX(); lots and lots of device functions
// other stuff
// ...
};
The Device class has a member std::timed_mutex mutex_ which must be locked by each of the device functions prior to communicating with the device, to prevent communication with the device simultaneously from concurrent threads.
An obvious but repetitive and cumbersome approach is to copy/paste the mutex_.try_lock() code at the top of the execution of each device function.
void Device::DeviceFunction1() {
mutex_.try_lock(); // this is repeated in ALL functions
// communicate with device
// other stuff
// ...
}
However, I'm wondering if there is a C++ construct or design pattern or paradigm which can be used to "group" these functions in such a way that the mutex_.try_lock() call is "implicit" for all functions in the group.
In other words: in a similar fashion that a derived class can implicitly call common code in a base class constructor, I'd like to do something similar with functions calls (instead of class inheritance).
Any recommendations?
First of all, if the mutex must be locked before you do anything else, then you should call mutex_.lock(), or at least not ignore the fact that try_lock may actually fail to lock the mutex. Also, manually placing calls to lock and unlock a mutex is extremely error-prone and can be much harder to get right than you might think. Don't do it. Use, e.g., an std::lock_guard instead.
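To make that concrete, here is a minimal sketch of a device function that either holds the lock or refuses to proceed. The 100 ms timeout, the exception, and the calls_ counter (which stands in for the actual device communication) are assumptions made for illustration; they are not part of the original code.

```cpp
#include <chrono>
#include <mutex>
#include <stdexcept>

class Device {
public:
    void DeviceFunction1() {
        // Constructing std::unique_lock with a duration calls try_lock_for() internally.
        std::unique_lock<std::timed_mutex> lock(mutex_, std::chrono::milliseconds(100));
        if (!lock.owns_lock())
            throw std::runtime_error("device busy");  // never touch the device unlocked
        ++calls_;  // placeholder for "communicate with device"
    }  // the lock is released here, also when an exception propagates

    int calls() {
        std::lock_guard<std::timed_mutex> lock(mutex_);  // blocks until acquired
        return calls_;
    }

private:
    std::timed_mutex mutex_;
    int calls_ = 0;  // hypothetical state standing in for the device
};
```

Whether a failed lock should throw, return an error code, or retry depends on the application; the point is only that the result of the attempt is not ignored.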
The fact that you're using an std::timed_mutex suggests that what's actually going on in your real code may be a bit more involved (why else would you be using an std::timed_mutex?). Assuming that what you're really doing is something more complex than just calling try_lock and ignoring its return value, consider encapsulating your complex locking procedure, whatever it may be, in a custom lock guard type, e.g.:
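// Requires <chrono> and <mutex>; relies on C++17 guaranteed copy elision.
class the_locking_dance
{
    static auto do_the_locking_dance(std::timed_mutex& mutex)
    {
        while (!mutex.try_lock_for(std::chrono::milliseconds(100)))
            /* do whatever it is that you wanna do */;
        // std::adopt_lock is the tag object (std::adopt_lock_t is only its type)
        return std::lock_guard { mutex, std::adopt_lock };
    }

    std::lock_guard<std::timed_mutex> guard;

public:
    the_locking_dance(std::timed_mutex& mutex)
        : guard(do_the_locking_dance(mutex))
    {
    }
};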
and then create a local variable
the_locking_dance guard(mutex_);
to acquire and hold on to your lock. This will also automatically release the lock upon exit from a block.
Apart from all that, note that what you're doing here is, most likely, not a good idea in general. The real question is: why are there so many different methods that all need to be protected by the same mutex to begin with? Do you really have to support an arbitrary number of threads you know nothing about, which arbitrarily may do arbitrary things with the same device object at arbitrary times in arbitrary order? If not, then why are you building your Device abstraction to support this use case? Is there really no better interface that you could design for your application scenario, knowing what it actually is that the threads are supposed to be doing? Do you really have to do such fine-grained locking? Consider how inefficient it is with your current abstraction to, e.g., call multiple device functions in a row, as that requires constantly locking and unlocking this mutex again and again all over the place…
All that being said, there may be a way to improve the locking frequency while, at the same time, addressing your original question:
I'm wondering if there is a C++ construct or design pattern or paradigm which can be used to "group" these functions in such a way that the mutex_.try_lock() call is "implicit" for all functions in the group.
You could group these functions by exposing them not as methods of a Device object directly, but as methods of yet another lock guard type, for example
class Device
{
…
void DeviceFunction1();
void DeviceFunction2();
void DeviceFunction3();
void DeviceFunction4();
public:
class DeviceFunctionSet1
{
Device& device;
the_locking_dance guard;
public:
DeviceFunctionSet1(Device& device)
: device(device), guard(device.mutex_)
{
}
void DeviceFunction1() { device.DeviceFunction1(); }
void DeviceFunction2() { device.DeviceFunction2(); }
};
class DeviceFunctionSet2
{
Device& device;
the_locking_dance guard;
public:
DeviceFunctionSet2(Device& device)
: device(device), guard(device.mutex_)
{
}
void DeviceFunction3() { device.DeviceFunction3(); }
void DeviceFunction4() { device.DeviceFunction4(); }
};
};
Now, to get access to the methods of your device within a given block scope, you first acquire the respective DeviceFunctionSet and then you can call the methods:
{
Device::DeviceFunctionSet1 dev(my_device); // nested type, so it needs the Device:: qualifier
dev.DeviceFunction1();
dev.DeviceFunction2();
}
The nice thing about this is that the locking happens once for an entire group of functions (which will, hopefully, somewhat logically belong together as a group of functions used to achieve a particular task with your Device) automatically and you can also never forget to unlock the mutex…
Even with this, however, the most important thing is to not just build a generic "thread-safe Device". These things are usually neither efficient nor really useful. Build an abstraction that reflects the way multiple threads are supposed to cooperate using a Device in your particular application. Everything else is second to that. But without knowing anything about what your application actually is, there's not really anything more that could be said to that…
I tried searching within questions dedicated to design patterns/data exchange/classes design, to no avail.
I am specifically programming in C++, but since this is mainly a design problem, I think it is quite a general one.
What I am trying to do is design the data exchange between at least two classes (there could be more), as follows:
One class reads images from disk and shares them
Arbitrary number of classes (0+) read and process these images independently
The sharing class should not be constrained to the presence of consumer classes.
Not being an expert, the only options I could think of are either publish-subscribe machinery or shared memory.
What are the possible solutions for such a problem, and their pros and cons?
Thank you in advance
You could implement it as a classic producer-consumer pattern. You did not mention whether producers can work from different threads, but I will assume multi-threading capability to make this solution more flexible.
// Not important what this actually is.
class Image
{ };
using ImagePtr = std::shared_ptr<Image>;
// Shared queue which stores currently available images and
// encapsulates synchronization details.
class ImageQueue
{
private:
std::queue<ImagePtr> m_images;
std::mutex m_mutex;
std::condition_variable m_cond;
public:
void PostImage(std::shared_ptr<Image> image)
{
// Lock the queue, push image, notify single thread.
std::unique_lock<std::mutex> lock(m_mutex);
m_images.push(image);
m_cond.notify_one();
}
ImagePtr WaitForImage()
{
// Lock the queue, wait if empty, fetch image and return it.
std::unique_lock<std::mutex> lock(m_mutex);
if (m_images.empty())
{
m_cond.wait(lock, [this]() -> bool { return !m_images.empty(); }); // members are reached through this; they cannot be captured by name
}
assert (!m_images.empty());
auto nextImage = m_images.front();
m_images.pop();
return nextImage;
}
};
// Image producer class, loads images and posts them into the queue.
class ImageProducer
{
private:
ImageQueue* m_queue;
public:
void LoadImage(const char* file)
{
auto image = loadAndInitializeImageObject(file);
m_queue->PostImage(image);
}
};
// Image consumer class, fetches images and processes them.
class ImageConsumer
{
private:
ImageQueue* m_queue;
public:
void ProcessImage()
{
auto image = m_queue->WaitForImage();
processImage(image);
}
};
This is a very, very beta-version concept, but it should give you an overview. Some notes:
There should be, of course, a single queue instance. It could be instantiated independently and passed to both classes as a constructor argument (via pointer or reference), but it also could be a member of ImageProducer class which could provide a public accessor to obtain the pointer/reference to it - the choice depends on particular needs.
Currently, the logic does not include a clear point at which processing should end. The queue could have an additional bool flag (e.g. m_processingActive, possibly wrapped in std::atomic<>). This flag would be initialized to true during construction and, after the last image is produced, changed to false by the producer. Consumers would stop waiting for images when the queue becomes inactive.
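A rough sketch of how such a shutdown flag could look. The class name ClosableImageQueue, the Close() method, and the convention of returning nullptr once the queue is closed and drained are my assumptions for illustration. Because the flag is only touched while m_mutex is held, a plain bool is enough in this variant.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

struct Image {};  // placeholder, as in the code above
using ImagePtr = std::shared_ptr<Image>;

class ClosableImageQueue {
public:
    void PostImage(ImagePtr image) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_images.push(std::move(image));
        }
        m_cond.notify_one();
    }

    // Called by the producer after the last image has been posted.
    void Close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_processingActive = false;
        }
        m_cond.notify_all();  // wake every waiting consumer so it can exit
    }

    // Returns nullptr once the queue is closed and fully drained.
    ImagePtr WaitForImage() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_images.empty() || !m_processingActive; });
        if (m_images.empty())
            return nullptr;
        ImagePtr next = std::move(m_images.front());
        m_images.pop();
        return next;
    }

private:
    std::queue<ImagePtr> m_images;
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_processingActive = true;  // guarded by m_mutex
};
```

A consumer loop would then simply be: while (auto image = queue.WaitForImage()) { /* process */ }.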
There are probably some additional improvements, some things may be done differently and, possibly, better. But this basic concept is a quite good starting point (I hope).
Of course, you are not limited to a single ImageConsumer class. Actual processing function (processImage in my code) could be a virtual function, which is implemented in specialized classes.
Publish-subscribe is a very generic design pattern. One way to implement it is with shared memory, but that doesn't work very well over a network.
I'm going to assume that you want to do this in-process, multi-threaded.
Since the data seems to be fairly large but static, you should not put the images themselves in the sharing mechanism, but rather allocate them and pass around pointers to the allocated memory. You can clean up at the end or, if you're not sure when the consumers are done with an image, use a std::shared_ptr. You can also pass around the actual image data, but that would cause it to be copied multiple times. That might be OK if it's small.
Implementing the publish-subscribe mechanism in a thread-safe way is a bit hard and easy to mess up. I would suggest using a library (boost::signals2 seems to be recommended).
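To show the shape of the idea without pulling in a dependency, here is a deliberately small hand-rolled sketch using only the standard library. It is not boost::signals2 and lacks its features (no disconnecting, no connection tracking). The names ImagePublisher, Subscribe, and Publish and the id field are assumptions for illustration. Images are shared as std::shared_ptr<const Image>, so nothing is copied and the publisher works with zero subscribers.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Image { int id; };  // hypothetical payload
using ImagePtr = std::shared_ptr<const Image>;

class ImagePublisher {
public:
    using Callback = std::function<void(const ImagePtr&)>;

    void Subscribe(Callback cb) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_subscribers.push_back(std::move(cb));
    }

    // Callbacks run on the publishing thread.
    void Publish(const ImagePtr& image) {
        std::vector<Callback> snapshot;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            snapshot = m_subscribers;  // copy, so callbacks run without the lock held
        }
        for (const auto& cb : snapshot)
            cb(image);
    }

private:
    std::mutex m_mutex;
    std::vector<Callback> m_subscribers;
};
```

If consumers should process images on their own threads, each callback could push the pointer into that consumer's own queue rather than doing the work inline.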
I have seen some people hate on recursive_mutex:
http://www.zaval.org/resources/library/butenhof1.html
But when thinking about how to implement a class that is thread safe (mutex protected), it seems to me excruciatingly hard to prove that every method that should be mutex protected is mutex protected and that mutex is locked at most once.
So for object oriented design, should std::recursive_mutex be default and std::mutex considered as an performance optimization in general case unless it is used only in one place (to protect only one resource)?
To make things clear, I'm talking about one private nonstatic mutex. So each class instance has only one mutex.
At the beginning of each public method:
{
std::scoped_lock<std::recursive_mutex> sl;
Most of the time, if you think you need a recursive mutex then your design is wrong, so it definitely should not be the default.
For a class with a single mutex protecting the data members, then the mutex should be locked in all the public member functions, and all the private member functions should assume the mutex is already locked.
If a public member function needs to call another public member function, then split the second one in two: a private implementation function that does the work, and a public member function that just locks the mutex and calls the private one. The first member function can then also call the implementation function without having to worry about recursive locking.
e.g.
class X {
std::mutex m;
int data;
int const max=50;
void increment_data() {
if (data >= max)
throw std::runtime_error("too big");
++data;
}
public:
X():data(0){}
int fetch_count() {
std::lock_guard<std::mutex> guard(m);
return data;
}
void increase_count() {
std::lock_guard<std::mutex> guard(m);
increment_data();
}
int increase_count_and_return() {
std::lock_guard<std::mutex> guard(m);
increment_data();
return data;
}
};
This is of course a trivial contrived example, but the increment_data function is shared between two public member functions, each of which locks the mutex. In single-threaded code, it could be inlined into increase_count, and increase_count_and_return could call that, but we can't do that in multithreaded code.
This is just an application of good design principles: the public member functions take responsibility for locking the mutex, and delegate the responsibility for doing the work to the private member function.
This has the benefit that the public member functions only have to deal with being called when the class is in a consistent state: the mutex is unlocked, and once it is locked then all invariants hold. If you call public member functions from each other then they have to handle the case that the mutex is already locked, and that the invariants don't necessarily hold.
It also means that things like condition variable waits will work: if you pass a lock on a recursive mutex to a condition variable then (a) you need to use std::condition_variable_any because std::condition_variable won't work, and (b) only one level of lock is released, so you may still hold the lock, and thus deadlock because the thread that would trigger the predicate and do the notify cannot acquire the lock.
I struggle to think of a scenario where a recursive mutex is required.
should std::recursive_mutex be default and std::mutex considered as an performance optimization?
Not really, no. The advantage of using non-recursive locks is not just a performance optimization, it means that your code is self-checking that leaf-level atomic operations really are leaf-level, they aren't calling something else that uses the lock.
There's a reasonably common situation where you have:
a function that implements some operation that needs to be serialized, so it takes the mutex and does it.
another function that implements a larger serialized operation, and wants to call the first function to do one step of it, while it is holding the lock for the larger operation.
For the sake of a concrete example, perhaps the first function atomically removes a node from a list, while the second function atomically removes two nodes from a list (and you never want another thread to see the list with only one of the two nodes taken out).
You don't need recursive mutexes for this. For example you could refactor the first function as a public function that takes the lock and calls a private function that does the operation "unsafely". The second function can then call the same private function.
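A minimal sketch of that refactoring, using the list example. The class name NodeList, the helper remove_one_unlocked, and the add and size members are made up for illustration.

```cpp
#include <cstddef>
#include <list>
#include <mutex>

class NodeList {
public:
    void add(int value) {
        std::lock_guard<std::mutex> lock(m_);
        nodes_.push_back(value);
    }

    // Removes the front node; returns false if the list is empty.
    bool remove_one_node() {
        std::lock_guard<std::mutex> lock(m_);
        return remove_one_unlocked();
    }

    // Removes two nodes as one atomic step, or none at all.
    bool remove_two_nodes() {
        std::lock_guard<std::mutex> lock(m_);
        if (nodes_.size() < 2)
            return false;
        remove_one_unlocked();
        remove_one_unlocked();
        return true;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_);
        return nodes_.size();
    }

private:
    // Precondition: the caller already holds m_.
    bool remove_one_unlocked() {
        if (nodes_.empty())
            return false;
        nodes_.pop_front();
        return true;
    }

    std::mutex m_;
    std::list<int> nodes_;
};
```

No other thread can observe the list with only one of the two nodes removed, and the mutex is never locked twice by the same thread.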
However, sometimes it's convenient to use a recursive mutex instead. There's still an issue with this design: remove_two_nodes calls remove_one_node at a point where a class invariant doesn't hold (the second time it calls it, the list is in precisely the state we don't want to expose). But assuming we know that remove_one_node doesn't rely on that invariant this isn't a killer fault in the design, it's just that we've made our rules a little more complex than the ideal "all class invariants always hold whenever any public function is entered".
So, the trick is occasionally useful and I don't hate recursive mutexes to quite the extent that article does. I don't have the historical knowledge to argue that the reason for their inclusion in POSIX is different from what the article says, "to demonstrate mutex attributes and thread extensions". I certainly don't consider them the default, though.
I think it's safe to say that if in your design you're uncertain whether you need a recursive lock or not, then your design is incomplete. You will later regret the fact that you're writing code and you don't know something so fundamentally important as whether the lock is allowed to be already held or not. So don't put in a recursive lock "just in case".
If you know that you need one, use one. If you know that you don't need one, then using a non-recursive lock isn't just an optimization, it's helping to enforce a constraint of the design. It's more useful for the second lock to fail, than for it to succeed and conceal the fact that you've accidentally done something that your design says should never happen. But if you follow your design, and never double-lock the mutex, then you'll never find out whether it's recursive or not, and so a recursive mutex isn't directly harmful.
This analogy might fail, but here's another way to look at it. Imagine you had a choice between two kinds of pointer: one that aborts the program with a stacktrace when you dereference a null pointer, and another one that returns 0 (or to extend it to more types: behaves as if the pointer refers to a value-initialized object). A non-recursive mutex is a bit like the one that aborts, and a recursive mutex is a bit like the one that returns 0. They both potentially have their uses -- people sometimes go to some lengths to implement a "quiet not-a-value" value. But in the case where your code is designed to never dereference a null pointer, you don't want to use by default the version that silently allows that to happen.
I'm not going to directly weigh in on the mutex versus recursive_mutex debate, but I thought it would be good to share a scenario where recursive_mutex'es are absolutely critical to the design.
When working with Boost.Asio, Boost.Coroutine (and probably things like NT Fibers, although I'm less familiar with them), it is absolutely essential that your mutexes be recursive even without the design problem of re-entrancy.
The reason is that the coroutine-based approach, by its very design, will suspend execution inside a routine and then subsequently resume it. This means that two top-level methods of a class might "be being called at the same time on the same thread" without any sub-calls being made.
I'm running into a mild conundrum concerning thread safety for my game loop. What I have below is 3 threads (including the main) that are meant to work together. One for event managing (main thread), one for logic, and one for the rendering. All 3 of these threads exist within their own class, as you can see below. In basic testing the structure works without problems. This system uses SFML and renders with OpenGL.
int main(){
Gamestate gs;
EventManager em(&gs);
LogicManager lm(&gs);
Renderer renderer(&gs);
lm.start();
renderer.start();
em.eventLoop();
return 0;
}
However, as you may have noticed I have a "Gamestate" class that is meant to act as a container of all the resources that need to be shared between the threads (mostly with LogicManager as a writer and Renderer as a reader. EventManager is mostly just for window events). My questions are: (1 and 2 being the most important)
1) Is this a good way of going about things? Meaning is having a "global" Gamestate class a good idea to use? Is there a better way of going about it?
2) My intention was to have Gamestate have mutexes in the getters/setters, except that doesn't work for reading because I can't return the object while it's still locked, which means I'd have to put synchronization outside of the getters/setters and make the mutexes public. It also means I'd have a bloody ton of mutexes for all the different resources. What is the most elegant way of going about this problem?
3) I have all of the threads accessing "bool run" to check if to continue their loops
while(gs->run){
....
}
run gets set to false if I receive a quit message in the EventManager. Do I need to synchronize that variable at all? Would I set it to volatile?
4) Does constantly dereferencing pointers and such have an impact on performance? eg gs->objects->entitylist.at(2)->move(); Do all those '->' and '.' cause any major slowdown?
Global state
1) Is this a good way of going about things? Meaning is having a "global" Gamestate class a good idea to use? Is there a better way of going about it?
For a game, as opposed to some reusable piece of code, I'd say a global state is good enough. You might even avoid passing gamestate pointers around, and really make it a global variable instead.
Synchronization
2) My intention was to have Gamestate have mutexes in the getters/setters, except that doesn't work for reading because I can't return the object while it's still locked, which means I'd have to put synchronization outside of the getters/setters and make the mutexes public. It also means I'd have a bloody ton of mutexes for all the different resources. What is the most elegant way of going about this problem?
I'd try to think of this in terms of transactions. Wrapping every single state change into its own mutex locking code will not only impact performance, but might lead to actually incorrect behaviour if the code gets one state element, performs some computation on it and sets the value later on, while some other code modified the same element in between. So I'd try to structure LogicManager and Renderer in such ways that all the interaction with the Gamestate occurs bundled in a few places. For the duration of that interaction, the thread should hold a mutex on the state.
If you want to enforce the use of mutexes, then you can create some construct where you have at least two classes. Let's call them GameStateData and GameStateAccess. GameStateData would contain all the state, but without providing public access to it. GameStateAccess would be a friend of GameStateData and provide access to its private data. The constructor of GameStateAccess would take a reference or pointer to the GameStateData and would lock the mutex for that data. The destructor would free the mutex. That way, your code to manipulate the state would simply be written as a block where a GameStateAccess object is in scope.
There is still a loophole, though: In cases where objects returned from this GameStateAccess class are pointers or references to mutable objects, then this setup won't keep your code from carrying such a pointer out of the scope protected by the mutex. To prevent this, either take care about how you write things, or use some custom pointer-like template class which can be cleared once the GameStateAccess goes out of scope, or make sure you only pass things by value not reference.
Example
Using C++11, the above idea for lock management could be implemented as follows:
class GameStateData {
private:
std::mutex _mtx;
int _val;
friend class GameStateAccess;
};
GameStateData global_state;
class GameStateAccess {
private:
GameStateData& _data;
std::lock_guard<std::mutex> _lock;
public:
GameStateAccess(GameStateData& data)
: _data(data), _lock(data._mtx) {}
int getValue() const { return _data._val; }
void setValue(int val) { _data._val = val; }
};
void LogicManager::performStateUpdate() {
int valueIncrement = computeValueIncrement(); // No lock for this computation
{ GameStateAccess gs(global_state); // Lock will be held during this scope
int oldValue = gs.getValue();
int newValue = oldValue + valueIncrement;
gs.setValue(newValue); // still in the same transaction
} // free lock on global state
cleanup(); // No lock held here either
}
Loop termination indicator
3) I have all of the threads accessing "bool run" to check if to continue their loops
while(gs->run){
....
}
run gets set to false if I receive a quit message in the EventManager. Do I need to synchronize that variable at all? Would I set it to volatile?
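For this application, a volatile but otherwise unsynchronized variable will often appear to work, because volatile prevents the compiler from generating code which caches that value and thus hides a modification by another thread. Strictly speaking, though, C++11 treats concurrent unsynchronized access to a non-atomic variable, where at least one access is a write, as a data race with undefined behaviour, and volatile does not change that.
The portable choice is therefore a std::atomic<bool> variable.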
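A small sketch of the std::atomic variant. The ticks counter and the runAndStop helper are hypothetical and exist only so that the example does something observable; in the real program the loop body would be the game logic or the rendering.

```cpp
#include <atomic>
#include <thread>

struct Gamestate {
    std::atomic<bool> run{true};
    std::atomic<int> ticks{0};  // hypothetical: counts loop iterations
};

void logicLoop(Gamestate* gs) {
    while (gs->run.load()) {   // atomic read, no mutex required
        gs->ticks.fetch_add(1);
        std::this_thread::yield();
    }
}

// Starts a worker, waits until it has looped at least once,
// then requests shutdown and joins it. Returns the tick count.
int runAndStop(Gamestate& gs) {
    std::thread worker(logicLoop, &gs);
    while (gs.ticks.load() == 0)
        std::this_thread::yield();
    gs.run.store(false);       // what the EventManager would do on a quit message
    worker.join();
    return gs.ticks.load();
}
```

The existing while(gs->run) loops keep working unchanged, since std::atomic<bool> converts implicitly to bool through an atomic load.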
Pointer indirection overhead
4) Does constantly dereferencing pointers and such have an impact on performance? eg gs->objects->entitylist.at(2)->move(); Do all those -> and . cause any major slowdown?
It depends on the alternatives. In many cases, the compiler will be able to keep the value of e.g. gs->objects->entitylist.at(2) in the above code, if it is used repeatedly, and won't have to compute it over and over again. In general I would consider the performance penalty due to all this pointer indirection to be of minor concern, but that is hard to tell for sure.
Is it a good way of going about things? (class Gamestate)
1) Is this a good way of going about things?
Yes.
Meaning is having a "global" Gamestate class a good idea to use?
Yes, if the getter/setter are thread-safe.
Is there a better way of going about it?
No. The data is necessary for both game logic and representation. You could remove the global gamestate if you put it in a sub-routine, but this would only transport your problem to another function. A global Gamestate will also enable you to save the current state very easily.
Mutex and getters/setters
2) My intention was to have Gamestate have mutexes in the getters/setters [...]. What is the most elegant way of going about this problem?
This is called the readers-writers problem. You don't need public mutexes for this. Just keep in mind that you can have many readers, but only one writer. You could implement a queue for the readers/writers and block additional readers until the writer has finished.
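Since C++17 the standard library offers std::shared_mutex for exactly this many-readers, one-writer situation, so you may not need to build the queue yourself. A minimal sketch, assuming a hypothetical Position resource; the getter returns a copy, so nothing escapes the lock:

```cpp
#include <mutex>
#include <shared_mutex>

struct Position { float x; float y; };  // hypothetical shared resource

class Gamestate {
public:
    // Any number of readers may hold the shared lock at the same time.
    Position playerPosition() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return position_;  // returned by value, copied while the lock is held
    }

    // A writer gets exclusive access; readers wait until it is done.
    void setPlayerPosition(Position p) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        position_ = p;
    }

private:
    mutable std::shared_mutex mutex_;  // private, so callers cannot misuse it
    Position position_{0.0f, 0.0f};
};
```

This only makes each individual get or set consistent; a read-modify-write sequence still needs one lock held across the whole sequence.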
while(gs->run)
Do I need to synchronize that variable at all?
Whenever a non-synchronized access of a variable could result in an unknown state, it should be synchronized. So if run is set to false immediately after the rendering engine has started its next iteration and the Gamestate has been destroyed, the result will be a mess. However, if gs->run is only an indicator of whether the loop should continue, it is safe.
Keep in mind that both logic and rendering engine should be stopped at the same time. If you can't shut down both at the same time, stop the rendering engine first in order to prevent a freeze.
Dereferencing pointers
4) Does constantly dereferencing pointers and such have an impact on performance?
There are two rules of optimization:
Do not optimize
Do not optimize yet.
The compiler will probably take care of this problem. You, as a programmer, should use the version which is most readable for you.
I'm not sure if this is a question regarding programming technique or design but I'm open for suggestions.
The problem: I want to create an abstraction layer between data sources (sensors) and consumers. The idea is that the consumers only "know" the interfaces (abstract base class) of different sensor types. Each of these sensor types usually consists of several individual values, each of which has its own getter method.
As an example I will use a simplified GPS sensor.
class IGpsSensor {
public:
virtual float getLongitude() = 0;
virtual float getLatitude() = 0;
virtual float getElevation() = 0;
// Deviations
virtual float getLongitudeDev() = 0;
virtual float getLatitudeDev() = 0;
virtual float getElevationDev() = 0;
virtual int getNumOfSatellites() = 0;
};
Since updates to the sensor are done by a different thread (details are up to the implementation of the interface), synchronizing getters and also the update methods seems like a reasonable approach to ensure consistency.
So far so good. In most cases this level of synchronization should suffice. However, sometimes it might be necessary to acquire more than one value (with consecutive getXXX() calls) and ensure that no update is happening in between. Whether this is necessary or not (and which values are important) is up to the consumer.
Sticking to the example, in a lot of cases it is only important to know longitude and latitude (but hopefully both relating to the same update()). I admit that this could be done by grouping them together into a "Position" class or struct. But a consumer might also use the sensor for a more complicated algorithm and require the deviation as well.
Now I was wondering, what would be a proper way to do this.
Solutions I could think of:
Group all possible values into a struct (or class) and add an additional (synchronized) getter returning copies of all values at once - seems like a lot of unnecessary overhead to me in case only 2 or 3 out of maybe 10 values are needed.
Add a method returning a reference to the mutex used within the data source to allow locking by the consumer - this doesn't feel like "good design". And since getters are already synchronized, using a recursive mutex is mandatory. However, I assume that there are multiple readers but only one writer and thus I'd rather go with a shared mutex here.
Thanks for your help.
How about exposing a "Reader" interface? To get the reader object, you would do something like this:
const IGpsSensorReader& gps_reader = gps_sensor.getReader();
The IGpsSensorReader class could have access to protected members of the IGpsSensor class. When constructed, it would acquire the lock. Upon destruction, it would release the lock. An accessor could do something like this:
{ //block that accesses attributes
const IGpsSensorReader& gps_reader = gps_sensor.getReader();
//read whatever values from gps_reader it needs
} //closing the scope will destruct gps_reader, causing an unlock
You could also expose a getWriter method to the thread doing the updates. Internally, you could use boost's shared_mutex to mediate access between the readers and the writers.
A technique I've used in some simple projects is to only provide access to a proxy object. This proxy object holds a lock for the duration of its lifetime, and provides the actual interface to my data. This access does no synchronization itself, because it is only available through the proxy which is already locked appropriately. I've never tried expanding this to a full scale project, but it has seemed to work well for my purposes.
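A compact sketch of such a locking proxy for the GPS example, using std::shared_mutex from C++17 instead of the Boost equivalent. The concrete class GpsSensor, its nested Reader, and the update signature are assumptions for illustration; the interface in the question is abstract and has more getters.

```cpp
#include <mutex>
#include <shared_mutex>

class GpsSensor {
public:
    // Proxy: holds a shared (read) lock for as long as it lives.
    class Reader {
    public:
        explicit Reader(const GpsSensor& s) : sensor_(s), lock_(s.mutex_) {}
        // The getters do no locking themselves; the proxy already holds the lock.
        float getLongitude() const { return sensor_.longitude_; }
        float getLatitude() const { return sensor_.latitude_; }
        int getNumOfSatellites() const { return sensor_.satellites_; }
    private:
        const GpsSensor& sensor_;
        std::shared_lock<std::shared_mutex> lock_;
    };

    Reader getReader() const { return Reader(*this); }

    // Called by the updating thread; takes the exclusive (write) lock.
    void update(float longitude, float latitude, int satellites) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        longitude_ = longitude;
        latitude_ = latitude;
        satellites_ = satellites;
    }

private:
    mutable std::shared_mutex mutex_;
    float longitude_ = 0.0f;
    float latitude_ = 0.0f;
    int satellites_ = 0;
};
```

All values read through one Reader belong to the same update, and a consumer that needs only two of ten values pays for only two reads. The proxy should be kept in as small a scope as possible, because the updating thread is blocked while any Reader is alive.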
Possible solution: derive all your source classes from
class Transaction {
pthread_mutex_t mtx;
// constructor/destructor
public:
void beginTransaction() { pthread_mutex_lock(&mtx); } // ERROR CHECKING MISSING
void endTransaction() { pthread_mutex_unlock(&mtx); } // DO ERROR CHECKING
protected:
// helper method
int getSingle(int *ptr)
{ int v; beginTransaction(); v=*ptr; endTransaction(); return v; }
};
If you need to read out multiple values, use begin/endTransaction methods. To define your getValue functions, just call getSingle with pointer to the appropriate member [this is just a convenience method so that you don't have to call begin/endTransaction in each getValue function.].
You will need to flesh out some details, because if your getValue functions use begin/endTransaction, you won't be able to call them inside a transaction. (A mutex can be locked only once, unless it is configured to be recursive.)
I understand recursive mutex allows mutex to be locked more than once without getting to a deadlock and should be unlocked the same number of times. But in what specific situations do you need to use a recursive mutex? I'm looking for design/code-level situations.
For example, when you have a function that calls itself recursively and you want synchronized access to it:
void foo() {
... mutex_acquire();
... foo();
... mutex_release();
}
without a recursive mutex you would have to create an "entry point" function first, and this becomes cumbersome when you have a set of functions that are mutually recursive. Without recursive mutex:
void foo_entry() {
mutex_acquire(); foo(); mutex_release(); }
void foo() { ... foo(); ... }
Recursive and non-recursive mutexes have different use cases. No mutex type can easily replace the other. Non-recursive mutexes have less overhead, and recursive mutexes have in some situations useful or even needed semantics and in other situations dangerous or even broken semantics. In most cases, someone can replace any strategy using recursive mutexes with a different safer and more efficient strategy based on the usage of non-recursive mutexes.
If you just want to exclude other threads from using your mutex protected resource, then you could use any mutex type, but might want to use the non-recursive mutex because of its smaller overhead.
If you want to call functions recursively, which lock the same mutex, then they either
have to use one recursive mutex, or
have to unlock and lock the same non-recursive mutex again and again (beware of concurrent threads!) (assuming this is semantically sound, it could still be a performance issue), or
have to somehow annotate which mutexes they already locked (simulating recursive ownership/mutexes).
If you want to lock several mutex-protected objects from a set of such objects, where the sets could have been built by merging, you can choose
to use per object exactly one mutex, allowing more threads to work in parallel, or
to use per object one reference to any possibly shared recursive mutex, to lower the probability of failing to lock all mutexes together, or
to use per object one comparable reference to any possibly shared non-recursive mutex, circumventing the intent to lock multiple times.
If you want to release a lock in a different thread than the one that locked it, then you have to use non-recursive locks (or recursive locks which explicitly allow this instead of throwing exceptions).
If you want to use synchronization variables, then you need to be able to explicitly unlock the mutex while waiting on any synchronization variable, so that the resource is allowed to be used in other threads. That is only sanely possible with non-recursive mutexes, because recursive mutexes could already have been locked by the caller of the current function.
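In C++ terms, the last point corresponds to condition variables: std::condition_variable only accepts a std::unique_lock<std::mutex>, and wait() unlocks that mutex exactly once while blocked. A minimal sketch, assuming a hypothetical one-slot Mailbox class (the class and the demoMailbox helper are not from the answer above):

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-slot mailbox; names are illustrative only.
class Mailbox {
public:
    void put(int value) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            value_ = value;
            full_ = true;
        }
        ready_.notify_one();
    }

    int take() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases mutex_ while blocked and reacquires it before
        // returning, so put() can run in another thread in the meantime.
        ready_.wait(lock, [this] { return full_; });
        full_ = false;
        return value_;
    }

private:
    std::mutex mutex_;  // non-recursive: a single unlock really frees it
    std::condition_variable ready_;
    int value_ = 0;
    bool full_ = false;
};

// Runs a producer thread and returns what the consumer received.
int demoMailbox() {
    Mailbox box;
    std::thread producer([&box] { box.put(42); });
    int received = box.take();
    producer.join();
    return received;
}
```

With a recursive mutex that the caller had already locked, the single unlock performed by the wait would not release the resource, which is the hazard described above.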
I encountered the need for a recursive mutex today, and I think it's maybe the simplest example among the posted answers so far:
This is a class that exposes two API functions, Process(...) and reset().
public void Process(...)
{
acquire_mutex(mMutex);
// Heavy processing
...
reset();
...
release_mutex(mMutex);
}
public void reset()
{
acquire_mutex(mMutex);
// Reset
...
release_mutex(mMutex);
}
Both functions must not run concurrently because they modify internals of the class, so I wanted to use a mutex.
Problem is, Process() calls reset() internally, and it would create a deadlock because mMutex is already acquired.
Locking them with a recursive lock instead fixes the problem.
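A minimal C++ sketch of this situation, assuming std::recursive_mutex as the recursive lock. The class name Processor, the total_ member and the body of process() are hypothetical stand-ins for the real processing:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class mirroring the Process()/reset() pair above.
class Processor {
public:
    void process(int input) {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        reset();          // locks mutex_ again on the same thread: allowed
        total_ += input;  // stand-in for the heavy processing
    }

    void reset() {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        total_ = 0;
    }

    int total() {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        return total_;
    }

private:
    std::recursive_mutex mutex_;
    int total_ = 0;
};
```

With a plain std::mutex in place of std::recursive_mutex, the nested lock in reset() would be undefined behavior and in practice would typically deadlock.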
If you want to see an example of code that uses recursive mutexes, look at the sources for "Electric Fence" for Linux/Unix. 'Twas one of the common Unix tools for finding "bounds checking" read/write overruns and underruns as well as using memory that has been freed, before Valgrind came along.
Just compile Electric Fence from its sources with debug information (option -g with gcc/g++), link it with your software using the link option -lefence, and start stepping through the calls to malloc/free. http://elinux.org/Electric_Fence
It would certainly be a problem if a thread blocked trying to acquire (again) a mutex it already owned...
Is there a reason to not permit a mutex to be acquired multiple times by the same thread?
In general, like everyone here said, it's more about design. A recursive mutex is normally used in recursive functions.
What others fail to tell you here is that there's actually almost no cost overhead in recursive mutexes.
In general, a simple mutex is a 32-bit key with bits 0-30 containing the owner's thread id and bit 31 a flag saying whether the mutex has waiters. It has a lock method which is an atomic CAS (compare-and-swap) race to claim the mutex, with a syscall in case of failure. The details are not important here. It looks like this:
class mutex {
public:
void lock();
void unlock();
protected:
uint32_t key{}; //bits 0-30: thread_handle, bit 31: hasWaiters_flag
};
A recursive_mutex is normally implemented as:
class recursive_mutex : public mutex {
public:
void lock() {
uint32_t handle = current_thread_native_handle(); //obtained from TLS memory in most OS
if ((key & 0x7FFFFFFF) == handle) { // Impossible to return true unless you own the mutex.
uses++; // we own the mutex, just increase uses.
} else {
mutex::lock(); // we don't own the mutex, try to obtain it.
uses = 1;
}
}
void unlock() {
// asserts for debug, we should own the mutex and uses > 0
--uses;
if (uses == 0) {
mutex::unlock();
}
}
private:
uint32_t uses{}; // no need to be atomic, can only be modified in exclusion and only interesting read is on exclusion.
};
As you can see, it's an entirely user-space construct. (The base mutex is not, though: it MAY fall into a syscall if it fails to obtain the key in an atomic compare-and-swap on lock, and it will do a syscall on unlock if the hasWaiters_flag is set.)
For a base mutex implementation: https://github.com/switchbrew/libnx/blob/master/nx/source/kernel/mutex.c
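For readers who want something they can compile, here is a rough sketch of the same idea on top of the standard library. It is not how any particular library implements its recursive mutex: the class SimpleRecursiveMutex and the demoNestedLocking helper are hypothetical, and a std::mutex plus an atomic owner id stand in for the 32-bit key.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical sketch: owner id + use count layered on a plain mutex.
class SimpleRecursiveMutex {
public:
    void lock() {
        const std::thread::id self = std::this_thread::get_id();
        // Only this thread ever stores its own id into owner_, so the
        // comparison can be true only if we already hold base_.
        if (owner_.load() == self) {
            ++uses_;
            return;
        }
        base_.lock();  // we don't own it yet: block until we do
        owner_.store(self);
        uses_ = 1;
    }

    void unlock() {
        assert(owner_.load() == std::this_thread::get_id() && uses_ > 0);
        if (--uses_ == 0) {
            owner_.store(std::thread::id{});  // clear before releasing
            base_.unlock();
        }
    }

private:
    std::mutex base_;
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    unsigned uses_ = 0;  // only touched while base_ is held
};

// Two threads increment a counter under nested locks.
int demoNestedLocking() {
    SimpleRecursiveMutex m;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 1000; ++i) {
            m.lock();
            m.lock();  // nested acquisition by the owning thread
            ++counter;
            m.unlock();
            m.unlock();
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

The extra work on the already-owned path is one atomic load, a comparison and an increment, which is the point the answer makes about overhead.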
If you want to be able to call public methods of a class from inside its other public methods, while the object is used from different threads and many of these public methods change its state, you should use a recursive mutex. In fact, I make it a habit to use a recursive mutex by default unless there is a good reason (e.g. special performance considerations) not to.
It leads to better interfaces, because you don't have to split your implementation among non-locked and locked parts and you are free to use your public methods with peace of mind inside all methods as well.
In my experience, it also leads to interfaces that are easier to get right in terms of locking.
It seems no one has mentioned it before, but code using recursive_mutex is much easier to debug, since its internal structure contains the identifier of the thread holding it.