Hi Fellow Boost Enthusiasts
We have run into a problem with shared_mutex and have been digging into the boost source.
We can't tell if this is a deadlock case, or we are just not understanding the shared
mutex implementation for reader/writer locks.
Application:
We have a map (map<Ptr*, Data>) that needs to be created and queried by multiple threads.
However, most of the Ptr* values are common, so there is a fast warmup followed by
what we believe is a pattern of almost no insertions into the map. So we thought to use
a reader/writer pattern to control access to the map, like this:
boost::shared_mutex& lock_;
bool found = false;
{
shared_lock<boost::shared_mutex> slock(lock_);
(search the map to see if you have key)
if (keyFound) {
found = true;
}
}
if (!found) {
upgrade_lock<boost::shared_mutex> ulock(lock_);
(search the map again to see if the key has been inserted)
if (key still not found) {
upgrade_to_unique_lock<boost::shared_mutex> wlock(ulock);
insert into the map & do whatever
}
}
The original shared_lock is destroyed when its block goes out of scope, so if the
lookup under the shared_lock does not succeed, the upgrade_lock is the only lock this thread holds.
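For reference, here is a minimal, self-contained sketch of the pattern described above; the Cache/Data names and the void* key type are made up for illustration, not taken from the original code:

#include <map>
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

struct Data { int value; };

class Cache {
public:
    // Returns the cached entry for key, inserting a default one on first use.
    Data lookup(void* key) {
        {
            // Phase 1: shared (read) lock for the common case.
            boost::shared_lock<boost::shared_mutex> slock(lock_);
            std::map<void*, Data>::const_iterator it = map_.find(key);
            if (it != map_.end())
                return it->second;
        } // shared lock released here
        // Phase 2: upgrade lock - at most one thread may hold this,
        // but it can coexist with readers.
        boost::upgrade_lock<boost::shared_mutex> ulock(lock_);
        std::map<void*, Data>::const_iterator it = map_.find(key);
        if (it != map_.end())
            return it->second; // another thread inserted it meanwhile
        // Phase 3: atomically upgrade to exclusive ownership and insert.
        boost::upgrade_to_unique_lock<boost::shared_mutex> wlock(ulock);
        return map_[key] = Data();
    }
private:
    boost::shared_mutex lock_;
    std::map<void*, Data> map_;
};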
Observations:
All our threads have been stuck for days in
__lll_lock_wait or pthread_mutex_lock.
Upon digging into the boost::shared_mutex implementation, we find that there is
a single common "state_changed" mutex inside the shared_mutex, and for either the
shared_lock or the unique_lock to succeed, it needs to acquire that common state_changed mutex
during both lock construction and destruction. It seems that the unique_lock can go into
a state where it may release its scoped_lock on state_changed, but we cannot tell.
Either way, we cannot tell why the threads lock up for long periods of time
with only sporadic progress - it's not quite a deadlock, but something close.
Any help appreciated.
Sam Appleton
Take a look at the Boost.Thread change log, in particular at issue #7755, "Thread: deadlock with shared_mutex on Windows", which was fixed in 1.54. It might be the issue you're encountering.
By the way, a lot of Boost.Thread bugs have been fixed since 1.50, so it's worth upgrading to the latest version.
Related
I am trying to use a read/write lock in C++ using shared_mutex:
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

typedef boost::shared_mutex Lock;
typedef boost::unique_lock<Lock> WriteLock;
typedef boost::shared_lock<Lock> ReadLock;

class Test {
    Lock lock;
    WriteLock writeLock;
    ReadLock readLock;
public:
    // boost::defer_lock leaves the mutex unlocked at construction;
    // the member functions below lock and unlock it explicitly.
    Test() : writeLock(lock, boost::defer_lock), readLock(lock, boost::defer_lock) {}
    void readFn1() {
        readLock.lock();
        /*
        Some Code
        */
        readLock.unlock();
    }
    void readFn2() {
        readLock.lock();
        /*
        Some Code
        */
        readLock.unlock();
    }
    void writeFn1() {
        writeLock.lock();
        /*
        Some Code
        */
        writeLock.unlock();
    }
    void writeFn2() {
        writeLock.lock();
        /*
        Some Code
        */
        writeLock.unlock();
    }
};
The code seems to be working fine but I have a few conceptual questions.
Q1. I have seen the recommendations to use unique_lock and shared_lock on http://en.cppreference.com/w/cpp/thread/shared_mutex/lock, but I don't understand why, since shared_mutex already supports the lock and lock_shared methods?
Q2. Does this code have the potential to cause write starvation? If yes then how can I avoid the starvation?
Q3. Is there any other locking class I can try to implement read write lock?
Q1: use of a mutex wrapper
The recommendation to use a wrapper object instead of managing the mutex directly is to avoid the unfortunate situation where your code is interrupted and the mutex is never released, leaving it locked forever.
This is the principle of RAII.
But this only works if your ReadLock or WriteLock is local to the function using it.
Example:
void readFn1() {
    boost::shared_lock<Lock> rl(lock); // a shared lock, since this is a read path
    /*
    Some Code
    ==> imagine an exception is thrown
    */
    rl.unlock(); // this is never reached if an exception is thrown
} // fortunately, local objects are destroyed automatically when
  // an exception makes you leave the function prematurely
In your code this won't work if one of the functions is interrupted, because your ReadLock/WriteLock objects are members of Test and not local to the function taking the lock.
Q2: Write starvation
It is not fully clear how you will invoke the readers and the writers, but yes, there is a risk:
as long as readers are active, the writer is blocked by the unique_lock, waiting for the mutex to become acquirable in exclusive mode.
however, as long as the writer is waiting, new readers can still obtain the shared lock, delaying the unique_lock further.
If you want to avoid starvation, you have to ensure that waiting writers do get the opportunity to take their unique_lock. For example, add to your readers some code that checks whether a writer is waiting before taking the shared lock.
Q3: Other locking classes
Not quite sure what you're looking for, but I have the impression that a condition_variable could be of interest to you - though the logic is a little bit different.
Maybe you could also find a solution by thinking outside the box: perhaps there's a suitable lock-free data structure that could facilitate the coexistence of readers and writers by slightly changing the approach?
The types for the locks are OK, but instead of having them as members, create them inside the member functions: locktype lock(mymutex). That way they are released on destruction, even in the case of an exception, as sketched below.
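A minimal sketch of that advice applied to the class from the question, keeping its typedefs:

class Test {
    Lock lock;
public:
    void readFn1() {
        ReadLock rl(lock);  // locks in the constructor
        /* Some Code */
    }                       // unlocks in the destructor, even on exception
    void writeFn1() {
        WriteLock wl(lock);
        /* Some Code */
    }
};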
Q1. I have seen the recommendations to use unique_lock and shared_lock on http://en.cppreference.com/w/cpp/thread/shared_mutex/lock, but I don't understand why, since shared_mutex already supports the lock and lock_shared methods?
Possibly because unique_lock has been around since C++11, while std::shared_lock only arrived with C++14 (and std::shared_mutex itself with C++17). Also, [possibly] unique_lock can be more efficient. Here's the original rationale for shared locking [by its designer]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2406.html - and I defer to that.
Q2. Does this code have the potential to cause write starvation? If yes then how can I avoid the starvation?
Yes, absolutely. If you do:
while (1)
writeFn1();
You can end up with a time line of:
T1: writeLock.lock()
T2: writeLock.unlock()
T3: writeLock.lock()
T4: writeLock.unlock()
T5: writeLock.lock()
T6: writeLock.unlock()
...
The difference T2-T1 is arbitrary, based on the amount of work being done, but T3-T2 is near zero. That gap is the only window for another thread to acquire the lock, and because the window is so small, the other thread probably won't get it.
To solve this, the simplest method is to insert a small sleep (e.g. nanosleep) between T2 and T3. You could do this by adding it at the bottom of writeFn1, as sketched below.
Other methods can involve creating a queue for the lock: if a thread can't get the lock, it adds itself to a queue, and the first thread in the queue gets the lock when it is released. In the Linux kernel, this is implemented as the "queued spinlock".
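A minimal sketch of the sleep-based fix, reusing the writeFn1 member from the question; the 100-microsecond pause is an arbitrary illustration, tune it to your workload:

#include <boost/thread/thread.hpp>
#include <boost/chrono.hpp>

void Test::writeFn1() {
    writeLock.lock();
    /*
    Some Code
    */
    writeLock.unlock();
    // Give waiting threads a window to grab the mutex before this
    // thread loops around and re-locks it.
    boost::this_thread::sleep_for(boost::chrono::microseconds(100));
}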
Q3. Is there any other locking class I can try to implement read write lock?
While not a class, you could use pthread_mutex_lock and pthread_mutex_unlock. These can be configured as recursive locks (via PTHREAD_MUTEX_RECURSIVE). You could add your own code to implement the equivalent of boost::scoped_lock, and your class can control the semantics.
Or, boost has its own locks.
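For illustration, a minimal sketch of that do-it-yourself approach on POSIX; the class and function names are made up, and error handling is omitted:

#include <pthread.h>

// Homegrown equivalent of boost::scoped_lock for a pthread mutex.
class PthreadScopedLock {
public:
    explicit PthreadScopedLock(pthread_mutex_t& m) : m_(m) {
        pthread_mutex_lock(&m_);
    }
    ~PthreadScopedLock() { pthread_mutex_unlock(&m_); }
private:
    pthread_mutex_t& m_;
    PthreadScopedLock(const PthreadScopedLock&);            // non-copyable
    PthreadScopedLock& operator=(const PthreadScopedLock&);
};

// pthread mutexes are recursive only if you ask for it explicitly:
void init_recursive_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}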
I have a need for interprocess synchronization around a piece of hardware. Because this code will need to work on Windows and Linux, I'm wrapping it with Boost.Interprocess mutexes. Everything works well except my method for checking abandonment of the mutex. There is the potential for this to happen, so I must prepare for it.
I've abandoned the mutex in my testing and, sure enough, when I use scoped_lock to lock the mutex, the process blocks indefinitely. I figured the way around this is to use the timeout mechanism on scoped_lock (much time spent Googling for methods to account for this didn't show much; Boost doesn't do much around this, for portability reasons).
Without further ado, here's what I have:
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
typedef boost::interprocess::named_recursive_mutex MyMutex;
typedef boost::interprocess::scoped_lock<MyMutex> ScopedLock;
MyMutex* pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
{
// ScopedLock lock(*pGate); // this blocks indefinitely
boost::posix_time::ptime timeout(boost::posix_time::microsec_clock::local_time() + boost::posix_time::seconds(10));
ScopedLock lock(*pGate, timeout); // a 10 second timeout that returns immediately if the mutex is abandoned ?????
if(!lock.owns()) {
delete pGate;
boost::interprocess::named_recursive_mutex::remove("MutexName");
pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
}
}
That, at least, is the idea. Three interesting points:
When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. OK, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?
When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.
So, what am I missing on using these objects? Perhaps it's staring me in the face, but I can't see it and so I'm asking for help.
I should also mention that, because of how this hardware works, if the process cannot gain ownership of the mutex within 10 seconds, the mutex is abandoned. In fact, I could probably wait as little as 50 or 60 milliseconds, but 10 seconds is a nice "round" number of generosity.
I'm compiling on Windows 7 using Visual Studio 2010.
Thanks,
Andy
When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
The best solution to your problem would be for Boost to support robust mutexes. However, Boost currently does not support them. There is only a plan to emulate robust mutexes, because only Linux has native support for them. The emulation is still just planned by Ion Gaztanaga, the library author.
Check this link about a possible hack to add robust mutexes to the Boost libs:
http://boost.2283326.n4.nabble.com/boost-interprocess-gt-1-45-robust-mutexes-td3416151.html
Meanwhile you might try to use atomic variables in a shared segment.
Also take a look at this stackoverflow entry:
How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?
When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. OK, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?
This is very strange; you should not be getting this behavior. However:
The timed lock is possibly implemented in terms of the try lock. Check this documentation:
http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/scoped_lock.html#idp57421760-bb
This means the implementation of the timed lock might throw an exception internally and then return false:
inline bool windows_mutex::timed_lock(const boost::posix_time::ptime &abs_time)
{
    sync_handles &handles =
        windows_intermodule_singleton<sync_handles>::get();
    //This can throw
    winapi_mutex_functions mut(handles.obtain_mutex(this->id_));
    return mut.timed_lock(abs_time);
}
Possibly, the handle cannot be obtained, because the mutex is abandoned.
When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.
I am not sure about this one, but I think the named mutex is implemented using shared memory. If you are using Linux, check for the file /dev/shm/MutexName. In Linux, a file descriptor remains valid until it is closed, regardless of whether you have removed the file itself, e.g. via boost::interprocess::named_recursive_mutex::remove.
Check out the BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING and BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS compile flags. Define the first symbol in your code to force the interprocess mutexes to time out and the second symbol to define the timeout duration.
I helped get them added to the library to solve the abandoned-mutex issue. It was necessary because many interprocess constructs (like message_queue) rely on the simple mutex rather than the timed mutex. There may be a more robust solution in the future, but this solution has worked just fine for my interprocess needs.
I'm sorry I can't help you with your code at the moment; something is not working correctly there.
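For what it's worth, since these are preprocessor symbols they have to be visible before any Boost.Interprocess header is included (or be passed on the compiler command line); the 30-second value below is just an example:

// Must appear before any Boost.Interprocess include.
#define BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING
#define BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS 30000

#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>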
BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING is not so good: it throws an exception and does not help much. To work around that exceptional behaviour I wrote this macro. It works well enough for common purposes. In this sample, named_mutex is used. The macro creates a scoped lock with a timeout, and if the lock cannot be acquired for EXCEPTIONAL reasons, it unlocks the mutex afterwards. This way the program can lock it again later and does not freeze or crash immediately.
#define TIMEOUT 1000
#define SAFELOCK(pMutex) \
boost::posix_time::ptime wait_time \
= boost::posix_time::microsec_clock::universal_time() \
+ boost::posix_time::milliseconds(TIMEOUT); \
boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(*pMutex, wait_time); \
if(!lock.owns()) { \
pMutex->unlock(); }
But even this is not optimal, because the code to be locked now runs unlocked once, which may cause problems. You can easily extend the macro, however - e.g. run the code only if lock.owns() is true, as sketched below.
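For example, a hedged sketch of that extension - the macro name is made up, and it runs the guarded code only when the lock is actually owned, otherwise it force-unlocks the abandoned mutex as above:

#define SAFELOCK_RUN(pMutex, code) \
{ \
    boost::posix_time::ptime wait_time \
        = boost::posix_time::microsec_clock::universal_time() \
        + boost::posix_time::milliseconds(TIMEOUT); \
    boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(*pMutex, wait_time); \
    if (lock.owns()) { \
        code; \
    } else { \
        pMutex->unlock(); /* recover the abandoned mutex for the next attempt */ \
    } \
}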
boost::interprocess::named_mutex has three possible definitions:
on Windows, you can use a macro to make Boost use a native Windows mutex instead of its own emulation; you can then catch the abandoned-mutex exception, and you should unlock the mutex when you do;
on Linux, Boost uses a pthread_mutex, but it does not set the robust attribute (as of version 1.65.1);
so I implemented an interprocess mutex myself using the system APIs (a Windows mutex, and a Linux pthread_mutex in process-shared mode) - note that the Windows mutex lives in the kernel rather than in a file.
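For reference, a minimal sketch of the process-shared, robust pthread setup this answer alludes to (Linux-specific; error handling omitted, and the mutex itself must live in shared memory):

#include <pthread.h>
#include <errno.h>

void init_robust_shared_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
}

void lock_handling_abandonment(pthread_mutex_t* m) {
    if (pthread_mutex_lock(m) == EOWNERDEAD) {
        // The previous owner died while holding the mutex: repair the
        // data it guarded, then mark the mutex usable again.
        pthread_mutex_consistent(m);
    }
}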
Craig Graham answered this in a reply already, but I thought I'd elaborate, because I found this question, didn't read his message, and beat my head against the problem before figuring it out.
On a POSIX system, timed_lock calls:
timespec ts = ptime_to_timespec(abs_time);
pthread_mutex_timedlock(&m_mut, &ts);
where abs_time is the ptime that the user passes into the interprocess timed_lock.
The problem is that abs_time must be expressed in UTC, not local time: ptime_to_timespec turns the ptime into an offset from the epoch, which pthread_mutex_timedlock interprets against the UTC-based realtime clock.
Assume you want to wait for 10 seconds. If your local time is behind UTC, timed_lock() will return immediately (the computed deadline is already in the past); if your local time is ahead of UTC, timed_lock() will not time out until hours_ahead + 10 seconds have passed.
The following ptime times out an interprocess mutex in 10 seconds:
boost::posix_time::ptime now = boost::posix_time::second_clock::universal_time() +
boost::posix_time::seconds(10);
If I use ::local_time() instead of ::universal_time(), the deadline is off by my whole UTC offset and the lock does not time out when expected.
The documentation fails to mention this.
I haven't tried it, but digging into the code a bit, it looks like the same problem would occur on a non-POSIX system.
If BOOST_INTERPROCESS_POSIX_TIMEOUTS is not defined, the function ipcdetail::try_based_timed_lock(*this, abs_time) is called.
It uses universal time as well, spinning on while(microsec_clock::universal_time() < abs_time).
This is only speculation, as I don't have quick access to a Windows system to test this on.
For full details, see https://www.boost.org/doc/libs/1_76_0/boost/interprocess/sync/detail/common_algorithms.hpp
According to the Boost documentation, boost::mutex and boost::timed_mutex are supposed to be different: the first implements the Lockable concept, and the second the TimedLockable concept.
But if you take a look at the source, you can see they're basically the same thing. The only difference is the lock typedefs. You can call timed_lock on a boost::mutex, or use boost::unique_lock with a timeout, just fine.
typedef ::boost::detail::basic_timed_mutex underlying_mutex;

class mutex:
    public ::boost::detail::underlying_mutex

class timed_mutex:
    public ::boost::detail::basic_timed_mutex
What's the rationale behind that? Is it some remnant of the past? Is it wrong to use boost::mutex as a TimedLockable? It's undocumented, after all.
I have not looked at the source, but I used these a few days ago, and the timed mutexes do function differently: they block until either they acquire the lock or the time is up, and then return. A unique lock will block until it can get the lock.
A try lock will not block, and you can then test whether it has ownership of the lock. A timed lock will block for at most the specified amount of time and then behave as a try lock - that is, it ceases blocking, and you can test for ownership of the lock.
I believe that internally some of the different boost locks are typedefs for unique lock since they all use unique locking. The typedef names are there so that you can keep track of what you are using different ones for, even though you could use different functionality and confuse your client code.
Edit: here is an example of a timed lock:
boost::timed_mutex timedMutexObj;
boost::unique_lock<boost::timed_mutex> scopedLockObj(timedMutexObj, boost::get_system_time() + boost::posix_time::seconds(60));
if (scopedLockObj.owns_lock()) {
    // proceed
}
For reference: http://www.boost.org/doc/libs/1_49_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_concepts.timed_lockable.timed_lock
Edit again: to provide a specific answer to your question: yes, it would be wrong to use boost::mutex as a TimedLockable, because boost::timed_mutex is provided for that purpose. If they are the same thing in the source and this is undocumented, that is unreliable behavior and you should follow the documentation. (My code example did not use timed_mutex at first, but I updated it.)
There is a widely known way of locking multiple locks, which relies on choosing a fixed linear ordering and acquiring the locks according to that ordering.
That was proposed, for example, in the answer to "Acquire a lock on two mutexes and avoid deadlock". In particular, the solution based on address comparison seems quite elegant and obvious.
When I tried to check how this is actually implemented, I found, to my surprise, that this solution is not widely used.
To quote the Kernel Docs - Unreliable Guide To Locking:
Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.
PThreads doesn't seem to have such a mechanism built in at all.
Boost.Thread came up with a completely different solution: lock() for multiple (2 to 5) mutexes is based on trying to lock as many mutexes as possible at any given moment.
This is the fragment of the Boost.Thread source code (Boost 1.48.0, boost/thread/locks.hpp:1291):
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    unsigned const lock_count=3;
    unsigned lock_first=0;
    for(;;)
    {
        switch(lock_first)
        {
        case 0:
            lock_first=detail::lock_helper(m1,m2,m3);
            if(!lock_first)
                return;
            break;
        case 1:
            lock_first=detail::lock_helper(m2,m3,m1);
            if(!lock_first)
                return;
            lock_first=(lock_first+1)%lock_count;
            break;
        case 2:
            lock_first=detail::lock_helper(m3,m1,m2);
            if(!lock_first)
                return;
            lock_first=(lock_first+2)%lock_count;
            break;
        }
    }
}
where lock_helper returns 0 on success and otherwise indicates which mutex could not be locked.
Why is this solution better than comparing addresses or any other kind of IDs? I don't see any problems with pointer comparison that would be avoided by this kind of "blind" locking.
Are there any other ideas on how to solve this problem on a library level?
From the bounty text:
I'm not even sure if I can prove correctness of the presented Boost solution, which seems more tricky than the one with linear order.
The Boost solution cannot deadlock because it never waits while already holding a lock. All locks but the first are acquired with try_lock. If any try_lock call fails to acquire its lock, all previously acquired locks are freed. Also, in the Boost implementation the new attempt starts from the lock that failed the previous time, first waiting until that one is available; it's a smart design decision.
As a general rule, it's always better to avoid blocking calls while holding a lock. Therefore, the solution with try_lock, if possible, is preferred (in my opinion). As a particular consequence, with lock ordering the system as a whole might get stuck. Imagine the very last lock (e.g. the one with the biggest address) was acquired by a thread which was then blocked. Now imagine some other thread needs the last lock and another lock; due to the ordering, it will first get the other one and then wait on the last lock. The same can happen with all the other locks, and the whole system makes no progress until the last lock is released. Of course it's an extreme and rather unlikely case, but it illustrates the inherent problem with lock ordering: the higher a lock's position in the order, the more indirect impact the lock has when acquired.
The shortcoming of the try-lock-based solution is that it can cause livelock, and in extreme cases the whole system might also get stuck for at least some time. Therefore it is important to have some back-off scheme that makes the pauses between locking attempts longer over time, and perhaps randomized, as sketched below.
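For illustration, here is a minimal try-lock-with-back-off sketch for two mutexes - not the Boost implementation, and the back-off constants are arbitrary:

#include <algorithm>
#include <chrono>
#include <mutex>
#include <random>
#include <thread>

// Keep try-locking both mutexes; on failure release everything and
// pause with a randomized, growing back-off to avoid livelock.
void lock_both(std::mutex& a, std::mutex& b) {
    std::mt19937 rng(std::random_device{}());
    for (int attempt = 0; ; ++attempt) {
        a.lock();
        if (b.try_lock())
            return;                 // both locks held
        a.unlock();                 // never wait while holding a lock
        int max_us = 1 << std::min(attempt, 10);   // exponential cap
        std::uniform_int_distribution<int> dist(0, max_us);
        std::this_thread::sleep_for(std::chrono::microseconds(dist(rng)));
    }
}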
Sometimes, lock A needs to be acquired before lock B. Lock B might have either a lower or a higher address, so you can't use address comparison in this case.
Example: when you have a tree data structure and threads try to read and update nodes, you can protect the tree using a reader-writer lock per node. This only works if your threads always acquire the locks top-down, root to leaf. The addresses of the locks do not matter in this case.
You can only use address comparison if it does not matter at all which lock gets acquired first. If this is the case, address comparison is a good solution. But if this is not the case you can't do it.
I guess the Linux kernel requires certain subsystems to be locked before others are. This cannot be done using address comparison.
The "address comparison" and similar approaches, although used quite often, are special cases. They works fine if you have
a lock-free mechanism to get
two (or more) "items" of the same kind or hierarchy level
any stable ordering schema between those items
For example: You have a mechanism to get two "accounts" from a list. Assume that the access to the list is lock-free. Now you have pointers to both items and want to lock them. Since they are "siblings" you have to choose which one to lock first. Here the approach using addresses (or any other stable ordering schema like "account id") is OK.
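A minimal sketch of the address-ordering idea for that sibling case; the Account type is made up for illustration:

#include <functional>
#include <mutex>

struct Account {
    std::mutex m;
    long balance;
};

// Lock two sibling accounts in a fixed global order: by address.
// Every thread locking the same pair picks the same order, so the
// classic AB/BA deadlock cannot occur. std::less gives a total
// order over pointers even where a raw '<' would be unspecified.
void lock_pair(Account& a, Account& b) {
    if (std::less<Account*>()(&a, &b)) {
        a.m.lock();
        b.m.lock();
    } else {
        b.m.lock();
        a.m.lock();
    }
}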
But the linked Linux text talks about "lock hierarchies". This means locking not between "siblings" (of the same kind) but between "parent" and "children", which might be of different types. This can happen in actual tree structures as well as in other scenarios.
Contrived example: To load a program you must
lock the file inode,
lock the process table
lock the destination memory
These three locks are not "siblings" and are not in a clear hierarchy. The locks are also not taken directly one after the other - each subsystem takes its locks at will. If you consider all the use cases where those three (and more) subsystems interact, you see that there is no clear, stable ordering you could use.
The Boost library is in the same situation: it strives to provide generic solutions. So it cannot assume the points above and must fall back to a more complicated strategy.
One scenario where address comparison will fail is if you use the proxy pattern:
you can delegate the locks to the same underlying object, and yet the proxies' addresses will be different.
Consider the following example:
#include <iostream>

template<typename MutexType>
class MutexHelper
{
public:
    MutexHelper(MutexType &m) : _m(m) {}
    void lock()
    {
        std::cout << "locking ";
        _m.lock();
    }
    void unlock()
    {
        std::cout << "unlocking ";
        _m.unlock();
    }
private:
    MutexType &_m;
};
If the function
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3);
actually used address comparison, the following code could produce a deadlock:
Mutex m1;
Mutex m2;
thread1:
MutexHelper<Mutex> hm1(m1);
MutexHelper<Mutex> hm2(m2);
lock(hm1, hm2);
thread2:
MutexHelper<Mutex> hm2(m2);
MutexHelper<Mutex> hm1(m1);
lock(hm1, hm2);
EDIT:
this is an interesting thread that sheds some light on the boost::lock implementation:
thread-best-practice-to-lock-multiple-mutexes
Address compare does not work for inter-process shared mutexes (named synchronization objects).
I don't quite understand the difference between these two lock classes.
In the Boost documentation it is said that boost::unique_lock doesn't lock automatically.
Does that mean the main difference between unique_lock and lock_guard is that with unique_lock we must call the lock() function explicitly?
First, to answer your question: no, you don't need to call lock on a unique_lock. See below:
The unique_lock is only a lock class with more features. In most cases the lock_guard will do what you want and will be sufficient.
The unique_lock has more features to offer, e.g. a timed wait if you need a timeout, or deferring the lock to a later point than the construction of the object. So it highly depends on what you want to do.
BTW: The following code snippets do the same thing.
boost::mutex mutex;
boost::lock_guard<boost::mutex> lock(mutex);

boost::mutex mutex;
boost::unique_lock<boost::mutex> lock(mutex);
The first one can be used to synchronize access to data, but if you want to use condition variables you need to go for the second one.
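A short sketch of that condition-variable case - unique_lock is required because wait() has to unlock and relock the mutex; the names are illustrative:

#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>

boost::mutex mtx;
boost::condition_variable cv;
bool ready = false;

void consumer() {
    boost::unique_lock<boost::mutex> lk(mtx);
    while (!ready)    // guard against spurious wakeups
        cv.wait(lk);  // atomically unlocks mtx while waiting
    // ... use the data protected by mtx ...
}

void producer() {
    {
        boost::lock_guard<boost::mutex> lk(mtx);
        ready = true;
    }
    cv.notify_one();
}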
The currently best-voted answer is good, but it did not clarify my doubt until I dug a bit deeper, so I decided to share with people who might be in the same boat.
Firstly, both lock_guard and unique_lock follow the RAII pattern: in the simplest use case, the lock is acquired during construction and released automatically during destruction. If that is your use case, you don't need the extra flexibility of unique_lock, and lock_guard will be more efficient.
The key difference between the two is that a unique_lock instance doesn't need to always own the mutex it is associated with, while a lock_guard always owns its mutex. This means unique_lock needs an extra flag indicating whether it owns the lock, and an extra method owns_lock() to check that. Knowing this, we can explain all the extra benefits this flag brings, at the cost of that extra data having to be set and checked:
The lock doesn't have to be taken right at construction: you can pass the flag std::defer_lock during construction to keep the mutex unlocked at first (see the sketch after this list).
You can unlock it before the function ends, rather than necessarily waiting for the destructor to release it, which can be handy.
You can pass the ownership of the lock out of a function; it is movable, not copyable.
It can be used with condition variables, since that requires the mutex to be locked, the condition checked, and the mutex unlocked while waiting.
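A short sketch of the deferred-locking and ownership-transfer points, using the std:: names from cppreference; the functions are illustrative:

#include <mutex>

std::mutex m;

std::unique_lock<std::mutex> acquire_later() {
    std::unique_lock<std::mutex> lk(m, std::defer_lock); // not locked yet
    // ... do some preparation without holding the mutex ...
    lk.lock();
    return lk;    // ownership moves out of the function
}

void caller() {
    std::unique_lock<std::mutex> lk = acquire_later();
    // lk owns m here; it is released when lk is destroyed
}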
Their implementations can be found under .../boost/thread/locks.hpp - and they sit right next to each other :) To sum things up:
lock_guard is a short, simple utility class that locks the mutex in its constructor and unlocks it in its destructor, not caring about any details.
unique_lock is a bit more complex, adding quite a lot of features - but it still locks automatically in its constructor. It is called unique_lock because it introduces the "lock ownership" concept (see the owns_lock() method).
If you're used to pthreads(3):
boost::mutex = pthread_mutex_*
boost::unique_lock = pthread_rwlock_* used to obtain write/exclusive locks (i.e. pthread_rwlock_wrlock)
boost::shared_lock = pthread_rwlock_* used to obtain read/shared locks (i.e. pthread_rwlock_rdlock)
Yes, a boost::unique_lock and a boost::mutex function in similar ways, but a boost::mutex is generally a lighter-weight mutex to acquire and release. That said, taking a shared_lock on a lock that is already held shared is fast (and allows for concurrency), but it is comparatively expensive to obtain a unique_lock.
You have to look under the covers to see the implementation details, but that's the gist of the intended differences.
Speaking of performance: here's a moderately useful comparison of latencies:
http://www.eecs.berkeley.edu/%7Ercs/research/interactive_latency.html
It would be nice if I/someone could benchmark the relative cost of the different pthread_* primitives, but the last time I looked, pthread_mutex_* was ~25us, whereas pthread_rwlock_* was ~20-100us, depending on whether the read lock had already been acquired (~10us) or not (~20us), or a writer was involved (~100us). You'll have to benchmark to confirm current numbers, and I'm sure it's very OS-specific.
I think unique_lock may also be used when you need to emphasize the difference between unique and shared locks.