Upgradeable read/write lock Win32 - c++

I am in search of an upgradeable read write lock for win32 with the behaviour of pthreads rwlock, where a read lock can be up- and downgraded.
What I want:
pthread_rwlock_rdlock( &lock );
...read...
if( some condition ) {
pthread_rwlock_wrlock( &lock );
...write...
pthread_rwlock_unlock( &lock );
}
...read...
pthread_rwlock_unlock( &lock );
The upgrade behaviour is not required by posix, but it works on linux on mac.
Currently, I have a working implementation (based on an event, a semaphore and a critical section) that is upgradeable, but the upgrade may fail when readers are active. If it fails a read unlock + recheck + write lock is necessary.
What I have:
lock.rdlock();
...read...
if( some condition ) {
if( lock.tryupgrade() ) {
...write...
lock.unlock();
return;
} else {
lock.unlock();
// <- here, other threads may alter the condition ->
lock.wrlock();
if( some condition ) { // so, re-check required
...write...
}
lock.unlock();
return;
}
}
...read...
lock.unlock();
EDIT: The bounty:
I am still in search, but want to add some restrictions: it is used intra-process-only (so based on critical sections is ok, WIN32 mutexes are not ok), and it should be pure WIN32 API (no MFC, ATL etc.). Acquiring read locks should be fast (so, acquiring the read lock should not enter a critical section in its fast path). Maybe an InterlockedIncrement based solution is possible?

The boost shared_mutex class supports reader (shared) and writer (unique) locks and temporary upgrades from shared to unique locks.
Example for boost shared_mutex (multiple reads/one write)?
I don't recommend writing your own, it's a tricky thing to get right and difficult to test thoroughly.

pthread library is a 'Portable Threads' library. That means it's also supported on windows ;) Have a look: Pthreads-w32.
Additionally, consider using OpenMP instead of locks: the compiler extension provides portable critical sections, kernes threading model, tasks and much more! MS C++ supports this technology as well as g++ in Linux.
Cheers! :)

What is wrong with this approach?
// suppose:
struct RwLock
{
void AcquireExclusive();
void AcquireShared();
void Release();
bool TryAcquireExclusive();
};
// rwlock that has TryAcquireSharedToExclusive
struct ConvertableRwLock
{
void AcquireExclusive()
{
writeIntent.AcquireExclusive();
rwlock.AcquireExclusive();
writeIntent.Release();
}
void AcquireShared()
{
readIntent.AcquireShared();
rwlock.AcquireShared();
readIntent.Release();
}
void Release()
{
rwlock.Release();
}
bool TryConvertSharedToExclusive()
{
// Defer to other exclusive including convert.
// Avoids deadlock with other TryConvertSharedToExclusive.
if (!writeIntent.TryAcquireExclusive())
{
rwlock.Release();
return false;
}
// starve readers
readIntent.AcquireExclusive();
// release full lock, but no other can acquire since
// this thread has readIntent and writeIntent.
rwlock.Release();
// Wait for other shared to drain.
rwlock.AcquireExclusive();
readIntent.Release();
writeIntent.Release();
return true;
}
private:
RwLock rwlock;
RwLock readIntent;
RwLock writeIntent;
};

Related

Conditionally acquire an std::mutex

I have a multi-threaded app that uses the GPU, which is inherently single-threaded, and the actual APIs I use, cv::gpu::FAST_GPU, does crash when I try to use them multi-threaded, so basically I have:
static std::mutex s_FAST_GPU_mutex;
{
std::lock_guard<std::mutex> guard(s_FAST_GPU_mutex);
cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}
Now, benchmarking the code shows me FAST_GPU() in isolation is faster than the CPU FAST(), but in the actual application my other threads spend a lot of time waiting for the lock, so the overall throughput is worse.
Looking through the documentation, and at this answer it seems that this might be possible:
static std::mutex s_FAST_GPU_mutex;
static std::unique_lock<std::mutex> s_FAST_GPU_lock(s_FAST_GPU_mutex, std::defer_lock);
{
// Create an unlocked guard
std::lock_guard<decltype(s_FAST_GPU_lock)> guard(s_FAST_GPU_lock, std::defer_lock);
if (s_FAST_GPU_lock.try_lock())
{
cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}
else
{
cv::FAST(/*parameters*/);
}
}
However, this will not compile as std::lock_guard only accepts a std::adopt_lock. How can I implement this properly?
It is actually unsafe to have a unique_lock accessible from multiple threads at the same time. I'm not familiar with the opencv portion of your question, so this answer is focused on the mutex/lock usage.
static std::mutex s_FAST_GPU_mutex;
{
// Create a unique lock, attempting to acquire
std::unique_lock<std::mutex> guard(s_FAST_GPU_mutex, std::try_to_lock);
if (guard.owns_lock())
{
cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
guard.unlock(); // Or just let it go out of scope later
}
else
{
cv::FAST(/*parameters*/);
}
}
This attempts to acquire the lock, if it succeeds, uses FAST_GPU, and then releases the lock. If the lock was already acquired, then goes down the second branch, invoking FAST
You can use std::lock_guard, if you adopt the mutex in the locked state, like this:
{
if (s_FAST_GPU_mutex.try_lock())
{
std::lock_guard<decltype(s_FAST_GPU_lock)> guard(s_FAST_GPU_mutex, std::adopt_lock);
cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}
else
{
cv::FAST(/*parameters*/);
}
}

Cross-platform up- and downgradable read/write lock

I try to turn some central data structure of a large codebase multithreaded.
The access interfaces were changed to represent read/write locks, which may be up- and downgraded:
Before:
Container& container = state.getContainer();
auto value = container.find( "foo" )->bar;
container.clear();
Now:
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
// Upgrade read lock to write lock
WriteContainerLock write = state.upgrade( container );
write.clear();
} // Downgrades write lock to read lock
Using an actual std::mutex for the locking (instead of r/w implementation) works fine but brings no performance benefit (actually degrades runtime).
Actual changing data is relatively rare, so it seems very desirable to go with the read/write concept. The big issue now is that I cannot seem to find any library, which implements the read/write concept and supports upgrade and downgrade and works on Windows, OSX and Linux alike.
Boost has BOOST_THREAD_PROVIDES_SHARED_MUTEX_UPWARDS_CONVERSIONS but does not seem to support downgrading (blocking) atomic upgrading from shared to unique.
Is there any library out there, that supports the desired feature set?
EDIT:
Sorry for being unclear. Of course I mean multiple-readers/single-writer lock semantic.
The question has changed since I answered. As the previous answer is still useful, I will leave it up.
The new question seems to be "I want a (general purpose) reader writer lock where any reader can be upgraded to a writer atomically".
This cannot be done without deadlocks, or the ability to roll back operations (transactional reads), which is far from general-purpose.
Suppose you have Alice and Bob. Both want to read for a while, then they both want to write.
Alice and Bob both get a read lock. They then upgrade to a write lock. Neither can progress, because a write lock cannot be acquired while a read lock is acquired. You cannot unlock the read lock, because then the state Alice read while read locked may not be consistent with the state after the write lock is acquired.
This can only be solved with the possibility the read->write upgrade can fail, or the ability to rollback all operations in a read (so Alice can "unread", Bob can advance, then Alice can re-read and try to get the write lock).
Writing type-safe transactional code isn't really supported in C++. You can do it manually, but beyond simple cases it is error prone. Other forms of transactional rollbacks can also be used. None of them are general purpose reader-writer locks.
You can roll your own. If the states are R, U, W and {} (read, upgradable, write and no lock), these are transitions you can easily support:
{} -> R|U|W
R|U|W -> {}
U->W
W->U
U->R
and implied by the above:
W->R
which I think satisifies your requirements.
The "missing" transition is R->U, which is what lets us have multiple-readers safely. At most one reader (the upgrade reader) has the right to upgrade to write without releasing their read lock. While they are in that upgrade state they do not block other threads from reading (but they do block other threads from writing).
Here is a sketch. There is a shared_mutex A; and a mutex B;.
B represents the right to upgrade to write and the right to read while you hold it. All writers also hold a B, so you cannot both have the right to upgrade to write while someone else has the right to write.
Transitions look like:
{}->R = read(A)
{}->W = lock(B) then write(A)
{}->U = lock(B)
U->W = write(A)
W->U = unwrite(A)
U->R = read(A) then unlock(B)
W->R = W->U->R
R->{} = unread(A)
W->{} = unwrite(A) then unlock(B)
U->{} = unlock(B)
This simply requires std::shared_mutex and std::mutex, and a bit of boilerplate to write up the locks and the transitions.
If you want to be able to spawn a write lock while the upgrade lock "remains in scope" extra work needs to be done to "pass the upgrade lock back to the read lock".
Here are some bonus try transitions, inspired by #HowardHinnat below:
R->try U = return try_lock(B) && unread(A)
R->try W = return R->try U->W
Here is an upgradable_mutex with no try operations:
struct upgradeable_mutex {
std::mutex u;
std::shared_timed_mutex s;
enum class state {
unlocked,
shared,
aspiring,
unique
};
// one step at a time:
template<state start, state finish>
void transition_up() {
transition_up<start, (state)((int)finish-1)>();
transition_up<(state)((int)finish-1), finish>();
}
// one step at a time:
template<state start, state finish>
void transition_down() {
transition_down<start, (state)((int)start-1)>();
transition_down<(state)((int)start-1), finish>();
}
void lock();
void unlock();
void lock_shared();
void unlock_shared();
void lock_aspiring();
void unlock_aspiring();
void aspiring_to_unique();
void unique_to_aspiring();
void aspiring_to_shared();
void unique_to_shared();
};
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::unlocked, upgradeable_mutex::state::shared
>
() {
s.lock_shared();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::shared, upgradeable_mutex::state::unlocked
>
() {
s.unlock_shared();
}
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::unlocked, upgradeable_mutex::state::aspiring
>
() {
u.lock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unlocked
>
() {
u.unlock();
}
template<>
void upgradeable_mutex::transition_up<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unique
>
() {
s.lock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::unique, upgradeable_mutex::state::aspiring
>
() {
s.unlock();
}
template<>
void upgradeable_mutex::transition_down<
upgradeable_mutex::state::aspiring, upgradeable_mutex::state::shared
>
() {
s.lock();
u.unlock();
}
void upgradeable_mutex::lock() {
transition_up<state::unlocked, state::unique>();
}
void upgradeable_mutex::unlock() {
transition_down<state::unique, state::unlocked>();
}
void upgradeable_mutex::lock_shared() {
transition_up<state::unlocked, state::shared>();
}
void upgradeable_mutex::unlock_shared() {
transition_down<state::shared, state::unlocked>();
}
void upgradeable_mutex::lock_aspiring() {
transition_up<state::unlocked, state::aspiring>();
}
void upgradeable_mutex::unlock_aspiring() {
transition_down<state::aspiring, state::unlocked>();
}
void upgradeable_mutex::aspiring_to_unique() {
transition_up<state::aspiring, state::unique>();
}
void upgradeable_mutex::unique_to_aspiring() {
transition_down<state::unique, state::aspiring>();
}
void upgradeable_mutex::aspiring_to_shared() {
transition_down<state::aspiring, state::shared>();
}
void upgradeable_mutex::unique_to_shared() {
transition_down<state::unique, state::shared>();
}
I attempt to get the compiler to work out some of the above transitions "for me" with the transition_up and transition_down trick. I think I can do better, and it did increase code bulk significantly.
Having it 'auto-write' the unlocked-to-unique, and unique-to-(unlocked|shared) was all I got out of it. So probably not worth it.
Creating smart RAII objects that use the above is a bit tricky, as they have to support some transitions that the default unique_lock and shared_lock do not support.
You could just write aspiring_lock and then do conversions in there (either as operator unique_lock, or as methods that return said, etc), but the ability to convert from unique_lock&& down to shared_lock is exclusive to upgradeable_mutex and is a bit tricky to work with implicit conversions...
live example.
Here's my usual suggestion: Seqlock
You can have a single writer and many readers concurrently. Writers compete using a spinlock. A single writer doesn't need to compete so is cheaper.
Readers are truly only reading. They're not writing any state variables, counters, etc. This means you don't really know how many readers are there. But also, there no cache line ping pong so you get the best performance possible in terms of latency and throughput.
What's the catch? the data almost has to be POD. It doesn't really have to POD, but it can not be invalidated (no deleting std::map nodes) as readers may read it while it's being written.
It's only after the fact that readers discover the data is possibly bad and they have to re-read.
Yes, writers don't wait for readers so there's no concept of upgrade/downgrade. You can unlock one and lock the other. You pay less than with any sort of mutex but the data may have changed in the process.
I can go into more detail if you like.
The std::shared_mutex (as implemented in boost if not available on your platform(s)) provides some alternative for the problem.
For atomic upgrade lock semantics, the boost upgrade lock may be the best cross platform alternative.
It does not have an upgrade and downgrade locking mechanism you are looking for, but to get an exclusive lock, the shared access can be relinquished first, then exclusive access sought.
// assumes shared_lock with shared access has been obtained
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
container.shared_mutex().unlock();
// Upgrade read lock to write lock
std::unique_lock<std::shared_mutex> write(container.shared_mutex());
// container work...
write.unlock();
container.shared_mutex().lock_shared();
} // Downgrades write lock to read lock
A utility class can be used to cause the re-locking of the shared_mutex at the end of the scope;
struct re_locker {
re_locker(std::shared_mutex& m) : m_(m) { m_.unlock(); }
~re_locker() { m_.shared_lock(); }
// delete the copy and move constructors and assignment operator (redacted for simplicity)
};
// ...
auto value = container.find( "foo" )->bar;
{
re_locker re_lock(container.shared_mutex());
// Upgrade read lock to write lock
std::unique_lock<std::shared_mutex> write(container.shared_mutex());
// container work...
} // Downgrades write lock to read lock
Depending on what exception guarantees you want or require, you may need to add a "can re-lock" flag to the re_locker to either do the re-lock or not if an exception is thrown during the container operations/work.

Upgrading read lock to write lock without releasing the first in C++11?

I know it's possible using boost::UpgradeLockable in C++14.
Is there anything similar for C++11?
An upgradeable lock can be written on top of simpler locking primitives.
struct upgradeable_timed_mutex {
void lock() {
upgradable_lock();
upgrade_lock();
}
void unlock() {
upgrade_unlock();
upgradable_unlock();
}
void shared_lock() { shared.shared_lock(); }
void shared_unlock() { shared.shared_unlock(); }
void upgradable_lock() { unshared.lock(); }
void ungradable_unlock() { unshared.unlock(); }
void upgrade_lock() { shared.lock(); }
void upgrade_unlock() { shared.unlock(); }
private:
friend struct upgradable_lock;
std::shared_timed_mutex shared;
std::timed_mutex unshared;
};
and similar for the timed and try variants. Note that timed variants which access two mutexes in a row have to do some extra work to avoid spending up to 2x the requested time, and try_lock has to be careful about the first lock's state in case the 2nd fails.
Then you have to write upgradable_lock, with the ability to spawn a std::unique_lock upon request.
Naturally this is hand-written thread safety code, so it is unlikely to be correct.
In C++1z you can write an untimed version as well (with std::shared_mutex and std::mutex).
Less concretely, there can be exactly one upgradeable or write lock at a time. This is what the unshared mutex represents.
So long as you hold unshared, nobody else is writing to the guarded data, so you can read from it without holding the shared mutex at all.
When you want to upgrade, you can grab a unique lock on the shared mutex. This cannot deadlock so long as no readers try to upgrade to upgradable. This excludes readers from reading, you can write, and then you release it and return back to a read only state (where you only hold the unshared mutex).

Can mutex implementations be interchanged (independently of the thread implementation)

Do all mutex implementations ultimately call the same basic system/hardware calls - meaning that they can be interchanged?
Specifically, if I'm using __gnu_parallel algorithms (that uses openmp) and I want to make the classes they call threadsafe may I use boost::mutex for the locking? or must I write my own mutex such as the one described here
//An openmp mutex. Can this be replaced with boost::mutex?
class Mutex {
public:
Mutex() { omp_init_lock(&_mutex); }
~Mutex() { omp_destroy_lock(&_mutex); }
void lock() { omp_set_lock(&_mutex); }
void unlock() { omp_unset_lock(&_mutex); }
private:
omp_lock_t _mutex;
};
Edit, the link above to the openmp mutex seems to be broken, for anyone interested, the lock that goes with this mutex is along these lines
class Lock
{
public:
Lock(Mutex& mutex)
: m_mutex(mutex),
m_release(false)
{
m_mutex.lock();
}
~Lock()
{
if (!m_release)
m_mutex.unlock();
}
bool operator() const
{
return !m_release;
}
void release()
{
if (!m_release)
{
m_release = true;
m_mutex.unlock();
}
}
private:
Mutex& m_mutex;
bool m_release;
};
This link provides a useful discussion:
http://groups.google.com/group/comp.programming.threads/browse_thread/thread/67e7b9b9d6a4b7df?pli=1
Paraphrasing, (at least on Linux) Boost::Thread and OpenMP both an interface to pthread and so in principle should be able to be mixed (as Anders says +1), but mixing threading technologies in this way is generally a bad idea (as Andy says, +1).
You should not mix synchronization mechanisms. E.g. current pthreads mutex implementation is based on futex and it is different from previous pthreads implementations (see man 7 pthreads). If you create your own level of abstraction, you should use it. It should be considered what is your need - inter-thread or inter-process synchronization?
If you need cooperation with code that uses boost::mutex, you should use boost::mutex in place of open mp.
Additionally IMHO it is quite strange to use open mp library functions to realize mutex.
The part requiring compatibility is the thread suspension, rescheduling and context switching. As long as the threads are real threads, scheduled by the operating system, you should be able to use any mutex implementation that relies on some kind of kerner primitive for suspending and resuming a waiting thread.

Multiple-readers, single-writer locks in Boost

I'm trying to implement the following code in a multithreading scenario:
Get shared access to mutex
Read data structure
If necessary:
Get exclusive access to mutex
Update data structure
Release exclusive lock
Release shared lock
Boost threads has a shared_mutex class which was designed for a multiple-readers, single-writer model. There are several stackoverflow questions regarding this class. However, I'm not sure it fits the scenario above where any reader may become a writer. The documentation states:
The UpgradeLockable concept is a
refinement of the SharedLockable
concept that allows for upgradable
ownership as well as shared ownership
and exclusive ownership. This is an
extension to the multiple-reader /
single-write model provided by the
SharedLockable concept: a single
thread may have upgradable ownership
at the same time as others have shared
ownership.
From the word "single" I suspect that only one thread may hold an upgradable lock. The others only hold a shared lock which can't be upgraded to an exclusive lock.
Do you know if boost::shared_lock is useful in this situation (any reader may become a writer), or if there's any other way to achieve this?
Yes, you can do exactly what you want as shown in the accepted answer here. A call to upgrade to exclusive access will block until all readers are done.
boost::shared_mutex _access;
void reader()
{
// get shared access
boost::shared_lock<boost::shared_mutex> lock(_access);
// now we have shared access
}
void writer()
{
// get upgradable access
boost::upgrade_lock<boost::shared_mutex> lock(_access);
// get exclusive access
boost::upgrade_to_unique_lock<boost::shared_mutex> uniqueLock(lock);
// now we have exclusive access
}
boost::shared_lock doesn't help in this situation (multiple readers that can become writers), since only a single thread may own an upgradable lock. This is both implied by the quote from the documentation in the question, and by looking at the code (thread\win32\shared_mutex.hpp). If a thread tries to acquire an upgradable lock while another thread holds one, it will wait for the other thread.
I settled on using a regular lock for all reader/writers, which is OK in my case since the critical section is short.
You know LightweightLock
or same at LightweightLock_zip
does exactly what you want.
I have used it a long time.
[EDIT]
here's the source:
/////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 1995-2002 Brad Wilson
//
// This material is provided "as is", with absolutely no warranty
// expressed or implied. Any use is at your own risk. Permission to
// use or copy this software for any purpose is hereby granted without
// fee, provided the above notices are retained on all copies.
// Permission to modify the code and to distribute modified code is
// granted, provided the above notices are retained, and a notice that
// the code was modified is included with the above copyright notice.
//
/////////////////////////////////////////////////////////////////////////////
//
// This lightweight lock class was adapted from samples and ideas that
// were put across the ATL mailing list. It is a non-starving, kernel-
// free lock that does not order writer requests. It is optimized for
// use with resources that can take multiple simultaneous reads,
// particularly when writing is only an occasional task.
//
// Multiple readers may acquire the lock without any interference with
// one another. As soon as a writer requests the lock, additional
// readers will spin. When the pre-writer readers have all given up
// control of the lock, the writer will obtain it. After the writer
// has rescinded control, the additional readers will gain access
// to the locked resource.
//
// This class is very lightweight. It does not use any kernel objects.
// It is designed for rapid access to resources without requiring
// code to undergo process and ring changes. Because the "spin"
// method for this lock is "Sleep(0)", it is a good idea to keep
// the lock only long enough for short operations; otherwise, CPU
// will be wasted spinning for the lock. You can change the spin
// mechanism by #define'ing __LW_LOCK_SPIN before including this
// header file.
//
// VERY VERY IMPORTANT: If you have a lock open with read access and
// attempt to get write access as well, you will deadlock! Always
// rescind your read access before requesting write access (and,
// of course, don't rely on any read information across this).
//
// This lock works in a single process only. It cannot be used, as is,
// for cross-process synchronization. To do that, you should convert
// this lock to using a semaphore and mutex, or use shared memory to
// avoid kernel objects.
//
// POTENTIAL FUTURE UPGRADES:
//
// You may consider writing a completely different "debug" version of
// this class that sacrifices performance for safety, by catching
// potential deadlock situations, potential "unlock from the wrong
// thread" situations, etc. Also, of course, it's virtually mandatory
// that you should consider testing on an SMP box.
//
///////////////////////////////////////////////////////////////////////////
#pragma once
#ifndef _INC_CRTDBG
#include
#endif
#ifndef _WINDOWS_
#include
#endif
#ifndef __LW_LOCK_SPIN
#define __LW_LOCK_SPIN Sleep(0)
#endif
class LightweightLock
{
// Interface
public:
// Constructor
LightweightLock()
{
m_ReaderCount = 0;
m_WriterCount = 0;
}
// Destructor
~LightweightLock()
{
_ASSERTE( m_ReaderCount == 0 );
_ASSERTE( m_WriterCount == 0 );
}
// Reader lock acquisition and release
void LockForReading()
{
while( 1 )
{
// If there's a writer already, spin without unnecessarily
// interlocking the CPUs
if( m_WriterCount != 0 )
{
__LW_LOCK_SPIN;
continue;
}
// Add to the readers list
InterlockedIncrement((long*) &m_ReaderCount );
// Check for writers again (we may have been pre-empted). If
// there are no writers writing or waiting, then we're done.
if( m_WriterCount == 0 )
break;
// Remove from the readers list, spin, try again
InterlockedDecrement((long*) &m_ReaderCount );
__LW_LOCK_SPIN;
}
}
void UnlockForReading()
{
InterlockedDecrement((long*) &m_ReaderCount );
}
// Writer lock acquisition and release
void LockForWriting()
{
// See if we can become the writer (expensive, because it inter-
// locks the CPUs, so writing should be an infrequent process)
while( InterlockedExchange((long*) &m_WriterCount, 1 ) == 1 )
{
__LW_LOCK_SPIN;
}
// Now we're the writer, but there may be outstanding readers.
// Spin until there aren't any more; new readers will wait now
// that we're the writer.
while( m_ReaderCount != 0 )
{
__LW_LOCK_SPIN;
}
}
void UnlockForWriting()
{
m_WriterCount = 0;
}
long GetReaderCount() { return m_ReaderCount; };
long GetWriterConut() { return m_WriterCount; };
// Implementation
private:
long volatile m_ReaderCount;
long volatile m_WriterCount;
};