What could cause a deadlock of a single write/multiple read lock?

I have a class instance that is used by several other classes in other threads to communicate.
This class uses a slim reader/writer lock (WinAPI's SRWLOCK) as a synchronization object and a couple of RAII helper classes to actually lock/unlock the thing:
static unsigned int readCounter = 0;

class CReadLock
{
public:
    CReadLock(SRWLOCK& Lock) : m_Lock(Lock) { InterlockedIncrement(&readCounter); AcquireSRWLockShared(&m_Lock); }
    ~CReadLock() { ReleaseSRWLockShared(&m_Lock); InterlockedDecrement(&readCounter); }
private:
    SRWLOCK& m_Lock;
};

class CWriteLock
{
public:
    CWriteLock(SRWLOCK& Lock) : m_Lock(Lock) { AcquireSRWLockExclusive(&m_Lock); }
    ~CWriteLock() { ReleaseSRWLockExclusive(&m_Lock); }
private:
    SRWLOCK& m_Lock;
};
The problem is the whole thing deadlocks all the time. When I pause the deadlocked program, I see:
one thread stuck in AcquireSRWLockExclusive();
two threads stuck in AcquireSRWLockShared();
readCounter global is set to 3.
The way I see it, the only way for this to happen is CReadLock instance's destructor hasn't been called somehow somewhere so the lock is perpetually stuck. However, the only way for this to happen (as far as I know) is because an exception has been thrown. It wasn't. I checked.
What might be the problem? How should I go about fixing this (or at least locating its cause)?

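Are you using the read lock in a recursive manner?
void foo() {
    CReadLock rl(m_lock);
    bar();  // takes the read lock a second time on the same thread
}
void bar() {
    CReadLock rl(m_lock);
    // do stuff
}
void baz() {
    CWriteLock wl(m_lock);
    // do stuff
}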
If foo() and baz() are called simultaneously, you may get a deadlock:
1. (Thread A) foo() takes the read lock.
2. (Thread B) baz() asks for the write lock. From now on new read locks block until the writer has run, and the writer waits until all current read locks are released.
3. (Thread A) bar() tries to take the read lock and waits, because there is a pending write lock. Thread A now waits for B while B waits for A.
The fact that you have 2 threads stuck on the read lock while the read lock counter is 3 most likely shows that you have recursion in one of the locks, i.e. one thread has tried to acquire the read lock twice.
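One common way out of the recursion described above is to take the lock only in the public entry points and move the shared work into a helper that assumes the lock is already held. The following is a minimal sketch of that idea, not the asker's code: it uses std::shared_mutex as a portable stand-in for SRWLOCK, and Registry, bar_unlocked and run_registry_demo are hypothetical names.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical class; std::shared_mutex stands in for SRWLOCK.
class Registry {
public:
    // Public entry points take the lock exactly once...
    int foo() const {
        std::shared_lock<std::shared_mutex> rl(m_lock);
        return bar_unlocked() + 1;  // ...and call the helper that does not lock
    }
    int bar() const {
        std::shared_lock<std::shared_mutex> rl(m_lock);
        return bar_unlocked();
    }
    void baz(int v) {
        std::unique_lock<std::shared_mutex> wl(m_lock);
        m_value = v;
    }

private:
    // Precondition: the caller already holds m_lock (shared or exclusive).
    int bar_unlocked() const { return m_value; }

    mutable std::shared_mutex m_lock;
    int m_value = 0;
};

// Runs four readers and one writer concurrently, then reads once more.
int run_registry_demo() {
    Registry r;
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&r] { for (int k = 0; k < 1000; ++k) r.foo(); });
    threads.emplace_back([&r] { for (int k = 1; k <= 1000; ++k) r.baz(k); });
    for (auto& t : threads) t.join();
    return r.foo();  // last written value plus one
}
```

With this split no thread ever requests the read lock while it already holds it, so a pending writer cannot wedge itself between the two acquisitions.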

Well, as far as I can read from it, you have one thread currently holding the read lock, one writer thread waiting for that read lock to be released, and two reader threads waiting for that writer to get and release the lock.
In other words, you have one dangling read lock that has not been destructed, as you say yourself. Add debug prints to the constructor and the destructor.
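Beyond plain debug prints, the guard itself can notice when a thread asks for a read lock it already holds. Below is a hedged sketch of such a diagnostic guard. It is an assumption-laden illustration: it uses std::shared_mutex instead of SRWLOCK, it tracks only one lock per thread, and TrackedReadLock and detects_recursion are hypothetical names.

```cpp
#include <cstdio>
#include <shared_mutex>
#include <stdexcept>

// Hypothetical diagnostic guard; assumes only one lock is being tracked.
class TrackedReadLock {
public:
    TrackedReadLock(std::shared_mutex& m, const char* where) : m_lock(m) {
        if (t_depth > 0) {
            // This thread already holds the read lock: report it instead of deadlocking.
            std::fprintf(stderr, "recursive read lock at %s\n", where);
            throw std::logic_error("recursive read lock");
        }
        m_lock.lock_shared();
        ++t_depth;
    }
    ~TrackedReadLock() {
        --t_depth;
        m_lock.unlock_shared();
    }
    TrackedReadLock(const TrackedReadLock&) = delete;
    TrackedReadLock& operator=(const TrackedReadLock&) = delete;

private:
    std::shared_mutex& m_lock;
    static thread_local int t_depth;  // read locks held by the current thread
};
thread_local int TrackedReadLock::t_depth = 0;

// Returns true if a nested acquisition on the same thread is detected.
bool detects_recursion() {
    std::shared_mutex m;
    TrackedReadLock outer(m, "outer");
    try {
        TrackedReadLock inner(m, "inner");  // same thread, same lock
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In a debug build the throw could be replaced by an assertion or a breakpoint, which lands the debugger exactly on the second acquisition.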


Is there any way to wake up multiple threads at the same time in C/C++?

Well, actually, I'm not asking that the threads "line up" to work; I just want to notify multiple threads, so I'm not looking for a barrier.
It's kind of like condition_variable::notify_all(), but I don't want the threads to wake up one by one, which may cause starvation (also a potential problem with multiple semaphore post operations). It's kind of like:
std::atomic_flag flag{ATOMIC_FLAG_INIT};

void example() {
    if (!flag.test_and_set()) {
        // this is the thread to do the job, and notify others
        notify_others(); // this is what I'm looking for
    } else {
        // this is the waiting thread
    }
}

void runner() {
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
                // ...
            }
        });
    }
}
So how can I do this in C/C++, or maybe with the POSIX API?
Sorry, I didn't make this question clear enough, so I'll add some more explanation.
It's not the thundering herd problem I'm talking about. And yes, it's the re-acquiring of the lock that bothers me. I tried shared_mutex, but there is still a problem.
Let me split the threads into 2 groups: one leader thread, which does the writing job, and the others as worker threads, which do the reading job.
Actually they're all equal in the program: the leader is simply the first thread that got access to the job (you can think of it as the thread for which the shared buffer underflowed). Once the job is done, the other workers just need to be notified that they have access.
If a mutex is used here, any thread would block the others.
To give an example: the main thread's job do_something() here is a read, and it blocks the main thread, thus the whole system is blocked.
Unfortunately, shared_mutex won't solve this problem:
void example() {
    if (!flag.test_and_set()) {
        // leader thread:
    } else {
        // worker thread
    }
}

// outer loop
void looper() {
    for (int i=0; i<10; ++i) {
        threads.emplace_back([]() {
            while(1) {
            }
        });
    }
}
In this code, if the leader's job is done and there is not much to do between this unlock and the next lock (remember they're in a loop), the leader may get the lock again and leave the worker jobs not running, which is why I called it starvation earlier.
And to explain the blocking in do_something(): I don't want this part of the job to take all my CPU time, even if the leader's job is not ready (no data has arrived to read).
And std::call_once may still not be the answer to this because, as you can see, the workers must wait until the leader's job has finished.
To summarize, this is actually a one-producer-multi-consumer problem.
But I want the consumers to be able to do their job as soon as the product is ready for them, and any thread can be the producer or a consumer. The first thread to find that the product has run out should become the producer; the others are then automatically consumers.
Unfortunately, I'm not sure whether this idea would work or not.
it's kind of like the condition_variable::notify_all(), but I don't want the threads wakeup one-by-one, which may cause starvation
In principle it's not waking up that is serialized, but re-acquiring the lock.
You can avoid that by using std::condition_variable_any with a std::shared_lock - so long as nobody ever gets an exclusive lock on the std::shared_mutex. Alternatively, you can provide your own Lockable type.
Note however that this won't magically allow you to concurrently run more threads than you have cores, or force the scheduler to start them all running in parallel. They'll just be marked as runnable and scheduled as normal - this only fixes the avoidable serialization in your own code.
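The std::condition_variable_any plus std::shared_lock combination mentioned above can be sketched as follows. This is only an illustration under assumptions: Broadcast, wait_ready, publish and run_broadcast_demo are hypothetical names, and the publisher here does take a brief exclusive lock to set the flag (so a waiter cannot miss the change between its check and its sleep) but releases it before notify_all, so the woken threads only ever compete for shared ownership.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical one-shot broadcast; all names are illustrative.
class Broadcast {
public:
    void wait_ready() {
        std::shared_lock<std::shared_mutex> lk(m_mtx);
        // Woken threads re-acquire the mutex in shared mode,
        // so they do not have to queue up behind each other.
        m_cv.wait(lk, [this] { return m_ready; });
    }
    void publish() {
        {
            // Exclusive only while flipping the flag.
            std::unique_lock<std::shared_mutex> lk(m_mtx);
            m_ready = true;
        }
        m_cv.notify_all();
    }

private:
    std::shared_mutex m_mtx;
    std::condition_variable_any m_cv;
    bool m_ready = false;
};

// Starts `workers` waiting threads, publishes once, returns how many woke up.
int run_broadcast_demo(int workers) {
    Broadcast b;
    std::atomic<int> woken{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
        threads.emplace_back([&] { b.wait_ready(); ++woken; });
    b.publish();
    for (auto& t : threads) t.join();
    return woken.load();
}
```

As the answer says, this removes only the serialization in user code; how many of the woken threads actually run in parallel is still up to the scheduler.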
It sounds like you are looking for call_once
#include <mutex>

void example()
{
    static std::once_flag flag;
    bool i_did_once = false;
    std::call_once(flag, [&i_did_once]() mutable {
        i_did_once = true;
        do_something();
    });
    if(! i_did_once)
        do_some_other_thing();
}
I don't see how your problem relates to starvation. Are you perhaps thinking about the thundering herd problem? This may arise if do_some_other_thing has a mutex but in that case you have to describe your problem in more detail.

An odd use of a condition variable with a local mutex

Poring through the legacy code of an old and large project, I found an odd method of creating a thread-safe queue, something like this:
template <typename _Msg>
class WaitQue : public QWaitCondition
{
public:
    typedef _Msg DataType;
    void wakeOne(const DataType& msg)
    {
        QMutexLocker lock_(&mx);
        que.push(msg);
        QWaitCondition::wakeOne();
    }
    void wait(DataType& msg)
    {
        /// wait if empty.
        {
            QMutex wx; // WHAT?
            QMutexLocker cvlock_(&wx);
            if (que.empty())
                QWaitCondition::wait(&wx);
        }
        {
            QMutexLocker _wlock(&mx);
            msg = que.front();
            que.pop();
        }
    }
    unsigned long size() {
        QMutexLocker lock_(&mx);
        return que.size();
    }
private:
    std::queue<DataType> que;
    QMutex mx;
};
wakeOne is used from threads as a kind of "posting" function, and wait is called from other threads and waits indefinitely until a message appears in the queue. In some cases the roles of the threads are reversed at different stages, using separate queues.
Is this even a legal way to use a QMutex, creating a local one? I kind of understand why someone would do that to dodge a deadlock while reading the size of que, but how does it even work? Is there a simpler and more idiomatic way to achieve this behavior?
It's legal to have a local mutex, but it normally makes no sense: a mutex that only one thread can see protects nothing.
As you've worked out, in this case it is wrong. You should be using the member:
void wait(DataType& msg)
{
    QMutexLocker cvlock_(&mx);
    while (que.empty())
        QWaitCondition::wait(&mx);
    msg = que.front();
    que.pop();
}
Notice also that you must have while instead of if around the call to QWaitCondition::wait. One reason is (possible) spurious wake-ups; the Qt docs aren't clear here. More importantly, the wake-up and the subsequent reacquisition of the mutex are not one atomic operation, so you must recheck the queue for emptiness. It could be this last case where you were previously getting deadlocks/UB.
Consider the scenario of an empty queue and a caller (thread 1) to wait into QWaitCondition::wait. This thread blocks. Then thread 2 comes along and adds an item to the queue and calls wakeOne. Thread 1 gets woken up and tries to reacquire the mutex. However, thread 3 comes along in your implementation of wait, takes the mutex before thread 1, sees the queue isn't empty, processes the single item and moves on, releasing the mutex. Then thread 1 which has been woken up finally acquires the mutex, returns from QWaitCondition::wait and tries to process... an empty queue. Yikes.
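For comparison, here is a minimal sketch of the same queue written against the C++ standard library rather than Qt (a swapped-in technique, not the project's code). BlockingQueue and run_queue_demo are hypothetical names. The predicate overload of std::condition_variable::wait performs the while loop internally, so a woken thread that lost the race for the item simply goes back to sleep.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical standard-library counterpart of WaitQue.
template <typename T>
class BlockingQueue {
public:
    void push(const T& msg) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(msg);
        }
        m_cv.notify_one();
    }
    T pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        // Re-checks the condition after every wake-up, spurious or not.
        m_cv.wait(lock, [this] { return !m_queue.empty(); });
        T msg = m_queue.front();
        m_queue.pop();
        return msg;
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_queue.size();
    }

private:
    mutable std::mutex m_mutex;  // the single mutex shared by every caller
    std::condition_variable m_cv;
    std::queue<T> m_queue;
};

// One consumer pops three items pushed by the calling thread.
int run_queue_demo() {
    BlockingQueue<int> q;
    int sum = 0;
    std::thread consumer([&] { for (int i = 0; i < 3; ++i) sum += q.pop(); });
    for (int v = 1; v <= 3; ++v) q.push(v);
    consumer.join();
    return sum;
}
```

The emptiness check, the wait and the pop all happen under the same member mutex, which is the property the local-mutex version lacks.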

Swapping mutex locks

I'm having trouble with properly "swapping" locks. Consider this situation:
bool HidDevice::wait(const std::function<bool(const Info&)>& predicate)
{
    /* A method scoped lock. */
    std::unique_lock waitLock(this->waitMutex, std::defer_lock);

    /* A scoped, general access, lock. */
    std::lock_guard lock(this->mutex);

    bool exitEarly = false;
    /* do some checks... */
    if (exitEarly)
        return false;

    /* Only one thread at a time can execute this method, however
    other threads can execute other methods or abort this one. Thus,
    general access mutex "this->mutex" should be unlocked (to allow threads
    to call other methods) while at the same time, "this->waitMutex" should
    be locked to prevent multiple executions of code below. */
    waitLock.lock(); // How do I release "this->mutex" here?

    /* do some stuff... */

    /* The main problem is with this event based OS function. It can
    only be called once with the data I provide, therefore I need to
    have 2 locks - one blocks multiple method calls (the usual stuff)
    and "waitLock" makes sure that only one instance of "osBlockingFunction"
    is running at a time. Since this is a thread blocking function,
    "this->mutex" must be unlocked at this point. */
    bool result = osBlockingFunction(...);

    /* In methods, such as "close", "this->waitMutex" and others are then used
    to make sure that thread blocking methods have returned and I can safely
    modify related data. */

    /* do some more stuff... */
    return result;
}
How could I solve this "swapping" problem without overly complicating code? I could unlock this->mutex before locking another, however I'm afraid that in that nanosecond, a race condition might occur.
Imagine that 3 threads are calling wait method. The first one will lock this->mutex, then this->waitMutex and then will unlock this->mutex. The second one will lock this->mutex and will have to wait for this->waitMutex to be available. It will not unlock this->mutex. The third one will get stuck on locking this->mutex.
I would like to get the last 2 threads to wait for this->waitMutex to be available.
Edit 2:
Expanded example with osBlockingFunction.
It smells like the design/implementation should be a bit different: a std::condition_variable cv in HidDevice::wait and only one mutex. And, as you write, "other threads can execute other methods or abort this one": they would call cv.notify_one to "abort" this wait. cv.wait atomically {enters the wait and unlocks the mutex}, and when notified it {exits the wait and re-locks the mutex} before returning. That way HidDevice::wait is simpler:
bool HidDevice::wait(const std::function<bool(const Info&)>& predicate)
{
    std::unique_lock<std::mutex> lock(this->m_Mutex); // Only one mutex.
    m_bEarlyExit = false;
    this->cv.wait(lock, /* spurious wake-up check */);
    if (m_bEarlyExit) // A bool data-member for abort.
        return false;
    /* do some stuff... */
}
My assumption (going by the name of the function) is that at /* do some checks... */ the thread waits until some condition becomes true.
Aborting the wait is the responsibility of another HidDevice function, called by the other thread:
void HidDevice::do_some_checks() /* do some checks... */
{
    if ( some checks ) {
        if ( other checks )
            m_bEarlyExit = true;
        this->cv.notify_one(); // wakes the thread blocked in HidDevice::wait
    }
}
Something similar to that.
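A compilable sketch of this single-mutex design might look like the following. It is a hedged illustration, not the real HidDevice: Device, signal_data, abort and the two flags are hypothetical names, and the abort flag is cleared when it is consumed rather than on entry to wait(), so an abort that arrives just before the waiter starts is not lost.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for HidDevice; member names are assumptions.
class Device {
public:
    // Blocks until data arrives (returns true) or abort() is called (returns false).
    bool wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The mutex is released while sleeping and re-acquired before returning.
        m_cv.wait(lock, [this] { return m_dataReady || m_abort; });
        if (m_abort) {
            m_abort = false;  // consume the abort request
            return false;
        }
        m_dataReady = false;
        return true;
    }
    void signal_data() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_dataReady = true; }
        m_cv.notify_one();
    }
    void abort() {
        { std::lock_guard<std::mutex> lock(m_mutex); m_abort = true; }
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;  // the only mutex
    std::condition_variable m_cv;
    bool m_dataReady = false;
    bool m_abort = false;
};

// A waiter is started and then either aborted or given data from this thread.
bool run_wait_demo(bool abortIt) {
    Device d;
    bool result = false;
    std::thread waiter([&] { result = d.wait(); });
    if (abortIt) d.abort(); else d.signal_data();
    waiter.join();
    return result;
}
```

Other methods of the class can lock m_mutex freely while a thread sleeps in wait(), because the condition variable releases the mutex for the duration of the sleep.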
I recommend creating a little "unlocker" facility. This is a mutex wrapper with inverted semantics. On lock it unlocks and vice-versa:
template <class Lock>
class unlocker
{
    Lock& locked_;
public:
    unlocker(Lock& lk) : locked_{lk} {}

    void lock() {locked_.unlock();}
    bool try_lock() {locked_.unlock(); return true;}
    void unlock() {locked_.lock();}
};
Now in place of:
waitLock.lock(); // How do I release "this->mutex" here?
You can instead say:
unlocker temp{lock};
std::lock(waitLock, temp);
where lock is a unique_lock instead of a lock_guard holding mutex.
This will lock waitLock and unlock mutex as if by one uninterruptible instruction.
And now, after coding all of that, I can reason that it can be transformed into:
lock.unlock(); // lock must be a unique_lock to do this
waitLock.lock();
Whether the first version is more or less readable is a matter of opinion. The first version is easier to reason about (once one knows what std::lock does). But the second one is simpler. But with the second, the reader has to think more carefully about the correctness.
Just read the edit in the question. This solution does not fix the problem in the edit: The second thread will block the third (and following threads) from making progress in any code that requires mutex but not waitMutex, until the first thread releases waitMutex.
So in this sense, my answer is technically correct, but does not satisfy the desired performance characteristics. I'll leave it up for informational purposes.
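For readers who want to try the unlocker approach, here is a self-contained sketch. The unlocker template is taken from the answer; swap_demo, generalMutex and the local variable names are hypothetical. It only checks the end state of the swap (waitMutex held, the general mutex released) and does not address the limitation noted above.

```cpp
#include <mutex>

// Mutex wrapper with inverted semantics, as described in the answer.
template <class Lock>
class unlocker {
    Lock& locked_;
public:
    explicit unlocker(Lock& lk) : locked_{lk} {}
    void lock() { locked_.unlock(); }                   // "locking" releases
    bool try_lock() { locked_.unlock(); return true; }  // always succeeds
    void unlock() { locked_.lock(); }                   // "unlocking" re-acquires
};

// Returns true if, after the swap, waitMutex is held and generalMutex is released.
bool swap_demo() {
    std::mutex generalMutex, waitMutex;
    std::unique_lock<std::mutex> lock(generalMutex);  // general-access lock, held
    std::unique_lock<std::mutex> waitLock(waitMutex, std::defer_lock);

    unlocker<std::unique_lock<std::mutex>> temp{lock};
    std::lock(waitLock, temp);  // acquires waitMutex and releases generalMutex

    return waitLock.owns_lock() && !lock.owns_lock();
}
```

std::lock only needs lock(), try_lock() and unlock() on its arguments, which is why the inverted wrapper can be passed next to a regular std::unique_lock.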

One mutex vs Multiple mutexes. Which one is better for the thread pool?

Example here: I just want to protect iData so that only one thread accesses it at a time.
struct myData;
myData iData;
Method 1, mutex inside the call function (multiple mutexes could be created):
void _proceedTest(myData &data)
{
    std::mutex mtx;
    std::unique_lock<std::mutex> lk(mtx);
}

int const nMaxThreads = std::thread::hardware_concurrency();
vector<std::thread> threads;
for (int iThread = 0; iThread < nMaxThreads; ++iThread)
{
    // std::ref is needed to pass a reference through std::thread
    threads.push_back(std::thread(_proceedTest, std::ref(iData)));
}
for (auto& th : threads) th.join();
Method 2, use only one mutex:
void _proceedTest(myData &data, std::mutex &mtx)
{
    std::unique_lock<std::mutex> lk(mtx);
}

std::mutex mtx;
int const nMaxThreads = std::thread::hardware_concurrency();
vector<std::thread> threads;
for (int iThread = 0; iThread < nMaxThreads; ++iThread)
{
    // std::ref is needed to pass references through std::thread
    threads.push_back(std::thread(_proceedTest, std::ref(iData), std::ref(mtx)));
}
for (auto& th : threads) th.join();
I want to make sure that the Method 1 (multiple mutexes) ensures that only one thread can visit the iData at the same time.
If Method 1 is correct, is Method 1 better than Method 2?
I want to make sure that the Method 1 (multiple mutexes) ensures that only one thread can visit the iData at the same time.
Your 1st example creates a local mutex variable on the stack, so it won't be shared with the other threads. Thus it's completely useless.
It won't guarantee exclusive access to iData.
If Method 1 is correct, is Method 1 better than Method 2?
It isn't correct.
The other answers are correct on the technical level, but there is an important language-independent point missing: you should always prefer to minimize the number of different mutexes/locks!
Because as soon as a thread needs to acquire more than one thing in order to do something (and then release all acquired locks), order becomes crucial.
When you have two locks, and you have two different pieces of code, like:
getLockA() {
    getLockB() {
        do something
        release B
    }
    release A
}

getLockB() {
    getLockA() {
        do something
        release A
    }
    release B
}
you can quickly run into deadlocks - because two threads/processes can acquire one lock each - and then they are both stuck, waiting for the other one to release its lock. Of course - when looking at the above example "you would never make a mistake, and always go A first then B". But what if those locks exist in completely different parts of your application? And they aren't acquired in the same method or class, but over the course of say 3, 5 nested method invocations?
Thus: when you can solve your problem with one lock, use one lock only! The more locks you need to get something done, the higher the risk of ending up in a deadlock.
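When two locks really cannot be avoided, C++17 offers std::scoped_lock, which acquires several mutexes with a deadlock-avoidance algorithm so that the A-then-B versus B-then-A ordering shown above stops mattering. A minimal sketch, with Account, transfer and run_transfer_demo as hypothetical names:

```cpp
#include <mutex>
#include <thread>

// Hypothetical accounts, each guarded by its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

void transfer(Account& from, Account& to, int amount) {
    // Locks both mutexes together; the order of the arguments does not matter.
    std::scoped_lock both(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, i.e. with opposite lock order.
int run_transfer_demo() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

This only helps when both locks are taken at the same place in the code. If they are acquired several calls apart, as in the nested-invocation case described above, a single lock remains the safer design.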
Method 1 only works if you make the mutex variable static.
void _proceedTest(myData &data)
{
    static std::mutex mtx;
    std::unique_lock<std::mutex> lk(mtx);
}
This will make mtx be shared by all threads that enter _proceedTest.
Since a static function scope variable is only visible to users of the function, it is not really a sufficient lock for the passed in data. This is because it is conceivable that multiple threads could be calling different functions that each want to manipulate data.
Thus, even though Method 1 is salvageable, Method 2 is still better, even though the cohesion between the lock and the data is weak.
The mutex in version 1 goes out of scope as soon as you leave the _proceedTest scope. Locking a mutex like that makes no sense, because it is never accessible to any other thread.
In the second version multiple threads can share the mutex (as long as it doesn't go out of scope; for example, it could be a class member). This way one thread can lock it and another thread can see that it is locked (and won't be able to lock it as well, hence the term mutual exclusion).
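A complete, compilable variant of Method 2 could look like the sketch below. The counter member of myData and the function names proceedTest and run_shared_mutex_demo are assumptions made for illustration. The important points are that every thread receives the same mutex object and that the mutex outlives all of the threads.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct myData { int counter = 0; };  // hypothetical payload

void proceedTest(myData& data, std::mutex& mtx) {
    std::lock_guard<std::mutex> lk(mtx);  // same mutex object for every thread
    ++data.counter;
}

// Each of nThreads threads increments the shared counter once.
int run_shared_mutex_demo(int nThreads) {
    myData iData;
    std::mutex mtx;  // declared before the threads, so it outlives them
    std::vector<std::thread> threads;
    for (int i = 0; i < nThreads; ++i)
        // std::ref is required: std::thread copies its arguments otherwise.
        threads.emplace_back(proceedTest, std::ref(iData), std::ref(mtx));
    for (auto& th : threads) th.join();
    return iData.counter;
}
```

Placing the mutex inside myData itself would tie the lock to the data even more closely, which addresses the weak cohesion mentioned in one of the answers.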

Simple Thread Synchronization

I need a simple "one at a time" lock on a section of code. Consider the function func which can be run from multiple threads:
void func() {
    // locking/mutex statement goes here
    operation1();
    operation2();
    // corresponding unlock goes here
}
I need to make sure that operation1 and operation2 always run "together". With C# I would use a simple lock block around these two calls. What is the C++/Win32/MFC equivalent?
Presumably some sort of Mutex?
Improving Michael's solution for C++.
Michael's solution is perfect for C applications. But in C++ this style is discouraged because of the possibility of exceptions. If an exception is thrown in operation1 or operation2, the critical section will not be left correctly and all other threads will block waiting.
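// Perfect solution for C applications
void func() {
    // cs previously initialized via InitializeCriticalSection
    EnterCriticalSection(&cs);
    operation1(); operation2();
    LeaveCriticalSection(&cs);
}

// A better solution for C++
class Locker {
public:
    Locker(CSType& cs): m_cs(cs) { EnterCriticalSection(&m_cs); }
    ~Locker() { LeaveCriticalSection(&m_cs); }
private:
    CSType& m_cs;
};

void func() {
    // cs previously initialized via InitializeCriticalSection
    Locker lock(cs);
    operation1(); operation2();
}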
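Since C++11 the standard library ships the same RAII idea as std::lock_guard over std::mutex, which is a portable alternative when the code does not have to use the Win32 critical section API. A minimal sketch follows; g_mutex, g_a, g_b and run_func_demo are hypothetical names, and the two operations are reduced to counter increments so the effect can be observed.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_mutex;    // hypothetical global, plays the role of cs
int g_a = 0, g_b = 0;  // stand-ins for the state touched by the operations

void operation1() { ++g_a; }
void operation2() { ++g_b; }

void func() {
    // Unlocked automatically at scope exit, including when an exception is thrown.
    std::lock_guard<std::mutex> lock(g_mutex);
    operation1();
    operation2();
}

// Eight threads call func() 1000 times each; both counters must stay in step.
bool run_func_demo() {
    g_a = g_b = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back([] { for (int k = 0; k < 1000; ++k) func(); });
    for (auto& t : threads) t.join();
    return g_a == 8000 && g_b == 8000;
}
```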
Critical sections will work (they're lighter-weight than mutexes). InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection are the functions to look for on MSDN.
void func() {
    // cs previously initialized via InitializeCriticalSection
    EnterCriticalSection(&cs); operation1(); operation2(); LeaveCriticalSection(&cs);
}
Critical sections are faster than mutexes since critical sections are primarily user mode primitives: in the case of an uncontended acquire (usually the common case) there is no system call into the kernel, and acquiring takes on the order of dozens of cycles. A kernel switch is much more expensive (on the order of hundreds of cycles). The only time critical sections call into the kernel is in order to block, which involves waiting on a kernel primitive (either a mutex or an event). Acquiring a mutex always involves a call into the kernel, and is thus orders of magnitude slower.
However, critical sections can only be used to synchronize resources in one process. In order to synchronize across multiple processes, a mutex is needed.
The best method would be to use a critical section with EnterCriticalSection and LeaveCriticalSection. The only tricky part is that you need to initialize the critical section first with InitializeCriticalSection. If this code is within a class, put the initialization in the constructor and make the CRITICAL_SECTION data structure a member of the class. If the code is not part of a class, you likely need to use a global or something similar to ensure it is initialized once.
Using MFC:
1. Define a synchronization object (mutex or critical section).
1.1 If multiple threads belonging to different processes enter func(), then use CMutex.
1.2 If multiple threads of the same process enter func(), then use CCriticalSection.
2. CSingleLock can be used to ease the usage of synchronization objects.
Let's say we have defined a critical section:
CCriticalSection m_CriticalSection;
void func()
{
    // locking/mutex statement goes here
    CSingleLock aLock(&m_CriticalSection, TRUE);
    // TRUE indicates that the lock is acquired during aLock creation.
    // If FALSE is used, call aLock.Lock() to lock.
    operation1();
    operation2();
    // corresponding unlock goes here
    aLock.Unlock();
}
EDIT: Refer VC++ article from MSDN: Multithreading with C++ and MFC Classes and
Multithreading: How to Use the Synchronization Classes
You can try this:
void func() {
    // See answer by Sasha on how to create the mutex
    WaitForSingleObject(mutex, INFINITE);
    operation1(); operation2();
    ReleaseMutex(mutex);
}