I am facing an issue with shared resources, and I am using one mutex for synchronization. It works fine with a small number of threads (for example, 10 threads), but with more threads (for example, 60 threads) I get an issue where “control is not coming back” (possibly because of a deadlock).
Note: The code is legacy code written in VC6, and I am maintaining it.
I have global data shared between multiple devices, and for that I am using the lock and unlock functions below...
inline LONG SharedData::Lock()
{
    return WaitForSingleObject(m_hMutex, INFINITE);
}
inline BOOL SharedData::Unlock()
{
    return ReleaseMutex(m_hMutex);
}
I suspect the destructor is causing the issue; here is the destructor...
SharedData::~SharedData()
if (m_lShareCnt < 1)
//clearing the heap
catch (...) { }
if (!m_lShareCnt)
if(m_hMutex != NULL && m_hMutex != INVALID_HANDLE_VALUE )
And the constructor is as follows:
SharedData::SharedData()
if (!m_hMutex) m_hMutex = CreateMutex(NULL, FALSE, NULL);
Can anybody tell what might be wrong in the code?

I think one problem is a race condition on the reference counter m_lShareCnt, which is not atomic. If a non-atomic variable is modified concurrently by multiple threads, its value may end up being unexpected.
The solution would be to make the reference counter atomic, or simply to protect access to it with the mutex you already use. Since you use the ancient VC6, the easiest thing you can do is to use the InterlockedIncrement and InterlockedDecrement functions to increment and decrement it atomically.
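A minimal sketch of the atomic reference count, assuming a C++11 compiler rather than VC6: std::atomic<long> with fetch_add/fetch_sub stands in for InterlockedIncrement/InterlockedDecrement, and the SharedCounter class and HammerCounter helper are hypothetical names for illustration, not part of the original code. The key point is that the "am I the last user?" decision comes from the value returned by the decrement itself, not from a separate read of the counter.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the m_lShareCnt bookkeeping in SharedData.
// fetch_add/fetch_sub play the role of InterlockedIncrement/InterlockedDecrement.
class SharedCounter {
public:
    // Returns the count after incrementing (like InterlockedIncrement).
    long AddRef() { return m_count.fetch_add(1) + 1; }

    // Returns the count after decrementing (like InterlockedDecrement).
    // The caller that sees 0 is the last user and the only one that should
    // free the shared resources: the decision comes from one atomic operation,
    // not from a separate (racy) read of the counter.
    long Release() { return m_count.fetch_sub(1) - 1; }

    long Get() const { return m_count.load(); }

private:
    std::atomic<long> m_count{0};
};

// Each thread takes and drops a reference many times. With a plain long
// instead of std::atomic<long>, updates could be lost and the final count
// could end up non-zero.
void HammerCounter(SharedCounter& counter, int nThreads, int iterations) {
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.AddRef();
                counter.Release();
            }
        });
    }
    for (auto& th : threads) th.join();
}
```

On VC6 the same shape applies with a LONG counter passed to InterlockedIncrement/InterlockedDecrement, whose return values (on current Windows versions) are the new count.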


Why can a mutex be used in different threads?

Using (writing) the same variable from multiple threads simultaneously causes undefined behavior and crashes.
Why doesn't using a mutex cause undefined behavior, even though mutexes are also variables?
If a mutex can somehow be used simultaneously, why not make all variables work simultaneously without locking?
All my research consisted of looking up the definition of mutex::lock in Visual Studio, which eventually led to the _Mtx_lock function without an implementation. I then found its implementation (Windows), though it also calls some functions whose implementations aren't shown:
int _Mtx_lock(_Mtx_t mtx)
{ /* lock mutex */
return (mtx_do_lock(mtx, 0));
static int mtx_do_lock(_Mtx_t mtx, const xtime *target)
{ /* lock mutex */
if ((mtx->type & ~_Mtx_recursive) == _Mtx_plain)
{ /* set the lock */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not current thread, do lock */
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
return (_Thrd_success);
{ /* handle timed or recursive mutex */
int res = WAIT_TIMEOUT;
if (target == 0)
{ /* no target --> plain wait (i.e. infinite timeout) */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
res = WAIT_OBJECT_0;
else if (target->sec < 0 || target->sec == 0 && target->nsec <= 0)
{ /* target time <= 0 --> plain trylock or timed wait for */
/* time that has passed; try to lock with 0 timeout */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not this thread, lock it */
if (mtx->_get_cs()->try_lock())
res = WAIT_OBJECT_0;
res = WAIT_OBJECT_0;
{ /* check timeout */
xtime now;
xtime_get(&now, TIME_UTC);
while (now.sec < target->sec
|| now.sec == target->sec && now.nsec < target->nsec)
{ /* time has not expired */
if (mtx->thread_id == static_cast<long>(GetCurrentThreadId())
|| mtx->_get_cs()->try_lock_for(
_Xtime_diff_to_millis2(target, &now)))
{ /* stop waiting */
res = WAIT_OBJECT_0;
xtime_get(&now, TIME_UTC);
if (res != WAIT_OBJECT_0 && res != WAIT_ABANDONED)
else if (1 < ++mtx->count)
{ /* check count */
if ((mtx->type & _Mtx_recursive) != _Mtx_recursive)
{ /* not recursive, fixup count */
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
switch (res)
return (_Thrd_success);
if (target == 0 || (target->sec == 0 && target->nsec == 0))
return (_Thrd_busy);
return (_Thrd_timedout);
return (_Thrd_error);
So, based on this code and the atomic_ types, I think a mutex can be written like this:
atomic_bool state = false;
void lock()
state = true;
void unlock()
state = false;
bool try_lock()
state = true;
return false;
return true;
As you have found, std::mutex is thread-safe because it uses atomic operations. Its locking logic can be reproduced with std::atomic_bool. Using atomic variables from multiple threads is not undefined behavior, because that is the purpose of those variables.
From the C++ standard (emphasis mine):
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Atomic variables are implemented using the CPU's atomic operations. Non-atomic variables don't use them, because those operations take longer to execute and would be useless if the variable is only used by one thread.
Your example is not thread-safe:
void lock()
state = true;
If two threads are checking if(!state) simultaneously, it is possible that both enter the if section, and both threads believe they have the ownership:
Thread 1: if (!state)
Thread 2: if (!state)
Thread 1: state = true;
Thread 2: state = true;
You must use an atomic exchange function to ensure that another thread cannot come in between checking the value and changing it.
void lock()
{
    bool expected;
    do {
        expected = false;
    } while (!state.compare_exchange_weak(expected, true));
}
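A self-contained version of that compare-exchange lock, as a sketch: the SpinLock class and the CountWithSpinLock demo are illustrative names, and the explicit acquire/release memory orders are my choice (the default sequentially consistent ordering used in the answer is also correct, just stricter). try_lock makes a single attempt with compare_exchange_strong, which, unlike the weak form, does not fail spuriously.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// A minimal test-and-set spinlock built on std::atomic<bool>.
// It busy-waits, so it only makes sense for very short critical sections.
class SpinLock {
public:
    void lock() {
        bool expected;
        do {
            expected = false;  // compare_exchange overwrites it on failure
        } while (!state.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed));
    }

    bool try_lock() {
        bool expected = false;
        // Single attempt: succeeds only if the lock was free.
        return state.compare_exchange_strong(expected, true,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    void unlock() { state.store(false, std::memory_order_release); }

private:
    std::atomic<bool> state{false};
};

// Increments a plain int from several threads, protected by the spinlock.
int CountWithSpinLock(int nThreads, int iterations) {
    SpinLock lock;
    int counter = 0;  // not atomic: the lock provides the synchronization
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Because SpinLock provides lock(), try_lock() and unlock(), it also works with std::lock_guard<SpinLock>. It still burns CPU while waiting, which is the trade-off discussed further down.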
You can also add a counter and give time for other threads to execute if the wait takes a long time:
void lock()
bool expected;
size_t counter = 0;
do {
expected = false;
if (counter > 100) {
else if (counter > 20) {
else if (counter > 3) {
} while (!state.compare_exchange_weak(expected, true));
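The branch bodies are not shown above, so here is one hypothetical way to fill them in: spin for the first few attempts, then call std::this_thread::yield(), then make short std::this_thread::sleep_for() calls. The thresholds (3, 20, 100) come from the answer; the actions and sleep durations are assumptions, and BackoffSpinLock and CountWithBackoffLock are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Spinlock whose lock() backs off the longer it has been waiting.
class BackoffSpinLock {
public:
    void lock() {
        bool expected;
        std::size_t counter = 0;
        do {
            expected = false;
            if (counter > 100) {
                // Waited a long time: sleep so the lock holder can run.
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            } else if (counter > 20) {
                // Waited a while: sleep very briefly.
                std::this_thread::sleep_for(std::chrono::microseconds(10));
            } else if (counter > 3) {
                // Short wait: offer the rest of the time slice to other threads.
                std::this_thread::yield();
            }
            // counter <= 3: retry immediately (pure spinning).
            ++counter;
        } while (!state.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed));
    }

    void unlock() { state.store(false, std::memory_order_release); }

private:
    std::atomic<bool> state{false};
};

// Plain int counter protected by the backoff lock through std::lock_guard.
int CountWithBackoffLock(int nThreads, int iterations) {
    BackoffSpinLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<BackoffSpinLock> guard(lock);  // lock()/unlock() via RAII
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

The right thresholds depend heavily on how long the lock is typically held, so treat these numbers as placeholders to measure against.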
Using (writing) the same variable from multiple threads simultaneously causes undefined behavior and crashes. Why doesn't using a mutex cause undefined behavior, even though mutexes are also variables?
It is only undefined behaviour for regular variables, and only if there is no synchronisation. std::mutex is defined to be thread safe. Its entire point is to provide synchronisation to other objects.
From [intro.races]:
The library defines a number of atomic operations ([atomics]) and operations on mutexes ([thread]) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another.
Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.
Certain library calls synchronize with other library calls performed by another thread.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
(Emphasis added)
Why can a mutex be used in different threads?
How could it possibly be useful if it couldn't be used in different threads? Synchronizing multiple threads with a shared mutex is the only reason for it to exist at all.
Using (writing) the same variable from multiple threads simultaneously
It's a bad idea to only worry about things happening "simultaneously". The problem is generally things happening in an undetermined order, i.e., unpredictably.
There are lots of multi-threading bugs that seem impossible if you believe things have to be simultaneous to go wrong.
causes undefined behavior and crashes.
Undefined Behaviour is not required to cause a crash. If it had to crash, crashing would be behaviour which was ... defined. There are endless questions on here from people who don't understand this, asking why their "undefined behaviour test" didn't crash.
Why doesn't using a mutex cause undefined behavior, even though mutexes are also variables?
Because of the way they're used. Mutexes are not simply assigned to or read from, like simple variables, but are manipulated with specialized code designed specifically to do this correctly.
You can almost write your own mutex - a spinlock, anyway - just by using std::atomic<int> and a lot of care.
The difference is that a mutex also interacts with your operating system scheduler, and that interface is not portably exposed as part of the language standard. The std::mutex class bundles the OS-specific part up in a class with correct semantics, so you can write portable C++ instead of being limited to, say, POSIX-compatible C++ or Windows-compatible C++.
In your exploration of the VS std::mutex implementation, you ignored the mtx->_get_cs()->lock() part: this is using the Windows Critical Section to interact with the scheduler.
Your implementation is an attempt at a spinlock, which is fine so long as you know the lock is never held for long (or if you don't mind sacrificing a core for each blocked thread). By contrast, the mutex allows a waiting thread to be de-scheduled until the lock is released. This is the part handled by the Critical Section.
You also ignored all the timeout and recursive locking code - which is fine, but your implementation isn't really attempting to do the same thing as the original.
A mutex is specifically made to synchronize code on different threads that work on the same resource. It is designed not to have issues when used from multiple threads. Be aware that when using several mutexes together, you can still get deadlocks if different threads take them in a different order. C++'s std::lock solves this by acquiring several mutexes at once with a deadlock-avoidance algorithm (std::unique_lock objects created with std::defer_lock can be passed to it, and C++17 adds std::scoped_lock for the same purpose).
A plain variable is meant to be used either from a single thread or from properly synchronized threads, because that's how it can be accessed in the fastest way. This has to do with computer architecture (registers, caches, operations performed in several steps).
To share variables between non-synchronized threads, you can use std::atomic variables. That can be faster than locking, but access is slower and more cumbersome than for normal variables. Some more complex situations involving several variables can only be handled with locking.
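A small sketch of the lock-ordering point in this answer, with hypothetical Account and Transfer names: two threads transfer in opposite directions, so each would take the two mutexes in a different order. Passing both deferred std::unique_lock objects to std::lock acquires them together without deadlocking.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical account with its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

// Threads may call this with the arguments in either order; std::lock
// acquires both mutexes with a deadlock-avoidance algorithm.
void Transfer(Account& from, Account& to, int amount) {
    std::unique_lock<std::mutex> lockFrom(from.m, std::defer_lock);
    std::unique_lock<std::mutex> lockTo(to.m, std::defer_lock);
    std::lock(lockFrom, lockTo);  // locks both, no lock-order deadlock
    from.balance -= amount;
    to.balance += amount;
}  // both unique_locks release their mutex here

// Two threads transfer in opposite directions at the same time.
void TransferBackAndForth(Account& a, Account& b, int iterations) {
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) Transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) Transfer(b, a, 1); });
    t1.join();
    t2.join();
}
```

With C++17, the two unique_lock lines plus std::lock can be replaced by std::scoped_lock lock(from.m, to.m);. Calling Transfer with the same account twice would lock one std::mutex twice, which this sketch does not guard against.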

Difference between C++ mutex and RTOS xMutex

I'm experimenting with locking on an ESP32. Apparently, there are different ways to implement a lock:
There is the default C++ mutex library:
#include <mutex>
std::mutex mtx;
And there is the implementation from RTOS:
SemaphoreHandle_t xMutex = xSemaphoreCreateMutex();
xSemaphoreTake(xMutex, portMAX_DELAY);
Are there fundamental differences I should be aware of?
Or are they equivalent?
Assuming you're using the ESP-IDF SDK, the toolchain is based on GCC 5.2 targeting the ESP32's Xtensa LX6 cores (xtensa-esp32-elf), with a partially open-source C runtime library.
std::mutex in GNU libstdc++ delegates to pthread_mutex_lock/unlock calls. ESP-IDF SDK contains a pthread emulation layer, where we can see what pthread_mutex_lock and pthread_mutex_unlock actually do:
static int IRAM_ATTR pthread_mutex_lock_internal(esp_pthread_mutex_t *mux, TickType_t tmo)
{
    if (!mux) {
        return EINVAL;
    }
    if ((mux->type == PTHREAD_MUTEX_ERRORCHECK) &&
        (xSemaphoreGetMutexHolder(mux->sem) == xTaskGetCurrentTaskHandle())) {
        return EDEADLK;
    }
    if (mux->type == PTHREAD_MUTEX_RECURSIVE) {
        if (xSemaphoreTakeRecursive(mux->sem, tmo) != pdTRUE) {
            return EBUSY;
        }
    } else {
        if (xSemaphoreTake(mux->sem, tmo) != pdTRUE) {
            return EBUSY;
        }
    }
    return 0;
}
int IRAM_ATTR pthread_mutex_unlock(pthread_mutex_t *mutex)
{
    esp_pthread_mutex_t *mux;
    if (!mutex) {
        return EINVAL;
    }
    mux = (esp_pthread_mutex_t *)*mutex;
    if (!mux) {
        return EINVAL;
    }
    if (((mux->type == PTHREAD_MUTEX_RECURSIVE) ||
         (mux->type == PTHREAD_MUTEX_ERRORCHECK)) &&
        (xSemaphoreGetMutexHolder(mux->sem) != xTaskGetCurrentTaskHandle())) {
        return EPERM;
    }
    int ret;
    if (mux->type == PTHREAD_MUTEX_RECURSIVE) {
        ret = xSemaphoreGiveRecursive(mux->sem);
    } else {
        ret = xSemaphoreGive(mux->sem);
    }
    if (ret != pdTRUE) {
        assert(false && "Failed to unlock mutex!");
    }
    return 0;
}
So, as you can see, it mainly delegates the calls to the RTOS semaphore API, with some additional checks.
Chances are you don't need/want those checks. Given the tiny i-cache of the ESP32 chip and the excruciatingly slow serial RAM, I would prefer to stay as close to the hardware as possible (i.e. don't use std::mutex unless it does exactly what you need).
Are there fundamental differences I should be aware of?
I am not familiar with the API that you're calling in your second example, but it looks as if your xMutex variable refers to a semaphore. The "semaphore" abstraction is more powerful than the "mutex" abstraction. I.e., you can always use a semaphore as a substitute for a mutex, but there are some algorithms in which a mutex would not work as a substitute for a semaphore.
I like to think of a semaphore as a blocking queue of informationless tokens. The "give" operation puts a token into the queue, while the "take" takes one from the queue, possibly waiting for some other thread to give a token if the queue happens to be empty at the moment when take() was called.
P.S., In order to use a semaphore as a substitute for a mutex, you'll need it to contain one token when the mutex should be "free", and zero tokens when the mutex should be "in use." That means, you'll want the code that creates the semaphore to ensure that it contains one token at the start
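The xMutex = xSemaphoreCreateMutex() statement in your example does not explicitly show how many tokens the new semaphore contains. If it's zero tokens, then you'll probably want your next line of code to "give()" one token in order to complete the initialization. (According to the FreeRTOS documentation, a mutex created with xSemaphoreCreateMutex() starts out available, i.e. with one token, so no extra give() is needed there; it is a semaphore created with xSemaphoreCreateBinary() that starts empty and must be given before it can be taken.)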
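To make the "queue of informationless tokens" picture concrete, here is a sketch of a counting semaphore built from std::mutex and std::condition_variable, then used as a mutex by starting it with one token. TokenSemaphore, give(), take() and the demo function are hypothetical names that follow the wording of this answer; this is not how FreeRTOS implements its semaphores.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// A counting semaphore modelled as a "queue of informationless tokens".
class TokenSemaphore {
public:
    explicit TokenSemaphore(int initialTokens) : tokens(initialTokens) {}

    // Put one token into the queue and wake one waiter, if any.
    void give() {
        {
            std::lock_guard<std::mutex> lock(m);
            ++tokens;
        }
        cv.notify_one();
    }

    // Take one token, blocking while the queue is empty.
    void take() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return tokens > 0; });
        --tokens;
    }

    int available() {
        std::lock_guard<std::mutex> lock(m);
        return tokens;
    }

private:
    std::mutex m;
    std::condition_variable cv;
    int tokens;
};

// Uses a semaphore created with ONE token as a mutex around a plain counter.
int CountWithSemaphoreAsMutex(int nThreads, int iterations) {
    TokenSemaphore sem(1);  // one token == "mutex is free"
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                sem.take();  // zero tokens left == "mutex is in use"
                ++counter;
                sem.give();  // put the token back == "mutex is free again"
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

If the semaphore were constructed with zero tokens instead, every thread would block in take() until someone called give(), which is exactly the initialization concern raised above.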

Synchronize Threads - InterlockedExchange

I'd like to check whether a thread is doing work. If it is, I will wait on an event until the thread has finished its work; the thread sets the event at the end.
To check if the thread is working, I declared a volatile bool variable. It is true while the thread is working and false otherwise; at the end of its work, the thread sets it back to false.
Is it adequate to use a volatile bool variable or do I have to use an atomic function?
BTW: can someone please explain the InterlockedExchange function to me? I don't understand in which use case I would need it.
I see that without my code it is hard to say whether a volatile bool variable is adequate. I wrote a test class that shows my problem.
class Testclass
void doThreadedWork();
void Work();
void StartWork();
void WaitUntilFinish();
HANDLE hHasWork;
HANDLE hAbort;
HANDLE hFinished;
volatile bool m_bWorking;
#include "stdafx.h"
#include "Testclass.h"
DWORD WINAPI myThread(LPVOID lpParameter)
Testclass* pTestclass = (Testclass*) lpParameter;
return 0;
DWORD myThreadID;
HANDLE myHandle = CreateThread(0, 0, myThread, this, 0, &myThreadID);
m_bWorking = false;
hHasWork = CreateEvent(NULL,TRUE,FALSE,NULL);
hAbort = CreateEvent(NULL,TRUE,FALSE,NULL);
hFinished = CreateEvent(NULL,FALSE,FALSE,NULL);
void Testclass::Work()
// do some work
m_bWorking = false;
void Testclass::StartWork()
m_bWorking = true;
void Testclass::doThreadedWork()
HANDLE hEvents[2];
hEvents[0] = hHasWork;
hEvents[1] = hAbort;
DWORD dwEvent = WaitForMultipleObjects(2, hEvents, FALSE, INFINITE);
if(WAIT_OBJECT_0 == dwEvent)
void Testclass::WaitUntilFinish()
// if the thread is not working, do not wait and return
It is not really clear to me whether m_bWorking must be read and written in an atomic way or whether volatile is adequate.
There is a lot of background to cover for your question. We don't know, for example, what toolchain you are using, so I am going to answer it as a WinAPI question. I further assume you have something like this in mind:
volatile bool flag = false;
DWORD WINAPI WorkFn(void*) {
flag = true;
// work here
// done.
flag = false;
return 0;
int main() {
HANDLE th = CreateThread(...., &WorkFn, NULL, ..);
// wait for start of work.
while (!flag) {
// ?? # 1
// Seems thread is busy now. Time to wait for it to finish.
while (flag) {
// ?? # 2
There are many things wrong here. For starters, the volatile does very little here. When flag = true happens, it will eventually be visible to the other thread because it is backed by a global variable. This is because the write will at least make it into the cache, and the cache has ways to tell other processors that a given line (a range of addresses) is dirty. The only way it would not make it into the cache is if the compiler made a super crazy optimization in which flag stays in a CPU register. That could actually happen, but not in this particular code example.
So volatile tells the compiler never to keep the variable in a register: every time you see a volatile variable, you can read it as "never enregister this variable". Its use here is basically just a paranoid move.
If this code is what you had in mind then this looping over a flag pattern is called a Spinlock and this one is a really poor one. It is almost never the right thing to do in a user mode program.
Before we go into better approaches, let me tackle your Interlocked question. What people usually mean is this pattern:
volatile long flag = 0;
DWORD WINAPI WorkFn(void*) {
InterlockedExchange(&flag, 1);
int main() {
while (InterlockedCompareExchange(&flag, 1, 1) == 0L) {
Assume the ... means similar code as before. InterlockedExchange() performs the write as an atomic operation with a full memory barrier, so the change becomes visible to the other processors in a well-defined order instead of "eventually". The typical way to read the value with the same guarantees is InterlockedCompareExchange() with the same value for the exchange and comparand arguments: it returns the current value and leaves it unchanged.
One problem with these functions is that they generate more traffic on the system bus, which is now being used to broadcast cache synchronization messages among the CPUs in the system.
std::atomic<bool> flag would be the modern, C++11 way to do the same, but still not what you really want to do.
I added the YieldProcessor() call there to point to the real problem. When you wait for a memory address to change you are using cpu resources that would be better used somewhere else, for example in the actual work (!!). If you actually yield the processor there is at least a chance that the OS will give it to the WorkFn, but in a multicore machine it will quickly go back to polling the variable. In a modern machine you will be checking this flag millions of times per second, with the yield, probably 200000 times per second. Terrible waste either way.
What you want to do here is to let Windows do a zero-cost wait, or at least a low-cost one:
DWORD WINAPI WorkFn(void*) {
// work here
return 0;
int main() {
HANDLE th = CreateThread(...., &WorkFn, NULL, ..);
WaitForSingleObject(th, INFINITE);
// work is done!
When the worker thread returns, the thread handle gets signaled and the wait is satisfied. While blocked in WaitForSingleObject, you don't consume any CPU cycles. If you want to do some periodic activity in main() while you wait, you can replace INFINITE with 1000, which releases the main thread every second. In that case, you need to check the return value of WaitForSingleObject to tell a timeout (WAIT_TIMEOUT) from the thread being done (WAIT_OBJECT_0).
If you need to actually know when work started, you need an additional waitable object, for example, a Windows event which is obtained via CreateEvent() and can be waited on using the same WaitForSingleObject.
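The Win32 calls above can't be shown portably, but the same "wait without polling" idea can be sketched in standard C++ as an analogy (an assumption, not the Win32 API): a std::promise<void> plays the role of the "work has started" event, and the std::future returned by std::async plays the role of the thread handle. wait_for() with a timeout corresponds to passing 1000 instead of INFINITE. RunAndWait is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Runs "work" on another thread and waits for it without spinning on a flag.
int RunAndWait() {
    std::promise<void> started;                         // like a "started" event
    std::future<void> startedSignal = started.get_future();

    std::future<int> done = std::async(std::launch::async, [&started] {
        started.set_value();                            // like SetEvent(hStarted)
        std::this_thread::sleep_for(std::chrono::milliseconds(50));  // "work here"
        return 42;
    });

    startedSignal.wait();  // blocks without burning CPU until the work begins

    // Like WaitForSingleObject(th, timeout) in a loop: wake up periodically.
    while (done.wait_for(std::chrono::milliseconds(10)) == std::future_status::timeout) {
        // periodic activity could go here
    }
    return done.get();     // work is done
}
```

If no periodic activity is needed, done.get() alone blocks until the result is ready, which corresponds to the INFINITE case.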
Update [1/23/2016]
Now that we can see the code you have in mind, you don't need atomics, volatile works just fine. The m_bWorking is protected by the cs mutex anyhow for the true case.
If I might suggest, you can use TryEnterCriticalSection and cs to accomplish the same without m_bWorking at all:
void Testclass::Work()
// do some work
SetEvent(hFinished); // could be removed as well
void Testclass::StartWork()
ResetEvent(hFinished); // could be removed.
void Testclass::WaitUntilFinish()
if (TryEnterCriticalSection(&cs)) {
// Not busy now.
} else {
// busy doing work. If we use EnterCriticalSection(&cs)
// here we can even eliminate hFinished from the code.
For some reason, the Interlocked API does not include an "InterlockedGet" or "InterlockedSet" function. This is a strange omission and the typical work around is to cast through volatile.
You can use code like the following on Windows:
#include <intrin.h>
__inline int InterlockedIncrement(int *j)
{ // This is VS-specific
    return _InterlockedIncrement((volatile LONG *) j);
}
__inline int InterlockedDecrement(int *j)
{ // This is VS-specific
    return _InterlockedDecrement((volatile LONG *) j);
}
__inline static void InterlockedSet(int *val, int newval)
{
    *((volatile int *)val) = newval;
}
__inline static int InterlockedGet(int *val)
{
    return *((volatile int *)val);
}
Yes, it's ugly. But it's the best way to work around the deficiency if you're not using C++11. If you're using C++11, use std::atomic instead.
Note that this is Windows-specific code and should not be used on other platforms.
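For comparison, a sketch of how the helpers above map onto C++11 std::atomic, which this answer recommends when it is available. The function names are hypothetical. InterlockedExchange is included because the question asked about it: its counterpart exchange() returns the previous value, which is what makes it useful, since the caller learns atomically what it replaced.

```cpp
#include <atomic>
#include <cassert>

// Portable replacements for the Windows-only helpers (names are illustrative).
std::atomic<int> g_value{0};

int AtomicIncrement() { return ++g_value; }             // like InterlockedIncrement: new value
int AtomicDecrement() { return --g_value; }             // like InterlockedDecrement: new value
void AtomicSet(int newval) { g_value.store(newval); }   // like InterlockedSet
int AtomicGet() { return g_value.load(); }              // like InterlockedGet
int AtomicExchange(int newval) {                        // like InterlockedExchange:
    return g_value.exchange(newval);                    // returns the OLD value
}
```

All of these use sequentially consistent ordering by default, which is the closest match to the full barriers of the Interlocked functions.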
No, volatile bool will not be enough. You need an atomic bool, as you correctly suspect. Otherwise, you might never see your bool updated.
There is also no InterlockedExchange in standard C++ (the tag of your question), though std::atomic<T>::exchange is its closest equivalent, and C++11 also has compare_exchange_weak and compare_exchange_strong. These set the value of an object to a certain NewValue, provided its current value is TestValue, and indicate the status of this attempt (whether the change was made or not). The benefit of these functions is that this is done atomically: if two threads try to perform this operation at the same time, you are guaranteed that only one will succeed. This is very helpful when you need to take certain actions depending on the result of the operation.
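A sketch of the "only one will succeed" guarantee, with a hypothetical CountWinners helper: several threads race to change a flag from false (TestValue) to true (NewValue), and compare_exchange_strong returns true for exactly one of them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Returns how many threads managed to flip the flag from false to true.
int CountWinners(int nThreads) {
    std::atomic<bool> claimed{false};
    std::atomic<int> winners{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < nThreads; ++t) {
        threads.emplace_back([&] {
            bool expected = false;                                // TestValue
            if (claimed.compare_exchange_strong(expected, true)) {  // NewValue
                ++winners;  // only the thread that changed false -> true gets here
            }
            // On failure, expected now holds the value actually found (true).
        });
    }
    for (auto& th : threads) th.join();
    return winners.load();
}
```

This is the typical use case for an exchange-style operation: deciding, without a lock, which single thread gets to perform some one-time action.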

Error with function in multithreaded environment

What my function does is iterate through an array of bools and, upon finding an element set to false, set it to true. The function is a method of my memory manager singleton class and returns a pointer to memory. I'm getting an error where my iterator appears to loop all the way through and end up back at the beginning, which I believe is because multiple threads are calling the function.
void* CNetworkMemoryManager::GetMemory()
WaitForSingleObject(hMutexCounter, INFINITE);
if(mCounter >= NetConsts::kNumMemorySlots)
mCounter = 0;
unsigned int tempCounter = mCounter;
unsigned int start = tempCounter;
if(tempCounter >= NetConsts::kNumMemorySlots)
tempCounter = 0;
//looped all the way around
if(tempCounter == start)
return NULL;
//return pointer to free space and increment
mCounter = tempCounter + 1;
mUsedSlots[tempCounter] = true;
return mPointers[tempCounter];
My error is the assert that goes off in the loop. My question is how do I fix the function and is the error caused by multithreading?
Edit: added a mutex to guard the mCounter variable. No change. Error still occurs.
I can't say if the error is caused by multi threading or not but I can say your code is not thread safe.
You release the mutex before you access tempCounter and mUsedSlots:
mUsedSlots[tempCounter] = true;
return mPointers[tempCounter];
tempCounter is a local copy, but mUsedSlots and mPointers are shared and neither is const. This is a data race because you have not correctly serialized access to these variables.
Change this to:
mUsedSlots[tempCounter] = true;
void* const retVal = mPointers[tempCounter];
ReleaseMutex(hMutexCounter); // release only after the shared data has been accessed
return retVal;
Then at least this part of your code is thread-safe; whether it solves your problem I can't say, so try it out. On machines with multiple cores, very weird things can happen as a result of data races.
As a general best practice, I would suggest looking at C++11 synchronization features like std::mutex and std::lock_guard. These would have saved you from yourself: std::lock_guard releases the lock automatically, so you can't forget to release it and, as in this case, you can't release it too early by accident. This would also make your code more portable. If you don't have C++11 yet, use the Boost equivalents.
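A sketch of what the function could look like with std::mutex and std::lock_guard. MemoryPool, kNumSlots, kSlotSize and ReleaseMemory are hypothetical simplifications of CNetworkMemoryManager (the real slot count and storage are not shown in the question). The point is that the lock is held for the whole search-and-mark step and released automatically on every return path, including the "looped all the way around" case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical, simplified stand-in for CNetworkMemoryManager.
class MemoryPool {
public:
    static const std::size_t kNumSlots = 4;
    static const std::size_t kSlotSize = 64;

    MemoryPool() { mUsedSlots.fill(false); }

    // Returns a free slot, or nullptr if every slot is in use.
    void* GetMemory() {
        std::lock_guard<std::mutex> lock(mMutex);  // held until the function returns
        for (std::size_t i = 0; i < kNumSlots; ++i) {
            std::size_t slot = (mCounter + i) % kNumSlots;  // start after the last hit
            if (!mUsedSlots[slot]) {
                mUsedSlots[slot] = true;
                mCounter = (slot + 1) % kNumSlots;
                return mStorage[slot].data();
            }
        }
        return nullptr;  // looped all the way around
    }

    // Marks the slot that p points to as free again.
    void ReleaseMemory(void* p) {
        std::lock_guard<std::mutex> lock(mMutex);
        for (std::size_t i = 0; i < kNumSlots; ++i) {
            if (mStorage[i].data() == p) {
                mUsedSlots[i] = false;
                return;
            }
        }
    }

private:
    std::mutex mMutex;
    std::size_t mCounter = 0;
    std::array<bool, kNumSlots> mUsedSlots;
    std::array<std::array<unsigned char, kSlotSize>, kNumSlots> mStorage;
};
```

Because the modulo keeps the index in range, the separate "reset to 0" checks from the original loop are not needed in this version.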

Why is Boost scoped_lock not unlocking the mutex?

I've been using boost::mutex::scoped_lock in this manner:
void ClassName::FunctionName()
{
    {
        boost::mutex::scoped_lock scopedLock(mutex_);
        //do stuff
        waitBoolean = true;
    }
    while(waitBoolean == true ){
        // sleep
    }
    //get on with the thread's activities
}
Basically, it sets waitBoolean, and the other thread signals that it is done by setting waitBoolean to false.
This doesn't seem to work, however, because the other thread can't get a lock on mutex_ !!
I was assuming that by wrapping the scoped_lock in brackets I would be terminating its lock. This isn't the case? Reading online says that it only gives up the mutex when the destructor is called. Won't it be destroyed when it goes out of that local scope?
Signaling part of the code:
boost::mutex::scoped_lock scopedLock(mutex_);
//Run some function that need to be done...
To synchronize two threads, use a condition variable. That is the state-of-the-art way to synchronize two threads the way you want:
Using Boost, the waiting part is something like:
void BoostSynchronisationPoint::waitSynchronisation()
{
    boost::unique_lock<boost::mutex> lock(_mutex);
    _synchronisationSent = false;
    while (!_synchronisationSent)
    {
        _condition.wait(lock); // unlock and wait
    }
}
The notify part is something like:
void BoostSynchronisationPoint::sendSynchronisation()
boost::lock_guard<boost::mutex> lock(_mutex);
_synchronisationSent = true;
The business with _synchronisationSent is to avoid spurious wakeups: see Wikipedia.
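For comparison, a sketch of the same synchronisation point in standard C++11, whose std::condition_variable mirrors boost::condition_variable. SynchronisationPoint and its members are illustrative names. The predicate overload of wait() re-checks the flag after every wakeup, which is what guards against spurious wakeups.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class SynchronisationPoint {
public:
    // Blocks until sendSynchronisation() has been called.
    void waitSynchronisation() {
        std::unique_lock<std::mutex> lock(mMutex);
        // Releases the mutex while waiting; returns only once mSent is true.
        mCondition.wait(lock, [this] { return mSent; });
        mSent = false;  // consume the signal so the point can be reused
    }

    void sendSynchronisation() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mSent = true;
        }
        mCondition.notify_one();
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    bool mSent = false;
};
```

One design difference, chosen deliberately here: the flag is cleared after a successful wait instead of before it, so a sendSynchronisation() that happens before waitSynchronisation() is not lost.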
The scoped_lock should indeed be released at the end of the scope. However you don't lock the waitBoolean when you're looping on it, suggesting you don't protect it properly other places as well - e.g. where it's set to false, and you'll end up with nasty race conditions.
I'd say you should use a boost::condition_variable to do this sort of things, instead of sleep + thread-unsafe checking.
I would also suggest marking waitBoolean as volatile; however, you really have to use a condition variable, or even better a barrier.