Difference between C++ mutex and RTOS xMutex - c++

I'm experimenting with locking on an ESP32. Apparently, there are different ways to implement a lock:
There is the default C++ mutex library:
#include <mutex>
std::mutex mtx;
mtx.lock();
mtx.unlock();
And there is the implementation from RTOS:
SemaphoreHandle_t xMutex = xSemaphoreCreateMutex();
xSemaphoreTake(xMutex, portMAX_DELAY);
xSemaphoreGive(xMutex);
Are there fundamental differences I should be aware of?
Or are they equivalent?

Assuming you're using the ESP-IDF SDK, the toolchain is based on GCC 5.2 targeting the xtensa-lx106 instruction set, with a partially open-source C runtime library.
std::mutex in GNU libstdc++ delegates to pthread_mutex_lock/unlock calls. ESP-IDF SDK contains a pthread emulation layer, where we can see what pthread_mutex_lock and pthread_mutex_unlock actually do:
static int IRAM_ATTR pthread_mutex_lock_internal(esp_pthread_mutex_t *mux, TickType_t tmo)
{
if (!mux) {
return EINVAL;
}
if ((mux->type == PTHREAD_MUTEX_ERRORCHECK) &&
(xSemaphoreGetMutexHolder(mux->sem) == xTaskGetCurrentTaskHandle())) {
return EDEADLK;
}
if (mux->type == PTHREAD_MUTEX_RECURSIVE) {
if (xSemaphoreTakeRecursive(mux->sem, tmo) != pdTRUE) {
return EBUSY;
}
} else {
if (xSemaphoreTake(mux->sem, tmo) != pdTRUE) {
return EBUSY;
}
}
return 0;
}
int IRAM_ATTR pthread_mutex_unlock(pthread_mutex_t *mutex)
{
esp_pthread_mutex_t *mux;
if (!mutex) {
return EINVAL;
}
mux = (esp_pthread_mutex_t *)*mutex;
if (!mux) {
return EINVAL;
}
if (((mux->type == PTHREAD_MUTEX_RECURSIVE) ||
(mux->type == PTHREAD_MUTEX_ERRORCHECK)) &&
(xSemaphoreGetMutexHolder(mux->sem) != xTaskGetCurrentTaskHandle())) {
return EPERM;
}
int ret;
if (mux->type == PTHREAD_MUTEX_RECURSIVE) {
ret = xSemaphoreGiveRecursive(mux->sem);
} else {
ret = xSemaphoreGive(mux->sem);
}
if (ret != pdTRUE) {
assert(false && "Failed to unlock mutex!");
}
return 0;
}
So as you can see it mainly delegates the calls to the RTOS semaphore API, with some additional checks.
Chances are you don't need/want those checks. Given the tiny i-cache of the esp32 chip and the excruciatingly slow serial RAM, I would prefer to stay as close to the hardware as possible (i.e. don't use std::mutex unless it does exactly what you need).

Are there fundamental differences I should be aware of?
I am not familiar with the API that you're calling in your second example, but it looks as if your xMutex variable refers to a counting semaphore. The "semaphore" abstraction is more powerful than the "mutex" abstraction. I.e., you can always use a semaphore as a substitute for a mutex, but there are some algorithms in which a mutex would not work as a substitute for a semaphore.
I like to think of a semaphore as a blocking queue of informationless tokens. The "give" operation puts a token into the queue, while the "take" takes one from the queue, possibly waiting for some other thread to give a token if the queue happens to be empty at the moment when take() was called.
P.S., In order to use a semaphore as a substitute for a mutex, you'll need it to contain one token when the mutex should be "free", and zero tokens when the mutex should be "in use." That means, you'll want the code that creates the semaphore to ensure that it contains one token at the start
The xMutex = xSemaphoreCreateMutex() statement in your example does not explicitly show how many tokens the new semaphore contains. If it's zero tokens, then you'll probably want your next line of code to "give()" one token in order to complete the initialization.

Related

Why can mutex be used in different threads?

Using (writing) same variable in multiple threads simultaneously causes undefined behavior and crashes.
Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
If mutex somehow can be used simultaneously, why not make all variables work simultaneously without locking?
All my research is pressing Show definition on mutex::lock in Visual Studio, where I get at the end _Mtx_lock function without realization, and then I found it’s realization (Windows), though it has some functions also without realization:
int _Mtx_lock(_Mtx_t mtx)
{ /* lock mutex */
return (mtx_do_lock(mtx, 0));
}
static int mtx_do_lock(_Mtx_t mtx, const xtime *target)
{ /* lock mutex */
if ((mtx->type & ~_Mtx_recursive) == _Mtx_plain)
{ /* set the lock */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not current thread, do lock */
mtx->_get_cs()->lock();
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
}
++mtx->count;
return (_Thrd_success);
}
else
{ /* handle timed or recursive mutex */
int res = WAIT_TIMEOUT;
if (target == 0)
{ /* no target --> plain wait (i.e. infinite timeout) */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
mtx->_get_cs()->lock();
res = WAIT_OBJECT_0;
}
else if (target->sec < 0 || target->sec == 0 && target->nsec <= 0)
{ /* target time <= 0 --> plain trylock or timed wait for */
/* time that has passed; try to lock with 0 timeout */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not this thread, lock it */
if (mtx->_get_cs()->try_lock())
res = WAIT_OBJECT_0;
else
res = WAIT_TIMEOUT;
}
else
res = WAIT_OBJECT_0;
}
else
{ /* check timeout */
xtime now;
xtime_get(&now, TIME_UTC);
while (now.sec < target->sec
|| now.sec == target->sec && now.nsec < target->nsec)
{ /* time has not expired */
if (mtx->thread_id == static_cast<long>(GetCurrentThreadId())
|| mtx->_get_cs()->try_lock_for(
_Xtime_diff_to_millis2(target, &now)))
{ /* stop waiting */
res = WAIT_OBJECT_0;
break;
}
else
res = WAIT_TIMEOUT;
xtime_get(&now, TIME_UTC);
}
}
if (res != WAIT_OBJECT_0 && res != WAIT_ABANDONED)
;
else if (1 < ++mtx->count)
{ /* check count */
if ((mtx->type & _Mtx_recursive) != _Mtx_recursive)
{ /* not recursive, fixup count */
--mtx->count;
res = WAIT_TIMEOUT;
}
}
else
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
switch (res)
{
case WAIT_OBJECT_0:
case WAIT_ABANDONED:
return (_Thrd_success);
case WAIT_TIMEOUT:
if (target == 0 || (target->sec == 0 && target->nsec == 0))
return (_Thrd_busy);
else
return (_Thrd_timedout);
default:
return (_Thrd_error);
}
}
}
So, according to this code, and the atomic_ keywords I think mutex can be written the next way:
atomic_bool state = false;
void lock()
{
if(!state)
state = true;
else
while(state){}
}
void unlock()
{
state = false;
}
bool try_lock()
{
if(!state)
state = true;
else
return false;
return true;
}
As you have found, std::mutex is thread-safe because it uses atomic operations. It can be reproduced with std::atomic_bool. Using atomic variables from multiple thread is not undefined behavior, because that is the purpose of those variables.
From C++ standard (emphasis mine):
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Atomic variables are implemented using atomic operations of the CPU. This is not implemented for non-atomic variables, because those operations take longer time to execute and would be useless if the variables are only used in one thread.
Your example is not thread-safe:
void lock()
{
if(!state)
state = true;
else
while(state){}
}
If two threads are checking if(!state) simultaneously, it is possible that both enter the if section, and both threads believe they have the ownership:
Thread 1 Thread 2
if (!state)
if (!state)
state=true;
state=true;
You must use an atomic exchange function to ensure that the another thread cannot come in between checking the value and changing it.
void lock()
{
bool expected;
do {
expected = false;
} while (!state.compare_exchange_weak(expected, true));
}
You can also add a counter and give time for other threads to execute if the wait takes a long time:
void lock()
{
bool expected;
size_t counter = 0;
do {
expected = false;
if (counter > 100) {
Sleep(10);
}
else if (counter > 20) {
Sleep(5);
}
else if (counter > 3) {
Sleep(1);
}
counter++;
} while (!state.compare_exchange_weak(expected, true));
}
Using (writing) same variable in multiple threads simultaneously causes undefined behavior and crashes. Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
It is only undefined behaviour for regular variables, and only if there is no synchronisation. std::mutex is defined to be thread safe. It's entire point is to provide synchronisation to other objects.
From [intro.races]:
The library defines a number of atomic operations ([atomics]) and operations on mutexes ([thread]) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another.
...
Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.
Certain library calls synchronize with other library calls performed by another thread.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
(Emphasis added)
Why mutex can be used in different threads?
How could it possibly be useful if it couldn't be used in different threads? Synchronizing multiple threads with a shared mutex is the only reason for it to exist at all.
Using (writing) same variable in multiple threads simultaneously
It's a bad idea to only worry about things happening "simultaneously". The problem is generally things happening with undetermined ordering, ie, unpredictably.
There are lots of multi-threading bugs that seem impossible if you believe things have to be simultaneous to go wrong.
causes undefined behavior and crashes.
Undefined Behaviour is not required to cause a crash. If it had to crash, crashing would be behaviour which was ... defined. There are endless questions on here from people who don't understand this, asking why their "undefined behaviour test" didn't crash.
Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
Because of the way they're used. Mutexes are not simply assigned to or read from, like simple variables, but are manipulated with specialized code designed specifically to do this correctly.
You can almost write your own mutex - a spinlock, anyway - just by using std::atomic<int> and a lot of care.
The difference is that a mutex also interacts with your operating system scheduler, and that interface is not portably exposed as part of the language standard. The std::mutex class bundles the OS-specific part up in a class with correct semantics, so you can write portable C++ instead of being limited to, say, POSIX-compatible C++ or Windows-compatible C++.
In your exploration of the VS std::mutex implementation, you ignored the mtx->_get_cs()->lock() part: this is using the Windows Critical Section to interact with the scheduler.
Your implementation is an attempt at a spinlock, which is fine so long as you know the lock is never held for long (or if you don't mind sacrificing a core for each blocked thread). By contrast, the mutex allows a waiting thread to be de-scheduled until the lock is released. This is the part handled by the Critical Section.
You also ignored all the timeout and recursive locking code - which is fine, but your implementation isn't really attempting to do the same thing as the original.
A mutex is specifically made to synchronize code on different threads that work on the same resource. It is designed not have issues when used on multi-threaded. Be aware that when using different mutexes together, you can still deadlocks when taking them in different order on different threads. C++'s std::unique_lock solves this.
A variable is meant to use on either single threads or on synchronized threads because that's how they can be accessed in the fastest way. This has to do with computer architecture (registers, cache, operations in several steps).
To work with variables on non-synchronized threads, you can work with std::atomic variables. That can be faster than synchronizing, but the access is slower and more cumbersome than for normal variables. Some more complex situations with several variables can only be handled in a synchronous way.

Multithreaded C++: force read from memory, bypassing cache

I'm working on a personal hobby-time game engine and I'm working on a multithreaded batch executor. I was originally using a concurrent lockless queue and std::function all over the place to facilitate communication between the master and slave threads, but decided to scrap it in favor of a lighter-weight way of doing things that give me tight control over memory allocation: function pointers and memory pools.
Anyway, I've run into a problem:
The function pointer, no matter what I try, is only getting read correctly by one thread while the others read a null pointer and thus fail an assert.
I'm fairly certain this is a problem with caching. I have confirmed that all threads have the same address for the pointer. I've tried declaring it as volatile, intptr_t, std::atomic, and tried all sorts of casting-fu and the threads all just seem to ignore it and continue reading their cached copies.
I've modeled the master and slave in a model checker to make sure the concurrency is good, and there is no livelock or deadlock (provided that the shared variables all synchronize correctly)
void Executor::operator() (int me) {
while (true) {
printf("Slave %d waiting.\n", me);
{
std::unique_lock<std::mutex> lock(batch.ready_m);
while(!batch.running) batch.ready.wait(lock);
running_threads++;
}
printf("Slave %d running.\n", me);
BatchFunc func = batch.func;
assert(func != nullptr);
int index;
if (batch.store_values) {
while ((index = batch.item.fetch_add(1)) < batch.n_items) {
void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
func(batch.share_data, data);
}
}
else {
while ((index = batch.item.fetch_add(1)) < batch.n_items) {
void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
func(batch.share_data, *data);
}
}
// at least one thread finished, so make sure we won't loop back around
batch.running = false;
if (running_threads.fetch_sub(1) == 1) { // I am the last one
batch.done = true; // therefore all threads are done
batch.complete.notify_all();
}
}
}
void Executor::run_batch() {
assert(!batch.running);
if (batch.func == nullptr || batch.n_items == 0) return;
batch.item.store(0);
batch.running = true;
batch.done = false;
batch.ready.notify_all();
printf("Master waiting.\n");
{
std::unique_lock<std::mutex> lock(batch.complete_m);
while (!batch.done) batch.complete.wait(lock);
}
printf("Master ready.\n");
batch.func = nullptr;
batch.n_items = 0;
}
batch.func is being set by another function
template<typename SharedT, typename ItemT>
void set_batch_job(void(*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
assert(std::is_pod<ItemT>::value || !byValue);
assert(!batch.running);
batch.func = reinterpret_cast<volatile BatchFunc>(func);
memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
batch.store_values = byValue;
if (byValue) {
batch.item_size = sizeof(ItemT);
}
else { // store pointers instead of values
batch.item_size = sizeof(ItemT*);
}
batch.n_items = 0;
}
and here is the struct (and typedef) that it's dealing with
typedef void(*BatchFunc)(const void*, void*);
struct JobBatch {
volatile BatchFunc func;
void* const share_data = operator new(SHARED_DATA_MAXSIZE);
intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new (EXEC_DATA_BUFFER_SIZE));
volatile size_t item_size;
std::atomic<int> item; // Index into the data array
volatile int n_items = 0;
std::condition_variable complete; // slave -> master signal
std::condition_variable ready; // master -> slave signal
std::mutex complete_m;
std::mutex ready_m;
bool store_values = false;
volatile bool running = false; // there is work to do in the batch
volatile bool done = false; // there is no work left to do
JobBatch();
} batch;
How do I make sure that all the necessary reads and writes to batch.func get synchronized properly between threads?
Just in case it matters: I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8GB RAM.
So I did a little reading on the C++ memory model and I managed to hack together a solution using atomic_thread_fence. Everything is probably super broken because I'm crazy and shouldn't roll my own system here, but hey, it's fun to learn!
Basically, whenever you're done writing things that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release)
On the receiving thread(s), you call atomic_thread_fence(std::memory_order_acquire) before reading shared data.
In my case, release should be done immediately before waiting on a condition variable and acquire should be done immediately before using data written by other threads.
This ensures that the writes on one thread are seen by the others.
I'm no expert, so this is probably not the right way to tackle the problem and will likely be faced with certain doom. For instance, I still have a deadlock/livelock problem to sort out.
tl;dr: it's not exactly a cache thing: threads may not have their data totally in sync with each other unless you enforce that with atomic memory fences.

System-wide global variable / semaphore / mutex in C++/Linux?

Is it possible to create a system-wide global variable / semaphore / mutex in C++ on Linux?
Here's the reason: I've got a system that often runs multiple copies of the same software on unrelated data. It's common to have 4 jobs, each running the same software. The software has a small section where it creates a huge graph that takes a lot of memory; outside that section memory usage is moderate.
It so happens sometimes that 2 jobs simultaneously hit the same memory-hungry section and the whole system starts swapping. Thus we want to prevent that by creating something like a critical section mutex between different jobs so that no more than one of them would allocate a lot of memory at a time.
If these were thread of the same job pthread locks would do the job.
What would be a good way to implement such mutex between different jobs?
You can use a named semaphore if you can get all the processes to agree on a common name.
A named semaphore is identified by a name of the form
/somename; that is, a null-terminated string of up to
NAME_MAX-4 (i.e., 251) characters consisting of an initial
slash, followed by one or more characters, none of which are
slashes. Two processes can operate on the same named
semaphore by passing the same name to sem_open(3).
For interprocess mutual exclusion, you can use file locking. With linux, the code is as simple as protecting the critical section with a call to flock.
int fd_lock = open(LOCK_FILE, O_CREAT);
flock(fd_lock, LOCK_EX);
// do stuff
flock(fd_lock, LOCK_UN);
If you need POSIX compatibility, you can use fcntl.
You can make C++ mutexes work across process boundaries on Linux. However, there's some black magic involved which makes it less appropriate for production code.
Explanation:
The standard library's std::mutex and std::shared_mutex use pthread's struct pthread_mutex_s and pthread_rwlock_t under the hood. The native_handle() method returns a pointer to one of these structures.
The drawback is that certain details are abstracted out of the standard library and defaulted in the implementation. For example, std::shared_mutex creates its underlying pthread_rwlock_t structure by passing NULL as the second parameter to pthread_rwlock_init(). This is supposed to be a pointer to a pthread_rwlockattr_t structure containing an attribute which determines sharing policy.
public:
__shared_mutex_pthread()
{
int __ret = pthread_rwlock_init(&_M_rwlock, NULL);
...
In theory, it should receive default attributes. According to the man pages for pthread_rwlockattr_getpshared():
The default value of the process-shared attribute is PTHREAD_PROCESS_PRIVATE.
That said, both std::shared_mutex and std::mutex work across processes anyway. I'm using Clang 6.0.1 (x86_64-unknown-linux-gnu / POSIX thread model). Here's a description of what I did to check:
Create a shared memory region with shm_open.
Check the size of the region with fstat to determine ownership. If .st_size is zero, then ftruncate() it and the caller knows that it is the region's creating process.
Call mmap on it.
The creator process uses placement-new to construct a std::mutex or std::shared_mutex object within the shared region.
Later processes use reinterpret_cast<>() to obtain a typed pointer to the same object.
The processes now loop on calling trylock() and unlock() at intervals. You can see them blocking one another using printf() before and after trylock() and before unlock().
Extra detail: I was interested in whether the c++ headers or the pthreads implementation were at fault, so I dug into pthread_rwlock_arch_t. You'll find a __shared attribute which is zero and a __flags attribute which is also zero for the field denoted by __PTHREAD_RWLOCK_INT_FLAGS_SHARED. So it seems that by default this structure is not intended to be shared, though it seems to provide this facility anyway (as of July 2019).
Summary
It seems to work, though somewhat by chance. I would advise caution in writing production software that works contrary to documentation.
I looked at using the shared-pthread-mutex solution but didn't like the logic race in it. So I wrote a class to do this using the atomic builtins
#include <string>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
using std::string;
//from the command line - "ls /dev/shm" and "lsof /dev/shm/<name>" to see which process ID has access to it
template<typename PAYLOAD>
class InterprocessSharedVariable
{
protected:
int mSharedMemHandle;
string const mSharedMemoryName;
bool mOpenedMemory;
bool mHaveLock;
pid_t mPID;
// this is the shared memory structure
typedef struct
{
pid_t mutex;
PAYLOAD payload;
}
tsSharedPayload;
tsSharedPayload* mSharedData;
bool openSharedMem()
{
mPID = getpid();
// The following caters for the shared mem being created by root but opened by non-root,
// giving the shared-memory 777 permissions.
int openFlags = O_CREAT | O_RDWR;
int shareMode = S_IRWXU | S_IRWXG | S_IRWXO;
// see https://stackoverflow.com/questions/11909505/posix-shared-memory-and-semaphores-permissions-set-incorrectly-by-open-calls
// store old
mode_t old_umask = umask(0);
mSharedMemHandle = shm_open (mSharedMemoryName.c_str(), openFlags, shareMode);
// restore old
umask(old_umask);
if (mSharedMemHandle < 0)
{
std::cerr << "failed to open shared memory" << std::endl;
return false;
}
if (-1 == ftruncate(mSharedMemHandle, sizeof(tsSharedPayload)))
{
std::cerr << "failed to resize shared memory" << std::endl;
return false;
}
mSharedData = (tsSharedPayload*) mmap (NULL,
sizeof(tsSharedPayload),
PROT_READ | PROT_WRITE,
MAP_SHARED,
mSharedMemHandle,
0);
if (MAP_FAILED == mSharedData)
{
std::cerr << "failed to map shared memory" << std::endl;
return false;
}
return true;
}
void closeSharedMem()
{
if (mSharedMemHandle > 0)
{
mSharedMemHandle = 0;
shm_unlink (mSharedMemoryName.c_str());
}
}
public:
InterprocessSharedVariable () = delete;
InterprocessSharedVariable (string const&& sharedMemoryName) : mSharedMemoryName(sharedMemoryName)
{
mSharedMemHandle = 0;
mOpenedMemory = false;
mHaveLock = false;
mPID = 0;
}
virtual ~InterprocessSharedVariable ()
{
releaseSharedVariable ();
closeSharedMem ();
}
// no copying
InterprocessSharedVariable (InterprocessSharedVariable const&) = delete;
InterprocessSharedVariable& operator= (InterprocessSharedVariable const&) = delete;
bool tryLockSharedVariable (pid_t& ownerProcessID)
{
// Double-checked locking. See if a process has already grabbed the mutex. Note the process could be dead
__atomic_load (&mSharedData->mutex, &ownerProcessID, __ATOMIC_SEQ_CST);
if (0 != ownerProcessID)
{
// It is possible that we have started with the same PID as a previous process that terminated abnormally
if (ownerProcessID == mPID)
{
// ... in which case, we already "have ownership"
return (true);
}
// Another process may have the mutex. Check whether it is alive.
// We are specifically looking for an error returned with ESRCH
// Note that if the other process is owned by root, "kill 0" may return a permissions error (which indicates the process is running!)
int processCheckResult = kill (ownerProcessID, 0);
if ((0 == processCheckResult) || (ESRCH != errno))
{
// another process owns the shared memory and is running
return (false);
}
// Here: The other process does not exist ((0 != processCheckResult) && (ESRCH == errno))
// We could assume here that we can now take ownership, but be proper and fall into the compare-exchange
ownerProcessID = 0;
}
// It's possible that another process has snuck in here and taken ownership of the shared memory.
// If that has happened, the exchange will "fail" (and the existing PID is stored in ownerProcessID)
// ownerProcessID == 0 -> representing the "expected" value
mHaveLock = __atomic_compare_exchange_n (&mSharedData->mutex,
&ownerProcessID, //"expected"
mPID, //"desired"
false, //"weak"
__ATOMIC_SEQ_CST, //"success-memorder"
__ATOMIC_SEQ_CST); //"fail-memorder"
return (mHaveLock);
}
bool acquireSharedVariable (bool& failed, pid_t& ownerProcessID)
{
if (!mOpenedMemory)
{
mOpenedMemory = openSharedMem ();
if (!mOpenedMemory)
{
ownerProcessID = 0;
failed = true;
return false;
}
}
// infrastructure is working
failed = false;
bool gotLock = tryLockSharedVariable (ownerProcessID);
return (gotLock);
}
void releaseSharedVariable ()
{
if (mHaveLock)
{
__atomic_store_n (&mSharedData->mutex, 0, __ATOMIC_SEQ_CST);
mHaveLock = false;
}
}
};
Example usage - here we are simply using it to ensure that only one instance of the application runs.
int main(int argc, char *argv[])
{
typedef struct { } tsEmpty;
InterprocessSharedVariable<tsEmpty> programMutex ("/run-once");
bool memOpenFailed;
pid_t ownerProcessID;
if (!programMutex.acquireSharedVariable (memOpenFailed, ownerProcessID))
{
if (memOpenFailed)
{
std::cerr << "Failed to open shared memory" << std::endl;
}
else
{
std::cerr << "Program already running - process ID " << ownerProcessID << std::endl;
}
return -1;
}
... do stuff ...
return 0;
}
Mutual exclusion locks (mutexes) prevent multiple threads from simultaneously executing critical sections of code that access shared data (that is, mutexes are used to serialize the execution of threads). All mutexes must be global. A successful call for a mutex lock by way of mutex_lock() will cause another thread that is also trying to lock the same mutex to block until the owner thread unlocks it by way of mutex_unlock(). Threads within the same process or within other processes can share mutexes.
Mutexes can synchronize threads within the same process or in other processes. Mutexes can be used to synchronize threads between processes if the mutexes are allocated in writable memory and shared among the cooperating processes (see mmap(2)), and have been initialized for this task.
For inter-process synchronization, a mutex needs to be allocated in memory shared between these processes. Since the memory for such a mutex must be allocated dynamically, the mutex needs to be explicitly initialized using mutex_init().
also, for inter-process synchronization, besides the requirement to be allocated in shared memory, the mutexes must also use the attribute PTHREAD_PROCESS_SHARED, otherwise accessing the mutex from another process than its creator results in undefined behaviour (see this: linux.die.net/man/3/pthread_mutexattr_setpshared): "The process-shared attribute is set to PTHREAD_PROCESS_SHARED to permit a mutex to be operated upon by any thread that has access to the memory where the mutex is allocated, even if the mutex is allocated in memory that is shared by multiple processes."

Threading issue: Control is not coming back to the caller

I am facing an issue with respect to shared resources and I am using one mutex for synchronization. It is working fine with small number of threads [ example 10 threads], but I have an issue of “control is not coming back” (might be because of deadlock)if I try with with more number of threads[example 60 threads].
Note: The code is a legacy code and is written in VC6, and I am maintaining the code.
Explanation
I have global data to share between multiple device, for that I am using lock and unlock functions as below...
inline LONG SharedData::Lock()
{
return WaitForSingleObject(m_hMutex, INFINITE );
}
inline BOOL SharedData::Unlock()
{
return ReleaseMutex(m_hMutex);
}
I am suspecting the destructor is causing some issue, below is the destructor...
SharedData::~ SharedData()
{
Lock();
try
{
m_lShareCnt--;
if (m_lShareCnt < 1)
{
//clearing the heap
}
}
Catch(…) { }
Unlock();
if (!m_lShareCnt)
{
if(m_hMutex != NULL && m_hMutex != INVALID_HANDLE_VALUE )
{
CloseHandle(m_hMutex);
m_hMutex=NULL;
}
}
return;
}
And the constructor as follows
SharedData:: SharedData ()
{
try
{
if (!m_hMutex) m_hMutex = CreateMutex(NULL, FALSE, NULL);
m_lShareCnt++;
}
}
Can anybody tell what might be wrong in the code?
I think one problem is race condition on reference counter m_lShareCnt which is is not atomic. If non-atomic variable is modified concurrently by multiple threads then it's value might be unexpected.
The solution would be to make the reference counter atomic, or just protect the access to this variable with the mutex that you already use. Since you use ancient VC6, the easiest you can do is to use InterlockedIncrement and InterlockedDecrement methods for atomic read/write.

How would you implement your own reader/writer lock in C++11?

I have a set of data structures I need to protect with a readers/writer lock. I am aware of boost::shared_lock, but I would like to have a custom implementation using std::mutex, std::condition_variable and/or std::atomic so that I can better understand how it works (and tweak it later).
Each data structure (moveable, but not copyable) will inherit from a class called Commons which encapsulates the locking. I'd like the public interface to look something like this:
class Commons {
public:
void read_lock();
bool try_read_lock();
void read_unlock();
void write_lock();
bool try_write_lock();
void write_unlock();
};
...so that it can be publicly inherited by some:
class DataStructure : public Commons {};
I'm writing scientific code and can generally avoid data races; this lock is mostly a safeguard against the mistakes I'll probably make later. Thus my priority is low read overhead so I don't hamper a correctly-running program too much. Each thread will probably run on its own CPU core.
Could you please show me (pseudocode is ok) a readers/writer lock? What I have now is supposed to be the variant that prevents writer starvation. My main problem so far has been the gap in read_lock between checking if a read is safe to actually incrementing a reader count, after which write_lock knows to wait.
void Commons::write_lock() {
write_mutex.lock();
reading_mode.store(false);
while(readers.load() > 0) {}
}
void Commons::try_read_lock() {
if(reading_mode.load()) {
//if another thread calls write_lock here, bad things can happen
++readers;
return true;
} else return false;
}
I'm kind of new to multithreading, and I'd really like to understand it. Thanks in advance for your help!
Here's pseudo-code for a ver simply reader/writer lock using a mutex and a condition variable. The mutex API should be self-explanatory. Condition variables are assumed to have a member wait(Mutex&) which (atomically!) drops the mutex and waits for the condition to be signaled. The condition is signaled with either signal() which wakes up one waiter, or signal_all() which wakes up all waiters.
read_lock() {
mutex.lock();
while (writer)
unlocked.wait(mutex);
readers++;
mutex.unlock();
}
read_unlock() {
mutex.lock();
readers--;
if (readers == 0)
unlocked.signal_all();
mutex.unlock();
}
write_lock() {
mutex.lock();
while (writer || (readers > 0))
unlocked.wait(mutex);
writer = true;
mutex.unlock();
}
write_unlock() {
mutex.lock();
writer = false;
unlocked.signal_all();
mutex.unlock();
}
That implementation has quite a few drawbacks, though.
Wakes up all waiters whenever the lock becomes available
If most of the waiters are waiting for a write lock, this is wastefull - most waiters will fail to acquire the lock, after all, and resume waiting. Simply using signal() doesn't work, because you do want to wake up everyone waiting for a read lock unlocking. So to fix that, you need separate condition variables for readability and writability.
No fairness. Readers starve writers
You can fix that by tracking the number of pending read and write locks, and either stop acquiring read locks once there a pending write locks (though you'll then starve readers!), or randomly waking up either all readers or one writer (assuming you use separate condition variable, see section above).
Locks aren't dealt out in the order they are requested
To guarantee this, you'll need a real wait queue. You could e.g. create one condition variable for each waiter, and signal all readers or a single writer, both at the head of the queue, after releasing the lock.
Even pure read workloads cause contention due to the mutex
This one is hard to fix. One way is to use atomic instructions to acquire read or write locks (usually compare-and-exchange). If the acquisition fails because the lock is taken, you'll have to fall back to the mutex. Doing that correctly is quite hard, though. Plus, there'll still be contention - atomic instructions are far from free, especially on machines with lots of cores.
Conclusion
Implementing synchronization primitives correctly is hard. Implementing efficient and fair synchronization primitives is even harder. And it hardly ever pays off. pthreads on linux, e.g. contains a reader/writer lock which uses a combination of futexes and atomic instructions, and which thus probably outperforms anything you can come up with in a few days of work.
Check this class:
//
// Multi-reader Single-writer concurrency base class for Win32
//
// (c) 1999-2003 by Glenn Slayden (glenn#glennslayden.com)
//
//
#include "windows.h"
class MultiReaderSingleWriter
{
private:
CRITICAL_SECTION m_csWrite;
CRITICAL_SECTION m_csReaderCount;
long m_cReaders;
HANDLE m_hevReadersCleared;
public:
MultiReaderSingleWriter()
{
m_cReaders = 0;
InitializeCriticalSection(&m_csWrite);
InitializeCriticalSection(&m_csReaderCount);
m_hevReadersCleared = CreateEvent(NULL,TRUE,TRUE,NULL);
}
~MultiReaderSingleWriter()
{
WaitForSingleObject(m_hevReadersCleared,INFINITE);
CloseHandle(m_hevReadersCleared);
DeleteCriticalSection(&m_csWrite);
DeleteCriticalSection(&m_csReaderCount);
}
void EnterReader(void)
{
EnterCriticalSection(&m_csWrite);
EnterCriticalSection(&m_csReaderCount);
if (++m_cReaders == 1)
ResetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
LeaveCriticalSection(&m_csWrite);
}
void LeaveReader(void)
{
EnterCriticalSection(&m_csReaderCount);
if (--m_cReaders == 0)
SetEvent(m_hevReadersCleared);
LeaveCriticalSection(&m_csReaderCount);
}
void EnterWriter(void)
{
EnterCriticalSection(&m_csWrite);
WaitForSingleObject(m_hevReadersCleared,INFINITE);
}
void LeaveWriter(void)
{
LeaveCriticalSection(&m_csWrite);
}
};
I didn't have a chance to try it, but the code looks OK.
You can implement a Readers-Writers lock following the exact Wikipedia algorithm from here (I wrote it):
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
int g_sharedData = 0;
int g_readersWaiting = 0;
std::mutex mu;
bool g_writerWaiting = false;
std::condition_variable cond;
void reader(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
++g_readersWaiting;
// reading
std::cout << "\n reader #" << i << " is reading data = " << g_sharedData << '\n';
// end reading
--g_readersWaiting;
while(g_readersWaiting > 0)
cond.wait(lg);
cond.notify_one();
}
void writer(int i)
{
std::unique_lock<std::mutex> lg{mu};
while(g_writerWaiting)
cond.wait(lg);
// writing
std::cout << "\n writer #" << i << " is writing\n";
g_sharedData += i * 10;
// end writing
g_writerWaiting = true;
while(g_readersWaiting > 0)
cond.wait(lg);
g_writerWaiting = false;
cond.notify_all();
}//lg.unlock()
int main()
{
std::thread reader1{reader, 1};
std::thread reader2{reader, 2};
std::thread reader3{reader, 3};
std::thread reader4{reader, 4};
std::thread writer1{writer, 1};
std::thread writer2{writer, 2};
std::thread writer3{writer, 3};
std::thread writer4{reader, 4};
reader1.join();
reader2.join();
reader3.join();
reader4.join();
writer1.join();
writer2.join();
writer3.join();
writer4.join();
return(0);
}
I believe this is what you are looking for:
class Commons {
std::mutex write_m_;
std::atomic<unsigned int> readers_;
public:
Commons() : readers_(0) {
}
void read_lock() {
write_m_.lock();
++readers_;
write_m_.unlock();
}
bool try_read_lock() {
if (write_m_.try_lock()) {
++readers_;
write_m_.unlock();
return true;
}
return false;
}
// Note: unlock without holding a lock is Undefined Behavior!
void read_unlock() {
--readers_;
}
// Note: This implementation uses a busy wait to make other functions more efficient.
// Consider using try_write_lock instead! and note that the number of readers can be accessed using readers()
void write_lock() {
while (readers_) {}
if (!write_m_.try_lock())
write_lock();
}
bool try_write_lock() {
if (!readers_)
return write_m_.try_lock();
return false;
}
// Note: unlock without holding a lock is Undefined Behavior!
void write_unlock() {
write_m_.unlock();
}
int readers() {
return readers_;
}
};
For the record since C++17 we have std::shared_mutex, see: https://en.cppreference.com/w/cpp/thread/shared_mutex