Why would one want multiple locks within the same data structure?

I've been studying some existing C code for a device driver I want to write. This is my first time dealing with concurrency and locking, and also the first time I've dealt with anything hardware-related like device drivers. I found a piece of code that seems to use multiple locks for the same data structure. Why would anyone include two different locking primitives (both a spinlock and a mutex) within the same data structure, instead of a single lock that protects the entire structure?
struct wiimote_state {
    spinlock_t lock;
    __u32 flags;
    __u8 accel_split[2];
    __u8 drm;
    __u8 devtype;
    __u8 exttype;
    __u8 mp;

    /* synchronous cmd requests */
    struct mutex sync;
    struct completion ready;
    int cmd;
    __u32 opt;

    /* results of synchronous requests */
    __u8 cmd_battery;
    __u8 cmd_err;
    __u8 *cmd_read_buf;
    __u8 cmd_read_size;

    /* calibration/cache data */
    __u16 calib_bboard[4][3];
    __s16 calib_pro_sticks[4];
    __u8 pressure_drums[7];
    __u8 cache_rumble;
};
An example of a code snippet using these multiple locks:
static void wiimote_init_detect(struct wiimote_data *wdata)
{
    __u8 exttype = WIIMOTE_EXT_NONE, extdata[6];
    bool ext;
    int ret;

    wiimote_cmd_acquire_noint(wdata);

    spin_lock_irq(&wdata->state.lock);
    wdata->state.devtype = WIIMOTE_DEV_UNKNOWN;
    wiimote_cmd_set(wdata, WIIPROTO_REQ_SREQ, 0);
    wiiproto_req_status(wdata);
    spin_unlock_irq(&wdata->state.lock);

    ret = wiimote_cmd_wait_noint(wdata);
    if (ret)
        goto out_release;

    spin_lock_irq(&wdata->state.lock);
    ext = wdata->state.flags & WIIPROTO_FLAG_EXT_PLUGGED;
    spin_unlock_irq(&wdata->state.lock);

    if (!ext)
        goto out_release;

    wiimote_cmd_init_ext(wdata);
    exttype = wiimote_cmd_read_ext(wdata, extdata);

out_release:
    wiimote_cmd_release(wdata);
    wiimote_init_set_type(wdata, exttype);

    /* schedule MP timer */
    spin_lock_irq(&wdata->state.lock);
    if (!(wdata->state.flags & WIIPROTO_FLAG_BUILTIN_MP) &&
        !(wdata->state.flags & WIIPROTO_FLAG_NO_MP))
        mod_timer(&wdata->timer, jiffies + HZ * 4);
    spin_unlock_irq(&wdata->state.lock);
}
static inline void wiimote_cmd_acquire_noint(struct wiimote_data *wdata)
{
    mutex_lock(&wdata->state.sync);
}

static inline void wiimote_cmd_release(struct wiimote_data *wdata)
{
    mutex_unlock(&wdata->state.sync);
}

There are a couple of things going on here.
First, don't think of the spinlock and the mutex as necessarily being "for" the data structure just because they are "in" the data structure. In this case the spinlock does appear to be what you describe, a lock guarding access to the structure's fields. The mutex, however, is a lock for the action of commanding the wiimote: it serializes the whole multi-step synchronous command sequence.
Next, spinlocks and mutexes are quite different. A thread that fails to acquire a mutex blocks: it is put to sleep until the lock is released, so a mutex may only be taken in contexts where sleeping is allowed. A thread that fails to acquire a spinlock busy-waits, polling until the lock becomes available, so a spinlock should only be held for very short critical sections.
Finally, although that's not what I see here, a module could contain both spinlocks and mutexes serving a similar function if they were implemented at different times by different developers.
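To make the distinction concrete, here is a rough user-space analogy of the same two-lock pattern in portable C++ (this is an illustration, not the kernel API; the Device type and its members are invented for the example): a cheap spinlock guards the small state fields that are touched often, while a mutex serializes the whole multi-step command sequence, which may sleep.

#include <atomic>
#include <mutex>

// Hypothetical device wrapper; the names here are invented for the example.
struct Device {
    std::atomic<bool> state_locked{false}; // spinlock guarding small state fields
    unsigned flags = 0;                    // ...the state it protects
    std::mutex cmd_mutex;                  // serializes whole command sequences

    void set_flag(unsigned f) {
        // Spinlock: busy-wait, cheap when the critical section is a few instructions.
        while (state_locked.exchange(true, std::memory_order_acquire))
            ; // spin
        flags |= f;
        state_locked.store(false, std::memory_order_release);
    }

    void run_command() {
        // Mutex: the thread sleeps if another command is already in flight,
        // which is fine because a command spans many steps and may block.
        std::lock_guard<std::mutex> guard(cmd_mutex);
        // ... issue request, wait for completion, read back results ...
        // set_flag() may still be called many times while this runs.
    }
};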

Related

Libcurl - CURLSHOPT_LOCKFUNC - how to use it?

Could someone explain the nuances of using the CURLSHOPT_LOCKFUNC option?
I found an example here.
I can't quite understand it: is an array of mutexes used to lock access to the data?
Here is the relevant part of the code from the example:
static pthread_mutex_t lockarray[NUM_LOCKS];
Is this an array of mutexes?
//.....
static void lock_cb(CURL *handle, curl_lock_data data,
                    curl_lock_access access, void *userptr)
{
    (void)access;
    (void)userptr;
    (void)handle;
    pthread_mutex_lock(&lockarray[data]);
}
So, from this section of code, can I conclude that different mutexes are used for CURLSHOPT_LOCKFUNC locking? I don't understand how that works. Isn't just one mutex needed to synchronize the threads' access to the data?
And this part of the code is also not clear to me:
pthread_mutex_lock(&lockarray[data]);
What kind of variable is data?
The documentation (https://curl.se/libcurl/c/CURLSHOPT_LOCKFUNC.html) says:
The data argument tells what kind of data libcurl wants to lock. Make sure that the callback uses a different lock for each kind of data.
What does "what kind of data libcurl wants to lock" mean? I can't understand.
Yes, the code is using an array of mutexes.
curl_lock_data is an enum that is defined in curl.h and specifies the different types of data that curl uses locks for:
/* Different data locks for a single share */
typedef enum {
    CURL_LOCK_DATA_NONE = 0,
    /* CURL_LOCK_DATA_SHARE is used internally to say that
     * the locking is just made to change the internal state of the share
     * itself.
     */
    CURL_LOCK_DATA_SHARE,
    CURL_LOCK_DATA_COOKIE,
    CURL_LOCK_DATA_DNS,
    CURL_LOCK_DATA_SSL_SESSION,
    CURL_LOCK_DATA_CONNECT,
    CURL_LOCK_DATA_LAST
} curl_lock_data;
Since the values are sequential, the code is using them as indexes into the array.
So, for example, when multiple threads are accessing shared cookie data, they will lock/unlock the CURL_LOCK_DATA_COOKIE mutex.
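For context, here is roughly how such callbacks get wired into a share handle with the libcurl share API (a minimal sketch with error checking omitted; it assumes the lock_cb/unlock_cb callbacks from the example):

CURLSH *share = curl_share_init();
curl_share_setopt(share, CURLSHOPT_LOCKFUNC, lock_cb);
curl_share_setopt(share, CURLSHOPT_UNLOCKFUNC, unlock_cb);
// Declare which kinds of data the handles will share (and thus lock):
curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE);
curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);

// Each easy handle (possibly used from a different thread) then attaches the share:
CURL *curl = curl_easy_init();
curl_easy_setopt(curl, CURLOPT_SHARE, share);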
curl_lock_access is another enum defined in curl.h, which specifies why a given lock is being obtained:
/* Different lock access types */
typedef enum {
    CURL_LOCK_ACCESS_NONE = 0,   /* unspecified action */
    CURL_LOCK_ACCESS_SHARED = 1, /* for read perhaps */
    CURL_LOCK_ACCESS_SINGLE = 2, /* for write perhaps */
    CURL_LOCK_ACCESS_LAST        /* never use */
} curl_lock_access;
The difference between these reasons doesn't matter for a mutex, which has only a notion of exclusive access: only one thread at a time can hold the lock. That is why the code ignores the access parameter.
But other types of locks, for instance an RW (reader/writer) lock, do have a concept of shared access (i.e., multiple threads can hold the lock at the same time) as opposed to exclusive access. When multiple threads want only to read the same shared data, it makes sense for them not to block each other. Only when a thread wants to modify the shared data should other threads be blocked from accessing it, for reading or writing, until the modification is finished.
For example:
const int NUM_LOCKS = CURL_LOCK_DATA_LAST + 1;
static pthread_rwlock_t lockarray[NUM_LOCKS];
// (each rwlock must be initialized before use, e.g. with pthread_rwlock_init in a loop)
...
static void lock_cb(CURL *handle, curl_lock_data data,
                    curl_lock_access access, void *userptr)
{
    (void)handle;
    (void)userptr;
    switch (access) {
        case CURL_LOCK_ACCESS_SHARED:
            pthread_rwlock_rdlock(&lockarray[data]);
            break;
        case CURL_LOCK_ACCESS_SINGLE:
            pthread_rwlock_wrlock(&lockarray[data]);
            break;
        default:
            break;
    }
}

// Note: the unlock callback takes no curl_lock_access parameter.
static void unlock_cb(CURL *handle, curl_lock_data data,
                      void *userptr)
{
    (void)handle;
    (void)userptr;
    pthread_rwlock_unlock(&lockarray[data]);
}

Making processes wait until resource is loaded into shared memory using Boost.Interprocess

Suppose I have n processes with IDs 1 to n. I have a file with lots of data, where each process will only store a disjoint subset of the data. I would like to load and process the file using exactly one process, store the resulting data in a data structure allocated via Boost.Interprocess in shared memory, and then allow any (including the one who loaded the file) process to read from the data.
For this to work, I need to make use of some of the Boost.Interprocess synchronization constructs located here to ensure processes do not try to read the data before it has been loaded. However, I am struggling with this part, likely due to my lack of experience in this area. At the moment, I have process(1) loading the file into shared memory, and I need a way to ensure any given process cannot read the file contents until the load is complete, even if the read might happen arbitrarily long after the loading occurs.
I wanted to try a combination of a mutex and a condition variable, using the notify_all call so that process(1) can signal to the other processes that it is okay to read from the shared memory data. But this seems to have an issue: process(1) might issue the notify_all call before some process(i) has even started waiting on the condition variable.
Any ideas for how to approach this in a reliable manner?
Edit 1
Here is my attempt to clarify my dilemma and express more clearly what I have tried. I have some class that I allocate into a shared memory space using Boost.Interprocess that has a form similar to the below:
namespace bi = boost::interprocess;

class cache {
public:
    cache() = default;
    ~cache() = default;

    void set_process_id(std::size_t ID) { id = ID; }

    void load_file(const std::string& filename) {
        // designated process to load
        // file has ID equal to 0
        if( id == 0 ){
            // lock using the mutex
            bi::scoped_lock<bi::interprocess_mutex> lock(m);
            // do work to process the file and
            // place result in the data variable

            // after processing file, notify all other
            // processes that they can access the data
            load_cond.notify_all();
        }
    }

    void read_into(std::array<double, 100>& data_out) {
        { // wait to read data until load is complete
            // lock using the mutex
            bi::scoped_lock<bi::interprocess_mutex> lock(m);
            load_cond.wait(lock);
        }
        data_out = data;
    }

private:
    size_t id;
    std::array<double, 100> data;
    bi::interprocess_mutex m;
    bi::interprocess_condition load_cond;
};
The above is roughly what I had when I asked the question, but it did not sit well with me: if the read_into method was called after the designated process had already executed the notify_all call, then read_into would be stuck waiting forever. What I just did this morning, which seems to fix this dilemma, is change the class to the following:
namespace bi = boost::interprocess;

class cache {
public:
    cache():load_is_complete(false){}
    ~cache() = default;

    void set_process_id(std::size_t ID) { id = ID; }

    void load_file(const std::string& filename) {
        // designated process to load
        // file has ID equal to 0
        if( id == 0 ){
            // lock using the mutex
            bi::scoped_lock<bi::interprocess_mutex> lock(m);
            // do work to process the file and
            // place result in the data variable

            // after processing file, notify all other
            // processes that they can access the data
            load_is_complete = true;
            load_cond.notify_all();
        }
    }

    void read_into(std::array<double, 100>& data_out) {
        { // wait to read data until load is complete
            // lock using the mutex
            bi::scoped_lock<bi::interprocess_mutex> lock(m);
            // loop rather than a single check: condition waits can wake spuriously
            while( not load_is_complete ){
                load_cond.wait(lock);
            }
        }
        data_out = data;
    }

private:
    size_t id;
    std::array<double, 100> data;
    bool load_is_complete;
    bi::interprocess_mutex m;
    bi::interprocess_condition load_cond;
};
Not sure if the above is the most elegant solution, but I believe it ensures processes cannot access the data in shared memory until it has finished loading, whether they reach the mutex m before the designated process or after the designated process has loaded the file contents. If there is a more elegant way, I would like to know.
The typical way is to use named interprocess mutexes. See e.g. the example(s) in Boris Schäling's "Boost" book, which is freely available, also online: https://theboostcpplibraries.com/boost.interprocess-synchronization
If your segment creation is already suitably synchronized, you can use "unnamed" interprocess mutexes inside your shared segment, which is usually more efficient and avoids polluting system namespaces with extraneous synchronization primitives.
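For reference, the named variant might look roughly like this (a minimal sketch; the mutex name is made up for the example and error handling is omitted):

#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bi = boost::interprocess;

int main() {
    // Any process can open the same mutex by name.
    bi::named_mutex mtx(bi::open_or_create, "my_app_cache_mutex");
    {
        bi::scoped_lock<bi::named_mutex> lock(mtx);
        // ... touch the shared data ...
    }
    // One process should eventually clean it up:
    // bi::named_mutex::remove("my_app_cache_mutex");
    return 0;
}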

How can I implement a C++ Reader-Writer lock using a single unlock method, which can be called by a reader or writer?

I'm working on a project which requires the use of specific OS abstractions, and I need to implement a reader-writer lock using their semaphore and mutex. I currently have a setup in the following format:
class ReadWriteLock
{
public:
    ReadWriteLock(uint32_t maxReaders);
    ~ReadWriteLock();

    uint32_t GetMaxReaders() const;
    eResult GetReadLock(int32_t timeout);
    eResult GetWriteLock(int32_t timeout);
    eResult Unlock();

private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;
};
In this implementation the Unlock method must either unlock the writer and release all reader semaphore slots, or simply release a single reader semaphore slot. However, I am struggling, as I cannot think of an implementation that will work in all cases. How can I make this work in the given setup? I know it is possible, as POSIX managed to implement a universal unlock method in their implementation, but I cannot find any indication of how that was done, so I would appreciate any information people can share.
Note that I cannot use C++11 or other OS primitives.
Well, define two functions, UnlockRead and UnlockWrite.
I believe you do not need both kinds of access (write/read) at the same time in the same place. So what I am proposing is to have a class for managing access, with distinct unlock functions for readers and writers:
class ReadWriteAccess
{
public:
    ReadWriteAccess(uint32_t maxReaders);
    ~ReadWriteAccess();

    uint32_t GetMaxReaders() const;
    eResult GetReadLock(int32_t timeout);
    eResult GetWriteLock(int32_t timeout);
    eResult UnlockWrite();
    eResult UnlockRead();

private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;
};
And have separate classes for read and write locks, using RAII to always stay on the safe side:
class ReadLock
{
public:
    ReadLock(ReadWriteAccess& access, int32_t timeout) : access(access)
    {
        result = access.GetReadLock(timeout);
    }
    eResult getResult() const { return result; }
    ~ReadLock()
    {
        if (result)
            access.UnlockRead();
    }
private:
    ReadWriteAccess& access;
    eResult result;
};
and use them like this:
T someResource;
ReadWriteAccess someResourceGuard(10); // allow e.g. at most 10 concurrent readers

void someFunction()
{
    ReadLock lock(someResourceGuard, 1000); // e.g. a 1000 ms timeout
    if (lock.getResult())
        cout << someResource; // it is safe to read from the resource
}
Of course, you can easily write the very similar WriteLock class yourself; a sketch follows.
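For completeness, here is what that WriteLock counterpart might look like, mirroring ReadLock above (with the same assumption that eResult is testable as a boolean):

class WriteLock
{
public:
    WriteLock(ReadWriteAccess& access, int32_t timeout) : access(access)
    {
        result = access.GetWriteLock(timeout);
    }
    eResult getResult() const { return result; }
    ~WriteLock()
    {
        if (result)
            access.UnlockWrite();
    }
private:
    ReadWriteAccess& access;
    eResult result;
};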
Since the OP insisted in the comments on having "one" Unlock, please consider the drawbacks.
Assume it is implemented with some kind of stack of the most recent Lock calls:
class ReadWriteLock
{
public:
    ReadWriteLock(uint32_t maxReaders);
    ~ReadWriteLock();

    uint32_t GetMaxReaders() const;

    eResult GetReadLock(int32_t timeout)
    {
        eResult result = GetReadLockImpl(timeout);
        if (result)
            lockStack.push(READ);
        return result;
    }
    eResult GetWriteLock(int32_t timeout)
    {
        eResult result = GetWriteLockImpl(timeout);
        if (result)
            lockStack.push(WRITE);
        return result;
    }
    eResult Unlock()
    {
        Mode lockMode = lockStack.top();
        lockStack.pop();
        if (lockMode == READ)
            return UnlockReadImpl();
        else
            return UnlockWriteImpl();
    }

private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;

    enum Mode { READ, WRITE };
    std::stack<Mode> lockStack;
};
But the above would only work in a single-threaded application, and a single-threaded application never needs any locks.
So you have to have a multi-thread-aware stack, something like:
template <typename Value>
class MultiThreadStack
{
public:
    void push(Value value)
    {
        stackPerThread[getThreadId()].push(value);
    }
    Value top()
    {
        return stackPerThread[getThreadId()].top();
    }
    void pop()
    {
        stackPerThread[getThreadId()].pop();
    }
private:
    ThreadId getThreadId() { return /* your system's way to get a thread id */; }
    std::map<ThreadId, std::stack<Value>> stackPerThread;
};
So use this MultiThreadStack, not std::stack, in ReadWriteLock.
But the std::map above would itself need a ReadWriteLock to guard access to it from multiple threads. So, well, either you know all your threads before you start using this stuff (pre-registration), or you end up with the same problem described here. So my advice: if you can, change your design.
When the lock is acquired successfully, the type of access is known: either you have many readers running, or only one writer; you cannot have both readers and writers running under a validly acquired lock.
So it suffices to store the current lock mode when a lock call succeeds, and all following unlock calls (potentially many if read permits were granted, exactly one if the write lock was taken) will be of that mode.
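In other words, no per-thread stack is needed, just one mode field set on a successful lock. A minimal sketch of that idea, reusing the Impl functions from the earlier listing (this illustrates the invariant rather than being a drop-in implementation):

class ReadWriteLock
{
public:
    eResult GetReadLock(int32_t timeout)
    {
        eResult result = GetReadLockImpl(timeout);
        if (result)
            m_Mode = READ;  // safe: every current holder is a reader
        return result;
    }
    eResult GetWriteLock(int32_t timeout)
    {
        eResult result = GetWriteLockImpl(timeout);
        if (result)
            m_Mode = WRITE; // safe: the writer is the only holder
        return result;
    }
    eResult Unlock()
    {
        // Matches whichever mode the currently held lock was taken in.
        return (m_Mode == WRITE) ? UnlockWriteImpl() : UnlockReadImpl();
    }
private:
    enum Mode { READ, WRITE };
    // Concurrent readers all store the same value here; a real
    // implementation would make m_Mode a suitably atomic type.
    Mode m_Mode;
    // GetReadLockImpl / GetWriteLockImpl / UnlockReadImpl / UnlockWriteImpl
    // as in the semaphore-and-mutex implementation above.
};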

Multithreaded C++: force read from memory, bypassing cache

I'm working on a multithreaded batch executor for a personal hobby-time game engine. I was originally using a concurrent lockless queue and std::function all over the place to facilitate communication between the master and slave threads, but decided to scrap it in favor of a lighter-weight approach that gives me tight control over memory allocation: function pointers and memory pools.
Anyway, I've run into a problem:
The function pointer, no matter what I try, is only getting read correctly by one thread while the others read a null pointer and thus fail an assert.
I'm fairly certain this is a problem with caching. I have confirmed that all threads have the same address for the pointer. I've tried declaring it as volatile, intptr_t, std::atomic, and tried all sorts of casting-fu and the threads all just seem to ignore it and continue reading their cached copies.
I've modeled the master and slave in a model checker to make sure the concurrency is good, and there is no livelock or deadlock (provided that the shared variables all synchronize correctly)
void Executor::operator() (int me) {
    while (true) {
        printf("Slave %d waiting.\n", me);
        {
            std::unique_lock<std::mutex> lock(batch.ready_m);
            while(!batch.running) batch.ready.wait(lock);
            running_threads++;
        }
        printf("Slave %d running.\n", me);

        BatchFunc func = batch.func;
        assert(func != nullptr);

        int index;
        if (batch.store_values) {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, data);
            }
        }
        else {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, *data);
            }
        }

        // at least one thread finished, so make sure we won't loop back around
        batch.running = false;

        if (running_threads.fetch_sub(1) == 1) { // I am the last one
            batch.done = true; // therefore all threads are done
            batch.complete.notify_all();
        }
    }
}

void Executor::run_batch() {
    assert(!batch.running);
    if (batch.func == nullptr || batch.n_items == 0) return;

    batch.item.store(0);
    batch.running = true;
    batch.done = false;
    batch.ready.notify_all();

    printf("Master waiting.\n");
    {
        std::unique_lock<std::mutex> lock(batch.complete_m);
        while (!batch.done) batch.complete.wait(lock);
    }
    printf("Master ready.\n");

    batch.func = nullptr;
    batch.n_items = 0;
}
batch.func is being set by another function
template<typename SharedT, typename ItemT>
void set_batch_job(void(*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
    static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
    static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
    assert(std::is_pod<ItemT>::value || !byValue);
    assert(!batch.running);

    batch.func = reinterpret_cast<volatile BatchFunc>(func);
    memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
    batch.store_values = byValue;
    if (byValue) {
        batch.item_size = sizeof(ItemT);
    }
    else { // store pointers instead of values
        batch.item_size = sizeof(ItemT*);
    }
    batch.n_items = 0;
}
and here is the struct (and typedef) that it's dealing with
typedef void(*BatchFunc)(const void*, void*);

struct JobBatch {
    volatile BatchFunc func;
    void* const share_data = operator new(SHARED_DATA_MAXSIZE);
    intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new (EXEC_DATA_BUFFER_SIZE));
    volatile size_t item_size;
    std::atomic<int> item; // Index into the data array
    volatile int n_items = 0;

    std::condition_variable complete; // slave -> master signal
    std::condition_variable ready;    // master -> slave signal
    std::mutex complete_m;
    std::mutex ready_m;

    bool store_values = false;
    volatile bool running = false; // there is work to do in the batch
    volatile bool done = false;    // there is no work left to do

    JobBatch();
} batch;
How do I make sure that all the necessary reads and writes to batch.func get synchronized properly between threads?
Just in case it matters: I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8GB RAM.
So I did a little reading on the C++ memory model and I managed to hack together a solution using atomic_thread_fence. Everything is probably super broken because I'm crazy and shouldn't roll my own system here, but hey, it's fun to learn!
Basically, whenever you're done writing things that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release).
On the receiving thread(s), you call atomic_thread_fence(std::memory_order_acquire) before reading the shared data.
In my case, the release should be done immediately before waiting on a condition variable, and the acquire should be done immediately before using data written by other threads.
This ensures that the writes on one thread are seen by the others.
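One caveat worth adding: a fence on its own does not synchronize anything; release and acquire fences take effect only when paired through some atomic variable that the reader actually observes. A minimal sketch of the pattern (the payload/published names are invented for the illustration):

#include <atomic>

static int payload;                         // plain data, written before publishing
static std::atomic<bool> published{false};  // the atomic the fences pair through

void writer() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // fence before the flag store
    published.store(true, std::memory_order_relaxed);
}

void reader() {
    while (!published.load(std::memory_order_relaxed))    // spin until the flag is seen
        ;
    std::atomic_thread_fence(std::memory_order_acquire);  // fence after the flag load
    int v = payload;                                      // now guaranteed to see 42
    (void)v;
}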
I'm no expert, so this is probably not the right way to tackle the problem and will likely be faced with certain doom. For instance, I still have a deadlock/livelock problem to sort out.
tl;dr: it's not exactly a cache thing: threads may not have their data totally in sync with each other unless you enforce that with atomic memory fences.

Mutex Safety with Interrupts (Embedded Firmware)

Edit: @Mike pointed out that my try_lock function in the code below is unsafe, and that accessor creation can produce a race condition as well. The suggestions (from everyone) have convinced me that I'm going down the wrong path.
Original Question
The requirements for locking on an embedded microcontroller are different enough from multithreading that I haven't been able to convert multithreading examples to my embedded applications. Typically I don't have an OS or threads of any kind, just main and whatever interrupt functions are called by the hardware periodically.
It's pretty common that I need to fill up a buffer from an interrupt, but process it in main. I've created the IrqMutex class below to try to safely implement this. Each person trying to access the buffer is assigned a unique id through IrqMutexAccessor, then they each can try_lock() and unlock(). The idea of a blocking lock() function doesn't work from interrupts because unless you allow the interrupt to complete, no other code can execute so the unlock() code never runs. I do however use a blocking lock from the main() code occasionally.
However, I know that the double-check lock doesn't work without C++11 memory barriers (which aren't available on many embedded platforms). Honestly despite reading quite a bit about it, I don't really understand how/why the memory access reordering can cause a problem. I think that the use of volatile sig_atomic_t (possibly combined with the use of unique IDs) makes this different from the double-check lock. But I'm hoping someone can: confirm that the following code is correct, explain why it isn't safe, or offer a better way to accomplish this.
class IrqMutex {
    friend class IrqMutexAccessor;

private:
    std::sig_atomic_t accessorIdEnum;
    volatile std::sig_atomic_t owner;

protected:
    std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }

    bool have_lock(std::sig_atomic_t accessorId) {
        return (owner == accessorId);
    }

    bool try_lock(std::sig_atomic_t accessorId) {
        // Only try to get a lock, while it isn't already owned.
        while (owner == SIG_ATOMIC_MIN) {
            // <-- If an interrupt occurs here, both attempts can get a lock at the same time.

            // Try to take ownership of this Mutex.
            owner = accessorId; // SET

            // Double check that we are the owner.
            if (owner == accessorId) return true;

            // Someone else must have taken ownership between CHECK and SET.
            // If they released it after CHECK, we'll loop back and try again.
            // Otherwise someone else has a lock and we have failed.
        }

        // This shouldn't happen unless they called try_lock on something they already owned.
        if (owner == accessorId) return true;

        // If someone else owns it, we failed.
        return false;
    }

    bool unlock(std::sig_atomic_t accessorId) {
        // Double check that the owner called this function (not strictly required)
        if (owner == accessorId) {
            owner = SIG_ATOMIC_MIN;
            return true;
        }
        // We still return true if the mutex was unlocked anyway.
        return (owner == SIG_ATOMIC_MIN);
    }

public:
    IrqMutex(void) : accessorIdEnum(SIG_ATOMIC_MIN), owner(SIG_ATOMIC_MIN) {}
};
// This class is used to manage our unique accessorId.
class IrqMutexAccessor {
    friend class IrqMutex;
private:
    IrqMutex& mutex;
    const std::sig_atomic_t accessorId;
public:
    IrqMutexAccessor(IrqMutex& m) : mutex(m), accessorId(m.nextAccessor()) {}
    bool have_lock(void) { return mutex.have_lock(accessorId); }
    bool try_lock(void)  { return mutex.try_lock(accessorId); }
    bool unlock(void)    { return mutex.unlock(accessorId); }
};
Because there is one processor and no threading, the mutex serves what I think is a subtly different purpose than normal. There are two main use cases I run into repeatedly.
The interrupt is a Producer and takes ownership of a free buffer and loads it with a packet of data. The interrupt/Producer may keep its ownership lock for a long time, spanning multiple interrupt calls. The main function is the Consumer and takes ownership of a full buffer when it is ready to process it. The race condition rarely happens, but if the interrupt/Producer finishes with a packet and needs a new buffer while they are all full, it will take the oldest buffer (this is a dropped-packet event). If the main/Consumer started to read and process that oldest buffer at exactly the same time, they would trample all over each other.
The interrupt is just a quick change or increment of something (like a counter). However, if we want to reset the counter or jump to some new value with a call from the main() code, we don't want to write to the counter while it is changing. Here main actually does a blocking loop to obtain a lock; however, I think it's almost impossible to have to wait here for more than two attempts. Once main has a lock, any calls to the counter interrupt will be skipped, but that's generally not a big deal for something like a counter. Then I update the counter value and unlock it so it can start incrementing again.
I realize these two samples are dumbed down a bit, but some version of these patterns occurs in many of the peripherals in every project I work on, and I'd like one piece of reusable code that can safely handle this across various embedded platforms. I included the C tag because all of this is directly convertible to C code, and on some embedded compilers that's all that is available. So I'm trying to find a general method that is guaranteed to work in both C and C++.
struct ExampleCounter {
    volatile long long int value;
    IrqMutex mutex;
} exampleCounter;

struct ExampleBuffer {
    volatile char data[256];
    volatile size_t index;
    IrqMutex mutex; // One mutex per buffer.
} exampleBuffers[2];

const volatile char * const REGISTER;

// This accessor shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutex(exampleCounter.mutex);

void __irqQuickFunction(void) {
    // Obtain a lock, add the data, then unlock, all within one function call.
    if (myMutex.try_lock()) {
        exampleCounter.value++;
        myMutex.unlock();
    } else {
        // If we failed to obtain a lock, we skipped this update this one time.
    }
}

// These accessors shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutexes[2] = {
    IrqMutexAccessor(exampleBuffers[0].mutex),
    IrqMutexAccessor(exampleBuffers[1].mutex)
};
void __irqLongFunction(void) {
    static size_t bufferIndex = 0;

    // Check if we have a lock.
    if (!myMutexes[bufferIndex].have_lock() && !myMutexes[bufferIndex].try_lock()) {
        // If we can't get a lock, try the other buffer.
        bufferIndex = (bufferIndex + 1) % 2;
        // One buffer should always be available, so the next line should always succeed.
        if (!myMutexes[bufferIndex].try_lock()) return;
    }

    // ... at this point we know we have a lock ...

    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    exampleBuffers[bufferIndex].data[exampleBuffers[bufferIndex].index++] = c;

    // We may keep the lock for multiple function calls until the end of packet.
    static const char END_PACKET_SIGNAL = '\0';
    if (c == END_PACKET_SIGNAL) {
        // Unlock this buffer so it can be read from main.
        myMutexes[bufferIndex].unlock();
        // Switch to the other buffer for next time.
        bufferIndex = (bufferIndex + 1) % 2;
    }
}
int main(void) {
    while (true) {
        // Mutex for counter.
        static IrqMutexAccessor myCounterMutex(exampleCounter.mutex);

        // Change counter value.
        if (EVERY_ONCE_IN_A_WHILE) {
            // Skip any updates that occur while we are updating the counter.
            while (!myCounterMutex.try_lock()) {
                // Wait for the interrupt to release its lock.
            }
            // Set the counter to a new value.
            exampleCounter.value = 500;
            // Updates will start again as soon as we unlock it.
            myCounterMutex.unlock();
        }

        // Mutexes for __irqLongFunction.
        static IrqMutexAccessor myBufferMutexes[2] = {
            IrqMutexAccessor(exampleBuffers[0].mutex),
            IrqMutexAccessor(exampleBuffers[1].mutex)
        };

        // Process buffers from __irqLongFunction.
        for (size_t i = 0; i < 2; i++) {
            // Obtain a lock so we can read the data.
            if (!myBufferMutexes[i].try_lock()) continue;
            // Check that the buffer isn't empty.
            if (exampleBuffers[i].index == 0) {
                myBufferMutexes[i].unlock(); // Don't forget to unlock.
                continue;
            }
            // ... read and do something with the data here ...
            exampleBuffers[i].index = 0;
            myBufferMutexes[i].unlock();
        }
    }
}
Also note that I used volatile on any variable that is read or written by the interrupt routine (unless the variable was only accessed from the interrupt, like the static bufferIndex value in __irqLongFunction). I've read that mutexes remove some of the need for volatile in multithreaded code, but I don't think that applies here. Did I use the right amount of volatile? I used it on: ExampleBuffer[].data[256], ExampleBuffer[].index, and ExampleCounter.value.
I apologize for the long answer, but perhaps it is fitting for a long question.
To answer your first question, I would say that your implementation of IrqMutex is not safe. Let me try to explain where I see problems.
Function nextAccessor
std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
This function has a race condition, because the increment operator is not atomic, even on a volatile std::sig_atomic_t (sig_atomic_t only makes individual reads and writes atomic, not read-modify-write sequences). It involves 3 operations: reading the current value of accessorIdEnum, incrementing it, and writing the result back. If two IrqMutexAccessors are created at the same time, it's possible that they both get the same ID.
Function try_lock
The try_lock function also has a race condition. One thread (e.g. main) could enter the while loop, and then, before taking ownership, another thread (e.g. an interrupt) could also enter the while loop and take ownership of the lock (returning true). Then the first thread continues, executes owner = accessorId, and thus "also" takes the lock. So two threads (or your main thread and an interrupt) can try_lock on an unowned mutex at the same time and both return true.
Disabling interrupts by RAII
We can achieve some level of simplicity and encapsulation by using RAII for interrupt disabling, for example the following class:
class InterruptLock {
public:
    InterruptLock() {
        prevInterruptState = currentInterruptState();
        disableInterrupts();
    }
    ~InterruptLock() {
        restoreInterrupts(prevInterruptState);
    }
private:
    int prevInterruptState; // Whatever type this should be for the platform
    InterruptLock(const InterruptLock&); // Not copy-constructible
};
And I would recommend disabling interrupts to get the atomicity you need within the mutex implementation itself. For example something like:
bool try_lock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == SIG_ATOMIC_MIN) {
        owner = accessorId;
        return true;
    }
    return false;
}

bool unlock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == accessorId) {
        owner = SIG_ATOMIC_MIN;
        return true;
    }
    return false;
}
Depending on your platform, this might look different, but you get the idea.
As you said, this provides a way to abstract the disabling and enabling of interrupts away from general code, and encapsulates it in this one class.
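As one concrete possibility, on an ARM Cortex-M target with CMSIS headers, those platform hooks could map onto the PRIMASK register (this is an assumption about the target; other MCUs have their own equivalents):

#include "cmsis_compiler.h" // or the vendor device header that pulls in CMSIS

static inline int currentInterruptState(void) {
    return __get_PRIMASK();            // nonzero if interrupts are currently masked
}
static inline void disableInterrupts(void) {
    __disable_irq();                   // sets PRIMASK, masking interrupts
}
static inline void restoreInterrupts(int prevInterruptState) {
    __set_PRIMASK(prevInterruptState); // restore the previous state, so nesting works
}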
Mutexes and Interrupts
Having said how I would consider implementing the mutex class, I would not actually use a mutex class for your use-cases. As you pointed out, mutexes don't really play well with interrupts, because an interrupt can't "block" on trying to acquire a mutex. For this reason, for code that directly exchanges data with an interrupt, I would instead strongly consider just directly disabling interrupts (for a very short time while the main "thread" touches the data).
So your counter might simply look like this:
volatile long long int exampleCounter;
void __irqQuickFunction(void) {
exampleCounter++;
}
...
// Change counter value
if (EVERY_ONCE_IN_A_WHILE) {
InterruptLock lock;
exampleCounter = 500;
}
In my mind, this is easier to read, easier to reason about, and won't "slip" when there's contention (i.e., miss timer beats).
Regarding the buffer use-case, I would strongly recommend against holding a lock for multiple interrupt cycles. A lock/mutex should be held for just the slightest moment required to "touch" a piece of memory - just long enough to read or write it. Get in, get out.
So this is how the buffering example might look:
struct ExampleBuffer {
    char data[256];
} exampleBuffers[2];

ExampleBuffer* volatile bufferAwaitingConsumption = nullptr;
ExampleBuffer* volatile freeBuffer = &exampleBuffers[1];

const volatile char * const REGISTER;

void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0';
    static size_t index = 0;
    static ExampleBuffer* receiveBuffer = &exampleBuffers[0];

    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    receiveBuffer->data[index++] = c;

    // End of packet?
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        bufferAwaitingConsumption = receiveBuffer;
        // Move on to the next buffer
        receiveBuffer = freeBuffer;
        freeBuffer = nullptr;
        index = 0;
    }
}

int main(void) {
    while (true) {
        // Fetch packet from shared variable
        ExampleBuffer* packet;
        {
            InterruptLock lock;
            packet = bufferAwaitingConsumption;
            bufferAwaitingConsumption = nullptr;
        }

        if (packet) {
            // ... read and do something with the data here ...

            // Once we're done with the buffer, we need to release it back to the producer
            {
                InterruptLock lock;
                freeBuffer = packet;
            }
        }
    }
}
This code is arguably easier to reason about, since there are only two memory locations shared between the interrupt and the main loop: one to pass packets from the interrupt to the main loop, and one to pass empty buffers back to the interrupt. We also only touch those variables under "lock", and only for the minimum time needed to "move" the value. (for simplicity I've skipped over the buffer overflow logic when the main loop takes too long to free the buffer).
It's true that in this case one may not even need the locks, since we're just reading and writing simple values, but the cost of disabling the interrupts is not much, and the risk of making mistakes otherwise is not worth it in my opinion.
Edit
As pointed out in the comments, the above solution was meant to only tackle the multithreading problem, and omitted overflow checking. Here is a more complete solution, which should be robust under overflow conditions:
const size_t BUFFER_COUNT = 2;

struct ExampleBuffer {
    char data[256];
    ExampleBuffer* next;
} exampleBuffers[BUFFER_COUNT];

volatile size_t overflowCount = 0;

class BufferList {
public:
    BufferList() : first(nullptr), last(nullptr) { }

    // Atomic enqueue
    void enqueue(ExampleBuffer* buffer) {
        InterruptLock lock;
        buffer->next = nullptr;
        if (last)
            last->next = buffer;
        else
            first = buffer;
        last = buffer; // the new buffer is always the new tail
    }

    // Atomic dequeue (or returns null)
    ExampleBuffer* dequeueOrNull() {
        InterruptLock lock;
        ExampleBuffer* result = first;
        if (first) {
            first = first->next;
            if (!first)
                last = nullptr;
        }
        return result;
    }

private:
    ExampleBuffer* first;
    ExampleBuffer* last;
} freeBuffers, buffersAwaitingConsumption;
const volatile char * const REGISTER;

void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0';
    static size_t index = 0;
    static ExampleBuffer* receiveBuffer = &exampleBuffers[0];

    // Recovery from overflow?
    if (!receiveBuffer) {
        // Try to get another free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        // Still no buffer?
        if (!receiveBuffer) {
            overflowCount++;
            return;
        }
    }

    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    if (index < sizeof(receiveBuffer->data))
        receiveBuffer->data[index++] = c;

    // End of packet?
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        buffersAwaitingConsumption.enqueue(receiveBuffer);
        // Move on to the next free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        index = 0;
    }
}

size_t getAndResetOverflowCount() {
    InterruptLock lock;
    size_t result = overflowCount;
    overflowCount = 0;
    return result;
}
int main(void) {
    // All buffers are free at the start
    for (size_t i = 0; i < BUFFER_COUNT; i++)
        freeBuffers.enqueue(&exampleBuffers[i]);

    while (true) {
        // Fetch packet from the shared queue
        ExampleBuffer* packet = buffersAwaitingConsumption.dequeueOrNull();
        if (packet) {
            // ... read and do something with the data here ...

            // Once we're done with the buffer, we need to release it back to the producer
            freeBuffers.enqueue(packet);
        }

        size_t overflowBytes = getAndResetOverflowCount();
        if (overflowBytes) {
            // ...
        }
    }
}
The key changes:
If the interrupt runs out of free buffers, it will recover
If the interrupt receives data while it doesn't have a receive buffer, it will communicate that to the main thread via getAndResetOverflowCount
If you keep getting buffer overflows, you can simply increase the buffer count
I've encapsulated the multithreaded access into a queue class implemented as a linked list (BufferList), which supports atomic dequeue and enqueue. The previous example also used queues, but of length 0-1 (either an item is enqueued or it isn't), and so the implementation of the queue was just a single variable. In the case of running out of free buffers, the receive queue could have 2 items, so I upgraded it to a proper queue rather than adding more shared variables.
If the interrupt is the producer and the mainline code is the consumer, surely it's as simple as disabling the interrupt for the duration of the consume operation?
That's how I used to do it in my embedded microcontroller days.
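Using the InterruptLock helper from the answer above, that approach reduces to something like this (a sketch; consumeInMain is an invented name, and the masked section should be kept as short as possible):

void consumeInMain(void) {
    InterruptLock lock; // the producer interrupt cannot fire in here
    // ... copy the buffer out, or mark it consumed ...
}                       // interrupts restored when lock goes out of scope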