I know there is a lot of confusion regarding volatile, so here are three real-life examples where I'm not sure about the correct usage of volatile.
1) DMA Stream
The hardware writes directly to data with DMA.
Is volatile needed in this span?
#include <cstdint>
#include <semaphore>
#include <span>
static std::binary_semaphore semaphore{0};
//DMA Interrupt after complete receive
extern "C" void DMAcomplete() {
semaphore.release();
}
void readFromDMA(std::span<volatile uint8_t> data) {
//Modify DMA register and start DMA
//... = data.data();
//wait for DMA to finish
semaphore.acquire();
}
2) ISR read
This example is similar to the first one, but now the ISR is actually manipulating the data.
Is volatile needed in the span?
#include <cstdint>
#include <semaphore>
#include <span>
#include <atomic>
static std::binary_semaphore semaphore{0};
static std::atomic<volatile uint8_t*> data;
static std::atomic_size_t size;
static std::atomic_size_t index;
//Interrupt is called per byte
extern "C" void ISRperByte() {
uint8_t receivedData;
data[index++] = receivedData;
//Receive complete
if(index >= size-1)
semaphore.release();
}
void readFromISR(std::span<volatile uint8_t> toRead) {
data = toRead.data();
size = toRead.size();
index = 0;
//Enable Interrupt etc.
//Wait until all reads are done
semaphore.acquire();
}
3) ISR Callback
Does ICallback* have to be volatile?
#include <atomic>
class ICallback {
public:
    virtual ~ICallback() = default;
    virtual void doStuff() volatile = 0;
};
static std::atomic<volatile ICallback*> atomicCallback = nullptr;
//Interrupt is called by hardware
extern "C" void ISR() {
    auto cb = atomicCallback.load();
    if(cb)
        cb->doStuff();
}
void setCallback(ICallback& cb) {
    atomicCallback = &cb;
}
void resetCallback() {
    atomicCallback = nullptr;
}
EDIT:
Here is a snippet of case 1 using C:
#include <stdint.h>
#include <stdbool.h>
//Assume assignment to this is atomic
static volatile bool semaphore;
//DMA Interrupt after complete receive
void DMAcomplete() {
    //notify
    semaphore = true;
}
void readFromDMA(volatile uint8_t* data, uint32_t size) {
    semaphore = false;
    //Modify DMA register and start DMA
    //... = data;
    //wait for DMA to finish
    while(!semaphore);
}
EDIT2:
If I call readFromDMA or readFromISR with non-volatile data, is it still valid after the function returns? Since the caller doesn't declare its data as volatile, but the data is changed during the DMA/ISR, it seems a little suspicious.
Our compiler says this about volatile-declared objects:
● All accesses are preserved
● All accesses are complete, that is, the whole object is accessed
● All accesses are performed in the same order as given in the abstract machine
● All accesses are atomic, that is, they cannot be interrupted.
The compiler adheres to these rules for accesses to all 8-, 16-, and 32-bit scalar types.
For all combinations of object types not listed, only the rule that states that all accesses
are preserved applies.
So for the examples the following would be correct:
DMA
Volatile is not needed in readFromDMA.
BUT the data which is passed to this function has to be volatile, for example std::vector<volatile uint8_t>.
ISR read
In this example the volatile in data is indeed needed, because the ISR manipulates it. Additionally, as in example 1, volatile is also needed in the declaration of the buffer, for example std::vector<volatile uint8_t>.
ISR Callback
ICallback* does not need to be volatile, BUT the object implementing doStuff has to be aware that this function is called from an ISR, so it may need some volatile member.
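For illustration, here is a minimal sketch of such an implementation; the class name and counter member are hypothetical, and std::atomic is used instead of a plain volatile member, since a read-modify-write like ++ on a volatile int is not atomic:
#include <atomic>
#include <cstddef>

class IsrEventCounter : public ICallback {
public:
    void doStuff() volatile override {
        ++eventCount; // atomic increment, safe against concurrent reads from main
    }
    std::size_t count() const { return eventCount.load(); }
private:
    std::atomic<std::size_t> eventCount{0};
};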
Related
In this program, why is the destructor on line 14 called twice for the same instance of mystruct_t?
I'm assuming that all pointer manipulation in this program is thread-safe. I think the atomic updates do not work on my system or compiler.
I tried this on MSVC 2017, MSVC 2019, and on clang.
/* This crashes for me (line 19) */
#include <iostream>
#include <vector>
#include <thread>
#include <memory>
#include <chrono>
#include <assert.h>
struct mystruct_t {
    int32_t nInvocation = 0;
    ~mystruct_t();
    mystruct_t() = default;
};
mystruct_t::~mystruct_t() {
    nInvocation++;
    int nInvoke = nInvocation;
    if (nInvoke > 1) {
        /* destructor was invoked twice */
        assert(0);
    }
    /* sleep is not necessary for crash */
    //std::this_thread::sleep_for(std::chrono::microseconds(525));
}
std::shared_ptr<mystruct_t> globalPtr;
void thread1() {
    for (;;) {
        std::this_thread::sleep_for(std::chrono::microseconds(1000));
        std::shared_ptr<mystruct_t> ptrNewInstance = std::make_shared<mystruct_t>();
        globalPtr = ptrNewInstance;
    }
}
void thread2() {
    for (;;) {
        std::shared_ptr<mystruct_t> pointerCopy = globalPtr;
    }
}
int main()
{
    std::thread t1;
    t1 = std::thread([]() {
        thread1();
    });
    std::thread t2;
    t2 = std::thread([]() {
        thread2();
    });
    for (int i = 0;; ++i) {
        std::this_thread::sleep_for(std::chrono::microseconds(1000));
        std::shared_ptr<mystruct_t> pointerCopy = globalPtr;
        globalPtr = nullptr;
    }
    return 0;
}
As several users here already mentioned, you're running into undefined behavior: the global shared_ptr is reassigned from one thread while another thread is copy-assigning from it at the same time. A drawback of shared_ptr, especially for newcomers, is the rather hidden danger in the suggestion that copying one is always thread-safe. Since you do not use references, this can become even harder to see. Always try to see the shared_ptr as a regular class in the first place, with a common ('trivial') member-wise copy assignment, where interference is always possible in non-protected threading environments.
If you're going to encounter similar situations with that or similar code in the future, try to use a robust channeled broadcasting/event-based scheme instead of locally placed locks. The channels (buffered or single-data based) then take care of the proper data lifetime themselves, keeping the shared_ptr's underlying data alive.
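If you need the global handoff itself to be race-free, the standard library already offers tools for exactly this. Here is a minimal sketch using the std::atomic_load/std::atomic_store overloads for shared_ptr (available since C++11; C++20 deprecates them in favor of std::atomic<std::shared_ptr<T>>); the function names are hypothetical and mirror thread1/thread2 from the question:
#include <memory>

std::shared_ptr<mystruct_t> globalPtr;

void publisher() {
    auto fresh = std::make_shared<mystruct_t>();
    // Replaces the racy "globalPtr = fresh;" with an atomic store.
    std::atomic_store(&globalPtr, fresh);
}

void reader() {
    // Atomic copy: the reference count is adjusted safely.
    std::shared_ptr<mystruct_t> copy = std::atomic_load(&globalPtr);
    if (copy) { /* use *copy */ }
}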
Edit: @Mike pointed out that my try_lock function in the code below is unsafe and that accessor creation can produce a race condition as well. The suggestions (from everyone) have convinced me that I'm going down the wrong path.
Original Question
The requirements for locking on an embedded microcontroller are different enough from multithreading that I haven't been able to convert multithreading examples to my embedded applications. Typically I don't have an OS or threads of any kind, just main and whatever interrupt functions are called by the hardware periodically.
It's pretty common that I need to fill up a buffer from an interrupt, but process it in main. I've created the IrqMutex class below to try to safely implement this. Each person trying to access the buffer is assigned a unique id through IrqMutexAccessor, then they each can try_lock() and unlock(). The idea of a blocking lock() function doesn't work from interrupts because unless you allow the interrupt to complete, no other code can execute so the unlock() code never runs. I do however use a blocking lock from the main() code occasionally.
However, I know that the double-check lock doesn't work without C++11 memory barriers (which aren't available on many embedded platforms). Honestly despite reading quite a bit about it, I don't really understand how/why the memory access reordering can cause a problem. I think that the use of volatile sig_atomic_t (possibly combined with the use of unique IDs) makes this different from the double-check lock. But I'm hoping someone can: confirm that the following code is correct, explain why it isn't safe, or offer a better way to accomplish this.
class IrqMutex {
    friend class IrqMutexAccessor;
private:
    std::sig_atomic_t accessorIdEnum;
    volatile std::sig_atomic_t owner;
protected:
    std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
    bool have_lock(std::sig_atomic_t accessorId) {
        return (owner == accessorId);
    }
    bool try_lock(std::sig_atomic_t accessorId) {
        // Only try to get a lock, while it isn't already owned.
        while (owner == SIG_ATOMIC_MIN) {
            // <-- If an interrupt occurs here, both attempts can get a lock at the same time.
            // Try to take ownership of this Mutex.
            owner = accessorId; // SET
            // Double check that we are the owner.
            if (owner == accessorId) return true;
            // Someone else must have taken ownership between CHECK and SET.
            // If they released it after CHECK, we'll loop back and try again.
            // Otherwise someone else has a lock and we have failed.
        }
        // This shouldn't happen unless they called try_lock on something they already owned.
        if (owner == accessorId) return true;
        // If someone else owns it, we failed.
        return false;
    }
    bool unlock(std::sig_atomic_t accessorId) {
        // Double check that the owner called this function (not strictly required)
        if (owner == accessorId) {
            owner = SIG_ATOMIC_MIN;
            return true;
        }
        // We still return true if the mutex was unlocked anyway.
        return (owner == SIG_ATOMIC_MIN);
    }
public:
    IrqMutex(void) : accessorIdEnum(SIG_ATOMIC_MIN), owner(SIG_ATOMIC_MIN) {}
};
// This class is used to manage our unique accessorId.
class IrqMutexAccessor {
    friend class IrqMutex;
private:
    IrqMutex& mutex;
    const std::sig_atomic_t accessorId;
public:
    IrqMutexAccessor(IrqMutex& m) : mutex(m), accessorId(m.nextAccessor()) {}
    bool have_lock(void) { return mutex.have_lock(accessorId); }
    bool try_lock(void) { return mutex.try_lock(accessorId); }
    bool unlock(void) { return mutex.unlock(accessorId); }
};
Because there is one processor and no threading, the mutex serves what I think is a subtly different purpose than normal. There are two main use cases I run into repeatedly.
1) The interrupt is a Producer and takes ownership of a free buffer and loads it with a packet of data. The interrupt/Producer may keep its ownership lock for a long time spanning multiple interrupt calls. The main function is the Consumer and takes ownership of a full buffer when it is ready to process it. The race condition rarely happens, but if the interrupt/Producer finishes with a packet and needs a new buffer while they are all full, it will take the oldest buffer (this is a dropped-packet event). If the main/Consumer had started to read and process that oldest buffer at exactly the same time, they would trample all over each other.
2) The interrupt is just a quick change or increment of something (like a counter). However, if we want to reset the counter or jump to some new value with a call from the main() code, we don't want to write to the counter while it is changing. Here main actually does a blocking loop to obtain a lock; however, I think it's almost impossible to actually have to wait here for more than two attempts. Once it has a lock, any calls to the counter interrupt will be skipped, but that's generally not a big deal for something like a counter. Then I update the counter value and unlock it so it can start incrementing again.
I realize these two samples are dumbed down a bit, but some version of these patterns occurs in many of the peripherals in every project I work on, and I'd like one piece of reusable code that can safely handle this across various embedded platforms. I included the C tag because all of this is directly convertible to C code, and on some embedded compilers that's all that is available. So I'm trying to find a general method that is guaranteed to work in both C and C++.
struct ExampleCounter {
    volatile long long int value;
    IrqMutex mutex;
} exampleCounter;
struct ExampleBuffer {
    volatile char data[256];
    volatile size_t index;
    IrqMutex mutex; // One mutex per buffer.
} exampleBuffers[2];
const volatile char * const REGISTER;
// This accessor shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutex(exampleCounter.mutex);
void __irqQuickFunction(void) {
    // Obtain a lock, add the data then unlock all within one function call.
    if (myMutex.try_lock()) {
        exampleCounter.value++;
        myMutex.unlock();
    } else {
        // If we failed to obtain a lock, we skipped this update this one time.
    }
}
// These accessors shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutexes[2] = {
    IrqMutexAccessor(exampleBuffers[0].mutex),
    IrqMutexAccessor(exampleBuffers[1].mutex)
};
void __irqLongFunction(void) {
    static size_t bufferIndex = 0;
    // Check if we have a lock.
    if (!myMutexes[bufferIndex].have_lock() && !myMutexes[bufferIndex].try_lock()) {
        // If we can't get a lock try the other buffer
        bufferIndex = (bufferIndex + 1) % 2;
        // One buffer should always be available so the next line should always be successful.
        if (!myMutexes[bufferIndex].try_lock()) return;
    }
    // ... at this point we know we have a lock ...
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    exampleBuffers[bufferIndex].data[exampleBuffers[bufferIndex].index++] = c;
    // We may keep the lock for multiple function calls until the end of packet.
    static const char END_PACKET_SIGNAL = '\0';
    if (c == END_PACKET_SIGNAL) {
        // Unlock this buffer so it can be read from main.
        myMutexes[bufferIndex].unlock();
        // Switch to the other buffer for next time.
        bufferIndex = (bufferIndex + 1) % 2;
    }
}
int main(void) {
    while (true) {
        // Mutex for counter
        static IrqMutexAccessor myCounterMutex(exampleCounter.mutex);
        // Change counter value
        if (EVERY_ONCE_IN_A_WHILE) {
            // Skip any updates that occur while we are updating the counter.
            while (!myCounterMutex.try_lock()) {
                // Wait for the interrupt to release its lock.
            }
            // Set the counter to a new value.
            exampleCounter.value = 500;
            // Updates will start again as soon as we unlock it.
            myCounterMutex.unlock();
        }
        // Mutexes for __irqLongFunction.
        static IrqMutexAccessor myBufferMutexes[2] = {
            IrqMutexAccessor(exampleBuffers[0].mutex),
            IrqMutexAccessor(exampleBuffers[1].mutex)
        };
        // Process buffers from __irqLongFunction.
        for (size_t i = 0; i < 2; i++) {
            // Obtain a lock so we can read the data.
            if (!myBufferMutexes[i].try_lock()) continue;
            // Check that the buffer isn't empty.
            if (exampleBuffers[i].index == 0) {
                myBufferMutexes[i].unlock(); // Don't forget to unlock.
                continue;
            }
            // ... read and do something with the data here ...
            exampleBuffers[i].index = 0;
            myBufferMutexes[i].unlock();
        }
    }
}
Also note that I used volatile on any variable that is read or written by the interrupt routine (unless the variable was only accessed from the interrupt, like the static bufferIndex value in __irqLongFunction). I've read that mutexes remove some of the need for volatile in multithreaded code, but I don't think that applies here. Did I use the right amount of volatile? I used it on: ExampleBuffer[].data[256], ExampleBuffer[].index, and ExampleCounter.value.
I apologize for the long answer, but perhaps it is fitting for a long question.
To answer your first question, I would say that your implementation of IrqMutex is not safe. Let me try to explain where I see problems.
Function nextAccessor
std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
This function has a race condition, because the increment operator is not atomic, even on a std::sig_atomic_t. It involves three operations: reading the current value of accessorIdEnum, incrementing it, and writing the result back. If two IrqMutexAccessors are created at the same time, it's possible that they both get the same ID.
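For comparison, a race-free version of the ID handout could look like this; a sketch, assuming std::atomic<int> is available and lock-free on your target:
#include <atomic>
#include <csignal>

std::atomic<int> accessorIdEnum{SIG_ATOMIC_MIN};

int nextAccessor(void) {
    // fetch_add is a single atomic read-modify-write, so concurrent
    // callers each receive a distinct ID.
    return accessorIdEnum.fetch_add(1) + 1;
}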
Function try_lock
The try_lock function also has a race condition. One thread (e.g. main) could go into the while loop, and then before taking ownership, another thread (e.g. an interrupt) can also go into the while loop and take ownership of the lock (returning true). Then the first thread can continue, moving on to owner = accessorId, and thus "also" take the lock. So two threads (or your main thread and an interrupt) can try_lock an unowned mutex at the same time and both return true.
Disabling interrupts by RAII
We can achieve some level of simplicity and encapsulation by using RAII for interrupt disabling, for example the following class:
class InterruptLock {
public:
    InterruptLock() {
        prevInterruptState = currentInterruptState();
        disableInterrupts();
    }
    ~InterruptLock() {
        restoreInterrupts(prevInterruptState);
    }
private:
    int prevInterruptState; // Whatever type this should be for the platform
    InterruptLock(const InterruptLock&); // Not copy-constructable
};
And I would recommend disabling interrupts to get the atomicity you need within the mutex implementation itself. For example something like:
bool try_lock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == SIG_ATOMIC_MIN) {
        owner = accessorId;
        return true;
    }
    return false;
}
bool unlock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == accessorId) {
        owner = SIG_ATOMIC_MIN;
        return true;
    }
    return false;
}
Depending on your platform, this might look different, but you get the idea.
As you said, this provides a platform to abstract away from the disabling and enabling interrupts in general code, and encapsulates it to this one class.
Mutexes and Interrupts
Having said how I would consider implementing the mutex class, I would not actually use a mutex class for your use-cases. As you pointed out, mutexes don't really play well with interrupts, because an interrupt can't "block" on trying to acquire a mutex. For this reason, for code that directly exchanges data with an interrupt, I would instead strongly consider just directly disabling interrupts (for a very short time while the main "thread" touches the data).
So your counter might simply look like this:
volatile long long int exampleCounter;
void __irqQuickFunction(void) {
    exampleCounter++;
}
...
// Change counter value
if (EVERY_ONCE_IN_A_WHILE) {
    InterruptLock lock;
    exampleCounter = 500;
}
In my mind, this is easier to read, easier to reason about, and won't "slip" when there's contention (i.e. miss timer beats).
Regarding the buffer use-case, I would strongly recommend against holding a lock for multiple interrupt cycles. A lock/mutex should be held for just the slightest moment required to "touch" a piece of memory - just long enough to read or write it. Get in, get out.
So this is how the buffering example might look:
struct ExampleBuffer {
    char data[256];
} exampleBuffers[2];
ExampleBuffer* volatile bufferAwaitingConsumption = nullptr;
ExampleBuffer* volatile freeBuffer = &exampleBuffers[1];
const volatile char * const REGISTER;
void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0';
    static size_t index = 0;
    static ExampleBuffer* receiveBuffer = &exampleBuffers[0];
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    receiveBuffer->data[index++] = c;
    // End of packet?
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        bufferAwaitingConsumption = receiveBuffer;
        // Move on to the next buffer
        receiveBuffer = freeBuffer;
        freeBuffer = nullptr;
        index = 0;
    }
}
int main(void) {
    while (true) {
        // Fetch packet from shared variable
        ExampleBuffer* packet;
        {
            InterruptLock lock;
            packet = bufferAwaitingConsumption;
            bufferAwaitingConsumption = nullptr;
        }
        if (packet) {
            // ... read and do something with the data here ...
            // Once we're done with the buffer, we need to release it back to the producer
            {
                InterruptLock lock;
                freeBuffer = packet;
            }
        }
    }
}
This code is arguably easier to reason about, since there are only two memory locations shared between the interrupt and the main loop: one to pass packets from the interrupt to the main loop, and one to pass empty buffers back to the interrupt. We also only touch those variables under "lock", and only for the minimum time needed to "move" the value. (For simplicity I've skipped over the buffer-overflow logic for when the main loop takes too long to free a buffer.)
It's true that in this case one may not even need the locks, since we're just reading and writing a simple value, but the cost of disabling interrupts is small, and the risk of making mistakes otherwise is not worth it in my opinion.
Edit
As pointed out in the comments, the above solution was meant to only tackle the multithreading problem, and omitted overflow checking. Here is a more complete solution, which should be robust under overflow conditions:
const size_t BUFFER_COUNT = 2;
struct ExampleBuffer {
    char data[256];
    ExampleBuffer* next;
} exampleBuffers[BUFFER_COUNT];
volatile size_t overflowCount = 0;
class BufferList {
public:
    BufferList() : first(nullptr), last(nullptr) { }
    // Atomic enqueue
    void enqueue(ExampleBuffer* buffer) {
        InterruptLock lock;
        buffer->next = nullptr;
        if (last) {
            last->next = buffer;
            last = buffer;
        } else {
            first = buffer;
            last = buffer;
        }
    }
    // Atomic dequeue (or returns null)
    ExampleBuffer* dequeueOrNull() {
        InterruptLock lock;
        ExampleBuffer* result = first;
        if (first) {
            first = first->next;
            if (!first)
                last = nullptr;
        }
        return result;
    }
private:
    ExampleBuffer* first;
    ExampleBuffer* last;
} freeBuffers, buffersAwaitingConsumption;
const volatile char * const REGISTER;
void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0';
    static size_t index = 0;
    static ExampleBuffer* receiveBuffer = &exampleBuffers[0];
    // Recovery from overflow?
    if (!receiveBuffer) {
        // Try to get another free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        // Still no buffer?
        if (!receiveBuffer) {
            overflowCount++;
            return;
        }
    }
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    if (index < sizeof(receiveBuffer->data))
        receiveBuffer->data[index++] = c;
    // End of packet, or out of space?
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        buffersAwaitingConsumption.enqueue(receiveBuffer);
        // Move on to the next free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        index = 0;
    }
}
size_t getAndResetOverflowCount() {
    InterruptLock lock;
    size_t result = overflowCount;
    overflowCount = 0;
    return result;
}
int main(void) {
    // All buffers are free at the start
    for (size_t i = 0; i < BUFFER_COUNT; i++)
        freeBuffers.enqueue(&exampleBuffers[i]);
    while (true) {
        // Fetch packet from shared variable
        ExampleBuffer* packet = buffersAwaitingConsumption.dequeueOrNull();
        if (packet) {
            // ... read and do something with the data here ...
            // Once we're done with the buffer, we need to release it back to the producer
            freeBuffers.enqueue(packet);
        }
        size_t overflowBytes = getAndResetOverflowCount();
        if (overflowBytes) {
            // ...
        }
    }
}
The key changes:
● If the interrupt runs out of free buffers, it will recover
● If the interrupt receives data while it doesn't have a receive buffer, it will communicate that to the main thread via getAndResetOverflowCount
● If you keep getting buffer overflows, you can simply increase the buffer count
I've encapsulated the multithreaded access into a queue class implemented as a linked list (BufferList), which supports atomic dequeue and enqueue. The previous example also used queues, but of length 0-1 (either an item is enqueued or it isn't), and so the implementation of the queue was just a single variable. In the case of running out of free buffers, the receive queue could have 2 items, so I upgraded it to a proper queue rather than adding more shared variables.
If the interrupt is the producer and the mainline code is the consumer, surely it's as simple as disabling the interrupt for the duration of the consume operation?
That's how I used to do it in my embedded microcontroller days.
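For illustration, a minimal sketch of that approach, reusing the InterruptLock RAII class from the answer above (consumeBuffer and the buffer contents are hypothetical placeholders):
void consumeBuffer() {
    InterruptLock lock;  // the producer interrupt cannot fire in this scope
    // ... read/copy the shared buffer here, as briefly as possible ...
}                        // previous interrupt state restored on scope exit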
Is this piece of code considered thread-safe?
When I consume the buffer, it sometimes crashes, and I think this is due to a data race. Is there any problem with this implementation?
TSByteBuf.cpp
#include "TSByteBuf.h"
int TSByteBuf::Read(byte* buf, int len)
{
    while (true)
    {
        if (isBusy.load())
        {
            //Sleep(10);
        }
        else
        {
            isBusy.store(true);
            int dByteGet = m_buffer.sgetn((char*) buf, len);
            isBusy.store(false);
            return dByteGet;
        }
    }
}
int TSByteBuf::Write(byte* buf, int len)
{
    while (true)
    {
        if (isBusy.load())
        {
            //Sleep(10);
        }
        else
        {
            isBusy.store(true);
            int dBytePut = m_buffer.sputn((char*) buf, len);
            isBusy.store(false);
            return dBytePut;
        }
    }
}
TSByteBuf.h
#ifndef TSBYTEBUF_H
#define TSBYTEBUF_H
#include <sstream>
#include <atomic>
typedef unsigned char byte;
class TSByteBuf
{
public:
    std::stringbuf m_buffer;
    //bool Write(byte* buf, int len);
    //bool Read(byte* buf, int len);
    int Write(byte* buf, int len);
    int Read(byte* buf, int len);
protected:
    std::atomic<bool> isBusy;
};
#endif
There's a race between the threads trying to set the isBusy variable. With std::atomic<>, loads and stores are each guaranteed to be atomic, but there's a time window between those two operations in your code. You need a different kind of operation that performs the check and the set atomically. See compare_exchange.
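For illustration, here is a sketch of how the acquire and release could be written with compare_exchange so that the check and the set happen in one atomic read-modify-write; the helper names are hypothetical:
#include <atomic>

void spinAcquire(std::atomic<bool>& isBusy) {
    bool expected = false;
    // Stores true only if isBusy was still false, atomically; on failure
    // the observed value is written back into 'expected', so reset it.
    while (!isBusy.compare_exchange_weak(expected, true)) {
        expected = false;
    }
}

void spinRelease(std::atomic<bool>& isBusy) {
    isBusy.store(false);
}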
You can make your life easier by using the tools offered by the C++ standard library. To make sure only one thread accesses the given area (has an exclusive access) at a time, you can use std::mutex. Further you can use std::lock_guard, which will automatically lock (and unlock with the end of the scope) the mutex for you.
int TSByteBuf::Read(byte* buf, int len)
{
    std::lock_guard<std::mutex> lg(mutex);
    // do your thing, no need to unlock afterwards, the guard will take care of it for you
}
The mutex variable needs to be shared between the threads; make it a member variable of the class.
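The adjusted header could then look like this; a sketch, with the member name mutex matching the Read() snippet above:
#include <mutex>
#include <sstream>

typedef unsigned char byte;

class TSByteBuf
{
public:
    int Write(byte* buf, int len);
    int Read(byte* buf, int len);
protected:
    std::stringbuf m_buffer;
    std::mutex mutex; // shared by every thread that uses this object
};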
There's an alternative to using std::mutex: creating your own locking mechanism, if you want to make sure the thread never goes to sleep. As pointed out in the comments, you probably don't need this and the usage of std::mutex will be fine. I'm keeping it here just for reference.
class spin_lock {
public:
    spin_lock() : flag(ATOMIC_FLAG_INIT) {}
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() { flag.clear(std::memory_order_release); }
private:
    std::atomic_flag flag;
};
Notice the use of the more lightweight std::atomic_flag. Now you can use the class like this:
int TSByteBuf::Read(byte* buf, int len)
{
    std::unique_lock<spin_lock> lg(spinner);
    // do your thing, no need to unlock afterwards, the guard will take care of it for you
}
"is there any problem with this implementation?"
One problem I spot is that std::atomic<bool> isBusy; doesn't replace a std::mutex for locking concurrent access to m_buffer. You never set the value to true.
But even if you do so (as seen from your edit), the store() and load() operations on the isBusy value don't form a lock that protects access to m_buffer as a whole. Thread context switches may occur in between.
I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self-improvement than due to any real need). My class basically looks as follows:
class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    //constructors...
    bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count.
I have implemented the wait function as follows:
int idx = cur_idx.load();
if (lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if (val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while (lock[idx].load() == 1);
return false;
However, when trying to use it with two threads (thread_count is 2), the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However, when I use gcc atomic intrinsics, with volatile int instead of std::atomic<int> and wait written as follows:
int idx = cur_idx;
if (lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if (val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while (lock[idx] == 1);
return false;
it works just fine. From my understanding there shouldn't be any fundamental difference between the two versions (if anything, the second should be less likely to work). So which of the following scenarios applies?
1) I got lucky with the second implementation and my algorithm is crap
2) I didn't fully understand std::atomic and there is a problem with the first variant (but not the second)
3) It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped
For the record, I'm using 32-bit mingw with gcc 4.6.1.
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]()->void
{
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();
Since mingw doesn't have the <thread> header yet, I use a self-written version for that which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping).
Any insights would be appreciated.
edit: Explanation for the algorithm to make it clearer:
● thread_count is the number of threads which shall wait for the barrier (so if thread_count threads are in the barrier, all can leave the barrier).
● lock is set to one when the first (or any) thread enters the barrier.
● counter counts how many threads are inside the barrier and is atomically incremented once for each thread.
● if counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero.
● otherwise the thread waits for the lock to become zero.
● in the next use of the barrier, different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier is lifted).
edit2:
I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. This seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic, meaning there really shouldn't be a difference between the two versions.
I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:
std::atomic<bool> consumerLock;

{ // the critical section
    while (consumerLock.exchange(true)) { } // this is the spinlock
    // do something useful
    consumerLock = false; // unlock
}
In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag. With that, the critical section would look like this:
std::atomic_flag consumerLock;

{
    // critical section
    while (consumerLock.test_and_set()) { /* spin */ }
    // do stuff
    consumerLock.clear();
}
(You can use acquire and release memory ordering there if you prefer.)
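For illustration, the same critical section with the explicit orderings spelled out:
std::atomic_flag consumerLock;

{
    // critical section
    while (consumerLock.test_and_set(std::memory_order_acquire)) { /* spin */ }
    // do stuff
    consumerLock.clear(std::memory_order_release);
}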
It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :
#include <atomic>
class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}
    bool wait ()
    {
        unsigned int step = step_.load ();
        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come. */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl. */
            while (step_.load () == step)
                ;
            return false;
        }
    }
protected:
    /* Number of synchronized threads. */
    const unsigned int n_;
    /* Number of threads currently spinning. */
    std::atomic<unsigned int> nwait_;
    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap. */
    std::atomic<unsigned int> step_;
};
EDIT:
@Grizzy, I can't find any errors in your first (C++11) version and I've also run it for like a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3: the library (or rather, its port to win32) is not mature enough.
Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.
struct bar_t {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;

    bar_t(unsigned count_) :
        count(count_), spaces(count_), generation(0)
    {}

    void wait() {
        unsigned const my_generation = generation;
        if (!--spaces) {
            spaces = count;
            ++generation;
        } else {
            while(generation == my_generation);
        }
    }
};
Here is a simple version of mine:
// spinning_mutex.hpp
#include <atomic>

class spinning_mutex
{
private:
    std::atomic<bool> lockVal;
public:
    spinning_mutex() : lockVal(false) { };
    void lock()
    {
        while (lockVal.exchange(true));
    }
    void unlock()
    {
        lockVal.store(false);
    }
    bool is_locked()
    {
        return lockVal.load();
    }
};
Usage (from the std::lock_guard example):
#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"

int g_i = 0;
spinning_mutex g_i_mutex; // protects g_i

void safe_increment()
{
    std::lock_guard<spinning_mutex> lock(g_i_mutex);
    ++g_i;
    // g_i_mutex is automatically released when lock
    // goes out of scope
}

int main()
{
    std::thread t1(safe_increment);
    std::thread t2(safe_increment);
    t1.join();
    t2.join();
}
I know the thread is a little bit old, but since it is still the first Google result when searching for a thread barrier using C++11 only, I want to present a solution that gets rid of the busy waiting using std::condition_variable.
Basically it is the solution of chill, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpinningBarrier
{
public:
    SpinningBarrier(unsigned int threadCount) :
        threadCnt(threadCount),
        step(0),
        waitCnt(0)
    {}

    bool wait()
    {
        if (waitCnt.fetch_add(1) >= threadCnt - 1)
        {
            std::lock_guard<std::mutex> lock(mutex);
            step += 1;
            condVar.notify_all();
            waitCnt.store(0);
            return true;
        }
        else
        {
            std::unique_lock<std::mutex> lock(mutex);
            unsigned char s = step;
            // Wait until the last thread bumps step past our snapshot.
            condVar.wait(lock, [&]{ return step != s; });
            return false;
        }
    }
private:
    const unsigned int threadCnt;
    unsigned char step;
    std::atomic<unsigned int> waitCnt;
    std::condition_variable condVar;
    std::mutex mutex;
};
Why not use std::atomic_flag (from C++11)?
http://en.cppreference.com/w/cpp/atomic/atomic_flag
std::atomic_flag is an atomic boolean type. Unlike all specializations
of std::atomic, it is guaranteed to be lock-free.
Here's how I would write my spin lock class:
#ifndef SPINLOCK_H
#define SPINLOCK_H

#include <atomic>
#include <thread>

class SpinLock
{
public:
    inline SpinLock() :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock(const SpinLock &) :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock &operator=(const SpinLock &)
    {
        return *this;
    }

    inline void lock()
    {
        while (true)
        {
            for (int32_t i = 0; i < 10000; ++i)
            {
                if (!m_lock.test_and_set(std::memory_order_acquire))
                {
                    return;
                }
            }
            std::this_thread::yield(); // A great idea that you don't see in many spinlock examples
        }
    }

    inline bool try_lock()
    {
        return !m_lock.test_and_set(std::memory_order_acquire);
    }

    inline void unlock()
    {
        m_lock.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_lock;
};
#endif
Stolen straight from docs
spinlock.h
#include <atomic>
using namespace std;

/* Fast userspace spinlock */
class spinlock {
public:
    spinlock(std::atomic_flag& flag) : flag(flag) {
        while (flag.test_and_set(std::memory_order_acquire));
    };
    ~spinlock() {
        flag.clear(std::memory_order_release);
    };
private:
    std::atomic_flag& flag;
};
usage.cpp
#include "spinlock.h"
atomic_flag kartuliga = ATOMIC_FLAG_INIT;
void mutually_exclusive_function()
{
spinlock lock(kartuliga);
/* your shared-resource-using code here */
}
How can I set a variable of type long (on a 64-bit machine = 8 bytes) inside a signal handler? I've read that you can only use variables of type sig_atomic_t (which is actually implemented as volatile int) inside a signal handler, and that it is unsafe to modify data types bigger than an int.
You can use a long inside a signal handler, you can use anything, in fact. The only thing you should take care of is proper synchronization in order to avoid race conditions.
sig_atomic_t should be used for variables shared between the signal handler and the rest of the code. Any variable "private" to the signal handler can be of any type, any size.
Sample code :
#include <signal.h>

static volatile long badShared;          // NOT OK: shared, not sig_atomic_t
static volatile sig_atomic_t goodShared; // OK: shared sig_atomic_t

void handler(int signum)
{
    int localInt = 17;
    long localLong = 23; // OK: not shared
    if (badShared == 0)  // NOT OK: shared, not sig_atomic_t
        ++badShared;
    if (goodShared == 0) // OK: shared sig_atomic_t
        ++goodShared;
}

int main()
{
    signal(SOMESIGNAL, handler);
    badShared++;  // NOT OK: shared, not sig_atomic_t
    goodShared++; // OK: shared sig_atomic_t
    return 0;
}
If you want to use a shared variable other than sig_atomic_t, use atomics (atomic_long_read, atomic_long_set).
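Note that atomic_long_read/atomic_long_set are kernel-style names; in standard C++ the equivalent would be a lock-free std::atomic<long>. A sketch, assuming C++17 for is_always_lock_free, with a hypothetical handler:
#include <atomic>

// Only lock-free atomics are safe to touch from a signal handler.
static_assert(std::atomic<long>::is_always_lock_free,
              "std::atomic<long> must be lock-free for signal safety");

static std::atomic<long> sharedCounter{0};

extern "C" void counterHandler(int) {
    sharedCounter.fetch_add(1, std::memory_order_relaxed);
}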