Self-implemented thread-safe string buffer in C++

Is this piece of code considered thread-safe?
When I consume the buffer, it sometimes crashes, and I think this is due to a data race. Is there any problem with this implementation?
TSByteBuf.cpp
#include "TSByteBuf.h"
int TSByteBuf::Read(byte* buf, int len)
{
    while (true)
    {
        if (isBusy.load())
        {
            //Sleep(10);
        }
        else
        {
            isBusy.store(true);
            int dByteGet = m_buffer.sgetn((char*) buf, len);
            isBusy.store(false);
            return dByteGet;
        }
    }
}

int TSByteBuf::Write(byte* buf, int len)
{
    while (true)
    {
        if (isBusy.load())
        {
            //Sleep(10);
        }
        else
        {
            isBusy.store(true);
            int dBytePut = m_buffer.sputn((char*) buf, len);
            isBusy.store(false);
            return dBytePut;
        }
    }
}
TSByteBuf.h
#ifndef TSBYTEBUF_H
#define TSBYTEBUF_H
#include <sstream>
#include <atomic>
typedef unsigned char byte;
class TSByteBuf
{
public:
    std::stringbuf m_buffer;
    //bool Write(byte* buf, int len);
    //bool Read(byte* buf, int len);
    int Write(byte* buf, int len);
    int Read(byte* buf, int len);
protected:
    std::atomic<bool> isBusy;
};
#endif

There's a race between the threads trying to set the isBusy variable. With std::atomic<>, loads and stores are individually guaranteed to be atomic, but there's a time window between those two operations in your code. You need one of the functions that performs the check and the update as a single atomic step. See compare_exchange.
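For illustration, here is a minimal sketch (not part of the original answer) of how Read could claim the flag atomically with compare_exchange, keeping the member names from the question:
int TSByteBuf::Read(byte* buf, int len)
{
    bool expected = false;
    // Atomically: if isBusy == false, set it to true and proceed; otherwise retry.
    // On failure, compare_exchange_weak writes the observed value back into
    // 'expected', so it must be reset before the next attempt.
    while (!isBusy.compare_exchange_weak(expected, true, std::memory_order_acquire))
    {
        expected = false;
    }
    int dByteGet = m_buffer.sgetn((char*) buf, len);
    isBusy.store(false, std::memory_order_release);
    return dByteGet;
}
Even with the flag taken correctly, this is still a spin loop that burns CPU while waiting; the mutex approach below is usually the better choice.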
You can make your life easier by using the tools offered by the C++ standard library. To make sure only one thread accesses the given area (has an exclusive access) at a time, you can use std::mutex. Further you can use std::lock_guard, which will automatically lock (and unlock with the end of the scope) the mutex for you.
int TSByteBuf::Read(byte* buf, int len)
{
    std::lock_guard<std::mutex> lg(mutex);
    // do your thing, no need to unlock afterwards, the guard will take care of it for you
}
The mutex variable needs to be shared between the threads, so make it a member variable of the class.
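For illustration, the header might then look like this (a sketch, replacing the std::atomic<bool> with the mutex member):
#include <sstream>
#include <mutex>

typedef unsigned char byte;

class TSByteBuf
{
public:
    std::stringbuf m_buffer;
    int Write(byte* buf, int len);
    int Read(byte* buf, int len);
protected:
    std::mutex mutex; // shared by Read and Write; one per buffer instance
};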
There's an alternative to using std::mutex: creating your own locking mechanism, if you want to make sure the thread never goes to sleep. As pointed out in the comments, you probably don't need this and std::mutex will be fine. I'm keeping it here just for reference.
class spin_lock {
public:
    spin_lock() : flag(ATOMIC_FLAG_INIT) {}

    void lock() {
        while (flag.test_and_set(std::memory_order_acquire))
            ;
    }
    void unlock() { flag.clear(std::memory_order_release); }

private:
    std::atomic_flag flag;
};
Notice the use of the more lightweight std::atomic_flag. Now you can use the class like this:
int TSByteBuf::Read(byte* buf, int len)
{
    std::unique_lock<spin_lock> lg(spinner);
    // do your thing, no need to unlock afterwards, the guard will take care of it for you
}

"is there any problem with this implementation?"
One problem I spot is that std::atomic<bool> isBusy; wouldn't replace a std::mutex for locking concurrent access to m_buffer. You never set the value to true.
But even if you do so (as seen from your edit), the store() and load() operations on the isBusy value don't form a lock protecting access to m_buffer as a whole. A thread context switch may occur between them.

Related

Thread safety of boost::unordered_map<int, struct> and shared_mutex

I'm trying to parse TS stream data coming from sockets with 4 threads. I've decided to use a boost shared mutex to manage connections and data receiving, but I'm a total newbie in C++ and I'm not sure I'll get the thread safety right. I'm using a boost unordered_map<int, TsStreams>. When a new user connects, I lock the mutex with a unique lock and add the user to the map; when this user disconnects, I lock the mutex with a unique lock and remove him from the map. The TsStreams structure contains a vector and some additional variables. While a user is sending data, I take a shared lock to get the user's TsStreams reference from the map, append new data to the vector, and modify the additional variables. Is modifying TsStreams in that way thread-safe or not?
class Demuxer {
public:
    Demuxer();
    typedef signal<void (int, TsStream)> PacketSignal;
    void onUserConnected(User);
    void onUserDisconnected(int);
    void onUserData(Data);
    void addPacketSignal(const PacketSignal::slot_type& slot);
private:
    mutable PacketSignal packetSignal;
    void onPacketReady(int, TsStream);
    TsDemuxer tsDemuxer;
    boost::unordered_map<int, TsStreams> usersData;
    boost::shared_mutex mtx_;
};

#include "Demuxer.h"

Demuxer::Demuxer() {
    tsDemuxer.addPacketSignal(boost::bind(&Demuxer::onPacketReady, this, _1, _2));
}

void Demuxer::onUserConnected(User user){
    boost::unique_lock<boost::shared_mutex> lock(mtx_);
    if(usersData.count(user.socket)){
        usersData.erase(user.socket);
    }
    TsStreams streams;
    streams.video.isVideo = true;
    usersData.insert(std::make_pair(user.socket, streams));
}

void Demuxer::onUserDisconnected(int socket){
    boost::unique_lock<boost::shared_mutex> lock(mtx_);
    if(usersData.count(socket)){
        usersData.erase(socket);
    }
}

void Demuxer::onUserData(Data data) {
    boost::shared_lock<boost::shared_mutex> lock(mtx_);
    if(!usersData.count(data.socket)){
        return;
    }
    tsDemuxer.parsePacket(data.socket, std::ref(usersData.at(data.socket)), (uint8_t *) data.buffer, data.length);
}

void Demuxer::onPacketReady(int socket, TsStream data) {
    packetSignal(socket, data);
}

void Demuxer::addPacketSignal(const PacketSignal::slot_type& slot){
    packetSignal.connect(slot);
}

struct TsStreams{
    TsStreams() = default;
    TsStreams(const TsStreams &p1) {}
    TsStream video;
    TsStream audio;
};

struct TsStream
{
    TsStream() = default;
    TsStream(const TsStream &p1) {}
    boost::recursive_mutex mtx_; // to make sure to have the queue, it may not be necessary
    uint64_t PTS = 0;
    uint64_t DTS = 0;
    std::vector<char> buffer;
    uint32_t bytesDataLength = 0;
    bool isVideo = false;
};
class TsDemuxer {
public:
    typedef signal<void (int, TsStream)> PacketSignal;
    void parsePacket(int socket, TsStreams &streams, uint8_t *data, int size);
    connection addPacketSignal(const PacketSignal::slot_type& slot);
private:
    PacketSignal packetSignal;
    void parseTSPacket(int socket, TsStream &stream, uint8_t *data, int size);
    void parseAdaptationField(BitReader &bitReader);
    void parseStream(int socket, TsStream &stream, BitReader &bitReader, uint32_t payload_unit_start_indicator);
    void parsePES(TsStream &stream, BitReader &bitReader);
    int64_t parseTSTimestamp(BitReader &bitReader);
};

void TsDemuxer::parsePacket(int socket, TsStreams &streams, uint8_t *data, int size) {
    //some parsing
    if(video){
        streams.video.mtx_.lock();
        parseTSPacket(socket, streams.video, (uint8_t *)buf, 188);
    }else{
        streams.audio.mtx_.lock();
        parseTSPacket(socket, streams.audio, (uint8_t *)buf, 188);
    }
}

void TsDemuxer::parseTSPacket(int socket, TsStream &stream, uint8_t *data, int size)
{
    //some more parsing
    parseStream(socket, stream, bitReader, payload_unit_start_indicator);
}

void TsDemuxer::parseStream(int socket, TsStream &stream, BitReader &bitReader, uint32_t payload_unit_start_indicator) {
    if(payload_unit_start_indicator)
    {
        if(!stream.buffer.empty()){
            packetSignal(socket, stream);
            stream.buffer = vector<char>();
            stream.bytesDataLength = 0;
        }
        parsePES(stream, bitReader);
    }
    size_t payloadSizeBytes = bitReader.numBitsLeft() / 8;
    copy(bitReader.getBitReaderData(), bitReader.getBitReaderData()+payloadSizeBytes, back_inserter(stream.buffer));
    stream.mtx_.unlock();
}
The demuxer looks correct to me. There are a few inefficiencies though:
You don't need to count before you erase. Just erase. If an element is not present, this will do nothing. That saves you one lookup. Likewise, don't use count followed by at. Use find (see below for the use).
You may want to move as much work as possible out of the critical section. For example, in onUserConnected you could create the TsStreams object before acquiring the lock.
Note that changing an unordered map will never invalidate pointers or references to elements in the map unless they are erased. That means in onUserData you don't have to hold the lock on the map while parsing the packet.
That is, assuming you don't call onUserData for the same user from two different threads. You could prevent this by introducing a second lock on the TsStream object. Likewise, you should guard against erasing the element while another thread may still be parsing the last packet. I would use a shared_ptr for this. Something like this:
class Demuxer {
    ...
    boost::unordered_map<int, boost::shared_ptr<TsStreams> > usersData;
    boost::shared_mutex mtx_;
};

void Demuxer::onUserData(Data data) {
    boost::shared_lock<boost::shared_mutex> maplock(mtx_);
    auto found = usersData.find(data.socket);
    if(found == usersData.end())
        return;
    boost::shared_ptr<TsStreams> stream = found->second;
    boost::unique_lock<boost::recursive_mutex> datalock(stream->mtx_);
    maplock.unlock();
    tsDemuxer.parsePacket(data.socket, *stream, (uint8_t *) data.buffer, data.length);
}
If you reduce the time the Demuxer lock is held with this approach, you should probably replace that shared mutex with a normal one. Shared mutexes have much higher overhead and are not worth it for such short critical sections.
The TsDemuxer looks a bit wonky:
In TsDemuxer::parsePacket you never unlock the mutex. Shouldn't that be a unique_lock? Likewise, in parseStream the unlock seems unpaired. In general, using a unique_lock object is always the way to go compared to manual locking and unlocking. If anything, lock and unlock the unique_lock, not the mutex.
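As a sketch (assuming parseStream no longer calls stream.mtx_.unlock() itself, and reusing the video and buf variables hidden behind the "//some parsing" comment), parsePacket could scope the lock like this:
void TsDemuxer::parsePacket(int socket, TsStreams &streams, uint8_t *data, int size) {
    //some parsing
    TsStream &stream = video ? streams.video : streams.audio;
    // The scoped lock is released when the function returns,
    // even if the parsing code throws.
    boost::unique_lock<boost::recursive_mutex> lock(stream.mtx_);
    parseTSPacket(socket, stream, (uint8_t *)buf, 188);
}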
Remarks unrelated to multithreading
stream.buffer.clear() is more efficient than stream.buffer = vector<char>() because it reuses the buffer's memory instead of deallocating it completely.
As others have noted, these parts of boost are now part of the standard library. Replace boost:: with std:: and enable a recent C++ standard like C++14 or 17 and you are fine. At worst you have to replace shared_mutex with shared_timed_mutex.
In Demuxer, you pass the User and Data objects by value. Are you sure those shouldn't be const references?

How can I implement a C++ Reader-Writer lock using a single unlock method, which can be called be a reader or writer?

I'm working on a project which requires the use of specific OS abstractions, and I need to implement a reader-writer lock using their semaphore and mutex. I currently have a setup in this format:
class ReadWriteLock
{
public:
    ReadWriteLock(uint32_t maxReaders);
    ~ReadWriteLock();
    uint32_t GetMaxReaders() const;
    eResult GetReadLock(int32_t timeout);
    eResult GetWriteLock(int32_t timeout);
    eResult Unlock();
private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;
};
In this implementation I need the Unlock method to either unlock the writer and release all reader semaphore slots, or to simply release a single reader semaphore slot. However, I am struggling, as I cannot think of an implementation that will work in all cases. How can I make this work in the given setup? I know it is possible, as POSIX was able to implement a universal unlock method in their implementation, but I cannot find any indication of how that was done, so I would appreciate any information people can share.
Note that I cannot use C++11 or other OS primitives.
Well, define two functions UnlockRead and UnlockWrite.
I believe you do not need both accesses (Write/Read) at the same time in the same place. So what I am proposing is to have two other classes for locking access:
class ReadWriteAccess
{
public:
    ReadWriteAccess(uint32_t maxReaders);
    ~ReadWriteAccess();
    uint32_t GetMaxReaders() const;
    eResult GetReadLock(int32_t timeout);
    eResult GetWriteLock(int32_t timeout);
    eResult UnlockWrite();
    eResult UnlockRead();
private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;
};
And have separate classes for read and write lock and use RAII to be always on safe side:
class ReadLock
{
public:
    ReadLock(ReadWriteAccess& access, int32_t timeout) : access(access)
    {
        result = access.GetReadLock(timeout);
    }
    eResult getResult() const { return result; }
    ~ReadLock()
    {
        if (result)
            access.UnlockRead();
    }
private:
    ReadWriteAccess& access;
    eResult result;
};
and use like this:
T someResource;
ReadWriteAccess someResourceGuard(10); // e.g. at most 10 concurrent readers

void someFunction()
{
    ReadLock lock(someResourceGuard, 1000); // e.g. 1000 ms timeout
    if (lock.getResult())
        cout << someResource; // it is safe to read something from resource
}
Of course, you can easily write a very similar implementation of WriteLock yourself.
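For completeness, here is a sketch of what that WriteLock might look like, mirroring the ReadLock above (same assumed ReadWriteAccess interface):
class WriteLock
{
public:
    WriteLock(ReadWriteAccess& access, int32_t timeout) : access(access)
    {
        result = access.GetWriteLock(timeout);
    }
    eResult getResult() const { return result; }
    ~WriteLock()
    {
        // Only release the lock if it was actually acquired.
        if (result)
            access.UnlockWrite();
    }
private:
    ReadWriteAccess& access;
    eResult result;
};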
Since the OP insisted in the comments on having "one" Unlock, please consider the drawbacks:
Assume Unlock is implemented with some kind of stack of the last calls to the Lock functions:
class ReadWriteLock
{
public:
    ReadWriteLock(uint32_t maxReaders);
    ~ReadWriteLock();
    uint32_t GetMaxReaders() const;
    eResult GetReadLock(int32_t timeout)
    {
        eResult result = GetReadLockImpl(timeout);
        if (result)
            lockStack.push(READ);
        return result;
    }
    eResult GetWriteLock(int32_t timeout)
    {
        eResult result = GetWriteLockImpl(timeout);
        if (result)
            lockStack.push(WRITE);
        return result;
    }
    eResult Unlock()
    {
        Mode lockMode = lockStack.top();
        lockStack.pop();
        if (lockMode == READ)
            return UnlockReadImpl();
        else
            return UnlockWriteImpl();
    }
private:
    uint32_t m_MaxReaders;
    Mutex* m_WriterMutex;
    Semaphore* m_ReaderSemaphore;
    enum Mode { READ, WRITE };
    std::stack<Mode> lockStack;
};
But the above would work only in a single-threaded application, and a single-threaded application never needs any locks.
So you have to have a multi-threaded stack, like:
template <typename Value>
class MultiThreadStack
{
public:
    void push(Value value)
    {
        stackPerThread[getThreadId()].push(value);
    }
    Value top()
    {
        return stackPerThread[getThreadId()].top();
    }
    void pop()
    {
        stackPerThread[getThreadId()].pop();
    }
private:
    ThreadId getThreadId() { return /* your system way to get thread id */; }
    std::map<ThreadId, std::stack<Value>> stackPerThread;
};
So use this MultiThreadStack, not std::stack, in ReadWriteLock.
But the std::map above would itself need a ReadWriteLock to lock access from multiple threads - so, well, either you know all your threads before you start using this stuff (preregistration) or you end up with the same problem as described here. So my advice: if you can, change your design.
When the lock is acquired successfully, the type is known: either you have many readers running or only one writer; you cannot have both readers and writers running under a validly acquired lock.
So it suffices to store the current lock mode when a lock call succeeds, and all following unlock calls (potentially many if reading permits were handed out, exactly one if the write lock was taken) will be of that mode.
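A minimal sketch of that idea, assuming the GetReadLockImpl/UnlockReadImpl/UnlockWriteImpl helpers from the earlier answer and a hypothetical m_mode member (and assuming read and write phases never overlap, per the argument above):
eResult ReadWriteLock::GetReadLock(int32_t timeout)
{
    eResult result = GetReadLockImpl(timeout);
    if (result)
        m_mode = READ; // remember which kind of lock last succeeded
    return result;
}

eResult ReadWriteLock::Unlock()
{
    // Dispatch on the recorded mode; readers and writers never hold the
    // lock at the same time, so the mode is unambiguous while locked.
    return (m_mode == READ) ? UnlockReadImpl() : UnlockWriteImpl();
}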

C++ Blocking Queue Segfault w/ Boost

I had a need for a Blocking Queue in C++ with timeout-capable offer(). The queue is intended for multiple producers, one consumer. Back when I was implementing, I didn't find any good existing queues that fit this need, so I coded it myself.
I'm seeing segfaults come out of the take() method on the queue, but they are intermittent. I've been looking over the code for issues but I'm not seeing anything that looks problematic.
I'm wondering if:
1. There is an existing library that does this reliably that I should use (boost or header-only preferred).
2. Anyone sees any obvious flaw in my code that I need to fix.
Here is the header:
class BlockingQueue
{
public:
    BlockingQueue(unsigned int capacity) : capacity(capacity) { }
    bool offer(const MyType & myType, unsigned int timeoutMillis);
    MyType take();
    void put(const MyType & myType);
    unsigned int getCapacity();
    unsigned int getCount();
private:
    std::deque<MyType> queue;
    unsigned int capacity;
};
And the relevant implementations:
boost::condition_variable cond;
boost::mutex mut;

bool BlockingQueue::offer(const MyType & myType, unsigned int timeoutMillis)
{
    Timer timer;
    // boost::unique_lock is a scoped lock - its destructor will call unlock().
    // So no need for us to make that call here.
    boost::unique_lock<boost::mutex> lock(mut);
    // We use a while loop here because the monitor may have woken up because
    // another producer did a PulseAll. In that case, the queue may not have
    // room, so we need to re-check and re-wait if that is the case.
    // We use an external stopwatch to stop the madness if we have taken too long.
    while (queue.size() >= this->capacity)
    {
        int monitorTimeout = timeoutMillis - ((unsigned int) timer.getElapsedMilliSeconds());
        if (monitorTimeout <= 0)
        {
            return false;
        }
        if (!cond.timed_wait(lock, boost::posix_time::milliseconds(timeoutMillis)))
        {
            return false;
        }
    }
    cond.notify_all();
    queue.push_back(myType);
    return true;
}

void BlockingQueue::put(const MyType & myType)
{
    // boost::unique_lock is a scoped lock - its destructor will call unlock().
    // So no need for us to make that call here.
    boost::unique_lock<boost::mutex> lock(mut);
    // We use a while loop here because the monitor may have woken up because
    // another producer did a PulseAll. In that case, the queue may not have
    // room, so we need to re-check and re-wait if that is the case.
    // We use an external stopwatch to stop the madness if we have taken too long.
    while (queue.size() >= this->capacity)
    {
        cond.wait(lock);
    }
    cond.notify_all();
    queue.push_back(myType);
}

MyType BlockingQueue::take()
{
    // boost::unique_lock is a scoped lock - its destructor will call unlock().
    // So no need for us to make that call here.
    boost::unique_lock<boost::mutex> lock(mut);
    while (queue.size() == 0)
    {
        cond.wait(lock);
    }
    cond.notify_one();
    MyType myType = this->queue.front();
    this->queue.pop_front();
    return myType;
}

unsigned int BlockingQueue::getCapacity()
{
    return this->capacity;
}

unsigned int BlockingQueue::getCount()
{
    return this->queue.size();
}
And yes, I didn't implement the class using templates - that is next on the list :)
Any help is greatly appreciated. Threading issues can be really hard to pin down.
-Ben
Why are cond and mut globals? I would expect them to be members of your BlockingQueue object. I don't know what else is touching those things, but there may be an issue there.
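For illustration, a sketch of the header with those two moved into the class, so that each queue instance owns its synchronization state:
class BlockingQueue
{
public:
    BlockingQueue(unsigned int capacity) : capacity(capacity) { }
    bool offer(const MyType & myType, unsigned int timeoutMillis);
    MyType take();
    void put(const MyType & myType);
    unsigned int getCapacity();
    unsigned int getCount();
private:
    std::deque<MyType> queue;
    unsigned int capacity;
    boost::mutex mut;                // was a global
    boost::condition_variable cond;  // was a global
};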
I too have implemented a ThreadSafeQueue as part of a larger project:
https://github.com/cdesjardins/QueuePtr/blob/master/include/ThreadSafeQueue.h
It is a similar concept to yours, except the enqueue (aka offer) functions are non-blocking because there is basically no max capacity. To enforce a capacity I typically have a pool with N buffers added at system init time, and a Queue for message passing at run time, this also eliminates the need for memory allocation at run time which I consider to be a good thing (I typically work on embedded applications).
The only difference between a pool and a queue is that a pool gets a bunch of buffers enqueued at system init time. So you have something like this:
ThreadSafeQueue<BufferDataType*> pool;
ThreadSafeQueue<BufferDataType*> queue;

void init()
{
    for (int i = 0; i < NUM_BUFS; i++)
    {
        pool.enqueue(new BufferDataType);
    }
}
Then when you want to send a message, you do something like the following:
void producerA()
{
    BufferDataType *buf;
    if (pool.waitDequeue(buf, timeout) == true)
    {
        initBufWithMyData(buf);
        queue.enqueue(buf);
    }
}
This way the enqueue function is quick and easy, but if the pool is empty, then you will block until someone puts a buffer back into the pool. The idea being that some other thread will be blocking on the queue and will return buffers to the pool when they have been processed as follows:
void consumer()
{
    BufferDataType *buf;
    if (queue.waitDequeue(buf, timeout) == true)
    {
        processBufferData(buf);
        pool.enqueue(buf);
    }
}
Anyways take a look at it, maybe it will help.
I suppose the problem in your code is modifying the deque from several threads. Look:
1. you're waiting for a condition from another thread;
2. then you immediately send a signal to the other threads that the deque is unlocked, just before you want to modify it;
3. then you modify the deque while the other threads think it is already unlocked and start doing the same.
So, try to place all the cond.notify_*() after modifying the deque. I.e.:
void BlockingQueue::put(const MyType & myType)
{
    boost::unique_lock<boost::mutex> lock(mut);
    while (queue.size() >= this->capacity)
    {
        cond.wait(lock);
    }
    queue.push_back(myType); // <- modify first
    cond.notify_all();       // <- then say to others that deque is free
}
For a better understanding I suggest reading about pthread_cond_wait().

Writing a (spinning) thread barrier using c++11 atomics

I'm trying to familiarize myself with C++11 atomics, so I tried writing a barrier class for threads (before someone complains about not using existing classes: this is more for learning/self-improvement than due to any real need). My class looks basically as follows:
class barrier
{
private:
    std::atomic<int> counter[2];
    std::atomic<int> lock[2];
    std::atomic<int> cur_idx;
    int thread_count;
public:
    //constructors...
    bool wait();
};
All members are initialized to zero, except thread_count, which holds the appropriate count.
I have implemented the wait function as
int idx = cur_idx.load();
if(lock[idx].load() == 0)
{
    lock[idx].store(1);
}
int val = counter[idx].fetch_add(1);
if(val >= thread_count - 1)
{
    counter[idx].store(0);
    cur_idx.fetch_xor(1);
    lock[idx].store(0);
    return true;
}
while(lock[idx].load() == 1);
return false;
However, when trying to use it with two threads (thread_count is 2), the first thread gets into the wait loop just fine, but the second thread doesn't unlock the barrier (it seems it doesn't even get to int val = counter[idx].fetch_add(1);, but I'm not too sure about that). However, when I use gcc atomic intrinsics, with volatile int instead of std::atomic<int> and wait written as follows:
int idx = cur_idx;
if(lock[idx] == 0)
{
    __sync_val_compare_and_swap(&lock[idx], 0, 1);
}
int val = __sync_fetch_and_add(&counter[idx], 1);
if(val >= thread_count - 1)
{
    __sync_synchronize();
    counter[idx] = 0;
    cur_idx ^= 1;
    __sync_synchronize();
    lock[idx] = 0;
    __sync_synchronize();
    return true;
}
while(lock[idx] == 1);
return false;
it works just fine. From my understanding there shouldn't be any fundamental difference between the two versions (more to the point, if anything the second should be less likely to work). So which of the following scenarios applies?
1. I got lucky with the second implementation and my algorithm is crap.
2. I didn't fully understand std::atomic and there is a problem with the first variant (but not the second).
3. It should work, but the experimental implementation of the C++11 libraries isn't as mature as I had hoped.
For the record, I'm using 32-bit mingw with gcc 4.6.1.
The calling code looks like this:
spin_barrier b(2);
std::thread t([&b]()->void
{
    std::this_thread::sleep_for(std::chrono::duration<double>(0.1));
    b.wait();
});
b.wait();
t.join();
Since mingw doesn't have the <thread> header yet, I use a self-written version for that which basically wraps the appropriate pthread functions (before someone asks: yes, it works without the barrier, so it shouldn't be a problem with the wrapping).
Any insights would be appreciated.
edit: Explanation for the algorithm to make it clearer:
thread_count is the number of threads which shall wait for the barrier (so if thread_count threads are in the barrier, all can leave the barrier).
lock is set to one when the first (or any) thread enters the barrier.
counter counts how many threads are inside the barrier and is atomically incremented once for each thread.
if counter >= thread_count, all threads are inside the barrier, so counter and lock are reset to zero.
otherwise the thread waits for the lock to become zero.
in the next use of the barrier, different variables (counter, lock) are used to ensure there are no problems if threads are still waiting on the first use of the barrier (e.g. they had been preempted when the barrier was lifted).
edit2:
I have now tested it using gcc 4.5.1 under Linux, where both versions seem to work just fine. That seems to point to a problem with mingw's std::atomic, but I'm still not completely convinced, since looking into the <atomic> header revealed that most functions simply call the appropriate gcc atomic, meaning there really shouldn't be a difference between the two versions.
I have no idea if this is going to be of help, but the following snippet from Herb Sutter's implementation of a concurrent queue uses a spinlock based on atomics:
std::atomic<bool> consumerLock;

{ // the critical section
    while (consumerLock.exchange(true)) { } // this is the spinlock
    // do something useful
    consumerLock = false; // unlock
}
In fact, the Standard provides a purpose-built type for this construction that is required to have lock-free operations, std::atomic_flag. With that, the critical section would look like this:
std::atomic_flag consumerLock;

{
    // critical section
    while (consumerLock.test_and_set()) { /* spin */ }
    // do stuff
    consumerLock.clear();
}
(You can use acquire and release memory ordering there if you prefer.)
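For example, the same critical section with explicit orderings would look like this (a sketch):
// acquire on entry pairs with release on exit
while (consumerLock.test_and_set(std::memory_order_acquire)) { /* spin */ }
// do stuff
consumerLock.clear(std::memory_order_release);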
It looks needlessly complicated. Try this simpler version (well, I haven't tested it, I just meditated on it:))) :
#include <atomic>

class spinning_barrier
{
public:
    spinning_barrier (unsigned int n) : n_ (n), nwait_ (0), step_(0) {}

    bool wait ()
    {
        unsigned int step = step_.load ();

        if (nwait_.fetch_add (1) == n_ - 1)
        {
            /* OK, last thread to come. */
            nwait_.store (0); // XXX: maybe can use relaxed ordering here ??
            step_.fetch_add (1);
            return true;
        }
        else
        {
            /* Run in circles and scream like a little girl. */
            while (step_.load () == step)
                ;
            return false;
        }
    }

protected:
    /* Number of synchronized threads. */
    const unsigned int n_;

    /* Number of threads currently spinning. */
    std::atomic<unsigned int> nwait_;

    /* Number of barrier synchronizations completed so far,
     * it's OK to wrap. */
    std::atomic<unsigned int> step_;
};
EDIT:
@Grizzy, I can't find any errors in your first (C++11) version and I've also run it for something like a hundred million syncs with two threads and it completes. I've run it on a dual-socket/quad-core GNU/Linux machine though, so I'm rather inclined to suspect your option 3 - the library (or rather, its port to win32) is not mature enough.
Here is an elegant solution from the book C++ Concurrency in Action: Practical Multithreading.
struct bar_t {
    unsigned const count;
    std::atomic<unsigned> spaces;
    std::atomic<unsigned> generation;

    bar_t(unsigned count_) :
        count(count_), spaces(count_), generation(0)
    {}

    void wait() {
        unsigned const my_generation = generation;
        if (!--spaces) {
            spaces = count;
            ++generation;
        } else {
            while(generation == my_generation);
        }
    }
};
Here is a simple version of mine:
// spinning_mutex.hpp
#include <atomic>

class spinning_mutex
{
private:
    std::atomic<bool> lockVal;
public:
    spinning_mutex() : lockVal(false) { };

    void lock()
    {
        while(lockVal.exchange(true) );
    }

    void unlock()
    {
        lockVal.store(false);
    }

    bool is_locked()
    {
        return lockVal.load();
    }
};
Usage (from the std::lock_guard example):
#include <thread>
#include <mutex>
#include "spinning_mutex.hpp"

int g_i = 0;
spinning_mutex g_i_mutex;  // protects g_i

void safe_increment()
{
    std::lock_guard<spinning_mutex> lock(g_i_mutex);
    ++g_i;
    // g_i_mutex is automatically released when lock
    // goes out of scope
}

int main()
{
    std::thread t1(safe_increment);
    std::thread t2(safe_increment);
    t1.join();
    t2.join();
}
I know the thread is a little bit old, but since it is still the first Google result when searching for a thread barrier using C++11 only, I want to present a solution that gets rid of the busy waiting using std::condition_variable.
Basically it is chill's solution, but instead of the while loop it uses std::condition_variable::wait() and std::condition_variable::notify_all(). In my tests it seems to work fine.
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpinningBarrier
{
public:
    SpinningBarrier (unsigned int threadCount) :
        threadCnt(threadCount),
        step(0),
        waitCnt(0)
    {}

    bool wait()
    {
        unsigned char s;
        {
            // Read the current step before announcing ourselves, so that a
            // release happening in between cannot be missed.
            std::lock_guard<std::mutex> lock(mutex);
            s = step;
        }
        if(waitCnt.fetch_add(1) >= threadCnt - 1)
        {
            std::lock_guard<std::mutex> lock(mutex);
            step += 1;
            condVar.notify_all();
            waitCnt.store(0);
            return true;
        }
        else
        {
            std::unique_lock<std::mutex> lock(mutex);
            // Wait until the releasing thread has advanced the step.
            condVar.wait(lock, [&]{ return step != s; });
            return false;
        }
    }
private:
    const unsigned int threadCnt;
    unsigned char step;
    std::atomic<unsigned int> waitCnt;
    std::condition_variable condVar;
    std::mutex mutex;
};
Why not use std::atomic_flag (from C++11)?
http://en.cppreference.com/w/cpp/atomic/atomic_flag
std::atomic_flag is an atomic boolean type. Unlike all specializations
of std::atomic, it is guaranteed to be lock-free.
Here's how I would write my spinning thread barrier class:
#ifndef SPINLOCK_H
#define SPINLOCK_H

#include <atomic>
#include <thread>

class SpinLock
{
public:
    inline SpinLock() :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock(const SpinLock &) :
        m_lock(ATOMIC_FLAG_INIT)
    {
    }

    inline SpinLock &operator=(const SpinLock &)
    {
        return *this;
    }

    inline void lock()
    {
        while (true)
        {
            for (int32_t i = 0; i < 10000; ++i)
            {
                if (!m_lock.test_and_set(std::memory_order_acquire))
                {
                    return;
                }
            }

            std::this_thread::yield(); // A great idea that you don't see in many spinlock examples
        }
    }

    inline bool try_lock()
    {
        return !m_lock.test_and_set(std::memory_order_acquire);
    }

    inline void unlock()
    {
        m_lock.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_lock;
};

#endif
Stolen straight from docs
spinlock.h
#include <atomic>
using namespace std;

/* Fast userspace spinlock */
class spinlock {
public:
    spinlock(std::atomic_flag& flag) : flag(flag) {
        while (flag.test_and_set(std::memory_order_acquire)) ;
    };
    ~spinlock() {
        flag.clear(std::memory_order_release);
    };
private:
    std::atomic_flag& flag;
};
usage.cpp
#include "spinlock.h"
atomic_flag kartuliga = ATOMIC_FLAG_INIT;
void mutually_exclusive_function()
{
spinlock lock(kartuliga);
/* your shared-resource-using code here */
}

C++ Critical Section not working

My critical section code does not work!!!
Backgrounder.run IS able to modify MESSAGE_QUEUE g_msgQueue even though LockSection's destructor hadn't been called yet!!!
Extra code:
typedef std::vector<int> MESSAGE_LIST; // SHARED OBJECT .. MUST LOCK!

class MESSAGE_QUEUE : MESSAGE_LIST{
public:
    MESSAGE_LIST * m_pList;
    MESSAGE_QUEUE(MESSAGE_LIST* pList){ m_pList = pList; }
    ~MESSAGE_QUEUE(){ }
    /* This class will be shared between threads that means any
     * attempt to access it MUST be inside a critical section.
     */
    void Add( int messageCode ){ if(m_pList) m_pList->push_back(messageCode); }
    int getLast()
    {
        if(m_pList){
            if(m_pList->size() == 1){
                Add(0x0);
            }
            m_pList->pop_back();
            return m_pList->back();
        }
    }
    void removeLast()
    {
        if(m_pList){
            m_pList->erase(m_pList->end()-1,m_pList->end());
        }
    }
};

class Backgrounder{
public:
    MESSAGE_QUEUE* m_pMsgQueue;

    static void __cdecl Run( void* args){
        MESSAGE_QUEUE* s_pMsgQueue = (MESSAGE_QUEUE*)args;
        if(s_pMsgQueue->getLast() == 0x45)printf("It's a success!");
        else printf("It's a trap!");
    }

    Backgrounder(MESSAGE_QUEUE* pMsgQueue)
    {
        m_pMsgQueue = pMsgQueue;
        _beginthread(Run,0,(void*)m_pMsgQueue);
    }
    ~Backgrounder(){ }
};

int main(){
    MESSAGE_LIST g_List;
    CriticalSection crt;
    ErrorHandler err;
    LockSection lc(&crt,&err); // Does not work, see question #2
    MESSAGE_QUEUE g_msgQueue(&g_List);
    g_msgQueue.Add(0x45);
    printf("%d",g_msgQueue.getLast());
    Backgrounder back_thread(&g_msgQueue);
    while(!kbhit());
    return 0;
}
#ifndef CRITICALSECTION_H
#define CRITICALSECTION_H

#include <windows.h>
#include "ErrorHandler.h"

class CriticalSection{
    long m_nLockCount;
    long m_nThreadId;
    typedef CRITICAL_SECTION cs;
    cs m_tCS;
public:
    CriticalSection(){
        ::InitializeCriticalSection(&m_tCS);
        m_nLockCount = 0;
        m_nThreadId = 0;
    }
    ~CriticalSection(){ ::DeleteCriticalSection(&m_tCS); }
    void Enter(){ ::EnterCriticalSection(&m_tCS); }
    void Leave(){ ::LeaveCriticalSection(&m_tCS); }
    void Try();
};

class LockSection{
    CriticalSection* m_pCS;
    ErrorHandler * m_pErrorHandler;
    bool m_bIsClosed;
public:
    LockSection(CriticalSection* pCS,ErrorHandler* pErrorHandler){
        m_bIsClosed = false;
        m_pCS = pCS;
        m_pErrorHandler = pErrorHandler;
        // 0x1AE is code prefix for critical section header
        if(!m_pCS)m_pErrorHandler->Add(0x1AE1);
        if(m_pCS)m_pCS->Enter();
    }
    ~LockSection(){
        if(!m_pCS)m_pErrorHandler->Add(0x1AE2);
        if(m_pCS && m_bIsClosed == false)m_pCS->Leave();
    }
    void ForceCSectionClose(){
        if(!m_pCS)m_pErrorHandler->Add(0x1AE3);
        if(m_pCS){m_pCS->Leave();m_bIsClosed = true;}
    }
};

/*
Safe class basic structure;
class SafeObj
{
    CriticalSection m_cs;
public:
    void SafeMethod()
    {
        LockSection myLock(&m_cs);
        //add code to implement the method ...
    }
};
*/
#endif
Two questions in one. I don't know about the first, but the critical section part is easy to explain. The background thread isn't trying to claim the lock and so, of course, is not blocked. You need to make the critical section object crt visible to the thread so that it can lock it.
The way to use this lock class is that each section of code that you want serialised must create a LockSection object and hold on to it until the end of the serialised block:
Thread 1:
{
    LockSection lc(&crt,&err);
    //operate on shared object from thread 1
}

Thread 2:
{
    LockSection lc(&crt,&err);
    //operate on shared object from thread 2
}
Note that it has to be the same critical section instance crt that is used in each block of code that is to be serialised.
This code has a number of problems.
First of all, deriving from the standard containers is almost always a poor idea. In this case you're using private inheritance, which reduces the problems, but doesn't eliminate them entirely. In any case, you don't seem to put the inheritance to much (any?) use anyway. Even though you've derived your MESSAGE_QUEUE from MESSAGE_LIST (which is actually std::vector<int>), you embed a pointer to an instance of a MESSAGE_LIST into MESSAGE_QUEUE anyway.
Second, if you're going to use a queue to communicate between threads (which I think is generally a good idea) you should make the locking inherent in the queue operations rather than requiring each thread to manage the locking correctly on its own.
Third, a vector isn't a particularly suitable data structure for representing a queue anyway, unless you're going to make it fixed size, and use it roughly like a ring buffer. That's not a bad idea either, but it's quite a bit different from what you've done. If you're going to make the size dynamic, you'd probably be better off starting with a deque instead.
Fourth, the magic numbers in your error handling (0x1AE1, 0x1AE2, etc.) are quite opaque. At the very least, you need to give these meaningful names. The one comment you have does not make their use anywhere close to clear.
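As a sketch, an enum with made-up names (pick ones matching your actual error scheme) would already help:
// Hypothetical names for illustration only.
enum LockSectionError {
    LOCKSECTION_ERR_NULL_CS_ON_ENTER = 0x1AE1,
    LOCKSECTION_ERR_NULL_CS_ON_EXIT  = 0x1AE2,
    LOCKSECTION_ERR_NULL_CS_ON_FORCE = 0x1AE3
};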
Finally, if you're going to go to all the trouble of writing code for a thread-safe queue, you might as well make it generic so it can hold essentially any kind of data you want, instead of dedicating it to one specific type.
Ultimately, your code doesn't seem to save the client much work or trouble over using the Windows functions directly. For the most part, you've just provided the same capabilities under slightly different names.
IMO, a thread-safe queue should handle almost all the work internally, so that client code can use it about like it would any other queue.
// Warning: untested code.
// Assumes: `T::T(T const &) throw()`
//
template <class T>
class queue {
    std::deque<T> data;
    CRITICAL_SECTION cs;
    HANDLE semaphore;
public:
    queue() {
        InitializeCriticalSection(&cs);
        semaphore = CreateSemaphore(NULL, 0, 2048, NULL);
    }
    ~queue() {
        DeleteCriticalSection(&cs);
        CloseHandle(semaphore);
    }

    void push(T const &item) {
        EnterCriticalSection(&cs);
        data.push_back(item);
        LeaveCriticalSection(&cs);
        ReleaseSemaphore(semaphore, 1, NULL);
    }

    T pop() {
        WaitForSingleObject(semaphore, INFINITE);
        EnterCriticalSection(&cs);
        T item = data.front();
        data.pop_front();
        LeaveCriticalSection(&cs);
        return item;
    }
};

HANDLE done;
typedef queue<int> msgQ;
enum commands { quit, print };

void backgrounder(void *qq) {
    // I haven't quite puzzled out what your background thread
    // was supposed to do, so I've kept it really simple, executing only
    // the two commands listed above.
    msgQ *q = (msgQ *)qq;
    int command;

    while (quit != (command = q->pop()))
        printf("Print\n");
    SetEvent(done);
}

int main() {
    msgQ q;
    done = CreateEvent(NULL, false, false, NULL);
    _beginthread(backgrounder, 0, (void*)&q);
    for (int i=0; i<20; i++)
        q.push(print);
    q.push(quit);
    WaitForSingleObject(done, INFINITE);
    return 0;
}
Your background thread needs access to the same CriticalSection object and it needs to create LockSection objects to lock it -- the locking is collaborative.
You are trying to return the last element after popping it.
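For illustration, a sketch of getLast with the order fixed, reading the element before popping it (the trailing return is added because the original had a code path with no return value):
int getLast()
{
    if(m_pList && !m_pList->empty()){
        int last = m_pList->back(); // read the element first...
        m_pList->pop_back();        // ...then remove it
        return last;
    }
    return 0x0; // hypothetical fallback for an empty or missing list
}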