How to Create Thread-Safe Buffers / POD? - C++

My problem is quite common I suppose, but it drives me crazy:
I have a multi-threaded application with 5 threads. 4 of these threads do their job, like network communication and local file system access, and then all write their output to a data structure of this form:
struct Buffer {
    std::vector<std::string> lines;
    bool has_been_modified;
};
The 5th thread prints these buffer/structures to the screen:
Buffer buf1, buf2, buf3, buf4;
...
if ( buf1.has_been_modified ||
     buf2.has_been_modified ||
     buf3.has_been_modified ||
     buf4.has_been_modified )
{
    redraw_screen_from_buffers();
}
How do I protect the buffers from being overwritten while they are either being read from or written to?
I can't find a proper solution, although I think this has to be a quite common problem.
Thanks.

You should use a mutex. The mutex class is std::mutex. With C++11 you can use std::lock_guard<std::mutex> to encapsulate the mutex using RAII. So you would change your Buffer struct to
struct Buffer {
    std::vector<std::string> lines;
    bool has_been_modified;
    std::mutex mutex;
};
and whenever you read or write to the buffer or has_been_modified you would do
std::lock_guard<std::mutex> lockGuard(buf1.mutex); // do this for each buffer instance you access
... // access the buffer here
and the mutex will be automatically released by the lock_guard when it is destroyed.
You can read more about mutexes here.
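For illustration, a hedged sketch of how the printing thread might use this, assuming the flag should be reset once the output has been consumed (the reset is my assumption, not stated in the question):

bool consume_if_modified(Buffer& buf)
{
    std::lock_guard<std::mutex> guard(buf.mutex);
    if (!buf.has_been_modified)
        return false;
    buf.has_been_modified = false; // reset while still holding the lock
    return true;
}

The 5th thread would call this for each of buf1..buf4 and call redraw_screen_from_buffers() if any of them returned true, taking the locks again while reading lines.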

You can use a mutex (or mutexes) around the buffers to ensure that they're not modified by multiple threads at the same time.
// Mutex shared between the multiple threads
std::mutex g_BufferMutex;

void redraw_screen_from_buffers()
{
    std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
    // redraw here after mutex has been locked
}
Then your buffer modification code would have to lock the same mutex when the buffers are being modified.
void updateBuffer()
{
    std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
    // update here after mutex has been locked
}
This contains some mutex examples.

What it appears you want to accomplish is to have multiple threads/workers and one observer. The latter needs to do its job only when all workers are done/have signalled. If this is the case, then check the code in this SO Q&A: std::condition_variable - Wait for several threads to notify observer
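A minimal sketch of that pattern, assuming the workers simply bump a counter of pending updates (the names and the reset policy are illustrative assumptions, not taken from the linked answer):

#include <condition_variable>
#include <mutex>

void redraw_screen_from_buffers(); // from the question

std::mutex m;
std::condition_variable cv;
int pending = 0; // number of buffers with unseen changes

void worker_finished_writing()
{
    {
        std::lock_guard<std::mutex> lk(m);
        ++pending;
    }
    cv.notify_one();
}

void observer_loop()
{
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return pending > 0; });
        pending = 0;   // consume the notifications
        lk.unlock();
        redraw_screen_from_buffers();
    }
}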

Mutexes are a very nice thing when trying to avoid data races, and I'm sure the answer posted by @Phantom will satisfy most people. However, one should know that this does not scale to large systems.
By locking you are synchronising your threads. As only one at a time can be accessing the vector, one thread writing to the container will cause another one to wait for it to finish... which may be fine for you, but causes a serious performance bottleneck when high performance is needed.
The best solution would be to use a more complex lock-free structure. Unfortunately I don't think there is any standard lock-free structure in the STL. One example of a lock-free queue is available here
Using such a structure, your 4 working threads would be able to enqueue messages to the container while the 5th one dequeues them, without any data races
More on lock-free data structures can be found here!
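As a hedged illustration of that design, here is a sketch using Boost.Lockfree (one option among several; the element type and capacity are arbitrary choices, and note that the element type must be trivially copyable, so std::string lines would have to be passed by index or pointer):

#include <boost/lockfree/queue.hpp>

boost::lockfree::queue<int> results(1024); // fixed capacity, arbitrary here

void worker(int value)
{
    while (!results.push(value)) {
        // queue full: retry (or drop / buffer locally)
    }
}

void printer()
{
    int value;
    while (results.pop(value)) {
        // render the value
    }
}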


Lock stepping pthread mutex

I don't know if this is good practice or not, but I am doing work on a real-time stream of input data and using pthreads in lock-step order so that the threads can do different operations at the same time, with only one thread in each stage. This is my program flow for each thread:
void *my_thread(void *arg)
{
    pthread_mutex_lock(&read_mutex);
    /*
       read data from a stream such as stdin into a global buffer
    */
    pthread_mutex_lock(&operation_mutex);
    pthread_mutex_unlock(&read_mutex);
    /*
       perform some work on the data you read
    */
    pthread_mutex_lock(&output_mutex);
    pthread_mutex_unlock(&operation_mutex);
    /*
       write the data to output such as stdout
    */
    pthread_mutex_unlock(&output_mutex);
    return NULL;
}
I know there are pthread condition variables, but is my approach a good idea or a bad one? I tested this on streams of various sizes and I am trying to think of corner cases that would make this deadlock, produce a race condition, or both. I know mutexes don't guarantee thread execution order, but I need help thinking of scenarios that will break this.
UPDATE:
I stepped away from this, but had some time recently to rethink it. I rewrote the code using C++ threads and mutexes. I am trying to use condition variables but have had no luck. This is my approach to the problem:
void my_thread_v2()
{
    // Let only 1 thread read in at a time
    std::unique_lock<std::mutex> stdin_lock(stdin_mutex);
    stdin_cond.wait(stdin_lock);
    /*
       Read from stdin stream
    */
    // Unlock the stdin mutex
    stdin_lock.unlock();
    stdin_cond.notify_one();

    // Lock step
    std::unique_lock<std::mutex> operation_lock(operation_mutex);
    operation_cond.wait(operation_lock);
    /*
       Perform work on the data that you read in
    */
    operation_lock.unlock();
    operation_cond.notify_one();

    std::unique_lock<std::mutex> stdout_lock(stdout_mutex);
    stdout_cond.wait(stdout_lock);
    /*
       Write the data out to stdout
    */
    // Unlock the stdout mutex
    stdout_lock.unlock();
    stdout_cond.notify_one();
}
I know the issue with this code is that there is no way to signal the first condition. I definitely am not understanding the proper use of the condition variable. I looked at various examples on cppreference, but can't seem to get away from the thought that the initial approach may be the only way of doing what I want, which is to lock-step the threads. Can someone shed some light on this?
UPDATE 2:
So I implemented a simple Monitor class that utilizes C++ condition_variable and unique_lock:
class ThreadMonitor {
public:
    ThreadMonitor() : is_occupied(false) {}

    void Wait() {
        std::unique_lock<std::mutex> lock(mx);
        while (is_occupied) {
            cond.wait(lock);
        }
        is_occupied = true;
    }

    void Notify() {
        std::unique_lock<std::mutex> lock(mx);
        is_occupied = false;
        cond.notify_one();
    }

private:
    bool is_occupied;
    std::condition_variable cond;
    std::mutex mx;
};
This is my initial approach, assuming I have three ThreadMonitors called stdin_mon, operation_mon, and stdout_mon:
void my_thread_v3()
{
    // Let only 1 thread read in at a time
    stdin_mon.Wait();
    /*
       Read from stdin stream
    */
    stdin_mon.Notify();

    operation_mon.Wait();
    /*
       Perform work on the data that you read in
    */
    operation_mon.Notify();

    stdout_mon.Wait();
    /*
       Write the data out to stdout
    */
    // Unlock the stdout
    stdout_mon.Notify();
}
The issue with this was that the data was still being corrupted, so I had to change back to the original logic of lock-stepping the threads:
void my_thread_v4()
{
    // Let only 1 thread read in at a time
    stdin_mon.Wait();
    /*
       Read from stdin stream
    */
    operation_mon.Wait();
    stdin_mon.Notify();
    /*
       Perform work on the data that you read in
    */
    stdout_mon.Wait();
    operation_mon.Notify();
    /*
       Write the data out to stdout
    */
    // Unlock the stdout
    stdout_mon.Notify();
}
I am beginning to suspect that if thread order matters, this is the only way to handle it. I am also questioning what the benefit is of using a Monitor built on a condition_variable over just using a mutex.
The problem with your approach is that you still can modify the data while another thread is reading it:
Thread A acquires read, then operation, releases read again, and starts writing some data, but is interrupted.
Now thread B proceeds, acquires read, and can read the partially modified, possibly inconsistent data!
I assume you want to allow multiple threads reading the same data without blocking, but as soon as writing, the data shall be protected. Finally, while outputting data, we are just reading the modified data again and thus can do this concurrently again, but need to prevent simultaneous write.
Instead of having multiple mutex instances, you can do this better with a read/write mutex:
Any function only reading the data acquires the read lock.
Any function intending to write acquires the write lock right from the start (be aware that first acquiring the read lock and then the write lock without releasing the read lock in between can result in deadlock; if you do release the read lock in between, your data handling needs to be robust against the data being modified by another thread in the meantime!).
Downgrading the write lock to a shared lock without releasing it in between is safe, so we can do that before outputting. If the data must not be modified between writing it and outputting it, we even need to do this without entirely releasing the lock.
The last point is problematic, as it is supported neither by the C++ standard's thread support library nor by the pthreads library.
For C++, boost provides a solution; if you don't want to or cannot (C!) use boost, a simple, but possibly not most efficient, approach would be protecting the acquisition of the write lock via another mutex:
acquire standard (non-rw) mutex protecting the read write mutex
acquire RW mutex for writing
release protecting mutex
read data, write modified data
acquire protecting mutex
release RW mutex
re-acquire the RW mutex for reading; it does not matter if another thread has acquired it for reading as well, we only need to guard against it being locked for writing here
release protecting mutex
output
release RW mutex (no need to protect)...
Non-modifying functions can just acquire the read lock without any further protection, as there aren't any conflicts with it.
In C++, you'd prefer using the thread support library and additionally gain platform-independent code for free; in C, you would use a standard pthread mutex for protecting the acquisition of the write lock, just as before, and use the RW variants from pthread for the read/write lock.
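A hedged sketch of the sequence above in C++, using std::shared_mutex (C++17, so newer than the libraries this answer refers to) as the RW mutex and a plain std::mutex as the protecting mutex; all names are illustrative:

#include <mutex>
#include <shared_mutex>

std::mutex gate;       // protecting mutex: guards lock transitions on rw
std::shared_mutex rw;  // the read/write mutex

void modify_then_output()
{
    gate.lock();
    rw.lock();             // acquire RW mutex for writing
    gate.unlock();
    // ... read data, write modified data ...
    gate.lock();
    rw.unlock();           // release exclusive ownership
    rw.lock_shared();      // re-acquire for reading; holding gate keeps writers out
    gate.unlock();
    // ... output ...
    rw.unlock_shared();    // no need to protect this release
}

void read_only()
{
    std::shared_lock<std::shared_mutex> lock(rw); // readers need no extra protection
    // ... read ...
}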

Implement a high performance mutex similar to Qt's one

I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I have used Qt's QMutex for the task, and it works well: the mutex has negligible overhead.
However, I have to port it to C++11/STL only. When using std::mutex, the performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured out that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked, and that it falls back to a system mutex when concurrent locking occurs.
I would like to implement this with the STL. Is there a simple way based on std::atomic and std::mutex? I have dug into Qt's code, but it seems overly complicated for my use (I do not need lock timeouts, pimpl, a small footprint, etc.).
Edit: I have tried a spinlock, but this does not work well because:
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, so all worker threads get blocked at that point. The spinlocks keep the scheduler busy, causing the flush to be 10-100x slower than with a proper mutex. This is not acceptable.
Edit: I have tried this, but it's not working (it locks up all threads):
class Mutex
{
public:
    Mutex() : lockCounter(0) { }

    void lock()
    {
        if (lockCounter.fetch_add(1, std::memory_order_acquire) > 0)
        {
            std::unique_lock<std::mutex> lock(internalMutex);
            cv.wait(lock);
        }
    }

    void unlock()
    {
        if (lockCounter.fetch_sub(1, std::memory_order_release) > 1)
        {
            cv.notify_one();
        }
    }

private:
    std::atomic<int> lockCounter;
    std::mutex internalMutex;
    std::condition_variable cv;
};
Thanks!
Edit: Final solution
MikeMB's fast mutex worked pretty well.
As a final solution, I did the following:
Use a simple spinlock with a try_lock.
When a thread fails to try_lock, instead of waiting, it fills a per-thread queue (not shared with other threads) and continues.
When a thread does get the lock, it updates the buffer with the current result, but also with the results stored in its queue (it processes its queue).
Buffer flushing was made much more efficient: the blocking part only swaps two pointers.
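A hedged sketch of that scheme; the OP did not post the final code, so the Result type, names, and the flush-on-success policy are all assumptions:

#include <atomic>
#include <vector>

struct Result { double value; };

std::atomic_flag buffer_flag = ATOMIC_FLAG_INIT; // spinlock state
std::vector<Result> shared_buffer;               // only touched while holding the flag
thread_local std::vector<Result> backlog;        // per-thread queue, never shared

void publish(const Result& r)
{
    if (!buffer_flag.test_and_set(std::memory_order_acquire)) {
        // Lock acquired: drain our private backlog, then add the new result.
        shared_buffer.insert(shared_buffer.end(), backlog.begin(), backlog.end());
        backlog.clear();
        shared_buffer.push_back(r);
        buffer_flag.clear(std::memory_order_release);
    } else {
        backlog.push_back(r); // lock busy: defer and keep computing
    }
}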
General Advice
As was mentioned in some comments, I'd first have a look at whether you can restructure your program design to make the mutex implementation less critical for your performance.
Also, as multithreading support in standard C++ is pretty new and somewhat immature, you sometimes just have to fall back on platform-specific mechanisms, like e.g. a futex on Linux systems or critical sections on Windows, or non-standard libraries like Qt.
That being said, I could think of two implementation approaches that might potentially speed up your program:
Spinlock
If access collisions happen very rarely, and the mutex is only held for short periods of time (two things one should strive for anyway, of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and is simple to implement (taken from cppreference):
class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield(); //<- this is not in the source but might improve performance
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};
The drawback of course is that waiting threads don't stay asleep and steal processing time.
Checked Locking
This is essentially the idea you demonstrated: you first make a fast check as to whether locking is actually needed, based on an atomic swap operation, and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
    // Status of the fast mutex
    std::atomic<bool> locked;
    // helper mutex and cv on which threads can wait in case of collision
    std::mutex mux;
    std::condition_variable cv;
    // the maximum number of threads that might be waiting on the cv (conservative estimate)
    std::atomic<int> cntr;

    FastMux() : locked(false), cntr(0) {}

    void lock() {
        if (locked.exchange(true)) {
            cntr++;
            {
                std::unique_lock<std::mutex> ul(mux);
                cv.wait(ul, [&] { return !locked.exchange(true); });
            }
            cntr--;
        }
    }

    void unlock() {
        locked = false;
        if (cntr > 0) {
            std::lock_guard<std::mutex> ul(mux);
            cv.notify_one();
        }
    }
};
Note that the std::mutex is not locked between lock() and unlock(); it is only used for handling the condition variable. This results in more calls to lock/unlock if there is high congestion on the mutex.
The problem with your implementation is that cv.notify_one(); can potentially be called between if(lockCounter.fetch_add(1, std::memory_order_acquire)>0) and cv.wait(lock);, so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation, though, so you'll just have to see what works best for you.
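For what it's worth, FastMux exposes lock()/unlock(), so it satisfies the BasicLockable requirements and can be used with std::lock_guard; a hedged usage sketch (the buffer and function names are made up):

FastMux bufMux;                    // the FastMux defined above
std::vector<double> sharedResults; // the common result buffer

void storeResult(double r)
{
    std::lock_guard<FastMux> guard(bufMux); // RAII lock/unlock on the fast mutex
    sharedResults.push_back(r);
}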
Not really an answer by definition, but depending on the specific task, a lock-free queue might help to get rid of the mutex altogether. This would help the design if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
Update with respect to the comments:
Queue size / overflow:
Queue overflow can be avoided by (i) making the queue large enough or (ii) making the producer thread wait with pushing data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction, but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer to check at regular intervals (polling) whether the queue is non-empty; once it is, the thread can fetch the data and then go back into wait mode.
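A minimal sketch of the condition-variable option; this shows only the consumer's waiting mechanism (with a plain locked queue), not a lock-free implementation, and the Item type is an assumption:

#include <condition_variable>
#include <mutex>
#include <queue>

struct Item { int payload; };

std::mutex m;
std::condition_variable cv;
std::queue<Item> q;

void produce(Item it)
{
    {
        std::lock_guard<std::mutex> lk(m);
        q.push(std::move(it));
    }
    cv.notify_one(); // wake the consumer if it is waiting
}

Item consume()
{
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !q.empty(); }); // sleep until there is data
    Item it = std::move(q.front());
    q.pop();
    return it;
}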

Multiple threads and mutexes

I am very new to Linux programming, so bear with me. I have 2 thread types that perform different operations, so I want each one to have its own mutex. Here is the code I am using; is it good? If not, why?
static pthread_mutex_t cs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cs_mutex2 = PTHREAD_MUTEX_INITIALIZER;

void * Thread1(void * lp)
{
    int * sock = (int*)lp;
    char buffer[2048];
    int bytecount = recv(*sock, buffer, 2048, 0);

    while (0 == 0)
    {
        if ((bytecount == 0) || (bytecount == -1))
        {
            pthread_mutex_lock(&cs_mutex);
            // Some uninteresting operations which play with set 1 of global variables
            pthread_mutex_unlock(&cs_mutex);
        }
    }
}

void * Thread2(void * lp)
{
    while (0 == 0)
    {
        pthread_mutex_lock(&cs_mutex2);
        // Some uninteresting operations which play with some global variables
        pthread_mutex_unlock(&cs_mutex2);
    }
}
Normally, a mutex is not thread-related.
It ensures that a critical area is only accessed by a single thread at a time.
So if you have some shared areas, like the same array being processed by multiple threads, then you must ensure exclusive access to that area.
That means you do not need a mutex for each thread. You need a mutex for each critical area.
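For example, a short sketch of one mutex guarding one shared area, whichever thread touches it (the array and names are made up):

#include <pthread.h>

// One mutex per shared resource, not per thread.
static pthread_mutex_t area_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_area[100]; // the critical data, shared by all threads

void touch_area(int idx, int value)
{
    pthread_mutex_lock(&area_mutex);
    shared_area[idx] = value; // any thread must hold area_mutex here
    pthread_mutex_unlock(&area_mutex);
}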
If you only have one driver, there is no advantage to having two cars. Your Thread2 code can only make useful progress while holding cs_mutex2. So there's no point to having more than one thread running that code. Only one thread can hold the mutex at a time, and the other thread can do no useful work.
So all you'll accomplish is that occasionally the thread that doesn't hold the mutex will try to run and have to wait for the other. And occasionally the thread that does hold the mutex will try to release and re-acquire it and get pre-empted by the other.
This is a completely pointless use of threads.
I see three problems here. There's a question about your infinite loop, another about your intention in having multiple threads, and there's a future maintainability "gotcha" lurking.
First
int bytecount = recv(*sock, buffer, 2048, 0);
while (0 == 0)
Is that right? You read some stuff from a socket and start an infinite loop without ever closing the socket? I can only assume that you do some more reading in the loop, but in that case you are waiting for an external event while holding the mutex. In general that's a bad pattern that limits your concurrency. A possible pattern is to have one thread reading the data and then passing the read data to other threads which do the processing.
Next, you have two different sets of resources, each protected by its own mutex. You then intend to have a set of threads for each resource. But each thread follows the pattern:
take mutex
lots of processing
release mutex
tiny window (a few machine instructions)
take mutex again
lots of processing
release mutex
next tiny window
There's virtually no opportunity for two threads to work in parallel. I question whether you really need multiple threads for each resource.
Last, there's a potential maintenance issue. I'm just pointing this out for future reference; I don't think you need to do anything right now. You have two functions, intended for use by two threads, but in the end they are just functions that can be called by anyone. If later maintenance results in those functions (or refactored subsets of them) taking both mutexes, you could end up with one thread doing
take mutex 1
take mutex 2
and the other
take mutex 2
take mutex 1
Bingo: deadlock.
Not an easy problem to avoid, but at the very least one can aid the maintainer by careful naming choices and refactoring.
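One hedged mitigation, if both mutexes ever do need to be held together and C++ is an option (this goes beyond the pthreads code in the question): acquire them in a single std::scoped_lock (C++17), which applies a deadlock-avoidance algorithm:

#include <mutex>

std::mutex cs_mutex, cs_mutex2; // std counterparts of the question's mutexes

void needs_both_sets()
{
    // Locks both mutexes without the lock-order deadlock described above,
    // provided every combined acquisition goes through scoped_lock.
    std::scoped_lock lock(cs_mutex, cs_mutex2);
    // ... work with both sets of global variables ...
}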
I think your code is correct; however, please note 2 things:
It is not exception-safe. If an exception is thrown from the uninteresting operations, your mutex will never be unlocked -> deadlock.
You could also consider using std::mutex or boost::mutex instead of raw pthread mutexes. For mutex locking, it's better to use boost::mutex::scoped_lock (or the std:: analog with a modern compiler):
void test()
{
    // non-synchronized code here
    {
        boost::mutex::scoped_lock lock(mutex_);
        // synchronized code here
    }
}
If you have 2 different sets of data and 2 different threads working on these sets, why do you need mutexes at all? Usually, mutexes are used when you deal with some shared piece of data and you don't want two threads to deal with it simultaneously, so you lock it with the mutex, do some stuff, and unlock it.

Lock std::map C++

I have a problem using std::map in my multithreaded application. I need to lock the map object while a thread is writing to it. In parallel, other threads that read this object should stop until the writing process is finished.
Sample code:
std::map<int, int> ClientTable;

int write_function() //<-- only one thread uses this function
{
    while (true)
    {
        // lock ClientTable
        ClientTable.insert(std::pair<int, int>(1, 2)); // random values
        // unlock ClientTable
        // thread sleeps for 2 secs
    }
}

int read_function() //<--- many threads use this function
{
    while (true)
    {
        int test = ClientTable[2]; // just for test
    }
}

How can I lock this std::map object and correctly synchronise these threads?
Looks like what you need is a typical read-write lock, allowing any number of readers but only a single writer. You can have a look at boost's shared_mutex.
Additional usage examples can be found here: Example for boost shared_mutex (multiple reads/one write)?
Well, since std::map doesn't have a built-in lock, you would have to use a separate lock that protects the map. If the map is all that's protected by that lock, you could subclass map to add the lock there along with "lock" and "unlock" calls, but if the lock will be used for other items as well, then it's probably not the best idea to do that.
As for "how to correctly synchronize": that's very specific to the application at hand. However, for the example given, you have to insert the lock/unlock calls around the read operation as well.
If you have read/write locks, this might also be a good application for one of these.
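A hedged sketch of the read/write-lock approach using C++17's std::shared_mutex (boost::shared_mutex works the same way); find() is used instead of operator[], since operator[] inserts a default value when the key is missing:

#include <map>
#include <mutex>
#include <shared_mutex>

std::map<int, int> ClientTable;
std::shared_mutex table_mutex;

void write_entry(int key, int value) // the single writer
{
    std::unique_lock<std::shared_mutex> lock(table_mutex); // exclusive access
    ClientTable[key] = value;
}

int read_entry(int key) // any number of concurrent readers
{
    std::shared_lock<std::shared_mutex> lock(table_mutex); // shared access
    auto it = ClientTable.find(key);
    return it != ClientTable.end() ? it->second : -1; // -1: arbitrary "not found"
}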

Is there an 'upgrade_to_unique_lock' for boost::interprocess?

I am looking for the best way to effectively share chunks of data between two (or more) processes in a writer-biased reader/writer model.
My current tests are with boost::interprocess. I have created some managed_shared_memory and am attempting to lock access to the data chunk by using an interprocess mutex stored in the shared memory.
However, even when using sharable_lock on the reader and upgradable_lock on the writer, the client will read fragmented values during write operations instead of blocking. While doing a similar reader/writer setup between threads in a single process, I used upgrade_to_unique_lock to solve this issue. However, I have not found its boost::interprocess equivalent. Does one exist?
Server (writer):
while (1) {
    // Get upgrade lock on the mutex
    upgradable_lock<MutexType> lock(myMutex);

    // Need 'upgrade_to_unique_lock' here so shared readers will block until
    // the write operation is finished.

    // Write values here
}
Client (reader):
while (1)
{
    // Get shared access
    sharable_lock<MutexType> lock(myMutex);

    // Read p1's data here -- occasionally invalid!
}
I guess the bigger question at hand is this: is an interprocess mutex even the proper way to access shared memory between processes in a writer-biased setup?
Note: using Boost 1.44.0
All Boost.Interprocess upgradable locks support upgrade per this. Definition here.
Regarding your broader question: I would think that this is precisely what you want. Readers can still work concurrently, and you have to prevent concurrent writes. Unless you can partition the shared memory such that more constrained access is guaranteed, this looks like the best approach.
Solution by OP.
The answer, as stated in the question comments, is to use the member function unlock_upgradable_and_lock. If there is a boost::interprocess analog to upgrade_to_unique_lock, I don't know where it is. But the writer() function can be rewritten as:
while (1) {
    // Get upgrade lock on the mutex
    myMutex.lock_upgradable();

    // Get exclusive access and block everyone else
    myMutex.unlock_upgradable_and_lock();

    // Write values here

    // Unlock the mutex (and stop blocking readers)
    myMutex.unlock();
}