Suitable thread "fence" for a worker thread

I have a Worker class that runs its own thread to do some work in parallel. During specific intervals I want it to be idle. I have the following interface:
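class Worker
{
    std::mutex m_wait;
public:
    void pause() {
        m_wait.lock();
    }
    // "continue" is a reserved keyword in C++, so the method is named resume()
    void resume() {
        m_wait.unlock();
    }
    static void doWork(std::mutex& lock) {
        while (true) {
            {
                std::lock_guard<std::mutex> _l(lock);
                // Lock and immediately unlock again
            }
            // Only *some* work
        }
    }
};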
As you can see, the doWork method periodically checks whether the mutex is locked (by pause()) and does not proceed while it is.
I have some questions about this implementation regarding speed and alternatives:
How much overhead is there from checking the lock? Suppose I make a matrix-multiplication of 10x10*10x10 in the real work (small datasets). then
Is there an alternative, that achieves the same effect but has less overhead?

The cost of locking/unlocking a mutex depends on your platform and the specific mutex implementation. In the best case, locking and unlocking an uncontended mutex costs about as much as two interlocked memory operations.
Under contention, locking becomes costly because most implementations make a kernel call to wait for the mutex to be unlocked. However, in that case you probably do not care, as the whole point is to stop the thread in the first place.

Related

C++ atomics: how to allow only a single thread to access a function?

I'd like to write a function that is accessible only by a single thread at a time. I don't need busy waits, a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:
std::atomic<bool> m_busy (false);
bool func()
{
if (m_busy.exchange(true) == true)
return false;
// ... do stuff ...
m_busy.exchange(false);
return true;
}
Is the logic for the atomic exchange correct?
Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.

Your atomic swap implementation might work. But trying to do thread-safe programming without a lock is almost always fraught with issues and is often harder to maintain.
Unless you need the performance improvement, std::mutex with the try_lock() method is all you need, e.g.:
std::mutex mtx;
bool func()
{
// making use of std::unique_lock so if the code throws an
// exception, the std::mutex will still get unlocked correctly...
std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
bool gotLock = lck.owns_lock();
if (gotLock)
{
// do stuff
}
return gotLock;
}

Your code looks correct to me, as long as you leave the critical section by falling out, not returning or throwing an exception.
You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire. (But does need to be an atomic RMW like exchange or compare_exchange_strong)
Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing it is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.
Your algo is exactly like a spinlock, but without a retry if the lock's already taken (i.e. just a try_lock). All the reasoning about the necessary memory order for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in @selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or whatever, you're never actually blocking, i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.
Rolling your own with an atomic<bool> is probably fine; using std::mutex here gains you nothing, because all you want from it is try_lock and unlock. A std::mutex can certainly do only that (with some extra function-call overhead), but some implementations might do something more, and you're not using any of the functionality beyond that. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire / release, it's easy to get that right yourself.
The usual performance reason to not roll your own locking is that mutex will be tuned for the OS and typical hardware, with stuff like exponential backoff, x86 pause instructions while spinning a few times, then fallback to a system call. And efficient wakeup via system calls like Linux futex. All of this is only beneficial to the blocking behaviour. .try_lock leaves that all unused, and if you never have any thread sleeping then unlock never has any other threads to notify.
There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this. This will make your function exception-safe, making sure to always unlock before exiting, if it got the lock.
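If you do roll your own, a small RAII wrapper gets the same exception safety. The following is only a sketch: the class name TryBusyGuard is invented here, and it assumes the flag is the m_busy variable from the question. It uses an acquire exchange to take the flag and a plain release store to clear it, as described above.

```cpp
#include <atomic>

std::atomic<bool> m_busy{false};

// Hypothetical RAII helper: tries to take the flag in the constructor and,
// only if that succeeded, clears it again in the destructor.
class TryBusyGuard {
public:
    explicit TryBusyGuard(std::atomic<bool>& flag)
        : flag_(flag),
          // The RMW exchange only needs acquire ordering to take the "lock".
          owns_(!flag.exchange(true, std::memory_order_acquire)) {}

    ~TryBusyGuard() {
        // Unlocking is a plain release store; no RMW is needed.
        if (owns_) flag_.store(false, std::memory_order_release);
    }

    TryBusyGuard(const TryBusyGuard&) = delete;
    TryBusyGuard& operator=(const TryBusyGuard&) = delete;

    bool owns() const { return owns_; }

private:
    std::atomic<bool>& flag_;
    bool owns_;
};

bool func()
{
    TryBusyGuard guard(m_busy);
    if (!guard.owns())
        return false;  // another thread is already inside
    // ... do stuff; an early return or an exception still clears the flag ...
    return true;
}
```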

Can a single thread double-lock a mutex?

If I want a thread to pause for a while, until certain conditions are met, can I double-lock the mutex?
Here's a basic example of the idea I'm working with right now:
class foo
{
public:
static std::mutex processingMutex;
void infinite_processing_loop()
{
processingMutex.lock(); // Lock the mutex, initially
while(true)
{
if ( /*ready for this thread to process data*/ )
{
// ... perform one round of data processing ...
}
else // NOT ready for this thread to process data
{
/* Pause this thread,
and wait for another thread to unlock this mutex
(when more data might be ready for processing) */
processingMutex.lock();
}
}
processingMutex.unlock(); // Will never be executed
}
};
Will the processing thread halt, when it tries to double-lock the mutex?
And can another thread unlock the same mutex, causing the halted processing thread to resume?
Or does std::mutex automatically recognize when mutex is locked twice from the same processing thread?

std::mutex will usually deadlock on a second attempt to lock by the owner thread. And even if it didn't, it's considered a bug for an application to attempt this with this primitive: the standard leaves locking a std::mutex that the calling thread already owns undefined.
std::recursive_mutex will allow re-entrant locks. So if you lock twice, you need to unlock twice before the mutex is available for other threads to grab.
There's a school of thought that any design that involves recursively acquiring a mutex after it's already been locked is a design flaw. I'll try to dig up that thread and add it.
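To illustrate the "lock twice, unlock twice" rule for std::recursive_mutex, here is a small sketch. The helper other_thread_can_lock is a made-up name; it simply probes the mutex with try_lock from a second thread.

```cpp
#include <future>
#include <mutex>

// Hypothetical probe: returns true if a *different* thread can currently
// acquire the mutex. The probe releases the mutex again before returning.
// (try_lock is formally allowed to fail spuriously, so treat a false result
// on a free mutex as possible in theory.)
bool other_thread_can_lock(std::recursive_mutex& m)
{
    return std::async(std::launch::async, [&m] {
        if (!m.try_lock())
            return false;
        m.unlock();
        return true;
    }).get();
}
```

With this helper, after the owner thread calls lock() twice, the probe reports the mutex as unavailable after the first unlock() and as available only after the second one.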

Implement a high performance mutex similar to Qt's one

I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I have used Qt's QMutex for the task, and it works well : the mutex has a negligible overhead.
However, I have to port it to c++11/STL only. When using std::mutex, the performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured out that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked, and falls back to a system mutex when concurrent locking occurs.
I would like to implement this in STL. Is there a simple way based on std::atomic and std::mutex? I have dug into Qt's code but it seems overly complicated for my use (I do not need lock timeouts, pimpl, small footprint etc.).
Edit : I have tried a spinlock, but this does not work well because :
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, so all worker threads get blocked at this time. The spinlocks make the scheduling busy, causing the flush to be 10-100x slower than with a proper mutex. This is not acceptable
Edit : I have tried this, but it's not working (locks all threads)
class Mutex
{
public:
Mutex() : lockCounter(0) { }
void lock()
{
if(lockCounter.fetch_add(1, std::memory_order_acquire)>0)
{
std::unique_lock<std::mutex> lock(internalMutex);
cv.wait(lock);
}
}
void unlock()
{
if(lockCounter.fetch_sub(1, std::memory_order_release)>1)
{
cv.notify_one();
}
}
private:
std::atomic<int> lockCounter;
std::mutex internalMutex;
std::condition_variable cv;
};
Thanks!
Edit : Final solution
MikeMB's fast mutex was working pretty well.
As a final solution, I did:
Use a simple spinlock with a try_lock.
When a thread fails to try_lock, instead of waiting, it appends the result to a queue (which is not shared with other threads) and continues.
When a thread gets the lock, it updates the buffer with the current result and also with the results stored in its queue (it processes its queue).
Buffer flushing was made much more efficient: the blocking part only swaps two pointers.
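The post shows no code for this final solution, so the following is only a rough sketch of the scheme as described. All names (TrySpinLock, SharedBuffer, Producer) and the use of std::vector<double> for results are assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Spinlock reduced to try_lock/unlock: callers never wait on it.
class TrySpinLock {
public:
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct SharedBuffer {
    TrySpinLock lock;
    std::vector<double> data;  // only touched while holding lock
};

// One Producer per worker thread; pending_ is private to that thread.
class Producer {
public:
    explicit Producer(SharedBuffer& buf) : buf_(buf) {}

    // Never blocks: either writes to the shared buffer or queues locally.
    void publish(double result) {
        if (buf_.lock.try_lock()) {
            // Got the lock: first drain the backlog, then add the new result.
            buf_.data.insert(buf_.data.end(), pending_.begin(), pending_.end());
            pending_.clear();
            buf_.data.push_back(result);
            buf_.lock.unlock();
        } else {
            // Lock is busy: remember the result and carry on computing.
            pending_.push_back(result);
        }
    }

    std::size_t pending() const { return pending_.size(); }

private:
    SharedBuffer& buf_;
    std::vector<double> pending_;
};
```

A worker that still has pending results when it finishes would need one last flush, which this sketch leaves out.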

General Advice
As was mentioned in some comments, I'd first check whether you can restructure your program design to make the mutex implementation less critical for your performance.
Also, as multithreading support in standard C++ is pretty new and somewhat immature, you sometimes just have to fall back on platform-specific mechanisms, e.g. a futex on Linux systems or critical sections on Windows, or on non-standard libraries like Qt.
That being said, I can think of two implementation approaches that might speed up your program:
Spinlock
If access collisions happen very rarely and the mutex is only held for short periods of time (two things one should strive for anyway, of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and is simple to implement (taken from cppreference):
class SpinLock {
std::atomic_flag locked = ATOMIC_FLAG_INIT; // without an initializer the flag's state is unspecified before C++20
public:
void lock() {
while (locked.test_and_set(std::memory_order_acquire)) {
std::this_thread::yield(); //<- this is not in the source but might improve performance.
}
}
void unlock() {
locked.clear(std::memory_order_release);
}
};
The drawback of course is that waiting threads don't stay asleep and steal processing time.
Checked Locking
This is essentially the idea you demonstrated: You first make a fast check, whether locking is actually needed based on an atomic swap operation and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
//Status of the fast mutex
std::atomic<bool> locked;
//helper mutex and cv on which threads can wait in case of collision
std::mutex mux;
std::condition_variable cv;
//the maximum number of threads that might be waiting on the cv (conservative estimation)
std::atomic<int> cntr;
FastMux():locked(false), cntr(0){}
void lock() {
if (locked.exchange(true)) {
cntr++;
{
std::unique_lock<std::mutex> ul(mux);
cv.wait(ul, [&]{return !locked.exchange(true); });
}
cntr--;
}
}
void unlock() {
locked = false;
if (cntr > 0){
std::lock_guard<std::mutex> ul(mux);
cv.notify_one();
}
}
};
Note that the std::mutex is not locked in between lock() and unlock(); it is only used for handling the condition variable. This results in more calls to lock / unlock if there is high contention on the mutex.
The problem with your implementation is that cv.notify_one(); can potentially be called between if(lockCounter.fetch_add(1, std::memory_order_acquire)>0) and cv.wait(lock); so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation though so you just have to see what works best for you.

Not really an answer per se, but depending on the specific task, a lock-free queue might help you get rid of the mutex altogether. This would help the design if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
Update regarding the comments:
Queue size / overflow:
Queue overflowing can be avoided by i) making the queue large enough or ii) by making the producer thread wait with pushing data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer to check at regular intervals (polling) whether the queue is non-empty; once it is non-empty, the thread can continuously fetch data and then go back into wait mode.

Is there a facility in boost to allow for write-biased locking?

If I have the following code:
#include <boost/date_time.hpp>
#include <boost/thread.hpp>
boost::shared_mutex g_sharedMutex;
void reader()
{
boost::shared_lock<boost::shared_mutex> lock(g_sharedMutex);
boost::this_thread::sleep(boost::posix_time::seconds(10));
}
void makeReaders()
{
while (1)
{
boost::thread ar(reader);
boost::this_thread::sleep(boost::posix_time::seconds(3));
}
}
boost::thread mr(makeReaders);
boost::this_thread::sleep(boost::posix_time::seconds(5));
boost::unique_lock<boost::shared_mutex> lock(g_sharedMutex);
...
the unique lock will never be acquired, because there are always going to be readers. I want a unique_lock that, when it starts waiting, prevents any new read locks from gaining access to the mutex (called a write-biased or write-preferred lock, based on my wiki searching). Is there a simple way to do this with boost? Or would I need to write my own?

Note that I won't comment on the win32 implementation because it's way more involved and I don't have the time to go through it in detail. That being said, its interface is the same as the pthread implementation, which means that the following answer should be equally valid.
The relevant pieces of the pthread implementation of boost::shared_mutex as of v1.51.0:
void lock_shared()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.exclusive || state.exclusive_waiting_blocked)
{
shared_cond.wait(lk);
}
++state.shared_count;
}
void lock()
{
boost::this_thread::disable_interruption do_not_disturb;
boost::mutex::scoped_lock lk(state_change);
while(state.shared_count || state.exclusive)
{
state.exclusive_waiting_blocked=true;
exclusive_cond.wait(lk);
}
state.exclusive=true;
}
The while loop conditions are the most relevant part for you. For the lock_shared function (read lock), notice how the while loop will not terminate as long as there's a thread that is trying to acquire (state.exclusive_waiting_blocked) or already owns (state.exclusive) the write lock. This essentially means that write locks have priority over read locks.
For the lock function (write lock), the while loop will not terminate as long as there's at least one thread that currently owns the read lock (state.shared_count) or another thread owns the write lock (state.exclusive). This essentially gives you the usual mutual exclusion guarantees.
As for deadlocks, well the read lock will always return as long as the write locks are guaranteed to be unlocked once they are acquired. As for the write lock, it's guaranteed to return as long as the read locks and the write locks are always guaranteed to be unlocked once acquired.
In case you're wondering, the state_change mutex is used to ensure that there's no concurrent calls to either of these functions. I'm not going to go through the unlock functions because they're a bit more involved. Feel free to look them over yourself, you have the source after all (boost/thread/pthread/shared_mutex.hpp) :)
All in all, this is pretty much a textbook implementation, and it has been extensively tested in a wide range of scenarios (libs/thread/test/test_shared_mutex.cpp and massive use across the industry). I wouldn't worry too much as long as you use it idiomatically (no recursive locking and always lock using the RAII helpers). If you still don't trust the implementation, you could write a randomized test that simulates whatever test case you're worried about and let it run overnight on hundreds of threads. That's usually a good way to tease out deadlocks.
Now why would you see that a read lock is acquired after a write lock is requested? Difficult to say without seeing the diagnostic code that you're using. Chances are that the read lock is acquired after your print statement (or whatever you're using) is completed and before state_change lock is acquired in the write thread.

pthreads: thread starvation caused by quick re-locking

I have two threads, one which works in a tight loop, and the other which occasionally needs to perform a synchronization with the first:
// thread 1
while(1)
{
lock(work);
// perform work
unlock(work);
}
// thread 2
while(1)
{
// unrelated work that takes a while
lock(work);
// synchronizing step
unlock(work);
}
My intention is that thread 2 can, by taking the lock, effectively pause thread 1 and perform the necessary synchronization. Thread 1 can also offer to pause, by unlocking, and if thread 2 is not waiting on lock, re-lock and return to work.
The problem I have encountered is that mutexes are not fair, so thread 1 quickly re-locks the mutex and starves thread 2. I have attempted to use pthread_yield, and so far it seems to run okay, but I am not sure it will work for all systems / number of cores. Is there a way to guarantee that thread 1 will always yield to thread 2, even on multi-core systems?
What is the most effective way of handling this synchronization process?

You can build a FIFO "ticket lock" on top of pthreads mutexes, along these lines:
#include <pthread.h>
typedef struct ticket_lock {
pthread_cond_t cond;
pthread_mutex_t mutex;
unsigned long queue_head, queue_tail;
} ticket_lock_t;
#define TICKET_LOCK_INITIALIZER { PTHREAD_COND_INITIALIZER, PTHREAD_MUTEX_INITIALIZER }
void ticket_lock(ticket_lock_t *ticket)
{
unsigned long queue_me;
pthread_mutex_lock(&ticket->mutex);
queue_me = ticket->queue_tail++;
while (queue_me != ticket->queue_head)
{
pthread_cond_wait(&ticket->cond, &ticket->mutex);
}
pthread_mutex_unlock(&ticket->mutex);
}
void ticket_unlock(ticket_lock_t *ticket)
{
pthread_mutex_lock(&ticket->mutex);
ticket->queue_head++;
pthread_cond_broadcast(&ticket->cond);
pthread_mutex_unlock(&ticket->mutex);
}
Under this kind of scheme, no low-level pthreads mutex is held while a thread is within the ticketlock protected critical section, allowing other threads to join the queue.

In your case it is better to use a condition variable to notify the second thread when it needs to wake up and perform the required operations.

pthread offers a notion of thread priority in its API. When two threads are competing over a mutex, the scheduling policy determines which one will get it. The function pthread_attr_setschedpolicy lets you set that, and pthread_attr_getschedpolicy permits retrieving the information.
Now the bad news:
When only two threads are locking / unlocking a mutex, I can't see any sort of competition: the first one to run the atomic instruction takes it, and the other blocks. I am not sure whether this attribute applies here.
The function can take different parameters (SCHED_FIFO, SCHED_RR, SCHED_OTHER and SCHED_SPORADIC), but according to an answer to this question, only SCHED_OTHER was supported on Linux.
So I would give it a shot if I were you, but not expect too much. pthread_yield seems more promising to me. More information available here.

The ticket lock above looks like the best option. However, to ensure your pthread_yield works, you could have a bool waiting, which is set and reset by thread 2. Thread 1 yields as long as waiting is set.
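A rough sketch of that flag idea, written with C++ standard threads instead of raw pthreads. The function names worker_step and synchronize are invented for illustration, and the flag is an atomic rather than a plain bool so that both threads can access it safely.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::mutex work;
std::atomic<bool> waiting{false};  // set by thread 2 while it wants the lock

// One iteration of thread 1's loop.
void worker_step()
{
    // Stay out of the way while thread 2 is asking for the lock.
    while (waiting.load(std::memory_order_acquire))
        std::this_thread::yield();
    std::lock_guard<std::mutex> guard(work);
    // perform work
}

// Thread 2's synchronizing step; `step` is whatever needs the lock.
template <typename F>
void synchronize(F&& step)
{
    waiting.store(true, std::memory_order_release);
    std::lock_guard<std::mutex> guard(work);
    // We now hold the lock, so thread 1 may queue up on the mutex again.
    waiting.store(false, std::memory_order_release);
    step();
}
```

Thread 1 can still win the lock once more if it has already passed the check, so thread 2 may wait for roughly one extra iteration, but it should not be starved indefinitely.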

Here's a simple solution which will work for your case (two threads). If you're using std::mutex then this class is a drop-in replacement. Change your mutex to this type and you are guaranteed that if one thread holds the lock and the other is waiting on it, once the first thread unlocks, the second thread will grab the lock before the first thread can lock it again.
If more than two threads happen to use the mutex simultaneously it will still function but there are no guarantees on fairness.
If you're using plain pthread_mutex_t you can easily change your locking code according to this example (unlock remains unchanged).
#include <mutex>
// Behaves the same as std::mutex but guarantees fairness as long as
// up to two threads are using (holding/waiting on) it.
// When one thread unlocks the mutex while another is waiting on it,
// the other is guaranteed to run before the first thread can lock it again.
class FairDualMutex : public std::mutex {
public:
void lock() {
_fairness_mutex.lock();
std::mutex::lock();
_fairness_mutex.unlock();
}
private:
std::mutex _fairness_mutex;
};
