In many places there is a suggestion to use std::defer_lock:
{
std::unique_lock<std::mutex> lk1(mutex1, std::defer_lock);
std::unique_lock<std::mutex> lk2(mutex2, std::defer_lock);
std::lock(lk1, lk2);
//Do some stuff
}
But what if I don't use std::defer_lock, and simply do this:
{
std::unique_lock<std::mutex> lk1(mutex1);
std::unique_lock<std::mutex> lk2(mutex2);
//Do some stuff
}
As far as I know, this code will lock mutex1 and then mutex2, and unlock them in the reverse order: mutex2, then mutex1. There is no reason for it to deadlock.
The documentation says that std::lock(lk1, lk2); in the first example locks both mutexes simultaneously. So, is this some sort of optimization?
The deadlock issue is a problem of the order in which several mutexes are locked. The order may be apparent in simple applications, but it is very difficult to keep consistent in complex ones, where the mutexes are not always locked in the same function and are usually not named lk1, lk2, ... If two threads each hold one mutex and wait for the other's, neither can proceed, so neither can ever unlock the mutex it holds. By locking both mutexes "simultaneously" with std::lock, you never create that situation.
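For illustration, a minimal sketch (with hypothetical thread functions) of the inconsistent-ordering deadlock described above:

#include <mutex>

std::mutex mutex1, mutex2;

void threadA() {
    std::unique_lock<std::mutex> lk1(mutex1); // A holds mutex1...
    std::unique_lock<std::mutex> lk2(mutex2); // ...and waits for mutex2
}

void threadB() {
    std::unique_lock<std::mutex> lk2(mutex2); // B holds mutex2...
    std::unique_lock<std::mutex> lk1(mutex1); // ...and waits for mutex1
}

// If threadA and threadB run concurrently and each acquires its first
// mutex before the other's second acquisition, both block forever.
// std::lock (or C++17 std::scoped_lock) avoids this interleaving.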
You might want to avoid mixing mutexes, but that can lead to other tricky situations. When possible, it is best to let threads work on independent state and let one controlling thread accumulate the results.
Related
I'd like to write a function that is accessible only by a single thread at a time. I don't need busy waits, a brutal 'rejection' is enough if another thread is already running it. This is what I have come up with so far:
#include <atomic>

std::atomic<bool> busy(false);

bool func()
{
    if (busy.exchange(true) == true)
        return false;
    // ... do stuff ...
    busy.exchange(false);
    return true;
}
Is the logic for the atomic exchange correct?
Is it correct to mark the two atomic operations as std::memory_order_acq_rel? As far as I understand, a relaxed ordering (std::memory_order_relaxed) wouldn't be enough to prevent reordering.
Your atomic swap implementation might work. But trying to do thread-safe programming without a lock is almost always fraught with issues and is often harder to maintain.
Unless you need the performance improvement, std::mutex with the try_lock() method is all you need, e.g.:
#include <mutex>

std::mutex mtx;

bool func()
{
    // making use of std::unique_lock so if the code throws an
    // exception, the std::mutex will still get unlocked correctly...
    std::unique_lock<std::mutex> lck(mtx, std::try_to_lock);
    bool gotLock = lck.owns_lock();
    if (gotLock)
    {
        // do stuff
    }
    return gotLock;
}
Your code looks correct to me, as long as you leave the critical section by falling out, not returning or throwing an exception.
You can unlock with a release store; an RMW (like exchange) is unnecessary. The initial exchange only needs acquire (but it does need to be an atomic RMW like exchange or compare_exchange_strong).
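A sketch of the question's code with those memory orders applied (same names as in the question):

#include <atomic>

std::atomic<bool> busy(false);

bool func()
{
    // acquire RMW: pairs with the release store that "unlocks"
    if (busy.exchange(true, std::memory_order_acquire))
        return false;          // another thread is already running it
    // ... do stuff ...
    // a plain release store is enough to unlock; no RMW needed
    busy.store(false, std::memory_order_release);
    return true;
}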
Note that ISO C++ says that taking a std::mutex is an "acquire" operation, and releasing it is a "release" operation, because that's the minimum necessary for keeping the critical section contained between the taking and the releasing.
Your algo is exactly like a spinlock, but without retry if the lock's already taken. (i.e. just a try_lock). All the reasoning about necessary memory-order for locking applies here, too. What you've implemented is logically equivalent to the try_lock / unlock in #selbie's answer, and very likely performance-equivalent, too. If you never use mtx.lock() or whatever, you're never actually blocking i.e. waiting for another thread to do something, so your code is still potentially lock-free in the progress-guarantee sense.
Rolling your own with an atomic<bool> is probably fine; using std::mutex here gains you nothing, because all you want from it is exactly this try-lock and unlock behaviour. That's certainly possible (with some extra function-call overhead), but some implementations might do something more, and you're not using any of the functionality beyond that. The one nice thing std::mutex gives you is the comfort of knowing that it safely and correctly implements try_lock and unlock. But if you understand locking and acquire/release, it's easy to get that right yourself.
The usual performance reason to not roll your own locking is that a mutex will be tuned for the OS and typical hardware, with things like exponential backoff, x86 pause instructions while spinning a few times, then a fallback to a system call, and efficient wakeup via system calls like Linux futex. All of this only benefits the blocking behaviour; try_lock leaves it all unused, and if no thread ever sleeps, unlock never has other threads to notify.
There is one advantage to using std::mutex: you can use RAII without having to roll your own wrapper class. std::unique_lock with the std::try_to_lock policy will do this. This will make your function exception-safe, making sure to always unlock before exiting, if it got the lock.
From cppreference's example for the lock tags (https://en.cppreference.com/w/cpp/thread/lock_tag):
void transfer(bank_account &from, bank_account &to, int amount)
{
    // lock both mutexes without deadlock
    std::lock(from.m, to.m);
    // make sure both already-locked mutexes are unlocked at the end of scope
    std::lock_guard<std::mutex> lock1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(to.m, std::adopt_lock);

    // equivalent approach:
    // std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
    // std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);
    // std::lock(lock1, lock2);

    from.balance -= amount;
    to.balance += amount;
}
What do they gain by locking two mutexes at once?
What have they gained by the deferred lock here?
Please explain the reason behind that decision of theirs.
If I modify a bank account without locking it, someone else could try to modify it at the same time. This is a race and the result will be undefined behaviour (usually lost or magically created money).
While transferring money, I am modifying 2 bank accounts. So they both need to be locked.
The problem is that when locking more than one thing, every locker must lock and unlock in the same order, otherwise we get deadlocks.
When it's bank accounts, there is no natural order of locks. Thousands of threads could be transferring money in all directions.
So we need a method of locking more than one mutex in a way that works around this - this is std::lock
std::lock merely locks the mutexes - it does not guarantee unlocking on exit from the current code block.
std::lock_guard<> unlocks the mutex it refers to upon destruction (see RAII). This makes the code behave correctly in all circumstances, even when an exception causes an early exit from the current code block without the flow ever reaching a statement such as to.m.unlock().
A good explanation (with examples) here: https://wiki.sei.cmu.edu/confluence/display/cplusplus/CON53-CPP.+Avoid+deadlock+by+locking+in+a+predefined+order
Extension on Richard Hodges's answer
What do they gain by locking two mutexes at once?
Richard explained it nicely already; just to be a little more explicit: we avoid deadlock this way (std::lock is implemented such that deadlock won't occur).
What have they gained by deferred lock here?
Deferring the lock means not acquiring it immediately. That's important because if the constructors did acquire it, they would do so without any protection against deadlock (which the subsequent std::lock then provides).
About deadlock avoidance (from the std::lock documentation):
Locks the given Lockable objects lock1, lock2, ..., lockn using a deadlock avoidance algorithm to avoid deadlock.
The objects are locked by an unspecified series of calls to lock, try_lock, and unlock. [...]
Side note: another, much simpler algorithm for avoiding deadlocks is to always lock the bank account with, e.g., the lower account number (AN) first. If a thread is waiting for the lock on the higher AN, then the other thread holding it either already has both locks acquired or is waiting for its second one - which cannot be the first thread's lock, as that second lock must have a yet higher AN.
This does not change much for an arbitrary number of threads: any thread that holds a lower lock and waits, waits for a higher one. If you draw a directed graph with an edge from A to B whenever A is waiting for a second lock that B holds, you get a (multi-)tree structure, and you never get circular substructures (which would indicate a deadlock).
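A minimal sketch of that ordering scheme (assuming bank_account has a unique number field; that field name is made up for illustration, and from and to are assumed to be distinct accounts):

void transfer(bank_account &from, bank_account &to, int amount)
{
    // Always lock the account with the lower number first, so every
    // thread acquires any pair of account locks in the same global order.
    bank_account &first  = from.number < to.number ? from : to;
    bank_account &second = from.number < to.number ? to : from;
    std::lock_guard<std::mutex> lock1(first.m);
    std::lock_guard<std::mutex> lock2(second.m);
    from.balance -= amount;
    to.balance += amount;
}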
The bank account data structure has a lock for each account.
When transferring money from one account to another, we need to lock both accounts (since we are removing money from one and adding it to another). We would like this operation not to deadlock, so lock both at once using std::lock, since doing it like that ensures there isn't a deadlock.
After we finish the transaction we need to make sure we release the lock. This code does that using RAII. With the adopt_lock tag, we make the object adopt an already locked mutex (which will be released when lock1 falls out of scope).
With the defer_lock tag, we create a unique_lock for a currently unlocked mutex, with the intention of locking it later. Again, it will be unlocked when the unique_lock falls out of scope.
from and to are 2 accounts which may be used anywhere in the application separately.
By having a mutex for each account, you make sure nobody else uses from or to while you do the transfer.
lock_guard will release the mutex when exiting the function.
I'm trying to solve the producer consumer problem in C++11.
I have an object which holds resources, and multiple threads can
add or consume those resources. My problem is when I try to implement
a "consume when available" method on that object.
Please assume that insert/remove operations are of trivial complexity.
A little explanation of the logic, in code:
template <typename T>
struct ResourceManager {
    std::mutex mux;
    std::unique_lock<std::mutex> lock{mux};
    std::condition_variable bell;

    void addResource(/*some Resource*/)
    {
        lock.lock();
        // add resource
        lock.unlock();
        bell.notify_one(); // notifies waiting consumer threads to consume
    }

    T getResource()
    {
        while (true) {
            lock.lock();
            if (/*resource is available*/) {
                // remove resource from the object
                lock.unlock();
                return resource;
            } else {
                // a new unique_lock object wmux is created here
                lock.unlock(); // problem line
                bell.wait(wmux); // waits until addResource rings the bell
                continue;
            }
        }
    }
};
Suppose the following scenario:
- Two threads, T1 and T2, call addResource and getResource almost simultaneously.
- T2 locks the mutex and finds out there are no more resources available, so it has to block until a new resource is available. So it unlocks the mutex and sets up the bell waiting.
- T1 runs much faster. When the mutex is unlocked, it immediately adds the resource, and before T2 sets up the waiting bell, T1 has already rung the bell, which notifies no one.
- T2 waits indefinitely for the bell to ring, but no further resources are added.
I'm making the assumption that a thread locking a mutex may be the only one allowed to unlock it. So if I tried calling bell.wait before unlocking the mutex, the mutex could never be unlocked.
I'd like to avoid timed waits and repeated-checking solutions if possible.
So how can I solve this problem in C++11?
lock.unlock(); //problem line
bell.wait(wmux); //waits until addResource rings the bell
Yes, this is the problem line, indeed.
To correctly use a condition variable as designed, you do not unlock a mutex before wait()ing on its related condition variable. wait()ing on a condition variable atomically unlocks it for the duration of the wait, and reacquires the mutex once the thread has been notify()-ed. Both the unlock-and-wait, and wake-up-after-being-notified-and-lock, are atomic operations.
All notify()s should be issued while the mutex is locked. All wait()s are also done while the mutex is fully locked. Given that wait(), as mentioned, unlocks and reacquires atomically, this results in all mutex-related operations being atomic and fully sequenced, including both managing the resources protected by the mutex and the thread notification via condition variables, which is now protected by the mutex as well.
There are design patterns that can be implemented that notify condition variables without using mutex protection. But they're much harder to implement correctly and still achieve thread-safe semantics. Having all condition variable operations also protected by the mutex, in addition to everything else that the mutex protects, is much simpler to implement.
std::condition_variable::wait needs to be passed a locked std::unique_lock on your mutex. wait will unlock the mutex as part of its operation, and will re-lock it before it returns.
The normal way to use lock guards like std::lock_guard and std::unique_lock is to construct them locally and let their constructor lock your mutex and their destructor unlock it.
Also, you can avoid the external while loop in your original code by providing a predicate to std::condition_variable::wait.
template <typename T>
struct ResourceManager {
    std::mutex mux;
    std::condition_variable bell;

    void addResource(T resource)
    {
        std::lock_guard<std::mutex> lock{mux};
        // Add the resource
        bell.notify_one();
    }

    T getResource()
    {
        std::unique_lock<std::mutex> lock{mux};
        bell.wait(lock, [this]() { return resourceIsAvailable(); });
        return // the resource
    }
};
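A hypothetical usage sketch, once the elided parts (the storage, resourceIsAvailable(), the actual return) are filled in; the names here are made up:

#include <thread>

int main()
{
    ResourceManager<int> manager;

    // producer adds a resource; consumer blocks in wait() until it arrives
    std::thread producer([&] { manager.addResource(42); });
    std::thread consumer([&] { int r = manager.getResource(); /* use r */ (void)r; });

    producer.join();
    consumer.join();
}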
I have a multi-thread scientific application where several computing threads (one per core) have to store their results in a common buffer. This requires a mutex mechanism.
Working threads spend only a small fraction of their time writing to the buffer, so the mutex is unlocked most of the time, and locks have a high probability to succeed immediately without waiting for another thread to unlock.
Currently, I have used Qt's QMutex for the task, and it works well : the mutex has a negligible overhead.
However, I have to port it to c++11/STL only. When using std::mutex, the performance drops by 66% and the threads spend most of their time locking the mutex.
After another question, I figured out that Qt uses a fast locking mechanism based on a simple atomic flag, optimized for cases where the mutex is not already locked, and falls back to a system mutex when concurrent locking occurs.
I would like to implement this with the STL only. Is there a simple way based on std::atomic and std::mutex? I have dug into Qt's code, but it seems overly complicated for my use (I do not need lock timeouts, pimpl, a small footprint, etc.).
Edit: I have tried a spinlock, but this does not work well because:
Periodically (every few seconds), another thread locks the mutexes and flushes the buffer. This takes some time, during which all worker threads are blocked. The spinlocks keep the scheduler busy, making the flush 10-100x slower than with a proper mutex. This is not acceptable.
Edit: I have tried this, but it's not working (it blocks all threads):
class Mutex
{
public:
    Mutex() : lockCounter(0) {}

    void lock()
    {
        if (lockCounter.fetch_add(1, std::memory_order_acquire) > 0)
        {
            std::unique_lock<std::mutex> lock(internalMutex);
            cv.wait(lock);
        }
    }

    void unlock()
    {
        if (lockCounter.fetch_sub(1, std::memory_order_release) > 1)
        {
            cv.notify_one();
        }
    }

private:
    std::atomic<int> lockCounter;
    std::mutex internalMutex;
    std::condition_variable cv;
};
Thanks!
Edit: Final solution
MikeMB's fast mutex was working pretty well.
As a final solution, I did:
- Use a simple spinlock with a try_lock.
- When a thread fails the try_lock, instead of waiting it pushes its result onto a queue private to that thread (not shared with other threads) and continues.
- When a thread gets the lock, it updates the buffer with its current result, and also with the results stored in its private queue (it drains the queue). A rough sketch follows below.
- Buffer flushing was made much more efficient: the blocking part only swaps two pointers.
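A rough sketch of that scheme (Result, sharedBuffer and backlog are illustrative names; the "spinlock" is a bare std::atomic_flag used in try-lock fashion):

#include <atomic>
#include <vector>

using Result = double;                        // placeholder result type
std::atomic_flag bufferLocked = ATOMIC_FLAG_INIT;
std::vector<Result> sharedBuffer;             // protected by bufferLocked
thread_local std::vector<Result> backlog;     // private to each worker thread

void publish(const Result& r)
{
    // try_lock: test_and_set returns the previous flag value
    if (bufferLocked.test_and_set(std::memory_order_acquire)) {
        backlog.push_back(r);                 // contended: park it, keep computing
        return;
    }
    sharedBuffer.push_back(r);                // uncontended: store this result...
    sharedBuffer.insert(sharedBuffer.end(),   // ...and drain the private backlog
                        backlog.begin(), backlog.end());
    backlog.clear();
    bufferLocked.clear(std::memory_order_release);
}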
General Advice
As was mentioned in some comments, I'd first have a look at whether you can restructure your program design to make the mutex implementation less critical for your performance.
Also, as multithreading support in standard C++ is pretty new and somewhat immature, you sometimes just have to fall back on platform-specific mechanisms, like e.g. a futex on Linux systems or critical sections on Windows, or on non-standard libraries like Qt.
That being said, I could think of two implementation approaches that might potentially speed up your program:
Spinlock
If access collisions happen very rarely, and the mutex is only held for short periods of time (two things one should strive for anyway, of course), it might be most efficient to just use a spinlock, as it doesn't require any system calls at all and is simple to implement (adapted from cppreference):
class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield(); // <- not in the source, but might improve performance
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};
The drawback, of course, is that waiting threads don't stay asleep and so steal processing time.
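If a non-blocking attempt is wanted as well (as in the asker's final solution above), a try_lock can be added to the SpinLock class along the same lines; a possible sketch:

    bool try_lock() {
        // true iff the flag was previously clear, i.e. we acquired the lock
        return !locked.test_and_set(std::memory_order_acquire);
    }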
Checked Locking
This is essentially the idea you demonstrated: you first make a fast check of whether locking is actually needed, based on an atomic swap operation, and use a heavy std::mutex only if it is unavoidable.
struct FastMux {
    // status of the fast mutex
    std::atomic<bool> locked;
    // helper mutex and cv on which threads can wait in case of collision
    std::mutex mux;
    std::condition_variable cv;
    // the maximum number of threads that might be waiting on the cv (conservative estimate)
    std::atomic<int> cntr;

    FastMux() : locked(false), cntr(0) {}

    void lock() {
        if (locked.exchange(true)) {
            cntr++;
            {
                std::unique_lock<std::mutex> ul(mux);
                cv.wait(ul, [&] { return !locked.exchange(true); });
            }
            cntr--;
        }
    }

    void unlock() {
        locked = false;
        if (cntr > 0) {
            std::lock_guard<std::mutex> ul(mux);
            cv.notify_one();
        }
    }
};
Note that the std::mutex is not locked between lock() and unlock(); it is only used for handling the condition variable. This results in more calls to lock/unlock if there is high contention on the mutex.
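Like the spinlock above, FastMux meets the BasicLockable requirements (lock()/unlock()), so it can be used with the standard RAII wrappers, e.g.:

FastMux fm;

void appendToBuffer()
{
    std::lock_guard<FastMux> guard(fm); // the fast path costs one atomic exchange
    // ... write to the shared buffer ...
}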
The problem with your implementation is that cv.notify_one(); can potentially be called between if (lockCounter.fetch_add(1, std::memory_order_acquire) > 0) and cv.wait(lock);, so your thread might never wake up.
I didn't do any performance comparisons against a fixed version of your proposed implementation, though, so you will just have to see what works best for you.
Not really an answer by the usual definition, but depending on the specific task, a lock-free queue might help to get rid of the mutex altogether. This would also help the design if you have multiple producers and a single consumer (or even multiple consumers). Links:
Though not directly C++/STL, Boost.Lockfree provides such a queue.
Another option is the lock-free queue implementation in "C++ Concurrency in Action" by Anthony Williams.
A Fast Lock-Free Queue for C++
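For instance, a minimal sketch with Boost.Lockfree (the first link); the names are illustrative, and the queue has fixed capacity and holds trivially-copyable items:

#include <boost/lockfree/queue.hpp>

boost::lockfree::queue<int> results(1024); // fixed-capacity, lock-free queue

// any producer (worker) thread:
void produce(int value)
{
    while (!results.push(value)) {
        // push fails only when the queue is full; back off or retry as needed
    }
}

// consumer thread:
void consume()
{
    int value;
    while (results.pop(value)) {
        // ... process value ...
    }
}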
Update with regard to the comments:
Queue size / overflow:
Queue overflow can be avoided by (i) making the queue large enough, or (ii) making the producer thread wait with pushing data once the queue is full.
Another option would be to use multiple consumers and multiple queues and implement a parallel reduction, but this depends on how the data is treated.
Consumer thread:
The queue could use std::condition_variable and make the consumer thread wait until there is data.
Another option would be to use a timer to check at regular intervals (polling) whether the queue is non-empty; once it is non-empty, the thread can continuously fetch data and then go back into wait mode.
I'm wondering if it's possible to lock multiple mutexes at the same time, like:
Mutex1.Lock();
{
    Mutex2.Lock();
    {
        // Code locked by mutex 1 and 2.
    }
    Mutex2.Unlock();

    // Code locked by mutex 1.
}
Mutex1.Unlock();
It would be very useful for some situations. Thanks.
std::lock seems to exist for this purpose.
Locks the given Lockable objects lock1, lock2, ..., lockn using a deadlock avoidance algorithm to avoid deadlock.
The objects are locked by an unspecified series of calls to lock, try_lock, unlock. If a call to lock or unlock results in an exception, unlock is called for any locked objects before rethrowing.
http://en.cppreference.com/w/cpp/thread/lock
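Applied to the pseudo-code in the question (assuming Mutex1 and Mutex2 are std::mutex), that might look like:

#include <mutex>

std::mutex Mutex1, Mutex2;

void f()
{
    std::lock(Mutex1, Mutex2); // acquires both without risking deadlock
    std::lock_guard<std::mutex> lk1(Mutex1, std::adopt_lock);
    std::lock_guard<std::mutex> lk2(Mutex2, std::adopt_lock);
    // Code locked by mutex 1 and 2.
} // both unlocked here, in reverse order of construction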
C++17 also provides std::scoped_lock for the specific purpose of locking multiple mutexes: it prevents deadlock in RAII style, similar to lock_guard.
#include <mutex>

std::mutex mtx1, mtx2;

void foo()
{
    std::scoped_lock lck{mtx1, mtx2};
    // proceed
}
It is possible, but the order of locking must be consistent throughout the application; otherwise deadlock is a likely result (if two threads acquire the locks in opposite orders, each thread could end up waiting on the other to release one of the locks).
I recommend using a scoped lock-and-unlock facility for exception safety, to ensure the locks are always released (std::lock_guard with std::mutex, for example):
std::mutex mtx1;
std::mutex mtx2;

void f()
{
    std::lock_guard<std::mutex> mtx1_lock(mtx1);
    {
        std::lock_guard<std::mutex> mtx2_lock(mtx2);
        // both mutexes held here
    }
    // only mtx1 held here
}
If your compiler does not support these C++11 features, Boost offers similar facilities in boost::mutex and boost::lock_guard.
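A minimal sketch of the Boost equivalent (assuming Boost.Thread is available and linked):

#include <boost/thread.hpp>

boost::mutex mtx1;
boost::mutex mtx2;

void f()
{
    boost::lock_guard<boost::mutex> mtx1_lock(mtx1);
    {
        boost::lock_guard<boost::mutex> mtx2_lock(mtx2);
        // both mutexes held here
    }
    // only mtx1 held here
}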