(Note: Much of this is redundant with commentary on Massive CPU load using std::lock (c++11), but I think this topic deserves its own question and answers.)
I recently encountered some sample C++11 code that looked something like this:
std::unique_lock<std::mutex> lock1(from_acct.mutex, std::defer_lock);
std::unique_lock<std::mutex> lock2(to_acct.mutex, std::defer_lock);
std::lock(lock1, lock2); // avoid deadlock
transfer_money(from_acct, to_acct, amount);
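For context, here is a hedged reconstruction of the surrounding code that snippet implies; the Account struct and the function bodies are my assumptions, not part of the original sample:

#include <mutex>

struct Account {                 // assumed shape of from_acct/to_acct
    std::mutex mutex;
    long balance = 0;
};

void transfer_money(Account& from, Account& to, long amount) {
    from.balance -= amount;      // placeholder body
    to.balance += amount;
}

void transfer(Account& from_acct, Account& to_acct, long amount) {
    std::unique_lock<std::mutex> lock1(from_acct.mutex, std::defer_lock);
    std::unique_lock<std::mutex> lock2(to_acct.mutex, std::defer_lock);
    std::lock(lock1, lock2); // avoid deadlock
    transfer_money(from_acct, to_acct, amount);
}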
Wow, I thought, std::lock sounds interesting. I wonder what the standard says it does?
C++11 section 30.4.3 [thread.lock.algorithm], paragraphs (4) and (5):
template <class L1, class L2, class... L3> void lock(L1&, L2&, L3&...);
4 Requires: Each template parameter type shall meet the Lockable
requirements, [ Note: The unique_lock class template meets these
requirements when suitably instantiated. — end note ]
5 Effects: All arguments are locked via a sequence of calls to lock(),
try_lock(), or unlock() on each argument. The sequence of calls shall
not result in deadlock, but is otherwise unspecified. [ Note: A
deadlock avoidance algorithm such as try-and-back-off must be used, but
the specific algorithm is not specified to avoid over-constraining
implementations. — end note ] If a call to lock() or try_lock() throws
an exception, unlock() shall be called for any argument that had been
locked by a call to lock() or try_lock().
Consider the following example. Call it "Example 1":
Thread 1 Thread 2
std::lock(lock1, lock2); std::lock(lock2, lock1);
Can this deadlock?
A plain reading of the standard says "no". Great! Maybe the compiler can order my locks for me, which would be kind of neat.
Now try Example 2:
Thread 1 Thread 2
std::lock(lock1, lock2, lock3, lock4); std::lock(lock3, lock4);
std::lock(lock1, lock2);
Can this deadlock?
Here again, a plain reading of the standard says "no". Uh oh. The only way to do that is with some kind of back-off-and-retry loop. More on that below.
Finally, Example 3:
Thread 1 Thread 2
std::lock(lock1,lock2); std::lock(lock3,lock4);
std::lock(lock3,lock4); std::lock(lock1,lock2);
Can this deadlock?
Once again, a plain reading of the standard says "no". (If the "sequence of calls to lock()" in one of these invocations is not "resulting in deadlock", what is, exactly?) However, I am pretty sure this is unimplementable, so I suppose it's not what they meant.
This appears to be one of the worst things I have ever seen in a C++ standard. I am guessing it started out as an interesting idea: Let the compiler assign a lock ordering. But once the committee chewed it up, the result is either unimplementable or requires a retry loop. And yes, that is a bad idea.
You can argue that "back off and retry" is sometimes useful. That is true, but only when you do not know which locks you are trying to grab up front. For example, if the identity of the second lock depends on data protected by the first (say because you are traversing some hierarchy), then you might have to do some grab-release-grab spinning. But in that case you cannot use this gadget, because you do not know all of the locks up front. On the other hand, if you do know which locks you want up front, then you (almost) always want simply to impose an ordering, not to loop.
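For illustration, here is a minimal sketch of that "impose an ordering" alternative, assuming two distinct mutexes known up front (std::less is used because raw pointer comparison between unrelated objects is unspecified):

#include <functional>
#include <mutex>

void lock_ordered(std::mutex& a, std::mutex& b)
{
    // Precondition: &a != &b. std::less gives a total order over pointers.
    if (std::less<std::mutex*>()(&a, &b)) { a.lock(); b.lock(); }
    else                                  { b.lock(); a.lock(); }
}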
Also, note that Example 1 can live-lock if the implementation simply grabs the locks in order, backs off, and retries.
In short, this gadget strikes me as useless at best. Just a bad idea all around.
OK, questions. (1) Are any of my claims or interpretations wrong? (2) If not, what the heck were they thinking? (3) Should we all agree that "best practice" is to avoid std::lock completely?
[Update]
Some answers say I am misinterpreting the standard, then go on to interpret it the same way I did, then confuse the specification with the implementation.
So, just to be clear:
In my reading of the standard, Example 1 and Example 2 cannot deadlock. Example 3 can, but only because avoiding deadlock in that case is unimplementable.
The entire point of my question is that avoiding deadlock for Example 2 requires a back-off-and-retry loop, and such loops are extremely poor practice. (Yes, some sort of static analysis on this trivial example could make that avoidable, but not in the general case.) Also note that GCC implements this thing as a busy loop.
[Update 2]
I think a lot of the disconnect here is a basic difference in philosophy.
There are two approaches to writing software, especially multi-threaded software.
In one approach, you throw a bunch of stuff together and run it to see how well it works. You are never convinced that your code has a problem unless someone can demonstrate that problem on a real system, right now, today.
In the other approach, you write code that can be rigorously analyzed to prove that it has no data races, that all of its loops terminate with probability 1, and so forth. You perform this analysis strictly within the machine model guaranteed by the language spec, not on any particular implementation.
Advocates of the latter approach are not impressed by any demonstrations on particular CPUs, compilers, compiler minor versions, operating systems, runtimes, etc. Such demonstrations are barely interesting and totally irrelevant. If your algorithm has a data race, it is broken, no matter what happens when you run it. If your algorithm has a livelock, it is broken, no matter what happens when you run it. And so forth.
In my world, the second approach is called "Engineering". I am not sure what the first approach is called.
As far as I can tell, the std::lock interface is useless for Engineering. I would love to be proven wrong.
I think you are misunderstanding the scope of the deadlock avoidance. That's understandable since the text seems to mention lock in two different contexts, the "multi-lock" std::lock and the individual locks carried out by that "multi-lock" (however the lockables implement it). The text for std::lock states:
All arguments are locked via a sequence of calls to lock(), try_lock(), or unlock() on each argument. The sequence of calls shall not result in deadlock
If you call std::lock passing ten different lockables, the standard guarantees no deadlock for that call. It's not guaranteed that deadlock is avoided if you lock the lockables outside the control of std::lock. That means thread 1 locking A then B can deadlock against thread 2 locking B then A. That was the case in your original third example, which had (pseudo-code):
Thread 1 Thread 2
lock A lock B
lock B lock A
As that couldn't have been std::lock (it only locked one resource), it must have been something like unique_lock.
The deadlock avoidance will occur if both threads attempt to lock A/B and B/A in a single call to std::lock, as per your first example. Your second example won't deadlock either, since thread 1 will back off if the second lock is needed by thread 2, which already holds the first lock. Your updated third example:
Thread 1 Thread 2
std::lock(lock1,lock2); std::lock(lock3,lock4);
std::lock(lock3,lock4); std::lock(lock1,lock2);
still has the possibility of deadlock since the atomicity of the lock is a single call to std::lock. For example, if thread 1 successfully locks lock1 and lock2, then thread 2 successfully locks lock3 and lock4, deadlock will ensue as both threads attempt to lock the resource held by the other.
So, in answer to your specific questions:
1/ Yes, I think you've misunderstood what the standard is saying. The sequence it talks about is clearly the sequence of locks carried out on the individual lockables passed to a single std::lock.
2/ As to what they were thinking, it's sometimes hard to tell :-) But I would posit that they wanted to give us capabilities that we would otherwise have to write ourselves. Yes, back-off-and-retry may not be an ideal strategy but, if you need the deadlock avoidance functionality, you may have to pay the price. Better for the implementation to provide it rather than it having to be written over and over again by developers.
3/ No, there's no need to avoid it. I don't think I've ever found myself in a situation where simple manual ordering of locks wasn't possible but I don't discount the possibility. If you do find yourself in that situation, this can assist (so you don't have to code up your own deadlock avoidance stuff).
In regard to the comments that back-off-and-retry is a problematic strategy, yes, that's correct. But you may be missing the point that it may be necessary if, for example, you cannot enforce the ordering of the locks before-hand.
And it doesn't have to be as bad as you think. Because the locks can be done in any order by std::lock, there's nothing stopping the implementation from re-ordering after each backoff to bring the "failing" lockable to the front of the list. That would mean those that were locked would tend to gather at the front, so that the std::lock would be less likely to be claiming resources unnecessarily.
Consider the call std::lock (a, b, c, d, e, f) in which f was the only lockable that was already locked. In the first lock attempt, that call would lock a through e then "fail" on f.
Following the back-off (unlocking a through e), the list to lock would be changed to f, a, b, c, d, e so that subsequent iterations would be less likely to unnecessarily lock. That's not fool-proof since other resources may be locked or unlocked between iterations, but it tends towards success.
In fact, it may even order the list initially by checking the states of all lockables so that all those currently locked are up the front. That would start the "tending toward success" operation earlier in the process.
That's just one strategy, there may well be others, even better. That's why the standard didn't mandate how it was to be done, on the off-chance there may be some genius out there who comes up with a better way.
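To make the rotation idea concrete, here is a rough sketch (my own, not any real implementation) using a runtime list of mutex pointers instead of the standard's variadic signature:

#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

void lock_all(std::vector<std::mutex*>& ms)
{
    for (;;)
    {
        ms[0]->lock();                        // block only while holding nothing
        std::size_t i = 1;
        while (i < ms.size() && ms[i]->try_lock())
            ++i;
        if (i == ms.size())
            return;                           // acquired them all
        for (std::size_t j = 0; j < i; ++j)   // back off: release everything...
            ms[j]->unlock();
        std::rotate(ms.begin(), ms.begin() + i, ms.end()); // ...and retry with
    }                                         // the failing mutex first
}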
Perhaps it would help if you thought of each individual call to std::lock(x, y, ...) as atomic. It will block until it can lock all of its arguments. If you don't know all of the mutexes you need to lock a-priori, do not use this function. If you do know, then you can safely use this function, without having to order your locks.
But by all means order your locks if that is what you prefer to do.
Thread 1 Thread 2
std::lock(lock1, lock2); std::lock(lock2, lock1);
The above will not deadlock. One of the threads will get both locks, and the other thread will block until the first one has released the locks.
Thread 1 Thread 2
std::lock(lock1, lock2, lock3, lock4); std::lock(lock3, lock4);
std::lock(lock1, lock2);
The above will not deadlock. Though this is tricky. If Thread 2 gets lock3 and lock4 before Thread1 does, then Thread 1 will block until Thread 2 releases all 4 locks. If Thread 1 gets the four locks first, then Thread 2 will block at the point of locking lock3 and lock4 until Thread 1 releases all 4 locks.
Thread 1 Thread 2
std::lock(lock1,lock2); std::lock(lock3,lock4);
std::lock(lock3,lock4); std::lock(lock1,lock2);
Yes, the above can deadlock. You can view the above as exactly equivalent to:
Thread 1 Thread 2
lock12.lock(); lock34.lock();
lock34.lock(); lock12.lock();
Update
I believe a misunderstanding here is treating dead-lock and live-lock as though they were both correctness issues.
In actual practice, dead-lock is a correctness issue, as it causes the process to freeze. And live-lock is a performance issue, as it causes the process to slow down, but it still completes its task correctly. The reason is that live-lock will not (in practice) sustain itself indefinitely.
<disclaimer>
There are forms of live-lock that can be created which are permanent, and thus equivalent to dead-lock. This answer does not address such code, and such code is not relevant to this issue.
</disclaimer>
The yield shown in this answer is a significant performance optimization which significantly decreases live-lock, and thus significantly increases the performance of std::lock(x, y, ...).
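For reference, a sketch of the two-mutex case of such a yield-based algorithm (in the spirit of the paper linked below, though actual library code may differ):

#include <mutex>
#include <thread>

template <class L0, class L1>
void lock_both(L0& l0, L1& l1)
{
    while (true)
    {
        {
            std::unique_lock<L0> u0(l0);   // block on the first mutex
            if (l1.try_lock())
            {
                u0.release();              // success: caller now owns both
                return;
            }
        }                                  // failure: l0 released here
        std::this_thread::yield();         // the live-lock-reducing yield
        {
            std::unique_lock<L1> u1(l1);   // block on the mutex that was busy
            if (l0.try_lock())
            {
                u1.release();
                return;
            }
        }
        std::this_thread::yield();
    }
}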
Update 2
After a long delay, I have written a first draft of a paper on this subject. The paper compares 4 different ways of getting this job done. It contains software you can copy and paste into your own code and test yourself:
http://howardhinnant.github.io/dining_philosophers.html
Your confusion with the standardese seems to be due to this statement
5 Effects: All arguments are locked via a sequence of calls to lock(),
try_lock(), or unlock() on each argument.
That does not imply that std::lock will recursively call itself with each argument to the original call.
Objects that satisfy the Lockable concept (§30.2.5.4 [thread.req.lockable.req]) must implement all 3 of those member functions. std::lock will invoke these member functions on each argument, in an unspecified order, to attempt to acquire a lock on all objects, while doing something implementation-defined to avoid deadlock.
Your example 3 has a potential for deadlock because you're not issuing a single call to std::lock with all objects that you want to acquire a lock on.
Example 2 will not cause a deadlock, Howard's answer explains why.
Did C++11 adopt this function from Boost?
If so, Boost's description is instructive (emphasis mine):
Effects: Locks the Lockable objects supplied as arguments in an unspecified and
indeterminate order in a way that avoids deadlock. It is safe to call
this function concurrently from multiple threads with the same mutexes
(or other lockable objects) in different orders without risk of
deadlock. If any of the lock() or try_lock() operations on the
supplied Lockable objects throws an exception any locks acquired by
the function will be released before the function exits.
Related
In general it is good practice to declare swap and move operations noexcept, as that allows providing some exception guarantees.
At the same time, writing a thread-safe class often implies adding a mutex protecting the internal resources from races.
If I want to implement a swap function for such a class, the straightforward solution is to safely lock the resources of both arguments of the swap and then perform the resource swap, as, for example, clearly answered in the answer to this question: Implementing swap for class with std::mutex .
The problem with such an algorithm is that a mutex lock is not noexcept, therefore swap cannot, strictly speaking, be noexcept. Is there a solution to safely swap two objects of a class with a mutex?
The only possibility that comes to my mind is to store the resource as a handle so that the swap becomes a simple pointer swap which can be done atomically.
Otherwise one could consider the lock exceptions as unrecoverable error which should anyway terminate the program, but this solution feels like just a way to put the dust under the carpet.
EDIT:
As came out in the comments, I know that the exceptions thrown by the mutexes are not arbitrary; the question can then be rephrased as follows:
Are there robust practices to limit the situation a mutex can throw to those when it is actually an unrecoverable OS problem?
What comes to my mind is to check, in the swap algorithm, that the two objects to swap are not the same object. Swapping an object with itself is a clear self-deadlock situation which will trigger an exception in the best-case scenario, but it can easily be checked for.
Are there other similar triggers one can safely check for, to make a swap function robust and practically noexcept in all the situations that matter?
On POSIX systems it is common for std::mutex to be a thin wrapper around pthread_mutex_t, for which the lock and unlock functions can fail when:
There is an attempt to acquire an already-owned lock
The mutex object is not initialized or has already been destroyed
Both of the above are UB in C++ and are not even guaranteed to be returned by POSIX. On Windows both are UB if std::mutex is a wrapper around SRWLOCK.
So it seems that the main point of allowing the lock and unlock functions to throw is to signal errors in the program, not to make the programmer expect and handle them.
This is confirmed by the recommended locking pattern: the destructor ~unique_lock is noexcept(true), but is supposed to call unlock, which is noexcept(false). That means that if an exception is thrown by the unlock function, the whole program gets terminated by std::terminate.
The standard also mentions this:
The error conditions for error codes, if any, reported by member
functions of the mutex types shall be:
(4.1) — resource_unavailable_try_again — if any native handle type
manipulated is not available.
(4.2) — operation_not_permitted — if the thread does not have the
privilege to perform the operation.
(4.3) — invalid_argument — if any native handle type manipulated as
part of mutex construction is incorrect
In theory you might encounter operation_not_permitted error, but situations when this happens are not really defined in the standard.
So unless you cause UB in your program related to the std::mutex usage or use the mutex in some OS-specific scenario, quality implementations of lock and unlock should never throw.
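Following that reasoning, one option (my sketch, with illustrative names) is to declare the swap noexcept anyway and accept std::terminate in the pathological cases:

#include <mutex>
#include <utility>

class Account {
    std::mutex m_;
    long balance_ = 0;
public:
    friend void swap(Account& a, Account& b) noexcept {
        if (&a == &b) return;                // self-swap would self-deadlock
        std::lock(a.m_, b.m_);               // deadlock-free two-mutex lock
        std::lock_guard<std::mutex> ga(a.m_, std::adopt_lock);
        std::lock_guard<std::mutex> gb(b.m_, std::adopt_lock);
        std::swap(a.balance_, b.balance_);   // can only throw via the UB cases above
    }
};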
Among the common implementations, there is at least one that might be of low quality: std::mutex implemented on top of CRITICAL_SECTION in old versions of Windows (I think Windows XP and earlier) can throw after failing to lazily allocate an internal event during contention. On the other hand, even earlier versions allocated this event during initialization to prevent failing later, so the std::mutex::mutex constructor might need to throw there (even though it is noexcept(true) in the standard).
I'm a bit confused about the requirements in terms of thread-safety placed on std::promise::set_value().
The standard says:
Effects: Atomically stores the value r in the shared state and makes
that state ready
However, it also says that promise::set_value() can only be used to set a value once. If it is called multiple times, a std::future_error is thrown. So you can only set the value of a promise once.
And indeed, just about every tutorial, online code sample, or actual use case for std::promise involves a communication channel between 2 threads, where one thread calls std::future::get(), and the other thread calls std::promise::set_value().
I've never seen a use case where multiple threads might call std::promise::set_value(), and even if they did, all but one would cause a std::future_error exception to be thrown.
So why does the standard mandate that calls to std::promise::set_value() are atomic? What is the use case for calling std::promise::set_value() from multiple threads concurrently?
EDIT:
Since the top-voted answer here is not really answering my question, I assume what I'm asking is unclear. So, to clarify: I'm aware of what futures and promises are for and how they work. My question is why, specifically, does the standard insist that std::promise::set_value() must be atomic? This is a more subtle question than "why must there not be a race between calls to promise::set_value() and calls to future::get()"?
In fact, many of the answers here (incorrectly) respond that the reason is because if std::promise::set_value() wasn't atomic, then std::future::get() could potentially cause a race condition. But this is not true.
The only requirement to avoid a race condition is that std::promise::set_value() must have a happens-before relationship with std::future::get() - in other words, it must be guaranteed that when std::future::wait() returns, std::promise::set_value() has completed.
This is completely orthogonal to std::promise::set_value() itself being atomic or not. In a typical implementation using condition variables, std::future::get()/wait() would wait on a condition variable. Then, std::promise::set_value() could non-atomically perform any arbitrarily complex computation to set the actual value. Then it would notify the shared condition variable (implying a memory fence with release semantics), and std::future::get() would wake up and safely read the result.
So, std::promise::set_value() itself does not need to be atomic to avoid a race condition here - it simply needs to satisfy a happens-before relationship with std::future::get().
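To make that concrete, here is a hedged sketch (names are mine) of such a one-shot channel: the store itself is not atomic, yet get() is race-free purely through the mutex/condition-variable happens-before:

#include <condition_variable>
#include <mutex>
#include <utility>

template <class T>           // requires default-constructible T, single setter
class OneShot {
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
    T value_;
public:
    void set(T v)
    {
        value_ = std::move(v);                        // non-atomic write
        { std::lock_guard<std::mutex> lk(m_); ready_ = true; }
        cv_.notify_one();                             // publishes the write
    }
    T get()
    {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return ready_; });      // happens-after set()
        return value_;
    }
};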
So again, my question is: why does the C++ standard insist that std::promise::set_value() must actually be an atomic operation, as if a call to std::promise::set_value() was performed entirely under a mutex lock? I see no reason why this requirement should exist, unless there is some reason or use case for multiple threads calling std::promise::set_value() concurrently. And I can't think of such a use-case, hence this question.
If it was not an atomic store, then two threads could simultaneously call promise::set_value, which does the following:
1. check that the future is not ready (i.e., has a stored value or exception)
2. store the value
3. mark the state ready
4. release anything blocking on the shared state becoming ready
By making this sequence atomic, the first thread to execute (1) gets all the way through to (3), and any other thread calling promise::set_value at the same time will fail at (1) and raise a future_error with promise_already_satisfied.
Without the atomicity, two threads could potentially store their value, and then one would successfully mark the state ready while the other raised an exception, i.e. the same outcome, except that the value that got through might be the one from the thread that saw the exception.
In many cases that might not matter which thread 'wins', but when it does matter, without the atomicity guarantee you would need to wrap another mutex around the promise::set_value call. Other approaches such as compare-and-exchange wouldn't work because you can't check the future (unless it's a shared_future) to see if your value won or not.
When it doesn't matter which thread 'wins', you could give each thread its own future, and use std::experimental::when_any to collect the first result that happened to become available.
Edit after some historical research:
Although the above (two threads using the same promise object) doesn't seem like a good use-case, it was certainly envisaged by one of the papers contemporary with the introduction of futures to C++: N2744. This paper proposed a couple of use-cases which had such conflicting threads calling set_value, and I'll quote them here:
Second, consider use cases where two or more asynchronous operations are performed in parallel and "compete" to satisfy the promise. Some examples include:
A sequence of network operations (e.g. request a web page) is performed in conjunction with a wait on a timer.
A value may be retrieved from multiple servers. For redundancy, all servers are tried but only the first value obtained is needed.
In both examples, the first asynchronous operation to complete is the one that satisfies the promise. Since either operation may complete second, the code for both must be written to expect that calls to set_value() may fail.
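A hedged sketch of that "competing producers" pattern (my example, not from the paper): the loser's set_value throws future_error with promise_already_satisfied and is simply swallowed:

#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::promise<int> p;
    std::future<int> f = p.get_future();
    auto producer = [&p](int v) {
        try { p.set_value(v); }               // atomic: exactly one call wins
        catch (const std::future_error&) {}   // loser: promise already satisfied
    };
    std::thread t1(producer, 1), t2(producer, 2);
    std::cout << f.get() << '\n';             // prints 1 or 2, never a torn value
    t1.join(); t2.join();
}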
I've never seen a use case where multiple threads might call
std::promise::set_value(), and even if they did, all but one would
cause a std::future_error exception to be thrown.
You missed the whole idea of promises and futures.
Usually, we have a promise/future pair. The promise is the object into which you push the asynchronous result or exception, and the future is the object from which you pull the asynchronous result or exception.
In most cases, the future and the promise pair do not reside on the same thread (otherwise we would use a simple pointer). So you might pass the promise to some thread, thread pool, or third-party asynchronous function, set the result from there, and pull the result in the caller thread.
Setting the result with std::promise::set_value must be atomic, not because many threads may set the result, but because an object (the future) which resides on another thread must read the result, and doing that non-atomically is undefined behavior. So setting the value and pulling it (either by calling std::future::get or std::future::then) must happen atomically.
Remember, every future and promise pair has a shared state; setting the result from one thread updates the shared state, and getting the result reads from the shared state. Like any shared state/memory in C++, when it's accessed from multiple threads, the update/read must happen under a lock; otherwise it's undefined behavior.
These are all good answers, but there's one additional point that's essential. Without atomicity of setting a value, reading the value may be subject to observability side-effects.
E.g., in a naive implementation:
int value = 0;        // result computed by thread1
int v = 0;            // shared slot
int v2 = 0;
bool flag = false;    // plain bool, deliberately not std::atomic

void thread1()
{
    // do something. Maybe read from disk, or perform computation to populate value
    v = value;
    flag = true;      // no ordering guarantee relative to the write of v
}

void thread2()
{
    if (flag)
    {
        v2 = v;       // Here we have a read problem.
    }
}
Atomicity in std::promise<> allows you to avoid the very basic race condition between writing a value in one thread and reading it in another. Of course, if flag were std::atomic<> and the proper memory ordering were used, you would no longer have any side effects, and std::promise guarantees that.
There is a widely known way of locking multiple locks, which relies on choosing a fixed linear ordering and acquiring locks according to this ordering.
That was proposed, for example, in the answer to "Acquire a lock on two mutexes and avoid deadlock". In particular, the solution based on address comparison seems quite elegant and obvious.
When I tried to check how it is actually implemented, I found, to my surprise, that this solution is not widely used.
To quote the Kernel Docs - Unreliable Guide To Locking:
Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.
PThreads doesn't seem to have such a mechanism built in at all.
Boost.Thread came up with a completely different solution: lock() for multiple (2 to 5) mutexes is based on trying to lock as many mutexes as possible at the moment.
This is the fragment of the Boost.Thread source code (Boost 1.48.0, boost/thread/locks.hpp:1291):
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    unsigned const lock_count=3;
    unsigned lock_first=0;
    for(;;)
    {
        switch(lock_first)
        {
        case 0:
            lock_first=detail::lock_helper(m1,m2,m3);
            if(!lock_first)
                return;
            break;
        case 1:
            lock_first=detail::lock_helper(m2,m3,m1);
            if(!lock_first)
                return;
            lock_first=(lock_first+1)%lock_count;
            break;
        case 2:
            lock_first=detail::lock_helper(m3,m1,m2);
            if(!lock_first)
                return;
            lock_first=(lock_first+2)%lock_count;
            break;
        }
    }
}
where lock_helper returns 0 on success and number of mutexes that weren't successfully locked otherwise.
Why is this solution better than comparing addresses or any other kind of ids? I don't see any problems with pointer comparison that could be avoided by this kind of "blind" locking.
Are there any other ideas on how to solve this problem on a library level?
From the bounty text:
I'm not even sure if I can prove correctness of the presented Boost solution, which seems more tricky than the one with linear order.
The Boost solution cannot deadlock because it never waits while already holding a lock. All locks but the first are acquired with try_lock. If any try_lock call fails to acquire its lock, all previously acquired locks are freed. Also, in the Boost implementation the new attempt will start from the lock failed to acquire the previous time, and will first wait till it is available; it's a smart design decision.
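For intuition, here is a rough sketch (not the actual Boost source) of what such a lock_helper could look like for three mutexes:

template <typename M1, typename M2, typename M3>
unsigned lock_helper(M1& m1, M2& m2, M3& m3)
{
    m1.lock();            // the only blocking call, made while holding nothing
    if (!m2.try_lock())
    {
        m1.unlock();
        return 1;         // caller retries, starting from m2
    }
    if (!m3.try_lock())
    {
        m2.unlock();
        m1.unlock();
        return 2;         // caller retries, starting from m3
    }
    return 0;             // all three locked
}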
As a general rule, it's always better to avoid blocking calls while holding a lock. Therefore, the solution with try-lock, if possible, is preferred (in my opinion). As a particular consequence of lock ordering, the whole system might get stuck. Imagine the very last lock (e.g. the one with the biggest address) was acquired by a thread which was then blocked. Now imagine some other thread needs the last lock and another lock; due to the ordering it will first get the other one and then wait on the last lock. The same can happen with all the other locks, and the whole system makes no progress until the last lock is released. Of course it's an extreme and rather unlikely case, but it illustrates the inherent problem with lock ordering: the higher a lock's number, the more indirect impact the lock has when acquired.
The shortcoming of the try-lock-based solution is that it can cause livelock, and in extreme cases the whole system might also get stuck for at least some time. Therefore it is important to have some back-off scheme that makes the pauses between locking attempts longer over time, and is perhaps randomized.
Sometimes, lock A needs to be acquired before lock B does. Lock B might have either a lower or a higher address, so you can't use address comparison in this case.
Example: When you have a tree data structure and threads try to read and update nodes, you can protect the tree using a reader-writer lock per node. This only works if your threads always acquire locks top-down, root to leaf. The addresses of the locks do not matter in this case.
You can only use address comparison if it does not matter at all which lock gets acquired first. If this is the case, address comparison is a good solution. But if this is not the case you can't do it.
I guess the Linux kernel requires certain subsystems to be locked before others are. This cannot be done using address comparison.
The "address comparison" and similar approaches, although used quite often, are special cases. They works fine if you have
a lock-free mechanism to get
two (or more) "items" of the same kind or hierarchy level
any stable ordering schema between those items
For example: You have a mechanism to get two "accounts" from a list. Assume that the access to the list is lock-free. Now you have pointers to both items and want to lock them. Since they are "siblings" you have to choose which one to lock first. Here the approach using addresses (or any other stable ordering schema like "account id") is OK.
But the linked Linux text talks about "lock hierarchies". This means locking not between "siblings" (of the same kind) but between a "parent" and its "children", which might be of different types. This may happen in actual tree structures as well as in other scenarios.
Contrived example: To load a program you must
lock the file inode,
lock the process table
lock the destination memory
These three locks are not "siblings" and are not in a clear hierarchy. The locks are also not taken directly one after the other - each subsystem will take the locks at will. If you consider all the use cases where those three (and more) subsystems interact, you see that there is no clear, stable ordering you can think of.
The Boost library is in the same situation: It strives to provide generic solutions. So they cannot assume the points from above and must fall back to a more complicated strategy.
One scenario where address comparison will fail is if you use the proxy pattern: two proxy objects can delegate the locking to the same underlying object while having different addresses themselves.
Consider the following example
template<typename MutexType>
class MutexHelper
{
public:
    MutexHelper(MutexType &m) : _m(m) {}
    void lock()
    {
        std::cout << "locking ";
        _m.lock();
    }
    void unlock()
    {
        std::cout << "unlocking ";
        _m.unlock();
    }
private:
    MutexType &_m;
};
if the function
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3);
will actually use address comparison, the following code can produce a deadlock:
Mutex m1;
Mutex m2;
thread1
MutexHelper hm1(m1);
MutexHelper hm2(m2);
lock(hm1, hm2);
thread2:
MutexHelper hm2(m2);
MutexHelper hm1(m1);
lock(hm1, hm2);
EDIT:
this is an interesting thread that sheds some light on the boost::lock implementation:
thread-best-practice-to-lock-multiple-mutexes
Address compare does not work for inter-process shared mutexes (named synchronization objects).
I use boost::thread to manage threads. In my program I have a pool of threads (workers) that are sometimes activated to do some job simultaneously.
Now I use boost::condition_variable: all threads are waiting inside a boost::condition_variable::wait() call on their own condition_variable objects.
Can I AVOID using mutexes in the classic scheme when I work with condition variables? I want to wake up threads but don't need to pass any data to them, so I don't need a mutex to be locked/unlocked during the awakening process. Why should I spend CPU on this (though yes, I should remember about spurious wakeups)?
The boost::condition_variable::wait() call tries to REACQUIRE the locking object when the CV receives the notification. But I don't need that particular facility.
What is cheapest way to awake several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the wait tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
1. Acquire the mutex.
2. Copy the value of the state integer.
3. Block on the condition variable, releasing the mutex.
4. If the state integer is the same as it was when you copied it, go to step 3.
5. Release the mutex.
To unblock all threads, do this:
1. Acquire the mutex.
2. Increment the state integer.
3. Broadcast the condition variable.
4. Release the mutex.
Notice how step 4 of the locking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
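Putting those steps together, here is a minimal sketch of the pattern (a generation counter serves as the state integer; names are illustrative):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
unsigned generation = 0;                             // the "state integer"

void block_until_wakeup()
{
    std::unique_lock<std::mutex> lk(m);              // step 1
    unsigned seen = generation;                      // step 2
    cv.wait(lk, [&] { return generation != seen; }); // steps 3-4
}                                                    // step 5: mutex released

void unblock_all()
{
    std::lock_guard<std::mutex> lk(m);               // step 1
    ++generation;                                    // step 2
    cv.notify_all();                                 // step 3
}                                                    // step 4: mutex released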
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wake up, are they going to figure out what to do? Well, finding out that there's no work to do and finding out what work needs to be done requires some lock, since it's shared state, right? So there is almost always already a lock you're holding when you decide to block, and that you need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
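A hedged usage sketch with boost::barrier (per the linked docs; the worker body is illustrative):

#include <boost/thread/barrier.hpp>

boost::barrier bar(3);     // three participating threads

void worker()
{
    // ... compute this thread's slice of the parallel job ...
    if (bar.wait())        // returns true for exactly one "serial" thread
    {
        // integrate the partial results here
    }
}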
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;

void WorkerThread()
{
    for (;;)
    {
        cv.wait();
        DoWork();
    }
}

void MainThread()
{
    for (;;)
    {
        ScheduleWork();
        cv.notify_all();
    }
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainThread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, and is awoken by the next cv.notify() and all is well.
In case #2, MainThread() comes back around and notifies... nobody and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
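For completeness, here is a hedged sketch (reusing the pseudo-code's DoWork/ScheduleWork) of how a mutex plus a shared counter prevents the lost wakeup:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
unsigned pending = 0;                  // work scheduled but not yet taken

void WorkerThread()
{
    for (;;)
    {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return pending > 0; });  // a missed notify no longer matters
        --pending;
        lk.unlock();
        DoWork();
    }
}

void MainThread()
{
    for (;;)
    {
        ScheduleWork();
        { std::lock_guard<std::mutex> lk(m); ++pending; }
        cv.notify_all();
    }
}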
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you
missed something.
Support efficient SMP implementations
There may be rare cases where an "absolutely, paranoiacally correct"
implementation of condition wakeup, given simultaneous wait and
signal/broadcast on different processors, would require additional
synchronization that would slow down ALL condition variable operations
while providing no benefit in 99.99999% of all calls. Is it worth the
overhead? No way!
But, really, that's an excuse because we wanted to force people to
write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*() does NOT require that the caller hold the lock on the mutex. This is a nice improvement over the Java model in that it decouples the notification of threads from the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
// Waiting thread:
unique_lock<mutex> lock(mtx);    // wait() needs a unique_lock, not a lock_guard
// Do something
cv.wait(lock);
// Do something else

// Notifying thread:
unique_lock<mutex> otherLock(mtx);
// do something
otherLock.unlock();
cv.notify_one();
I do not believe you need to call otherLock.lock() first.
I'm confused by the example given by Leo Davidson in is Ccriticalsection usable in production?. Leo gives three code blocks introduced as "Wrong (his example)", "Right", and "Even better (so you get RAII)".
After dismissing the first block as "Wrong", Leo acknowledges later that this is something that can occur if a function that obtains a lock calls another function which obtains the same lock. Fine - there is a real danger here to avoid, and the example is not so much "wrong" as an easy trap to fall into through careless programming.
But the second and third examples confuse me completely... because we have one sync object (the CCriticalSection crit) which is used for two CSingleLock locks... implying that crit is not a lockable thing at all, but only the mechanism which does the locking for an independent object or objects. The trouble is, there is a comment saying "crit is unlocked now" right at the end... which contradicts that implication. Also, other comments qualify themselves by the need to test IsLocked()... when, in my understanding, the CCriticalSection cannot time out and will only ever return when IsLocked() is TRUE.
The Microsoft documentation I have scanned is really not clear about what role the CSyncObject plays and the CSingleLock or CMultiLock plays. That's my main concern. Can anyone point to documentation that definitively says you can create two locks using a single sync object as Leo has suggested here?
After dismissing the first block as
"Wrong", Leo acknowledges later that
this is something that can occur if a
function that obtains a lock calls
another function which obtains the
same lock. Fine - there is a real
danger here to avoid, and the example
is not so much "wrong" as an easy trap
to fall into through careless
programming.
The "wrong" first block is always wrong and should never be something you do, whether explicitly or by accident. You cannot use a CSingleLock to obtain multiple locks at the same time.
As its name suggests, CSingleLock is an object which manages one lock on one synchronization object. (The underlying synchronization object may be capable of being locked multiple times, but not via just a single CSingleLock.)
I meant that the other two code-blocks were situations you could run into legitimately.
You never need to lock the same CCriticalSection if you already have a lock on it (since you only need one lock to know you own the object), but you may lock it multiple times (usually as a result of holding the lock, then calling a function which gets the lock itself in case it is called by something that doesn't already have it). That's fine.
But the second and third examples
confuse me completely... because we
have one sync object (the
CCriticalSection crit) which is used
for two CSingleLock locks... implying
that crit is not a lockable thing at
all, but only the mechanism which does
the locking for an independent object
or objects.
You can lock a CCriticalSection directly (and multiple times if you want to). It has Lock and Unlock methods for doing that.
If you do that, though, you have to ensure that you have matching Unlock calls for every one of your Lock calls. It can be easy to miss one (especially if you use early returns or exceptions where an Unlock later in a function may be bypassed entirely).
Using a CSingleLock to lock a CCriticalSection is usually better because it will release the lock it holds automatically when it goes out of scope (including if you return early, throw an exception or whatever).
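For example, a minimal MFC sketch of that RAII point (function and variable names are mine):

#include <afxmt.h>

CCriticalSection g_crit;

void Foo()
{
    CSingleLock lock(&g_crit, TRUE);   // TRUE: acquire the lock immediately
    // ... work under the lock; early returns and exceptions are safe ...
}   // lock's destructor releases the critical section here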
Can anyone point to documentation that
definitively says you can create two
locks using a single sync object as
Leo has suggested here?
Although I couldn't find the source, CCriticalSection (like most MFC objects) is almost certainly a very thin wrapper around the Win32 equivalent, in this case CRITICAL_SECTION. The documentation on EnterCriticalSection tells you:
After a thread has ownership of a
critical section, it can make
additional calls to
EnterCriticalSection or
TryEnterCriticalSection without
blocking its execution. This prevents
a thread from deadlocking itself while
waiting for a critical section that it
already owns. The thread enters the
critical section each time
EnterCriticalSection and
TryEnterCriticalSection succeed. A
thread must call LeaveCriticalSection
once for each time that it entered the
critical section.