Does order of unlocking mutexes make a difference here? - c++

Let's say I have two variables, protected_var1 and protected_var2. Let's further assume that these variables are updated via multiple threads, and are fairly independent in that usually one or the other but not both is worked on - so they both have their own mutex guard for efficiency.
Assuming:
- I always lock the mutexes in order (mutex1, then mutex2) in regions of my code where both locks are required.
- Both mutexes are also used by themselves in many other places (just lock mutex1, or just lock mutex2).
Does the order in which I unlock the mutexes at the end of a function using both make a difference in this situation?
void foo()
{
    pthread_mutex_lock(&mutex1);
    pthread_mutex_lock(&mutex2);
    int x = protected_var1 + protected_var2;
    pthread_mutex_unlock(&mutex1); // Does the order of the next two lines matter?
    pthread_mutex_unlock(&mutex2);
}
I was asked a question a long time ago in an interview regarding this situation, and I came out feeling that the answer was yes - the order of those two unlocks does matter. I cannot for the life of me figure out how a deadlock could result from this though if the locks are always obtained in the same order wherever both are used.

The order shouldn't matter, as long as you don't attempt to acquire another lock between the releases. The important thing is to always acquire the locks in the same order; otherwise, you risk a deadlock.
EDIT:
To expand on the constraint: you must establish a strict ordering among the mutexes, e.g. mutex1 precedes mutex2 (the rule holds for any number of mutexes). You may only request a lock on a mutex if you don't hold a mutex which comes after it in the order; e.g. you may not request a lock on mutex1 if you hold a lock on mutex2. As long as these rules are respected, you should be safe. With regard to releasing: if you release mutex1, then try to reacquire it before releasing mutex2, you've violated the rule. In this respect there may be some advantage in respecting a stack-like order (last acquired is always the first released), but it's an indirect effect: the rule is that you cannot request a lock on mutex1 if you hold one on mutex2, regardless of whether you held a lock on mutex1 when you acquired the lock on mutex2.
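To make the violation concrete, here is a sketch (reusing the question's mutex1 and mutex2) of the reacquisition pattern this rule forbids; the function name is invented for illustration:

void violates_ordering()
{
    pthread_mutex_lock(&mutex1);
    pthread_mutex_lock(&mutex2);
    pthread_mutex_unlock(&mutex1);
    // Still holding mutex2 here...
    pthread_mutex_lock(&mutex1);   // BAD: requests mutex1 while holding mutex2.
                                   // A thread doing lock(mutex1); lock(mutex2);
                                   // can now deadlock with this one.
    pthread_mutex_unlock(&mutex1);
    pthread_mutex_unlock(&mutex2);
}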

I cannot for the life of me figure out how a deadlock could result from this though if the locks are always obtained in the same order wherever both are used.
In these circumstances, I don't think the order of unlocking the mutexes could be the cause of a deadlock.
Since pthread_mutex_unlock() doesn't block, both mutexes would always get unlocked regardless of the order of the two calls.
Note that if you attempt to acquire any locks between the two unlock calls, this can change the picture completely.

It doesn't matter for correctness of locking. The reason is that, even supposing some other thread is waiting to lock mutex1 and then mutex2, the worst case is that it gets immediately scheduled as soon as you release mutex1 (and acquires mutex1). It then blocks waiting for mutex2, which the thread you're asking about will release as soon as it gets scheduled again, and there's no reason that shouldn't happen soon (immediately, if these are the only two threads in play).
So there might be a small performance cost in that exact situation, compared with releasing mutex2 first so that there is only one rescheduling operation. Nothing you'd normally expect to predict or worry about, though; it's all within the bounds of "scheduling often isn't deterministic".
The order you release the locks could certainly affect scheduling in general, though. Suppose there are two threads waiting on your thread, one blocked on mutex1 and the other blocked on mutex2. It might turn out that whichever lock you release first, that thread gets to run first, simply because your thread has outstayed its welcome (consumed more than an entire time slice) and hence gets descheduled as soon as anything else is runnable. But that can't cause a fault in an otherwise-correct program: you aren't allowed to rely on your thread being descheduled as soon as it releases the first lock. So either order of those two waiting threads running, both running simultaneously if you have multiple cores, or the two alternating on one core, must all be equally safe whichever order you release the locks in.

The order of unlocking cannot cause deadlocks. However, if given the opportunity, I recommend unlocking in the reverse of the locking order. This has a negligible effect on how the code runs, but developers are used to thinking in terms of scopes, and scopes "close" in reverse order; seeing the unlocks in reverse order makes it simple to think of the locks as scoped. Which brings me to the second point: in most cases, it is safest to replace the direct calls to lock and unlock with a stack-based guard that calls them for you. Doing so yields the greatest degree of correctness for the least mental effort, and it also happens to be safe in the presence of exceptions (which can really muck with a manual unlock)!
A simple guard (there are many out there; this is just a quick roll-your-own):
class StMutexLock
{
public:
    explicit StMutexLock(pthread_mutex_t* inMutex)
        : mMutex(inMutex)
    {
        pthread_mutex_lock(mMutex);
    }

    ~StMutexLock()
    {
        pthread_mutex_unlock(mMutex);
    }

private:
    pthread_mutex_t* mMutex;
};
{
    StMutexLock lock1(&mutex1);
    StMutexLock lock2(&mutex2); // locked in the agreed order: mutex1, then mutex2

    int x = protected_var1 + protected_var2;
    doProtectedVar1ThingThatCouldThrow(); // exceptions are no problem!

    // No explicit unlock required: the destructors run in reverse construction
    // order, releasing mutex2 first and then mutex1.
}
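For what it's worth, the standard library has shipped essentially this guard as std::lock_guard since C++11 (the original question predates it); a minimal sketch of the same function using it, with std::mutex standing in for the pthread mutexes:

#include <mutex>

std::mutex m1, m2;
int protected_var1 = 0, protected_var2 = 0;

void foo()
{
    std::lock_guard<std::mutex> lock1(m1);
    std::lock_guard<std::mutex> lock2(m2);
    int x = protected_var1 + protected_var2;
}   // Destructors run in reverse construction order: m2 is released, then m1.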

No, it doesn't matter. A deadlock cannot result from this; both unlock operations are guaranteed to succeed (barring heap corruption or similar problems).

The order of unlocking isn't a problem here, but the order of locking can be a problem.
Consider:
void foo()
{
    pthread_mutex_lock(&mutex1);
    pthread_mutex_lock(&mutex2);
    int x = protected_var1 + protected_var2;
    pthread_mutex_unlock(&mutex1);
    pthread_mutex_unlock(&mutex2);
}

void bar()
{
    pthread_mutex_lock(&mutex2);
    pthread_mutex_lock(&mutex1);
    int x = protected_var1 + protected_var2;
    pthread_mutex_unlock(&mutex1);
    pthread_mutex_unlock(&mutex2);
}
This can lead to a deadlock: foo might have locked mutex1 and be waiting for mutex2, while bar has locked mutex2 and is waiting for mutex1. So it's a good idea to somehow ensure that nested mutex locks are always taken in the same order.
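If C++17 is available, one way to sidestep the ordering problem entirely is to let the library acquire both mutexes with its deadlock-avoidance algorithm (std::scoped_lock, which wraps std::lock); a minimal sketch, with std::mutex standing in for the pthread mutexes:

#include <mutex>

std::mutex m1, m2;

void foo()
{
    std::scoped_lock both(m1, m2); // acquires both, deadlock-free
    // ... use protected_var1 and protected_var2 ...
}   // both mutexes released here

void bar()
{
    std::scoped_lock both(m2, m1); // argument order no longer matters
    // ...
}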

What I can see is that if another thread takes mutex2 and holds it for a long time, your foo() function will block right after pthread_mutex_lock(&mutex1) - while still holding mutex1 - and you will probably see a performance hit.

So long as, wherever both variables are locked, they are locked in the same order, you are safe regardless of the freeing order. In fact, freeing in the reverse of the locking order is exactly what you get from the RAII scoped locks found in the STL and Boost.

void* threadHandle(void* arg)
{
    // Try to lock the first mutex...
    pthread_mutex_lock(&mutex_1);

    // ...then try to lock the second one without blocking.
    while (pthread_mutex_trylock(&mutex_2) != 0) // non-zero: already locked
    {
        // The second mutex is held by some other thread. Release the first
        // mutex so that other threads won't starve or deadlock...
        pthread_mutex_unlock(&mutex_1);

        // ...back off briefly...
        usleep(100);

        // ...and try to take the first mutex again.
        pthread_mutex_lock(&mutex_1);
    }

    // Reaching here means this thread holds both mutexes.
    count++; // modify the global data

    // Unlock both mutexes.
    pthread_mutex_unlock(&mutex_1);
    pthread_mutex_unlock(&mutex_2);

    return NULL;
}
I guess this "back off and retry" scheme can prevent the deadlock.

Related

std::condition_variable memory writes visibility

Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
    std::unique_lock lock(mutex);
    cv.wait(lock, []() { return values[threadIndex] != 0; });
    // May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before this is called.
cv.notify_all();  // Do we need some memory fence or lock before this, to ensure that
                  // the new non-zero values will be visible in the other threads
                  // immediately after they wake up?
Doesn't notify_all() store some atomic value, thereby enforcing memory ordering? I could not find this clarified anywhere.
UPD: according to Superlokkus' answer and an answer here, we have to acquire a lock to ensure that memory writes become visible to other threads (memory propagation); otherwise the threads in my case may read zero values.
I had also missed this quote about condition_variable, which answers my question directly: even an atomic variable has to be modified under a lock when the modification must become visible to the waiting thread immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.
I guess you are mixing up the memory ordering of so-called atomics with the mechanics of classic lock-based synchronization.
When you have a datum that is shared between threads, let's say an int, one thread cannot simply read it while another thread might be writing to it; that would be a data race.
To get around this, we have long used classic lock-based synchronization: the threads share at least a mutex and the int. To read or to write, any thread has to acquire the lock first, i.e. wait on the mutex; mutexes are built so that this waiting can safely happen concurrently. If a thread wins the mutex, it can change or read the int, and should then unlock it so that others can read or write too. Using a condition variable as you did just makes the pattern "readers wait for a writer to change a value" more efficient: the readers get woken up by the cv instead of repeatedly locking, reading, and unlocking, which would be called busy waiting.
So because you hold the lock - after waiting on the mutex or, in your case, correctly waiting on the condition variable (the mutex is still needed) - you can change the int, and readers will read the new value after the writer has written it, never the old one. UPDATE: One thing I have to add, which might also be the cause of the confusion: condition variables are subject to so-called spurious wakeups. Even though your writer did not notify any thread, a reader thread might still wake up, with the mutex locked. So you have to check whether the writer actually woke you up, which the writer usually arranges by changing another datum just to signal this, or, where suitable, by using the same datum you already wanted to share. The lambda-predicate overload of std::condition_variable::wait exists just to make this "check and go back to sleep" code look a bit prettier. From your question I cannot tell whether you intend your values to do this job.
However, the snippet for the "main" thread is incorrect or incomplete: you are not synchronizing on the mutex in order to change values. You have to hold the lock for that, but the notification can be done without the lock:
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
But these mutex-based patterns have some drawbacks and are slow: only one thread at a time can make progress. This is where so-called atomics, like std::atomic<int>, come into play. They can be written and read at the same time by multiple threads, concurrently and without a mutex. Memory ordering is only something to consider there as an optimization, for cases where you use several of them in a meaningful way or where you don't need the "after the write, I never see the old value" guarantee; with the default memory ordering, memory_order_seq_cst, you would also be fine.
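A minimal sketch of that atomic alternative: two threads increment a shared counter concurrently with no mutex, and no increment is lost:

#include <atomic>
#include <thread>

std::atomic<int> counter{0};

void worker()
{
    for (int i = 0; i < 1000; ++i)
        counter.fetch_add(1, std::memory_order_relaxed); // race-free without a mutex
}

int main()
{
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // counter.load() is now exactly 2000.
    return 0;
}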

What does std::mutex prevent threads from modifying?

What part of memory gets locked by a mutex when .lock() or .try_lock() is called? Is it just the function, or is it the whole program that gets locked?
Nothing is locked except the mutex. Everything else continues running (until it tries to lock an already locked mutex that is). The mutex is only there so that two threads cannot run the code between a mutex lock and a mutex unlock at the same time.
A mutex doesn't really lock anything, except for itself. You can think of a mutex as a gate that you can only unlock from the inside. When the gate is locked, any thread that tries to lock the mutex will sit there at the gate and wait for the thread behind it to unlock it and let them in. When the gate is not locked, calling lock lets you just go in, close and lock the gate, and now no thread can get past it until you unlock it and let them in.
A mutex doesn't lock anything. You just use a mutex to communicate to other parts of your code that whatever you decide needs to be protected from access by several threads at the same time is off-limits for now.
You could think of a mutex as something like a boolean okToModify. Whenever you want to edit something, you check whether okToModify is true. If it is, you set it to false (preventing any other threads from modifying it), make the change, then set okToModify back to true to tell the other threads you're done and give them a chance to modify:
// WARNING! This code doesn't actually work as a lock!
// It is just an illustration of the concept.
struct LockedInt {
    bool okToModify; // This would be your mutex instead of a bool.
    int integer;
};

struct LockedInt myLockedInt = { true, 0 };

...

while (myLockedInt.okToModify == false)
    ; // wait, doing nothing, until whoever is modifying the int is done.
myLockedInt.okToModify = false; // Keep other threads stuck in the while loop above.
myLockedInt.integer += 1;
myLockedInt.okToModify = true; // Other threads may now leave the while loop and modify.
The while loop and okToModify = false above is basically what locking a mutex does, and okToModify = true is what unlocking a mutex does.
Now, why do we need mutexes rather than booleans? Because another thread could be running those three lines at the same time as yours. The code for locking a mutex actually guarantees that the waiting for okToModify to become true and the setting of okToModify to false happen in one go, so no other thread can get "in between the lines"; this is typically achieved with a special machine-code instruction such as compare-and-exchange.
So do not use booleans instead of mutexes, but you can think of a mutex as a special, thread-safe boolean.
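To illustrate the "in one go" point, here is a toy spinlock (a sketch, not production code: it burns CPU while waiting) in which compare-and-exchange makes the check-and-set of the flag a single indivisible step, unlike the broken boolean version above:

#include <atomic>

struct ToySpinLock {
    std::atomic<bool> busy{false};

    void lock()
    {
        bool expected = false;
        // Atomically: if busy == false, set it to true; otherwise retry.
        // No other thread can sneak in between the check and the set.
        while (!busy.compare_exchange_weak(expected, true))
            expected = false; // compare_exchange overwrote expected; reset it
    }

    void unlock() { busy.store(false); }
};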
m.lock() doesn't really lock anything. What it does is wait to take ownership of the mutex. A mutex is always either owned by exactly one thread or available. m.lock() waits until the mutex becomes available and then takes ownership of it in the name of the calling thread.
m.unlock() releases the mutex (i.e., relinquishes ownership), causing the mutex to once again become available.
Mutexes also perform another very important function. In modern C++, when some thread T performs a sequence of assignments of various values to various memory locations, the system makes no guarantees about when other threads U, V, and W will see those assignments, whether the other threads will see the assignments happen in the same order in which thread T performed them, or even, whether the other threads will ever see the assignments.
There are some quite complicated rules governing things that a programmer can do to ensure that different threads see a consistent view of shared memory objects (Google "C++ memory model"), but here's one simple rule:
Whatever thread T did before it releases some mutex M is guaranteed to be visible to any other thread U after thread U subsequently locks the same mutex M.
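A small sketch of that rule in action, with hypothetical writer/reader threads:

#include <mutex>
#include <string>

std::mutex m;
std::string shared_msg;

void writer()   // thread T
{
    std::lock_guard<std::mutex> g(m);
    shared_msg = "hello"; // written before T releases m...
}

void reader()   // thread U
{
    std::lock_guard<std::mutex> g(m);
    // ...so if U locks m after T's release, "hello" is guaranteed visible here.
    std::string copy = shared_msg;
}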

Why would I want to lock two mutexes in one function - that too with deferred lock?

https://en.cppreference.com/w/cpp/thread/lock_tag
void transfer(bank_account &from, bank_account &to, int amount)
{
    // lock both mutexes without deadlock
    std::lock(from.m, to.m);

    // make sure both already-locked mutexes are unlocked at the end of scope
    std::lock_guard<std::mutex> lock1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(to.m, std::adopt_lock);

    // equivalent approach:
    // std::unique_lock<std::mutex> lock1(from.m, std::defer_lock);
    // std::unique_lock<std::mutex> lock2(to.m, std::defer_lock);
    // std::lock(lock1, lock2);

    from.balance -= amount;
    to.balance += amount;
}
What do they gain by locking two mutexes at once?
What have they gained by defered lock here?
Please explain the reason behind that decision of theirs.
If I modify a bank account without locking it, someone else could try to modify it at the same time. This is a race and the result will be undefined behaviour (usually lost or magically created money).
While transferring money, I am modifying 2 bank accounts. So they both need to be locked.
The problem is that when locking more than one thing, every locker must lock and unlock in the same order, otherwise we get deadlocks.
When it's bank accounts, there is no natural order of locks. Thousands of threads could be transferring money in all directions.
So we need a method of locking more than one mutex in a way that works around this - this is std::lock
std::lock merely locks the mutexes - it does not guarantee unlocking on exit from the current code block.
std::lock_guard<> unlocks the mutex it refers to upon destruction (see RAII). This makes the code behave correctly in all circumstances, even when there is an exception which might cause an early exit from the current code block without the flow reaching a statement such as to.m.unlock().
A good explanation (with examples) here: https://wiki.sei.cmu.edu/confluence/display/cplusplus/CON53-CPP.+Avoid+deadlock+by+locking+in+a+predefined+order
Extension on Richard Hodges's answer
What do they gain by locking two mutexes at once?
Richard explained nicely already, just a little bit more explicit: we avoid dead-lock this way (std::lock is implemented such that dead-lock won't occur).
What have they gained by deferred lock here?
Deferring the lock results in not acquiring it immediately. That's important because if they did so, they would just do it without any protection against dead-lock (which the subsequent std::lock then achieves).
About dead lock avoidance (see std::lock):
Locks the given Lockable objects lock1, lock2, ..., lockn using a deadlock avoidance algorithm to avoid deadlock.
The objects are locked by an unspecified series of calls to lock, try_lock, and unlock. [...]
Side note: another, much simpler deadlock-avoidance algorithm is to always lock the bank account with e.g. the lower account number (AN) first. If a thread is waiting for the lock on the higher AN, then the other thread holding it either already has both locks or is waiting for its own second lock, which cannot be one held by the first thread, as that lock would have to have a yet higher AN.
This does not change much for an arbitrary number of threads: any thread holding a lower lock can only be waiting on a higher one. If you draw a directed graph with an edge from A to B whenever A is waiting for a lock that B holds, you get a (multi-)tree structure, and you never get a circular substructure (which would indicate a deadlock).
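A sketch of that side note against the bank_account example above; the `number` member (the account number) is an assumption not present in the cppreference code, and distinct accounts are assumed:

void transfer_ordered(bank_account &from, bank_account &to, int amount)
{
    bank_account *first  = (from.number < to.number) ? &from : &to;
    bank_account *second = (from.number < to.number) ? &to : &from;

    std::lock_guard<std::mutex> lock1(first->m);  // lower account number first
    std::lock_guard<std::mutex> lock2(second->m);

    from.balance -= amount;
    to.balance += amount;
}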
The bank account data structure has a lock for each account.
When transferring money from one account to another, we need to lock both accounts (since we are removing money from one and adding it to another). We would like this operation not to deadlock, so lock both at once using std::lock, since doing it like that ensures there isn't a deadlock.
After we finish the transaction we need to make sure we release the lock. This code does that using RAII. With the adopt_lock tag, we make the object adopt an already locked mutex (which will be released when lock1 falls out of scope).
With the defer_lock tag, we create a unique_lock for a currently unlocked mutex, with the intention of locking it later. Again, it will be unlocked when the unique_lock falls out of scope.
from and to are two accounts which may be used anywhere in the application, separately.
By having a mutex for each account, you make sure nobody else uses the from or to accounts while you do the transfer.
The lock_guard releases the mutex when the function exits.

Need to mutex-protect (atomic) assignment sought by condition variable?

I understand how to use condition variables (crummy name for this construct, IMO, as the cv object neither is a variable nor indicates a condition). So I have a pair of threads, canonically set up with Boost.Thread as:
bool awake = false;
boost::mutex sync;
boost::condition_variable cv;

void thread1()
{
    boost::unique_lock<boost::mutex> lock1(sync);
    while (!awake)
        cv.wait(lock1);
    lock1.unlock(); // this line actually not canonical, but why not?
    // proceed...
}

void thread2()
{
    //...
    boost::unique_lock<boost::mutex> lock2(sync);
    awake = true;
    lock2.unlock();
    cv.notify_all();
}
My question is: does thread2 really need to protect the assignment to awake? It seems to me the notify_all() call should be sufficient. If the data being manipulated and checked against were more than a simple "ok to proceed" flag, I'd see the value in the mutex, but here it seems like overkill.
A secondary question is that asked in the code fragment: Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
EDIT: Maybe my question is really: Is there a cleaner construct than a CV to implement this kind of wait?
does thread2 really need to be protecting the assignment to awake?
Yes. Modifying an object from one thread and accessing it from another without synchronisation gives undefined behaviour. Even if it's just a bool.
For example, on some multiprocessor systems the write might only affect local memory; without an explicit synchronisation operation, other threads might never see the change.
Why doesn't the Boost documentation show the lock in thread1 being unlocked before the "process data" step?
If you unlocked the mutex before clearing the flag, then you might miss another signal.
Is there a cleaner construct than a CV to implement this kind of wait?
In Boost and the standard C++ library, no; a condition variable is flexible enough to handle arbitrary shared state and not particularly over-complicated for this simple case, so there's no particular need for anything simpler.
More generally, you could use a semaphore or a pipe to send a simple signal between threads.
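For instance, C++20 (long after this Boost-era question) added exactly such a construct; a minimal sketch with std::binary_semaphore, which needs neither the flag nor the mutex because the semaphore itself remembers the signal:

#include <semaphore>

std::binary_semaphore wakeup{0}; // starts un-signalled

void thread1()
{
    wakeup.acquire(); // blocks until thread2 calls release()
    // proceed...
}

void thread2()
{
    //...
    wakeup.release(); // plays the role of "awake = true" plus notify_all()
}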
Formally, you definitely need the lock in both threads: if any thread modifies an object, and more than one thread accesses it, then all accesses must be synchronized.
In practice, you'll probably get away without the lock; it's almost certain that notify_all will issue the necessary fence or membar instructions to ensure that the memory is properly synchronized. But why take the risk?
As to the absence of the unlock, that's the whole point of the scoped-locking pattern: the unlock is in the destructor of the object, so that the mutex is unlocked even if an exception passes through.

Modelling boost::Lockable with semaphore rather than mutex (previously titled: Unlocking a mutex from a different thread)

I'm using the C++ boost::thread library, which in my case means I'm using pthreads. Officially, a mutex must be unlocked from the same thread which locks it, and I want the effect of being able to lock in one thread and then unlock in another. There are many ways to accomplish this. One possibility would be to write a new mutex class which allows this behavior.
For example:
class inter_thread_mutex {
    bool locked = false;
    boost::mutex mx;
    boost::condition_variable cv;
public:
    void lock() {
        boost::unique_lock<boost::mutex> lck(mx);
        while (locked) cv.wait(lck);
        locked = true;
    }
    void unlock() {
        {
            boost::lock_guard<boost::mutex> lck(mx);
            if (!locked) error();
            locked = false;
        }
        cv.notify_one();
    }
    // bool try_lock(); void error(); etc.
};
I should point out that the above code doesn't guarantee FIFO access, since if one thread calls lock() while another calls unlock(), this first thread may acquire the lock ahead of other threads which are waiting. (Come to think of it, the boost::thread documentation doesn't appear to make any explicit scheduling guarantees for either mutexes or condition variables). But let's just ignore that (and any other bugs) for now.
My question is: if I decide to go this route, would I be able to use such a mutex as a model for the Boost Lockable concept? For example, would anything go wrong if I used a boost::unique_lock<inter_thread_mutex> for RAII-style access, and then passed this lock to boost::condition_variable_any::wait(), etc.?
On one hand I don't see why not. On the other hand, "I don't see why not" is usually a very bad way of determining whether something will work.
The reason I ask is that if it turns out that I have to write wrapper classes for RAII locks and condition variables and whatever else, then I'd rather just find some other way to achieve the same effect.
EDIT:
The kind of behavior I want is basically as follows. I have an object, and it needs to be locked whenever it is modified. I want to lock the object from one thread and do some work on it. Then I want to keep the object locked while I tell another worker thread to complete the work, so the first thread can go on and do something else while the worker thread finishes up. When the worker thread gets done, it unlocks the mutex.
And I want the transition to be seamless, so nobody else can get the mutex lock between when thread 1 starts the work and thread 2 completes it.
Something like inter_thread_mutex seems like it would work, and it would also allow the program to interact with it as if it were an ordinary mutex. So it seems like a clean solution. If there's a better solution, I'd be happy to hear that also.
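A sketch of that handoff using the inter_thread_mutex from above; obj, do_first_part(), queue_for_worker() and do_second_part() are hypothetical names:

inter_thread_mutex obj_mutex;

void master_thread()
{
    obj_mutex.lock();      // object is locked from here on
    do_first_part(obj);
    queue_for_worker(obj); // hand the rest to a worker; the mutex stays locked
    // ...go on to other operations without waiting...
}

void worker_thread()
{
    do_second_part(obj);   // still under the lock the master took
    obj_mutex.unlock();    // released from a different thread
}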
EDIT AGAIN:
The reason I need locks to begin with is that there are multiple master threads, and the locks are there to prevent them from accessing shared objects concurrently in invalid ways.
So the code already uses loop-level lock-free sequencing of operations at the master thread level. Also, in the original implementation, there were no worker threads, and the mutexes were ordinary kosher mutexes.
The inter_thread_thingy came up as an optimization, primarily to improve response time. In many cases it was sufficient to guarantee that the "first part" of operation A occurs before the "first part" of operation B. As a dumb example, say I punch object 1 and give it a black eye. Then I tell object 1 to change its internal structure to reflect all the tissue damage. I don't want to wait around for the tissue damage before I move on to punch object 2. However, I do want the tissue damage to occur as part of the same operation; for example, in the interim I don't want any other thread to reconfigure the object in a way that would make tissue damage an invalid operation. (Yes, this example is imperfect in many ways, and no, I'm not working on a game.)
So we made the change to a model where ownership of an object can be passed to a worker thread to complete an operation, and it actually works quite nicely; each master thread is able to get a lot more operations done because it doesn't need to wait for them all to complete. And, since the event sequencing at the master thread level is still loop-based, it is easy to write high-level master-thread operations, as they can be based on the assumption that an operation is complete (more precisely, the critical "first part" upon which the sequencing logic depends is complete) when the corresponding function call returns.
Finally, I thought it would be nice to use inter_thread mutex/semaphore thingies using RAII with boost locks to encapsulate the necessary synchronization that is required to make the whole thing work.
man pthread_mutex_unlock (this is on OS X; similar wording on Linux) has the answer:
NAME
     pthread_mutex_unlock -- unlock a mutex

SYNOPSIS
     #include <pthread.h>

     int
     pthread_mutex_unlock(pthread_mutex_t *mutex);

DESCRIPTION
     If the current thread holds the lock on mutex, then the
     pthread_mutex_unlock() function unlocks mutex.

     Calling pthread_mutex_unlock() with a mutex that the
     calling thread does not hold will result in
     undefined behavior.

     ...
My counter-question would be - what kind of synchronization problem are you trying to solve with this? Most probably there is an easier solution.
Neither pthreads nor boost::thread (built on top of it) guarantee any order in which a contended mutex is acquired by competing threads.
Sorry, but I don't understand: what will be the state of your mutex at [1] in the following code if another thread can unlock it?
inter_thread_mutex m;
{
    m.lock();
    // [1]
    m.unlock();
}
This makes no sense.
There are a few ways to approach this. Both of the ones I'm going to suggest involve adding an additional piece of information to the object, rather than adding a mechanism for unlocking a mutex from a thread other than the one that locked it.
1) you can add some information to indicate the object's state:
enum modification_state {
    consistent,      // ready to be examined or to start being modified
    phase1_complete, // ready for the second thread to finish the work
};

// first worker thread
lock();
do_init_work(object);
object.mod_state = phase1_complete;
unlock();
signal();
do_other_stuff();

// second worker thread
lock();
while (object.mod_state != phase1_complete)
    wait();
do_final_work(obj);
object.mod_state = consistent;
unlock();
signal();

// some other thread that needs to read the data
lock();
while (object.mod_state != consistent)
    wait();
read_data(obj);
unlock();
Works just fine with condition variables, because obviously you're not writing your own lock.
2) If you have a specific thread in mind, you can give the object an owner.
// first worker
lock();
while (obj.owner != this_thread()) wait();
do_initial_work(obj);
obj.owner = second_thread_id;
unlock();
signal();
...
This is pretty much the same solution as my first solution, but more flexible in the adding/removing of phases, and less flexible in the adding/removing of threads.
To be honest, I'm not sure how inter thread mutex would help you here. You'd still need a semaphore or condition variable to signal the passing of the work to the second thread.
Small modification to what you already have: how about storing the id of the thread which you want to take the lock, in your inter_thread_whatever? Then unlock it, and send a message to that thread, saying "I want you execute whatever routine it is that tries to take this lock".
Then the condition in lock becomes while(locked || (desired_locker != thisthread && desired_locker != 0)). Technically you've "released the lock" in the first thread, and "taken it again" in the second thread, but there's no way that any other thread can grab it in between, so it's as if you've transferred it directly from one to the other.
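Spelled out against the inter_thread_mutex from the question, that modification might look like the following sketch; desired_locker and hand_off() are names invented here, and a default-constructed boost::thread::id stands in for the "0" above:

class inter_thread_mutex {
    bool locked = false;
    boost::thread::id desired_locker; // default id: nobody designated
    boost::mutex mx;
    boost::condition_variable cv;
public:
    void lock() {
        boost::unique_lock<boost::mutex> lck(mx);
        while (locked || (desired_locker != boost::this_thread::get_id()
                          && desired_locker != boost::thread::id()))
            cv.wait(lck);
        desired_locker = boost::thread::id(); // claim the handoff
        locked = true;
    }
    // Called instead of unlock() to pass ownership to one specific thread.
    void hand_off(boost::thread::id next) {
        {
            boost::lock_guard<boost::mutex> lck(mx);
            desired_locker = next;
            locked = false;
        }
        cv.notify_all(); // wake everyone so the designated thread gets to run
    }
    // unlock(), try_lock(), error() as in the original class.
};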
There's a potential problem: if a thread exits or is killed while it's the desired locker of your lock, then the lock deadlocks. But you were already talking about the first thread waiting for a message from the second thread to say that it has successfully acquired the lock, so presumably you already have a plan in mind for what happens if that message is never received. To that plan, add "reset the desired_locker field on the inter_thread_whatever".
This is all very hairy, though, I'm not convinced that what I've proposed is correct. Is there a way that the "master" thread (the one that's directing all these helpers) can just make sure that it doesn't order any more operations to be performed on whatever is protected by this lock, until the first op is completed (or fails and some RAII thing notifies you)? You don't need locks as such, if you can deal with it at the level of the message loop.
I don't think it is a good idea to say that your inter_thread_mutex (a binary semaphore) can be seen as a model of Lockable. The main issue is that the main feature of your inter_thread_mutex defeats the Lockable concept: if inter_thread_mutex were a model of Lockable, you would expect at [1] below that the inter_thread_mutex m is locked.
// thread T1
inter_thread_mutex m;
{
    unique_lock<inter_thread_mutex> lk(m);
    // [1]
}
But since another thread T2 can call m.unlock() while T1 is at [1], the guarantee is broken.
Binary semaphores can be used as Lockables as long as each thread locks before unlocking. But the main goal of your class is exactly the contrary.
This is one of the reasons the semaphores in Boost.Interprocess don't use lock/unlock to name these functions, but wait/notify. Curiously, those are the same names used by condition variables :)
A mutex is a mechanism for describing mutually exclusive blocks of code. It does not make sense for these blocks of code to cross thread boundaries; trying to use such a concept in such a counterintuitive way can only lead to problems down the line.
It sounds very much like you're looking for a different multi-threading concept, but without more detail it's hard to know which.