Boost interprocess_condition multiple threads calling wait() fails - c++

Running into a very strange issue with 2+ threads waiting on an interprocess_condition variable.
Boost 1.60.0
With 1 thread calling wait() and a 2nd calling notify_all(), everything works as expected.
When there are 2+ calling wait(), I get an assertion failure on do_wait() and the process exits.
Test.cpp:
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <iostream>
#include <cstdlib>

using namespace boost::interprocess;

struct Data {
    interprocess_mutex mux_;
    interprocess_condition cond_;
};

int main(int argc, char *argv[]) {
    if (argc > 1 && atoi(argv[1]) == 0) {
        struct shm_remove {
            shm_remove() { shared_memory_object::remove("MySharedMemory"); }
            ~shm_remove() { shared_memory_object::remove("MySharedMemory"); }
        } remover;
        managed_shared_memory seg(create_only, "MySharedMemory", 65536);
        Data *const d = seg.construct<Data>(unique_instance)();
        scoped_lock<interprocess_mutex> lock(d->mux_);
        std::cout << "Waiting" << std::endl;
        d->cond_.wait(lock);
    } else if (argc > 1 && atoi(argv[1]) == 1) {
        managed_shared_memory seg(open_only, "MySharedMemory");
        std::pair<Data *, std::size_t> res = seg.find<Data>(unique_instance);
        scoped_lock<interprocess_mutex> lock(res.first->mux_);
        std::cout << "Waiting" << std::endl;
        res.first->cond_.wait(lock);
    } else {
        managed_shared_memory seg(open_only, "MySharedMemory");
        std::pair<Data *, std::size_t> res = seg.find<Data>(unique_instance);
        scoped_lock<interprocess_mutex> lock(res.first->mux_);
        std::cout << "Notifying" << std::endl;
        res.first->cond_.notify_all();
    }
}
Compiled as:
$ clang++ -I/usr/local/include test.cpp
Running with 1 wait() and 1 notify():
$ ./a.out 0&
[8] 25889
Waiting
$ ./a.out 2&
[9] 25901
Notifying
[8]- Done ./a.out 0
[9]+ Done ./a.out 2
Running with 2 waits:
$ ./a.out 0&
[8] 25986
Waiting
$ ./a.out 1&
[9] 25998
Waiting
Assertion failed: (res == 0), function do_wait, file /usr/local/include/boost/interprocess/sync/posix/condition.hpp, line 175.
Tested on OSX El Capitan
$ uname -a
Darwin LUS-JOHUGHES2 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64
I also tried the above example on an Ubuntu Trusty machine and all examples work as expected, leading me to believe there is an issue with the OSX implementation. I have not tried it on Windows.

Did some digging and found a definite answer to the problem.
The Boost assertion above fires when the second process calls do_wait(), which calls pthread_cond_wait(), which returns immediately with EINVAL (instead of a successful 0).
In OSX's pthread implementation, the condition variable stores a raw pointer to the mutex. The first call to pthread_cond_wait() in the first process sets this pointer. The second call to pthread_cond_wait() checks the stored mutex pointer against the mutex pointer passed in. (The source can be found here: https://opensource.apple.com/source/libpthread/libpthread-137.1.1/src/pthread_cond.c)
Since the two processes have mapped the shared mutex and condition variable at different addresses, the second call to pthread_cond_wait() can never succeed, because it compares raw pointers.
Thus there are two options to make this work:
1. Use fixed address mapping (http://www.boost.org/doc/libs/1_60_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_region.mapped_region_fixed_address_mapping), which guarantees that the mapped regions land at the same address in every process, so the raw pointers work (see the sketch below), or
2. Instead of exec()'ing new processes, use fork(), so that the child process gets a copy of the original segment manager mapped at the same address, and the raw pointers again work.
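As a rough illustration of option 1 under the same segment name used above (the base address 0x3F00000000 is only an example and is platform-dependent; the managed_shared_memory constructors take an optional address hint as their last argument, per the linked docs):

#include <boost/interprocess/managed_shared_memory.hpp>

using namespace boost::interprocess;

// Process that creates the segment: request a fixed base address.
void create_segment()
{
    managed_shared_memory seg(create_only, "MySharedMemory", 65536, (void*)0x3F00000000);
    // construct the Data object, wait on the condition, etc.
}

// Every other process opens the segment at the same base address,
// so raw pointers stored inside it (like the one Darwin's condition keeps) remain valid.
void open_segment()
{
    managed_shared_memory seg(open_only, "MySharedMemory", (void*)0x3F00000000);
}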
I didn't dig much into the glibc pthreads code to see what it does differently from Apple's, so I'm not sure why the original example works on Linux but not on OSX.
I think the Boost docs would definitely benefit from a paragraph discussing this pitfall.

This is a bug both in Darwin's C library and in Boost.Interprocess. First, that C library claims to be POSIX-compliant and to support process-shared condition variables, which is false (it stores a raw pointer to the mutex). Second, Boost.Interprocess should detect this platform as a buggy one, disable the use of pthreads, and fall back to emulation.
In boost/interprocess/detail/workaround.hpp, you will find a comment saying:
//Mac Os X < Lion (10.7) might define _POSIX_THREAD_PROCESS_SHARED but there is no real support.
Some old reports claimed that newer macOS versions really do support process-shared condition variables, but this claim is false, so the __APPLE__ section should be just:
#define BOOST_INTERPROCESS_BUGGY_POSIX_PROCESS_SHARED

Related

std::unique_lock and std::shared_lock in same thread - shouldn't work, but does?

Why is it possible to acquire two different locks on a std::shared_mutex?
The following code
#include <shared_mutex>
#include <mutex>
#include <iostream>

int main(int /*argc*/, char * /*argv*/[]) {
    std::shared_mutex mut;
    std::unique_lock<std::shared_mutex> ulock{mut};
    if (ulock)
        std::cout << "unique_lock is bool true\n";
    if (ulock.owns_lock())
        std::cout << "unique_lock owns lock\n";
    std::shared_lock<std::shared_mutex> slock{mut};
    if (slock)
        std::cout << "shared_lock is bool true\n";
    if (slock.owns_lock())
        std::cout << "shared_lock owns lock\n";
}
when compiled with g++ 9.4.0 (the version currently available on Ubuntu 20.04.1)
g++ -std=c++17 mutex.cpp
outputs all cout lines. I.e. the thread acquires both the unique_lock and the shared_lock. As I understand cppreference on the shared_mutex, acquiring the second lock should fail while the first one is still alive.
"Within one thread, only one lock (shared or exclusive) can be acquired at the same time."
What do I miss here? Why can the shared_lock be acquired even with the previous unique_lock still active?
Also from cppreference: "If lock_shared is called by a thread that already owns the mutex in any mode (exclusive or shared), the behavior is undefined."
You have UB; the toolchain is not required to diagnose this, though it might if you enable your stdlib's debug mode (_GLIBCXX_DEBUG for libstdc++).
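For example, with libstdc++ the debug mode is just a compile-time define (no guarantee this particular misuse is caught):
g++ -std=c++17 -D_GLIBCXX_DEBUG mutex.cpp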

Not holding the lock while notifying a boost interprocessing condition variable causes issues

UPDATE 21.02.2020: Holding the lock while notifying doesn't actually help. As far as I understand so far, the condition variable is left in an invalid state in the shared memory by the killed waiting process.
So I have this application using boost interprocess to share memory, and the access to it is synced using an interprocess condition variable.
I am using boost 1.62 on Windows. I am compiling using Microsoft Windows Build Tools 2015.
What happens is that when I terminate the waiting process with a Ctrl-C, the notifying process gets stuck in the notify call.
Here's a demo program that allows reproducing the issue. You have to run the executable once without any argument to start the waiting process and once more with some argument to start the notifying process. Then kill the first process. Sometimes you will observe that the printing stops at "Entering notify".
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <iostream>
#include <new>

struct shared_data
{
    boost::interprocess::interprocess_mutex mutex;
    boost::interprocess::interprocess_condition condition;
    bool test_bool = false;
};

int main(int argc, char *argv[])
{
    using namespace boost::interprocess;
    if (argc == 1) {
        struct shm_remove
        {
            shm_remove() {
                shared_memory_object::remove("MySharedMemory");
            }
            ~shm_remove() {
                shared_memory_object::remove("MySharedMemory");
            }
        } remover;
        shared_memory_object shm(create_only, "MySharedMemory", read_write);
        shm.truncate(sizeof(shared_data));
        mapped_region region(shm, read_write);
        void* addr = region.get_address();
        shared_data* data = new (addr) shared_data;
        while (true) {
            scoped_lock<interprocess_mutex> lock(data->mutex);
            while (!data->test_bool) {
                data->condition.wait(lock);
            }
            std::cout << "test_bool became true" << std::endl;
            data->test_bool = false;
        }
    }
    else {
        shared_memory_object shm(open_only, "MySharedMemory", read_write);
        mapped_region region(shm, read_write);
        shared_data* data = static_cast<shared_data*>(region.get_address());
        while (true) {
            {
                scoped_lock<interprocess_mutex> lock(data->mutex);
                data->test_bool = true;
            }
            std::cout << "Entering notify" << std::endl;
            data->condition.notify_one();
            std::cout << "Exiting notify" << std::endl;
        }
    }
}
(Of course, killing while waiting is harsh, but as far as I've debugged it, the wait call is cleaned up after the signal.)
If I keep the lock acquired while calling notify_one, the issue does not manifest.
However, I was expecting there to be no need to keep the lock acquired while notifying, in the spirit of the C++ threading implementation.
I haven't found any specification on this point in the documentation, only the example, which does indeed keep the lock acquired.
Now, given that I have a solution to my problem, my questions are:
1. Is the need to have the lock acquired while notifying the expected and only correct usage, or is it a bug?
2. If it is the expected usage, why?
You don't have to hold the lock when calling notify, but in most of the cases you should still do it, because otherwise some threads (or in your case processes) could miss the notification. Consider the following scenario:
1. Process 1 acquires the lock and checks the condition, but is preempted before calling condition.wait.
2. Process 2 calls condition.notify_one - but there is no process to be notified.
3. You kill process 2.
4. Now process 1 finally calls condition.wait - and waits forever.
By acquiring the lock before calling notify, you can ensure that the other process has already called wait and therefore cannot miss the notification. This also holds for std::condition_variable, not only your interprocess example.
There are a few situations where this might not be an issue (e.g., because you do not wait forever, but only for a limited time), but you should be very careful there.
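To illustrate the recommended pattern, here is a minimal sketch using std::condition_variable (the same structure applies to interprocess_condition); the key point is that the predicate is read and written under the same mutex that is held around the notify:

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool data_ready = false;            // the predicate, protected by m

void waiter()
{
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return data_ready; });   // re-checks the predicate, so spurious wakeups are harmless
    data_ready = false;
}

void notifier()
{
    std::lock_guard<std::mutex> lock(m);        // held across the state change and the notify, as recommended above
    data_ready = true;
    cv.notify_one();
}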

Printing std::this_thread::get_id() gives "thread::id of a non-executing thread"?

This used to work perfectly fine (and then aliens must have hacked my PC):
#include <thread>
#include <iostream>

int main()
{
    std::cout << std::this_thread::get_id() << std::endl;
    return 0;
}
and now it prints thread::id of a non-executing thread.
ideone.com prints some ID, but it's interesting what may have caused this behavior on my platform.
$ uname -a
Linux xxx 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Any ideas?
EDIT: Well.. when I add
std::cout << pthread_self() << std::endl;
both lines print the same ID, but when I remove it, the result is still the same - "non-executing thread".
It's a side-effect of a glibc feature, fixed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57060:
// For the GNU C library pthread_self() is usable without linking to
// libpthread.so but returns 0, so we cannot use it in single-threaded
// programs, because this_thread::get_id() != thread::id{} must be true.
If you explicitly link against pthreads (-pthread or -lpthread) then your program will behave as expected.
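For example, assuming g++ or clang++ on Linux (the source file name is arbitrary):
g++ -std=c++11 -pthread main.cpp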
Oddly enough, on my system, adding a call to pthread_self (before or after the call to std::this_thread::get_id()) does not change the behavior:
0
thread::id of a non-executing thread
This may be an Ubuntu-specific behavior, linking pthreads automatically if pthread_self is called, but it seems a bit odd. Note that std::this_thread::get_id() calls pthread_self via a weak reference (itself via __gthread_self).
The Standard does not actually define what this_thread::get_id() is going to return. All it says is:
Returns: An object of type thread::id that uniquely identifies the current thread of execution. No other thread of execution shall have this id and this thread of execution shall always have this id. The object returned shall not compare equal to a default constructed thread::id.
This condition is met by your output, so everything is in order.
By the way, do not confuse the thread::id returned by this_thread::get_id() with the numerical thread id returned by the underlying implementation (pthread_self()). thread::id's main use is to be compared and used as a key in associative containers, not to be directly introspected or printed.
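For example, a typical use is as a key for per-thread bookkeeping (the names below are made up; synchronization around the map is omitted for brevity):

#include <thread>
#include <map>
#include <string>

std::map<std::thread::id, std::string> thread_names;   // thread::id is ordered, so it works as a map key

void tag_current_thread(const std::string& name)
{
    thread_names[std::this_thread::get_id()] = name;    // the id is compared internally, never parsed or printed
}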

System-wide global variable / semaphore / mutex in C++/Linux?

Is it possible to create a system-wide global variable / semaphore / mutex in C++ on Linux?
Here's the reason: I've got a system that often runs multiple copies of the same software on unrelated data. It's common to have 4 jobs, each running the same software. The software has a small section where it creates a huge graph that takes a lot of memory; outside that section memory usage is moderate.
It so happens sometimes that 2 jobs simultaneously hit the same memory-hungry section and the whole system starts swapping. Thus we want to prevent that by creating something like a critical section mutex between different jobs so that no more than one of them would allocate a lot of memory at a time.
If these were threads of the same job, pthread locks would do the job.
What would be a good way to implement such mutex between different jobs?
You can use a named semaphore if you can get all the processes to agree on a common name.
A named semaphore is identified by a name of the form /somename; that is, a null-terminated string of up to NAME_MAX-4 (i.e., 251) characters consisting of an initial slash, followed by one or more characters, none of which are slashes. Two processes can operate on the same named semaphore by passing the same name to sem_open(3).
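A minimal sketch of the named-semaphore approach (the name "/heavy_section" is made up, all cooperating jobs just have to agree on it; error checking is omitted; link with -pthread):

#include <semaphore.h>
#include <fcntl.h>

int main()
{
    // Initial count of 1 makes the semaphore behave like a mutex.
    sem_t* sem = sem_open("/heavy_section", O_CREAT, 0644, 1);

    sem_wait(sem);                 // enter the memory-hungry section
    // ... build the huge graph ...
    sem_post(sem);                 // leave the section

    sem_close(sem);
    return 0;
}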
For interprocess mutual exclusion, you can use file locking. With Linux, the code is as simple as protecting the critical section with a call to flock:
int fd_lock = open(LOCK_FILE, O_RDWR | O_CREAT, 0666); // O_CREAT needs the mode argument
flock(fd_lock, LOCK_EX);   // blocks until the exclusive lock is granted
// do stuff
flock(fd_lock, LOCK_UN);
close(fd_lock);
If you need POSIX compatibility, you can use fcntl.
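A sketch of the fcntl variant under the same assumptions (hypothetical lock-file path, no error handling):

#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open("/tmp/job.lock", O_RDWR | O_CREAT, 0666);

    struct flock fl {};
    fl.l_type = F_WRLCK;          // exclusive lock; l_start/l_len of 0 cover the whole file
    fl.l_whence = SEEK_SET;
    fcntl(fd, F_SETLKW, &fl);     // F_SETLKW blocks until the lock is granted

    // ... memory-hungry section ...

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}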
You can make C++ mutexes work across process boundaries on Linux. However, there's some black magic involved which makes it less appropriate for production code.
Explanation:
The standard library's std::mutex and std::shared_mutex use pthread's struct pthread_mutex_s and pthread_rwlock_t under the hood. The native_handle() method returns a pointer to one of these structures.
The drawback is that certain details are abstracted out of the standard library and defaulted in the implementation. For example, std::shared_mutex creates its underlying pthread_rwlock_t structure by passing NULL as the second parameter to pthread_rwlock_init(). This is supposed to be a pointer to a pthread_rwlockattr_t structure containing an attribute which determines sharing policy.
public:
    __shared_mutex_pthread()
    {
      int __ret = pthread_rwlock_init(&_M_rwlock, NULL);
      ...
In theory, it should receive default attributes. According to the man pages for pthread_rwlockattr_getpshared():
The default value of the process-shared attribute is PTHREAD_PROCESS_PRIVATE.
That said, both std::shared_mutex and std::mutex work across processes anyway. I'm using Clang 6.0.1 (x86_64-unknown-linux-gnu / POSIX thread model). Here's a description of what I did to check:
1. Create a shared memory region with shm_open.
2. Check the size of the region with fstat to determine ownership. If .st_size is zero, then ftruncate() it, and the caller knows that it is the region's creating process.
3. Call mmap on it.
4. The creator process uses placement-new to construct a std::mutex or std::shared_mutex object within the shared region.
5. Later processes use reinterpret_cast<>() to obtain a typed pointer to the same object.
6. The processes now loop on calling try_lock() and unlock() at intervals. You can see them blocking one another using printf() before and after try_lock() and before unlock() (see the sketch below).
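A condensed sketch of those steps (error handling and unmapping omitted; the segment name is made up; on Linux compile with -pthread, plus -lrt on older glibc):

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <mutex>
#include <new>

int main()
{
    int fd = shm_open("/mutex_demo", O_CREAT | O_RDWR, 0666);

    struct stat st;
    fstat(fd, &st);
    bool creator = (st.st_size == 0);          // zero size -> we are the creating process
    if (creator)
        ftruncate(fd, sizeof(std::mutex));

    void* addr = mmap(nullptr, sizeof(std::mutex),
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    std::mutex* m = creator ? new (addr) std::mutex                 // placement-new in the creator
                            : reinterpret_cast<std::mutex*>(addr);  // later processes reuse the object

    if (m->try_lock())             // processes can be observed blocking one another around this call
    {
        // ... critical section ...
        m->unlock();
    }
    return 0;
}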
Extra detail: I was interested in whether the c++ headers or the pthreads implementation were at fault, so I dug into pthread_rwlock_arch_t. You'll find a __shared attribute which is zero and a __flags attribute which is also zero for the field denoted by __PTHREAD_RWLOCK_INT_FLAGS_SHARED. So it seems that by default this structure is not intended to be shared, though it seems to provide this facility anyway (as of July 2019).
Summary
It seems to work, though somewhat by chance. I would advise caution in writing production software that works contrary to documentation.
I looked at using the shared-pthread-mutex solution but didn't like the logic race in it. So I wrote a class to do this using the atomic builtins
#include <string>
#include <iostream>
#include <cerrno>
#include <csignal>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>

using std::string;

// from the command line - "ls /dev/shm" and "lsof /dev/shm/<name>" to see which process ID has access to it
template<typename PAYLOAD>
class InterprocessSharedVariable
{
protected:
    int mSharedMemHandle;
    string const mSharedMemoryName;
    bool mOpenedMemory;
    bool mHaveLock;
    pid_t mPID;

    // this is the shared memory structure
    typedef struct
    {
        pid_t mutex;
        PAYLOAD payload;
    }
    tsSharedPayload;

    tsSharedPayload* mSharedData;

    bool openSharedMem()
    {
        mPID = getpid();
        // The following caters for the shared mem being created by root but opened by non-root,
        // giving the shared-memory 777 permissions.
        int openFlags = O_CREAT | O_RDWR;
        int shareMode = S_IRWXU | S_IRWXG | S_IRWXO;
        // see https://stackoverflow.com/questions/11909505/posix-shared-memory-and-semaphores-permissions-set-incorrectly-by-open-calls
        // store old
        mode_t old_umask = umask(0);
        mSharedMemHandle = shm_open (mSharedMemoryName.c_str(), openFlags, shareMode);
        // restore old
        umask(old_umask);
        if (mSharedMemHandle < 0)
        {
            std::cerr << "failed to open shared memory" << std::endl;
            return false;
        }
        if (-1 == ftruncate(mSharedMemHandle, sizeof(tsSharedPayload)))
        {
            std::cerr << "failed to resize shared memory" << std::endl;
            return false;
        }
        mSharedData = (tsSharedPayload*) mmap (NULL,
                                               sizeof(tsSharedPayload),
                                               PROT_READ | PROT_WRITE,
                                               MAP_SHARED,
                                               mSharedMemHandle,
                                               0);
        if (MAP_FAILED == mSharedData)
        {
            std::cerr << "failed to map shared memory" << std::endl;
            return false;
        }
        return true;
    }

    void closeSharedMem()
    {
        if (mSharedMemHandle > 0)
        {
            mSharedMemHandle = 0;
            shm_unlink (mSharedMemoryName.c_str());
        }
    }

public:
    InterprocessSharedVariable () = delete;

    InterprocessSharedVariable (string const&& sharedMemoryName) : mSharedMemoryName(sharedMemoryName)
    {
        mSharedMemHandle = 0;
        mOpenedMemory = false;
        mHaveLock = false;
        mPID = 0;
    }

    virtual ~InterprocessSharedVariable ()
    {
        releaseSharedVariable ();
        closeSharedMem ();
    }

    // no copying
    InterprocessSharedVariable (InterprocessSharedVariable const&) = delete;
    InterprocessSharedVariable& operator= (InterprocessSharedVariable const&) = delete;

    bool tryLockSharedVariable (pid_t& ownerProcessID)
    {
        // Double-checked locking. See if a process has already grabbed the mutex. Note the process could be dead
        __atomic_load (&mSharedData->mutex, &ownerProcessID, __ATOMIC_SEQ_CST);
        if (0 != ownerProcessID)
        {
            // It is possible that we have started with the same PID as a previous process that terminated abnormally
            if (ownerProcessID == mPID)
            {
                // ... in which case, we already "have ownership"
                return (true);
            }
            // Another process may have the mutex. Check whether it is alive.
            // We are specifically looking for an error returned with ESRCH
            // Note that if the other process is owned by root, "kill 0" may return a permissions error (which indicates the process is running!)
            int processCheckResult = kill (ownerProcessID, 0);
            if ((0 == processCheckResult) || (ESRCH != errno))
            {
                // another process owns the shared memory and is running
                return (false);
            }
            // Here: The other process does not exist ((0 != processCheckResult) && (ESRCH == errno))
            // We could assume here that we can now take ownership, but be proper and fall into the compare-exchange
            ownerProcessID = 0;
        }
        // It's possible that another process has snuck in here and taken ownership of the shared memory.
        // If that has happened, the exchange will "fail" (and the existing PID is stored in ownerProcessID)
        // ownerProcessID == 0 -> representing the "expected" value
        mHaveLock = __atomic_compare_exchange_n (&mSharedData->mutex,
                                                 &ownerProcessID,   //"expected"
                                                 mPID,              //"desired"
                                                 false,             //"weak"
                                                 __ATOMIC_SEQ_CST,  //"success-memorder"
                                                 __ATOMIC_SEQ_CST); //"fail-memorder"
        return (mHaveLock);
    }

    bool acquireSharedVariable (bool& failed, pid_t& ownerProcessID)
    {
        if (!mOpenedMemory)
        {
            mOpenedMemory = openSharedMem ();
            if (!mOpenedMemory)
            {
                ownerProcessID = 0;
                failed = true;
                return false;
            }
        }
        // infrastructure is working
        failed = false;
        bool gotLock = tryLockSharedVariable (ownerProcessID);
        return (gotLock);
    }

    void releaseSharedVariable ()
    {
        if (mHaveLock)
        {
            __atomic_store_n (&mSharedData->mutex, 0, __ATOMIC_SEQ_CST);
            mHaveLock = false;
        }
    }
};
Example usage - here we are simply using it to ensure that only one instance of the application runs.
int main(int argc, char *argv[])
{
    typedef struct { } tsEmpty;
    InterprocessSharedVariable<tsEmpty> programMutex ("/run-once");
    bool memOpenFailed;
    pid_t ownerProcessID;
    if (!programMutex.acquireSharedVariable (memOpenFailed, ownerProcessID))
    {
        if (memOpenFailed)
        {
            std::cerr << "Failed to open shared memory" << std::endl;
        }
        else
        {
            std::cerr << "Program already running - process ID " << ownerProcessID << std::endl;
        }
        return -1;
    }
    // ... do stuff ...
    return 0;
}
Mutual exclusion locks (mutexes) prevent multiple threads from simultaneously executing critical sections of code that access shared data (that is, mutexes are used to serialize the execution of threads). All mutexes must be global. A successful call for a mutex lock by way of mutex_lock() will cause another thread that is also trying to lock the same mutex to block until the owner thread unlocks it by way of mutex_unlock(). Threads within the same process or within other processes can share mutexes.
Mutexes can synchronize threads within the same process or in other processes. Mutexes can be used to synchronize threads between processes if the mutexes are allocated in writable memory and shared among the cooperating processes (see mmap(2)), and have been initialized for this task.
For inter-process synchronization, a mutex needs to be allocated in memory shared between these processes. Since the memory for such a mutex must be allocated dynamically, the mutex needs to be explicitly initialized using mutex_init().
Also, for inter-process synchronization, besides the requirement to be allocated in shared memory, the mutexes must use the attribute PTHREAD_PROCESS_SHARED; otherwise, accessing the mutex from a process other than its creator results in undefined behaviour (see linux.die.net/man/3/pthread_mutexattr_setpshared): "The process-shared attribute is set to PTHREAD_PROCESS_SHARED to permit a mutex to be operated upon by any thread that has access to the memory where the mutex is allocated, even if the mutex is allocated in memory that is shared by multiple processes."
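To make that concrete, here is a minimal sketch of initializing a process-shared pthread mutex inside a POSIX shared memory region (the segment name is made up, only one process should perform the initialization, and error handling is omitted):

#include <pthread.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = shm_open("/pshared_mutex", O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(pthread_mutex_t));
    pthread_mutex_t* m = static_cast<pthread_mutex_t*>(
        mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);  // the attribute discussed above
    pthread_mutex_init(m, &attr);                                 // creator only
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(m);
    // ... critical section shared with any process that maps the same region ...
    pthread_mutex_unlock(m);
    return 0;
}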

Boost w/ C++ - Curious mutex behavior

I'm experimenting with Boost threads; to my knowledge I can write a multi-threaded Boost application and compile it on Windows or Linux, while pthreads, which I'm more familiar with, is strictly for use on *NIX systems.
I have the following sample application, which is borrowed from another SO question:
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/bind.hpp>
#include <iostream>
#include <unistd.h>   // usleep

#define NAP_DURATION (10000UL) // 10ms

boost::mutex io_mutex;

void count(int id)
{
    for (int i = 0; i < 1000; ++i)
    {
        boost::mutex::scoped_lock lock(io_mutex);
        std::cout << "Thread ID:" << id << ": " << i << std::endl;
        if (id == 1)
        {
            std::cout << "I'm thread " << id << " and I'm taking a short nap" << std::endl;
            usleep(NAP_DURATION);
        }
        else
        {
            std::cout << "I'm thread " << id << ", I drink 100 cups of coffee and don't need a nap" << std::endl;
        }
        std::cout << "Thread ID:" << id << ": " << i << std::endl;
        boost::thread::yield();
    }
}

int main(int argc, char* argv[])
{
    boost::thread thrd1( boost::bind(&count, 1));
    boost::thread thrd2( boost::bind(&count, 2));
    thrd1.join();
    thrd2.join();
    return 0;
}
I installed Boost on my Ubuntu 14.04 LTS system via:
sudo apt-get install libboost-all-dev
And I compile the above code via:
g++ test.cpp -lboost_system -lboost_thread -I"$BOOST_INCLUDE" -L"$BOOST_LIB"
I've run into what appears to be some interesting inconsistencies. If I set a lengthy NAP_DURATION, say 1 second (1000000), it seems that only thread 1 ever gets the mutex until it completes its operations; it's very rare that thread 2 gets the lock before thread 1 is done, even when I set NAP_DURATION to just a few milliseconds.
When I've written similar such applications using pthreads, the lock would typically alternate more or less randomly between threads, since another thread would already be blocked on the mutex.
So, to the question(s):
Is this expected behavior?
Is there a way to control this behavior, such as making scoped locks behave like locking operations are queued?
If the answer to (2) is "no", is it possible to achieve something similar with Boost condition variables and not having to worry about lock/unlock calls failing?
Are scoped_locks guaranteed to unlock? I'm using the RAII approach rather than manually locking/unlocking because apparently the unlock operation can fail and throw an exception, and I'm trying to make this code solid.
Thank you.
Clarifications
I'm aware that putting the calling thread to sleep won't unlock the mutex, since it's still in scope, but the expected scheduling was along the lines of:
Thread1 locks, gets the mutex.
Thread2 locks, blocks.
Thread1 executes, releases the lock, and immediately attempts to lock again.
Thread2 was already waiting on the lock, gets it before thread1.
Is this expected behavior?
Yes and no. You shouldn't have any expectations about which thread will get a mutex, since it's unspecified. But it's certainly within the range of expected behavior.
Is there a way to control this behavior, such as making scoped locks behave like locking operations are queued?
Don't use mutexes this way. Just don't. Use mutexes only such that they're held for very short periods of time relative to other things a thread is doing.
If the answer to (2) is "no", is it possible to achieve something similar with Boost condition variables and not having to worry about lock/unlock calls failing?
Sure. Code what you want.
Are scoped_locks guaranteed to unlock? I'm using the RAII approach rather than manually locking/unlocking because apparently the unlock operation can fail and throw an exception, and I'm trying to make this code solid.
It's not clear what it is you're worried about, but the RAII approach is recommended.
Why are you surprised, exactly?
If you were expecting thread 2 to acquire the mutex while thread 1 is asleep, then yes, this is expected behaviour and your understanding was wrong, because your lock is still in scope while the thread sleeps.
But if you are surprised by the lack of alternation between thread 1 and thread 2 at the end of each loop iteration, then you can have a look at this SO question about scheduling that seems "unfair".
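If strict alternation really is what you want, one way (a sketch only, assuming C++11 lambdas; link with -lboost_thread -lboost_system) is to pair the mutex with a condition variable and an explicit turn variable, so each thread hands execution over to the other:

#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>
#include <iostream>

boost::mutex m;
boost::condition_variable cv;
int turn = 1;                                        // id of the thread that may run next

void count(int id, int next)
{
    for (int i = 0; i < 10; ++i)
    {
        boost::unique_lock<boost::mutex> lock(m);
        cv.wait(lock, [&] { return turn == id; });   // block until it is our turn
        std::cout << "Thread " << id << ": " << i << std::endl;
        turn = next;                                 // hand over to the other thread
        cv.notify_all();
    }
}

int main()
{
    boost::thread t1(count, 1, 2);
    boost::thread t2(count, 2, 1);
    t1.join();
    t2.join();
    return 0;
}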