Why can mutex be used in different threads? - c++

Using (writing) same variable in multiple threads simultaneously causes undefined behavior and crashes.
Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
If mutex somehow can be used simultaneously, why not make all variables work simultaneously without locking?
All my research is pressing Show definition on mutex::lock in Visual Studio, where I get at the end _Mtx_lock function without realization, and then I found it’s realization (Windows), though it has some functions also without realization:
int _Mtx_lock(_Mtx_t mtx)
{ /* lock mutex */
return (mtx_do_lock(mtx, 0));
}
static int mtx_do_lock(_Mtx_t mtx, const xtime *target)
{ /* lock mutex */
if ((mtx->type & ~_Mtx_recursive) == _Mtx_plain)
{ /* set the lock */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not current thread, do lock */
mtx->_get_cs()->lock();
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
}
++mtx->count;
return (_Thrd_success);
}
else
{ /* handle timed or recursive mutex */
int res = WAIT_TIMEOUT;
if (target == 0)
{ /* no target --> plain wait (i.e. infinite timeout) */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
mtx->_get_cs()->lock();
res = WAIT_OBJECT_0;
}
else if (target->sec < 0 || target->sec == 0 && target->nsec <= 0)
{ /* target time <= 0 --> plain trylock or timed wait for */
/* time that has passed; try to lock with 0 timeout */
if (mtx->thread_id != static_cast<long>(GetCurrentThreadId()))
{ /* not this thread, lock it */
if (mtx->_get_cs()->try_lock())
res = WAIT_OBJECT_0;
else
res = WAIT_TIMEOUT;
}
else
res = WAIT_OBJECT_0;
}
else
{ /* check timeout */
xtime now;
xtime_get(&now, TIME_UTC);
while (now.sec < target->sec
|| now.sec == target->sec && now.nsec < target->nsec)
{ /* time has not expired */
if (mtx->thread_id == static_cast<long>(GetCurrentThreadId())
|| mtx->_get_cs()->try_lock_for(
_Xtime_diff_to_millis2(target, &now)))
{ /* stop waiting */
res = WAIT_OBJECT_0;
break;
}
else
res = WAIT_TIMEOUT;
xtime_get(&now, TIME_UTC);
}
}
if (res != WAIT_OBJECT_0 && res != WAIT_ABANDONED)
;
else if (1 < ++mtx->count)
{ /* check count */
if ((mtx->type & _Mtx_recursive) != _Mtx_recursive)
{ /* not recursive, fixup count */
--mtx->count;
res = WAIT_TIMEOUT;
}
}
else
mtx->thread_id = static_cast<long>(GetCurrentThreadId());
switch (res)
{
case WAIT_OBJECT_0:
case WAIT_ABANDONED:
return (_Thrd_success);
case WAIT_TIMEOUT:
if (target == 0 || (target->sec == 0 && target->nsec == 0))
return (_Thrd_busy);
else
return (_Thrd_timedout);
default:
return (_Thrd_error);
}
}
}
So, according to this code, and the atomic_ keywords I think mutex can be written the next way:
atomic_bool state = false;
void lock()
{
if(!state)
state = true;
else
while(state){}
}
void unlock()
{
state = false;
}
bool try_lock()
{
if(!state)
state = true;
else
return false;
return true;
}

As you have found, std::mutex is thread-safe because it uses atomic operations. It can be reproduced with std::atomic_bool. Using atomic variables from multiple thread is not undefined behavior, because that is the purpose of those variables.
From C++ standard (emphasis mine):
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
Atomic variables are implemented using atomic operations of the CPU. This is not implemented for non-atomic variables, because those operations take longer time to execute and would be useless if the variables are only used in one thread.
Your example is not thread-safe:
void lock()
{
if(!state)
state = true;
else
while(state){}
}
If two threads are checking if(!state) simultaneously, it is possible that both enter the if section, and both threads believe they have the ownership:
Thread 1 Thread 2
if (!state)
if (!state)
state=true;
state=true;
You must use an atomic exchange function to ensure that the another thread cannot come in between checking the value and changing it.
void lock()
{
bool expected;
do {
expected = false;
} while (!state.compare_exchange_weak(expected, true));
}
You can also add a counter and give time for other threads to execute if the wait takes a long time:
void lock()
{
bool expected;
size_t counter = 0;
do {
expected = false;
if (counter > 100) {
Sleep(10);
}
else if (counter > 20) {
Sleep(5);
}
else if (counter > 3) {
Sleep(1);
}
counter++;
} while (!state.compare_exchange_weak(expected, true));
}

Using (writing) same variable in multiple threads simultaneously causes undefined behavior and crashes. Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
It is only undefined behaviour for regular variables, and only if there is no synchronisation. std::mutex is defined to be thread safe. It's entire point is to provide synchronisation to other objects.
From [intro.races]:
The library defines a number of atomic operations ([atomics]) and operations on mutexes ([thread]) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another.
...
Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations.
Certain library calls synchronize with other library calls performed by another thread.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
(Emphasis added)

Why mutex can be used in different threads?
How could it possibly be useful if it couldn't be used in different threads? Synchronizing multiple threads with a shared mutex is the only reason for it to exist at all.
Using (writing) same variable in multiple threads simultaneously
It's a bad idea to only worry about things happening "simultaneously". The problem is generally things happening with undetermined ordering, ie, unpredictably.
There are lots of multi-threading bugs that seem impossible if you believe things have to be simultaneous to go wrong.
causes undefined behavior and crashes.
Undefined Behaviour is not required to cause a crash. If it had to crash, crashing would be behaviour which was ... defined. There are endless questions on here from people who don't understand this, asking why their "undefined behaviour test" didn't crash.
Why using mutex, despite on fact that they are also variables, not causes undefined behavior?
Because of the way they're used. Mutexes are not simply assigned to or read from, like simple variables, but are manipulated with specialized code designed specifically to do this correctly.
You can almost write your own mutex - a spinlock, anyway - just by using std::atomic<int> and a lot of care.
The difference is that a mutex also interacts with your operating system scheduler, and that interface is not portably exposed as part of the language standard. The std::mutex class bundles the OS-specific part up in a class with correct semantics, so you can write portable C++ instead of being limited to, say, POSIX-compatible C++ or Windows-compatible C++.
In your exploration of the VS std::mutex implementation, you ignored the mtx->_get_cs()->lock() part: this is using the Windows Critical Section to interact with the scheduler.
Your implementation is an attempt at a spinlock, which is fine so long as you know the lock is never held for long (or if you don't mind sacrificing a core for each blocked thread). By contrast, the mutex allows a waiting thread to be de-scheduled until the lock is released. This is the part handled by the Critical Section.
You also ignored all the timeout and recursive locking code - which is fine, but your implementation isn't really attempting to do the same thing as the original.

A mutex is specifically made to synchronize code on different threads that work on the same resource. It is designed not have issues when used on multi-threaded. Be aware that when using different mutexes together, you can still deadlocks when taking them in different order on different threads. C++'s std::unique_lock solves this.
A variable is meant to use on either single threads or on synchronized threads because that's how they can be accessed in the fastest way. This has to do with computer architecture (registers, cache, operations in several steps).
To work with variables on non-synchronized threads, you can work with std::atomic variables. That can be faster than synchronizing, but the access is slower and more cumbersome than for normal variables. Some more complex situations with several variables can only be handled in a synchronous way.

Related

Shared vector Synchronization with an atomic boolean

I am having a shared vector which gets accessed by two threads.
A function from thread A pushs into the vector and a function from thread B swaps the vector completely for processing.
MovetoVec(PInfo* pInfo)
{
while(1)
{
if(GetSwitch())
{
swapBucket->push_back(pInfo);
toggles = true;
break;
}
else if(pInfo->tryMove == 5)
{
delete pInfo;
break;
}
pInfo->tryMove++;
Sleep(25);
}
}
The thread A tries to get atomic boolean toggles to true and pushes into vector.(the above MoveToVec function will be called by many number of threads). The function GetSwitch is defined as
GetSwitch()
{
if(toggles)
{
toggles = false;
return TRUE;
}
else
return FALSE;
}
toggles here is atomic_bool.And the another function from thread B that swaps the vector is
GetClassObj(vector<ProfiledInfo*>* toSwaps)
{
if(GetSwitch())
{
toSwaps->swap(*swapBucket);
toggles = true;
}
}
If GetSwitch returns false then the threadB does nothing. Here i dint use any locking. It works in most of the cases. But some time one of the pInfo objects in swapBucket is NULL. I got to know it is because of poor synchronization.
I followed this type of GetSwitch() logic just to neglect the overhead caused by locking. Should i drop this out and go back to mutex or critical section stuffs?
Your GetSwitch implementation is wrong. It is possible for multiple threads to acquire the switch simultaneously.
An example of such a scenario with just two threads:
Thread 1 | Thread 2
--------------------------|--------------------------
if (toggles) |
| if (toggles)
toggles = false; |
| toggles = false;
The if-test and assignment are not an atomic operation and therefore cannot be used to synchronize threads on their own.
If you want to use an atomic boolean as a means of synchronization, you need to compare and exchange the value in one atomic operation. Luckily, C++ provides such an operation called std::compare_exchange, which is available in a weak and strong flavor (the weak one may spuriously fail but is cheaper when called in a loop).
Using this operation, your GetSwitch method would become:
bool GetSwitch()
{
bool expected = true; // The value we expect 'toggles' to have
bool desired = false; // The value we want 'toggles' to get
// Check if 'toggles' is as expected, and if it is, update it to the desired value
bool result = toggles.compare_exchange_strong(&expected, desired);
// The result of the compare_exchange is true if the value was updated and false if it was not
return result;
}
This will ensure that comparing and updating the value happens atomically.
Note that the C++ standard does not guarantee an atomic boolean to be lock-free. In your case, you could also use std::atomic_flag which is guaranteed to be lock-free by the standard! Carefully read the example though, it works a tad bit different than atomic variables.
Writing lock-free code, as you are attempting to do, is quite complex and error-prone.
My advice would be to write the code with locks first and ensure it is 100% correct. Mutexes are actually surprisingly fast, so performance should be okay in most cases. A good read on lock performance: http://preshing.com/20111118/locks-arent-slow-lock-contention-is
Only once you have profiled your code, and convinced yourself that the locks are impacting performance, you should attempt to write the code lock-free. Then profile again because lock-free code is not necessarily faster.

C++ concurrency: Variable visibility outside of mutexes [duplicate]

This question already exists:
C++ concurrency: conditional atomic operations?
Closed 6 years ago.
I'm having some trouble understanding when variables are forced to be written to memory, even outside of mutex blocks. I apologize for the convoluted code below, because I have stripped away logic that deals with whether reader decides if some data is stale. The important thing to note is that 99.9% of the time, readers will take the fast path and synchronization must be very fast, which is why I use an atomic int32 to communicate both staleness and whether the slow path is now necessary.
I have the following setup, which I am "fairly" certain is race-free:
#define NUM_READERS 10
BigObject mSharedObject;
std::atomic_int32_t mStamp = 1;
std::mutex mMutex;
std::condition_variable mCondition;
int32_t mWaitingReaders = 0;
void reader() {
for (;;) { // thread loop
for (;;) { // spin until stamp is acceptible
int32_t stamp = mStamp.load();
if (stamp > 0) { // fast path
if (stampIsAcceptible(stamp) &&
mStamp.compare_exchange_weak(stamp, stamp + 1)) {
break;
}
} else { // slow path
// tell the loader (writer) that we're halted
std::unique_lock<mutex> lk(mMutex);
mWaitingReaders++;
mCondition.notify_all();
while (mWaitingReaders != 0) {
mCondition.wait(lk);
} // ###
lk.unlock();
// *** THIS IS WHERE loader's CHANGES TO mSharedObject
// *** MUST BE VISIBLE TO THIS THREAD!
}
}
// stamp acceptible; mSharedObject guaranteed not written to
mSharedObject.accessAndDoFunStuff();
mStamp.fetch_sub(1); // part of hidden staleness logic
}
}
void loader() {
for (;;) { // thread loop
// spin until we somehow decide we want to change mSharedObject!
while (meIsHappySleeping()) {}
// we want to modify mSharedObject, so set mStamp to 0 and wait
// for readers to see this and report that they are now waiting
int32_t oldStamp = mStamp.exchange(0);
unique_lock<mutex> lk(mMutex);
while (mWaitingReaders != NUM_READERS) {
mCondition.wait(lk);
}
// all readers are waiting. start writing to mSharedObject
mSharedObject.loadFromFile("example.foo");
mStamp.store(oldStamp);
mWaitingReaders = 0; // report completion
lk.unlock();
mCondition.notify_all();
// *** NOW loader's CHANGES TO mSharedObject
// *** MUST BE VISIBLE TO THE READER THREADS!
}
}
void setup() {
for (int i = 0; i < NUM_READERS; i++) {
std::thread t(reader); t.detach();
}
std::thead t(loader); t.detach();
}
The parts marked in stars *** are what concerns me. This is because while my code excludes races (as far as I can see), mSharedObject is only protected by a mutex while being written to by loader(). Because reader() needs to be extremely fast (as noted above), I do not want its read-only accesses to mSharedObject to have to be protected by a mutex.
One "guaranteed" solution is to introduce a thread-local variable const BigObject *latestObject at line ###, which is set to &mSharedObject and then use that for access. But is this bad practice? And is it really necessary? Will the atomic operations / mutex release operations guarantee that readers see the changes?
Thanks!
Lock-free code, and even locking code using just atomics is far from simple. The first thing to do would be to just add a mutex and profile how much of the performance is actually lost in the synchronisation. Note that current implementations of mutex may just do a quick spin-lock, which is roughly an atomic operation when uncontended.
If you want to attempt lock-free programming you will need to look into the memory ordering arguments to atomic operations. The writer will need to ..._release to synchronise with a reader doing ..._acquire (or use sequential consistency in both sides). Otherwise the reads/writes to any other variables may not be visible.

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right but here goes - I have this function that is used by multiple threads to write data (using pseudo code in comments to illustrate what I want)
//these are initiated in the constructor
int* data;
std::atomic<size_t> size;
void write(int value) {
//wait here while "read_lock"
//set "write_lock" to "write_lock" + 1
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = value;
//set "write_lock" to "write_lock" - 1
}
the order of the writes is not important, all I need here is for each write to go to a unique slot
Every once in a while though, I need one thread to read the data using this function
int* read() {
//set "read_lock" to true
//wait here while "write_lock"
int* ret = data;
data = new int[capacity];
size = 0;
//set "read_lock" to false
return ret;
}
so it basically swaps out the buffer and returns the old one (I've removed capacity logic to make the snippets shorter)
In theory this should lead to 2 operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers will have to wait, the reader will wait until all existing writes are finished, it will then do the read logic and scenario 1 can continue.
The question part is that I don't know what kind of a barrier to use for the locks -
A spinlock would be wasteful since there are many containers like this and they all need cpu cycles
I don't know how to apply std::mutex since I only want the write function to be in a critical section if the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
using mutex_type = std::shared_timed_mutex;
using shared_lock = std::shared_lock<mutex_type>;
using unique_lock = std::unique_lock<mutex_type>;
private:
mutable mutex_type mtx;
public:
// All updater threads can operate at the same time
auto lock_for_updates() const
{
return shared_lock(mtx);
}
// Reader threads need to kick all the updater threads out
auto lock_for_reading() const
{
return unique_lock(mtx);
}
};
// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_updates();
// update the data here
}
// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
auto lock = sptr->lock_for_reading();
// read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock gaining simultaneous access. When a reader thread tries to gain a unique_lock all shared_locks will be vacated before the unique_lock gets exclusive control.
You can also do this with regular mutexes and condition variables rather than shared. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called way more than read so maybe this is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;
void write(int x) {
++num_writers;
if (reading) {
--num_writers;
if (num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
{ std::lock_guard l(entering); }
++num_writers;
}
auto slot = size.fetch_add(1, std::memory_order_acquire);
data[slot] = x;
--num_writers;
if (reading && num_writers == 0)
{
std::lock_guard l(leaving);
cv.notify_one();
}
}
int* read() {
int* other_data = new int[capacity];
{
std::unique_lock enter_lock(entering);
reading = true;
std::unique_lock leave_lock(leaving);
cv.wait(leave_lock, [] () { return num_writers == 0; });
swap(data, other_data);
size = 0;
reading = false;
}
return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case where only writing is happening, reading is always false. So you do the usual, and pay for two additional atomic increments and two untaken branches. So the common path does not need to lock any mutexes, unlike the solution involving a shared mutex, this is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now, reading is set to true. Immediately, any new calls to write enter the first branch, and eventually hit the entering lock which they are unable to acquire (as its already taken), and those threads then get put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could actually go through right away. If however there are threads in write in either of the two locations between incrementing and decrementing num_writers, then it will not. Each time a write thread decrements num_writers, it checks if it has reduced that number to zero, and when it does it will signal the condition variable. Because num_writers is atomic which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; it could also be notified more than once but this is ok and cannot result in bad behavior.
Once that condition variable has been signalled, that shows that all writers are either trapped in the first branch or are done modifying the array. So the read thread can now safely swap the data, and then unlock everything, and then return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).

Using std::condition_variable with atomic<bool>

There are several questions on SO dealing with atomic, and other that deal with std::condition_variable. But my question if my use below is correct?
Three threads, one ctrl thread that does preparation work before unpausing the two other threads. The ctrl thread also is able to pause the worker threads (sender/receiver) while they are in their tight send/receive loops.
The idea with using the atomic is to make the tight loops faster in case the boolean for pausing is not set.
class SomeClass
{
public:
//...
// Disregard that data is public...
std::condition_variable cv; // UDP threads will wait on this cv until allowed
// to run by ctrl thread.
std::mutex cv_m;
std::atomic<bool> pause_test_threads;
};
void do_pause_test_threads(SomeClass *someclass)
{
if (!someclass->pause_test_threads)
{
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;
}
}
void unpause_test_threads(SomeClass *someclass)
{
if (someclass->pause_test_threads)
{
{
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = false;
}
someclass->cv.notify_all(); // Allow send/receive threads to run.
}
}
void wait_to_start(SomeClass *someclass)
{
std::unique_lock<std::mutex> lk(someclass->cv_m); // RAII, no need for unlock.
auto not_paused = [someclass](){return someclass->pause_test_threads == false;};
someclass->cv.wait(lk, not_paused);
}
void ctrl_thread(SomeClass *someclass)
{
// Do startup work
// ...
unpause_test_threads(someclass);
for (;;)
{
// ... check for end-program etc, if so, break;
if (lost ctrl connection to other endpoint)
{
pause_test_threads();
}
else
{
unpause_test_threads();
}
sleep(SLEEP_INTERVAL);
}
unpause_test_threads(someclass);
}
void sender_thread(SomeClass *someclass)
{
wait_to_start(someclass);
...
for (;;)
{
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
...
}
}
void receiver_thread(SomeClass *someclass)
{
wait_to_start(someclass);
...
for (;;)
{
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
...
}
I looked through your code manipulating conditional variable and atomic, and it seems that it is correct and won't cause problems.
Why you should protect writes to shared variable even if it is atomic:
There could be problems if write to shared variable happens between checking it in predicate and waiting on condition. Consider following:
Waiting thread wakes spuriously, aquires mutex, checks predicate and evaluates it to false, so it must wait on cv again.
Controlling thread sets shared variable to true.
Controlling thread sends notification, which is not received by anybody, because there is no threads waiting on conditional variable.
Waiting thread waits on conditional variable. Since notification was already sent, it would wait until next spurious wakeup, or next time when controlling thread sends notification. Potentially waiting indefinetly.
Reads from shared atomic variables without locking is generally safe, unless it introduces TOCTOU problems.
In your case you are reading shared variable to avoid unnecessary locking and then checking it again after lock (in conditional wait call). It is a valid optimisation, called double-checked locking and I do not see any potential problems here.
You might want to check if atomic<bool> is lock-free. Otherwise you will have even more locks you would have without it.
In general, you want to treat the fact that variable is atomic independently of how it works with a condition variable.
If all code that interacts with the condition variable follows the usual pattern of locking the mutex before query/modification, and the code interacting with the condition variable does not rely on code that does not interact with the condition variable, it will continue to be correct even if it wraps an atomic mutex.
From a quick read of your pseudo-code, this appears to be correct. However, pseudo-code is often a poor substitute for real code for multi-threaded code.
The "optimization" of only waiting on the condition variable (and locking the mutex) when an atomic read says you might want to may or may not be an optimization. You need to profile throughput.
atomic data doesn't need another synchronization, it's basis of lock-free algorithms and data structures.
void do_pause_test_threads(SomeClass *someclass)
{
if (!someclass->pause_test_threads)
{
/// your pause_test_threads might be changed here by other thread
/// so you have to acquire mutex before checking and changing
/// or use atomic methods - compare_exchange_weak/strong,
/// but not all together
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;
}
}

Can I switch the test and modification part in wait/signal semaphore?

The classic none-busy-waiting version of wait() and signal() semaphore are implemented as below. In this verson, value can be negative.
//primitive
wait(semaphore* S)
{
S->value--;
if (S->value < 0)
{
add this process to S->list;
block();
}
}
//primitive
signal(semaphore* S)
{
S->value++;
if (S->value <= 0)
{
remove a process P from S->list;
wakeup(P);
}
}
Question: Is the following version also correct? Here I test first and modify the value. It's great if you can show me a scenario where it doesn't work.
//primitive wait().
//If (S->value > 0), the whole function is atomic
//otherise, only if(){} section is atomic
wait(semaphore* S)
{
if (S->value <= 0)
{
add this process to S->list;
block();
}
// here I decrement the value after the previous test and possible blocking
S->value--;
}
//similar to wait()
signal(semaphore* S)
{
if (S->list is not empty)
{
remove a process P from S->list;
wakeup(P);
}
// here I increment the value after the previous test and possible waking up
S->value++;
}
Edit:
My motivation is to figure out whether I can use this latter version to achieve mutual exclusion, and no deadlock, no starvation.
Your modified version introduces a race condition:
Thread A: if(S->Value < 0) // Value = 1
Thread B: if(S->Value < 0) // Value = 1
Thread A: S->Value--; // Value = 0
Thread B: S->Value--; // Value = -1
Both threads have acquired a count=1 semaphore. Oops. Note that there's another problem even if they're non-preemptible (see below), but for completeness, here's a discussion on atomicity and how real locking protocols work.
When working with protocols like this, it's very important to nail down exactly what atomic primitives you are using. Atomic primitives are such that they seem to execute instantaneously, without being interleaved with any other operations. You cannot just take a big function and call it atomic; you have to make it atomic somehow, using other atomic primitives.
Most CPUs offer a primitive called 'atomic compare and exchange'. I'll abbreviate it cmpxchg from here on. The semantics are like so:
bool cmpxchg(long *ptr, long old, long new) {
if (*ptr == old) {
*ptr = new;
return true;
} else {
return false;
}
}
cmpxchg is not implemented with this code. It is in the CPU hardware, but behaves a bit like this, only atomically.
Now, let's add to this some additional helpful functions (built out of other primitives):
add_waitqueue(waitqueue) - Sets our process state to sleeping and adds us to a wait queue, but continues executing (ATOMIC)
schedule() - Switch threads. If we're in a sleeping state, we don't run again until awakened (BLOCKING)
remove_waitqueue(waitqueue) - removes our process from a wait queue, then sets our state to awakened if it isn't already (ATOMIC)
memory_barrier() - ensures that any reads/writes logically before this point actually are performed before this point, avoiding nasty memory ordering issues (we'll assume all other atomic primitives come with a free memory barrier, although this isn't always true) (CPU/COMPILER PRIMITIVE)
Here's how a typical semaphore acquisition routine will look. It's a bit more complex than your example, because I've explicitly nailed down what atomic operations I'm using:
void sem_down(sem *pSem)
{
while (1) {
long spec_count = pSem->count;
read_memory_barrier(); // make sure spec_count doesn't start changing on us! pSem->count may keep changing though
if (spec_count > 0)
{
if (cmpxchg(&pSem->count, spec_count, spec_count - 1)) // ATOMIC
return; // got the semaphore without blocking
else
continue; // count is stale, try again
} else { // semaphore count is zero
add_waitqueue(pSem->wqueue); // ATOMIC
// recheck the semaphore count, now that we're in the waitqueue - it may have changed
if (pSem->count == 0) schedule(); // NOT ATOMIC
remove_waitqueue(pSem->wqueue); // ATOMIC
// loop around again to try to acquire the semaphore
}
}
}
You'll note that the actual test for a non-zero pSem->count, in a real-world semaphore_down function, is accomplished by cmpxchg. You can't trust any other read; the value can change an instant after you read the value. We simply can't separate the value check and the value modification.
The spec_count here is speculative. This is important. I'm essentially making a guess at what the count will be. It's a pretty good guess, but it's a guess. cmpxchg will fail if my guess is wrong, at which point the routine has to loop and try again. If I guess 0, then I will either be woken up (as it ceases to be zero while I'm on the waitqueue), or I will notice it's not zero anymore in the schedule test.
You should also note that there is no possible way to make a function that contains a blocking operation atomic. It's nonsensical. Atomic functions, by definition, appear to execute instantaneously, not interleaved with anything else whatsoever. But a blocking function, by definition, waits for something else to happen. This is inconsistent. Likewise, no atomic operation can be 'split up' across a blocking operation, which it is in your example.
Now, you could do away with a lot of this complexity by declaring the function non-preemptable. By using locks or other methods, you simply ensure only one thread is ever running (not including blocking of course) in the semaphore code at a time. But a problem still remains then. Start with a value of 0, where C has taken the semaphore down twice, then:
Thread A: if (S->Value < 0) // Value = 0
Thread A: Block....
Thread B: if (S->Value < 0) // Value = 0
Thread B: Block....
Thread C: S->Value++ // value = 1
Thread C: Wakeup(A)
(Thread C calls signal() again)
Thread C: S->Value++ // value = 2
Thread C: Wakeup(B)
(Thread C calls wait())
Thread C: if (S->Value <= 0) // Value = 2
Thread C: S->Value-- // Value = 1
// A and B have been woken
Thread A: S->Value-- // Value = 0
Thread B: S->Value-- // Value = -1
You could probably fix this with a loop to recheck S->value - again, assuming you are on a single processor machine and your semaphore code is preemptable. Unfortunately, these assumptions are false on all desktop OSes :)
For more discussion on how real locking protocols work, you might be interested in the paper "Fuss, Futexes and Furwocks: Fast Userlevel Locking in Linux"