Why do condition variables sometimes erroneously wake up? - concurrency

I've known for eons that the way you use a condition variable is:
    lock
    while not task_done
        wait on condition variable
    unlock
This is because condition variables will sometimes wake spontaneously. But I've never understood why that's the case. In the past I've read that it's expensive to make a condition variable that doesn't have that behavior, but nothing more than that.
So... why do you need to worry about falsely being woken up when waiting on a condition variable?

In most cases it isn't that the condition variable wakes up erroneously; a waiter normally wakes up because another thread has signalled it. However, by the time the woken thread has been re-scheduled for execution, some other thread may already have nabbed the resource on which it was waiting, so it is necessary to double-check.
For example, suppose threads x, y, and z are waiting on some resource R that w was holding, and x, y, z, and w communicate through a condition variable. When w is done with R it signals x, y, and z, so all three are taken off the wait queue and placed on the run queue to be scheduled for execution. Suppose x is scheduled first: it acquires R, and may then be put to sleep while still holding it. When y is scheduled next, the resource R on which it was waiting is still not available, so y has to go back to sleep. Then z wakes up, also finds that R is still in use, and goes back to sleep as well, and so on.
If you have exactly two threads, and the condition variable is shared between just the two of them, there are sometimes situations where it is OK not to perform that check. However, if you want your application to be dynamic and capable of scaling up to an arbitrary number of threads, it's good to be in the habit of doing that extra check (not to mention much simpler and less worrisome), as it is required in most situations.

Threads can wake up without being signaled. This is called a spurious wakeup. However, precisely why they occur is a question that seems to be mired in superstition and uncertainty. Reasons I have seen include their being a side effect of the way threading implementations work, or their being intentionally added to force programmers to properly use loops instead of conditionals around wait.
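Whichever explanation applies, the defence is the same loop from the question. Here is a minimal C++ sketch of it, assuming std::condition_variable; the names TaskFlag, mark_done, wait_until_done and run_demo are made up for illustration:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical type for illustration: a flag plus the mutex and
// condition variable that protect and announce it.
struct TaskFlag {
    std::mutex m;
    std::condition_variable cv;
    bool task_done = false;  // the predicate, only touched while m is held

    void mark_done() {
        {
            std::lock_guard<std::mutex> lock(m);
            task_done = true;
        }
        cv.notify_all();
    }

    void wait_until_done() {
        std::unique_lock<std::mutex> lock(m);
        // Re-check the predicate after every wakeup: the wakeup may be
        // spurious, or another thread may have changed the state again.
        while (!task_done) {
            cv.wait(lock);
        }
        assert(task_done);  // holds no matter why wait() returned
    }
};

// Runs one waiter and one signaller; returns the flag value afterwards.
bool run_demo() {
    TaskFlag flag;
    std::thread waiter([&flag] { flag.wait_until_done(); });
    flag.mark_done();
    waiter.join();
    std::lock_guard<std::mutex> lock(flag.m);
    return flag.task_done;
}
```

Because the loop tests task_done before waiting at all, it also behaves correctly when mark_done() runs before the waiter gets as far as cv.wait().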

Related

Any way to check a certain number of threads have completed

I have an application where I need at least (for some consistency guarantees) k out of n threads to return. After that, I'd like the other threads to continue running, but after k threads finish, then I continue along. I can have a variable that gets incremented at the end of each thread, and then have a spin lock, but I'd prefer to not have a spin lock. Any suggestions for something else to do?
"I can have a variable that gets incremented at the end of each thread"
That would work. I recommend doing that.
"and then have a spin lock, but I'd prefer to not have a spin lock"
There's no need to use a spin lock. Wait on a condition variable instead.
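A rough sketch of that suggestion in C++, assuming std::condition_variable; CompletionCounter, worker_finished, wait_for and run_k_of_n are hypothetical names, and the workers here do no real work:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: counts finished workers and lets one thread
// sleep until at least k of them are done.
class CompletionCounter {
public:
    void worker_finished() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++finished_;
        }
        cv_.notify_all();
    }

    // Blocks without spinning until at least k workers have finished.
    int wait_for(int k) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return finished_ >= k; });
        return finished_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int finished_ = 0;
};

// Starts n workers, continues as soon as k of them have finished.
int run_k_of_n(int n, int k) {
    assert(0 <= k && k <= n);
    CompletionCounter counter;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&counter] { counter.worker_finished(); });
    int seen = counter.wait_for(k);    // the main thread carries on from here
    for (auto& t : workers) t.join();  // joined only so the demo ends cleanly
    return seen;
}
```

The value returned by wait_for can be larger than k, since more workers may have finished by the time the waiting thread gets to run.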

Is this implementation of a general semaphore with binary semaphores correct?

Prove or Disprove the correctness of the following semaphore.
Here are my thoughts on this.
Well, if someone implements it so wait runs first before signal, there will be a deadlock. The program will call wait, decrement count, enter the count < 0 condition and wait at gate. Because it is waiting at gate, it cannot proceed to the signal that is right after the wait. So in that case, this might imply that the semaphore is incorrect.
However, if we assume that two processes are running, one running wait first and the other running signal first, then if the first process runs wait and blocks at wait(gate), the other process can run signal and release the process that was blocked. Thus, continuing with this scheme would make the algorithm valid and not result in a deadlock.
The given implementation follows these principles:
Binary semaphore S protects the count variable from concurrent access.
If non-negative, count reflects the number of free resources of the general semaphore. Otherwise, the absolute value of count reflects the number of threads that are waiting (p5) or ready to wait (between p4 and p5) on the binary semaphore gate.
Every signal() call increments count and, if its previous value was negative, signals the binary semaphore gate.
But because of the possibility of the ready-to-wait state, the given implementation is incorrect:
Assume thread#1 calls wait() and is currently in the ready-to-wait state. Assume another thread#2 also calls wait() and is currently in the ready-to-wait state too.
Assume thread#3 calls signal() at this moment. Because count is negative (-2), the thread performs all operations including p10 (signal(gate)). Because nobody is waiting on gate at the moment, gate enters the free state.
Assume another thread#4 calls signal() at this moment. Because count is still negative (-1), this thread also performs all operations including p10. But now gate is already in the free state, so signal(gate) is a no-op here and a signal event is missed: only one of thread#1 and thread#2 will continue after executing p5 (wait(gate)). The other thread will wait forever.
Without the possibility of the ready-to-wait state (that is, if signal(S) and wait(gate) were executed atomically), the implementation would be OK.

Mutex granularity

I have a question regarding threads. It is known that when we lock a mutex, the thread keeps executing that part of the code uninterrupted by other threads until it unlocks the mutex. (At least that's what they say in the book.) So my question is whether it is actually possible to have several scoped WriteLocks which do not interfere with each other. For example something like this:
If I have a buffer with N elements without any new elements coming, however with high frequency updates (like change value of Kth element) is it possible to set a different lock on each element so that the only time threads would stall and wait is if actually 2 or more threads are trying to update the same element?
To answer your question about N mutexes: yes, that is indeed possible. What resources are protected by a mutex depends entirely on you as the user of that mutex.
This leads to the first (statement) part of your question. A mutex by itself does not guarantee that a thread will work uninterrupted. All it guarantees is MUTual EXclusion - if thread B attempts to lock a mutex which thread A has locked, thread B will block (execute no code) until thread A unlocks the mutex.
This means mutexes can be used to guarantee that a thread executes a block of code uninterrupted; but this works only if all threads follow the same mutex-locking protocol around that block of code. Which means it is your responsibility to assign semantics (or meaning) to each individual mutex, and correctly adhere to those semantics in your code.
If you decide for the semantics to be "I have an array a of N data elements and an array m of N mutexes, and accessing a[i] can only be done when m[i] is locked," then that's how it will work.
The need to consistently stick to the same protocol is why you should generally encapsulate the mutex and the code/data protected by it in a class in some way or another, so that outside code doesn't need to know the details of the protocol. It just knows "call this member function, and the synchronisation will happen automagically." This "automagic" is the class correctly implementing the protocol.
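As a sketch of the "array a of N data elements and array m of N mutexes" protocol wrapped in a class, something like the following could work in C++. LockedBuffer, add, get and run_buffer_demo are invented names, and the element type is assumed to be int:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: a_[i] may only be touched while m_[i] is locked.
// Callers never see the mutexes, only the member functions.
template <std::size_t N>
class LockedBuffer {
public:
    void add(std::size_t i, int delta) {
        assert(i < N);
        std::lock_guard<std::mutex> lock(m_[i]);
        a_[i] += delta;
    }

    int get(std::size_t i) {
        assert(i < N);
        std::lock_guard<std::mutex> lock(m_[i]);
        return a_[i];
    }

private:
    std::array<int, N> a_{};       // zero-initialised data elements
    std::array<std::mutex, N> m_;  // one mutex per element
};

// Four threads each add 1 to every element 1000 times.
int run_buffer_demo() {
    LockedBuffer<4> buf;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&buf] {
            for (int n = 0; n < 1000; ++n)
                for (std::size_t i = 0; i < 4; ++i) buf.add(i, 1);
        });
    for (auto& t : threads) t.join();
    return buf.get(0) + buf.get(1) + buf.get(2) + buf.get(3);
}
```

Threads only block each other here when they hit the same index at the same time; updates to different elements proceed in parallel.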
A crucial consideration when deciding between a mutex per array and a mutex per element is whether there are operations - like tracking the number of "in-use" array elements, the "active" element, or moving a pointer-to-array to a larger buffer - that can only be done safely by one thread while all the others are blocked.
A lesser but sometimes important consideration is the amount of extra memory more mutexes use.
If you genuinely need to do this kind of update as quickly as possible in a highly contested multi-threaded program, you may also want to learn about lock-free atomic types and their compare-and-swap / exchange operations, but I'd recommend against considering that unless profiling shows that the existing locking is significant in your overall program performance.
A mutex does not stop other threads from running completely; it only stops other threads from locking the same mutex. That is, while one thread keeps the mutex locked, the operating system continues to do context switches and lets other threads run as well, but if any other thread tries to lock the same mutex, its execution will be halted until the mutex is unlocked.
So yes, you can indeed have several different mutexes and lock/unlock them independently. Just beware of deadlocks: if one thread can lock more than one mutex at a time, you can run into a situation where thread 1 has locked mutex A and is trying to lock mutex B, but blocks because thread 2 already has mutex B locked and is itself trying to lock mutex A.
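One common way to sidestep that A/B deadlock in C++ is to acquire both mutexes through std::lock, which uses a deadlock-avoidance algorithm. The sketch below assumes a made-up Account type and transfer function; it is one possible approach, not the only one (always locking in a fixed order also works):

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical example type: a balance guarded by its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

// Acquires both mutexes via std::lock, so two threads naming the same
// pair of accounts in opposite order cannot deadlock each other.
void transfer(Account& from, Account& to, int amount) {
    assert(&from != &to);  // passing the same mutex twice would be an error
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads move money in opposite directions at the same time.
int run_transfer_demo() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

With plain lock(from.m) followed by lock(to.m), the two threads in run_transfer_demo could each grab one mutex and wait forever for the other.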
It's not completely clear what your use case is:
the threads get a buffer assigned to them that they have to work on
the threads have some results and request a specific buffer to update.
In the first variant you need some assignment logic that assigns a buffer to a thread.
This logic has to be executed atomically, so it is best to use a mutex to protect the assignment logic.
In the other variant it may be best to have a vector of mutexes, one for each buffer element.
In both cases the buffer itself does not need protection because it (or rather each field of it) is only accessed by one thread at a time.
You may also want to read up on semaphores. These contain a counter that lets you manage resources of which there is a limited number, but more than one. Mutexes are a special case of semaphores with n=1.
You can have a mutex per entry; a C++11 mutex can easily be converted into an adaptive spinlock, so you can achieve a good CPU/latency tradeoff.
Or, if you need very low latency yet have enough CPUs you can use an atomic "busy" flag per entry and spin in a tight compare-exchange loop on contention.
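A minimal sketch of such a per-entry "busy" flag in C++, assuming std::atomic<bool>; Entry, update and run_spin_demo are illustrative names only:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical entry type: a value guarded by an atomic "busy" flag.
struct Entry {
    std::atomic<bool> busy{false};
    int value = 0;
};

void update(Entry& e, int delta) {
    bool expected = false;
    // Spin until we are the thread that flips busy from false to true.
    while (!e.busy.compare_exchange_weak(expected, true,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
        expected = false;  // a failed exchange overwrote it with the current value
    }
    assert(e.busy.load(std::memory_order_relaxed));  // we own the entry now
    e.value += delta;
    e.busy.store(false, std::memory_order_release);  // hand the entry back
}

// Four threads contend on a single entry.
int run_spin_demo() {
    Entry e;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back([&e] {
            for (int i = 0; i < 10000; ++i) update(e, 1);
        });
    for (auto& t : threads) t.join();
    return e.value;
}
```

This burns CPU while waiting, so it only pays off when the critical section is tiny and there are enough cores for the flag holder to keep running.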
From experience, though, the best performance and scalability are achieved when concurrent writes are serialized via a command queue (or a queue of smaller immutable buffers to be concatenated at destination) and a single thread processing the queue.

What could happen if two threads access the same bool variable at the same time?

I have a cross platform c++ program where I'm using the boost libraries to create an asynchronous timer.
I have a global variable:
bool receivedInput = false;
One thread waits for and processes input
string argStr;
while (1)
{
    getline(cin, argStr);
    processArguments(argStr);
    receivedInput = true;
}
The other thread runs a timer where a callback gets called every 10 seconds. In that callback, I check to see if I've received a message
if (receivedInput)
{
    //set up timer to fire again in 10 seconds
    receivedInput = false;
}
else
    exit(1);
So is this safe? For the read in thread 2, I think it wouldn't matter since the condition will evaluate to either true or false. But I'm unsure what would happen if both threads try to set receivedInput at the same time. I also made my timer 3x longer than the period I expect to receive input so I'm not worried about a race condition.
Edit:
To solve this I used boost::unique_lock when I set receivedInput and boost::shared_lock when I read receivedInput. I used an example from here
This is fundamentally unsafe. After thread 1 has written true to receivedInput, it isn't guaranteed that thread 2 will see the new value. For example, the compiler may optimize your code by making certain assumptions about the value of receivedInput at the time it is used as the if condition, or by caching it in a register, so you are not guaranteed that main memory will actually be read at the time the if condition is evaluated. Also, both the compiler and the CPU may change the order of reads and writes for optimization; for example, true may be written to receivedInput before getline() and processArguments() have run.
Moreover, relying on timing for synchronization is a very bad idea since often you have no guarantees as to the amount of CPU time each thread will get in a given time interval or whether it will be scheduled in a given time interval at all.
A common mistake is to think that making receivedInput volatile will help here. volatile only tells the compiler not to optimize away accesses to that variable or reorder them relative to other volatile accesses (so, for example, the value is not kept cached in a register). It does not make the accesses atomic, and it does not guarantee that reads and writes of the volatile variable are ordered with respect to other, non-volatile memory accesses.
You need memory barriers or a proper synchronization mechanism for this to work as you expect.
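In C++11 and later, one simple "proper synchronization mechanism" for a single flag is std::atomic<bool>. The following is only a sketch of how the flag from the question might be handled; note_input, consume_input and run_flag_demo are invented helper names, and the timer and input handling are left out:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> receivedInput{false};

// Would be called by the input thread after processArguments() returns.
void note_input() { receivedInput.store(true); }

// Would be called from the timer callback: reports whether input arrived
// since the last check and clears the flag in one atomic step.
bool consume_input() { return receivedInput.exchange(false); }

// Stand-in for the two threads in the question.
void run_flag_demo() {
    std::thread input([] { note_input(); });
    input.join();
    assert(consume_input());   // the store is visible after the join
    assert(!consume_input());  // and the exchange cleared the flag
}
```

Using exchange instead of a separate read and write closes the window in which the input thread could set the flag between the timer's check and its reset.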
You would have to check your threading standard. Assuming we're talking about POSIX threads, this is explicitly undefined behavior -- an object may not be accessed by one thread while another thread is or might be modifying it. Anything can happen.
If your threads use the value of receivedInput to control independent code blocks, but not to synchronize with each other, there is one simple solution:
add "volatile" before receivedInput, so that the compiler will not perform the optimization that prevents the threads from sharing the value of receivedInput.

How do I safely read a variable from one thread and modify it from another?

I have a class instance which is being used in multiple threads. I am updating multiple member variables from one thread and reading the same member variables from another thread. What is the correct way to maintain thread safety?
eg:
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
//unlock here?
Should I unlock the mutex here? (If another thread accesses obj1's member variables 1 and 2 now, the data it reads might not be correct because memberV2 has not yet been updated. However, if I do not release the lock, the other thread might block because there is a time-consuming operation below.)
//perform some time consuming operation which must be done before the assignment to memberV2 and after the assignment to memberV1
obj1.memberV2 = update field 2 from some calculation
pthread_mutex_unlock(&mutex1); //should I only unlock here?
Thanks
Your locking is correct. You should not release the lock early just to allow another thread to proceed (because that would allow the other thread to see the object in an inconsistent state.)
Perhaps it would be better to do something like:
//perform time consuming calculation
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
obj1.memberV2 = result;
pthread_mutex_unlock(&mutex1);
This of course assumes that the values used in the calculation won't be modified on any other thread.
It's hard to tell what you are doing that is causing problems. The mutex pattern is pretty simple: you lock the mutex, access the shared data, and unlock the mutex. This protects the data because the mutex will only let one thread hold the lock at a time. Any thread that fails to get the lock has to wait until the mutex is unlocked. Unlocking wakes the waiters up. They will then fight to obtain the lock, and the losers go back to sleep. The time it takes to wake up might be multiple ms or more from the time the lock is released. Make sure you always unlock the mutex eventually.
Make sure you don't keep locks locked for a long period of time. Most of the time, a long period of time is something like a microsecond. I prefer to keep it down to around "a few lines of code." That's why people have suggested that you do the long-running calculation outside the lock. The reason for not holding locks for a long time is that you increase the number of times other threads will hit the lock and have to spin or sleep, which decreases performance. You also increase the probability that your thread will be pre-empted while owning the lock, which means the lock stays held while that thread sleeps. That's even worse for performance.
Threads that fail to get a lock don't have to sleep. Spinning means a thread that encounters a locked mutex doesn't sleep, but loops, repeatedly testing the lock for a predefined period before giving up and sleeping. This is a good idea if you have multiple cores or cores capable of running multiple simultaneous threads. Multiple active threads means two threads can be executing code at the same time. If the lock is around a small amount of code, then the thread that got the lock is going to be done real soon, and the other thread need only wait a couple of nanoseconds before it gets the lock. Remember, putting your thread to sleep involves a context switch and some code to attach your thread to the waiters on the mutex, and all of that has costs. Plus, once your thread sleeps, you have to wait for a period of time before the scheduler wakes it up. That could be multiple ms. Look up spinlocks.
If you only have one core, then if a thread encounters a lock it means another sleeping thread owns the lock, and no matter how long you spin it ain't gonna unlock. So you would use a lock that puts a waiter to sleep immediately, in the hope that the thread owning the lock will wake up and finish.
You should assume that a thread can be preempted at any machine code instruction. You should also assume that each line of C code is probably many machine code instructions. The classic example is i++. This is one statement in C, but a read, an increment, and a store in machine-code land.
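To make the i++ point concrete, here is a small C++ sketch in which the read, increment and store are fused into one indivisible operation with std::atomic. The function name count_with_atomic and its parameters are made up for the example:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of `threads` workers increments a shared counter `per_thread` times.
// With a plain long and counter++, increments from different threads could
// interleave between the read and the store and be lost (and it would be a
// data race); fetch_add performs the whole read-modify-write atomically.
long count_with_atomic(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```

Relaxed ordering is enough here because only the final total matters and the joins already order the last read after every increment.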
If you really care about performance, try to use atomic operations first. Look to mutexes as a last resort. Most concurrency problems are easily solved with atomic operations (google "gcc atomic operations" to start learning) and very few problems really need mutexes. Mutexes are way, way, way slower.
Protect your shared data wherever it is written and wherever it is read. Otherwise, prepare for failure. You don't have to protect shared data during periods of time when only a single thread is active.
It's often useful to be able to run your app with 1 thread as well as N threads. This way you can debug race conditions more easily.
Minimize the shared data that you protect with locks. Try to organize data into structures such that a single thread can gain exclusive access to the entire structure (perhaps by setting a single locked flag or version number or both) and not have to worry about anything after that. Then most of the code isn't cluttered with locks and race conditions.
Functions that ultimately write to shared variables should use temp variables until the last moment and then copy the results. Not only will the compiler generate better code, but accesses to shared variables, especially ones that change them, cause cache line updates between L2 and main RAM and all sorts of other performance issues. Again, if you don't care about performance, disregard this. However, I recommend you google the document "What Every Programmer Should Know About Memory" if you want to know more.
If you are reading a single variable from the shared data, you probably don't need to lock as long as the variable is an integer type and not a member of a bitfield (bitfield members are read/written with multiple instructions). Read up on atomic operations. When you need to deal with multiple values, you need a lock to make sure you didn't read version A of one value, get preempted, and then read version B of the next value. The same holds true for writing.
You will find that copies of data, even copies of entire structures, come in handy. You can work on building a new copy of the data and then swap it in by changing a pointer with one atomic operation. You can make a copy of the data and then do calculations on it without worrying about whether it changes.
So maybe what you want to do is:
lock the mutex
make a copy of the input data for the long-running calculation
unlock the mutex
L1: do the calculation
lock the mutex
if the input data has changed and this matters
    read the input data, unlock the mutex and go to L1
update data
unlock the mutex
Maybe, in the example above, you still store the result if the input changed, but go back and recalculate. It depends on whether other threads can use a slightly out-of-date answer. Maybe other threads, when they see that a thread is already doing the calculation, simply change the input data and leave it to the busy thread to notice that and redo the calculation (there will be a race condition you need to handle if you do that, an easy one). That way the other threads can do other work rather than just sleep.
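The copy / calculate / re-check steps could look roughly like this in C++. Everything here is an assumption made for the sketch: the Shared struct, the version counter used to detect that the input changed, and slow_calculation as a stand-in for the long-running work:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state: an input, a version that is bumped on every
// input change, and the result of the long calculation.
struct Shared {
    std::mutex m;
    int input = 0;
    unsigned version = 0;
    int result = 0;
};

int slow_calculation(int x) { return x * x; }  // stand-in for the real work

void set_input(Shared& s, int value) {
    std::lock_guard<std::mutex> lock(s.m);
    s.input = value;
    ++s.version;
}

void recompute(Shared& s) {
    for (;;) {
        int snapshot;
        unsigned seen;
        {   // lock, copy the input, unlock
            std::lock_guard<std::mutex> lock(s.m);
            snapshot = s.input;
            seen = s.version;
        }
        int r = slow_calculation(snapshot);  // runs without holding the lock
        std::lock_guard<std::mutex> lock(s.m);
        if (s.version != seen) continue;     // input changed: go back and redo
        s.result = r;                        // update data, then unlock on return
        return;
    }
}

// One thread changes the input while another recomputes.
int run_recompute_demo() {
    Shared s;
    set_input(s, 3);
    std::thread writer([&s] { set_input(s, 7); });
    recompute(s);   // may or may not see the writer's change
    writer.join();
    recompute(s);   // the input is stable now, so this result is final
    assert(s.version == 2);
    return s.result;
}
```

This variant discards a stale result and retries; storing the stale result first and then recalculating, as described above, is a small change to the branch that currently says continue.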
cheers.
Probably the best thing to do is:
temp = //perform some time consuming operation which must be done before the assignment to memberV2
pthread_mutex_lock(&mutex1);
obj1.memberV1 = 1;
obj1.memberV2 = temp; //result from previous calculation
pthread_mutex_unlock(&mutex1);
What I would do is separate the calculation from the update:
temp = some calculation
pthread_mutex_lock(&mutex1);
obj.memberV1 = 1;
obj.memberV2 = temp;
pthread_mutex_unlock(&mutex1);