I have been looking to implement a solution for the readers-writers problem using the threading/synchronization constructs introduced in C++11.
I ran into this question; the most-voted answer has this code:
Reader Thread
// --- read code
rw_mtx.lock(); // will block if there is a write in progress
read_count += 1; // announce intention to read
rw_mtx.unlock();
cell_value = data_base[cell_number];
rw_mtx.lock();
read_count -= 1; // announce that this read is finished
if (read_count == 0) rw_write_q.notify_one();
rw_mtx.unlock();
Writer Thread
// --- write code
std::unique_lock<std::mutex> rw_lock(rw_mtx);
write_count += 1;
rw_write_q.wait(rw_lock, []{return read_count == 0;});
data_base[cell_number] = cell_value;
write_count -= 1;
if (write_count > 0) rw_write_q.notify_one();
In the writer thread, before writing to data_base[cell_number], shouldn't there be a memory barrier/fence to synchronize access to that shared memory? (Same for the reader thread.)
If you agree with the above (yay!), how can this be achieved? Looking to improve my understanding here.
Thanks for your help!
Memory barriers in high level programming languages are considered intrinsic to the behavior of the locks provided by the language. From Wikipedia's Memory Barrier page:
Multithreaded programs usually use synchronization primitives provided by a high-level programming environment, such as Java and .NET Framework, or an application programming interface (API) such as POSIX Threads or Windows API. ... In such environments explicit use of memory barriers is not generally necessary.
Wikipedia
If you dig into the source code of pthread_mutex_lock, for example, you will see reliance on futex and atomic exchange functions, which would use a memory barrier.
Your comments seem to indicate that you do not understand why the code sample you pulled from the answer implements a readers-writer lock.
As mentioned in the answer you cited, the code you show has a fairness issue, in that a waiting writer may get starved out by a constant stream of readers. However, if we ignore that issue for now, let us first agree that:
1. No writer will enter the critical section if there is already at least one reader in the critical section.
This is because the condition variable waits for the reader count to reach zero. The way a condition variable works is that it releases the mutex if the condition is false, and acquires the lock when it is signaled. Upon acquiring the mutex, it re-tests the condition, and continues its hold on the mutex if the condition is true.
2. When there are no readers, only one writer will enter the critical section.
The critical section for the writer covers both the reader-count state of the readers-writer lock and the section of code that requires write-lock protection.
Since the mutex is held when the condition is true, and the lock provides exclusive access, there is only one writer in the critical section which will be writing a new value into the array.
Upon completing the critical section, a new signal is raised on the condition variable to wake up any other waiters and the governing mutex is then released. Since the mutex provides exclusive access, upon release, only one thread will be allowed to acquire the mutex, whether it is a pending writer or reader.
3. Multiple readers may enter the read critical section.
The critical section for a reader is treated differently than the writer. For the reader, we assume the state of the array is synchronized with the most recent write when the lock is acquired. But, reads will not alter the state of the array. So, the acquired mutex critical section is the reader count. Meanwhile, the readers-writer lock (implemented using the mutex and condition variable) critical section includes the part of the code that requires read access to the array.
So, on entering a critical section, the acquired mutex is used to increment the reader count. Once the reader count is non-zero, the readers-writer lock is now being held by a reader and will cause writers to wait on the condition variable. Once the reader count is incremented, the acquired mutex can be released.
Since the mutex was released, a different reader thread can now acquire the mutex and also increment the reader count, and then release the mutex. This allows multiple readers to enter the read critical section. The readers-writer lock remains held for reading since the reader count is positive. The mutex is released to allow other readers to enter.
Upon completing the read critical section, the mutex is acquired to decrement the reader count. If the count is zero, a signal on the condition variable is raised to wake up any pending writers. Then, the mutex is released.
If you looked towards the end of the answer you cited, you would have noticed a mention that C++17 has introduced a much nicer way to implement readers-writer locks. That would be with a shared_mutex.
class DataBase {
// ...
mutable std::shared_mutex rwlock_;
std::vector<ElementType> data_base_;
// ...
ElementType reader (int n) const {
std::shared_lock lock(rwlock_);
return data_base_[n];
}
// ...
void writer (int n, ElementType v) {
std::unique_lock lock(rwlock_);
data_base_[n] = v;
}
// ...
};
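To see the shared_mutex version run end to end, here is a minimal sketch. IntDataBase and run_shared_mutex_demo are hypothetical names made up for this illustration, with ElementType fixed to int; it assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the DataBase class above, with ElementType = int.
class IntDataBase {
public:
    explicit IntDataBase(std::size_t size) : data_base_(size, 0) {}

    int reader(std::size_t n) const {
        // Shared lock: any number of readers may hold it at the same time.
        std::shared_lock<std::shared_mutex> lock(rwlock_);
        return data_base_[n];
    }

    void writer(std::size_t n, int v) {
        // Exclusive lock: excludes all readers and all other writers.
        std::unique_lock<std::shared_mutex> lock(rwlock_);
        data_base_[n] = v;
    }

private:
    mutable std::shared_mutex rwlock_;
    std::vector<int> data_base_;
};

// Runs one writer and three readers concurrently; returns the final cell value.
int run_shared_mutex_demo() {
    IntDataBase db(4);
    std::thread w([&db] {
        for (int i = 1; i <= 1000; ++i) db.writer(0, i);
    });
    std::vector<std::thread> readers;
    for (int r = 0; r < 3; ++r) {
        readers.emplace_back([&db] {
            int last = 0;
            for (int i = 0; i < 1000; ++i) {
                int v = db.reader(0);
                assert(v >= last);  // the writer only increases the value
                last = v;
            }
        });
    }
    w.join();
    for (auto& t : readers) t.join();
    return db.reader(0);
}
```

No explicit fence appears anywhere: acquiring and releasing the shared_mutex provides the required ordering.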
Related
Does std::condition_variable::notify_one() or std::condition_variable::notify_all() guarantee that non-atomic memory writes in the current thread prior to the call will be visible in notified threads?
Other threads do:
{
std::unique_lock lock(mutex);
cv.wait(lock, []() { return values[threadIndex] != 0; });
// May a thread here see a zero value and therefore start to wait again?
}
Main thread does:
fillData(values); // All values are zero and all threads wait() before calling this.
cv.notify_all(); // Do I need some memory fence or lock before this
// to ensure that new non-zero values will be visible
// in other threads immediately after waking up?
Doesn't notify_all() store some atomic value, thereby enforcing memory ordering? I have not been able to confirm this.
UPD: according to Superlokkus' answer and an answer here: we have to acquire a lock to ensure memory writes visibility in other threads (memory propagation), otherwise threads in my case may read zero values.
Also I missed this quote here about condition_variable, which specifically answers my question. Even an atomic variable has to be modified under a lock in a case when the modification must become visible immediately.
Even if the shared variable is atomic, it must be modified under the
mutex in order to correctly publish the modification to the waiting
thread.
I guess you are mixing up memory ordering of so called atomic values and the mechanisms of classic lock based synchronization.
When you have a datum which is shared between threads, let's say an int for example, one thread cannot simply read it while another thread might be writing to it. Otherwise we would have a data race.
To get around this, for a long time we have used classic lock-based synchronization:
The threads share at least a mutex and the int. To read or write, a thread has to hold the lock first, meaning it waits on the mutex. Mutexes are built so that concurrent attempts to lock them are safe. The thread that wins the mutex can change or read the int and should then unlock it, so that others can read or write too. Using a condition variable as you did just makes the pattern "readers wait for a writer to change a value" more efficient: readers get woken up by the cv instead of repeatedly locking, reading, and unlocking, which would be busy waiting.
So because you hold the lock in either case, after waiting on the mutex or, as in your case, after correctly waiting on the condition variable (the mutex is still needed), you can change the int. Readers will read the new value once the writer has written it, never the old one. UPDATE: However, one thing I have to add, which might also be the cause of the confusion: condition variables are subject to so-called spurious wakeups. That means that even though your writer has not notified any thread, a reader thread might still wake up with the mutex locked. So you have to check whether the writer actually woke you up, which is usually done by having the writer change another datum just to signal this, or, if suitable, by using the same datum you already wanted to share. The predicate overload of std::condition_variable::wait exists just to make the check-and-go-back-to-sleep code look a bit prettier. Based on your question, I can't tell whether you want to use your values for this job.
However, your snippet for the "main" thread is incorrect or incomplete:
You are not synchronizing on the mutex in order to change values.
You have to hold the lock for that, but notifying can be done without the lock.
std::unique_lock lock(mutex);
fillData(values);
lock.unlock();
cv.notify_all();
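Putting both sides together, a minimal self-contained sketch might look like the following. The function name run_notify_demo and the thread count are assumptions made for the illustration, and fillData is replaced by a plain loop.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Returns the number of workers that saw their non-zero value after waking.
int run_notify_demo() {
    const int kThreads = 4;
    std::mutex m;
    std::condition_variable cv;
    std::vector<int> values(kThreads, 0);  // shared data, protected by m
    int seen_nonzero = 0;                  // also protected by m

    std::vector<std::thread> workers;
    for (int i = 0; i < kThreads; ++i) {
        workers.emplace_back([&, i] {
            std::unique_lock<std::mutex> lock(m);
            // The predicate is evaluated under the lock before sleeping and
            // after every wakeup, so spurious wakeups and notifications that
            // arrive before this thread starts waiting are both handled.
            cv.wait(lock, [&] { return values[i] != 0; });
            ++seen_nonzero;
        });
    }

    {
        // Modify the shared data while holding the same mutex.
        std::lock_guard<std::mutex> lock(m);
        for (int i = 0; i < kThreads; ++i) values[i] = i + 1;
    }
    cv.notify_all();  // notifying does not require holding the lock

    for (auto& t : workers) t.join();
    return seen_nonzero;
}
```

Because the write happens under the mutex, a worker that locks the mutex afterwards cannot read a stale zero.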
But these mutex-based patterns have some drawbacks and are slow: only one thread at a time can do something. This is where so-called atomics, like std::atomic<int>, come into play. They can be written and read concurrently by multiple threads without a mutex. Memory ordering is only something to consider there; it is an optimization for cases where you use several of them in a meaningful way or don't need the "after the write, I never see the old value" guarantee. However, with the default memory ordering, memory_order_seq_cst, you would also be fine.
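As a small illustration of the atomic alternative, the sketch below has several threads increment one std::atomic<int> with no mutex at all. The function name and the iteration counts are made up for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Four threads increment one atomic counter concurrently without a mutex.
int run_atomic_counter_demo() {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&counter] {
            for (int i = 0; i < 10000; ++i) {
                // Atomic read-modify-write; uses the default
                // std::memory_order_seq_cst ordering.
                counter.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter.load();
}
```

With a plain int instead of std::atomic<int>, the same loop would be a data race and increments could be lost.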
What part of memory gets locked by mutex when .lock() or .try_lock(), is it just the function or is it the whole program that gets locked?
Nothing is locked except the mutex. Everything else continues running (until it tries to lock an already locked mutex that is). The mutex is only there so that two threads cannot run the code between a mutex lock and a mutex unlock at the same time.
A mutex doesn't really lock anything, except for itself. You can think of a mutex as being a gate that you can only unlock from the inside. When the gate is locked, any thread that tries to lock the mutex will sit there at the gate and wait for the thread that is currently behind the gate to unlock it and let them in. When the gate is not locked and you call lock, you can just go in, close and lock the gate, and now no other thread can get past the gate until you unlock it and let them in.
A mutex doesn't lock anything. You just use a mutex to communicate to other parts of your code that they should consider whatever you decide needs to be protected from access by several threads at the same time to be off-limits for now.
You could think of a mutex as something like a boolean okToModify. Whenever you want to edit something, you check if okToModify is true. If it is, you set it to false (preventing any other threads from modifying it), change it, then set okToModify back to true to tell the other threads you're done and give them a chance to modify:
// WARNING! This code doesn't actually work as a lock!
// it is just an example of the concept.
struct LockedInt {
bool okToModify; // This would be your mutex instead of a bool.
int integer;
};
struct LockedInt myLockedInt = { true, 0 };
...
while (myLockedInt.okToModify == false)
; // wait doing nothing until whoever is modifying the int is done.
myLockedInt.okToModify = false; // Prevent other threads from getting out of while loop above.
myLockedInt.integer += 1;
myLockedInt.okToModify = true; // Now other threads get out of the while loop if they were waiting and can modify.
The while loop and okToModify = false above is basically what locking a mutex does, and okToModify = true is what unlocking a mutex does.
Now, why do we need mutexes instead of plain booleans? Because another thread could be running those same three lines at the same time. The code for locking a mutex guarantees that waiting for okToModify to become true and setting okToModify = false happen in one go, for example by using a special machine-code instruction called "compare-and-exchange", so no other thread can get "in between the lines".
So do not use booleans instead of mutexes, but you can think of a mutex as a special, thread-safe boolean.
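For comparison, here is a rough sketch of what a working version of the okToModify idea could look like when the flag is a std::atomic<bool> and the test-and-set is done with compare-exchange. SpinLock and run_spinlock_demo are hypothetical names for this illustration; it is a teaching spinlock, and std::mutex remains the better choice in real code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical teaching spinlock built on compare-and-exchange.
class SpinLock {
public:
    void lock() {
        bool expected = true;
        // Atomically: if ok_to_modify_ is true, set it to false and succeed.
        while (!ok_to_modify_.compare_exchange_weak(expected, false,
                                                    std::memory_order_acquire)) {
            expected = true;            // a failed exchange overwrote it
            std::this_thread::yield();  // let the current owner make progress
        }
    }
    void unlock() { ok_to_modify_.store(true, std::memory_order_release); }

private:
    std::atomic<bool> ok_to_modify_{true};
};

// Four threads increment a plain int, protected only by the SpinLock.
int run_spinlock_demo() {
    SpinLock lock;
    int integer = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++integer;
            }
        });
    }
    for (auto& th : threads) th.join();
    return integer;
}
```

The check and the update of the flag are a single atomic step here, which is exactly what the plain-bool version above cannot provide.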
m.lock() doesn't really lock anything. What it does is, it waits to take ownership of the mutex. A mutex always either is owned by exactly one thread or else it is available. m.lock() waits until the mutex becomes available, and then it takes ownership of it in the name of the calling thread.
m.unlock() releases the mutex (i.e., it relinquishes ownership), causing the mutex to once again become available.
Mutexes also perform another very important function. In modern C++, when some thread T performs a sequence of assignments of various values to various memory locations, the system makes no guarantees about when other threads U, V, and W will see those assignments, whether the other threads will see the assignments happen in the same order in which thread T performed them, or even, whether the other threads will ever see the assignments.
There are some quite complicated rules governing things that a programmer can do to ensure that different threads see a consistent view of shared memory objects (Google "C++ memory model"), but here's one simple rule:
Whatever thread T did before it releases some mutex M is guaranteed to be visible to any other thread U after thread U subsequently locks the same mutex M.
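A small sketch of that rule, with hypothetical names (payload, ready, run_visibility_demo): thread T writes two ordinary variables while holding the mutex, and thread U only ever looks at them while holding the same mutex.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Returns the payload value that thread U observed.
int run_visibility_demo() {
    std::mutex m;
    int payload = 0;     // plain, non-atomic data protected by m
    bool ready = false;  // plain, non-atomic flag protected by m
    int observed = -1;

    std::thread t([&] {
        std::lock_guard<std::mutex> lock(m);
        payload = 42;
        ready = true;
    });  // the mutex is released when the lambda returns

    std::thread u([&] {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(m);
                // Once U sees ready == true under the lock, everything T
                // wrote before unlocking is guaranteed to be visible too.
                if (ready) {
                    observed = payload;
                    return;
                }
            }
            std::this_thread::yield();
        }
    });

    t.join();
    u.join();
    return observed;
}
```

The polling loop is only there to keep the example short; a condition variable would avoid the repeated locking.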
The Boost library offered support for threads before the C++11 standard. As part of that support, it also offers an implementation of a "barrier", a simple class which allows synchronization. To quote the Boost website:
"A barrier is a simple concept. Also known as a rendezvous, it is a synchronization point between multiple threads. The barrier is configured for a particular number of threads (n), and as threads reach the barrier they must wait until all n threads have arrived. Once the n-th thread has reached the barrier, all the waiting threads can proceed, and the barrier is reset."
The implementation of the main function of the barrier (wait), as of Boost 1.54, is shown below:
bool wait()
{
boost::mutex::scoped_lock lock(m_mutex);
unsigned int gen = m_generation;
if (--m_count == 0)
{
m_generation++;
m_count = m_threshold;
m_cond.notify_all();
return true;
}
while (gen == m_generation)
m_cond.wait(lock);
return false;
}
It can be seen that the barrier is reusable: Once constructed, it doesn't need to be destroyed after its first use.
My question now is: What is the variable m_generation for? I am assuming the writers of the boost library had a reason to include it. It is incremented each time the barrier is reset/ready to be reused, but to what end? It is a private variable, thus it cannot be read out from the outside. The same problem could just as easily be solved with a simple internal bool variable inside the wait() function, without having a private class variable.
In a nutshell, m_generation is needed to deal with spurious wakeups.
The generation counter is used in conjunction with the condition variable to signal to all threads waiting on the barrier that they are free to proceed:
Once there are m_threshold threads that have reached the barrier, its generation number gets bumped up, and the condition variable is signalled. This causes the waiting threads (i.e. those that have reached the barrier earlier) to wake up from m_cond.wait(lock).
Now, the waiting threads can wake up from m_cond.wait(lock) for other reasons. This is where m_generation comes in: if it's changed, then the barrier has been reset and the thread can proceed. If m_generation still contains the same value, the thread needs to go back into m_cond.wait(lock).
Having an automatic variable inside wait() would not achieve this, since each thread would have its own instance.
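To experiment with this, here is a sketch of the same logic ported to the standard library. This is an adaptation for illustration, not the Boost source; the class name Barrier, the trailing-underscore member names, and the demo function are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier following the generation-counter scheme described above.
class Barrier {
public:
    explicit Barrier(unsigned int count)
        : threshold_(count), count_(count), generation_(0) {}

    // Returns true for the one thread that completes each generation.
    bool wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const unsigned int gen = generation_;  // generation this thread joined
        if (--count_ == 0) {
            ++generation_;        // releases everyone waiting on 'gen'
            count_ = threshold_;  // reset for the next use
            cond_.notify_all();
            return true;
        }
        // Sleep again after any wakeup where the generation is unchanged.
        cond_.wait(lock, [this, gen] { return gen != generation_; });
        return false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    const unsigned int threshold_;
    unsigned int count_;
    unsigned int generation_;
};

// Three threads pass the same barrier five times; counts the 'true' returns.
int run_barrier_demo() {
    Barrier barrier(3);
    std::atomic<int> completions{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < 3; ++t) {
        threads.emplace_back([&] {
            for (int round = 0; round < 5; ++round) {
                if (barrier.wait()) ++completions;
            }
        });
    }
    for (auto& th : threads) th.join();
    return completions.load();
}
```

A fast thread may re-enter wait() for the next round before slower threads have woken from the previous one. The slower threads still proceed, because they compare against the generation they captured on entry.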
If I am accessing shared memory for reading only, to check a condition for an if() block, should I still lock the mutex? E.g.
mutex_lock();
if (var /* shared memory */) {
}
mutex_unlock();
Is locking here needed and good practice?
If the variable you are reading could be written to concurrently, then yes, you should acquire a lock on the mutex.
You could only read it atomically if your compiler provides you with the necessary primitives for that; this could be either the atomic features that come with C11 and C++11 or some other language extension provided by your compiler. Then you could move the mutex acquisition into the conditional, but if you wait until after the test to acquire the mutex then someone else may change it between the time you test it and the time you acquire the mutex:
if (example) {
// "example" variable could be changed here by another thread.
mutex_lock();
// Now the condition might be false!
mutex_unlock();
}
Therefore, I would suggest acquiring the mutex before the conditional, unless profiling has pinpointed mutex acquisition as a bottleneck. (And in the case where the tested variable is larger than a CPU register -- a 64-bit number on a 32-bit CPU, for example -- then you don't even have the option of delaying mutex acquisition without some other kind of atomic fetch or compare primitive.)
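As a sketch of the "lock first, then test" recommendation, the example below keeps the check and the action under one lock so that the condition cannot change in between. Stock, take_one, and the numbers used are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared resource: a count of items guarded by a mutex.
struct Stock {
    std::mutex m;
    int items = 1000;  // only read or written while holding m

    // Check and act under one lock, so no other thread can change 'items'
    // between the test and the decrement.
    bool take_one() {
        std::lock_guard<std::mutex> lock(m);
        if (items > 0) {
            --items;
            return true;
        }
        return false;
    }
};

// Four threads make 500 attempts each (2000 in total) on 1000 items.
int run_check_then_act_demo() {
    Stock stock;
    std::vector<int> taken(4, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&stock, &taken, t] {
            for (int i = 0; i < 500; ++i) {
                if (stock.take_one()) ++taken[t];
            }
        });
    }
    for (auto& th : threads) th.join();
    return taken[0] + taken[1] + taken[2] + taken[3];
}
```

If the test were done before taking the lock, two threads could both see one remaining item and both decrement it.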
I need a fully-recursive multiple-reader/single-writer lock (shared mutex) for my project - I don't agree with the notion that if you have complete const-correctness you shouldn't need them (there was some discussion about that on the boost mailing list), in my case the lock should protect a completely transparent cache which would be mutable in any case.
As for the semantics of recursive MRSW locks, I think the only ones that make sense are that acquiring an exclusive lock in addition to a shared one temporarily releases the shared one, to be reacquired when the exclusive one is released.
This has the somewhat strange effect that unlocking can wait, but I can live with that - writing rarely happens anyway, and recursive locking usually only happens through recursive code paths, in which case the caller has to be prepared for the call to wait in any case. To avoid it, one can still simply upgrade the lock instead of using recursive locking.
Acquiring a shared lock on top of an exclusive one should obviously just increase the lock count.
So the question becomes - how should I implement it? The usual approach with a critical section and two semaphores doesn't work here because - as far as I can see - the woken-up thread has to handshake by inserting its thread id into the lock's owner map.
I suppose it would be doable with two condition variables and a couple of mutexes, but the sheer number of synchronization primitives I would end up using sounds like a bit too much overhead for my taste.
An idea which just sprang into my mind is to utilize TLS to remember the type of lock I'm holding (and possibly the local lock counts). Have to think it through - but I'll still post the question for now.
Target platform is Win32 but that shouldn't really matter. Note that I'm specifically targeting Win2k so anything related to the new MRSW lock primitive in Windows 7 is not relevant for me. :-)
Okay, I solved it.
It can be done with just 2 semaphores, a critical section and almost no more locking than for a regular non-recursive MRSW lock (there is obviously some more CPU-time spent inside the lock because that multimap must be managed) - but it's tricky. The structure I came up with looks like this:
// Protects everything that follows, except mWriterThreadId and mRecursiveUpgrade
CRITICAL_SECTION mLock;
// Semaphore to wait on for a read lock
HANDLE mSemaReader;
// Semaphore to wait on for a write lock
HANDLE mSemaWriter;
// Number of threads waiting for a write lock.
int mWriterWaiting;
// Number of times the writer entered the write lock.
int mWriterActive;
// Number of threads inside a read lock. Note that this does not include
// recursive read locks.
int mReaderActiveThreads;
// Whether or not the current writer obtained the lock by a recursive
// upgrade. Note that this member might be set outside the critical
// section, so it should only be read from by the writer during his
// unlock.
bool mRecursiveUpgrade;
// This member contains the current thread id once for each
// (recursive) read lock held by the current thread in addition to an
// undefined number of other thread ids which may or may not hold a
// read lock, even inside the critical section (!).
std::multiset<unsigned long> mReaderActive;
// If there is no writer this member contains 0.
// If the current thread is the writer this member contains his
// thread-id.
// Otherwise it can contain either of them, even inside the
// critical section (!).
// Also note that it might be set outside the critical section.
unsigned long mWriterThreadId;
Now, the basic idea is this:
Full update of mWriterWaiting and mWriterActive for an unlock is performed by the unlocking thread.
For mWriterThreadId and mReaderActive this is not possible, as the waiting thread needs to insert itself when it was released.
So the rule is that you may never access those two members except to check whether you are holding a read lock or are the current writer - specifically, they may not be used to check whether or not there are any readers / writers - for that you have to use the (somewhat redundant but necessary for this reason) mReaderActiveThreads and mWriterActive.
I'm currently running some test code (which has been going on deadlock- and crash-free for 30 minutes or so) - when I'm sure that it's stable and I've cleaned up the code somewhat I'll put it on some pastebin and add a link in a comment here (just in case someone else ever needs this).
Well, I did some thinking. Starting from the simple "two semaphores and a critical section", one adds a writer lock count and an owning writer TID to the structure.
Unlock still sets most of the new status in the critsec. Readers still normally increase the lock count - recursive locking simply adds a non-existing reader to the counter.
During a writer's lock() I compare the owning TID, and if the writer already owns the lock, the write lock counter is increased.
Setting the new writer TID can't be done by unlock() - it doesn't know which thread will be woken - but if writers reset it back to zero in their unlock() it's not a problem: the current thread id won't ever be zero, and setting it is an atomic operation.
All sounds simple enough - one nasty problem left: A recursive reader-reader lock while a writer is waiting will deadlock. And I don't know how to solve that short of doing a reader-biased lock... somehow I need to know whether or not I already own a reader lock.
Using TLS doesn't sound too great after I realized that the number of available slots might be rather limited...
As far as I understand, you need to provide your writer exclusive access to the data, while readers can operate simultaneously (if this is not what you want, please clarify your question).
I think you need to implement a sort of "inverse semaphore", i.e. a semaphore that will block a thread when positive, and signal all waiting threads when zero. If you do this, you can use two such semaphores for your program. The operation of your threads could then be the following:
Reader:
(1) wait on sem A
(2) increase sem B
(3) read operation
(4) decrease sem B
Writer:
(1) increase sem A
(2) wait on sem B
(3) write operation
(4) decrease sem A
In this way the writer will perform the write operation as soon as all pending readers have finished reading. As soon as your writer finishes, readers can resume their operation without blocking each other.
I am not familiar with Windows mutex/semaphore facilities, but I can think of a way to implement such semaphores using the POSIX threads API (combining a mutex, a counter and a condition variable).
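A minimal sketch of such an "inverse semaphore", written with the C++ standard library equivalents of those POSIX pieces, might look like this. The class and method names are made up for the illustration. It covers only the primitive itself, not the full reader/writer protocol: as listed, reader steps (1) and (2) are two separate operations, so a complete lock would still have to stop a writer from slipping in between them.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical "inverse semaphore": wait() blocks while the count is
// positive and all waiters are released when it drops to zero.
class InverseSemaphore {
public:
    void increase() {
        std::lock_guard<std::mutex> lock(m_);
        ++count_;
    }
    void decrease() {
        std::lock_guard<std::mutex> lock(m_);
        if (--count_ == 0) cv_.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Returns 1 if the waiter was released only after all three decreases.
int run_inverse_semaphore_demo() {
    InverseSemaphore sem;
    for (int i = 0; i < 3; ++i) sem.increase();

    std::atomic<bool> released{false};
    std::thread waiter([&] {
        sem.wait();
        released = true;
    });
    // The count is still 3 here, so the waiter cannot have passed wait().
    const bool released_early = released.load();

    std::vector<std::thread> workers;
    for (int i = 0; i < 3; ++i) {
        workers.emplace_back([&sem] { sem.decrease(); });
    }
    for (auto& w : workers) w.join();
    waiter.join();
    return (!released_early && released.load()) ? 1 : 0;
}
```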