Reading/writing an STL map in a multithreaded environment - C++

Problem: I need to write a function which returns a value for an input key from a map. If the function can't find the value in the map, it will fetch the value from the database, write it into the map for future use, and return it. There can be multiple threads calling this function.
I am thinking along these lines:
string GetData (const int key)
{
    pthread_rwlock_rdlock(&rwlock); // read lock
    string result = "not found";
    my_map::const_iterator iter = m.find(key);
    if (iter != m.end()) // found
    {
        result = iter->second;
    }
    else // missing
    {
        pthread_rwlock_wrlock(&rwlock); // write lock
        // fetch value from database
        // if successful, add to the map
        m[key] = "missing data";
        result = "missing data";
        pthread_rwlock_unlock(&rwlock); // unlock write lock
    }
    pthread_rwlock_unlock(&rwlock); // unlock read lock
    return result;
}
Is this function thread-safe? Isn't it possible for two or more threads to queue up on the write lock and then query the same key from the database? If so, how can I avoid that scenario?

This function is not thread-safe because it results in undefined behavior. When you attempt to obtain the write lock, you already hold a read lock. From the documentation for pthread_rwlock_wrlock:
Results are undefined if the calling thread holds the read-write lock (whether a read or write lock) at the time the call [to pthread_rwlock_wrlock] is made.
This solution is also not exception-safe. If an exception is thrown while the lock is held, the lock will not be released and your application will undoubtedly deadlock. You should use a C++ threading library (Boost.Thread, OpenThreads, just::thread, or something similar) that provides a C++-oriented design supporting things like scoped_lock (or lock_guard).
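If you need to stay on raw pthreads, a thin RAII wrapper gives the same guarantee. A minimal sketch (ReadGuard is an illustrative name, not a standard type; a WriteGuard calling pthread_rwlock_wrlock would be analogous):
#include <pthread.h>

// Locks in the constructor, unlocks in the destructor, so the lock is
// released on every path out of the scope, including exceptions.
class ReadGuard {
public:
    explicit ReadGuard(pthread_rwlock_t& l) : lock_(l) {
        pthread_rwlock_rdlock(&lock_);
    }
    ~ReadGuard() { pthread_rwlock_unlock(&lock_); }
    ReadGuard(const ReadGuard&) = delete;
    ReadGuard& operator=(const ReadGuard&) = delete;
private:
    pthread_rwlock_t& lock_;
};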
As for making the algorithm correct, you need something along the lines of:
obtain read lock
attempt to find object
if object exists
    return object
else
    release read lock
    obtain write lock
    if object exists
        return object
    else
        insert object
        return object
[If you use some sort of lock_guard, you don't need to worry about releasing held locks when you return]
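For illustration, here is the same algorithm sketched with C++17's std::shared_mutex (the libraries above offer equivalents; the database call is stubbed out, since the question doesn't show it):
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

std::map<int, std::string> m;
std::shared_mutex rwlock;

std::string GetData(int key)
{
    {
        std::shared_lock<std::shared_mutex> read(rwlock); // read lock
        std::map<int, std::string>::const_iterator iter = m.find(key);
        if (iter != m.end())
            return iter->second;
    } // read lock released here

    std::unique_lock<std::shared_mutex> write(rwlock); // write lock
    // Re-check: another thread may have inserted the value while we
    // were between the two locks.
    std::map<int, std::string>::const_iterator iter = m.find(key);
    if (iter != m.end())
        return iter->second;

    std::string value = "missing data"; // stand-in for the database fetch
    m[key] = value;
    return value;
}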

Not properly implemented. You can't take the write lock while still holding the read lock, for fear of deadlock. From the man page for pthread_rwlock_wrlock on my Linux box:
The pthread_rwlock_wrlock() function shall apply a write lock to the read-write lock referenced by rwlock. The calling thread acquires the write lock if no other thread (reader or writer) holds the read-write lock rwlock. Otherwise, the thread shall block until it can acquire the lock. The calling thread may deadlock if at the time the call is made it holds the read-write lock (whether a read or write lock).
Further, you should check the return values of these calls... for example, there's an implementation-defined limit on the number of simultaneous readers.
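For example, a minimal sketch of what checking those return values might look like (the error-handling strategy is up to you):
#include <pthread.h>
#include <cstring>
#include <stdexcept>

void lock_for_reading(pthread_rwlock_t& rwlock)
{
    int rc = pthread_rwlock_rdlock(&rwlock);
    if (rc != 0) {
        // e.g. EAGAIN: the implementation-defined maximum number of
        // read locks was exceeded; EDEADLK: deadlock was detected
        throw std::runtime_error(std::strerror(rc));
    }
}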
There are also the usual issues with exception safety... consider a scope guard or try/catch block.

You might fix it by looking up the value again once you have acquired the write lock. That should be enough to fix the problem you are describing. Something like:
string GetData (const int key)
{
    pthread_rwlock_rdlock(&rwlock); // read lock
    string result = "not found";
    my_map::const_iterator iter = m.find(key);
    if (iter != m.end()) // found
    {
        result = iter->second;
    }
    else // missing
    {
        // change from read mode to write mode
        pthread_rwlock_unlock(&rwlock); // unlock read lock
        pthread_rwlock_wrlock(&rwlock); // write lock
        // Try again
        iter = m.find(key);
        if (iter != m.end()) {
            result = iter->second;
        } else {
            // if successful, add to the map
            m[key] = "missing data";
            result = "missing data";
        }
    }
    pthread_rwlock_unlock(&rwlock); // unlock read/write lock
    return result;
}

Related

When can memory_order_acquire or memory_order_release be safely removed from compare_exchange?

I refer to the code in Lewis Baker's coroutine tutorial.
https://lewissbaker.github.io/2017/11/17/understanding-operator-co-await
bool async_manual_reset_event::awaiter::await_suspend(
    std::experimental::coroutine_handle<> awaitingCoroutine) noexcept
{
    // Special m_state value that indicates the event is in the 'set' state.
    const void* const setState = &m_event;

    // Remember the handle of the awaiting coroutine.
    m_awaitingCoroutine = awaitingCoroutine;

    // Try to atomically push this awaiter onto the front of the list.
    void* oldValue = m_event.m_state.load(std::memory_order_acquire);
    do
    {
        // Resume immediately if already in 'set' state.
        if (oldValue == setState) return false;

        // Update linked list to point at current head.
        m_next = static_cast<awaiter*>(oldValue);

        // Finally, try to swap the old list head, inserting this awaiter
        // as the new list head.
    } while (!m_event.m_state.compare_exchange_weak(
        oldValue,
        this,
        std::memory_order_release,
        std::memory_order_acquire));

    // Successfully enqueued. Remain suspended.
    return true;
}
where m_state is just a std::atomic<void *>.
bool async_manual_reset_event::is_set() const noexcept
{
    return m_state.load(std::memory_order_acquire) == this;
}

void async_manual_reset_event::reset() noexcept
{
    void* oldValue = this;
    m_state.compare_exchange_strong(oldValue, nullptr, std::memory_order_acquire);
}
void async_manual_reset_event::set() noexcept
{
    // Needs to be 'release' so that subsequent 'co_await' has
    // visibility of our prior writes.
    // Needs to be 'acquire' so that we have visibility of prior
    // writes by awaiting coroutines.
    void* oldValue = m_state.exchange(this, std::memory_order_acq_rel);
    if (oldValue != this)
    {
        // Wasn't already in 'set' state.
        // Treat old value as head of a linked-list of waiters
        // which we have now acquired and need to resume.
        auto* waiters = static_cast<awaiter*>(oldValue);
        while (waiters != nullptr)
        {
            // Read m_next before resuming the coroutine as resuming
            // the coroutine will likely destroy the awaiter object.
            auto* next = waiters->m_next;
            waiters->m_awaitingCoroutine.resume();
            waiters = next;
        }
    }
}
Note that in m_state.exchange in the set() method, the comment above it explains clearly why the call to exchange requires both acquire and release.
I wonder why, in the m_state.compare_exchange_weak in await_suspend(), the success ordering (the third parameter) is std::memory_order_release rather than memory_order_acq_rel (i.e., the acquire is dropped).
The author (Lewis) did explain that we need release in the compare_exchange_weak so that a later set() can see the writes made by the compare_exchange_weak. But why don't we need the compare_exchange_weak calls in other threads to see the writes of the current compare_exchange_weak?
Is it because of the release sequence? That is, in a release chain (a release write first, middle operations that are all "read-acquire then write-release", and a final read-acquire), the operations in the middle don't need to acquire?
In the following code, I tried to implement a shared lock:
struct lock {
    uint64_t exclusive : 1;
    uint64_t id : 48;
    uint64_t shared_count : 15;
};

std::atomic<lock> lock_ { {0, 0, 0} };

bool try_lock_shared() noexcept {
    lock currentlock = lock_.load(std::memory_order_acquire);
    if (currentlock.exclusive == 1) {
        return false;
    }
    lock newlock;
    do {
        newlock = currentlock;
        newlock.shared_count++;
    } while (!lock_.compare_exchange_weak(currentlock, newlock, std::memory_order_acq_rel)
             && currentlock.exclusive == 0);
    return currentlock.exclusive == 0;
}

bool try_lock() noexcept {
    uint64_t id = utils::get_thread_id();
    lock currentlock = lock_.load(std::memory_order_acquire);
    if (currentlock.exclusive == 1) {
        assert(currentlock.id != id);
        return false;
    }
    bool result = false;
    lock newlock { 1, id, 0 };
    do {
        newlock.shared_count = currentlock.shared_count;
    } while (!(result = lock_.compare_exchange_weak(currentlock, newlock, std::memory_order_acq_rel))
             && currentlock.exclusive == 0);
    return result;
}
I used lock_.compare_exchange_weak(currentlock, newlock, std::memory_order_acq_rel) everywhere; can I safely replace it with compare_exchange_weak(currentlock, newlock, std::memory_order_release, std::memory_order_acquire)?
I could also see examples where memory_order_release is dropped from compare_exchange_strong (see the compare_exchange_strong in the reset() function of Lewis's code), where you only need std::memory_order_acquire (but not release). I never really saw memory_order_release dropped from a weak CAS, nor memory_order_acquire dropped from a strong one.
This made me wonder whether there's a deeper rule that I don't understand.
Thanks.
memory_order_acquire only makes sense for operations that read a value, and memory_order_release only makes sense for operations that write a value. Since a read-modify-write operation both reads and writes, it is possible to combine these memory orders, but it is not always necessary.
The m_event.m_state.compare_exchange_weak uses memory_order_release to write the new value, because it tries to replace a value that has previously been read using memory_order_acquire:
// load initial value using memory_order_acquire
void* oldValue = m_event.m_state.load(std::memory_order_acquire);
do {
    ...
} while (!m_event.m_state.compare_exchange_weak(oldValue, this,
    std::memory_order_release,
    // in case of failure, load new value using memory_order_acquire
    std::memory_order_acquire));
IMHO in this case it is not even necessary to use memory_order_acquire at all, since oldValue is never dereferenced but only stored as the next pointer; i.e., it would be perfectly fine to replace these two memory_order_acquire with memory_order_relaxed.
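In other words, a sketch of that weakened variant (my edit of the tutorial code, not the author's original):
// oldValue is only stored into m_next, never dereferenced here,
// so a relaxed load arguably suffices on this side.
void* oldValue = m_event.m_state.load(std::memory_order_relaxed);
do
{
    if (oldValue == setState) return false;
    m_next = static_cast<awaiter*>(oldValue);
} while (!m_event.m_state.compare_exchange_weak(
    oldValue,
    this,
    std::memory_order_release,   // still release: set() must see m_next
    std::memory_order_relaxed)); // failure load can be relaxed as well
return true;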
In async_manual_reset_event::set() the situation is different:
void* oldValue = m_state.exchange(this, std::memory_order_acq_rel);
if (oldValue != this)
{
    auto* waiters = static_cast<awaiter*>(oldValue);
    while (waiters != nullptr)
    {
        // we are de-referencing the pointer read from m_state!
        auto* next = waiters->m_next;
        waiters->m_awaitingCoroutine.resume();
        waiters = next;
    }
}
Since we are de-referencing the pointer we read from m_state, we have to ensure that these reads happen after the writes to those waiter objects. This is ensured via the synchronizes-with relation on m_state. The waiters are added via the previously discussed compare_exchange using memory_order_release. The acquire part of the exchange synchronizes with that release-compare_exchange (and in fact with all prior release-compare_exchange operations that are part of the release sequence), thus providing the necessary happens-before relation.
To be honest, I am not sure why this exchange would need the release part. I think the author might have wanted to be on "the safe side", since several other operations are also stronger than necessary (I already mentioned that await_suspend does not need memory_order_acquire, and the same goes for is_set and reset).
For your lock implementation it is very simple: when you want to acquire the lock (try_lock_shared/try_lock), use memory_order_acquire for the compare-exchange operation only. Releasing the lock has to use memory_order_release.
The argument is also quite simple: you have to ensure that when you have acquired the lock, any changes previously made to the data protected by the lock are visible to the current owner. That is, you have to ensure that those changes happened before the operations you are about to perform after acquiring the lock. This is achieved by establishing a synchronize-with relation between the try_lock (acquire-CAS) and the previous unlock (release-store).
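For completeness, a hedged sketch of the matching release side for your lock (unlock_shared/unlock are assumed names, mirroring the CAS style of your try_lock_shared):
void unlock_shared() noexcept {
    lock currentlock = lock_.load(std::memory_order_relaxed);
    lock newlock;
    do {
        newlock = currentlock;
        newlock.shared_count--;
        // release: synchronizes-with the acquire-CAS in a later try_lock*
    } while (!lock_.compare_exchange_weak(currentlock, newlock,
                 std::memory_order_release, std::memory_order_relaxed));
}

void unlock() noexcept {
    lock currentlock = lock_.load(std::memory_order_relaxed);
    // We hold the exclusive bit, so nobody else modifies lock_ meanwhile;
    // a plain release-store suffices and synchronizes-with the next
    // acquire-CAS that takes the lock.
    lock_.store(lock{0, 0, currentlock.shared_count}, std::memory_order_release);
}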
When trying to argue about the correctness of an implementation based on the semantics of the C++ memory model I usually do this as follows:
identify the necessary happens-before relations (like for your lock)
make sure that these happens-before relations are established correctly on all code paths
And I always annotate the atomic operations to document how these relations are established (i.e., which other operations are involved). For example:
// (1) - this acquire-load synchronizes-with the release-CAS (11)
auto n = head.load(std::memory_order_acquire);
// (8) - this acquire-load synchronizes-with the release-CAS (11)
h.acquire(head, std::memory_order_acquire);
// (11) - this release-CAS synchronizes-with the acquire-load (1, 8)
if (head.compare_exchange_weak(expected, next, std::memory_order_release, std::memory_order_relaxed))
(see https://github.com/mpoeter/xenium/blob/master/xenium/michael_scott_queue.hpp for the full code)
For more details about the C++ memory model I can recommend this paper which I have co-authored: Memory Models for C/C++ Programmers

Reading from std::map without atomic flag on write

I have a std::map myMap and a std::atomic myLock.
The write is:
if (myLock == 0)
{
    myLock++;
    myMap.insert(key, value);
    myLock--;
}
If I do something like this without locking from another thread, is this considered undefined behavior? The key thing is, I do not mind if the results are not accurate (i.e., a value in the map updated after I iterated past it). I just don't want to crash.
MyConstIterator endIt = myMap.cend();
for (MyConstIterator it = myMap.cbegin(); it != endIt; ++it)
{
}
I'm trying to achieve lockless reads without a mutex, but I know std::map is not thread-safe. Do I have to add to the atomic lock to avoid a crash?
Your use of myLock won't make your map thread-safe. Two threads can both read myLock == 0 and head into your braces, and the reading thread never checks the flag at all, so a read can overlap an insert. Unsynchronized concurrent access to a std::map is a data race, and therefore undefined behaviour; it can crash.
You need a mutex. This answer on locking may be useful.
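A minimal sketch of what that looks like (key/value types are placeholders for your own):
#include <map>
#include <mutex>

std::map<int, int> myMap;
std::mutex mapMutex; // guards every access to myMap

void insertValue(int key, int value) {
    std::lock_guard<std::mutex> lock(mapMutex);
    myMap.insert({key, value});
}

void iterate() {
    std::lock_guard<std::mutex> lock(mapMutex);
    for (auto it = myMap.cbegin(); it != myMap.cend(); ++it) {
        // safe: no other thread can mutate the map while we hold the lock
    }
}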

Using a mutex to block execution from outside the critical section

I'm not sure I got the terminology right, but here goes. I have this function that is used by multiple threads to write data (pseudo-code in comments illustrates what I want):
// these are initialized in the constructor
int* data;
std::atomic<size_t> size;

void write(int value) {
    // wait here while "read_lock"
    // set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    // set "write_lock" to "write_lock" - 1
}
The order of the writes is not important; all I need here is for each write to go to a unique slot.
Every once in a while, though, I need one thread to read the data using this function:
int* read() {
    // set "read_lock" to true
    // wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    // set "read_lock" to false
    return ret;
}
So it basically swaps out the buffer and returns the old one (I've removed the capacity logic to make the snippets shorter).
In theory this should lead to two operating scenarios:
1 - just a bunch of threads writing into the container
2 - when some thread executes the read function, all new writers have to wait, the reader waits until all in-flight writes are finished, then it does the read logic, and scenario 1 can continue.
The problem is that I don't know what kind of barrier to use for the locks:
A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.
I don't know how to apply std::mutex, since I only want the write function to be in a critical section when the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown for operating scenario 1.
So what would be the optimal solution here?
If you have C++14 capability then you can use a std::shared_timed_mutex to separate out readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads at the same time) and your reader threads unique access (kicking all other threads out).
So something like this may be what you need:
class MyClass
{
public:
    using mutex_type  = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:
    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};

// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();
    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();
    // read the data here
}
The shared_locks allow other threads to gain a shared_lock at the same time, but prevent a unique_lock from gaining simultaneous access. When a reader thread tries to gain a unique_lock, all shared_locks will be vacated before the unique_lock gets exclusive control.
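Usage could then look something like this (purely illustrative):
#include <memory>
#include <thread>
#include <vector>

int main()
{
    auto sptr = std::make_shared<MyClass>();
    std::vector<std::thread> writers;
    for (int i = 0; i < 4; ++i)
        writers.emplace_back(do_writing_work, sptr); // run concurrently
    std::thread reader(do_reading_work, sptr);       // excludes all writers
    for (auto& t : writers)
        t.join();
    reader.join();
}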
You can also do this with regular mutexes and condition variables rather than a shared mutex. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Gallik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called much more often than read, so maybe that is undesirable.
int* data; // initialized somewhere
std::atomic<size_t> size = 0;
std::atomic<bool> reading = false;
std::atomic<int> num_writers = 0;
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        --num_writers;
        if (num_writers == 0)
        {
            std::lock_guard l(leaving);
            cv.notify_one();
        }
        { std::lock_guard l(entering); }
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock enter_lock(entering);
        reading = true;
        std::unique_lock leave_lock(leaving);
        cv.wait(leave_lock, [] () { return num_writers == 0; });
        std::swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}
It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.
In the common case, where only writing is happening, reading is always false. So you do the usual work and pay only for two additional atomic increments and two untaken branches. The common path therefore does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.
Now, suppose read is called. The expensive, slow heap allocation happens first, while writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Now reading is set to true. Immediately, any new calls to write enter the first branch and eventually hit the entering lock, which they are unable to acquire (as it's already taken), and those threads are put to sleep.
Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could go through right away. If, however, there are threads in write in either of the two regions between incrementing and decrementing num_writers, it will not. Each time a write thread decrements num_writers, it checks whether it has reduced that number to zero, and when it has, it signals the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last thread will see num_writers == 0; it could also be notified more than once, but this is OK and cannot result in bad behavior.
Once that condition variable has been signalled, all writers are either trapped in the first branch or done modifying the array. So the read thread can now safely swap the data, unlock everything, and return what it needs to.
As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).

How to avoid freezing other threads when one thread locks a big map

How can I avoid freezing other threads that try to access the same map that is currently locked by another thread? See the code below:
// pseudo code
std::map<string, CSomeClass*> gBigMap;

void AccessMapForWriting(string aString) {
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    if (obj) {
        gBigMap.erase(aString);
        delete obj;
        obj = NULL;
    }
    pthread_mutex_unlock(&MapLock);
}

void AccessMapForReading(string aString) {
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    // below code consumes much time
    // sometimes it even sleeps for milliseconds
    if (obj) {
        obj->TimeConsumingOperation();
    }
    pthread_mutex_unlock(&MapLock);
}

// other threads will also call
// the same function -- AccessMap
void* OtherThreadFunc(void*) {
    // call AccessMap here
}
Consider using a read-write lock instead: pthread_rwlock_t.
There are some details here.
It says
"Using a normal mutex, when a thread obtains the mutex all other
threads are forced to block until that mutex is released by the owner.
What about the situation where the vast majority of threads are simply
reading the data? If this is the case then we should not care if there
is 1 or up to N readers in the critical section at the same time. In
fact the only time we would normally care about exclusive ownership is
when a writer needs access to the code section."
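Applied to the code above, the reading path might look like this (a sketch; note it uses find() rather than operator[], since operator[] inserts on a miss and is therefore a write):
pthread_rwlock_t gMapRwLock = PTHREAD_RWLOCK_INITIALIZER;

void AccessMapForReading(string aString) {
    pthread_rwlock_rdlock(&gMapRwLock); // many readers can hold this at once
    std::map<string, CSomeClass*>::const_iterator it = gBigMap.find(aString);
    CSomeClass* obj = (it != gBigMap.end()) ? it->second : NULL;
    if (obj) {
        obj->TimeConsumingOperation(); // no longer blocks other readers
    }
    pthread_rwlock_unlock(&gMapRwLock);
}

void AccessMapForWriting(string aString) {
    pthread_rwlock_wrlock(&gMapRwLock); // exclusive: waits for all readers
    // ... erase and delete as before ...
    pthread_rwlock_unlock(&gMapRwLock);
}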
You have a std::string as a key. Can you break that key down into a short suffix (possibly just a single letter) and a remainder? In that case, you might implement this data structure as 255 maps with 255 locks, as sketched below. That of course means that most of the time there's no lock contention, because the suffix differs, and therefore so does the lock.
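A sketch of that sharding idea (the shard count and the last-character index are illustrative choices):
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

class CSomeClass; // as in the question

const std::size_t kShardCount = 256; // one shard per possible byte value

struct Shard {
    std::mutex lock;
    std::map<std::string, CSomeClass*> map;
};

Shard gShards[kShardCount];

Shard& ShardFor(const std::string& key) {
    // Index by the key's last character (the "suffix" above); keys ending
    // in different letters never contend for the same lock.
    unsigned char c = key.empty() ? 0 : static_cast<unsigned char>(key.back());
    return gShards[c];
}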

Is this code thread-safe?

I'm refactoring a time-consuming function so that it can be called from a thread, but I'm having trouble wrapping my head around the issue (I'm not very familiar with thread programming).
At any point, the user can cancel and the function will stop. I do not want to kill the thread as soon as the user cancels, since that could cause data integrity problems. Instead, in several places in the function, I will check whether the function has been cancelled and, if so, exit. I will only do that where I know it's safe to exit.
The whole body of the function will be within a mutex. This is the pseudo-code I have in mind:
SomeClass::SomeClass() {
    cancelled_ = false;
}

void SomeClass::cancelBigSearch() {
    cancelled_ = true;
}

void SomeClass::bigSearch() {
    mutex.lock();
    // ...
    // Some code
    // ...

    // Safe to exit at this point
    if (cancelled_) {
        mutex.unlock();
        cancelled_ = false;
        return;
    }

    // ...
    // Some more code
    // ...
    if (cancelled_) {
        mutex.unlock();
        cancelled_ = false;
        return;
    }

    // ...
    // Again more code
    // ...
    if (cancelled_) {
        mutex.unlock();
        cancelled_ = false;
        return;
    }

    mutex.unlock();
}
So when the user starts a search, a new thread calls bigSearch(). If the user cancels, cancelBigSearch() is called and a cancelled_ flag is set. Then, when bigSearch() reaches a point where it's safe to exit, it will exit.
Any idea if this is all thread-safe?
You should lock access to cancelled_ with another mutex, so that checking and setting cannot happen simultaneously. Other than that, I think your approach is OK.
Update: Also, make sure no exceptions can be thrown from SomeClass::bigSearch(); otherwise the mutex might remain in a locked state. To make sure that all return paths unlock the mutex, you might want to surround the processing parts of the code with if (!cancelled_) and return only at the very end of the method (where you have the one unlock() call on the mutex).
Better yet, wrap the mutex in a RAII (Resource Acquisition Is Initialization) object, so that no matter how the function ends (by exception or otherwise), the mutex is guaranteed to be unlocked.
Yes, this is thread-safe. But:
Processors can have separate caches, each holding its own copy of cancelled_; mutex synchronization functions normally apply the proper cache synchronization.
Compiler-generated code can make invalid assumptions about your data's locality, which can lead to cancelled_ not being updated in time. Some platform-specific commands can help here, or you can simply use other mechanisms.
Both issues can result in a thread that isn't cancelled as promptly as you wish.
Your usage pattern is simple "signaling": you need to transfer a signal to the thread. A signal pattern allows the same trigger (signal) to be raised multiple times and cleared later.
This can be implemented using:
atomic operations
mutex protected variables
signal synchronization primitives
It's not thread-safe, because one thread could read cancelled_ at the same time another thread writes to it, which is a data race, which is undefined behaviour.
As others suggested, either use an atomic type for cancelled_ or protect it with another mutex.
You should also use RAII types to lock the mutexes.
e.g.
void SomeClass::cancelBigSearch() {
    std::lock_guard<std::mutex> lock(cxlMutex_);
    cancelled_ = true;
}

bool SomeClass::cancelled() {
    std::lock_guard<std::mutex> lock(cxlMutex_);
    if (cancelled_) {
        // reset to false, to avoid caller having to lock mutex again to reset it
        cancelled_ = false;
        return true;
    }
    return false;
}

void SomeClass::bigSearch() {
    std::lock_guard<std::mutex> lock(mutex);
    // ...
    // Some code
    // ...

    // Safe to exit at this point
    if (cancelled())
        return;

    // ...
    // Some more code
    // ...
    if (cancelled())
        return;

    // ...
    // Again more code
    // ...
    if (cancelled())
        return;
}
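Alternatively, making cancelled_ a std::atomic<bool> removes the need for the second mutex entirely. A sketch:
#include <atomic>

std::atomic<bool> cancelled_{false}; // member of SomeClass

void SomeClass::cancelBigSearch() {
    cancelled_.store(true);
}

bool SomeClass::cancelled() {
    // exchange() reads and clears the flag in one atomic step,
    // mirroring the reset-on-read behaviour above
    return cancelled_.exchange(false);
}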