Using shared_ptr to implement RCU (read-copy-update)?

I'm very interested in user-space RCU (read-copy-update), and I am trying to simulate one via tr1::shared_ptr. Here is the code. I'm really a newbie in concurrent programming; would some experts help me review it?
The basic idea is: a reader calls get_reading_copy() to gain a pointer to the current protected data (let's say it's generation one, or G1). A writer calls get_updating_copy() to gain a copy of G1 (let's say it's G2), and only one writer is allowed to enter the critical section. After the update is done, the writer calls update() to do a swap, making m_data_ptr point to the G2 data. The ongoing readers and the writer now hold shared_ptr(s) to G1, and either a reader or the writer will eventually deallocate the G1 data.
Any new reader gets a pointer to G2, and a new writer gets a copy of G2 (let's say it's G3). It's possible that G1 has not been released yet, so multiple generations of the data may co-exist.
template <typename T>
class rcu_protected
{
public:
    typedef T type;
    typedef const T const_type;
    typedef std::tr1::shared_ptr<type> rcu_pointer;
    typedef std::tr1::shared_ptr<const_type> rcu_const_pointer;

    rcu_protected() : m_is_writing(0),
                      m_is_swapping(0),
                      m_data_ptr(new type())
    {}

    rcu_const_pointer get_reading_copy ()
    {
        spin_until_eq (m_is_swapping, 0);
        return m_data_ptr;
    }

    rcu_pointer get_updating_copy ()
    {
        spin_until_eq (m_is_swapping, 0);
        while (!CAS (m_is_writing, 0, 1))
        { /* sleep for back-off when exceeding the maximum retry count */ }
        rcu_pointer new_data_ptr(new type(*m_data_ptr));
        // as spin_until_eq does not have memory barrier protection,
        // we need to place a read barrier so that the loading of
        // new_data_ptr is not re-ordered before its construction
        _ReadBarrier();
        return new_data_ptr;
    }

    void update (rcu_pointer new_data_ptr)
    {
        while (!CAS (m_is_swapping, 0, 1))
        {}
        m_data_ptr.swap (new_data_ptr);
        // as spin_until_eq does not have memory barrier protection,
        // we need to place a write barrier so that the assignments to
        // m_is_writing/m_is_swapping are not re-ordered before the swap
        _WriteBarrier();
        m_is_writing = 0;
        m_is_swapping = 0;
    }

private:
    volatile long m_is_writing;
    volatile long m_is_swapping;
    rcu_pointer   m_data_ptr;
};

At first glance, I would exchange the spin_until_eq calls, and the associated spinlocks, for a mutex. If more than one writer were ever allowed inside the critical section, I would use a semaphore. These concurrency mechanisms may be OS dependent, so performance should be weighed too; still, they are usually better than busy-waiting.
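As an illustration only, here is a minimal sketch of what that might look like with a std::mutex guarding both the copy and the swap (std::shared_ptr substituted for tr1::shared_ptr). Note this variant no longer holds a writer flag across the whole read-copy-update cycle, so concurrent writers would need external serialization:
#include <memory>
#include <mutex>

template <typename T>
class rcu_protected_mtx
{
public:
    typedef std::shared_ptr<T> rcu_pointer;
    typedef std::shared_ptr<const T> rcu_const_pointer;

    rcu_protected_mtx() : m_data_ptr(new T()) {}

    rcu_const_pointer get_reading_copy()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_data_ptr; // reader shares ownership of the current generation
    }

    rcu_pointer get_updating_copy()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return rcu_pointer(new T(*m_data_ptr)); // private copy for the writer
    }

    void update(rcu_pointer new_data_ptr)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_data_ptr.swap(new_data_ptr); // old generation dies with its last owner
    }

private:
    std::mutex m_mutex;
    rcu_pointer m_data_ptr;
};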

Related

Synchronizing method calls on shared object from multiple threads

I am thinking about how to implement a class that will contain private data that will eventually be modified by multiple threads through method calls. For synchronization (using the Windows API), I am planning on using a CRITICAL_SECTION object, since all the threads will be spawned from the same process.
Given the following design, I have a few questions.
template <typename T> class Shareable
{
private:
    const LPCRITICAL_SECTION sync; // Can be read and used by multiple threads
    T *data;
public:
    Shareable(LPCRITICAL_SECTION cs, unsigned elems) : sync{cs}, data{new T[elems]} { }
    ~Shareable() { delete[] data; }
    void sharedModify(unsigned index, T &datum) // <-- Can this be validly called by multiple
                                                //     threads with synchronization being implicit?
    {
        EnterCriticalSection(sync);
        /*
           The critical section of code involving reads & writes to 'data'
        */
        LeaveCriticalSection(sync);
    }
};
// Somewhere else ...
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
    Shareable<ActualType> *ptr = static_cast<Shareable<ActualType>*>(lpParameter);
    ActualType copyable = /* initialization */;
    ptr->sharedModify(validIndex, copyable); // <-- OK, synchronized?
    return 0;
}
The way I see it, the API calls will be conducted in the context of the current thread. That is, I assume this is the same as if I had acquired the critical section object from the pointer and called the API from within ThreadProc(). However, I am worried that if the object is created and placed in the main/initial thread, there will be something funky about the API calls.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Should I instead get a pointer to the critical section object and use that instead?
Is there some other synchronization mechanism that is better suited to this scenario?
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
It's not implicit, it's explicit. There's only one CRITICAL_SECTION, and only one thread can hold it at a time.
Should I instead get a pointer to the critical section object and use that instead?
No. There's no reason to use a pointer here.
Is there some other synchronization mechanism that is better suited to this scenario?
It's hard to say without seeing more code, but this is definitely the "default" solution. It's like a singly-linked list -- you learn it first, it always works, but it's not always the best choice.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Implicit from the caller's perspective, yes.
Should I instead get a pointer to the critical section object and use that instead?
No. In fact, I would suggest giving the Shareable object ownership of its own critical section instead of accepting one from the outside (and embracing RAII concepts to write safer code), eg:
template <typename T>
class Shareable
{
private:
    CRITICAL_SECTION sync;
    std::vector<T> data;

    struct SyncLocker
    {
        CRITICAL_SECTION &sync;
        SyncLocker(CRITICAL_SECTION &cs) : sync(cs) { EnterCriticalSection(&sync); }
        ~SyncLocker() { LeaveCriticalSection(&sync); }
    };

public:
    Shareable(unsigned elems) : data(elems)
    {
        InitializeCriticalSection(&sync);
    }

    Shareable(const Shareable&) = delete;
    Shareable(Shareable&&) = delete;

    ~Shareable()
    {
        {
            SyncLocker lock(sync);
            data.clear();
        }
        DeleteCriticalSection(&sync);
    }

    void sharedModify(unsigned index, const T &datum)
    {
        SyncLocker lock(sync);
        data[index] = datum;
    }

    Shareable& operator=(const Shareable&) = delete;
    Shareable& operator=(Shareable&&) = delete;
};
Is there some other synchronization mechanism that is better suited to this scenario?
That depends. Will multiple threads be accessing the same index at the same time? If not, then there is not really a need for the critical section at all. One thread can safely access one index while another thread accesses a different index.
If multiple threads need to access the same index at the same time, a critical section might still not be the best choice. Locking the entire array might be a big bottleneck if you only need to lock portions of the array at a time. Things like the Interlocked API, or Slim Read/Write locks, might make more sense. It really depends on your thread designs and what you are actually trying to protect.
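As an illustration of that last point, here is a sketch of the same class using a Slim Reader/Writer lock (available on Windows Vista and later), so that concurrent readers no longer serialize against each other; the class name and read method are mine:
#include <windows.h>
#include <vector>

template <typename T>
class SharedArray
{
private:
    SRWLOCK sync = SRWLOCK_INIT;  // lightweight; no explicit destruction needed
    std::vector<T> data;
public:
    explicit SharedArray(unsigned elems) : data(elems) {}

    T sharedRead(unsigned index)
    {
        AcquireSRWLockShared(&sync);     // many readers may hold this at once
        T copy = data[index];
        ReleaseSRWLockShared(&sync);
        return copy;
    }

    void sharedModify(unsigned index, const T &datum)
    {
        AcquireSRWLockExclusive(&sync);  // writers get exclusive access
        data[index] = datum;
        ReleaseSRWLockExclusive(&sync);
    }
};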

Cross-platform up- and downgradable read/write lock

I am trying to make a central data structure of a large codebase safe for multithreaded access.
The access interfaces were changed to represent read/write locks, which may be up- and downgraded:
Before:
Container& container = state.getContainer();
auto value = container.find( "foo" )->bar;
container.clear();
Now:
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
    // Upgrade read lock to write lock
    WriteContainerLock write = state.upgrade( container );
    write.clear();
} // Downgrades write lock to read lock
Using an actual std::mutex for the locking (instead of an r/w implementation) works fine but brings no performance benefit (it actually degrades runtime).
Actual changes to the data are relatively rare, so it seems very desirable to go with the read/write concept. The big issue now is that I cannot seem to find any library that implements the read/write concept, supports upgrade and downgrade, and works on Windows, OSX and Linux alike.
Boost has BOOST_THREAD_PROVIDES_SHARED_MUTEX_UPWARDS_CONVERSIONS but does not seem to support downgrading, nor (blocking) atomic upgrading from shared to unique.
Is there any library out there, that supports the desired feature set?
EDIT:
Sorry for being unclear. Of course I mean multiple-readers/single-writer lock semantics.
The question has changed since I answered. As the previous answer is still useful, I will leave it up.
The new question seems to be "I want a (general purpose) reader writer lock where any reader can be upgraded to a writer atomically".
This cannot be done without deadlocks, or the ability to roll back operations (transactional reads), which is far from general-purpose.
Suppose you have Alice and Bob. Both want to read for a while, then they both want to write.
Alice and Bob both get a read lock. They then upgrade to a write lock. Neither can progress, because a write lock cannot be acquired while a read lock is acquired. You cannot unlock the read lock, because then the state Alice read while read locked may not be consistent with the state after the write lock is acquired.
This can only be solved with the possibility the read->write upgrade can fail, or the ability to rollback all operations in a read (so Alice can "unread", Bob can advance, then Alice can re-read and try to get the write lock).
Writing type-safe transactional code isn't really supported in C++. You can do it manually, but beyond simple cases it is error prone. Other forms of transactional rollbacks can also be used. None of them are general purpose reader-writer locks.
You can roll your own. If the states are R, U, W and {} (read, upgradable, write and no lock), these are transitions you can easily support:
{} -> R|U|W
R|U|W -> {}
U->W
W->U
U->R
and implied by the above:
W->R
which I think satisfies your requirements.
The "missing" transition is R->U, which is what lets us have multiple-readers safely. At most one reader (the upgrade reader) has the right to upgrade to write without releasing their read lock. While they are in that upgrade state they do not block other threads from reading (but they do block other threads from writing).
Here is a sketch. There is a shared_mutex A; and a mutex B;.
B represents the right to upgrade to write, and the right to read while you hold it. All writers also hold B, so you cannot have the right to upgrade to write while someone else has the right to write.
Transitions look like:
{}->R = read(A)
{}->W = lock(B) then write(A)
{}->U = lock(B)
U->W = write(A)
W->U = unwrite(A)
U->R = read(A) then unlock(B)
W->R = W->U->R
R->{} = unread(A)
W->{} = unwrite(A) then unlock(B)
U->{} = unlock(B)
This simply requires std::shared_mutex and std::mutex, and a bit of boilerplate to write up the locks and the transitions.
If you want to be able to spawn a write lock while the upgrade lock "remains in scope" extra work needs to be done to "pass the upgrade lock back to the read lock".
Here are some bonus try transitions, inspired by @HowardHinnant below:
R->try U = return try_lock(B) && unread(A)
R->try W = return R->try U->W
Here is an upgradeable_mutex with no try operations:
struct upgradeable_mutex {
    std::mutex u;
    std::shared_timed_mutex s;

    enum class state {
        unlocked,
        shared,
        aspiring,
        unique
    };

    // one step at a time:
    template<state start, state finish>
    void transition_up() {
        transition_up<start, (state)((int)finish-1)>();
        transition_up<(state)((int)finish-1), finish>();
    }
    // one step at a time:
    template<state start, state finish>
    void transition_down() {
        transition_down<start, (state)((int)start-1)>();
        transition_down<(state)((int)start-1), finish>();
    }

    void lock();
    void unlock();
    void lock_shared();
    void unlock_shared();
    void lock_aspiring();
    void unlock_aspiring();
    void aspiring_to_unique();
    void unique_to_aspiring();
    void aspiring_to_shared();
    void unique_to_shared();
};
template<>
void upgradeable_mutex::transition_up<
    upgradeable_mutex::state::unlocked, upgradeable_mutex::state::shared
>() {
    s.lock_shared();
}
template<>
void upgradeable_mutex::transition_down<
    upgradeable_mutex::state::shared, upgradeable_mutex::state::unlocked
>() {
    s.unlock_shared();
}
template<>
void upgradeable_mutex::transition_up<
    upgradeable_mutex::state::unlocked, upgradeable_mutex::state::aspiring
>() {
    u.lock();
}
template<>
void upgradeable_mutex::transition_down<
    upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unlocked
>() {
    u.unlock();
}
template<>
void upgradeable_mutex::transition_up<
    upgradeable_mutex::state::aspiring, upgradeable_mutex::state::unique
>() {
    s.lock();
}
template<>
void upgradeable_mutex::transition_down<
    upgradeable_mutex::state::unique, upgradeable_mutex::state::aspiring
>() {
    s.unlock();
}
template<>
void upgradeable_mutex::transition_down<
    upgradeable_mutex::state::aspiring, upgradeable_mutex::state::shared
>() {
    s.lock_shared(); // U->R: take shared access to A, then release B
    u.unlock();
}

void upgradeable_mutex::lock() {
    transition_up<state::unlocked, state::unique>();
}
void upgradeable_mutex::unlock() {
    transition_down<state::unique, state::unlocked>();
}
void upgradeable_mutex::lock_shared() {
    transition_up<state::unlocked, state::shared>();
}
void upgradeable_mutex::unlock_shared() {
    transition_down<state::shared, state::unlocked>();
}
void upgradeable_mutex::lock_aspiring() {
    transition_up<state::unlocked, state::aspiring>();
}
void upgradeable_mutex::unlock_aspiring() {
    transition_down<state::aspiring, state::unlocked>();
}
void upgradeable_mutex::aspiring_to_unique() {
    transition_up<state::aspiring, state::unique>();
}
void upgradeable_mutex::unique_to_aspiring() {
    transition_down<state::unique, state::aspiring>();
}
void upgradeable_mutex::aspiring_to_shared() {
    transition_down<state::aspiring, state::shared>();
}
void upgradeable_mutex::unique_to_shared() {
    transition_down<state::unique, state::shared>();
}
I attempted to get the compiler to work out some of the above transitions "for me" with the transition_up and transition_down trick. I think I can do better, as it actually increased code bulk significantly.
Having it "auto-write" the unlocked-to-unique and unique-to-(unlocked|shared) transitions was all I got out of it, so it was probably not worth it.
Creating smart RAII objects that use the above is a bit tricky, as they have to support some transitions that the default unique_lock and shared_lock do not support.
You could just write an aspiring_lock and then do conversions in there (either as an operator unique_lock, or as methods that return said locks, etc.), but the ability to convert from a unique_lock&& down to a shared_lock is exclusive to upgradeable_mutex, and is a bit tricky to express with implicit conversions.
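For illustration, here is a minimal aspiring_lock sketch along those lines (a hypothetical helper built on the upgradeable_mutex above; it assumes C++17 for the guaranteed copy elision in upgrade()):
// RAII holders for the 'aspiring' state of the upgradeable_mutex above (hypothetical).
struct aspiring_lock {
    upgradeable_mutex& m;
    explicit aspiring_lock(upgradeable_mutex& mtx) : m(mtx) { m.lock_aspiring(); }
    ~aspiring_lock() { m.unlock_aspiring(); }
    aspiring_lock(const aspiring_lock&) = delete;
    aspiring_lock& operator=(const aspiring_lock&) = delete;

    // Scoped upgrade: exclusive while alive, back to aspiring on destruction.
    struct write_scope {
        upgradeable_mutex& m;
        explicit write_scope(upgradeable_mutex& mtx) : m(mtx) { m.aspiring_to_unique(); }
        ~write_scope() { m.unique_to_aspiring(); }
        write_scope(const write_scope&) = delete;
        write_scope& operator=(const write_scope&) = delete;
    };
    write_scope upgrade() { return write_scope(m); } // C++17 guaranteed elision
};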
Here's my usual suggestion: Seqlock
You can have a single writer and many readers concurrently. Writers compete using a spinlock; a single writer doesn't need to compete, so it's cheaper.
Readers are truly only reading: they don't write any state variables, counters, etc. This means you don't really know how many readers there are. But it also means there is no cache-line ping-pong, so you get the best possible performance in terms of latency and throughput.
What's the catch? The data pretty much has to be POD. It doesn't strictly have to be POD, but it cannot be invalidated (no deleting std::map nodes), as readers may read it while it's being written. Only after the fact do readers discover the data is possibly bad, and then they have to re-read.
Yes, writers don't wait for readers, so there's no concept of upgrade/downgrade. You can unlock one and lock the other. You pay less than with any sort of mutex, but the data may have changed in the process.
I can go into more detail if you like.
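For illustration, a minimal single-writer seqlock sketch (the counter is odd while a write is in progress). Strictly speaking, the unsynchronized read of data_ below is a data race under the C++ memory model; real implementations work around this with per-word atomics or platform guarantees:
#include <atomic>
#include <type_traits>

template <typename T>
class seqlock {
    static_assert(std::is_trivially_copyable<T>::value,
                  "seqlock data must be trivially copyable");
    std::atomic<unsigned> seq_{0};
    T data_{};
public:
    void store(const T& v) {                          // single writer assumed
        seq_.fetch_add(1, std::memory_order_relaxed); // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        data_ = v;                                    // plain write, guarded by odd count
        seq_.fetch_add(1, std::memory_order_release); // even: write complete
    }
    T load() const {
        unsigned s0, s1;
        T copy;
        do {
            s0 = seq_.load(std::memory_order_acquire);
            copy = data_;                             // may observe a torn value...
            std::atomic_thread_fence(std::memory_order_acquire);
            s1 = seq_.load(std::memory_order_relaxed);
        } while (s0 != s1 || (s0 & 1));               // ...detected here, so re-read
        return copy;
    }
};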
std::shared_mutex (as implemented in Boost, if not available on your platform(s)) provides an alternative for the problem.
For atomic upgrade-lock semantics, the Boost upgrade lock may be the best cross-platform alternative.
It does not have the upgrade-and-downgrade locking mechanism you are looking for, but to get an exclusive lock, the shared access can be relinquished first and exclusive access sought afterwards.
// assumes a shared_lock with shared access has been obtained
ReadContainerLock container = state.getContainer();
auto value = container.find( "foo" )->bar;
{
    container.shared_mutex().unlock_shared();
    // Upgrade read lock to write lock (two steps, not atomic)
    std::unique_lock<std::shared_mutex> write(container.shared_mutex());
    // container work...
    write.unlock();
    container.shared_mutex().lock_shared();
} // Downgrades write lock to read lock
A utility class can be used to cause the re-locking of the shared_mutex at the end of the scope:
struct re_locker {
    std::shared_mutex& m_;
    re_locker(std::shared_mutex& m) : m_(m) { m_.unlock_shared(); }
    ~re_locker() { m_.lock_shared(); }
    // delete the copy and move constructors and assignment operators (redacted for simplicity)
};
// ...
auto value = container.find( "foo" )->bar;
{
re_locker re_lock(container.shared_mutex());
// Upgrade read lock to write lock
std::unique_lock<std::shared_mutex> write(container.shared_mutex());
// container work...
} // Downgrades write lock to read lock
Depending on what exception guarantees you want or require, you may need to add a "can re-lock" flag to the re_locker to either do the re-lock or not if an exception is thrown during the container operations/work.
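A sketch of that flag (assuming the caller dismisses the re-lock when an exception is in flight):
struct re_locker {
    std::shared_mutex& m_;
    bool relock_ = true; // set false to skip re-locking, e.g. after an exception
    explicit re_locker(std::shared_mutex& m) : m_(m) { m_.unlock_shared(); }
    ~re_locker() { if (relock_) m_.lock_shared(); }
    void dismiss() { relock_ = false; } // call from a catch block before rethrowing
    re_locker(const re_locker&) = delete;
    re_locker& operator=(const re_locker&) = delete;
};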

Thread safety of copy on write

I am trying to understand the proper way of developing thread-safe applications.
In the current project I have the following class:
class Test
{
public:
    void setVal(unsigned int val)
    {
        mtx.lock();
        testValue = val;
        mtx.unlock();
    }
    unsigned int getVal()
    {
        unsigned int copy = testValue;
        return copy;
    }
private:
    boost::mutex mtx;
    unsigned int testValue;
};
And my question: is the above method Test::getVal() thread-safe in a multithreaded environment, or must it be locked before taking the copy?
I've read some articles about COW, and now I'm unsure.
Thanks!
If you have data which can be shared between multiple threads (such as the testValue member in your case), you must synchronise all accesses to that data. "Synchronise" has a broad meaning here: it could be done using a mutex, by making the data atomic, or by explicitly invoking a memory barrier.
But you cannot skimp on this. In a parallel world with multiple threads, CPU cores, CPUs and caches, there is no guarantee that a write by one thread will be visible to another thread if they don't "shake hands" on a synchronisation primitive. It is quite possible that thread T1's cache entry for testValue will not be updated when thread T2 writes into testValue, precisely because the HW cache management system sees "no synchronisation is happening, the threads don't access shared data, why should I torpedo performance by invalidating caches?"
The C++11 standard chapter [intro.multithread] goes into more detail than you'd like on this, but here's an informal Note from that chapter summarising the idea:
"5 ... Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A. ..."
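Concretely, a minimal sketch of both options for the class above (keeping the boost::mutex from the question; the atomic alternative assumes C++11):
#include <boost/thread/mutex.hpp>

class Test
{
public:
    void setVal(unsigned int val)
    {
        boost::mutex::scoped_lock lock(mtx); // RAII instead of manual lock/unlock
        testValue = val;
    }
    unsigned int getVal()
    {
        boost::mutex::scoped_lock lock(mtx); // readers must synchronise too
        return testValue;
    }
private:
    boost::mutex mtx;
    unsigned int testValue;
};

// Alternatively, for a single scalar, drop the mutex and declare:
//   std::atomic<unsigned int> testValue;
// plain loads and stores on it are then synchronised.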

Updating cache without blocking

I currently have a program that has a cache-like mechanism. I have a thread listening for updates from another server to this cache. This thread will update the cache when it receives an update. Here is some pseudo code:
void cache::update_cache()
{
    cache_ = new std::map<std::string, value>();
    while(true)
    {
        if(recv().compare("update") == 0)
        {
            std::map<std::string, value> *new_info = new std::map<std::string, value>();
            std::map<std::string, value> *tmp;
            //Get new info, store in new_info
            tmp = cache_;
            cache_ = new_info;
            delete tmp;
        }
    }
}

std::map<std::string, value> *cache::get_cache()
{
    return cache_;
}
cache_ is being read from many different threads concurrently. I believe that, as I have it here, I will run into undefined behavior if one of my threads calls get_cache(), then my cache updates, and then the thread tries to access the stored cache.
I am looking for a way to avoid this problem. I know I could use a mutex, but I would rather not block reads from happening as they have to be as low latency as possible, but if need be, I can go that route.
I was wondering if this would be a good use case for a unique_ptr. Is my understanding correct that if a thread calls get_cache(), and that returns a unique_ptr instead of a raw pointer, then once all threads that have the old version of the cache are finished with it (i.e. it leaves scope), the object will be deleted?
Is using a unique_ptr the best option for this case, or is there another option that I am not thinking of?
Any input will be greatly appreciated.
Edit:
I believe I made a mistake in my OP. I meant to use and pass a shared_ptr, not a unique_ptr, for cache_. And when all threads are finished with cache_, the shared_ptr should delete itself.
A little about my program: my program is a webserver that will be using this information to decide what information to return. It is fairly high throughput (thousands of req/sec). Each request queries the cache once, so telling my other threads when to update is no problem. I can tolerate slightly out-of-date information, and would prefer that over blocking all of my threads from executing, if possible. The information in the cache is fairly large, and I would like to limit any copies of values because of this.
update_cache is only run once. It is run in a thread that just listens for an update command and runs the code.
I feel there are multiple issues:
1) Do not leak memory: to that end, never use "delete" in your code, and stick with unique_ptr (or shared_ptr in specific cases).
2) Protect accesses to shared data, either by locking (mutex) or with a lock-free mechanism (std::atomic).
class Cache {
    using Map = std::map<std::string, value>;
    std::unique_ptr<Map> m_cache;
    std::mutex m_cacheLock;
public:
    void update_cache()
    {
        while(true)
        {
            if(recv().compare("update") == 0)
            {
                std::unique_ptr<Map> new_info { new Map };
                //Get new info, store in new_info
                {
                    std::lock_guard<std::mutex> lock{m_cacheLock};
                    using std::swap;
                    swap(m_cache, new_info);
                }
            }
        }
    }
};
Note: I don't like update_cache() being part of the public interface of the cache, as it contains an infinite loop. I would probably externalize the loop together with the recv, and have:
void update_cache(std::unique_ptr<Map> new_info)
{
    { // This inner brace is not useless: we don't need to keep the lock during deletion
        std::lock_guard<std::mutex> lock{m_cacheLock};
        using std::swap;
        swap(m_cache, new_info);
    }
}
Now, for reading from the cache, use proper encapsulation and don't let the pointer to the member map escape:
value get(const std::string &key)
{
    // Depending on the value type, you might want to allocate memory
    // before locking
    std::lock_guard<std::mutex> lock{m_cacheLock};
    return m_cache->at(key); // throws if the key is not present
}
With this signature you have to throw an exception if the value is not present in the cache; another option is to return something like a boost::optional.
Overall you can keep latency low (everything is relative; I don't know your use case) if you take care to do costly operations (memory allocation, for instance) outside of the locked section.
shared_ptr is very reasonable for this purpose; C++11 has a family of functions for handling shared_ptr atomically. If the data is immutable after creation, you won't even need any additional synchronization:
class cache {
public:
    using map_t = std::map<std::string, value>;
    void update_cache();
    std::shared_ptr<const map_t> get_cache() const;
private:
    std::shared_ptr<const map_t> cache_;
};

void cache::update_cache()
{
    while(true)
    {
        if(recv() == "update")
        {
            auto new_info = std::make_shared<map_t>();
            // Get new info, store in new_info
            // Make immutable & publish
            std::atomic_store(&cache_,
                              std::shared_ptr<const map_t>{std::move(new_info)});
        }
    }
}

auto cache::get_cache() const -> std::shared_ptr<const map_t> {
    return std::atomic_load(&cache_);
}
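Usage from a reader is then a single atomic load, and the snapshot remains valid for as long as the reader holds it (a sketch; handle_request and key are illustrative):
void handle_request(const cache& c, const std::string& key)
{
    std::shared_ptr<const cache::map_t> snapshot = c.get_cache(); // atomic load
    cache::map_t::const_iterator it = snapshot->find(key);        // safe even if an update races
    if (it != snapshot->end()) {
        // use it->second ...
    }
} // snapshot released here; the last owner frees the old map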

how avoid freezing other threads when one thread locks a big map

How do I avoid freezing other threads that try to access the same map that is being locked by the current thread? See the code below:
//pseudo code
std::map<string, CSomeClass* > gBigMap;

void AccessMapForWriting(string aString){
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    if (obj){
        gBigMap.erase(aString);
        delete obj;
        obj = NULL;
    }
    pthread_mutex_unlock(&MapLock);
}

void AccessMapForReading(string aString){
    pthread_mutex_lock(&MapLock);
    CSomeClass* obj = gBigMap[aString];
    //below code consumes much time
    //sometimes it even sleeps for milliseconds
    if (obj){
        obj->TimeConsumingOperation();
    }
    pthread_mutex_unlock(&MapLock);
}

//other threads will also call
//the same function -- AccessMap
void *OtherThreadFunc(void *){
    //call AccessMap here
}
Consider using a read/write lock instead: pthread_rwlock_t. There are some details here. It says:
"Using a normal mutex, when a thread obtains the mutex all other threads are forced to block until that mutex is released by the owner. What about the situation where the vast majority of threads are simply reading the data? If this is the case then we should not care if there is 1 or up to N readers in the critical section at the same time. In fact the only time we would normally care about exclusive ownership is when a writer needs access to the code section."
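A sketch of the two functions from the question rewritten with a pthread_rwlock_t (gBigMap and CSomeClass as in the question; operator[] is replaced with find() so that a read no longer inserts into the map):
#include <pthread.h>
#include <map>
#include <string>

class CSomeClass { public: void TimeConsumingOperation(); }; // as in the question
std::map<std::string, CSomeClass*> gBigMap;                  // as in the question
pthread_rwlock_t MapRwLock = PTHREAD_RWLOCK_INITIALIZER;

void AccessMapForWriting(const std::string& aString)
{
    pthread_rwlock_wrlock(&MapRwLock); // exclusive: blocks readers and writers
    std::map<std::string, CSomeClass*>::iterator it = gBigMap.find(aString);
    if (it != gBigMap.end()) {
        CSomeClass* obj = it->second;
        gBigMap.erase(it);
        delete obj;
    }
    pthread_rwlock_unlock(&MapRwLock);
}

void AccessMapForReading(const std::string& aString)
{
    pthread_rwlock_rdlock(&MapRwLock); // shared: many readers run in parallel
    std::map<std::string, CSomeClass*>::const_iterator it = gBigMap.find(aString);
    if (it != gBigMap.end() && it->second) {
        it->second->TimeConsumingOperation(); // still runs under the read lock
    }
    pthread_rwlock_unlock(&MapRwLock);
}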
You have a std::string as a key. Can you break that key down into a short suffix (possibly just a single letter) and a remainder? Because in that case, you might implement this data structure as 255 maps with 255 locks. That of course means that most of the time there's no lock contention, because the suffix differs, and therefore so does the lock.
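A sketch of that sharding idea, using a hash of the whole key instead of a fixed character so it works for any key distribution (the shard count of 16 is illustrative, and ShardedMap is a hypothetical name). Note that returning the raw pointer still leaves the original lifetime problem between a reader and a concurrent erase:
#include <functional>
#include <map>
#include <mutex>
#include <string>

class CSomeClass; // as in the question

class ShardedMap {
    static const std::size_t kShards = 16; // illustrative shard count
    std::map<std::string, CSomeClass*> maps_[kShards];
    std::mutex locks_[kShards];

    std::size_t shard(const std::string& key) const {
        return std::hash<std::string>()(key) % kShards; // keys spread across shards
    }
public:
    CSomeClass* find(const std::string& key) {
        std::size_t i = shard(key);
        std::lock_guard<std::mutex> lock(locks_[i]); // contention limited to one shard
        std::map<std::string, CSomeClass*>::iterator it = maps_[i].find(key);
        return it != maps_[i].end() ? it->second : 0;
    }
    void erase(const std::string& key) {
        std::size_t i = shard(key);
        std::lock_guard<std::mutex> lock(locks_[i]);
        maps_[i].erase(key);
    }
};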