My application's configuration is a const object that is shared between multiple threads. The configuration is stored in a centralized location, and any thread may access it. I've tried building a lock-free implementation that would allow me to load a new configuration while still allowing other threads to read the last known configuration.
My current implementation has a race between updating the shared_ptr and reading from it.
#include <memory>

template<typename T>
class ConfigurationHolder
{
public:
    typedef std::shared_ptr<T> SPtr;
    typedef std::shared_ptr<const T> CSPtr;

    ConfigurationHolder() : m_active(new T()) {}

    CSPtr get() const { return m_active; } // RACE - read

    template<typename Reloader>
    bool reload(Reloader reloader)
    {
        SPtr tmp(new T());
        if (!tmp)
            return false;
        if (!reloader(tmp))
            return false;
        m_active = tmp; // RACE - write
        return true;
    }

private:
    CSPtr m_active;
};
I can add a shared_mutex to guard the problematic read/write access to the shared_ptr, but I'm looking for a solution that keeps the implementation lock-free.
EDIT: My version of GCC does not support atomic_exchange on shared_ptr
EDIT2: Requirements clarification: I have multiple readers and may have multiple reloaders (although this is less common). Readers need to hold a configuration object that does not change while they are reading it. Old configuration objects must be freed when the last reader is done with them.
You should just update your compiler to get atomic shared pointer ops.
Failing that, wrap it in a shared_timed_mutex. Then test how much it costs you.
Both of these are going to be less work than correctly writing your own lock-free shared pointer system.
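For reference, this is roughly what the compiler-provided route looks like once you have it: a minimal sketch, assuming a standard library that ships the std::atomic_load/std::atomic_store overloads for shared_ptr (the class name is mine).

#include <memory>

template<typename T>
class AtomicConfigurationHolder
{
public:
    typedef std::shared_ptr<const T> CSPtr;

    AtomicConfigurationHolder() : m_active(std::make_shared<T>()) {}

    // Readers get a snapshot; the shared_ptr they hold keeps the old
    // configuration alive until the last reader drops it. Note that every
    // access to m_active must go through the atomic_* functions.
    CSPtr get() const { return std::atomic_load(&m_active); }

    template<typename Reloader>
    bool reload(Reloader reloader)
    {
        std::shared_ptr<T> tmp = std::make_shared<T>();
        if (!reloader(tmp))
            return false;
        std::atomic_store(&m_active, CSPtr(tmp)); // publish atomically
        return true;
    }

private:
    CSPtr m_active;
};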
If you have to:
This is a hack, but it might work. It is a read-copy-update style on the pointer itself.
Have a std::vector<std::unique_ptr<std::shared_ptr<T>>>. Have a std::atomic<std::shared_ptr<T> const*> "current" pointer, and a std::atomic<std::size_t> active_readers.
The vector stores your still-living shared_ptrs. When you want to change, push a new one onto the back. Keep a copy of this shared_ptr.
Now swap the "current" pointer for the new one. Busy-wait until active_readers hits zero, or until you get bored.
If active_readers hit zero, filter your vector for shared_ptrs with a use-count of 1. Remove them from the vector.
Regardless, now drop the extra shared_ptr you kept to the state you created. And done writing.
If you need more than one writer, lock this process using a separate mutex.
On the reader side, increment active_readers. Now atomically load the "current" pointer, make a local copy of the pointed-to shared_ptr, then decrement active_readers.
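Put together, the scheme sketched above looks roughly like this. The class name and the details are mine, and the same caveat applies: treat it as an illustration, not a vetted implementation.

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

template<typename T>
class RcuStylePointer
{
public:
    RcuStylePointer()
    {
        m_alive.emplace_back(new std::shared_ptr<T>(std::make_shared<T>()));
        m_current.store(m_alive.back().get());
    }

    // Reader side: announce ourselves, copy the current shared_ptr, retire.
    std::shared_ptr<T> read()
    {
        m_active_readers.fetch_add(1);
        std::shared_ptr<T> copy = *m_current.load();
        m_active_readers.fetch_sub(1);
        return copy;
    }

    // Writer side: serialize writers, publish the new state, try to reap.
    void write(std::shared_ptr<T> fresh)
    {
        std::lock_guard<std::mutex> lock(m_writer_mutex);
        std::shared_ptr<T> extra = fresh; // keeps the new entry's use count above 1
        m_alive.emplace_back(new std::shared_ptr<T>(std::move(fresh)));
        m_current.store(m_alive.back().get());
        // Busy-wait until no reader is mid-copy, or until we get bored.
        for (int spin = 0; spin != 1000; ++spin)
        {
            if (m_active_readers.load() == 0)
            {
                reap();
                break;
            }
        }
    } // "extra" is dropped here

private:
    // Remove entries that only the vector itself still references.
    void reap()
    {
        auto dead = [this](std::unique_ptr<std::shared_ptr<T>> const& p) {
            return p->use_count() == 1 && p.get() != m_current.load();
        };
        m_alive.erase(std::remove_if(m_alive.begin(), m_alive.end(), dead),
                      m_alive.end());
    }

    std::vector<std::unique_ptr<std::shared_ptr<T>>> m_alive;
    std::atomic<std::shared_ptr<T> const*> m_current{nullptr};
    std::atomic<std::size_t> m_active_readers{0};
    std::mutex m_writer_mutex;
};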
However, I just wrote this. So it probably contains bugs. Proving concurrent design correct is hard.
By far the easiest way to make this reliable is to upgrade your compiler and get atomic operations on a shared_ptr.
This is probably overly complex, and I think it could be set up so that each T is cleaned up when its last reader goes away, but I aimed for correctness rather than efficiency in recycling T.
With minimal synchronization on the readers, you could use a condition variable to signal that a reader is done with a given T; but that involves a tiny bit of contention with the writer thread.
Practically speaking, lock-free algorithms are often slower than lock-based ones, because the overhead of a mutex is not as high as you might fear.
A shared_timed_mutex wrapping a shared_ptr, where the writer is simply overwriting the variable, is going to be pretty darn fast. Existing readers are going to keep their old shared_ptr just fine.
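For comparison, a minimal sketch of that mutex-based holder, assuming C++14's std::shared_timed_mutex (names are mine):

#include <memory>
#include <mutex>
#include <shared_mutex>

template<typename T>
class LockedConfigurationHolder
{
public:
    std::shared_ptr<const T> get() const
    {
        // Shared lock held only long enough to copy the pointer; readers
        // keep their copy of the old configuration after unlocking.
        std::shared_lock<std::shared_timed_mutex> lock(m_mutex);
        return m_active;
    }

    void set(std::shared_ptr<const T> fresh)
    {
        // Exclusive lock held only long enough to overwrite the pointer.
        std::unique_lock<std::shared_timed_mutex> lock(m_mutex);
        m_active = std::move(fresh);
    }

private:
    mutable std::shared_timed_mutex m_mutex;
    std::shared_ptr<const T> m_active{std::make_shared<T>()};
};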
I am working on a program that needs a std::map, specifically one like map<string, map<string, int>> - it is meant to hold something like currency exchange rates: the first string is the original currency, the string in the second map is the desired one, and the int is their rate. This whole map will be read-only. Do I still need mutexes? I am a bit confused about the whole thread-safety topic, since this is my first bigger multi-threaded program.
If you are talking about the standard std::map† and no thread writes to it, no synchronization is required. Concurrent reads without writes are fine.
If however at least one thread performs writes on the map, you will indeed need some sort of protection like a mutex.
Be aware that std::map::operator[] counts as a write, so use std::map::at (or std::map::find if the key may not exist in the map) instead. You can make the compiler protect you from accidental writes by only referring to the shared map via const map&.
†Was clarified to be the case in the OP. For completeness' sake: Note that other classes may have mutable members. For those, even access through const& may introduce a race. If in doubt, check the documentation or use something else for parallel programming.
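A small illustration of that advice (the function name is mine): threads see only a const reference, and at() never inserts, unlike operator[].

#include <map>
#include <string>

int rateOf(std::map<std::string, std::map<std::string, int>> const& rates,
           std::string const& from, std::string const& to)
{
    return rates.at(from).at(to); // throws std::out_of_range if missing
}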
The rule of thumb is that if you have shared data and at least one thread will be a writer, then you need synchronization. You do not want a reader to read an element that is in the middle of being written: the reader might see part of the old value and part of the new value.
In your case, since all the threads will only ever be reading the data, there is nothing they can do that will affect the map, so you can have concurrent (unsynchronized) reads.
Wrap a std::map<std::string, std::map<std::string,int>> const in a custom class which has only const member functions [*].
This will make sure that all threads which use an object of the class after its creation will only read from it, which is guaranteed to be safe since C++11.
As the documentation says:

"All const member functions can be called concurrently by different threads on the same container."
Wrapping containers in your own custom types is good practice anyway. Increased thread safety is just one positive side effect of that good practice. Other positive effects include increased readability of client code, reduction/adaption of container interface to required functionality, ease of adding additional constraints and checks.
Here is a brief example:
#include <map>
#include <stdexcept>
#include <string>

class BankChangeRates
{
public:
    BankChangeRates(std::map<std::string, std::map<std::string, int>> const& data)
        : data(data) {}

    int get(std::string const& key, std::string const& inner_key) const
    {
        auto const find_iter = data.find(key);
        if (find_iter != data.end())
        {
            auto const inner_find_iter = find_iter->second.find(inner_key);
            if (inner_find_iter != find_iter->second.end())
            {
                return inner_find_iter->second;
            }
        }
        throw std::out_of_range("unknown currency pair"); // error handling
    }

    std::size_t size() const
    {
        return data.size();
    }

private:
    std::map<std::string, std::map<std::string, int>> const data;
};
In any case, the thread-safety problem is then reduced to how to make sure that the constructor does not read from an object to which another thread writes. This is often achieved trivially; for example, the object may be constructed before multi-threading even begins, or it may be initialised with hard-coded initialisation lists. In many other cases, the code which creates the object will generally access only other thread-safe functions and local objects.
The point is that concurrent accesses to your object will always be safe once it has been created.
[*] Of course, the const member functions should keep their promise and not attempt "workarounds" with mutable or const_cast.
If you are completely sure that both maps are ALWAYS READ-ONLY, then you never need mutexes.
But you have to be extra careful that no one can update the map by any means during program execution. Make sure that you initialize the map at the init stage of the program and then never update it for any reason.
If you suspect that in the future you may need to update the map in the middle of program execution, then it's better to put macros around the map accesses, which expand to nothing right now. If in the future you need mutexes around them, just change the macro definitions, as sketched below.
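One possible shape for those macros (the names are mine): they expand to nothing today and can be redefined later to take a real mutex without touching any call site.

#include <map>
#include <string>

#define RATES_LOCK()    /* empty for now; later: g_ratesMutex.lock()   */
#define RATES_UNLOCK()  /* empty for now; later: g_ratesMutex.unlock() */

int lookupRate(std::map<std::string, std::map<std::string, int>> const& rates,
               std::string const& from, std::string const& to)
{
    RATES_LOCK();
    int rate = rates.at(from).at(to);
    RATES_UNLOCK();
    return rate;
}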
PS: I have used a map in this answer, but it can easily be replaced by any shared resource. It was for ease of understanding.
In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we have thought about reworking our code. Currently, locking is done explicitly by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to it in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
template<typename T, typename Lockable>
class GuardedData {
public:
    GuardedData(T &d, Lockable &m) : guard(m), data(d) {}
    T *operator->() { return &data; } // pointer-like access to the payload

private:
    boost::lock_guard<Lockable> guard;
    T &data;
};
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
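For illustration, a hypothetical caller (Payload and worker are made up for this sketch):

#include <boost/thread/mutex.hpp>

struct Payload { int value; };

void worker(Payload &payload, boost::mutex &mutex)
{
    // Keep the guard in a short-lived local scope, as intended:
    {
        GuardedData<Payload, boost::mutex> guarded(payload, mutex);
        guarded->value += 1; // the mutex is held for exactly this scope
    } // guard destroyed here, mutex released
}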
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.
Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on two pieces of data guarded by the same mutex, you lock that mutex twice and deadlock (unless you use a recursive mutex, which may not be desired). If each piece of data has its own mutex, you can still deadlock when the lock order is not consistent; you have no control over that with this scheme without making it really complicated.
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - this raises ownership issues on the mutex, i.e. where and when it is released.
It's probably easier to copy the parts of the data you need to the reader/writer threads as and when they need them, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially, your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a CPU cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data while the reader is dealing with it (which should invalidate the view), but since the viewer has a copy, it can continue on and provide a view of the data as of the moment it was synchronized with the model.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and, when complete, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only one writer. The next time the reader requests the data, it gets the fresh copy.
Well known? I'm not sure. However, I pretty often use a similar mechanism in Qt called QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, when in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to be a slight disconnect in the idea. Its use conveys that accessing the data is always thread-safe because the data is guarded. Often, this isn't enough to ensure thread-safety: the order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one of the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.
I am quite new to multi-threading. I have a single-threaded data analysis app with a good bit of potential for parallelization; while the data sets are large, it does not come close to saturating the hard-disk read/write, so I figure I should take advantage of the threading support that is now in the standard and try to speed the beast up.
After some research I decided that producer-consumer was a good approach for reading data from disk and processing it, and I started writing an object pool that would become part of the circular buffer where the producers put data and the consumers get it. As I was writing the class, it felt like I was being too fine-grained in how I was handling locking and releasing data members. It feels like half the code is locking and unlocking, and like there is an insane number of synchronization objects floating around.
So I come to you with a class declaration and a sample function, and this question: Is this too fine-grained? Not fine-grained enough? Poorly thought out?
struct PoolArray
{
public:
    Obj* arr;
    uint32 used;
    uint32 refs;
    std::mutex locker;
};

class SegmentedPool
{
public: /* construction and destruction cut out */
    void alloc(uint32 cellsNeeded, PoolPtr& ptr);
    void dealloc(PoolPtr& ptr);
    void clearAll();

private:
    void expand();

    // stores all the segments of the pool
    std::vector<PoolArray> pools;
    ReadWriteLock poolLock;
    // stores pools that are empty
    std::queue<int> freePools;
    std::mutex freeLock;
    int currentPool;
    ReadWriteLock currentLock;
};
void SegmentedPool::dealloc(PoolPtr& ptr)
{
    // find and access the segment
    poolLock.lockForRead();
    PoolArray* temp = &(pools[ptr.getSeg()]);
    poolLock.unlockForRead();

    // reduce the count of references in the segment
    temp->locker.lock();
    --(temp->refs);
    // if the number of references is now zero then set the segment back to unused
    // and push it onto the queue of empty segments so that it can be reused
    if (temp->refs == 0)
    {
        temp->used = 0;
        freeLock.lock();
        freePools.push(ptr.getSeg());
        freeLock.unlock();
    }
    temp->locker.unlock();
    ptr.set(NULL, -1);
}
A few explanations:
First, PoolPtr is a stupid little pointer-like object that stores the pointer and the segment number in the pool that the pointer came from.
Second, this is all templatized, but I took those lines out to try to reduce the length of the code block.
Third, ReadWriteLock is something I put together using a mutex and a pair of condition variables.
Locks are inefficient no matter how fine-grained they are, so avoid them at all costs.
Both the queue and the vector can be implemented lock-free fairly easily using the compare-and-swap primitive.
There are a number of papers on the topic.
Lock free queue:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8674&rep=rep1&type=pdf
http://www.par.univie.ac.at/project/peppher/publications/Published/opodis10lfq.pdf
Lock free vector:
http://www.stroustrup.com/lock-free-vector.pdf
Stroustrup's paper also refers to a lock-free allocator, but don't jump at it right away; standard allocators are pretty good these days.
UPDATE:
If you don't want to bother writing your own containers, use Intel's Threading Building Blocks library; it provides both a thread-safe vector and a thread-safe queue. They are NOT lock-free, but they are optimized to use the CPU cache efficiently.
UPDATE:
Regarding PoolArray, you don't need a lock there either. If you can use C++11, use std::atomic for atomic increments and swaps; otherwise use compiler built-ins (Interlocked* functions in MSVC and __sync_* built-ins in GCC: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html).
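A C++11 sketch of that suggestion (assuming the templated Obj from the question): fetch_sub returns the previous value, so a result of 1 means this caller just released the last reference.

#include <atomic>
#include <cstdint>

template<typename Obj>
struct AtomicPoolArray
{
    Obj* arr = nullptr;
    std::atomic<std::uint32_t> used{0};
    std::atomic<std::uint32_t> refs{0};
};

// dealloc() then needs no per-segment mutex around the reference count:
//
//   if (temp->refs.fetch_sub(1) == 1) {
//       temp->used.store(0);
//       addFreePool(ptr.getSeg());
//   }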
A good start - you lock things when needed and free them as soon as you're finished.
Your ReadWriteLock is pretty much a CCriticalSection object - depending on your needs it might improve performance to use that instead.
One thing I would say is: call your temp->locker.lock() function before you release the lock on the pool with poolLock.unlockForRead(); otherwise you're performing operations on the pool object when it's not under synchronization control, and it could be in use by another thread at that point. A minor point, but with multi-threading it's the minor points that trip you up in the end.
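In code, that reordering looks like this:

poolLock.lockForRead();
PoolArray* temp = &(pools[ptr.getSeg()]);
temp->locker.lock();        // acquire the segment lock first...
poolLock.unlockForRead();   // ...then release the pool lock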
A good approach to take with multi-threading is to wrap any controlled resources in objects or functions that do the locking and unlocking inside them, so anyone who wants access to the data doesn't have to worry about which lock to take or release, or when to do it. For example:
...
if (temp->refs == 0)
{
    temp->used = 0;
    freeLock.lock();
    freePools.push(ptr.getSeg());
    freeLock.unlock();
}
...

would be...

...
if (temp->refs == 0)
{
    temp->used = 0;
    addFreePool(ptr.getSeg());
}
...

void SegmentedPool::addFreePool(unsigned int seg)
{
    freeLock.lock();
    freePools.push(seg);
    freeLock.unlock();
}
There are plenty of multi-threading benchmarking tools out there too. You can play around with controlling your resources in different ways, run it through one of the tools, and see where any bottlenecks are if you feel like performance is becoming an issue.
I am working on a set that is frequently read but rarely written.
#include <set>
#include <boost/shared_ptr.hpp>

class A {
    boost::shared_ptr<std::set<int> > _mySet;
public:
    A() : _mySet(new std::set<int>()) {} // start with an empty set
    void add(int v) {
        boost::shared_ptr<std::set<int> > tmpSet(new std::set<int>(*_mySet));
        tmpSet->insert(v); // insert into tmpSet
        _mySet = tmpSet;   // swap _mySet
    }
    void check(int v) {
        boost::shared_ptr<std::set<int> > theSet = _mySet;
        if (theSet->find(v) != theSet->end()) {
            // do something irrelevant
        }
    }
};
In the class, add() is only called by one thread, and check() is called by many threads. check() does not care whether _mySet is the latest version or not. Is the class thread-safe? Is it possible that a thread executing check() would observe the swap of _mySet happening before the insert into tmpSet?
This is an interesting use of shared_ptr to implement thread safety. Whether it is OK depends on the thread-safety guarantees of boost::shared_ptr. In particular, does it establish some sort of fence or membar, so that you are guaranteed that all of the writes in the constructor and insert functions of the set occur before any modification of the pointer value becomes visible?

I can find no thread-safety guarantees whatsoever in the Boost documentation of smart pointers. This surprises me, as I was sure that there were some. But a quick look at the sources for 1.47.0 shows none, and suggests that any use of boost::shared_ptr in a threaded environment will fail. (Could someone please point me to what I'm missing? I can't believe that boost::shared_ptr has ignored threading.)

Anyway, there are three possibilities: you can't use the shared pointer in a threaded environment (which seems to be the case); the shared pointer ensures its own internal consistency in a threaded environment, but doesn't establish ordering with regards to other objects; or the shared pointer establishes full ordering. Only in the last case will your code be safe as is. In the first case, you'll need some form of lock around everything, and in the second, you'll need some sort of fences or membars to ensure that the necessary writes are actually done before publishing the new version, and that they will be seen before trying to read it.
You do need synchronization; it is not thread-safe. Generally it doesn't matter how simple the operation is: even something as simple as shared += value; is not thread-safe.
Look here, for example, with regard to the thread safety of shared_ptr: Is boost shared_ptr <XXX> thread safe?
I would also question your allocation/swapping in add() and your use of shared_ptr in check().
Update:
I went back and re-read the docs for shared_ptr ... It is most likely thread-safe in your particular case, since the reference counting for shared_ptr is thread-safe. However, you are (IMHO) adding unnecessary complexity by not using a read/write lock.
Ultimately, this code should be thread-safe:
atomic_store(&_mySet, tmpSet);
and
theSet = atomic_load(&_mySet);
(instead of simple assignments)
But I don't know the current status of atomicity support for shared_ptr.
Note that adding atomicity to shared_ptr in a lock-free manner is a really difficult thing; so even if atomicity is implemented, it may rely on mutexes or user-mode spinlocks and, therefore, may sometimes suffer from performance issues.
Edit: Perhaps a volatile qualifier for the _mySet member variable should also be added... but I'm not sure whether it is strictly required by the semantics of atomic operations.
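For completeness, here is a sketch of the class with the plain assignments replaced by atomic access, assuming a Boost version recent enough to provide boost::atomic_load/atomic_store for shared_ptr:

#include <set>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>

class A {
    boost::shared_ptr<std::set<int> > _mySet;
public:
    A() : _mySet(boost::make_shared<std::set<int> >()) {}
    void add(int v) {
        boost::shared_ptr<std::set<int> > tmpSet(
            new std::set<int>(*boost::atomic_load(&_mySet)));
        tmpSet->insert(v);                    // insert into the copy
        boost::atomic_store(&_mySet, tmpSet); // publish after the insert
    }
    bool check(int v) {
        boost::shared_ptr<std::set<int> > theSet = boost::atomic_load(&_mySet);
        return theSet->find(v) != theSet->end();
    }
};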
I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically.
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};

class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }

    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer.
        while (true)
        {
            SValues* pValues = (SValues*)::InterlockedExchange((long*)&m_pValues, 0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }

    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it.
        ::InterlockedExchange((long*)&m_pValues, (long)pValues);
    }

private:
    SValues* m_pValues; // note: the casts above assume 32-bit pointers
    SValues m_values;
};

void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values.
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it thread-safe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes, and I cannot tell in advance whether there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless it does not inflict a penalty on writing.
I was thinking of having reads acquire the pointer without an interlocked instruction (along with a sequence number), copy the data, and then have a way of telling whether the sequence number has changed, in which case the read should be retried. This would require some memory barriers, though, and I don't know whether it would improve the speed.
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
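As a concrete illustration of that copy-on-write idea, here is a C++11 sketch using the atomic shared_ptr free functions and a compare-exchange loop for multiple writers (the class and names are mine). The shared_ptr handles the reclamation concern: an old instance is destroyed only when the last reader releases it, and since memory is never reused while still referenced, the classic ABA problem is avoided.

#include <atomic>
#include <memory>

struct SValues { double a, b, c, d; }; // as in the question

class CowValues
{
public:
    std::shared_ptr<const SValues> read() const
    {
        return std::atomic_load(&m_data); // cheap snapshot for readers
    }

    template<typename Mutator>
    void update(Mutator mutate)
    {
        std::shared_ptr<const SValues> old = std::atomic_load(&m_data);
        for (;;)
        {
            auto fresh = std::make_shared<SValues>(*old); // copy old data
            mutate(*fresh);
            // On failure, "old" is refreshed and we retry with a new copy.
            if (std::atomic_compare_exchange_weak(
                    &m_data, &old,
                    std::shared_ptr<const SValues>(fresh)))
                break; // published
        }
    }

private:
    std::shared_ptr<const SValues> m_data{std::make_shared<SValues>()};
};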
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that atomic operations are something that often get moved around by compilers in C++.
Generally I would solve the issue like this:
Multiple-producer-single-consumer by having one single-producer-single-consumer queue per writing thread. Each thread writes into its own queue, and a single consumer thread gathers the produced data and stores it in a single-consumer-multiple-reader data storage. The implementation for this is a lot of work and only recommended if you are doing a time-critical application and have the time to put in for this solution; a sketch of the per-thread queue building block follows below.
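A minimal single-producer-single-consumer ring buffer of the kind one would use per writing thread (a sketch, not a production queue; real implementations also pad the indices to avoid false sharing):

#include <atomic>
#include <cstddef>

template<typename T, std::size_t Capacity>
class SpscQueue
{
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    bool push(T const& item) // producer thread only
    {
        std::size_t head = m_head.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) & (Capacity - 1);
        if (next == m_tail.load(std::memory_order_acquire))
            return false; // full
        m_buffer[head] = item;
        m_head.store(next, std::memory_order_release); // publish the slot
        return true;
    }

    bool pop(T& item) // consumer thread only
    {
        std::size_t tail = m_tail.load(std::memory_order_relaxed);
        if (tail == m_head.load(std::memory_order_acquire))
            return false; // empty
        item = m_buffer[tail];
        m_tail.store((tail + 1) & (Capacity - 1), std::memory_order_release);
        return true;
    }

private:
    T m_buffer[Capacity];
    std::atomic<std::size_t> m_head{0}; // next slot to write
    std::atomic<std::size_t> m_tail{0}; // next slot to read
};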
There is more to read up on about this, since the implementation is platform-specific:
Atomic and related operations on Windows/Xbox 360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2