Introducing a shared pointer to reduce thread contention - C++

I have a class that holds a shared pointer to a map.
Code snippet:
class SomeClass {
private:
    std::shared_ptr<const std::map<int, std::string>> sptr_;
    mutable ReaderWriterLock rw_lock_;  // mutable: locked in the const read path
public:
    SomeClass() : sptr_(std::make_shared<const std::map<int, std::string>>()) {}
    void UpdateSharedPointer(std::shared_ptr<const std::map<int, std::string>>&& new_sptr) {
        if (!new_sptr) { return; }
        rw_lock_.Lock(true /* is_writer */);
        sptr_ = std::move(new_sptr);
        rw_lock_.Unlock();
    }
    void ReadSharedPointer() const {
        rw_lock_.Lock(false /* is_writer */);
        // Copy the pointer under the lock, then release the lock.
        std::shared_ptr<const std::map<int, std::string>> local_sptr = sptr_;
        rw_lock_.Unlock();
        if (!local_sptr) {
            return;
        }
        // Read the map object pointed to by local_sptr and perform some operations.
    }
};
Usage pattern: an instance of SomeClass can be accessed by multiple threads, and I am okay with working on a stale copy of the map inside ReadSharedPointer().
The shared pointer to a constant map object reduces contention: whether updating or reading the map, the lock is held only briefly. If I eliminated the shared pointer, any map update or read would need to hold rw_lock_ for much longer.
What worries me is that this extra pointer indirection might make the code fragile. For example, my current use case requires only a constant map, but if someone later makes the map non-const and forgets to update the locking around the read operation, a race condition could be introduced.
One might also argue that eliminating the shared pointer makes the complete workflow simpler to understand, and that a shared pointer makes little sense when we are not truly sharing this map but only using it to reduce contention.
So basically my questions are:
How good of a design is this? What are the pros and cons? Is there any alternative to this?
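As one possible alternative (a sketch, not from the question itself): C++11 provides `std::atomic_load`/`std::atomic_store` overloads for `std::shared_ptr`, which can replace the reader-writer lock entirely for this publish/snapshot pattern. The class name `AtomicSnapshot` below is illustrative.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical variant of SomeClass that drops the reader-writer lock and
// uses the std::atomic_load/std::atomic_store overloads for shared_ptr
// (C++11; superseded in C++20 by std::atomic<std::shared_ptr<T>>).
class AtomicSnapshot {
public:
    AtomicSnapshot()
        : sptr_(std::make_shared<const std::map<int, std::string>>()) {}

    void Update(std::shared_ptr<const std::map<int, std::string>> new_sptr) {
        if (!new_sptr) return;
        // Atomically publish the new map; no explicit lock needed.
        std::atomic_store(&sptr_, std::move(new_sptr));
    }

    std::shared_ptr<const std::map<int, std::string>> Snapshot() const {
        // Readers atomically grab a consistent (possibly stale) snapshot.
        return std::atomic_load(&sptr_);
    }

private:
    std::shared_ptr<const std::map<int, std::string>> sptr_;
};
```

Note these free-function overloads may lock internally on some platforms, so they reduce code complexity more reliably than they reduce contention; measure before committing.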

Related

c++ return structures and vectors optimally

I am reading a lot of different things on C++ optimization and I am getting quite mixed up. I would appreciate some help. Basically, I want to clear up what needs to be a pointer or not when I am passing vectors and structures as parameters or returning vectors and structures.
Say I have a Structure that contains 2 elements: an int and then a vector of integers. I will be creating this structure locally in a function, and then returning it. This function will be called multiple times and generate a new structure every time. I would like to keep the last structure created in a class member (lastStruct_ for example). So before returning the struct I could update lastStruct_ in some way.
Now, what would be the best way to do this, knowing that the vector in the structure can be quite large (I would need to avoid copies)? Does the vector in the struct need to be a pointer? If I want to share lastStruct_ with other classes by creating a get_lastStruct() method, should I return a reference to lastStruct_, a pointer, or not care about that? Should lastStruct_ be a shared pointer?
This is quite confusing to me because apparently C++ knows how to avoid copying, but I also see a lot of people recommending the use of pointers while others say a pointer to a vector makes no sense at all.
struct MyStruct {
    std::vector<int> pixels;
    int foo;
};

class MyClass {
    MyStruct lastStruct_;
public:
    MyStruct create_struct();
    MyStruct getLastStruct();
};

MyStruct MyClass::create_struct()
{
    MyStruct s = {std::vector<int>(100, 1), 1234};
    lastStruct_ = s;
    return s;
}

MyStruct MyClass::getLastStruct()
{
    return lastStruct_;
}
If the only copy you're trying to remove is the one that happens when you return from your factory function, I'd say holding the vector directly will be faster every time.
Why? Two things. Return value optimisation (RVO/NRVO) removes any need for temporaries when returning; this is enough for almost all cases.
When return value optimisation doesn't apply, move semantics will: returning a named variable (e.g. return my_struct;) does an implicit move in the cases where NRVO can't apply.
So why is it always faster than a shared pointer? Because copying the shared pointer must dereference the control block to increment the owner count, and since that increment is an atomic operation, it is not free.
Also, using a shared pointer brings shared ownership and non-locality. If you were to use a shared pointer, use a pointer to const data to bring back value semantics.
Now that you've added the code, it's much clearer what you're trying to do.
There's no way around the copy into lastStruct_ here. If you measure a performance degradation, then holding a std::shared_ptr<const std::vector<int>> might be the solution, since you keep value semantics but avoid copying the vector.
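A sketch of that suggestion, applied to the question's struct: cache the result behind a shared_ptr-to-const, so handing it out copies two pointers instead of the vector. The names `PixelBuffer` and `Factory` are illustrative, not from the question.

```cpp
#include <memory>
#include <vector>

// Same shape as the question's MyStruct, renamed for illustration.
struct PixelBuffer {
    std::vector<int> pixels;
    int foo;
};

class Factory {
public:
    std::shared_ptr<const PixelBuffer> create() {
        auto s = std::make_shared<const PixelBuffer>(
            PixelBuffer{std::vector<int>(100, 1), 1234});
        last_ = s;  // cheap: bumps a refcount instead of copying 100 ints
        return s;
    }
    // Callers share the immutable buffer; no copy, no dangling reference.
    std::shared_ptr<const PixelBuffer> last() const { return last_; }
private:
    std::shared_ptr<const PixelBuffer> last_;
};
```

The `const` in the pointee type is what preserves value semantics: no caller can mutate the shared buffer behind another's back.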

Accessing class object from multiple threads

Suppose I have some container class like the one below:
class Container {
public:
    int const & operator[] (int n) const {
        return data[n];
    }
private:
    std::vector<int> data;
};
I need to access its elements from multiple threads, using overloaded operator [] and passing an object of this class to lambda capture by reference:
Container store;
std::thread t_1([&store]() { /* do something here and read from store */ } );
std::thread t_2([&store]() { /* do something here and read from store */ } );
Will there be some slowdowns because of such design? Is it possible to speedup this part somehow?
Since std::vector's data() lies on the heap anyway, you cannot avoid the heap access. The only faster option would be to keep the elements on the stacks of the two threads (threads have separate stacks but share heap space), but that is not a possibility here. Thus, I see no optimisations for your case, unless you share your whole implementation; by changing the approach, one might come up with a more performant design.
I would advise against posting that here, though. It would belong on Code Review, not Stack Overflow.
Lastly, I would like to mention thread safety: I do not see any races here, and I believe you specifically made sure that the example does not hint at any (by showing only reads of, and no writes to, the shared resource), but it is still a good idea to check for them. If all you are doing is reading, no data races will occur.
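The read-only point above can be demonstrated with a minimal sketch: two threads reading disjoint halves of the same container need no mutex at all, because neither thread writes.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Two threads summing halves of a shared, read-only vector.
// No synchronisation is needed: concurrent const reads cannot race.
long parallel_sum(const std::vector<int>& data) {
    long left = 0, right = 0;
    std::size_t mid = data.size() / 2;
    std::thread t1([&] { left  = std::accumulate(data.begin(), data.begin() + mid, 0L); });
    std::thread t2([&] { right = std::accumulate(data.begin() + mid, data.end(), 0L); });
    t1.join();
    t2.join();
    return left + right;
}
```

The moment either thread writes to `data`, this stops being safe and a mutex (or per-thread copies) becomes necessary.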

Keeping track of (stack-allocated) objects

In a rather large application, I want to keep track of some statistics about objects of a certain class. In order to not degrade performance, I want the stats to be updated in a pull-configuration. Hence, I need to have a reference to each live object in some location. Is there an idiomatic way to:
Create, search, iterate such references
Manage it automatically (i.e. remove the reference upon destruction)
I am thinking in terms of a set of smart pointers here, but the memory management would be somewhat inverted: Instead of destroying the object when the smart pointer is destroyed, I'd want the smart pointer to be removed, when the object is destroyed. Ideally, I do not want to reinvent the wheel.
I could live with a delay in the removal of the pointers, I'd just need a way to invalidate them quickly.
edit: Because paddy asked for it: The reason for pull-based collection is that obtaining the information may be relatively costly. Pushing is obviously a clean solution but considered too expensive.
There is no special feature of the language that will allow you to do this. Sometimes object tracking is handled by rolling your own memory allocator, but this doesn't work easily on the stack.
But if you're using only the stack it actually makes your problem easier, assuming that the objects being tracked are on a single thread. C++ makes special guarantees about the order of construction and destruction on the stack. That is, the destruction order is exactly the reverse of construction order.
And so, you can leverage this to store a single pointer in each object, plus one static pointer to track the most recent one. Now you have an object stack represented as a linked list.
template <typename T>
class Trackable
{
public:
    Trackable()
        : previous( current() )
    {
        current() = this;
    }
    ~Trackable()
    {
        current() = previous;
    }

    // External interface. Note: a static member function cannot be const,
    // and Trackable has no virtual functions, so static_cast is used here
    // (dynamic_cast requires a polymorphic type).
    static const T *head() { return static_cast<const T*>( current() ); }
    const T *next() const { return static_cast<const T*>( previous ); }

private:
    static Trackable * & current()
    {
        static Trackable *ptr = nullptr;
        return ptr;
    }

    Trackable *previous;
};
Example:
struct Foo : Trackable<Foo> {};
struct Bar : Trackable<Bar> {};
// :::
// Walk linked list of Foo objects currently on stack.
for( const Foo *foo = Foo::head(); foo; foo = foo->next() )
{
// Do kung foo
}
Now, admittedly this is a very simplistic solution. In a large application you may have multiple stacks using your objects. You could handle stacks on multiple threads by making current() use thread_local semantics. Although you need some magic to make this work, as head() would need to point at a registry of threads, and that would require synchronization.
You definitely don't want to synchronize all stacks into a single list, because that will kill your program's performance scalability.
As for your pull-requirement, I presume it's a separate thread wanting to walk over the list. You would need a way to synchronize such that all new object construction or destruction is blocked inside Trackable<T> while the list is being iterated. Or similar.
But at least you could take this basic idea and extend it to your needs.
Remember, you can't use this simple list approach if you allocate your objects dynamically. For that you would need a bi-directional list.
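A hedged sketch of that "bi-directional list" variant, for objects that may be created and destroyed in any order (e.g. heap allocations). Each node links itself in on construction and unlinks itself in its destructor; the names `TrackedNode` and `Widget` are illustrative, not from the answer.

```cpp
// Intrusive doubly-linked list of live instances. Unlike the stack-order
// version above, destruction can happen in any order. Single-threaded only;
// a multi-threaded version would need a mutex or thread_local heads.
template <typename T>
class TrackedNode {
public:
    TrackedNode() : prev_(nullptr), next_(head()) {
        if (next_) next_->prev_ = this;
        head() = this;
    }
    ~TrackedNode() {
        if (prev_) prev_->next_ = next_; else head() = next_;
        if (next_) next_->prev_ = prev_;
    }
    static T* first() { return static_cast<T*>(head()); }
    T* next() const { return static_cast<T*>(next_); }
private:
    static TrackedNode*& head() {
        static TrackedNode* ptr = nullptr;
        return ptr;
    }
    TrackedNode* prev_;
    TrackedNode* next_;
};

struct Widget : TrackedNode<Widget> {};
```

Deleting a node in the middle of the list simply splices its neighbours together, which is exactly what the singly-linked stack version cannot do.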
The simplest approach is to have code inside each object so that it registers itself on instantiation and removes itself upon destruction. This code can easily be injected using a CRTP:
template <class T>
struct AutoRef {
    static auto &all() {
        static std::set<T*> theSet;
        return theSet;
    }
private:
    friend T;
    AutoRef() { all().insert(static_cast<T*>(this)); }
    ~AutoRef() { all().erase(static_cast<T*>(this)); }
};
Now a Foo class can inherit from AutoRef<Foo> to have its instances referenced inside AutoRef<Foo>::all().
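A self-contained usage sketch (repeating the CRTP class from the answer so it compiles on its own; requires C++14 for the `auto&` return type):

```cpp
#include <set>

// CRTP base: each instance registers itself on construction and
// removes itself on destruction.
template <class T>
struct AutoRef {
    static auto &all() {
        static std::set<T*> theSet;
        return theSet;
    }
private:
    friend T;  // only T may construct/destroy its AutoRef<T> base
    AutoRef() { all().insert(static_cast<T*>(this)); }
    ~AutoRef() { all().erase(static_cast<T*>(this)); }
};

struct Foo : AutoRef<Foo> {};
```

While any `Foo` instances are alive, `AutoRef<Foo>::all()` contains exactly their addresses; the set empties itself as they go out of scope.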

Parallel Command Pattern

I wanted to know of how to make my use of the command pattern thread-safe while maintaining performance. I have a simulation where I perform upwards of tens of billions of iterations; performance is critical.
In this simulation, I have a bunch of Moves that perform commands on objects in my simulation. The base class looks like this:
class Move
{
public:
    virtual ~Move() {}
    // Perform a move.
    virtual void Perform(Object& obj) = 0;
    // Undo a move.
    virtual void Undo() = 0;
};
The reason I have the object passed into Perform rather than the constructor, as is typical with the Command pattern, is that I cannot afford to instantiate a new Move object every iteration. Instead, a concrete implementation of Move simply takes the Object, maintaining a pointer to it and its previous state for when it's needed. Here's an example of a concrete implementation:
class ConcreteMove : public Move
{
    std::string _ns;
    std::string _prev;
    Object* _obj;
public:
    ConcreteMove(std::string newstring) : _ns(std::move(newstring)) {}
    virtual void Perform(Object& obj) override
    {
        _obj = &obj;
        _prev = obj.GetIdentifier();
        obj.SetIdentifier(_ns);
    }
    virtual void Undo() override
    {
        _obj->SetIdentifier(_prev);
    }
};
Unfortunately, what this has cost me is thread-safety. I want to parallelize my loop, where multiple iterators perform moves on a bunch of objects simultaneously. But obviously one instance of ConcreteMove cannot be reused because of how I implemented it.
I considered having Perform return a State object which can be passed into Undo, making the implementation thread-safe since it would be independent of the ConcreteMove state. However, creating and destroying such an object on every iteration is too costly.
Furthermore, because multiple moves can be performed every iteration, the simulation stores a vector of Move object pointers in a MoveManager class; the pointed-to moves are instantiated by the client. I set it up this way because the constructors of the particular concrete moves take parameters (see the example above).
I considered writing a copy operator for Move and MoveManager so that they could be duplicated amongst the threads, but I don't believe that is a proper answer, because then ownership of the Move objects would fall on MoveManager rather than on the client (who is only responsible for the first instance). The same applies to MoveManager itself and the responsibility of maintaining it.
Update: Here's my MoveManager if it matters
class MoveManager
{
private:
    std::vector<Move*> _moves;
public:
    void PushMove(Move& move)
    {
        _moves.push_back(&move);
    }
    void PopMove()
    {
        _moves.pop_back();
    }
    // Select a move by index.
    Move* SelectMove(int i)
    {
        return _moves[i];
    }
    // Get the number of moves.
    int GetMoveCount()
    {
        return (int)_moves.size();
    }
};
Clarification: All I need is one collection of Move objects per thread. They are re-used every iteration, where Perform is called on different objects each time.
Does anyone know how to solve this problem efficiently in a thread-safe manner?
Thanks!
What about the notion of a thread ID? Also, why not preconstruct the identifier strings and pass pointers to them?
class ConcreteMove : public Move
{
    std::string *_ns;
    std::vector<std::string> _prev;  // one saved state per thread
    std::vector<Object *> _obj;      // one target object per thread
public:
    ConcreteMove(unsigned numthreads, std::string *newstring)
        : _ns(newstring),
          _prev(numthreads),
          _obj(numthreads)
    {
    }
    // Note: Perform and Undo now take a thread id, so the Move base
    // class would have to change to match for override to apply.
    virtual void Perform(unsigned threadid, Object &obj) override
    {
        _obj[threadid] = &obj;
        _prev[threadid] = obj.GetIdentifier();
        obj.SetIdentifier(*_ns);
    }
    virtual void Undo(unsigned threadid) override
    {
        _obj[threadid]->SetIdentifier(_prev[threadid]);
    }
};
Impossible with stated requirements. Specifically,
Use the command pattern. "the command pattern is a behavioral design pattern in which an object is used to represent and encapsulate all the information needed to call a method at a later time." Thus you're storing data.
You "can't afford" to allocate memory.
You have "billions" of iterations, which means some large static allocation won't suffice.
You want to store data without any place to store it, so there is no answer. However, if you're willing to change your requirements, there are undoubtedly many ways to solve your problem (whatever it may be; I couldn't tell from the description).
I also can't estimate how many Move objects you need at once. If that number is reasonably low, then a specialized allocation scheme might solve part of your problem. Likewise, if most of the Move objects are duplicates, a different specialized allocation scheme might help.
In general, what you're asking can't be solved, but relax the requirements and it shouldn't be hard.
Your MoveManager should not contain a vector of pointers; it should hold a vector of Move objects:
std::vector<Move> _moves;
It seems you will have one MoveManager per thread, so there are no multi-threading issues. Reserve the vector's capacity up front, then apply Perform and the other actions on the moves in the vector.
No new allocations, and you will be reusing the move objects.
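A hedged sketch of the "one collection of moves per thread" idea. One caveat the answer glosses over: an abstract Move cannot be stored by value in a vector, so this sketch assumes one concrete move type per vector (a std::variant of move types would be the mixed-type equivalent). `RenameMove`, `Object`, and `ThreadLocalMoves` are illustrative names standing in for the question's types.

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for the question's Object.
struct Object { std::string id; };

// Concrete, value-type move: no virtual dispatch, no per-iteration allocation.
class RenameMove {
public:
    explicit RenameMove(std::string ns) : ns_(std::move(ns)), obj_(nullptr) {}
    void Perform(Object& obj) {
        obj_ = &obj;          // remember the target
        prev_ = obj.id;       // save previous state for Undo
        obj.id = ns_;
    }
    void Undo() { if (obj_) obj_->id = prev_; }
private:
    std::string ns_, prev_;
    Object* obj_;
};

// Each worker thread owns one of these, built once and reused every iteration.
using ThreadLocalMoves = std::vector<RenameMove>;
```

Because each thread only touches its own vector, no synchronisation is needed around Perform/Undo, which was the original goal.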

How can I create a smart pointer that locks and unlocks a mutex?

I have a threaded class from which I would like to occasionally acquire a pointer to an instance variable. I would like this access to be guarded by a mutex so that the thread is blocked from accessing this resource until the client is finished with its pointer.
My initial approach to this is to return a pair of objects: one a pointer to the resource and one a shared_ptr to a lock object on the mutex. This shared_ptr holds the only reference to the lock object so the mutex should be unlocked when it goes out of scope. Something like this:
pair<Resource*, shared_ptr<Lock>> A::getResource()
{
    Lock* lock = new Lock(&mMutex);
    return pair<Resource*, shared_ptr<Lock>>(
        &mResource,
        shared_ptr<Lock>(lock));
}
This solution is less than ideal because it requires the client to hold onto the entire pair of objects. Behaviour like this breaks the thread safety:
Resource* r = a.getResource().first;
In addition, my own implementation of this is deadlocking and I'm having difficulty determining why, so there may be other things wrong with it.
What I would like to have is a shared_ptr that contains the lock as an instance variable, binding it with the means to access the resource. This seems like something that should have an established design pattern but having done some research I'm surprised to find it quite hard to come across.
My questions are:
Is there a common implementation of this pattern?
Are there issues with putting a mutex inside a shared_ptr that I'm overlooking that prevent this pattern from being widespread?
Is there a good reason not to implement my own shared_ptr class to implement this pattern?
(NB I'm working on a codebase that uses Qt but unfortunately cannot use boost in this case. However, answers involving boost are still of general interest.)
I'm not sure if there are any standard implementations, but since I like re-implementing stuff for no reason, here's a version that should work (assuming you don't want to be able to copy such pointers):
template<class T>
class locking_ptr
{
public:
    locking_ptr(T* ptr, mutex* lock)
        : m_ptr(ptr)
        , m_mutex(lock)
    {
        m_mutex->lock();
    }
    ~locking_ptr()
    {
        if (m_mutex)
            m_mutex->unlock();
    }
    locking_ptr(locking_ptr<T>&& ptr)
        : m_ptr(ptr.m_ptr)
        , m_mutex(ptr.m_mutex)
    {
        ptr.m_ptr = nullptr;
        ptr.m_mutex = nullptr;
    }
    T* operator ->()
    {
        return m_ptr;
    }
    T const* operator ->() const
    {
        return m_ptr;
    }
    // disallow copy/assignment
    locking_ptr(locking_ptr<T> const&) = delete;
    locking_ptr& operator = (locking_ptr<T> const&) = delete;
private:
    T* m_ptr;
    mutex* m_mutex; // whatever implementation you use
};
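A self-contained usage sketch of the locking_ptr idea, instantiated with std::mutex (the answer leaves the mutex type open, but std::mutex matches the lock()/unlock() interface it uses). The `Counter` type is illustrative.

```cpp
#include <mutex>

// Condensed locking_ptr: locks on construction, unlocks on destruction,
// movable but not copyable, so exactly one holder owns the lock.
template <class T>
class locking_ptr {
public:
    locking_ptr(T* ptr, std::mutex* m) : m_ptr(ptr), m_mutex(m) { m_mutex->lock(); }
    ~locking_ptr() { if (m_mutex) m_mutex->unlock(); }
    locking_ptr(locking_ptr&& other) : m_ptr(other.m_ptr), m_mutex(other.m_mutex) {
        other.m_ptr = nullptr;
        other.m_mutex = nullptr;
    }
    locking_ptr(const locking_ptr&) = delete;
    locking_ptr& operator=(const locking_ptr&) = delete;
    T* operator->() { return m_ptr; }
private:
    T* m_ptr;
    std::mutex* m_mutex;
};

struct Counter { int n = 0; };
```

Scoped use then looks like `{ locking_ptr<Counter> p(&c, &mtx); p->n = 5; }`: the mutex is held exactly for the lifetime of `p`.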
You're describing a variation of the EXECUTE AROUND POINTER pattern, described by Kevlin Henney in Executing Around Sequences.
I have a prototype implementation at exec_around.h but I can't guarantee it works correctly in all cases as it's a work in progress. It includes a function mutex_around which creates an object and wraps it in a smart pointer that locks and unlocks a mutex when accessed.
There is another approach here. Far less flexible and less generic, but also far simpler. While it still seems to fit your exact scenario.
shared_ptr (both standard and Boost) offers means to construct it while providing another shared_ptr instance which will be used for usage counter and some arbitrary pointer that will not be managed at all. On cppreference.com it is the 8th form (the aliasing constructor).
Now, normally, this form is used for conversions - like providing a shared_ptr to base class object from derived class object. They share ownership and usage counter but (in general) have two different pointer values of different types. This form is also used to provide a shared_ptr to a member value based on shared_ptr to object that it is a member of.
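For reference, the ordinary member-pointer use of the aliasing constructor looks like this (the `Employee` type and `name_of` helper are illustrative):

```cpp
#include <memory>
#include <string>

struct Employee {
    std::string name;
};

// Returns a shared_ptr that points at the member but shares ownership
// (and the usage counter) with the whole Employee, keeping it alive.
std::shared_ptr<std::string> name_of(std::shared_ptr<Employee> e) {
    return std::shared_ptr<std::string>(e, &e->name);
}
```

The returned pointer and the original `shared_ptr<Employee>` report the same `use_count()`, even though they point at different addresses of different types.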
Here we can "abuse" the form to provide lock guard. Do it like this:
auto A::getResource()
{
    auto counter = std::make_shared<Lock>(&mMutex);
    std::shared_ptr<Resource> result{ counter, &mResource };
    return result;
}
The returned shared_ptr points to mResource and keeps mMutex locked for as long as it is used by anyone.
The problem with this solution is that it is now your responsibility to ensure that mResource remains valid (in particular, that it doesn't get destroyed) for that long as well. If locking mMutex is enough to guarantee that, then you are fine.
Otherwise, the above solution must be adjusted to your particular needs. For example, you might make the counter a simple struct that keeps both the Lock and another shared_ptr to the A object owning mResource.
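A hedged sketch of that adjustment: the control object holds both the lock and a shared_ptr to the owner, so the resource cannot be destroyed while any guarded pointer is alive. The names `Owner`, `Resource`, and `get_resource` are illustrative stand-ins for the question's A, Resource, and getResource.

```cpp
#include <memory>
#include <mutex>
#include <utility>

struct Resource { int value = 0; };

struct Owner {
    std::mutex mtx;
    Resource res;
};

// The returned pointer (a) pins the Owner and (b) holds the mutex,
// for as long as any copy of it exists.
std::shared_ptr<Resource> get_resource(const std::shared_ptr<Owner>& owner) {
    struct Guard {
        std::shared_ptr<Owner> keep_alive;    // pins the owning object
        std::lock_guard<std::mutex> lock;     // held until Guard is destroyed
        explicit Guard(std::shared_ptr<Owner> o)
            : keep_alive(std::move(o)), lock(keep_alive->mtx) {}
    };
    auto guard = std::make_shared<Guard>(owner);
    // Aliasing constructor: shares ownership with guard, points at the resource.
    return std::shared_ptr<Resource>(guard, &guard->keep_alive->res);
}
```

Member order in Guard matters: `lock` is declared after `keep_alive`, so on destruction the mutex is released before the owner reference is dropped.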
To add to Adam Badura's answer, for a more general case using std::mutex and std::lock_guard, this worked for me:
auto A::getResource()
{
    auto counter = std::make_shared<std::lock_guard<std::mutex>>(mMutex);
    std::shared_ptr<Resource> ptr{ counter, &mResource };
    return ptr;
}
where the lifetimes of std::mutex mMutex and Resource mResource are managed by some class A.