Thread-safe recursive iteration - c++

I am implementing an Observer pattern in my C++ application. So far I have a base class defined as follows:
template<typename CallbackType,
typename LockType = utils::Lock<boost::mutex>,
typename StorageType = std::vector<std::shared_ptr<CallbackType>>>
class CallbackManager
{
LockType mCallbackLock { "commons::CallbackManager", __FILE__, __LINE__ };
StorageType mCallbacks;
virtual uint64_t registerCallback( const std::shared_ptr<CallbackType>& aCallbackPtr );
virtual bool unregisterCallback( uint64_t aCallbackId );
void iterateSubscribers(const std::function<void(typename StorageType::value_type&)>&);
};
One of the key requirement is to be able to perform registerCallback/unregisterCallback calls within the iterateSubscribers context (recursively).
iterateSubscribers function performs an iterative call of aFnc with every callback as an argument. aFnc might perform registerCallback/unregisterCallback call. Current implementation written as follows:
void iterateSubscribers(const std::function<void(typename StorageType::value_type&)>& aFnc)
{
decltype(mCallbacks) callbacks;
{
// BEGIN SCOPED LOCK
auto lock = mCallbackLock.getScopedLock();
boost::ignore_unused( lock ); // To supress the warning in case NoLock is used
callbacks = mCallbacks;
// END SCOPED LOCK
}
for( auto it = callbacks.begin(); it != callbacks.end(); ++it )
{
aFnc( *it );
}
}
I am performing a copy of the callbacks container which is kinda not really good from the performance point of view but resolves two issues:
mCallbacks modifications do not break a loop
new lock acquisitions will not cause a deadlock
The trade-off is acceptable for me, I will not notify newly added subscribers and will not skip notification for recently removed subscribers.
What I can't resolve in an acceptable way is:
Non-copyable container as a template parameter for StorageType will cause a compilation error in iterateSubscribers function
The one possible solution is to define such container or its value_type as a pointer, but for the reasons I cannot afford this.
I also gave a look on alternative solution - do not perform a copy, replace regular mutex with a recursive mutex and perform all calls under the lock held, but it has one problem. Once element will be added or deleted, the loop will be broken:
for( auto it = callbacks.begin(); it != callbacks.end(); ++it )
Is there any other technique/ideas to achieve desired behavior?
EDIT:
I went by the road of the refactoring. Changing legacy callback containers value_type from the copy of the callback to the shared_ptr. That allowed me to copy the container and to not rely on the TriviallyCopyable state of the underplaying value_type.

Related

c++ thread pool: alternative to std::function for passing functions/lambdas to threads?

I have a thread pool that I use to execute many tiny jobs (millions of jobs, dozens/hundreds of milliseconds each). The jobs are passed in the form of either:
std::bind(&fn, arg1, arg2, arg3...)
or
[&](){fn(arg1, arg2, arg3...);}
with the thread pool taking them like this:
std::queue<std::function<void(void)>> queue;
void addJob(std::function<void(void)> fn)
{
queue.emplace_back(std::move(fn));
}
Pretty standard stuff....except that I've noticed a bottleneck where if jobs execute in a fast enough time (less than a millisecond), the conversion from lambda/binder to std::function in the addJob function actually takes longer than execution of the jobs themselves. After doing some reading, std::function is notoriously slow and so my bottleneck isn't necessarily unexpected.
Is there a faster way of doing this type of thing? I've looked into drop-in std::function replacements but they either weren't compatible with my compiler or weren't faster. I've also looked into "fast delegates" by Don Clugston but they don't seem to allow the passing of arguments along with functions (maybe I don't understand them correctly?).
I'm compiling with VS2015u3, and the functions passed to the jobs are all static, with their arguments being either ints/floats or pointers to other objects.
Have a separate queue for each of the task types - you probably don't have tens of thousands of task types. Each of these can be e.g. a static member of your tasks. Then addJob() is actually the ctor of Task and it's perfectly type-safe.
Then define a compile-time list of your task types and visit it via template metaprogramming (for_each). It'll be way faster as you don't need any virtual call fnptr / std::function<> to achieve this.
This will only work if your tuple code sees all the Task classes (so you can't e.g. add a new descendant of Task to an already running executable by loading the image from disc - hope that's a non-issue).
template<typename D> // CRTP on D
class Task {
public:
// you might want to static_assert at some point that D is in TaskTypeList
Task() : it_(tasks_.end()) {} // call enqueue() in descendant
~Task() {
// add your favorite lock here
if (queued()) {
tasks_.erase(it_);
}
}
bool queued() const { return it_ != tasks_.end(); }
static size_t ExecNext() {
if (!tasks_.empty()) {
// add your favorite lock here
auto&& itTask = tasks_.begin();
tasks_.pop_front();
// release lock
(*itTask)();
itTask->it_ = tasks_.end();
}
return tasks_.size();
}
protected:
void enqueue() const
{
// add your favorite lock here
tasks_.push_back(static_cast<D*>(this));
it_ = tasks_.rbegin();
}
private:
std::list<D*>::iterator it_;
static std::list<D*> tasks_; // you can have one per thread, too - then you don't need locking, but tasks are assigned to threads statically
};
struct MyTask : Task<MyTask> {
MyTask() { enqueue(); } // call enqueue only when the class is ready
void operator()() { /* add task here */ }
// ...
};
struct MyTask2; // etc.
template<typename...>
struct list_ {};
using TaskTypeList = list_<MyTask, MyTask2>;
void thread_pocess(list_<>) {}
template<typename TaskType, typename... TaskTypes>
void thread_pocess(list_<TaskType, TaskTypes...>)
{
TaskType::ExecNext();
thread_process(list_<TaskTypes...>());
}
void thread_process(void*)
{
for (;;) {
thread_process(TaskTypeList());
}
}
There's a lot to tune on this code: different threads should start from different parts of the queue (or one would use a ring, or several queues and either static/dynamic assignment to threads), you'd send it to sleep when there are absolutely no tasks, one could have an enum for the tasks, etc.
Note that this can't be used with arbitrary lambdas: you need to list task types. You need to 'communicate' the lambda type out of the function where you declare it (e.g. by returning `std::make_pair(retval, list_) and sometimes it's not easy. However, you can always convert a lambda to a functor, which is straightforward - just ugly.

Implementing a simple, generic thread pool in C++11

I want to create a thread pool for experimental purposes (and for the fun factor). It should be able to process a wide variety of tasks (so I can possibly use it in later projects).
In my thread pool class I'm going to need some sort of task queue. Since the Standard Library provides std::packaged_task since the C++11 standard, my queue will look like std::deque<std::packaged_task<?()> > task_queue, so the client can push std::packaged_tasks into the queue via some sort of public interface function (and then one of the threads in the pool will be notified with a condition variable to execute it, etc.).
My question is related to the template argument of the std::packaged_task<?()>s in the deque.
The function signature ?() should be able to deal with any type/number of parameters, because the client can do something like:
std::packaged_task<int()> t(std::bind(factorial, 342));
thread_pool.add_task(t);
So I don't have to deal with the type/number of parameters.
But what should the return value be? (hence the question mark)
If I make my whole thread pool class a template class, one instance
of it will only be able to deal with tasks with a specific signature
(like std::packaged_task<int()>).
I want one thread pool object to be able to deal with any kind of task.
If I go with std::packaged_task<void()> and the function invoked
returns an integer, or anything at all, then thats undefined behaviour.
So the hard part is that packaged_task<R()> is move-only, otherwise you could just toss it into a std::function<void()>, and run those in your threads.
There are a few ways around this.
First, ridiculously, use a packaged_task<void()> to store a packaged_task<R()>. I'd advise against this, but it does work. ;) (what is the signature of operator() on packaged_task<R()>? What is the required signature for the objects you pass to packaged_task<void()>?)
Second, wrap your packaged_task<R()> in a shared_ptr, capture that in a lambda with signature void(), store that in a std::function<void()>, and done. This has overhead costs, but probably less than the first solution.
Finally, write your own move-only function wrapper. For the signature void() it is short:
struct task {
template<class F,
class dF=std::decay_t<F>,
class=decltype( std::declval<dF&>()() )
>
task( F&& f ):
ptr(
new dF(std::forward<F>(f)),
[](void* ptr){ delete static_cast<dF*>(ptr); }
),
invoke([](void*ptr){
(*static_cast<dF*>(ptr))();
})
{}
void operator()()const{
invoke( ptr.get() );
}
task(task&&)=default;
task&operator=(task&&)=default;
task()=default;
~task()=default;
explicit operator bool()const{return static_cast<bool>(ptr);}
private:
std::unique_ptr<void, void(*)(void*)> ptr;
void(*invoke)(void*) = nullptr;
};
and simple. The above can store packaged_task<R()> for any type R, and invoke them later.
This has relatively minimal overhead -- it should be cheaper than std::function, at least the implementations I've seen -- except it does not do SBO (small buffer optimization) where it stores small function objects internally instead of on the heap.
You can improve the unique_ptr<> ptr container with a small buffer optimization if you want.
I happen to have an implementation which does exactly that. My way of doing things is to wrap the std::packaged_task objects in a struct which abstracts away the return type. The method which submits a task into the thread pool returns a future on the result.
This kind of works, but due to the memory allocations required for each task it is not suitable for tasks which are very short and very frequent (I tried to use it to parallelize chunks of a fluid simulation and the overhead was way too high, in the order of several milliseconds for 324 tasks).
The key part is this structure:
struct abstract_packaged_task
{
template <typename R>
abstract_packaged_task(std::packaged_task<R> &&task):
m_task((void*)(new std::packaged_task<R>(std::move(task)))),
m_call_exec([](abstract_packaged_task *instance)mutable{
(*(std::packaged_task<R>*)instance->m_task)();
}),
m_call_delete([](abstract_packaged_task *instance)mutable{
delete (std::packaged_task<R>*)(instance->m_task);
})
{
}
abstract_packaged_task(abstract_packaged_task &&other);
~abstract_packaged_task();
void operator()();
void *m_task;
std::function<void(abstract_packaged_task*)> m_call_exec;
std::function<void(abstract_packaged_task*)> m_call_delete;
};
As you can see, it hides away the type dependencies by using lambdas with std::function and a void*. If you know the maximum size of all possibly occuring std::packaged_task objects (I have not checked whether the size has a dependency on R at all), you could try to further optimize this by removing the memory allocation.
The submission method into the thread pool then does this:
template <typename R>
std::future<R> submit_task(std::packaged_task<R()> &&task)
{
assert(m_workers.size() > 0);
std::future<R> result = task.get_future();
{
std::unique_lock<std::mutex> lock(m_queue_mutex);
m_task_queue.emplace_back(std::move(task));
}
m_queue_wakeup.notify_one();
return result;
}
where m_task_queue is an std::deque of abstract_packaged_task structs. m_queue_wakeup is a std::condition_variable to wake a worker thread up to pick up the task. The worker threads implementation is as simple as:
void ThreadPool::worker_impl()
{
std::unique_lock<std::mutex> lock(m_queue_mutex, std::defer_lock);
while (!m_terminated) {
lock.lock();
while (m_task_queue.empty()) {
m_queue_wakeup.wait(lock);
if (m_terminated) {
return;
}
}
abstract_packaged_task task(std::move(m_task_queue.front()));
m_task_queue.pop_front();
lock.unlock();
task();
}
}
You can take a look at the full source code and the corresponding header on my github.

A thread-safe implementation of a generic container of type pair<unsigned int, boost::any> using shared_ptrs

I have created a generic message queue for use in a multi-threaded application. Specifically, single producer, multi-consumer. Main code below.
1) I wanted to know if I should pass a shared_ptr allocated with new into the enqueue method by value, or is it better to have the queue wrapper allocate the memory itself and just pass in a genericMsg object by const reference?
2) Should I have my dequeue method return a shared_ptr, have a shared_ptr passed in as a parameter by reference (current strategy), or just have it directly return a genericMsg object?
3) Will I need signal/wait in enqueue/dequeue or will the read/write locks suffice?
4) Do I even need to use shared_ptrs? Or will this depend solely on the implementation I use? I like that the shared_ptrs will free memory once all references are no longer using the object. I can easily port this to regular pointers if that's recommended, though.
5) I'm storing a pair here because I'd like to discriminate what type of message I'm dealing with else w/o having to do an any_cast. Every message type has a unique ID that refers to a specific struct. Is there a better way of doing this?
Generic Message Type:
template<typename Message_T>
class genericMsg
{
public:
genericMsg()
{
id = 0;
size = 0;
}
genericMsg (unsigned int &_id, unsigned int &_size, Message_T &_data)
{
id = _id;
size = _size;
data = _data;
}
~genericMsg()
{}
unisgned int id;
unsigned int size;
Message_T data; //All structs stored here contain only POD types
};
Enqueue Methods:
// ----------------------------------------------------------------
// -- Thread safe function that adds a new genericMsg object to the
// -- back of the Queue.
// -----------------------------------------------------------------
template<class Message_T>
inline void enqueue(boost::shared_ptr< genericMsg<Message_T> > data)
{
WriteLock w_lock(myLock);
this->qData.push_back(std::make_pair(data->id, data));
}
VS:
// ----------------------------------------------------------------
// -- Thread safe function that adds a new genericMsg object to the
// -- back of the Queue.
// -----------------------------------------------------------------
template<class Message_T>
inline void enqueue(const genericMsg<Message_T> &data_in)
{
WriteLock w_lock(myLock);
boost::shared_ptr< genericMsg<Message_T> > data =
new genericMsg<Message_T>(data_in.id, data_in.size, data_in.data);
this->qData.push_back(std::make_pair(data_in.id, data));
}
Dequeue Method:
// ----------------------------------------------------------------
// -- Thread safe function that grabs a genericMsg object from the
// -- front of the Queue.
// -----------------------------------------------------------------
template<class Message_T>
void dequeue(boost::shared_ptr< genericMsg<Message_T> > &msg)
{
ReadLock r_lock(myLock);
msg = boost::any_cast< boost::shared_ptr< genericMsg<Message_T> > >(qData.front().second);
qData.pop_front();
}
Get message ID:
inline unsigned int getMessageID()
{
ReadLock r_lock(myLock);
unsigned int tempID = qData.front().first;
return tempID;
}
Data Types:
std::deque < std::pair< unsigned int, boost::any> > qData;
Edit:
I have improved upon my design. I now have a genericMessage base class that I directly subclass from in order to derive the unique messages.
Generic Message Base Class:
class genericMessage
{
public:
virtual ~genericMessage() {}
unsigned int getID() {return id;}
unsigned int getSize() {return size;}
protected:
unsigned int id;
unsigned int size;
};
Producer Snippet:
boost::shared_ptr<genericMessage> tmp (new derived_msg1(MSG1_ID));
theQueue.enqueue(tmp);
Consumer Snippet:
boost::shared_ptr<genericMessage> tmp = theQueue.dequeue();
if(tmp->getID() == MSG1_ID)
{
boost::shared_ptr<derived_msg1> tObj = boost::dynamic_pointer_cast<derived_msg1>(tmp);
tObj->printData();
}
New Queue:
std::deque< boost::shared_ptr<genericMessage> > qData;
New Enqueue:
void mq_class::enqueue(const boost::shared_ptr<genericMessage> &data_in)
{
boost::unique_lock<boost::mutex> lock(mut);
this->qData.push_back(data_in);
cond.notify_one();
}
New Dequeue:
boost::shared_ptr<genericMessage> mq_class::dequeue()
{
boost::shared_ptr<genericMessage> ptr;
{
boost::unique_lock<boost::mutex> lock(mut);
while(qData.empty())
{
cond.wait(lock);
}
ptr = qData.front();
qData.pop_front();
}
return ptr;
}
Now, my question is am I doing dequeue correctly? Is there another way of doing it? Should I pass in a shared_ptr as a reference in this case to achieve what I want?
Edit (I added answers for parts 1, 2, and 4).
1) You should have a factory method that creates new genericMsgs and returns a std::unique_ptr. There is absolutely no good reason to pass genericMsg in by const reference and then have the queue wrap it in a smart pointer: Once you've passed by reference you have lost track of ownership, so if you do that the queue is going to have to construct (by copy) the entire genericMsg to wrap.
2) I can't think of any circumstance under which it would be safe to take a reference to a shared_ptr or unique_ptr or auto_ptr. shared_ptrs and unique_ptrs are for tracking ownership and once you've taken a reference to them (or the address of them) you have no idea how many references or pointers are still out there expecting the shared_ptr/unique_ptr object to contain a valid naked pointer.
unique_ptr is always preferred to a naked pointer, and is preferred to a shared_ptr in cases where you only have a single piece of code (validly) pointing to an object at a time.
https://softwareengineering.stackexchange.com/questions/133302/stdshared-ptr-as-a-last-resort
http://herbsutter.com/gotw/_103/
Bad practice to return unique_ptr for raw pointer like ownership semantics? (the answer explains why it is good practice not bad).
3) Yes, you need to use a std::condition_variable in your dequeue function. You need to test whether qData is empty or not before calling qData.front() or qData.pop_front(). If qData is empty you need to wait on a condition variable. When enqueue inserts an item it should signal the condition variable to wake up anyone who may have been waiting.
Your use of reader/writer locks is completely incorrect. Don't use reader/writer locks. Use std::mutex. A reader lock can only be used on a method that is completely const. You are modifying qData in dequeue, so a reader lock will lead to data races there. (Reader writer locks are only applicable when you have stupid code that is both const and holds locks for extended period of time. You are only keeping the lock for the period of time it takes to insert or remove from the queue, so even if you were const the added overhead of reader/writer locks would be a net lose.)
An example of implementing a (bounded) buffer using mutexes and condition_variables can be found at: Is this a correct way to implement a bounded buffer in C++.
4) unique_ptr is always preferred to naked pointers, and usually preferred to shared_ptr. (The main exception where shared_ptr might be better is for graph-like data structures.) In cases like yours where you are reading something in on side, creating a new object with a factory, moving the ownership to the queue and then moving ownership out of the queue to the consumer it sounds like you should be using unique_ptr.
5) You are reinventing tagged unions. Virtual functions were added to c++ specifically so you wouldn't need to do this. You should subclass your messages from a class that has a virtual function called do_it() (or better yet, operator()() or something like that). Then instead of tagging each struct, make each struct a subclass of your message class. When you dequeue each struct (or ptr to struct) just call do_it() on it. Strong static typing, no casts. See C++ std condition variable covering a lot of share variables for an example.
Also: if you are going to stick with the tagged unions: you can't have separate calls to get the id and the data item. Consider: If thread A calls to get the id, then thread B calls to get the id, then thread B retrieves the data item, now what happens when thread A calls to retrieve a data item? It gets a data item, but not with the type that it expected. You need to retrieve the id and the data item under the same critical section.
First of all, it's better to use 3rd-party concurrency containers than to implement them yourself, except it's for education purpose.
Your messages doesn't look to have costly constructors/destructor, so you can store them by value and forget about all your other questions. Use move semantics (if available) for optimizations.
If your profiler says "by value" is bad idea in your particular case:
I suppose your producer creates messages, puts them into your queue and loses any interest in them. In this case you don't need shared_ptr because you don't have shared ownership. You can use unique_ptr or even a raw pointer. It's implementation details and better to hide them inside the queue.
From performance point of view, it's better to implement lock-free queue. "locks vs. signals" depends completely on your application. For example, if you use thread pool and kind of a scheduler it's better to allow your clients to do something useful while queue is full/empty. In simpler cases reader/writer lock is just fine.
If I want to be thread safe, I usually use const objects and modify only on copy or create constructor. In this way you don't need to use any lock mechanism. In a threaded system, it is usually more effective than use mutexes on a single instance.
In your case only deque would need lock.

Is this an acceptable way to lock a container using C++?

I need to implement (in C++) a thread safe container in such a way that only one thread is ever able to add or remove items from the container. I have done this kind of thing before by sharing a mutex between threads. This leads to a lot of mutex objects being littered throughout my code and makes things very messy and hard to maintain.
I was wondering if there is a neater and more object oriented way to do this. I thought of the following simple class wrapper around the container (semi-pseudo C++ code)
class LockedList {
private:
std::list<MyClass> m_List;
public:
MutexObject Mutex;
};
so that locking could be done in the following way
LockedList lockableList; //create instance
lockableList.Mutex.Lock(); // Lock object
... // search and add or remove items
lockableList.Mutex.Unlock(); // Unlock object
So my question really is to ask if this is a good approach from a design perspective? I know that allowing public access to members is frowned upon from a design perspective, does the above design have any serious flaws in it. If so is there a better way to implement thread safe container objects?
I have read a lot of books on design and C++ in general but there really does seem to be a shortage of literature regarding multithreaded programming and multithreaded software design.
If the above is a poor approach to solving the problem I have could anyone suggest a way to improve it, or point me towards some information that explains good ways to design classes to be thread safe??? Many thanks.
I would rather design a resourece owner that locks a mutex and returns an object that can be used by the thread. Once the thread has finished with it and stops using the object the resource is automatically returned to its owner and the lock released.
template<typename Resource>
class ResourceOwner
{
Lock lock;
Resource resource;
public:
ResourceHolder<Resource> getExclusiveAccess()
{
// Let the ResourceHolder lock and unlock the lock
// So while a thread holds a copy of this object only it
// can access the resource. Once the thread releases all
// copies then the lock is released allowing another
// thread to call getExclusiveAccess().
//
// Make it behave like a form of smart pointer
// 1) So you can pass it around.
// 2) So all properties of the resource are provided via ->
// 3) So the lock is automatically released when the thread
// releases the object.
return ResourceHolder<Resource>(lock, resource);
}
};
The resource holder (not thought hard so this can be improved)
template<typename Resource>
class ResourceHolder<
{
// Use a shared_ptr to hold the scopped lock
// When first created will lock the lock. When the shared_ptr
// destroyes the scopped lock (after all copies are gone)
// this will unlock the lock thus allowding other to use
// getExclusiveAccess() on the owner
std::shared_ptr<scopped_lock> locker;
Resource& resource; // local reference on the resource.
public:
ResourceHolder(Lock& lock, Resource& r)
: locker(new scopped_lock(lock))
, resource(r)
{}
// Access to the resource via the -> operator
// Thus allowing you to use all normal functionality of
// the resource.
Resource* operator->() {return &resource;}
};
Now a lockable list is:
ResourceOwner<list<int>> lockedList;
void threadedCode()
{
ResourceHolder<list<int>> list = lockedList.getExclusiveAccess();
list->push_back(1);
}
// When list goes out of scope here.
// It is destroyed and the the member locker will unlock `lock`
// in its destructor thus allowing the next thread to call getExclusiveAccess()
I would do something like this to make it more exception-safe by using RAII.
class LockedList {
private:
std::list<MyClass> m_List;
MutexObject Mutex;
friend class LockableListLock;
};
class LockableListLock {
private:
LockedList& list_;
public:
LockableListLock(LockedList& list) : list_(list) { list.Mutex.Lock(); }
~LockableListLock(){ list.Mutex.Unlock(); }
}
You would use it like this
LockableList list;
{
LockableListLock lock(list); // The list is now locked.
// do stuff to the list
} // The list is automatically unlocked when lock goes out of scope.
You could also make the class force you to lock it before doing anything with it by adding wrappers around the interface for std::list in LockableListLock so instead of accessing the list through the LockedList class, you would access the list through the LockableListLock class. For instance, you would make this wrapper around std::list::begin()
std::list::iterator LockableListLock::begin() {
return list_.m_List.begin();
}
and then use it like this
LockableList list;
LockableListLock lock(list);
// list.begin(); //This is a compiler error so you can't
//access the list without locking it
lock.begin(); // This gets you the beginning of the list
Okay, I'll state a little more directly what others have already implied: at least part, and quite possibly all, of this design is probably not what you want. At the very least, you want RAII-style locking.
I'd also make the locked (or whatever you prefer to call it) a template, so you can decouple the locking from the container itself.
// C++ like pesudo-code. Not intended to compile as-is.
struct mutex {
void lock() { /* ... */ }
void unlock() { /* ... */ }
};
struct lock {
lock(mutex &m) { m.lock(); }
~lock(mutex &m) { m.unlock(); }
};
template <class container>
class locked {
typedef container::value_type value_type;
typedef container::reference_type reference_type;
// ...
container c;
mutex m;
public:
void push_back(reference_type const t) {
lock l(m);
c.push_back(t);
}
void push_front(reference_type const t) {
lock l(m);
c.push_front(t);
}
// etc.
};
This makes the code fairly easy to write and (for at least some cases) still get correct behavior -- e.g., where your single-threaded code might look like:
std::vector<int> x;
x.push_back(y);
...your thread-safe code would look like:
locked<std::vector<int> > x;
x.push_back(y);
Assuming you provide the usual begin(), end(), push_front, push_back, etc., your locked<container> will still be usable like a normal container, so it works with standard algorithms, iterators, etc.
The problem with this approach is that it makes LockedList non-copyable. For details on this snag, please look at this question:
Designing a thread-safe copyable class
I have tried various things over the years, and a mutex declared beside the the container declaration always turns out to be the simplest way to go ( once all the bugs have been fixed after naively implementing other methods ).
You do not need to 'litter' your code with mutexes. You just need one mutex, declared beside the container it guards.
It's hard to say that the coarse grain locking is a bad design decision. We'd need to know about the system that the code lives in to talk about that. It is a good starting point if you don't know that it won't work however. Do the simplest thing that could possibly work first.
You could improve that code by making it less likely to fail if you scope without unlocking though.
struct ScopedLocker {
ScopedLocker(MutexObject &mo_) : mo(mo_) { mo.Lock(); }
~ScopedLocker() { mo.Unlock(); }
MutexObject &mo;
};
You could also hide the implementation from users.
class LockedList {
private:
std::list<MyClass> m_List;
MutexObject Mutex;
public:
struct ScopedLocker {
ScopedLocker(LockedList &ll);
~ScopedLocker();
};
};
Then you just pass the locked list to it without them having to worry about details of the MutexObject.
You can also have the list handle all the locking internally, which is alright in some cases. The design issue is iteration. If the list locks internally, then operations like this are much worse than letting the user of the list decide when to lock.
void foo(LockedList &list) {
for (size_t i = 0; i < 100000000; i++) {
list.push_back(i);
}
}
Generally speaking, it's a hard topic to give advice on because of problems like this. More often than not, it's more about how you use an object. There are a lot of leaky abstractions when you try and write code that solves multi-processor programming. That is why you see more toolkits that let people compose the solution that meets their needs.
There are books that discuss multi-processor programming, though they are few. With all the new C++11 features coming out, there should be more literature coming within the next few years.
I came up with this (which I'm sure can be improved to take more than two arguments):
template<class T1, class T2>
class combine : public T1, public T2
{
public:
/// We always need a virtual destructor.
virtual ~combine() { }
};
This allows you to do:
// Combine an std::mutex and std::map<std::string, std::string> into
// a single instance.
combine<std::mutex, std::map<std::string, std::string>> mapWithMutex;
// Lock the map within scope to modify the map in a thread-safe way.
{
// Lock the map.
std::lock_guard<std::mutex> locked(mapWithMutex);
// Modify the map.
mapWithMutex["Person 1"] = "Jack";
mapWithMutex["Person 2"] = "Jill";
}
If you wish to use an std::recursive_mutex and an std::set, that would also work.

Read-write thread-safe smart pointer in C++, x86-64

I develop some lock free data structure and following problem arises.
I have writer thread that creates objects on heap and wraps them in smart pointer with reference counter. I also have a lot of reader threads, that work with these objects. Code can look like this:
SmartPtr ptr;
class Reader : public Thread {
virtual void Run {
for (;;) {
SmartPtr local(ptr);
// do smth
}
}
};
class Writer : public Thread {
virtual void Run {
for (;;) {
SmartPtr newPtr(new Object);
ptr = newPtr;
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) // wait for crash :(
}
When I create thread-local copy of ptr it means at least
Read an address.
Increment reference counter.
I can't do these two operations atomically and thus sometimes my readers work with deleted object.
The question is - what kind of smart pointer should I use to make read-write access from several threads with correct memory management possible? Solution should exist, since Java programmers don't even care about such a problem, simply relying on that all objects are references and are deleted only when nobody uses them.
For PowerPC I found http://drdobbs.com/184401888, looks nice, but uses Load-Linked and Store-Conditional instructions, that we don't have in x86.
As far I as I understand, boost pointers provide such functionality only using locks. I need lock free solution.
boost::shared_ptr have atomic_store which uses a "lock-free" spinlock which should be fast enough for 99% of possible cases.
boost::shared_ptr<Object> ptr;
class Reader : public Thread {
virtual void Run {
for (;;) {
boost::shared_ptr<Object> local(boost::atomic_load(&ptr));
// do smth
}
}
};
class Writer : public Thread {
virtual void Run {
for (;;) {
boost::shared_ptr<Object> newPtr(new Object);
boost::atomic_store(&ptr, newPtr);
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;)
}
EDIT:
In response to comment below, the implementation is in "boost/shared_ptr.hpp"...
template<class T> void atomic_store( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock_pool<2>::scoped_lock lock( p );
p->swap( r );
}
template<class T> shared_ptr<T> atomic_exchange( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );
sp.lock();
p->swap( r );
sp.unlock();
return r; // return std::move( r )
}
With some jiggery-pokery you should be able to accomplish this using InterlockedCompareExchange128. Store the reference count and pointer in a 2 element __int64 array. If reference count is in array[0] and pointer in array[1] the atomic update would look like this:
while(true)
{
__int64 comparand[2];
comparand[0] = refCount;
comparand[1] = pointer;
if(1 == InterlockedCompareExchange128(
array,
pointer,
refCount + 1,
comparand))
{
// Pointer is ready for use. Exit the while loop.
}
}
If an InterlockedCompareExchange128 intrinsic function isn't available for your compiler then you may use the underlying CMPXCHG16B instruction instead, if you don't mind mucking around in assembly language.
The solution proposed by RobH doesn't work. It has the same problem as the original question: when accessing the reference count object, it might already have been deleted.
The only way I see of solving the problem without a global lock (as in boost::atomic_store) or conditional read/write instructions is to somehow delay the destruction of the object (or the shared reference count object if such thing is used). So zennehoy has a good idea but his method is too unsafe.
The way I might do it is by keeping copies of all the pointers in the writer thread so that the writer can control the destruction of the objects:
class Writer : public Thread {
virtual void Run() {
list<SmartPtr> ptrs; //list that holds all the old ptr values
for (;;) {
SmartPtr newPtr(new Object);
if(ptr)
ptrs.push_back(ptr); //push previous pointer into the list
ptr = newPtr;
//Periodically go through the list and destroy objects that are not
//referenced by other threads
for(auto it=ptrs.begin(); it!=ptrs.end(); )
if(it->refCount()==1)
it = ptrs.erase(it);
else
++it;
}
}
};
However there are still requirements for the smart pointer class. This doesn't work with shared_ptr as the reads and writes are not atomic. It almost works with boost::intrusive_ptr. The assignment on intrusive_ptr is implemented like this (pseudocode):
//create temporary from rhs
tmp.ptr = rhs.ptr;
if(tmp.ptr)
intrusive_ptr_add_ref(tmp.ptr);
//swap(tmp,lhs)
T* x = lhs.ptr;
lhs.ptr = tmp.ptr;
tmp.ptr = x;
//destroy temporary
if(tmp.ptr)
intrusive_ptr_release(tmp.ptr);
As far as I understand the only thing missing here is a compiler level memory fence before lhs.ptr = tmp.ptr;. With that added, both reading rhs and writing lhs would be thread-safe under strict conditions: 1) x86 or x64 architecture 2) atomic reference counting 3) rhs refcount must not go to zero during the assignment (guaranteed by the Writer code above) 4) only one thread writing to lhs (using CAS you could have several writers).
Anyway, you could create your own smart pointer class based on intrusive_ptr with necessary changes. Definitely easier than re-implementing shared_ptr. And besides, if you want performance, intrusive is the way to go.
The reason this works much more easily in java is garbage collection. In C++, you have to manually ensure that a value is not just starting to be used by a different thread when you want to delete it.
A solution I've used in a similar situation is to simply delay the deletion of the value. I create a separate thread that iterates through a list of things to be deleted. When I want to delete something, I add it to this list with a timestamp. The deleting thread waits until some fixed time after this timestamp before actually deleting the value. You just have to make sure that the delay is large enough to guarantee that any temporary use of the value has completed.
100 milliseconds would have been enough in my case, I chose a few seconds to be safe.