Difficult concurrent design - c++

I have a class called Root which serves as some kind of phonebook for dynamic method calls: it holds a dictionary of url keys pointing to objects. When a command wants to execute a given method it calls a Root instance with an url and some parameter:
root_->call("/some/url", ...);
Actually, the call method in Root looks close to this:
// Version 0
const Value call(const Url &url, const Value &val) {
// A. find object
if (!objects_.get(url.path(), &target))
return ErrorValue(NOT_FOUND_ERROR, url.path());
}
// B. trigger the object's method
return target->trigger(val);
}
From the code above, you can see that this "call" method is not thread safe: the "target" object could be deleted between A and B and we have no guarantee that the "objects_" member (dictionary) is not altered while we read it.
The first solution that occurred to me was:
// Version I
const Value call(const Url &url, const Value &val) {
// Lock Root object with a mutex
ScopedLock lock(mutex_);
// A. find object
if (!objects_.get(url.path(), &target))
return ErrorValue(NOT_FOUND_ERROR, url.path());
}
// B. trigger the object's method
return target->trigger(val);
}
This is fine until "target->trigger(val)" is a method that needs to alter Root, either by changing an object's url or by inserting new objects. Modifying the scope and using a RW mutex can help (there are far more reads than writes on Root):
// Version II
const Value call(const Url &url, const Value &val) {
// A. find object
{
// Use a RW lock with smaller scope
ScopedRead lock(mutex_);
if (!objects_.get(url.path(), &target))
return ErrorValue(NOT_FOUND_ERROR, url.path());
}
}
// ? What happens to 'target' here ?
// B. trigger the object's method
return target->trigger(val);
}
What happens to 'target' ? How do we ensure it won't be deleted between finding and calling ?
Some ideas: object deletion could be post-poned in a message queue in Root. But then we would need another RW mutex read-locking deletion on the full method scope and use a separate thread to process the delete queue.
All this seems very convoluted to me and I'm not sure if concurrent design has to look like this or I just don't have the right ideas.
PS: the code is part of an open source project called oscit (OpenSoundControl it).

To avoid the deletion of 'target', I had to write a thread safe reference counted smart pointer. It is not that hard to do. The only thing you need to ensure is that the reference count is accessed within a critical section. See this post for more information.

You are on the wrong track with this. Keep in mind: you can't lock data, you can only block code. You cannot protect the "objects" member with a locally defined mutex. You need the exact same mutex in the code that alters the objects collection. It must block that code when another thread is executing the call() method. The mutex must be defined at least at class scope.

Related

Using member shared_ptr from a member callback function running in different thread (ROS topic subscription)

I am not completely sure how to best title this question since I am not completely sure what the nature of the problem actually is (I guess "how fix segfault" is not a good title).
The situation is, I have written this code:
template <typename T> class LatchedSubscriber {
private:
ros::Subscriber sub;
std::shared_ptr<T> last_received_msg;
std::shared_ptr<std::mutex> mutex;
int test;
void callback(T msg) {
std::shared_ptr<std::mutex> thread_local_mutex = mutex;
std::shared_ptr<T> thread_local_msg = last_received_msg;
if (!thread_local_mutex) {
ROS_INFO("Mutex pointer is null in callback");
}
if (!thread_local_msg) {
ROS_INFO("lrm: pointer is null in callback");
}
ROS_INFO("Test is %d", test);
std::lock_guard<std::mutex> guard(*thread_local_mutex);
*thread_local_msg = msg;
}
public:
LatchedSubscriber() {
last_received_msg = std::make_shared<T>();
mutex = std::make_shared<std::mutex>();
test = 42;
if (!mutex) {
ROS_INFO("Mutex pointer is null in constructor");
}
else {
ROS_INFO("Mutex pointer is not null in constructor");
}
}
void start(ros::NodeHandle &nh, const std::string &topic) {
sub = nh.subscribe(topic, 1000, &LatchedSubscriber<T>::callback, this);
}
T get_last_msg() {
std::lock_guard<std::mutex> guard(*mutex);
return *last_received_msg;
}
};
Essentially what it is doing is subscribing to a topic (channel), meaning that a callback function is called each time a message arrives. The job of this class is to store the last received message so the user of the class can always access it.
In the constructor I allocate a shared_ptr to the message and for a mutex to synchronize access to this message. The reason for using heap memory here is so the LatchedSubscriber can be copied and the same latched message can still be read. (the Subscriber already implements this kind of behavior where copying it doesn't do anything except for the fact that the callback stops being called once the last instance goes out of scope).
The problem is basically that the code segfaults. I am pretty sure the reason for this is that my shared pointers become null in the callback function, despite not being null in the constructor.
The ROS_INFO calls print:
Mutex pointer is not null in constructor
Mutex pointer is null in callback
lrm: pointer is null in callback
Test is 42
I don't understand how this can happen. I guess I have either misunderstood something about shared pointers, ros topic subscriptions, or both.
Things I have done:
At first I had the subscribe call happening in the constructor. I think giving the this pointer to another thread before the constructor has returned can be bad, so I moved this into a start function which is called after the object has been constructed.
There are many aspects to the thread safety of shared_ptrs it seems. At first I used mutex and last_received_msg directly in the callback. Now I have copied them into local variables hoping this would help. But it doesn't seem to make a difference.
I have added a local integer variable. I can read the integer I assigned to this variable in the constructor from the callback. Just a sanity check to make sure that the callback is actually called on an instance created by my constructor.
I think I have figured out the problem.
When subscribing I am passing the this pointer to the subscribe function along with the callback. If the LatchedSubscriber is ever copied and the original deleted, that this pointer becomes invalid, but the sub still exists so the callback keeps being called.
I didn't think this happened anywhere in my code, but the LatcedSubscriber was stored as a member inside an object which was owned by a unique pointer. It looks like make_unique might be doing some copying internally? In any case it is wrong to use the this pointer for the callback.
I ended up doing the following instead
void start(ros::NodeHandle &nh, const std::string &topic) {
auto l_mutex = mutex;
auto l_last_received_msg = last_received_msg;
boost::function<void(const T)> callback =
[l_mutex, l_last_received_msg](const T msg) {
std::lock_guard<std::mutex> guard(*l_mutex);
*l_last_received_msg = msg;
};
sub = nh.subscribe<T>(topic, 1000, callback);
}
This way copies of the two smart pointers are used with the callback instead.
Assigning the closure to a variable of type boost::function<void(const T)> seems to be necessary. Probably due to the way the subscribe function is.
This appears to have fixed the issue. I might also move the subscription into the constructor again and get rid of the start method.

Is this inter-thread object sharing strategy sound?

I'm trying to come up with a fast way of solving the following problem:
I have a thread which produces data, and several threads which consume it. I don't need to queue produced data, because data is produced much more slowly than it is consumed (and even if this failed to be the case occasionally, it wouldn't be a problem if a data point were skipped occasionally). So, basically, I have an object that encapsulates the "most recent state", which only the producer thread is allowed to update.
My strategy is as follows (please let me know if I'm completely off my rocker):
I've created three classes for this example: Thing (the actual state object), SharedObject<Thing> (an object that can be local to each thread, and gives that thread access to the underlying Thing), and SharedObjectManager<Thing>, which wraps up a shared_ptr along with a mutex.
The instance of the SharedObjectManager (SOM) is a global variable.
When the producer starts, it instantiates a Thing, and tells the global SOM about it. It then makes a copy, and does all of it's updating work on that copy. When it is ready to commit it's changes to the Thing, it passes the new Thing to the global SOM, which locks it's mutex, updates the shared pointer it keeps, and then releases the lock.
Meanwhile, the consumer threads all intsantiate SharedObject<Thing>. these objects each keep a pointer to the global SOM, as well as a cached copy of the shared_ptr kept by the SOM... It keeps this cached until update() is explicitly called.
I believe this is getting hard to follow, so here's some code:
#include <mutex>
#include <iostream>
#include <memory>
class Thing
{
private:
int _some_member = 10;
public:
int some_member() const { return _some_member; }
void some_member(int val) {_some_member = val; }
};
// one global instance
template<typename T>
class SharedObjectManager
{
private:
std::shared_ptr<T> objPtr;
std::mutex objLock;
public:
std::shared_ptr<T> get_sptr()
{
std::lock_guard<std::mutex> lck(objLock);
return objPtr;
}
void commit_new_object(std::shared_ptr<T> new_object)
{
std::lock_guard<std::mutex> lck (objLock);
objPtr = new_object;
}
};
// one instance per consumer thread.
template<typename T>
class SharedObject
{
private:
SharedObjectManager<T> * som;
std::shared_ptr<T> cache;
public:
SharedObject(SharedObjectManager<T> * backend) : som(backend)
{update();}
void update()
{
cache = som->get_sptr();
}
T & operator *()
{
return *cache;
}
T * operator->()
{
return cache.get();
}
};
// no actual threads in this test, just a quick sanity check.
SharedObjectManager<Thing> glbSOM;
int main(void)
{
glbSOM.commit_new_object(std::make_shared<Thing>());
SharedObject<Thing> myobj(&glbSOM);
std::cout<<myobj->some_member()<<std::endl;
// prints "10".
}
The idea for use by the producer thread is:
// initialization - on startup
auto firstStateObj = std::make_shared<Thing>();
glbSOM.commit_new_object(firstStateObj);
// main loop
while (1)
{
// invoke copy constructor to copy the current live Thing object
auto nextState = std::make_shared<Thing>(*(glbSOM.get_sptr()));
// do stuff to nextState, gradually filling out it's new value
// based on incoming data from other sources, etc.
...
// commit the changes to the shared memory location
glbSOM.commit_new_object(nextState);
}
The use by consumers would be:
SharedObject<Thing> thing(&glbSOM);
while(1)
{
// think about the data contained in thing, and act accordingly...
doStuffWith(thing->some_member());
// re-cache the thing
thing.update();
}
Thanks!
That is way overengineered. Instead, I'd suggest to do following:
Create a pointer to Thing* theThing together with protection mutex. Either a global one, or shared by some other means. Initialize it to nullptr.
In your producer: use two local objects of Thing type - Thing thingOne and Thing thingTwo (remember, thingOne is no better than thingTwo, but one is called thingOne for a reason, but this is a thing thing. Watch out for cats.). Start with populating thingOne. When done, lock the mutex, copy thingOne address to theThing, unlock the mutex. Start populating thingTwo. When done, see above. Repeat untill killed.
In every listener: (make sure the pointer is not nullptr). Lock the mutex. Make a copy of the object pointed two by the theThing. Unlock the mutex. Work with your copy. Burn after reading. Repeat untill killed.

Pass an object to another thread with AfxBeginThread

My program has a callback function which is called to handle notifications that are received in the form of objects. Because we can handle hundreds a second, this callback function handles the events by spawning a separate thread to handle each one. This is the callback function:
void _OnEvent(LPCTSTR eventID, CNotification cNotificaton) {
if (_pActiveDoc) {
Param_Event* p = new Param_Event;
p->pDoc = _pActiveDoc;
p->lpszEventID = eventID;
p->cNotification = cNotification;
AfxBeginThread(ProcessEvent,p);
}
}
My query comes from the fact that is passed to the callback method is initially created on the stack, and is therefore (according to my understanding) limited to the scope of the calling method:
void CallingMethod(CString strEventID) {
CNotification cNotification;
// Fill in the details of the notification
_OnEvent(strEventID,cNotification);
}
CNotification has a full copy constructor, and since the Param_Event object is created on the heap, my belief was that this would allow the original CNotification object to fall out of scope safely, with the spawned thread working from its own "private" CNotification object that exists until the Param_Event object is deleted with delete. The fact is, however, that we are getting (rare but occasional) crashing, and I am wondering if perhaps my belief here is incorrect: is it possible that the spawned thread is still accessing the original object somehow? If this was the case, this would explain the crashing by the rare occurrence of the object both falling out of scope and being overwritten in memory, thus creating a memory access exception.
Could I be right? Is there anything actually wrong with the method I am using? Would it be safer create the notification object on the heap initially (this would mean changing a lot of our code), or building a new object on the heap to pass to the spawned thread?
For reference, here is my ProcessEvent() method:
Param_TelephoneEvent *p = (Param_TelephoneEvent*)lParam;
p->pDoc->OnTelephoneEvent(p->lpszEventID,p->cNotification);
delete p;
return 0;
All advice welcome. Thanks in advance!
Edit: Copy constructor:
CNotification& CNotification::operator=(const CNotification &rhs)
{
m_eamspeMostRecentEvent = rhs.m_eamspeMostRecentEvent;
m_eamtcsCallStatusAtEvent = rhs.m_eamtcsCallStatusAtEvent;
m_bInbound = rhs.m_bInbound;
strcpy(m_tcExtension , rhs.m_tcExtension);
strcpy(m_tcNumber, rhs.m_tcNumber);
strcpy(m_tcName,rhs.m_tcName);
strcpy(m_tcDDI,rhs.m_tcDDI);
strcpy(m_tcCallID,rhs.m_tcCallID);
strcpy(m_tcInterTelEvent,rhs.m_tcInterTelEvent);
m_dTimestamp = rhs.m_dTimestamp;
m_dStartTime = rhs.m_dStartTime;
m_nCallID = rhs.m_nCallID;
return *this;
}

Is it safe to modify data of pointer in vector from another thread?

Things seem to be working but I'm unsure if this is the best way to go about it.
Basically I have an object which does asynchronous retrieval of data. This object has a vector of pointers which are allocated and de-allocated on the main thread. Using boost functions a process results callback is bound with one of the pointers in this vector. When it fires it will be running on some arbitrary thread and modify the data of the pointer.
Now I have critical sections around the parts that are pushing into the vector and erasing in case the asynch retrieval object is receives more requests but I'm wondering if I need some kind of guard in the callback that is modifying the pointer data as well.
Hopefully this slimmed down pseudo code makes things more clear:
class CAsyncRetriever
{
// typedefs of boost functions
class DataObject
{
// methods and members
};
public:
// Start single asynch retrieve with completion callback
void Start(SomeArgs)
{
SetupRetrieve(SomeArgs);
LaunchRetrieves();
}
protected:
void SetupRetrieve(SomeArgs)
{
// ...
{ // scope for data lock
boost::lock_guard<boost::mutex> lock(m_dataMutex);
m_inProgress.push_back(SmartPtr<DataObject>(new DataObject)));
m_callback = boost::bind(&CAsyncRetriever::ProcessResults, this, _1, m_inProgress.back());
}
// ...
}
void ProcessResults(DataObject* data)
{
// CALLED ON ANOTHER THREAD ... IS THIS SAFE?
data->m_SomeMember.SomeMethod();
data->m_SomeOtherMember = SomeStuff;
}
void Cleanup()
{
// ...
{ // scope for data lock
boost::lock_guard<boost::mutex> lock(m_dataMutex);
while(!m_inProgress.empty() && m_inProgress.front()->IsComplete())
m_inProgress.erase(m_inProgress.begin());
}
// ...
}
private:
std::vector<SmartPtr<DataObject>> m_inProgress;
boost::mutex m_dataMutex;
// other members
};
Edit: This is the actual code for the ProccessResults callback (plus comments for your benefit)
void ProcessResults(CRetrieveResults* pRetrieveResults, CRetData* data)
{
// pRetrieveResults is delayed binding that server passes in when invoking callback in thread pool
// data is raw pointer to ref counted object in vector of main thread (the DataObject* in question)
// if there was an error set the code on the atomic int in object
data->m_nErrorCode.Store_Release(pRetrieveResults->GetErrorCode());
// generic iterator of results bindings for generic sotrage class item
TPackedDataIterator<GenItem::CBind> dataItr(&pRetrieveResults->m_DataIter);
// namespace function which will iterate results and initialize generic storage
GenericStorage::InitializeItems<GenItem>(&data->m_items, dataItr, pRetrieveResults->m_nTotalResultsFound); // this is potentially time consuming depending on the amount of results and amount of columns that were bound in storage class definition (i.e.about 8 seconds for a million equipment items in release)
// atomic uint32_t that is incremented when kicking off async retrieve
m_nStarted.Decrement(); // this one is done processing
// boost function completion callback bound to interface that requested results
data->m_complete(data->m_items);
}
As it stands, it appears that the Cleanup code can destroy an object for which a callback to ProcessResults is in flight. That's going to cause problems when you deref the pointer in the callback.
My suggestion would be that you extend the semantics of your m_dataMutex to encompass the callback, though if the callback is long-running, or can happen inline within SetupRetrieve (sometimes this does happen - though here you state the callback is on a different thread, in which case you are OK) then things are more complex. Currently m_dataMutex is a bit confused about whether it controls access to the vector, or its contents, or both. With its scope clarified, ProcessResults could then be enhanced to verify validity of the payload within the lock.
No, it isn't safe.
ProcessResults operates on the data structure passed to it through DataObject. It indicates that you have shared state between different threads, and if both threads operate on the data structure concurrently you might have some trouble coming your way.
Updating a pointer should be an atomic operation, but you can use InterlockedExchangePointer (in Windows) to be sure. Not sure what the Linux equivalent would be.
The only consideration then would be if one thread is using an obsolete pointer. Does the other thread delete the object pointed to by the original pointer? If so, you have a definite problem.

Thread-Safe implementation of an object that deletes itself

I have an object that is called from two different threads and after it was called by both it destroys itself by "delete this".
How do I implement this thread-safe? Thread-safe means that the object never destroys itself exactly one time (it must destroys itself after the second callback).
I created some example code:
class IThreadCallBack
{
virtual void CallBack(int) = 0;
};
class M: public IThreadCallBack
{
private:
bool t1_finished, t2_finished;
public:
M(): t1_finished(false), t2_finished(false)
{
startMyThread(this, 1);
startMyThread(this, 2);
}
void CallBack(int id)
{
if (id == 1)
{
t1_finished = true;
}
else
{
t2_finished = true;
}
if (t1_finished && t2_finished)
{
delete this;
}
}
};
int main(int argc, char **argv) {
M* MObj = new M();
while(true);
}
Obviously I can't use a Mutex as member of the object and lock the delete, because this would also delete the Mutex. On the other hand, if I set a "toBeDeleted"-flag inside a mutex-protected area, where the finised-flag is set, I feel unsure if there are situations possible where the object isnt deleted at all.
Note that the thread-implementation makes sure that the callback method is called exactly one time per thread in any case.
Edit / Update:
What if I change Callback(..) to:
void CallBack(int id)
{
mMutex.Obtain()
if (id == 1)
{
t1_finished = true;
}
else
{
t2_finished = true;
}
bool both_finished = (t1_finished && t2_finished);
mMutex.Release();
if (both_finished)
{
delete this;
}
}
Can this considered to be safe? (with mMutex being a member of the m class?)
I think it is, if I don't access any member after releasing the mutex?!
Use Boost's Smart Pointer. It handles this automatically; your object won't have to delete itself, and it is thread safe.
Edit:
From the code you've posted above, I can't really say, need more info. But you could do it like this: each thread has a shared_ptr object and when the callback is called, you call shared_ptr::reset(). The last reset will delete M. Each shared_ptr could be stored with thread local storeage in each thread. So in essence, each thread is responsible for its own shared_ptr.
Instead of using two separate flags, you could consider setting a counter to the number of threads that you're waiting on and then using interlocked decrement.
Then you can be 100% sure that when the thread counter reaches 0, you're done and should clean up.
For more info on interlocked decrement on Windows, on Linux, and on Mac.
I once implemented something like this that avoided the ickiness and confusion of delete this entirely, by operating in the following way:
Start a thread that is responsible for deleting these sorts of shared objects, which waits on a condition
When the shared object is no longer being used, instead of deleting itself, have it insert itself into a thread-safe queue and signal the condition that the deleter thread is waiting on
When the deleter thread wakes up, it deletes everything in the queue
If your program has an event loop, you can avoid the creation of a separate thread for this by creating an event type that means "delete unused shared objects" and have some persistent object respond to this event in the same way that the deleter thread would in the above example.
I can't imagine that this is possible, especially within the class itself. The problem is two fold:
1) There's no way to notify the outside world not to call the object so the outside world has to be responsible for setting the pointer to 0 after calling "CallBack" iff the pointer was deleted.
2) Once two threads enter this function you are, and forgive my french, absolutely fucked. Calling a function on a deleted object is UB, just imagine what deleting an object while someone is in it results in.
I've never seen "delete this" as anything but an abomination. Doesn't mean it isn't sometimes, on VERY rare conditions, necessary. Problem is that people do it way too much and don't think about the consequences of such a design.
I don't think "to be deleted" is going to work well. It might work for two threads, but what about three? You can't protect the part of code that calls delete because you're deleting the protection (as you state) and because of the UB you'll inevitably cause. So the first goes through, sets the flag and aborts....which of the rest is going to call delete on the way out?
The more robust implementation would be to implement reference counting. For each thread you start, increase a counter; for each callback call decrease the counter and if the counter has reached zero, delete the object. You can lock the counter access, or you could use the Interlocked class to protect the counter access, though in that case you need to be careful with potential race between the first thread finishing and the second starting.
Update: And of course, I completely ignored the fact that this is C++. :-) You should use InterlockExchange to update the counter instead of the C# Interlocked class.