Sharing data between API and application threads

Sharing data between API and application threads - c++

My API computes some data in its own thread:
/*** API is running in its own thread ***/
class API {
public:
std::shared_ptr<Data> retrieveData() { return mData; }
private:
std::shared_ptr<Data> mData;
std::mutex mDataMutex;
void run () {
std::thread t([](){
while (!exitApi) {
mDataMutex.lock();
updateData(mData);
mDataMutex.unlock();
});
t.join();
}
};
An application that uses my API will retrieve the shared data in another thread:
/*** Application is running in another thread ***/
class Application {
private:
Api mApi;
void run () {
std::thread t([](){
while (!exitApp) {
std::shared_ptr<Data> data = mApi.retrieveData();
/* API thread can update the data while the App is using it! */
useData(data);
});
t.join();
}
How can I design my API so that there are no pitfalls for the application-developer when retrieving the data? I can think of three options but do not like any of them:
Instead of sharing the pointer, the API will return a copy of all the data. However, the amount of data can get quite large and copying it should be avoided.
The API will lock the data when it hands it over to the application and the application needs to explicitly ask the API to unlock it again after performing all computations. Even if documented correctly this is very much prone to deadlocks.
When the API hands over the data to the application retrieveData will also return an already locked std::unique_lock. Once the application is done with using the data it has to unlock the unique_lock. This is potentially less prone to error but still not very obvious for the application developer.
Are there any better options to design the API (in modern C++11 and beyond) that is as developer-friendly as possible?

TL;DR: Use shared_ptr with a custom deleter that calls unlock.
It think the two main approaches are:
Returning an immutable data structure so it can be shared between threads. This is makes for a clean API, but (as already mentioned) copying could be expensive. Some approaches to reduce the need for copying would be:
Use a copy-on-write data structure so that only some portions of the data need to be copied each time. Depending on your data this may be not be
possible, or it will be too much work to refactor.
Make use of move-references where possible to reduce the cost of copying. This alone probably won't be enough, but depends on your actual data.
Using locks around a mutable data structure. As is pointed out, this requires the API user to perform extra actions that may not be obvious. But smart pointers can be used to lessen the burden on the consumers:
One way to make it easier is to return a custom smart pointer that
unlocks in its destructor: Unlock happens when the caller's scope closes, so there are no unlock calls for the caller to worry about. The API consumer would pass the pointer by reference to its methods. For example func(locking_ptr& ptr). A simple implementation can be found here: https://stackoverflow.com/a/15876719/1617480.
To be able to pass that locking smart pointer by copy instead of by reference, some sort of reference counting scheme needs to be in place. You'd probably want to use shared_ptr internal to the locking smart pointer to avoid rolling your own thread-safe reference counting. More simply pass a custom deleter to shared_ptr that unlocks and deletes (No need to write a smart pointer at all).
Another type of smart pointer would surround every dereference -> in locks. I don't think this is appropriate for this use case, since it looks the API consumer wants a consistent view of the results. Here's an example of such a pointer: https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Execute-Around_Pointer

Another approach would involve closures, for example adding
// inside class API
void do_locked(std::function<void(Data*)>fun) {
std::guard_lock<std::mutex> gu(mDataMutex);
fun(mData);
}
and you could invoke it with a lambda expression
//inside Application::run2
mApi.do_locked([=](Data *dat) { useData(dat); });

Related

C++20 solutions for temporary ownership of resources

I have a resource managed by an external system with the following API:
struct api {
resource* acquire();
void release(resource*);
};
My code requires to temporarily share one of these resources across multiple threads, so I need to put the resource in a shared_ptr to get a reference count:
std::shared_ptr<resource> res = m_api.acquire();
This API requires the resources being freed from the same thread in which they were acquired.
My problem is giving it back when all the threads are done with that resource, as std::shared_ptr does not have release.
1/ I cannot just let the pointers wither on their own when they aren't used as I don't know which thread will be the last if I do that (and it needs to be the main thread).
2/ I need to add a custom deleter which stores a api*, which is too costly in this case - I can already barely afford a "simple" std::shared_ptr with its default deleter.
What are my solutions, if possible without having to reimplement an atomic reference count myself ?

How to expose a thread-safe interface that allocate resources?

I'm trying to expose a C interface for my C++ library. This notably involve functions that allow the user to create, launch, query the status, then release a background task.
The task is implemented within a C++ class, which members are protected from concurrent read/write via an std::mutex.
My issue comes when I expose a C interface for this background task. Basically I have say the following functions (assuming task_t is an opaque pointer to an actual struct containing the real task class):
task_t* mylib_task_create();
bool mylib_task_is_running(task_t* task);
void mylib_task_release(task_t* task);
My goal is to make any concurrent usage of these functions thread-safe, however I'm not sure exactly how, i.e. that if a client code thread calls mylib_task_is_running() at the same time that another thread calls mylib_task_release(), then everything's fine.
At first I thought about adding an std::mutex to the implementation of task_t, but that means the delete statement at the end of mylib_task_release() will have to happen while the mutex is not held, which means it doesn't completely solve the problem.
I also thought about using some sort of reference counting but I still end up against the same kind of issue where the actual delete might happen right after a hypothetical retain() function is called.
I feel like there should be a (relatively) simple solution to this but I can't quite put my hand on it. How can I make it so I don't have to force the client code to protect accesses to task_t?

if task_t is being deleted, you should ensure that nobody else has a pointer to it.
if one thread is deleting task_t and the other is trying to acquire it's mutex, it should be apparent that you should not have deleted the task_t.
shared_ptrs are a great help for this.

Guarded Data Design Pattern

In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we thought about re-working our code where currently locking is done explicitely by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
template<typename T, typename Lockable>
class GuardedData {
GuardedData(T &d, Lockable &m) : data(d), guard(m) {}
boost::lock_guard<Lockable> guard;
T &data;
T &operator->() { return data; }
};
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.

Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on 2 pieces of data then you end up locking the mutex twice and deadlocking (unless each piece of data has its own mutex - which would also result in deadlock if the lock order is not consistent - you have no control over that with this scheme without making it really complicated). Unless you use a recursive mutex which may not be desired.
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
Its probably easier to copy parts of the data you need to the reader/writer threads as and when they need it, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a cpu cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However since the viewer has a copy it can continue on and provide a view of the data at the moment it was synchronized with the data.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and when completes, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only 1 writer. The next time the reader requests the data, it gets the fresh copy.

Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to be to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always made to be thread-safe because the data is guarded. Often, this isn't enough to ensure thread-safety. Order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.

Mutex protection for Singleton resources in multithreaded env

I have a server listening on a port for request. When the request comes in, it is dispatched to a singleton class Singleton. This Singleton class has a data structure RootData.
class Singleton {
void process();
void refresh();
private:
RootData mRootData;
}
In the class there are two functions: process: Work with the mRootData to do some processing and refresh: called periodically from another thread to refresh the mRootData with the latest changes from Database.
It is required that the access to mRootData be gaurded by Mutex.
I have the following questions:
1] If the class is a singleton and mRootData is inside that class, is the Mutex gaurd really necessary?
I know it is necessary for conflict between refresh/process. But from the server, i think there will be only one call to process happening at any give time ( coz the class is Singleton) Please correct me if my understanding is wrong.
2] Should i protect the i) data structure OR ii) function accessing the data structure. E.g.
i) const RootData& GetRootData()
{
ACE_Read_Guard guard(m_oMutexReadWriteLock);
return mRootData;
// Mutex is released when this function returns
}
// Similarly Write lock for SetRootData()
ii) void process()
{
ACE_Read_Guard guard(m_oMutexReadWriteLock);
// use mRootData and do processing
// GetRootData() and SetRootData() wont be mutex protected.
// Mutex is released when this function returns
}
3] If the answer to above is i) should i return by reference or by object?
Please explain in either case.
Thanks in advance.

1] If the class is a singleton and mRootData is inside that class, is the Mutex gaurd really necessary?
Yes it is, since one thread may call process() while another is calling refresh().
2] Should i protect the i) data structure OR ii) function accessing the data structure.
Mutex is meant to protect a common code path, i.e. (part of) the function accessing the shared data. And it is easiest to use when locking and releasing happens within the same code block. Putting them into different methods is almost an open invitation for deadlocks, since it is up to the caller to ensure that every lock is properly released.
Update: if GetRootData and SetRootData are public functions too, there is not much point to guard them with mutexes in their current form. The problem is, you are publishing a reference to the shared data, after which it is completely out of your control what and when the callers may do with it. 100 callers from 100 different threads may store a reference to mRootData and decide to modify it at the same time! Similarly, after calling SetRootData the caller may retain the reference to the root data, and access it anytime, without you even noticing (except eventually from a data corruption or deadlock...).
You have the following options (apart from praying that the clients be nice and don't do nasty things to your poor singleton ;-)
create a deep copy of mRootData both in the getter and the setter. This keeps the data confined to the singleton, where it can be guarded with locks. OTOH callers of GetRootData get a snapshot of the data, and subsequent changes are not visible to them - this may or may not be acceptable to you.
rewrite RootData to make it thread safe, then the singleton needs to care no more about thread safety (in its current setup - if it has other data members, the picture may be different).
Update2:
or remove the getter and setter altogether (possibly together with moving data processing methods from other classes into the singleton, where these can be properly guarded by mutexes). This would be the simplest and safest, unless you absolutely need other parties to access mRootData directly.

1.) Yes, the mutex is necessary. Although there is only one instance of the class in existence at any one time, multiple threads could still call process() on that instance at the same time (unless you design your app so that never happens).
2.) Anytime you use the value you should protect it with the mutex.
However, you don't mention a GetRootData and SetRootData in your class declaration above. Are these private (used only inside the class to access the data) or public (to allow other code to access the data directly)?
If you need to provide outside access to the data by making the GetRootData() function public, then you would need to return a copy, or your callers could then store a reference and manipulate the data after the lock has been released. Of course, then changes they made to the data wouldn't be reflected inside the singleton, which might not be what you want.

Thread Safe Access to Data Shared Between Objects

I'm something of an intermediate programmer, but relatively a novice to multi-threading.
At the moment, I'm working on an application with a structure similar to the following:
class Client
{
public:
Client();
private:
// These are all initialised/populated in the constrcutor.
std::vector<struct clientInfo> otherClientsInfo;
ClientUI* clientUI;
ClientConnector* clientConnector;
}
class ClientUI
{
public:
ClientUI(std::vector<struct clientInfo>* clientsInfo);
private:
// Callback which gets new client information
// from a server and pushes it into the otherClientsInfo vector.
synchClientInfo();
std::vector<struct clientInfo>* otherClientsInfo;
}
class ClientConnector
{
public:
ClientConnector(std::vector<struct clientInfo>* clientsInfo);
private:
connectToClients();
std::vector<struct clientInfo>* otherClientsInfo;
}
Somewhat a contrived example, I know. The program flow is this:
Client is constructed and populates otherClientsInfo and constructs clientUI and clientConnector with a pointer to otherClientsInfo.
clientUI calls synchClientInfo() anytime the server contacts it with new client information, parsing the new data and pushing it back into otherClientsInfo or removing an element.
clientConnector will access each element in otherClientsInfo when connectToClients() is called but won't alter them.
My first question is whether my assumption that if both ClientUI and ClientConnector access otherClientsInfo at the same time, will the program bomb out because of thread-unsafety?
If this is the case, then how would I go about making access to otherClientsInfo thread safe, as in perhaps somehow locking it while one object accesses it?

My first question is whether my assumption that if both ClientUI and ClientConnector access otherClientsInfo at the same time, will the program bomb out because of thread-unsafety?
Yes. Most implementations of std::vector do not allow concurrent read and modification. ( You'd know if you were using one which did )
If this is the case, then how would I go about making access to otherClientsInfo thread safe, as in perhaps somehow locking it while one object accesses it?
You would require at least a lock ( either a simple mutex or critical section or a read/write lock ) to be held whenever the vector is accessed. Since you've only one reader and writer there's no point having a read/write lock.
However, actually doing that correctly will get increasingly difficult as you are exposing te vector to the other classes, so will have to expose the locking primitive too, and remember to acquire it whenever you use the vector. It may be better to expose addClientInfo, removeClientInfo and const and non-const foreachClientInfo functions which encapsulate the locking in the Client class rather than having disjoint bits of the data owned by the client floating around the place.

See
Reader/Writer Locks in C++
and
http://msdn.microsoft.com/en-us/library/ms682530%28VS.85%29.aspx
The first one is probably a bit advanced for you. You can start with the Critical section (link 2).
I am assuming you are using Windows.

if both ClientUI and ClientConnector access otherClientsInfo at the same time, will the program bomb out because of thread-unsafety?
Yes, STL containers are not thread-safe.
If this is the case, then how would I go about making access to otherClientsInfo thread safe, as in perhaps somehow locking it while one object accesses it?
In the most simple case a mutual exclusion pattern around the access to the shared data... if you'd have multiple readers however, you would go for a more efficient pattern.

Is clientConnector called from the same thread as synchClientInfo() (even if it is all callback)?
If so, you don't need to worry about thread safety at all.
If you want to avoid simultanous access to the same data, you can use mutexes to protect the critical section. For exmample, mutexes from Boost::Thread

In order to ensure that access to the otherClientsInfo member from multiple threads is safe, you need to protect it with a mutex. I wrote an article about how to directly associate an object with a mutex in C++ over on the Dr Dobb's website:
http://www.drdobbs.com/cpp/225200269

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js