Shared, one-side-mutable state in multithreaded plugin architecture - c++

I have an application with a very simple plugin system that is built around a core that takes care of the heavy lifting, but leaves any processing beyond the basics to the plugins. Now I'd like to make that system multithreaded, or at the very least allow individual plugins to run their own threads so that they can block individually without freezing the core.
Naturally, that means making the core thread safe, so that plugins can freely operate on thread safe member functions of said core. This is not that hard for many cases, but the problem comes in when the result of one of these member functions is a (const) reference to some inner environment maintained by the core. Plugins must not modify it, but the core running in another thread may update it at any point in time while the plugin is still holding on to the reference and possibly still mid-processing.
Now I could just expose a mutex in the core and have plugins lock it for as long as they need the data to remain unchanged, but since that mutex will have to block the core event processing, holding it for too long will cause all kinds of unpleasantries. I'd really like to avoid that.
Another solution might be having the function that would return a reference, return a copy instead, but the environment can grow pretty large with many containers that contain yet more things, and copying all that is expensive.
Apart from steering clear of any direct sharing and instead talking over sockets or pipes, these seem to be my options, although I cannot decide which one to choose.
Which would be my best bet? What options that I have not considered might help me here?

This question is quite opinion-based, but based on past experience, I would recommend using copy-on-write. You could do it at the top level, copying the entire object, or you could do it piece by piece, which might address the overhead of copying. For example:
#include <memory>

struct BigData; // the large, expensive-to-copy parts of the environment
struct Shared
{
    std::shared_ptr<const BigData> a;
    std::shared_ptr<const BigData> b;
    std::shared_ptr<const BigData> c;
};
You would then copy only the parts you modify and send a copy of Shared to the plugin. So if you change something in a, then b and c still point to the same objects as before, while a points to a new BigData that was copied from the original and then modified.
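A minimal sketch of that piecewise copy-on-write scheme, assuming C++11 and a single writer (the core thread). The contents of BigData, the Core class and appendToA are hypothetical. The mutex only guards reading and publishing the top-level pointer, so a plugin never holds it while it processes.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one large, expensive-to-copy part of the environment.
struct BigData {
    std::vector<std::string> items;
};

// Parts are immutable once published; an update replaces pointers, never data.
struct Shared {
    std::shared_ptr<const BigData> a;
    std::shared_ptr<const BigData> b;
    std::shared_ptr<const BigData> c;
};

// Hypothetical core; assumes only the core thread calls the update functions.
class Core {
public:
    Core()
        : current_(std::make_shared<Shared>(Shared{std::make_shared<BigData>(),
                                                   std::make_shared<BigData>(),
                                                   std::make_shared<BigData>()})) {}

    // Plugins call this. The mutex is held only while copying one shared_ptr,
    // and the returned snapshot stays unchanged for as long as the plugin keeps it.
    std::shared_ptr<const Shared> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Core-side update: copy only part 'a', modify the copy, then publish.
    void appendToA(const std::string& item) {
        std::shared_ptr<const Shared> old = snapshot();
        auto newA = std::make_shared<BigData>(*old->a);  // deep copy of 'a' only
        newA->items.push_back(item);
        auto next = std::make_shared<Shared>(*old);      // copies three pointers
        next->a = std::move(newA);                       // 'b' and 'c' stay shared
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Shared> current_;
};
```

A plugin can keep the result of snapshot() as long as it likes; later updates build new objects instead of touching the ones it holds. With several writers, the read-copy-publish sequence in appendToA would itself need to be serialized, or concurrent updates could be lost.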

Related

Best way to control access to a string object in multi-threaded program

I've got a "config" class that has a bunch of attributes that "mirror" configuration settings. A single instance of the class is shared throughout the code (using boost shared_ptr objects) and its attributes are read by multiple threads (around 100).
Occasionally, the settings may change and a "monitor" thread updates the appropriate attributes in the object.
For integer and bool attributes, I'm using boost atomic so that when an update happens and the monitor thread sets the value, none of the read threads read it in a partially updated state.
However, for string attributes, I'm worried that making them atomic would hurt performance significantly. It seems like a good way to do it would be to have the string attributes actually be pointers to strings, and then when an update happens, a new string object could be built, and then the write to the shared object (the string pointer) would only be writing the address of the new string object to point to. So I assume that write time would be far shorter than writing a whole new string value to a shared string object.
Doing that, however, means I'd want to use shared_ptrs for the string attribs, so that a string object holding the previous value is automatically deleted once all read threads are using the updated string pointer attribute.
So to give an example:
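#include <boost/atomic.hpp>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <map>
#include <string>

class Config
{
public:
    boost::atomic<boost::shared_ptr<std::string> > configStr1;
    void updateValueInMonitorThread(std::string newValue)
    {
        // build the new string first, then publish only the pointer
        boost::shared_ptr<std::string> newValuePtr =
            boost::make_shared<std::string>(newValue);
        configStr1 = newValuePtr;
    }
};
void threadThatReadsConfig(boost::shared_ptr<Config> theConfig)
{
    std::map<std::string, std::string> thingImWorkingOn;
    thingImWorkingOn[*(theConfig->configStr1.load())] = "some value";
}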
Is that overkill? Is there a better way to do it? I really don't like the way the reading threads have to access the value by dereferencing it and calling .load(). Also, is it even threadsafe, or does that stuff actually negate the safety features of the atomic and/or shared_ptr type?
I know I could use a mutex and read lock it when accessed in a "getter" and write lock it when the monitor thread updates the string's value, but I'd like to avoid that as I'm trying to keep the config class simple and it's going to have dozens, possibly hundreds of these string attributes.
Thanks in advance for any suggestions/info!
You are already giving each consumer a shared_ptr to the configuration object. So the threads won't notice if the configuration object isn't always the same object.
That is, when the main configuration changes, generate an entirely new configuration object. That seems like a lot of copying, but I'll bet it happens sufficiently rarely that you won't notice the overhead. Then you can swap the new configuration object in for the old one, and when all the consumers of the old object finish with it, it will disappear.
Obviously, this changes the semantics of the use of a configuration object. A long-running thread which would like to be able to notice configuration changes will have to periodically refresh its configuration object. The easiest way to do that would be just to acquire a new configuration object on every use of configuration data; again, that's unlikely to be too expensive, unless you use a configuration string in a tight loop.
On the plus side, you can make the entire configuration object const, which might allow for some optimizations.
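A minimal sketch of that whole-object swap, assuming C++11; Config, ConfigHolder, get and publish are hypothetical names. It uses the std::atomic_load and std::atomic_store overloads for std::shared_ptr from <memory> (deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>), so a reader always sees either the old or the new pointer, never a torn one.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable configuration: built completely, then never modified.
struct Config {
    std::string configStr1;
    std::string configStr2;
};

class ConfigHolder {
public:
    explicit ConfigHolder(std::shared_ptr<const Config> initial)
        : current_(std::move(initial)) {}

    // Readers: take the current snapshot; it cannot change underneath them.
    std::shared_ptr<const Config> get() const { return std::atomic_load(&current_); }

    // Monitor thread: publish a fully built replacement. The old Config is
    // destroyed automatically once the last reader releases its shared_ptr.
    void publish(std::shared_ptr<const Config> next) {
        std::atomic_store(&current_, std::move(next));
    }

private:
    std::shared_ptr<const Config> current_;
};

// Reader in the style of the question: fetch one snapshot per unit of work.
void threadThatReadsConfig(const ConfigHolder& holder) {
    std::shared_ptr<const Config> cfg = holder.get();
    std::map<std::string, std::string> thingImWorkingOn;
    thingImWorkingOn[cfg->configStr1] = "some value";
}
```

To change one setting, the monitor thread copies the current Config, edits the copy and publishes it; readers that already hold the old snapshot keep using it until they call get() again.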
The classical method of using mutex variables to lock shared resources (here your string objects) is not only the best but also the most efficient way of handling such situations; otherwise you may get into trouble because of incomplete protection, or you may end up with a solution that has more overhead. In some applications you can improve efficiency by using separate mutexes for separate objects, so that while one object is being updated, the others remain accessible.
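A minimal sketch of the one-mutex-per-attribute variant, assuming C++11; LockedString is a hypothetical helper. The getter returns a copy, so the lock is released before the caller uses the value.

```cpp
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper: one string attribute with its own mutex, so updating
// one attribute does not block readers of the others.
class LockedString {
public:
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;  // copied while locked; the lock is released on return
    }
    void set(std::string v) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = std::move(v);
    }

private:
    mutable std::mutex mutex_;
    std::string value_;
};

// Each configuration setting carries its own lock.
struct Config {
    LockedString configStr1;
    LockedString configStr2;
};
```

The trade-off against the snapshot approach is that two attributes read one after the other may come from different configuration versions.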

Guarded Data Design Pattern

In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we thought about reworking our code where locking is currently done explicitly by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
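#include <boost/thread/locks.hpp>

template<typename T, typename Lockable>
class GuardedData {
public:
    GuardedData(T &d, Lockable &m) : guard(m), data(d) {}
    // operator-> must return a pointer (or an object with its own operator->)
    T *operator->() { return &data; }
private:
    boost::lock_guard<Lockable> guard; // held for the object's whole lifetime
    T &data;
};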
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.
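For illustration, a sketch of the same idea where the guard owner also owns the data, so the data cannot be reached without taking the lock. It assumes C++11; Guarded, Access and with are hypothetical names. Holding a std::unique_lock instead of a lock_guard makes the accessor movable, so it can be returned from a function.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical owner: the only way to reach the data is through a lock.
template <typename T>
class Guarded {
public:
    // Accessor that holds the lock for its lifetime; movable, not copyable.
    class Access {
    public:
        Access(T& d, std::mutex& m) : lock_(m), data_(&d) {}
        T* operator->() const { return data_; }
        T& operator*() const { return *data_; }

    private:
        std::unique_lock<std::mutex> lock_;
        T* data_;
    };

    Access lock() { return Access(data_, mutex_); }

    // Scoped alternative: the lock covers exactly the call to f.
    template <typename F>
    auto with(F f) -> decltype(f(std::declval<T&>())) {
        std::lock_guard<std::mutex> guard(mutex_);
        return f(data_);
    }

private:
    std::mutex mutex_;
    T data_;
};
```

This does not address lock ordering across several guarded objects, nor recursive locking from the same thread.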
Depending on how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on two pieces of data, you end up locking the mutex twice and deadlocking, unless you use a recursive mutex, which may not be desired. If each piece of data has its own mutex, you can still deadlock when the lock order is not consistent, and this scheme gives you no control over that without making it really complicated.
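One standard way around the lock-ordering problem, when an operation needs two separately guarded pieces of data, is to acquire both mutexes together with std::lock (C++11), which uses a deadlock-avoidance algorithm. A sketch, with Bucket and moveLast as hypothetical names:

```cpp
#include <mutex>
#include <vector>

// Hypothetical: two independently guarded pieces of data.
struct Bucket {
    std::mutex m;
    std::vector<int> entries;
};

// Locks both mutexes through std::lock, so one thread calling moveLast(a, b)
// and another calling moveLast(b, a) cannot deadlock each other.
void moveLast(Bucket& from, Bucket& to) {
    if (&from == &to) return;  // locking the same std::mutex twice is undefined
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> lockFrom(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.m, std::adopt_lock);
    if (!from.entries.empty()) {
        to.entries.push_back(from.entries.back());
        from.entries.pop_back();
    }
}
```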
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
It's probably easier to copy the parts of the data you need to the reader/writer threads as and when they need them, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in the cache of the CPU core running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However, since the viewer has a copy, it can continue and provide a view of the data as it was at the moment it was synchronized.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and, when complete, switches the model's pointer to the new copy. Readers need not be blocked while this happens, since only the pointer switch has to be atomic, but writers must be serialized unless there is only one writer. The next time the reader requests the data, it gets the fresh copy.
Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to me to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always thread-safe because the data is guarded. Often, this isn't enough to ensure thread safety. The order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one of the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.

passing "this" to a thread c++

What is the best way of performing the following in C++. Whilst my current method works I'm not sure it's the best way to go:
1) I have a master class that has some functions in it
2) I have a thread that takes some instructions on a socket and then runs one of the functions in the master class
3) There are a number of threads that access various functions in the master class
I create the master class and then create instances of the thread classes from the master. The constructor for the thread class gets passed the "this" pointer for the master. I can then run functions from the master class inside the threads, i.e. I get a command to do something which runs a function in the master class from the thread. I have mutexes etc. to prevent race problems.
Am I going about this the wrong way - It kinda seems like the thread classes should inherit the master class or another approach would be to not have separate thread classes but just have them as functions of the master class but that gets ugly.
Sounds good to me. In my servers, it is called 'SCB' - ServerControlBlock - and provides access to services like the IOCPbuffer/socket pools, logger, UI access for status/error messages and anything else that needs to be common to all the handler threads. Works fine and I don't see it as a hack.
I create the SCB, (and ensure in the ctor that all services accessed through it are started and ready for use), before creating the thread pool that uses the SCB - no nasty singletonny stuff.
Rgds,
Martin
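A minimal sketch of that structure, assuming C++11 std::thread; Master, Worker, doCommand and runWorkers are hypothetical stand-ins for the classes in the question. As in the SCB answer, the shared object is fully constructed before any thread starts, and every thread is joined before the shared object can be destroyed.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical master object shared by all worker threads.
class Master {
public:
    void doCommand(const std::string& cmd) {
        std::lock_guard<std::mutex> lock(mutex_);  // protects log_ from races
        log_.push_back(cmd);
    }
    std::size_t commandCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return log_.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::string> log_;
};

// Hypothetical worker: receives the master pointer in its constructor.
class Worker {
public:
    explicit Worker(Master* master) : master_(master) {}
    void run(int id) { master_->doCommand("command from worker " + std::to_string(id)); }

private:
    Master* master_;  // not owned; the master must outlive this worker
};

void runWorkers(Master& master, int count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        threads.emplace_back([&master, i] {
            Worker worker(&master);
            worker.run(i);
        });
    }
    for (auto& t : threads) t.join();  // join before master can go away
}
```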
Separate thread classes is pretty normal, especially if they have specific functionality. I wouldn't inherit from the main thread.
Passing the this pointer to threads is not, in itself, bad. What you do with it can be.
The this pointer is just like any other POD-ish data type. It's just a chunk of bits. The stuff that this points to might be more than PODs, however, and passing what is in effect a pointer to its members can be dangerous for all the usual reasons. Any time you share anything across threads, you introduce potential race conditions and deadlocks. The elementary means to resolve those conflicts is, of course, to introduce synchronization in the form of mutexes, semaphores, etc., but this can have the surprising effect of serializing your application.
Say you have one thread reading data from a socket and storing it to a synchronized command buffer, and another thread which reads from that command buffer. Both threads use the same mutex, which protects the buffer. All is well, right?
Well, maybe not. Your threads could become serialized if you're not very careful with how you lock the buffer. Presumably you created separate threads for the buffer-insert and buffer-remove code so that they could run in parallel. But if you lock the buffer with each insert and each remove, then only one of those operations can be executing at a time. As long as you're writing to the buffer, you can't read from it, and vice versa.
You can try to fine-tune the locks so that they are as brief as possible, but so long as you have shared, synchronized data, you will have some degree of serialization.
Another approach is to hand data off to another thread explicitly and remove as much data sharing as possible. Instead of writing to and reading from a buffer as above, for example, your socket code might create some kind of Command object on the heap (e.g. Command* cmd = new Command(...);) and pass that off to the other thread. (One way to do this in Windows is via the QueueUserAPC mechanism.)
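A portable sketch of the hand-off idea, assuming C++11 rather than QueueUserAPC; Command, CommandQueue and lastCommandName are hypothetical. The queue still takes a short lock for each push and pop, but each Command is owned by exactly one thread at a time, which std::unique_ptr makes explicit.

```cpp
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical command produced by the socket thread.
struct Command {
    std::string name;
};

// Minimal hand-off queue: ownership of each Command moves to the consumer.
class CommandQueue {
public:
    void push(std::unique_ptr<Command> cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(cmd));
        }
        ready_.notify_one();  // wake a waiting consumer after unlocking
    }

    // Blocks until a command is available, then takes ownership of it.
    std::unique_ptr<Command> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        std::unique_ptr<Command> cmd = std::move(queue_.front());
        queue_.pop();
        return cmd;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::unique_ptr<Command>> queue_;
};

// Usage: a producer thread creates commands, the calling thread consumes them.
std::string lastCommandName(int count) {
    CommandQueue queue;
    std::thread producer([&queue, count] {
        for (int i = 0; i < count; ++i) {
            std::unique_ptr<Command> cmd(new Command);
            cmd->name = "cmd" + std::to_string(i);
            queue.push(std::move(cmd));  // ownership leaves this thread here
        }
    });
    std::string last;
    for (int i = 0; i < count; ++i) last = queue.pop()->name;
    producer.join();
    return last;
}
```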
There are pros & cons to both approaches. The synchronization method has the benefit of being somewhat simpler to understand and implement at the surface, but the potential drawback of being much more difficult to debug if you mess something up. The hand-off method can make many of the problems inherent with synchronization impossible (thereby actually making it simpler), but it takes time to allocate memory on the heap.

Are synchronization objects cacheable?

I am new to the multithreading world and have started getting into it. I found that threading requires synchronization, and that volatile is no longer a reliable tool for it. I would like to know whether synchronization objects can be cached by the compiler, or at any other stage.
Platform/languages used : c++, win32, Windows
In C++, the volatile keyword is used for objects which cannot be cached by CPUs. But today's compilers do not strictly follow this. Is there another way to make synchronization objects non-cacheable (or to keep other optimizations from being applied to them)?
tl;dr: Are synchronization objects cacheable? If yes, how can you make it non-cacheable ?
I'm not sure I follow your question: compiler cache has almost nothing to do with multithreading. The only thing that a compiler cache would do is to increase your compilation speed by caching previous compilations.
Synchronization objects can be "cached" since they're any arbitrary object that you've decided to use for synchronization, but that has little effect on concurrency. The only thing that you need to care about when synchronizing is that when you have multiple threads contending for a resource, they must all synchronize on the same object in order to get read/write access to the resource.
I'm going to take a wild guess, based on your mentioning of volatile, and assume that you're worried a synchronization object may be cached in a thread's local cache and changes to the synchronization object from one thread may not be visible to another thread. This, however, is a flawed idea:
When you call lock or synchronized (depending on the language), all you need to care about is that the lock is performed on the same object, regardless of the internal state of that object.
Once you've acquired a lock on an object, any resource that you're modifying within that lock scope will be modified by only one thread.
Generally, you should use a synchronization object that will not change (ideally readonly, const or final), and we're only talking about the reference here, not the content of the object itself. Here is an example:
object sync = new object();
string something = "hello";
void ModifySomething()
{
sync = new object();// <-- YOU SHOULD NEVER DO THIS!!
lock(sync)
{
something = GenerateRandomString();
}
}
Now notice that every time a thread calls ModifySomething, the synchronization object is replaced by a new object, so the threads will never synchronize on the same object, and there may therefore be concurrent writes to something.
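For comparison, a sketch of the same pattern in C++ (the example above is C#), assuming C++11; Holder is a hypothetical name. Keeping the mutex as a plain member makes the mistake above impossible: std::mutex is neither copyable nor assignable, so a line like sync_ = std::mutex(); would not compile.

```cpp
#include <mutex>
#include <string>

// Hypothetical C++ counterpart of the C# example.
class Holder {
public:
    void modifySomething(const std::string& value) {
        // Always the same mutex object for every caller.
        std::lock_guard<std::mutex> lock(sync_);
        something_ = value;
    }
    std::string read() {
        std::lock_guard<std::mutex> lock(sync_);
        return something_;
    }

private:
    std::mutex sync_;
    std::string something_ = "hello";
};
```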
The question doesn't make much sense without specifying a run-time environment.
In the case of, Java, say, a synchronization object (an object used for synchronization) is just like any other object. The object is target of the synchronization, so volatile (which applies to member variables) is only needed if the variable containing the synchronization object can change. I would avoid a program design that needs such constructs.
Remember (again, in Java), it is the evaluation of an expression -- generally a variable access -- that results in the synchronization object to use. This evaluation is no different than any other in this aspect.
At the end of the day, however, it is just using the synchronization tools of a particular run-time/environment in a manner in which they are well-defined and well-behaving.
Happy coding.
Java, for instance, guarantees that synchronized(x) { foo }, where x is a particular object, will create a mutually exclusive critical region in which foo is executed. To do this it must do internal work to ensure the book-keeping data is flushed correctly across all processors/caches. However, these details are generally outside the scope of the run-time in terms of using the synchronization construct.
Synchronization objects are necessarily managed by the OS, which also manages threads and caches. Therefore, it's the OS's responsibility to deal with caches. If it somehow knows that a synchronization object is used only on a single CPU (e.g. because it didn't allocate a second CPU to your process), the OS may very well decide to keep the synchronization object in the first CPU's cache. If it needs to be shared across CPUs, then that will happen too.
One practical consequence is that you'll always initialize synchronization objects. In C++, that's natural (the constructor takes care of that) but in other languages you must explicitly do so. The OS has to keep track of the synchronization objects.

What is the most efficient implementation of a java like object monitor in C++?

In Java each object has a synchronisation monitor. So I guess the implementation is pretty condensed in terms of memory usage and hopefully fast as well.
When porting this to C++, what would be the best implementation for it? I think that there must be something better than "pthread_mutex_init", or is the object overhead in Java really so high?
Edit: I just checked that pthread_mutex_t on Linux i386 is 24 bytes large. That's huge if I have to reserve this space for each object.
In a sense it's worse than pthread_mutex_init, actually. Because of Java's wait/notify you kind of need a paired mutex and condition variable to implement a monitor.
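A sketch of such a monitor, assuming C++11; Monitor, enter and waitForFlag are hypothetical names that map loosely onto Java's synchronized, wait() and notify(). Embedding this pair in every object is exactly the per-object cost the question worries about.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical Java-style monitor: a mutex for "synchronized" sections and a
// condition variable for wait/notify.
class Monitor {
public:
    // Like entering synchronized(obj): held until the returned lock is destroyed.
    std::unique_lock<std::mutex> enter() { return std::unique_lock<std::mutex>(mutex_); }

    // Like obj.wait(): the caller passes the lock it obtained from enter().
    void wait(std::unique_lock<std::mutex>& held) { cond_.wait(held); }

    void notify() { cond_.notify_one(); }     // like obj.notify()
    void notifyAll() { cond_.notify_all(); }  // like obj.notifyAll()

private:
    std::mutex mutex_;
    std::condition_variable cond_;
};

// Usage: one thread waits for a flag that another thread sets.
bool waitForFlag() {
    Monitor mon;
    bool ready = false;
    std::thread setter([&] {
        std::unique_lock<std::mutex> held = mon.enter();
        ready = true;
        mon.notifyAll();
    });
    {
        std::unique_lock<std::mutex> held = mon.enter();
        while (!ready) mon.wait(held);  // loop guards against spurious wakeups
    }
    setter.join();
    return ready;
}
```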
In practice, when implementing a JVM you hunt down and apply every single platform-specific optimisation in the book, and then invent some new ones, to make monitors as fast as possible. If you can't do a really fiendish job of that, you definitely aren't up to optimising garbage collection ;-)
One observation is that not every object needs to have its own monitor. An object which isn't currently synchronised doesn't need one. So the JVM can create a pool of monitors, and each object could just have a pointer field, which is filled in when a thread actually wants to synchronise on the object (with a platform-specific atomic compare and swap operation, for instance). So the cost of monitor initialisation doesn't have to add to the cost of object creation. Assuming the memory is pre-cleared, object creation can be: decrement a pointer (plus some kind of bounds check, with a predicted-false branch to the code that runs gc and so on); fill in the type; call the most derived constructor. I think you can arrange for the constructor of Object to do nothing, but obviously a lot depends on the implementation.
In practice, the average Java application isn't synchronising on very many objects at any one time, so monitor pools are potentially a huge optimisation in time and memory.
The Sun HotSpot JVM implements thin locks using compare-and-swap. If an object is locked, waiting threads wait on the monitor of the thread that locked the object. This means you only need one heavyweight lock per thread.
I'm not sure how Java does it, but .NET doesn't keep the mutex (or analog - the structure that holds it is called "syncblk" there) directly in the object. Rather, it has a global table of syncblks, and object references its syncblk by index in that table. Furthermore, objects don't get a syncblk as soon as they're created - instead, it's created on demand on the first lock.
I assume (note, I do not know how it actually does that!) that it uses atomic compare-and-exchange to associate the object and its syncblk in a thread-safe way:
1) Check the hidden syncblk_index field of our object for 0. If it's not 0, lock the syncblk it refers to and proceed; otherwise...
2) Create a new syncblk in the global table and get its index (global locks are acquired/released here as needed).
3) Compare-and-exchange to write that index into the object itself.
4) If the previous value was 0 (assuming 0 is not a valid index and is the initial value of the hidden syncblk_index field of our objects), our syncblk creation was not contested. Lock it and proceed.
5) If the previous value was not 0, then someone else already created a syncblk and associated it with the object while we were creating ours, and we now have the index of that syncblk. Dispose of the one we just created, and lock the one we obtained.
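A sketch of those steps, assuming C++11; SyncTable, Object and monitorFor are hypothetical, and, as the answer itself says, this is a guess at the mechanism, not how .NET actually implements it. For simplicity a disposed slot is not reused.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical global table of lock records ("syncblks"). Index 0 is reserved
// to mean "no syncblk yet"; unique_ptr keeps each mutex's address stable.
class SyncTable {
public:
    SyncTable() { blocks_.emplace_back(); }  // slot 0 stays empty

    std::size_t create() {  // the global lock is held only briefly
        std::lock_guard<std::mutex> lock(tableMutex_);
        blocks_.emplace_back(new std::mutex);
        return blocks_.size() - 1;
    }
    std::mutex& at(std::size_t index) {
        std::lock_guard<std::mutex> lock(tableMutex_);
        return *blocks_[index];
    }
    void dispose(std::size_t index) {  // slot is not recycled in this sketch
        std::lock_guard<std::mutex> lock(tableMutex_);
        blocks_[index].reset();
    }

private:
    std::mutex tableMutex_;
    std::vector<std::unique_ptr<std::mutex>> blocks_;
};

// Hypothetical object header: syncIndex stays 0 until the object is first locked.
struct Object {
    std::atomic<std::size_t> syncIndex{0};
};

std::mutex& monitorFor(Object& obj, SyncTable& table) {
    std::size_t index = obj.syncIndex.load();
    if (index != 0) return table.at(index);  // step 1: already associated
    std::size_t created = table.create();    // step 2
    std::size_t expected = 0;
    if (obj.syncIndex.compare_exchange_strong(expected, created))  // step 3
        return table.at(created);            // step 4: uncontested
    table.dispose(created);                  // step 5: another thread won;
    return table.at(expected);               // 'expected' now holds its index
}
```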
Thus the overhead per-object is 4 bytes (assuming 32-bit indices into syncblk table) in best case, but larger for objects which actually have been locked. If you only rarely lock on your objects, then this scheme looks like a good way to cut down on resource usage. But if you need to lock on most or all your objects eventually, storing a mutex directly within the object might be faster.
Surely you don't need such a monitor for every object!
When porting from Java to C++, it strikes me as a bad idea to just copy everything blindly. The best structure for Java is not the same as the best for C++, not least because Java has garbage collection and C++ doesn't.
Add a monitor to only those objects that really need it. If only some instances of a type need synchronization then it's not that hard to create a wrapper class that contains the mutex (and possibly condition variable) necessary for synchronization. As others have already said, an alternative is to use a pool of synchronization objects with some means of choosing one for each object, such as using a hash of the object address to index the array.
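A sketch of the pool variant, assuming C++11; LockPool and the pool size of 64 are hypothetical choices. The index is derived from the object's address, with the low bits dropped because alignment keeps them at zero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical fixed pool of mutexes shared by all objects. Unrelated objects
// may map to the same mutex, which only costs extra contention; code that
// locks two objects must check whether both map to the same mutex first,
// because locking one std::mutex twice is undefined.
class LockPool {
public:
    std::mutex& forObject(const void* obj) {
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(obj);
        return locks_[(addr >> 4) % locks_.size()];
    }

private:
    std::array<std::mutex, 64> locks_;
};
```

The memory cost is fixed by the pool size rather than by the number of objects, at the price of occasional false sharing of a lock between unrelated objects.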
I'd use the Boost thread library or the new C++0x (now C++11) standard thread library for portability rather than relying on platform specifics at each turn. Boost.Thread supports Linux, MacOSX, win32, Solaris, HP-UX and others. My implementation of the C++0x thread library currently only supports Windows and Linux, but other implementations will become available in due course.