I'm writing a game engine (for fun), and have a lot of threads running concurrently. I have a class which holds an instance of another class as a private variable, which in turn holds and instance of a different class as a private variable. My question is, which one of these classes should I strive to make thread safe?
Do I make all of them thread safe, and have each of them protect their data with a mutex, do I make just one of them thread safe, and assume that anybody using my code must understand that if you are using underlying classes they aren't inherently thread safe.
Example:
class A {
private:
B b;
}
class B {
private:
C c;
}
class C {
// data
}
I understand I need every class's data to avoid being corrupted via a data race, however I would like to avoid throwing a ton of mutexes on every single method of every class. I'm not sure what the proper convention is.
You almost certainly don't want to try to make every class thread-safe, since doing so would end up being very inefficient (with lots of unnecessary locking and unlocking of mutexes for no benefit) and also prone to deadlocks (the more mutexes you have to lock at once, the more likely you are to have different threads locking sequences of mutexes in a different order, which is the entry condition for a deadlock and therefore your program freezing up on you).
What you want to do instead if figure out which data structures needs to be accessed by which thread(s). When designing your data structures, you want to try to design them in such a way that the amount of data shared between threads is as minimal as possible -- if you can reduce it to zero, then you don't need to do any serialization at all! (you probably won't manage that, but if you do a CSP/message-passing design you can get pretty close, in that the only mutexes you ever need to lock are the ones protecting your message-passing queues)
Keep in mind also that your mutexes are there not just to "protect the data" but also to allow a thread to make a series of changes appear to be atom from the viewpoint of the other threads that might access that data. That is, if your thread #1 needs to make changes to objects A, B, and C, and all three of those objects each have their own mutex, which thread #1 locks before modifying the object and then unlocks afterwards, you can still have a race condition, because thread #2 might "see" the update half-completed (i.e. thread #2 might examine the objects after you've updated A but before you've updated B and C). Therefore you usually need to push your mutexes up to a level where they cover all the objects you might need to change in one go -- in the ABC example case, that means you might want to have a single mutex that is used to serialize access to A, B, and C.
One way to approach it would be to start with just a single global mutex for your entire program -- anytime any thread needs to read or write any data structure that is accessible to other threads, that is the mutex it locks (and unlocks afterwards). That design probably won't be very efficient (since threads might spend a lot of time waiting for the mutex), but it will definitely not suffer from deadlock problems. Then once you have that working, you could look to see if that single mutex is actually a noticeable performance bottleneck for you -- if not, you're done, ship your program :) OTOH if it is a bottleneck, you can then analyze which of your data structures are logically independent from each other, and split your global mutex into two mutexes -- one to serialize access to subset A of the data structures, and another one to serialize access to subset B. (Note that the subsets don't need to be equal size -- subset B might contain just one particular data structure that is critical to performance) Repeat as necessary until either you're happy with performance, or your program starts to get too complicated or buggy (in which case you might want to dial the mutex-granularity back again a bit in order to regain your sanity).
Related
When writing multithreaded code, I often need to read / write to shared memory. To prevent data races, the go - to solution would be to use something like lock_guard. However recently, I came across the concept of "synchronised values" which are usually implemented something in the lines of :
template <typename T>
class SynchronizedValue {
T value;
std::mutex lock;
/* Public helper functions to read/write to a value, making sure the lock is locked when the value is written to*/
};
This class Synchronised value will have a method SetValueTo which will lock the mutex, write to the value, and unlock the mutex, making sure that you can write to a value safely without any data races.
This makes writing multithreaded code so much easier! However, are there any drawbacks / performance overhead of using these synchronised values in contrast to mutexes / lock_guard?
are there any drawbacks / performance overhead of using these SynchronisedValues...?
Before you ask whether there is any drawback, You first ought to ask whether there is any benefit. The standard C++ library already defines std::atomic<T>. You didn't say what /* public helper functions...*/ you had in mind, but if they're just getters and setters for value, then what does your SynchronizedValues<T> class offer that you don't already get from std::atomic<T> ?
There's an important reason why "atomic" variables don't eliminate the need for mutexes, B.T.W. Mutexes aren't just about ensuring "visibility" of memory updates: The most important way to think about mutexes is that they can protect relationships between data in a program.
E.g., Imagine a program that has multiple containers for some class of object, imagine that the program needs to move objects from container to container, and imagine that it is important for some thread to occasionally count all of the objects, and be guaranteed to get an accurate count.
The program can use a mutex to make that possible. It just has to obey two simple rules; (1) No thread may remove an object from any container unless it has the mutex locked, and (2) no thread may release the mutex until every object is in a container. If all of the threads obey those two rules, then the thread that counts the objects can be guaranteed to find all of them if it locks the mutex before it starts counting.
The thing is, you can't guarantee that just by making all of the variables atomic, because atomic doesn't protect any relationship between the variable in question and any other variable. At most, it only protects relationships between the value of the variable before and after some "atomic" operation such as an atomic increment.
When there's more than one variable participating in the relationship, then you must have a mutex (or something equivalent to a mutex.)
If you look under the hood at what is actually happening in each case you just find different ways of saying and doing the same thing.
Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);
class Foo {
public:
Foo() {
i = 0;
InitializeComplexClass(&c);
}
private:
ComplexClass c;
int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that accesses the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86, but could be instead running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to be able to know about your new object, you have to hand over the object / signal other thread somehow. For signaling a thread you write to memory. Both x86 and x64 perform all memory writes in order, CPU does not reorder these operations with regards to each other. This is called "Total Store Ordering", so CPU write queue works like "first in first out".
Given that you create an object first and then pass it on to another thread, these changes to memory data will also occur in order and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of this object was guaranteed to be available for that thread even earlier (if the thread only somehow knew where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this TSO guarantee. So you need to synchronise. Use MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread by corresponding API causes also synchronisation, otherwise it would not be synchronisation API.
x86 and x64 CPU may reorder writes past reads, but that is not relevant here. Just for better understanding - writes can be ordered after reads since writes to memory go through a write queue and flushing that queue may take some time. On the other hand, read cache is always consistent with latest updates from other processors (that have went through their own write queue).
This topic has been made so unbelievably confusing for so many, but in the end there is only a couple of things a x86-x64 programmer has to be worried about:
- First, is the existence of write queue (and one should not at all be worried about read cache!).
- Secondly, concurrent writing and reading in different threads to same variable in case of non-atomic variable length, which may cause data tearing, and for which case you would need synchronisation mechanisms.
- And finally, concurrent updates to same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.)
If you do :
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards in order for both threads to interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need to implement guards. However, do not pass a stack pointer like in your example, pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
I am new to threading . Correct me if I am wrong that mutex locks the access to a shared data structure so that it cannot be used by other threads until it is unlocked . So, lets consider that there are 2 or more shared data structures . So , should I make different mutex objects for different data structures ? If no ,then how std::mutex will know which object it should lock ? What If I have to lock more than 1 objects at the same time ?
There are several points in your question that can be made more precise. Perhaps clearing this will solve things for you.
To begin with, a mutex, by itself, does not lock access to anything. It is basically something that your code can lock and unlock, and some "magic" ensures that only one thread can lock it at a time.
If, by convention, you decide that any code accessing some data structure foo will first begin by locking a mutex foo_mutex, then it will have the effect of protecting this data structure.
So, having said that, regarding your questions:
It depends on whether the two data structures need to be accessed together or not (e.g., can updating one without the other leave the system in an inconsistent state). If so, you should lock them with a single mutex. If not, you can improve parallelism by using two.
The mutex does not lock anything. It is you who decide by convention whether you can access 1, 2, or a million data structures, while holding it.
If you always needs to access both structures then it could be considered as a single resource so only a single lock is needed.
If you sometimes, even just once, need to access one of the structures independently then they can no longer be considered a single resource and you might need two locks. Of course, a single lock could still be sufficient, but then that lock would lock both resources at once, prohibiting other threads from accessing any of the structures.
Mutex does not "know" anything other than about itself. The lock is performed on mutex itself.
If there are two objects (or pieces of code) that need synchronized access (but can be accessed at the same time) then you have the liberty to use just one mutex for both or one for each. If you use one mutex they will not be accessed at the same time from two different threads.
If it cannot happen that access to one object is required while accessing the other object then you can use two mutexes, one for each. But if it can happen that one object must be accessed while the thread already holds another mutex then care must be taken that code never can reach a deadlock, where two threads hold one mutex each, and both at the same time wait that the other mutex is released.
In our application we deal with data that is processed in a worker thread and accessed in a display thread and we have a mutex that takes care of critical sections. Nothing special.
Now we thought about re-working our code where currently locking is done explicitely by the party holding and handling the data. We thought of a single entity that holds the data and only gives access to the data in a guarded fashion.
For this, we have a class called GuardedData. The caller can request such an object and should keep it only for a short time in local scope. As long as the object lives, it keeps the lock. As soon as the object is destroyed, the lock is released. The data access is coupled with the locking mechanism without any explicit extra work in the caller. The name of the class reminds the caller of the present guard.
template<typename T, typename Lockable>
class GuardedData {
GuardedData(T &d, Lockable &m) : data(d), guard(m) {}
boost::lock_guard<Lockable> guard;
T &data;
T &operator->() { return data; }
};
Again, a very simple concept. The operator-> mimics the semantics of STL iterators for access to the payload.
Now I wonder:
Is this approach well known?
Is there maybe a templated class like this already available, e.g. in the boost libraries?
I am asking because I think it is a fairly generic and usable concept. I could not find anything like it though.
Depending upon how this is used, you are almost guaranteed to end up with deadlocks at some point. If you want to operate on 2 pieces of data then you end up locking the mutex twice and deadlocking (unless each piece of data has its own mutex - which would also result in deadlock if the lock order is not consistent - you have no control over that with this scheme without making it really complicated). Unless you use a recursive mutex which may not be desired.
Also, how are your GuardedData objects passed around? boost::lock_guard is not copyable - it raises ownership issues on the mutex i.e. where & when it is released.
Its probably easier to copy parts of the data you need to the reader/writer threads as and when they need it, keeping the critical section short. The writer would similarly commit to the data model in one go.
Essentially your viewer thread gets a snapshot of the data it needs at a given time. This may even fit entirely in a cpu cache sitting near the core that is running the thread and never make it into RAM. The writer thread may modify the underlying data whilst the reader is dealing with it (but that should invalidate the view). However since the viewer has a copy it can continue on and provide a view of the data at the moment it was synchronized with the data.
The other option is to give the view a smart pointer to the data (which should be treated as immutable). If the writer wishes to modify the data, it copies it at that point, modifies the copy and when completes, switches the pointer to the data in the model. This would necessitate blocking all readers/writers whilst processing, unless there is only 1 writer. The next time the reader requests the data, it gets the fresh copy.
Well known, I'm not sure. However, I use a similar mechanism in Qt pretty often called a QMutexLocker. The distinction (a minor one, imho) is that you bind the data together with the mutex. A very similar mechanism to the one you've described is the norm for thread synchronization in C#.
Your approach is nice for guarding one data item at a time but gets cumbersome if you need to guard more than that. Additionally, it doesn't look like your design would stop me from creating this object in a shared place and accessing the data as often as I please, thinking that it's guarded perfectly fine, but in reality recursive access scenarios are not handled, nor are multi-threaded access scenarios if they occur in the same scope.
There seems to be to be a slight disconnect in the idea. Its use conveys to me that accessing the data is always made to be thread-safe because the data is guarded. Often, this isn't enough to ensure thread-safety. Order of operations on protected data often matters, so the locking is really scope-oriented, not data-oriented. You could get around this in your model by guarding a dummy object and wrapping your guard object in a temporary scope, but then why not just use one the existing mutex implementations?
Really, it's not a bad approach, but you need to make sure its intended use is understood.
What is the best way of performing the following in C++. Whilst my current method works I'm not sure it's the best way to go:
1) I have a master class that has some function in it
2) I have a thread that takes some instructions on a socket and then runs one of the functions in the master class
3) There are a number of threads that access various functions in the master class
I create the master class and then create instances of the thread classes from the master. The constructor for the thread class gets passed the "this" pointer for the master. I can then run functions from the master class inside the threads - i.e. I get a command to do something which runs a function in the master class from the thread. I have mutex's etc to prevent race problems.
Am I going about this the wrong way - It kinda seems like the thread classes should inherit the master class or another approach would be to not have separate thread classes but just have them as functions of the master class but that gets ugly.
Sounds good to me. In my servers, it is called 'SCB' - ServerControlBlock - and provides access to services like the IOCPbuffer/socket pools, logger, UI access for status/error messages and anything else that needs to be common to all the handler threads. Works fine and I don't see it as a hack.
I create the SCB, (and ensure in the ctor that all services accessed through it are started and ready for use), before creating the thread pool that uses the SCB - no nasty singletonny stuff.
Rgds,
Martin
Separate thread classes is pretty normal, especially if they have specific functionality. I wouldn't inherit from the main thread.
Passing the this pointer to threads is not, in itself, bad. What you do with it can be.
The this pointer is just like any other POD-ish data type. It's just a chunk of bits. The stuff that is in this might be more than PODs however, and passing what is in effect a pointer to it's members can be dangerous for all the usual reasons. Any time you share anything across threads, it introduces potential race conditions and deadlocks. The elementary means to resolve those conflicts is, of course, to introduce synchronization in the form of mutexes, semaphores, etc, but this can have the suprising effect of serializing your application.
Say you have one thread reading data from a socket and storing it to a synchronized command buffer, and another thread which reads from that command buffer. Both threads use the same mutex, which protects the buffer. All is well, right?
Well, maybe not. Your threads could become serialized if you're not very careful with how you lock the buffer. Presumably you created separate threads for the buffer-insert and buffer-remove codes so that they could run in parallel. But if you lock the buffer with each insert & each remove, then only one of those operations can be executing at a time. As long as your writing to the buffer, you can't read from it and vice versa.
You can try to fine-tune the locks so that they are as brief as possible, but so long as you have shared, synchronized data, you will have some degree of serialization.
Another approach is to hand data off to another thread explicitly, and remove as much data sharing as possible. Instead of writing to and reading from a buffer as in the above, for example, your socket code might create some kind of Command object on the heap (eg Command* cmd = new Command(...);) and pass that off to the other thread. (One way to do this in Windows is via the QueueUserAPC mechanism).
There are pros & cons to both approaches. The synchronization method has the benefit of being somewhat simpler to understand and implement at the surface, but the potential drawback of being much more difficult to debug if you mess something up. The hand-off method can make many of the problems inherent with synchronization impossible (thereby actually making it simpler), but it takes time to allocate memory on the heap.