Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);

class Foo {
public:
    Foo() {
        i = 0;
        InitializeComplexClass(&c);
    }
private:
    ComplexClass c;
    int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that accesses the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86, but could be instead running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to know about your new object, you have to hand the object over or signal the other thread somehow, and signalling a thread means writing to memory. Both x86 and x64 perform all memory writes in order; the CPU does not reorder these operations with respect to each other. This is called "Total Store Ordering", so the CPU's write queue works first-in, first-out.
Given that you create the object first and then pass it on to another thread, these writes to memory also occur in order, and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of the object were guaranteed to be visible to that thread even earlier (if only the thread had somehow known where to look).
In conclusion, you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread through a proper synchronisation API also synchronises; otherwise it would not be a synchronisation API. (A portable release/acquire sketch follows the list below.)
x86 and x64 CPUs may reorder writes past reads, but that is not relevant here. Just for better understanding: writes can be ordered after reads because writes to memory go through a write queue, and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (once those have gone through their own write queues).
This topic has been made unbelievably confusing for so many, but in the end there are only a couple of things an x86/x64 programmer has to worry about:
- First, the existence of the write queue (and one should not be worried about the read cache at all!).
- Second, concurrent writing and reading of the same variable in different threads when the variable is not of atomic length, which may cause data tearing; for that case you need synchronisation mechanisms.
- And finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
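On a non-TSO architecture such as PowerPC, the portable C++11 way to express this hand-off is a release store paired with an acquire load; the compiler then emits whatever barriers the target needs (lwsync on PowerPC, nothing extra on x86). A minimal sketch, assuming a simplified stand-in for the question's Foo and illustrative names (g_foo, producer, consumer):

#include <atomic>

struct Foo { int i = 0; };  // simplified stand-in for the question's Foo

std::atomic<Foo*> g_foo{nullptr};  // publication slot (illustrative)

void producer() {
    Foo* f = new Foo();  // every constructor store happens-before the release store
    g_foo.store(f, std::memory_order_release);
}

void consumer() {
    Foo* f;
    // The acquire load pairs with the release store: once a non-null pointer
    // is seen, all of the constructor's writes are guaranteed to be visible.
    while ((f = g_foo.load(std::memory_order_acquire)) == nullptr) { }
    int observed = f->i;  // safe: reads the 0 written by the constructor
    (void)observed;
}

The release/acquire pair is exactly the guarantee the question asks about: every store the constructor performed becomes visible no later than the pointer itself.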
If you do:
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards so that both threads can safely interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need to implement guards. However, do not pass a pointer to a stack object like in your example; pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
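A minimal sketch of that hand-off, assuming C++14 and using std::unique_ptr so the deletion happens automatically when the worker is done (the thread body and names are illustrative; passToOtherThread would take ownership the same way):

#include <memory>
#include <thread>

struct Foo { int i = 0; };  // stand-in for the question's Foo

int main() {
    auto f = std::make_unique<Foo>();
    // Move ownership into the worker; no pointer stays behind on this thread,
    // so the worker needs no guards and the object is deleted automatically.
    std::thread worker([p = std::move(f)] {
        p->i += 1;  // exclusive access: safe without locks
    });
    worker.join();
}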
Related
I'm writing a game engine (for fun), and have a lot of threads running concurrently. I have a class which holds an instance of another class as a private variable, which in turn holds an instance of a different class as a private variable. My question is: which of these classes should I strive to make thread safe?
Do I make all of them thread safe and have each of them protect their data with a mutex, or do I make just one of them thread safe and assume that anybody using my code understands that the underlying classes aren't inherently thread safe?
Example:
// (defined bottom-up so the example compiles)
class C {
    // data
};

class B {
private:
    C c;
};

class A {
private:
    B b;
};
I understand that every class's data needs protection from corruption via data races; however, I would like to avoid throwing a ton of mutexes onto every single method of every class. I'm not sure what the proper convention is.
You almost certainly don't want to try to make every class thread-safe, since doing so would end up being very inefficient (with lots of unnecessary locking and unlocking of mutexes for no benefit) and also prone to deadlocks (the more mutexes you have to lock at once, the more likely you are to have different threads locking sequences of mutexes in a different order, which is the entry condition for a deadlock and therefore your program freezing up on you).
What you want to do instead is figure out which data structures need to be accessed by which thread(s). When designing your data structures, try to design them in such a way that the amount of data shared between threads is as small as possible -- if you can reduce it to zero, then you don't need to do any serialization at all! (You probably won't manage that, but if you use a CSP/message-passing design you can get pretty close, in that the only mutexes you ever need to lock are the ones protecting your message-passing queues.)
Keep in mind also that your mutexes are there not just to "protect the data" but also to allow a thread to make a series of changes appear atomic from the viewpoint of the other threads that might access that data. That is, if your thread #1 needs to make changes to objects A, B, and C, and each of those three objects has its own mutex, which thread #1 locks before modifying the object and unlocks afterwards, you can still have a race condition, because thread #2 might "see" the update half-completed (i.e. thread #2 might examine the objects after you've updated A but before you've updated B and C). Therefore you usually need to push your mutexes up to a level where they cover all the objects you might need to change in one go -- in the ABC example, that means you might want a single mutex that serializes access to A, B, and C.
One way to approach it would be to start with just a single global mutex for your entire program -- any time any thread needs to read or write any data structure that is accessible to other threads, that is the mutex it locks (and unlocks afterwards). That design probably won't be very efficient (since threads might spend a lot of time waiting for the mutex), but it will definitely not suffer from deadlock problems. Then once you have that working, you can check whether that single mutex is actually a noticeable performance bottleneck for you -- if not, you're done, ship your program :) OTOH if it is a bottleneck, you can then analyze which of your data structures are logically independent from each other, and split your global mutex into two mutexes -- one to serialize access to subset A of the data structures, and another to serialize access to subset B. (Note that the subsets don't need to be of equal size -- subset B might contain just one particular data structure that is critical to performance.) Repeat as necessary until either you're happy with performance, or your program starts to get too complicated or buggy (in which case you might want to dial the mutex granularity back a bit in order to regain your sanity).
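As a concrete starting point, here is a minimal sketch of the single-global-mutex design under those assumptions (all names are illustrative):

#include <mutex>
#include <vector>

// One mutex serializes every access to state shared between threads.
std::mutex g_state_mutex;
std::vector<int> g_shared_scores;  // example shared data

void add_score(int s) {
    std::lock_guard<std::mutex> lock(g_state_mutex);
    g_shared_scores.push_back(s);
}

int total_score() {
    std::lock_guard<std::mutex> lock(g_state_mutex);  // readers lock it too
    int sum = 0;
    for (int s : g_shared_scores) sum += s;
    return sum;
}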
I have a few threads writing into a vector. It's possible that different threads try to write the same byte. There are no reads. Can I use only an atomic_fetch_or(), like in the example, so that the vector becomes thread safe? It compiled with GCC without errors or warnings.
std::vector<std::atomic<uint8_t>> MapVis(1024 * 1024);

void threador()
{
    // ...
    std::atomic_fetch_or(&MapVis[i], testor1);
}
It compiled with GCC without errors or warnings
That doesn't mean anything because compilers don't perform that sort of concurrency analysis. There are dedicated static analysis tools that may do this with varying levels of success.
Can I use only an atomic_fetch_or ...
you certainly can, and it will be safe at the level of each individual std::atomic<uint8_t>.
... the vector will become thread safe?
it's not sufficient that each element is accessed safely. You specifically need to avoid any operation that invalidates iterators (swap, resize, insert, push_back etc.).
I'd hesitate to say vector is thread-safe in this context - but you're limiting yourself to a thread-safe subset of its interface, so it will work correctly.
Note that as VTT suggests, keeping a separate partial vector per thread is better if possible. Partly because it's easier to prove correct, and partly because it avoids false sharing between cores.
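Here is a hedged sketch of that per-thread alternative (all names are illustrative): each thread fills its own private buffer, and the buffers are OR-merged after the threads have joined, so no atomics are needed at all:

#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kSize = 1024 * 1024;

void worker(std::vector<uint8_t>& local) {
    // ... compute and OR bits into the thread's private buffer ...
    local[42] |= 0x01;  // example write; no other thread touches `local`
}

int main() {
    std::vector<std::vector<uint8_t>> partials(4, std::vector<uint8_t>(kSize));
    std::vector<std::thread> threads;
    for (auto& p : partials) threads.emplace_back(worker, std::ref(p));
    for (auto& t : threads) t.join();

    // Single-threaded merge: no synchronization needed anywhere.
    std::vector<uint8_t> MapVis(kSize);
    for (const auto& p : partials)
        for (std::size_t i = 0; i < kSize; ++i) MapVis[i] |= p[i];
}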
Yes, this is guaranteed to be thread safe because atomic operations come with a guarantee of:
Isolation from interrupts, signals, concurrent processes and threads
Thus when you access an element of MapVis atomically, you're guaranteed that any other thread writing to it has already completed, and that your own read-modify-write will not be interrupted before it finishes.
The concern if you were using non-atomic variables would be that:
Thread A fetches the value of MapVis[i]
Thread B fetches the value of MapVis[i]
Thread A writes the ored value to MapVis[i]
Thread B writes the ored value to MapVis[i]
As you can see, Thread B needed to wait until Thread A had finished writing, otherwise it's just going to stomp on Thread A's changes to MapVis[i]. With atomic variables, the fetch and write cannot be interleaved by concurrent threads, meaning that Thread B cannot interrupt Thread A's read-modify-write.
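To make the lost-update scenario concrete, here is a minimal illustrative sketch (not from the question): two threads each set a different bit of the same byte; with fetch_or both bits always survive, whereas a plain read-OR-write could lose one:

#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

int main() {
    std::atomic<uint8_t> byte{0};

    // Each fetch_or is one indivisible read-modify-write, so neither
    // thread can overwrite the other's bit.
    std::thread a([&] { byte.fetch_or(0x01); });
    std::thread b([&] { byte.fetch_or(0x02); });
    a.join();
    b.join();

    assert(byte.load() == 0x03);  // both bits are always present
}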
So I took an OS class last semester and we had a concurrency/threading project. It was an airport sim that landed planes / had them take off into the direction that the wind was blowing. We had to do it in Java. So now that finals are over and I'm bored, I'm trying to do it in C++11. In Java I used a synchronized variable for the wind (0 - 360) in main and passed it to the 3 threads I was using. My question is: Can you do that in C++11? It's a basic reader/writer, one thread writes/updates the wind, the other 2 (takeoff/land) read.
I got it working by having a global wind variable in my "threads.cpp" implementation file. But is there a way to pass a variable to as many threads as I want and have all of them keep up with it? Or is it actually better for me to just use the global variable and not pass anything? (Why / why not?) I was looking at std::ref() but that didn't work.
EDIT: I'm already using mutex and lock_guard. I'm just trying to figure out how to pass and keep a variable up to date in all threads. Right now it only updates in the write thread.
You can use a std::mutex with std::lock_guard to synchronize access to the shared data. Or if the shared data fits in an integer, you can use std::atomic<int> without locking.
If you want to avoid global variables, simply pass the address of the shared state to the thread functions when you launch them. For example:
#include <atomic>
#include <thread>

void thread_entry1(std::atomic<int>* val) { /* read/write *val here */ }
void thread_entry2(std::atomic<int>* val) { /* read/write *val here */ }

std::atomic<int> shared_value{0};
std::thread t1(thread_entry1, &shared_value);
std::thread t2(thread_entry2, &shared_value);
Using std::mutex and std::lock_guard mimics what a Java synchronized variable does (only in Java this happens secretly without you knowing; in C++ you do it explicitly).
However, with one producer (there is just one wind direction) and otherwise only consumers, it suffices to write to e.g. an std::atomic<int> variable with relaxed ordering, and to read from that variable in each consumer, again with relaxed ordering. Unless you require that the global view of all airplanes be consistent (in which case you would have to run a lockstep simulation, which makes threading pointless), there is no need for further synchronisation. You only have to make sure that any value an airplane reads at any time is eventually correct and that no garbled intermediate values can occur. In other words, you need an atomic update.
Relaxed memory ordering is sufficient too, since if all you read is one value, you do not need any happens-before guarantees.
An atomic update (or rather, an atomic write) is at least an order of magnitude faster, if not more. Atomic reads and writes with relaxed ordering are plain normal reads and writes on many (most) mainstream architectures.
The variable need not be global; you can just as well keep it in the scope of the main thread's simulation loop and pass a reference (or pointer) to the threads.
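A minimal sketch of that arrangement under the answer's assumptions (the 0-360 wind range comes from the question; update_wind, takeoff and the rest are illustrative):

#include <atomic>
#include <thread>

void update_wind(std::atomic<int>& wind) {
    for (int deg = 0; deg < 360; ++deg)
        wind.store(deg, std::memory_order_relaxed);  // producer: atomic write
}

void takeoff(std::atomic<int>& wind) {
    int w = wind.load(std::memory_order_relaxed);    // consumer: atomic read
    (void)w;  // direct planes according to w
}

int main() {
    std::atomic<int> wind{0};  // lives in main's scope, not global
    std::thread writer(update_wind, std::ref(wind));
    std::thread reader1(takeoff, std::ref(wind));
    std::thread reader2(takeoff, std::ref(wind));
    writer.join(); reader1.join(); reader2.join();
}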
You might want to create, say, the wind object on the heap with new, held through an std::shared_ptr. Pass this pointer to all interested threads and use a std::mutex with std::lock_guard to change it.
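A minimal sketch of that alternative (the Wind struct and all names are illustrative):

#include <memory>
#include <mutex>
#include <thread>

struct Wind {
    std::mutex m;
    int degrees = 0;
};

void writer(std::shared_ptr<Wind> w) {
    std::lock_guard<std::mutex> lock(w->m);
    w->degrees = 90;
}

void reader(std::shared_ptr<Wind> w) {
    std::lock_guard<std::mutex> lock(w->m);  // read under the same mutex
    int d = w->degrees;
    (void)d;
}

int main() {
    auto wind = std::make_shared<Wind>();
    std::thread t1(writer, wind), t2(reader, wind);
    t1.join(); t2.join();
}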
I am new to multi-threading programming, and confused about how Mutex works. In the Boost::Thread manual, it states:
Mutexes guarantee that only one thread can lock a given mutex. If a code section is surrounded by a mutex locking and unlocking, it's guaranteed that only a thread at a time executes that section of code. When that thread unlocks the mutex, other threads can enter to that code region:
My understanding is that a mutex is used to protect a section of code from being executed by multiple threads at the same time, NOT to protect the memory address of a variable. It's hard for me to grasp the concept: what happens if I have 2 different functions trying to write to the same memory address?
Is there something like this in the Boost library:
- lock the memory address of a variable, e.g., double x: lock(x), so that other threads, even in a different function, cannot write to x
- do something with x, e.g., x = x + rand();
- unlock(x)
Thanks.
The mutex itself only ensures that only one thread of execution can lock the mutex at any given time. It's up to you to ensure that modification of the associated variable happens only while the mutex is locked.
C++ does give you a way to do that a little more easily than in something like C. In C, it's pretty much up to you to write the code correctly, ensuring that anywhere you modify the variable, you first lock the mutex (and, of course, unlock it when you're done).
In C++, it's pretty easy to encapsulate it all into a class with some operator overloading:
#include <mutex>

class protected_int {
    int value;      // this is the value we're going to share between threads
    std::mutex m;
public:
    operator int() { return value; }  // we'll assume no lock needed to read
    protected_int &operator=(int new_value) {
        std::lock_guard<std::mutex> lock(m);  // unlocks automatically on return
        value = new_value;
        return *this;
    }
};
Obviously I'm simplifying that a lot (to the point that it's probably useless as it stands), but hopefully you get the idea, which is that most of the code just treats the protected_int object as if it were a normal variable.
When you do that, however, the mutex is automatically locked every time you assign a value to it, and unlocked immediately thereafter. Of course, that's pretty much the simplest possible case -- in many cases, you need to do something like lock the mutex, modify two (or more) variables in unison, then unlock. Regardless of the complexity, however, the idea remains that you centralize all the code that does the modification in one place, so you don't have to worry about locking the mutex in the rest of the code. Where you do have two or more variables together like that, you generally will have to lock the mutex to read, not just to write -- otherwise you can easily get an incorrect value where one of the variables has been modified but the other hasn't.
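Here is a minimal sketch of that multi-variable case (illustrative names), where a single mutex covers both variables and is taken for reads as well as writes:

#include <mutex>
#include <utility>

class protected_pair {
    int a = 0, b = 0;        // invariant: b is always equal to a * 2
    mutable std::mutex m;
public:
    void set(int x) {
        std::lock_guard<std::mutex> lock(m);
        a = x;
        b = x * 2;           // both updates appear atomic to other threads
    }
    std::pair<int, int> get() const {
        std::lock_guard<std::mutex> lock(m);  // reads must lock too, or a
        return {a, b};       // half-completed update could be observed
    }
};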
No, there is nothing in boost(or elsewhere) that will lock memory like that.
You have to protect the code that accesses the memory you want protected.
what happens if I have 2 different functions trying to write to the same memory address.
Assuming you mean 2 functions executing in different threads, both functions should lock the same mutex, so only one of the threads can write to the variable at a given time.
Any other code that accesses (either reads or writes) the same variable will also have to lock the same mutex; failure to do so will result in non-deterministic behavior.
It is possible to do non-blocking atomic operations on certain types using Boost.Atomic. These operations are generally much faster than a mutex. For example, to add something atomically you can do:
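A minimal sketch of what that looks like, using std::mutex for brevity (boost::mutex with boost::lock_guard works the same way; x and the two functions are illustrative, echoing the question's x = x + rand()):

#include <cstdlib>
#include <mutex>

double x = 0.0;
std::mutex x_mutex;  // every access to x, from any function, uses this mutex

void writer_one() {
    std::lock_guard<std::mutex> lock(x_mutex);
    x = x + std::rand();  // the question's "x = x + rand()"
}

void writer_two() {
    std::lock_guard<std::mutex> lock(x_mutex);  // same mutex instance
    x = x * 0.5;
}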
boost::atomic<int> n(10);
n.fetch_add(5, boost::memory_order_acq_rel);
This code atomically adds 5 to n.
In order to protect a memory address shared by multiple threads in two different functions, both functions have to use the same mutex ... otherwise you will run into a scenario where threads in either function can indiscriminately access the same "protected" memory region.
So boost::mutex works just fine for the scenario you describe, but you just have to make sure that for a given resource you're protecting, all paths to that resource lock the exact same instance of the boost::mutex object.
I think the detail you're missing is that a "code section" is an arbitrary section of code. It can be two functions, half a function, a single line, or whatever.
So the portions of your 2 different functions that hold the same mutex when they access the shared data, are "a code section surrounded by a mutex locking and unlocking" so therefore "it's guaranteed that only a thread at a time executes that section of code".
Also, this is explaining one property of mutexes. It is not claiming this is the only property they have.
Your understanding is correct with respect to mutexes. They protect the section of code between the locking and unlocking.
As for what happens when two threads write to the same location of memory: the writes are serialized. One thread writes its value, then the other overwrites it. The problem is that you don't know which thread will write first (or last), so the result is not deterministic.
Finally, to protect a variable itself, the nearest concept is atomic variables. Atomic variables are variables that are protected by either the compiler or the hardware, and can be modified atomically; that is, the three phases you mention (read, modify, write) happen as one indivisible step. Take a look at Boost atomic_count.
When double-buffering data that's due to be shared between threads, I've used a system where one thread works with one buffer while the other thread works with the other buffer, and then the buffer pointers are swapped. The trouble is, how am I going to implement the pointer swap? Do I need to use a critical section? There's no Interlocked function available that will actually swap two values. I can't have thread one reading from buffer one and then start reading from buffer two in the middle of a read; that would be an app crash, even if the other thread didn't then begin writing to it.
I'm using native C++ on Windows in Visual Studio Ultimate 2010 RC.
Using critical sections is the accepted way of doing it. Just share a CRITICAL_SECTION object between all your threads and call EnterCriticalSection and LeaveCriticalSection on that object around your pointer manipulation/buffer reading/writing code. Try to finish your critical sections as soon as possible, leaving as much code outside the critical sections as possible.
Even if you use the double interlocked exchange trick, you still need a critical section or something to synchronize your threads, so might as well use it for this purpose too.
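A minimal sketch of that approach (the buffer pointers and SwapBuffers are illustrative; readers and writers must enter the same critical section while they dereference the pointers):

#include <windows.h>

CRITICAL_SECTION g_cs;  // call InitializeCriticalSection(&g_cs) once at startup
int* g_readBuffer;      // illustrative buffer pointers
int* g_writeBuffer;

void SwapBuffers() {
    EnterCriticalSection(&g_cs);
    int* tmp = g_readBuffer;      // keep the critical section as short as possible
    g_readBuffer = g_writeBuffer;
    g_writeBuffer = tmp;
    LeaveCriticalSection(&g_cs);
}
// Readers and writers enter g_cs the same way around any use of the pointers.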
This sounds like a reader-writer-mutex type problem to me.
[ ... but I mostly do embedded development so this may make no sense for a Windows OS.
Actually, in an embedded OS with a priority-based scheduler, you can do this without any synchronization mechanism at all, if you guarantee that the swap is atomic and only allow the lower-priority thread to swap the buffers. ]
Suppose you have two buffers, B1 and B2, and you have two threads, T1 and T2. It's OK if T1 is using B1 while T2 is using B2. By "using" I mean reading and/or writing the buffer. Then at some time, the buffers need to swap so that T1 is using B2 and T2 is using B1. The thing you have to be careful of is that the swap is done while neither thread is accessing its buffer.
Suppose you used just one simple mutex. T1 could acquire the mutex and use B1. If T2 wanted to use B2, it would have to wait for the mutex. When T1 completed, T2 would unblock and do its work with B2. If either thread (or some third-party thread) wanted to swap the buffers, it would also have to take the mutex. Thus, using just one mutex serializes access to the buffers -- not so good.
It might work better if you use a reader-writer mutex instead. T1 could acquire a read-lock on the mutex and use B1. T2 could also acquire a read-lock on the mutex and use B2. When one of those threads (or a third-party thread) decides it's time to swap the buffers, it would have to take a write-lock on the mutex. It won't be able to acquire the write-lock until there are no more read-locks. At that point, it can swap the buffer pointers, knowing that nobody is using either buffer because when there is a write-lock on the mutex, all attempts to read-lock will block.
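A minimal sketch of that scheme using C++17's std::shared_mutex for brevity (on Windows an SRWLOCK would play the same role; all names are illustrative):

#include <shared_mutex>
#include <utility>

std::shared_mutex g_swap_mutex;
int* g_buf1;  // illustrative buffer pointers
int* g_buf2;

void use_buffer(int* my_buf) {
    std::shared_lock<std::shared_mutex> lock(g_swap_mutex);  // read-lock
    // ... T1 or T2 reads/writes its own buffer through my_buf ...
    (void)my_buf;
}

void swap_buffers() {
    std::unique_lock<std::shared_mutex> lock(g_swap_mutex);  // write-lock:
    // granted only once no read-locks are held, so neither thread
    // is touching a buffer at this moment
    std::swap(g_buf1, g_buf2);
}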
You have to build your own function to swap the pointers, using a semaphore or critical section to control it. The same protection needs to be added to all users of the pointers, since any code that reads a pointer while it is in the midst of being modified will misbehave.
One way to manage this is to have all the pointer manipulation logic work under the protection of the lock.
Why can't you use InterlockedExchangePointer ?
edit: OK, I get what you are saying now: IEP doesn't actually swap 2 live pointers with each other, since it only takes a single value by reference.
See, I did originally design the threads so that they would be fully asynchronous and don't require any synchronizing in their regular operations. But, since I'm performing operations on a per-object basis in a thread pool, if a given object is unreadable because it's currently being synced, I can just do another while I'm waiting. In a sense, I can both wait and operate at the same time, since I have plenty of threads to go around.
Create two critical sections, one for each of the threads.
While rendering, hold the render crit section. The other thread can still do what it likes to the other crit section, though. Use TryEnterCriticalSection, and if it's held, return false and add the object to a list to be re-rendered later. This should allow us to keep rendering even if a given object is currently being updated.
While updating, hold both crit sections.
While doing game logic, hold the game logic crit section. If it's already held, that's no problem, because we have more threads than actual processors. So if this thread is blocked, then another thread will just use the CPU time and this doesn't need to be managed.
You haven't mentioned what your Windows platform limitations are, but if you don't need compatibility with older versions than Windows Server 2003, or Vista on the client side, you can use the InterlockedExchange64() function to exchange a 64 bit value. By packing two 32-bit pointers into a 64-bit pair structure, you can effectively swap two pointers.
There are the usual Interlocked* variations on that: InterlockedExchangeAcquire64(), InterlockedCompareExchange64(), etc...
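A hedged sketch of that packed-pointer swap, assuming a 32-bit build so two pointers fit in one 64-bit value (g_pair and SwapPackedPointers are illustrative; the CAS loop retries if another thread changes the pair mid-swap):

#include <windows.h>

// Assumption: a 32-bit build, so two pointers pack into one 64-bit value.
volatile LONGLONG g_pair;  // low half: front-buffer pointer, high half: back-buffer pointer

void SwapPackedPointers() {
    for (;;) {
        // CAS with itself is an atomic 64-bit read of both halves at once.
        LONGLONG oldVal = InterlockedCompareExchange64(&g_pair, 0, 0);
        ULONGLONG u = (ULONGLONG)oldVal;
        LONGLONG newVal = (LONGLONG)((u << 32) | (u >> 32));  // swap the halves
        if (InterlockedCompareExchange64(&g_pair, newVal, oldVal) == oldVal)
            return;  // both pointers swapped in a single atomic step
    }
}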
If you need to run on, say, XP, I'd go for a critical section. When the chance of contention is low, they perform quite well.