Common resource of two threads - synchronization enough?

Common resource of two threads - synchronization enough? - c++

Say I have some class whatever nature it might have. I want to share an object of this class between threads. In the past, I would have thought that a mutex - while it might not be the most efficient way - would be enough to make sure that everything works.
Now, I have read a bit about std::atomic and that it is necessary even for simple bool references: Do I have to use atomic<bool> for "exit" bool variable?
While I understand why a bool should be atomic, I do not understand how a simple mutex prevents the following issue:
Second, when two threads run on different cores, they have separate caches; writing a value stores it in the cache, but doesn't update other caches, so a thread might not see a value written by another thread.
Is a mutex not merely a mechanism that makes sure that some other thread is not be able to lock the mutex? But within the mutex area I might play around with a whole bunch of variables. The compiler might not know which variables are involved.
As a consequence, simply putting a mutex-based lock around all areas that contain shared resources does not seem sufficient to me at the moment. Could it not still be that the two threads have different versions of this resource because the thread caches just will not be updated?

The C++ storage model guarantees that changes to objects in one thread are visible to other threads if they are protected by a mutex. See http://en.cppreference.com/w/cpp/language/memory_model for details.

Related

C/C++ arrays with threads - do I need to use mutexes or locks?

I am new to using threads and have read a lot about how data is shared and protected. But I have also not really got a good grasp of using mutexes and locks to protect data.
Below is a description of the problem I will be working on. The important thing to note is that it will be time-critical, so I need to reduce overheads as much as possible.
I have two fixed-size double arrays.
The first array will provide data for subsequent calculations.
Threads will read values from it, but it will never be modified. An element may be read at some time by any of the threads.
The second array will be used to store the results of the calculations performed by the threads. An element of this array will only ever be updated by one thread, and probably only once when the result value
is written to it.
My questions then:
Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so, could you explain why?
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Should I use atomic data types, and will there be any significant time overhead if I do?
Many answers to this type of question seem to be - no, you don't need the mutex if your variables are aligned. Would my array elements in this example be aligned, or is there some way to ensure they are?
The code will be implemented on 64bit Linux. I am planning on using Boost libraries for multithreading.
I have been mulling this over and looking all over the web for days, and once posted, the answer and clear explanations came back in literally seconds. There is an "accepted answer," but all the answers and comments were equally helpful.

Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so could you explain why?
No. Because the data is never modified, there cannot be synchronization problem.
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Depends.
If any other thread is going to read the element, you need synchronization.
If any thread may modify the size of the vector, you need synchronization.
In any case, take care of not writing into adjacent memory locations by different threads a lot. That could destroy the performance. See "false sharing". Considering, you probably don't have a lot of cores and therefore not a lot of threads and you say write is done only once, this is probably not going to be a significant problem though.
Should I use atomic data types and will there be any significant time over head if I do?
If you use locks (mutex), atomic variables are not necessary (and they do have overhead). If you need no synchronization, atomic variables are not necessary. If you need synchronization, then atomic variables can be used to avoid locks in some cases. In which cases can you use atomics instead of locks... is more complicated and beyond the scope of this question I think.
Given the description of your situation in the comments, it seems that no synchronization is required at all and therefore no atomics nor locks.
...Would my array elements in this example be aligned, or is there some way to ensure they are?
As pointed out by Arvid, you can request specific alignment using the alginas keyword which was introduced in c++11. Pre c++11, you may resort to compiler specific extensions: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Variable-Attributes.html

Under the two conditions given, there's no need for mutexes. Remember every use of a mutex (or any synchronization construct) is a performance overhead. So you want to avoid them as much as possible (without compromising correct code, of course).
No. Mutexes are not needed since threads are only reading the array.
No. Since each thread only writes to a distinct memory location, no race condition is possible.
No. There's no need for atomic access to objects here. In fact, using atomic objects could affect the performance negatively as it prevents the optimization possibilities such as re-ordering operations.

The only time you need to use Locks is when data is modified on a shared resource. Eg if some threads where used to write data and some used to read data (in both cases from the same resource) then you only need a lock for when writing is done. This is to prevent whats known as "race".
There is good information of race on google for when you make programs that manipulate data on a shared resource.

You are on the right track.
1) For the first array (read only) , you do not need to utilize a mutex lock for it. Since the threads are just reading not altering the data there is no way a thread can corrupt the data for another thread
2) I'm a little confused by this question. If you know that thread 1 will only write an element to array slot 1 and thread 2 will only write to array slot 2 then you do not need a mutex lock. However I'm not sure how your achieving this property. If my above statement is not correct for your situation you would definitely need a mutex lock.
3) Given the definition of atomic:
Atomic types are types that encapsulate a value whose access is guaranteed to not cause data races and can be used to synchronize memory accesses among different threads.
Key note, a mutex lock is atomic meaning that there is only 1 assembly instruction needed to grab/release a lock. If it required 2 assembly instructions to grab/release a lock, a lock would not be thread safe. For example, if thread 1 attempted to grab a lock and was switched to thread 2, thread 2 would grab the lock.
Use of atomic data types would decrease your overhead but not significantly.
4) I'm not sure how you can assure your variables are lined. Since threads can switch at any moment in your program (Your OS determines when a thread switches)
Hope this helps

Thread safe programming

I keep hearing about thread safe. What is that exactly and how and where can I learn to program thread safe code?
Also, assume I have 2 threads, one that writes to a structure and another one that reads from it. Is that dangerous in any way? Is there anything I should look for? I don't think it is a problem. Both threads will not (well can't ) be accessing the struct at the exact same time..
Also, can someone please tell me how in this example : https://stackoverflow.com/a/5125493/1248779 we are doing a better job in concurrency issues. I don't get it.

It's a very deep topic. At the heart threads are usually about making things go fast by using multiple cores at the same time; or about doing long operations in the background when you don't have a good way to interleave the operation with a 'primary' thread. The latter being very common in UI programming.
Your scenario is one of the classic trouble spots, and one of the first people run into. It's vary rare to have a struct where the members are truly independent. It's very common to want to modify multiple values in the structure to maintain consistency. Without any precautions it is very possible to modify the first value, then have the other thread read the struct and operate on it before the second value has been written.
Simple example would be a 'point' struct for 2d graphics. You'd like to move the point from [2,2] to [5,6]. If you had a different thread drawing a line to that point you could end up drawing to [5,2] very easily.
This is the tip of the iceberg really. There are lots of great books, but learning this space usually goes something like this:
Uh oh, I just read from that thing in an inconsistent state.
Uh oh, I just modified that thing from 2 threads and now it's garbage.
Yay! I learned about locks
Whoa, I have a lot of locks and everything seems to just hang sometimes when I have lots of them locking in nested code.
Hrm. I need to stop doing this locking on the fly, I seem to be missing a lot of places; so I should encapsulate them in a data structure.
That data structure thing was great, but now I seem to be locking all the time and my code is just as slow as a single thread.
condition variables are weird
It's fast because I got clever with how I lock things. Hrm. Sometimes data corrupts.
Whoa.... InterlockedWhatDidYouSay?
Hey, look no lock, I do this thing called a spin lock.
Condition variables. Hrm... I see.
You know what, how about I just start thinking about how to operate on this stuff in completely independent ways, pipelineing my operations, and having as few cross thread dependencies as possible...
Obviously it's not all about condition variables. But there are many problems that can be solved with threading, and probably almost as many ways to do it, and even more ways to do it wrong.

Thread-safety is one aspect of a larger set of issues under the general heading of "Concurrent Programming". I'd suggest reading around that subject.
Your assumption that two threads cannot access the struct at the same time is not good. First: today we have multi-core machines, so two threads can be running at exactly the same time. Second: even on a single core machine the slices of time given to any other thread are unpredicatable. You have to anticipate that ant any arbitrary time the "other" thread might be processing. See my "window of opportunity" example below.
The concept of thread-safety is exactly to answer the question "is this dangerous in any way". The key question is whether it's possible for code running in one thread to get an inconsistent view of some data, that inconsistency happening because while it was running another thread was in the middle of changing data.
In your example, one thread is reading a structure and at the same time another is writing. Suppose that there are two related fields:
{ foreground: red; background: black }
and the writer is in the process of changing those
foreground = black;
<=== window of opportunity
background = red;
If the reader reads the values at just that window of opportunity then it sees a "nonsense" combination
{ foreground: black; background: black }
This essence of this pattern is that for a brief time, while we are making a change, the system becomes inconsistent and readers should not use the values. As soon as we finish our changes it becomes safe to read again.
Hence we use the CriticalSection APIs mentioned by Stefan to prevent a thread seeing an inconsistent state.

what is that exactly?
Briefly, a program that may be executed in a concurrent context without errors related to concurrency.
If ThreadA and ThreadB read and/or write data without errors and use proper synchronization, then the program may be threadsafe. It's a design choice -- making an object threadsafe can be accomplished a number of ways, and more complex types may be threadsafe using combinations of these techniques.
and how and where can I learn to program thread safe code?
boost/libs/thread/ would likely be a good introduction. The topic is quite complex.
The C++11 standard library provides implementations for locks, atomics and threads -- any well written programs which use these would be a good read. The standard library was modeled after boost's implementation.
also, assume I have 2 threads one that writes to a structure and another one that reads from it. Is that dangerous in any way? is there anything I should look for?
Yes, it can be dangerous and/or may produce incorrect results. Just imagine that a thread may run out of its time at any point, and then another thread could then read or modify that structure -- if you have not protected it, it may be in the middle of an update. A common solution is a lock, which can be used to prevent another thread from accessing shared resources during reads/writes.

When writing multithreaded C++ programs on WIN32 platforms, you need to protect certain shared objects so that only one thread can access them at any given time from different threads. You can use 5 system functions to achieve this. They are InitializeCriticalSection, EnterCriticalSection, TryEnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection.
Also maybe this links can help:
how to make an application thread safe?
http://www.codeproject.com/Articles/1779/Making-your-C-code-thread-safe

Thread safety is a simple concept: is it "safe" to perform operation A on one thread whilst another thread is performing operation B, which may or may not be the same as operation A. This can be extended to cover many threads. In this context, "safe" means:
No undefined behaviour
All invariants of the data structures are guaranteed to be observed by the threads
The actual operations A and B are important. If two threads both read a plain int variable, then this is fine. However, if any thread may write to that variable, and there is no synchronization to ensure that the read and write cannot happen together, then you have a data race, which is undefined behaviour, and this is not thread safe.
This applies equally to the scenario you asked about: unless you have taken special precautions, then it is not safe to have one thread read from a structure at the same time as another thread writes to it. If you can guarantee that the threads cannot access the data structure at the same time, through some form of synchronization such as a mutex, critical section, semaphore or event, then there is not a problem.
You can use things like mutexes and critical sections to prevent concurrent access to some data, so that the writing thread is the only thread accessing the data when it is writing, and the reading thread is the only thread accessing the data when it is reading, thus providing the guarantee I just mentioned. This therefore avoids the undefined behaviour mentioned above.
However, you still need to ensure that your code is safe in the wider context: if you need to modify more than one variable then you need to hold the lock on the mutex across the whole operation rather than for each individual access, otherwise you may find that the invariants of your data structure may not be observed by other threads.
It is also possible that a data structure may be thread safe for some operations but not others. For example, a single-producer single-consumer queue will be OK if one thread is pushing items on the queue and another is popping items off the queue, but will break if two threads are pushing items, or two threads are popping items.
In the example you reference, the point is that global variables are implicitly shared between all threads, and therefore all accesses must be protected by some form of synchronization (such as a mutex) if any thread can modify them. On the other hand, if you have a separate copy of the data for each thread, then that thread can modify its copy without worrying about concurrent access from any other thread, and no synchronization is required. Of course, you always need synchronization if two or more threads are going to operate on the same data.
My book, C++ Concurrency in Action covers what it means for things to be thread safe, how to design thread safe data structures, and the C++ synchronization primitives used for the purpose, such as std::mutex.

Threads safe is when a certain block of code is protected from being accessed by more than one thread. Meaning that the data manipulated always stays in a consistent state.
A common example is the producer consumer problem where one thread reads from a data structure while another thread writes to the same data structure : Detailed explanation

To answer the second part of the question: Imagine two threads both accessing std::vector<int> data:
//first thread
if (data.size() > 0)
{
std::cout << data[0]; //fails if data.size() == 0
}
//second thread
if (rand() % 5 == 0)
{
data.clear();
}
else
{
data.push_back(1);
}
Run these threads in parallel and your program will crash because std::cout << data[0]; might be executed directly after data.clear();.
You need to know that at any point of your thread code, the thread might be interrupted, e.g. after checking that (data.size() > 0), and another thread could become active. Although the first thread looks correct in a single threaded app, it's not in a multi-threaded program.

Threads and simple Dead lock cure

When dealing with threads (specifically in C++) using mutex locks and semaphores is there a simple rule of thumb to avoid Dead Locks and have nice clean Synchronization?

A good simple rule of thumb is to always obtain your locks in a consistent predictable order from everywhere in your application. For example, if your resources have names, always lock them in alphabetical order. If they have numeric ids, always lock from lowest to highest. The exact order or criteria is arbitrary. The key is to be consistent. That way you'll never have a deadlock situation. eg.
Thread 1 locks resource A
Thread 2 locks resource B
Thread 1 waits to obtain a lock on B
Thread 2 waits to obtain a lock on A
Deadlock
The above can never happen if you follow the rule of thumb outlined above. For a more detailed discussion, see the Wikipedia entry on the Dining Philosophers problem.

If at all possible, design your code so that you never have to lock more then a single mutex/semaphore at a time.
If that's not possible, make sure to always lock multiple mutex/semaphores in the same order. So if one part of the code locks mutex A and then takes semaphore B, make sure that no other part of the code takes semaphore B and then locks mutex A.

Try to avoid acquiring one lock and trying to acquire another. This can result into circular dependency and cause for deadlock.
If it is un-avoidable then at least the order of acquire locks should be predictable.
Use RAII ( to make sure lock is release properly in case of exception as well)

There is no simple deadlock cure.
Acquire locks in agreed order: If all calls acquire A->B->C then no deadlock can occur. Deadlocks can occur only if the locking order differs between the two threads (one acquires A->B the second B->A).
In practice is hard to choose an order between arbitrary objects in memory. On a simple trivial project is possible, but on large projects with many individual contributors is very hard. A partial solution is to create hierarchies, by ranking the locks. All locks in module A have rank 1, all locks in module B have rank 2. One can acquire a lock of rank 2 when helding locks of rank 1, but not vice-versa. Of course you need a framework around the locking primitives that tracks and validates the ranking.

One way to ensure the ordering that other folks have talked about is to acquire locks in an order defined by their memory address. If at any point, you try to acquire a lock that should have been earlier in the sequence, you release all the locks and start over.
With a little work, it's possible to do this nearly automatically with some wrapper classes around the system primitives.

There's no practical cure. Specifically, there's no way to simply test code for being synchronizationally correct, or to have your programmers obey the rules of the gentleman with the green V.
There's no way to properly test the multithreaded code, because the program logic may depend on timing of locks acquisition, and therefore, be different from execution to execution, somehow invalidating the concept of QA.
I would say
prefer using threads only as a performance optimization for multi-core machines
only optimize performance when you are sure you need this performance
you may use threads to simplify program logic, but only when you are absolutely sure what you are doing. Be extra careful and all locks are confined to a very small piece of code. Do not let any newbies near such code.
never use threads in a mission-critical system, such as flying an aircraft or operating dangerous machinery
in all cases, threads are seldom cost-effective, due to higher debug and QA costs
If you determined to do threads or maintaining existing codebase:
confine all locks to small and simple pieces of code, which operate on primitives
avoid function calls or getting the program flow away to where the fact of being executed under lock is not immediately visible. This function will change by future authors, widening your lock span without your control.
get locks inside objects to reduce locking scope, wrap non-thread-safe 3rd-party objects with your own thread-safe interfaces.
never send synchronous notifications (callbacks) when executing under lock
use only RAII locks, to reduce the cognitive load when thinking "how else can we exit from here", as in exceptions, etc.
A few words on how to avoid multi-threading.
A single-threaded design usually involves some heart-beat function provided by program components, and called in a loop (called heartbeat cycle) which, when called, gives a chance to all components to do the next piece of work and to surrender control back again. What algorithmists like to think of as "loops" inside the components, will turn into state machines, to identify what is the next thing that should be done when called. State is best maintained as member data of respective objects.

There are plenty of simple "deadlock cures". But none that are easy to apply and work universally.
The simplest of all, of course, is "never have more than one thread".
Assuming you have a multithreaded application though, there are still a number of solutions:
You can try to minimize shared state and synchronization. Two threads that just run in parallel and never interact can never deadlock. Deadlocks only occur when multiple threads try to access the same resource. Why do they do that? Can that be avoided? Can the resource be restructured or divided so that for example, one thread can write to it, and other threads are asynchronously passed the data they need?
Perhaps the resource can be copied, giving each thread its own private copy to work with?
And as already mentioned by every other answer, if and when you try to acquire locks, do so in a global consistent order. To simplify this, you should try to ensure that all the locks a thread is going to need are acquired as a single operation. If a thread needs to acquire locks A, B and C, it should not make three lock() calls at different times and from different places. You'll get confused, and you won't be able to keep track of which locks are held by the thread, and which ones it has yet to acquire, and then you'll mess up the order. If you can acquire all the lock you need once, then you can factor it out into a separate function call which acquires N locks, and does so in the correct order to avoid deadlocks.
Then there are the more ambitious approaches: Techniques like CSP make threading extremely simple and easy to prove correct, even with thousands of concurrent threads. But it requires you to structure your program very differently from what you're used to.
Transactional Memory is another promising option, and one that may be easier to integrate into conventional programs. But production-quality implementations are still very rare.

Read Deadlock: the Problem and a Solution.
"The common advice for avoiding deadlock is to always lock the two mutexes in the same order: if you always lock mutex A before mutex B, then you'll never deadlock. Sometimes this is straightforward, as the mutexes are serving different purposes, but other times it is not so simple, such as when the mutexes are each protecting a separate instance of the same class".

If you want to attack the possibility of a deadlock you must attack one of the 4 crucial conditions for the existence of a deadlock.
The 4 conditions for a deadlock are:
1. Mutual Exclusion - only one thread can enter the critical section at a time.
2. Hold and Wait - a thread doesn't release the resources he acquired as long as he didn't finish his job even if other resources are un available.
3. No preemption - A thread doesn't have a priority over other threads.
4. Resource Cycle - There has to be a cycle chain of threads that waits for resources from other threads.
The easiest condition to attack is the resource cycle by making sure that no cycles are possible.

Why do locks work?

If the locks make sure only one thread accesses the locked data at a time, then what controls access to the locking functions?
I thought that boost::mutex::scoped_lock should be at the beginning of each of my functions so the local variables don't get modified unexpectedly by another thread, is that correct? What if two threads are trying to acquire the lock at very close times? Won't the lock's local variables used internally be corrupted by the other thread?
My question is not boost-specific but I'll probably be using that unless you recommend another.

You're right, when implementing locks you need some way of guaranteeing that two processes don't get the lock at the same time. To do this, you need to use an atomic instruction - one that's guaranteed to complete without interruption. One such instruction is test-and-set, an operation that will get the state of a boolean variable, set it to true, and return the previously retrieved state.
What this does is this allows you to write code that continually tests to see if it can get the lock. Assume x is a shared variable between threads:
while(testandset(x));
// ...
// critical section
// this code can only be executed by once thread at a time
// ...
x = 0; // set x to 0, allow another process into critical section
Since the other threads continually test the lock until they're let into the critical section, this is a very inefficient way of guaranteeing mutual exclusion. However, using this simple concept, you can build more complicated control structures like semaphores that are much more efficient (because the processes aren't looping, they're sleeping)

You only need to have exclusive access to shared data. Unless they're static or on the heap, local variables inside functions will have different instances for different threads and there is no need to worry. But shared data (stuff accessed via pointers, for example) should be locked first.
As for how locks work, they're carefully designed to prevent race conditions and often have hardware level support to guarantee atomicity. IE, there are some machine language constructs guaranteed to be atomic. Semaphores (and mutexes) may be implemented via these.

The simplest explanation is that the locks, way down underneath, are based on a hardware instruction that is guaranteed to be atomic and can't clash between threads.
Ordinary local variables in a function are already specific to an individual thread. It's only statics, globals, or other data that can be simultaneously accessed by multiple threads that needs to have locks protecting it.

The mechanism that operates the lock controls access to it.
Any locking primitive needs to be able to communicate changes between processors, so it's usually implemented on top of bus operations, i.e., reading and writing to memory. It also needs to be structured such that two threads attempting to claim it won't corrupt its state. It's not easy, but you can usually trust that any OS implemented lock will not get corrupted by multiple threads.

In a multi-threaded C++ app, do I need a mutex to protect a simple boolean?

I have a multi-threaded C++ app which does 3D rendering with the OpenSceneGraph library. I'm planning to kick off OSG's render loop as a separate thread using boost::threads, passing a data structure containing shared state in to the thread. I'm trying to avoid anything too heavyweight (like mutexes) for synchronization, as the render loop needs to be pretty tight, and OSG itself tries to avoid having to ever lock. Most of the shared state is set before the thread is started, and never changed. I do have some data that does need to be changed, which I am planning to double-buffer. However, I have a simple boolean for signaling the thread to suspend rendering, and later resume rendering, and another to kill it. In both cases the app thread sets the bool, and the render thread only reads it. Do I need to synchronize access to these bools? As far as I can tell, the worse thing that could happen is the the render loop continues on for an extra frame before suspending or quitting.

In C++11 and later, which has standards-defined concurrency, use std::atomic<bool> for this purpose. From http://en.cppreference.com/w/cpp/atomic/atomic:
If one thread writes to an atomic object while another thread reads from it, the behavior is well-defined (see memory model for details on data races).
The following old answer may have been true at some time in the past with some compilers and some operating environments, but it should not be relied upon today:
You're right, in this case you won't need to synchronise the bools. You should declare them volatile though, to ensure that the compiler actually reads them from memory each time, instead of caching the previous read in a thread (that's a simplified explanation, but it should do for this purpose).
The following question has more information about this:
C++ Thread, shared data

Why not simply use an interlocked variable?

As for C++11 and later it is finally threads-aware and clearly states that modifying a bool (or other non-atomic variable) in one thread and accessing it at the same time in another one is undefined behavior.
In you case using std::atomic<bool> should be enough to make your program correct, saving you from using locks.
Do not use volatile. It has nothing to do with threads.
For more discussion look at Can I read a bool variable in a thread without mutex?

I don't think you need a fully fledged mutex here -- though the render thread will need to busy wait in the 'suspended' state if you aren't using a synchronization object that supports a wait primitive.
You should look into using the various interlocked exchange primitives though (InterlockedExchange under Windows). Not because read/writes from the bool are non-atomic, but to ensure that there are no weird behaviours the compiler reordering memory accesses on a single thread.

This thread has a little more info and discussion on thread-safety, especially for simple data types:
How can I create a thread-safe singleton pattern in Windows?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js