One reader. One writer. Some general questions about mutexes and atomic-builtins - c++

I have a parent and a worker thread that share a bool flag and a std::vector. The parent only reads (i.e., reads the bool or calls my_vector.empty()); the worker only writes.
My questions:
Do I need to mutex protect the bool flag?
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows, "Ok, if I read the shared memory just before it was updated.. thats still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reading is reading, because afterall, the reader is only reading!"
If someone can point me in the right direction, that would be just great. I am on GCC 4.3, and Intel x86 32-bit.
Thanks a lot!

Do I need to mutex protect the bool flag?
Not necessarily, an atomic instruction would do. By atomic instruction I mean a compiler intrinsic function that a) prevents compiler reordering/optimization and b) results in atomic read/write and c) issues an appropriate memory fence to ensure visibility between CPUs (not necessary for current x86 CPUs which employ MESI cache coherency protocol). Similar to gcc atomic builtins.
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
Depends on the CPU. For Intel CPUs - yes. See Intel® 64 and IA-32 Architectures Software Developer's Manuals.
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
The difference between atomics and mutexes is that the latter can put the waiting thread to sleep until the mutex is released. With atomics you can only busy-spin.
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
You do.
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows, "Ok, if I read the shared memory just before it was updated.. thats still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reading is reading, because afterall, the reader is only reading!"
Depending on the implementation, vector.empty() may involve reading two buffer begin/end pointers and subtracting or comparing them, hence there is a chance that you read a new version of one pointer and an old version of another one without a mutex. Surprising behaviour may ensue.

From the C++11 standards point of view, you have to protect the bool with a mutex, or alternatively use std::atomic<bool>. Even when you are sure that your bool is read and written to atomically anyways, there is still the chance that the compiler can optimize away accesses to it because it does not know about other threads that could potentially access it.
If for some reason you absolutely need the latest bit of performance of your platform, consider reading the "Intel 64 and IA-32 Architectures Software Developer's Manual", which will tell you how things work under the hood on your architecture. But of course, this will make your program unportable.

Answers:
You will need to protect the bool (or any other variable for that matter) that has the possibility of being operated on by two or more threads at the same time. You can either do this with a mutex or by operating on the bool atomically.
Bool reads and bool writes may be atomic operations, but two sequential operations are certainly not (e.g., a read and then a write). More on this later.
Atomic builtins provide a solution to the problem above: the ability to read and write a variable in a step that cannot be interrupted by another thread. This makes the operation atomic.
If you are using the bool flag as your 'mutex' (that is, only the thread that sets the bool flag to true has permission to modify the vector) then you're OK. The mutual exclusion is managed by the boolean, and as long as you're modifying the bool using atomic operations you should be all set.
To answer this, let me use an example:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
if (flag == false) // check to see if the mutex is owned
{
flag = true; // obtain ownership of the flag (the mutex)
// manipulate the vector
flag = false; // release ownership of the flag
}
}
In the above code in a multithreaded environment it is possible for the thread to be preempted between the if statement (the read) and the assignment (the write), which means it possible for two (or more) threads with this kind of code to both "own" the mutex (and the rights to the vector) at the same time. This is why atomic operations are crucial: they ensure that in the above scenario the flag will only be set by one thread at a time, therefore ensuring the vector will only be manipulated by one thread at a time.
Note that setting the flag back to false need not be an atomic operation because you this instance is the only one with rights to modify it.
A rough (read: untested) solution may look something like:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
// check to see if the mutex is owned and obtain ownership if possible
if (__sync_bool_compare_and_swap(&flag, false, true))
{
// manipulate the vector
flag = false; // release ownership of the flag
}
}
The documentation for the atomic builtin reads:
The “bool” version returns true if the comparison is successful and newval was written.
Which means the operation will check to see if flag is false and if it is set the value to true. If the value was false true is returned, otherwise false. All of this happens in an atomic step, so it is guaranteed not to be preempted by another thread.

I don't have the expertise to answer your entire question but your last bullet is incorrect in cases in which reads are non-atomic by default.
A context switch can happen anywhere, the reader can get context switched partway through a read, the writer can get switched in and do the full write, and then the reader would finish their read. The reader would see neither the first value, nor the second value, but potentially some wildly inaccurate intermediate value.

Related

Do I need a memory barrier for a change notification flag between threads?

I need a very fast (in the sense "low cost for reader", not "low latency") change notification mechanism between threads in order to update a read cache:
The situation
Thread W (Writer) updates a data structure (S) (in my case a setting in a map) only once in a while.
Thread R (Reader) maintains a cache of S and does read this very frequently. When Thread W updates S Thread R needs to be notified of the update in reasonable time (10-100ms).
Architecture is ARM, x86 and x86_64. I need to support C++03 with gcc 4.6 and higher.
Code
is something like this:
// variables shared between threads
bool updateAvailable;
SomeMutex dataMutex;
std::string myData;
// variables used only in Thread R
std::string myDataCache;
// Thread W
SomeMutex.Lock();
myData = "newData";
updateAvailable = true;
SomeMutex.Unlock();
// Thread R
if(updateAvailable)
{
SomeMutex.Lock();
myDataCache = myData;
updateAvailable = false;
SomeMutex.Unlock();
}
doSomethingWith(myDataCache);
My Question
In Thread R no locking or barriers occur in the "fast path" (no update available).
Is this an error? What are the consequences of this design?
Do I need to qualify updateAvailable as volatile?
Will R get the update eventually?
My understanding so far
Is it safe regarding data consistency?
This looks a bit like "Double Checked Locking". According to http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html a memory barrier can be used to fix it in C++.
However the major difference here is that the shared resource is never touched/read in the Reader fast path. When updating the cache, the consistency is guaranteed by the mutex.
Will R get the update?
Here is where it gets tricky. As I understand it, the CPU running Thread R could cache updateAvailable indefinitely, effectively moving the Read way way before the actual if statement.
So the update could take until the next cache flush, for example when another thread or process is scheduled.
Use C++ atomics and make updateAvailable an std::atomic<bool>. The reason for this is that it's not just the CPU that can see an old version of the variable but especially the compiler which doesn't see the side effect of another thread and thus never bothers to refetch the variable so you never see the updated value in the thread. Additionally, this way you get a guaranteed atomic read, which you don't have if you just read the value.
Other than that, you could potentially get rid of the lock, if for example the producer only ever produces data when updateAvailable is false, you can get rid of the mutex because the std::atomic<> enforces proper ordering of the reads and writes. If that's not the case, you'll still need the lock.
You do have to use a memory fence here. Without the fence, there is no guarantee updates will be ever seen on the other thread. In C++03 you have the option of either using platform-specific ASM code (mfence on Intel, no idea about ARM) or use OS-provided atomic set/get functions.
Do I need to qualify updateAvailable as volatile?
As volatile doesn't correlate with threading model in C++, you should use atomics for make your program strictly standard-confirmant:
On C++11 or newer preferable way is to use atomic<bool> with memory_order_relaxed store/load:
atomic<bool> updateAvailable;
//Writer
....
updateAvailable.store(true, std::memory_order_relaxed); //set (under mutex locked)
// Reader
if(updateAvailable.load(std::memory_order_relaxed)) // check
{
...
updateAvailable.store(false, std::memory_order_relaxed); // clear (under mutex locked)
....
}
gcc since 4.7 supports similar functionality with in its atomic builtins.
As for gcc 4.6, it seems there is not strictly-confirmant way to evade fences when access updateAvailable variable. Actually, memory fence is usually much faster than 10-100ms order of time. So you can use its own atomic builtins:
int updateAvailable = 0;
//Writer
...
__sync_fetch_and_or(&updateAvailable, 1); // set to non-zero
....
//Reader
if(__sync_fetch_and_and(&updateAvailable, 1)) // check, but never change
{
...
__sync_fetch_and_and(&updateAvailable, 0); // clear
...
}
Is it safe regarding data consistency?
Yes, it is safe. Your reason is absolutely correct here:
the shared resource is never touched/read in the Reader fast path.
This is NOT double-check locking!
It is explicitely stated in the question itself.
In case when updateAvailable is false, Reader thread uses variable myDataCache which is local to the thread (no other threads use it). With double-check locking scheme all threads use shared object directly.
Why memory fences/barriers are NOT NEEDED here
The only variable, accessed concurrently, is updateAvailable. myData variable is accessed with mutex protection, which provides all needed fences. myDataCache is local to the Reader thread.
When Reader thread sees updateAvailable variable to be false, it uses myDataCache variable, which is changed by the thread itself. Program order garantees correct visibility of changes in that case.
As for visibility garantees for variable updateAvailable, C++11 standard provide such garantees for atomic variable even without fences. 29.3 p13 says:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Jonathan Wakely has confirmed, that this paragraph is applied even to memory_order_relaxed accesses in chat.

how to declare and use "one writer, many readers, one process, simple type" variable?

I have really simple question. I have simple type variable (like int). I have one process, one writer thread, several "readonly" threads. How should I declare variable?
volatile int
std::atomic<int>
int
I expect that when "writer" thread modifies value all "reader" threads should see fresh value ASAP.
It's ok to read and write variable at the same time, but I expect reader to obtain either old value or new value, not some "intermediate" value.
I'm using single-CPU Xeon E5 v3 machine. I do not need to be portable, I run the code only on this server, i compile with -march=native -mtune=native. Performance is very important so I do not want to add "synchronization overhead" unless absolutely required.
If I just use int and one thread writes value is it possible that in another thread I do not see "fresh" value for a while?
Just use std::atomic.
Don't use volatile, and don't use it as it is; that doesn't give the necessary synchronisation. Modifying it in one thread and accessing it from another without synchronisation will give undefined behaviour.
If you have unsynchronized access to a variable where you have one or more writers then your program has undefined behavior. Some how you have to guarantee that while a write is happening no other write or read can happen. This is called synchronization. How you achieve this synchronization depends on the application.
For something like this where we have one writer and and several readers and are using a TriviallyCopyable datatype then a std::atomic<> will work. The atomic variable will make sure under the hood that only one thread can access the variable at the same time.
If you do not have a TriviallyCopyable type or you do not want to use a std::atomic You could also use a conventional std::mutex and a std::lock_guard to control access
{ // enter locking scope
std::lock_guard lock(mutx); // create lock guard which locks the mutex
some_variable = some_value; // do work
} // end scope lock is destroyed and mutx is released
An important thing to keep in mind with this approach is that you want to keep the // do work section as short as possible as while the mutex is locked no other thread can enter that section.
Another option would be to use a std::shared_timed_mutex(C++14) or std::shared_mutex(C++17) which will allow multiple readers to share the mutex but when you need to write you can still look the mutex and write the data.
You do not want to use volatile to control synchronization as jalf states in this answer:
For thread-safe accesses to shared data, we need a guarantee that:
the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until
much later)
that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be
read. In our code, we simply set the flag after preparing the data, so
all looks fine. But what if the instructions are reordered so the flag
is set first?
volatile does guarantee the first point. It also guarantees that no
reordering occurs between different volatile reads/writes. All
volatile memory accesses will occur in the order in which they're
specified. That is all we need for what volatile is intended for:
manipulating I/O registers or memory-mapped hardware, but it doesn't
help us in multithreaded code where the volatile object is often
only used to synchronize access to non-volatile data. Those accesses
can still be reordered relative to the volatile ones.
As always if you measure the performance and the performance is lacking then you can try a different solution but make sure to remeasure and compare after changing.
Lastly Herb Sutter has an excellent presentation he did at C++ and Beyond 2012 called Atomic Weapons that:
This is a two-part talk that covers the C++ memory model, how locks and atomics and fences interact and map to hardware, and more. Even though we’re talking about C++, much of this is also applicable to Java and .NET which have similar memory models, but not all the features of C++ (such as relaxed atomics).
I'll complete a little bit the previous answers.
As exposed previously, just using int or eventually volatile int is not enough for various reason (even with the memory order constraint of Intel processors.)
So, yes, you should use atomic types for that, but you need extra considerations: atomic types guarantee coherent access but if you have visibility concerns you need to specify memory barrier (memory order.)
Barriers will enforce visibility and coherency between threads, on Intel and most modern architectures, it will enforce cache synchronizations so updates are visible for every cores. The problem is that it may be expensive if you're not careful enough.
Possible memory order are:
relaxed: no special barrier, only coherent read/write are enforce;
sequential consistency: strongest possible constraint (the default);
acquire: enforce that no loads after the current one are reordered before and add the required barrier to ensure that released stores are visible;
consume: a simplified version of acquire that mostly only constraint reordering;
release: enforce that all stores before are complete before the current one and that memory writes are done and visible to loads performing an acquire barrier.
So, if you want to be sure that updates to the variable are visible to readers, you need to flag your store with a (at least) a release memory order and, on the reader side you need an acquire memory order (again, at least.) Otherwise, readers may not see the actual version of the integer (it'll see a coherent version at least, that is the old or the new one, but not an ugly mix of the two.)
Of course, the default behavior (full consistency) will also give you the correct behavior, but at the expense of a lot of synchronization. In short, each time you add a barrier it forces cache synchronization which is almost as expensive as several cache misses (and thus reads/writes in main memory.)
So, in short you should declare your int as atomic and use the following code for store and load:
// Your variable
std::atomic<int> v;
// Read
x = v.load(std::memory_order_acquire);
// Write
v.store(x, std::memory_order_release);
And just to complete, sometimes (and more often that you think) you don't really need the sequential consistency (even the partial release/acquire consistency) since visibility of updates are pretty relative. When dealing with concurrent operations, updates take place not when write is performed but when others see the change, reading the old value is probably not a problem !
I strongly recommend reading articles related to relativistic programming and RCU, here are some interesting links:
Relativistic Programming wiki: http://wiki.cs.pdx.edu/rp/
Structured Deferral: Synchronization via Procrastination: https://queue.acm.org/detail.cfm?id=2488549
Introduction to RCU Concepts: http://www.rdrop.com/~paulmck/RCU/RCU.LinuxCon.2013.10.22a.pdf
Let's start from int at int. In general, when used on single processor, single core machine this should be sufficient, assuming int size same or smaller than CPU word (like 32bit int on 32bit CPU). In this case, assuming correctly aligned address word addresses (high level language should assure this by default) the write/read operations should be atomic. This is guaranteed by Intel as stated in [1] . However, in C++ specification simultaneous reading and writing from different threads is undefined behaviour.
$1.10
6 Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Now volatile. This keyword disables almost every optimization. This is the reason why it was used. For example, sometimes when optimizing the compiler can come to idea, that variable you only read in one thread is constant there and simply replace it with it's initial value. This solves such problems. However, it does not make access to variable atomic. Also, in most cases, it is simply unnecessary, because use of proper multithreading tools, like mutex or memory barrier, will achieve same effect as volatile on it's own, as described for instance in [2]
While this may be sufficient for most uses, there are other operations that are not guaranteed to be atomic. Like incrementation is a one. This is when std::atomic comes in. It has those operations defined, like here for mentioned incrementations in [3]. It is also well defined when reading and writing from different threads [4].
In addition, as stated in answers in [5], there exists a lot of other factors that may influence (negatively) atomicity of operations. From loosing cache coherency between multiple cores to some hardware details are the factors that may change how operations are performed.
To summarize, std::atomic is created to support accesses from different threads and it is highly recommended to use it when multithreading.
[1] http://www.intel.com/Assets/PDF/manual/253668.pdf see section 8.1.1.
[2] https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
[3] http://en.cppreference.com/w/cpp/atomic/atomic/operator_arith
[4] http://en.cppreference.com/w/cpp/atomic/atomic
[5] Are C++ Reads and Writes of an int Atomic?
The other answers, which say to use atomic and not volatile, are correct when portability matters. If you’re asking this question, and it’s a good question, that’s the practical answer for you, not, “But, if the standard library doesn’t provide one, you can implement a lock-free, wait-free data structure yourself!” Nevertheless, if the standard library doesn’t provide one, you can implement a lock-free data structure yourself that works on a particular compiler and a particular architecture, provided that there’s only one writer. (Also, somebody has to implement those atomic primitives in the standard library.) If I’m wrong about this, I’m sure someone will kindly inform me.
If you absolutely need an algorithm guaranteed to be lock-free on all platforms, you might be able to build one with atomic_flag. If even that doesn’t suffice, and you need to roll your own data structure, you can do that.
Since there’s only one writer thread, your CPU might guarantee that certain operations on your data will still work atomically even if you just use normal accesses instead of locks or even compare-and-swaps. This is not safe according to the language standard, because C++ has to work on architectures where it isn’t, but it can be safe, for example, on an x86 CPU if you guarantee that the variable you’re updating fits into a single cache line that it doesn’t share with anything else, and you might be able to ensure this with nonstandard extensions such as __attribute__ (( aligned (x) )).
Similarly, your compiler might provide some guarantees: g++ in particular makes guarantees about how the compiler will not assume that the memory referenced by a volatile* hasn’t changed unless the current thread could have changed it. It will actually re-read the variable from memory each time you dereference it. That is in no way sufficient to ensure thread-safety, but it can be handy if another thread is updating the variable.
A real-world example might be: the writer thread maintains some kind of pointer (on its own cache line) which points to a consistent view of the data structure that will remain valid through all future updates. It updates its data with the RCU pattern, making sure to use a release operation (implemented in an architecture-specific way) after updating its copy of the data and before making the pointer to that data globally visible, so that any other thread that sees the updated pointer is guaranteed to see the updated data as well. The reader then makes a local copy (not volatile) of the current value of the pointer, getting a view of the data which will stay valid even after the writer thread updates again, and works with that. You want to use volatile on the single variable that notifies the readers of the updates, so they can see those updates even if the compiler “knows” your thread couldn’t have changed it. In this framework, the shared data just needs to be constant, and readers will use the RCU pattern. That’s one of the two ways I’ve seen volatile be useful in the real world (the other being when you don’t want to optimize out your timing loop).
There also needs to be some way, in this scheme, for the program to know when nobody’s using an old view of the data structure any longer. If that’s a count of readers, that count needs to be atomically modified in a single operation at the same time as the pointer is read (so getting the current view of the data structure involves an atomic CAS). Or this might be a periodic tick when all the threads are guaranteed to be done with the data they’re working with now. It might be a generational data structure where the writer rotates through pre-allocated buffers.
Also observe that a lot of things your program might do could implicitly serialize the threads: those atomic hardware instructions lock the processor bus and force other CPUs to wait, those memory fences could stall your threads, or your threads might be waiting in line to allocate memory from the heap.
Unfortunately it depends.
When a variable is read and written in multiple threads, there may be 2 failures.
1) tearing. Where half the data is pre-change and half the data is post change.
2) stale data. Where the data read has some older value.
int, volatile int and std:atomic all don't tear.
Stale data is a different issue. However, all values have existed, can be concieved as correct.
volatile. This tells the compiler neither to cache the data, nor to re-order operations around it. This improves the coherence between threads by ensuring all operations in a thread are either before the variable, at the variable, or after.
This means that
volatile int x;
int y;
y =5;
x = 7;
the instruction for x = 7 will be written after y = 5;
Unfortunately, the CPU is also capable of re-ordering operations. This can mean that another thread sees x ==7 before y =5
std::atomic x; would allow a guarantee that after seeing x==7, another thread would see y ==5. (Assuming other threads are not modifying y)
So all reads of int, volatile int, std::atomic<int> would show previous valid values of x. Using volatile and atomic increase the ordering of values.
See kernel.org barriers
I have simple type variable (like int).
I have one process, one writer thread, several "readonly" threads. How
should I declare variable?
volatile int
std::atomic
int
Use std::atomic with memory_order_relaxed for the store and load
It's quick, and from your description of your problem, safe. E.g.
void func_fast()
{
std::atomic<int> a;
a.store(1, std::memory_order_relaxed);
}
Compiles to:
func_fast():
movl $1, -24(%rsp)
ret
This assumes you don't need to guarantee that any other data is seen to be written before the integer is updated, and therefore the slower and more complicated synchronisation is unnecessary.
If you use the atomic naively like this:
void func_slow()
{
std::atomic<int> b;
b = 1;
}
You get an MFENCE instruction with no memory_order* specification which is massive slower (100 cycles more more vs just 1 or 2 for the bare MOV).
func_slow():
movl $1, -24(%rsp)
mfence
ret
See http://goo.gl/svPpUa
(Interestingly on Intel the use of memory_order_release and _acquire for this code results in the same assembly language. Intel guarantees that writes and reads happen in order when using the standard MOV instruction).
Here is my attempt at bounty:
- a. General answer already given above says 'use atomics'. This is correct answer. volatile is not enough.
-a. If you dislike the answer, and you are on Intel, and you have properly aligned int, and you love unportable solutions, you can do away with simple volatile, using Intel strong memory ordering gurantees.
TL;DR: Use std::atomic<int> with a mutex around it if you read multiple times.
Depends on how strong guarantees you want.
First volatile is a compiler hint and you shouldn't count on it doing something helpful.
If you use int you can suffer for memory aliasing. Say you have something like
struct {
int x;
bool q;
}
Depending on how this is aligned in memory and the exact implementation of CPU and memory bus it's possible that writing to q will actually overwrite x when the page is copied from the cpu cache back to ram. So unless you know how much to allocate around your int it's not guaranteed that your writer will be able to write without being overwritten by some other thread.
Also even if you write you depend on the processor for reloading the data to the cache of other cores so there's no guarantee that your other thread will see a new value.
std::atomic<int> basically guarantees that you will always allocate sufficient memory, properly aligned so that you don't suffer from aliasing. Depending on the memory order requested you will also disable a bunch of optimizations, like caching, so everything will run slightly slower.
This still doesn't grantee that if your read the var multiple times you'll get the value. The only way to do that is to put a mutex around it to block the writer from changing it.
Still better find a library that already solves the problem you have and it has already been tested by others to make sure it works well.

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?
It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.
Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.
The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

Volatile and multithreading: is the following thread-safe?

Assume there are two threads running Thread1() and Thread2() respectively. The thread 1 just sets a global flag to tell thread 2 to quit and thread 2 periodically checks if it should quit.
volatile bool is_terminate = false;
void Thread1()
{
is_terminate = true;
}
void Thread2()
{
while (!is_terminate) {
// ...
}
}
I want to ask if the above code is safe assuming that access to is_terminate is atomic. I already know many materials state that volatile can not insure thread-safety generally. But in the situation that only one atomic variable is shared, do we really need to protect the shared variable using a lock?
It is probably sort of thread-safe.
Thread safety tends to depend on context. Updating a bool is always thread safe, if you never read from it.
And if you do read from it, then the answer depends on when you read from it, and what that read signifies.
On some CPUs, but not all, writes to an object of type bool will be atomic. x86 CPUs will generally make it atomic, but others might not. If the update isn't atomic, then adding volatile won't help you.
But the next problem is reordering. The compiler (and CPU) will carry out reads/writes to volatile variables in the order specified, without any reordering. So that's good.
But it makes no guarantee about reordering one volatile memory access relative to all the non-volatile ones. So a common example is that you define some kind of flag to protect access to a resource, you make the flag volatile, and then the compiler moves the resource access up so it happens before you check the flag. It's allowed to do that, because it's not reordering the internal ordering of two volatile accesses, but merely a volatile and a non-volatile one.
Honestly, the question I'd ask is why not just do it properly?
It is possible that volatile will work in this situation, but why not save yourself the trouble, and make it clearer that it's correct? Slap a memory barrier around it instead.
It is not thread safe.
If the threads, for example, are run on CPUs with separate caches there are no language rules saying that the caches are to be synchronized when writing a volatile variable. The other thread may not see the change for a very long time, if ever.
To answer in another way:
If volatile is enough to be thread safe, why is C++0x adding an entire chapter with atomic operations?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html
First, volatile is used for disabling compile optimization in c/c++. see this for understanding volatile.
The core of atomic is word align and size of is_terminate, if size of is_terminate is less than machine native size and aligned, then R and W of it is atomic.
In your context, with or without volatile, thread2 may read old value after thread1 modified it, but thread2 can read it eventually.
If eventually-read is OK for you, then your codes are thread safety.
it's safe because one thread is only reading and one is only writing.
The threads aren't really sharing that flag, one is reading, one is writing. You can't have a race because the other thread will never write a bad result, and the reading thread will never read a bad result. simple.
No, it is not. It could be thread safe if the value access was atomic, but in C++ you can't assume that variables access is thread-safe unless you use some compiler-specific constructs or synchronization primitives.
It is still not safe. You should use synchronizaton to access is_terminate Access to the bool is not guaranteed to be an atomic operation.
I believe that this code is safe, until both the threads are not writing the bool (already you have mentioned that value access is atomic).
The big problem with assuming that the volatile keyword imposes any kind of thread safety, is that the C or C++ standards have no concept of threads in the abstract machine they describe.
The guarantees that the standard imposes on the volatile keyword, are only valid within a thread - not between multiple threads.
This leaves implementors with full liberty to do whatever they please when it comes to threads. If they chose to implement the volatile keyword to be safe across threads, then you're lucky. More often than not, that's not the case though.
This code isn't seems to be thread safe. Reason can be explained easily.
Problem lies in below code line
"is_terminate = true;"
Even if access to "is_terminate" is atomic, above statement is not atomic.
This statement includes more than 1 operations. Like load "is_terminate" and update "is_terminate".
Now gotcha is if is_terminate is loaded and not updated and thread switches to another one.
Now thread 2 expected result to be true but it won't get it.
Make sure "is_terminate = true;" is atomic. So lock it.
Hope it helps.

C++ Thread, shared data

I have an application where 2 threads are running... Is there any certanty that when I change a global variable from one thread, the other will notice this change?
I don't have any syncronization or Mutual exclusion system in place... but should this code work all the time (imagine a global bool named dataUpdated):
Thread 1:
while(1) {
if (dataUpdated)
updateScreen();
doSomethingElse();
}
Thread 2:
while(1) {
if (doSomething())
dataUpdated = TRUE;
}
Does a compiler like gcc optimize this code in a way that it doesn't check for the global value, only considering it value at compile time (because it nevers get changed at the same thred)?
PS: Being this for a game-like application, it really doen't matter if there will be a read while the value is being written... all that matters is that the change gets noticed by the other thread.
Yes. No. Maybe.
First, as others have mentioned you need to make dataUpdated volatile; otherwise the compiler may be free to lift reading it out of the loop (depending on whether or not it can see that doSomethingElse doesn't touch it).
Secondly, depending on your processor and ordering needs, you may need memory barriers. volatile is enough to guarentee that the other processor will see the change eventually, but not enough to guarentee that the changes will be seen in the order they were performed. Your example only has one flag, so it doesn't really show this phenomena. If you need and use memory barriers, you should no longer need volatile
Volatile considered harmful and Linux Kernel Memory Barriers are good background on the underlying issues; I don't really know of anything similar written specifically for threading. Thankfully threads don't raise these concerns nearly as often as hardware peripherals do, though the sort of case you describe (a flag indicating completion, with other data presumed to be valid if the flag is set) is exactly the sort of thing where ordering matterns...
Here is an example that uses boost condition variables:
bool _updated=false;
boost::mutex _access;
boost::condition _condition;
bool updated()
{
return _updated;
}
void thread1()
{
boost::mutex::scoped_lock lock(_access);
while (true)
{
boost::xtime xt;
boost::xtime_get(&xt, boost::TIME_UTC);
// note that the second parameter to timed_wait is a predicate function that is called - not the address of a variable to check
if (_condition.timed_wait(lock, &updated, xt))
updateScreen();
doSomethingElse();
}
}
void thread2()
{
while(true)
{
if (doSomething())
_updated=true;
}
}
Use a lock. Always always use a lock to access shared data. Marking the variable as volatile will prevent the compiler from optimizing away the memory read, but will not prevent other problems such as memory re-ordering. Without a lock there is no guarantee that the memory writes in doSomething() will be visible in the updateScreen() function.
The only other safe way is to use a memory fence, either explicitly or an implicitly using an Interlocked* function for example.
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
The above will guarantee that any access to the variable will be to and from memory without any specific optimizations and as a result all threads running on the same processor will "see" changes to the variable with the same semantics as the code reads.
Chris Jester-Young pointed out that coherency concerns to such a variable value change may arise in a multi-processor systems. This is a consideration and it depends on the platform.
Actually, there are really two considerations to think about relative to platform. They are coherency and atomicity of the memory transactions.
Atomicity is actually a consideration for both single and multi-processor platforms. The issue arises because the variable is likely multi-byte in nature and the question is if one thread could see a partial update to the value or not. ie: Some bytes changed, context switch, invalid value read by interrupting thread. For a single variable that is at the natural machine word size or smaller and naturally aligned should not be a concern. Specifically, an int type should always be OK in this regard as long as it is aligned - which should be the default case for the compiler.
Relative to coherency, this is a potential concern in a multi-processor system. The question is if the system implements full cache coherency or not between processors. If implemented, this is typically done with the MESI protocol in hardware. The question didn't state platforms, but both Intel x86 platforms and PowerPC platforms are cache coherent across processors for normally mapped program data regions. Therefore this type of issue should not be a concern for ordinary data memory accesses between threads even if there are multiple processors.
The final issue relative to atomicity that arises is specific to read-modify-write atomicity. That is, how do you guarantee that if a value is read updated in value and the written, that this happen atomically, even across processors if more than one. So, for this to work without specific synchronization objects, would require that all potential threads accessing the variable are readers ONLY but expect for only one thread can ever be a writer at one time. If this is not the case, then you do need a sync object available to be able to ensure atomic actions on read-modify-write actions to the variable.
Your solution will use 100% CPU, among other problems. Google for "condition variable".
Chris Jester-Young pointed out that:
This only work under Java 1.5+'s memory model. The C++ standard does not address threading, and volatile does not guarantee memory coherency between processors. You do need a memory barrier for this
being so, the only true answer is implementing a synchronization system, right?
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
No, it's not certain. If you declare the variable volatile, then the complier is supposed to generate code that always loads the variable from memory on a read.
If the scope is right ( "extern", global, etc. ) then the change will be noticed. The question is when? And in what order?
The problem is that the compiler can and frequently will re-order your logic to fill all it's concurrent pipelines as a performance optimization.
It doesn't really show in your specific example because there aren't any other instructions around your assignment, but imagine functions declared after your bool assign execute before the assignment.
Check-out Pipeline Hazard on wikipedia or search google for "compiler instruction reordering"
As others have said the volatile keyword is your friend. :-)
You'll most likely find that your code would work when you had all of the optimisation options disabled in gcc. In this case (I believe) it treats everything as volatile and as a result the variable is accessed in memory for every operation.
With any sort of optimisation turned on the compiler will attempt to use a local copy held in a register. Depending on your functions this may mean that you only see the change in variable intermittently or, at worst, never.
Using the keyword volatile indicates to the compiler that the contents of this variable can change at any time and that it should not use a locally cached copy.
With all of that said you may find better results (as alluded to by Jeff) through the use of a semaphore or condition variable.
This is a reasonable introduction to the subject.