C++ Multithread with global variables

C++ Multithread with global variables - c++

Anyone know whether primitive global variable is thread safe or not?
// global variable
int count = 0;
void thread1()
{
count++;
}
void thread2()
{
count--;
if (count == 0) print("Stuff thing");
}
Can I do it this way without any lock protection for count?
Thank you.

This is not threadsafe. You have a race-condition here. The reason for that is, that count++ is not necessarily atomic (means not a single processor operation). The value is first loaded, then incremented, and then written back. Between each of these steps, the other thread can also modify the value.

No, it's not. It may be, depending on the implementation, the compile-time options and even the phase of the moon.
But the standard doesn't mandate something is thread-safe, specifically because there's nothing about threading in the current standard.
See also here for a more detailed analysis of this sort of issue.
If you're using an environment where threads are supported, you can use a mutex: examine the pthread_mutex_* calls under POSIX threads, for example.
If you're coding for C++0x/C++11, use either a mutex or one of the atomic operations detailed in that standard.

It will be thread safe only if you have 1 CPU with ++ and -- atomic operations on your PC.
If you want to make it thread safe this is the way for Windows:
LONG volatile count = 0;
void Thread1()
{
::InterlockedIncrement( &count );
}
void Thread2()
{
if(::InterlockedDecrement( &count ) == 0 )
{
printf("Stuf");
}
}

You need two things to safely use an object concurrently by two threads or more: atomicity of operations and ordering guarantees.
Some people will pretend that on some platforms what you're attempting here is safe because e.g. operations on whatever type int stands for those platforms are atomic (even incrementing or whatever). The problem with this is that you don't necessarily have ordering guarantees. So while you want and know that this particular variable is going to be accessed concurrently, the compiler doesn't. (And the compiler is right to assume that this variable is going to be used by only one thread at a time: you don't want every variable to be treated as being potentially shared. The performance consequences would be terrible.)
So don't use primitive types this way. You have no guarantees from the language and even if some platforms have their own guarantees (e.g. atomicity) you have no way of telling the compiler that the variable is shared with C++. Either use compiler extensions for atomic types, the C++0x atomic types, or library solutions (e.g. mutexes). And don't let the name mislead you: to be truly useful, an atomic type has to provide ordering guarantees along with the atomicity that comes with the name.

It is a Global variable and hence multiple threads can race to change it. It is not thread safe.
Use a mutex lock.

In general, no, you can't get away with that. In your trivial example case, it might happen to work but it's not reliable.

Related

what makes c++ atomic atomic

I just know that even the code int64_t_variable = 10 is not an atomic operation. for example
int64_t i = 0;
i = 100;
in another thread (say T2) it may read the value which is not 0 or 100.
is the above true?
std::atomic<int64_t> i; i.store(100, std::memory_order_relaxed) is atomic. so what magic does atomic use to make it happen based on Q1 is true?
I always think any operation which handles less than 64 bits is atomic (assume 64 bit cpu), looks I was wrong. so for v = n, how I can know if it is atomic or not? for example, if v is void *, is it atomic or not?
==================================
Update: In my question: when T2 read the i, both 0 and 100 are ok for me. but any other result is not reasonable. This is the point. so I don't think cpu cache or compiler stuff can make this happen.

is the above true?
Yes. If you do not use synchronization (std::atmoic<>, std::mutex, ...) then any change you make in one thread can not be assumed to show up in other threads. It could even be the compiler optimizes something away because there is no way it can change in function. For example
bool flag = true;
void stop_stuff()
{
flag = false;
}
void do_stuff()
{
while (flag)
std::cout << "never stop";
}
since there is no synchronization for flag, the compiler is free to assume it never changes optmize away even checking flag in the loop condition. If it does then no matter how many times you call stop_stuff, do_stuff will never end.
If you change flag to std::atomic<bool> flag, then the compiler can no longer make such an assumption as you are telling it that this variable can change outside the scope of the function and it needs to be checked.
Do note that not providing synchronization when you have more than one thread with shared data and at least one of those threads writes to the shared data is called a data race and per the standard is undefined behavior. The above example is just one possible outcome of that undefined behavior.
std::atomic<int64_t> i; i.store(100, std::memory_order_relaxed) is atomic. so what magic does atomic use to make it happen based on Q1 is true?
It either uses an atomic primitive your system provides, or it uses a locking mechanism like std::mutex to guard the access.
I always think any operation which handles less than 64 bits is atomic (assume 64 bit cpu), looks I was wrong. so for v = n, how I can know if it is atomic or not? for example, if v is void *, is it atomic or not?
While this may be true on some systems, it is not true for the C++ memory model. The only things that are atomic are std::atomic<T>.

Yes. Reasons include "it requires multiple instructions", "memory caching can mess with it" and "the compiler may do surprising things (aka UB) if you fail to mention that the variable might be changed elsewhere".
The above reasons need to all be treated, and std::atomic gives the compiler/C++ library implementation the information to do so. The compiler won't be allowed to do any surprise optimizations, it might issue cache flushes where necessary, and the implementation may use a locking mechanism to prevent different operations on the same object from interleaving. Note that not all of these may be necessary: x86 has some atomicity guarantees built in. You can inform yourself whether a given std::atomic type is always lock-free on your platform with std::atomic::is_always_lockfree, and whether a given instance is lock-free (due to e.g. aligned/unaligned access).
If you don't use std::atomic, you have no atomicity guarantees due to the above reasons. It might be atomic on the instruction-level on your CPU, but C++ is defined on an abstract machine without such guarantees. If your code relies on atomicity on the abstract machine but fails to specify this, the compiler might make invalid optimizations and produce UB.

Yes, for example if there is 8bit memory, another thread may read partially written variable.
Whatever it takes to guarantee correct results, it can be nothing, built in atomic operations, locks etc.
void* is not atomic std::atomic < void* > is.
Same for the char, to be atomic it needs to be explicitly stated as atomic.

Why don't compilers merge redundant std::atomic writes?

I'm wondering why no compilers are prepared to merge consecutive writes of the same value to a single atomic variable, e.g.:
#include <atomic>
std::atomic<int> y(0);
void f() {
auto order = std::memory_order_relaxed;
y.store(1, order);
y.store(1, order);
y.store(1, order);
}
Every compiler I've tried will issue the above write three times. What legitimate, race-free observer could see a difference between the above code and an optimized version with a single write (i.e. doesn't the 'as-if' rule apply)?
If the variable had been volatile, then obviously no optimization is applicable. What's preventing it in my case?
Here's the code in compiler explorer.

The C++11 / C++14 standards as written do allow the three stores to be folded/coalesced into one store of the final value. Even in a case like this:
y.store(1, order);
y.store(2, order);
y.store(3, order); // inlining + constant-folding could produce this in real code
The standard does not guarantee that an observer spinning on y (with an atomic load or CAS) will ever see y == 2. A program that depended on this would have a data race bug, but only the garden-variety bug kind of race, not the C++ Undefined Behaviour kind of data race. (It's UB only with non-atomic variables). A program that expects to sometimes see it is not necessarily even buggy. (See below re: progress bars.)
Any ordering that's possible on the C++ abstract machine can be picked (at compile time) as the ordering that will always happen. This is the as-if rule in action. In this case, it's as if all three stores happened back-to-back in the global order, with no loads or stores from other threads happening between the y=1 and y=3.
It doesn't depend on the target architecture or hardware; just like compile-time reordering of relaxed atomic operations are allowed even when targeting strongly-ordered x86. The compiler doesn't have to preserve anything you might expect from thinking about the hardware you're compiling for, so you need barriers. The barriers may compile into zero asm instructions.
So why don't compilers do this optimization?
It's a quality-of-implementation issue, and can change observed performance / behaviour on real hardware.
The most obvious case where it's a problem is a progress bar. Sinking the stores out of a loop (that contains no other atomic operations) and folding them all into one would result in a progress bar staying at 0 and then going to 100% right at the end.
There's no C++11 std::atomic way to stop them from doing it in cases where you don't want it, so for now compilers simply choose never to coalesce multiple atomic operations into one. (Coalescing them all into one operation doesn't change their order relative to each other.)
Compiler-writers have correctly noticed that programmers expect that an atomic store will actually happen to memory every time the source does y.store(). (See most of the other answers to this question, which claim the stores are required to happen separately because of possible readers waiting to see an intermediate value.) i.e. It violates the principle of least surprise.
However, there are cases where it would be very helpful, for example avoiding useless shared_ptr ref count inc/dec in a loop.
Obviously any reordering or coalescing can't violate any other ordering rules. For example, num++; num--; would still have to be full barrier to runtime and compile-time reordering, even if it no longer touched the memory at num.
Discussion is under way to extend the std::atomic API to give programmers control of such optimizations, at which point compilers will be able to optimize when useful, which can happen even in carefully-written code that isn't intentionally inefficient. Some examples of useful cases for optimization are mentioned in the following working-group discussion / proposal links:
http://wg21.link/n4455: N4455 No Sane Compiler Would Optimize Atomics
http://wg21.link/p0062: WG21/P0062R1: When should compilers optimize atomics?
See also discussion about this same topic on Richard Hodges' answer to Can num++ be atomic for 'int num'? (see the comments). See also the last section of my answer to the same question, where I argue in more detail that this optimization is allowed. (Leaving it short here, because those C++ working-group links already acknowledge that the current standard as written does allow it, and that current compilers just don't optimize on purpose.)
Within the current standard, volatile atomic<int> y would be one way to ensure that stores to it are not allowed to be optimized away. (As Herb Sutter points out in an SO answer, volatile and atomic already share some requirements, but they are different). See also std::memory_order's relationship with volatile on cppreference.
Accesses to volatile objects are not allowed to be optimized away (because they could be memory-mapped IO registers, for example).
Using volatile atomic<T> mostly fixes the progress-bar problem, but it's kind of ugly and might look silly in a few years if/when C++ decides on different syntax for controlling optimization so compilers can start doing it in practice.
I think we can be confident that compilers won't start doing this optimization until there's a way to control it. Hopefully it will be some kind of opt-in (like a memory_order_release_coalesce) that doesn't change the behaviour of existing code C++11/14 code when compiled as C++whatever. But it could be like the proposal in wg21/p0062: tag don't-optimize cases with [[brittle_atomic]].
wg21/p0062 warns that even volatile atomic doesn't solve everything, and discourages its use for this purpose. It gives this example:
if(x) {
foo();
y.store(0);
} else {
bar();
y.store(0); // release a lock before a long-running loop
for() {...} // loop contains no atomics or volatiles
}
// A compiler can merge the stores into a y.store(0) here.
Even with volatile atomic<int> y, a compiler is allowed to sink the y.store() out of the if/else and just do it once, because it's still doing exactly 1 store with the same value. (Which would be after the long loop in the else branch). Especially if the store is only relaxed or release instead of seq_cst.
volatile does stop the coalescing discussed in the question, but this points out that other optimizations on atomic<> can also be problematic for real performance.
Other reasons for not optimizing include: nobody's written the complicated code that would allow the compiler to do these optimizations safely (without ever getting it wrong). This is not sufficient, because N4455 says LLVM already implements or could easily implement several of the optimizations it mentioned.
The confusing-for-programmers reason is certainly plausible, though. Lock-free code is hard enough to write correctly in the first place.
Don't be casual in your use of atomic weapons: they aren't cheap and don't optimize much (currently not at all). It's not always easy easy to avoid redundant atomic operations with std::shared_ptr<T>, though, since there's no non-atomic version of it (although one of the answers here gives an easy way to define a shared_ptr_unsynchronized<T> for gcc).

You are referring to dead-stores elimination.
It is not forbidden to eliminate an atomic dead store but it is harder to prove that an atomic store qualifies as such.
Traditional compiler optimizations, such as dead store elimination, can be performed on atomic operations, even sequentially consistent ones.
Optimizers have to be careful to avoid doing so across synchronization points because another thread of execution can observe or modify memory, which means that the traditional optimizations have to consider more intervening instructions than they usually would when considering optimizations to atomic operations.
In the case of dead store elimination it isn’t sufficient to prove that an atomic store post-dominates and aliases another to eliminate the other store.
from N4455 No Sane Compiler Would Optimize Atomics
The problem of atomic DSE, in the general case, is that it involves looking for synchronization points, in my understanding this term means points in the code where there is happen-before relationship between an instruction on a thread A and instruction on another thread B.
Consider this code executed by a thread A:
y.store(1, std::memory_order_seq_cst);
y.store(2, std::memory_order_seq_cst);
y.store(3, std::memory_order_seq_cst);
Can it be optimised as y.store(3, std::memory_order_seq_cst)?
If a thread B is waiting to see y = 2 (e.g. with a CAS) it would never observe that if the code gets optimised.
However, in my understanding, having B looping and CASsing on y = 2 is a data race as there is not a total order between the two threads' instructions.
An execution where the A's instructions are executed before the B's loop is observable (i.e. allowed) and thus the compiler can optimise to y.store(3, std::memory_order_seq_cst).
If threads A and B are synchronized, somehow, between the stores in thread A then the optimisation would not be allowed (a partial order would be induced, possibly leading to B potentially observing y = 2).
Proving that there is not such a synchronization is hard as it involves considering a broader scope and taking into account all the quirks of an architecture.
As for my understanding, due to the relatively small age of the atomic operations and the difficulty in reasoning about memory ordering, visibility and synchronization, compilers don't perform all the possible optimisations on atomics until a more robust framework for detecting and understanding the necessary conditions is built.
I believe your example is a simplification of the counting thread given above, as it doesn't have any other thread or any synchronization point, for what I can see, I suppose the compiler could have optimised the three stores.

While you are changing the value of an atomic in one thread, some other thread may be checking it and performing an operation based on the value of the atomic. The example you gave is so specific that compiler developers don't see it worth optimizing. However, if one thread is setting e.g. consecutive values for an atomic: 0, 1, 2, etc., the other thread may be putting something in the slots indicated by the value of the atomic.

NB: I was going to comment this but it's a bit too wordy.
One interesting fact is that this behavior isn't in the terms of C++ a data race.
Note 21 on p.14 is interesting: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf (my emphasis):
The execution of a program contains a data race if it contains two
conflicting actions in different threads, at least one of which is
not atomic
Also on p.11 note 5 :
“Relaxed” atomic operations are not synchronization operations even
though, like synchronization operations, they cannot contribute to
data races.
So a conflicting action on an atomic is never a data race - in terms of the C++ standard.
These operations are all atomic (and specifically relaxed) but no data race here folks!
I agree there's no reliable/predictable difference between these two on any (reasonable) platform:
include <atomic>
std::atomic<int> y(0);
void f() {
auto order = std::memory_order_relaxed;
y.store(1, order);
y.store(1, order);
y.store(1, order);
}
and
include <atomic>
std::atomic<int> y(0);
void f() {
auto order = std::memory_order_relaxed;
y.store(1, order);
}
But within the definition provided C++ memory model it isn't a data race.
I can't easily understand why that definition is provided but it does hand the developer a few cards to engage in haphazard communication between threads that they may know (on their platform) will statistically work.
For example, setting a value 3 times then reading it back will show some degree of contention for that location. Such approaches aren't deterministic but many effective concurrent algorithms aren't deterministic.
For example, a timed-out try_lock_until() is always a race condition but remains a useful technique.
What it appears the C++ Standard is providing you with certainty around 'data races' but permitting certain fun-and-games with race conditions which are on final analysis different things.
In short the standard appears to specify that where other threads may see the 'hammering' effect of a value being set 3 times, other threads must be able to see that effect (even if they sometimes may not!).
It's the case where pretty much all modern platforms that other thread may under some circumstances see the hammering.

In short, because the standard (for example the paragaraphs around and below 20 in [intro.multithread]) disallows for it.
There are happens-before guarantees which must be fulfilled, and which among other things rule out reordering or coalescing writes (paragraph 19 even says so explicitly about reordering).
If your thread writes three values to memory (let's say 1, 2, and 3) one after another, a different thread may read the value. If, for example, your thread is interrupted (or even if it runs concurrently) and another thread also writes to that location, then the observing thread must see the operations in exactly the same order as they happen (either by scheduling or coincidence, or whatever reason). That's a guarantee.
How is this possible if you only do half of the writes (or even only a single one)? It isn't.
What if your thread instead writes out 1 -1 -1 but another one sporadically writes out 2 or 3? What if a third thread observes the location and waits for a particular value that just never appears because it's optimized out?
It is impossible to provide the guarantees that are given if stores (and loads, too) aren't performed as requested. All of them, and in the same order.

A practical use case for the pattern, if the thread does something important between updates that does not depend on or modify y, might be: *Thread 2 reads the value of y to check how much progress Thread 1 has made.`
So, maybe Thread 1 is supposed to load the configuration file as step 1, put its parsed contents into a data structure as step 2, and display the main window as step 3, while Thread 2 is waiting on step 2 to complete so it can perform another task in parallel that depends on the data structure. (Granted, this example calls for acquire/release semantics, not relaxed ordering.)
I’m pretty sure a conforming implementation allows Thread 1 not to update y at any intermediate step—while I haven’t pored over the language standard, I would be shocked if it does not support hardware on which another thread polling y might never see the value 2.
However, that is a hypothetical instance where it might be pessimal to optimize away the status updates. Maybe a compiler dev will come here and say why that compiler chose not to, but one possible reason is letting you shoot yourself in the foot, or at least stub yourself in the toe.

The compiler writer cannot just perform the optimisation. They must also convince themselves that the optimisation is valid in the situations where the compiler writer intends to apply it, that it will not be applied in situations where it is not valid, that it doesn't break code that is in fact broken but "works" on other implementations. This is probably more work than the optimisation itself.
On the other hand, I could imagine that in practice (that is in programs that are supposed to do a job, and not benchmarks), this optimisation will save very little in execution time.
So a compiler writer will look at the cost, then look at the benefit and the risks, and probably will decide against it.

Let's walk a little further away from the pathological case of the three stores being immediately next to each other. Let's assume there's some non-trivial work being done between the stores, and that such work does not involve y at all (so that data path analysis can determine that the three stores are in fact redundant, at least within this thread), and does not itself introduce any memory barriers (so that something else doesn't force the stores to be visible to other threads). Now it is quite possible that other threads have an opportunity to get work done between the stores, and perhaps those other threads manipulate y and that this thread has some reason to need to reset it to 1 (the 2nd store). If the first two stores were dropped, that would change the behaviour.

Since variables contained within an std::atomic object are expected to be accessed from multiple threads, one should expect that they behave, at a minimum, as if they were declared with the volatile keyword.
That was the standard and recommended practice before CPU architectures introduced cache lines, etc.
[EDIT2] One could argue that std::atomic<> are the volatile variables of the multicore age. As defined in C/C++, volatile is only good enough to synchronize atomic reads from a single thread, with an ISR modifying the variable (which in this case is effectively an atomic write as seen from the main thread).
I personally am relieved that no compiler would optimize away writes to an atomic variable. If the write is optimized away, how can you guarantee that each of these writes could potentially be seen by readers in other threads? Don't forget that that is also part of the std::atomic<> contract.
Consider this piece of code, where the result would be greatly affected by wild optimization by the compiler.
#include <atomic>
#include <thread>
static const int N{ 1000000 };
std::atomic<int> flag{1};
std::atomic<bool> do_run { true };
void write_1()
{
while (do_run.load())
{
flag = 1; flag = 1; flag = 1; flag = 1;
flag = 1; flag = 1; flag = 1; flag = 1;
flag = 1; flag = 1; flag = 1; flag = 1;
flag = 1; flag = 1; flag = 1; flag = 1;
}
}
void write_0()
{
while (do_run.load())
{
flag = -1; flag = -1; flag = -1; flag = -1;
}
}
int main(int argc, char** argv)
{
int counter{};
std::thread t0(&write_0);
std::thread t1(&write_1);
for (int i = 0; i < N; ++i)
{
counter += flag;
std::this_thread::yield();
}
do_run = false;
t0.join();
t1.join();
return counter;
}
[EDIT] At first, I was not advancing that the volatile was central to the implementation of atomics, but...
Since there seemed to be doubts as to whether volatile had anything to do with atomics, I investigated the matter. Here's the atomic implementation from the VS2017 stl. As I surmised, the volatile keyword is everywhere.
// from file atomic, line 264...
// TEMPLATE CLASS _Atomic_impl
template<unsigned _Bytes>
struct _Atomic_impl
{ // struct for managing locks around operations on atomic types
typedef _Uint1_t _My_int; // "1 byte" means "no alignment required"
constexpr _Atomic_impl() _NOEXCEPT
: _My_flag(0)
{ // default constructor
}
bool _Is_lock_free() const volatile
{ // operations that use locks are not lock-free
return (false);
}
void _Store(void *_Tgt, const void *_Src, memory_order _Order) volatile
{ // lock and store
_Atomic_copy(&_My_flag, _Bytes, _Tgt, _Src, _Order);
}
void _Load(void *_Tgt, const void *_Src,
memory_order _Order) const volatile
{ // lock and load
_Atomic_copy(&_My_flag, _Bytes, _Tgt, _Src, _Order);
}
void _Exchange(void *_Left, void *_Right, memory_order _Order) volatile
{ // lock and exchange
_Atomic_exchange(&_My_flag, _Bytes, _Left, _Right, _Order);
}
bool _Compare_exchange_weak(
void *_Tgt, void *_Exp, const void *_Value,
memory_order _Order1, memory_order _Order2) volatile
{ // lock and compare/exchange
return (_Atomic_compare_exchange_weak(
&_My_flag, _Bytes, _Tgt, _Exp, _Value, _Order1, _Order2));
}
bool _Compare_exchange_strong(
void *_Tgt, void *_Exp, const void *_Value,
memory_order _Order1, memory_order _Order2) volatile
{ // lock and compare/exchange
return (_Atomic_compare_exchange_strong(
&_My_flag, _Bytes, _Tgt, _Exp, _Value, _Order1, _Order2));
}
private:
mutable _Atomic_flag_t _My_flag;
};
All of the specializations in the MS stl use volatile on the key functions.
Here's the declaration of one of such key function:
inline int _Atomic_compare_exchange_strong_8(volatile _Uint8_t *_Tgt, _Uint8_t *_Exp, _Uint8_t _Value, memory_order _Order1, memory_order _Order2)
You will notice the required volatile uint8_t* holding the value contained in the std::atomic. This pattern can be observed throughout the MS std::atomic<> implementation, Here is no reason for the gcc team, nor any other stl provider to have done it differently.

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?

It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.

Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.

The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.

No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

Volatile and multithreading: is the following thread-safe?

Assume there are two threads running Thread1() and Thread2() respectively. The thread 1 just sets a global flag to tell thread 2 to quit and thread 2 periodically checks if it should quit.
volatile bool is_terminate = false;
void Thread1()
{
is_terminate = true;
}
void Thread2()
{
while (!is_terminate) {
// ...
}
}
I want to ask if the above code is safe assuming that access to is_terminate is atomic. I already know many materials state that volatile can not insure thread-safety generally. But in the situation that only one atomic variable is shared, do we really need to protect the shared variable using a lock?

It is probably sort of thread-safe.
Thread safety tends to depend on context. Updating a bool is always thread safe, if you never read from it.
And if you do read from it, then the answer depends on when you read from it, and what that read signifies.
On some CPUs, but not all, writes to an object of type bool will be atomic. x86 CPUs will generally make it atomic, but others might not. If the update isn't atomic, then adding volatile won't help you.
But the next problem is reordering. The compiler (and CPU) will carry out reads/writes to volatile variables in the order specified, without any reordering. So that's good.
But it makes no guarantee about reordering one volatile memory access relative to all the non-volatile ones. So a common example is that you define some kind of flag to protect access to a resource, you make the flag volatile, and then the compiler moves the resource access up so it happens before you check the flag. It's allowed to do that, because it's not reordering the internal ordering of two volatile accesses, but merely a volatile and a non-volatile one.
Honestly, the question I'd ask is why not just do it properly?
It is possible that volatile will work in this situation, but why not save yourself the trouble, and make it clearer that it's correct? Slap a memory barrier around it instead.

It is not thread safe.
If the threads, for example, are run on CPUs with separate caches there are no language rules saying that the caches are to be synchronized when writing a volatile variable. The other thread may not see the change for a very long time, if ever.
To answer in another way:
If volatile is enough to be thread safe, why is C++0x adding an entire chapter with atomic operations?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html

First, volatile is used for disabling compile optimization in c/c++. see this for understanding volatile.
The core of atomic is word align and size of is_terminate, if size of is_terminate is less than machine native size and aligned, then R and W of it is atomic.
In your context, with or without volatile, thread2 may read old value after thread1 modified it, but thread2 can read it eventually.
If eventually-read is OK for you, then your codes are thread safety.

it's safe because one thread is only reading and one is only writing.
The threads aren't really sharing that flag, one is reading, one is writing. You can't have a race because the other thread will never write a bad result, and the reading thread will never read a bad result. simple.

No, it is not. It could be thread safe if the value access was atomic, but in C++ you can't assume that variables access is thread-safe unless you use some compiler-specific constructs or synchronization primitives.

It is still not safe. You should use synchronizaton to access is_terminate Access to the bool is not guaranteed to be an atomic operation.

I believe that this code is safe, until both the threads are not writing the bool (already you have mentioned that value access is atomic).

The big problem with assuming that the volatile keyword imposes any kind of thread safety, is that the C or C++ standards have no concept of threads in the abstract machine they describe.
The guarantees that the standard imposes on the volatile keyword, are only valid within a thread - not between multiple threads.
This leaves implementors with full liberty to do whatever they please when it comes to threads. If they chose to implement the volatile keyword to be safe across threads, then you're lucky. More often than not, that's not the case though.

This code isn't seems to be thread safe. Reason can be explained easily.
Problem lies in below code line
"is_terminate = true;"
Even if access to "is_terminate" is atomic, above statement is not atomic.
This statement includes more than 1 operations. Like load "is_terminate" and update "is_terminate".
Now gotcha is if is_terminate is loaded and not updated and thread switches to another one.
Now thread 2 expected result to be true but it won't get it.
Make sure "is_terminate = true;" is atomic. So lock it.
Hope it helps.

C++ Thread, shared data

I have an application where 2 threads are running... Is there any certanty that when I change a global variable from one thread, the other will notice this change?
I don't have any syncronization or Mutual exclusion system in place... but should this code work all the time (imagine a global bool named dataUpdated):
Thread 1:
while(1) {
if (dataUpdated)
updateScreen();
doSomethingElse();
}
Thread 2:
while(1) {
if (doSomething())
dataUpdated = TRUE;
}
Does a compiler like gcc optimize this code in a way that it doesn't check for the global value, only considering it value at compile time (because it nevers get changed at the same thred)?
PS: Being this for a game-like application, it really doen't matter if there will be a read while the value is being written... all that matters is that the change gets noticed by the other thread.

Yes. No. Maybe.
First, as others have mentioned you need to make dataUpdated volatile; otherwise the compiler may be free to lift reading it out of the loop (depending on whether or not it can see that doSomethingElse doesn't touch it).
Secondly, depending on your processor and ordering needs, you may need memory barriers. volatile is enough to guarentee that the other processor will see the change eventually, but not enough to guarentee that the changes will be seen in the order they were performed. Your example only has one flag, so it doesn't really show this phenomena. If you need and use memory barriers, you should no longer need volatile
Volatile considered harmful and Linux Kernel Memory Barriers are good background on the underlying issues; I don't really know of anything similar written specifically for threading. Thankfully threads don't raise these concerns nearly as often as hardware peripherals do, though the sort of case you describe (a flag indicating completion, with other data presumed to be valid if the flag is set) is exactly the sort of thing where ordering matterns...

Here is an example that uses boost condition variables:
bool _updated=false;
boost::mutex _access;
boost::condition _condition;
bool updated()
{
return _updated;
}
void thread1()
{
boost::mutex::scoped_lock lock(_access);
while (true)
{
boost::xtime xt;
boost::xtime_get(&xt, boost::TIME_UTC);
// note that the second parameter to timed_wait is a predicate function that is called - not the address of a variable to check
if (_condition.timed_wait(lock, &updated, xt))
updateScreen();
doSomethingElse();
}
}
void thread2()
{
while(true)
{
if (doSomething())
_updated=true;
}
}

Use a lock. Always always use a lock to access shared data. Marking the variable as volatile will prevent the compiler from optimizing away the memory read, but will not prevent other problems such as memory re-ordering. Without a lock there is no guarantee that the memory writes in doSomething() will be visible in the updateScreen() function.
The only other safe way is to use a memory fence, either explicitly or an implicitly using an Interlocked* function for example.

Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
The above will guarantee that any access to the variable will be to and from memory without any specific optimizations and as a result all threads running on the same processor will "see" changes to the variable with the same semantics as the code reads.
Chris Jester-Young pointed out that coherency concerns to such a variable value change may arise in a multi-processor systems. This is a consideration and it depends on the platform.
Actually, there are really two considerations to think about relative to platform. They are coherency and atomicity of the memory transactions.
Atomicity is actually a consideration for both single and multi-processor platforms. The issue arises because the variable is likely multi-byte in nature and the question is if one thread could see a partial update to the value or not. ie: Some bytes changed, context switch, invalid value read by interrupting thread. For a single variable that is at the natural machine word size or smaller and naturally aligned should not be a concern. Specifically, an int type should always be OK in this regard as long as it is aligned - which should be the default case for the compiler.
Relative to coherency, this is a potential concern in a multi-processor system. The question is if the system implements full cache coherency or not between processors. If implemented, this is typically done with the MESI protocol in hardware. The question didn't state platforms, but both Intel x86 platforms and PowerPC platforms are cache coherent across processors for normally mapped program data regions. Therefore this type of issue should not be a concern for ordinary data memory accesses between threads even if there are multiple processors.
The final issue relative to atomicity that arises is specific to read-modify-write atomicity. That is, how do you guarantee that if a value is read updated in value and the written, that this happen atomically, even across processors if more than one. So, for this to work without specific synchronization objects, would require that all potential threads accessing the variable are readers ONLY but expect for only one thread can ever be a writer at one time. If this is not the case, then you do need a sync object available to be able to ensure atomic actions on read-modify-write actions to the variable.

Your solution will use 100% CPU, among other problems. Google for "condition variable".

Chris Jester-Young pointed out that:
This only work under Java 1.5+'s memory model. The C++ standard does not address threading, and volatile does not guarantee memory coherency between processors. You do need a memory barrier for this
being so, the only true answer is implementing a synchronization system, right?

Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;

No, it's not certain. If you declare the variable volatile, then the complier is supposed to generate code that always loads the variable from memory on a read.

If the scope is right ( "extern", global, etc. ) then the change will be noticed. The question is when? And in what order?
The problem is that the compiler can and frequently will re-order your logic to fill all it's concurrent pipelines as a performance optimization.
It doesn't really show in your specific example because there aren't any other instructions around your assignment, but imagine functions declared after your bool assign execute before the assignment.
Check-out Pipeline Hazard on wikipedia or search google for "compiler instruction reordering"

As others have said the volatile keyword is your friend. :-)
You'll most likely find that your code would work when you had all of the optimisation options disabled in gcc. In this case (I believe) it treats everything as volatile and as a result the variable is accessed in memory for every operation.
With any sort of optimisation turned on the compiler will attempt to use a local copy held in a register. Depending on your functions this may mean that you only see the change in variable intermittently or, at worst, never.
Using the keyword volatile indicates to the compiler that the contents of this variable can change at any time and that it should not use a locally cached copy.
With all of that said you may find better results (as alluded to by Jeff) through the use of a semaphore or condition variable.
This is a reasonable introduction to the subject.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js