I've written a Windows program in C++ which at times uses two threads: one background thread for performing time-consuming work; and another thread for managing the graphical interface. This way the program is still responsive to the user, which is needed to be able to abort a certain operation. The threads communicate via a shared bool variable, which is set to true when the GUI thread signals the worker thread to abort. Here is the code which implements this behaviour (I've stripped away irrelevant parts):
CODE EXECUTED BY THE GUI THREAD
class ProgressBarDialog : protected Dialog {
    /**
     * This points to the variable which the worker thread reads to check if it
     * should abort or not.
     */
    bool volatile* threadParameterAbort_;

    ...

    BOOL CALLBACK DialogProc( HWND dialog, UINT message,
                              WPARAM wParam, LPARAM lParam ) {
        switch ( message ) {
            case WM_COMMAND :
                switch ( LOWORD( wParam ) ) {
                    ...
                    case IDCANCEL :
                    case IDC_BUTTON_CANCEL :
                        switch ( progressMode_ ) {
                            ... // case labels stripped as irrelevant
                            if ( confirmAbort() ) {
                                // This causes the worker thread to be aborted
                                *threadParameterAbort_ = true;
                            }
                            break;
                        }
                        return TRUE;
                }
        }
        return FALSE;
    }

    ...
};
CODE EXECUTED BY THE WORKER THREAD
class CsvFileHandler {
    /**
     * This points to the variable which is set by the GUI thread when this
     * thread should abort its execution.
     */
    bool volatile* threadParamAbort_;

    ...

    ParseResult parseFile( ItemList* list ) {
        ParseResult result;
        ...
        while ( readLine( &line ) ) {
            if ( ( threadParamAbort_ != NULL ) && *threadParamAbort_ ) {
                break;
            }
            ...
        }
        return result;
    }

    ...
};
The pointer in both threads (threadParameterAbort_ and threadParamAbort_) points to a bool variable declared in a structure which is passed to the worker thread upon creation. It is declared as
bool volatile abortExecution_;
My question is: do I need to use volatile here, and is the code above sufficient to ensure that the program is thread-safe? The way I've reasoned to justify the use of volatile here (see this question for background) is that it will:

- prevent reads of *threadParameterAbort_ from being satisfied from a register, forcing the value to be fetched from memory, and
- prevent the compiler from removing the if clause in the worker thread as an optimization.
(The following paragraph is only concerned with the thread-safety of the program as such and does not, I repeat, does not involve claiming that volatile in any way provides any means of ensuring thread-safety.) As far as I can tell, it should be thread-safe, as setting a bool variable should be an atomic operation on most, if not all, architectures. But I could be wrong. I'm also worried that the compiler may reorder instructions in a way that breaks thread-safety. But better safe (no pun intended) than sorry.
EDIT:
A minor mistake in my wording made the question appear as if I was asking whether volatile is enough to ensure thread-safety. That was not my intent -- volatile does indeed not ensure thread-safety in any way -- what I meant to ask was whether the code provided above exhibits the correct behaviour to ensure that the program is thread-safe.
You should not depend on volatile to guarantee thread safety: even though the compiler guarantees that the variable is always read from memory (and not from a register cache), in multi-processor environments a memory barrier is also required.
Rather, use a proper lock around the shared memory. Locks like a critical section are often extremely lightweight, and in the uncontended case will probably be implemented entirely in user mode. They will also contain the necessary memory barriers.
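As a rough illustration of that advice (this is a sketch, not the question's code; the class and names are invented), the abort flag could be wrapped in a Win32 CRITICAL_SECTION like so:

#include <windows.h>

// Hypothetical wrapper guarding a one-shot abort flag; the critical
// section supplies the memory barriers on both the write and the reads.
class AbortFlag {
public:
    AbortFlag() : value_(false) { InitializeCriticalSection(&cs_); }
    ~AbortFlag() { DeleteCriticalSection(&cs_); }

    void set() {
        EnterCriticalSection(&cs_);
        value_ = true;
        LeaveCriticalSection(&cs_);
    }

    bool get() {
        EnterCriticalSection(&cs_);
        bool v = value_;
        LeaveCriticalSection(&cs_);
        return v;
    }

private:
    CRITICAL_SECTION cs_;
    bool value_;
};

The GUI thread would call set() where it currently writes *threadParameterAbort_, and the worker would call get() in its loop.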
Volatile should only be used for memory-mapped I/O, where multiple reads may return different values. Similarly for memory-mapped writes.
Wikipedia says it pretty well.
In C, and consequently C++, the volatile keyword was intended to:

- allow access to memory-mapped devices
- allow uses of variables between setjmp and longjmp
- allow uses of sig_atomic_t variables in signal handlers

Operations on volatile variables are not atomic, nor do they establish a proper happens-before relationship for threading. This is according to the relevant standards (C, C++, POSIX, WIN32), and this is the matter of fact for the vast majority of current implementations. The volatile keyword is basically worthless as a portable threading construct.
volatile is neither necessary nor sufficient for multithreading in C++. It disables optimizations that are perfectly acceptable, but fails to enforce things like atomicity that are needed.
Edit: instead of using a critical section, I'd probably use InterlockedIncrement, which gives an atomic write with less overhead.
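A sketch of that Interlocked approach (an adaptation with invented names; InterlockedExchange is used here for the one-shot flag, since the flag is set rather than counted):

#include <windows.h>

volatile LONG abortRequested = 0; // LONG because the Interlocked family operates on 32-bit values

// GUI thread: request the abort (atomic write with a full memory barrier).
void requestAbort() {
    InterlockedExchange(&abortRequested, 1);
}

// Worker thread: poll the flag with an atomic read (a compare-exchange with
// equal exchange/comparand values leaves the flag unchanged).
bool abortWasRequested() {
    return InterlockedCompareExchange(&abortRequested, 0, 0) != 0;
}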
What I'd normally do, however, is hook up a thread-safe queue (or deque) as the input to the thread. When you have something for the thread to do, you put a packet of data describing the job into the queue, and the thread does it when it can. When you want the thread to shut down normally, you put a "shutdown" packet into the queue. If you need an immediate abort, you use the deque instead and put the "abort" command on the front of the deque. In theory, this has the shortcoming that it doesn't abort the thread until it finishes its current task. In practice, that just means you want to keep each task roughly comparable in size/latency to the interval at which you currently check the flag.
This general design avoids a whole host of IPC problems.
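For what it's worth, here is a minimal sketch of such a queue (assuming C++11; the Message type and all names are invented for illustration):

#include <condition_variable>
#include <deque>
#include <mutex>

struct Message { enum Kind { Job, Shutdown, Abort } kind; /* payload... */ };

class MessageDeque {
public:
    void pushBack(Message m) {  // normal work and "shutdown" packets
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(m); }
        cv_.notify_one();
    }
    void pushFront(Message m) { // an urgent "abort" jumps the queue
        { std::lock_guard<std::mutex> lk(m_); q_.push_front(m); }
        cv_.notify_one();
    }
    Message pop() {             // worker blocks here until a message arrives
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        Message m = q_.front();
        q_.pop_front();
        return m;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Message> q_;
};

The worker loops on pop() and exits when it sees a Shutdown or Abort message.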
With regard to my answer to yesterday's question, no, volatile is unnecessary. In fact, multithreading here is irrelevant.
while ( readLine( &line ) ) { // threadParamAbort_ is not local:
if ( ( threadParamAbort_ != NULL ) && *threadParamAbort_ ) {
- prevent reads of *threadParameterAbort_ from being satisfied from a register, forcing the value to be fetched from memory, and
- prevent the compiler from removing the if clause in the worker thread as an optimization.
The function readLine is external library code, or else it calls external library code. Therefore, the compiler cannot assume there are any nonlocal variables it does not modify. Once a pointer to an object (or its superobject) has been formed, it might be passed and stored anywhere. The compiler can't track what pointers end up in global variables and which don't.
So the compiler must assume that readLine might hold its own private static bool* to the same variable and modify the value through it. Therefore it is necessary to reload from memory.
It seems that the same use case is described here: volatile - Multithreaded Programmer's Best Friend by Alexandrescu. It states that in exactly this case (creating a flag) volatile can be perfectly well used.
So yes, in exactly this case the code should be correct: volatile will prevent both reading from a cached copy and the compiler optimizing away the if statement.
Ok, so you've been beaten up enough about volatile and thread safety!, but...
An example for your specific code (though a stretch, and something within your control) is the fact that you are looking at your variable several times in the one 'transaction':
if ( ( threadParamAbort_ != NULL ) && *threadParamAbort_ )
If, for whatever reason, threadParamAbort_ is deleted after the left-hand side is evaluated and before the right-hand side is, then you will dereference a deleted pointer. Again, this is unlikely given that you have control, but it is an example of what volatile and atomicity can't do for you.
While the other answers are correct, I also suggest you take a look at the "Microsoft Specific" section of the MSDN documentation for volatile.
I think it'll work fine (atomic or not) because you're only using it to cancel a background operation.
The main thing that volatile does is prevent the variable from being cached in a register, so you can be sure each read comes from main memory.
So if I were you, yes I would define the variable as volatile.
I think it's best to assume that volatile is needed here: it ensures that when the variable is written by one thread and then read by another, the reader gets the correct value as soon as possible, and it ensures that the if is not optimised away (although I doubt it would be).
Related
The code below is used to assign work to multiple threads, wake them up, and wait until they are done. The "work" in this case consists of "cleaning a volume". What exactly this operation does is irrelevant for this question -- it just helps with the context. The code is part of a huge transaction processing system.
void bf_tree_cleaner::force_all()
{
    for (int i = 0; i < vol_m::MAX_VOLS; i++) {
        _requested_volumes[i] = true;
    }
    // fence here (seq_cst)
    wakeup_cleaners();

    while (true) {
        usleep(10000); // 10 ms
        bool remains = false;
        for (int vol = 0; vol < vol_m::MAX_VOLS; ++vol) {
            // fence here (seq_cst)
            if (_requested_volumes[vol]) {
                remains = true;
                break;
            }
        }
        if (!remains) {
            break;
        }
    }
}
A value in a boolean array _requested_volumes[i] tells whether thread i has work to do. When it is done, the worker thread sets it to false and goes back to sleep.
The problem I am having is that the compiler generates an infinite loop, where the variable remains is always true, even though all values in the array have been set to false. This only happens with -O3.
I have tried two solutions to fix that:
Declare _requested_volumes volatile
(EDIT: this solution does work actually. See edit below)
Many experts say that volatile has nothing to do with thread synchronization and should only be used for low-level hardware accesses. But there's a lot of dispute over this on the Internet. The way I understand it, volatile is the only way to keep the compiler from optimizing away accesses to memory that is changed outside of the current scope, regardless of concurrent access. In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
Introduce memory fences
The method wakeup_cleaners() acquires a pthread_mutex_t internally in order to set a wake-up flag in the worker threads, so it should implicitly produce proper memory fences. But I'm not sure if those fences affect memory accesses in the caller method (force_all()). Therefore, I manually introduced fences in the locations specified by the comments above. This should make sure that writes performed by the worker thread in _requested_volumes are visible in the main thread.
What puzzles me is that none of these solutions works, and I have absolutely no idea why. The semantics and proper use of memory fences and volatile are confusing me right now. The problem is that the compiler is applying an undesired optimization -- hence the volatile attempt. But it could also be a problem of thread synchronization -- hence the memory-fence attempt.
I could try a third solution in which a mutex protects every access to _requested_volumes, but even if that works, I would like to understand why, because as far as I understand, it's all about memory fences. Thus, it should make no difference whether it's done explicitly or implicitly via a mutex.
EDIT: My assumptions were wrong and Solution 1 actually does work. However, my question remains in order to clarify the use of volatile vs. memory fences. If volatile is such a bad thing that should never be used in multithreaded programming, what else should I use here? Do memory fences also affect compiler optimizations? I see these as two orthogonal issues, and therefore orthogonal solutions: fences for visibility across threads and volatile for preventing optimizations.
Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses.
Yes.
But there's a lot of dispute over this on the Internet.
Not, generally, between "the experts".
The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access.
Nope.
Non-pure, non-constexpr non-inlined function calls (getters/accessors) also necessarily have this effect. Admittedly link-time optimization confuses the issue of which functions may really get inlined.
In C, and by extension C++, volatile affects memory access optimization. Java took this keyword, and since it can't (or couldn't) do the tasks C uses volatile for in the first place, altered it to provide a memory fence.
The correct way to get the same effect in C++ is using std::atomic.
In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
No, it may have the desired effect, depending on how it interacts with your platform's cache hardware. This is brittle - it could change any time you upgrade a CPU, or add another one, or change your scheduler behaviour - and it certainly isn't portable.
If you're really just tracking how many workers are still working, sane methods might be a semaphore (synchronized counter), or mutex+condvar+integer count. Either is likely more efficient than busy-looping with a sleep.
If you're wedded to the busy loop, you could still reasonably have a single counter, such as std::atomic<size_t>, which is set by wakeup_cleaners and decremented as each cleaner completes. Then you can just wait for it to reach zero.
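A minimal sketch of that counter idea (assuming C++11; the names are illustrative, and wakeup_cleaners()/usleep() stand in for the question's code):

#include <atomic>
#include <cstddef>

std::atomic<std::size_t> pending(0); // volumes still to be cleaned

void force_all_sketch(std::size_t num_volumes) {
    pending.store(num_volumes); // seq_cst store, visible to the cleaners
    // wakeup_cleaners();       // from the question, assumed to exist
    while (pending.load() != 0) {
        // usleep(10000);       // or better, block on a condition variable
    }
}

// Each cleaner calls this when it finishes its volume.
void volume_done() {
    pending.fetch_sub(1);
}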
If you really want a busy loop and really prefer to scan the array each time, it should be an array of std::atomic<bool>. That way you can decide what consistency you need from each load, and it will control both the compiler optimizations and the memory hardware appropriately.
Apparently, volatile does what's necessary for your example. The topic of the volatile qualifier itself is too broad: you can start by searching "C++ volatile vs atomic" etc. There are a lot of articles and questions & answers on the internet, e.g. Concurrency: Atomic and volatile in C++11 memory model.
Briefly, volatile tells the compiler to disable some aggressive optimizations; in particular, to read the variable each time it is accessed (rather than keeping it in a register). There are compilers which do more, making volatile act more like std::atomic: see the Microsoft Specific section here. In your case, disabling an aggressive optimization is exactly what was necessary.
However, volatile doesn't define the ordering of the statements around it. That is why you need memory ordering in case you need to do something else with the data after the flags you check have been set.
For inter-thread communication it is appropriate to use std::atomic; in particular, you should refactor _requested_volumes[vol] to be of type std::atomic<bool>, or even std::atomic_flag: http://en.cppreference.com/w/cpp/atomic/atomic .
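A minimal sketch of that refactoring (an illustration; MAX_VOLS stands in for the question's vol_m::MAX_VOLS):

#include <atomic>

const int MAX_VOLS = 8; // stand-in for vol_m::MAX_VOLS

std::atomic<bool> requested_volumes[MAX_VOLS]; // zero-initialized as a global

// The main thread's scan: the loads are seq_cst by default, which
// constrains both the compiler and the hardware, so the loop cannot be
// optimized into an infinite one.
bool any_remaining() {
    for (int vol = 0; vol < MAX_VOLS; ++vol) {
        if (requested_volumes[vol].load())
            return true;
    }
    return false;
}

// A worker that finishes volume vol simply does:
// requested_volumes[vol].store(false);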
An article that discourages usage of volatile and explains that volatile can be used only in rare special cases (connected with hardware I/O): https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt
I would like my thread to shut down more gracefully, so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread, so I have a worker with a method to gracefully stop it using a critical section Monitor (equivalent to a C# lock, I believe):
DrawingThread.h
class DrawingThread {
    bool stopRequested;
    Runtime::Monitor CSMonitor;
    CPInfo *pPInfo;
    //More..
};
DrawingThread.cpp
void DrawingThread::Run() {
    if (!stopRequested) {
        //Time consuming call#1
    }
    if (!stopRequested) {
        CSMonitor.Enter();
        pPInfo = new CPInfo(/**/);
        //Not time consuming but pPInfo must either be null or constructed.
        CSMonitor.Exit();
    }
    if (!stopRequested) {
        pPInfo->foobar(/**/); //Time consuming and can be signalled
    }
    if (!stopRequested) {
        //One more optional but time consuming call.
    }
}

void DrawingThread::RequestStop() {
    CSMonitor.Enter();
    stopRequested = true;
    if (pPInfo) pPInfo->RequestStop();
    CSMonitor.Exit();
}
I understand that (at least on Windows) Monitor/locks are the least expensive thread-synchronization primitive, but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and set to true only once, when a stop is requested (if it is requested before the task completes).
My tutors advised protecting even bools because reads/writes may not be atomic. I think this one-shot flag is the exception that proves the rule?
It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors may hold the shared data in a cache line. A processor has no reason to update this value to one changed by a different thread if there is no synchronization. Worse yet, if there is no synchronization it may write back a stale value when something stored close to the value is changed and synchronized.
Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not see changes to a variable made by a different thread, due to compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.
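A sketch of what that looks like for this flag (assuming C++11; only the flag is shown, and pPInfo would still need the monitor):

#include <atomic>

std::atomic<bool> stopRequested(false);

// Worker thread: reading the flag needs no lock.
bool shouldStop() {
    return stopRequested.load(); // seq_cst by default
}

// Controlling thread: the write is atomic and promptly visible.
void requestStop() {
    stopRequested.store(true);
}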
The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
No, you have to protect every access, since modern compilers and CPUs reorder code without your multithreaded use in mind. The read access from different threads might work, but it doesn't have to.
Does anyone know whether a primitive global variable is thread-safe or not?
// global variable
int count = 0;

void thread1()
{
    count++;
}

void thread2()
{
    count--;
    if (count == 0) print("Stuff thing");
}
Can I do it this way without any lock protection for count?
Thank you.
This is not thread-safe. You have a race condition here. The reason is that count++ is not necessarily atomic (meaning it is not a single processor operation): the value is first loaded, then incremented, and then written back. Between each of these steps, the other thread can also modify the value.
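To see why, this is roughly what count++ expands to; a context switch between any two of these steps lets the other thread interleave, and an update is lost:

int tmp = count; // 1. load
tmp = tmp + 1;   // 2. increment
count = tmp;     // 3. store (may overwrite the other thread's update)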
No, it's not. It may be, depending on the implementation, the compile-time options and even the phase of the moon.
But the standard doesn't mandate something is thread-safe, specifically because there's nothing about threading in the current standard.
See also here for a more detailed analysis of this sort of issue.
If you're using an environment where threads are supported, you can use a mutex: examine the pthread_mutex_* calls under POSIX threads, for example.
If you're coding for C++0x/C++11, use either a mutex or one of the atomic operations detailed in that standard.
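For the C++0x/C++11 route, a minimal sketch (note that fetch_sub returns the value before the decrement, which avoids a second, separately racy read of count):

#include <atomic>
#include <cstdio>

std::atomic<int> count(0);

void thread1() {
    count.fetch_add(1); // atomic increment
}

void thread2() {
    // Returns the previous value, so == 1 means "I brought it to zero".
    if (count.fetch_sub(1) == 1)
        std::printf("Stuff thing\n");
}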
It will be thread-safe only if you have a single CPU on which ++ and -- are atomic operations.
If you want to make it thread-safe, this is the way to do it on Windows:
LONG volatile count = 0;

void Thread1()
{
    ::InterlockedIncrement( &count );
}

void Thread2()
{
    if ( ::InterlockedDecrement( &count ) == 0 )
    {
        printf( "Stuf" );
    }
}
You need two things to safely use an object concurrently by two threads or more: atomicity of operations and ordering guarantees.
Some people will claim that on some platforms what you're attempting here is safe, because e.g. operations on whatever type int stands for on those platforms are atomic (even incrementing, or whatever). The problem with this is that you don't necessarily have ordering guarantees. So while you want and know that this particular variable is going to be accessed concurrently, the compiler doesn't. (And the compiler is right to assume that a variable is going to be used by only one thread at a time: you don't want every variable to be treated as potentially shared. The performance consequences would be terrible.)
So don't use primitive types this way. You have no guarantees from the language and even if some platforms have their own guarantees (e.g. atomicity) you have no way of telling the compiler that the variable is shared with C++. Either use compiler extensions for atomic types, the C++0x atomic types, or library solutions (e.g. mutexes). And don't let the name mislead you: to be truly useful, an atomic type has to provide ordering guarantees along with the atomicity that comes with the name.
It is a global variable, and hence multiple threads can race to change it. It is not thread-safe.
Use a mutex lock.
In general, no, you can't get away with that. In your trivial example case, it might happen to work but it's not reliable.
Suppose that we have the following bit of code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

void guarantee(bool cond, const char *msg) {
    if (!cond) {
        fprintf(stderr, "%s", msg);
        exit(1);
    }
}

bool do_shutdown = false; // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called in Thread 1. Intended behavior is to block until
   trigger_shutdown() is called. */
void wait_for_shutdown_signal() {
    int res;
    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");
    while (!do_shutdown) { // while loop guards against spurious wakeups
        res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
        guarantee(res == 0, "Could not wait for shutdown cond");
    }
    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}

/* Called in Thread 2. */
void trigger_shutdown() {
    int res;
    res = pthread_mutex_lock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not lock shutdown cond mutex");
    do_shutdown = true;
    res = pthread_cond_signal(&shutdown_cond);
    guarantee(res == 0, "Could not signal shutdown cond");
    res = pthread_mutex_unlock(&shutdown_cond_mutex);
    guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?
The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.
In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?
Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?
Of course the current C and C++ standards say nothing on the subject.
As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.
When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.
Threads Cannot be Implemented as a Library covers some specific issues that are required for pthreads to actually be usable but were not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm about what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".
Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_shutdown in a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU's local cache between the synchronizing operation and reading do_shutdown. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.
That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.
If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?
Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.
Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.
As far as I know, there's nothing said directly about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits the read of the variable to be optimized away only if the behavior can be proven to be "as if" the variable were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.
Also note that the pthreads library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(): see Does guarding a variable with a pthread mutex guarantee it's also not cached?
Now, if do_shutdown were static (no external linkage) and you had several threads using that static variable defined in the same module (i.e., the address of the static variable was never taken and passed to another module), that might be a different story. For example, say you have a single function that uses such a variable and start several thread instances running that function. In that case, a standards-conforming compiler implementation might cache the value across function calls, since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).
So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering: you should rely on APIs provided by pthreads or the OS to ensure that. (As a side note, recent versions of Microsoft's compilers do document that volatile enforces full memory barriers, but I've read opinions indicating this isn't required by the standard.)
The hand-waving answers are all wrong. Sorry to be harsh.
There is no way that "the compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown."
If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.
It's absurd: a compiler cannot possibly understand what pthread_ functions do unless it has built-in knowledge of POSIX threads.
From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracking down a bug in a bit of code that was caused by a set of pointer assignments being cached and unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still did not have access to the proper pointer values. Moving the pointer assignments and associated mutex locking to a separate function did fix the problem.
I have an application where two threads are running... Is there any certainty that when I change a global variable from one thread, the other will notice this change?
I don't have any synchronization or mutual exclusion system in place... but should this code work all the time (imagine a global bool named dataUpdated):
Thread 1:
while(1) {
    if (dataUpdated)
        updateScreen();
    doSomethingElse();
}
Thread 2:
while(1) {
    if (doSomething())
        dataUpdated = TRUE;
}
Does a compiler like gcc optimize this code in a way that it doesn't check for the global value, only considering its value at compile time (because it never gets changed in the same thread)?
PS: This being for a game-like application, it really doesn't matter if there is a read while the value is being written... all that matters is that the change gets noticed by the other thread.
Yes. No. Maybe.
First, as others have mentioned, you need to make dataUpdated volatile; otherwise the compiler may be free to hoist the read out of the loop (depending on whether or not it can see that doSomethingElse doesn't touch it).
Secondly, depending on your processor and ordering needs, you may need memory barriers. volatile is enough to guarantee that the other processor will see the change eventually, but not enough to guarantee that the changes will be seen in the order they were performed. Your example only has one flag, so it doesn't really show this phenomenon. If you need and use memory barriers, you should no longer need volatile.
Volatile Considered Harmful and Linux Kernel Memory Barriers are good background on the underlying issues; I don't really know of anything similar written specifically for threading. Thankfully threads don't raise these concerns nearly as often as hardware peripherals do, though the sort of case you describe (a flag indicating completion, with other data presumed to be valid if the flag is set) is exactly the sort of thing where ordering matters...
Here is an example that uses boost condition variables:
bool _updated = false;
boost::mutex _access;
boost::condition _condition;

bool updated()
{
    return _updated;
}

void thread1()
{
    boost::mutex::scoped_lock lock(_access);
    while (true)
    {
        boost::xtime xt;
        boost::xtime_get(&xt, boost::TIME_UTC);
        xt.sec += 1; // wake up at least once a second
        // note that the third parameter to timed_wait is a predicate
        // function that is called - not the address of a variable to check
        if (_condition.timed_wait(lock, xt, &updated))
            updateScreen();
        doSomethingElse();
    }
}

void thread2()
{
    while (true)
    {
        if (doSomething())
        {
            boost::mutex::scoped_lock lock(_access); // guard the write
            _updated = true;
            _condition.notify_one(); // wake thread1 promptly
        }
    }
}
Use a lock. Always, always use a lock to access shared data. Marking the variable as volatile will prevent the compiler from optimizing away the memory read, but will not prevent other problems such as memory reordering. Without a lock there is no guarantee that the memory writes performed in doSomething() will be visible in the updateScreen() function.
The only other safe way is to use a memory fence, either explicitly or implicitly, e.g. by using an Interlocked* function.
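In portable C++11 terms, that publication pattern looks roughly like this (a sketch; the flag name mirrors the question's, the payload is invented):

#include <atomic>

std::atomic<bool> dataUpdated(false);
int sharedPayload = 0; // stands in for whatever doSomething() produces

// Thread 2: write the data, then publish the flag with release semantics.
void producer() {
    sharedPayload = 42;
    dataUpdated.store(true, std::memory_order_release);
}

// Thread 1: an acquire load of the flag makes the payload write visible.
void consumer() {
    if (dataUpdated.load(std::memory_order_acquire)) {
        // updateScreen() may safely read sharedPayload here
    }
}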
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
The above will guarantee that any access to the variable is made to and from memory without any register caching, and as a result all threads running on the same processor will "see" changes to the variable with the same semantics as the code reads.
Chris Jester-Young pointed out that coherency concerns about such a variable's value change may arise in multi-processor systems. This is a consideration, and it depends on the platform.
Actually, there are really two considerations to think about relative to platform. They are coherency and atomicity of the memory transactions.
Atomicity is actually a consideration for both single- and multi-processor platforms. The issue arises because the variable is likely multi-byte in nature, and the question is whether one thread could see a partial update to the value or not (i.e. some bytes changed, a context switch, and an invalid value read by the interrupting thread). A single variable that is at the natural machine word size or smaller and naturally aligned should not be a concern. Specifically, an int type should always be OK in this regard as long as it is aligned, which should be the default case for the compiler.
Relative to coherency, this is a potential concern in a multi-processor system. The question is if the system implements full cache coherency or not between processors. If implemented, this is typically done with the MESI protocol in hardware. The question didn't state platforms, but both Intel x86 platforms and PowerPC platforms are cache coherent across processors for normally mapped program data regions. Therefore this type of issue should not be a concern for ordinary data memory accesses between threads even if there are multiple processors.
The final issue relative to atomicity is specific to read-modify-write atomicity: how do you guarantee that if a value is read, updated, and then written, this happens atomically, even across processors if there is more than one. So, for this to work without specific synchronization objects, all potential threads accessing the variable must be readers ONLY, except that at most one thread can ever be a writer at a time. If this is not the case, then you do need a sync object to be able to ensure atomic read-modify-write actions on the variable.
Your solution will use 100% CPU, among other problems. Google for "condition variable".
Chris Jester-Young pointed out that:
This only works under Java 1.5+'s memory model. The C++ standard does not address threading, and volatile does not guarantee memory coherency between processors. You do need a memory barrier for this
that being so, the only true answer is implementing a synchronization system, right?
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
No, it's not certain. If you declare the variable volatile, then the compiler is supposed to generate code that always loads the variable from memory on a read.
If the scope is right ( "extern", global, etc. ) then the change will be noticed. The question is when? And in what order?
The problem is that the compiler can and frequently will reorder your logic to fill all its concurrent pipelines, as a performance optimization.
It doesn't really show in your specific example because there aren't any other instructions around your assignment, but imagine instructions scheduled after your bool assignment executing before the assignment.
Check out "Pipeline Hazard" on Wikipedia or search Google for "compiler instruction reordering".
As others have said the volatile keyword is your friend. :-)
You'll most likely find that your code would work when you had all of the optimisation options disabled in gcc. In this case (I believe) it treats everything as volatile and as a result the variable is accessed in memory for every operation.
With any sort of optimisation turned on the compiler will attempt to use a local copy held in a register. Depending on your functions this may mean that you only see the change in variable intermittently or, at worst, never.
Using the keyword volatile indicates to the compiler that the contents of this variable can change at any time and that it should not use a locally cached copy.
With all of that said you may find better results (as alluded to by Jeff) through the use of a semaphore or condition variable.
This is a reasonable introduction to the subject.