I have a linux multithread application in C++.
In this application in class App offer variable Status:
class App {
...
typedef enum { asStop=0, asStart, asRestart, asWork, asClose } TAppStatus;
TAppStatus Status;
...
}
All threads are often check Status by calling GetStatus() function.
inline TAppStatus App::GetStatus(){ return Status };
Other functions of the application can assign a different values to a Status variable by calling SetStatus() function and do not use Mutexes.
void App::SetStatus( TAppStatus aStatus ){ Status=aStatus };
Edit: All threads use Status in switch operator:
switch ( App::GetStatus() ){ case asStop: ... case asStart: ... };
Is the assignment in this case, an atomic operation?
Is this correct code?
Thanks.
There is no portable way to implement synchronized variables in C99 or C++03 and pthread library does not provide one either. You can:
Use C++0x <atomic> header (or C1x <stdatomic.h>). Gcc does support it for C++ if given -std=c++0x or -std=gnu++0x option since version 4.4.
Use the Linux-specific <linux/atomic.h> (this is implementation used by kernel, but it should be usable from userland as well).
Use GCC-specific __sync_* builtin functions.
Use some other library that provides atomic operations like glib.
Use locks, but that's orders of magnitude slower compared to the fast operation itself.
Note: As Martinho pointed out, while they are called "atomic", for store and load it's not the atomic property (operation cannot be interrupted and load always sees or does not see the whole store, which is usually true of 32-bit stores and loads) but the ordering property (if you store a and than b, nobody may get new value of b and than old value of a) that is hard to get but necessary in this case.
This depends entirely upon the enum representation chosen. For x86, I believe, then all assignment operations of the operating system's word size (so 32bit for x86 and 64bit for x64) and alignment of that size as well are atomic, so a simple read and write is atomic.
Even assuming that it is the correct size and alignment, this doesn't mean that these functions are thread-safe, depends on what the status is used for.
Edit: In addition, the compiler's optimizer may well wreak havoc if there are no uses of atomic operations or other volatile accesses.
Edit to your edit: No, that's not thread safe at all. If you converted it manually into a jump table then you might be thread-safe, I'll need to think about it for a little while.
On certain architecture this assignment might be atomic (by accident), but even if it is, this code is wrong. Compiler and hardware may perform various optimizations, with might break this "atomicity". Look at: http://video.google.com/videoplay?docid=-4714369049736584770#
Use locks or atomic http://www.stdthread.co.uk/doc/headers/atomic/atomic.html variable to fix it.
Related
(Note: I've added tags to this question based on where I feel will people will be who are likely to be able to help, so please don't shout:))
In my VS 2017 64bit project, I have a 32bit long value m_lClosed. When I want to update this, I use one of the Interlocked family of functions.
Consider this code, executing on thread #1
LONG lRet = InterlockedCompareExchange(&m_lClosed, 1, 0); // Set m_lClosed to 1 provided it's currently 0
Now consider this code, executing on thread #2:
if (m_lClosed) // Do something
I understand that on a single CPU, this will not be a problem because the update is atomic and the read is atomic too (see MSDN), so thread pre-emption cannot leave the variable in a partially updated state. But on a multicore CPU, we really could have both these pieces of code executing in parallel if each thread is on a different CPU. In this example, I don't think that would be a problem, but it still feels wrong to be testing something that is in the process of possibly being updated.
This webpage tells me that atomicity on multiple CPUs is achieved via the LOCK assembly instruction, preventing other CPUs from accessing that memory. That sounds like what I need, but the assembly language generated for the if test above is merely
cmp dword ptr [l],0
... no LOCK instruction in sight.
How in an event like this are we supposed to ensure atomicity of the read?
EDIT 24/4/18
Firstly thanks for all the interest this question has generated. I show below the actual code; I purposely kept it simple to focus on the atomicity of it all, but clearly it would have been better if I had showed it all from minute one.
Secondly, the project in which the actual code lives is a VS2005 project; hence no access to C++11 atomics. That's why I didn't add the C++11 tag to the question. I am using VS2017 with a "scratch" project to save having to build the huge VS2005 one every time I make a change whilst I am learning. Plus, its a better IDE.
Right, so the actual code lives in an IOCP driven server, and this whole atomicity is about handling a closed socket:
class CConnection
{
//...
DWORD PostWSARecv()
{
if (!m_lClosed)
return ::WSARecv(...);
else
return WSAESHUTDOWN;
}
bool SetClosed()
{
LONG lRet = InterlockedCompareExchange(&m_lClosed, 1, 0); // Set m_lClosed to 1 provided it's currently 0
// If the swap was carried out, the return value is the old value of m_lClosed, which should be 0.
return lRet == 0;
}
SOCKET m_sock;
LONG m_lClosed;
};
The caller will call SetClosed(); if it returns true, it will then call ::closesocket() etc. Please don't question why it is that way, it just is :)
Consider what happens if one thread closes the socket whilst another tries to post a WSARecv(). You might think that the WSARecv() will fail (the socket is closed after all!); however what if a new connection is established with the same socket handle as that which we just closed - we would then be posting the WSARecv() which will succeed, but this would be fatal for my program logic since we are now associating a completely different connection with this CConnection object. Hence, I have the if (!m_lClosed) test. You could argue that I shouldn't be handling the same connection in multiple threads, but that is not the point of this question :)
That is why I need to test m_lClosed before I make the WSARecv() call.
Now, clearly, I am only setting m_lClosed to a 1, so a torn read/write is not really a concern, but it is the principle I am concerned about. What if I set m_lClosed to 2147483647 and then test for 2147483647? In this case, a torn read/write would be more problematic.
It really depends on your compiler and the CPU you are running on.
x86 CPUs will atomically read 32-bit values without the LOCK prefix if the memory address is properly aligned. However, you most likely will need some sort of memory barrier to control the CPUs out-of-order execution if the variable is used as a lock/count of some other related data. Data that is not aligned might not be read atomically, especially if the value straddles a page boundary.
If you are not hand coding assembly you also need to worry about the compilers reordering optimizations.
Any variable marked as volatile will have ordering constraints in the compiler (and possibly the generated machine code) when compiling with Visual C++:
The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent compiler re-ordering only. With Visual Studio 2003, volatile to volatile references are ordered; the compiler will not re-order volatile variable access. With Visual Studio 2005, the compiler also uses acquire semantics for read operations on volatile variables and release semantics for write operations on volatile variables (when supported by the CPU).
Microsoft specific volatile keyword enhancements:
When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects. In particular:
A write to a volatile object (also known as volatile write) has Release semantics; that is, a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.
A read of a volatile object (also known as volatile read) has Acquire semantics; that is, a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
This enables volatile objects to be used for memory locks and releases in multithreaded applications.
For architectures other than ARM, if no /volatile compiler option is specified, the compiler performs as if /volatile:ms were specified; therefore, for architectures other than ARM we strongly recommend that you specify /volatile:iso, and use explicit synchronization primitives and compiler intrinsics when you are dealing with memory that is shared across threads.
Microsoft provides compiler intrinsics for most of the Interlocked* functions and they will compile to something like LOCK XADD ... instead of a function call.
Until "recently", C/C++ had no support for atomic operations or threads in general but this changed in C11/C++11 where atomic support was added. Using the <atomic> header and its types/functions/classes moves the alignment and reordering responsibility to the compiler so you don't have to worry about that. You still have to make a choice regarding memory barriers and this determines the machine code generated by the compiler. With relaxed memory order, the load atomic operation will most likely end up as a simple MOV instruction on x86. A stricter memory order can add a fence and possibly the LOCK prefix if the compiler determines that the target platform requires it.
In C++11, an unsynchronized access to a non-atomic object (such as m_lClosed) is undefined behavior.
The standard provides all the facilities you need to write this correctly; you do not need non-portable functions such as InterlockedCompareExchange.
Instead, simply define your variable as atomic:
std::atomic<bool> m_lClosed{false};
// Writer thread...
bool expected = false;
m_lClosed.compare_exhange_strong(expected, true);
// Reader...
if (m_lClosed.load()) { /* ... */ }
This is more than sufficient (it forces sequential consistency, which might be expensive). In some cases it might be possible to generate slightly faster code by relaxing the memory order on the atomic operations, but I would not worry about that.
As I posted here, this question was never about protecting a critical section of code, it was purely about avoiding torn read/writes. user3386109 posted a comment here which I ended up using, but declined posting it as an answer here. Thus I am providing the solution I ended up using for completeness of this question; maybe it will help someone in the future.
The following shows the atomic setting and testing of m_lClosed:
long m_lClosed = 0;
Thread 1
// Set flag to closed
if (InterlockedCompareExchange(&m_lClosed, 1, 0) == 0)
cout << "Closed OK!\n";
Thread 2
This code replaces if (!m_lClosed)
if (InterlockedCompareExchange(&m_lClosed, 0, 0) == 0)
cout << "Not closed!";
OK so as it turns out this really isn't necessary; this answer explains in detail why we don't need to use any interlocked operations for a simple read/write (but we do for a read-modify-write).
in volatile: The Multithreaded Programmer's Best Friend, Andrei Alexandrescu gives this example:
class Gadget
{
public:
void Wait()
{
while (!flag_)
{
Sleep(1000); // sleeps for 1000 milliseconds
}
}
void Wakeup()
{
flag_ = true;
}
...
private:
bool flag_;
};
he states,
... the compiler concludes that it can cache flag_ in a register ... it harms correctness: after you call Wait for some Gadget object, although another thread calls Wakeup, Wait will loop forever. This is because the change of flag_ will not be reflected in the register that caches flag_.
then he offers a solution:
If you use the volatile modifier on a variable, the compiler won't cache that variable in registers — each access will hit the actual memory location of that variable.
now, other people mentioned on stackoverflow and elsewhere that volatile keyword doesn't really offer any thread-safety guarantees, and i should use std::atomic or mutex synchronization instead, which i do agree.
however, going the std::atomic route for example, which internally uses memory fences read_acquire and write_release (Acquire and Release Semantics), i don't see how it actually fixes the register-cache problem in particular.
in case of x86 for example, every load on x86/64 already implies acquire semantics and every store implies release semantics such that compiled code under x86 doesn't emit any actual memory barriers at all. (The Purpose of memory_order_consume in C++11)
g = Guard.load(memory_order_acquire);
if (g != 0)
p = Payload;
On Intel x86-64, the Clang compiler generates compact machine code for this example – one machine instruction per line of C++ source code. This family of processors features a strong memory model, so the compiler doesn’t need to emit special memory barrier instructions to implement the read-acquire.
so.... just assuming x86 arch for now, how does std::atomic solve the cache in registry problem? w/ no memory barrier instructions for read-acquire in compiled code, it seems to be the same as the compiled code for just regular read.
Did you notice that there was no load from just a register in your code? There was an explicit memory load from _Guard. So it did in fact prevent caching in a register.
Now how it does this is up to the specific platform's implementation of std::atomic, but it must do this.
And, by the way, Alexandrescu's reasoning is completely wrong for modern platforms. While it's true that volatile prevents the compiler from caching in a register, it doesn't prevent similar caching being done by the CPU or by hardware. On some platforms, it might happen to be adequate, but there is absolutely no reason to write gratuitously non-portable code that might break on a future CPU, compiler, library, or platform when a fully-portable alternative is readily available.
volatile is not necessary for any "sane" implementation when the Gadget example is changed to use std::atomic<bool>. The reason for this is not that the standard forbids the use of registers, instead (§29.3/13 in n3690):
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Of course, what constitutes "reasonable" is open to interpretation, and it's "should", not "shall", so an implementation might ignore the requirement without violating the letter of the standard. Typical implementations do not cache the results of atomic loads, nor (much) delay issuing an atomic store to the CPU, and thus leave the decision largely to the hardware. If you would like to enforce this behavior, you should use volatile std::atomic<bool> instead. In both cases, however, if another thread sets the flag, the Wait() should be finite, but if your compiler and/or CPU are so willing, can still take much longer than you would like.
Also note that a memory fence does not guarantee that a store becomes visible to another thread immediately nor any sooner than it otherwise would. So even if the compiler added fence instructions to Gadget's methods, they wouldn't help at all. Fences are used to guarantee consistency, not to increase performance.
Simple edition: In a C++ program I'm using two different threads to work with some integer variable. but I'm sure one is always writing some value into it and the other one is only reading That. do I still need to use mutex lock when reading/writing data?
Now details : The main idea is that first thread generates some information and saves them into an array, and the second thread reads data from that array and process them. this array represents a queue. meaning I have two index values pointing to the first and last item in queue. Now I'm wondering if I have to lock these two index values whenever I'm reading or writing values or is it ok to check them without locking? note that generator thread is the only thread changing index of queue_back, and processor thread has exclusive permission to change queue_front.
if makes any change I'm developing for a linux based system and the code is compiled using gcc.
PS: In some codes which use threading, I've seen keyword volatile around variables shared between different threads, do I need to use that too?
No the read and write are not atomic, You will need to synchronize it using some synchronization mechanism.
Also, You MUST mark the shared integer as volatile, otherwise the optimizer might think the variable is never updated in one of your threads.
gcc allows you to do atomic operations on int‘s, long‘s and long long‘s (and their unsigned counterparts).
Look up for the functions:
type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand (type *ptr, type value);
Do I still need to use mutex lock when reading/writing data?
Yes, you will need a lock. You may be interested in a more specific implementation called a read/write lock.
You can also use atomics and/or memory barriers. Using these will require a better understanding of your target architectures. Reproducing multithreading bugs can be very difficult, and these alternatives should be considered an optimization which may not be portable.
I've seen keyword volatile around variables shared between different threads, do I need to use that too?
Yikes. No! That is not a safe or portable solution to multithreaded reads and writes in C++. Use atomics, locks, copying, immutable and pure implementations (etc...) instead.
Interpretation of volatile can vary by platform and/or compiler, and it's not specified to operate any particular way in C or C++ for the purpose of multithreaded reads and writes (there's an old false legend that it can be used reliably as an atomic read/write). I once tested the effectiveness of volatile in a multithreaded C++ program for fun (on an intel-mac with apple's gcc). I won't provide the results because it worked well enough that some people might consider using it, although they should not because 'almost' isn't good enough.
And to qualify the use of volatile: It exists in my (large, strictly written, multithreading aware) codebase for the sole purpose of interfacing with platform dependent atomics APIs. And being entirely honest: there are a few other uses from earlier days, but they can and should be removed.
Yes, you need to synchronize access to the variable, either with a mutex, a critical section, interlocked access, etc to make sure that the reading thread does not read incomplete bytes while the writing thread is still saving them. This is especially important on multi-core/CPU systems, where the two threads can truely access the variable in parallel.
Reads and writes of properly aligned data no larger than a machine word (usually whatever int resolves to) are atomic on most major architectures. That does not mean every architecture.
This means that no, you cannot just read head and tail and expect that data is consistent. However, if for example sizeof(int) happens to be 4 and sizeof(short) happens to be 2, and if you don't care about "not mainstream" platforms, you can do some union trickery and get away without atomic operations or a mutex.
If you want your code to be portable, there is no way around proper locking or atomic compare/exchange.
About volatile, this does insert a memory barrier for Microsoft Visual C++ (as a compiler-specific sophistry) but the standard does not guarantee anything special other than that the compiler won't optimize the variable. Insofar, making something volatile does not really help much, and it guarantees thread safety in no way.
Yes, you need to protect the queue indexes in both the generator and the reader threads by using some kind of synchronization mechanism, like a mutex.
HTH
If you're really just sharing one single integer, then std::atomic<int> sounds like the right type to use, from the <atomic> header. (There should also be Boost or TR1 versions if you have an old compiler.) This ensures atomic reads and writes. There's no need for a volatile qualifier as far as I understand.
A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10 year old server CPUs.
We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.
int foo::bar()
{
//...
//code which may or may not access _protected.
pthread_mutex_lock(m);
int ret = _protected;
pthread_mutex_unlock(m);
return ret;
}
My concern is caching. Could the compiler place a copy of _protected on the stack or in a register, and use that stale value in the assignment? If not, what prevents that from happening? Are variations of this pattern vulnerable?
I presume that the compiler doesn't actually understand that pthread_mutex_lock() is a special function, so are we just protected by sequence points?
Thanks greatly.
Update: Alright, I can see a trend with answers explaining why volatile is bad. I respect those answers, but articles on that subject are easy to find online. What I can't find online, and the reason I'm asking this question, is how I'm protected without volatile. If the above code is correct, how is it invulnerable to caching issues?
Simplest answer is volatile is not needed for multi-threading at all.
The long answer is that sequence points like critical sections are platform dependent as is whatever threading solution you're using so most of your thread safety is also platform dependent.
C++0x has a concept of threads and thread safety but the current standard does not and therefore volatile is sometimes misidentified as something to prevent reordering of operations and memory access for multi-threading programming when it was never intended and can't be reliably used that way.
The only thing volatile should be used for in C++ is to allow access to memory mapped devices, allow uses of variables between setjmp and longjmp, and to allow uses of sig_atomic_t variables in signal handlers. The keyword itself does not make a variable atomic.
Good news in C++0x we will have the STL construct std::atomic which can be used to guarantee atomic operations and thread safe constructs for variables. Until your compiler of choice supports it you may need to turn to the boost library or bust out some assembly code to create your own objects to provide atomic variables.
P.S. A lot of the confusion is caused by Java and .NET actually enforcing multi-threaded semantics with the keyword volatile C++ however follows suit with C where this is not the case.
Your threading library should include the apropriate CPU and compiler barriers on mutex lock and unlock. For GCC, a memory clobber on an asm statement acts as a compiler barrier.
Actually, there are two things that protect your code from (compiler) caching:
You are calling a non-pure external function (pthread_mutex_*()), which means that the compiler doesn't know that that function doesn't modify your global variables, so it has to reload them.
As I said, pthread_mutex_*() includes a compiler barrier, e.g: on glibc/x86 pthread_mutex_lock() ends up calling the macro lll_lock(), which has a memory clobber, forcing the compiler to reload variables.
If the above code is correct, how is it invulnerable to caching
issues?
Until C++0x, it is not. And it is not specified in C. So, it really depends on the compiler. In general, if the compiler does not guarantee that it will respect ordering constraints on memory accesses for functions or operations that involve multiple threads, you will not be able to write multithreaded safe code with that compiler. See Hans J Boehm's Threads Cannot be Implemented as a Library.
As for what abstractions your compiler should support for thread safe code, the wikipedia entry on Memory Barriers is a pretty good starting point.
(As for why people suggested volatile, some compilers treat volatile as a memory barrier for the compiler. It's definitely not standard.)
The volatile keyword is a hint to the compiler that the variable might change outside of program logic, such as a memory-mapped hardware register that could change as part of an interrupt service routine. This prevents the compiler from assuming a cached value is always correct and would normally force a memory read to retrieve the value. This usage pre-dates threading by a couple decades or so. I've seen it used with variables manipulated by signals as well, but I'm not sure that usage was correct.
Variables guarded by mutexes are guaranteed to be correct when read or written by different threads. The threading API is required to ensure that such views of variables are consistent. This access is all part of your program logic and the volatile keyword is irrelevant here.
With the exception of the simplest spin lock algorithm, mutex code is quite involved: a good optimized mutex lock/unlock code contains the kind of code even excellent programmer struggle to understand. It uses special compare and set instructions, manages not only the unlocked/locked state but also the wait queue, optionally uses system calls to go into a wait state (for lock) or wake up other threads (for unlock).
There is no way the average compiler can decode and "understand" all that complex code (again, with the exception of the simple spin lock) no matter way, so even for a compiler not aware of what a mutex is, and how it relates to synchronization, there is no way in practice a compiler could optimize anything around such code.
That's if the code was "inline", or available for analyse for the purpose of cross module optimization, or if global optimization is available.
I presume that the compiler doesn't actually understand that
pthread_mutex_lock() is a special function, so are we just protected
by sequence points?
The compiler does not know what it does, so does not try to optimize around it.
How is it "special"? It's opaque and treated as such. It is not special among opaque functions.
There is no semantic difference with an arbitrary opaque function that can access any other object.
My concern is caching. Could the compiler place a copy of _protected
on the stack or in a register, and use that stale value in the
assignment?
Yes, in code that act on objects transparently and directly, by using the variable name or pointers in a way that the compiler can follow. Not in code that might use arbitrary pointers to indirectly use variables.
So yes between calls to opaque functions. Not across.
And also for variables which can only be used in the function, by name: for local variables that don't have either their address taken or a reference bound to them (such that the compiler cannot follow all further uses). These can indeed be "cached" across arbitrary calls include lock/unlock.
If not, what prevents that from happening? Are variations of this
pattern vulnerable?
Opacity of the functions. Non inlining. Assembly code. System calls. Code complexity. Everything that make compilers bail out and think "that's complicated stuff just make calls to it".
The default position of a compiler is always the "let's execute stupidly I don't understand what is being done anyway" not "I will optimize that/let's rewrite the algorithm I know better". Most code is not optimized in complex non local way.
Now let's assume the absolute worse (from out point of view which is that the compiler should give up, that is the absolute best from the point of view of an optimizing algorithm):
the function is "inline" (= available for inlining) (or global optimization kicks in, or all functions are morally "inline");
no memory barrier is needed (as in a mono-processor time sharing system, and in a multi-processor strongly ordered system) in that synchronization primitive (lock or unlock) so it contains no such thing;
there is no special instruction (like compare and set) used (for example for a spin lock, the unlock operation is a simple write);
there is no system call to pause or wake threads (not needed in a spin lock);
then we might have a problem as the compiler could optimize around the function call. This is fixed trivially by inserting a compiler barrier such as an empty asm statement with a "clobber" for other accessible variables. That means that compiler just assumes that anything that might be accessible to a called function is "clobbered".
or whether the protected variable needs to be volatile.
You can make it volatile for the usual reason you make things volatile: to be certain to be able to access the variable in the debugger, to prevent a floating point variable from having the wrong datatype at runtime, etc.
Making it volatile would actually not even fix the issue described above as volatile is essentially a memory operation in the abstract machine that has the semantics of an I/O operation and as such is only ordered with respect to
real I/O like iostream
system calls
other volatile operations
asm memory clobbers (but then no memory side effect is reordered around those)
calls to external functions (as they might do one the above)
Volatile is not ordered with respect to non volatile memory side effects. That makes volatile practically useless (useless for practical uses) for writing thread safe code in even the most specific case where volatile would a priori help, the case where no memory fence is ever needed: when programming threading primitives on a time sharing system on a single CPU. (That may be one of the least understood aspects of either C or C++.)
So while volatile does prevent "caching", volatile doesn't even prevent compiler reordering of lock/unlock operation unless all shared variables are volatile.
Locks/synchronisation primitives make sure the data is not cached in registers/cpu cache, that means data propagates to memory. If two threads are accessing/ modifying data with in locks, it is guaranteed that data is read from memory and written to memory. We don't need volatile in this use case.
But the case where you have code with double checks, compiler can optimise the code and remove redundant code, to prevent that we need volatile.
Example: see singleton pattern example
https://en.m.wikipedia.org/wiki/Singleton_pattern#Lazy_initialization
Why do some one write this kind of code?
Ans: There is a performance benefit of not accuiring lock.
PS: This is my first post on stack overflow.
Not if the object you're locking is volatile, eg: if the value it represents depends on something foreign to the program (hardware state).
volatile should NOT be used to denote any kind of behavior that is the result of executing the program.
If it's actually volatile what I personally would do is locking the value of the pointer/address, instead of the underlying object.
eg:
volatile int i = 0;
// ... Later in a thread
// ... Code that may not access anything without a lock
std::uintptr_t ptr_to_lock = &i;
some_lock(ptr_to_lock);
// use i
release_some_lock(ptr_to_lock);
Please note that it only works if ALL the code ever using the object in a thread locks the same address. So be mindful of that when using threads with some variable that is part of an API.
I have an application where 2 threads are running... Is there any certanty that when I change a global variable from one thread, the other will notice this change?
I don't have any syncronization or Mutual exclusion system in place... but should this code work all the time (imagine a global bool named dataUpdated):
Thread 1:
while(1) {
if (dataUpdated)
updateScreen();
doSomethingElse();
}
Thread 2:
while(1) {
if (doSomething())
dataUpdated = TRUE;
}
Does a compiler like gcc optimize this code in a way that it doesn't check for the global value, only considering it value at compile time (because it nevers get changed at the same thred)?
PS: Being this for a game-like application, it really doen't matter if there will be a read while the value is being written... all that matters is that the change gets noticed by the other thread.
Yes. No. Maybe.
First, as others have mentioned you need to make dataUpdated volatile; otherwise the compiler may be free to lift reading it out of the loop (depending on whether or not it can see that doSomethingElse doesn't touch it).
Secondly, depending on your processor and ordering needs, you may need memory barriers. volatile is enough to guarentee that the other processor will see the change eventually, but not enough to guarentee that the changes will be seen in the order they were performed. Your example only has one flag, so it doesn't really show this phenomena. If you need and use memory barriers, you should no longer need volatile
Volatile considered harmful and Linux Kernel Memory Barriers are good background on the underlying issues; I don't really know of anything similar written specifically for threading. Thankfully threads don't raise these concerns nearly as often as hardware peripherals do, though the sort of case you describe (a flag indicating completion, with other data presumed to be valid if the flag is set) is exactly the sort of thing where ordering matterns...
Here is an example that uses boost condition variables:
bool _updated=false;
boost::mutex _access;
boost::condition _condition;
bool updated()
{
return _updated;
}
void thread1()
{
boost::mutex::scoped_lock lock(_access);
while (true)
{
boost::xtime xt;
boost::xtime_get(&xt, boost::TIME_UTC);
// note that the second parameter to timed_wait is a predicate function that is called - not the address of a variable to check
if (_condition.timed_wait(lock, &updated, xt))
updateScreen();
doSomethingElse();
}
}
void thread2()
{
while(true)
{
if (doSomething())
_updated=true;
}
}
Use a lock. Always always use a lock to access shared data. Marking the variable as volatile will prevent the compiler from optimizing away the memory read, but will not prevent other problems such as memory re-ordering. Without a lock there is no guarantee that the memory writes in doSomething() will be visible in the updateScreen() function.
The only other safe way is to use a memory fence, either explicitly or an implicitly using an Interlocked* function for example.
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
The above will guarantee that any access to the variable will be to and from memory without any specific optimizations and as a result all threads running on the same processor will "see" changes to the variable with the same semantics as the code reads.
Chris Jester-Young pointed out that coherency concerns to such a variable value change may arise in a multi-processor systems. This is a consideration and it depends on the platform.
Actually, there are really two considerations to think about relative to platform. They are coherency and atomicity of the memory transactions.
Atomicity is actually a consideration for both single and multi-processor platforms. The issue arises because the variable is likely multi-byte in nature and the question is if one thread could see a partial update to the value or not. ie: Some bytes changed, context switch, invalid value read by interrupting thread. For a single variable that is at the natural machine word size or smaller and naturally aligned should not be a concern. Specifically, an int type should always be OK in this regard as long as it is aligned - which should be the default case for the compiler.
Relative to coherency, this is a potential concern in a multi-processor system. The question is if the system implements full cache coherency or not between processors. If implemented, this is typically done with the MESI protocol in hardware. The question didn't state platforms, but both Intel x86 platforms and PowerPC platforms are cache coherent across processors for normally mapped program data regions. Therefore this type of issue should not be a concern for ordinary data memory accesses between threads even if there are multiple processors.
The final issue relative to atomicity that arises is specific to read-modify-write atomicity. That is, how do you guarantee that if a value is read updated in value and the written, that this happen atomically, even across processors if more than one. So, for this to work without specific synchronization objects, would require that all potential threads accessing the variable are readers ONLY but expect for only one thread can ever be a writer at one time. If this is not the case, then you do need a sync object available to be able to ensure atomic actions on read-modify-write actions to the variable.
Your solution will use 100% CPU, among other problems. Google for "condition variable".
Chris Jester-Young pointed out that:
This only work under Java 1.5+'s memory model. The C++ standard does not address threading, and volatile does not guarantee memory coherency between processors. You do need a memory barrier for this
being so, the only true answer is implementing a synchronization system, right?
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
No, it's not certain. If you declare the variable volatile, then the complier is supposed to generate code that always loads the variable from memory on a read.
If the scope is right ( "extern", global, etc. ) then the change will be noticed. The question is when? And in what order?
The problem is that the compiler can and frequently will re-order your logic to fill all it's concurrent pipelines as a performance optimization.
It doesn't really show in your specific example because there aren't any other instructions around your assignment, but imagine functions declared after your bool assign execute before the assignment.
Check-out Pipeline Hazard on wikipedia or search google for "compiler instruction reordering"
As others have said the volatile keyword is your friend. :-)
You'll most likely find that your code would work when you had all of the optimisation options disabled in gcc. In this case (I believe) it treats everything as volatile and as a result the variable is accessed in memory for every operation.
With any sort of optimisation turned on the compiler will attempt to use a local copy held in a register. Depending on your functions this may mean that you only see the change in variable intermittently or, at worst, never.
Using the keyword volatile indicates to the compiler that the contents of this variable can change at any time and that it should not use a locally cached copy.
With all of that said you may find better results (as alluded to by Jeff) through the use of a semaphore or condition variable.
This is a reasonable introduction to the subject.