I've looked at the standard but couldn't find any indication that simply writing to memory would be considered observable behaviour. If not, that would mean the compiled code need not actually write to that memory. If a compiler choose to optimize away such access anything involving mapper memory, or shared memory, may not work.
1.9-8 seems to defined a very limited observable behaviour but indicates an implementation may define more. Can one assume than any quality compiler would treat modifying memory as an observable behaviour? That is, it may not guarantee atomicity or ordering, but does guarantee that data will eventually be written.
So, have I overlooked something in the standard, or is the writing to memory merely something the compiler decides to do?
Statements from the current or C++0x standard are good. Please note I'm not talking about accessing memory through a function, I mean direct access, such as writing data to a pointer (perhaps retrieved via mmap or another library function).
This kind of thing is what volatile exists for. Else, writing to memory and never apparently reading from it is not observable behaviour. However, in the general case, it would be quite impossible for the optimizer to prove that you never read it back except in relatively trivial examples, so it's not usually an issue.
Can one assume than any quality compiler would treat modifying memory as an observable behaviour?
No. Volatile is meant for marking that. However, you cannot fully trust the compiler even after adding the volatile qualifier, at least as told by a 2008 paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
EDIT:
From C standard (not C++) http://c0x.coding-guidelines.com/5.1.2.3.html
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation defined. If you specify volatile qualifier then code must work according to the rules of an abstract machine.
Relevant parts in the standard are: 6.7.3 Type qualifiers (volatile description) and 5.1.2.3 Program execution (the abstract machine definition).
For some time now I know that many compilers actually have heuristics to detect cases when a variable should be reread again and when it is okay to use a cached copy. Volatile makes it clear to the compiler that every access to the variable should be actually an access to the memory. Without volatile it seems compiler is free to never reread the variable.
And BTW wrapping the access in a function doesn't change that since a function even without inline might be still inlined by the compiler within the current compilation unit.
From your question below:
Assume I use an array on the heap (unspecified where it is allocated),
and I use that array to perform a calculation (temp space). The
optimizer sees that it doesn't actually need any of that space as it
can use strictly registers. Does the compiler nonetheless write the
temp values to the memory?
Per MSalters below:
It's not guaranteed, and unlikely. Consider a a Static Single
Assignment optimizer. This figures out each possible write/read
dependency, and then assigns registers to optimize these dependencies.
As a side effect, any write that's not followed by a (possible) read
creates no dependencies at all, and is eliminated. In your example
("use strictly registers") the optimizer has satisfied all write/read
dependencies with registers, so it won't write to memory at all. All
reads produce the correct values, so it's a correct optimization.
Related
Consider this function:
void f(void* loc)
{
auto p = new(loc) volatile int{42};
*p = 0;
}
I have check the generated code by clang, gcc and CL, none of them elide the initialization. (The answer may be seen by the hardwer:).
Is it an extension provided by compilers to the standard? Does the standard allow compilers not to perform the write 42?
Actualy for objects of class type, it is specfied that constructor of an object is executed without consideration for the volatile qualifier [class.ctor]:
A constructor can be invoked for a const, volatile or const volatile object. const and volatile
semantics (10.1.7.1) are not applied on an object under construction. They come into effect when the
constructor for the most derived object (4.5) ends.
[intro.execution]/8 lists the minimum requirements for a conforming implementation; these are also known as “observable behavior”. The first requirement is that “Access to volatile objects are evaluated strictly according to the rules of the abstract machine.” The compiler is required to produce all observable behavior. In particular, it is not allowed to remove accesses to volatile objects. And note that “object” here is used in the compiler-writer’s sense: it includes built-in types.
This is not a coherent question because what it means for a compiler to perform a write is platform-specific. There is no platform-independent notion of performing a write other than perhaps seeing the effects of a write in a subsequent read.
As you see, typical compilers on x86 will emit a write instruction but no memory barrier. The CPU may reorder the write, coalesce it, or even avoid doing any write to main memory because of the way the platform's cache coherence works.
The reason they made this implementation choice is that it makes volatile work for a broad range of applications, including those where the standard requires it to work, and because it has acceptable performance consequences. The standard, being platform-neutral, doesn't dictate platform-specific decisions like this and compiler writers do not understand it to do that.
They could have forced every volatile access to be uncoalsecable, un-reorderable, and pushed through the cache subsystem to main memory. But that would provide terrible performance and, on this platform, no significant benefits. So they don't do it, and they don't understand the C++ standard to suggest that there's some mythical observer on the memory bus who must see specific things. The very existence of a memory bus is platform-specific. The standard is not platform-specific.
You will sometimes see people argue, for example, that the standard somehow requires the compiler to issue instructions to do volatile writes in order but that it doesn't matter if the CPU coalesces or re-orders the writes. This is, frankly, silly. The C++ standard doesn't impose requirements on the instructions compilers generate but rather on what those instructions must actually do when executed. It doesn't distinguish between optimizations done by a CPU and optimizations done by a compiler and any such distinctions would be platform-specific anyway.
If the standard allows a CPU to re-order two writes, then it allows the compiler to re-order them. It does not, and cannot, make that kind of distinction. Of course, compiler writers may still decide that they will issues the writes in order even though the CPU can re-order them because that may make the most sense on their platform.
The "as-if rule" gives the compiler the right to optimize out or reorder expressions that would not make a difference to the output and correctness of a program under certain rules, such as;
§1.9.5
A conforming implementation executing a well-formed program shall
produce the same observable behavior as one of the possible executions
of the corresponding instance of the abstract machine with the same
program and the same input.
The cppreference url I linked above specifically mentions special rules for the values of volatile objects, as well as for "new expressions", under C++14:
New-expression has another exception from the as-if rule: the compiler
may remove calls to the replaceable allocation functions even if a
user-defined replacement is provided and has observable side-effects.
I assume "replaceable" here is what is talked about for example in
§18.6.1.1.2
Replaceable: a C++ program may define a function with this function
signature that displaces the default version defined by the C++
standard library.
Is it correct that mem below can be removed or reordered under the as-if rule?
{
... some conformant code // upper block of code
auto mem = std::make_unique<std::array<double, 5000000>>();
... more conformant code, not using mem // lower block of code
}
Is there a way to ensure it's not removed, and stays between the upper and lower blocks of code? A well placed volatile (either/or volatile std::array or left of auto) comes to mind, but as there is no reading of mem, I think even that would not help under the as-if rule.
Side note; I've not been able to get visual studio 2015 to optimize out mem and the allocation at all.
Clarification: The way to observe this would be that the allocation call to the OS comes between any i/o from the two blocks. The point of this is for test cases and/or trying to get objects to be allocated at new locations.
Yes; No. Not within C++.
The abstract machine of C++ does not talk about system allocation calls at all. Only the side effects of such a call that impact the behavior of the abstract machine are fixed by C++, and even then the compiler is free to do something else, so long as-if it results in the same observable behavior on the part of the program in the abstract machine.
In the abstract machine, auto mem = std::make_unique<std::array<double, 5000000>>(); creates a variable mem. It, if used, gives you access to a large amount of doubles packed into an array. The abstract machine is free to throw an exception, or provide you with that large amount of doubles; either is fine.
Note that it is a legal C++ compiler to replace all allocations through new with an unconditional throw of an allocation failure (or returning nullptr for the no throw versions), but that would be a poor quality of implementation.
In the case where it is allocated, the C++ standard doesn't really say where it comes from. The compiler is free to use a static array, for example, and make the delete call a no-op (note it may have to prove it catches all ways to call delete on the buffer).
Next, if you have a static array, if nobody reads or writes to it (and the construction cannot be observed), the compiler is free to eliminate it.
That being said, much of the above relies on the compiler knowing what is going on.
So an approach is to make it impossible for the compiler to know. Have your code load a DLL, then pass a pointer to the unique_ptr to that DLL at the points where you want its state to be known.
Because the compiler cannot optimize over run-time DLL calls, the state of the variable has to basically be what you'd expect it to be.
Sadly, there is no standard way to dynamically load code like that in C++, so you'll have to rely upon your current system.
Said DLL can be separately written to be a noop; or, even, you can examine some external state, and conditionally load and pass the data to the DLL based on the external state. So long as the compiler cannot prove said external state will occur, it cannot optimize around the calls not being made. Then, never set that external state.
Declare the variable at the top of the block. Pass a pointer to it to the fake-external-DLL while uninitialized. Repeat just before initializing it, then after. Then finally, do it at the end of the block before destroying it, .reset() it, then do it again.
If I call a function which has a volatile parameter, and that parameter is not used, must the compiler nonetheless produce the parameter?
void consume( volatile int ) { }
...
consume( some_expr );
GCC does honour this, but I'm uncertain if the wording of volatile in the standards requires this. In my opinion GCC is doing the right thing -- this is logically an assignment to a volatile variable and thus should not be omitted (according to 1.9-8 of the c++ standard)
NOTE: The purpose of this is to prevent the optimizer from removing the evaluation of code. That is, it forces some_expr to be evaluated. It allows the expression to be optimized, but ensures it is actually performed.
I've add C and C++ as tags as the answer to both interests me should there be any differences. I don't think there would be however.
ANSWER: I've picked the first one as I believe it is the correct practical realization of the standard. However, Steve's philosophical viewpoint is very interesting and may in fact mean the standard is ambiguous.
The unnamed argument of consume cannot be read, since it's unnamed. However, it is initialized, and that initialization (with some_expr) is a visible side-effect. Therefore the compiler may not optimize the initialization out.
Whether this requires the actual evalution of some_expr is another matter. In general that is not a visible side-effect, but it could be if some_expr contains volatile sub-expressions.
[edit]
Please note that the "unnamed" part can occur in two places. The caller in general has no way to know whether a parameter is named (let alone used) E.g.
void consume( volatile int x);
consume( some_expr );
// other .cpp
void consume( volatile int ) { } // Same function.
It's possible to argue the opposite, but I think an implementation may omit accesses to automatic volatile objects, provided that they otherwise make no difference to the program and provided also that it either forbids the programmer from observing the stack (via unmapped or read-only pages, debuggers and so on), or at least warns that doing so won't always have the effect you expect. An unnamed parameter fits that description.
The reason is that the "as-if" rule talks about visible side-effects, and says that volatile accesses are visible side-effects. But if the program itself doesn't depend in any way on them, then the standard doesn't say that the object must actually be located "on the stack". The standard allows it to be in a special memory location provided by the hardware, that cannot be read by any means, only written to, and writing to it has no physical effect on the hardware. Since that's permitted, it seems absurd to say it's "required by the standard" that this no-effect write actually goes ahead. How would anyone know? The standard doesn't mandate that there must be instructions in the binary to perform the write, it mandates that the magic, unreadable secret object must assume a particular value. If I claim it has that value, you can't prove I'm wrong :-)
It's not possible to write a conforming program that relies on such a volatile access actually occurring, or to detect whether or not some_expr was evaluated. Really it's between you and your debugger whether it was computed or not. I've seen debuggers go wrong for less.
So, I think an implementation is allowed to break its own debugger, just as it's allowed not to provide a working debugger at all. I don't immediately see why it would want to. In practice I'd expect that marking the parameter volatile has about the same effect as moving the definition of the function somewhere the optimizer can't find it -- it will act to ensure that the value could be used.
I know when reading from a location of memory which is written to by several threads or processes the volatile keyword should be used for that location like some cases below but I want to know more about what restrictions does it really make for compiler and basically what rules does compiler have to follow when dealing with such case and is there any exceptional case where despite simultaneous access to a memory location the volatile keyword can be ignored by programmer.
volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
//function body
}
What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threadded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers and the setjmp machine code instruction.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself it still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
My advice to you is to completely abandon volatile as a tool in writing multithreadded applications (edit:) until you really know what you're doing and why. It has some benefit but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).
A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
Declaring a variable as volatile means the compiler can't make any assumptions about the value that it could have done otherwise, and hence prevents the compiler from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you mark i as volatile, you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
The compiler is not allowed to optimize away reads of a volatile object in a loop, which otherwise it'd normally do (i.e. strlen()).
It's commonly used in embedded programming when reading a hardware registry at a fixed address, and that value may change unexpectedly. (In contrast with "normal" memory, that doesn't change unless written to by the program itself...)
That is it's main purpose.
It could also be used to make sure one thread see the change in a value written by another, but it in no way guarantees atomicity when reading/writing to said object.
[edit] For background reading, and to be clear, this is what I am talking about: Introduction to the volatile keyword
When reviewing embedded systems code, one of the most common errors I see is the omission of volatile for thread/interrupt shared data. However my question is whether it is 'safe' not to use volatile when a variable is accessed via an access function or member function?
A simple example; in the following code...
volatile bool flag = false ;
void ThreadA()
{
...
while (!flag)
{
// Wait
}
...
}
interrupt void InterruptB()
{
flag = true ;
}
... the variable flag must be volatile to ensure that the read in ThreadA is not optimised out, however if the flag were read via a function thus...
volatile bool flag = false ;
bool ReadFlag() { return flag }
void ThreadA()
{
...
while ( !ReadFlag() )
{
// Wait
}
...
}
... does flag still need to be volatile? I realise that there is no harm in it being volatile, but my concern is for when it is omitted and the omission is not spotted; will this be safe?
The above example is trivial; in the real case (and the reason for my asking), I have a class library that wraps an RTOS such that there is an abstract class cTask that task objects are derived from. Such "active" objects typically have member functions that access data than may be modified in the object's task context but accessed from other contexts; is it critical then that such data is declared volatile?
I am really interested in what is guaranteed about such data rather than what a practical compiler might do. I may test a number of compilers and find that they never optimise out a read through an accessor, but then one day find a compiler or a compiler setting that makes this assumption untrue. I could imagine for example that if the function were in-lined, such an optimisation would be trivial for a compiler because it would be no different than a direct read.
My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation defined. If you specify volatile qualifier then code must work according to the rules of an abstract machine.
Relevant parts in the standard are: 6.7.3 Type qualifiers (volatile description) and 5.1.2.3 Program execution (the abstract machine definition).
For some time now I know that many compilers actually have heuristics to detect cases when a variable should be reread again and when it is okay to use a cached copy. Volatile makes it clear to the compiler that every access to the variable should be actually an access to the memory. Without volatile it seems compiler is free to never reread the variable.
And BTW wrapping the access in a function doesn't change that since a function even without inline might be still inlined by the compiler within the current compilation unit.
P.S. For C++ probably it is worth checking the C89 which the former is based on. I do not have the C89 at hand.
Yes it is critical.
Like you said volatile prevents code breaking optimization on shared memory [C++98 7.1.5p8].
Since you never know what kind of optimization a given compiler may do now or in the future, you should explicitly specify that your variable is volatile.
Of course, in the second example, writing/modifying variable 'flag' is omitted. If it is never written to, there is no need for it being volatile.
Concerning the main question
The variable has still to be flagged volatile even if every thread accesses/modifies it through the same function.
A function can be "active" simultaneously in several threads. Imagine that the function code is just a blueprint that gets taken by a thread and executed. If thread B interrupts the execution of ReadFlag in thread A, it simply executes a different copy of ReadFlag (with a different context, e.g. a different stack, different register contents). And by doing so, it could mess up the execution of ReadFlag in thread A.
In C, the volatile keyword is not required here (in the general sense).
From the ANSI C spec (C89), section A8.2 "Type Specifiers":
There are no
implementation-independent semantics
for volatile objects.
Kernighan and Ritchie comment on this section (referring to the const and volatile specifiers):
Except that it should diagnose
explicit attempts to change const
objects, a compiler may ignore these
qualifiers.
Given these details, you can't be guaranteed how a particular compiler interprets the volatile keyword, or if it ignores it altogether. A keyword that is completely implementation dependent shouldn't be considered "required" in any situation.
That being said, K&R also state that:
The purpose of volatile is to force
an implementation to suppress
optimization that could otherwise
occur.
In practice, this is how practically every compiler I have seen interprets volatile. Declare a variable as volatile and the compiler will not attempt to optimize accesses to it in any way.
Most of the time, modern compilers are pretty good about judging whether or not a variable can be safely cached or not. If you find that your particular compiler is optimizing away something that it shouldn't, then adding a volatile keyword might be appropriate. Be aware, though, that this can limit the amount of optimization that the compiler can do on the rest of the code in the function that uses the volatile variable. Some compilers are better about this than others; one embedded C compiler I used would turn off all optimizations for a function that accesses a volatile, but others like gcc seem to be able to still perform some limited optimizations.
Accessing the variable through an accessor function should prevent the function from caching the value. Even if the function is auto-inlined, each call to the function should re-call the function and re-fetch a new value. I have never seen a compiler that would auto-inline the accessor function and then optimize away the data re-fetch. I'm not saying it can't happen (since this is implementation-dependent behavior), but I wouldn't write any code that expects that to happen. Your second example is essentially placing a wrapper API around the variable, and libraries do this without using volatile all the time.
All in all, the treatment of volatile objects in C is implementation-dependent. There is nothing "guaranteed" about them according to the ANSI C89 spec.
Your code is sharing the volatile object between a thread and an interrupt routine. No compiler implementation (that I have ever seen) gives volatile enough power to be sufficient for handling parallel access. You should use some sort of locking mechanism to guarantee that the two threads (in your first example) don't step on each other's toes (even though one is an interrupt handler, you can still have parallel access on a multi-CPU or multi-core system).
Edit: I didn't read the code very closely and so I thought this was a question about thread synchronization, for which volatile should never be used, however this usage looks like it might be OK (Depending on how else the variable in question is used, and if the interrupt is always running such that it's view of memory is (cache-)coherent with the one that the thread sees. In the case of 'can you remove the volatile qualifier if you wrap it in a function call?' the accepted answer is correct, you cannot. I'm going to leave my original answer because it's important for people reading this question to know that volatile is almost useless outside of certain very special cases.
More Edit: Your RTOS use case may require additional protection above and beyond volatile, you may need to use memory barriers in some cases or make them atomic... I can't really tell you for sure, it's just something you need to be careful of (I'd suggest looking at the Linux kernel documentation link I have below though, Linux doesn't use volatile for that kind of thing, very probably with a good reason). Part of when you do and do not need volatile depends very strongly on the memory model of the CPU you're running on, and often volatile is not good enough.
volatile is the WRONG way to do this, it does NOT guarantee that this code will work, it wasn't meant for this kind of use.
volatile was intended for reading/writing to memory mapped device registers, and as such it is sufficient for that purpose, however it DOES NOT help when you're talking about stuff going between threads. (In particular the compiler is still aloud to re-order some reads and writes, as is the CPU while it's executing (this one's REALLY important since volatile doesn't tell the CPU to do anything special (sometimes it means bypass cache, but that's compiler/CPU dependent))
see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html, Intel developer article, CERT, Linux kernel documentation
Short version of those articles, volatile used the way you want to is both BAD and WRONG. Bad because it will make your code slower, wrong because it doesn't actually do what you want.
In practice, on x86 your code will function correctly with or without volatile, however it will be non-portable.
EDIT: Note to self actually read the code... this is the sort of thing volatile is meant to do.