This question already has answers here:
Is it safe to share a volatile variable between the main program and an ISR in C?
(5 answers)
Closed 5 years ago.
In the following code snippet, an interrupt routine uses one of many arrays for its execution. The array used is selected synchronously, not asynchronously (it will never change while the ISR is executing). On a single core microcontroller (this question assumes an STM32L496 if the architecture is important), is the volatile specifier required in the declaration of foo?
int a[] = {1, 2, 3};
int b[] = {4, 5, 6};
int * foo; //int * volatile foo? int volatile * volatile foo?
main(){
disable_interrupt();
foo = a;
enable_interrupt();
...
disable_interrupt();
foo = b;
enable_interrupt();
}
void interrupt(){
//Use foo
}
My assumption is that the volatile specifier is not required because any caching of the value of foo will be correct.
EDIT:
To clarify, the final answer is that volatile or some other synchronisation is required because otherwise writes to foo can be omitted or reordered. Caching is not the only concern.
volatile stops the compiler optimizing it, forcing the compiler
To always read the memory, not a cached value from a register
To not move things before, or after the volatile read/write
On a complex CPU (e.g. x86), it is possible for the CPU to re-order operations before, or after a volatile access.
It is typically for memory-mapped-io, where regions of memory, are actually devices, and can change (even on a single core CPU), without visible cause.
The mechanism for C++11 is to use std::atomic to change a value which may occur on different threads of execution.
With a single core, the code will safely modify the value, and store it. If you use volatile, then it will be written to the memory point, before the interrupts are enabled.
If you don't use volatile, then the code may still have the new value in a register before it is used in the interrupt.
int * volatile foo;
Describes that foo can change, but the values it points to, are stable.
int volatile * volatile foo
Describes foo can change, and the things it points to can also change. I think you want int * volatile foo;
Update
For those who doubt that volatile is a compiler barrier.
From the standard n4296
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O
function, or calling a function that does any of those operations are all side effects, which are changes in the
state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes
both value computations (including determining the identity of an object for glvalue evaluation and fetching
a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call
to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered
complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile
access may not have completed yet.
and
From cppreference cv object
volatile object - an object whose type is volatile-qualified, or a subobject of a volatile object, or a mutable subobject of a const-volatile object. Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization (that is, within a single thread of execution, volatile accesses cannot be optimized out or reordered with another visible side effect that is sequenced-before or sequenced-after the volatile access. This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution, see std::memory_order). Any attempt to refer to a volatile object through a non-volatile glvalue (e.g. through a reference or pointer to non-volatile type) results in undefined behavior.
These seem to concur, that there is a compiler barrier, but some of the side effects of interacting with the volatile object may not have completed. For the single core processor, it appears to be a suitable mechanism if C++11 atomics are not available.
From : C++ standard : n4296
We have :-
Every value computation and side effect associated with a full-expression is sequenced before every value
computation and side effect associated with the next full-expression to be evaluated.
From this I understand, there is a happens-before relationship for any operation with a side-effect.
Access to volatile objects are evaluated strictly according to the rules of the abstract machine
From this I understand, that there are rules (which maybe opaque).
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O
function, or calling a function that does any of those operations are all side effects, which are changes in the
state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes
both value computations (including determining the identity of an object for glvalue evaluation and fetching
a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call
to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered
complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile
access may not have completed yet.
From this I understand that access to volatile (and a few other things), create a side effect, which stops the compiler from re-ordering statements near a volatile access.
Related
Consider this simple code:
void g();
void foo()
{
volatile bool x = false;
if (x)
g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimize out the potential call to g. This is correct in my understanding: The abstract machine is to assume that volatile variables may change at any moment (due to being e.g. hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.
I think [intro.execution] (paragraph number vary) could be used to explain MSVC behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime.34
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.
TL;DR The compiler can do whatever it wants on each volatile access. But the documentation has to tell you.--"The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines for a program permitted sequences of "volatile accesses" & other "observable behavior" (achieved via "side-effects") that an implementation must respect per "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract machine side effects & consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis a vis the abstract sequences of abstract machine side effects and/or consequent observable behaviors that some code (maybe) defines then say so. But if you are interested in what corresponding object code is generated then you must interpret that in the context of your compiler & compilation.
Chronically people wrongly believe that for volatile accesses an abstract machine evaluation/read causes an implemented read & an abstract machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/iff the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something--maybe, the generation of certain object code.
I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only hapen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.
I noticed that clang and gcc optimize away the construction of or assignment to a volatile struct declared on the stack, in some scenarios. For example, the following code:
struct nonvol2 {
uint32_t a, b;
};
void volatile_struct2()
{
volatile nonvol2 temp = {1, 2};
}
Compiles on clang to:
volatile_struct2(): # #volatile_struct2()
ret
On the other hand, gcc does not remove the stores, although it does optimize the two implied stores into a single one:
volatile_struct2():
movabs rax, 8589934593
mov QWORD PTR [rsp-8], rax
ret
Oddly, clang won't optimize away a volatile store to a single int variable:
void volatile_int() {
volatile int x = 42;
}
Compiles to:
volatile_int(): # #volatile_int()
mov dword ptr [rsp - 4], 1
ret
Furthermore a struct with 1 member rather than 2 is not optimized away.
Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:
typedef struct {
volatile uint32_t a, b;
} vol2;
void volatile_def2()
{
vol2 temp = {1, 2};
vol2 temp2 = {1, 2};
temp.a = temp2.a;
temp.a = temp2.a;
}
simply compiles down to a simple ret.
While it seems entirely "reasonable" to remove these stores which are pretty much impossible to observe by any reasonable process, my impression was that in the standard volatile loads and stores are assumed to be part of the observable behavior of the program (in addition to calls to IO functions), full stop. The implication being they are not subject to removal by "as if", since it would by definition change the observable behavior of the program.
Am I wrong about that, or is clang breaking the rules here? Perhaps construction is excluded from the cases where volatile must be assumed to have side effects?
From the point of view of the Standard, there is no requirement that implementations document anything about how any objects are physically stored in memory. Even if an implementation documents the behavior of using pointers of type unsigned char* to access objects of a certain type, an implementation would be allowed to physically store data some other way and then have the code for character-based reads and writes adjust behaviors suitably.
If an execution platform specifies a relationship between abstract-machine objects and storage seen by the CPU, and defines ways by which accesses to certain CPU addresses might trigger side effects the compiler doesn't know about, a quality compiler suitable for low-level programming on that platform should generate code where the behavior of volatile-qualified objects is consistent with that specification. The Standard makes no attempt to mandate that all implementations be suitable for low-level programming (or any other particular purpose, for that matter).
If the address of an automatic variable is never exposed to outside code, a volatile qualifier need only have only two effects:
If setjmp is called within a function, a compiler must do whatever is necessary to ensure that longjmp will not disrupt the values of any volatile-qualified objects, even if they were written between the setjmp and longjmp. Absent the qualifier, the value of objects written between setjmp and longjmp would become indeterminate when a longjmp is executed.
Rules which would allow a compiler to presume that any loops which don't have side effects will run to completion do not apply in cases where a volatile object is accessed within the loop, whether or not an implementation would define any means by which such access would be observable.
Except in those cases, the as-if rule would allow a compiler to implement the volatile qualifier in the abstract machine in a way that has no relation to the physical machine.
Let us investigate what the standard directly says. The behavior of volatile is defined by a pair of statements. [intro.execution]/7:
The least requirements on a conforming implementation are:
Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
...
And [intro.execution]/14:
Reading an object designated by a volatile glvalue (6.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
Well, [intro.execution]/14 does not apply because nothing in the above code constitutes "reading an object". You initialize it and destroy it; it is never read.
So that leaves [intro.execution]/7. The phrase of importance here is "accesses through volatile glvalues". While temp certainly is a volatile value, and it certainly is a glvalue... you never actually access through it. Oh yes, you initialize the object, but that doesn't actually access "though" temp as a glvalue.
That is, temp as an expression is a glvalue, per the definition of glvalue: "an expression whose evaluation determines the identity of an object, bit-field, or function." The statement creating and initializing temp results in a glvalue, but the initialization of temp isn't accessing through a glvalue.
Think of volatile like const. The rules about const objects don't apply until after it is initialized. Similarly, the rules about volatile objects don't apply until after it is initialized.
So there's a difference between volatile nonvol2 temp = {1, 2}; and volatile nonvol2 temp; temp.a = 1; temp.b = 2;. And Clang certainly does the right thing in that case.
That being said, the inconsistency of Clang with regard to this behavior (optimizing it out only when using a struct, and only when using a struct that contains more than one member) suggests that this is probably not a formal optimization by the writers of Clang. That is, they're not taking advantage of the wording so much as this just being an odd quirk of some accidental code coming together.
Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:
GCC's behavior here is:
Not in accord with the standard, as it is in violation of [intro.execution]/7, but
There's absolutely no way to prove that it isn't compliant with the standard.
Given the code you wrote, there is simply no way for a user to detect whether or not those reads and writes are actually happening. And I rather suspect that the moment you do anything to allow the outside world to see it, those changes will suddenly appear in the compiled code. However much the standard wishes to call it "observable behavior", the fact is that by C++'s own memory model, nobody can see it.
GCC gets away with the crime due to lack of witnesses. Or at least credible witnesses (anyone who could see it would be guilty of invoking UB).
So you should not treat volatile like some optimization off-switch.
Does volatile write to volatile const introduce undefined behavior? What if I drop volatile when writing?
volatile const int x = 42;
const volatile int *p = &x;
*(volatile int *)p = 8; // Does this line introduce undefined behavior?
*(int *)p = 16; // And what about this one?
Compilable code
It is undefined behavior (for both statements) as you attempt to modify the "initial" const object. From C11 (N1570) 6.7.3/p6 Type qualifiers (emphasis mine):
If an attempt is made to modify an object defined with a
const-qualified type through use of an lvalue with non-const-qualified
type, the behavior is undefined.
For completeness it may be worth adding, that Standard says also that:
If an attempt is made to refer to an object defined with a
volatile-qualified type through use of an lvalue with
non-volatile-qualified type, the behavior is undefined.
Hence latter statement, that is:
*(int *)p = 16;
is undefined for second phrase as well (it's a "double UB").
I believe that rules are the same for C++, but don't own copy of C++14 to confirm.
Writing to a variable that is originally const is undefined behaviour, so all your example writes to *p are undefined.
Removing volatile in itself is not undefined in and of itself.
However, if we have something like const volatile int *p = (const volatile int*)0x12340000; /* Address of hw register */, then removing volatile may cause the hardware to update the register value, but your program doesn't pick it up. (E.g. if we "busy wait" with while(*p & 0x01) ;, the compiler should reload the value pointed to by p every time, but while((*(const int *)p) & 1) ;, the compiler is entirely free to read the value ones, and re-use the initial value to loop forever if bit 0 is set).
You could of course have extern volatie int x; and then use const volatile int *p = &x; in some code, and x gets updated by some other piece of code outside of your current translation unit (e.g. another thread) - in which case removing either const or volatile is valid, but as above, you may "miss" updated values because the compiler doesn't expect the global value to get updated outside of your module unless you call functions. [Edit, no, taking away volatile by casting the value is also forbidden in the standard - it is however valid to add const or volatile to something, and then remove it again if it the original object referred to did not have it].
Edti2: volatile is needed to tell the compiler that "the value may change at any time, even if you think nothing should have changed it". This happens, in general, in two situations:
Hardware registers that are updated outside of the software altogether - such as status registers for a serial port, a timer-counter register, or interrupt status of the interrupt controller, to name a few case - there are thousands of other variations, but it's all the same idea: The hardware changes the value, without direct relation to the software that is accessing such registers.
Variabled updated by another thread within the process (or in case of shared memory, by another process) - again, the compiler won't be able to "see" such changes. Often, one can write code that appear to function correctly by calling a function inside wait-loops and such, and the compiler will the reload values of non-local variables anyway, but some time later the compiler decides to inline that function call, and then realizes that the code is not updating that value, so doesn't reload the value -> bug when you inspect the "updated" value and find the same old value that the compiler had already loaded earlier.
Note also that volatile is NOT a guarantee for any kind of thread/process correctness - it will just guarantee that the compiler doesn't skip over reads or writes to that variable. It is still up to the programmer to ensure that for example multiple values that are dependent on ordering are indeed updated in the correct order, and on systems where caches are not coherent between processing units, that the caches are made coherent via the software [for example a CPU and a GPU may not use coherent memory updates, so a write by the CPU does not reach the GPU until the cache has been flushed on the CPU - no amount of applying volatile to the code will fix this]
Is a C++11-compliant compiler allowed to optimize/transform this code from:
bool x = true; // *not* an atomic type, but suppose bool can be read/written atomically
/*...*/
{
while (x); // spins until another thread changes the value of x
}
to anything equivalent to an infinite loop:
{
while (true); // infinite loop
}
The above conversion is certainly valid from the point of view of a single-thread program, but this is not the general case.
Also, was that optimization allowed in pre-C++11?
Absolutely.
Since x is not marked as volatile and appears to be a local object with automatic storage duration and internal linkage, and the program does not modify it, the two programs are equivalent.
In both C++03 and C++11 this is by the as-if rule, since accessing a non-volatile object is not considered to be a "side effect" of the program:
[C++11: 1.9/12]: Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes
both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
C++11 does make room for a global object to have its value changed in one thread then that new value read in another:
[C++11: 1.10/3]: The value of an object visible to a thread T at a particular point is the initial value of the object, a value assigned to the object by T, or a value assigned to the object by another thread, according to the rules below.
However, if you're doing this, since your object is not atomic:
[C++11: 1.10/21]: The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
And, when undefined behaviour is invoked, anything can happen.
Bootnote
[C++11: 1.10/25]: An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
Again, note that the object would have to be atomic (say, std::atomic<bool>) to obtain this guarantee.
The compiler is allowed go do anything to those two loops. Including terminating the program. Because infinite loops have undefined behaviour if they do not perform a synchronization-like operation (do something that requires synchronization with another thread or I/O), according to the C++ memory model:
Note that it means that a program with endless recursion or endless loop (whether implemented as a for-statement or by looping goto or otherwise) has undefined behavior.
I know when reading from a location of memory which is written to by several threads or processes the volatile keyword should be used for that location like some cases below but I want to know more about what restrictions does it really make for compiler and basically what rules does compiler have to follow when dealing with such case and is there any exceptional case where despite simultaneous access to a memory location the volatile keyword can be ignored by programmer.
volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
//function body
}
What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threadded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers and the setjmp machine code instruction.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself it still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
My advice to you is to completely abandon volatile as a tool in writing multithreadded applications (edit:) until you really know what you're doing and why. It has some benefit but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).
A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
Declaring a variable as volatile means the compiler can't make any assumptions about the value that it could have done otherwise, and hence prevents the compiler from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you mark i as volatile, you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
The compiler is not allowed to optimize away reads of a volatile object in a loop, which otherwise it'd normally do (i.e. strlen()).
It's commonly used in embedded programming when reading a hardware registry at a fixed address, and that value may change unexpectedly. (In contrast with "normal" memory, that doesn't change unless written to by the program itself...)
That is it's main purpose.
It could also be used to make sure one thread see the change in a value written by another, but it in no way guarantees atomicity when reading/writing to said object.