"volatile" qualifier and compiler reorderings

A compiler cannot eliminate or reorder reads/writes to volatile-qualified variables.
But what about the cases where other variables are present, which may or may not be volatile-qualified?
Scenario 1
volatile int a;
volatile int b;
a = 1;
b = 2;
a = 3;
b = 4;
Can the compiler reorder the first and second, or the third and fourth assignments?
Scenario 2
volatile int a;
int b, c;
b = 1;
a = 1;
c = b;
a = 3;
Same question: can the compiler reorder the first and second, or the third and fourth assignments?

The C++ standard says (1.9/6):
The observable behavior of the
abstract machine is its sequence of
reads and writes to volatile data and
calls to library I/O functions.
In scenario 1, either of the changes you propose changes the sequence of writes to volatile data.
In scenario 2, neither change you propose changes the sequence. So they're allowed under the "as-if" rule (1.9/1):
... conforming implementations are
required to emulate (only) the
observable behavior of the abstract
machine ...
In order to tell that this has happened, you would need to examine the machine code, use a debugger, or provoke undefined or unspecified behavior whose result you happen to know on your implementation. For example, an implementation might make guarantees about the view that concurrently-executing threads have of the same memory, but that's outside the scope of the C++ standard. So while the standard might permit a particular code transformation, a particular implementation could rule it out, on grounds that it doesn't know whether or not your code is going to run in a multi-threaded program.
If you were to use observable behavior to test whether the re-ordering has happened or not (for example, printing the values of variables in the above code), then of course it would not be allowed by the standard.
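The two scenarios can be sketched as compilable functions, with comments marking what the as-if rule leaves open. The function names scenario1 and scenario2 are made up for illustration, and the comments restate the reasoning of this answer rather than the documented behavior of any particular compiler.

```cpp
// File-scope volatiles standing in for the question's variables.
volatile int a;
volatile int b;

// Scenario 1: all four stores are volatile accesses, so their relative
// order is part of the observable behavior and has to be preserved.
void scenario1() {
    a = 1;
    b = 2;
    a = 3;
    b = 4;
}

// Scenario 2: only the stores to a are volatile accesses. The sequence
// "a = 1, then a = 3" must be kept, but under this reading the ordinary
// accesses to the locals may be moved, merged, or removed as long as the
// returned value is unchanged.
int scenario2() {
    int b, c;   // ordinary locals; they shadow the volatile b above
    b = 1;
    a = 1;
    c = b;
    a = 3;
    return c;
}
```

Returning c is what makes the ordinary accesses matter at all; without the return, the assignments to b and c could simply be dropped.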

For scenario 1, the compiler should not perform any of the reorderings you mention. For scenario 2, the answer might depend on:
whether the b and c variables are visible outside the current function (either by being non-local or having had their address passed)
who you talk to (apparently there is some disagreement about how strict volatile is in C/C++)
your compiler implementation
So (softening my first answer), I'd say that if you're depending on certain behavior in scenario 2, you'd have to treat it as non-portable code whose behavior on a particular platform would have to be determined by whatever the implementation's documentation might indicate (and if the docs said nothing about it, then you're out of luck with a guaranteed behavior).
From C99 5.1.2.3/2 "Program execution":
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
...
(paragraph 5) The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
Here's a little of what Herb Sutter has to say about the required behavior of volatile accesses in C/C++ (from "volatile vs. volatile" http://www.ddj.com/hpc-high-performance-computing/212701484) :
what about nearby ordinary reads and writes -- can those still be reordered around unoptimizable reads and writes? Today, there is no practical portable answer because C/C++ compiler implementations vary widely and aren't likely to converge anytime soon. For example, one interpretation of the C++ Standard holds that ordinary reads can move freely in either direction across a C/C++ volatile read or write, but that an ordinary write cannot move at all across a C/C++ volatile read or write -- which would make C/C++ volatile both less restrictive and more restrictive, respectively, than an ordered atomic. Some compiler vendors support that interpretation; others don't optimize across volatile reads or writes at all; and still others have their own preferred semantics.
And for what it's worth, Microsoft documents the following for the C/C++ volatile keyword (as Microsoft-specific):
A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.
A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
This allows volatile objects to be used for memory locks and releases in multithreaded applications.
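For comparison, a portable way to ask for the same release/acquire ordering is C++11's std::atomic. This is only a sketch: the names payload, ready, publish and try_consume are invented for the example and do not come from the Microsoft documentation.

```cpp
#include <atomic>

int payload = 0;                  // ordinary (non-volatile) data
std::atomic<bool> ready{false};   // flag that publishes the data

// Writer: the release store keeps the earlier write to payload from
// being moved after the store to ready.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader: the acquire load keeps the later read of payload from being
// moved before the load of ready. Returns true if data was available.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = payload;
    return true;
}
```

Unlike the Microsoft-specific volatile behavior, this ordering is required by the standard on every conforming C++11 implementation.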

Volatile is not a memory fence. Assignments to b and c in snippet #2 can be eliminated or performed whenever. Why would you want the declarations in #2 to cause the behavior of #1?

Some compilers regard accesses to volatile-qualified objects as a memory fence. Others do not. Some programs are written to require that volatile works as a fence. Others aren't.
Code which is written to require fences, running on platforms that provide them, may run better than code which is written to not require fences, running on platforms that don't provide them, but code which requires fences will malfunction if they are not provided. Code which doesn't require fences will often run slower on platforms that provide them than would code which does require the fences, and implementations which provide fences will run such code more slowly than those that don't.
A good approach may be to define a macro semi_volatile as expanding to nothing on systems where volatile implies a memory fence, or to volatile on systems where it doesn't. If variables that need to have accesses ordered with respect to other volatile variables but not to each other are qualified as semi_volatile, and that macro is defined correctly, reliable operation will be achieved on systems with or without memory fences, and the most efficient operation that can be achieved on systems with fences will be achieved. If a compiler actually implements a qualifier that works as required, semi_volatile could be defined as a macro that uses that qualifier and achieve even better code.
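A minimal sketch of that macro follows. VOLATILE_IMPLIES_FENCE is a hypothetical configuration macro that the build would have to define for compilers known to treat volatile accesses as fences; no compiler defines it on its own. The variable and function names are also invented.

```cpp
// VOLATILE_IMPLIES_FENCE is an assumed, user-supplied configuration macro.
#if defined(VOLATILE_IMPLIES_FENCE)
#  define semi_volatile            /* ordinary object; the fence orders it */
#else
#  define semi_volatile volatile   /* no fence; fall back to volatile */
#endif

semi_volatile int buffer_length;   // must be ordered against data_ready
volatile int data_ready;           // the access expected to act as the fence

void publish_length(int length) {
    buffer_length = length;   // must not move past the store below
    data_ready = 1;
}
```

Getting the configuration wrong in the "implies a fence" direction silently removes the ordering, so the macro should default to volatile, as it does here.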
IMHO, that's an area the Standard really should address, since the concepts involved are applicable on many platforms, and any platform where fences aren't meaningful can simply ignore them.

Related

Is a fundamental type volatile initialization an observable behavior?

Consider this function:
void f(void* loc)
{
auto p = new(loc) volatile int{42};
*p = 0;
}
I have checked the code generated by clang, gcc and CL; none of them elides the initialization. (Perhaps because the write may be seen by the hardware.)
Is it an extension provided by compilers to the standard? Does the standard allow compilers not to perform the write 42?
Actually, for objects of class type, it is specified that the constructor of an object is executed without consideration for the volatile qualifier [class.ctor]:
A constructor can be invoked for a const, volatile or const volatile object. const and volatile
semantics (10.1.7.1) are not applied on an object under construction. They come into effect when the
constructor for the most derived object (4.5) ends.

[intro.execution]/8 lists the minimum requirements for a conforming implementation; these are also known as “observable behavior”. The first requirement is that “Access to volatile objects are evaluated strictly according to the rules of the abstract machine.” The compiler is required to produce all observable behavior. In particular, it is not allowed to remove accesses to volatile objects. And note that “object” here is used in the compiler-writer’s sense: it includes built-in types.

This is not a coherent question because what it means for a compiler to perform a write is platform-specific. There is no platform-independent notion of performing a write other than perhaps seeing the effects of a write in a subsequent read.
As you see, typical compilers on x86 will emit a write instruction but no memory barrier. The CPU may reorder the write, coalesce it, or even avoid doing any write to main memory because of the way the platform's cache coherence works.
The reason they made this implementation choice is that it makes volatile work for a broad range of applications, including those where the standard requires it to work, and because it has acceptable performance consequences. The standard, being platform-neutral, doesn't dictate platform-specific decisions like this and compiler writers do not understand it to do that.
They could have forced every volatile access to be uncoalescable, un-reorderable, and pushed through the cache subsystem to main memory. But that would provide terrible performance and, on this platform, no significant benefits. So they don't do it, and they don't understand the C++ standard to suggest that there's some mythical observer on the memory bus who must see specific things. The very existence of a memory bus is platform-specific. The standard is not platform-specific.
You will sometimes see people argue, for example, that the standard somehow requires the compiler to issue instructions to do volatile writes in order but that it doesn't matter if the CPU coalesces or re-orders the writes. This is, frankly, silly. The C++ standard doesn't impose requirements on the instructions compilers generate but rather on what those instructions must actually do when executed. It doesn't distinguish between optimizations done by a CPU and optimizations done by a compiler and any such distinctions would be platform-specific anyway.
If the standard allows a CPU to re-order two writes, then it allows the compiler to re-order them. It does not, and cannot, make that kind of distinction. Of course, compiler writers may still decide that they will issue the writes in order even though the CPU can re-order them because that may make the most sense on their platform.

How to safely access memory mapped hardware register from C or C++ language level?

In C and C++ I usually access memory mapped hardware registers with the well known pattern:
typedef unsigned int uint32_t;
*((volatile uint32_t*)0xABCDEDCB) = value;
As far as I know, the only thing guaranteed by the C or C++ standard is that accesses to volatile variables are evaluated strictly according to the rules of the abstract machine.
How can I be sure that the compiler will not generate torn stores for the access for a 32-bit processor? For example the compiler is allowed to emit two 16-bit stores instead of a one 32-bit store, isn't it?
Are there any guarantees in this area made by gcc?

When speaking about MCUs, as far as I know there are no such guarantees. Even more, each case of accessing HW registers may be device-specific and often may have its own sequence, rules and/or set of assembler instructions. And it depends on the compiler implementation, too.
The only thing here that works for me is reading the datasheets concerning the concrete devices/compilers and following the examples.

If you are really worried use inline assembler. A single assembler instruction will not return until completed.
Also you must ensure that the memory page you are writing to is not cached otherwise the write may not be all the way through. On ARM memory barriers may be necessary as well.
Volatile is just an instruction which tells the compiler to make no assumptions about the content of the memory, since the value may be changed outside one's program, but it has no effect on read/write ordering. Use memory barriers or atomics if this is an issue.

Microsoft comment about ISO compliant usage of volatile
"The volatile keyword in C++11 ISO Standard code is to be used only for hardware access"
http://msdn.microsoft.com/en-us/library/12a04hfd.aspx
At least in the case of Microsoft C++ (going back to Visual Studio 2005), an example of a pointer to volatile type is shown:
http://msdn.microsoft.com/en-us/library/145yc477.aspx
Another reference, in this case C, which also includes examples of pointers to volatile types.
"static volatile objects model memory-mapped I/O ports, and static const volatile objects model memory-mapped input ports"
http://en.cppreference.com/w/c/language/volatile
Operations on volatile types are not allowed to be reordered by compiler or hardware, a requirement for hardware memory mapped access. However operations to a combination of volatile and non-volatile types may end up with reordered operations on the non-volatile types, making them non thread safe (all inter thread sharing of variables would require all of them to be volatile to be thread safe). Even if two threads only share volatile types, there's still a data race issue (one thread reads just before the other thread writes).
Microsoft compilers have a non-portable (to other compilers) extension to volatile, that makes them thread safe (/volatile:ms - Microsoft specific, used by default except for ARM processors).
Back to the original question, in the case of GCC, you can have the compiler generate assembly code to verify the operation is safe.
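One way to make that check easy is to isolate the access in small helpers and inspect what the compiler emits for them. The helper names write_reg32 and read_reg32 are hypothetical; this is a sketch of the usual pattern, not a guarantee about any compiler's output.

```cpp
#include <cstdint>

// Hypothetical helpers: in the abstract machine, each one performs exactly
// one volatile access of the register's full 32-bit width.
inline void write_reg32(std::uintptr_t address, std::uint32_t value) {
    *reinterpret_cast<volatile std::uint32_t*>(address) = value;
}

inline std::uint32_t read_reg32(std::uintptr_t address) {
    return *reinterpret_cast<volatile std::uint32_t*>(address);
}
```

Compiling a file that calls these with g++ -O2 -S writes the assembly to a .s file, where you can see whether the store is a single 32-bit instruction on your target. Whether that holds for every target and optimization level is something to confirm per toolchain.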

How can I be sure that the compiler will not generate torn stores for the access for a 32-bit processor? For example the compiler is allowed to emit two 16-bit stores instead of a one 32-bit store, isn't it?
Normally, the compiler can combine or split memory accesses under the as-if rule, as long as the observable behavior of the program is unchanged, since the observable behavior of access to ordinary objects is the effect on the object's value, and not the memory access itself.
However, accesses to volatile objects are part of the observable behavior of a program. Therefore the compiler can no longer combine or split memory transactions. In the section where the C++ Standard defines "observable behavior" it specifically says that "Access to volatile objects are evaluated strictly according to the rules of the abstract machine."
Please note that the code shown is still non-portable C++, because the C++ Standard only cares about whether the object accessed is volatile, and not about modifiers on the pointer used to form an lvalue for said access. You'd need to do something crazy like this example of placement-new, to force the existence of a volatile object:
// placement new requires #include <new>
*(new ((void*)0xABCDEDCB) volatile uint32_t) = value;

May accesses to volatiles be reordered?

Consider the following sequence of writes to volatile memory, which I've taken from David Chisnall's article at InformIT, "Understanding C11 and C++11 Atomics":
volatile int a = 1;
volatile int b = 2;
a = 3;
My understanding from C++98 was that these operations could not be reordered, per C++98 1.9:
conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as
explained below
...
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions
Chisnall says that the constraint on order preservation applies only to individual variables, writing that a conforming implementation could generate code that does this:
a = 1;
a = 3;
b = 2;
Or this:
b = 2;
a = 1;
a = 3;
C++11 repeats the C++98 wording that
conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as explained
below.
but says this about volatiles (1.9/8):
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
1.9/12 says that accessing a volatile glvalue (which includes the variables a, b, and c above) is a side effect, and 1.9/14 says that the side effects in one full expression (e.g., a statement) must precede the side effects of a later full expression in the same thread. This leads me to conclude that the two reorderings Chisnall shows are invalid, because they do not correspond to the ordering dictated by the abstract machine.
Am I overlooking something, or is Chisnall mistaken?
(Note that this is not a threading question. The question is whether a compiler is permitted to reorder accesses to different volatile variables in a single thread.)

IMO Chisnall's interpretation (as presented by you) is clearly wrong. The simpler case is C++98. The sequence of reads and writes to volatile data needs to be preserved and that applies to the ordered sequence of reads and writes of any volatile data, not to a single variable.
This becomes obvious, if you consider the original motivation for volatile: memory-mapped I/O. In mmio you typically have several related registers at different memory locations and the protocol of an I/O device requires a specific sequence of reads and writes to its set of registers - order between registers is important.
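The memory-mapped I/O argument can be made concrete with an imaginary device. The register layout, the kStart value and the send_byte function below are all invented for illustration; a real device's datasheet defines its own protocol.

```cpp
#include <cstdint>

// Hypothetical register block for an imaginary transmitter.
struct DeviceRegs {
    volatile std::uint32_t data;     // byte to transmit
    volatile std::uint32_t control;  // writing kStart begins the transfer
};

constexpr std::uint32_t kStart = 1u;

// The data register has to be loaded before the transfer is started, so
// the two volatile stores are separate statements, in protocol order.
inline void send_byte(DeviceRegs& device, std::uint8_t byte) {
    device.data = byte;
    device.control = kStart;
}
```

If the constraint applied only per variable, the store to control could be moved ahead of the store to data, and the device would transmit a stale byte.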
The C++11 wording avoids talking about an absolute sequence of reads and writes, because in multi-threaded environments there is not one single well-defined sequence of such events across threads - and that is not a problem, if these accesses go to independent memory locations. But I believe the intent is that for any sequence of volatile data accesses with a well-defined order the rules remain the same as for C++98 - the order must be preserved, no matter how many different locations are accessed in that sequence.
It is an entirely separate issue what that entails for an implementation. How (and even if) a volatile data access is observable from outside the program and how the access order of the program maps to externally observable events is unspecified. An implementation should probably give you a reasonable interpretation and reasonable guarantees, but what is reasonable depends on the context.
The C++11 standard leaves room for data races between unsynchronized volatile accesses, so there is nothing that requires surrounding these by full memory fences or similar constructs. If there are parts of memory that are truly used as external interface - for memory-mapped I/O or DMA - then it may be reasonable for the implementation to give you guarantees for how volatile accesses to these parts are exposed to consuming devices.
One guarantee can probably be inferred from the standard (see [intro.execution]): values of type volatile std::sig_atomic_t must have values compatible with the order of writes to them even in a signal handler - at least in a single-threaded program.
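The signal-handler case looks like this in practice. It is a small single-threaded sketch; the names signal_seen, on_signal and raise_and_check are made up for the example.

```cpp
#include <csignal>

// The one object type the standard lets a handler write portably.
volatile std::sig_atomic_t signal_seen = 0;

extern "C" void on_signal(int) {
    signal_seen = 1;
}

// Installs the handler, raises the signal synchronously, and reports
// whether the handler's write is visible afterwards.
bool raise_and_check() {
    std::signal(SIGINT, on_signal);
    signal_seen = 0;
    std::raise(SIGINT);
    return signal_seen == 1;
}
```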

You're right, he's wrong. Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full expressions i.e. are separated by what C++98 called a sequence point, or in C++11 terms one access is sequenced before the other.
Chisnall seems to be trying to explain why volatile is useless for writing thread-safe code, by showing a simple mutex implementation relying on volatile that would be broken by compiler reorderings. He's right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering things across the barrier, as needed for thread-safety.
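For contrast, here is a minimal sketch of a simple lock built on C++11 atomics rather than volatile. The class name SpinLock is invented, and this is not Chisnall's example; it only shows where the acquire and release orderings go.

```cpp
#include <atomic>

class SpinLock {
public:
    // Acquire ordering: accesses after a successful lock cannot be moved
    // before it, by the compiler or by the CPU.
    bool try_lock() {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    void lock() {
        while (!try_lock()) {
            // busy-wait until the holder calls unlock()
        }
    }

    // Release ordering: accesses before the unlock cannot be moved after it.
    void unlock() {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```

In real code std::mutex is usually the better choice; the spin lock is only here because it mirrors the kind of hand-rolled mutex the article discusses.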
See the bottom right cell of Table 1 at Sutter's informative volatile vs volatile article.

For the moment, I'm going to assume your a=3s are just a mistake in copying and pasting, and you really meant them to be c=3.
The real question here is one of the difference between evaluation, and how things become visible to another processor. The standards describe order of evaluation. From that viewpoint, you're entirely correct -- given assignments to a, b and c in that order, the assignments must be evaluated in that order.
That may not correspond to the order in which those values become visible to other processors though. On a typical (current) CPU, that evaluation will only write values out to the cache. The hardware can reorder things from there though, so (for example) writes out to main memory happen in an entirely different order. Likewise, if another processor attempts to use the values, it may see them as changing in a different order.
Yes, this is entirely allowable -- the CPU is still evaluating the assignments in exactly the order prescribed by the standard, so the requirements are met. The standard simply doesn't place any requirements on what happens after evaluation, which is what happens here.
I should add: on some hardware it is sufficient though. For example, the x86 uses cache snooping, so if another processor tries to read a value that's been updated by one processor (but is still only in the cache) the processor that has the current value will put a hold on the read by the other processor until the current value can be written out so the other processor will see the current value.
That's not the case with all hardware though. While maintaining that strict model keeps things simple, it's also fairly expensive both in terms of extra hardware to ensure consistency and in simple speed when/if you have a lot of processors.
Edit: if we ignore threading for a moment, the question gets a little simpler -- but not much. According to C++11, §1.9/12:
When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
As such, the accesses to volatile objects must be initiated in order, but not necessarily completed in order. Unfortunately, it's often the completion that's externally visible. As such, we pretty much come back to the usual as-if rule: the compiler can rearrange things as much as it wants, as long as it produces no externally visible change.

Looks like it can happen.
There is a discussion on this page:
http://gcc.gnu.org/ml/gcc/2003-11/msg01419.html

It depends on your compiler. For example, MSVC++ as of Visual Studio 2005 guarantees* volatiles will not be reordered (actually, what Microsoft did is give up and assume programmers will forever abuse volatile - MSVC++ now adds a memory barrier around certain usages of volatile). Other versions and other compilers may not have such guarantees.
Long story short: don't bet on it. Design your code properly, and don't misuse volatile. Use memory barriers instead or full-blown mutexes as necessary. C++11's atomic types will help.

C++98 doesn't say the instructions cannot be re-ordered.
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions
This says it's the actual sequence of the reads and writes themselves, not the instructions that generate them. Any argument that says that the instructions must reflect the reads and writes in program order could equally argue that the reads and writes to the RAM itself must occur in program order, and clearly that's an absurd interpretation of the requirement.
Simply put, this doesn't mean anything. There is no "one right place" to observe the orders of reads and writes (The RAM bus? The CPU bus? Between the L1 and L2 caches? From another thread? From another core?), so this requirement is essentially meaningless.
Versions of C++ prior to any references to threads clearly don't specify the behavior of volatile variables as seen from another thread. And C++11 (wisely, IMO) didn't change this but instead introduced sensible atomic operations with well-defined inter-thread semantics.
As for memory-mapped hardware, that's always going to be platform-specific. The C++ standard doesn't even pretend to address how that might be done properly. For example, the platform might be such that only a subset of memory operations are legal in that context, say ones that bypass a write posting buffer that can reorder, and the C++ standard certainly doesn't compel the compiler to emit the right instructions for that particular hardware device -- how could it?
Update: I see some downvotes because people don't like this truth. Unfortunately, it is true.
If the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.
Since nobody believes the standard requires the compiler to emit instructions to keep the CPU from reordering accesses to volatile variables, and modern compilers don't do this, nobody should believe the C++ standard prohibits the compiler from reordering accesses to distinct volatiles.

What Rules does compiler have to follow when dealing with volatile memory locations?

I know that when reading from a memory location that is written to by several threads or processes, the volatile keyword should be used for that location, as in the cases below. I want to know more about what restrictions it really places on the compiler: what rules does the compiler have to follow when dealing with such a case, and is there any exceptional case where, despite simultaneous access to a memory location, the programmer can omit the volatile keyword?
volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
//function body
}

What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threaded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers and setjmp/longjmp.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself is still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
My advice to you is to completely abandon volatile as a tool in writing multithreaded applications (edit:) until you really know what you're doing and why. It has some benefit but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
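The const-like use of volatile described above can be sketched as follows. This is a loose illustration of the idea, not Alexandrescu's actual code: the Counter class is invented, and the underlying object is deliberately declared non-volatile and only shared through a volatile reference, because accessing an object that was itself defined volatile through a non-volatile lvalue is undefined behavior.

```cpp
#include <mutex>

class Counter {
public:
    // Callable through a volatile reference: takes the lock, then casts
    // away volatile to reach the unsynchronized implementation.
    void increment() volatile {
        Counter& self = const_cast<Counter&>(*this);
        std::lock_guard<std::mutex> guard(self.mutex_);
        self.increment();   // resolves to the non-volatile overload
    }

    int value() volatile {
        Counter& self = const_cast<Counter&>(*this);
        std::lock_guard<std::mutex> guard(self.mutex_);
        return self.count_;
    }

    // Unsynchronized; the compiler rejects calls to it through a volatile
    // reference, which is the whole point of the technique.
    void increment() { ++count_; }

private:
    std::mutex mutex_;
    int count_ = 0;
};
```

The thread safety here comes entirely from the mutex; volatile only steers overload resolution so that unsynchronized calls on the shared reference fail to compile.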

It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
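A rough sketch of how that cast is sometimes wrapped up, assuming scalar types only (class types would need volatile-qualified assignment and copy operations). The helper names volatile_store and volatile_load are hypothetical, not from any library.

```cpp
#include <cassert>

// Store to `object` through a volatile-qualified lvalue.
// Intended for scalar T only (an assumption of this sketch).
template <typename T>
void volatile_store(T& object, const T& value) {
    *static_cast<volatile T*>(&object) = value;
}

// Load from `object` through a volatile-qualified lvalue.
template <typename T>
T volatile_load(const T& object) {
    return *static_cast<const volatile T*>(&object);
}

void demo_volatile_access() {
    int x = 0;
    volatile_store(x, 7);
    assert(volatile_load(x) == 7);
}
```

Only the accesses made through these helpers are volatile accesses; any plain read or write of x elsewhere is still an ordinary access that the compiler may move around them.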
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).

A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
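As a sketch of the serial-line case, the following simulates a transmit register with an ordinary volatile variable. On real hardware the register would live at a fixed address defined by the platform; fake_uart_tx and uart_write are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a memory-mapped transmit register (hypothetical).
volatile std::uint8_t fake_uart_tx = 0;

// Writes every byte of `data` to the register, in order.
// Because `tx` points to volatile data, each store must be emitted;
// the compiler may not keep only the final byte.
std::size_t uart_write(volatile std::uint8_t* tx, const char* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        *tx = static_cast<std::uint8_t>(data[i]);
    }
    return len;
}

void demo_uart_write() {
    assert(uart_write(&fake_uart_tx, "hi", 2) == 2);
    assert(fake_uart_tx == 'i');   // the register holds the last byte written
}
```

Without the volatile qualifier on the pointee, a compiler would be entitled to collapse the loop into a single store of the last byte.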

Declaring a variable volatile means the compiler cannot make the assumptions about its value that it otherwise could, which prevents it from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you declare the pointed-to value as volatile (volatile int *i), you're telling the compiler that some external source could have modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
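A small sketch of the same idea with the qualifier spelled out. Note that it is the pointee that has to be volatile (volatile int *p), not the pointer itself (int * volatile p). The function and variable names here are hypothetical.

```cpp
#include <cassert>

// Reads the same location twice through a pointer-to-volatile.
// Both reads must be performed; the compiler may not reuse the
// value from line A when evaluating line B.
int sum_two_reads(const volatile int* p) {
    int first = *p;    // line A
    int second = *p;   // line B
    return first + second;
}

void demo_two_reads() {
    volatile int sensor = 21;   // stand-in for a value something external could change
    assert(sum_two_reads(&sensor) == 42);
}
```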

The compiler is not allowed to optimize away reads of a volatile object in a loop, which it would otherwise normally do (e.g. hoisting a strlen() call out of the loop).
It's commonly used in embedded programming when reading a hardware register at a fixed address whose value may change unexpectedly. (In contrast with "normal" memory, which doesn't change unless written to by the program itself...)
That is its main purpose.
It could also be used to make sure one thread sees the change in a value written by another, but it in no way guarantees atomicity when reading/writing to said object.
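A sketch of the typical embedded polling loop, assuming a status register with a "ready" bit. The register is simulated here with a volatile variable, and wait_until_ready, the mask and the poll limit are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Polls `status` until any bit in `ready_mask` is set, or gives up
// after `max_polls` reads. The volatile pointee forces a fresh read
// on every iteration instead of one read hoisted out of the loop.
bool wait_until_ready(const volatile std::uint32_t* status,
                      std::uint32_t ready_mask,
                      unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if (*status & ready_mask) {
            return true;
        }
    }
    return false;
}

void demo_polling() {
    volatile std::uint32_t fake_status = 0x1;   // stand-in for a hardware register
    assert(wait_until_ready(&fake_status, 0x1, 10));
    fake_status = 0;
    assert(!wait_until_ready(&fake_status, 0x1, 10));
}
```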

Does using "pointer to volatile" prevent compiler optimizations at all times?

Here's the problem: your program temporarily uses some sensitive data and wants to erase it when it's no longer needed. Using std::fill() on its own won't always help - the compiler might decide that the memory block is not accessed later, so erasing it is a waste of time, and eliminate the erasing code.
User ybungalobill suggests using the volatile keyword:
{
char buffer[size];
//obtain and use password
std::fill_n( (volatile char*)buffer, size, 0);
}
The intent is that upon seeing the volatile keyword the compiler will not try to eliminate the call to std::fill_n().
Will the volatile keyword always prevent the compiler from eliminating such memory-modifying code?

The compiler is free to optimize your code out because buffer is not a volatile object.
The Standard only requires a compiler to strictly adhere to semantics for volatile objects. Here is what C++03 says
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and
subsequent evaluations have not yet occurred.
[...]
and
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions
In your example, what you have are reads and writes using volatile lvalues to non-volatile objects. C++0x removed the second text I quoted above, because it's redundant. C++0x just says
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.[...]
These collectively are referred to as the observable behavior of the program.
While one may argue that "volatile data" could maybe mean "data accessed by volatile lvalues", which would still be quite a stretch, the C++0x wording removed all doubts about your code and clearly allows implementations to optimize it away.
But as people pointed out to me, it probably does not matter in practice. A compiler that optimizes such a thing will most probably go against the programmer's intention (why would someone have a pointer to volatile otherwise) and so would probably contain a bug. Still, I have experienced compiler vendors that cited these paragraphs when they were faced with bug reports about their over-aggressive optimizations. In the end, volatile is inherently platform specific and you are supposed to double check the result anyway.
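For reference, the usual hand-rolled form of this idiom looks roughly like the sketch below. As argued above, the standard does not strictly guarantee the stores survive, because the buffer itself is not a volatile object, so treat this as a best-effort approach and check the generated code. secure_zero is a hypothetical name; where available, platform facilities such as SecureZeroMemory, explicit_bzero or memset_s exist for this purpose.

```cpp
#include <cassert>
#include <cstddef>

// Overwrites n bytes at p with zeros, writing through a
// pointer-to-volatile so each store is a volatile access.
// Best effort only: p does not point to a volatile object.
void secure_zero(void* p, std::size_t n) {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        bytes[i] = 0;
    }
}

void demo_secure_zero() {
    char password[8] = {'s', 'e', 'c', 'r', 'e', 't', '!', '\0'};
    secure_zero(password, sizeof password);
    for (std::size_t i = 0; i < sizeof password; ++i) {
        assert(password[i] == 0);
    }
}
```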

From the last C++0x draft [intro.execution]:
8 The least requirements on a conforming implementation are:
- Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
[...]
12 Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, [...]
So even the code you provided must not be optimized.

The memory content you wish to remove may have already been flushed out from your CPU/core's inner cache to RAM, where other CPUs can continue to see it. After overwriting it, you need to use a mutex / memory barrier instruction / atomic operation or something to trigger a sync with other cores. In practice, your compiler will probably do this before calling any external functions (google Dave Butenhof's post on volatile's dubious utility in multi-threading), so if your thread does that soon afterwards anyway then it's not a major issue. In summary: volatile isn't needed.

A conforming implementation may, at its leisure, defer the actual performance of any volatile reads and writes until the result of a volatile read would affect the execution of a volatile write or I/O operation.
For example, given something like:
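volatile unsigned char vol1, vol2;
extern unsigned char res[1000];
void test(int scale)
{
    // loop bound matches the size of res
    for (int i = 0; i < 1000; i++)
    {
        res[i] = i * vol1 * scale;  // one volatile read of vol1 per iteration
        vol2 = res[i];              // one volatile write to vol2 per iteration
    }
}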
a conforming compiler could, at its option, check whether scale is a multiple of 128 and, if so, clear out all even-indexed values of res before doing any reads from vol1 or writes to vol2 (for even i the product is then a multiple of 256, which truncates to 0 in an unsigned char). Even though the compiler would need to do each read from vol1 before it could do the following write to vol2, a compiler may be able to defer both operations until after it has run an essentially unlimited amount of code.
