Consider the following code, which is valid in both C and C++:
extern int output;
extern int input;
extern int error_flag;
void func(void)
{
    if (0 != error_flag)
    {
        output = -1;
    }
    else
    {
        output = input;
    }
}
Is the compiler allowed to compile the above code in the same way as if it looked like below?
extern int output;
extern int input;
extern int error_flag;
void func(void)
{
    output = -1;
    if (0 == error_flag)
    {
        output = input;
    }
}
In other words, is the compiler allowed to generate (from the first snippet) code that always makes a temporary assignment of -1 to output and then assigns the value of input to output depending on the status of error_flag?
Would the compiler be allowed to do it if output were declared volatile?
Would the compiler be allowed to do it if output were declared as atomic_int (stdatomic.h)?
Update after David Schwartz's comment:
If the compiler is free to add additional writes to a variable, it seems it is not possible to tell from the C code alone whether a data race exists. How can this be determined?
Yes, the speculative assignment is possible. Modification of a non-volatile variable is not part of the observable behaviour of the program and thus a spurious write is allowed. (See below for the definition of "observable behaviour", which does not actually include all behaviour which you might observe.)
No. If output is volatile, speculative or spurious mutations are not permitted because the mutation is part of observable behaviour. (Writing to -- or reading from -- a hardware register may have consequences other than just storing a value. This is one of the primary use cases of volatile.)
(Edited) No, the speculative assignment is not possible with atomic output. Loads and stores of atomic variables are synchronized operations, so it should not be possible to load a value of such a variable which was not explicitly stored into the variable.
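For illustration, here is the first snippet with output made atomic, written as a C++ sketch using std::atomic<int> as the analogue of C's atomic_int (the rewrite is mine, not from the question). A concurrent reader of output may only ever observe values that were actually stored, which is why the speculative -1 is ruled out:
#include <atomic>

extern std::atomic<int> output;   // analogue of C's atomic_int output
extern int input;
extern int error_flag;

void func(void)
{
    if (0 != error_flag)
    {
        output.store(-1);         // a reader may observe -1 only on this path
    }
    else
    {
        output.store(input);      // otherwise only the value of input
    }
}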
Observable behaviour
Although a program can do lots of obviously observable things (for example, abruptly terminating because of a segfault), the C and C++ standards only guarantee a limited set of results. Observable behaviour is defined in the C11 draft at §5.1.2.3p6 and in the current C++14 draft at §1.9p8 [intro.execution] with very similar wording:
The least requirements on a conforming implementation are:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program.
The above is taken from the C++ standard; the C standard differs in that in the second point it does not allow multiple possible results, and in the third point it explicitly references a relevant section of the standard library requirements. But details aside, the definitions are co-ordinated; for the purpose of this question, the relevant point is that only access to volatile variables is observable (up to the point that the value of a non-volatile variable is sent to an output device or file).
Data Races
This paragraph needs also to be read in the overall context of the C and C++ standards, which free the implementation from all requirements if the program engenders undefined behaviour. That's why the segfault is not considered in the definition of observable behaviour above: the segfault is a possible undefined behaviour but not a possible behaviour in a conformant program. So in the universe of only conformant programs and conformant implementations, there are no segfaults.
That's important because a program with a data race is not conformant. A data race has undefined behaviour, even if it seems innocuous. And since it is the responsibility of the programmer to avoid undefined behaviour, the implementation may optimize without regard to data races.
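To make that concrete for the original snippet, here is a hypothetical reader thread (the names are mine) that inspects output with no synchronization while func may be running; the read races with the writes in func, so the whole program has undefined behaviour and the spurious -1 may be observed even when error_flag is zero:
#include <thread>

extern int output;
void func(void);

void race_demo(void)
{
    std::thread writer(func);   // runs func concurrently
    int seen = output;          // unsynchronized read: data race, undefined behaviour
    (void)seen;                 // nothing can be assumed about the value observed
    writer.join();
}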
The exposition of the memory model in the C and C++ standards is dense and technical, and probably not suitable as an introduction to the concepts. (Browsing around the material on Hans Boehm's site will probably prove less difficult.) Extracting quotes from the standard is risky, because the details are important. But here is a small leap into the morass, from the current C++14 standard, §1.10 [intro.multithread]:
Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.
…
Two actions are potentially concurrent if
— they are performed by different threads, or
— they are unsequenced, and at least one is performed by a signal handler.
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
The take-away here is that a read and a write of the same variable need to be synchronized; otherwise it is a data race and the result is undefined behaviour. Some programmers might object to the strictness of this prohibition, arguing that some data races are "benign". This is the topic of Hans Boehm's 2011 HotPar paper "How to miscompile programs with "benign" data races" (pdf) (author's summary: "There are no benign data races"), and he explains it all much better than I could.
Synchronization here includes the use of atomic types, so it is not a data race to concurrently read and modify an atomic variable. (The result of the read is unpredictable, but it must be either the value before the modification or the value afterwards.) This prevents the compiler from performing "piecemeal" modification of an atomic variable without some explicit synchronization.
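A minimal sketch of that guarantee (variable and function names are hypothetical): with an atomic variable, a concurrent load and store are synchronized, so the loaded value is always one that was actually stored:
#include <atomic>
#include <thread>

std::atomic<int> shared{0};

void writer() { shared.store(42); }       // atomic store
int  reader() { return shared.load(); }   // atomic load: sees 0 or 42, never a torn or invented value

void demo()
{
    std::thread t(writer);
    int v = reader();                      // no data race, result is well-defined
    t.join();
    (void)v;
}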
After some thought and more research, my conclusion is that the compiler cannot perform speculative writes to atomic variables either. Consequently, I modified the answer to question 3, which I had originally answered "no".
Other useful references:
Bartosz Milewski: Dealing with Benign Data Races the C++ Way
Milewski deals with the precise issue of speculative writes to atomic variables, and concludes:
Can’t the compiler still do the same dirty trick, and momentarily store 42 in the owner variable? No, it can’t! Since the variable is declared atomic the compiler can no longer assume that the write can’t be observed by other threads.
Herb Sutter on Thread Safety and Synchronization
As usual, an accessible and well-written explanation.
Yes, the compiler is allowed to do that kind of optimization. In general, you can assume that the compiler (and the CPU too) can reorder your code as if it were running in a single thread. If you have more than one thread, you need to synchronize. If you don't synchronize, and your code writes to a memory location that is written to or read by another thread, your code contains a data race, which in C++ is undefined behavior.
volatile doesn't change the data race problem. However, IIRC, the compiler is not allowed to reorder reads and writes to a volatile variable.
When using atomic_int, the compiler can still perform certain optimizations. I don't think that the compiler can invent writes though (that could break a multithreaded program). However, it can still reorder operations, so be careful.
Related
Consider this simple code:
void g();
void foo()
{
    volatile bool x = false;
    if (x)
        g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimize out the potential call to g. This is correct in my understanding: The abstract machine is to assume that volatile variables may change at any moment (due to being e.g. hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.
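For reference, the non-local variant I have in mind looks roughly like this (a sketch, not the exact code I use); because the compiler generally cannot see every write to a variable at namespace scope, it is far less likely to constant-fold the read:
void g();

volatile bool x = false;   // namespace scope: the compiler cannot see every write

void foo()
{
    if (x)                 // the value must be re-read, and here it is also used
        g();
}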
I think [intro.execution] (paragraph numbers vary) could be used to explain MSVC's behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime.34
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.
TL;DR The compiler can do whatever it wants on each volatile access. But the documentation has to tell you: "The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines, for a program, permitted sequences of "volatile accesses" and other "observable behavior" (achieved via "side effects") that an implementation must respect under "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract-machine side effects and consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis-a-vis the abstract sequences of abstract-machine side effects and/or consequent observable behaviors that some code (maybe) defines, then say so. But if you are interested in what corresponding object code is generated, then you must interpret that in the context of your compiler and compilation.
Chronically, people wrongly believe that for volatile accesses an abstract-machine evaluation/read causes an implemented read, and an abstract-machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/if the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something--maybe, the generation of certain object code.
I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only happen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.
Suppose A, B, a, and b are all variables, and the addresses of A, B, a, and b are all different. Then, for the following code:
A = a;
B = b;
Do the C and C++ standards explicitly require that A = a; be executed strictly before B = b;? Given that the addresses of A, B, a, and b are all different, are compilers allowed to swap the execution order of the two statements for purposes such as optimization?
If the answer to my question is different in C and C++, I would like to know both.
Edit: The background of the question is the following. In board-game AI design, for optimization, people use a lock-less shared hash table, whose correctness strongly depends on the execution order unless a volatile restriction is added.
Both standards allow for these instructions to be performed out of order, so long as that does not change observable behaviour. This is known as the as-if rule:
What exactly is the "as-if" rule?
http://en.cppreference.com/w/cpp/language/as_if
Note that as is pointed out in the comments, what is meant by "observable behaviour" is the observable behaviour of a program with defined behaviour. If your program has undefined behaviour, then the compiler is excused from reasoning about that.
The compiler is only obligated to emulate the observable behavior of a program, so a reordering that does not violate that principle is allowed. That assumes the behavior is well defined; if your program contains undefined behavior such as a data race, then its behavior is unpredictable and, as commented, you would need some form of synchronization to protect the critical section.
A Useful Reference
An interesting article that covers this is Memory Ordering at Compile Time and it says:
The cardinal rule of memory reordering, which is universally followed
by compiler developers and CPU vendors, could be phrased as follows:
Thou shalt not modify the behavior of a single-threaded program.
An Example
The article provides a simple program where we can see this reordering:
int A, B; // Note: static storage duration so initialized to zero
void foo()
{
    A = B + 1;
    B = 0;
}
and shows at higher optimization levels B = 0 is done before A = B + 1, and we can reproduce this result using godbolt, which while using -O3 produces the following (see it live):
movl $0, B(%rip) #, B
addl $1, %eax #, D.1624
Why?
Why does the compiler reorder? The article explains it is exactly the same reason the processor does so, because of complexity of the architecture:
As I mentioned at the start, the compiler modifies the order of memory
interactions for the same reason that the processor does it –
performance optimization. Such optimizations are a direct consequence
of modern CPU complexity.
Standards
In the draft C++ standard this is covered in section 1.9 Program execution which says (emphasis mine going forward):
The semantic descriptions in this International Standard define a
parameterized nondeterministic abstract machine. This International
Standard places no requirement on the structure of conforming
implementations. In particular, they need not copy or emulate the
structure of the abstract machine. Rather, conforming implementations
are required to emulate (only) the observable behavior of the abstract
machine as explained below.5
footnote 5 tells us this is also known as the as-if rule:
This provision is sometimes called the “as-if” rule, because an
implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement
had been obeyed, as far as can be determined from the observable
behavior of the program. For instance, an actual implementation need
not evaluate part of an expression if it can deduce that its value is
not used and that no side effects affecting the observable behavior of
the program are produced.
The draft C99 and draft C11 standards cover this in section 5.1.2.3 Program execution, although we have to go to the index to see that it is called the as-if rule in the C standard as well:
as-if rule, 5.1.2.3
Update on Lock-Free considerations
The article An Introduction to Lock-Free Programming covers this topic well, and for the OP's concerns about a lock-less shared-hash-table implementation, this section is probably the most relevant:
Memory Ordering
As the flowchart suggests, any time you do lock-free programming for
multicore (or any symmetric multiprocessor), and your environment does
not guarantee sequential consistency, you must consider how to prevent
memory reordering.
On today’s architectures, the tools to enforce correct memory ordering
generally fall into three categories, which prevent both compiler
reordering and processor reordering:
A lightweight sync or fence instruction, which I’ll talk about in future posts;
A full memory fence instruction, which I’ve demonstrated previously;
Memory operations which provide acquire or release semantics.
Acquire semantics prevent memory reordering of operations which follow
it in program order, and release semantics prevent memory reordering
of operations preceding it. These semantics are particularly suitable
in cases when there’s a producer/consumer relationship, where one
thread publishes some information and the other reads it. I’ll also
talk about this more in a future post.
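To connect this to the OP's lock-less hash-table concern: the portable way to get the required ordering is not volatile but a release store paired with an acquire load. Here is a hypothetical sketch (the Entry layout and names are made up for illustration):
#include <atomic>

struct Entry { int key; int value; };

Entry slot;                              // payload written by the publishing thread
std::atomic<bool> slot_ready{false};     // publication flag

void publish(int key, int value)
{
    slot.key = key;                      // ordinary writes to the payload...
    slot.value = value;
    slot_ready.store(true, std::memory_order_release);   // ...published by the release store
}

bool consume(Entry& out)
{
    if (!slot_ready.load(std::memory_order_acquire))     // acquire pairs with the release
        return false;
    out = slot;                          // guaranteed to see the payload writes above
    return true;
}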
If there is no dependency between instructions, they may be executed out of order, as long as the final outcome is not affected. You can observe this while debugging code compiled at a higher optimization level.
Since A = a; and B = b; are independent in terms of data dependencies, this should not matter. If the output of the previous instruction affected the subsequent instruction's input, then ordering would matter; otherwise it does not. Normally this appears as strictly sequential execution.
My read is that this is required to work by the C++ standard; however if you're trying to use this for multithreading control, it doesn't work in that context because there is nothing here to guarantee the registers get written to memory in the right order.
As your edit indicates, you are trying to use it exactly where it will not work.
It may be of interest that if you do this:
{ A=a, B=b; /*etc*/ }
Note the comma in place of the semi-colon.
Then the C++ specification and any conforming compiler will have to guarantee the execution order, because operands of the comma operator are always evaluated left to right.
This can indeed be used to prevent the optimizer from subverting your thread synchronization by reordering. The comma effectively becomes a barrier across which reordering is not allowed.
Consider the following sequence of writes to volatile memory, which I've taken from David Chisnall's article at InformIT, "Understanding C11 and C++11 Atomics":
volatile int a = 1;
volatile int b = 2;
a = 3;
My understanding from C++98 was that these operations could not be reordered, per C++98 1.9:
conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below
...
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions
Chisnall says that the constraint on order preservation applies only to individual variables, writing that a conforming implementation could generate code that does this:
a = 1;
a = 3;
b = 2;
Or this:
b = 2;
a = 1;
a = 3;
C++11 repeats the C++98 wording that
conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
but says this about volatiles (1.9/8):
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
1.9/12 says that accessing a volatile glvalue (which includes the variables a, b, and c above) is a side effect, and 1.9/14 says that the side effects in one full expression (e.g., a statement) must precede the side effects of a later full expression in the same thread. This leads me to conclude that the two reorderings Chisnall shows are invalid, because they do not correspond to the ordering dictated by the abstract machine.
Am I overlooking something, or is Chisnall mistaken?
(Note that this is not a threading question. The question is whether a compiler is permitted to reorder accesses to different volatile variables in a single thread.)
IMO Chisnall's interpretation (as presented by you) is clearly wrong. The simpler case is C++98. The sequence of reads and writes to volatile data needs to be preserved, and that applies to the ordered sequence of reads and writes of any volatile data, not to a single variable.
This becomes obvious, if you consider the original motivation for volatile: memory-mapped I/O. In mmio you typically have several related registers at different memory location and the protocol of an I/O device requires a specific sequence of reads and writes to its set of registers - order between registers is important.
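As a concrete (and entirely hypothetical) illustration: a device might require its data register to be written before its command register, and both accesses go through volatile so the compiler keeps that order. The addresses below are invented for the sketch:
// Hypothetical memory-mapped device registers (addresses invented).
volatile unsigned* const DATA_REG    = reinterpret_cast<volatile unsigned*>(0x40000000);
volatile unsigned* const COMMAND_REG = reinterpret_cast<volatile unsigned*>(0x40000004);

void send(unsigned value)
{
    *DATA_REG    = value;   // must happen first
    *COMMAND_REG = 1;       // tells the device to consume DATA_REG
}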
The C++11 wording avoids talking about an absolute sequence of reads and writes, because in multi-threaded environments there is not one single well-defined sequence of such events across threads - and that is not a problem, if these accesses go to independent memory locations. But I believe the intent is that for any sequence of volatile data accesses with a well-defined order the rules remain the same as for C++98 - the order must be preserved, no matter how many different locations are accessed in that sequence.
It is an entirely separate issue what that entails for an implementation. How (and even if) a volatile data access is observable from outside the program and how the access order of the program maps to externally observable events is unspecified. An implementation should probably give you a reasonable interpretation and reasonable guarantees, but what is reasonable depends on the context.
The C++11 standard leaves room for data races between unsynchronized volatile accesses, so there is nothing that requires surrounding these by full memory fences or similar constructs. If there are parts of memory that are truly used as external interface - for memory-mapped I/O or DMA - then it may be reasonable for the implementation to give you guarantees for how volatile accesses to these parts are exposed to consuming devices.
One guarantee can probably be inferred from the standard (see [intro.execution]): values of type volatile std::sig_atomic_t must have values compatible with the order of writes to them even in a signal handler - at least in a single-threaded program.
You're right, he's wrong. Accesses to distinct volatile variables cannot be reordered by the compiler as long as they occur in separate full expressions i.e. are separated by what C++98 called a sequence point, or in C++11 terms one access is sequenced before the other.
Chisnall seems to be trying to explain why volatile is useless for writing thread-safe code, by showing a simple mutex implementation relying on volatile that would be broken by compiler reorderings. He's right that volatile is useless for thread-safety, but not for the reasons he gives. It's not because the compiler might reorder accesses to volatile objects, but because the CPU might reorder them. Atomic operations and memory barriers prevent the compiler and the CPU from reordering things across the barrier, as needed for thread-safety.
See the bottom right cell of Table 1 at Sutter's informative volatile vs volatile article.
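As a sketch of the kind of construct that does work where a volatile flag would not (a minimal spinlock for illustration, not production code):
#include <atomic>

std::atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void lock()
{
    while (lock_flag.test_and_set(std::memory_order_acquire))
        ;   // spin: acquire keeps later accesses from moving above the lock
}

void unlock()
{
    lock_flag.clear(std::memory_order_release);   // release keeps earlier accesses from moving below
}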
For the moment, I'm going to assume your a=3s are just a mistake in copying and pasting, and you really meant them to be c=3.
The real question here is one of the difference between evaluation, and how things become visible to another processor. The standards describe order of evaluation. From that viewpoint, you're entirely correct -- given assignments to a, b and c in that order, the assignments must be evaluated in that order.
That may not correspond to the order in which those values become visible to other processors though. On a typical (current) CPU, that evaluation will only write values out to the cache. The hardware can reorder things from there though, so (for example) writes out to main memory happen in an entirely different order. Likewise, if another processor attempts to use the values, it may see them as changing in a different order.
Yes, this is entirely allowable -- the CPU is still evaluating the assignments in exactly the order prescribed by the standard, so the requirements are met. The standard simply doesn't place any requirements on what happens after evaluation, which is what happens here.
I should add: on some hardware it is sufficient, though. For example, x86 uses cache snooping, so if another processor tries to read a value that has been updated by one processor (but is still only in its cache), the processor that has the current value will put a hold on the other processor's read until the current value can be written out, so the other processor sees the current value.
That's not the case with all hardware though. While maintaining that strict model keeps things simple, it's also fairly expensive both in terms of extra hardware to ensure consistency and in simple speed when/if you have a lot of processors.
Edit: if we ignore threading for a moment, the question gets a little simpler -- but not much. According to C++11, §1.9/12:
When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
As such, the accesses to volatile objects must be initiated in order, but not necessarily completed in order. Unfortunately, it's often the completion that's externally visible. As such, we pretty much come back to the usual as-if rule: the compiler can rearrange things as much as it wants, as long it produces no externally visible change.
Looks like it can happen.
There is a discussion on this page:
http://gcc.gnu.org/ml/gcc/2003-11/msg01419.html
It depends on your compiler. For example, MSVC++ as of Visual Studio 2005 guarantees that volatiles will not be reordered (actually, what Microsoft did is give up and assume programmers will forever abuse volatile - MSVC++ now adds a memory barrier around certain usages of volatile). Other versions and other compilers may not have such guarantees.
Long story short: don't bet on it. Design your code properly, and don't misuse volatile. Use memory barriers instead or full-blown mutexes as necessary. C++11's atomic types will help.
C++98 doesn't say the instructions cannot be re-ordered.
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions
This says it's the actual sequence of the reads and writes themselves, not the instructions that generate them. Any argument that says that the instructions must reflect the reads and writes in program order could equally argue that the reads and writes to the RAM itself must occur in program order, and clearly that's an absurd interpretation of the requirement.
Simply put, this doesn't mean anything. There is no "one right place" to observe the orders of reads and writes (The RAM bus? The CPU bus? Between the L1 and L2 caches? From another thread? From another core?), so this requirement is essentially meaningless.
Versions of C++ prior to any references to threads clearly don't specify the behavior of volatile variables as seen from another thread. And C++11 (wisely, IMO) didn't change this but instead introduced sensible atomic operations with well-defined inter-thread semantics.
As for memory-mapped hardware, that's always going to be platform-specific. The C++ standard doesn't even pretend to address how that might be done properly. For example, the platform might be such that only a subset of memory operations are legal in that context, say ones that bypass a write posting buffer that can reorder, and the C++ standard certainly doesn't compel the compiler to emit the right instructions for that particular hardware device -- how could it?
Update: I see some downvotes because people don't like this truth. Unfortunately, it is true.
If the C++ standard prohibits the compiler from reordering accesses to distinct volatiles, on the theory that the order of such accesses is part of the program's observable behavior, then it also requires the compiler to emit code that prohibits the CPU from doing so. The standard does not differentiate between what the compiler does and what the compiler's generated code makes the CPU do.
Since nobody believes the standard requires the compiler to emit instructions to keep the CPU from reordering accesses to volatile variables, and modern compilers don't do this, nobody should believe the C++ standard prohibits the compiler from reordering accesses to distinct volatiles.
Here's the problem: your program temporarily uses some sensitive data and wants to erase it when it's no longer needed. Using std::fill() on it won't always help - the compiler might decide that the memory block is not accessed later, so erasing it is a waste of time, and eliminate the erasing code.
User ybungalobill suggests using volatile keyword:
{
    char buffer[size];
    // obtain and use password
    std::fill_n((volatile char*)buffer, size, 0);
}
The intent is that upon seeing the volatile keyword the compiler will not try to eliminate the call to std::fill_n().
Will volatile keyword always prevent the compiler from such memory modifying code elimination?
The compiler is free to optimize your code out because buffer is not a volatile object.
The Standard only requires a compiler to strictly adhere to semantics for volatile objects. Here is what C++03 says
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and
subsequent evaluations have not yet occurred.
[...]
and
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions
In your example, what you have are reads and writes using volatile lvalues to non-volatile objects. C++0x removed the second text I quoted above, because it's redundant. C++0x just says
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.[...]
These collectively are referred to as the observable behavior of the program.
While one may argue that "volatile data" could maybe mean "data accessed by volatile lvalues", which would still be quite a stretch, the C++0x wording removed all doubts about your code and clearly allows implementations to optimize it away.
But as people pointed out to me, it probably does not matter in practice. A compiler that optimizes such a thing would most probably go against the programmer's intention (why would someone have a pointer to volatile otherwise?) and so would probably contain a bug. Still, I have experienced compiler vendors that cited these paragraphs when they were faced with bug reports about their over-aggressive optimizations. In the end, volatile is inherently platform-specific and you are supposed to double-check the result anyway.
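A workaround often suggested for this case (sketched here with my own naming; the usual caveat about verifying the generated code still applies) is to route the erase through a volatile function pointer, so the compiler cannot prove the call has no effect and therefore cannot drop it:
#include <cstring>

// Volatile pointer to std::memset: the call target must be re-read, so the
// compiler cannot treat the call as dead and eliminate the erase.
void* (*volatile secure_memset)(void*, int, std::size_t) = std::memset;

void use_password()
{
    char buffer[64];
    // ... obtain and use the password ...
    secure_memset(buffer, 0, sizeof buffer);   // erase before buffer goes out of scope
}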
From the last C++0x draft [intro.execution]:
8 The least requirements on a conforming implementation are:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
[...]
12 Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, [...]
So even the code you provided must not be optimized away.
The memory content you wish to remove may have already been flushed out from your CPU/core's inner cache to RAM, where other CPUs can continue to see it. After overwriting it, you need to use a mutex / memory barrier instruction / atomic operation or something to trigger a sync with other cores. In practice, your compiler will probably do this before calling any external functions (google Dave Butenhof's post on volatile's dubious utility in multi-threading), so if your thread does that soon afterwards anyway, it's not a major issue. Summarily: volatile isn't needed.
A conforming implementation may, at its leisure, defer the actual performance of any volatile reads and writes until the result of a volatile read would affect the execution of a volatile write or I/O operation.
For example, given something like:
volatile unsigned char vol1, vol2;
extern unsigned char res[1000];
void test(int scale)
{
    for (int i = 0; i < 1000; i++)
    {
        res[i] = i * vol1 * scale;
        vol2 = res[i];
    }
}
a conforming compiler could, at its option, check whether scale is a multiple of 128 and--if so--clear out all even-indexed values of res before doing any reads from vol1 or writes to vol2. Even though the compiler would need to do each read from vol1 before it could do the following write to vol2, a compiler may be able to defer both operations until after it has run an essentially unlimited amount of code.
A compiler cannot eliminate or reorder reads/writes to volatile-qualified variables.
But what about the cases where other variables are present, which may or may not be volatile-qualified?
Scenario 1
volatile int a;
volatile int b;
a = 1;
b = 2;
a = 3;
b = 4;
Can the compiler reorder the first and second, or the third and fourth assignments?
Scenario 2
volatile int a;
int b, c;
b = 1;
a = 1;
c = b;
a = 3;
Same question: can the compiler reorder the first and second, or the third and fourth assignments?
The C++ standard says (1.9/6):
The observable behavior of the
abstract machine is its sequence of
reads and writes to volatile data and
calls to library I/O functions.
In scenario 1, either of the changes you propose changes the sequence of writes to volatile data.
In scenario 2, neither change you propose changes the sequence. So they're allowed under the "as-if" rule (1.9/1):
... conforming implementations are
required to emulate (only) the
observable behavior of the abstract
machine ...
In order to tell that this has happened, you would need to examine the machine code, use a debugger, or provoke undefined or unspecified behavior whose result you happen to know on your implementation. For example, an implementation might make guarantees about the view that concurrently-executing threads have of the same memory, but that's outside the scope of the C++ standard. So while the standard might permit a particular code transformation, a particular implementation could rule it out, on grounds that it doesn't know whether or not your code is going to run in a multi-threaded program.
If you were to use observable behavior to test whether the re-ordering has happened or not (for example, printing the values of variables in the above code), then of course it would not be allowed by the standard.
For scenario 1, the compiler should not perform any of the reorderings you mention. For scenario 2, the answer might depend on:
— whether the b and c variables are visible outside the current function (either by being non-local or having had their address passed out)
— who you talk to (apparently there is some disagreement about how strict volatile is in C/C++)
— your compiler implementation
So (softening my first answer), I'd say that if you're depending on certain behavior in scenario 2, you'd have to treat it as non-portable code whose behavior on a particular platform would have to be determined by whatever the implementation's documentation might indicate (and if the docs say nothing about it, then you're out of luck with respect to guaranteed behavior).
from C99 5.1.2.3/2 "Program execution":
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
...
(paragraph 5) The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
Here's a little of what Herb Sutter has to say about the required behavior of volatile accesses in C/C++ (from "volatile vs. volatile" http://www.ddj.com/hpc-high-performance-computing/212701484) :
what about nearby ordinary reads and writes -- can those still be reordered around unoptimizable reads and writes? Today, there is no practical portable answer because C/C++ compiler implementations vary widely and aren't likely to converge anytime soon. For example, one interpretation of the C++ Standard holds that ordinary reads can move freely in either direction across a C/C++ volatile read or write, but that an ordinary write cannot move at all across a C/C++ volatile read or write -- which would make C/C++ volatile both less restrictive and more restrictive, respectively, than an ordered atomic. Some compiler vendors support that interpretation; others don't optimize across volatile reads or writes at all; and still others have their own preferred semantics.
And for what it's worth, Microsoft documents the following for the C/C++ volatile keyword (as Microsoft-specific):
A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.
A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
This allows volatile objects to be used for memory locks and releases in multithreaded applications.
Volatile is not a memory fence. Assignments to b and c in scenario 2 can be eliminated or performed whenever. Why would you want the declarations in scenario 2 to cause the behavior of scenario 1?
Some compilers regard accesses to volatile-qualified objects as a memory fence. Others do not. Some programs are written to require that volatile works as a fence. Others aren't.
Code which is written to require fences, running on platforms that provide them, may run better than code which is written to not require fences, running on platforms that don't provide them, but code which requires fences will malfunction if they are not provided. Code which doesn't require fences will often run slower on platforms that provide them than would code which does require the fences, and implementations which provide fences will run such code more slowly than those that don't.
A good approach may be to define a macro semi_volatile as expanding to nothing on systems where volatile implies a memory fence, or to volatile on systems where it doesn't. If variables that need to have their accesses ordered with respect to other volatile variables, but not to each other, are qualified as semi_volatile, and that macro is defined correctly, reliable operation will be achieved on systems with or without memory fences, along with the most efficient operation that can be achieved on systems with fences. If a compiler actually implements a qualifier that works as required, semi_volatile could be defined as a macro that uses that qualifier and achieve even better code.
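A rough sketch of what that macro could look like (the compiler check is illustrative only; MSVC's fence-like treatment of volatile corresponds to its /volatile:ms mode, and whether it applies must be confirmed against the implementation's documentation):
// On implementations whose volatile already orders surrounding accesses,
// semi_volatile can expand to nothing; elsewhere it falls back to volatile.
#if defined(_MSC_VER)
#define semi_volatile
#else
#define semi_volatile volatile
#endif

semi_volatile int request_count;   // ordered relative to volatile accesses, not to itself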
IMHO, that's an area the Standard really should address, since the concepts involved are applicable on many platforms, and any platform where fences aren't meaningful can simply ignore them.