Is a C++ optimizer allowed to move statements across a function call?

Note: No multithreading at all here. Just optimized single-threaded code.
A function call introduces a sequence point. (Apparently.)
Does it follow that a compiler (if the optimizer inlines the function) is not allowed to move or intermingle any instructions from before/after the call with the function's own instructions? (Unless it can "prove" there are no observable effects, obviously.)
Explanatory background:
Now, there is a nice article about a benchmarking class for C++, where the author stated:
The code we time won’t be rearranged by the optimizer and will always
lie between those start / end calls to now(), so we can guarantee our
timing will be valid.
to which I asked how he could be sure, and nick replied:
You can check the comment in this answer
https://codereview.stackexchange.com/a/48884. I quote : “I would be
careful about timing things that are not functions because of
optimizations that the compiler is allowed to do. I am not sure about
the sequencing requirements and the observable behavior understanding
of such a program. With a function call the compiler is not allowed to
move statements across the call point (they are sequenced before or
after the call).”
What we do is basically abstract the callable (function, lambda, block
of code surrounded by lambda) and have a single call
callable(factor) inside the measure structure that acts as a
barrier (not the barrier in multithreading, I believe I convey the
message).
I am quite unsure about this, especially the quote:
With a function call the compiler is not allowed to
move statements across the call point (they are sequenced before or
after the call).
Now, I was always under the impression that when an optimizer inlines some function (which may very well be the case in a (simple) benchmark scenario), it is free to rearrange whatever it likes as long as it does not affect observable behavior.
That is, as far as the language / the optimizer are concerned, these two snippets are exactly the same:
void f() {
    // do stuff / Multiple statements
}
auto start = ...;
f();
auto stop = ...;
vs.
auto start = ...;
// do stuff / Multiple statements
auto stop = ...;

Now, I was always under the impression that when an optimizer inlines
some function (which may very well be the case in a (simple) benchmark
scenario), it is free to rearrange whatever it likes as long as it
does not affect observable behavior.
It absolutely is. The optimizer doesn't even need to inline it for this to occur in theory.
However, timing functions are observable behaviour: specifically, they are I/O on the part of the system. The optimizer cannot know that that I/O will produce the same outcome (it obviously won't) if performed in a different order relative to other I/O calls, which can include non-obvious things like memory allocation calls that may invoke syscalls to get their memory.
What this basically means is that by and large, for most function calls, the optimizer can't do a great deal of re-arranging because there's potentially a vast quantity of state involved that it can't reason about.
Furthermore, the optimizer can't really know that re-arranging your function calls will actually make the code run faster, and doing so would make debugging harder, so compilers don't have much incentive to go screwing around with the program's stated order.
Basically, in theory the optimizer can do this, but in reality it won't because doing so would be a massive undertaking for not a lot of benefit.
You'll only encounter conditions like this if your benchmark is fairly trivial or consists virtually entirely of primitive operations like integer addition, in which case you'll want to check the assembly anyway.

Your concern is perfectly valid: the optimizer is allowed to move anything past a function call if it can prove that this does not change observable behavior (other than runtime, that is).
The point of using a function to stop the optimizer from doing things is precisely not to tell the optimizer anything about that function. That is, the function must not be inlined, and it must not be defined in the same compilation unit. Since optimizers are generally a compiler feature, moving the function definition to a different compilation unit deprives the optimizer of the information necessary to prove anything about the function, and consequently stops it from moving anything across the function call.
Beware that this assumes that there is no linker doing global analysis for optimization. If there is, it can still screw you.
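A minimal sketch of that scheme (file names and function names are mine, not from the answer): the benchmarked function is defined in its own translation unit, so while compiling the timing code the optimizer must assume the call could have arbitrary side effects and cannot move anything across it.
// f.cpp -- compiled separately; the optimizer working on main.cpp
// never sees this body.
void f() {
    // work to be timed
}

// main.cpp
#include <chrono>
#include <iostream>

void f(); // opaque call: defined in f.cpp

int main() {
    auto start = std::chrono::steady_clock::now();
    f(); // the optimizer cannot prove f() has no observable effects,
         // so it cannot hoist it above start or sink it below stop
    auto stop = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(
                     stop - start).count() << " us\n";
}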

What the comment you quoted has not considered is that sequence points are not primarily about order of execution (although they do constrain it, they don't act as full barriers), but rather about values of expressions.
C++11 actually gets rid of the "sequence point" terminology completely, and instead discusses the ordering of "value computations" and "side effects".
To illustrate, the following code exhibits undefined behavior because it doesn't respect ordering:
int a = 5;
int x = a++ + a;
This version is well-defined:
int a = 5;
a++;
int x = a + a;
What the sequence point / ordering of side effects and value computations guarantees us is that the a used in x = a + a is 6, not 5. So the compiler cannot rewrite it to:
int a = 5;
int x = a + a;
a++;
However, it's perfectly legal to rewrite it as:
int a = 5;
int x = (a+1) + (a+1);
a++;
The order of execution between assigning x and assigning a isn't constrained, because neither of them is volatile or atomic<T> and they aren't externally visible side effects.

The standard definitely leaves room for the optimizer to reorder operations across the boundary of a function:
1.9/15 Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or
after the execution of the body of the called function is
indeterminately sequenced with respect to the execution of the called
function.
as long as the as-if rule is respected:
1.9/5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible
executions of the corresponding instance of the abstract machine with
the same program and the same input.
The practice of leaving the optimizer in the dark, as suggested by cmaster, is in general very effective. By the way, the issue of global optimization at link time can also be circumvented by dynamically linking the benchmarked function.
There is, however, another hard sequencing constraint that can be used to achieve the same purpose, even within the same compilation unit:
1.9/15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any
argument expression, or with the postfix expression designating the
called function, is sequenced before execution of every expression
or statement in the body of the called function.
So you may safely use an expression like:
my_timer_off(stop, f( my_timer_on(start) ) );
This "functional" writing ensures that:
my_timer_on() is evaluated before any statement of f() is executed,
f() is called before the body of my_timer_off() is executed
thus ensuring the sequence timer-on / f / timer-off (the my_timer_xx would take the start/stop by value).
Of course, this assumes that the signature of the benchmarked function f() can be changed to allow the expression above.
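For concreteness, here is one possible shape such helpers could take (a sketch of mine; the answer doesn't spell out the signatures, and I fill the time points through pointers, which are themselves passed by value):
#include <chrono>

using clk = std::chrono::steady_clock;

int my_timer_on(clk::time_point* start) {
    *start = clk::now(); // sequenced before any statement in f's body
    return 0;            // dummy token that f must consume
}

int f(int token);        // benchmarked function, adapted to take the token

void my_timer_off(clk::time_point* stop, int result) {
    *stop = clk::now();  // runs only after f's result has been computed
    (void)result;
}

// usage:
//   clk::time_point start, stop;
//   my_timer_off(&stop, f(my_timer_on(&start)));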

Related

Is it legal for a C++ optimizer to reorder calls to clock()?

The C++ Programming Language 4th edition, page 225 reads: A compiler may reorder code to improve performance as long as the result is identical to that of the simple order of execution. Some compilers, e.g. Visual C++ in release mode, will reorder this code:
#include <time.h>
...
auto t0 = clock();
auto r = veryLongComputation();
auto t1 = clock();
std::cout << r << " time: " << t1-t0 << std::endl;
into this form:
auto t0 = clock();
auto t1 = clock();
auto r = veryLongComputation();
std::cout << r << " time: " << t1-t0 << std::endl;
which guarantees a different result than the original code (zero vs. greater-than-zero time reported). See my other question for a detailed example. Is this behavior compliant with the C++ standard?
Well, there is something called Subclause 5.1.2.3 of the C Standard [ISO/IEC 9899:2011] which states:
In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).
Therefore I really suspect that this behaviour - the one you described - is compliant with the standard.
Furthermore, the reorganization indeed has an impact on the computation result, but if you look at it from the compiler's perspective, it lives in the int main() world, and when doing time measurements it peeps out, asks the kernel for the current time, and goes back into the main world, where the actual time of the outside world doesn't really matter. The clock() call itself won't affect the program's variables, and the program's behaviour won't affect that clock() function.
The clock values are used to calculate the difference between them; that is what you asked for. Whatever goes on between the two measurements is not relevant from the compiler's perspective, since what you asked for was the clock difference, and the code between the measurements won't affect the measuring as a process.
This however doesn't change the fact that the described behaviour is very unpleasant.
Even though inaccurate measurements are unpleasant, it could get much worse, and even dangerous.
Consider the following code taken from this site:
void GetData(char *MFAddr) {
    char pwd[64];
    if (GetPasswordFromUser(pwd, sizeof(pwd))) {
        if (ConnectToMainframe(MFAddr, pwd)) {
            // Interaction with mainframe
        }
    }
    memset(pwd, 0, sizeof(pwd));
}
When compiled normally, everything is OK, but if optimizations are applied, the memset call may be optimized out, which can result in a serious security flaw. Why does it get optimized out? It is very simple: the compiler again thinks in its main() world and considers the memset a dead store, since the variable pwd is not used afterwards and cannot affect the program itself.
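A common countermeasure (a sketch of mine, not from the cited site) is to perform the clear through a volatile pointer: each write is then an access through a volatile lvalue, which the compiler must not remove. (C11 also added memset_s for exactly this purpose.)
#include <stddef.h>

static void secure_clear(char *buf, size_t len) {
    volatile char *p = buf;
    for (size_t i = 0; i < len; ++i)
        p[i] = 0; // volatile write: cannot be treated as a dead store
}

// in GetData: replace  memset(pwd, 0, sizeof(pwd));
// with                 secure_clear(pwd, sizeof(pwd));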
The compiler cannot exchange the two clock calls. t1 must be set after t0. Both calls are observable side effects. The compiler may reorder anything between those observable effects, and even over an observable side effect, as long as the observations are consistent with possible observations of an abstract machine.
Since the C++ abstract machine is not formally restricted to finite speeds, it could execute veryLongComputation() in zero time. Execution time itself is not defined as an observable effect. Real implementations may match that.
Mind you, a lot of this answer depends on the C++ standard not imposing restrictions on compilers.
Yes, it is legal - if the compiler can see the entirety of the code that occurs between the clock() calls.
If veryLongComputation() internally performs any opaque function call, then no, because the compiler cannot guarantee that its side effects would be interchangeable with those of clock().
Otherwise, yes, it is interchangeable.
This is the price you pay for using a language in which time isn't a first-class entity.
Note that memory allocation (such as new) can fall into this category, since the allocation function can be defined in a different translation unit and not compiled until the current translation unit has already been compiled. So if you merely allocate memory, the compiler is forced to treat the allocation and deallocation as worst-case barriers for everything -- clock(), memory barriers, and everything else -- unless it already has the code for the memory allocator and can prove that this is not necessary. In practice I don't think any compiler actually looks at the allocator code to try to prove this, so these kinds of function calls serve as barriers in practice.
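To illustrate the point (my sketch, not from the answer): here the allocation is an opaque call into the runtime, so the compiler cannot prove it is safe to move the clock() calls across it.
#include <ctime>
#include <iostream>

int main() {
    std::clock_t t0 = std::clock();
    char* p = new char[1 << 20]; // operator new lives in another TU:
                                 // a worst-case barrier for the clock() calls
    std::clock_t t1 = std::clock();
    delete[] p;
    std::cout << (t1 - t0) << '\n';
}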
At least by my reading, no, this is not allowed. The requirement from the standard is (§1.9/14):
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
The degree to which the compiler is free to reorder beyond that is defined by the "as-if" rule (§1.9/1):
This International Standard places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
That leaves the question of whether the behavior in question (the output written by cout) is officially observable behavior. The short answer is that yes, it is (§1.9/8):
The least requirements on a conforming implementation are:
[...]
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
At least as I read it, that means the calls to clock could be rearranged compared to the execution of your long computation if and only if it still produced identical output to executing the calls in order.
If, however, you wanted to take extra steps to ensure correct behavior, you could take advantage of one other provision (also §1.9/8):
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
To take advantage of this, you'd modify your code slightly to become something like:
auto volatile t0 = clock();
auto volatile r = veryLongComputation();
auto volatile t1 = clock();
Now, instead of having to base the conclusion on three separate sections of the standard, and still having only a fairly certain answer, we can look at exactly one sentence and have an absolutely certain answer: with this code, reordering the uses of clock vs. the long computation is clearly prohibited.
Let's suppose that the sequence is in a loop, and that veryLongComputation() randomly throws an exception. Then how many t0s and t1s will be calculated? Does the compiler pre-calculate the random variables and reorder based on the precalculation, sometimes reordering and sometimes not?
Is the compiler smart enough to know that a mere memory read may be a read from shared memory? Suppose the read is a measure of how far the control rods have moved in a nuclear reactor, and the clock calls are used to control the speed at which they are moved.
Or maybe the timing is controlling the grinding of a Hubble telescope mirror. LOL
Moving clock calls around seems too dangerous to leave to the decisions of compiler writers. So if it is legal, perhaps the standard is flawed.
IMO.
It is certainly not allowed, since it changes, as you have noted, the observable behavior (different output) of the program. (I won't go into the hypothetical case that veryLongComputation() might not consume any measurable time, which, given the function's name, is presumably not the case. But even if it were, it wouldn't really matter.) You wouldn't expect it to be allowable to reorder fopen and fwrite, would you?
Both t0 and t1 are used in outputting t1-t0. Therefore, the initializer expressions for both t0 and t1 must be executed, and doing so must follow all standard rules. The result of the function is used, so the function call cannot be optimized out. The call doesn't directly depend on t1, nor vice versa, so one might naively be inclined to think it legal to move the call around, say, to after the initialization of t1, which doesn't depend on the calculation.
Indirectly, however, the result of t1 does of course depend on side effects by veryLongComputation() (notably the computation taking time, if nothing else), which is exactly one of the reasons that there exist such a thing as "sequence point".
There are three "end of expression" sequence points (plus three "end of function" and "end of initializer" SPs), and at every sequence point it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
There is no way you can keep this promise if you move the three statements around, since the possible side effects of the functions called are not known. The compiler is only allowed to optimize if it can guarantee to keep the promise. It can't, since the library functions are opaque and their code isn't available (nor is the code within veryLongComputation necessarily known in that translation unit).
Compilers do however sometimes have "special knowledge" about library functions, such as some functions will not return or may return twice (think exit or setjmp).
However, since every non-empty, non-trivial function (and veryLongComputation is quite non-trivial from its name) will consume time, a compiler having "special knowledge" about the otherwise opaque clock library function would in fact have to be explicitly disallowed from reordering calls around this one, knowing that doing so not only may, but will affect the results.
Now the interesting question is why the compiler does this anyway. I can think of two possibilities. Maybe your code triggers a "looks like a benchmark" heuristic and the compiler is trying to cheat, who knows. It wouldn't be the first time (think SPEC2000/179.art, or SunSpider, for two historic examples). The other possibility would be that somewhere inside veryLongComputation(), you inadvertently invoke undefined behavior. In that case, the compiler's behavior would even be legal.

Could a C++ implementation, in theory, parallelise the evaluation of two function arguments?

Given the following function call:
f(g(), h())
since the order of evaluation of function arguments is unspecified (still the case in C++11 as far as I'm aware), could an implementation theoretically execute g() and h() in parallel?
Such a parallelisation could only kick in were g and h known to be fairly trivial (in the most obvious case, accessing only data local to their bodies) so as not to introduce concurrency issues but, beyond that restriction I can't see anything to prohibit it.
So, does the standard allow it? Even if only by the as-if rule?
(In this answer, Mankarse asserts otherwise; however, he does not cite the standard, and my read-through of [expr.call] hasn't revealed any obvious wording.)
The requirement comes from [intro.execution]/15:
... When calling a function ... Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function [Footnote: In other words, function executions do not interleave with each other.].
So any execution of the body of g() must be indeterminately sequenced with (that is, not overlapping with) the evaluation of h() (because h() is an expression in the calling function).
The critical point here is that g() and h() are both function calls.
(Of course, the as-if rule means that the possibility cannot be entirely ruled out, but it should never happen in a way that could affect the observable behaviour of a program. At most, such an implementation would just change the performance characteristics of the code.)
As long as you can't tell, whatever the compiler does to evaluate these functions is entirely up to the compiler. Clearly, the evaluation of the functions cannot involve any access to shared, mutable data as this would introduce data races. The basic guiding principle is the "as if"-rule and the fundamental observable operations, i.e., access to volatile data, I/O operations, access to atomic data, etc. The relevant section is 1.9 [intro.execution].
Not unless the compiler knows exactly what g(), h(), and anything they call do.
The two expressions are function calls, which may have unknown side effects. Therefore, parallelizing them could cause a data-race on those side effects. Since the C++ standard does not allow argument evaluation to cause a data-race on any side effects of the expressions, the compiler can only parallelize them if it knows that no such data race is possible.
That means walking through each function and looking at exactly what they do and/or call, then tracking through those functions, etc. In the general case, it's not feasible.
Easy answer: when the functions are sequenced, even if indeterminately, there is no possibility of a race condition between the two, which is not true if they are parallelized. Even a pair of one-line "trivial" functions could do it:
void g()
{
    *p = *p + 1;
}

void h()
{
    *p = *p - 1;
}
If p is a name shared by g and h, then a sequential calling of g and h in any order will result in the value pointed to by p not changing. If they are parallelized, the reading of *p and the assigning of it could be interleaved arbitrarily between the two:
g reads *p and finds the value 1.
h reads *p and also finds the value 1.
g writes 2 to *p.
h, still using the value 1 it read before, will write 0 to *p.
Thus, the behavior is different when they are parallelized.

What exactly is a 'side-effect' in C++?

Is it a standard term which is well defined, or just a term coined by developers to explain a concept (and what is the concept)? As I understand it, this has something to do with the all-confusing sequence points, but I am not sure.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
For reference, some questions talking about side effects:
Is comma operator free from side effect?
Force compiler to not optimize side-effect-less statements
Side effects when passing objects to function in C++
A "side effect" is defined by the C++ standard in [intro.execution], by:
Reading an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
The term "side-effect" arises from the distinction between imperative languages and pure functional languages. A C++ expression can do three things:
compute a result (or compute "no result" in the case of a void expression),
raise an exception instead of evaluating to a result,
in addition to 1 or 2, otherwise alter the state of the abstract machine on which the program is nominally running.
(3) are side-effects, the "main effect" being to evaluate the result of the expression. Exceptions are a slightly awkward special case, in that altering the flow of control does change the state of the abstract machine (by changing the current point of execution), but isn't a side-effect. The code to construct, handle and destroy the exception may have its own side-effects, of course.
The same principles apply to functions, with the return value in place of the result of the expression.
So, int foo(int a, int b) { return a + b; } just computes a return value, it doesn't alter anything else. Therefore it has no side-effects, which sometimes is an interesting property of a function when it comes to reasoning about your program (e.g. to prove that it is correct, or by the compiler when it optimizes). int bar(int &a, int &b) { return ++a + b; } does have a side-effect, since modifying the caller's object a is an additional effect of the function beyond simply computing a return value. It would not be permitted in a pure functional language.
The stuff in your quote about "has finished being evaluated" refers to the fact that the result of an expression (or return value of a function) can be a "temporary object", which is destroyed at the end of the full expression in which it occurs. So creating a temporary isn't a "side-effect" by that definition: other changes are.
What exactly is a 'side-effect' in C++? Is it a standard term which is well defined...
c++11 draft - 1.9.12: Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
The significance is that, as expressions are being evaluated they can modify the program state and/or perform I/O. Expressions are allowed in myriad places in C++: variable assignments, if/else/while conditions, for loop setup/test/modify steps, function parameters etc.... A couple examples: ++x and strcat(buffer, "append this").
In a C++ program, the Standard grants the optimiser the right to generate code representing the program operations, but requires that all the operations associated with steps before a sequence point appear before any operations related to steps after the sequence point.
The reason C++ programmers tend to have to care about sequence points and side effects is that there aren't as many sequence points as you might expect. For example: given x = 1; f(++x, ++x);, you may expect a call to f(2, 3), but it's actually undefined behaviour. This behaviour is left undefined so the compiler's optimiser has more freedom to arrange operations with side effects to run in the most efficient order possible, perhaps even in parallel. It also avoids burdening compiler writers with detecting such conditions.
1.Is comma operator free from side effect?
Yes - a comma operator introduces a sequence point: the steps on the left must be complete before those on the right execute (see the sketch after this list). There is a list of sequence points at http://en.wikipedia.org/wiki/Sequence_point - you should read it! (If you have to ask about side effects, then be careful in interpreting this answer - the "comma operator" is NOT what separates function arguments, array initialisation elements etc.. The comma operator is relatively rarely used and somewhat obscure. Do some reading if you're not sure what the comma operator really is.)
2.Force compiler to not optimize side-effect-less statements
I assume you mean "side-effect-ful" statements. Compilers are not obliged to support any such option, and it is unclear what behaviour they would exhibit if they tried - the Standard doesn't define what they should do in such situations. Sometimes a majority of programmers might share an intuitive expectation, but other times it's really arbitrary.
3.Side effects when passing objects to function in C++
When calling a function, all the parameters must have been completely evaluated - and their side effects triggered - before the function call takes place. BUT, there are no restrictions on the compiler related to evaluating one specific parameter expression before any other. They can be overlapping, in parallel etc.. So, in f(expr1, expr2), some of the steps in evaluating expr2 might run before anything from expr1, but expr1 might still complete first - the order is unspecified (see the sketch below).
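A small sketch (mine) contrasting points 1 and 3: the comma operator sequences its operands, while the commas separating function arguments do not.
void f(int, int) {} // stand-in callee

void demo() {
    int i = 0;
    int j = (i = 5, i + 1); // comma operator: i = 5 completes first, so j == 6
    f(j, i);                // the commas in an argument list are NOT comma
                            // operators; argument evaluation order is unspecified
    int x = 1;
    // f(++x, ++x);         // don't: two unsequenced modifications of x --
                            // the undefined behaviour discussed above
    (void)x;
}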
1.9.6
The observable behavior of the abstract machine is its sequence of
reads and writes to volatile data and calls to library I/O
functions.
A side-effect is anything that affects observable behavior.
Note that there are exceptions specified by the standard, where observable behavior doesn't have to conform to that of the abstract machine - see return value optimization, temporary copy elision.
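A short sketch (mine) of that exception: under copy elision, the copy constructor's side effect may vanish even though printing is observable behaviour.
#include <iostream>

struct Noisy {
    Noisy() { std::cout << "ctor\n"; }
    Noisy(const Noisy&) { std::cout << "copy\n"; } // may legally never run
};

Noisy make() { return Noisy(); } // RVO: built directly in the caller's object

int main() {
    Noisy n = make(); // typically prints only "ctor", not "copy"
}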

C++: Calling a constructor to a temporary object

Suppose I have the following:
int main() {
    SomeClass();
    return 0;
}
Without optimization, the SomeClass() constructor will be called, and then its destructor will be called, and the object will be no more.
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
I suppose the obvious way to go about this is not to rely on the constructor/destructor (e.g. use a free function, or a static method, or so), but is there a way to ensure the calling of the constructors/destructors?
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
The bolded part is wrong. That should be: knows there is no observable behaviour
E.g. from § 1.9 of the latest standard (there are more relevant quotes):
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
As a matter of fact, this whole mechanism underpins the single most ubiquitous C++ language idiom: Resource Acquisition Is Initialization.
Backgrounder
Having the compiler optimize away constructors in trivial cases is extremely helpful. It is what allows iterators to compile down to code with exactly the same performance as using raw pointers/indexes.
It is also what allows a function object to compile down to the exact same code as inlining the function body.
It is what makes C++11 lambdas perfectly optimal for simple use cases:
factorial = std::accumulate(begin, end, 1, [] (int a, int b) { return a*b; });
The lambda compiles down to a functor object similar to
struct lambda_1
{
    int operator()(int a, int b) const
    { return a*b; }
};
The compiler sees that the constructor/destructor can be elided and the function body gets inlined. The end result is optimal.¹
More (un)observable behaviour
The standard contains a very entertaining example to the contrary, to spark your imagination.
§ 20.7.2.2.3
[ Note: The use count updates caused by the temporary object construction and destruction are not
observable side effects, so the implementation may meet the effects (and the implied guarantees) via
different means, without creating a temporary. In particular, in the example:
shared_ptr<int> p(new int);
shared_ptr<void> q(p);
p = p;
q = p;
both assignments may be no-ops. —end note ]
IOW: Don't underestimate the power of optimizing compilers. This in no way means that language guarantees are to be thrown out of the window!
¹ Though there could be faster algorithms to get a factorial, depending on the problem domain :)
I'm sure that if SomeClass::SomeClass() is not implemented as inline, the compiler has no way of knowing that the constructor/destructor has no side effects, and it will always call the constructor/destructor.
If the compiler is optimizing away a visible effect of the constructor/destructor call, it is buggy. If it has no visible effect, then you shouldn't notice it anyway.
However let's assume that somehow your constructor or destructor does have a visible effect (so construction and subsequent destruction of that object isn't effectively a no-op) in such a way that the compiler could legitimately think it wouldn't (not that I can think of such a situation, but then, it might be just a lack of imagination on my side). Then any of the following strategies should work:
Make sure that the compiler cannot see the definition of the constructor and/or destructor. If the compiler doesn't know what the constructor/destructor does, it cannot assume it does not have an effect. Note, however, that this also disables inlining. If your compiler does not do cross-module optimization, just putting the constructor/destructor into a different file should suffice.
Make sure that your constructor/destructor actually does have observable behaviour, e.g. through use of volatile variables (every read or write of a volatile variable is considered observable behaviour in C++).
However, let me stress again that it's very unlikely that you have to do anything, unless your compiler is horribly buggy (in which case I'd strongly advise you to change the compiler :-)).
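A sketch (mine) of the second strategy: a volatile access in the constructor and destructor is observable behaviour, so the temporary's construction and destruction cannot be removed entirely.
struct Tracked {
    static volatile int count;
    Tracked()  { count = count + 1; } // volatile write: must be performed
    ~Tracked() { count = count - 1; } // likewise
};
volatile int Tracked::count = 0;

int main() {
    Tracked(); // the temporary's ctor/dtor both survive optimization
}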

What Rules does compiler have to follow when dealing with volatile memory locations?

I know that when reading from a memory location written to by several threads or processes, the volatile keyword should be used for that location, as in the cases below. But I want to know more about what restrictions it really places on the compiler: what rules does the compiler have to follow when dealing with such a case, and is there any exceptional case where, despite simultaneous access to a memory location, the volatile keyword can be omitted by the programmer?
volatile SomeType * ptr = someAddress;

void someFunc(volatile const SomeType & input) {
    // function body
}
What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threaded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers, and the setjmp/longjmp facility.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
    const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself is still writable; you just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
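A much-simplified sketch (mine, not Alexandrescu's actual code) of the idea: the shared object is declared volatile, so only volatile-qualified member functions, which do the locking, can be called on it directly.
struct Widget {
    void unsafe_op() { /* touch shared state */ }
    void safe_op() volatile {
        // acquire a mutex here, then strip the volatile qualifier:
        Widget& self = const_cast<Widget&>(*this);
        self.unsafe_op(); // fine: we hold the lock
        // release the mutex
    }
};

volatile Widget w; // shared between threads

void user() {
    w.safe_op();      // OK
    // w.unsafe_op(); // compile error: non-volatile member function on a
                      // volatile object -- the compiler flags the mistake
}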
My advice to you is to completely abandon volatile as a tool for writing multithreaded applications (edit:) until you really know what you're doing and why. It has some benefit, but not in the way most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point), but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones (see the sketch below). This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
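A sketch (mine) of the hazard: the volatile accesses below stay in order relative to each other, but the plain store may legally drift across them, which is fatal for a hand-rolled lock.
volatile int lock_flag = 0; // 1 = held, 0 = free (hand-rolled, don't do this)
int shared_data = 0;        // plain, non-volatile object

void unlock_after_update() {
    shared_data = 42; // non-volatile store: the compiler may move it...
    lock_flag = 0;    // ...past this volatile "unlock" write
}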
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).
A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
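A sketch (mine, with a made-up address) of that situation:
#include <cstdint>

// Pretend the serial port's transmit register is mapped at this address.
volatile std::uint8_t* const UART_TX =
    reinterpret_cast<volatile std::uint8_t*>(0x40001000);

void send(const char* s) {
    while (*s)
        *UART_TX = *s++; // every single write must reach the register;
                         // they cannot be coalesced into one final store
}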
Declaring a variable as volatile means the compiler can't make the assumptions about its value that it otherwise could, and hence prevents the compiler from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you mark i as volatile, you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
The compiler is not allowed to optimize away reads of a volatile object in a loop, which it would otherwise normally do (e.g., by hoisting the read out of the loop, the way it might hoist a strlen() call).
It's commonly used in embedded programming when reading a hardware register at a fixed address whose value may change unexpectedly. (In contrast with "normal" memory, which doesn't change unless written to by the program itself...)
That is its main purpose.
It can also be used to make sure one thread sees a change in a value written by another, but it in no way guarantees atomicity when reading/writing said object.
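A sketch (mine) of both points: volatile keeps the read inside the loop, but gives no atomicity or ordering guarantees; with real threads you would reach for std::atomic instead.
volatile bool done = false; // set elsewhere, e.g. by a signal handler

void wait_for_done() {
    while (!done) {
        // without volatile, the compiler could read done once, keep it
        // in a register, and spin here forever
    }
}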