What exactly is the "as-if" rule?

What exactly is the "as-if" rule? - c++

As the title says:
What exactly is the "as-if" rule?
A typical answer one would get is:
The rule that allows any and all code transformations that do not change the observable behavior of the program
From time to time, we keep getting behaviors from certain implementations, which are attributed to this rule. Many times wrongly.
So, what exactly is this rule? The standard does not clearly mention this rule as a section or paragraph, so what exactly falls under the purview of this rule?
To me, it seems like a grey area which is not defined in detail by the standard. Can someone elaborate on the details, citing the references from the standard?
Note: Tagging this as C and C++ both, because it is relevant to both languages.

What is the "as-if" rule?
The "as-if" rule basically defines what transformations an implementation is allowed to perform on a legal C++ program. In short, all transformations that do not affect a program's "observable behavior" (see below for a precise definition) are allowed.
The goal is to give implementations freedom to perform optimizations as long as the behavior of the program remains compliant with the semantics specified by the C++ Standard in terms of an abstract machine.
Where does the Standard introduce this rule?
The C++11 Standard introduces the "as-if" rule in Paragraph 1.9/1:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract
machine. This International Standard places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as explained
below.
Also, an explanatory footnote adds:
This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this
International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the
observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can
deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
What does the rule mandate exactly?
Paragraph 1.9/5 further specifies:
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
It is worth stressing that this constraint applies when "executing a well-formed program" only, and that the possible outcomes of executing a program which contains undefined behavior are unconstrained. This is made explicit in Paragraph 1.9/4 as well:
Certain other operations are described in this International Standard as undefined (for example, the effect
of attempting to modify a const object). [ Note: This International Standard imposes no requirements on
the behavior of programs that contain undefined behavior. —end note ]
Finally, concerning the definition of "observable behavior", Paragraph 1.9/8 goes as follows:
The least requirements on a conforming implementation are:
— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to one of the possible results that
execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place in such a fashion that prompting
output is actually delivered before a program waits for input. What constitutes an interactive device
is implementation-defined.
These collectively are referred to as the observable behavior of the program. [ Note: More stringent
correspondences between abstract and actual semantics may be defined by each implementation. —end
note ]
Are there situations where this rule does not apply?
To the best of my knowledge, the only exception to the "as-if" rule is copy/move elision, which is allowed even though the copy constructor, move constructor, or destructor of a class have side effects. The exact conditions for this are specified in Paragraph 12.8/31:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class
object, even if the constructor selected for the copy/move operation and/or the destructor for the object
have side effects. [...]

In C11 the rule is never called by that name. However C, just like C++, defines the behaviour in terms of abstract machine. The as-if rule is in C11 5.1.2.3p4 and p6:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
[...]
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take
place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
 
This is the observable behavior of the program.

In C, C++, Ada, Java, SML... in any programming language well specified by describing the (usually many possible, non-deterministic) behavior(s) of a program (exposed to series of interactions on I/O ports), there is no distinct as-if rule.
An example of distinct rule is the one that says that a division by zero raises an exception (Ada, Caml) or a null dereference raises an exception (Java). You could change the rule to specify something else and you would end up with a different language (that some people would rather call a "dialect"(*). A distinct rule is there to specify some distinct uses of a programming language like a distinct grammatical rule cover some syntax constructs.
(*) A dialect according to some linguists is a language with an "army". in that context, that could mean a programming language without a committee and a specific industry of compiler editors.
The as-if rule is not a distinct rule; it doesn't cover any program in particular and is not even a rule that could be discussed, removed, or altered in any way: the so called "rule" simply reiterates that program semantics is defined, and can only be portably (universally) defined, in term of the visible interactions of an execution of the program with the "external" world.
The external world can be I/O interfaces (stdio), a GUI, even an interactive interpreter that output the resulting value of a pure applicative language. In C and C++ is includes the (vaguely specified) accesses to volatile objects, which is another way of saying that some objects at given point must be represented in memory strictly according to the ABI (Application Binary Interface) without ever mentioning the ABI explicitly.
The definition of what is a trace of execution, also called the visible or observable behavior defines what is meant by "as-if rule". The as-if rule tries to explain it, but by doing so, it confuses people more than it clarifies things as it gives the expression of being an additional semantic rule giving more leeway to the implementation.
Summary:
The so called "as-if rule" does not relax any constraints on implementations.
You cannot remove the as-if rule in any programming language specified in term of visible behavior (execution traces composed for interaction with the external world) to get a distinct dialect.
You cannot add the as-if rule to any programming language not specified in term of visible behavior.

Related

Is a program where undefined behavior (UB) is conditional on implementation leeway a program with unconditional UB?

In the answer to
Is it valid to create closure (lambda) objects using `std::bit_cast` in C++20?
it was shown that a program could have undefined behavior "depending on" (des Pudels Kern in this question) how an implementation used implementation leeway given by the standard. As an example, [expr.prim.lambda.closure]/2:
The closure type is declared in the smallest block scope, class scope,
or namespace scope that contains the corresponding lambda-expression. [...]
The closure type is not an aggregate type. An
implementation may define the closure type differently from what is
described below provided this does not alter the observable behavior
of the program other than by changing:
(2.1) the size and/or alignment of the closure type,
(2.2) whether the closure type is trivially copyable ([class.prop]), or
(2.3) whether the closure type is a standard-layout class ([class.prop]). [...]
It was pointed out in a comment to the answer that this scenario is not implementation-defined behavior
"implementation-defined" has a very specific meaning ([intro.abstract]/2); this isn't a case of that.
Would a program which had undefined behavior (UB) conditionally on such implementation leeway, have unconditional UB, possibly as per [intro.abstract]/5? Or how would such a program be described, in standardese terms?

Assuming I understand the question correctly, here is a simpler example:
void* storage = ::operator new(100);
new (storage) std::string;
In some language implementation, where the string fits in the memory, the behaviour of this example program would be defined. But the standard does not provide a guarantee that any language implementation satisfies that assumption and in language implementation where the assumption doesn't hold, the behaviour is undefined.
The behaviour is undefined conditionally, depending on the language implementation. Same applies to the more subtle example described in the question.
It's not "implementation defined" behaviour because the standard doesn't say that it's "implementation defined" using those quoted words. If standard did say that, it would imply that language implementation must document that behaviour. As it is, there is no requirement to document whether closure type is trivially copyable.
To avoid this phrase with special meaning, we can use alternatives such as "implementation dependent" or "unspecified" to describe the situation instead.
If you wish to write programs that are portable to any language implementation of the current standard, including one's that exist in the future whose implementation you cannot know at the moment, you should not unconditionally rely on such implementation details.
You could use a type trait to observe whether the closure is trivially copyable, and conditionally use std::bit_cast only when it is well formed and well defined - if you have a good reason to do so.

Neither the C nor C++ Standard was written to fully describe all situations where implementations for various platforms and purposes should or should not be expected to process programs meaningfully. The term "implementation defined" is used only in situations where all implementations would be required to specify a behavior which, at least for code running on a single thread, would be consistent with sequential program execution. Even if most implementations should process a construct identically, the Standards will still use the term "Undefined Behavior" if there might be some implementations where it would be impractical to specify and implement a behavior that would always be predictable and consistent with sequential program execution. This among other things applies to constructs that could trap with side effects not anticipated by the Standard. For example, given something like:
float x,y; // Assume at least one might not get written before the following:
float temp= x*y;
if (func1())
func2(temp);
if no further use is made of temp, an implementation might sensibly defer the multiplication across the function call. If an attempt to multiply an invalid float value might trap, however, the effect of such deferral might be observable. Because implementations that can offer useful useful behavioral guarantees in cases not mandated by the Standard are always free to do so whether or not the Standard would require such behavior, the question of whether to mandate the behavior was only expected to be relevant in cases where it would be impractical implementations to process it meaningfully.

Is the compiler allowed to constant-fold a local volatile?

Consider this simple code:
void g();
void foo()
{
volatile bool x = false;
if (x)
g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimize out the potential call to g. This is correct in my understanding: The abstract machine is to assume that volatile variables may change at any moment (due to being e.g. hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.

I think [intro.execution] (paragraph number vary) could be used to explain MSVC behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime.34
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.

TL;DR The compiler can do whatever it wants on each volatile access. But the documentation has to tell you.--"The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines for a program permitted sequences of "volatile accesses" & other "observable behavior" (achieved via "side-effects") that an implementation must respect per "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract machine side effects & consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis a vis the abstract sequences of abstract machine side effects and/or consequent observable behaviors that some code (maybe) defines then say so. But if you are interested in what corresponding object code is generated then you must interpret that in the context of your compiler & compilation.
Chronically people wrongly believe that for volatile accesses an abstract machine evaluation/read causes an implemented read & an abstract machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/iff the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something--maybe, the generation of certain object code.

I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only hapen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.

Why Loops with one basic instruction works in O(1) [duplicate]

As the title says:
What exactly is the "as-if" rule?
A typical answer one would get is:
The rule that allows any and all code transformations that do not change the observable behavior of the program
From time to time, we keep getting behaviors from certain implementations, which are attributed to this rule. Many times wrongly.
So, what exactly is this rule? The standard does not clearly mention this rule as a section or paragraph, so what exactly falls under the purview of this rule?
To me, it seems like a grey area which is not defined in detail by the standard. Can someone elaborate on the details, citing the references from the standard?
Note: Tagging this as C and C++ both, because it is relevant to both languages.

In C11 the rule is never called by that name. However C, just like C++, defines the behaviour in terms of abstract machine. The as-if rule is in C11 5.1.2.3p4 and p6:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
[...]
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take
place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
 
This is the observable behavior of the program.

Creating files vs compiler optimizations [duplicate]

As the title says:
What exactly is the "as-if" rule?
A typical answer one would get is:
The rule that allows any and all code transformations that do not change the observable behavior of the program
From time to time, we keep getting behaviors from certain implementations, which are attributed to this rule. Many times wrongly.
So, what exactly is this rule? The standard does not clearly mention this rule as a section or paragraph, so what exactly falls under the purview of this rule?
To me, it seems like a grey area which is not defined in detail by the standard. Can someone elaborate on the details, citing the references from the standard?
Note: Tagging this as C and C++ both, because it is relevant to both languages.

In C11 the rule is never called by that name. However C, just like C++, defines the behaviour in terms of abstract machine. The as-if rule is in C11 5.1.2.3p4 and p6:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
[...]
The least requirements on a conforming implementation are:
Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take
place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
 
This is the observable behavior of the program.

Do empty expressions evaluate to NOP?

Wondering if empty expressions evaluate to NOP or if it's compiler dependent.
// Trivial example
int main()
{
;;
}

It's compiler dependent but the observable behaviour must be that nothing happens. In practice, I'm sure most compilers will omit no code at all for an empty expression.
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input.
And the observable behaviour is defined by:
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program.
This is really the only requirement for an implementation. It is often known as the "as-if" rule - the compiler can do whatever it likes as long as the observable behaviour is as expected.
For what it's worth, these empty expressions are known as null statements:
An expression statement with the expression missing is called a null statement.
If you really want a NOP, you can try:
asm("nop");
This is, however, conditionally supported and its behaviour is implementation-defined.

or if it's compiler dependent.
It is compiler-dependent ("as-if rule"), but most reasonable optimizing compilers will just ignore empty statements for the sake of efficiency, and they generally won't emit NOP instructions.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js