Related
When a class member cannot have a sensible meaning at the moment of construction,
I don't initialize it. Obviously that only applies to POD types, you cannot NOT
initialize an object with constructors.
The advantage of that, apart from saving CPU cycles initializing something to
a value that has no meaning, is that I can detect erroneous usage of these
variables with valgrind; which is not possible when I'd just give those variables
some random value.
For example,
struct MathProblem {
bool finished;
double answer;
MathProblem() : finished(false) { }
};
Until the math problem is solved (finished) there is no answer. It makes no sense to initialize answer in advance (to -say- zero) because that might not be the answer. answer only has a meaning after finished was set to true.
Usage of answer before it is initialized is therefore an error and perfectly OK to be UB.
However, a trivial copy of answer before it is initialized is currently ALSO UB (if I understand the standard correctly), and that doesn't make sense: the default copy and move constructor should simply be able to make a trivial copy (aka, as-if using memcpy), initialized or not: I might want to move this object into a container:
v.push_back(MathProblem());
and then work with the copy inside the container.
Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard? And if so, why? It doesn't seem to make sense.
Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard?
Depends on the type of the member. Standard says:
[basic.indet]
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
If an indeterminate value of unsigned ordinary character type ([basic.fundamental]) or std::byte type ([cstddef.syn]) is produced by the evaluation of:
the second or third operand of a conditional expression,
the right operand of a comma expression,
the operand of a cast or conversion ([conv.integral], [expr.type.conv], [expr.static.cast], [expr.cast]) to an unsigned ordinary character type or std::byte type ([cstddef.syn]), or
a discarded-value expression,
then the result of the operation is an indeterminate value.
If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the right operand of a simple assignment operator ([expr.ass]) whose first operand is an lvalue of unsigned ordinary character type or std::byte type, an indeterminate value replaces the value of the object referred to by the left operand.
If an indeterminate value of unsigned ordinary character type is produced by the evaluation of the initialization expression when initializing an object of unsigned ordinary character type, that object is initialized to an indeterminate value.
If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the initialization expression when initializing an object of std::byte type, that object is initialized to an indeterminate value.
None of the exceptional cases apply to your example object, so UB applies.
with memcpy is UB?
It is not. std::memcpy interprets the object as an array of bytes, in which exceptional case there is no UB. You still have UB if you attempt to read the indeterminate copy (unless the exceptions above apply).
why?
The C++ standard doesn't include a rationale for most rules. This particular rule has existed since the first standard. It is slightly stricter than the related C rule which is about trap representations. To my understanding, there is no established convention for trap handling, and the authors didn't wish to restrict implementations by specifying it, and instead opted to specify it as UB. This also has the effect of allowing optimiser to deduce that indeterminate values will never be read.
I might want to move this object into a container:
Moving an uninitialised object into a container is typically a logic error. It is unclear why you might want to do such thing.
The design of the C++ Standard was heavily influenced by the C Standard, whose authors (according to the published Rationale) intended and expected that implementations would, on a quality-of-implementation basis, extend the semantics of the language by meaningfully processing programs in cases where it was clear that doing so would be useful, even if the Standard didn't "officially" define the behavior of those programs. Consequently, both standards place more priority upon ensuring that they don't mandate behaviors in cases where doing so might make some implementations less useful, than upon ensuring that they mandate everything that should be supported by quality general-purpose implementations.
There are many cases where it may be useful for an implementation to extend the semantics of the language by guaranteeing that using memcpy on any valid region of storage will, at worst, behave in a fashion consistent with populating the destination with some possibly-meaningless bit pattern with no outside side effects, and few if any where it would be either easier or more useful to have it do something else. The only situations where anyone should care about whether the behavior of memcpy is defined in a particular situation involving valid regions of storage would be those in which some alternative behavior would be genuinely more useful than the commonplace one. If such situations exist, compiler writers and their customers would be better placed than the Committee to judge which behavior would be most useful.
As an example of a situation where an alternative behavior might be more useful, consider code which uses memcpy to copy a partially-written structure, and then uses it to make two copies of that structure. In some cases, having the compiler only write the parts of the two destination structures which had been written in the original may improve efficiency, but that behavior would be observably different from having the first memcpy behave as though it stores some bit pattern to its destination. Note that while such a change would not adversely affect a program's overall behavior if no copies of the uninitialized parts of the structure are ever used in a way that would affect behavior, the Standard has no nice way of distinguishing scenarios that could or could not occur under such a module, and thus leaves all such scenarios undefined.
This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 6 years ago.
I am confused about the output of the code.
It depends on what compiler i run the code. Why is it so?
#include <iostream>
using namespace std;
int f(int &n)
{
n--;
return n;
}
int main()
{
int n=10;
n=n-f(n);
cout<<n;
return 0;
}
Running it on the Ubuntu terminal with g++, the output is 1 whereas running it on Turbo C++ ( the compiler we used in school) gives output as 0.
In C++03, modifying a variable and also using its value in the same expression, without an intervening C++03 sequence point, was Undefined Behavior.
C++03 §5/4:
” Between the previous
and next sequence point a scalar object shall have its stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
Undefined Behavior, UB, provides the compiler with an opportunity to optimize, because it can assume that UB does not occur in a valid program.
However, with all the myriad UB rules of C++ it's difficult to reason about source code.
In C++11 sequence points were replaced with sequenced before, indeterminately sequenced and unsequenced relations:
C++11 §1.9/3
” Given any two evaluations A and B, if
A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before
B and B is not sequenced before A, then A and B are unsequenced. [Note: The execution of unsequenced
evaluations can overlap. —end note ] Evaluations A and B are indeterminately sequenced when either A
is sequenced before B or B is sequenced before A, but it is unspecified which.
And with the new C++11 sequence relationship rules the modification in the function in the code in question is indeterminately sequenced with respect to the use of the variable, and so the code has unspecified behavior rather than Undefined Behavior, as noted by Eric M Schmidt in a comment to (the first version of) this answer. Essentially that means that there is no danger of nasal daemons or other possible UB effects, and that the behavior is a reasonable one. The two possible behaviors here are that the modification via the function call is done before the use of the value, or that it's done after the use of the value.
Why it's unspecified behavior:
C++11 §1.9/15:
” Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.
What “unspecified behavior” means:
C++11 §1.3.25:
” unspecified behavior
Behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible
behaviors is usually delineated by this International Standard. —end note ]
Why the modification effected by the assignment is not problematic:
C++11 §5.17/1
” In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
This is also quite different from C++03.
As the rather drastic edit of this answer shows, following Eric's comment, this kind of issue is not simple! The main advice I can give is to as much as possible just Say No™ to effects governed by subtle or very complex rules, the corners of the language. Simple code has a better chance of being correct, while so called clever code does not have a good chance of being significantly faster.
In many discussions about undefined behavior (UB), the point of view has been put forward that in the mere presence in a program of any construct that has UB in a program mandates a conforming implementation to do just anything (including nothing at all). My question is whether this should be taken in that sense even in those cases where the UB is associated to the execution of code, while the behaviour (otherwise) specified in the standard stipulates that the code in question should not be executed (and this possibly for specific input to the program; it might not be decidable at compile time).
Phrased more informally, does the smell of UB mandate a conforming implementation to decide that the whole program stinks, and refuse to execute correctly even the parts of the program for which the behaviour is perfectly well defined. An example program would be
#include <iostream>
int main()
{
int n = 0;
if (false)
n=n++; // Undefined behaviour if it gets executed, which it doesn't
std::cout << "Hi there.\n";
}
For clarity,
I am assuming the program is well-formed (so in particular the UB is not associated to preprocessing). In fact I am willing to restrict to UB associated to "evaluations", which clearly are not compile-time entities. The definitions pertinent to the example given are, I think,(emphasis is mine):
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread (1.10), which induces a partial order among those evaluations
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either ... or a value computation using the value of the same scalar object, the behavior is undefined.
It is implicitly clear that the subjects in the final sentence, "side effect" and "value computation", are instances of "evaluation", since that is what the relation "sequenced before" is defined for.
I posit that in the above program, the standard stipulates that no evaluations occur for which the condition in the final sentence is satisfied (unsequenced relative to each other and of the described kind) and that therfore the program does not have UB; it is not erroneous.
In other words I am convinced that the answer to the question of my title is negative. However I would appreciate the (motivated) opinions of other people on this matter.
Maybe an additional question for those who advocate an affirmative answer, would that mandate that the proverbial reformatting of your hard drive might occur when an erroneous program is compiled?
Some related pointers on this site:
Observable behavior and undefined behavior -- What happens if I don't call a destructor?
Comments to this answer https://stackoverflow.com/a/24143792/1436796 (I do no longer stand absolutely with my answer itself)
C++ What is the earliest undefined behavior can manifest itself?
Difference between Undefined Behavior and Ill-formed, no diagnostic message required and its two answers, which represent opposite points of view
If a side effect on a scalar object is unsequenced relative to etc
Side effects are changes in the state of the execution environment (1.9/12). A change is a change, not an expression that, if evaluated, would potentially produce a change. If there is no change, there is no side effect. If there is no side effect, then no side effect is unsequenced relative to anything else.
This does not mean that any code which is never executed is UB-free (though I'm pretty sure most of it is). Each occurrence of UB in the standard needs to be examined separately. (The stricken-out text is probably overly cautious; see below).
The standard also says that
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
(emphasis mine)
This, as far as I can tell, is the only normative reference that says what the phrase "undefined behavior" means: an undefined operation in a program execution. No execution, no UB.
No. Example:
struct T {
void f() { }
};
int main() {
T *t = nullptr;
if (t) {
t->f(); // UB if t == nullptr but since the code tested against that
}
}
Deciding whether a program will perform an integer division by 0 (which is UB) is in general equivalent the halting problem. There is no way a compiler can determine that, in general. And so the mere presence of possible UB can not logically affect the rest of the program: a requirement to that effect in the standard, would require each compiler vendor to provide a halting problem solver in the compiler.
Even simpler, the following program has UB only if the user inputs 0:
#include <iostream>
using namespace std;
auto main() -> int
{
int x;
if( cin >> x ) cout << 100/x << endl;
}
It would be absurd to maintain that this program in itself has UB.
Once the undefined behavior occurs, however, then anything can happen: the further execution of code in the program is then compromised (e.g. the stack might have been fouled up).
In the general case the best we can say here is that it depends.
One case where the answer is no, happens when dealing with indeterminate values. The latest draft clearly makes it undefined behavior to produce an indeterminate value during an evaluation with some exceptions but the code sample clearly shows how subtle it could be:
[ Example:
int f(bool b) {
unsigned char c;
unsigned char d = c; // OK, d has an indeterminate value
int e = d; // undefined behavior
return b ? d : 0; // undefined behavior if b is true
}
— end example ]
so this line of code:
return b ? d : 0;
is only undefined if b is true. This seems to be the intuitive approach and seems to be how John Regehr sees it as well, if we read It’s Time to Get Serious About Exploiting Undefined Behavior.
In this case the answer is yes, the code is erroneous even though we are not calling the code invoking undefined behavior:
constexpr const char *str = "Hello World" ;
constexpr char access()
{
return str[100] ;
}
int main()
{
}
clang chooses to make access erroneous even though it is never invoked (see it live).
There's a clear divide between inherent undefined behaviour, such as n=n++, and code that can have defined or undefined behaviour depending on the program state at runtime, such as x/y for ints. In the latter case the program is required to work unless y is 0, but in the first case the compiler's asked to generate code that's totally illegitimate - it's within its rights to refuse to compile, it may just not be "bullet proofed" against such code and consequently its optimiser state (register allocations, records of which values may have been modified since read etc) gets corrupted resulting in bogus machine code for that and surrounding source code. It may be that early analysis recognised an "a=b++" situation and generated code for the preceding if to jump over a two byte instruction, but when n=n++ is encountered no instruction was output, such that the if statement jumps somewhere into the following opcodes. Anyway, it's simply game over. Putting an "if" in front, or even wrapping it in a different function, isn't documented as "containing" the undefined behaviour... bits of code aren't tainted with undefined behaviour - the Standard consistently says "the program has undefined behaviour".
It should be, if not "shall".
Behavior, by definition from ISO C (no corresponding definition found in ISO C++ but it should be still somehow applicable), is:
3.4
1 behavior
external appearance or action
And UB:
WG21/N4527
1.3.25 [defns.undefined]
undefined behavior
behavior for which this International Standard imposes no requirements [ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.
—end note ]
Despite "to behaving during translation" above, the word "behavior" used by ISO C++ is mainly about the execution of programs.
WG21/N4527
1.9 Program execution [intro.execution]
1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.5
2 Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine.
Each implementation shall include documentation describing its characteristics and behavior in these respects.6 Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the “corresponding instance” below).
3 Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, evaluation of expressions in a new-initializer if the allocation function fails to allocate memory (5.3.4)). Where possible, this International Standard defines a set of allowable behaviors.
These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine can thus have more than one possible execution for a given program and a given input.
4 Certain other operations are described in this International Standard as undefined (for example, the effect of attempting to modify a const object). [ Note: This International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
5) This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
6) This documentation also includes conditionally-supported constructs and locale-specific behavior. See 1.4.
It is clear the undefined behavior would be caused by specific language construct used wrongly or in a non-portable way (which is not conforming to the standard). However, the standard mention nothing about which specific portion of code in a program would cause it. In other words, "having undefined behavior" is the property (about conforming) of the whole program being executed, not any smaller parts of it.
The standard could have given a stronger guarantee to make the behavior well-defined once some specific code is not being executed, only when there exists a way to map the C++ code to the corresponding behavior precisely. This is hard (if not impossible) without a detailed semantic model about execution. In short, the operational semantics given by the abstract machine model above is not enough to achieve the stronger guarantee. But anyway, ISO C++ would never be JVMS or ECMA-335. And I don't expect there would be a complete set of formal semantics describing the language.
A key problem here is the meaning of "execution". Some people think "executing a program" means making the program being run. This is not quite true. Note the representation of program executed in the abstract machine is not specified. (Also note "this International Standard places no requirement on the structure of conforming implementations".) The code being executed here can be literally C++ code (not necessarily machine code or some other forms of intermediate code which is not specified by the standard at all). This effectively allows the core language to be implemented as an interpreter, an online partial evaluator or some other monsters translating C++ code on-the-fly. As a result, actually there is no way to split the phases of translation (defined by ISO C++ [lex.phases]) completely ahead of the process of execution without knowledge about specific implementations. Thus, it is necessary to allow UB occurring during the translation when it is too difficult to specify portable well-defined behavior.
Besides the problems above, perhaps for most ordinary users, one (non-technical) reason is enough: it is simply unnecessary to provide the stronger guarantee, allow bad code and defeat one of the (probable most important) usefulness aspect of UB itself: to encourage quickly throwing away some (unnecessarily) nonportable smelly code without effort to "fix" them which would be eventually in vain.
Additional notes:
Some words are copied and reconstructed from one of my reply to this comment.
A C compiler is allowed to do anything it likes as soon as a program enters a state via which there is no defined sequence of events which would allow the program to avoid invoking Undefined Behavior at some point in the future (note any loop which does not have any side-effects, and which does not have an exit condition which a compiler would be to required to recognize, invokes Undefined Behavior in and of itself). The compiler's behavior in such cases is bound by the laws of neither time nor causality. In situations where Undefined Behavior occurs in an expression whose result is never used, some compilers won't generate any code for the expression (so it will never "execute") but that won't prevent compilers from using the Undefined Behavior to make other inferences about program behavior.
For example:
void maybe_launch_missiles(void)
{
if (should_launch_missiles())
{
arm_missiles();
if (should_launch_missiles())
launch_missiles();
}
disarm_missiles();
}
int foo(int x)
{
maybe_launch_missiles();
return x<<1;
}
Under the C current C standard, if the compiler could determinate that disarm_missiles() would always return without terminating but the three other external functions called above might terminate, the most efficient standard-compliant replacement for the statement foo(-1); (return value ignored) would be should_launch_missiles(); arm_missiles(); should_launch_missiles(); launch_missiles();.
Program behavior will only be defined if either call to should_launch_missiles() terminates without returning, if the first call returns non-zero and arm_missiles() terminates without returning, or if both calls return non-zero and launch_missiles() terminates without returning. A program which works correctly in those cases will abide by the standard regardless of what it does in any other situation. If returning from maybe_launch_missiles() would cause Undefined Behavior, compiler would not be required to recognize the possibility that either call to should_launch_missiles() could return zero.
As a consequence, some modern compilers, the effect of left-shifting a negative number may be worse than anything that could be caused by any kind of Undefined Behavior on a typical C99 compiler on platforms that separate code and data spaces and trap stack overflow. Even if code engaged in Undefined Behavior which could cause random control transfers, there would be no means by which it could cause arm_missiles() and launch_missiles() to be called consecutively without having an intervening call to disarm_missiles() unless at least one call to should_launch_missiles() returned a non-zero value. A hyper-modern compiler, however, may negate such protections.
In the dialect processed by gcc with full optimizations enabled, if a program contains two constructs which would behave identically in cases where both are defined, reliable program operation requires that any code that would switch among them only be executed in cases where both are defined. For example, when optimizations are enabled, both ARM gcc 9.2.1 and x86-64 gcc 10.1 will process the following source:
#include <limits.h>
#if LONG_MAX == 0x7FFFFFFF
typedef int longish;
#else
typedef long long longish;
#endif
long test(long *x, long *y)
{
if (*x)
{
if (x==y)
*y = 1;
else
*(longish*)y = 1;
}
return *x;
}
into machine code that will test if x and y are equal, set *x to 1 if they are and *y to 1 if they aren't, but return the previous value of *x in either case. For purpose of determining whether anything might affect *x, gcc decides that both branches of the if are equivalent, and thus only evaluates the "false" branch. Since that can't affect *x, it concludes that the if as a whole can't either. That determination is unswayed by its observation that on the true branch, the write to *y can be replaced with a write to *x.
In the context of a safety-critical embedded system, the posted code would be considered defective:
The code should not pass code review and/or standards compliance (MISRA, etc)
Static analysis (lint, cppcheck, etc) should flag this as a defect
Some compilers can flag this as a warning (implying a defect, as well.)
Given the following function call:
f(g(), h())
since the order of evaluation of function arguments is unspecified (still the case in C++11 as far as I'm aware), could an implementation theoretically execute g() and h() in parallel?
Such a parallelisation could only kick in were g and h known to be fairly trivial (in the most obvious case, accessing only data local to their bodies) so as not to introduce concurrency issues but, beyond that restriction I can't see anything to prohibit it.
So, does the standard allow it? Even if only by the as-if rule?
(In this answer, Mankarse asserts otherwise; however, he does not cite the standard, and my read-through of [expr.call] hasn't revealed any obvious wording.)
The requirement comes from [intro.execution]/15:
... When calling a function ... Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function [Footnote: In other words, function executions do not interleave with each other.].
So any execution of the body of g() must be indeterminately sequenced with (that is, not overlapping with) the evaluation of h() (because h() is an expression in the calling function).
The critical point here is that g() and h() are both function calls.
(Of course, the as-if rule means that the possibility cannot be entirely ruled out, but it should never happen in a way that could affect the observable behaviour of a program. At most, such an implementation would just change the performance characteristics of the code.)
As long as you can't tell, whatever the compiler does to evaluate these functions is entirely up to the compiler. Clearly, the evaluation of the functions cannot involve any access to shared, mutable data as this would introduce data races. The basic guiding principle is the "as if"-rule and the fundamental observable operations, i.e., access to volatile data, I/O operations, access to atomic data, etc. The relevant section is 1.9 [intro.execution].
Not unless the compiler knew exactly what g(), h(), and anything they call does.
The two expressions are function calls, which may have unknown side effects. Therefore, parallelizing them could cause a data-race on those side effects. Since the C++ standard does not allow argument evaluation to cause a data-race on any side effects of the expressions, the compiler can only parallelize them if it knows that no such data race is possible.
That means walking though each function and look at exactly what they do and/or call, then tracking through those functions, etc. In the general case, it's not feasible.
Easy answer: when the functions are sequenced, even if indeterminately, there is no possibility for a race condition between the two, which is not true if they are parallelized. Even a pair of one line "trivial" functions could do it.
void g()
{
*p = *p + 1;
}
void h()
{
*p = *p - 1;
}
If p is a name shared by g and h, then a sequential calling of g and h in any order will result in the value pointed to by p not changing. If they are parallelized, the reading of *p and the assigning of it could be interleaved arbitrarily between the two:
g reads *p and finds the value 1.
f reads *p and also finds the value 1.
g writes 2 to *p.
f, still using the value 1 it read before will write 0 to *p.
Thus, the behavior is different when they are parallelized.
Is it a standard term which is well defined, or just a term coined by developers to explain a concept (.. and what is the concept)? As I understand this has something to do with the all-confusing sequence points, but am not sure.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
For reference, some questions talking about side effects:
Is comma operator free from side effect?
Force compiler to not optimize side-effect-less statements
Side effects when passing objects to function in C++
A "side effect" is defined by the C++ standard in [intro.execution], by:
Reading an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
The term "side-effect" arises from the distinction between imperative languages and pure functional languages. A C++ expression can do three things:
compute a result (or compute "no result" in the case of a void expression),
raise an exception instead of evaluating to a result,
in addition to 1 or 2, otherwise alter the state of the abstract machine on which the program is nominally running.
(3) are side-effects, the "main effect" being to evaluate the result of the expression. Exceptions are a slightly awkward special case, in that altering the flow of control does change the state of the abstract machine (by changing the current point of execution), but isn't a side-effect. The code to construct, handle and destroy the exception may have its own side-effects, of course.
The same principles apply to functions, with the return value in place of the result of the expression.
So, int foo(int a, int b) { return a + b; } just computes a return value, it doesn't alter anything else. Therefore it has no side-effects, which sometimes is an interesting property of a function when it comes to reasoning about your program (e.g. to prove that it is correct, or by the compiler when it optimizes). int bar(int &a, int &b) { return ++a + b; } does have a side-effect, since modifying the caller's object a is an additional effect of the function beyond simply computing a return value. It would not be permitted in a pure functional language.
The stuff in your quote about "has finished being evaluated" refers to the fact that the result of an expression (or return value of a function) can be a "temporary object", which is destroyed at the end of the full expression in which it occurs. So creating a temporary isn't a "side-effect" by that definition: other changes are.
What exactly is a 'side-effect' in C++? Is it a standard term which is well defined...
c++11 draft - 1.9.12: Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access to a volatile object is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.
I found one definition here, but doesn't this make each and every statement of code a side effect?
A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
Can someone please explain what the term 'side effect' formally means in C++, and what is its significance?
The significance is that, as expressions are being evaluated they can modify the program state and/or perform I/O. Expressions are allowed in myriad places in C++: variable assignments, if/else/while conditions, for loop setup/test/modify steps, function parameters etc.... A couple examples: ++x and strcat(buffer, "append this").
In a C++ program, the Standard grants the optimiser the right to generate code representing the program operations, but requires that all the operations associated with steps before a sequence point appear before any operations related to steps after the sequence point.
The reason C++ programmers tend to have to care about sequence points and side effects is that there aren't as many sequence points as you might expect. For example: given x = 1; f(++x, ++x);, you may expect a call to f(2, 3) but it's actually undefined behaviour. This behaviour is left undefined so the compiler's optimiser has more freedom to arrange operations with side effects to run in the most efficient order possible - perhaps even in parallel. It also avoid burdening compiler writers with detecting such conditions.
1.Is comma operator free from side effect?
Yes - a comma operator introduces a sequence point: the steps on the left must be complete before those on the right execute. There are a list of sequence points at http://en.wikipedia.org/wiki/Sequence_point - you should read this! (If you have to ask about side effects, then be careful in interpreting this answer - the "comma operator" is NOT invoked between function arguments, array initialisation elements etc.. The comma operator is relatively rarely used and somewhat obscure. Do some reading if you're not sure what the comma operator really is.)
2.Force compiler to not optimize side-effect-less statements
I assume you mean "side-effect-ful" statements. Compiler's are not obliged to support any such option. What behaviour would they exhibit if they tried? - the Standard doesn't define what they should do in such situations. Sometimes a majority of programmers might share an intuitive expectation, but other times it's really arbitary.
3.Side effects when passing objects to function in C++
When calling a function, all the parameters must have been completely evaluated - and their side effects triggered - before the function call takes place. BUT, there are no restrictions on the compiler related to evaluating specific parameter expressions before any other. They can be overlapping, in parallel etc.. So, in f(expr1, expr2) - some of the steps in evaluating expr2 might run before anything from expr1, but expr1 might still complete first - it's undefined.
1.9.6
The observable behavior of the abstract machine is its sequence of
reads and writes to volatile data and calls to library I/O
functions.
A side-effect is anything that affects observable behavior.
Note that there are exceptions specified by the standard, where observable behavior doesn't have to conform to that of the abstract machine - see return value optimization, temporary copy elision.