Why does the following print bD aD aB aA aC aU instead of aD aB aA aC bD aU? In other words, why is b-- evaluated before --++a--++?
#include <iostream>
using namespace std;
class A {
char c_;
public:
A(char c) : c_(c) {}
A& operator++() {
cout << c_ << "A ";
return *this;
}
A& operator++(int) {
cout << c_ << "B ";
return *this;
}
A& operator--() {
cout << c_ << "C ";
return *this;
}
A& operator--(int) {
cout << c_ << "D ";
return *this;
}
void operator+(A& b) {
cout << c_ << "U ";
}
};
int main()
{
A a('a'), b('b');
--++a-- ++ +b--; // the culprit
}
From what I gather, here's how the expression is parsed by the compiler:
Preprocessor tokenization: -- ++ a -- ++ + b --;
Operator precedence1: (--(++((a--)++))) + (b--);
+ is left-to-right associative, but nonetheless the compiler may choose to evaluate the expression on the right (b--) first.
I'm assuming the compiler chooses to do it this way because it leads to better optimized code (less instructions). However, it's worth noting that I get the same result when compiling with /Od (MSVC) and -O0 (GCC). This brings me to my question:
Since I was asked this on a test which should in principle be implementation/compiler-agnostic, is there something in the C++ standard that prescribes the above behavior, or is it truly unspecified? Can someone quote an excerpt from the standard which confirms either? Was it wrong to have such a question on the test?
1 I realize the compiler doesn't really know about operator precedence or associativity, rather it cares only about the language grammar, but this should get the point across either way.
The expression statement
--++a-- ++ +b--; // the culprit
can be represented the following way
at first like
( --++a-- ++ ) + ( b-- );
then like
( -- ( ++ ( ( a-- ) ++ ) ) ) + ( b-- );
and at last like
a.operator --( 0 ).operator ++( 0 ).operator ++().operator --().operator + ( b.operator --( 0 ) );
Here is a demonstrative program.
#include <iostream>
using namespace std;
#include <iostream>
using namespace std;
class A {
char c_;
public:
A(char c) : c_(c) {}
A& operator++() {
cout << c_ << "A ";
return *this;
}
A& operator++(int) {
cout << c_ << "B ";
return *this;
}
A& operator--() {
cout << c_ << "C ";
return *this;
}
A& operator--(int) {
cout << c_ << "D ";
return *this;
}
void operator+(A& b) {
cout << c_ << "U ";
}
};
int main()
{
A a('a'), b('b');
--++a-- ++ +b--; // the culprit
std::cout << std::endl;
a.operator --( 0 ).operator ++( 0 ).operator ++().operator --().operator + ( b.operator --( 0 ) );
return 0;
}
Its output is
bD aD aB aA aC aU
bD aD aB aA aC aU
You can imagine the last expression written in the functional form like a postfix expression of the form
postfix-expression ( expression-list )
where the postfix expression is
a.operator --( 0 ).operator ++( 0 ).operator ++().operator --().operator +
and the expression-list is
b.operator --( 0 )
In the C++ Standard (5.2.2 Function call) there is said that
8 [Note: The evaluations of the postfix expression and of the arguments
are all unsequenced relative to one another. All side effects of
argument evaluations are sequenced before the function is entered (see
1.9). —end note]
So it is implementation-defined whether at first the argument will be evaluated or the postfix expression. According to the showed output the compiler at first evaluates the argument and only then the postfix expression.
I would say they were wrong to include such a question.
Except as noted, the following excerpts are all from §[intro.execution] of N4618 (and I don't think any of this stuff has changed in more recent drafts).
Paragraph 16 has the basic definition of sequenced before, indeterminately sequenced, etc.
Paragraph 18 says:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
In this case, you're (indirectly) calling some functions. The rules there are fairly simple as well:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. For each function invocation F, for every evaluation A that occurs within F and every evaluation B that does not occur within F but is evaluated on the same thread and as part of the same signal handler (if any), either A is
sequenced before B or B is sequenced before A.
Putting that into bullet points to more directly indicate order:
first evaluate the function arguments, and whatever designates the function being called.
Evaluate the body of the function itself.
Evaluate another (sub-)expression.
No interleaving is allowed unless something starts up a thread to allow something else to execute in parallel.
So, does any of this change before we're invoking the functions via operator overloads rather than directly? Paragraph 19 says "No":
The sequencing constraints on the execution of the called function (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the function might be.
§[expr]/2 also says:
Uses of overloaded operators are transformed into function calls as described
in 13.5. Overloaded operators obey the rules for syntax and evaluation order specified in Clause 5, but the requirements of operand type and value category are replaced by the rules for function call.
Individual operators
The only operator you've used that has somewhat unusual requirements with respect to sequencing are the post-increment and post-decrement. These say (§[expr.post.incr]/1:
The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. —end note ]
In the end, however, this is pretty much just what you'd probably expect: if you pass x++ as a parameter to a function, the function receives the previous value of x, but if x is also in scope inside the function, x will have the incremented value by the time the body of the function starts to execute.
The + operator, however, does not specify ordering of the evaluation of its operands.
Summary
Using overloaded operators does not enforce any sequencing on the evaluation of sub-expressions within an expression, beyond the fact that evaluating an individual operator is a function call, and has the ordering requirements of any other function call.
More specifically, in this case, b-- is the operand to a function call, and --++a-- ++ is the expression that designates the function being called (or at least the object on which the function will be called--the -- designates the function within that object). As noted, ordering between these two is not specified (nor does operator + specify an order of evaluating its left vs. right operand).
There is not something in the C++ standard which says things need to be evaluated in this way. C++ has the concept of sequenced-before, where some operations are guaranteed to happen before other operations are. This is a partially-ordered set; that is, sosome operations are sequenced before others, two operations can’t be sequenced before eath other, and if a is sequenced before b, and b is sequenced before c, then a is sequenced before c. However, there are many types of operation which have no sequenced-before guarantees. Before C++11, there was instead a concept of a sequence point, which isn’t quite the same but similar.
Very few operators (only ,, &&, ?:, and ||, I believe) guarantee a sequence point between their arguments (and even then, until C++17, this guarantee doesn’t exist when the operators are overloaded). In particular, the addition does not guarantee any such thing. The compiler is free to evaluate the left-hand side first, to evaluate the right-hand side first, or (I think) even to evaluate them simultaneously.
Sometimes changing optimization options can change the results, or changing compilers. Apparently you aren’t seeing that; there are no guarantees here.
Operator precedence and associativity rules are only used to convert your expression from the original "operators in expression" notation to the equivalent "function call" format. After the conversion you end up with a bunch of nested function calls, which are processed in the usual way. In particular, order of parameter evaluation is unspecified, which means that there's no way to say which operand of the "binary +" call will get evaluated first.
Also, note that in your case binary + is implemented as a member function, which creates certain superficial asymmetry between its arguments: one argument is "regular" argument, another is this. Maybe some compilers "prefer" to evaluate "regular" arguments first, which is what leads to b-- being evaluated first in your tests (you might end up with different ordering from the same compiler if you implement your binary + as a freestanding function). Or maybe it doesn't matter at all.
Clang, for example, begins with evaluating the first operand, leaving b-- for later.
Take in account priority of operators in c++:
a++ a-- Suffix/postfix increment and decrement. Left-to-right
++a --a Prefix increment and decrement. Right-to-left
a+b a-b Addition and subtraction. Left-to-right
Keeping the list in your mind you can easily read the expression even without parentheses:
--++a--+++b--;//will follow with
--++a+++b--;//and so on
--++a+b--;
--++a+b;
--a+b;
a+b;
And dont forget about essential difference prefix and postfix operators in terms of order evaluation of variable and expression ))
Related
Originally, I presented a more complicated example, this one was proposed by #n. 'pronouns' m. in a now-deleted answer. But the question became too long, see edit history if you are interested.
Has the following program well-defined behaviour in C++17?
int main()
{
int a=5;
(a += 1) += a;
return a;
}
I believe this expression is well-defined and evaluated like this:
The right side a is evaluated to 5.
There are no side-effects of the right side.
The left side is evaluated to a reference to a, a += 1 is well-defined for sure.
The left-side side-effect is executed, making a==6.
The assignment is evaluted, adding 5 to the current value of a, making it 11.
The relevant sections of the standard:
[intro.execution]/8:
An expression X is said to be sequenced before an expression Y if
every value computation and every side effect associated with the
expression X is sequenced before every value computation and every
side effect associated with the expression Y.
[expr.ass]/1 (emphasis mine):
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand; their result is an lvalue referring to the left operand. The
result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. The right operand is sequenced before the
left operand. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
The wording originally comes from the accepted paper P0145R3.
Now, I feel there is some ambiguity, even contradiction, in this second section.
The right operand is sequenced before the left operand.
Together with the definition of sequenced before strongly implies the ordering of side-effects, yet the previous sentence:
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression
only explicitly sequences the assignment after value computation, not their side-effects. Thus allowing this behaviour:
The right side a is evaluated to 5.
The left side is evaluated to a reference of a, a += 1 is well-defined for sure.
The assignment is evaluted, adding 5 to the current value of a, making it 10.
The left-side side-effect is executed, making a==11 or maybe even 6 if the old values was used even for the side-effect.
But this ordering clearly violates the definition of sequenced before since the side-effects of the left operand happened after the value computation of the right operand. Thus left operand was not sequenced after the right operand which violets the above mentioned sentence. No I done goofed. This is allowed behaviour, right? I.e. the assignment can interleave the right-left evaluation. Or it can be done after both full evaluations.
Running the code gcc outputs 12, clang 11. Furthermore, gcc warns about
<source>: In function 'int main()':
<source>:4:8: warning: operation on 'a' may be undefined [-Wsequence-point]
4 | (a += 1) += a;
| ~~~^~~~~
I am terrible at reading assembly, maybe someone can at least rewrite how gcc got to 12? (a += 1), a+=a works but that seems extra wrong.
Well, thinking more about it, the right side also does evaluate to a reference to a, not just to a value 5. So Gcc could still be right, in that case clang could be wrong.
In order to follow better what is actually performed, let's try to mimic the same with our own type and add some printouts:
class Number {
int num = 0;
public:
Number(int n): num(n) {}
Number operator+=(int i) {
std::cout << "+=(int) for *this = " << num
<< " and int = " << i << std::endl;
num += i;
return *this;
}
Number& operator+=(Number n) {
std::cout << "+=(Number) for *this = " << num
<< " and Number = " << n << std::endl;
num += n.num;
return *this;
}
operator int() const {
return num;
}
};
Then when we run:
Number a {5};
(a += 1) += a;
std::cout << "result: " << a << std::endl;
We get different results with gcc and clang (and without any warning!).
gcc:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 6
result: 12
clang:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 5
result: 11
Which is the same result as for ints in the question. Even though it is not the same exact story: built-in assignment has its own sequencing rules, as opposed to overloaded operator which is a function call, still the similarity is interesting.
It seems that while gcc keeps the right side as a reference and turns it to a value on the call to +=, clang on the other hand turns the right side to a value first.
The next step would be to add a copy constructor to our Number class, to follow exactly when the reference is turned into a value. Doing that results with calling the copy constructor as the first operation, both by clang and gcc, and the result is the same for both: 11.
It seems that gcc delays the reference to value conversion (both in the built-in assignment as well as with user defined type without a user defined copy constructor). Is it coherent with C++17 defined sequencing? To me it seems as a gcc bug, at least for the built-in assignment as in the question, as it sounds that the conversion from reference to value is part of the "value computation" that shall be sequenced before the assignment.
As for a strange behavior of clang reported in previous version of the original post - returning different results in assert and when printing:
constexpr int foo() {
int res = 0;
(res = 5) |= (res *= 2);
return res;
}
int main() {
std::cout << foo() << std::endl; // prints 5
assert(foo() == 5); // fails in clang 11.0 - constexpr foo() is 10
// fixed in clang 11.x - correct value is 5
}
This relates to a bug in clang. The failure of the assert is wrong and is due to wrong evaluation order of this expression in clang, during constant evaluation in compile time. The value should be 5. This bug is already fixed in clang trunk.
if(a && b)
{
do something;
}
is there any possibility to evaluate arguments from right to left(b -> a)?
if "yes", what influences the evaluation order?
(i'm using VS2008)
With C++ there are only a few operators that guarantee the evaluation order
operator && evaluates left operand first and if the value is logically false then it avoids evaluating the right operand. Typical use is for example if (x > 0 && k/x < limit) ... that avoids division by zero problems.
operator || evaluates left operand first and if the value is logically true then it avoids evaluating the right operand. For example if (overwrite_files || confirm("File existing, overwrite?")) ... will not ask confirmation when the flag overwrite_files is set.
operator , evaluates left operand first and then right operand anyway, returning the value of right operand. This operator is not used very often. Note that commas between parameters in a function call are not comma operators and the order of evaluation is not guaranteed.
The ternary operator x?y:z evaluates x first, and then depending on the logical value of the result evaluates either only y or only z.
For all other operators the order of evaluation is not specified.
The situation is actually worse because it's not that the order is not specified, but that there is not even an "order" for the expression at all, and for example in
std::cout << f() << g() << x(k(), h());
it's possible that functions will be called in the order h-g-k-x-f (this is a bit disturbing because the mental model of << operator conveys somehow the idea of sequentiality but in reality respects the sequence only in the order results are put on the stream and not in the order the results are computed).
Obviously the value dependencies in the expression may introduce some order guarantee; for example in the above expression it's guaranteed that both k() and h() will be called before x(...) because the return values from both are needed to call x (C++ is not lazy).
Note also that the guarantees for &&, || and , are valid only for predefined operators. If you overload those operators for your types they will be in that case like normal function calls and the order of evaluation of the operands will be unspecified.
Changes since C++17
C++17 introduced some extra ad-hoc specific guarantees about evaluation order (for example in the left-shift operator <<). For all the details see https://stackoverflow.com/a/38501596/320726
The evaluation order is specified by the standard and is left-to-right. The left-most expression will always be evaluated first with the && clause.
If you want b to be evaluated first:
if(b && a)
{
//do something
}
If both arguments are methods and you want both of them to be evaluated regardless of their result:
bool rb = b();
bool ra = a();
if ( ra && rb )
{
//do something
}
In this case, since you're using &&, a will always be evaluated first because the result is used to determine whether or not to short-circuit the expression.
If a returns false, then b is not allowed to evaluate at all.
Every value computation and side effect of the first (left) argument of the built-in logical AND operator && and the built-in logical OR operator || is sequenced before every value computation and side effect of the second (right) argument.
Read here for a more exhaustive explanation of the rules set:
order evaluation
It will evaluate from left to right and short-circuit the evaluation if it can (e.g. if a evaluates to false it won't evaluate b).
If you care about the order they are evaluated in you just need to specify them in the desired order of evaluation in your if statement.
The built-in && operator always evaluates its left operand first. For example:
if (a && b)
{
//block of code
}
If a is false, then b will not be evaluated.
If you want b to be evaluated first, and a only if b is true, simply write the expression the other way around:
if (b && a)
{
//block of code
}
While working on a school assignment, we had to do something with operator overloading and templates. All cool. I wrote:
template<class T>
class Multiplication : public Expression<T>
{
private:
typename std::shared_ptr<Expression<T> > l, r;
public:
Multiplication(typename std::shared_ptr<Expression<T> > l, typename std::shared_ptr<Expression<T> > r) : l(l), r(r) {};
virtual ~Multiplication() {};
T evaluate() const
{
std::cout << "*";
T ml = l->evaluate();
T mr = r->evaluate();
return ml * mr;
};
};
Then a friend asked me why his code produced output in the "wrong" order.
He had something like
T evaluate() const
{
std::cout << "*";
return l->evaluate() * r->evaluate();
};
The code of r->evaluate() printed the debug information, before l->evaluate().
I tested it on my machine as well, by just changing these three lines to a one-liner.
So, I thought, well then * should be right-to-left associative. But everywhere on the internet they say it is left-to-right. Are there some extra rules? Maybe something special when using templates? Or is this a bug in VS2012 ?
When we say the associativity of * is left-to-right, we mean that the expression a*b*c*d will always evaluate as (((a*b)*c)*d). That's it. In your example, you only have one operator*, so there isn't anything to associate.
What you're running into is the order of evaluation of operands. You are calling:
operator*(l->evaluate(), r->evaluate());
Both expressions need to be evaluated before the call to operator*, but it is unspecified (explicitly) by the C++ standard what order they get evaluated in. In your case, r->evaluate() got evaluated first - but that has nothing to do with the associativity of operator*.
Note that even if you had a->evaluate() * b->evaluate() * c->evaluate(), that would get parsed as:
operator*(operator*(a->evaluate(), b->evaluate()), c->evaluate())
based on the rules of operator associativity - but even in that case, there's no rule to prevent c->evaluate() from being called first. It may very well be!
You have a single operator in your expression:
l->evaluate() * r->evaluate()
so associativity is not involved at all here. The catch is that the two operands are evaluated before calling the * operator and the order in which they are evaluated is not defined. A compiler is allowed to reorder the evaluation in any suitable way.
In C++11 terms, the call to operator* is sequenced after the operand evaluation, but there is no sequence relation between the two evaluations. From the n4296 draft (post C++14), page 10:
§1.9.15 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
if(a && b)
{
do something;
}
is there any possibility to evaluate arguments from right to left(b -> a)?
if "yes", what influences the evaluation order?
(i'm using VS2008)
With C++ there are only a few operators that guarantee the evaluation order
operator && evaluates left operand first and if the value is logically false then it avoids evaluating the right operand. Typical use is for example if (x > 0 && k/x < limit) ... that avoids division by zero problems.
operator || evaluates left operand first and if the value is logically true then it avoids evaluating the right operand. For example if (overwrite_files || confirm("File existing, overwrite?")) ... will not ask confirmation when the flag overwrite_files is set.
operator , evaluates left operand first and then right operand anyway, returning the value of right operand. This operator is not used very often. Note that commas between parameters in a function call are not comma operators and the order of evaluation is not guaranteed.
The ternary operator x?y:z evaluates x first, and then depending on the logical value of the result evaluates either only y or only z.
For all other operators the order of evaluation is not specified.
The situation is actually worse because it's not that the order is not specified, but that there is not even an "order" for the expression at all, and for example in
std::cout << f() << g() << x(k(), h());
it's possible that functions will be called in the order h-g-k-x-f (this is a bit disturbing because the mental model of << operator conveys somehow the idea of sequentiality but in reality respects the sequence only in the order results are put on the stream and not in the order the results are computed).
Obviously the value dependencies in the expression may introduce some order guarantee; for example in the above expression it's guaranteed that both k() and h() will be called before x(...) because the return values from both are needed to call x (C++ is not lazy).
Note also that the guarantees for &&, || and , are valid only for predefined operators. If you overload those operators for your types they will be in that case like normal function calls and the order of evaluation of the operands will be unspecified.
Changes since C++17
C++17 introduced some extra ad-hoc specific guarantees about evaluation order (for example in the left-shift operator <<). For all the details see https://stackoverflow.com/a/38501596/320726
The evaluation order is specified by the standard and is left-to-right. The left-most expression will always be evaluated first with the && clause.
If you want b to be evaluated first:
if(b && a)
{
//do something
}
If both arguments are methods and you want both of them to be evaluated regardless of their result:
bool rb = b();
bool ra = a();
if ( ra && rb )
{
//do something
}
In this case, since you're using &&, a will always be evaluated first because the result is used to determine whether or not to short-circuit the expression.
If a returns false, then b is not allowed to evaluate at all.
Every value computation and side effect of the first (left) argument of the built-in logical AND operator && and the built-in logical OR operator || is sequenced before every value computation and side effect of the second (right) argument.
Read here for a more exhaustive explanation of the rules set:
order evaluation
It will evaluate from left to right and short-circuit the evaluation if it can (e.g. if a evaluates to false it won't evaluate b).
If you care about the order they are evaluated in you just need to specify them in the desired order of evaluation in your if statement.
The built-in && operator always evaluates its left operand first. For example:
if (a && b)
{
//block of code
}
If a is false, then b will not be evaluated.
If you want b to be evaluated first, and a only if b is true, simply write the expression the other way around:
if (b && a)
{
//block of code
}
5.15 Logical OR operator in the standard says the following:
Unlike |, || guarantees left-to-right evaluation;
Does this mean somewhere I cannot locate in the standard, | is defined to evaluate right-to-left, or that it is implementation-defined? Does this vary when the operator is overloaded? I wrote a quick program to test this and both MSVC++ and GCC seem to evaluate right-to-left.
#include<iostream>
using namespace std;
int foo = 7;
class Bar {
public:
Bar& operator|(Bar& other) {
return *this;
}
Bar& operator++() {
foo += 2;
return *this;
}
Bar& operator--() {
foo *= 2;
return *this;
}
};
int main(int argc, char** argv) {
Bar a;
Bar b;
Bar c = ++a | --b;
cout << foo;
}
This outputs 16.
If ++a and --b are switched it outputs 19.
I've also considered that I may be running into the multiple changes between sequence points rule (and thus undefined behavior), but I'm unsure how/if that applies with two separate instances as operands.
Ignore that operator for now, and just take note of this:
(x + y) * (z + 1)
Here, both operands must be evaluated before the multiplication can take place (otherwise we wouldn't know what to multiply). In C++, the order in which this is done is unspecified: it could be (x + y) first, or (z + 1) first, whatever the compiler feels is better.†
The same is true for the operator |. However, operator || must short-circuit, and in order to do that, it must evaluate strictly left to right. (And if the left evaluation yields true, the evaluation ends without evaluating the right operand.) That's what the sentence means.
†Note that it may have no preference one way or another, and just evaluate in the order it's listed. This is why you get the output you do, though you cannot rely on it at the language level.
As others said, it means that the order of the evaluation of the two sides is unspecified. To answer your other questions -
I've also considered that I may be running into the multiple changes between sequence points rule (and thus undefined behavior)
No, your case does not modify foo in between two adjacent sequence points. Before entering a function and before leaving a function, there always is a sequence point, which means that both modifications of foo happen in between two different pairs of sequence points.
Does this vary when the operator is overloaded?
All of clause 5 only talks about builtin operators. For user defined operator implementations, the rules don't apply. So also for ||, for user defined operators the order is not specified. But notice that it is only for user defined operators; not when both operands are converted to bool and trigger the builtin operator:
struct A {
operator bool() const { return false; }
};
struct B {
operator bool() const { return true; }
};
int main() {
A a;
B b;
a || b;
shared_ptr<myclass> p = ...;
if(p && p->dosomething()) ...;
}
This will always first execute A::operator bool, and then B::operator bool. And it will only call p->dosomething() if p evaluates to true.
Does this mean somewhere I cannot locate in the standard, | is defined to evaluate right-to-left, or that it is implementation-defined?
Pedantically speaking the order of evaluation of arguments of | operator is unspecified. So that means the operands can be evaluated in either order.
However the order of evaluation of operands of logical operators (i.e &&, || etc) and comma operator is specified i.e from left to right.