Calling function with side effects inside expression - c++

I thought I understood how sequence points work in C++, but this GeeksQuiz question puzzled me:
int f(int &x, int c) {
c = c - 1;
if (c == 0) return 1;
x = x + 1;
return f(x, c) * x;
}
int main() {
int p = 5;
cout << f(p, p) << endl;
return 0;
}
The “correct” answer to this question says it prints 6561. Indeed, in VS2013 it does. But isn't it UB anyway, because there is no guarantee which will be evaluated first: f(x, c) or x? We get 6561 if f(x, c) is evaluated first: the whole thing turns into five recursive calls, of which the first four (c = 5, 4, 3, 2) continue on and the last one (c = 1) terminates and returns 1, which amounts to 9 ** 4 in the end.
However, if x were evaluated first, then we'd get 6 * 7 * 8 * 9 * 1 (that is, 3024) instead. The funny thing is, in VS2013 even replacing f(x, c) * x with x * f(x, c) doesn't change the result. Not that it means anything.
According to the standard, is this UB or not? If not, why?

This is UB.
n4140 §1.9 [intro.execution]/15
Except where noted, evaluations of
operands of individual operators and of subexpressions of individual
expressions are unsequenced. [...] If a side effect on a scalar object
is unsequenced relative to [...] value computation using the value of
the same scalar object [...] the behavior is undefined.
Multiplicative operators don't have sequencing explicitly noted.

This is UB
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
There are exceptions to this rule which are noted below.
Except where noted below, there is no concept of left-to-right or
right-to-left evaluation in C++. This is not to be confused with
left-to-right and right-to-left associativity of operators: the
expression f1() + f2() + f3() is parsed as (f1() + f2()) + f3() due to
left-to-right associativity of operator+, but the function call to f3
may be evaluated first, last, or between f1() or f2() at run time.

Related

In C and C++, is an expression using the comma operator like "a = b, ++a;" undefined?

Take these three snippets of C code:
1) a = b + a++
2) a = b + a; a++
3) a = b + a, a++
Everyone knows that example 1 is a Very Bad Thing, and clearly invokes undefined behavior. Example 2 has no problems. My question is regarding example 3. Does the comma operator work like a semicolon in this kind of expression? Are 2 and 3 equivalent or is 3 just as undefined as 1?
Specifically I was considering this regarding something like free(foo), foo = bar. This is basically the same problem as above. Can I be sure that foo is freed before it's reassigned, or is this a clear sequence point problem?
I am aware that both examples are largely pointless and it makes far more sense to just use a semicolon and be done with it. I'm just asking out of curiosity.
Case 3 is well defined.
First, let's look at how the expression is parsed:
a = b + a, a++
The comma operator , has the lowest precedence, followed by the assignment operator =, the addition operator + and the postincrement operator ++. So with the implicit parentheses it is parsed as:
(a = (b + a)), (a++)
From here, section 6.5.17 of the C standard regarding the comma operator , says the following:
2 The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its
evaluation and that of the right operand. Then the right
operand is evaluated; the result has its type and value.
Section 5.14 p1 of the C++11 standard has similar language:
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression.
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression. The type and value of the result
are the type and value of the right operand; the result is of the same
value category as its right operand, and is a bit-field if its right
operand is a glvalue and a bit-field.
Because of the sequence point, a = b + a is guaranteed to be fully evaluated before a++ in the expression a = b + a, a++.
Regarding free(foo), foo = bar, this also guarantees that foo is freed before a new value is assigned.
a = b + a, a++; is well-defined, but a = (b + a, a++); can be undefined.
First of all, the operator precedence makes the expression equivalent to (a = (b+a)), a++;, where + has the highest precedence, followed by =, followed by ,. The comma operator includes a sequence point between the evaluation of its left and right operand. So the code is, uninterestingly, completely equivalent to:
a = b + a;
a++;
Which is of course well-defined.
Had we instead written a = (b + a, a++);, then the sequence point in the comma operator wouldn't save the day. Because then the expression would have been equivalent to
(void)(b + a);
a = a++;
In C, and in C++14 or older, the two modifications of a in a = a++ are unsequenced (see C11 6.5.16/3), meaning this is undefined behavior (per C11 6.5/2). Note that C++11 and C++14 were badly formulated and ambiguous.
In C++17 or later, the right operand of the = operator is sequenced before the left operand, so a = a++ is well-defined there.
All of this assuming no C++ operator overloading takes place. In that case, the parameters to the overloaded operator function will be evaluated, a sequence point takes place before the function is called, and what happens from there depends on the internals of that function.

How stack frames look like when two functions lie in one line?

For example, we have
int p(void) {
return 4;
}
int q(void) {
return 5;
}
int main(void) {
int x = p() + q();
return 0;
}
What does the stack frame look like in this case? To be exact, are p and q evaluated simultaneously, or does the program proceed to q only after p has been evaluated to 4?
From cppreference
Order of evaluation of the operands of almost all C++ operators
(including the order of evaluation of function arguments in a
function-call expression and the order of evaluation of the
subexpressions within any expression) is unspecified. The compiler can
evaluate operands in any order, and may choose another order when the
same expression is evaluated again.
There are exceptions to this rule which are noted below.
Except where noted below, there is no concept of left-to-right or
right-to-left evaluation in C++. This is not to be confused with
left-to-right and right-to-left associativity of operators: the
expression f1() + f2() + f3() is parsed as (f1() + f2()) + f3() due to
left-to-right associativity of operator+, but the function call to f3
may be evaluated first, last, or between f1() or f2() at run time

Is the order of execution always the same in C/C++

Will this code always result in the same result?
return c * (t /= d) * t * t + b;
So I expect:
return ((c * (t / d) ^ 3) + b);
But I am not sure if the compiler can also interpret it as:
return ((c * t * t * (t / d)) + b)
I have searched in the C standard but could not find an answer,
I know that x = x++ is undefined but here I am not sure because of the () around the t /= d which I think force the compiler to first calculate that statement.
I have searched in the C standard but could not find an answer
The thing you're searching for is the sequence point.
Your expression
c * (t /= d) * t * t + b
doesn't contain any sequence points, so the sub-expressions may be evaluated in any relative order.
NOTE that this applies to C, since you mentioned that in the question. You've also tagged the related-but-very different language C++, which has different rules. Luckily, in this case, they give exactly the same result.
The relevant text from the 2014-11-19 working draft (N4296) is
1.9 Program Execution [intro.execution]
...
14 Every value computation and side effect associated with a full-expression is sequenced before every value
computation and side effect associated with the next full-expression to be evaluated.
15 Except where noted, evaluations of operands of individual operators and of subexpressions of individual
expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be
performed consistently in different evaluations. — end note ] The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar
object is unsequenced relative to either another side effect on the same scalar object or a value computation
using the value of the same scalar object, and they are not potentially concurrent (1.10), the behavior is
undefined. [ Note: The next section imposes similar, but more complex restrictions on potentially concurrent
computations. — end note ]
So the logic in C++ is that unless things are explicitly sequenced (eg, by a ; separating two full expressions), then they can happen in any order.
As paragraph 15 states, when two unsequenced sub-expressions modify the same scalar object (or one modifies it and one reads it), the behaviour is undefined.
The above expression, with parentheses making the order of operations explicit, is as follows:
return ((((c * (t /= d)) * t) * t) + b);
The problem here, however, is that there is no sequence point in this expression. So any of the subexpressions can be evaluated in any order.
For example, the compiler may choose to evaluate the value of t once, then use the original value each place it appears. Conversely, it may first evaluate t /= d which modifies t, then use this modified value anyplace else it appears.
In short, because you are both reading and writing a variable in a single expression without a sequence point, you invoke undefined behavior.
The following statement:
return c * (t /= d) * t * t + b;
invokes undefined behaviour in C (and I believe in C++ too). This is because t is read twice more in the same expression, while the side effect of the compound assignment in (t /= d), which modifies the object designated by t, is unsequenced relative to those reads.
The moment when you encounter UB is the one you should stop thinking about "proper" value of the expression. There is none, because anything is possible, including turning off your PC.
The recent versions of gcc and clang with -Wall may tell you that expression is suspected of invoking UB. Here, the warnings are:
warning: operation on 't' may be undefined [-Wsequence-point]
warning: unsequenced modification and access to 't' [-Wunsequenced]

Who defines C operator precedence and associativity?

Introduction
In every textbook on C/C++, you'll find an operator precedence and associativity table such as the following:
http://en.cppreference.com/w/cpp/language/operator_precedence
One of the questions on StackOverflow asked something like this:
What order do the following functions execute:
f1() * f2() + f3();
f1() + f2() * f3();
Referring to the previous chart, I confidently replied that functions have left-to-right associativity, so in the previous statements they are evaluated like this in both cases:
f1() -> f2() -> f3()
After the functions are evaluated you finish the evaluation like this:
(a1 * a2) + a3
a1 + (a2 * a3)
To my surprise, many people told me I was flat out wrong. Determined to prove them wrong, I decided to turn to the ANSI C11 standard. I was once again surprised to find out that very little is mentioned on operator precedence and associativity.
Questions
If my belief that functions are always evaluated from left-to-right is wrong, what does the table referring to function precedence and associativity really mean?
Who defines operator precedence and associativity if it's not ANSI? If it is ANSI who makes the definition, why is little mentioned about operator precedence and associativity? Is operator precedence and associativity inferred from the ANSI C standard or is it defined in Mathematics?
Operator precedence is defined in the appropriate standard. The standards for C and C++ are the One True Definition of what exactly C and C++ are. So if you look closely, the details are there. In fact, the details are in the grammar of the language. For example, take a look at the grammar production rule for + and - in C++ (collectively, additive-expressions):
additive-expression:
multiplicative-expression
additive-expression + multiplicative-expression
additive-expression - multiplicative-expression
As you can see, a multiplicative-expression is a subrule of an additive-expression. This means that if you have something like x + y * z, the y * z expression is a subexpression of x + y * z. This defines the precedence between these two operators.
We can also see that the left operand of an additive-expression expands to another additive-expression, which means that with x + y + z, x + y is a subexpression of it. This defines the associativity.
Associativity determines how adjacent uses of the same operator will be grouped. For example, + is left-to-right associative, which means that x + y + z will be grouped like so: (x + y) + z.
Don't mistake this for order of evaluation. There is absolutely no reason why the value of z could not be computed before x + y is. What matters is that it is x + y that is computed and not y + z.
For the function call operator, left-to-right associativity means that f()() (which could happen if f returned a function pointer, for example) is grouped like so: (f())() (of course, the other direction wouldn't make any sense).
Now let's consider the example you were looking at:
f1() + f2() * f3()
The * operator has higher precedence than the + operator, so the expressions are grouped like so:
f1() + (f2() * f3())
We don't even have to consider associativity here, because we don't have any of the same operator adjacent to each other.
The order in which the function call expressions are evaluated is, however, unspecified. There's no reason f3 couldn't be called first, then f1, and then f2. The only requirement in this case is that operands of an operator are evaluated before the operator is. So that would mean f2 and f3 have to be called before the * is evaluated, and the * must be evaluated and f1 must be called before the + is evaluated.
Some operators do, however, impose a sequencing on the evaluation of their operands. For example, in x || y, x is always evaluated before y. This allows for short-circuiting, where y does not need to be evaluated if x is known already to be true.
The order of evaluation was previously defined in C and C++ with the use of sequence points, and both have changed terminology to define things in terms of a sequenced before relationship. For more information, see Undefined Behaviour and Sequence Points.
The precedence of operators in the C Standard is indicated by the syntax.
(C99, 6.5p3) "The grouping of operators and operands is indicated by the syntax. 74)"
74) "The syntax specifies the precedence of operators in the evaluation of an expression"
C99 Rationale also says
"The rules of precedence are encoded into the syntactic rules for each operator."
and
"The rules of associativity are similarly encoded into the syntactic rules."
Also note that associativity has nothing to do with evaluation order. In:
f1() * f2() + f3()
function calls are evaluated in any order. The C syntactic rules say that f1() * f2() + f3() means (f1() * f2()) + f3(), but the evaluation order of the operands in the expression is unspecified.
One way to think about precedence and associativity is to imagine that the language only allows statements containing an assignment and one operator, rather than multiple operators. So a statement like:
a = f1() * f2() + f3();
would not be allowed, since it has 5 operators: 3 function calls, multiplication, and addition. In this simplified language, you would have to assign everything to temporaries and then combine them:
temp1 = f1();
temp2 = f2();
temp3 = temp1 * temp2;
temp4 = f3();
a = temp3 + temp4;
Associativity and precedence specify that the multiplication (temp3) must be computed before the final addition, since multiplication has higher precedence than addition. But they don't specify the relative order of the three function calls; it would be just as valid to do:
temp4 = f3();
temp2 = f2();
temp1 = f1();
temp3 = temp1 * temp2;
a = temp3 + temp4;
sftrabbit gave an example where associativity of function call operators is relevant:
a = f()();
When simplifying it as above, this becomes:
temp = f();
a = temp();
Precedence and associativity are defined in the standard, and they decide how to build the syntax tree. Precedence works by operator type (1+2*3 is 1+(2*3) and not (1+2)*3) and associativity works by operator position (1+2+3 is (1+2)+3 and not 1+(2+3)).
Order of evaluation is different - it does not define how to build the syntax tree - it defines how to evaluate the nodes of operators in the syntax tree. Order of evaluation is defined not to be defined - you can never rely on it because compilers are free to choose any order they see fit. This is done so compilers could try to optimize the code. The idea is that programmers write code that shouldn't be affected by order of evaluation, and yield the same results no matter the order.
Left-to-right associativity means that f() - g() - h() means (f() - g()) - h(), nothing more. Suppose f returns 1, g returns 2, and h returns 3. Left-to-right associativity means the result is (1 - 2) - 3, or -4. A compiler is still permitted to call g and h first, since that has nothing to do with associativity, but it is not allowed to give a result of 1 - (2 - 3), which would be something completely different.

Order of evaluation of expression

I've just read that order of evaluation and precedence of operators are different but related concepts in C++. But I'm still unclear on how they are different but related.
int x = c + a * b; // 31
int y = (c + a) * b; // 36
What do the above statements have to do with order of evaluation? E.g. when I write (c + a), am I changing the order of evaluation of the expression by changing its precedence?
The important part about order of evaluation is whether any of the components have side effects.
Suppose you have this:
int i = c() + a() * b();
Where a and b have side effects:
int global = 1;
int a() {
return global++;
}
int b() {
return ++global;
}
int c() {
return global * 2;
}
The compiler can choose what order to call a(), b() and c() and then insert the results into the expression. At that point, precedence takes over and decides what order to apply the + and * operators.
In this example the most likely outcomes are either
The compiler will evaluate c() first, followed by a() and then b(), resulting in i = 2 + 1 * 3 = 5
The compiler will evaluate b() first, followed by a() and then c(), resulting in i = 6 + 2 * 2 = 10
But the compiler is free to choose whatever order it wants.
The short story is that precedence tells you the order in which operators are applied to arguments (* before +), whereas order of evaluation tells you in what order the arguments are resolved (a(), b(), c()). This is why they are "different but related".
"Order of evaluation" refers to when different subexpressions within the same expression are evaluated relative to each other.
For example in
3 * f(x) + 2 * g(x, y)
you have the usual precedence rules between multiplication and addition. But we have an order of evaluation question: will the first multiplication happen before the second or the second before the first? It matters because if f() has a side effect that changes y, the result of the whole expression will be different depending on the order of operations.
In your specific example, this order of evaluation scenario (in which the resulting value depends on order) does not arise.
As long as we are talking about built-in operators: no, you are not changing the order of evaluation by using the (). You have no control over the order of evaluation. In fact, there's no "order of evaluation" here at all.
The compiler is allowed to evaluate this expression in any way it desires, as long as the result is correct. It is not even required to use addition and multiplication operations to evaluate these expressions. The addition and multiplication only exist in the text of your program. The compiler is free to totally and completely ignore these specific operations. On some hardware platform, such expressions might be evaluated by a single atomic machine operation. For this reason, the notion of "order of evaluation" does not make any sense here. There's nothing there that you can apply the concept of "order" to.
The only thing you are changing by using () is the mathematical meaning of the expression. Let's say a, b and c are all 2. Then a + b * c must evaluate to 6, while (a + b) * c must evaluate to 8. That's it. This is the only thing that is guaranteed to you: that the results will be correct. How these results are obtained is totally unknown. The compiler might use absolutely anything, any method and any "order of evaluation" as long as the results are correct.
For another example, if you have two such expressions in your program following each other
int x = c + a * b;
int y = (c + a) * b;
the compiler is free to evaluate them as
int x = c + a * b;
int y = c * b + x - c;
which will also produce the correct result (assuming no overflow-related problems). In which case the actual evaluation schedule will not even remotely look like something that you wrote in your source code.
In short, to assume that the actual evaluation will have any significant resemblance to what you wrote in the source code of your program is naive at best. Despite popular belief, built-in operators are not generally translated into their machine "counterparts".
The above applies to built-in operators, again. Once we start dealing with overloaded operators, things change drastically. Overloaded operators are indeed evaluated in full accordance with the semantic structure of the expression. There's some freedom there even with overloaded operators, but it is not as unrestricted as in case of built-in operators.
The answer is: it may or may not.
The evaluation order of a, b and c depends on the compiler's interpretation of this formula.
Consider the below example:
#include <limits.h>
#include <stdio.h>
int main(void)
{
double a = 1 + UINT_MAX + 1.0;
double b = 1 + 1.0 + UINT_MAX;
printf("a=%g\n", a);
printf("b=%g\n", b);
return 0;
}
Here in terms of math as we know it, a and b are to be computed equally and must have the same result. But is that true in the C(++) world? See the program's output.
I want to introduce a link worth reading with regard to this question.
Rules 3 and 4 there mention sequence points, another concept worth remembering.