While porting some C code to Windows, I discovered an interesting ternary operator behavior in MSVC++. It appears that the compiler evaluates both branches around ? : in the following example:
#include <stdio.h>
struct S {
int x;
};
int getNum() {
printf("get num\n");
return 4;
}
int main(int argc, char **argv) {
struct S s = argc ? (struct S) { .x = getNum() } : (struct S) { .x = getNum() };
printf("%d\n", s.x);
return 0;
}
Prints:
get num
get num
4
However, GCC and Clang evaluate getNum() only once. Which behavior is correct, or allowed by the standard?
According to C++11 §5.16/1 Conditional operator:
Conditional expressions group right-to-left. The first expression is
contextually converted to bool (Clause 4). It is evaluated and if it
is true, the result of the conditional expression is the value of the
second expression, otherwise that of the third expression. Only one
of the second and third expressions is evaluated. Every value
computation and side effect associated with the first expression is
sequenced before every value computation and side effect associated
with the second or third expression.
According to C11 §6.5.15 Conditional operator:
The first operand is evaluated; there is a sequence point between its
evaluation and the evaluation of the second or third operand
(whichever is evaluated). The second operand is evaluated only if
the first compares unequal to 0; the third operand is evaluated only
if the first compares equal to 0; the result is the value of the
second or third operand (whichever is evaluated), converted to the
type described below.
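The guarantee quoted above can be checked with a small C++ sketch. Compound literals such as (struct S) { .x = ... } are a C feature, so this version uses aggregate initialization instead. The counter get_num_calls and the helper pick are hypothetical names added for illustration only.

```cpp
#include <cstdio>

struct S {
    int x;
};

// Hypothetical counter, added only to observe how often getNum() runs.
int get_num_calls = 0;

int getNum() {
    ++get_num_calls;
    std::puts("get num");
    return 4;
}

// Only the selected branch of ?: may be evaluated,
// so getNum() should run exactly once per call to pick().
S pick(bool cond) {
    return cond ? S{getNum()} : S{getNum()};
}
```

On a conforming compiler, each call to pick(true) or pick(false) should increase get_num_calls by exactly one.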
Originally, I presented a more complicated example; this one was proposed by #n. 'pronouns' m. in a now-deleted answer. The question had become too long, so see the edit history if you are interested.
Does the following program have well-defined behaviour in C++17?
int main()
{
int a=5;
(a += 1) += a;
return a;
}
I believe this expression is well-defined and evaluated like this:
The right side a is evaluated to 5.
There are no side-effects of the right side.
The left side is evaluated to a reference to a, a += 1 is well-defined for sure.
The left-side side-effect is executed, making a==6.
The assignment is evaluated, adding 5 to the current value of a, making it 11.
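The ordering listed above can be sketched as separate statements, each of which is unambiguously sequenced. This only illustrates that reading of the C++17 rules (right operand first); it is not a claim about what every compiler does with the original expression. The function name explicit_order and the local rhs are hypothetical.

```cpp
// Spells out the evaluation order described in the steps above.
int explicit_order() {
    int a = 5;
    int rhs = a;  // right operand evaluated first: 5
    a += 1;       // left operand (a += 1): a becomes 6
    a += rhs;     // outer compound assignment: 6 + 5
    return a;     // 11
}
```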
The relevant sections of the standard:
[intro.execution]/8:
An expression X is said to be sequenced before an expression Y if
every value computation and every side effect associated with the
expression X is sequenced before every value computation and every
side effect associated with the expression Y.
[expr.ass]/1 (emphasis mine):
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand; their result is an lvalue referring to the left operand. The
result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. The right operand is sequenced before the
left operand. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
The wording originally comes from the accepted paper P0145R3.
Now, I feel there is some ambiguity, even contradiction, in this second section.
The right operand is sequenced before the left operand.
Together with the definition of sequenced before, this strongly implies an ordering of side-effects, yet the previous sentence:
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression
only explicitly sequences the assignment after the value computations of the operands, not after their side-effects, thus allowing this behaviour:
The right side a is evaluated to 5.
The left side is evaluated to a reference of a, a += 1 is well-defined for sure.
The assignment is evaluated, adding 5 to the current value of a, making it 10.
The left-side side-effect is executed, making a==11 or maybe even 6 if the old value was used even for the side-effect.
But this ordering clearly violates the definition of sequenced before, since the side-effects of the left operand happened after the value computation of the right operand. Thus the left operand was not sequenced after the right operand, which violates the above-mentioned sentence. No I done goofed. This is allowed behaviour, right? I.e. the assignment can interleave the right-left evaluation. Or it can be done after both full evaluations.
Running the code, GCC outputs 12 and Clang outputs 11. Furthermore, GCC warns about
<source>: In function 'int main()':
<source>:4:8: warning: operation on 'a' may be undefined [-Wsequence-point]
4 | (a += 1) += a;
| ~~~^~~~~
I am terrible at reading assembly; maybe someone can at least spell out how GCC got to 12? (a += 1), a += a works, but that seems extra wrong.
Well, thinking more about it, the right side also evaluates to a reference to a, not just to the value 5. So GCC could still be right, in which case Clang could be wrong.
In order to follow better what is actually performed, let's try to mimic the same with our own type and add some printouts:
class Number {
int num = 0;
public:
Number(int n): num(n) {}
Number& operator+=(int i) { // return by reference so the outer += modifies a itself
std::cout << "+=(int) for *this = " << num
<< " and int = " << i << std::endl;
num += i;
return *this;
}
Number& operator+=(Number n) {
std::cout << "+=(Number) for *this = " << num
<< " and Number = " << n << std::endl;
num += n.num;
return *this;
}
operator int() const {
return num;
}
};
Then when we run:
Number a {5};
(a += 1) += a;
std::cout << "result: " << a << std::endl;
We get different results with gcc and clang (and without any warning!).
gcc:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 6
result: 12
clang:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 5
result: 11
This is the same result as for the ints in the question. It is not exactly the same story, since built-in assignment has its own sequencing rules whereas an overloaded operator is a function call, but the similarity is still interesting.
It seems that GCC keeps the right side as a reference and turns it into a value only on the call to +=, while Clang turns the right side into a value first.
The next step is to add a copy constructor to our Number class, to follow exactly when the reference is turned into a value. Doing that results in the copy constructor being called as the first operation by both Clang and GCC, and the result is the same for both: 11.
It seems that GCC delays the reference-to-value conversion (both in the built-in assignment and with a user-defined type that has no user-defined copy constructor). Is this consistent with the sequencing defined by C++17? To me it looks like a GCC bug, at least for the built-in assignment in the question, since the conversion from reference to value appears to be part of the "value computation" that shall be sequenced before the assignment.
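A minimal sketch of the copy-constructor experiment described above. The class name TracedNumber, the trace vector, and run_traced are hypothetical; the original Number printed to std::cout, whereas this version records events in a vector so they can be inspected afterwards. The order of the recorded events may differ between compilers, so no particular trace is claimed here.

```cpp
#include <string>
#include <vector>

// Hypothetical event log standing in for the std::cout printouts.
std::vector<std::string> trace;

class TracedNumber {
    int num = 0;
public:
    TracedNumber(int n) : num(n) {}
    // User-defined copy constructor: records when the by-value parameter is created.
    TracedNumber(const TracedNumber& other) : num(other.num) {
        trace.push_back("copy of " + std::to_string(num));
    }
    TracedNumber& operator+=(int i) {
        trace.push_back("+=(int) on " + std::to_string(num));
        num += i;
        return *this;
    }
    TracedNumber& operator+=(TracedNumber n) {
        trace.push_back("+=(TracedNumber) on " + std::to_string(num) +
                        " with " + std::to_string(n.num));
        num += n.num;
        return *this;
    }
    int value() const { return num; }
};

// Runs the expression from the question on the traced type.
int run_traced() {
    TracedNumber a{5};
    (a += 1) += a;
    return a.value();
}
```

After calling run_traced(), the entries of trace show whether the copy of a was made before or after the inner += ran.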
As for the strange behavior of Clang reported in a previous version of the original post, returning different results in an assert and when printing:
constexpr int foo() {
int res = 0;
(res = 5) |= (res *= 2);
return res;
}
int main() {
std::cout << foo() << std::endl; // prints 5
assert(foo() == 5); // fails in clang 11.0 - constexpr foo() is 10
// fixed in clang 11.x - correct value is 5
}
This relates to a bug in Clang. The assert failure is wrong and is caused by an incorrect evaluation order for this expression during constant evaluation at compile time. The value should be 5. This bug is already fixed in Clang trunk.
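To see why 5 is the expected value, the expression can be rewritten as separate statements, assuming the C++17 rule that the right operand is sequenced before the left one. The name foo_explicit and the local rhs are hypothetical.

```cpp
// Requires C++14 or later for the relaxed constexpr rules.
constexpr int foo_explicit() {
    int res = 0;
    int rhs = (res *= 2);  // right operand first: res stays 0, rhs is 0
    res = 5;               // then the left operand
    res |= rhs;            // 5 | 0
    return res;
}

static_assert(foo_explicit() == 5, "explicitly sequenced version yields 5");
```

Even if the right operand's value were read only after the left assignment, the result would be 5 | 5, which is still 5.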
If you have the following:
if (x)
{
y = *x;
}
else
{
y = 0;
}
Then the behavior is guaranteed to be defined, since x is dereferenced only if it is not 0.
Can the same be said for:
y = (x) ? *x : 0;
This seems to work as expected (even compiled with -Wpedantic on g++)
Is this guaranteed?
Yes, only the second or third operand will be evaluated, the draft C++ standard section 5.16 [expr.cond] says:
Conditional expressions group right-to-left. The first expression is contextually converted to bool (Clause 4).
It is evaluated and if it is true, the result of the conditional expression is the value of the second expression,
otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value
computation and side effect associated with the first expression is sequenced before every value computation
and side effect associated with the second or third expression.
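As a small sketch of that guarantee, the conditional form can be wrapped in a function and called with both a null and a non-null pointer. The name value_or_zero is hypothetical.

```cpp
// *x is evaluated only when x is non-null, so passing a null pointer is safe.
int value_or_zero(const int* x) {
    return x ? *x : 0;
}
```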
Is there a way in C and C++ to cause functions returning void to be evaluated in unspecified order?
I know that function arguments are evaluated in unspecified order so for functions not returning void this can be used to evaluate those functions in unspecified order:
#include <stdio.h>
int hi(void) {
puts("hi");
return 0;
}
int bye(void) {
puts("bye");
return 0;
}
int moo(void) {
puts("moo");
return 0;
}
void dummy(int a, int b, int c) {}
int main(void) {
dummy(hi(), bye(), moo());
}
Legal C and C++ code compiled by a conforming compiler may print hi, bye, and moo in any order. This is not undefined behavior (nasal demons would not be valid); there is simply more than one, but finitely many, valid outputs, and a conforming compiler need not even be deterministic in which one it produces.
Is there any way to do this without the dummy return values?
Clarification: This is an abstract question about C and C++. A better original phrasing might have been: is there any context in which function evaluation order is unspecified for functions returning void? I'm not trying to solve a specific problem.
You can take advantage of the fact that the left-hand side of the comma operator is a discarded-value expression (a void expression in C), like this:
int main(void) {
dummy((hi(),0), (bye(),0), (moo(),0));
}
From the draft C++ standard section 5.18 Comma operator:
A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression (Clause 5).
and C11 section 6.5.17 Comma operator:
The left operand of a comma operator is evaluated as a void expression; there is a
sequence point between its evaluation and that of the right operand. Then the right
operand is evaluated; the result has its type and value.
As Matt points out, it is also possible to mix the above method with arithmetic operators to achieve an unspecified order of evaluation:
(hi(),0) + (bye(),0) + (moo(),0) ;
Well there's always the obvious approach of putting pointers to the functions in a container, shuffling it up (or as suggested in a comment sorting it), and calling each item in the container. If you need to have the same behavior each run just make sure your seed is the same each time.
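A minimal sketch of the container approach, assuming std::shuffle with a seeded std::mt19937 as the shuffling mechanism. The helper call_shuffled and the counter call_count are hypothetical names; the order here is pseudo-random rather than unspecified in the standard's sense.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Hypothetical counter, added only so the calls can be observed.
int call_count = 0;

void hi()  { ++call_count; std::puts("hi"); }
void bye() { ++call_count; std::puts("bye"); }
void moo() { ++call_count; std::puts("moo"); }

// Calls every function exactly once, in an order determined by the seed.
void call_shuffled(std::vector<void (*)()> fns, unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(fns.begin(), fns.end(), rng);
    for (void (*fn)() : fns) {
        fn();
    }
}
```

Calling call_shuffled({hi, bye, moo}, seed) with the same seed repeats the same order on a given standard library implementation; a value from std::random_device could be used as the seed to vary it between runs.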
Recently I came across this piece of code. I don't know why I never saw this kind of syntax in all my "coding life".
int main()
{
int b;
int a = (b=5, b + 5);
std::cout << a << std::endl;
}
a has value of 10. What exactly is this way of initialization called? How does it work?
This statement:
int a = (b=5, b + 5);
Makes use of the comma operator. Per Paragraph 5.18/1 of the C++11 Standard:
[...] A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded value
expression (Clause 5). Every value computation and side effect associated with the left expression
is sequenced before every value computation and side effect associated with the right expression. The type
and value of the result are the type and value of the right operand; the result is of the same value category
as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field. If the value of the right
operand is a temporary (12.2), the result is that temporary.
Therefore, your statement is equivalent to:
b = 5;
int a = b + 5;
Personally, I do not see a reason for using the comma operator here. Just initialize your variable the easily readable way, unless you have a good reason for doing otherwise.
The , operator evaluates its operands one after another and returns the last value.
It may be used not only in initialization.
The comma , operator allows you to separate expressions. The compound expression made by
exp1, exp2, ..., expn
evaluates to expn.
So what happens is that first b is set to 5 and then a is set to b + 5.
A side note: since , has the lowest precedence in the table of operators the semantics of
int a = b = 5, b+5;
is different from
int a = (b = 5, b+5);
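because in a declaration the comma separates declarators instead of acting as the comma operator: the first form tries to declare a second variable after the comma and does not compile. In a plain assignment statement, a = b = 5, b + 5; is parsed as (a = b = 5), b + 5, which sets a to 5.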
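The precedence difference is easiest to observe in a plain assignment statement, where the comma really is the comma operator. The sketch below uses two hypothetical functions, with_parens and without_parens, to compare the two groupings; compilers typically warn that the result of b + 5 is unused in the second one.

```cpp
int with_parens() {
    int b = 0;
    int a = (b = 5, b + 5);  // comma operator: b = 5 first, then a is b + 5
    return a;                // 10
}

int without_parens() {
    int a = 0;
    int b = 0;
    a = b = 5, b + 5;  // groups as (a = b = 5), (b + 5); the sum is discarded
    return a;          // 5
}
```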
When used in an expression the comma operator will evaluate all of its operands (left-to-right) and return the last.
The initialization is called copy initialization. If you ignore the complex expression on the right, it's the same as in:
int a = 10;
This is to be contrasted with direct initialization, which looks like this:
int a(10);
(It's possible that you were separately confused about how to evaluate a comma expression. Please indicate if that's the case.)
Let's say I have the following code:
std::vector<T> R;
if (condition) R = generate();
...
for (int i = 0; i < N; ++i) {
const auto &r = (R.empty() ? generate() : R);
}
It appears that generate is called regardless of R.empty(). Is that standard behavior?
From Paragraph 5.16/1 of the C++ 11 Standard:
Conditional expressions group right-to-left. The first expression is contextually converted to bool (Clause 4). It is evaluated and if it is true, the result of the conditional expression is the value of the second expression, otherwise that of the third expression. Only one of the second and third expressions is evaluated. Every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second or third expression.
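One way to check what a given compiler does is to count the calls to generate. The sketch below assumes T is int and uses a hypothetical counter generate_calls and a hypothetical wrapper sum_first around the loop from the question.

```cpp
#include <vector>

// Hypothetical counter, added only to observe how often generate() runs.
int generate_calls = 0;

std::vector<int> generate() {
    ++generate_calls;
    return {1, 2, 3};
}

// Mirrors the loop in the question: generate() should run only when R is empty.
int sum_first(std::vector<int> R, int N) {
    int total = 0;
    for (int i = 0; i < N; ++i) {
        const auto& r = (R.empty() ? generate() : R);
        total += r.front();
    }
    return total;
}
```

Per the quoted paragraph, generate_calls should stay at 0 after sum_first({9}, 3) and reach 3 after sum_first({}, 3).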