C++17 sequencing: post-increment on left side of assignment - c++

The C++17 standard revised the definitions of the order of operations for the C++ language by a rule stating, to the effect:
In every simple assignment expression E1=E2 and every compound
assignment expression E1#=E2, every value computation and side-effect
of E2 is sequenced before every value computation and side effect of
E1
However, when compiling the following code in GCC 8.1 with -std=c++17 and -Wall
int v[] { 0,1,2,3,4,5,6,7 };
int *p0 = &v[0];
*p0++ = *p0 + 1;
cout << "v[0]: " << v[0] << endl;
I get the following warning:
main.cpp:266:8: warning: operation on 'p0' may be undefined [-Wsequence-point]
*p0++ = *p0 + 1;
~~^~
The output is:
v[0]: 1
And the question: is the warning erroneous?

And the question: is the warning erroneous?
It depends.
Technically, the code in question is well-defined. The right-hand side is sequenced before the left-hand side in C++17, whereas before it was indeterminately sequenced. And gcc compiles the code correctly, v[0] == 1 after that assignment.
However, it is also terrible code that should not be written, so while the specific wording of the warning is erroneous, the actual spirit of the warning seems fine to me. At least, I'm not about to file a bug report about it and it doesn't seem like the kind of thing that's worth developer time to fix. YMMV.

[I leave my answer below for reference but further discussion has shown that my answer below is incomplete and that its conclusion is ultimately incorrect.]
The C++17 standard (draft here), [expr.ass], indeed reads:
The right operand [of an assignment operator] is sequenced before the left operand.
This sounds as wrong to me as it does to you. #Barry dislikes your sample code so, to avoid distracting the question, I have tested alternate code:
#include <iostream>
namespace {
int a {3};
int& left()
{
std::cout << "in left () ...\n";
return ++a;
}
int right()
{
std::cout << "in right() ...\n";
return a *= 2;
}
}
int main()
{
left() = right();
std::cout << a << "\n";
return 0;
}
Output (using GCC 6.3):
in left () ...
in right() ...
8
Whether you regard the printed messages or consider the computed value of 8, it looks as though the left operand were sequenced before the right operand—which makes sense, insofar as efficient machine code
should generally prefer to decide where to store a computed result
before actually computing the result.
I would disagree with #Barry. You may have discovered a nontrivial problem with the standard. When you have some time, report it.
UPDATE
#SombreroChicken adds:
That's just because GCC 6.3 didn't correctly implement C++17 yet. From 7.1 and onwards it evaluates right first as seen here.
Output:
in right() ...
in left () ...
6

Related

Why change UB that will always work as intended? [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Undefined behavior and sequence points
(5 answers)
Closed 5 years ago.
In the legacy code base I'm working on, I discovered the line
n = ++n % size;
that is just a bad phrasing of the intended
n = (n+1) % size;
as deduced from the surrounding code and runtime-proved. (The latter now replaces the former.)
But since this code was marked as an error by Cppckeck, and caused a warning in GCC, without ever having caused any malfunction, I didn't stop thinking here. I reduced the line to
n = ++n;
still getting the original error/warning messages:
Cppcheck 1.80:
Id: unknownEvaluationOrder
Summary: Expression 'n=++n' depends on order of evaluation of side effects
Message: Expression 'n=++n' depends on order of evaluation of side effects
GCC (mingw32-g++.exe, version 4.9.2, C++98):
warning: operation on 'n' may be undefined [-Wsequence-point]|
I already learned that assignment expressions in C/C++ can be heavily affected by undefined evaluation order, but in this very case I just can't imagine how.
Can the undefined evaluation order of n = ++n; really be relevant for the resulting program, especially for intended value of n? That's what I imagine what may happen.
Scenario #1
++n;
n=n;
Scenario #2
n=n;
++n;
I know that the meaning and implications of relaying on undefined behaviour in C++, is hard to understand and hard to teach.
I know that the behaviour of n=++n; is undefined by C++ standards before C++11. But it has a defined behaviour from C++11 on, and this (now standard-defined behaviour) is exactly the same I'm observing with several compilers[1] for this small demo program
#include <iostream>
using namespace std;
int main()
{
int n = 0;
cout << "n before: " << n << endl;
n=++n;
cout << "n after: " << n << endl;
return 0;
}
that has the output
n before: 0
n after: 1
Is it reasonable to expect that the behaviour is actually the same for all compilers regardless of being defined or not by standards? Can you (a) show one counter example or (b) give an easy to understand explanation how this code could produce wrong results?
[1] the compilers a used
Borland-C++ 5.3.0 (pre-C++98)
Borland-C++ 5.6.4 (C++98)
C++ (vc++)
C++ (gcc 6.3)
C++14 (gcc 6.3)
C++14 clang
The increment order is precisely defined. It is stated there that
i = ++i + 2; // undefined behavior until C++11
Since you use a C++11 compiler, you can leave your code as is is. Nevertheless, I think that the expressiveness of
n = (n+1) % size;
is higher. You can more easily figure out what was intended by the writer of this code.
According to cppreference:
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined:
i = ++i + 2; // undefined behavior until C++11
i = i++ + 2; // undefined behavior until C++17
f(i = -2, i = -2); // undefined behavior until C++17
f(++i, ++i); // undefined behavior until C++17, unspecified after C++17
i = ++i + i++; // undefined behavior
For the case n = ++n; it would be an undefined behavior but we do not care which assignment happens first, n = or ++n.

Are there sequence points in braced initializer lists when they apply to constructors?

According to the n4296 C++ standard document:
[dcl.init.list] (8.5.4.4) (pg223-224)
Within the initializer-list of a braced-init-list, the
initializer-clauses, including any that result from pack expansions
(14.5.3), are evaluated in the order in which they appear. That is,
every value computation and side effect associated with a given
initializer-clause is sequenced before every value computation and
side effect associated with any initializer-clause that follows it in
the comma-separated list of the initializer-list. [Note: This
evaluation ordering holds regardless of the semantics of the
initialization; for example, it applies when the elements of the
initializer-list are interpreted as arguments of a constructor call,
even though ordinarily there are no sequencing constraints on the
arguments of a call. —end note ]
(emphasis mine)
The note was added here: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1030
This reads to me that the following code:
#include <iostream>
struct MyType {
MyType(int i, int j, int k, int l)
: sum(i + j + k + l)
{
}
int sum;
};
int main()
{
int i = 0;
std::cout << MyType{ ++i, ++i, ++i, ++i }.sum << '\n';
}
Should print "10".
This is my reasoning:
MyType is being initialized via a braced-init-list
braced-init-lists are evaluated in order
even when it is interpreted as arguments of a constructor call
this means that it should be evaluated as MyType(1,2,3,4)
That is to say, the above code should behave exactly like this code:
#include <initializer_list>
#include <iostream>
int main()
{
int i = 0;
std::initializer_list<int> il{++i, ++i, ++i, ++i};
std::cout << *il.begin() + *(il.begin() + 1) + *(il.begin() + 2) + *(il.begin() + 3) << '\n';
}
But it does not. The first example prints '16' and the second example prints '10'
Literally every compiler from every vendor that I can get my hands on prints '16', seemingly ignoring that part of the standard and not inserting sequence points.
What am I missing here?
Note: The following seem to be related to this question:
(Optimization?) Bug regarding GCC std::thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51253
The answer seems to be that yes, this is a bug in both GCC and MSVC.
This is the status of this issue:
There are several bugs against GCC regarding init-list rules. Most of them have gone completely unacknowledged by the GCC team. This at least implies that G++ does have bugs here because the issues have not been closed as invalid
I have received unofficial word from the MSVC compiler team that this is in fact a bug in their compiler and they are working internally to fix it. However, I have no external bug to point to. As of MSVC 2015 Update 3, the old behavior remains.
Clang, which at this point is by far the most pedantic standards-complaint compiler, implements it the way the standard seems to read.
My personal investigation, discussions with C++ experts at conferences, and unofficial answers I've received from compiler developers indicates that this is a bug in MSVC and GCC, but I'm always reluctant to answer my own questions on StackOverflow. But here we are.
This note is not related to evaluation order. As it was stated in one of comments, it's about order of converting actual parameters to rvalues and standard does not define such order.
You should receive following warning (gcc):
17:58: warning: operation on 'i' may be undefined [-Wsequence-point]
I modified a program slightly to demonstrate how evaluation of arguments work with {} and ().
With such modification, program does not depend on order of converting lvalue to rvalue, thus does not have ambiguity which disappointed you.
#include <iostream>
struct MyType {
MyType(int i, int j)
: sum(i + j)
{
}
int sum;
};
int main()
{
int i = 0;
int a,b;
std::cout << MyType{ (a = ++i), (b = ++i) }.sum << '\n';
std::cout << "Here clauses are evaluated in order they appear: a=" << a << ", b=" << b << std::endl;
i = 0;
std::cout << MyType( (a = ++i), (b = ++i) ).sum << '\n';
std::cout << "Here order of evaluation depends on implementation: a=" << a << ", b=" << b << std::endl;
}
And the output of this program for clang and gcc:
clang:
3
Here clauses are evaluated in order they appear: a=1, b=2
3
Here order of evaluation depends on implementation: a=1, b=2
gcc:
3
Here clauses are evaluated in order they appear: a=1, b=2
3
Here order of evaluation depends on implementation: a=2, b=1
As you can see, in case of curly brackets, clauses are evaluated in order of appearance under both compilers, which corresponds to the note you provided.

Are comma separated statements considered full statements? (and other diagnostic issues)

I guess the answer is "no", but from a compiler point of view, I don't understand why.
I made a very simple code which freaks out compiler diagnostics quite badly (both clang and gcc), but I would like to have confirmation that the code is not ill formatted before I report mis-diagnostics. I should point out that these are not compiler bugs, the output is correct in all cases, but I have doubts about the warnings.
Consider the following code:
#include <iostream>
int main(){
int b,a;
b = 3;
b == 3 ? a = 1 : b = 2;
b == 2 ? a = 2 : b = 1;
a = a;
std::cerr << a << std::endl;
}
The assignment of a is a tautology, meaning that a will be initialized after the two ternary statements, regardless of b. GCC is perfectly happy with this code. Clang is slighly more clever and spot something silly (warning: explicitly assigning a variable of type 'int' to itself [-Wself-assign]), but no big deal.
Now the same thing (semantically at least), but shorter syntax:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a);
std::cerr << a << std::endl;
}
Now the compilers give me completely different warnings. Clang doesn't report anything strange anymore (which is probably correct because of the parenthesis precedence). gcc is a bit more scary and says:
test.cpp: In function ‘int main()’:
test.cpp:7:15: warning: operation on ‘a’ may be undefined [-Wsequence-point]
But is that true? That sequence-point warning gives me a hint that coma separated statements are not handled in the same way in practice, but I don't know if they should or not.
And it gets weirder, changing the code to:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a+0); // <- i just changed this line
std::cerr << a << std::endl;
}
and then suddenly clang realized that there might be something fishy with a:
test.cpp:7:14: warning: variable 'a' is uninitialized when used within its own initialization [-Wuninitialized]
a+0);
^
But there was no problem with a before... For some reasons clang cannot spot the tautology in this case. Again, it might simply be because those are not full statements anymore.
The problems are:
is this code valid and well defined (in all versions)?
how is the list of comma separated statements handled? Should it be different from the first version of the code with explicit statements?
is GCC right to report undefined behavior and sequence point issues? (in this case clang is missing some important diagnostics) I am aware that it says may, but still...
is clang right to report that a might be uninitialized in the last case? (then it should have the same diagnostic for the previous case)
Edit and comments:
I am getting several (rightful) comments that this code is anything but simple. This is true, but the point is that the compilers mis-diagnose when they encounter comma-separated statements in initializers. This is a bad thing. I made my code more complete to avoid the "have you tried this syntax..." comments. A much more realistic and human readable version of the problem could be written, which would exhibit wrong diagnostics, but I think this version shows more information and is more complete.
in a compiler-torture test suite, this would be considered very understandable and readable, they do much much worse :) We need code like that to test and assess compilers. This would not look pretty in production code, but that is not the point here.
5 Expressions
10 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded
5.18 Comma operator [expr.comma]
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression (Clause 5).83 Every
value computation and side effect associated with the left expression
is sequenced before every value computation and side effect associated
with the right expression. The type and value of the result are the
type and value of the right operand; the result is of the same value
category as its right operand, and is a bit-field if its right operand
is a glvalue and a bit-field.
It sounds to me like there's nothing wrong with your statement.
Looking more closely at the g++ warning, may be undefined, which tells me that the parser isn't smart enough to see that a=1 is guaranteed to be evaluated.

What is "from ?: first" In TextMate Source Code regexp.cc [duplicate]

I was writing a console application that would try to "guess" a number by trial and error, it worked fine and all but it left me wondering about a certain part that I wrote absentmindedly,
The code is:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int x,i,a,cc;
for(;;){
scanf("%d",&x);
a=50;
i=100/a;
for(cc=0;;cc++)
{
if(x<a)
{
printf("%d was too big\n",a);
a=a-((100/(i<<=1))?:1);
}
else if (x>a)
{
printf("%d was too small\n",a);
a=a+((100/(i<<=1))?:1);
}
else
{
printf("%d was the right number\n-----------------%d---------------------\n",a,cc);
break;
}
}
}
return 0;
}
More specifically the part that confused me is
a=a+((100/(i<<=1))?:1);
//Code, code
a=a-((100/(i<<=1))?:1);
I used ((100/(i<<=1))?:1) to make sure that if 100/(i<<=1) returned 0 (or false) the whole expression would evaluate to 1 ((100/(i<<=1))?:***1***), and I left the part of the conditional that would work if it was true empty ((100/(i<<=1))? _this space_ :1), it seems to work correctly but is there any risk in leaving that part of the conditional empty?
This is a GNU C extension (see ?: wikipedia entry), so for portability you should explicitly state the second operand.
In the 'true' case, it is returning the result of the conditional.
The following statements are almost equivalent:
a = x ?: y;
a = x ? x : y;
The only difference is in the first statement, x is always evaluated once, whereas in the second, x will be evaluated twice if it is true. So the only difference is when evaluating x has side effects.
Either way, I'd consider this a subtle use of the syntax... and if you have any empathy for those maintaining your code, you should explicitly state the operand. :)
On the other hand, it's a nice little trick for a common use case.
This is a GCC extension to the C language. When nothing appears between ?:, then the value of the comparison is used in the true case.
The middle operand in a conditional expression may be omitted. Then if the first operand is nonzero, its value is the value of the conditional expression.
Therefore, the expression
    x ? : y
has the value of x if that is nonzero; otherwise, the value of y.
This example is perfectly equivalent to
    x ? x : y
In this simple case, the ability to omit the middle operand is not especially useful. When it becomes useful is when the first operand does, or may (if it is a macro argument), contain a side effect. Then repeating the operand in the middle would perform the side effect twice. Omitting the middle operand uses the value already computed without the undesirable effects of recomputing it.

Does postfix increment perform increment not on returned value?

Again, a silly question.
#include <stdio.h>
#include <iostream>
using namespace std;
int main()
{
int i = 0;
i = i++;
cout<<i;
return 0;
}
I get 1 printed as a result of this program though I expected 0: first a temp object created eing equal 0, then i is incremented, then temp object is returned and assigned to i. Just according to:
5.2.6 Increment and decrement [expr.post.incr]
1 The value obtained
by applying a postfix ++ is the value
that the operand had before applying
the operator. [Note: the value
obtained is a copy of the original
value ]
I checked it under MS VC 2008 and GCC. They give both the same result, though at least gcc issues a warning in incrementation string. Where am I wrong?
The behavior of
i = i++;
is undefined. If a single expression assigns two different values to a variable, the C++ spec says that anything can happen - it could take on its old value, one of the two new values, or pretty much anything at all. The reason for this is that it allows the compiler to make much more aggressive optimizations of simple expressions. The compiler could rearrange the order in which the assignment and ++ are executed, for example, if it thought it were more efficient.