I realized recently that you can use the ternary operator in GCC and clang without a middle (?: or ? : works) and it will insert the first expression into the middle:
// outputs 2
cout << (2 ?: 4);
// outputs 3
cout << (0 ? : 3);
Where is this in the standard? I looked and didn't see anything about it.
It isn't in the standard at all.
What you are observing is a GCC extension: https://gcc.gnu.org/onlinedocs/gcc/Conditionals.html
If you omit it, its value is taken from the first operand prior to contextual conversion to bool.
The extension's value lies in not repeating side effects and in reducing the size of the source code.
As I understand it, the insertion operator, when used with an ostream object such as std::cout, simply inserts the values that follow. But when I use brackets, I get a different result than expected. Why does using the insertion operator with brackets, as in the code below, give the following result in C++?
Code
std::cout << (2 << 3) << std::endl;
Result
16
It becomes bitshift instead of insertion operator when you bracket it like that.
2 in binary is 10
After a left shift of 3, the binary becomes 10000 which is equivalent to 16
Why does the subtraction operator give a different result in math expression when I use extra brackets: 1 - (1 + 1)? Answer: The parentheses change the order of operations.
What does it mean to insert 3 into 2? Answer: 2 << 3 is not a stream insertion operator at all. After all, 2 is not a character stream. It is the bit shift left operator. Different operators have different meanings for different types.
#include <iostream>
using namespace std;
int main()
{
int arr[3] = { 10, 20, 30 };
cout << arr[-2] << endl;
cout << -2[arr] << endl;
return 0;
}
Output:
4196160
-30
Here arr[-2] is out of range and invalid, causing undefined behavior.
But -2[arr] evaluates to -30. Why?
Isn't arr[-2] equivalent to -2[arr]?
-2[arr] is parsed as -(2[arr]). In C (and in C++, ignoring overloading), the definition of X[Y] is *(X+Y) (see more discussion of this in this question), which means that 2[arr] is equal to arr[2].
The compiler parses this expression
-2
like
unary_minus decimal_integer_literal
That is, the definition of an integer literal does not include a sign.
In turn the expression
2[arr]
is parsed by the compiler as a postfix expression.
Postfix expressions have higher precedence than unary expressions. Thus this expression
-2[arr]
is equivalent to
- ( 2[arr] )
So the unary minus is applied to the lvalue returned by the postfix expression 2[arr].
On the other hand if you wrote
int n = -2;
and then
n[arr]
then this expression would be equivalent to
arr[-2]
-2[arr] is equivalent to -(2[arr]), which is equivalent to -arr[2]. However, (-2)[arr] is equivalent to arr[-2].
This is because E1[E2] is identical to (*((E1)+(E2)))
The underlying issue is operator precedence. In C++, the subscript operator [] has higher precedence than the unary minus operator -.
So when one writes,
arr[-2]
the subscript is applied first, but the unary minus sits inside the brackets, so -2 as a whole is the index.
In
-2[arr]
the same rule applies, but here the compiler evaluates 2[arr] first and then applies the - operator, so it ends up being
-(2[arr]) not (-2)[arr]
Your understanding of the concept that,
arr[i] i[arr] and *(i+arr) are all the same is correct. They are all equivalent expressions.
If you want to write it that way, write (-2)[arr]. That is exactly equivalent to arr[-2], and therefore just as out of range in this program.
Check this out for future reference: http://en.cppreference.com/w/cpp/language/operator_precedence
In Bjarne Stroustrup's The C++ Programming Language 4th edition section 36.3.6 STL-like Operations the following code is used as an example of chaining:
void f2()
{
std::string s = "but I have heard it works even if you don't believe in it" ;
s.replace(0, 4, "" ).replace( s.find( "even" ), 4, "only" )
.replace( s.find( " don't" ), 6, "" );
assert( s == "I have heard it works only if you believe in it" ) ;
}
The assert fails in gcc (see it live) and Visual Studio (see it live), but it does not fail when using Clang (see it live).
Why am I getting different results? Are any of these compilers incorrectly evaluating the chaining expression or does this code exhibit some form of unspecified or undefined behavior?
The code exhibits unspecified behavior due to unspecified order of evaluation of sub-expressions although it does not invoke undefined behavior since all side effects are done within functions which introduces a sequencing relationship between the side effects in this case.
This example is mentioned in the proposal N4228: Refining Expression Evaluation Order for Idiomatic C++ which says the following about the code in the question:
[...]This code has been reviewed by C++ experts world-wide, and published
(The C++ Programming Language, 4th edition.) Yet, its vulnerability
to unspecified order of evaluation has been discovered only recently
by a tool[...]
Details
It may be obvious to many that arguments to functions have an unspecified order of evaluation but it is probably not as obvious how this behavior interacts with chained functions calls. It was not obvious to me when I first analyzed this case and apparently not to all the expert reviewers either.
At first glance it may appear that since each replace has to be evaluated from left to right that the corresponding function argument groups must be evaluated as groups from left to right as well.
This is incorrect. Function arguments have an unspecified order of evaluation. Although chaining function calls does introduce a left-to-right evaluation order for each function call, the arguments of each function call are only sequenced before the member function call they are part of. In particular this impacts the following calls:
s.find( "even" )
and:
s.find( " don't" )
which are indeterminately sequenced with respect to:
s.replace(0, 4, "" )
The two find calls could be evaluated before or after the replace. This matters because the replace has a side effect on s that alters the result of find: it changes the length of s. So depending on when that replace is evaluated relative to the two find calls, the result will differ.
If we look at the chaining expression and examine the evaluation order of some of the sub-expressions:
s.replace(0, 4, "" ).replace( s.find( "even" ), 4, "only" )
^ ^       ^  ^  ^    ^        ^                 ^  ^
A B       |  |  |    C        |                 |  |
          1  2  3             4                 5  6
and:
.replace( s.find( " don't" ), 6, "" );
 ^        ^                   ^  ^
 D        |                   |  |
          7                   8  9
Note, we are ignoring the fact that 4 and 7 can be further broken down into more sub-expressions. So:
A is sequenced before B which is sequenced before C which is sequenced before D
1 to 9 are indeterminately sequenced with respect to other sub-expressions with some of the exceptions listed below
1 to 3 are sequenced before B
4 to 6 are sequenced before C
7 to 9 are sequenced before D
The key to this issue is that:
4 to 9 are indeterminately sequenced with respect to B
The potential order of evaluation choice for 4 and 7 with respect to B explains the difference in results between clang and gcc when evaluating f2(). In my tests clang evaluates B before evaluating 4 and 7 while gcc evaluates it after. We can use the following test program to demonstrate what is happening in each case:
#include <iostream>
#include <string>
std::string::size_type my_find( std::string s, const char *cs )
{
std::string::size_type pos = s.find( cs ) ;
std::cout << "position " << cs << " found in complete expression: "
<< pos << std::endl ;
return pos ;
}
int main()
{
std::string s = "but I have heard it works even if you don't believe in it" ;
std::string copy_s = s ;
std::cout << "position of even before s.replace(0, 4, \"\" ): "
<< s.find( "even" ) << std::endl ;
std::cout << "position of don't before s.replace(0, 4, \"\" ): "
<< s.find( " don't" ) << std::endl << std::endl;
copy_s.replace(0, 4, "" ) ;
std::cout << "position of even after s.replace(0, 4, \"\" ): "
<< copy_s.find( "even" ) << std::endl ;
std::cout << "position of don't after s.replace(0, 4, \"\" ): "
<< copy_s.find( " don't" ) << std::endl << std::endl;
s.replace(0, 4, "" ).replace( my_find( s, "even" ) , 4, "only" )
.replace( my_find( s, " don't" ), 6, "" );
std::cout << "Result: " << s << std::endl ;
}
Result for gcc (see it live)
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position don't found in complete expression: 37
position even found in complete expression: 26
Result: I have heard it works evenonlyyou donieve in it
Result for clang (see it live):
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position even found in complete expression: 22
position don't found in complete expression: 33
Result: I have heard it works only if you believe in it
Result for Visual Studio (see it live):
position of even before s.replace(0, 4, "" ): 26
position of don't before s.replace(0, 4, "" ): 37
position of even after s.replace(0, 4, "" ): 22
position of don't after s.replace(0, 4, "" ): 33
position don't found in complete expression: 37
position even found in complete expression: 26
Result: I have heard it works evenonlyyou donieve in it
Details from the standard
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and we know that a function call introduces a sequenced before relationship of the function call's postfix expression and arguments with respect to the function body, from section 1.9:
[...]When calling a function (whether or not the function is inline), every
value computation and side effect associated with any argument
expression, or with the postfix expression designating the called
function, is sequenced before execution of every expression or
statement in the body of the called function.[...]
We also know that class member access and therefore chaining will evaluate from left to right, from section 5.2.5 Class member access which says:
[...]The postfix expression before the dot or arrow is evaluated;
the result of that evaluation, together with the id-expression,
determines the result of the entire postfix expression.
Note, in the case where the id-expression ends up being a non-static member function it does not specify the order of evaluation of the expression-list within the () since that is a separate sub-expression. The relevant grammar from 5.2 Postfix expressions:
postfix-expression:
postfix-expression ( expression-list_opt ) // function call
postfix-expression . template_opt id-expression // Class member access, ends
// up as a postfix-expression
C++17 changes
The proposal p0145r3: Refining Expression Evaluation Order for Idiomatic C++ made several changes, including ones that give the code well-specified behavior by strengthening the order of evaluation rules for postfix-expressions and their expression-list.
[expr.call]p5 says:
The postfix-expression is sequenced before each expression in the expression-list and any default argument. The
initialization of a parameter, including every associated value computation and side effect, is indeterminately
sequenced with respect to that of any other parameter. [ Note: All side effects of argument evaluations are
sequenced before the function is entered (see 4.6). —end note ] [ Example:
void f() {
std::string s = "but I have heard it works even if you don't believe in it";
s.replace(0, 4, "").replace(s.find("even"), 4, "only").replace(s.find(" don't"), 6, "");
assert(s == "I have heard it works only if you believe in it"); // OK
}
—end example ]
This is intended to add information on the matter with regard to C++17. The proposal (Refining Expression Evaluation Order for Idiomatic C++ Revision 2) for C++17 addressed the issue, citing the code above as a specimen.
As suggested, I added relevant information from the proposal and to quote (highlights mine):
The order of expression evaluation, as it is currently specified in the standard, undermines advices, popular programming idioms, or the relative safety of standard library facilities. The traps aren't just for novices
or the careless programmer. They affect all of us indiscriminately, even when we know the rules.
Consider the following program fragment:
void f()
{
std::string s = "but I have heard it works even if you don't believe in it";
s.replace(0, 4, "").replace(s.find("even"), 4, "only")
.replace(s.find(" don't"), 6, "");
assert(s == "I have heard it works only if you believe in it");
}
The assertion is supposed to validate the programmer's intended result. It uses "chaining" of member function calls, a common standard practice. This code has been reviewed by C++ experts world-wide, and published (The C++ Programming Language, 4th edition.) Yet, its vulnerability to unspecified order of evaluation has been discovered only recently by a tool.
The paper suggested changing the pre-C++17 rule on the order of expression evaluation, which was influenced by C and has existed for more than three decades. It proposed that the language should guarantee contemporary idioms or risk "traps and sources of obscure, hard to find bugs" such as what happened with the code specimen above.
The proposal for C++17 is to require that every expression has a well-defined evaluation order:
Postfix expressions are evaluated from left to right. This includes functions calls and member selection expressions.
Assignment expressions are evaluated from right to left. This includes compound assignments.
Operands to shift operators are evaluated from left to right.
The order of evaluation of an expression involving an overloaded operator is determined by the order associated with the corresponding built-in operator, not the rules for function calls.
The above code compiles successfully using GCC 7.1.1 and Clang 4.0.0.
I guess the answer is "no", but from a compiler point of view, I don't understand why.
I wrote a very simple piece of code that badly confuses compiler diagnostics (both clang and gcc), but I would like confirmation that the code is not ill-formed before I report the misdiagnoses. I should point out that these are not code-generation bugs: the output is correct in all cases, but I have doubts about the warnings.
Consider the following code:
#include <iostream>
int main(){
int b,a;
b = 3;
b == 3 ? a = 1 : b = 2;
b == 2 ? a = 2 : b = 1;
a = a;
std::cerr << a << std::endl;
}
The assignment of a is a tautology, meaning that a will be initialized after the two ternary statements, regardless of b. GCC is perfectly happy with this code. Clang is slightly more clever and spots something silly (warning: explicitly assigning a variable of type 'int' to itself [-Wself-assign]), but no big deal.
Now the same thing (semantically at least), but shorter syntax:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a);
std::cerr << a << std::endl;
}
Now the compilers give me completely different warnings. Clang doesn't report anything strange anymore (which is probably correct because of the parenthesis precedence). gcc is a bit more scary and says:
test.cpp: In function ‘int main()’:
test.cpp:7:15: warning: operation on ‘a’ may be undefined [-Wsequence-point]
But is that true? That sequence-point warning gives me a hint that comma-separated statements are not handled in the same way in practice, but I don't know whether they should be.
And it gets weirder, changing the code to:
#include <iostream>
int main(){
int b,a = (b=3,
b == 3 ? a = 1 : b = 2,
b == 2 ? a = 2 : b = 1,
a+0); // <- i just changed this line
std::cerr << a << std::endl;
}
and then suddenly clang realized that there might be something fishy with a:
test.cpp:7:14: warning: variable 'a' is uninitialized when used within its own initialization [-Wuninitialized]
a+0);
^
But there was no problem with a before... For some reason clang cannot spot the tautology in this case. Again, it might simply be because those are not full statements anymore.
The problems are:
is this code valid and well defined (in all versions)?
how is the list of comma separated statements handled? Should it be different from the first version of the code with explicit statements?
is GCC right to report undefined behavior and sequence point issues? (in this case clang is missing some important diagnostics) I am aware that it says may, but still...
is clang right to report that a might be uninitialized in the last case? (then it should have the same diagnostic for the previous case)
Edit and comments:
I am getting several (rightful) comments that this code is anything but simple. This is true, but the point is that the compilers mis-diagnose when they encounter comma-separated statements in initializers. This is a bad thing. I made my code more complete to avoid the "have you tried this syntax..." comments. A much more realistic and human readable version of the problem could be written, which would exhibit wrong diagnostics, but I think this version shows more information and is more complete.
In a compiler-torture test suite, this would be considered very understandable and readable; they do much, much worse :) We need code like that to test and assess compilers. This would not look pretty in production code, but that is not the point here.
5 Expressions
10 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded
5.18 Comma operator [expr.comma]
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression (Clause 5). Every
value computation and side effect associated with the left expression
is sequenced before every value computation and side effect associated
with the right expression. The type and value of the result are the
type and value of the right operand; the result is of the same value
category as its right operand, and is a bit-field if its right operand
is a glvalue and a bit-field.
It sounds to me like there's nothing wrong with your statement.
Looking more closely at the g++ warning, it says may be undefined, which tells me that the compiler isn't smart enough to see that a=1 is guaranteed to be evaluated.
I was writing a console application that tries to "guess" a number by trial and error. It worked fine, but it left me wondering about a certain part that I wrote absentmindedly.
The code is:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int x,i,a,cc;
for(;;){
scanf("%d",&x);
a=50;
i=100/a;
for(cc=0;;cc++)
{
if(x<a)
{
printf("%d was too big\n",a);
a=a-((100/(i<<=1))?:1);
}
else if (x>a)
{
printf("%d was too small\n",a);
a=a+((100/(i<<=1))?:1);
}
else
{
printf("%d was the right number\n-----------------%d---------------------\n",a,cc);
break;
}
}
}
return 0;
}
More specifically the part that confused me is
a=a+((100/(i<<=1))?:1);
//Code, code
a=a-((100/(i<<=1))?:1);
I used ((100/(i<<=1))?:1) to make sure that if 100/(i<<=1) returned 0 (or false) the whole expression would evaluate to 1 ((100/(i<<=1))?:***1***), and I left the part of the conditional that would work if it was true empty ((100/(i<<=1))? _this space_ :1), it seems to work correctly but is there any risk in leaving that part of the conditional empty?
This is a GNU C extension (see ?: wikipedia entry), so for portability you should explicitly state the second operand.
In the 'true' case, it returns the value of the condition itself.
The following statements are almost equivalent:
a = x ?: y;
a = x ? x : y;
The only difference is in the first statement, x is always evaluated once, whereas in the second, x will be evaluated twice if it is true. So the only difference is when evaluating x has side effects.
Either way, I'd consider this a subtle use of the syntax... and if you have any empathy for those maintaining your code, you should explicitly state the operand. :)
On the other hand, it's a nice little trick for a common use case.
This is a GCC extension to the C language. When nothing appears between ? and :, the value of the first operand is used in the true case.
The middle operand in a conditional expression may be omitted. Then if the first operand is nonzero, its value is the value of the conditional expression.
Therefore, the expression
x ? : y
has the value of x if that is nonzero; otherwise, the value of y.
This example is perfectly equivalent to
x ? x : y
In this simple case, the ability to omit the middle operand is not especially useful. When it becomes useful is when the first operand does, or may (if it is a macro argument), contain a side effect. Then repeating the operand in the middle would perform the side effect twice. Omitting the middle operand uses the value already computed without the undesirable effects of recomputing it.