why is 0; a valid statement in C++? - c++

int main() {
int a;
2;
4;
a;
return 0;
}
Why is this piece of code valid (i.e. not raising any compilation error) ? What the computer does when executing 1; or a; ?

Statements like 0;, 4; are no-operations.
Note that the behaviour of your program is undefined since a; is a read of the uninitialised variable a. Oops.
0, for example, is a valid expression (it's an octal literal int type with a value zero).
And a statement can be an expression followed by a semi-colon.
Hence
0;
is a legal statement. It is what it is really. Of course, to change the language now to disallow such things could break existing code. And there probably wasn't much appetite to disallow such things in the formative years of C either. Any reasonable compiler will optimise out such statements.
(One place where you need at least one statement is in a switch block body. ; on its own could issue warnings with some compilers, so 0; could have its uses.)

They are expression statements, statements consisting of an expression followed by a semicolon. Many statements (like a = 3;, or printf("Hello, World!");) are also expression statements, and are written because they have useful side effects.
As for what the computer does, we can examine the assembly generated by the compiler, and can see that when the expression has no side effects, the compiler is able to (and does) optimise the statements out.

Related

In c++ what does if(a=b) mean? versus if(a==b)

Probably been answered, but could not find it.
In c++ what does
if(a=b)
mean?
versus
if(a==b)
I just spent two hours debugging to find that
if(a=b)
compiles as
a=b
Why does compiler not flag
if(a=b)
as an error?
In c++ what does if(a=b) mean?
a=b is an assignment expression. If the type of a is primitive, or if the assignment operator is generated by the compiler, then the effect of such assignment is that the value of a is modified to match b. Result of the assignment will be lvalue referring to a.
If the operator is user defined, then it can technically have any behaviour, but it is conventional to conform to the expectations by doing similar modification and return of the left operand.
The returned value is converted to bool which affects whether the following statement is executed.
versus
if(a==b)
a==b is an equality comparison expression. Nothing is assigned. If the types are primitive, or if the comparison operator is generated by the compiler, then the result will be true when the operands are equal and otherwise false.
If the operator is user defined, then it can technically have any behaviour, but it is conventional to conform to the expectations by doing similar equality comparison.
Why does compiler not flag
if(a=b)
as an error?
Because it is a well-formed expression (fragment) as long as a is assignable with b.
if(a=b) is a conventional pattern to express the operation of setting value of a variable, and having conditional behaviour depending on the new value.
Some compilers do optionally "flag" it with a warning.
Note that if you would assign value int a = 1 and you make an if statement
if (a = 2)
{
std::cout << "Hello World!" << std::endl;
}
It still works, even though they are two different values, they will do the std::cout
However if you use a double equals sign == it will not.
The reason for this is if you use the standard double equals sign == you are asking the code if a is equivalent to 2, if it is 2. Obviously it's not, so it doesn't std::cout. But if you use an equals sign, you are changing the value a to 2, so it continues with the if statement.
And, to prove this, try taking away the int a = 1 from before the if statement and add an int before a in the if statement, it works.

Is undefined behavior in given code?

What is the return value of f(p,p), if the value of p is
initialized to 5 before the call? Note that the first parameter is
passed by reference, whereas the second parameter is passed by value.
int f (int &x, int c) {
c = c - 1;
if (c==0) return 1;
x = x + 1;
return f(x,c) * x;
}
Options are:
3024
6561
55440
161051
I try to explain:
In this code, there will be four recursive calls with parameters (6,4), (7,3), (8,2) and (9,1). The last call returns 1. But due to pass by reference, x in all the previous functions is now 9. Hence, the value returned by f(p,p) will be 9 * 9 * 9 * 9 * 1 = 6561.
This question is from the competitive exam GATE, (see Q.no.-42). The answer key is given by GATE "Marks to all" (means there is no option correct.) key set-C, Q.no.-42. Somewhere explained as:
In GATE 2013 marks were given to all as the same code in C/C++ produces undefined behavior. This is because * is not a sequence point in C/C++. The correct code must replace
return f(x,c) * x;
with
res = f(x,c);
return res * x;
But the given code works fine. Is GATE's key wrong? Or is really the mistake with the question?
return f(x,c) * x;
The result of this operation depends on the order in which the two things are evaluated. Since you cannot predict the order they'll be evaluated, you cannot predict the result of this operation.
C++03 chapter 5:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual
expressions, and the order in which side effects take place, is unspecified.
So in the case of f(x,c) * x, the order of evaluation of the operands is unspecified, meaning that you can't know if the left or right operand will get evaluated first.
Your code has unspecified behavior, meaning that it will behave in a certain defined manner which is only known to the compiler. The programmer can't know what the code will do, only that it will either evaluate the left operand first, or the right operand first. The compiler is even allowed to change the order of evaluation on a case-to-case basis, for optimization purposes.
If the order of evaluation matters, you need to rewrite the code. Code relying on unspecified behavior is always a bug, possibly a quite subtle one which will not immediately surface.
Unspecified behavior is different from undefined behavior, which would mean that anything could happen, including the program going haywire or crashing.

compiler behaviour on return value of new

#include <iostream>
using namespace std;
int main(void){
int size = -2;
int* p = new int[size];
cout<<p<<endl;
return 0;
}
Above code compiles without any problem on visual studio 2010.
But
#include <iostream>
using namespace std;
int main(void){
const int size = -2;
int* p = new int[size];
cout<<p<<endl;
return 0;
}
But this code(added const keyword) gives error on compilation(size of array cannot be negative).
Why these different results ?
By making it a constant expression, you've given the compiler the ability to diagnose the problem at compile time. Since you've made it const, the compiler can easily figure out that the value will necessarily be negative when you pass it to new.
With a non-constant expression, you have the same problem, but it takes more intelligence on the part of the compiler to detect it. Specifically, the compiler has to detect that the value doesn't change between the time you initialize it and the time you pass it to new. With optimization turned on, chances are pretty good at least some compilers still could/would detect the problem, because for optimization purposes they detect data flow so they'd "realize" that in this case the value remains constant, even though you haven't specified it as const.
Because const int size = -2; can be replaced at compilation time by the compiler - whereas a non-const cannot - the compiler can tell that size is negative and doesn't allow the allocation.
There's no way for the compiler to tell whether int* p = new int[size]; is legal or not, if size is not const - size can be altered in the program or by a different thread. A const can't.
You'll get into undefined behavior running the first sample anyway, it's just not legal.
In the first, size is a variable whose value happens to be -2 but the compiler doesn't track it as it is a variable (at least for diagnostic purpose, I'm pretty sure the optimizing phase can track it). Execution should be problematic (I don't know if it is guarantee to have an exception or just UB).
In the second, size is a constant and thus its value is known and verified at compile time.
The first yields a compilation error because the compiler can detect a valid syntax is being violated at compilation time.
The Second yields a Undefined Behaivor[1] because compiler cannot detect the illegal syntax at compilation time.
Rationale:
Once you make the variable a const the compiler knows that the value of the variable size is not supposed to change anytime during the execution of the program. So compiler will apply its optimization & just simply replace the const integral type size with its constant value -2 wherever it is being used at compile time, While doing so it understands that legal syntax is not being folowed[1] and complains with a error.
In the Second case not adding a const prevents the compiler from applying the optimization mentioned above, because it cannot be sure that the variable size will never change during the course of execution of the program.So it cannot detect the illegal syntax at all.However, You end up getting a Undefined Behavior at run-time.
[1]Reference:
C++03 5.3.4 New [expr.new]
noptr-new-declarator:
[ expression ] attribute-specifier-seqopt
noptr-new-declarator [ constant-expression ] attribute-specifier-seqopt
constant-expression are further explained in 5.4.3/6 and 5.4.3/7:
Every constant-expression in a noptr-new-declarator shall be an integral constant expression (5.19) and evaluate to a strictly positive value. The expression in a direct-new-declarator shall have integral or enumeration type (3.9.1) with a non-negative value. [Example: if n is a variable of type int, then new float[n][5] is well-formed (because n is the expression of a direct-new-declarator), but new float[5][n] is ill-formed (because n is not a constant-expression). If n is negative, the effect of new float[n][5] is undefined. ]

The result of int c=0; cout<<c++<<c;

I think it should be 01 but someone says its "undefined", any reason for that?
c++ is both an increment and an assignment. When the assignment occurs (before or after other code on that line) is left up to the discretion of the compiler. It can occur after the cout << or before.
This can be found in the C99 standard
http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
You can find it on page 28 in the pdf or section 5.1.2.3
the actual increment of p can occur at any time between the previous sequence point and the next sequence point
Since someone asked for the C++ standard (as this is a C++ question) it can be found in section 1.9.15 page 10 (or 24 in pdf format)
evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced
It also includes the following code block:
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the behavior is undefined
I feel that the C99 standard's explanation is clearer, but it is true in both languages.
It is undefined behavior if you modify a value and then read it (or try to modify it again) without an intervening sequence point. The concept of a sequence point in C++ is a bit technical (you can read a little about it here), but the bottom line is that stream insertion (<<) is not a sequence point.
The reason why this is undefined behavior is because, in the absence of a sequence point, the compiler is allowed to re-order operations in any way it sees fit. That is, it is permitted to retrieve the value of c (and hold onto it for the second insertion) and then afterwords execute c++ to get the value for the first insertion. So you can't be sure whether the increment will occur before or after the value of c for the second insertion is determined.
The reason it is undefined is that the compiler is free to calculate function parameters in any order. Consider if you where calling a function (because you are, but it's easier to envision when it's in function syntax):
cout.output(c++).output(c);
The compiler may hit the parameters in reverse order, forward order, or whatever. It may call the first output before calculating the parameter to the second output or it may do both and then call.
The behavior is defined but unspecified. The relative order of evaluating the two uses of 'c' in the expression isn't specified. However, if you convert it to functional notation, it looks like this:
cout.operator<<(c++).operator<<(c);
There's a sequence point between evaluating the arguments to a function, and executing the body of the function, and function bodies aren't interleaved, so the result is only unspecified, not undefined behavior.
If you didn't have an overloaded operator:
int c=0;
int a = c++ << c;
Then the behavior would be undefined, because both modifying and using the value of c without an intervening sequence point.
Edit: The sequence litb brings up is simply wrong. The standard specifies (§1.9/17): "When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body."
This clearly written with the idea that arguments are evaluated, then (immediately afterward) the body of the function is executed. The sequence he suggests, in which arguments to one function are evaluated, then arguments to another, then execution of both function bodies doesn't seem to have been intended, but also isn't prohibited. That, however, changes nothing -- the requirement is still that: "...there is a sequence point after the evaluation of all function arguments (if any)..."
The subsequent language about execution of the body does NOT remove the requirement for a sequence point after evaluating all function arguments. All other evaluation, whether of the function body or other function arguments follows that sequence point. I can be as pedantic and perverse as anybody about mis-reading what's clearly intended (but not quite stated) -- but I can't imagine how "there is a sequence point after the evaluation of all function arguments" can be read as meaning "there is NOT a sequence point after the evaluation of all function arguments."
Neil's point is, of course, correct: the syntax I've used above is for member functions. For a non-member overload, the syntax would be more like:
operator<<(operator<<(cout,c++), c);
This doesn't remove the requirement for sequence points either though.
As far as it being unspecified: it's pretty simple really: there's a sequence point after evaluating all function arguments, so all the arguments for one function call must be fully evaluated (including all side effects), then arguments for the other function call can be evaluated (taking into account any side effects from the other) -- BUT there's no requirement about WHICH function call's arguments must be evaluated first or second, so it could be c, then c++, or it could be c++, then c -- but it has to be one or the other, not an interleaving.
As I see it,
f(c++);
is equivalent to:
f(c); c += 1;
And
f(c++,c++);
is equivalent to:
f(c,c); c += 1; c += 1;
But it may be the case that
f(c++,c++);
becomes
f(c,c+1); c+= 2;
An experiment with gcc and clang, first in C
#include <stdio.h>
void f(int a, int b) {
printf("%d %d\n",a,b);
}
int main(int argc, char **argv) {
int c = 0;
f(c++,c++);
return 0;
}
and in C++
#include <iostream>
int main(int argc, char **argv) {
int c = 0;
std::cout << c++ << " " << c++ << std::endl;
return 0;
}
Is interesting, as gcc and g++ compiled results in
1 0
whereas clang compiled results in
0 1

Is it legal to use the increment operator in a C++ function call?

There's been some debate going on in this question about whether the following code is legal C++:
std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
bool isActive = (*i)->update();
if (!isActive)
{
items.erase(i++); // *** Is this undefined behavior? ***
}
else
{
other_code_involving(*i);
++i;
}
}
The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."
I opened this question to hopefully settle that debate.
Quoth the C++ standard 1.9.16:
When calling a function (whether or
not the function is inline), every
value computation and side effect
associated with any argument
expression, or with the postfix
expression designating the called
function, is sequenced before
execution of every expression or
statement in the body of the called
function. (Note: Value computations
and side effects associated with the
different argument expressions are
unsequenced.)
So it would seem to me that this code:
foo(i++);
is perfectly legal. It will increment i and then call foo with the previous value of i. However, this code:
foo(i++, i++);
yields undefined behavior because paragraph 1.9.16 also says:
If a side effect on a scalar object is
unsequenced relative to either another
side effect on the same scalar object
or a value computation using the value
of the same scalar object, the
behavior is undefined.
To build on Kristo's answer,
foo(i++, i++);
yields undefined behavior because the order that function arguments are evaluated is undefined (and in the more general case because if you read a variable twice in an expression where you also write it, the result is undefined). You don't know which argument will be incremented first.
int i = 1;
foo(i++, i++);
might result in a function call of
foo(2, 1);
or
foo(1, 2);
or even
foo(1, 1);
Run the following to see what happens on your platform:
#include <iostream>
using namespace std;
void foo(int a, int b)
{
cout << "a: " << a << endl;
cout << "b: " << b << endl;
}
int main()
{
int i = 1;
foo(i++, i++);
}
On my machine I get
$ ./a.out
a: 2
b: 1
every time, but this code is not portable, so I would expect to see different results with different compilers.
The standard says the side effect happens before the call, so the code is the same as:
std::list<item*>::iterator i_before = i;
i = i_before + 1;
items.erase(i_before);
rather than being:
std::list<item*>::iterator i_before = i;
items.erase(i);
i = i_before + 1;
So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.
That said, it's bad style - the erase function for all containers returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation, so the idiomatic code:
i = items.erase(i);
will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.
You also wouldn't get the original code to compile without warnings - you'd have to write
(void)items.erase(i++);
to avoid a warning about an unused return, which would be a big clue that you're doing something odd.
It's perfectly OK.
The value passed would be the value of "i" before the increment.
++Kristo!
The C++ standard 1.9.16 makes a lot of sense with respect to how one implements operator++(postfix) for a class. When that operator++(int) method is called, it increments itself and returns a copy of the original value. Exactly as the C++ spec says.
It's nice to see standards improving!
However, I distinctly remember using older (pre-ANSI) C compilers wherein:
foo -> bar(i++) -> charlie(i++);
Did not do what you think! Instead it compiled equivalent to:
foo -> bar(i) -> charlie(i); ++i; ++i;
And this behavior was compiler-implementation dependent. (Making porting fun.)
It's easy enough to test and verify that modern compilers now behave correctly:
#define SHOW(S,X) cout << S << ": " # X " = " << (X) << endl
struct Foo
{
Foo & bar(const char * theString, int theI)
{ SHOW(theString, theI); return *this; }
};
int
main()
{
Foo f;
int i = 0;
f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
SHOW("END ",i);
}
Responding to comment in thread...
...And building on pretty much EVERYONE's answers... (Thanks guys!)
I think we need spell this out a bit better:
Given:
baz(g(),h());
Then we don't know whether g() will be invoked before or after h(). It is "unspecified".
But we do know that both g() and h() will be invoked before baz().
Given:
bar(i++,i++);
Again, we don't know which i++ will be evaluated first, and perhaps not even whether i will be incremented once or twice before bar() is called. The results are undefined! (Given i=0, this could be bar(0,0) or bar(1,0) or bar(0,1) or something really weird!)
Given:
foo(i++);
We now know that i will be incremented before foo() is invoked. As Kristo pointed out from the C++ standard section 1.9.16:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. -- end note ]
Though I think section 5.2.6 says it better:
The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value -- end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete effective object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. [ Note: this use is deprecated, see Annex D. -- end note ] The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. -- end note ] The result is an rvalue. The type of the result is the cv-unqualified version of the type of the operand. See also 5.7 and 5.17.
The standard, in section 1.9.16, also lists (as part of its examples):
i = 7, i++, i++; // i becomes 9 (valid)
f(i = -1, i = -1); // the behavior is undefined
And we can trivially demonstrate this with:
#define SHOW(X) cout << # X " = " << (X) << endl
int i = 0; /* Yes, it's global! */
void foo(int theI) { SHOW(theI); SHOW(i); }
int main() { foo(i++); }
So, yes, i is incremented before foo() is invoked.
All this makes a lot of sense from the perspective of:
class Foo
{
public:
Foo operator++(int) {...} /* Postfix variant */
}
int main() { Foo f; delta( f++ ); }
Here Foo::operator++(int) must be invoked prior to delta(). And the increment operation must be completed during that invocation.
In my (perhaps overly complex) example:
f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
f.bar("A",i) must be executed to obtain the object used for object.bar("B",i++), and so on for "C" and "D".
So we know that i++ increments i prior to calling bar("B",i++) (even though bar("B",...) is invoked with the old value of i), and therefore i is incremented prior to bar("C",i) and bar("D",i).
Getting back to j_random_hacker's comment:
j_random_hacker writes: +1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"?
This question is a lot more complicated than you might think...
Rewriting your question as code...
int bar(const char * theString, int theI) { SHOW(...); return i; }
bar("A",i) ^ bar("B",i++) ^ bar("C",i) ^ bar("D",i);
Now we have only ONE expression. According to the standard (section 1.9, page 8, pdf page 20):
Note: operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.(7) For example, in the following fragment: a=a+32760+b+5; the expression statement behaves exactly the same as: a=(((a+32760)+b)+5); due to the associativity and precedence of these operators. Thus, the result of the sum (a+32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an exception and in which the range of values representable by an int is [-32768,+32767], the implementation cannot rewrite this expression as a=((a+b)+32765); since if the values for a and b were, respectively, -32754 and -15, the sum a+b would produce an exception while the original expression would not; nor can the expression be rewritten either as a=((a+32765)+b); or a=(a+(b+32765)); since the values for a and b might have been, respectively, 4 and -8 or -17 and 12. However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. -- end note ]
So we might think that, due to precedence, that our expression would be the same as:
(
(
( bar("A",i) ^ bar("B",i++)
)
^ bar("C",i)
)
^ bar("D",i)
);
But, because (a^b)^c==a^(b^c) without any possible overflow situations, it could be rewritten in any order...
But, because bar() is being invoked, and could hypothetically involve side effects, this expression cannot be rewritten in just any order. Rules of precedence still apply.
Which nicely determines the order of evaluation of the bar()'s.
Now, when does that i+=1 occur? Well it still has to occur before bar("B",...) is invoked. (Even though bar("B",....) is invoked with the old value.)
So it's deterministically occurring before bar(C) and bar(D), and after bar(A).
Answer: NO. We will always have "A=0, B=0, C=1, D=1", if the compiler is standards-compliant.
But consider another problem:
i = 0;
int & j = i;
R = i ^ i++ ^ j;
What is the value of R?
If the i+=1 occurred before j, we'd have 0^0^1=1. But if the i+=1 occurred after the whole expression, we'd have 0^0^0=0.
In fact, R is zero. The i+=1 does not occur until after the expression has been evaluated.
Which I reckon is why:
i = 7, i++, i++; // i becomes 9 (valid)
Is legal... It has three expressions:
i = 7
i++
i++
And in each case, the value of i is changed at the conclusion of each expression. (Before any subsequent expressions are evaluated.)
PS: Consider:
int foo(int theI) { SHOW(theI); SHOW(i); return theI; }
i = 0;
int & j = i;
R = i ^ i++ ^ foo(j);
In this case, i+=1 has to be evaluated before foo(j). theI is 1. And R is 0^0^1=1.
To build on MarkusQ's answer: ;)
Or rather, Bill's comment to it:
(Edit: Aw, the comment is gone again... Oh well)
They're allowed to be evaluated in parallel. Whether or not it happens in practice is technically speaking irrelevant.
You don't need thread parallelism for this to occur though, just evaluate the first step of both (take the value of i) before the second (increment i). Perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.
In fact, I'd expect it to be a common optimization. Look at it from an instruction scheduling point of view. You have the following you need to evaluate:
Take the value of i for the right argument
Increment i in the right argument
Take the value of i for the left argument
Increment i in the left argument
But there's really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why new() in function arguments is usually a memory leak, even when wrapped in a smart pointer)
It's also undefined what happens when you modify the same variable twice in the same expression.
We do have a dependency between 1 and 2, however, and between 3 and 4.
So why would the compiler wait for 2 to complete before computing 3? That introduces added latency, and it'll take even longer than necessary before 4 becomes available.
Assuming there's a 1 cycle latency between each, it'll take 3 cycles from 1 is complete until the result of 4 is ready and we can call the function.
But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it's the same expression), and in the following, 2 and 4 can be evaluated.
All modern CPU's can execute 3-4 instructions per cycle, and a good compiler should try to exploit that.
Sutter's Guru of the Week #55 (and the corresponding piece in "More Exceptional C++") discusses this exact case as an example.
According to him, it is perfectly valid code, and in fact a case where trying to transform the statement into two lines:
items.erase(i);
i++;
does not produce code that is semantically equivalent to the original statement.
To build on Bill the Lizard's answer:
int i = 1;
foo(i++, i++);
might also result in a function call of
foo(1, 1);
(meaning that the actuals are evaluated in parallel, and then the postops are applied).
-- MarkusQ