Does *&++i cause undefined behaviour in C++03? - c++

In another answer it was stated that prior to C++11, where i is an int, then use of the expression:
*&++i
caused undefined behaviour. Is this true?
On the other answer there was a little discussion in comments but it seems unconvincing.

It makes little sense to ask whether *&++i in itself has UB. The dereferencing doesn't necessarily access the stored value (prior or new) of i, as you can see by using the expression as the initializer for a reference. Only if an lvalue-to-rvalue conversion is involved (that is, the expression is used in a context that needs the value) is there any question to discuss at all. And then, since we can use the value of ++i, we can use the value of *&++i with exactly the same caveats as for ++i.
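A minimal sketch of the reference-initializer point, assuming a hypothetical helper named reference_aliases_i (not from the original answer). The only modification of i in the initializer is the increment, and the reference simply ends up naming i:

```cpp
#include <cassert>

// Hypothetical helper: binds a reference to the lvalue produced by *&++i.
// No lvalue-to-rvalue conversion happens in the initializer, so the stored
// value of i is not read there; i is modified once, by the increment.
bool reference_aliases_i() {
    int i = 0;
    int& r = *&++i;      // r now refers to i, and i has been incremented to 1
    assert(&r == &i);    // the reference and i are the same object
    r = 10;              // a separate statement that writes through to i
    return i == 10;
}
```

Calling reference_aliases_i() is expected to return true.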
The original question concerned essentially i = ++i, which is the same as i = *&++i. That was undefined behavior in C++03, due to i being modified twice between sequence points, and is well-defined in C++11, due to the side-effects of the assignment operator being sequenced after the value computations of the left and right hand sides.
It is perhaps relevant to note that the non-normative examples in the C++98 and C++03 standards were incorrect, describing some cases of formally Undefined Behavior as merely unspecified behavior. Thus, the intent has not been entirely clear, all the way back. A good rule of thumb is simply to avoid relying on such obscure corner cases of the language: one should not need to be a language lawyer in order to make sense of the code…

I think the question only makes sense if we deal with the expression:
i = *&++i;
The relevant quote in the C++03 standard would be [expr]/4:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual
expressions, and the order in which side effects take place, is unspecified. Between the previous
and next sequence point a scalar object shall have its stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
i = ++i + 1; // the behavior is unspecified
We can just compare the sequencing of i = *&++i vs i = ++i + 1 to see that the same rule applies to both (the comment in the standard's example says "unspecified", but the normative text quoted above makes it undefined). They are both statements of the form:
i = f(++i);
For any operation f, the assignment to i on the left-hand side and the side effect of the ++i on the right-hand side are not sequenced relative to each other. Hence, undefined behavior.
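As an illustration only, here is one way to get the presumably intended effect of i = ++i + 1 without touching the rule at all. The function name increment_then_add_one is made up for this sketch; each statement modifies i exactly once, so it is well defined in every C++ version:

```cpp
#include <cassert>

// Hypothetical rewrite of `i = ++i + 1;` as two full expressions.
// Each statement modifies i exactly once.
int increment_then_add_one(int i) {
    const int start = i;
    ++i;            // first modification, complete at the end of this statement
    i = i + 1;      // second modification, in its own full expression
    assert(i == start + 2);
    return i;
}
```

For example, increment_then_add_one(0) returns 2.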

Related

Is (a=1)=2 undefined behaviour in C++98?

The same question applies to similar code, for example (a+=1)%=7;, where a is an int variable.
We know that operator += or = is not a sequence point, therefore we have two side effects between two adjacent sequence points. (We are using the C++98 sequence point rules here.)
However, assignment operators like += or = are guaranteed to return the lvalue after assignment, which means the order of execution is to some degree "defined".
So is that undefined behaviour?
(a=1)=2 was undefined prior to C++11, as the = operator did not introduce a sequence point and therefore a is modified twice without an intervening sequence point. The same applies to (a+=1)%=7.
The text was:
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
It's worth mentioning that the description of the assignment operator is defective:
The result of the assignment operation is the value stored in the left operand after the assignment has taken place; the result is an lvalue.
If the result is an lvalue then the result cannot be the stored value (that would be an rvalue). Lvalues designate memory locations. This sentence seems to imply an ordering relation, but regardless of how we want to interpret it, it doesn't use the term "sequence point" and therefore the earlier text about sequence points applies.
If anything, that wording casts a bit of doubt on expressions like (a=1) + 2. The C++11 revision of sequencing straightened out all these ambiguities.
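A sketch of how both expressions can be written so that the question never arises, assuming the intent is simply "do the first assignment, then the second". The function names are hypothetical:

```cpp
#include <cassert>

// Hypothetical equivalent of `(a += 1) %= 7;` split into two statements,
// so a is modified once per full expression (fine under C++98 rules too).
int add_one_then_mod_seven(int a) {
    a += 1;
    a %= 7;
    return a;
}

// Hypothetical equivalent of `(a = 1) = 2;` split the same way.
int assign_twice() {
    int a = 0;
    a = 1;
    a = 2;
    assert(a == 2);
    return a;
}
```

For example, add_one_then_mod_seven(6) returns 0 and assign_twice() returns 2.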

When are the ++c and c++ increments applied exactly here? [duplicate]

Just to see how well I understood how the ++c/c++ operators work, I tried to run these C programs:
int c = 5;
c = c - c++;
printf("%d\n", c);
This prints 1. I guess the logic is that the ++ is applied after the line of code where it is used, so c becomes c - c, which is 0, and on the "next line" it is increased by one. But that seems strange to me, and I'd like to know in more detail what should happen with regard to operator precedence.
Now on to this:
int c = 5;
c = c - ++c;
printf("%d\n", c);
This one prints 0, and I can't really understand why. If right-hand values are evaluated from left to right, I guess it would read c, which is 5, then ++c, which is 6, as the increment should be applied immediately. Or does it calculate the ++c before evaluating the rest of the right-hand side, so that it is actually doing 6 - 6 because the increment also affects the first read of c?
For C++ (all versions, explanation applies to C++11 and later):
Both have undefined behavior, meaning that not only is the value that it will return unspecified, but that it causes your whole program to behave in an undefined manner.
The reason for this is that evaluation order inside an expression is only specified for certain cases. The order in which expressions are evaluated does not follow the order in the source code and is not related to operator precedence or associativity. In most cases the compiler can freely choose in which order it will evaluate expressions, following some general rules (e.g. the evaluation of an operator is sequenced after the value computation of its operands, etc.) and some specific ones (e.g. &&'s and ||'s left-hand operands are always sequenced before their right-hand operands).
In particular the order in which the operands of - are evaluated is unspecified. It is said that the two operands are unsequenced relative to one another.
This in itself means that we won't know whether c on the left-hand side of c - [...] will evaluate to the value of c before or after the increment.
There is however an even stricter rule forbidding the use of a value computation from a scalar object (here c) in a manner unsequenced relative to a side effect on the same scalar object. In your case both ++c and c++ cause side effects on c, but they are unsequenced with the use of the value on the left hand side of c - [...]. Not following this rule causes undefined behavior.
Therefore your compiler is allowed to output whatever it wants and you should avoid writing code like that.
For a detailed list of all the evaluation order rules of C++, see cppreference.com. Note that they changed somewhat with the different C++ versions, making more and more previously undefined or unspecified behavior defined. None of these changes apply to your particular case though.
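To make the point concrete, the sketch below spells out a few orderings one might have had in mind, each written with separate statements so that it is well defined. The function names are invented for illustration, and none of these is "the" meaning of the original expressions, which have no defined meaning:

```cpp
#include <cassert>

// c = c - c++ : read both operands, increment, assign last (the assignment wins).
int post_increment_assignment_last(int c) {
    int lhs = c;
    int rhs = c++;      // rhs holds the old value, c is now one larger
    c = lhs - rhs;      // overwrites the incremented value
    return c;
}

// c = c - c++ : assign the difference first, apply the increment last.
int post_increment_increment_last(int c) {
    int lhs = c;
    int rhs = c;
    c = lhs - rhs;
    c++;
    return c;
}

// c = c - ++c : left operand read before the increment.
int pre_increment_left_read_first(int c) {
    const int start = c;
    int lhs = c;
    int rhs = ++c;
    assert(rhs == start + 1);
    c = lhs - rhs;
    return c;
}

// c = c - ++c : left operand read after the increment.
int pre_increment_left_read_last(int c) {
    int rhs = ++c;
    int lhs = c;
    c = lhs - rhs;
    return c;
}
```

With c starting at 5, these return 0, 1, -1 and 0 respectively. The values 1 and 0 that the asker observed each match one of these orderings, but nothing obliges a compiler to pick any of them.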
c = c - c++;
In C, this is a very bad idea(a). You are not permitted to modify and modify/use the same object without an intervening sequence point, and that subtraction operator is not a sequence point.
The things that are sequence points can be found in Annex C of the ISO standard.
(a) Technically, the behaviour of each operation (the evaluation of c and c++, and the assignment to c) is well defined but the sequencing is either unsequenced or indeterminate. In the former case, actions from each part can interleave while, in the latter, they do not interleave but you don't know in which order the two parts will be done.
However, the standard C11 6.5/2 also makes it clear that a sequencing issue using the same variable is undefined behaviour:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
Bottom line, it's not something you should be doing.

order of evaluation of function parameters

What will be printed as the result of the operation below:
x=5;
printf("%d,%d,%d\n",x,x<<2,x>>2);
Answer: 5,20,1
I thought the order was undefined, yet I found the above as an interview question on many sites.
From the C++ standard:
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified.
However, your example would only have undefined behavior if the arguments were x>>=2 and x<<=2, such that x were being modified.
Bit shift operators don't modify the value of the variable... so order doesn't matter.
The order of evaluation is unspecified, but it doesn't matter because you're not modifying x at all.
So the program is well-defined, and the answer is as given.
The following would have undefined semantics:
printf("%d,%d,%d\n", x, x <<= 2, x >>= 2);
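A small sketch of the well-defined case with plain shifts, assuming a hypothetical helper format_shifts that formats into a buffer instead of printing. Because x << 2 and x >> 2 only read x, every argument order produces the same text:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats x, x << 2 and x >> 2 the way the printf call does.
// Intended for small non-negative x, as in the question.
std::string format_shifts(int x) {
    const int original = x;
    char buf[64];
    std::snprintf(buf, sizeof buf, "%d,%d,%d\n", x, x << 2, x >> 2);
    assert(x == original);   // the shift operators did not modify x
    return std::string(buf);
}
```

For x = 5 this yields "5,20,1\n", matching the answer given in the question.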
I found the answer in the C++ standard.
Paragraph 5.2.2/8:
The order of evaluation of arguments is unspecified. All side effects
of argument expression evaluations take effect before the function is
entered. The order of evaluation of the postfix expression and the
argument expression list is unspecified.
In other words, it depends only on the compiler.
The order of evaluation is unspecified in the official C specification.
However, as a matter of practicality, parameters are usually evaluated right-to-left.
In your problem, the bit-shift operators don't change the value of x, so the order of evaluation is not important. You'd get 5,20,1 whether evaluated left-to-right, right-to-left, or middle-first.
With common stack-based calling conventions such as cdecl, parameters are pushed onto the stack in right-to-left order, so that the 1st param (in this case, the char* "%d,%d,%d") is at the top of the stack. Parameters are usually (but not always) evaluated in the same order they are pushed.
A problem that better illustrates what you're talking about is:
int i=1;
printf("%d, %d, %d", i++, i++, i++);
The official answer is "undefined".
The practical answer, (in the several compilers/platforms I've tried), is "3, 2, 1".
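If a particular order is wanted, it has to be made explicit. A minimal sketch, assuming a left-to-right order is intended and using a made-up helper name; each increment sits in its own statement, so nothing is left to the compiler:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: evaluates the three increments in separate statements,
// then formats the saved values.
std::string format_three_increments(int i) {
    const int start = i;
    int first = i++;
    int second = i++;
    int third = i++;
    assert(i == start + 3);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%d, %d, %d", first, second, third);
    return std::string(buf);
}
```

Starting from i = 1 this produces "1, 2, 3" on any conforming compiler.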

Is "++l *= m" undefined behaviour?

I have started studying C++0x. I came across the following expression somewhere:
int l = 1, m=2;
++l *= m;
I have no idea whether the second expression has well defined behavior or not. So I am asking it here.
Isn't it UB? I am just eager to know.
The expression is well defined in C++0x. A very Standardese quoting FAQ is given by Prasoon here.
I'm not convinced that such a high ratio of (literal Standards quotes : explanatory text) is preferable, so I'm giving an additional small explanation: Remember that ++L is equivalent to L += 1, and that the value computation of that expression is sequenced after the increment of L. And in a *= b, value computation of expression a is sequenced before assignment of the multiplication result into a.
What side effects do you have?
Increment
Assignment of the multiplication result
Both side-effects are transitively sequenced by the above two sequenced after and sequenced before.
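The ordering described above can be sketched as two separate statements. The function name increment_then_multiply is only for illustration; the body performs the same two side effects in the order the C++0x rules are said to require:

```cpp
#include <cassert>

// Hypothetical unrolling of `++l *= m;` under the C++0x sequencing described:
// the increment happens first, then the multiplication result is assigned.
int increment_then_multiply(int l, int m) {
    const int start = l;
    l += 1;     // the side effect of ++l
    assert(l == start + 1);
    l *= m;     // the compound assignment, sequenced after the increment
    return l;
}
```

With l = 1 and m = 2 this returns 4.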
In the code above, prefix ++ has precedence over *=, and so gets executed first. The result is that l equals 4.
UPDATE: It is indeed undefined behavior. My assumption that precedence ruled was false.
The reason is that l is both an lvalue and rvalue in *=, and also in ++. These two operations are not sequenced. Hence l is written (and read) twice "without a sequence point" (old standard wording), and behavior is undefined.
As a sidenote, I presume your question stems from changes regarding sequence points in C++0x. C++0x has changed wording regarding "sequence points" to "sequenced before", to make the standard clearer. To my knowledge, this does not change the behavior of C++.
UPDATE 2: It turns out there actually is a well defined sequencing as per sections 5.17(1), 5.17(7) and 5.3.2(1) of the N3126 draft for C++0x. @Johannes Schaub's answer is correct, and documents the sequencing of the statement. Credit should of course go to his answer.

Sequence points and partial order

A few days back there was a discussion here about whether the expression
i = ++i + 1
invokes UB (Undefined Behavior) or not.
Finally the conclusion was made that it invokes UB as the value of 'i' is changing more than once between two sequence points.
I was involved in a discussion with Johannes Schaub in that same thread. According to him
i=(i,i++,i)+1 ------ (1) /* invokes UB as well */
I said (1) does not invoke UB because the side effects of the previous subexpressions are cleared by the comma operator ',' between i and i++ and between i++ and i.
Then he gave the following explanation:
"Yes the sequence point after i++ completes all side effects before it, but there is nothing that stops the assignment side effect overlapping with the side effect of i++.The underlying problem is that the side effect of an assignment is not specified to happen after or before the evaluation of both operands of the assignment, and so sequence points cannot do anything with regard to protecting this: Sequence points induce a partial order: Just because there is a sequence point after and before i++ doesn't mean all side effects are sequenced with regard to i.
Also, notice that merely a sequence point means nothing: The order of evaluations isn't dictated by the form of code. It's dictated by semantic rules. In this case, there is no semantic rule saying when the assignment side effect happens with regard to evaluating both of its operands or subexpressions of those operands".
The statement written in "bold" confused me. As far as I know:
"At certain specified points in the execution sequence called sequence points,all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place."
Since comma operators also specify execution order, the side effect of i++ has been completed by the time we reach the last i. He (Johannes) would have been right had the order of evaluation not been specified (but in the case of the comma operator it is well specified).
So I just want to know whether (1) invokes UB or not. Can someone give another valid explanation?
Thanks!
The C standard says this about assignment operators (C90 6.3.16 or C99 6.5.16 Assignment operators):
The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.
It seems to me that in the statement:
i=(i,i++,i)+1;
the sequence point 'previous' to the assignment operator would be the second comma operator and the 'next' sequence point would be the end of the expression. So I'd say that the expression doesn't invoke undefined behavior.
However, this expression:
*(some_ptr + i) = (i,i++,i)+1;
would have undefined behavior because the order of evaluation of the 2 operands of the assignment operator is unspecified, and in this case, instead of the problem being when the assignment operator's side effect takes place, the problem is that you don't know whether the value of i used in the left-hand operand will be evaluated before or after the right-hand side. This order of evaluation problem doesn't occur in the first example because in that expression the value of i isn't actually used in the left-hand side: all that the assignment operator is interested in is the "lvalue-ness" of i.
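To show what is at stake, here is a sketch of the two outcomes the left-hand operand could be aiming at, each written so the index is chosen explicitly. The helper names are hypothetical, and the stored value i + 1 follows the reading in which the comma expression yields the incremented i:

```cpp
#include <cassert>

// Hypothetical variant: the index is taken from i before the increment.
int store_using_old_index(int* some_ptr, int& i) {
    int index = i;
    i++;
    assert(i == index + 1);
    some_ptr[index] = i + 1;
    return index;
}

// Hypothetical variant: the index is taken from i after the increment.
int store_using_new_index(int* some_ptr, int& i) {
    i++;
    int index = i;
    some_ptr[index] = i + 1;
    return index;
}
```

With an array of zeros and i = 0, the first variant writes 2 into element 0 and the second writes 2 into element 1, which is exactly the difference the original expression leaves open.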
But I also think that all this is sketchy enough (and my understanding of the nuances involved is sketchy enough) that I wouldn't be surprised if someone can convince me otherwise (on either count).
i=(i,i++,i)+1 ------ (1) /* invokes UB as well */
It does not invoke undefined behaviour. The side effect of i++ will take place before the evaluation of the next sequence point, which is denoted by the comma following it, and also before the assignment.
Nice language sudoku, though. :-)
edit: There's a more elaborate explanation here.
I believe that the following expression definitely has undefined behaviour.
i + ((i, i++, i) + 1)
The reason is that the comma operator specifies sequence points between the subexpressions in parentheses but does not specify where in that sequence the evaluation of the left-hand operand of + occurs. One possibility is between the sequence points surrounding i++, and this violates 5/4: i is written to between two sequence points but is also read twice between the same sequence points, once to determine the value to be stored and once to determine the value of the first operand of the + operator.
This also has undefined behaviour.
i += (i, i++, i) + 1;
Now, I am not so sure about this statement.
i = (i, i++, i) + 1;
Although the same principles apply, i must be "evaluated" as a modifiable lvalue, and that can happen at any time, but I'm not convinced that its value is ever read as part of this. (Or is there another restriction that the expression violates to cause UB?)
The sub-expression (i, i++, i) happens as part of determining the value to be stored and that sub-expression contains a sequence point after the storage of a value to i. I don't see any way that this wouldn't require the side effect of i++ to be complete before the determination of the value to be stored and hence the earliest possible point that the assignment side effect could occur.
After this sequence point i's value is read at most once and only to determine the value that will be stored back to i, so this last part is fine.
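Purely as an illustration of the reading argued for here (not a ruling on whether the original expression is well defined), the steps can be unrolled into separate statements. The name comma_expression_unrolled is made up for this sketch:

```cpp
#include <cassert>

// Hypothetical unrolling of `i = (i, i++, i) + 1;` under the reading in which
// the side effect of i++ completes before the value to be stored is computed.
int comma_expression_unrolled(int i) {
    const int start = i;
    (void)i;                     // first operand of the comma: value discarded
    i++;                         // second operand: side effect completes here
    int value_to_store = i + 1;  // third operand read once, then + 1
    i = value_to_store;          // the assignment's side effect comes last
    assert(i == start + 2);
    return i;
}
```

Starting from i = 0 this returns 2.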