Someone told me that the behavior of this code is not defined by the C++ standard:
int i(1);
array_of_int[i] = i++;
The person said that it will assign 1, but that we cannot know whether the value ends up in array_of_int[1] or array_of_int[2], although Visual Studio and most compilers put it in array_of_int[1].
Is he correct?
This is undefined behavior. Literally any behavior is legal.
The passage that forbids that line of code is this:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored
There is no sequence point between a[i] and i++ and the read to i in a[i] is not for the purpose of determining what value is stored in i by i++.
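A minimal sketch of how the two outcomes mentioned in the question can each be written without undefined behavior, by moving the increment into its own statement. The struct Three and both function names are hypothetical, introduced only for this illustration.

```cpp
// Hypothetical wrapper so a constexpr function can return the array.
struct Three { int v[3]; };

// Intent A: store the old value of i at index "old i", then increment.
constexpr Three store_then_increment() {
    Three array_of_int{{0, 0, 0}};
    int i = 1;
    array_of_int.v[i] = i;  // writes 1 into element 1
    ++i;                    // the increment is a separate full expression
    return array_of_int;
}

// Intent B: increment first, then store the old value at index "new i".
constexpr Three increment_then_store() {
    Three array_of_int{{0, 0, 0}};
    int i = 1;
    const int old = i;        // remember the value before the increment
    ++i;
    array_of_int.v[i] = old;  // writes 1 into element 2
    return array_of_int;
}
```

Either version states the intended order explicitly, so the result no longer depends on the compiler.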
Supposing we have:
char* a;
int i;
Many introductions to C++ (like this one) suggest that the rvalues a+i and &a[i] are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here) quoted from [dcl.ref]:
in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
In other words, "binding" a reference object to a null-dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating &a[i] (within the offsetof macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that &a[i] causes undefined behavior in the case where a=null and i=0. This behavior is different from a+i (at least in C++, in the a=null, i=0 case).
This leads to at least 2 questions about the differences between a+i and &a[i]:
First, what is the underlying semantic difference between a+i and &a[i] that causes this difference in behavior. Can it be explained in terms of any kind of general principles, not just "binding a reference to a null dereference object causes undefined behavior just because this is a very specific case that everybody knows"? Is it that &a[i] might generate a memory access to a[i]? Or the spec author wasn't happy with null dereferences that day? Or something else?
Second, besides the case where a=null and i=0, are there any other cases where a+i and &a[i] behave differently? (could be covered by the first question, depending on the answer to it.)
TL;DR: a+i and &a[i] are both well-formed and produce a null pointer when a is a null pointer and i is 0, according to (the intent of) the standard, and all compilers agree.
a+i is obviously well-formed per [expr.add]/4 of the latest draft standard:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
[...]
&a[i] is tricky. Per [expr.sub]/1, a[i] is equivalent to *(a+i), thus &a[i] is equivalent to &*(a+i). The standard is not quite clear about whether &*(a+i) is well-formed when a+i is a null pointer. But as @n.m. points out in a comment, the intent as recorded in CWG 232 is to permit this case.
Since core language UB is required to be caught in a constant expression ([expr.const]/(4.6)), we can test whether compilers think these two expressions are UB.
Here's the demo. If the compilers think the constant expression in static_assert is UB, or if they think the result is not true, then they must produce a diagnostic (error or warning) per the standard:
(note that this uses single-parameter static_assert and constexpr lambda which are C++17 features, and default lambda argument which is also pretty new)
static_assert(nullptr == [](char* a=nullptr, int i=0) {
return a+i;
}());
static_assert(nullptr == [](char* a=nullptr, int i=0) {
return &a[i];
}());
From https://godbolt.org/z/hhsV4I, it seems all compilers behave uniformly in this case, producing no diagnostics at all (which surprises me a bit).
However, this is different from the offsetof case. The implementation posted in that question explicitly creates a reference (which is necessary to sidestep a user-defined operator&), and thus is subject to the requirements on references.
In the C++ standard, section [expr.sub]/1 you can read:
The expression E1[E2] is identical (by definition) to *((E1)+(E2)).
This means that &a[i] is exactly the same as &*(a+i). So you would dereference * a pointer first and get the address & second. In case the pointer is invalid (i.e. nullptr, but also out of range), this is UB.
a+i is based on pointer arithmetic. At first it looks less dangerous since there is no dereferencing that would be UB for sure. However, it may also be UB (see [expr.add]/4):
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤
n; otherwise, the behavior is undefined. Likewise, the expression P -
J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j
≤ n; otherwise, the behavior is undefined.
So, while the semantics behind these two expression are slightly different, I would say that the result is the same in the end.
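As a small sketch of the "same result in the end" claim, the function below compares a+i with &a[i] for every valid index of a real array. It deliberately stays away from null pointers and out-of-range indices, which are the disputed cases. The function name same_element_address is made up for this example.

```cpp
// Returns true if a + i and &a[i] name the same element for all valid i.
// Only in-range indices of an actual array are used here.
constexpr bool same_element_address() {
    int x[4] = {10, 20, 30, 40};
    int* a = x;
    for (int i = 0; i < 4; ++i) {
        if (a + i != &a[i]) {
            return false;
        }
    }
    return true;
}
```

Because the function is constexpr, a compiler evaluating it in a constant expression would also have to reject any core-language undefined behavior inside it.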
int a[]={1,2,3,5};
int i=1;
a[++i]=a[i];
int j;
for(j=0;j<4;j++)
{
printf("%d",a[j]);
}
output:1235;
Why is the output 1225 and not 1335?
I executed this program in Code::Blocks. In a[++i]=a[i], assignment goes right to left, leading to a[2]=a[1]. Correct me if I am wrong.
Because a[++i]=a[i]; is undefined behavior.
A sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete. The sequence points listed in the C standard are:
at the end of the evaluation of a full expression (a full
expression is an expression statement, or any other expression which
is not a subexpression within any larger expression);
at the ||, &&, ?:, and comma operators; and
at a function call (after the evaluation of all the arguments, and just before the actual call).
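To illustrate the list above, here is a short sketch in which an operator that introduces a sequence point separates the increment from the later read, so each expression has a single well-defined result. The function names are hypothetical.

```cpp
// Comma operator: i++ is complete before the right operand reads i.
constexpr int comma_example() {
    int i = 1;
    return (i++, i);  // i is 2 when it is read
}

// Logical AND: the left operand is complete before the right one runs.
constexpr bool and_example() {
    int i = 1;
    return i++ && i == 2;  // left yields 1 (true), then i == 2 holds
}

// Conditional operator: the condition is complete before a branch runs.
constexpr int conditional_example() {
    int i = 0;
    return i++ ? -1 : i;  // condition yields 0, so the branch reads i == 1
}
```

Without such an operator between the modification and the read, as in a[++i]=a[i], no ordering is guaranteed.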
The standard states that
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored.
a[++i]=a[i]; // this is undefined
If you only want to change a single element of the array, do it by referencing it directly:
int a[]={1,2,3,5};
int i=1;
a[i]++; // this will increment the ith element of the array by 1
int j;
for(j=0;j<4;j++)
{
printf("%d",a[j]);
}
Output:
1335
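If the goal really was to copy between elements, a sketch like the following spells out each possible order as separate statements. The digit strings 1225 and 1235 from the question correspond to the two functions; the struct Four and the function names are assumptions made for this example.

```cpp
// Hypothetical wrapper so a constexpr function can return the array.
struct Four { int v[4]; };

// Order A: read a[i] with the old i, advance i, then store. Gives 1 2 2 5.
constexpr Four copy_old_element_forward() {
    Four a{{1, 2, 3, 5}};
    int i = 1;
    const int value = a.v[i];  // reads a[1]
    ++i;
    a.v[i] = value;            // writes a[2]
    return a;
}

// Order B: advance i first, then a[i] = a[i]. The array stays 1 2 3 5.
constexpr Four advance_then_self_assign() {
    Four a{{1, 2, 3, 5}};
    int i = 1;
    ++i;
    a.v[i] = a.v[i];           // reads and writes a[2]
    return a;
}
```

Both versions are well defined because the increment of i is its own statement.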
a[++i]=a[i]; is undefined behavior, because according to C99 section 6.5 paragraph 2:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.72) Furthermore, the prior value shall be read only to
determine the value to be stored.73)
= is not a sequence point. Check annex C.
You modify i only once, but "the prior value shall be read only to determine the value to be stored" is violated: a[++i] modifies i, while a[i] on the right-hand side reads i for a purpose other than computing the new value of i.
Check Footnote 73) for an example of what the paragraph says.
73) This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
Therefore the outcome cannot be determined. You can get different results on different runs and on different computers. Expressions of this kind should not be used in C programming.
a[++i] = a[i] is undefined behavior. Look up this presentation.
What, if anything, is theoretically wrong with this C/C++ statement:
*memory++ = BIT_MASK & *memory;
Where BIT_MASK is an arbitrary bitwise AND mask, and memory is a pointer.
The intent was to read a memory location, AND the value with the mask, store the result at the original location, then finally increment the pointer to point to the next memory location.
You are invoking undefined behaviour because you reference memory twice (once for reading, once for writing) in a single statement without an intervening sequence point, and the language standards do not specify when the increment will occur. (You can read the same memory multiple times; the troubles occur when you try to mix some writing in with the reading - as in your example.)
You can use:
*memory++ &= BIT_MASK;
to achieve what you want to achieve without incurring undefined behaviour.
In the C standard (ISO/IEC 9899:1999 aka C99), §6.5 'Expressions', ¶2 says
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.70)
That's the primary source in the C standard. The footnote says:
This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
In addition, 'Annex C (informative) Sequence Points' has an extensive discussion of all this.
You would find similar wording in the C++ standard, though I'm not sure it has an analogue to 'Annex C'.
It's undefined behavior since you have memory++ and memory in the same statement.
This is because C/C++ does not specify exactly when the ++ will occur. It can be before or after *memory is evaluated.
Here are two ways to fix it:
*memory = BIT_MASK & *memory;
memory++;
or just simply:
*memory++ &= BIT_MASK;
Take your pick.
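A short sketch of the *memory++ &= BIT_MASK; form applied to a small buffer. The mask value 0x0F, the struct Words, and the function name mask_all are all assumptions chosen for the example; the question leaves the mask arbitrary.

```cpp
// Assumed mask value for illustration only.
constexpr unsigned BIT_MASK = 0x0Fu;

// Hypothetical wrapper so a constexpr function can return the buffer.
struct Words { unsigned v[3]; };

// Masks every word in place, advancing the pointer after each store.
constexpr Words mask_all(Words buffer) {
    unsigned* memory = buffer.v;
    for (int n = 0; n < 3; ++n) {
        // memory is modified exactly once here, and its old value is
        // used only as the operand of ++ (and as the address to update).
        *memory++ &= BIT_MASK;
    }
    return buffer;
}
```

For example, mask_all(Words{{0xFFu, 0x12u, 0xF0u}}) yields the words 0x0F, 0x02 and 0x00.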
According to the C++ standard,
i = 3;
i = i++;
will result in undefined behavior.
We use the term "undefined behavior" if it can lead to more than one result. But here, the final value of i will be 4 no matter what the order of evaluation, so shouldn't this really be called "unspecified behavior"?
The phrase, "…the final value of i will be 4 no matter what the order of evaluation…" is incorrect. The compiler could emit the equivalent of this:
i = 3;
int tmp = i;
++i;
i = tmp;
or this:
i = 3;
++i;
i = i - 1;
or this:
i = 3;
i = i;
++i;
As to the definitions of terms, if the answer was guaranteed to be 4, that wouldn't be unspecified or undefined behavior, it would be defined behavior.
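The three possible translations listed above can be written as ordinary well-defined functions to see that they do not agree on the final value. This is only a sketch; the function names are invented here, and a real compiler is not limited to these three choices.

```cpp
// First translation: save the old value, increment, then assign the saved value.
constexpr int lowering_save_increment_assign() {
    int i = 3;
    int tmp = i;
    ++i;
    i = tmp;
    return i;  // 3
}

// Second translation: increment, then assign "current value minus one".
constexpr int lowering_increment_then_subtract() {
    int i = 3;
    ++i;
    i = i - 1;
    return i;  // 3
}

// Third translation: assign first, then increment.
constexpr int lowering_assign_then_increment() {
    int i = 3;
    i = i;  // self-assignment, kept to mirror the listing above
    ++i;
    return i;  // 4
}
```

Two of the three leave i at 3 and one leaves it at 4, which already contradicts the claim that the result is 4 regardless of order.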
As it stands, it is undefined behaviour according to the standard (Wikipedia), so it's even free to do this:
i = 3;
system("sudo rm -rf /"); // DO NOT TRY THIS AT HOME … OR AT WORK … OR ANYWHERE.
No, we don't use the term "undefined behavior" when it can simply lead to more than one arithmetical result. When the behavior is limited to different arithmetical results (or, more generally, to some set of predictable results), it is typically referred to as unspecified behavior.
Undefined behavior means completely unpredictable and unlimited consequences, like formatting the hard drive on your computer or simply making your program crash. And i = i++ is undefined behavior.
Where you got the idea that i should be 4 in this case is not clear. There's absolutely nothing in C++ language that would let you come to that conclusion.
In C and also in C++, the order of any operation between two sequence points is completely up to the compiler and cannot be depended on. The standard defines a list of things that make up sequence points; from memory, these are
the semicolon after a statement
the comma operator
evaluation of all function arguments before the call to the function
the && and || operators
Looking at the page on Wikipedia, the list there is more complete and goes into more detail. Sequence points are an extremely important concept, and if you do not already know what they are, you will benefit greatly from learning about them right away.
1.
No, the result will be different depending on the order of evaluation. There is no evaluation boundary between the increment and the assignment, so the increment can be performed before or after the assignment. Consider this behaviour:
load i into CX
copy CX to DX
increase DX
store DX in i
store CX in i
The result is that i contains 3, not 4.
As a comparison, in C# there is an evaluation boundary between the evaluation of the expression and the assignment, so the result will always be 3.
2.
Even if the exact behaviour isn't specified, the specification is very clear on what it covers and what it doesn't cover. The behaviour is specified as undefined, it's not unspecified.
i=, and i++ are both side effects that modify i.
i++ does not imply that i is only incremented after the entire statement is evaluated, merely that the current value of i has been read.
As such, the assignment, and the increment, could happen in any order.
This question is old, but still appears to be referenced frequently, so it deserves a new answer in light of changes to the standard, from C++17.
[expr.ass] Subclause 1 explains
... the assignment is sequenced after the value computation of the right and left operands ...
and
The right operand is sequenced before the left operand.
The implication here is that the side-effects of the right operand are sequenced before the assignment, which means that the expression is not addressed by the provision in [basic.exec] Subclause 10:
If a side effect on a memory location ([intro.memory]) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent ([intro.multithread]), the behavior is undefined
The behavior is defined, as explained in the example which immediately follows.
See also: What made i = i++ + 1; legal in C++17?
To answer your questions:
I think "undefined behavior" means that the compiler/language implementer is free to do whatever it thinks best, not that it could lead to more than one result.
Because it's not unspecified. It's clearly specified that its behavior is undefined.
It's not worth it to type i=i++ when you could simply type i++.
I saw such a question in an OCAJP practice test.
IntelliJ's IDEA decompiler turns this
public static int iplus(){
int i=0;
return i=i++;
}
into this
public static int iplus() {
int i = 0;
byte var10000 = i;
int var1 = i + 1;
return var10000;
}
Create JAR from module, then import as library & inspect.
What is the difference between i = ++i; and ++i; where i is an integer with value 10?
As far as I can tell, both do the same job of incrementing i, i.e. after either expression completes, i = 11.
i = ++i; invokes Undefined Behaviour whereas ++i; does not.
C++03 [Section 5/4] says: Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
In i = ++i, i is being modified twice (pre-increment and assignment) without any intervening sequence point, so the behaviour is undefined in C as well as in C++.
However, i = ++i is well defined in C++0x :)
Writing i = ++i; writes to variable i twice (once for the increment, once for the assignment) without a sequence point between the two. This, according to the C language standard, causes undefined behavior.
This means the compiler is free to implement i = ++i as identical to i = i + 1, as i = i + 2 (this actually makes sense in certain pipeline- and cache-related circumstances), or as format C:\ (silly, but technically allowed by the standard).
i = ++i will often, but not necessarily, give the result of
i = i;
i +1;
which gives i = 10
As pointed out by the comments, this is undefined behaviour and should never be relied on
while ++i will ALWAYS give
i = i+1;
which gives i = 11;
And is therefore the correct way of doing it
If i is of scalar type, then i = ++i is UB, and ++i is equivalent to i+=1.
If i is of class type and there's an operator++ overloaded for that class, then
i = ++i is equivalent to i.operator=(operator++(i)), which is NOT UB, and ++i just executes the ++ operator, with whichever semantics you put in it.
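A minimal sketch of the class-type case, assuming a made-up Counter type with member operators (so the call is i.operator=(i.operator++()) rather than the non-member spelling used above). Because both operators are function calls, the increment has finished before the assignment runs.

```cpp
// Hypothetical class type used only for this illustration.
struct Counter {
    int value;

    // Pre-increment as a member function; returns the updated object.
    constexpr Counter& operator++() {
        ++value;
        return *this;
    }

    // User-written assignment, to make the function call explicit.
    constexpr Counter& operator=(const Counter& other) {
        value = other.value;
        return *this;
    }
};

// c = ++c is c.operator=(c.operator++()): the inner call completes
// before the outer call starts, so the result is well defined.
constexpr int assign_preincrement() {
    Counter c{10};
    c = ++c;
    return c.value;
}
```

With these semantics the result is 11, the same as a plain ++c; other overloads could of course give the expression a different meaning.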
The result for the first one is undefined.
These expressions are related to sequence points and, most importantly, the first one results in undefined behavior.