So I've recently come across something that isn't really intuitive to me and got me a little confused. If I allocate an array on the heap like this:
uint32_t* Array = new uint32_t[5];
and then try to add a certain number of bytes to the array pointer like this:
Array + 3
the resulting address is advanced by sizeof(uint32_t) * 3 bytes instead of 3 bytes.
Why is this being done?
Additive operators (§7.6.6/4) [expr.add]/4:
When an expression J that has integral type is added to or subtracted
from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the
result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n.
Otherwise, the behavior is undefined.
Subscripting (§7.6.1.1/1) [expr.sub]/1
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)),
except that in the case of an array operand, the result is an lvalue
if that operand is an lvalue and an xvalue otherwise. The expression
E1 is sequenced before the expression E2.
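To make the element-wise rule in the quoted paragraphs concrete, here is a minimal sketch. The helper names byte_distance and same_element are made up for illustration; the byte view through unsigned char* is the usual way to observe the address delta and is assumed to behave as on mainstream platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes between p and p + n.
std::ptrdiff_t byte_distance(const std::uint32_t* p, std::ptrdiff_t n)
{
    const std::uint32_t* q = p + n;  // advances by n elements, not n bytes
    // Viewing both pointers as byte pointers exposes the address delta.
    return reinterpret_cast<const unsigned char*>(q)
         - reinterpret_cast<const unsigned char*>(p);
}

// Hypothetical helper: true if p + n and &p[n] name the same element.
bool same_element(const std::uint32_t* p, std::ptrdiff_t n)
{
    return p + n == &p[n];
}
```

With a five-element uint32_t array, byte_distance(Array, 3) should come out as 3 * sizeof(uint32_t), which is exactly what makes Array[3] and *(Array + 3) interchangeable.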
I know that in C++ accessing a buffer out of bounds is undefined behaviour.
Here is an example from cppreference:
int table[4] = {};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
But I can't find the corresponding paragraph in the C++ standard.
Can anyone point me to the concrete paragraph in the standard where this case is explained?
It's undefined behavior. We can juxtapose a couple of passages to be convinced of it. First, and I won't explicitly prove it, table[4] is *(table + 4). We need only ask ourselves the properties of the pointer value table + 4 and how it relates to the requirements of the indirection operator.
On the pointer, we have this passage:
[basic.compound]
3 Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
Our pointer is of the second bullet's type, not the first. As for the indirection operator:
[expr.unary.op]
1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.
I hope it's obvious from reading this paragraph that the operation is defined only for a pointer of the category described by the first bullet in the preceding paragraph.
So we apply an operation to a pointer value for which its behavior is not defined. The result is undefined behavior.
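For comparison, here is a sketch of the same lookup with the bound corrected so that table[4] is never evaluated. Using std::size (C++17) for the bound is my choice here, not something the cppreference example does.

```cpp
#include <cstddef>
#include <iterator>

int table[4] = {};

bool exists_in_table(int v)
{
    // std::size(table) is 4, so i takes the values 0..3 only
    // and the loop never indirects through the past-the-end pointer.
    for (std::size_t i = 0; i < std::size(table); ++i) {
        if (table[i] == v) return true;
    }
    return false;
}
```

Every iteration now applies * to a pointer of the first category (a pointer to an object), so the function simply returns false for a value that is not in the table.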
The subscript operator is defined in terms of the addition operator. The array decays to a pointer to its first element in this identical expression, so the rules of pointer arithmetic apply. The indirection operator is then applied to the hypothetical result of the addition.
[expr.sub]
A postfix expression followed by an expression in square brackets is a postfix expression.
One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.
The result is of type “T”.
The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), ...
In the case where the array index is more than one past the last element, i.e. E2 > std::size(E1) (which isn't the case in the example program), the hypothetical pointer arithmetic itself is undefined.
[expr.add]
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value ... (does not apply)
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (does not apply when i+j > n)
Otherwise, the behavior is undefined.
In the case of E2 == std::size(E1) (which is the case in the last iteration of the example), the hypothetical result of the addition is a pointer one past the end of the array, which points outside the storage of the array. The hypothetical pointer arithmetic is well defined.
[basic.compound]
A value of a pointer type that is a pointer ... past the end of an object represents ... the first byte in memory after the end of the storage occupied by the object
Access is defined in terms of objects. But there is no object there, nor is there even storage, and thus the behaviour has no definition.
OK, in some cases there might be an unrelated object at the pointed-to memory address. The following note says that a pointer past the end is not a pointer to such an unrelated object sharing the address. I couldn't find which normative rule causes this.
[Note 2: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type, even if the unrelated object is located at that address. ...
Alternatively, we can look at the definition of indirection operator:
[expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type ... and the result is an lvalue referring to the object ... to which the expression points. ...
There is a contradiction because there is no object that could be referred to.
So, in conclusion:
int table[N] = {};
table[N] == 0; // UB, accessing non-existing object
table[N + 1]; // UB, [expr.add]
table + N; // OK, one past last element
table[N]; // ¯\_(ツ)_/¯ See CWG 232
In the standard's specification for pointer arithmetic ([expr.add]/4.2), we have:
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
What does "possibly-hypothetical" mean here? The passage already constrains the resulting pointer to be in range of the array. Well, including the one-past-the-end slot. Is that what it's referring to?
Yes, it's the one-past-the-end "element".
[basic.compound]/3: [..] For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T. [..]
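As an illustration of why the hypothetical element n is useful, here is a small sketch in which the past-the-end pointer serves as the end of a range. The names N, contains and length are invented for this example.

```cpp
#include <algorithm>
#include <cstddef>

constexpr std::size_t N = 4;  // illustrative array length

// Hypothetical helper: searches x using a [first, last) pointer range.
bool contains(const int (&x)[N], int v)
{
    const int* first = x;      // points to x[0]
    const int* last  = x + N;  // points to the hypothetical element x[N]; never dereferenced
    return std::find(first, last, v) != last;
}

// Hypothetical helper: pointer subtraction may involve the past-the-end pointer.
std::ptrdiff_t length(const int (&x)[N])
{
    return (x + N) - x;
}
```

Forming x + N, comparing against it and subtracting from it are all fine; only applying * to it would be the problem discussed above.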
Are *variable[0] and variable[0][0] the same thing?
The first one is a pointer to the first element of an array. And the second one is the first element of an array which is pointed by the first element of the pointed array. Are they pointing to the same element?
According to the C Standard (6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
And (6.3.2.1 Lvalues, arrays, and function designators)
3 Except when it is the operand of the sizeof operator or the unary &
operator, or is a string literal used to initialize an array, an
expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue. If the array object
has register storage class, the behavior is undefined.
This expression
variable[0]
yields an array. When the unary operator * is applied to it, the array is converted to a pointer to its first element. So
*variable[0] is equivalent to variable[0][0]
On the other hand according to the first quote the expression
variable[0][0] is equivalent to the expression *( variable[0] + 0 ) that in turn is equivalent to *( variable[0] ) or just *variable[0]
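A quick way to convince yourself is a sketch like the following. The 2x3 array is a made-up stand-in for variable, since the question does not show its declaration, and the code is written as C++ although the quotes above are from the C standard (the rule is the same here).

```cpp
// Hypothetical 2x3 array standing in for `variable`.
int variable[2][3] = {{1, 2, 3}, {4, 5, 6}};

// Both expressions designate the very same int object.
bool same_object()
{
    // variable[0] has type int[3]; as the operand of unary * it decays to int*.
    return &*variable[0] == &variable[0][0];
}

int first_via_deref()     { return *variable[0]; }
int first_via_subscript() { return variable[0][0]; }
```

Under these assumptions same_object() holds and both accessors return the first element of the first row.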
Let's say we create an array like:
int a[4]={1,2,3,4};
Now a is the name of this array and also a pointer to the first element a[0]. So when I want to access the elements of the array, I can use a[ i ] or *(a+i).
Now I have a function:
void print_array(int* array, int arraySize){
    for(int i=0; i<arraySize; i++){
        cout<<*(array+i)<<endl;
        cout<<array[i]<<endl;
    }
}
When I pass a[4]={1,2,3,4} into this function using print_array(a,4), for the first line of cout, I fully understand because I use *(a+i) method to access data and a is the pointer I passed.
What I can't understand is: since I pass a pointer a into function, why can I use a in the format of a[i] like the second line of cout? Isn't a a pointer? If a is a pointer why does a[i] work?
This has confused me for a whole day. Any help will be much appreciated!
a is an array, not a pointer. They are not the same things. However, the name a can be implicitly converted to a pointer (with the value &a[0]).
For example:
int main()
{
int a[] = {1,2,3,4};
int *p = a; // p now has the value &a[0]
Now, after this partial code snippet, assuming i is an integral value, the rules of the language amount to:
a[i] is equivalent to *(a + i) which is equivalent to *(&a[0] + i)
p[i] is equivalent to *(p + i)
Now, since p is equal to &a[0] this means that a[i], *(a + i), p[i], and *(p + i) are all equivalent.
When calling print_array(a, 4) where a is the name of an array, a is ALWAYS converted to a pointer. This means print_array() is always passed a pointer. And this means *(array + i) inside print_array() is the same as a[i] in the caller.
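The points above can be sketched roughly as follows. The function names sum_deref, sum_subscript and array_is_not_pointer are invented for this illustration; summing instead of printing just makes the result easy to check.

```cpp
#include <cstddef>

// Reads the elements through explicit pointer arithmetic and indirection.
int sum_deref(const int* array, std::size_t n)
{
    int s = 0;
    for (std::size_t i = 0; i < n; ++i) s += *(array + i);
    return s;
}

// Reads the same elements with the subscript operator applied to a pointer.
int sum_subscript(const int* array, std::size_t n)
{
    int s = 0;
    for (std::size_t i = 0; i < n; ++i) s += array[i];
    return s;
}

// An array is not a pointer: sizeof sees the whole array, not one pointer.
bool array_is_not_pointer()
{
    int a[] = {1, 2, 3, 4};
    int* p = a;  // array-to-pointer conversion: p == &a[0]
    return sizeof(a) == 4 * sizeof(int)
        && sizeof(p) == sizeof(int*)
        && p == &a[0];
}
```

Calling sum_deref(a, 4) and sum_subscript(a, 4) on the array from the question should give the same total, because inside both functions the parameter is a pointer either way.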
This quote from the C++ Standard will make the point clear (5.2.1 Subscripting)
1 A postfix expression followed by an expression in square brackets is
a postfix expression. One of the expressions shall have the type
“array of T” or “pointer to T” and the other shall have unscoped
enumeration or integral type. The result is of type “T.” The type “T”
shall be a completely-defined object type. The expression E1[E2] is
identical (by definition) to *((E1)+(E2)) [Note: see 5.3 and 5.7 for
details of * and + and 8.3.4 for details of arrays. —end note], except
that in the case of an array operand, the result is an lvalue if that
operand is an lvalue and an xvalue otherwise.
Because in effect, while the subscript operator is defined on arrays, what happens is that they decay into pointers for the arithmetic to occur.
Meaning if a is an array, semantically what happens is:
int b = a[i]; => int *__tmp = a; int b = *(__tmp + i);
However, once operator overloading comes into play, then it is no longer true that a[i] == *(a + i). The right hand side may not even be defined.
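A minimal sketch of that last point, using a made-up Wrapper type (not from the question) that overloads the subscript operator but declares no operator+:

```cpp
#include <cstddef>

// Hypothetical class type with an overloaded subscript operator.
struct Wrapper {
    int data[4];
    int& operator[](std::size_t i) { return data[i]; }
};

int third_element()
{
    Wrapper w{{10, 20, 30, 40}};
    // w[2] calls Wrapper::operator[].
    // *(w + 2) would not compile: no operator+ is declared for Wrapper.
    return w[2];
}
```

So the identity between E1[E2] and *((E1)+(E2)) is a property of the built-in operator on arrays and pointers, not of user-defined operator[].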
What I can't understand is: since I pass a pointer "a" into function, why can I use "a" in the format of a[i] like the second line of "cout"?
Because subscript operator a[i] is defined for arrays and it is equivalent to *(a+i) by definition.
In the line with cout, you use array[i] however, where array is a pointer. This is also allowed, because the subscript operator is also defined for pointers.
Isn't "a" a pointer?
No. a is an array. array is a pointer.
Given the code:
int arr[] = {11,22,33,44,55};
for(int i = 0; i <5 ; i++)
    cout << *(arr+i) << " ";
Does *(arr+i) have the same effect as arr[i]?
Yes. In fact, the subscript operator E1[E2] is defined as equivalent to *((E1)+(E2)):
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “pointer to T” and the other shall have unscoped enumeration or integral type. The result is an lvalue of type “T.” The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)).
Yes. In this expression the array decays to a pointer to its first element. So
*(arr +i)
is equivalent to:
arr[i]