Let's say we create an array like:
int a[4]={1,2,3,4};
Now a is the name of this array and also a pointer that points to the first element, a[0]. So when I want to access the elements of the array, I can use a[i] or *(a+i).
Now I have a function:
void print_array(int* array, int arraySize){
    for(int i=0; i<arraySize; i++){
        cout<<*(array+i)<<endl;
        cout<<array[i]<<endl;
    }
}
When I pass a[4]={1,2,3,4} into this function using print_array(a,4), I fully understand the first cout line, because I use the *(a+i) form to access the data and a is the pointer I passed.
What I can't understand is: since I pass a pointer a into function, why can I use a in the format of a[i] like the second line of cout? Isn't a a pointer? If a is a pointer why does a[i] work?
This has confused me for a whole day. Any help will be much appreciated!
a is an array, not a pointer. They are not the same things. However, the name a can be implicitly converted to a pointer (with the value &a[0]).
For example:
int main()
{
int a[] = {1,2,3,4};
int *p = a; // p now has the value &a[0]
Now, after this partial code snippet, assuming i is an integral value, the rules of the language amount to:
a[i] is equivalent to *(a + i) which is equivalent to *(&a[0] + i)
p[i] is equivalent to *(p + i)
Now, since p is equal to &a[0] this means that a[i], *(a + i), p[i], and *(p + i) are all equivalent.
When calling print_array(a, 4), where a is the name of an array, a is ALWAYS converted to a pointer. This means print_array() is always passed a pointer. And this means *(array + i) inside print_array() is the same as a[i] in the caller.
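To make the equivalences above concrete, here is a minimal sketch. The helper names sum_via_pointer and same_element are made up for illustration and are not part of the question's code.

```cpp
#include <cassert>

// Hypothetical helper: receives a pointer and mixes both spellings.
int sum_via_pointer(const int* array, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        assert(array[i] == *(array + i)); // same element, by definition
        total += array[i];
    }
    return total;
}

// Hypothetical helper: true when all spellings reach the same object.
bool same_element(int i) {
    int a[] = {1, 2, 3, 4};
    int* p = a;                  // a is converted to &a[0]
    return &a[i] == a + i        // array subscript vs. pointer arithmetic
        && &p[i] == p + i        // pointer subscript vs. pointer arithmetic
        && &a[i] == &p[i];       // array and pointer name the same element
}
```

Calling sum_via_pointer(a, 4) with the array from the question passes &a[0], exactly as print_array(a, 4) does.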
This quote from the C++ Standard will make the point clear (5.2.1 Subscripting)
1 A postfix expression followed by an expression in square brackets is
a postfix expression. One of the expressions shall have the type
“array of T” or “pointer to T” and the other shall have unscoped
enumeration or integral type. The result is of type “T.” The type “T”
shall be a completely-defined object type. The expression E1[E2] is
identical (by definition) to *((E1)+(E2)) [Note: see 5.3 and 5.7 for
details of * and + and 8.3.4 for details of arrays. —end note], except
that in the case of an array operand, the result is an lvalue if that
operand is an lvalue and an xvalue otherwise.
Because in effect, while the subscript operator is defined on arrays, what happens is that they decay into pointers for the arithmetic to occur.
Meaning if a is an array, semantically what happens is:
int b = a[i]; => int *__tmp = a; int b = *(__tmp + i);
However, once operator overloading comes into play, then it is no longer true that a[i] == *(a + i). The right hand side may not even be defined.
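A small sketch of that last point, assuming C++17 (for std::void_t). The type Squares and the trait can_add_int are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical class: only operator[] is overloaded.
struct Squares {
    int operator[](std::size_t i) const { return static_cast<int>(i * i); }
};

// Hypothetical detection trait: is "T + int" a valid expression?
template <class T, class = void>
struct can_add_int : std::false_type {};

template <class T>
struct can_add_int<T, std::void_t<decltype(std::declval<T>() + 1)>>
    : std::true_type {};

static_assert(can_add_int<int*>::value, "pointer + int is built in");
static_assert(!can_add_int<Squares>::value, "Squares + int is not defined");

int third_square() {
    Squares s;
    // *(s + 2) would not compile here, although s[2] does.
    return s[2];
}
```

So for Squares, s[2] is fine while the "equivalent" *(s + 2) does not even exist.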
What I can't understand is: since I pass a pointer "a" into function, why can I use "a" in the format of a[i] like the second line of "cout"?
Because subscript operator a[i] is defined for arrays and it is equivalent to *(a+i) by definition.
In the line with cout, you use array[i] however, where array is a pointer. This is also allowed, because the subscript operator is also defined for pointers.
Isn't "a" a pointer?
No. a is an array. array is a pointer.
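The difference between the two names can be checked with the type system. This is only a sketch; param_size and array_size are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical function: its parameter is a pointer, whatever was passed in.
std::size_t param_size(int* array) {
    static_assert(std::is_pointer<decltype(array)>::value,
                  "the parameter is a pointer");
    return sizeof(array);           // size of one pointer
}

// Hypothetical function: here a really is an array.
std::size_t array_size() {
    int a[4] = {1, 2, 3, 4};
    static_assert(std::is_array<decltype(a)>::value, "a is an array");
    return sizeof(a);               // size of all four elements
}
```

sizeof reports the whole array for a, but only the size of a pointer for array, even when a was the argument.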
A regular statically allocated array looks like this, and its elements may be accessed using the following formulas:
const int N = 3;
const int M = 3;
int a1[N][M] = { {0,1,2}, {3,4,5}, {6,7,8} };
int x = a1[1][2]; // x = 5
int y = *(a1+2+N*1); // y = 5, this is what [] operator is doing in the background
An array is a contiguous region of memory. Things look different with dynamic allocation, where there is an array of pointers to arrays instead:
int** a2 = new int*[N];
for (int i = 0; i < N; i++)
    a2[i] = new int[M];
//Assignment of values as in previous example
int x = a2[1][2];
int y = *(*(a2+1)+2); // This is what [] operator is doing in the background, it needs to dereference pointers twice
As we can see, the operations done by the [] operator are completely different for a typical contiguous array and for a dynamically allocated array.
My questions are the following:
Is my understanding of [] operations correct?
How can a C/C++ compiler distinguish which [] operation it should perform, and where is it implemented? I can imagine implementing it myself in C++ by overloading the [] operator, but how does C/C++ treat this?
Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.
For this declaration of an array
int a1[N][M] = { {0,1,2}, {3,4,5}, {6,7,8} };
these records
int x = a1[1][2];
int y = *(a1+2+N*1);
are not equivalent.
The second one is incorrect. The expression *(a1+2+N*1) has the type int[3], which is implicitly converted to a pointer of the type int * when used as an initializer. So the integer variable y would be initialized by a pointer, which is not a valid initialization of an int.
The expression a1[1] is evaluated like *( a1 + 1 ). The result is a one-dimensional array of the type int[3].
So applying the second subscript operator you will get *( *( a1 + 1 ) + 2 ).
The difference between the expressions for the two-dimensional array and for the dynamically allocated array is that the designator of the two-dimensional array in the expression (a1 + 1) is implicitly converted to a pointer to its first element, of the type int ( * )[3], while the pointer to the dynamically allocated array of pointers keeps its type int **.
In the first case, evaluating the expression *( a1 + 1 ) yields an lvalue of the type int[3], which in turn, when used in the expression *( a1 + 1 ) + 2, is again implicitly converted to a pointer of the type int *.
In the second case, the expression *( a2 + 1 ) yields an object of the type int *.
In both cases pointer arithmetic is used. The difference is that when arrays are used with the subscript operator, they are implicitly converted to pointers to their first elements.
When you allocate arrays dynamically, you are already dealing with pointers to their first elements.
For example instead of these allocations
int** a2 = new int*[N];
for (int i = 0; i < N; i++)
    a2[i] = new int[M];
you could just write
int ( *a2 )[M] = new int[N][M];
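As a rough sketch of how that single allocation is used (the function name read_1_2 is made up, and N and M are taken from the question):

```cpp
#include <cassert>

const int N = 3;
const int M = 3;

// Hypothetical demo: fills the block with 0..8 and reads element [1][2].
int read_1_2() {
    int (*a2)[M] = new int[N][M];   // one contiguous block of N*M ints
    int v = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < M; ++j)
            a2[i][j] = v++;
    int x = a2[1][2];
    int y = *(*(a2 + 1) + 2);       // the same access written out by hand
    assert(x == y);
    delete[] a2;                    // a single delete[] matches the single new[]
    return y;
}
```

Here a2 has the type int (*)[M], so a2 + 1 steps over a whole row, just as it does for the array a1.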
Is my understanding of [] operations correct?
int y = *(a1+2+N*1); // y = 5, this is what [] operator is doing in the background
By definition, the way to translate the subscript operators to the corresponding indirection and pointer arithmetic is:
int y = *(*(a1+1)+2)
Which is exactly the same as in the case of int**.
How C/C++ compiler can distinguish which [] operation it should perform
The compiler uses the type system. It knows the types of the expressions and it knows what subscript operation means for each type.
Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.
It doesn't matter how an array is created. Subscript operator works the same way with all pointers.
a1 and a2 are different types, and as such, the behavior of operator [] will depend on how that type defines the operator. In this case you're dealing with intrinsic compiler behaviors that conform to the C++ spec, but it could just as well be a std::unique_ptr<>, or MyClass with overloaded operator[]
Each operation leads to a result of some specific type. Each type defines what kinds of operations are available for it.
Note that an array can decay to a pointer to its first element, so some_array + int_value yields a pointer to an element.
Here is code which exposes types of each step: https://godbolt.org/z/jeKWh5WWW
#include <type_traits>
const int N = 3;
const int M = 4;
int a1[N][M] = { {0,1,2,0}, {3,4,5,0}, {6,7,8,0} };
int** a2 = new int*[N];
static_assert(
std::is_same_v<decltype(a1[0][0]), int&>,
"value type is reference to int");
static_assert(
std::is_same_v<decltype(a1[0]), int(&)[M]>,
"row type is reference to int array");
static_assert(
std::is_same_v<decltype(a1 + 1), int(*)[M]>,
"advanced pointer is pointer to array of ints");
static_assert(
!std::is_same_v<decltype(a1[0]), int*&>,
"row type is not reference to int pointer");
static_assert(
std::is_same_v<decltype(a2[0][0]), int&>,
"value type is reference to int");
static_assert(
!std::is_same_v<decltype(a2[0]), int(&)[M]>,
"row type is not reference to int array");
static_assert(
std::is_same_v<decltype(a2 + 1), int**>,
"advanced pointer is pointer to pointer to int");
static_assert(
std::is_same_v<decltype(a2[0]), int*&>,
"row type is reference to int pointer");
I think this is a good appendix to the other answers.
How C/C++ compiler can distinguish which [] operation it should perform, and where it's implemented?
The built-in [] operator (that is, not a user-defined overload) always does one thing: it adds its two operands and dereferences the result. E1[E2] is defined to be (*((E1)+(E2))). Here is how this works:
If E1 or E2 is an array, it is automatically converted to a pointer to its first element. This is not a part of the [] operator per se; it is a built-in part of the C and C++ languages. In C, the specific rule is that, whenever an array is used in an expression other than as the operand of sizeof, the operand of unary &, or as a string literal used to initialize an array, it is converted to a pointer to its first element.
Thus, whether the code is written with a pointer or an array, [] always has a pointer operand. You may write an array, but [] always receives a pointer.
The + operator adds an integer to a pointer by adjusting the pointer by the given number of elements: Given a pointer to element j of an array and an integer k to add to it, it produces a pointer to element j+k of the array.
From the pointer to an element, the * operator produces an lvalue for the referenced element.
The combination of automatic array conversion, +, and *, means that A[i] produces an lvalue for element i of the array A.
Here is how this works for the expression A[i][j] where A is an array declared as SomeType A[m][n]:
In A[i][j], A is an array of m arrays of n elements. It is automatically converted to a pointer to its first element (the one with index 0).
Then A[i] produces an lvalue for element i of this array. In other words, the result of A[i] is an array; it is an array of n SomeType objects.
Since the result of A[i] is an array, it is automatically converted to a pointer to its first element.
Then A[i][j] produces an lvalue for element j of that array.
Since the pointer arithmetic operates in units of the pointed-to-type, it includes the scaling for the sizes of the elements. This is what makes the calculation of A[i] scaled by the size of the subarray of n elements.
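The scaling can be observed by measuring byte offsets. The following is only a sketch: the 3x4 int array and the name byte_offset are assumptions for illustration, and the char pointers are used purely to measure distance in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: distance in bytes from A[0][0] to A[i][j].
std::ptrdiff_t byte_offset(int i, int j) {
    static int A[3][4] = {};
    static_assert(sizeof(A[0]) == 4 * sizeof(int),
                  "each row is an array of 4 ints");
    const char* base = reinterpret_cast<const char*>(&A[0][0]);
    const char* elem = reinterpret_cast<const char*>(&A[i][j]);
    return elem - base;   // i rows of 4 ints, then j more ints
}
```

For A[1][2] the offset comes out as (1*4 + 2) * sizeof(int): the first subscript is scaled by the row size, the second by the element size.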
Will it work correctly in C language using malloc instead of new? I don't see any reasons why not actually.
Sure, if done correctly.
I know that in C++, access outside the bounds of a buffer is undefined behaviour.
Here is an example from cppreference:
int table[4] = {};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}
But I can't find the corresponding paragraph in the C++ standard.
Can anyone point me to the concrete paragraph in the standard where this case is explained?
It's undefined behavior. We can juxtapose a couple of passages to be convinced of it. First, and I won't explicitly prove it, table[4] is *(table + 4). We need only ask ourselves the properties of the pointer value table + 4 and how it relates to the requirements of the indirection operator.
On the pointer, we have this passage:
[basic.compound]
3 Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
Our pointer is of the second bullet's type, not the first. As for the indirection operator:
[expr.unary.op]
1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.
I hope it's obvious from reading this paragraph that the operation is defined for a pointer of the category described by the first bullet in the preceding paragraph.
So we apply an operation to a pointer value for which its behavior is not defined. The result is undefined behavior.
The subscript operator is defined in terms of the addition operator. The array decays to a pointer to its first element in the equivalent expression, so the rules of pointer arithmetic apply. The indirection operator is applied to the hypothetical result of the addition.
[expr.sub]
A postfix expression followed by an expression in square brackets is a postfix expression.
One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.
The result is of type “T”.
The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), ...
In the case where the array index is more than one past the last element, i.e. E2 > std::size(E1) (which isn't the case in the example program), the hypothetical pointer arithmetic itself is undefined.
[expr.add]
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value ... (does not apply)
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (does not apply when i+j > n)
Otherwise, the behavior is undefined.
In the case of E2 == std::size(E1) (which is the case in the last iteration of the example), the hypothetical result of the addition is a pointer one past the end of the array, which points outside the storage of the array. The hypothetical pointer arithmetic is well defined.
[basic.compound]
A value of a pointer type that is a pointer ... past the end of an object represents ... the first byte in memory after the end of the storage occupied by the object
Access is defined in terms of objects. But there is no object there, nor is there even storage, and thus the behaviour is not defined.
OK, there might in some cases be an unrelated object at the pointed-to memory address. The following note says that a pointer past the end is not a pointer to such an unrelated object sharing the address. I couldn't find which normative rule causes this.
[Note 2: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type, even if the unrelated object is located at that address. ...
Alternatively, we can look at the definition of indirection operator:
[expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type ... and the result is an lvalue referring to the object ... to which the expression points. ...
There is a contradiction because there is no object that could be referred to.
So, in conclusion:
int table[N] = {};
table[N] == 0; // UB, accessing non-existing object
table[N + 1]; // UB, [expr.add]
table + N; // OK, one past last element
table[N]; // ¯\_(ツ)_/¯ See CWG 232
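For contrast, here is a sketch of the cppreference function with the loop bound derived from the array itself, so that table[i] is only evaluated for indices that refer to existing elements. The rewrite is my own illustration, not taken from cppreference.

```cpp
#include <cassert>
#include <cstddef>

int table[4] = {};

// Every table[i] below stays within 0 <= i < 4, so no access is out of bounds.
bool exists_in_table(int v) {
    const std::size_t n = sizeof(table) / sizeof(table[0]);
    for (std::size_t i = 0; i < n; ++i) {
        if (table[i] == v) return true;
    }
    return false;
}
```

With this bound the function returns false for a missing value instead of running into the one-past-the-end access discussed above.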
In C++, when I have to use an array inside some function, I pass the array as an argument and get a pointer pointing to the first element of the array. While it is okay to use, and not much of a hassle to use a pointer, I was wondering if there exists some in-built header file, or any other set of instructions, by which when I want to access the i-th element of the array I can simply write array[i] and it gets read by the compiler as *(array+i)?
It would be great if one exists, since it would make it quite uniform and easy to code, since all the times when I use vectors I can access the i-th element just by vector[i] while in array I have to use it the other way *(array+i).
Also, is there some reason why the developers of C++ chose to return pointers to an array instead of the object itself?
If a is a pointer and i has an integral type, then a[i] is always the same as *(a+i). There is no need to include a header or anything to make it work.
I was wondering if there exists some in-built header file or any other set of instructions by which when I want to access the i-th element of the array I can simply write array[i] and it gets read by the compiler as *(array+i)
No, there is no such header or set of instructions, because it is part of the language.
For a pointer and an integral type, a[i] means *(a+i). This is such a strong statement that:
int base_array[3]={1,2,3}; // an array of 3 elements
int* ptr_array = base_array; // base_array "decays" to a pointer to the first element
std::cout << 2[ptr_array] << "\n"; // huh?!?!
prints 3, because *(2+ptr_array) is 3; i.e., it even works backwards.
Can we somehow use array[i] instead of *(array+i)?
Yes, we can. Those expressions are practically identical. The subscript operator is much more readable, so I recommend using that.
I was wondering if there exists some in-built header file
You don't need to include any header.
while in array i have to use it the other way *(array+i)
Just because you can, doesn't mean that you have to. You don't have to use it the other way.
P.S. Besides arrays and vectors, we can also use the subscript operator with a pointer to element of an array.
Also, is there some reason why the developers of C++ chose to return pointers to an array instead of the object itself?
Because sometimes indirection is necessary or useful.
From the C++ 14 Standard (5.2.1 Subscripting)
1 A postfix expression followed by an expression in square brackets is
a postfix expression. One of the expressions shall have the type
“array of T” or “pointer to T” and the other shall have unscoped
enumeration or integral type. The result is of type “T.” The type “T”
shall be a completely-defined object type. The expression E1[E2] is
identical (by definition) to *((E1)+(E2)) [ Note: see 5.3 and 5.7
for details of * and + and 8.3.4 for details of arrays. — end note ],
except that in the case of an array operand, the result is an lvalue
if that operand is an lvalue and an xvalue otherwise.
So the expressions array[i] and *( array + i ) are evaluated the same way whether array in these expressions is an array designator or a pointer to the first element of an array.
Moreover, the expressions array[i] and i[array] are also evaluated the same way.
Arrays are non-modifiable lvalues, so you may not return an array from a function. If you use an array designator in a return statement, it will be converted to a pointer to its first element. Being non-modifiable, arrays also have no copy assignment operator.
On the other hand, you can return a reference to an array, provided that the array outlives the function call (that is, it is not a local variable of the function with automatic storage duration).
For example
#include <cstddef>
#include <iostream>
const size_t N = 5;
// Fills the array passed by reference and returns a reference to it.
decltype( auto ) f( int ( &a )[N], int init )
{
    for ( size_t i = 0; i < N; i++ )
    {
        a[i] = init++;
    }
    return a;
}
int main()
{
    int a[N];
    decltype( auto ) ra = f( a, 0 );
    std::cout << sizeof( ra ) << '\n';
    for ( const auto &item : ra )
    {
        std::cout << item << ' ';
    }
    std::cout << '\n';
    return 0;
}
The program output is
20
0 1 2 3 4
When we use the '&' operator in scanf, do we pass the address or the value stored at that address? For instance, I don't understand how these two programs give the same result.
CODE 1
#include <stdio.h>
int main(){
    int arr[6], i, sum=0;
    for(i=0;i<6;i++){
        scanf("%d", &arr[i]);
        sum+=arr[i];
    }
    printf("%d", sum);
}
CODE 2
#include <stdio.h>
int main(){
    int arr[6], i, sum=0;
    for(i=0;i<6;i++){
        scanf("%d", (arr+i));
        sum+=*(arr+i);
    }
    printf("%d", sum);
}
scanf() requires you to pass it the address of each variable that you want it to write parsed values to. The & operator returns those addresses.
In your examples, &arr[i] and (arr+i) both represent the same address of the i'th array element, and also arr[i] and *(arr+i) both represent accesses to that same element's stored value.
& is the address-of operator, i.e. given an object, it gives you a pointer to that object.
Along with ordinary arithmetic, operator+ applied to a (pointer, integer) pair gives you a pointer to another element of the same array (so long as such an element exists), or a pointer one past the end of the array (if the value is exactly the distance to the end), or undefined behaviour (if the result would be out of range).
a[i] is defined as *(a + i). It dereferences the calculated pointer.
Putting that together, arr + i is an int* expression, whereas arr[i] is an int expression. As we need an int* for scanf, & is used to get one.
From the C Standard (6.5.2.1 Array subscripting)
2 A postfix expression followed by an expression in square brackets []
is a subscripted designation of an element of an array object. The
definition of the subscript operator [] is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).
So the expression &arr[i] is equivalent to &( *( arr + i ) ). The operators * and & applied sequentially to the expression arr + i cancel out and may be omitted, so &arr[i] is equivalent to arr + i. Both expressions are pointers to the i-th element of the array arr.
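The same point as a C++ sketch, with a plain store standing in for scanf so that no input is needed. The function name fill_and_sum and the stored values are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the scanf loop: writes through arr + i,
// reads back through arr[i].
int fill_and_sum() {
    int arr[6];
    int sum = 0;
    for (int i = 0; i < 6; ++i) {
        int* slot = arr + i;        // same address as &arr[i]
        assert(slot == &arr[i]);
        *slot = i + 1;              // where scanf("%d", arr + i) would store
        sum += arr[i];              // same value as *(arr + i)
    }
    return sum;
}
```

Both programs in the question hand scanf the same address on every iteration, which is why they behave identically.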
The C++17 standard seems to say that an integer can only be added to a pointer if the pointer is to an array element, or, as a special exception, the pointer is the result of unary operator &:
8.5.6 [expr.add] describing addition to a pointer:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined.
This quote includes a non-normative footnote:
An object that is not an array element is considered to belong to a single-element array for this purpose; see 8.5.2.1
which references 8.5.2.1 [expr.unary.op] discussing the unary & operator:
The result of the unary & operator is a pointer to its operand... For purposes of pointer arithmetic (8.5.6) and comparison (8.5.9, 8.5.10), an object that is not an array element whose address is taken in this way is considered to belong to an array with one element of type T.
The non-normative footnote seems to be slightly misleading, as the section it references describes behavior specific to the result of unary operator &. Nothing appears to permit other pointers (e.g. from non-array new) to be considered single-element arrays.
This seems to suggest:
void f(int a) {
    int* z = (new int) + 1; // undefined behavior
    int* w = &a + 1; // ok
}
Is this an oversight in the changes made for C++17? Am I missing something? Is there a reason that the "single-element array rule" is only provided specifically for unary operator &?
Note: As specified in the title, this question is specific to C++17. The C standard and prior versions of the C++ standard contained clear normative language that is no longer present. Older, vague questions like this are not relevant.
Yes, this appears to be a bug in the C++17 standard.
int* z = (new int)+1; // undefined behavior.
int* a = new int;
int* b = a+1; // undefined behavior, same reason as `z`
&*a; // seeming noop, but magically makes `*a` into an array of one element!
int* c = a+1; // defined behavior!
This is pretty ridiculous.
8.5.2.1 [expr.unary.op]
[...] an object that is not an array element whose address is taken in this way is considered to belong to an array with one element of type T
Once "blessed" by 8.5.2.1, the object is an array of one element. If you don't bless it by invoking & at least once, it has never been blessed by 8.5.2.1 and is not an array of one element.
It was fixed as a defect in C++20.
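For the uncontroversial case, where the pointer comes from unary &, a single object can be treated as a one-element range. A minimal sketch with made-up helper names:

```cpp
#include <cassert>

// Hypothetical helper: counts the ints in the half-open range [first, last).
int count_range(const int* first, const int* last) {
    int n = 0;
    for (const int* p = first; p != last; ++p) ++n;
    return n;
}

// Hypothetical demo: a lone int behaves like an array of one element.
int count_single() {
    int a = 42;
    const int* begin = &a;
    const int* end = &a + 1;   // one past the object; computed, never dereferenced
    return count_range(begin, end);
}
```

The one-past pointer is only compared against, never dereferenced, which is all the "array of one element" rule is needed for here.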