C/C++: Is this undefined behavior? (2D arrays) - c++

Is it undefined behavior if I go through the elements of a 2D array in the following manner?
int v[5][5], i;
for (i = 0; i < 5*5; ++i) {
v[i] = i;
}
Then again, does it even compile? (I can't try it right now, I'm not at home.) If it doesn't, then imagine I somehow acquired a pointer to the first element and using taht instead of v[i].

Accessing elements of a multidimensional array from a pointer to the first element is Undefined Behavior (UB) for the elements that are not part of the first array.
Given T array[n], array[i] is a straight trip to UB-land for all i >= n. Even when T is U[m]. Even if it's through a pointer. It's true there are strong requirements on arrays (e.g. sizeof(int[N]) == N*sizeof(int)), as mentioned by others, but no exception is explicitly made so nothing can be done about it.
I don't have an official reference because as far as I can tell the C++ standard leaves the details to the C89 standard and I'm not familiar with either the C89 or C99 standard. Instead I have a reference to the comp.lang.c FAQ:
[...] according to an official interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.

It will not compile.
The more of less equivalent
int v[5][5], *vv, i;
vv = &v[0][0];
for (i = 0; i < 5*5; ++i) {
vv[i] = i;
}
and
int v[5][5], i;
for (i = 0; i < 5*5; ++i) {
v[0][i] = i;
}
will compile. I'm not sure if they are UB or not (and it could in fact be different between C90, C99 and C++; aliasing is a tricky area). I'll try to find references one way or the other.

It is really quite hard to find any reference in the standard explicitly stating that this is undefined behavior. Sure, the standard clearly states (C99 6.5.6 §8-9) that if you do pointer arithmetics beyond the array, it is UB. The question then is, what is the definition of an array?
If a multi-dimensional array is regarded as an array of array objects, then it is UB. But if it is regarded as one array with multiple dimensions, the code would be perfectly fine.
There is an interesting note of another undefined behavior in Annex J of the standard:
An array subscript is out of range,
even if an object is apparently
accessible with the given subscript
(as in the lvalue expression a[1][7]
given the declaration int a[4][5])
(6.5.6).
This insinuates that accessing a multi-dimensional array out of the range of the 1st dimension is undefined behavior. However, the annex is not normative text, and 6.5.6 is quite vauge.
Perhaps someone can find a clear definition of the difference between an array object and a multi-dimensional array? Until then, I am not convinced that this is UB.
EDIT: Forgot to mention that v[i] is certainly not valid C syntax. As per 6.5.2.1, v[i] is equivalent to *(v+i), which is an array pointer and not an array element. What I am not certain about is whether accessing it as v[0][too_large_value] is UB or not.

Here v[i] stands for integer array of 5 elements..
and an integer array is referenced by an address location which depending on your 'c' compiler could be 16 bits, 32 bits...
so v[i] = i may compile in some compilers.... but it definitely won't yield the result u are looking for.
Answer by sharptooth is correct v[i][j] = i... is one of the easiest and readable solution..
other could be
int *ptr;
ptr = v;
now u can iterate over this ptr to assign the values
for (i = 0; i < 5*5; i++, ptr++) {
*ptr = i;
}

This will not compile.
You will get the following error for the line:
v[i] = i;
error: incompatible types in assignment of ‘int’ to ‘int [5]’
To give an answer taken from a similar question at:
http://www.velocityreviews.com/forums/t318379-incompatible-types-in-assignment.html
v is a 2D array. Since you are only referencing one dimension, what you end up getting is a char pointer to the underlying array, and hence this statement is trying to assign a char constant to a char pointer.
You can either use double quotes to change the constant to a C-style string or you can explicitly reference v[i][0] which is what I assume you intended.

Related

Why is pointer/reference to array allowed to not specify the array size? [duplicate]

This question already has an answer here:
Call Parameter as Reference to Array of Unknown Bound in C++
(1 answer)
Closed 8 months ago.
This may be a bad question as I'm really new to the concept of statically-typed language and arrays in C++ and may have misunderstood these concepts...
I know that C++ is a statically-typed language, in my understanding, this implies that all types must be known at compile-time.
Also, I know that the number of elements in an array is part of the array's type, and thus the dimension must be known at compile time.
int a = 10;
int arr[a]; // error
However, I noticed that we could have:
int (*p)[];
void fun(int(&r)[]) { }
Why can p and r not specify the size of the array?
My humble guess is that, for the above cases, the type of p is just int(*)[] and the type of r is just int(&)[], that is, for pointer/reference to array, it is allowed to have type which does not include the size of the array. (At first I was tending to think of p and r being in some kind of state where their types are to be "completed" later, but then I realized that if so, C++ would not be statically-typed)
My I ask if my guess is correct?
It seems really quirky to me, since we can do (in C++20):
int a[10];
int b[100];
p = &a;
p = &b;
fun(a);
fun(b);
Why would C++ allow such types (and conversion from the types with array size to them)? When are these "incomplete" type useful?
Everything you said is correct.
In addition to the regular array types T[N] there are "arrays of unknown bound" T[].
Those types are "incomplete", meaning objects of those types can't be created, sizeof() can't be applied to them, etc. Attempting to create an array of unknown bound requires you to provide an initializer, and transforms the type into a regular array, based on the number of elements in the initializer.
References and pointers to arrays of unknown bound are allowed to point to regular arrays (and that's the only thing they could point to). This is a C++20 change.
Those are indeed quirky. It's not immediately clear to me why they would be useful. They allow you to have a function that accepts a reference/pointer to any "array of T" without adding the array size as a template parameter, and without also accepting "pointers to T".
Yes these are legal types explicitly allowed by the C++ standard. Such types are useful in about the same circumstances where bare pointers that point to array elements are useful. In both cases, there is an array, but the type of the pointer/reference does not reflect the size of the array. Here are two silly examples:
// use a pointer to an array element
int sum(int* array, int size) {
int ret = 0;
for (int i = 0; i < size; ++i) {
ret += array[i];
}
return ret;
}
// use a reference to an array
int sum(int (&array)[], int size) {
int ret = 0;
for (int i = 0; i < size; ++i) {
ret += array[i];
}
return ret;
}

Is flattening a multi-dimensional array with a cast UB [duplicate]

Consider the following code:
int a[25][80];
a[0][1234] = 56;
int* p = &a[0][0];
p[1234] = 56;
Does the second line invoke undefined behavior? How about the fourth line?
Both lines do result in undefined behavior.
Subscripting is interpreted as pointer addition followed by an indirection, that is, a[0][1234]/p[1234] is equivalent to *(a[0] + 1234)/*(p + 1234). According to [expr.add]/4 (here I quote the newest draft, while for the time OP is proposed, you can refer to this comment, and the conclusion is the same):
If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.
since a[0](decayed to a pointer to a[0][0])/p points to an element of a[0] (as an array), and a[0] only has size 80, the behavior is undefined.
As Language Lawyer pointed out in the comment, the following program does not compile.
constexpr int f(const int (&a)[2][3])
{
auto p = &a[0][0];
return p[3];
}
int main()
{
constexpr int a[2][3] = { 1, 2, 3, 4, 5, 6, };
constexpr int i = f(a);
}
The compiler detected such undefined behaviors when it appears in a constant expression.
It's up to interpretation. While the contiguity requirements of arrays don't leave much to the imagination in terms of how to layout a multidimensional arrays (this has been pointed out before), notice that when you're doing p[1234] you're indexing the 1234th element of the zeroth row of only 80 columns. Some interpret the only valid indices to be 0..79 (&p[80] being a special case).
Information from the C FAQ which is the collected wisdom of Usenet on matters relevant to C. (I do not think C and C++ differ on that matter and that this is very much relevant.)
In the language the Standard was written to describe, there would be no problem with invoking a function like:
void print_array(double *d, int rows, int cols)
{
int r,c;
for (r = 0; r < rows; r++)
{
printf("%4d: ", r);
for (c = 0; c < cols; c++)
printf("%10.4f ", d[r*cols+c]);
printf("\n");
}
}
on a double[10][4], or a double[50][40], or any other size, provided that the total number of elements in the array was less than rows*cols. Indeed, the guarantee that the row stride of a T[R][C] would equal C * sizeof (T) was designed among other things to make it possible to write code that could work with arbitrarily-sized multi-dimensional arrays.
On the other hand, the authors of the Standard recognized that when implementations are given something like:
double d[10][10];
double test(int i)
{
d[1][0] = 1.0;
d[0][i] = 2.0;
return d[1][0];
}
allowing them to generate code that would assume that d[1][0] would still hold 1.0 when the return executes, or allowing them to generate code that would trap if i is greater than 10, would allow them to be more suitable for some purposes than requiring that they silently return 2.0 if invoked with i==10.
Nothing in the Standard makes any distinction between those scenarios. While it would have been possible for the Standard to have included rules that would say that the second example invokes UB if i >= 10 without affecting the first example (e.g. say that applying [N] to an array doesn't cause it to decay to a pointer, but instead yields the Nth element, which must exist in that array), the Standard instead relies upon the fact that implementations are allowed to behave in useful fashion even when not required to do so, and compiler writers should presumably be capable of recognizing situations like the first example when doing so would benefit their customers.
Since the Standard never sought to fully define everything that programmers would need to do with arrays, it should not be looked to for guidance as to what constructs quality implementations should support.
Your compiler will throw a bunch of warnings/errors because of subscript out of range (line 2) and incompatioble types (line 3), but as long as the actual variable (int in this case) is one of the intrinsic base-types this is save to do in C and C++.
(If the variable is a class/struct it will probably still work in C, but in C++ all bets are off.)
Why you would want to do this....
For the 1st variant: If your code relies on this sort of messing about it will be error-prone and hard to maintain in the long run.
I can see a some use for the second variant when performance optimizing loops over 2D arrays by replacing them by a 1D pointer run over the data-space, but a good optimizing compiler will often do that by itself anyway.
If the body of the loop is so big/complex the compiler can't optimize/replace the loop by a 1D run on it's own, the performance gain by doing it manually will most likely not be significant either.
You're free to reinterpret the memory any way you'd like. As long as the multiple does not exceed the linear memory. You can even move a to 12, 40 and use negative indexes.
The memory referenced by a is both a int[25][80] and a int[2000]. So says the Standard, 3.8p2:
[ Note: The lifetime of an array object starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. — end note ]
a has a particular type, it is an lvalue of type int[25][80]. But p is just int*. It is not "int* pointing into a int[80]" or anything like that. So in fact, the int pointed to is an element of int[25][80] named a, and also an element of int[2000] occupying the same space.
Since p and p+1234 are both elements of the same int[2000] object, the pointer arithmetic is well-defined. And since p[1234] means *(p+1234), it too is well-defined.
The effect of this rule for array lifetime is that you can freely use pointer arithmetic to move through a complete object.
Since std::array got mentioned in the comments:
If one has std::array<std::array<int, 80>, 25> a; then there does not exist a std::array<int, 2000>. There does exist a int[2000]. I'm looking for anything that requires sizeof (std::array<T,N>) == sizeof (T[N]) (and == N * sizeof (T)). Absent that, you have to assume that there could be gaps which mess up traversal of nested std::array.

Internal logic of operator [] when dealing with pointers

I've been studying C++ for couple of months now and just recently decided to look more deeply into the logic of pointers and arrays. What I've been taught in uni is pretty basic - pointers contain the address of a variable. When an array is created, basically a pointer to its first element is created.
So I started experimenting a bit. (and got to a conclusion which I need confirmation for). First of all I created
int arr[10];
int* ptr = &arr[5];
And as you would imagine
cout << ptr[3];
gave me the 8th element of the array. Next I tried
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
which to my great delight (not irony) returned the same addresses. Even though num wasn't an array.
The conclusion to which I got: array is not something special in C++. It's just a pointer to the first element (already typed that). More important: Can I think about every pointer in the manner of object of a class variable*. Is the operator [] just overloaded in the class int*? For example to be something along the lines of:
int operator[] (int index){
return *(arrayFirstaddress + index);
}
What was interesting to me in these experiments is that operator [] works for EVERY pointer. (So it's exactly like overloading an operator for all instances of the said class)
Of course, I can be as wrong as possible. I couldn't find much information in the web, since I didn't know how to word my question so I decided to ask here.
It would be extremely helpful if you explained to me if I'm right/wrong/very wrong and why.
You find the definition of subscripting, i.e. an expression like ptr2[5] in the c++ standard, e.g. like in this online c++ draft standard:
5.2.1 Subscripting [expr.sub]
(1) ... The expression E1[E2] is identical (by definition) to
*((E1)+(E2))
So your "discovery" sounds correct, although your examples seem to have some bugs (e.g. ptr2[5] should not return an address but an int value, whereas ptr2+5 is an address an not an int value; I suppose you meant &ptr2[5]).
Further, your code is not a prove of this discovery as it is based on undefined behaviour. It may yield something that supports your "discovery", but your discovery could still be not valid, and it could also do the opposite (really!).
The reason why it is undefined behaviour is that even pointer arithmetics like ptr2+5 is undefined behaviour if the result is out of the range of the allocated memory block ptr2 points to (which is definitely the case in your example):
5.7 Additive operators
(6) ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
Different compilers, different optimization settings, and even slight modifications anywhere in your program may let the compiler do other things here.
An array in C++ is a collection of objects. A pointer is a variable that can store the address of something. The two are not the same thing.
Unfortunately, your sample
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
exhibits undefined behaviour, both in the evaluation of ptr2[5] and ptr2 + 5. Pointer expressions are special - arithmetic involving pointers only has defined behaviour if the pointer being acted on (ptr2 in this case) and the result (ptr2 + 5) are within the same object. Or one past the end (although dereferencing a "one past the end" pointer - trying to access the value it points at - also gives undefined behaviour).
Semantically, *(ptr + n) and ptr[n] are equivalent (i.e. they have the same meaning) if ptr is a pointer and n is an integral value. So if evaluating ptr + n gives undefined behaviour, so does evaluating ptr[n]. Similarly, &ptr[n] and ptr + n are equivalent.
In expressions, depending on context, the name of an array is converted to a pointer, and that pointer is equal to the address of that array's first element. So, given
int x[5];
int *p;
// the following all have the same effect
p = x + 2;
p = &x[0] + 2;
p = &x[2];
That does not mean an array is a pointer though.

Why does C++ consider pointer and array of pointers as same thing?

Once I encountered following code:
int s =10;
int *p=&s;
cout << p[3] << endl;
And I can't understand why am I able to access p[3] that doesn't exist (only p exists that is single pointer but I still get access to p[3] that is array that I have never created).
Is it some compiler bug or it is a feature or I don't know some basics of C++ that covers this?
Thank you
Why does C++ consider pointer and array of pointers as same thing?
It doesn't. You're asking why it treats pointers and arrays as the same.
The [] operator is just an abbreviated form of pointer arithmetic. a[b] is equivalent to *(a + b). Array names can decay into pointers, and then pointer arithmetic is applied. It's the programmers job to make sure they don't go out of bounds. The compiler can't possibly stop you from shooting your foot off.
Also, claiming to be able to "access" it is a strong assertion. That is UB, and is most likely going to either read the wrong memory or get a segfault.
No, it's not a compiler bug, its a very useful feature... but lets not get ahead of ourselves here, the consequence of your code is called Undefined Behaviour
So, what's the feature? All naked arrays are actually pointer to the first element. Except un-decayed arrays (See What is array decaying?).
Consider this code:
int s =10;
int* array = new int[12];
int *p;
p = array; // p refers to the first element
int* x = p + 7; //advances to the 7th element, compiler never checks bounds
int* y = p + 700; //ditto ...this is obviously undefined
p = &s; //p is now pointing to where s
int* xx = p + 3; //But s is a single element, so Undefined Behaviour
Once an array is decayed, it's simply a pointer... And a pointer can be incremented, decremented, dereferenced, advanced, assigned or reassigned.
So,
cout << p[7] << endl;
is a valid C++ program. but not necessarily correct.
It's the responsibility of the programmer to know whether a pointer points to a single element or an array. but thanks to static analyzers and https://github.com/isocpp/CppCoreGuidelines, things are changing for good.
Also see What are all the common undefined behaviours that a C++ programmer should know about?
From here, section array-to-pointer decay:
There is an implicit conversion from lvalues and rvalues of array type to rvalues of pointer type: it constructs a pointer to the first element of an array. This conversion is used whenever arrays appear in context where arrays are not expected, but pointers are
Inherited from C, C++ allows you to treat any pointer like the first element of an array starting at that address.
That's in part because it passes arrays by reference as pointers and so for that to make sense you need to be able to treat a pointer as an array.
It also enables some quite neat and very efficient code in various circumstances.
The upshot is that p[3] is a valid construct in this context.
Obviously however it has undefined behaviour because p isn't pointing to an array! Unfortunately the language rules (and compiler) aren't smart enough to work that out.
C is a very low level language and doesn't enforce nice things like range checking either during compilation or execution.

May I treat a 2D array as a contiguous 1D array?

Consider the following code:
int a[25][80];
a[0][1234] = 56;
int* p = &a[0][0];
p[1234] = 56;
Does the second line invoke undefined behavior? How about the fourth line?
Both lines do result in undefined behavior.
Subscripting is interpreted as pointer addition followed by an indirection, that is, a[0][1234]/p[1234] is equivalent to *(a[0] + 1234)/*(p + 1234). According to [expr.add]/4 (here I quote the newest draft, while for the time OP is proposed, you can refer to this comment, and the conclusion is the same):
If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.
since a[0](decayed to a pointer to a[0][0])/p points to an element of a[0] (as an array), and a[0] only has size 80, the behavior is undefined.
As Language Lawyer pointed out in the comment, the following program does not compile.
constexpr int f(const int (&a)[2][3])
{
auto p = &a[0][0];
return p[3];
}
int main()
{
constexpr int a[2][3] = { 1, 2, 3, 4, 5, 6, };
constexpr int i = f(a);
}
The compiler detected such undefined behaviors when it appears in a constant expression.
It's up to interpretation. While the contiguity requirements of arrays don't leave much to the imagination in terms of how to layout a multidimensional arrays (this has been pointed out before), notice that when you're doing p[1234] you're indexing the 1234th element of the zeroth row of only 80 columns. Some interpret the only valid indices to be 0..79 (&p[80] being a special case).
Information from the C FAQ which is the collected wisdom of Usenet on matters relevant to C. (I do not think C and C++ differ on that matter and that this is very much relevant.)
In the language the Standard was written to describe, there would be no problem with invoking a function like:
void print_array(double *d, int rows, int cols)
{
int r,c;
for (r = 0; r < rows; r++)
{
printf("%4d: ", r);
for (c = 0; c < cols; c++)
printf("%10.4f ", d[r*cols+c]);
printf("\n");
}
}
on a double[10][4], or a double[50][40], or any other size, provided that the total number of elements in the array was less than rows*cols. Indeed, the guarantee that the row stride of a T[R][C] would equal C * sizeof (T) was designed among other things to make it possible to write code that could work with arbitrarily-sized multi-dimensional arrays.
On the other hand, the authors of the Standard recognized that when implementations are given something like:
double d[10][10];
double test(int i)
{
d[1][0] = 1.0;
d[0][i] = 2.0;
return d[1][0];
}
allowing them to generate code that would assume that d[1][0] would still hold 1.0 when the return executes, or allowing them to generate code that would trap if i is greater than 10, would allow them to be more suitable for some purposes than requiring that they silently return 2.0 if invoked with i==10.
Nothing in the Standard makes any distinction between those scenarios. While it would have been possible for the Standard to have included rules that would say that the second example invokes UB if i >= 10 without affecting the first example (e.g. say that applying [N] to an array doesn't cause it to decay to a pointer, but instead yields the Nth element, which must exist in that array), the Standard instead relies upon the fact that implementations are allowed to behave in useful fashion even when not required to do so, and compiler writers should presumably be capable of recognizing situations like the first example when doing so would benefit their customers.
Since the Standard never sought to fully define everything that programmers would need to do with arrays, it should not be looked to for guidance as to what constructs quality implementations should support.
Your compiler will throw a bunch of warnings/errors because of subscript out of range (line 2) and incompatioble types (line 3), but as long as the actual variable (int in this case) is one of the intrinsic base-types this is save to do in C and C++.
(If the variable is a class/struct it will probably still work in C, but in C++ all bets are off.)
Why you would want to do this....
For the 1st variant: If your code relies on this sort of messing about it will be error-prone and hard to maintain in the long run.
I can see a some use for the second variant when performance optimizing loops over 2D arrays by replacing them by a 1D pointer run over the data-space, but a good optimizing compiler will often do that by itself anyway.
If the body of the loop is so big/complex the compiler can't optimize/replace the loop by a 1D run on it's own, the performance gain by doing it manually will most likely not be significant either.
You're free to reinterpret the memory any way you'd like. As long as the multiple does not exceed the linear memory. You can even move a to 12, 40 and use negative indexes.
The memory referenced by a is both a int[25][80] and a int[2000]. So says the Standard, 3.8p2:
[ Note: The lifetime of an array object starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. — end note ]
a has a particular type, it is an lvalue of type int[25][80]. But p is just int*. It is not "int* pointing into a int[80]" or anything like that. So in fact, the int pointed to is an element of int[25][80] named a, and also an element of int[2000] occupying the same space.
Since p and p+1234 are both elements of the same int[2000] object, the pointer arithmetic is well-defined. And since p[1234] means *(p+1234), it too is well-defined.
The effect of this rule for array lifetime is that you can freely use pointer arithmetic to move through a complete object.
Since std::array got mentioned in the comments:
If one has std::array<std::array<int, 80>, 25> a; then there does not exist a std::array<int, 2000>. There does exist a int[2000]. I'm looking for anything that requires sizeof (std::array<T,N>) == sizeof (T[N]) (and == N * sizeof (T)). Absent that, you have to assume that there could be gaps which mess up traversal of nested std::array.