Is flattening a multi-dimensional array with a cast UB [duplicate] - c++

Consider the following code:
int a[25][80];
a[0][1234] = 56;
int* p = &a[0][0];
p[1234] = 56;
Does the second line invoke undefined behavior? How about the fourth line?

Both lines do result in undefined behavior.
Subscripting is interpreted as pointer addition followed by indirection; that is, a[0][1234] and p[1234] are equivalent to *(a[0] + 1234) and *(p + 1234) respectively. According to [expr.add]/4 (quoted here from the newest draft; for the wording current when the question was asked, see this comment — the conclusion is the same):
If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.
Since a[0] (decayed to a pointer to a[0][0]) and p point to an element of the array a[0], and a[0] has only 80 elements, the behavior is undefined.
As Language Lawyer pointed out in the comment, the following program does not compile.
constexpr int f(const int (&a)[2][3])
{
    auto p = &a[0][0];
    return p[3];
}
int main()
{
    constexpr int a[2][3] = { 1, 2, 3, 4, 5, 6, };
    constexpr int i = f(a);
}
The compiler detects such undefined behavior when it appears in a constant expression.

It's up to interpretation. While the contiguity requirements of arrays don't leave much to the imagination in terms of how to lay out a multidimensional array (this has been pointed out before), notice that when you write p[1234] you are indexing the 1234th element of the zeroth row, which has only 80 columns. Some interpret the only valid indices to be 0..79 (with &p[80] as a special one-past-the-end case).
Information from the C FAQ, which is the collected wisdom of Usenet on matters relevant to C. (I do not think C and C++ differ on this matter, and it is very much relevant here.)

In the language the Standard was written to describe, there would be no problem with invoking a function like:
void print_array(double *d, int rows, int cols)
{
    int r, c;
    for (r = 0; r < rows; r++)
    {
        printf("%4d: ", r);
        for (c = 0; c < cols; c++)
            printf("%10.4f ", d[r*cols + c]);
        printf("\n");
    }
}
on a double[10][4], or a double[50][40], or any other size, provided that the total number of elements in the array was no less than rows*cols. Indeed, the guarantee that the row stride of a T[R][C] equals C * sizeof (T) was designed, among other things, to make it possible to write code that could work with arbitrarily-sized multi-dimensional arrays.
On the other hand, the authors of the Standard recognized that when implementations are given something like:
double d[10][10];
double test(int i)
{
    d[1][0] = 1.0;
    d[0][i] = 2.0;
    return d[1][0];
}
allowing them to generate code that assumes d[1][0] will still hold 1.0 when the return executes, or to generate code that traps if i is 10 or greater, would allow them to be more suitable for some purposes than requiring that they silently return 2.0 when invoked with i==10.
Nothing in the Standard makes any distinction between those scenarios. While it would have been possible for the Standard to have included rules that would say that the second example invokes UB if i >= 10 without affecting the first example (e.g. say that applying [N] to an array doesn't cause it to decay to a pointer, but instead yields the Nth element, which must exist in that array), the Standard instead relies upon the fact that implementations are allowed to behave in useful fashion even when not required to do so, and compiler writers should presumably be capable of recognizing situations like the first example when doing so would benefit their customers.
Since the Standard never sought to fully define everything that programmers would need to do with arrays, it should not be looked to for guidance as to what constructs quality implementations should support.

Your compiler will emit a bunch of warnings/errors about the out-of-range subscript (line 2) and the incompatible types (line 3), but as long as the element type (int in this case) is one of the intrinsic base types, this is, in practice, safe to do in C and C++.
(If the element type is a class/struct it will probably still work in C, but in C++ all bets are off.)
Why you would want to do this....
For the first variant: if your code relies on this sort of messing about, it will be error-prone and hard to maintain in the long run.
I can see some use for the second variant when optimizing loops over 2D arrays by replacing them with a single 1D pointer run over the data space, but a good optimizing compiler will often do that by itself anyway.
And if the body of the loop is so big or complex that the compiler can't replace the loop with a 1D run on its own, the performance gain from doing it manually will most likely not be significant either.

You're free to reinterpret the memory any way you'd like, as long as the resulting accesses do not exceed the bounds of the underlying linear memory. You could even move the base pointer to, say, a[12][40] and use negative indices.

The memory referenced by a is both an int[25][80] and an int[2000]. So says the Standard, 3.8p2:
[ Note: The lifetime of an array object starts as soon as storage with proper size and alignment is obtained, and its lifetime ends when the storage which the array occupies is reused or released. 12.6.2 describes the lifetime of base and member subobjects. — end note ]
a has a particular type: it is an lvalue of type int[25][80]. But p is just int*. It is not "an int* pointing into an int[80]" or anything like that. So in fact, the int pointed to is an element of the int[25][80] named a, and also an element of an int[2000] occupying the same space.
Since p and p+1234 are both elements of the same int[2000] object, the pointer arithmetic is well-defined. And since p[1234] means *(p+1234), it too is well-defined.
The effect of this rule for array lifetime is that you can freely use pointer arithmetic to move through a complete object.
Since std::array got mentioned in the comments:
If one has std::array<std::array<int, 80>, 25> a; then there does not exist a std::array<int, 2000>, though there does exist an int[2000]. I'm looking for anything that requires sizeof (std::array<T,N>) == sizeof (T[N]) (and == N * sizeof (T)). Absent that, you have to assume that there could be gaps, which would mess up traversal of nested std::array.

Related

Can we safely call C API functions from C++ when arrays are involved?

Context:
As an old C programmer (even K&R C...) I had always believed that an array was nothing more than a contiguously allocated nonempty set of objects with a particular member object type, called the element type (from the n1570 draft of the C11 standard, 6.2.5 Types). For that reason I did not worry too much about pointer arithmetic.
I now know that an array is an object type and that it can only be created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2) (from the n4659 draft of the C++17 standard).
Problem:
I have to use a C library in which some functions return pointers to arrays of C structs. So far so good, a C struct is a POD type, and proper padding and alignment is achieved by using the standard flags of the compiler. But as the size of the array is only known at runtime, even with the correct extern "C" declarations, my function is declared to return a pointer to the first element of the array - the actual size is returned by a different function of the API.
Simplified example:
#include <cstddef>
#include <iostream>
extern "C" {
struct Elt {
    int ival;
    //...
};
void *libinit();               // initialize the library and get a handle
size_t getNElts(void *id);     // get the number of elements
struct Elt* getElts(void *id); // get access to the array of elements
void libend(void *id);         // releases library internal data
}
int main() {
    void *libid = libinit();
    Elt* elts = getElts(libid);
    size_t nelts = getNElts(libid);
    for (size_t i = 0; i < nelts; i++) {
        std::cout << elts[i].ival << " "; // is elts[i] legal?
    }
    std::cout << std::endl;
    libend(libid);
    return 0;
}
Question:
I know that the block of memory has probably been allocated through malloc, which would allow using pointers over it, and I assume that getElts(libid)[0] does not involve undefined behaviour. But is it legal to use pointer arithmetic over the C array when it has never been declared as a C++ array? The API only guarantees that I have a contiguously allocated set of objects of type Elt and that getElts returns a pointer to the first element of that set.
I ask because [expr.add] explicitly restricts pointer arithmetic to within an array:
4 When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the expression P points to element x[i] of an array object x with n elements,
the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element
x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined...
That used to be a common practice...
EDIT
To make my question clearer: I know that the following would lead to UB if done in C++.
libstub.c++
/* C++ simulation of a C implementation */
#include <cstddef>
#include <new>
extern "C" {
struct Elt {
    int ival;
    //...
};
void *libinit();               // initialize the library and get a handle
size_t getNElts(void *id);     // get the number of elements
struct Elt* getElts(void *id); // get access to the array of elements
void libend(void *id);         // releases library internal data
}
size_t getCurrentSize() {
    return 1024; // let us assume that the returned value is not a constexpr
}
void *libinit() {
    size_t N = getCurrentSize();
    unsigned char *storage = new unsigned char[(N + 1) * sizeof(Elt)];
    // storage can provide storage for a size_t, correct alignment
    size_t *n = new(storage) size_t;
    *n = N;
    for (size_t i = 1; i <= N; i++) {
        // storage can provide storage for an Elt, correct alignment;
        // element i lives at offset i * sizeof(Elt), matching getElts below
        Elt *elt = new(storage + i * sizeof(Elt)) Elt();
        elt->ival = i; // put values into elt...
    }
    return static_cast<void *>(storage);
}
void libend(void *id) {
    unsigned char *storage = static_cast<unsigned char *>(id); // ok, back cast is valid
    delete[] storage; // ok, was allocated by new[]; Elt is trivially destructible
}
size_t getNElts(void *id) {
    size_t *n = reinterpret_cast<size_t *>(id); // ok, a size_t was created there
    return *n;
}
Elt *getElts(void *id) {
    unsigned char *storage = static_cast<unsigned char *>(id); // ok, back cast
    Elt *elt = reinterpret_cast<Elt *>(storage + sizeof(Elt)); // ok, an Elt was created there
    return elt;
}
This is valid C++ code, and it fulfills the C API requirement. The problem is that getElts returns a pointer to a single Elt object which is not a member of any array. So, according to [expr.add], pointer arithmetic based on the return value of getElts invokes UB.
The C++ standard provides nearly zero interoperability guarantees with C.
As far as C++ is concerned, what happens within C code is outside the scope of C++.
So "does this pointer point to an array?" is a question the C++ standard cannot answer, as the pointer comes from a C function; rather, it is a question left to your particular compiler.
In practice, this works. In theory, there are no guarantees provided by C++ that your program is well formed when you interact in any way with C.
This is good news, because the C++ standard is broken around creating dynamic arrays of type T. It is so bad that there is no standard-compliant way to implement std::vector without compiler magic, or without the compiler defining the otherwise-undefined behavior that results from attempting to do it.
However, C++ compilers are free to completely ignore this problem. They are free to define inter-element pointer arithmetic, when objects are contiguously allocated, to behave just like an array. Similarly, they are free to provide whatever guarantees they like about how they treat pointers coming from C code, and in practice they provide quite reasonable behavior when you interact with C. I am, however, unaware of any compiler that states or guarantees any of this formally.
Pointer arithmetic using the builtin [] operator on pointers is strictly equivalent to doing the pointer arithmetic by hand in the sense that the following is guaranteed:
int arr[2] = { 0, 1 };
assert(arr[1] == *(arr + 1));
The two variants are guaranteed to have the same semantics. As far as your example is concerned, if you know for sure that your API returns a pointer to contiguous memory, then your code is perfectly valid, and that assumption seems perfectly fine given the way the API works. As a side note, I have never seen an allocator that did not allocate contiguous memory on a modern system; it seems like a very silly thing to do, and it does not seem achievable given the way C and C++ work (at least not with language support w.r.t. field accesses). Anyone correct me if I am wrong, though.
getElts returns an address to the beginning of what is an array of something, created in the C library.
getNElts returns the number of elements in that array.
Presumably, you know the exact size of Elt.
Thus, you have all of the information necessary to access your data in C++, using pointer arithmetic if you so choose. It may be technically "undefined", but practically it is not, and it works. This sort of thing is commonly done, especially when dealing with interfaces to hardware.
If you are uncomfortable going out of bounds on the array that you say is not a C++ array, create an array in C++ and place it at the location returned by getElts. You could even create a std::vector in C++ and memcpy the data pointed to by getElts into the vector. Something like this:
struct Elt {
    int j;
    // etc.
};
std::vector<Elt> elts;      // create a vector of Elt
size_t n_elts = getNElts(); // call library to get number of Elts
elts.resize(n_elts);        // resize the vector according to number of elements
Elt* addr = getElts();      // get the address of the elements array from the library
std::memcpy(elts.data(), addr, n_elts * sizeof(Elt)); // copy the array over the vector data
// (there may be better ways to do this copy, but this works very well)
// now you can access the elements from the vector:
Elt my_elt = elts.at(1); // using .at for a bounds check
Elt my_elt_2 = elts[2];  // not bounds-checked
You are now working on a copy of the elements, contained in a C++ std::vector. If the elements on the library side are dynamic, you can 'place' the vector contents at the address returned by the library and skip the copy; then you are 'looking' at the memory allocated on the C side.
I'm not sure that all of this is 'defined' behavior, but it will work (I'm not an expert on the standard). You may have other issues with assuring that the Elt structure really lays out the same in your C and C++ implementations, but that can all be worked out.
The bottom line is, there are many ways to do what it appears you are wanting to do. I think you are getting hung up on the semantics of pointer arithmetic. Pointer arithmetic is always dangerous and can lead to undefined behavior, because it is easy to go out of bounds on an array. This is why bare arrays are not recommended practice in C++; there are usually safer ways to do things.

Internal logic of operator [] when dealing with pointers

I've been studying C++ for couple of months now and just recently decided to look more deeply into the logic of pointers and arrays. What I've been taught in uni is pretty basic - pointers contain the address of a variable. When an array is created, basically a pointer to its first element is created.
So I started experimenting a bit. (and got to a conclusion which I need confirmation for). First of all I created
int arr[10];
int* ptr = &arr[5];
And as you would imagine
cout << ptr[3];
gave me the 8th element of the array. Next I tried
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
which to my great delight (not irony) returned the same addresses. Even though num wasn't an array.
The conclusion I reached: an array is not something special in C++. It's just a pointer to the first element (already typed that). More importantly: can I think of every pointer as an object of a class variable*? Is operator [] just overloaded in the class int*? For example, to be something along the lines of:
int operator[] (int index) {
    return *(arrayFirstaddress + index);
}
What was interesting to me in these experiments is that operator [] works for EVERY pointer. (So it's exactly like overloading an operator for all instances of the said class)
Of course, I could be as wrong as possible. I couldn't find much information on the web, since I didn't know how to word my question, so I decided to ask here.
It would be extremely helpful if you explained to me if I'm right/wrong/very wrong and why.
You can find the definition of subscripting, i.e. of an expression like ptr2[5], in the C++ standard, e.g. in this online C++ draft standard:
5.2.1 Subscripting [expr.sub]
(1) ... The expression E1[E2] is identical (by definition) to
*((E1)+(E2))
So your "discovery" sounds correct, although your examples seem to have some bugs (e.g. ptr2[5] should not return an address but an int value, whereas ptr2+5 is an address an not an int value; I suppose you meant &ptr2[5]).
Further, your code is not a prove of this discovery as it is based on undefined behaviour. It may yield something that supports your "discovery", but your discovery could still be not valid, and it could also do the opposite (really!).
The reason it is undefined behaviour is that even pointer arithmetic like ptr2+5 is undefined if the result is out of the range of the allocated memory block ptr2 points into (which is definitely the case in your example):
5.7 Additive operators
(6) ... Unless both pointers point to elements of the same array
object, or one past the last element of the array object, the behavior
is undefined.
Different compilers, different optimization settings, and even slight modifications anywhere in your program may let the compiler do other things here.
An array in C++ is a collection of objects. A pointer is a variable that can store the address of something. The two are not the same thing.
Unfortunately, your sample
int num = 6;
int* ptr2 = &num;
cout << ptr2[5];
cout << ptr2 + 5;
exhibits undefined behaviour, both in the evaluation of ptr2[5] and ptr2 + 5. Pointer expressions are special - arithmetic involving pointers only has defined behaviour if the pointer being acted on (ptr2 in this case) and the result (ptr2 + 5) are within the same object. Or one past the end (although dereferencing a "one past the end" pointer - trying to access the value it points at - also gives undefined behaviour).
Semantically, *(ptr + n) and ptr[n] are equivalent (i.e. they have the same meaning) if ptr is a pointer and n is an integral value. So if evaluating ptr + n gives undefined behaviour, so does evaluating ptr[n]. Similarly, &ptr[n] and ptr + n are equivalent.
In expressions, depending on context, the name of an array is converted to a pointer, and that pointer is equal to the address of that array's first element. So, given
int x[5];
int *p;
// the following all have the same effect
p = x + 2;
p = &x[0] + 2;
p = &x[2];
That does not mean an array is a pointer though.

Can I safely create references to possibly invalid memory as long as I don't use it?

I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance if it is an ASCII byte or the leader of a multibyte character, and also I don't know if my input string is sufficiently long to contain the remaining characters.
For simplicity, I'd like to name the four next bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
    for (size_t i = 0; i < s.size();) {
        const char &a = s[i];
        const char &b = s[i + 1];
        const char &c = s[i + 2];
        const char &d = s[i + 3];
        if (is_ascii(a)) {
            i += 1;
            do_something_only_with(a);
        } else if (is_twobyte_leader(a)) {
            i += 2;
            if (is_safe_to_access_b()) {
                do_something_only_with(a, b);
            }
        }
        ...
    }
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously real code will be more involved, so defining b,c,d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by spending quite some time on it, but then, so could you, or any reader. And it's not as if that would be very useful in practice.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal side without looking it up for you. Formally, you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to, exists.
Before going into the legality of references to inaccessible memory, you have another problem in your code: your call to s[i+x] might call string::operator[] with a parameter bigger than s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since std::string is now guaranteed to be contiguous, you could get around that problem by using &s[i] + x to compute the address. In practice this will probably work.
However, strictly speaking, doing this is unfortunately still illegal. The reason is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or one past the end of the array. The relevant part of the (C++11) standard is [expr.add], §5.7.5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore, generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer or use the reference. Relying on UB is almost never a good idea, because even if it works on all targeted systems, there is no guarantee that it continues to work in the future.
In principle, the idea of taking a reference for a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take, for example, the possibility of computing a pointer to the second item after the end of an array (as @DanielTrebbien suggests). The standard says overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array were just short of the end of the space addressable by a pointer; not a likely scenario. And even if it does happen, nothing bad would occur on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
@JoSo: If you are working with a character array, you can avoid some of the uncertainty about reference semantics by replacing the const references with const pointers in your code. That way you can be certain no compiler will alias the values.

May I treat a 2D array as a contiguous 1D array?


C/C++: Is this undefined behavior? (2D arrays)

Is it undefined behavior if I go through the elements of a 2D array in the following manner?
int v[5][5], i;
for (i = 0; i < 5*5; ++i) {
    v[i] = i;
}
Then again, does it even compile? (I can't try it right now, I'm not at home.) If it doesn't, then imagine I somehow acquired a pointer to the first element and am using that instead of v[i].
Accessing elements of a multidimensional array from a pointer to the first element is Undefined Behavior (UB) for the elements that are not part of the first array.
Given T array[n], array[i] is a straight trip to UB-land for all i >= n. Even when T is U[m]. Even if it's through a pointer. It's true there are strong requirements on arrays (e.g. sizeof(int[N]) == N*sizeof(int)), as mentioned by others, but no exception is explicitly made so nothing can be done about it.
I don't have an official reference because, as far as I can tell, the C++ standard leaves the details to the C89 standard, and I'm not familiar with either the C89 or the C99 standard. Instead, here is a reference to the comp.lang.c FAQ:
[...] according to an official interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.
It will not compile.
The more or less equivalent
int v[5][5], *vv, i;
vv = &v[0][0];
for (i = 0; i < 5*5; ++i) {
    vv[i] = i;
}
and
int v[5][5], i;
for (i = 0; i < 5*5; ++i) {
    v[0][i] = i;
}
will compile. I'm not sure if they are UB or not (and it could in fact be different between C90, C99 and C++; aliasing is a tricky area). I'll try to find references one way or the other.
It is really quite hard to find any reference in the standard explicitly stating that this is undefined behavior. Sure, the standard clearly states (C99 6.5.6 §8-9) that pointer arithmetic beyond the array is UB. The question then is: what is the definition of an array?
If a multi-dimensional array is regarded as an array of array objects, then it is UB. But if it is regarded as one array with multiple dimensions, the code would be perfectly fine.
There is an interesting note of another undefined behavior in Annex J of the standard:
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).
This insinuates that accessing a multi-dimensional array outside the range of the first dimension is undefined behavior. However, the annex is not normative text, and 6.5.6 is quite vague.
Perhaps someone can find a clear definition of the difference between an array object and a multi-dimensional array? Until then, I am not convinced that this is UB.
EDIT: Forgot to mention that the assignment v[i] = i is certainly not valid C. As per 6.5.2.1, v[i] is equivalent to *(v+i), which is an lvalue of array type rather than an int, so it cannot be assigned to. What I am not certain about is whether accessing it as v[0][too_large_value] is UB or not.
Here v[i] stands for an array of 5 ints, and an array is referred to by the address of its storage, which, depending on your C compiler, could be 16 bits, 32 bits, and so on. So v[i] = i may compile on some compilers, but it definitely won't yield the result you are looking for.
The answer by sharptooth is correct: v[i][j] = i... is one of the easiest and most readable solutions.
Another could be:
int *ptr = &v[0][0]; // note: plain "ptr = v;" would not compile, since v decays to int (*)[5], not int *
Now you can iterate over this ptr to assign the values:
for (i = 0; i < 5*5; i++, ptr++) {
    *ptr = i;
}
This will not compile.
You will get the following error for the line:
v[i] = i;
error: incompatible types in assignment of ‘int’ to ‘int [5]’
To give an answer adapted from a similar question at:
http://www.velocityreviews.com/forums/t318379-incompatible-types-in-assignment.html
v is a 2D array. Since you are only referencing one dimension, what you end up with is an lvalue of array type (int[5] here; the linked question involved char), and this statement is trying to assign a single value to that array. You can explicitly reference both dimensions instead, e.g. v[0][i] or v[i / 5][i % 5], which is presumably what you intended.