Related
This is a follow up to the following question. I was under the assumption, that the pointer arithmetic I originally used would cause undefined behavior. However I was told by a colleague, that the usage is actually well defined. The following is a simplified example:
typedef struct StructA {
int a;
} StructA ;
typedef struct StructB {
StructA a;
StructA* b;
} StructB;
int main() {
StructB* original = (StructB*)malloc(sizeof(StructB));
original->a.a = 5;
original->b = &original->a;
StructB* copy = (StructB*)malloc(sizeof(StructB));
memcpy(copy, original, sizeof(StructB));
free(original);
ptrdiff_t offset = (char*)copy - (char*)original;
StructA* a = (StructA*)((char*)(copy->b) + offset);
printf("%i\n", a->a);
free(copy)
}
According to §5.7 ¶5 of the C++11 spec:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
I assumed, that the following part of the code:
ptrdiff_t offset = (char*)copy - (char*)original;
StructA* a = (StructA*)((char*)(copy->b) + offset);
causes undefined behavior, since it:
subtracts two pointers, which point to different arrays
the resulting pointer of the offset calculation does not point into the same array anymore.
Does this cause undefined behavior, or do I misinterpret the C++ specification? Does the same apply in C as well?
Edit:
Following the comments I assume the following modification would still be undefined behavior because of the object usage after the lifetime has ended:
ptrdiff_t offset = (char*)(copy->b) - (char*)original;
StructA* a = (StructA*)((char*)copy + offset);
Would it be defined when working with indexes instead:
typedef struct StructB {
StructA a;
ptrdiff_t b_offset;
} StructB;
int main() {
StructB* original = (StructB*)malloc(sizeof(StructB));
original->a.a = 5;
original->b_offset = (char*)&(original->a) - (char*)original
StructB* copy = (StructB*)malloc(sizeof(StructB));
memcpy(copy, original, sizeof(StructB));
free(original);
StructA* a = (StructA*)((char*)copy + copy->b_offset);
printf("%i\n", a->a);
free(copy);
}
It is undefined behavior because there are severe restrictions on what can be done with pointer arithmetic. The edits that you have made and that were suggested do nothing to fix this.
Undefined Behavior in Addition
StructA* a = (StructA*)((char*)copy + offset);
First of all, this is undefined behavior due to the addition onto copy:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
(4.3) Otherwise, the behavior is undefined.
See https://eel.is/c++draft/expr.add#4
In short, performing pointer arithmetic on non-arrays and non-null-pointers is always undefined behavior. Even if copy or its members were arrays, adding onto a pointer so that it becomes:
two or more past the end of the array
at least one before the first element
is also undefined behavior.
Undefined Behavior in Subtraction
ptrdiff_t offset = (char*)original - (char*)(copy->b);
The subtraction of your two pointers is also undefined behavior:
When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; [...]
(5.1) If P and Q both evaluate to null pointer values, the result is 0.
(5.2) Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j.
(5.3) Otherwise, the behavior is undefined.
See https://eel.is/c++draft/expr.add#5
So subtracting pointers from one another, when they are not both null or pointers to elements of the same array is undefined behavior.
Undefined Behavior in C
The C standard has similar restrictions:
(8) [...] If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression.
(The standard does not mention what happens for non-array pointer addition)
(9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; [...]
See §6.5.6 Additive Operators in the C11 standard (n1570).
Using Data Member Pointers Instead
A clean and type-safe solution in C++ would be to use data member pointers.
typedef struct StructB {
StructA a;
StructA StructB::*b_offset;
} StructB;
int main() {
StructB* original = (StructB*) malloc(sizeof(StructB));
original->a.a = 5;
original->b_offset = &StructB::a;
StructB* copy = (StructB*) malloc(sizeof(StructB));
memcpy(copy, original, sizeof(StructB));
free(original);
printf("%i\n", (copy->*(copy->b_offset)).a);
free(copy);
}
Notes
The standard citations are from a C++ draft. The C++11 which you have cited does not appear to have any looser restrictions on pointer arithmetic, it is just formatted differently. See C++11 standard (n3337).
The Standard explicitly provides that in situations it characterizes as Undefined Behavior, implementations may behave "in a documented fashion characteristic of the environment". According to the Rationale, the intention of such characterization was, among other things, to identify avenues of "conforming language extension"; the question of when implementations support such "popular extensions" was a Quality of Implementation issue best left to the marketplace.
Many implementations intended and/or configured for low-level programming on commonplace platforms extend the language by specifying that the following equivalences hold, for any pointers p and q of type T* and integer expression i:
The bit patterns of p, (uintptr_t)p, and (intptr_t)p are identical.
p+i is equivalent to (T*)((uintptr_t)p + (uintptr_t)i * sizeof (T))
p-i is equivalent to (T*)((uintptr_t)p - (uintptr_t)i * sizeof (T))
p-q is equivalent to ((uintptr_t)p - (uintptr_t)q) / sizeof (T) in all cases where the division would have no remainder.
p>q is equivalent to (uintptr_t)p > (uintptr_t)q and likewise for all other relational and comparison operators.
The Standard does not recognize any category of implementations that always uphold those equivalences, as distinct from those that do not, in part because they did not wish to portray as "inferior" implementations for unusual platforms where such upholding equivalence would be impractical. Instead, it expected that such implementations would be upheld on implementations where that would make sense, and programmers would know when they were targeting such implementations. Someone writing memory-management code for the 68000, or for small-model 8086 (where such equivalences would naturally hold) could write memory management code that would run interchangeably on other systems where those equivalences would hold, but someone writing memory-management code for large-model 8086 would need to design it explicitly for that platform because those equivalences do not hold (pointers are 32 bits, but individual objects are limited to 65520 bytes and most pointer operations only act upon the bottom 16 bits of a pointer).
Unfortunately, even on platforms where such equivalences would normally hold, some kinds of optimizations may yield corner-case behaviors that differ from those otherwise implied by those equivalences. Commercial compilers generally uphold the Spirit of C principle "don't prevent the programmer from doing what needs to be done", and can be configured to uphold the equivalences even when most optimizations are enabled. The gcc and clang C compilers, however, don't allow such control over semantics. When all optimizations are disabled, they will uphold those equivalences on commonplace platforms, but there is no other optimization setting that will prevent them from making inferences that would be inconsistent with them.
As an example, consider the following structure:
struct S {
int a[4];
int b[4];
} s;
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
Personally, I feel that it must be UB in C++, whereas I'm not sure about C.
However, I failed to find anything relevant in the standards of C and C++ languages.
Update
There are several answers suggesting ways to make sure there is no padding
between fields in order to make the code work reliably. I'd like to emphasize
that if such code is UB, then absense of padding is not enough. If it is UB,
then the compiler is free to assume that accesses to S.a[i] and S.b[j] do not
overlap and the compiler is free to reorder such memory accesses. For example,
int x = s.b[2];
s.a[6] = 2;
return x;
can be transformed to
s.a[6] = 2;
int x = s.b[2];
return x;
which always returns 2.
Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?
No. Because accessing an array out of bound invoked undefined behaviour in C and C++.
C11 J.2 Undefined behavior
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond
the array object and is used as the operand of a unary * operator that
is evaluated (6.5.6).
An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).
C++ standard draft section 5.7 Additive operators paragraph 5 says:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
[...] If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Apart from the answer of #rsp (Undefined behavior for an array subscript that is out of range) I can add that it is not legal to access b via a because the C language does not specify how much padding space can be between the end of area allocated for a and the start of b, so even if you can run it on a particular implementation , it is not portable.
instance of struct:
+-----------+----------------+-----------+---------------+
| array a | maybe padding | array b | maybe padding |
+-----------+----------------+-----------+---------------+
The second padding may miss as well as the alignment of struct object is the alignment of a which is the same as the alignment of b but the C language also does not impose the second padding not to be there.
a and b are two different arrays, and a is defined as containing 4 elements. Hence, a[6] accesses the array out of bounds and is therefore undefined behaviour. Note that array subscript a[6] is defined as *(a+6), so the proof of UB is actually given by section "Additive operators" in conjunction with pointers". See the following section of the C11-standard (e.g. this online draft version) describing this aspect:
6.5.6 Additive operators
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i-n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
The same argument applies to C++ (though not quoted here).
Further, though it is clearly undefined behaviour due to the fact of exceeding array bounds of a, note that the compiler might introduce padding between members a and b, such that - even if such pointer arithmetics were allowed - a+6 would not necessarily yield the same address as b+2.
Is it legal? No. As others mentioned, it invokes Undefined Behavior.
Will it work? That depends on your compiler. That's the thing about undefined behavior: it's undefined.
On many C and C++ compilers, the struct will be laid out such that b will immediately follow a in memory and there will be no bounds checking. So accessing a[6] will effectively be the same as b[2] and will not cause any sort of exception.
Given
struct S {
int a[4];
int b[4];
} s
and assuming no extra padding, the structure is really just a way of looking at a block of memory containing 8 integers. You could cast it to (int*) and ((int*)s)[6] would point to the same memory as s.b[2].
Should you rely on this sort of behavior? Absolutely not. Undefined means that the compiler doesn't have to support this. The compiler is free to pad the structure which could render the assumption that &(s.b[2]) == &(s.a[6]) incorrect. The compiler could also add bounds checking on the array access (although enabling compiler optimizations would probably disable such a check).
I've have experienced the effects of this in the past. It's quite common to have a struct like this
struct Bob {
char name[16];
char whatever[64];
} bob;
strcpy(bob.name, "some name longer than 16 characters");
Now bob.whatever will be " than 16 characters". (which is why you should always use strncpy, BTW)
As #MartinJames mentioned in a comment, if you need to guarantee that a and b are in contiguous memory (or at least able to be treated as such, (edit) unless your architecture/compiler uses an unusual memory block size/offset and forced alignment that would require padding to be added), you need to use a union.
union overlap {
char all[8]; /* all the bytes in sequence */
struct { /* (anonymous struct so its members can be accessed directly) */
char a[4]; /* padding may be added after this if the alignment is not a sub-factor of 4 */
char b[4];
};
};
You can't directly access b from a (e.g. a[6], like you asked), but you can access the elements of both a and b by using all (e.g. all[6] refers to the same memory location as b[2]).
(Edit: You could replace 8 and 4 in the code above with 2*sizeof(int) and sizeof(int), respectively, to be more likely to match the architecture's alignment, especially if the code needs to be more portable, but then you have to be careful to avoid making any assumptions about how many bytes are in a, b, or all. However, this will work on what are probably the most common (1-, 2-, and 4-byte) memory alignments.)
Here is a simple example:
#include <stdio.h>
union overlap {
char all[2*sizeof(int)]; /* all the bytes in sequence */
struct { /* anonymous struct so its members can be accessed directly */
char a[sizeof(int)]; /* low word */
char b[sizeof(int)]; /* high word */
};
};
int main()
{
union overlap testing;
testing.a[0] = 'a';
testing.a[1] = 'b';
testing.a[2] = 'c';
testing.a[3] = '\0'; /* null terminator */
testing.b[0] = 'e';
testing.b[1] = 'f';
testing.b[2] = 'g';
testing.b[3] = '\0'; /* null terminator */
printf("a=%s\n",testing.a); /* output: a=abc */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abc */
testing.a[3] = 'd'; /* makes printf keep reading past the end of a */
printf("a=%s\n",testing.a); /* output: a=abcdefg */
printf("b=%s\n",testing.b); /* output: b=efg */
printf("all=%s\n",testing.all); /* output: all=abcdefg */
return 0;
}
No, since accesing an array out of bounds invokes Undefined Behavior, both in C and C++.
Short Answer: No. You're in the land of undefined behavior.
Long Answer: No. But that doesn't mean that you can't access the data in other sketchier ways... if you're using GCC you can do something like the following (elaboration of dwillis's answer):
struct __attribute__((packed,aligned(4))) Bad_Access {
int arr1[3];
int arr2[3];
};
and then you could access via (Godbolt source+asm):
int x = ((int*)ba_pointer)[4];
But that cast violates strict aliasing so is only safe with g++ -fno-strict-aliasing. You can cast a struct pointer to a pointer to the first member, but then you're back in the UB boat because you're accessing outside the first member.
Alternatively, just don't do that. Save a future programmer (probably yourself) the heartache of that mess.
Also, while we're at it, why not use std::vector? It's not fool-proof, but on the back-end it has guards to prevent such bad behavior.
Addendum:
If you're really concerned about performance:
Let's say you have two same-typed pointers that you're accessing. The compiler will more than likely assume that both pointers have the chance to interfere, and will instantiate additional logic to protect you from doing something dumb.
If you solemnly swear to the compiler that you're not trying to alias, the compiler will reward you handsomely:
Does the restrict keyword provide significant benefits in gcc / g++
Conclusion: Don't be evil; your future self, and the compiler will thank you.
Jed Schaff’s answer is on the right track, but not quite correct. If the compiler inserts padding between a and b, his solution will still fail. If, however, you declare:
typedef struct {
int a[4];
int b[4];
} s_t;
typedef union {
char bytes[sizeof(s_t)];
s_t s;
} u_t;
You may now access (int*)(bytes + offsetof(s_t, b)) to get the address of s.b, no matter how the compiler lays out the structure. The offsetof() macro is declared in <stddef.h>.
The expression sizeof(s_t) is a constant expression, legal in an array declaration in both C and C++. It will not give a variable-length array. (Apologies for misreading the C standard before. I thought that sounded wrong.)
In the real world, though, two consecutive arrays of int in a structure are going to be laid out the way you expect. (You might be able to engineer a very contrived counterexample by setting the bound of a to 3 or 5 instead of 4 and then getting the compiler to align both a and b on a 16-byte boundary.) Rather than convoluted methods to try to get a program that makes no assumptions whatsoever beyond the strict wording of the standard, you want some kind of defensive coding, such as static assert(&both_arrays[4] == &s.b[0], "");. These add no run-time overhead and will fail if your compiler is doing something that would break your program, so long as you don’t trigger UB in the assertion itself.
If you want a portable way to guarantee that both sub-arrays are packed into a contiguous memory range, or split a block of memory the other way, you can copy them with memcpy().
The Standard does not impose any restrictions upon what implementations must do when a program tries to use an out-of-bounds array subscript in one structure field to access a member of another. Out-of-bounds accesses are thus "illegal" in strictly conforming programs, and programs which make use of such accesses cannot simultaneously be 100% portable and free of errors. On the other hand, many implementations do define the behavior of such code, and programs which are targeted solely at such implementations may exploit such behavior.
There are three issues with such code:
While many implementations lay out structures in predictable fashion, the Standard allows implementations to add arbitrary padding before any structure member other than the first. Code could use sizeof or offsetof to ensure that structure members are placed as expected, but the other two issues would remain.
Given something like:
if (structPtr->array1[x])
structPtr->array2[y]++;
return structPtr->array1[x];
it would normally be useful for a compiler to assume that the use of structPtr->array1[x] will yield the same value as the preceding use in the "if" condition, even though it would change the behavior of code that relies upon aliasing between the two arrays.
If array1[] has e.g. 4 elements, a compiler given something like:
if (x < 4) foo(x);
structPtr->array1[x]=1;
might conclude that since there would be no defined cases where x isn't less than 4, it could call foo(x) unconditionally.
Unfortunately, while programs can use sizeof or offsetof to ensure that there aren't any surprises with struct layout, there's no way by which they can test whether compilers promise to refrain from the optimizations of types #2 or #3. Further, the Standard is a little vague about what would be meant in a case like:
struct foo {char array1[4],array2[4]; };
int test(struct foo *p, int i, int x, int y, int z)
{
if (p->array2[x])
{
((char*)p)[x]++;
((char*)(p->array1))[y]++;
p->array1[z]++;
}
return p->array2[x];
}
The Standard is pretty clear that behavior would only be defined if z is in the range 0..3, but since the type of p->array in that expression is char* (due to decay) it's not clear the cast in the access using y would have any effect. On the other hand, since converting pointer to the first element of a struct to char* should yield the same result as converting a struct pointer to char*, and the converted struct pointer should be usable to access all bytes therein, it would seem the access using x should be defined for (at minimum) x=0..7 [if the offset of array2 is greater than 4, it would affect the value of x needed to hit members of array2, but some value of x could do so with defined behavior].
IMHO, a good remedy would be to define the subscript operator on array types in a fashion that does not involve pointer decay. In that case, the expressions p->array[x] and &(p->array1[x]) could invite a compiler to assume that x is 0..3, but p->array+x and *(p->array+x) would require a compiler to allow for the possibility of other values. I don't know if any compilers do that, but the Standard doesn't require it.
Is this code correct?
int arr[2];
int (*ptr)[2] = (int (*)[2]) &arr[1];
ptr[0][0] = 0;
Obviously ptr[0][1] would be invalid by accessing out of bounds of arr.
Note: There's no doubt that ptr[0][0] designates the same memory location as arr[1]; the question is whether we are allowed to access that memory location via ptr. Here are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way.
Note 2: Also consider **ptr = 0; . As pointed out by Marc van Leeuwen, ptr[0] is equivalent to *(ptr + 0), however ptr + 0 seems to fall foul of the pointer arithmetic section. But by using *ptr instead, that is avoided.
Not an answer but a comment that I can't seem to word well without being a wall of text:
Given arrays are guaranteed to store their contents contiguously so that they can be 'iterated over' using a pointer. If I can take a pointer to the begin of an array and successively increment that pointer until I have accessed every element of the array then surely that makes a statement that the array can be accessed as a series of whatever type it is composed of.
Surely the combination of:
1) Array[x] stores its first element at address 'array'
2) Successive increments of the a pointer to it are sufficient to access the next item
3) Array[x-1] obeys the same rules
Then it should be legal to at least look at the address 'array' as if it were type array[x-1] instead of type array[x].
Furthermore given the points about being contiguous and how pointers to elements in the array have to behave, surely it must be legal to then group any contiguous subset of array[x] as array[y] where y < x and it's upper bound does not exceed the extent of array[x].
Not being a language-lawyer this is just me spouting some rubbish. I am very interested in the outcome of this discussion though.
EDIT:
On further consideration of the original code, it seems to me that arrays are themselves very much a special case in many regards. They decay to a pointer, and I believe can be aliased as per what I just said earlier in this post.
So without any standardese to back up my humble opinion, an array can't really be invalid or 'undefined' as a whole if it doesn't really get treated as a whole uniformly.
What does get treated uniformly are the individual elements. So I think it only makes sense to talk about whether accessing a specific element is valid or defined.
For C++ (I'm using draft N4296) [dcl.array]/7 says in particular that if the result of subscripting is an array, it's immediately converted to pointer. That is, in ptr[0][0] ptr[0] is first converted to int* and only then second [0] is applied to it. So it's perfectly valid code.
For C (C11 draft N1570) 6.5.2.1/3 states the same.
Yes, this is correct code. Quoting N4140 for C++14:
[expr.sub]/1 ... The expression E1[E2] is identical (by definition) to *((E1)+(E2))
[expr.add]/5 ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
There is no overflow here. &*(*(ptr)) == &ptr[0][0] == &arr[1].
For C11 (N1570) the rules are the same. §6.5.2.1 and §6.5.6
Let me give a dissenting opinion: this is (at least in C++) undefined behaviour, for much the same reason as in the other question that this question linked to.
First let me clarify the example with some typedefs that will simplify the discussion.
typedef int two_ints[2];
typedef int* int_ptr;
typedef two_ints* two_ints_ptr;
two_ints arr;
two_ints_ptr ptr = (two_ints_ptr) &arr[1];
int_ptr temp = ptr[0]; // the two_ints value ptr[0] gets converted to int_ptr
temp[0] = 0;
So the question is whether, although there is no object of type two_ints whose address coincides with that of arr[1] (in the same sense that the adress of arr coincides with that of arr[0]), and therefore no object to which ptr[0] could possibly point to, one can nonetheless convert the value of that expression to one of type int_ptr (here given the name temp) that does point to an object (namely the integer object also called arr[1]).
The point where I think behaviour is undefined is in the evaluation of ptr[0], which is equivalent (per 5.2.1[expr.sub]) to *(ptr+0); more precisely the evaluation of ptr+0 has undefined behaviour.
I'll cite my copy of the C++ which is not official [N3337], but probably the language has not changed; what bothers me slightly is that the section number does not at all match the one mentioned at the accepted answer of the linked question. Anyway, for me it is §5.7[expr.add]
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce overflow; otherwise the behavior is undefined.
Since the pointer operand ptr has type pointer to two_ints, the "array object" mentioned in the cited text would have to be an array of two_ints objects. However there is only one such object here, the fictive array whose unique element is arr that we are supposed to conjure up in such situations (as per: "pointer to nonarray object behaves the same as a pointer to the first element of an array of length one..."), but clearly ptr does not point to its unique element arr. So even though ptr and ptr+0 are no doubt equal values, neither of them point to elements of any array object at all (not even a fictive one), nor one past the end of such an array object, and the condition of the cited phrase is not met. The consequence is (not that overflow is produced, but) that behavior is undefined.
So behavior is already undefined before the indirection operator * is applied. I would not argue for undefined behavior from the latter evaluation, even though the phrase "the result is an lvalue referring to the object or function to which the expression points" is hard to interpret for expressions that do not refer to any object at all. But I would be lenient in interpreting this, since I think dereferencing a pointer past an array should not itself be undefined behavior (for instance if used to initialise a reference).
This would suggest that if instead of ptr[0][0] one wrote (*ptr)[0] or **ptr, then behaviour would not be undefined. This is curious, but it would not be the first time the C++ standard surprises me.
It depends on what you mean by "correct". You are doing a cast on the ptr to arr[1]. In C++ this will probably be a reinterpret_cast. C and C++ are languages which (most of the time) assume that the programmer knows what he is doing. That this code is buggy has nothing to do with the fact that it is valid C/C++ code.
You are not violating any rules in the standards (as far as I can see).
Trying to answer here why the code works on commonly used compilers:
int arr[2];
int (*ptr)[2] = (int (*)[2]) &arr[1];
printf("%p\n", (void*)ptr);
printf("%p\n", (void*)*ptr);
printf("%p\n", (void*)ptr[0]);
All lines print the same address on commonly used compilers. So, ptr is an object for which *ptr represents the same memory location as ptr on commonly used compilers and therefore ptr[0] is really a pointer to arr[1] and therefore arr[0][0] is arr[1]. So, the code assigns a value to arr[1].
Now, let's suppose a perverse implementation where a pointer to an array (NOTE: I'm saying pointer to an array, i.e. &arr which has the type int(*)[], not arr which means the same as &arr[0] and has the type int*) is the pointer to the second byte within the array. Then dereferencing ptr is the same as subtracting 1 from ptr using char* arithmetic. For structs and unions, it is guaranteed that pointer to such types is the same as pointer to the first element of such types, but in casting pointer to array into pointer no such guarantee was found for arrays (i.e. that pointer to an array would be the same as pointer to the first element of the array) and as a matter of fact #FUZxxl planned to file a defect report about the standard. For such a perverse implementation, *ptr i.e. ptr[0] would not be the same as &arr[1]. On RISC processors, it would as a matter of fact cause problems due to data alignment.
Some additional fun:
int arr[2] = {0, 0};
int *ptr = (int*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);
Should that code work? It prints 5.
Even more fun:
int arr[2] = {0, 0};
int (*ptr)[3] = (int(*)[3])&arr;
ptr[0][0] = 6;
printf("%d\n", arr[0]);
Should this work? It prints 6.
This should obviously work:
int arr[2] = {0, 0};
int (*ptr)[2] = &arr;
ptr[0][0] = 7;
printf("%d\n", arr[0]);
Considering that C++ does not have bound checking for built-in type arrays, Is it possible that:
One array's off-the-end pointer points to another array's first element?
Yes, a pointer beyond the end of an array could point to another object. Dereferencing a pointer beyond the end of an array results in undefined behavior.
My opinion: yes, it is possible in C++. There have been several SO threads on this topic, none of which reached any solid conclusion. Here is one example.
In some cases we can be sure that there is actually a valid object in memory immediately after the end of the old object. One case is standard-layout structs; another is multi-dimensional arrays. I originally wrote this post with a multi-dimensional array, but I have edited it to use the standard layout struct case, to avoid any objections about what the term "array object" means in the Standard.
struct
{
int a[2];
int b[2];
} foo;
if ( sizeof foo == 4 * sizeof(int) )
{
int *p = &foo.a[0];
++p; // (1)
++p; // (2)
*p = 3; // (3)
++p; // (4)
*p = 5; // (5)
}
Which line causes undefined behaviour (if any)? p is (initially, anyway) a pointer into the array of type int[2] which is designated by foo.a.
After line (2), p is now a one-past-the-end pointer. Is this dereferenceable?
The case of incrementing the pointer is covered by the section on the + operator (it is defined to have the same effect on p as p = p + 1). Here is a quote from C++11 [expr.add]#7:
Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
Line (2) does not cause UB by this clause. What about line (3)?
As far as I can see, there is no clause in the C++ standard that says dereferencing a one-past-the-end pointer causes undefined behaviour. In several places it says that iterators "might not be dereferencable", or "the library does not assume that the iterator is dereferenceable". But it carefully avoids saying "the iterator is not dereferenceable".
From the fact that we proved there is no padding, and the rules about standard-layout structs saying that elements cannot be reordered; we can conclude that now p must hold the address of the element foo.b[0]. Therefore, p is a pointer into the subobject foo.b, as well as being a one-past-the-end pointer for foo.a.
Note that in C99 it is different. The text in C99 for the + operator has (emphasis mine):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
So, in C99 line (3) causes undefined behaviour. However C++ deliberately omits the bolded line.
Rationale: I don't know what the actual rationale is. However, my "mental model" for C's pointers is that it permits the compiler to implement "fat pointers", i.e. bounds-checked pointers. A pointer may contain the bounds of the (sub-)object that it was pointed to; and so the executable can detect array bounds errors at runtime just based on the pointer value.
I believe the C99 text is compatible with this; and the compiler can produce an executable that aborts on line (3).
However , as already stated, C++ does not have equivalent text and I can find no justification in the C++ Standard for considering (3) to cause UB; nor (4) or (5).
Is it possible that:
One array's off-the-end pointer points to another array's first element?
I'm not sure by what you mean by off the end pointer. As c++ iterators use half open ranges, I'm assuming you mean the pointer that represents the end position in an iteration. As that is one past the end, yes, it might overlap a next array, and hence it may not be dereferenced.
When using pointers as iterators, addresses and not values are compared. End implies the next address beyond end.
Reading beyond the bound of an array might result in dirty read.
It could be possible you may hit another array body
but it could also be possible that you may hit an unallocated region or
in case of int pointer you may point to a 4 byte region shared by an array of two shorts.
Your pointer may try to access a region which does not belongs to your process. Fatal error!
Not recommended to go beyond the bounds.
Regards
Kajal
From Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
It seems that there is language specific to taking the address of one more than an array end.
Why would 2 or 2,000,000 past the end be an issue if it's not derefferenced?
Looking at some simple loop:
int array[];
...
for (int i = 0: i < array_max; ++i)
{
int * x = &array[i *2]; // Is this legal
int y=0;
if (i * 2 < array_max) // We check here before dereference
{
y = *x; // Legal dereference
}
...
}
Why or at what point does this become undefined, in practice it just sets a ptr to some value, why would it be undefined if it's not refferenced?
More specifically - what example of anything but what is expected to happen could there be?
The key issue with taking addresses beyond the end of an array are segmented architectures: you may overflow the representable range of the pointer. The existing rule already creates some level of pain as it means that the last object can't be right on the boundary of a segment. however, the ability to form this address was well established.
Since array[i *2] is equivalent to *((array) + (i*2)), we should look at the rules for pointer addition. C++11 §5.7 says:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
So you have undefined behaviour even if you don't perform indirection on the pointer (not to mention that you do perform indirection, due to the expression equivalence I gave at the beginning).
in practice it just sets a ptr to some value
In theory, just having a pointer that points somewhere invalid is not allowed.
Pointers are not integers: they are things that point to other things, or to nullity.
You can't just set them to whatever number you like.
in this example, there is nothing but assignment and then qualified dereferrencing - the value is only used if validated, the question is why is the setting of the value an issue?
Yeah, you'd have to be pretty unlucky to run into practical consequences of doing that. "Undefined behaviour" does not mean "always crash". Why should the standard actually mandate semantics for such an operation? What do you think such semantics should be?