Pointer Arithmetic confusion

Pointer Arithmetic confusion - c++

I know you can add an int to a pointer, subtract two pointers, and subtract an int from a pointer, but can you add a pointer to an int, i.e. write 5 + pointer?

You can, but restrictions apply. Pointer arithmetic is only valid within an array (or one past the end of an array).
Here are some of the rules:
5.7 Additive operators [expr.add]
5) [...] If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
and
6) When two pointers to elements of the same array object are subtracted, the result is the difference of the
subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
(Quoted here for reference.)
So
int* x = new int;
int* y = new int;
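is okay, but the following are all undefined behavior:
y - x;
x + 4;
y - 1;
Relational comparisons such as x < y between these two pointers are not undefined in C++, but their result is unspecified (equality comparisons with == and != are fine).
However, x + 1 and 1 + x are okay (a single object counts as an array of size 1).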
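A small sketch of the points above, using only built-in operators; the function names are hypothetical. It shows that 5 + pointer is the same operation as pointer + 5, and that a single heap-allocated int yields a valid one-past-the-end pointer that is compared but never dereferenced.

```cpp
#include <cassert>

// Reads the element at index 5 in three equivalent ways.
// Precondition: arr points to the first element of an array of at least 6 ints.
int element_at_5_three_ways(const int* arr) {
    int a = *(arr + 5);  // pointer + integer
    int b = *(5 + arr);  // integer + pointer: the same built-in operation
    int c = 5[arr];      // legal because E1[E2] means *(E1 + E2), but unidiomatic
    assert(a == b && b == c);
    return a;
}

// A single object behaves like an array of length one, so x + 1 and 1 + x
// both form the same one-past-the-end pointer. It is compared, never dereferenced.
bool one_past_single_object_is_valid() {
    int* x = new int(7);
    int* end1 = x + 1;
    int* end2 = 1 + x;
    bool ok = (end1 == end2) && (end1 - x == 1);
    delete x;
    return ok;
}
```

Going any further (x + 4, or y - x across two separate allocations) is exactly what the rules quoted above leave undefined.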

Adding an int to a pointer (in either order) is syntactically fine, but there are many issues to watch out for, such as overflow errors.
Ideally, you should do it only within an array.

Related

Multidimensional array indexing using pointer to elements

As far as I know, a multidimensional array on the stack occupies contiguous memory in row-major order. Is it undefined behavior, according to the ISO C++ Standard, to index a multidimensional array through a pointer to its elements? For example:
#include <iostream>
#include <type_traits>
int main() {
    int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
    constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
    int *p = &a[0][0];
    int i = p[11]; // <-- here
    p[19] = 20; // <-- here
    for (int k = 0; k < sz; ++k)
        std::cout << p[k] << ' '; // <-- and here
    return 0;
}
The above code compiles and runs correctly as long as the pointer does not go outside the bounds of array a. But does it work because of compiler-specific behavior, or because the language standard guarantees it? A reference to the ISO C++ Standard would be best.

The problem here is the strict aliasing rule, which in my draft n3337 for C++11 is in 3.10 Lvalues and rvalues [basic.lval] §10. It is an exhaustive list that does not explicitly allow aliasing a multidimensional array as a one-dimensional array of the whole size.
Arrays are indeed required to be allocated contiguously in memory, so the size of a multidimensional array, for example T arr[n][m], is the product of its dimensions and the size of an element: n * m * sizeof(T). After converting to char pointers, you can even do pointer arithmetic over the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).
Unfortunately, for any other type, the standard only allows pointer arithmetic inside one array (and by definition, dereferencing an array element is the same as dereferencing a pointer after pointer arithmetic: a[i] is *(a + i)). So if you respect both the rule on pointer arithmetic and the strict aliasing rule, global indexing of a multidimensional array is not defined by the C++11 standard, unless you go through char pointer arithmetic:
int a[3][4]{}; // value-initialized so that every read below sees a determinate value
int *p = &a[0][0]; // perfectly defined
int b = p[3]; // ok: you are in the same row, which means the same array
b = p[5]; // OOPS: you dereference past the declared array that forms the first row
char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetic inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q; // almost the same as p[5], but the behavior is defined
That char pointer arithmetic, along with the fear of breaking a lot of existing code, explains why all well-known compilers silently accept aliasing a multidimensional array as a 1D array of the same total size (it leads to the same generated code), but technically, pointer arithmetic across the whole array is only valid for char pointers.
(*) The standard declares in 1.7 The C++ memory model [intro.memory] that
The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
and later in 3.9 Types [basic.types] §2
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object
holds a valid value of type T, the underlying bytes making up the object can be copied into an array
of char or unsigned char.
And to copy them, you must access them through a char * or unsigned char *.
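A minimal sketch of the byte-level route described in this answer, assuming a 3x4 int array; read_flat_via_bytes and round_trip_through_bytes are hypothetical helpers. Rather than casting back to int* as in the example above, it copies the relevant bytes out with std::memcpy, which is one common way to read through an object's representation.

```cpp
#include <cassert>
#include <cstring>

// Reads the k-th int of a 3x4 array (row-major) by copying its bytes, so no
// int* ever crosses a row boundary. Precondition: 0 <= k < 12.
int read_flat_via_bytes(const int (&a)[3][4], int k) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&a);
    int value;
    std::memcpy(&value, bytes + k * sizeof(int), sizeof value);
    return value;
}

// Copies the whole array's bytes into an unsigned char buffer and back, as
// [basic.types] permits for trivially copyable types, then checks the values.
bool round_trip_through_bytes() {
    int a[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
    unsigned char buffer[sizeof a];
    std::memcpy(buffer, a, sizeof a);

    int b[3][4] = {};
    std::memcpy(b, buffer, sizeof b);

    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            if (a[r][c] != b[r][c]) return false;
    return sizeof a == 3 * 4 * sizeof(int);  // contiguous, no padding
}
```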

I believe the behavior in your example is technically undefined.
The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is, a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. This means that a[0][0] and a[1][0] are not elements of the same array.
[expr.add]/4 says the following:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original element such that the difference of
the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array object, and if the expression Q points
one past the last element of an array object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined
So, since p[11] expands to *(p + 11) and since p and p + 11 are not elements of the same array (one is an element of a[0] and the other is more than one element past the end of a[0]), the behavior of that addition is undefined.
I would, however, be very surprised to find any implementation where such an addition produced anything other than the result you expect.
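If you need a flat index that stays within the rules quoted above, one option is to split it back into a row and a column. This is a minimal sketch assuming the 5x4 shape from the question; flat_at, kRows and kCols are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 5;
constexpr std::size_t kCols = 4;

// Maps a flat index k in [0, kRows * kCols) to a[k / kCols][k % kCols].
// Each pointer arithmetic step stays inside a single array (first a, then
// one inner row), so no pointer ever crosses from a[r] into a[r + 1].
int& flat_at(int (&a)[kRows][kCols], std::size_t k) {
    return a[k / kCols][k % kCols];
}
```

With the question's data, flat_at(a, 11) refers to a[2][3], the same element p[11] was meant to reach, and flat_at(a, 19) refers to a[4][3].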

If you declare
int arr[3][4][5];
the type of arr is int[3][4][5], the type of arr[0] is int[4][5], and so on: an array of arrays of arrays, but NOT an array of pointers. What happens if we increment the first index? The pointer moves forward by the size of one array element, but an element of arr is itself a two-dimensional array! So arr + 1 advances by sizeof(int[4][5]) bytes, which is sizeof(int[4][5])/sizeof(int) = 20 ints, i.e. the same address as (int*)arr + 20.
Iterating this way, we find that arr[a][b][c], which is *(*(*(arr + a) + b) + c), refers to the same location as the following, given that arrays never contain padding between elements (as required for compatibility of POD types with C99):
*((int*)arr + 20*a + 5*b + c)
From [expr.add]:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression
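The layout claim above can be checked without forming any int* outside its own row. This is a sketch that measures byte offsets with char pointers into the array's object representation (the byte-level view described in the first answer); byte_offset is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of arr[a][b][c] from the start of arr. Because arrays have no
// padding between elements, it should equal (20*a + 5*b + c) * sizeof(int).
std::ptrdiff_t byte_offset(const int (&arr)[3][4][5], int a, int b, int c) {
    const char* base = reinterpret_cast<const char*>(&arr);
    const char* elem = reinterpret_cast<const char*>(&arr[a][b][c]);
    return elem - base;
}

static_assert(sizeof(int[3][4][5]) == 3 * 4 * 5 * sizeof(int),
              "arrays contain no padding between elements");
static_assert(sizeof(int[4][5]) / sizeof(int) == 20,
              "one step of the first index spans 20 ints");
```

This only confirms the addresses coincide; it does not make p + 11 style arithmetic across rows defined.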

Is pointer comparison undefined or unspecified behavior in C++?

The C++ Programming Language 3rd edition by Stroustrup says that,
Subtraction of pointers is defined only when both pointers point to
elements of the same array (although the language has no fast way of
ensuring that is the case). When subtracting one pointer from another,
the result is the number of array elements between the two pointers
(an integer). One can add an integer to a pointer or subtract an
integer from a pointer; in both cases, the result is a pointer value.
If that value does not point to an element of the same array as the
original pointer or one beyond, the result of using that value is
undefined.
For example:
void f ()
{
    int v1 [10];
    int v2 [10];
    int i1 = &v1[5] - &v1[3]; // i1 = 2
    int i2 = &v1[5] - &v2[3]; // result undefined
}
I was reading about unspecified behavior on Wikipedia. It says that
In C and C++, the comparison of pointers to objects is only strictly defined if the pointers point to members of the same object, or elements of the same array.
Example:
int main(void)
{
    int a = 0;
    int b = 0;
    return &a < &b; /* unspecified behavior in C++, undefined in C */
}
So, I am confused. Which one is correct? Wikipedia or Stroustrup's book? What C++ standard says about this?
Correct me if I am misunderstanding something.

Note that pointer subtraction and pointer comparison are different operations with different rules.
C++14 5.7/6, on subtracting pointers:
Unless both pointers point to elements of the same array object or one past the last element of the array object, the behavior is undefined.
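A short sketch of the defined case of subtraction, mirroring Stroustrup's v1 example; subscript_distance is a hypothetical name. The undefined case (&v1[5] - &v2[3], pointers into two different arrays) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>

// Pointer subtraction within one array yields the difference of the
// subscripts, as a std::ptrdiff_t. One-past-the-end also participates.
std::ptrdiff_t subscript_distance() {
    int v1[10] = {};
    std::ptrdiff_t d = &v1[5] - &v1[3];          // same array: defined
    std::ptrdiff_t to_end = (v1 + 10) - &v1[0];  // one past the end: defined
    assert(to_end == 10);
    return d;
}
```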
C++14 5.9/3-4:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true, and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
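A sketch of the comparisons the rules above do define; Pair and both function names are hypothetical. Comparing &a < &b for two unrelated locals, as in the Wikipedia example, falls under the final "unspecified" sentence and is not shown.

```cpp
#include <cassert>

struct Pair {
    int first;   // declared first
    int second;  // declared later, with the same (public) access control
};

// Third rule: non-static members of the same object with the same access
// control; the later declared member compares greater.
bool later_member_compares_greater() {
    Pair s{1, 2};
    return &s.second > &s.first;
}

// First and second rules: within one array the higher subscript compares
// greater, and the one-past-the-end pointer compares greater than any element.
bool higher_subscript_compares_greater() {
    int arr[4] = {};
    return &arr[3] > &arr[1] && arr + 4 > &arr[3];
}
```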

pointer comparisons "<" with one past the last element of an array object

I know that pointer comparison with < is allowed by the C standard only when the pointers point into the same object (such as an array).
If we take an array:
int array[10];
int *ptr = &array[0];
is comparing ptr to array+10 allowed? Is the array+10 pointer considered outside the array's memory, so that the comparison is not allowed?
For example:
for(ptr=&array[0]; ptr<(array+10); ptr++) {...}

Yes, a pointer is permitted to point to the location just past the end of the array. However, you aren't permitted to dereference such a pointer.
C99 6.5.6/8 Additive operators (emphasis added)
if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.
And, specifically for comparison operations on pointers:
C99 6.5.8/5 Relational operators
If the expression P points to an element of an array object and the
expression Q points to the last element of the same array object, the
pointer expression Q+1 compares greater than P. In all other cases,
the behavior is undefined.

Yes, that is allowed, and C++ relies heavily on it. (C doesn't use it quite as much, but in C++, a very common way to denote ranges is to have a pointer, or more generally an iterator, pointing to the first element, and another pointing one past the end of the range.)
It is legal for such a pointer to exist, and to compare it against the rest of the array.
But it is not legal to ever dereference the pointer.
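A minimal sketch of that range idiom applied to the question's loop; sum_range is a hypothetical name. The end pointer is only compared against, never dereferenced.

```cpp
#include <cassert>

// Sums the half-open range [first, last). Comparing ptr < last is allowed
// because both point into the same array (or one past its end).
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* ptr = first; ptr < last; ++ptr) {
        total += *ptr;
    }
    return total;
}
```

For an int array[10], sum_range(array, array + 10) visits every element, and sum_range(array, array) is an empty range.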

Are comparisons on out-of-range pointers well-defined?

Given the following code:
char buffer[1024];
char * const begin = buffer;
char * const end = buffer + 1024;
char *p = begin + 2000;
if (p < begin || p > end)
    std::cout << "pointer is out of range\n";
Are the comparisons performed (p < begin and p > end) well-defined? Or does this code have undefined behaviour because the pointer has been advanced past the end of the array?
If the comparisons are well defined, what is that definition?
(extra credit: is the evaluation of begin + 2000 itself undefined behaviour?)

I'll assume the C++11 standard. According to section 5.7 (Additive operators) paragraph 5, the behavior of char *p = begin + 2000 is undefined before you even get to the comparison:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.

The evaluation of begin+2000 is undefined, it's going past the end of the array - you can go up to one past the end, but not further.
From C++11 §5.7/5 Additive operators:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
For pointer comparisons to be specified, assuming you have valid pointers to start with, they essentially need to point into the same array (or one past its end), or to non-static data members with the same access control in the same object (unless it's a union...).
The details are in §5.9/2 Relational operators:
Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q
both yield false.
If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
— If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
Other pointer comparisons are unspecified.

Your program's behavior is undefined, but not because of the comparison.
The evaluation of the expression begin + 2000 has undefined behavior because the result would point more than one element past the end of the 1024-element array.
Quoting C++11 (actually the N3485 draft), 5.7p4 [expr.add]:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. [...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
In short, just computing an out-of-bounds pointer has undefined behavior; it doesn't matter what operations you perform on that pointer after that.
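Since the arithmetic itself is the problem, a range check has to happen on the integer offset before any pointer is formed. This is a minimal sketch under that assumption; checked_advance is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <cstddef>

// Returns begin + offset if that pointer would lie within [begin, begin + size],
// or nullptr otherwise. The check is on the offset, so an out-of-range
// pointer such as begin + 2000 is never computed.
char* checked_advance(char* begin, std::size_t size, std::size_t offset) {
    if (offset > size) {
        return nullptr;  // would be more than one past the end
    }
    return begin + offset;  // at most one past the end: well-defined
}
```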

Is it undefined behavior to form a pointer range from a stack address?

Some C or C++ programmers are surprised to find out that even storing an invalid pointer is undefined behavior. However, for heap or stack arrays, it's okay to store the address of one past the end of the array, which allows you to store "end" positions for use in loops.
But is it undefined behavior to form a pointer range from a single stack variable, like:
char c = 'X';
char* begin = &c;
char* end = begin + 1;
for (; begin != end; ++begin) { /* do something */ }
Although the above example is pretty useless, this might be useful in the event that some function expects a pointer range, and you have a case where you simply have a single value to pass it.
Is this undefined behavior?

This is allowed, the behavior is defined and both begin and end are safely-derived pointer values.
In the C++ standard section 5.7 ([expr.add]) paragraph 4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
When using C, a similar clause can be found in the C99/N1256 standard, section 6.5.6 paragraph 7.
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
As an aside, in section 3.7.4.3 ([basic.stc.dynamic.safety]) "Safely-derived pointers" there is a footnote:
This section does not impose restrictions on dereferencing pointers to memory not allocated by ::operator new. This maintains the ability of many C++ implementations to use binary libraries and components written in other languages. In particular, this applies to C binaries, because dereferencing pointers to memory allocated by malloc is not restricted.
This suggests that pointer arithmetic throughout the stack is implementation-defined behavior, not undefined behavior.

I believe that, legally, you may treat a single object as an array of size one. In addition, it is most definitely legal to form a pointer one past the end of any array as long as it's not dereferenced. So I believe that it is not UB.

It is not undefined behavior as long as you don't dereference the invalid iterator.
You are allowed to hold a pointer one past the end of your allocation, but not to dereference it.

5.7-5 of ISO14882:2011(e) states:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i + n-th and i −
n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
Unless I overlooked something there, the addition only applies to pointers pointing to the same array. For everything else, the last sentence applies: "otherwise, the behaviour is undefined"
Edit:
However, once you also take 5.7-4 into account, it turns out that the operation you do is (virtually) on an array, so that last sentence does not apply:
For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of
length one with the type of the object as its element type.

In general it would be undefined behaviour to point beyond the memory space, however there is an exception for "one past the end", which is valid according to the standard.
Therefore in the particular example, &c+1 is a valid pointer but cannot be safely dereferenced.
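A sketch of the use case from the question: passing a single char to a function that expects a pointer range. count_upper is a hypothetical function standing in for "some function that expects a pointer range".

```cpp
#include <cassert>
#include <cctype>

// Hypothetical function taking a half-open pointer range [first, last).
int count_upper(const char* first, const char* last) {
    int n = 0;
    for (; first != last; ++first) {
        if (std::isupper(static_cast<unsigned char>(*first))) ++n;
    }
    return n;
}

// A single char behaves like an array of length one, so [&c, &c + 1) is a
// valid range; &c + 1 is only compared against, never dereferenced.
int count_upper_in_single_char(char c) {
    return count_upper(&c, &c + 1);
}
```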

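You could define c as an array of size 1:
char c[1] = { 'X' };
Then the code would no longer depend on the rule that a single object behaves like an array of length one, which removes any doubt about undefined behavior.
The resulting code should be identical.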
