Is subtraction of non-divisible pointer addresses defined in C? In C++?
Here's an example:
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */
void* p = malloc(64);
int* one = (int*)((char*)p);
int* two = (int*)((char*)p + 7);
/* (strictly, %p, %zu and %td are the correct conversion specifiers here) */
printf("%x %x %d %d\n", one, two, sizeof(int), two - one);
I get the output 8a94008 8a9400f 4 1, so it seems like it does the division and truncates the remainder. Is the behavior defined?
This is undefined behavior according to §5.7/6 of the C++ standard (C has the equivalent rule in §6.5.6):
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
In your code, pointer two does not point to an element of the same int array as pointer one. In fact, it does not point to any int object within the allocation at all, because it points into the "middle" of one (which is itself undefined behavior).
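For contrast, here is a minimal sketch (the array name is illustrative) of a subtraction that this rule does define, because both pointers point to elements of the same int array:
#include <cstddef>
void good_subtraction()
{
    int arr[16];
    int* one = &arr[0];
    int* two = &arr[7];
    std::ptrdiff_t d = two - one;  // well-defined: 7, the difference of the subscripts
    (void)d;                       // silence the unused-variable warning
}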
Under some assumptions1, in C the third line:
int* two = (int*)((char*)p + 7);
already causes undefined behavior, because the resulting pointer isn't correctly aligned for the type it references2.
1 The assumption is that the alignment requirement for type int is higher than for type char. This is true on most modern architectures. Since all alignments must be powers of two3 and the value 7 is not a multiple of any power of two greater than one, adding it to the (suitably aligned) pointer p cannot produce a pointer that satisfies an alignment requirement stricter than char's, such as int's.
2 (Quoted from: ISO/IEC 9899:201x 6.3.2.3 Pointers 7.)
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined.
3 (Quoted from: ISO/IEC 9899:201x 6.2.8 Alignment of objects 4.)
Every valid alignment value shall be a nonnegative integral power of two.
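As an aside, a hedged sketch of how such a misalignment can be detected at run time; it relies on std::uintptr_t (an optional type) and the implementation-defined mapping from pointers to integers, so treat it as illustrative rather than portable:
#include <cstdint>
bool suitably_aligned_for_int(const char* p)
{
    // An address is suitably aligned for int if it is a multiple of alignof(int).
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}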
Related
The C++ Programming Language, 3rd edition, by Stroustrup says:
Subtraction of pointers is defined only when both pointers point to
elements of the same array (although the language has no fast way of
ensuring that is the case). When subtracting one pointer from another,
the result is the number of array elements between the two pointers
(an integer). One can add an integer to a pointer or subtract an
integer from a pointer; in both cases, the result is a pointer value.
If that value does not point to an element of the same array as the
original pointer or one beyond, the result of using that value is
undefined.
For example:
void f()
{
    int v1[10];
    int v2[10];
    int i1 = &v1[5] - &v1[3];  // i1 = 2
    int i2 = &v1[5] - &v2[3];  // result undefined
}
I was reading about unspecified behavior on Wikipedia. It says that
In C and C++, the comparison of pointers to objects is only strictly defined if the pointers point to members of the same object, or elements of the same array.
Example:
int main(void)
{
    int a = 0;
    int b = 0;
    return &a < &b;  /* unspecified behavior in C++, undefined in C */
}
So I am confused: which one is correct, Wikipedia or Stroustrup's book? What does the C++ standard say about this?
Correct me if I am misunderstanding something.
Note that pointer subtraction and pointer comparison are different operations with different rules.
C++14 5.7/6, on subtracting pointers:
Unless both pointers point to elements of the same array object or one past the last element of the array object, the behavior is undefined.
C++14 5.9/3-4:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true, and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
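As a practical aside (not part of the quoted rules): if you need a total order over arbitrary object pointers, for example as the key comparison of an ordered container, the standard library's comparison function objects are specified to provide one even where the built-in < is unspecified. A minimal sketch:
#include <functional>
bool before(const int* p, const int* q)
{
    // std::less on pointer types yields a total order,
    // even when the built-in < would be unspecified for unrelated objects.
    return std::less<const int*>{}(p, q);
}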
I know you can add an int to a pointer, subtract two pointers, and subtract an int from a pointer, but can you add an int to a pointer with the integer on the left, as in 5 + pointer?
You can, but restrictions apply. Pointer arithmetic is only valid within an array (or one past the end of an array).
Here are the relevant rules:
5.7 Additive operators [expr.add]
5) [...] If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
and
6) When two pointers to elements of the same array object are subtracted, the result is the difference of the
subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
(quoted here for reference).
So
int* x = new int;
int* y = new int;
is okay, but:
y - x;
x + 4;
y - 1;
all have undefined behavior, and even comparing x and y with the relational operators yields unspecified results.
However, x + 1 and 1 + x are okay (a single object counts as an array of size one).
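To make the "5 + pointer" part concrete, a small sketch that stays inside a single array (names are illustrative):
#include <cassert>
void commutative_add()
{
    int arr[10] = {};
    int* p = arr;
    int* a = p + 5;  // fine: points to arr[5]
    int* b = 5 + p;  // also fine: integer + pointer is allowed and equivalent
    assert(a == b);
}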
Adding an int to a pointer is syntactically fine, but there are issues to watch out for, e.g. stepping outside the array's bounds.
Ideally, you should only do it within an array.
Given the following code:
#include <iostream>
char buffer[1024];
char * const begin = buffer;
char * const end = buffer + 1024;
char *p = begin + 2000;
if (p < begin || p > end)
    std::cout << "pointer is out of range\n";
Are the comparisons performed (p < begin and p > end) well-defined? Or does this code have undefined behaviour because the pointer has been advanced past the end of the array?
If the comparisons are well defined, what is that definition?
(extra credit: is the evaluation of begin + 2000 itself undefined behaviour?)
I'll assume the C++11 standard. According to section 5.7 (Additive operators) paragraph 5, the evaluation of begin + 2000 is undefined first, before you even get to the comparison:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
The evaluation of begin + 2000 is undefined: it goes past the end of the array. You can go up to one past the end, but no further.
From C++11 §5.7/5 Additive operators:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
For pointer comparisons to be specified, assuming you have valid pointers to start with, they essentially need to be pointers into the same array (or one past its end), or pointers to non-static data members, with the same access control, of the same object (unless the class is a union...).
The details are in §5.9/2 Relational operators:
Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q
both yield false.
If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
Other pointer comparisons are unspecified.
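A small sketch of the "later declared member compares greater" case quoted above (the struct name is illustrative; both members are public and the class is not a union, so the comparison is defined):
struct S
{
    int a;
    int b;
};
bool member_order(const S& s)
{
    // b is declared after a, so &s.b compares greater than &s.a.
    return &s.b > &s.a;  // true
}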
Your program's behavior is undefined, but not because of the comparison.
The evaluation of the expression begin + 2000 has undefined behavior because the result would point more than one element past the end of the 1024-element array.
Quoting C++11 (actually the N3485 draft), 5.7p4 [expr.add]:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. [...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
In short, just computing an out-of-bounds pointer has undefined behavior; it doesn't matter what operations you perform on that pointer after that.
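One way to honor this in the question's code is to range-check the offset as an integer and only form the pointer once it is known to be within the array (or one past its end); a sketch along those lines, with the offset made a parameter for illustration:
#include <cstddef>
#include <iostream>
void checked(std::size_t offset)
{
    char buffer[1024];
    // Compare integers first, so an out-of-bounds pointer is never computed.
    if (offset > sizeof buffer) {
        std::cout << "offset is out of range\n";
        return;
    }
    char* p = buffer + offset;  // well-defined: within the array or one past its end
    (void)p;
}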
Some C or C++ programmers are surprised to find out that even storing an invalid pointer is undefined behavior. However, for heap or stack arrays, it's okay to store the address of one past the end of the array, which allows you to store "end" positions for use in loops.
But is it undefined behavior to form a pointer range from a single stack variable, like:
char c = 'X';
char* begin = &c;
char* end = begin + 1;
for (; begin != end; ++begin) { /* do something */ }
Although the above example is pretty useless, this might be useful in the event that some function expects a pointer range, and you have a case where you simply have a single value to pass it.
Is this undefined behavior?
This is allowed, the behavior is defined and both begin and end are safely-derived pointer values.
In the C++ standard section 5.7 ([expr.add]) paragraph 4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
When using C, a similar clause can be found in the C99/N1256 standard, section 6.5.6 paragraph 7.
For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
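So a pointer range formed from a single object can be handed to code that expects a [first, last) range; a minimal sketch (the function name process is illustrative):
void process(const char* first, const char* last)
{
    for (; first != last; ++first) { /* use *first */ }
}
void call_with_single_value()
{
    char c = 'X';
    process(&c, &c + 1);  // &c + 1 is a valid one-past-the-end pointer and is never dereferenced
}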
As an aside, in section 3.7.4.3 ([basic.stc.dynamic.safety]) "Safely-derived pointers" there is a footnote:
This section does not impose restrictions on dereferencing pointers to memory not allocated by ::operator new. This maintains the ability of many C++ implementations to use binary libraries and components written in other languages. In particular, this applies to C binaries, because dereferencing pointers to memory allocated by malloc is not restricted.
This suggests that pointer arithmetic throughout the stack is implementation-defined behavior, not undefined behavior.
I believe that, legally, you may treat a single object as an array of size one. In addition, it is definitely legal to take a pointer one past the end of any array, as long as it is not dereferenced. So I believe it is not UB.
It is not undefined behavior as long as you don't dereference the past-the-end pointer.
You are allowed to hold a pointer one past the end of your allocation, but not to dereference it.
§5.7/5 of ISO/IEC 14882:2011 (C++11) states:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i + n-th and i −
n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
Unless I overlooked something there, the addition is only defined for pointers into the same array. For everything else, the last sentence applies: "otherwise, the behavior is undefined".
edit:
Indeed, when you take 5.7/4 into account, it turns out that the operation is (effectively) performed on an array, so that sentence does not apply:
For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of
length one with the type of the object as its element type.
In general it would be undefined behavior to form a pointer beyond the object's storage; however, there is an exception for "one past the end", which is valid according to the standard.
Therefore in the particular example, &c+1 is a valid pointer but cannot be safely dereferenced.
You could define c as an array of size 1:
char c[1] = { 'X' };
Then the undefined behavior would become defined behavior.
The resulting code should be identical.
I've always wondered: isn't ptrdiff_t supposed to be able to hold the difference of any two pointers by definition? How come it fails when the two pointers are too far? (I'm not pointing at any particular language... I'm referring to all languages which have this type.)
(e.g. subtract the pointer with address 1 from the byte pointer with address 0xFFFFFFFF when you have 32-bit pointers, and it overflows the sign bit...)
No, it is not.
§5.7 [expr.add] (from N3225, the C++0x FCD)
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined.
In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i − j provided the value fits in an object of type std::ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.
Note the number of times undefined appears in the paragraph. Also note that you can only subtract pointers if they point within the same object.
No, because there is no such thing as the difference between "any two pointers". You can only subtract pointers to elements of the same array (or the pointer to the location just past the end of an array).
To add a more explicit standard quote, ISO/IEC 9899:1999 §J.2/1 states:
The behavior is undefined in the following circumstances:
[...]
-- The result of subtracting two pointers is not representable in an object of type
ptrdiff_t (6.5.6).
It is entirely acceptable for ptrdiff_t to be the same size as pointer types, provided the overflow semantics are defined by the compiler so that any difference is still representable. There is no guarantee that a negative ptrdiff_t means that the second pointer lives at a lower address in memory than the first, since the subtraction is only defined within a single array in the first place. (Note, however, that ptrdiff_t is required to be a signed integer type.)
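A hedged sketch of the corner case the J.2 item describes: a difference of char pointers into one object is only guaranteed representable when the object is no larger than PTRDIFF_MAX bytes, so a size check can be made before subtracting:
#include <cstddef>
#include <cstdint>
// Sketch only: if the object occupies at most PTRDIFF_MAX bytes, any difference
// of char pointers within it fits in ptrdiff_t.
bool difference_is_representable(std::size_t object_size_in_bytes)
{
    return object_size_in_bytes <= static_cast<std::size_t>(PTRDIFF_MAX);
}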
Over/underflow is mathematically well-defined for fixed-size integer arithmetic:
(1 - 0xFFFFFFFF) mod 2^32
= (1 + (2^32 - 0xFFFFFFFF)) mod 2^32
= (1 + 1) mod 2^32
= 2
This is the correct result!
Specifically, the result after over/underflow is an alias of the correct integer. In fact, every non-representable integer is aliased with (indistinguishable from) exactly one representable integer: count to infinity in fixed-size integers and you will repeat yourself, going round and round like the hand of an analog clock.
An N-bit integer represents any real integer modulo 2^N. (You cannot literally write % (1 << 32) in 32-bit C, since 1 << 32 overflows; for an unsigned 32-bit type the reduction modulo 2^32 happens implicitly.)
I believe C guarantees mathematical correctness of over/underflow, but only for unsigned integers. Signed overflow is assumed never to happen (for the sake of optimization).
In practice, signed integers use a two's complement representation, which behaves identically to unsigned arithmetic for addition and subtraction, so the wrapped result usually comes out right for signed integers too (although the language does not guarantee it).
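To make that distinction concrete, a small sketch: the unsigned computation from the question above is fully defined by the modulo rule, while the same computation on signed 32-bit values would overflow and be undefined:
#include <cassert>
#include <cstdint>
void wraparound()
{
    std::uint32_t a = 1;
    std::uint32_t b = 0xFFFFFFFFu;
    std::uint32_t d = a - b;  // defined: (1 - 0xFFFFFFFF) mod 2^32 == 2
    assert(d == 2);
    // Doing the same with std::int32_t values would overflow the signed range,
    // and signed overflow is undefined behavior.
}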