Is it safe to calculate pointer offset using nullptr? - c++

Suppose I have two pointers:
char* p1 = nullptr;
char* p2 = static_cast<char*>( std::malloc( 4 ) );
std::size_t offset = p2 - p1;
Is it safe to get offset in this way? So far it works fine on my computer. But I'm wondering whether the offset can exceed the maximum value of size_t, so that this method fails.

This is undefined behavior, from the draft C++ standard section 5.7 Additive operators:
When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array
elements. The type of the result is an implementation-defined signed
integral type; this type shall be the same type that is defined as
std::ptrdiff_t in the <cstddef> header (18.2). [...] Unless both
pointers point to elements of the same array object, or one past the
last element of the array object, the behavior is
undefined.
Also, as the quote mentions, the result has type std::ptrdiff_t, not std::size_t.
You can, on the other hand, add or subtract the value 0, which is covered in paragraph 7:
If the value 0 is added to or subtracted from a pointer value, the
result compares equal to the original pointer value. If two pointers
point to the same object or both point one past the end of the same
array or both are null, and the two pointers are subtracted, the
result compares equal to the value 0 converted to the type
std::ptrdiff_t.
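A minimal sketch of the two null-pointer operations paragraph 7 permits. The function names are hypothetical and chosen only for illustration; any nonzero offset applied to a null pointer would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

// Subtracting two null pointers of the same type is defined and yields 0.
std::ptrdiff_t null_difference() {
    char* a = nullptr;
    char* b = nullptr;
    return a - b;  // well-defined: both are null pointer values
}

// Adding 0 to a null pointer is defined and yields a null pointer.
bool null_plus_zero_is_null() {
    char* p = nullptr;
    char* q = p + 0;  // well-defined; any other offset would be UB
    return q == nullptr;
}
```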
If you want to convert a pointer to an integral value, then you should use either intptr_t or uintptr_t:
intptr_t integer type capable of holding a pointer
uintptr_t unsigned integer type capable of holding a pointer
For example:
uintptr_t ip = reinterpret_cast<uintptr_t>( p2 ) ;
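As a hedged illustration: what the language does guarantee is that a pointer converted to an integer type large enough to hold it (such as std::uintptr_t, where the implementation provides it) and back to the same pointer type compares equal to the original. The numeric value itself is implementation-defined. The helper name round_trips is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Converts p to std::uintptr_t and back. The round trip is guaranteed to
// restore the original pointer; the integer value in between is
// implementation-defined and should not be relied on for offsets.
bool round_trips(char* p) {
    std::uintptr_t ip = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<char*>(ip) == p;
}
```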

No, it is not safe. Basically, the only thing you can do with a null pointer is compare it with another pointer. As for addition and subtraction, you can only add zero to or subtract zero from a null pointer, or subtract two null pointers, which may be useful in generic programming. Your case is undefined behaviour.

In addition to the answer by Wojtek, pointer arithmetic can and should only be done between related pointers. For example, if you have char* p3 = p2 + 4, then you could do p3 - p2 to get the difference between the two pointers, and that would be legal.
However, things like
char* p4 = new char[4];
std::cout << p4 - p2 << '\n';
is not legal, as p2 and p4 are not related.
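A small sketch of the legal case, using new char[4] in place of the question's std::malloc(4) so that the array object is unambiguous. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// p2 and p3 both refer to the same 4-element array (p3 is one past the end),
// so subtracting them is well-defined.
std::ptrdiff_t related_difference() {
    char* p2 = new char[4];
    char* p3 = p2 + 4;           // one past the last element: allowed
    std::ptrdiff_t d = p3 - p2;  // same array: d == 4
    delete[] p2;
    return d;
}
```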

Related

Pointer arithmetics with two different buffers

Consider the following code:
int* p1 = new int[100];
int* p2 = new int[100];
const ptrdiff_t ptrDiff = p1 - p2;
int* p1_42 = &(p1[42]);
int* p2_42 = p1_42 + ptrDiff;
Now, does the Standard guarantee that p2_42 points to p2[42]? If not, is it always true on Windows, Linux or the WebAssembly heap?
To add the standard quote:
[expr.add]/5
When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).
(5.1) If P and Q both evaluate to null pointer values, the result is 0.
(5.2) Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.
(5.3) Otherwise, the behavior is undefined.
[ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
(5.1) does not apply, as the pointers are not null pointers. (5.2) does not apply, because the pointers do not point into the same array. So we are left with (5.3): UB.
const ptrdiff_t ptrDiff = p1 - p2;
This is undefined behavior. Subtraction between two pointers is well defined only if they point to elements in the same array. ([expr.add] ¶5.3).
When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header ([support.types]).
If P and Q both evaluate to null pointer values, the result is 0.
Otherwise, if P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j.
Otherwise, the behavior is undefined.
And even if there were some hypothetical way to obtain this value legally, the subsequent addition would still be illegal, because even pointer + integer addition is restricted to stay inside the boundaries of the array ([expr.add] ¶4.2):
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.
Otherwise, the behavior is undefined.
The third line is Undefined Behavior, so the Standard allows anything after that.
It's only legal to subtract two pointers pointing to (or after) the same array.
Windows or Linux aren't really relevant; compilers, and especially their optimizers, are what break your program. For instance, since subtraction is only defined within a single array, an optimizer might assume that p1 and p2 point into the same array and, because both point to the beginning of an int[100], conclude that p1-p2 has to be 0.
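If the goal is simply to reach the element of p2 at the same index as p1_42, a well-defined sketch subtracts only pointers into the same array and then adds the resulting index to the other array. The helper corresponding_element is hypothetical, and it assumes both arrays have the same length.

```cpp
#include <cassert>
#include <cstddef>

// Given a pointer `elem` into array `from`, return a pointer to the element
// with the same index in array `to`. Every subtraction and addition stays
// within a single array, so each step is well-defined.
int* corresponding_element(const int* from, const int* elem, int* to) {
    std::ptrdiff_t index = elem - from;  // same array: well-defined
    return to + index;                   // valid if `to` has at least as many elements
}
```

Usage with the question's arrays: corresponding_element(p1, p1_42, p2) yields &p2[42].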
The Standard allows for implementations on platforms where memory is divided into discrete regions which cannot be reached from each other using pointer arithmetic. As a simple example, some platforms use 24-bit addresses that consist of an 8-bit bank number and a 16-bit address within a bank. Adding one to an address that identifies the last byte of a bank will yield a pointer to the first byte of that same bank, rather than the first byte of the next bank. This approach allows address arithmetic and offsets to be computed using 16-bit math rather than 24-bit math, but requires that no object span a bank boundary. Such a design would impose some extra complexity on malloc, and would likely result in more memory fragmentation than would otherwise occur, but user code wouldn't generally need to care about the partitioning of memory into banks.
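To make the banked scheme concrete, here is a hypothetical model (not any real platform's API) of a 24-bit address built from an 8-bit bank number and a 16-bit offset. Arithmetic touches only the offset, so stepping past the last byte of a bank wraps to the start of the same bank.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 24-bit banked address: 8-bit bank plus 16-bit offset.
struct BankedAddress {
    std::uint8_t bank;
    std::uint16_t offset;
};

// Advances the address using 16-bit math only: the offset wraps modulo 2^16
// and the bank number never changes.
BankedAddress advance(BankedAddress a, std::uint16_t n) {
    a.offset = static_cast<std::uint16_t>(a.offset + n);
    return a;
}
```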
Many platforms do not have such architectural restrictions, and some compilers which are designed for low-level programming on such platforms will allow address arithmetic to be performed between arbitrary pointers. The Standard notes that a common way of treating Undefined Behavior is "behaving during translation or program execution in a documented manner characteristic of the environment", and support for generalized pointer arithmetic in environments that support it would fit nicely under that category. Unfortunately, the Standard fails to provide any means of distinguishing implementations that behave in such useful fashion and those which don't.

Subtraction of non-divisible pointer addresses

Is subtraction of non-divisible pointer addresses defined in C? In C++?
Here's an example:
void* p = malloc(64);
int* one = (int*)((char*)p);
int* two = (int*)((char*)p + 7);
printf("%x %x %d %d\n", one, two, sizeof(int), two - one);
I get the output 8a94008 8a9400f 4 1, so it seems like it does the division and truncates the remainder. Is the behavior defined?
This is undefined behavior according to 5.7/6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
In your code, pointer two does not point to an element of the same int array as pointer one. In fact, it does not point to any int element of the block at p at all, because it points into the "middle" of one of the elements (which is itself undefined behavior).
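If what you actually want is the byte distance, a sketch that keeps both pointers as char* into the same allocation avoids both the misaligned int* and the silent truncating division. The function name is hypothetical; if an element count is needed, divide by sizeof(int) explicitly so the truncation is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Computes the byte distance from the start of a buffer to position 7.
// Both operands are char* into the same allocation, so the subtraction is
// well-defined and no implicit division by sizeof(int) takes place.
std::ptrdiff_t byte_offset_demo() {
    char* base = static_cast<char*>(std::malloc(64));
    if (base == nullptr) return -1;  // allocation failure
    char* seventh = base + 7;
    std::ptrdiff_t bytes = seventh - base;
    std::free(base);
    return bytes;
}
```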
Under some assumptions [1], in C the third line:
int* two = (int*)((char*)p + 7);
already causes undefined behavior, because the resulting pointer isn't correctly aligned for the type it references [2].
[1] The assumption is that the alignment requirement for type int is higher than for type char. This is true on most modern architectures. All alignments must be powers of two [3], so an int alignment greater than 1 is even. Since p comes from malloc, it is suitably aligned for int, and adding the odd value 7 to it cannot produce a pointer that is aligned as strictly as type int requires.
[2] (Quoted from ISO/IEC 9899:201x, 6.3.2.3 Pointers, paragraph 7.)
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined.
[3] (Quoted from ISO/IEC 9899:201x, 6.2.8 Alignment of objects, paragraph 4.)
Every valid alignment value shall be a nonnegative integral power of two.
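A hedged sketch of checking alignment before converting a char* to int*. Converting a pointer to std::uintptr_t has an implementation-defined result, so this check assumes the common flat-address mapping where the integer equals the byte address. The helper name aligned_for_int is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Returns true if `p` is suitably aligned for an int, assuming the usual
// flat mapping from pointers to std::uintptr_t values.
bool aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

Because malloc returns memory aligned for any fundamental type, the base pointer passes the check, while base + 7 fails whenever alignof(int) is greater than 1.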

Is pointer comparison undefined or unspecified behavior in C++?

The C++ Programming Language 3rd edition by Stroustrup says that,
Subtraction of pointers is defined only when both pointers point to
elements of the same array (although the language has no fast way of
ensuring that is the case). When subtracting one pointer from another,
the result is the number of array elements between the two pointers
(an integer). One can add an integer to a pointer or subtract an
integer from a pointer; in both cases, the result is a pointer value.
If that value does not point to an element of the same array as the
original pointer or one beyond, the result of using that value is
undefined.
For example:
void f ()
{
int v1 [10];
int v2 [10];
int i1 = &v1[5] - &v1[3]; // i1 = 2
int i2 = &v1[5] - &v2[3]; // result undefined
}
I was reading about unspecified behavior on Wikipedia. It says that
In C and C++, the comparison of pointers to objects is only strictly defined if the pointers point to members of the same object, or elements of the same array.
Example:
int main(void)
{
int a = 0;
int b = 0;
return &a < &b; /* unspecified behavior in C++, undefined in C */
}
So, I am confused. Which one is correct? Wikipedia or Stroustrup's book? What C++ standard says about this?
Correct me if I am misunderstanding something.
Note that pointer subtraction and pointer comparison are different operations with different rules.
C++14 5.6/6, on subtracting pointers:
Unless both pointers point to elements of the same array object or one past the last element of the array object, the behavior is undefined.
C++14 5.9/3-4:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true, and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
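When an ordering between pointers to unrelated objects is genuinely needed (for example, as keys in a std::map), the standard library offers a way out: since C++14, the specializations std::less<T*> and friends are required to yield a strict total order even where the built-in < result is unspecified. A minimal sketch; the helper name is hypothetical.

```cpp
#include <cassert>
#include <functional>

// The built-in < on pointers to unrelated objects gives an unspecified
// result, but std::less<const int*> yields a strict total order, so for two
// distinct pointers exactly one direction compares as "less".
bool consistent_order(const int* a, const int* b) {
    std::less<const int*> less;
    return less(a, b) != less(b, a);
}
```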

Pointer Arithmetic confusion

I know you can add an int to a pointer, subtract two pointers, and subtract an int from a pointer, but can you put the int first and add the pointer to it, i.e. 5 + pointer?
You can, but restrictions apply. Pointer arithmetic is only valid within an array (or up to one past the end of an array).
Here are some of the rules:
5.7 Additive operators [expr.add]
5) [...] If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
and
6) When two pointers to elements of the same array object are subtracted, the result is the difference of the
subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
pasted here for confirmation.
So
int* x = new int;
int* y = new int;
is okay, but:
y-x;
x + 4;
y - 1;
are undefined behavior, and even comparing x and y with the relational operators (<, >, <=, >=) only gives an unspecified result.
However, x+1 and 1+x are okay (a single object counts as an array of size 1).
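A short sketch showing that the integer may come first: 5 + p and p + 5 are the same expression as far as the language is concerned, and a single object behaves like an array of length 1 for the one-past-the-end rule. The function names are hypothetical.

```cpp
#include <cassert>

// Integer + pointer is accepted in either order; both forms name the same
// element, as long as the result stays within the array (or one past it).
bool both_orders_agree() {
    int arr[10] = {};
    int* p = arr;
    return (p + 5 == 5 + p) && (5 + p == &arr[5]);
}

// A single object counts as an array of length 1: x + 1 and 1 + x are valid
// one-past-the-end pointers, but they must never be dereferenced.
bool single_object_one_past() {
    int* x = new int(0);
    bool ok = (x + 1 == 1 + x);
    delete x;
    return ok;
}
```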
Adding an int to a pointer is syntactically okay but there are so many issues that you have to watch out for, e.g. overflow errors.
Ideally, you should do it only within an array.

Pointer incrementing in C++

What does it mean that incrementing a pointer makes it point to the address of the next object of the pointer's base type?
For example:
p1++; // p1 is a pointer to an int
Does this statement mean that the address pointed to by p1 should change to the address of the next int or it should just be incremented by 2 (assuming an int is 2 bytes), in which case the particular address may not contain an int?
I mean, if p1 is, say, 0x442012, will p1++ be 0x442014 (which may be part of the address of a double) or will it point to the next int which is in an address like 0x44201F?
Pointer arithmetic doesn’t care about the content – or validity – of the pointee. It will simply increment the pointer address using the following formula:
new_value = reinterpret_cast<decltype(p)>(reinterpret_cast<char*>(p) + sizeof(*p));
(Assuming a pointer to non-const – otherwise the cast wouldn’t work.)
That is, it will increment the pointer by an amount of sizeof(*p) bytes, regardless of things like pointee value and memory alignment.
The compiler will add sizeof(int) (usually 4) to the numeric value of the pointer. If p1 is 0x442012 before the increment, then after the increment it will be 0x442012 + 4 = 0x442016.
Mind you, 0x442012 is not a multiple of 4, so it is unlikely to be the address of a valid four-byte int, though it would be fine for your two-byte ints.
It certainly won't go looking for the next integer. That would require magic.
p1++ gives rise to assembly language instructions which increment p1 by the size of what it points to. So you get
(char *)p1 = (char *)p1 + sizeof (object pointed to by p1)
At the time this question was answered, an int was typically 4 bytes, so p1 would be incremented by 4, but this depends on sizeof(int) on your machine.
It does not go to "the next int".
An example: assume a 4 byte address and p1 = 0x20424 (where p1 is an int*). Then
p1++
would set the new value of p1 to 0x20428. NOT 0x20425.
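A small sketch that measures the step directly instead of printing raw addresses: both pointers point into the same int array and are viewed as char*, so their difference is the number of bytes p1++ moved, which is expected to equal sizeof(int). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Measures how many bytes p1++ advances an int* inside an array by viewing
// the before/after pointers as char* into the same array.
std::ptrdiff_t bytes_per_increment() {
    int arr[2] = {0, 0};
    int* p1 = arr;
    int* before = p1;
    p1++;  // now points to arr[1], not to the next byte
    return reinterpret_cast<char*>(p1) - reinterpret_cast<char*>(before);
}
```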
If p1 is pointing into the element of index n of an array of objects of type int (a non-array object counts as an array of length 1 for this purpose), then after p1++, p1 is either:
Pointing to the element of index n+1 if the array is of length greater than n+1.
The 'past-the-end' address of the array, if the array is of length exactly n+1.
p1++ causes undefined behavior if p1 is not pointing to an element of an array of objects of type int.
The only meaning that the C and C++ languages give to the notion of "address" is the value of a pointer object.
Any relationship that C/C++'s notion of address has to the notion of a numeric addresses you'd consider in assembly language is purely an implementation detail (albeit, an extremely common implementation detail).
Pointer arithmetic is done in multiples of sizeof(*pointer); that is, for a pointer to int, an increment advances to the next integer (4 bytes for 32-bit integers).