On my system both ptrdiff_t and size_t are 64-bit.
I would like to clarify two things:
I believe that no array could be as large as SIZE_MAX bytes due to address space restrictions. Is this true?
If yes, then, is there a guarantee that ptrdiff_t will be able to hold the result of subtraction of any pointers within the max-sized array?
No, there is no such guarantee. See, for example, here: https://en.cppreference.com/w/cpp/types/ptrdiff_t
If an array is so large (greater than PTRDIFF_MAX elements, but less
than SIZE_MAX bytes), that the difference between two pointers may not
be representable as std::ptrdiff_t, the result of subtracting two such
pointers is undefined.
Most implementations artificially restrict the maximum array size to make sure that difference between two pointers pointing into the same array fits into ptrdiff_t. So, it is more than likely that on your platform the maximum allowed array size is about SIZE_MAX / 2 (try it). This is not an "address space restriction", it is just a restriction internally enforced by your implementation. Under this restriction, legal pointer subtraction ("legal" = two pointers into the same array) will not overflow.
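A quick way to inspect this relationship on your own implementation (a minimal sketch; the printed values are platform-specific):

#include <cstdint>
#include <iostream>

int main() {
    // On a typical LP64 platform one would expect PTRDIFF_MAX == SIZE_MAX / 2.
    std::cout << "PTRDIFF_MAX = " << PTRDIFF_MAX << "\n"
              << "SIZE_MAX    = " << SIZE_MAX << "\n";
}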
The language specification does not require that though. Implementations are not required to restrict their array size in that way, meaning that language specification allows seemingly legal pointer subtractions to overflow and produce undefined behavior. But most implementations prefer to defend against this by restricting their array sizes.
See the "three options" here for more details: Why is the maximum size of an array "too large"?
From [support.types.layout]/3
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
So you are guaranteed that size_t can hold the size of the largest array you can have.
ptrdiff_t unfortunately is not so guaranteed. From [support.types.layout]/2
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in 8.7.
Which is okay-ish but then we have [expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the header (21.2). If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined. [ Note: If the value i − j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. —end note ]
Which states that ptrdiff_t may not be large enough.
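The gap is easy to observe at compile time on common 64-bit platforms (a sketch; the static_assert encodes a typical-platform assumption, not a guarantee of the standard):

#include <cstddef>
#include <limits>

// On typical platforms size_t and ptrdiff_t have the same width, so the
// unsigned size_t can represent values the signed ptrdiff_t cannot.
static_assert(std::numeric_limits<std::size_t>::max() >
                  static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max()),
              "on this platform ptrdiff_t covers all of size_t's range");

int main() {}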
Related
Consider the following struct definition:
#define SIZE ... // it's a positive multiple of sizeof(Foo*)

struct Foo {
    Foo* ptr;
    char padding[SIZE - sizeof(Foo*)];
};
Given that SIZE is a positive multiple of the pointer size (sizeof(Foo*)), is it guaranteed by the standard that sizeof(Foo) == SIZE?
If it is not guaranteed, as a practical matter, are there any platforms in common use that provide a counter-example (where the equality doesn't hold)?
Yes, I'm aware of alignas...
There is no guarantee about padding.
C++ Standard (working draft n4741) 6.7(4) Types
4 The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits that
hold the value of type T. Bits in the object representation that are not
part of the value representation are padding bits. For trivially copyable
types, the value representation is a set of bits in the object
representation that determines a value, which is one discrete element of
an implementation-defined set of values. (41)
(41) The intent is that the memory model of C++ is compatible with that
of ISO/IEC 9899 Programming Language C.
C++ Standard (working draft n4741) 8.5.2.3(2) Sizeof
When applied to a reference or a reference type, the result is the size
of the referenced type. When applied to a class, the result is the
number of bytes in an object of that class including any padding required
for placing objects of that type in an array. The result of applying
sizeof to a potentially-overlapping subobject is the size of the type,
not the size of the subobject. When applied to an array, the result is
the total number of bytes in the array. This implies that the size of an
array of n elements is n times the size of an element.
I can point to no example off-hand where it would not hold, but based on the standard's memory-model compatibility with "ISO/IEC 9899 Programming Language C", there can be no guarantees given regarding padding -- it is implementation-defined.
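For what it's worth, the assumption is easy to turn into a compile-time check on a given implementation. Here is a minimal sketch with a hypothetical SIZE of 64:

#include <cstddef>

constexpr std::size_t SIZE = 64; // hypothetical value; a positive multiple of sizeof(Foo*)

struct Foo {
    Foo* ptr;
    char padding[SIZE - sizeof(Foo*)];
};

// Not guaranteed by the standard, but holds on common ABIs; if an
// implementation inserted extra tail padding, this would fail to compile.
static_assert(sizeof(Foo) == SIZE, "Foo has unexpected padding");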
For subtraction of two pointers to elements x[i] and x[j] of the same array object, the note in [expr.add#5] reads:
[ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
But given [support.types.layout#2], which states that (emphasis mine):
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in [expr.add].
Is it even possible for the result of i-j not to be in the range of representable values of ptrdiff_t?
PS: I apologize if my question is caused by my poor understanding of the English language.
EDIT: Related: Why is the maximum size of an array "too large"?
Is it even possible for the result of i-j not to be in the range of representable values of ptrdiff_t?
Yes, but it's unlikely.
In fact, [support.types.layout]/2 does not say much except that the proper rules about pointer subtraction and ptrdiff_t are defined in [expr.add]. So let us look at that section.
[expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header.
First of all, note that the case where i and j are subscripts into different arrays is not considered. This allows us to treat i-j as P-Q would be, where P is a pointer to the element of an array at subscript i and Q is a pointer to the element of the same array at subscript j. Indeed, subtracting two pointers to elements of different arrays is undefined behavior:
[expr.add]/5
If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j; otherwise, the behavior is undefined.
In conclusion, with the notation defined previously, i-j and P-Q are defined to have the same value, the latter being of type std::ptrdiff_t. But nothing is said about the ability of this type to hold such a value. This question can, however, be answered with the help of std::numeric_limits; in particular, one can detect whether an array some_array is too big for std::ptrdiff_t to hold all index differences:
static_assert(static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max())
                  >= sizeof(some_array) / sizeof(some_array[0]),
              "some_array is too big: subtracting its first and one-past-the-end element "
              "indexes or pointers would lead to undefined behavior as per [expr.add]/5.");
Now, on usual targets this will rarely be an issue, as sizeof(std::ptrdiff_t) == sizeof(void*); an array would need to be absurdly big for ptrdiff_t to overflow. But there is no guarantee of it.
I think it is a bug in the wording.
The rule in [expr.add] is inherited from the same rule for pointer subtraction in the C standard. In the C standard, ptrdiff_t is not required to hold any difference of two subscripts in an array object.
The rule in [support.types.layout] comes from Core Language Issue 1122. It added direct definitions for std::size_t and std::ptrdiff_t, which is supposed to solve the problem of circular definition. I don't see any reason (at least none mentioned in any official document) to require std::ptrdiff_t to hold the difference of any two subscripts in an array object. I guess it just uses an improper definition to solve the circular-definition issue.
As further evidence, [diff.library] does not mention any difference between std::ptrdiff_t in C++ and ptrdiff_t in C. Since in C ptrdiff_t has no such constraint, in C++ std::ptrdiff_t should not have such a constraint either.
I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these C99 pointer-size types.
Example machine: vanilla Ubuntu 14 LTS x64, GCC 4.8:
printf("%zu, %zu, %zu\n", sizeof(uintptr_t), sizeof(intptr_t), sizeof(ptrdiff_t));
prints: "8, 8, 8"
This does not make sense to me, as I would expect the difference type, which must be signed, to require more bits than the unsigned pointer type itself.
Consider:
NULL - (2^64-1) /*largest ptr, 64bits of 1's.*/
which, being a two's complement negative, would not fit in 64 bits; hence I would expect ptrdiff_t to be larger than the pointer type.
[A related question is why intptr_t is the same size as uintptr_t.... although I was comfortable that this is possibly just to allow a signed type to contain the representation's bits (e.g., using signed arithmetic on a negative pointer would (a) be undefined, and (b) have limited utility, as pointers are by definition "positive").]
thanks!
Firstly, it is not clear what uintptr_t is doing here. The languages (C and C++) do not allow you to subtract just any arbitrary pointer values from each other. Two pointers can only be subtracted if they point into the same object (into the same array object). Otherwise, the behavior is undefined. This means that these two pointers cannot possibly be farther than SIZE_MAX bytes apart. Note: the distance is limited by the range of size_t, not by the range of uintptr_t. In the general case uintptr_t can be a larger type than size_t. Nobody in C/C++ ever promised you that you should be able to subtract two pointers located UINTPTR_MAX bytes apart.
(And yes, I know that on flat-memory platforms uintptr_t and size_t are usually the same type, at least by range and representation. But from the language point of view it is incorrect to assume that they always are.)
Your NULL - (2^64-1) (if interpreted as address subtraction) is a clear example of such questionable subtraction. What made you think that you should be able to do that in the first place?
Secondly, after switching from the irrelevant uintptr_t to the much more relevant size_t, one can say that your logic is perfectly valid. sizeof(ptrdiff_t) should be greater than sizeof(size_t) because of an extra bit required to represent the signed result. Nevertheless, however weird it sounds, the language specification does not require ptrdiff_t to be wide enough to accommodate all pointer subtraction results, even if two pointers point to parts of the same object (i.e. they are no farther than SIZE_MAX bytes apart). ptrdiff_t is legally permitted to have the same bit-count as size_t.
This means that a "seemingly valid" pointer subtraction may actually lead to undefined behavior simply because the result is too large. If your implementation allows you to declare a char array of size, say, SIZE_MAX / 3 * 2
char array[SIZE_MAX / 3 * 2]; // This is smaller than `SIZE_MAX`
then subtracting perfectly valid pointers to the end and to the beginning of this array might lead to undefined behavior if ptrdiff_t has the same size as size_t
char *b = array;
char *e = array + sizeof array;
ptrdiff_t distance = e - b; // Undefined behavior!
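A common workaround in such a case is to subtract integer representations of the pointers instead of the pointers themselves (a sketch; the meaning of uintptr_t values is implementation-specific, so the standard does not guarantee this yields the byte distance):

#include <cstddef>
#include <cstdint>

// No pointer subtraction occurs here, so the ptrdiff_t overflow problem is
// sidestepped; on flat-memory platforms the result is the byte distance.
std::size_t byte_distance(const char* b, const char* e) {
    return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(e) -
                                    reinterpret_cast<std::uintptr_t>(b));
}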
The authors of these languages decided to opt for this easier solution instead of requiring compilers to implement support for a [likely non-native] extra-wide signed integer type for ptrdiff_t.
Real-life implementations are aware of this potential problem and usually take steps to avoid it. They artificially restrict the size of the largest supported object to make sure that pointer subtraction never overflows. In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2). E.g. even if SIZE_MAX on your platform is 2^64 - 1, the implementation will not let you declare anything larger than 2^63 - 1 bytes (and real-life restrictions derived from other factors might be even tighter than that). With this restriction in place, any legal pointer subtraction will produce a result that fits into the range of ptrdiff_t.
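This is easy to check experimentally (a sketch; the exact diagnostic is implementation-specific):

#include <cstddef>
#include <cstdint>

int main() {
    // On typical implementations, uncommenting the next line is rejected at
    // compile time with a diagnostic along the lines of "size of array is
    // too large", even though the requested size is still below SIZE_MAX:
    // static char huge[static_cast<std::size_t>(PTRDIFF_MAX) + 1];
}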
See also,
Why is the maximum size of an array “too large”?
The accepted answer is not wrong, but does not offer much insight into why intptr_t, size_t and ptrdiff_t are actually useful, and how to use them. So here it is:
size_t is basically the type of a sizeof expression. It is only required to be able to hold the size of the largest object that you can make, including arrays. So if you can only ever use 64k of contiguous memory, then size_t can be as little as 16 bits, even if you have 64-bit pointers.
ptrdiff_t is the type of a pointer difference, e.g. &a - &b. And while it is true that 0 - &a is undefined behavior (as is almost everything in C/C++), whatever it is, it must fit into ptrdiff_t. It is usually the same size as pointers, because that makes the most sense. If ptrdiff_t were a weird size, pointer arithmetic itself would break.
intptr_t/uintptr_t have the same size as pointers. They fit into the same int*_t pattern, where * is the size of the int. As with all int*_t/uint*_t types, the standard for some reason allows them to be larger than required, but that's very rare.
As a rule of thumb, you can use size_t for sizes and array indices, and use intptr_t/uintptr_t for everything pointer related. Do not use ptrdiff_t.
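To illustrate the rule of thumb (a minimal sketch):

#include <cassert>
#include <cstddef>
#include <cstdint>

int main() {
    int arr[4] = {1, 2, 3, 4};

    // size_t for sizes and array indices:
    std::size_t n = sizeof arr / sizeof arr[0];

    // uintptr_t can round-trip a data pointer through an integer; this
    // round-trip is guaranteed whenever uintptr_t is provided at all:
    int* p = &arr[n - 1];
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert(reinterpret_cast<int*>(bits) == p);
}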
At first one might think std::numeric_limits<size_t>::max(), but if there was an object that huge, could it still offer a one-past-the-end pointer? I guess not. Does that imply the largest value sizeof(T) could yield is std::numeric_limits<size_t>::max()-1? Am I right, or am I missing something?
Q: What is the largest value sizeof(T) can yield?
A: std::numeric_limits<size_t>::max()
Clearly, sizeof cannot return a value larger than std::numeric_limits<size_t>::max(), since it wouldn't fit. The only question is, can it return ...::max()?
Yes. Here is a valid program, that violates no constraints of the C++03 standard, which demonstrates a proof-by-example. In particular, this program does not violate any constraint listed in §5.3.3 [expr.sizeof], nor in §8.3.4 [dcl.array]:
#include <cstddef>
#include <iostream>
#include <limits>

int main () {
    typedef char T[std::numeric_limits<std::size_t>::max()];
    std::cout << sizeof(T) << "\n";
}
If std::numeric_limits<ptrdiff_t>::max() > std::numeric_limits<size_t>::max() you can compute the size of an object of size std::numeric_limits<size_t>::max() by subtracting a pointer to it from a one-past-the-end pointer.
If sizeof(T*) > sizeof(size_t) you can have enough distinct pointers to address each and every single byte inside that object (in case you have an array of char, for example) plus one for one-past-the-end.
So, it's possible to write an implementation where sizeof can return std::numeric_limits<size_t>::max(), and where you can get pointer to one-past-the-end of an object that large.
It's not exactly well-defined, but to stay within the safe limits of the standard, the max object size is std::numeric_limits<ptrdiff_t>::max().
That's because when you subtract two pointers, you get a ptrdiff_t, which is a signed integer type.
cheers & hth.,
The requirement to be able to point beyond the end of an array has nothing to do with the range of size_t. Given an object x, it's quite possible for (&x)+1 to be a valid pointer, even if the number of bytes separating the two pointers can't be represented by size_t.
You could argue that the requirement does imply an upper bound on object size of the maximum range of pointers, minus the alignment of the object. However, I don't believe the standard says anywhere that such a type can't be defined; it would just be impossible to instantiate one and still remain conformant.
If this were a test, I'd say (size_t)-1
A sizeof() expression yields a value of type size_t. From the C99 standard, 6.5.3.4:
The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
Therefore, the maximum value that sizeof() can yield is SIZE_MAX.
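Which, as noted above, is the same value as (size_t)-1 (a minimal sketch):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    // Unsigned conversion is modular, so -1 converts to the largest
    // value of size_t; both lines print the same number.
    std::printf("%ju\n", static_cast<std::uintmax_t>(SIZE_MAX));
    std::printf("%ju\n", static_cast<std::uintmax_t>(static_cast<std::size_t>(-1)));
}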
You can have a standard-compliant compiler that allows object sizes that cause pointer arithmetic to overflow; however, the result is undefined. From the C++ standard, 5.7 [expr.add]:
When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array
elements. The type of the result is an implementation-defined signed
integral type; this type shall be the same type that is defined as
std::ptrdiff_t in the <cstddef> header (18.2). As with any other
arithmetic overflow, if the result does not fit in the space provided,
the behavior is undefined.
I've always wondered: isn't ptrdiff_t supposed to be able to hold the difference of any two pointers by definition? How come it fails when the two pointers are too far apart? (I'm not pointing at any particular language... I'm referring to all languages which have this type.)
(e.g. subtract the pointer with address 1 from the byte pointer with address 0xFFFFFFFF when you have 32-bit pointers, and it overflows the sign bit...)
No, it is not.
§5.7 [expr.add] (from n3225 - C++0x FCD)
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined.
In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i − j provided the value fits in an object of type std::ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.
Note the number of times undefined appears in the paragraph. Also note that you can only subtract pointers if they point within the same object.
No, because there is no such thing as the difference between "any two pointers". You can only subtract pointers to elements of the same array (or the pointer to the location just past the end of an array).
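For example (a minimal sketch):

#include <cstddef>

int main() {
    int a[10];
    int b[10];

    std::ptrdiff_t ok = &a[7] - &a[2]; // well-defined: same array, value 5
    // &a[0] - &b[0];                  // undefined: pointers into different arrays
    (void)ok;
    (void)b;
}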
To add a more explicit standard quote, ISO 9899:1999 §J.2/1 states:
The behavior is undefined in the following circumstances:
[...]
-- The result of subtracting two pointers is not representable in an object of type
ptrdiff_t (6.5.6).
It is entirely acceptable for ptrdiff_t to be the same size as pointer types, provided the overflow semantics are defined by the compiler so that any difference is still representable. There is no guarantee that a negative ptrdiff_t means that the second pointer lives at a lower address in memory than the first. (Note, though, that the standard does require ptrdiff_t to be a signed integer type.)
Over/underflow is mathematically well-defined for fixed-size integer arithmetic:
(1 - 0xFFFFFFFF) mod 2^32
= (1 + (-0xFFFFFFFF mod 2^32)) mod 2^32
= (1 + 1) mod 2^32
= 2
This is the correct result!
Specifically, the result after over/underflow is an alias of the correct integer. In fact, every non-representable integer is aliased with (indistinguishable from) one representable integer: count to infinity in fixed-size integers, and you will repeat yourself, round and round like the hands of an analog clock.
An N-bit integer represents every integer modulo 2^N. (The mod 2^32 above is mathematical notation; in actual C code you could not even write 1 << 32 with 32-bit ints, as the shift itself would overflow.)
I believe C guarantees mathematical correctness of over/underflow, but only for unsigned integers. Signed under/overflow is assumed to never happen (for the sake of optimization).
In practice, signed integers are two's complement, which makes no difference in addition or subtraction, so correct under/overflow behavior is guaranteed for signed integers too (although not by C).
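A sketch of the unsigned case, where the wraparound really is guaranteed by the language:

#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    // Unsigned arithmetic is defined to wrap modulo 2^N, so this is
    // well-defined and yields (1 - (2^32 - 1)) mod 2^32 == 2:
    std::uint32_t r = UINT32_C(1) - UINT32_C(0xFFFFFFFF);
    std::printf("%" PRIu32 "\n", r); // prints 2
}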