why is sizeof(ptrdiff_t) == sizeof(uintptr_t) - c++

I see several posts (such as size_t vs. uintptr_t) about size_t versus uintptr_t/ptrdiff_t, but none about the relative sizes of these newer C99 pointer-size types.
Example machine: vanilla Ubuntu 14 LTS x64, GCC 4.8:
printf("%zu, %zu, %zu\n", sizeof(uintptr_t), sizeof(intptr_t), sizeof(ptrdiff_t));
prints: "8, 8, 8"
This does not make sense to me: I would expect the difference type, which must be signed, to require more bits than the unsigned pointer type itself.
consider:
NULL - (2^64-1) /*largest ptr, 64bits of 1's.*/
which, being negative in two's complement, would not fit in 64 bits; hence I would expect ptrdiff_t to be larger than the pointer type.
[A related question is why intptr_t is the same size as uintptr_t, although I was comfortable that this was possibly just to allow a signed type to contain the representation's bits (e.g., using signed arithmetic on a negative pointer would (a) be undefined, and (b) have limited utility, as pointers are by definition "positive").]
thanks!

Firstly, it is not clear what uintptr_t is doing here. The languages (C and C++) do not allow you to subtract arbitrary pointer values from each other: two pointers can only be subtracted if they point into the same object (the same array object). Otherwise, the behavior is undefined. This means that these two pointers cannot possibly be farther than SIZE_MAX bytes apart. Note: the distance is limited by the range of size_t, not by the range of uintptr_t. In the general case uintptr_t can be a larger type than size_t. Nobody in C/C++ ever promised you that you should be able to subtract two pointers located UINTPTR_MAX bytes apart.
(And yes, I know that on flat-memory platforms uintptr_t and size_t are usually the same type, at least by range and representation. But from the language point of view it is incorrect to assume that they always are.)
Your NULL - (2^64-1) (if interpreted as address subtraction) is a clear example of such questionable subtraction. What made you think that you should be able to do that in the first place?
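To make the same-object rule concrete, here is a minimal sketch (the variable names are mine):

#include <cstddef>

int main() {
    int arr[10];
    int other = 0;

    std::ptrdiff_t ok = &arr[7] - &arr[2];    // fine: both point into the same array; result is 5
    // std::ptrdiff_t bad = &other - &arr[0]; // undefined behavior: unrelated objects
    (void)ok;
    (void)other;
}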
Secondly, after switching from the irrelevant uintptr_t to the much more relevant size_t, one can say that your logic is perfectly valid: sizeof(ptrdiff_t) should be greater than sizeof(size_t) because of the extra bit required to represent the signed result. Nevertheless, however weird it sounds, the language specification does not require ptrdiff_t to be wide enough to accommodate all pointer-subtraction results, even when the two pointers point into the same object (i.e. are no farther than SIZE_MAX bytes apart). ptrdiff_t is legally permitted to have the same bit count as size_t.
This means that a "seemingly valid" pointer subtraction may actually lead to undefined behavior simply because the result is too large. If your implementation allows you to declare a char array of size, say, SIZE_MAX / 3 * 2
char array[SIZE_MAX / 3 * 2]; // This is smaller than `SIZE_MAX`
then subtracting perfectly valid pointers to the end and to the beginning of this array might lead to undefined behavior if ptrdiff_t has the same size as size_t
char *b = array;
char *e = array + sizeof array;
ptrdiff_t distance = e - b; // Undefined behavior!
The authors of these languages decided to opt for this easier solution instead of requiring compilers to implement support for a [likely non-native] extra-wide signed integer type for ptrdiff_t.
Real-life implementations are aware of this potential problem and usually take steps to avoid it. They artificially restrict the size of the largest supported object so that pointer subtraction never overflows. In a typical implementation you will not be able to declare an array larger than PTRDIFF_MAX bytes (which is about SIZE_MAX / 2). E.g. even if SIZE_MAX on your platform is 2^64 − 1, the implementation will not let you declare anything larger than 2^63 − 1 bytes (and real-life restrictions derived from other factors might be even tighter than that). With this restriction in place, any legal pointer subtraction will produce a result that fits into the range of ptrdiff_t.
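A quick way to see the typical relationship on your own machine (a sketch; exact values are platform-specific):

#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    std::printf("SIZE_MAX    = %zu\n", static_cast<std::size_t>(SIZE_MAX));
    std::printf("PTRDIFF_MAX = %td\n", static_cast<std::ptrdiff_t>(PTRDIFF_MAX));
    // Typical LP64 output: 18446744073709551615 (2^64 - 1) and
    // 9223372036854775807 (2^63 - 1).
}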
See also,
Why is the maximum size of an array “too large”?

The accepted answer is not wrong, but it does not offer much insight into why intptr_t, size_t, and ptrdiff_t are actually useful, or how to use them. So here it is:
size_t is basically the type of a sizeof expression. It is only required to be able to hold the size of the largest object you can create, including arrays. So if an implementation can only ever use 64 KiB of contiguous memory, size_t can be as small as 16 bits, even if pointers are 64 bits.
ptrdiff_t is the type of a pointer difference, e.g. &a - &b. And while it is true that 0 - &a is undefined behavior (as is almost everything in C/C++), whatever it is, it must fit into ptrdiff_t. It is usually the same size as pointers, because that makes the most sense. If ptrdiff_t were a weird size, pointer arithmetic itself would break.
intptr_t/uintptr_t have the same size as pointers. They fit the same int*_t pattern, where * is the width in bits. As with all int*_t/uint*_t types, the standard for some reason allows them to be larger than required, but that is very rare.
As a rule of thumb, you can use size_t for sizes and array indices, and use intptr_t/uintptr_t for everything pointer related. Do not use ptrdiff_t.
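A small sketch of that rule of thumb (the function names are mine; note that uintptr_t is formally optional, though universally provided):

#include <cstddef>
#include <cstdint>

void zero_all(int* arr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)  // size_t for sizes and indices
        arr[i] = 0;
}

void round_trip(int* p) {
    // uintptr_t for "a pointer's bits as an integer"; the round-trip is guaranteed
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    int* back = reinterpret_cast<int*>(bits);
    (void)back; // back == p
}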

Related

Does size_t have the same size and alignment as ptrdiff_t?

On my platform (and on most of them I think) std::size_t and std::ptrdiff_t have the same size and the same alignment. Is there any platform where that is not true? In short: is it required by the standard?
In short: is it required by the standard?
No. The only requirement is from [support.types.layout]/2 and it is:
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in [expr.add].
There is also paragraph 4:
[ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer conversion ranks are no greater than that of signed long int unless a larger size is necessary to contain all the possible values. — end note ]
but notes are non-normative, and it is only a recommendation, not a requirement.
std::size_t is defined as
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).
in paragraph 3 and it also has no requirement that they be the same.
It is not required by the standard.
Note that the current crop of Intel processors only uses 48 bits of the pointer under the hood (a 48-bit virtual address space), even though pointers themselves are 64 bits.
So personally I don't find it too far-fetched to conceive of a 64-bit unsigned std::size_t paired with a 49-bit signed std::ptrdiff_t, although such a scheme would be a headache to implement.
More interestingly, once chipsets evolve to use full 64-bit address spaces (we are some way off that being necessary), std::ptrdiff_t will presumably have to be at least 65 bits! Personally, therefore, I keep in mind that one day sizeof(std::ptrdiff_t) may be larger than sizeof(std::size_t).
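A quick way to inspect what your own platform does (a sketch; on mainstream 64-bit platforms both lines print the same numbers):

#include <cstddef>
#include <iostream>

int main() {
    std::cout << "size_t:    " << sizeof(std::size_t)    << " bytes, align "
              << alignof(std::size_t)    << '\n'
              << "ptrdiff_t: " << sizeof(std::ptrdiff_t) << " bytes, align "
              << alignof(std::ptrdiff_t) << '\n';
}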
On my platform ... std::size_t and std::ptrdiff_t have the same size
How is this compliant?
C lists this (which I believe C++ inherits; if not, let me know and I'll delete) as UB in § J.2:
The result of subtracting two pointers is not representable in an object of type ptrdiff_t (6.5.6)."
This allows the type of ptrdiff_t to be the signed counterpart of the unsigned size_t.
When paired as such with no padding,
char a[PTRDIFF_MAX + (size_t)1]; // OK with enough memory in the location needed
size_t size_a = sizeof a; // OK
size_t diff0 = &a[sizeof a - 1] - &a[0]; // OK
ptrdiff_t diff1 = &a[sizeof a] - &a[0]; // UB
ptrdiff_t diff2 = &a[0] - &a[sizeof a]; // UB
Moral of the story: troubles with pointer subtraction (result type: ptrdiff_t) may begin when the array element count exceeds PTRDIFF_MAX.

size_t ptrdiff_t and address space

On my system both ptrdiff_t and size_t are 64-bit.
I would like to clarify two things:
I believe that no array could be as large as SIZE_MAX bytes due to address-space restrictions. Is this true?
If yes, then is there a guarantee that ptrdiff_t will be able to hold the result of subtracting any two pointers within a max-sized array?
No, there is no such guarantee. See, for example, here: https://en.cppreference.com/w/cpp/types/ptrdiff_t
If an array is so large (greater than PTRDIFF_MAX elements, but less than SIZE_MAX bytes) that the difference between two pointers may not be representable as std::ptrdiff_t, the result of subtracting two such pointers is undefined.
Most implementations artificially restrict the maximum array size to make sure that the difference between two pointers pointing into the same array fits into ptrdiff_t. So it is more than likely that on your platform the maximum allowed array size is about SIZE_MAX / 2 (try it). This is not an "address space restriction"; it is just a restriction enforced internally by your implementation. Under this restriction, legal pointer subtraction ("legal" = two pointers into the same array) will not overflow.
The language specification does not require that, though. Implementations are not required to restrict their array sizes in that way, meaning that the language specification allows seemingly legal pointer subtractions to overflow and produce undefined behavior. But most implementations prefer to defend against this by restricting their array sizes.
See the "three options" here for more details: Why is the maximum size of an array "too large"?
From [support.types.layout]/3
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
So you are guaranteed that size_t can hold the size of the largest array you can have.
ptrdiff_t unfortunately is not so guaranteed. From [support.types.layout]/2
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in 8.7.
Which is okay-ish but then we have [expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (21.2). If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined. [ Note: If the value i − j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
Which states that ptrdiff_t may not be large enough.

Why is it not safe to cast a pointer to a numeric type?

consider this code
T* pa = new T(13);
int address = reinterpret_cast<int>(pa);
Where T can be any built-in type.
1) I cannot understand what's wrong with the reinterpret_cast here?
2) What are the cases when this kind of casting will lead to undefined behaviour?
3) Will address always contain the correct decimal representation of the memory address?
I cannot understand what's wrong with the reinterpret_cast here?
Because ints and pointers are not required by the standard to have the same size; you might lose information by casting.
What are the cases when this kind of casting will lead to undefined behaviour?
When this holds (read it as the mathematical statement 2^(sizeof(T*) * CHAR_BIT) − 1 > INT_MAX; as actual code the shift itself would overflow):
1 << (sizeof(T*) * CHAR_BIT) > INT_MAX
then the value of the address might not fit in an int, which invokes undefined behavior.
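One can make that check concrete at compile time (a sketch; T is the question's placeholder type):

using T = int; // the question's placeholder type

// On a typical 64-bit platform sizeof(T*) == 8 and sizeof(int) == 4,
// so this assertion fires, flagging the lossy cast at compile time.
static_assert(sizeof(T*) <= sizeof(int), "T* does not fit in int");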
Will address always contain the correct decimal representation of the memory address?
If you use std::[u]intptr_t, then yes, since these types are guaranteed to be able to hold the value of a pointer.
The sizes of T* and int depend on your compiler and architecture. If sizeof(T*) > sizeof(int), you will discard information. And while sizeof(T*) <= sizeof(int) might hold on one compiler and architecture, it might not on another.
You can use either std::intptr_t or std::uintptr_t instead of int. These two are guaranteed to be big enough to hold the value of a pointer.
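A minimal sketch of that fix, assuming std::uintptr_t is available (it is formally optional, but every mainstream platform provides it):

#include <cstdint>

using T = int; // stand-in for "any built-in type", per the question

int main() {
    T* pa = new T(13);
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(pa); // wide enough by definition
    T* back = reinterpret_cast<T*>(address);                       // round-trips to pa
    delete back;
}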
As other answers have pointed out, the size of an integer and the size of a pointer don't have to be the same. For example, on many 64-bit Intel machines a pointer is 64 bits and an integer is only 32.
There are other reasons to avoid this as well, though. For example, some older processor architectures have integer representations that include trap representations, which cause undefined behavior if used. This means that even when a pointer would fit into the size of an integer, casting a pointer to an integer could produce a trap representation that makes further numeric calculations fail.

Is sizeof any type other than char guaranteed?

I know that in C++ sizeof(char) is guaranteed to be 1, but is it the only case or are there any other built-in types guaranteed to have exact size?
In particular is sizeof(bool) == 1 or sizeof(int) == 4 demanded by language or is it an implementation detail?
The size is only guaranteed explicitly for char: sizeof(char) == 1. Implicitly this guarantee also applies to signed char and unsigned char, since one of them is required to use the same representation as char and the other is bound by the conversion rules between signed char and unsigned char to use the same size.
Other than that, there are only guarantees on the number of bits present in some types and size relations between some types. Note that char can have any number of bits equal to or greater than 8.
The rules are strict enough that the sizes of signed char and unsigned char must also be 1.
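A few of the guarantees just described, written as compile-time checks (a sketch; all of these hold on any conforming implementation):

#include <climits>

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(signed char) == 1, "same size as char");
static_assert(sizeof(unsigned char) == 1, "same size as char");
static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
static_assert(INT_MAX >= 32767 && LONG_MAX >= 2147483647L,
              "only minimum ranges are guaranteed, not exact sizes");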
There is no other type for which the size is guaranteed, and I know of compilers that make sizeof(bool) a value larger than 1, and that make sizeof(int) a value other than 4.
Types are not guaranteed to have the same byte size across architectures. sizeof(X) is evaluated by the compiler to an integer (1, 2, 4, 8, etc.) rather than a function call, so the result for a given type (e.g. int) depends on the system your application was compiled for. This is why you have to recompile an application for a different architecture.
That said, some types are always a particular size (e.g. int32_t from <cstdint>).
See: What does the C++ standard state the size of int, long type to be?
In theory, an (old C++) implementation (though probably not a C++11 one) might have sizeof equal to 1 for every scalar type (numeric, pointer, boolean). But I cannot name such an implementation (one where sizeof(int), sizeof(double), sizeof(long long), sizeof(bool), and sizeof(void*) are all 1).
You should probably use the <cstdint> header if you care about data type sizes.
Also, code portability can be tricky. You should care not only about integral data type sizes, but also about endianness and operating-system issues (standards like POSIX should help). An aphorism says that there is no portable software, only code that has been painfully ported.
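For instance, a sketch of what <cstdint> gives you (the exact-width types are optional in principle; the least/fast variants are mandatory):

#include <climits>
#include <cstdint>

std::int32_t       exact; // exactly 32 bits, no padding (optional, but ubiquitous)
std::int_least32_t least; // at least 32 bits (always provided)
std::int_fast32_t  fast;  // fastest type with at least 32 bits (always provided)

static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "exact-width means exact");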

Alternate way of computing size of a type using pointer arithmetic

Is the following code 100% portable?
int a=10;
size_t size_of_int = (char *)(&a+1)-(char*)(&a); // No problem here?
std::cout<<size_of_int;// or printf("%zu",size_of_int);
P.S.: The question is only for learning purposes, so please don't give answers like "use sizeof()", etc.
From ANSI/ISO/IEC 14882:2003, p. 87 (C++03):
"75) Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integral value of the expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to."
This seems to suggest that the pointer difference equals the object size.
If we remove the UB'ness of incrementing a pointer to a scalar a and turn a into an array:
int a[1];
size_t size_of_int = (char*)(a+1) - (char*)(a);
std::cout<<size_of_int;// or printf("%zu",size_of_int);
Then this looks OK. The clauses about alignment requirements are consistent with the footnote, if alignment requirements are always divisible by the size of the object.
UPDATE: Interesting. As most of you probably know, GCC allows specifying an explicit alignment for types as an extension. But I can't break the OP's "sizeof" method with it, because GCC refuses to compile this:
#include <stdio.h>
typedef int a8_int __attribute__((aligned(8)));
int main()
{
    a8_int v[2];
    printf("=>%td\n", (char*)&v[1] - (char*)&v[0]); /* %td matches the ptrdiff_t result */
}
The message is error: alignment of array elements is greater than element size.
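For completeness, here is a hypothetical generalization of the question's trick into a template; this is illustrative only (T must be default-constructible), and real code should simply use sizeof:

#include <cstddef>
#include <iostream>

template <typename T>
std::size_t size_via_pointers() {
    T a[1];
    return (char*)(a + 1) - (char*)a; // byte distance between consecutive elements
}

int main() {
    std::cout << size_via_pointers<int>()    << '\n'; // matches sizeof(int)
    std::cout << size_via_pointers<double>() << '\n'; // matches sizeof(double)
}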
At first glance, &a+1 might seem to run afoul of the C++ Standard 5.7/5:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. <...> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
However, &a+1 is OK according to 5.7/4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That means that 5.7/5 can be applied without UB. And finally, remark 75 from 5.7/6, which #Luther Blissett quoted in his answer, says that the code in the question is valid.
In production code you should use sizeof instead. But the C++ Standard doesn't guarantee that sizeof(int) will be 4 on every 32-bit platform.
No. This code won't work as you expect on every platform. At least in theory, there might be a platform with, e.g., 24-bit integers (= 3 bytes) but 32-bit alignment. Such alignments are not untypical for (older or simpler) platforms. On such a platform, your code would return 4, but sizeof(int) would return 3.
But I am not aware of real hardware that behaves that way. In practice, your code will work on most or all platforms.
It's not 100% portable, for the following reasons:
&a invokes undefined behaviour on objects of register storage class.
In case of alignment restrictions that are larger than or equal to the size of the int type, size_of_int will not contain the correct answer.
Edit: You'd best use int a[1]; then a+1 becomes definitively valid.
Disclaimer:
I am uncertain if the above hold for C++.
Why not just:
size_t size_of_int = sizeof(int);
It is probably implementation defined.
I can imagine a (hypothetical) system where sizeof(int) is smaller than the default alignment.
It seems safe to say only that size_of_int >= sizeof(int).
The code above will portably compute sizeof(int) on the target platform, but the latter is implementation-defined: you will get different results on different platforms.
Yes, it gives you the equivalent of sizeof(a), but with the result as a ptrdiff_t instead of a size_t.
There was a debate on a similar question.
See the comments on my answer to that question for some discussion of why this is not only non-portable but also undefined behaviour according to the standard.