What is the largest value sizeof(T) can yield? - c++

At first one might think std::numeric_limits<size_t>::max(), but if there were an object that huge, could it still offer a one-past-the-end pointer? I guess not. Does that imply the largest value sizeof(T) could yield is std::numeric_limits<size_t>::max() - 1? Am I right, or am I missing something?

Q: What is the largest value sizeof(T) can yield?
A: std::numeric_limits<size_t>::max()
Clearly, sizeof cannot return a value larger than std::numeric_limits<size_t>::max(), since it wouldn't fit. The only question is, can it return ...::max()?
Yes. Here is a valid program that violates no constraints of the C++03 standard and serves as a proof by example. In particular, it violates no constraint listed in §5.3.3 [expr.sizeof] or in §8.3.4 [dcl.array]:
#include <cstddef>
#include <iostream>
#include <limits>

int main() {
    typedef char T[std::numeric_limits<std::size_t>::max()];
    std::cout << sizeof(T) << "\n";
}

If std::numeric_limits<ptrdiff_t>::max() > std::numeric_limits<size_t>::max(), you can compute the size of an object std::numeric_limits<size_t>::max() bytes large by subtracting a pointer to it from a one-past-the-end pointer.
If sizeof(T*) > sizeof(size_t), there are enough distinct pointer values to address each and every single byte inside that object (in the case of an array of char, for example) plus one more for one-past-the-end.
So it is possible to write an implementation where sizeof can return std::numeric_limits<size_t>::max(), and where you can still get a pointer to one-past-the-end of an object that large.

It's not exactly well-defined, but to stay within the safe limits of the standard, the maximum object size is std::numeric_limits<ptrdiff_t>::max(). That's because when you subtract two pointers you get a ptrdiff_t, which is a signed integer type.
cheers & hth.,

The requirement to be able to point beyond the end of an array has nothing to do with the range of size_t. Given an object x, it's quite possible for (&x)+1 to be a valid pointer, even if the number of bytes separating the two pointers can't be represented by size_t.
You could argue that the requirement does imply an upper bound on object size equal to the maximum range of pointers, minus the alignment of the object. However, I don't believe the standard says anywhere that such a type can't be defined; it would just be impossible to instantiate one while remaining conformant.

If this were a test, I'd say (size_t)-1

A sizeof() expression yields a value of type size_t. From the C99 standard, §6.5.3.4:
The value of the result is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
Therefore, the maximum value that sizeof() can yield is SIZE_MAX.

You can have a standard-compliant compiler that allows for object sizes that cause pointer arithmetic to overflow; however, the result is undefined. From the C++ standard, 5.7 [expr.add]:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined.

Related

size_t ptrdiff_t and address space

On my system both ptrdiff_t and size_t are 64-bit.
I would like to clarify two things:
I believe that no array could actually be as large as size_t's maximum value, due to address-space restrictions. Is this true?
If yes, then, is there a guarantee that ptrdiff_t will be able to hold the result of subtraction of any pointers within the max-sized array?
No, there is no such guarantee. See, for example, here: https://en.cppreference.com/w/cpp/types/ptrdiff_t
If an array is so large (greater than PTRDIFF_MAX elements, but less than SIZE_MAX bytes) that the difference between two pointers may not be representable as std::ptrdiff_t, the result of subtracting two such pointers is undefined.
Most implementations artificially restrict the maximum array size to make sure that the difference between two pointers pointing into the same array fits into ptrdiff_t. So it is more than likely that on your platform the maximum allowed array size is about SIZE_MAX / 2 (try it). This is not an "address space restriction"; it is just a restriction internally enforced by your implementation. Under this restriction, legal pointer subtraction ("legal" = two pointers into the same array) will not overflow.
The language specification does not require that though. Implementations are not required to restrict their array size in that way, meaning that language specification allows seemingly legal pointer subtractions to overflow and produce undefined behavior. But most implementations prefer to defend against this by restricting their array sizes.
See the "three options" here for more details: Why is the maximum size of an array "too large"?
From [support.types.layout]/3
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
So you are guaranteed that size_t can hold the size of the largest array you can have.
ptrdiff_t unfortunately is not so guaranteed. From [support.types.layout]/2
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in 8.7.
Which is okay-ish but then we have [expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (21.2). If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined. [ Note: If the value i − j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. —end note ]
Which states that ptrdiff_t may not be large enough.

Can ptrdiff_t represent all subtractions of pointers to elements of the same array object?

For subtraction of two pointers to elements at subscripts i and j of the same array object, the note in [expr.add#5] reads:
[ Note: If the value i−j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. — end note ]
But given [support.types.layout#2], which states that (emphasis mine):
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in [expr.add].
Is it even possible for the result of i-j not to be in the range of representable values of ptrdiff_t?
PS: I apologize if my question is caused by my poor understanding of the English language.
EDIT: Related: Why is the maximum size of an array "too large"?
Is it even possible for the result of i-j not to be in the range of representable values of ptrdiff_t?
Yes, but it's unlikely.
In fact, [support.types.layout]/2 does not say much; the proper rules about pointer subtraction and ptrdiff_t are defined in [expr.add]. So let us look at that section.
[expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header.
First of all, note that the case where i and j are subscript indexes of different arrays is not considered. This allows us to treat i-j as P-Q, where P is a pointer to the element of the array at subscript i and Q is a pointer to the element of the same array at subscript j. Indeed, subtracting two pointers to elements of different arrays is undefined behavior:
[expr.add]/5
If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i−j; otherwise, the behavior is undefined.
In conclusion, with the notation defined previously, i-j and P-Q are defined to have the same value, with the latter being of type std::ptrdiff_t. But nothing is said about the possibility for this type to hold such a value. This question can, however, be answered with the help of std::numeric_limits; in particular, one can detect whether an array some_array is too big for std::ptrdiff_t to hold all index differences:
static_assert(std::numeric_limits<std::ptrdiff_t>::max() > sizeof(some_array)/sizeof(some_array[0]),
              "some_array is too big, subtracting its first and one-past-the-end element indexes "
              "or pointers would lead to undefined behavior as per [expr.add]/5.");
Now, on usual targets, this would generally not happen, as sizeof(std::ptrdiff_t) == sizeof(void*); an array would need to be stupidly big for ptrdiff_t to overflow. But there is no guarantee of it.
I think it is a bug in the wording.
The rule in [expr.add] is inherited from the same rule for pointer subtraction in the C standard. In the C standard, ptrdiff_t is not required to hold the difference of any two subscripts in an array object.
The rule in [support.types.layout] comes from Core Language Issue 1122. It added direct definitions for std::size_t and std::ptrdiff_t, which was supposed to solve a circular-definition problem. I don't see any reason (at least none mentioned in any official document) to make std::ptrdiff_t hold the difference of any two subscripts in an array object. I guess it just uses an improper definition to solve the circular-definition issue.
As further evidence, [diff.library] does not mention any difference between std::ptrdiff_t in C++ and ptrdiff_t in C. Since ptrdiff_t has no such constraint in C, std::ptrdiff_t should not have such a constraint in C++ either.

Confusion regarding types, overflows and UB in pointer-integral addition

I used to think that adding an integral type to a pointer (provided that the the pointer points to an array of a certain size etc. etc.) is always well defined, regardless of the integral type. The C++11 standard says ([expr.add]):
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
On the other hand, it was brought to my attention recently that the built-in add operators for pointers are defined in terms of ptrdiff_t, which is a signed type (see 13.6/13). This seems to hint that if one does a malloc() with a very large (unsigned) size and then tries to reach the end of the allocated space via a pointer addition with a std::size_t value, this might result in undefined behaviour, because the unsigned std::size_t would be converted to ptrdiff_t, which is potentially UB.
I imagine similar issues would arise, e.g., in the operator[]() of std::vector, which is implemented in terms of an unsigned size_type. In general, it seems to me like this would make it practically impossible to fully use the memory storage available on a platform.
It's worth noting that neither GCC nor Clang complains about signed-unsigned integral conversions with all the relevant diagnostics turned on when adding unsigned values to pointers.
Am I missing something?
EDIT: I'd like to clarify that I am talking about additions involving a pointer and an integral type (not two pointers).
EDIT2: an equivalent way of formulating the question might be this. Does this code result in UB in the second line, if ptrdiff_t has a smaller positive range than size_t?
char *ptr = static_cast<char * >(std::malloc(std::numeric_limits<std::size_t>::max()));
auto end = ptr + std::numeric_limits<std::size_t>::max();
Your question is based on a false premise.
Subtraction of pointers produces a ptrdiff_t (§[expr.add]/6):
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2).
That does not, however, mean that addition is defined in terms of ptrdiff_t. Rather the contrary, for addition only one conversion is specified (§[expr.add]/1):
The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.
The "usual arithmetic conversions" are defined in §[expr]/10. This includes only one conversion from unsigned type to signed type:
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
So, while there may be some room for question about exactly what type the size_t will be converted to (and whether it's converted at all), there's no question on one point: the only way it can be converted to a ptrdiff_t is if all its values can be represented without change as a ptrdiff_t.
So, given:
size_t N;
T *p;
...the expression p + N will never fail because of some (imagined) conversion of N to a ptrdiff_t before the addition takes place.
Since §13.6 is being mentioned, perhaps it's best to back up and look carefully at what §13.6 really is:
The candidate operator functions that represent the built-in operators defined in Clause 5 are specified in this subclause. These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose.
[emphasis added]
In other words, the fact that §13.6 defines an operator that adds a ptrdiff_t to a pointer does not mean that when any other integer type is added to a pointer, it's first converted to a ptrdiff_t, or anything like that. More generally, the operators defined in §13.6 are never used to carry out any arithmetic operations.
With that, and the rest of the text you quoted from §[expr.add], we can quickly conclude that adding a size_t to a pointer can overflow if and only if there aren't that many elements in the array after the pointer.
Given the above, one more question probably occurs to you. If I have code like this:
char *p = huge_array;
size_t N = sizeof(huge_array);
char *p2 = p + N;
ptrdiff_t diff = p2 - p;
...is it possible that the final subtraction will overflow? The short and simple answer to that is: Yes, it can.

ptrdiff_t too small?

I've always wondered: isn't ptrdiff_t supposed to be able to hold the difference of any two pointers by definition? How come it fails when the two pointers are too far? (I'm not pointing at any particular language... I'm referring to all languages which have this type.)
(e.g. subtract the pointer with address 1 from the byte pointer with address 0xFFFFFFFF when you have 32-bit pointers, and it overflows the sign bit...)
No, it is not.
§5.7 [expr.add] (from N3225, the C++0x FCD)
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2). As with any other arithmetic overflow, if the result does not fit in the space provided, the behavior is undefined.
In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i − j provided the value fits in an object of type std::ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.
Note the number of times undefined appears in the paragraph. Also note that you can only subtract pointers if they point within the same object.
No, because there is no such thing as the difference between "any two pointers". You can only subtract pointers to elements of the same array (or the pointer to the location just past the end of an array).
To add a more explicit standard quote, ISO/IEC 9899:1999 §J.2/1 states:
The behavior is undefined in the following circumstances:
[...]
-- The result of subtracting two pointers is not representable in an object of type ptrdiff_t (6.5.6).
It is entirely acceptable for ptrdiff_t to be the same size as pointer types, provided the overflow semantics are defined by the compiler so that any difference is still representable. There is no guarantee that a negative ptrdiff_t means the second pointer lives at a lower address in memory than the first. (The standard does, however, require ptrdiff_t to be a signed integer type.)
Over/underflow is mathematically well-defined for fixed-size integer arithmetic:
(1 - 0xFFFFFFFF) % (1<<32) =
(1 + -0xFFFFFFFF) % (1<<32) =
1 + (-0xFFFFFFFF % (1<<32)) = 2
This is the correct result!
Specifically, the result after over/underflow is an alias of the correct integer. In fact, every non-representable integer is aliased (indistinguishable) with one representable integer: count to infinity in fixed-size integers, and you will repeat yourself, round and round like the dial of an analog clock.
An N-bit integer represents any real integer modulo 2^N. In C you would write this as x % (1ULL << N), taking care that the shift is performed in a wide enough type.
I believe C guarantees mathematical correctness of over/underflow, but only for unsigned integers. Signed under/overflow is assumed never to happen (for the sake of optimization).
In practice, signed integers are two's complement, which makes no difference in addition or subtraction, so correct under/overflow behavior is guaranteed for signed integers too (although not by C).

Alternate way of computing size of a type using pointer arithmetic

Is the following code 100% portable?
int a = 10;
size_t size_of_int = (char*)(&a + 1) - (char*)(&a); // No problem here?
std::cout << size_of_int; // or printf("%zu", size_of_int);
P.S: The question is only for learning purpose. So please don't give answers like Use sizeof() etc
From ANSI-ISO-IEC 14882:2003, p. 87 (C++03):
"75) Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integral value of the expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to."
This seems to suggest that the pointer difference equals to the object size.
If we remove the UB'ness from incrementing a pointer to a scalar a and turn a into an array:
int a[1];
size_t size_of_int = (char*)(a + 1) - (char*)(a);
std::cout << size_of_int; // or printf("%zu", size_of_int);
Then this looks OK. The clauses about alignment requirements are consistent with the footnote, if object sizes are always divisible by their alignment requirements.
UPDATE: Interesting. As most of you probably know, GCC allows specifying an explicit alignment for types as an extension. But I can't break the OP's "sizeof" method with it, because GCC refuses to compile this:
#include <stdio.h>

typedef int a8_int __attribute__((aligned(8)));

int main()
{
    a8_int v[2];
    printf("=>%d\n", (int)((char*)&v[1] - (char*)&v[0]));
}
The message is error: alignment of array elements is greater than element size.
At first sight, &a+1 might seem to lead to undefined behavior according to the C++ Standard 5.7/5:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. <...> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
But &a+1 is OK according to 5.7/4:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
That means that 5.7/5 can be applied without UB. And finally, remark 75 from 5.7/6, as @Luther Blissett noted in his answer, says that the code in the question is valid.
In production code you should use sizeof instead. But note that the C++ Standard doesn't guarantee that sizeof(int) yields 4 on every 32-bit platform.
No. This code won't work as you expect on every platform. At least in theory, there might be a platform with, e.g., 24-bit integers (= 3 bytes) but 32-bit alignment. Such alignments are not untypical for (older or simpler) platforms. There, your code would return 4, but sizeof(int) would return 3.
But I am not aware of any real hardware that behaves that way. In practice, your code will work on most or all platforms.
It's not 100% portable for the following reasons:
Edit: You'd best use int a[1]; and then a+1 becomes definitively valid.
&a invokes undefined behaviour on objects of register storage class.
In the case of alignment restrictions that are larger than or equal to the size of the int type, size_of_int will not contain the correct answer.
Disclaimer:
I am uncertain whether the above holds for C++.
Why not just:
size_t size_of_int = sizeof(int);
It is probably implementation-defined.
I can imagine a (hypothetical) system where sizeof(int) is smaller than the default alignment.
It seems safe to say only that size_of_int >= sizeof(int).
The code above will portably compute sizeof(int) on a given target platform, but the latter is implementation-defined; you will get different results on different platforms.
Yes, it gives you the equivalent of sizeof(a) but using ptrdiff_t instead of size_t type.
There was a debate on a similar question.
See the comments on my answer to that question for some pointers at why this is not only non-portable, but also is undefined behaviour by the standard.