Is it guaranteed that sizeof(T[N]) == N * sizeof(T)? - c++

I had always assumed that the size of an array of N elements of type T, as returned by sizeof, was guaranteed to be exactly N times sizeof(T).
The comments on this question made me doubt it though. There are claims from reputable users that arrays may contain padding, which would break the equality. Of course such platforms may not exist, but are they allowed?
If allowed, this would break many common idioms, such as calculating the needed storage for an array with N * sizeof(T), or calculating the number of elements in an array using sizeof(a)/sizeof(a[0]).

Yes. [expr.sizeof] includes this bit about sizeof:
When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.

The whole point of sizeof is it includes the relevant padding. Every element of an array is exactly sizeof(T) bytes after the previous element. So the size of the entire array is N * sizeof(T).
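As a rough illustration of the two idioms from the question, the identity can be checked at compile time. This is only a sketch: element_count, array_size_matches and Padded are hypothetical names invented for the example, not anything from the standard.

```cpp
#include <cstddef>

// Element count of a built-in array, using the sizeof(a)/sizeof(a[0]) idiom.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&a)[N]) {
    return sizeof(a) / sizeof(a[0]);
}

// Checks sizeof(T[N]) == N * sizeof(T) for one particular T and N.
template <typename T, std::size_t N>
constexpr bool array_size_matches() {
    return sizeof(T[N]) == N * sizeof(T);
}

// Hypothetical type; most ABIs add tail padding after c,
// and that padding is already counted in sizeof(Padded).
struct Padded {
    int  i;
    char c;
};

static_assert(array_size_matches<char, 7>(), "char[7] has no extra padding");
static_assert(array_size_matches<Padded, 3>(), "padding is inside each element");
```

Any padding a type needs is part of sizeof(T) itself, so the array adds nothing on top of it.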

Related

Does C or C++ guarantee array < array + SIZE?

Suppose you have an array:
int array[SIZE];
or
int *array = new(int[SIZE]);
Does C or C++ guarantee that array < array + SIZE, and if so where?
I understand that regardless of the language spec, many operating systems guarantee this property by reserving the top of the virtual address space for the kernel. My question is whether this is also guaranteed by the language, rather than just by the vast majority of implementations.
As an example, suppose an OS kernel lives in low memory and sometimes gives the highest page of virtual memory out to user processes in response to mmap requests for anonymous memory. If malloc or ::operator new[] directly calls mmap for the allocation of a huge array, and the end of the array abuts the top of the virtual address space such that array + SIZE wraps around to zero, does this amount to a non-compliant implementation of the language?
Clarification
Note that the question is not asking about array+(SIZE-1), which is the address of the last element of the array. That one is guaranteed to be greater than array. The question is about a pointer one past the end of an array, or also p+1 when p is a pointer to a non-array object (which the section of the standard pointed to by the selected answer makes clear is treated the same way).
Stackoverflow has asked me to clarify why this question is not the same as this one. The other question asks how to implement total ordering of pointers. That other question essentially boils down to how could a library implement std::less such that it works even for pointers to differently allocated objects, which the standard says can only be compared for equality, not greater and less than.
In contrast, my question was about whether one past the end of an array is always guaranteed to be greater than the array. Whether the answer to my question is yes or no doesn't actually change how you would implement std::less, so the other question doesn't seem relevant. If it's illegal to compare to one past the end of an array, then std::less could simply exhibit undefined behavior in this case. (Also, typically the standard library is implemented by the same people as the compiler, and so is free to take advantage of properties of the particular compiler.)
Yes. From section 6.5.8, paragraph 5 of the C standard:
If the expression P points to an element of an array object
and the expression Q points to the last element of the same array
object, the pointer expression Q+1 compares greater than P.
Expression array is P. The expression array + SIZE - 1 points to the last element of array, which is Q.
Thus:
array + SIZE = array + SIZE - 1 + 1 = Q + 1 > P = array
C requires this. Section 6.5.8 para 5 says:
pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values
I'm sure there's something analogous in the C++ specification.
This requirement effectively prevents allocating objects that wrap around the address space on common hardware, because it would be impractical to implement all the bookkeeping necessary to implement the relational operator efficiently.
The guarantee does not hold for the case int *array = new(int[SIZE]); when SIZE is zero.
The result of new int[0] is required to be a valid pointer that can have 0 added to it, but array == array + SIZE in this case, and a strictly less-than test will yield false.
This is defined in C++, from 7.6.6.4 (p139 of current C++23 draft):
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5) the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
(4.3) — Otherwise, the behavior is undefined.
Note that 4.2 explicitly has "<= n", not "< n". It's undefined for any value larger than size(), but is defined for size().
The ordering of array elements is defined in 7.6.9 (p141):
(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Which means the hypothetical element n will compare greater than the array itself (element 0) for all well defined cases of n > 0.
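The zero-length corner case mentioned in this answer can be sketched as follows. begin_less_than_end is a hypothetical helper written for this illustration.

```cpp
#include <cstddef>

// Allocates n ints and reports whether begin < begin + n holds.
// For n == 0 the two pointers are equal, so the strict comparison is false.
bool begin_less_than_end(std::size_t n) {
    int* begin = new int[n];   // new int[0] still yields a valid, non-null pointer
    int* end = begin + n;      // one past the end; may be formed, not dereferenced
    bool less = begin < end;
    delete[] begin;
    return less;
}
```

Calling it with 0 should give false, and with any positive n it should give true, matching the "n > 0" qualification above.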
The relevant rule in C++ is [expr.rel]/4.1:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
The above rule appears to only cover pointers to array elements, and array + SIZE doesn't point to an array element. However, as mentioned in the footnote, a one-past-the-end pointer is treated as if it were an array element here. The relevant language rule is in [basic.compound]/3:
For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.
So C++ guarantees that array + SIZE > array (at least when SIZE > 0), and that &x + 1 > &x for any object x.
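A minimal sketch of both guarantees, assuming nothing beyond the rules quoted above (end_is_greater and next_is_greater are hypothetical names):

```cpp
#include <cstddef>

// True if the one-past-the-end pointer of a built-in array compares
// greater than the pointer to its first element.
template <typename T, std::size_t N>
bool end_is_greater(T (&a)[N]) {
    T* first = a;
    T* past_end = a + N;   // hypothetical element N of a
    return past_end > first;
}

// Same check for a single object, which is treated as an array of one element.
template <typename T>
bool next_is_greater(T& x) {
    return &x + 1 > &x;
}
```

Neither function dereferences the one-past-the-end pointer; only forming and comparing it is covered by the quoted rules.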
An array is guaranteed to occupy consecutive memory. Since C++03 or so, a vector is guaranteed to do the same for &vec[0] ... &vec[vec.size() - 1]. This automatically means that what you're asking about is true.
This is called contiguous storage. For vectors, it is described here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0944r0.html
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size(). Presumably five more years of studying the interactions of contiguity with caching made it clear to WG21 that contiguity needed to be mandated and non-contiguous vector implementation should be clearly banned.
The latter is from the standards paper linked above. It seems my guess of C++03 was right.

Will sizeof always be a multiple of alignof?

Is sizeof(Type) always divisible by alignof(Type)
such that this statement will always be true? sizeof(Type) % alignof(Type) == 0
Yes, sizeof(Type) % alignof(Type) == 0 is true for all class types.
The standard draft says:
[dcl.array] ... An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
[expr.sizeof] ... When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
In order for every element of an array to be aligned, the distance between two adjacent elements must be a multiple of the alignment. sizeof is defined to be this distance.
Interestingly, for fundamental types other than narrow character type, sizeof is just implementation defined:
[expr.sizeof] ... The result of sizeof applied to any other fundamental type (6.7.1)
is implementation-defined.
That said, I've never seen a system where the size of a fundamental type hasn't been a multiple of its alignment. They have to be aligned in an array as well after all.
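The divisibility claim is easy to check on a given implementation. The sketch below uses a hypothetical helper size_is_multiple_of_alignment and a hypothetical struct Mixed. For class types the static_asserts follow from the quoted rules. For fundamental types they only confirm what the implementation happens to do.

```cpp
#include <cstddef>

// True if sizeof(T) is a whole multiple of alignof(T).
template <typename T>
constexpr bool size_is_multiple_of_alignment() {
    return sizeof(T) % alignof(T) == 0;
}

// Hypothetical class type; typically gets tail padding after c
// so that the next array element is aligned for d.
struct Mixed {
    double d;
    char   c;
};

static_assert(size_is_multiple_of_alignment<Mixed>(), "class type");
static_assert(size_is_multiple_of_alignment<int>(), "fundamental type");
static_assert(size_is_multiple_of_alignment<long double>(), "fundamental type");
```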

size_t ptrdiff_t and address space

On my system both ptrdiff_t and size_t are 64-bit.
I would like to clarify two things:
I believe that no array could be as large as size_t due to address space restrictions. Is this true?
If yes, then, is there a guarantee that ptrdiff_t will be able to hold the result of subtraction of any pointers within the max-sized array?
No, there is no such guarantee. See, for example, here: https://en.cppreference.com/w/cpp/types/ptrdiff_t
If an array is so large (greater than PTRDIFF_MAX elements, but less
than SIZE_MAX bytes), that the difference between two pointers may not
be representable as std::ptrdiff_t, the result of subtracting two such
pointers is undefined.
Most implementations artificially restrict the maximum array size to make sure that difference between two pointers pointing into the same array fits into ptrdiff_t. So, it is more than likely that on your platform the maximum allowed array size is about SIZE_MAX / 2 (try it). This is not an "address space restriction", it is just a restriction internally enforced by your implementation. Under this restriction, legal pointer subtraction ("legal" = two pointers into the same array) will not overflow.
The language specification does not require that though. Implementations are not required to restrict their array size in that way, meaning that language specification allows seemingly legal pointer subtractions to overflow and produce undefined behavior. But most implementations prefer to defend against this by restricting their array sizes.
See the "three options" here for more details: Why is the maximum size of an array "too large"?
From [support.types.layout]/3
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
So you are guaranteed that size_t can hold the size of the largest array you can have.
ptrdiff_t unfortunately is not so guaranteed. From [support.types.layout]/2
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in 8.7.
Which is okay-ish but then we have [expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (21.2). If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined. [ Note: If the value i − j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. —end note ]
Which states that ptrdiff_t may not be large enough.
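To see the gap between the two types on a particular platform, something like the following sketch can help. max_safe_bytes and difference_may_overflow are hypothetical names, and the code assumes that size_t can represent the maximum value of ptrdiff_t, which holds on common platforms but is not something the quotes above promise.

```cpp
#include <cstddef>
#include <limits>

// Largest object size in bytes for which every char-pointer difference
// inside the object is representable as std::ptrdiff_t.
// Assumption: std::size_t can hold PTRDIFF_MAX.
constexpr std::size_t max_safe_bytes() {
    return static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max());
}

// True if subtracting two char pointers into an object of `bytes` bytes
// could produce a value that does not fit in std::ptrdiff_t.
constexpr bool difference_may_overflow(std::size_t bytes) {
    return bytes > max_safe_bytes();
}
```

On a typical 64-bit system max_safe_bytes() is about half of SIZE_MAX, which matches the "about SIZE_MAX / 2" limit described in the other answer.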

How does sizeof know the size of array? [duplicate]

I have the following code:
#include <stdio.h>
int main(void) {
    int array[5] = {3,6,9,-8,1};
    /* sizeof yields a size_t, which is printed with %zu */
    printf("the size of the array is %zu\n", sizeof(array));
    /* %p expects a void pointer */
    printf("the address of array is %p\n", (void *)array);
    printf("the address of array is %p\n", (void *)&array);
    int *x = array;
    printf("the address of x is %p\n", (void *)x);
    printf("the size of x is %zu\n", sizeof(x));
    return 0;
}
The output is
the size of the array is 20
the address of array is 0x7fff02309560
the address of array is 0x7fff02309560
the address of x is 0x7fff02309560
the size of x is 8
I know the variable array will be seen as a pointer to the first element of the array, so I understand why the size of x is 8. But I don't know why the size of the array is 20. Shouldn't it be 8 (on a 64-bit machine)?
Besides, how does the program know that it is 20? As far as I know, C doesn't store the number of elements. How come sizeof(array) and sizeof(x) are different? I read several posts about array decay but found nothing on this problem.
The name of an array decays to a pointer to the first element of the array in most situations. There are a couple of exceptions to that rule though. The two most important are when the array name is used as the operand of either the sizeof operator or the address-of operator (&). In these cases, the name of the array remains an identifier for the array as a whole.
For a non-VLA array, this means that the size of the array can be determined statically (at compile time) and the result of the expression will be the size of the array (in bytes), not the size of a pointer.
When you take the address of the array, you get the same value (i.e., the same address) as if you'd just used the name of the array without taking the address. The type is different though--when you explicitly take the address, what you get is a pointer of type "pointer to array of N items of type T". That means (for one example) that while array+1 points to the second element of the array, &array+1 points to another array just past the end of the entire array.
Assuming an array of at least two items, *(array+1) will refer to the second element of the array. Regardless of the array size, &array+1 will yield an address past the end of the array, so attempting to dereference that address gives undefined behavior.
In your case, given that the size of the array is 20, and the size of one element of the array is 4, if array was, say, 0x1000, then array+1 would be 0x1004 and &array+1 would be 0x1014 (0x14 = 20).
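The difference between array+1 and &array+1 can be measured directly. This is a sketch in C++ rather than C, with hypothetical helper names element_step and whole_array_step. It relies on the usual practice of comparing addresses through char pointers.

```cpp
#include <cstddef>

// Byte distance from a to a + 1: the step of a pointer to one element.
template <typename T, std::size_t N>
std::ptrdiff_t element_step(T (&a)[N]) {
    return reinterpret_cast<const char*>(a + 1) -
           reinterpret_cast<const char*>(a);
}

// Byte distance from &a to &a + 1: the step of a pointer to the whole array.
template <typename T, std::size_t N>
std::ptrdiff_t whole_array_step(T (&a)[N]) {
    return reinterpret_cast<const char*>(&a + 1) -
           reinterpret_cast<const char*>(&a);
}
```

For the int array[5] in the question, the first step is sizeof(int) and the second is sizeof(array), i.e. five times as large.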
Your array has a static length, so its size can be determined at compile time. Your compiler knows that sizeof(int) is 4 and that the array length is 5, and 4 * 5 = 20.
Edit: Your compiler's int is probably 32 bits wide, while addresses are 64 bits wide. That is why sizeof(pointer) returns 8.
Note that sizeof is not a library function. sizeof is
a compile-time unary operator [...] that can be used to compute the size of any object (K&R)
So sizeof doesn't know how big the array is; the compiler knows how big the array is, and by definition
when applied to an array, the result is the total number of bytes in the array. (K&R)
A pointer and an array are two different data types.
An array holds elements of the same data type, and its memory is contiguous.
A pointer is used to point to some valid memory location.
sizeof(type) gives you the number of bytes of the type you pass.
If you pass an array, the compiler knows that it is an array and how many elements it has, so it simply multiplies the number of elements by the size of the element type.
In this case:
5*4 = 20
Again, sizeof(int) and sizeof(pointer) are platform dependent. In this case sizeof(pointer) is 8.
No, arrays do not decay as operands of the sizeof operator. This is one of the few places where arrays don't decay. If an int is 4 bytes on your machine, then the total number of bytes of the array should be 20 (4 * 5). We don't even need an object to test this.
sizeof(int[5]) // 20
sizeof(int*) // 8 on a 64-bit machine
C11: 6.5.3.4 (p2)
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. [...]
In the declaration
int array[5]
the type of array is an array of 5 ints. The compiler will determine the size of array from this type.
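Since the size comes from the operand's type, the result changes as soon as the array has decayed to a pointer. A small sketch in C++, with hypothetical function names:

```cpp
#include <cstddef>

// The argument has already decayed: p is just an int*, so sizeof(p)
// is the size of a pointer, whatever array it came from.
std::size_t size_seen_through_pointer(int* p) {
    return sizeof(p);
}

// A reference to an array keeps the array type, so sizeof(a)
// is still the size of the whole array.
template <std::size_t N>
std::size_t size_seen_through_reference(int (&a)[N]) {
    return sizeof(a);
}
```

Passing the same int array[5] to both gives sizeof(int*) from the first and 5 * sizeof(int) from the second.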
Try this
int x = sizeof(array)/sizeof(int);
printf("the size of the array is %d\n", x);

How is an array aligned in C++ compared to a type contained?

Suppose I have some type T that has to be N bytes aligned. Now I declare an array of type T:
T array[size];
Will the array have the same alignment requirements as type T or will it have any other alignment requirements?
Yes, the alignment requirements must be the same. Obviously an array of T must be aligned at least as strictly as a single T otherwise its first member would not be properly aligned. The fact that an array cannot be more strictly aligned than its element type follows from the standard section 8.3.4 which says that arrays are contiguously allocated element subobjects. Consider this array of arrays:
T a[2][size];
Whatever the value of size, there can be no "extra" padding between the two arrays a[0] and a[1], otherwise this violates the contiguously allocated requirement.
Equivalently, we know that (char*)&a[1] == (char*)&a[0] + sizeof(a[0]) and sizeof(a[0]) == sizeof(T[size]) == size * sizeof(T). As this holds for any size it must be possible to place an array of T at any address which is suitably aligned for a single T object (given adequate address space).
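Both parts of this argument can be written down as checks. This is a sketch: same_alignment and rows_are_adjacent are hypothetical helpers, and the address comparison goes through char pointers as in the identity above.

```cpp
#include <cstddef>

// alignof applied to an array type reports the alignment of its element type.
template <typename T, std::size_t N>
constexpr bool same_alignment() {
    return alignof(T[N]) == alignof(T);
}

// Checks (char*)&a[1] == (char*)&a[0] + sizeof(a[0]) for a T[2][N],
// i.e. that the second row starts right where the first one ends.
template <typename T, std::size_t N>
bool rows_are_adjacent(T (&a)[2][N]) {
    return reinterpret_cast<const char*>(&a[1]) ==
           reinterpret_cast<const char*>(&a[0]) + sizeof(a[0]);
}

static_assert(same_alignment<int, 3>(), "int[3] is aligned like int");
static_assert(same_alignment<double, 7>(), "double[7] is aligned like double");
```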
The array's alignment requirements will be identical to those of the array elements, I believe.
Obviously, the start of the array must be aligned at least as strictly as its first element requires, so its alignment requirements can't be less strict.
The start address of the array plus the size of each element must leave the second element sufficiently aligned. That places a constraint on the size of the element type, which I believe means padding can be introduced at the end of a structure just to keep arrays aligned, even if you never use that struct in an array. But it does not mean there's any need for stricter alignment.
By induction, subsequent elements are OK if the first two are OK, so giving the array the same alignment requirements as its elements should be fine.
A citation from the spec would be nice, though.
The rules are the same, I believe, but the interpretation might be confusing.
I believed that since each element of an array is the same size, aligning the first element would automatically align the rest, and hence there would never be any padding between elements.
This might be true in the case of a trivial array but not for complex scenarios.
The stride of an array can be larger than the element size, i.e. there could be padding between individual elements.
Following is a good example
struct ThreeBytesWide {
char a[3];
};
struct ThreeBytesWide myArray[100];
source - stride wikipedia
Each element of the ThreeBytesWide array could be aligned to a four-byte boundary.
Edit: As elaborated in the comments, the padding between individual elements refers to the case where the element itself is, say, 3 bytes and is aligned to a four-byte boundary.
An array of objects is required to be contiguous, so there's never padding between the objects, though padding can be added to the end of an object (producing nearly the same effect).
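In C++ terms the stride of an array is always sizeof of the element, so any padding shows up inside sizeof rather than between elements. A sketch under that reading follows. stride_of and IntAndChar are hypothetical names, and ThreeBytesWide is the struct from the answer above.

```cpp
#include <cstddef>

struct ThreeBytesWide {
    char a[3];
};

// Hypothetical type that usually carries tail padding after c.
struct IntAndChar {
    int  i;
    char c;
};

// Distance in bytes between consecutive elements of an array of T,
// derived from how much a two-element array grows over a single T.
template <typename T>
constexpr std::size_t stride_of() {
    return sizeof(T[2]) - sizeof(T);
}

static_assert(stride_of<ThreeBytesWide>() == sizeof(ThreeBytesWide),
              "stride equals element size");
static_assert(sizeof(ThreeBytesWide[100]) == 100 * sizeof(ThreeBytesWide),
              "no padding between elements");
static_assert(stride_of<IntAndChar>() % alignof(IntAndChar) == 0,
              "padding is inside the element");
```

If an implementation did choose to align ThreeBytesWide to four bytes, sizeof(ThreeBytesWide) would itself become 4 and these assertions would still hold.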
C++ Data Member Alignment and Array Packing
#include <iostream>
__declspec(align(32))
struct Str1
{
int a;
char c;
};
template<typename T>
struct size
{
T arr[10];
};
int main()
{
size<Str1> b1;
std::cout << sizeof(Str1) << std::endl; // prints 32
std::cout << sizeof(b1) << std::endl; // prints 320
std::cin.ignore();
return 0;
}
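The example above uses the MSVC-specific __declspec(align(32)). A roughly equivalent portable sketch with the standard alignas specifier is shown here, assuming the implementation supports 32-byte alignment (support for over-aligned types is implementation-defined). Holder is a hypothetical name standing in for the size template above.

```cpp
#include <cstddef>

// Over-aligned element type, using standard alignas instead of __declspec.
struct alignas(32) Str1 {
    int  a;
    char c;
};

// Hypothetical wrapper holding ten elements, like the template above.
template <typename T>
struct Holder {
    T arr[10];
};

static_assert(alignof(Str1) == 32, "requested alignment is applied");
static_assert(sizeof(Str1) % 32 == 0, "size is padded up to the alignment");
static_assert(sizeof(Str1[10]) == 10 * sizeof(Str1), "array adds no padding");
static_assert(sizeof(Holder<Str1>) >= 10 * sizeof(Str1), "wrapper holds all ten");
```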
References:
Data alignment in C++, standard and portability
http://msdn.microsoft.com/en-us/library/83ythb65.aspx