Is sizeof(Type) always divisible by alignof(Type)?
In other words, is the following statement always true? sizeof(Type) % alignof(Type) == 0
Yes, sizeof(Type) % alignof(Type) == 0 is true for all class types.
The standard draft says:
[dcl.array] ... An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
[expr.sizeof] ... When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
In order for every element of an array to be aligned, the distance between two adjacent elements must be a multiple of the alignment. sizeof is defined to be this distance.
Interestingly, for fundamental types other than the narrow character types, sizeof is simply implementation-defined:
[expr.sizeof] ... The result of sizeof applied to any other fundamental type (6.7.1)
is implementation-defined.
That said, I've never seen a system where the size of a fundamental type wasn't a multiple of its alignment. After all, fundamental types have to be aligned in an array as well.
On my system both ptrdiff_t and size_t are 64-bit.
I would like to clarify two things:
I believe that no array could be as large as size_t due to address space restrictions. Is this true?
If yes, then, is there a guarantee that ptrdiff_t will be able to hold the result of subtraction of any pointers within the max-sized array?
No, there is no such guarantee. See, for example, here: https://en.cppreference.com/w/cpp/types/ptrdiff_t
If an array is so large (greater than PTRDIFF_MAX elements, but less
than SIZE_MAX bytes), that the difference between two pointers may not
be representable as std::ptrdiff_t, the result of subtracting two such
pointers is undefined.
Most implementations artificially restrict the maximum array size to make sure that the difference between two pointers into the same array fits into ptrdiff_t. So it is more than likely that on your platform the maximum allowed array size is about SIZE_MAX / 2 (try it). This is not an "address space restriction"; it is a restriction enforced internally by your implementation. Under this restriction, legal pointer subtraction ("legal" = two pointers into the same array) will not overflow.
The language specification does not require that though. Implementations are not required to restrict their array size in that way, meaning that language specification allows seemingly legal pointer subtractions to overflow and produce undefined behavior. But most implementations prefer to defend against this by restricting their array sizes.
See the "three options" here for more details: Why is the maximum size of an array "too large"?
From [support.types.layout]/3
The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
So you are guaranteed that size_t can hold the size of the largest array you can have.
ptrdiff_t unfortunately is not so guaranteed. From [support.types.layout]/2
The type ptrdiff_t is an implementation-defined signed integer type that can hold the difference of two subscripts in an array object, as described in 8.7.
Which is okay-ish but then we have [expr.add]/5
When two pointers to elements of the same array object are subtracted, the type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (21.2). If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined. [ Note: If the value i − j is not in the range of representable values of type std::ptrdiff_t, the behavior is undefined. —end note ]
Which states that ptrdiff_t may not be large enough.
Consider the following struct definition:
#define SIZE ... // it's a positive multiple of sizeof(Foo*)
struct Foo {
    Foo* ptr;
    char padding[SIZE - sizeof(Foo*)];
};
Given that SIZE is a positive multiple of the pointer size (sizeof(Foo*)), is it guaranteed by the standard that sizeof(Foo) == SIZE?
If it is not guaranteed, as a practical matter, are there any platforms in common use that provide a counter-example (where the equality doesn't hold)?
Yes, I'm aware of alignas...
There is no guarantee about padding.
C++ Standard (working draft n4741) 6.7(4) Types
4 The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits that
hold the value of type T. Bits in the object representation that are not
part of the value representation are padding bits. For trivially copyable
types, the value representation is a set of bits in the object
representation that determines a value, which is one discrete element of
an implementation-defined set of values. (41)
(41) The intent is that the memory model of C++ is compatible with that
of ISO/IEC 9899 Programming Language C.
C++ Standard (working draft n4741) 8.5.2.3(2) Sizeof
When applied to a reference or a reference type, the result is the size
of the referenced type. When applied to a class, the result is the
number of bytes in an object of that class including any padding required
for placing objects of that type in an array. The result of applying
sizeof to a potentially-overlapping subobject is the size of the type,
not the size of the subobject. When applied to an array, the result is
the total number of bytes in the array. This implies that the size of an
array of n elements is n times the size of an element.
I can't point to an example, offhand, where the equality fails, but given the standard's stated memory-model compatibility with ISO/IEC 9899 (Programming Language C), no guarantee can be given about padding: it is left to the implementation.
Assume we have an array that contains N elements of type T.
T a[N];
According to the C++14 Standard, under which conditions do we have a guarantee that
(char*)(void*)&a[0] + n*sizeof(T) == (char*)(void*)&a[n], (0<=n<N) ?
While this is true for many types and implementations, the standard mentions it in a footnote, and in an ambiguous way:
§5.7.6, footnote 85) Another way to approach pointer arithmetic ...
There is little indication that this other way was thought of as being equivalent to the standard's way. It might rather be a hint for implementers, suggesting one of many conforming implementations.
Edits:
People have underestimated the difficulty of this question.
This question is not about what you can read in textbooks; it is about what you can deduce from the C++14 Standard through the use of logic and reason.
If you use 'contiguous' or 'contiguously', please also say what is being contiguous.
While T[] and T* are closely related, they are abstractions, and the addition on T* x N may be defined by the implementation in any consistent way.
The equation was rearranged using pointer addition. If p points to a char, p+1 is always defined using (§5.7 (4)) or unary addition, so we don't run into UB. The original included a pointer subtraction, which might have caused UB early on. (The char pointers are only compared, not dereferenced).
In [dcl.array]:
An object of array type contains a contiguously allocated non-empty
set of N subobjects of type T.
Contiguous implies that the offset between any consecutive subobjects of type T is sizeof(T), which implies that the offset of the nth subobject is n*sizeof(T).
The upper bound of n < N comes from [expr.add]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements,
the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j < n; otherwise, the behavior is undefined.
It's always true, but instead of looking at the rules for pointer arithmetic you must rely on the semantics given for the sizeof operator (5.3.3 [expr.sizeof]):
When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero.
The result of applying sizeof to a base class subobject is the size of the base class type. When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element.
It should be clear that there's only one packing that puts n non-overlapping elements in space of n * sizeof(element), namely that they are regularly spaced sizeof (element) bytes apart. And only one ordering is allowed by the pointer comparison rules found under the relational operator section (5.9 [expr.rel]):
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
The declaration in the first line is also a definition. (§3.1(2))
It creates the array object. (§1.8(1))
An object can be accessed via multiple lvalues
due to the aliasing rules. (§3.10(10)) In particular, the objects on the
right hand side may be legally accessed (aliased) through char pointers.
Let's look at a sentence in the array definition and then disambiguate 'contiguous'.
"An object of array type contains a contiguously allocated non-empty set
of N subobjects of type T." [dcl.array] §8.3.4.
Disambiguation
We start from the binary symmetric relation 'contiguous' for char objects, which should be obvious. ('iff' is short for 'if and only if'; sets and sequences are mathematical ones, not C++ containers.) If you can link to a better or more widely acknowledged definition, please comment.
A sequence x_1 ... x_N of char objects is contiguous iff
x_i and x_{i+1} are contiguous in memory for all i=1...N-1.
A set M of char objects is contiguous iff the objects in
M can be numbered, x_1 ...x_N, say, such that the sequence (x_i)_i is contiguous.
That is, iff M is the image of a contiguous, injective sequence.
Two sets M_1, M_2 of char objects are contiguous iff there
exist x_1 in M_1 and x_2 in M_2 such that x_1 and x_2 are contiguous.
A sequence M_1 ... M_N of sets of char objects is contiguous iff
M_i and M_{i+1} are contiguous for all i=1...N-1.
A set of sets of char objects is contiguous iff it is the image of
a contiguous, injective sequence of sets of char objects.
Now which version of 'contiguous' to apply? Linguistic overload resolution:
1) 'contiguous' may refer to 'allocation'. As an allocation function call provides a
subset of the available char objects, this would invoke the set-of-chars variant. That is,
the set of all char objects that occur in any of the N subobjects would be meant to be contiguous.
2) 'contiguous' may refer to 'set'. This would invoke the set-of-sets-of-chars variant with every subobject considered as a set of char objects.
What does this mean? First, while the authors numbered the array subobjects a[0] ... a[N-1], they chose not to say anything about the
order of subobjects in memory: they used 'set' instead of 'sequence'.
They described the allocation as contiguous, but they do not say that
a[j] and a[j+1] are contiguous in memory. Also, they chose not to write down the
straightforward formula involving (char*) pointers and sizeof(). While it looks like they
deliberately separated contiguity from ordering concerns,
§5.9 (3) requires one and the same ordering for array subobjects of all types.
If pointers point to two different elements of the same array, or a subobject thereof, the pointer
to the element with the higher subscript compares greater.
Now do the bytes that make up the array subobjects qualify as
subobjects in the sense of the above quote? Reading §1.8(2) and the question "Complete object or subobject?",
the answer is: No, at least not for arrays whose elements don't contain subobjects and are not arrays of chars, e.g. arrays of ints. So we may find examples where no particular ordering is imposed on the array elements.
But for the moment let's assume that our array subobjects are populated with chars only.
What does this mean considering the two possible interpretations of 'contiguous'?
1) We have a contiguous set of bytes that coincides with an ordered set of subobjects.
Then the claim in the OP is unconditionally true.
2) We have a contiguous sequence of subobjects, each of which may be non-contiguous individually.
This may happen in two ways: either the subobjects may have gaps, that is, they
contain two char objects at distance greater than sizeof(subobject)-1. Or the
subobjects may be distributed among different sequences of contiguous bytes.
In case 2) there is no guarantee that the claim in the OP is true.
Therefore, it is important to be clear about what 'contiguous' means.
Finally, here's an example of an implementation where no obvious ordering is imposed on the array subobjects by §5.9 because the array subobjects don't have subobjects themselves. Readers raised concerns that this would contradict the standard in other places, but no definite contradiction has been demonstrated yet.
Assume T is int, and we have one particular conforming implementation that behaves as expected naively with one exception:
It allocates arrays of ints in reversed memory order,
putting the array's first element at the high-memory-address end of the object:
a[N-1], a[N-2], ... a[0]
instead of
a[0], a[1], ... a[N-1]
This implementation satisfies any reasonable contiguity
requirement, so we don't have to agree on a single interpretation of
'contiguous' to proceed with the argument.
Then if p points to a, mapping p to &a[0] (invoking [conv.array]) would make the pointer jump near the high memory end of a.
As array arithmetic has to be compatible with pointer arithmetic, we'd also have
int * p= &intVariable;
(char*)(p+1) + sizeof(int) == (char*)p
and
int a[N];
(char*)(void*)&a[n] + n*sizeof(int)==(char*)(void*)&a[0], (0<=n<N)
Then, for T=int, there is no guarantee that the claim in the original post is true.
edit history: removed and reintroduced in modified form a possibly erroneous shortcut that was due to not applying a relevant part of the pointer < relation specification. It has not been determined yet whether this was justified or not, but the main argument about contiguity comes through anyway.
I've been poring over the draft standard and can't seem to find what I'm looking for.
If I have a standard-layout type
struct T {
unsigned handle;
};
Then I know that reinterpret_cast<unsigned*>(&t) == &t.handle for some T t;
The goal is to create some vector<T> v and pass &v[0] to a C function that expects a pointer to an array of unsigned integers.
So, does the standard define sizeof(T) == sizeof(unsigned) and does that imply that an array of T would have the same layout as an array of unsigned?
While this question addresses a very similar topic, I'm asking about the specific case where both the data member and the class are standard layout, and the data member is a fundamental type.
I've read some paragraphs that seem to hint that maybe it might be true, but nothing that hits the nail on the head. For example:
§ 9.2.17
Two standard-layout struct (Clause 9) types are layout-compatible if
they have the same number of non-static data members and corresponding
non-static data members (in declaration order) have layout-compatible
types
This isn't quite what I'm looking for, I don't think.
You essentially are asking, given:
struct T {
U handle;
};
whether it's guaranteed that sizeof(T) == sizeof(U). No, it is not.
Section 9.2/17 of the ISO C++03 standard says:
A pointer to a POD-struct object, suitably converted using a
reinterpret_cast, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides) and vice versa.
Suppose you have an array of struct T. The vice versa part means that the address of any of the T::handle members must also be a valid address of a struct T. Now, suppose that these members are of type char and that your claim is true. This would mean that struct T would be allowed to have an unaligned address, which seems rather unlikely. The standard usually tries to not tie the hands of implementations in such a way. For your claim to be true, the standard would have to require that struct T be allowed to have unaligned addresses. And it would have to be allowed for all structures, because struct T could be a forward-declared, opaque type.
Furthermore, section 9.2/17 goes on to state:
[Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment.]
Which, taken a different way, means that there is no guarantee that there will never be padding.
I am used to Borland environments, and for them:
T is a struct in your case, so sizeof(T) is the size of the struct.
That size depends on the #pragma pack and alignment settings of your compiler,
so it can sometimes be greater than sizeof(unsigned).
For the same reason, if you have a 4-byte struct (uint32) and 16-byte alignment,
struct T { uint32 u; };
then T a[100] is not the same as uint32 a[100];
because each T is a uint32 plus 12 bytes of empty space.
RETRACTION: The argument is erroneous. The proof of Lemma 2 relies on a hidden premise that the alignment of an aggregate type is determined strictly by the alignments of its member types. As Dyp points out in the commentary, that premise is not supported by the standard. It is therefore admissible for struct { Foo f; } to have a stricter alignment requirement than Foo.
I'll play devil's advocate here, since no one else seems to be willing. I will argue that standard C++ (I'll refer to N3797 herein) guarantees that sizeof(T) == sizeof(U) when T is a standard layout class (9/7) with default alignment having a single default-aligned non-static data member of type U, e.g.,
struct T { // or class, or union
U u;
};
It's well-established that:
sizeof(T) >= sizeof(U)
offsetof(T, u) == 0 (9.2/19)
U must be a standard layout type for T to be a standard layout class
u has a representation consisting of exactly sizeof(U) contiguous bytes of memory (1.8/5)
Together these facts imply that the first sizeof(U) bytes of the representation of T are occupied by the representation of u. If sizeof(T) > sizeof(U), the excess bytes must then be tail padding: unused padding bytes inserted after the representation of u in T.
The argument is, in short:
The standard details the circumstances in which an implementation may add padding to a standard-layout class,
None of those circumstances applies in this particular instance, and hence
A conforming implementation may not add padding.
Potential Sources of Padding
Under what circumstances does the standard allow an implementation to add such padding to the representation of a standard layout class? When it's necessary for alignment: per 3.11/1, "An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated." There are two mentions of adding padding, both for alignment reasons:
5.3.3 Sizeof [expr.sizeof]/2 states "When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array. The size of a most derived class shall be greater than zero (1.8). The result of applying sizeof to a base class subobject is the size of the base class type. When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."
9.2 Class members [class.mem]/13 states "Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1)."
(Notably the C++ standard does not contain a blanket statement allowing implementations to insert padding in structures as in the C standards, e.g., N1570 (C11-ish) §6.7.2.1/15 "There may be unnamed padding within a structure object, but not at its beginning." and /17 "There may be unnamed padding at the end of a structure or union.")
Clearly the text of 9.2 doesn't apply to our problem, since (a) T has only one member and thus no "adjacent members", and (b) T is standard layout and hence has no virtual functions or virtual base classes (per 9/7). Demonstrating that 5.3.3/2 doesn't allow padding in our problem is more challenging.
Some Prerequisites
Lemma 1: For any type W with default alignment, alignof(W) divides sizeof(W): By 5.3.3/2, the size of an array of n elements of type W is exactly n times sizeof(W) (i.e., there is no "external" padding between array elements). The addresses of consecutive array elements are then sizeof(W) bytes apart. By the definition of alignment, it must then be that alignof(W) divides sizeof(W).
Lemma 2: The alignment alignof(W) of a default-aligned standard layout class W with only default-aligned data members is the least common multiple LCM(W) of the alignments of the data members (or 1 if there are none): Given an address at which an object of W can be allocated, the address LCM(W) bytes away must also be appropriately aligned: the difference between the addresses of member subobjects would also be LCM(W) bytes, and the alignment of each such member subobject divides LCM(W). Per the definition of alignment in 3.11/1, we have that alignof(W) divides LCM(W). Any whole number of bytes n < LCM(W) must not be divisible by the alignment of some member v of W, so an address that is only n bytes away from an address at which an object of W can be allocated is consequently not appropriately aligned for an object of W, i.e., alignof(W) >= LCM(W). Given that alignof(W) divides LCM(W) and alignof(W) >= LCM(W), we have alignof(W) == LCM(W).
Conclusion
Application of this lemma to the original problem has the immediate consequence that alignof(T) == alignof(U). So how much padding might be "required for placing objects of that type in an array"? None. Since alignof(T) == alignof(U) by the second lemma, and alignof(U) divides sizeof(U) by the first, it must be that alignof(T) divides sizeof(U) so zero bytes of padding are required to place objects of type T in an array.
Since all possible sources of padding bytes have been eliminated, an implementation may not add padding to T and we have sizeof(T) == sizeof(U) as required.
At first one might think std::numeric_limits<size_t>::max(), but if there was an object that huge, could it still offer a one-past-the-end pointer? I guess not. Does that imply the largest value sizeof(T) could yield is std::numeric_limits<size_t>::max()-1? Am I right, or am I missing something?
Q: What is the largest value sizeof(T) can yield?
A: std::numeric_limits<size_t>::max()
Clearly, sizeof cannot return a value larger than std::numeric_limits<size_t>::max(), since it wouldn't fit. The only question is, can it return ...::max()?
Yes. Here is a valid program, that violates no constraints of the C++03 standard, which demonstrates a proof-by-example. In particular, this program does not violate any constraint listed in §5.3.3 [expr.sizeof], nor in §8.3.4 [dcl.array]:
#include <cstddef>
#include <iostream>
#include <limits>
int main () {
    typedef char T[std::numeric_limits<std::size_t>::max()];
    std::cout << sizeof(T) << "\n";
}
If std::numeric_limits<ptrdiff_t>::max() > std::numeric_limits<size_t>::max() you can compute the size of an object of size std::numeric_limits<size_t>::max() by subtracting a pointer to it from a one-past-the-end pointer.
If sizeof(T*) > sizeof(size_t) you can have enough distinct pointers to address each and every single byte inside that object (in case you have an array of char, for example) plus one for one-past-the-end.
So, it's possible to write an implementation where sizeof can return std::numeric_limits<size_t>::max(), and where you can get pointer to one-past-the-end of an object that large.
It's not exactly well-defined, but to stay within the safe limits of the standard, the maximum object size is std::numeric_limits<ptrdiff_t>::max().
That's because subtracting two pointers yields a ptrdiff_t,
which is a signed integer type.
The requirement to be able to point beyond the end of an array has nothing to do with the range of size_t. Given an object x, it's quite possible for (&x)+1 to be a valid pointer, even if the number of bytes separating the two pointers can't be represented by size_t.
You could argue that the requirement does imply an upper bound on object size of the maximum range of pointers, minus the alignment of the object. However, I don't believe the standard says anywhere that such a type can't be defined; it would just be impossible to instantiate one and still remain conformant.
If this was a test, I'd say (size_t) -1
A sizeof() expression yields a value of type size_t. From C99 standard 6.5.3.4:
The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in stddef.h (and other
headers).
Therefore, the maximum value that sizeof() can yield is SIZE_MAX.
You can have a standard-compliant compiler that allows object sizes large enough for pointer arithmetic to overflow; however, the result of such an overflow is undefined. From the C++ standard, 5.7 [expr.add]:
When two pointers to elements of the same array object are subtracted,
the result is the difference of the subscripts of the two array
elements. The type of the result is an implementation-defined signed
integral type; this type shall be the same type that is defined as
std::ptrdiff_t in the <cstddef> header (18.2). As with any other
arithmetic overflow, if the result does not fit in the space provided,
the behavior is undefined.