I was looking at a small program -
#include<iostream>
class A
{
bool a;
bool c;
};
int main()
{
std::cout << sizeof(A) << std::endl;
return 0;
}
Here, it shows the size of class A as 2 bytes. But, if I add another integer data member to this class as -
class A
{
int b;
bool a;
bool c;
};
Now, it shows the size of class A as 8 bytes instead of 6 bytes. Why does the compiler add padding in the second case but not in the first?
The size of a structure is a multiple of the alignment requirement of the data member with the largest alignment requirement. This ensures that, in an array of structures, there is no padding between the elements of the array and the alignment requirement of each data member of each structure is satisfied.
In the first case, the largest alignment requirement is alignof(bool) which is 1, so the size of the structure is a multiple of 1.
In the second case, the largest alignment requirement is alignof(int) which is 4, so the size of the structure is a multiple of 4.
Try adding a double member with the alignment requirement of 8.
A rule of thumb to minimize padding in structures is to arrange data members from the largest alignment requirement to the smallest, e.g. doubles followed by pointers followed by longs followed by ints followed by shorts followed by bools followed by chars.
Imagine you have an Array of As. The second A in the array would have its member b misaligned if there was no additional padding. That said the compiler can choose whatever it feels like. Misaligned accesses cost runtime, padding costs space. Your compiler favored runtime.
The only things the compiler must do are place the first member at the same address as the structure itself and preserve the order of the members as declared in the class.
Other than these, the compiler is free to do what it wants (unless you tell it otherwise using packing directives, which vary from compiler to compiler). It will probably pad members in order to optimise runtime speed.
I roughly know about alignment and have read cppreference's Objects and alignment and Wikipedia's Data structure alignment. However, I still have some doubts. I'm mainly interested in C++, but the question applies to C too, as it uses mostly the same rules for alignment.
I know that padding is added to increase the efficiency of data access, because on some architectures accessing a value at an address multiple of its size is faster/better (alignment).
Is that the only reason why padding is used?
If so, consider the following structures:
struct A {
int i;
char c;
};
struct B {
struct A a;
char d;
};
On my architecture (x86_64), the compiler places 3 bytes of padding at the end of A so that sizeof(A)==8 and sizeof(A[2])==16, and another 3 bytes of padding at the end of B, so that sizeof(B)==12.
I understand that aligning A to 8 bytes makes storing it in an array more efficient. But it doesn't seem to be useful at all, when A is placed inside B.
If everything so far is correct, then I'm wondering:
Why is padding placed at the end of types, instead of being limited to the gaps between elements of aggregated types (e.g. struct or array) and never placed at the end?
An example of what I mean: wouldn't it be better if the compiler decided that sizeof(A)==5, sizeof(B)==6, sizeof(A[2])==13 (3 bytes of padding between the elements, but not at the end)?
Consider an architecture where int 4-byte alignment is required (or desired for performance). Now consider the following structure:
struct S {
int i;
char c;
};
There probably won't be any padding between i and c. But now think about what happens if you define something like:
struct S array[10];
Since arrays are not allowed to have any padding between their elements, this padding has to be added to the S structure itself, at the end of it (3 bytes after c), to maintain the proper alignment of each element of the array.
How to simply obtain the sum of sizes of data members belonging to a structure like this: (including padding between data members but not including padding between structures) ?
#pragma pack(4)
alignas(16) struct {
char c[4];
short s;
unsigned char u[3];
} MyStruct;
The expression below correctly calculates this sum, but is difficult to construct since it requires the knowledge of member names, types as well as their order:
printf("%zu\n", &MyStruct.u[3] - &MyStruct.c[0] ); //Outputs: 11
The expression below does not calculate this sum correctly because it also counts the inter-structure padding needed to maintain alignment (the padding that would be needed if many such structures were put in an array).
printf("%zu\n", sizeof(MyStruct) ); //Outputs: 16
I realize, that the latter is by design because it facilitates pointer arithmetic. Also, the C++ documentation states:
"When applied to a structure/class type, the result of sizeof() is the size of an object of that structure/class plus any additional padding required to place such object in an array."
However, that behavior of sizeof() makes it useless for me when dealing with single non-arrayed structures, especially when calculating the number of bytes to transmit over a network, if I do not want to transmit the inter-structure padding bytes...and I don't!
Note, that transmitting the inter-member padding does not bother me, because this padding is a part of the network protocol. The protocol designers can eliminate this inter-member padding with #pragma pack(1) anytime they so desire ( if they did this, then the 1st printf() would output: 9 )
Suppose I have some type T that has to be N bytes aligned. Now I declare an array of type T:
T array[size];
Will the array have the same alignment requirements as type T or will it have any other alignment requirements?
Yes, the alignment requirements must be the same. Obviously an array of T must be aligned at least as strictly as a single T otherwise its first member would not be properly aligned. The fact that an array cannot be more strictly aligned than its element type follows from the standard section 8.3.4 which says that arrays are contiguously allocated element subobjects. Consider this array of arrays:
T a[2][size];
Whatever the value of size, there can be no "extra" padding between the two arrays a[0] and a[1]; otherwise this would violate the contiguously allocated requirement.
Equivalently, we know that (char*)&a[1] == (char*)&a[0] + sizeof(a[0]) and sizeof(a[0]) == sizeof(T[size]) == size * sizeof(T). As this holds for any size it must be possible to place an array of T at any address which is suitably aligned for a single T object (given adequate address space).
The array's alignment requirements will be identical to those of the array elements, I believe.
Obviously, the start of the array must be aligned at least as strictly as its first element requires, so its alignment requirements can't be less strict.
The start address of the array plus the size of each element must leave the second element sufficiently aligned. That places a constraint on the size of the element type, which I believe means padding can be introduced at the end of a structure just to keep arrays aligned, even if you never use that struct in an array. But it does not mean there's any need for stricter alignment.
By induction, subsequent elements are OK if the first two are OK, so giving the array the same alignment requirements as its elements should be fine.
A citation from the spec would be nice, though.
The rules are the same, I believe, but the interpretation might be confusing.
I believed since each element of array would be of the same size so only aligning the first element would automatically align the rest and hence there would never be any padding between elements.
This might be true in case of a trivial array but not for complex scenarios.
The stride of an array can be larger than the element size, i.e. there could be padding between individual elements.
Following is a good example
struct ThreeBytesWide {
char a[3];
};
struct ThreeBytesWide myArray[100];
source - stride wikipedia
Each element of the ThreeBytesWide array could be aligned to a four-byte boundary.
Edit: As elaborated in the comments, the padding between individual elements refers to the case where the element itself is, say, 3 bytes wide and aligned to a four-byte boundary.
An array of objects is required to be contiguous, so there's never padding between the objects, though padding can be added to the end of an object (producing nearly the same effect).
C++ Data Member Alignment and Array Packing
#include <iostream>
__declspec(align(32))
struct Str1
{
int a;
char c;
};
template<typename T>
struct size
{
T arr[10];
};
int main()
{
size<Str1> b1;
std::cout << sizeof(Str1) << std::endl; // prints 32
std::cout << sizeof(b1) << std::endl; // prints 320
std::cin.ignore();
return 0;
}
References:
Data alignment in C++, standard and portability
http://msdn.microsoft.com/en-us/library/83ythb65.aspx
suppose a struct defined like this:
struct S{
char a[3];
char b[3];
char c[3];
};
then what will be the output of printf("%d", sizeof(S))? With my compiler, VC++ 2008 Express, the output is 9, and I got confused... I expected the result to be 12, but it is not. Shouldn't the compiler align the structure to 4 or 8?
The value of the sizeof-expression is implementation-dependent; the only thing guaranteed by the C++ standard is that it must be at least nine, since you're storing nine chars in the struct.
The new C++11 standard has an alignas keyword, but this may not be implemented in VC++08. Check your compiler's manual (see e.g. __declspec(align(#))).
There's nothing in S that would force any of its members to be aligned other than per-byte so the compiler doesn't need to add any padding at all.
First, the alignment is implementation dependent, so it will depend on the compiler.
Now, remember that for a statically allocated array, the size need not be stored (the standard does not require that it be), so the alignment of an array is usually the alignment of its elements.
Here, char[3] thus has an alignment of 1, and they are perfectly packed.
There is a compiler switch, /Zp, that allows you to set the default struct member alignment. There are also some other methods for specifying alignment in the C language.
Check out this MSDN post for details:
http://msdn.microsoft.com/en-us/library/xh3e3fd0(v=vs.80).aspx
Maybe your compiler is using one of these settings?
The typical requirement that each member be aligned only requires that the structure itself be aligned to the largest member type. Since all member types are char, the alignment is 1, so there's no need for padding. (For arrays, the base type (all extents removed) is what counts.)
Think about making an array of your structure: You'll want all the members of all the elements of that array to be aligned. But in your case that's just one large array of chars, so there's no need for padding.
As an example, suppose that on your platform sizeof(short) == 2 and that alignment equals size, and consider struct X { char a; short b; char c; };. Then there's one byte internal padding between a and b to align b correctly, but also one byte terminal padding after c so that the entire struct has a size that's a multiple of 2, the largest member size. That way, when you have an array X arr[10], all the elements of arr will be properly aligned individually.
The compiler is given fairly wide latitude in how it aligns data. As a practical matter, however, the alignment of a datum will not exceed its size. That is, chars must be byte-aligned, while ints and longs are often four-byte aligned.
Additionally, structs are aligned to the strictest alignment requirement of their members.
So, in your example, the strictest internal alignment requirement is 1-byte aligned, so the struct is 1-byte aligned. This means that it requires no padding.
Once again, I'm questioning a longstanding belief.
Until today, I believed that the alignment of the following struct would normally be 4 and the size would normally be 5...
struct example
{
int m_Assume_32_Bits;
char m_Assume_8_Bit_Bytes;
};
Because of this assumption, I have data structure code that uses offsetof to determine the distance in bytes between two adjacent items in an array. Today, I spotted some old code that was using sizeof where it shouldn't, couldn't understand why I hadn't had bugs from it, coded up a unit test - and the test surprised me by passing.
A bit of investigation showed that the sizeof the type I used for the test (similar to the struct above) was an exact multiple of the alignment - ie 8 bytes. It had padding after the final member. Here is an example of why I never expected this...
struct example2
{
example m_Example;
char m_Why_Cant_This_Be_At_Offset_6_Bytes;
};
A bit of Googling showed examples that make it clear that this padding after the final member is allowed - for example http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding (the "or at the end of the structure" bit).
This is a bit embarrassing, as I recently posted this comment - Use of struct padding (my first comment to that answer).
What I can't seem to determine is whether this padding to an exact multiple of the alignment is guaranteed by the C++ standard, or whether it is just something that is permitted and that some (but maybe not all) compilers do.
So - is the size of a struct required to be an exact multiple of the alignment of that struct according to the C++ standard?
If the C standard makes different guarantees, I'm interested in that too, but the focus is on C++.
5.3.3/2
When applied to a class, the result [of sizeof] is the number of bytes in an object of that class, including any padding required for placing objects of that type in an array.
So yes, object size is a multiple of its alignment.
One definition of alignment size:
The alignment size of a struct is the offset from one element to the next element when you have an array of that struct.
By its nature, if you have an array of a struct with two elements, then both need to have aligned members, so that means that yes, the size has to be a multiple of the alignment. (I'm not sure if any standard explicitly enforce this, but because the size and alignment of a struct don't depend on whether the struct is alone or inside an array, the same rules apply to both, so it can't really be any other way.)
The standard says (section [dcl.array]):
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
Therefore there is no padding between array elements.
Padding inside structures is not required by the standard, but the standard doesn't permit any other way of aligning array elements.
I am unsure if this is in the actual C/C++ standard, and I am inclined to say that it is up to the compiler (just to be on the safe side). However, I had a "fun" time figuring that out a few months ago, where I had to send dynamically generated C structs as byte arrays across a network as part of a protocol, to communicate with a chip. The alignment and size of all the structs had to be consistent with the structs in the code running on the chip, which was compiled with a variant of GCC for the MIPS architecture. I'll attempt to give the algorithm, and it should apply to all variants of gcc (and hopefully most other compilers).
All base types, like char, short and int align to their size, and they align to the next available position, regardless of the alignment of the parent. And to answer the original question, yes the total size is a multiple of the alignment.
// size 8
struct {
char A; //byte 0
char B; //byte 1
int C; //byte 4
};
Even though the alignment of the struct is 4 bytes, the chars are still packed as close as possible.
The alignment of a struct is equal to the largest alignment of its members.
Example:
//size 4, but alignment is 2!
struct foo {
char A; //byte 0
char B; //byte 1
short C; //byte 2
};
//size 6
struct bar {
char A; //byte 0
struct foo B; //byte 2
};
This also applies to unions, and in a curious way. The size of a union can be larger than any of the sizes of its members, simply due to alignment:
//size 3, alignment 1
struct foo {
char A; //byte 0
char B; //byte 1
char C; //byte 2
};
//size 2, alignment 2
struct bar {
short A; //byte 0
};
//size 4! alignment 2
union foobar {
struct foo A;
struct bar B;
};
Using these simple rules, you should be able to figure out the alignment/size of any horribly nested union/struct you come across. This is all from memory, so if I have missed a corner case that can't be decided from these rules please let me know!
C++ doesn't explicitly say so, but it is a consequence of two other requirements:
First, all objects must be well-aligned.
3.8/1 says
The lifetime of an object of type T begins when [...] storage with the proper alignment and size for type T is obtained
and 3.9/5:
Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
So every object must be aligned according to its alignment requirements.
The other requirement is that objects in an array are allocated contiguously:
8.3.4/1:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
For the objects in an array to be contiguously allocated, there can be no padding between them. But for every object in the array to be properly aligned, each individual object must be padded so that the byte immediately after the end of the object is also well aligned. In other words, the size of the object must be a multiple of its alignment.
So to split your question up into two:
1. Is it legal?
[5.3.3.2] When applied to a class, the result [of the sizeof() operator] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
So, no, it's not.
2. Well, why isn't it?
Here, I can only speculate.
2.1. Pointer arithmetic gets weirder
If alignment applied only "between array elements" but did not affect the size, things would get needlessly complicated, e.g.
(char *)(X+1) != ((char *)X) + sizeof(X)
(I have a hunch that the standard requires this implicitly even without the above statement, but I can't prove it.)
2.2 Simplicity
If alignment affects size, alignment and size can be decided by looking at a single type. Consider this:
struct A { int x; char y; };
struct B { A left, right; };
With the current standard, I just need to know sizeof(A) to determine size and layout of B.
With the alternate you suggest I need to know the internals of A. Similar to your example2: for a "better packing", sizeof(example) is not enough, you need to consider the internals of example.
It is possible to produce a C or C++ typedef whose size is not a multiple of its alignment. This came up recently in this bindgen bug. Here's a minimal example, which I'll call test.c below:
#include <stdio.h>
#include <stdalign.h>
__attribute__ ((aligned(4))) typedef struct {
char x[3];
} WeirdType;
int main() {
printf("sizeof(WeirdType) = %zu\n", sizeof(WeirdType));
printf("alignof(WeirdType) = %zu\n", alignof(WeirdType));
return 0;
}
On my Arch Linux x86_64 machine, gcc -dumpversion && gcc test.c && ./a.out prints:
9.3.0
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Similarly clang -dumpversion && clang test.c && ./a.out prints:
9.0.1
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Saving the file as test.cc and using g++/clang++ gives the same result. (Update from a couple years later: I get the same results from GCC 11.1.0 and Clang 13.0.0.)
Notably however, MSVC on Windows does not seem to reproduce any behavior like this.
The standard says very little about padding and alignment. Very little is guaranteed. About the only thing you can bet on is that the first element is at the beginning of the structure. After that...alignment and padding can be anything.
Seems the C++03 standard didn't say (or I didn't find) whether the alignment padding bytes should be included in the object representation.
And the C99 standard says the "sizeof" a struct type or union type includes internal and trailing padding, but I'm not sure if all alignment padding is included in that "trailing padding".
Now back to your example. There is really no confusion. sizeof(example) == 8 means the structure does take 8 bytes to represent itself, including the 3 trailing padding bytes. If the char in the second structure had an offset of 6, it would overwrite the space used by m_Example. The layout of a given type is implementation-defined, and should be kept stable across the whole implementation.
Still, whether p+1 equals (T*)((char*)p + sizeof(T)) is uncertain. And I'm hoping to find the answer.