Field alignment of a struct in C/C++

Field alignment of a struct in C/C++ - c++

Are the members of a structure packed in C/C++?
By packed I mean that they are compact and among the fields there aren't memory spaces.

That isn't what aligned means, and no, no particular alignment or packing is guaranteed. The elements will be in order, but the compiler can insert padding where it chooses. This actually creates (useful) alignment. E.g., for a x86:
struct s
{
char c;
int i;
};
there will probably (but not necessarily) be three bytes between c and i. This allows i to be aligned on a word boundary, which can provide much faster memory access (on some architectures, it's required).
From C99 §6.7.2.1:
Each non-bit-field member of a
structure or union object is aligned
in an implementation- defined manner
appropriate to its type.

What you are asking for is packing, and alignment is different. Both are outside of the scope of the language and are specific for each implementation. Take a look here.

Generally not. Some info here.
Depending on the compiler, you can introduce pragmas to help (from the link above):
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
struct MyPackedData
{
char Data1;
long Data2;
char Data3;
};
#pragma pack(pop) /* restore original alignment from stack */

Typically (but under no guarantees), members of a struct are word-aligned. This means that a field less than the size of a word will be padded to take up an entire word.
However, when the next member of the struct can also fit inside the same word, then the compiler will put both members into the same word. This is more efficient space-wise, but depending on your platform, retrieving said members might be more expensive computationally.
On my 32-bit system using GCC under Cygwin, this program...
#include <iostream>
struct foo
{
char a;
int b;
char c;
};
int main(int argc, char** argv)
{
std::cout << sizeof(foo) << std::endl;
}
outputs '12' because both chars are word-aligned and take up 4 bytes each.
However, switch the struct to
struct foo
{
char a;
char c;
int b;
};
and the output is '8' because both chars next to each other can fit in a single word.

It is possible to pack bytes in order to conserve memory. For instance, pack(2) will tell members that longer than a byte to pack to two-bytes in order to maintain a two-byte boundary so that any padding members are two bytes long. Sometimes packing is used as part of a standard communication protocol where it expects a certain size. Here is what Wikipedia has to say about C/C++ and padding:
Padding is only inserted when a
structure member is followed by a
member with a larger alignment
requirement or at the end of the
structure. By changing the ordering of
members in a structure, it is possible
to change the amount of padding
required to maintain alignment. For
example, if members are sorted by
ascending or descending alignment
requirements a minimal amount of
padding is required. The minimal
amount of padding required is always
less than the largest alignment in the
structure. Computing the maximum
amount of padding required is more
complicated, but is always less than
the sum of the alignment requirements
for all members minus twice the sum of
the alignment requirements for the
least aligned half of the structure
members.
Although C and C++ do not allow the
compiler to reorder structure members
to save space, other languages might.
Since in struct's, the compiler treats things as words, sometimes care must be taken if you are relying on the size of the struct to be a certain size. For instance, aligning char vs int.

They are not packed by default. Instead, they are word-aligned depending on how your machine is set up. If you do want them to be packed. Then you can use __attribute__((__packed__)) at the end of your struct declaration like this:
struct abc {
char a;
int b;
char c;
}__attribute__((__packed__));
Then, for
struct abc _abc;
_abc will be packed.
Reference: Specific structure packing when using the GNU C Compiler

Seeing some outputs of the same structure's variations may give a clue about what is going on. After reading this, if I did not get it wrong, small types will be padded to be a word-lengths.
struct Foo {
char x ; // 1 byte
int y ; // 4 byte
char z ; // 1 byte
int w ; // 4 byte
};
struct FooOrdered {
char x ; // 1 byte
char z ; // 1 byte
int y ; // 4 byte
int w ; // 4 byte
};
struct Bar {
char x ; // 1 byte
int w ; // 4 byte
};
struct BarSingleType {
char x ; // 1 byte
};
int main(int argc, char const *argv[]) {
cout << sizeof(Foo) << endl;
cout << sizeof(FooOrdered) << endl;
cout << sizeof(Bar) << endl;
cout << sizeof(BarSingleType) << endl;
return 0;
}
In my environment output was like this:
16
12
8
1

Related

why size of structure does not change when use 24 bit integer

I am trying to port embed code in windows platform.
I have come across below problem I am posting a sample code here.
here even after I use int24 size remains 12 bytes in windows why ?
struct INT24
{
INT32 data : 24;
};
struct myStruct
{
INT32 a;
INT32 b;
INT24 c;
};
int _tmain(int argc, _TCHAR* argv[])
{
unsigned char myArr[11] = { 0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00, 0xFF,0xFF,0xFF };
myStruct *p = (myStruct*)myArr;
cout << sizeof(*p);
}

There are two reasons, each of which would be enough by themselves.
Presumably the size of INT32 is 4 bytes. The size of INT24 is also 4 bytes, because it contains a INT32 bit field. Since, myStruct contains 3 members of size 4, its size must therefore be at least 12.
Presumably the alignment requirement of INT32 is 4. So, even if the size of INT24 were 3, the size of myStruct would still have to be 12, because it must have at least the alignment requirement of INT32 and therefore the size of myStruct must be padded to the nearest multiple of 4.
any way or workaround ?
This is implementation specific, but the following hack may work for some compilers/cpu combinations. See the manual of your compiler for the syntax for similar feature, and the manual for your target cpu whether it supports unaligned memory access. Also do realize that unaligned memory access does have a performance penalty.
#pragma pack(push, 1)
struct INT24
{
INT32 data : 24;
};
#pragma pack(pop)
#pragma pack(push, 1)
struct myStruct
{
INT32 a;
INT32 b;
INT24 c;
};
#pragma pack(pop)
Packing a bit field might not work the same in all compilers. Be sure to check how yours behaves.
I think that a standard compliant way would be to store char arrays of sizes 3 and 4, and whenever you need to read or write one of the integer, you'd have to std::memcpy the value. That would be a bit burdensome to implement and possibly also slower than the #pragma pack hack.

Sadly for you, the compiler in optimising the code for a particular architecture reserves the right to pad out the structure by inserting spaces between members and even at the end of the structure.
Using a bit field does not reduce the size of the struct; you still get the whole of the "fielded" type in the struct.
The standard guarantees that the address of the first member of a struct is the same as the address of the struct, unless it's a polymorphic type.
But all is not lost: you can rely on the fact that an array of char will always be contiguous and contain no packing.
If CHAR_BIT is defined as 8 on your system (it probably is), you can model an array of 24 bit types on an array of char. If it's not 8 then even this approach will not work: I'd then suggest resorting to inline assembly.

When the compiler decides to pad a struct

Let's say we have:
struct A{
char a1;
char a2;
};
struct B{
int b1;
char b2;
};
struct C{
char C1;
int C2;
};
I know that because of padding to a multiple of the word size (assuming word size=4), sizeof(C)==8 although sizeof(char)==1 and sizeof(int)==4.
I would expect that sizeof(B)==5 instead of sizeof(B)==8.
But if sizeof(B)==8 I would expect that sizeof(A)==4 instead of sizeof(A)==2.
Could anyone please explain why the padding and the aligning are working differently in those cases?

A common padding scheme is to pad structs so that each member starts at an even multiple of the size of that member or to the machine word size (whichever is smaller). The entire struct is padded following the same rule.
Assuming such a padding scheme I would expect:
The biggest member in struct A has size 1, so no padding is used.
In struct B, the size of 5 is padded to 8, because one member has size 4.
The layout would be:
int 4
char 1
padding 3
In struct C, some padding is inserted before the int, so that it starts at an address divisible by 4.
The layout would be:
char 1
padding 3
int 4

It's up to the compiler to decide how best to pad the struct. For some reason, it decided that in struct B that char b2 was more optimally aligned on a 4 byte boundary. Additionally, the specific architecture may have requirements/behaviors that the compiler takes into account when deciding how to pad structs.
If you 'pack' the struct, then you'd see the sizes you expect (although that is not portable and may have performance penalties and other issues depending on the architecture).

structs in general will be aligned to boundaries based on the largest type contained. Consider an array of struct B myarray[5];
struct B must be aligned to 8 bytes so that it's b1 member is always on a 4 byte boundary. myarray[1].b1 can't start at the 6th byte into the array, which is what you would have if sizeof(B) == 5.

MSVC default memory alignment of 8

According to MSDN, the /Zp command defaults to 8, which means 64-bit alignment boundaries are used. I have always assumed that for 32-bit applications, the MSVC compiler will use 32-bit boundaries. For example:
struct Test
{
char foo;
int bar;
};
The compiler will pad it like so:
struct Test
{
char foo;
char padding[3];
int bar;
};
So, since /Zp8 is used by default, does that mean my padding becomes 7+4 bytes using the same example above:
struct Test
{
char foo;
char padding1[7];
int bar;
char padding2[4];
}; // Structure has 16 bytes, ending on an 8-byte boundary
This is a bit ridiculous isn't it? Am I misunderstanding? Why is such a large padding used, it seems like a waste of space. Most types on a 32-bit system aren't even going to use 64-bits, so the majority of variables would have padding (probably over 80%).

That's not how it works. Members are aligned to a multiple of their size. Char to 1 byte, short to 2, int to 4, double to 8. The structure is padded at the end to ensure the members still align correctly when the struct is used in an array.
A packing of 8 means it stops trying to align members that are larger than 8. Which is a practical limit, the memory allocator doesn't return addresses aligned better than 8. And double is brutally expensive if it isn't aligned properly and ends up straddling a cache line. But otherwise a headache if you write SIMD code, it requires 16 byte alignment.

That does not mean every member is aligned on an 8byte boundary. Read a little more carefully:
the smaller member type or n-byte boundaries
The key here is the first part- "smaller member type". That means that members with less alignment might be aligned less, effectively.
struct x {
char c;
int y;
};
std::cout << sizeof(x);
std::cout << "offsetof(x, c) = " << offsetof(x, c) << '\n';
std::cout << "offsetof(x, c) = " << offsetof(x, y) << '\n';
This yields 8, 0, 4- meaning that in fact, the int is only padded to a 4byte alignment.

struct alignment question

typedef struct {
char c;
char cc[2];
short s;
char ccc;
}stuck;
Should the above struct have a memory layout as this ?
1 2 3 4 5 6 7
- c - cc - s - ccc -
or this ?
1 2 3 4 5 6 7 8
- c - cc - s - ccc -
I think the first should be better but why my VS09 compiler chooses the second ? (Is my layout correct by the way ?) Thank you

I think that your structure will have the following layout, at least on Windows:
typedef struct {
char c;
char cc[2];
char __padding;
short s;
char ccc;
char __tail_padding;
} stuck;
You could avoid the padding by reordering the structure members:
typedef struct {
char c;
char cc[2];
char ccc;
short s;
} stuck;

The compiler can't choose the second. The standard mandates that the first field must be aligned with the start of the structure.
Are you using offsetof from stddef.h for finding this out ?
6.7.2.1 - 13
A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-ﬁeld, then to the unit in
which it resides), and vice versa. There may be unnamed padding
within a structure object, but not at its beginning.
It means that you can have
struct s {
int x;
char y;
double z;
};
struct s obj;
int *x = (int *)&obj; /* Legal. */
Put another way
offsetof(s, x); /* Must yield 0. */

Other than at the beginning of a structure, an implementation can put whatever padding it wants in your structures so there's no right way. From C99 6.7.2.1 Structure and union specifiers, paragraphs:
/12:Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
/13:There may be unnamed
padding within a structure object, but not at its beginning.
/15:There may be unnamed padding at the end of a structure or union.
Paragraph 13 also contains:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
This means that the fields within the structure cannot be re-ordered. And, in a large number of modern implementations (but this is not mandated by the standard), the alignment of an object is equal to its size. For example a 32-bit integer data type may have an alignment requirement of four (8-bit) bytes.
Hence, a logical alignment would be:
offset size field
------ ---- -----
0 1 char c;
1 2 char cc[2];
3 1 padding
4 2 short s;
6 1 char ccc;
7 1 padding
but, as stated, it may be something different. The final padding is to ensure that consecutive array elements are aligned correctly (since the short most likely has to be on a 2-byte boundary).
There are a number of (non-portable) ways in which you may be able to control the padding. Many compilers have a #pragma pack option that you can use to control padding (although be careful: while some systems may just slow down when accessing unaligned data, some will actually dump core for an illegal access).
Also, re-ordering the elements within the structure from largest to smallest tends to reduce padding as well since the larger elements tend to have stricter alignment requirements.
These, and an even uglier "solution" are discussed more here.

While I do really understand your visual representation of the alignment, I can tell you that with VS you can achieve a packed structure by using 'pragma':
__pragma( pack(push, 1) )
struct { ... };
__pragma( pack(pop) )
In general struct-alignment depends on the compiler used, the target-platform (and its address-size) and the weather, IOW in reality it is not well defined.

Others have mentionned that padding may be introduced either between attributes or after the last attribute.
The interesting thing though, I believe, is to understand why.
Types usually have an alignment. This property precises which address are valid (or not) for a particular type. On some architecture, this is a loose requirement (if you do not respect it, you only incur some overhead), on others, violating it causes hardware exceptions.
For example (arbitrary, as each platform define its own):
char: 1
short (16 bits): 2
int (32 bits): 4
long int (64 bits): 8
A compound type will usually have as alignment the maximum of the alignment of its parts.
How does alignment influences padding ?
In order to respect the alignment of a type, some padding may be necessary, for example:
struct S { char a; int b; };
align(S) = max(align(a), align(b)) = max(1, 4) = 4
Thus we have:
// S allocated at address 0x16 (divisible by 4)
0x16 a
0x17
0x18
0x19
0x20 b
0x21 b
0x22 b
0x23 b
Note that because b can only be allocated at an address also divisible by 4, there is some space between a and b, this space is called padding.
Where does padding comes from ?
Padding may have two different reasons:
between attributes, it is caused by a difference in alignment (see above)
at the end of the struct, it is caused by array requirements
The array requirement is that elements of an array should be allocated without intervening padding. This allows one to use pointer arithmetic to navigate from an element to another:
+---+---+---+
| S | S | S |
+---+---+---+
S* p = /**/;
p = p + 1; // <=> p = (S*)((void*)p + sizeof(S));
This means, however, than the structure S size needs be a multiple of S alignment.
Example:
struct S { int a; char b; };
+----+-+---+
| a |b| ? |
+----+-+---+
a: offset 0, size 4
b: offset 4, size 1
?: offset 5, size 3 (padding)
Putting it altogether:
typedef struct {
char a;
char b[2];
short s;
char c;
} stuck;
+-+--+-+--+-+-+
|a| b|?|s |c|?|
+-+--+-+--+-+-+
If you really wish to avoid padding, one (simple) trick (which does not involve addition nor substraction) is to simply order your attributes starting from the maximum alignment.
typedef struct {
short s;
char a;
char b[2];
char c;
} stuck;
+--+-+--+-+
| s|a| b|c|
+--+-+--+-+
It's a simple rule of thumb, especially as the alignment of basic types may change from platform to platform (32bits/64bits) whereas the relative order of the types is pretty stable (exception: the pointers).

size of a user defined class

why the size of the class cl1 in the following code is 8 but not 5, while the size of class cl2 is 1?
class cl1 {
public:
int n;
char cb;
cl1();
~cl1();
};
class cl2 {
public:
char cb;
cl2();
~cl2();
};

The compiler is free to insert padding in between and after class members in order to ensure that variables are properly aligned, etc. Exactly what padding is inserted is up to the implementation. In this case, I'd guess that the compiler is adding 3 bytes of padding after cl1::cb, perhaps to ensure that the next variable in memory is aligned on a 4-byte boundary.

The largest member of cl1 (n) is 4 bytes, so the size of cl1 is padded up to the nearest 4 bytes (8 in this case) so that an array of cl1 objects does not create n members which are not aligned to 4-byte addresses. Most processors really hate misaligned multi-byte values, either suffering performance losses (two memory cycles to access one value) or outright crashes (alignment exceptions).
There is no guarantee that this will be consistent from compiler to compiler or platform to platform.

This is all due to padding. More info can be found, for example, here
The thing is that the addresses of both the object and its members should be properly aligned for OS and Hardware - specific reasons. Thus the result. The problem of padding is complicated by the fact that objects in an array must be located consecutively, without any space in between, and ALL should be properly aligned.

It is because of structure padding by the compiler. If you want to remove the padding, try #pragma pack(1) and you should get 5 and 1 as expected.

While you're exploring the size of struct and how padding is done, let me tell you an interesting thing. The size of struct not only depends on the members, but also on the order of their declaration. For example, size of the following structs is different, even though both has same number of members of same types, the only difference is the order of their declaration!
struct A
{
int a;
char b;
char c;
};
struct B
{
char b;
int a;
char c;
};
cout << "sizeof(A) = " << sizeof(A) << endl;
cout << "sizeof(B) = " << sizeof(B) << endl;
Output:
sizeof(A) = 8
sizeof(B) = 12
Online Demo at Ideone: http://www.ideone.com/8OoxX

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js