Trailing padding in C/C++ in nested structures - is it neccesary? - c++

This is more of a theoretical question.
I'm familiar with how padding and trailing padding works.
struct myStruct{
uint32_t x;
char* p;
char c;
};
// myStruct layout will compile to
// x: 4 Bytes
// padding: 4 Bytes
// *p: 8 Bytes
// c: 1 Byte
// padding: 7 Bytes
// Total: 24 Bytes
There needs to be padding after x, so that *p is aligned, and there needs to be trailing padding after c so that the whole struct size is divisible by 8 (in order to get the right stride length).
But consider this example:
struct A{
uint64_t x;
uint8_t y;
};
struct B{
struct A myStruct;
uint32_t c;
};
// Based on all information I read on internet, and based on my tinkering
// with both GCC and Clang, the layout of struct B will look like:
// myStruct.x: 8 Bytes
// myStruct.y: 1 Byte
// myStruct.padding: 7 Bytes
// c: 4 Bytes
// padding: 4 Bytes
// total size: 24 Bytes
// total padding: 11 Bytes
// padding overhead: 45%
// my question is, why struct A does not get "inlined" into struct B,
// and therefore why the final layout of struct B does not look like this:
// myStruct.x: 8 Bytes
// myStruct.y: 1 Byte
// padding 3 Bytes
// c: 4 Bytes
// total size: 16 Bytes
// total padding: 3 Bytes
// padding overhead: 19%
Both layouts satisfy alignments of all variables. Both layouts have the same order of variables. In both layouts struct B has correct stride length (divisible by 8 Bytes). Only thing that differs (besides 33% smaller size), is that struct A does not have correct stride length in layout 2, but that should not matter, since clearly there is no array of struct As.
I checked this layout in GCC with -O3 and -g, struct B has 24 Bytes.
My question is - is there some reason why this optimization is not applied? Is there some layout requirement in C/C++ that forbids this? Or is there some compilation flag I'm missing? Or is this an ABI thing?
EDIT:
Answered.
See answer from #dbush on why compiler cannot emit this layout on it's own.
The following code example uses GCC pragmas packed and aligned (as suggested by #jaskij) to manualy enforce the more optimized layout. Struct B_packed has only 16 Bytes instead of 24 Bytes (note that this code might cause issues/run slow when there is an array of structs B_packed, be aware and don't blindly copy this code):
struct __attribute__ ((__packed__)) A_packed{
uint64_t x;
uint8_t y;
};
struct __attribute__ ((__packed__)) B_packed{
struct A_packed myStruct;
uint32_t c __attribute__ ((aligned(4)));
};
// Layout of B_packed will be
// myStruct.x: 8 Bytes
// myStruct.y: 1 Byte
// padding for c: 3 Bytes
// c: 4 Bytes
// total size: 16 Bytes
// total padding: 3 Bytes
// padding overhead: 19%

is there some reason why this optimization is not applied
If this were allowed, the value of sizeof(struct B) would be ambiguous.
Suppose you did this:
struct B b;
struct A a = { 1, 2 };
b.c = 0x12345678;
memcpy(&b.myStruct, &a, sizeof(struct A));
You'd be overwriting the value of b.c.

Padding is used to force alignment. Now if you have an array of struct myStruct, then there is a rule that array elements follow each other without any padding. In your case, without padding inside myStruct after the last field, the second myStruct in an array wouldn't be properly aligned. Therefore it is necessary that sizeof(myStruct) is a multiple of the alignment of myStruct, and for that you may need enough padding at the end.

Related

Not Understanding C++ Memory Alignment

I understand that in C++, if we have a struct like this:
struct x_
{
char a; // 1 byte
int b; // 4 bytes
short c; // 2 bytes
char d; // 1 byte
} MyStruct;
Memory structure will look like this due to compiler padding:
struct x_
{
char a; // 1 byte
char _pad0[3]; // padding to put 'b' on 4-byte boundary
int b; // 4 bytes
short c; // 2 bytes
char d; // 1 byte
char _pad1[1]; // padding to make sizeof(x_) multiple of 4
}
Can somebody please help me understand why sizeof(x_) must be a multiple of 4, and not any other number?
At least, a class must have the same alignment largest alignment of any of its non-static data member. So x_ must be aligned as b. If x_ had a lower alignment requirement, then b could be misaligned. Imagine an object x_ at address 3 then its b member would also be at an address 3+alignof(int). If alignof(int)==4 then b would be at address 7, so it would be misaligned. So following this example, alignof(x_)==4;
Then the size of an object must be a multiple of the alignment because elements of an array are required to be contiguous: a pointer to the end of element n must be a pointer to the element n+1. So in your case x size must be a multiple of 4.

Structure alignment padding, largest size of padding, and order of struct members

I've been learning about structure data padding since I found out my sizeof() operator wasn't returning what I expected. According to the pattern that I've observed, it aligns structure members with the largest data type. So for example...
struct MyStruct1
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
// Total 5 Bytes
//Total size of struct = 5 (no padding)
};
struct MyStruct2
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
short f; // 2 bytes
// Total 7 Bytes
//Total size of struct = 8 (1 byte of padding between char e and short f
};
struct MyStruct3
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
int f; // 4 bytes
// Total 9 bytes
//Total size of struct = 12 (3 bytes of padding between char e and int f
};
However if make the last member an 8 byte data type, for example a long long, it still only adds 3 bytes of padding, making a four-byte aligned structure. However if I build in 64 bit mode, it does in fact align for 8 bytes (the biggest data type). My first question is, am I wrong in saying it aligns the members with the largest data type? This statement seems correct for a 64 bit build, but only true up to 4 byte data types in a 32 bit build. Has this to do with the native 'word' size of the CPU? Or the program itself?
My second question is, would the following be an entire waste of space and bad programming?
struct MyBadStruct
{
char a; // 1 byte
unsigned int b; // 4 bytes
UINT8 c; // 1 byte
long d; // 4 bytes
UCHAR e; // 1 byte
char* f; // 4 bytes
char g; // 1 byte
// Total of 16 bytes
//Total size of struct = 28 bytes (12 bytes of padding, wasted)
};
How padding is done, is not part of the standard. So it can be done differently on different systems and compilers. It is often done so that variables are aligned at there size, i.e. size=1 -> no alignment, size=2 -> 2 byte alignment, size=4 -> 4 byte alignment and so on. For size=8, it is normally 4 or 8 bytes aligned. The struct it self is normally 4 or 8 bytes aligned. But - just to repeat - it is system/compiler dependent.
In your case it seems to follow the pattern above.
So
char a;
int b;
will give 3 bytes padding to 4 byte align the int.
and
char a1;
int b1;
char a2;
int b2;
char a3;
int b3;
char a4;
int b4;
will end up as 32 byte (again to 4 byte align the int).
But
int b1;
int b2;
int b3;
int b4;
char a1;
char a2;
char a3;
char a4;
will be just 20 as the int is already aligned.
So if memory matters, put the largest members first.
However, if memory doesn't matter (e.g. because the struct isn't used that much), it may be better to keep things in a logical order so that the code is easy to read for humans.
Typically the best way to reduce the amount of padding inserted by the compiler is to sort the data members inside of your struct from largest to smallest:
struct MyNotSOBadStruct
{
long d; // 4 bytes
char* f; // 4 bytes
unsigned int b; // 4 bytes
char a; // 1 byte
UINT8 c; // 1 byte
UCHAR e; // 1 byte
char g; // 1 byte
// Total of 16 bytes
};
the size may vary depending on 32 vs 64 bit os because the size of a pointer will change
live version: http://coliru.stacked-crooked.com/a/aee33c64192f2fe0
i get size = 24
All of the following is implementation dependent. Do not rely on this for the correctness of your programs (but by all means make use of it for debugging or improving performance).
In general each datatype has a preferred alignment. This is never larger than the size of the type, but it can be smaller.
It appears that your compiler is aligning 64-bit integers on a 32-bit boundary when compiling in 32-bit mode, but on a 64-bit boundary in 64-bit mode.
As to your question about MyBadStruct: In general, write your code to be simple and easy to understand; only do anything else if you know (through measurement) that you have a problem. Having said that, if you sort your member variables by size (largest first), you will minimize padding space.

Struct size stays the same even after adding a new member to it

when is simply execute
cout << sizeof(string);
i got 8 as answer.
now i am having a structure
typedef struct {
int a;
string str;
} myType;
and i am executing
cout << sizeof(myType);
i got 16 as the answer.
now i made a change in my structure
typedef struct {
int a, b;
string str;
} myType;
and i am executing
cout << sizeof(myType);
i got 16 as the answer!!!. How? What is happening?
Perhaps padding is happening. E.g. sizeof(int) can be 4 bytes and compiler can add 4 bytes after a for the sake of data alignment. The layout could be like this:
typedef struct {
int a; // 4 bytes
// 4 bytes for padding
string str; // 8 bytes
} myType;
typedef struct {
int a; // 4 bytes
int b; // 4 bytes
string str; // 8 bytes
} myType;
Looks like 8 byte alignment.
So if you have any data type that has less than 8 bytes, it will still use 8 bytes.
I assume the pointer is 8 byte, whereas the ints are only 4 bytes each.
You can force 1 byte alignment using code like outlined here Struct one-byte alignment conflicted with alignment requirement of the architecture? . You should then get different size for first case.
It's called structure packing to achieve optimal memory alignment.
See The Lost Art of C Structure Packing to understand the how and why. It's done the same way in both C and C++.
In C/C++ structs are "packed" in byte chunks. You can specify which size your structs should be packed.
Here a reference: http://msdn.microsoft.com/en-us/library/2e70t5y1.aspx

Why does an uint64_t needs more memory than 2 uint32_t's when used in a class? And how to prevent this?

I have made the following code as an example.
#include <iostream>
struct class1
{
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
uint32_t e;
uint32_t f;
uint32_t g;
};
struct class2
{
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
uint32_t e;
uint64_t f;
};
int main(){
std::cout << sizeof(class1) << std::endl;
std::cout << sizeof(class2) << std::endl;
std::cout << sizeof(uint64_t) << std::endl;
std::cout << sizeof(uint32_t) << std::endl;
}
prints
20
24
8
4
So it's fairly simple to see that one uint64_t is as large as two uint32_t's, Why would class 2 have 4 extra bytes, if they are the same except for the substitution of two uint32_t's for an uint64_t.
As it was pointed out, this is due to padding.
To prevent this, you may use
#pragma pack(1)
class ... {
};
#pragma pack(pop)
It tells your compiler to align not to 8 bytes, but to one byte. The pop command switches it off (this is very important, since if you do that in the header and somebody includes your header, very weird errors may occur)
Why does an uint64_t needs more memory than 2 uint32_t's when used in a class?
The reason is padding due to alignment requirements.
On most 64-bit architectures uint8_t has an alignment requirement of 1, uint16_t has an alignment requirement of 2, uint32_t has an alignment requirement of 4 and uint64_t has an alignment requirement of 8. The compiler must ensure that all members in a structure are correctly aligned and that the size of a structure is a multiple of it's overall alignment requirement. Furthermore the compiler is not allowed to re-order members.
So your structs end up laid out as follows
struct class1
{
uint8_t a; //offset 0
uint8_t b; //offset 1
uint16_t c; //offset 2
uint32_t d; //offset 4
uint32_t e; //offset 8
uint32_t f; //offset 12
uint32_t g; //offset 16
}; //overall alignment requirement 4, overall size 20.
struct class2
{
uint8_t a; //offset 0
uint8_t b; //offset 1
uint16_t c; //offset 2
uint32_t d; //offset 4
uint32_t e; //offset 8
// 4 bytes of padding because f has an alignment requirement of 8
uint64_t f; //offset 16
}; //overall alignment requirement 8, overall size 24
And how to prevent this?
Unfortunately there is no good general solution.
Sometimes it is possible to reduce the amount of padding by re-ordering fields, but that doesn't help in your case. It just moves the padding around in the structure. A structure with a field requiring 8 byte alignment will always have a size that is a multiple of 8. Therefore no matter how much you rearrange the fields your structure will always have a size of at least 24.
You can use compiler-specific features such as #pragma pack or __attribute((packed)) to force the compiler to pack the structure more tightly than normal alignment requirements would allow. However, as well as limiting portability, this creates a problem when taking the address of a member or binding a reference to the member. The resulting pointer or reference may not satisfy the alignment requirements and therefore may not be safe to use.
Different compilers vary in how they handle this problem. From some playing around on godbolt.
g++ 9 through 11 will refuse to bind a reference to a packed member and give a warning when taking the address.
clang 4 through 11 will give a warning when taking the address, but will silently bind a reference and pass that reference across a compilation unit boundary.
Clang 3.9 and earlier will take the address and bind a reference silently.
g++ 8 and earlier and clang 3.9 and earlier (down to the oldest version on godbolt) will also refuse to bind a reference, but will take the address with no warning.
icc will bind a pointer or take the address without producing any warnings in either case (though to be fair intel processors support unaligned access in hardware).
The rule for alignment (on x86 and x86_64) is generally to align a variable on it's size.
In other words, 32-bit variables are aligned on 4 bytes, 64-bit variables on 8 bytes, etc.
The offset of f is 12, so in case of uint32_t f no padding is needed, but when f is an uint64_t, 4 bytes of padding are added to get f to align on 8 bytes.
For this reason it is better to order data members from largest to smallest. Then there wouldn't be any need for padding or packing (except possibly at the end of the structure).

When the compiler decides to pad a struct

Let's say we have:
struct A{
char a1;
char a2;
};
struct B{
int b1;
char b2;
};
struct C{
char C1;
int C2;
};
I know that because of padding to a multiple of the word size (assuming word size=4), sizeof(C)==8 although sizeof(char)==1 and sizeof(int)==4.
I would expect that sizeof(B)==5 instead of sizeof(B)==8.
But if sizeof(B)==8 I would expect that sizeof(A)==4 instead of sizeof(A)==2.
Could anyone please explain why the padding and the aligning are working differently in those cases?
A common padding scheme is to pad structs so that each member starts at an even multiple of the size of that member or to the machine word size (whichever is smaller). The entire struct is padded following the same rule.
Assuming such a padding scheme I would expect:
The biggest member in struct A has size 1, so no padding is used.
In struct B, the size of 5 is padded to 8, because one member has size 4.
The layout would be:
int 4
char 1
padding 3
In struct C, some padding is inserted before the int, so that it starts at an address divisible by 4.
The layout would be:
char 1
padding 3
int 4
It's up to the compiler to decide how best to pad the struct. For some reason, it decided that in struct B that char b2 was more optimally aligned on a 4 byte boundary. Additionally, the specific architecture may have requirements/behaviors that the compiler takes into account when deciding how to pad structs.
If you 'pack' the struct, then you'd see the sizes you expect (although that is not portable and may have performance penalties and other issues depending on the architecture).
structs in general will be aligned to boundaries based on the largest type contained. Consider an array of struct B myarray[5];
struct B must be aligned to 8 bytes so that it's b1 member is always on a 4 byte boundary. myarray[1].b1 can't start at the 6th byte into the array, which is what you would have if sizeof(B) == 5.