I have a structure that looks like this:
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
I would like to make this 16 bytes, but GCC is telling me it is 24, as the union is padding to 16 bytes.
I would like to put vdata into a large std::vector. From my understanding, there should be no issue with alignment if this were 16 bytes, since the pointer would always be 8 byte aligned.
I understand that I can force this to be packed using __attribute__((__packed__)) in GCC. But I would like to know if there is a portable and standard compliant way to get this to be 16 bytes?
Edit: Ideas
Idea 1: split up the B array.
struct vdata {
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[8]; // 8 bytes
} u;
uint8_t B2[4]; // 4 bytes
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
Could B2 elements be reliably accessed from a pointer of B? Is that defined behavior?
Idea 2: store pointer as byte array and memcpy as necessary (#Eljay)
struct vdata {
union union_data {
std::byte A[sizeof(uint8_t*)]; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
Would there be a performance penalty for accessing the pointer, or would it be optimized out? (Assuming GCC x86).
You could change A to std::byte A[sizeof(uint8_t*)]; and then std::memcpy the pointer into A and out of A.
Worth commenting as to what is going on, and that these extra hoops are to avoid padding bytes.
Also adding a set_A setter and get_A getter may be very helpful.
struct vdata {
union union_data {
std::byte A[sizeof(uint8_t*)]; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
void set_A(uint8_t* p) {
std::memcpy(u.A, &p, sizeof p);
}
uint8_t* get_A() {
uint8_t* result;
std::memcpy(&result, u.A, sizeof result);
return result;
}
};
Store C+D in the union's array, and provide method access to them:
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[16]; // 12 + 2*2 bytes
} u;
int16_t& C() {
return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[12]));
}
int16_t& D() {
return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[14]));
}
};
Demo (with zero warnings for strict aliasing violations and run-time address sanitization enabled)
Keep in mind that there's no strict aliasing violation when the buffer is char* i.e. single byte type like uint8_t - I mean thankfully because otherwise it would be impossible to create memory pools. If it makes things clearer/safer you can even have an explicit char array buffer:
struct vdata {
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
char buf[16]; // 16 bytes - could be std::byte buf[16]
} u;
int16_t& C() { return *(int16_t*)(&u.buf[12]); }
int16_t& D() { return *(int16_t*)(&u.buf[14]); }
};
Regarding alignment The array is 8-aligned due to the address of the union, so positions 12&14 are guaranteed to be 2-aligned which is the requirement for int16_t (even though the string u.B appears in the code).
Alternatively you can force align the structure. The C++ alignas specifier would not be valid here because you want to lower the alignment of your structure, put a pragma directive is possible to give you again 16 bytes:
#pragma pack(4)
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
Demo
I'm fairly certain that this one will cause problems.
As far as I understand, the following code would be the most safe one.
The data that specify the type is in the Initial common sequence. Thus you can access it either way (by using cda.C or cdb.C) so it is perfect for determining the type.
Then putting everything in a struct for both cases allows to ensure that each struct layout is independant (thus B can start before next 8 bytes alignment).
#include <cstdint>
#include <iostream>
struct CDA
{
int16_t C; // 2 bytes
int16_t D; // 2 bytes
uint8_t* A; // 8 bytes
};
struct CDB
{
int16_t C; // 2 bytes
int16_t D; // 2 bytes
uint8_t B[12]; // 12 bytes
};
struct vdata {
union union_data {
CDA cda;
CDB cdb;
} u;
};
static_assert(sizeof(uint8_t*) == 8);
static_assert(sizeof(CDA) == 16);
static_assert(sizeof(CDB) == 16);
static_assert(offsetof(vdata::union_data, cda) == offsetof(vdata::union_data, cdb));
static_assert(offsetof(CDA, C) == offsetof(CDB, C));
static_assert(offsetof(CDA, C) == 0);
static_assert(sizeof(vdata) == 16);
int main()
{
std::cout << "sizeof(CDA) : " << sizeof(CDA) << std::endl;
std::cout << "sizeof(CDB) : " << sizeof(CDB) << std::endl;
std::cout << "sizeof(vdata) : " << sizeof(vdata) << std::endl;
}
Usefull source of information:
CppCon 2017: Scott Schurr “Type Punning in C++17: Avoiding Pun-defined Behavior”
Union declaration
std::launder
std::variant
How to decide?
If the size optimization is not that important, I would recommend to use std::variant.
If the size is important but the order is not, then the current solution might be the best choice.
If portability is not so important, then pragma pack solution might be appropriate (remember to reset alignment after the struct definition).
Otherwise, if you really need layout control, then either use:
std::byte array and memcpy (access data with functions)
placement new and std::launder.
In all cases, be sure to have appropriate assertion that verify assumptions you make. I have put many in my sample code but you can adjust depending on your need.
Also, unless you have millions of vdata items or you are on an embedded device, then using 24 bytes instead of 16 might not be a big deal.
You might also use conditionnal define to optimize only for your current compiler. This could be useful to ensure that you have working code (though maybe less optimal) for every target or it can allows to depend on behavior that is undefined from the standard but might be defined on your compiler.
Related
Suppose I have a struct which contains other structs, which are not packed:
struct ContainerOfNonPacked
{
NonPacked1 first;
NonPacked2 second;
};
And then I have a struct which contains other structs which are packed:
struct ContainerOfPacked
{
Packed1 first; // declaration of struct had __attribute__((packed))
Packed2 second; // ditto
};
The first is not going to be packed by the compiler (i.e. there's no guarantee that there will be no "holes" inside the struct). It might coincidentally have no holes, but that's not what the question is about.
What about the second container, that contains packed structs? Is there any guarantee that a struct consisting solely of packed structs as its fields is itself packed?
Edit: it is up to the compiler and is not a behaviour one can depend on.
For the case of GCC and clang, structs that contain only packed structs are themselves "packed". They will not have any padding "holes" in them.
Here is an example to illustrate (godbolt link: https://godbolt.org/g/2noSpP):
We have four structs, Packed1, Packed2, NonPacked1, and NonPacked2.
#include <cstdint>
struct Packed1
{
uint32_t a;
uint64_t b;
} __attribute__((packed));
struct NonPacked1
{
uint16_t a;
// Padding of 2 bytes
uint32_t b;
uint8_t c;
};
// We have two structs of the same size, one packed and one not
static_assert(sizeof(NonPacked1) == 12);
static_assert(sizeof(Packed1) == 12);
struct Packed2
{
uint64_t a;
uint64_t b;
} __attribute__((packed)); // packing has no effect, but better for illustration
struct NonPacked2
{
uint32_t a;
// Padding of 4 bytes
uint64_t b;
};
// And again, two structs of the same size
static_assert(sizeof(Packed2) == 16);
static_assert(sizeof(NonPacked2) == 16);
Packed1 and Packed2 go into a ContainerOfPacked struct, and the other two into a ContainerOfNonPacked.
struct ContainerOfNonPacked
{
NonPacked1 first; // 12 bytes
// Padding of 4 bytes between the non-packed struct-fields
NonPacked2 second; // 16 bytes
};
struct ContainerOfPacked
{
Packed1 first; // 12
// No padding between the packed struct-fields
Packed2 second; // 16
};
static_assert(sizeof(ContainerOfNonPacked) == 32);
static_assert(sizeof(ContainerOfPacked) == 28);
As the code comments and static asserts show, the container of packed structs does not have padding holes inside it. It behaves as if it is itself "packed".
Although this is an answer, I'm also looking for answers which can cite the relevant parts of the Standard or some other authoritative/theoretical explanations.
I've been learning about structure data padding since I found out my sizeof() operator wasn't returning what I expected. According to the pattern that I've observed, it aligns structure members with the largest data type. So for example...
struct MyStruct1
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
// Total 5 Bytes
//Total size of struct = 5 (no padding)
};
struct MyStruct2
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
short f; // 2 bytes
// Total 7 Bytes
//Total size of struct = 8 (1 byte of padding between char e and short f
};
struct MyStruct3
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
int f; // 4 bytes
// Total 9 bytes
//Total size of struct = 12 (3 bytes of padding between char e and int f
};
However if make the last member an 8 byte data type, for example a long long, it still only adds 3 bytes of padding, making a four-byte aligned structure. However if I build in 64 bit mode, it does in fact align for 8 bytes (the biggest data type). My first question is, am I wrong in saying it aligns the members with the largest data type? This statement seems correct for a 64 bit build, but only true up to 4 byte data types in a 32 bit build. Has this to do with the native 'word' size of the CPU? Or the program itself?
My second question is, would the following be an entire waste of space and bad programming?
struct MyBadStruct
{
char a; // 1 byte
unsigned int b; // 4 bytes
UINT8 c; // 1 byte
long d; // 4 bytes
UCHAR e; // 1 byte
char* f; // 4 bytes
char g; // 1 byte
// Total of 16 bytes
//Total size of struct = 28 bytes (12 bytes of padding, wasted)
};
How padding is done, is not part of the standard. So it can be done differently on different systems and compilers. It is often done so that variables are aligned at there size, i.e. size=1 -> no alignment, size=2 -> 2 byte alignment, size=4 -> 4 byte alignment and so on. For size=8, it is normally 4 or 8 bytes aligned. The struct it self is normally 4 or 8 bytes aligned. But - just to repeat - it is system/compiler dependent.
In your case it seems to follow the pattern above.
So
char a;
int b;
will give 3 bytes padding to 4 byte align the int.
and
char a1;
int b1;
char a2;
int b2;
char a3;
int b3;
char a4;
int b4;
will end up as 32 byte (again to 4 byte align the int).
But
int b1;
int b2;
int b3;
int b4;
char a1;
char a2;
char a3;
char a4;
will be just 20 as the int is already aligned.
So if memory matters, put the largest members first.
However, if memory doesn't matter (e.g. because the struct isn't used that much), it may be better to keep things in a logical order so that the code is easy to read for humans.
Typically the best way to reduce the amount of padding inserted by the compiler is to sort the data members inside of your struct from largest to smallest:
struct MyNotSOBadStruct
{
long d; // 4 bytes
char* f; // 4 bytes
unsigned int b; // 4 bytes
char a; // 1 byte
UINT8 c; // 1 byte
UCHAR e; // 1 byte
char g; // 1 byte
// Total of 16 bytes
};
the size may vary depending on 32 vs 64 bit os because the size of a pointer will change
live version: http://coliru.stacked-crooked.com/a/aee33c64192f2fe0
i get size = 24
All of the following is implementation dependent. Do not rely on this for the correctness of your programs (but by all means make use of it for debugging or improving performance).
In general each datatype has a preferred alignment. This is never larger than the size of the type, but it can be smaller.
It appears that your compiler is aligning 64-bit integers on a 32-bit boundary when compiling in 32-bit mode, but on a 64-bit boundary in 64-bit mode.
As to your question about MyBadStruct: In general, write your code to be simple and easy to understand; only do anything else if you know (through measurement) that you have a problem. Having said that, if you sort your member variables by size (largest first), you will minimize padding space.
when is simply execute
cout << sizeof(string);
i got 8 as answer.
now i am having a structure
typedef struct {
int a;
string str;
} myType;
and i am executing
cout << sizeof(myType);
i got 16 as the answer.
now i made a change in my structure
typedef struct {
int a, b;
string str;
} myType;
and i am executing
cout << sizeof(myType);
i got 16 as the answer!!!. How? What is happening?
Perhaps padding is happening. E.g. sizeof(int) can be 4 bytes and compiler can add 4 bytes after a for the sake of data alignment. The layout could be like this:
typedef struct {
int a; // 4 bytes
// 4 bytes for padding
string str; // 8 bytes
} myType;
typedef struct {
int a; // 4 bytes
int b; // 4 bytes
string str; // 8 bytes
} myType;
Looks like 8 byte alignment.
So if you have any data type that has less than 8 bytes, it will still use 8 bytes.
I assume the pointer is 8 byte, whereas the ints are only 4 bytes each.
You can force 1 byte alignment using code like outlined here Struct one-byte alignment conflicted with alignment requirement of the architecture? . You should then get different size for first case.
It's called structure packing to achieve optimal memory alignment.
See The Lost Art of C Structure Packing to understand the how and why. It's done the same way in both C and C++.
In C/C++ structs are "packed" in byte chunks. You can specify which size your structs should be packed.
Here a reference: http://msdn.microsoft.com/en-us/library/2e70t5y1.aspx
I have made the following code as an example.
#include <iostream>
struct class1
{
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
uint32_t e;
uint32_t f;
uint32_t g;
};
struct class2
{
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
uint32_t e;
uint64_t f;
};
int main(){
std::cout << sizeof(class1) << std::endl;
std::cout << sizeof(class2) << std::endl;
std::cout << sizeof(uint64_t) << std::endl;
std::cout << sizeof(uint32_t) << std::endl;
}
prints
20
24
8
4
So it's fairly simple to see that one uint64_t is as large as two uint32_t's, Why would class 2 have 4 extra bytes, if they are the same except for the substitution of two uint32_t's for an uint64_t.
As it was pointed out, this is due to padding.
To prevent this, you may use
#pragma pack(1)
class ... {
};
#pragma pack(pop)
It tells your compiler to align not to 8 bytes, but to one byte. The pop command switches it off (this is very important, since if you do that in the header and somebody includes your header, very weird errors may occur)
Why does an uint64_t needs more memory than 2 uint32_t's when used in a class?
The reason is padding due to alignment requirements.
On most 64-bit architectures uint8_t has an alignment requirement of 1, uint16_t has an alignment requirement of 2, uint32_t has an alignment requirement of 4 and uint64_t has an alignment requirement of 8. The compiler must ensure that all members in a structure are correctly aligned and that the size of a structure is a multiple of it's overall alignment requirement. Furthermore the compiler is not allowed to re-order members.
So your structs end up laid out as follows
struct class1
{
uint8_t a; //offset 0
uint8_t b; //offset 1
uint16_t c; //offset 2
uint32_t d; //offset 4
uint32_t e; //offset 8
uint32_t f; //offset 12
uint32_t g; //offset 16
}; //overall alignment requirement 4, overall size 20.
struct class2
{
uint8_t a; //offset 0
uint8_t b; //offset 1
uint16_t c; //offset 2
uint32_t d; //offset 4
uint32_t e; //offset 8
// 4 bytes of padding because f has an alignment requirement of 8
uint64_t f; //offset 16
}; //overall alignment requirement 8, overall size 24
And how to prevent this?
Unfortunately there is no good general solution.
Sometimes it is possible to reduce the amount of padding by re-ordering fields, but that doesn't help in your case. It just moves the padding around in the structure. A structure with a field requiring 8 byte alignment will always have a size that is a multiple of 8. Therefore no matter how much you rearrange the fields your structure will always have a size of at least 24.
You can use compiler-specific features such as #pragma pack or __attribute((packed)) to force the compiler to pack the structure more tightly than normal alignment requirements would allow. However, as well as limiting portability, this creates a problem when taking the address of a member or binding a reference to the member. The resulting pointer or reference may not satisfy the alignment requirements and therefore may not be safe to use.
Different compilers vary in how they handle this problem. From some playing around on godbolt.
g++ 9 through 11 will refuse to bind a reference to a packed member and give a warning when taking the address.
clang 4 through 11 will give a warning when taking the address, but will silently bind a reference and pass that reference across a compilation unit boundary.
Clang 3.9 and earlier will take the address and bind a reference silently.
g++ 8 and earlier and clang 3.9 and earlier (down to the oldest version on godbolt) will also refuse to bind a reference, but will take the address with no warning.
icc will bind a pointer or take the address without producing any warnings in either case (though to be fair intel processors support unaligned access in hardware).
The rule for alignment (on x86 and x86_64) is generally to align a variable on it's size.
In other words, 32-bit variables are aligned on 4 bytes, 64-bit variables on 8 bytes, etc.
The offset of f is 12, so in case of uint32_t f no padding is needed, but when f is an uint64_t, 4 bytes of padding are added to get f to align on 8 bytes.
For this reason it is better to order data members from largest to smallest. Then there wouldn't be any need for padding or packing (except possibly at the end of the structure).
Can anybody please explain what's going on?
My MSVC 2008 project's structure member alignment setting is set to 16 bytes (/Zp16) alignment, however one of the following structures is being aligned by 16 bytes and another is aligned only by 8 bytes... WHY?!!!
struct HashData
{
void *pData;
const char* pName;
int crc;
bool bModified;
}; // sizeof (HashData) == 4 + 4 + 4 + 1 + padding = 16 bytes, ok
class StringHash
{
HashData data[1024];
int mask;
int size;
}; // sizeof(StringHash) == 1024 * 16 + 4 + 4 + 0 = 16392 bytes, why not 16400 bytes?
This may not look like a big deal, but it's a big problem for me, since I am forced to emulate the MSVC structures alignment in GCC and specifying the aligned(16) attribute makes the sizeof (StringHash) == 16400!
Please tell me, when and why MSVC overrides the /Zp16 setting, I absolutely can't fathom it...
I think you misunderstood the /Zp16 option.
MSDN says,
When you specify this option, each structure member after the first is
stored on either the size of the member type or n-byte boundaries
(where n is 1, 2, 4, 8, or 16), whichever is smaller.
Please read the "whichever is smaller". It doesn't say that the struct will be padded by 16. It rather defines the boundary of each member relative to each other, starting from the first member.
What you basically want is align (C++) attribute, which says
Use __declspec(align(#)) to precisely control the alignment of user-defined data
So try this:
_declspec(align(16)) struct StringHash
{
HashData data[1024];
int mask;
int size;
};
std::cout << sizeof(StringHash) << std::endl;
It should print what you expect.
Or you can use #pragma pack(16).
Consider using the pack pragma directive:
// Set packing to 16 byte alignment
#pragma pack(16)
struct HashData
{
void *pData;
const char* pName;
int crc;
bool bModified;
};
class StringHash
{
HashData data[1024];
int mask;
int size;
};
// Restore default packing
#pragma pack()
See: pack and Working with Packing Structures