Are structs that contain packed structs packed themselves? - c++

Suppose I have a struct which contains other structs, which are not packed:
struct ContainerOfNonPacked
{
NonPacked1 first;
NonPacked2 second;
};
And then I have a struct which contains other structs which are packed:
struct ContainerOfPacked
{
Packed1 first; // declaration of struct had __attribute__((packed))
Packed2 second; // ditto
};
The first is not going to be packed by the compiler (i.e. there's no guarantee that there will be no "holes" inside the struct). It might coincidentally have no holes, but that's not what the question is about.
What about the second container, that contains packed structs? Is there any guarantee that a struct consisting solely of packed structs as its fields is itself packed?

Edit: it is up to the compiler and is not a behaviour one can depend on.
For the case of GCC and clang, structs that contain only packed structs are themselves "packed". They will not have any padding "holes" in them.
Here is an example to illustrate (godbolt link: https://godbolt.org/g/2noSpP):
We have four structs, Packed1, Packed2, NonPacked1, and NonPacked2.
#include <cstdint>
struct Packed1
{
uint32_t a;
uint64_t b;
} __attribute__((packed));
struct NonPacked1
{
uint16_t a;
// Padding of 2 bytes
uint32_t b;
uint8_t c;
};
// We have two structs of the same size, one packed and one not
static_assert(sizeof(NonPacked1) == 12);
static_assert(sizeof(Packed1) == 12);
struct Packed2
{
uint64_t a;
uint64_t b;
} __attribute__((packed)); // packing has no effect, but better for illustration
struct NonPacked2
{
uint32_t a;
// Padding of 4 bytes
uint64_t b;
};
// And again, two structs of the same size
static_assert(sizeof(Packed2) == 16);
static_assert(sizeof(NonPacked2) == 16);
Packed1 and Packed2 go into a ContainerOfPacked struct, and the other two into a ContainerOfNonPacked.
struct ContainerOfNonPacked
{
NonPacked1 first; // 12 bytes
// Padding of 4 bytes between the non-packed struct-fields
NonPacked2 second; // 16 bytes
};
struct ContainerOfPacked
{
Packed1 first; // 12
// No padding between the packed struct-fields
Packed2 second; // 16
};
static_assert(sizeof(ContainerOfNonPacked) == 32);
static_assert(sizeof(ContainerOfPacked) == 28);
As the code comments and static asserts show, the container of packed structs does not have padding holes inside it. It behaves as if it is itself "packed".
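A hedged sketch of why this happens on GCC and clang: a packed struct has an alignment requirement of 1, so a container holding only such members never needs padding between them. The assertions below assume GCC or clang on x86-64 and rely on the non-standard `__attribute__((packed))`. `MixedContainer` is a hypothetical type, not part of the original example; it shows that the container is not truly packed once an ordinary member joins it.

```cpp
#include <cstddef>
#include <cstdint>

struct Packed1 { std::uint32_t a; std::uint64_t b; } __attribute__((packed));
struct Packed2 { std::uint64_t a; std::uint64_t b; } __attribute__((packed));

struct ContainerOfPacked { Packed1 first; Packed2 second; };

// Hypothetical: one packed member followed by ordinary members.
struct MixedContainer { Packed1 first; std::uint8_t tag; std::uint32_t value; };

// Packed structs drop to alignment 1, so their container does too.
static_assert(alignof(Packed1) == 1, "packed struct has alignment 1");
static_assert(alignof(ContainerOfPacked) == 1, "container inherits alignment 1");
static_assert(offsetof(ContainerOfPacked, second) == 12, "no hole after first");

// The container itself is not packed: padding reappears before the uint32_t.
static_assert(offsetof(MixedContainer, value) == 16, "3 bytes of padding after tag");
static_assert(sizeof(MixedContainer) == 20, "12 + 1 + 3 (padding) + 4");
```

So the absence of holes comes from the members' alignment, not from any packing of the container.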
Although this is an answer, I'm also looking for answers which can cite the relevant parts of the Standard or some other authoritative/theoretical explanations.

Related

Packing unions/structure to avoid padding

I have a structure that looks like this:
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
I would like to make this 16 bytes, but GCC is telling me it is 24, as the union is padding to 16 bytes.
I would like to put vdata into a large std::vector. From my understanding, there should be no issue with alignment if this were 16 bytes, since the pointer would always be 8 byte aligned.
I understand that I can force this to be packed using __attribute__((__packed__)) in GCC. But I would like to know if there is a portable and standard compliant way to get this to be 16 bytes?
Edit: Ideas
Idea 1: split up the B array.
struct vdata {
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[8]; // 8 bytes
} u;
uint8_t B2[4]; // 4 bytes
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
Could B2 elements be reliably accessed from a pointer of B? Is that defined behavior?
Idea 2: store pointer as byte array and memcpy as necessary (@Eljay)
struct vdata {
union union_data {
std::byte A[sizeof(uint8_t*)]; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
Would there be a performance penalty for accessing the pointer, or would it be optimized out? (Assuming GCC x86).
You could change A to std::byte A[sizeof(uint8_t*)]; and then std::memcpy the pointer into A and out of A.
Worth commenting as to what is going on, and that these extra hoops are to avoid padding bytes.
Also adding a set_A setter and get_A getter may be very helpful.
struct vdata {
union union_data {
std::byte A[sizeof(uint8_t*)]; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
void set_A(uint8_t* p) {
std::memcpy(u.A, &p, sizeof p);
}
uint8_t* get_A() {
uint8_t* result;
std::memcpy(&result, u.A, sizeof result);
return result;
}
};
Store C+D in the union's array, and provide method access to them:
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[16]; // 12 + 2*2 bytes
} u;
int16_t& C() {
return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[12]));
}
int16_t& D() {
return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[14]));
}
};
Demo (with zero warnings for strict aliasing violations and run-time address sanitization enabled)
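Keep in mind that compilers do not report a strict aliasing violation when the underlying buffer has a single-byte type such as char or uint8_t; thankfully so, because otherwise it would be impossible to create memory pools. (Strictly speaking, the standard's aliasing exemption covers inspecting any object through a char-type lvalue, not the reverse, so std::memcpy is the conservative way to read the int16_t values.) If it makes things clearer or safer, you can even have an explicit char array buffer: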
struct vdata {
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
char buf[16]; // 16 bytes - could be std::byte buf[16]
} u;
int16_t& C() { return *(int16_t*)(&u.buf[12]); }
int16_t& D() { return *(int16_t*)(&u.buf[14]); }
};
Regarding alignment: the array is 8-byte aligned because the union is, so offsets 12 and 14 are guaranteed to be 2-byte aligned, which is what int16_t requires (even though it is u.B that appears in the code).
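The accessors above can also be written with std::memcpy, which avoids forming an int16_t reference into a byte buffer at all. This is a minimal sketch under assumptions, not the answer's own code: `vdata_bytes`, `load`, and `store` are hypothetical names, and the layout (pointer or inline bytes in the first 12 bytes, C at offset 12, D at offset 14) is chosen to mirror the struct above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical all-bytes layout: bytes [0, 12) hold either the pointer
// (first 8 bytes) or the inline array; C lives at 12 and D at 14.
struct vdata_bytes {
    unsigned char raw[16];

    // Copy a trivially copyable value out of the buffer.
    template <typename T>
    T load(std::size_t offset) const {
        assert(offset + sizeof(T) <= sizeof raw);
        T value;
        std::memcpy(&value, raw + offset, sizeof value);
        return value;
    }

    // Copy a trivially copyable value into the buffer.
    template <typename T>
    void store(std::size_t offset, T value) {
        assert(offset + sizeof(T) <= sizeof raw);
        std::memcpy(raw + offset, &value, sizeof value);
    }

    std::uint8_t* A() const { return load<std::uint8_t*>(0); }
    void set_A(std::uint8_t* p) { store(0, p); }
    std::int16_t C() const { return load<std::int16_t>(12); }
    void set_C(std::int16_t v) { store(12, v); }
    std::int16_t D() const { return load<std::int16_t>(14); }
    void set_D(std::int16_t v) { store(14, v); }
};

static_assert(sizeof(vdata_bytes) == 16, "a plain byte array carries no padding");
```

The trade-off is that C and D are read and written by value instead of through a reference.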
Alternatively, you can force the alignment of the structure. The C++ alignas specifier would not be valid here because you want to lower the alignment of your structure, but a pragma directive can again give you 16 bytes:
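#pragma pack(push, 4) // limit member alignment to 4 bytes
struct vdata {
static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
union union_data {
uint8_t * A; // 8 bytes
uint8_t B[12]; // 12 bytes
} u;
int16_t C; // 2 bytes
int16_t D; // 2 bytes
};
#pragma pack(pop) // restore the previous packing for later declarations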
Demo
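As a hedged check of what the pragma does, the sketch below asserts both the size and the resulting alignment. It assumes a compiler that supports `#pragma pack(push, n)` / `#pragma pack(pop)` (GCC, clang, and MSVC do) and 8-byte pointers; `vdata_pack4` is a hypothetical name used to keep it separate from the struct above.

```cpp
#include <cstdint>

#pragma pack(push, 4)            // cap member alignment at 4 bytes
struct vdata_pack4 {
    union union_data {
        std::uint8_t* A;         // 8 bytes, but now only 4-byte aligned
        std::uint8_t  B[12];     // 12 bytes
    } u;
    std::int16_t C;              // offset 12
    std::int16_t D;              // offset 14
};
#pragma pack(pop)                // restore the previous packing

static_assert(sizeof(std::uint8_t*) == 8, "sketch assumes 64-bit pointers");
static_assert(sizeof(vdata_pack4) == 16, "12-byte union plus two int16_t");
static_assert(alignof(vdata_pack4) == 4, "the whole struct is only 4-byte aligned");
```

Because the struct is only 4-byte aligned, the pointer member may sit at an address that is not a multiple of 8; x86 tolerates that, but other targets may not.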
I'm fairly certain that this one will cause problems.
As far as I understand, the following code would be the safest.
The data that specifies the type is in the common initial sequence. Thus you can access it either way (by using cda.C or cdb.C), so it is perfect for determining the type.
Putting everything in a struct for both cases ensures that each struct's layout is independent (thus B can start before the next 8-byte alignment boundary).
#include <cstddef> // offsetof
#include <cstdint>
#include <iostream>
struct CDA
{
int16_t C; // 2 bytes
int16_t D; // 2 bytes
uint8_t* A; // 8 bytes
};
struct CDB
{
int16_t C; // 2 bytes
int16_t D; // 2 bytes
uint8_t B[12]; // 12 bytes
};
struct vdata {
union union_data {
CDA cda;
CDB cdb;
} u;
};
static_assert(sizeof(uint8_t*) == 8);
static_assert(sizeof(CDA) == 16);
static_assert(sizeof(CDB) == 16);
static_assert(offsetof(vdata::union_data, cda) == offsetof(vdata::union_data, cdb));
static_assert(offsetof(CDA, C) == offsetof(CDB, C));
static_assert(offsetof(CDA, C) == 0);
static_assert(sizeof(vdata) == 16);
int main()
{
std::cout << "sizeof(CDA) : " << sizeof(CDA) << std::endl;
std::cout << "sizeof(CDB) : " << sizeof(CDB) << std::endl;
std::cout << "sizeof(vdata) : " << sizeof(vdata) << std::endl;
}
Useful sources of information:
CppCon 2017: Scott Schurr “Type Punning in C++17: Avoiding Pun-defined Behavior”
Union declaration
std::launder
std::variant
How to decide?
If the size optimization is not that important, I would recommend to use std::variant.
If the size is important but the order is not, then the current solution might be the best choice.
If portability is not so important, then the pragma pack solution might be appropriate (remember to reset the packing after the struct definition).
Otherwise, if you really need layout control, then use either:
std::byte array and memcpy (access data with functions)
placement new and std::launder.
In all cases, be sure to have appropriate assertions that verify the assumptions you make. I have put many in my sample code, but you can adjust them depending on your needs.
Also, unless you have millions of vdata items or you are on an embedded device, then using 24 bytes instead of 16 might not be a big deal.
You might also use a conditional define to optimize only for your current compiler. This could be useful to ensure that you have working code (though maybe less optimal) for every target, or it can allow you to depend on behavior that is undefined by the standard but might be defined by your compiler.
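For the std::variant recommendation, a minimal sketch might look like the following. The names `InlineBytes`, `Payload`, `vdata_variant`, and `holds_inline` are hypothetical; the point is only that the variant tracks the active alternative itself, at the cost of extra bytes for the discriminator (the exact size is implementation-dependent, so it is not asserted here). It requires C++17.

```cpp
#include <array>
#include <cstdint>
#include <variant>

using InlineBytes = std::array<std::uint8_t, 12>;
using Payload = std::variant<std::uint8_t*, InlineBytes>;

// Hypothetical variant-based counterpart of vdata.
struct vdata_variant {
    Payload u;          // knows which alternative is active
    std::int16_t C;
    std::int16_t D;
};

// The variant must store a discriminator next to its largest alternative.
static_assert(sizeof(Payload) > sizeof(InlineBytes),
              "variant is larger than the raw 12-byte payload");

// True when the payload currently holds the inline byte array.
constexpr bool holds_inline(const Payload& p) {
    return std::holds_alternative<InlineBytes>(p);
}
```

Access then goes through std::get or std::visit, which fail loudly on the wrong alternative instead of silently reinterpreting bytes.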

Why is there a size mismatch between structures and unions?

I have declared a union allocating 4100 bytes to variable "sample_union" and made the same union declaration as part of a structure which is allocating 4104 bytes.
union test_size_union {
struct {
uint8_t type;
union {
uint8_t count;
uint8_t list;
};
uint16_t rc;
uint16_t arr_value[2048];
};
uint64_t first_dword;
}__attribute__((packed)) sample_union ;
Placing the above union inside structure is allocating 4104 bytes.
struct test_size_struct {
union {
struct {
uint8_t type;
union {
uint8_t count;
uint8_t list;
};
uint16_t rc;
uint16_t arr_value[2048];
};
uint64_t first_dword;
};
}__attribute__((packed)) sample_struct;
Well, this is not a project requirement, but I would like to know why the compiler behaves differently for these two declarations.
gcc version: (GCC) 4.9.2, x86_64
Platform: Linux, x86_64
When you placed the union inside the struct, you didn't mark the union as packed. The unpacked union has a little padding (four bytes) so that its size is a multiple of the alignment of uint64_t (8 bytes): 4100 is rounded up to 4104.
The packed union doesn't have this padding, so it is smaller.
As a side observation, the anonymous struct inside the union is not marked packed. That happens not to matter in this case, because everything is nicely aligned anyway - but it's something to be aware of.
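The effect can be reproduced with named types, as a sketch assuming GCC or clang on x86-64. `Header`, `UnpackedUnion`, `PackedUnion`, and the two wrapper structs are hypothetical names standing in for the anonymous members of the question.

```cpp
#include <cstdint>

// Stand-in for the anonymous struct inside the union.
struct Header {
    std::uint8_t type;
    union { std::uint8_t count; std::uint8_t list; };
    std::uint16_t rc;
    std::uint16_t arr_value[2048];
};

union UnpackedUnion { Header h; std::uint64_t first_dword; };
union __attribute__((packed)) PackedUnion { Header h; std::uint64_t first_dword; };

// Packing the outer struct does not change the size of its member's type.
struct __attribute__((packed)) WrapsUnpacked { UnpackedUnion u; };
struct __attribute__((packed)) WrapsPacked { PackedUnion u; };

static_assert(sizeof(Header) == 4100, "1 + 1 + 2 + 4096 bytes");
static_assert(sizeof(UnpackedUnion) == 4104, "rounded up to a multiple of 8");
static_assert(sizeof(PackedUnion) == 4100, "no tail padding when packed");
static_assert(sizeof(WrapsUnpacked) == 4104, "inner union keeps its padding");
static_assert(sizeof(WrapsPacked) == 4100, "packing the union removes it");
```

In other words, the attribute has to be applied to the union itself, not only to the struct that contains it.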

How can I refer to sub-elements in a particular variable?

My problem is as follows: I have a 64 bit variable, of type uint64_t (so I know it's specified to be exactly 64 bits wide).
I want to be able to refer to different parts of it, for example breaking it down into two uint32_ts, four uint16_ts or eight uint8_ts. Is there a standards compliant way to do it that doesn't rely on undefined behavior?
My approach is as follows:
class Buffer
{
uint64_t m_64BitBuffer;
public:
uint64_t & R64() { return m_64BitBuffer; }
uint32_t & R32(R32::Part part) { return *(reinterpret_cast<uint32_t*>(&m_64BitBuffer)+part); }
uint16_t & R16(R16::Part part) { return *(reinterpret_cast<uint16_t*>(&m_64BitBuffer)+part); }
uint8_t & R8(R8::Part part) { return *(reinterpret_cast<uint8_t*>(&m_64BitBuffer)+part); }
};
Where R32::Part, R16::Part and R8::Part are enums that define values between 0 and 1, 0 and 3 and 0 and 7 respectively.
I imagine this should be ok. There should be no issues with alignment, for example. I'd like to know if I'm breaking any rules, and if so, how to do this properly.
Type-punning through a union is not sanctioned by the C++ standard, but some compilers (GCC, for example) document it as a supported extension, so you could simply have the following union member:
union {
uint64_t val;
struct { uint32_t as32[2]; };
struct { uint16_t as16[4]; };
struct { uint8_t as8[8]; };
} u;
Access to each part is as easy as reading from the appropriate member.
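If relying on a compiler extension is not acceptable, shifts and masks give the same access with fully defined behavior. This is a sketch of an alternative, not the approach from the answer above: `get_part` and `set_part` are hypothetical helpers, part 0 is the least significant part regardless of endianness, and the functions return values instead of references.

```cpp
#include <cstdint>

// Extract part number `index` (0 = least significant) of width sizeof(Part).
template <typename Part>
constexpr Part get_part(std::uint64_t value, unsigned index) {
    return static_cast<Part>(value >> (index * sizeof(Part) * 8));
}

// Return `value` with part number `index` replaced by `part`.
template <typename Part>
constexpr std::uint64_t set_part(std::uint64_t value, unsigned index, Part part) {
    const unsigned shift = static_cast<unsigned>(index * sizeof(Part) * 8);
    const std::uint64_t mask =
        static_cast<std::uint64_t>(static_cast<Part>(~Part{0})) << shift;
    return (value & ~mask) | (static_cast<std::uint64_t>(part) << shift);
}
```

A call such as `get_part<std::uint16_t>(buffer, 2)` reads bits 32 to 47, and the caller is responsible for keeping `index` within range for the chosen width.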

what does a unioned struct do?

I have no idea how to decipher this. Can someone please tell me what is going on?
typedef struct _ARGBCOLOR {
union {
struct {
BYTE B;
BYTE G;
BYTE R;
BYTE A;
};
DWORD ARGB;
};
} ARGBCOLOR, *PARGBCOLOR;
If you have an ARGBCOLOR x; you can access 4 separate bytes as x.B, x.G, x.R, and x.A, or a 32-bit word as x.ARGB.
The C standard guarantees these will overlay properly (assuming the sizes fit and padding requirements don't screw the pooch (not the case here)). But this struct clearly assumes a little-endian system.
One extra complication is that the union is anonymous. It's common to name a union like this u, but some compilers allow internal unions and structs to be anonymous, so its members are accessed as if they were up one level (at the same level as the union itself).
My favorite way to do this sort of overlay type is to put the union at the outermost level. You can duplicate the kind or type member so it's accessible everywhere. But this way removes the temptation to use anonymous unions (which are not available in ANSI C or C99) because you don't need a bogus u member in the middle.
typedef union _ARGBCOLOR {
//BYTE type;
struct {
//BYTE type;
BYTE B;
BYTE G;
BYTE R;
BYTE A;
} bytes;
struct {
//BYTE type;
DWORD ARGB;
} word;
} ARGBCOLOR, *PARGBCOLOR;
Due to the common initial prefix property, all three of the BYTE type; members would overlay the same memory.
Another variation is to make an array for the individual bytes.
typedef union _ARGBCOLOR {
DWORD dword;
BYTE byte[ sizeof(DWORD)/sizeof(BYTE) ];
} ARGBCOLOR, *PARGBCOLOR;
enum { B,G,R,A };
Now we don't need two levels, and the type-punning is more apparent.
ARGBCOLOR x = { 0x10203040 };
x.byte[B] = 0x50;
x.byte[G] = 0x60;
printf("0x%08lx\n", (unsigned long)x.dword); // prints 0x10206050 on a little-endian machine
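Since the overlay depends on byte order, an endianness-independent alternative is to compute the channels with shifts and masks. The helpers below are a hypothetical sketch (none of these names come from the original code) and assume the usual layout with A in the top byte and B in the bottom byte of the 32-bit word.

```cpp
#include <cstdint>

// Assemble a 32-bit ARGB word from its four channels.
constexpr std::uint32_t make_argb(std::uint8_t a, std::uint8_t r,
                                  std::uint8_t g, std::uint8_t b) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{r} << 16) |
           (std::uint32_t{g} << 8) | std::uint32_t{b};
}

// Extract individual channels; the casts keep only the low 8 bits.
constexpr std::uint8_t alpha(std::uint32_t argb) { return static_cast<std::uint8_t>(argb >> 24); }
constexpr std::uint8_t red(std::uint32_t argb)   { return static_cast<std::uint8_t>(argb >> 16); }
constexpr std::uint8_t green(std::uint32_t argb) { return static_cast<std::uint8_t>(argb >> 8); }
constexpr std::uint8_t blue(std::uint32_t argb)  { return static_cast<std::uint8_t>(argb); }

// Return the word with only the blue channel replaced.
constexpr std::uint32_t with_blue(std::uint32_t argb, std::uint8_t b) {
    return (argb & 0xFFFFFF00u) | b;
}
```

These give the same values on little-endian and big-endian machines, because they operate on the numeric value and never on the bytes in memory.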
It does the same thing that declaring the structure outside of the union would do, but allows you to do it anonymously, exposing the struct's members as the same scope as the other union members. Your construct in particular allows you to do something like
ARGBCOLOR a;
printf("a = %u.%u.%u.%u or %lu\n", a.B, a.G, a.R, a.A, a.ARGB);
(The union itself lets you see it as one of the types within the union, in this case you can see it either as a sequence of four BYTE values or as a single DWORD value)
If you hoisted the struct definition out of the union:
struct ARGB_VALUES {
BYTE B, G, R, A;
};
struct ARGBCOLOR {
union {
ARGB_VALUES VALUES; /* My god, how can you tell a type from a member? */
DWORD ARGB;
};
};
The previous printf would have to be:
ARGBCOLOR a;
printf("a = %u.%u.%u.%u or %lu\n", a.VALUES.B, a.VALUES.G, a.VALUES.R, a.VALUES.A, a.ARGB);
Note: The struct definition construct you're using is very archaic. You can reduce it as follows and save giving every structure/class a second alias that bloats your symbol table:
typedef struct ARGBCOLOR {
...
} *PARGBCOLOR;
The way you're doing it is equivalent to this:
struct _ARGBCOLOR {}; // First symbol.
typedef _ARGBCOLOR ARGBCOLOR;
typedef _ARGBCOLOR *PARGBCOLOR;
typedef struct _ARGBCOLOR
{
union
{
struct
{
BYTE B;
BYTE G;
BYTE R;
BYTE A;
};
DWORD ARGB;
};
} ARGBCOLOR, *PARGBCOLOR;
If you observe carefully, you can see a struct and a DWORD within the union. You can use either of these from your typedefed struct ARGBCOLOR, whichever is convenient.
In a union, each data member starts at the same location in memory. The DWORD ARGB shares the memory location of the struct with 4 bytes. So the size of the union will be 4 bytes, and you may access it either as a whole using ARGB or byte-wise using A, R, G, or B (a single byte can be modified without affecting the other bytes).
The outermost struct is not really useful in your example, but it is when it has an additional member called kind or some such:
struct ARGBCOLOR {
uint8_t kind;
union {
struct {
BYTE B;
BYTE G;
BYTE R;
BYTE A;
};
DWORD ARGB;
};
} ARGBCOLOR;
Then depending on the value of kind you interpret the union as either a struct with members B, G, R and A, or the DWORD ARGB.
edit: Keep in mind that anonymous members such as the union and the inner struct only became part of the C standard in C11.
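A sketch of how the kind member can drive access, written in C++ with a named inner struct so it does not depend on anonymous structs. `Kind`, `Channels`, `Color`, and `alpha_of` are hypothetical names, and the constructors are an addition that records which union member is active.

```cpp
#include <cstdint>

enum class Kind : std::uint8_t { Channels, Word };

struct Channels { std::uint8_t b, g, r, a; };

struct Color {
    Kind kind;                       // says which union member is active
    union {
        Channels channels;
        std::uint32_t argb;
    };

    constexpr Color(Channels c) : kind(Kind::Channels), channels(c) {}
    constexpr explicit Color(std::uint32_t word) : kind(Kind::Word), argb(word) {}
};

// Read alpha from whichever representation is active, never from the other.
constexpr std::uint8_t alpha_of(const Color& c) {
    return c.kind == Kind::Channels
               ? c.channels.a
               : static_cast<std::uint8_t>(c.argb >> 24);
}
```

Checking kind before every access means only the active member is ever read, so the code does not rely on type punning at all.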

Define an enum to be smaller than one byte / Why is this struct larger than one byte?

I'd like to define an enum to be smaller than one byte while maintaining type safety.
Defining an enum as:
enum MyEnum : unsigned char
{
i ,j, k, w
};
I can shrink it to one byte, however I'd like to make it use only 2 bits since I will at most have 4 values in it. Can this be done?
In my struct where I use the enum, the following does not work
struct MyStruct
{
MyEnum mEnum : 2; // This will be 4 bytes in size
};
Thanks!
Update:
The questions comes from this scenario:
enum MyEnum : unsigned char
{
i ,j, k, w
};
struct MyStruct
{
union
{
signed int mXa:3;
unsigned int mXb:3;
};
union
{
signed int mYa:3;
unsigned int mYb:3;
};
MyEnum mEnum:2;
};
sizeof(MyStruct) is showing 9 bytes. Ideally I'd like the struct to be 1 byte in size.
Update for implemented solution:
This struct is one byte and offers the same functionality and type safety:
enum MyEnum :unsigned char
{
i,j,k,w
};
struct MyStruct
{
union
{
struct { MyEnum mEnum:2; char mXa:3; char mXb:3;};
struct { MyEnum mEnum:2; unsigned char mYa:3; unsigned char mYb:3;};
};
};
As per the standard definition, a type's sizeof must be at least 1 byte. This is the smallest addressable unit of memory.
The bitfield feature you mention allows members of structures to have smaller sizes, but the struct itself cannot be smaller, because:
It must be of at least 1 byte too
Alignment considerations might need it to be even bigger
Additionally, you may not take the address of bitfield members since, as said above, a byte is the smallest addressable unit of memory. (You can already see that from sizeof actually returning the number of bytes, not bits; if you expected fewer than CHAR_BIT bits, sizeof would not even be able to express it.)
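The points above can be checked at compile time. In this small sketch, `Empty` and `TwoBits` are hypothetical types; the assertions hold on any conforming implementation because they only state lower bounds.

```cpp
#include <climits>

struct Empty {};                            // no members at all
struct TwoBits { unsigned char value : 2; };  // a single 2-bit bitfield

static_assert(sizeof(char) == 1, "sizeof counts in units of char");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(sizeof(Empty) >= 1, "even an empty struct occupies a byte");
static_assert(sizeof(TwoBits) >= 1, "a 2-bit field still costs a whole byte");
```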
Bitfields can only share space if they use the same underlying type. And any unused bits are actually left unused; if the sum of bits in an unsigned int bitfield is 3 bits, it still takes 4 bytes total. Since both unions have unsigned int members, they're both 4 bytes, but since they are bitfields, they have an alignment of one. So the first union is 4 bytes, the second is 4 bytes, and then the MyEnum is 1 byte. Since all of those have an alignment of one, no padding is needed.
Unfortunately, unions don't really work with bitfields at all. Bitfields are for integer types only. The most I could get your data down to without a serious redesign is 3 bytes: http://coliru.stacked-crooked.com/view?id=c6ad03c93d7893ca2095fabc7f72ca48-e54ee7a04e4b807da0930236d4cc94dc
enum MyEnum : unsigned char
{
i ,j, k, w
};
union MyUnion
{
signed char ma:3; //char to save memory
unsigned char mb:3;
};
struct MyStruct
{
MyUnion X;
MyUnion Y;
MyEnum mEnum;
}; //this structure is three bytes
In the complete redesign category, you have this: http://coliru.stacked-crooked.com/view?id=58269eef03981e5c219bf86167972906-e54ee7a04e4b807da0930236d4cc94dc
No. C++ defines "char" to be the smallest addressable unit of memory for the platform. You can't address 2 bits.
Bit packing 'Works for me'
#include <iostream>
enum MyEnum : unsigned char
{
i ,j, k, w
};
struct MyStruct
{
MyEnum mEnum : 2;
unsigned char val : 6;
};
int main()
{
std::cout << sizeof(MyStruct);
}
prints out 1. How / what are you measuring?
Edit: Live link
Are you doing something like having a pointer as the next thing in the struct? In which case, you'll have 30 bits of dead space, as pointers must be 4-byte aligned on most 32-bit systems.
Edit: With your updated example, it's the unions which are breaking you
enum MyEnum : unsigned char
{
i ,j, k, w
};
struct MyStruct
{
unsigned char mXb:3;
unsigned char mYb:3;
MyEnum mEnum:2;
};
Has size 1. I'm not sure how unions and bit packing work together though, so I'm no more help.
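One way to keep both the signed and unsigned views without unions is to store each 3-bit value once and derive the signed reading with accessor functions. This is a hedged sketch, not the asker's final solution: the accessor names are hypothetical, the size assertion reflects GCC and clang behavior (bit-field layout is implementation-defined), and some GCC versions warn that the 2-bit field cannot hold every value of the underlying type.

```cpp
enum MyEnum : unsigned char { i, j, k, w };

struct MyStruct {
    unsigned char x : 3;   // raw 3-bit storage for the X value
    unsigned char y : 3;   // raw 3-bit storage for the Y value
    MyEnum mEnum : 2;

    // Unsigned view: 0..7.
    constexpr unsigned xb() const { return x; }
    constexpr unsigned yb() const { return y; }

    // Signed view: sign-extend the 3-bit value to -4..3.
    constexpr int xa() const { return (x ^ 4) - 4; }
    constexpr int ya() const { return (y ^ 4) - 4; }
};

static_assert(sizeof(MyStruct) == 1, "3 + 3 + 2 bits share one byte on GCC and clang");
```

The expression `(v ^ 4) - 4` flips the sign bit of the 3-bit value and subtracts its weight, which maps 4..7 to -4..-1 and leaves 0..3 unchanged.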