Offset in a struct with bit fields

If we have a struct with bit fields, then how are the subsequent members aligned in the struct? Consider the following code:
#include <stddef.h>
#include <stdio.h>

struct A {
    int a : 1;
    char b; // at offset 1
};
struct B {
    int a : 16;
    int b : 17;
    char c; // at offset 7
};

int main(void) {
    printf("Size of A: %d\n", (int)sizeof(struct A));
    printf("Offset of b in A: %d\n", (int)offsetof(struct A, b));
    printf("Size of B: %d\n", (int)sizeof(struct B));
    printf("Offset of c in B: %d\n", (int)offsetof(struct B, c));
    return 0;
}
Output:
Size of A: 4
Offset of b in A: 1
Size of B: 8
Offset of c in B: 7
Here, in the first case, b is allocated just in the 2nd byte of the struct without any padding. But, in the 2nd case, when bit fields overflow 4 bytes, c is allocated in the last (8th) byte.
What is happening in the 2nd case? What is the rule for padding in structs involving bit fields in general?

how are the subsequent members aligned in the struct?
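The standard does not specify it. This is implementation-defined behavior, so the answer depends on the compiler and the target.
What is happening in the 2nd case?
The compiler may have added padding bytes or padding bits, or the bit order within the struct may differ from what you expect. The first member of the struct does not necessarily hold the MSB. With the output shown, the likely explanation is that a fills 16 bits of the first 4-byte unit, the 17-bit b does not fit in the remaining 16 bits and so starts a new unit at byte 4, taking bytes 4 and 5 plus one bit of byte 6, which leaves byte 7 as the first free byte for c.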
What is the rule for padding in structs involving bit fields in general?
The compiler is free to add any kind of padding bytes (and padding bits in a bit field), anywhere in the struct, as long as it isn't done at the very beginning of the struct.
Bit-fields are very poorly defined by the standard. They are of little use for anything other than chunks of boolean flags whose exact position in memory does not matter. I would advise you to use bit-wise operators on plain integers instead. Then you get deterministic, portable code.
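To illustrate that suggestion, here is a minimal sketch of keeping a few fields in a plain unsigned integer with masks and shifts. The field names and bit positions (an enabled flag in bit 0, a 3-bit mode in bits 1-3, a 4-bit level in bits 4-7) are made up for the example and do not come from the question.

```cpp
#include <cstdint>

// Hypothetical layout: bit 0 = enabled, bits 1-3 = mode, bits 4-7 = level.
constexpr std::uint32_t kEnabledMask = 0x1u;
constexpr unsigned kModeShift = 1;
constexpr std::uint32_t kModeMask = 0x7u << kModeShift;
constexpr unsigned kLevelShift = 4;
constexpr std::uint32_t kLevelMask = 0xFu << kLevelShift;

// True when the enabled flag (bit 0) is set.
constexpr bool is_enabled(std::uint32_t word) {
    return (word & kEnabledMask) != 0;
}

// Return `word` with the mode field replaced; extra high bits of `mode` are dropped.
constexpr std::uint32_t set_mode(std::uint32_t word, std::uint32_t mode) {
    return (word & ~kModeMask) | ((mode << kModeShift) & kModeMask);
}

// Extract the 3-bit mode field.
constexpr std::uint32_t get_mode(std::uint32_t word) {
    return (word & kModeMask) >> kModeShift;
}

// Return `word` with the level field replaced; extra high bits of `level` are dropped.
constexpr std::uint32_t set_level(std::uint32_t word, std::uint32_t level) {
    return (word & ~kLevelMask) | ((level << kLevelShift) & kLevelMask);
}

// Extract the 4-bit level field.
constexpr std::uint32_t get_level(std::uint32_t word) {
    return (word & kLevelMask) >> kLevelShift;
}
```

Because the positions are spelled out in the masks, the bit layout of the integer value is the same on every compiler; only byte order still matters if the value is written to a file or sent over the wire.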

I will take a small example; I hope it makes this clear.
Consider two structures :
struct {
char a;
int b;
char c;
} X;
Versus.
struct {
char a;
char b;
int c;
} Y;
A little more explanation regarding comments below:
None of the below is guaranteed, but it is the common way these structs are laid out on a 32-bit system where int is 32 bits:
Struct X:
|char|pad |pad |pad |--------int--------|char|pad |pad |pad | = 12 bytes
Struct Y:
|char|char|pad |pad |--------int--------| = 8 bytes
Thank you
Some reference ::
Data structure Alignment-wikipedia
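One way to check the picture above on a particular compiler is offsetof. In this sketch the two structs are given the tag names X and Y (the original declares anonymous structs with variables of those names), and print_xy_layout is a helper name invented here. The numbers it prints depend on the implementation.

```cpp
#include <cstddef>
#include <cstdio>

struct X { char a; int b; char c; };
struct Y { char a; char b; int c; };

// Print where each member starts and how large each struct is.
void print_xy_layout() {
    std::printf("X: a@%zu b@%zu c@%zu size=%zu\n",
                offsetof(X, a), offsetof(X, b), offsetof(X, c), sizeof(X));
    std::printf("Y: a@%zu b@%zu c@%zu size=%zu\n",
                offsetof(Y, a), offsetof(Y, b), offsetof(Y, c), sizeof(Y));
}
```

Calling print_xy_layout() from main on a target that matches the diagrams should report b in X at offset 4 and a total of 12 bytes, against 8 bytes for Y, but that is an expectation about common ABIs, not a guarantee.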

Related

Virtual class inheritance object size issue

In this code, the size of ob1 is 16, which is fine (because of the virtual pointer), but I can't understand why the size of ob2 is 24.
#include <iostream>
using namespace std;
class A {
int x;
};
class B {
int y, z;
};
class C : virtual public A {
int a;
};
class D : virtual public B {
int b;
};
int main() {
C ob1;
D ob2;
cout << sizeof(ob1) << " " << sizeof(ob2) << "\n"; // separator added so the two sizes do not run together
}
I expect the size of ob2 as 20, but the output is 24

One possible layout for objects of type D is:
+----------+
| y | The B subobject (8 bytes)
| z |
+----------+
| vptr | vtable pointer (8 bytes)
| |
+----------+
| b | 4 bytes
+----------+
| unused | 4 bytes (padding for alignment purposes)
+----------+
That would make sizeof(ob2) 24.
Alignment requirements are defined by the implementation. Most of the time, the size of the largest member object or subobject dictates the alignment requirement of an object. In your case, the largest such object, the vtable pointer, is 8 bytes. Hence, the implementation aligns objects at 8-byte boundaries, adding padding when necessary.

To implement virtual inheritance, D contains a pointer as a data member, which on a 64-bit system requires 8 bytes. Moreover, these 8 bytes must be aligned to an 8-byte memory boundary. This latter requirement in turn mandates that D itself be aligned to an 8-byte memory boundary. The simplest way to implement that is to make sizeof(D) a multiple of 8 by padding it with unused bytes (bytes 21 to 24).
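The actual numbers can be checked with sizeof and alignof. The sketch below reuses the class names from the question; print_sizes is a helper name invented here, and what it prints is implementation-specific.

```cpp
#include <cstdio>

class A { int x; };
class B { int y, z; };
class C : virtual public A { int a; };
class D : virtual public B { int b; };

// Report size and alignment of the two classes that use virtual inheritance.
void print_sizes() {
    std::printf("C: size=%zu align=%zu\n", sizeof(C), alignof(C));
    std::printf("D: size=%zu align=%zu\n", sizeof(D), alignof(D));
}
```

If alignof(D) comes out as 8, then 20 bytes of content can only be stored in a 24-byte object, which is the rounding described above.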

Not Understanding C++ Memory Alignment

I understand that in C++, if we have a struct like this:
struct x_
{
char a; // 1 byte
int b; // 4 bytes
short c; // 2 bytes
char d; // 1 byte
} MyStruct;
Memory structure will look like this due to compiler padding:
struct x_
{
char a; // 1 byte
char _pad0[3]; // padding to put 'b' on 4-byte boundary
int b; // 4 bytes
short c; // 2 bytes
char d; // 1 byte
char _pad1[1]; // padding to make sizeof(x_) multiple of 4
}
Can somebody please help me understand why sizeof(x_) must be a multiple of 4, and not any other number?

A class must have at least the largest alignment of any of its non-static data members, so x_ must be aligned at least as strictly as b. If x_ had a weaker alignment requirement, b could end up misaligned. Imagine an x_ object at address 3: its b member sits at offset 4 when alignof(int)==4, so it would be at address 7, which is misaligned. So, following this example, alignof(x_)==4.
The size of an object must then be a multiple of its alignment, because elements of an array are required to be contiguous: a pointer to the end of element n must be a pointer to element n+1. So in your case the size of x_ must be a multiple of 4.
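Both properties can be checked directly. This is a small sketch; the helper names size_is_multiple_of_alignment and element_stride are invented for the illustration.

```cpp
#include <cstddef>

struct x_ {
    char a;
    int b;
    short c;
    char d;
};

// True when sizeof(T) is a whole number of alignment units.
template <typename T>
constexpr bool size_is_multiple_of_alignment() {
    return sizeof(T) % alignof(T) == 0;
}

// Distance in bytes between two consecutive elements of an array of T.
template <typename T>
std::ptrdiff_t element_stride() {
    T arr[2] = {};
    return reinterpret_cast<const char*>(&arr[1]) -
           reinterpret_cast<const char*>(&arr[0]);
}
```

element_stride<x_>() is sizeof(x_) by definition of array layout, so if the size were not a multiple of the alignment, arr[1] could not be correctly aligned whenever arr[0] is.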

Structure alignment padding, largest size of padding, and order of struct members

I've been learning about structure data padding since I found out my sizeof() operator wasn't returning what I expected. According to the pattern that I've observed, it aligns structure members with the largest data type. So for example...
struct MyStruct1
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
// Total 5 Bytes
//Total size of struct = 5 (no padding)
};
struct MyStruct2
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
short f; // 2 bytes
// Total 7 Bytes
//Total size of struct = 8 (1 byte of padding between char e and short f)
};
struct MyStruct3
{
char a; // 1 byte
char b; // 1 byte
char c; // 1 byte
char d; // 1 byte
char e; // 1 byte
int f; // 4 bytes
// Total 9 bytes
//Total size of struct = 12 (3 bytes of padding between char e and int f)
};
However, if I make the last member an 8-byte data type, for example a long long, it still only adds 3 bytes of padding, making a four-byte-aligned structure. If I build in 64-bit mode, though, it does in fact align to 8 bytes (the biggest data type). My first question is, am I wrong in saying it aligns the members with the largest data type? This statement seems correct for a 64-bit build, but only true up to 4-byte data types in a 32-bit build. Does this have to do with the native 'word' size of the CPU? Or with the program itself?
My second question is, would the following be an entire waste of space and bad programming?
struct MyBadStruct
{
char a; // 1 byte
unsigned int b; // 4 bytes
UINT8 c; // 1 byte
long d; // 4 bytes
UCHAR e; // 1 byte
char* f; // 4 bytes
char g; // 1 byte
// Total of 16 bytes
//Total size of struct = 28 bytes (12 bytes of padding, wasted)
};

How padding is done is not specified by the standard, so it can differ between systems and compilers. It is often done so that variables are aligned to their size, i.e. size=1 -> no alignment, size=2 -> 2-byte alignment, size=4 -> 4-byte alignment and so on. For size=8, the alignment is normally 4 or 8 bytes. The struct itself is normally 4- or 8-byte aligned. But, just to repeat, it is system/compiler dependent.
In your case it seems to follow the pattern above.
So
char a;
int b;
will give 3 bytes padding to 4 byte align the int.
and
char a1;
int b1;
char a2;
int b2;
char a3;
int b3;
char a4;
int b4;
will end up as 32 byte (again to 4 byte align the int).
But
int b1;
int b2;
int b3;
int b4;
char a1;
char a2;
char a3;
char a4;
will be just 20 as the int is already aligned.
So if memory matters, put the largest members first.
However, if memory doesn't matter (e.g. because the struct isn't used that much), it may be better to keep things in a logical order so that the code is easy to read for humans.
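The two orderings can be compared with sizeof. A minimal sketch, in which the struct names Interleaved and LargestFirst and the helper padding_bytes are invented for the illustration:

```cpp
#include <cstddef>

struct Interleaved {
    char a1; int b1;
    char a2; int b2;
    char a3; int b3;
    char a4; int b4;
};

struct LargestFirst {
    int b1, b2, b3, b4;
    char a1, a2, a3, a4;
};

// Bytes occupied by the members themselves; identical for both structs.
constexpr std::size_t kPayload = 4 * sizeof(int) + 4 * sizeof(char);

// Padding is whatever sizeof reports beyond the payload.
template <typename T>
constexpr std::size_t padding_bytes() {
    return sizeof(T) - kPayload;
}
```

On an implementation that follows the pattern described above, padding_bytes<Interleaved>() would be 12 and padding_bytes<LargestFirst>() would be 0, matching the 32 and 20 byte totals.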

Typically the best way to reduce the amount of padding inserted by the compiler is to sort the data members inside of your struct from largest to smallest:
struct MyNotSOBadStruct
{
long d; // 4 bytes
char* f; // 4 bytes
unsigned int b; // 4 bytes
char a; // 1 byte
UINT8 c; // 1 byte
UCHAR e; // 1 byte
char g; // 1 byte
// Total of 16 bytes
};
The size may vary between 32-bit and 64-bit targets because the size of a pointer (and, on some platforms, of long) changes; the "4 bytes" comments above assume a 32-bit build.
Live version: http://coliru.stacked-crooked.com/a/aee33c64192f2fe0
There I get size = 24.

All of the following is implementation dependent. Do not rely on this for the correctness of your programs (but by all means make use of it for debugging or improving performance).
In general each datatype has a preferred alignment. This is never larger than the size of the type, but it can be smaller.
It appears that your compiler is aligning 64-bit integers on a 32-bit boundary when compiling in 32-bit mode, but on a 64-bit boundary in 64-bit mode.
As to your question about MyBadStruct: In general, write your code to be simple and easy to understand; only do anything else if you know (through measurement) that you have a problem. Having said that, if you sort your member variables by size (largest first), you will minimize padding space.

size of a structure containing bit fields [duplicate]

Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to find why the size of the following structure in CASE III is coming out as 8 bytes.
CASE I:
struct B
{
unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
unsigned char c; // +8 bits
unsigned b: 1; // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for case III comes as 8. I was expecting 1(char) + 4(unsigned) = 5.

You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
unsigned char c; // +8 bits
unsigned char pad[3]; //padding
unsigned int bint; // your b:1 will be one bit inside this unit (which bit is up to the compiler)
} b;
Now, it is obvious that (in a 32-bit arch.) the sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that the offset of a field into a struct has the same alignment requirements as the type of the field itself. In your architecture, integers are 4-byte-aligned, so offsetof(b, bint) must be multiple of 4. It cannot be 0, because there is the c before, so it will be 4. If field bint starts at offset 4 and is 4 bytes long, then the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the biggest of any of its fields, so this B will be 4-byte-aligned (as it is your bit field). But the size of a type must be a multiple of the alignment, 4 is not enough, so it will be 8.
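Note that offsetof cannot be applied to the bit-field member itself. One way to see where the bit actually lands is to zero an object, set only the bit-field, and dump the bytes. This is a sketch of that idea; dump_bitfield_position is a name invented here, and the output differs between compilers and targets.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

struct B {
    unsigned char c;
    unsigned b : 1;
};

// Print every byte of a B object in which only the bit-field `b` is set.
void dump_bitfield_position() {
    B obj;
    std::memset(&obj, 0, sizeof obj);   // clear members and padding alike
    obj.b = 1;                          // set just the one-bit field
    unsigned char bytes[sizeof(B)];
    std::memcpy(bytes, &obj, sizeof obj);
    for (std::size_t i = 0; i < sizeof(B); ++i) {
        std::printf("byte %zu: 0x%02x\n", i, static_cast<unsigned>(bytes[i]));
    }
}
```

The single non-zero byte in the output shows which byte holds b, and sizeof(B) tells you whether your compiler placed it in a fresh unsigned unit (size 8, as in the question) or packed it next to c.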

I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses in memory that are multiple of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.

A char is by definition one byte, and ints are 4 bytes on a typical 32-bit system. The char is therefore padded out to 4 bytes, giving 4 + 4 = 8.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding

To keep memory accesses aligned, the compiler adds padding. If you pack the structure, it will not add the padding.
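As a rough illustration of packing, the sketch below uses #pragma pack, which is a compiler extension (accepted by GCC, Clang and MSVC) and not part of the standard. The struct names Padded and Packed are invented for the example.

```cpp
#include <cstddef>

// Default layout: the compiler may insert padding after `c`.
struct Padded {
    char c;
    unsigned u;
};

// Packed layout: members are placed back to back with 1-byte alignment.
#pragma pack(push, 1)
struct Packed {
    char c;
    unsigned u;
};
#pragma pack(pop)

// Number of bytes the packing removed.
constexpr std::size_t kBytesSaved = sizeof(Padded) - sizeof(Packed);
```

The cost is that u in Packed is misaligned, so access may be slower, and on some architectures taking its address and dereferencing it as an ordinary unsigned* can fault.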

I took another look at this and here's what I found.
From the C book: "Almost everything about fields is implementation-dependent."
On my machine:
struct B {
unsigned c: 8;
unsigned b: 1;
}b;
printf("%zu\n", sizeof(b));
prints 4, which is the size of a single unsigned here, not of a short.
You were mixing bit-fields with regular struct members.
BTW, a bit-field is defined as "a set of adjacent bits within a single implementation-defined storage unit", so I'm not even sure that the ':8' does what you want. That would seem not to be in the spirit of bit-fields (as it's not a bit any more).

The alignment and total size of the struct are platform and compiler specific. You cannot expect straightforward and predictable answers here. The compiler can always have some special idea. For example:
struct B
{
unsigned b0: 1; // +1 bit
unsigned char c; // +8 bits
unsigned b1: 1; // +1 bit
};
The compiler may or may not merge fields b0 and b1 into one integer; it is up to the compiler. Some compilers have command-line options that control this, others do not. Another example:
struct B
{
unsigned short c, d, e;
};
It is up to the compiler whether to pack the fields of this struct (assuming a 32-bit platform). The layout of the struct can differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
unsigned b0: 1;
unsigned b1: 7;
unsigned b2: 2;
};
When you have a sequence of bit-fields that share the same type, the compiler will put them into one int. Otherwise various other factors can kick in. Also take into account that in a big project you write a piece of code and somebody else will write and rewrite the makefile, or move your code from one DLL into another. At that point compiler flags will be set and changed. There is a 99% chance that those people will have no idea of the alignment requirements of your struct. They will never even open your file.

struct alignment question

typedef struct {
char c;
char cc[2];
short s;
char ccc;
}stuck;
Should the above struct have a memory layout as this ?
1 2 3 4 5 6 7
- c - cc - s - ccc -
or this ?
1 2 3 4 5 6 7 8
- c - cc - s - ccc -
I think the first would be better, so why does my VS09 compiler choose the second? (Is my layout correct, by the way?) Thank you.

I think that your structure will have the following layout, at least on Windows:
typedef struct {
char c;
char cc[2];
char __padding;
short s;
char ccc;
char __tail_padding;
} stuck;
You could avoid the padding by reordering the structure members:
typedef struct {
char c;
char cc[2];
char ccc;
short s;
} stuck;

The compiler can't choose the second. The standard mandates that the first field must be aligned with the start of the structure.
Are you using offsetof from stddef.h for finding this out ?
6.7.2.1 - 13
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
It means that you can have
struct s {
int x;
char y;
double z;
};
struct s obj;
int *x = (int *)&obj; /* Legal. */
Put another way
offsetof(struct s, x); /* Must yield 0. */
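The same guarantee can be written down as compile-time and run-time checks. A minimal sketch, written in C++ rather than C; the helper name starts_at_first_member is invented here.

```cpp
#include <cstddef>

struct s {
    int x;
    char y;
    double z;
};

// The first member must sit at the very start of the object.
static_assert(offsetof(s, x) == 0, "no padding before the first member");

// True when the object and its first member share one address.
bool starts_at_first_member(const s& obj) {
    return static_cast<const void*>(&obj) == static_cast<const void*>(&obj.x);
}
```

Only the first member gets this guarantee; the offsets of y and z are whatever the implementation chooses, provided they increase in declaration order.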

Other than at the beginning of a structure, an implementation can put whatever padding it wants in your structures so there's no right way. From C99 6.7.2.1 Structure and union specifiers, paragraphs:
/12: Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
/13: There may be unnamed padding within a structure object, but not at its beginning.
/15: There may be unnamed padding at the end of a structure or union.
Paragraph 13 also contains:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
This means that the fields within the structure cannot be re-ordered. And, in a large number of modern implementations (but this is not mandated by the standard), the alignment of an object is equal to its size. For example a 32-bit integer data type may have an alignment requirement of four (8-bit) bytes.
Hence, a logical alignment would be:
offset size field
------ ---- -----
0 1 char c;
1 2 char cc[2];
3 1 padding
4 2 short s;
6 1 char ccc;
7 1 padding
but, as stated, it may be something different. The final padding is to ensure that consecutive array elements are aligned correctly (since the short most likely has to be on a 2-byte boundary).
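The table above can be reproduced by rounding each offset up to the member's alignment. The sketch below models only that common "align each member to its own alignment" convention, which the standard does not mandate; align_up, offset_of_s, offset_of_ccc and total_size are names invented for the illustration, and the size and alignment of short are passed in as parameters rather than assumed.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `align` (align must be non-zero).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Model of: char c; char cc[2]; short s; char ccc;
// c is at offset 0 and cc at offset 1, so the first free byte is offset 3.
constexpr std::size_t offset_of_s(std::size_t short_align) {
    return align_up(3, short_align);
}

// ccc is a char (alignment 1), so it follows s directly.
constexpr std::size_t offset_of_ccc(std::size_t short_size, std::size_t short_align) {
    return offset_of_s(short_align) + short_size;
}

// The struct's alignment is that of its strictest member (the short), and
// the total size is rounded up to it so that array elements stay aligned.
constexpr std::size_t total_size(std::size_t short_size, std::size_t short_align) {
    return align_up(offset_of_ccc(short_size, short_align) + 1, short_align);
}
```

With a 2-byte short aligned to 2, this gives s at offset 4, ccc at offset 6 and a total of 8 bytes, as in the table; with the alignment forced to 1 (a packed struct) the total drops to 6.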
There are a number of (non-portable) ways in which you may be able to control the padding. Many compilers have a #pragma pack option that you can use to control padding (although be careful: while some systems may just slow down when accessing unaligned data, some will actually dump core for an illegal access).
Also, re-ordering the elements within the structure from largest to smallest tends to reduce padding as well since the larger elements tend to have stricter alignment requirements.
These, and an even uglier "solution" are discussed more here.

While I don't really understand your visual representation of the alignment, I can tell you that with VS you can achieve a packed structure by using 'pragma':
__pragma( pack(push, 1) )
struct { ... };
__pragma( pack(pop) )
In general struct-alignment depends on the compiler used, the target-platform (and its address-size) and the weather, IOW in reality it is not well defined.

Others have mentioned that padding may be introduced either between attributes or after the last attribute.
The interesting thing though, I believe, is to understand why.
Types usually have an alignment. This property specifies which addresses are valid (or not) for a particular type. On some architectures this is a loose requirement (if you do not respect it, you only incur some overhead); on others, violating it causes hardware exceptions.
For example (arbitrary, as each platform defines its own):
char: 1
short (16 bits): 2
int (32 bits): 4
long int (64 bits): 8
A compound type will usually have as alignment the maximum of the alignment of its parts.
How does alignment influence padding?
In order to respect the alignment of a type, some padding may be necessary, for example:
struct S { char a; int b; };
align(S) = max(align(a), align(b)) = max(1, 4) = 4
Thus we have:
// S allocated at address 16 (divisible by 4)
16 a
17
18
19
20 b
21 b
22 b
23 b
Note that because b can only be allocated at an address also divisible by 4, there is some space between a and b, this space is called padding.
Where does padding come from?
Padding may have two different causes:
between attributes, it is caused by a difference in alignment (see above)
at the end of the struct, it is caused by array requirements
The array requirement is that elements of an array should be allocated without intervening padding. This allows one to use pointer arithmetic to navigate from an element to another:
+---+---+---+
| S | S | S |
+---+---+---+
S* p = /**/;
p = p + 1; // <=> p = (S*)((char*)p + sizeof(S));
This means, however, that the size of structure S needs to be a multiple of S's alignment.
Example:
struct S { int a; char b; };
+----+-+---+
| a |b| ? |
+----+-+---+
a: offset 0, size 4
b: offset 4, size 1
?: offset 5, size 3 (padding)
Putting it all together:
typedef struct {
char a;
char b[2];
short s;
char c;
} stuck;
+-+--+-+--+-+-+
|a| b|?|s |c|?|
+-+--+-+--+-+-+
If you really wish to avoid padding, one (simple) trick (which involves neither addition nor subtraction) is to order your attributes by decreasing alignment, starting with the largest.
typedef struct {
short s;
char a;
char b[2];
char c;
} stuck;
+--+-+--+-+
| s|a| b|c|
+--+-+--+-+
It's a simple rule of thumb, especially as the alignment of basic types may change from platform to platform (32bits/64bits) whereas the relative order of the types is pretty stable (exception: the pointers).
