Multiple adjacent bit fields in C++

Multiple adjacent bit fields in C++ - c++

I saw multiple adjacent bit fields while browsing cppreference.
unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
So,
What is the purpose of it?
When and where should I use it?

Obviously, to consume less memory working with bitwise operations. That may be significant for embedded programming, for example.
You can also use std::vector<bool> which can (and usually does) have bitfield implementation.

Multiple adjacent bit fields are usually packed together (although this behavior is implementation-defined)
If you want compiler to not add padding or perform structure alignment during multiple bit field allocation you can compile them in a single variable.
struct x
{
unsigned char b1 : 4; // compiler will add padding after this. adding to the structure size.
unsigned char b2 : 3; // compiler will add padding here too! sizeof(x) is 2.
}
struct y
{
unsigned char b1 : 4, : 3; // padding will be added only once. sizeof(y) is 1
}
Or if you want to allocate bigger bit field in a single variable
struct x
{
unsigned char b1 :9; //warning: width of 'x::b1' exceeds its type
};
struct y
{
unsigned char b1 :6, :3; //no warning
};

According to c++draft :
3 A memory location is either an object of scalar type or a maximal
sequence of adjacent bit-fields all having nonzero
width.[ Note: Various features of the language, such as references and
virtual functions, might involve additional memory locations that are
not accessible to programs but are managed by the implementation.— end
note ]Two or more threads of execution ([intro.multithread]) can
access separate memory locations without interfering with each other.

One reason is to reduce memory consumption when you don't need full integer type range. It only matters if you have large arrays.
struct Bytes
{
unsigned char b1; // Eight bits, so values 0-255
unsigned char b2;
};
Bytes b[1000000000]; // sizeof(b) == 2000000000
struct Bitfields
{
unsigned char b1 : 4; // Four bits, so only accommodates values 0-15
unsigned char b2 : 4;
};
Bitfields f[1000000000]; // sizeof(f) == 1000000000
The other reason is matching binary layout of certain hardware or file formats. Such code is not portable.

Related

When the compiler decides to pad a struct

Let's say we have:
struct A{
char a1;
char a2;
};
struct B{
int b1;
char b2;
};
struct C{
char C1;
int C2;
};
I know that because of padding to a multiple of the word size (assuming word size=4), sizeof(C)==8 although sizeof(char)==1 and sizeof(int)==4.
I would expect that sizeof(B)==5 instead of sizeof(B)==8.
But if sizeof(B)==8 I would expect that sizeof(A)==4 instead of sizeof(A)==2.
Could anyone please explain why the padding and the aligning are working differently in those cases?

A common padding scheme is to pad structs so that each member starts at an even multiple of the size of that member or to the machine word size (whichever is smaller). The entire struct is padded following the same rule.
Assuming such a padding scheme I would expect:
The biggest member in struct A has size 1, so no padding is used.
In struct B, the size of 5 is padded to 8, because one member has size 4.
The layout would be:
int 4
char 1
padding 3
In struct C, some padding is inserted before the int, so that it starts at an address divisible by 4.
The layout would be:
char 1
padding 3
int 4

It's up to the compiler to decide how best to pad the struct. For some reason, it decided that in struct B that char b2 was more optimally aligned on a 4 byte boundary. Additionally, the specific architecture may have requirements/behaviors that the compiler takes into account when deciding how to pad structs.
If you 'pack' the struct, then you'd see the sizes you expect (although that is not portable and may have performance penalties and other issues depending on the architecture).

structs in general will be aligned to boundaries based on the largest type contained. Consider an array of struct B myarray[5];
struct B must be aligned to 8 bytes so that it's b1 member is always on a 4 byte boundary. myarray[1].b1 can't start at the 6th byte into the array, which is what you would have if sizeof(B) == 5.

Why size of my object is not reduced?

I've written CMyObject class as follows:
class CMyObject
{
public:
CMyOjbect(void) {};
virtual ~CMyOjbect(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
BOOL m_bField3;
int m_iField4;
BOOL m_bField5;
}
To reduce the size of CMyObject, I changed it to:
class CMyObject
{
public:
CMyOjbect(void) {};
virtual ~CMyOjbect(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4; // Change from int to short, since short is enough for the data range
unsigned m_bField3: 1; // Change field 3 and 5 from BOOL to bit field to save spaces
unsigned m_bField5: 1;
}
However, the sizeof(CMyObject) is still not changed, why?
Can I use pargma pack(1) in a class to pack all the member variables, like this:
pargma pack(1)
class CMyObject
{
public:
CMyOjbect(void) {};
virtual ~CMyOjbect(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4; // Change from int to short, since short is enough for the data range
unsigned m_bField3: 1; // Change field 3 and 5 from BOOL to bit field to save spaces
unsigned m_bField5: 1;
}
pargma pack(0)

Because of your ULONGLONG first member, your structure will have 8-byte (64-bit) alignment. Assuming 32-bit ints, your first version uses 18 bytes, which would take 24 bytes to store. (The large and small members are interspersed, which makes the matter worse, but by my count, that doesn't change the answer here.) Your second version also takes 18 bytes:
8 bytes for m_uField1
4 bytes for m_uField2
2 bytes for m_sField4
4 bytes for the two unsigned bitfields (which will have 4-byte alignment, so will also inject 2 bytes of padding after m_sField4)
If you switch to short unsigned m_bField3:1 and short unsigned m_bField4:1, I think there's a good chance your structure will become smaller and fit in only 16 bytes.
Use of #pragma pack is not portable, so I can't comment on the specifics there, but it's possible that could shrink the size of your structure. I'm guessing it may not, though, since it's usually better at compensating for nonoptimal ordering of members with alignment, and by my counts, your variables themselves are just too big. (However, removing the alignment requirement on the ULONGLONG may shrink your structure size in either version, and #pragma pack may do that.)
As #slater mentions, in order to decrease the size and padding in your structure, you should declare your member variables of similar sizes next to each other. It's a pretty good rule of thumb to declare your variables in decreasing size order, which will tend to minimize padding and leverage coinciding alignment requirements.
However, the size of the structure isn't always the most important concern. Members are initialized in declaration order in the constructor, and for some classes, this matters, so you should take that into account. Additionally, if your structure spans multiple cache lines and will be used concurrently by multiple threads, you should ideally put variables that are used together nearby and in the same cache line and variables that are not used together in separate cache lines to reduce/eliminate false sharing.

In regards to your first question "However, the sizeof(CMyObject) is still not changed, why?"
Your BOOLs are not defined contiguously, so they are padded by the compiler for the purposes of memory-alignment.
On most 32-bit systems, this struct uses 16 bytes:
struct {
char b1; // 1 byte for char, 3 for padding
int i1; // 4 bytes
char b2; // 1 byte for char, 3 for padding
int i2; // 4 bytes
}
This struct uses 12 bytes:
struct {
char b1; // 1 byte
char b2; // 1 byte, 2 bytes for padding
int i1; // 4 bytes
int i2; // 4 bytes
}

Alignment and packing are implementation dependent, but typically you can request smaller alignment and better packing by using smaller types. That applies to specifying smaller types in bit-field declarations as well, since many compilers interpret the bit-field type as a request for allocation unit of that size specifically and alignment requirement of that type specifically.
In your case one obvious mistake is using unsigned for bit fields. Use unsigned char for bit fields and it should pack much better
class CMyObject
{
public:
CMyOjbect(void) {};
virtual ~CMyOjbect(void) {};
public:
ULONGLONG m_uField1;
UINT m_uField2;
short m_sField4;
unsigned char m_bField3: 1;
unsigned char m_bField5: 1;
};
This will not necessarily make it as compact as #pragma pack(1) can make it, but it will take it much closer to it.

Characters in structures

A quick question : What is the meaning of char c:4 in the structure given below
struct s
{
char c:4;
}
Thanks in advance.

This is a bit field consisting of a four-bit portion of a char. You can define more bit fields to subdivide a larger type into "nibblets", like this:
struct s
{
char c:4;
char d:2;
char e:2;
};
This struct defines three fields, all "packed" into a single char. The c field can hold sixteen distinct values; fields d and e can hold four values each.

The char c:4 means that it is a char sized variable of which 4 bits are the variable c. Put another way, c does not refer to the entire 8 bit char memory space, only four bits of it. This is a mechanism for bit-packing flags and uncommon data sizes into a structure.

Bit-fields are a quirky feature of the C language whose exact specifications and behavior have been somewhat inconsistent over the years; later versions of the C standard have nailed down some aspects of the behavior enough to prevent compilers from doing some useful things, but not enough to make them portable in any useful fashion.
If two or more consecutive members of a structure have the same underlying integer type, and have their field name followed by a colon and a number, the compiler will attempt to pack each one after the first into the same storage element as the one before. If it cannot do so, it will advance to the next storage element.
Thus, on a machine where unsigned short is sixteen bits, a declaration:
struct {
unsigned short a1 : 5;
unsigned short a2 : 5;
unsigned short a3 : 7;
unsigned short a4 : 5;
unsigned short a5 : 4;
unsigned short a6 : 6;
}
the compiler will pack a1 and a2 into a 16-bit integer; since a3 won't fit in the same integer as a1 and a2 (the total would be 17 bits), it will start a new one. Although a4 would bit with a1 and a2, the fact that a3 is in the way means that it will be put with a3. Then a5 will be placed with a3 and a4, filling up the second spot. A6 will be given a 16-bit spot all its own.
Note that if the underlying type of all those structure elements had been a 32-bit unsigned int, everything could have been packed into one such item rather than three 16-bit ones.
Frankly, I really dislike the rules around bitfields. They rules are sufficiently vague that the underlying representation one compiler generates for bitfield data may not be readable by another, but they're sufficiently specific that compilers are forbidden from generating what could otherwise be useful data structures (e.g. it should be possible to pack four six-bit numbers into a 3-byte (24-bit) structure, but unless a compiler happens to have a 24-bit integer type there is way to request that). Still, the rules are what they are.

size of a structure containing bit fields [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to find why the size of the following structure in CASE III is coming out as 8 bytes.
CASE I:
struct B
{
unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
unsigned char c; // +8 bits
unsigned b: 1; // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for case III comes as 8. I was expecting 1(char) + 4(unsigned) = 5.

You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
unsigned char c; // +8 bits
unsigned char pad[3]; //padding
unsigned int bint; //your b:1 will be the first byte of this one
} b;
Now, it is obvious that (in a 32-bit arch.) the sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that the offset of a field into a struct has the same alignment requirements as the type of the field itself. In your architecture, integers are 4-byte-aligned, so offsetof(b, bint) must be multiple of 4. It cannot be 0, because there is the c before, so it will be 4. If field bint starts at offset 4 and is 4 bytes long, then the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the biggest of any of its fields, so this B will be 4-byte-aligned (as it is your bit field). But the size of a type must be a multiple of the alignment, 4 is not enough, so it will be 8.

I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses in memory that are multiple of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.

Char are by definition a byte. ints are 4 bytes on a 32 bit system. And the struct is being padded the extra 4.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding

To keep the accesses to memory aligned the compiler is adding padding if you pack the structure it will no add the padding.

I took another look at this and here's what I found.
From the C book, "Almost everything about fields is implementation-dependant."
On my machine:
struct B {
unsigned c: 8;
unsigned b: 1;
}b;
printf("%lu\n", sizeof(b));
print 4 which is a short;
You were mixing bit fields with regular struct elements.
BTW, a bit fields is defined as: "a set of adjacent bits within a sindle implementation-defined storage unit" So, I'm not even sure that the ':8' does what you want. That would seem to not be in the spirit of bit fields (as it's not a bit any more)

The alignment and total size of the struct are platform and compiler specific. You cannot not expect straightforward and predictable answers here. Compiler can always have some special idea. For example:
struct B
{
unsigned b0: 1; // +1 bit
unsigned char c; // +8 bits
unsigned b1: 1; // +1 bit
};
Compiler can merge fields b0 and b1 into one integer and may not. It is up to compiler. Some compilers have command line keys that control this, some compilers not. Other example:
struct B
{
unsigned short c, d, e;
};
It is up to compiler to pack/not pack the fields of this struct (asuming 32 bit platform). Layout of the struct can differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
unsigned b0: 1;
unsigned b1: 7;
unsigned b2: 2;
};
When you have sequence of bit fields that share the same type, compiler will put them into one int. Otherwise various aspects can kick in. Also take into account that in a big project you write piece of code and somebody else will write and rewrite the makefile; move your code from one dll into another. At this point compiler flags will be set and changed. 99% chance that those people will have no idea of alignment requirements for your struct. They will not even open your file ever.

struct alignment question

typedef struct {
char c;
char cc[2];
short s;
char ccc;
}stuck;
Should the above struct have a memory layout as this ?
1 2 3 4 5 6 7
- c - cc - s - ccc -
or this ?
1 2 3 4 5 6 7 8
- c - cc - s - ccc -
I think the first should be better but why my VS09 compiler chooses the second ? (Is my layout correct by the way ?) Thank you

I think that your structure will have the following layout, at least on Windows:
typedef struct {
char c;
char cc[2];
char __padding;
short s;
char ccc;
char __tail_padding;
} stuck;
You could avoid the padding by reordering the structure members:
typedef struct {
char c;
char cc[2];
char ccc;
short s;
} stuck;

The compiler can't choose the second. The standard mandates that the first field must be aligned with the start of the structure.
Are you using offsetof from stddef.h for finding this out ?
6.7.2.1 - 13
A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-ﬁeld, then to the unit in
which it resides), and vice versa. There may be unnamed padding
within a structure object, but not at its beginning.
It means that you can have
struct s {
int x;
char y;
double z;
};
struct s obj;
int *x = (int *)&obj; /* Legal. */
Put another way
offsetof(s, x); /* Must yield 0. */

Other than at the beginning of a structure, an implementation can put whatever padding it wants in your structures so there's no right way. From C99 6.7.2.1 Structure and union specifiers, paragraphs:
/12:Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
/13:There may be unnamed
padding within a structure object, but not at its beginning.
/15:There may be unnamed padding at the end of a structure or union.
Paragraph 13 also contains:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
This means that the fields within the structure cannot be re-ordered. And, in a large number of modern implementations (but this is not mandated by the standard), the alignment of an object is equal to its size. For example a 32-bit integer data type may have an alignment requirement of four (8-bit) bytes.
Hence, a logical alignment would be:
offset size field
------ ---- -----
0 1 char c;
1 2 char cc[2];
3 1 padding
4 2 short s;
6 1 char ccc;
7 1 padding
but, as stated, it may be something different. The final padding is to ensure that consecutive array elements are aligned correctly (since the short most likely has to be on a 2-byte boundary).
There are a number of (non-portable) ways in which you may be able to control the padding. Many compilers have a #pragma pack option that you can use to control padding (although be careful: while some systems may just slow down when accessing unaligned data, some will actually dump core for an illegal access).
Also, re-ordering the elements within the structure from largest to smallest tends to reduce padding as well since the larger elements tend to have stricter alignment requirements.
These, and an even uglier "solution" are discussed more here.

While I do really understand your visual representation of the alignment, I can tell you that with VS you can achieve a packed structure by using 'pragma':
__pragma( pack(push, 1) )
struct { ... };
__pragma( pack(pop) )
In general struct-alignment depends on the compiler used, the target-platform (and its address-size) and the weather, IOW in reality it is not well defined.

Others have mentionned that padding may be introduced either between attributes or after the last attribute.
The interesting thing though, I believe, is to understand why.
Types usually have an alignment. This property precises which address are valid (or not) for a particular type. On some architecture, this is a loose requirement (if you do not respect it, you only incur some overhead), on others, violating it causes hardware exceptions.
For example (arbitrary, as each platform define its own):
char: 1
short (16 bits): 2
int (32 bits): 4
long int (64 bits): 8
A compound type will usually have as alignment the maximum of the alignment of its parts.
How does alignment influences padding ?
In order to respect the alignment of a type, some padding may be necessary, for example:
struct S { char a; int b; };
align(S) = max(align(a), align(b)) = max(1, 4) = 4
Thus we have:
// S allocated at address 0x16 (divisible by 4)
0x16 a
0x17
0x18
0x19
0x20 b
0x21 b
0x22 b
0x23 b
Note that because b can only be allocated at an address also divisible by 4, there is some space between a and b, this space is called padding.
Where does padding comes from ?
Padding may have two different reasons:
between attributes, it is caused by a difference in alignment (see above)
at the end of the struct, it is caused by array requirements
The array requirement is that elements of an array should be allocated without intervening padding. This allows one to use pointer arithmetic to navigate from an element to another:
+---+---+---+
| S | S | S |
+---+---+---+
S* p = /**/;
p = p + 1; // <=> p = (S*)((void*)p + sizeof(S));
This means, however, than the structure S size needs be a multiple of S alignment.
Example:
struct S { int a; char b; };
+----+-+---+
| a |b| ? |
+----+-+---+
a: offset 0, size 4
b: offset 4, size 1
?: offset 5, size 3 (padding)
Putting it altogether:
typedef struct {
char a;
char b[2];
short s;
char c;
} stuck;
+-+--+-+--+-+-+
|a| b|?|s |c|?|
+-+--+-+--+-+-+
If you really wish to avoid padding, one (simple) trick (which does not involve addition nor substraction) is to simply order your attributes starting from the maximum alignment.
typedef struct {
short s;
char a;
char b[2];
char c;
} stuck;
+--+-+--+-+
| s|a| b|c|
+--+-+--+-+
It's a simple rule of thumb, especially as the alignment of basic types may change from platform to platform (32bits/64bits) whereas the relative order of the types is pretty stable (exception: the pointers).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js