size of a structure containing bit fields [duplicate]

size of a structure containing bit fields [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to find why the size of the following structure in CASE III is coming out as 8 bytes.
CASE I:
struct B
{
unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
unsigned char c; // +8 bits
unsigned b: 1; // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for case III comes as 8. I was expecting 1(char) + 4(unsigned) = 5.

You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
unsigned char c; // +8 bits
unsigned char pad[3]; //padding
unsigned int bint; //your b:1 will be the first byte of this one
} b;
Now, it is obvious that (in a 32-bit arch.) the sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that the offset of a field into a struct has the same alignment requirements as the type of the field itself. In your architecture, integers are 4-byte-aligned, so offsetof(b, bint) must be multiple of 4. It cannot be 0, because there is the c before, so it will be 4. If field bint starts at offset 4 and is 4 bytes long, then the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the biggest of any of its fields, so this B will be 4-byte-aligned (as it is your bit field). But the size of a type must be a multiple of the alignment, 4 is not enough, so it will be 8.

I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses in memory that are multiple of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.

Char are by definition a byte. ints are 4 bytes on a 32 bit system. And the struct is being padded the extra 4.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding

To keep the accesses to memory aligned the compiler is adding padding if you pack the structure it will no add the padding.

I took another look at this and here's what I found.
From the C book, "Almost everything about fields is implementation-dependant."
On my machine:
struct B {
unsigned c: 8;
unsigned b: 1;
}b;
printf("%lu\n", sizeof(b));
print 4 which is a short;
You were mixing bit fields with regular struct elements.
BTW, a bit fields is defined as: "a set of adjacent bits within a sindle implementation-defined storage unit" So, I'm not even sure that the ':8' does what you want. That would seem to not be in the spirit of bit fields (as it's not a bit any more)

The alignment and total size of the struct are platform and compiler specific. You cannot not expect straightforward and predictable answers here. Compiler can always have some special idea. For example:
struct B
{
unsigned b0: 1; // +1 bit
unsigned char c; // +8 bits
unsigned b1: 1; // +1 bit
};
Compiler can merge fields b0 and b1 into one integer and may not. It is up to compiler. Some compilers have command line keys that control this, some compilers not. Other example:
struct B
{
unsigned short c, d, e;
};
It is up to compiler to pack/not pack the fields of this struct (asuming 32 bit platform). Layout of the struct can differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
unsigned b0: 1;
unsigned b1: 7;
unsigned b2: 2;
};
When you have sequence of bit fields that share the same type, compiler will put them into one int. Otherwise various aspects can kick in. Also take into account that in a big project you write piece of code and somebody else will write and rewrite the makefile; move your code from one dll into another. At this point compiler flags will be set and changed. 99% chance that those people will have no idea of alignment requirements for your struct. They will not even open your file ever.

Related

Why the Struct padding is same in 64 bit and 32 bit systems?

After much time spent on the internet to find the answer, I didn't get anything.
My question is this: I have a 64-bit system and when I run this code:
#include<iostream>
using std::cout;
class A
{
char a ; //1
char b ; //1
int c; //4
int d; //4
};
int main()
{
A a1;
cout<<sizeof(a1);
return 0;
}
the output is 12, not 16. Why?

In the used system sizeof( int ) is equal to 4. So to align the data member c the compiler pads 2 bytes after the variable b. As a result the size of the structure is 12 bytes.
You could get a more visible result dependent on whether a 32 or 64 bit system is used if you would declare data members c and d having the type long. In most 64-nit systems sizeof( long ) is equal to 8.

Padding bytes are added between data members of a structure or class when they are needed to either fulfil the alignment requirements of any of those data members, or (more generally) to optimize access speed to those members.
Now, although the alignment requirements for types such as int are not specified by the C++ Standard, and can vary (or not vary) between 32-bit and 64-bit builds for the same underlying chip (such as the Intel x86-64 systems), the alignment requirement for the char type (and also signed char and unsigned char) is specified by the Standard! From cppreference:
The weakest alignment (the smallest alignment requirement) is the
alignment of char, signed char, and unsigned char, which equals 1; the
largest fundamental alignment of any type is implementation-defined
and equal to the alignment of std::max_align_t (since C++11).
So, as the alignment requirement for the b member is 1, no padding is required between a and b. In your case, the alignment requirement for int is 4 bytes, so two padding bytes must be added before the c member. Thus, 2 × sizeof(char) (= 2) + 2 padding bytes + 2 × sizeof(int) (=8) gives a total size of 12 bytes.

Multiple adjacent bit fields in C++

I saw multiple adjacent bit fields while browsing cppreference.
unsigned char b1 : 3, : 2, b2 : 6, b3 : 2;
So,
What is the purpose of it?
When and where should I use it?

Obviously, to consume less memory working with bitwise operations. That may be significant for embedded programming, for example.
You can also use std::vector<bool> which can (and usually does) have bitfield implementation.

Multiple adjacent bit fields are usually packed together (although this behavior is implementation-defined)
If you want compiler to not add padding or perform structure alignment during multiple bit field allocation you can compile them in a single variable.
struct x
{
unsigned char b1 : 4; // compiler will add padding after this. adding to the structure size.
unsigned char b2 : 3; // compiler will add padding here too! sizeof(x) is 2.
}
struct y
{
unsigned char b1 : 4, : 3; // padding will be added only once. sizeof(y) is 1
}
Or if you want to allocate bigger bit field in a single variable
struct x
{
unsigned char b1 :9; //warning: width of 'x::b1' exceeds its type
};
struct y
{
unsigned char b1 :6, :3; //no warning
};

According to c++draft :
3 A memory location is either an object of scalar type or a maximal
sequence of adjacent bit-fields all having nonzero
width.[ Note: Various features of the language, such as references and
virtual functions, might involve additional memory locations that are
not accessible to programs but are managed by the implementation.— end
note ]Two or more threads of execution ([intro.multithread]) can
access separate memory locations without interfering with each other.

One reason is to reduce memory consumption when you don't need full integer type range. It only matters if you have large arrays.
struct Bytes
{
unsigned char b1; // Eight bits, so values 0-255
unsigned char b2;
};
Bytes b[1000000000]; // sizeof(b) == 2000000000
struct Bitfields
{
unsigned char b1 : 4; // Four bits, so only accommodates values 0-15
unsigned char b2 : 4;
};
Bitfields f[1000000000]; // sizeof(f) == 1000000000
The other reason is matching binary layout of certain hardware or file formats. Such code is not portable.

Packing bools with bit field (C++)

I'm trying to interface with Ada code using C++, so I'm defining a struct using bit fields, so that all the data is in the same place in both languages. The following is not precisely what I'm doing, but outlines the problem. The following is also a console application in VS2008, but that's not super relevant.
using namespace System;
int main() {
int array1[2] = {0, 0};
int *array2 = new int[2]();
array2[0] = 0;
array2[1] = 0;
#pragma pack(1)
struct testStruct {
// Word 0 (desired)
unsigned a : 8;
unsigned b : 1;
bool c : 1;
unsigned d : 21;
bool e : 1;
// Word 1 (desired)
int f : 32;
// Words 2-3 (desired)
int g[2]; //Cannot assign bit field but takes 64 bits in my compiler
};
testStruct test;
Console::WriteLine("size of char: {0:D}", sizeof(char) * 8);
Console::WriteLine("size of short: {0:D}", sizeof(short) * 8);
Console::WriteLine("size of int: {0:D}", sizeof(int) * 8);
Console::WriteLine("size of unsigned: {0:D}", sizeof(unsigned) * 8);
Console::WriteLine("size of long: {0:D}", sizeof(long) * 8);
Console::WriteLine("size of long long: {0:D}", sizeof(long long) * 8);
Console::WriteLine("size of bool: {0:D}", sizeof(bool) * 8);
Console::WriteLine("size of int[2]: {0:D}", sizeof(array1) * 8);
Console::WriteLine("size of int*: {0:D}", sizeof(array2) * 8);
Console::WriteLine("size of testStruct: {0:D}", sizeof(testStruct) * 8);
Console::WriteLine("size of test: {0:D}", sizeof(test) * 8);
Console::ReadKey(true);
delete[] array2;
return 0;
}
(If it wasn't clear, in the real program, the basic idea is that the program gets a void* from something communicating with the Ada code and casts it to a testStruct* to access the data.)
With #pragma pack(1) commented out, the output is:
size of char: 8
size of short: 16
size of int: 32
size of unsigned: 32
size of long: 32
size of long long: 64
size of bool: 8
size of int[2]: 64
size of int*: 32
size of testStruct: 224
size of test: 224
Obviously 4 words (indexed 0-3) should be 448 = 32*4 = 128 bits, not 224. The other output lines were to help confirm the size of types under the VS2008 compiler.
With #pragma pack(1) uncommented, that number (on the last two lines of output) is reduced to 176, which is still greater than 128. It seems that the bools aren't being packed together with the unsigned ints in "Word 0".
Note: a&b, c, d, e, f, packaged in different words would be 5, +2 for the array = 7 words, times 32 bits = 224, the number we get with #pragma pack(1) commented out. If c and e (the bools) instead take up 8 bits each, as opposed to 32, we get 176, which is the number we get with #pragma pack(1) uncommented. It seems #pragma pack(1) is only allowing the bools to be packed into single bytes by themselves, instead of words, but not the bools with the unsigned ints at all.
So my question, in one sentence: Is there a way to force the compiler to pack a through e into one word? Related is this question: C++ bitfield packing with bools , but that doesn't answer my question; it only points out the behavior I'm trying to force to go away.
If there is literally no way to do this, does anyone have any ideas for workarounds? I'm at a loss, because:
I was asked to avoid changing the struct format that I'm copying (no re-ordering).
I don't want to change the bools to unsigned ints because it may cause problems down the road with constantly having to re-cast it to bool and maybe accidentally using the wrong version of an overloaded function, not to mention making the code more obscure for others who read it later.
I don't want to declare them as private unsigned ints then make public accessors or something because all other members of all other structs in the project are accessed directly without () afterward, so it would seem a bit hacky and obtuse, and one would almost NEED the IntelliSense or trial-and-error to remember which needs () and which doesn't.
I would like to avoid creating another struct type just for the data conversion (and e.g. make a constructor for testStruct that takes in a single testStructImport-type object) because the actual struct is very long with lots of bit-field-specified variables.

I recommend that you create a "normal" structure without any bit packing. Use default POD types for the members.
Create interface functions for loading the "normal" fields from a buffer (uint8_t), and storing to a buffer.
This will allow you to use the data members in a sane method in your program. The bit packing and unpacking will be handled by the interface function. The bit twiddling should use bitwise AND and bitwise OR functions and not rely on the bit field notation in a structure. This will allow you to adjust the bit twiddling and be more portable among compilers.
This is how I designed my protocol classes. And I don't have to worry about bit field positioning, Endianess or things of that sort.
Also, I can use block I/O for reading and writing the buffer.

Try packing in this way:
#pragma pack( push, 1 )
struct testStruct {
// Word 0 (desired)
unsigned a : 8;
unsigned b : 1;
unsigned c : 1;
unsigned d : 21;
unsigned e : 1;
// Word 1 (desired)
unsigned f : 32;
// Words 2-3 (desired)
unsigned g[2]; //Cannot assign bit field but takes 64 bits in my compiler
};
#pragma pack(pop)

There is no easy, elegant method without using accessors or an interface layer. Unfortunately, there is nothing like a #pragma thing to fix this. I ended up just converting the bools to unsigned int and renaming variables from e.g. f to f_flag or f_bool to encourage correct usage and make it clear what the variables contained. It's lower-effort than Thomas's solution, but not as robust, obviously, and still gets around some of the main drawbacks with any of the easier methods.

Years after I posted this question, user #WaltK added this comment to the linked, related question:
"If you want to have more control over the layout of bit field
structures in memory, consider using this bit field facility,
implemented as a library header file."

When the compiler decides to pad a struct

Let's say we have:
struct A{
char a1;
char a2;
};
struct B{
int b1;
char b2;
};
struct C{
char C1;
int C2;
};
I know that because of padding to a multiple of the word size (assuming word size=4), sizeof(C)==8 although sizeof(char)==1 and sizeof(int)==4.
I would expect that sizeof(B)==5 instead of sizeof(B)==8.
But if sizeof(B)==8 I would expect that sizeof(A)==4 instead of sizeof(A)==2.
Could anyone please explain why the padding and the aligning are working differently in those cases?

A common padding scheme is to pad structs so that each member starts at an even multiple of the size of that member or to the machine word size (whichever is smaller). The entire struct is padded following the same rule.
Assuming such a padding scheme I would expect:
The biggest member in struct A has size 1, so no padding is used.
In struct B, the size of 5 is padded to 8, because one member has size 4.
The layout would be:
int 4
char 1
padding 3
In struct C, some padding is inserted before the int, so that it starts at an address divisible by 4.
The layout would be:
char 1
padding 3
int 4

It's up to the compiler to decide how best to pad the struct. For some reason, it decided that in struct B that char b2 was more optimally aligned on a 4 byte boundary. Additionally, the specific architecture may have requirements/behaviors that the compiler takes into account when deciding how to pad structs.
If you 'pack' the struct, then you'd see the sizes you expect (although that is not portable and may have performance penalties and other issues depending on the architecture).

structs in general will be aligned to boundaries based on the largest type contained. Consider an array of struct B myarray[5];
struct B must be aligned to 8 bytes so that it's b1 member is always on a 4 byte boundary. myarray[1].b1 can't start at the 6th byte into the array, which is what you would have if sizeof(B) == 5.

struct alignment question

typedef struct {
char c;
char cc[2];
short s;
char ccc;
}stuck;
Should the above struct have a memory layout as this ?
1 2 3 4 5 6 7
- c - cc - s - ccc -
or this ?
1 2 3 4 5 6 7 8
- c - cc - s - ccc -
I think the first should be better but why my VS09 compiler chooses the second ? (Is my layout correct by the way ?) Thank you

I think that your structure will have the following layout, at least on Windows:
typedef struct {
char c;
char cc[2];
char __padding;
short s;
char ccc;
char __tail_padding;
} stuck;
You could avoid the padding by reordering the structure members:
typedef struct {
char c;
char cc[2];
char ccc;
short s;
} stuck;

The compiler can't choose the second. The standard mandates that the first field must be aligned with the start of the structure.
Are you using offsetof from stddef.h for finding this out ?
6.7.2.1 - 13
A pointer to a structure object, suitably converted, points to its
initial member (or if that member is a bit-ﬁeld, then to the unit in
which it resides), and vice versa. There may be unnamed padding
within a structure object, but not at its beginning.
It means that you can have
struct s {
int x;
char y;
double z;
};
struct s obj;
int *x = (int *)&obj; /* Legal. */
Put another way
offsetof(s, x); /* Must yield 0. */

Other than at the beginning of a structure, an implementation can put whatever padding it wants in your structures so there's no right way. From C99 6.7.2.1 Structure and union specifiers, paragraphs:
/12:Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
/13:There may be unnamed
padding within a structure object, but not at its beginning.
/15:There may be unnamed padding at the end of a structure or union.
Paragraph 13 also contains:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
This means that the fields within the structure cannot be re-ordered. And, in a large number of modern implementations (but this is not mandated by the standard), the alignment of an object is equal to its size. For example a 32-bit integer data type may have an alignment requirement of four (8-bit) bytes.
Hence, a logical alignment would be:
offset size field
------ ---- -----
0 1 char c;
1 2 char cc[2];
3 1 padding
4 2 short s;
6 1 char ccc;
7 1 padding
but, as stated, it may be something different. The final padding is to ensure that consecutive array elements are aligned correctly (since the short most likely has to be on a 2-byte boundary).
There are a number of (non-portable) ways in which you may be able to control the padding. Many compilers have a #pragma pack option that you can use to control padding (although be careful: while some systems may just slow down when accessing unaligned data, some will actually dump core for an illegal access).
Also, re-ordering the elements within the structure from largest to smallest tends to reduce padding as well since the larger elements tend to have stricter alignment requirements.
These, and an even uglier "solution" are discussed more here.

While I do really understand your visual representation of the alignment, I can tell you that with VS you can achieve a packed structure by using 'pragma':
__pragma( pack(push, 1) )
struct { ... };
__pragma( pack(pop) )
In general struct-alignment depends on the compiler used, the target-platform (and its address-size) and the weather, IOW in reality it is not well defined.

Others have mentionned that padding may be introduced either between attributes or after the last attribute.
The interesting thing though, I believe, is to understand why.
Types usually have an alignment. This property precises which address are valid (or not) for a particular type. On some architecture, this is a loose requirement (if you do not respect it, you only incur some overhead), on others, violating it causes hardware exceptions.
For example (arbitrary, as each platform define its own):
char: 1
short (16 bits): 2
int (32 bits): 4
long int (64 bits): 8
A compound type will usually have as alignment the maximum of the alignment of its parts.
How does alignment influences padding ?
In order to respect the alignment of a type, some padding may be necessary, for example:
struct S { char a; int b; };
align(S) = max(align(a), align(b)) = max(1, 4) = 4
Thus we have:
// S allocated at address 0x16 (divisible by 4)
0x16 a
0x17
0x18
0x19
0x20 b
0x21 b
0x22 b
0x23 b
Note that because b can only be allocated at an address also divisible by 4, there is some space between a and b, this space is called padding.
Where does padding comes from ?
Padding may have two different reasons:
between attributes, it is caused by a difference in alignment (see above)
at the end of the struct, it is caused by array requirements
The array requirement is that elements of an array should be allocated without intervening padding. This allows one to use pointer arithmetic to navigate from an element to another:
+---+---+---+
| S | S | S |
+---+---+---+
S* p = /**/;
p = p + 1; // <=> p = (S*)((void*)p + sizeof(S));
This means, however, than the structure S size needs be a multiple of S alignment.
Example:
struct S { int a; char b; };
+----+-+---+
| a |b| ? |
+----+-+---+
a: offset 0, size 4
b: offset 4, size 1
?: offset 5, size 3 (padding)
Putting it altogether:
typedef struct {
char a;
char b[2];
short s;
char c;
} stuck;
+-+--+-+--+-+-+
|a| b|?|s |c|?|
+-+--+-+--+-+-+
If you really wish to avoid padding, one (simple) trick (which does not involve addition nor substraction) is to simply order your attributes starting from the maximum alignment.
typedef struct {
short s;
char a;
char b[2];
char c;
} stuck;
+--+-+--+-+
| s|a| b|c|
+--+-+--+-+
It's a simple rule of thumb, especially as the alignment of basic types may change from platform to platform (32bits/64bits) whereas the relative order of the types is pretty stable (exception: the pointers).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js